What's the best way to parse a file in Java?

I have a text file with Tag - Value format data. I want to parse this file to form a Trie. What will be the best approach?
Sample of the file (a string inside double quotes is a tag, and '#' starts a comment line):
#Hi, this is a sample file.
"abcd" = 12;
"abcde" = 16;
"http" = 32;
"sip" = 21;

This is basically a properties file. I would remove the " around the tags, then use the Properties class (http://java.sun.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)) to load the file.

Read that in using Properties and trim the excess parts (", ; and whitespace). Short example:
Properties props = new Properties();
// load() is an instance method and throws IOException
props.load(this.getClass()
        .getResourceAsStream("path/to.file"));
Map<String, String> cleanedProps = new HashMap<String, String>();
for (Map.Entry<Object, Object> pair : props.entrySet()) {
    cleanedProps.put(cleanKey((String) pair.getKey()),
            cleanValue((String) pair.getValue()));
}
Note that in the solution above you only need to implement cleanKey() and cleanValue() yourself. You may want to change the data types as necessary; I used Strings just as an example.
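As a rough illustration of the Properties approach, here is a minimal sketch of what cleanKey() and cleanValue() could look like. The class name TagFileCleaner and both helper bodies are assumptions made for this example, not part of the answer above; the sketch also assumes values end at the first ';'.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class TagFileCleaner {
    // Hypothetical helper: strips one leading and one trailing double quote.
    static String cleanKey(String rawKey) {
        return rawKey.trim().replaceAll("^\"|\"$", "");
    }

    // Hypothetical helper: keeps only the text before the first ';' and trims it.
    static String cleanValue(String rawValue) {
        int semi = rawValue.indexOf(';');
        String value = semi >= 0 ? rawValue.substring(0, semi) : rawValue;
        return value.trim();
    }

    // Loads the text with java.util.Properties, then cleans every key and value.
    static Map<String, String> load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties text", e);
        }
        Map<String, String> cleaned = new HashMap<String, String>();
        for (String name : props.stringPropertyNames()) {
            cleaned.put(cleanKey(name), cleanValue(props.getProperty(name)));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String text = "#Hi, this is a sample file.\n\"abcd\" = 12;\n\"sip\" = 21 ;\n";
        System.out.println(load(text));
    }
}
```

Properties already skips lines starting with '#', so only the quotes and the trailing ';' need handling here.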

There are many ways to do this; others have mentioned that java.util.Properties gets most of the job done, and is probably the most robust solution.
One other option is to use a java.util.Scanner.
Use the Scanner(File) constructor to scan a file
You can call useDelimiter with a pattern appropriate for this format
nextInt() can be used to extract the numbers
Perhaps you can put the key/value pairs into a SortedMap<String,Integer>
Here's an example that scans a String for simplicity:
String text =
"#Hi, this is a sample file.\n" +
"\n" +
"\"abcd\" = 12; \r\n" +
"\"abcde\"=16;\n" +
" # \"ignore\" = 13;\n" +
"\"http\" = 32; # Comment here \r" +
"\"zzz\" = 666; # Out of order! \r" +
" \"sip\" = 21 ;";
System.out.println(text);
System.out.println("----------");
SortedMap<String,Integer> map = new TreeMap<String,Integer>();
Scanner sc = new Scanner(text).useDelimiter("[\"=; ]+");
while (sc.hasNextLine()) {
if (sc.hasNext("[a-z]+")) {
map.put(sc.next(), sc.nextInt());
}
sc.nextLine();
}
System.out.println(map);
This prints (as seen on ideone.com):
#Hi, this is a sample file.
"abcd" = 12;
"abcde"=16;
# "ignore" = 13;
"http" = 32; # Comment here
"zzz" = 666; # Out of order!
"sip" = 21 ;
----------
{abcd=12, abcde=16, http=32, sip=21, zzz=666}
Related questions
Validating input using java.util.Scanner
Iterate Over Map
See also
regular-expressions.info/Tutorial

The most natural way is probably this:
void doParse() {
String text =
"#Hi, this is a sample file.\n"
+ "\"abcd\" = 12;\n"
+ "\"abcde\" = 16;\n"
+ "#More comment\n"
+ "\"http\" = 32;\n"
+ "\"sip\" = 21;";
Matcher matcher = Pattern.compile("\"(.+)\" = ([0-9]+)").matcher(text);
while (matcher.find()) {
String txt = matcher.group(1);
int val = Integer.parseInt(matcher.group(2));
System.out.format("parsed: %s , %d%n", txt, val);
}
}
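Since the question asks for a Trie, here is a minimal sketch of how parsed tag/value pairs might be fed into one. The TagTrie class, its Node type, and the ENTRY pattern are all assumptions for illustration; the sketch reads the text line by line so that commented-out entries are skipped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTrie {
    // One trie node per character; value is null unless a tag ends here.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        Integer value;
    }

    // Matches lines such as: "abcd" = 12;  (a leading '#' prevents a match)
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*\"([^\"]+)\"\\s*=\\s*(\\d+)\\s*;");

    private final Node root = new Node();

    void put(String tag, int value) {
        Node node = root;
        for (char c : tag.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.value = value;
    }

    // Returns the value stored for the exact tag, or null if absent.
    Integer get(String tag) {
        Node node = root;
        for (char c : tag.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    // Parses the whole text and inserts every tag/value entry found.
    static TagTrie parse(String text) {
        TagTrie trie = new TagTrie();
        for (String line : text.split("\\r?\\n|\\r")) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                trie.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return trie;
    }

    public static void main(String[] args) {
        TagTrie trie = parse("\"abcd\" = 12;\n\"abcde\" = 16;\n#\"x\" = 1;");
        System.out.println(trie.get("abcd") + " " + trie.get("abcde") + " " + trie.get("x"));
    }
}
```

A HashMap per node keeps the sketch short; an array indexed by character would be a reasonable alternative if the tag alphabet is small and known.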

Related

using regex output in a method

I'm using regex to read data from a file but I'm having trouble using the data I'm reading.
Here is my code:
File file = new File(eventsFile);
try {
    Scanner sc = new Scanner(file);
    while (sc.hasNextLine()) {
        String eventLine = sc.nextLine();
        Pattern pattern = Pattern.compile("^Event=(?<event>[^,]*),time=(?<time>[^,]*)(,rings=(?<rings>[^,]*))?$");
        Matcher matcher = pattern.matcher(eventLine);
        while (matcher.find()) {
            System.out.print(matcher.group("event") + " " + matcher.group("time"));
            String eventName = matcher.group("event");
            int time = Integer.parseInt(matcher.group("time"));
            // Look up the event class by name and call its (long) constructor
            Class<?> eventClass = Class.forName(eventName);
            Constructor<?> constructor = eventClass.getConstructor(long.class);
            Event event = (Event) constructor.newInstance(time);
            addEvent(event);
            if (matcher.group(4) != null) {
                System.out.println(" " + matcher.group(4));
            } else {
                System.out.println();
            }
        }
    }
    sc.close();
} catch (FileNotFoundException | ReflectiveOperationException e) {
    e.printStackTrace();
}
The print statements are there just temporarily to make sure the scanning of the file and the regex work. What I'm trying to accomplish is to use matcher.group(1) and matcher.group(2) as follows: addEvent(new eventname(time)), where eventname is matcher.group(1) and time is matcher.group(2).
I tried creating variables to store group(1) and group(2) and using them in addEvent, but that didn't really work. Any ideas on how to approach this?
EDIT:
Example of text file
Event=ThermostatNight,time=0
Event=LightOn,time=2000
Event=WaterOff,time=10000
Event=ThermostatDay,time=12000
Event=Bell,time=9000,rings=5
Event=WaterOn,time=6000
Event=LightOff,time=4000
Event=Terminate,time=20000
Event=FansOn,time=7000
Event=FansOff,time=8000
I'm trying to reach a situation where I call addEvent for each line in the text file, following this example: addEvent(new ThermostatNight(0));
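One way to get from a matched event name to a constructed object without reflection is a lookup table of constructors. The sketch below swaps in that technique (it is not the asker's code); the Event, LightOn and Bell classes and the FACTORIES map are hypothetical stand-ins, and it assumes Java 8 for method references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EventLoader {
    // Hypothetical stand-ins for the real event classes.
    static class Event {
        final long time;
        Event(long time) { this.time = time; }
    }
    static class LightOn extends Event {
        LightOn(long time) { super(time); }
    }
    static class Bell extends Event {
        Bell(long time) { super(time); }
    }

    // Same line format as in the question, compiled once.
    private static final Pattern LINE = Pattern.compile(
            "^Event=(?<event>[^,]*),time=(?<time>[^,]*)(,rings=(?<rings>[^,]*))?$");

    // Maps an event name from the file to the constructor that builds it.
    private static final Map<String, LongFunction<Event>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("LightOn", LightOn::new);
        FACTORIES.put("Bell", Bell::new);
    }

    // Builds one Event per recognised line; other lines are skipped.
    static List<Event> parse(List<String> lines) {
        List<Event> events = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // not an event line
            }
            LongFunction<Event> factory = FACTORIES.get(m.group("event"));
            if (factory == null) {
                continue; // unknown event name
            }
            events.add(factory.apply(Long.parseLong(m.group("time"))));
        }
        return events;
    }
}
```

If reflection is preferred, keep in mind that Class.forName expects the fully qualified class name (package included), which is a common reason for a lookup by simple name to fail.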

String Manipulation in java 1.6

The String can look like the examples below. I'm using Java 1.6.
String example = "<number>;<name-value>;<name-value>";
String abc = "+17005554141;qwq=1234;ddd=ewew;otg=383";
String abc = "+17005554141;qwq=123454";
String abc = "+17005554141";
I want to remove qwq=1234 from the String if it is present. The key qwq is fixed, but its value can vary (for example 1234 or 12345).
Expected result:
String abc = "+17005554141;ddd=ewew;otg=383";
String abc = "+17005554141"; // removed ;qwq=123454
String abc = "+17005554141";
I tried
abc = abc.replaceAll(";qwq=.*;", "");
but it is not working.
I came up with this: qwq=\d*\;? and it works. It matches zero or more digits after qwq=. The trailing ; is optional, since in your examples it is not always present after the number.
I know the question is not about JavaScript, but here's an example where you can see the regex working:
const regex = /qwq=\d*\;?/g;
var items = ["+17005554141;qwq=123454",
"+17005554141",
"+17005554141;qwq=1234;ddd=ewew;otg=383"];
for(let i = 0; i < items.length; i++) {
console.log("Item before replace: " + items[i]);
console.log("Item after replace: " + items[i].replace(regex, "") + "\n\n");
}
You can use a regex to remove that kind of substring:
String example = "+17005554141;qwq=1234;ddd=ewew;otg=383";
System.out.println("Before: " + example);
System.out.println("After: " + example.replaceAll("qwq=\\d+;?", ""));
This gives the following output:
Before: +17005554141;qwq=1234;ddd=ewew;otg=383
After: +17005554141;ddd=ewew;otg=383
.* matches any run of characters, not just digits. Use something that matches only a run of digits:
abc.replaceAll(";qwq=\\d+", "")
Here \\d+ matches any number.
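To see why the original attempt fails: in ";qwq=.*;" the .* is greedy, so on the first sample it runs to the last ';' and also swallows ddd=ewew. A minimal sketch that handles all three sample strings follows; the class and method names are made up for the example, and [^;]* is used instead of \\d+ on the assumption that the value may be any text up to the next ';'.

```java
class QwqRemover {
    // Removes ";qwq=<value>" where the value runs up to the next ';' or the end.
    static String removeQwq(String s) {
        return s.replaceAll(";qwq=[^;]*", "");
    }

    public static void main(String[] args) {
        System.out.println(removeQwq("+17005554141;qwq=1234;ddd=ewew;otg=383"));
        System.out.println(removeQwq("+17005554141;qwq=123454"));
        System.out.println(removeQwq("+17005554141"));
    }
}
```

Matching the leading ';' rather than the trailing one avoids leaving a dangling separator when qwq is the last field.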
Please try:
abc = abc.replaceAll("qwq=[0-9]*;", "");
If you don't mind giving up some convenience, you can achieve this with plain String operations (indexOf, replace and substring). This is perhaps the most old-fashioned way to do it:
private static String replaceQWQ(String target)
{
if (target.indexOf("qwq=") != -1) {
if (target.indexOf(';', target.indexOf("qwq=")) != -1) {
String replace =
target.substring(target.indexOf("qwq="), target.indexOf(';', target.indexOf("qwq=")) + 1);
target = target.replace(replace, "");
} else {
target = target.substring(0, target.indexOf("qwq=") - 1);
}
}
return target;
}
Small test:
String abc = "+17005554141;qwq=1234;ddd=ewew;otg=383";
String def = "+17005554141;qwq=1234";
System.out.println(replaceQWQ(abc));
System.out.println(replaceQWQ(def));
outputs:
+17005554141;ddd=ewew;otg=383
+17005554141
Another one:
abc.replaceAll(";qwq=[^;]*;", ";");
You can use groups in the replaceAll method.
Here is an example:
abc.replaceAll("(.*;)(qwq=\\d*;)(.*)", "$1$3");
You can find more about groups at: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

Length of String within tags in java

We need to find the length of the tag names within the tags in Java:
{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}
So the length of the Student tag is 7, that of the Subject tag is 7, and that of the Marks tag is 5.
I am trying to split the tags and then find the length of each string within the tag.
But the code I am trying gives me only the first tag name and not others.
Can you please help me on this?
I am very new to java. Please let me know if this is a very silly question.
Code part:
System.out.println(
getParenthesesContent("{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}"));
public static String getParenthesesContent(String str) {
return str.substring(str.indexOf('{')+1,str.indexOf('}'));
}
You can use Pattern with this regex, \\{([a-zA-Z]*)\\}:
String text = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
Matcher matcher = Pattern.compile("\\{([a-zA-Z]*)\\}").matcher(text);
while (matcher.find()) {
System.out.println(
String.format(
"tag name = %s, Length = %d ",
matcher.group(1),
matcher.group(1).length()
)
);
}
Outputs
tag name = Student, Length = 7
tag name = Subject, Length = 7
tag name = Marks, Length = 5
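The regex approach above can also be wrapped in a method that returns the lengths instead of printing them. This is only a sketch: the TagLengths class and tagLengths method are hypothetical names, and a LinkedHashMap is used so that tags keep the order in which they first appear.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagLengths {
    // Matches opening tags only: closing tags contain '/', which is not a letter.
    private static final Pattern OPEN_TAG = Pattern.compile("\\{([a-zA-Z]+)\\}");

    // Maps each opening tag name to the length of that name.
    static Map<String, Integer> tagLengths(String text) {
        Map<String, Integer> lengths = new LinkedHashMap<String, Integer>();
        Matcher m = OPEN_TAG.matcher(text);
        while (m.find()) {
            lengths.put(m.group(1), m.group(1).length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(
                tagLengths("{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}"));
    }
}
```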
You might want to give a try to another regex:
String s = "{Abc}{Defg}100{Hij}100{/Klmopr}{/Stuvw}"; // just a sample String
Pattern p = Pattern.compile("\\{\\W*(\\w++)\\W*\\}");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + ", length: " + m.group(1).length());
}
Output you get:
Abc, length: 3
Defg, length: 4
Hij, length: 3
Klmopr, length: 6
Stuvw, length: 5
If you need to use charAt() to walk over the input String, you might want to consider using something like this (I made some explanations in the comments to the code):
String s = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
ArrayList<String> tags = new ArrayList<>();
for(int i = 0; i < s.length(); i++) {
StringBuilder sb = new StringBuilder(); // Use StringBuilder and its append() method to build the String (it's more efficient than "+=")
if(s.charAt(i) == '{') { // If start of tag is found...
while(!(Character.isLetter(s.charAt(i)))) { // Skip characters that are not letters
i++;
}
while(Character.isLetter(s.charAt(i))) { // Append String with letters that are found
sb.append(s.charAt(i));
i++;
}
if(!(tags.contains(sb.toString()))) { // Add final String to ArrayList only if it not contained here yet
tags.add(sb.toString());
}
}
}
for(String tag : tags) { // Printing Strings contained in ArrayList and their length
System.out.println(tag + ", length: " + tag.length());
}
Output you get:
Student, length: 7
Subject, length: 7
Marks, length: 5
Yes, use a regular expression: find the pattern and apply it.

Better string manipulation code

I'm looking for an efficient (one line) string manipulation code to achieve this, regex probably.
I have a string, for example, "Calvin" and I need to convert this to "/C/a/l/Calvin".
i.e. take first three characters, separate them using '/' and later append the original string.
This is the code I've come up with and it's working fine; I'm just looking for a better one.
String first = StringUtils.substring(prodName, 0, 1);
String second = StringUtils.substring(prodName, 1, 2);
String third = StringUtils.substring(prodName, 2, 3);
String prodPath = path + "/" + first + "/" + second + "/" + third + "/" + prodName + "/" ;
prodName.replaceAll("^(.)(.)(.).*", "/$1/$2/$3/$0")
What is the point of StringUtils.substring(prodName, 0, 1) when the built-in prodName.substring(0, 1) will do the same thing??
Anyway, assuming prodName is always at least 3 characters long (since you didn't give rules for expected output if it is not), this is the fastest way to do it:
String prodPath = path + '/' +
prodName.charAt(0) + '/' +
prodName.charAt(1) + '/' +
prodName.charAt(2) + '/' +
prodName + '/';
Normally, char + char is integer addition, not string concatenation, but since the first value is a String, and the + operator is left-associative, all + operators are string concatenations, not numeric additions.
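The left-associativity point is easy to check in isolation. In this small sketch (the class and method names are made up for the demonstration), the only difference between the two methods is whether a String comes first:

```java
class CharConcatDemo {
    // A String comes first, so every + is string concatenation.
    static String stringFirst() {
        return "" + 'a' + 'b';
    }

    // Two chars come first, so 'a' + 'b' is int addition (97 + 98) before the String joins in.
    static String charsFirst() {
        return 'a' + 'b' + "";
    }

    public static void main(String[] args) {
        System.out.println(stringFirst()); // ab
        System.out.println(charsFirst());  // 195
    }
}
```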
How about using String.charAt
StringBuilder b = new StringBuilder (path);
b.append ('/').append (prodName.charAt (0))
.append ('/').append(prodName.charAt (1))
.append ('/').append(prodName.charAt (2))
.append ('/').append (prodName).append ('/');
Don't use regex for simple stuff like this. You may save a couple of lines, but you lose a lot in readability. Regexes usually take some time to understand when reading them.
String s = path + "/";
for (int i = 0; i < 3; i++)
    s += prodName.substring(i, i + 1) + "/";
s += prodName + "/";
You can use MessageFormat.format()
MessageFormat.format("{0}/{1}/{2}/{3}/{4}/", baseDir, name.charAt(0), name.charAt(1), name.charAt(2), name);
IMHO, I would wrap it in a method for readability:
private String getProductionDirectoryPath(String baseDir, String name) {
return MessageFormat.format("{0}/{1}/{2}/{3}/{4}/", baseDir, name.charAt(0), name.charAt(1), name.charAt(2), name);
}
Positive look ahead can be used
public static void main(String[] args) {
String s = "Calvin";
System.out.println(s.replaceAll("(?=^(\\w)(\\w)(\\w))", "/$1/$2/$3/"));
}
Output:
/C/a/l/Calvin
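No hand-written regex, just a simple split on the empty string =) (Note that on Java 7 and earlier, split("") returns an empty first element, so the indices below would shift by one.)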
String[] firstThree = prodName.split("");
String prodPath = path + "/" + firstThree[0] + "/" + firstThree[1] + "/" + firstThree[2] + "/" + prodName + "/";
Another approach is using charAt():
String prodPath = path + "/" + prodName.charAt(0) + "/" + prodName.charAt(1) + "/"+ prodName.charAt(2) + "/" + prodName + "/";
You said efficient but you maybe meant terse. I doubt either should be an objective, so you have a different problem.
Why do you care that this string transformation requires four lines of code? Are you concerned that something that in your mind is one operation ("create transformed string") is spread over four Java operations? You should extract the four lines of Java into their own method. Then, when you read the code where the operation is needed, you have one conceptual operation ("create transformed string") corresponding to one Java operation (call a method). You could call the method createTransformedString to make the code even clearer.
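A minimal sketch of such an extracted method might look like this. The names ProductPaths and createProductPath are hypothetical, and using fewer path segments for names shorter than three characters is an assumption, since the question gives no rule for that case.

```java
class ProductPaths {
    // Builds path + "/C/a/l/Calvin/" style paths from the first (up to) three characters.
    static String createProductPath(String path, String prodName) {
        StringBuilder sb = new StringBuilder(path);
        // Assumption: names shorter than 3 characters simply get fewer segments.
        int prefixLength = Math.min(3, prodName.length());
        for (int i = 0; i < prefixLength; i++) {
            sb.append('/').append(prodName.charAt(i));
        }
        return sb.append('/').append(prodName).append('/').toString();
    }

    public static void main(String[] args) {
        System.out.println(createProductPath("/products", "Calvin"));
    }
}
```

The call site then reads as a single operation, which is the point of the suggestion above.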
You can use a StringBuilder:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 3; i++) {
sb.append("/").append(prodName.charAt(i));
}
sb.append('/').append(prodName);
Or you can put all the code in loop:
int size = 2;
StringBuilder sb = new StringBuilder();
for (int i = 0; i <= size; i++) {
if (i == 0)
sb.append('/');
sb.append(prodName.charAt(i)).append("/");
if (i == size)
sb.append(prodName);
}

Java Split method strings into method name and argument

I am writing a small programming language for a game I am making. The language will let users define their own spells for the wizard entity outside the internal game code. I have the language written down, but I'm not entirely sure how to turn strings like
setSpellName("Fireball")
setSplashDamage(32,5)
into an array which would have the method name and the arguments after it, like
{"setSpellName","Fireball"}
{"setSplashDamage","32","5"}
How could I do this using Java's String.split or regular expressions?
Thanks in advance.
Since you're only interested in the function name and parameters, I'd suggest scanning up to the first ( for the name, and then from there to the last ) for the parameters, like so:
String input = "setSpellName(\"Fireball\")";
String functionName = input.substring(0, input.indexOf('('));
String[] params = input.substring(input.indexOf('(') + 1, input.lastIndexOf(')')).split(",");
To capture the String
setSpellName("Fireball")
do something like this (the parenthesis must be escaped, because split takes a regex):
String[] line = argument.split("\\(");
This gets you setSpellName at line[0] and "Fireball") at line[1].
Get rid of the closing parenthesis like this:
line[1] = line[1].replace(")", "").trim();
Build your JSON with the two "cleaned" Strings.
There's probably a better way with regex, but this is the quick and dirty way.
With String.indexOf() and String.substring(), you can parse out the function and parameters. Once you have parsed them out, put quotes around each of them. Then combine them all back together, delimited by commas and wrapped in curly braces.
public static void main(String[] args) throws Exception {
List<String> commands = Arrays.asList(
        "setSpellName(\"Fireball\")",
        "setSplashDamage(32,5)");
for (String command : commands) {
int openParen = command.indexOf("(");
String function = String.format("\"%s\"", command.substring(0, openParen));
String[] parameters = command.substring(openParen + 1, command.indexOf(")")).split(",");
for (int i = 0; i < parameters.length; i++) {
// Surround parameter with double quotes
if (!parameters[i].startsWith("\"")) {
parameters[i] = String.format("\"%s\"", parameters[i]);
}
}
String combine = String.format("{%s,%s}", function, String.join(",", parameters));
System.out.println(combine);
}
}
Results:
{"setSpellName","Fireball"}
{"setSplashDamage","32","5"}
This is a solution using regex, use this Regex "([\\w]+)\\(\"?([\\w]+)\"?\\)":
String input = "setSpellName(\"Fireball\")";
String pattern = "([\\w]+)\\(\"?([\\w]+)\"?\\)";
Pattern r = Pattern.compile(pattern);
String[] matches;
Matcher m = r.matcher(input);
if (m.find()) {
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
String[] params = m.group(2).split(",");
if (params.length > 1) {
matches = new String[params.length + 1];
matches[0] = m.group(1);
System.out.println(params.length);
for (int i = 0; i < params.length; i++) {
matches[i + 1] = params[i];
}
System.out.println(String.join(" :: ", matches));
} else {
matches = new String[2];
matches[0] = m.group(1);
matches[1] = m.group(2);
System.out.println(String.join(", ", matches));
}
}
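([\\w]+) is the first group, which captures the function name.
\\(\"?([\\w]+)\"?\\) contains the second group, which captures the parameter. Note that [\\w]+ does not match commas, so as written the pattern matches single-argument calls such as setSpellName("Fireball") but not setSplashDamage(32,5).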
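For commands with any number of arguments, one option is to capture everything between the outer parentheses and split it afterwards. The following is a sketch under stated assumptions: SpellCommandParser and the CALL pattern are hypothetical names, and string arguments are assumed to contain no commas or escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpellCommandParser {
    // Group 1: method name. Group 2: raw argument list between the outer parentheses.
    private static final Pattern CALL =
            Pattern.compile("^\\s*(\\w+)\\s*\\((.*)\\)\\s*$");

    // Returns {methodName, arg1, arg2, ...} with surrounding quotes stripped from arguments.
    static String[] parse(String command) {
        Matcher m = CALL.matcher(command);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a method call: " + command);
        }
        List<String> parts = new ArrayList<String>();
        parts.add(m.group(1));
        String args = m.group(2).trim();
        if (!args.isEmpty()) {
            // Assumption: arguments never contain commas themselves.
            for (String arg : args.split(",")) {
                parts.add(arg.trim().replaceAll("^\"|\"$", ""));
            }
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", parse("setSplashDamage(32,5)")));
    }
}
```

Quoted arguments that may contain commas would need a small tokenizer rather than split(",").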
