Extracting information from a text file in Java


I'm writing a program where I need to read a text file and extract some specific strings. The text is written in the DOT language, and this is an example of the file:
digraph G {
node [shape=circle];
0 [xlabel="[]"];
1 [xlabel="[[Text]]"];
0 -> 1 [label="a"];//this
1 -> 2 [label="ab"];//this
1 -> 3 [label="123"];//this
}
I want to ignore everything except the lines that have the structure of the lines marked with //this.
Then I want to split each such line into three parts, e.g.:
1 -> 2 [label="ab"];
saved as a list (or array) of strings:
[1,2,ab]
I tried a lot with regex but couldn't get the expected results.

Here is the regex you can use:
(?m)^(\d+)\s+->\s+(\d+)\s+\[\w+="([^"]*)"];\s*//[^/\n]*$
See regex demo.
All the necessary details are held in Group 1, 2 and 3.
See Java code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "digraph G {\nnode [shape=circle];\n0 [xlabel=\"[]\"];\n1 [xlabel=\"[[Text]]\"];\n0 -> 1 [label=\"a\"];//this\n1 -> 2 [label=\"ab\"];//this\n1 -> 3 [label=\"123\"];//this\n}";
// Groups 1 and 2 capture the two node numbers, group 3 captures the label text
Pattern ptrn = Pattern.compile("(?m)^(\\d+)\\s+->\\s+(\\d+)\\s+\\[\\w+=\"([^\"]*)\"\\];\\s*//[^/\n]*$");
Matcher m = ptrn.matcher(str);
List<String[]> results = new ArrayList<>();
while (m.find()) {
results.add(new String[]{m.group(1), m.group(2), m.group(3)});
}
for (String[] result : results) { // Display results
System.out.println(Arrays.toString(result));
}
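The pattern above only matches lines that end in a // comment. If the //this markers are just annotations in the question and the real file does not contain them, a variant with an optional comment may be closer to what is needed. The following is a minimal sketch under that assumption; the class name DotEdgeParser and the method parseEdges are made up for illustration, and the lines could come from something like Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotEdgeParser {
    // Edge line: <from> -> <to> [label="<text>"]; with an optional trailing // comment.
    // Assumption: node ids are plain integers and the only attribute is label.
    private static final Pattern EDGE = Pattern.compile(
            "\\s*(\\d+)\\s*->\\s*(\\d+)\\s*\\[label=\"([^\"]*)\"\\];\\s*(?://.*)?");

    // Returns one {from, to, label} array per edge line; every other line is skipped.
    static List<String[]> parseEdges(List<String> lines) {
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            Matcher m = EDGE.matcher(line);
            if (m.matches()) { // the whole line must have the edge structure
                edges.add(new String[] {m.group(1), m.group(2), m.group(3)});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "digraph G {",
                "node [shape=circle];",
                "0 [xlabel=\"[]\"];",
                "0 -> 1 [label=\"a\"];",
                "1 -> 2 [label=\"ab\"];//this",
                "}");
        for (String[] edge : parseEdges(lines)) {
            System.out.println(Arrays.toString(edge));
        }
        // prints [0, 1, a] and then [1, 2, ab]
    }
}
```

Matching line by line with matches() avoids the (?m) flag and makes it obvious that node declarations such as 0 [xlabel="[]"]; are skipped.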

If you are guaranteed that the line will always be in the format a -> b [label="someLabel"]; then I guess you can use a bunch of splits to get what you need:
if (outputLine.contains("[label=")) {
String[] split1 = outputLine.split("->");
String first = split1[0].replace(" ", ""); // value of 1
String[] split2 = split1[1].split("\\[label=\"");
String second = split2[0].replace(" ", ""); // value of 2
String label = split2[1].replace("\"", "").replace(" ", "").replace("]", "").replace(";", ""); // just the label
String[] finalArray = {first, second, label};
System.out.println(Arrays.toString(finalArray)); // [1, 2, ab]
}
Seems clunky. Probably a better way to do this.

Related

(hello-> h3o) How to replace in a String the middle letters for the number of letters replaced

I need to build a method which receives a String, e.g. "elephant-rides are really fun!", and returns another similar String. In this example the return value should be "e6t-r3s are r4y fun!" (because e-lephan-t has 6 middle letters, r-ide-s has 3 middle letters, and so on).
To get that result I need to replace the middle letters of each word with the number of letters replaced, leaving unchanged everything that isn't a letter as well as the first and last letter of every word.
So far I've tried using a regex to split the received string into words and saving these words in an array of strings. I also have an array of int in which I save the number of middle letters, but I don't know how to join both arrays and the symbols into the correct String to return.
String string="elephant-rides are really fun!";
String[] parts = string.split("[^a-zA-Z]");
int[] sizes = new int[parts.length];
int index=0;
for(String aux: parts)
{
sizes[index]= aux.length()-2;
System.out.println( sizes[index]);
index++;
}

You may use
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(1) + m.group(2).length() + m.group(3));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb);
// => e6t-r3s are r4y fun!
See the Java demo
Here, (?U)(\\w)(\\w{2,})(\\w) matches any Unicode word char capturing it into Group 1, then captures any 2 or more word chars into Group 2 and then captures a single word char into Group 3, and inside the .appendReplacement method, the second group contents are "converted" into its length.
Java 9+:
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
String result = m.replaceAll(x -> x.group(1) + x.group(2).length() + x.group(3));
System.out.println( result );
// => e6t-r3s are r4y fun!
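Note that \w also matches digits and underscores, while the question talks about letters only. A minimal sketch of a letters-only variant, assuming \p{L} (any Unicode letter) is the intended definition of "letter"; the class and method names are hypothetical. It uses StringBuffer so that it also compiles on Java versions before 9.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleLetterCounter {
    // First letter, two or more middle letters, last letter.
    private static final Pattern WORD = Pattern.compile("(\\p{L})(\\p{L}{2,})(\\p{L})");

    // Replaces the middle letters of every run of 4+ letters with their count.
    static String abbreviate(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Groups 1 and 3 are single letters, so the replacement needs no escaping.
            m.appendReplacement(sb, m.group(1) + m.group(2).length() + m.group(3));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("elephant-rides are really fun!"));
        // prints e6t-r3s are r4y fun!
        System.out.println(abbreviate("x1234y hello"));
        // prints x1234y h3o (digits are not counted as letters)
    }
}
```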

For the instructions you gave us, this would be sufficient:
String [] result = string.split("[\\s-]");
for (int i=0; i<result.length; i++){
result[i] = "" + result[i].charAt(0) + ((result[i].length())-2) + result[i].charAt(result[i].length()-1);
}
With your input, it creates the array [ "e6t", "r3s", "a1e", "r4y", "f2!" ]
It also runs without errors on one- and two-letter words, but it gives results such as:
Input: I am a small; Output: [ "I-1I", "a0m", "a-1a", "s3l" ]
Again, for the instructions you gave us this would be legal.
Hope I helped!

Extract words between double quotes based on position

I have a single string that contains several quotes, i.e:
"Bruce Wayne" "43" "male" "Gotham"
I want to create a method using regex that extracts certain values from the String based on their position.
So for example, if I pass the Int values 1 and 3 it should return a String of:
"Bruce Wayne" "male"
Please note the double quotes are part of the String and are escaped characters (\")

If the number of (possible) groups is known, you could use a regular expression like "(.*?)"\s*"(.*?)"\s*"(.*?)"\s*"(.*?)" along with Pattern and Matcher and access the groups by number (group 0 is always the entire match, group 1 is the first capturing group in the expression, and so on).
If the number of groups is not known, you could just use the expression "(.*?)" and call Matcher#find() to apply the expression in a loop, collecting all the matches (group 0 in that case) into a list. Then use your indices to access the list elements (element 1 would be at index 0 then).
Another alternative would be to use string.replaceAll("^[^\"]*\"|\"[^\"]*$","").split("\"\\s*\""), i.e. remove the leading and trailing double quotes with any text before or after and then split on quotes with optional whitespace in between.
Example:
assume the string optional crap before "Bruce Wayne" "43" "male" "Gotham" optional crap after
string.replaceAll("^[^\"]*\"|\"[^\"]*$","") will result in Bruce Wayne" "43" "male" "Gotham
applying split("\"\\s*\"") on the result of the step before will yield the array [Bruce Wayne, 43, male, Gotham]
then just access the array elements by index (zero-based)
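The loop-based approach with Matcher#find() described above could look roughly like this. It is a sketch, not a definitive implementation: the names QuotedValues, extractAll and pick are invented for illustration, positions are 1-based as in the question, and the selected values are re-quoted and joined with a single space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValues {
    // Lazily matches the shortest text between two double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Collects the content of every quoted value, without the quotes.
    static List<String> extractAll(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    // Returns the values at the given 1-based positions, quoted and space-separated.
    static String pick(String text, int... positions) {
        List<String> values = extractAll(text);
        StringBuilder sb = new StringBuilder();
        for (int position : positions) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('"').append(values.get(position - 1)).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
        System.out.println(pick(input, 1, 3));
        // prints "Bruce Wayne" "male"
    }
}
```

A position outside the list makes values.get throw an IndexOutOfBoundsException; whether that is acceptable depends on the caller.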

My function starts at 0. You said that you want 1 and 3 but usually you start at 0 when working with arrays. So to get "Bruce Wayne" you'd ask for 0 not 1. (you could change that if you'd like though)
String[] getParts(String text, int... positions) {
String results[] = new String[positions.length];
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(text);
for(int i = 0, j = 0; m.find() && j < positions.length; i++) {
if(i != positions[j]) continue;
results[j] = m.group();
j++;
}
return results;
}
// Usage
public Test() {
String[] parts = getParts(" \"Bruce Wayne\" \"43\" \"male\" \"Gotham\" ", 0, 2);
System.out.println(Arrays.toString(parts));
// = ["Bruce Wayne", "male"]
}
The method accepts as many positions as you like; they must be passed in ascending order.
getParts(" \"a\" \"b\" \"c\" \"d\" ", 0, 2, 3); // = ["a", "c", "d"]
// or
getParts(" \"a\" \"b\" \"c\" \"d\" ", 3); // = ["d"]

The function to extract words based on position:
import java.util.ArrayList;
import java.util.regex.*;
public String getString(String input, int i, int j){
ArrayList <String> list = new ArrayList <String> ();
Matcher m = Pattern.compile("(\"[^\"]+\")").matcher(input);
while (m.find()) {
list.add(m.group(1));
}
return list.get(i - 1) + " " + list.get(j - 1);
}
Then the words can be extracted like:
String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
String res = getString(input, 1, 3);
System.out.println(res);
Output:
"Bruce Wayne" "male"

confused how .split() works in Java

I have this string which I am taking in from a text file.
"1 normal 1 [(o, 21) (o, 17) (t, 3)]"
I want to take in 1, normal, 1, o, 21, 17, t, 3 in a string array.
Scanner inFile = new Scanner(new File("input.txt"));
String input = inFile.nextLine();
String[] tokens = input.split(" |\\(|\\)|\\[\\(|\\, |\\]| \\(");
for(int i =0 ; i<tokens.length; ++i)
{
System.out.println(tokens[i]);
}
Output:
1
normal
1
o
21
o
17
t
3
Why are there spaces being stored in the array?

Those aren't spaces; they're empty strings. Your string is:
"1 normal 1 [(o, 21) (o, 17) (t, 3)]"
It's split in the following way according to your regexp:
Token = "1"
Delimiter = " "
Token = "normal"
Delimiter = " "
Token = "1"
Delimiter = " "
Token = "" <-- empty string
Delimiter = "[("
Token = "o"
... and so on
When two delimiters are adjacent, split treats the gap between them as an empty-string token.
To fix this, you can change your regexp, for example, to this:
"[ \\(\\)\\[\\,\\]]+"
This way, any run of adjacent characters from " ()[,]" is treated as a single delimiter.
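As a quick check, here is a small sketch that applies the character-class pattern to the line from the question. The class name DelimiterSplit and the method tokenize are only illustrative.

```java
import java.util.Arrays;

class DelimiterSplit {
    // Any run of spaces, parentheses, square brackets and commas is one delimiter.
    static String[] tokenize(String input) {
        return input.split("[ \\(\\)\\[\\,\\]]+");
    }

    public static void main(String[] args) {
        String input = "1 normal 1 [(o, 21) (o, 17) (t, 3)]";
        System.out.println(Arrays.toString(tokenize(input)));
        // prints [1, normal, 1, o, 21, o, 17, t, 3]
    }
}
```

One caveat: if the input starts with a delimiter, for example "(a, b)", split still returns an empty string as the first element, so such input would need an extra check.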

For example, here:
1 [(o
In the first step the regex matches a single space.
In the next step it matches [(
Nothing lies between these two matches, so an empty String "" is returned.
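To make the empty strings visible, the tokens produced by the regex from the question can be printed with brackets around them. This is only a demonstration sketch; EmptyTokenDemo and splitWithOriginalRegex are hypothetical names.

```java
import java.util.Arrays;

class EmptyTokenDemo {
    // The alternation from the question: each alternative matches one delimiter at a time.
    static String[] splitWithOriginalRegex(String input) {
        return input.split(" |\\(|\\)|\\[\\(|\\, |\\]| \\(");
    }

    public static void main(String[] args) {
        String[] tokens = splitWithOriginalRegex("1 normal 1 [(o, 21) (o, 17) (t, 3)]");
        for (String token : tokens) {
            System.out.println("[" + token + "]"); // an empty token shows up as []
        }
        System.out.println(Arrays.toString(tokens));
        // prints [1, normal, 1, , o, 21, , , o, 17, , , t, 3]
    }
}
```

Each ") (" sequence produces two empty tokens, because ")", " " and "(" are matched as three separate delimiters. The empty strings at the very end are dropped because split removes trailing empty strings by default.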

How can I parse a "key1:value1, value, key2:value3" string into ArrayLists?

I have a string
String line = "abc:xyz uvw, def:ghi, mno:rst, ijk:efg, abc opq";
I want to parse this string into two lists:
ArrayList<String> tags;
ArrayList<String> values;
where the tags are the words before the colon (in my example: abc, def, mno and ijk). That is, I want
tags = Arrays.asList("abc", "def", "mno", "ijk");
values = Arrays.asList("xyz uvw", "ghi", "rst", "efg, abc opq");
Note that the values can have spaces and commas in them and are not just one word.

Since your values can contain commas, you need to split only where a new key starts.
A key is defined as a word immediately followed by a :.
So your split pattern will be ", (?=[a-zA-Z]+:)"
This matches a comma and a space only when they are followed by one or more letters and a colon. The letters and the colon sit inside a lookahead, so they are checked but not consumed, which leaves the key intact.
The split gives you an array of key:value pairs.
Then you can split each pair on the first : to get the key and the value.
List<String> keysArray = new ArrayList<>();
List<String> valuesArray = new ArrayList<>();
String str = "abc:xyz uvw, def:ghi, mno:rst, ijk:efg, abc opq";
String[] keyValuePairs = str.split(", (?=[a-zA-Z]+:)");
for (String keyValuePair : keyValuePairs) {
String[] keyvalue = keyValuePair.split(":", 2); // limit 2: split only on the first colon
keysArray.add(keyvalue[0]);
valuesArray.add(keyvalue[1]);
}

I would go with a regex. I am not sure how to do this in Java, but in Python that would be:
(\w+):([ ,\w]+)(,|$)
Tested on pythex with input abc:xy z uvw, def:g,hi, mno:rst. The result is:
Match 1
1. abc
2. xy z uvw
3. ,
Match 2
1. def
2. g,hi
3. ,
Match 3
1. mno
2. rst
3. Empty
So for each match you get the key in position 1 and the value in 2. The separator is stored in 3
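The same expression should carry over to Java more or less unchanged, with the backslashes doubled inside the string literal. A minimal sketch, assuming keys consist of word characters and values contain only word characters, spaces and commas; TagValueRegex and parse are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagValueRegex {
    // Group 1: key, group 2: value, group 3: the separator (a comma or end of input).
    private static final Pattern PAIR = Pattern.compile("(\\w+):([ ,\\w]+)(,|$)");

    // Appends every key to tags and every value to values, in order of appearance.
    static void parse(String line, List<String> tags, List<String> values) {
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            tags.add(m.group(1));
            values.add(m.group(2).trim());
        }
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<>();
        List<String> values = new ArrayList<>();
        parse("abc:xyz uvw, def:ghi, mno:rst, ijk:efg, abc opq", tags, values);
        System.out.println(tags);   // prints [abc, def, mno, ijk]
        System.out.println(values); // prints [xyz uvw, ghi, rst, efg, abc opq]
    }
}
```

The value group is greedy, so it first runs into the next key and then backtracks to the last comma before it; that backtracking is what keeps "efg, abc opq" together as one value.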

First obtain your string from the file
List<String> tags = new ArrayList<String>();
List<String> values = new ArrayList<String>();
String lineThatWasRead = ...
Then we split it by commas to obtain the pairs, and split each pair by :
List<String> separatedStringList = Arrays.asList(lineThatWasRead.split(","));
for (String str : separatedStringList)
{
//Since values may also contain commas, we have to check whether
//the current string is a new tag:value pair or a continuation of the previous value
if (!str.contains(":"))
{
if (values.size() > 0)
{
//Put back the comma that split() removed
values.set(values.size() - 1, values.get(values.size() - 1) + "," + str);
}
continue; //Go to the next str since this one has no new key
}
String[] keyValArray = str.split(":", 2); //Split only on the first colon
String key = keyValArray[0].trim();
String val = keyValArray[1].trim();
tags.add(key);
values.add(val);
}
Note that you are not forced to use a list but I just like the flexibility they give. Some might argue String[] would perform better for the first split by ,.

You get your line as string.
//your code here
String line = //your code here
String[] stuff = line.split(":"); // this will split your line at the ":" symbol
String key = stuff[0]; // this is your key
String value = stuff[1]; // this is your value

How to parse string with Java?

I am trying to make a simple calculator application that would take a string like this
5 + 4 + 3 - 2 - 10 + 15
I need Java to parse this string into an array
{5, +4, +3, -2, -10, +15}
Assume the user may enter 0 or more spaces between each number and each operator
I'm new to Java so I'm not entirely sure how to accomplish this.

You can use Integer.parseInt to get the values, and you can split the string with the methods of the String class. A regex could work, but I don't know how to write those :3

Take a look at String.split():
String str = "1 + 2";
System.out.println(java.util.Arrays.toString(str.split(" ")));
[1, +, 2]
Note that split takes a regular expression, so to split on "." or another character with a special meaning you have to escape (quote) it. Also, multiple spaces in a row will create empty strings in the resulting array, which you would need to skip.
This solves the simple example. For more rigorous parsing of true expressions you would want to create a grammar and use something like Antlr.
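Both caveats can be seen in a short sketch. The helper names splitOnWhitespace and splitOnLiteral are hypothetical; Pattern.quote is the standard way to turn an arbitrary string into a literal pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitPitfalls {
    // Treats any run of whitespace as one delimiter, so no empty strings appear in between.
    static String[] splitOnWhitespace(String s) {
        return s.trim().split("\\s+");
    }

    // Splits on the delimiter taken literally, even if it is a regex metacharacter.
    static String[] splitOnLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String spaced = "1  +  2"; // two spaces around the operator
        System.out.println(Arrays.toString(spaced.split(" ")));           // prints [1, , +, , 2]
        System.out.println(Arrays.toString(splitOnWhitespace(spaced)));   // prints [1, +, 2]

        String dotted = "a.b.c";
        System.out.println(dotted.split(".").length);                     // prints 0, since "." matches every character
        System.out.println(Arrays.toString(splitOnLiteral(dotted, "."))); // prints [a, b, c]
    }
}
```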

Let str be your line buffer.
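Match the pattern ([-+]?[ \t]*[0-9]+) against it repeatedly (in Java, with Pattern and Matcher.find()).
Accumulate all matches into String[] tokens.
Then, for each token in tokens, remove the spaces between the sign and the number: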
String s[] = tokens[i].split(" +");
if (s.length > 1)
tokens[i] = s[0] + s[1];
else
tokens[i] = s[0];
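Put together in Java, the steps above might look like the following sketch. It uses Pattern and Matcher.find() for the matching step and strips the inner whitespace directly instead of splitting each token; SignedTokenizer and tokenize are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedTokenizer {
    // Optional sign, optional spaces or tabs, then one or more digits.
    private static final Pattern TOKEN = Pattern.compile("[-+]?[ \\t]*[0-9]+");

    // Returns each number with its sign attached, e.g. "+ 4" becomes "+4".
    static List<String> tokenize(String str) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(str);
        while (m.find()) {
            tokens.add(m.group().replaceAll("[ \\t]+", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("5 + 4 + 3 - 2 - 10 + 15"));
        // prints [5, +4, +3, -2, -10, +15]
        System.out.println(tokenize("5+4 -  3"));
        // prints [5, +4, -3]
    }
}
```

Because the pattern allows zero or more blanks after the sign, this also covers input where the user leaves out the spaces entirely.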

You can use positive lookbehind:
String s = "5 + 4 + 3 - 2 - 10 + 15";
Pattern p = Pattern.compile("(?<=[0-9]) +");
String[] result = p.split(s);
for(String ss : result)
System.out.println(ss.replaceAll(" ", ""));

String cal = "5 + 4 + 3 - 2 - 10 + 15";
//matches combinations of '+' or '-', whitespace, number
Pattern pat = Pattern.compile("[-+]{1}\\s*\\d+");
Matcher mat = pat.matcher(cal);
List<String> ops = new ArrayList<String>();
while(mat.find())
{
ops.add(mat.group());
}
//gets first number and puts in beginning of List
ops.add(0, cal.substring(0, cal.indexOf(" ")));
for(int i = 0; i < ops.size(); i++)
{
//remove whitespace
ops.set(i, ops.get(i).replaceAll("\\s*", ""));
}
System.out.println(Arrays.toString(ops.toArray()));
//[5, +4, +3, -2, -10, +15]

Based on the input from some of the answers here, I found this to be the best solution:
// input
String s = "5 + 4 + 3 - 2 - 10 + 15";
ArrayList<Integer> numbers = new ArrayList<Integer>();
// remove whitespace
s = s.replaceAll("\\s+", "");
// parse string
Pattern pattern = Pattern.compile("[-]?\\d+");
Matcher matcher = pattern.matcher(s);
// add numbers to array
while (matcher.find()) {
numbers.add(Integer.parseInt(matcher.group()));
}
// numbers
// {5, 4, 3, -2, -10, 15}
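Since the goal is a simple calculator, the signed numbers can then be added up. This is only a sketch under the assumption that the expression contains nothing but integers joined by + and -; SignedSum and evaluate are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedSum {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+");

    // Sums all integers in the expression; a leading '-' makes a number negative.
    static int evaluate(String expression) {
        String compact = expression.replaceAll("\\s+", ""); // remove whitespace first
        Matcher matcher = NUMBER.matcher(compact);
        int total = 0;
        while (matcher.find()) {
            total += Integer.parseInt(matcher.group());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("5 + 4 + 3 - 2 - 10 + 15"));
        // prints 15
    }
}
```

Other operators or parentheses would need a real expression parser, so this only fits the addition and subtraction case from the question.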
