Simple delimiter parser - java

I'm trying out PetitParser to parse a simple comma-delimited list of integers, for example "1, 2, 3, 4".
I tried creating an integer parser and then using the delimitedBy method.
Parser integerParser = digit().plus().flatten().trim().map((String value) -> Integer.parseInt(value));
Parser listParser = integerParser.delimitedBy(of(','));
List<Integer> list = listParser.parse(input).get();
This returns a list with the parsed integers but also the delimiters.
For example: [1, ,, 2, ,, 3, ,, 4]
Is there a way to exclude the delimiters from the result?

Yes, there is:
Parser listParser = integerParser
.delimitedBy(of(','))
.map(withoutSeparators());
To get withoutSeparators(), add import static org.petitparser.utils.Functions.withoutSeparators;.
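To see what dropping the separators amounts to, here is a plain-Java sketch that does not use PetitParser at all. It assumes, based on the output shown in the question, that the parse result alternates elements and separators. The names SeparatorDemo and dropSeparators are made up for illustration; this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SeparatorDemo {
    // Keep the items at even positions (0, 2, 4, ...) and skip the separators in between.
    public static List<Object> dropSeparators(List<?> alternating) {
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < alternating.size(); i += 2) {
            result.add(alternating.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shape of the result shown in the question: element, separator, element, ...
        List<Object> parsed = Arrays.asList(1, ',', 2, ',', 3, ',', 4);
        System.out.println(dropSeparators(parsed)); // [1, 2, 3, 4]
    }
}
```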

Here is an example of how to do it without any external library:
public static void main(String args[]) {
// source String
String delimitedNumbers = "1, 2, 3, 4, 5, 6, 7, 8, 9, 10";
// split this String by its delimiter, a comma here
String[] delimitedNumbersSplit = delimitedNumbers.split(",");
// provide a data structure that holds numbers (integers) only
List<Integer> numberList = new ArrayList<>();
// for each part of the split list
for (String num : delimitedNumbersSplit) {
// remove all whitespaces and parse the number
int n = Integer.parseInt(num.trim());
// add the number to the list of numbers
numberList.add(n);
}
// then create the representation you want to print
StringBuilder sb = new StringBuilder();
// [Java 8] concatenate the numbers to a String that delimits by whitespace
numberList.forEach(number -> sb.append(number).append(" "));
// then remove the trailing whitespace
String numbersWithoutCommas = sb.toString();
numbersWithoutCommas = numbersWithoutCommas.substring(0, numbersWithoutCommas.length() - 1);
// and print the result
System.out.println(numbersWithoutCommas);
}
Note that you don't need to trim() the parts of the split String if the list contains no whitespace.
If you do need the PetitParser library, you will have to look up how to use it in its docs.
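The same split, trim, and parse steps can also be written with Java 8 streams. This is only a sketch of an alternative style, not a change to the approach above; the class and method names are illustrative. Using Collectors.joining avoids having to cut off a trailing space afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitWithStreams {
    // Split on commas, trim each piece, and parse it as an int.
    public static List<Integer> parse(String delimited) {
        return Arrays.stream(delimited.split(","))
                .map(String::trim)
                .map(Integer::parseInt)
                .collect(Collectors.toList());
    }

    // Join the numbers with single spaces; there is no trailing space to remove.
    public static String joinWithSpaces(List<Integer> numbers) {
        return numbers.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<Integer> numbers = parse("1, 2, 3, 4, 5");
        System.out.println(numbers);                 // [1, 2, 3, 4, 5]
        System.out.println(joinWithSpaces(numbers)); // 1 2 3 4 5
    }
}
```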

With a bit more code I got the result without the delimiters:
Parser number = digit().plus().flatten().trim()
.map((String value) -> Integer.parseInt(value));
Parser next = of(',').seq(number).map((List<?> input) -> input.get(1)).star();
Parser parser = number.seq(next).map((List<List<?>> input) -> {
List<Object> result = new ArrayList<>();
result.add(input.get(0));
result.addAll(input.get(1));
return result;
});

Related

Extracting informations from Text file in java

I'm writing a program where I need to read a text file and extract some specific strings. The text is written in the DOT language, and this is an example of the file:
digraph G {
node [shape=circle];
0 [xlabel="[]"];
1 [xlabel="[[Text]]"];
0 -> 1 [label="a"];//this
1 -> 2 [label="ab"];//this
1 -> 3 [label="123"];//this
}
I want to ignore everything except the lines that have the structure of the commented lines (marked with //this),
then split each such line into three parts, i.e.:
1 -> 2 [label="ab"];
saved as a list of strings (or array ...):
[1,2,ab]
I tried a lot with regex, but I couldn't get the expected results.
Here is the regex you can use:
(?m)^(\d+)\s+->\s+(\d+)\s+\[\w+="([^"]*)"];\s*//[^/\n]*$
See regex demo.
All the necessary details are held in Group 1, 2 and 3.
See Java code:
String str = "digraph G {\nnode [shape=circle];\n0 [xlabel=\"[]\"];\n1 [xlabel=\"[[Text]]\"];\n0 -> 1 [label=\"a\"];//this\n1 -> 2 [label=\"ab\"];//this\n1 -> 3 [label=\"123\"];//this\n}";
Pattern ptrn = Pattern.compile("(?m)^(\\d+)\\s+->\\s+(\\d+)\\s+\\[\\w+=\"([^\"]*)\"\\];\\s*//[^/\n]*$");
Matcher m = ptrn.matcher(str);
ArrayList<String[]> results = new ArrayList<String[]>();
while (m.find()) {
results.add(new String[]{m.group(1), m.group(2), m.group(3)});
}
for(int i = 0; i < results.size(); i++) { // Display results
System.out.println(Arrays.toString(results.get(i)));
}
If you are guaranteed that every line will always be in the format a -> b [label="someLabel"]; then you can use a series of splits to get what you need:
if (outputLine.contains("[label=")) {
String[] split1 = outputLine.split("->");
String first = split1[0].replace(" ", ""); // value of 1
String[] split2 = split1[1].split("\\[label=\"");
String second = split2[0].replace(" ", ""); // value of 2
String label = split2[1].replace("\"", "").replace(" ", "").replace("]", "").replace(";", ""); // just the label
String[] finalArray = {first, second, label};
System.out.println(Arrays.toString(finalArray)); // [1, 2, ab]
}
Seems clunky. Probably a better way to do this.
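A less clunky line-by-line variant might look like the sketch below. It assumes that edge lines can be recognised by their a -> b [label="..."]; shape alone, whether or not a trailing comment follows, and that the file is processed one line at a time. The class and method names are illustrative. Unlike the split-based version, it leaves spaces inside the label untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EdgeLineParser {
    // source -> target [label="text"]; optionally followed by anything, e.g. a comment
    private static final Pattern EDGE =
            Pattern.compile("^\\s*(\\d+)\\s*->\\s*(\\d+)\\s*\\[label=\"([^\"]*)\"\\];.*$");

    // Returns {source, target, label} for an edge line, or null for any other line.
    public static String[] parseEdge(String line) {
        Matcher m = EDGE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] lines = {
            "digraph G {",
            "node [shape=circle];",
            "0 [xlabel=\"[]\"];",
            "1 -> 2 [label=\"ab\"];//this",
            "}"
        };
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            String[] edge = parseEdge(line);
            if (edge != null) {
                edges.add(edge);
            }
        }
        for (String[] edge : edges) {
            System.out.println(Arrays.toString(edge)); // [1, 2, ab]
        }
    }
}
```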

Java - Extract string from pattern

Given some strings that look like this:
(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)
(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)
(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)
(btw, they are solutions for a puzzle which I write a program for :) )
They all share this pattern
"(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"
For one solution: how can I get the values with this given pattern?
So for the first solution I would get a collection, list, or array (it doesn't matter which) like this:
[4,5,1,7,3,6,9,2,8]
You've done most of the work actually by providing the pattern. All you need to do is use capturing groups where the . are (and escape the rest).
I put your inputs in a String array and got the results into a List of integers (as you said, you can change it to something else). As for the pattern, you want to capture the dots; this is done by surrounding them with ( and ). The problem in your case is that the whole string is already full of parentheses, so we need to quote (escape) them, meaning we tell the regex compiler that we mean the literal characters ( and ). This can be done by putting the part we want to escape between \Q and \E.
The code below shows a coherent (though maybe not efficient) way to do this. Just be careful to use the right number of \ in the right places:
public class Example {
public static void main(String[] args) {
String[] inputs = new String[3];
inputs[0] = "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)";
inputs[1] = "(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)";
inputs[2] = "(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)";
List<Integer> results;
String pattern = "(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"; // Copy-paste from your question.
pattern = pattern.replaceAll("\\.", "\\\\E(.)\\\\Q");
pattern = "\\Q" + pattern;
Pattern p = Pattern.compile(pattern);
Matcher m;
for (String input : inputs) {
m = p.matcher(input);
results = new ArrayList<>();
if (m.matches()) {
for (int i = 1; i < m.groupCount() + 1; i++) {
results.add(Integer.parseInt(m.group(i)));
}
}
System.out.println(results);
}
}
}
Output:
[4, 5, 1, 7, 3, 6, 9, 2, 8]
[4, 6, 1, 5, 2, 7, 8, 3, 9]
[4, 6, 1, 7, 2, 8, 5, 3, 9]
Notes:
You are using a single ., which means
Any character (may or may not match line terminators)
So if you have a number there which is not a single digit or a single character which is not a number (digit), something will go wrong either in the matches or parseInt. Consider \\d to signify a single digit or \\d+ for a number instead.
See Pattern for more info on regex in Java.
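As a sketch of the \\d+ suggestion, the pattern can also be built by splitting the template at each dot, quoting the literal pieces with Pattern.quote, and joining them with a (\\d+) group. This is one possible variant rather than the code above; the class and method names are illustrative, and it assumes every placeholder stands for a non-negative integer that fits in an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitTemplate {
    // Turn a template with '.' placeholders into a regex with one (\d+) group per placeholder.
    public static Pattern compile(String template) {
        String[] literalParts = template.split("\\.", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literalParts.length; i++) {
            if (i > 0) {
                regex.append("(\\d+)");
            }
            regex.append(Pattern.quote(literalParts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the captured numbers, or an empty list if the input does not fit the template.
    public static List<Integer> extract(Pattern pattern, String input) {
        List<Integer> values = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(Integer.parseInt(m.group(i)));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern p = compile("(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)");
        String input = "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)";
        System.out.println(extract(p, input)); // [4, 5, 1, 7, 3, 6, 9, 2, 8]
    }
}
```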

Changing the format of array

Is there a method you can use to change the format of an array (as in the way it is separated by commas and enclosed in brackets when it's printed out)? I want to get rid of the brackets and commas so that the terms are only separated by spaces.
Then don't use the Arrays.toString() method, write your own:
StringBuilder sb = new StringBuilder();
for (xxxx val : array) {
sb.append(val).append(" ");
}
sb.setLength(sb.length() - 1);
return sb.toString();
Yes, use Guava's Joiner class for full control of your output.
E.g.,
Integer[] foo = new Integer[] {1, 2, 3};
System.out.println(Joiner.on(' ').join(foo));
gives 1 2 3
You can remove the , and the [ and ] from the toString() output:
String formattedString = myArrayList.toString()
.replace(",", "") //remove the commas (the space after each comma remains)
.replace("[", "") //remove the left bracket
.replace("]", ""); //remove the right bracket
Use org.apache.commons.lang.StringUtils.
E.g.:
String[] array = {"A","B"};
System.out.println( StringUtils.join(array, "_") );
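If neither Guava nor Commons Lang is on the classpath, the standard library (Java 8 and later) offers String.join and Collectors.joining. The following is a minimal sketch with an illustrative class name:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class JoinDemo {
    // Convert each int to a String and join the pieces with single spaces.
    public static String join(int[] values) {
        return Arrays.stream(values)
                .mapToObj(String::valueOf)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(join(new int[] {1, 2, 3}));       // 1 2 3
        // For values that are already Strings, String.join is enough.
        System.out.println(String.join(" ", "A", "B", "C")); // A B C
    }
}
```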

Reading and replacing Integers in a string

I have a string, for example "x(10, 9, 8)". I want to read each integer from the string, use it as an array index to retrieve a new integer from an array, and replace the original integer with that value.
All of the methods I've tried seem more suited to applying the same thing to all integers, or to just retrieving the integers and then losing track of them. Can anyone tell me the best way to do this, please?
Using regular expressions, you can "browse" through each number in your string, regardless of how they are separated, and replace them as required. For example, the code below prints x(101, 99, 88):
public static void main(String[] args) {
int[] array = {0, 1, 2, 3, 4, 5, 6, 7, 88, 99, 101};
String s = "x(10, 9, 8)";
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(s);
StringBuilder replace = new StringBuilder();
int start = 0;
while(m.find()) {
//append the non-digit part first
replace.append(s.substring(start, m.start()));
start = m.end();
//parse the number and append the number in the array at that index
int index = Integer.parseInt(m.group());
replace.append(array[index]);
}
//append the end of the string
replace.append(s.substring(start, s.length()));
System.out.println(replace);
}
Note: you should add some exception handling.
Parse the numbers of your string using Integer.parseInt(), String.split(","), and String.indexOf() (to locate the ( and the )). Create a List with them.
Iterate through this list and create a new List with the values from the array.
Iterate through the new List and create the response String.
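The three steps above can be sketched as follows. The sketch assumes the string contains exactly one parenthesised, comma-separated list and that every index is valid for the array; it does no error handling, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class IndexReplace {
    public static String replaceIndices(String s, int[] array) {
        // Step 1: locate the parentheses and parse the numbers between them.
        int open = s.indexOf('(');
        int close = s.indexOf(')', open);
        List<Integer> indices = new ArrayList<>();
        for (String part : s.substring(open + 1, close).split(",")) {
            indices.add(Integer.parseInt(part.trim()));
        }
        // Step 2: look up each index in the array.
        List<String> values = new ArrayList<>();
        for (int index : indices) {
            values.add(String.valueOf(array[index]));
        }
        // Step 3: rebuild the string around the replaced values.
        return s.substring(0, open + 1) + String.join(", ", values) + s.substring(close);
    }

    public static void main(String[] args) {
        int[] array = {0, 1, 2, 3, 4, 5, 6, 7, 88, 99, 101};
        System.out.println(replaceIndices("x(10, 9, 8)", array)); // x(101, 99, 88)
    }
}
```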

Inputting integers into an arraylist

If I have a line of integers in a text file in the following format:
[3, 3, 5, 0, 0]
how can I go about adding the integers into an arraylist?
I have this, but it isn't working:
while (input.hasNextInt())
{
int tempInt = input.nextInt();
rtemp.add(tempInt);
}
How do I deal with the commas and the brackets?
You can use replaceAll(String regex, String replacement) to remove the brackets and then use split() to break the string into an array, with ", " as your delimiter. This leaves you with smaller strings that each contain only one number, so use Integer.parseInt() to convert each string to an int before adding it to the ArrayList.
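A minimal sketch of that approach, assuming each line looks like [3, 3, 5, 0, 0]. It splits on a comma followed by optional whitespace rather than on the exact string ", ", which is a small deviation to tolerate inconsistent spacing; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class BracketListParser {
    public static List<Integer> parse(String line) {
        // Strip the square brackets, then split on a comma followed by optional whitespace.
        String[] parts = line.replaceAll("[\\[\\]]", "").trim().split(",\\s*");
        List<Integer> result = new ArrayList<>();
        for (String part : parts) {
            // An empty list "[]" leaves a single empty piece, which is skipped.
            if (!part.isEmpty()) {
                result.add(Integer.parseInt(part));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("[3, 3, 5, 0, 0]")); // [3, 3, 5, 0, 0]
    }
}
```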
You need to remember that the numbers in a file are strings, and need to be converted to actual integers first. Assuming that in the input file each line has the format described (example: [3, 3, 5, 0, 0]) this should work for adding all of the numbers to a single ArrayList, ignoring spaces, brackets and commas:
BufferedReader in = new BufferedReader(new FileReader("file.txt"));
List<Integer> ints = new ArrayList<Integer>();
String line = in.readLine();
while (line != null) {
String[] numbers = line.split("[\\[\\],\\s]+");
for (int i = 1; i < numbers.length; i++)
ints.add(Integer.parseInt(numbers[i]));
line = in.readLine();
}
in.close();
Untested, but this should work (note that this snippet is C#, not Java):
string s = "[1,2,3,4,5]";
ArrayList list = new ArrayList();
foreach (string st in s.Replace("[", "").Replace("]", "").Split(','))
{
list.Add(int.Parse(st));
}
You can read the entire string and use a regular expression or a StringTokenizer:
//you already know how to read those lines into Strings.
String s="[3, 3, 5, 0, 0]";
List<Integer> list = new ArrayList<>();
// the space is a delimiter too, so tokens such as " 3" do not break parseInt
StringTokenizer tokenizer = new StringTokenizer(s, "[,] ");
while(tokenizer.hasMoreTokens())
list.add(Integer.parseInt(tokenizer.nextToken()));
java.util.StringTokenizer splits the String on the specified delimiter characters ("[", ",", "]" and the space).
Refer http://docs.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
You can try something like this:
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] nums = line.substring(1,line.length() - 1).split(",");
for(String n:nums){
list.add(Integer.valueOf(n.trim())); // trim: the pieces keep the space after each comma
}
}
You could also use a regex for this purpose.
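A regex-based sketch might look like this. It simply collects every run of digits (with an optional leading minus sign) on the line, so brackets, commas, and spaces never need to be handled explicitly. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexIntExtractor {
    // An optional minus sign followed by one or more digits.
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    public static List<Integer> extract(String line) {
        List<Integer> result = new ArrayList<>();
        Matcher m = INTEGER.matcher(line);
        while (m.find()) {
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("[3, 3, 5, 0, 0]")); // [3, 3, 5, 0, 0]
    }
}
```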
