java use regular expression in String.split to prepare json argument - java

I have a CSV file which contains this type of document:
{""cast_id"": 10, ""character"": ""Mushu (voice)"", ""credit_id"": ""52fe43a09251416c75017cbb"", ""gender"": 2, ""id"": 776, ""name"": ""Eddie Murphy"", ""order"": 0}, {""cast_id"": 62, ""character"": ""[Singing voice]"", ""credit_id"": ""597a65c8925141233d0000bb"", ""gender"": 2, ""id"": 18897, ""name"": ""Jackie Chan"", ""order"": 1}, {""cast_id"": 16, ""character"": ""Mulan (voice)"", ""credit_id"": ""52fe43a09251416c75017cd5"", ""gender"": 1, ""id"": 21702, ""name"": ""Ming-Na Wen"", ""order"": 2}
I first used this regular expression to replace each doubled double quote ("") with a single double quote:
String newResult = result.replaceAll("\"{2}", "\"");
Then I used this regular expression to split the string:
String[] jsonResult = newResult.split(", (?![^{]*\\})");
However, it separates the string into this:
{"cast_id": 10, "character": "Mushu (voice)", "credit_id": "52fe43a09251416c75017cbb", "gender": 2, "id": 776, "name": "Eddie Murphy", "order": 0}
{"cast_id": 62
"character": "[Singing voice
something else then
{"cast_id": 16, "character": "Mulan (voice)", "credit_id": "52fe43a09251416c75017cd5", "gender": 1, "id": 21702, "name": "Ming-Na Wen", "order": 2}
So my regular expression fails when it meets square brackets []. Can I have some help with this?
I tried to use http://www.regexplanet.com/advanced/java/index.html but I don't understand what I should put in the option, replacement and input fields. How do I use this website?
Thanks

You are dealing with JSON data which has been saved as a one-column CSV file. :)
In CSV, quotes are escaped by doubling them, so you could just use a CSV library to read your file. As I said, you should expect to get just one column: one value containing your JSON. Then you use a JSON library to parse your JSON.
=> You would not need to implement any parsing at all.

You should be looking for the pattern }, {
The regex: (?<=\}), (?=\{) does just that. Your regex will give a false positive if a } is missing at the end of the string.
(Tested with https://regex101.com/)
After that you can parse each string as JSON; use a library for that.
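A minimal sketch of that split, assuming the doubled quotes have already been collapsed and that no string value itself contains the sequence "}, {". The class name SplitJsonObjects and the shortened sample input are made up for illustration:

```java
class SplitJsonObjects {
    // Splits on ", " only where it sits directly between '}' and '{'.
    // The lookbehind and lookahead keep both braces in the resulting pieces.
    static String[] splitObjects(String input) {
        return input.split("(?<=\\}), (?=\\{)");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the input from the question.
        String input = "{\"id\": 776, \"character\": \"Mushu (voice)\"}, "
                + "{\"id\": 18897, \"character\": \"[Singing voice]\"}, "
                + "{\"id\": 21702, \"character\": \"Mulan (voice)\"}";
        for (String part : splitObjects(input)) {
            System.out.println(part);
        }
    }
}
```

Commas inside an object are left alone because they are never preceded by } and followed by {. Nested objects would break this assumption, which is one reason a real JSON parser is the safer choice.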

As others recommended, a parser would be a better solution than splitting yourself. Regular expressions run into limitations when you get nested brackets, for example. I used Google's Gson library, and tweaking your input slightly produced the desired split. The important step was to turn your input into a JSON array, otherwise the parser would fail after the first element:
// Pre-processed your input to remove the double double quotes
String input = "{'cast_id': 10, 'character': 'Mushu (voice)', 'credit_id': '52fe43a09251416c75017cbb', 'gender': 2, 'id': 776, 'name': 'Eddie Murphy', 'order': 0}, {'cast_id': 62, 'character': '[Singing voice]', 'credit_id': '597a65c8925141233d0000bb', 'gender': 2, 'id': 18897, 'name': 'Jackie Chan', 'order': 1}, {'cast_id': 16, 'character': 'Mulan (voice)', 'credit_id': '52fe43a09251416c75017cd5', 'gender': 1, 'id': 21702, 'name': 'Ming-Na Wen', 'order': 2}";
JsonArray array = new JsonParser().parse("[" + input + "]").getAsJsonArray();
for (int i = 0; i < array.size(); i++)
{
System.out.println(array.get(i));
}
Output:
{"cast_id":10,"character":"Mushu (voice)","credit_id":"52fe43a09251416c75017cbb","gender":2,"id":776,"name":"Eddie Murphy","order":0}
{"cast_id":62,"character":"[Singing voice]","credit_id":"597a65c8925141233d0000bb","gender":2,"id":18897,"name":"Jackie Chan","order":1}
{"cast_id":16,"character":"Mulan (voice)","credit_id":"52fe43a09251416c75017cd5","gender":1,"id":21702,"name":"Ming-Na Wen","order":2}

Related

Read data as arrays from a Text File in Java

I have a text file with a bunch of arrays with a specific number I have to find in the array. The text file looks like this:
(8) {1, 4, 6, 8, 12, 22}
(50) {2, 5, 6, 7, 10, 11, 24, 50, 65}
(1) {1}
(33) {1, 2, 5, 6, 11, 12, 13, 21, 25, 26, 30, 33, 60, 88, 99}
(1) {1, 2, 3, 4, 8, 9, 100}
(1) {2, 3, 5, 6, 11, 12, 13, 21, 25, 26, 30, 33, 60, 88, 99}
where the number inside the parentheses is the number I have to find using binary search, and the rest is the actual array.
I do not know how I would get this array from the text file and be able to read it as an actual array.
[This is a question on a previous coding competition I took, and am going over the problems]
I already have a method to do the binary search, and I have used scanner to read the file like this:
Scanner sc = new Scanner(new File("search_race.dat"));
and used a while loop to loop through the file and read it.
But I am stuck on how to make Java know that the stuff in the curly braces is an array and the number in the parentheses is the value it must binary-search for in that array.
You could simply parse each line (the number to find and the array) as follows:
while (sc.hasNext()) {
int numberToFind = Integer.parseInt(sc.next("\\(\\d+\\)").replaceAll("[()]", ""));
int[] arrayToFindIn = Arrays.stream(sc.nextLine().split("[ ,{}]"))
.filter(x->!x.isEmpty())
.mapToInt(Integer::parseInt)
.toArray();
// Apply your binary search! Write it yourself or use a standard one like below:
// int positionInArray = Arrays.binarySearch(arrayToFindIn, numberToFind);
}
If you don't like the replaceAll, you could replace the first line in the loop with the two below:
String toFindGroup = sc.next("\\(\\d+\\)");
int numberToFind = Integer.parseInt(toFindGroup.substring(1, toFindGroup.length()-1));
Cheers!
TL;DR: You have to check character by character and see if it's a curly brace or a parenthesis or a digit
Long Answer:
First, create a POJO (let's call this AlgoContainer, but use whatever name you like) with the fields int numberToFind and ArrayList<Integer> listOfNumbers.
Then, read the file as @ManojBanik mentioned in the comments.
Now create an ArrayList<AlgoContainer> (its size should be the same as that of the ArrayList<String> obtained while reading the file line by line).
Then loop through the ArrayList<String> from the step above and perform the following operations:
1. Create and instantiate an AlgoContainer object (let's call this tempAlgoContainer).
2. Check if the first character is an open parenthesis -> yes? Create an empty temp String -> check if the next character is a digit -> yes? Append it to the temp String and repeat this until you find the closing parenthesis.
3. Found the closing parenthesis? Parse the temp String to int and set the numberToFind field of tempAlgoContainer to that number.
4. Next up is the curly bracket stuff: found an opening curly brace? Create a new empty temp String -> check if the next character is a digit -> yes? Append it to the temp String just like in step #2 until you find a comma or a closing curly brace.
5. Found a comma? Parse the temp String to int and then add it to the listOfNumbers (which is a field) of tempAlgoContainer -> make the temp String empty again.
6. Found a closing curly brace? Repeat the above step and break out of the loop. You are now ready to process whatever you want to do. Your data is ready.
Also, it's a good idea to have a member function or instance method of AlgoContainer (call it whatever you want) to perform the binary search so that you can simply loop through the ArrayList<AlgoContainer> and call that BS function on it (no-pun-intended)
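The steps above can be sketched roughly as follows. This is only one possible reading of the description: it assumes non-negative integers and one well-formed line per call, and the parse method is a hypothetical helper, not something taken from the question:

```java
import java.util.ArrayList;
import java.util.List;

// Container named after the suggestion above; the fields follow the description.
class AlgoContainer {
    int numberToFind;
    final List<Integer> listOfNumbers = new ArrayList<>();

    // Walks a line such as "(8) {1, 4, 6, 8, 12, 22}" one character at a time.
    static AlgoContainer parse(String line) {
        AlgoContainer container = new AlgoContainer();
        StringBuilder digits = new StringBuilder(); // the "temp String"
        boolean insideBraces = false;
        for (char ch : line.toCharArray()) {
            if (Character.isDigit(ch)) {
                digits.append(ch);
            } else if (ch == ')') {
                // End of the "(n)" part: the collected digits are the target.
                container.numberToFind = Integer.parseInt(digits.toString());
                digits.setLength(0);
            } else if (ch == '{') {
                insideBraces = true;
            } else if (insideBraces && (ch == ',' || ch == '}') && digits.length() > 0) {
                // A comma or the closing brace ends the current array element.
                container.listOfNumbers.add(Integer.parseInt(digits.toString()));
                digits.setLength(0);
            }
        }
        return container;
    }

    public static void main(String[] args) {
        AlgoContainer c = parse("(8) {1, 4, 6, 8, 12, 22}");
        System.out.println(c.numberToFind + " -> " + c.listOfNumbers);
    }
}
```

Spaces and the opening parenthesis are simply skipped, so the same loop works whether or not the line contains whitespace.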
To read the file, you can use Files.readAllLines()
To actually parse each line, you can use something like this.
First, to make things easier, remove any whitespace from the line. Strings are immutable, so keep the result:
String trimmed = line.replaceAll("\\s+", "");
This will essentially transform (8) {1, 4, 6, 8, 12, 22} into (8){1,4,6,8,12,22}.
Next, use a regular expression to validate the line. If the line does not match, no further action is required. Each number must have at least one digit ([0-9]+), so the parsing below never sees an empty string.
Expression: \([0-9]+\)\{[0-9]+(,[0-9]+)*}
\([0-9]+\) relates to (8) (above example)
\{[0-9]+(,[0-9]+)*} relates to {1,4,6,8,12,22}
If you don't understand the expression, head over here.
Finally, we can parse the string into its two components: The number to search for and the int[] with the actual values.
// start from index one to skip the first bracket
int targetEnd = trimmed.indexOf(')', 1);
String searchString = trimmed.substring(1, targetEnd);
// parsing won't throw an exception, since the regex check guarantees it's a number
int numberToFind = Integer.parseInt(searchString);
// skip ')' and '{', align to the first value, skip the last '}'
String valuesString = trimmed.substring(targetEnd + 2, trimmed.length() - 1);
// split the array at ',' to get each value as string
int[] values = Arrays.stream(valuesString.split(","))
.mapToInt(Integer::parseInt).toArray();
With both of these components parsed, you can do the binary search yourself.
You could read the file line by line and then use a regex on each line to separate the string into groups.
The regex below should match each line:
\((\d+)\) \{([\d, ]+)\}
Then group(1) will give the number inside the parentheses (as a String) and group(2) will give the String inside the curly braces, which you can split on a comma and a space (assuming every comma is followed by a space) to get an array of numbers (as Strings again).
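As a rough illustration of the group-based approach, here is a sketch that uses the regex above with java.util.regex. The class name LineMatcher and its two helper methods are hypothetical, and the code assumes every comma is followed by exactly one space:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineMatcher {
    // group(1): the number in parentheses, group(2): the contents of the braces.
    static final Pattern LINE = Pattern.compile("\\((\\d+)\\) \\{([\\d, ]+)\\}");

    private static Matcher match(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return m;
    }

    // Returns the number to search for.
    static int target(String line) {
        return Integer.parseInt(match(line).group(1));
    }

    // Returns the array written between the curly braces.
    static int[] values(String line) {
        return Arrays.stream(match(line).group(2).split(", "))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        String line = "(8) {1, 4, 6, 8, 12, 22}";
        int[] values = values(line);
        // The arrays in the file are sorted, which Arrays.binarySearch requires.
        int index = Arrays.binarySearch(values, target(line));
        System.out.println("found at index " + index);
    }
}
```

Arrays.binarySearch returns a negative value when the number is not in the array, which covers lines such as (1) {2, 3, 5, ...}.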

Replacing more than one character in a String

I am printing an array, but I only want to display the numbers. I want to remove the brackets and commas from the String (leaving only the digits). So far, I have been able to remove the commas, but I am looking for a way to add more arguments to the replaceAll method.
How can I remove the brackets as well as the commas?
cubeToString = Arrays.deepToString(cube);
System.out.println(cubeToString);
String cleanLine = "";
cleanLine = cubeToString.replaceAll(", ", ""); //I want to put all braces in this statement too
System.out.println(cleanLine);
The output is:
[[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5]]
[[0000][1111][2222][3333][4444][5555]]
You can use the special characters [ and ] to form a character class, and then \\ to escape the literal [ and ] (from your input), like so:
cleanLine = cubeToString.replaceAll("[\\[\\]\\s,]", "");
or replace everything that is not a digit, like so:
cleanLine = cubeToString.replaceAll("\\D", "");
What you are doing is effectively using Java like you use a scripting language.
In this case, it happens to work well because your arrays only contain numbers and you don't have to worry about escaping characters that may also appear inside your array elements.
But it's still not efficient or Java-like to convert strings several times, once with regular expressions (replaceAll), to get to your end result.
A nicer and more efficient approach is to directly build the string that you need without any commas or square brackets:
public static void main(String[] args) throws Exception {
int[][] cube = { { 0, 0, 0, 0 }, { 1, 1, 1, 1 }, { 2, 2, 2, 2 }, { 3, 3, 3, 3 }, { 4, 4, 4, 4 },
{ 5, 5, 5, 5 } };
StringBuilder builder = new StringBuilder();
for (int[] r : cube) {
for (int c : r) {
builder.append(c);
}
}
String cleanLine = builder.toString();
System.out.println(cleanLine);
}
Output:
000011112222333344445555

find all letters in String with regex

I know the toCharArray() method, but I am interested in regex. I have a question about the speed of two regexes:
String s = "123456";
// Warm up JVM
for (int i = 0; i < 10000000; ++i) {
String[] arr = s.split("(?!^)");
String[] arr2 = s.split("(?<=\\G.{1})");
}
long start = System.nanoTime();
String[] arr = s.split("(?!^)");
long stop = System.nanoTime();
System.out.println(stop - start);
System.out.println(Arrays.toString(arr));
start = System.nanoTime();
String[] arr2 = s.split("(?<=\\G.{1})");
stop = System.nanoTime();
System.out.println(stop - start);
System.out.println(Arrays.toString(arr2));
output:
Run 1:
3158
[1, 2, 3, 4, 5, 6]
3947
[1, 2, 3, 4, 5, 6]
Run 2:
2763
[1, 2, 3, 4, 5, 6]
3158
[1, 2, 3, 4, 5, 6]
The two regexes do the same job. Why is the first regex faster than the second one? Thanks for your answers.
I can never be 100% sure, but I can think of one reason.
(?!^) always fails or succeeds in a single attempt: it only has to test whether the current position is the start of the string.
(?<=\\G.{1}), which is exactly equivalent to (?<=\\G.), always involves two steps, or two matching attempts.
\\G matches either at the start of the string or at the end of the previous match, and even when it succeeds, the regex engine still has to try to match a single character.
For example, in your string 123456, at the start of the string:
(?!^): fails immediately.
(?<=\\G.): \\G succeeds, but then it looks for . and can't find a character behind the current position because this is the start of the string, so it fails. As you can see, it attempted two steps versus one step for the previous expression.
The same goes for every other position in the input string: always two tests for (?<=\\G.) versus a single test for (?!^).

Input from text file to array

The input will be a text file with an arbitrary amount of integers from 0-9 with NO spaces. How do I populate an array with these integers so I can sort them later?
What I have so far is as follows:
BufferedReader numInput = null;
int[] theList;
try {
numInput = new BufferedReader(new FileReader(fileName));
} catch (FileNotFoundException e) {
System.out.println("File not found");
e.printStackTrace();
}
int i = 0;
while(numInput.ready()){
theList[i] = numInput.read();
i++;
Obviously theList isn't initialized, but I don't know what the length will be. Also I'm not too sure about how to do this in general. Thanks for any help I receive.
To clarify the input, it will look like:
1236654987432165498732165498756484654651321
I won't know the length, and I only want the single integer characters, not multiple. So 0-9, not 0-10 like I accidentally said earlier.
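Go for the Collections API, i.e. ArrayList:
List<Integer> a = new ArrayList<>();
int c;
// read() returns the character code, or -1 at the end of the input
while ((c = numInput.read()) != -1) {
if (Character.isDigit(c)) {
a.add(Character.getNumericValue(c)); // '7' becomes 7
}
}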
You could use a List<Integer> instead of an int[]. A List<Integer> grows as you add items. When you are done, convert it to an int[]. Note that List.toArray only produces object arrays such as Integer[], so use list.stream().mapToInt(Integer::intValue).toArray() to get an int[].
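A small sketch of the list-based approach, assuming the input contains only single digits and possibly a trailing newline. The class name DigitReader is hypothetical, and a StringReader stands in for the FileReader from the question so the example runs without a file:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DigitReader {
    // Reads every digit character into a list that grows as needed.
    static List<Integer> readDigits(Reader in) throws IOException {
        List<Integer> digits = new ArrayList<>();
        int ch;
        while ((ch = in.read()) != -1) {          // -1 signals the end of the input
            if (Character.isDigit(ch)) {          // skips newlines and other non-digits
                digits.add(Character.getNumericValue(ch));
            }
        }
        return digits;
    }

    // Converts the list into a primitive int[] once its size is known.
    static int[] toIntArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) throws IOException {
        // In the question this would be: new BufferedReader(new FileReader(fileName))
        try (Reader in = new StringReader("1236654987\n")) {
            int[] theList = toIntArray(readDigits(in));
            System.out.println(theList.length + " digits read");
        }
    }
}
```

The resulting int[] can then be passed to Arrays.sort for the sorting step mentioned in the question.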
1. Use Guava's readFirstLine to read the file's first line into a String.
2. Convert that String to a char array; all of your numbers are one digit long, so they are in fact chars.
3. Convert the chars to integers.
4. Add them to a list.
public static void main(String[] args) {
String s = "1236654987432165498732165498756484654651321";
char[] charArray = s.toCharArray();
List<Integer> numbers = new ArrayList<Integer>(charArray.length);
for (char c : charArray) {
Integer integer = Integer.parseInt(String.valueOf(c));
numbers.add(integer);
}
System.out.println(numbers);
}
prints: [1, 2, 3, 6, 6, 5, 4, 9, 8, 7, 4, 3, 2, 1, 6, 5, 4, 9, 8, 7, 3, 2, 1, 6, 5, 4, 9, 8, 7, 5, 6, 4, 8, 4, 6, 5, 4, 6, 5, 1, 3, 2, 1]

Parsing Json objects in Java

I'm new to JSON and I'm really struggling to parse this layout with GSON in Java
{"time_entries":
[
{"hours":1.0,
"id":311,
"created_on":"2012-11-02T14:53:38Z",
"user":{"id":7,"name":"blah"},
"issue":{"id":266},
"activity":{"id":10,"name":"blah"},
"updated_on":"2012-11-02T14:53:38Z",
"comments":"blah",
"spent_on":"2012-11-02",
"project":{"id":10,"name":"blah"}},
{"hours":6.0,
"id":310,
"created_on":"2012-11-02T13:49:24Z",
"user":{"id":4,"name":"blah"},
"issue":{"id":258},
"activity":{"id":9,"name":"blah"},
"updated_on":"2012-11-02T13:49:24Z",
"comments":"blah",
"spent_on":"2012-11-02",
"project":{"id":11,"name":"blah"
}}
],
"offset":0,
"limit":2,
"total_count":306
}
If it helps it's the output the Redmine API gives you for time entries.
I'm struggling to understand some of the basic JSON concepts like objects and arrays and I haven't been able to find an example with a layout similar to this.
My main concern in using the tutorials I have read is that the multiple ID fields will get confused.
What's the best way to parse this without tying myself in knots?
I'm not set on using Gson and would be happy with a solution using Jackson or the built-in library. The end goal is an Android implementation, so I would prefer to use serialization.
Thanks
EDIT:
My attempt at an "object model"
public class TimeResponse {
public List<Time_Entry> time_entries;
@SerializedName("hours")
public String hours;
@SerializedName("id")
public int id;
@SerializedName("created_on")
public String created_on;
@SerializedName("name")
public String name;
@SerializedName("updated_on")
public int updated_on;
public int page;
@SerializedName("comments")
public double comments;
@SerializedName("spent_on")
public String spent_on;
@SerializedName("offset")
public String offset;
@SerializedName("limit")
public String limit;
@SerializedName("total_count")
public String total_count;
}
I am unsure as to what I should write for my results list (if I need one), and I have only declared the id and name fields once despite them being used multiple times.
I am aware I shouldn't be using strings for my hours; I'm in the process of looking into what the hours field actually contains. I believe the tutorial is slightly out of date in that the last three fields are not represented in the same way now in the Twitter API.
I am not sure what you mean by 'multiple ID fields'. There is no such thing as an ID in JSON.
Regarding the basic JSON concepts, see http://json.org/:
Object:
An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
Array:
An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).
Value:
A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.
String:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.
Number:
A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
Edit:
There is not much you can do to simplify the JSON from your question except pretty-print it:
{
"time_entries": [
{
"hours": 1,
"id": 311,
"created_on": "2012-11-02T14:53:38Z",
"user": {
"id": 7,
"name": "blah"
},
"issue": {
"id": 266
},
"activity": {
"id": 10,
"name": "blah"
},
"updated_on": "2012-11-02T14:53:38Z",
"comments": "blah",
"spent_on": "2012-11-02",
"project": {
"id": 10,
"name": "blah"
}
},
{
"hours": 6,
"id": 310,
"created_on": "2012-11-02T13:49:24Z",
"user": {
"id": 4,
"name": "blah"
},
"issue": {
"id": 258
},
"activity": {
"id": 9,
"name": "blah"
},
"updated_on": "2012-11-02T13:49:24Z",
"comments": "blah",
"spent_on": "2012-11-02",
"project": {
"id": 11,
"name": "blah"
}
}
],
"offset": 0,
"limit": 2,
"total_count": 306
}
Perhaps you can see that you have one JSON Object with four name/value pairs:
time_entries
offset
limit
total_count
The last three of these have simple numeric values, while the first (time_entries) is an Array of two more Objects. Each of these two Objects consists of various name/value pairs. The name/value pair id is just one of these.
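To make that structure concrete, here is one possible object model, written as a sketch rather than a definitive mapping. All class names are hypothetical; the field names mirror the JSON keys, which is the convention a data-binding library such as Gson (for example via fromJson) typically relies on. The sample object is filled in by hand so the example runs without any dependency:

```java
import java.util.ArrayList;
import java.util.List;

class TimeEntriesModel {
    // Top-level object: one array plus three numbers.
    static class TimeResponse {
        List<TimeEntry> time_entries = new ArrayList<>();
        int offset;
        int limit;
        int total_count;
    }

    // One element of the time_entries array.
    static class TimeEntry {
        double hours;
        int id;
        String created_on;
        NamedRef user;
        IssueRef issue;
        NamedRef activity;
        String updated_on;
        String comments;
        String spent_on;
        NamedRef project;
    }

    // Nested objects of the form {"id": ..., "name": ...}.
    static class NamedRef {
        int id;
        String name;
    }

    // Nested object of the form {"id": ...}.
    static class IssueRef {
        int id;
    }

    // Builds part of the first entry from the question by hand.
    static TimeResponse sample() {
        TimeEntry entry = new TimeEntry();
        entry.hours = 1.0;
        entry.id = 311;
        entry.user = new NamedRef();
        entry.user.id = 7;
        entry.user.name = "blah";
        entry.project = new NamedRef();
        entry.project.id = 10;
        entry.project.name = "blah";
        TimeResponse response = new TimeResponse();
        response.time_entries.add(entry);
        response.limit = 2;
        response.total_count = 306;
        return response;
    }

    public static void main(String[] args) {
        TimeEntry first = sample().time_entries.get(0);
        // Each id lives in its own object, so the values cannot be confused.
        System.out.println(first.id + " " + first.user.id + " " + first.project.id);
    }
}
```

Because every nested JSON object gets its own class, the entry's id, the user's id and the project's id are three separate fields rather than one shared id.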
Data is in name/value pairs
Data is separated by commas
Curly braces hold objects
Square brackets hold arrays
I used JavaScript here; it may be useful to you. Let me know if you need any other help.
var jsonText = xmlhttp.responseText;
var obj = JSON.parse(jsonText);
var row_num = obj.time_entries.length;
This line gives the length of the time_entries array.
var keys = Object.keys(obj.time_entries[0]);
var columndata = keys.toString();
var my = columndata.split(",");
columndata contains the keys of the first time entry (index 0 of the array) as one comma-separated string:
columndata = "hours,id,created_on,..." and so on
my = ["hours", "id", ...] etc.
