Picking apart a string and replacing it - java

I have been racking my brain lately and can't seem to figure out how to pull the "text" from this string and replace the found pattern with those word(s).
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcher = searchPattern.matcher(sb);
sb is the string that contains a few occurrences of these patterns that start with [{ and end with }].
[{ md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, l : "Books", v : "ETBO"}}]
gets returned as
md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, l : "Books", v : "ETBO"}
Notice the lack of [{ and }]. I manage to find the above pattern, but how would I find the words set and Books and then replace the original found pattern with only those words? I can check whether the match contains a " via
while (matcher.find()) {
matcher.group(1).contains("\"");
}
but I really just need some ideas about how to go about doing this.

Is this what you are looking for (answer based on your first comment)?
its actually fairly large.. but goes along the lines of "hello my name is, etc, etc, etc, [{ md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, l : "Books", v : "ETBO"}}] , some more text here, and some more" -> the [{ }] parts should be replaced with the text inside of them in this case set, books, etbo... resulting in a final string of "hello my name is, etc, etc, etc, set set Books ETBO , some more text here, and some more"
// text from your comment
String sb = "hello my name is, etc, etc, etc, [{ md : "
+ "{o : \"set\", et : _LU.et.v.v }, d : {t : "
+ "_LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, "
+ "v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "
+ "\"set\", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, "
+ "l : \"Books\", v : \"ETBO\"}}] , "
+ "some more text here, and some more";
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcher = searchPattern.matcher(sb);
// pattern that finds words between quotes
Pattern searchWordsInQuotes = Pattern.compile("\"(.+?)\"");
// here I will collect words in quotes placed in [{ and }] and separate
// them with one space
StringBuilder words = new StringBuilder();
// buffer used while replacing [{ xxx }] part with words found in xxx
StringBuffer output = new StringBuffer();
while (matcher.find()) {// looking for [{ xxx }]
words.delete(0, words.length());
//now I search for words in quotes from [{ xxx }]
Matcher m = searchWordsInQuotes.matcher(matcher.group());
while (m.find())
words.append(m.group(1)).append(" ");
//trim removes the last space; quoteReplacement keeps $ and \ in the words literal
matcher.appendReplacement(output, Matcher.quoteReplacement(words.toString().trim()));
}
//we also need to append last part of String that wasn't used in matcher
matcher.appendTail(output);
System.out.println(output);
Output:
hello my name is, etc, etc, etc, set set Books ETBO , some more text here, and some more
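On Java 9 or later, the same two-level idea can be written more compactly. This is only a sketch: the class and method names are made up for illustration, and it relies on Matcher.replaceAll(Function) and Matcher.results(), which do not exist before Java 9.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
final class QuotedWordsReplacer {
    private static final Pattern BLOCK = Pattern.compile("\\[\\{(.+?)\\}\\]");
    private static final Pattern QUOTED = Pattern.compile("\"(.+?)\"");

    // Replaces every [{ ... }] block with the quoted words found inside it.
    static String replaceBlocks(String text) {
        return BLOCK.matcher(text).replaceAll(block -> {
            String words = QUOTED.matcher(block.group(1))
                    .results()                       // one MatchResult per quoted word
                    .map(r -> r.group(1))
                    .collect(Collectors.joining(" "));
            // Quote the replacement so '$' and '\' inside a word are taken literally.
            return Matcher.quoteReplacement(words);
        });
    }

    public static void main(String[] args) {
        String text = "intro, [{ md : {o : \"set\"}, d : {l : \"Books\", v : \"ETBO\"}}] , outro";
        System.out.println(replaceBlocks(text));
    }
}
```

Collectors.joining(" ") puts the spaces only between words, so no trim() is needed afterwards.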

OK, I think you need to do this in three passes: the first matches the section between [{ and }], the second goes through that match and does the replace, and the third replaces the match with the string you got from the second pass.
You already have a pattern for the first match, and you'd just use it again for the third match, when you replace it with the result of the second pass.
For the second pass, you're going to need to replaceAll on the first match. Something like this:
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcher = searchPattern.matcher(sb);
while ( matcher.find() )
{
// note: "$1" joins the quoted words without a separator and keeps any text after the last quote
String words = matcher.group(1).replaceAll("[^\"]*\"([^\"]*)\"", "$1");
// replaceFirst() returns a new String and resets the matcher, so keep the result and match again
sb = matcher.replaceFirst(Matcher.quoteReplacement(words));
matcher = searchPattern.matcher(sb);
}
The first pass is done by matcher.find(). The next one is done by matcher.group(1).replaceAll(), whose result is then passed into matcher.replaceFirst() for the third pass. The third pass is a little weird: it replaces the first occurrence of [{ }] in the string. However, since we're starting from the beginning and moving forward, that will be the one we just found, and we won't match it again because it gets replaced by a non-matching string. Note that replaceFirst() does not change sb itself (assumed here to be a String): it resets the matcher and returns a new String, so the result has to be assigned back to sb and a fresh matcher created on it before the next find().
I would point out that this is not particularly efficient. I think that you would be better off doing more of this manually rather than with regular expressions.
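A manual version could look roughly like the sketch below. It scans with indexOf instead of regular expressions. The class and method names are hypothetical, and it assumes that blocks are not nested and that quotes inside a block are balanced.

```java
// Hypothetical helper; a manual scan with indexOf instead of regular expressions.
final class ManualBlockReplacer {

    // Replaces every [{ ... }] block with the quoted words found inside it.
    static String replaceBlocks(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0;
        while (true) {
            int open = text.indexOf("[{", pos);
            int close = (open < 0) ? -1 : text.indexOf("}]", open + 2);
            if (open < 0 || close < 0) {
                break;                            // no further complete block
            }
            out.append(text, pos, open);          // copy the text before the block
            appendQuotedWords(text, open + 2, close, out);
            pos = close + 2;                      // continue after "}]"
        }
        return out.append(text, pos, text.length()).toString();
    }

    // Appends each "quoted" word between from (inclusive) and to (exclusive), space separated.
    private static void appendQuotedWords(String text, int from, int to, StringBuilder out) {
        boolean first = true;
        int start = text.indexOf('"', from);
        while (start >= 0 && start < to) {
            int end = text.indexOf('"', start + 1);
            if (end < 0 || end >= to) {
                break;                            // unbalanced quote inside the block
            }
            if (!first) {
                out.append(' ');
            }
            out.append(text, start + 1, end);
            first = false;
            start = text.indexOf('"', end + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceBlocks("a [{ x : \"set\", y : \"Books\" }] b"));
    }
}
```

The string is walked once from left to right and nothing is matched twice, which is the main gain over calling replaceFirst() in a loop.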

LATEST REVISION
An example of how to loop over a string with multiple boundaries and replace at each level
public static String replace(CharSequence rawText, String oldWord, String newWord, String regex) {
Pattern patt = Pattern.compile(regex);
Matcher m = patt.matcher(rawText);
StringBuffer sb = new StringBuffer(rawText.length());
while (m.find()) {
String text = m.group(1);
if(oldWord == null || oldWord.isEmpty()) {
m.appendReplacement(sb, Matcher.quoteReplacement(newWord));
} else {
if(text.matches(oldWord)) {
m.appendReplacement(sb, Matcher.quoteReplacement(newWord));
}
}
}
m.appendTail(sb);
return sb.toString();
}
public static void main(String[] args) throws Exception {
String rawText = "[{MY NAME IS \"NAME\"}]";
rawText += " bla bla bla [{I LIVE IN \"SOME RANDOM CITY\" WHERE THE PIZZA IS GREAT!}]";
rawText += " bla bla etc etc [{I LOVE \"A HOBBY\"}]";
System.out.println(rawText);
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcherBoundary = searchPattern.matcher(rawText);
List<String> replacement = new ArrayList<String>();
replacement.add("BOB");
replacement.add("LOS ANGELES");
replacement.add("PUPPIES");
int counter = 0;
while (matcherBoundary.find()) {
String result = Test.replace(matcherBoundary.group(1), null, replacement.get(counter), "\"([^\"]*)\"");
System.out.println(result);
counter++;
}
}
The output I get is:
**Raw Text**
[{MY NAME IS "NAME"}] bla bla bla [{I LIVE IN "SOME RANDOM CITY" WHERE THE PIZZA IS GREAT!}] bla bla etc etc [{I LOVE "A HOBBY"}]
**In Every Loop**
MY NAME IS BOB
I LIVE IN LOS ANGELES WHERE THE PIZZA IS GREAT!
I LOVE PUPPIES

Related

Regex query MongoDB Performance issue

I have a MongoDB collection whose documents contain a single field. I receive about 31000 documents each day, and the collection holds almost 6 months of data.
Here is how my data looks in the database:
{
"_id" : ObjectId("59202aa3f32dfba00d0773c3"),
"Data" : "20-05-2017 18:38:13 SYSTEM_000_00_SAVING ",
"__v" : 0
}
{
"_id" : ObjectId("59202aa3f32dfba00d0773c4"),
"Data" : "20-05-2017 18:38:13 SyTime_000_09_00:00 ",
"__v" : 0
}
Here is my code for the query:
DBObject query = new BasicDBObject();
Pattern regex = Pattern.compile("20-05-2017");
query.put("Data", regex);
I have created an index, but it's still slow:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "NOB_SRB.fdevices"
},
{
"v" : 1,
"unique" : true,
"key" : {
"Data" : 1.0
},
"name" : "Data_1",
"ns" : "NOB_SRB.fdevices"
}
]
Add a start of input anchor ^ to the start of the regex:
Pattern regex = Pattern.compile("^20-05-2017");
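Because your regex does not have an anchor, the date could appear anywhere in the field, so every value has to be scanned. With the ^ anchor the regex becomes a prefix expression, and MongoDB can use the index on Data to look only at the values that start with that date.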
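To illustrate why the anchor matters, here is an analogy in plain Java. It is not MongoDB code and says nothing about MongoDB's internals: a TreeSet stands in for a sorted index, and the class and method names are invented for the example. An unanchored search has to look at every entry, while a prefix search can jump to one contiguous sorted range.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Analogy only: a TreeSet plays the role of a sorted index.
final class PrefixScanSketch {

    // Unanchored search: every entry has to be inspected.
    static List<String> containsScan(SortedSet<String> index, String needle) {
        return index.stream()
                .filter(s -> s.contains(needle))
                .collect(Collectors.toList());
    }

    // Anchored (prefix) search: only the sorted range starting with the prefix is visited.
    static SortedSet<String> prefixScan(SortedSet<String> index, String prefix) {
        // Upper bound: the prefix followed by the largest char value (exclusive).
        return index.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        SortedSet<String> index = new TreeSet<>(Arrays.asList(
                "19-05-2017 23:59:59 SYSTEM_000_00_SAVING",
                "20-05-2017 18:38:13 SYSTEM_000_00_SAVING",
                "20-05-2017 18:38:13 SyTime_000_09_00:00",
                "21-05-2017 00:00:01 SYSTEM_000_00_SAVING"));
        System.out.println(prefixScan(index, "20-05-2017"));
    }
}
```

Both methods return the same entries for this data; the difference is how much of the sorted structure has to be touched to find them.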

Extract data inside nested braces

I want to extract the content between the first nested braces and the second nested braces separately. I am totally stuck with this; can anyone help me? My file read.txt contains the data below. I just read it into a string "s".
BufferedReader br=new BufferedReader(new FileReader("read.txt"));
while(br.ready())
{
String s=br.readLine();
System.out.println(s);
}
Output
{ { "John", "ran" }, { "NOUN", "VERB" } },
{ { "The", "dog", "jumped"}, { "DET", "NOUN", "VERB" } },
{ { "Mike","lives","in","Poland"}, {"NOUN","VERB","DET","NOUN"} },
i.e. my output should look like
"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"
Use this regex:
(?<=\{)(?!\s*\{)[^{}]+
See the matches in the Regex Demo.
In Java:
Pattern regex = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
}
Explanation
The lookbehind (?<=\{) asserts that what precedes the current position is a {
The negative lookahead (?!\s*\{) asserts that what follows is not optional whitespace then {
[^{}]+ matches any chars that are not curlies
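As a runnable sketch of the lookaround regex above, applied to one line of the sample data (the class and method names are only illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from the answer.
final class InnerBraces {
    private static final Pattern INNER = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");

    // Returns the content of every innermost { ... } pair on the line.
    static List<String> extract(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = INNER.matcher(line);
        while (m.find()) {
            // The match keeps the spaces next to the braces, so trim them.
            parts.add(m.group().trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "{ { \"John\", \"ran\" }, { \"NOUN\", \"VERB\" } },";
        for (String part : extract(line)) {
            System.out.println(part);
        }
    }
}
```

The regex itself does not remove the whitespace right after { or before }, which is why each match is passed through trim().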
If you split on "}," you get each set of words in a single string; then it is just a matter of removing the curly braces.
As per your code
BufferedReader br=new BufferedReader(new FileReader("read.txt"));
while(br.ready())
{
String s=br.readLine();
String [] words = s.split ("},");
for (int x = 0; x < words.length; x++) {
String printme = words[x].replace("{", "").replace("}", "");
System.out.println(printme.trim());
}
}
You could always remove the opening brackets, then split by '},' which would leave you with the list of strings you've asked for. (If that is all one string, of course)
String s = input.replace("{","");
String[] splitString = s.split("},");
Would first remove open brackets:
"John", "ran" }, "NOUN", "VERB" } },
"The", "dog", "jumped"}, "DET", "NOUN", "VERB" } },
"Mike","lives","in","Poland"},"NOUN","VERB","DET","NOUN"} },
Then would split by },
"John", "ran"
"NOUN", "VERB" }
"The", "dog", "jumped"
"DET", "NOUN", "VERB" }
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"}
Then you just need to tidy them up with another replace!
Another approach could be to search for a {...} substring with no inner { or } characters and take only its inner part, without the { and }.
A regex describing such a substring can look like
\\{(?<content>[^{}]+)\\}
Explanation:
\\{ is escaped { so now it represents { literal (normally it represents start of quantifier {x,y} so it needed to be escaped)
(?<content>...) is named-capturing group, it will store only part between { and } and later we would be able to use this part (instead of entire match which would also include { })
[^{}]+ represents one or more non { } characters
\\} escaped } which means it represents }
DEMO:
String input = "{ { \"John\", \"ran\" }, { \"NOUN\", \"VERB\" } },\r\n" +
"{ { \"The\", \"dog\", \"jumped\"}, { \"DET\", \"NOUN\", \"VERB\" } },\r\n" +
"{ { \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },";
Pattern p = Pattern.compile("\\{(?<content>[^{}]+)\\}");
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group("content").trim());
}
Output:
"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"

Escape " between two Strings

I'm getting JSON and using Gson to deserialize it into an object where value is a String, not an object, so I'm trying to put that object into a String variable without deserializing it.
I want to do something like this:
{"value" : {"id":"2"}} -> {"value" : "{\"id\":\"2\"}"}
I replaced the "value" : { with "value" : "{ like this:
result = result.replace("\"value\" : {", "\"value\" : \"{");
and then I replaced the }} like this:
result = result.replace("}}", "}\"}");
And my result was (after replacing everything):
{"value" : "{"id":"2","name":"game2"}"}
The only problem now: I also want to replace the " with \", but only inside the "{ ... }". I can't figure that out.
EDIT:
Incoming Json:
{"path" : "/gdi/games/2", "key" : "detail", "value" : {"id":"2","name":"game2"}},
{"path" : "/gdi/games/4", "key" : "detail", "value" : {"id":"4","name":"game4"}},
{"path" : "/gdi/games/6", "key" : "detail", "value" : {"id":"6","name":"game6"}}
The problem: value could be anything (text), so I only want to store everything that comes between { } in my object, which is deserialized into an object that looks like:
String path;
String key;
String value;
To achieve this I have to escape the object (which is in "value") as if it were a String. After escaping, Gson can deserialize it for me.
Json needed:
{"path" : "/gdi/games/2", "key" : "detail", "value" : "{\"id\":\"2\",\"name\":\"game2\"}"},
{"path" : "/gdi/games/4", "key" : "detail", "value" : "{\"id\":\"4\",\"name\":\"game4\"}"},
{"path" : "/gdi/games/6", "key" : "detail", "value" : "{\"id\":\"6\",\"name\":\"game6\"}"}
This does the trick:
input = input.replaceAll("(?=\"[^{}]*\\})", "\\\\");
It uses a look ahead to assert that the next curly bracket found after a double quote is a right curly - meaning the double quote must be within a pair of curly brackets.
The replacement term is a literal backslash. Four backslashes are needed because of double escaping: the Java string literal "\\\\" holds two backslashes, and the regex replacement syntax reads those two as one literal backslash.
Here's some test code using your sample input and producing the escaped output:
String input =
"{\"path\" : \"/gdi/games/2\", \"key\" : \"detail\", \"value\" : {\"id\":\"2\",\"name\":\"game2\"}}," +
"{\"path\" : \"/gdi/games/4\", \"key\" : \"detail\", \"value\" : {\"id\":\"4\",\"name\":\"game4\"}}," +
"{\"path\" : \"/gdi/games/6\", \"key\" : \"detail\", \"value\" : {\"id\":\"6\",\"name\":\"game6\"}}";
input = input.replaceAll("(?=\"[^{}]*\\})", "\\\\");
System.out.println(input);
Output:
{"path" : "/gdi/games/2", "key" : "detail", "value" : {\"id\":\"2\",\"name\":\"game2\"}},
{"path" : "/gdi/games/4", "key" : "detail", "value" : {\"id\":\"4\",\"name\":\"game4\"}},
{"path" : "/gdi/games/6", "key" : "detail", "value" : {\"id\":\"6\",\"name\":\"game6\"}}
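The lookahead only escapes the quotes; the surrounding quotes around the object still have to be added with the two replace() calls from the question. A minimal sketch that combines both steps, assuming the value object is flat (no nested braces) and the text uses exactly the "value" : { spacing from the sample. The class and method names are hypothetical.

```java
// Hypothetical helper combining the answer's regex with the question's replace() calls.
final class ValueEscaper {

    // Turns ... "value" : {"id":"2"}} into ... "value" : "{\"id\":\"2\"}"}
    static String stringifyValue(String json) {
        // Step 1: escape the quotes that sit inside the innermost { ... }.
        String escaped = json.replaceAll("(?=\"[^{}]*\\})", "\\\\");
        // Step 2: wrap the escaped object in quotes.
        return escaped
                .replace("\"value\" : {", "\"value\" : \"{")
                .replace("}}", "}\"}");
    }

    public static void main(String[] args) {
        String input = "{\"path\" : \"/gdi/games/2\", \"key\" : \"detail\", "
                + "\"value\" : {\"id\":\"2\",\"name\":\"game2\"}}";
        System.out.println(stringifyValue(input));
    }
}
```

The order matters: if the wrapping quotes were added first, the closing one would sit right before a } and the lookahead would escape it as well.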
You can use Jackson library to map data to a class.
http://www.journaldev.com/2324/jackson-json-processing-api-in-java-example-tutorial
For printing backslashes - example
public static void main(String[] args) {
String a="{\"id\":\"2\",\"name\":\"game2\"}";
System.out.println("Before - "+a);
System.out.println("After - "+a.replace("\"", "\\\""));
}
output -
Before - {"id":"2","name":"game2"}
After - {\"id\":\"2\",\"name\":\"game2\"}
You just need to escape the backslash as well if you want to print \"
for example: System.out.println("\\\"");
This statement prints: \"
String abc = "\\\"";

Read out elements of a string in specific order

For some reasons I have to use a specific string in my project. This is the text file (it's a JSON File):
{"algorithm":
[
{ "key": "onGapLeft", "value" : "moveLeft" },
{ "key": "onGapFront", "value" : "moveForward" },
{ "key": "onGapRight", "value" : "moveRight" },
{ "key": "default", "value" : "moveBackward" }
]
}
I've defined it in JAVA like this:
static String input = "{\"algorithm\": \n"+
"[ \n" +
"{ \"key\": \"onGapLeft\", \"value\" : \"moveLeft\" }, \n" +
"{ \"key\": \"onGapFront\", \"value\" : \"moveForward\" }, \n" +
"{ \"key\": \"onGapRight\", \"value\" : \"moveRight\" }, \n" +
"{ \"key\": \"default\", \"value\" : \"moveBackward\" } \n" +
"] \n" +
"}";
Now I have to isolate the keys and values in an array:
key[0] = onGapLeft; value[0] = moveLeft;
key[1] = onGapFront; value[1] = moveForward;
key[2] = onGapRight; value[2] = moveRight;
key[3] = default; value[3] = moveBackward;
I'm new to Java and don't understand the String class very well. Is there an easy way to get to that result? You would really help me!
Thanks!
UPDATE:
I didn't explain it well enough, sorry. This program will run on a LEGO NXT robot. JSON won't work there as I want it to, so I have to interpret this JSON file as a normal String! Hope that explains what I want :)
I propose a solution in several steps.
1) Let's get the different parts of your ~JSON String. We will use a pattern to get the different {.*} parts:
public static void main(String[] args) throws Exception{
List<String> lines = new ArrayList<String>();
Pattern p = Pattern.compile("\\{.*\\}");
Matcher matcher = p.matcher(input);
while (matcher.find()) {
lines.add(matcher.group());
}
}
(you should take a look at Pattern and Matcher)
Now, lines contains 4 String :
{ "key": "onGapLeft", "value" : "moveLeft" }
{ "key": "onGapFront", "value" : "moveForward" }
{ "key": "onGapRight", "value" : "moveRight" }
{ "key": "default", "value" : "moveBackward" }
Given a String like one of those, you can remove curly brackets with a call to String#replaceAll();
List<String> cleanLines = new ArrayList<String>();
for(String line : lines) {
//replace curly brackets with... nothing.
//added a call to trim() in order to remove whitespace characters.
cleanLines.add(line.replaceAll("[{}]","").trim());
}
(You should take a look at String String#replaceAll(String regex, String replacement))
Now, cleanLines contains :
"key": "onGapLeft", "value" : "moveLeft"
"key": "onGapFront", "value" : "moveForward"
"key": "onGapRight", "value" : "moveRight"
"key": "default", "value" : "moveBackward"
2) Let's parse one of those lines :
Given a line like :
"key": "onGapLeft", "value" : "moveLeft"
You can split it on the , character using String#split(). It will give you a String[] containing 2 elements:
//parts[0] = "key": "onGapLeft"
//parts[1] = "value" : "moveLeft"
String[] parts = line.split(",");
(You should take a look at String[] String#split(String regex))
Let's clean those parts (remove "") and assign them to some variables:
String keyStr = parts[0].replaceAll("\"","").trim(); //Now, keyStr = key: onGapLeft
String valueStr = parts[1].replaceAll("\"","").trim();//Now, valueStr = value : moveLeft
//Then, you split `key: onGapLeft` on the `:` character
String key = keyStr.split(":")[1].trim();
//And the same for `value : moveLeft` :
String value = valueStr.split(":")[1].trim();
That's it !
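Put together, the steps above could be sketched as one method that fills the two arrays from the question. The class and method names are hypothetical. The sketch assumes each { "key": ..., "value": ... } entry sits on its own line, as in the input shown, and that neither keys nor values contain a comma or a colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that chains the steps from the answer.
final class KeyValueExtractor {

    // Returns a two-row array: row 0 holds the keys, row 1 the values.
    static String[][] extract(String input) {
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        // '.' does not match line breaks, so each match stays on one line.
        Matcher matcher = Pattern.compile("\\{.*\\}").matcher(input);
        while (matcher.find()) {
            // Strip the curly brackets, then split into the "key" and "value" parts.
            String line = matcher.group().replaceAll("[{}]", "").trim();
            String[] parts = line.split(",");
            keys.add(afterColon(parts[0]));
            values.add(afterColon(parts[1]));
        }
        return new String[][] { keys.toArray(new String[0]), values.toArray(new String[0]) };
    }

    // Removes the quotes and returns the text after the colon.
    private static String afterColon(String part) {
        return part.replace("\"", "").split(":")[1].trim();
    }

    public static void main(String[] args) {
        String input = "{\"algorithm\": \n[ \n"
                + "{ \"key\": \"onGapLeft\", \"value\" : \"moveLeft\" }, \n"
                + "{ \"key\": \"default\", \"value\" : \"moveBackward\" } \n"
                + "] \n}";
        String[][] result = extract(input);
        for (int i = 0; i < result[0].length; i++) {
            System.out.println("key[" + i + "] = " + result[0][i] + "; value[" + i + "] = " + result[1][i]);
        }
    }
}
```

If the whole text were on a single line, the greedy {.*} would swallow everything from the first { to the last }, so the line-per-entry layout is what makes this work.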
You should also take a look at Oracle's tutorial on regular expressions (This one is really important and you should invest time on it).
You need to use a JSON parser library here. For example, with org.json you could parse it as
String input = "{\"algorithm\": \n"+
"[ \n" +
"{ \"key\": \"onGapLeft\", \"value\" : \"moveLeft\" }, \n" +
"{ \"key\": \"onGapFront\", \"value\" : \"moveForward\" }, \n" +
"{ \"key\": \"onGapRight\", \"value\" : \"moveRight\" }, \n" +
"{ \"key\": \"default\", \"value\" : \"moveBackward\" } \n" +
"] \n" +
"}";
JSONObject root = new JSONObject(input);
JSONArray map = root.getJSONArray("algorithm");
for (int i = 0; i < map.length(); i++) {
JSONObject entry = map.getJSONObject(i);
System.out.println(entry.getString("key") + ": "
+ entry.getString("value"));
}
Output :
onGapLeft: moveLeft
onGapFront: moveForward
onGapRight: moveRight
default: moveBackward

Trouble splitting string with split(regex) Java

I want to split a number of strings similar to name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST] into only these tokens:
john
20
toledo
seattle
[2/8/12 15:48:01:837 MST]
I'm doing this
String delims = "(name|id|dest|from|date_time)?[:,\\s]+";
String line = "name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST]";
String[] lineTokens = line.split(delims, 5);
for (String t : lineTokens)
{
// for debugging
System.out.println (t);
// other processing I want to do
}
but every even element in lineTokens turns out to be either empty or just whitespace. Each odd element in lineTokens is what I want, i.e. lineTokens[0] is "", lineTokens[1] is "john", lineTokens[2] is "", lineTokens[3] is "20", etc. Can anyone explain what I'm doing wrong?
The problem is that your regex is not matching , id: as a whole, it is matching , as one and then id: as a 2nd match. Between these two matches you have an empty string. You need to modify it to match the whole thing. Something like this:
String delims = "(, )?(name|id|dest|from|date_time)?[:\\s]+";
http://ideone.com/Qgs8y
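A variation on the same idea, as a sketch rather than the code behind the link: make the key name and its colon mandatory in the delimiter, so that the colons and spaces inside the timestamp are never treated as separators. The class name and the exact delimiter regex are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the delimiter regex is an assumption, not the linked code.
final class FieldSplitter {
    // Delimiter = optional ", " separator, then a known key, its ':' and any spaces.
    private static final String DELIMS = "(?:,\\s*)?(?:name|id|dest|from|date_time):\\s*";

    static List<String> values(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split(DELIMS)) {
            if (!token.isEmpty()) {   // the text before "name:" is an empty string
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "name: john, id: 20, dest: toledo, from: seattle, "
                + "date_time: [2/8/12 15:48:01:837 MST]";
        for (String value : values(line)) {
            System.out.println(value);
        }
    }
}
```

This still breaks if a value itself contains one of the key names followed by a colon, so it only fits input shaped like the sample line.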
Why not a slightly less complicated regex solution?
String str = "name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST]";
String[] expr = str.split(", ");
for(String e : expr)
System.out.println(e.split(": ")[1]);
Output =
john
20
toledo
seattle
[2/8/12 15:48:01:837 MST]
I made some changes to your code:
String delims = "(name|id|dest|from|date_time)[:,\\s]+";
String line = "name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST]";
String[] lineTokens = line.split(delims);
for (String t : lineTokens)
{
// for debugging
System.out.println (t);
// other processing I want to do
}
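Also, you should ignore the first element in lineTokens, since it is the empty string between the beginning of the line and "name:". Note that with this delimiter every token except the last still ends with ", " (for example "john, "), so strip that off before further processing.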
