I have a very long JSON string. I would like to filter it and get only the data inside the first pair of braces. The problem is that the string contains many other braces, so my regex pattern does not work properly.
Here is the JSON string:
String jsondata = "["
        + "{"
        + "test: 63453645"
        + "date: 2016-07-17"
        + "{"
        + "id:534534"
        + "}"
        + "blank : null"
        + "flags : null"
        + "}"
        + "{"
        + "test: 543564236"
        + "date: 2014-07-17"
        + "{"
        + "id:6532465"
        + "}"
        + "blank : null"
        + "flags : null"
        + "}"
        + "]";
String pattern = "\\{[^{}]*\\}";
Pattern pr = Pattern.compile(pattern);
Matcher math = pr.matcher(jsondata);
if (math.find()) {
    System.out.println(math.group());
} else {
    System.out.println("nomatch");
}
The problem with my pattern is that it only matches up to the first }, the one after id:, but I want the match to end at the } that follows flags : null.
I only want the first match, not the objects after it, which start and end with the same characters. That is why I use an if statement instead of a while loop.
Any suggestions? Thank you!
Regex with multiple brackets seems like a very difficult task. Can I match the last string instead? Starting from { to flags : null?
As I said in my comment, I usually use JSON-Simple, which has a good tutorial.
Decoding would look something like this:
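// JSONValue.parse returns Object, so a cast is needed; the top-level value here is an array
JSONArray arr = (JSONArray) JSONValue.parse(jsondata);
JSONObject obj = (JSONObject) arr.get(0);
obj.get("test");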
P.S. Your JSON data contains some errors. Use JSONLint to verify that it is formatted correctly.
This will grab everything between the first { and the } that follows the first "flags : null":
String guts = jsondata.replaceAll("(?s)^.*?\\{(.*?flags : null[^}]*).*$", "$1");
The regex captures everything after the first { up to your marker text (flags : null), plus any non-} characters that follow it.
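If the text that ends the first object is not known in advance, an alternative to a regex is to count brace depth. The following is only a minimal sketch of that idea and is not taken from either answer above. The class and method names (FirstObject, firstBalanced) are made up for illustration, and it assumes braces never appear inside quoted values.

```java
// Hypothetical helper: returns the first balanced {...} block, or null if there is none.
// Assumption: braces never occur inside string values.
class FirstObject {
    static String firstBalanced(String s) {
        int start = s.indexOf('{');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // Back at the nesting level of the first brace: the block is complete.
                    return s.substring(start, i + 1);
                }
            }
        }
        return null; // unbalanced input
    }

    public static void main(String[] args) {
        String data = "[{test: 1{id:2}flags : null}{test: 3{id:4}flags : null}]";
        System.out.println(firstBalanced(data));
    }
}
```

Unlike the regex, this keeps the outer braces in the result and does not depend on the flags : null text being present.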
Related
I receive a string that contains a JSON object. Unfortunately, one of the values in that JSON string looks like this:
{
"name":"Content-Type",
"value":"multipart/alternative; boundary="=-SITt2U5w3MJ1Y3RihaWzxw==""
}
As you can see, the value of this JSON object contains two unescaped " characters, which need to be escaped so that the result looks like this:
{
"name":"Content-Type",
"value":"multipart/alternative; boundary=\"=-SITt2U5w3MJ1Y3RihaWzxw==\""
}
Then parsing into a Java POJO works. Unfortunately, I can't easily detect it, because the object above is one of many that arrive in an array. I know I could detect 'boundary' as a keyword and escape the next two " characters. I tried to get this working with regular expressions, but had no success.
What could I do here?
If the intention is to escape the double quotes using regex, the below snippet can help.
public static void main(String[] args) {
String text = "{\n" +
" \"name\":\"Content-Type\",\n" +
" \"value\":\"multipart/alternative; boundary=\"=-SITt2U5w3MJ1Y3RihaWzxw==\"\"\n" +
"}";
Pattern pattern = Pattern.compile("boundary=\"([^\"]*?)\"");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
System.out.println(matcher.replaceAll("boundary=\\\\\"" + matcher.group(1) + "\\\\\""));
}
}
It produces the below output:
{
"name":"Content-Type",
"value":"multipart/alternative; boundary=\"=-SITt2U5w3MJ1Y3RihaWzxw==\""
}
Regex pattern: boundary="([^"]*?)"
Explanation: After boundary=", lazily match all characters up to the next ". The replacement uses the capture group and adds the escape characters around it.
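For input with several boundary values, as in the array the question mentions, a $1 group reference in the replacement lets each match keep its own captured value. This is a sketch under the assumption that the unescaped quotes only ever follow boundary=. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class BoundaryEscaper {
    // Matches boundary="..." where the surrounding quotes are not yet escaped.
    private static final Pattern BOUNDARY = Pattern.compile("boundary=\"([^\"]*)\"");

    static String escapeBoundaryQuotes(String json) {
        // In a replacement string, \\\\ produces one literal backslash and $1 is the captured value.
        return BOUNDARY.matcher(json).replaceAll("boundary=\\\\\"$1\\\\\"");
    }

    public static void main(String[] args) {
        String text = "{\"value\":\"multipart/alternative; boundary=\"=-abc==\"\"}";
        System.out.println(escapeBoundaryQuotes(text));
    }
}
```

Values that are already escaped are left alone, because the pattern requires the quote to follow boundary= directly.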
I am writing a program in SWI-prolog and Java.
My problem is that when I print the result from Prolog, it comes back wrapped in [] and I don't want this.
The code for printing the results is
String t8 = "findDiseases(" + mylist + ",Diseases)."+ "\n";
Query q8 = new Query(t8);
// Greek: "Based on the information given, you are suffering from: "
Diagnosis_txt.append("Με τις δοθείσες πληροφορίες πάσχετε από: " +
"\n" +
"\n" +
q8.oneSolution().get("Diseases"));
while (q8.hasMoreSolutions()) {
Map<String, Term> s7 = q8.nextSolution();
System.out.println("Answer is " + s7.get("Diseases"));
}
And the printed result is
Answer is '[|]'(drepanocytocis, '[|]'(drepanocytocis, '[]'))
I want to get rid of this [|] and the []. I want to print only drepanocytocis.
If you want to remove all special characters, you can do something like this:
answer = answer.replaceAll("[^a-zA-Z ]+", "").trim();
Update:
To also collapse the duplicate spaces this leaves behind, the full solution could look something like this:
answer = answer.replaceAll("[^a-zA-Z ]+", " ")
        // collapse runs of spaces
        .replaceAll("[ ]([ ]+)", " ")
        // remove leading & trailing spaces
        .trim();
It can then be split on spaces to get the sanitized answer.
However, as @andy suggested, I recommend going to the source of the data and building a proper data structure so that it returns exactly what you want. Post-processing like this should really only be used for data you have no control over, old versions, and so on.
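Put together, the string clean-up might look like the sketch below. It works on the printed text only, not on the Prolog term itself. The class and method names are hypothetical, and the de-duplication step is an assumption based on the question wanting a single drepanocytocis. Atom names containing digits or underscores would be broken up by this character class.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class TermCleaner {
    // Strips everything except letters, then returns the distinct words in their original order.
    static Set<String> distinctWords(String termText) {
        String cleaned = termText
                .replaceAll("[^a-zA-Z ]+", " ") // punctuation -> space
                .replaceAll(" +", " ")          // collapse runs of spaces
                .trim();
        Set<String> words = new LinkedHashSet<>();
        if (!cleaned.isEmpty()) {
            words.addAll(Arrays.asList(cleaned.split(" ")));
        }
        return words;
    }

    public static void main(String[] args) {
        String answer = "'[|]'(drepanocytocis, '[|]'(drepanocytocis, '[]'))";
        System.out.println(String.join(", ", distinctWords(answer)));
    }
}
```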
From a server, I get strings of the following form:
String x = "fixedWord1:var1 data[[fixedWord2:var2 fixedWord3:var3 data[[fixedWord4] [fixedWord5=var5 fixedWord6=var6 fixedWord7=var7]]] , [fixedWord2:var2 fixedWord3:var3 data[[fixedWord4][fixedWord5=var5 fixedWord6=var6 fixedWord7=var7]]]] fixedWord8:fixedWord8";
(only spaces divide groups of word-var pairs)
Later, I want to store them in a HashMap, like myHashMap.put(fixedWord1, var1); and so on.
Problem:
Inside the first "data[......]"-tag, the number of other "data[..........]"-tags is variable, and I don't know the length of the string in advance.
I don't know how to process such Strings without resorting to String.split(), which is discouraged by our assignment task givers (university).
I have searched the internet and couldn't find appropriate websites explaining such things.
It would be of great help, if experienced people could give me some links to websites or something like a "diagrammatic plan" so that I could code something.
EDIT:
I made a mistake in the string (please don't lynch me). The correct string is below, with fixedWord7=var7 changed to fixedWord7=[var7]:
String x = "fixedWord1:var1 data[[fixedWord2:var2 fixedWord3:var3 data[[fixedWord4] [fixedWord5=var5 fixedWord6=var6 fixedWord7=[var7]]]] , [fixedWord2:var2 fixedWord3:var3 data[[fixedWord4][fixedWord5=var5 fixedWord6=var6 fixedWord7=[var7]]]]] fixedWord8:fixedWord8";
I assume your string always follows the same pattern, built from "data", "[" and "]", and that the variable names and values never contain these strings.
First, remove the strings "data[", "[", "]" and "," from the original string. Use replace() rather than replaceAll() here, because replaceAll() treats "[" as regex syntax:
x = x.replace("data[", "");
x = x.replace("[", "");
etc.
Next, split the string on spaces, either with a StringTokenizer or by looping through the string character by character.
You will then get an array of strings like
fixedWord1:var1
fixedWord2:var2
......
fixedWord4
fixedWord5=var5
......
Finally, split each substring on ":" or "=" and put the name/value pair into the Map.
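The steps above can be sketched as follows. This is one possible reading, not a definitive implementation. PairParser and parse are hypothetical names. It uses String.replace, which treats its argument as literal text, so "[" needs no escaping. One deviation from the description: "][" is turned into a space before the remaining brackets are removed, so that adjacent groups such as [fixedWord4][fixedWord5=var5 do not fuse into one token. Bare words are stored with an empty value, and a repeated key overwrites the earlier entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

class PairParser {
    static Map<String, String> parse(String input) {
        String flat = input.replace("data[", "")
                .replace("][", " ")  // keep adjacent groups apart
                .replace("[", "")
                .replace("]", "")
                .replace(",", " ");
        Map<String, String> result = new LinkedHashMap<>();
        StringTokenizer tokens = new StringTokenizer(flat, " ");
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            int sep = token.indexOf(':');
            if (sep < 0) {
                sep = token.indexOf('=');
            }
            if (sep < 0) {
                result.put(token, ""); // bare word such as fixedWord4
            } else {
                result.put(token.substring(0, sep), token.substring(sep + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String x = "fixedWord1:var1 data[[fixedWord2:var2 data[[fixedWord4]"
                + "[fixedWord5=var5 fixedWord7=[var7]]]]] fixedWord8:fixedWord8";
        System.out.println(parse(x));
    }
}
```

If the repeated groups need to be kept apart rather than overwritten, a flat map is not enough and the nesting would have to be tracked, for example with a list of maps.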
The problem is not entirely clear, but maybe something like this will work for you:
Pattern p = Pattern.compile("\\b(\\w+)[:=]\\[?(\\w+)");
Matcher m = p.matcher( x );
while( m.find() ) {
System.out.println( "matched: " + m.group(1) + " - " + m.group(2) );
hashMap.put ( m.group(1), m.group(2) );
}
I am lost when it comes to building regex strings. I need a regular expression that does the following.
I have the following strings:
[~class:obj]
[~class|class2|more classes:obj]
[!class:obj]
[!class|class2|more classes:obj]
[?method:class]
[text]
A string can contain several of the above. An example string would be "[if] [!class:obj]"
I want to know what is in between the [] and broken into match groups. For example, the first match group would be the symbol if present (~|!|?) next what is before the : so that could be class or class|class2|etc... then what is on the right of the : and stop before the ]. There may be no : and what goes before it, but just something between the [].
So, how would I go about writing this regex? And is it possible to give the match group names so I know what it matched?
This is for a java project.
If you're sure enough of your inputs, you can probably use something like /\[(\~|\!|\?)?(?:((?:[^:\]]*?)+):)?([^\]]+?)\]/. (to translate that into Java, you'll want to escape the backslashes and use quotation marks instead of forward slashes)
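On the question of naming the match groups: Java 7 and later support named groups written as (?<name>...). The sketch below uses a simplified variant of the pattern above rather than a literal translation. The group names (symbol, left, right) and the class name are chosen here for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // Named groups need Java 7 or later.
    static final Pattern TAG = Pattern.compile(
            "\\[(?<symbol>[~!?])?(?:(?<left>[^:\\]]*):)?(?<right>[^\\]]+)\\]");

    static String describe(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            // group(name) returns null when an optional group did not take part in the match
            out.append(m.group("symbol")).append(';')
               .append(m.group("left")).append(';')
               .append(m.group("right")).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("[if] [!class|class2:obj]"));
    }
}
```

The left part is returned as one string such as class|class2, which can then be split on | if the individual class names are needed.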
Here are some web sites that might be helpful:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
http://txt2re.com/index.php3?s=Test+test+june+2011+test&submit=Show+Matches
http://www.regexplanet.com/simple/
I believe that this should work:
/\[(.*?)(?:\|(.*?))*\]/
Also:
[a-z]*
Try this code
final Pattern
outerP = Pattern.compile("\\[.*?\\]"),
innerP = Pattern.compile("\\[([~!?]?)([^:]*):?(.*)\\]");
for (String s : asList(
"[~class:obj]",
"[if][~class:obj]",
"[~class|class2|more classes:obj]",
"[!class:obj]",
"[!class|class2|more classes:obj]",
"[?method:class]",
"[text]"))
{
final Matcher outerM = outerP.matcher(s);
System.out.println("Input: " + s);
while (outerM.find()) {
final Matcher m = innerP.matcher(outerM.group());
if (m.matches()) System.out.println(
m.group(1) + ";" + m.group(2) + ";" + m.group(3));
else System.out.println("No match");
}
}
For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).
What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.
Reassign the variable to a substring:
s = s.substring(0, s.length() - 1)
Also an alternative way of solving your problem: you might also want to consider using a StringTokenizer to read the file and set the delimiters to be the characters you don't want to be part of words.
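A minimal sketch of the StringTokenizer suggestion follows. The delimiter set and the class name are assumptions for illustration, so add whatever other characters should not be part of a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordSplitter {
    static List<String> words(String text) {
        // Every character in the second argument is treated as a delimiter.
        StringTokenizer tokens = new StringTokenizer(text, " \t\n,.()");
        List<String> result = new ArrayList<>();
        while (tokens.hasMoreTokens()) {
            result.add(tokens.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Some words end with a comma, or a full stop. (Or brackets.)"));
    }
}
```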
Use:
String str = "whatever";
str = str.replaceAll("[,.]", "");
replaceAll takes a regular expression. This:
[,.]
...looks for each comma and/or period.
To remove the last character do as Mark Byers said
s = s.substring(0, s.length() - 1);
Additionally, another way to remove the characters you don't want would be to use the .replace(oldCharacter, newCharacter) method.
as in:
s = s.replace(",","");
and
s = s.replace(".","");
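Note that replace removes the character everywhere in the string, and substring drops the last character whatever it is. If only punctuation at the edges of a word should go, a regex anchored to both ends is one option. This is a sketch with hypothetical names, and \p{Punct} covers ASCII punctuation only.

```java
class EdgeTrimmer {
    // Removes punctuation at the start and end of a word, leaving inner characters alone.
    static String trimPunctuation(String word) {
        return word.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        for (String w : "However, some words (like these) end badly.".split("\\s+")) {
            System.out.println(trimPunctuation(w));
        }
    }
}
```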
You can't modify a String in Java. They are immutable. All you can do is create a new string that is substring of the old string, minus the last character.
In some cases a StringBuffer might help you instead.
The best method is what Mark Byers explains:
s = s.substring(0, s.length() - 1)
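Be careful with backslashes. replaceAll takes a regular expression, so replacing \ with a space " " does not work if you write
s = s.replaceAll("\\", " ");
because the regex is then a single dangling backslash, which throws a PatternSyntaxException. The backslash has to be escaped for the regex as well:
s = s.replaceAll("\\\\", " ");
or, to strip only a trailing backslash from a path:
s = s.replaceAll("\\\\$", "");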
Note that word boundaries also depend on the Locale. I think the best way to do this is to use the standard java.text.BreakIterator. Here is an example from the java.sun.com tutorial.
import java.text.BreakIterator;
import java.util.Locale;

public class WordExtractor {
    public static void main(String[] args) {
        String text = "\n" +
                "\n" +
                "For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).\n" +
                "\n" +
                "What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.\n" +
                "\n" +
                "Every help appreciated. Thanx";
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.getDefault());
        extractWords(text, wordIterator);
    }

    // Prints every token that starts with a letter or digit, skipping punctuation and whitespace.
    static void extractWords(String target, BreakIterator wordIterator) {
        wordIterator.setText(target);
        int start = wordIterator.first();
        int end = wordIterator.next();
        while (end != BreakIterator.DONE) {
            String word = target.substring(start, end);
            if (Character.isLetterOrDigit(word.charAt(0))) {
                System.out.println(word);
            }
            start = end;
            end = wordIterator.next();
        }
    }
}
Source: http://java.sun.com/docs/books/tutorial/i18n/text/word.html
You can use the replaceAll() method. Strings are immutable, so assign the result back:
str = str.replaceAll(",", "");
str = str.replaceAll("\\.", "");
str = str.replaceAll("\\(", "");
etc.