Using regex to remove JSON quotes - java

I am being given some JSON from an external process that I can't change, and I need to modify this JSON string for a downstream Java process to work. The JSON string looks like:
{"widgets":"blah","is_dog":"1"}
But it needs to look like:
{"widgets":blah,"is_dog":"1"}
I have to remove the quotes around blah. In reality, blah is a huge JSON object, and so I've simplified it for the sake of this question. So I figured I'd attack the problem by doing two String#replaceAll calls, one before blah, and one after it:
dataString = dataString.replaceAll("{\"widgets\":\"", "{\"widgets\":");
dataString = dataString.replaceAll("\",\"is_dog\":\"1\"}", ",\"is_dog\":\"1\"}");
When I run this I get a vague runtime error:
Illegal repetition
Can any regex maestros spot where I'm going awry? Thanks in advance.

I believe you need to escape braces. Braces are used for repetition ((foo){3} looks for foo three times in a row); hence the error.
Note: in this case it needs to be double escaping: \\{.

{ and } have special meaning in regex: they delimit a repetition count for the preceding pattern. So the opening brace has to be escaped here.
Use "\\{\"widgets\":\"", "{\"widgets\":" instead of "{\"widgets\":\"", "{\"widgets\":". Only the first argument is a regex, so only its brace needs escaping.
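A minimal sketch of both ways around the error, assuming the input shape from the question. The class and method names are made up for illustration. The first method escapes the braces for replaceAll; the second uses String.replace, which treats its arguments as literal text, so nothing needs escaping.

```java
class QuoteStripper {
    // Regex version: "\\{" in Java source is the regex \{ (a literal brace).
    static String withReplaceAll(String data) {
        String out = data.replaceAll("\\{\"widgets\":\"", "{\"widgets\":");
        return out.replaceAll("\",\"is_dog\":\"1\"\\}", ",\"is_dog\":\"1\"}");
    }

    // Literal version: String.replace does no regex interpretation at all.
    static String withReplace(String data) {
        String out = data.replace("{\"widgets\":\"", "{\"widgets\":");
        return out.replace("\",\"is_dog\":\"1\"}", ",\"is_dog\":\"1\"}");
    }

    public static void main(String[] args) {
        String in = "{\"widgets\":\"blah\",\"is_dog\":\"1\"}";
        System.out.println(withReplaceAll(in)); // {"widgets":blah,"is_dog":"1"}
        System.out.println(withReplace(in));    // {"widgets":blah,"is_dog":"1"}
    }
}
```

Both variants still depend on the exact text around blah, so they break as soon as the surrounding keys change.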

Since the input string looks to be valid JSON, your best bet would be to parse it with an actual parser into a map-like structure. Regexes are not the right tool for this. Serializing this structure to something that is not quite JSON would then be relatively simple.

I do wonder if you're better off taking the code for JSONObject and modifying the toString() method to make this a more reliable transformation than using regexps. Here's the source code, and you're looking for invocations of the quote() method.

Well, why don't you simply do the following?
1) Decode the first JSON (which is correct with quotes) into varJSON1
2) Get the String "blah" in varJSON1 into varJSON2
3) Then decode the varJSON2

Related

Regex for IP and string

I'm using this online regex test site.
Here is the regex I'm using:
\{"ip":"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$","iphone":"admin/ios","dev":\{"action":"CUS","from":"REG","CUSA":"ADVERT"\}\}
And I'm trying to match it against:
{"ip":"192.168.50.5","iphone":"admin/ios","dev":{"action":"CUS","from":"REG","CUSA":"ADVERT"}}
When I run the test, it doesn't match. I need it to match on the site above for validation reasons.
A different perspective: it seems that it is already pretty hard to come up with a regex that works for you in the first place. What does this tell you about how hard it will be to maintain this regex in the future, and maybe extend it?
What I am saying is: regexes are a good tool, but sometimes overrated. This looks like a string in JSON format. Wouldn't it be better to just take it as that, and use a garden-variety JSON parser instead of trying to build your own regex?
You see, what will be more robust over time: your home-baked regex, or some standard library that millions of people are using?
One place to read about JSON parsers would be this question here.
This will be enough for your context (note the escaped dots; an unescaped . matches any character):
"ip":"(\d+)\.(\d+)\.(\d+)\.(\d+)"
Edit:
Regex is not meant for structured data processing, but most of the time you just need a solution that works. When the sample data changes and no longer matches, you update the regex to match it again.
Since you want to get four numbers inside a quote pair after a key called "ip", this regex will definitely do it.
If you want something else, please provide more context. Thanks!
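One likely reason the pattern in the question never matches is the $ right after the fourth octet: outside multiline mode, $ only matches at the end of the input, yet the pattern goes on to require more text after it. The sketch below is a hedged Java version that keeps the question's octet range check, escapes the dots, and drops the $. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpFieldMatcher {
    // One octet in the range 0-255, taken from the question's regex.
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Dots are escaped so they match only a literal '.', and there is no '$'
    // in the middle of the pattern.
    private static final Pattern IP_FIELD = Pattern.compile(
            "\"ip\":\"" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "\"");

    // Returns the dotted address found after the "ip" key, or null if none.
    static String extractIp(String json) {
        Matcher m = IP_FIELD.matcher(json);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4);
    }

    public static void main(String[] args) {
        String input = "{\"ip\":\"192.168.50.5\",\"iphone\":\"admin/ios\"}";
        System.out.println(extractIp(input)); // 192.168.50.5
    }
}
```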

split JSON and string in android

My HTTP Request responds with combination of string and JSON, something like this:
null{"username:name","email:email"}
I need only the JSON part.
I tried parsing it directly as a JSON object, which of course did not work. I tried splitting it with serverResponse.split("{"), but Android rejects this because "{" on its own is not a valid pattern. Any suggestions on how I can achieve this?
String.split uses regular expressions, and since '{' is a special character in regular expressions, you should escape it like this: serverResponse.split("\\{").
It would be better to change the server side, but you can also just use split. The only thing you need to do is escape your {.
String json = serverResponse.split("\\{")[1];
Splitting JSON like this is a bad idea and bad practice. If the response changes on the server side one day, the split may pick the wrong part of your JSON object.
I recommend you to PARSE it, even if it is simple and small.
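Note that split("\\{")[1] drops the opening brace and also cuts at any nested brace. A minimal sketch of an alternative, assuming the junk prefix never contains a brace: take everything from the first { to the last } and hand that to a real JSON parser. The class and method names are hypothetical.

```java
class JsonPartExtractor {
    // Returns the substring from the first '{' through the last '}',
    // or null when the response contains no such pair.
    static String extractJson(String response) {
        int start = response.indexOf('{');
        int end = response.lastIndexOf('}');
        if (start < 0 || end < start) {
            return null;
        }
        return response.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String serverResponse = "null{\"username:name\",\"email:email\"}";
        System.out.println(extractJson(serverResponse)); // {"username:name","email:email"}
    }
}
```

This involves no regex at all, so nothing needs escaping.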

Replace multiple backslashes with HTML entities in Java string for JSP generated Json

After hours of googling and searching within SO, I have finally come to the point where I need to ask you! :)
Situation is the following:
A webservice delivers data in a CDATA. This data is parsed and put into our model. Using Spring MVC we access the model inside the JSP files to create....here comes the point... JSON! Don't ask, historically! ;-)
Now, somehow someone came to the glorious idea to put multiple (back)slashes into a title property. The getTitle() method returns the string "/// Glasvegas \\". This of course doesn't work if we do a JavaScript eval() on the JSON (created within the JSP) to get the JavaScript JSON object. It simply interprets the backslashes as a comment, making the JSON invalid.
I tried to use the escapeHtml() methods from apache.common and springframework, but they both just ignore the backslashes while encoding all other special chars correctly.
Then I tried to write my own method:
public static String escapeHTML(String string) {
    String foreslash="&#92;";
    String regex="\\\\";
    System.out.println(string.replaceAll(regex,foreslash));
    string.replaceAll(regex,foreslash);
    return string;
}
In the console output the string is correctly replaced, but if I break at the return and inspect the variable 'string' in the debugger, it's still "/// Glasvegas \\". The same goes for the generated JSP.
So, I'm kind of lost here.
Regards,
ASP
Strings are immutable. The name of the method "replaceAll" makes it sound as though you're modifying the string object itself, but you're not: the method just returns the result of the operation. That is why you get the correct output from the System.out.println. The standalone call on the next line computes the same result and then discards it, so the variable string still refers to the original value ;)
Try rewriting the end of your method like this:
System.out.println(string.replaceAll(regex,foreslash));
return string.replaceAll(regex,foreslash);
Also, the variable name "foreslash" makes it sound as though 92 is the code for a forward slash, but character code 92 is the backslash (the forward slash is 47). Your regular expression then looks for a backslash, which makes the name a bit confusing!
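A small sketch of the fix, assuming the intended replacement is the HTML entity &#92; (character code 92, the backslash). The class and method names are hypothetical. It returns the result of replaceAll and shows that the original string is left untouched.

```java
class BackslashEscaper {
    // "\\\\" in Java source is the regex \\ which matches one literal backslash.
    // The result of replaceAll must be returned (or assigned); the receiver
    // string itself never changes.
    static String escapeBackslashes(String input) {
        return input.replaceAll("\\\\", "&#92;");
    }

    public static void main(String[] args) {
        String title = "/// Glasvegas \\\\";          // ends with two backslashes
        String escaped = escapeBackslashes(title);
        System.out.println(title);   // /// Glasvegas \\
        System.out.println(escaped); // /// Glasvegas &#92;&#92;
    }
}
```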

Need a little help on this regular expression

I have a Java string which looks like this, it is actually an XML tag:
"article-idref="527710" group="no" height="267" href="pc011018.pct" id="pc011018" idref="169419" print-rights="yes" product="wborc" rights="licensed" type="photo" width="322" "
Now I want to remove the article-idref="527710" segment by using a regular expression. I came up with the following one:
trimedString.replaceAll("\\article-idref=.*?\"","");
but it doesn't seem to work. Could anybody give me an idea of where I went wrong in my regular expression? I need this to be represented as a String in my Java class, so HTMLParser probably won't help me a lot here.
Thanks in advance!
Try this:
trimedString.replaceAll("article-idref=\"[^\"]*\" *","");
I corrected the regular expression by adding quotes and a word boundary (to prevent false matches). Also, in case you didn't, remember to reassign to your string after the replacement:
trimmedString = trimmedString.replaceAll("\\barticle-idref=\".*?\"", "");
See it working at ideone.
Also since this is from an XML document it might be better to use an XML parser to extract the correct attributes instead of a regular expression. This is because XML is quite a complex data format to parse correctly. The example in your question is simple enough. However a regular expression could break on a more complex case, such as a document that includes XML comments. This could be an issue if you are reading data from an untrusted source.
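A hedged sketch that combines the two regex suggestions above: a word boundary at the start, a negated character class for the value, and optional trailing whitespace. The class and method names are invented for illustration, and the input is assumed to be a flat attribute string like the one in the question.

```java
class AttributeRemover {
    // \b stops the match from starting in the middle of a longer word
    // (for example inside "particle-idref"); [^"]* stops at the closing
    // quote; \s* also removes the whitespace that followed the attribute.
    static String removeArticleIdref(String attrs) {
        return attrs.replaceAll("\\barticle-idref=\"[^\"]*\"\\s*", "");
    }

    public static void main(String[] args) {
        String attrs = "article-idref=\"527710\" group=\"no\" height=\"267\"";
        // The result has to be assigned; replaceAll does not change attrs.
        String trimmed = removeArticleIdref(attrs);
        System.out.println(trimmed); // group="no" height="267"
    }
}
```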
If you are sure that article-idref is always at the beginning, try this:
// removes everything from the beginning through the first run of whitespace
trimedString = trimedString.replaceFirst("^\\S+\\s+","");
Be sure to assign the result to trimedString again, since replaceFirst does not modify the string itself but returns another string.

Split textual script into substrings by pattern

Consider following script (it's total nonsense in pseudo-language):
if (Request.hostMatch("asfasfasf.com") && someString.existsIn(new String[] {"brr", "hrr"})) {
    if (Request.clientIp("10.0.x.x")) {
        somevar = "1";
    }
    somevar = "2";
}
else {
    somevar = "first";
}
string foo = "foo";
// etc. etc.
How would you grab if-block's parameters and contents from it? The if-block has format of:
if<whitespace>(<parameters>)<whitespace>{<contents>}<anything>
I tried using String.split() with a regex pattern of ^if\s*\(|\)\s*\{|\}\s* but this fails miserably. Namely, the problem is that ) { is also found in the inner if-block, and the closing } is found in many places as well. I don't think either lazy or greedy matching works here.
So... any pointers to what might I need here in order to implement this with regex?
I also need to get the remaining string without the if-block's code (so code starting from else { ...). Using just String.split() seems to make it difficult as there is no information about the length of the parts that were parsed away.
I initially created a loop based solution (using String.substring() heavily) for this, but it's dull. I would like to have something fancier instead. Should I go with regex or create a custom, generic function (there are many other cases than just this) that takes the parseable String and the pattern instead (consider the if<whitespace>(... pattern above)?
Edit: Changed returns to variable assignments as it would have not made sense otherwise.
You'd be far better off using (or writing) a parser than trying to do this with Regex.
Regex is great for some things, but for complex parsing like this, it sucks. Another example where it sucks that gets asked a lot here is parsing HTML: you can do it to a limited degree, but for anything complex, a DOM parser is a much better solution.
For a [very] simple parser, what you need is a recursive function that searches for the braces { and }, recursing down a level each time it comes across an opening brace, and returning back up a level when it finds a closing brace. It then needs to store the string contents between the two braces at each level.
A regular language won't work because a regular grammar can't match things like "some number of open parentheses followed by the same number of close parentheses". A context-free grammar would be needed for that.
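A minimal sketch of that idea, using a depth counter in a loop rather than actual recursion. It assumes the script starts with the if-block and that no braces or parentheses appear inside string literals or comments. The class and method names are hypothetical.

```java
class IfBlockSplitter {
    // Index of the delimiter that closes the one at openIdx, or -1 if the
    // input is unbalanced. Depth goes up on 'open' and down on 'close'.
    static int findMatching(String s, int openIdx, char open, char close) {
        int depth = 0;
        for (int i = openIdx; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                depth++;
            } else if (c == close && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    // Returns {parameters, contents, remainder} for the first if-block,
    // or null when the parentheses or braces do not balance.
    static String[] splitFirstIf(String script) {
        int parenOpen = script.indexOf('(');
        if (parenOpen < 0) return null;
        int parenClose = findMatching(script, parenOpen, '(', ')');
        if (parenClose < 0) return null;
        int braceOpen = script.indexOf('{', parenClose);
        if (braceOpen < 0) return null;
        int braceClose = findMatching(script, braceOpen, '{', '}');
        if (braceClose < 0) return null;
        return new String[] {
            script.substring(parenOpen + 1, parenClose),
            script.substring(braceOpen + 1, braceClose),
            script.substring(braceClose + 1)
        };
    }

    public static void main(String[] args) {
        String script = "if (a(\"x\") && b(new String[] {\"p\"})) "
                + "{ if (c) { x = 1; } y = 2; } else { z = 3; }";
        for (String part : splitFirstIf(script)) {
            System.out.println("[" + part.trim() + "]");
        }
    }
}
```

Matching the parentheses first matters here, because the parameters themselves can contain braces (as in new String[] {...}). The third element is the remaining text after the if-block, which the question also asks for.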
Unless you use a context-free grammar parser for Java or a regular expression extension that makes regular expressions no longer regular, your loop-based solution is probably the fanciest solution.
As per the above, you'll need a parser. One type that's easy to implement (and fun to write!) is a recursive descent parser with backtracking. There is also a plethora of parser generators out there, though most of those have a learning curve. One Java-friendly parser generator is JavaCC.
