Detect " in a JSON-String and escape it at certain position - java

I receive a string which contains a JSON object, unfortunately one of the values in that JSON-String looks like this:
{
"name":"Content-Type",
"value":"multipart/alternative; boundary="=-SITt2U5w3MJ1Y3RihaWzxw==""
}
As you can see, the value in this JSON object contains two " characters that need to be escaped so that the result looks like this:
{
"name":"Content-Type",
"value":"multipart/alternative; boundary=\"=-SITt2U5w3MJ1Y3RihaWzxw==\""
}
Then parsing into a Java POJO works. Unfortunately I can't detect it reliably: the object above is one of many that arrive in an array. I know I could detect 'boundary' as a keyword and escape the next two " characters. I tried to get that working with regular expressions, but had no success.
What could I do here?

If the intention is to escape the double quotes using a regex, the snippet below can help.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class EscapeBoundaryQuotes {
    public static void main(String[] args) {
        String text = "{\n" +
            " \"name\":\"Content-Type\",\n" +
            " \"value\":\"multipart/alternative; boundary=\"=-SITt2U5w3MJ1Y3RihaWzxw==\"\"\n" +
            "}";
        Pattern pattern = Pattern.compile("boundary=\"([^\"]*?)\"");
        Matcher matcher = pattern.matcher(text);
        // $1 re-inserts the captured value for each match; \\\\ in the replacement is one literal backslash.
        System.out.println(matcher.replaceAll("boundary=\\\\\"$1\\\\\""));
    }
}
It produces the following output:
{
"name":"Content-Type",
"value":"multipart/alternative; boundary=\"=-SITt2U5w3MJ1Y3RihaWzxw==\""
}
Regex pattern: boundary="([^"]*?)"
Explanation: after boundary=", lazily match every character up to the next ". The replacement reuses the capture group and adds the escape characters around it.
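Since the question mentions that the broken object is one of many in an array, here is a minimal sketch of the same idea applied to a whole array string. The class and method names (BoundaryQuoteEscaper, escapeBoundaryQuotes) and the sample boundary values are made up for illustration, and it assumes the boundary value itself never contains a " character.

```java
import java.util.regex.Pattern;

class BoundaryQuoteEscaper {
    // Matches boundary="..." only when the opening quote is not yet escaped.
    private static final Pattern BOUNDARY = Pattern.compile("boundary=\"([^\"]*)\"");

    static String escapeBoundaryQuotes(String json) {
        // In the replacement, \\\\ is one literal backslash and $1 is the captured value.
        return BOUNDARY.matcher(json).replaceAll("boundary=\\\\\"$1\\\\\"");
    }

    public static void main(String[] args) {
        // Hypothetical array with two broken header objects.
        String json = "[{\"name\":\"Content-Type\",\"value\":\"multipart/alternative; boundary=\"=-AAA==\"\"},"
                + "{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\"=-BBB==\"\"}]";
        System.out.println(escapeBoundaryQuotes(json));
    }
}
```

Because the pattern requires the quote to follow boundary= directly, input that is already escaped is left unchanged, so running the method twice should be harmless.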

Related

Java: Weirdness in replaceAll RegEx

I'm trying to manipulate a String in Java to recognize the markdown options in Facebook Messenger.
I tested the RegEx in a couple of online testers and it worked, but when I tried to implement in Java, it's only recognizing text surrounded by underscores. I have an example that shows the problem here:
private String process(String input) {
String processed = input.replaceAll("(\\b|^)\\_(.*)\\_(\\b|$)", "underscore")
.replaceAll("(\\b|^)\\*(.*)\\*(\\b|$)", "star")
.replaceAll("(\\b|^)```(.*)```(\b|$)", "backticks")
.replaceAll("(\\b|^)\\~(.*)\\~(\\b|$)", "tilde")
.replaceAll("(\\b|^)\\`(.*)\\`(\\b|$)", "tick")
.replaceAll("(\\b|^)\\\\\\((.*)\\\\\\)(\\b|$)", "backslashparen")
.replaceAll("\\*", "%"); // am I matching stars wrong?
return processed;
}
public void test() {
String example = "_Text_\n" +
"*text*\n" +
"~Text~\n" +
"`Text`\n" +
"_Text_\n" + // is it only matching the first one?
"``` Text ```\n" +
"\\(Text\\)\n" +
"~Text~\n";
System.out.println(process(example));
}
I expected all the lines to match and be replaced, but only the first line was matched. I wondered if that was because it was the first line, so I copied it into the middle and it matched both. Then I figured I might have missed something when matching the special characters, so I added the snippet that matches the asterisks and replaces them with a percent sign, and that worked. The output I'm getting looks like this:
underscore
%text%
~Text~
`Text`
underscore
``` Text ```
\(Text\)
~Text~
Any ideas what I might be missing?
Thanks.
If you're using word boundaries, there is no need to add anchors in an alternation, because when the adjacent character is a word character, \b also matches at the start and end of the input. So these are redundant:
(?:^|\b)
(?:\b|$)
and both can be replaced by just \b.
However, note that of the delimiters in your regex only the underscore is a word character; *, ~ and ` are not. \b therefore cannot match next to those characters, and \B, the inverse of \b, should be used instead.
Some further improvements are possible, such as using a negated character class instead of the greedy .* and removing the unnecessary groups.
Code:
class MyRegex {
public static void main (String[] args) {
String example = "_Text_\n" +
"*text*\n" +
"~Text~\n" +
"`Text`\n" +
"_Text_\n" + // is it only matching the first one?
"``` Text ```\n" +
"\\(Text\\)\n" +
"~Text~\n";
System.out.println(process(example));
}
private static String process(String input) {
String processed = input.replaceAll("\\b_[^_]+_\\b", "underscore")
.replaceAll("\\B\\*[^*]+\\*\\B", "star")
.replaceAll("\\B```.+?```\\B", "backticks")
.replaceAll("\\B~[^~]+~\\B", "tilde")
.replaceAll("\\B`[^`]+`\\B", "tick")
.replaceAll("\\B\\\\\\(.*?\\\\\\)\\B", "backslashparen");
return processed;
}
}
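To see why \b fails next to * while \B succeeds, a small check like the following may help. It is only an illustrative sketch; the class name BoundaryDemo and the helper found are not from the answer.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // '_' is a word character, so a boundary exists between the line start and '_'.
        System.out.println(found("\\b_[^_]+_\\b", "_Text_"));      // true
        // '*' is not a word character: there is no boundary between the line start and '*'.
        System.out.println(found("\\b\\*[^*]+\\*\\b", "*text*"));  // false
        // \B matches exactly at the positions where \b does not.
        System.out.println(found("\\B\\*[^*]+\\*\\B", "*text*"));  // true
    }
}
```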

Remove a substring if the string ends with it in Java

I have to remove "OR" from a given string if the string ends with it.
public class StringReplaceTest {
public static void main(String[] args) {
String text = "SELECT count OR %' OR";
System.out.println("matches:" + text.matches("OR$"));
Pattern pattern = Pattern.compile("OR$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
System.out.println("substring:" + text.substring(matcher.start(), matcher.end()));
text = text.replace(text.substring(matcher.start(), matcher.end()), "");
System.out.println("after replace:" + text);
}
}
}
Output:
matches:false
Found match at: 19 to 21
substring:OR
after replace:SELECT count %'
It removes every occurrence of "OR", but I only need to remove it when the string ends with it.
How can I do that?
Also, the regex works with Pattern but not with String.matches().
What is the difference between the two, and what is the best way to remove a substring only when the string ends with it?
Use text.matches(".*OR$"), because String.matches() has to match the entire string.
Or:
if (text.endsWith("OR"))
Or:
text = text.replaceFirst(" OR$", "");
If you just need to remove the trailing OR, I suggest plain string methods, which avoid compiling a regex. Check endsWith first, because lastIndexOf alone would also cut at an OR in the middle of the string (and returns -1 when there is none):
if (text.endsWith("OR")) text = text.substring(0, text.lastIndexOf("OR"));
If you need to replace the trailing OR with something else, use replaceFirst with a pattern that matches OR only at the end of the string, preceded by a word boundary:
text = text.replaceFirst("\\bOR$", "SOME");
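The difference between matches() and find(), and the trailing-only removal, can be sketched as follows. The class name TrailingOrRemover and the method stripTrailingOr are illustrative, and dropping the whitespace before the keyword with \s* is an assumption about what the asker wants.

```java
import java.util.regex.Pattern;

class TrailingOrRemover {
    static String stripTrailingOr(String text) {
        // \s* also drops the whitespace before the keyword; $ anchors the match at the end.
        return text.replaceFirst("\\s*\\bOR$", "");
    }

    public static void main(String[] args) {
        String text = "SELECT count OR %' OR";
        // matches() must consume the whole string, so "OR$" alone fails.
        System.out.println(text.matches("OR$"));                          // false
        System.out.println(text.matches(".*OR$"));                        // true
        // find() only needs some substring to match.
        System.out.println(Pattern.compile("OR$").matcher(text).find());  // true
        System.out.println(stripTrailingOr(text));                        // SELECT count OR %'
    }
}
```

The word boundary keeps a string that merely ends in the letters OR, such as "WAIT FOR", untouched.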

Match string between multiple brackets

I have a very long JSON string. I would like to filter it and get only the data between the first pair of brackets. The problem is that it contains many other brackets, so my regex pattern does not work properly.
Here is the JSON string:
String jsondata = "["
+"{"
+ "test: 63453645"
+"date: 2016-07-17"
+"{"
+ "id:534534"
+"}"
+ "blank : null"
+ "flags : null"
+ "}"
+"{"
+ "test: 543564236"
+"date: 2014-07-17"
+"{"
+ "id:6532465"
+"}"
+ "blank : null"
+ "flags : null"
+ "}"
+"]";
pattern = "\\{[^{}]*\\}";
pr = Pattern.compile(pattern);
math = pr.matcher(jsondata);
if (math.find()) {
System.out.println(math.group());
}
else
System.out.println("nomatch");
The problem with my pattern is that it only prints up to the first } after the id:, but I want it to end at the last }, the one after flags : null.
I also only want to print the first match, not the block after it, because the blocks all start and end with the same characters; that is why I use an if statement instead of a while loop.
Any suggestions? Thank you!
Regex with multiple brackets seems like a very difficult task. Can I match on the last string instead, starting from { and ending at flags : null?
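As I said in the comments, I usually use JSON-Simple; its decoding tutorial is a good place to start.
Decoding would look somewhat like this (JSONValue.parse returns Object, so a cast is needed, and your top-level value is an array):
JSONArray arr = (JSONArray) JSONValue.parse(jsondata);
JSONObject obj = (JSONObject) arr.get(0);
obj.get("test");
PS: I see some errors in your JSON data. Use jsonlint to check whether it is formatted correctly.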
This will grab everything between the first { and the } that closes the first object:
String guts = jsondata.replaceAll("(?s)^.*?\\{(.*?flags : null[^}]*).*$", "$1");
The regex captures everything after the first { up to the first occurrence of your marker text (flags : null), plus any non-} characters that follow it.
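If the marker text is not always present, an alternative that is not taken from the answers above is to skip the regex and count brace depth. The following is a minimal sketch under the assumption that braces never appear inside quoted values; the names FirstBraceBlock and firstBalancedBlock are made up for illustration.

```java
class FirstBraceBlock {
    // Returns the first balanced {...} block including its braces, or null if there is none.
    static String firstBalancedBlock(String s) {
        int start = s.indexOf('{');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                // Depth is back to zero: this brace closes the first block.
                return s.substring(start, i + 1);
            }
        }
        return null; // the braces are unbalanced
    }

    public static void main(String[] args) {
        String data = "[{test: 1{id:2}flags : null}{test: 3{id:4}flags : null}]";
        System.out.println(firstBalancedBlock(data)); // {test: 1{id:2}flags : null}
    }
}
```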

How to ignore characters before and after my pattern?

I need some help creating a regex (in java) to find a pattern like:
(30.6284, -27.3493)
It's roughly a latitude longitude pair. Building from smaller pieces, I've come up with this:
String def = "\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)";
which works ok if I don't have any characters before or after the parenthesis. So this fails:
"hello (30.6284, -27.3493) "
but it'll work if I remove the "hello " before and the trailing whitespace. How can I ignore any other sequence of characters before and after the expression?
Thanks
You can use the following piece of code to find and extract multiple instances of the pattern in your text.
String def = "\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)";
String text = "hello (30.6284, -27.3493) (30.6284, -27.3493) ";
Pattern p = Pattern.compile(def);
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(text.substring(m.start(), m.end()));
}
I came up with this using this website: http://regexpal.com/ and http://www.regextester.com/
\(-?\d+\.?\d+, -?\d+\.?\d+\)
This will match but not capture, and it probably isn't in your language-specific format (but it should be easy to modify). To support capturing, you could use this one:
\((-?\d+\.?\d+), (-?\d+\.?\d+)\)
String s = "hello (30.6284, -27.3493) ";
System.out.println(s.replaceAll(".*(\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)).*","$1"));
output:
(30.6284, -27.3493)
Note that if you're going to loop to find every occurrence, drop the leading and trailing .* (otherwise the first find() consumes the whole input) and use something like this:
Matcher m = Pattern.compile("(\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\))").matcher(s);
while(m.find()){
System.out.println(m.start()+ " " + m.group(1));
}
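The original pattern most likely failed because it was tested with a whole-string match; find() searches inside the text instead. Here is a minimal sketch that also turns the two capture groups into numbers. The names LatLonFinder and findPairs are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LatLonFinder {
    private static final Pattern PAIR =
            Pattern.compile("\\((-?\\d+\\.\\d+),\\s*(-?\\d+\\.\\d+)\\)");

    // Collects every (lat, lon) pair found anywhere in the text.
    static List<double[]> findPairs(String text) {
        List<double[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))
            });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "hello (30.6284, -27.3493) and (1.5,2.25) ";
        // matches() needs the entire string to fit the pattern, so it fails here.
        System.out.println(text.matches(PAIR.pattern())); // false
        for (double[] p : findPairs(text)) {
            System.out.println(p[0] + " / " + p[1]);
        }
    }
}
```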

Regular expression help in java

I am lost when it comes to building regex strings. I need a regular expression that does the following.
I have the following strings:
[~class:obj]
[~class|class2|more classes:obj]
[!class:obj]
[!class|class2|more classes:obj]
[?method:class]
[text]
A string can contain several of the above, for example "[if] [!class:obj]".
I want to know what is between the [] and have it broken into match groups. The first group would be the symbol, if present (~, ! or ?). The next would be whatever comes before the :, which could be class or class|class2|etc. The last would be whatever is to the right of the :, stopping before the ]. The : and the part before it may be absent, leaving just some text between the [].
So, how would I go about writing this regex? And is it possible to name the match groups so I know what each one matched?
This is for a java project.
If you're sure enough of your inputs, you can probably use something like /\[(\~|\!|\?)?(?:((?:[^:\]]*?)+):)?([^\]]+?)\]/. (to translate that into Java, you'll want to escape the backslashes and use quotation marks instead of forward slashes)
Here are some web sites that might be helpful:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
http://txt2re.com/index.php3?s=Test+test+june+2011+test&submit=Show+Matches
http://www.regexplanet.com/simple/
I believe that this should work:
/\[(.*?)(?:\|(.*?))*\]/
Also:
[a-z]*
Try this code (asList is assumed to be a static import of java.util.Arrays.asList):
final Pattern
outerP = Pattern.compile("\\[.*?\\]"),
innerP = Pattern.compile("\\[([~!?]?)([^:]*):?(.*)\\]");
for (String s : asList(
"[~class:obj]",
"[if][~class:obj]",
"[~class|class2|more classes:obj]",
"[!class:obj]",
"[!class|class2|more classes:obj]",
"[?method:class]",
"[text]"))
{
final Matcher outerM = outerP.matcher(s);
System.out.println("Input: " + s);
while (outerM.find()) {
final Matcher m = innerP.matcher(outerM.group());
if (m.matches()) System.out.println(
m.group(1) + ";" + m.group(2) + ";" + m.group(3));
else System.out.println("No match");
}
}
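On the question of naming the match groups: Java 7 and later support named groups with the (?<name>...) syntax, read back with Matcher.group(String). The sketch below is one possible pattern, not the one from the answers above; the group names symbol, left and right and the class name BracketTokens are chosen only for illustration. A group that did not take part in the match returns null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokens {
    // symbol: optional ~, ! or ?; left: optional text before ':'; right: the rest up to ']'.
    private static final Pattern TOKEN = Pattern.compile(
            "\\[(?<symbol>[~!?])?(?:(?<left>[^:\\]]*):)?(?<right>[^\\]]*)\\]");

    // Returns one "symbol;left;right" entry per bracketed token in the input.
    static List<String> describeAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group("symbol") + ";" + m.group("left") + ";" + m.group("right"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeAll("[if] [!class:obj]"));
        System.out.println(describeAll("[~class|class2|more classes:obj]"));
    }
}
```

Splitting the left part on | afterwards (for example with String.split("\\|")) would give the individual class names.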
