Split string between words and quotation marks - java

I currently have this string:
"display_name":"test","game":"test123"
and I want to split the string so I can get the value test. I have looked all over the internet and tried some things, but I couldn't get it to work.
I found that splitting on quotation marks could be done using this regex: \"([^\"]*)\". So I tried this regex: display_name:\":\"([^\"]*)\"game\", but this returned null. I hope someone can explain to me why my regex didn't work and how it should be done.

You forgot to include the comma before "game", and you also need to remove the extra colon after display_name:
display_name\":\"([^\"]*)\",\"game\"
or
\"display_name\":\"([^\"]*)\",\"game\"
Then print capture group 1:
Matcher m = Pattern.compile("\"display_name\":\"([^\"]*)\",\"game\"").matcher(str);
while (m.find())
{
    System.out.println(m.group(1));
}
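The pattern above only works when "game" directly follows display_name. Here is a minimal sketch that looks up the value of any key instead. It assumes the values never contain escaped quotes (\"), and valueOf is a hypothetical helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonValueExtract {
    // Returns the quoted value that follows "key":, or null if the key is absent.
    // Assumption: values contain no escaped quotes.
    static String valueOf(String text, String key) {
        // Pattern.quote() makes any regex metacharacters in the key literal.
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\":\"([^\"]*)\"");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "\"display_name\":\"test\",\"game\":\"test123\"";
        System.out.println(valueOf(str, "display_name")); // test
        System.out.println(valueOf(str, "game"));         // test123
    }
}
```

Because the pattern no longer mentions the following key, the order of the fields does not matter.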

I think you could do it more simply, like this:
\w+
This small regex matches every run of word characters (letters, digits and underscores), so it picks up all of your strings: display_name, test, game and test123.
Your Java code should look something like this:
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(yourText);
while (matcher.find()) {
    System.out.println("Result: " + matcher.group());
}
Also note, as @AbishekManoharan pointed out, that this looks like JSON.

Related

Regex in java to extract specific pattern

I want to match the pattern (including the square brackets, equals, quotes)
[fixedtext="sometext"]
What would be a correct regex expression?
Anything can occur inside quotes. 'fixedtext' is fixed.
Your basic solution (although I'd be skeptical of this, per the comments) is essentially:
"\\[fixedtext=\\\"(.*)\\\"\\]"
which resolves to:
"\[fixedtext=\"(.*)\"\]"
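This is just escaping of the [] and the quotes. The (.*) captures everything between the quotes as a capture group (matcher.group(1)).
Be careful about where the match stops, though. The greedy (.*) runs to the last "] on the line, so for '[fixedtext="abc\"]def"]' it returns abc\"]def, but two tags on the same line would be merged into one match. A reluctant (.*?) stops at the first "] instead, so the same input gives you abc\ rather than abc\"]def.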
If you know the ending bracket ends the line, then use:
"\\[fixedtext=\\\"(.*)\\\"\\]$"
(add the $ at the end to mark end of line) and that should be fairly reliable.
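A small sketch comparing the greedy, reluctant and end-anchored variants on the tricky input above and on a line with two tags. The class and helper names are illustrative only, and the quotes are written as \" in the Java literal, which means the same thing to the regex engine as the \\\" used above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedTextDemo {
    // Returns capture group 1 of the first match, or null if there is none.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String greedy = "\\[fixedtext=\"(.*)\"\\]";
        String reluctant = "\\[fixedtext=\"(.*?)\"\\]";
        String anchored = "\\[fixedtext=\"(.*?)\"\\]$";

        String tricky = "[fixedtext=\"abc\\\"]def\"]";           // [fixedtext="abc\"]def"]
        String twoTags = "[fixedtext=\"a\"] [fixedtext=\"b\"]";  // [fixedtext="a"] [fixedtext="b"]

        System.out.println(extract(greedy, tricky));     // abc\"]def
        System.out.println(extract(reluctant, tricky));  // abc\
        System.out.println(extract(anchored, tricky));   // abc\"]def
        System.out.println(extract(greedy, twoTags));    // a"] [fixedtext="b
        System.out.println(extract(reluctant, twoTags)); // a
    }
}
```

The $ anchor forces even the reluctant version to extend to the final "], which is why it only helps when the tag really does end the line.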
I suggest using named capturing groups.
You can find more details here:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Here's an example for your input:
String input = "[fixedtext=\"sometext\"]";
Pattern pattern = Pattern.compile("\\[(?<field>.*)=\"(?<value>.*)\"]");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    System.out.println(matcher.group("field"));
    System.out.println(matcher.group("value"));
} else {
    System.err.println(input + " doesn't match " + pattern);
}

Extract substring after a certain pattern

I have the following string:
http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true
How can I extract the part after 30/? In this case, it's 32531a5d-b0b1-4a8b-9029-b48f0eb40a34. I have other strings that are the same up to 30/; after that, each one has a different ID up to the next /, and that ID is what I want.
You can do it like this:
String s = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
int start = s.indexOf("30/") + 3;
System.out.println(s.substring(start, s.indexOf('/', start))); // stop at the next '/'
The split() method of the String class won't help you in this case, because it discards the delimiter, and that's not what we want here. You need a pattern that looks behind. The lookbehind syntax is:
(?<=X)Y
which matches any Y that is preceded by X.
So in your case you need this pattern (with [^/]+ rather than .*, so that the match stops at the next /):
(?<=30/)[^/]+
Compile the pattern, match it against your input, find the match, and read it:
String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
Matcher matcher = Pattern.compile("(?<=30/)[^/]+").matcher(input);
if (matcher.find()) {
    System.out.println(matcher.group());
}
Just for this one, or do you want a generic way to do it?
String[] out = mystring.split("/");
return out[out.length - 2];
I think / is definitely the delimiter you are looking for.
I can't see the problem you're talking about, Alex.
EDIT: OK, Python got me with the indexes.
A regular expression is the answer, I think. However, how you write the expression depends on the format of the data (URLs) you want to process. For example:
Pattern pat = Pattern.compile("/Content/SiteFiles/30/([a-z0-9\\-]+)/.*");
Matcher m = pat.matcher("http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true");
if (m.find()) {
    System.out.println(m.group(1));
}
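If the input is always a well-formed URL, another option is to let java.net.URI split off the path first and then compare whole segments. This is a sketch under that assumption (URI.create() throws IllegalArgumentException on malformed input, and getPath() decodes %-escapes); segmentAfter is a hypothetical helper name:

```java
import java.net.URI;

class SegmentAfter {
    // Returns the path segment that immediately follows the segment equal to marker,
    // or null if marker is absent or is the last segment.
    static String segmentAfter(String url, String marker) {
        // getPath() leaves out the query string (?&mydownloads=true).
        String[] segments = URI.create(url).getPath().split("/");
        for (int i = 0; i < segments.length - 1; i++) {
            if (segments[i].equals(marker)) {
                return segments[i + 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
        System.out.println(segmentAfter(url, "30")); // 32531a5d-b0b1-4a8b-9029-b48f0eb40a34
    }
}
```

Because whole segments are compared, a segment such as 130 is not mistaken for 30, which a plain indexOf("30/") would do.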

Java replaceAll regex With Similar Result

Alright folks, my brain is fried. I'm trying to fix up some EMLs with bad boundaries by replacing the incorrect
--Boundary_([ArbitraryName])
lines with more proper
--Boundary_([ArbitraryName])--
lines, while leaving already correct
--Boundary_([ThisOneWasFine])--
lines alone. I've got the whole message in-memory as a String (yes, it's ugly, but JavaMail dies if it tries to parse these), and I'm trying to do a replaceAll on it. Here's the closest I can get.
//Identify boundary lines that do not end in --
String regex = "^--Boundary_\\([^\\)]*\\)$";
Pattern pattern = Pattern.compile(regex,
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher matcher = pattern.matcher(targetString);
//Store all of our unique results.
HashSet<String> boundaries = new HashSet<String>();
while (matcher.find())
    boundaries.add(matcher.group());
//Add "--" at the end of the Strings we found.
for (String boundary : boundaries)
    targetString = targetString.replaceAll(Pattern.quote(boundary),
            boundary + "--");
This has the obvious problem of replacing all of the valid
--Boundary_([WasValid])--
lines with
--Boundary_([WasValid])----
However, this is the only setup I've gotten to even perform the replacement. If I try changing Pattern.quote(boundary) to Pattern.quote(boundary) + "$", nothing is replaced. If I try just using matcher.replaceAll("$0--") instead of the two loops, nothing is replaced. What's an elegant way to achieve my aim and why does it work?
There's no need to iterate through the matches with find(); that's part of what replaceAll() does.
s = s.replaceAll("(?im)^--Boundary_\\([^\\)]*\\)$", "$0--");
The $0 in the replacement string is a placeholder for whatever the regex matched in the current match.
The (?im) at the beginning of the regex turns on CASE_INSENSITIVE and MULTILINE modes.
You can try something like this:
String regex = "^--Boundary_\\([^\\)]*\\)(--)?$";
then check whether group 1 matched (that is, whether the line already ends with --) and append -- only to the lines where it didn't.
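A minimal sketch of that idea, assuming the whole message is in one String as in the question (fix is a hypothetical helper name). It uses appendReplacement()/appendTail() so that each match can be inspected before deciding what to write back:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixBoundaries {
    // Group 1 captures a trailing "--" if the boundary line already has one.
    private static final Pattern BOUNDARY = Pattern.compile(
            "^--Boundary_\\([^\\)]*\\)(--)?$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    static String fix(String text) {
        Matcher m = BOUNDARY.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Group 1 is null when the line did not already end with "--".
            String line = m.group(1) == null ? m.group() + "--" : m.group();
            // quoteReplacement() stops '$' or '\' in the line being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(line));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String msg = "--Boundary_([A])\nbody\n--Boundary_([A])--\n";
        System.out.print(fix(msg));
    }
}
```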
Assuming each boundary is on its own line, this works:
"(?im)^--Boundary_\\([^)]*\\)$"
Example script:
String str = "--Boundary_([ArbitraryName])\n--Boundary_([ArbitraryName])--\n--Boundary_([ArbitraryName])\n--Boundary_([ArbitraryName])--\n";
System.out.println(str.replaceAll("(?im)^--Boundary_\\([^)]*\\)$", "$0--"));
Edit: changed from JavaScript to Java; I must have read too fast. (Thanks for pointing it out.)

Replace a word that is not on a string

I'm trying to replace a word in a file whenever it appears except when it is contained in a string:
So "this" should be replaced in:
The test in this line consists in ...
But it should not match in:
The test "in this line" consist in ...
This is what I'm trying:
line.replaceAll( "\\s+this\\s+", " that ")
But it fails with this scenario so I tried using:
line.replaceAll( "[^\"]\\s+this\\s+", " that ")
But that doesn't work either.
Any help would be appreciated.
This seems to work (in so far as I understand your requirements from the examples provided):
(?!.*\s+this\s+.*\")\s+this\s+
http://rubular.com/r/jZvR4XEbRf
You may need to adjust the escaping for Java.
This is a bit better actually:
(?!\".*\s+this\s+)(?!\s+this\s+.*\")\s+this\s+
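For reference, here is a sketch of that second regex written as a Java string literal (the class and method names are illustrative). Note the limitation of this heuristic: it skips any "this" that is followed by a quote anywhere later on the line, so "Replace this "x"" is left unchanged even though that "this" is outside the quotes.

```java
class ReplaceOutsideQuotes {
    // Java literal for: (?!\".*\s+this\s+)(?!\s+this\s+.*\")\s+this\s+
    static final String REGEX = "(?!\".*\\s+this\\s+)(?!\\s+this\\s+.*\")\\s+this\\s+";

    static String replaceThis(String line) {
        return line.replaceAll(REGEX, " that ");
    }

    public static void main(String[] args) {
        System.out.println(replaceThis("The test in this line consists in ..."));
        // The test in that line consists in ...
        System.out.println(replaceThis("The test \"in this line\" consist in ..."));
        // The test "in this line" consist in ...
    }
}
```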
The only reliable way to do this is to search for EITHER a complete, quoted sequence OR the search term. You do this with one regex, and after each match you determine which one you matched. If it's the search term, you replace it; otherwise you leave it alone.
That means you can't use replaceAll(). Instead, you have to use the appendReplacement() and appendTail() methods, as replaceAll() itself does. Here's an example:
String s = "Replace this example. Don't replace \"this example.\" Replace this example.";
System.out.println(s);
Pattern p = Pattern.compile("\"[^\"]*\"|(\\bexample\\b)");
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find())
{
    if (m.start(1) != -1)
    {
        m.appendReplacement(sb, "REPLACE");
    }
}
m.appendTail(sb);
System.out.println(sb.toString());
output:
Replace this example. Don't replace "this example." Replace this example.
Replace this REPLACE. Don't replace "this example." Replace this REPLACE.
I'm assuming every quotation mark is significant and that quotes can't be escaped; in other words, that you're working with prose, not source code. Escaped quotes can be dealt with, but they greatly complicate the regex.
If you really must use replaceAll(), there is a trick where you use a lookahead to assert that the match is followed by an even number of quotes. But it's really ugly, and for large texts you might find it prohibitively expensive, performance-wise.
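For completeness, here is a sketch of that lookahead trick on the same example, under the same assumption that quotes are never escaped (the class and method names are illustrative). A word counts as outside the quotes when the rest of the input contains an even number of quote characters:

```java
class EvenQuotes {
    // \bexample\b, but only if an even number of '"' follows it up to the end of input.
    static final String REGEX = "\\bexample\\b(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String replaceOutsideQuotes(String s) {
        return s.replaceAll(REGEX, "REPLACE");
    }

    public static void main(String[] args) {
        String s = "Replace this example. Don't replace \"this example.\" Replace this example.";
        System.out.println(replaceOutsideQuotes(s));
        // Replace this REPLACE. Don't replace "this example." Replace this REPLACE.
    }
}
```

Every candidate match rescans the rest of the input in the lookahead, which is where the performance cost on large texts comes from.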

Strip all reluctant curly braces using regex

Note: This is a Java-only question (i.e. no Javascript, sed, Perl, etc.)
I need to filter out all the "reluctant" curly braces ({}) in a long string of text.
(by "reluctant" I mean as in reluctant quantifier).
I have been able to come up with the following regex which correctly finds and lists all such occurrences:
Pattern pattern = Pattern.compile("(\\{)(.*?)(\\})", Pattern.DOTALL);
Matcher matcher = pattern.matcher(originalString);
while (matcher.find()) {
    Log.d("WITHIN_BRACES", matcher.group(2));
}
My problem now is how to replace every found matcher.group(0) with the corresponding matcher.group(2).
Intuitively I tried:
while (matcher.find()) {
    String noBraces = matcher.replaceAll(matcher.group(2));
}
But that replaced all found matcher.group(0) with only the first matcher.group(2), which is of course not what I want.
Is there an expression or a method in Java's regex to perform this "corresponding replaceAll" that I need?
ANSWER: Thanks to the tip below, I have been able to come up with 2 fixes that did the trick:
if (matcher.find()) {
    String noBraces = matcher.replaceAll("$2");
}
Fix #1: Use "$2" instead of matcher.group(2)
Fix #2: Use if instead of while.
Now it works like a charm.
You can use the special backreference syntax:
String noBraces = matcher.replaceAll("$2");
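Matcher.replaceAll() resets the matcher before it scans, so the preceding find() is only acting as a guard. String.replaceAll() can do the whole job in one call. This sketch uses a pattern with a single group (so the reference becomes $1) and turns on DOTALL inline with (?s) so a brace pair may span lines; stripBraces is a hypothetical name:

```java
class StripBraces {
    // (?s) = DOTALL, so .*? can cross line breaks; $1 is the text between the braces.
    static String stripBraces(String s) {
        return s.replaceAll("(?s)\\{(.*?)\\}", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripBraces("x = {a} + {b}")); // x = a + b
    }
}
```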
