Java Parsing of a String - java

I have a string like this:
String unparsed = "[thing.1][thin2g]";
I want to turn it into
"thing.1"
"thin2g"
I've been trying for a while with regular expressions, but nothing has worked. Any thoughts? Thanks!
EDIT:
Tried:
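String unparsed = "[thing.1][thin2g]";
// Strip the outer brackets, then split on the "][" between items.
String substring = unparsed.substring(1, unparsed.length() - 1);
// Strings are immutable, so the result of replace() has to be assigned.
substring = substring.replace("][", "`");
String[] split = substring.split("`");
for (int i = 0; i < split.length; i++)
{
    System.out.println(split[i]);
}
But this seems kind of heavy; I was looking for something more elegant.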

String unparsed = "[thing.1][thin2g]";
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher(unparsed);
while(matcher.find()){
System.out.println(matcher.group(1));
}
My regex skills are not great, but this does parse the string into what you want.

\[\s*\w.*?\]
I have never used regular expressions before, but this one should work.
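For comparison, here is a minimal sketch that collects the bracketed tokens into a list instead of printing them. It swaps the lazy `.*?` for a negated character class, `[^\]]*`, which cannot run past a closing bracket. The class name `BracketTokens` and the method `extract` are made up for illustration. Note that the `\[\s*\w.*?\]` pattern above has no capture group, so `group()` on it would still include the brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketTokens {
    // Matches one [...] pair and captures whatever sits between the brackets.
    static final Pattern TOKEN = Pattern.compile("\\[([^\\]]*)\\]");

    // Hypothetical helper: returns the contents of every [...] pair, in order.
    static List<String> extract(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(extract("[thing.1][thin2g]")); // prints [thing.1, thin2g]
    }
}
```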

Related

Java regex extract capture group if it exists

I apparently don't understand Java's regex library or regex either for that matter.
for this string:
String text = "asdf 2013-05-12 asdf";
this regex explodes in my face:
String REGEX_FORMAT_1 = ".+?([0-9]{4}\\s?-\\s?[0-9]{2}\\s?-\\s?[0-9]{2}).+";
Pattern PATTERN_FORMAT_1 = Pattern.compile(REGEX_FORMAT_1);
Matcher matcher_1 = PATTERN_FORMAT_1.matcher(text);
if(matcher_1.matches()) {
String matchedGroup = matcher_1.group();
...
}
Semantically this makes sense to me but it seems I've totally misunderstood something. The regex works fine in some online regex editors like regex101 but not in others. Could someone please help me understand why I don't get the capture group containing 2013-05-12 ...
group() is equivalent to group(0) and returns the entire matched string. Use group(1) to pull out the first matched group.
String text = "asdf 2013-05-12 asdf";
String regex = ".+?([0-9]{4}\\s?-\\s?[0-9]{2}\\s?-\\s?[0-9]{2}).+";
Matcher matcher = Pattern.compile(regex).matcher(text);
if (matcher.matches()) {
String matchedGroup = matcher.group(1);
System.out.println(matchedGroup);
}
Output:
2013-05-12
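Because `matches()` must consume the whole input, the pattern above needs the `.+?` and `.+` padding around the date, and it fails when the date sits at the very start or end of the text. A sketch of an alternative, assuming only the date itself is of interest: `find()` looks for the pattern anywhere in the input, so the padding can be dropped. `DateFinder` and `firstDate` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFinder {
    // Same date sub-pattern as above, without the surrounding .+? and .+
    static final Pattern DATE =
            Pattern.compile("([0-9]{4}\\s?-\\s?[0-9]{2}\\s?-\\s?[0-9]{2})");

    // Hypothetical helper: returns the first date-like substring, or null if there is none.
    static String firstDate(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDate("asdf 2013-05-12 asdf")); // prints 2013-05-12
        System.out.println(firstDate("no date here"));         // prints null
    }
}
```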

String split for specific element

I need to extract only the data between the "CHAR" tags from the following string:
Input:
<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>
Expected output: Number 7015:188188
I am looking for something efficient.
Any recommendation ?
Thanks
It is good practice to avoid parsing XML/HTML with regex. Instead, you can use a proper XML parser. I like to use jsoup, so here is an example of how it can be done with this library:
String data = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
Document doc = Jsoup.parse(data, "", Parser.xmlParser());
String charText = doc.select("CHAR").text();
System.out.println(charText);
Output: Number 7015:188188
I think you meant to capture the content between the tags rather than split the string.
It's well known that you should NOT use a regex to parse XHTML, since you can get weird content. However, if you still want a regex, you can use one like this:
<CHAR>(.*?)<\/CHAR>
And you can have this java code:
String line = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
Pattern pattern = Pattern.compile("<CHAR>(.*?)<\\/CHAR>");
Matcher matcher = pattern.matcher(line);
String result = "";
while (matcher.find()) {
result += matcher.group(1) + " ";
}
System.out.println(result); //Prints: Number 7015:188188
Update: as Pshemo pointed out in his comment:
/ is not special character in Java regex engine. You don't have to escape it
So, you can use:
Pattern pattern = Pattern.compile("<CHAR>(.*?)</CHAR>");
By the way, I really like Pshemo's answer; it's a nice way to solve this without a regex.
In case you know the tag value is always some digit, then an optional colon with digits, and it is the only <CHAR> tag that has such a numeric value, you may want to use this regex:
(?<=<CHAR>)\d+(?::\d+)?(?=<\/CHAR>)
Java string:
String pattern = "(?<=<CHAR>)\\d+(?::\\d+)?(?=</CHAR>)";
Sample code:
String str = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
Pattern ptrn = Pattern.compile("(?<=<CHAR>)\\d+(?::\\d+)?(?=</CHAR>)");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(0));
}
Output:
7015:188188
String s = inputString;
String result="";
while(s.indexOf("<CHAR>") != -1)
{
result += s.substring(s.indexOf("<CHAR>") + "<CHAR>".length(), s.indexOf("</CHAR>")) + " ";
s = s.substring(s.indexOf("</CHAR>") + "</CHAR>".length());
}
//result is now the desired output
The regex for that is: <CHAR>(.*?)</CHAR>
However, it is better to use an XML parser for that.
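If adding jsoup as a dependency is not an option, the DOM parser that ships with the JDK can do the same job. A minimal sketch, assuming the input is well-formed XML; `CharTags` and `charValues` are illustrative names, not taken from the answers above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CharTags {
    // Hypothetical helper: returns the text of every <CHAR> element, in document order.
    static List<String> charValues(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("CHAR");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Parser setup failed or the input could not be parsed as XML.
            throw new IllegalArgumentException("could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String data = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR>"
                + "<CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(String.join(" ", charValues(data))); // prints Number 7015:188188
    }
}
```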

How to ignore a value in a text?

I have a string like this :
EQ=ENABLED,QLPUB=50,EPRE=ENABLED
How can I ignore the value of QLPUB? I want to check for this string in 3000 lines, but I want to ignore the 50.
Is there any way to ignore it, for example with a Java regular expression, %s, or something else?
Try this regular expression:
s = s.replaceAll("(^|,)QLPUB=[^,]*", "");
See it working online: ideone
If value of QLPUB is always numeric you can use the following regex:
^EQ=ENABLED,QLPUB=\d*,EPRE=ENABLED$
Here's an example:
String text = "EQ=ENABLED,QLPUB=502,EPRE=ENABLED";
String pattern = "^EQ=ENABLED,QLPUB=\\d*,EPRE=ENABLED$";
Pattern compiledPattern = Pattern.compile(pattern);
Matcher matcher = compiledPattern.matcher(text);
if(matcher.find()) {
System.out.println(matcher.group());
}
If the value of QLPUB can be anything except a comma, change the regex to:
^EQ=ENABLED,QLPUB=[^,]*,EPRE=ENABLED$
You could use the regex /^EQ=ENABLED,QLPUB=\d+,EPRE=ENABLED$/. In Java this would look like this:
String myString = "EQ=ENABLED,QLPUB=50,EPRE=ENABLED";
if(myString.matches("^EQ=ENABLED,QLPUB=\\d+,EPRE=ENABLED$"))
{
//your string matches regardless of the value of QLPUB
}
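For the 3000-line case mentioned in the question, here is a sketch that compiles the pattern once and reuses it for every line, since `String.matches` compiles its regex again on each call. The `^` and `$` anchors are left out because `Matcher.matches()` already requires the whole line to match. The sample lines and the names `QlpubCheck` and `countMatching` are assumptions for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

public class QlpubCheck {
    // Compiled once; [^,]* accepts any QLPUB value that contains no comma.
    static final Pattern LINE =
            Pattern.compile("EQ=ENABLED,QLPUB=[^,]*,EPRE=ENABLED");

    // Hypothetical helper: counts the lines that match regardless of the QLPUB value.
    static long countMatching(List<String> lines) {
        return lines.stream()
                .filter(line -> LINE.matcher(line).matches())
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "EQ=ENABLED,QLPUB=50,EPRE=ENABLED",
                "EQ=ENABLED,QLPUB=7,EPRE=ENABLED",
                "EQ=DISABLED,QLPUB=50,EPRE=ENABLED");
        System.out.println(countMatching(lines)); // prints 2
    }
}
```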

Parsing an expression containing repeated groups using Java regexp

I'm trying to parse from a string like below
"name1(value1),name2(value2),name3(value3),name4(value4),........" and so it goes
How can I do it recursively with groups?
String s = "name1(value1),name2(value2),name3(value3),name4(value4),";
Pattern p = Pattern.compile(".*?\\((.*?)\\)");
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group(1));
}
I would rather use the Java String operations to get the values, but if you want to use a regex, you could use something like this:
[^\(]*\([^\)]*\),
Should be quite stable
You can test it here:
http://regexr.com?2u7u3
You can use matcher.find(); try something like this:
String input = "name1(value1),name2(value2),name3(value3),name4(value4)";
Matcher matcher = Pattern.compile(".*?[(].*?[)]").matcher(input);
while(matcher.find()) {
System.out.println(matcher.group(0));
}
Or just use String.split like this:
String input = "name1(value1),name2(value2),name3(value3),name4(value4)";
String[] split = input.split(",");
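On the "recursively with groups" part of the question: in Java a repeated capturing group keeps only its last match, so the usual approach is to match a single pair and loop with `find()`. A minimal sketch that captures both the name and the value of each pair; `NameValuePairs` and `parse` are illustrative names, and it assumes values contain no closing parenthesis.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameValuePairs {
    // Group 1 is the name before the parenthesis, group 2 the value inside it.
    static final Pattern PAIR = Pattern.compile("([^,(]+)\\(([^)]*)\\)");

    // Hypothetical helper: returns name -> value in the order the pairs appear.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("name1(value1),name2(value2),name3(value3),"));
        // prints {name1=value1, name2=value2, name3=value3}
    }
}
```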

Java regular expression with hyphen

I need to match and parse data in a file that looks like:
4801-1-21-652-1-282098
4801-1-21-652-2-282098
4801-1-21-652-3-282098
4801-1-21-652-4-282098
4801-1-21-652-5-282098
but the pattern I wrote below does not seem to work. Can someone help me understand why?
final String patternStr = "(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)";
final Pattern p = Pattern.compile(patternStr);
while ((this.currentLine = this.reader.readLine()) != null) {
final Matcher m = p.matcher(this.currentLine);
if (m.matches()) {
System.out.println("SUCCESS");
}
}
It looks correct.
Something odd is probably contained in your lines. Look for extra spaces and line breaks.
Try this:
final Matcher m = p.matcher(this.currentLine.trim());
Have you tried escaping the - as \\-?
It should work. Make sure there are no invisible characters; you can trim each line. You can also tighten the pattern like this:
final String patternStr = "(\\d{4})-(\\d{1})-(\\d{2})-(\\d{3})-(\\d{1})-(\\d{6})";
There is whitespace in the data:
4801-1-21-652-1-282098
4801-1-21-652-2-282098
4801-1-21-652-3-282098
4801-1-21-652-4-282098
4801-1-21-652-5-282098
final String patternStr = "\\s*(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)";
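To make the whitespace explanation concrete, here is a small sketch showing that `matches()` rejects a line with surrounding spaces and accepts it once trimmed, then pulls out the six fields. The padded sample line is an assumption about what the file contains, and `HyphenLine` and `fields` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenLine {
    // The pattern from the question, unchanged.
    static final Pattern P =
            Pattern.compile("(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)");

    // Hypothetical helper: returns the six numeric fields, or null if the trimmed line does not match.
    static String[] fields(String line) {
        Matcher m = P.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "  4801-1-21-652-1-282098 "; // assumed padding, for illustration
        System.out.println(P.matcher(line).matches());     // prints false
        System.out.println(String.join(",", fields(line))); // prints 4801,1,21,652,1,282098
    }
}
```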
