Regex: capture group in list like string - java

I've searched Stack Overflow and the net and found similar questions, but none that gave me a concrete answer. I have a string that acts as a list with the following formatting:
Key(Value)/Key(value)/Key(value,value). I would like to match them by key name IF the key exists, and I don't want the parentheses included anywhere, just the key and the value. I coded something out, but it's a real mess...
So my conditions are:
1) extract key/value pairs without parentheses
2) extract them only IF they are available
3) if the value portion of the list contains two values delimited by a ",", extract them individually
textToParse = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)/SN(0000:0000:0000:0000:0000:0000:0000:0000/IP(000.1.000.1)/Blue(2x4,2x4)"
String patternText = "^TdkRoot\(( [A-Za-z0-9]) Tdk\(( \\w}+) VAL\(( \\w) SN\(( \\w) IP\ (( \\w) Blue\(( \\w)"
Pattern pattern = Pattern.compile( patternText );
Matcher matcher = pattern.matcher(textToParse);
//Extract the groups from the regex (e.g. elements in braces)
String messageId = matcher.group( 1 );
String submitDate = matcher.group(4);
String statusText = matcher.group( 6 );
I think a cleaner/easier approach would be to extract the elements using a pattern for each individual key/value. If so, what pattern could I use to tell the regex: for "key" grab "value" but leave out the parentheses, and if the value is delimited by a comma, return an array, possibly?
Thanks, community! Hope to hear from you!
PS: I know (?<=\()(.*?)(?=\)) will capture anything in parentheses, e.g. "(This) value was captured", but how can I modify that to specify a key before the parentheses? For example, "I want to capture what's in THIS(parentheses)", with key THIS,
possibly delimited by a comma.

public static void main(String[] args) {
String textToParse = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
Pattern p = Pattern.compile("(\\w+)\\((.*?)\\)");
Matcher m = p.matcher(textToParse);
while (m.find()) {
System.out.println("key :" + m.group(1));
if (m.group(2).contains(",")) {
String[] s = m.group(2).split(",");
System.out.println("values : " + Arrays.toString(s));
} else {
System.out.println("value :" + m.group(2));
}
}
}
Output:
key :TdkRoot
value :0x0
key :Tdk
values : [0x2, 0x0]
key :Tdk
values : [0x0, 0x1]
key :VAL
values : [40A8F0B32240, 2x4]
key :SN
value :0000:0000:0000:0000:0000:0000:0000:0000
key :IP
value :000.1.000.1
key :Blue
values : [2x4, 2x4]
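To look up a single key by name, which is what the PS in the question asks for, the key can be placed in front of the parentheses in the pattern. The following is a minimal sketch, not a definitive implementation: the class name KeyLookup and the method valuesFor are hypothetical, and it assumes values never contain nested parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyLookup {

    // Returns the values of every occurrence of `key`, one inner list per occurrence.
    // Returns an empty list when the key does not appear in the text.
    static List<List<String>> valuesFor(String text, String key) {
        // (?<!\w) keeps "Root" from matching inside "TdkRoot";
        // Pattern.quote escapes any regex metacharacters in the key.
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(key) + "\\(([^()]*)\\)");
        Matcher m = p.matcher(text);
        List<List<String>> result = new ArrayList<>();
        while (m.find()) {
            // group(1) is the text between the parentheses, without the parentheses
            result.add(Arrays.asList(m.group(1).split(",")));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)"
                + "/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
        System.out.println(valuesFor(text, "Tdk"));     // [[0x2, 0x0], [0x0, 0x1]]
        System.out.println(valuesFor(text, "IP"));      // [[000.1.000.1]]
        System.out.println(valuesFor(text, "Missing")); // []
    }
}
```

An empty result doubles as the "IF the key exists" check, so the caller does not need a separate test.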

Not sure if this is what you are looking for (your sample code does not compile), but the following code parses the input text into a map:
String inputText = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
Pattern outerPattern = Pattern.compile("([^/()]+)\\(([^()]+)\\)");
Pattern innerPattern = Pattern.compile("([^,]+)");
Map<String, Collection<String>> parsedData = new HashMap<String, Collection<String>>();
Matcher outerMatcher = outerPattern.matcher(inputText);
while (outerMatcher.find()) {
String key = outerMatcher.group(1);
String val = outerMatcher.group(2);
Collection<String> valueCollection = new ArrayList<String>();
Matcher innerMatcher = innerPattern.matcher(val);
while (innerMatcher.find()) {
valueCollection.add(innerMatcher.group(1));
}
parsedData.put(key, valueCollection);
}
System.out.println(parsedData);
The resulting map (printed on the last line) is
{Blue=[2x4, 2x4], VAL=[40A8F0B32240, 2x4], IP=[000.1.000.1], TdkRoot=[0x0], SN=[0000:0000:0000:0000:0000:0000:0000:0000], Tdk=[0x0, 0x1]}
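Note that the printed map holds only one Tdk entry: HashMap.put replaces the earlier Tdk(0x2,0x0) values when the second Tdk pair is parsed. If repeated keys should be kept, one option is to store a list of value lists per key. This is a sketch under that assumption; the class name MultiValueParse is hypothetical, and LinkedHashMap is used only to preserve the order in which keys first appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiValueParse {

    // Same outer pattern as above: key, then a parenthesised value list.
    private static final Pattern PAIR = Pattern.compile("([^/()]+)\\(([^()]+)\\)");

    // Maps each key to one value list per occurrence of that key.
    static Map<String, List<List<String>>> parse(String text) {
        Map<String, List<List<String>>> parsed = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            // computeIfAbsent creates the outer list the first time a key is seen
            parsed.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                  .add(Arrays.asList(m.group(2).split(",")));
        }
        return parsed;
    }

    public static void main(String[] args) {
        String text = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)"
                + "/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
        Map<String, List<List<String>>> parsed = parse(text);
        System.out.println(parsed.get("Tdk"));     // [[0x2, 0x0], [0x0, 0x1]]
        System.out.println(parsed.get("TdkRoot")); // [[0x0]]
    }
}
```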

Related

Getting substring of a string that has a repeating character Java

I'm writing a parser that will extract the tag and value out of each line that it reads from a file, and I want to know how to get the value. So in this case I want to get
key = "accountName" and
value = "fname LName" and have it repeat with each line.
<accountName>fname LName</accountName>
<accountNumber>12345678912</accountNumber>
<accountOpenedDate>20200218</accountOpenedDate>
This is my code; it sits inside a while loop that scans each line using a BufferedReader. I managed to get the key properly, but when I try to get the value, I get "String index out of range - 12". I'm not sure how to get the value between the two angle brackets > <.
String line;
if(line.startsWith("<"){
key = line.substring(line.indexOf("<"+1, line.indexOf(">"));
value = line.substring(line.indexOf(">"+1, line.indexOf("<")+1);
}
Although using an XML parser is recommended (and a regular expression is the recommended way to process a single line), if you still want to process each line manually with substring, here is an example:
private static void readKeyValue(String line) {
String key = null;
String value = null;
if (null != line && line.startsWith("<") && line.contains("</")) {
key = line.substring(line.indexOf("</")+ 2 , line.lastIndexOf(">"));
value = line.substring(line.indexOf(">") + 1, line.indexOf("</"));
}
System.out.println("key: "+ key);
System.out.println("value: "+ value);
}
You can use regular expressions to extract, assuming the line variable is a string read from each line.
String pattern = "<([a-zA-Z]+.*?)>([\\s\\S]*?)</[a-zA-Z]*?>";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
// find
if (m.find()) {
String key = m.group(1);
String value = m.group(2);
System.out.println("Key: " + key);
System.out.println("Value: " + value);
} else {
System.out.println("Invalid");
}
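The pattern above accepts any closing tag, so a line such as <a>x</b> would also match. A stricter variant can use a backreference so that the closing tag must repeat the opening tag name. This is a minimal sketch; the class name TagLine and the method parse are hypothetical, and it assumes one simple element per line with no attributes or nested tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagLine {

    // \1 refers back to group 1, so </name> must repeat the name captured from <name>.
    private static final Pattern TAG = Pattern.compile("\\s*<(\\w+)>(.*?)</\\1>\\s*");

    // Returns {key, value}, or null when the line is not a single well-formed element.
    static String[] parse(String line) {
        Matcher m = TAG.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] kv = parse("<accountName>fname LName</accountName>");
        System.out.println("key: " + kv[0]);   // key: accountName
        System.out.println("value: " + kv[1]); // value: fname LName
        System.out.println(parse("<a>x</b>") == null); // true
    }
}
```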

Using regex for doing string operation

I have a string
String s = "my name is ${name}. My roll no is ${rollno} ";
I want to do string operations to update the name and rollno using a method.
public void name(String name, String roll)
{
String updated = s.replace("${name}", name).replace("${rollno}", roll);
}
Can we achieve the same by some other means, such as using a regex to replace what follows the first "$" and similarly for the others?
You can use either Matcher#appendReplacement or Matcher#replaceAll (with Java 9+):
A more generic version:
String s = "my name is ${name}. My roll no is ${rollno} ";
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
Map<String, String> replacements = new HashMap<>();
replacements.put("name", "John");
replacements.put("rollno", "123");
StringBuffer replacedLine = new StringBuffer();
while (m.find()) {
    // Keep the placeholder unchanged when no replacement is known
    String value = replacements.getOrDefault(m.group(1), m.group());
    // quoteReplacement stops "$" and "\" in the value from being read as group references
    m.appendReplacement(replacedLine, Matcher.quoteReplacement(value));
}
m.appendTail(replacedLine);
System.out.println(replacedLine.toString());
// => my name is John. My roll no is 123
Java 9+ solution:
Matcher m2 = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
// The function's result is treated as a replacement string, so quote it as well
String result = m2.replaceAll(x ->
    Matcher.quoteReplacement(replacements.getOrDefault(x.group(1), x.group())));
System.out.println( result );
// => my name is John. My roll no is 123
The regex is \$\{([^{}]+)\}:
\$\{ - a ${ char sequence
([^{}]+) - Group 1 (m.group(1)): any one or more chars other than { and }
\} - a } char.

Extract Substring from String java

I want to extract specific substrings from a string:
String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB " +
"info2 info2ContentA";
The result should be:
String info1 ="info1ContentA info1ContentB";
String info2 ="info2ContentA";
String info3 ="info3ContentA info3ContentB";
For me it's very difficult to extract the information, because after each "info" marker there may be one, two, or more content items. Another problem is that info1, info2, etc. do not appear in sorted order, and the "real data" doesn't contain an ascending number.
My first idea was to add info1, info2, info3 etc to an ArrayList.
private ArrayList<String> arr = new ArrayList<String>();
arr.add("info1");
arr.add("info2");
arr.add("info3");
Now I want to extract the substring with the method StringUtils.substringBetween() from Apache Commons (https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4):
String result = StringUtils.substringBetween(source, arr.get(0), arr.get(1));
This works, if info1 is in the string before info2, but like I said the "real data" is not sorted.
Any idea how I can fix this?
Split the string by spaces and then use String's startsWith method to append each part to the proper result string:
Map<String, String> resultMap = new HashMap<String, String>();
String[] prefixes = new String[]{"info1", "info2", "info3"};
String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB" + " info2 info2ContentA";
String[] parts = source.split(" ");
for (String part : parts) {
    for (String prefix : prefixes) {
        // Skip the bare marker token itself; collect only its content tokens
        if (part.startsWith(prefix) && !part.equals(prefix)) {
            String currentResult = resultMap.containsKey(prefix) ? resultMap.get(prefix) + " " + part : part;
            resultMap.put(prefix, currentResult);
        }
    }
}
Also consider using a StringBuilder instead of concatenating the string parts.
If you cannot be sure that the parts are surrounded by spaces, you can first change every part to <SPACE>part in your source string using String's replace method.
You can use a regular expression, like this:
String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB info2 info2ContentA";
for (int i = 1; i <= 3; i++) {
Pattern pattern = Pattern.compile("info" + i + "Content[A-Z]");
Matcher matcher = pattern.matcher(source);
List<String> matches = new ArrayList<>();
while (matcher.find()) {
matches.add(matcher.group());
}
// process the matches list
}
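Both answers above rely on the content tokens sharing a prefix with their marker, which the question says may not hold for the real data. A possible alternative is to walk the tokens once and attach each one to the most recently seen marker. This is a sketch under stated assumptions: the class name MarkerGroups and the method group are hypothetical, markers are assumed to be known in advance and to appear as whole whitespace-separated tokens, and content is assumed to belong to the marker that precedes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MarkerGroups {

    // Groups every non-marker token under the marker that most recently preceded it.
    static Map<String, String> group(String source, Set<String> markers) {
        Map<String, List<String>> collected = new LinkedHashMap<>();
        String current = null;
        for (String token : source.trim().split("\\s+")) {
            if (markers.contains(token)) {
                current = token;
                collected.putIfAbsent(current, new ArrayList<>());
            } else if (current != null) {
                collected.get(current).add(token);
            }
            // Tokens that appear before the first marker are ignored.
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : collected.entrySet()) {
            result.put(e.getKey(), String.join(" ", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB"
                + " info2 info2ContentA";
        Set<String> markers = new HashSet<>(Arrays.asList("info1", "info2", "info3"));
        System.out.println(group(source, markers));
        // {info1=info1ContentA info1ContentB, info3=info3ContentA info3ContentB, info2=info2ContentA}
    }
}
```

Because the grouping depends only on position, the order of the markers in the source string does not matter.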

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex so that the text extracted after :D: can be either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, my results would be:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and alphanumeric, but must also allow some other characters (such as the URL format with :// and/or the . and - characters).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
Assuming you need to do some validation along with the parsing, break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? Assuming any string of digits here
// Each named group needs a unique name; reusing a name throws PatternSyntaxException
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
It might be a good idea to keep the pattern as a static field and compile it in a static block, so that the temporary regex strings don't clutter the class with basically useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
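Putting the pieces together, the named-group approach could look like the sketch below. The URL sub-pattern is an assumption made for illustration (a scheme, ://, a host, and an optional numeric port), not a complete URL validator, and the class name MessageParser and method parse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageParser {

    // Assumed, deliberately simple URL shape: scheme://host with an optional :port.
    private static final String URL = "[A-Za-z][A-Za-z0-9+.-]*://[^:/\\s]+(?::\\d+)?";

    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+)"
            + ":D:(?<D>" + URL + "|\\p{Alnum}+)"
            + ":C:(?<C>[\\w.]+)"
            + ":Q:(?<Q>\\d+)");

    // Returns the four parts keyed by group name, or an empty map if the input is malformed.
    static Map<String, String> parse(String input) {
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher m = MESSAGE.matcher(input);
        if (m.matches()) {
            for (String name : new String[] { "M", "D", "C", "Q" }) {
                parts.put(name, m.group(name));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1"));
        // {M=myString1, D=tcp://someurl.com:8989, C=myString2, Q=1}
        System.out.println(parse("M:myString1:D:plainText:C:myString2:Q:1").get("D")); // plainText
    }
}
```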
You can go a step further by defining a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual sub-pattern. You definitely lose simplicity, though, and the result is harder to understand at a quick glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)

How to replace a word within a square bracket based a certain condition

I have a tricky condition which does not seem to work. Given a string, "Hi [HandleKey], you have [Action]", and a map containing the entry ("HandleKey", "Peter"), I want to replace each square bracket and the word within it if the key is found in the map. In this case, the map does not contain the key Action, so the method should return "Hi Peter, you have [Action]".
Here is the code that I'm working on:
private String messageFormatter(String tMessage, Map<String, String> messageMap)
{
String formattedMsg = null;
Set<String> keyset = messageMap.keySet();
Iterator<String> keySetItr = keyset.iterator();
String msgkey = null;
boolean isFormatted = false;
while (keySetItr.hasNext())
{
msgkey = keySetItr.next();
if(t.contains(msgkey))
{
if(!isFormatted)
{
formattedMsg = tMessage.replaceAll("\\[", "").replaceAll("\\]", "");
formattedMsg = formattedMsg.replaceAll(msgkey, messageMap.get(msgkey));
isFormatted= true;
}else
{
formattedMsg = formattedMsg.replaceAll(msgkey, messageMap.get(msgkey));;
}
}else
{
formattedMsg=tMessage;
}
}
return formattedMsg;
}
The last else part is not right. Can anyone please help me with this? The code works fine for all cases except when a matching key is not found in the map.
Is this idea OK for you?
Instead of applying a regex or extracting the text between [..], you could drive the replacement from the map side, e.g.
String s = "Hi [HandleKey], you have [Action]";
for(String k: yourMap.keySet()){
s=s.replaceAll("\\["+k+"\\]",yourMap.get(k));
}
You can do this with a regex; here is a complete example:
public static void main(String[] args) {
String str = "Hi [HandleKey], you have [Action] ";
Hashtable<String, String> table = new Hashtable<String, String>();
table.put("HandleKey", "Peter");
Pattern pattern = Pattern.compile("\\[(\\w+)\\]");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String key = matcher.group(1);
if (table.containsKey(key)) {
str = str.replaceFirst("\\[" + key + "\\]", table.get(key));
}
}
System.out.println(str);
}
Output:
Hi Peter, you have [Action]
Note that this is more efficient than looping over the Map if the map size is already large or growing.
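A single-pass variant is also possible with Matcher.appendReplacement, which avoids rescanning the string with replaceFirst for every match. This is a minimal sketch rather than the answer's own method: the class name BracketFormatter and the method format are hypothetical, and keys are assumed to consist of word characters only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketFormatter {

    private static final Pattern KEY = Pattern.compile("\\[(\\w+)\\]");

    // Replaces [key] with its mapped value; unknown keys are left as "[key]".
    static String format(String message, Map<String, String> values) {
        Matcher m = KEY.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // m.group() is the whole "[key]" text, used as the fallback
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps "$" and "\" in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("HandleKey", "Peter");
        System.out.println(format("Hi [HandleKey], you have [Action]", values));
        // Hi Peter, you have [Action]
    }
}
```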
To handle the case where a key is not in the map with minimal changes to what you have above, try
formattedMsg.replaceAll(msgkey,
(messageMap.containsKey(msgkey) ? messageMap.get(msgkey) : "[" + msgkey + "]"));
but looking again, I can see that you're iterating over the set of keys from messageMap, so the issue of a key not appearing in the map doesn't arise?
There's also a reference to if(t.contains(msgkey))..., but I'm not sure what t is.
If you want the text to contain the formatted [msgkey] when it's not found, then replacing all "[" and "]" seems the wrong way to start, since you would have to put them back in some cases.
I'd look at @iTech's suggestion and get the regex doing more for you.
