Regex to split a string using Java

I am trying to parse a string because I need to pass the result to the UI as a map.
Here is my input string:
"2020-02-01T00:00:00Z",1,
"2020-04-01T00:00:00Z",4,
"2020-05-01T00:00:00Z",2,
"2020-06-01T00:00:00Z",31,
"2020-07-01T00:00:00Z",60,
"2020-08-01T00:00:00Z",19,
"2020-09-01T00:00:00Z",10,
"2020-10-01T00:00:00Z",33,
"2020-11-01T00:00:00Z",280,
"2020-12-01T00:00:00Z",61,
"2021-01-01T00:00:00Z",122,
"2021-12-01T00:00:00Z",1
I need to split the string like this :
"2020-02-01T00:00:00Z",1 : split[0]
"2020-04-01T00:00:00Z",4 : split[1]
The issue is that I can't simply split on "," because the comma appears twice in each record.
I need a regex that gives 2020-02-01T00:00:00Z,1 as one token to process further.
I am new to regex. Can someone please provide a regular expression for this?

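If you want the pairs of date-time and ID, you can use the regex ("\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z",\d+)(?=,|$) and collect the match results. (The quotes only need a backslash inside a Java string literal.)
The pattern (?=,|$) is a lookahead assertion: it requires a comma or the end of the input to follow the match, without consuming it. Because Pattern.MULTILINE is not set, $ matches at the end of the input rather than at the end of every line, which is enough here because every record except the last is followed by a comma.
Demo (Matcher.results() requires Java 9 or later):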
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class Main {
    public static void main(String[] args) {
        String s = "\"2020-02-01T00:00:00Z\",1,\n"
                + " \"2020-04-01T00:00:00Z\",4,\n"
                + " \"2020-05-01T00:00:00Z\",2,\n"
                + " \"2020-06-01T00:00:00Z\",31,\n"
                + " \"2020-07-01T00:00:00Z\",60,\n"
                + " \"2020-08-01T00:00:00Z\",19,\n"
                + " \"2020-09-01T00:00:00Z\",10,\n"
                + " \"2020-10-01T00:00:00Z\",33,\n"
                + " \"2020-11-01T00:00:00Z\",280,\n"
                + " \"2020-12-01T00:00:00Z\",61,\n"
                + " \"2021-01-01T00:00:00Z\",122,\n"
                + " \"2021-12-01T00:00:00Z\",1";
        List<String> list = Pattern.compile("(\\\"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z\\\",\\d+)(?=,|$)")
                .matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
        list.stream()
                .forEach(p -> System.out.println(p));
    }
}
Output:
"2020-02-01T00:00:00Z",1
"2020-04-01T00:00:00Z",4
"2020-05-01T00:00:00Z",2
"2020-06-01T00:00:00Z",31
"2020-07-01T00:00:00Z",60
"2020-08-01T00:00:00Z",19
"2020-09-01T00:00:00Z",10
"2020-10-01T00:00:00Z",33
"2020-11-01T00:00:00Z",280
"2020-12-01T00:00:00Z",61
"2021-01-01T00:00:00Z",122
"2021-12-01T00:00:00Z",1

Why can't you just split on , and ignore the last value?
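A split-based approach is also possible. The sketch below assumes every record begins with a double quote, so the only commas worth splitting on are those followed (after optional whitespace) by a quote. The class and method names are illustrative and not from the thread.

```java
import java.util.Arrays;
import java.util.List;

class RecordSplitter {
    // Split on a comma plus any whitespace, but only when a double quote follows.
    // The lookahead (?=") checks for the quote without consuming it.
    static List<String> splitRecords(String input) {
        return Arrays.asList(input.trim().split(",\\s*(?=\")"));
    }

    public static void main(String[] args) {
        String s = "\"2020-02-01T00:00:00Z\",1,\n"
                + "\"2020-04-01T00:00:00Z\",4,\n"
                + "\"2021-12-01T00:00:00Z\",1";
        // Prints one record per line, e.g. "2020-02-01T00:00:00Z",1
        splitRecords(s).forEach(System.out::println);
    }
}
```

Note that a trailing comma after the final record would stay attached to the last token, since no quote follows it.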

Here's your pattern:
final Pattern pattern = Pattern.compile("(\\S+),(\\d+)");
final Matcher matcher = pattern.matcher("Input....");
Here's how to use it:
while (matcher.find()) {
    final String date = matcher.group(1);   // the timestamp, surrounding quotes included
    final String number = matcher.group(2); // the digits after the comma
}
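Since the question mentions passing a map to the UI, the two capture groups can feed one directly. This is a minimal sketch, assuming the counts fit in an int and that timestamps are unique (a repeated timestamp would overwrite the earlier entry). DateCountParser and its pattern are hypothetical names, and the pattern is a variation that leaves the quotes out of group 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateCountParser {
    // Group 1: the timestamp without its quotes; group 2: the number after the comma.
    private static final Pattern RECORD = Pattern.compile("\"([^\"]+)\",(\\d+)");

    static Map<String, Integer> parse(String input) {
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps input order
        Matcher matcher = RECORD.matcher(input);
        while (matcher.find()) {
            result.put(matcher.group(1), Integer.parseInt(matcher.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "\"2020-02-01T00:00:00Z\",1,\n\"2020-04-01T00:00:00Z\",4";
        // Prints {2020-02-01T00:00:00Z=1, 2020-04-01T00:00:00Z=4}
        System.out.println(parse(s));
    }
}
```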

Related

Find substring from a complex string using regex

I have a String containing huge script code as follows :
String script = "node {
stage(someString) {
try {
**parameters= [
[someString],
[someString],
[someString],
[someString],
[someString],
[someString],
[someString],
]**
//some more script
}
}";
I want to extract the parameters variable, which contains an array of arrays.
I tried the following pattern, but it didn't work:
Pattern pattern = Pattern.compile("parameters= [(.*?)]");
How do I extract the parameters variable from script String variable using Regex?
Thanks in advance!
You may try using:
parameters=\s*\[(.*)]
Explanation of the above regex:
parameters= - Matches parameters= literally.
\s* - Matches a white-space character zero or more times.
\[ - Matches [ literally.
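(.*) - A capturing group that greedily captures everything up to the last ] in the input. Because . does not match line breaks by default, the pattern has to be compiled with Pattern.DOTALL to span multiple lines.
] - Matches the closing ] literally.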
Sample Implementation in java:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main
{
    private static final Pattern pattern = Pattern.compile("parameters=\\s*\\[(.*)]", Pattern.DOTALL);
    public static void main(String[] args) {
        String string = "node {\n"
                + " stage(someString) {\n"
                + " try {\n"
                + " **parameters= [\n"
                + " [someString],\n"
                + " [someString],\n"
                + " [someString],\n"
                + " [someString],\n"
                + " [someString],\n"
                + " [someString],\n"
                + " [someString],\n"
                + " ]**\n"
                + " //some more script";
        StringBuilder sb = new StringBuilder();
        Matcher matcher = pattern.matcher(string);
        while(matcher.find()){
            // Replaced all the unwanted spaces and commas. You can address that accordingly.
            sb.append(matcher.group(1).replaceAll("[\\s,]+", " "));
        }
        System.out.println(sb.toString());
    }
}
Please find the sample run of the above implementation in here.
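Because (.*) is greedy, the regex above runs to the last ] in the whole script, which over-captures if any other bracket appears later. One possible alternative, sketched here as a different technique rather than a pure regex, is to locate the start with a regex and then count brackets to find the matching close. It assumes brackets never appear inside string literals or comments in the script; ParametersExtractor is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParametersExtractor {
    private static final Pattern START = Pattern.compile("parameters\\s*=\\s*\\[");

    /** Returns the text between the outer brackets, or null if absent or unbalanced. */
    static String extract(String script) {
        Matcher m = START.matcher(script);
        if (!m.find()) {
            return null;
        }
        int begin = m.end(); // first character after the opening [
        int depth = 1;       // we are inside one open bracket
        for (int i = begin; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '[') {
                depth++;
            } else if (c == ']' && --depth == 0) {
                return script.substring(begin, i);
            }
        }
        return null; // ran out of input before the brackets balanced
    }

    public static void main(String[] args) {
        String script = "node {\n  parameters= [\n    [a],\n    [b],\n  ]\n  other = [c]\n}";
        // Prints [a],[b], (whitespace removed for display)
        System.out.println(extract(script).replaceAll("\\s+", ""));
    }
}
```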

How to split a string by 2 strings in Java

How do I split this string by two words?
<input type="hidden" name="SYNCHRONIZER_TOKEN" value="2f56248e-e54d-48ef-8c8c-6028d6f3d63f" id="SYNCHRONIZER_TOKEN" />
String 1: value="
String 2: " id="SYNC
After the split, the result should be: 2f56248e-e54d-48ef-8c8c-6028d6f3d63f
Try using a regex to extract the value of interest. This way your code does not make any assumptions and will not break if there is something completely different after value=...
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class SplitString {
    public static void main(String[] args){
        String input = "<input type=\"hidden\" name=\"SYNCHRONIZER_TOKEN\" value=\"2f56248e-e54d-48ef-8c8c-6028d6f3d63f\" id=\"SYNCHRONIZER_TOKEN\" />\n";
        Pattern pattern = Pattern.compile("value=\"[a-zA-Z0-9-]+\"");
        Matcher matcher = pattern.matcher(input);
        if (matcher.find()){
            String keyValue = matcher.group(0);
            String key = keyValue.split("=")[0];
            String value = keyValue.split("=")[1];
            System.out.println("KeyValue: " + keyValue);
            System.out.println("Key: " + key);
            System.out.println("Value: " + value);
        }
    }
}
The output looks like this
KeyValue: value="2f56248e-e54d-48ef-8c8c-6028d6f3d63f"
Key: value
Value: "2f56248e-e54d-48ef-8c8c-6028d6f3d63f"
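The value printed above still carries its surrounding quotes, while the question asks for the bare token. A capturing group around the part between the quotes avoids the extra split calls. This is only a sketch: TokenValueExtractor is a hypothetical name, and it assumes the attribute is written with double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenValueExtractor {
    // Group 1 is everything between value=" and the next double quote.
    private static final Pattern VALUE = Pattern.compile("\\bvalue=\"([^\"]*)\"");

    /** Returns the attribute value without quotes, or null if there is no value attribute. */
    static String extractValue(String html) {
        Matcher m = VALUE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<input type=\"hidden\" name=\"SYNCHRONIZER_TOKEN\" "
                + "value=\"2f56248e-e54d-48ef-8c8c-6028d6f3d63f\" id=\"SYNCHRONIZER_TOKEN\" />";
        // Prints 2f56248e-e54d-48ef-8c8c-6028d6f3d63f
        System.out.println(extractValue(input));
    }
}
```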

java regex take variable between two tag

I am very new to regex and need your help. I want to extract the numbers and letters between two span tags.
<span>454.000 $</span>
I want to extract 454.000 $. There are 12 spaces before the <span> tag. Please help me.
This should work.
Regexp:
\s+<.+>(.+)<.+>
Input:
<span>454.000 $</span>
Output:
454.000 $
JAVA CODE:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        final String regex = "\\s+<.+>(.+)<.+>";
        final String string = " <span>454.000 $</span>";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            // Group 1 holds the text between the opening and closing tags.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
See: https://regex101.com/r/2zg5Ws/1
Extracting the text with a capturing group looks like this:
String x = " <span>454.000 $</span> ";
Pattern p = Pattern.compile("<span>(.*?)</span>");
Matcher m = p.matcher(x);
if (m.find()) {
    System.out.println(">> "+ m.group(1)); // output 454.000 $
}
For such cases, though, I prefer replaceAll() because the code is shorter:
String num = x.replaceAll(".*<span>(.*?)</span>.*", "$1");
// num has 454.000 $
The replacement works by matching the whole input, capturing the text inside the span, and replacing the entire match with that group ($1). Whether this works depends on the shape of your input string.
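One reason the replaceAll() shortcut depends on the input: when nothing matches, replaceAll() returns the original string unchanged, so a missing span is easy to overlook. The sketch below makes the no-match case explicit with an Optional and uses Pattern.DOTALL so the content may span lines. SpanText is a hypothetical helper, and it assumes a plain <span> tag without attributes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanText {
    // DOTALL lets . match line breaks; \s* on both sides keeps
    // leading and trailing whitespace out of the captured text.
    private static final Pattern SPAN =
            Pattern.compile("<span>\\s*(.*?)\\s*</span>", Pattern.DOTALL);

    static Optional<String> firstSpanText(String html) {
        Matcher m = SPAN.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstSpanText("            <span>454.000 $</span>")); // Optional[454.000 $]
        System.out.println(firstSpanText("no span here"));                       // Optional.empty
    }
}
```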

Regex matching imports of a class

I've been trying to write a regex to match the imports of a class. Let the class be
import static org.junit.Assert.*;
import org.
package.
Test;
import mypackage.mystuff;
The output should be [org.junit.Assert.*, org.package.Test, mypackage.mystuff]. I've been struggling with the line breaks and with regular expressions in general since I'm not that experienced with them. This is my current attempt:
((?<=\bimport\s)\s*([^\s]+ )*([a-z.A-Z0-9]+.(?=;)))
This (almost) suits your needs:
(?<=import (?:static )?+)[^;]+
Almost because the matches include the new lines if any (e.g. in your org.package.Test declaration). This should be handled afterwards:
Pattern pattern = Pattern.compile("(?<=import (?:static )?+)[^;]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    String match = matcher.group().replaceAll("\\s+", "");
    // do something with match
}
In Java, \s matches [ \t\n\x0B\f\r]. Have a look at possessive quantifiers as well to understand the ?+ quantifier.
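To illustrate what a possessive quantifier does, here is a small sketch that is separate from the import regex itself. A possessive quantifier such as *+ or ?+ keeps whatever it matched and never gives characters back when the rest of the pattern fails. PossessiveDemo and its method names are made up for the example.

```java
class PossessiveDemo {
    // Greedy a* first takes every 'a', then backtracks one so the final 'a' can match.
    static boolean greedyMatch(String s) {
        return s.matches("a*a");
    }

    // Possessive a*+ also takes every 'a' but refuses to backtrack,
    // so nothing is left for the final 'a' and the match fails.
    static boolean possessiveMatch(String s) {
        return s.matches("a*+a");
    }

    public static void main(String[] args) {
        System.out.println(greedyMatch("aaa"));     // true
        System.out.println(possessiveMatch("aaa")); // false
    }
}
```

In the lookbehind above, (?:static )?+ behaves the same way: once "static " has been matched, the engine will not retry that position without it.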
This regex should work for all kinds of import statements and should not match invalid statements:
import\p{javaIdentifierIgnorable}*\p{javaWhitespace}+(?:static\p{javaIdentifierIgnorable}*\p{javaWhitespace}+)?(\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*|(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*)+(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*)?))\p{javaWhitespace}*;
It's extensively using Java's categories, e.g. \p{javaWhitespace} calls Character.isWhitespace:
Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
Still not readable? Guessed so. That's why I tried to express it with Java code (REGEX):
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ImportMatching {
    static final String IMPORTS = "import\n" +
            "java.io.IOException;\n" +
            "import java.nio.file.Files;\n" +
            "import java . nio . file. Path;\n" +
            "import java.nio.file.Paths\n" +
            ";import java.util.ArrayList;\n" +
            "import static java.util. List.*;\n" +
            "import java.util.List. *;\n" +
            "import java.\n" +
            " util.\n" +
            " List;\n" +
            " import java.util.regex.Matcher;import java.util.regex.Pattern\n" +
            " ;\n" +
            "import mypackage.mystuff;\n" +
            "import mypackage.*;";
    static final String WS = "\\p{javaWhitespace}";
    static final String IG = "\\p{javaIdentifierIgnorable}";
    static final String ID = "\\p{javaJavaIdentifierStart}" + multiple(charClass("\\p{javaJavaIdentifierPart}" + IG));
    static final String DOT = multiple(WS) + "\\." + multiple(WS);
    static final String WC = "\\*";
    static final String REGEX = "import" + multiple(IG) + atLeastOnce(WS) +
            optional(nonCapturingGroup("static" + multiple(IG) + atLeastOnce(WS))) +
            group(
                    ID +
                    nonCapturingGroup(
                            or(
                                    DOT + WC,
                                    atLeastOnce(nonCapturingGroup(DOT + ID)) + optional(nonCapturingGroup(DOT + WC))
                            )
                    )
            ) +
            multiple(WS) + ';';
    public static void main(String[] args) {
        final List<String> imports = getImports(IMPORTS);
        System.out.printf("Matches: %d%n", imports.size());
        imports.stream().forEach(System.out::println);
    }
    static List<String> getImports(String javaSource) {
        Pattern pattern = Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(javaSource);
        List<String> imports = new ArrayList<>();
        while(matcher.find()) {
            // Group 1 is the package.Class part; strip whitespace and ignorable characters from it.
            imports.add(matcher.group(1).replaceAll(charClass(WS + IG), ""));
        }
        return imports;
    }
    static String nonCapturingGroup(String regex) {
        return group("?:" + regex);
    }
    static String or(String option1, String option2) {
        return option1 + '|' + option2;
    }
    static String atLeastOnce(String regex) {
        return regex + '+';
    }
    static String optional(String regex) {
        return regex + '?';
    }
    static String multiple(String regex) {
        return regex + '*';
    }
    static String group(String regex) {
        return '(' + regex + ')';
    }
    static String charClass(String regex) {
        return '[' + regex + ']';
    }
}
I'm using one group for the package.Class part and then replacing any noise from the matches.
The test input is the following string (IMPORTS):
import
java.io.IOException;
import java.nio.file.Files;
import java . nio . file. Path;
import java.nio.file.Paths
;import java.util.ArrayList;
import static java.util. List.*;
import java.util.List. *;
import java.
util.
List;
import java.util.regex.Matcher;import java.util.regex.Pattern
;
import mypackage.mystuff;
import mypackage.*;
The output:
Matches: 12
java.io.IOException
java.nio.file.Files
java.nio.file.Path
java.nio.file.Paths
java.util.ArrayList
java.util.List.*
java.util.List.*
java.util.List
java.util.regex.Matcher
java.util.regex.Pattern
mypackage.mystuff
mypackage.*
You can use this regex:
(\w+\.\n*\s*)+([\w\*]+)(?=\;)
Escaped For Java:
(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)
Maybe this is what you are looking for?
(?<=\bimport)(\s*\R*\s*(?:[a-z0-9A-Z]+(\R|\s)+)*)((([a-zA-Z0-9]+\.)+)[a-zA-Z0-9]*\*?);
Try this regexp:
import (static )*([^;])*
This works well for me:
import\s*((?:\w+[/./])+)

Find words matching a specific REGEX within a sentence using JAVA

I am trying to generate a dynamic message that can be used for processing using Java and Regular Expressions. My incoming value can be just "$bdate$" or be embedded within a sentence like "Your Birthdate : $bdate$". I want to replace these $aaa$ values dynamically at run time and am not able to isolate the regex matched values within a sentence. Here is what I have so far....
package com.test;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public class TestRegex {
    public static String REGEX = "\\$((?:[a-zA-Z0-9_ ]*))\\$";
    public static String testString = "Summary : $summary$"
            + "Age : $age$"
            + "Location : $location$";
    public static void main(String[] args) {
        System.out.println("Matcher : " + Pattern.matches(REGEX, "$ABX_ 11$"));
        String [] splitStrings = testString.split("\\W+"); //also tried "\\b+"
        List<String> stringList = Arrays.asList(splitStrings);
        for(String test : stringList) {
            System.out.println("Split Word : " + test);
        }
    }
}
The output is below - it misses the preceding and succeeding $ symbols:
Matcher : true
Split Word : Summary
Split Word : summary
Split Word : Age
Split Word : age
Split Word : Location
Split Word : location
I know I am very close but cannot figure out the issue. Can anyone please help?
You can use the following:
String pattern = "\\w+|\\$\\w+\\$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(testString);
while (m.find( )) {
    System.out.println("Found value: " + m.group(0) );
}
Just to extend @Karthik's answer and complete the thread: the snippet below looks only for the words in the sentence that match the placeholder pattern and prints them, which may make it easier to replace them dynamically at run time.
package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegex {
    public static String testString = "Summary : $summary$"
            + "Age : $age$"
            + "Location : $location$";
    public static void main(String[] args) {
        String pattern = "\\$\\w+\\$";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(testString);
        while (m.find( )) {
            System.out.println("Found value: " + m.group(0) );
        }
    }
}
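To go one step further and actually substitute the placeholders, one option is Matcher.appendReplacement with a lookup map. This is a rough sketch rather than the only way to do it: PlaceholderFiller and the sample values are made up, and placeholders with no entry in the map are left as they are.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFiller {
    // Group 1 is the name between the two dollar signs.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)\\$");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown names fall back to the placeholder text itself.
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops $ and \ in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("summary", "All good");
        values.put("age", "42");
        // Prints Summary : All good, Age : 42, Location : $location$
        System.out.println(fill("Summary : $summary$, Age : $age$, Location : $location$", values));
    }
}
```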
