How to use a regex in Java to manipulate a string - java

String s = "My cake should have ( sixteen | sixten | six teen ) candles, I love and ( should be | would be ) puff them.";
Desired final string:
My cake should have <div><p id="1">sixteen</p><p id="2">sixten</p><p id="3">six teen</p></div> candles, I love and <div><p id="1">should be</p><p id="2">would be</p></div> puff them.
I have tried using this:
Pattern pattern = Pattern.compile("\\|\\s*(.*?)(?=\\s*\\|)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}

Use
import java.util.regex.*;
class Program
{
public static void main (String[] args) throws java.lang.Exception
{
String s = "My cake should have ( sixteen | sixten | six teen ) candles, I love and ( should be | would be ) puff them.";
Pattern pattern = Pattern.compile("\\(([^()|]*\\|[^()]*)\\)");
Matcher matcher = pattern.matcher(s);
StringBuffer changed = new StringBuffer();
while (matcher.find()){
String temp = "<div>";
String[] items = matcher.group(1).trim().split("\\s*\\|\\s*");
for (int i = 1; i<=items.length; i++) {
temp += "<p id=\"" + i + "\">" + items[i-1] + "</p>";
}
matcher.appendReplacement(changed, temp+"</div>");
}
matcher.appendTail(changed);
System.out.println(changed.toString());
}
}
Results: My cake should have <div><p id="1">sixteen</p><p id="2">sixten</p><p id="3">six teen</p></div> candles, I love and <div><p id="1">should be</p><p id="2">would be</p></div> puff them.
Regex used
\(([^()|]*\|[^()]*)\)
EXPLANATION
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^()|]* any character except: '(', ')', '|' (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
[^()]* any character except: '(', ')' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\) ')'
In short: match each parenthesized group that contains at least one pipe, capture the content between the parentheses, trim it, split it on pipes, and build the replacement string in a loop. StringBuffer together with matcher.appendReplacement and matcher.appendTail assembles the final string while the matches are being replaced.
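One caveat with appendReplacement: the replacement string treats $ and \ as special characters, so items that contain them can throw or be altered. Below is a minimal sketch of the same transformation, assuming Java 9+ (for Matcher.replaceAll with a function) and using Matcher.quoteReplacement to keep the generated HTML literal. The class and method names (PipeGroupsToHtml, convert) are illustrative and not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeGroupsToHtml {
    // Same pattern as in the answer: a parenthesized group containing at least one pipe.
    private static final Pattern GROUP = Pattern.compile("\\(([^()|]*\\|[^()]*)\\)");

    static String convert(String input) {
        return GROUP.matcher(input).replaceAll(match -> {
            String[] items = match.group(1).trim().split("\\s*\\|\\s*");
            StringBuilder html = new StringBuilder("<div>");
            for (int i = 0; i < items.length; i++) {
                html.append("<p id=\"").append(i + 1).append("\">")
                    .append(items[i]).append("</p>");
            }
            html.append("</div>");
            // Quote the result so that '$' and '\' inside items are not
            // interpreted as group references or escapes.
            return Matcher.quoteReplacement(html.toString());
        });
    }

    public static void main(String[] args) {
        System.out.println(convert("Pay ( $5 | $10 ) now."));
    }
}
```

On Java 8 the same quoting can be applied inside the appendReplacement loop, for example matcher.appendReplacement(changed, Matcher.quoteReplacement(temp + "</div>")).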

Related

How to extract content from sign using regex in Java?

My content is a string like this: whoad123##${studentA.math1.math2}dsafddasfd${studentB.math2.math3}. Now I want to extract studentA and studentB, i.e. the content inside the braces and before the first dot (${**}). What's wrong with my code?
static java.util.regex.Pattern p1=java.util.regex.Pattern.compile("\\*\\$\\{\\w+\\}");
private static String getStudentName(String expression) {
StringBuffer stringBuffer = new StringBuffer();
java.util.regex.Matcher m1= p1.matcher(expression);
while(m1.find()) {
String param=m1.group(1);
stringBuffer.append(param.substring(2,param.indexOf("\\.")) + ",");
}
if(stringBuffer.length()>0){
return stringBuffer.deleteCharAt(stringBuffer.length()-1).toString();
}
return null;
}
Use
\$\{([^{}.]+)
Declare in Java as
static java.util.regex.Pattern p1=java.util.regex.Pattern.compile("\\$\\{([^{}.]+)");
EXPLANATION
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
\{ '{'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^{}.]+ any character except: '{', '}', '.' (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
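A minimal sketch of how this pattern could be plugged into the original method. Because the name is captured in group 1, no substring or indexOf call is needed. The class name StudentNames and the choice to return an empty string when nothing matches are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudentNames {
    // "${" followed by one or more characters that are not '{', '}' or '.'.
    private static final Pattern P1 = Pattern.compile("\\$\\{([^{}.]+)");

    static String getStudentNames(String expression) {
        List<String> names = new ArrayList<>();
        Matcher m = P1.matcher(expression);
        while (m.find()) {
            names.add(m.group(1)); // group 1 is the text before the first dot
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(getStudentNames(
            "whoad123##${studentA.math1.math2}dsafddasfd${studentB.math2.math3}"));
    }
}
```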
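import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static String getStudentName(String expression) {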
Pattern pattern = Pattern.compile("\\{(?<student>\\w+)\\.[^\\}]+\\}");
Matcher matcher = pattern.matcher(expression);
List<String> names = new ArrayList<>();
while (matcher.find()) {
names.add(matcher.group("student"));
}
return String.join(",", names);
}
See demo in regex101.com

Regex to replace comments with number of new lines

I want to replace all Java-style comments (/* */) with the number of new lines for that comment. So far, I can only come up with something that replaces comments with an empty string
String.replaceAll("/\\*[\\s\\S]*?\\*/", "")
Is it possible to replace the matching regexes instead with the number of new lines it contains? If this is not possible with just regex matching, what's the best way for it to be done?
For example,
/* This comment
has 2 new lines
contained within */
will be replaced with a string of just 2 new lines.
Since Java supports the \G construct, just do it all in one go.
Use a global regex replace function.
Find
"(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\\*\\/)(?!^)\\G)(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?"
Replace
"$1"
https://regex101.com/r/l1VraO/1
Expanded
(?:
/ \*
(?= [\S\s]*? \* / )
|
(?<! \* / )
(?! ^ )
\G
)
(?:
(?! \r? \n | \* / )
.
)*
( # (1 start)
(?: \r? \n )?
) # (1 end)
(?: \* / )?
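A minimal sketch of how this pattern might be applied from Java, assuming String.replaceAll as the "global replace" and the pattern written as a plain Java string literal without the / delimiters that regex101 displays. The class and method names (CommentLineBreaks, replaceComments) are hypothetical.

```java
public class CommentLineBreaks {
    // The pattern from the answer, as a Java string literal.
    private static final String FIND =
        "(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\\*\\/)(?!^)\\G)"
        + "(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?";

    static String replaceComments(String source) {
        // Each match covers one line (or the final fragment) of a comment;
        // "$1" keeps only the line break captured for that line, if any.
        return source.replaceAll(FIND, "$1");
    }

    public static void main(String[] args) {
        String input = "/* This comment\nhas 2 new lines\ncontained within */";
        // Print with visible "\n" markers so the remaining line breaks can be counted.
        System.out.println(replaceComments(input).replace("\n", "\\n"));
    }
}
```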
==================================================
If you ever need to handle comment block delimiters that start inside
quoted strings, like this:
String comment = "/* this is a comment*/"
Here is an extended regex that parses quoted strings as well as comments.
It is still a single regex, applied all at once in a global find/replace.
Find
"(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\")|(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\")(?<!\\*\\/)(?!^)\\G)(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?"
Replace
"$1$2"
https://regex101.com/r/tUwuAI/1
Expanded
( # (1 start)
"
[^"\\]*
(?:
\\ [\S\s]
[^"\\]*
)*
"
) # (1 end)
|
(?:
/ \*
(?= [\S\s]*? \* / )
|
(?<! " )
(?<! \* / )
(?! ^ )
\G
)
(?:
(?! \r? \n | \* / )
.
)*
( # (2 start)
(?: \r? \n )?
) # (2 end)
(?: \* / )?
You can do it with a regex "replacement loop".
Most easily done in Java 9+:
String result = Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input)
.replaceAll(r -> r.group().replaceAll(".*", ""));
The main regex has been optimized for performance. The lambda has not been optimized.
For all Java versions:
Matcher m = Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input);
StringBuffer buf = new StringBuffer();
while (m.find())
m.appendReplacement(buf, m.group().replaceAll(".*", ""));
String result = m.appendTail(buf).toString();
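The inner replaceAll(".*", "") runs a second regex over every comment. As an alternative, here is a minimal sketch that counts the line feeds in each match directly and works on Java 8. It assumes that only '\n' should be preserved (a '\r' inside a comment is dropped), and the names CommentsToNewlines and blankOutComments are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentsToNewlines {
    // Same block-comment pattern as above.
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/");

    static String blankOutComments(String source) {
        Matcher m = BLOCK_COMMENT.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous comment and this one unchanged.
            out.append(source, last, m.start());
            // Emit one '\n' for every '\n' found inside the comment.
            for (int i = m.start(); i < m.end(); i++) {
                if (source.charAt(i) == '\n') {
                    out.append('\n');
                }
            }
            last = m.end();
        }
        // Copy whatever follows the last comment.
        return out.append(source, last, source.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(blankOutComments("a/* x\ny\nz */b"));
    }
}
```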
Test
final String input = "Line 1\n"
+ "/* Inline comment */\n"
+ "Line 3\n"
+ "/* One-line\n"
+ " comment */\n"
+ "Line 6\n"
+ "/* This\n"
+ " comment\n"
+ " has\n"
+ " 4\n"
+ " lines */\n"
+ "Line 12";
Matcher m = Pattern.compile("(?s)/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input);
String result = m.replaceAll(r -> r.group().replaceAll(".*", ""));
// Show input/result side-by-side
String[] inLines = input.split("\n", -1);
String[] resLines = result.split("\n", -1);
int lineCount = Math.max(inLines.length, resLines.length);
System.out.println("input                    |result");
System.out.println("-------------------------+-------------------------");
for (int i = 0; i < lineCount; i++) {
System.out.printf("%-25s|%s%n", (i < inLines.length ? inLines[i] : ""),
(i < resLines.length ? resLines[i] : ""));
}
Output
input                    |result
-------------------------+-------------------------
Line 1                   |Line 1
/* Inline comment */     |
Line 3                   |Line 3
/* One-line              |
 comment */              |
Line 6                   |Line 6
/* This                  |
 comment                 |
 has                     |
 4                       |
 lines */                |
Line 12                  |Line 12
Maybe this expression,
\/\*.*?\*\/
in s (DOTALL) mode, might be close to what you have in mind.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class re{
public static void main(String[] args){
final String regex = "\\/\\*.*?\\*\\/";
final String string = "/* This comment\n"
+ "has 2 new lines\n"
+ "contained within */\n\n"
+ "Some codes here 1\n\n"
+ "/* This comment\n"
+ "has 2 new lines\n"
+ "contained within \n"
+ "*/\n\n\n"
+ "Some codes here 2";
final String subst = "\n\n";
final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
}
}
Output
Some codes here 1
Some codes here 2
If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.

(DEFINE) feature in regex does not work in Java

I am trying to validate a JSON string using regex. I found a regex for this in another post: https://stackoverflow.com/a/3845829/7493427
It uses the (DEFINE) feature of regex, but I think the JRegex library does not support that feature. Is there a workaround?
I used java.util.regex first, then found out about the JRegex library, but that doesn't work either.
String regex = "(?(DEFINE)" +
"(?<number> -? (?= [1-9]|0(?!\\d) ) \\d+ (\\.\\d+)? ([eE] [+-]? \\d+)? )" +
"(?<boolean> true | false | null )" +
"(?<string> \" ([^\"\\n\\r\\t\\\\\\\\]* | \\\\\\\\ [\"\\\\\\\\bfnrt\\/] | \\\\\\\\ u [0-9a-f]{4} )* \" )" +
"(?<array> \\[ (?: (?&json) (?: , (?&json) )* )? \\s* \\] )" +
"(?<pair> \\s* (?&string) \\s* : (?&json) )" +
"(?<object> \\{ (?: (?&pair) (?: , (?&pair) )* )? \\s* \\} )" +
"(?<json> \\s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \\s* )" +
")" +
"\\A (?&json) \\Z";
String test = "{\"asd\" : \"asdasdasdasdasdasd\"}";
jregex.Pattern pattern = new jregex.Pattern(regex);
jregex.Matcher matcher = pattern.matcher(test);
if(matcher.find()) {
System.out.println(matcher.groups());
}
I expected a match as the test json is valid, but I get an exception.
Exception in thread "main" jregex.PatternSyntaxException: unknown group name in conditional expr.: DEFINE
at jregex.Term.makeTree(Term.java:360)
at jregex.Term.makeTree(Term.java:219)
at jregex.Term.makeTree(Term.java:206)
at jregex.Pattern.compile(Pattern.java:164)
at jregex.Pattern.<init>(Pattern.java:150)
at jregex.Pattern.<init>(Pattern.java:108)
at com.cloak.utilities.regex.VariableValidationHelper.main(VariableValidationHelper.java:305)
You can use this rather simple Jackson setup:
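// Requires jackson-databind on the classpath.
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Map;
private static final ObjectMapper MAPPER = new ObjectMapper();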
public static boolean isValidJson(String json) {
try {
MAPPER.readValue(json, Map.class);
return true;
} catch(IOException e) {
return false;
}
}
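ObjectMapper#readValue() throws a JsonProcessingException (a subclass of IOException) when the input is invalid. Because the target type is Map.class, only JSON objects are accepted; a top-level array or scalar is also reported as invalid. MAPPER.readTree(json) accepts any JSON value.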
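As background for why a parser is suggested rather than a fix to the regex: java.util.regex has no (?(DEFINE)...) block and no (?&name) subroutine calls, so the pattern from the question cannot be compiled there. A minimal sketch that checks this, using a much shorter hypothetical pattern in the same style; the class and method names (DefineSupportCheck, compiles) are illustrative.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DefineSupportCheck {
    // Returns true if java.util.regex accepts the pattern, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // PCRE-style definition block plus subroutine call: not part of Java's syntax.
        System.out.println(compiles("(?(DEFINE)(?<digit>\\d))\\A(?&digit)\\z"));
        // Named group with a named back-reference: supported by Java.
        System.out.println(compiles("(?<digit>\\d)\\k<digit>"));
    }
}
```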

How to delimit both "=" and "==" in Java when reading

I want to be able to output both "==" and "=" as tokens.
For example, the input text file is:
biscuit==cookie apple=fruit+-()
The output:
biscuit
=
=
cookie
apple
=
fruit
+
-
(
)
What I want the output to be:
biscuit
==
cookie
apple
=
fruit
+
-
(
)
Here is my code:
Scanner s = null;
try {
s = new Scanner(new BufferedReader(new FileReader("input.txt")));
s.useDelimiter("\\s|(?<=\\p{Punct})|(?=\\p{Punct})");
while (s.hasNext()) {
String next = s.next();
System.out.println(next);
}
} finally {
if (s != null) {
s.close();
}
}
Thank you.
Edit: I want to be able to keep the current regex.
Just split the input string according to the regex below.
String s = "biscuit==cookie apple=fruit";
String[] tok = s.split("\\s+|\\b(?==+)|(?<==)(?!=)");
System.out.println(Arrays.toString(tok));
Output:
[biscuit, ==, cookie, apple, =, fruit]
Explanation:
\\s+ Matches one or more space characters.
| OR
\\b(?==+) Matches a word boundary only if it's followed by a = symbol.
| OR
(?<==) Lookbehind: the position must be preceded by a = symbol.
(?!=) And match that position only if it's not followed by a = symbol.
Update:
String s = "biscuit==cookie apple=fruit+-()";
String[] tok = s.split("\\s+|(?<!=)(?==+)|(?<==)(?!=)|(?=[+()-])");
System.out.println(Arrays.toString(tok));
Output:
[biscuit, ==, cookie, apple, =, fruit, +, -, (, )]
You might be able to qualify those punctuations with some additional assertions.
# "\\s|(?<===)|(?<=\\p{Punct})(?!(?<==)(?==))|(?=\\p{Punct})(?!(?<==)(?==))"
\s
| (?<= == )
| (?<= \p{Punct} )
(?!
(?<= = )
(?= = )
)
| (?= \p{Punct} )
(?!
(?<= = )
(?= = )
)
Info update
If some characters aren't covered in \p{Punct} just add them as a separate class within
the punctuation subexpressions.
For engines that don't do certain properties well inside classes, use this ->
# Raw: \s|(?<===)|(?<=\p{Punct}|[=+])(?!(?<==)(?==))|(?=\p{Punct}|[=+])(?!(?<==)(?==))
\s
| (?<= == )
| (?<= \p{Punct} | [=+] )
(?!
(?<= = )
(?= = )
)
| (?= \p{Punct} | [=+] )
(?!
(?<= = )
(?= = )
)
For engines that handle properties well inside classes, this is a better one ->
# Raw: \s|(?<===)|(?<=[\p{Punct}=+])(?!(?<==)(?==))|(?=[\p{Punct}=+])(?!(?<==)(?==))
\s
| (?<= == )
| (?<= [\p{Punct}=+] )
(?!
(?<= = )
(?= = )
)
| (?= [\p{Punct}=+] )
(?!
(?<= = )
(?= = )
)
In other words, you want to split on:
one or more whitespace characters
a place which has = after it and a non-= before it (like foo|= where | represents this place)
a place which has = before it and a non-= after it (like =|foo where | represents this place)
That is:
s.useDelimiter("\\s+|(?<!=)(?==)|(?<==)(?!=)");
//              ^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
//cases:        1)    2)          3)
Since it looks like you are building a parser, I would suggest using a tool that lets you define a proper grammar, such as http://www.antlr.org/. But if you must stick with regex, another improvement that makes the regex easier to build is to use Matcher#find instead of a Scanner delimiter. That way your regex and code could look like:
String data = "biscuit==cookie apple=fruit+-()";
String regex = "<=|==|>=|[\\Q<>+-=()\\E]|[^\\Q<>+-=()\\E]+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(data);
while (m.find())
System.out.println(m.group());
Output:
biscuit
==
cookie apple
=
fruit
+
-
(
)
You can make this regex more general by using
String regex = "<=|==|>=|\\p{Punct}|\\P{Punct}+";
//                       ^^^^^^^^^^ ^^^^^^^^^^^-- standard cases
//              ^^ ^^ ^^------------------------- special cases
This approach also requires reading the data from the file first and storing it in a single String, which you then parse. You can find many ways to read text from a file, for instance in this question:
Reading a plain text file in Java
so you can use something like
String data = new String(Files.readAllBytes(Paths.get("input.txt")));
You can specify the encoding the String should use when decoding the bytes with the constructor String(bytes, encoding). So you can write new String(bytes, "UTF-8") or, to avoid typos in the encoding name, use one of the constants in the StandardCharsets class, as in new String(bytes, StandardCharsets.UTF_8).
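Note that in the Matcher#find output above, cookie apple comes out as a single token because whitespace is not excluded from the last alternative. A minimal sketch of a variant that drops whitespace, assuming it is never part of a token; the class and method names (SimpleTokenizer, tokenize) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTokenizer {
    // Two-character operators first, then any single punctuation character,
    // then runs of characters that are neither whitespace nor punctuation.
    // Whitespace matches no alternative, so find() simply skips over it.
    private static final Pattern TOKEN =
        Pattern.compile("==|<=|>=|\\p{Punct}|[^\\s\\p{Punct}]+");

    static List<String> tokenize(String data) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("biscuit==cookie apple=fruit+-()")) {
            System.out.println(token);
        }
    }
}
```

Since \p{Punct} includes the underscore, identifiers such as my_name would be split into three tokens; the last alternative would need adjusting if that matters.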
(?===)|(?<===)|\s|(?<!=)(?==)|(?<==)(?!=)|(?=\p{P})|(?<=\p{P})|(?=\+)
You can try this. See demo.
http://regex101.com/r/wQ1oW3/18

String between double quotes using Regex in Java

How can I get the strings between double quotes using regex in Java?
_settext(_textbox(0,_near(_span("My Name"))) ,"Brittas John");
For example, I need My Name and Brittas John.
Get the matched group at index 1, which is captured by enclosing the pattern in parentheses (...):
"([^"]*)"
Pattern explanation:
" '"'
( group and capture to \1:
[^"]* any character except: '"' (0 or more times) (Greedy)
) end of \1
" '"'
sample code:
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher("_settext(_textbox(0,_near(_span(\"My Name\"))) ,\"Brittas John\");");
while (m.find()) {
System.out.println(m.group(1));
}
Try this regex..
public static void main(String[] args) {
String s = "_settext(_textbox(0,_near(_span(\"My Name\"))) ,\"Brittas John\");";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
}
Output:
My Name
Brittas John
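Both patterns above stop at the first " they meet, which is fine as long as the quoted strings contain no escaped quotes. A minimal sketch for input where \" may appear inside a string, assuming backslash is the escape character; the class and method names (QuotedStrings, extract) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // Inside the quotes, accept either a character that is neither '"' nor '\',
    // or a backslash followed by any single character (an escape sequence).
    private static final Pattern QUOTED =
        Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // raw content, escape sequences left as-is
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "_settext(_textbox(0,_near(_span(\"My Name\"))) ,\"Brittas John\");";
        for (String value : extract(s)) {
            System.out.println(value);
        }
    }
}
```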
