Regular Expression issue, deleting whole lines


I have been trying for the last couple of hours to create a regular expression that deletes lines of text that start with particular wordage after selecting out a rating.
Below is what I'm trying to delete. I'm also trying to pull the Rating out of the paragraph (it's pass or fail).
Review Master: text here
1111111111 text here
Rating: Fail text here
Review Master Page text here
I am trying to delete all lines that start with any of the following.
So far I have these patterns:
^Review Master:
^[0-9]{10}
^Rating:
^Review Master Page
Again, I am struggling with the replacement (deleting the lines) and with capturing only the rating.

If you want to find those exact lines in your file then this will work:
"Review Master:\n\\d++\nRating:\\s*+(\\w++)\nReview Master Page"
Here is an example using your input as a test string:
public static void main(String[] args) throws Exception {
final String in = "Review Master:\n"
+ "1111111111\n"
+ "Rating: Fail\n"
+ "Review Master Page";
final Matcher m = Pattern.compile(""
+ "Review Master:\n"
+ "\\d++\n"
+ "Rating:\\s*+(\\w++)\n"
+ "Review Master Page").matcher(in);
while(m.find()) {
System.out.println(m.group(1));
}
}
Output:
Fail
If you want to delete those lines, then you need to replace the pattern in the file contents, which you have as a String:
public static void main(String[] args) throws Exception {
final String in = "Some other text\n"
+ "Review Master:\n"
+ "1111111111\n"
+ "Rating: Fail\n"
+ "Review Master Page\n"
+ "Some final text";
final Matcher m = Pattern.compile(""
+ "\n?"
+ "Review Master:\n"
+ "\\d++\n"
+ "Rating:\\s*+(\\w++)\n"
+ "Review Master Page").matcher(in);
final StringBuffer output = new StringBuffer();
while (m.find()) {
System.out.println(m.group(1));
m.appendReplacement(output, "");
}
m.appendTail(output);
System.out.println("Result: \"" + output.toString() + "\"");
}
Output:
Fail
Result: "Some other text
Some final text"
i.e. we use the Matcher to yank the pass/fail from the input and also build the output replacing the block of text matched with nothing.
You have not made clear which parts of the patterns are variable.
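If the text on each line varies, as in the sample from the question, a line-oriented variant may be easier to adapt. The following is only a minimal sketch, not the method used above: it assumes the four prefixes listed in the question, relies on the MULTILINE flag so that ^ anchors at the start of every line, and the names LineFilter, extractRating and deleteLines are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineFilter {
    // (?m) makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern RATING =
            Pattern.compile("(?m)^Rating:[ \\t]*(\\w+)");

    // One of the four prefixes from the question, the rest of that line,
    // and the line break that ends it (or the end of the input).
    private static final Pattern UNWANTED_LINE = Pattern.compile(
            "(?m)^(?:Review Master:|[0-9]{10}|Rating:|Review Master Page).*(?:\\R|\\z)");

    /** Returns the word after "Rating:", or null if no such line exists. */
    static String extractRating(String text) {
        Matcher m = RATING.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the text with every line that starts with a listed prefix removed. */
    static String deleteLines(String text) {
        return UNWANTED_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "Some other text\n"
                + "Review Master: text here\n"
                + "1111111111 text here\n"
                + "Rating: Fail text here\n"
                + "Review Master Page text here\n"
                + "Some final text";
        // Read the rating first, because deleteLines removes the Rating line too.
        System.out.println(extractRating(in));
        System.out.println(deleteLines(in));
    }
}
```

With the sample input this should print Fail, followed by the two remaining lines "Some other text" and "Some final text". Unlike the block-matching pattern, this version does not require the four lines to appear together or in order.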

Related

Regex: starts with messages and string between parent message curly brace

I want to get all the message data: each top-level message, together with everything between the curly braces of that parent message. With the pattern below, I am not getting the whole parent body.
String data = "syntax = \"proto3\";\r\n" +
"package grpc;\r\n" +
"\r\n" +
"import \"envoyproxy/protoc-gen-validate/validate/validate.proto\";\r\n" +
"import \"google/api/annotations.proto\";\r\n" +
"import \"google/protobuf/wrappers.proto\";\r\n" +
"import \"protoc-gen-swagger/options/annotations.proto\";\r\n" +
"\r\n" +
"message Acc {\r\n" +
" message AccErr {\r\n" +
" enum Enum {\r\n" +
" UNKNOWN = 0;\r\n" +
" CASH = 1;\r\n" +
" }\r\n" +
" }\r\n" +
" string account_id = 1;\r\n" +
" string name = 3;\r\n" +
" string account_type = 4;\r\n" +
"}\r\n" +
"\r\n" +
"message Name {\r\n" +
" string firstname = 1;\r\n" +
" string lastname = 2;\r\n" +
"}";
List<String> allMessages = new ArrayList<>();
Pattern pattern = Pattern.compile("message[^\\}]*\\}");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
String str = matcher.group();
allMessages.add(str);
System.out.println(str);
}
}
I expect my list of strings to have size 2, with contents like the following.
allMessages.get(0) should be:
message Acc {
message AccErr {
enum Enum {
UNKNOWN = 0;
CASH = 1;
}
}
string account_id = 1;
string name = 3;
string account_type = 4;
}
and allMessages.get(1) should be:
message Name {
string firstname = 1;
string lastname = 2;
}

First remove everything before the first line that starts with "message", then split on line breaks that are followed by "message" (the line breaks are part of the split delimiter, so the blank lines between parent messages are consumed):
String[] messages = data.replaceAll("(?sm)\\A.*?(?=^message)", "").split("\\R+(?=message)");
See live demo.
If you actually need a List<String>, pass that result to Arrays.asList():
List<String> allMessages = Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=^message)", "").split("\\R+(?=message)"));
The first regex matches everything from the start of the input up to, but not including, the first line that starts with message, and replaces it with an empty string (i.e. deletes it). Breaking it down:
(?sm) turns on flags s, which makes dot also match newlines, and m, which makes ^ and $ match at the start and end of each line
\\A means the very start of input
.*? means any quantity of any character (including newlines, because the s flag is set); the trailing ? makes the quantifier reluctant, so it matches as few characters as possible while still matching
(?=^message) is a lookahead and means the following characters are the start of a line, then "message"
See regex101 live demo for a thorough explanation.
The split regex matches one or more line break sequences when they are followed by "message":
\\R+ means one or more line break sequences (all OS variants)
(?=message) is a look ahead and means the following characters are "message"
See regex101 live demo for a thorough explanation.
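The two steps can be put together as a small runnable sketch. The class and method names (MessageSplitter, topLevelMessages) and the shortened proto input are assumptions for illustration. The approach relies on nested messages being indented: the line break before an indented message is followed by whitespace rather than "message", so it is not a split point.

```java
import java.util.Arrays;
import java.util.List;

class MessageSplitter {
    /** Splits proto source into its top-level message blocks (nested messages must be indented). */
    static List<String> topLevelMessages(String data) {
        // Step 1: drop everything before the first line that starts with "message".
        String body = data.replaceAll("(?sm)\\A.*?(?=^message)", "");
        // Step 2: split on line breaks that are directly followed by "message".
        return Arrays.asList(body.split("\\R+(?=message)"));
    }

    public static void main(String[] args) {
        String data = "syntax = \"proto3\";\r\n"
                + "package grpc;\r\n"
                + "\r\n"
                + "message Acc {\r\n"
                + "  message AccErr {\r\n"
                + "  }\r\n"
                + "  string account_id = 1;\r\n"
                + "}\r\n"
                + "\r\n"
                + "message Name {\r\n"
                + "  string firstname = 1;\r\n"
                + "}";
        List<String> messages = topLevelMessages(data);
        System.out.println(messages.size());
        for (String message : messages) {
            System.out.println(message);
            System.out.println("-----");
        }
    }
}
```

For this input the list should contain two entries, the whole Acc block (including the nested AccErr) and the Name block. Note that Arrays.asList returns a fixed-size list; wrap it in new ArrayList<>(...) if you need to add or remove elements.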

Try this for your regex. It anchors on message being the start of a line, and uses a positive lookahead to find the next message or the end of messages.
Pattern.compile("(?s)\r\n(message.*?)(?=(\r\n)+message|$)")
// or
Pattern.compile("(?s)\r?\n(message.*?)(?=(\r?\n)+message|$)")
No splitting, parsing, or managing nested braces either :)
https://regex101.com/r/Wa2xxx/1
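A minimal sketch of how this pattern might be used with Matcher.find() follows. The class name MessageFinder, the method find and the shortened input are assumptions for illustration; the regex is the \r?\n variant, written with regex escapes (\\r, \\n) instead of literal line-break characters, which should behave the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageFinder {
    // (?s) lets . cross line breaks. The lookahead stops each match right before the
    // line break(s) that precede the next unindented "message", or at the end of input.
    private static final Pattern TOP_LEVEL =
            Pattern.compile("(?s)\\r?\\n(message.*?)(?=(\\r?\\n)+message|$)");

    /** Collects group 1 (the message block without its leading line break) of every match. */
    static List<String> find(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = TOP_LEVEL.matcher(data);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "syntax = \"proto3\";\n"
                + "\n"
                + "message Acc {\n"
                + "  message AccErr {\n"
                + "  }\n"
                + "  string account_id = 1;\n"
                + "}\n"
                + "\n"
                + "message Name {\n"
                + "  string firstname = 1;\n"
                + "}";
        for (String message : find(data)) {
            System.out.println(message);
            System.out.println("-----");
        }
    }
}
```

Two caveats with this sketch: the pattern requires a line break before each message, so a message on the very first line of the input would be missed, and, as with the split approach, nested messages are only skipped because they are indented.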

Unwanted elements appearing when splitting a string with multiple separators in Java

I have a string from which I need to remove all of the listed punctuation characters and spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
System.out.println("spart[4]: " + spart[4]);
But, I am getting some elements which are blank. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: fever
spart[4]: normal
My desired output is:
spart[0]: s
spart[1]: film
spart[2]: fever
spart[3]: normal
spart[4]: curse

Try with this:
public static void main(String[] args) {
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");
for (String string : spart) {
System.out.println("'"+string+"'");
}
}
output:
's'
'film'
'fever'
'normal'
'curse'

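I believe it is because the + sits inside the character class, where it is a literal plus sign rather than a quantifier. Each delimiter character is therefore matched on its own, so two adjacent delimiters (such as "]" followed by a space) leave an empty string between them. Move the + outside the closing bracket to match a whole run of delimiters at once.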
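The difference between a + inside and outside the brackets can be seen in a small comparison. This is a sketch with a reduced delimiter set (only the characters that occur in the sample string), and the class and constant names are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Inside [...] the + is a literal plus sign, so this class matches exactly one delimiter.
    static final String ONE_DELIMITER = "[\\[\\]();\\s+]";
    // With + after the closing bracket, a whole run of delimiters is one separator.
    static final String DELIMITER_RUN = "[\\[\\]();\\s]+";

    public static void main(String[] args) {
        String s = "s[film] fever(normal) curse;";
        // prints [s, film, , fever, normal, , curse]
        System.out.println(Arrays.toString(s.split(ONE_DELIMITER)));
        // prints [s, film, fever, normal, curse]
        System.out.println(Arrays.toString(s.split(DELIMITER_RUN)));
    }
}
```

In both cases String.split drops trailing empty strings, which is why the final ";" does not produce an extra element.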

String[] spart = s.replaceAll("\\W", " ").split(" +");

How to remove commas at the end of any string

I have Strings "a,b,c,d,,,,, ", ",,,,a,,,,"
I want these strings to be converted into "a,b,c,d" and ",,,,a" respectively.
I am writing a regular expression for this. My java code looks like this
public class TestRegx{
public static void main(String[] arg){
String text = ",,,a,,,";
System.out.println("Before " +text);
text = text.replaceAll("[^a-zA-Z0-9]","");
System.out.println("After " +text);
}}
But this is removing all the commas here.
How can I write this to get the result given above?

Use:
text.replaceAll(",*$", "")
As Jonny mentioned in the comments, you can also use:
text.replaceAll(",+$", "")

Your first example had a space at the end, so it needs to match [, ]. When using the same regular expression multiple times, it's better to compile it up front, and it only needs to replace once, and only if at least one character will be removed (+).
Simple version:
text = text.replaceFirst("[, ]+$", "");
Full code to test both inputs:
String[] texts = { "a,b,c,d,,,,, ", ",,,,a,,,," };
Pattern p = Pattern.compile("[, ]+$");
for (String text : texts) {
String text2 = p.matcher(text).replaceFirst("");
System.out.println("Before \"" + text + "\"");
System.out.println("After \"" + text2 + "\"");
}
Output
Before "a,b,c,d,,,,, "
After "a,b,c,d"
Before ",,,,a,,,,"
After ",,,,a"

Java regEx URL matching issue

and as usual thank you in advance.
I am trying to familiarize myself with regEx and I am having an issue matching a URL.
Here is an example URL:
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
here is what my regex breakdown looks like:
[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html
the [id] is a string 22 characters in length that can be either uppercase or lowercase letters, as well as numbers. However, I do not want to extract that from the URL. Just clarifying
Now, I need to extract two values from this url.
First,
I need to extract the dir(s). The [dir] is optional, but there can also be any number of them. In other words, that part may be absent, or it could be dir1/dir2/dir3, etc. So, going off my first example:
www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Here I would need to extract dir1/dir2/dir3 where a dir is a string that is a single word with all lowercase letters (ie sports/mlb/games). There are no numbers in the dir, only using that as an example.
But in this example of a valid URL:
www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
There is no [dir] so I would not extract anything. thus, the [dir] is optional
Secondly,
I need to extract the [storyTitle]. The [storyTitle] is also optional, just like the [dir] above; however, if there is a storyTitle, there can only be one.
So going off my previous examples
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
would be valid, and I need to extract 'title-of-some-story'. Story titles are dash-separated strings that are always lowercase. The example below is also valid:
www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
In the above example, there is no [storyTitle] thus making it optional
Lastly, just to be thorough, a URL without a [dir] and without a [storyTitle] is also valid. Example:
www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Is a valid URL. Any input would be helpful I hope I am clear.

Here is one example that will work.
public static void main(String[] args) {
Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");
String[] strings ={
"www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
"www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
"www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
"www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
"www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
};
for (int idx = 0; idx < strings.length; idx++) {
Matcher m = p.matcher(strings[idx]);
if (m.find()) {
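String dir = m.group(1); // null when the URL has no dir segment
String title = m.group(2); // null when the URL has no story title
if (dir != null) {
dir = dir.substring(1); // remove the leading /
}
if (title != null) {
title = title.substring(1); // remove the leading /
}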
System.out.println(idx+": Dir: "+dir+", Title: "+title);
}
}
}
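A stricter variant is also possible if the constraints stated in the question are encoded directly: lowercase words for each dir, a dash-separated lowercase title, and a 22-character alphanumeric id. The following is a sketch under those assumptions, using named groups; the class name StoryUrl, the method parse and the group names dir and title are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StoryUrl {
    private static final Pattern URL = Pattern.compile(
            "^(?:https?://)?[^/]+"                    // optional scheme, then the host
            + "(?:/(?<dir>[a-z]+(?:/[a-z]+)*))?"      // optional dirs: lowercase words
            + "/\\d{4}/\\d{2}/\\d{2}"                 // year/month/day
            + "(?:/(?<title>[a-z]+(?:-[a-z]+)*))?"    // optional dash-separated title
            + "/[A-Za-z0-9]{22}/htmlpage\\.html$");   // 22-character id, fixed page name

    /** Returns {dir, title}, each null when absent, or null if the URL does not match. */
    static String[] parse(String url) {
        Matcher m = URL.matcher(url);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("dir"), m.group("title") };
    }

    public static void main(String[] args) {
        String[] parts = parse("www.examplesite.com/sports/mlb/games/2012/06/19/"
                + "title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html");
        System.out.println("Dir: " + parts[0] + ", Title: " + parts[1]);
    }
}
```

Because the dir and title groups exclude the leading slash, no substring call is needed afterwards. A dir that contains digits, such as dir1, would not match this stricter pattern.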

Here is an all regex solution.
Edit: Allows for http://
Java source:
import java.util.*;
import java.lang.*;
import java.util.regex.*;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";
// Compile regular expression
Pattern pattern = Pattern.compile(patternStr);
// Match 1st url
System.out.println("Match 1st URL:");
Matcher matcher = pattern.matcher(url);
if (matcher.find()) {
System.out.println("URL: " + matcher.group(0));
System.out.println("DIR: " + matcher.group(1));
System.out.println("TITLE: " + matcher.group(2));
}
else{ System.out.println("No match."); }
// Match 2nd url
System.out.println("\nMatch 2nd URL:");
matcher = pattern.matcher(url2);
if (matcher.find()) {
System.out.println("URL: " + matcher.group(0));
System.out.println("DIR: " + matcher.group(1));
System.out.println("TITLE: " + matcher.group(2));
}
else{ System.out.println("No match."); }
// Match 3rd url
System.out.println("\nMatch 3rd URL:");
matcher = pattern.matcher(url3);
if (matcher.find()) {
System.out.println("URL: " + matcher.group(0));
System.out.println("DIR: " + matcher.group(1));
System.out.println("TITLE: " + matcher.group(2));
}
else{ System.out.println("No match."); }
}
}
Output:
Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story
Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE:
Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR:
TITLE: title-of-some-story

get particular string using regex in java

How can I extract only the link from a string, using a regex or any other approach?
Here is the Java code:
String aas = "window.open("+"\""+"http://www.example.com/jscript/jex5.htm"+"\""+")"+"\n"+"window.open("+"\""+"http://www.example.com/jscript/jex5.htm"+"\""+")";
How do I get the link http://www.example.com/jscript/jex5.htm?
Thanks in advance.

The regex
(?<=window\.open\(")[^"]*(?="\))
matches the link in the string you have given. Escaped as a Java string literal, it reads
"(?<=window\\.open\\(\")[^\"]*(?=\"\\))"

This will print out the first URL contained in the string that starts with "http://":
public static void main(String[] args) throws Exception {
String javascriptString = "window.open(" + "\"" + "http://www.example.com/jscript/jex5.htm" + "\"" + ")" + "\n" + "window.open(" + "\""
+ "http://www.example.com/jscript/jex5.htm" + "\"" + ")";
Pattern pattern = Pattern.compile(".*(http://.*)\".*\n.*");
Matcher m = pattern.matcher(javascriptString);
if (m.matches()) {
System.out.println(m.group(1));
}
}
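To collect every link rather than only the first one, a lookbehind/lookahead pattern for the text between window.open(" and ") can be combined with Matcher.find(). This is a minimal sketch; the class name LinkExtractor and the method links are made up for illustration, and it assumes each link is wrapped exactly as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Matches whatever sits between window.open(" and "), without including either side.
    private static final Pattern LINK =
            Pattern.compile("(?<=window\\.open\\(\")[^\"]*(?=\"\\))");

    /** Returns every link passed to window.open("...") in the given script text. */
    static List<String> links(String script) {
        List<String> result = new ArrayList<>();
        Matcher m = LINK.matcher(script);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String aas = "window.open(\"http://www.example.com/jscript/jex5.htm\")\n"
                + "window.open(\"http://www.example.com/jscript/jex5.htm\")";
        for (String link : links(aas)) {
            System.out.println(link);
        }
    }
}
```

For the string from the question this should print the same link twice, once per window.open call.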
