I have the following text:
...,Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY:...,
Now I want to extract the date after NOT IN CHARGE SINCE: until the comma.
So I need only 03.2009 as the result in my substring.
So how can I handle that?
String substr = "not in charge since:";
String before = s.substring(0, s.indexOf(substr));
String after = s.substring(s.indexOf(substr),s.lastIndexOf(","));
EDIT
for (String s : split) {
s = s.toLowerCase();
if (s.contains("ex peps")) {
String substr = "not in charge since:";
String before = s.substring(0, s.indexOf(substr));
String after = s.substring(s.indexOf(substr), s.lastIndexOf(","));
System.out.println(before);
System.out.println(after);
System.out.println("PEP!!!");
} else {
System.out.println("Line ok");
}
}
But that is not the result I want.
You can use a Pattern, for example:
String str = "Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY";
Pattern p = Pattern.compile("\\d{2}\\.\\d{4}");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
03.2009
Note: if you want to get every such date in your String, use while instead of if.
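Following that note, here is a minimal sketch that collects every date with while instead of if. It assumes all dates use the two-digit month, dot, four-digit year form from the question; the class name, method name and the second date in the sample string are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    // Two digits, a literal dot, four digits (e.g. 03.2009)
    private static final Pattern DATE = Pattern.compile("\\d{2}\\.\\d{4}");

    // Returns every date found in the text, in order of appearance
    public static List<String> findAllDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) { // while, not if: keep scanning after each match
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String str = "Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY, SINCE: 11.2012,";
        System.out.println(findAllDates(str)); // [03.2009, 11.2012]
    }
}
```

Compiling the Pattern once as a static field avoids recompiling it on every call.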
Edit
Or you can use:
String str = "Niedersachsen,NOT IN CHARGE SINCE: 03.03.2009, CATEGORY";
Pattern p = Pattern.compile("SINCE:(.*?)\\,");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1).trim());
}
You can use ':' to separate the String s:
String substr = "NOT IN CHARGE SINCE:";
int colon = s.indexOf(':', s.indexOf(substr));
String before = s.substring(0, s.indexOf(substr));
String after = s.substring(colon + 1, s.indexOf(',', colon)).trim();
Of course, regular expressions give you more ways to search and match, but assuming that the ":" is the key thing you are looking for (and the first ":" in the string is the one after SINCE), then:
s.substring(s.indexOf(':') + 1, s.indexOf(',', s.indexOf(':'))).trim();
is the simplest and lowest-overhead way of fetching that substring. Searching for the first ',' after the ':' (rather than the last ',' in the string) keeps the rest of the line out of the result.
Hint: as you are searching for a single character, pass a char as the search argument, not a String!
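Building on the indexOf idea, here is a minimal sketch that starts searching at the label itself instead of at the first ':' in the whole string, then stops at the next comma. That way other colons (such as the one after CATEGORY) and the trailing comma cannot affect the result. The helper name valueAfter and its null-on-missing behavior are assumptions made for illustration.

```java
public class LabelValue {
    // Returns the trimmed text between the label and the next comma,
    // or null if the label (or a comma after it) is absent.
    public static String valueAfter(String s, String label) {
        int start = s.indexOf(label);
        if (start < 0) {
            return null;
        }
        start += label.length();          // skip past the label itself
        int end = s.indexOf(',', start);  // char search, starting after the label
        if (end < 0) {
            return null;
        }
        return s.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String s = "...,Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY:...,";
        System.out.println(valueAfter(s, "NOT IN CHARGE SINCE:")); // 03.2009
    }
}
```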
If you have a more generic use case and you know the structure of the text well, you might profit from using regular expressions:
Pattern pattern = Pattern.compile(".*NOT IN CHARGE SINCE: ([0-9.]*),");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
A more generic way to solve your problem is to use a regex that matches every substring between : and ,
Pattern pattern = Pattern.compile("(?<=:)(.*?)(?=,)");
Matcher m = pattern.matcher(str);
while (m.find()) System.out.println(m.group().trim());
You have to create a pattern for it. Try this as a simple regex starting point, and feel free to improvise on it:
String s = "...,Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY:....,";
Pattern pattern = Pattern.compile(".*NOT IN CHARGE SINCE: ([\\d\\.]*).*");
Matcher matcher = pattern.matcher(s);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
That should get you whatever run of digits and dots appears as the date.
Related
Say for example I have the following string with a named capture group:
/this/(?<capture1>.*)/a/string/(?<capture2>.*)
And I want to replace the capture group with a value like "foo" so that I end up with a string that looks like:
/this/foo/a/string/bar
Limitations are:
Regex must be used, as the string is evaluated elsewhere, but it doesn't have to be a capture group.
I'd rather not have to regex match the regex.
EDIT: There can be many groups in the string.
You can find the start and end index of each match:
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int startindex = matcher.start();
int stopindex = matcher.end();
// Your code for replacing the text between these indexes with foo goes here;
// since you know the indexes, you can use a StringBuilder to delete and insert the characters
}
Full implementation:
public static String getnewString(String text, String reg) {
StringBuilder result = new StringBuilder(text);
Matcher matcher = Pattern.compile(reg).matcher(text);
// Once a replacement changes the length, result drifts away from text, so track the offset
int offset = 0;
while (matcher.find()) {
int startindex = matcher.start() + offset;
int stopindex = matcher.end() + offset;
result.replace(startindex, stopindex, "foo");
offset += "foo".length() - (matcher.end() - matcher.start());
}
return result.toString();
}
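When every match is replaced with the same literal, the standard Matcher.replaceAll does the delete-and-insert bookkeeping for you. A minimal sketch; Matcher.quoteReplacement keeps a '$' or '\' in the replacement from being read as a group reference. The class name and the groupDef regex are illustrative, and groupDef assumes the group bodies contain no ')'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceMatches {
    // Replaces every match of reg in text with the literal replacement
    public static String replaceEach(String text, String reg, String replacement) {
        Matcher matcher = Pattern.compile(reg).matcher(text);
        return matcher.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Matches a named-group definition such as (?<capture1>.*)
        String groupDef = "\\(\\?<\\w+>[^)]*\\)";
        String text = "/this/(?<capture1>.*)/a/string/(?<capture2>.*)";
        System.out.println(replaceEach(text, groupDef, "foo")); // /this/foo/a/string/foo
    }
}
```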
Try this,
int lastIndex = s.lastIndexOf("/");
String newString = s.substring(0, lastIndex+1).concat("newString");
System.out.println(newString);
Take the substring up to the last '/' and append the new string to it, as above. Note that this replaces only what follows the last '/'.
I got it:
String string = "/this/(?<capture1>.*)/a/string/(?<capture2>.*)";
Pattern pattern = Pattern.compile(string);
Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
// The regex matches its own source text, so each group captures its own (?<name>.*) definition
string = string.replace(matcher.group("capture1"), "value 1");
string = string.replace(matcher.group("capture2"), "value 2");
}
Crazy, but works.
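The self-matching trick above can be generalized to any number of named groups with a map from group name to value. This sketch keeps the answer's assumption: each group is written so it can match its own definition (for example (?<name>.*)), which lets the pattern match its own source text. It uses Matcher.start(String) and Matcher.end(String) (Java 8+) and replaces from right to left so earlier indexes stay valid. The class name, the fill method and the value map are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupReplacer {
    // Replaces each named group's span in the regex source with the mapped value.
    // Assumes the regex matches its own text (true for groups like (?<name>.*)).
    public static String fill(String regex, Map<String, String> values) {
        Matcher matcher = Pattern.compile(regex).matcher(regex);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("regex does not match its own source");
        }
        // Sort group names by start index, descending, so replacing a later
        // span never shifts the indexes of an earlier one.
        List<String> names = new ArrayList<>(values.keySet());
        names.sort((a, b) -> Integer.compare(matcher.start(b), matcher.start(a)));
        StringBuilder result = new StringBuilder(regex);
        for (String name : names) {
            result.replace(matcher.start(name), matcher.end(name), values.get(name));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("capture1", "foo");
        values.put("capture2", "bar");
        System.out.println(fill("/this/(?<capture1>.*)/a/string/(?<capture2>.*)", values));
        // /this/foo/a/string/bar
    }
}
```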
I have an array of inputs like this, where each entry is an email ID in reverse order followed by some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?
There is a pattern: a run of uppercase letters and periods, then the # character, then another run of uppercase letters and periods.
Translated into a regex, this becomes something like:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
Why do you think there is no pattern?
You clearly want the string up to the first lowercase letter.
You can use the regex (^[^a-z]+) to match and extract it.
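In Java, that regex can be applied with Matcher.find(); group 1 is everything from the start of the string up to (not including) the first lowercase letter. A minimal sketch, assuming the email part never contains lowercase letters; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReversedEmailPrefix {
    // ^ anchors at the start; [^a-z]+ consumes characters until the first lowercase letter
    private static final Pattern PREFIX = Pattern.compile("(^[^a-z]+)");

    public static String prefix(String input) {
        Matcher m = PREFIX.matcher(input);
        return m.find() ? m.group(1) : ""; // empty if the input starts with a lowercase letter
    }

    public static void main(String[] args) {
        String[] input = {"MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn"};
        for (String s : input) {
            System.out.println(prefix(s));
        }
    }
}
```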
Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
You can do it like this:
String[] arr = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
// Compile the pattern once, outside the loop
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
for (String test : arr) {
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}
I want to split a string according to my regex in Java. Suppose I have the String "R12T12W5P12T5L3". Now I want something like this: myStr[0]="R12T12", myStr[1]="W5P12", myStr[2]="T5L3". I want my regex to be first a character, then a number, then again a character, and last a number.
How can I do that?
String s="R12T12W5P12T5L3";
String regex = "([A-Z]\\d+){2}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group(0));
}
This will print:
R12T12
W5P12
T5L3
You can put them into a list and convert it into an array at the end.
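A minimal sketch of that list-then-array step, reusing the ([A-Z]\d+){2} regex from this answer. Any trailing characters that do not form a complete letter-number-letter-number group are silently dropped. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterNumberPairs {
    // Two repetitions of "one uppercase letter followed by one or more digits"
    private static final Pattern PAIR = Pattern.compile("([A-Z]\\d+){2}");

    public static String[] split(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            parts.add(m.group(0));
        }
        return parts.toArray(new String[0]); // convert the list to an array at the end
    }

    public static void main(String[] args) {
        String[] myStr = split("R12T12W5P12T5L3");
        System.out.println(String.join(",", myStr)); // R12T12,W5P12,T5L3
    }
}
```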
All operations, from the regex to the string building, in JavaScript:
var str = "R12T12W5P12T5L3";
var result = str.split(/(?=[^\d]){2}/).map(function(v,i,a){
return i%2 ? a[i-1]+v+'",' : 'myStr['+(i/2)+']="'
}).join('').slice(0,-1);
Result :
myStr[0]="R12T12",myStr[1]="W5P12",myStr[2]="T5L3"
I am trying to match a regex pattern in Java, and I have two questions:
Inside the pattern I'm looking for there is a known beginning and then an unknown string that I want to get up until the first occurrence of an &.
There are multiple occurrences of these patterns in the line and I would like to get each occurrence separately.
For example I have this input line:
1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true 0 2819357575609397706
And I am interested in these strings:
Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
Screen+Refresh+Rate%7C120HZ
Assuming the known beginning is filter=**, the regular expression pattern (?:filter=\\*\\*)(.*?)(?:&) should get you what you need. Use Matcher.find() to get all occurrences of the pattern in a given string. Using the test string you provided, the following:
final Pattern p = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");
final Matcher m = p.matcher(testString);
int cnt = 0;
while (m.find()) {
System.out.println(++cnt + ": G1: " + m.group(1));
}
Will output:
1: G1: Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
2: G1: Screen+Refresh+Rate%7C120HZ**
If I knew I might need other query parameters in the future, I think it would be more prudent to decode and parse the URL.
String url = URLDecoder.decode("http://www.gold.com/shc/s/c_10153_12605_" +
"Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate" +
"%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true"
,"utf-8");
Pattern amp = Pattern.compile("&");
Pattern eq = Pattern.compile("=");
Map<String, String> params = new HashMap<String, String>();
String queryString = url.substring(url.indexOf('?') + 1);
for(String param : amp.split(queryString)) {
String[] pair = eq.split(param);
params.put(pair[0], pair[1]);
}
for(Entry<String, String> param : params.entrySet()) {
System.out.format("%s = %s\n", param.getKey(), param.getValue());
}
Output
subCatView = true
viewItems = 25
sName = View All
filter = Screen Refresh Rate|120HZ^Screen Size|37 in. to 42 in.
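One caveat with decoding the whole URL first: an encoded '&' or '=' (%26, %3D) inside a value turns into a real separator and breaks the split. Here is a minimal sketch of the safer order: split the raw query string, then decode each name and value separately. It uses URLDecoder.decode(String, String) with "UTF-8" so it also compiles on Java 8. It uses a LinkedHashMap so parameters stay in their original order, which the HashMap above does not guarantee. The class and method names are illustrative, and a parameter without '=' is assumed to have an empty value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParams {
    public static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return params; // no query string
        }
        for (String param : url.substring(q + 1).split("&")) {
            if (param.isEmpty()) {
                continue;
            }
            int eq = param.indexOf('=');
            String name = eq < 0 ? param : param.substring(0, eq);
            String value = eq < 0 ? "" : param.substring(eq + 1);
            params.put(decode(name), decode(value)); // decode only after splitting
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions"
                + "?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All";
        parse(url).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```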
In your example, there is sometimes a "**" at the end before the "&". But basically (assuming "filter=" is the start pattern you are looking for), you want something like:
"filter=([^&]+)&"
Using the regular expression (?<=filter=\*{0,2})[^&]*[^&*]+ in java:
Pattern p = Pattern.compile("(?<=filter=\\*{0,2})[^&]*[^&*]+");
String s = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
EDIT:
Added [^&*]+ to the end of the regex to prevent the ** from being included in the second match.
EDIT2:
Changed regular expression to use lookbehind.
The regex you're looking for is
Screen\+Refresh\+Rate[^&]*
You could use Matcher.find() to find all matches.
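In a Java string literal each backslash must be doubled, so that regex is written "Screen\\+Refresh\\+Rate[^&]*". Alternatively, Pattern.quote turns the literal prefix into a regex that matches it exactly, so the '+' characters need no manual escaping. A minimal sketch; the class name and the shortened sample line are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScreenRefreshMatches {
    // Pattern.quote makes '+' match literally instead of acting as a quantifier
    private static final Pattern P = Pattern.compile(Pattern.quote("Screen+Refresh+Rate") + "[^&]*");

    public static List<String> findAll(String line) {
        List<String> matches = new ArrayList<>();
        Matcher m = P.matcher(line);
        while (m.find()) { // find() moves on to the next occurrence each time
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size&sName=x ?filter=Screen+Refresh+Rate%7C120HZ&s=y";
        System.out.println(findAll(line));
    }
}
```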
Are you looking for the string that follows "filter=", ignoring any leading "*" characters, and ends at the first "&"?
You can try the following:
String str = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Pattern p = Pattern.compile("filter=(?:\\**)([^&]+?)(?:\\**)&");
Matcher matcher = p.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
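A simpler variant, assuming the "**" markers are absent or handled separately, captures everything after "filter=" up to the next '&' or whitespace. Because it does not require a trailing '&', a filter value at the very end of a URL is still found. Excluding whitespace stops a match from running into the next whitespace-separated field of the line. A minimal sketch; the class name and sample line are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterValues {
    // Capture after "filter=" up to the next '&', whitespace, or end of input
    private static final Pattern FILTER = Pattern.compile("filter=([^&\\s]+)");

    public static List<String> find(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = FILTER.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "x http://a/b?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All y http://a/c?filter=Screen+Size";
        System.out.println(find(line)); // [Screen+Refresh+Rate%7C120HZ, Screen+Size]
    }
}
```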
I have a file with some custom tags and I'd like to write a regular expression to extract the string between the tags. For example if my tag is:
[customtag]String I want to extract[/customtag]
How would I write a regular expression to extract only the string between the tags. This code seems like a step in the right direction:
Pattern p = Pattern.compile("[customtag](.+?)[/customtag]");
Matcher m = p.matcher("[customtag]String I want to extract[/customtag]");
Not sure what to do next. Any ideas? Thanks.
You're on the right track. Now you just need to extract the desired group, as follows:
final Pattern pattern = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
final Matcher matcher = pattern.matcher("<tag>String I want to extract</tag>");
if (matcher.find()) {
System.out.println(matcher.group(1)); // Prints String I want to extract
}
If you want to extract multiple hits, try this:
public static void main(String[] args) {
final String str = "<tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag>";
System.out.println(Arrays.toString(getTagValues(str).toArray())); // Prints [apple, orange, pear]
}
private static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
private static List<String> getTagValues(final String str) {
final List<String> tagValues = new ArrayList<String>();
final Matcher matcher = TAG_REGEX.matcher(str);
while (matcher.find()) {
tagValues.add(matcher.group(1));
}
return tagValues;
}
However, I agree that regular expressions are not the best answer here. I'd use XPath to find elements I'm interested in. See The Java XPath API for more info.
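For comparison, here is a minimal sketch of the XPath route using the standard javax.xml.xpath API. It evaluates //tag against the input and returns the text of every matching element. It assumes the input is well-formed XML with a single root element, so the fragments from the example above are wrapped in a hypothetical <root>. The class name is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTagValues {
    // Returns the text content of every <tag> element, in document order
    public static List<String> getTagValues(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//tag", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Raised, for example, when the input is not well-formed XML
            throw new IllegalArgumentException("Could not evaluate XPath over input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag></root>";
        System.out.println(getTagValues(xml)); // [apple, orange, pear]
    }
}
```

Unlike the regex version, the parser handles nesting, attributes and entities for you.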
To be quite honest, regular expressions are not the best idea for this type of parsing. The regular expression you posted will probably work great for simple cases, but if things get more complex you are going to have huge problems (the same reason why you can't reliably parse HTML with regular expressions). I know you probably don't want to hear this (I know I didn't when I asked the same type of question), but string parsing became WAY more reliable for me after I stopped trying to use regular expressions for everything.
jTopas is an AWESOME tokenizer that makes it quite easy to write parsers by hand (I STRONGLY suggest jTopas over the standard Java Scanner and similar libraries). If you want to see jTopas in action, here are some parsers I wrote using jTopas to parse this type of file.
If you are parsing XML files, you should be using an XML parser library. Don't do it yourself unless you are just doing it for fun; there are plenty of proven options out there.
A generic, simpler, and somewhat primitive approach to find the tag, attributes, and value:
Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");
System.out.println(pattern.matcher("<asd> TEST</asd>").find());
System.out.println(pattern.matcher("<asd TEST</asd>").find());
System.out.println(pattern.matcher("<asd attr='3'> TEST</asd>").find());
System.out.println(pattern.matcher("<asd> <x>TEST<x>asd>").find());
System.out.println("-------");
Matcher matcher = pattern.matcher("<as x> TEST</as>");
if (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(i + ":" + matcher.group(i));
}
}
String s = "<B><G>Test</G></B><C>Test1</C>";
String pattern ="\\<(.+)\\>([^\\<\\>]+)\\<\\/\\1\\>";
int count = 0;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s);
while(m.find())
{
System.out.println(m.group(2));
count++;
}
Try this:
Pattern p = Pattern.compile("(?<=\\<(any_tag)\\>)(.*?)(?=\\<\\/(any_tag)\\>)");
Matcher m = p.matcher(anyString);
For example:
String str = "<TR> <TD>1Q Ene</TD> <TD>3.08%</TD> </TR>";
Pattern p = Pattern.compile("(?<=\\<TD\\>)(.*?)(?=\\<\\/TD\\>)");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group());
}
Output:
1Q Ene
3.08%
The reluctant .*? stops each match at the nearest closing tag; a greedy .* would run from the first <TD> to the last </TD> and return a single match.
final Pattern pattern = Pattern.compile("tag\\](.+?)\\[/tag");
final Matcher matcher = pattern.matcher("[tag]String I want to extract[/tag]");
matcher.find();
System.out.println(matcher.group(1));
I prefix this reply with "you shouldn't use a regular expression to parse XML -- it's only going to result in edge cases that don't work right, and a forever-increasing-in-complexity regex while you try to fix it."
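That being said, you need to escape the square brackets (unescaped, [customtag] is a character class that matches a single character), then match the string and grab the group you want:
Pattern p = Pattern.compile("\\[customtag\\](.+?)\\[/customtag\\]");
Matcher m = p.matcher("[customtag]String I want to extract[/customtag]");
if (m.matches())
{
String result = m.group(1);
// do something with result
}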
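This works for me; use it in your main method after setting up a Scanner on the input. It also works for the HackerRank "Tag Content Extractor" problem, where the first input line is the number of test cases.
Scanner in = new Scanner(System.in);
int testCases = Integer.parseInt(in.nextLine().trim());
Pattern r = Pattern.compile("<(.+)>([^<]+)</\\1>");
while (testCases > 0) {
String line = in.nextLine();
boolean matchFound = false;
Matcher m = r.matcher(line);
while (m.find()) {
System.out.println(m.group(2));
matchFound = true;
}
if (!matchFound) {
System.out.println("None");
}
testCases--;
}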