Java regular expression with hyphen - java

I need to match and parse data in a file that looks like:
4801-1-21-652-1-282098
4801-1-21-652-2-282098
4801-1-21-652-3-282098
4801-1-21-652-4-282098
4801-1-21-652-5-282098
but the pattern I wrote below does not seem to work. Can someone help me understand why?
final String patternStr = "(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)";
final Pattern p = Pattern.compile(patternStr);
while ((this.currentLine = this.reader.readLine()) != null) {
final Matcher m = p.matcher(this.currentLine);
if (m.matches()) {
System.out.println("SUCCESS");
}
}

It looks correct.
Something odd is conatined in your lines, probably. Look for some extra spaces and line breaks.
Try this:
final Matcher m = p.matcher(this.currentLine.trim());

Have you tried escaping the - as \\-?

It should work. Make sure there is no invisible characters, you an trim each line. You can refine the code as :
final String patternStr = "(\\d{4})-(\\d{1})-(\\d{2})-(\\d{3})-(\\d{1})-(\\d{6})";

There is white space in the data
4801-1-21-652-1-282098
4801-1-21-652-2-282098
4801-1-21-652-3-282098
4801-1-21-652-4-282098
4801-1-21-652-5-282098
final String patternStr = "\\s*(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(\\d+)";

Related

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me a value along with colons. ie I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the reqex captured group can be used to replace.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
UPDATE
Looking at your update, you just need ReplaceFirst only:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a * before it, and once found, tries to match 1+ ASCII letters and then assert that there is a : after them. With :[A-Za-z]+:, the checking only starts after a regex engine found : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totlally OK to hardcode colons in the replacement pattern since they are hardcoded in the regex pattern.
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

I am trying to extract text using regex but it is not working

I am trying to extract text using regex but it is not working. Although my regex work fine on regex validators.
public class HelloWorld {
public static void main(String []args){
String PATTERN1 = "F\\{([\\w\\s&]*)\\}";
String PATTERN2 = "{([\\w\\s&]*)\\}";
String src = "F{403}#{Title1}";
List<String> fvalues = Arrays.asList(src.split("#"));
System.out.println(fieldExtract(fvalues.get(0), PATTERN1));
System.out.println(fieldExtract(fvalues.get(1), PATTERN2));
}
private static String fieldExtract(String src, String ptrn) {
System.out.println(src);
System.out.println(ptrn);
Pattern pattern = Pattern.compile(ptrn);
Matcher matcher = pattern.matcher(src);
return matcher.group(1);
}
}
Why not use:
Pattern regex = Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");
To get both ?
This way the number will be in group 1 and the title in group 2.
Another thing if you're going to compile the regex (which can be helpful to performance) at least make the regex object static so that it doesn't get compiled each time you call the function (which kind of misses the whole pre-compilation point :) )
Basic demo here.
First problem:
String PATTERN2 = "\\{([\\w\\s&]*)\\}"; // quote '{'
Second problem:
Matcher matcher = pattern.matcher(src);
if( matcher.matches() ){
return matcher.group(1);
} else ...
The Matcher must be asked to plough the field, otherwise you can't harvest the results.

Java Parsing of a String

I have a string like this:
String unparsed = "[thing.1][thin2g]"
I want to turn it into
"thing.1"
"thin2g"
Been trying for a while with regex expressions but nothing. Any thoughts? Thanks!
EDIT:
Tried:
String unparsed = "[thing.1][thin2g]"
String substring = unparsed.substring(1,unparsed.length - 1)
substring.replace("][","`")
String[] split = substring.split('`')
for(int i=0;i<split.length;i++)
{
System.out.println(split[i])
}
But this seems kinda heavy, was looking for something more elegant
String unparsed = "[thing.1][thin2g]";
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher(unparsed);
while(matcher.find()){
System.out.println(matcher.group(1));
}
My regex is not good. But it does parse the string into what you want.
\[\s*\w.*?\]
I never use regular expressions before but this one should work

Java regex to extract text between tags

I have a file with some custom tags and I'd like to write a regular expression to extract the string between the tags. For example if my tag is:
[customtag]String I want to extract[/customtag]
How would I write a regular expression to extract only the string between the tags. This code seems like a step in the right direction:
Pattern p = Pattern.compile("[customtag](.+?)[/customtag]");
Matcher m = p.matcher("[customtag]String I want to extract[/customtag]");
Not sure what to do next. Any ideas? Thanks.
You're on the right track. Now you just need to extract the desired group, as follows:
final Pattern pattern = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
final Matcher matcher = pattern.matcher("<tag>String I want to extract</tag>");
matcher.find();
System.out.println(matcher.group(1)); // Prints String I want to extract
If you want to extract multiple hits, try this:
public static void main(String[] args) {
final String str = "<tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag>";
System.out.println(Arrays.toString(getTagValues(str).toArray())); // Prints [apple, orange, pear]
}
private static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
private static List<String> getTagValues(final String str) {
final List<String> tagValues = new ArrayList<String>();
final Matcher matcher = TAG_REGEX.matcher(str);
while (matcher.find()) {
tagValues.add(matcher.group(1));
}
return tagValues;
}
However, I agree that regular expressions are not the best answer here. I'd use XPath to find elements I'm interested in. See The Java XPath API for more info.
To be quite honest, regular expressions are not the best idea for this type of parsing. The regular expression you posted will probably work great for simple cases, but if things get more complex you are going to have huge problems (same reason why you cant reliably parse HTML with regular expressions). I know you probably don't want to hear this, I know I didn't when I asked the same type of questions, but string parsing became WAY more reliable for me after I stopped trying to use regular expressions for everything.
jTopas is an AWESOME tokenizer that makes it quite easy to write parsers by hand (I STRONGLY suggest jtopas over the standard java scanner/etc.. libraries). If you want to see jtopas in action, here are some parsers I wrote using jTopas to parse this type of file
If you are parsing XML files, you should be using an xml parser library. Dont do it youself unless you are just doing it for fun, there are plently of proven options out there
A generic,simpler and a bit primitive approach to find tag, attribute and value
Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");
System.out.println(pattern.matcher("<asd> TEST</asd>").find());
System.out.println(pattern.matcher("<asd TEST</asd>").find());
System.out.println(pattern.matcher("<asd attr='3'> TEST</asd>").find());
System.out.println(pattern.matcher("<asd> <x>TEST<x>asd>").find());
System.out.println("-------");
Matcher matcher = pattern.matcher("<as x> TEST</as>");
if (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(i + ":" + matcher.group(i));
}
}
String s = "<B><G>Test</G></B><C>Test1</C>";
String pattern ="\\<(.+)\\>([^\\<\\>]+)\\<\\/\\1\\>";
int count = 0;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s);
while(m.find())
{
System.out.println(m.group(2));
count++;
}
Try this:
Pattern p = Pattern.compile(?<=\\<(any_tag)\\>)(\\s*.*\\s*)(?=\\<\\/(any_tag)\\>);
Matcher m = p.matcher(anyString);
For example:
String str = "<TR> <TD>1Q Ene</TD> <TD>3.08%</TD> </TR>";
Pattern p = Pattern.compile("(?<=\\<TD\\>)(\\s*.*\\s*)(?=\\<\\/TD\\>)");
Matcher m = p.matcher(str);
while(m.find()){
Log.e("Regex"," Regex result: " + m.group())
}
Output:
10 Ene
3.08%
final Pattern pattern = Pattern.compile("tag\\](.+?)\\[/tag");
final Matcher matcher = pattern.matcher("[tag]String I want to extract[/tag]");
matcher.find();
System.out.println(matcher.group(1));
I prefix this reply with "you shouldn't use a regular expression to parse XML -- it's only going to result in edge cases that don't work right, and a forever-increasing-in-complexity regex while you try to fix it."
That being said, you need to proceed by matching the string and grabbing the group you want:
if (m.matches())
{
String result = m.group(1);
// do something with result
}
This works for me, use in your main method below Scanner input. Works for Hackerrank "Tag Content Extractor" also.
boolean matchFound = false;
Pattern r = Pattern.compile("<(.+)>([^<]+)</\\1>");
Matcher m = r.matcher(line);
while (m.find()) {
System.out.println(m.group(2));
matchFound = true;
}
if ( ! matchFound) {
System.out.println("None");
}
testCases--;

Parsing text from the end (using regular expressions)

I have a seemingly simple problem though i am unable to get my head around it.
Let's say i have the following string: 'abcabcabcabc' and i want to get the last occurrence of 'ab'. Is there a way i can do this without looping through all the other 'ab's from the beginning of the string?
I read about anchoring the end of the string and then parsing the string with the required regular expression. I am unsure how to do this in Java (is it supported?).
Update: I guess i have caused a lot of confusion with my (over) simplified example. Let me try another one. Say, i have a string as thus - '12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more'. Here, i want the last date and hence i need to use regular expressions. I hope this is a better example.
Thanks,
Anirudh
Firstly, thanks for all the answers.
Here is what i tried and this worked for me:
Pattern pattern = Pattern.compile("(ab)(?!.*ab)");
Matcher matcher = pattern.matcher("abcabcabcd");
if(matcher.find()) {
System.out.println(matcher.start() + ", " + matcher.end());
}
This displays the following:
6, 8
So, to generalize - <reg_ex>(?!.*<reg_ex>) should solve this problem where '?!' signifies that the string following it should not be present after the string that precedes '?!'.
Update: This page provides a more information on 'not followed by' using regex.
This will give you the last date in group 1 of the match object.
.*(\d{2}/\d{2}/\d{4})
Pattern p = Pattern.compile("ab.*?$");
Matcher m = p.matcher("abcabcabcabc");
boolean b = m.matches();
I do not understand what you are trying to do. Why only the last if they are all the same? Why a regular expression and why not int pos = s.lastIndexOf(String str) ?
For the date example, you could do this with the Pattern API and not in the regex itself. The basic idea is to get all the matches, then return the last one.
public static void main(String[] args) {
// this may be over-kill, you can replace with a much simpler but more lenient version
final String dateRegex = "\\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\\b";
final String sample = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";
List<String> allMatches = getAllMatches(dateRegex, sample);
System.out.println(allMatches.get(allMatches.size() - 1));
}
private static List<String> getAllMatches(final String regex, final String input) {
final Matcher matcher = Pattern.compile(regex).matcher(input);
return new ArrayList<String>() {{
while (matcher.find())
add(input.substring(matcher.start(), matcher.end()));
}};
}

Categories