Hey, I've been trying to figure out why this regular expression isn't matching correctly.
List l_operators = Arrays.asList(Pattern.compile("(\\d+)").split(rtString.trim()));
The input string is "12+22+3"
The output I get is -- [,+,+]
There's a match at the beginning of the list which shouldn't be there? I really can't see it and I could use some insight. Thanks.
Well, technically, there is an empty string in front of the first delimiter (the first sequence of digits). If you had, say, a line of CSV such as abc,def,ghi and another one such as ,jkl,mno you would clearly want to know that the first value in the second string was the empty string. Thus the behaviour is desirable in most cases.
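To see that rule in isolation, here is a small sketch. The class and method names are made up for illustration; the inputs are the ones from the question and the CSV example above.

```java
import java.util.Arrays;
import java.util.List;

public class SplitLeadingEmpty {
    // Splits the input on the given regex and returns the pieces as a list.
    static List<String> pieces(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    public static void main(String[] args) {
        // The digits are the delimiters, so the text before "12" is an empty string.
        System.out.println(pieces("12+22+3", "\\d+"));   // [, +, +]
        // Same rule for CSV: the empty first field is preserved.
        System.out.println(pieces(",jkl,mno", ","));     // [, jkl, mno]
        System.out.println(pieces("abc,def,ghi", ","));  // [abc, def, ghi]
    }
}
```

Trailing empty strings, by contrast, are dropped by split, which is why nothing shows up after the final 3.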
For your particular case, you need to deal with it manually, or refine your regular expression somehow. Like this for instance:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(rtString);
if (m.find()) {
List l_operators = Arrays.asList(p.split(rtString.substring(m.end()).trim()));
// ...
}
Ideally, however, you should be using a parser for these types of strings. You can't, for instance, deal with parentheses in expressions using just regular expressions.
That's the behavior of split in Java. You just have to take it (and deal with it) or use another library to split the string. I personally try to avoid Java's split.
An example of one alternative is to look at Splitter from Google Guava.
Try Guava's Splitter.
Splitter.onPattern("\\d+").omitEmptyStrings().split(rtString)
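If adding Guava is not an option, roughly the same effect can be had with the JDK alone by filtering out the empty pieces after splitting. This is only a sketch; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitOmitEmpty {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Splits on runs of digits and discards empty pieces, roughly what
    // omitEmptyStrings() does in the Guava call above.
    static List<String> operators(String expression) {
        return Arrays.stream(DIGITS.split(expression.trim()))
                .filter(piece -> !piece.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(operators("12+22+3")); // [+, +]
    }
}
```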
I'm parsing every line of a file (an XML file) and I need to find path="this_is_my_path". After this, I need to extract what's inside the quotes, i.e. I need to get this_is_my_path.
This is what I'm doing:
String pattern = ".*path=\"(.*?)\"";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(the_text_file);
while (m.find()) {
System.out.println(m.group().trim());
}
After running this, I'm getting this:
path="path_to_file"
test="ui_test" path="path_to_other_file"
.....
I should be printing this:
path_to_file
path_to_other_file
path_to_other_fileX
path_to_other_fileW
If you need to use regex, try with this:
(?<=path=\")(.*?)(?=\")
Or you can use your regex, but without the .* at the beginning, because it also matches any content before path= on the same line. Then get the value from group 1.
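A sketch of the second suggestion (drop the leading .* and read group 1). The sample input is invented for the example, and the caveats about regex on XML raised elsewhere in this thread still apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathAttributeRegex {
    // No leading ".*", so each match starts at "path=" itself.
    private static final Pattern PATH = Pattern.compile("path=\"(.*?)\"");

    static List<String> paths(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = PATH.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group(1) is the text between the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical two-line sample, not taken from the asker's file.
        String sample = "<a path=\"path_to_file\"/>\n"
                + "<b test=\"ui_test\" path=\"path_to_other_file\"/>";
        System.out.println(paths(sample)); // [path_to_file, path_to_other_file]
    }
}
```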
Why reinvent the wheel? Unless this is a challenge or something?
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
One should really try and collect the many reasons why using a regular expression is insufficient for getting anything out reliably from an XML file, even if that "anything" is just a measly attribute, e.g. path and its (string) value. A simple pattern such as "path=\"(.*?)\"" is doomed to fail due to the tiniest amount of freedom the XML spec leaves for writing legal XML, and more.
White space, including line breaks, may occur before and after the equal sign.
Apostrophes can be used instead of quotes.
Any character can be written as a numeric or named entity.
The string could be part of an element or attribute value.
The string could occur in an XML comment.
The XML file may be written in an encoding which naive reading as a vanilla text file fails to take into account; hence data may be garbage.
So, just for the record: I strongly suggest using an XSLT transformation to extract the desired attribute values. This requires just a very simple template. Using an XML parser requires more lines of code, but it is equally reliable.
And here is the Java code I strongly advocate not to use - it just covers two out of the points mentioned above.
String theText = ...;
String pattern = "\\bpath\\s*=\\s*(\"(.*?)\"|'(.*?)')";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(theText);
while (m.find()) {
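// group(1) still includes the surrounding quotes; the bare value is in group 2 or 3
String value = (m.group(2) != null) ? m.group(2) : m.group(3);
System.out.println(value.trim());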
}
(And did you notice the word boundary preceding path? Just another chance to go wrong with this approach.)
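For comparison, the parser route recommended above might look roughly like this with the DOM parser that ships with the JDK. This is a sketch, not the XSLT template the answer has in mind; the class name and the sample document are made up, and the sample deliberately uses apostrophes, spaces around the equal sign, a comment and a numeric entity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PathAttributeDom {
    // Collects the value of every "path" attribute in document order.
    static List<String> paths(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // "*" selects every element
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("path")) {
                result.add(e.getAttribute("path"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<root>"
                + "<a path = 'path_to_file'/>"
                + "<!-- path=\"ignored\" -->"
                + "<b test=\"ui_test\" path=\"path_&#116;o_other_file\"/>"
                + "</root>";
        System.out.println(paths(xml)); // [path_to_file, path_to_other_file]
    }
}
```

When reading from a file, handing the parser a File or InputStream rather than a String lets it honour the encoding declared in the document.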
For part of my Java assignment I'm required to select all records that have a certain area code. I have custom objects within an ArrayList, like ArrayList<Foo>.
Each object has a String phoneNumber variable. They are formatted like "(555) 555-5555"
My goal is to search through each custom object in the ArrayList<Foo> (call it listOfFoos) and place the objects with area code "616" in a temporaryListOfFoos ArrayList<Foo>.
I have looked into tokenizers, but was unable to get the syntax correct. I feel like what I need to do is similar to the post "Ignore parentheses with string tokenizer?", but since I'm only trying to retrieve the first 3 digits (and I don't care about the remaining 7), that really didn't give me exactly what I was looking for.
What I did as a temporary work-around, was...
for (int i = 0; i<listOfFoos.size();i++){
if (listOfFoos.get(i).getPhoneNumber().contains("616")){
tempListOfFoos.add(listOfFoos.get(i));
}
}
This worked for our current dataset, however, if there was a 616 anywhere else in the phone numbers [like "(555) 616-5555"] it obviously wouldn't work properly.
If anyone could give me advice on how to retrieve only the first 3 digits, while ignoring the parentheses, I would greatly appreciate it.
You have two options:
Use value.startsWith("(616)") or,
Use regular expressions with this pattern: "^\\(616\\).*" (the backslashes are doubled because it is a Java string literal)
The first option will be a lot quicker.
areaCode = number.substring(number.indexOf('(') + 1, number.indexOf(')')).trim() should do the job for you, given the formatting of phone numbers you have.
Or if you don't have any extraneous spaces, just use areaCode = number.substring(1, 4).
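Put together, the substring approach could look something like the sketch below. To keep it self-contained it filters plain strings instead of the asker's Foo objects; the class and method names are invented, and it assumes every number contains a '(' followed later by a ')'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AreaCodeFilter {
    // Returns the text between the first '(' and the first ')'.
    static String areaCode(String phoneNumber) {
        return phoneNumber.substring(phoneNumber.indexOf('(') + 1,
                                     phoneNumber.indexOf(')')).trim();
    }

    // Keeps only the numbers whose area code equals the requested one.
    static List<String> withAreaCode(List<String> phoneNumbers, String wanted) {
        List<String> result = new ArrayList<>();
        for (String number : phoneNumbers) {
            if (areaCode(number).equals(wanted)) {
                result.add(number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> numbers = Arrays.asList("(616) 555-5555", "(555) 616-5555");
        System.out.println(withAreaCode(numbers, "616")); // [(616) 555-5555]
    }
}
```

With the real list, the comparison would presumably be applied to each element's getPhoneNumber() result and the matching Foo added to the temporary list.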
I think what you need is a capturing group. Have a look at the Groups and capturing section in this document.
Once you are done matching the input with a pattern (for example "\\((\\d+)\\) \\d+-\\d+"), you can get the number in the parentheses using a matcher (an object of java.util.regex.Matcher) with matcher.group(1).
You could use a regular expression as shown below. The pattern will ensure the entire phone number conforms to your pattern ((XXX) XXX-XXXX) plus grabs the number within the parentheses.
int areaCodeToSearch = 555;
String pattern = String.format("\\((%d)\\) \\d{3}-\\d{4}", areaCodeToSearch);
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(phoneNumber);
if (m.matches()) {
String areaCode = m.group(1);
// ...
}
Whether you choose to use a regular expression versus a simple String lookup (as mentioned in other answers) will depend on how bothered you are about the format of the entire string.
I must confess, I'm pretty useless when it comes to writing regular expressions, but I've currently got a problem that's really confusing me.
I have written a function that takes a string as input (22K in size) and performs a single regex on it, looking for Long values. Once a long value has been found, it is replaced with a String value from a hashmap.
However, it keeps on missing values within the String, the regex I have written is:
Pattern.compile("[*]{3}[0-9]{1,}[*]{3}");
The long values I'm searching for in the file are formatted as such:
***nnnnnnnnnnnnnnnn***
Now the regex seems to work, but like I said, it misses some values, for example:
***1407374883553285*** - FOUND
***281474976720057*** - NOT FOUND
I'm really quite confused as to why it's missing values, I'm using a simple while loop to do the search, and matcher.find() for when it does match.
I'm assuming that either my regex isn't strict enough, or it's missing values due to the way the data is structured in the input string.
If anyone can offer any advice, I'd greatly appreciate it.
Thanks
A cleaner regex is [*]{3}\d+[*]{3}. Check it against the following to see how it goes:
final Pattern pattern = Pattern.compile("[*]{3}\\d+[*]{3}");
final Matcher matcher = pattern.matcher("inputfile");
while (matcher.find())
{
System.out.println(matcher.group());
}
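You can use java.util.regex.Pattern.matches(String regex, CharSequence input) with the regular expression "[*]{3}[0-9]*[*]{3}". Note that matches only succeeds when the entire input matches the pattern, so it checks a single token rather than searching inside a larger string.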
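For what it's worth, the pattern does match both sample values from the question when run in isolation, so the misses may well come from the surrounding loop or the map lookup. Below is a sketch of the find-and-replace step using appendReplacement. The class name, the map contents and the choice to leave unknown tokens untouched are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenReplacer {
    // Three literal asterisks, one or more digits (captured), three asterisks.
    private static final Pattern TOKEN = Pattern.compile("[*]{3}(\\d+)[*]{3}");

    static String replaceTokens(String input, Map<Long, String> values) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumes the digits fit in a long; parseLong throws otherwise.
            String replacement = values.get(Long.parseLong(m.group(1)));
            // Leave the token untouched when the map has no entry for it.
            String text = (replacement != null) ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Long, String> values = new HashMap<>();
        values.put(1407374883553285L, "first");
        values.put(281474976720057L, "second");
        String input = "a ***1407374883553285*** b ***281474976720057*** c";
        System.out.println(replaceTokens(input, values)); // a first b second c
    }
}
```

Building the result in a separate buffer keeps the matcher working on the original string; rewriting the input while the same matcher is still iterating over it is one way matches can get skipped.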
I am doing string manipulations and I need more advanced functions than the original ones provided in Java.
For example, I'd like to return a substring between the (n-1)th and nth occurrence of a character in a string.
My question is, are there classes already written by users which perform this function, and many others for string manipulations? Or should I dig on stackoverflow for each particular function I need?
Check out the Apache Commons class StringUtils, it has plenty of interesting ways to work with Strings.
http://commons.apache.org/lang/api-2.3/index.html?org/apache/commons/lang/StringUtils.html
Have you looked at the regular expression API? That's usually your best bet for doing complex things with strings:
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Along the lines of what you're looking to do, you can traverse the string against a pattern (in your case a single character) and match everything in the string up to but not including the next instance of the character as what is called a capture group.
It's been a while since I've written a regex, but if you were looking for the character A for instance, then I think you could use the regex A([^A]*) and keep matching that string. The stuff in the parentheses is a capturing group, which I reference below. To match it, you'd use the matcher method on Pattern:
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#matcher%28java.lang.CharSequence%29
On the Matcher instance, you'd keep calling find() and, each time it returns true, group(1), where group(1) gets you what is between the parentheses. (Don't call matches() first: it only succeeds when the whole string matches the pattern.) You could use a counter in your loop to make sure you get the (n-1)th instance of the letter.
Lastly, Pattern provides flags you can pass in to indicate things like case insensitivity, which you may need.
If I've made some mistakes here, then someone please correct me. Like I said, I don't write regexes every day, so I'm sure I'm a little bit off.
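For the specific example in the question (the substring between the (n-1)th and nth occurrence of a character), a plain indexOf loop is also enough. The sketch below uses that instead of the regex approach described above; the class and method names are made up, and returning null for too few occurrences is just one possible convention.

```java
public class BetweenOccurrences {
    // Returns the text between the (n-1)th and nth occurrence of c (n >= 2),
    // or null if c occurs fewer than n times.
    static String between(String s, char c, int n) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        int prev = -1;
        int cur = -1;
        for (int i = 0; i < n; i++) {
            prev = cur;
            cur = s.indexOf(c, cur + 1); // next occurrence after the current one
            if (cur < 0) {
                return null;
            }
        }
        return s.substring(prev + 1, cur);
    }

    public static void main(String[] args) {
        System.out.println(between("a,b,c,d", ',', 2)); // b
        System.out.println(between("a,b,c,d", ',', 3)); // c
        System.out.println(between("a,b,c,d", ',', 4)); // null
    }
}
```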
Let's say I have three XML strings:
String logToSearch = "<abc><number>123456789012</number></abc>";
String logToSearch2 = "<abc><number xsi:type=\"soapenc:string\" /></abc>";
String logToSearch3 = "<abc><number /></abc>";
I need a pattern which finds the number tag if the tag contains value, i.e. the match should be found only in the logToSearch.
I'm not saying I'm looking for the number itself, but rather that the matcher.find method should return true only for the first string.
For now I have this:
Pattern pattern = Pattern.compile("<(" + patternString + ").*?>",
Pattern.CASE_INSENSITIVE);
where patternString is simply "number". I tried "<(" + patternString + ")[^/>].*?>" but it didn't work, because in [^/>] each character is treated separately.
Thanks
This is absolutely the wrong way to parse XML. In fact, if you need more than just the basic example given here, there's provably no way to solve the more complex cases with regex.
Use an easy XML parser, like XOM. Now, using xpath, query for the elements and filter those without data. I can only imagine that this question is a precursor to future headaches unless you modify your approach right now.
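As a rough illustration of the parser-based approach, here is a sketch that uses the DOM parser bundled with the JDK rather than XOM and XPath, and then filters the number elements by their text content. The class and method names are invented; the three inputs are the strings from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NumberTagDom {
    // True if the document contains a <number> element with non-blank text.
    static boolean hasNumberValue(String xml) throws Exception {
        // The default factory is not namespace-aware, so the undeclared "xsi"
        // prefix in the second sample does not cause a parse error.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList numbers = doc.getElementsByTagName("number");
        for (int i = 0; i < numbers.getLength(); i++) {
            if (!numbers.item(i).getTextContent().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hasNumberValue("<abc><number>123456789012</number></abc>")); // true
        System.out.println(hasNumberValue(
                "<abc><number xsi:type=\"soapenc:string\" /></abc>"));                  // false
        System.out.println(hasNumberValue("<abc><number /></abc>"));                    // false
    }
}
```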
So a search for "<number[^/>]*>" would find the opening tag. If you want to be sure it isn't empty, try "<number[^/>]*>[^<]" or "<number[^/>]*>[0-9]"
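A quick check of the second pattern against the three strings from the question; the class and method names are made up for the sketch.

```java
import java.util.regex.Pattern;

public class NumberTagRegex {
    // Opening <number ...> tag that is not self-closing, followed by at
    // least one character that does not start another tag.
    private static final Pattern NON_EMPTY_NUMBER =
            Pattern.compile("<number[^/>]*>[^<]", Pattern.CASE_INSENSITIVE);

    static boolean hasNumberValue(String log) {
        return NON_EMPTY_NUMBER.matcher(log).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNumberValue("<abc><number>123456789012</number></abc>")); // true
        System.out.println(hasNumberValue(
                "<abc><number xsi:type=\"soapenc:string\" /></abc>"));                  // false
        System.out.println(hasNumberValue("<abc><number /></abc>"));                    // false
    }
}
```

Note that [^/>]* also rejects an opening tag whose attribute values contain a slash, so this only holds for input shaped like the samples.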