Regex and String Manipulation Techniques - java

Given that an input String may be specified as follows:
read(xpath('...')) or
xpath('...') or
...
where ... just holds some XPath expression, for example /comment/text.
All I really want is the XPath expression. What would be an efficient way to extract this value in general, given the three possible valid patterns?
Also, I am implementing this in Java.

Here is a basic example; it matches the xpath part in both of your string examples:
import java.util.regex.*;
class Untitled {
    public static void main(String[] args) {
        String input = "read(xpath('...'))";
        String result = null;
        Pattern regex = Pattern.compile("xpath\\(\'(.*?)\'\\)");
        Matcher matcher = regex.matcher(input);
        if (matcher.find()) {
            result = matcher.group(1);
        }
        System.out.println(result);
    }
}
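The pattern above returns null for the third, bare form (just the expression with no xpath(...) wrapper). A minimal sketch that accepts all three forms uses an alternation tried from left to right and returns whichever group participated. It assumes plain ASCII apostrophes and an expression that contains no apostrophe of its own; anything that matches neither wrapper is treated as a bare expression. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XPathExtractor {
    // Alternatives tried left to right: read(xpath('...')), xpath('...'), bare expression.
    // Assumes the quoted expression contains no apostrophe of its own.
    private static final Pattern FORMS = Pattern.compile(
            "read\\(xpath\\('([^']*)'\\)\\)|xpath\\('([^']*)'\\)|(.+)");

    // Returns the bare XPath expression, or null if the input is empty.
    static String extract(String input) {
        Matcher m = FORMS.matcher(input.trim());
        if (!m.matches()) {
            return null;
        }
        // Exactly one of the three groups took part in the match.
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extract("read(xpath('/comment/text'))"));
        System.out.println(extract("xpath('/comment/text')"));
        System.out.println(extract("/comment/text"));
    }
}
```

Because matches() must consume the whole input, the wrapped forms are only accepted when their closing parentheses are all present; otherwise the input falls through to the bare alternative.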

It would have helped if you had posted some code, but you can use String.split. Please refer to http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#split%28java.lang.String%29
If you are sure the input always contains xpath('...'), then you can split on xpath(' (written as xpath\\(' in a Java string, since split takes a regex and the parenthesis must be escaped) and gather the characters after it until you hit the next ' (apostrophe).
I hope this gives you an idea.
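A rough sketch of that split-then-scan idea, assuming xpath(' appears exactly once in the input (the class and method names are hypothetical):

```java
class SplitExtractor {
    // Splits once on the literal text "xpath('" (parenthesis escaped because split() takes a regex),
    // then keeps everything up to the next apostrophe.
    static String extract(String input) {
        String[] parts = input.split("xpath\\('", 2);
        if (parts.length < 2) {
            return null; // no xpath(' prefix present
        }
        String rest = parts[1];
        int end = rest.indexOf('\'');
        return end >= 0 ? rest.substring(0, end) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("read(xpath('/comment/text'))"));
    }
}
```

Passing a limit of 2 to split stops after the first delimiter, so the remainder keeps any later text intact.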

Related

Regular expression get the third element from a string

Hello, I'm having trouble getting the third element of a string (F604080):
<sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>
I have tried this regular expression and variations of it, but I can't manage to get
F604080.
(?<=\w+_)\w+(?=\<)
(?<=\w+_\w+_)\w+(?=\<)
....
Any help will be appreciated.
Thanks.
You don't need lookbehind or lookahead; instead, just use this simple regex:
.*_(\w+)
and capture group 1.
Java code:
import java.util.regex.*;
class ThirdElement {
    public static void main(String[] args) {
        String s = "<sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>";
        Pattern p = Pattern.compile(".*_(\\w+)");
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.group(1));
        } else {
            System.out.println("Didn't match");
        }
    }
}
This prints what you wanted:
F604080
With a regex, you can use something like >\w+_\w+_(\w+)<\/
String str = "<sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>";
String code = null;
Matcher m = Pattern.compile(">\\w+_\\w+_(\\w+)</").matcher(str);
if (m.find()) {
    code = m.group(1); // "F604080"
}
Or simply use substring():
String code = str.substring(str.lastIndexOf('_') + 1, str.lastIndexOf('<'));
If you later need to parse XML with more elements, you could use a Java DOM parser, but that is not the best option here since you have only one element.
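For comparison, a DOM-based version might look like this sketch. It uses the JDK's javax.xml.parsers API to read the element's text, then takes the part after the last underscore. It assumes the input is a well-formed XML fragment with a single root element; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomExtractor {
    // Parses the XML, reads the root element's text, and returns the segment after the last '_'.
    static String lastSegment(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        String text = doc.getDocumentElement().getTextContent(); // e.g. "AX02_APF604_F604080"
        return text.substring(text.lastIndexOf('_') + 1);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lastSegment("<sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>"));
    }
}
```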
Can you just split the string using "_" as the separator and take the third element?
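Splitting the raw string on "_" leaves the tags attached to the first and third pieces ("F604080</sourceDocumentId>"), so a sketch of this approach strips anything tag-shaped first. It assumes the tags never contain the value you want; the names are hypothetical.

```java
class SplitThird {
    // Removes anything that looks like a tag, then splits on '_' and returns the third piece.
    static String third(String s) {
        String text = s.replaceAll("<[^>]+>", ""); // "AX02_APF604_F604080"
        String[] parts = text.split("_");
        return parts.length >= 3 ? parts[2] : null;
    }

    public static void main(String[] args) {
        System.out.println(third("<sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>"));
    }
}
```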
Both of your regular expressions seem to match the given string.
Anyway, you could be a bit more specific with this one:
^(?:<\w+>)(?:\w+)_(?:\w+)_(\w+)(?:<\/\w+>)$
Make sure the input is exactly the string you think it is, with no additional text before or after it, since the pattern is anchored with ^ and $.
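Because that pattern is anchored, Matcher.matches() is a natural fit: it succeeds only when the entire input fits the pattern, and group 1 then holds the third segment. A minimal sketch (class name hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredThird {
    // Whole-string pattern: opening tag, three '_'-separated word runs, closing tag.
    private static final Pattern P =
            Pattern.compile("^(?:<\\w+>)(?:\\w+)_(?:\\w+)_(\\w+)(?:</\\w+>)$");

    static String third(String s) {
        Matcher m = P.matcher(s);
        return m.matches() ? m.group(1) : null; // null when extra text surrounds the element
    }

    public static void main(String[] args) {
        System.out.println(third("<sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>"));
        System.out.println(third(" <sourceDocumentId>AX02_APF604_F604080</sourceDocumentId>"));
    }
}
```

The second call returns null because of the leading space, which is exactly the "no additional text" caveat above.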

Use regex to replace a specific pattern

From a given string, I am trying to replace a pattern such as "sometext.othertext.lasttext" with "lasttext". Is this possible with a regex replace in Java? If yes, how? Thanks in advance.
I tried
"hellow.world".replaceAll("(.*)\\.(.*)", "$2")
which results in world. But I want to handle any such arbitrary sequence. For instance, com.google.code should be replaced with code, and com.facebook should be replaced with facebook.
Just to add, a test input is:
if (com.google.code) then
and the test output should be:
if (code) then
Thanks.
I believe this is what you are looking for, if you're trying to avoid String methods. It can be made more succinct, but I'm hoping this will give you a better understanding.
As others suggested, String methods are cleaner.
import java.util.regex.*;
class Split {
    public static void main(String[] args) {
        String inputString = "if (com.google.code) then";
        Pattern p = Pattern.compile("((?<=\\()[^)]*(?=\\)))"); // text within the parentheses
        Pattern p2 = Pattern.compile("(\\w+)(\\))"); // last word before ')'
        Matcher m = p.matcher(inputString);
        Matcher m2 = p2.matcher(inputString);
        String in2 = "";
        if (m2.find())
            in2 = m2.group(1); // else ... error checking
        inputString = m.replaceAll(in2); // replace the parenthesized text with that last word
        System.out.println(inputString); // if (code) then
    }
}
If the parentheses aren't a concern, use this:
import java.util.regex.*;
class Split {
    public static void main(String[] args) {
        String in = "if (com.google.code) then";
        Pattern p = Pattern.compile("(\\w+)(\\))");
        Matcher m = p.matcher(in);
        if (m.find())
            in = m.group(1);
        System.out.println(in); // prints "code"
    }
}
Use:
str.replaceAll(".*\\.(\\w+)$", "$1")
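Explanation: .* greedily consumes everything up to the last dot, and (\w+)$ captures the word characters running from there to the end of the string; the replacement $1 keeps only that last segment.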
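Since that pattern starts with .* and is anchored with $, it only rewrites a string that ends with the dotted name, so "if (com.google.code) then" comes back unchanged. A sketch that rewrites every dotted name embedded in a larger string instead matches one or more "word." prefixes followed by a final word. Be aware that it would also shorten a number such as 3.14 to 14, which may or may not be acceptable for your input. The names are hypothetical.

```java
class LastSegment {
    // (?:\w+\.)+ consumes one or more "word." prefixes; (\w+) keeps the final segment.
    // The \b anchors keep the match aligned to whole words.
    static String shorten(String s) {
        return s.replaceAll("\\b(?:\\w+\\.)+(\\w+)\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(shorten("if (com.google.code) then"));
        System.out.println(shorten("com.facebook"));
        System.out.println(shorten("hellow.world"));
    }
}
```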

I am trying to extract text using regex but it is not working

I am trying to extract text using regex, but it is not working, even though my regexes work fine in online regex validators.
public class HelloWorld {
    public static void main(String []args){
        String PATTERN1 = "F\\{([\\w\\s&]*)\\}";
        String PATTERN2 = "{([\\w\\s&]*)\\}";
        String src = "F{403}#{Title1}";
        List<String> fvalues = Arrays.asList(src.split("#"));
        System.out.println(fieldExtract(fvalues.get(0), PATTERN1));
        System.out.println(fieldExtract(fvalues.get(1), PATTERN2));
    }
    private static String fieldExtract(String src, String ptrn) {
        System.out.println(src);
        System.out.println(ptrn);
        Pattern pattern = Pattern.compile(ptrn);
        Matcher matcher = pattern.matcher(src);
        return matcher.group(1);
    }
}
Why not use:
Pattern regex = Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");
to get both?
This way the number will be in group 1 and the title in group 2.
Also, if you're going to compile the regex (which can help performance), at least make the Pattern a static field so it isn't recompiled every time you call the function; otherwise you lose the whole point of precompiling it :)
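Following that advice, here is a sketch with a single static Pattern, compiled once when the class is initialized, that captures the number and the title in one pass. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldParser {
    // Compiled once at class initialization, then reused on every call.
    private static final Pattern FIELDS =
            Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");

    // Returns {number, title}, or null if the input does not have the F{...}#{...} shape.
    static String[] parse(String src) {
        Matcher m = FIELDS.matcher(src);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] fields = parse("F{403}#{Title1}");
        System.out.println(fields[0] + " / " + fields[1]);
    }
}
```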
First problem:
String PATTERN2 = "\\{([\\w\\s&]*)\\}"; // escape the '{'
Second problem:
Matcher matcher = pattern.matcher(src);
if (matcher.matches()) {
    return matcher.group(1);
} else ...
The Matcher must be asked to plough the field (matches() or find()) before you can harvest the results: calling group() without a successful match throws an IllegalStateException.
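Putting both fixes together, a corrected version of the original program might look like this sketch. It uses find() rather than matches(), so surrounding text is tolerated, and it returns null instead of throwing when nothing matches; that null handling is a choice made for this sketch, not part of the answer above. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HelloWorldFixed {
    static final String PATTERN1 = "F\\{([\\w\\s&]*)\\}";
    static final String PATTERN2 = "\\{([\\w\\s&]*)\\}"; // '{' escaped

    public static void main(String[] args) {
        String src = "F{403}#{Title1}";
        List<String> fvalues = Arrays.asList(src.split("#"));
        System.out.println(fieldExtract(fvalues.get(0), PATTERN1));
        System.out.println(fieldExtract(fvalues.get(1), PATTERN2));
    }

    static String fieldExtract(String src, String ptrn) {
        Matcher matcher = Pattern.compile(ptrn).matcher(src);
        // find() must run (and succeed) before group() can be called.
        return matcher.find() ? matcher.group(1) : null;
    }
}
```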

Parse a string in Java

I have strings formatted similar to the one below in a Java program. I need to get the number out.
Host is up (0.0020s latency).
I need the number between the '(' and the 's' characters. E.g., I would need the 0.0020 in this example.
If you are sure it will always be the first number, you could use the regular expression \d+\.\d+ (but note that the backslashes need to be escaped in Java string literals).
Try this code:
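import java.util.regex.*;
class FirstNumber {
    public static void main(String[] args) {
        String input = "Host is up (0.0020s latency).";
        Pattern pattern = Pattern.compile("\\d+\\.\\d+");
        Matcher matcher = pattern.matcher(input);
        if (matcher.find()) {
            System.out.println(matcher.group()); // 0.0020
        }
    }
}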
You could also include some of the surrounding characters in the regular expression to reduce the risk of matching the wrong number. To do exactly as you requested in the question (i.e. matching between ( and s) use this regular expression:
\((\d+\.\d+)s
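In Java, that stricter pattern needs its backslashes doubled, and the number comes from group 1 rather than from the whole match. A sketch (names hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LatencyParser {
    // "(" then digits.digits then "s"; group 1 holds only the number.
    private static final Pattern LATENCY = Pattern.compile("\\((\\d+\\.\\d+)s");

    static String latency(String line) {
        Matcher m = LATENCY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(latency("Host is up (0.0020s latency)."));
    }
}
```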
Sounds like a case for regular expressions.
You'll want to match for the decimal figure and then parse that match:
Float matchedValue = null; // stays null if no decimal is found
Pattern pattern = Pattern.compile("\\d*\\.\\d+");
Matcher matcher = pattern.matcher(yourString);
boolean isfound = matcher.find();
if (isfound) {
    matchedValue = Float.valueOf(matcher.group(0));
}
It depends on how "similar" you mean. You could potentially use a regular expression:
import java.math.BigDecimal;
import java.util.regex.*;
public class Test {
    public static void main(String args[]) throws Exception {
        Pattern pattern = Pattern.compile("[^(]*\\(([0-9]*\\.[0-9]*)s");
        String text = "Host is up (0.0020s latency).";
        Matcher match = pattern.matcher(text);
        if (match.lookingAt())
        {
            String group = match.group(1);
            System.out.println("Before parsing: " + group);
            BigDecimal value = new BigDecimal(group);
            System.out.println("Parsed: " + value);
        }
        else
        {
            System.out.println("No match");
        }
    }
}
Quite how specific you want to make your pattern is up to you, of course. This only checks for digits, a dot, then digits after an opening bracket and before an s. You may need to refine it to make the dot optional etc.
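As one example of such a refinement, this sketch makes the fractional part optional, so both 0.0020s and 1s match. Whether whole-second values can actually appear in your input is an assumption; the names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalFraction {
    // Digits, then an optional ".digits" group, between "(" and "s".
    private static final Pattern P = Pattern.compile("\\((\\d+(?:\\.\\d+)?)s");

    static BigDecimal latency(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? new BigDecimal(m.group(1)) : null;
    }

    public static void main(String[] args) {
        System.out.println(latency("Host is up (0.0020s latency)."));
        System.out.println(latency("Host is up (1s latency)."));
    }
}
```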
This is a great site for building regular expressions from simple to very complex. You choose the language and boom.
http://txt2re.com/
Here's a way without regex:
String str = "Host is up (0.0020s latency).";
str = str.substring(str.indexOf('(')+1, str.indexOf("s l"));
System.out.println(str);
Of course, using regular expressions is the best solution in this case, but in many simple cases you can also use something like:
String value = myString.substring(myString.indexOf("(") + 1, myString.lastIndexOf("s"));
double numericValue = Double.parseDouble(value);
This is not recommended, though, because the text in myString may change.

Parsing text from the end (using regular expressions)

I have a seemingly simple problem, though I am unable to get my head around it.
Let's say I have the following string: 'abcabcabcabc', and I want to get the last occurrence of 'ab'. Is there a way I can do this without looping through all the other 'ab's from the beginning of the string?
I read about anchoring the end of the string and then parsing the string with the required regular expression. I am unsure how to do this in Java (is it supported?).
Update: I guess I have caused a lot of confusion with my (over)simplified example. Let me try another one. Say I have a string like this: '12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more'. Here I want the last date, and hence I need to use regular expressions. I hope this is a better example.
Thanks,
Anirudh
Firstly, thanks for all the answers.
Here is what I tried, and it worked for me:
Pattern pattern = Pattern.compile("(ab)(?!.*ab)");
Matcher matcher = pattern.matcher("abcabcabcd");
if (matcher.find()) {
    System.out.println(matcher.start() + ", " + matcher.end());
}
This displays the following:
6, 8
So, to generalize: <reg_ex>(?!.*<reg_ex>) should solve this problem. (?!...) is a negative lookahead: it succeeds only if the pattern inside it cannot match at the current position, so (?!.*<reg_ex>) rejects any match that has another match somewhere after it.
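Applied to the date example from the update, the same negative-lookahead idea might look like this sketch. It assumes dates are always written as dd/mm/yyyy and that the input is a single line, since . does not match line terminators by default. Note that the engine still scans from the left; the lookahead only rejects the earlier candidates. The names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastDate {
    // A date that is not followed, later on the same line, by another date.
    private static final Pattern LAST =
            Pattern.compile("\\d{2}/\\d{2}/\\d{4}(?!.*\\d{2}/\\d{2}/\\d{4})");

    static String lastDate(String s) {
        Matcher m = LAST.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";
        System.out.println(lastDate(s));
    }
}
```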
This will give you the last date in group 1 of the match object.
.*(\d{2}/\d{2}/\d{4})
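This works because the greedy .* first runs to the end of the input and then gives characters back only until the date group can match, which leaves the rightmost date in group 1. A Java sketch, assuming the same dd/mm/yyyy format (names hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLastDate {
    // Greedy .* backtracks from the end, so group 1 ends up holding the final date.
    private static final Pattern P = Pattern.compile(".*(\\d{2}/\\d{2}/\\d{4})");

    static String lastDate(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastDate(
                "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more"));
    }
}
```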
Pattern p = Pattern.compile("ab.*?$");
Matcher m = p.matcher("abcabcabcabc");
boolean b = m.matches();
I do not understand what you are trying to do. Why only the last one if they are all the same? Why a regular expression, and why not int pos = s.lastIndexOf(String str)?
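For the simple 'ab' case, that suggestion needs no regex at all: String.lastIndexOf returns the index of the last occurrence directly. A sketch that also reports the end index, to mirror the start/end output of the self-answer (names hypothetical):

```java
class LastIndexDemo {
    // Returns {start, end} of the last occurrence of target, or null if it does not occur.
    static int[] lastSpan(String s, String target) {
        int start = s.lastIndexOf(target);
        return start < 0 ? null : new int[] { start, start + target.length() };
    }

    public static void main(String[] args) {
        int[] span = lastSpan("abcabcabcabc", "ab");
        System.out.println(span[0] + ", " + span[1]);
    }
}
```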
For the date example, you could do this with the Pattern API and not in the regex itself. The basic idea is to get all the matches, then return the last one.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class LastMatch {
    public static void main(String[] args) {
        // this may be over-kill, you can replace with a much simpler but more lenient version
        final String dateRegex = "\\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\\b";
        final String sample = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";
        List<String> allMatches = getAllMatches(dateRegex, sample);
        System.out.println(allMatches.get(allMatches.size() - 1)); // last match: 15/12/2008
    }
    private static List<String> getAllMatches(final String regex, final String input) {
        final Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find())
            matches.add(matcher.group()); // same as input.substring(matcher.start(), matcher.end())
        return matches;
    }
}
