How to write and use regular expression in java

I want to write a regular expression in Java that accepts a String containing only letters, digits, hyphens (-) and spaces, any number of times and in any position.
The string must not contain any other special characters. How do I code this regular expression in Java?
I tried the following. It works when I run it as a Java application,
but when I run the same code in a web application and accept the values through XML, it accepts '/'.
String test1 = null;
Scanner scan = new Scanner(System.in);
test1 = scan.nextLine();
String alphaExp = "^[a-zA-Z0-9-]*$";
Pattern r = Pattern.compile(alphaExp);
Matcher m = r.matcher(test1);
boolean flag = m.lookingAt();
System.out.println(flag);
Can anyone help me on this please?

You can try the POSIX character classes that java.util.regex.Pattern supports:
Pattern p = Pattern.compile("^[\\p{Alnum}\\p{Space}-]*$");
Matcher m = p.matcher("asfsdf 1212sdfsd-gf121sdg5 4s");
boolean b = m.lookingAt();
With this regular expression, if the string you pass contains anything other than alphanumeric characters, whitespace or hyphens, there will be no match.

I think you're just missing a space in the character class, since you mentioned it in your text: ^[a-zA-Z0-9 -]*$
You can also add the Pattern.MULTILINE flag, which makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole input:
String alphaExp = "^[a-zA-Z0-9 -]*$";
Pattern r = Pattern.compile(alphaExp, Pattern.MULTILINE);
Matcher m = r.matcher(test1);
boolean flag = m.lookingAt();

Note that the * quantifier also matches zero occurrences, so the pattern accepts empty input such as "" or a blank line.
If you use + instead, as in [\w\d\s-]+, it matches one or more characters. In Java source each \ must be written as \\, as follows: "[\\w\\d\\s-]+". Be aware that \w already includes digits and also matches the underscore.
* is a quantifier equivalent to {0,} and + is equivalent to {1,}.
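
Pulling the answers above together, here is a minimal sketch of a validator that uses matches() instead of lookingAt(). The class and method names (InputCheck, isValid) are made up for illustration. matches() only succeeds when the entire input fits the pattern, so the ^ and $ anchors are not needed. Whether this explains the '/' seen in the web application is not something the question lets us confirm.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the question.
final class InputCheck {
    // Letters, digits, space and hyphen only; the hyphen is last so it is literal.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9 -]*");

    // matches() succeeds only if the entire input fits the pattern.
    static boolean isValid(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("asfsdf 1212sdfsd-gf121sdg5 4s")); // true
        System.out.println(isValid("a/b"));                           // false
        // With ^...$ and lookingAt(), $ may match before a final line break.
        Pattern anchored = Pattern.compile("^[a-zA-Z0-9 -]*$");
        System.out.println(anchored.matcher("abc\n").lookingAt());    // true
        System.out.println(isValid("abc\n"));                         // false
    }
}
```

Because * allows zero characters, isValid("") returns true; switch to + if empty input should be rejected.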

Related

“minus-sign” into this regular expression. How?

Consider:
String str = "XYhaku(ABH1235-123548)";
From the above string I need only "ABH1235-123548". So far I have created this regular expression:
Pattern.compile("ABH\\d+")
But it returns false. What is the correct regular expression for it?

I would just grab whatever is in the parentheses:
Pattern p = Pattern.compile("\\((?<data>[A-Z\\d]+\\-\\d+)\\)");
Or, if you want to be even more open (anything inside parentheses):
Pattern p = Pattern.compile("\\((?<data>.+)\\)");
Then just nab it:
String s = /* some input */;
Matcher m = p.matcher(s);
if (m.find()) { //just find first
String tag = m.group("data"); //ABH1235-123548
}

\d only matches digits. To include other characters, use a character class:
Pattern.compile("ABH[\\d-]+")
Note that the - must be placed first or last in the character class, because otherwise it will be treated as a range indicator ([A-Z] matching every letter between A and Z, for example). Another way to avoid that would be to escape it, but that adds two more backslashes to your string...
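
As a rough sketch of how the character-class pattern can be applied, the snippet below searches with find() rather than matches(). It assumes the "returns false" in the question came from a call that requires the whole string to match, which the question does not show. TagExtractor and firstTag are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern from the answer above.
final class TagExtractor {
    // "ABH" followed by one or more digits or hyphens; the hyphen is last in the class.
    private static final Pattern TAG = Pattern.compile("ABH[\\d-]+");

    // Returns the first tag found anywhere in the input, or null if there is none.
    static String firstTag(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstTag("XYhaku(ABH1235-123548)")); // ABH1235-123548
    }
}
```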

Java String- How to get a part of package name in android?

It's basically about getting the string value between two characters. SO has many questions related to this, like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing to deal with multiple dots in the string and get the value between two particular dots.
I have the package name:
au.com.newline.myact
I need to get the value between "com." and the next dot (.), in this case "newline". I tried:
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I also tried using substring and indexOf, but couldn't get the intended result. Because package names in Android vary in the number of dots and characters, I cannot use a fixed index. Please suggest any ideas.

As you probably know (based on the .* part of your regex), the dot . is a special character in regular expressions that represents any character (except line separators). To make a dot match only a literal dot you need to escape it: either place \ before it (written as \\. in a Java string literal), or place it inside a character class, [.].
Also, to get only the part captured by the parentheses (.*), you need to select it with the proper group index, which in your case is 1.
So try:
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If the package name has more segments and you only want the one before the next dot, change the greedy behaviour of the * quantifier in .* to reluctant by adding ? after it:
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// the ? right after .* makes the quantifier reluctant
Another approach is to accept only non-dot characters instead of .*. They can be represented by a negated character class, [^.]*:
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
If you don't want to use regex, you can use the indexOf method to locate the positions of com. and the next . after it, and then take the substring between them.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip 'com.' part
int end = beforeTask.indexOf(".", start); //find next `.` after start index
String result = beforeTask.substring(start, end);
System.out.println(result);
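
The snippets above assume that "com." is present and followed by another dot. A possible variation, sketched here under my own assumptions, wraps the negated-character-class idea in a method that returns an empty string when nothing is found. It also requires "com" to be a whole segment, so that a name like "telecom" is not picked up. PackageSegment and afterCom are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a variation on the patterns in the answer above.
final class PackageSegment {
    // "com" as a whole segment (at the start or after a dot), then a dot,
    // then one or more non-dot characters captured as group 1.
    private static final Pattern AFTER_COM = Pattern.compile("(?:^|[.])com[.]([^.]+)");

    // Returns the segment right after "com.", or "" when there is none.
    static String afterCom(String packageName) {
        Matcher m = AFTER_COM.matcher(packageName);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(afterCom("au.com.newline.myact")); // newline
    }
}
```

Unlike the original patterns, this one does not insist on a trailing dot, so "au.com.newline" also yields "newline".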

You can use reflection to get the name of any class. For example,
if I have a class Runner in com.some.package, I can call
Runner.class.getName() // string is "com.some.package.Runner"
to get the fully qualified name of the class, which includes the package name. (Runner.class.toString() would prefix the name with "class ".)
To get the segment after 'com', you can use Runner.class.getName().split("\\.") and then iterate over the returned array, using a boolean flag to mark that "com" has been seen.
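
Two details are easy to trip over with this approach, so here is a small sketch using String.class as a stand-in for the Runner class: Class.toString() prefixes the name with "class ", and split() takes a regular expression, so an unescaped "." treats every character as a delimiter. SplitDemo and segments are illustrative names.

```java
import java.util.Arrays;

// Hypothetical demo class; String.class stands in for any class.
final class SplitDemo {
    // Splits a dotted name on literal dots.
    static String[] segments(String dottedName) {
        return dottedName.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(String.class.getName());   // java.lang.String
        System.out.println(String.class.toString());  // class java.lang.String
        // An unescaped dot matches every character, leaving only empty
        // strings, which split() then discards.
        System.out.println("a.b".split(".").length);  // 0
        System.out.println(Arrays.toString(segments("au.com.newline.myact")));
        // [au, com, newline, myact]
    }
}
```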

All you have to do is split the string by "." and then iterate through the parts until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
// assumes "com" is present and is not the last part
String result = parts[i + 1];

private String getStringAfterComDot(String packageName) {
    String[] strArr = packageName.split("\\.");
    // stop one element early so that strArr[i + 1] always exists
    for (int i = 0; i < strArr.length - 1; i++) {
        if (strArr[i].equals("com"))
            return strArr[i + 1];
    }
    return "";
}

I have done heaps of website-scraping projects, and I usually just write my own utility functions to get the job done. Regex can be overkill if you just want to extract a substring from a given string like the one you have. Below is the function I normally use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
    String sRetValue = "";
    int nPos = sText.indexOf(sBefore);
    if (nPos > -1)
    {
        // start looking for sAfter right after sBefore, so an empty value is handled too
        int nStart = nPos + sBefore.length();
        int nLast = sText.indexOf(sAfter, nStart);
        if (nLast > -1)
        {
            sRetValue = sText.substring(nStart, nLast);
        }
    }
    return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Regular expression match a-alphanumeric&b-digits&c-digits

I have a question about Java regular expressions; I am new to them.
I need help forming a regex for the format below:
Format: a-alphanumeric&b-digits&c-digits
Possible matching examples: 1) a-90485jlkerj&b-34534534&c-643546
2) A-RT7456ffgt&B-86763454&C-684241
Use case: first I have to validate the input string against the regular expression. If the input string matches, I then have to extract the a, b and c values, for example
90485jlkerj, 34534534 and 643546 respectively.
Could someone please share how I can achieve this in the best possible way?
I really appreciate your help on this.

You can use this pattern:
^(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)$
If what you are trying to match is not the whole string, just remove the anchors:
(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)
Explanation:
(?i) makes the pattern case-insensitive
[0-9]++ a digit, one or more times (possessive)
[0-9a-z]++ the same, with letters as well
^ anchor for the start of the string
$ anchor for the end of the string
The parentheses in the two patterns are capture groups (to catch what you want).
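
To connect the pattern to the use case in the question (validate first, then extract), here is a minimal sketch. AbcParser and parse are names chosen for illustration. It relies on matches(), which already requires the whole input to match, so the ^ and $ anchors are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern explained above.
final class AbcParser {
    private static final Pattern FORMAT =
            Pattern.compile("(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)");

    // Returns {a, b, c} when the whole input matches, otherwise null.
    static String[] parse(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] values = parse("a-90485jlkerj&b-34534534&c-643546");
        System.out.println(String.join(", ", values)); // 90485jlkerj, 34534534, 643546
    }
}
```

Returning null for invalid input is just one option; throwing an exception or returning an Optional would work equally well.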

Given a string with the format a-XXX&b-XXX&c-XXX, you can extract all XXX parts in one simple line:
String[] parts = str.replaceAll("[abc]-", "").split("&");
parts will be an array with 3 elements, being the target strings you want.
The simplest regex that matches your string is:
^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)
Your target strings are then in groups 1, 2 and 3, but you need a lot of code around that to get the strings, which, as shown above, is not necessary.

The following code will help you:
String[] texts = new String[]{"a-90485jlkerj&b-34534534&c-643546", "A-RT7456ffgt&B-86763454&C-684241"};
Pattern full = Pattern.compile("^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)");
Pattern patternA = Pattern.compile("(?i)([\\da-z]+)&[bc]");
Pattern patternB = Pattern.compile("(\\d+)");
for (String text : texts) {
if (full.matcher(text).matches()) {
for (String part : text.split("-")) {
Matcher m = patternA.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()).split("&")[0]);
}
m = patternB.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()));
}
}
}
}

Regular Expression - Java

For the string value "ABCD_12" (including quotes), I would like to extract only the content, excluding the double quotes, i.e. ABCD_12. My code is:
private static void checkRegex()
{
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9])+\"");
Matcher findMatches = stringPattern.matcher("\"ABC_12\"");
if (findMatches.matches())
System.out.println("Match found" + findMatches.group(0));
}
I have tried findMatches.group(1), but that only returns the last character in the string (I don't understand why!).
How can I extract only the content leaving out the double quotes?

Try this regex:
Pattern.compile("\"([a-zA-Z_0-9]+)\"");
OR
Pattern.compile("\"([^\"]+)\"");
The problem in your code is the + placed outside the closing parenthesis. The capturing group then captures only one character at a time, and because the group is repeated it keeps only the last character it matched.

A nice, simple (read: non-regex) way to do this is:
String myString = "\"ABC_12\"";
// replace() treats its arguments as literal text, unlike replaceAll()
String myFilteredString = myString.replace("\"", "");
System.out.println(myFilteredString);
which gets you
ABC_12

You should change your pattern to this:
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9]+)\"");
Note that the + sign was moved inside the group, since you want the character repetition to be part of the group. In the code you posted, what you were actually searching for was a repetition of the group, which consisted of a single occurrence of a single character in [a-zA-Z_0-9].
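
The difference is easy to see side by side. The sketch below uses a made-up helper, groupOne, to run both patterns against the same input. When a capturing group is repeated, only the text from its last iteration is kept.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for comparing the two patterns.
final class GroupRepeatDemo {
    // Returns group 1 of the regex when it matches the whole input, otherwise null.
    static String groupOne(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"ABC_12\"";
        // + outside the group: the group is repeated and keeps its last iteration.
        System.out.println(groupOne("\"([a-zA-Z_0-9])+\"", input)); // 2
        // + inside the group: the group spans the whole run of characters.
        System.out.println(groupOne("\"([a-zA-Z_0-9]+)\"", input)); // ABC_12
    }
}
```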

If your pattern is strictly any text in between double quotes, then you may be better off using substring:
String str = "\"ABC_12\"";
System.out.println(str.substring(1, str.lastIndexOf('\"')));
If it is a bit more complex (double quotes inside a larger string), you can use the split() method of the Pattern class with \" as your regex. This splits the string around each \" so you can easily extract the content you want:
Pattern p = Pattern.compile("\"");
// Split input with the pattern
String[] result = p.split(str);
for (int i = 0; i < result.length; i++) {
    System.out.println(result[i]);
}
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#split%28java.lang.CharSequence%29

Matcher.Find() returns false when it should be true

String s = "test";
Pattern pattern = Pattern.compile("\\n((\\w+\\s*[^\\n]){0,2})(\\b" + s + "\\b\\s)((\\w+\\s*){0,2})\\n?");
Matcher matcher = pattern.matcher(searchableText);
boolean topicTitleFound = matcher.find();
startIndex = 0;
while (topicTitleFound) {
int i = searchableText.indexOf(matcher.group(0));
if (i > startIndex) {
builder.append(documentText.substring(startIndex, i - 1));
...
This is the text that I am working with:
Some text comes here
topicTitle test :
test1 : testing123
test2 : testing456
test3 : testing789
test4 : testing9097
When I test this regex on http://regexpal.com/ or http://www.regexplanet.com, it clearly finds the title "topicTitle test". But in my Java code topicTitleFound is false.
Please help.

It could be that you have carriage-return characters ('\r') before the newline characters ('\n') in your searchableText. This would cause the match to fail at line boundaries.
To make your multi-line pattern more robust, try using the MULTILINE option when compiling the regex. Then use ^ and $ as needed to match line boundaries.
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Update:
After actually testing out your code, I see that the pattern matches whether carriage-returns are present or not. In other words, your code "works" as-is, and topicTitleFound is true when it is first assigned (outside the while loop).
Are you sure that you are getting false for topicTitleFound? Or is the problem in the loop?
By the way, the use of indexOf() is wasteful and awkward, since the matcher already stores the index at which group 0 begins. Use this instead:
int i = matcher.start(0);
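
A minimal sketch of that idea, using a much simpler pattern than the one in the question: find() is called in the loop condition so the matcher advances, and start() reports where each match begins. MatchOffsets and starts are illustrative names, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing find() together with start().
final class MatchOffsets {
    // Collects the start offset of every whole-word occurrence of the given word.
    static List<Integer> starts(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {          // find() advances to the next match on each call
            result.add(m.start());  // same as m.start(0): where group 0 begins
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(starts("test", "test a test")); // [0, 7]
    }
}
```

Pattern.quote() keeps the searched word literal, which matters if it can contain regex metacharacters.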

Your regex is a bit hard to decipher; it is not really obvious what you're trying to do. One thing that springs to mind is that your regex expects the match to start with a newline, and your sample text doesn't.
