Java - Find a regex to match in a String

I need some help: I'm new to Java, and after some research on the web
I can't find a solution.
My problem: I have the string
size="A4"
and I would like to extract A4 with a regex that includes the word "size".
How can I do this?

import java.util.*;
import java.util.regex.*;
import java.lang.*;
import java.io.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
Matcher m=Pattern.compile("size\\s*=\\s*\"([^\"]*)\"").matcher("size=\"A4\"");
while(m.find())
System.out.println(m.group(1));
}
}
Output:
A4
http://ideone.com/FqMuTA
Regex breakdown:
size\\s*=\\s*\"([^\"]*)\"
size matches size literally
\\s*=\\s* matches the = sign with 0 or more whitespace characters on either side
\" matches a double quote
([^\"]*) matches 0 or more characters that are not a double quote ([^\"]) and captures them as group 1, which is the group printed by m.group(1) in the while loop above
\" matches the closing double quote
You can find more info on regex here
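The breakdown above can be wrapped in a small helper when the attribute name is only known at run time. This is a minimal sketch, not part of the original answer: the class AttributeExtractor and the method valueOf are hypothetical names, and Pattern.quote is used so that the name is always treated as literal text.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeExtractor {
    // Hypothetical helper: returns the double-quoted value that follows name=.
    static Optional<String> valueOf(String name, String input) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(input);
        // group(1) is only read after find() has succeeded.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(valueOf("size", "size=\"A4\"").orElse("(no match)"));
        System.out.println(valueOf("size", "color=\"red\"").orElse("(no match)"));
    }
}
```

The leading \b keeps the pattern from matching inside a longer word such as resize.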

1. Create a Pattern (java.util.regex.Pattern) that matches your conditions.
2. Generate a Matcher (java.util.regex.Matcher) that handles the input String.
3. Let the Matcher find the desired value (by using Matcher.group(group)).
//1. create Pattern
Pattern p = Pattern.compile("size=\\\"([A-Za-z0-9]{2})\\\"");
//2. generate Matcher
Matcher m = p.matcher(myString);
//3. find value using group(int)
if(m.find()) {
System.out.println( m.group(1) );
}


Looking for A Regular expression to match java regex (punct) pattern

I am looking for help with a regular expression that will match the studentIdMatch2 value in the class below. studentIdMatch1 matches fine. However, in studentIdMatch2 the studentId can contain any special character other than :, ^ and comma, so the pattern does not work. Thank you for your time; I appreciate your support.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegEx {
public static void main(String args[]){
String studentIdMatch1 = "studentName:harry,^studentId:Id123";
String studentIdMatch2 = "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
Pattern pattern = Pattern
.compile("(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)(\\w+?)(\\p{Punct}?),");
Matcher matcher = pattern.matcher(studentIdMatch1 + ","); // Works Fine(Matches Student Name and Id)
// No Special Characters in StudentId
//Matcher matcher = pattern.matcher(studentIdMatch2 + ","); //Wont work Special Characters in StudentId. Matches Student Name
while (matcher.find()) {
System.out.println("group1 = "+matcher.group(1)+ "group2 = "+matcher.group(2) +"group3 = "+matcher.group(3) +"group4 = "+matcher.group(4)+"group5 = "+matcher.group(5));
}
System.out.println("match ended");
}
}
You may try:
^SutdentName:(\w+),\^StudenId:([^\s,^:]+)$
Explanation of the above regex:
^, $ - Represents start and end of line respectively.
SutdentName: - Matches SutdentName: literally. I suspect it should be StudentName, but I didn't change it.
(\w+) - Represents first capturing group matching only word characters i.e. [A-Za-z0-9_] one or more times greedily.
,\^StudenId: - Matches ,^StudenId: literally. Here too, I guess it should be StudentId.
([^\s,^:]+) - Represents second capturing group matching everything other than white-space, ,, ^ and : one or more times greedily. You can add others according to your requirements.
You can find the demo of the above regex in here.
Sample Implementation in java:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main
{
private static final Pattern pattern = Pattern.compile("^SutdentName:(\\w+),\\^StudenId:([^\\s,^:]+)$", Pattern.MULTILINE);
public static void main(String[] args) {
String string = "SutdentName:harry,^StudenId:Id123\n"
+ "SutdentName:harry,^StudenId:Id-H/MNK/U&T/BA+_T/(1490)/17#)123";
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
}
You can find the sample run of the above code in here.
The second (\\w+?) only captures word characters, so change it to capture what you want, i.e. to
allow all the special characters other than : and ^ and comma
like ([^:^,]+?)
^ (as the first character inside the brackets) - Negates the character class
:^, - The characters to exclude: colon, ^ and comma
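Applied to the pattern from the question, the change might look like the following sketch. The class StudentIdDemo and the method pairs are made-up names for illustration; only group 5 differs from the original pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudentIdDemo {
    // Pattern from the question; only group 5 changed from (\\w+?) to ([^:^,]+?).
    private static final Pattern PAIR = Pattern.compile(
            "(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)([^:^,]+?)(\\p{Punct}?),");

    // Returns "key=value" for every key:value pair found in the input.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input + ","); // trailing comma, as in the question
        while (matcher.find()) {
            result.add(matcher.group(2) + "=" + matcher.group(5));
        }
        return result;
    }

    public static void main(String[] args) {
        String studentIdMatch2 =
                "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
        for (String pair : pairs(studentIdMatch2)) {
            System.out.println(pair);
        }
    }
}
```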

Regular Expression - Starting with and ending with string

I would like to write a regular expression to match file names that start with "AMDF" or "SB700" and do not end with ".tmp". This will be used in Java.
Code
^(?:AMDF|SB700).*\.(?!tmp$)[^.]+$
Usage
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
final String regex = "^(?:AMDF|SB700).*\\.(?!tmp$)[^.]+$";
final String[] files = {
"AMDF123978sudjfadfs.ext",
"SB700afddasjfkadsfs.ext",
"AMDE41312312089fsas.ext",
"SB701fs98dfjasdjfsd.ext",
"AMDF123120381203113.tmp"
};
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
for (String file:files) {
final Matcher matcher = pattern.matcher(file);
if(matcher.matches()) {
System.out.println(matcher.group(0));
}
}
}
}
Results
Input
AMDF123978sudjfadfs.ext
SB700afddasjfkadsfs.ext
AMDE41312312089fsas.ext
SB701fs98dfjasdjfsd.ext
AMDF123120381203113.tmp
Output
Below shows only matches.
AMDF123978sudjfadfs.ext
SB700afddasjfkadsfs.ext
Explanation
^ Assert position at the start of the line
(?:AMDF|SB700) Match either AMDF or SB700 literally
.* Match any character any number of times
\. Match a literal dot . character
(?!tmp$) Negative lookahead ensuring what follows doesn't match tmp literally (asserting the end of the line afterwards so as not to match .tmpx where x can be anything)
[^.]+ Match any character except . one or more times
$ Assert position at the end of the line
Here is another example that works, provided the file name has at least four characters after the prefix:
^(SB700|AMDF).*(?!\.tmp).{4}$
Another approach is a regex that uses a negative lookahead to assert that the file name does not end with .tmp, and the anchor ^ to make sure that the file name starts with AMDF or SB700:
^(?!.*\.tmp$)(?:AMDF|SB700)\w*\.\w+$
Explanation
The beginning of the string ^
A negative lookahead (?!...)
Inside it, the condition to reject: the string ends with .tmp .*\.tmp$
A non capturing group which matches AMDF or SB700 (?:AMDF|SB700)
Match a word character zero or more times \w*
Match a dot \.
Match a word character one or more times \w+
The end of the string $
In Java it would look like:
^(?!.*\\.tmp$)(?:AMDF|SB700)\\w*\\.\\w+$
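All three patterns above expect a particular shape of file name (a dot followed by an extension, or at least four trailing characters). If the only requirements are the prefix and the absence of a trailing .tmp, a sketch like the following may be enough. The class FileNameFilter and its method names are assumptions for illustration; the first method uses plain String methods and the second an equivalent regex with the lookahead placed after the prefix.

```java
import java.util.regex.Pattern;

class FileNameFilter {
    // Prefix first, then a lookahead rejecting names whose remainder ends in .tmp.
    private static final Pattern WANTED =
            Pattern.compile("^(?:AMDF|SB700)(?!.*\\.tmp$).*$");

    // No regex at all: check the prefix and the suffix directly.
    static boolean byStringMethods(String name) {
        return (name.startsWith("AMDF") || name.startsWith("SB700"))
                && !name.endsWith(".tmp");
    }

    // Same rule expressed as a regex; matches() tests the whole name.
    static boolean byRegex(String name) {
        return WANTED.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] files = {
            "AMDF123978sudjfadfs.ext",
            "SB700afddasjfkadsfs.ext",
            "AMDE41312312089fsas.ext",
            "AMDF123120381203113.tmp",
            "AMDFnotes" // no extension at all
        };
        for (String file : files) {
            System.out.println(file + " -> " + byStringMethods(file) + " / " + byRegex(file));
        }
    }
}
```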

How to write a regular expressions that extracts tabbed pieces of text?

I have been trying to create a program to replace tab characters with spaces (assuming a tab is equivalent to 8 spaces, one or more of which may be taken by non-whitespace characters such as letters).
I start by reading the text of a file with a Scanner as follows:
FileReader reader = null;
try {
reader = new FileReader(file);
} catch (IOException io) {
println("File not found");
}
Scanner scanner = new Scanner(reader);
scanner.useDelimiter("\\Z"); // \Z = end of input, so next() returns the whole file
String text = scanner.next();
And then I try parsing through pieces of text that end with a tab with ptrn1 below, and extract the length of the last word of each piece with ptrn2:
Pattern ptrn1 = Pattern.compile(".*\\t", Pattern.DOTALL);
Matcher matcher1 = ptrn1.matcher(text);
String nextPiece = matcher1.group();
println(matcher1.group()); /* gives me the first substring ending with tab*/
however:
Pattern ptrn2 = Pattern.compile("\\s.*\\t"); /*supposed to capture the last word in the string*/
Matcher matcher2 = ptrn2.matcher(nextPiece);
String lastword = matcher2.group();
The last line gives me an error since apparently it cannot match anything with the pattern ("\\s.*\\t"). There is something wrong with this last regular expression, which is intended to say "any number of spaces, followed by any number of characters, followed by a tab". I have not been able to find out what is wrong with it, though. I have tried ("\\s*.+\\t"), ("\\s*.*\\t"), and ("\s+.+\\t"); still no luck.
Later on, per recommendations below, I simplified the code and included the sample string in it. As follows:
import acm.program.*;
import acm.util.*;
import java.util.*;
import java.io.*;
import java.util.regex.*;
public class Untabify extends ConsoleProgram {
public void run(){
String s = "Be plain,\tgood son,\tand homely\tin thy drift.\tRiddling\tconfession\tfinds but riddling\tshrift. ";
Pattern ptrn1 =Pattern.compile(".*?\t", Pattern.DOTALL);
Pattern ptrn2 = Pattern.compile("[^\\s+]\t", Pattern.DOTALL);
String nextPiece;
Matcher matcher1 = ptrn1.matcher(s);
while (matcher1.find()){
nextPiece = matcher1.group();
println(nextPiece);
Matcher matcher2 = ptrn2.matcher(nextPiece);
println(matcher2.group());
}
}
}
The program variably crashes, first at "println(matcher2.group())"; and on the next run on "public void run()" with the message: "Debug Current Instruction Pointer" (what is the meaning of it?).
It would be useful to see a sample string. If you just want the last word before the tab, then you can use this:
([^\s]+)\t
Note the () are to put the last word in a group. [^\s]+ means 1 or more non-space.
You do not need to double-escape the tab character (i.e. \\t); \t will do fine. \t is interpreted as a tab character by the java String parser, and that tab character is sent to the regex parser, which interprets it as a tab character. You can see this answer for more information.
Also, you should use Pattern.DOTALL, not Pattern.Dotall.
The pattern "\\s.*\\t" must match a single whitespace character (\s) followed by 0 or more characters (.*) followed by a single tab (\t). If you want to capture the last word and a trailing tab you should use the word boundary escape \b
Pattern.compile("\\b.*\\b\t");
You could replace the . above to use \w or whatever your definition of a word character is if you don't want to match any character.
Here's the code you'd use to match any word immediately before a tab:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegEx {
public static void main(String args[]) {
String text = "ab cd\t ef gh\t ij";
Pattern pattern = Pattern.compile("\\b(\\w+)\\b\t", Pattern.DOTALL);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
}
The above will output
cd
gh
See the Regular Expression Tutorial, especially the sections on Predefined Character Classes and Boundary Matchers for more information.
You can get more detail and experiment with this regular expression on Regex101.
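One detail worth spelling out for the code in the question: Matcher.group() may only be called after find() (or matches()) has returned true; otherwise it throws an IllegalStateException, which would explain the crash at println(matcher2.group()). The following sketch applies the (\S+)\t idea to the sample string from the question; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastWordBeforeTab {
    // One or more non-whitespace characters (captured), followed by a tab.
    private static final Pattern WORD_THEN_TAB = Pattern.compile("(\\S+)\t");

    static List<String> wordsBeforeTabs(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD_THEN_TAB.matcher(text);
        while (matcher.find()) {          // group() is valid only after find() returned true
            words.add(matcher.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Be plain,\tgood son,\tand homely\tin thy drift.\tRiddling\t"
                + "confession\tfinds but riddling\tshrift. ";
        for (String word : wordsBeforeTabs(s)) {
            System.out.println(word + " (length " + word.length() + ")");
        }
    }
}
```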

regular expression java

I am trying to take the content between occurrences of Input; my pattern is not doing the right thing, please help.
Below is the pseudocode:
s="Input one Input Two Input Three";
Pattern pat = Pattern.compile("Input(.*?)");
Matcher m = pat.matcher(s);
if m.matches():
print m.group(..)
Required Output:
one
Two
Three
Use a lookahead for Input and use find in a loop, instead of matches:
Pattern pattern = Pattern.compile("Input(.*?)(?=Input|$)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
See it working online: ideone
But it's better to use split here:
String[] result = s.split("Input");
// You need to ignore the first element in result, because it is empty.
See it working online: ideone
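As a sketch of the split approach, assuming the pieces should also be trimmed and empty pieces dropped (the class SplitOnInput and the method between are illustrative names, not from the answer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitOnInput {
    // Splits on a literal delimiter, trims each piece and skips empty ones
    // (such as the empty string before the first delimiter).
    static List<String> between(String s, String delimiter) {
        List<String> parts = new ArrayList<>();
        for (String piece : s.split(Pattern.quote(delimiter))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "Input one Input Two Input Three";
        for (String part : between(s, "Input")) {
            System.out.println(part);
        }
    }
}
```

Pattern.quote is not strictly needed for the word Input, but it keeps the helper safe for delimiters that contain regex metacharacters.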
This does not work because m.matches() is true if and only if the whole string is matched by the expression. You could go two ways:
Use s.split("Input") instead; it gives you an array of the substrings between occurrences of "Input".
Use Matcher.find() and Matcher.group(int). But be aware that with find(), the lazy (.*?) in your current expression matches as little as possible, which is the empty string, so you should change your expression.
Greetings,
Jost
import java.util.regex.*;
public class Regex {
public static void main(String[] args) {
String s="Input one Input Two Input Three";
Pattern pat = Pattern.compile("(Input) (\\w+)");
Matcher m = pat.matcher(s);
while( m.find() ) {
System.out.println( m.group(2) );
}
}
}

How Match a Pattern in Text using Scanner and Pattern Classes?

I want to find whether a particular pattern exists in my text file or not.
I am using the following classes for this:
java.util.regex.Pattern and java.util.Scanner;
My sample text line is
String Line="DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
and I want to match the following kind of pattern:
A 102 190
where at A's position there is a single character, a-z or A-Z,
at 102's position any integer of any length,
and at 190's position any integer of any length.
My code for the pattern matching is:
Scanner sr=new Scanner(Line);
Pattern p = Pattern.compile("\\s+([a-zA-Z]){1}\\s+\\d{1,}\\s+\\d{1,}\\s+");
while(sr.hasNext(p))
{
System.out.println("Pattern exists");
System.out.println("Matched String : "+sr.next(p));
}
but the pattern is not matching even though it exists there.
I think the problem is with my pattern string:
"\\s+([a-zA-Z]){1}\\s+\\d{1,}\\s+\\d{1,}\\s+"
Can anybody please tell me what pattern string I should use?
I'm not sure that Scanner is the best tool for this, as hasNext(Pattern) checks whether the next complete token matches the pattern. Your pattern spans several tokens.
Have you tried using a Matcher object instead of the Scanner?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Foo2 {
public static void main(String[] args) {
String line = "DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
Pattern p = Pattern.compile("\\s+[a-zA-Z]\\s+\\d{1,}\\s+\\d{1,}\\s+");
Matcher matcher = p.matcher(line);
while (matcher.find()) {
System.out.printf("group: %s%n", matcher.group());
}
System.out.println("done");
}
}
This regex line works:
\\s+\\w\\s+\\d+\\s+\\d+
group(0) of your matcher (p.matcher) gives A 102 190.
[EDIT] Ok, I'll give you a complete working sample then:
Pattern p = Pattern.compile("\\s+\\w\\s+\\d+\\s+\\d+");
Matcher matcher = p.matcher("DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396");
matcher.find();
System.out.println("Found match: " + matcher.group(0));
// Found match: A 102 190
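If a Scanner has to be used, Scanner.findInLine(Pattern) searches the rest of the current line while ignoring delimiters, so the pattern may span several tokens. A minimal sketch, assuming the line is already in a String; the class name, the constant name and the word boundaries in the pattern are choices made here, not taken from the answers above:

```java
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

class ScannerFindInLine {
    // A single letter, then two integers, separated by whitespace.
    private static final Pattern CHAIN_RANGE =
            Pattern.compile("\\b([a-zA-Z])\\s+(\\d+)\\s+(\\d+)\\b");

    // Returns the matched text, or null if the line contains no match.
    static String firstMatch(String line) {
        try (Scanner sc = new Scanner(line)) {
            return sc.findInLine(CHAIN_RANGE);
        }
    }

    public static void main(String[] args) {
        String line = "DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
        try (Scanner sc = new Scanner(line)) {
            if (sc.findInLine(CHAIN_RANGE) != null) {
                MatchResult r = sc.match(); // groups of the last successful scan
                System.out.println("Pattern exists");
                System.out.println(r.group(1) + " " + r.group(2) + " " + r.group(3));
            } else {
                System.out.println("Pattern does not exist");
            }
        }
    }
}
```

The \b anchors replace the surrounding \s+ of the original pattern, so the match does not include leading or trailing whitespace.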
