Matching the pattern of a string in java


I have been trying to figure out how to match my input string against a pattern like this:
"xyz 123456789"
In general, whenever the input consists of 3 letters (uppercase or lowercase) followed by 9 digits (any combination), the input string should be accepted.
So if the input string is "Abc 234646593", it should be a match (one or two whitespace characters are allowed between the two parts). It would also be great if "Abc" and "234646593" could be stored in separate strings.
I have seen a lot of regex examples but do not fully understand them.

Here's a working Java solution:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String input = "Abc 234646593";
// you could use \\s+ rather than \\s{1,2} if you only care that
// at least one whitespace char occurs
Pattern p = Pattern.compile("([a-zA-Z]{3})\\s{1,2}([0-9]{9})");
Matcher m = p.matcher(input);
String firstPart = null;
String secondPart = null;
if (m.matches()) {
firstPart = m.group(1); // grab first remembered match ([a-zA-Z]{3})
secondPart = m.group(2); // grab second remembered match ([0-9]{9})
System.out.println("First part: " + firstPart);
System.out.println("Second part: " + secondPart);
}
}
}
Prints out:
First part: Abc
Second part: 234646593
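To see which inputs the pattern above accepts and rejects, here is a minimal sketch that wraps it in a helper. The class name CodeParser and the method parse are made up for illustration; the pattern itself is the one from the answer. Because matches() has to consume the entire input, a fourth letter, a third space, or a missing digit all lead to a rejection.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeParser {
    // Three ASCII letters, one or two whitespace characters, nine digits.
    static final Pattern CODE = Pattern.compile("([a-zA-Z]{3})\\s{1,2}([0-9]{9})");

    // Hypothetical helper: returns {letters, digits}, or null if the input does not fit.
    static String[] parse(String input) {
        Matcher m = CODE.matcher(input);
        if (!m.matches()) { // matches() must consume the whole input
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Abc 234646593",   // accepted
            "xyz  123456789",  // accepted (two spaces)
            "Abc   234646593", // rejected (three spaces)
            "Abcd 234646593",  // rejected (four letters)
            "Abc 23464659"     // rejected (eight digits)
        };
        for (String s : samples) {
            String[] parts = parse(s);
            System.out.println("\"" + s + "\" -> "
                    + (parts == null ? "rejected" : parts[0] + " / " + parts[1]));
        }
    }
}
```

If leading or trailing whitespace should be tolerated, calling input.trim() before matching is probably simpler than extending the pattern.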

Related

Is there a regex where if first expression is valid then check for next [duplicate]

I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].

Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
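and m.group(1) will return the first number. Note that signed numbers can contain a minus sign. A greedy \D+ would consume the minus sign itself (it is a non-digit), so make the prefix reluctant to keep the sign in the group:
^\D*?(-?\d+).*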
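A small sketch of how the prefix quantifier affects the captured sign. It compares a greedy prefix, ^\D+(-?\d+).*, with a reluctant one, ^\D*?(-?\d+).*; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // Greedy prefix: \D+ grabs every non-digit it can, including a leading '-'.
    static final Pattern GREEDY = Pattern.compile("^\\D+(-?\\d+).*");
    // Reluctant prefix: \D*? stops as early as possible, so '-' is left for the group.
    static final Pattern RELUCTANT = Pattern.compile("^\\D*?(-?\\d+).*");

    // Returns the first capture group, or null when the pattern does not match.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "temp -42 deg";
        System.out.println(first(GREEDY, input));    // 42  (sign lost)
        System.out.println(first(RELUCTANT, input)); // -42
        // \D+ needs at least one non-digit first, \D*? does not:
        System.out.println(first(GREEDY, "123abc"));    // null
        System.out.println(first(RELUCTANT, "123abc")); // 123
    }
}
```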

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345

Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect the number to be set off by spaces, it is more precise to specify
"\\s+(\\d+)\\s+"
instead.
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
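To illustrate the point that find() skips over whatever precedes the digits, here is a minimal sketch using only \d+. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstDigits {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the input, or null if there is none.
    static String firstDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        // find() scans forward from the start, so no prefix pattern is needed
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("Testing123Testing"));   // 123
        System.out.println(firstDigits("hello1234goodboy789")); // 1234
        System.out.println(firstDigits("no digits here"));      // null
    }
}
```

This only holds for find(). With matches(), the pattern has to describe the whole input, so the surrounding parts would need to be spelled out again.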

In addition to Pattern, the Java String class also has several methods that can work with regular expressions, in your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.

In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}

This function collects all matching sequences from a string. In this example it extracts all email addresses from the string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
        + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
    List<String> result = null; // stays null when nothing matches
    Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
    if (matcher.find()) {
        result = new ArrayList<String>();
        result.add(matcher.group());
        while (matcher.find()) {
            result.add(matcher.group());
        }
    }
    return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.

Try doing something like this:
// \\D* rather than a greedy .+, which would swallow all but the last digit
Pattern p = Pattern.compile("^\\D*(\\d+)");
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
    System.out.println(m.group(1)); // prints 123
}

Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
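One caveat about sharing a single static Matcher: Pattern instances are immutable and safe to share between threads, but Matcher instances are not. If the utility might be called concurrently, a variant that creates a Matcher per call avoids the problem. This is only a sketch, and the class name FirstNumberUtil is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumberUtil {
    // The compiled Pattern is immutable, so sharing it is fine.
    static final Pattern PATTERN = Pattern.compile("^\\D+(\\d+).*");

    // Assumption: inputStr is a non-null String.
    static String extractFirstNumber(String inputStr) {
        // A fresh Matcher per call keeps the method free of shared mutable state.
        Matcher matcher = PATTERN.matcher(inputStr);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractFirstNumber("Testing4234Things")); // 4234
    }
}
```

Creating a Matcher is cheap compared to compiling the Pattern, so the cost of this change should be small in most programs.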

Look you can do it using StringTokenizer
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we are taking these numeric data into three different variables we can use this data anywhere in the code (for further use)

How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with fractional part.
I included white spaces and included , as possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
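A rough sketch of how the captured group could be turned into a number. The cleanup step (dropping whitespace and treating a comma as the decimal separator) is an assumption about what the input means, and the names LooseNumber and parse are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseNumber {
    // The expression from the text above, unchanged.
    static final Pattern NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Returns the first number in the input, or null if the input has no digits.
    static Double parse(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Assumption: stray spaces are typos and ',' is a decimal separator.
        String cleaned = m.group(1).replaceAll("\\s+", "").replace(',', '.');
        return Double.valueOf(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parse("price: 12 , 5 usd")); // 12.5
        System.out.println(parse("pi is 3.14"));        // 3.14
        System.out.println(parse("about 7 days"));      // 7.0
        System.out.println(parse("no digits"));         // null
    }
}
```

Note the trade-off: because a comma followed by more digits is read as a fraction, a list such as "1, 2 and 3" would come out as 1.2.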

Sometimes you can use simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]

If you are reading from a file, then this can help you:
try {
    InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
    BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
    String line;
    //Ref:03
    while ((line = br.readLine()) != null) {
        // matches() checks the whole line against the expected record layout
        if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
            String[] splitRecord = line.split(",");
            //do something
        } else {
            br.close();
            //error
            return;
        }
    }
    br.close();
} catch (IOException ioException) {
    logger.logDebug("Exception " + ioException);
}

Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}

Java regex convert string to valid json string

I have a pretty long string that looks something like
{abc:\"def\", ghi:\"jkl\"}
I want to convert this to a valid json string like
{\"abc\":\"def\", \"ghi\":\"jkl\"}
I started looking at the replaceAll(String regex, String replacement) method on the string object but i'm struggling to find the correct regex for it.
Can someone please help me with this.

In this particular case the regex should look for a word that is preceded by {, a space, or a comma, and not followed by ".
String str = "{abc:\"def\", ghi:\"jkl\"}";
// lookbehind: the preceding character is checked but not consumed, so it is not replaced
String regex = "(?<=[{ ,])(\\w+)(?!\")";
System.out.println(str.replaceAll(regex, "\\\"$1\\\""));

I have to make an assumption that the "key" and "value" consist of only
"word characters" (\w) and there are no spaces in them.
Here is my program. Please also see the comments in-line:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexJson {
public static void main(String[] args) {
/*
* Note that the input string, when expressed in a Java program, need escape
* for backslash (\) and double quote ("). If you read directly
* from a file then these escapes are not needed
*/
String input = "{abc:\\\"def\\\", ghi:\\\"jkl\\\"}";
// regex for one pair of key-value pair. Eg: abc:\"edf\"
String keyValueRegex = "(?<key>\\w+):(?<value>\\\\\\\"\\w+\\\\\\\")";
// regex for a list of key-value pair, separated by a comma (,) and a space ( )
String pairsRegex = "(?<pairs>(,*\\s*"+keyValueRegex+")+)";
// regex include the open and closing braces ({})
String regex = "\\{"+pairsRegex+"\\}";
StringBuilder sb = new StringBuilder();
sb.append("{");
Pattern p1 = Pattern.compile(regex);
Matcher m1 = p1.matcher(input);
while (m1.find()) {
String pairs = m1.group("pairs");
Pattern p2 = Pattern.compile(keyValueRegex);
Matcher m2 = p2.matcher(pairs);
String comma = ""; // first time special
while (m2.find()) {
String key = m2.group("key");
String value = m2.group("value");
sb.append(String.format(comma + "\\\"%s\\\":%s", key, value));
comma = ", "; // second time and onwards
}
}
sb.append("}");
System.out.println("input is: " + input);
System.out.println(sb.toString());
}
}
The print out of this program is:
input is: {abc:\"def\", ghi:\"jkl\"}
{\"abc\":\"def\", \"ghi\":\"jkl\"}

Print out the last match of a regex

I have this code:
String responseData = "http://xxxxx-f.frehd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/.m3u8";
String pattern = "^(https://.*\\.54325)$";
Pattern pr = Pattern.compile(pattern);
Matcher math = pr.matcher(responseData);
if (math.find()) {
    // print the url
}
else {
    System.out.println("No match");
}
I want to print out the last string that starts with http and ends with .m3u8. How do I do this? I'm stuck. All help is appreciated.
The problem I have now is that when I find a match and want to print out the string, I get everything from responseData.

In case you need to get some substring at the end that is preceded by similar substrings, you need to make sure the regex engine has already consumed as many characters before your required match as possible.
Also, you have a ^ in your pattern that means beginning of a string. Thus, it starts matching from the very beginning.
You can achieve what you want with just lastIndexOf and substring:
System.out.println(str.substring(str.lastIndexOf("http://")));
Or, if you need a regex, you'll need to use
String pattern = ".*(http://.*?\\.m3u8)$";
and use math.group(1) to print the value.
Sample code:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String str = "http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/index_0_av.m3u8" +
"EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2795000,RESOLUTION=1280x720,CODECS=avc1.64001f, mp4a.40.2" +
"http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/index_6_av.m3u8";
String rx = ".*(http://.*?\\.m3u8)$";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group(1));
}
}
}
Output:
http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/index_6_av.m3u8
Also tested on RegexPlanet
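Another way to get the last match, sketched here under the assumption that no URL contains a line break, is to drop the leading .* and simply keep the most recent result while looping over find(). The class name, the method name, and the example.com URLs are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastUrl {
    // Reluctant .*? so each match stops at the first ".m3u8" after its "http://".
    static final Pattern URL = Pattern.compile("http://.*?\\.m3u8");

    // Returns the last match in the input, or null if there is none.
    static String lastMatch(String input) {
        Matcher m = URL.matcher(input);
        String last = null;
        while (m.find()) {
            last = m.group(); // overwrite until no further match is found
        }
        return last;
    }

    public static void main(String[] args) {
        String data = "http://example.com/a/index_0_av.m3u8"
                + "EXT-X-STREAM-INF:PROGRAM-ID=1"
                + "http://example.com/a/index_6_av.m3u8";
        System.out.println(lastMatch(data)); // http://example.com/a/index_6_av.m3u8
    }
}
```

Compared with the anchored ".*(...)$" pattern, this version also works when the input does not end in .m3u8, at the price of visiting every match.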

Reusing the consumed characters in pattern matching in java?

Consider the following pattern:
aba
And the following source string:
abababbbaba
01234567890 //Index Positions
Using the Pattern and Matcher classes from the java.util.regex package finds this pattern only twice, since the regex engine does not revisit characters it has already consumed.
What if I want to reuse part of the already consumed characters? That is, I want 3 matches here: one at position 0, one at 2 (which was skipped previously), and one at 8.
How do I do it?

I think you can use the indexOf() for something like that.
String str = "abababbbaba";
String substr = "aba";
int location = 0;
while ((location = str.indexOf(substr, location)) >= 0)
{
System.out.println(location);
location++;
}
Prints:
0, 2 and 8

You can use a lookahead for that. The pattern below captures the current character in group(1) and, through the lookahead, the following two characters in group(2). Concatenated, they give every substring of length 3 in the sentence you are searching in.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Question8968432 {
public static void main(String args[]) {
final String needle = "aba";
final String sentence = "abababbbaba";
final Matcher m = Pattern.compile("(.)(?=(..))").matcher(sentence);
while (m.find()) {
final String match = m.group(1) + m.group(2);
final String hint = String.format("%s[%s]%s",
sentence.substring(0, m.start()), match,
sentence.substring(m.start() + match.length()));
if (match.equals(needle)) {
System.out.printf("Found %s starting at %d: %s\n",
match, m.start(), hint);
}
}
}
}
Output:
Found aba starting at 0: [aba]babbbaba
Found aba starting at 2: ab[aba]bbbaba
Found aba starting at 8: abababbb[aba]
You can skip the final String hint part, this is just to show you what it matches and where.

If you can change the regexp, then you can simply use something like:
a(?=ba)
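The same idea works for an arbitrary search string by wrapping the whole thing in a lookahead. Since a lookahead consumes nothing, each match is empty and the next find() resumes one character further on, which is what lets matches overlap. This is a minimal sketch; the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // Returns the start index of every (possibly overlapping) occurrence of needle.
    static List<Integer> starts(String needle, String haystack) {
        // Pattern.quote treats the needle as literal text, not as a regex.
        Matcher m = Pattern.compile("(?=" + Pattern.quote(needle) + ")").matcher(haystack);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(starts("aba", "abababbbaba")); // [0, 2, 8]
    }
}
```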

I need a regular expression to replace 3rd matching substring

Example
input: abc def abc abc pqr
I want to replace the third occurrence of abc with xyz.
output: abc def abc xyz pqr
Thanks in advance

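One way to do this would be to split the string, overwrite the element, and rejoin it:
String[] parts = input.split(" ");
parts[3] = "xyz"; // index 3 happens to be the third "abc" in this input
String result = String.join(" ", parts);
It's not the best way to do it, but it works. You could put the whole process into a function like this:
static String replaceToken(String input, String separator, int index, String replacement) {
    // Pattern.quote (java.util.regex.Pattern) because split() expects a regex
    String[] parts = input.split(Pattern.quote(separator));
    parts[index] = replacement;
    return String.join(separator, parts);
}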

Group the three segments, that are the part before the replaced string, the replaced string and the rest and assemble the prefix, the replacement and the suffix:
String pattern = String.format("^(.*?%1$s.*?%1$s.*?)(%1$s)(.*)$", "abc");
String result = input.replaceAll(pattern, "$1xyz$3");
This solution assumes that the whole input is one line. If you have multiline input you'll have to replace the dots as they don't match line separators.
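If building one large pattern feels fragile, the occurrences can also be counted explicitly. The sketch below is an alternative to the replaceAll approach, not a restatement of it: it walks the matches with find() and splices the replacement in at the n-th one. The names NthReplace and replaceNth are made up, and the replacement is inserted as literal text (no $1 style group references).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthReplace {
    // Replaces the n-th match (1-based) of regex in input; returns input unchanged
    // when there are fewer than n matches.
    static String replaceNth(String input, String regex, int n, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
            if (count == n) {
                // Splice: text before the match + replacement + text after the match
                return input.substring(0, m.start()) + replacement + input.substring(m.end());
            }
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(replaceNth("abc def abc abc pqr", "abc", 3, "xyz"));
        // abc def abc xyz pqr
    }
}
```

Because the input is never matched with a dot, this version is not affected by line separators in multiline input.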

There's plenty of ways to do this, but here's one. It assumes that the groups of letters will be separated by spaces, and looks for the 3rd 'abc' block. It then does a single replace to replace that with 'xyz'.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class main {
private static String INPUT = "abc def abc abc pqr";
private static String REGEX = "((?:abc\\ ).*?(?:abc\\ ).*?)(abc\\ )";
private static String REPLACE = "$1xyz ";
public static void main(String[] args) {
System.out.println("Input: " + INPUT);
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
INPUT = m.replaceFirst(REPLACE);
System.out.println("Output: " + INPUT);
}
}
