regex to match a suffix in a given string

I have a function that is supposed to validate that a string does not contain the prefix below.
I want to match every word of the form
__test_timestamp__itemname
Some examples are as follows:
__test_1349333576093__cellphone_modelc1
__test_1349333576090__macbook_model_12
public boolean isvalid(String Name){
/*pattern match to check for suffix and return true if string starts with
__test_timestamp_
*/
}
The item name in this string can vary, and so will the timestamp, which is in milliseconds. However, the timestamp is always 13 characters long and consists of digits; the item name can contain numbers and underscores.
How do I write a function to match this pattern? Thank you in advance for helping!

I'm not familiar with Java, but the regex is something like this:
^__test_[0-9]{13}__[A-Za-z0-9_]+$
^: for start string
[0-9]{13}: exactly 13 digits
[A-Za-z0-9_]+: 1 or more uppercase/lowercase letters, digits, and _
$: for end string
https://regex101.com/r/oWBfes/2
Edit: If you need more flexibility for the timestamp, you can set min and max like this:
{11,13}
^__test_[0-9]{11,13}__[A-Za-z0-9_]+$
Edit: add 100 max length:
(?=^.{0,100}$)(^__test_[0-9]{11,13}__[A-Za-z0-9_]+$)
Edit: to group last occurrence:
(?=^.{0,100}$)^__test_[0-9]{11,13}__([A-Za-z0-9_]+)$
To catch what you want, group it with ()

You may use String#matches as follows:
public boolean isvalid(String name) {
return name.matches("__test_\\d+__\\S+");
}
Note that we don't assign any fixed width to the timestamp, because perhaps you have some earlier data whose timestamps could be less than 13 digits wide.
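
Putting the answers above together, here is a minimal Java sketch that precompiles the stricter pattern (exactly 13 digits, item name limited to letters, digits and underscores) and uses a capture group to pull out the item name. The class name TestNameValidator and the helper itemName are hypothetical names chosen for illustration; loosen the quantifier to \\d+ if older, shorter timestamps must be accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class TestNameValidator {
    // "__test_" + 13-digit millisecond timestamp + "__" + item name.
    // Group 1 is the timestamp, group 2 is the item name.
    private static final Pattern TEST_NAME =
            Pattern.compile("__test_([0-9]{13})__([A-Za-z0-9_]+)");

    // matches() requires the whole string to fit, so no ^ or $ is needed.
    static boolean isValid(String name) {
        return name != null && TEST_NAME.matcher(name).matches();
    }

    // Returns the item name, or null when the input does not match.
    static String itemName(String name) {
        if (name == null) {
            return null;
        }
        Matcher m = TEST_NAME.matcher(name);
        return m.matches() ? m.group(2) : null;
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String#matches does internally.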

Related

Why does this regex fails to check accurately?

I have the following regex method, which performs matches in three stages for a given string. For some reason, the regex fails to check some things. As far as I can tell, the patterns are correct. Can someone please tell me what I am doing wrong here?
I have the following code:
public class App {
public static void main(String[] args) {
String identifier = "urn:abc:de:xyz:234567.1890123";
if (identifier.matches("^urn:abc:de:xyz:.*")) {
System.out.println("Match ONE");
if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[0-9]{1,7}.*")) {
System.out.println("Match TWO");
if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$")) {
System.out.println("Match Three");
}
}
}
}
}
Ideally, this code should generate the output
Match ONE
Match TWO
Match Three
only when identifier = "urn:abc:de:xyz:234567.1890123.abd12", but it produces the same output even if the identifier does not match the regex, such as for the following inputs:
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ANC"
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ACB.123"
I do not understand why it allows alphanumeric characters after the first "." and why it does not care about the characters after the second ".".
I would like my Regex to check that the string has the following format:
1. The string starts with urn:abc:de:xyz:
2. Then it has digits [0-9], 6 to 12 of them (234567).
3. Then it has the decimal point .
4. Then it has digits [0-9], 1 to 7 of them (1890123).
5. Then it has the decimal point .
6. Finally it has alphanumeric and special characters, 1 to 20 of them (ABC123.-_12).
This is a valid string for my regex: urn:abc:de:xyz:234567.1890123.ABC123.-_12
This is an invalid string for my regex as it misses the elements from point 6:
urn:abc:de:xyz:234567.1890123
This is also an invalid string for my regex as it misses the elements from point 4 (it has ABC instead of decimal numbers).
urn:abc:de:xyz:234567.1890ABC.ABC123.-_12

This part of the regex:
[0-9]{6,12}.[0-9]{1,7} matches 6 to 12 digits followed by any character followed by 1 to 7 digits
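To match a literal dot, it needs to be escaped. In addition, .-_ inside the character class is read as a range from . to _, so place the hyphen last to keep it literal. Try this:
^urn:abc:de:xyz:[0-9]{6,12}\.[0-9]{1,7}\.[a-zA-Z0-9._-]{1,20}$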

This will match any number of dot-separated alphanumeric groups at the end of the string, as in your examples:
^urn:abc:de:xyz:\d{6,12}\.\d{1,7}(?:\.[\w-]{1,20})+$
Demo & explanation
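
As a rough check of the points above, here is a small Java sketch run against the inputs from the question. It assumes the last segment may contain letters, digits, dot, underscore and hyphen (as in the asker's valid example), with the hyphen placed last in the character class so that it is literal. The class name UrnValidator is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class UrnValidator {
    // Both separator dots are escaped; the hyphen is last in the class,
    // so it is a literal hyphen and not part of a range.
    private static final Pattern URN = Pattern.compile(
            "urn:abc:de:xyz:[0-9]{6,12}\\.[0-9]{1,7}\\.[A-Za-z0-9._-]{1,20}");

    // matches() anchors the pattern to the whole input.
    static boolean isValid(String identifier) {
        return identifier != null && URN.matcher(identifier).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "urn:abc:de:xyz:234567.1890123.ABC123.-_12",
            "urn:abc:de:xyz:234567.1890123",
            "urn:abc:de:xyz:234567.1890ABC.ABC123.-_12"
        };
        for (String input : inputs) {
            System.out.println(input + " valid? " + isValid(input));
        }
    }
}
```

Only the first input satisfies all six points; the second lacks the final segment and the third has letters where point 4 requires digits.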

Regex for epoch time in millisecond using java

I have this string:
String str = "8240d66c-4771-4fae-9631-8a420f9099ca,458,cross_vendor_certificate_will_expire,1565102543758";
I would like to remove the epoch time from the string using a regex. I've searched the web but didn't find a suitable solution.
This is the code I have so far:
public void createHashMapWithAlertCSVContent() throws Exception {
for(String item: lstServer) { //lstServer is a list contains names of the CSV files
String[] contentCSVStr= CmdHelper.Remote.File.cat(SERVER,INDENI_INSIGHT_PATH + "/"+item).split("\n");//Function to get CSV contents
mapServer.put(FileUtil.removeExtension(item), contentCSVStr);//Finally I add each String[] to hashmap key is the csv file name and String[] is the content of each CSV file
}
Assert.assertEquals(mapServer.size(), lstServer.size());
mapServer.remove("job");
}
example of possible content:
1. 0,TRIAL,8240d66c-4771-4fae-9631-8a420f9099ca,1566345600000,5,1565102549213
2. 8240d66c-4771-4fae-9631-8a420f9099ca,0,1565102673040
3. 8240d66c-4771-4fae-9631-8a420f9099ca,0.0.0.develop,4418,1565102673009
EDIT:
The epoch time might be at any location in the string and might occur more than once in the string.
The length of the epoch time string is certainly greater than 10.

String input = "0,TRIAL,8240d66c-4771-4fae-9631-8a420f9099ca,1566345600000,5,1565102549213";
String output = input.replaceAll("\\d{10,},|,\\d{10,}", "");
System.out.println(output);
Output:
0,TRIAL,8240d66c-4771-4fae-9631-8a420f9099ca,5
The vertical bar | in the regular expression denotes two options, one with a number and a comma, the other with the comma before the number. This takes into account that the timestamp may be first or last in the string or somewhere between.
\\d denotes a digit and {10,} that there are at least 10 of them with no upper limit. Please consider yourself whether the lower limit should be 10, 13 or some other number of digits.
Corner case: if the string consists of only one or more timestamps, the above replacement will not remove the last one of them since it insists on removing one comma with each timestamp, and a string consisting of only one timestamp will not have a comma in it.
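
One way to sidestep that corner case is to split the line on commas and drop whole fields that consist only of digits. The sketch below is an alternative to the replaceAll approach, not the answer's own method; the class and method names are hypothetical, and the threshold of 10 digits is the same assumption as above.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration.
class EpochFieldRemover {
    // Removes every comma-separated field made up of 10 or more digits.
    static String removeEpochFields(String csvLine) {
        // The -1 limit keeps trailing empty fields instead of discarding them.
        return Arrays.stream(csvLine.split(",", -1))
                .filter(field -> !field.matches("\\d{10,}"))
                .collect(Collectors.joining(","));
    }
}
```

Because each field is tested as a whole, a line holding nothing but a timestamp becomes an empty string, and a long digit run inside a larger field (for example the last group of a UUID) is left alone.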

Regular Expression Parse Double

I am new to regular expressions. I want to search for NUMBER(19, 4), and the method should return the value (in this case 19,4). But I always get 0 as the result!
int length =0;
length = patternLength(datatype,"^NUMBER\\((\\d+)\\,\\s*\\)$","NUMBER");
private static double patternLengthD(String datatype, String patternString, String startsWith) {
double length=0;
if (datatype.startsWith(startsWith)) {
Pattern patternA = Pattern.compile(patternString);
Matcher matcherA = patternA.matcher(datatype);
if (matcherA.find()) {
length = Double.parseDouble(matcherA.group(1));
}
}
return length;
}

You are missing the matching of digits after the comma.
You also don't need to escape the ,.
Use this:
"^NUMBER\\((\\d+),\\s*(\\d+)\\)$"
This will give you the first number in group(1) and the second number in group(2).
It is however fairly strict on spaces, so you can be more lenient and match on values like " NUMBER ( 19 , 4 ) " by using this:
"^\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)\\s*$"
In that case you'll have to drop your startsWith and just use the regex directly. Also, you can remove the anchors (^$) if you change find() to matches().
Since NUMBER(19) is usually allowed too, you can make the second value optional:
"\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*"
group(2) will then return null if the second number is not given.
See regex101 for demo.
Note that your code doesn't compile.
Your method returns a double, but length is an int.
Although 19,4 looks like a floating point number, it is not, and representing it as such is wrong.
You should store the two values separately.
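
A minimal sketch of that advice, using the lenient pattern with the optional second value and matches() instead of find(). The class name, the method name and the convention of returning -1 for a missing second value are assumptions made for illustration; the names precision and scale are simply labels for the first and second number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class NumberTypeParser {
    private static final Pattern NUMBER_TYPE = Pattern.compile(
            "\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*");

    // Returns {precision, scale}; scale is -1 when the second value is omitted.
    // Returns null when the input is not a NUMBER(...) declaration.
    static int[] parse(String datatype) {
        Matcher m = NUMBER_TYPE.matcher(datatype);
        if (!m.matches()) {
            return null;
        }
        int precision = Integer.parseInt(m.group(1));
        // group(2) is null when the optional ", n" part did not participate.
        int scale = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
        return new int[] {precision, scale};
    }
}
```

Keeping the two values as separate ints avoids the misleading floating-point representation mentioned above.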

How to use Substring when String length is not fixed everytime

I have a string something like:
SKU: XP321654
Quantity: 1
Order date: 01/08/2016
The SKU length is not fixed, so my function sometimes also returns the first one or two characters of Quantity, which I do not want. I want to get only the SKU value.
My Code :
int index = Content.indexOf("SKU:");
String SKU = Content.substring(index, index+15);
If the SKU has one or two more digits, I cannot get them either, because I have limited the length to 15. If I use index + 16 to capture a longer SKU, then for a short SKU it also returns some characters of Quantity.
How can I solve this? Is there any way to avoid using a fixed character count as the limit?
The last character of my SKU will always be a digit, so is there anything else I can use to get only the SKU up to its last digit?

Using .substring is simply not the way to process such things. What you need is a regex (or regular expression):
Pattern pat = Pattern.compile("SKU\\s*:\\s*(\\S+)");
String sku = null;
Matcher matcher = pat.matcher(Content);
if(matcher.find()) { //we've found a match
sku = matcher.group(1);
}
//do something with sku
Unescaped the regex is something like:
SKU\s*:\s*(\S+)
you are thus looking for a pattern that starts with SKU then followed by zero or more \s (spacing characters like space and tab), followed by a colon (:) then potentially zero or more spacing characters (\s) and finally the part in which you are interested: one or more (that's the meaning of +) non-spacing characters (\S). By putting these in brackets, these are a matching group. If the regex succeeds in finding the pattern (matcher.find()), you can extract the content of the matching group matcher.group(1) and store it into a string.
You can potentially improve the regex further if you know more about what a SKU looks like. For instance, if it consists only of uppercase letters and digits, you can replace \S with [0-9A-Z], so the pattern becomes:
Pattern pat = Pattern.compile("SKU\\s*:\\s*([0-9A-Z]+)");
EDIT: for the quantity data, you could use:
Pattern pat2 = Pattern.compile("Quantity\\s*:\\s*(\\d+)");
int qt = -1;
Matcher matcher = pat2.matcher(Content);
if(matcher.find()) { //we've found a match
qt = Integer.parseInt(matcher.group(1));
}
or see this jdoodle.

You know you can just refer to the length of the string, right?
String s = "SKU: XP321654";
String sku = s.substring(4, s.length()).trim();
I think using a regex is clearly overkill in this case; it is much simpler than that. You can even split the expression, although it's a bit less efficient than the solution above, but please don't use a regex for this!
String sku = "SKU: XP321654".split(":")[1].trim();

1: You have to split your input into lines (split by \n).
2: Once you have your line, you search for : and then take the remainder of the line (using the string length, as mentioned in Dici's answer).
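
The two steps above can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical, and splitting on \\R (any line break) rather than only \n is an assumption made so that \r\n input also works.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration.
class FieldParser {
    // Splits the content into lines, then splits each line at its first colon.
    static Map<String, String> parseFields(String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : content.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String content = "SKU: XP321654\r\nQuantity: 1\r\nOrder date: 01/08/2016";
        System.out.println(parseFields(content).get("SKU"));
    }
}
```

Since each value runs to the end of its own line, the SKU length no longer matters and no characters from the Quantity line can leak in.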

Depending on how exactly the string contains new lines, you could do this:
public static void main(String[] args) {
String s = "SKU: XP321654\r\n" +
"Quantity: 1\r\n" +
"Order date: 01/08/2016";
System.out.println(s.substring(s.indexOf(": ") + 2, s.indexOf("\r\n")));
}
Just note that this 1-liner has several restrictions:
The SKU property has to be first. If not, then modify the start index appropriately to search for "SKU: ".
The lines might be separated differently; \R is a regex that matches any line-break sequence.

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it matches every number separately (100, 2011, 10, ...).
Thank you.

/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds
EDIT:
this Regex will match the first group of numbers, but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string
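
For reference, a minimal Java sketch of the same idea. It assumes Java is the target language (the question does not say), writes [^\d] as the equivalent \D, and uses hypothetical class and method names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FirstNumber {
    // Skip any non-digits from the start, then capture the first run of digits.
    private static final Pattern FIRST_DIGITS = Pattern.compile("^\\D*(\\d+)");

    // Returns the first run of digits, or null if the text contains none.
    static String firstNumber(String text) {
        Matcher m = FIRST_DIGITS.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

Calling find() once stops at the first match, which is what prevents the later numbers (2011, 10, ...) from being returned.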

Try this to match for first number in string (which can be not at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}

Just
([0-9]+) .*
If you always have the space after the first number, this will work

Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexes are comparatively expensive and are best avoided when a simple string operation does the job.

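The code below would do the trick. Integer.parseInt rejects trailing non-digit characters, so it is applied to the first space-separated token:
Integer num = Integer.parseInt("100 2011-10-20 14:28:55".split(" ")[0]);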

[0-9] means any digit from 0 to 9, and + means one or more times. If you use [0-9]{3}, you will get exactly 3 digits.

Try ^(?'num'[0-9]+).*$ which forces it to start at the beginning, read a number, store it to 'num' and consume the remainder without binding.

This string extension works even when the string does not start with a number.
It returns 1234 in each of these cases: "1234asdfwewf", "%sdfsr1234", "## # 1234"
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
//replace non digit chars from string start
source = source.Replace(notNumber, string.Empty);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}

NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get the number that appears at the start of a string, you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g, .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
See this regex demo.
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
See this regex demo.
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d)
or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or . and a digit) on the right.
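
As a small illustration of the note about double backslashes, here is the signed float-or-integer pattern from above written as a Java string literal. The class and method names are hypothetical, and returning the matched text as a String (instead of parsing it) is a choice made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class LeadingNumber {
    // Optional sign, optional integer part, optional dot, then at least one digit.
    // The regex \. is written as "\\." inside the Java string literal.
    private static final Pattern LEADING_NUMBER =
            Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");

    // Returns the number text at the start of the input, or null if there is none.
    static String leadingNumber(String text) {
        Matcher m = LEADING_NUMBER.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```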

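import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        String str = s.nextLine();
        Pattern p = Pattern.compile("[0-9]+");
        Matcher m = p.matcher(str);
        // Prints every run of digits; use if instead of while to stop at the first one.
        while (m.find()) {
            System.out.println(m.group() + " ");
        }
    }
}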

\d+
\d stands for any decimal digit, while + extends the match to every digit that follows directly, until a non-digit character such as a space or letter is reached.
