Regex to get first number in string with other characters - java

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it returns every number separately (100, 2011, 10, ...).
Thank you.

/^[^\d]*(\d+)/
This starts at the beginning, skips any non-digits, and captures the first sequence of digits it finds.
EDIT:
This regex matches the first group of digits, but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string, provided you first cut off the rest: Integer.parseInt throws a NumberFormatException if anything other than the number follows it.
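In Java, the same pattern has to be written with doubled backslashes inside the string literal; \D is shorthand for [^\d]. A minimal sketch (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // Equivalent of /^[^\d]*(\d+)/: skip leading non-digits, capture the first run of digits.
    private static final Pattern FIRST_NUMBER = Pattern.compile("^\\D*(\\d+)");

    // Returns the first run of digits, or null if the string contains no digit.
    static String firstNumber(String s) {
        Matcher m = FIRST_NUMBER.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("100 2011-10-20 14:28:55")); // 100
        System.out.println(firstNumber("id: 42, rev 7"));           // 42
    }
}
```

Because of the ^ anchor, find() can only succeed at index 0, so the captured group is always the first number in the string.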

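Try this to match the first standalone number in the string, that is, a run of digits surrounded by whitespace or the string boundaries (it need not be at the beginning):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String s = "2011-10-20 525 14:28:55 10";
        Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.group(2)); // prints 525
        }
    }
}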

Just use
([0-9]+) .*
If the string starts with the number and there is always a space after it, this works; the number is captured in group 1.

Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexes are comparatively expensive, so they are best avoided when a simple string operation does the job.

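Integer.parseInt cannot take the whole string: Integer.parseInt("100 2011-10-20 14:28:55") throws a NumberFormatException because of the text after 100. Parsing only the part before the first space does the trick:
String s = "100 2011-10-20 14:28:55";
int num = Integer.parseInt(s.substring(0, s.indexOf(' '))); // 100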

[0-9] matches any single digit from 0 to 9, and + means "one or more times". If you use [0-9]{3}, it matches exactly three digits.
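This also explains the behavior described in the question: [0-9]+ is fine on its own, it only yields every number when find() is called repeatedly in a loop. A sketch with a hypothetical helper that calls find() once:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitQuantifiers {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null; // a single find() returns only the first match
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[0-9]+", "100 2011-10-20 14:28:55")); // 100
        // [0-9]{3} matches exactly three consecutive digits, even inside a longer number
        System.out.println(firstMatch("[0-9]{3}", "7 2011"));               // 201
    }
}
```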

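Try ^(?<num>[0-9]+).*$, which forces the match to start at the beginning, reads a number, stores it in the group num, and consumes the remainder without capturing it. (In Java, named groups must use the (?<num>...) form; the (?'num'...) syntax is not supported by java.util.regex.)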
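Java's Pattern accepts named groups only in the (?<name>...) form and reads them back with group(name). A sketch (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupNumber {
    // Anchored at the start: read a number into the group "num", then consume the rest.
    private static final Pattern LEADING = Pattern.compile("^(?<num>[0-9]+).*$");

    // Returns the leading number, or null if the string does not start with a digit.
    static String leadingNumber(String s) {
        Matcher m = LEADING.matcher(s);
        return m.matches() ? m.group("num") : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("100 2011-10-20 14:28:55")); // 100
        System.out.println(leadingNumber("x100"));                    // null
    }
}
```

Note that . does not match line terminators by default, so this sketch assumes single-line input.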

This C# string extension method works even when the string does not start with a number.
It returns 1234 in each case: "1234asdfwewf", "%sdfsr1234", "## # 1234"
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
// drop the non-digit characters at the start of the string
source = source.Substring(notNumber.Length);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}
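The same non-regex idea translated to Java, as a sketch: getFirstNumber is a hypothetical static helper, since Java has no extension methods. Like char.IsDigit in C#, Character.isDigit also accepts non-ASCII digits.

```java
class FirstNumberScan {
    // Skips leading non-digit characters, then returns the run of digits that follows.
    // Returns the input unchanged if it is null or empty, and "" if it contains no digit.
    static String getFirstNumber(String source) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        int i = 0;
        while (i < source.length() && !Character.isDigit(source.charAt(i))) {
            i++; // skip non-digits at the start
        }
        int start = i;
        while (i < source.length() && Character.isDigit(source.charAt(i))) {
            i++; // consume the first run of digits
        }
        return source.substring(start, i);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234asdfwewf", "%sdfsr1234", "## # 1234"}) {
            System.out.println(getFirstNumber(s)); // 1234 each time
        }
    }
}
```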

NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get a number at the start of a string, you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g. .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d) or (?!\.?\d) negative lookahead, which fails the match if a digit (or a . followed by a digit) follows.
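Putting the last pattern into Java code, with the backslash doubled as described in the note, might look like the sketch below (class and method names are hypothetical). lookingAt() only matches at the start of the input, so the ^ is redundant there but harmless:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNumber {
    // ^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? as a Java string literal (\. becomes "\\.").
    static final Pattern NUMBER_AT_START =
            Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+([eE][+-]?[0-9]+)?");

    // Returns the number at the start of s, or null if s does not start with one.
    static String leadingNumber(String s) {
        Matcher m = NUMBER_AT_START.matcher(s);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("-.55 apples"));    // -.55
        System.out.println(leadingNumber("+30e2x"));         // +30e2
        System.out.println(leadingNumber("100 2011-10-20")); // 100
        System.out.println(leadingNumber("abc 5"));          // null
    }
}
```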

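import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        String str = s.nextLine();
        Pattern p = Pattern.compile("[0-9]+");
        Matcher m = p.matcher(str);
        // a single find() yields the first number; while (m.find()) would print every number
        if (m.find()) {
            System.out.println(m.group());
        }
    }
}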

\d+
\d stands for any decimal digit, and + extends the match to the digits that directly follow it, stopping at the first non-digit character such as a space or a letter.

Related

Get substring between "first two" occurrences of a character

I have a String:
String thestra = "/aaa/bbb/ccc/ddd/eee";
Every time, in my situation, for this String, a minimum of two slashes will be present without fail.
And I am getting /aaa/ like below, which is the substring between the "FIRST TWO occurrences" of the char / in the String.
System.out.println("/" + thestra.split("\\/")[1] + "/");
It serves my purpose, but I am wondering if there is a more elegant and cleaner alternative.
Please notice that I need both slashes (leading and trailing) around aaa. i.e. /aaa/
You can use indexOf, which accepts a second argument for an index to start searching from:
int start = thestra.indexOf("/");
int end = thestra.indexOf("/", start + 1) + 1;
System.out.println(thestra.substring(start, end));
Whether or not it's more elegant is a matter of opinion, but at least it doesn't find every / in the string or create an unnecessary array.
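As a self-contained sketch, the indexOf approach can be wrapped in a helper (firstSegment is a hypothetical name). Returning null when there are fewer than two slashes is an added assumption; the question guarantees at least two:

```java
class FirstSlashedSegment {
    // Returns the text from the first '/' through the second '/', inclusive,
    // or null if the string contains fewer than two slashes.
    static String firstSegment(String s) {
        int start = s.indexOf('/');
        if (start < 0) {
            return null;
        }
        int end = s.indexOf('/', start + 1); // resume searching after the first slash
        if (end < 0) {
            return null;
        }
        return s.substring(start, end + 1);
    }

    public static void main(String[] args) {
        System.out.println(firstSegment("/aaa/bbb/ccc/ddd/eee")); // /aaa/
        System.out.println(firstSegment("x/aaa/bbb"));            // /aaa/
        System.out.println(firstSegment("/only"));                // null
    }
}
```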
Scanner::findInLine returning the first match of the pattern may be used:
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(new Scanner(thestra).findInLine("/[^/]*/"));
Output:
/aaa/
Use Pattern and Matcher from java.util.regex.
Pattern pattern = Pattern.compile("/.*?/");
Matcher matcher = pattern.matcher(thestra);
if (matcher.find()) {
String match = matcher.group(0); // output
}
// Java 9+: Matcher.results() streams every match; findFirst() keeps the first one
Pattern.compile("/.*?/")
.matcher(thestra)
.results()
.map(MatchResult::group)
.findFirst().ifPresent(System.out::println);
Every time, in my situation, for this Sting, minimum two slashes will be present
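If that is guaranteed, split at each / while keeping the delimiters (using lookarounds), and take the first three pieces:
// On Java 8+, the zero-width match at index 0 produces no empty leading element, so the pieces are "/", "aaa", "/", ...
String str = String.format("%s%s%s",(thestra.split("((?<=\\/)|(?=\\/))")));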
You could also match the leading forward slash, then use the negated character class [^/]* to match zero or more characters other than /, and then match the trailing forward slash.
String thestra = "/aaa/bbb/ccc/ddd/eee";
Pattern pattern = Pattern.compile("/[^/]*/");
Matcher matcher = pattern.matcher(thestra);
if (matcher.find()) {
System.out.println(matcher.group());
}
Output
/aaa/
One of many ways is to replace the whole string with group #1 of the regex [^/]*(/[^/].*?/).*, as shown below:
public class Main {
public static void main(String[] args) {
String thestra = "/aaa/bbb/ccc/ddd/eee";
String result = thestra.replaceAll("[^/]*(/[^/].*?/).*", "$1");
System.out.println(result);
}
}
Output:
/aaa/
Explanation of the regex:
[^/]* : Not the character, /, any number of times
( : Start of group#1
/ : The character, /
[^/]: Not the character, /
.*?: Any character any number of times (lazy match)
/ : The character, /
) : End of group#1
.* : Any character any number of times
Updated the answer as per the following valuable suggestion from Holger:
Note that to the Java regex engine, / has no special meaning, so there is no need to escape it here. Further, since you're only expecting a single match (the .* at the end ensures this), replaceFirst would be more idiomatic. And since there was no statement about the first / always being at the beginning of the string, prepending the pattern with either .*? or [^/]* would be a good idea.
I am surprised nobody mentioned using Path (java.nio.file, available since Java 7).
String thestra = "/aaa/bbb/ccc/ddd/eee";
String path = Paths.get(thestra).getName(0).toString();
System.out.println("/" + path + "/");
Output:
/aaa/
Assuming the string always starts with / and the first segment is not empty:
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(thestra.substring(0, thestra.indexOf("/", 2) + 1));

Regex to mask multiple ~-separated phone numbers except the last 4 digits

I am trying to find a regex that masks phone numbers except for the last 4 digits.
Example: phone=9988998888~7654321908~6789054321
Desired output: phone=******8888~******1908~******4321
I tried the regex below, but it masks only the first number:
^(phone)=(\d(?=\d{4}))*
Result: phone=******8888~7654321908~6789054321
Use Matcher.replaceAll(Function<MatchResult, String> replacer), available since Java 9, to replace each digit in the matched text with "*".
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PhoneNumberMask {
public static void main(String[] args) {
String target = "phone=9988998888~7654321908~6789054321";
Pattern pattern = Pattern.compile("(\\d+(?=\\d{4}))");
Matcher matcher = pattern.matcher(target);
String result = matcher.replaceAll((matchResult) -> matchResult.group(1).replaceAll("\\d", "*"));
System.out.println(result);
}
}
You could use:
\d(?=\d{4})
\d - Any single digit.
(?=\d{4}) - Positive lookahead for 4 digits.
Replace with *.
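In Java code, this is a single String.replaceAll call. A sketch (class and method names are hypothetical); note that it masks any digit followed by four more digits anywhere in the string, not only after phone=:

```java
class MaskDigits {
    // Replaces every digit that still has at least four digits after it with '*'.
    static String mask(String s) {
        return s.replaceAll("\\d(?=\\d{4})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("phone=9988998888~7654321908~6789054321"));
        // phone=******8888~******1908~******4321
    }
}
```

The ~ separators keep the numbers independent: the last digits before a ~ are not followed by four digits, so they stay unmasked.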
If you only want to mask the ~-separated numbers in a string that starts with phone=, you can use a plain regex solution, without a lambda in the replacement:
String masked = text.replaceAll("(\\G(?!^)(?:\\d{4}~)?|^phone=)\\d(?=\\d{4})", "$1*");
Details:
(\G(?!^)(?:\d{4}~)?|^phone=) - Group 1: either the end of the previous successful match followed by an optional sequence of four digits and a ~, or the start of the string followed by phone=
\d - a digit
(?=\d{4}) - followed with any four digits.
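A runnable sketch of this replacement (class and method names are hypothetical). The second call shows the difference from the plain \d(?=\d{4}) approach: without a leading phone=, nothing is masked, because \G can only continue from a previous match.

```java
class MaskPhoneOnly {
    // Group 1 re-inserts "phone=" or the unmasked "dddd~" tail that was consumed.
    private static final String REGEX = "(\\G(?!^)(?:\\d{4}~)?|^phone=)\\d(?=\\d{4})";

    static String mask(String s) {
        return s.replaceAll(REGEX, "$1*");
    }

    public static void main(String[] args) {
        System.out.println(mask("phone=9988998888~7654321908~6789054321"));
        // phone=******8888~******1908~******4321
        System.out.println(mask("fax=9988998888")); // fax=9988998888 (unchanged)
    }
}
```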

Regular Expression to match spaces between numbers and operators but no spaces between numbers

I've searched many posts on this forum and, to my surprise, haven't found anyone with a problem like mine.
I have to make a simple calculator for string values from the console. Right now, I'm trying to write some regexes to validate the input.
My calculator has to accept numbers with spaces around the operators (only + and - are allowed), but not input with spaces between numbers. To sum up:
2 + 2 = 4 is correct, but
2 2 + 2 --> this should produce an error and tell the user on the console that there is a space between numbers.
I've come up with this:
static String properExpression = "([0-9]+[+-]?)*[0-9]+$";
static String noInput = "";
static String numbersFollowedBySpace = "[0-9]+[\\s]+[0-9]";
static String numbersWithSpaces = "\\d+[+-]\\d+";
//I've tried also "[\\d\\s+\\d]";
void validateUserInput() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter a calculation.");
input = sc.nextLine();
if(input.matches(properExpression)) {
calculator.calculate();
} else if(input.matches(noInput)) {
System.out.print(0);
} else if(input.matches(numbersFollowedBySpace)) {
input.replaceAll(" ", "");
calculator.calculate();
} else if(input.matches(numbersWithSpaces))
{
System.out.println("Check the numbers. It seems that there is a space between the digits");
}
else System.out.println("sth else");
}
Can you give me a hint about the regex I should use?
To match a complete expression like 2+3=24 or 6 - 4 = 2 (the regex checks only the format, not whether the arithmetic is correct), a regex like
^\d+\s*[+-]\s*\d+\s*=\s*\d+$
will do.
If you want to match longer expressions like 2+3+4+5=14 then you can use:
^\d+\s*([+-]\s*\d+\s*)+=\s*\d+$
Explanation:
^\d+ # first operand
\s* # 0 or more spaces
( # start repeating group
[+-]\s* # the operator (+/-) followed by 0 or more spaces
\d+\s* # 2nd (3rd,4th) operand followed by 0 or more spaces
)+ # end repeating group. Repeat 1 or more times.
=\s*\d+$ # equal sign, followed by 0 or more spaces and result.
Now, you might want to accept an expression like 2=2 as a valid expression. In that case the repeating group could be absent, so change + into *:
^\d+\s*([+-]\s*\d+\s*)*=\s*\d+$
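In Java, the backslashes must be doubled, and String.matches() already requires the whole string to match, so the anchors are optional there. A sketch using the last pattern (class and method names are hypothetical):

```java
class ExpressionFormat {
    // ^\d+\s*([+-]\s*\d+\s*)*=\s*\d+$ written as a Java string literal.
    private static final String EXPRESSION = "^\\d+\\s*([+-]\\s*\\d+\\s*)*=\\s*\\d+$";

    static boolean isWellFormed(String input) {
        return input.matches(EXPRESSION);
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("6 - 4 = 2"));   // true
        System.out.println(isWellFormed("2+3+4+5=14"));  // true
        System.out.println(isWellFormed("2=2"));         // true
        System.out.println(isWellFormed("2 2 + 2 = 6")); // false: space between two numbers
    }
}
```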
Try:
^(?:\d+\s*[+-]\s*)*\d+$
Explanation:
The ^ and $ anchor the regex to match the whole string.
I have added \s* to allow whitespace between each number/operator.
I have replaced [0-9] with \d just to simplify it slightly; the two are equivalent.
I'm a little unclear whether you wanted to allow or disallow = <digits> at the end, since your question mentions it but your properExpression pattern doesn't attempt it. If so, it should be fairly easy to see how the expression can be modified to support it.
Note that I've only addressed the regex itself, not any other issues in the code.
I tried to keep your logical flow as much as possible. Other answers are more efficient, but with them you would have to change your logic considerably.
Please see the code below and let me know if you have any questions.
static String properExpression = "\\s*(\\d+\\s*[+-]\\s*)*\\d+\\s*";
static String noInput = "";
static String numbersWithSpaces = ".*\\d[\\s]+\\d.*";
//I've tried also "[\\d\\s+\\d]";
static void validateUserInput() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter a calculation.");
String input = sc.nextLine();
if(input.matches(properExpression)) {
input=input.replaceAll(" ", ""); //You've to assign it back to input.
calculator.calculate(); //Hope you have a way to pass input to calculator object
} else if(input.matches(noInput)) {
System.out.print(0);
} else if(input.matches(numbersWithSpaces)) {
System.out.println("Check the numbers. It seems that there is a space between the digits");
} else
System.out.println("sth else");
}
Explanation
The properExpression pattern below allows optional spaces, which are then removed with replaceAll:
\\s* //Allow any optional leading spaces if any
( //Start number+operator sequence
\\d+ //Number
\\s* //Optional space
[+-] //Operator
\\s* //Optional space after operator
)* //End number+operator sequence(repeated)
\\d+ //Last number in expression
\\s* //Allow any optional space.
Numbers with spaces
.* //Any beginning expression
\\d //Digit
[\\s]+ //Followed by one or more spaces
\\d //Followed by another digit
.* //Rest of the expression
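To try the answer's properExpression and numbersWithSpaces patterns without the rest of the calculator, here is a sketch that classifies the input instead of calling calculator.calculate() (the classify method and its return strings are hypothetical):

```java
class InputValidator {
    static final String PROPER_EXPRESSION = "\\s*(\\d+\\s*[+-]\\s*)*\\d+\\s*";
    static final String NUMBERS_WITH_SPACES = ".*\\d[\\s]+\\d.*";

    static String classify(String input) {
        if (input.matches(PROPER_EXPRESSION)) {
            return "ok: " + input.replaceAll(" ", ""); // replaceAll returns a new string
        } else if (input.isEmpty()) {                   // same test as input.matches("")
            return "0";
        } else if (input.matches(NUMBERS_WITH_SPACES)) {
            return "space between digits";
        }
        return "sth else";
    }

    public static void main(String[] args) {
        System.out.println(classify(" 2 + 2 - 1 ")); // ok: 2+2-1
        System.out.println(classify("2 2 + 2"));     // space between digits
        System.out.println(classify(""));            // 0
        System.out.println(classify("2 * 2"));       // sth else
    }
}
```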

How to use substring when the string length is not fixed every time

I have a string like:
SKU: XP321654
Quantity: 1
Order date: 01/08/2016
The SKU length is not fixed, so my code sometimes also returns the first one or two characters of Quantity, which I don't want. I want only the SKU value.
My Code :
int index = Content.indexOf("SKU:");
String SKU = Content.substring(index, index+15);
If the SKU has one or two more digits, I can't get all of it because I set the limit to 15. If I use index + 16 to get a long SKU, then for a short SKU it also returns some characters of Quantity.
How can I solve this? Is there a way to avoid using a fixed character length as the limit?
The last character of my SKU is always a digit. Is there anything I can use to get only the SKU up to its last digit?
Using .substring is simply not the way to process such things. What you need is a regex (or regular expression):
Pattern pat = Pattern.compile("SKU\\s*:\\s*(\\S+)");
String sku = null;
Matcher matcher = pat.matcher(Content);
if(matcher.find()) { //we've found a match
sku = matcher.group(1);
}
//do something with sku
Unescaped, the regex is:
SKU\s*:\s*(\S+)
You are thus looking for a pattern that starts with SKU, followed by zero or more \s (whitespace characters such as space and tab), a colon (:), then again zero or more whitespace characters (\s), and finally the part you are interested in: one or more (that's the meaning of +) non-whitespace characters (\S). Putting these in parentheses makes them a capturing group. If the regex finds the pattern (matcher.find()), you can extract the content of the group with matcher.group(1) and store it in a string.
You can improve the regex further if you know more about what an SKU looks like. For instance, if it consists only of uppercase letters and digits, you can replace \S by [0-9A-Z], so the pattern becomes:
Pattern pat = Pattern.compile("SKU\\s*:\\s*([0-9A-Z]+)");
EDIT: for the quantity data, you could use:
Pattern pat2 = Pattern.compile("Quantity\\s*:\\s*(\\d+)");
int qt = -1;
Matcher matcher = pat2.matcher(Content);
if(matcher.find()) { //we've found a match
qt = Integer.parseInt(matcher.group(1));
}
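A self-contained sketch combining the SKU and quantity patterns above; the multi-line sample content and the firstGroup helper are assumptions for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderParser {
    static final Pattern SKU_PATTERN = Pattern.compile("SKU\\s*:\\s*(\\S+)");
    static final Pattern QUANTITY_PATTERN = Pattern.compile("Quantity\\s*:\\s*(\\d+)");

    // Returns group 1 of the first match in content, or null if the label is not found.
    static String firstGroup(Pattern pattern, String content) {
        Matcher matcher = pattern.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "SKU: XP321654\nQuantity: 1\nOrder date: 01/08/2016";
        String sku = firstGroup(SKU_PATTERN, content); // \S+ stops at the line break
        int quantity = Integer.parseInt(firstGroup(QUANTITY_PATTERN, content));
        System.out.println(sku + " x " + quantity);    // XP321654 x 1
    }
}
```

Since \S+ stops at the first whitespace character, the SKU length no longer matters.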
You can simply use the length of the string:
String s = "SKU: XP321654";
String sku = s.substring(4, s.length()).trim();
I think a regex is overkill in this case; this is much simpler. You can also split the string, which is a bit less efficient than the solution above, but please don't use a regex for this!
String sku = "SKU: XP321654".split(":")[1].trim();
1: Split your input into lines (for example, split by \n).
2: For each line, search for : and take the rest of the line (using the string's length, as mentioned in Dici's answer).
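A sketch of these two steps in Java. valueOf is a hypothetical helper name, and matching the label exactly (after trimming) is an assumption. The \R split pattern requires Java 8 or later:

```java
class LineFields {
    // Splits content into lines, finds the line whose label equals the given one,
    // and returns the trimmed text after the first ':'; null if no line matches.
    static String valueOf(String content, String label) {
        for (String line : content.split("\\R")) { // \R matches \n, \r\n, \r and other line breaks
            int colon = line.indexOf(':');
            if (colon >= 0 && line.substring(0, colon).trim().equals(label)) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String content = "SKU: XP321654\r\nQuantity: 1\r\nOrder date: 01/08/2016";
        System.out.println(valueOf(content, "SKU"));        // XP321654
        System.out.println(valueOf(content, "Order date")); // 01/08/2016
    }
}
```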
Depending on how exactly the string contains new lines, you could do this:
public static void main(String[] args) {
String s = "SKU: XP321654\r\n" +
"Quantity: 1\r\n" +
"Order date: 01/08/2016";
System.out.println(s.substring(s.indexOf(": ") + 2, s.indexOf("\r\n")));
}
Just note that this one-liner has some restrictions:
The SKU property has to come first. If not, change the start index so that it searches for "SKU: " instead.
Line breaks might use a different separator; in a regex, \R (Java 8+) matches any line-break sequence.

split a string in java into equal length substrings while maintaining word boundaries

How to split a string into equal parts of maximum character length while maintaining word boundaries?
For example, if I want to split the string "hello world" into substrings of at most 7 characters, it should return
"hello "
and
"world"
But my current implementation returns
"hello w"
and
"orld "
I am using the following code taken from Split string to equal length substrings in Java to split the input string into equal parts
public static List<String> splitEqually(String text, int size) {
// Give the list the right capacity to start with. You could use an array
// instead if you wanted.
List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);
for (int start = 0; start < text.length(); start += size) {
ret.add(text.substring(start, Math.min(text.length(), start + size)));
}
return ret;
}
Will it be possible to maintain word boundaries while splitting the string into substring?
To be more specific, I need the splitting algorithm to respect the word boundaries given by spaces instead of relying solely on character length. Length still matters, but as a maximum number of characters rather than a fixed one.
If I understand your problem correctly, this code should do what you need (it assumes that maxLenght is equal to or greater than the length of the longest word):
String data = "Hello there, my name is not importnant right now."
+ " I am just simple sentecne used to test few things.";
int maxLenght = 10;
Pattern p = Pattern.compile("\\G\\s*(.{1,"+maxLenght+"})(?=\\s|$)", Pattern.DOTALL);
Matcher m = p.matcher(data);
while (m.find())
System.out.println(m.group(1));
Output:
Hello
there, my
name is
not
importnant
right now.
I am just
simple
sentecne
used to
test few
things.
Short (or not) explanation of "\\G\\s*(.{1,"+maxLenght+"})(?=\\s|$)" regex:
(remember that in Java \ is special not only in regexes but also in string literals, so to use a predefined character class like \d we need to write it as "\\d", escaping the \ in the string literal as well)
\G - an anchor representing the end of the previous match or, if there is no match yet (when we have just started searching), the beginning of the string (as ^ does)
\s* - represents zero or more whitespace characters (\s represents whitespace, * is the "zero or more" quantifier)
(.{1,"+maxLenght+"}) - let's split it into more parts (at runtime maxLenght holds a numeric value like 10, so the regex sees it as .{1,10})
. represents any character (by default any character except line separators like \n or \r, but thanks to the Pattern.DOTALL flag it can represent any character; you may drop this flag if you want to split each sentence separately, since its start will be printed on a new line anyway)
{1,10} - a quantifier that lets the previously described element appear 1 to 10 times (by default it tries to find the maximal number of repetitions),
.{1,10} - so, based on what we just said, it simply represents "1 to 10 of any characters"
( ) - parentheses create groups, which hold specific parts of the match (here we added the parentheses after \\s* because we only want the part after the whitespace)
(?=\\s|$) - a look-ahead which makes sure that the text matched by .{1,10} is followed by:
space (\\s)
OR (written as |)
end of the string $ after it.
So thanks to .{1,10} we can match up to 10 characters. But with (?=\\s|$) after it, we require that the last character matched by .{1,10} is not part of an unfinished word (there must be a space or the end of the string after it).
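Wrapped into a reusable method, the same regex might look like the sketch below (the split method name is hypothetical). The second call illustrates the stated assumption: once a word is longer than the limit, \G cannot continue and no further chunks are found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordWrapRegex {
    // Collects chunks of at most maxLength characters that end before whitespace or at the end of the text.
    static List<String> split(String text, int maxLength) {
        Pattern p = Pattern.compile("\\G\\s*(.{1," + maxLength + "})(?=\\s|$)", Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group(1)); // group 1 excludes the leading whitespace
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("hello world", 7));              // [hello, world]
        System.out.println(split("a supercalifragilistic b", 7)); // [a]
    }
}
```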
Non-regex solution, just in case someone is more comfortable (?) not using regular expressions:
private String justify(String s, int limit) {
StringBuilder justifiedText = new StringBuilder();
StringBuilder justifiedLine = new StringBuilder();
String[] words = s.split(" ");
for (int i = 0; i < words.length; i++) {
justifiedLine.append(words[i]).append(" ");
if (i+1 == words.length || justifiedLine.length() + words[i+1].length() > limit) {
justifiedLine.deleteCharAt(justifiedLine.length() - 1);
justifiedText.append(justifiedLine.toString()).append(System.lineSeparator());
justifiedLine = new StringBuilder();
}
}
return justifiedText.toString();
}
Test:
String text = "Long sentence with spaces, and punctuation too. And supercalifragilisticexpialidocious words. No carriage returns, tho -- since it would seem weird to count the words in a new line as part of the previous paragraph's length.";
System.out.println(justify(text, 15));
Output:
Long sentence
with spaces,
and punctuation
too. And
supercalifragilisticexpialidocious
words. No
carriage
returns, tho --
since it would
seem weird to
count the words
in a new line
as part of the
previous
paragraph's
length.
It takes into account words that are longer than the set limit, so it doesn't skip them (unlike the regex version, which just stops processing when it reaches supercalifragilisticexpialidocious).
PS: The comment about all input words being expected to be shorter than the set limit was made after I came up with this solution ;)
