Length of specific substring - java

I check whether my string begins with a number using
if (Regex.IsMatch(myString, @"\d+")) ...
If this condition holds I want to get the length of this "numeric" substring that my string begins with.
I can find the length checking if every next character is a digit beginning from the first one and increasing some counter. Is there any better way to do this?

Well instead of using IsMatch, you should find the match:
// Presumably you'll be using the same regular expression every time, so
// we might as well just create it once...
private static readonly Regex Digits = new Regex(@"\d+");
...
Match match = Digits.Match(text);
if (match.Success)
{
string value = match.Value;
// Take the length or whatever
}
Note that this doesn't check that the digits occur at the start of the string. You could do that using @"^\d+", which anchors the match to the beginning. Or you could check that match.Index is 0 if you wanted...

To check whether the string begins with a number, you need to use the pattern ^\d+.
string pattern = @"^\d+";
MatchCollection mc = Regex.Matches(myString, pattern);
if(mc.Count > 0)
{
Console.WriteLine(mc[0].Value.Length);
}

Your regex checks whether your string contains a sequence of one or more digits anywhere. If you want to check that it starts with one, you need to anchor the pattern at the beginning:
Match m = Regex.Match(myString, @"^\d+");
if (m.Success)
{
int length = m.Length;
}

As an alternative to a regular expression, you can use extension methods:
int cnt = myString.TakeWhile(Char.IsDigit).Count();
If there are no digits in the beginning of the string you will naturally get a zero count. Otherwise you have the number of digits.

Instead of just checking IsMatch, get the match so you can get info about it, like the length:
var match = Regex.Match(myString, @"^\d+");
if (match.Success)
{
int count = match.Length;
}
Also, I added a ^ to the beginning of your pattern to limit it to the beginning of the string.

If you break out your code a bit more, you can take advantage of Regex.Match:
var length = 0;
var myString = "123432nonNumeric";
var match = Regex.Match(myString, @"\d+");
if(match.Success)
{
length = match.Value.Length;
}
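The answers above use the .NET Regex API. Since the title mentions Java, here is a minimal sketch of the same idea with java.util.regex, assuming the goal is only the length of the leading digit run. The class and method names (LeadingDigits, count) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingDigits {
    // Compiled once and reused; \d+ means one or more digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Returns the length of the digit run at the start of text, or 0 if there is none. */
    static int count(String text) {
        Matcher m = DIGITS.matcher(text);
        // lookingAt() only succeeds if the match begins at index 0,
        // so no explicit ^ anchor is needed in the pattern.
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        System.out.println(count("123432nonNumeric")); // 6
        System.out.println(count("abc123"));           // 0
    }
}
```

Because the match starts at index 0, m.end() is also the length of the match.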

Related

Regular Expression Parse Double

I am new to regular expressions. I want to search for NUMBER(19, 4), and the method should return the value (in this case 19,4). But I always get 0 as the result!
int length =0;
length = patternLength(datatype,"^NUMBER\\((\\d+)\\,\\s*\\)$","NUMBER");
private static double patternLengthD(String datatype, String patternString, String startsWith) {
double length=0;
if (datatype.startsWith(startsWith)) {
Pattern patternA = Pattern.compile(patternString);
Matcher matcherA = patternA.matcher(datatype);
if (matcherA.find()) {
length = Double.parseDouble(matcherA.group(1));
}
}
return length;
}
You are missing the matching of digits after the comma.
You also don't need to escape the ,.
Use this:
"^NUMBER\\((\\d+),\\s*(\\d+)\\)$"
This will give you the first number in group(1) and the second number in group(2).
It is however fairly strict on spaces, so you can be more lenient and match on values like " NUMBER ( 19 , 4 ) " by using this:
"^\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)\\s*$"
In that case you'll have to drop your startsWith and just use the regex directly. Also, you can remove the anchors (^$) if you change find() to matches().
Since NUMBER(19) is usually allowed too, you can make the second value optional:
"\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*"
group(2) will then return null if the second number is not given.
See regex101 for demo.
Note that your code doesn't compile.
Your method returns a double, but length is an int.
Although 19,4 looks like a floating point number, it is not, and representing it as such is wrong.
You should store the two values separately.
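Putting the points above together, a possible sketch that keeps the two values separate could look like this. It reuses the lenient pattern with the optional second group and calls matches(), so no anchors are needed. The class name NumberType, the method parse, and the choice of -1 for a missing scale are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberType {
    // Lenient on whitespace; the second value is optional, as in NUMBER(19).
    private static final Pattern NUMBER = Pattern.compile(
            "\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*");

    /**
     * Returns {first, second} as ints; second is -1 when it is not given.
     * Returns null when the input is not a NUMBER(...) declaration.
     */
    static int[] parse(String datatype) {
        Matcher m = NUMBER.matcher(datatype);
        if (!m.matches()) {          // matches() requires the whole input to match
            return null;
        }
        int first = Integer.parseInt(m.group(1));
        // group(2) is null when the optional part did not participate in the match
        int second = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] r = parse("NUMBER(19, 4)");
        System.out.println(r[0] + " and " + r[1]); // 19 and 4
    }
}
```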

Counting comma and any text in java String

I'm trying to write a function to count specific Strings.
The Strings to count look like the following:
first any character except comma at least once -
the comma -
any character, at least once
example string:
test, test, test,
should count to 3
I've tried do that by doing the following:
int countSubstrings = 0;
final Pattern pattern = Pattern.compile("[^,]*,.+");
final Matcher matcher = pattern.matcher(commaString);
while (matcher.find()) {
countSubstrings++;
}
Though my solution doesn't work. It always ends up counting to one and no further.
Try this pattern instead: [^,]+
As you can see in the API, find() will give you the next subsequence that matches the pattern. So this will find your sequences of "non-commas" one after the other.
Your regex, especially the .+ part, will match any char sequence of at least length 1. Because .+ is greedy, the first match consumes the rest of the input, which is why you only ever count one. You want the match to be reluctant/lazy, so add a ?: [^,]*,.+?
Note that .+? will still match a comma that directly follows a comma, so you might want to replace .+? with [^,]+ instead (since commas can't match, laziness is not needed there).
Besides that an easier solution might be to split the string and get the length of the array (or loop and check the elements if you don't want to allow for empty strings):
countSubstrings = commaString.split(",").length;
Edit:
Since you added an example that clarifies your expectations, you need to adjust your regex. You seem to want to count the number of strings followed by a comma so your regex can be simplified to [^,]+,. This matches any char sequence consisting of non-comma chars which is followed by a comma.
Note that this wouldn't match multiple commas or text at the end of the input, e.g. test,,test would result in a count of 1. If you have that requirement you need to adjust your regex.
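As a rough sketch of the simplified regex in use, the loop from the question can be kept and only the pattern swapped. The class name CommaCounter and the method count are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommaCounter {
    // One or more non-comma characters followed by a comma.
    private static final Pattern ITEM = Pattern.compile("[^,]+,");

    /** Counts the non-empty chunks of text that are terminated by a comma. */
    static int count(String commaString) {
        Matcher matcher = ITEM.matcher(commaString);
        int countSubstrings = 0;
        // Each find() resumes where the previous match ended.
        while (matcher.find()) {
            countSubstrings++;
        }
        return countSubstrings;
    }

    public static void main(String[] args) {
        System.out.println(count("test, test, test,")); // 3
        System.out.println(count("test,,test"));        // 1
    }
}
```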
So, quite good answers are already given. Something like this should also work if you only need to count the commas. Beware, it's not clean and probably not the fastest way to do this, but it is quite readable. :)
public int countComma(String lots_of_words) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == ',') {
count++;
}
}
return count;
}
Or even better:
public int countChar(String lots_of_words, char the_chosen_char) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == the_chosen_char) {
count++;
}
}
return count;
}

Get a substring of a string made of xCharsxInts

I have a list of constants:
public static final String INSTANCE_PREFIX = "in";
public static final String INDICATOR_PREFIX = "i";
public static final String MODEL_PREFIX = "m";
...
They have variable lengths, which are put in front of a number and the result is a variable's id. For example, it could be in30 or i2 or m4353. I am trying to make the method as abstract as possible to account for x letters x numbers. The letters are always going to be some prefix that is inside of my Constants.java so I know that much, but the method won't know with which combination it's working with.
I just want the number attached to the end. For example, I want to pass in the m4353 from above and just get back the 4353. Whether it uses the constants file or not is not relevant, but I include them as they may be useful for some approach.
It seems to me like you don't care about the prefixes at all, so I have ignored them in this answer. If you do care about the prefixes, please scroll down to the second half of this answer:
This code uses regular expressions to extract the trailing numbers at the end of a string.
() represents a capturing group (used by m.group(1));
[0-9]+ represents a String of digits of at least 1 in length
$ represents the end of the string, guaranteeing the numbers are only the ones at the end.
Here is the code:
private static final Pattern p = Pattern.compile("([0-9]+)$");
public static int extractNumber(String value) {
Matcher m = p.matcher(value);
if(m.find()) {
return Integer.parseInt(m.group(1));
} else {
return Integer.MIN_VALUE; // error code
}
}
Demo.
If you want to capture the prefix, you could use Pattern.compile("^([a-z]+)([0-9]+)$") instead.
Note that the numbers are now the second group, so they would be captured in m.group(2), and the prefix would be captured in m.group(1).
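A minimal sketch of that two-group variant, assuming ids always consist of a lowercase prefix followed by digits. The names PrefixedId and split are hypothetical, and returning null for a non-matching id is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixedId {
    // group(1): lowercase prefix, group(2): trailing digits
    private static final Pattern ID = Pattern.compile("^([a-z]+)([0-9]+)$");

    /** Returns {prefix, digits}, or null if the id does not have that shape. */
    static String[] split(String id) {
        Matcher m = ID.matcher(id);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("m4353");
        System.out.println(parts[0] + " / " + parts[1]); // m / 4353
    }
}
```

The digits are returned as a String here so the caller can choose Integer.parseInt or Long.parseLong depending on the expected range.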
Try the String replaceAll method
For example:
String x = "prefix1111111";
x = x.replaceAll("\\D", "");
int justNum = Integer.parseInt(x);
where "\\D" is any non-digit character. So it deletes all non-digits in your string.
Note, you might want to use Long.parseLong or Double.parseDouble and the associated primitive types instead if your numbers will be longer than 9 digits as Java ints can only handle values up to 2147483647

Java String- How to get a part of package name in android?

It's basically about getting the string value between two characters. SO has many questions related to this, like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing to deal with multiple dots in the string and get the value between two particular dots.
I have got the package name as :
au.com.newline.myact
I need to get the value between "com." and the next "dot(.)". In this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I tried using substrings and IndexOf also. But couldn't get the intended answer. Because the package name in android varies by different number of dots and characters, I cannot use fixed index. Please suggest any idea.
As you probably know (based on .* part in your regex) dot . is special character in regular expressions representing any character (except line separators). So to actually make dot represent only dot you need to escape it. To do so you can place \ before it, or place it inside character class [.].
Also to get only part from parenthesis (.*) you need to select it with proper group index which in your case is 1.
So try with
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want to get only one element before next . then you need to change greedy behaviour of * quantifier in .* to reluctant by adding ? after it like
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// the ? right after .* makes the quantifier reluctant
Another approach is instead of .* accepting only non-dot characters. They can be represented by negated character class: [^.]*
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
If you don't want to use regex you can simply use indexOf method to locate positions of com. and next . after it. Then you can simply substring what you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip 'com.' part
int end = beforeTask.indexOf(".", start); //find next `.` after start index
String result = beforeTask.substring(start, end);
System.out.println(result);
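The indexOf approach above assumes that both "com." and a following dot are present. A slightly more defensive sketch might check the return values first. The class name PackagePart and the method afterCom are invented for illustration, and returning null when "com." is absent is an assumption about what the caller wants.

```java
final class PackagePart {
    /**
     * Returns the segment that follows "com.", or null if "com." does not occur.
     * Caveat: indexOf also finds "com." inside a longer segment such as "telecom.".
     */
    static String afterCom(String packageName) {
        String marker = "com.";
        int at = packageName.indexOf(marker);
        if (at < 0) {
            return null;
        }
        int start = at + marker.length();          // skip the "com." part itself
        int end = packageName.indexOf('.', start); // next dot after the segment
        // No further dot: the segment runs to the end of the string.
        return end < 0 ? packageName.substring(start)
                       : packageName.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(afterCom("au.com.newline.myact")); // newline
    }
}
```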
You can use reflection to get the name of any class. For example:
If I have a class Runner in com.some.package, I can run
Runner.class.getName() // string is "com.some.package.Runner"
to get the full name of the class, which includes the package name. (Runner.class.toString() would prepend "class " to it.)
To get the part after 'com', you can use Runner.class.getName().split("\\.") (the dot must be escaped because split takes a regex) and then iterate over the returned array with a boolean flag.
All you have to do is split the strings by "." and then iterate through them until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
if (part.equals("com")) {
break;
}
++i;
}
// assumes "com" was found and is not the last part
String result = parts[i+1];
private String getStringAfterComDot(String packageName) {
String strArr[] = packageName.split("\\.");
for(int i=0; i<strArr.length; i++){
if(strArr[i].equals("com"))
return strArr[i+1];
}
return "";
}
I have done heaps of projects dealing with website scraping, and I
usually just create my own function/utils to get the job done. Regex might
be overkill sometimes if you just want to extract a substring from
a given string like the one you have. Below is the function I normally
use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter,nPos+sBefore.length()+1);
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Java using Matcher to fail when the immediate sequence is not matchable

Matcher.find finds the next subsequence, starting at a given index, which is compliant with the regex.
How can I make it so that it fails if the next character sequence is not compliant?
Ex:
String input = "123456text123";
Matcher mat1 = Pattern.compile("\\d+").matcher(input);
mat1.find();
System.out.println(mat1.group()); //123456
mat1.find(mat1.end());
System.out.println(mat1.group()); //123
I want to know if there's a way to make the second find fail, since the next sequence does not match the mat1 pattern.
I want to be able to 'compose' matchers, in such a way that they MUST always be found in sequence.
Is it possible at all?
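You can check that each match starts exactly where the previous one ended, i.e. that mat1.start() equals the previous mat1.end(). Note that end() returns the offset just past the last matched character, so no +1 is needed.
int lastEnd = 0;
while (mat1.find()) {
// Was there any junk between the last two matches?
if (mat1.start() != lastEnd) {
System.out.println("Fail.");
break;
}
System.out.println(mat1.group());
lastEnd = mat1.end();
}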
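Another way to "compose" matchers so that each one must match immediately after the previous one is to restrict the matcher with region() and use lookingAt(), which only succeeds at the start of the region. The following is a sketch under that assumption. SequentialMatch and matchesInSequence are hypothetical names, and allowing leftover input after the last pattern is a design choice made here, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SequentialMatch {
    /**
     * Applies the patterns in order. Each pattern must match exactly where
     * the previous one stopped; input left over after the last pattern is allowed.
     */
    static boolean matchesInSequence(String input, Pattern... patterns) {
        int pos = 0;
        for (Pattern p : patterns) {
            Matcher m = p.matcher(input);
            m.region(pos, input.length()); // only look at the unconsumed part
            if (!m.lookingAt()) {          // must match at the region start
                return false;
            }
            pos = m.end();                 // end() is an index into the full input
        }
        return true;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        Pattern letters = Pattern.compile("[a-z]+");
        String input = "123456text123";
        System.out.println(matchesInSequence(input, digits, letters, digits)); // true
        System.out.println(matchesInSequence(input, digits, digits));          // false
    }
}
```

With a single pattern, starting the regex with \G (the end of the previous match) has a similar effect on repeated find() calls.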
