Get a substring of a string made of xCharsxInts - java

I have a list of constants:
public static final String INSTANCE_PREFIX = "in";
public static final String INDICATOR_PREFIX = "i";
public static final String MODEL_PREFIX = "m";
...
They vary in length, and each one is put in front of a number to form a variable's id; for example, in30, i2 or m4353. I am trying to make the method as general as possible so that it handles any number of letters followed by any number of digits. The letters are always going to be one of the prefixes in my Constants.java, so I know that much, but the method won't know which combination it is working with.
I just want the number attached to the end. For example, I want to pass in the m4353 from above and just get back the 4353. Whether it uses the constants file or not is not relevant, but I include them as they may be useful for some approach.

It seems to me that you don't care about the prefixes at all, so I have ignored them in the first half of this answer. If you do care about the prefixes, see the second half of this answer.
This code uses regular expressions to extract the trailing numbers at the end of a string.
() represents a capturing group (used by m.group(1)).
[0-9]+ represents a run of one or more digits.
$ represents the end of the string, guaranteeing that only the digits at the very end are matched.
Here is the code:
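import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One or more digits anchored at the end of the string
private static final Pattern p = Pattern.compile("([0-9]+)$");

public static int extractNumber(String value) {
    Matcher m = p.matcher(value);
    if (m.find()) {
        // Throws NumberFormatException if the digits do not fit in an int
        return Integer.parseInt(m.group(1));
    } else {
        return Integer.MIN_VALUE; // error code: no trailing digits
    }
}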
If you want to capture the prefix, you could use Pattern.compile("^([a-z]+)([0-9]+)$") instead.
Note that the numbers are now the second group, so they would be captured in m.group(2), and the prefix would be captured in m.group(1).
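A minimal sketch of that prefix-capturing variant, assuming the prefixes are always lowercase ASCII letters as in the constants shown in the question; the class PrefixedId and its split method are hypothetical names. Because [a-z]+ is greedy and the digits must follow immediately, in30 yields the prefix in rather than i.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixedId {
    // Group 1: the letter prefix; group 2: the trailing digits.
    private static final Pattern ID = Pattern.compile("^([a-z]+)([0-9]+)$");

    // Returns {prefix, digits}, or null if the value is not letters followed by digits.
    public static String[] split(String value) {
        Matcher m = ID.matcher(value);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String id : new String[] { "in30", "i2", "m4353", "x" }) {
            String[] parts = split(id);
            if (parts == null) {
                System.out.println(id + " -> no match");
            } else {
                int number = Integer.parseInt(parts[1]);
                System.out.println(id + " -> prefix " + parts[0] + ", number " + number);
            }
        }
    }
}
```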

Try the String replaceAll method
For example:
String x = "prefix1111111";
x = x.replaceAll("\\D", "");
int justNum = Integer.parseInt(x);
where "\\D" matches any non-digit character, so this deletes every non-digit from your string.
Note that you might want to use Long.parseLong or Double.parseDouble and the corresponding primitive type instead if your numbers can be longer than 9 digits, as a Java int can only hold values up to 2147483647.
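A short sketch combining replaceAll with Long.parseLong, in case the ids can exceed the int range; the class and method names are illustrative only. Note that this deletes non-digits anywhere in the string, so a value such as a1b2 becomes 12, and a value with no digits at all makes parseLong throw a NumberFormatException.

```java
public class DigitsOnly {
    // Deletes every non-digit character and parses what is left as a long.
    // Caveat: digits from anywhere in the string are concatenated.
    public static long digitsOf(String value) {
        String digits = value.replaceAll("\\D", "");
        return Long.parseLong(digits); // throws NumberFormatException if digits is empty
    }

    public static void main(String[] args) {
        System.out.println(digitsOf("m4353"));        // 4353
        System.out.println(digitsOf("in9876543210")); // too large for int, fine for long
        System.out.println(digitsOf("a1b2"));         // 12
    }
}
```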


Regular Expression Parse Double

I am new to regular expressions. I want to search for NUMBER(19, 4) and have the method return the value (in this case 19,4). But I always get 0 as the result!
int length =0;
length = patternLength(datatype,"^NUMBER\\((\\d+)\\,\\s*\\)$","NUMBER");
private static double patternLengthD(String datatype, String patternString, String startsWith) {
double length=0;
if (datatype.startsWith(startsWith)) {
Pattern patternA = Pattern.compile(patternString);
Matcher matcherA = patternA.matcher(datatype);
if (matcherA.find()) {
length = Double.parseDouble(matcherA.group(1));
}
}
return length;
}
You are missing the matching of digits after the comma.
You also don't need to escape the ,.
Use this:
"^NUMBER\\((\\d+),\\s*(\\d+)\\)$"
This will give you the first number in group(1) and the second number in group(2).
It is however fairly strict on spaces, so you can be more lenient and match on values like " NUMBER ( 19 , 4 ) " by using this:
"^\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)\\s*$"
In that case you'll have to drop your startsWith and just use the regex directly. Also, you can remove the anchors (^$) if you change find() to matches().
Since NUMBER(19) is usually allowed too, you can make the second value optional:
"\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*"
group(2) will then return null if the second number is not given.
See regex101 for a demo.
Note that your code doesn't compile.
Your method returns a double, but length is an int.
Although 19,4 looks like a floating point number, it is not, and representing it as such is wrong.
You should store the two values separately.
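Following the advice to store the two values separately, here is a sketch that applies the lenient pattern with the optional scale and returns both numbers. The NumberType class is a hypothetical holder, and representing a missing scale as null is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberType {
    // Lenient pattern with an optional second value; no anchors because matches() is used.
    private static final Pattern NUMBER = Pattern.compile(
            "\\s*NUMBER\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*");

    public final int precision;
    public final Integer scale; // null when no scale was given, e.g. NUMBER(19)

    public NumberType(int precision, Integer scale) {
        this.precision = precision;
        this.scale = scale;
    }

    // Returns null if the datatype string is not a NUMBER(...) declaration.
    public static NumberType parse(String datatype) {
        Matcher m = NUMBER.matcher(datatype);
        if (!m.matches()) { // matches() must consume the whole string
            return null;
        }
        Integer scale = m.group(2) == null ? null : Integer.valueOf(m.group(2));
        return new NumberType(Integer.parseInt(m.group(1)), scale);
    }

    public static void main(String[] args) {
        NumberType a = parse("NUMBER(19, 4)");
        System.out.println(a.precision + " " + a.scale); // 19 4
        NumberType b = parse(" NUMBER ( 19 ) ");
        System.out.println(b.precision + " " + b.scale); // 19 null
    }
}
```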

Java String- How to get a part of package name in android?

It's basically about getting the string value between two characters. SO has many related questions, such as:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when dealing with multiple dots in the string and getting the value between two particular dots.
I have the package name:
au.com.newline.myact
I need to get the value between "com." and the next dot (.), in this case "newline". I tried:
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I also tried substring and indexOf, but couldn't get the intended result. Because Android package names vary in the number of dots and characters, I cannot use a fixed index. Please suggest any ideas.
As you probably know (based on the .* part in your regex), the dot . is a special character in regular expressions representing any character (except line separators). To make a dot match only a literal dot, you need to escape it: place \ before it (written "\\." in a Java string literal), or place it inside the character class [.].
Also, to get only the part matched by the parentheses (.*), you need to select it with the proper group index, which in your case is 1.
So try:
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want to get only the element up to the next ., you need to change the greedy behaviour of the * quantifier in .* to reluctant by adding ? after it:
Pattern pattern = Pattern.compile("com[.](.*?)[.]"); // note the ? after .*
Another approach is to accept only non-dot characters instead of .*; they can be represented by the negated character class [^.]*
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
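To see the difference between the three patterns, here is a small sketch that prints group 1 of the first match for each of them on a longer package name. The helper firstGroup is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PackageSegment {
    // Returns group 1 of the first match, or null if the regex does not match.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "au.com.newline.myact.modelact";
        System.out.println(firstGroup("com[.](.*)[.]", name));    // greedy: newline.myact
        System.out.println(firstGroup("com[.](.*?)[.]", name));   // reluctant: newline
        System.out.println(firstGroup("com[.]([^.]*)[.]", name)); // negated class: newline
    }
}
```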
If you don't want to use regex you can simply use indexOf method to locate positions of com. and next . after it. Then you can simply substring what you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip the "com." part
int end = beforeTask.indexOf(".", start); // find the next "." after the start index
String result = beforeTask.substring(start, end);
System.out.println(result);
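The indexOf snippet assumes both markers are present: if "com." is missing, indexOf returns -1 and start silently becomes 3, and if no dot follows, substring throws. A guarded sketch; the method name and the choice to return the rest of the string when there is no further dot are assumptions. Note that indexOf("com.") also matches inside a segment such as telecom.

```java
public class SegmentAfterCom {
    // Returns the text between "com." and the next '.', or null when "com." is missing.
    public static String segmentAfterCom(String packageName) {
        int marker = packageName.indexOf("com.");
        if (marker < 0) {
            return null;
        }
        int start = marker + "com.".length();
        int end = packageName.indexOf('.', start);
        // With no further dot, take everything up to the end of the string.
        return end < 0 ? packageName.substring(start) : packageName.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(segmentAfterCom("au.com.newline.myact.modelact")); // newline
        System.out.println(segmentAfterCom("au.com.newline"));                // newline
        System.out.println(segmentAfterCom("org.example.app"));               // null
    }
}
```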
You can use reflection to get the name of any class. For example:
If I have a class Runner in com.some.package, I can run
Runner.class.getName() // string is "com.some.package.Runner"
to get the fully qualified name of the class, which includes the package name.
To get something after "com", you can use Runner.class.getName().split("\\.") and then iterate over the returned array with a boolean flag.
All you have to do is split the strings by "." and then iterate through them until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
String result = parts[i + 1]; // throws if "com" is missing or is the last segment
private String getStringAfterComDot(String packageName) {
    String[] strArr = packageName.split("\\.");
    for (int i = 0; i < strArr.length; i++) {
        // Guard against "com" being the last segment
        if (strArr[i].equals("com") && i + 1 < strArr.length)
            return strArr[i + 1];
    }
    return "";
}
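A more compact sketch of the same split-based idea, using List.indexOf instead of a manual loop. Like the split-based answers, it compares whole dot-separated segments, so a segment such as telecom is not mistaken for com. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SegmentAfterComSplit {
    // Splits on literal dots and returns the segment right after "com",
    // or an empty string if "com" is missing or is the last segment.
    public static String segmentAfterCom(String packageName) {
        List<String> parts = Arrays.asList(packageName.split("\\."));
        int i = parts.indexOf("com");
        return (i >= 0 && i + 1 < parts.size()) ? parts.get(i + 1) : "";
    }

    public static void main(String[] args) {
        System.out.println(segmentAfterCom("au.com.newline.myact")); // newline
        System.out.println(segmentAfterCom("au.com").isEmpty());     // true
    }
}
```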
I have done heaps of website-scraping projects before, and I usually just create my own functions/utils to get the job done. Regex can be overkill if you just want to extract a substring from a given string like the one you have. Below is the function I normally use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
    String sRetValue = "";
    int nPos = sText.indexOf(sBefore);
    if ( nPos > -1 )
    {
        // Look for sAfter starting immediately after sBefore
        int nLast = sText.indexOf(sAfter, nPos + sBefore.length());
        if ( nLast > -1 )
        {
            sRetValue = sText.substring(nPos + sBefore.length(), nLast);
        }
    }
    return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Java better way to code this simple parameter intake?

"Employee identification number (a string) in the format XXX-L, where each X is a digit within the range 0-9 and the L is a letter within the range ‘A’-‘M’ (both lowercase and uppercase letters are acceptable)"
The above is a field which will be an argument for the constructor. Right now, I'm planning on making sure that the first 3 characters of the string are digits between 0-9, then that there is a dash at the 4th position, and then that there is a letter between A-M at the 5th position, all using if-else statements. Is there a better way of doing this, for example one where the input doesn't have to be so exact and the program is able to fix it by itself? Thank you.
I coded it using a regular expression:
import java.util.regex.*;
public class Employee {
private String eName;
private String IDNumber;
public Employee(String name, String number) {
String regex = "[0-9][0-9][0-9][\\-][a-mA-M]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(number);
this.eName = name;
if(matcher.matches()) {
this.IDNumber = number;
} else{
this.IDNumber = "999-M";
}
}
public String getNumber() {
System.out.println(IDNumber);
return IDNumber;
}
public static void main(String[] args) {
Employee e = new Employee("John", "123-f");
e.getNumber();
Employee c = new Employee("Jane","25z");
c.getNumber();
}
}
I haven't tested it thoroughly, but it works. However, compared with other people's regular expressions, mine looks very newbish. I was wondering if someone could help me construct a shorter or better regex.
^\\d{3}-[a-mA-M]$
This should be an improvement I think.
^ means the start of the text
\d means any digit (in a Java string literal the backslash itself needs to be escaped, hence \\d)
{3} means the previous element exactly 3 times
the hyphen is a literal, as long as it isn't inside square brackets
[a-mA-M] means any upper or lower case letter from a to m, as you knew
$ means the end of the text
I tested it on regexpal.com.
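On the question of letting the program fix the input by itself, one option is to normalize before validating. The sketch below trims whitespace and upper-cases the input (using Locale.ROOT so the letter i is not turned into a dotted capital under a Turkish default locale) and then applies the improved regex with matches(); since matches() must consume the whole string, ^ and $ can be dropped there. The normalize method and its null-on-failure convention are assumptions, not part of the question.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class EmployeeIdCheck {
    // Three digits, a hyphen, then one letter from A to M in either case.
    private static final Pattern ID = Pattern.compile("\\d{3}-[a-mA-M]");

    // Hypothetical lenient check: trims whitespace and upper-cases the letter,
    // then validates. Returns the normalized id, or null if it is still invalid.
    public static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String candidate = raw.trim().toUpperCase(Locale.ROOT);
        // matches() must consume the whole string, so no ^ or $ is needed.
        return ID.matcher(candidate).matches() ? candidate : null;
    }

    public static void main(String[] args) {
        System.out.println(normalize("123-f"));   // 123-F
        System.out.println(normalize(" 456-b ")); // 456-B
        System.out.println(normalize("25z"));     // null
    }
}
```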

Length of specific substring

I check if my string begins with number using
if (Regex.IsMatch(myString, @"\d+")) ...
If this condition holds I want to get the length of this "numeric" substring that my string begins with.
I can find the length checking if every next character is a digit beginning from the first one and increasing some counter. Is there any better way to do this?
Well instead of using IsMatch, you should find the match:
// Presumably you'll be using the same regular expression every time, so
// we might as well just create it once...
private static readonly Regex Digits = new Regex(@"\d+");
...
Match match = Digits.Match(text);
if (match.Success)
{
string value = match.Value;
// Take the length or whatever
}
Note that this doesn't check that the digits occur at the start of the string. You could do that using @"^\d+", which anchors the match to the beginning. Or you could check that match.Index was 0 if you wanted...
To check whether your string begins with a number, you need to use the pattern ^\d+.
string pattern = @"^\d+";
MatchCollection mc = Regex.Matches(myString, pattern);
if(mc.Count > 0)
{
Console.WriteLine(mc[0].Value.Length);
}
Your regex checks whether your string contains a sequence of one or more digits. If you want to check that it starts with one, you need to anchor it at the beginning:
Match m = Regex.Match(myString, @"^\d+");
if (m.Success)
{
int length = m.Length;
}
As an alternative to a regular expression, you can use extension methods:
int cnt = myString.TakeWhile(Char.IsDigit).Count();
If there are no digits in the beginning of the string you will naturally get a zero count. Otherwise you have the number of digits.
Instead of just checking IsMatch, get the match so you can get info about it, like the length:
var match = Regex.Match(myString, @"^\d+");
if (match.Success)
{
int count = match.Length;
}
Also, I added a ^ to the beginning of your pattern to limit it to the beginning of the string.
If you break out your code a bit more, you can take advantage of Regex.Match:
var length = 0;
var myString = "123432nonNumeric";
var match = Regex.Match(myString, @"\d+");
if(match.Success)
{
length = match.Value.Length;
}
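For comparison, the same check in Java (the language of the other threads on this page) can use Matcher.lookingAt(), which anchors the match at the start of the input, much like ^, without requiring the whole string to match. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingDigits {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // lookingAt() only succeeds if the match starts at index 0,
    // so m.end() is the length of the leading digit run.
    public static int leadingDigitCount(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        System.out.println(leadingDigitCount("123432nonNumeric")); // 6
        System.out.println(leadingDigitCount("abc123"));           // 0
    }
}
```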

Regular Expression problem in Java

I am trying to create a regular expression for the replaceAll method in Java. The test string is abXYabcXYZ and the pattern is abc. I want to replace every character except those of the pattern with +. For example, for the string abXYabcXYZ and the pattern abc I expect ++++abc+++, but with the regex [^(abc)] I get ab++abc+++.
public static String plusOut(String str, String pattern) {
pattern= "[^("+pattern+")]" + "".toLowerCase();
return str.toLowerCase().replaceAll(pattern, "+");
}
public static void main(String[] args) {
String text = "abXYabcXYZ";
String pattern = "abc";
System.out.println(plusOut(text, pattern));
}
When I try to replace the pattern with + there is no problem - abXYabcXYZ with pattern (abc) returns abxy+xyz. Pattern (^(abc)) returns the string without replacement.
Is there any other way to write NOT(regex) or group symbols as a word?
What you are trying to achieve is pretty tough with regular expressions, since there is no way to express “replace strings not matching a pattern”. You will have to use a “positive” pattern, telling what to match instead of what not to match.
Furthermore, you want to replace every character with a replacement character, so you have to make sure that your pattern matches exactly one character. Otherwise, you will replace whole strings with a single character, returning a shorter string.
For your toy example, you can use negative lookaheads and lookbehinds to achieve the task, but this may be more difficult for real-world examples with longer or more complex strings, since you will have to consider each character of your string separately, along with its context.
Here is the pattern for “not ‘abc’”:
[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c
It consists of five sub-patterns, connected with “or” (|), each matching exactly one character:
[^abc] matches every character except a, b or c
a(?!bc) matches a if it is not followed by bc
(?<!a)b matches b if it is not preceded by a
b(?!c) matches b if it is not followed by c
(?<!ab)c matches c if it is not preceded by ab
The idea is to match every character that is not in your target word abc, plus every word character that, according to the context, is not part of your word. The context can be examined using negative lookaheads (?!...) and lookbehinds (?<!...).
You can imagine that this technique will fail once you have a target word containing one character more than once, like example. It is pretty hard to express “match e if it is not followed by x and not preceded by l”.
Especially for dynamic patterns, it is by far easier to do a positive search and then replace every character that did not match in a second pass, as others have suggested here.
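Here is a quick sketch applying the five-part pattern with replaceAll to the example from the question. The class NotAbc is just a wrapper for illustration. The pattern is case-sensitive, so the uppercase X, Y and Z are replaced as ordinary characters.

```java
public class NotAbc {
    // Five alternatives, each matching exactly one character that is not part of "abc".
    private static final String NOT_ABC = "[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c";

    public static String plusOut(String text) {
        return text.replaceAll(NOT_ABC, "+");
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ")); // ++++abc+++
    }
}
```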
[^ ... ] will match one character that is not any of ...
So your pattern "[^(abc)]" is saying "match one character that is not a, b, c, or the left or right parenthesis"; and indeed that is what happens in your test.
It is hard to say "replace all characters that are not part of the string 'abc'" in a single trivial regular expression. What you might do instead to achieve what you want could be some nasty thing like
while the input string still contains "abc"
    find the next occurrence of "abc"
    append to the output a string containing as many "+"s as there are characters before the "abc"
    append "abc" to the output string
    skip, in the input string, to a position just after the "abc" found
append to the output a string containing as many "+"s as there are characters left in the input
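The loop described above could be sketched in Java as follows, using indexOf to find each occurrence. plusOut mirrors the question's method name; rejecting an empty word (which would otherwise loop forever) is an added assumption.

```java
public class PlusOutLoop {
    // Keeps every occurrence of word and replaces every other character with '+'.
    public static String plusOut(String input, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int next;
        // While the rest of the input still contains the word:
        while ((next = input.indexOf(word, pos)) >= 0) {
            // one '+' per character before this occurrence,
            for (int i = pos; i < next; i++) {
                out.append('+');
            }
            // then the word itself, then skip past it.
            out.append(word);
            pos = next + word.length();
        }
        // One '+' per character left after the last occurrence.
        for (int i = pos; i < input.length(); i++) {
            out.append('+');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ", "abc")); // ++++abc+++
    }
}
```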
or possibly if the input alphabet is restricted you could use regular expressions to do something like
replace all occurrences of "abc" with a single character that does not occur anywhere in the existing string
replace all other characters with "+"
replace all occurrences of the target character with "abc"
This will be more readable, but may not perform as well.
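A sketch of the placeholder variant, assuming the NUL character (\0) never occurs in the input; the code checks that assumption and throws otherwise. String.replace works on literal text, so the word needs no regex escaping. The class name is hypothetical.

```java
public class PlusOutPlaceholder {
    // Assumption: this character never appears in the input.
    private static final String PLACEHOLDER = "\0";

    public static String plusOut(String input, String word) {
        if (input.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("input already contains the placeholder");
        }
        // 1. Replace each occurrence of the word with the placeholder.
        String marked = input.replace(word, PLACEHOLDER);
        // 2. Replace every other character (including line breaks) with '+'.
        String plussed = marked.replaceAll("[^\\x00]", "+");
        // 3. Turn the placeholders back into the word.
        return plussed.replace(PLACEHOLDER, word);
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ", "abc")); // ++++abc+++
    }
}
```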
Negating regexps is usually troublesome. I think you might want to use negative lookahead. Something like this might work:
String pattern = "(?<!ab).(?!abc)";
I didn't test it, so it may not really work for degenerate cases. And the performance might be horrible too. It is probably better to use a multistep algorithm.
Edit: No I think this won't work for every case. You will probably spend more time debugging a regexp like this than doing it algorithmically with some extra code.
Try to solve it without regular expressions:
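// text is the input string, pattern the word to keep
static String plusOut(String text, String pattern) {
    StringBuilder out = new StringBuilder();
    int i;
    // Stop once fewer than pattern.length() characters remain
    for (i = 0; i < text.length() - pattern.length() + 1; ) {
        if (text.substring(i, i + pattern.length()).equals(pattern)) {
            out.append(pattern);
            i += pattern.length();
        } else {
            out.append('+');
            i++;
        }
    }
    // Replace the leftover characters at the end
    for (; i < text.length(); i++) {
        out.append('+');
    }
    return out.toString();
}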
Rather than a single replaceAll, you could always try something like:
@Test
public void testString() {
    final String in = "abXYabcXYabcHIH";
    final String expected = "xxxxabcxxabcxxx";
    String result = replaceUnwanted(in);
    assertEquals(expected, result);
}

private String replaceUnwanted(final String in) {
    final Pattern p = Pattern.compile("(.*?)(abc)([^a]*)");
    final Matcher m = p.matcher(in);
    final StringBuilder out = new StringBuilder();
    int last = 0;
    while (m.find()) {
        out.append(m.group(1).replaceAll(".", "x"));
        out.append(m.group(2));
        out.append(m.group(3).replaceAll(".", "x"));
        last = m.end();
    }
    // Mask whatever follows the last match (e.g. text after an "a" that [^a]* stopped at)
    out.append(in.substring(last).replaceAll(".", "x"));
    return out.toString();
}
Instead of using replaceAll(...), I'd go for a Pattern/Matcher approach:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static String plusOut(String str, String pattern) {
StringBuilder builder = new StringBuilder();
String regex = String.format("((?:(?!%s).)++)|%s", pattern, pattern);
Matcher m = Pattern.compile(regex).matcher(str.toLowerCase());
while(m.find()) {
builder.append(m.group(1) == null ? pattern : m.group().replaceAll(".", "+"));
}
return builder.toString();
}
public static void main(String[] args) {
String text = "abXYabcXYZ";
String pattern = "abc";
System.out.println(plusOut(text, pattern));
}
}
Note that you'll need to use Pattern.quote(...) if your String pattern contains regex meta-characters.
Edit: I didn't see a Pattern/Matcher approach was already suggested by toolkit (although slightly different)...
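A sketch of the Pattern.quote variant mentioned in the note, so that a word such as a.c is matched literally rather than as a regex. Unlike the answer's code it does not lower-case the input, and it compiles with DOTALL so line breaks are also turned into +; both are choices made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlusOutQuoted {
    public static String plusOut(String str, String word) {
        // Pattern.quote makes regex meta-characters in word (such as '.') literal.
        String q = Pattern.quote(word);
        // Group 1: a run of characters that does not start an occurrence of word;
        // otherwise the second alternative matches the word itself.
        String regex = "((?:(?!" + q + ").)++)|" + q;
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(str);
        StringBuilder builder = new StringBuilder();
        while (m.find()) {
            // group(1) is null when the word alternative matched.
            builder.append(m.group(1) == null ? word : m.group(1).replaceAll("(?s).", "+"));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ", "abc")); // ++++abc+++
        System.out.println(plusOut("abcxa.c", "a.c"));    // ++++a.c
    }
}
```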
