Splitting string expression into tokens - java

My input looks like this:
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
I want the output to be:
1.33E+4
helloeeee
4
5
2
10
2
5
10
2
But I am getting this output:
1.33, 4, helloeeee, 4, 5, 2, 10, 2, 5, 10, 2
I want the number to keep its exponent after splitting, i.e. "1.33E+4" should stay a single token.
Here is my code:
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
List<String> tokensOfExpression = new ArrayList<String>();
String[] tokens=str.split("[(?!E)+*\\-/()]+");
for(String token:tokens)
{
System.out.println(token);
tokensOfExpression.add(token);
}
if(tokensOfExpression.get(0).equals(""))
{
tokensOfExpression.remove(0);
}

I would first replace the E+ with a symbol that is not ambiguous, such as
str = str.replace("E+", "SCINOT");
You can then parse with StringTokenizer, substituting the SCINOT symbol back when you need to evaluate the number represented in scientific notation.

You can't do that with a single regular expression, because of the ambiguities introduced by floating-point constants in scientific notation, and in any case you need to know which token is which without having to re-scan them. You've also mis-stated your requirement, as you certainly need the binary operators in the output as well. You need to write both a scanner and a parser. Have a look for 'recursive descent expression parser' and 'Dijkstra shunting-yard algorithm'.

Try this:
String[] tokens = str.split("(?<!E)[*\\-/()+]+");

It's easier to achieve the result with Matcher
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
Matcher m = Pattern.compile("\\d+\\.\\d*E[+-]?\\d+|\\w+").matcher(str);
while(m.find()) {
System.out.println(m.group());
}
prints
1.33E+4
helloeeee
4
5
2
10
2
5
10
2
Note that it needs some testing with different floating-point formats (for example, 2E+3 and 1.33e+4 are not kept as single tokens), but it is easily adjustable.
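The scanner idea from the answers above can be combined with the Matcher approach. The following is only a minimal sketch, assuming the expression contains nothing but decimal numbers, identifiers, the four arithmetic operators and parentheses; the class name ExpressionTokenizer and its methods are hypothetical, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionTokenizer {
    // Alternatives are tried left to right, so a number (with its optional
    // exponent such as E+4) is consumed before any single-character operator.
    private static final Pattern TOKEN = Pattern.compile(
            "\\d+(?:\\.\\d*)?(?:[eE][+-]?\\d+)?"    // 4, 1.33, 1.33E+4
            + "|\\.\\d+(?:[eE][+-]?\\d+)?"          // .5, .5e-3
            + "|[A-Za-z_]\\w*"                      // identifiers such as helloeeee
            + "|[-+*/()]");                         // operators and parentheses

    // Returns every token in order; characters matching no alternative are skipped.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Returns only numbers and identifiers, which is what the question asks for.
    static List<String> operands(String expression) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(expression)) {
            if (!token.matches("[-+*/()]")) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
        System.out.println(tokenize(str));
        System.out.println(operands(str));
    }
}
```

Keeping the operators in the token list means a later parser (recursive descent or shunting-yard) does not have to re-scan the input; operands() simply filters them out when only the values are wanted.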

Related

How to split a string into only positive and negative integers?

I'm writing a program to do different calculations with vector functions, but the program I have as of now strips the minus signs from negative numbers. I've tried using different delimiters but I can't seem to get the right one.
Does anyone know how to keep the sign of positive and negative numbers when splitting a string? Also, is there a way to keep any decimal values? Currently .45 would return 45 and .29 would return 29.
This is the code:
ArrayList<Integer> list = new ArrayList<Integer>();
String twoVectors = "a=<-1,2,-3> b=<4,-5,6>"; // vector and a and b
String[] tokens = twoVectors.split("\\D");
for (String s : tokens)
if (!s.equals(""))
list.add(Integer.parseInt(s));
System.out.println(Arrays.toString(list.toArray()));
When I run the program I get [1, 2, 3, 4, 5, 6] instead of [-1, 2, -3, 4, -5, 6]. All the functions I have work perfectly fine with positive values but don't work with negative values.
Any help would be appreciated.
You can use
String[] tokens = twoVectors.split("[^\\d-]+");
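[^\\d-]+ : matches one or more characters that are neither digits nor -
[] : character class; matches any one of the characters listed inside
^ : as the first character inside [], negates the class, so it matches any character not listed
\\d- : the digits 0-9 and the - character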
String twoVectors = "a=<-1,2,-3> b=<4,-5,6>";
ArrayList<Integer> list = new ArrayList<Integer>();
String[] tokens = twoVectors.split("[^\\d-]+");
for (String s : tokens)
if (!s.equals(""))
list.add(Integer.parseInt(s));
System.out.println(Arrays.toString(list.toArray()));
Output :
[-1, 2, -3, 4, -5, 6]
Or you can use Pattern along with Matcher to find all the desired values, i.e. signed or unsigned numbers, with the regex -?\\d+
Update: For double values, you can split on [^\\d.-]+ and make sure to use Double instead of Integer, along with Double.parseDouble.
With Pattern and Matcher, use -?\\d*\\.?\\d+
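The Pattern and Matcher variant mentioned above could look like the following sketch for the integer case. The class and method names are hypothetical; the regex -?\\d+ is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedIntegers {
    // An optional minus sign directly followed by one or more digits.
    private static final Pattern SIGNED_INT = Pattern.compile("-?\\d+");

    // Collects every signed integer found in the text, in order of appearance.
    static List<Integer> extract(String text) {
        List<Integer> values = new ArrayList<>();
        Matcher m = SIGNED_INT.matcher(text);
        while (m.find()) {
            values.add(Integer.parseInt(m.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("a=<-1,2,-3> b=<4,-5,6>"));
    }
}
```

Unlike splitting on [^\\d-]+, matching the numbers directly cannot produce a lone "-" token, so a stray hyphen in the input (as in "a-b=<7>") does not cause a NumberFormatException.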
Use [^\\d-] inside your split method i.e. twoVectors.split("[^\\d-]")
Why [^\\d-]:
^ : As the first character inside [], negates the class, so it matches any character that is not listed.
\d : Any digit from [0-9]
- : A literal '-'
The regex that you currently have splits the string on anything but digits, so every non-digit character is treated as a delimiter. If you add the - sign to the character class, anything that is neither a digit nor a - sign becomes a delimiter. This will work for some cases, but will fail if a - or . appears without a number after it.
What you need to do is to specify the number format in a regex (like -?\d*\.?\d+), and then find all matches of this pattern. You will also need to change the element type to Double so that you can parse decimal numbers.
String twoVectors = "a=<-1,.2,-3> b=<4,-5,6>";
ArrayList<Double> numbers = new ArrayList<Double>();
Matcher matcher = Pattern.compile("-?\\d*\\.?\\d+").matcher(twoVectors);
while (matcher.find()) {
numbers.add(Double.parseDouble(matcher.group()));
}
System.out.println(Arrays.toString(numbers.toArray()));
Output
[-1.0, 0.2, -3.0, 4.0, -5.0, 6.0]
A 1-line solution:
List<Integer> numbers = Arrays
.stream(twoVectors.replaceAll("^[^\\d-]+", "").split("[^\\d-]+"))
.map(Integer::valueOf)
.collect(Collectors.toList());
The initial replace is to remove the leading non-target chars (otherwise the split would return a blank in the first element).

Java Regex : 4 Letters followed by 2 Integers

Regex beginner here.
I have already visited the following, but none of them answers my question:
1, 2, 3, 4, 5, 6, etc.
I have a simple regex to check if a string contains 4 chars followed by 2 digits.
[A-Za-z]{4}[0-9]{2}
But when I use it, it doesn't match. Here is the method I use and an example of input and output:
Input in a JPasswordField
Mypass85
Output
false
Method
public static boolean checkPass(char[] ca){
String s = new String(ca);
System.out.println(s); // Prints : Mypass85
p = Pattern.compile("[A-Za-z]{4}[0-9]{2}");
return p.matcher(s).matches();
}
Matcher#matches attempts to match the entire input against the pattern. Use Matcher#find instead, which looks for a match anywhere in the input:
public static boolean checkPass(String s){
System.out.println(s); // Prints : Mypass85
Pattern p = Pattern.compile("[A-Za-z]{4}[0-9]{2}");
return p.matcher(s).find();
}
Promoting a comment to an answer.
It doesn't match because "Mypass85" is 6 letters followed by 2 digits, but your pattern expects exactly 4 letters followed by 2 digits.
You can either pass something like "Pass85" to match your existing pattern, or you can get "Mypass85" to match by changing the {4} to {6} or to {4,} (4 or more).
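The difference between the two answers can be seen side by side. This is a small sketch using the pattern from the question; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    private static final Pattern FOUR_LETTERS_TWO_DIGITS =
            Pattern.compile("[A-Za-z]{4}[0-9]{2}");

    // True only if the whole string is exactly 4 letters followed by 2 digits.
    static boolean wholeInputMatches(String s) {
        return FOUR_LETTERS_TWO_DIGITS.matcher(s).matches();
    }

    // True if the pattern occurs anywhere inside the string.
    static boolean containsMatch(String s) {
        return FOUR_LETTERS_TWO_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("Mypass85")); // false: 6 letters, not 4
        System.out.println(containsMatch("Mypass85"));     // true: "pass85" is found inside
        System.out.println(wholeInputMatches("Pass85"));   // true
    }
}
```

Which one is right depends on the intent: find() also accepts input such as "!!Mypass85!!", so for validating a whole password it may be safer to keep matches() and relax the quantifier to {4,}.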

Is there an easier way to find a certain format within a longer String?

I'm building a program and ran into a problem that I'm not sure how to solve most efficiently.
I need to write an algorithm that takes a String in this format:
12/05/2014 PROJ Assignment 4 20/20 100 4
and it will remove everything but
20/20
so I can then substring that and parse it to an integer value. This is what I've tried, but I'm not sure what the best way to do this would be. My while loop works, going from each / to the next, but it only stops when the string has 20 100 4 left. I need the 20/20, but not the 100 or 4.
String line = "12/05/2014 PROJ Assignment 4 20/20 100 4";
int slashIndex = line.indexOf("/");
String temp = line.substring((slashIndex+1));
System.out.println(temp);
while(temp.indexOf("/") != -1){
slashIndex = temp.indexOf("/");
temp = temp.substring((slashIndex+1));
System.out.println(temp);
}
If I do it the way I'm doing, I could potentially use the slashIndex of the last slash, and then make a substring from the original String- however the score may vary. It could be 20/20 or it could be 100/200, or 10/100, so how could I make the program dynamic enough to decide where to cut it up?
Any thoughts or ideas would be great, thanks.
Connor
Just split the input on one or more whitespace characters (\\s+). The fifth field is then at index 4 of the resulting array.
String t = "12/05/2014 PROJ Assignment 4 20/20 100 4";
String[] parts = t.split("\\s+");
System.out.println(parts[4]);
Output:
20/20
Try this:
str = str.replaceAll(".* (\\d+/\\d+) .*", "$1");
String line = "12/05/2014 PROJ Assignment 4 20/20 100 4";
String pattern = ".* ([0-9]{1,3}/[0-9]{1,3}) .*";
String numbers = line.replaceAll(pattern, "$1");
System.out.println(numbers);
If you want to do it with a regex, this one also ensures that the input string has exactly the expected format.
I created a regexplain entry that explains the regex.
Pattern mypat = Pattern.compile("\\d{2}/\\d{2}/\\d{4} [A-Z]{4} \\w+ \\d? (\\d+/\\d+) \\d+ \\d?");
// ...
Matcher m = mypat.matcher("12/05/2014 PROJ Assignment 4 20/20 100 4");
if (m.matches()) {
String value = m.group(1);
}
Create a regex that matches the string you want (you may find something more sophisticated for the actual regex; I am not a regex expert), for example:
Pattern pattern = Pattern.compile("([2][0]\\/[2][0])");
then create a matcher using the pattern
Matcher m = pattern.matcher("12/05/2014 PROJ Assignment 4 20/20 100 4");
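and finally, if m.find() returns true, get the first captured group (m.matches() would return false here, because it requires the entire input to match the pattern):
m.group(1)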
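Putting the split-based answer together with the parsing step the question asks about might look like this. It is a sketch that assumes the score is always the fifth whitespace-separated field, as in the sample line; the class and method names are hypothetical.

```java
class ScoreParser {
    // Returns {earned, possible} parsed from the fifth whitespace-separated field.
    // Assumes the layout: date, code, two-word assignment name, score, two trailing numbers.
    static int[] parseScore(String line) {
        String[] fields = line.trim().split("\\s+");
        String[] score = fields[4].split("/");
        return new int[] { Integer.parseInt(score[0]), Integer.parseInt(score[1]) };
    }

    public static void main(String[] args) {
        int[] score = parseScore("12/05/2014 PROJ Assignment 4 20/20 100 4");
        System.out.println(score[0] + " out of " + score[1]);
    }
}
```

Because the field is located by position and then split on "/", the number of digits in the score does not matter, so 100/200 or 10/100 work the same way. If the assignment name can contain a different number of words, indexing from the end (fields[fields.length - 3]) may be the safer choice.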

Which regex to use?

I have expressions like:
-3-5
or -3--5
or 3-5
or 3-+5
or -3-+5
I need to extract the numbers, splitting on the "-" sign between them, i.e. in the above cases I would need
-3 and 5, -3 and -5, 3 and 5, 3 and +5, -3 and +5.
I have tried using this:
String s[] = str.split("[+-]?\\d+\\-[+-]?\\d+");
int len = s.length;
for(int i=0;i<len;i++)System.out.println(s[i]);
but it's not working
Try to split with this regular expression:
str.split("\\b-")
The word boundary \b in front of the - can only match directly after a digit here, so \b- matches only the - that acts as the separator, never a leading sign:
-3-5, -3--5 , 3-5,3-+5,-3-+5
  ^     ^      ^   ^     ^
Crossposted to forums.sun.com.
This is not a job for REs by themselves. You need a scanner to return operators and numbers, and an expression parser. Consider -3-------5.
Your expression is pretty much OK. split is not the right tool, though, since you are trying to match the expression against your string, not split the string on it.
Here is some code that uses your expression to obtain what you want:
String a = "-90--80";
Pattern x = Pattern.compile("([+-]?\\d+)\\-([+-]?\\d+)");
Matcher m = x.matcher(a);
if(m.find()){
System.out.println(m.group(1));
System.out.println(m.group(2));
}
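For comparison, the \\b- split suggested earlier can be checked against all five inputs from the question. This is a sketch with hypothetical class and method names.

```java
import java.util.Arrays;

class RangeSplit {
    // Splits on a '-' that directly follows a word character (here: a digit),
    // so a leading sign or the sign of the second number is left untouched.
    static String[] splitRange(String s) {
        return s.split("\\b-");
    }

    public static void main(String[] args) {
        for (String s : new String[] { "-3-5", "-3--5", "3-5", "3-+5", "-3-+5" }) {
            System.out.println(s + " -> " + Arrays.toString(splitRange(s)));
        }
    }
}
```

Both parts can be passed to Integer.parseInt, which accepts a leading - and, since Java 7, a leading +. Input such as -3-------5 still splits into two parts, but the second one ("------5") is not a valid number, which is the kind of case the scanner and parser remark above is about.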

Regex to find an integer within a string

I'd like to use regex with Java.
What I want to do is find the first integer in a string.
Example:
String = "the 14 dogs ate 12 bones"
Would return 14.
String = "djakld;asjl14ajdka;sdj"
Would also return 14.
This is what I have so far.
Pattern intsOnly = Pattern.compile("\\d*");
Matcher makeMatch = intsOnly.matcher("dadsad14 dssaf jfdkasl;fj");
makeMatch.find();
String inputInt = makeMatch.group();
System.out.println(inputInt);
What am I doing wrong?
You're asking for 0 or more digits, so the first match is the empty string at the very start of the input. You need to ask for 1 or more:
"\\d+"
It looks like the other solutions failed to handle +/- and cases like 2e3, so I'll take my go at the problem. (Note that java.lang.Integer.parseInt(String) accepts a leading sign but not an exponent; a match such as 2e3 has to be converted with Double.parseDouble or similar.) I'm somewhat inexperienced at regex, so I may have made a few mistakes, used something that Java's regex parser doesn't support, or made it overly complicated, but the statements seemed to work in Kiki 0.5.6.
All regular expressions are provided in both an unescaped format for reading, and an escaped format that you can use as a string literal in Java.
To get a byte, short, int, or long from a string:
unescaped: ([\+-]?\d+)([eE][\+-]?\d+)?
escaped: ([\\+-]?\\d+)([eE][\\+-]?\\d+)?
...and for bonus points...
To get a double or float from a string:
unescaped: [\+-]?(\d+(\.\d*)?|\.\d+)([eE][\+-]?\d+)?
escaped: [\\+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][\\+-]?\\d+)?
Use one of them:
Pattern intsOnly = Pattern.compile("[0-9]+");
or
Pattern intsOnly = Pattern.compile("\\d+");
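With \\d+ in place, finding the first integer could be wrapped up as follows. This is a minimal sketch; the class and method names are hypothetical, and it assumes the digit run fits into an int.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstInteger {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits as an int, or empty if the text has none.
    // Integer.parseInt throws NumberFormatException if the run overflows an int.
    static OptionalInt firstInteger(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group()))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstInteger("the 14 dogs ate 12 bones"));
        System.out.println(firstInteger("djakld;asjl14ajdka;sdj"));

        // For comparison: \d* succeeds immediately with an empty match at index 0.
        Matcher zeroOrMore = Pattern.compile("\\d*").matcher("dadsad14 dssaf jfdkasl;fj");
        zeroOrMore.find();
        System.out.println("[" + zeroOrMore.group() + "]");
    }
}
```

Checking the result of find() before calling group() also avoids the IllegalStateException that group() throws when no match was found.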
Here's a handy one I made for C# with generics. It will match based on your regular expression and return the types you need:
public T[] GetMatches<T>(string Input, string MatchPattern) where T : IConvertible
{
List<T> MatchedValues = new List<T>();
Regex MatchInt = new Regex(MatchPattern);
MatchCollection Matches = MatchInt.Matches(Input);
foreach (Match m in Matches)
MatchedValues.Add((T)Convert.ChangeType(m.Value, typeof(T)));
return MatchedValues.ToArray<T>();
}
Then, if you wanted to grab only the numbers and return them in a string[] array:
string Test = "22$data44abc";
string[] Matches = this.GetMatches<string>(Test, "\\d+");
Hopefully this is useful to someone...
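In addition to what PiPeep said, if you are trying to match integers within an expression, so that 1 + 2 - 3 will only match 1, 2, and 3, rather than 1, + 2 and - 3, a plain pattern is not enough. The patterns below use a conditional construct, (?(1)...|...), rather than a lookbehind, and the part you want will be returned by Matcher.group(2) rather than just Matcher.group(). Be aware that java.util.regex does not support this conditional syntax, so these patterns compile only in engines that do, such as PCRE or .NET.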
unescaped: ([0-9])?((?(1)(?:[\+-]?\d+)|)(?:[eE][\+-]?\d+)?)
escaped: ([0-9])?((?(1)(?:[\\+-]?\\d+)|)(?:[eE][\\+-]?\\d+)?)
Also, for things like someNumber - 3, where someNumber is a variable name or something like that, you can use
unescaped: (\w)?((?(1)(?:[\+-]?\d+)|)(?:[eE][\+-]?\d+)?)
escaped: (\\w)?((?(1)(?:[\\+-]?\\d+)|)(?:[eE][\\+-]?\\d+)?)
Although of course that won't work if you are parsing a string like The net change to blahblah was +4
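Since Pattern.compile rejects the conditional syntax, a Java-compatible way to get a similar effect is a negative lookbehind: a sign is treated as part of the number only when it does not directly follow an operand. This is a swapped-in technique, not a translation of the patterns above, it covers integers only, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionIntegers {
    // Optional sign, accepted only if the character before it is not a word
    // character or ')'; then digits that start at a word boundary, so digits
    // inside identifiers such as x2 are ignored.
    private static final Pattern INT_IN_EXPRESSION =
            Pattern.compile("(?:(?<![\\w)])[+-])?\\b\\d+");

    static List<Integer> extract(String expression) {
        List<Integer> values = new ArrayList<>();
        Matcher m = INT_IN_EXPRESSION.matcher(expression);
        while (m.find()) {
            values.add(Integer.parseInt(m.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("1 + 2 - 3"));
        System.out.println(extract("-3--5"));
        System.out.println(extract("someNumber - 3"));
    }
}
```

A sign separated from its number by whitespace is treated as an operator, so "1 + 2 - 3" yields 1, 2 and 3, while "-3--5" yields -3 and -5. A sign that follows plain text and a space, as in "was +4", is kept as part of the number.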
The Javadoc for Double.valueOf(String) actually gives this monster of a regex for validating doubles.
Although it is sometimes considered bad practice, simply trying to parse with the intended type and catching the error tends to be slightly more readable.
DOUBLE_PATTERN = Pattern
.compile("[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)"
+ "([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|"
+ "(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))"
+ "[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");
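The parse-and-catch alternative mentioned above could be sketched like this. The class and method names are hypothetical; Double.parseDouble and NumberFormatException are the standard library pieces it relies on.

```java
import java.util.OptionalDouble;

class SafeParse {
    // Tries to parse the text as a double; returns empty instead of throwing
    // when the text is null or not a valid floating-point literal.
    static OptionalDouble tryParseDouble(String text) {
        if (text == null) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(text));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseDouble("1.33E+4"));
        System.out.println(tryParseDouble("2e3"));
        System.out.println(tryParseDouble("helloeeee"));
    }
}
```

This accepts exactly what Double.parseDouble accepts, so there is no risk of the regex and the parser disagreeing about what counts as a number.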
