Split a Java String - java

I'm having a little trouble with Java regexes. I have a string like this
a + 4 * log(3/abs(1 - x)) + sen(-b/4 + PI)
and I need to split it into the following tokens:
{"a", "+", "4", "*", "log", "(3/abs(1 - x))", "+", "sen", "(-b/4 + PI)"}
Any idea?
I tried this PHP regex, but for some reason it won't work in Java
[a-z]+(\((?>[^()]+|(?1))*\))|[a-z]+|\d+|\/|\-|\*|\+

Match All vs Splitting
Matching and splitting are two sides of the same coin. This is quite tricky because Java's regex engine does not support recursion and the input has nested parentheses. But this should do the trick:
Java
\(.*?\)(?![^(]*\))|[^\s(]+
To iterate over all the matches:
Pattern regex = Pattern.compile("\\(.*?\\)(?![^(]*\\))|[^\\s(]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// the match: regexMatcher.group()
}
Explanation
\(.*?\)(?![^(]*\)) matches an opening parenthesis and everything up to the first closing parenthesis that is not followed by another closing parenthesis without an opening parenthesis in between. This works for the (simple(nesting)) in your expression, but would not work for (this(kind)of(nesting)) (see the PHP solution).
| OR...
[^\s(]+ any chars that are not spaces or an opening par
PHP Option with Recursion
In PHP, we can use recursion to match the nested constructs more precisely (this overcomes the Java problem with (this(kind)of(nesting))):
(\((?:[^()]++|(?1))*\))|[^\s(]+
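Since Java's regex engine has no recursion, arbitrarily nested parentheses can instead be handled with a plain depth counter. The following is only a sketch under assumptions: the class name NestedParenTokenizer and its tokenize method are made up for illustration, identifiers and numbers are taken to be runs of letters and digits, and every other non-space character is treated as a one-character operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class NestedParenTokenizer {

    // Splits an expression into names/numbers, operators and whole
    // parenthesized groups, e.g. "(3/abs(1 - x))" stays one token.
    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Consume up to the matching ')' by tracking nesting depth.
                int depth = 0;
                do {
                    char d = expr.charAt(i++);
                    if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                } while (depth > 0 && i < expr.length());
                if (depth != 0) {
                    throw new IllegalArgumentException("Unbalanced parentheses at index " + start);
                }
            } else if (Character.isLetterOrDigit(c)) {
                // A run of letters/digits: a name such as "log" or a number such as "4".
                while (i < expr.length() && Character.isLetterOrDigit(expr.charAt(i))) {
                    i++;
                }
            } else {
                i++; // assumed to be a single-character operator
            }
            tokens.add(expr.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a + 4 * log(3/abs(1 - x)) + sen(-b/4 + PI)"));
        // prints [a, +, 4, *, log, (3/abs(1 - x)), +, sen, (-b/4 + PI)]
    }
}
```

Unlike the lookahead-based pattern, this also keeps (this(kind)of(nesting)) together as one token.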

I have written a small Java program that splits the string without using a regular-expression split; see if this helps:
import java.util.ArrayList;
public class Test2 {
public static void main(String args[]) {
System.out.println(splitExp("a + 4 * log(3/abs(1 - x)) + sen(-b/4 + PI)"));
}
private static ArrayList<String> splitExp(String exp) {
StringBuilder chString = new StringBuilder();
ArrayList<String> arrL = new ArrayList<String>();
for (int i = 0 ; i < exp.length() ; i++ ) {
char ch = exp.charAt(i);
if(ch == ' ')
continue;
if(( ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
chString = chString.append(String.valueOf(ch));
}
else {
if (chString.length() > 0) {
arrL.add(chString.toString());
chString = new StringBuilder();
}
arrL.add(String.valueOf(ch));
}
}
return arrL;
}
}

Related

How can I find a String within a Java program converted to a string?

Basically, I read a java program into my program as a string, and I'm trying to find a way to extract strings from this. I have a loop counting through each character of this program, and this is what happens when it reaches a '"'.
else if (ch == '"')
{
String subString = " ";
index ++;
if (ch != '"')
{
subString += ch;
}
else
{
System.out.println(lineNumber + ", " + TokenType.STRING + ", " + subString);
index ++;
continue;
}
Unfortunately, this isn't working. This is the way I am trying to output the subString.
Essentially, I am looking for a way to add all the characters in between two "s together in order to get a String.
You could use regular expressions:
Pattern regex = Pattern.compile("(?:(?<!')\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\")");
Matcher m = regex.matcher(content);
while (m.find())
System.out.println(m.group(1));
This will capture quoted strings, and takes account of escaped quotes/backslashes.
To break down the pattern:
1. (?: ... ) = don't capture as a group (the inside is captured instead)
2. (?<!') = make sure there isn't a single quote before (to avoid '"')
3. \"( ... )\" = capture what is inside the quotes
4. .*? = match the shortest possible run of any characters
5. (?<!\\\\) = make sure there is no backslash immediately before (double-escape = single backslash in content)
6. (?:\\\\\\\\)* = match zero or more pairs of backslashes (an even number)
Together, 5. & 6. only match an even number of backslashes before the quote. This allows string endings like \\", \\\\", but not \" and \\\", which would be part of the string.
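To see how the lookbehind and the even-backslash rule interact, here is a small sketch that runs such a pattern over a few inputs. The class name QuotedStringDemo and the extract method are made up for this illustration, and the sample inputs are assumptions, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class QuotedStringDemo {

    // Regex text (before Java string escaping): (?<!')"(.*?(?<!\\)(?:\\\\)*)"
    private static final Pattern QUOTED =
            Pattern.compile("(?<!')\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"");

    // Returns the contents of every double-quoted string found in the text.
    static List<String> extract(String content) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(content);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The text being scanned is:  x = "say \"hi\""; y = "dir\\";
        String content = "x = \"say \\\"hi\\\"\"; y = \"dir\\\\\";";
        // Prints the two string bodies: say \"hi\"  and  dir\\
        extract(content).forEach(System.out::println);
    }
}
```

The escaped quotes stay inside the first match because each is preceded by an odd number of backslashes, while the two backslashes before the last quote form a pair, so that quote ends the string.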
Non-regex solution, also taking care of escaped quotes:
List<String> strings = new ArrayList<>();
int start = -1;
int backslashes = 0;
for (int i = 0; i < content.length(); i++) {
char ch = content.charAt(i);
if (ch == '"') {
if (start == -1) {
start = i + 1;
backslashes = 0;
} else if (backslashes % 2 == 1) {
backslashes = 0;
} else {
strings.add(content.substring(start, i));
start = -1;
}
} else if (ch == '\\') {
backslashes++;
} else {
backslashes = 0; // only a run of backslashes directly before the quote can escape it
}
}
strings.forEach(System.out::println);

What is the most efficient method to get tuples within a pig bag string?

I have a string pig bag. Below are some of the possible pig bag formats.
{(Kumar,39)},(Raja, 30), (Mohammad, 45),{(balu,29)}
{(Raja, 30), (Mohammad, 45),{(balu,29)}}
{(Raja,30),(Kumar,34)}
Here everything surrounded by a "{}" is a pig bag. What is the most efficient way to get all the tuples and insert them into a tuple object? Tuples are the comma separated values surrounded by "()". Pig bags can contain pig bags within them along with tuples. Any help would be much appreciated. Below is what I've tried. Seems a clumsy approach though.
private static void convertStringToDataBag(String dataBagString) {
Map<Integer,Integer> openBracketsAndClosingBrackets = new HashMap<>();
char[] charArray = dataBagString.toCharArray();
for (int i=0; i<charArray.length;i++) {
if(charArray[i] == '(' || charArray[i] == '{') {
int closeIndex = findClosingParen(dataBagString,i);
openBracketsAndClosingBrackets.put(i,closeIndex);
String subString = dataBagString.substring(i+1,closeIndex);
System.out.println("sub string : " +subString);
if (!subString.contains("(") && !subString.contains(")") && !subString.contains("{") && !subString.contains("}")) {
//consider this as a tuple and comma split and insert.
}
}
}
}
public static int findClosingParen(String str, int openPos) {
char[] text = str.toCharArray();
int closePos = openPos;
int counter = 1;
while (counter > 0) {
char c = text[++closePos];
if (c == '(' || c== '{') {
counter++;
}
else if (c == ')' || c== '}') {
counter--;
}
}
return closePos;
}
This should work for you:
public static void main(String[] args) throws Exception {
String s = "{(Kumar,39)},(Raja, 30), (Mohammad, 45),{(balu,29)}";
// Create / compile a pattern that captures everything between each "()"
Pattern p = Pattern.compile("\\((.*?)\\)");
//Create a matcher using the pattern and your input string.
Matcher m = p.matcher(s);
// As long as there are matches for that pattern, find them and print them.
while(m.find()) {
System.out.println(m.group(1)); // print data within each "()"
}
}
Output:
Kumar,39
Raja, 30
Mohammad, 45
balu,29
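To go one step further and turn each match into fields, the captured text can be split on commas. This is only a sketch: PigTupleExtractor and extractTuples are hypothetical names, a plain List<String> stands in for the tuple object, and it assumes tuples are never nested and fields contain no commas or parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a List<String> stands in for a real tuple type.
class PigTupleExtractor {

    // Captures everything between each "(" and the next ")".
    private static final Pattern TUPLE = Pattern.compile("\\((.*?)\\)");

    static List<List<String>> extractTuples(String bag) {
        List<List<String>> tuples = new ArrayList<>();
        Matcher m = TUPLE.matcher(bag);
        while (m.find()) {
            List<String> fields = new ArrayList<>();
            // Split the tuple body on commas and drop surrounding spaces.
            for (String field : m.group(1).split(",")) {
                fields.add(field.trim());
            }
            tuples.add(fields);
        }
        return tuples;
    }

    public static void main(String[] args) {
        System.out.println(extractTuples("{(Raja, 30), (Mohammad, 45),{(balu,29)}}"));
        // prints [[Raja, 30], [Mohammad, 45], [balu, 29]]
    }
}
```

Because the pattern only looks at parentheses, bags nested inside bags are flattened: every tuple is returned regardless of how deep it sits.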

Splitting string at parentheses in Java

Now, if I have a String like this:
String start = "(1374)(48.4%)(32)(100%)(290)(43.1%)";
How can I extract the six numbers 1374 48.4 32 100 290 43.1 or 1374 48.4% 32 100% 290 43.1%? Can it be done with a regex?
You can search for a regex identifying floating point numbers: ([+-]?(\d+\.)?\d+)
String start = "(1374)(48.4%)(32)(100%)(290)(43.1%)";
Pattern p = Pattern.compile("([+-]?(\\d+\\.)?\\d+)");
Matcher m = p.matcher(start);
while (m.find()) {
System.out.println(m.group(1));
}
Or use a regex that also makes sure that the brackets are there:
Pattern p = Pattern.compile("\\(([+-]?(\\d+\\.)?\\d+)\\%?\\)");
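If the values are needed as numbers rather than text, the first capture group can be parsed directly. A minimal sketch, assuming every value is a plain decimal with an optional trailing percent sign; the class ParenthesizedNumbers and its parse method are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class ParenthesizedNumbers {

    // Group 1 is the number itself; the optional % stays outside the group.
    private static final Pattern NUMBER_IN_PARENS =
            Pattern.compile("\\(([+-]?(\\d+\\.)?\\d+)%?\\)");

    static List<Double> parse(String input) {
        List<Double> values = new ArrayList<>();
        Matcher m = NUMBER_IN_PARENS.matcher(input);
        while (m.find()) {
            values.add(Double.valueOf(m.group(1)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1374)(48.4%)(32)(100%)(290)(43.1%)"));
        // prints [1374.0, 48.4, 32.0, 100.0, 290.0, 43.1]
    }
}
```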
Let's do it without a regex!
int i = 0;
while (i < start.length()) {
while (i < start.length()) {
char ch = start.charAt(i);
// Maybe add other characters, e.g. %, if desired.
if (Character.isDigit(ch) || ch == '.') {
break;
}
++i;
}
int startOfBlock = i;
while (i < start.length()) {
char ch = start.charAt(i);
if (!Character.isDigit(ch) && ch != '.') {
break;
}
++i;
}
if (i > startOfBlock) {
System.out.println(start.substring(startOfBlock, i));
}
}
Or you can try the regex [\d\.%]+, which matches runs of one or more digits (\d), dots (\.) and percent signs (%).
You can also split on the parentheses themselves (note that this leaves empty strings between adjacent parentheses):
String start = "(1374)(48.4%)(32)(100%)(290)(43.1%)";
for(String splitString : start.split("[()]")) {
System.out.print(splitString + " ");
}

Add separator in string using regex in Java

I have a string (for example: "foo12"), and I want to add a delimiting character in between the letters and numbers (e.g. "foo|12"). However, I can't seem to figure out what the appropriate code is for doing this in Java. Should I use a regex + replace or do I need to use a matcher?
A regex replace would be just fine:
String result = subject.replaceAll("(?<=\\p{L})(?=\\p{N})", "|");
This looks for a position right after a letter and right before a digit (by using lookaround assertions). If you only want to look for ASCII letters/digits, use
String result = subject.replaceAll("(?i)(?<=[a-z])(?=[0-9])", "|");
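Because the lookarounds match a position rather than characters, the replacement fires at every letter-to-digit boundary and nowhere else. A small sketch to illustrate this; the class BoundaryDelimiter, the delimit method and the second sample input are assumptions made for the example.

```java
// Hypothetical demo class, not part of any library.
class BoundaryDelimiter {

    // Inserts "|" wherever a letter is immediately followed by a digit.
    static String delimit(String subject) {
        return subject.replaceAll("(?<=\\p{L})(?=\\p{N})", "|");
    }

    public static void main(String[] args) {
        System.out.println(delimit("foo12"));      // prints foo|12
        System.out.println(delimit("foo12bar34")); // prints foo|12bar|34
    }
}
```

Note that the digit-to-letter boundary in "12bar" is left alone, since the pattern only looks for a letter before and a digit after.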
Split letters and numbers and concatenate with "|". Here is a one-liner:
String x = "foo12";
String result = x.replaceAll("[0-9]", "") + "|" + x.replaceAll("[a-zA-Z]", "");
Printing result will output: foo|12
Why even use regex? This isn't too hard to implement on your own:
public static String addDelimiter(String str, char delimiter) {
StringBuilder string = new StringBuilder(str);
boolean isLetter = false;
boolean isNumber = false;
for (int index = 0; index < string.length(); index++) {
isNumber = isNumber(string.charAt(index));
if (isLetter && isNumber) {
//the last char was a letter, and now we have a number
//so here we adjust the stringbuilder
string.insert(index, delimiter);
index++; //We just inserted the delimiter, get past the delimiter
}
isLetter = isLetter(string.charAt(index));
}
return string.toString();
}
public static boolean isLetter(char c) {
return 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z';
}
public static boolean isNumber(char c) {
return '0' <= c && c <= '9';
}
The advantage of this over regex is that regex can easily be slower. Additionally, it is easy to change the isLetter and isNumber methods to allow for inserting the delimiter in different places.

How can tokenize this string in java?

How can I split these simple mathematical expressions into separate strings?
I know that I basically want to use the regular expression: "[0-9]+|[*+-^()]" but it appears String.split() won't work because it consumes the delimiter tokens as well.
I want it to split all integers: 0-9, and all operators *+-^().
So, 578+223-5^2
Will be split into:
578
+
223
-
5
^
2
What is the best approach to do that?
You could use StringTokenizer(String str, String delim, boolean returnDelims), with the operators as delimiters. This way, you at least get each token individually (including the delimiters). You could then determine what kind of token you're looking at.
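A minimal sketch of the StringTokenizer approach follows. The class name DelimiterTokenizer and the tokenize method are made up for illustration, and the delimiter set "+-*/^()" plus the trimming of spaces are assumptions about what the input may contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of any library.
class DelimiterTokenizer {

    static List<String> tokenize(String expr) {
        // returnDelims = true makes each operator come back as its own token.
        StringTokenizer st = new StringTokenizer(expr, "+-*/^()", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim();
            if (!token.isEmpty()) { // skip pieces that were only whitespace
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2"));
        // prints [578, +, 223, -, 5, ^, 2]
    }
}
```

Each character of the delimiter string is treated as a separate delimiter, so no escaping is needed here, unlike in a regex character class.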
Going at this laterally, and assuming your intention is ultimately to evaluate the String mathematically, you might be better off using the ScriptEngine
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
public class Evaluator {
private ScriptEngineManager sm = new ScriptEngineManager();
private ScriptEngine sEngine = sm.getEngineByName("js");
public double stringEval(String expr)
{
Object res = "";
try {
res = sEngine.eval(expr);
}
catch(ScriptException se) {
se.printStackTrace();
}
return Double.parseDouble( res.toString());
}
}
Which you can then call as follows:
Evaluator evr = new Evaluator();
String sTest = "+1+9*(2 * 5)";
double dd = evr.stringEval(sTest);
System.out.println(dd);
I went down this road when working on evaluating Strings mathematically and it's not so much the operators that will kill you in regexps but complex nested bracketed expressions. Not reinventing the wheel is a) safer b) faster and c) means less complex and nested code to maintain.
This works for the sample string you posted:
String s = "578+223-5^2";
String[] tokens = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
The regex is made up entirely of lookaheads and lookbehinds; it matches a position (not a character, but a "gap" between characters), that is either preceded by a digit and followed by a non-digit, or preceded by a non-digit and followed by a digit.
Be aware that regexes are not well suited to the task of parsing math expressions. In particular, regexes can't easily handle balanced delimiters like parentheses, especially if they can be nested. (Some regex flavors have extensions which make that sort of thing easier, but not Java's.)
Beyond this point, you'll want to process the string using more mundane methods like charAt() and substring() and Integer.parseInt(). Or, if this isn't a learning exercise, use an existing math expression parsing library.
EDIT: ...or eval() it as @Syzygy recommends.
You can't use String.split() for that, since whatever characters match the specified pattern are removed from the output.
If you're willing to require spaces between the tokens, you can do...
"578 + 223 - 5 ^ 2 ".split(" ");
which yields...
578
+
223
-
5
^
2
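If requiring spaces is not an option, another way around split() discarding its matches is to iterate over the matches with Matcher.find(), so the pattern describes the tokens instead of the gaps. This is a sketch only: MatchAllTokenizer and its tokenize method are hypothetical names, and the operator set in the pattern is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class MatchAllTokenizer {

    // A token is either a run of digits or one operator/parenthesis.
    // The '-' is escaped so it is not read as a range inside the class.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[*+\\-^()/]");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2"));
        // prints [578, +, 223, -, 5, ^, 2]
    }
}
```

One caveat of this approach: find() silently skips anything the pattern does not match (spaces, but also stray letters), so invalid input is not reported.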
Here's a short Java program that tokenizes such strings. If you're looking for evaluation of expression I can (shamelessly) point you at this post: An Arithemetic Expressions Solver in 64 Lines
import java.util.ArrayList;
import java.util.List;
public class Tokenizer {
private String input;
public Tokenizer(String input_) { input = input_.trim(); }
private char peek(int i) {
return i >= input.length() ? '\0' : input.charAt(i);
}
private String consume(String... arr) {
for(String s : arr)
if(input.startsWith(s))
return consume(s.length());
return null;
}
private String consume(int numChars) {
String result = input.substring(0, numChars);
input = input.substring(numChars).trim();
return result;
}
private String literal() {
for (int i = 0; true; ++i)
if (!Character.isDigit(peek(i)))
return consume(i);
}
public List<String> tokenize() {
List<String> res = new ArrayList<String>();
if(input.isEmpty())
return res;
while(true) {
res.add(literal());
if(input.isEmpty())
return res;
String s = consume("+", "-", "/", "*", "^");
if(s == null)
throw new RuntimeException("Syntax error " + input);
res.add(s);
}
}
public static void main(String[] args) {
Tokenizer t = new Tokenizer("578+223-5^2");
System.out.println(t.tokenize());
}
}
You only put the delimiters in the split statement. Also, the - means a range inside a character class and has to be escaped.
"578+223-5^2".split("[*+\\-^()]")
You need to escape the -. I believe the quantifiers (+ and *) lose their special meaning, as do parentheses in a character class. If it doesn't work, try escaping those as well.
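The effect of the unescaped - is easy to check: in [*+-^()] the sequence +-^ is read as a range from '+' to '^', which happens to include all the digits. The sketch below compares the two character classes; the class name CharClassRangeDemo and its constants are invented for the example.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class CharClassRangeDemo {

    static final String UNESCAPED = "[*+-^()]";   // "+-^" is a range '+'..'^'
    static final String ESCAPED = "[*+\\-^()]";   // '-' is a literal minus

    static String[] splitOperands(String expr) {
        return expr.split(ESCAPED);
    }

    public static void main(String[] args) {
        // The digit '5' lies between '+' and '^', so the range matches it.
        System.out.println("5".matches(UNESCAPED)); // prints true
        System.out.println("5".matches(ESCAPED));   // prints false

        // With the range, every character is a delimiter and nothing is left.
        System.out.println("578+223-5^2".split(UNESCAPED).length); // prints 0

        System.out.println(Arrays.toString(splitOperands("578+223-5^2")));
        // prints [578, 223, 5, 2]
    }
}
```

The *, +, ( and ) need no escaping inside the brackets, and ^ is only special when it is the first character of the class.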
Here is my tokenizer solution that allows for negative numbers (unary).
So far it has been doing everything I needed it to:
private static List<String> tokenize(String expression)
{
char c;
List<String> tokens = new ArrayList<String>();
String previousToken = null;
int i = 0;
while(i < expression.length())
{
c = expression.charAt(i);
StringBuilder currentToken = new StringBuilder();
if (c == ' ' || c == '\t') // Matched Whitespace - Skip Whitespace
{
i++;
}
else if (c == '-' && (previousToken == null || isOperator(previousToken)) &&
((i+1) < expression.length() && Character.isDigit(expression.charAt((i+1))))) // Matched Negative Number - Add token to list
{
currentToken.append(expression.charAt(i));
i++;
while(i < expression.length() && Character.isDigit(expression.charAt(i)))
{
currentToken.append(expression.charAt(i));
i++;
}
}
else if (Character.isDigit(c)) // Matched Number - Add to token list
{
while(i < expression.length() && Character.isDigit(expression.charAt(i)))
{
currentToken.append(expression.charAt(i));
i++;
}
}
else if (c == '+' || c == '*' || c == '/' || c == '^' || c == '-') // Matched Operator - Add to token list
{
currentToken.append(c);
i++;
}
else // No Match - Invalid Token!
{
i++;
}
if (currentToken.length() > 0)
{
tokens.add(currentToken.toString());
previousToken = currentToken.toString();
}
}
return tokens;
}
You have to escape the "()" in Java, and the '-'
myString.split("[0-9]+|[\\*\\+\\-^\\(\\)]");
