Java - extract content inside square brackets (ignore nested square brackets)? [duplicate] - java

I want to extract the string content inside each pair of outer square brackets. If the content itself contains nested square brackets, they should be kept as part of the content rather than matched separately.
Example:
c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5
Should return:
match1 = "ts[0],99:99,99:99";
match2 = "ts[1],99:99,99:99, ts[2]";
The code I have so far works only with non-nested square brackets
String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(in);
while(m.find()) {
System.out.println(m.group(1));
}
// print: ts[0, ts[1, 2

I wrote a function to do it (not with regex, but it works):
String result = ""; // rebuilt copy of the input; must be declared before the loop
for (int i = 0; i < in.length(); i++) {
    char c = in.charAt(i);
    String part = String.valueOf(c);
    int numberOfOpenBrackets = 0;
    if (c == '[') {
        part = "";
        numberOfOpenBrackets++;
        // scan forward until the bracket opened at i is closed again
        for (int j = i + 1; j < in.length(); j++) {
            char d = in.charAt(j);
            if (d == '[') {
                numberOfOpenBrackets++;
            }
            if (d == ']') {
                numberOfOpenBrackets--;
                i = j;
                if (numberOfOpenBrackets == 0) {
                    break;
                }
            }
            part += d;
        }
        System.out.println(part);
        part = "[" + part + "]";
    }
    result += part;
}
// print: ts[0],99:99,99:99
// ts[1],99:99,99:99, ts[2]

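If the nesting is only one level deep, you can match a sequence between the outer brackets, where each element of the sequence is:
either a character that is neither [ nor ]
or a [ followed by the shortest sequence up to ]
So
Pattern p = Pattern.compile("\\[((?:[^\\[\\]]|\\[.*?\\])*)\\]");
// \[ ... \]        the outer brackets
// ( ... )          group 1: the content between them
// (?: [^\[\]]      a character that is not a bracket, or
//   | \[.*?\] )*   a [, then the shortest sequence to ], repeatedly
The character class has to exclude ] as well as [. With [^\[] alone, the repetition runs past the first closing bracket and the match extends to the last ] in the input.
The limitation is that brackets must be correctly paired: no syntax errors allowed.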
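A runnable sketch of the one-level approach, for comparison. It uses a slightly stricter variant of the pattern in which neither the outer content nor the nested part may contain a stray bracket, and it captures the content in group 1. The class name OneLevelBrackets and the method extract are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLevelBrackets {
    // Group 1 captures everything between an outer [ and its matching ],
    // allowing at most one level of nested [...] inside.
    private static final Pattern OUTER =
            Pattern.compile("\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]");

    public static List<String> extract(String in) {
        List<String> out = new ArrayList<>();
        Matcher m = OUTER.matcher(in);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
        for (String s : extract(in)) {
            System.out.println(s);
        }
    }
}
```

With two or more levels of nesting this pattern no longer matches the outer brackets, which is where the counting approaches come in.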

Without regex; just straight java:
import java.util.ArrayList;
import java.util.List;
public class BracketParser {
public static List<String> parse(String target) throws Exception {
List<String> results = new ArrayList<>();
for (int idx = 0; idx < target.length(); idx++) {
if (target.charAt(idx) == '[') {
String result = readResult(target, idx + 1);
if (result == null) throw new Exception();
results.add(result);
idx += result.length() + 1;
}
}
return results;
}
private static String readResult(String target, int startIdx) {
int openBrackets = 0;
for (int idx = startIdx; idx < target.length(); idx++) {
char c = target.charAt(idx);
if (openBrackets == 0 && c == ']')
return target.substring(startIdx, idx);
if (c == '[') openBrackets++;
if (c == ']') openBrackets--;
}
return null;
}
public static void main(String[] args) throws Exception {
System.out.println(parse("c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5"));
}
}

You could anchor the expression on the literal ts at the start and on a right boundary (a ] followed by whitespace and a +), and lazily capture everything in between, for example:
(ts.*?)(\]\s+\+)
If other characters can follow the closing bracket instead of \s+\+, add them to the right boundary as alternatives (or in a character class) and the expression still works.
If this isn't the expression you wanted, you can modify and test it on regex101.com, or visualize it on jex.im.

Related

Decode String in Java

I am trying to convert this Python Solution in Java. For some reason, my Java Solution is not working. How can this be done correctly?
https://leetcode.com/problems/decode-string/description/
Given an encoded string, return its decoded string. The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times. Note that k is guaranteed to be a positive integer.
You may assume that the input string is always valid; there are no extra white spaces, square brackets are well-formed, etc. Furthermore, you may assume that the original data does not contain any digits and that digits are only for those repeat numbers, k. For example, there will not be input like 3a or 2[4].
The test cases are generated so that the length of the output will never exceed 10^5.
Example 1:
Input: s = "3[a]2[bc]"
Output: "aaabcbc"
Example 2:
Input: s = "3[a2[c]]"
Output: "accaccacc"
Python Solution:
class Solution:
    def decodeString(self, s: str) -> str:
        stack = []
        for char in s:
            # compare by value (!=); "is not" tests identity and is unreliable for strings
            if char != "]":
                stack.append(char)
            else:
                sub_str = ""
                while stack[-1] != "[":
                    sub_str = stack.pop() + sub_str
                stack.pop()  # discard the "["
                multiplier = ""
                while stack and stack[-1].isdigit():
                    multiplier = stack.pop() + multiplier
                stack.append(int(multiplier) * sub_str)
        return "".join(stack)
Java Attempt:
class Solution {
public String decodeString(String s) {
Deque<String> list = new ArrayDeque<String>();
String subword = "";
String number = "";
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) != ']' ) {
list.add(String.valueOf(s.charAt(i)));
}
else {
subword = "";
while (list.size() > 0 && !list.getLast().equals("[") ) {
subword = list.pop() + subword;
}
if (list.size() > 0) list.pop();
number = "";
while (list.size() > 0 && isNumeric(list.getLast())){
number = list.pop() + number;
}
for (int j = 1; (isNumeric(number) && j <= Integer.parseInt(number)); j++) list.add(subword);
}
}
return String.join("", list);
}
public static boolean isNumeric(String str) {
try {
Double.parseDouble(str);
return true;
} catch(NumberFormatException e){
return false;
}
}
}
Your code fails because Python's list.pop() removes the last element by default, whereas Java's ArrayDeque.pop() removes the first element (it is equivalent to removeFirst()).
To emulate the Python code with an ArrayDeque, use removeLast() instead of pop(). The version below takes a different route and uses a StringBuilder as the stack:
public class Solution{
public static String decodeString(String s) {
StringBuilder stack = new StringBuilder();
for(char c : s.toCharArray()) {
if(c != ']') {
stack.append(c);
} else {
StringBuilder sub_str = new StringBuilder();
while(stack.charAt(stack.length() - 1) != '[') {
sub_str.insert(0, stack.charAt(stack.length() - 1));
stack.deleteCharAt(stack.length() - 1);
}
stack.deleteCharAt(stack.length() - 1);
StringBuilder multiplier = new StringBuilder();
while(stack.length() > 0 && Character.isDigit(stack.charAt(stack.length() - 1))) {
multiplier.insert(0, stack.charAt(stack.length() - 1));
stack.deleteCharAt(stack.length() - 1);
}
for(int i = 0; i < Integer.parseInt(multiplier.toString()); i++) {
stack.append(sub_str);
}
}
}
return stack.toString();
}
public static void main(String[] args) {
System.out.println( decodeString("3[a2[c]]"));
//Output: "accaccacc"
System.out.println( decodeString("3[a]2[bc]"));
//Output: "aaabcbc"
}
}
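For completeness, here is a minimal sketch of the removeLast() fix applied to an ArrayDeque, staying close to the Python original. The class name DequeDecoder is made up for this example, and String.repeat assumes Java 11 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDecoder {
    public static String decodeString(String s) {
        // The tail of the deque is the top of the stack, as with a Python list.
        Deque<String> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c != ']') {
                stack.addLast(String.valueOf(c));
                continue;
            }
            // Collect everything back to the matching "[".
            StringBuilder sub = new StringBuilder();
            while (!stack.peekLast().equals("[")) {
                sub.insert(0, stack.removeLast());
            }
            stack.removeLast(); // discard the "["
            // Collect the digits of the repeat count (each digit is a one-char entry).
            StringBuilder count = new StringBuilder();
            while (!stack.isEmpty() && isDigit(stack.peekLast())) {
                count.insert(0, stack.removeLast());
            }
            stack.addLast(sub.toString().repeat(Integer.parseInt(count.toString())));
        }
        // ArrayDeque iterates from first to last, i.e. bottom of the stack to top.
        return String.join("", stack);
    }

    private static boolean isDigit(String entry) {
        return entry.length() == 1 && Character.isDigit(entry.charAt(0));
    }
}
```

Like the Python code, this relies on the problem's guarantee that the input is well-formed; an unmatched ] would make peekLast() return null.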

Expansion on given string according to number * contents of the parentheses

I am trying to expand a string so that whenever a number precedes parentheses, the contents of the parentheses are repeated that many times. I thought about using a StringBuilder and wrote the function below, but I'm not sure how to get the contents of the parentheses repeated.
Examples: 3(ab) should give ababab, and 3(b(2(c))) should give bccbccbcc.
The function I wrote repeats the parenthesis character itself rather than the contents of the parentheses.
public static String solve(String s){
StringBuilder sb = new StringBuilder();
int repeat = 0;
for (char c : s.toCharArray()) {
if (Character.isDigit(c)) {
repeat = repeat * 10 + Character.getNumericValue(c);
} else {
while (repeat > 0) {
sb.append(c);
repeat--;
}
sb.append(c);
}
}
return sb.toString();
}
}
Pretty much the same answer as @SerialLazers, but in Java and with a bit of debugging output to see how the code behaves:
public static String solve(String s)
{
Stack<Integer> countStack = new Stack<>(); // stack for counting
Stack<StringBuilder> stubs = new Stack<>(); // stack for parts of the string that were processed
stubs.push(new StringBuilder());
int count = 0;
for(char c : s.toCharArray())
{
System.out.println(Character.toString(c) + " " + count + " " + countStack + stubs);
if(Character.isDigit(c))
{
// part of a count (assumes digits are never part of the actual output-string)
count = count * 10 + (c - '0');
}
else if(c == '(')
{
// encountered the start of a new repeated group
if(count == 0)
// no count specified, assume a count of one
countStack.push(1);
else
// push the count for this group
countStack.push(count);
// push a new stringbuilder that will contain the new group
stubs.push(new StringBuilder());
count = 0; // reset count
}
else if(c == ')')
{
// group terminated => repeat n times and append to new group one above
String tmp = stubs.pop().toString();
int ct = countStack.pop();
for(int i = 0; i < ct; i++)
stubs.peek().append(tmp);
}
else
{
// just a normal character, append to topmost group
stubs.peek().append(c);
count = 0;
}
}
// if the string was valid there's only the output-string left on the stubs-list
return stubs.peek().toString();
}
Output:
3 0 [][]
( 3 [][]
b 0 [3][, ]
( 0 [3][, b]
2 0 [3, 1][, b, ]
( 2 [3, 1][, b, ]
c 0 [3, 1, 2][, b, , ]
) 0 [3, 1, 2][, b, , c]
) 0 [3, 1][, b, cc]
) 0 [3][, bcc]
Returns:
bccbccbcc
The problem is naturally recursive. Preserving the approach you’ve started, you could write something like the following. In real code, I’d probably favour an approach that separated tokenisation and parsing, meaning I would do two separate passes: the first to transform the input string into tokens, and the second to produce output from the token stream.
public static Pair<String, Integer> solve(String s, int start) {
int repeat = 0;
String ret = "";
for (int i = start; i < s.length(); i++) {
final char c = s.charAt(i);
if (Character.isDigit(c)) {
repeat = repeat * 10 + Character.getNumericValue(c);
} else if (c == '(') {
final Pair<String, Integer> inner = solve(s, i + 1);
// At least one repetition, even if no explicit `repeat` given.
ret += inner.first;
while (--repeat > 0) {
ret += inner.first;
}
repeat = 0; // Ensure that `repeat` isn’t -1 after the loop.
i = inner.second;
} else if (c == ')') {
return new Pair<>(ret, i);
} else {
ret += c;
}
}
return new Pair<>(ret, s.length());
}
Converting this code to use a single StringBuilder — to avoid redundant string copies — is left as an exercise.
The above uses a simple Pair helper class. Since Java doesn’t ship with one (groan), here’s a very simple implementation that can sit alongside the above code; you can also use JavaFX’s javafx.util.Pair or java.util.AbstractMap.SimpleEntry or whatever.
static class Pair<T, U> {
final T first;
final U second;
Pair(T f, U s) {
first = f;
second = s;
}
}
You need a stack to remember the pending operations, from the innermost to the outermost container.
Here is the code in Python:
def parenthesis_printer(s):
    L = [""]  # maintains the stack of the string-containers
    N = [1]  # maintains the stack of the print-multiplier needed for the corresponding string-container
    nstr = ""
    for i in range(len(s)):
        if s[i].isnumeric():
            nstr += s[i]
        elif s[i] == '(':
            nstr = "1" if len(nstr) == 0 else nstr
            nval = int(nstr)
            N.append(nval)
            L.append("")
            nstr = ""
        elif s[i] == ')':
            nval = N.pop()
            lval = L.pop()
            lstr = "".join([lval for _ in range(nval)])
            L[-1] += lstr
        else:
            L[-1] += s[i]
    return L[-1]
print(parenthesis_printer("3(b(2(c)))"))
Output:
bccbccbcc

Java Regular expression to find out the number of matching words

I am learning regular expressions. Suppose I have two strings, abcd and bcdd. To make them equal I have to remove a from the first string and the trailing d from the second. Is it possible to count the matched characters, e.g. bcd => 3?
Currently, I am doing this
Pattern p= Pattern.compile("["+abcd+"]{2}");
Matcher m= p.matcher("abcd bcdd");
My current solution doesn't provide me the correct result. So, my question
1) Is this possible ?
2) If possible, then how can I achieve that ?
Hope, you will help to increase my regular expression knowledge.
Not sure why you would use regex at all, if all you need is the number of "bcd"s. I've put both a non-regex and regex version here for comparison.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
<P>{@code java BcdRegexXmpl}</P>
**/
public class BcdRegexXmpl {
public static final void main(String[] igno_red) {
String sSentence = "abcd bcdd";
int iBcds = 0;
int iIdx = 0;
while(true) {
int iBcdIdx = sSentence.indexOf("bcd", iIdx);
if(iBcdIdx == -1) {
break;
}
iIdx = iBcdIdx + "bcd".length();
iBcds++;
}
System.out.println("Number of 'bcd's (no regex): " + iBcds);
//Alternatively
iBcds = 0;
//Same regex as @la-comadreja, with word-boundaries
//(for multiple "bcd"-s in a single word, remove the "\\b"-s)
Matcher m = Pattern.compile("\\b\\w*bcd\\w*\\b").matcher(sSentence);
while(m.find()) {
System.out.println("Found at index " + m.start());
iBcds++;
}
System.out.println("Number of 'bcd's (with regex): " + iBcds);
}
}
Output:
[R:\jeffy\programming\sandbox\xbnjava]java BcdRegexXmpl
Number of 'bcd's (no regex): 2
Found at index 0
Found at index 5
Number of 'bcd's (with regex): 2
Your pattern should be:
(a?)(bcd)(d?)
Another possibility is to write it as
\w*bcd\w*
If you want to count the number of "bcd"s in the string:
int bcds = 0;
for (int i = 0; i < str.length() - 2; i++) {
if (str.charAt(i) == 'b' && str.charAt(i+1) == 'c' && str.charAt(i+2) == 'd')
bcds++;
}
A maximally generalizable, concise and readable (and reasonably efficient) non-Regex answer:
int countMatches(String s, String searchStr) {
//Here, s is "abcd bcdd" and searchStr is "bcd"
int matches = 0;
for (int i = 0; i < s.length() - searchStr.length() + 1; i++) {
for (int j = 0; j < searchStr.length(); j++) {
if (s.charAt(i + j) != searchStr.charAt(j)) break;
if (j == searchStr.length() - 1) matches++;
}
}
return matches;
}
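For comparison, the same count can be sketched with java.util.regex. Pattern.quote makes the search string match literally, so characters such as . or [ in it are not treated as regex syntax. The class name RegexCount is made up for this example. Note one difference from the loop above: Matcher.find() resumes after the end of the previous match, so overlapping occurrences are not counted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCount {
    // Counts non-overlapping occurrences of searchStr in s.
    static int countMatches(String s, String searchStr) {
        Matcher m = Pattern.compile(Pattern.quote(searchStr)).matcher(s);
        int matches = 0;
        while (m.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("abcd bcdd", "bcd"));
    }
}
```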

java regex, split on comma only if not in quotes or brackets

I would like to do a java split via regex.
I would like to split my string on every comma when it is NOT in single quotes or brackets.
example:
Hello, 'my,',friend,(how ,are, you),(,)
should give:
hello
my,
friend
how, are, you
,
I tried this:
(?i),(?=([^\'|\(]*\'|\([^\'|\(]*\'|\()*[^\'|\)]*$)
But I can't get it to work (I tested via http://java-regex-tester.appspot.com/)
Any ideas?
Java's regex engine has no recursion, so arbitrarily nested parentheses can't be handled reliably by a regex. It's easier to split the string manually.
public static List<String> split(String orig) {
    List<String> splitted = new ArrayList<String>();
    int nestingLevel = 0; // how many '(' are currently open
    StringBuilder result = new StringBuilder();
    for (char c : orig.toCharArray()) {
        if (c == ',' && nestingLevel == 0) {
            splitted.add(result.toString());
            result.setLength(0);// clean buffer
        } else {
            if (c == '(')
                nestingLevel++;
            if (c == ')')
                nestingLevel--;
            result.append(c);
        }
    }
    // Thanks PoeHah for pointing it out. This adds the last element to it.
    splitted.add(result.toString());
    return splitted;
}
Hope this helps.
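The loop above tracks parentheses but not single quotes, which the question also asks for. A possible extension is sketched below; the class name QuoteAwareSplitter is made up, and it assumes quotes do not nest and that parentheses inside quotes should be ignored. Quotes and parentheses are left in the returned pieces, so stripping them and trimming whitespace would be a separate step.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    public static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of currently open '('
        boolean inQuotes = false; // true between a pair of single quotes
        for (char c : input.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0 && !inQuotes) {
                // top-level comma: finish the current piece
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString()); // last piece
        return parts;
    }
}
```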
A java CSV parser library would be better suited to this task than regex: http://sourceforge.net/projects/javacsv/
Assuming no nested (), you could split on
",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)"
It only splits on a comma when the rest of the string contains an even number of ' characters and only complete ( ) pairs.
It's a brittle solution, but it may be good enough.
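A small sketch of how that expression could be used with String.split, under the same assumption of no nested parentheses. The class name LookaheadSplit is made up for this example. The quotes and parentheses stay in the resulting pieces.

```java
import java.util.Arrays;
import java.util.List;

public class LookaheadSplit {
    // A comma followed by an even number of ' and only complete ( ) pairs.
    private static final String COMMA_OUTSIDE =
            ",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)";

    public static List<String> split(String s) {
        return Arrays.asList(s.split(COMMA_OUTSIDE));
    }

    public static void main(String[] args) {
        for (String part : split("Hello, 'my,',friend,(how ,are, you),(,)")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Both lookaheads scan to the end of the input for every comma, so this is best kept to short strings.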
As some comments and the answer by @Balthus point out, this is better done with a CSV parser. You do need a regex replacement to prepare the input string for parsing. Consider code like this:
String str = "Hello, 'my,',friend,(how ,are, you),(,)"; // input string
// prepare String for CSV parser: replace left/right brackets OR ' by a "
CsvReader reader = CsvReader.parse(str.replaceAll("[(')]", "\""));
reader.readRecord(); // read the CSV input
for (int i=0; i<reader.getColumnCount(); i++)
System.out.printf("col[%d]: [%s]%n", i, reader.get(i));
OUTPUT
col[0]: [Hello]
col[1]: [my,]
col[2]: [friend]
col[3]: [how ,are, you]
col[4]: [,]
I also need to split on comma outside of quotes and brackets.
After searching over all the related answers on SO, I realized a lexer is needed in such a case, and I wrote a generic implementation for myself. It supports a separator, multiple quotes and multiple brackets as regexes.
public static List<String> split(String string, String regex, String[] quotesRegex, String[] leftBracketsRegex,
String[] rightBracketsRegex) {
if (leftBracketsRegex.length != rightBracketsRegex.length) {
throw new IllegalArgumentException("Bracket count mismatch, left: " + leftBracketsRegex.length + ", right: "
+ rightBracketsRegex.length);
}
// Prepare all delimiters.
String[] delimiters = new String[1 + quotesRegex.length + leftBracketsRegex.length + rightBracketsRegex.length];
delimiters[0] = regex;
System.arraycopy(quotesRegex, 0, delimiters, 1, quotesRegex.length);
System.arraycopy(leftBracketsRegex, 0, delimiters, 1 + quotesRegex.length, leftBracketsRegex.length);
System.arraycopy(rightBracketsRegex, 0, delimiters, 1 + quotesRegex.length + leftBracketsRegex.length,
rightBracketsRegex.length);
// Build delimiter regex.
StringBuilder delimitersRegexBuilder = new StringBuilder("(?:");
boolean first = true;
for (String delimiter : delimiters) {
if (delimiter.endsWith("\\") && !delimiter.endsWith("\\\\")) {
throw new IllegalArgumentException("Delimiter contains trailing single \\: " + delimiter);
}
if (first) {
first = false;
} else {
delimitersRegexBuilder.append("|");
}
delimitersRegexBuilder
.append("(")
.append(delimiter)
.append(")");
}
delimitersRegexBuilder.append(")");
String delimitersRegex = delimitersRegexBuilder.toString();
// Scan.
int pendingQuoteIndex = -1;
Deque<Integer> bracketStack = new LinkedList<>();
StringBuilder pendingSegmentBuilder = new StringBuilder();
List<String> segmentList = new ArrayList<>();
Matcher matcher = Pattern.compile(delimitersRegex).matcher(string);
int matcherIndex = 0;
while (matcher.find()) {
pendingSegmentBuilder.append(string.substring(matcherIndex, matcher.start()));
int delimiterIndex = -1;
for (int i = 1; i <= matcher.groupCount(); ++i) {
if (matcher.group(i) != null) {
delimiterIndex = i - 1;
break;
}
}
if (delimiterIndex < 1) {
// Regex.
if (pendingQuoteIndex == -1 && bracketStack.isEmpty()) {
segmentList.add(pendingSegmentBuilder.toString());
pendingSegmentBuilder.setLength(0);
} else {
pendingSegmentBuilder.append(matcher.group());
}
} else {
delimiterIndex -= 1;
pendingSegmentBuilder.append(matcher.group());
if (delimiterIndex < quotesRegex.length) {
// Quote.
if (pendingQuoteIndex == -1) {
pendingQuoteIndex = delimiterIndex;
} else if (pendingQuoteIndex == delimiterIndex) {
pendingQuoteIndex = -1;
}
// Ignore unpaired quotes.
} else if (pendingQuoteIndex == -1) {
delimiterIndex -= quotesRegex.length;
if (delimiterIndex < leftBracketsRegex.length) {
// Left bracket
bracketStack.push(delimiterIndex);
} else {
delimiterIndex -= leftBracketsRegex.length;
// Right bracket
Integer topBracket = bracketStack.peek(); // null when no bracket is open
// Ignore unbalanced brackets.
if (topBracket != null && delimiterIndex == topBracket) {
bracketStack.pop();
}
}
}
}
matcherIndex = matcher.end();
}
pendingSegmentBuilder.append(string.substring(matcherIndex, string.length()));
segmentList.add(pendingSegmentBuilder.toString());
while (segmentList.size() > 0 && segmentList.get(segmentList.size() - 1).isEmpty()) {
segmentList.remove(segmentList.size() - 1);
}
return segmentList;
}

How can I find the index of the first "element" in my string using Java?

I'm working on writing a simple Prolog interpreter in Java.
How can I find the index of the last character of the first element (whether it is the head element or the tail element) of a string in "List Syntax"?
List Syntax looks like:
(X)
(p a b)
(func (func2 a) (func3 X Y))
(equal eve (mother cain))
The head for each of those strings in order are:
Head: "X", Index: 1
Head: "p", Index: 1
Head: "func", Index: 4
Head: "equal", Index: 5
Basically, I need to match the string that immediately follows the first "(" and ends either with a space or a closing ")", whichever comes first. I need the character index of the last character of the head element.
How can I match and get this index in Java?
Brabster's solution is really close. However, consider the case of:
((b X) Y)
Where the head element is (b x). I attempted to fix it by removing "(" from the scanner delimiters but it still hiccups because of the space between "b" and "x".
Similarly:
((((b W) X) Y) Z)
Where the head is (((b w) x) Y).
Java's Scanner class (introduced in Java 1.5) might be a good place to start.
Here's an example that I think does what you want (updated to include char counting capability)
import java.util.Scanner;
public class Test {
public static void main(String[] args) {
String[] data = new String[] {
"(X)",
"(p a b)",
"(func (func2 a) (func3 X Y))",
"(equal eve (mother cain))",
"((b X) Y)",
"((((b W) X) Y) Z)"
};
for (String line:data) {
int headIdx = 0;
if (line.charAt(1) == '(') {
headIdx = countBrackets(line);
} else {
String head = "";
Scanner s = new Scanner(line);
s.useDelimiter("[)|(| ]");
head = s.next();
headIdx = line.indexOf(head) + head.length() - 1;
}
System.out.println(headIdx);
}
}
private static int countBrackets(String line) {
int bracketCount = 0;
int charCount = 0;
for (int i = 1; i < line.length(); i++) {
char c = line.charAt(i);
if (c == '(') {
bracketCount++;
} else if (c == ')') {
bracketCount--;
}
if (bracketCount == 0) {
return charCount + 1;
}
charCount++;
}
throw new IllegalStateException("Brackets not nested properly");
}
}
Output:
1
1
4
5
5
13
It's not a very elegant solution, but regexes can't count (i.e. brackets). I'd be thinking about using a parser generator if there's any more complexity in there :)
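Is there a reason you can't just brute force it? Something like this?
public int firstIndex(String exp) {
    int parenCount = 0;
    // Start at 1 to skip the opening "(" of the list itself.
    for (int i = 1; i < exp.length() - 1; i++) {
        char c = exp.charAt(i);
        if (c == '(') {
            parenCount++;
        } else if (c == ')') {
            parenCount--;
        }
        // The head ends at depth 0 when the next character is a space or the list's closing ")".
        char next = exp.charAt(i + 1);
        if (parenCount == 0 && (next == ' ' || next == ')')) {
            return i;
        }
    }
    throw new IllegalArgumentException("No head element found in: " + exp);
}
I may be missing something here, but I think that would work.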
I suggest you write a proper parser (operator precedence in the case of Prolog) and represent the terms as trees of Java objects for further processing.
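As a rough idea of what the tree representation could look like, here is a minimal recursive-descent sketch for the list syntax shown in the question only. It is not an operator-precedence parser for full Prolog, and the names TermParser and Term are made up for this example. Atoms are assumed to be runs of characters other than spaces and parentheses.

```java
import java.util.ArrayList;
import java.util.List;

public class TermParser {
    /** A term is either an atom (children == null) or a list of terms. */
    static final class Term {
        final String atom;
        final List<Term> children;
        Term(String atom) { this.atom = atom; this.children = null; }
        Term(List<Term> children) { this.atom = null; this.children = children; }
        @Override public String toString() {
            if (children == null) return atom;
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String src;
    private int pos = 0;

    private TermParser(String src) { this.src = src; }

    public static Term parse(String src) {
        TermParser p = new TermParser(src);
        Term t = p.term();
        p.skipSpaces();
        if (p.pos != src.length()) throw new IllegalArgumentException("Trailing input at " + p.pos);
        return t;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private Term term() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("Unexpected end of input");
        if (src.charAt(pos) == '(') {
            pos++; // consume "("
            List<Term> children = new ArrayList<>();
            while (true) {
                skipSpaces();
                if (pos >= src.length()) throw new IllegalArgumentException("Missing )");
                if (src.charAt(pos) == ')') { pos++; return new Term(children); }
                children.add(term());
            }
        }
        if (src.charAt(pos) == ')') throw new IllegalArgumentException("Unexpected ) at " + pos);
        int start = pos;
        while (pos < src.length() && " ()".indexOf(src.charAt(pos)) < 0) pos++;
        return new Term(src.substring(start, pos));
    }
}
```

With a tree like this, the head of a list is simply its first child, so no index arithmetic on the raw string is needed.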
