Java Regex help extracting with negative lookahead


I have the regex \\(.*?\\) to match whatever is inside the parentheses in my text,
e.g. ((a=2 and age IN (15,18,56)) and (b=3 and c=4))
my output should only contain:
a=2 and age IN (15,18,56)
b=3 and c=4
I have tried using a negative lookahead, .*(?!IN)\\(.*?\\), so that the parentheses after IN are not matched, but it does not return what I expect. Can anybody help me see where I am going wrong?

You will need to parse nested expressions, and regular expressions alone cannot do that for you. A regular expression will only catch the innermost expressions with \\(([^(]*?)\\)
You can use the Pattern and Matcher classes to code a more complex solution.
Or you can use a parser generator. For Java, there's ANTLR.
I just coded something that might help you:
import java.util.ArrayList;
import java.util.List;
public class NestedParser {
private final char opening;
private final char closing;
private String str;
private List<String> matches;
private int matchFrom(int beginIndex, boolean matchClosing) {
int i = beginIndex;
while (i < str.length()) {
if (str.charAt(i) == opening) {
i = matchFrom(i + 1, true);
if (i < 0) {
return i;
}
} else if (matchClosing && str.charAt(i) == closing) {
matches.add(str.substring(beginIndex, i));
return i + 1;
} else {
i++;
}
}
return -1;
}
public NestedParser(char opening, char closing) {
this.opening = opening;
this.closing = closing;
}
public List<String> match(String str) {
matches = new ArrayList<>();
if (str != null) {
this.str = str;
matchFrom(0, false);
}
return matches;
}
public static void main(String[] args) {
NestedParser parser = new NestedParser('(', ')');
System.out.println(parser.match(
"((a=2 and age IN (15,18,56)) and (b=3 and c=4))"));
}
}
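NestedParser above collects every parenthesized group, from the innermost 15,18,56 out to the contents of the outermost pair. If what you want is exactly the output in the question, i.e. the groups sitting directly inside the outermost pair, a depth counter is enough. This is a minimal sketch assuming the parentheses are balanced; DepthExtractor and groupsAtDepth are hypothetical names, not taken from the answers here:

```java
import java.util.ArrayList;
import java.util.List;

public class DepthExtractor {
    // Returns the contents of every parenthesized group that opens at targetDepth.
    // Depth 1 is the outermost pair; for the question's input, depth 2 yields the two clauses.
    public static List<String> groupsAtDepth(String s, int targetDepth) {
        List<String> result = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
                if (depth == targetDepth) {
                    start = i + 1; // content begins right after this '('
                }
            } else if (c == ')') {
                if (depth == targetDepth && start >= 0) {
                    result.add(s.substring(start, i));
                    start = -1;
                }
                depth--;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a=2 and age IN (15,18,56), b=3 and c=4]
        System.out.println(groupsAtDepth("((a=2 and age IN (15,18,56)) and (b=3 and c=4))", 2));
    }
}
```

Unlike a regex, the counter keeps track of nesting explicitly, which is why the IN list stays inside the first clause instead of being matched on its own.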

It's not clear what you want in terms of nested brackets (e.g. ((a = 2 and b = 3)): is this valid or not?)
This regex gets you most of the way there:
(\(.*?\)+)
On the input you specified, it matches two groups:
((a=2 and age IN (15,18,56))
(b=3 and c=4)) (notice the double-bracket at the end).
It will return everything, including nested brackets. Another variation will return only singly-bracketed expressions:
(\([^(]*?\))
The easiest way to test this is through Rubular.
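To try both patterns from Java instead of Rubular, remember that every backslash has to be doubled inside a Java string literal. A small sketch (RegexGroups and findAll are hypothetical helper names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroups {
    // Collects capture group 1 of every match of regex in input.
    public static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "((a=2 and age IN (15,18,56)) and (b=3 and c=4))";
        // (\(.*?\)+) written as a Java string literal
        System.out.println(findAll("(\\(.*?\\)+)", input));
        // (\([^(]*?\)) written as a Java string literal
        System.out.println(findAll("(\\([^(]*?\\))", input));
    }
}
```

For the example input, the first pattern yields ((a=2 and age IN (15,18,56)) and (b=3 and c=4)), and the second yields (15,18,56) and (b=3 and c=4).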

Related

Is there a way to find out how many numbers are at the end of a string without knowing the exact index?

I have a method that extracts a certain substring from a string. This substring consists of the numbers in the string. Then this is parsed to an integer.
Method:
protected int startIndex() throws Exception {
String str = getWorkBook().getDefinedName("XYZ");
String sStr = str.substring(10,13);
return Integer.parseInt(sStr) - 1;
}
Example:
String :
'0 DB'!$B$460
subString :
460
Well, I manually entered the index range for the substring. But I would like to automate it.
My approach:
String str = getWorkBook().getDefinedName("XYZ");
int length = str.length();
String sStr = str.substring(length - 3, length);
This works well for this example.
Now the problem is that the number at the end of the string can also have 4 or 5 digits. In that case this approach only picks up the last three digits, so I get the wrong number.
Is there a way or another approach to find out how many numbers are at the end of the string?

You can use the regex (?<=\D)\d+$, which means one or more digits (i.e. \d+) at the end of the string, preceded by a non-digit (i.e. \D).
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
// Test
System.out.println(getNumber("'0 DB'!$B$460"));
}
static String getNumber(String str) {
Matcher matcher = Pattern.compile("(?<=\\D)\\d+$").matcher(str);
if (matcher.find()) {
return matcher.group();
}
// If no match is found, return the string itself
return str;
}
}

In your case I would recommend using a regex with replaceAll, like this:
String sStr = str.replaceAll(".*?([0-9]+)$", "$1");
This extracts all the digits at the end of your String, whatever their length.
Also, I think you are missing the case where there are no digits at the end of your String. In that case replaceAll finds no match and returns the String unchanged, so I would recommend checking the result before you convert it to an Integer:
String sStr = str.replaceAll(".*?([0-9]+)$", "$1");
// only digits remain if the String ended with a number
if (sStr.matches("[0-9]+")) {
return Integer.parseInt(sStr) - 1;
}
return 0; // or any default value

If you just want to get the last number, you can walk through the string in reverse and find the start index:
protected static int startIndex() {
String str = getWorkBook().getDefinedName("XYZ");
if (!str.isEmpty() && Character.isDigit(str.charAt(str.length() - 1))) {
for (int i = str.length() - 1; i >= 0; i--) {
if (!Character.isDigit(str.charAt(i)))
return i + 1;
}
return 0; // the whole string is digits
}
return -1;
}
and then print it:
public static void main(String[] args) {
int start = startIndex();
if(start != -1)
System.out.println(getWorkBook().getDefinedName("XYZ").substring(start));
else
System.out.println("No Number found");
}
You will have to add the

Simple and fast solution without RegEx:
public class Main
{
public static int getLastNumber(String str) {
int index = str.length() - 1;
while (index >= 0 && Character.isDigit(str.charAt(index)))
index--;
// throws NumberFormatException if str does not end with a digit
return Integer.parseInt(str.substring(index + 1));
}
public static void main(String[] args) {
final String text = "'0 DB'!$B$460";
System.out.println(getLastNumber(text));
}
}
The output will be:
460

If I were going to do this, I would just search from the end, which is quite efficient. This returns -1 if the string does not end with a number. Other return conventions, such as an OptionalInt, could also be used.
String s = "'0 DB'!$B$460";
int i;
for (i = s.length(); i > 0 && Character.isDigit(s.charAt(i-1)); i--);
int vv = (i < s.length()) ? Integer.valueOf(s.substring(i)) : -1;
System.out.println(vv);
Prints
460
If you know that there will always be a number at the end, you can drop the ternary (?:) above and just do the following:
int vv = Integer.valueOf(s.substring(i));
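The OptionalInt variant mentioned above might look like this. It is a sketch: TrailingNumber and trailingNumber are hypothetical names, and it assumes the trailing digits fit in an int (otherwise Integer.parseInt throws NumberFormatException):

```java
import java.util.OptionalInt;

public class TrailingNumber {
    // Returns the value of the digits at the end of s,
    // or OptionalInt.empty() if s does not end with a digit.
    public static OptionalInt trailingNumber(String s) {
        int i = s.length();
        // Walk backwards while the previous character is a digit.
        while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
            i--;
        }
        if (i == s.length()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(s.substring(i)));
    }

    public static void main(String[] args) {
        OptionalInt n = trailingNumber("'0 DB'!$B$460");
        if (n.isPresent()) {
            System.out.println(n.getAsInt() - 1); // the question's startIndex subtracts 1, so this prints 459
        } else {
            System.out.println("No number found");
        }
    }
}
```

Returning an empty OptionalInt forces the caller to handle the "no number" case explicitly instead of relying on a sentinel such as -1.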

Pattern matching interview Q

I was recently in an interview and they asked me the following question:
Write a function to return true if a string matches a pattern, false otherwise.
Pattern: 1 character per item (a-z); input: space-delimited string
This was my solution for the first problem:
static boolean isMatch(String pattern, String input) {
char[] letters = pattern.toCharArray();
String[] split = input.split("\\s+");
if (letters.length != split.length) {
// early return - not possible to match if lengths aren't equal
return false;
}
Map<String, Character> map = new HashMap<>();
// aaaa test test test1 test1
boolean used[] = new boolean[26];
for (int i = 0; i < letters.length; i++) {
Character existing = map.get(split[i]);
if (existing == null) {
// put into map if not found yet
if (used[(int)(letters[i] - 'a')]) {
return false;
}
used[(int)(letters[i] - 'a')] = true;
map.put(split[i], letters[i]);
} else {
// doesn't match - return false
if (existing != letters[i]) {
return false;
}
}
}
return true;
}
public static void main(String[] argv) {
System.out.println(isMatch("aba", "blue green blue"));
System.out.println(isMatch("aba", "blue green green"));
}
The next part of the problem stumped me:
With no delimiters in the input, write the same function.
eg:
isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true
I wrote a brute-force solution (generating all possible splits of the input string into letters.length pieces and checking each in turn with isMatch), but the interviewer said it wasn't optimal.
I have no idea how to solve this part of the problem. Is this even possible, or am I missing something?
They were looking for something with a time complexity of O(M * N^C), where M is the length of the pattern, N is the length of the input, and C is some constant.
Clarifications
I'm not looking for a regex solution, even if it works.
I'm not looking for the naive solution that generates all possible splits and checks them, even with optimization since that'll always be exponential time.

It is possible to optimize a backtracking solution. Instead of generating all splits first and then checking whether each one is valid, we can check "on the fly". Let's assume that we have already split a prefix (with length p) of the initial string and have matched i characters from the pattern. Let's take a look at the (i + 1)-th character.
If there is a string in the prefix that corresponds to the i + 1 letter, we should just check that a substring that starts at the position p + 1 is equal to it. If it is, we just proceed to i + 1 and p + the length of this string. Otherwise, we can kill this branch.
If there is no such string, we should try all substrings that start in the position p + 1 and end somewhere after it.
We can also use the following idea to reduce the number of branches: we can estimate the length of the suffix of the pattern that has not been processed yet (we know the length for the letters that already stand for some strings, and we know a trivial lower bound on the length of the string for any letter in the pattern: it is 1). This allows us to kill a branch if the remaining part of the initial string is too short to match the rest of the pattern.
This solution still has an exponential time complexity, but it can work much faster than generating all splits because invalid solutions can be thrown away much earlier, so the number of reachable states can reduce significantly.
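A minimal Java sketch of that backtracking idea, assuming (as in the question's examples) that every pattern letter maps to a non-empty string and that different letters must map to different strings. PatternBacktracker, isMatch and match are hypothetical names. As noted above, the worst case is still exponential; the pruning only cuts the search down in practice:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PatternBacktracker {
    public static boolean isMatch(String pattern, String input) {
        return match(pattern, 0, input, 0, new HashMap<>(), new HashSet<>());
    }

    // i: next pattern position; p: next input position;
    // bound: letter -> string chosen so far; used: strings already taken by some letter.
    private static boolean match(String pattern, int i, String input, int p,
                                 Map<Character, String> bound, Set<String> used) {
        if (i == pattern.length()) {
            return p == input.length();
        }
        // Lower bound on the input still needed: the known length for bound letters,
        // at least 1 for unbound ones.
        int needed = 0;
        for (int k = i; k < pattern.length(); k++) {
            String s = bound.get(pattern.charAt(k));
            needed += (s == null) ? 1 : s.length();
        }
        if (input.length() - p < needed) {
            return false; // the rest of the input is too short: kill this branch
        }
        char letter = pattern.charAt(i);
        String existing = bound.get(letter);
        if (existing != null) {
            // Letter already bound: the input must continue with the same string.
            return input.startsWith(existing, p)
                    && match(pattern, i + 1, input, p + existing.length(), bound, used);
        }
        // Letter not bound yet: try each non-empty substring starting at p.
        for (int end = p + 1; end <= input.length(); end++) {
            String candidate = input.substring(p, end);
            if (used.contains(candidate)) {
                continue;
            }
            bound.put(letter, candidate);
            used.add(candidate);
            if (match(pattern, i + 1, input, end, bound, used)) {
                return true;
            }
            bound.remove(letter); // undo the choice and try a longer string
            used.remove(candidate);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isMatch("ababac", "bluegreenbluegreenbluewhite")); // true
        System.out.println(isMatch("aba", "t1t1t1"));                        // false
    }
}
```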

I feel like this is cheating, and I'm not convinced the capture group and reluctant quantifier will do the right thing. Or maybe they're looking to see if you can recognize that, because of how quantifiers work, matching is ambiguous.
boolean matches(String s, String pattern) {
StringBuilder patternBuilder = new StringBuilder();
Map<Character, Integer> backreferences = new HashMap<>();
int nextBackreference = 1;
for (int i = 0; i < pattern.length(); i++) {
char c = pattern.charAt(i);
if (!backreferences.containsKey(c)) {
backreferences.put(c, nextBackreference++);
patternBuilder.append("(.+?)"); // non-empty token; distinct letters may still capture the same text
} else {
patternBuilder.append('\\').append(backreferences.get(c));
}
}
return s.matches(patternBuilder.toString());
}

You could improve on brute force by first assuming token lengths, and checking that the sum of the token lengths equals the length of the test string. That would be quicker than pattern matching each time. However, it is still very slow as the number of unique tokens increases.
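One way to read that suggestion: first enumerate a length for each distinct letter so that the lengths summed over the whole pattern equal the input length, and only then slice the input and compare. This is a sketch with hypothetical names (LengthFirstMatcher, assign, check), under the same assumptions as the question's examples: tokens are non-empty and distinct letters map to distinct strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LengthFirstMatcher {
    public static boolean isMatch(String pattern, String input) {
        // Count occurrences of each distinct letter, in first-seen order.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : pattern.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        char[] letters = new char[counts.size()];
        int[] mult = new int[counts.size()];
        int k = 0;
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            letters[k] = e.getKey();
            mult[k] = e.getValue();
            k++;
        }
        return assign(0, input.length(), letters, mult, new int[letters.length], pattern, input);
    }

    // Choose a length for letter idx so that the weighted sum of lengths uses up 'remaining'.
    private static boolean assign(int idx, int remaining, char[] letters, int[] mult, int[] len,
                                  String pattern, String input) {
        if (idx == letters.length) {
            return remaining == 0 && check(letters, len, pattern, input);
        }
        for (int l = 1; l * mult[idx] <= remaining; l++) {
            len[idx] = l;
            if (assign(idx + 1, remaining - l * mult[idx], letters, mult, len, pattern, input)) {
                return true;
            }
        }
        return false;
    }

    // Slice the input with the chosen lengths and verify a consistent one-to-one mapping.
    private static boolean check(char[] letters, int[] len, String pattern, String input) {
        Map<Character, Integer> lengthOf = new HashMap<>();
        for (int i = 0; i < letters.length; i++) {
            lengthOf.put(letters[i], len[i]);
        }
        Map<Character, String> bound = new HashMap<>();
        Set<String> used = new HashSet<>();
        int p = 0;
        for (char c : pattern.toCharArray()) {
            String piece = input.substring(p, p + lengthOf.get(c));
            p += piece.length();
            String prev = bound.get(c);
            if (prev == null) {
                if (!used.add(piece)) {
                    return false; // another letter already maps to this string
                }
                bound.put(c, piece);
            } else if (!prev.equals(piece)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the lengths always sum to the input length, the slicing in check never runs past the end of the string.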

UPDATE:
Here is my solution. It is based on the explanation in my original response below.
import com.google.common.collect.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.math3.util.Combinations;
import java.util.*;
/**
* Created by carlos on 2/14/15.
*/
public class PatternMatcher {
public static boolean isMatch(char[] pattern, String searchString){
return isMatch(pattern, searchString, new TreeMap<Integer, Pair<Integer, Integer>>(), Sets.newHashSet());
}
private static boolean isMatch(char[] pattern, String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution, Set<String> mappedStrings) {
List<Integer> occurrencesOfCharacterInPattern = getNextUnmappedPatternOccurrences(candidateSolution, pattern);
if(occurrencesOfCharacterInPattern.size() == 0)
return isValidSolution(candidateSolution, searchString, pattern, mappedStrings);
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = sectionsOfUnmappedStrings(searchString, candidateSolution);
if(sectionsOfUnmappedStrings.size() == 0)
return false;
String firstUnmappedString = substring(searchString, sectionsOfUnmappedStrings.get(0));
for (int substringSize = 1; substringSize <= firstUnmappedString.length(); substringSize++) {
String candidateSubstring = firstUnmappedString.substring(0, substringSize);
if(mappedStrings.contains(candidateSubstring))
continue;
List<Pair<Integer, Integer>> listOfAllOccurrencesOfSubstringInString = Lists.newArrayList();
for (int currentIndex = 0; currentIndex < sectionsOfUnmappedStrings.size(); currentIndex++) {
Pair<Integer,Integer> currentUnmappedSection = sectionsOfUnmappedStrings.get(currentIndex);
List<Pair<Integer, Integer>> occurrencesOfSubstringInString =
findAllInstancesOfSubstringInString(searchString, candidateSubstring,
currentUnmappedSection);
for(Pair<Integer,Integer> possibleAddition:occurrencesOfSubstringInString) {
listOfAllOccurrencesOfSubstringInString.add(possibleAddition);
}
}
if(listOfAllOccurrencesOfSubstringInString.size() < occurrencesOfCharacterInPattern.size())
return false;
Iterator<int []> possibleSolutionIterator =
new Combinations(listOfAllOccurrencesOfSubstringInString.size(),
occurrencesOfCharacterInPattern.size()).iterator();
iteratorLoop:
while(possibleSolutionIterator.hasNext()) {
Set<String> newMappedSets = Sets.newHashSet(mappedStrings);
newMappedSets.add(candidateSubstring);
TreeMap<Integer,Pair<Integer,Integer>> newCandidateSolution = Maps.newTreeMap();
// why doesn't Maps.newTreeMap(candidateSolution) work?
newCandidateSolution.putAll(candidateSolution);
int [] possibleSolutionIndexSet = possibleSolutionIterator.next();
for(int i = 0; i < possibleSolutionIndexSet.length; i++) {
Pair<Integer, Integer> candidatePair = listOfAllOccurrencesOfSubstringInString.get(possibleSolutionIndexSet[i]);
if (makesSenseToInsert(newCandidateSolution, occurrencesOfCharacterInPattern.get(i), candidatePair))
newCandidateSolution.put(occurrencesOfCharacterInPattern.get(i), candidatePair);
else
break iteratorLoop;
}
if (isMatch(pattern, searchString, newCandidateSolution,newMappedSets))
return true;
}
}
return false;
}
private static boolean makesSenseToInsert(TreeMap<Integer, Pair<Integer, Integer>> newCandidateSolution, Integer startIndex, Pair<Integer, Integer> candidatePair) {
if(newCandidateSolution.size() == 0)
return true;
if(newCandidateSolution.floorEntry(startIndex).getValue().getRight() > candidatePair.getLeft())
return false;
Map.Entry<Integer, Pair<Integer, Integer>> ceilingEntry = newCandidateSolution.ceilingEntry(startIndex);
if(ceilingEntry !=null)
if(ceilingEntry.getValue().getLeft() < candidatePair.getRight())
return false;
return true;
}
private static boolean isValidSolution( Map<Integer, Pair<Integer, Integer>> candidateSolution,String searchString, char [] pattern, Set<String> mappedStrings){
List<Pair<Integer,Integer>> values = Lists.newArrayList(candidateSolution.values());
return areIntegersConsecutive(Lists.newArrayList(candidateSolution.keySet())) &&
arePairsConsecutive(values) &&
values.get(values.size() - 1).getRight() == searchString.length() &&
patternsAreUnique(pattern,mappedStrings);
}
private static boolean patternsAreUnique(char[] pattern, Set<String> mappedStrings) {
Set<Character> uniquePatterns = Sets.newHashSet();
for(Character character:pattern)
uniquePatterns.add(character);
return uniquePatterns.size() == mappedStrings.size();
}
private static List<Integer> getNextUnmappedPatternOccurrences(Map<Integer, Pair<Integer, Integer>> candidateSolution, char[] searchArray){
List<Integer> allMappedIndexes = Lists.newLinkedList(candidateSolution.keySet());
if(allMappedIndexes.size() == 0){
return occurrencesOfCharacterInArray(searchArray,searchArray[0]);
}
if(allMappedIndexes.size() == searchArray.length){
return Lists.newArrayList();
}
for(int i = 0; i < allMappedIndexes.size()-1; i++){
if(!areIntegersConsecutive(allMappedIndexes.get(i),allMappedIndexes.get(i+1))){
return occurrencesOfCharacterInArray(searchArray,searchArray[i+1]);
}
}
List<Integer> listOfNextUnmappedPattern = Lists.newArrayList();
listOfNextUnmappedPattern.add(allMappedIndexes.size());
return listOfNextUnmappedPattern;
}
private static String substring(String string, Pair<Integer,Integer> bounds){
return string.substring(bounds.getLeft(),bounds.getRight());
}
private static List<Pair<Integer, Integer>> sectionsOfUnmappedStrings(String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution) {
if(candidateSolution.size() == 0) {
return Lists.newArrayList(Pair.of(0, searchString.length()));
}
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = Lists.newArrayList();
List<Pair<Integer,Integer>> allMappedPairs = Lists.newLinkedList(candidateSolution.values());
// Dont have to worry about the first index being mapped because of the way the first candidate solution is made
for(int i = 0; i < allMappedPairs.size() - 1; i++){
if(!arePairsConsecutive(allMappedPairs.get(i), allMappedPairs.get(i + 1))){
Pair<Integer,Integer> candidatePair = Pair.of(allMappedPairs.get(i).getRight(), allMappedPairs.get(i + 1).getLeft());
sectionsOfUnmappedStrings.add(candidatePair);
}
}
Pair<Integer,Integer> lastMappedPair = allMappedPairs.get(allMappedPairs.size() - 1);
if(lastMappedPair.getRight() != searchString.length()){
sectionsOfUnmappedStrings.add(Pair.of(lastMappedPair.getRight(),searchString.length()));
}
return sectionsOfUnmappedStrings;
}
public static boolean areIntegersConsecutive(List<Integer> integers){
for(int i = 0; i < integers.size() - 1; i++)
if(!areIntegersConsecutive(integers.get(i),integers.get(i+1)))
return false;
return true;
}
public static boolean areIntegersConsecutive(int left, int right){
return left == (right - 1);
}
public static boolean arePairsConsecutive(List<Pair<Integer,Integer>> pairs){
for(int i = 0; i < pairs.size() - 1; i++)
if(!arePairsConsecutive(pairs.get(i), pairs.get(i + 1)))
return false;
return true;
}
public static boolean arePairsConsecutive(Pair<Integer, Integer> left, Pair<Integer, Integer> right){
return left.getRight().equals(right.getLeft()); // compare Integer values, not references
}
public static List<Integer> occurrencesOfCharacterInArray(char[] searchArray, char searchCharacter){
assert(searchArray.length>0);
List<Integer> occurrences = Lists.newLinkedList();
for(int i = 0;i<searchArray.length;i++){
if(searchArray[i] == searchCharacter)
occurrences.add(i);
}
return occurrences;
}
public static List<Pair<Integer,Integer>> findAllInstancesOfSubstringInString(String searchString, String substring, Pair<Integer,Integer> bounds){
String string = substring(searchString,bounds);
assert(StringUtils.isNoneBlank(substring,string));
int lastIndex = 0;
List<Pair<Integer,Integer>> listOfOccurrences = Lists.newLinkedList();
while(lastIndex != -1){
lastIndex = string.indexOf(substring,lastIndex);
if(lastIndex != -1){
int newIndex = lastIndex + substring.length();
listOfOccurrences.add(Pair.of(lastIndex + bounds.getLeft(), newIndex + bounds.getLeft()));
lastIndex = newIndex;
}
}
return listOfOccurrences;
}
}
It works with the cases provided, but is not thoroughly tested. Let me know if there are any mistakes.
ORIGINAL RESPONSE:
Assuming your string you are searching can have arbitrary length tokens (which some of your examples do) then:
You want to start trying to break your string into parts that match the pattern, looking for contradictions along the way to cut down your search tree.
When you start processing, you select the first N characters of the string. Now, see if you can find that substring in the rest of the string. If you can't, then it can't possibly be a solution. If you can, then your string looks something like this:
(N characters)<...>[(N characters)<...>] where each <...> contains 0 or more characters, and they aren't necessarily the same substring. What's inside [] could repeat as many times as (N characters) appears in the string.
Now the first letter of your pattern is matched. You're not sure whether the rest of the pattern matches, but you can basically reuse this algorithm (with modifications) to examine the <...> parts of the string.
You would do this for N = 1,2,3,4...
Make sense?
I'll work through an example (which doesn't cover all cases, but hopefully illustrates the idea). Note: when I'm referring to substrings of the pattern I'll use single quotes, and when I'm referring to substrings of the string I'll use double quotes.
isMatch("ababac", "bluegreenbluegreenbluewhite")
Ok, 'a' is my first pattern.
For N = 1 I get the string "b".
Where is "b" in the search string?
bluegreenbluegreenbluewhite.
OK, so at this point this string MIGHT match with "b" being the pattern 'a'. Let's see if we can do the same with the pattern 'b'. Logically, 'b' MUST be the entire string "luegreen" (because it's squeezed between two consecutive 'a' patterns). Then I check in between the 2nd and 3rd 'a'. Yup, it's "luegreen".
OK, so far I've matched all but the 'c' of my pattern. Easy case: it's the rest of the string. It matches.
This basically amounts to writing a Perl regex parser: ababc = (.+)(.+)(\1)(\2)(.+). So you could just convert the pattern to a Perl-style regex.

Here's a sample snippet of my code:
public static final boolean isMatch(String patternStr, String input) {
// Initial Check (If all the characters in the pattern string are unique, degenerate case -> immediately return true)
char[] patt = patternStr.toCharArray();
Arrays.sort(patt);
boolean uniqueCase = true;
for (int i = 1; i < patt.length; i++) {
if (patt[i] == patt[i - 1]) {
uniqueCase = false;
break;
}
}
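if (uniqueCase) {
// every pattern letter is distinct, so any split into non-empty pieces works,
// provided the input has at least one character per pattern letter
return patternStr.isEmpty() ? input.isEmpty() : input.length() >= patternStr.length();
}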
String t1 = patternStr;
String t2 = input;
if (patternStr.length() == 0 && input.length() == 0) {
return true;
} else if (patternStr.length() != 0 && input.length() == 0) {
return false;
} else if (patternStr.length() == 0 && input.length() != 0) {
return false;
}
int count = 0;
StringBuffer sb = new StringBuffer();
char[] chars = input.toCharArray();
String match = "";
// first read for the first character pattern
for (int i = 0; i < chars.length; i++) {
sb.append(chars[i]);
count++;
if (!input.substring(count, input.length()).contains(sb.toString())) {
match = sb.delete(sb.length() - 1, sb.length()).toString();
break;
}
}
if (match.length() == 0) {
match = t2;
}
// based on that character, update patternStr and input string
t1 = t1.replace(String.valueOf(t1.charAt(0)), "");
t2 = t2.replace(match, "");
return isMatch(t1, t2);
}
I basically decided to first parse the pattern string and determine whether it contains any repeated characters. For example, in "aab", "a" is used twice in the pattern string, so both occurrences of "a" must map to the same string. Otherwise, if there are no repeated characters, as in "abc", the input string hardly matters: since every pattern character is unique, it doesn't matter what each one matches to (the degenerate case).
If there are repeated characters in the pattern string, then I begin to check what each string matches to. Unfortunately, without a delimiter I don't know how long each string is. Instead, I read 1 character at a time and check whether the rest of the input contains the same string, adding characters to the buffer letter by letter until the buffer string can no longer be found in the input string. Once the string is determined, I delete all its occurrences from the input string, remove that character from the pattern string, and recurse.
Apologies if my explanation wasn't very clear, I hope my code can be clear though.

basic java program not working

For this code I need to write a method that decompresses a string. For example, if the user enters "2d5t", the method should return "ddttttt". My code works for that input, but if the input has a character without a number before it, the program doesn't work when it should. For example, if the input is just "d", the program fails instead of returning "d". The code also has to be recursive.
Here is my code so far; please help.
public static String decompress(String compressedText) {
if (compressedText.equals(""))
return "";
return decompress(compressedText, charInt(compressedText, 0), 0);
}
public static String decompress(String text, int count, int pos) {
if (pos == text.length() || (pos == text.length()-2 && count == 0))
return "";
else if (count == 0)
return decompress(text, charInt(text, pos+2), pos+2);
return text.charAt(pos+1) + decompress(text, count-1, pos);
}
public static int charInt(String str, int idex) {
return str.charAt(idex) - '0';
}

Here's some pseudocode:
function createString(int times, char character){
if times is 0, do nothing
otherwise return character + createString(times-1, character);
}
function createString(string full){
split string by number/character pairs
for each pair, call createString(times, character), and append them
}
I don't believe in handing out real code, sorry. It's much better in the long run.
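For comparison, here is one way the pseudocode above could be turned into recursive Java that also covers the case from the question of a letter with no count. It is a sketch under stated assumptions: a missing count means 1, counts may have several digits (such as 12x), and Decompressor and repeat are hypothetical names:

```java
public class Decompressor {
    // Recursively expands runs such as "2d5t" into "ddttttt".
    // Assumptions: a character with no count in front of it appears once,
    // and a count may have several digits (e.g. "12x").
    public static String decompress(String text) {
        if (text.isEmpty()) {
            return "";
        }
        // Scan the (possibly empty) run of leading digits.
        int i = 0;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        if (i == text.length()) {
            return ""; // trailing digits with no character after them are ignored
        }
        int count = (i == 0) ? 1 : Integer.parseInt(text.substring(0, i));
        char c = text.charAt(i);
        // Expand this pair, then recurse on the rest of the input.
        return repeat(c, count) + decompress(text.substring(i + 1));
    }

    // Recursive counterpart of createString(times, character) in the pseudocode.
    private static String repeat(char c, int times) {
        return times <= 0 ? "" : c + repeat(c, times - 1);
    }

    public static void main(String[] args) {
        System.out.println(decompress("2d5t")); // ddttttt
        System.out.println(decompress("d"));    // d
    }
}
```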

You need to validate your user input. First decide which string values are acceptable to your method, and then write a validation method. Then invoke that method inside your decompress method.
Look into string manipulation functions and regular expressions in Java. And then try rewriting your code.

As mentioned by others, this can be solved with regular expressions. An example solution is:
public static String decompress(String compressed) {
Matcher matcher = Pattern.compile("(\\d+)([^\\d])").matcher(compressed);
StringBuffer decompressed = new StringBuffer();
while (matcher.find()) {
int charNum = Integer.parseInt(matcher.group(1));
StringBuilder decompressedChars = new StringBuilder();
for (int i = 1; i <= charNum; i++) {
decompressedChars.append(matcher.group(2));
}
// quoteReplacement stops characters such as $ and \ from being treated specially
matcher.appendReplacement(decompressed, Matcher.quoteReplacement(decompressedChars.toString()));
}
matcher.appendTail(decompressed);
return decompressed.toString();
}
This code won't support numbers larger than Integer.MAX_VALUE and you might want to put some error handling and validation in there also.

Edited to be recursive, as per the OP's request.
A tested left-to-right parser with one-character lookahead, using regex:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Parser{
private static String parse(StringBuilder output, String input, Integer offset){
if(offset<input.length()){
Pattern p0 = Pattern.compile("\\d(?=[a-z])");
Pattern p1 = Pattern.compile("[a-z]");
Matcher m0 = p0.matcher(input);
Matcher m1 = p1.matcher(input);
if (m0.find(offset) && m0.start() == offset)
{
for(Integer i = 0;
i < Integer.parseInt(String.valueOf(input.charAt(offset)));
++i) {
output.append(input.charAt(offset+1));
}
offset+=2;
}
else if (m1.find(offset) && m1.start() == offset) {
output.append(input.charAt(offset));
++offset;
}
else {
++offset;
}
return parse(output, input, offset);
}
else return output.toString();
}
public static void main(String[] args)
{
Integer offset = 0;
StringBuilder output = new StringBuilder();
parse(output, args[0], offset);
System.out.println(output.toString());
}
}

Check if String contains only letters

The idea is to have a String read and to verify that it does not contain any numeric characters. So something like "smith23" would not be acceptable.

What do you want: speed or simplicity? For speed, go for a loop-based approach. For simplicity, go for a one-line regex-based approach.
Speed
public boolean isAlpha(String name) {
char[] chars = name.toCharArray();
for (char c : chars) {
if(!Character.isLetter(c)) {
return false;
}
}
return true;
}
Simplicity
public boolean isAlpha(String name) {
return name.matches("[a-zA-Z]+");
}

Java 8 streams with a method reference: both fast and simple. (Note that this returns true for an empty string.)
boolean allLetters = someString.chars().allMatch(Character::isLetter);

Or, if you are using Apache Commons Lang, StringUtils.isAlpha().

First import Pattern:
import java.util.regex.Pattern;
Then use this simple code:
String s = "smith23";
if (Pattern.matches("[a-zA-Z]+",s)) {
// Do something
System.out.println("Yes, string contains letters only");
}else{
System.out.println("Nope, Other characters detected");
}
This will output:
Nope, Other characters detected

I used the regex ".*[a-zA-Z]+.*". Negated with an if-not, it rejects every string that has a letter anywhere: at the start, at the end, or between other characters.
String str1 = "123AZ456";
if (!Pattern.matches(".*[a-zA-Z]+.*", str1))
return true;
else return false;

A quick way to do it is by:
public boolean isStringAlpha(String aString) {
int charCount = 0;
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (aString.length() == 0) {
return false; //zero length string ain't alpha
}
for (int i = 0; i < aString.length(); i++) {
for (int j = 0; j < alphabet.length(); j++) {
if (aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1))
|| aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1).toLowerCase())) {
charCount++;
}
}
if (charCount != (i + 1)) {
System.out.println("\n**Invalid input! Enter alpha values**\n");
return false;
}
}
return true;
}
It is quick because you don't have to scan the whole of aString to find out that it isn't alphabetic: the method returns as soon as it finds a non-letter.

private boolean isOnlyLetters(String s){
char c=' ';
boolean isGood=false, safe=isGood;
int failCount=0;
for(int i=0;i<s.length();i++){
c = s.charAt(i);
if(Character.isLetter(c))
isGood=true;
else{
isGood=false;
failCount+=1;
}
}
if(failCount==0 && s.length()>0)
safe=true;
else
safe=false;
return safe;
}
I know it's a bit crowded. I was using it in my program and wanted to share it. It tells you whether every character in a string is a letter (and returns false for an empty string). Use it if you want something easy to read and come back to.

A faster way is below, assuming letters are only a-z and A-Z.
public static void main( String[] args ){
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
}
public static boolean bestWay(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for(char c : chars){
// reject anything outside A-Z and a-z
if(!(c>='A' && c<='Z')&&!(c>='a' && c<='z') ){
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
public static boolean isAlpha(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for (char c : chars) {
if(!Character.isLetter(c)) {
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
Runtimes are in nanoseconds and will vary from system to system; a single, unwarmed System.nanoTime() measurement like this is only a rough indication.
5748 // bestWay without numbers
true
89493 // isAlpha without numbers
true
3284 // bestWay with numbers
false
22989 // isAlpha with numbers
false

Check this; it works in my project:
if (!Pattern.matches("[a-zA-Z]+", str1))
{
// string does not contain only letters
}
else
{
// string contains only letters
}

String expression = "^[a-zA-Z]*$";
CharSequence inputStr = str;
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.matches())
{
//if pattern matches
}
else
{
//if pattern does not match
}

Try using regular expressions: String.matches

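public boolean isAlpha(String name)
{
// Lowercase first so only the a-z range needs checking;
// Locale.ROOT avoids locale surprises such as Turkish 'I' -> dotless i
String s = name.toLowerCase(java.util.Locale.ROOT);
for (int i = 0; i < s.length(); i++)
{
char c = s.charAt(i);
if (c < 'a' || c > 'z')
{
return false; // found a character outside a-z
}
}
return true;
}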

It sounds as if you need to check whether the characters are all letters.
Here's how you can solve it:
Character.isAlphabetic(c)
tells you whether a single character is alphabetic, so call it for every character of the string,
where c is
char c = s.charAt(elementIndex);
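A sketch of applying Character.isAlphabetic to a whole string. It iterates over code points rather than chars, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) are tested as one character. Treating the empty string as "not alphabetic" is an assumption of this sketch:

```java
public class AlphabeticCheck {

    // Every code point must be alphabetic; the empty string is rejected
    static boolean isAllAlphabetic(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isAlphabetic);
    }

    public static void main(String[] args) {
        System.out.println(isAllAlphabetic("Hello"));     // true
        System.out.println(isAllAlphabetic("Hello1"));    // false
        System.out.println(isAllAlphabetic("caf\u00e9")); // true: U+00E9 is alphabetic
    }
}
```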

While there are many ways to skin this cat, in C# I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. With extension methods you can also avoid regex, which is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package, which makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphabetic() will return False whereas "john smith".IsAlphabetic() will return True. By default the .IsAlphabetic() method ignores spaces, but it can also be overridden such that "john smith".IsAlphabetic(false) will return False since the space is not considered part of the alphabet.
Every later check in your code is then simply MyString.IsAlphabetic().

To allow only ASCII letters, the character class \p{Alpha} can be used. (This is equivalent to [\p{Lower}\p{Upper}] or [a-zA-Z].)
boolean allLettersASCII = str.matches("\\p{Alpha}*");
To allow all Unicode letters, use the character class \p{L} (or equivalently, \p{IsL}).
boolean allLettersUnicode = str.matches("\\p{L}*");
See the Pattern documentation.
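A minimal sketch contrasting the two classes on a word with a non-ASCII letter (U+00E9). By default \p{Alpha} is ASCII-only; it only becomes Unicode-aware when the Pattern.UNICODE_CHARACTER_CLASS flag (or the inline (?U) flag) is set. Class and method names are illustrative:

```java
public class PosixVsUnicodeClasses {

    // \p{Alpha}: ASCII letters only (same as [a-zA-Z]) without UNICODE_CHARACTER_CLASS
    static boolean allLettersAscii(String s) {
        return s.matches("\\p{Alpha}*");
    }

    // \p{L}: any Unicode letter
    static boolean allLettersUnicode(String s) {
        return s.matches("\\p{L}*");
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";
        System.out.println(allLettersAscii(word));   // false
        System.out.println(allLettersUnicode(word)); // true
    }
}
```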

I found an easy way of checking whether all of a string's characters are letters.
public static boolean isStringLetter(String input) {
if (input.isEmpty()) {
return false; // an empty string is not treated as "all letters"
}
for (int id = 0; id < input.length(); id++) {
char c = input.charAt(id);
boolean isAsciiLetter = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
if (!isAsciiLetter) {
return false; // stop at the first non-letter
}
}
return true;
}
I hope it helps anyone who is looking for such a method.

Use the StringUtils.isAlpha() method from Apache Commons Lang; it will make your life simple.

How can tokenize this string in java?

How can I split these simple mathematical expressions into separate strings?
I know that I basically want to use the regular expression "[0-9]+|[*+-^()]", but it appears String.split() won't work because it consumes the delimiter tokens as well.
I want it to split out all integers (digits 0-9) and all operators (*+-^()).
So 578+223-5^2
would be split into:
578
+
223
-
5
^
2
What is the best approach to do that?

You could use StringTokenizer(String str, String delim, boolean returnDelims), with the operators as delimiters. This way you at least get each token individually (including the delimiters). You could then determine what kind of token you're looking at.
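A minimal sketch of that suggestion, assuming the operator set +-*/^() and input without spaces (spaces would end up inside the number tokens). Each character of the delimiter string is a delimiter; no regex is involved. Note that the StringTokenizer Javadoc calls it a legacy class whose use is discouraged in new code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class StringTokenizerExample {

    static List<String> tokenize(String expr) {
        // returnDelims = true: every delimiter character comes back as its own token
        StringTokenizer st = new StringTokenizer(expr, "+-*/^()", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
    }
}
```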

Going at this laterally, and assuming your intention is ultimately to evaluate the String mathematically, you might be better off using the ScriptEngine API:
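import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
public class Evaluator {
private final ScriptEngineManager sm = new ScriptEngineManager();
// On JDK 8-14 "js" resolves to the bundled Nashorn engine. Nashorn was removed in
// JDK 15, so on newer JDKs this is null unless a JavaScript engine is added as a dependency.
private final ScriptEngine sEngine = sm.getEngineByName("js");
public double stringEval(String expr)
{
if (sEngine == null) {
throw new IllegalStateException("No JavaScript ScriptEngine is available");
}
try {
Object res = sEngine.eval(expr);
// The result may be an Integer or a Double; toString() + parseDouble handles both
return Double.parseDouble(res.toString());
}
catch (ScriptException se) {
// Fail loudly instead of trying to parse an empty result
throw new IllegalArgumentException("Could not evaluate: " + expr, se);
}
}
}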
Which you can then call as follows:
Evaluator evr = new Evaluator();
String sTest = "+1+9*(2 * 5)";
double dd = evr.stringEval(sTest);
System.out.println(dd);
I went down this road when working on evaluating strings mathematically, and it's not so much the operators that will kill you with regexes but complex, nested bracketed expressions. Not reinventing the wheel is a) safer, b) faster, and c) means less complex, nested code to maintain.

This works for the sample string you posted:
String s = "578+223-5^2";
String[] tokens = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
The regex is made up entirely of lookaheads and lookbehinds; it matches a position (not a character, but a "gap" between characters), that is either preceded by a digit and followed by a non-digit, or preceded by a non-digit and followed by a digit.
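A runnable sketch of the split above. The second input is an assumed extra example showing a limitation of this pattern: two adjacent non-digits, such as an operator followed by a parenthesis, are never separated because the pattern only splits at digit/non-digit boundaries:

```java
import java.util.Arrays;

public class LookaroundSplit {

    // Split at zero-width positions between a digit and a non-digit, in either order
    static String[] tokenize(String s) {
        return s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("578+223-5^2"))); // [578, +, 223, -, 5, ^, 2]
        System.out.println(Arrays.toString(tokenize("2*(3+4)")));     // [2, *(, 3, +, 4, )]
    }
}
```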
Be aware that regexes are not well suited to the task of parsing math expressions. In particular, regexes can't easily handle balanced delimiters like parentheses, especially if they can be nested. (Some regex flavors have extensions which make that sort of thing easier, but not Java's.)
Beyond this point, you'll want to process the string using more mundane methods like charAt() and substring() and Integer.parseInt(). Or, if this isn't a learning exercise, use an existing math expression parsing library.
EDIT: ...or eval() it as @Syzygy recommends.
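The "more mundane" charAt()/substring()/Integer.parseInt() approach mentioned in this answer could look roughly like the following. It is a hypothetical sketch, not code from the answer: it assumes non-negative integers, the operators +-*/^ and parentheses, and rejects any other character:

```java
import java.util.ArrayList;
import java.util.List;

public class CharScanner {

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Scans left to right: a run of digits becomes one number token,
    // each operator or parenthesis becomes a one-character token.
    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // ignore spaces between tokens
            } else if (isAsciiDigit(c)) {
                int start = i;
                while (i < expr.length() && isAsciiDigit(expr.charAt(i))) {
                    i++;
                }
                tokens.add(expr.substring(start, i)); // the whole number, e.g. "578"
            } else if ("+-*/^()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
        System.out.println(tokenize("2*(3 + 4)"));   // [2, *, (, 3, +, 4, )]
        // Number tokens can then be converted with Integer.parseInt
        System.out.println(Integer.parseInt(tokenize("578+223-5^2").get(0)) + 1); // 579
    }
}
```

Unlike the lookaround split, this keeps adjacent operators and parentheses as separate tokens.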

You can't use String.split() with a pattern that matches the tokens themselves, since whatever characters match the specified pattern are removed from the output.
If you're willing to require spaces between the tokens, you can do...
"578 + 223 - 5 ^ 2 ".split(" ");
which yields...
578
+
223
-
5
^
2
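The pattern from the question can still be used if, instead of split(), you loop over its matches with Matcher.find() and keep what matched. A sketch under one correction: in "[*+-^()]" the "+-^" part is a character range from '+' to '^' (which also covers digits and uppercase letters), so here the '-' is moved to the start of the class, where it is literal. Characters that match neither alternative, such as spaces, are silently skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindTokens {

    // Numbers, or one of - * + ^ ( ) ; a leading '-' in a class is a literal hyphen
    private static final Pattern TOKEN = Pattern.compile("[0-9]+|[-*+^()]");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group()); // keep the matched text instead of discarding it
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
        System.out.println(tokenize("(1+2)*3"));     // [(, 1, +, 2, ), *, 3]
    }
}
```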

Here's a short Java program that tokenizes such strings; it handles numbers and the operators + - * / ^, but not parentheses. If you're looking to evaluate expressions, I can (shamelessly) point you to this post: An Arithemetic Expressions Solver in 64 Lines
import java.util.ArrayList;
import java.util.List;
public class Tokenizer {
private String input;
public Tokenizer(String input_) { input = input_.trim(); }
private char peek(int i) {
return i >= input.length() ? '\0' : input.charAt(i);
}
private String consume(String... arr) {
for(String s : arr)
if(input.startsWith(s))
return consume(s.length());
return null;
}
private String consume(int numChars) {
String result = input.substring(0, numChars);
input = input.substring(numChars).trim();
return result;
}
private String literal() {
for (int i = 0; true; ++i)
if (!Character.isDigit(peek(i)))
return consume(i);
}
public List<String> tokenize() {
List<String> res = new ArrayList<String>();
if(input.isEmpty())
return res;
while(true) {
res.add(literal());
if(input.isEmpty())
return res;
String s = consume("+", "-", "/", "*", "^");
if(s == null)
throw new RuntimeException("Syntax error " + input);
res.add(s);
}
}
public static void main(String[] args) {
Tokenizer t = new Tokenizer("578+223-5^2");
System.out.println(t.tokenize());
}
}

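You only put the delimiters in the split pattern. Also, inside a character class the - means a range and has to be escaped.
"578+223-5^2".split("[*+\\-^()]")
Note that split() drops the delimiters themselves, so this yields only the numbers 578, 223, 5 and 2.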

You need to escape the - (or put it first or last in the class). Inside a character class, + and * lose their special meaning as quantifiers, and parentheses lose theirs as grouping, so they don't need escaping, although escaping them does no harm.
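A quick sketch that illustrates these rules on an assumed sample string: none of + * ( ) is escaped, and the - needs no escape because it is the last character of the class:

```java
import java.util.Arrays;

public class CharClassEscaping {

    // Inside [...], '+', '*', '(' and ')' are ordinary characters;
    // '-' is literal here because it is the last character in the class.
    static String[] splitOnOperators(String s) {
        return s.split("[+*()-]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnOperators("1+2*3(4)5-6"))); // [1, 2, 3, 4, 5, 6]
    }
}
```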

Here is my tokenizer solution that allows for negative numbers (unary).
So far it has been doing everything I needed it to:
private static List<String> tokenize(String expression)
{
char c;
List<String> tokens = new ArrayList<String>();
String previousToken = null;
int i = 0;
while(i < expression.length())
{
c = expression.charAt(i);
StringBuilder currentToken = new StringBuilder();
if (c == ' ' || c == '\t') // Matched Whitespace - Skip Whitespace
{
i++;
}
else if (c == '-' && (previousToken == null || isOperator(previousToken)) &&
((i+1) < expression.length() && Character.isDigit(expression.charAt((i+1))))) // Matched Negative Number - Add token to list
{
currentToken.append(expression.charAt(i));
i++;
while(i < expression.length() && Character.isDigit(expression.charAt(i)))
{
currentToken.append(expression.charAt(i));
i++;
}
}
else if (Character.isDigit(c)) // Matched Number - Add to token list
{
while(i < expression.length() && Character.isDigit(expression.charAt(i)))
{
currentToken.append(expression.charAt(i));
i++;
}
}
else if (c == '+' || c == '*' || c == '/' || c == '^' || c == '-') // Matched Operator - Add to token list
{
currentToken.append(c);
i++;
}
else // No Match - Invalid Token!
{
i++;
}
if (currentToken.length() > 0)
{
tokens.add(currentToken.toString());
previousToken = currentToken.toString();
}
}
return tokens;
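}
// isOperator is called above but was not shown in the answer;
// this is an assumed minimal version covering the same operators as the tokenizer.
private static boolean isOperator(String token)
{
return token.equals("+") || token.equals("-") || token.equals("*")
|| token.equals("/") || token.equals("^");
}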

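You have to escape the '-' in Java; escaping the "()" is not strictly required inside a character class, but it does no harm.
myString.split("[0-9]+|[\\*\\+\\-^\\(\\)]");
Be aware that split() removes everything this pattern matches, and it matches every character of "578+223-5^2", so the result is an empty array; to keep the tokens, loop over the same pattern with Matcher.find() instead.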
