regex is not working in guava - java

I am using Guava 21.0 and trying to split a String by providing a regex (\\d).
However, I am not sure why it is not working.
If I change the separator to a plain string that is not a regex (e.g. "a"), it works fine.
Here is the code :
public class SplitWithRegex {
public static Iterable<String> splitByRegex(String string, String regex){
return Splitter.on(regex).trimResults().omitEmptyStrings().split(string);
}
public static void main(String[] args) {
Iterable<String> itr = splitByRegex("abc243gca87asas**78sassnb32snb1ss22220220", "\\d");
for(String s : itr){
System.out.println(s);
}
}
}
Result when regex is applied :
abc243gca87asas**78sassnb32snb1ss22220220
Any help would be appreciated.

You must use Splitter.onPattern("\\d+") and not Splitter.on("\\d+").
Here's the javadoc for Splitter's on method, this is what it says:
Returns a splitter that uses the given fixed string as a separator.
For example, Splitter.on(", ").split("foo, bar,baz") returns an
iterable containing ["foo", "bar,baz"].
So the separator is treated as a String literal and not as a regex, and hence it does not split the String as expected. If you want regex-based splitting, you can use String's split method or Splitter's onPattern method, e.g.:
String[] tokens = "abc243gca87asas**78sassnb32snb1ss22220220".split("\\d+");
for(String token : tokens){
System.out.println(token);
}
public static Iterable<String> splitByRegex(String string, String regex){
return Splitter.onPattern(regex).trimResults().omitEmptyStrings().split(string);
}
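For comparison, here is a minimal JDK-only sketch of the same behaviour (regex split, trim, drop empty pieces) without Guava. The class name RegexSplitSketch is made up for illustration; Pattern.splitAsStream assumes Java 8 or later.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class RegexSplitSketch {
    // Splits input on the given regex, trims each piece, and drops empty pieces,
    // roughly what trimResults().omitEmptyStrings() does on a Guava Splitter.
    static List<String> splitByRegex(String input, String regex) {
        return Pattern.compile(regex)
                .splitAsStream(input)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String s : splitByRegex("abc243gca87asas**78sassnb32snb1ss22220220", "\\d")) {
            System.out.println(s);
        }
    }
}
```

Because empty pieces are dropped, "\\d" and "\\d+" give the same tokens here.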


Most precise regex match

Is there a way to find the most precise regex for a string?
For example,
let's say I have two regexes:
1) .*bourne
2) .*ne
If I try to match Melbourne against the regexes above, it matches both.
But the more precise match is the first regex. In practice, the regexes can be much more complex.
Is there a way to find the most precise match?
The most "precise" match is the one where the regex needs to process less data until it finds a match, in this case, .*bourne.
Wouldn't sorting the patterns in descending order of length solve the problem?
For example, if Java is the language being used, something like the following should be fine, right? Just sort the patterns in descending order of length and return the first one that matches.
public class TestPattern {
public static void main(String args[]){
String text ="Melbourne";
System.out.println("Matching regex --> "+getMatchingRegex(text));
}
public static String getMatchingRegex(String text) {
ArrayList<String> patterns = new ArrayList<String>();
patterns.add(".*ne") ;
patterns.add(".*urne") ;
patterns.add(".*bourne") ;
patterns.add(".*rne") ;
Collections.sort(patterns, new StringComparator());
for(String pattern:patterns) {
if(Pattern.matches(pattern, text))
return pattern;
}
return "No Regex matched";
}
public static class StringComparator implements Comparator<String>
{
@Override
public int compare(String s1, String s2)
{
return s2.length()-s1.length();
}
}
}
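The same idea can be written more compactly with streams. This is only a sketch: the class and method names are hypothetical, it assumes Java 8 or later, and pattern length remains a crude stand-in for "precision" (a long pattern is not necessarily a more specific one).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

final class LongestPatternSketch {
    // Returns the longest pattern (by source length) that matches the whole text,
    // or an empty Optional when none of the patterns match.
    static Optional<String> longestMatchingPattern(List<String> patterns, String text) {
        return patterns.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .filter(p -> Pattern.matches(p, text))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList(".*ne", ".*urne", ".*bourne", ".*rne");
        System.out.println(longestMatchingPattern(patterns, "Melbourne").orElse("No regex matched"));
    }
}
```

Returning an Optional avoids mixing a sentinel message such as "No Regex matched" with real pattern strings.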

Remove part of string after or before a specific word in java

Is there a command in Java to remove the rest of the string after or before a certain word?
Example:
Remove substring before the word "taken"
before:
"I need this words removed taken please"
after:
"taken please"
Strings are immutable; you can, however, find the word and create a substring:
public static String removeTillWord(String input, String word) {
return input.substring(input.indexOf(word));
}
removeTillWord("I need this words removed taken please", "taken");
There is an Apache Commons Lang class, StringUtils, that contains exactly what you want:
e.g. public static String substringBefore(String str, String separator)
public static String foo(String str, String remove) {
return str.substring(str.indexOf(remove));
}
Clean way to safely remove until a string
String input = "I need this words removed taken please";
String token = "taken";
String result = input.contains(token)
? token + StringUtils.substringAfter(input, token)
: input;
Apache StringUtils functions are null-safe, empty-safe, and no-match-safe.
Since OP provided clear requirements
Remove the rest of the string after or before a certain word
and nobody has fulfilled those yet, here is my approach to the problem. There are certain rules to the implementation, but overall it should satisfy OP's needs, if he or she comes to revisit the question.
public static String remove(String input, String separator, boolean before) {
Objects.requireNonNull(input);
Objects.requireNonNull(separator);
if (input.trim().equals(separator)) {
return separator;
}
if (separator.isEmpty() || input.trim().isEmpty()) {
return input;
}
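// Quote the separator so it is matched literally, not as a regex
// (requires java.util.regex.Pattern); limit 2 splits at the first occurrence only.
String[] tokens = input.split(Pattern.quote(separator), 2);
if (tokens.length < 2) {
return input; // separator not found, nothing to remove
}
String target;
if (before) {
target = tokens[0];
} else {
target = tokens[1];
}
return input.replace(target, "");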
}
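For a literal word, a plain indexOf-based version avoids both regex quoting and the exception that substring throws when the word is missing. The following is a minimal sketch under that assumption; the class and method names are invented for illustration, and both methods keep the word itself and act on its first occurrence.

```java
final class TrimAroundWordSketch {
    // Drops everything before the first occurrence of word; returns input unchanged if word is absent.
    static String removeBefore(String input, String word) {
        int index = input.indexOf(word);
        return index < 0 ? input : input.substring(index);
    }

    // Drops everything after the first occurrence of word; returns input unchanged if word is absent.
    static String removeAfter(String input, String word) {
        int index = input.indexOf(word);
        return index < 0 ? input : input.substring(0, index + word.length());
    }

    public static void main(String[] args) {
        String input = "I need this words removed taken please";
        System.out.println(removeBefore(input, "taken"));
        System.out.println(removeAfter(input, "taken"));
    }
}
```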

Returning the nth Token

I'm very new at Java and I have a question about a summer assignment. These are the instructions:
Write a class called SpecialToken that has a static method called thirdToken. This
method should return as a String, the third token of a String that you pass as a parameter.
You may assume that spaces will serve as delimiters.
This is what I have so far but honestly I am stumped at what the parameter should be and how to return the third token! I was thinking I could do something like nextToken() until the third.
public class SpecialToken {
public static String thirdToken() {
}
}
Try something like
public class SpecialToken {
public static String thirdToken(String str) {
String[] splited = str.split(" ");
return splited[2];
}
}
Also see this tutorial or try searching google for "java split string into array by space"
Also note, as Betlista said, this does not have any error checking, so if the passed string has fewer than three tokens (for example, two tokens delimited by one space), you will get an ArrayIndexOutOfBoundsException.
Another way would be to "Use StringTokenizer to tokenize the string. Import java.util.StringTokenizer. Then create a new instance of a StringTokenizer with the string to tokenize and the delimiter as parameters. If you do not enter the delimiter as a parameter, the delimiter will automatically default to white space. After you have the StringTokenizer, you can use the nextToken() method to get each token. " via Wikihow
With this method, your code should look something like this:
public class SpecialToken {
public static String thirdToken(String str) {
StringTokenizer tok = new StringTokenizer(str); // If you do not enter the delimiter as a parameter, the delimiter will automatically default to white space
int n = tok.countTokens();
if (n < 3) {return "";}
tok.nextToken();
tok.nextToken();
return tok.nextToken();
}
}
However keep in mind Wikihow's warning "now, the use of StringTokenizer is discouraged and the use of the split() method in the String class or the use of the java.util.regex package is encouraged."
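Following that advice, here is a sketch of a split-based version with bounds checking. The nthToken helper and its class name are hypothetical, and returning an empty string when there are too few tokens is an assumption carried over from the StringTokenizer example, not part of the assignment.

```java
final class NthTokenSketch {
    // Returns the n-th whitespace-delimited token (1-based), or "" if there are fewer than n tokens.
    static String nthToken(String str, int n) {
        // trim() avoids a leading empty token; "\\s+" treats runs of whitespace as one delimiter.
        String[] tokens = str.trim().split("\\s+");
        if (n < 1 || n > tokens.length) {
            return "";
        }
        return tokens[n - 1];
    }

    static String thirdToken(String str) {
        return nthToken(str, 3);
    }

    public static void main(String[] args) {
        System.out.println(thirdToken("one two three four"));
    }
}
```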

contains with collator

I have to test whether a string is included in another one but without considering case or accents (French accents in this case).
For example the function must return true if I search for "rhone" in the string "Vallée du Rhône".
The Collator is useful for string comparison with accents but does not provide a contains function.
Is there an easy way to do the job? A regex, maybe?
Additional information:
I just need a true / false return value, I don't care about number of matches or position of the test string in the reference string.
You can use Normalizer to reduce strings to stripped-down versions that you can compare directly.
Edit: to be clear
String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
String ascii = normalized.replaceAll("[^\\p{ASCII}]", "");
Have a look at Normalizer.
You should call it with Normalizer.Form.NFD as your second argument.
So, that would be:
Normalizer.normalize(yourinput, Normalizer.Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
.toLowerCase()
.contains(yoursearchstring)
which will return true on a match (and, of course, false otherwise), provided yoursearchstring is itself already lowercase and free of accents
How about this?
private static final Pattern ACCENTS_PATTERN = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
public static boolean containsIgnoreCaseAndAccents(String haystack, String needle) {
final String hsToCompare = removeAccents(haystack).toLowerCase();
final String nToCompare = removeAccents(needle).toLowerCase();
return hsToCompare.contains(nToCompare);
}
public static String removeAccents(String string) {
return ACCENTS_PATTERN.matcher(Normalizer.normalize(string, Normalizer.Form.NFD)).replaceAll("");
}
public static void main(String[] args) {
System.out.println(removeAccents("Vallée du Rhône"));
System.out.println(removeAccents("rhone"));
System.out.println(containsIgnoreCaseAndAccents("Vallée du Rhône", "rhone"));
}
The normal way to do this is to convert both strings to lowercase without accents, and then use the standard 'contains'.

Can I decorate Joiner class of Guava

I have a List<String> and we are using Joiner to get the comma-separated representation of that List, but now we need a small enhancement: we need to capitalize the values in the List. The code so far was:
String str = Joiner.on(',').skipNulls().join(myValueList);
Since I now need to capitalize the Strings in the List, I would have to iterate over it first to capitalize them and then pass the result to Joiner. I don't think this is a good approach, as it iterates over the List twice: once to capitalize, and once more when Joiner joins.
Is there any other utility method that I'm missing which can do this in one iteration?
How would you do it with Guava?
You can use Iterables.transform()
Iterable<String> upperStrings = Iterables.transform(myValueList, new Function<String,String>() {
public String apply(String input) {
// any transformation possible here.
return (input == null) ? null : input.toUpperCase();
}
});
String str = Joiner.on(',').skipNulls().join(upperStrings);
About Joachim Sauer's answer:
it can be made a lot less verbose if you move the Function to a place where it can be re-used, in Guava the typical scenario would be to use an enum:
public enum StringTransformations implements Function<String, String>{
LOWERCASE{
@Override
protected String process(final String input){
return input.toLowerCase();
}
},
UPPERCASE{
@Override
protected String process(final String input){
return input.toUpperCase();
}
}
// possibly more transformations here
;
@Override
public String apply(final String input){
return input == null ? null : process(input);
}
protected abstract String process(String input);
}
Now the client code looks like this:
String str =
Joiner
.on(',')
.skipNulls()
.join(
Iterables.transform(myValueList,
StringTransformations.UPPERCASE));
Which I'd call much more readable.
Of course it would be even better (in terms of both memory usage and performance) if you introduced a constant for your Joiner:
private static final Joiner COMMA_JOINER = Joiner.on(',').skipNulls();
// ...
String str = COMMA_JOINER.join(
Iterables.transform(myValueList,
StringTransformations.UPPERCASE));
How about the following?
Joiner.on(',').skipNulls().join(myValueList).toUpperCase()
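If Guava is not a hard requirement, the same single-pass result can be sketched with plain Java 8 streams instead of Joiner. This swaps in a different technique from the answers above; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class UpperJoinSketch {
    // Skips nulls, upper-cases each value, and joins with commas in one lazy pass over the list.
    static String joinUpper(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .map(String::toUpperCase)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(joinUpper(Arrays.asList("a", null, "b")));
    }
}
```

The stream is evaluated element by element, so the list is traversed once, much like the lazy view returned by Iterables.transform.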
