contains with collator

contains with collator - java

I have to test whether a string is included in another one but without considering case or accents (French accents in this case).
For example the function must return true if I search for "rhone" in the string "Vallée du Rhône".
The Collator is useful for string comparison with accents but does not provide a contains function.
Is there an easy way to do the job? A regex, maybe?
Additional information:
I just need a true / false return value, I don't care about number of matches or position of the test string in the reference string.

You can use Normalizer to reduce strings to stripped-down versions that you can compare directly.
Edit: to be clear
String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
String ascii = normalized.replaceAll("[^\\p{ASCII}]", "");

Have a look at Normalizer.
You should call it with Normalizer.Form.NFD as your second argument.
So, that would be:
Normalizer.normalize(yourinput, Normalizer.Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
.toLowerCase()
.contains(yoursearchstring)
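which returns true on a match (and false otherwise), provided yoursearchstring is itself already lowercase and free of accents; otherwise apply the same normalization to it first.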

How about this?
private static final Pattern ACCENTS_PATTERN = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
public static boolean containsIgnoreCaseAndAccents(String haystack, String needle) {
final String hsToCompare = removeAccents(haystack).toLowerCase();
final String nToCompare = removeAccents(needle).toLowerCase();
return hsToCompare.contains(nToCompare);
}
public static String removeAccents(String string) {
return ACCENTS_PATTERN.matcher(Normalizer.normalize(string, Normalizer.Form.NFD)).replaceAll("");
}
public static void main(String[] args) {
System.out.println(removeAccents("Vallée du Rhône"));
System.out.println(removeAccents("rhone"));
System.out.println(containsIgnoreCaseAndAccents("Vallée du Rhône", "rhone"));
}

The normal way to do this is to convert both strings to lowercase without accents, and then use the standard 'contains'.
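Since the question asks about Collator specifically, here is a minimal sketch of a contains check built on Collator itself rather than on Normalizer. The class and method names (CollatorContains, containsPrimary) are made up for illustration. It assumes a match occupies the same number of chars as the search string, which holds for precomposed accented letters such as "ô" but not for decomposed input, and the collator's own rules may also treat some characters as ignorable.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of the JDK.
final class CollatorContains {
    private CollatorContains() {}

    // Returns true if some window of haystack compares equal to needle
    // at PRIMARY strength, i.e. ignoring accent and case differences.
    static boolean containsPrimary(String haystack, String needle) {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        collator.setStrength(Collator.PRIMARY);
        int n = needle.length();
        // Assumption: the matching window has the same length as the needle.
        for (int i = 0; i + n <= haystack.length(); i++) {
            if (collator.compare(haystack.substring(i, i + n), needle) == 0) {
                return true;
            }
        }
        return false;
    }
}
```

This compares every window, so it is slower than normalizing once and calling contains; it is mainly useful when the comparison rules must come from a locale-specific Collator.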

Related

regex is not working in guava

I am using Guava 21.0 and trying to split a String by providing a regex (\\d).
However, I am not sure why it is not working.
If I change the regex to a plain literal (e.g. "a"), then it works fine.
Here is the code :
public class SplitWithRegex {
public static Iterable<String> splitByRegex(String string, String regex){
return Splitter.on(regex).trimResults().omitEmptyStrings().split(string);
}
public static void main(String[] args) {
Iterable<String> itr = splitByRegex("abc243gca87asas**78sassnb32snb1ss22220220", "\\d");
for(String s : itr){
System.out.println(s);
}
}
}
Result when regex is applied :
abc243gca87asas**78sassnb32snb1ss22220220
Any help would be appreciated.

You must use Splitter.onPattern("\\d+") and not Splitter.on("\\d+").
Here's the javadoc for Splitter's on method, this is what it says:
Returns a splitter that uses the given fixed string as a separator.
For example, Splitter.on(", ").split("foo, bar,baz") returns an
iterable containing ["foo", "bar,baz"].
So the separator is treated as a String literal and not as a regex, and hence it does not split the String as expected. If you want regex-based splitting, you can use String's split method or Splitter's onPattern method, e.g.:
String[] tokens = "abc243gca87asas**78sassnb32snb1ss22220220".split("\\d+");
for(String token : tokens){
System.out.println(token);
}
public static Iterable<String> splitByRegex(String string, String regex){
return Splitter.onPattern(regex).trimResults().omitEmptyStrings().split(string);
}
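For comparison, roughly the same behaviour (split on a regex, trim each piece, drop empty pieces) can be sketched with the JDK alone. The class and method names here (RegexSplit, splitOnDigits) are invented for the example; only Pattern.splitAsStream and the stream operations are standard API.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper mirroring onPattern(...).trimResults().omitEmptyStrings().
final class RegexSplit {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    private RegexSplit() {}

    static List<String> splitOnDigits(String input) {
        return DIGITS.splitAsStream(input)
                .map(String::trim)              // like trimResults()
                .filter(s -> !s.isEmpty())      // like omitEmptyStrings()
                .collect(Collectors.toList());
    }
}
```

Calling RegexSplit.splitOnDigits("abc243gca87asas**78sassnb32snb1ss22220220") yields the pieces abc, gca, asas**, sassnb, snb and ss.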

Most precise regex match

Is there a way to find the most precise regex for a string?
For example, let's say I have two regexes:
1) .*bourne
2) .*ne
If I try to match Melbourne against the above regexes, it will match both.
But the more precise match is the first regex. Similarly, there can be very complex regexes.
Is there a way to find the most precise match?

Is there a way to find the most precise match?
The most "precise" match is the one where the regex needs to process the least data before it finds a match; in this case, .*bourne.

Wouldn't sorting the patterns in descending order of length solve the problem?
For example, if Java is the language being used, something like the following should be fine (sort the patterns in descending order of length, then return the first one that matches):
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.regex.Pattern;

public class TestPattern {
    public static void main(String args[]) {
        String text = "Melbourne";
        System.out.println("Matching regex --> " + getMatchingRegex(text));
    }

    public static String getMatchingRegex(String text) {
        ArrayList<String> patterns = new ArrayList<String>();
        patterns.add(".*ne");
        patterns.add(".*urne");
        patterns.add(".*bourne");
        patterns.add(".*rne");
        // Longest pattern first, so the first match is the longest matching pattern
        Collections.sort(patterns, new StringComparator());
        for (String pattern : patterns) {
            if (Pattern.matches(pattern, text))
                return pattern;
        }
        return "No Regex matched";
    }

    public static class StringComparator implements Comparator<String> {
        @Override
        public int compare(String s1, String s2) {
            return s2.length() - s1.length();
        }
    }
}
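The same longest-pattern-first idea can be written more compactly with Comparator.comparingInt and a stream. This is only a sketch: the names LongestPatternFirst and firstMatch are invented here, and pattern length is just a rough proxy for "precision" (a long pattern full of wildcards can be less specific than a short literal one).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper; length is used as a heuristic for specificity.
final class LongestPatternFirst {
    private LongestPatternFirst() {}

    static Optional<String> firstMatch(List<String> patterns, String text) {
        List<String> sorted = new ArrayList<>(patterns);   // leave the caller's list untouched
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        return sorted.stream()
                .filter(p -> Pattern.matches(p, text))     // whole-string match, as in the answer above
                .findFirst();
    }
}
```

Returning an Optional avoids the sentinel string "No Regex matched", which could otherwise be confused with a real pattern.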

Remove part of string after or before a specific word in java

Is there a command in Java to remove the rest of a string after or before a certain word?
Example:
Remove substring before the word "taken"
before:
"I need this words removed taken please"
after:
"taken please"

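Strings are immutable; you can, however, find the word and create a substring:
public static String removeTillWord(String input, String word) {
    int index = input.indexOf(word);
    // indexOf returns -1 when the word is absent; return the input unchanged in that case
    return index >= 0 ? input.substring(index) : input;
}
removeTillWord("I need this words removed taken please", "taken");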

The Apache Commons Lang class StringUtils contains exactly what you want, e.g.:
public static String substringBefore(String str, String separator)

public static String foo(String str, String remove) {
return str.substring(str.indexOf(remove));
}

A clean way to safely remove everything up to a given string:
String input = "I need this words removed taken please";
String token = "taken";
String result = input.contains(token)
        ? token + StringUtils.substringAfter(input, token)
        : input;
Apache StringUtils functions are null-, empty-, and no match- safe

Since OP provided clear requirements
Remove the rest of the string after or before a certain word
and nobody has fulfilled those yet, here is my approach to the problem. There are certain rules to the implementation, but overall it should satisfy OP's needs, if he or she comes to revisit the question.
public static String remove(String input, String separator, boolean before) {
Objects.requireNonNull(input);
Objects.requireNonNull(separator);
if (input.trim().equals(separator)) {
return separator;
}
if (separator.isEmpty() || input.trim().isEmpty()) {
return input;
}
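// Quote the separator so it is not interpreted as a regex (needs java.util.regex.Pattern)
String[] tokens = input.split(Pattern.quote(separator), 2);
if (tokens.length < 2) {
    // separator not found: nothing to remove
    return input;
}
String target = before ? tokens[0] : tokens[1];
return input.replace(target, "");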
}
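Both directions can also be sketched with indexOf and substring only, which avoids regex interpretation of the word and handles the "word not found" case explicitly. The class and method names (WordTrim, keepFrom, keepThrough) are invented for this example, and returning the input unchanged when the word is absent is an assumption about the desired behaviour.

```java
// Hypothetical helper, JDK only.
final class WordTrim {
    private WordTrim() {}

    // Removes everything before the first occurrence of word.
    static String keepFrom(String input, String word) {
        int i = input.indexOf(word);
        return i < 0 ? input : input.substring(i);
    }

    // Removes everything after the first occurrence of word.
    static String keepThrough(String input, String word) {
        int i = input.indexOf(word);
        return i < 0 ? input : input.substring(0, i + word.length());
    }
}
```

For the example in the question, WordTrim.keepFrom("I need this words removed taken please", "taken") gives "taken please".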

Remove request parameter from query string

I have a query string that could be:
/fr/hello?language=en
or
/fr/welcome?param1=222&param2=aloa&language=en
or
/it/welcome?param1=222&language=en&param2=aa
I would like to remove from each query string the parameter language with its value, therefore the results would be:
/fr/hello
and
/fr/welcome?param1=222&param2=aloa
and
/it/welcome?param1=222&param2=aa
EDIT: The length of the value of the parameter could be more than 2
Does anybody know any good regex expression to use in String.replaceAll([regex],[replace]) ?

Use the regex below and replace the matched strings with an empty string:
[&?]language.*?(?=&|\?|$)
Example code:
String s1 = "/fr/welcome?param1=222&param2=aloa&language=en";
String s2 = "/fr/welcome?language=en";
String s3 = "/fr/welcome?param1=222&language=en&param2=aa";
String m1 = s1.replaceAll("[&?]language.*?(?=&|\\?|$)", "");
String m2 = s2.replaceAll("[&?]language.*?(?=&|\\?|$)", "");
String m3 = s3.replaceAll("[&?]language.*?(?=&|\\?|$)", "");
System.out.println(m1);
System.out.println(m2);
System.out.println(m3);
Output:
/fr/welcome?param1=222&param2=aloa
/fr/welcome
/fr/welcome?param1=222&param2=aa

You could use regex with replaceAll()
public static void main(String[] args) {
String s1 = "/fr/welcome?language=en";
String s2 = "/fr/welcome?param1=222&param2=aloa&language=en";
String s3 = "/fr/welcome?param1=222&language=en&param2=aa";
String pattern = "[?&]language=.{2}"; // matches exactly 2 characters after "language="; use "[?&]language=\\w+" for longer values
System.out.println(s1.replaceAll(pattern, ""));
System.out.println(s2.replaceAll(pattern, ""));
System.out.println(s3.replaceAll(pattern, ""));
}
Output:
/fr/welcome
/fr/welcome?param1=222&param2=aloa
/fr/welcome?param1=222&param2=aa

This regexp should help you:
"language=\\w{2}"

I would like to remove from each query string the parameter language
with its value,...
You can use replaceAll.
String s="/fr/welcome?language=en";
s=s.replaceAll("(\\?|&)language=\\w+", "");
(\\?|&) group will match ? or &
\\w+ will match one or more word character

This removes the parameter correctly wherever it appears, even when it is the first of several (for example "/fr/welcome?language=en&param1=222&param2=aloa"):
public String removeParamFromUrl(final String url, final String param) {
if (StringUtils.isNotBlank(url)) {
return url.replaceAll("&" + param + "=[^&]+", "")
.replaceAll("\\?" + param + "=[^&]+&", "?")
.replaceAll("\\?" + param + "=[^&]+", "");
} else {
return url;
}
}

Rather than using a regex, it may be better to use a dedicated URI-manipulation API to remove the query parameter. The Spring UriComponentsBuilder class can be used to remove the given query parameter, retaining the rest. I'm assuming a Spring-specific solution is acceptable, as this question is tagged with spring.
private static String removeQueryParam(String url) {
return UriComponentsBuilder.fromUriString(url)
.replaceQueryParam("language")
.build()
.toUriString();
}
From the question as asked, it's unclear why or whether a regex-based solution using String.replaceAll is necessary, or whether instead any Java or Spring-based solution would be acceptable. In other words, this may be an XY problem where the goal is to remove the "language" query parameter while retaining all other query parameters, and there's no particular reason a regex needs to be involved in the solution.
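If Spring is not available, the same non-regex idea can be sketched with the JDK alone: split the query on "&", drop the unwanted parameter, and join the rest. The names QueryParams and removeParam are invented for this example, and the sketch assumes the URL has no "#fragment" part and that parameter names are compared without URL decoding.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, JDK only. Assumes no fragment and no encoded names.
final class QueryParams {
    private QueryParams() {}

    static String removeParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url;                       // no query string at all
        }
        String path = url.substring(0, q);
        String kept = Arrays.stream(url.substring(q + 1).split("&"))
                .filter(p -> !p.isEmpty())
                .filter(p -> !p.equals(name) && !p.startsWith(name + "="))
                .collect(Collectors.joining("&"));
        // Drop the '?' as well when nothing is left
        return kept.isEmpty() ? path : path + "?" + kept;
    }
}
```

Because the parameters are rebuilt from a list, the result is well formed whether language is the first, middle, last or only parameter.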

alternate method for using substring on a String

I have a string which contains an underscore as shown below:
123445_Lisick
I want to remove all the characters from the String after the underscore. I have tried the code below, it's working, but is there any other way to do this, as I need to put this logic inside a for loop to extract elements from an ArrayList.
public class Test {
public static void main(String args[]) throws Exception {
String str = "123445_Lisick";
int a = str.indexOf("_");
String modfiedstr = str.substring(0, a);
System.out.println(modfiedstr);
}
}

Another way is to use the split method.
String str = "123445_Lisick";
String[] parts = str.split("_");
String modfiedstr = parts[0];
I don't think that really buys you anything though. There's really nothing wrong with the method you're using.

Your method is fine. Though not explicitly stated in the API documentation, I feel it's safe to assume that indexOf(char) will run in O(n) time. Since your string is unordered and you don't know the location of the underscore a priori, you cannot avoid this linear search time. Once you have completed the search, extraction of the substring will be needed for future processing. It's generally safe to assume that, for simple operations like this, the library functions of a reasonably mature language will have been optimized.
Note, however, that you are making two implicit assumptions:
an underscore will exist within the String
if there is more than one underscore in the string, everything from the first one onward should be dropped from the output
If either of these assumptions will not always hold, you will need to make adjustments to handle those situations. In either case, you should at least defensively check for a -1 returned from indexOf(char), indicating that '_' is not in the string. Assuming the entire String is desired in this situation, you could use something like this:
public static String stringBefore(String source, char delim) {
    if (source == null) return null;
    int index = source.indexOf(delim);
    // Keep only the part before the delimiter; keep everything if it is absent
    return (index >= 0) ? source.substring(0, index) : source;
}

You could also use something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
public static void main(String[] args) {
String str = "123445_Lisick";
Pattern pattern = Pattern.compile("^([^_]*).*");
Matcher matcher = pattern.matcher(str);
String modfiedstr = null;
if (matcher.find()) {
modfiedstr = matcher.group(1);
}
System.out.println(modfiedstr);
}
}
The regex captures everything from the start of the input string up to (but not including) the first _ character.
However, as @Bill the Lizard wrote, I don't think there is anything wrong with the way you are doing it now. I would do it the same way.
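Since the question mentions applying this to the elements of an ArrayList, here is a small sketch that wraps the indexOf/substring approach in a helper and maps it over a list. The names UnderscorePrefix, beforeUnderscore and prefixes are invented for the example, and keeping a string unchanged when it has no underscore is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, JDK only.
final class UnderscorePrefix {
    private UnderscorePrefix() {}

    // Returns the part before the first '_' (the whole string if there is none).
    static String beforeUnderscore(String s) {
        int i = s.indexOf('_');
        return i < 0 ? s : s.substring(0, i);
    }

    // Applies beforeUnderscore to every element, returning a new list.
    static List<String> prefixes(List<String> values) {
        return values.stream()
                .map(UnderscorePrefix::beforeUnderscore)
                .collect(Collectors.toList());
    }
}
```

A plain for loop calling beforeUnderscore on each element would work just as well; the stream is only a matter of taste.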
