Using regex to define masks for numbers in Java

I am trying to define a set of rules that compute a mask based on the number given. For example, I want to return the mask 8472952424 for any number that starts with 12, 13 or 14, or 847235XXXX for any number that starts with 7 or 8.
The input numbers are 4-digit integers and the return value is a String. Do I need to convert the integers to strings before applying the regex to them? I am also not sure how to construct the expressions.
Edit
I have too many criteria to handle with a separate if statement for each case. I am matching extension numbers to masks so that they can be inserted correctly into the Cisco CallManager database (in case you are curious).
Edit
This is what I have done for one of the cases but this is still not matching correctly:
public String lookupMask(int ext){
//convert to String
StringBuilder sb = new StringBuilder();
sb.append(ext);
String extString = sb.toString();
//compile and match pattern
Pattern p = Pattern.compile("^[12|13|14|15|17|19|42]");
Matcher m = p.matcher(extString);
if(m.matches()){
return "8472952424";
}
return null;
}

An example with Pattern could be this:
package test;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
public class Main {
// working Pattern
private static final Pattern PATTERN = Pattern.compile("^((1[234579])|42)");
// Your pattern won't work: although it anchors at the start of the input,
// alternation is not available inside a character class, so sequences such as
// "12" have to go inside a group with round brackets instead.
// Inside the class, "|" is just a literal character, so the class matches any
// single one of 1, 2, 3, 4, 5, 7, 9 or "|" as the first character.
private static final Pattern NON_WORKING_PATTERN = Pattern.compile("^[12|13|14|15|17|19|42]");
private static final String STARTS_WITH_1_234 = "8472952424";
private static final String STARTS_WITH_ANYTHING_ELSE = "847295XXXX";
public static void main(String[] args) {
// NON_WORKING_PATTERN "works" on "33333"
System.out.println(NON_WORKING_PATTERN.matcher("33333").find());
int[] testIntegers = new int[]{1200, 1300, 1400, 1500, 1700, 1900, 4200, 0000};
List<String> results = new ArrayList<String>();
for (int test: testIntegers) {
if (PATTERN.matcher(String.valueOf(test)).find()) {
results.add(STARTS_WITH_1_234);
}
else {
results.add(STARTS_WITH_ANYTHING_ELSE);
}
}
System.out.println(results);
}
}
Output:
true
[8472952424, 8472952424, 8472952424, 8472952424, 8472952424, 8472952424, 8472952424, 847295XXXX]
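Since the question mentions having too many criteria for separate if statements, one possible arrangement is an ordered table of prefix patterns and masks, returning the first mask whose pattern matches the start of the extension. The sketch below uses Matcher.lookingAt(), which anchors at the start but, unlike matches(), does not require the whole input to match. The class name MaskLookup and the rule table are assumptions for illustration; only the two rules from the question are listed, and the mask 847235XXXX is copied as written there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class MaskLookup {
    // Ordered rule table (hypothetical): the first pattern that matches
    // the start of the extension wins.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("1[234579]|42"), "8472952424");
        RULES.put(Pattern.compile("[78]"), "847235XXXX");
    }

    static String lookupMask(int ext) {
        String extString = String.valueOf(ext);
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            // lookingAt() matches a prefix of the input; trailing digits are ignored
            if (rule.getKey().matcher(extString).lookingAt()) {
                return rule.getValue();
            }
        }
        return null; // no rule applies
    }

    public static void main(String[] args) {
        System.out.println(lookupMask(1234)); // 8472952424
        System.out.println(lookupMask(7001)); // 847235XXXX
        System.out.println(lookupMask(3333)); // null
    }
}
```

Adding a case then means adding one entry to the table rather than another branch.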

Related

Java function to parse all doubles from string

I know this has been asked before, but the responses don't seem to cover all corner cases.
I tried implementing the suggestion with the test case
String("Doubles -1.0, 0, 1, 1.12345 and 2.50")
Which should return
[-1, 0, 1, 1.12345, 2.50]:
import java.util.Scanner;
import java.util.ArrayList;
import java.util.Locale;
public class Main
{
public static void main(String[] args) {
String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
System.out.println(string);
ArrayList<Double> doubles = getDoublesFromString(string);
System.out.println(doubles);
}
public static ArrayList<Double> getDoublesFromString(String string){
Scanner parser = new Scanner(string);
parser.useLocale(Locale.US);
ArrayList<Double> doubles = new ArrayList<Double>();
double currentDouble;
while (parser.hasNext()){
if(parser.hasNextDouble()){
currentDouble = parser.nextDouble();
doubles.add(currentDouble);
}
else {
parser.next();
}
}
parser.close();
return doubles;
}
}
Instead, the code above returns [1.12345, 2.5].
Did I implement it wrong? What's the fix for catching negatives and zeros?
I would use a regex find all approach here:
String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
List<String> nums = new ArrayList<>();
String pattern = "-?\\d+(?:\\.\\d+)?";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(string);
while (m.find()) {
nums.add(m.group());
}
System.out.println(nums); // [-1.0, 0, 1, 1.12345, 2.50]
By the way, your question makes use of the String constructor, which is seldom used, but is interesting to see, especially for those of us who never use it.
Here is an explanation of the regex pattern:
-? match an optional leading negative sign
\\d+ match a whole number
(?:\\.\\d+)? match an optional decimal component
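The question asks for a list of Double values rather than strings, so the same pattern could be combined with Double.parseDouble. This is a minimal sketch; the class name DoubleExtractor is made up for illustration, and the method name is borrowed from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubleExtractor {
    // Optional minus sign, digits, optional fractional part
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            // every match is a plain decimal literal, so parseDouble accepts it
            doubles.add(Double.parseDouble(m.group()));
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

Note that this pattern does not cover forms such as .5 or 5.6E2.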
For your specific example, adding this at the construction of the scanner is sufficient: parser.useDelimiter("\\s|,");
The problem in your code is that tokens containing a comma are not recognized as valid doubles. The line above configures the scanner to treat commas, as well as whitespace characters, as token delimiters, so the comma is no longer part of the token and the token parses successfully as a double.
I believe this is the most appropriate solution because matching all doubles is actually complex. Below, I have pasted the regex that Scanner uses for this, to show how complicated it really is. Compared to splitting the string and then using Double.parseDouble, this approach is similar but involves less custom code and, more importantly, no exception throwing, which is slow.
(([-+]?((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(\Q-\E((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?))|[-+]?0[xX][0-9a-fA-F].[0-9a-fA-F]+([pP][-+]?[0-9]+)?|(([-+]?(NaN|\QNaN\E|Infinity|\Q∞\E))|((NaN|\QNaN\E|Infinity|\Q∞\E))|(\Q-\E(NaN|\QNaN\E|Infinity|\Q∞\E)))
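Applied to the method from the question, the delimiter change might look like the sketch below. It uses the variation "[\\s,]+" rather than "\\s|,", an assumption made here so that a comma followed by a space counts as one delimiter and no empty tokens are produced. The class name ScannerDoubles is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class ScannerDoubles {
    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        try (Scanner parser = new Scanner(input)) {
            parser.useLocale(Locale.US);
            // Treat any run of whitespace and commas as a single delimiter
            parser.useDelimiter("[\\s,]+");
            while (parser.hasNext()) {
                if (parser.hasNextDouble()) {
                    doubles.add(parser.nextDouble());
                } else {
                    parser.next(); // skip tokens such as "Doubles" and "and"
                }
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

One trade-off: because commas are delimiters here, a grouped number such as 1,234.5 would be read as two values, 1 and 234.5.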
First of all: I would use the regex solution, too… It's better and the following is just an alternative using split and replace/replaceAll while catching Exceptions:
public static void main(String[] args) {
// input
String s = "Doubles -1.0, 0, 1, 1.12345 and 2.50";
// split by whitespace(s) (keep in mind the commas will stay)
String[] parts = s.split("\\s+");
// create a collection to store the Doubles
List<Double> nums = new ArrayList<>();
// stream the result of the split operation and
Arrays.stream(parts).forEach(p -> {
// try to…
try {
// replace all commas and parse the value
nums.add(Double.parseDouble(p.replaceAll(",", "")));
} catch (Exception e) {
// which won't work for words like "Doubles", so print an error on those
System.err.println("Could not parse \"" + p + "\"");
}
});
// finally print all successfully parsed Double values
nums.forEach(System.out::println);
}
Output:
Could not parse "Doubles"
Could not parse "and"
-1.0
0.0
1.0
1.12345
2.5

Regex for a line with doubles

I am fairly new to programming and regex is very confusing. I am trying to identify a data line that consists of 3 doubles with spaces in between for example:
500.00 56.48 500.00
I have tried this:
data.matches("^[0-9]+\\.[0-9]+\\s[0-9]+\\.[0-9]+\\s[0-9]+\\.[0-9]+$")
But this doesn't recognize the line. What am I doing wrong?
Don't do it the way you have tried.
Although the regex pattern you have used works for the numbers you have used, it will fail for a wide range of numbers e.g. .5 or 5.6E2 which are also double numbers.
Given below is the demo with your data and pattern:
public class Main {
public static void main(String[] args) {
String data = "500.00 56.48 500.00";
System.out.println(data.matches("^[0-9]+\\.[0-9]+\\s[0-9]+\\.[0-9]+\\s[0-9]+\\.[0-9]+$"));
}
}
Output:
true
However, it will fail to give you the expected result in the following case:
public class Main {
public static void main(String[] args) {
String data = ".5 5.6E2 500.00";
System.out.println(data.matches("^[0-9]+\\.[0-9]+\\s[0-9]+\\.[0-9]+\\s[0-9]+\\.[0-9]+$"));
}
}
Output:
false
Even though .5 and 5.6E2 are valid double numbers, your pattern failed to recognize them.
The recommended way:
You should split the data line on whitespace and try to parse each number using Double#parseDouble e.g.
public class Main {
public static void main(String[] args) {
String data = "500.00 56.48 500.00";
System.out.println(matches(data));
data = ".5 5.6E2 500.00";
System.out.println(matches(data));
data = ".5 500.00";
System.out.println(matches(data));
data = ".5 abc 500.00";
System.out.println(matches(data));
}
static boolean matches(String data) {
String[] nums = data.split("\\s+");
boolean match = true;
if (nums.length == 3) {
for (String num : nums) {
try {
Double.parseDouble(num);
} catch (NumberFormatException e) {
match = false;
break;
}
}
} else {
match = false;
}
return match;
}
}
Output:
true
true
false
false
Improve your regex by observing a few things:
[0-9] is the same as \d
you're looking for the same pattern, thrice
So, let's do that:
three times:
one or more digits, optionally followed by
a period and then one or more digits, optionally followed by
white space
Which means:
(...){3} where ... is:
\d+, optionally followed by
(\.\d+)? (i.e. zero-or-once), optionally followed by
\s* (zero-or-more)
Putting that all together, and remembering to use proper string escaping:
data.matches("^(\\d+(\\.\\d+)?\\s*){3}$")
You can see this working over on https://regex101.com/r/PGxAm9/1, and keeping regex101 bookmarked for future debugging is highly recommended.
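One caveat with that pattern: because \s* allows zero whitespace, a single run of digits such as 123 also matches, split as 1, 2 and 3. A stricter sketch, assuming the three numbers must be separated by at least one whitespace character, could look like this (the class and method names are made up for illustration):

```java
class ThreeDoublesLine {
    // One number, then exactly two more, each preceded by at least one whitespace character
    private static final String THREE_NUMBERS = "^\\d+(\\.\\d+)?(\\s+\\d+(\\.\\d+)?){2}$";

    static boolean isDataLine(String line) {
        return line.matches(THREE_NUMBERS);
    }

    public static void main(String[] args) {
        System.out.println(isDataLine("500.00 56.48 500.00")); // true
        System.out.println(isDataLine("123"));                 // false
        // The looser pattern accepts the same input:
        System.out.println("123".matches("^(\\d+(\\.\\d+)?\\s*){3}$")); // true
    }
}
```

Like the pattern it tightens, this still rejects forms such as .5 or 5.6E2.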

Regex to allow space between numbers or nothing before the first one

I have the method that follows - verifyPhones, I am using two regexs on it.
The first one is to identify if the String is valid, if not I need to search which numbers are not valid.
My problem is when I have two valid numbers together - 20255501252025550125, the system is returning only one of them as wrong instead of the whole string.
How can I improve my regex to achieve that?
Thanks in advance.
Definition of valid number:
Any number that has 10 digits, optionally separated by the char -
Example:
000-000-0000
0001110000
Here is my code:
public static String verifyPhones(String phones) {
Pattern patternValidAllPhones = Pattern.compile("^(((\\d{3}[-]?){2}\\d{4})[ ]+)+$");
Pattern patternToFindWrongPhones = Pattern.compile("([ ]+((\\d{3}[-]?){2}\\d{4})[ ]+)");
phones = phones.replaceAll("\\r", " ").replaceAll("\\n", " ").concat(" ");
Matcher matcherValidAllPhones = patternValidAllPhones.matcher(phones);
if(!matcherValidAllPhones.matches()) {
Matcher matcherToFindWrongPhones = patternToFindWrongPhones.matcher(phones);
return matcherToFindWrongPhones.replaceAll("").trim();
}
return "";
}
@Test
public void verifyPhonesTest_whenInvalidPhones_thenReturneInvalidPhones() {
String invalidPhones1 = "202-555*0125 202-555-0125 202-555-0125 202-555-0125";
String invalidPhones2 = "202-555-0125202-555-0125 202-555-0125 202-555-0125";
String invalidPhones3 = "202555*0125 202-555-0125 202-555-0125 202-555-0125";
String invalidPhones4 = "2025550125 20255501252025550125";
String result1 = PhonesService.verifyPhones(invalidPhones1);
String result2 = PhonesService.verifyPhones(invalidPhones2);
String result3 = PhonesService.verifyPhones(invalidPhones3);
String result4 = PhonesService.verifyPhones(invalidPhones4);
assertFalse(result1.isEmpty());
assertEquals("202-555*0125", result1);
assertFalse(result2.isEmpty());
assertEquals("202-555-0125202-555-0125", result2);
assertFalse(result3.isEmpty());
assertEquals("202555*0125", result3);
assertFalse(result4.isEmpty());
assertEquals("20255501252025550125", result4);
}
Here's what I suggest:
If valid numbers have to be separated by a space from each other, then you can first split the String by spaces into pieces, where each piece is going to be a number. And then apply validation pattern on each piece separately. Those pieces that do not match the pattern are going to be invalid numbers.
Here's an example:
private static final Pattern phonePattern = Pattern.compile("(\\d{3}[-]?){2}\\d{4}");
public static List<String> verifyPhones(String phones) {
String[] numbers = phones.split("\\s+");
List<String> wrongPhones = new ArrayList<>();
for (String number : numbers) {
if (!phonePattern.matcher(number).matches()) {
wrongPhones.add(number);
}
}
return wrongPhones;
}
Note: I've changed the method's signature. Now it returns a List of wrong numbers. You do not expect to always have only one invalid number, do you?
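To check this against the inputs from the question's test, a self-contained sketch along the same lines is shown below. The class name PhoneCheck is hypothetical, and the trim() call and the empty-piece check are small additions assumed here to cope with leading whitespace and empty input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PhoneCheck {
    private static final Pattern PHONE = Pattern.compile("(\\d{3}-?){2}\\d{4}");

    static List<String> verifyPhones(String phones) {
        List<String> wrongPhones = new ArrayList<>();
        // \s+ also covers \r and \n, so multi-line input needs no extra replacing
        for (String number : phones.trim().split("\\s+")) {
            if (number.isEmpty()) {
                continue; // empty input yields one empty piece
            }
            if (!PHONE.matcher(number).matches()) {
                wrongPhones.add(number);
            }
        }
        return wrongPhones;
    }

    public static void main(String[] args) {
        System.out.println(verifyPhones("202-555*0125 202-555-0125"));       // [202-555*0125]
        System.out.println(verifyPhones("2025550125 20255501252025550125")); // [20255501252025550125]
        System.out.println(verifyPhones("202-555-0125\r\n2025550125"));      // []
    }
}
```

Because matches() must consume the whole piece, two numbers glued together are reported as one invalid entry, which is what the question's fourth test case expects.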

How do I split/parse this String properly using Regex

I am inexperienced with regex and rusty with Java, so some help here would be appreciated.
So I have a String in the form:
statement|digit|statement
statement|digit|statement
etc.
where statement can be any combination of characters, digits, and spaces.
I want to parse this string such that I save the first and last statements of each line in a separate string array.
for example if I had a string:
cats|1|short hair and long hair
cats|2|black, blue
dogs|1|cats are better than dogs
I want to be able to parse the string into two arrays.
Array one = [cats], [cats], [dogs]
Array two = [short hair and long hair],[black, blue],[cats are better than dogs]
Matcher m = Pattern.compile("(\\.+)|\\d+|=(\\.+)").matcher(str);
while(m.find()) {
String key = m.group(1);
String value = m.group(2);
System.out.printf("key=%s, value=%s\n", key, value);
}
I would have gone on to add the keys and values to separate arrays had my output been right, but no luck. Any help with this would be very much appreciated.
Here is a solution with RegEx:
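import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ParseString {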
public static void main(String[] args) {
String data = "cats|1|short hair and long hair\n"+
"cats|2|black, blue\n"+
"dogs|1|cats are better than dogs";
List<String> result1 = new ArrayList<>();
List<String> result2 = new ArrayList<>();
Pattern pattern = Pattern.compile("(.+)\\|\\d+\\|(.+)");
Matcher m = pattern.matcher(data);
while (m.find()) {
String key = m.group(1);
String value = m.group(2);
result1.add(key);
result2.add(value);
System.out.printf("key=%s, value=%s\n", key, value);
}
}
}
Here is a great site to help with regular expressions: http://txt2re.com/. Enter some example text in step 1, select the parts you are interested in in step 2, and select a language in step 3. Then copy, paste and massage the code that it spits out.
Double split should work:
class ParseString
{
public static void main(String[] args)
{
String s = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
String[] sa1 = s.split("\n");
for (int i = 0; i < sa1.length; i++)
{
String[] sa2 = sa1[i].split("\\|");
System.out.printf("key=%s, value=%s\n", sa2[0], sa2[2]);
} // end for i
} // end main
} // end class ParseString
Output:
key=cats, value=short hair and long hair
key=cats, value=black, blue
key=dogs, value=cats are better than dogs
There is no need for a complex regex pattern; you could simply split the string on the delimiter token using Java's String#split() method.
Working Example
import java.util.ArrayList;
public class StackOverFlow31840211 {
private static final int SENTENCE1_TOKEN_INDEX = 0;
private static final int DIGIT_TOKEN_INDEX = SENTENCE1_TOKEN_INDEX + 1;
private static final int SENTENCE2_TOKEN_INDEX = DIGIT_TOKEN_INDEX + 1;
public static void main(String[] args) {
String[] text = {
"cats|1|short hair and long hair",
"cats|2|black, blue",
"dogs|1|cats are better than dogs"
};
ArrayList<String> arrayOne = new ArrayList<String>();
ArrayList<String> arrayTwo = new ArrayList<String>();
for (String s : text) {
String[] tokens = s.split("\\|");
int tokenType = 0;
for (String token : tokens) {
switch (tokenType) {
case SENTENCE1_TOKEN_INDEX:
arrayOne.add(token);
break;
case SENTENCE2_TOKEN_INDEX:
arrayTwo.add(token);
break;
}
++tokenType;
}
}
System.out.println("Sentences for first token: " + arrayOne);
System.out.println("Sentences for third token: " + arrayTwo);
}
}
I agree with the other answers that you should use split, but I am providing an answer that uses Pattern.split, since it uses a regex.
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.Pattern;
/* Name of the class has to be "Main" only if the class is public. */
class MatchExample
{
public static void main (String[] args) {
String[] data = {
"cats|1|short hair and long hair",
"cats|2|black, blue",
"dogs|1|cats are better than dogs"
};
Pattern p = Pattern.compile("\\|\\d+\\|");
for(String line: data){
String[] elements = p.split(line);
System.out.println(elements[0] + " // " + elements[1]);
}
}
}
Notice that the pattern will match on one or more digits between two |'s. I see what you are doing with the groupings.
The main problem is that you need to escape the | and not the . characters. Also, what is the = doing in your regex? I generalized the regex a little, but you can replace .* with \\d+ to get the same behavior as yours.
Matcher m = Pattern.compile("^(.+?)\\|.*\\|(.+)$", Pattern.MULTILINE).matcher(str);
Here is the strict version:"^([^|]+)\\|\\d+\\|([^|]+)$" (also with MULTILINE)
And it's indeed easier using split (on the lines) as some have said, but like this:
String[] parts = str.split("\\|\\d+\\|");
If parts.length is not two then you know it is not a legal line.
If your input is always formatted like that, then you can just do with this single statement to get the left part in the even indexes and the right part in the odd indexes (0: line1-left, 1: line1-right, 2: line2-left, 3: line2-right, 4: line3-left ...), so you will get an array twice the size of line count.
String[] parts = str.split("\\|\\d+\\||\\n+");
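The even/odd layout described above can be unpacked into the two collections the question asks for. A minimal sketch, assuming every line is well formed; the class name SplitPairs and the method name parse are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class SplitPairs {
    static void parse(String str, List<String> keys, List<String> values) {
        // Split on "|digits|" and on line breaks in one pass
        String[] parts = str.split("\\|\\d+\\||\\n+");
        // parts alternate: key, value, key, value, ...
        for (int i = 0; i + 1 < parts.length; i += 2) {
            keys.add(parts[i]);
            values.add(parts[i + 1]);
        }
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs";
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        parse(str, keys, values);
        System.out.println(keys);   // [cats, cats, dogs]
        System.out.println(values); // [short hair and long hair, black, blue, cats are better than dogs]
    }
}
```

A malformed line would shift the alternation for every line after it, so the per-line split with a length check is the safer choice for untrusted input.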

Regex pattern for matching words like c++ in a text

I have a text which can have words like c++, c, .net, asp.net in any format.
Sample Text:
Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.
I already have c,c++,java,.net,asp.net stored somewhere.
All I need is to pick the occurrences of all these words in the text.
The pattern I was using to match was (?i)\\b(" +Pattern.quote(key)+ ")\\b which doesn't match things like c++ and .net. So I tried escaping the literals using (?i)\\b(" +forRegex(key)+ ")\\b (method link here), and I got the same result.
The expected output is that it should match(case insensitive):
C++ : 2
C : 2
java: 2
asp.net : 1
.net : 1
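Set<String> keywords = new HashSet<>(); // add your keywords to this set
String text = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
// replace commas and semicolons with spaces so that they act as separators
text = text.replaceAll("[,;]", " ");
String[] textArray = text.split("\\s+");
for (String s : keywords) {
int count = 0;
// arrays expose length as a field, not a method
for (int i = 0; i < textArray.length; i++) {
// compare case-insensitively so that "C" matches the keyword "c"
if (textArray[i].equalsIgnoreCase(s)) {
count++;
}
}
System.out.println(s + " : " + count);
}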
This works most of the time. (If you want better results, adjust the regular expression passed to replaceAll.)
I would choose a non-regex solution to your problem. Just put the keywords into an array and search for each occurrence in the input string. The code below uses String.indexOf(String, int) to iterate through the string without creating any new objects (beyond the index and counter). Note that it counts substrings, so "c" is also counted inside "c++" and ".net" inside "asp.net".
public class SearchWordCountNonRegex {
public static final void main(String[] ignored) {
//Keywords and input searched for with lowercase, so the keyword "java"
//matches "Java", "java", and "JAVA".
String[] searchWords = {"c++", "c", "java", "asp.net", ".net"};
String input = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.".
toLowerCase();
for(int i = 0; i < searchWords.length; i++) {
String searchWord = searchWords[i];
System.out.print(searchWord + ": ");
int foundCount = 0;
int currIdx = 0;
while(currIdx != -1) {
currIdx = input.indexOf(searchWord, currIdx);
if(currIdx != -1) {
foundCount++;
currIdx += searchWord.length();
} else {
currIdx = -1;
}
}
System.out.println(foundCount);
}
}
}
Output:
c++: 2
c: 4
java: 2
asp.net: 1
.net: 2
If you are really wanting a regex solution, you could try something like the following, which uses a case insensitive pattern to match each keyword.
The problem is that the number of occurrences must be kept track of separately. This could be done, for example, by adding each found keyword to a map, where the key is the keyword, and the value is its current count. In addition, once a match is found, the search continues from that point, which implies that any potential overlapping matches are hidden (such as when Asp.NET is found, that particular .NET match will never be found)--this may or may not be a desired behavior.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SearchWordsRegexNoCounts {
public static final void main(String[] ignored) {
Matcher keywordMtchr = Pattern.compile("(C\\+\\+|C|Java|Asp\\.NET|\\.NET)",
Pattern.CASE_INSENSITIVE).matcher("");
String input = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
keywordMtchr.reset(input);
while(keywordMtchr.find()) {
System.out.println("Keyword found at index " + keywordMtchr.start() + ": " + keywordMtchr.group(1));
}
}
}
Output:
Keyword found at index 7: java
Keyword found at index 32: .net
Keyword found at index 57: C
Keyword found at index 60: C++
Keyword found at index 90: C
Keyword found at index 92: C++
Keyword found at index 96: Java
Keyword found at index 101: asp.net
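The map-based counting mentioned above could be sketched as follows, reusing the same pattern. The class name SearchWordsRegexWithCounts and the choice of a TreeMap (for a stable print order) are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchWordsRegexWithCounts {
    private static final Pattern KEYWORDS = Pattern.compile(
            "(C\\+\\+|C|Java|Asp\\.NET|\\.NET)", Pattern.CASE_INSENSITIVE);

    static Map<String, Integer> countKeywords(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = KEYWORDS.matcher(input);
        while (m.find()) {
            // Normalise case so that "Java" and "java" share one entry
            counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Hello, java is what I want. Hmm .net should be fine too. "
                + "C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
        System.out.println(countKeywords(input));
        // prints {.net=1, asp.net=1, c=2, c++=2, java=2}
    }
}
```

As with the original pattern, there is no boundary check, so a "c" inside an ordinary word would also be counted.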
Using regex, I've come up with the following solution, although it can potentially find undesired matches, as described in the code comments:
// "\\" is first because we don't want to escape any escape characters we will
// be adding ourselves
private static final String[] regexSpecial = {"\\", "(", ")", "[", "]", "{",
"}", ".", "+", "*", "?", "^", "$", "|"};
private static final String regexEscape = "\\";
private static final String[] regexEscapedSpecial;
static {
regexEscapedSpecial = new String[regexSpecial.length];
for (int i = 0; i < regexSpecial.length; i++) {
regexEscapedSpecial[i] = regexEscape + regexSpecial[i];
}
}
public static void main(String[] args) throws Throwable {
Set<String> searchWords = new HashSet<String>(Arrays.asList("c++", "c",
".net", "asp.net", "java"));
String text = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me\nC,C++,Java,asp.net skills.";
System.out.println(numOccurrences(text, searchWords, false));
}
/**
* Counts the number of occurrences of the given words in the given text. This
* allows the given "words" to contain non-word characters. Note that it is
* possible for unexpected matches to occur. For example if one of the words
* to match is "c" then while none of the "c"s in "coconut" will be matched,
* the "c" in "c-section" will even if only matches of "c" as in the "c
* programming language" were intended.
*/
public static Map<String, Integer> numOccurrences(String text,
Set<String> searchWords, boolean caseSensitive) {
Map<String, String> lowerCaseToSearchWords = new HashMap<String, String>();
List<String> searchWordsInOrder = sortByNonInclusion(searchWords);
StringBuilder regex = new StringBuilder("(?<!\\w)(");
boolean started = false;
for (String searchWord : searchWordsInOrder) {
lowerCaseToSearchWords.put(searchWord.toLowerCase(), searchWord);
if (started) {
regex.append("|");
} else {
started = true;
}
regex.append(escapeRegex(searchWord));
}
regex.append(")(?!\\w)");
Pattern pattern = null;
if (caseSensitive) {
pattern = Pattern.compile(regex.toString());
} else {
pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
}
Matcher matcher = pattern.matcher(text);
Map<String, Integer> matches = new HashMap<String, Integer>();
while (matcher.find()) {
String match = lowerCaseToSearchWords.get(matcher.group(1).toLowerCase());
Integer oldVal = matches.get(match);
if (oldVal == null) {
oldVal = 0;
}
matches.put(match, oldVal + 1);
}
return matches;
}
/**
* Sorts the given collection of words in such a way that if A is a prefix of
* B, then it is guaranteed that A will appear after B in the sorted list.
*/
public static List<String> sortByNonInclusion(Collection<String> toSort) {
List<String> sorted = new ArrayList<String>(new HashSet<String>(toSort));
// sorting in reverse alphabetical order will ensure that if A is a prefix
// of B it will appear later in the list than B
Collections.sort(sorted, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.compareTo(o1);
}
});
return sorted;
}
/**
* Escape all regex special characters in the given text.
*/
public static String escapeRegex(String toEscape) {
for (int i = 0; i < regexSpecial.length; i++) {
toEscape = toEscape.replace(regexSpecial[i], regexEscapedSpecial[i]);
}
return toEscape;
}
The printed result is
{asp.net=1, c=2, c++=2, java=2, .net=1}
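The hand-written escaping could probably be replaced by the standard Pattern.quote, which wraps a string so that it is matched literally. The question's \b approach fails because \b needs a word character on one side, and keywords such as c++ or .net begin or end with non-word characters; the lookarounds (?<!\w) and (?!\w) avoid that. A minimal sketch, with hypothetical class and method names, that sorts keywords longest first rather than in reverse alphabetical order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedKeywordPattern {
    static Pattern buildKeywordPattern(Collection<String> keywords) {
        String alternation = keywords.stream()
                // longest first, so "c++" is tried before "c"
                .sorted(Comparator.<String>comparingInt(String::length).reversed())
                .map(Pattern::quote) // literal match, no manual escaping
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(" + alternation + ")(?!\\w)",
                Pattern.CASE_INSENSITIVE);
    }

    static List<String> findKeywords(String text, Collection<String> keywords) {
        List<String> found = new ArrayList<>();
        Matcher m = buildKeywordPattern(keywords).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Hello, java is what I want. Hmm .net should be fine too. "
                + "C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
        System.out.println(findKeywords(text,
                Arrays.asList("c++", "c", "java", "asp.net", ".net")));
        // prints [java, .net, C, C++, C, C++, Java, asp.net]
    }
}
```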
