Replace substring with a regex combination


Since I'm not that familiar with Java, I don't know if there's a library somewhere that can do this. If not, does anybody have any ideas how this can be accomplished?
For instance, I have the string "foo" and I want to replace the letter f with "f" and with "a", so that the function returns a list of strings with the values "foo" and "aoo".
How do I deal with it when the same letter appears more than once? "ffoo" should become "ffoo", "afoo", "faoo", "aaoo".
A better explanation:
(("a",("a","b)),("c",("c","d")))
Above is a group of characters that need to be replaced with a character from the other element. "a" is to be replaced with "a" and with "b". "c" is to be replaced with "c" and "d".
If I have a string "ac", the resulting combinations I need are:
"ac"
"bc"
"ad"
"bd"
If the string is "IaJaKc", the resulting combinations are:
"IaJaKc"
"IbJaKc"
"IaJbKc"
"IbJbKc"
"IaJaKd"
"IbJaKd"
"IaJbKd"
"IbJbKd"
The number of combinations can be calculated like this:
(replacements_of_a ^ occurrences_of_a) * (replacements_of_c ^ occurrences_of_c)
first case ("ac"): 2^1 * 2^1 = 4
second case ("IaJaKc"): 2^2 * 2^1 = 8
If, say, the group is (("a",("a","b")),("c",("c","d","e"))), and the string is "aac", the number of combinations is:
2^2*3^1 = 12
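The formula above can be checked with a short Java sketch. Multiplying by the number of options once per occurrence of a letter is the same as raising that number to the letter's count. The Map<Character, String> representation (each character of the value is one allowed replacement) and the class and method names are assumptions chosen for illustration; Map.of needs Java 9+.

```java
import java.util.Map;

public class CombinationCount {
    // For every character that has a replacement list, multiply the total by
    // the size of that list; unmapped characters contribute a factor of 1.
    static long countCombinations(String input, Map<Character, String> replacements) {
        long total = 1;
        for (int i = 0; i < input.length(); i++) {
            String options = replacements.get(input.charAt(i));
            if (options != null) {
                total *= options.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // hypothetical encoding of (("a",("a","b")),("c",("c","d","e")))
        Map<Character, String> group = Map.of('a', "ab", 'c', "cde");
        System.out.println(countCombinations("aac", group)); // 2 * 2 * 3 = 12
    }
}
```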

Here is the code for your example with "foo" and "aoo":
public List<String> doSmthTricky(String str) {
    // "(^.)(.*)" captures the first character ($1) and the rest ($2);
    // the replacement produces "<original> a<rest>", which is then split on the space
    return Arrays.asList(str.replaceAll("(^.)(.*)", "$1$2 a$2").split(" "));
}
For the input "foo" this method returns a list with the two strings "foo" and "aoo".
It works only if there are no spaces in your input string ("foo" in your example); otherwise it's a bit more complicated.
How to deal with it when there's more of the same letters? "ffoo" into "ffoo", "afoo", "faoo", "aaoo".
I doubt that regular expressions can help here: you want to generate new strings based on the initial string, and that is not a task for regular expressions.
Update: I've created a recursive function (actually half-recursive, half-iterative) that generates strings from the template string by replacing its first characters with characters from a specified set:
public static List<String> generatePermutations (String template, String chars, int depth, List<String> result) {
if (depth <= 0) {
result.add (template);
return result;
}
for (int i = 0; i < chars.length(); i++) {
String newTemplate = template.substring(0, depth - 1) + chars.charAt(i) + template.substring(depth);
generatePermutations(newTemplate, chars, depth - 1, result);
}
generatePermutations(template, chars, depth - 1, result);
return result;
}
The depth parameter is the number of characters, counted from the beginning of the string, that should be replaced. The number of generated strings is (chars.length() + 1)^depth.
Tests:
System.out.println(generatePermutations("ffoo", "a", 2, new LinkedList<String>()));
Output: [aaoo, faoo, afoo, ffoo]
--
System.out.println(generatePermutations("ffoo", "ab", 3, new LinkedList<String>()));
Output: [aaao, baao, faao, abao, bbao, fbao, afao, bfao, ffao, aabo, babo, fabo, abbo, bbbo, fbbo, afbo, bfbo, ffbo, aaoo, baoo, faoo, aboo, bboo, fboo, afoo, bfoo, ffoo]

I'm not sure what you need; please specify the input and the result you expect. Anyway, you should use the standard Java classes for this purpose: java.util.regex.Pattern and java.util.regex.Matcher. If you need to deal with repeating letters at the beginning, there are two ways: use the "^" symbol, which means the beginning of the line, or use the "\b" shortcut, which means a word boundary (the beginning or end of a word). In more sophisticated cases, take a look at "lookbehind" expressions. You can find complete descriptions of these techniques in the Javadoc for java.util.regex, and if that's not enough, look at www.regular-expressions.info. Good luck.
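A minimal sketch of the anchors mentioned above: "^", the "\b" word boundary, and a lookbehind. The sample strings are made up for illustration. Note that each regex only selects which positions get replaced; producing every combination still needs ordinary code.

```java
public class AnchorDemo {
    public static void main(String[] args) {
        // "^" anchors the match to the start of the input, so only the first 'f' is replaced
        System.out.println("ffoo".replaceFirst("^f", "a"));     // afoo
        // "\\b" is a word boundary, so only an 'f' that starts a word is replaced
        System.out.println("foo ffoo".replaceAll("\\bf", "a")); // aoo afoo
        // the lookbehind "(?<=f)" requires an 'f' immediately before the match
        System.out.println("ffoo".replaceAll("(?<=f)f", "a"));  // faoo
    }
}
```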

Here it is:
public static void returnVariants(String input){
List<String> output = new ArrayList<String>();
StringBuffer word = new StringBuffer(input);
output.add(input);
String letters = "ac";
int lettersLength = letters.length();
int wordLength = word.length();
String replacement = "";
for (int i = 0; i < lettersLength; i++) {
for (int j = 0; j < wordLength; j++) {
if(word.charAt(j)==letters.charAt(i)){
if (word.charAt(j)=='a'){
replacement = "ab";
}else if (word.charAt(j)=='c'){
replacement = "cd";
}
List<String> tempList = new ArrayList<String>();
for (int k = 0; k < replacement.length(); k++) {
for (String variant : output){
StringBuffer tempBuffer = new StringBuffer(variant);
String combination = tempBuffer.replace(j, j+1, replacement.substring(k, k+1)).toString();
tempList.add(combination);
}
}
output.addAll(tempList);
if (j==0){
output.remove(0);
}
}
}
}
Set<String> uniqueCombinations = new HashSet<String>(output);
System.out.println(uniqueCombinations);
}
If the input is "ac", the combinations printed are "ac", "bc", "ad", "bd". If it can be optimized further, any additional help is welcome and appreciated.
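Since the question invites optimizations, here is one possible alternative, sketched under stated assumptions rather than taken from the asker's code: a recursive, backtracking Cartesian product that builds each combination exactly once, so no HashSet is needed to remove duplicates. The class and method names and the Map<Character, String> encoding of the replacement group are hypothetical; Map.of needs Java 9+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReplacementCombinations {
    // Returns every string obtained by replacing each mapped character of input
    // with one of its options; unmapped characters are kept as they are.
    static List<String> combinations(String input, Map<Character, String> options) {
        List<String> result = new ArrayList<>();
        build(input, options, 0, new StringBuilder(), result);
        return result;
    }

    private static void build(String input, Map<Character, String> options, int pos,
                              StringBuilder current, List<String> result) {
        if (pos == input.length()) {
            result.add(current.toString()); // one complete combination
            return;
        }
        char c = input.charAt(pos);
        // characters without an entry have exactly one option: themselves
        String choices = options.getOrDefault(c, String.valueOf(c));
        for (int k = 0; k < choices.length(); k++) {
            current.append(choices.charAt(k));
            build(input, options, pos + 1, current, result);
            current.setLength(current.length() - 1); // undo before trying the next option
        }
    }

    public static void main(String[] args) {
        Map<Character, String> group = Map.of('a', "ab", 'c', "cd");
        System.out.println(combinations("ac", group));            // [ac, ad, bc, bd]
        System.out.println(combinations("IaJaKc", group).size()); // 8
    }
}
```

With the mapping f to "fa", combinations("ffoo", ...) yields ffoo, faoo, afoo and aaoo, matching the example in the question.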

Related

how to compare two strings to find common substring

I get a "terminated due to timeout" error when I submit my code. Please help me.
Given two strings, determine if they share a common substring. A substring may be as small as one character.
For example, the words "a", "and", "art" share the common substring "a". The words "be" and "cat" do not share a substring.
Input Format
The first line contains a single integer, the number of test cases.
Each test case consists of the following pair of lines:
The first line contains string s1 .
The second line contains string s2 .
Output Format
For each pair of strings, return YES or NO.
My code in Java:
import java.util.Scanner;
public class Solution {
    public static void main(String args[])
    {
        String s1, s2;
        int n;
        Scanner s = new Scanner(System.in);
        n = s.nextInt();
        while (n > 0)
        {
            int flag = 0;
            s1 = s.next();
            s2 = s.next();
            for (int i = 0; i < s1.length(); i++)
            {
                // j must start at 0; starting at i would skip the first i characters of s2
                for (int j = 0; j < s2.length(); j++)
                {
                    if (s1.charAt(i) == s2.charAt(j))
                    {
                        flag = 1;
                    }
                }
            }
            if (flag == 1)
            {
                System.out.println("YES");
            }
            else
            {
                System.out.println("NO");
            }
            n--;
        }
    }
}
Any tips?

Below is my approach to the same HackerRank challenge described above:
static String twoStrings(String s1, String s2) {
String result="NO";
Set<Character> set1 = new HashSet<Character>();
for (char s : s1.toCharArray()){
set1.add(s);
}
for(int i=0;i<s2.length();i++){
if(set1.contains(s2.charAt(i))){
result = "YES";
break;
}
}
return result;
}
It passed all the test cases without a timeout.

The reason for the timeout is probably this: to compare two strings that are each 1,000,000 characters long, your code always needs 1,000,000 * 1,000,000 comparisons.
There is a faster algorithm that needs only 2 * 1,000,000 steps. You should use it instead. Its basic idea is:
for each character in s1: add the character to a set (this is the first million)
for each character in s2: test whether the set from step 1 contains the character, and if so, return "yes" immediately (this is the second million)
Java already provides a BitSet data type that does all you need. It is used like this:
BitSet seenInS1 = new BitSet();
seenInS1.set('x');
seenInS1.get('x');
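Putting the two steps and the BitSet together, a sketch might look like this. The class name TwoStringsBitSet is made up, and the method returns "YES"/"NO" in the style of the HackerRank signature used in the other answers.

```java
import java.util.BitSet;

public class TwoStringsBitSet {
    // Returns "YES" if s1 and s2 share at least one character, "NO" otherwise.
    // Each string is scanned once, so the work grows with
    // s1.length() + s2.length() instead of their product.
    static String twoStrings(String s1, String s2) {
        BitSet seenInS1 = new BitSet();
        for (int i = 0; i < s1.length(); i++) {
            seenInS1.set(s1.charAt(i)); // step 1: remember every char of s1
        }
        for (int i = 0; i < s2.length(); i++) {
            if (seenInS1.get(s2.charAt(i))) {
                return "YES"; // step 2: the first shared char is enough
            }
        }
        return "NO";
    }

    public static void main(String[] args) {
        System.out.println(twoStrings("hello", "world")); // YES ('l' and 'o')
        System.out.println(twoStrings("hi", "world"));    // NO
    }
}
```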

Since you're worried about execution time, if they give you an expected range of characters (for example 'a' to 'z'), you can solve it very efficiently like this:
import java.util.Arrays;
import java.util.Scanner;
public class Whatever {
final static char HIGHEST_CHAR = 'z'; // Use Character.MAX_VALUE if unsure.
public static void main(final String[] args) {
final Scanner scanner = new Scanner(System.in);
final boolean[] characterSeen = new boolean[HIGHEST_CHAR + 1];
mainloop:
for (int word = Integer.parseInt(scanner.nextLine()); word > 0; word--) {
Arrays.fill(characterSeen, false);
final String word1 = scanner.nextLine();
for (int i = 0; i < word1.length(); i++) {
characterSeen[word1.charAt(i)] = true;
}
final String word2 = scanner.nextLine();
for (int i = 0; i < word2.length(); i++) {
if (characterSeen[word2.charAt(i)]) {
System.out.println("YES");
continue mainloop;
}
}
System.out.println("NO");
}
}
}
The code was tested to work with a few inputs.
This uses a fast array rather than slower sets, and it only creates one non-String object (other than the Scanner) for the entire run of the program. It also runs in O(n) time rather than O(n²) time.
The only thing faster than an array might be the BitSet Roland Illig mentioned.
If you wanted to go completely overboard, you could also potentially speed it up by:
skipping the creation of a Scanner and all those String objects by using System.in.read(buffer) directly with a reusable byte[] buffer
skipping the standard process of having to spend time checking for and properly handling negative numbers and invalid inputs on the first line by making your own very fast int parser that just assumes it's getting the digits of a valid nonnegative int followed by a newline

There are different approaches to this problem, but solving it in linear time is a bit tricky.
Still, it can be solved in linear time by applying the KMP algorithm in a trickier way.
Let's say you have 2 strings. First find the length of both strings. Say string 1 is longer than string 2: make string 1 your text and string 2 your pattern. If the length of the text is n and the length of the pattern is m, then the time complexity is O(m+n), which is much faster than O(n^2).
In this problem, you just need to modify the KMP search to get the desired result:
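// buildPrefix(pattern) is the standard KMP prefix (failure) function; it is not shown here
public static void KMPsearch(char[] text, char[] pattern)
{
    int[] cache = buildPrefix(pattern);
    int i = 0, j = 0;
    while (i < text.length && j < pattern.length)
    {
        // j is never advanced, so as written this only checks whether pattern[0] occurs in text
        if (text[i] == pattern[j])
        {
            System.out.println("YES");
            return;
        }
        else
        {
            if (j > 0)
                j = cache[j - 1];
            else
                i++;
        }
    }
    System.out.println("NO");
}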
Understanding Knuth-Morris-Pratt Algorithm

There are two concepts involved in solving this question:
- Understanding that a single character is a valid substring.
- Deducing that we only need to know that the two strings have a common substring; we don't need to know what that substring is.
Thus, the key to solving this question is determining whether or not the two strings share a common character.
To do this, we create two sets, a and b, where each set contains the unique characters that appear in the string it's named after.
Because sets don't store duplicate values, we know that the size of our sets will never exceed the 26 letters of the English alphabet (assuming lowercase input).
In addition, the small size of these sets makes finding the intersection very quick.
If the intersection of the two sets is empty, we print NO on a new line; if it is not empty, then we know that strings a and b share one or more common characters, and we print YES on a new line.
In code, it may look something like this
import java.util.*;
public class Solution {
static Set<Character> a;
static Set<Character> b;
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
for(int i = 0; i < n; i++) {
a = new HashSet<Character>();
b = new HashSet<Character>();
for(char c : scan.next().toCharArray()) {
a.add(c);
}
for(char c : scan.next().toCharArray()) {
b.add(c);
}
// store the set intersection in set 'a'
a.retainAll(b);
System.out.println( (a.isEmpty()) ? "NO" : "YES" );
}
scan.close();
}
}

public String twoStrings(String sOne, String sTwo) {
if (sOne.equals(sTwo)) {
return "YES";
}
Set<Character> charSetOne = new HashSet<Character>();
for (Character c : sOne.toCharArray())
charSetOne.add(c);
Set<Character> charSetTwo = new HashSet<Character>();
for (Character c : sTwo.toCharArray())
charSetTwo.add(c);
charSetOne.retainAll(charSetTwo);
if (charSetOne.size() > 0) {
return "YES";
}
return "NO";
}
This should work; I tested it with some large inputs.

Python3
def twoStrings(s1, s2):
flag = False
for x in s1:
if x in s2:
flag = True
if flag == True:
return "YES"
else:
return "NO"
if __name__ == '__main__':
q = 2
text = [("hello","world"), ("hi","world")]
for q_itr in range(q):
s1 = text[q_itr][0]
s2 = text[q_itr][1]
result = twoStrings(s1, s2)
print(result)

static String twoStrings(String s1, String s2) {
for (Character ch : s1.toCharArray()) {
if (s2.indexOf(ch) > -1)
return "YES";
}
return "NO";
}

How to tokenize Chinese into individual characters in Java? [duplicate]

I need to split a String into an array of single character Strings.
Eg, splitting "cat" would give the array "c", "a", "t"

"cat".split("(?!^)")
This will produce
array ["c", "a", "t"]

"cat".toCharArray()
But if you need strings:
"cat".split("")
Edit: on Java 7 and earlier this returns an empty first element; since Java 8 it does not.

String str = "cat";
char[] cArray = str.toCharArray();

If characters beyond the Basic Multilingual Plane are expected in the input (some CJK characters, newer emoji...), approaches such as "a💫b".split("(?!^)") cannot be used, because they break such characters apart (resulting in the array ["a", "?", "?", "b"]), so something safer has to be used:
"a💫b".codePoints()
.mapToObj(cp -> new String(Character.toChars(cp)))
.toArray(size -> new String[size]);

split("(?!^)") does not work correctly if the string contains surrogate pairs. You should use split("(?<=.)").
String[] splitted = "花ab🌹🌺🌷".split("(?<=.)");
System.out.println(Arrays.toString(splitted));
output:
[花, a, b, 🌹, 🌺, 🌷]

To sum up the other answers...
This works on all Java versions:
"cat".split("(?!^)")
This only works on Java 8 and up:
"cat".split("")

An efficient way of turning a String into an array of one-character Strings would be to do this:
String[] res = new String[str.length()];
for (int i = 0; i < str.length(); i++) {
res[i] = Character.toString(str.charAt(i));
}
However, this does not take into account the fact that a char in a String could actually represent half of a Unicode code point (if the code point is not in the BMP). To deal with that, you need to iterate through the code points, which is more complicated.
This approach will be faster than using String.split(/* clever regex */), and it will probably be faster than using Java 8+ streams. It is probably also faster than this:
String[] res = new String[str.length()];
int i = 0;
for (char ch : str.toCharArray()) {
    res[i++] = Character.toString(ch);
}
because toCharArray has to copy the characters to a new array.

for(int i=0;i<str.length();i++)
{
System.out.println(str.charAt(i));
}

Maybe you can use a for loop that goes through the String content and extracts the characters one by one using the charAt method.
Combined with an ArrayList<String>, for example, you can get your array of individual characters.
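A minimal sketch of that suggestion; the class and method names are hypothetical. Like the other char-based approaches, it splits characters outside the BMP (e.g. most emoji) into two surrogate halves.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitWithCharAt {
    // Collects each char of the string as a one-character String.
    static String[] toSingleCharStrings(String str) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < str.length(); i++) {
            parts.add(String.valueOf(str.charAt(i)));
        }
        return parts.toArray(new String[0]); // convert the list to the requested array
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(toSingleCharStrings("cat"))); // [c, a, t]
    }
}
```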

If the original string contains supplementary Unicode characters, then split() would not work, as it splits each of these characters into its two surrogate halves. To correctly handle these special characters, code like this works:
String[] chars = new String[stringToSplit.codePointCount(0, stringToSplit.length())];
for (int i = 0, j = 0; i < stringToSplit.length(); j++) {
int cp = stringToSplit.codePointAt(i);
char c[] = Character.toChars(cp);
chars[j] = new String(c);
i += Character.charCount(cp);
}

In my previous answer I mixed this up with JavaScript. Here is an analysis of performance in Java.
I agree that attention must be paid to Unicode surrogate pairs in Java Strings. They break the meaning of methods like String.length(), and even the functional meaning of Character, because a Character is ultimately a technical object that may not represent one character in the human sense.
I implemented 4 methods that split a string into a list of character-representing strings (Strings corresponding to the human meaning of characters). Here's the result of the comparison:
Each line is a String consisting of 1000 arbitrarily chosen emojis and 1000 ASCII characters (1000 times <emoji><ascii>, 2000 "characters" in total in the human sense).
(discarding 256 and 512 measures)
Implementations:
codePoints (java 11 and above)
public static List<String> toCharacterStringListWithCodePoints(String str) {
if (str == null) {
return Collections.emptyList();
}
return str.codePoints()
.mapToObj(Character::toString)
.collect(Collectors.toList());
}
classic
public static List<String> toCharacterStringListWithIfBlock(String str) {
if (str == null) {
return Collections.emptyList();
}
List<String> strings = new ArrayList<>();
char[] charArray = str.toCharArray();
int delta = 1;
for (int i = 0; i < charArray.length; i += delta) {
delta = 1;
if (i < charArray.length - 1 && Character.isSurrogatePair(charArray[i], charArray[i + 1])) {
delta = 2;
strings.add(String.valueOf(new char[]{ charArray[i], charArray[i + 1] }));
} else {
strings.add(Character.toString(charArray[i]));
}
}
return strings;
}
regex
static final Pattern p = Pattern.compile("(?<=.)");
public static List<String> toCharacterStringListWithRegex(String str) {
if (str == null) {
return Collections.emptyList();
}
return Arrays.asList(p.split(str));
}
Annex (RAW DATA):
codePoints;classic;regex;lines
45;44;84;256
14;20;98;512
29;42;91;1024
52;56;99;2048
87;121;174;4096
175;221;375;8192
345;411;839;16384
667;826;1285;32768
1277;1536;2440;65536
2426;2938;4238;131072

In JavaScript, we can do this simply with the spread syntax:
const string = 'hello';
console.log([...string]); // -> ['h','e','l','l','o']
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax says
Spread syntax (...) allows an iterable such as an array expression or string to be expanded...
So, strings can be quite simply spread into arrays of characters.

Writing a method that spells a word backwards and identifies the number of palindromes

I'm new to Java, and I wrote this method to take a word as input and output the word spelled backwards. The intent is to write the method myself rather than use an existing method such as the simple reverse. Please help point me in the right direction for reversing a word. I'm also trying to determine/count whether there are palindromes. Please help! I've read other questions, and I can't find anything specific enough to my case. I know that my code doesn't run, but I'm unsure how to fix it to get the correct output.
An example would be the word "backwards" to go to "sdrawkcab".
public static int reverseWord(String word) {
int palindromes = 0;
for (int i = word.length(); i >= 0; i--) {
System.out.print(i);
word.equalsIgnoreCase();
if (word.charAt(i)) == index(word.charAt(0 && 1))) {
palindromes++
System.out.println(palindromes)
}
return i;
}
}

There are multiple problems with your code.
1. The signature of equalsIgnoreCase is
public boolean equalsIgnoreCase(String str);
So this method expects a String to be passed, but you are not passing anything here. To fix this, pass the other string you want to compare your word with, like this:
word.equalsIgnoreCase("myAnotherString");
2. word.charAt(i);
Suppose word = "qwerty"; the index of each character will then be:
/* q w e r t y
   0 1 2 3 4 5 */
So when you use i = word.length(), i will be 6, since word has length 6.
word.charAt(i) will then look for the character at index 6, but since there is no index 6, it will throw a StringIndexOutOfBoundsException. To fix this, start i from word.length() - 1.
3. if (word.charAt(i)) == ...
There is an extra ")" after word.charAt(i). Remove it.
Is index() your own method? If yes, check that as well.
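Applying these fixes, a corrected version might look like the sketch below. It returns the reversed word instead of an int, and it assumes that "number of palindromes" means how many words in a list read the same backwards (ignoring case); both are assumptions about the intent, and the class and method names are hypothetical.

```java
public class ReverseWord {
    // Builds the reversed word by walking from the last valid index
    // (length() - 1) down to 0, without calling StringBuilder.reverse().
    static String reverseWord(String word) {
        StringBuilder reversed = new StringBuilder(word.length());
        for (int i = word.length() - 1; i >= 0; i--) {
            reversed.append(word.charAt(i));
        }
        return reversed.toString();
    }

    // A word is treated as a palindrome if it reads the same backwards, ignoring case.
    static boolean isPalindrome(String word) {
        return reverseWord(word).equalsIgnoreCase(word);
    }

    public static void main(String[] args) {
        String[] words = {"backwards", "Dad", "level", "java"};
        int palindromes = 0;
        for (String w : words) {
            System.out.println(w + " -> " + reverseWord(w));
            if (isPalindrome(w)) {
                palindromes++;
            }
        }
        System.out.println("Palindromes: " + palindromes); // 2
    }
}
```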

The code below prints the reverse of the input string and checks whether it is a palindrome:
public static void main(String[] args) {
String input = "dad";
char temp[] = input.toCharArray();//converting it to a array so that each character can be compared to the original string
char output[] = new char[temp.length];//taking another array of the same size as the input string
for (int i = temp.length - 1, j = 0; i >= 0; i--, j++) {//i variable for iterating through the input string and j variable for inserting data into output string.
System.out.print(temp[i]);//printing each variable of the input string in reverse order.
output[j] = temp[i];//inserting data into output string
}
System.out.println(String.valueOf(output));
if (String.valueOf(output).equalsIgnoreCase(input)) {//comparing the output string with the input string for palindrome check
System.out.println("palindrome");
}
}

Because your question about what is wrong with your code has already been answered, here is another way you could do it, using some concepts that are somewhat less low-level than working directly with character arrays:
public static boolean printWordAndCheckIfPalindrome(final String word) {
// Create a StringBuilder which helps when building a string
final StringBuilder reversedWordBuilder = new StringBuilder("");
// Get a stream of the character values of the word
word.chars()
// Add each character to the beginning of the reversed word,
// example for "backwards": "b", "ab", "cab", "kcab", ...
.forEach(characterOfString -> reversedWordBuilder.insert(0, (char) characterOfString));
// Generate a String out of the contents of the StringBuilder
final String reversedWord = reversedWordBuilder.toString();
// print the reversed word
System.out.println(reversedWord);
// if the reversed word equals the given word it is a palindrome
return word.equals(reversedWord);
}

Regex pattern for matching words like c++ in a text

I have a text which can have words like c++, c, .net, asp.net in any format.
Sample Text:
Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.
I already have c,c++,java,.net,asp.net stored somewhere.
All I need is to pick the occurrences of all these words in the text.
The pattern I was using to match was (?i)\\b(" +Pattern.quote(key)+ ")\\b, which doesn't match things like c++ and .net. So I tried escaping the literals using (?i)\\b(" +forRegex(key)+ ")\\b, and I got the same result.
The expected output is that it should match (case-insensitively):
C++ : 2
C : 2
java: 2
asp.net : 1
.net : 1

// add your keywords to this set
Set<String> keywords = new LinkedHashSet<>(Arrays.asList("c++", "c", "java", "asp.net", ".net"));
String text = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
// replace commas and semicolons with spaces, then split on spaces
text = text.replaceAll("[,;]", " ");
String[] textArray = text.split(" ");
for (String s : keywords) {
    int count = 0;
    for (int i = 0; i < textArray.length; i++) {
        // equalsIgnoreCase so that "Java" and "C++" are counted for "java" and "c++"
        if (textArray[i].equalsIgnoreCase(s)) {
            count++;
        }
    }
    System.out.println(s + " : " + count);
}
This works most of the time. (If you want better results, change the regular expression in the replaceAll method.)

I would choose a non-regex solution to your problem. Just put the keywords into an array and search for each occurrence in the input string. It uses String.indexOf(String, int) to iterate through the string without creating any new objects (beyond the index and counter).
public class SearchWordCountNonRegex {
public static final void main(String[] ignored) {
//Keywords and input searched for with lowercase, so the keyword "java"
//matches "Java", "java", and "JAVA".
String[] searchWords = {"c++", "c", "java", "asp.net", ".net"};
String input = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.".
toLowerCase();
for(int i = 0; i < searchWords.length; i++) {
String searchWord = searchWords[i];
System.out.print(searchWord + ": ");
int foundCount = 0;
int currIdx = 0;
while(currIdx != -1) {
currIdx = input.indexOf(searchWord, currIdx);
if(currIdx != -1) {
foundCount++;
currIdx += searchWord.length();
} else {
currIdx = -1;
}
}
System.out.println(foundCount);
}
}
}
Output:
c++: 2
c: 4
java: 2
asp.net: 1
.net: 2
If you are really wanting a regex solution, you could try something like the following, which uses a case insensitive pattern to match each keyword.
The problem is that the number of occurrences must be kept track of separately. This could be done, for example, by adding each found keyword to a map, where the key is the keyword, and the value is its current count. In addition, once a match is found, the search continues from that point, which implies that any potential overlapping matches are hidden (such as when Asp.NET is found, that particular .NET match will never be found)--this may or may not be a desired behavior.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SearchWordsRegexNoCounts {
public static final void main(String[] ignored) {
Matcher keywordMtchr = Pattern.compile("(C\\+\\+|C|Java|Asp\\.NET|\\.NET)",
Pattern.CASE_INSENSITIVE).matcher("");
String input = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
keywordMtchr.reset(input);
while(keywordMtchr.find()) {
System.out.println("Keyword found at index " + keywordMtchr.start() + ": " + keywordMtchr.group(1));
}
}
}
Output:
Keyword found at index 7: java
Keyword found at index 32: .net
Keyword found at index 57: C
Keyword found at index 60: C++
Keyword found at index 90: C
Keyword found at index 92: C++
Keyword found at index 96: Java
Keyword found at index 101: asp.net
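The map-based counting described above could be sketched as follows. Instead of \b, it uses the lookarounds (?<!\w) and (?!\w), which also work next to '+' and '.', and it tries longer keywords first so that "C++" is not counted as "C" and "asp.net" is not counted as ".net". The class name is hypothetical, the keywords are assumed to be non-empty, and List.of needs Java 9+. Like any boundary-based approach, it would still count the "c" in something like "c-section".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordCounter {
    static Map<String, Integer> count(String text, List<String> keywords) {
        // Try longer keywords first so "c++" wins over "c" and "asp.net" over ".net".
        List<String> ordered = new ArrayList<>(keywords);
        ordered.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder alternatives = new StringBuilder();
        for (String k : ordered) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(k)); // escapes '+', '.', etc.
        }
        // (?<!\w) and (?!\w) act like \b but also work next to non-word characters.
        Pattern p = Pattern.compile("(?<!\\w)(?:" + alternatives + ")(?!\\w)",
                Pattern.CASE_INSENSITIVE);

        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String k : keywords) {
            counts.put(k.toLowerCase(Locale.ROOT), 0);
        }
        Matcher m = p.matcher(text);
        while (m.find()) {
            // the keyword is the map key; its running count is the value
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Hello, java is what I want. Hmm .net should be fine too. "
                + "C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
        System.out.println(count(text, List.of("c++", "c", "java", "asp.net", ".net")));
        // {c++=2, c=2, java=2, asp.net=1, .net=1}
    }
}
```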

Using regex, I've come up with the following solution, although it can potentially find undesired matches, as described in the code comments:
// "\\" is first because we don't want to escape any escape characters we will
// be adding ourselves
private static final String[] regexSpecial = {"\\", "(", ")", "[", "]", "{",
"}", ".", "+", "*", "?", "^", "$", "|"};
private static final String regexEscape = "\\";
private static final String[] regexEscapedSpecial;
static {
regexEscapedSpecial = new String[regexSpecial.length];
for (int i = 0; i < regexSpecial.length; i++) {
regexEscapedSpecial[i] = regexEscape + regexSpecial[i];
}
}
public static void main(String[] args) throws Throwable {
Set<String> searchWords = new HashSet<String>(Arrays.asList("c++", "c",
".net", "asp.net", "java"));
String text = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me\nC,C++,Java,asp.net skills.";
System.out.println(numOccurrences(text, searchWords, false));
}
/**
* Counts the number of occurrences of the given words in the given text. This
* allows the given "words" to contain non-word characters. Note that it is
* possible for unexpected matches to occur. For example if one of the words
* to match is "c" then while none of the "c"s in "coconut" will be matched,
* the "c" in "c-section" will even if only matches of "c" as in the "c
* programming language" were intended.
*/
public static Map<String, Integer> numOccurrences(String text,
Set<String> searchWords, boolean caseSensitive) {
Map<String, String> lowerCaseToSearchWords = new HashMap<String, String>();
List<String> searchWordsInOrder = sortByNonInclusion(searchWords);
StringBuilder regex = new StringBuilder("(?<!\\w)(");
boolean started = false;
for (String searchWord : searchWordsInOrder) {
lowerCaseToSearchWords.put(searchWord.toLowerCase(), searchWord);
if (started) {
regex.append("|");
} else {
started = true;
}
regex.append(escapeRegex(searchWord));
}
regex.append(")(?!\\w)");
Pattern pattern = null;
if (caseSensitive) {
pattern = Pattern.compile(regex.toString());
} else {
pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
}
Matcher matcher = pattern.matcher(text);
Map<String, Integer> matches = new HashMap<String, Integer>();
while (matcher.find()) {
String match = lowerCaseToSearchWords.get(matcher.group(1).toLowerCase());
Integer oldVal = matches.get(match);
if (oldVal == null) {
oldVal = 0;
}
matches.put(match, oldVal + 1);
}
return matches;
}
/**
* Sorts the given collection of words in such a way that if A is a prefix of
* B, then it is guaranteed that A will appear after B in the sorted list.
*/
public static List<String> sortByNonInclusion(Collection<String> toSort) {
List<String> sorted = new ArrayList<String>(new HashSet<String>(toSort));
// sorting in reverse alphabetical order will ensure that if A is a prefix
// of B it will appear later in the list than B
Collections.sort(sorted, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.compareTo(o1);
}
});
return sorted;
}
/**
* Escape all regex special characters in the given text.
*/
public static String escapeRegex(String toEscape) {
for (int i = 0; i < regexSpecial.length; i++) {
toEscape = toEscape.replace(regexSpecial[i], regexEscapedSpecial[i]);
}
return toEscape;
}
The printed result is
{asp.net=1, c=2, c++=2, java=2, .net=1}

Remove all non alphabetic characters from a String array in java

I'm trying to write a method that removes all non-alphabetic characters from a Java String[] and then converts each String to lowercase. I've tried using a regular expression to replace every occurrence of a non-alphabetic character with "". However, the output I get still contains them. Here is the code:
static String[] inputValidator(String[] line) {
for(int i = 0; i < line.length; i++) {
line[i].replaceAll("[^a-zA-Z]", "");
line[i].toLowerCase();
}
return line;
}
However, if I supply an input that has non-alphabetic characters (say - or .), the output still contains them, as they are not removed.
Example Input
A dog is an animal. Animals are not people.
Output that I'm getting
A
dog
is
an
animal.
Animals
are
not
people.
Output that is expected
a
dog
is
an
animal
animals
are
not
people

The problem is your changes are not being stored because Strings are immutable. Each of the method calls is returning a new String representing the change, with the current String staying the same. You just need to store the returned String back into the array.
line[i] = line[i].replaceAll("[^a-zA-Z]", "");
line[i] = line[i].toLowerCase();
Because each method returns a String, you can chain your method calls together. This performs the second method call on the result of the first, allowing you to do both actions in one line.
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();

You need to assign the result of your regex back to line[i].
for ( int i = 0; i < line.length; i++) {
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
}

It doesn't work because strings are immutable; you need to assign the result back, e.g.:
line[i] = line[i].toLowerCase();

You must reassign the result of toLowerCase() and replaceAll() back to line[i], since Java String is immutable (its internal value never changes, and the methods in String class will return a new String object instead of modifying the String object).

As already answered; here is one more way that was not mentioned here (note that \P{Alnum} keeps digits as well as letters):
str = str.replaceAll("\\P{Alnum}", "").toLowerCase();

A cool (but slightly cumbersome, if you don't like casting) way of doing what you want is to go through the entire string index by index, casting each result of String.charAt(index) to (byte), and then checking whether that byte is either a) in the numeric range of lowercase alphabetic characters (a = 97 to z = 122), in which case you cast it back to char and add it to a String, array, or what-have-you, or b) in the numeric range of uppercase alphabetic characters (A = 65 to Z = 90), in which case you add 32 (A + 32 = 65 + 32 = 97 = a), cast that to char, and add it. If it is in neither of those ranges, simply discard it.
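A sketch of this range-checking approach in Java; the class and method names are hypothetical. It compares the char values directly instead of casting to byte: a char is already numeric, and a (byte) cast would truncate characters above 255 (for example, U+0141 'Ł' would become 65, i.e. 'A').

```java
public class LettersOnly {
    // Keeps only ASCII letters and lower-cases them by arithmetic:
    // 'A'..'Z' are 65..90 and 'a'..'z' are 97..122, so adding 32 maps upper to lower case.
    static String lettersToLowerCase(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                out.append(c);
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) (c + 32));
            }
            // anything else (digits, punctuation, non-ASCII letters) is dropped
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] line = "A dog is an animal. Animals are not people.".split(" ");
        for (int i = 0; i < line.length; i++) {
            line[i] = lettersToLowerCase(line[i]); // store the new String back into the array
        }
        System.out.println(String.join(" ", line)); // a dog is an animal animals are not people
    }
}
```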

You can also use Arrays.setAll for this:
Arrays.setAll(array, i -> array[i].replaceAll("[^a-zA-Z]", "").toLowerCase());

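Here is a working method:
String name = "Joy.78#,+~'{/>";
// \W+ matches runs of non-word characters; digits and '_' count as word characters, so "78" is kept
String[] stringArray = name.split("\\W+");
StringBuilder result = new StringBuilder();
for (int i = 0; i < stringArray.length; i++) {
    result.append(stringArray[i]);
}
String nameNew = result.toString();
// Strings are immutable, so the lower-cased result must be assigned back
nameNew = nameNew.toLowerCase();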

public static void solve(String line) {
    // trim to remove leading and trailing spaces
    line = line.trim();
    // split on runs of non-word characters
    String[] split = line.split("\\W+");
    // print using for-each
    for (String s : split) {
        System.out.println(s);
    }
}
