Split string into repeated characters

I want to split the string "aaaabbbccccaaddddcfggghhhh" into "aaaa", "bbb", "cccc", "aa", "dddd", "c", "f" and so on.
I tried this:
String[] arr = "aaaabbbccccaaddddcfggghhhh".split("(.)(?!\\1)");
But this eats away one character, so with the above regular expression I get "aaa" while I want it to be "aaaa" as the first string.
How do I achieve this?

Try this:
String str = "aaaabbbccccaaddddcfggghhhh";
String[] out = str.split("(?<=(.))(?!\\1)");
System.out.println(Arrays.toString(out));
=> [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]
Explanation: we want to split the string at groups of identical characters, so we need to find the "boundary" between each group. I'm using Java's syntax for a positive look-behind to capture the previous character, and then a negative look-ahead with a back reference to verify that the next character is not the same as the previous one. No characters are actually consumed, because only two look-around assertions are used (that is, the regular expression is zero-width).
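As an alternative to splitting on zero-width boundaries, the runs themselves can be matched. The following is a minimal sketch, not part of the answer above: the class name RunSplitter and the method splitIntoRuns are made up for illustration. It matches one character followed by any number of repeats of it and collects each match with Matcher#find. It assumes the input contains no line terminators, because . does not match them by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class RunSplitter {
    // (.) captures one character; \1* extends the match over its repeats.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    static List<String> splitIntoRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            runs.add(m.group()); // group 0 is the whole run
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(splitIntoRuns("aaaabbbccccaaddddcfggghhhh"));
        // prints [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]
    }
}
```

Because every match is a complete run, no look-around is needed and nothing gets "eaten" at the boundaries.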

What about capturing in a lookbehind?
(?<=(.))(?!\1|$)
as a Java string:
(?<=(.))(?!\\1|$)

Here I take each character and check two conditions in the if statement: the index must stay within the array bounds, and the next character must differ from the current one. When both hold, a line break is printed, so each run of repeated characters ends up on its own line.
char[] arr = "aaaabbbccccaaddddcfggghhhh".toCharArray();
for (int i = 0; i < arr.length; i++) {
    char chr = arr[i];
    System.out.print(chr);
    // a different next character marks the end of the current run
    if (i + 1 < arr.length && arr[i + 1] != chr) {
        System.out.print(" \n");
    }
}

Related

java regex mask all elements in a list with last 4 characters visible

I have a list of alphanumeric strings as below
["nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN"]
I need to mask every element, leaving only the last 4 characters visible; the [ and " characters must not be masked, as shown below.
["XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4y74","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdeNv","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX12NN"]
I have tried using
(\\W+)(\\W+)(\\w+)(\\w+)(\\w+)(\\w+)(\\w+)(\\W+)(\\W+)
as the key and $1$2XXXXXXXXXX$4$5$6$7$8$9 as the value in
maskedValue = maskedValue.replaceAll("(\\W+)(\\W+)(\\w+)(\\w+)(\\w+)(\\w+)(\\w+)(\\W+)(\\W+)", "$1$2XXXXXXXXXX$4$5$6$7$8$9")
but this only masked the first element.
["XXXXXXXXXXdeNv","nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74"]
Any leads are appreciated. Thanks in advance.

For a single value, you could use an assertion to match a word character asserting 4 characters at the end of the string.
\w(?=\w*\w{4}$)
String values[] = {"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN"};
for (String element : values)
System.out.println(element.replaceAll("\\w(?=\\w*\\w{4}$)", "X"));
Output
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4y74
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdeNv
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX12NN
For the whole string, you might use a finite quantifier in a positive lookbehind to match the opening " followed by a number of word characters. Then match every word character that still has at least 4 word characters between it and the closing "
"(?<=\"\\w{0,100})\\w(?=\\w*\\w{4}\")"
String regex = "(?<=\"\\w{0,100})\\w(?=\\w*\\w{4}\")";
String string = "[\"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74\",\"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv\",\"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN\"] ";
System.out.println(string.replaceAll(regex, "X"));
Output
["XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4y74","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdeNv","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX12NN"]
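If the bounded lookbehind feels fragile, the same result can be reached by matching each quoted value and rebuilding it. This is only a sketch under assumptions: the class QuotedMasker and the method maskQuoted are hypothetical names, and it assumes every value consists of word characters between double quotes, as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class QuotedMasker {
    // A quoted run of word characters; group 1 is the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"(\\w+)\"");

    static String maskQuoted(String input, int visible) {
        Matcher m = QUOTED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(1);
            // Index of the first character that stays visible.
            int keepFrom = Math.max(0, value.length() - visible);
            StringBuilder masked = new StringBuilder("\"");
            for (int i = 0; i < keepFrom; i++) {
                masked.append('X');
            }
            masked.append(value, keepFrom, value.length()).append('"');
            m.appendReplacement(out, Matcher.quoteReplacement(masked.toString()));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskQuoted("[\"abcdefgh\",\"12345\",\"xyz\"]", 4));
        // prints ["XXXXefgh","X2345","xyz"]
    }
}
```

Values shorter than the visible length are left untouched, and everything outside the quotes is copied through unchanged.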

Using a stream:
List<String> terms = Arrays.asList(new String[] {
"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74",
"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv",
"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN"
});
List<String> termsOut = terms.stream()
.map(t -> String.join("", Collections.nCopies(t.length() - 4, "x")) +
t.substring(t.length() - 4))
.collect(Collectors.toList());
System.out.println(termsOut);
This prints:
[xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx4y74,
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxdeNv,
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx12NN]
Note that this solution does not even use regex, which means it may outperform a regex based solution.

Assuming each of these strings will start and end with quotes
Algo:
Use a flag or stack data structure to know if it's a starting quote or ending quote.
For example:
Traverse the string. Initially the flag is false. When you encounter a quote, flip the flag and keep traversing until you find the next quote. You can do the same with a stack:
Stack<Character> stack = new Stack<>();
Sample workflow:
String str = "random"; // replace with the input string
boolean flag = false; // true while we are inside a quoted value
int idx = 0;
StringBuilder string = new StringBuilder(); // for final string
int start = 0; // index of the first character after the opening quote
while (idx < str.length()) {
    char c = str.charAt(idx);
    if (c == '"' && !flag) {
        // opening quote: remember where the quoted value starts
        string.append(c);
        start = idx + 1;
        flag = true;
    } else if (c == '"') {
        // closing quote: mask everything except the last 4 characters
        flag = false;
        int visibleFrom = Math.max(start, idx - 4);
        char[] mask = new char[visibleFrom - start];
        Arrays.fill(mask, 'x');
        string.append(mask).append(str, visibleFrom, idx).append(c);
    } else if (!flag) {
        // outside quotes: copy the character unchanged
        string.append(c);
    }
    idx++;
}
Complexity: O(n)

Extract words between double quotes based on position

I have a single string that contains several quotes, i.e:
"Bruce Wayne" "43" "male" "Gotham"
I want to create a method using regex that extracts certain values from the String based on their position.
So for example, if I pass the Int values 1 and 3 it should return a String of:
"Bruce Wayne" "male"
Please note the double quotes are part of the String and are escaped characters (\")

If the number of (possible) groups is known you could use a regular expression like "(.*?)"\s*"(.*?)"\s*"(.*?)"\s*"(.*?)" along with Pattern and Matcher and access the groups by number (group 0 is always the entire match, group 1 is the first capturing group in the expression and so on).
If the number of groups is not known you could just use the expression "(.*?)" and use Matcher#find() to apply the expression in a loop and collect all the matches (group 0 in that case) into a list. Then use your indices to access the list elements (element 1 would be at index 0 then).
Another alternative would be to use string.replaceAll("^[^\"]*\"|\"[^\"]*$","").split("\"\\s*\""), i.e. remove the leading and trailing double quotes with any text before or after and then split on quotes with optional whitespace in between.
Example:
assume the string optional crap before "Bruce Wayne" "43" "male" "Gotham" optional crap after
string.replaceAll("^[^\"]*\"|\"[^\"]*$","") will result in Bruce Wayne" "43" "male" "Gotham
applying split("\"\\s*\"") on the result of the step before will yield the array [Bruce Wayne, 43, male, Gotham]
then just access the array elements by index (zero-based)
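The Matcher#find() variant described above could look roughly like this. It is a sketch only: the class QuotedFieldPicker and the methods quotedValues and pick are hypothetical names, and positions are taken as 1-based to match the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class QuotedFieldPicker {
    // Reluctant match of everything between one pair of double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    static List<String> quotedValues(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            values.add(m.group()); // group 0: the whole match, quotes included
        }
        return values;
    }

    static String pick(String text, int... positions) {
        List<String> values = quotedValues(text);
        StringJoiner joined = new StringJoiner(" ");
        for (int position : positions) {
            joined.add(values.get(position - 1)); // positions are 1-based
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(pick("\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"", 1, 3));
        // prints "Bruce Wayne" "male"
    }
}
```

Using m.group(1) instead of m.group() would return the values without the surrounding quotes.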

My function starts at 0. You said that you want 1 and 3 but usually you start at 0 when working with arrays. So to get "Bruce Wayne" you'd ask for 0 not 1. (you could change that if you'd like though)
String[] getParts(String text, int... positions) {
String results[] = new String[positions.length];
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(text);
for(int i = 0, j = 0; m.find() && j < positions.length; i++) {
if(i != positions[j]) continue;
results[j] = m.group();
j++;
}
return results;
}
// Usage
public Test() {
String[] parts = getParts(" \"Bruce Wayne\" \"43\" \"male\" \"Gotham\" ", 0, 2);
System.out.println(Arrays.toString(parts));
// = ["Bruce Wayne", "male"]
}
The method accepts as many parameters as you like.
getParts(" \"a\" \"b\" \"c\" \"d\" ", 0, 2, 3); // = ["a", "c", "d"]
// or
getParts(" \"a\" \"b\" \"c\" \"d\" ", 3); // = ["d"]

The function to extract words based on position:
import java.util.ArrayList;
import java.util.regex.*;
public String getString(String input, int i, int j){
ArrayList <String> list = new ArrayList <String> ();
Matcher m = Pattern.compile("(\"[^\"]+\")").matcher(input);
while (m.find()) {
list.add(m.group(1));
}
return list.get(i - 1) + list.get(j - 1);
}
Then the words can be extracted like:
String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
String res = getString(input, 1, 3);
System.out.println(res);
Output:
"Bruce Wayne""male"

java regular expression examples for match without length limitation

I am trying to write a regular expression that matches a string starting with the letter "G", whose second character is a digit (0-9), and whose remainder can contain anything and be of any length.
I'm stuck with the following code:
String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
String regExp = "^[G]\\d[0-9]";
for(int i = 0; i < array.length; i++)
{
if(Pattern.matches(regExp, array[i]))
{
System.out.println(array[i] + " - Successful");
}
}
output:
G12 - Successful
Why does it not match the third element, "G8756942"?

G - the letter G
[0-9] - a digit
.* - any sequence of characters
So the expression
G[0-9].*
will match a letter G followed by a digit followed by any sequence of characters.

when you write \d it already means [0-9]
so when you say \d[0-9] that means two digits exactly
better use :
^G\\d*
which will match all words starting with G and having zero or more digits

"^[G]\\d[0-9]"
This regex matches "G" followed by \\d, then another number.
Use one of these:
"^G\\d"
"^G[0-9]"
Also note that you don't need a character class since it only contains one letter, so it's redundant.
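Note that these shorter patterns describe only the start of the string, so they succeed with Matcher#find() but not with Pattern.matches(), which requires the entire input to match. A minimal sketch of the difference, where the class PrefixMatchDemo and the method startsWithGDigit are made-up names for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, used only for illustration.
class PrefixMatchDemo {
    private static final Pattern PREFIX = Pattern.compile("^G\\d");

    // find() succeeds as soon as the pattern matches somewhere in the input;
    // the ^ anchor pins that match to the start of the string.
    static boolean startsWithGDigit(String s) {
        return PREFIX.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
        for (String s : array) {
            if (startsWithGDigit(s)) {
                System.out.println(s + " - Successful");
            }
        }
        // Pattern.matches needs the whole input to match, so this prints false:
        System.out.println(Pattern.matches("^G\\d", "G121"));
    }
}
```

With find(), the loop reports G121 and G8756942; to get the same result from Pattern.matches() the pattern needs a trailing .* to cover the rest of the input.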

Try this regex; the .* will match any characters after the digit:
^G\\d.*
http://regex101.com/r/uE4tX1/1

Why does it not match the third element, "G8756942"?
Because \\d and [0-9] each match exactly one digit, so the pattern only accepts a G followed by exactly two digits, and Pattern.matches requires the whole input to match. Solution:
^[G]\d.*

This regex would be fine.
"G\\d.*"
Because the matches method tries to match the whole input, you need to add .* at the end of your pattern; you also don't need to include anchors.
String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
String regExp = "G\\d.*";
for(int i = 0; i < array.length; i++)
{
if(Pattern.matches(regExp, array[i]))
{
System.out.println(array[i] + " - Successful");
}
}
Output:
G121 - Successful
G8756942 - Successful

How to exclude the words that have non-alphabetic characters from string

For example, if I want to delete the non-alphabetic characters I would do:
for (int i = 0; i < s.length; i++) {
s[i] = s[i].replaceAll("[^a-zA-Z]", "");
}
How do I completely exclude a word with a non-alphabetic character from the string?
For example:
Initial input:
"a cat jumped jumped; on the table"
It should exclude "jumped;" because of ";".
Output:
"a cat jumped on the table"

Edit: (in response to your edit)
You could do this:
String input = "a cat jumped jumped; on the table";
input = input.replaceAll("(^| )[^ ]*[^A-Za-z ][^ ]*(?=$| )", "");
Let's break down the regex:
(^| ) matches after the beginning of a word, either after a space or after the start of the string.
[^ ]* matches any sequence, including the null string, of non-spaces (because spaces break the word)
[^A-Za-z ] checks if the character is non-alphabetical and does not break the string.
Lastly, we need to append [^ ]* to make it match until the end of the word.
(?=$| ) matches the end of the word, either the end of the string or the next space character, but it doesn't consume the next space, so that consecutive words will still match (ie "I want to say hello, world! everybody" becomes "I want to say everybody")
Note: if "a cat jumped off the table." should output "a cat jumped off the table", then use this:
input = input.replaceAll(" [^ ]*[^A-Za-z ][^ ]*(?= )", "").replaceAll("[^A-Za-z]$", "");
Assuming you have 1 word per array element, you can do this to replace them with the empty string:
for (int index = 0; index < s.length; index++) {
    // blank out any element that contains a non-alphabetic character
    if (s[index].matches(".*[^A-Za-z].*")) {
        s[index] = "";
    }
}
If you actually want to remove it, consider using an ArrayList:
ArrayList<String> stringList = new ArrayList<>();
for (int index = 0; index < s.length; index++) {
    // keep only the elements without any non-alphabetic character
    if (!s[index].matches(".*[^A-Za-z].*")) {
        stringList.add(s[index]);
    }
}
And the ArrayList will have all the elements that don't have non-alphabetical characters in them.
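For a whole sentence, the same filtering idea can be written with streams instead of a replaceAll. This is a sketch under assumptions: the class AlphabeticWordFilter and the method keepAlphabeticWords are hypothetical names, and words are assumed to be separated by whitespace.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for illustration.
class AlphabeticWordFilter {
    static String keepAlphabeticWords(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                // keep a word only if every character is a letter a-z or A-Z
                .filter(word -> word.matches("[A-Za-z]+"))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(keepAlphabeticWords("a cat jumped jumped; on the table"));
        // prints a cat jumped on the table
    }
}
```

Because the words are filtered one by one, a rejected word at the start or end of the sentence leaves no stray space behind.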

Try this:
s = String.join(" ", s).replaceAll("(?<!\\S)\\S*[^A-Za-z\\s]\\S*\\s*", "").split(" ");
It joins the array with spaces, then applies the regex. The regex looks for the start of a word ((?<!\S), i.e. a position not preceded by a non-space character), then a word containing at least one non-alphabetic character (\S*[^A-Za-z\s]\S*), and finally the whitespace that follows it (\s*), which is removed together with the word. The split then turns the string back into an array.

public static void main(String[] args) {
    String str[] = { "123abass;[;[]", "abcde", "1234" };
    for (String s : str) {
        if (s.matches("^[a-zA-Z]+$")) // must consist only of the letters a-z / A-Z
            System.out.println(s);
    }
}
Output: abcde

You could use .toLowerCase() on each value in the array, then check each character against the a-z range; this is likely to be faster than a regular expression. Assume that your values are in an array called "myArray."
List<String> newValues = new ArrayList<>();
for(String s : myArray) {
if(containsOnlyLetters(s)) {
newValues.add(s);
}
}
//do this if you have to go back to an array instead of an ArrayList
String[] newArray = newValues.toArray(new String[0]);
This is the containsOnlyLetters method:
boolean containsOnlyLetters(String input) {
char[] inputLetters = input.toLowerCase().toCharArray();
for(char c : inputLetters) {
if(c < 'a' || c > 'z') {
return false;
}
}
return true;
}

Split a String at every 3rd comma in Java

I have a string that looks like this:
0,0,1,2,4,5,3,4,6
What I want returned is a String[] that was split after every 3rd comma, so the result would look like this:
[ "0,0,1", "2,4,5", "3,4,6" ]
I have found similar functions but they don't split at n-th amount of commas.

NOTE: while the solution using split may work (last tested on Java 17), it relies on a bug: look-behind in Java is supposed to have an obvious maximum length. This limitation should theoretically prevent us from using + but somehow \G at the start lets us use + here. If this bug is fixed in the future, the split approach will stop working.
A safer approach is to use Matcher#find, like
String data = "0,0,1,2,4,5,3,4,6";
Pattern p = Pattern.compile("\\d+,\\d+,\\d+");//no look-behind needed
Matcher m = p.matcher(data);
List<String> parts = new ArrayList<>();
while(m.find()){
parts.add(m.group());
}
String[] result = parts.toArray(new String[0]);
You can try to use the split method with the regex (?<=\\G\\d+,\\d+,\\d+),
Demo
String data = "0,0,1,2,4,5,3,4,6";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+),"); //Magic :)
// to reveal magic see explanation below answer
for(String s : array){
System.out.println(s);
}
output:
0,0,1
2,4,5
3,4,6
Explanation
\\d means one digit, same as [0-9], like 0 or 3
\\d+ means one or more digits like 1 or 23
\\d+, means one or more digits with comma after it, like 1, or 234,
\\d+,\\d+,\\d+ will accept three numbers with commas between them like 12,3,456
\\G means last match, or if there is none (in case of first usage) start of the string
(?<=...), is positive look-behind which will match comma , that has also some string described in (?<=...) before it
(?<=\\G\\d+,\\d+,\\d+), so will try to find a comma that has three numbers before it, and these numbers have either the start of the string before them (like ^0,0,1 in your example) or the previously matched comma, like 2,4,5 and 3,4,6.
Also, in case you want to use characters other than digits, you can use another set of characters like
\\w which will match alphabetic characters, digits and _
\\S everything that is not white space
[^,] everything that is not comma
... and so on. More info in Pattern documentation
By the way, this form will work with split on every 3rd, 5th, 7th, (and other odd numbers) comma, like split("(?<=\\G\\w+,\\w+,\\w+,\\w+,\\w+),") will split on every 5th comma.
To split on every 2nd, 4th, 6th, 8th (and rest of even numbers) comma you will need to replace + with {1,maxLengthOfNumber} like split("(?<=\\G\\w{1,3},\\w{1,3},\\w{1,3},\\w{1,3}),") to split on every 4th comma when numbers can have max 3 digits (0, 00, 12, 000, 123, 412, 999).
To split on every 2nd comma you can also use this regex split("(?<!\\G\\d+),") based on my previous answer
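The Matcher#find approach also generalizes to any n, odd or even, without relying on look-behind behaviour. The following is a rough sketch: the class NthCommaSplitter and the method splitEveryNth are hypothetical names, and it assumes the input has no empty fields (no two commas in a row). A shorter final group is kept as a leftover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class NthCommaSplitter {
    static List<String> splitEveryNth(String data, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // One field, then greedily up to n-1 further ",field" pairs.
        Pattern p = Pattern.compile("[^,]+(?:,[^,]+){0," + (n - 1) + "}");
        Matcher m = p.matcher(data);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitEveryNth("0,0,1,2,4,5,3,4,6", 3));
        // prints [0,0,1, 2,4,5, 3,4,6]
        System.out.println(splitEveryNth("0,0,1,2,4,5,3,4,6", 2));
        // prints [0,0, 1,2, 4,5, 3,4, 6]
    }
}
```

The separating comma between two groups is skipped automatically, because [^,]+ cannot start a match on a comma.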

Obligatory Guava answer:
String input = "0,0,1,2,4,5,3,4,6";
String delimiter = ",";
int partitionSize = 3;
for (Iterable<String> iterable : Iterables.partition(Splitter.on(delimiter).split(input), partitionSize)) {
System.out.println(Joiner.on(delimiter).join(iterable));
}
Outputs:
0,0,1
2,4,5
3,4,6

Try something like the below:
public String[] mySplitIntoThree(String str)
{
String[] parts = str.split(",");
List<String> strList = new ArrayList<String>();
for(int x = 0; x < parts.length - 2; x = x+3)
{
String tmpStr = parts[x] + "," + parts[x+1] + "," + parts[x+2];
strList.add(tmpStr);
}
return strList.toArray(new String[strList.size()]);
}
(You may need to import java.util.ArrayList and java.util.List)

Nice one for the coding dojo! Here's my good old-fashioned C-style answer:
If we call the bits between commas 'parts', and the results that get split off 'substrings' then:
n is the amount of parts found so far,
i is the start of the next part,
startIndex the start of the current substring
Iterate over the parts, every third part: chop off a substring.
Add the leftover part at the end to the result when you run out of commas.
String x = "0,0,1,2,4,5,3,4,6"; // the input string
List<String> result = new ArrayList<String>();
int startIndex = 0;
int n = 0;
for (int i = x.indexOf(',') + 1; i > 0; i = x.indexOf(',', i) + 1, n++) {
if (n % 3 == 2) {
result.add(x.substring(startIndex, i - 1));
startIndex = i;
}
}
result.add(x.substring(startIndex));
