Extract words between double quotes based on position - java

I have a single string that contains several quoted values, e.g.:
"Bruce Wayne" "43" "male" "Gotham"
I want to create a method using regex that extracts certain values from the String based on their position.
So, for example, if I pass the int values 1 and 3, it should return the String:
"Bruce Wayne" "male"
Please note that the double quotes are part of the String and are escaped characters (\").

If the number of (possible) groups is known, you could use a regular expression like "(.*?)"\s*"(.*?)"\s*"(.*?)"\s*"(.*?)" along with Pattern and Matcher and access the groups by number (group 0 is always the entire match, group 1 is the first capturing group in the expression, and so on).
If the number of groups is not known, you could just use the expression "(.*?)" and call Matcher#find() in a loop to collect all the matches (group 0 in that case) into a list. Then use your indices to access the list elements (element 1 would then be at index 0).
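A minimal sketch of the find-loop approach, assuming the positions are 1-based as in the question and that the selected values should be joined with a single space. The class and method names (QuotedValues, extract) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValues {
    // One quoted value; group 0 (the whole match) keeps the surrounding quotes
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Hypothetical helper: returns the quoted values at the given 1-based positions, joined by a space
    static String extract(String input, int... positions) {
        List<String> all = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            all.add(m.group());
        }
        List<String> picked = new ArrayList<>();
        for (int p : positions) {
            picked.add(all.get(p - 1)); // position 1 is list index 0
        }
        return String.join(" ", picked);
    }

    public static void main(String[] args) {
        String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
        System.out.println(extract(input, 1, 3)); // "Bruce Wayne" "male"
    }
}
```

A position past the last quoted value makes List.get throw an IndexOutOfBoundsException, so callers may want to check the number of matches first.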
Another alternative would be to use string.replaceAll("^[^\"]*\"|\"[^\"]*$","").split("\"\\s*\""), i.e. remove the leading and trailing double quotes with any text before or after and then split on quotes with optional whitespace in between.
Example:
Assume the string: optional crap before "Bruce Wayne" "43" "male" "Gotham" optional crap after
string.replaceAll("^[^\"]*\"|\"[^\"]*$","") will result in: Bruce Wayne" "43" "male" "Gotham
Applying split("\"\\s*\"") to the result of the previous step will yield the array [Bruce Wayne, 43, male, Gotham].
Then just access the array elements by index (zero-based).
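The steps above can be put together in a small method. This is a sketch; the class and method names (SplitQuoted, values) are hypothetical, and it assumes the values themselves contain no double quotes:

```java
import java.util.Arrays;

class SplitQuoted {
    // Removes everything up to the first quote and after the last quote,
    // then splits on a closing quote, optional whitespace and an opening quote
    static String[] values(String input) {
        return input.replaceAll("^[^\"]*\"|\"[^\"]*$", "").split("\"\\s*\"");
    }

    public static void main(String[] args) {
        String s = "text before \"Bruce Wayne\" \"43\" \"male\" \"Gotham\" text after";
        String[] parts = values(s);
        System.out.println(Arrays.toString(parts)); // [Bruce Wayne, 43, male, Gotham]
        // zero-based access: position 1 from the question is index 0
        System.out.println(parts[0] + " " + parts[2]); // Bruce Wayne male
    }
}
```

Note that this variant returns the values without their quotes; add them back when joining if they are needed in the result.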

My function counts positions from 0. You said that you want 1 and 3, but arrays are usually indexed from 0, so to get "Bruce Wayne" you'd ask for 0, not 1 (you could change that if you like).
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String[] getParts(String text, int... positions) {
    String[] results = new String[positions.length];
    Matcher m = Pattern.compile("\"[^\"]*\"").matcher(text);
    // i counts the quoted values found so far, j indexes positions (which must be ascending)
    for (int i = 0, j = 0; m.find() && j < positions.length; i++) {
        if (i != positions[j]) continue;
        results[j] = m.group();
        j++;
    }
    return results;
}
// Usage
public Test() {
    String[] parts = getParts(" \"Bruce Wayne\" \"43\" \"male\" \"Gotham\" ", 0, 2);
    System.out.println(Arrays.toString(parts));
    // = ["Bruce Wayne", "male"]
}
The method accepts as many positions as you like, as long as they are given in ascending order.
getParts(" \"a\" \"b\" \"c\" \"d\" ", 0, 2, 3); // = "a", "c", "d"
// or
getParts(" \"a\" \"b\" \"c\" \"d\" ", 3); // = "d"

The function to extract words based on position:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;
public String getString(String input, int i, int j) {
    List<String> list = new ArrayList<>();
    Matcher m = Pattern.compile("(\"[^\"]+\")").matcher(input);
    while (m.find()) {
        list.add(m.group(1));
    }
    // positions are 1-based, list indices are 0-based
    return list.get(i - 1) + " " + list.get(j - 1);
}
Then the words can be extracted like:
String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
String res = getString(input, 1, 3);
System.out.println(res);
Output:
"Bruce Wayne" "male"

Related

java regex mask all elements in a list with last 4 characters visible

I have a list of alphanumeric strings as below
["nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN"]
I need to mask every element so that only the last 4 characters stay visible; the [ and " characters must not be masked, as shown below.
["XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4y74","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdeNv","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX12NN"]
I have tried using
(\\W+)(\\W+)(\\w+)(\\w+)(\\w+)(\\w+)(\\w+)(\\W+)(\\W+)
as the key and $1$2XXXXXXXXXX$4$5$6$7$8$9 as the value in
maskedValue = maskedValue.replaceAll("(\\W+)(\\W+)(\\w+)(\\w+)(\\w+)(\\w+)(\\w+)(\\W+)(\\W+)", "$1$2XXXXXXXXXX$4$5$6$7$8$9")
but this only masked the first element.
["XXXXXXXXXXdeNv","nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74"]
Any leads are appreciated. Thanks in advance.
For a single value, you could use a lookahead assertion to match each word character that is followed by at least 4 more word characters up to the end of the string.
\w(?=\w*\w{4}$)
String values[] = {"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv","GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN"};
for (String element : values)
System.out.println(element.replaceAll("\\w(?=\\w*\\w{4}$)", "X"));
Output
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4y74
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdeNv
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX12NN
For the whole string, you might use a finite quantifier in a positive lookbehind to match the opening " followed by up to 100 word characters. Then match every word character that is followed by at least 4 more word characters before the closing ".
"(?<=\"\\w{0,100})\\w(?=\\w*\\w{4}\")"
String regex = "(?<=\"\\w{0,100})\\w(?=\\w*\\w{4}\")";
String string = "[\"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74\",\"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv\",\"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN\"] ";
System.out.println(string.replaceAll(regex, "X"));
Output
["XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4y74","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdeNv","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX12NN"]
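Another option, assuming Java 11+ (Matcher.replaceAll(Function) arrived in Java 9, String.repeat in Java 11), is to skip the lookbehind entirely and rebuild each quoted value in a callback. A sketch with hypothetical names (MaskQuoted, maskQuoted), assuming the values contain no escaped quotes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskQuoted {
    // One quoted value; group 1 is the text between the quotes
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Replaces all but the last 4 characters of every quoted value with 'X'
    static String maskQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.replaceAll(r -> {
            String value = r.group(1);
            int masked = Math.max(0, value.length() - 4);
            String replacement = "\"" + "X".repeat(masked) + value.substring(masked) + "\"";
            // the returned text is scanned for $ and \ references, so escape it
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        String s = "[\"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74\",\"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN\"]";
        System.out.println(maskQuoted(s));
    }
}
```

Values of 4 characters or fewer are left unchanged, and everything outside the quotes ([, the commas, ]) is copied as is.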
Using a stream:
import java.util.*;
import java.util.stream.Collectors;
List<String> terms = Arrays.asList(
    "nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74",
    "GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv",
    "GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1o12NN"
);
// replace all but the last 4 characters of each term with 'x'
List<String> termsOut = terms.stream()
    .map(t -> String.join("", Collections.nCopies(t.length() - 4, "x")) +
        t.substring(t.length() - 4))
    .collect(Collectors.toList());
System.out.println(termsOut);
This prints:
[xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx4y74,
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxdeNv,
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx12NN]
Note that this solution does not even use regex, which means it may outperform a regex based solution.
Assuming each of these strings starts and ends with a quote:
Algo:
Use a flag or a stack to know whether a quote is an opening or a closing quote.
For example:
Traverse the string. Initially the flag is false. When you encounter a quote, flip the flag and keep traversing until you find the other quote. You can do the same with
Stack<Character> stack = new Stack<>();
Sample workflow:
import java.util.Arrays;
String str = "[\"nG5wnyPVNxS6PbbDNNbRsK5zanG94Et6Q4y74\",\"GgQoDWqP7KtxXeePyyebu5EnNp8XxPC1odeNv\"]";
boolean flag = false;                       // true while between an opening and a closing quote
int start = 0;                              // index of the current opening quote
StringBuilder string = new StringBuilder(); // for final string
for (int idx = 0; idx < str.length(); idx++) {
    char c = str.charAt(idx);
    if (c == '"' && !flag) {
        // start index of string
        start = idx;
        flag = true;
        string.append(c);
    } else if (c == '"' && flag) {
        // end index of string: put 'x' in place of all but the last 4 characters
        flag = false;
        String value = str.substring(start + 1, idx);
        int masked = Math.max(0, value.length() - 4);
        char[] mask = new char[masked];
        Arrays.fill(mask, 'x');
        string.append(mask).append(value.substring(masked)).append(c);
    } else if (!flag) {
        // characters outside the quotes, such as [ , ], are copied unchanged
        string.append(c);
    }
}
System.out.println(string);
Complexity: O(n)

(hello-> h3o) How to replace in a String the middle letters for the number of letters replaced

I need to build a method which receives a String, e.g. "elephant-rides are really fun!", and returns another similar String; in this example the return value should be "e6t-r3s are r4y fun!" (because e-lephan-t has 6 middle letters, r-ide-s has 3 middle letters, and so on).
To get that result, I need to replace the middle letters of each word with the number of letters replaced, leaving unchanged everything that isn't a letter, as well as the first and last letter of every word.
So far I've tried using a regex to split the received string into words and saving these words in an array of strings. I also have an array of ints in which I save the number of middle letters, but I don't know how to join both arrays and the symbols into a correct String to return.
String string="elephant-rides are really fun!";
String[] parts = string.split("[^a-zA-Z]");
int[] sizes = new int[parts.length];
int index=0;
for(String aux: parts)
{
sizes[index]= aux.length()-2;
System.out.println( sizes[index]);
index++;
}
You may use
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(1) + m.group(2).length() + m.group(3));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb);
// => e6t-r3s are r4y fun!
Here, (?U)(\\w)(\\w{2,})(\\w) matches any Unicode word char and captures it into Group 1, then captures 2 or more word chars into Group 2, and then captures a single word char into Group 3. Inside the .appendReplacement call, the Group 2 contents are replaced with their length. Words shorter than 4 characters (such as are and fun) do not match and stay unchanged.
Java 9+:
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
String result = m.replaceAll(x -> x.group(1) + x.group(2).length() + x.group(3));
System.out.println( result );
// => e6t-r3s are r4y fun!
For the instructions you gave us, this would be sufficient:
String [] result = string.split("[\\s-]");
for (int i=0; i<result.length; i++){
result[i] = "" + result[i].charAt(0) + ((result[i].length())-2) + result[i].charAt(result[i].length()-1);
}
With your input, it creates the array [ "e6t", "r3s", "a1e", "r4y", "f2!" ]
It even runs on words of one or two letters, but then it gives results such as:
Input: I am a small; Output: [ "I-1I", "a0m", "a-1a", "s3l" ]
Again, for the instructions you gave us this would be legal.
Hope I helped!

Parsing a string with [3:0] substring in it

I want to store two numbers from a string into two distinct variables - for example, var1 = 3 and var2 = 0 from "[3:0]". I have the following code snippet:
String myStr = "[3:0]";
if (myStr.trim().matches("\\[(\\d+)\\]")) {
// Do something.
// If it enters here, I want to store 3 and 0 in different variables or an array
}
Is it possible to do this with split and regular expressions?
Don't call trim(). Enhance your regex instead.
Your regex is missing the pattern for : and the second number, and you don't need to escape the ].
To capture the matched numbers, you need the Matcher:
String myStr = " [3:0] ";
Matcher m = Pattern.compile("\\s*\\[(\\d+):(\\d+)]\\s*").matcher(myStr);
if (m.matches())
System.out.println(m.group(1) + ", " + m.group(2));
Output
3, 0
You can use replaceAll and split
String myStr = "[3:0]";
if (myStr.trim().matches("\\[\\d+:\\d+\\]")) {
    String[] numbers = myStr.trim().replaceAll("[\\[\\]]", "").split(":");
}
Moreover, your regex to match the String should be \\[\\d+:\\d+\\]. If you want to avoid trim, you can add \\s* at the start and end to match the spaces, but trim is not bad.
EDIT
As suggested by Andreas in comments,
String myStr = "[3:0]";
String regExp = "\\[(\\d+):(\\d+)\\]";
Pattern pattern = Pattern.compile(regExp);
Matcher matcher = pattern.matcher(myStr.trim());
if(matcher.find()) {
int a = Integer.parseInt(matcher.group(1));
int b = Integer.parseInt(matcher.group(2));
System.out.println(a + " : " + b);
}
OUTPUT
3 : 0
Without any regular expressions you could do this:
// this will remove the braces [ and ] and just leave "3:0"
String numberString = myStr.trim().replace("[", "").replace("]", "");
// this will split the string in everything before the : and everything after the : (so two values as an array)
String[] numbers = numberString.split(":");
// get the first value and parse it as a number "3" will become a simple 3
int firstNumber = Integer.parseInt(numbers[0]) ;
// get the second value and parse it from "0" to a plain 0
int secondNumber = Integer.parseInt(numbers[1]);
Be careful when parsing numbers: depending on your input string and what other possibilities there might be, Integer.parseInt can throw a NumberFormatException (e.g. "3:12" is fine, but "3:a" is not).
In case you don't need to validate the input and simply want to get the numbers from it, you could find indexOf(":") and take the substrings you are interested in, which are:
from after [ (which is at position 0) up to :
and from after : up to ] (which is at position length of string - 1)
Your code can look like
String text = "[3:0]";
int colonIndex = text.indexOf(':');
String first = text.substring(1, colonIndex);
String second = text.substring(colonIndex + 1, text.length() - 1);

Split a String at every 3rd comma in Java

I have a string that looks like this:
0,0,1,2,4,5,3,4,6
What I want returned is a String[] that was split after every 3rd comma, so the result would look like this:
[ "0,0,1", "2,4,5", "3,4,6" ]
I have found similar functions but they don't split at n-th amount of commas.
NOTE: while the solution using split may work (last tested on Java 17), it is based on a bug, since a look-behind in Java should have an obvious maximum length. This limitation should theoretically prevent us from using + inside it, but somehow \G at the start lets us use + here. If this bug is fixed in the future, the split approach will stop working.
A safer approach is to use Matcher#find, like:
String data = "0,0,1,2,4,5,3,4,6";
Pattern p = Pattern.compile("\\d+,\\d+,\\d+"); // no look-behind needed
Matcher m = p.matcher(data);
List<String> parts = new ArrayList<>();
while(m.find()){
parts.add(m.group());
}
String[] result = parts.toArray(new String[0]);
You can try to use split method with (?<=\\G\\d+,\\d+,\\d+), regex
String data = "0,0,1,2,4,5,3,4,6";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+),"); //Magic :)
// to reveal magic see explanation below answer
for(String s : array){
System.out.println(s);
}
output:
0,0,1
2,4,5
3,4,6
Explanation
\\d means one digit, same as [0-9], like 0 or 3
\\d+ means one or more digits like 1 or 23
\\d+, means one or more digits with comma after it, like 1, or 234,
\\d+,\\d+,\\d+ will accept three numbers with commas between them like 12,3,456
\\G means the end of the previous match or, if there is none (on first use), the start of the string
(?<=...), is a positive look-behind; the whole expression matches a comma , that has the text described in (?<=...) before it
(?<=\\G\\d+,\\d+,\\d+), will therefore try to find a comma that has three numbers before it, with either the start of the string before those numbers (like ^0,0,1 in your example) or the previously matched comma (like 2,4,5 and 3,4,6).
Also, in case you want to use characters other than digits, you can use other character sets, like
\\w which will match alphabetic characters, digits and _
\\S everything that is not white space
[^,] everything that is not comma
... and so on. More info in Pattern documentation
By the way, this form will work with split on every 3rd, 5th, 7th, (and other odd numbers) comma, like split("(?<=\\G\\w+,\\w+,\\w+,\\w+,\\w+),") will split on every 5th comma.
To split on every 2nd, 4th, 6th, 8th (and rest of even numbers) comma you will need to replace + with {1,maxLengthOfNumber} like split("(?<=\\G\\w{1,3},\\w{1,3},\\w{1,3},\\w{1,3}),") to split on every 4th comma when numbers can have max 3 digits (0, 00, 12, 000, 123, 412, 999).
To split on every 2nd comma you can also use this regex split("(?<!\\G\\d+),") based on my previous answer
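The Matcher#find approach from the note above generalizes to any n, even or odd, without relying on look-behind behavior. A sketch assuming there are no empty fields between commas; the names SplitEveryN and split are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitEveryN {
    // Groups comma-separated items n at a time: one item followed by up to n-1 ",item" parts
    static List<String> split(String data, int n) {
        Pattern p = Pattern.compile("[^,]+(?:,[^,]+){0," + (n - 1) + "}");
        Matcher m = p.matcher(data);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("0,0,1,2,4,5,3,4,6", 3)); // [0,0,1, 2,4,5, 3,4,6]
        System.out.println(split("0,0,1,2,4,5,3,4,6", 4)); // [0,0,1,2, 4,5,3,4, 6]
    }
}
```

Because the quantifier is {0,n-1} rather than exactly n-1, a shorter final group (the trailing 6 for n = 4) is kept instead of dropped.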
Obligatory Guava answer:
String input = "0,0,1,2,4,5,3,4,6";
String delimiter = ",";
int partitionSize = 3;
for (Iterable<String> iterable : Iterables.partition(Splitter.on(delimiter).split(input), partitionSize)) {
System.out.println(Joiner.on(delimiter).join(iterable));
}
Outputs:
0,0,1
2,4,5
3,4,6
Try something like the below:
public String[] mySplitIntoThree(String str)
{
String[] parts = str.split(",");
List<String> strList = new ArrayList<String>();
for(int x = 0; x < parts.length - 2; x = x+3)
{
String tmpStr = parts[x] + "," + parts[x+1] + "," + parts[x+2];
strList.add(tmpStr);
}
return strList.toArray(new String[strList.size()]);
}
(You may need to import java.util.ArrayList and java.util.List)
Nice one for the coding dojo! Here's my good old-fashioned C-style answer:
If we call the bits between commas 'parts', and the results that get split off 'substrings' then:
n is the amount of parts found so far,
i is the start of the next part,
startIndex the start of the current substring
Iterate over the parts, every third part: chop off a substring.
Add the leftover part at the end to the result when you run out of commas.
String x = "0,0,1,2,4,5,3,4,6"; // the input string
List<String> result = new ArrayList<String>();
int startIndex = 0;
int n = 0;
for (int i = x.indexOf(',') + 1; i > 0; i = x.indexOf(',', i) + 1, n++) {
    if (n % 3 == 2) {
        result.add(x.substring(startIndex, i - 1));
        startIndex = i;
    }
}
result.add(x.substring(startIndex));

How to extract numbers from a string and get an array of ints?

I have a String variable (basically an English sentence with an unspecified number of numbers) and I'd like to extract all the numbers into an array of integers. I was wondering whether there was a quick solution with regular expressions?
I used Sean's solution and changed it slightly:
LinkedList<String> numbers = new LinkedList<String>();
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(line);
while (m.find()) {
numbers.add(m.group());
}
Pattern p = Pattern.compile("-?\\d+");
Matcher m = p.matcher("There are more than -2 and less than 12 numbers here");
while (m.find()) {
System.out.println(m.group());
}
... prints -2 and 12.
-? matches a leading negative sign -- optionally. \d matches a digit, and we need to write \ as \\ in a Java String though. So, \d+ matches 1 or more digits.
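The question asks for an array of ints; a sketch that collects the -?\\d+ matches and converts them, assuming every number fits in an int (the names ExtractInts and extract are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractInts {
    private static final Pattern INT = Pattern.compile("-?\\d+");

    // Collects every match of -?\d+ and converts the list to an int[]
    static int[] extract(String text) {
        List<Integer> found = new ArrayList<>();
        Matcher m = INT.matcher(text);
        while (m.find()) {
            // throws NumberFormatException if a number does not fit in an int
            found.add(Integer.parseInt(m.group()));
        }
        return found.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] nums = extract("There are more than -2 and less than 12 numbers here");
        System.out.println(Arrays.toString(nums)); // [-2, 12]
    }
}
```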
What about to use replaceAll java.lang.String method:
String str = "qwerty-1qwerty-2 455 f0gfg 4";
str = str.replaceAll("[^-?0-9]+", " ");
System.out.println(Arrays.asList(str.trim().split(" ")));
Output:
[-1, -2, 455, 0, 4]
Description
[^-?0-9]+
[ and ] delimit a set of characters, of which exactly one is matched, i.e., only one time, in any order
^ Special identifier used in the beginning of the set, used to indicate to match all characters not present in the delimited set, instead of all characters present in the set.
+ Between one and unlimited times, as many times as possible, giving back as needed
-? One of the characters “-” and “?”
0-9 A character in the range between “0” and “9”
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher(myString);
while (m.find()) {
int n = Integer.parseInt(m.group());
// append n to list
}
// convert list to array, etc
You can actually replace [0-9] with \d, but that involves double backslash escaping, which makes it harder to read.
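StringBuffer sBuffer = new StringBuffer();
// the '.' must be escaped, otherwise it matches any character
Pattern p = Pattern.compile("[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+");
Matcher m = p.matcher(str);
while (m.find()) {
    sBuffer.append(m.group());
}
return sBuffer.toString();
This is for extracting numbers while retaining the decimal point (note that all matches are concatenated into a single String).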
The accepted answer detects digits but does not detect formatted numbers, e.g. 2,000, nor decimals, e.g. 4.8. For those, use -?\\d+(,\\d+)*(\\.\\d+)?:
Pattern p = Pattern.compile("-?\\d+(,\\d+)*(\\.\\d+)?");
List<String> numbers = new ArrayList<String>();
Matcher m = p.matcher("Government has distributed 4.8 million textbooks to 2,000 schools");
while (m.find()) {
numbers.add(m.group());
}
System.out.println(numbers);
Output:
[4.8, 2,000]
Using Java 8, you can do:
String str = "There 0 are 1 some -2-34 -numbers 567 here 890 .";
int[] ints = Arrays.stream(str.replaceAll("-", " -").split("[^-\\d]+"))
.filter(s -> !s.matches("-?"))
.mapToInt(Integer::parseInt).toArray();
System.out.println(Arrays.toString(ints)); // prints [0, 1, -2, -34, 567, 890]
If you don't have negative numbers, you can get rid of the replaceAll (and use !s.isEmpty() in filter), as that's only to properly split something like 2-34 (this can also be handled purely with regex in split, but it's fairly complicated).
Arrays.stream turns our String[] into a Stream<String>.
filter gets rid of the leading and trailing empty strings as well as any - that isn't part of a number.
mapToInt(Integer::parseInt).toArray() calls parseInt on each String to give us an int[].
Alternatively, Java 9 has a Matcher.results method, which should allow for something like:
Pattern p = Pattern.compile("-?\\d+");
Matcher m = p.matcher("There 0 are 1 some -2-34 -numbers 567 here 890 .");
int[] ints = m.results().map(MatchResult::group).mapToInt(Integer::parseInt).toArray();
System.out.println(Arrays.toString(ints)); // prints [0, 1, -2, -34, 567, 890]
As it stands, neither of these is a big improvement over just looping over the results with Pattern / Matcher as shown in the other answers, but it should be simpler if you want to follow this up with more complex operations which are significantly simplified with the use of streams.
For numbers with a decimal point, use this one (the . must be escaped, otherwise it matches any character): (([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+)|([0-9]+))
Extract all real numbers using this (note that tokens such as a lone - or 2-34 are collected as-is and make Double.valueOf throw a NumberFormatException):
public static ArrayList<Double> extractNumbersInOrder(String str) {
    str += 'a'; // sentinel non-number character, so a number at the very end is also added
    ArrayList<Double> list = new ArrayList<Double>();
    String singleNum = "";
    for (char c : str.toCharArray()) {
        if (isNumber(c)) {
            singleNum += c;
        } else if (!singleNum.equals("")) { // number ended
            list.add(Double.valueOf(singleNum));
            singleNum = "";
        }
    }
    return list;
}
public static boolean isNumber(char c) {
    return Character.isDigit(c) || c == '-' || c == '+' || c == '.';
}
Fraction and grouping characters for representing real numbers differ between languages, so the same real number can be written in very different ways depending on the language.
The number two million in German:
2.000.000,00
and in English:
2,000,000.00
A method to extract real numbers from a given string independently of the language, given its fraction and grouping characters:
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public List<BigDecimal> extractDecimals(final String s, final char fraction, final char grouping) {
    List<BigDecimal> decimals = new ArrayList<BigDecimal>();
    // Remove grouping characters that sit between two digits, for easier regex extraction
    StringBuilder noGrouping = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        boolean isValidGroupingChar = c == grouping
                && i > 0 && Character.isDigit(s.charAt(i - 1))
                && i + 1 < s.length() && Character.isDigit(s.charAt(i + 1));
        if (!isValidGroupingChar)
            noGrouping.append(c);
    }
    // Quote the fraction character, since e.g. '.' is special in regular expressions
    String fractionRegex = Pattern.quote(String.valueOf(fraction));
    Pattern p = Pattern.compile("-?(\\d+" + fractionRegex + "\\d+|\\d+)");
    Matcher m = p.matcher(noGrouping);
    while (m.find()) {
        // BigDecimal expects '.' as the decimal separator
        String match = m.group().replace(fraction, '.');
        decimals.add(new BigDecimal(match));
    }
    return decimals;
}
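If a number is already isolated, the JDK's java.text.NumberFormat can apply a locale's fraction and grouping characters for you. A sketch (LocaleNumbers and parse are hypothetical names); note that NumberFormat.parse stops at the first character it cannot use rather than rejecting the whole string:

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocaleNumbers {
    // Parses a single number using the separators of the given locale
    static double parse(String text, Locale locale) throws ParseException {
        return NumberFormat.getInstance(locale).parse(text).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("2.000.000,00", Locale.GERMANY)); // 2000000.0
        System.out.println(parse("2,000,000.00", Locale.US));      // 2000000.0
    }
}
```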
If you want to exclude numbers that are contained within words, such as bar1 or aa1bb, then add word boundaries \b to any of the regex based answers. For example:
Pattern p = Pattern.compile("\\b-?\\d+\\b");
Matcher m = p.matcher("9There 9are more9 th9an -2 and less than 12 numbers here9");
while (m.find()) {
System.out.println(m.group());
}
displays (note that the minus sign of -2 is dropped, because \b cannot match between a space and -):
2
12
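If the minus sign should be kept, one option (a sketch, not taken from the answer above) is to replace the leading \b with a negative lookbehind (?<!\w), so a match may start at - as long as no word character precedes it. The names StandaloneInts and find are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StandaloneInts {
    // (?<!\w): no word character directly before the match; \b: the digits end at a word boundary
    private static final Pattern STANDALONE = Pattern.compile("(?<!\\w)-?\\d+\\b");

    static List<String> find(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = STANDALONE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(find("9There 9are more9 th9an -2 and less than 12 numbers here9")); // [-2, 12]
    }
}
```

Input such as a-2 would still yield 2, since the - is then preceded by a word character.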
I would suggest checking the ASCII values to extract numbers from a String.
Suppose you have the input String myname12345 and you just want to extract the numbers 12345. You can do so by first converting the String to a character array and then using the following code:
char[] chars = myString.toCharArray();
for (int i = 0; i < chars.length; i++) {
    // digits are ASCII 48 ('0') to 57 ('9')
    if (chars[i] >= 48 && chars[i] <= 57)
        System.out.print(chars[i]);
}
Once the digits are extracted, append them to an array.
Hope this helps
I found this expression simplest (the first element is an empty String if msg starts with a non-digit):
String[] extractednums = msg.split("\\D++");
public static String extractNumberFromString(String number) {
String num = number.replaceAll("[^0-9]+", " ");
return num.replaceAll(" ", "");
}
This returns all the digits of the string concatenated into a single String (e.g. "a1b23" becomes "123").
