Confused about how .split() works in Java

I have this string which I am taking in from a text file.
"1 normal 1 [(o, 21) (o, 17) (t, 3)]"
I want to take in 1, normal, 1, o, 21, 17, t, 3 in a string array.
Scanner inFile = new Scanner(new File("input.txt"));
String input = inFile.nextLine();
String[] tokens = input.split(" |\\(|\\)|\\[\\(|\\, |\\]| \\(");
for(int i =0 ; i<tokens.length; ++i)
{
System.out.println(tokens[i]);
}
Output:
1
normal
1
o
21
o
17
t
3
Why are there spaces being stored in the array?

Those aren't spaces, they're empty strings. Your string is:
"1 normal 1 [(o, 21) (o, 17) (t, 3)]"
It's split in the following way according to your regexp:
Token = "1"
Delimiter = " "
Token = "normal"
Delimiter = " "
Token = "1"
Delimiter = " "
Token = "" <-- empty string
Delimiter = "[("
Token = "o"
... and so on
When two delimiters are adjacent, split considers there to be an empty string token between them.
To fix this, you can change your regexp, for example, like this:
"[ \\(\\)\\[\\,\\]]+"
This way, any run of adjacent " ()[,]" characters is treated as a single delimiter.
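A minimal sketch of that fix, using the input from the question. The class and method names (`SplitFixDemo`, `tokenize`) are made up for illustration. Inside a character class most of these characters need no escaping, so the pattern is written a little more compactly here.

```java
import java.util.Arrays;

class SplitFixDemo {
    // One or more of: space, '(', ')', '[', ']', ',' form a single delimiter.
    static String[] tokenize(String input) {
        return input.split("[ ()\\[\\],]+");
    }

    public static void main(String[] args) {
        String input = "1 normal 1 [(o, 21) (o, 17) (t, 3)]";
        System.out.println(Arrays.toString(tokenize(input)));
        // prints [1, normal, 1, o, 21, o, 17, t, 3]
    }
}
```

The trailing `)]` produces no extra token because `split` drops trailing empty strings. If the input could start with a delimiter, one leading empty string would still be returned and would need to be skipped.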

For example, here:
1 [(o
In the first step it matches a single space.
In the next step it matches [(
So between these two matches, an empty String "" is returned.
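To see this on the fragment alone, here is a small sketch (the class and method names are made up) that splits `1 [(o` using only the two relevant alternatives from the question's pattern.

```java
import java.util.Arrays;

class EmptyTokenDemo {
    static String[] splitFragment(String s) {
        // " " matches first, then "[(" matches immediately after it,
        // so nothing lies between the two delimiters.
        return s.split(" |\\[\\(");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFragment("1 [(o")));
        // prints [1, , o]
    }
}
```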

Related

Parse from end to start of a String to grab data before the third occurrence of a delimiter

I am working on some strings and trying to parse through the data and retrieve a string that lies before the third occurrence of " - " from the end of the string. This data comes as a String from the DB and there is some text "-NONE----" that I would like to exclude while parsing.
Input (Below input is a String and not List)
String input1 = "-A123456-B987-013691-000-109264821"
String input2 = "-NONE----"
String input3 = "C1234567-A1241-EF-012361-000-18273460"
Output
String output1 = "-A123456-B987"
String output2= "-NONE----"
String output3 = "C1234567-A1241-EF"
Starting from the beginning of my string, I need to retrieve the data before the third occurrence of "-" (hyphen), but I need to count the hyphen occurrences starting from the end of the string.
Any tips are appreciated.
You could use a regex replacement approach:
String input = "-A123456-B987-013691-000-109264821";
String output = input.replaceAll("([^-]*(?:-[^-]+){2}).*", "$1");
System.out.println(output); // -A123456-B987
The regex pattern used here says to match:
( open capture group
[^-]* match optional first term
(?:-[^-]+){2} then match - and a term, twice
) close capture group, available as $1
.* consume the remainder of the string
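As a quick check, the sketch below runs that replacement over all three inputs from the question. The helper name `firstThreeParts` is made up. Note that this pattern counts hyphens from the start of the string, which happens to coincide with "third hyphen from the end" for these samples; `-NONE----` does not match the pattern at all, so it comes back unchanged.

```java
class HyphenPrefixDemo {
    // Keeps the optional first term plus the next two "-term" parts.
    static String firstThreeParts(String input) {
        return input.replaceAll("([^-]*(?:-[^-]+){2}).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstThreeParts("-A123456-B987-013691-000-109264821"));
        // prints -A123456-B987
        System.out.println(firstThreeParts("-NONE----"));
        // prints -NONE----
        System.out.println(firstThreeParts("C1234567-A1241-EF-012361-000-18273460"));
        // prints C1234567-A1241-EF
    }
}
```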
You could match the last three dashes by anchoring the pattern with the $ symbol and then extract everything that is in front of them. I created two capture groups, where the first one is what you want to extract:
private static String extractFront(String input1) {
if(input1.equals("-NONE----")) {
return input1;
} else {
Pattern pattern = Pattern.compile("(.*)(-[^-]*){3}$");
Matcher matcher = pattern.matcher(input1);
if (matcher.find()) {
return matcher.group(1);
}
return null;
}
}
Main to test:
public static void main(String[] args) {
String input1 = "-A123456-B987-013691-000-109264821";
String input2 = "-NONE----";
String input3 = "C1234567-A1241-EF-012361-000-18273460";
System.out.println(extractFront(input1));
System.out.println(extractFront(input2));
System.out.println(extractFront(input3));
}
Output:
-A123456-B987
-NONE----
C1234567-A1241-EF
EDIT: @stubbleweb1995 added the if condition for a complete solution
We can use streams, lambdas, and predicate.
Split your input on its end-of-line character, to get an array of strings. We filter out the “NONE” lines.
For each line, we split into pieces, using the hyphen as a delimiter. This gives us an array of strings that we reassemble using only the first 3 parts.
Lastly we collect into a list.
Here is some untested code to get you started.
String[] lines = input.split( "\n" ) ;
List < String > results =
    Arrays
        .stream( lines )
        .filter( line -> ! line.contains( "-NONE-" ) )
        .map(
            line ->
                String.join(
                    "-" ,
                    Arrays.copyOf( line.split( "-" , 4 ) , 3 , String[].class )
                )
        )
        .toList()  // Stream.toList() requires Java 16+
;

(hello-> h3o) How to replace in a String the middle letters for the number of letters replaced

I need to build a method which receives a String, e.g. "elephant-rides are really fun!", and returns another similar String. In this example the return value should be "e6t-r3s are r4y fun!" (because e-lephan-t has 6 middle letters, r-ide-s has 3 middle letters, and so on).
To get that result I need to replace the middle letters of each word with the number of letters replaced, leaving unchanged everything that isn't a letter as well as the first and last letter of every word.
So far I've tried using a regex to split the received string into words and saving these words in an array of strings. I also have an int array in which I save the number of middle letters, but I don't know how to join both arrays and the symbols into a correct String to return.
String string="elephant-rides are really fun!";
String[] parts = string.split("[^a-zA-Z]");
int[] sizes = new int[parts.length];
int index=0;
for(String aux: parts)
{
sizes[index]= aux.length()-2;
System.out.println( sizes[index]);
index++;
}
You may use
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(1) + m.group(2).length() + m.group(3));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb);
// => e6t-r3s are r4y fun!
Here, (?U)(\\w)(\\w{2,})(\\w) matches any Unicode word char capturing it into Group 1, then captures any 2 or more word chars into Group 2 and then captures a single word char into Group 3, and inside the .appendReplacement method, the second group contents are "converted" into its length.
Java 9+:
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
String result = m.replaceAll(x -> x.group(1) + x.group(2).length() + x.group(3));
System.out.println( result );
// => e6t-r3s are r4y fun!
For the instructions you gave us, this would be sufficient:
String [] result = string.split("[\\s-]");
for (int i=0; i<result.length; i++){
result[i] = "" + result[i].charAt(0) + ((result[i].length())-2) + result[i].charAt(result[i].length()-1);
}
With your input, it creates the array [ "e6t", "r3s", "a1e", "r4y", "f2!" ]
It also runs on words of one or two characters, but it gives results such as:
Input: I am a small; Output: [ "I-1I", "a0m", "a-1a", "s3l" ]
Again, for the instructions you gave us this would be legal.
Hope I helped!

Using split method in java to separate different inputs

I'm using the split method in Java to split "Smith, John (111) 123-4567" into "John", "Smith", and "111". I need to get rid of the comma and the parentheses. This is what I have so far, but it doesn't split the string.
// split data into tokens separated by spaces
tokens = data.split(" , \\s ( ) ");
first = tokens[1];
last = tokens[0];
area = tokens[2];
// display the tokens one per line
for(int k = 0; k < tokens.length; k++) {
System.out.print(tokens[1] + " " + tokens[0] + " " + tokens[2]);
}
Can also be solved by using a regular expression to parse the input:
String inputString = "Smith, John (111) 123-4567";
String regexPattern = "(?<lastName>.*), (?<firstName>.*) \\((?<cityCode>\\d+)\\).*";
Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher(inputString);
if (matcher.matches()) {
System.out.printf("%s %s %s", matcher.group("firstName"),
matcher.group("lastName"),
matcher.group("cityCode"));
}
Output: John Smith 111
String.split treats its argument as a single regular expression, not as a list of separate delimiters.
Your argument " , \\s ( ) " is therefore one pattern that has to match as a whole (a space, a comma, a space, a whitespace character, and so on), and no such sequence occurs in the input, so nothing is split.
I am not able to test this in a Java runtime, but I think you need to break your split into individual split operations, something like:
data = "Last, First (111) 123-4567";
tokens = data.split(",");
// tokens should now hold two strings:
// "Last" and " First (111) 123-4567" (note the leading space)
last = tokens[0];
tokens = tokens[1].trim().split(" ");
// tokens should now hold three strings:
// "First", "(111)", and "123-4567"
first = tokens[0];
// strip the parentheses from "(111)"
area = tokens[1].replace("(", "").replace(")", "");
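Since `split` takes a regular expression, the separators can also be described as a character class and handled in a single call. A rough sketch under that assumption (class and method names are made up, and it assumes the input always has the `Last, First (area) number` shape):

```java
import java.util.Arrays;

class NameSplitDemo {
    // One or more commas, whitespace characters, or parentheses act as one delimiter.
    static String[] tokenize(String data) {
        return data.split("[,\\s()]+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Smith, John (111) 123-4567");
        String last = tokens[0];
        String first = tokens[1];
        String area = tokens[2];
        System.out.println(first + " " + last + " " + area);
        // prints John Smith 111
    }
}
```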

Parsing a string with [3:0] substring in it

I want to store two numbers from a string into two distinct variables - for example, var1 = 3 and var2 = 0 from "[3:0]". I have the following code snippet:
String myStr = "[3:0]";
if (myStr.trim().matches("\\[(\\d+)\\]")) {
// Do something.
// If it enter the here, here I want to store 3 and 0 in different variables or an array
}
Is it possible doing this with split and regular expressions?
Don't call trim(). Enhance your regex instead.
Your regex is missing the pattern for : and the second number, and you don't need to escape the ].
To capture the matched numbers, you need the Matcher:
String myStr = " [3:0] ";
Matcher m = Pattern.compile("\\s*\\[(\\d+):(\\d+)]\\s*").matcher(myStr);
if (m.matches())
System.out.println(m.group(1) + ", " + m.group(2));
Output
3, 0
You can use replaceAll and split
String myStr = "[3:0]";
if(myStr.trim().matches("\\[\\d+:\\d+\\]")) {
String[] numbers = myStr.replaceAll("[\\[\\]]","").split(":");
}
Moreover, your regex for matching the string should be \\[\\d+:\\d+\\]. If you want to avoid trim(), you can add \\s* at the start and end to match optional spaces. But trim() is not bad.
EDIT
As suggested by Andreas in comments,
String myStr = "[3:0]";
String regExp = "\\[(\\d+):(\\d+)\\]";
Pattern pattern = Pattern.compile(regExp);
Matcher matcher = pattern.matcher(myStr.trim());
if(matcher.find()) {
int a = Integer.parseInt(matcher.group(1));
int b = Integer.parseInt(matcher.group(2));
System.out.println(a + " : " + b);
}
OUTPUT
3 : 0
Without any regular expressions you could do this:
// this will remove the brackets [ and ] and just leave "3:0"
String numberString = myStr.trim().replace("[", "").replace("]", "");
// this will split the string in everything before the : and everything after the : (so two values as an array)
String[] numbers = numberString.split(":");
// get the first value and parse it as a number "3" will become a simple 3
int firstNumber = Integer.parseInt(numbers[0]) ;
// get the second value and parse it from "0" to a plain 0
int secondNumber = Integer.parseInt(numbers[1]);
Be careful when parsing numbers, depending on your input string and what other possibilities there might be: "3:12" and "3:02" are fine, but Integer.parseInt throws a NumberFormatException for input such as "3: 12" or "3:x".
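A small sketch of which inputs `Integer.parseInt` accepts: leading zeros parse fine, while embedded whitespace or a missing number does not. The helper `parsePair` is made up for illustration, and returning `null` on bad input is only one of several reasonable choices.

```java
class ParsePairDemo {
    // Parses "[a:b]" into {a, b}; returns null if either part is not a plain integer.
    static int[] parsePair(String s) {
        String body = s.trim().replace("[", "").replace("]", "");
        String[] parts = body.split(":");
        if (parts.length != 2) {
            return null;
        }
        try {
            return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int[] ok = parsePair("[3:02]");
        System.out.println(ok[0] + ", " + ok[1]);          // prints 3, 2
        System.out.println(parsePair("[3: 12]") == null);  // prints true
        System.out.println(parsePair("[3:]") == null);     // prints true
    }
}
```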
In case you don't need to validate the input and simply want to get the numbers from it, you could find indexOf(":") and take the substrings you are interested in, which are:
from just after [ (which is at position 0) up to :
and from just after : up to ] (which is at position length of string - 1)
Your code can look like
String text = "[3:0]";
int colonIndex = text.indexOf(':');
String first = text.substring(1, colonIndex);
String second = text.substring(colonIndex + 1, text.length() - 1);

How to parse string with Java?

I am trying to make a simple calculator application that would take a string like this
5 + 4 + 3 - 2 - 10 + 15
I need Java to parse this string into an array
{5, +4, +3, -2, -10, +15}
Assume the user may enter 0 or more spaces between each number and each operator
I'm new to Java so I'm not entirely sure how to accomplish this.
You can use Integer.parseInt to get the values; splitting the string can be done with the String class. A regex could work, but I don't know how to do those :3
Take a look at String.split():
String str = "1 + 2";
System.out.println(java.util.Arrays.toString(str.split(" ")));
[1, +, 2]
Note that split takes a regular expression, so to split on "." or other characters with special meaning you would have to quote (escape) them. Also, multiple spaces in a row will create empty strings in the resulting array, which you would need to skip.
This solves the simple example. For more rigorous parsing of true expressions you would want to create a grammar and use something like Antlr.
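Both caveats can be seen in a short sketch (class and method names are made up): splitting on a single space leaves empty strings when spaces repeat, `\\s+` avoids that, and `Pattern.quote` turns a special character such as `.` into a literal delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitCaveatsDemo {
    // Splits on every single space; repeated spaces yield empty strings.
    static String[] onSingleSpace(String s) {
        return s.split(" ");
    }

    // Splits on runs of whitespace, so repeated spaces count as one delimiter.
    static String[] onWhitespaceRuns(String s) {
        return s.split("\\s+");
    }

    // Splits on a literal delimiter, even if it is a regex metacharacter.
    static String[] onLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String str = "1  +  2"; // two spaces on each side of the operator
        System.out.println(Arrays.toString(onSingleSpace(str)));     // prints [1, , +, , 2]
        System.out.println(Arrays.toString(onWhitespaceRuns(str)));  // prints [1, +, 2]
        System.out.println(Arrays.toString(onLiteral("3.14", "."))); // prints [3, 14]
    }
}
```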
Let str be your line buffer.
Use a regex matcher (java.util.regex.Matcher.find in Java) with the pattern ([-+]?[ \t]*[0-9]+).
Accumulate all matches into String[] tokens.
Then, for each token in tokens:
String s[] = tokens[i].split(" +");
if (s.length > 1)
tokens[i] = s[0] + s[1];
else
tokens[i] = s[0];
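In Java, the steps above could look roughly like the following sketch. It uses `java.util.regex.Matcher.find` to collect the matches and strips the inner whitespace with `replaceAll` rather than the split-and-rejoin step; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedTokenDemo {
    // Optional sign, optional spaces or tabs, then one or more digits.
    private static final Pattern TERM = Pattern.compile("[-+]?[ \\t]*[0-9]+");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TERM.matcher(line);
        while (m.find()) {
            // Drop the spaces or tabs between the sign and the digits.
            tokens.add(m.group().replaceAll("[ \\t]+", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("5 + 4 + 3 - 2 - 10 + 15"));
        // prints [5, +4, +3, -2, -10, +15]
    }
}
```

Because the whitespace between sign and digits is optional in the pattern, input without any spaces, such as `5+4-3`, is tokenized the same way.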
You can use positive lookbehind:
String s = "5 + 4 + 3 - 2 - 10 + 15";
Pattern p = Pattern.compile("(?<=[0-9]) +");
String[] result = p.split(s);
for(String ss : result)
System.out.println(ss.replaceAll(" ", ""));
String cal = "5 + 4 + 3 - 2 - 10 + 15";
//matches combinations of '+' or '-', whitespace, number
Pattern pat = Pattern.compile("[-+]{1}\\s*\\d+");
Matcher mat = pat.matcher(cal);
List<String> ops = new ArrayList<String>();
while(mat.find())
{
ops.add(mat.group());
}
//gets first number and puts in beginning of List
ops.add(0, cal.substring(0, cal.indexOf(" ")));
for(int i = 0; i < ops.size(); i++)
{
//remove whitespace
ops.set(i, ops.get(i).replaceAll("\\s*", ""));
}
System.out.println(Arrays.toString(ops.toArray()));
//[5, +4, +3, -2, -10, +15]
Based off the input of some of the answers here, I found this to be the best solution
// input
String s = "5 + 4 + 3 - 2 - 10 + 15";
ArrayList<Integer> numbers = new ArrayList<Integer>();
// remove whitespace
s = s.replaceAll("\\s+", "");
// parse string
Pattern pattern = Pattern.compile("[-]?\\d+");
Matcher matcher = pattern.matcher(s);
// add numbers to array
while (matcher.find()) {
numbers.add(Integer.parseInt(matcher.group()));
}
// numbers
// {5, 4, 3, -2, -10, 15}
