About splits in Java - java

I'm a Java beginner, so please bear with me if this is an extremely easy answer.
Say I have code that looks like this:
String str;
String [] splits;
str = "The words never line up in such a way ";
splits = str.split(" ");
for (int i = 0; i < splits.length; i++)
System.out.println(splits[i]);
What does Java do at the end of the string? After "way" there is a space; since there is no value after the space does Java decide not to split again?
Thanks so much!

According to the Java documentation for split(), http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String),
split(String regex) is equivalent to split(String regex, 0), which drops any trailing empty strings from the result. Specifically from the docs:
"Trailing empty strings are therefore not included in the resulting
array."
So the last element in the array after the split will be "way"
You can confirm this by executing the code you mentioned.

The split method does not return a trailing empty string for a delimiter at the end of the input. Example:
class Main
{
public static void main (String[] args)
{
String str;
String [] splits;
str = "The words never line up in such a way "; // some empty string after delimiter at end
splits = str.split(" ");
for (int i = 0; i < splits.length; i++)
System.out.println(splits[i]);
System.out.println("END");
}
}
OUTPUT
The
words
never
line
up
in
such
a
way
END
Note that no extra element is produced for the delimiter at the end.
Now
class Main
{
public static void main (String[] args)
{
String str;
String [] splits;
str = "The words never line up in such a way yeah";
splits = str.split(" ");
for (int i = 0; i < splits.length; i++)
System.out.println(splits[i]);
System.out.println("END");
}
}
OUTPUT
The
words
never
line
up
in
such
a
way
yeah
END
Here the string after the last delimiter ("yeah") is not empty, so it is included. An empty string between two delimiters in the middle of the input is not trailing either, so it would also stay in the array.
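To make the difference between interior and trailing empty strings visible, here is a minimal sketch. The class name InteriorEmptyDemo, the helper splitOnSpace, and the shortened test string are made up for illustration; the input has two spaces between "way" and "yeah" and one space at the end.

```java
import java.util.Arrays;

class InteriorEmptyDemo {
    // Splits on a single space, exactly as in the examples above.
    static String[] splitOnSpace(String s) {
        return s.split(" ");
    }

    public static void main(String[] args) {
        // Two spaces between "way" and "yeah", one trailing space at the end.
        String[] parts = splitOnSpace("a way  yeah ");
        // The empty string between the two spaces is kept,
        // the empty string after the final space is dropped.
        System.out.println(Arrays.toString(parts)); // [a, way, , yeah]
        System.out.println(parts.length);           // 4
    }
}
```

Printing with Arrays.toString makes the empty element easier to spot than printing one element per line.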

I've been looking at the Javadoc, and here is what it says about String.split:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
It seems that this method calls .split with two arguments:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
thanks
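The three cases of the limit parameter quoted above can be tried side by side. This is only a small sketch; the class name SplitLimitDemo, the helper split, and the shortened input string are assumptions for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper around String.split(regex, limit) with a single space as the delimiter.
    static String[] split(String s, int limit) {
        return s.split(" ", limit);
    }

    public static void main(String[] args) {
        String str = "such a way "; // note the trailing space

        // limit 0: split as often as possible, discard trailing empty strings
        System.out.println(Arrays.toString(split(str, 0)));  // [such, a, way]

        // negative limit: split as often as possible, keep trailing empty strings
        System.out.println(Arrays.toString(split(str, -1))); // [such, a, way, ]

        // limit 2: split at most once, the last entry holds the rest of the input
        System.out.println(Arrays.toString(split(str, 2)));  // [such, a way ]
    }
}
```

So if the trailing empty string matters to you, pass a negative limit instead of relying on the one-argument split.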

Related

Efficiently split large strings in Java

I have a large string that should be split at a certain character, if it is not preceded by another certain character.
What is the most efficient way to do this?
An example: Split this string at ':', but not at "?:":
part1:part2:https?:example.com:anotherstring
What I have tried so far:
Regex (?<!\?):. Very slow.
First getting the indices where to split the string and then split it. Only efficient if there are not many split characters in the string.
Iterating over the string character by character. Efficient if there are not many protect characters (e.g. '?').
I fear you would have to go through the string and check if a ":" is preceded by a "?"
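int lastIndex = 0; // start of the part that is currently being collected
// Always continue the search at index + 1; searching from lastIndex again
// would find the same protected ':' forever.
for (int index = string.indexOf(':'); index >= 0; index = string.indexOf(':', index + 1)) {
    if (index == 0 || string.charAt(index - 1) != '?') {
        String splitString = string.substring(lastIndex, index);
        // add splitString to list or array
        lastIndex = index + 1;
    }
}
// add string.substring(lastIndex) to list or array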
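For reference, the loop above can be wrapped into a complete method along these lines. Treat it as a sketch rather than a tuned implementation: the class name GuardedSplit, the method name, and its parameters are made up here, and no benchmark has been run on it.

```java
import java.util.ArrayList;
import java.util.List;

class GuardedSplit {
    // Splits s at every delimiter that is not immediately preceded by guard.
    static List<String> split(String s, char delimiter, char guard) {
        List<String> parts = new ArrayList<>();
        int start = 0; // start of the part currently being collected
        for (int i = s.indexOf(delimiter); i >= 0; i = s.indexOf(delimiter, i + 1)) {
            if (i == 0 || s.charAt(i - 1) != guard) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start)); // remainder after the last split point
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("part1:part2:https?:example.com:anotherstring", ':', '?'));
        // [part1, part2, https?:example.com, anotherstring]
    }
}
```

Unlike String.split, this version keeps trailing empty strings, so an input ending in ':' yields a final empty element.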
You will have to test this very carefully (since I didn't do that), but using a regular expression in the split() might produce the results you want:
public static void main(String[] args) {
String s = "Start.Teststring.Teststring1?.Teststring2.?Teststring3.?.End";
String[] result = s.split("(?<!\\?)\\.(?!\\.)");
System.out.println(String.join("|", result));
}
Output:
Start|Teststring|Teststring1?.Teststring2|?Teststring3|?.End
Note:
This only covers your example of splitting at a dot when the dot is not preceded by a question mark.
I don't think you will get a much more performant solution than the regex...

Space counter not functioning in Java

I am trying to make a word counter in java. I'm trying to count words by separating them with spaces.
I've managed to get rid of the spaces before or after a sentence with the trim function. However, I haven't been able to adjust for the case that the user types more than one space in between two words. For example, so far the string "hello world" with multiple spaces between hello and world, would output a word count greater than two. This is the code that I have tried so far to fix this problem.
public void countWord(){
String tokens[] = userInput.trim().split(" ");
int counter = tokens.length;
for(int i = 0; i < tokens.length; ++i) {
if(Objects.equals(" ", tokens[i])) {
--counter;
}
}
System.out.printf("Total word count is: %d", counter);
}
As you can see I create a word counting integer that holds the number of tokens created. Then I try and look for a token that only contains " " then decrement the word count by the amount of those strings. However this is not solving my problem.
Try regex to split
userInput.split("\\s+");
You've already split() on spaces, so there will be no more spaces in any of the tokens as split() returns:
the array of strings computed by splitting this string around matches of the given regular expression
(Emphasis mine)
However if there are extra spaces in your String there will be extra tokens, which will throw off the length. Instead use split("\\s+"). Then just return the length of the Array, as split() already will return all the tokens separated by spaces, which will be all the words:
System.out.printf("Total word count is: %d", tokens.length);
Which will print 5 for the test String
"Hello this is a String"
If you intend to count the words, try one of the following, in addition to what others have mentioned.
This solution uses StringTokenizer:
String words = "The Hello World word counter by using StringTokenizer";
StringTokenizer st = new StringTokenizer(words);
System.out.println(st.countTokens()); // => 8
This way you can take advantage of a regex by splitting the string on whitespace:
String words = "The Hello World word counter by using regex";
// \\s+ matches the gaps between words; \\w+ would match the words themselves
int counter = words.trim().split("\\s+").length;
System.out.println(counter); // => 8
Use Scanner for your own counter method:
public static int counter(String words) {
Scanner scanner = new Scanner(words);
int count = 0;
while(scanner.hasNext()) {
count += 1;
scanner.next();
}
return count;
}
If you want to count the spaces as you said in the title, you can use StringUtils from Commons
int count = StringUtils.countMatches("The Hello World space counter by using StringUtils", " ");
System.out.println(count);
Or, if you use Spring, its own StringUtils class (org.springframework.util.StringUtils) is also available:
int count = StringUtils.countOccurrencesOf("The Hello World space counter by using Spring-StringUtils", " ");
System.out.println(count);
I think you can easily fix it by checking whether tokens[i].equals(""), that is, whether a token is an empty string. Splitting on a single space when the input contains several consecutive spaces creates empty strings in the array, so this should work.
Why don't you get rid of all occurrences of 2 or more adjacent spaces and then split:
String tokens[] = userInput.trim().replaceAll("\\s+", " ").split(" ");
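Putting the trim() and split("\\s+") suggestions from this thread together, a word counter could look roughly like the sketch below. The class and method names are made up for illustration. The explicit empty check is there because splitting an empty string still returns an array with one (empty) element.

```java
class WordCount {
    // Counts whitespace-separated words; leading, trailing and repeated spaces are ignored.
    static int countWords(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split("\\s+") would return one empty token
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  hello     world  "));    // 2
        System.out.println(countWords("Hello this is a String")); // 5
        System.out.println(countWords("   "));                    // 0
    }
}
```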

Need to split character numbers with comma and space using Java

Hi, I am relatively new to Java. I have to check whether an amount such as AED 555,439,972 /yr is less than another amount, so I first tried to split it using this code:
public static void main(String[] args) {
String value= "AED 555,439,972 /yr";
String[] tokens = value.split("\b");
int[] numbers = new int[tokens.length];
for (int i = 0; i < tokens.length; i++) {
numbers[i] = Integer.parseInt(tokens[i]);
}
System.out.println(numbers);
}
but I am getting Exception in thread "main" java.lang.NumberFormatException: For input string: "AED 555,439,972 /yr".
Appreciate if someone can help me to solve the problem.
I assume you need to get the numeric value from the string.
First, use the following to remove all non-digit characters.
value.replaceAll("\\D", "")
\\D stands for non-digit character. Once every such character is replaced with empty string (which means those are removed), use Integer.parseInt on it. (Use Long.parseLong if the values can be out of Integer's range.)
In your code, you are trying to split the string by word character ends (which too is not done correctly; you need to escape it as \\b). That would give you an array having the result of the string split at each word end (after the AED, after the space following AED, after the first 3 digits, after the first comma and so on..), after which you are converting each of the resulting array components into integers, which would fail at the AED.
In short, the following is what you want:
Integer.parseInt(value.replaceAll("\\D", ""));
There are a few things wrong with your code:
String[] tokens = value.split("\b");
The "\" needs to be escaped, like this:
String[] tokens = value.split("\\b");
This will split your input on word boundaries. Only some of the elements in the tokens array will be valid numbers, the others will result in a NumberFormatException. More specifically, at index 2 you'll have "555", at index 4 you'll have 439, and at index 6 you'll have 972. These can be parsed to integers, the others cannot.
I found a solution from stack overflow itself
public static void main(String[] args) {
String line = "AED 555,439,972 /yr";
String digits = line.replaceAll("[^0-9]", "");
System.out.println(digits);
}
the output is 555439972
You are going about it the wrong way. It's a single formatted number, so treat it that way.
Remove all non-digit characters, then parse as an integer:
int amount = Integer.parseInt(value.replaceAll("\\D", ""));
Then you'll have the number of dirhams per year, which you can compare to other values.
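As a rough sketch of the whole comparison: the class name AmountParser, the helper parseAmount, and the second amount are invented for illustration. Long.parseLong is used as suggested earlier in case a value does not fit into an int. Note the assumption that the amounts are whole numbers; a decimal point would be stripped together with the commas.

```java
class AmountParser {
    // Strips every non-digit character and parses what is left.
    // Assumes whole-number amounts: "1,234.50" would become 123450.
    static long parseAmount(String text) {
        return Long.parseLong(text.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        long first = parseAmount("AED 555,439,972 /yr");
        long second = parseAmount("AED 600,000,000 /yr"); // hypothetical second value
        System.out.println(first);          // 555439972
        System.out.println(first < second); // true
    }
}
```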

How to return only first n number of words in a sentence Java

Say I have a simple sentence like the one below.
For example, this is what I have:
A simple sentence consists of only one clause. A compound sentence
consists of two or more independent clauses. A complex sentence has at
least one independent clause plus at least one dependent clause. A set
of words with no independent clause may be an incomplete sentence,
also called a sentence fragment.
I want only first 10 words in the sentence above.
I'm trying to produce the following string:
A simple sentence consists of only one clause. A compound
I tried this:
bigString.split(" " ,10).toString()
But it returns the same bigString wrapped with [] array.
Thanks in advance.
Assume bigString is a String holding your text. The first thing to do is split it into single words.
String[] words = bigString.split(" ");
How many words would you like to extract?
int n = 10;
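Put the words together (this needs import java.util.Arrays):
// Math.min avoids an exception for texts with fewer than n words; String.join adds no leading space
String newString = String.join(" ", Arrays.copyOfRange(words, 0, Math.min(n, words.length)));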
System.out.println(newString);
Hope this is what you needed.
If you want to know more about regular expressions (i.e. to tell java where to split), see here: How to split a string in Java
If you call split with a limit (yours is 10), it does not return the first 10 words and stop: it returns the first 9 words, and the 10th element of the array holds the rest of the input string. Printing that array with Arrays.toString therefore shows the whole input (calling toString() directly on an array only prints its type and hash code). To get what you wanted, split with a limit of 11 and ignore the last element:
String[] myArray = bigString.split(" ", 11);
// myArray[10], if present, holds the rest of the sentence; keep only the first 10 elements
String[] firstTen = Arrays.copyOf(myArray, Math.min(10, myArray.length));
System.out.println(String.join(" ", firstTen));
This will help you
String[] strings = Arrays.stream(bigString.split(" "))
.limit(10)
.toArray(String[]::new);
Here is exactly what you want:
// regex \s matches a whitespace character: [ \t\n\x0B\f\r]
String[] raw = bigString.split("\\s", 11);
// the last entry of raw holds the rest of the sentence, so keep only the first 10 entries
// (Math.min avoids an exception when the text has fewer than 10 words)
String[] result = Arrays.copyOf(raw, Math.min(10, raw.length));
System.out.println(Arrays.toString(result));
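Since the goal was a single string rather than an array, the stream variant from this thread can be combined with Collectors.joining. This is a sketch under the assumption that words are separated by whitespace; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class FirstWords {
    // Returns the first n whitespace-separated words of text, joined by single spaces.
    static String firstWords(String text, int n) {
        return Arrays.stream(text.trim().split("\\s+"))
                .limit(n)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String bigString = "A simple sentence consists of only one clause. "
                + "A compound sentence consists of two or more independent clauses.";
        System.out.println(firstWords(bigString, 10));
        // A simple sentence consists of only one clause. A compound
    }
}
```

If the text has fewer than n words, limit(n) simply passes all of them through, so no bounds check is needed.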

How to split a string and extract specific elements?

I have a file, which consists of lines such as
20 19:0.26 85:0.36 1064:0.236 # 750
I have been able to read it line by line and output it to the console. However, what I really need is to extract the elements like "19:0.26" "85:0.36" from each line, and perform certain operations on them. How to split the lines and get the elements that I want.
Use a regular expression:
Pattern.compile("\\d+:\\d+\\.\\d+");
Then you can create a Matcher object from this pattern and use its find() method.
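A minimal sketch of that find() loop, using the sample line from the question. The class name PairExtractor and the method extractPairs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairExtractor {
    // digits, a colon, digits, a dot, digits (for example "19:0.26")
    private static final Pattern PAIR = Pattern.compile("\\d+:\\d+\\.\\d+");

    // Collects every substring of line that matches PAIR, in order of appearance.
    static List<String> extractPairs(String line) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(line);
        while (matcher.find()) {
            pairs.add(matcher.group());
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(extractPairs("20 19:0.26 85:0.36 1064:0.236 # 750"));
        // [19:0.26, 85:0.36, 1064:0.236]
    }
}
```

Each extracted element can then be split once more at ':' if the two numbers are needed separately.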
Parsing a line of data depends heavily on what the data is like and how consistent it is. Purely from your example data and the "elements like" that you mention, this could be as easy as
String[] parts = line.split(" ");
Adapt this code to your needs:
public class JavaStringSplitExample{
public static void main(String args[]){
String str = "one-two-three";
String[] temp;
/* delimiter */
String delimiter = "-";
/* given string will be split by the argument delimiter provided. */
temp = str.split(delimiter);
/* print substrings */
for(int i =0; i < temp.length ; i++)
System.out.println(temp[i]);
/*
IMPORTANT : Some special characters need to be escaped while providing them as
delimiters like "." and "|".
*/
System.out.println("");
str = "one.two.three";
delimiter = "\\.";
temp = str.split(delimiter);
for(int i =0; i < temp.length ; i++)
System.out.println(temp[i]);
/*
Using second argument in the String.split() method, we can control the maximum
number of substrings generated by splitting a string.
*/
System.out.println("");
temp = str.split(delimiter,2);
for(int i =0; i < temp.length ; i++)
System.out.println(temp[i]);
}
}
Java Strings have a split method that you can call
String [] stringArray = "some string".split(" ");
You can use a Regular expression if you want to so that you can match certain characters to split off of.
String Doc:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html
Pattern Doc (Used to make regular expressions):
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
