Tokenizer skipping blank values before the split - Java

I used StringTokenizer to split a text file whose fields are separated like so:
FIRST NAME, MIDDLE NAME, LAST NAME
harry, rob, park
tom,,do
while (File.hasNextLine())
{
    StringTokenizer strTokenizer = new StringTokenizer(File.nextLine(), ",");
    while (strTokenizer.hasMoreTokens())
    {
        // each field is read here with strTokenizer.nextToken()
    }
}
This is the loop I use to split each line.
The problem is that when a value is missing (for example, Tom does not have a middle name), the tokenizer skips the empty field between the two commas and registers the next value as the middle name.
How do I get it to register the missing value as blank instead?

Based on Davio's answer, you can detect the blank fields and replace them with your own String:
String[] result = File.nextLine().split(",");
for (int x = 0; x < result.length; x++) {
    if (result[x].isEmpty()) {
        System.out.println("Undefined");
    } else {
        System.out.println(result[x]);
    }
}

Use String.split instead.
"tom,,do".split(",") yields an array with 3 elements: tom, an empty string, and do.

It seems to me that you have two options:
Either double the comma in the file, as suggested by ToYonos, or
count the tokens with the countTokens() method before you assign the values to the variables. If there are only 2 tokens, the person doesn't have a middle name (this only works if the middle name is the only field that can be missing).
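To make the difference between the approaches in this thread concrete, here is a minimal sketch that runs the same line through StringTokenizer and through String.split. The class name BlankFieldDemo and its two helper methods are made up for illustration; only StringTokenizer, countTokens() and split come from the answers here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class BlankFieldDemo {
    // Collects the tokens StringTokenizer reports. Adjacent delimiters are
    // treated as one, so an empty field never shows up as a token.
    static List<String> withTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ",");
        while (tokenizer.hasMoreTokens()) {
            fields.add(tokenizer.nextToken());
        }
        return fields;
    }

    // Splits on every comma. The limit of -1 also keeps trailing empty fields.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        System.out.println(new StringTokenizer("tom,,do", ",").countTokens()); // 2
        System.out.println(withTokenizer("tom,,do")); // [tom, do]
        System.out.println(withSplit("tom,,do"));     // [tom, , do]
        System.out.println(withSplit("tom,,"));       // [tom, , ]
    }
}
```

With split, the position of each field is preserved, so index 1 is always the middle name. For a line such as "harry, rob, park" the parts keep their leading spaces, so calling trim() on each part is probably what you want.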

From the Javadoc of StringTokenizer:
StringTokenizer is a legacy class that is retained for compatibility
reasons although its use is discouraged in new code. It is recommended
that anyone seeking this functionality use the split method of String
or the java.util.regex package instead.
So you don't need StringTokenizer at all.
An example with String.split:
String name = "name, ,last name";
String[] nameParts = name.split(",");
for (String part : nameParts) {
    System.out.println("> " + part);
}
This produces the following output:
> name
>
> last name

Related

Split a string of multiple sentences into single sentences and surround them with html tags

I am a Java beginner and am looking for a way to split a String message into substrings based on a delimiter ( . ). Ideally I then have single sentences, and I want to wrap each sentence in HTML tags, i.e. <p></p>.
I tried the following with the BreakIterator class:
BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
List<String> sentences = new ArrayList<String>();
iterator.setText(message);
int start = iterator.first();
String newMessage = "";
for (int end = iterator.next();
        end != BreakIterator.DONE;
        start = end, end = iterator.next()) {
    newMessage = "<p>" + message.substring(start, end) + "</p>";
    sentences.add(newMessage);
}
This gives back only one sentence. I am stuck here; I also want to wrap each number in each sentence.
The String I have contains something like:
String message = "Hello, John. My phone number is: 02365897458. Please call me tomorrow morning, at 8 am.";
The output should be:
String newMessage = "<p>Hello, John.</p><p>My phone number is: <number>02365897458</number>.</p><p>Please call me tomorrow morning, at 8 am.</p>";
Is there a way to achieve this?
Try the split method of Java's String class. You can split on . (escaped as "\\." because split takes a regular expression) and it will return an array of Strings.
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-
This can easily be done using the StringTokenizer class, along with the StringBuilder class:
String message = SOME_STRING;
StringBuilder builder = new StringBuilder();
StringTokenizer tokenizer = new StringTokenizer(message, ".");
while (tokenizer.hasMoreTokens()) {
    builder.append("<p>");
    builder.append(tokenizer.nextToken());
    builder.append("</p>");
}
return builder.toString();
You can add more delimiters as required for various tags.
Surrounding the sentences could be achieved by adding a <p> at the start, a </p> at the end, and replacing each full stop with .</p><p>. Take a look at the replace method for strings.
To add the number tag, you could use a regex replace. The replaceAll method with a regex like [0-9]+, depending on what your numbers look like, can do that.
Something similar to this should work (untested); note that if the message ends with a full stop, it leaves an empty <p></p> at the end:
newMessage = "<p>" + message.replace(".", ".</p><p>")
        .replaceAll("([0-9]+)", "<number>$1</number>") +
        "</p>";
As said above, you can use the split method. Because you're splitting on dots, be sure to escape the dot in your regex. A simple example (there are other ways to keep the delimiter, but I've done it like this for simplicity as you're beginning):
String toSplit = "Hello, John. My phone number is: 02365897458. Please call me tomorrow morning, at 8 am.";
String[] tokens = toSplit.split("\\.");
for (int i = 0; i < tokens.length; i++) {
    // write back into the array; assigning to a for-each variable would not change it
    tokens[i] = "<p>" + tokens[i].trim() + ".</p>";
}
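Putting the split and replaceAll suggestions from this thread together, a rough sketch could look like the following. The class name SentenceWrapper and the method wrap are invented for this example, and it assumes that a "number" means a run of five or more digits, so that the 8 in "8 am" stays untouched as in the desired output. That threshold is an arbitrary choice, not something from the question.

```java
class SentenceWrapper {
    // Wraps each sentence in <p>...</p> and each long digit run in <number>...</number>.
    static String wrap(String message) {
        // Split on whitespace that follows a full stop. The lookbehind (?<=\.)
        // is zero-width, so the full stop stays attached to its sentence.
        String[] sentences = message.trim().split("(?<=\\.)\\s+");
        StringBuilder html = new StringBuilder();
        for (String sentence : sentences) {
            html.append("<p>").append(sentence).append("</p>");
        }
        // Assumption: only runs of five or more digits count as numbers.
        return html.toString().replaceAll("([0-9]{5,})", "<number>$1</number>");
    }

    public static void main(String[] args) {
        String message = "Hello, John. My phone number is: 02365897458. "
                + "Please call me tomorrow morning, at 8 am.";
        System.out.println(wrap(message));
    }
}
```

Splitting on a full stop followed by whitespace is naive: abbreviations such as "e.g. " would also be treated as sentence ends. BreakIterator, as used in the question, is probably the more robust choice for real text.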

multiple sections in a csv row

I have a csv file formatted like this:
<F,Bird,20,10/> < A,Fish,5,11,2/>
I was wondering how to read in those values separately.
Would I have to read the whole line into an array?
I have thought of doing line.split("/>"), but then the first record would still have the < in it, which I don't want.
If, on the other hand, I just separate it using line.split(",") and then assign each value accordingly, the values in the middle would merge, so that does not work either.
Is there a way to separate the string first, without the <>/ symbols?
You can use several delimiters in the split regex, like this:
String line = "<F,Bird,20,10/> < A,Fish,5,11,2/>";
String[] lines = line.split("<|/> <|/>");
for (String item : lines) {
    System.out.println(item);
}
Output (with all your spaces; the first line is empty because the input starts with the "<" delimiter):
(empty line)
F,Bird,20,10
 A,Fish,5,11,2
Try splitting your input string using the lookbehind (?<=/>):
String input = "<F,Bird,20,10/> < A,Fish,5,11,2/>";
input = input.replaceAll("\\s+", "");
String[] parts = input.split("(?<=/>)");
for (String part : parts) {
    System.out.println(part.replaceAll("[<>/]", ""));
}
Note that I removed all spaces from your string to make splitting cleaner. We could still try to split with arbitrary whitespace present, but it would be more work. From this point, you can easily access the CSV data contained within each tag.
Output:
F,Bird,20,10
A,Fish,5,11,2
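As an alternative to splitting, you could match each <.../> record directly and then split only its contents on commas. This is a sketch under the assumption that the fields never contain <, > or / themselves; the class name TaggedCsvParser and the method parse are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaggedCsvParser {
    // Matches one "<fields/>" record. Whitespace right after '<' is skipped,
    // and group 1 captures everything up to the closing "/>".
    private static final Pattern RECORD = Pattern.compile("<\\s*([^<>/]*)/>");

    // Returns one list of fields per record found in the line.
    static List<List<String>> parse(String line) {
        List<List<String>> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(line);
        while (matcher.find()) {
            // The limit of -1 keeps empty fields at the end of a record.
            records.add(Arrays.asList(matcher.group(1).split(",", -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(parse("<F,Bird,20,10/> < A,Fish,5,11,2/>"));
        // [[F, Bird, 20, 10], [A, Fish, 5, 11, 2]]
    }
}
```

Because the pattern only looks at what is between the delimiters, any amount of whitespace between records is ignored without having to strip it first.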

How to return only first n number of words in a sentence Java

Say I have a simple paragraph as below.
For example, this is what I have:
A simple sentence consists of only one clause. A compound sentence
consists of two or more independent clauses. A complex sentence has at
least one independent clause plus at least one dependent clause. A set
of words with no independent clause may be an incomplete sentence,
also called a sentence fragment.
I want only the first 10 words of the text above.
I'm trying to produce the following string:
A simple sentence consists of only one clause. A compound
I tried this:
bigString.split(" ", 10).toString()
But it returns the same bigString wrapped in a [] array.
Thanks in advance.
Assume bigString is a String holding your text. The first thing you want to do is split the string into single words.
String[] words = bigString.split(" ");
How many words would you like to extract?
int n = 10;
Put the words back together:
StringBuilder newString = new StringBuilder();
// stop at words.length so a text with fewer than n words does not throw
for (int i = 0; i < n && i < words.length; i++) {
    if (i > 0) newString.append(" "); // separator between words, none at the start
    newString.append(words[i]);
}
System.out.println(newString);
Hope this is what you needed.
If you want to know more about regular expressions (i.e. to tell Java where to split), see here: How to split a string in Java
If you use the split method with a limit (yours is 10), it won't just give you the first 10 parts and stop; it gives you the first 9 parts, and the 10th element of the array contains the rest of the input String. Printing that array (for example with Arrays.toString) therefore shows the whole input String again. What you can do to achieve what you initially wanted is:
String[] myArray = bigString.split(" ", 11);
// keep only the first 10 parts; the 11th element (if present) holds the rest of the text
String[] firstTen = Arrays.copyOf(myArray, Math.min(10, myArray.length)); // needs import java.util.Arrays
String result = String.join(" ", firstTen);
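To see what the limit argument of split actually does, here is a small sketch on a short input. The class name SplitLimitDemo and the helper show are only for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Splits on single spaces with the given limit and renders the array readably.
    static String show(String text, int limit) {
        return Arrays.toString(text.split(" ", limit));
    }

    public static void main(String[] args) {
        // A positive limit n yields at most n parts; the last part is the unsplit rest.
        System.out.println(show("a b c d e", 3)); // [a, b, c d e]
        // A limit of 0 (the default of split(regex)) splits as often as possible.
        System.out.println(show("a b c d e", 0)); // [a, b, c, d, e]
    }
}
```

So a limit of 11 gives ten separate words plus one final element that holds everything else, which is why that last element has to be dropped.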
This will help you:
String[] strings = Arrays.stream(bigString.split(" "))
        .limit(10)
        .toArray(String[]::new);
Here is exactly what you want:
String[] result = new String[10];
// regex \s matches a whitespace character: [ \t\n\x0B\f\r]
String[] raw = bigString.split("\\s", 11);
// the last entry of the raw array holds the rest of the text, so only the first 10 entries are copied
// (this assumes the text has at least 10 words)
System.arraycopy(raw, 0, result, 0, 10);
System.out.println(Arrays.toString(result));
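If what you need in the end is a single String rather than an array, the stream approach can be extended with Collectors.joining. This is a minimal sketch; the class name FirstWords and the method firstWords are hypothetical, and it assumes n is not negative (Stream.limit rejects negative values).

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class FirstWords {
    // Returns the first n whitespace-separated words of text, joined by single spaces.
    // If the text has fewer than n words, all of them are returned.
    static String firstWords(String text, int n) {
        return Arrays.stream(text.trim().split("\\s+"))
                .limit(n)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String bigString = "A simple sentence consists of only one clause. "
                + "A compound sentence consists of two or more independent clauses.";
        System.out.println(firstWords(bigString, 10));
        // A simple sentence consists of only one clause. A compound
    }
}
```

Splitting on \s+ instead of a single space means line breaks and repeated spaces in the input do not produce empty "words".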

Java Split String Consecutive Delimiters

I have a need to split a string that is passed in to my app from an external source. This String is delimited with a caret "^", and here is how I split the String into an array:
String[] barcodeFields = contents.split("\\^+");
This works fine except that some of the passed in fields are empty and I need to account for them. I need to insert either "" or "null" or "empty" into any missing field.
And the missing fields have consecutive delimiters. How do I split a Java String into an array and insert a string such as "empty" as placeholders where there are consecutive delimiters?
The answer by mureinik is quite close, but wrong in an important edge case: when the empty fields are at the end of the string. To account for that, you have to use:
contents.split("\\^", -1)
E.g. look at the following code:
final String line = "alpha ^beta ^^^";
List<String> fieldsA = Arrays.asList(line.split("\\^"));
List<String> fieldsB = Arrays.asList(line.split("\\^", -1));
System.out.printf("# of fieldsA is: %d\n", fieldsA.size());
System.out.printf("# of fieldsB is: %d\n", fieldsB.size());
The above prints:
# of fieldsA is: 2
# of fieldsB is: 5
String.split leaves an empty string ("") where it encounters consecutive delimiters, as long as you use the right regex. If you want to replace it with "empty", you have to do so yourself:
String[] split = contents.split("\\^");
for (int i = 0; i < split.length; ++i) {
    if (split[i].length() == 0) {
        split[i] = "empty";
    }
}
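Combining the placeholder loop with the limit of -1 mentioned in this thread gives something like the sketch below, which also fills in empty fields at the end of the input. The class name CaretFields and the method splitWithPlaceholder are invented for the example.

```java
import java.util.Arrays;

class CaretFields {
    // Splits on single carets, keeps trailing empty fields (limit -1),
    // and replaces every empty field with the given placeholder.
    static String[] splitWithPlaceholder(String contents, String placeholder) {
        String[] fields = contents.split("\\^", -1);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].isEmpty()) {
                fields[i] = placeholder;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithPlaceholder("alpha^^beta^", "empty")));
        // [alpha, empty, beta, empty]
    }
}
```

Without the -1, the last field of "alpha^^beta^" would be dropped silently and the array would have only three elements.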
Using ^+ means one or more consecutive caret characters. Remove the plus:
String[] barcodeFields = contents.split("\\^");
and it won't eat empty fields. You'll get (as you requested) "" for empty fields.
The following results in [blah, , bladiblah, moarblah]:
String test = "blah^^bladiblah^moarblah";
System.out.println(Arrays.toString(test.split("\\^")));
The ^^ produces a "", the empty String, between its two delimiters.

Java String Regex Divide - Always the Same Pattern

I never understood how to write a proper regex to divide my Strings.
I have Strings of this type: String example = "on[?a, ?b, ?c]";
Sometimes I have this: String example2 = "not clear[?c]";
For the first example I would like to divide it into this:
[on, a, b, c]
or
String name = "on";
String [] vars = [a,b,c];
And for the second example I would like to divide it into this:
[not clear, c]
or
String name = "not clear";
String [] vars = [c];
Thanks a lot in advance, guys ;)
If you know the character set of your identifiers, you can simply do a split on all of the text that isn't in that set. For example, if your identifiers only consist of word characters ([a-zA-Z_0-9]) you can use:
String[] parts = "on[?a, ?b, ?c]".split("[\\W]+");
String name = parts[0];
String[] vars = Arrays.copyOfRange(parts, 1, parts.length);
If your identifiers only have A-Z (upper and lower) you could replace \\W above with ^A-Za-z.
I feel that this is more elegant than using a complex regular expression.
Edit: I realize that this will have issues with your second example "not clear". If you have no option of using something like an underscore instead of a space there, you could do one split on [? (or substring) to get the "name", and another split on the remainder, like so:
String s = "not clear[?a, ?b, ?c]";
String[] parts = s.split("\\[\\?"); //need the '?' so we don't get an extra empty array element in the next split
String name = parts[0];
String[] vars = parts[1].split("[\\W]+");
This comes close, but the problem is that the third capture group is repeated, so it only captures the last match.
(.*?)\[(?:\s*(?:\?(.*?)(?:\s*,\s*\?(.*?))*)\s*)?]
For example, the first one you list, on[?a, ?b, ?c], would give group 1 as on, group 2 as a, and group 3 as c. If you are using Perl, you could use the g flag to apply a regex to a line multiple times and use this:
my @tokens;
while ( $line =~ /\s*(.*?)\s*[\[,\]]/g ) {
    push( @tokens, $1 );
}
Note: I did not actually test the Perl code; it is just off the top of my head. It should give you the idea, though.
String[] parts = example.split("[^\\w ]");
List<String> x = new ArrayList<String>();
for (int i = 0; i < parts.length; i++) {
    if (!"".equals(parts[i]) && !" ".equals(parts[i])) {
        x.add(parts[i]);
    }
}
This will work as long as you don't have more than one space separating your non-space characters. There's probably a cleverer way of filtering out the empty and " " strings.
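Another way to handle both examples from the question, including names with spaces such as "not clear", is to match the overall shape first and then pull out the variables with a second pattern. This is only a sketch: the class name NameAndVars and the method parse are made up, and it assumes every variable is a ? followed by word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameAndVars {
    // Group 1: everything before the first '[' (the name).
    // Group 2: the text between the brackets.
    private static final Pattern OUTER = Pattern.compile("(.*?)\\[(.*)\\]");
    // One variable: a '?' followed by word characters (captured without the '?').
    private static final Pattern VAR = Pattern.compile("\\?(\\w+)");

    // Returns the name first, followed by the variables, e.g. [on, a, b, c].
    static List<String> parse(String input) {
        Matcher outer = OUTER.matcher(input.trim());
        if (!outer.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        List<String> result = new ArrayList<>();
        result.add(outer.group(1).trim());
        Matcher var = VAR.matcher(outer.group(2));
        while (var.find()) {
            result.add(var.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("on[?a, ?b, ?c]")); // [on, a, b, c]
        System.out.println(parse("not clear[?c]"));  // [not clear, c]
    }
}
```

Using find() in a loop sidesteps the repeated-capture-group problem, since each variable gets its own match instead of overwriting a single group.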
