How to split a string into an array (Java)

I need to split a String into an array. For example, I have the string str = "apple fruits money Pacific Ocean",
and I try to split it into an array like this:
String[] arr = str.split(" ");
But I need "Pacific Ocean" to end up in a single cell of the array. I can't change the separator, because I receive the data in this form ("apple fruits money Pacific Ocean").

If we accept that multiple consecutive capitalized words should be treated as a single entry, then you can do:
public class Main {
    public static void main(String[] args) {
        String str = "apple fruits money Pacific Ocean";
        String[] arr = str.split("\\s");
        String[] finalArr = new String[arr.length];
        int i = 0;
        for (String word : arr) {
            // capitalized
            if (Character.isUpperCase(word.charAt(0))) {
                // check if previous is capitalized
                if (Character.isUpperCase(finalArr[i - 1].charAt(0))) {
                    finalArr[i - 1] = finalArr[i - 1] + word + " ";
                } else {
                    finalArr[i] = word + " ";
                }
            } else {
                finalArr[i] = word;
            }
            i++;
        }
        for (String s : finalArr) {
            System.out.println(s);
        }
    }
}
will result in:
apple
fruits
money
Pacific Ocean
null
You'll still need to filter out the nulls and add some checks (for example, whether index i - 1 exists at all, and whether the entry there is null).
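The checks mentioned above could look like the following sketch. It collects the results in a List, so no null entries are produced, and it remembers whether the previous word was capitalized instead of reading index i - 1. The class and method names are hypothetical, and the sketch still rests on the same assumption: every run of capitalized words is one entry.

```java
import java.util.ArrayList;
import java.util.List;

class CapitalizedMerge {
    // Hypothetical helper: merges each run of consecutive capitalized words into one entry.
    static List<String> splitKeepingCapitalizedRuns(String str) {
        List<String> result = new ArrayList<>();
        boolean previousCapitalized = false;
        for (String word : str.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the input is blank
            }
            boolean capitalized = Character.isUpperCase(word.charAt(0));
            if (capitalized && previousCapitalized) {
                // Append to the entry that the previous capitalized word started.
                int last = result.size() - 1;
                result.set(last, result.get(last) + " " + word);
            } else {
                result.add(word);
            }
            previousCapitalized = capitalized;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [apple, fruits, money, Pacific Ocean]
        System.out.println(splitKeepingCapitalizedRuns("apple fruits money Pacific Ocean"));
    }
}
```

Because nothing is read from a fixed index, a capitalized first word or a run of three or more capitalized words does not cause an exception.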

You need to change the separator as Elliott Frisch stated in his comment. You're not going to be able to determine whether or not a set of words need to stay together if they contain a space. If your word list were separated by another character (such as a comma) then the problem becomes much easier to solve.
String input = "apples,fruits,money,Pacific Ocean";
String[] arr = input.split(",");
Now your array contains each of the words in input.

The problem as described in the question and comments has no solution.
Consider this:
"banana red apple green apple"
This can be split like this:
["banana", "red", "apple", "green", "apple"]
or like this
["banana", "red apple", "green apple"]
Without semantic / contextual analysis it is impossible to know which is more likely to be correct. And it is impossible to know for sure what the (human) user actually meant.
The question says: "I can't change the separator, because I get data in this form ("apple fruits money Pacific Ocean")."
You need to redesign the form or the input syntax so that your software doesn't need to perform this task. There is no other way ... to always get the correct answer.
Think of it this way. Suppose someone gave you a sequence of words in a foreign language on a piece of paper, and asked you to split them correctly. How would you (a human) solve the problem, assuming that you didn't understand the language, and hadn't been given a dictionary or a set of rules? This is equivalent to the task you are setting the computer ...

This way it's not possible. If the string was joined earlier, try using a character other than space. Maybe the pipe | might be an option.
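If you do control the code that joins the values, a pipe-separated round trip might look like the sketch below. The class and method names are hypothetical. One detail worth knowing: String.split takes a regular expression and "|" is a regex metacharacter, so the separator has to be quoted or escaped.

```java
import java.util.regex.Pattern;

class PipeSeparated {
    // Joins values with a pipe; assumes no value contains "|" itself.
    static String join(String... values) {
        return String.join("|", values);
    }

    // Splits on a literal pipe. Pattern.quote avoids "|" being read as regex alternation.
    static String[] split(String joined) {
        return joined.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String joined = join("apple", "fruits", "money", "Pacific Ocean");
        for (String s : split(joined)) {
            System.out.println(s);
        }
    }
}
```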

Related

Extra java input validation for strings

I want shortened inputs such as "Londo" and "Lon" to be detected as well, but I want to keep the code small and avoid basically copying and pasting it. Any tips? Thank you.
if (Menu.answer1.equals("London"))
{
    if (location.equals("London")) {
        System.out.print(location + " ");
        System.out.print(date + " ");
        System.out.print(degrees + "C ");
        System.out.print(wind + "MPH ");
        System.out.print(winddirection + " ");
        System.out.print(weather + " ");
        System.out.println("");
    }
}
You can use startsWith()
String city = "London";
if (city.startsWith("Lon")) {
// do something
}
Also if you need to check some substring, you can use contains method:
Menu.answer1 = "London";
Menu.answer1.contains("ondo"); // true
If you want to check against a fixed set of alternatives, you may use a list of valid inputs using contains:
List<String> londonNames = Arrays.asList("London", "Londo", "Lon");
if (londonNames.contains(Menu.answer1)) {
...
}
You can use a (case-insensitive) regex to do the same, e.g.:
(?i)Lon[a-z]{0,3} where
(?i) = case insensitivity
Lon = initial 3 characters
[a-z]{0,3} = between 0 and 3 further letters
Here's an example:
String regex = "(?i)Lon[a-z]{0,3}";
System.out.println("London".matches(regex));
System.out.println("Lond".matches(regex));
System.out.println("Lon".matches(regex));
If the underlying problem is that the user can enter one of several names, and you want to allow abbreviations, then a fairly standard approach is to have a table of acceptable names.
Given the user input, loop through the table testing "does the table entry start with the string typed by the user?" (like one of the previous answers here). If yes, then you have a potential match.
Keep looking. If you get a second match then the user input was ambiguous and should be rejected.
As a bonus, you can collect all names that match, and then use them in an error message. ("Pick one of London, Lonfoo, Lonbar").
This approach has the advantage (compared to a long chain of if-then-else logic) of not requiring you to write more code when all you want to do is have more data.
It automatically allows the shortest unique abbreviation, and will adjust when a once-unique abbreviation is no longer unique because of newly-added names.
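The table-driven approach described above can be sketched as follows. The city list, class name, and method name are made up for illustration; the comparison ignores case, which is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CityLookup {
    // Hypothetical table of acceptable names; supporting more names means adding data, not code.
    static final List<String> CITIES = Arrays.asList("London", "Liverpool", "Leeds");

    // Returns every table entry that starts with the user's input, ignoring case.
    static List<String> candidates(String input) {
        List<String> matches = new ArrayList<>();
        for (String city : CITIES) {
            if (city.regionMatches(true, 0, input, 0, input.length())) {
                matches.add(city);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> matches = candidates("Lon");
        if (matches.size() == 1) {
            System.out.println("Selected " + matches.get(0));
        } else if (matches.isEmpty()) {
            System.out.println("Unknown city");
        } else {
            System.out.println("Ambiguous, pick one of " + String.join(", ", matches));
        }
    }
}
```

Exactly one candidate means the abbreviation is unique; zero or several candidates are rejected, and the candidate list can be used in the error message.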

Algorithm for word match percentage between two text files

I have two Strings with many words in them.
My task is to find the percentage of words that match between the two strings. Can someone suggest an existing algorithm to get a precise percentage or count of matched words?
Example :
1. Mason natural fish oil 1000 mg omega-3 softgels - 200 ea
2. Mason Vitamins Omega 3 Fish Oil, 1000mg. Softgels, Bonus Size 200-Count Bottle
The output should be 8 words matched between the two strings.
You can use the method below. I have added inline comments to describe each step. Note that this example splits the words on whitespace. If you have any concerns, add a comment.
Note that I matched the words ignoring case, because otherwise it would not be possible to get 8 matching words for your example.
public static int matchStrings(String firstString, String secondString) {
    int matchingCount = 0;
    // Split the first string into words on whitespace.
    String[] allWords = firstString.split("\\s");
    // Collect the unique words into a set (needs java.util.Set and java.util.HashSet).
    Set<String> firstInputSet = new HashSet<String>();
    for (String word : allWords) {
        firstInputSet.add(word);
    }
    // Count the unique words that occur anywhere in the second string, ignoring case.
    String secondLower = secondString.toLowerCase();
    for (String word : firstInputSet) {
        if (secondLower.contains(word.toLowerCase())) {
            matchingCount++;
        }
    }
    return matchingCount;
}
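Since the question asks for a percentage, here is one possible sketch that turns a match count into one. It differs from the method above in two assumed ways: punctuation is treated as a separator, and only whole words are compared (so "1000" does not match "1000mg"). It may therefore report a different count than the substring-based method. The class and method names are hypothetical, and only ASCII letters and digits count as word characters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class WordMatch {
    // Lower-cases the text and splits it on anything that is not an ASCII letter or digit.
    static Set<String> words(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Set<String> set = new HashSet<>(Arrays.asList(tokens));
        set.remove(""); // a leading separator produces one empty token
        return set;
    }

    // Percentage of the unique words in 'first' that also occur as whole words in 'second'.
    static double matchPercent(String first, String second) {
        Set<String> firstWords = words(first);
        if (firstWords.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(firstWords);
        common.retainAll(words(second));
        return 100.0 * common.size() / firstWords.size();
    }

    public static void main(String[] args) {
        System.out.println(matchPercent("red green blue yellow", "Green, RED and black"));
    }
}
```

Whether to divide by the word count of the first string, the second string, or their union is a design choice that depends on what "percentage of match" should mean for your data.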

Java: Checking each space in a String

I'm sure this is fairly simple, however I've tried googling the question but can't find an answer that fits my problem.
I'm playing around with string manipulation and one of the things I'm trying to do is get the first letter of each word. (And then place them all into a string)
I'm having trouble with registering each 'space' so that my If statement will be triggered. Here's what I have so far.
while (scanText.hasNext()) {
boolean isSpace = false;
if (scanText.hasNext(" ")) {isSpace = true;}
String s = scanText.next();
if (isSpace) {firstLetters += s + " ";}
}
Also, if there is a much better way to do this then please let me know
You can also split the original text by spaces, and collect the words.
String input = " Hello world aaa ";
String[] split = input.trim().split("\\s+"); // all types of whitespace; " +" to pick spaces only
// operate on "split" array containing words now: [Hello, world, aaa]
However using regexps here might be overkill.
Assuming that scanText is a Scanner object, you could use something like stated on the documentation:
Scanner s = new Scanner(input).useDelimiter("\\s+"); //regex for spaces
https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
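To get from the split words to the string of first letters that the question asks for, a minimal sketch could look like this. The class and method names are hypothetical.

```java
class FirstLetters {
    // Builds a string from the first character of every whitespace-separated word.
    static String firstLetters(String input) {
        StringBuilder letters = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) { // blank input yields a single empty token
                letters.append(word.charAt(0));
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLetters(" Hello world aaa ")); // prints Hwa
    }
}
```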

Extracting Substrings from a List in Java

If I have a parent string (let's call it output) that contains a list of variable assignments like so ...
status.availability-state available
status.enabled-state enabled
status.status-reason The pool is available
And I want to extract the values of each variable in that list given the variable names, ie the substring after the space following status.availability-state, status.enabled-state, and status.status-reason, such that I end up with three different variable assignments making each of the following String comparisons true ...
String availability = output.substring(TODO);
String enabled = output.substring(TODO);
String reason = output.substring(TODO);
availability.equals("available");
enabled.equals("enabled");
reason.equals("The pool is available");
What is the simplest way to do this? Should I even use substring for this?
This is a little tricky because you need to assign the value to a specific variable - you can't just have a map of keys to variables in Java.
I would consider doing this with a switch:
for (String line : output.split("\n")) {
String[] frags = line.split(" ", 2); // Split the line in 2 at the first space.
switch (frags[0]) { // This is the "key" of the variable.
case "status.availability-state":
availability = frags[1]; // This assigns the "value" to the relevant variable.
break;
case "status.enabled-state":
enabled = frags[1];
break;
// ... etc
}
}
It's not very pretty, but you don't have too many options.
There seem to be two questions here -- how to parse the string, and how to assign to variables by name.
Tackle the string parsing one step at a time:
first write a program to read one line at a time and output each one in the body of a loop. String.split() or StringTokenizer are two options here.
next enhance this by writing a method to handle one line. The same tools are helpful here, to split on spaces.
You should now have a program that can print name: status.availability-state, value: available for each line of input.
Next, you're asking to programmatically assign to variables based on the name of the parameter.
There is no legitimate way to look at a variable's name at runtime (OK, Java 8 reflection has ways, but it shouldn't be used without very good reason).
So, the best you can do is to use a switch or if statement:
switch (name) {
case "status.availability-state":
availability = value;
break;
// ... etc.
}
However, whenever you use switch or if you should think about whether there's a better way.
Is there any reason you can't turn these variables into Map entries?
configMap.put(name, value);
Then to read it:
doSomethingWith(configMap.get("status.availability-state"));
That's what maps are for. Use them.
This is a similar situation to the rookie mistake of using variables called person1, person2, person3... instead of using an array. Eventually they ask "How do I go from the number 25 to my variable person25?" -- and the answer is, you can't, but an array or list makes it easy. people[number] or people.get(number)
A valid alternative is to split the string by \n and add to a Map. Example:
String properties = "status.availability-state available\nstatus.enabled-state enabled\nstatus.status-reason The pool is available";
Map<String, String> map = Arrays.stream(properties.split("\n"))
.collect(Collectors.toMap(s -> s.split(" ")[0], s -> s.split(" ", 2)[1]));
System.out.println(map.get("status.status-reason"));
Should output The pool is available
This loop will match and extract the variables, and you can then assign them as you see fit:
Pattern regex = Pattern.compile("status\\.(.*?)-.*? ([a-z]+)");
Matcher matcher = regex.matcher(output);
while (matcher.find()) {
System.out.println(matcher.group(1) + "=" + matcher.group(2));
}
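status\\. matches "status."
(.*?) matches any sequence of characters but isn't greedy, and captures them
-.*? followed by a space matches a dash, as few characters as possible, and then a space
([a-z]+) matches any string of lower-case letters, and captures them
Note that ([a-z]+) captures only a single lower-case word, so for "status.status-reason The pool is available" it captures "pool" rather than the whole value.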
Here's one way to do it:
Map<String, String> properties = getProperties(propertiesString);
availability = properties.get("availability-state");
enabled = properties.get("enabled-state");
reason = properties.get("status-reason");
// ...
public Map<String, String> getProperties(String input) {
    Map<String, String> properties = new HashMap<>();
    String[] lines = input.split("\n");
    for (String line : lines) {
        // Split at the first space only, so values such as "The pool is available" stay whole.
        String[] parts = line.split(" ", 2);
        // Drop the "status." prefix from the key.
        String key = parts[0].substring(parts[0].indexOf(".") + 1);
        properties.put(key, parts[1]);
    }
    return properties;
}
This seems to be a bit more straight-forward, in terms of the code that's setting these values, as each value is set to exactly the value from the map, rather than iterating over some list of strings and seeing if it contains a particular value and doing different things based on that.
This is designed with the primary use-case being that the string is created at runtime in memory. If the properties are created in an external file, this code would still work (after creating the desired String in memory), but it may be a better idea to use either a Properties file, or perhaps a Scanner.
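As a sketch of the Properties idea mentioned above: java.util.Properties treats the first run of whitespace on a line as the separator between key and value, so the sample text happens to parse directly. This assumes the input contains no backslashes, no '=' or ':' inside keys, and no lines starting with '#' or '!', all of which Properties handles specially. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesParse {
    // Parses "key value" lines with java.util.Properties; the key ends at the first whitespace.
    static Properties parse(String output) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(output));
        } catch (IOException e) {
            // Not expected for an in-memory reader, but load() declares it.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String output = "status.availability-state available\n"
                + "status.enabled-state enabled\n"
                + "status.status-reason The pool is available";
        Properties props = parse(output);
        System.out.println(props.getProperty("status.status-reason"));
    }
}
```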

How can i extract specific terms from string lines in Java?

I have a serious problem with extracting terms from each string line. To be more specific, I have one csv formatted file which is actually not csv format (it saves all terms into line[0] only)
So, here's just example string line among thousands of string lines:
(split() doesn't work!)
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract "beta-lipoic acid" ,"saponin" and "Berberine" only which is located in 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract terms located in 5th position for each line?
One more thing: the amount of whitespace between each of the six terms is not always the same. It could be one, two, three, four, or five spaces, or something like that.
Because the amount of whitespace is random, I cannot use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution to your problem using String.split and indexOf:
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
Option 1 : Use String.split and check for multiple consecutive spaces. Like the code below:
String s[] = str.split("\\s\\s+");
for (String string : s) {
System.out.println(string);
}
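Option 2 : Implement your own string split logic by walking through all the characters. Sample code below (this code is just to give an idea; treat it as a sketch).
public static List<String> getData(String str) {
    List<String> list = new ArrayList<>();
    StringBuilder term = new StringBuilder();
    int spaces = 0; // length of the current run of spaces
    for (char c : str.toCharArray()) {
        if (c == ' ') {
            spaces++;
            continue;
        }
        if (spaces > 1 && term.length() > 0) {
            // Two or more spaces end the current term.
            list.add(term.toString());
            term.setLength(0);
        } else if (spaces == 1 && term.length() > 0) {
            // A single space is part of the term.
            term.append(' ');
        }
        spaces = 0;
        term.append(c);
    }
    if (term.length() > 0) {
        list.add(term.toString()); // last term on the line
    }
    return list;
}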
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]*"); // Can add dashes and such as needed
if (possibleEnglishWord.matcher(terms[5]).matches()) {
// return terms[4] + " " + terms[5], or something like that
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove every token that isn't made up of just English letters/dashes
line = whitespace.matcher(line).replaceAll(" ");
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\s-]\\S*"); // any token containing a character other than a letter or dash
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.
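If it is safe to assume that the first four fields and the last field never contain whitespace, another option is to split on any whitespace and rejoin everything between the fourth field and the last one. This is only a sketch under that assumption, with hypothetical class and method names.

```java
import java.util.Arrays;

class FifthColumn {
    // Assumes fields 1-4 and the last field contain no whitespace;
    // everything in between is treated as the (possibly multi-word) name.
    static String extractName(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 6) {
            throw new IllegalArgumentException("Expected at least 6 fields: " + line);
        }
        return String.join(" ", Arrays.copyOfRange(tokens, 4, tokens.length - 1));
    }

    public static void main(String[] args) {
        String line = "31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O ";
        System.out.println(extractName(line)); // prints beta-lipoic acid
    }
}
```

Unlike splitting on two or more spaces, this does not depend on how many spaces separate the fields, and it handles names with any number of words.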
