Extracting Substrings from a List in Java - java

If I have a parent string (let's call it output) that contains a list of variable assignments like so ...
status.availability-state available
status.enabled-state enabled
status.status-reason The pool is available
And I want to extract the values of each variable in that list given the variable names, ie the substring after the space following status.availability-state, status.enabled-state, and status.status-reason, such that I end up with three different variable assignments making each of the following String comparisons true ...
String availability = output.substring(TODO);
String enabled = output.substring(TODO);
String reason = output.substring(TODO);
availability.equals("available");
enabled.equals("enabled");
reason.equals("The pool is available");
What is the simplest way to do this? Should I even use substring for this?

This is a little tricky because you need to assign the value to a specific variable - you can't just have a map of keys to variables in Java.
I would consider doing this with a switch:
for (String line : output.split("\n")) { // split() takes a String (a regex), not a char
String[] frags = line.split(" ", 2); // Split the line in 2 at the first space.
switch (frags[0]) { // This is the "key" of the variable.
case "status.availability-state":
availability = frags[1]; // This assigns the "value" to the relevant variable.
break;
case "status.enabled-state":
enabled = frags[1];
break;
// ... etc
}
}
It's not very pretty, but you don't have too many options.

There seem to be two questions here -- how to parse the string, and how to assign to variables by name.
Tackle the string parsing one step at a time:
First, write a program to read one line at a time and output each one in the body of a loop. String.split() or StringTokenizer are two options here.
Next, enhance this by writing a method to handle one line. The same tools are helpful here, to split on spaces.
You should now have a program that can print name: status.availability-state, value: available for each line of input.
Next, you're asking to programmatically assign to variables based on the name of the parameter.
There is no legitimate way to look at a variable's name at runtime (OK, Java 8 reflection has ways, but it shouldn't be used without very good reason).
So, the best you can do is to use a switch or if statement:
switch(name) {
case "status.availability-state":
availability = value;
break;
... etc.
}
However, whenever you use switch or if you should think about whether there's a better way.
Is there any reason you can't turn these variables into Map entries?
configMap.put(name, value);
Then to read it:
doSomethingWith(configMap.get("status.availability-state"));
That's what maps are for. Use them.
This is a similar situation to the rookie mistake of using variables called person1, person2, person3... instead of using an array. Eventually they ask "How do I go from the number 25 to my variable person25?" -- and the answer is, you can't, but an array or list makes it easy. people[number] or people.get(number)
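As a rough sketch of the map approach, assuming each line has the form "name value" with a single space before the value, something like the following would work. The class name StatusParser and the method name parse are made up for this illustration. It also shows that indexOf plus substring is enough for the parsing step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StatusParser {
    // Splits each line at its first space: the text before it is the key,
    // everything after it (including further spaces) is the value.
    static Map<String, String> parse(String output) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : output.split("\\R")) { // \R matches any line break
            int space = line.indexOf(' ');
            if (space < 0) {
                continue; // no value on this line, so skip it
            }
            values.put(line.substring(0, space), line.substring(space + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        String output = "status.availability-state available\n"
                + "status.enabled-state enabled\n"
                + "status.status-reason The pool is available";
        Map<String, String> values = parse(output);
        System.out.println(values.get("status.availability-state"));
        System.out.println(values.get("status.enabled-state"));
        System.out.println(values.get("status.status-reason"));
    }
}
```

If you still want separate variables, you can assign them from the map afterwards, e.g. String reason = values.get("status.status-reason");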

A valid alternative is to split the string by \n and add to a Map. Example:
String properties = "status.availability-state available\nstatus.enabled-state enabled\nstatus.status-reason The pool is available";
Map<String, String> map = Arrays.stream(properties.split("\n"))
.collect(Collectors.toMap(s -> s.split(" ")[0], s -> s.split(" ", 2)[1]));
System.out.println(map.get("status.status-reason"));
Should output The pool is available

This loop will match and extract the variables, and you can then assign them as you see fit:
Pattern regex = Pattern.compile("status\\.(.*?)-.*? ([a-z]+)");
Matcher matcher = regex.matcher(output);
while (matcher.find()) {
System.out.println(matcher.group(1) + "=" + matcher.group(2));
}
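status\\. matches "status."
(.*?) matches any sequence of characters but isn't greedy, and captures them
-.*? followed by a space matches a dash, then as few characters as possible, then a space
([a-z]+) matches any string of lower-case letters, and captures them
Note that group 1 is only the part of the name before the dash, and group 2 is a single lower-case word. For status.status-reason this prints status=pool rather than the full text "The pool is available".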
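If you want the whole name and the whole value, one possible variation is to anchor the pattern to each line. This is only a sketch: the class name StatusRegex and the method valueOf are invented for the example, and it assumes every line starts with "status." followed by a name without spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatusRegex {
    // Group 1: the name after "status." (no whitespace); group 2: the rest of the line.
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern LINE =
            Pattern.compile("^status\\.(\\S+) (.*)$", Pattern.MULTILINE);

    // Returns the value for the given name, or null if the name is not present.
    static String valueOf(String output, String name) {
        Matcher matcher = LINE.matcher(output);
        while (matcher.find()) {
            if (matcher.group(1).equals(name)) {
                return matcher.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String output = "status.availability-state available\n"
                + "status.enabled-state enabled\n"
                + "status.status-reason The pool is available";
        System.out.println(valueOf(output, "availability-state"));
        System.out.println(valueOf(output, "status-reason"));
    }
}
```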

Here's one way to do it:
Map<String, String> properties = getProperties(propertiesString);
availability = properties.get("availability-state");
enabled = properties.get("enabled-state");
reason = properties.get("status-reason");
// ...
public Map<String, String> getProperties(String input) {
Map<String, String> properties = new HashMap<>();
String[] lines = input.split("\n");
for (String line : lines) {
String[] parts = line.split(" ", 2); // key, then the rest of the line as the value
int keyStartIndex = parts[0].indexOf(".") + 1;
String key = parts[0].substring(keyStartIndex); // drop the "status." prefix
properties.put(key, parts[1]);
}
return properties;
}
This seems to be a bit more straight-forward, in terms of the code that's setting these values, as each value is set to exactly the value from the map, rather than iterating over some list of strings and seeing if it contains a particular value and doing different things based on that.
This is designed with the primary use-case being that the string is created at runtime in memory. If the properties are created in an external file, this code would still work (after creating the desired String in memory), but it may be a better idea to use either a Properties file, or perhaps a Scanner.
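For the Properties route, here is a minimal sketch. It relies on the fact that java.util.Properties accepts whitespace as the separator between key and value, so the sample text can be loaded as-is through a StringReader. The class name StatusProperties is made up. Keep in mind that the properties format also treats '=' and ':' as separators, backslashes as escapes, and lines starting with '#' or '!' as comments, so this only fits if your values do not collide with those rules.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class StatusProperties {
    // Loads "key value" lines using the standard properties parser.
    static Properties parse(String output) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(output));
        return properties;
    }

    public static void main(String[] args) throws IOException {
        String output = "status.availability-state available\n"
                + "status.enabled-state enabled\n"
                + "status.status-reason The pool is available";
        Properties properties = parse(output);
        System.out.println(properties.getProperty("status.availability-state"));
        System.out.println(properties.getProperty("status.status-reason"));
    }
}
```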

Related

Extra java input validation for strings

I want to make this so that short inputs can still be detected, such as "Londo" and "Lon", but want to keep it small and use it without basically copying and pasting the code, any tips? thank you.
if (Menu.answer1.equals("London"))
{
if (location.equals("London")) {
System.out.print(location + " ");
System.out.print(date + " ");
System.out.print(degrees + "C ");
System.out.print(wind + "MPH ");
System.out.print(winddirection + " ");
System.out.print(weather + " ");
System.out.println("");
}
}
You can use startsWith()
String city = "London";
if (city.startsWith("Lon")) {
// do something
}
Also if you need to check some substring, you can use contains method:
Menu.answer1 = "London";
Menu.answer1.contains("ondo"); // true
If you want to check against a fixed set of alternatives, you may use a list of valid inputs using contains:
List<String> londonNames = Arrays.asList("London", "Londo", "Lon");
if (londonNames.contains(Menu.answer1)) {
...
}
You can use a (case-insensitive) regex to do the same, e.g.:
(?i)Lon[a-z]{0,3} where
(?i) = case insensitivity
Lon = initial 3 characters
[a-z]{0,3} = between 0 and 3 further letters
Here's an example:
String regex = "(?i)Lon[a-z]{0,3}";
System.out.println("London".matches(regex));
System.out.println("Lond".matches(regex));
System.out.println("Lon".matches(regex));
If the underlying problem is that the user can enter one of several names, and you want to allow abbreviations, then a fairly standard approach is to have a table of acceptable names.
Given the user input, loop through the table testing "does the table entry start with the string typed by the user?" (like one of the previous answers here). If yes, then you have a potential match.
Keep looking. If you get a second match then the user input was ambiguous and should be rejected.
As a bonus, you can collect all names that match, and then use them in an error message. ("Pick one of London, Lonfoo, Lonbar").
This approach has the advantage (compared to a long chain of if-then-else logic) of not requiring you to write more code when all you want to do is have more data.
It automatically allows the shortest unique abbreviation, and will adjust when a once-unique abbreviation is no longer unique because of newly-added names.
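The table approach described above could be sketched like this. The city list, the class name CityMatcher and the rule that an exact match wins over longer names are all assumptions made for the example, not part of the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CityMatcher {
    // Hypothetical table of accepted names; supporting a new city only means adding data.
    static final List<String> CITIES =
            Arrays.asList("London", "Londonderry", "Leeds", "Liverpool");

    // Returns every table entry that starts with the typed text (ignoring case).
    // One result means a unique match, several mean the input was ambiguous,
    // and an empty list means nothing matched.
    static List<String> candidates(String input) {
        String typed = input.trim().toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        if (typed.isEmpty()) {
            return matches; // an empty input would otherwise match everything
        }
        for (String city : CITIES) {
            String lower = city.toLowerCase(Locale.ROOT);
            if (lower.equals(typed)) {
                return Collections.singletonList(city); // assumption: an exact match wins
            }
            if (lower.startsWith(typed)) {
                matches.add(city);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(candidates("Lee"));    // unique match
        System.out.println(candidates("Lon"));    // ambiguous: usable in an error message
        System.out.println(candidates("london")); // exact match
    }
}
```

The caller can then accept the input when the list has exactly one element and otherwise print the candidates back to the user.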

parsing values from text file in java

I've got some text files I need to extract data from. The file itself contains around a hundred lines and the interesting part for me is:
AA====== test==== ====================================================/
AA normal low max max2 max3 /
AD .45000E+01 .22490E+01 .77550E+01 .90000E+01 .47330E+00 /
Say I need to extract the double values under "normal", "low" and "max". Is there any efficient and not-too-error-prone solution other than regexing the hell out of the text file?
If you really want to avoid regexes, and assuming you'll always have this same basic format, you could do something like:
HashMap<String, Double> map = new HashMap<>();
Scanner scan = new Scanner(new File(filePath)); // or your preferred input mechanism; a bare String would be scanned as text, not opened as a file
String topLine = scan.nextLine(); // read the top line outside the assert, so it is consumed even when assertions are disabled
assert topLine.startsWith("AA===="); // ensure it is the top line
while (scan.hasNextLine()) {
String[] headings = scan.nextLine().split("\\s+"); // ("\t") can be used if you're sure the delimiters will always be tabs
String[] vals = scan.nextLine().split("\\s+");
assert headings[0].equals("AA"); // ensure this is the heading row
assert vals[0].equals("AD"); // ensure this is the value row
for (int i = 1; i < headings.length - 1; i++) { // start at 1 to skip the row tag, stop before the trailing "/"
map.put(headings[i], Double.parseDouble(vals[i]));
}
}
// to make sure a certain value is contained in the map:
assert map.containsKey("normal");
// use it:
double normalValue = map.get("normal");
Code is untested as I don't have access to an IDE at the moment. Also, I obviously don't know what's variable and what will remain constant here (read: the "AD", "AA", etc.), but hopefully you get the gist and can modify as needed.
If each line will always have this exact form you can use String.split()
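String line; // Fill with one line from the file
String[] cols = line.split("\\."); // escape the dot: split() takes a regex, and a bare "." matches every character
String normal = "." + cols[1].trim(); // cols[0] is the "AD" row tag
String low = "." + cols[2].trim();
String max = "." + cols[3].trim();
If you know at which index each value starts, you can just take substrings of the row. (The split method technically uses a regex.)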
i.e.
String normal = line.substring(x, y).trim();
String low = line.substring(z, w).trim();
etc.

How can i extract specific terms from string lines in Java?

I have a serious problem with extracting terms from each string line. To be more specific, I have one csv formatted file which is actually not csv format (it saves all terms into line[0] only)
So, here's just example string line among thousands of string lines:
(split() doesn't work.!!! )
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are located in the 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract terms located in 5th position for each line?
One more thing: the length of whitespace between each of the six terms is not always equal. the length could be one, two, three, four, or five, or something like that.
Because the length of whitespace is random, I can not use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution for your problem using the string split and index of,
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
Option 1: Use String.split and split only on two or more consecutive whitespace characters. Like the code below:
String s[] = str.split("\\s\\s+");
for (String string : s) {
System.out.println(string);
}
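Option 2: Implement your own string split logic by walking through all the characters. Sample code below (this code is just to give an idea; I did not test it).
public static List<String> getData(String str) {
List<String> list = new ArrayList<>();
StringBuilder s = new StringBuilder();
int count = 0; // number of consecutive spaces seen so far
for (char c : str.toCharArray()) {
if (c == ' ') {
count++;
continue;
}
if (count > 1 && s.length() > 0) { // two or more spaces end the current term
list.add(s.toString());
s.setLength(0);
} else if (count == 1 && s.length() > 0) { // a single space is part of the term
s.append(' ');
}
count = 0;
s.append(c);
}
if (s.length() > 0) { // add the last term
list.add(s.toString());
}
return list;
}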
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]+"); // Can add dashes and such as needed
if (possibleEnglishWord.matcher(terms[5]).matches()) {
// return terms[4] + " " + terms[5] or something like that
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove every token that isn't made up of just English letters/dashes
line = whitespace.matcher(line).replaceAll(" ");
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\s-]\\S*"); // any token containing a character that is not a letter or a dash
line = notEnglishWord.matcher(line).replaceAll("").trim(); // Strings are immutable, so keep the result
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.
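One more way to express "fifth position" directly is a single regex that skips four whitespace-free fields, captures the middle, and requires one whitespace-free field at the end. This is a sketch under assumptions: the surrounding quotes have already been stripped from each line, only the name may contain spaces, and the class name FifthField is invented here. The Pattern is a constant, so it is compiled once rather than inside the loop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FifthField {
    // Four fields without whitespace, then the name (captured lazily),
    // then one last field without whitespace.
    private static final Pattern ROW =
            Pattern.compile("^\\s*(?:\\S+\\s+){4}(.+?)\\s+\\S+\\s*$");

    // Returns the fifth term, or null if the line does not have the expected shape.
    static String name(String line) {
        Matcher matcher = ROW.matcher(line);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O ",
            "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O ",
            "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
        };
        for (String line : lines) {
            System.out.println(name(line));
        }
    }
}
```

Unlike splitting on two or more spaces, this does not care how many spaces separate the fields, as long as the first four fields and the last one contain none.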

How to design String decollator in a string contains many params

I need to pass a string parameter that contains many params. When I receive the parameter, I use String.split() to split it and get all the params.
But one problem occurred: how should I design my string delimiter so that any ASCII character on the keyboard can be passed correctly?
I would appreciate any advice.
Maybe you could have a look at variadic arguments instead of splitting a string. For example:
public void method(String... strings) {
// strings is actually an array
String firstParam = strings[0];
String secondParam = strings[1];
// ...
}
Calling:
method("string1");
method("string1", "string2", "string3");
// as many string args as you want
If I understood correctly, you need to encode a set of parameters into one string. You can use some sequence of characters for this purpose, e.g.
final String delimiter = "###";
String value = "param1###param2###param3";
String[] parameters = value.split(delimiter);
Choose a character which is easy to enter and unlikely to appear in the input. Let's assume that character is #.
Normal input would look like Item 1#Item 2#Item 3. Actually, you can .trim() every item and let the user enter Item 1 # Item 2 # Item 3 if s/he prefers.
However, like you describe, say the user would like to enter Item #1, Item #2, etc. There are a few ways to let him/her do this, but the easiest is to let them escape the delimiter. For example, instead of Item #1 # Item #2 # Item #3, which would normally result in 6 different items being found, let the user enter Item ##1 # Item ##2 # Item ##3. Then in your parsing, make sure to handle the case when two or more #'s have been entered in a row. split likely won't be good enough; you'll have to go through the string yourself.
Here's a sketch of a method which would split the input string for you:
private static List<String> parseArguments(String input) {
ArrayList<String> arguments = new ArrayList<String>();
String[] prelArguments = input.split("#");
for (int i = 0; i < prelArguments.length; i++) {
String argument = prelArguments[i];
if (argument.equals("")) {
// We will enter here if there were two or more #'s in a row
StringBuilder combinedArgument = new StringBuilder(arguments.remove(arguments.size() - 1));
int inARow = 0;
while (prelArguments[i+inARow].equals("")) {
inARow++;
combinedArgument.append('#');
}
i += inARow;
combinedArgument.append(prelArguments[i]);
arguments.add(combinedArgument.toString());
} else {
arguments.add(argument);
}
}
return arguments;
}
Error handling, edge-case handling and some performance improvement is missing from the above, but I think the idea comes through.
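A variant that walks the string character by character, instead of post-processing the result of split, might look like the sketch below. It assumes the same convention as above (a single # separates items, ## stands for a literal #) and trims every item. The class name DelimitedArgs and the join helper are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelimitedArgs {
    // Splits on single '#' characters; "##" is read as one literal '#'.
    static List<String> split(String input) {
        List<String> items = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != '#') {
                current.append(c);
            } else if (i + 1 < input.length() && input.charAt(i + 1) == '#') {
                current.append('#'); // escaped delimiter
                i++;                 // skip the second '#'
            } else {
                items.add(current.toString().trim()); // delimiter: close the current item
                current.setLength(0);
            }
        }
        items.add(current.toString().trim()); // the last item has no trailing delimiter
        return items;
    }

    // Builds the encoded string: escapes '#' in every item, then joins with " # ".
    static String join(List<String> items) {
        List<String> escaped = new ArrayList<>();
        for (String item : items) {
            escaped.add(item.replace("#", "##"));
        }
        return String.join(" # ", escaped);
    }

    public static void main(String[] args) {
        System.out.println(split("Item ##1 # Item ##2 # Item ##3"));
        System.out.println(join(Arrays.asList("Item #1", "Item #2")));
    }
}
```

Because items are trimmed, leading and trailing whitespace inside an item is not preserved; if that matters, drop the trim() calls and join with a bare "#".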
I would eliminate the problem, which is the misuse of String as an argument container. If you need to pass more parameters, pass more parameters. If this gets out of hand, consider passing a map, or a custom object that can contain all the parameters.

Java Searching Through a String

So I want to search through a string to see if it contains the substring that I'm looking for. This is the algorithm I wrote up:
//Declares the String to be searched
String temp = "Hello World?";
//A String array is created to store the individual
//substrings in temp
String[] array = temp.split(" ");
//Iterates through String array and determines if the
//substring is present
for(String a : array)
{
if(a.equalsIgnoreCase("hello"))
{
System.out.println("Found");
break;
}
System.out.println("Not Found");
}
This algorithm works for "hello" but I don't know how to get it to work for "world" since it has a question mark attached to it.
Thanks for any help!
Take a look:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#contains(java.lang.CharSequence)
String.contains();
To get a containsIgnoreCase(), you'll have to make your searchword and your String toLowerCase().
Take a look at this answer:
How to check if a String contains another String in a case insensitive manner in Java?
return s1.toLowerCase().contains(s2.toLowerCase());
This will also be true for:
war of the worlds, because it will find world. If you don't want this behavior, you'll have to change your method like @Bart Kiers said.
Split on the following instead:
"[\\s?.!,]"
which matches any space char, question mark, dot, exclamation or a comma (add more chars if you like).
Or do a temp = temp.toLowerCase() and then temp.contains("world").
You don't have to do this, it's already implemented:
indexOf and others
You may want to use :
String string = "Hello World?";
boolean b = string.indexOf("Hello") >= 0; // true (indexOf returns 0 here, and -1 when there is no match)
To ignore case, a regular expression can be used.
b = string.matches("(?i).*Hello.*");
One more variation to ignore case would be:
// To ignore case
b = string.toLowerCase().indexOf("Hello".toLowerCase()) >= 0; // true
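If the search should ignore case and punctuation but still match whole words only (so that "world" is found in "Hello World?" but not in "war of the worlds"), a word-boundary regex is one option. This is a sketch; the class name WordSearch and the method containsWord are made up for the example.

```java
import java.util.regex.Pattern;

class WordSearch {
    // True if text contains word as a whole word, ignoring case.
    // \b is a word boundary; Pattern.quote keeps regex metacharacters in word literal.
    static boolean containsWord(String text, String word) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Hello World?", "world"));      // true
        System.out.println(containsWord("Hello World?", "hello"));      // true
        System.out.println(containsWord("war of the worlds", "world")); // false
    }
}
```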
