Getting integers out of a string containing words in java - java

Right now I have a string input along the lines of "Stern Brenda 90 86 45". I'm trying to find a way to get 90 86 and 45 out of that and assign them as ints to tests 3, 2, and 1 respectively to compute an average of them.
while ((line = reader.readLine()) != null) {
test3 = line.indexOf(-2, -1);
test2 = line.indexOf(-5, -4);
test1 = line.indexOf(-8, -7);
This returns -1 for each test. (I also tried using a regular expression to start from index -2 and continue until another integer is found.) Getting a two-digit integer, as opposed to a single digit like 5 or 6, is really what's throwing me off. Is the .indexOf method the best way to get these numbers out of the string? If so, how am I using it incorrectly?
edit: I found a solution that was relatively simple.
while ((line = reader.readLine()) != null) {
String nums = line.replaceAll("[\\D]", "");
test1 = Integer.parseInt(nums.substring(0,2));
test2 = Integer.parseInt(nums.substring(2,4));
test3 = Integer.parseInt(nums.substring(4,6));
For the input "Stern Brenda 90 86 45", this returns 90 for test1, 86 for test2, and 45 for test3 (all as integers).

CheshireMoe almost has it right, but he's accessing a List like an array, which probably won't work. In his example:
Instead of:
test3 = Integer.parseInt(tokens[tokens.length-1]);
test2 = Integer.parseInt(tokens[tokens.length-2]);
test1 = Integer.parseInt(tokens[tokens.length-3]);
Should be:
test3 = Integer.parseInt(tokens.get(tokens.size()-1));
test2 = Integer.parseInt(tokens.get(tokens.size()-2));
test1 = Integer.parseInt(tokens.get(tokens.size()-3));
An easier solution might be to just split the line on the space character:
while ((line = reader.readLine()) != null) {
String [] tokens = line.split(" ");
if (tokens.length != 5) { // catch errors in your data!
throw new Exception(); // <-- use this if you want to stop on bad data
// continue; <-- use this if you just want to skip the record, instead
}
test3 = Integer.parseInt(tokens[4]);
test2 = Integer.parseInt(tokens[3]);
test1 = Integer.parseInt(tokens[2]);
}
Based on your data, you might also consider putting in some validation like I've shown, to catch things like:
a value is missing (student didn't take one of the tests)
not all the grades were entered as numbers (i.e. bad characters)
a first or last name is missing
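The checks listed above could be sketched roughly as follows. This is only an illustration: the class name ScoreLineParser and the method parseScores are made up for this example, and it assumes every line is "lastName firstName score score score" separated by whitespace.

```java
import java.util.Arrays;

class ScoreLineParser {
    // Hypothetical helper: returns the last three tokens of the line as ints, in file order.
    // Throws IllegalArgumentException if the line is too short or a score is not a whole number.
    static int[] parseScores(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 5) { // assumed layout: two names followed by three scores
            throw new IllegalArgumentException("Expected two names and three scores: " + line);
        }
        int[] scores = new int[3];
        for (int i = 0; i < 3; i++) {
            String token = tokens[tokens.length - 3 + i];
            try {
                scores[i] = Integer.parseInt(token);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Not a whole number: " + token, e);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        int[] scores = parseScores("Stern Brenda 90 86 45");
        System.out.println(Arrays.toString(scores)); // [90, 86, 45]
        System.out.println(Arrays.stream(scores).average().orElse(0));
    }
}
```

Whether to throw or to skip the record is the same choice described above; replacing the throw with a skip is a one-line change in the caller.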

You could use a regular expression to parse the string. This just computes the average. You can also assign the individual values as you deem appropriate.
String s = "Stern Brenda 90 86 45";
double sum = 0;
// \\b  - a word boundary
// \\d+ - one or more digits
// ()   - a capture group
// matching on the string s
Matcher m = Pattern.compile("\\b(\\d+)\\b").matcher(s);
int count = 0;
// as long as find() returns true, you have a match,
// so convert group(1) to a double, add it to sum, and increment the count.
while (m.find()) {
sum+= Double.parseDouble(m.group(1));
count++;
}
// When done, compute the average.
System.out.println(sum + " " + count); // just for demo purposes.
if (count > 0) { //just in case
double avg = sum/count;
System.out.println("Avg = " + avg);
}
prints
221.0 3
Avg = 73.66666666666667
Check out the Pattern class for more details.
Formatting the final answer may be desirable. See System.out.printf
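As a small illustration of the formatting remark, the average could be rounded to two decimals with a format string. The class and method names here are made up for the example, and Locale.ROOT is passed only so the decimal separator is always a dot.

```java
import java.util.Locale;

class AverageFormat {
    // Hypothetical helper: formats sum / count with two digits after the decimal point.
    static String formatAverage(double sum, int count) {
        return String.format(Locale.ROOT, "Avg = %.2f", sum / count);
    }

    public static void main(String[] args) {
        System.out.println(formatAverage(221.0, 3)); // Avg = 73.67
    }
}
```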

StringTokenizer is a useful way to work with data strings that come from files or streams. If your data is separated by a specific character (a space in this case), StringTokenizer is an easy way to break a large string into parts and iterate through the data.
Since you don't seem to care about the other 'words' in the line and have not specified whether their number is constant (middle name?), my example puts all the tokens in a list and takes the last three, as in the question. I have also added the parseInt() method to convert the strings to ints. Here is how you would tokenize your lines.
while ((line = reader.readLine()) != null) {
List<String> tokens = new ArrayList<>();
StringTokenizer tokenizer = new StringTokenizer(line, " ");
while (tokenizer.hasMoreElements()) {
tokens.add(tokenizer.nextToken());
}
test3 = Integer.parseInt(tokens.get(tokens.size()-1));
test2 = Integer.parseInt(tokens.get(tokens.size()-2));
test1 = Integer.parseInt(tokens.get(tokens.size()-3));
}

Related

Reading a file -- pairing a String and int value -- with multiple split lines

I am working on an exercise with the following criteria:
"The input consists of pairs of tokens where each pair begins with the type of ticket that the person bought ("coach", "firstclass", or "discount", case-sensitively) and is followed by the number of miles of the flight."
The list can be paired -- coach 1500 firstclass 2000 discount 900 coach 3500 -- and this currently works great. However, when the String and int value are split like so:
firstclass 5000 coach 1500 coach
100 firstclass
2000 discount 300
it breaks entirely. I am almost certain that it has something to do with me using this format (not full)
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
while(token.hasMoreTokens())
{
String ticketClass = token.nextToken().toLowerCase();
int count = Integer.parseInt(token.nextToken());
...
}
}
because it will always read the first value as a String and the second value as an integer. I am very lost on how to keep track of one or the other while going to read the next line. Any help is truly appreciated.
Similar (I think) problems:
Efficient reading/writing of key/value pairs to file in Java
Java-Read pairs of large numbers from file and represent them with linked list, get the sum and product of each pair
Reading multiple values in multiple lines from file (Java)
If you can afford to read the text file in all at once as a very long String, simply use the built-in String.split() with the regex \\s+, like so
String[] tokens = fileAsString.split("\\s+");
This will split the input file into tokens, assuming the tokens are separated by one or more whitespace characters (a whitespace character covers newline, space, tab, and carriage return). Even and odd tokens are ticket types and mile counts, respectively.
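A minimal sketch of that approach, assuming the whole file is already in one String. The scoring rule (2 points per mile for firstclass, 1 for coach, 0 otherwise) is borrowed from another answer in this thread and is only an assumption here; the class and method names are made up for the example.

```java
class TicketPoints {
    // Hypothetical scoring: firstclass earns 2 points per mile, coach 1, anything else 0.
    static int countPoints(String fileAsString) {
        // Newlines, tabs and spaces all count as separators, so line breaks
        // in the middle of a pair no longer matter.
        String[] tokens = fileAsString.trim().split("\\s+");
        int points = 0;
        // Tokens come in pairs: even index = ticket type, odd index = miles.
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String ticketClass = tokens[i];
            int miles = Integer.parseInt(tokens[i + 1]);
            switch (ticketClass) {
                case "firstclass": points += 2 * miles; break;
                case "coach":      points += miles;     break;
                default:           break; // e.g. discount earns nothing
            }
        }
        return points;
    }

    public static void main(String[] args) {
        String input = "firstclass 5000 coach 1500 coach\n100 firstclass\n2000 discount 300";
        System.out.println(countPoints(input)); // 15600
    }
}
```

Note the doubled backslash in "\\s+": inside a Java string literal, a single \s is not a valid escape in older Java versions, which is the compiler error mentioned further down.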
If you absolutely have to read line by line and use StringTokenizer, one solution is to count the number of tokens in the previous line. If that number is odd, the first token of the current line has a different type from the first token of the previous line. Once you know the starting type of the current line, simply alternate types from there.
int tokenCount = 0;
boolean startingType = true; // true for String, false for integer
boolean currentType;
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
startingType = startingType ^ (tokenCount % 2 == 1); // if tokenCount is odd, the XOR ^ operator will flip the starting type of this line
tokenCount = 0;
while(token.hasMoreTokens())
{
tokenCount++;
currentType = startingType ^ (tokenCount % 2 == 0); // alternating between types in current line
if (currentType) {
String ticketClass = token.nextToken().toLowerCase();
// do something with ticketClass here
} else {
int mileCount = Integer.parseInt(token.nextToken());
// do something with mileCount here
}
...
}
}
I found another way to do this problem without using either the StringTokenizer or the regex...admittedly I had trouble with the regular expressions haha.
I declare these outside of the try-catch block because I want to use them in both my finally statement and return the points:
int points = 0;
ArrayList<String> classNames = new ArrayList<>();
ArrayList<Integer> classTickets = new ArrayList<>();
Then inside my try block, I declare the index variable because I won't need it outside of this block. The variable increases each time a new element is read. Elements at even indices (the 1st, 3rd, and so on) are read as ticket classes, and elements at odd indices are read as mile counts:
try
{
int index = 0;
// read till the file is empty
while(fileScanner.hasNext())
{
// first entry is the ticket type
if(index % 2 == 0)
classNames.add(fileScanner.next());
// second entry is the number of points
else
classTickets.add(Integer.parseInt(fileScanner.next()));
index++;
}
}
You can either catch it here like this or use throws NoSuchElementException in your method declaration -- As long as you catch it on your method call
catch(NoSuchElementException noElement)
{
System.out.println("<###-NoSuchElementException-###>");
}
Then down here, loop through the number of elements. See which flight class it is and multiply the ticket count respectively and return the points outside of the block:
finally
{
for(int i = 0; i < classNames.size(); i++)
{
switch(classNames.get(i).toLowerCase())
{
case "firstclass": // 2 points for first
points += 2 * classTickets.get(i);
break;
case "coach": // 1 point for coach
points += classTickets.get(i);
break;
default:
// budget gets nothing
}
}
}
return points;
The regex seems like the most convenient way, but this was more intuitive to me for some reason. Either way, I hope the variety will help out.
simply use the built-in String.split() - #bui
I was finally able to wrap my head around regular expressions, but \s+ was not being recognized for some reason. It kept giving me this error message:
Invalid escape sequence (valid ones are \b \t \n \f \r " ' \ )Java(1610612990)
So when I went through with those characters instead, I was able to write this:
int points = 0, multiplier = 0, tracker = 0;
while(fileScanner.hasNext())
{
String read = fileScanner.next().split(
"[\b \t \n \f \r \" \' \\ ]")[0];
if(tracker % 2 == 0)
{
if(read.toLowerCase().equals("firstclass"))
multiplier = 2;
else if(read.toLowerCase().equals("coach"))
multiplier = 1;
else
multiplier = 0;
}else
{
points += multiplier * Integer.parseInt(read);
}
tracker++;
}
This code goes one entry at a time instead of reading a whole array void of whitespace as a work-around for that error message I was getting. If you could show me what the code would look like with String[] tokens = fileAsString.split("\s+"); instead I would really appreciate it :)
you need to add another "\" before "\s" to escape the slash before "s" itself – #bui

How to read and validate different portions of a line of text in a text file in Java?

So I'm trying to validate data in a text file using Java. The text file looks like this (ignore the bullet points):
51673 0 98.85
19438 5 95.00
00483 3 73.16
P1905 1 85.61
80463 2 73.16
76049 4 63.48
34086 7 90.23
13157 0 54.34
24937 2 81.03
26511 1 74.16
20034 4 103.90
The first column of numbers needs to be within the range 00000-99999 and contain no letters, the second column needs to be within the range 0-5, and the third column needs to be within the range 0.00-100.00. How would I validate each of these columns in the text file separately to meet the requirements? I already know how to read the text file; I'm just trying to figure out how to validate the data.
So you have a line, String line = "20034 4 103.90";.
You can break it into its constituent parts using .split().
Then inspect/validate each of them individually before repeating the same for the next line.
So, it would be splitting by the delimiter " ", since it separates the columns.
String[] parts = line.split(" ");
String part1 = parts[0]; // 20034
String part2 = parts[1]; // 4
String part3 = parts[2]; // 103.90
You can play around here http://ideone.com/LcNYQ9
Validation
Regarding validation, it's quite easy.
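For column 1, once it is parsed to an int i, you can do something like if (i >= 0 && i < 100000)
Same for column 2, if (i >= 0 && i < 6)
To check that column 1 doesn't contain any letters, note that String.contains() takes a literal string, not a regular expression, so use matches() instead:
part1.matches("\\d+") inside an if statement, which is true only when the column consists entirely of digits.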
Instead of checking if it doesn't have letters, check that it only contains digits or decimal points. I've provided the appropriate regular expressions for doing the same.
Step 1: Put each line in the file into a List<String>:
List<String> list = Files.readAllLines(Paths.get("filepath"));
Step 2: Split each line into its components and validate them individually:
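// minAcceptedValue and maxAcceptedValue stand for the bounds of the column being checked
for (String str : list)
{
    String[] arr = str.split(" "); // split the current line, not the list
    if (!arr[0].matches("\\d+")) // Check that it only contains digits
        throw new IllegalArgumentException("Column 1 is not a whole number: " + arr[0]); // or another appropriate exception
    int part1 = Integer.parseInt(arr[0]);
    validate(part1, minAcceptedValue, maxAcceptedValue);
    if (!arr[1].matches("\\d+")) // Check that it only contains digits
        throw new IllegalArgumentException("Column 2 is not a whole number: " + arr[1]);
    int part2 = Integer.parseInt(arr[1]);
    validate(part2, minAcceptedValue, maxAcceptedValue);
    if (!arr[2].matches("[0-9]{1,4}(\\.[0-9]*)?")) // Check that it is a decimal with at most 4 digits before the decimal point. You can change this to any value you like.
        throw new IllegalArgumentException("Column 3 is not a decimal number: " + arr[2]);
    double part3 = Double.parseDouble(arr[2]);
    validate(part3, minAcceptedValue, maxAcceptedValue);
}
// Takes doubles so the same check works for the int columns (widened automatically) and the decimal column.
void validate(double x, double min, double max)
{
    if (x < min || x > max)
        throw new IllegalArgumentException("Value out of range: " + x); // or another appropriate exception
}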
You can use Scanner (javadocs) to help you parse the input. It's similar to the regular expressions solution but it's tailored for these situations where you read a series of values from a potentially enormous text file.
try (Scanner sc = new Scanner(new File(...))) {
while (sc.hasNext()) {
int first = sc.nextInt();
int second = sc.nextInt();
float third = sc.nextFloat();
String tail = sc.nextLine();
// validate ranges
// validate tail is empty
}
}
Of course, you may catch any potential exceptions and treat them as validation failures.
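Putting the range rules from the question together, a per-line check might look like the sketch below. The class and method names are invented for illustration, and it assumes the first column must be exactly five digits (the question only gives the range 00000-99999) and that columns are separated by whitespace.

```java
class RowValidator {
    // Hypothetical helper: true only if the line has exactly three columns in the stated ranges.
    static boolean isValid(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) return false;
        // Column 1: exactly five digits, i.e. 00000-99999 with no letters (assumption).
        if (!parts[0].matches("\\d{5}")) return false;
        // Column 2: a single digit from 0 to 5.
        if (!parts[1].matches("[0-5]")) return false;
        // Column 3: a decimal number between 0.00 and 100.00 inclusive.
        if (!parts[2].matches("\\d{1,3}(\\.\\d{1,2})?")) return false;
        double third = Double.parseDouble(parts[2]);
        return third >= 0.0 && third <= 100.0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("51673 0 98.85"));  // true
        System.out.println(isValid("P1905 1 85.61"));  // false: letter in column 1
        System.out.println(isValid("34086 7 90.23"));  // false: column 2 out of range
        System.out.println(isValid("20034 4 103.90")); // false: column 3 out of range
    }
}
```

Returning a boolean keeps the example short; throwing an exception that names the offending column would fit the earlier answers equally well.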

ArrayIndexOutOfBounds

I'm working on a fraction calculator that uses String.split() to split the input into terms. The inputs are separated by spaces (for example, 1/2 / 1/2).
String[] toReturn = new String[6];
result = isInputValid(expression);
toReturn = splitExpression(expression, placeToSplit[0]);
int indexOfUnderscore = toReturn[0].indexOf("_");
result = isInputValid(toReturn[0]);
if(toReturn[5] != null){
getOperator2(toReturn);
}
The error is in the if statement. toReturn[5] is out of bounds because, when two terms or fewer are entered, splitExpression (which uses String.split() to split the input at the spaces) doesn't create toReturn[5], even when I assign a value to toReturn[5]. Is there a way to tell whether an element of an array exists, or to tell how many terms were entered? My program works for 1/2 + 1/2 * 1/2, but I haven't figured out how to tell if toReturn[5] exists.
Correctly:
result = isInputValid(expression);
String[] toReturn = splitExpression(expression, placeToSplit[0]);
int indexOfUnderscore = toReturn[0].indexOf("_");
result = isInputValid(toReturn[0]);
if(toReturn.length>5 && !"".equals(toReturn[5]) ){
getOperator2(toReturn);
}
The toReturn.length>5 part verifies that the array is at least 6 items long. Then you can check whether that element is empty or not.
This is what it should look like.
Remove the first line, String[] toReturn = new String[6];
and update your third line to:
String[] toReturn = splitExpression(expression, placeToSplit[0]);
And check this condition:
if(toReturn.length>5 ){ // use !toReturn[5].isEmpty() to check the empty string
getOperator2(toReturn);
}
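To see why the length check is needed, here is a small sketch showing that split() returns an array sized to the input, not to any array declared earlier. The helper elementOrNull is hypothetical and is not part of the asker's code.

```java
class SplitLengthDemo {
    // Hypothetical helper: returns the element at index, or null when the array is too short.
    static String elementOrNull(String[] parts, int index) {
        return index >= 0 && index < parts.length ? parts[index] : null;
    }

    public static void main(String[] args) {
        String[] twoTerms = "1/2 + 1/2".split(" ");
        String[] threeTerms = "1/2 + 1/2 * 1/2".split(" ");
        System.out.println(twoTerms.length);              // 3
        System.out.println(threeTerms.length);            // 5
        System.out.println(elementOrNull(twoTerms, 3));   // null
        System.out.println(elementOrNull(threeTerms, 3)); // *
    }
}
```

Assigning the result of split() to a variable replaces whatever array the variable held before, so a preallocated new String[6] has no effect on the final length.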

how to split this string[LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,] as my requirement in java

Hello everyone, I got a string from a CSV file like this:
LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,
How do I split this string on commas? I want the array to look like this:
s[0]=LECT-3A,s[1]=instr01,s[2]=Instructor 01,s[3]=teacher,s[4]=instr1#learnet.com,s[5]=,s[6]=,s[7]=,s[8]=male,s[9]=phone,s[10]=,s[11]=
Can anyone please help me split the above string into this array?
Thank you in advance.
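Use the split() function with , as the delimiter to do this. Note that split(",") without a limit argument discards trailing empty strings, so the last two empty fields of your input would be dropped.
E.g.: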
String s = "Hello,this,is,vivek";
String[] arr = s.split(",");
You can use the limit parameter to do this:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Example:
String[] ls_test = "LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,".split(",",12);
int cont = 0;
for (String ls_pieces : ls_test)
System.out.println("s["+(cont++)+"]"+ls_pieces);
output:
s[0]LECT-3A
s[1]instr01
s[2]Instructor 01
s[3]teacher
s[4]instr1#learnet.com
s[5]
s[6]
s[7]
s[8]male
s[9]phone
s[10]
s[11]
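The effect of the limit argument can be checked with a short sketch like the one below. The class and method names are made up for the example. A negative limit is used as an alternative to 12 when the number of columns is not known in advance.

```java
class SplitLimitDemo {
    // Hypothetical helper: number of fields produced for a given limit.
    static int fieldCount(String row, int limit) {
        return row.split(",", limit).length;
    }

    public static void main(String[] args) {
        String row = "LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,";
        System.out.println(fieldCount(row, 0));  // 10: same as split(","), trailing empty strings dropped
        System.out.println(fieldCount(row, 12)); // 12: at most 12 pieces, trailing empties kept
        System.out.println(fieldCount(row, -1)); // 12: no cap on pieces, trailing empties kept
    }
}
```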
You could try something like so:
String str = "LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,";
List<String> words = new ArrayList<String>();
int current = 0;
int previous = 0;
while((current = str.indexOf(",", previous)) != -1)
{
words.add(str.substring(previous, current));
previous = current + 1;
}
words.add(str.substring(previous)); // the field after the last comma, which may be empty
String[] w = words.toArray(new String[words.size()]);
for(String section : w)
{
System.out.println(section);
}
This yields 12 lines; the empty fields print as blank lines:
LECT-3A
instr01
Instructor 01
teacher
instr1#learnet.com



male
phone



Java's split method has leading blank records that I can't suppress

I'm parsing an input file that has multiple keywords preceded by a +. The + is my delimiter in a split, with individual tokens being written to an array. The resulting array includes a blank record in the [0] position.
I suspect that split is taking the "nothing" before the first token and populating project[0], then moving on to subsequent tokens which all show up as correct.
The documentation says that this method has a limit parameter:
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
and I found this post on SO, but the solution proposed, editing out the leading delimiter (I used a substring(1) to create a temp field) yielded the same blank record for me.
Code and output appear below. Any tips would be appreciated.
import java.util.regex.*;
import java.io.*;
import java.nio.file.*;
import java.lang.*;
//
public class eadd
{
public static void main(String args[])
{
String projStrTemp = "";
String projString = "";
String[] project = new String[10];
int contextSOF = 0;
int projStringSOF = 0;
int projStringEOF = 0;
//
String inputLine = "foo foofoo foo foo #bar.com +foofoofoo +foo1 +foo2 +foo3";
contextSOF = inputLine.indexOf("#");
int tempCalc = (inputLine.indexOf("+")) ;
if (tempCalc == -1) {
projStrTemp = "+Uncategorized";
} else {
projStringSOF = inputLine.indexOf("+",contextSOF);
projStrTemp = inputLine.trim().substring(projStringSOF).trim();
}
project = projStrTemp.split("\\+");
//
System.out.println(projStrTemp+"\n"+projString);
for(int j=0;j<project.length;j++) {
System.out.println("Project["+j+"] "+project[j]);
}
}
}
CONSOLE OUTPUT:
+foofoofoo +foo1 +foo2 +foo3
Project[0]
Project[1] foofoofoo
Project[2] foo1
Project[3] foo2
Project[4] foo3
Change:
projStrTemp = inputLine.trim().substring(projStringSOF).trim();
to:
projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();
If you have a leading delimiter, your array will start with a blank element. It might be worthwhile for you to experiment with split() without all the other baggage.
public static void main(String[] args) {
String s = "an+example";
String[] items = s.split("\\+");
for (int i = 0; i < items.length; i++) {
System.out.println(i + " = " + items[i]);
}
}
With String s = "an+example"; it produces:
0 = an
1 = example
Whereas String s = "+an+example"; produces:
0 =
1 = an
2 = example
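If removing the leading delimiter is inconvenient, another option is to split first and then discard empty tokens. The sketch below is an illustration only, with made-up class and method names, and it assumes Java 8 or later for the stream calls.

```java
import java.util.Arrays;

class KeywordSplit {
    // Hypothetical helper: splits on '+', trims each piece, and drops empty pieces.
    static String[] keywords(String s) {
        return Arrays.stream(s.split("\\+"))
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] project = keywords("+foofoofoo +foo1 +foo2 +foo3");
        System.out.println(Arrays.toString(project)); // [foofoofoo, foo1, foo2, foo3]
    }
}
```

This also drops empty tokens in the middle of the string (for example from "a++b"), which may or may not be what you want.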
One simple solution would be to remove the first + from the string. This way, it won't split before the first keyword:
projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();
Edit: Personally, I'd go for a more robust solution using regular expressions. This finds all keywords preceded by +. It also requires that + is preceded by either a space or it's at the start of the line so that words like 3+4 aren't matched.
String inputLine = "+foo 3+4 foofoo foo foo #bar.com +foofoofoo +foo1 +foo2 +foo3";
Pattern re = Pattern.compile("(\\s|^)\\+(\\w+)");
Matcher m = re.matcher(inputLine);
while (m.find()) {
System.out.println(m.group(2));
}
For the string +foofoofoo +foo1 +foo2 +foo3, the split method divides the string around each match of +, so the resulting array has 5 elements and the first one is empty. If you want the data that precedes the first +, take it from inputLine rather than from the processed projStrTemp, which is the substring starting at (and including) the first +.
