Scanner through a line with whitespace and comma - java

I am new to Java and looking for some help with Java's Scanner class. Below is the problem.
I have a text file with multiple lines, each containing several pairs of digits. Each pair is written as digit,digit, for example 3,3 6,4 7,9. The pairs are separated from each other by whitespace. Below is an example from the text file.
1 2,3 3,2 4,5
2 1,3 4,2 6,13
3 1,2 4,2 5,5
What I want is to retrieve each digit separately so that I can build an array of linked lists out of them. Below is what I have achieved so far.
Scanner sc = new Scanner(new File("a.txt"));
Scanner lineSc;
String line;
Integer vertix = 0;
Integer length = 0;
sc.useDelimiter("\\n"); // For line feeds
while (sc.hasNextLine()) {
line = sc.nextLine();
lineSc = new Scanner(line);
lineSc.useDelimiter("\\s"); // For Whitespace
// What should i do here. How should i scan through considering the whitespace and comma
}
Thanks

Consider using a regular expression, and data that doesn't conform to your expectation will be easily identified and dealt with.
CharSequence inputStr = "2 1,3 4,2 6,13";
String patternStr = "(\\d+),(\\d+)"; // one pair: digits, a comma, digits
// Compile and use regular expression
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
while (matcher.find()) {
    // group(0) is the whole pair, e.g. "6,13"; groups 1 and 2 are its two numbers
    int first = Integer.parseInt(matcher.group(1));
    int second = Integer.parseInt(matcher.group(2));
}
Group one and group two correspond to the first and second number in each pair, respectively. The leading number on each line has no comma, so this pattern does not match it; read it separately.

1. Use the nextLine() method of Scanner to read each entire line of text from the file.
2. Then use the BreakIterator class, via its static method getCharacterInstance(), to step through the line one character at a time. It only reports character boundaries, so you still have to skip the commas and spaces yourself, and a multi-digit number such as 13 arrives as two separate characters.
3. BreakIterator also provides instances for splitting text into sentences, words, etc.
For more details see this:
http://docs.oracle.com/javase/6/docs/api/java/text/BreakIterator.html

Use the StringTokenizer class. http://docs.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
//this is in the while loop
//read each line
String line=sc.nextLine();
//create StringTokenizer, parsing with space and comma
StringTokenizer st1 = new StringTokenizer(line," ,");
Each call to nextToken() then returns one number as a string. To read every number in the line:
while(st1.hasMoreTokens())
{
String temp=st1.nextToken();
//now if you want it as an integer
int digit=Integer.parseInt(temp);
//now you have the digit! insert it into the linkedlist or wherever you want
}
Hope this helps!

Use split(regex), which is simpler:
while (sc.hasNextLine()) {
    // Split on runs of spaces and commas
    final String[] tokens = sc.nextLine().trim().split("[ ,]+");
    for (String token : tokens) {
        if (token.isEmpty()) continue; // blank line
        int num = Integer.parseInt(token);
        // Do your job
    }
}
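Putting the answers above together, here is a minimal sketch of building the linked lists the question asks for. It assumes the first number on each line is a vertex label and every following pair is "neighbour,length"; the class name AdjacencyParser and the Edge type are illustrative, not part of the original question.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;

final class AdjacencyParser {
    /** One "neighbour,length" pair; the field names are assumptions. */
    static final class Edge {
        final int target;
        final int length;

        Edge(int target, int length) {
            this.target = target;
            this.length = length;
        }
    }

    /** Parses one line such as "2 1,3 4,2 6,13" into its list of pairs. */
    static LinkedList<Edge> parseLine(String line) {
        LinkedList<Edge> edges = new LinkedList<>();
        try (Scanner sc = new Scanner(line)) {
            // Whitespace and commas both separate numbers.
            sc.useDelimiter("[\\s,]+");
            if (!sc.hasNextInt()) {
                return edges; // blank line
            }
            sc.nextInt(); // leading vertex number, not stored in this sketch
            while (sc.hasNextInt()) {
                int target = sc.nextInt();
                int length = sc.nextInt(); // throws if a pair is incomplete
                edges.add(new Edge(target, length));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        String text = "1 2,3 3,2 4,5\n2 1,3 4,2 6,13\n3 1,2 4,2 5,5\n";
        List<LinkedList<Edge>> graph = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                graph.add(parseLine(in.nextLine()));
            }
        }
        for (int i = 0; i < graph.size(); i++) {
            System.out.println("line " + (i + 1) + ": " + graph.get(i).size() + " pairs");
        }
    }
}
```

To read from the file instead, replace new Scanner(text) with new Scanner(new File("a.txt")) and handle FileNotFoundException.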

Related

How to skip certain input from a text file

I am trying to take in a file that looks like the following (but with hundreds of more lines):
123 000 words with spaces 123 123 123 words with spaces
123 000 and again words here 123 123 123 and words again
The 123, 000, "words with spaces" stuff are different each line. I am just trying to show it as a placeholder for what I need.
If I only need to get the 123's of each row, how can I ignore the other stuff in there?
Below is what I have tried:
File file = new File("txt file here");
try (Scanner in = new Scanner(file))
{
int count = 0;
while (in.hasNext())
{
int a = in.nextInt();
String trash1 = in.next();
String trash2 = in.next();
String trash3 = in.next();
int b = in.nextInt();
int c = in.nextInt();
int d = in.nextInt();
//This continues but I realize this will eventually throw an
//exception at some points in the text file because
//some rows will have more "words with spaces" than others
}
}
catch (FileNotFoundException fnf)
{
System.out.println(fnf.getMessage());
}
Is there a way to skip the "000's" and the "words with spaces" stuff that way I only take in the "123's"? Or am I just approaching this in a "bad" way. Thanks!
You can use regular expressions to strip the first part of the line.
// replaceFirst takes a regex; String.replace would treat the pattern as literal text
String cleaned = in.nextLine().replaceFirst("^(\\d+\\s+)+([a-zA-Z]+\\s+)+", "");
^ means the pattern starts at the beginning of the text (the start of the line)
(\\d+\\s+)+ matches one or more groups of digits followed by whitespace.
([a-zA-Z]+\\s+)+ matches one or more groups of alphabetic characters followed by whitespace.
You may have to modify the pattern if there's punctuation or other characters. You can read more about regular expressions here if you're new to using them.
Read the file line by line, split each line on spaces, and iterate over the resulting array of strings, acting only on the strings that match what you want:
int countsOf123s = 0;
while (in.hasNextLine())
{
String[] words = in.nextLine().split(" "); //or for any whitespace do \\s+
for(String singleWord : words)
{
if(singleWord.equals("123"))
{
//do something
countsOf123s++;
}
}
}
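Another way to cope with a varying number of words per row is to let Scanner decide token by token whether the next item is an integer. The sketch below keeps every integer token and discards everything else; the class and method names are illustrative. Note that a token such as 000 is also an integer, so the caller still has to pick the positions it cares about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class IntTokens {
    /** Returns the integer tokens of one line, in order, skipping all other tokens. */
    static List<Integer> intsIn(String line) {
        List<Integer> result = new ArrayList<>();
        try (Scanner sc = new Scanner(line)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    result.add(sc.nextInt());
                } else {
                    sc.next(); // not an integer: consume and ignore it
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "123 000 words with spaces 123 123 123 words with spaces";
        System.out.println(intsIn(line));
    }
}
```

Because each line is scanned on its own, a row with more or fewer words never shifts the tokens of the next row.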

Scanning 2 Different Data Types Java

I have a data file that is a list of names followed by "*****" and then continues with integers. How do I scan the names and then break with the asterisks, followed by scanning the integers?
This question might help : Splitting up data file in Java Scanner
Use the Scanner.useDelimiter() method with "*****" as the delimiter. The argument is a regular expression, so the asterisks have to be escaped, like this for example:
sc.useDelimiter("\\*{5}");
Alternatively:
Read the whole file into one string.
Split the string using String.split("\\*{5}"); split also takes a regular expression, so the asterisks are escaped here too.
The resulting String array will contain the names at index 0 and the integers at index 1.
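The code below should work for you:
// inputStr is the text to scan; whitespace and runs of asterisks both separate tokens
Scanner scanner = new Scanner(inputStr).useDelimiter("[\\s*]+");
while (scanner.hasNext()) {
    if (scanner.hasNextInt()) {
        int number = scanner.nextInt(); // For Integer
    } else {
        String name = scanner.next(); // For String
    }
}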
Although this seems tedious, I think it solves the problem without having to worry about whether split returns anything or about going out of bounds.
final String x = "abc****12354";
final Pattern p = Pattern.compile("[A-Z]*[a-z]*\\*{4}");
final Matcher m = p.matcher(x);
while (m.find()) {
System.out.println(m.group());
}
final Pattern p1 = Pattern.compile("\\*{4}[0-9]*");
final Matcher m1 = p1.matcher(x);
while (m1.find()) {
System.out.println(m1.group());
}
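The first pattern match minus its last 4 stars (which can be removed with substring) and the second pattern match minus its leading 4 stars (also removable) give the requested fields. The sample string here uses four stars; for the five-star separator in the question, change {4} to {5}.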
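If the separator sits on a line of its own, a two-phase read avoids regular expressions entirely. This is only a sketch under that assumption (one name per line, then a line containing exactly *****, then whitespace-separated integers); the class name NamesThenNumbers is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class NamesThenNumbers {
    final List<String> names = new ArrayList<>();
    final List<Integer> numbers = new ArrayList<>();

    static NamesThenNumbers read(Scanner in) {
        NamesThenNumbers data = new NamesThenNumbers();
        // Phase 1: every non-blank line before the separator is a name.
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.equals("*****")) {
                break;
            }
            if (!line.isEmpty()) {
                data.names.add(line);
            }
        }
        // Phase 2: the remaining whitespace-separated tokens are integers.
        while (in.hasNextInt()) {
            data.numbers.add(in.nextInt());
        }
        return data;
    }

    public static void main(String[] args) {
        String sample = "Alice\nBob Smith\n*****\n1 2\n3\n";
        NamesThenNumbers data = read(new Scanner(sample));
        System.out.println(data.names + " " + data.numbers);
    }
}
```

Reading names with nextLine() keeps names that contain spaces intact, which a whitespace delimiter would split.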

single string list- alphabetizing

I'm trying to write code that uses a Scanner to read a list of words, all in one string, and then alphabetizes the individual words. What I'm getting instead is just the first word with its letters sorted. How can I fix this?
the code:
else if(answer.equals("new"))
{
System.out.println("Enter words, separated by commas and spaces.");
String input= scanner.next();
char[] words= input.toCharArray();
Arrays.sort(words);
String sorted= new String(words);
System.out.println(sorted);
}
Result: " ,ahy "
You're reading in a String via scanner.next() and then breaking that String up into characters. So, as you said, it's sorting the single-string by characters via input.toCharArray(). What you need to do is read in all of the words and add them to a String []. After all of the words have been added, use Arrays.sort(yourStringArray) to sort them. See comments for answers to your following questions.
You'll need to split your string into words instead of characters. One option is using String.split. Afterwards, you can join those words back into a single string:
System.out.println("Enter words, separated by commas and spaces.");
String input = scanner.nextLine();
String[] words = input.split("[, ]+"); // a run of commas/spaces counts as one separator, so no empty words
Arrays.sort(words);
StringBuilder sb = new StringBuilder();
sb.append(words[0]);
for (int i = 1; i < words.length; i++) {
sb.append(" ");
sb.append(words[i]);
}
String sorted = sb.toString();
System.out.println(sorted);
Note that by default, capital letters are sorted before lowercase. If that's a problem, see this question.
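For the capitalization issue, one option is to pass String.CASE_INSENSITIVE_ORDER to Arrays.sort. The sketch below combines that with the split-and-join steps; the class and method names are illustrative.

```java
import java.util.Arrays;

final class WordSorter {
    /** Splits on commas/whitespace, sorts ignoring case, and joins with single spaces. */
    static String sortWords(String input) {
        String[] words = input.trim().split("[,\\s]+");
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(sortWords("pear, Apple, banana"));
    }
}
```

String.join requires Java 8 or later; on older versions the StringBuilder loop shown earlier does the same job.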

Java regex, delete content to the left of comma

I got a string with a bunch of numbers separated by "," in the following form :
1.2223232323232323,74.00
I want them in a String[], but I only need the number to the right of the comma (74.00). The list has about 10,000 different lines like the one above. Right now I'm using String.split(","), which gives me:
System.out.println(String[1]) =
1.2223232323232323
74.00
Why does it not split into two different indexes? I thought split would give this:
System.out.println(String[1]) = 1.2223232323232323
System.out.println(String[2]) = 74.00
But String[] array = string.split(",") produces one index with both values separated by a newline.
I only need 74.00. I assume I need to use a regex, which is Greek to me. Could someone help me out? :)
If it's in a file:
Scanner sc = new Scanner(new File("..."));
sc.useDelimiter("(\r?\n)?.*?,");
while (sc.hasNext())
System.out.println(sc.next());
If it's all one giant string, separated by new-lines:
String oneGiantString = "1.22,74.00\n1.22,74.00\n1.22,74.00";
Scanner sc = new Scanner(oneGiantString);
sc.useDelimiter("(\r?\n)?.*?,");
while (sc.hasNext())
System.out.println(sc.next());
If it's just a single string for each:
String line = "1.2223232323232323,74.00";
System.out.println(line.replaceFirst(".*?,", ""));
Regex explanation:
(\r?\n)? means an optional new-line character.
. means a wildcard.
.*? means 0 or more wildcards (*? as opposed to just * means non-greedy matching, but this probably doesn't mean much to you).
, means, well, ..., a comma.
Reference.
split for file or single string:
String line = "1.2223232323232323,74.00";
String value = line.split(",")[1];
split for one giant string (also needs regex) (but I'd prefer Scanner, it doesn't need all that memory):
String line = "1.22,74.00\n1.22,74.00\n1.22,74.00";
String[] array = line.split("(\r?\n)?.*?,");
for (int i = 1; i < array.length; i++) // the first element is empty
System.out.println(array[i]);
Just try with:
String[] parts = "1.2223232323232323,74.00".split(",");
String value = parts[1]; // your 74.00
String[] strings = "1.2223232323232323,74.00".split(",");
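The behaviour described in the question is what you would see if split(",") were applied to the whole multi-line text at once: the newline then sits in the middle of an element. A minimal sketch that avoids this, assuming one "left,right" pair per line, splits into lines first and takes everything after the first comma. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class RightOfComma {
    /** Returns the text after the first comma of every line that contains one. */
    static List<String> rightValues(String text) {
        List<String> values = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            int comma = line.indexOf(',');
            if (comma >= 0) {
                values.add(line.substring(comma + 1).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "1.2223232323232323,74.00\n1.22,75.50\n";
        String[] array = rightValues(text).toArray(new String[0]);
        for (String value : array) {
            System.out.println(value);
        }
    }
}
```

Lines without a comma are skipped here; whether that is the right choice depends on the data.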

How can I filter out non letters from a text file using the scanner delimiter including the single quote or apostrophe in Java

I want to keep a count of every word in a file, and this count should not include non-letters such as the apostrophe, comma, full stop, question mark, exclamation mark, etc., i.e. only letters of the alphabet.
I tried to use a delimiter like this, but it didn't include the apostrophe.
Scanner fileScanner = new Scanner("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt");
int totalWordCount = 0;
//Firstly to count all the words in the file without the restricted characters
while (fileScanner.hasNext()) {
fileScanner.useDelimiter(("[.,:;()?!\" \t\n\r]+")).next();
totalWordCount++;
}
System.out.println("There are " + totalWordCount + " word(s)");
//Then later I create an array to store each individual word in the file for counting their lengths.
Scanner fileScanner2 = new Scanner("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt");
String[] words = new String[totalWordCount];
for (int i = 0; i < totalWordCount; ++i) {
words[i] = fileScanner2.useDelimiter(("[.,:;()?!\" \t\n\r]+")).next();
}
This doesn't seem to work !
Please how can I go about this ?
Seems to me that you don't want to filter using anything but spaces and end lines. For example the word "they're" would return as two words if you're using a ' to filter your number of words. Here's how you could change your original code to make it work.
Scanner fileScanner = new Scanner(new File("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt"));
int totalWordCount = 0;
ArrayList<String> words = new ArrayList<String>();
//Firstly to count all the words in the file without the restricted characters
while (fileScanner.hasNext()) {
//Add words to an array list so you only have to go through the scanner once
words.add(fileScanner.next());//This defaults to whitespace
totalWordCount++;
}
System.out.println("There are " + totalWordCount + " word(s)");
fileScanner.close();
Pattern.compile() turns a string into a regular expression. The \s character class is predefined in the Pattern class to match any whitespace character.
There is more information at
Pattern Documentation
Also, make sure to close your Scanner objects when you're done. Leaving the first one open could prevent your second Scanner from opening the file.
Edit
If you want to count the letters per word you can add the following code to the above code
int totalLetters = 0;
int[] lettersPerWord = new int[words.size()];
for (int wordNum = 0; wordNum < words.size(); wordNum++)
{
String word = words.get(wordNum);
word = word.replaceAll("[.,:;()?!\" \t\n\r\']+", "");
lettersPerWord[wordNum] = word.length();
totalLetters += word.length(); // accumulate across all words
}
I have tested this code and it appears to work for me. The replaceAll, according to the JavaDoc uses a regular expression to match so it should match any of those characters and essentially remove it.
Scanner.useDelimiter(String) does treat its argument as a regular expression, so "[.,:;()?!\" \t\n\r]+" is read as a character class rather than as literal text. The apostrophe is not in that class, so it stays inside the words.
Instead of a delimiter, you can also use the regex classes directly.
Pattern and Matcher with the group method may be what you're looking for.
String pattern = "(.*)[.,:;()?!\" \t\n\r]+(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(test); // test is the input string to examine
if (m.find( )) {
System.out.println("Found value: " + m.group(1) );
}
Play with those classes and you will see it is much more similar to what you need
You could try this regex in your delimiter:
fileScanner.useDelimiter("[^a-zA-Z']+").next();
This uses any run of characters that are neither letters nor apostrophes as the delimiter. That way your words will include the apostrophe but not any other non-letter character.
Then you'll have to loop through each word, check for apostrophes, and account for them if you want the length to be accurate. You could just remove each apostrophe so that the length matches the number of letters in the word, or you could create word objects with their own length fields so that you can print the word as is and still know how many letters it contains.
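As a rough sketch of that approach: the delimiter below treats any run of characters other than ASCII letters and apostrophes as a separator, and the letter count ignores apostrophes. The class and method names are illustrative, and only the letters a-z and A-Z are treated as letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class WordLetterCounter {
    /** Reads words made of ASCII letters and apostrophes from the scanner. */
    static List<String> words(Scanner in) {
        in.useDelimiter("[^a-zA-Z']+");
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            String word = in.next();
            if (letterCount(word) > 0) { // skip tokens that are only apostrophes
                words.add(word);
            }
        }
        return words;
    }

    /** Counts the letters in a word, ignoring apostrophes. */
    static int letterCount(String word) {
        return word.replace("'", "").length();
    }

    public static void main(String[] args) {
        List<String> words = words(new Scanner("They're here, aren't they? Yes!"));
        System.out.println("There are " + words.size() + " word(s)");
        for (String word : words) {
            System.out.println(word + ": " + letterCount(word) + " letter(s)");
        }
    }
}
```

To read the file from the question, construct the scanner with new Scanner(new File(path)) rather than new Scanner(path), since the String constructor scans the path text itself.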
