Java: invisible whitespace at the beginning of a String

I want to read a String from a file. The String contains several floating-point numbers separated by whitespace. I then want to split it into several Strings, each containing one number, and parse them to double. numbers.length() shows that my String is one character longer than it actually is in the file, and this character is an invisible whitespace I can't get rid of. The first element of the array has it too, and trim() doesn't help; trimming every element of the array doesn't help either. Because of this whitespace I can't parse the array into numbers: I get an exception. I can only see that it really is a whitespace at the beginning when I split the String numbers, or the first element of the array, into chars. Beginner programmer here.
BufferedReader reader1 = new BufferedReader(new InputStreamReader(System.in));
String file1 = reader1.readLine();
reader1.close();
BufferedReader reader2 = new BufferedReader(new FileReader(file1));
String numbers = reader2.readLine();
reader2.close();
System.out.println(numbers.length());
String[] array = numbers.trim().split("\\s+");

Here is the output I get in IDEA, followed by the full code of the class. The first line of output is numbers.length(). The whitespace, or whatever it is, that comes before '4' when printing array[0] is my problem.
41
4234.234 9
2341.452 8
98234.4 7
2378.34 7
114.32 6
4
2
3
4
.
2
3
4
Process finished with exit code 0
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Solution {
    public static void main(String[] args) throws IOException {
        // read the file name from the console
        BufferedReader reader1 = new BufferedReader(new InputStreamReader(System.in));
        String file1 = reader1.readLine();
        reader1.close();
        // read the first line of the file
        BufferedReader reader2 = new BufferedReader(new FileReader(file1));
        String numbers = reader2.readLine();
        reader2.close();
        System.out.println(numbers.length());
        String[] array = numbers.trim().split("\\s+");
        // print each element together with its length
        for (int i = 0; i < array.length; i++) {
            System.out.println(array[i] + " " + array[i].length());
        }
        // print the first element one char per line
        char[] ch = array[0].toCharArray();
        for (char c : ch) {
            System.out.println(c);
        }
    }
}
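One likely cause, assuming the file was saved as UTF-8 with a byte order mark (BOM), as some Windows editors do, is the character U+FEFF at the very start of the first line. String.trim() only removes characters up to U+0020, and the regex \s does not match U+FEFF either, so the character survives both and Double.parseDouble then rejects the first element. A minimal sketch that strips it explicitly; the names BomDemo, stripBom and parseNumbers are illustrative, and a StringReader stands in for the file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class BomDemo {
    // Removes a leading byte order mark (U+FEFF), if present.
    static String stripBom(String s) {
        return (s != null && s.startsWith("\uFEFF")) ? s.substring(1) : s;
    }

    // Strips the BOM, then splits on whitespace and parses each part as a double.
    static double[] parseNumbers(String line) {
        String[] parts = stripBom(line).trim().split("\\s+");
        double[] result = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Double.parseDouble(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a file whose first line starts with a BOM.
        BufferedReader reader = new BufferedReader(
                new StringReader("\uFEFF4234.234 2341.452 98234.4"));
        String numbers = reader.readLine();
        reader.close();
        System.out.println(numbers.length());        // 26: the BOM counts as one char
        System.out.println(numbers.trim().length()); // still 26: trim() keeps U+FEFF
        double[] values = parseNumbers(numbers);
        System.out.println(values[0]);               // 4234.234
    }
}
```

When reading the real file, it is also safer to name the charset explicitly, for example new InputStreamReader(new FileInputStream(file1), StandardCharsets.UTF_8), rather than relying on FileReader's platform default; with a non-UTF-8 default the BOM would show up as several odd characters instead of one.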

Related

Reading a File without line breaks using Buffered reader

I am reading a file with comma-separated values which, when split into an array, will have 10 values for each line. I expected the file to have line breaks so that
line = bReader.readLine()
will give me each line. But my file doesn't have line breaks. Instead, after the first set of values there are lots of spaces (465, to be precise) and then the next line begins.
So my readLine() code above reads the entire file in one go, as there are no line breaks. Please suggest how best to tackle this scenario efficiently.
One way is to replace each run of 465 spaces in your text with a newline character "\n" before iterating over it.
I second Ninan's answer: replace the 465 spaces with a newline, then run the function you were planning on running earlier.
For aesthetics and readability, I would suggest using a regex Pattern to replace the spaces instead of a long, unreadable String.replace(" ").
Your code could look like the following, but replace 6 with 465:
import java.util.regex.Pattern;

public class Main { // class name is arbitrary
    public static void main(String[] args) {
        // two "lines" separated by a run of 6 spaces instead of a line break
        String content = "DOG,CAT      MOUSE,CHEESE";
        // matches exactly 6 consecutive spaces
        Pattern p = Pattern.compile("[ ]{6}");
        String newString = p.matcher(content).replaceAll("\n");
        System.out.println(newString);
    }
}
My suggestion is to read the file f1.txt and write it to another file, f2.txt, removing all empty lines and leading and trailing spaces, and then read f2.txt. Something like:
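// requires java.io.BufferedReader, java.io.BufferedWriter, java.io.FileReader,
// java.io.FileWriter; run inside a method that throws IOException
try (BufferedReader br = new BufferedReader(new FileReader("f1.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("f2.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        line = line.trim(); // remove leading and trailing whitespace
        if (!line.isEmpty()) { // don't write out blank lines
            bw.write(line);
            bw.newLine(); // keep each non-blank line on its own line
        }
    }
}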
Then try using your code.
You might create your own subclass of a FilterInputStream or a PushbackInputStream and pass that to an InputStreamReader. One overrides int read().
Such a class unfortunately needs a bit of typing (a nice exercise, so to speak).
private static final int NO_CHAR = -2;
private boolean fromCache;
private int cachedSpaces;
private int cachedNonSpaceChar = NO_CHAR;
public int read() throws IOException {
if (fromCache) {
if (cachedSpaces > 0) ...
if (cachedNonSpaceChar != NO_CHAR) ...
...
}
int ch = super.read();
if (ch != -1) {
...
}
return ch;
}
The idea is to cache spaces until a non-space character arrives. read() then either takes a character from the cache, returns \n instead of the cached spaces, calls super.read() when not reading from the cache, or reads recursively when it sees a space.
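A runnable version of that idea might look like the sketch below. It swaps in java.io.FilterReader for FilterInputStream, because the filtering works on characters rather than bytes, and it drains the cache directly instead of keeping a fromCache flag. The class name SpaceRunToNewlineReader and the threshold parameter (465 in the question) are hypothetical, and the demo uses String.repeat, which needs Java 11 or later:

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical filter: turns any run of at least `threshold` spaces into '\n'.
public class SpaceRunToNewlineReader extends FilterReader {
    private static final int NO_CHAR = -2;
    private final int threshold;
    private int cachedSpaces = 0;              // spaces of a short run not yet returned
    private int cachedNonSpaceChar = NO_CHAR;  // char that ended the run (may be -1)

    public SpaceRunToNewlineReader(Reader in, int threshold) {
        super(in);
        this.threshold = threshold;
    }

    @Override
    public int read() throws IOException {
        if (cachedSpaces > 0) {                // first drain a short run of spaces
            cachedSpaces--;
            return ' ';
        }
        if (cachedNonSpaceChar != NO_CHAR) {   // then the char that ended the run
            int c = cachedNonSpaceChar;
            cachedNonSpaceChar = NO_CHAR;
            return c;
        }
        int c = super.read();
        if (c != ' ') {
            return c;                          // ordinary char or end of stream
        }
        int run = 1;                           // count the whole run of spaces
        while ((c = super.read()) == ' ') {
            run++;
        }
        cachedNonSpaceChar = c;
        if (run >= threshold) {
            return '\n';                       // a long run becomes a line break
        }
        cachedSpaces = run - 1;                // a short run passes through unchanged
        return ' ';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Simple bulk read on top of read(); BufferedReader calls this method.
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                return n == 0 ? -1 : n;
            }
            buf[off + n++] = (char) c;
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false;                          // the cache makes mark/reset unsafe
    }

    public static void main(String[] args) throws IOException {
        String flat = "1,2,3" + " ".repeat(465) + "4,5,6";
        try (BufferedReader br = new BufferedReader(
                new SpaceRunToNewlineReader(new StringReader(flat), 465))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);      // prints "1,2,3" then "4,5,6"
            }
        }
    }
}
```

Wrapped in a BufferedReader this way, the original readLine() loop can stay unchanged.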
My understanding is that you have a flat CSV file without proper line breaks, which is supposed to have 10 values on each line.
Updated:
1. (Recommended) You can use the Scanner class with useDelimiter to parse the CSV efficiently, assuming you want to store 10 values per "line":
// requires java.io.File, java.io.IOException, java.util.Scanner
public static void parseCsvWithScanner() throws IOException {
    Scanner scanner = new Scanner(new File("test.csv"));
    // set your delimiter for scanner, "," for csv
    scanner.useDelimiter(",");
    // storing 10 values as a "line"
    final int LINE_LIMIT = 10;
    // implement your own data structure to store each value of CSV
    int[] tempLineArray = new int[LINE_LIMIT];
    int lineBreakCount = 0;
    while (scanner.hasNext()) {
        // trim start and end spaces if there are any
        String temp = scanner.next().trim();
        if (temp.isEmpty()) {
            continue; // e.g. a trailing comma at the end of the file
        }
        tempLineArray[lineBreakCount++] = Integer.parseInt(temp);
        if (lineBreakCount == LINE_LIMIT) {
            // replace with your own logic for handling the full array
            for (int i = 0; i < tempLineArray.length; i++) {
                System.out.print(tempLineArray[i] + " ");
            }
            System.out.println();
            // end replace
            // resetting array and counter
            tempLineArray = new int[LINE_LIMIT];
            lineBreakCount = 0;
        }
    }
    scanner.close();
}
2. Or use a BufferedReader.
If memory is an issue, you may not need the ArrayList that stores all values; replace it with your own logic for handling each "line".
// requires java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.ArrayList, java.util.List
// Assumes every value is followed by TOKEN, except possibly the very last one.
public static List<int[]> parseCsv(String file) throws IOException {
    // your delimiter
    final char TOKEN = ',';
    // your requirement of storing 10 values for each "line"
    final int LINE_LIMIT = 10;
    // a counter for values in the current "line"
    int lineBreakCount = 0;
    // array for storing 10 values, assuming the values of CSV are integers
    int[] tempArray = new int[LINE_LIMIT];
    // storing tempArray of each line to a list
    List<int[]> lineList = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        // tmp for storing from BufferedReader.read()
        int tmp;
        while ((tmp = br.read()) != -1) {
            if ((char) tmp == TOKEN) {
                // storing current value from buffer with trim of spaces
                tempArray[lineBreakCount++] = Integer.parseInt(sb.toString().trim());
                // clear the buffer
                sb.setLength(0);
                if (lineBreakCount == LINE_LIMIT) {
                    // your logic to handle the current "line" here
                    lineList.add(tempArray);
                    // new "line"
                    tempArray = new int[LINE_LIMIT];
                    lineBreakCount = 0;
                }
            } else {
                // add current char from BufferedReader if not delimiter
                sb.append((char) tmp);
            }
        }
    }
    // the last value has no delimiter after it, so store it here
    String last = sb.toString().trim();
    if (!last.isEmpty()) {
        tempArray[lineBreakCount++] = Integer.parseInt(last);
    }
    if (lineBreakCount > 0) {
        lineList.add(tempArray); // a final, possibly incomplete "line"
    }
    return lineList;
}
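Another option is to read the whole file into one String (for example with Files.readString, Java 11+) and let a single regex split on both kinds of separator. This sketch assumes integer values and treats either a comma (with optional whitespace around it) or any run of 2 or more whitespace characters as a separator; FlatCsvSplitter and toRows are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlatCsvSplitter {
    // Groups the values of a flat CSV text into rows of `perRow` integers.
    // Assumption: a value ends at a comma (optional whitespace around it)
    // or at a run of 2+ whitespace characters, such as the 465-space gap.
    static List<int[]> toRows(String content, int perRow) {
        String[] values = content.trim().split("\\s*,\\s*|\\s{2,}");
        List<int[]> rows = new ArrayList<>();
        for (int start = 0; start + perRow <= values.length; start += perRow) {
            int[] row = new int[perRow];
            for (int i = 0; i < perRow; i++) {
                row[i] = Integer.parseInt(values[start + i]);
            }
            rows.add(row);
        }
        return rows; // an incomplete trailing row is ignored
    }

    public static void main(String[] args) {
        String flat = "1,2,3" + " ".repeat(465) + "4,5,6";
        for (int[] row : toRows(flat, 3)) {
            System.out.println(Arrays.toString(row)); // [1, 2, 3] then [4, 5, 6]
        }
    }
}
```

Reading the whole file at once is simple but keeps everything in memory; for very large files a streaming approach such as the filter reader or the BufferedReader loop above is the safer choice.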

Java how to read a line from a text file that has multiple strings and double values?

I want to create a program that reads from a text file with three different parts and then outputs the name. E.g. text file:
vanilla 12 24
chocolate 23 20
chocolate chip 12 12
However, there is an issue on the third line, because the flavor name contains a space. So far my code works for the first two lines, but it throws an InputMismatchException on the third one. How do I make it read both words of the name from one line and then output them? My relevant code:
while (in.hasNext())
{
iceCreamFlavor = in.next();
iceCreamRadius = in.nextDouble();
iceCreamHeight = in.nextDouble();
out.println("Ice Cream: " + iceCreamFlavor);
}
In your input file, is the separator between fields made of multiple spaces?
If so, you can simply use the split method of the String object.
Read a line.
Split it to obtain a String array:
String[] splitString = myString.split("\\s{2,}");
The first element (index 0) is the String; the other two can be parsed as double.
This could look like:
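// requires java.io.BufferedReader, java.io.FileReader, java.io.IOException
try (BufferedReader br = new BufferedReader(new FileReader("path/to/the/file.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // split on runs of 2 or more whitespace chars, so "chocolate chip" stays together
        String[] lineSplitted = line.trim().split("\\s{2,}");
        String label = lineSplitted[0];
        double d1 = Double.parseDouble(lineSplitted[1]);
        double d2 = Double.parseDouble(lineSplitted[2]);
        System.out.println("Ice Cream: " + label);
    }
} catch (IOException e) {
    e.printStackTrace();
}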
Alternatively, you can use scanner.useDelimiter to change the delimiter, or use a regular expression to parse the line.
// sets the delimiter to 2 or more consecutive whitespace chars, or a line break
Scanner s = new Scanner(input).useDelimiter("\\s{2,}|\\R");
See the Scanner Javadoc for examples.
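If the fields are separated by single spaces, as in the sample file, neither a two-space separator nor Scanner.next() can tell the name apart from the numbers. A sketch that instead assumes each line ends with exactly two numbers and treats everything before them as the (possibly multi-word) flavor name; FlavorLineParser, Flavor and parse are illustrative names:

```java
import java.util.Arrays;

public class FlavorLineParser {
    // One parsed line: flavor name, radius and height.
    static final class Flavor {
        final String name;
        final double radius;
        final double height;

        Flavor(String name, double radius, double height) {
            this.name = name;
            this.radius = radius;
            this.height = height;
        }
    }

    // The last two tokens are the numbers; all tokens before them form the name.
    static Flavor parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3) {
            throw new IllegalArgumentException("Expected: name radius height");
        }
        int n = tokens.length;
        String name = String.join(" ", Arrays.copyOfRange(tokens, 0, n - 2));
        double radius = Double.parseDouble(tokens[n - 2]);
        double height = Double.parseDouble(tokens[n - 1]);
        return new Flavor(name, radius, height);
    }

    public static void main(String[] args) {
        String[] lines = {"vanilla 12 24", "chocolate 23 20", "chocolate chip 12 12"};
        for (String line : lines) {
            System.out.println("Ice Cream: " + parse(line).name);
        }
    }
}
```

In the question's loop, the lines would come from in.nextLine() (or a BufferedReader) instead of the hard-coded array.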

Result of Java split() varies when working with a string of numbers

Why does Java String.split() generate different results when working with string defined in code versus string read from a file when numbers are involved? Specifically I have a file called "test.txt" that contains chars and numbers separated by spaces:
G H 5 4
The split method does not split on spaces as expected. But if a String variable is created in code with the same chars and numbers separated by spaces, split() returns four individual strings, one for each char and number. The code below demonstrates this difference:
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;
public class SplitNumber {
//Read first line of text file
public static void main(String[] args) {
try {
File file = new File("test.txt");
FileReader fr = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fr);
String firstLine;
if ((firstLine = bufferedReader.readLine()) != null) {
String[] firstLineNumbers = firstLine.split("\\s+");
System.out.println("First line array length: " + firstLineNumbers.length);
for (int i=0; i<firstLineNumbers.length; i++) {
System.out.println(firstLineNumbers[i]);
}
}
bufferedReader.close();
String numberString = "G H 5 4";
String[] numbers = numberString.split("\\s+");
System.out.println("Numbers array length: " + numbers.length);
for (int i=0; i<numbers.length; i++) {
System.out.println(numbers[i]);
}
} catch(Exception exception) {
System.out.println("IOException occurred");
exception.printStackTrace();
}
}
}
The result is:
First line array length: 3
G
H
5 4
Numbers array length: 4
G
H
5
4
Why do the numbers from the file not get parsed the same as the same string defined within code?
Based on feedback, I changed the regex to split("[\\s\\h]+"), which resolved the issue: the numbers from the file were split properly, which clearly indicated that the text file I was using contained a different whitespace-like character. I then replaced the contents of the file (using Notepad), reverted to split("\\s+"), and found that it worked correctly this time. So at some point I must have introduced different whitespace-like characters into the file (maybe through copy/paste). In the end, the takeaway is that I should use split("[\\s\\h]+") when reading from a file that I want to split on spaces, as it covers more scenarios that may not be immediately obvious.
Thanks to all for helping me find the root cause of my issue.
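The question does not say which character was in the file. Assuming it was a non-breaking space (U+00A0), which copy/paste from web pages or word processors often introduces, the difference between \s and \h can be reproduced in a few lines; NbspSplitDemo is an illustrative name:

```java
public class NbspSplitDemo {
    public static void main(String[] args) {
        // "5" and "4" separated by a non-breaking space (U+00A0) instead of ' '
        String line = "G H 5\u00A04";
        System.out.println(line.split("\\s+").length);      // 3: \s does not match U+00A0
        System.out.println(line.split("[\\s\\h]+").length); // 4: \h does
    }
}
```

By default \s matches only [ \t\n\x0B\f\r], while \h covers horizontal whitespace including U+00A0. Compiling the pattern with Pattern.UNICODE_CHARACTER_CLASS (or prefixing it with (?U)) is another way to make \s Unicode-aware.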

Conflicting character counts

I'm trying to find the number of characters in a given text file.
I've tried using both a scanner and a BufferedReader, but I get conflicting results. With the use of a scanner I concatenate every line after I append a new line character. E.g. like this:
FileReader reader = new FileReader("sampleFile.txt");
Scanner lineScanner = new Scanner(reader);
String totalLines = "";
while (lineScanner.hasNextLine()){
String line = lineScanner.nextLine()+'\n';
totalLines += line;
}
System.out.println("Count "+totalLines.length());
This returns the true character count for my file, which is 5799.
Whereas when I use:
BufferedReader reader = new BufferedReader(new FileReader("sample.txt"));
int i;
int count = 0;
while ((i = reader.read()) != -1) {
count++;
}
System.out.println("Count "+count);
I get 5892.
I know the lineScanner approach will be off by one if there is only one line, but for my text file I get the correct output.
Also in notepad++ the file length in bytes is 5892 but the character count without blanks is 5706.
Your file may have lines terminated with \r\n rather than \n. That could cause your discrepancy.
You have to consider the newline and carriage-return characters in a text file; each of these also counts as a character.
I would suggest using the BufferedReader, because read() counts every character in the file, including the \r characters that Scanner.nextLine() drops.
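The \r\n explanation can be checked with a small sketch: when each line ends with \r\n, Scanner.nextLine() drops both characters and the loop adds back only '\n', so the two counts differ by exactly the number of lines. In the question, 5892 - 5799 = 93, which would be consistent with a 93-line file using Windows line endings (an inference, not something the question confirms). CharCountDemo and its methods are illustrative names, and a String stands in for the file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Scanner;

public class CharCountDemo {
    // Mirrors the question's Scanner approach: each line plus a single '\n'.
    static int countWithScanner(String text) {
        StringBuilder sb = new StringBuilder();
        try (Scanner s = new Scanner(text)) {
            while (s.hasNextLine()) {
                sb.append(s.nextLine()).append('\n');
            }
        }
        return sb.length();
    }

    // Mirrors the BufferedReader approach: counts every char, '\r' included.
    static int countWithRead(String text) throws IOException {
        int count = 0;
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            while (r.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String windowsText = "ab\r\ncd\r\n"; // two lines with \r\n endings
        System.out.println(countWithScanner(windowsText)); // 6
        System.out.println(countWithRead(windowsText));    // 8
    }
}
```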

Remove spaces only certain places not everywhere

I am reading text from a txt file and trying to store it in an ArrayList. In the txt file, the <b> lines, which mark ballots, are not kept together: the teacher put the first ballot on one line, as in the example below, but spread the rest over separate lines. We have to put each of the remaining ballots back together on one line, and I am not able to do that. I am using a FileInputStream to read the text from the file.
The text looks like this:
person 1
person 2
person 3
<b> 1 2 3
<b>
1
3
2
I want it to look like this:
person 1
person 2
person 3
<b> 1 2 3
<b> 1 3 2
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
System.out.print("Enter file name: ");
Scanner keyboard = new Scanner(System.in);
String fileName = keyboard.next();
File file = new File(fileName);
ArrayList<String> ballot;
ballot = new ArrayList<String>();
FileInputStream fstream = new FileInputStream(fileName);
DataInputStream ds = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(ds));
// Pattern p;
// Matcher m;
String strLine;
String inputText = "";
String newline = System.getProperty("line.separator");
while ((strLine = br.readLine()) != null) {
ballot.add(strLine);
}
You need to use a StringBuilder sb = new StringBuilder();. Each time you have a token String t, call sb.append(t.trim()).append(' ');. When done with parsing your file, call sb.toString();. If you want to add newlines here and there, use sb.append('\n');.
If you want an array of your tokens, you should use an ArrayList<String> al = new ArrayList<String>(); Each time you have a set of tokens s making up a line, call al.add(s); When done, call String[] result = al.toArray(new String[al.size()]); You will need to concatenate your set of tokens into s for each line.
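A minimal sketch of the joining step, assuming each ballot starts with a line beginning with "<b>" and that every non-blank line after it, up to the next "<b>", holds that ballot's numbers. BallotJoiner and joinBallots are hypothetical names; in the question, the input lines would be the ones collected by the readLine() loop:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BallotJoiner {
    // Joins each "<b>" marker with the numbers that follow it, whether they
    // are on the same line or spread over the following lines.
    static List<String> joinBallots(List<String> lines) {
        List<String> result = new ArrayList<>();
        StringBuilder ballot = null; // ballot currently being collected
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("<b>")) {
                if (ballot != null) {
                    result.add(ballot.toString()); // finish the previous ballot
                }
                ballot = new StringBuilder(line);
            } else if (ballot != null) {
                ballot.append(' ').append(line);   // a vote belonging to this ballot
            } else {
                result.add(line);                  // lines before the first ballot
            }
        }
        if (ballot != null) {
            result.add(ballot.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("person 1", "person 2", "person 3",
                "<b> 1 2 3", "<b>", "1", "3", "2");
        for (String s : joinBallots(input)) {
            System.out.println(s);
        }
    }
}
```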
