BufferedReader gives missing characters - java

So I am trying to change the format of a text file that has line numbers every couple of lines just to make it cleaner and easier to read. I made a simple program that goes in and replaces all of the first three characters of a line with spaces, these three character spaces are where the numbers can be. The actual text doesn't start until a few more spaces in. When i do this and have the end result printed out it comes out with a diamond with a question mark in it and I'm assuming that this is the result of missing characters. It seems like most of the missing characters are the apostrophe symbol. If anyone could let me know how to fix it i would really appreciate it :)
public class Conversion {
public static void main(String args[]) throws IOException {
BufferedReader scan = null;
try {
scan = new BufferedReader(new FileReader(new File("C:\\Users\\Nasir\\Desktop\\Beowulftesting.txt")));
} catch (FileNotFoundException e) {
System.out.println("failed to read file");
}
String finalVersion = "";
String currLine;
while( (currLine = scan.readLine()) !=null){
if(currLine.length()>3)
currLine = " "+ currLine.substring(3);
finalVersion+=currLine+"\n";
}
scan.close();
System.out.println(finalVersion);
}
}

Instead of using FileReader, use an InputStreamReader with the correct text encoding. I think the strange characters are appearing because you're reading the file with the wrong encoding.
By the way, don't use += with strings in a loop, like you have. Instead, use a StringBuilder:
StringBuilder finalVersion = new StringBuilder();
String currLine;
while ((currLine = scan.readLine()) != null) {
if (currLine.length() > 3) {
finalVersion.append(" ").append(currLine.substring(3));
} else {
finalVersion.append(currLine);
}
finalVersion.append('\n');
}

Related

Why am I getting a StringIndexOutOfBoundsException when I try to skip multiple lines with BufferedReader?

I am working on a game, and I want to use this text file of mythological names to procedurally generate galaxy solar-system names.
When I read the text file, I tell the while-loop I'm using to continue if there is something that's not a name on a given line. That seems to throw an exception in some (not all) areas where there are multiple lines without names.
How can I make the program work without throwing exceptions or reading lines without names on them?
My Code:
public class Rewrite {
public static void main(String[] args) {
loadFromFile();
}
private static void loadFromFile() {
String[] names = new String[1000];
try {
FileReader fr = new FileReader("src/res/names/Galaxy_System_Names.txt");
BufferedReader br = new BufferedReader(fr);
String aLine;
int countIndex = 0;
while ((aLine = br.readLine()) != null) {
// skip lines without names
if (aLine.equals(String.valueOf(System.lineSeparator()))) {
aLine = br.readLine();
continue;
} else if (aLine.equals("&")) {
aLine = br.readLine();
continue;
} else if (aLine.startsWith("(")) {
aLine = br.readLine();
continue;
}
System.out.println(aLine);
// capitalize first letter of the line
String firstLetter = String.valueOf(aLine.charAt(0));
aLine = firstLetter + aLine.substring(1);
names[countIndex++] = aLine;
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The Exception Thrown:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
at java.base/java.lang.String.charAt(String.java:702)
at utilities.Rewrite.loadHumanNamesFromFile(Rewrite.java:39)
at utilities.Rewrite.main(Rewrite.java:10)
Text-File sample: This throws an error after the name "amor"
áed
áedán
aegle
aella
aeneas
aeolus
aeron
(2)
&
aeson
agamemnon
agaue
aglaea
aglaia
agni
(1)
agrona
ahriman
ahti
ahura
mazda
aias
aigle
ailill
aineias
aino
aiolos
ajax
akantha
alberic
alberich
alcides
alcippe
alcmene
alcyone
alecto
alekto
alexander
alexandra
alexandros
alf
(1)
alfr
alkeides
alkippe
alkmene
alkyone
althea
alvis
alvíss
amalthea
amaterasu
amen
ameretat
amirani
ammon
amon
amon-ra
amor
&
amordad
amulius
amun
From the docs of the BufferedReader::readLine:
Returns: A String containing the contents of the line, not including any line-termination characters
Thus when you get to this part of the file:
amor
&
It will read the blank line and strip the linebreak character, and all that will be left is an empty String. Therefore it will not be caught by your if statement:
if (aLine.equals(String.valueOf(System.lineSeparator())))
You need to add in a check for isEmpty()
After amor is an empty line. You're trying to get the char at index 0 of an empty line. Since it's an empty line, it obviously has no chars, and as such there's no char at index 0

What am I missing? NumberFormatException error

I want to read from a txt file which contains just numbers. Such file is in UTF-8, and the numbers are separated only by new lines (no spaces or any other things) just that. Whenever i call Integer.valueOf(myString), i get the exception.
This exception is really strange, because if i create a predefined string, such as "56\n", and use .trim(), it works perfectly. But in my code, not only that is not the case, but the exception texts says that what it couldn't convert was "54856". I have tried to introduce a new line there, and then the error text says it couldn't convert "54856
"
With that out of the question, what am I missing?
File ficheroEntrada = new File("C:\\in.txt");
FileReader entrada =new FileReader(ficheroEntrada);
BufferedReader input = new BufferedReader(entrada);
String s = input.readLine();
System.out.println(s);
Integer in;
in = Integer.valueOf(s.trim());
System.out.println(in);
The exception text reads as follows:
Exception in thread "main" java.lang.NumberFormatException: For input string: "54856"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
at java.base/java.lang.Integer.parseInt(Integer.java:658)
at java.base/java.lang.Integer.valueOf(Integer.java:989)
at Quicksort.main(Quicksort.java:170)
The file in.txt consists of:
54856
896
54
53
2
5634
Well, aparently it had to do with Windows and those \r that it uses... I just tried executing it on a Linux VM and it worked. Thanks to everyone that answered!!
Try reading the file with Scanner class has use it's hasNextInt() method to identify what you are reading is Integer or not. This will help you find out what String/character is causing the issue
public static void main(String[] args) throws Exception {
File ficheroEntrada = new File(
"C:\\in.txt");
Scanner scan = new Scanner(ficheroEntrada);
while (scan.hasNext()) {
if (scan.hasNextInt()) {
System.out.println("found integer" + scan.nextInt());
} else {
System.out.println("not integer" + scan.next());
}
}
}
If you want to ensure parsability of a string, you could use a Pattern and Regex that.
Pattern intPattern = Pattern.compile("\\-?\\d+");
Matcher matcher = intPattern.matcher(input);
if (matcher.find()) {
int value = Integer.parseInt(matcher.group(0));
// ... do something with the result.
} else {
// ... handle unparsable line.
}
This pattern allows any numbers and optionally a minus before (without whitespace). It should definetly parse, unless it is too long. I don't know how it handles that, but your example seems to contain mostly short integers, so this should not matter.
Most probably you have a leading/trailing whitespaces in your input, something like:
String s = " 5436";
System.out.println(s);
Integer in;
in = Integer.valueOf(s.trim());
System.out.println(in);
Use trim() on string to get rid of it.
UPDATE 2:
If your file contains something like:
54856\n
896
54\n
53
2\n
5634
then use following code for it:
....your code
FileReader enter = new FileReader(file);
BufferedReader input = new BufferedReader(enter);
String currentLine;
while ((currentLine = input.readLine()) != null) {
Integer in;
//get rid of non-numbers
in = Integer.valueOf(currentLine.replaceAll("\\D+",""));
System.out.println(in);
...your code

Find a string in a very large formatted text file in java

Here is the thing:
I have a really big text file and it has a format like this:
0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99
0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99
...
...
And I need to find if a particular pattern exists in the file, say
0007476|whatever|00313865000|whatever
All I need is a boolean saying yes or no.
Now what I have done is to read the file line by line and do a regular expression matching:
Pattern pattern = Pattern.compile(regex);
Scanner scanner = new Scanner(new File(fileName));
String line;
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (pattern.matcher(line).matches()) {
scanner.close();
return true;
}
}
and the regex has a form of
"0007476\|\d{12}\|0031386500.*
This method works, but it takes usually 15 seconds to search for a string that is far from the start line. Is there a faster way to achieve that? Thanks
The java String class has a contains method which returns a boolean. If your string is fixed, this is a lot faster than a regular expression:
if (string.contains("0007476|") && string.contains("|00313865000|")) {
// whatever
}
Hope that helped, if not, leave a comment.
I assume that you need the Scanner because the file is too big to read into a single String instead?
If that is not the case, you can probably use a regular expression that finds the match directly. Depending on whether or not you care about the specific text at the start of the line you can you something along the lines of:
"(?m)^0007476\|\d{12}\|0031386500.*$
If you do need to break it up into smaller chunks because of memory usage I would suggest not reading on a per line basis, (since the lines are rather short), but process bigger chunks using something like a BufferedReader instead?
I fiddled around a bit with a 1.25GB file and the following is about 2.5 times faster than your implementation:
private static boolean matches() throws IOException {
String regex = "(?m)^0007476\|\d{12}\|0031386500.*$";
Pattern pattern = Pattern.compile(regex);
try(BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
for(String lines; (lines = readLines(br, 10000)) != null; ) {
if (pattern.matcher(lines).find()) {
return true;
}
}
}
return false;
}
private static String readLines(BufferedReader br, int amount) throws IOException {
StringBuilder builder = new StringBuilder();
int lineCounter = 0;
for(String line; (line = br.readLine()) != null && lineCounter < amount; lineCounter++ ) {
builder.append(line).append(System.lineSeparator());
}
return lineCounter > 0 ? builder.toString() : null;
}

How do I make my java code search only for a to z and 0 to 9

My java code takes almost 10-15minutes to run (Input file is 7200+ lines long list of query). How do I make it run in short time to get same results?
How do I make my code to search only for aA to zZ and 0 to 9??
If I don't do #2, some characters in my output are shown as "?". How do I solve this issue?
// no parameters are used in the main method
public static void main(String[] args) {
// assumes a text file named test.txt in a folder under the C:\file\test.txt
Scanner s = null;
BufferedWriter out = null;
try {
// create a scanner to read from the text file test.txt
FileInputStream fstream = new FileInputStream("C:\\user\\query.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
// Write to the file
out = new BufferedWriter(new FileWriter("C:\\user\\outputquery.txt"));
// keep getting the next String from the text, separated by white space
// and print each token in a line in the output file
//while (s.hasNext()) {
// String token = s.next();
// System.out.println(token);
// out.write(token + "\r\n");
//}
String strLine="";
String str="";
while ((strLine = br.readLine()) != null) {
str+=strLine;
}
String st=str.replaceAll(" ", "");
char[]third =st.toCharArray();
System.out.println("Character Total");
for(int counter =0;counter<third.length;counter++){
//String ch= "a";
char ch= third[counter];
int count=0;
for ( int i=0; i<third.length; i++){
// if (ch=="a")
if (ch==third[i])
count++;
}
boolean flag=false;
for(int j=counter-1;j>=0;j--){
//if(ch=="b")
if(ch==third[j])
flag=true;
}
if(!flag){
System.out.println(ch+" "+count);
out.write(ch+" "+count);
}
}
// close the output file
out.close();
} catch (IOException e) {
// print any error messages
System.out.println(e.getMessage());
}
// optional to close the scanner here, the close can occur at the end of the code
finally {
if (s != null) {
// close the input file
s.close();
}
}
}
For something like this I would NOT recommend java though it entirely possible it is much easier with GAWK or something similar. GAWK also has java like syntax so its easy to pick up. You should check it out.
SO isn't really the place to ask such a broad how-do-I-do-this-question but I will refer you to the following page on regular expression and text match in Java. Also, check out the Javadocs for regexes.
If you follow that link you should get what you want, else you could post a more specific question back on SO.

BufferedReader: read multiple lines into a single string

I'm reading numbers from a txt file using BufferedReader for analysis. The way I'm going about this now is- reading a line using .readline, splitting this string into an array of strings using .split
public InputFile () {
fileIn = null;
//stuff here
fileIn = new FileReader((filename + ".txt"));
buffIn = new BufferedReader(fileIn);
return;
//stuff here
}
public String ReadBigStringIn() {
String line = null;
try { line = buffIn.readLine(); }
catch(IOException e){};
return line;
}
public ProcessMain() {
initComponents();
String[] stringArray;
String line;
try {
InputFile stringIn = new InputFile();
line = stringIn.ReadBigStringIn();
stringArray = line.split("[^0-9.+Ee-]+");
// analysis etc.
}
}
This works fine, but what if the txt file has multiple lines of text? Is there a way to output a single long string, or perhaps another way of doing it? Maybe use while(buffIn.readline != null) {}? Not sure how to implement this.
Ideas appreciated,
thanks.
You are right, a loop would be needed here.
The usual idiom (using only plain Java) is something like this:
public String ReadBigStringIn(BufferedReader buffIn) throws IOException {
StringBuilder everything = new StringBuilder();
String line;
while( (line = buffIn.readLine()) != null) {
everything.append(line);
}
return everything.toString();
}
This removes the line breaks - if you want to retain them, don't use the readLine() method, but simply read into a char[] instead (and append this to your StringBuilder).
Please note that this loop will run until the stream ends (and will block if it doesn't end), so if you need a different condition to finish the loop, implement it in there.
I would strongly advice using library here but since Java 8 you can do this also using streams.
try (InputStreamReader in = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(in)) {
final String fileAsText = buffer.lines().collect(Collectors.joining());
System.out.println(fileAsText);
} catch (Exception e) {
e.printStackTrace();
}
You can notice also that it is pretty effective as joining is using StringBuilder internally.
If you just want to read the entirety of a file into a string, I suggest you use Guava's Files class:
String text = Files.toString("filename.txt", Charsets.UTF_8);
Of course, that's assuming you want to maintain the linebreaks. If you want to remove the linebreaks, you could either load it that way and then use String.replace, or you could use Guava again:
List<String> lines = Files.readLines(new File("filename.txt"), Charsets.UTF_8);
String joined = Joiner.on("").join(lines);
Sounds like you want Apache IO FileUtils
String text = FileUtils.readStringFromFile(new File(filename + ".txt"));
String[] stringArray = text.split("[^0-9.+Ee-]+");
If you create a StringBuilder, then you can append every line to it, and return the String using toString() at the end.
You can replace your ReadBigStringIn() with
public String ReadBigStringIn() {
StringBuilder b = new StringBuilder();
try {
String line = buffIn.readLine();
while (line != null) {
b.append(line);
line = buffIn.readLine();
}
}
catch(IOException e){};
return b.toString();
}
You have a file containing doubles. Looks like you have more than one number per line, and may have multiple lines.
Simplest thing to do is read lines in a while loop.
You could return null from your ReadBigStringIn method when last line is reached and terminate your loop there.
But more normal would be to create and use the reader in one method. Perhaps you could change to a method which reads the file and returns an array or list of doubles.
BTW, could you simply split your strings by whitespace?
Reading a whole file into a single String may suit your particular case, but be aware that it could cause a memory explosion if your file was very large. Streaming approach is generally safer for such i/o.
This creates a long string, every line is seprateted from string " " (one space):
public String ReadBigStringIn() {
StringBuffer line = new StringBuffer();
try {
while(buffIn.ready()) {
line.append(" " + buffIn.readLine());
} catch(IOException e){
e.printStackTrace();
}
return line.toString();
}

Categories