How to skip certain lines from a text file in Java?

I am currently learning Java and have run into a problem: I want to load a file that consists of a huge number of lines (I am reading the file line by line) and skip certain lines. In pseudo-code:
skip the lines that start with a specific word, such as "ABC"
I have tried to use
if (line.startsWith("abc"))
but that didn't work. I am not sure if I am doing it wrong, which is why I am asking for help. Below is part of the load function:
public String loadfile(.........){
//here goes the variables
try {
File data= new File(dataFile);
if (data.exists()) {
br = new BufferedReader(new FileReader(dataFile));
while ((thisLine = br.readLine()) != null) {
if (thisLine.length() > 0) {
tmpLine = thisLine.toString();
tmpLine2 = tmpLine.split(......);
[...]

Try
if (line.toUpperCase().startsWith("ABC")) {
    // skip line
} else {
    // do something
}
This converts the line to upper case with toUpperCase() and checks whether the result starts with "ABC".
If it does, nothing happens (the line is skipped); otherwise execution goes into the else branch.
You can also use startsWithIgnoreCase, which is provided by StringUtils in Apache Commons Lang. It takes two String arguments:
public static boolean startsWithIgnoreCase(String str, String prefix)
It returns a boolean: true if the String starts with the specified prefix, ignoring case.
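If you would rather not pull in Apache Commons just for this, the JDK's String.regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len) can do the same case-insensitive prefix check. The following is a minimal sketch; PrefixCheck is a hypothetical helper name, and its null handling is my own choice rather than a copy of the Commons implementation.

```java
// Hypothetical helper: a JDK-only stand-in for StringUtils.startsWithIgnoreCase.
final class PrefixCheck {
    private PrefixCheck() {}

    // Returns true if str begins with prefix, ignoring case.
    // Both null -> true; exactly one null -> false (an assumption, see lead-in).
    static boolean startsWithIgnoreCase(String str, String prefix) {
        if (str == null || prefix == null) {
            return str == null && prefix == null;
        }
        // regionMatches(ignoreCase, offsetInStr, other, offsetInOther, length)
        // returns false when str is shorter than prefix
        return str.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}
```

In the reading loop you would then write if (PrefixCheck.startsWithIgnoreCase(line, "abc")) { /* skip */ }. Unlike toUpperCase(), this does not create a new String for every line.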

If case isn't important, try StringUtils.startsWithIgnoreCase(String str, String prefix) from Apache Commons Lang.
It returns a boolean.
See the StringUtils Javadoc for details.
Usage:
if (StringUtils.startsWithIgnoreCase(line, "abc")) {
    // skip line
} else {
    // do something
}

If you have a large input file, your code can run into an OutOfMemoryError. There is nothing you can do about that without changing the code (adding more memory will stop helping as soon as the file gets bigger).
I believe you store the selected lines in memory. If the file gets larger (2 GB or so), you'll have roughly 4 GB in memory (the old value of the String and the new one).
You have to work with streams to solve this.
Create a FileOutputStream and write the selected lines to that stream.
Your method must be changed. For a large input you cannot return a String:
public String loadfile(...){
You can return a stream or a file instead:
public MyDeletingLineBufferedReader loadFile(...)
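The streaming idea in this answer can be sketched as follows, assuming the goal is to write the kept lines to an output file rather than return them. LineFilter and copyWithoutPrefix are hypothetical names; the method works on any Reader/Writer pair, so it can be fed from Files.newBufferedReader and Files.newBufferedWriter.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper: streams lines from `in` to `out`, dropping every line that
// starts with `prefix` (case-insensitive). Only one line is held in memory at a time.
final class LineFilter {
    private LineFilter() {}

    // Returns the number of lines written. The caller owns (and closes) in and out.
    static long copyWithoutPrefix(Reader in, Writer out, String prefix) throws IOException {
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        long written = 0;
        for (String line; (line = br.readLine()) != null; ) {
            if (line.regionMatches(true, 0, prefix, 0, prefix.length())) {
                continue; // skip lines that start with the prefix
            }
            bw.write(line);
            bw.newLine(); // platform line separator
            written++;
        }
        bw.flush(); // push buffered text to `out` without closing it
        return written;
    }
}
```

For example, open Files.newBufferedReader(Paths.get("input.txt")) and Files.newBufferedWriter(Paths.get("filtered.txt")) in a try-with-resources statement and pass them in (the file names are placeholders). Memory use then stays flat no matter how large the input gets.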

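You can use:
// the enclosing method must throw or catch IOException
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    String lineString;
    while ((lineString = br.readLine()) != null) {
        // the line is upper-cased, so the prefix must be upper case too
        if (lineString.toUpperCase().startsWith("ABC")) {
            // skip
        } else {
            // do something
        }
    }
}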
or use the static boolean startsWithIgnoreCase(String str, String prefix) method in org.apache.commons.lang.StringUtils (org.apache.commons.lang3.StringUtils in Commons Lang 3), like below:
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    String lineString;
    while ((lineString = br.readLine()) != null) {
        if (StringUtils.startsWithIgnoreCase(lineString, "abc")) {
            // skip
        } else {
            // do something
        }
    }
}
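On Java 8 or later, the same filtering can also be sketched with Files.lines and a stream filter. This is an alternative to the loops above, not something from the original answers; SkipLines and linesWithoutPrefix are hypothetical names, and the file is assumed to be UTF-8. Collecting into a List keeps all kept lines in memory, so for very large files you would process each line inside the stream (for example with forEach) instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: returns the lines of `file` that do NOT start with `prefix`,
// comparing case-insensitively.
final class SkipLines {
    private SkipLines() {}

    static List<String> linesWithoutPrefix(Path file, String prefix) throws IOException {
        // try-with-resources closes the underlying file when the stream is done
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                    .filter(line -> !line.regionMatches(true, 0, prefix, 0, prefix.length()))
                    .collect(Collectors.toList());
        }
    }
}
```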

Related

Find a string in a very large formatted text file in Java

Here is the thing:
I have a really big text file and it has a format like this:
0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99
0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99
...
...
And I need to find if a particular pattern exists in the file, say
0007476|whatever|00313865000|whatever
All I need is a boolean saying yes or no.
Now what I have done is to read the file line by line and do a regular expression matching:
Pattern pattern = Pattern.compile(regex);
Scanner scanner = new Scanner(new File(fileName));
String line;
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (pattern.matcher(line).matches()) {
scanner.close();
return true;
}
}
and the regex has the form
0007476\|\d{12}\|0031386500.*
This method works, but it usually takes 15 seconds to find a string that is far from the start of the file. Is there a faster way to do this? Thanks
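Java's String class has a contains method that returns a boolean. If the text you are searching for is fixed, this is a lot faster than a regular expression (note, though, that unlike the regex it does not check which column each value appears in):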
if (string.contains("0007476|") && string.contains("|00313865000|")) {
// whatever
}
Hope that helped, if not, leave a comment.
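Because the contains check does not look at column positions, a line with 00313865000 in some other field would also match. A sketch of a column-aware check that still avoids regular expressions, assuming pipe-delimited lines where the first and third fields are the ones of interest (FieldSearch and containsRecord are hypothetical names; unlike the regex, it does not verify that the second field has 12 digits):

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper: true if some line has `first` as field 0 and `third` as field 2
// of a '|'-delimited record. Uses indexOf/startsWith only, no regex and no split().
final class FieldSearch {
    private FieldSearch() {}

    static boolean containsRecord(BufferedReader br, String first, String third) throws IOException {
        String firstField = first + "|";
        for (String line; (line = br.readLine()) != null; ) {
            if (!line.startsWith(firstField)) {
                continue; // cheap pre-filter on the first column
            }
            int secondBar = line.indexOf('|', firstField.length()); // end of field 1
            if (secondBar < 0) {
                continue;
            }
            int start = secondBar + 1;
            int end = line.indexOf('|', start); // end of field 2
            if (end < 0) {
                end = line.length();
            }
            // field 2 must be exactly `third`, not just start with it
            if (end - start == third.length() && line.startsWith(third, start)) {
                return true;
            }
        }
        return false;
    }
}
```

Wrap the file in a BufferedReader as usual and stop reading as soon as the method returns true.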
I assume that you need the Scanner because the file is too big to read into a single String instead?
If that is not the case, you can probably use a regular expression that finds the match directly. Depending on whether or not you care about the specific text at the start of the line, you can use something along the lines of:
(?m)^0007476\|\d{12}\|0031386500.*$
If you do need to break it up into smaller chunks because of memory usage, I would suggest not processing it line by line (since the lines are rather short), but matching bigger chunks of lines read with a BufferedReader instead.
I fiddled around a bit with a 1.25 GB file, and the following is about 2.5 times faster than your implementation:
// FILENAME is a String constant holding the path of the file to search
private static boolean matches() throws IOException {
    // backslashes must be doubled inside a Java string literal
    String regex = "(?m)^0007476\\|\\d{12}\\|0031386500.*$";
    Pattern pattern = Pattern.compile(regex);
    try (BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
        for (String lines; (lines = readLines(br, 10000)) != null; ) {
            if (pattern.matcher(lines).find()) {
                return true;
            }
        }
    }
    return false;
}

// Reads up to `amount` lines joined by line separators; returns null at end of file.
private static String readLines(BufferedReader br, int amount) throws IOException {
    StringBuilder builder = new StringBuilder();
    int lineCounter = 0;
    // check the counter BEFORE reading, otherwise the line read when the limit is hit is lost
    for (String line; lineCounter < amount && (line = br.readLine()) != null; lineCounter++) {
        builder.append(line).append(System.lineSeparator());
    }
    return lineCounter > 0 ? builder.toString() : null;
}
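Another option, not benchmarked here, is to keep the line-based regex but let a stream stop at the first match. Files.lines reads the file lazily and anyMatch short-circuits, so reading stops shortly after the first matching line. This sketch assumes a UTF-8 file; LazyRegexSearch is a hypothetical name. Because every stream element is a single line, ^ and $ anchor to that line without the (?m) flag.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: lazy, line-by-line regex search using the pattern from the question.
final class LazyRegexSearch {
    private LazyRegexSearch() {}

    // Compiled once; each stream element is one line, so ^ and $ apply per line.
    private static final Pattern RECORD = Pattern.compile("^0007476\\|\\d{12}\\|0031386500.*$");

    static boolean fileContainsRecord(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            // asPredicate() uses Matcher.find(); anyMatch stops at the first hit
            return lines.anyMatch(RECORD.asPredicate());
        }
    }
}
```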

BufferedReader does not read all the lines in text file

I have a function.
public ArrayList<String> readRules(String src) {
ArrayList<String> lines = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader(src))) {
String sCurrentLine;
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
lines.add(sCurrentLine);
}
} catch (IOException e) {
e.printStackTrace();
}
return lines;
}
My file has 26,400 lines, but this function only reads the last 3,400 lines of the file.
How do I read all the lines in the file?
Thanks!
Why don't you use the utility method Files.readAllLines() (available since Java 7)?
This method ensures that the file is closed when all bytes have been read or an I/O error or other runtime exception is thrown.
Bytes from the file are decoded into characters using the specified charset.
public ArrayList<String> readRules(String src) throws IOException {
return new ArrayList<>(Files.readAllLines(Paths.get(src), Charset.defaultCharset()));
}
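while ((sCurrentLine = br.readLine()) != null)
It is possible that some line is being treated as the end of the input. (An empty line by itself will not stop this loop, since readLine() returns an empty string, not null, for it.)
Try a Scanner instead; note that hasNextLine() and nextLine() belong to Scanner, not BufferedReader:
Scanner sc = new Scanner(new File(src));
while (sc.hasNextLine()) {
String current = sc.nextLine();
}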
Edit: Or, if a line in your text file is very long, the editor may wrap it onto several lines on screen. If you didn't press the Return key, BufferedReader treats it as a single line.
Notepad++ is a good tool for telling a wrapped line apart from multiple lines: it numbers lines by actual line breaks. You could open your input file in Notepad++ and check whether the line numbers match.
You can also read the file into a List of strings using Files.readAllLines() and then loop through it:
List<String> myfilevar = Files.readAllLines(Paths.get("/PATH/TO/MY/FILE.TXT"));
for(String x : myfilevar)
{
System.out.println(x);
}
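Before changing the reading code, it may be worth checking whether lines are really missing or just not visible: some IDE consoles keep only a limited amount of recent output, which would show only the end of a long printout. A minimal sketch that reads the file into a list so you can print its size (RuleReader is a hypothetical name; the file is assumed to be UTF-8, and reading fails with a MalformedInputException if it is not):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads every line of `src` into a list.
final class RuleReader {
    private RuleReader() {}

    static List<String> readRules(Path src) throws IOException {
        List<String> lines = new ArrayList<>();
        // newBufferedReader decodes strictly with the given charset
        try (BufferedReader br = Files.newBufferedReader(src, StandardCharsets.UTF_8)) {
            for (String line; (line = br.readLine()) != null; ) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Printing RuleReader.readRules(Paths.get(src)).size() instead of every line tells you directly how many lines were read; if it matches the file's line count, the reading code is fine and only the console output is truncated.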

Why would this program output "java.io.BufferedReader@Number"?

This is a quick one that stumps me. I've got a Java Program with the following code:
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
String file1 = args[0];
String file2 = args[1];
String output = args[2];
Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output), "utf-8"));
// Get the file
BufferedReader br1 = new BufferedReader(new FileReader(file1));
ArrayList<String> masterRBT = new ArrayList<String>();
// Read the files
while(br1.readLine() != null) {
masterRBT.add(br1.toString());
System.out.println(br1.toString());
}
It should read the file (in this case, a .csv) and output it to the command line.
I run the program from the command line with three parameters, like so (so far it only really uses the first one):
java -jar csvdiff.jar mainfile.csv subfile.csv output.csv
But then, it returns this:
java.io.BufferedReader@17dfafd1
It prints this repeatedly, as if in a loop. I tried adding a try/catch block, but it still does the same thing, with no errors. I've opened the .csv files and verified their contents.
The CSV files are located in the same directory as the .jar file.
What am I missing?
Because you are attempting to print the BufferedReader instance itself, not the data you are reading from it.
Change
while(br1.readLine() != null) {
masterRBT.add(br1.toString());
System.out.println(br1.toString());
}
to
String line;
while ((line = br1.readLine()) != null) {
masterRBT.add(line);
System.out.println(line);
}
You're printing out br1.toString() - you're calling toString() on the BufferedReader itself. BufferedReader doesn't override toString(), so you're getting the implementation from Object, as documented:
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character @, and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:
getClass().getName() + '@' + Integer.toHexString(hashCode())
That's not what you want. Presumably you actually want to print out the line that you've just read - but you've thrown that away by now. You want:
String line;
while((line = br1.readLine()) != null) {
masterRBT.add(line);
System.out.println(line);
}
Or as a for loop:
for (String line = br1.readLine(); line != null; line = br1.readLine()) {
masterRBT.add(line);
System.out.println(line);
}
As a general matter, if you start seeing ClassName@Number in output, that's almost certainly a similar problem of calling toString() on an object which doesn't override it.
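The difference between the default Object.toString() and an overridden one can be seen in a small sketch (ToStringDemo and its nested classes are hypothetical, made up for illustration):

```java
// Hypothetical demo classes: one relies on Object.toString(), one overrides it.
final class ToStringDemo {
    private ToStringDemo() {}

    // No toString() override: prints as ClassName@hexHashCode, like BufferedReader.
    static final class NoOverride {}

    // Overrides toString() to return the data it holds.
    static final class WithOverride {
        private final String line;

        WithOverride(String line) {
            this.line = line;
        }

        @Override
        public String toString() {
            return line;
        }
    }

    // Rebuilds the string Object.toString() documents, for comparison.
    static String defaultFormat(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }
}
```

Printing new ToStringDemo.NoOverride() gives something of the form ToStringDemo$NoOverride@ followed by a hex hash that varies between runs, which is exactly the kind of output shown in the question.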
You are not printing the line but the reader itself, to print the line change your code like this:
// Read the files
String line;
while((line = br1.readLine()) != null) {
masterRBT.add(line);
System.out.println(line);
}
Use this,
String str;
while((str=br1.readLine()) != null) {
masterRBT.add(str);
System.out.println(str);
}
Because BufferedReader.toString() just returns the class name and hash value of the object.
Use BufferedReader.readLine() to get the String instead.

BufferedReader gives missing characters

So I am trying to change the format of a text file that has line numbers every couple of lines, just to make it cleaner and easier to read. I made a simple program that replaces the first three characters of each line with spaces; those three character positions are where the numbers can be. The actual text doesn't start until a few more spaces in. When I do this and print the end result, it comes out with a diamond with a question mark in it, and I'm assuming this is the result of missing characters. It seems like most of the missing characters are apostrophes. If anyone could let me know how to fix it, I would really appreciate it :)
public class Conversion {
public static void main(String args[]) throws IOException {
BufferedReader scan = null;
try {
scan = new BufferedReader(new FileReader(new File("C:\\Users\\Nasir\\Desktop\\Beowulftesting.txt")));
} catch (FileNotFoundException e) {
System.out.println("failed to read file");
}
String finalVersion = "";
String currLine;
while( (currLine = scan.readLine()) !=null){
if(currLine.length()>3)
currLine = " "+ currLine.substring(3);
finalVersion+=currLine+"\n";
}
scan.close();
System.out.println(finalVersion);
}
}
Instead of using FileReader, use an InputStreamReader with the correct text encoding. I think the strange characters are appearing because you're reading the file with the wrong encoding.
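A minimal sketch of that suggestion follows. EncodedReader is a hypothetical helper; which charset is right depends on how the file was saved (commonly UTF-8, or windows-1252 for files from older Windows editors), and that part is an assumption you need to check. The diamond can also come from the console itself, if it cannot display the character being printed.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper: opens a file for line-by-line reading with an explicit charset,
// instead of the platform default that FileReader uses.
final class EncodedReader {
    private EncodedReader() {}

    static BufferedReader open(String path, Charset charset) throws IOException {
        // InputStreamReader decodes the raw bytes using the charset you choose
        return new BufferedReader(new InputStreamReader(new FileInputStream(path), charset));
    }
}
```

For example, use EncodedReader.open(path, StandardCharsets.UTF_8) or EncodedReader.open(path, Charset.forName("windows-1252")) in place of new BufferedReader(new FileReader(...)), and keep the rest of the loop unchanged.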
By the way, don't use += with strings in a loop, like you have. Instead, use a StringBuilder:
StringBuilder finalVersion = new StringBuilder();
String currLine;
while ((currLine = scan.readLine()) != null) {
if (currLine.length() > 3) {
finalVersion.append(" ").append(currLine.substring(3));
} else {
finalVersion.append(currLine);
}
finalVersion.append('\n');
}

BufferedReader: read multiple lines into a single string

I'm reading numbers from a txt file using BufferedReader for analysis. The way I'm going about this now is reading a line using .readLine() and splitting that string into an array of strings using .split():
public InputFile () {
fileIn = null;
//stuff here
fileIn = new FileReader((filename + ".txt"));
buffIn = new BufferedReader(fileIn);
return;
//stuff here
}
public String ReadBigStringIn() {
String line = null;
try { line = buffIn.readLine(); }
catch(IOException e){};
return line;
}
public ProcessMain() {
initComponents();
String[] stringArray;
String line;
try {
InputFile stringIn = new InputFile();
line = stringIn.ReadBigStringIn();
stringArray = line.split("[^0-9.+Ee-]+");
// analysis etc.
}
}
This works fine, but what if the txt file has multiple lines of text? Is there a way to output a single long string, or perhaps another way of doing it? Maybe use while(buffIn.readline != null) {}? Not sure how to implement this.
Ideas appreciated,
thanks.
You are right, a loop would be needed here.
The usual idiom (using only plain Java) is something like this:
public String ReadBigStringIn(BufferedReader buffIn) throws IOException {
StringBuilder everything = new StringBuilder();
String line;
while( (line = buffIn.readLine()) != null) {
everything.append(line);
}
return everything.toString();
}
This removes the line breaks - if you want to retain them, don't use the readLine() method, but simply read into a char[] instead (and append this to your StringBuilder).
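The char[] approach mentioned above can be sketched like this (ReadAll and readFully are hypothetical names). It copies fixed-size chunks from any Reader, so line breaks, including \r\n, are kept exactly as they are in the file:

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads everything from `in` into one String, keeping line breaks.
final class ReadAll {
    private ReadAll() {}

    static String readFully(Reader in) throws IOException {
        StringBuilder everything = new StringBuilder();
        char[] buffer = new char[8192];
        int n;
        // read() returns the number of chars read, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            everything.append(buffer, 0, n);
        }
        return everything.toString();
    }
}
```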
Please note that this loop will run until the stream ends (and will block if it doesn't end), so if you need a different condition to finish the loop, implement it in there.
I would strongly advise using a library here, but since Java 8 you can also do this using streams.
try (InputStreamReader in = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(in)) {
final String fileAsText = buffer.lines().collect(Collectors.joining());
System.out.println(fileAsText);
} catch (Exception e) {
e.printStackTrace();
}
Note also that this is fairly efficient, as Collectors.joining() uses a StringBuilder internally.
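If you just want to read the entirety of a file into a string, I suggest you use Guava's Files class (in newer Guava versions, Files.asCharSource(file, Charsets.UTF_8).read() is the preferred form):
String text = Files.toString(new File("filename.txt"), Charsets.UTF_8);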
Of course, that's assuming you want to maintain the linebreaks. If you want to remove the linebreaks, you could either load it that way and then use String.replace, or you could use Guava again:
List<String> lines = Files.readLines(new File("filename.txt"), Charsets.UTF_8);
String joined = Joiner.on("").join(lines);
Sounds like you want Apache Commons IO's FileUtils:
String text = FileUtils.readFileToString(new File(filename + ".txt"), StandardCharsets.UTF_8);
String[] stringArray = text.split("[^0-9.+Ee-]+");
If you create a StringBuilder, then you can append every line to it, and return the String using toString() at the end.
You can replace your ReadBigStringIn() with
public String ReadBigStringIn() {
StringBuilder b = new StringBuilder();
try {
String line = buffIn.readLine();
while (line != null) {
b.append(line);
line = buffIn.readLine();
}
}
catch(IOException e){};
return b.toString();
}
You have a file containing doubles. Looks like you have more than one number per line, and may have multiple lines.
Simplest thing to do is read lines in a while loop.
You could return null from your ReadBigStringIn method when last line is reached and terminate your loop there.
But it would be more usual to create and use the reader in one method. Perhaps you could change to a method that reads the file and returns an array or list of doubles.
BTW, could you simply split your strings by whitespace?
Reading a whole file into a single String may suit your particular case, but be aware that it could cause a memory explosion if your file is very large. A streaming approach is generally safer for such I/O.
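A sketch of the method suggested above, which reads the file and returns a list of doubles, assuming the numbers are separated by whitespace (NumberReader and readDoubles are hypothetical names; a token that is not a valid number makes Double.parseDouble throw NumberFormatException):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses whitespace-separated numbers from every line of `br`.
final class NumberReader {
    private NumberReader() {}

    static List<Double> readDoubles(BufferedReader br) throws IOException {
        List<Double> values = new ArrayList<>();
        for (String line; (line = br.readLine()) != null; ) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank line: nothing to parse
            }
            // split on runs of spaces/tabs; trim() above avoids an empty first token
            for (String token : trimmed.split("\\s+")) {
                values.add(Double.parseDouble(token));
            }
        }
        return values;
    }
}
```

Since it processes one line at a time, only the parsed numbers are kept in memory, not the whole text.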
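This creates one long string in which the lines are separated by a single space:
public String ReadBigStringIn() {
    StringBuffer line = new StringBuffer();
    try {
        // ready() only reports whether a read would not block; it usually works
        // for local files but is not a reliable end-of-stream test in general
        while (buffIn.ready()) {
            line.append(" " + buffIn.readLine());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return line.toString();
}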
