How to split a very long string - java

I have big file (about 30mb) and here the code I use to read data from the file
BufferedReader br = new BufferedReader(new FileReader(file));
try {
String line = br.readLine();
while (line != null) {
sb.append(line).append("\n");
line = br.readLine();
}
Then I need to split the content I read, so I use
String[] inst = sb.toString().split("GO");
The problem is that sometimes the sub-string is over the maximum String length and I can't get all the data inside the string. How can I get rid of this?
Thanks

Scanner s = new Scanner(input).useDelimiter("GO"); and use s.next()

WHY PART:- The erroneous result may be the outcome of non contiguous heap segment as the CMS collector doesn't de-fragment memory.
(It does not answer your how to solve part though).
You may opt for loading the whole string partwise, i.e using substring

Related

Count number of lines in text skipping the empty ones

I need help with this. Can you tell me how to calculate the number of lines in the input.txt without counting the empty space lines?
So far, I tried:
BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
int lines = 0;
while (reader.readLine() != null)
lines++;
So, this code is able to count the number of lines, but with the empty lines! I know that there are characters /n which illustrates the new line but I do not know how to integrate it in the solution.
I also tried to calculate number of lines, number of empty lines and subtract them, but I wasn't successful.
BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
int lines = 0;
String line;
while ((line = reader.readLine()) != null){
if(!"".equals(line.trim())){
lines++;
}
}
You just need to remember the line you're looking at, and check it before counting:
int lines = 0;
String line;
while ((line = reader.readLine()) != null) {
if (!line.isEmpty()) {
lines++;
}
}
Note that you should be closing your reader too - either in an explicit finally statement, or using a try-with-resources statement in Java 7. I'd advise not using FileReader, too - it always uses the platform default encoding, which isn't a good idea, IMO. Use FileInputStream with an InputStreamReader, and state the encoding explicitly.
You might also want to skip lines which consist entirely of whitespace, but that's an easy change to make to the if (!line.isEmpty()) condition. For example, you could use:
if (!line.trim().isEmpty())
instead... although it would be cleaner to find a helper method which just detected whether a string only consisted of whitespace rather than constructing a new string. A regex could do this, for example.
BufferedReader's readLine() method only returns null when the end of the stream has been reached. To not count empty lines, test if the line exists and if it's empty then don't count it.
Quoting the linked Javadocs above:
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
Clean and fast.
try (BufferedReader reader = new BufferedReader("Your inputStream or FileReader")) {
nbLignes = (int) reader.lines().filter(line -> !line.isEmpty()).count();
}

Skip parts while reading and writing a file in Android/Java

I'm trying to learn Java/Android and right now I'm doing some experiments with the replaceAll function. But I've found that with large text files the process gets sluggish so I was wondering if there is a way to skip the "useless" parts of a file to have a better performance. (Note: Just skip them, not delete them)
Note: I am not trying to "count lines" or "println" or "system.out", I'm just replacing strings and saving the changes in the same file.
Example
AAAA
CCCC- 9234802394819102948102948104981209381'238901'2309'129831'2381'2381'23081'23081'284091824098304982390482304981'20841'948023984129048'1489039842039481'204891'29031'923481290381'20391'294872385710239841'20391'20931'20853029573098341'290831'20893'12894093274019799919208310293810293810293810293810298'120931¿2093¿12039¿120931¿203912¿0391¿203912¿039¿12093¿12093¿12093¿12093¿12093¿1209312¿0390¿... DDDD
AAAA
CCCC- 9234802394819102948102948104981209381'238901'2309'129831'2381'2381'23081'23081'284091824098304982390482304981'20841'948023984129048'1489039842039481'204891'29031'923481290381'20391'294872385710239841'20391'20931'20853029573098341'290831'20893'12894093274019799919208310293810293810293810293810298'120931¿2093¿12039¿120931¿203912¿0391¿203912¿039¿12093¿12093¿12093¿12093¿12093¿1209312¿0390¿... DDDD
and so on....like a zillion times
I want to replace all "AAAA" with "BBBB", but there are large portions of data between the strings I am replacing. Also, this portions always begin with "CCCC" and end with "DDDD".
Here's the code I am using to replace the string.
File file = new File("my_file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = "", oldtext = "";
while((line = reader.readLine()) != null) {
oldtext += line + "\r\n";
}
reader.close();
// Replacing "AAAA" strings
String newtext= oldtext.replaceAll("AAAA", "BBBB");
FileWriter writer = new FileWriter("my_file.txt");
writer.write(newtext);
writer.close();
I think reading all lines is inefficient, especially when you won't be modifying these parts (and they represent the 90% of the file).
Does anyone know a solution???
You are wasting a lot of time on this line --
oldtext += line + "\r\n";
In Java, String is immutable, which means you can't modify them. Therefore, when you do the concatenation, Java is actually making a complete copy of oldtext. So, for every line in your file, you are recopying every line that came before in your new String. Take a look at StringBuilder for a a way to build a String avoiding these copies.
However, in your case, you do not need the whole file in memory, because you can process line by line. By moving your replaceAll and write into your loop, you can operate on each line as you read it. This will keep the memory footprint of the routine down, because you are only keeping a single line in memory.
Note that since the FileWriter is opened before you read the input file, you need to have a different name for the output file. If you want to keep the same name, you can do a renameTo on the File after you close it.
File file = new File("my_file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
FileWriter writer = new FileWriter("my_out_file.txt");
String line = "";
while((line = reader.readLine()) != null) {
// Replacing "AAAA" strings
String newtext= line.replaceAll("AAAA", "BBBB");
writer.write(newtext);
}
reader.close();
writer.close();

java read properties and xml file using stringbuilder

I need to read a set of xml and property files and parse the data. Currently I am using inputstream ans string builder to do this. But this does not create the file in the same way as input file is. I donot want to remove the white spaces and new lines. How do i achieve this.
is = test.getInputStream();
br = new BufferedReader(new InputStreamReader(is));
String line5;
StringBuilder sb5 = new StringBuilder();
while ((line5 = br.readLine()) != null) {
sb5.append(line5);
}
String s = sb5.toString();
My output is:
#test 123 #test2 345
Expected output is:
#test
123
#test2
345
Any thoughts ? Thanks
br.readLine() consumes the line breaks, you need to add them to your StringBuilder after appending the line.
is = test.getInputStream();
br = new BufferedReader(new InputStreamReader(is));
String line5;
StringBuilder sb5 = new StringBuilder();
while ((line5 = br.readLine()) != null) {
sb5.append(line5);
sb5.append("\n");
}
If you want an extremely simple solution for reading a file to a String, Apache Commons-IO has a method for performing such a task (org.apache.commons.io.FileUtils).
FileUtils.readFileToString(File file, String encoding);
readLine() method doesn't add the EOL character (\n). So while appending the string to the builder, you need to add the EOL char, like sb5.append(line5+"\n");
The various readLine methods discard the newline from the input.
From the BufferedReader docs:
Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
A solution may be as simple as adding back a newline to your StringBuilder for every readLine: sb5.append(line5 + "\n");.
A better alternative is to read into an intermediate buffer first, using the read method, supplying your own char[]. You can still use StringBuilder.append, and get a String will match the file contents.

Readline is too slow - Anything Faster?

I am reading in from a stream using a BufferedReader and InputStreamReader to create one long string that gets created from the readers. It gets up to over 100,000 lines and then throws a 500 error (call failed on the server). I am not sure what is the problem, is there anything faster than this method? It works when the lines are in the thousands but i am working with large data sets.
BufferedReader in = new BufferedReader(new InputStreamReader(newConnect.getInputStream()));
String inputLine;
String xmlObject = "";
StringBuffer str = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
str.append(inputLine);
str.toString();
}
in.close();
Thanks in advance
to create one long string that gets created from the readers.
Are you by any chance doing this to create your "long string"?
String string;
while(...)
string+=whateverComesFromTheSocket;
If yes, then change it to
StringBuilder str = new StringBuilder(); //Edit:Just changed StringBuffer to StringBuilder
while(...)
str.append(whateverComesFromTheSocket);
String string = str.toString();
String objects are immutable and when you do str+="something", memory is reallocated and str+"something" is copied to that newly allocated area. This is a costly operation and running it 51,000 times is an extremely bad thing to do.
StringBuffer and StringBuilder are String's mutable brothers and StringBuilder, being non-concurrent is more efficient than StringBuffer.
readline() can read at about 90 MB/s, its what you are doing with the data read which is slow. BTW readline removes newlines so this approach you are using is flawed as it will turn everying into one line.
Rather than re-inventing the wheel I would suggest you try FileUtils.readLineToString()
This will read a file as a STring without discarding newlines, efficiently.

Java create strings from Buffered Reader and compare Strings

I am using Java + Selenium 1 to test a web application.
I have to read through a text file line by line using befferedreader.readLine and compare the data that was found to another String.
Is there way to assign each line a unique string? I think it would be something like this:
FileInputStream fstream = new FileInputStream("C:\\write.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
String[] strArray = null;
int p=0;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
strArray[p] = strLine;
assertTrue(strArray[p].equals(someString));
p=p+1;
}
The problem with this is that you don't know how many lines there are, so you can't size your array correctly. Use a List<String> instead.
In order of decreasing importance,
You don't need to store the Strings in an array at all, as pointed out by Perception.
You don't know how many lines there are, so as pointed out by Qwerky, if you do need to store them you should use a resizeable collection like ArrayList.
DataInputStream is not needed: you can just wrap your FileInputStream directly in an InputStreamReader.
You may want to try something like:
public final static String someString = "someString";
public boolean isMyFileOk(String filename){
Scanner sc = new Scanner(filename);
boolean fileOk = true;
while(sc.hasNext() && fileOk){
String line = sc.nextLine();
fileOk = isMyLineOk(line);
}
sc.close();
return fileOk;
}
public boolean isMyLineOk(String line){
return line.equals(someString);
}
The Scanner class is usually a great class to read files :)
And as suggested, you may check one line at a time instead of loading them all in memory before processing them. This may not be an issue if your file is relatively small but you better keep your code scalable, especially for doing the exact same thing :)

Categories