I am reading from a stream using a BufferedReader and an InputStreamReader to build one long string from the data the readers return. It gets past 100,000 lines and then throws a 500 error (call failed on the server). I am not sure what the problem is; is there anything faster than this method? It works when there are only a few thousand lines, but I am working with large data sets.
BufferedReader in = new BufferedReader(new InputStreamReader(newConnect.getInputStream()));
String inputLine;
String xmlObject = "";
StringBuffer str = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
str.append(inputLine);
str.toString();
}
in.close();
Thanks in advance
to create one long string that gets created from the readers.
Are you by any chance doing this to create your "long string"?
String string = "";
while(...)
string+=whateverComesFromTheSocket;
If yes, then change it to
StringBuilder str = new StringBuilder();
while(...)
str.append(whateverComesFromTheSocket);
String string = str.toString();
String objects are immutable: when you do string += "something", new memory is allocated and the whole of string + "something" is copied into it. That copy gets longer on every iteration, so doing it once per line for tens of thousands of lines is extremely expensive.
StringBuffer and StringBuilder are String's mutable siblings, and StringBuilder, being unsynchronized, is more efficient than StringBuffer when only one thread uses it.
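A minimal sketch of that fix for a stream like the one in the question, assuming the data is UTF-8 (an assumption; the question's code uses the platform default). The class and method names are hypothetical. The text is built in one StringBuilder and toString() is called exactly once, after the loop; since readLine() strips line terminators, a '\n' is appended after each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class StreamToString {

    // Reads the whole stream line by line into a single String.
    // toString() is called once, after the loop, not on every iteration.
    static String readAll(InputStream input) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() drops the terminator, so add a '\n' back
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Note that every line ends up terminated by '\n', whatever terminator the input used.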
readLine() can read at about 90 MB/s; it's what you are doing with the data afterwards that is slow. By the way, readLine() removes the line terminators, so your approach is flawed: it will turn everything into one line.
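Rather than reinventing the wheel, I would suggest you try Apache Commons IO's FileUtils.readFileToString() (or IOUtils.toString() when, as here, you have an InputStream rather than a File).
This reads the whole content into a String efficiently, without discarding newlines.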
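If you would rather not add a dependency, here is a JDK-only sketch of roughly what IOUtils.toString(InputStream, Charset) does: it copies characters in fixed-size chunks instead of lines, so "\n", "\r\n" and "\r" are all preserved exactly. The class name, method name and 8192-char buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class ReadFully {

    // Copies every character from the stream into one String,
    // keeping line terminators exactly as they appear in the input.
    static String readFully(InputStream input, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader reader = new InputStreamReader(input, charset)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }
}
```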
Related
Say we have a file like so:
one
two
three
(but this file got encrypted)
My crypto method returns the whole file in memory, as a byte[] type.
I know byte arrays don't have a concept of "lines", that's something a Scanner (for example) could have.
I would like to traverse each line, convert it to a String and perform my operation on it, but I don't know how to:
Find lines in a byte array
Slice the original byte array to "lines" (I would convert those slices to String, to send to my other methods)
Correctly traverse a byte array, where each iteration is a new "line"
Also: do I need to consider which OS the file was created on? I know that Windows and Linux use different newline characters, and I don't want my method to work with only one format.
Edit: Following some tips from the answers here, I was able to write some code that gets the job done. I still wonder whether this code is worth keeping or whether I am doing something that could fail in the future:
byte[] decryptedBytes = doMyCrypto(fileName, accessKey);
ByteArrayInputStream byteArrInStrm = new ByteArrayInputStream(decryptedBytes);
InputStreamReader inStrmReader = new InputStreamReader(byteArrInStrm);
BufferedReader buffReader = new BufferedReader(inStrmReader);
String delimRegex = ",";
String line;
String[] values = null;
while ((line = buffReader.readLine()) != null) {
values = line.split(delimRegex);
if (Objects.equals(values[0], tableKey)) {
return values;
}
}
System.out.println(String.format("No entry with key %s in %s", tableKey, fileName));
return values;
In particular, I was advised to set the encoding explicitly, but I can't see exactly where to do that.
If you want to stream this, I'd suggest:
Create a ByteArrayInputStream to wrap your array
Wrap that in an InputStreamReader to convert binary data to text - I suggest you explicitly specify the text encoding being used
Create a BufferedReader around that to read a line at a time
Then you can just use:
String line;
while ((line = bufferedReader.readLine()) != null)
{
// Do something with the line
}
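BufferedReader.readLine() handles line breaks from all common operating systems: a line may end with "\n" (Linux/macOS), "\r\n" (Windows) or "\r", and the terminator is not included in the returned String.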
So something like this:
byte[] data = ...;
ByteArrayInputStream stream = new ByteArrayInputStream(data);
InputStreamReader streamReader = new InputStreamReader(stream, StandardCharsets.UTF_8);
BufferedReader bufferedReader = new BufferedReader(streamReader);
String line;
while ((line = bufferedReader.readLine()) != null)
{
System.out.println(line);
}
Note that in general you'd want to use try-with-resources blocks for the streams and readers - but it doesn't matter in this case, because it's just in memory.
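To check the claim about line endings, here is a small sketch that feeds readLine() text mixing "\n", "\r\n" and "\r" terminators; each one ends a line and none of them appears in the returned strings. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineEndings {

    // Splits text into lines with BufferedReader.readLine(), which accepts
    // "\n", "\r" and "\r\n" as line terminators.
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }
}
```

For example, lines("one\ntwo\r\nthree\rfour") returns [one, two, three, four].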
As Scott says, I would like to see what you came up with so we can help you adapt it to your needs.
Regarding your last comment about the OS: if you want to support multiple file types, you should consider writing separate functions for the different file extensions. As far as I know, you do need to specify in your code which file, and what type of file, you are reading.
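Applied to the code in the question's edit, the encoding goes into the InputStreamReader constructor. A sketch under two assumptions: the decrypted file is UTF-8, and a missing key should give null (the question's version returns the fields of the last line read, which is probably not intended). findRow is a hypothetical name, and the question's doMyCrypto call is left out so the method just takes the bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

class TableLookup {

    // Returns the comma-separated fields of the first line whose first field
    // equals tableKey, or null if no line matches.
    static String[] findRow(byte[] decryptedBytes, String tableKey) throws IOException {
        // The encoding is set here, on the InputStreamReader (UTF-8 is an assumption)
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(decryptedBytes), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty fields, so values is never empty
                String[] values = line.split(",", -1);
                if (Objects.equals(values[0], tableKey)) {
                    return values;
                }
            }
        }
        return null; // no match
    }
}
```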
I'm developing an app that needs to get the content of a simple .php URL and save it as a String.
The problem is that it is a very long String (VERY LONG) and it gets cut in half. Take this link as an example:
http://thuum.org/download-dev-notes-web.php
With this code:
URL notes = new URL("http://thuum.org/download-dev-notes-web.php");
BufferedReader in = new BufferedReader(new InputStreamReader(notes.openStream()));
String inputLine;
String t = "";
while ((inputLine = in.readLine()) != null)
t = inputLine;
fOut = openFileOutput("notes", MODE_PRIVATE);
fOut.write(t.getBytes());
// Added this to check its length after splitting, and it is not nearly as much as it should be
System.out.println(t.split("\\#").length);
Can someone tell me how I can download that into a String and save it to internal storage without it getting cut? Somehow it looks like it only gets the last part...
It seems you're overwriting your String t in every iteration of the while loop. Try this:
StringBuilder result = new StringBuilder();
String inputLine = "";
while ((inputLine = in.readLine()) != null) {
// readLine() strips the line terminator, so add it back
result.append(inputLine).append('\n');
}
in.close();
fOut = openFileOutput("notes", MODE_PRIVATE);
fOut.write(result.toString().getBytes());
fOut.close();
It creates a mutable StringBuilder and uses the resulting (immutable) String in the write call.
Edit: I also recommend always using curly braces to mark the body of a loop; omitting them can quickly lead to bugs. Just look at #gotofail for a recent example ;-)
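If the notes only need to be saved to internal storage, the text does not have to pass through a String at all. A sketch of copying the raw bytes from one stream to another, which also leaves line breaks and encoding untouched; in the app, the InputStream would presumably be notes.openStream() and the OutputStream the one returned by openFileOutput (an assumption, not tested on Android here). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {

    // Copies all bytes from in to out and returns how many were copied.
    // Neither stream is closed here; the caller owns them.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }
}
```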
I have a big file (about 30 MB), and here is the code I use to read data from it:
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new FileReader(file));
try {
String line = br.readLine();
while (line != null) {
sb.append(line).append("\n");
line = br.readLine();
}
} finally {
br.close();
}
Then I need to split the content I have read, so I use:
String[] inst = sb.toString().split("GO");
The problem is that sometimes a substring is longer than the maximum String length, and I can't get all the data into the string. How can I get around this?
Thanks
Use Scanner s = new Scanner(input).useDelimiter("GO"); and call s.next() to get one chunk at a time.
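A slightly fuller sketch of that suggestion: Scanner reads from the source as it goes and hands over one "GO"-separated chunk at a time, so the whole file never has to exist as one String. The class name, method name and the handler callback are hypothetical. Note that useDelimiter() takes a regular expression, so "GO" also matches inside words such as "GOAL"; a stricter pattern may be needed for real data.

```java
import java.io.Reader;
import java.util.Scanner;
import java.util.function.Consumer;

class GoSplitter {

    // Passes each "GO"-separated chunk of the input to handler, one at a time.
    // Scanner reads the Reader incrementally, so only the current chunk
    // (plus Scanner's internal buffer) needs to be in memory.
    static void forEachChunk(Reader source, Consumer<String> handler) {
        try (Scanner s = new Scanner(source).useDelimiter("GO")) { // "GO" is a regex
            while (s.hasNext()) {
                handler.accept(s.next());
            }
        }
    }
}
```

With a file, the source could be, for example, a BufferedReader over it.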
As for the why: the erroneous result may be caused by a non-contiguous heap, as the CMS collector doesn't defragment memory.
(This doesn't answer the "how to solve it" part, though.)
You could load the string in parts, i.e. using substring.
I'm trying to learn Java/Android, and right now I'm experimenting with the replaceAll method. I've found that with large text files the process gets sluggish, so I was wondering if there is a way to skip the "useless" parts of a file to get better performance. (Note: just skip them, not delete them.)
Note: I am not trying to count lines or print anything with println / System.out; I'm just replacing strings and saving the changes to the same file.
Example:
AAAA
CCCC- 9234802394819102948102948104981209381'238901'2309'129831'2381'2381'23081'23081'284091824098304982390482304981'20841'948023984129048'1489039842039481'204891'29031'923481290381'20391'294872385710239841'20391'20931'20853029573098341'290831'20893'12894093274019799919208310293810293810293810293810298'120931¿2093¿12039¿120931¿203912¿0391¿203912¿039¿12093¿12093¿12093¿12093¿12093¿1209312¿0390¿... DDDD
AAAA
CCCC- 9234802394819102948102948104981209381'238901'2309'129831'2381'2381'23081'23081'284091824098304982390482304981'20841'948023984129048'1489039842039481'204891'29031'923481290381'20391'294872385710239841'20391'20931'20853029573098341'290831'20893'12894093274019799919208310293810293810293810293810298'120931¿2093¿12039¿120931¿203912¿0391¿203912¿039¿12093¿12093¿12093¿12093¿12093¿1209312¿0390¿... DDDD
and so on, like a zillion times.
I want to replace all "AAAA" with "BBBB", but there are large portions of data between the strings I am replacing. Also, these portions always begin with "CCCC" and end with "DDDD".
Here's the code I am using to replace the string:
File file = new File("my_file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = "", oldtext = "";
while((line = reader.readLine()) != null) {
oldtext += line + "\r\n";
}
reader.close();
// Replacing "AAAA" strings
String newtext= oldtext.replaceAll("AAAA", "BBBB");
FileWriter writer = new FileWriter("my_file.txt");
writer.write(newtext);
writer.close();
I think reading every line is inefficient, especially since I won't be modifying these parts (and they make up 90% of the file).
Does anyone know a solution?
You are wasting a lot of time on this line:
oldtext += line + "\r\n";
In Java, Strings are immutable, which means you can't modify them. So when you concatenate, Java actually makes a complete copy of oldtext: for every line in your file, you recopy every line that came before into a new String. Take a look at StringBuilder for a way to build a String without these copies.
However, in your case, you do not need the whole file in memory, because you can process line by line. By moving your replaceAll and write into your loop, you can operate on each line as you read it. This will keep the memory footprint of the routine down, because you are only keeping a single line in memory.
Note that opening a FileWriter on a file truncates it, so you can't open one on the input file before you have read it; the output has to go to a file with a different name. If you want to keep the original name, you can call renameTo on the output File after closing both files.
File file = new File("my_file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
FileWriter writer = new FileWriter("my_out_file.txt");
String line;
while((line = reader.readLine()) != null) {
// Replacing "AAAA" strings on this line only
String newtext = line.replaceAll("AAAA", "BBBB");
writer.write(newtext);
// readLine() strips the line terminator, so write it back
writer.write("\r\n");
}
reader.close();
writer.close();
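To keep the original file name, one option is to write to a temporary file in the same directory and then move it over the original. A sketch using java.nio.file (Java 7+), assuming the file is UTF-8 and that "\r\n" line endings are wanted, as in the question's code; replaceInFile is a hypothetical name. Files.move with REPLACE_EXISTING stands in for File.renameTo, whose behaviour is platform-dependent and which reports failure only through its boolean return value. String.replace is used because the pattern is literal text, not a regex.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class InPlaceReplace {

    // Streams the file line by line, writes the edited lines to a temporary
    // file next to it, then replaces the original with the temporary file.
    static void replaceInFile(Path file, String target, String replacement) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "replace", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.replace(target, replacement)); // literal replacement
                writer.write("\r\n");
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

If an exception is thrown while processing, the temporary file is left behind; real code may want to delete it in that case.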
The following code seems to write only a small part of the file into the StringBuilder. Why?
Reader rdr = new BufferedReader(new InputStreamReader(new FileInputStream(...)));
StringBuilder buf = new StringBuilder();
CharBuffer cbuff = CharBuffer.allocate(1024);
while(rdr.read(cbuff) != -1){
buf.append(cbuff);
cbuff.clear();
}
rdr.close();
Some more information: the file is bigger than the CharBuffer, and I can see in the debugger that the CharBuffer is filled as expected. The only part that makes it into the StringBuilder seems to come from somewhere in the middle of the file. I am using OpenJDK 7.
I wonder why it behaves like this and how it can be fixed.
As Peter Lawrey mentioned, you need to call cbuff.flip() between the read and the append. StringBuilder.append() reads the CharBuffer from its position up to its limit, and after read() the position sits just past the last character stored, so without flip() you append the unused remainder of the buffer instead of the data you just read; when the buffer is full, that is nothing at all. A part from somewhere in the middle still gets written because the last read does not fill the buffer completely, so some "old" characters from the previous read remain between the position and the end of the buffer.
Mystery solved :-)
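For reference, a sketch of the question's loop with flip() added: flip() sets the limit to the current position and the position back to 0, so append() sees exactly the characters read() just stored, and clear() then makes the whole buffer writable for the next read. The class and method names are hypothetical; the caller is assumed to close the Reader, as the question does.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.CharBuffer;

class CharBufferRead {

    // Reads everything from rdr into a String using a reusable CharBuffer.
    static String readAll(Reader rdr) throws IOException {
        StringBuilder buf = new StringBuilder();
        CharBuffer cbuff = CharBuffer.allocate(1024);
        while (rdr.read(cbuff) != -1) {
            cbuff.flip();       // limit = end of the chars just read, position = 0
            buf.append(cbuff);  // appends the chars between position and limit
            cbuff.clear();      // position = 0, limit = capacity for the next read
        }
        return buf.toString();
    }
}
```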
Those classes have been part of the JDK for many years (the Reader classes since 1.1, CharBuffer since 1.4, StringBuilder since 5), so I doubt that any of them needs to be fixed.
Your code is a long way from the usual idiom. Was this intended as a learning exercise that has gone awry? Or did you really want to put this into an application?
Here's how I would expect to see those classes used:
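public static final String NEWLINE = System.getProperty("line.separator");

public String readContents(File f) throws IOException {
    StringBuilder builder = new StringBuilder(1024);
    BufferedReader br = null;
    try {
        br = new BufferedReader(new FileReader(f));
        String line;
        while ((line = br.readLine()) != null) {
            builder.append(line).append(NEWLINE);
        }
    } finally {
        closeQuietly(br);
    }
    return builder.toString();
}

// Closes the reader, ignoring null and any IOException thrown by close()
private static void closeQuietly(Closeable c) {
    if (c != null) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful to do here
        }
    }
}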
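On Java 7 or later, the same method can be written with try-with-resources, which closes the reader automatically, so no closeQuietly helper is needed. A sketch assuming the file is UTF-8 (FileReader always uses the platform default charset before Java 11); the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReadContents {

    static final String NEWLINE = System.getProperty("line.separator");

    // Same idiom as above, but the reader is closed by try-with-resources.
    static String readContents(File f) throws IOException {
        StringBuilder builder = new StringBuilder(1024);
        try (BufferedReader br = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                builder.append(line).append(NEWLINE);
            }
        }
        return builder.toString();
    }
}
```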