I'm relatively new to Java and I'm attempting to write a simple Android app. I have a large text file with about 3500 lines in the assets folder of my application and I need to read it into a string. I found a good example of how to do this, but I have a question about why the byte array is initialized to 1024. Wouldn't I want to initialize it to the length of my text file? Also, wouldn't I want to use char, not byte? Here is the code:
private void populateArray(){
AssetManager assetManager = getAssets();
InputStream inputStream = null;
try {
inputStream = assetManager.open("3500LineTextFile.txt");
} catch (IOException e) {
Log.e("IOException populateArray", e.getMessage());
}
String s = readTextFile(inputStream);
// Add more code here to populate array from string
}
private String readTextFile(InputStream inputStream) {
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
// Note: InputStream has no length field, so the total size is not known up front
byte buf[] = new byte[1024];
int len;
try {
while ((len = inputStream.read(buf)) != -1) {
outputStream.write(buf, 0, len);
}
outputStream.close();
inputStream.close();
} catch (IOException e) {
Log.e("IOException readTextFile", e.getMessage());
}
return outputStream.toString();
}
EDIT: Based on your suggestions, I tried this approach. Is it any better? Thanks.
private void populateArray(){
AssetManager assetManager = getAssets();
InputStream inputStream = null;
InputStreamReader iStreamReader = null; // must match the parameter type of readTextFile
try {
inputStream = assetManager.open("List.txt");
iStreamReader = new InputStreamReader(inputStream, "UTF-8");
} catch (IOException e) {
Log.e("IOException populateArray", e.getMessage());
}
String text = readTextFile(iStreamReader);
// more code here
}
private String readTextFile(InputStreamReader inputStreamReader) {
StringBuilder sb = new StringBuilder();
char buf[] = new char[2048];
int read;
try {
do {
read = inputStreamReader.read(buf, 0, buf.length);
if (read>0) {
sb.append(buf, 0, read);
}
} while (read>=0);
} catch (IOException e) {
Log.e("IOException readTextFile", e.getMessage());
}
return sb.toString();
}
This example is not good at all. It's full of bad practices (hiding exceptions, not closing streams in finally blocks, not specifying an explicit encoding, etc.). It uses a 1024-byte buffer because it has no way of knowing the length of the input stream.
Read the Java IO tutorial to learn how to read text from a file.
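As a rough illustration of what avoiding those bad practices could look like, here is a minimal sketch. The class and method names (AssetTextReader, readText) are made up for this example, and it assumes the file is UTF-8 encoded. The stream is closed by try-with-resources, the encoding is explicit, and the IOException is passed to the caller instead of being logged and ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class AssetTextReader {
    // Hypothetical helper: reads the whole stream as UTF-8 text (the encoding is an assumption).
    static String readText(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the wrapped stream, even if read() fails
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024]; // only a chunk size; it does not need to match the file length
            int read;
            while ((read = reader.read(buf)) != -1) {
                sb.append(buf, 0, read); // append only the chars that were actually read
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readText(in));
    }
}
```

In the Android case you would pass the stream returned by assetManager.open(...) instead of the ByteArrayInputStream used here for demonstration.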
Each pass of the loop reads up to 1024 bytes from the file into the buffer.
The bytes that were actually read (len of them) are then written to outputStream.
This repeats until read returns -1, at which point the whole file has been copied into outputStream.
As JB Nizet mentioned, the example is full of bad practices.
Wouldn't I want to initialize it to the length of my text file? Also, wouldn't I want to use char, not byte?
Yes, and yes ... and as other answers have said, you've picked an example with a number of errors in it.
However, there is a theoretical problem doing both; i.e. setting the buffer length to the file length and using a character buffer rather than a byte buffer. The problem is that the file size is measured in bytes, but the size of the buffer needs to be measured in characters. This is normally fine, but it is theoretically possible that you will need more characters than the file size in bytes; e.g. if the input file used a 6 bit character set and packed 4 characters into 3 bytes.
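To make the bytes-versus-characters distinction concrete, here is a small sketch for the common UTF-8 case, where the mismatch goes the other way: the text needs fewer chars than bytes, so a char buffer sized to the file length in bytes would be large enough but partly unused. The class name and sample text are just for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ByteVsCharCount {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "abc" followed by three CJK characters, written as escapes to avoid source-encoding issues
        String text = "abc\u6211\u7231\u4F60";
        System.out.println("chars: " + text.length());   // 6
        System.out.println("bytes: " + utf8Bytes(text)); // 12 (3 one-byte + 3 three-byte characters)
    }
}
```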
To read from a file I usually use a Scanner and a StringBuilder.
Scanner scan = new Scanner(new BufferedInputStream(new FileInputStream(filename)), "UTF-8");
StringBuilder sb = new StringBuilder();
while (scan.hasNextLine()) {
sb.append(scan.nextLine());
sb.append("\n");
}
scan.close();
return sb.toString();
Try to throw your exceptions instead of swallowing them. The caller must know there was a problem reading your file.
Edit: Also note that using a BufferedInputStream is important. Otherwise it will try to read byte by byte, which can be slow.
I was writing a function in Java that reads a file and returns its content as a String:
public static String ReadFromFile(String fileLocation) {
StringBuilder result = new StringBuilder();
RandomAccessFile randomAccessFile = null;
FileChannel fileChannel = null;
try {
randomAccessFile = new RandomAccessFile(fileLocation, "r");
fileChannel = randomAccessFile.getChannel();
ByteBuffer byteBuffer = ByteBuffer.allocate(10);
CharBuffer charBuffer = null;
int bytesRead = fileChannel.read(byteBuffer);
while (bytesRead != -1) {
byteBuffer.flip();
charBuffer = StandardCharsets.UTF_8.decode(byteBuffer);
result.append(charBuffer.toString());
byteBuffer.clear();
bytesRead = fileChannel.read(byteBuffer);
}
} catch (IOException ignored) {
} finally {
try {
if (fileChannel != null)
fileChannel.close();
if (randomAccessFile != null)
randomAccessFile.close();
} catch (IOException ignored) {
}
}
return result.toString();
}
As you can see from the code above, I deliberately allocate only 10 bytes in 'ByteBuffer.allocate' to make things clearer.
Now I want to read a file named "test.txt" that contains Unicode characters in Chinese, like this:
乐正绫我爱你乐正绫我爱你
Below is my test code for it:
System.out.println(ReadFromFile("test.txt"));
Expected Output in Console
乐正绫我爱你乐正绫我爱你
Actual Output in Console
乐正绫���爱你��正绫我爱你
Possible Reason
ByteBuffer only allocated 10 bytes, thus unicode characters are truncated every 10 bytes.
Attempt To Solve
Increase ByteBuffer allocated bytes to 20, I got the result below:
乐正绫我爱你��正绫我爱你
Not A Robust Solution
Allocate a very large ByteBuffer, e.g. 102400 bytes, but this is not practical for very large text files.
Question
How to solve this problem?
You can't, since you don't know how many bytes are used for each character in UTF-8 encoding, and you really don't want to rewrite that logic.
Java 11 has Files.readString(); for lower versions you can use Files.readAllBytes(), e.g.
Path path = new File(fileLocation).toPath();
String contents = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
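If the file really is too large to load in one go, the JDK's own CharsetDecoder can do the boundary handling for you, so there is nothing to rewrite. The sketch below is only one possible way to do it, not a tested drop-in replacement. The names ChunkedUtf8Decode and readUtf8 are made up, and it assumes the file is valid UTF-8. The key difference from the question's code is that bytes the decoder could not consume yet (an incomplete character at the end of the buffer) are kept with compact() instead of being thrown away with clear().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkedUtf8Decode {
    // Hypothetical helper: decodes a UTF-8 file using a small, fixed-size byte buffer.
    static String readUtf8(Path path, int bufferBytes) throws IOException {
        if (bufferBytes < 4) {
            throw new IllegalArgumentException("buffer must hold at least one full UTF-8 sequence (4 bytes)");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferBytes);
        // UTF-8 never decodes to more chars than input bytes, so this cannot overflow
        CharBuffer chars = CharBuffer.allocate(bufferBytes);
        StringBuilder result = new StringBuilder();
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            boolean eof = false;
            while (!eof) {
                eof = channel.read(bytes) == -1;
                bytes.flip();
                CoderResult cr = decoder.decode(bytes, chars, eof);
                if (cr.isError()) {
                    cr.throwException(); // malformed input: fail instead of looping or emitting garbage
                }
                bytes.compact(); // keep the bytes of an incomplete character for the next round
                chars.flip();
                result.append(chars);
                chars.clear();
            }
            decoder.flush(chars);
            chars.flip();
            result.append(chars);
        }
        return result.toString();
    }
}
```

Calling readUtf8(Paths.get("test.txt"), 10) would use the same 10-byte buffer as in the question; a character whose bytes straddle two reads is decoded once the remaining bytes arrive.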
We were given a few exercises in lab, and one of them is to convert the file transfer method from FileInputStream to BufferedInputStream. It's a client sending a GET request to a web server, which sends the requested file.
I came up with a simple solution, and I just wanted to check if it's correct.
Original code:
try {
FileInputStream fis = new FileInputStream(req);
// req, String containing file name
byte[] data = new byte [fis.available()];
fis.read(data);
out.write(data); // OutputStream out = socket.getOutputStream();
} catch (FileNotFoundException e){
new PrintStream(out).println("404 Not Found");
}
My try:
try {
BufferedInputStream bis = new BufferedInputStream (new FileInputStream(req));
byte[] data = new byte[4];
while(bis.read(data) > -1) {
out.write(data);
data = new byte[4];
}
} catch (FileNotFoundException e){
new PrintStream(out).println("404 Not Found");
}
The file is a web page named index.html, which contains a simple html page.
I have to reallocate the array every time, because at the last execution of the while loop, if the file isn't a multiple of 4 in size, the data array will contain characters from the previous execution, which are shown in the browser.
I chose 4 as data size for debugging purposes.
Output is correct.
Is this a good solution or can I do better?
There's no need to re-create the byte array each time - just overwrite it. More importantly though, you have a conceptual mistake inside your loop. Each iteration just writes the array to the stream assuming it's all valid. If you examine BufferedInputStream#read's documentation you'll see it may not read enough data to fill the entire array, and will return the number of bytes it actually read. You should use this number to limit the amount of bytes you're writing:
int len; // a variable cannot be declared inside the loop condition
while ((len = bis.read(data)) > -1) {
out.write(data, 0, len);
}
I suggest you close the file once you are done. BufferedInputStream uses an 8 KB buffer by default, which your 4-byte array effectively reduces to a much smaller one. A simpler solution is to copy 8 KB at a time and not use the added buffer:
try (InputStream in = new FileInputStream(req)) {
byte[] data = new byte[8 << 10];
for (int len; (len = in.read(data)) > -1; )
out.write(data, 0, len);
} catch (IOException e) {
out.write("404 Not Found\n".getBytes());
}
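To see why the returned length matters, here is a small self-contained sketch. The names StreamCopy and copy are invented for this example, and the buffer size is a parameter only so the effect can be shown with a tiny buffer. A 10-byte input copied through a 4-byte buffer ends with a short read of 2 bytes; writing only len bytes keeps the 2 stale bytes from the previous pass out of the output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamCopy {
    // Hypothetical helper: copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize]; // reused for every read, never re-allocated
        long total = 0;
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len); // write only the bytes this read actually delivered
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "0123456789".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(input), out, 4);
        System.out.println(copied + " bytes: " + out.toString("US-ASCII"));
    }
}
```

On Java 9 and later, in.transferTo(out) performs the same kind of loop for you.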
I need to read multiple small files and append them into a bigger single file.
Base64OutputStream baos = new Base64OutputStream(new FileOutputStream(outputFile, true));
for (String fileLocation : fileLocations) {
InputStream fis = null;
try
{
fis = new FileInputStream(new File(fileLocation));
int bytesRead = 0;
byte[] buf = new byte[65536];
while ((bytesRead=fis.read(buf)) != -1) {
if (bytesRead > 0) baos.write(buf, 0, bytesRead);
}
}
catch (Exception e) {
logger.error(e.getMessage());
}
finally{
try{
if(fis != null)
fis.close();
}
catch(Exception e){
logger.error(e.getMessage());
}
}
}
All pretty standard, but I'm finding that, unless I open a new baos per input file (include it inside the loop), all the files following the first one written by baos are wrong (incorrect output).
The questions:
I've been told that opening and closing an output stream back and forth for the same resource is not good practice. Why?
Why does using a single output stream not deliver the same result as multiple separate ones?
Perhaps the problem is that you are assuming that base64-encoding the concatenation of several files gives the same result as concatenating the base64 encoding of each file? That's not necessarily the case: base64 encodes each group of three consecutive input bytes as 4 ASCII characters, so unless you know that each file has a size that is a multiple of three, the two approaches will produce different outputs.
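A quick way to convince yourself is the following sketch, which uses java.util.Base64 rather than the Base64OutputStream from the question. The class name and sample strings are only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64ConcatDemo {
    // Base64-encodes the ASCII bytes of the given text.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Two-byte inputs: each one is padded separately, so the results differ
        System.out.println(encode("ab") + encode("cd")); // YWI=Y2Q=
        System.out.println(encode("ab" + "cd"));         // YWJjZA==

        // Three-byte inputs: group boundaries line up, so the results are identical
        System.out.println(encode("abc") + encode("def"));
        System.out.println(encode("abc" + "def"));
    }
}
```

So if each file has to be decodable on its own, each one needs its own encoder, which is what opening a new stream per file gives you.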
I have some strings that are roughly 10K characters each. There is plenty of repetition in them. They are serialized JSON objects. I'd like to easily compress them into a byte array, and uncompress them from a byte array.
How can I most easily do this? I'm looking for methods so I can do the following:
String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);
You can try
enum StringCompressor {
;
public static byte[] compress(String text) {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
OutputStream out = new DeflaterOutputStream(baos);
out.write(text.getBytes("UTF-8"));
out.close();
} catch (IOException e) {
throw new AssertionError(e);
}
return baos.toByteArray();
}
public static String decompress(byte[] bytes) {
InputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
byte[] buffer = new byte[8192];
int len;
while((len = in.read(buffer))>0)
baos.write(buffer, 0, len);
return new String(baos.toByteArray(), "UTF-8");
} catch (IOException e) {
throw new AssertionError(e);
}
}
}
Peter Lawrey's answer can be improved a bit using this less complex code for the decompress function
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
OutputStream out = new InflaterOutputStream(baos);
out.write(bytes);
out.close();
return new String(baos.toByteArray(), "UTF-8");
} catch (IOException e) {
throw new AssertionError(e);
}
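For completeness, here is a sketch that combines the two answers into one round trip, using try-with-resources and StandardCharsets instead of the "UTF-8" string. The class name DeflateRoundTrip and the sample JSON are invented for this example, and the methods declare IOException rather than wrapping it, which is a design choice of this sketch and not something the answers above prescribe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

final class DeflateRoundTrip {
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // closing the DeflaterOutputStream finishes the compressed data
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }

    static String decompress(byte[] bytes) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // InflaterOutputStream decompresses whatever is written to it
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(bytes);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("{\"name\":\"value\",\"n\":1}"); // made-up, highly repetitive JSON
        }
        String original = sb.toString();
        byte[] compressed = compress(original);
        System.out.println(original.length() + " chars -> " + compressed.length + " bytes");
        System.out.println(original.equals(decompress(compressed)));
    }
}
```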
I made a library to solve the problem of compressing generic Strings (especially short ones).
It tries to compress the String using various algorithms (plain utf-8, 5bit encoding for latin letters, huffman encoding, gzip for long Strings) and chooses the one with the shortest result (in the worst case, it will choose the utf-8 encoding, so that you never risk losing space).
I hope it may be useful; here's the link:
https://github.com/lithedream/lithestring
EDIT: I realized that your Strings are always "long". My library defaults to gzip for those sizes, so I fear I cannot do better for you.
Virtually every code example out there reads a TXT file line-by-line and stores it in a String array. I do not want line-by-line processing because I think it's an unnecessary waste of resources for my requirements: All I want to do is quickly and efficiently dump the .txt contents into a single String. The method below does the job, however with one drawback:
private static String readFileAsString(String filePath) throws java.io.IOException{
byte[] buffer = new byte[(int) new File(filePath).length()];
BufferedInputStream f = null;
try {
f = new BufferedInputStream(new FileInputStream(filePath));
f.read(buffer);
if (f != null) try { f.close(); } catch (IOException ignored) { }
} catch (IOException ignored) { System.out.println("File not found or invalid path.");}
return new String(buffer);
}
... the drawback is that the line breaks are converted into long spaces e.g. " ".
I want the line breaks to be converted from \n or \r to <br> (HTML tag) instead.
Thank you in advance.
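What about using a Scanner and adding the line breaks yourself:
// Scanner(String) would scan the literal text "sample.txt", so wrap the name in a File
java.util.Scanner sc = new java.util.Scanner(new java.io.File("sample.txt"));
StringBuilder buf = new StringBuilder();
while (sc.hasNextLine()) {
buf.append(sc.nextLine());
buf.append("<br />");
}
sc.close();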
I don't see where you get your long spaces from.
You can read directly into the buffer and then create a String from the buffer:
File f = new File(filePath);
FileInputStream fin = new FileInputStream(f);
byte[] buffer = new byte[(int) f.length()];
new DataInputStream(fin).readFully(buffer);
fin.close();
String s = new String(buffer, "UTF-8");
You could add this code:
return new String(buffer).replaceAll("(\r\n|\r|\n|\n\r)", "<br>");
Is this what you are looking for?
The code will read the file contents as they appear in the file - including line breaks.
If you want to change the breaks into something else like displaying in html etc, you will either need to post process it or do it by reading the file line by line. Since you do not want the latter, you can replace your return by following which should do the conversion -
return (new String(buffer)).replaceAll("\r\n|\r|\n", "<br>");
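As a small check of the post-processing idea, the sketch below converts all three common line-break styles (Windows \r\n, old Mac \r, Unix \n). The names LineBreaks and toHtmlBreaks are made up for this example.

```java
final class LineBreaks {
    // Replaces every line break with an HTML <br> tag.
    static String toHtmlBreaks(String text) {
        // "\r\n" is listed first so a Windows line ending becomes one <br>, not two
        return text.replaceAll("\\r\\n|\\r|\\n", "<br>");
    }

    public static void main(String[] args) {
        System.out.println(toHtmlBreaks("a\r\nb\nc\rd")); // a<br>b<br>c<br>d
    }
}
```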
StringBuilder sb = new StringBuilder();
try {
InputStream is = getAssets().open("myfile.txt");
byte[] bytes = new byte[1024];
int numRead = 0;
try {
while((numRead = is.read(bytes)) != -1)
sb.append(new String(bytes, 0, numRead));
}
catch(IOException e) {
}
is.close();
}
catch(IOException e) {
}
Your resulting String is: String result = sb.toString();
Then replace whatever you want in this result.
I agree with the general approach by @Sanket Patel, but using Commons I/O you would likely want FileUtils.
So your code would look like:
String myString = FileUtils.readFileToString(new File(filePath));
There is also another version to specify an alternate character encoding.
You should try org.apache.commons.io.IOUtils.toString(InputStream is) to get the file content as a String. You can pass it the InputStream object you get from
getAssets().open("xml2json.txt") // Android API; returns an InputStream
in your Activity. To get the String, use this:
String xml = IOUtils.toString(getAssets().open("xml2json.txt"));
So, in general:
String xml = IOUtils.toString(inputStream); // pass your InputStream object here