Checksum doesn't match when computed string contains accented characters - java

First: I have a string which contains an accented character.
Second: I calculate its checksum.
private static String checkSumInStream(String Str, String checksumAlgorithm) throws Exception {
    InputStream stream = new ByteArrayInputStream(Str.getBytes());
    MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
    InputStream input = null;
    StringBuffer sb = new StringBuffer();
    try {
        input = stream;
        byte[] buffer = new byte[8192];
        do {
            int read = input.read(buffer);
            if (read <= 0)
                break;
            digest.update(buffer, 0, read);
        } while (true);
        byte[] sum = digest.digest();
        for (int i = 0; i < sum.length; i++) {
            sb.append(Integer.toString((sum[i] & 0xff) + 0x100, 16).substring(1));
        }
    } catch (IOException io) {
    } finally {
        if (input != null)
            input.close();
    }
    return sb.toString();
}
Then I write the string to a text file and recalculate the checksum of the file:
private String checkSum(File file, String checksumAlgorithm) throws Exception {
    MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
    InputStream input = null;
    input = new FileInputStream(file);
    byte[] buffer = new byte[8192];
    do {
        int read = input.read(buffer);
        if (read <= 0)
            break;
        digest.update(buffer, 0, read);
    } while (true);
    input.close();
    byte[] sum = digest.digest();
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < sum.length; i++) {
        sb.append(Integer.toString((sum[i] & 0xff) + 0x100, 16).substring(1));
    }
    return sb.toString();
}
--> Result: the checksum computed from the stream and the checksum of the file do not match when the text contains an accented character.

How do you write the String to a file? You must be very careful to do that in a way that is equivalent to how you read it back from the file.
In your case:
OutputStream out = new FileOutputStream(myfile);
out.write(str.getBytes());
out.close();
Then it should work. But you need to keep in mind that str.getBytes() is not a safe method to use when you write to files, because it uses the platform default encoding for your characters. If you send such a file to some other place and use it there, you may be reading it back with the wrong encoding.
And it's possible that your platform default encoding doesn't even support accented characters! (But if you write and read files in exactly the same way, then you should get exactly the same result, so this wouldn't be the cause of your problem)
The best thing to do is to use the UTF-8 character encoding.
Wherever you used str.getBytes(), replace it with str.getBytes("UTF-8"), or with str.getBytes(Charset.forName("UTF-8")) if you want to avoid having to catch UnsupportedEncodingException [even though every Java implementation is required to support the UTF-8 encoding; it's annoying...].
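A minimal sketch of that advice, reusing the names from the question and the answer above (str, myfile, checksumAlgorithm): derive the UTF-8 bytes once, then use them both for the file and for the digest, so the two checksums cannot diverge regardless of the platform default encoding.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Compute the checksum and write the file from the exact same UTF-8 bytes.
static String writeAndChecksum(String str, File myfile, String checksumAlgorithm)
        throws IOException, NoSuchAlgorithmException {
    byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = new FileOutputStream(myfile)) {
        out.write(utf8Bytes);
    }
    MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
    StringBuilder sb = new StringBuilder();
    for (byte b : digest.digest(utf8Bytes)) {
        sb.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
    }
    return sb.toString();
}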

Related

Decryption only yields one correct line after encrypting line by line using RC4 algorithm

I have to encrypt a file line by line using the RC4 algorithm.
Encrypting the whole file and decrypting the whole file yields the original which is fine.
When I attempt to read the file one line at a time, encrypt it, and then write the encrypted line to the file, decryption of the resulting file yields just one correct line, which is the first line of the original file.
I have tried to read the file and feed it to the RC4 routine using a byte array whose size is a multiple of the key length, but the results were the same. Here is my attempt:
try
{
    BufferedReader br = new BufferedReader(new FileReader(fileToEncrypt));
    FileOutputStream fos = new FileOutputStream("C:\\Users\\nikaselo\\Documents\\Encryption\\encrypted.csv", true);
    File file = new File("C:\\Users\\nikaselo\\Documents\\Encryption\\encrypted.csv");
    String line;
    // encrypt
    while ((line = br.readLine()) != null)
    {
        byte[] encrypt = fed.RC4(line.getBytes(), pwd);
        if (encrypt != null) fos.write(encrypt);
        fos.flush();
    }
    fos.close();
    // test decrypt
    FileInputStream fis = new FileInputStream(file);
    byte[] input = new byte[512];
    int bytesRead;
    while ((bytesRead = fis.read(input)) != -1)
    {
        byte[] de = fed.RC4(input, pwd);
        String result = new String(de);
        System.out.println(result);
    }
}
catch (Exception ex)
{
    ex.printStackTrace();
}
And here is my RC4 function:
public byte[] RC4(byte[] Str, String Pwd) throws Exception
{
    int[] Sbox = new int[256];
    int A, B, c, Tmp;
    byte[] Key = {};
    byte[] ByteArray = {};
    // KEY
    if ((Pwd.length() == 0 || Str.length == 0))
    {
        byte[] arr = {};
        return arr;
    }
    if (Pwd.length() > 256)
    {
        Key = Pwd.substring(0, 256).getBytes();
    }
    else
    {
        Key = Pwd.getBytes();
    }
    // String
    for (A = 0; A <= 255; A++)
    {
        Sbox[A] = A;
    }
    A = B = c = 0;
    for (A = 0; A <= 255; A++)
    {
        B = (B + Sbox[A] + Key[A % Pwd.length()]) % 256;
        Tmp = Sbox[A];
        Sbox[A] = Sbox[B];
        Sbox[B] = Tmp;
    }
    A = B = c = 0;
    ByteArray = Str;
    for (A = 0; A <= Str.length - 1; A++)
    {
        B = (B + 1) % 256;
        c = (c + Sbox[B]) % 256;
        Tmp = Sbox[B];
        Sbox[B] = Sbox[c];
        Sbox[c] = Tmp;
        ByteArray[A] = (byte) (ByteArray[A] ^ (Sbox[(Sbox[B] + Sbox[c]) % 256]));
    }
    return ByteArray;
}
Running this gives me one clean line and the rest is just unreadable.
You are encrypting line by line, but you are trying to decrypt in 512 bytes blocks.
Your options, as I see them, are:
1. Encrypt and decrypt in fixed-size blocks.
2. Pad each line out to 512 bytes (and split lines that are longer than 512 bytes).
3. Introduce a delimiter. This is tricky because potentially any delimiter could appear in the ciphertext, so you should Base64-encode each encrypted line and separate the lines with line feeds.
Option 1 is probably the easiest (and the one used in real encryption), but if you have to work line by line, I would go with option 3. It does introduce a vulnerability, but this is RC4, which is no longer considered secure anyway; a sketch of option 3 follows below.
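A rough sketch of option 3, reusing the question's fed.RC4(byte[], String) routine and its pwd variable (both taken from the question; the file paths are placeholders, and the surrounding method is assumed to declare throws Exception, as fed.RC4 does). Each encrypted line is Base64-encoded and written on its own line, so the decryption side can recover the exact ciphertext boundaries.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.Base64;

// Encrypt: one Base64-encoded ciphertext per output line.
try (BufferedReader br = new BufferedReader(new FileReader("plain.csv"));
     PrintWriter pw = new PrintWriter(new FileWriter("encrypted.csv"))) {
    String line;
    while ((line = br.readLine()) != null) {
        byte[] cipher = fed.RC4(line.getBytes(), pwd);
        pw.println(Base64.getEncoder().encodeToString(cipher));
    }
}

// Decrypt: read each line, Base64-decode it, and run RC4 again (RC4 is symmetric).
try (BufferedReader br = new BufferedReader(new FileReader("encrypted.csv"))) {
    String line;
    while ((line = br.readLine()) != null) {
        byte[] cipher = Base64.getDecoder().decode(line);
        System.out.println(new String(fed.RC4(cipher, pwd)));
    }
}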

Manual HTTP client not writing to disk properly (Java)

I am trying to build a manual HTTP client (using sockets) along with a cache, and I can't seem to figure out why the files are not saving to disk properly. It works pretty well for HTML files, but doesn't seem to work for file types that are not text-based, like .gif. Could anyone tell me why? I am quite new to the HTTP protocol and socket programming in general.
The loop to grab the response.
InputStream inputStream = socket.getInputStream();
PrintWriter outputStream = new PrintWriter(socket.getOutputStream());

ArrayList<Byte> dataIn = new ArrayList<Byte>();
ArrayList<String> stringData = new ArrayList<String>();

// Indices to show the location of certain lines in arrayList
int blankIndex = 8;
int lastModIndex = 0;
int byteBlankIndex = 0;

try
{
    // Get last modified date
    long lastMod = getLastModified(url);
    Date d = new Date(lastMod);

    // Construct the get request
    outputStream.print("GET " + "/" + pathName + " HTTP/1.1\r\n");
    outputStream.print("If-Modified-Since: " + ft.format(d) + "\r\n");
    outputStream.print("Host: " + hostString + "\r\n");
    outputStream.print("\r\n");
    outputStream.flush();

    // Booleans to prevent duplicates, only need first occurrences of key strings
    boolean blankDetected = false;
    boolean lastModDetected = false;

    // Keep track of current index
    int count = 0;
    int byteCount = 0;

    // While loop to read response
    String buff = "";
    byte t;
    while ((t = (byte) inputStream.read()) != -1)
    {
        dataIn.add(t);

        // Check for key lines
        char x = (char) t;
        buff = buff + x;

        // For the first blank line (signaling the end of the header)
        if (x == '\n')
        {
            stringData.add(buff);
            if (buff.equals("\r\n") && !blankDetected)
            {
                blankDetected = true;
                blankIndex = count;
                byteBlankIndex = byteCount + 2;
            }
            // For the last modified line
            if (buff.contains("Last-Modified:") && !lastModDetected)
            {
                lastModDetected = true;
                lastModIndex = count;
            }
            buff = "";
            count++;
        }
        // Increment count
        byteCount++;
    }
}
The code to parse through the response and write the file to disk:
String catalogKey = hostString + "/" + pathName;

// Get the directory sequence to make
String directoryPath = catalogKey.substring(0, catalogKey.lastIndexOf("/") + 1);

// Make the directory sequence if possible, ignore the boolean value that results
boolean ignoreThisBooleanVal = new File(directoryPath).mkdirs();

// Setup output file, and then write the contents of dataIn (excluding header) to the file
PrintWriter output = new PrintWriter(new FileWriter(new File(catalogKey)), true);
for (int i = byteBlankIndex + 1; i < dataIn.size(); i++)
{
    output.print(new String(new byte[]{ (byte) dataIn.get(i) }, StandardCharsets.UTF_8));
}
output.close();
byte t;
while ( (t = (byte) inputStream.read()) != -1)
The problem is here. It should read:
int t;
while ( (t = inputStream.read()) != -1)
{
byte b = (byte)t;
// use b from now on in the loop.
The issue is that a byte of 0xff in the input will be returned to the int as 0xff, but to the byte as -1, so you are unable to distinguish it from end of stream.
And you should use a FileOutputStream, not a FileWriter, and you should not accumulate potentially binary data into a String or StringBuffer or anything to do with char. As soon as you've got to the end of the header you should open a FileOutputStream and just start copying bytes. Use buffered streams to make all this more efficient.
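A sketch of that approach, reusing the question's socket and catalogKey variables (everything else is hypothetical): consume the header bytes line by line until the blank line, then copy the remaining raw bytes straight to a FileOutputStream without ever converting them to chars.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

InputStream in = new BufferedInputStream(socket.getInputStream());

// Consume the header as bytes, line by line; an empty line ends the header.
StringBuilder headerLine = new StringBuilder();
int c;
while ((c = in.read()) != -1) {
    if (c == '\n') {
        String line = headerLine.toString();
        if (line.isEmpty() || line.equals("\r")) {
            break; // blank line: the body starts right after this
        }
        headerLine.setLength(0);
    } else {
        headerLine.append((char) c);
    }
}

// Copy the body verbatim to disk; no char conversion, so binary files (.gif etc.) survive intact.
try (OutputStream out = new BufferedOutputStream(new FileOutputStream(catalogKey))) {
    byte[] buf = new byte[8192];
    int read;
    while ((read = in.read(buf)) != -1) {
        out.write(buf, 0, read);
    }
}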
Not much point in any of these given that HttpURLConnection already exists.

String invalid length after writing to StringBuilder and ByteArrayOutputStream from FileInputStream, issue with "null characters"

The goal is to read a file name from a file. The name field is a maximum of 100 bytes, and the actual name is the file name padded with null bytes.
Here is what it looks like in GNU nano, where .PKGINFO is the valid file name and the ^# characters represent the null bytes.
Here is what I tried with StringBuilder:
package falken;

import java.io.*;

public class Testing {

    public Testing() {
        try {
            FileInputStream tarIn = new FileInputStream("/home/gala/falken_test/test.tar");

            final int byteOffset = 0;
            final int readBytesLength = 100;

            StringBuilder stringBuilder = new StringBuilder();

            for (int bytesRead = 1, n, total = 0; (n = tarIn.read()) != -1 && total < readBytesLength; bytesRead++) {
                if (bytesRead > byteOffset) {
                    stringBuilder.append((char) n);
                    total++;
                }
            }

            String out = stringBuilder.toString();
            System.out.println(">" + out + "<");
            System.out.println(out.length());
        } catch (Exception e) {
            /*
             * This is a pokemon catch not used in final code
             */
            e.printStackTrace();
        }
    }
}
But it gives an invalid String length of 100, while the output in IntelliJ shows the correct string within the >< signs.
>.PKGINFO<
100
Process finished with exit code 0
But when I paste it here on Stack Overflow I get the correct string followed by unknown "null characters", and its size is actually 100.
>.PKGINFO <
What regex can I use to get rid of the characters after the valid file name?
The file I am reading is ASCII-encoded.
I also tried ByteArrayOutputStream, with the same result
package falken;

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Testing {

    public Testing() {
        try {
            FileInputStream tarIn = new FileInputStream("/home/gala/falken_test/test.tar");

            final int byteOffset = 0;
            final int readBytesLength = 100;

            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

            for (int bytesRead = 1, n, total = 0; (n = tarIn.read()) != -1 && total < readBytesLength; bytesRead++) {
                if (bytesRead > byteOffset) {
                    byteArrayOutputStream.write(n);
                    total++;
                }
            }

            String out = byteArrayOutputStream.toString();
            System.out.println(">" + out + "<");
            System.out.println(out.length());
        } catch (Exception e) {
            /*
             * This is a pokemon catch not used in final code
             */
            e.printStackTrace();
        }
    }
}
What could be the issue here?
Well, it seems to be reading the null characters as actual characters, spaces in fact. If possible, read the filename and then cut out the null characters. In your case, something like data.trim() or data2 = data.substring(0, data.length() - 1) would do it.
You need to stop appending to the string buffer once you read the first null character from the file.
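For example, a minimal fix on top of the code in the question, using the out variable that already holds the 100 characters read: cut the string at the first null character, or strip the trailing nulls with a regex.
// Keep only the part before the first null character.
int nul = out.indexOf("\0");
String name = (nul >= 0) ? out.substring(0, nul) : out;

// Regex alternative: strip all trailing null characters.
String nameViaRegex = out.replaceAll("\0+$", "");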
You seem to want to read a tar archive; have a look at the following code, which should get you started.
byte[] buffer = new byte[500]; // POSIX tar header is 500 bytes

FileInputStream is = new FileInputStream("test.tar");
int read = is.read(buffer);

// check number of bytes read; don't bother if not at least the whole
// header has been read
if (read == buffer.length) {
    // search for first null byte; this is the end of the name
    int offset = 0;
    while (offset < 100 && buffer[offset] != 0) {
        offset++;
    }
    // create string from byte buffer using ASCII as the encoding (other
    // encodings are not supported by tar)
    String name = new String(buffer, 0, offset, StandardCharsets.US_ASCII);
    System.out.println("'" + name + "'");
}
is.close();
You really shouldn't use trim() on the filename; this will break whenever you encounter a filename with leading or trailing blanks.

Servlet getContentLength() returns > 0 but getInputStream().available() returns 0 [duplicate]

How do I read an entire InputStream into a byte array?
You can use Apache Commons IO to handle this and similar tasks.
The IOUtils type has a static method to read an InputStream and return a byte[].
InputStream is;
byte[] bytes = IOUtils.toByteArray(is);
Internally this creates a ByteArrayOutputStream and copies the bytes to the output, then calls toByteArray(). It handles large files by copying the bytes in blocks of 4KiB.
You need to read each byte from your InputStream and write it to a ByteArrayOutputStream.
You can then retrieve the underlying byte array by calling toByteArray():
InputStream is = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] data = new byte[16384];
while ((nRead = is.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
}
return buffer.toByteArray();
Finally, after twenty years, there’s a simple solution without the need for a 3rd party library, thanks to Java 9:
InputStream is;
…
byte[] array = is.readAllBytes();
Note also the convenience methods readNBytes(byte[] b, int off, int len) and transferTo(OutputStream) addressing recurring needs.
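For instance, a small illustrative sketch (the stream and buffer size are arbitrary) of readNBytes, which blocks until the requested number of bytes has been read or the stream ends:
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Java 9+: read up to the first 16 bytes of a stream.
InputStream is = new ByteArrayInputStream("example data".getBytes());
byte[] head = new byte[16];
int read = is.readNBytes(head, 0, head.length); // returns the number of bytes actually read
System.out.println(read + " bytes read");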
Use vanilla Java's DataInputStream and its readFully Method (exists since at least Java 1.4):
...
byte[] bytes = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(bytes);
...
There are some other flavors of this method, but I use this all the time for this use case.
If you happen to use Google Guava, it'll be as simple as using ByteStreams:
byte[] bytes = ByteStreams.toByteArray(inputStream);
Safe solution (close streams correctly):
Java 9 and newer:
final byte[] bytes;
try (inputStream) {
bytes = inputStream.readAllBytes();
}
Java 8 and older:
public static byte[] readAllBytes(InputStream inputStream) throws IOException {
final int bufLen = 4 * 0x400; // 4KB
byte[] buf = new byte[bufLen];
int readLen;
IOException exception = null;
try {
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
while ((readLen = inputStream.read(buf, 0, bufLen)) != -1)
outputStream.write(buf, 0, readLen);
return outputStream.toByteArray();
}
} catch (IOException e) {
exception = e;
throw e;
} finally {
if (exception == null) inputStream.close();
else try {
inputStream.close();
} catch (IOException e) {
exception.addSuppressed(e);
}
}
}
Kotlin (when Java 9+ isn't accessible):
@Throws(IOException::class)
fun InputStream.readAllBytes(): ByteArray {
val bufLen = 4 * 0x400 // 4KB
val buf = ByteArray(bufLen)
var readLen: Int = 0
ByteArrayOutputStream().use { o ->
this.use { i ->
while (i.read(buf, 0, bufLen).also { readLen = it } != -1)
o.write(buf, 0, readLen)
}
return o.toByteArray()
}
}
To avoid nested use see here.
Scala (when Java 9+ isn't accessible) (by @Joan, thanks):
def readAllBytes(inputStream: InputStream): Array[Byte] =
Stream.continually(inputStream.read).takeWhile(_ != -1).map(_.toByte).toArray
As always, also Spring framework (spring-core since 3.2.2) has something for you: StreamUtils.copyToByteArray()
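Usage is a one-liner (assuming spring-core is on the classpath; the method throws IOException):
import org.springframework.util.StreamUtils;

byte[] bytes = StreamUtils.copyToByteArray(inputStream);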
public static byte[] getBytesFromInputStream(InputStream is) throws IOException {
ByteArrayOutputStream os = new ByteArrayOutputStream();
byte[] buffer = new byte[0xFFFF];
for (int len = is.read(buffer); len != -1; len = is.read(buffer)) {
os.write(buffer, 0, len);
}
return os.toByteArray();
}
In case someone is still looking for a solution without a dependency, and if you have a file:
DataInputStream
byte[] data = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(data);
dis.close();
ByteArrayOutputStream
InputStream is = new FileInputStream(file);
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] data = new byte[(int) file.length()];
while ((nRead = is.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
}
RandomAccessFile
RandomAccessFile raf = new RandomAccessFile(file, "r");
byte[] data = new byte[(int) raf.length()];
raf.readFully(data);
Do you really need the image as a byte[]? What exactly do you expect in the byte[] - the complete content of an image file, encoded in whatever format the image file is in, or RGB pixel values?
Other answers here show you how to read a file into a byte[]. Your byte[] will contain the exact contents of the file, and you'd need to decode that to do anything with the image data.
Java's standard API for reading (and writing) images is the ImageIO API, which you can find in the package javax.imageio. You can read in an image from a file with just a single line of code:
BufferedImage image = ImageIO.read(new File("image.jpg"));
This will give you a BufferedImage, not a byte[]. To get at the image data, you can call getRaster() on the BufferedImage. This will give you a Raster object, which has methods to access the pixel data (it has several getPixel() / getPixels() methods).
Lookup the API documentation for javax.imageio.ImageIO, java.awt.image.BufferedImage, java.awt.image.Raster etc.
ImageIO supports a number of image formats by default: JPEG, PNG, BMP, WBMP and GIF. It's possible to add support for more formats (you'd need a plug-in that implements the ImageIO service provider interface).
See also the following tutorial: Working with Images
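If what you actually want is a byte[] holding an encoded image (rather than raw pixel data), you can also go the other way and re-encode a BufferedImage into memory. A small sketch using only standard ImageIO calls (the file name and output format are placeholders):
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import javax.imageio.ImageIO;

// Read the image, then encode it to PNG into an in-memory byte array.
BufferedImage image = ImageIO.read(new File("image.jpg"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(image, "png", baos);
byte[] pngBytes = baos.toByteArray();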
If you don't want to use the Apache commons-io library, this snippet is taken from the sun.misc.IOUtils class. It's nearly twice as fast as the common implementation using ByteBuffers:
public static byte[] readFully(InputStream is, int length, boolean readAll)
throws IOException {
byte[] output = {};
if (length == -1) length = Integer.MAX_VALUE;
int pos = 0;
while (pos < length) {
int bytesToRead;
if (pos >= output.length) { // Only expand when there's no room
bytesToRead = Math.min(length - pos, output.length + 1024);
if (output.length < pos + bytesToRead) {
output = Arrays.copyOf(output, pos + bytesToRead);
}
} else {
bytesToRead = output.length - pos;
}
int cc = is.read(output, pos, bytesToRead);
if (cc < 0) {
if (readAll && length != Integer.MAX_VALUE) {
throw new EOFException("Detect premature EOF");
} else {
if (output.length != pos) {
output = Arrays.copyOf(output, pos);
}
break;
}
}
pos += cc;
}
return output;
}
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
while (true) {
int r = in.read(buffer);
if (r == -1) break;
out.write(buffer, 0, r);
}
byte[] ret = out.toByteArray();
@Adamski: You can avoid the buffer entirely.
Code copied from http://www.exampledepot.com/egs/java.io/File2ByteArray.html (yes, it is very verbose, but it needs half as much memory as the other solution).
// Returns the contents of the file in a byte array.
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
// Get the size of the file
long length = file.length();
// You cannot create an array using a long type.
// It needs to be an int type.
// Before converting to an int type, check
// to ensure that file is not larger than Integer.MAX_VALUE.
if (length > Integer.MAX_VALUE) {
// File is too large
}
// Create the byte array to hold the data
byte[] bytes = new byte[(int)length];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}
// Close the input stream and return bytes
is.close();
return bytes;
}
Input Stream is ...
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int next = in.read();
while (next > -1) {
bos.write(next);
next = in.read();
}
bos.flush();
byte[] result = bos.toByteArray();
bos.close();
Java 9 will give you finally a nice method:
InputStream in = ...;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
in.transferTo( bos );
byte[] bytes = bos.toByteArray();
We are seeing some delay for a few AWS transactions while converting an S3 object to a byte array.
Note: the S3 object is a PDF document (max size is 3 MB).
We are using option #1 (org.apache.commons.io.IOUtils) to convert the S3 object to a byte array. We have noticed that S3 provides a built-in IOUtils method to convert the S3 object to a byte array; please confirm which is the best way to convert the S3 object to a byte array and avoid the delay.
Option #1:
import org.apache.commons.io.IOUtils;
is = s3object.getObjectContent();
content =IOUtils.toByteArray(is);
Option #2:
import com.amazonaws.util.IOUtils;
is = s3object.getObjectContent();
content =IOUtils.toByteArray(is);
Also let me know if there is any other, better way to convert the S3 object to a byte array.
I know it's too late, but here, I think, is a cleaner solution that's more readable...
/**
 * Method converts an {@link InputStream} Object into a byte[] array.
 *
 * @param stream the {@link InputStream} Object.
 * @return the byte[] array representation of the received {@link InputStream} Object.
 * @throws IOException if an error occurs.
 */
public static byte[] streamToByteArray(InputStream stream) throws IOException {
byte[] buffer = new byte[1024];
ByteArrayOutputStream os = new ByteArrayOutputStream();
int line = 0;
// read bytes from stream, and store them in buffer
while ((line = stream.read(buffer)) != -1) {
// Writes bytes from byte array (buffer) into output stream.
os.write(buffer, 0, line);
}
stream.close();
os.flush();
os.close();
return os.toByteArray();
}
I tried to edit @numan's answer with a fix for writing garbage data, but the edit was rejected. While this short piece of code is nothing brilliant, I can't see any better answer. Here's what makes the most sense to me:
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[1024]; // you can configure the buffer size
int length;
while ((length = in.read(buffer)) != -1) out.write(buffer, 0, length); //copy streams
in.close(); // call this in a finally block
byte[] result = out.toByteArray();
By the way, ByteArrayOutputStream does not need to be closed; the try/finally constructs are omitted for readability.
See the InputStream.available() documentation:
It is particularly important to realize that you must not use this method to size a container and assume that you can read the entirety of the stream without needing to resize the container. Such callers should probably write everything they read to a ByteArrayOutputStream and convert that to a byte array. Alternatively, if you're reading from a file, File.length returns the current length of the file (though assuming the file's length can't change may be incorrect, reading a file is inherently racy).
Wrap it in a DataInputStream. If that is off the table for some reason, just use read to hammer on it until it gives you -1 or the entire block you asked for.
public int readFully(InputStream in, byte[] data) throws IOException {
int offset = 0;
int bytesRead;
boolean read = false;
while ((bytesRead = in.read(data, offset, data.length - offset)) != -1) {
read = true;
offset += bytesRead;
if (offset >= data.length) {
break;
}
}
return (read) ? offset : -1;
}
Java 8 way (thanks to BufferedReader and Adam Bien)
private static byte[] readFully(InputStream input) throws IOException {
try (BufferedReader buffer = new BufferedReader(new InputStreamReader(input))) {
return buffer.lines().collect(Collectors.joining("\n")).getBytes(<charset_can_be_specified>);
}
}
Note that this solution wipes carriage return ('\r') and can be inappropriate.
Another case for getting a correct byte array via a stream: after sending a request to the server and waiting for the response.
/**
* Begin setup TCP connection to PC app
* to open integrate connection between mobile app and pc app (or mobile app)
*/
mSocket = new Socket(IP, port);
// mSocket.setSoTimeout(30000);
DataOutputStream mDos = new DataOutputStream(mSocket.getOutputStream());
String str = "MobileRequest#" + params[0] + "#<EOF>";
mDos.write(str.getBytes());
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
/* Since data are accepted as byte, all of them will be collected in the
following byte array which initialised with accepted data length. */
DataInputStream mDis = new DataInputStream(mSocket.getInputStream());
byte[] data = new byte[mDis.available()];
// Collecting data into byte array
for (int i = 0; i < data.length; i++)
data[i] = mDis.readByte();
// Converting collected data in byte array into String.
String RESPONSE = new String(data);
You're doing an extra copy if you use ByteArrayOutputStream. If you know the length of the stream before you start reading it (e.g. the InputStream is actually a FileInputStream, and you can call file.length() on the file, or the InputStream is a zipfile entry InputStream, and you can call zipEntry.length()), then it's far better to write directly into the byte[] array -- it uses half the memory, and saves time.
// Read the file contents into a byte[] array
byte[] buf = new byte[inputStreamLength];
int bytesRead = Math.max(0, inputStream.read(buf));
// If needed: for safety, truncate the array if the file may somehow get
// truncated during the read operation
byte[] contents = bytesRead == inputStreamLength ? buf
: Arrays.copyOf(buf, bytesRead);
N.B. the last line above deals with files getting truncated while the stream is being read, if you need to handle that possibility, but if the file gets longer while the stream is being read, the contents in the byte[] array will not be lengthened to include the new file content, the array will simply be truncated to the old length inputStreamLength.
I use this.
public static byte[] toByteArray(InputStream is) throws IOException {
ByteArrayOutputStream output = new ByteArrayOutputStream();
try {
byte[] b = new byte[4096];
int n = 0;
while ((n = is.read(b)) != -1) {
output.write(b, 0, n);
}
return output.toByteArray();
} finally {
output.close();
}
}
This is my copy-paste version:
@SuppressWarnings("empty-statement")
public static byte[] inputStreamToByte(InputStream is) throws IOException {
if (is == null) {
return null;
}
// Define a size if you have an idea of it.
ByteArrayOutputStream r = new ByteArrayOutputStream(2048);
byte[] read = new byte[512]; // Your buffer size.
for (int i; -1 != (i = is.read(read)); r.write(read, 0, i));
is.close();
return r.toByteArray();
}
Java 7 and later:
import sun.misc.IOUtils;
...
InputStream in = ...;
byte[] buf = IOUtils.readFully(in, -1, false);
You can try Cactoos:
byte[] array = new BytesOf(stream).bytes();
Here is an optimized version, that tries to avoid copying data bytes as much as possible:
private static byte[] loadStream (InputStream stream) throws IOException {
int available = stream.available();
int expectedSize = available > 0 ? available : -1;
return loadStream(stream, expectedSize);
}
private static byte[] loadStream (InputStream stream, int expectedSize) throws IOException {
int basicBufferSize = 0x4000;
int initialBufferSize = (expectedSize >= 0) ? expectedSize : basicBufferSize;
byte[] buf = new byte[initialBufferSize];
int pos = 0;
while (true) {
if (pos == buf.length) {
int readAhead = -1;
if (pos == expectedSize) {
readAhead = stream.read(); // test whether EOF is at expectedSize
if (readAhead == -1) {
return buf;
}
}
int newBufferSize = Math.max(2 * buf.length, basicBufferSize);
buf = Arrays.copyOf(buf, newBufferSize);
if (readAhead != -1) {
buf[pos++] = (byte)readAhead;
}
}
int len = stream.read(buf, pos, buf.length - pos);
if (len < 0) {
return Arrays.copyOf(buf, pos);
}
pos += len;
}
}
Solution in Kotlin (will work in Java too, of course), which includes both cases of when you know the size or not:
fun InputStream.readBytesWithSize(size: Long): ByteArray? {
return when {
size < 0L -> this.readBytes()
size == 0L -> ByteArray(0)
size > Int.MAX_VALUE -> null
else -> {
val sizeInt = size.toInt()
val result = ByteArray(sizeInt)
readBytesIntoByteArray(result, sizeInt)
result
}
}
}
fun InputStream.readBytesIntoByteArray(byteArray: ByteArray, bytesToRead: Int = byteArray.size) {
var offset = 0
while (true) {
val read = this.read(byteArray, offset, bytesToRead - offset)
if (read == -1)
break
offset += read
if (offset >= bytesToRead)
break
}
}
If you know the size, this saves you from briefly using double the memory compared to the other solutions. That's because otherwise you have to read the entire stream to the end and then convert it to a byte array (similar to an ArrayList that you convert to a plain array).
So, if you are on Android, for example, and you have some Uri to handle, you can try to get the size using this:
fun getStreamLengthFromUri(context: Context, uri: Uri): Long {
context.contentResolver.query(uri, arrayOf(MediaStore.MediaColumns.SIZE), null, null, null)?.use {
if (!it.moveToNext())
return#use
val fileSize = it.getLong(it.getColumnIndex(MediaStore.MediaColumns.SIZE))
if (fileSize > 0)
return fileSize
}
//if you wish, you can also get the file-path from the uri here, and then try to get its size, using this: https://stackoverflow.com/a/61835665/878126
FileUtilEx.getFilePathFromUri(context, uri, false)?.use {
val file = it.file
val fileSize = file.length()
if (fileSize > 0)
return fileSize
}
context.contentResolver.openInputStream(uri)?.use { inputStream ->
if (inputStream is FileInputStream)
return inputStream.channel.size()
else {
var bytesCount = 0L
while (true) {
val available = inputStream.available()
if (available == 0)
break
val skip = inputStream.skip(available.toLong())
if (skip < 0)
break
bytesCount += skip
}
if (bytesCount > 0L)
return bytesCount
}
}
return -1L
}
You can use the Cactoos library, which provides reusable object-oriented Java components.
OOP is emphasized by this library, so there are no static methods, NULLs, and so on, only real objects and their contracts (interfaces).
A simple operation like reading an InputStream can be performed like this:
final InputStream input = ...;
final Bytes bytes = new BytesOf(input);
final byte[] array = bytes.asBytes();
Assert.assertArrayEquals(
array,
new byte[]{65, 66, 67}
);
Having a dedicated type Bytes for working with the byte[] data structure enables us to use OOP tactics to solve the task at hand, something that a procedural "utility" method forbids us to do.
For example, suppose you need to encode the bytes you've read from this InputStream to Base64.
In this case you would use the Decorator pattern and wrap the Bytes object within an implementation for Base64.
Cactoos already provides such an implementation:
final Bytes encoded = new BytesBase64(
new BytesOf(
new InputStreamOf("XYZ")
)
);
Assert.assertEquals(new TextOf(encoded).asString(), "WFla");
You can decode them in the same manner, by using the Decorator pattern:
final Bytes decoded = new Base64Bytes(
new BytesBase64(
new BytesOf(
new InputStreamOf("XYZ")
)
)
);
Assert.assertEquals(new TextOf(decoded).asString(), "XYZ");
Whatever your task is, you will be able to create your own implementation of Bytes to solve it.

Encoding String to "modified UTF-8" for the DataInput

I would like to encode a String value to modified UTF-8 bytes. Something like:
byte[] bytes = MagicEncoder.encode(str, "modified UTF-8");
DataInput input = new DataInputStream(new ByteArrayInputStream(bytes));
Each read*() method of the DataInput has to be able to properly read the underlying bytes.
Use DataOutputStream
ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
DataOutputStream dataOutputStream = new DataOutputStream(byteOutputStream);
dataOutputStream.writeUTF("some string to write");
dataOutputStream.close();
The result is available in byteOutputStream.toByteArray().
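A quick round-trip sketch of that idea; readUTF() on the DataInput side expects exactly the length-prefixed modified UTF-8 bytes that writeUTF() produces:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;

// Write a string in modified UTF-8, then read it back with readUTF().
ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
DataOutputStream dataOutputStream = new DataOutputStream(byteOutputStream);
dataOutputStream.writeUTF("some string to write");
dataOutputStream.close();

byte[] bytes = byteOutputStream.toByteArray();
DataInput input = new DataInputStream(new ByteArrayInputStream(bytes));
System.out.println(input.readUTF()); // prints: some string to write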
As info:
The modified UTF-8 encoding simply replaces the nul character U+0000, normally encoded as byte 0, with the byte sequence C0 80, i.e. the normal multi-byte encoding otherwise used for code points > 0x7F.
(Hence normal UTF-8 decoding suffices.)
byte[] originalBytes;

int nulCount = 0;
for (int i = 0; i < originalBytes.length; ++i) {
    if (originalBytes[i] == 0) {
        ++nulCount;
    }
}
byte[] convertedBytes = new byte[originalBytes.length + nulCount];
for (int i = 0, j = 0; i < originalBytes.length; ++i, ++j) {
    convertedBytes[j] = originalBytes[i];
    if (originalBytes[i] == 0) {
        // expand the nul byte into the two-byte sequence C0 80
        convertedBytes[j] = (byte) 0xC0;
        ++j;
        convertedBytes[j] = (byte) 0x80;
    }
}
It is better to use System.arraycopy, and to check whether nulCount == 0 first; a sketch follows below.
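A hedged sketch of that suggestion (the method name toModifiedUtf8 is hypothetical): return the input unchanged when it contains no nul bytes, and otherwise copy the runs between nul bytes with System.arraycopy.
// Convert already-encoded UTF-8 bytes to modified UTF-8 by expanding each
// nul byte (0x00) into the two-byte sequence C0 80.
static byte[] toModifiedUtf8(byte[] originalBytes) {
    int nulCount = 0;
    for (byte b : originalBytes) {
        if (b == 0) nulCount++;
    }
    if (nulCount == 0) {
        return originalBytes; // nothing to expand, reuse the input as-is
    }
    byte[] converted = new byte[originalBytes.length + nulCount];
    int from = 0, to = 0;
    for (int i = 0; i < originalBytes.length; i++) {
        if (originalBytes[i] == 0) {
            int runLength = i - from;
            System.arraycopy(originalBytes, from, converted, to, runLength);
            to += runLength;
            converted[to++] = (byte) 0xC0;
            converted[to++] = (byte) 0x80;
            from = i + 1;
        }
    }
    System.arraycopy(originalBytes, from, converted, to, originalBytes.length - from);
    return converted;
}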
