We are processing a byte[] as shown below (the file is POST'ed to a web server, this code is running in Glassfish) and have found that some files have a byte-order mark (BOM, a three-byte sequence 0xEF,0xBB,0xBF, see: http://en.wikipedia.org/wiki/Byte_order_mark) at the beginning, and we want to remove this BOM. How would we detect and remove a BOM in this code? Thanks.
private final void serializePayloadToFile(File file, byte[] payload) throws IOException {
FileOutputStream fos;
DataOutputStream dos;
fos = new FileOutputStream(file, true); // true for append
dos = new DataOutputStream(fos);
dos.write(payload);
dos.flush();
dos.close();
fos.close();
return;
}
How would we detect [...]
There's obviously no way to tell for sure if the three bytes are three random bytes or three bytes representing a BOM.
You could check if the array starts with 0xEF, 0xBB, 0xBF and in that case skip them.
[...] and remove a BOM in this code?
Something like this should do:
int off = payload.length >= 3
&& payload[0] == (byte) 0xEF // bytes are signed in Java, so cast the literals
&& payload[1] == (byte) 0xBB
&& payload[2] == (byte) 0xBF ? 3 : 0;
dos.write(payload, off, payload.length - off);
DataOutputStream has a write() method that takes an offset and a length:
public void write(byte[] b, int off, int len);
So test for the byte order mark and set off (and len) appropriately.
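As a minimal sketch of that test (BomStripper and its method names are made up for illustration, they are not part of the original code): Java bytes are signed, so the BOM values are cast to byte before they are compared, and the resulting offset is passed straight to write().

```java
import java.io.IOException;
import java.io.OutputStream;

final class BomStripper {
    // UTF-8 byte-order mark. Java bytes are signed, so the literals need a cast.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns 3 if payload starts with the UTF-8 BOM, otherwise 0. */
    static int bomLength(byte[] payload) {
        if (payload.length < UTF8_BOM.length) {
            return 0;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (payload[i] != UTF8_BOM[i]) {
                return 0;
            }
        }
        return UTF8_BOM.length;
    }

    /** Writes payload to out, skipping a leading UTF-8 BOM if one is present. */
    static void writeWithoutBom(byte[] payload, OutputStream out) throws IOException {
        int off = bomLength(payload);
        out.write(payload, off, payload.length - off);
    }
}
```

In serializePayloadToFile this would replace the plain dos.write(payload) call with BomStripper.writeWithoutBom(payload, dos).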
The simplest solution seems to be adding another OutputStream implementation between dos and fos and buffering the first few bytes there, before actually committing them to fos. You might or might not want to throw them away, depending on their values.
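A rough sketch of such an intermediate stream, assuming it is enough to inspect only the very first bytes written through it. BomSkippingOutputStream is a hypothetical name; the class holds back up to three bytes until it knows whether they form a UTF-8 BOM, then either drops them or passes them on.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BomSkippingOutputStream extends FilterOutputStream {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private final byte[] head = new byte[UTF8_BOM.length]; // bytes held back so far
    private int headCount = 0;
    private boolean decided = false; // true once we know whether a BOM was present

    BomSkippingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        if (decided) {
            out.write(b);
            return;
        }
        if ((b & 0xFF) == UTF8_BOM[headCount]) {
            head[headCount++] = (byte) b;
            if (headCount == UTF8_BOM.length) {
                headCount = 0; // complete BOM seen: throw it away
                decided = true;
            }
        } else {
            decided = true;
            releaseHead(); // not a BOM after all: pass the held bytes on
            out.write(b);
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        int i = off;
        int end = off + len;
        while (!decided && i < end) {
            write(b[i++]); // inspect leading bytes one at a time
        }
        if (i < end) {
            out.write(b, i, end - i); // everything after the head goes out in bulk
        }
    }

    @Override
    public void close() throws IOException {
        if (!decided) {
            decided = true;
            releaseHead(); // stream ended inside a partial BOM: keep those bytes
        }
        super.close();
    }

    private void releaseHead() throws IOException {
        out.write(head, 0, headCount);
        headCount = 0;
    }
}
```

It would sit between the two existing streams, for example new DataOutputStream(new BomSkippingOutputStream(fos)).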
This is my first question on StackOverflow. Hope it's gonna be clear and detailed enough.
So I need to write 2 methods, encrypt and decrypt.
My encrypt function is:
public void cifra() throws FileNotFoundException,IOException {
FileInputStream in=new FileInputStream(file);
String s="";
int b;
while(in.read()!=-1) {
b=in.read()+key;
s+=b;
}
in.close();
PrintStream ps=new PrintStream(file);
ps.println(s);
ps.close();
}
My decrypt function is the same but with
b=in.read()-key;
But it doesn't work. The output file is not the same as the original, unencrypted file.
Thanks for the help!
Change your while loop to this:
while ((b = in.read()) != -1) {
b += key;
s += b;
}
Currently you call read() twice per iteration, once in the while condition and once inside the loop, so every other byte is thrown away.
in.read() is reading in a single byte of the file, as an integer. You are then converting that integer to a string via s+=b.
So say in.read() gives you 97 (ASCII for 'a') and your key is 5, you are turning around and writing literally 102 to the file, instead of an 'f', which would be the "encoded" character.
Your loop should be building a byte array (or byte stream) and you should write that byte array to the file.
See the docs for ByteArrayOutputStream: your loop should write to it, and you can in turn write its contents to a file.
You are reading bytes (each one into an int).
A String, however, is not an array of bytes; it holds Unicode text and can mix Greek, Chinese, and anything else. (A String is a sequence of chars, and each char is 16 bits.) Turning external bytes into a String involves a conversion through some charset encoding. For arbitrary binary data that conversion can corrupt bytes, uses more memory, and is slow. Hence one generally does not use String here.
FileInputStream in = new FileInputStream(file);
ByteArrayOutputStream out = new ByteArrayOutputStream();
int b;
while((b = in.read()) !=-1) {
b = (b + key) % 256;
out.write(b);
}
in.close();
byte[] data = out.toByteArray();
FileOutputStream out2 = new FileOutputStream(file);
out2.write(data);
out2.close();
The other problem is that bytes have a range of 0 to 255 (or -128 to 127 as signed bytes).
Hence my % (modulo). One also sees & 0xFF (bitwise AND with 255, 0b1111_1111).
Note that println(someInt) writes the textual representation of the integer: 'A', being int 65, is stored as "65", that is, as 2 bytes: 54 and 53.
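To make the wrap-around concrete, here is a small sketch that works on a byte array instead of a file. ByteShift and shift are names invented for this example; the & 0xFF mask is used rather than % because it also behaves for negative keys, so the same method decrypts when called with -key.

```java
final class ByteShift {
    /** Adds key to every byte, wrapping around within 0..255. */
    static byte[] shift(byte[] data, int key) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            int unsigned = data[i] & 0xFF;                // view the signed byte as 0..255
            result[i] = (byte) ((unsigned + key) & 0xFF); // keep only the low 8 bits
        }
        return result;
    }
}
```

Calling shift(shift(data, key), -key) gives back the original bytes, which is the round trip the encrypt and decrypt methods need.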
I have a few characters in Notepad that take 2 or 3 bytes each. I am able to use an input stream and an output stream to copy the files. Byte streams are for ASCII characters, and character streams should be used for Unicode characters. How does an input stream process 2- or 3-byte characters?
FileInputStream fis = new FileInputStream("E:\\Users\\17496382.WUDIP\\Desktop\\qwert.txt");
FileOutputStream fos = new FileOutputStream("E:\\Users\\17496382.WUDIP\\Desktop\\qwert1.txt");
byte[] buffer = new byte[1024];
int len;
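while ((len = fis.read(buffer)) != -1) { // read a chunk; -1 signals end of stream
for (int i = 0; i < len; i++) {
System.out.println((char) buffer[i]); // prints each byte as if it were one char
}
fos.write(buffer, 0, len);
}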
It doesn't. InputStreams read bytes and Readers read characters.
Your code will display garbage if it encounters a multibyte character. It may display garbage otherwise too, since you're assuming that one byte equals one char (although that assumption does hold in many encodings).
Lastly: Joel Spolsky's excellent article on Unicode. Read it and you'll be smarter than a lot of other developers.
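A small sketch of the difference, assuming the text is UTF-8 encoded (the original question does not say which encoding Notepad used). BytesVersusChars is a made-up class: one method casts each byte to a char the way the question's code does, the other decodes through a Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVersusChars {
    /** Casts every byte to a char; only correct when one byte is one character. */
    static String castEachByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the bytes as UTF-8 through a Reader, so multibyte characters survive. */
    static String decodeUtf8(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

For the euro sign, which is three bytes in UTF-8, castEachByte produces three unrelated characters while decodeUtf8 produces the single character. Copying the file byte for byte still works either way, because a plain copy never interprets the bytes.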
I'm working with Amazon S3 and would like to upload an InputStream (which requires counting the number of bytes I'm sending).
public static boolean uploadDataTo(String bucketName, String key, String fileName, InputStream stream) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[1];
try {
while (stream.read(buffer) != -1) { // copy from stream to buffer
out.write(buffer); // copy from buffer to byte array
}
} catch (Exception e) {
UtilityFunctionsObject.writeLogException(null, e);
}
byte[] result = out.toByteArray(); // we needed all that just for length
int bytes = result.length;
IO.close(out);
InputStream uploadStream = new ByteArrayInputStream(result);
....
}
I was told copying a byte at a time is highly inefficient (obviously so for large files). I can't make the buffer bigger because that adds padding to the ByteArrayOutputStream, which I can't strip out. I could strip it from result, but how can I do that safely? If I use an 8 KB buffer, can I just strip the rightmost bytes where buffer[i] == 0? Or is there a better way to do this? Thanks!
Using Java 7 on Windows 7 x64.
You can do something like this:
int read = 0;
while ((read = stream.read(buffer)) != -1) {
out.write(buffer, 0, read);
}
stream.read(buffer) returns the number of bytes that were read into buffer. Pass that count as the len argument of out.write() so that you write only the bytes actually read from the stream.
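Put together as a self-contained helper, this might look like the sketch below. StreamCopy.readAll and its bufferSize parameter are names chosen for the example; nothing here is specific to S3.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamCopy {
    /** Reads the whole stream into a byte array using a buffer of the given size. */
    static byte[] readAll(InputStream stream, int bufferSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        int read;
        while ((read = stream.read(buffer)) != -1) {
            out.write(buffer, 0, read); // only the bytes actually read, so no padding
        }
        return out.toByteArray();
    }
}
```

With a 10-byte input and an 8-byte buffer the result is exactly 10 bytes long, so there is nothing to strip afterwards.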
Use IOUtils from Apache Commons IO (formerly Jakarta Commons) to copy from the input stream to the byte array stream in a single step. It will use an efficient buffer, and not write any excess bytes.
If you want efficiency you could process the file as you read it. I would replace uploadStream with stream and remove the rest of the code.
If you need some buffering, you can do this:
InputStream uploadStream = new BufferedInputStream(stream);
The default buffer size is 8 KB.
If you want the length, use File.length():
long length = new File(fileName).length();
I am trying to first read 4 bytes(int) specifying the size of the message and then read the remaining bytes based on the byte count. I am using the following code to accomplish this:
DataInputStream dis = new DataInputStream(
mClientSocket.getInputStream());
// read the message length
int len = dis.readInt();
Log.i(TAG, "Reading bytes of length:" + len);
// read the message data
byte[] data = new byte[len];
if (len > 0) {
dis.readFully(data);
} else {
return "";
}
return new String(data);
Is there a better or more efficient way of doing this?
From JavaDocs of readUTF:
First, two bytes are read and used to construct an unsigned 16-bit
integer in exactly the manner of the readUnsignedShort method. This
integer value is called the UTF length and specifies the number of
additional bytes to be read. These bytes are then converted to
characters by considering them in groups. The length of each group is
computed from the value of the first byte of the group. The byte
following a group, if any, is the first byte of the next group.
The only problem with this is that readUTF expects a 2-byte length, whereas your protocol seems to send 4 bytes for the payload length. Perhaps you can write a similar method that reads a 4-byte (32-bit) length prefix instead.
Also, I see that you are just doing new String(bytes), which works only as long as the encoding of the data is the same as "the platform's default charset" (see the javadoc). It would be much safer to specify the encoding explicitly: if you know that the sender sends UTF-8, do new String(bytes, "UTF-8") instead.
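A minimal sketch of both points, assuming the sender encodes the body as UTF-8 and prefixes it with a 4-byte big-endian length (which is what readInt expects). LengthPrefixed, encode and readMessage are illustrative names; on Android, StandardCharsets needs API level 19 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class LengthPrefixed {
    /** Sender side: 4-byte length followed by the UTF-8 encoded message. */
    static byte[] encode(String message) throws IOException {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(body.length);
        out.write(body);
        out.flush();
        return bytes.toByteArray();
    }

    /** Receiver side: read the length, then exactly that many bytes, then decode. */
    static String readMessage(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int len = dis.readInt();
        if (len < 0) {
            throw new IOException("Negative message length: " + len);
        }
        byte[] data = new byte[len];
        dis.readFully(data); // blocks until all len bytes have arrived
        return new String(data, StandardCharsets.UTF_8); // explicit charset, not the platform default
    }
}
```

A real protocol would probably also cap len at some maximum before allocating the array; that limit is left out here because the question does not state one.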
How about
DataInputStream dis = new DataInputStream(new BufferedInputStream(
mClientSocket.getInputStream()));
return dis.readUTF();
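You can use read(byte[] b, int off, int len) like this, but note that a single read call may return fewer than len bytes, so check its return value (readFully does the looping for you):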
byte[] data = new byte[len];
dis.read(data,0,len);
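Because one read call is allowed to deliver only part of the requested data, code that uses read directly needs a loop. A sketch of that loop follows; ReadLoop.readExactly is a hypothetical helper that does roughly what DataInputStream.readFully already does.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ReadLoop {
    /** Fills data completely, or throws EOFException if the stream ends first. */
    static void readExactly(InputStream in, byte[] data) throws IOException {
        int off = 0;
        while (off < data.length) {
            int n = in.read(data, off, data.length - off); // may return fewer bytes than asked for
            if (n == -1) {
                throw new EOFException(
                        "Stream ended after " + off + " of " + data.length + " bytes");
            }
            off += n;
        }
    }
}
```

On a socket this matters in practice, since data often arrives in several pieces.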
I have a Java class, where I'm reading data in via an InputStream
byte[] b = null;
try {
b = new byte[in.available()];
in.read(b);
} catch (IOException e) {
e.printStackTrace();
}
It works perfectly when I run my app from the IDE (Eclipse).
But when I export my project and it's packed in a JAR, the read command doesn't read all the data. How could I fix it?
This problem mostly occurs when the InputStream is a File (~10kb).
Thanks!
Usually I prefer using a fixed size buffer when reading from input stream. As evilone pointed out, using available() as buffer size might not be a good idea because, say, if you are reading a remote resource, then you might not know the available bytes in advance. You can read the javadoc of InputStream to get more insight.
Here is the code snippet I usually use for reading input stream:
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = 0;
while ((bytesRead = in.read(buffer)) >= 0){
for (int i = 0; i < bytesRead; i++){
//Do whatever you need with the bytes here
}
}
The version of read() I'm using here reads up to buffer.length bytes into the given buffer and returns the number of bytes actually read. This means your buffer may contain trailing garbage data, so it is very important to use only the bytes up to bytesRead.
Note the condition (bytesRead = in.read(buffer)) >= 0. The InputStream javadoc says read(byte[]) blocks until at least one byte is available when the buffer is non-empty, but not every implementation honors that, so depending on your case you may need to handle a 0-byte read specially. For local files I have never experienced it; however, when reading remote resources, I have actually seen read() return 0 bytes repeatedly, turning the above code into an infinite loop. I solved the infinite loop problem by counting the number of times I read 0 bytes and throwing an exception when the counter exceeds a threshold. You may not encounter this problem, but just keep it in mind :)
I would probably avoid creating a new byte array for each read, for performance reasons.
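The zero-byte guard described above could be sketched like this. GuardedRead and maxEmptyReads are invented names, and counting consecutive empty reads (resetting the counter whenever data arrives) is my assumption about how the threshold should work.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class GuardedRead {
    /** Reads the whole stream, giving up after too many consecutive 0-byte reads. */
    static byte[] readAll(InputStream in, int maxEmptyReads) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // one buffer, reused for every read
        int emptyReads = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) >= 0) {
            if (bytesRead == 0) {
                if (++emptyReads > maxEmptyReads) {
                    throw new IOException("Stream returned 0 bytes " + emptyReads + " times in a row");
                }
                continue;
            }
            emptyReads = 0; // progress was made, start counting again
            out.write(buffer, 0, bytesRead);
        }
        return out.toByteArray();
    }
}
```

A well-behaved stream never triggers the exception, so the guard costs nothing in the normal case.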
read() will return -1 when the InputStream is depleted. There is also a version of read which takes an array, this allows you to do chunked reads. It returns the number of bytes actually read or -1 when at the end of the InputStream. Combine this with a dynamic buffer such as ByteArrayOutputStream to get the following:
InputStream in = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int read;
byte[] input = new byte[4096];
while ( -1 != ( read = in.read( input ) ) ) {
buffer.write( input, 0, read );
}
input = buffer.toByteArray();
This cuts down a lot on the number of methods you have to invoke and allows the ByteArrayOutputStream to grow its internal buffer faster.
File file = new File("/path/to/file");
try {
InputStream is = new FileInputStream(file);
byte[] bytes = IOUtils.toByteArray(is);
System.out.println("Byte array size: " + bytes.length);
} catch (IOException e) {
e.printStackTrace();
}
Below is a snippet of code that downloads a file (*.png, *.jpeg, *.gif, ...) and writes it to a BufferedOutputStream that wraps the HttpServletResponse output stream.
BufferedInputStream inputStream = bo.getBufferedInputStream(imageFile);
try {
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int bytesRead = 0;
byte[] input = new byte[DefaultBufferSizeIndicator.getDefaultBufferSize()];
while (-1 != (bytesRead = inputStream.read(input))) {
buffer.write(input, 0, bytesRead);
}
input = buffer.toByteArray();
response.reset();
response.setBufferSize(DefaultBufferSizeIndicator.getDefaultBufferSize());
response.setContentType(mimeType);
// Here's the secret. Content-Length should equal the number of bytes read.
response.setHeader("Content-Length", String.valueOf(buffer.size()));
response.setHeader("Content-Disposition", "inline; filename=\"" + imageFile.getName() + "\"");
BufferedOutputStream outputStream = new BufferedOutputStream(response.getOutputStream(), DefaultBufferSizeIndicator.getDefaultBufferSize());
try {
outputStream.write(input, 0, buffer.size());
} finally {
ImageBO.close(outputStream);
}
} finally {
ImageBO.close(inputStream);
}
Hope this helps.