How does a byte stream read 2-byte Unicode characters?

I have a few characters in a Notepad file that take 2 or 3 bytes. I am able to use InputStream and OutputStream to copy the files. Byte streams are for ASCII characters and character streams should be used for Unicode characters. How does an input stream process 2- or 3-byte characters?
FileInputStream fis = new FileInputStream("E:\\Users\\17496382.WUDIP\\Desktop\\qwert.txt");
FileOutputStream fos = new FileOutputStream("E:\\Users\\17496382.WUDIP\\Desktop\\qwert1.txt");
int b;
while ((b = fis.read()) != -1) { // read one byte at a time until end of stream (-1)
    System.out.println((char) b); // prints each byte as if it were a char
    fos.write(b);                 // copies the byte unchanged
}
fis.close();
fos.close();

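It doesn't. InputStreams read bytes and Readers read characters. Your copy still works because the stream never interprets the data: a multibyte character is simply read and written as 2 or 3 separate bytes, which end up in the same order in the output file.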
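Your code will display garbage if it encounters a multibyte char, because the cast to char turns each byte of that character into a separate, wrong char. It may display garbage for other characters too, since you're assuming that one byte equals one char (an assumption that happens to hold in some encodings, e.g. for ASCII text).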
Lastly: Joel Spolsky's excellent article on Unicode. Read it and you'll be smarter than a lot of other developers.
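To make the byte/char distinction concrete, here is a minimal sketch (the class and method names are hypothetical) that feeds the same bytes to an InputStream and to an InputStreamReader. It assumes the bytes are UTF-8, where "é" (U+00E9) is encoded as the two bytes 0xC3 0xA9; with a different charset the Reader would decode them differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesVsChars {
    // Counts the values an InputStream hands back: one per byte.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the values a Reader hands back: one per decoded char.
    static int countChars(byte[] data, Charset charset) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int n = 0;
            while (r.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00e9" is é; UTF-8 encodes it as two bytes
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(utf8) + " bytes, "
                + countChars(utf8, StandardCharsets.UTF_8) + " char");
    }
}
```

Running main prints 2 bytes, 1 char: the stream sees the raw encoding, while the Reader reassembles the character.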

Related

Writing a Java program that encrypts .txt files with an integer key

This is my first question on Stack Overflow. I hope it's clear and detailed enough.
So I need to write 2 methods, encrypt and decrypt.
My encrypt function is:
public void cifra() throws FileNotFoundException, IOException {
    FileInputStream in = new FileInputStream(file);
    String s = "";
    int b;
    while (in.read() != -1) {
        b = in.read() + key;
        s += b;
    }
    in.close();
    PrintStream ps = new PrintStream(file);
    ps.println(s);
    ps.close();
}
My decrypt function is the same but with
b=in.read()-key;
But it doesn't work: the decrypted output file is not the same as the original, unencrypted file.
Thanks for the help!

Change your while loop to this:
while ((b = in.read()) != -1) {
    b += key;
    s += b;
}
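Currently you read twice per iteration, once in the while condition and again inside the loop body, so every other byte is skipped. Worse, when the byte read in the condition is the last one, the second read returns -1, and -1 + key gets appended as well.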

in.read() is reading in a single byte of the file, as an integer. You are then converting that integer to a string via s+=b.
Say in.read() gives you 97 (ASCII for 'a') and your key is 5: you end up writing the literal text 102 to the file instead of an 'f', which would be the "encoded" character.
Your loop should be building a byte array (or byte stream) and you should write that byte array to the file.
See the docs for ByteArrayOutputStream: your loop should write to one, and you can in turn write its contents to the file.

You are reading bytes (each one into an int).
A String, however, is not an array of bytes: it contains Unicode text and can combine Greek, Chinese and whatever else. (In fact a String is a sequence of chars, each a two-byte UTF-16 code unit.) Turning external bytes into a String always involves a conversion from some charset encoding; for arbitrary bytes such as encrypted data that conversion can go wrong, and it also uses more memory and is slow. Hence one generally does not use a String here.
FileInputStream in = new FileInputStream(file);
ByteArrayOutputStream out = new ByteArrayOutputStream();
int b;
while ((b = in.read()) != -1) {
    b = (b + key) % 256;
    out.write(b);
}
in.close();
byte[] data = out.toByteArray();
FileOutputStream out2 = new FileOutputStream(file);
out2.write(data);
out2.close();
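The other problem is that a byte only holds values 0 to 255 (or -128 to 127 as a signed Java byte), while b + key can exceed that range.
Hence my % (modulo). One also often sees & 0xFF (a bitwise AND with 255, 0b1111_1111), which, unlike % in Java, never yields a negative result, e.g. when decrypting with b - key.
Note that println(someInt) writes the textual (decimal) representation of the integer: 'A', being int 65, is stored as "65", i.e. the two bytes 54 and 53 (plus a line separator).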
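A minimal in-memory sketch of a matching encrypt/decrypt pair (the class and method names are hypothetical; reading and writing the file stays as in the code above). Decryption is just a shift by -key, and masking the sum with & 0xFF keeps every value in 0..255, so the pair round-trips for any byte value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteShiftCipher {
    // Adds delta to an unsigned byte value (0..255) and wraps the result into 0..255.
    // Note: in Java, (2 - 5) % 256 is -3, whereas (2 - 5) & 0xFF is 253.
    static int shift(int b, int delta) {
        return (b + delta) & 0xFF;
    }

    // Mirrors the read/shift/write loop from the answer, but on in-memory data.
    static byte[] apply(byte[] data, int delta) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte value : data) {
            out.write(shift(value & 0xFF, delta)); // value & 0xFF turns a signed byte into 0..255
        }
        return out.toByteArray();
    }

    static byte[] encrypt(byte[] plain, int key) { return apply(plain, key); }

    static byte[] decrypt(byte[] cipher, int key) { return apply(cipher, -key); }

    public static void main(String[] args) {
        byte[] plain = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] enc = encrypt(plain, 5);
        System.out.println(new String(enc, StandardCharsets.US_ASCII)); // fgh
        System.out.println(Arrays.equals(plain, decrypt(enc, 5)));      // true
    }
}
```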

Trim Padding From ByteArrayOutputStream

I'm working with Amazon S3 and would like to upload an InputStream (which requires counting the number of bytes I'm sending).
public static boolean uploadDataTo(String bucketName, String key, String fileName, InputStream stream) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[1];
    try {
        while (stream.read(buffer) != -1) { // copy from stream to buffer
            out.write(buffer); // copy from buffer to byte array
        }
    } catch (Exception e) {
        UtilityFunctionsObject.writeLogException(null, e);
    }
    byte[] result = out.toByteArray(); // we needed all that just for length
    int bytes = result.length;
    IO.close(out);
    InputStream uploadStream = new ByteArrayInputStream(result);
    ....
}
I was told that copying one byte at a time is highly inefficient (obviously so for large files). I can't make the buffer bigger, because then it adds padding to the ByteArrayOutputStream, which I can't strip out. I can strip it out of result, but how can I do that safely? If I use an 8 KB buffer, can I just strip out the rightmost bytes where buffer[i] == 0? Or is there a better way to do this? Thanks!
Using Java 7 on Windows 7 x64.

You can do something like this:
int read = 0;
while ((read = stream.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
stream.read(buffer) returns the number of bytes actually written into buffer (or -1 at the end of the stream), which can be less than buffer.length. Pass this value as the len parameter of out.write(), so that you write only the bytes you have read from the stream; no padding is ever added, so there is nothing to strip.
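A self-contained sketch of that approach with an 8 KB buffer (the class and method names are hypothetical; it uses nothing newer than Java 7). Stripping trailing zeros instead would be unsafe anyway: it would also cut off data that legitimately ends in 0 bytes, which this version keeps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamCopy {
    // Reads the whole stream into memory, up to 8 KB per read() call.
    static byte[] readAll(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = stream.read(buffer)) != -1) {
            // Only the first 'read' bytes of buffer are valid on this pass.
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 0, 0}; // the trailing zeros are real data here
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length); // 4
    }
}
```

toByteArray() returns an array sized exactly to the bytes written, so its length is the byte count the upload needs.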

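Use IOUtils from Apache Commons IO (formerly Jakarta Commons) to copy from the input stream to the byte array stream in a single step, e.g. IOUtils.copy(stream, out), or get the bytes directly with IOUtils.toByteArray(stream). It uses an efficient buffer and does not write any excess bytes.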

If you want efficiency, process the data as you read it: I would pass stream wherever uploadStream is used and remove the rest of the code.
If you need some buffering, you can do this:
InputStream uploadStream = new BufferedInputStream(stream);
The default buffer size is 8 KB (8192 bytes).
If you want the length, use File.length():
long length = new File(fileName).length();

Java TCP Socket receiving bytes with specified length

I am trying to first read 4 bytes (an int) specifying the size of the message and then read the remaining bytes based on that byte count. I am using the following code to accomplish this:
DataInputStream dis = new DataInputStream(
        mClientSocket.getInputStream());
// read the message length
int len = dis.readInt();
Log.i(TAG, "Reading bytes of length:" + len);
// read the message data
byte[] data = new byte[len];
if (len > 0) {
    dis.readFully(data);
} else {
    return "";
}
return new String(data);
Is there a better/efficient way of doing this?

From the Javadoc of readUTF:
First, two bytes are read and used to construct an unsigned 16-bit integer in exactly the manner of the readUnsignedShort method. This integer value is called the UTF length and specifies the number of additional bytes to be read. These bytes are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group.
The only problem is that your protocol sends 4 bytes for the payload length, while readUTF() reads only 2 (and expects Java's modified UTF-8). Perhaps you can write a similar method that reads a 4-byte (32-bit) length instead.
Also, I see that you are just doing new String(bytes), which works fine only as long as the encoding of the data is the same as "the platform's default charset" (see the Javadoc). It would be much safer to decode explicitly with the sender's encoding: e.g. if you know that the sender uses UTF-8, do new String(bytes, "UTF-8") (or new String(bytes, StandardCharsets.UTF_8) on Java 7+) instead.
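A minimal sketch of such a method for a 4-byte length prefix, assuming (as the question implies) that the sender writes the length with writeInt(), which is big-endian, and encodes the text as UTF-8. The class and method names are hypothetical, and in-memory streams stand in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LengthPrefixed {
    // Sender side: 4-byte big-endian length, then the UTF-8 bytes.
    static void writeMessage(OutputStream os, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream out = new DataOutputStream(os);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Receiver side: mirror image of writeMessage.
    static String readMessage(InputStream is) throws IOException {
        DataInputStream in = new DataInputStream(is);
        int len = in.readInt();          // throws EOFException if the stream ends early
        if (len < 0) {
            throw new IOException("Negative message length: " + len);
        }
        byte[] data = new byte[len];
        in.readFully(data);              // blocks until all len bytes have arrived
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "gr\u00fc\u00df dich";
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, original);
        String received = readMessage(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(received.equals(original)); // true
    }
}
```

Unlike readUTF(), this has no 65535-byte limit and uses standard UTF-8 rather than Java's modified variant.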

How about
DataInputStream dis = new DataInputStream(new BufferedInputStream(
mClientSocket.getInputStream()));
return dis.readUTF();
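Note that readUTF() only understands data written by the matching DataOutputStream.writeUTF(), which emits a 2-byte length followed by the string in modified UTF-8 (and throws UTFDataFormatException if the encoded string exceeds 65535 bytes). A minimal round-trip sketch, with in-memory streams standing in for the socket (the class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UtfRoundTrip {
    // What the sender would have to do: 2-byte length + modified UTF-8 bytes.
    static byte[] encode(String message) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(message);
        out.flush();
        return bytes.toByteArray();
    }

    // What the receiver does with those bytes.
    static String decode(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode("hi");
        System.out.println(wire.length);  // 4: 2 length bytes + 2 data bytes
        System.out.println(decode(wire)); // hi
    }
}
```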

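You can use read(byte[] b, int off, int len) like this, but note that a single call may return fewer than len bytes, so check its return value (readFully() does that for you):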
byte[] data = new byte[len];
dis.read(data,0,len);
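A sketch of the loop that handles short reads, which is essentially what DataInputStream.readFully() already does for you. The names are hypothetical; TrickleInputStream is a made-up test helper that returns at most one byte per call, mimicking a slow socket.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadExactly {
    // Keeps calling read() until len bytes are in data, or fails at end of stream.
    static void readExactly(InputStream in, byte[] data, int len) throws IOException {
        int off = 0;
        while (off < len) {
            int n = in.read(data, off, len - off);
            if (n == -1) {
                throw new EOFException("Stream ended after " + off + " of " + len + " bytes");
            }
            off += n;
        }
    }

    // Hypothetical test helper: hands out at most one byte per read(byte[], int, int) call.
    static class TrickleInputStream extends InputStream {
        private final InputStream source;

        TrickleInputStream(InputStream source) { this.source = source; }

        @Override
        public int read() throws IOException { return source.read(); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int c = source.read();
            if (c == -1) {
                return -1;
            }
            b[off] = (byte) c;
            return 1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] src = {1, 2, 3, 4, 5};
        byte[] data = new byte[5];
        int single = new TrickleInputStream(new ByteArrayInputStream(src)).read(data, 0, 5);
        System.out.println(single);  // 1: one read() call returned fewer bytes than asked for
        readExactly(new TrickleInputStream(new ByteArrayInputStream(src)), data, 5);
        System.out.println(data[4]); // 5
    }
}
```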

Java - Read file by chunks?

I know how to read a file byte by byte, but I cannot find an example of how to read it in chunks of bytes. I have a byte array, and I want to read the file 512 bytes at a time and send the chunks over a socket.
I have tried reading the total number of bytes in the file and then subtracting 512 until I got a chunk smaller than 512 bytes, which signaled EOF and the end of the transfer.
I am trying to implement TFTP, where data is sent in 512-byte chunks.
Anyhow, I would be thankful for an example.

You ... read 512 bytes at a time.
byte[] myBuffer = new byte[512];
int bytesRead = 0;
InputStream in = new BufferedInputStream(new FileInputStream("foo.txt"));
while ((bytesRead = in.read(myBuffer, 0, 512)) != -1)
{
...
}

You can use the appropriate read() method from the input stream, for example FileInputStream supports a read(byte[]) to read a chunk of bytes.
You may want to wrap the input stream in a BufferedInputStream (its constructor takes a buffer size argument), but note that this does not strictly guarantee 512-byte blocks either: read(buffer) may return fewer bytes, so always use its return value. Something like:
byte[] buffer = new byte[512];
FileInputStream in = new FileInputStream("some_file");
int rc = in.read(buffer);
while (rc != -1)
{
    // rc should contain the number of bytes read in this operation.
    // do stuff...
    // next read
    rc = in.read(buffer);
}
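For TFTP specifically, RFC 1350 marks the end of a transfer with a DATA packet carrying 0 to 511 bytes, so when the file size is an exact multiple of 512 a final empty block has to be sent. A minimal sketch (the class and method names are hypothetical) that fills each block completely, looping because a single read() may return fewer bytes, and produces that final short or empty block. It collects the blocks in a list for clarity; a real sender would transmit each block and wait for its ACK instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TftpChunker {
    static final int BLOCK_SIZE = 512;

    // Reads up to block.length bytes, looping until the block is full or the stream ends.
    static int fillBlock(InputStream in, byte[] block) throws IOException {
        int filled = 0;
        while (filled < block.length) {
            int n = in.read(block, filled, block.length - filled);
            if (n == -1) {
                break;
            }
            filled += n;
        }
        return filled;
    }

    // Splits the stream into full 512-byte blocks plus one final block of 0..511 bytes.
    static List<byte[]> split(InputStream in) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        byte[] block = new byte[BLOCK_SIZE];
        while (true) {
            int n = fillBlock(in, block);
            blocks.add(Arrays.copyOf(block, n)); // a real sender would send block[0..n) here
            if (n < BLOCK_SIZE) {
                return blocks; // a short (possibly empty) block ends the transfer
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> blocks = split(new ByteArrayInputStream(new byte[1024]));
        System.out.println(blocks.size()); // 3: two full blocks and one empty block
    }
}
```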

Using InputStream, you can read into an array of a given size and limit each read to that size.
Read here: http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html

Java byte array and DataOutputStream processing

We are processing a byte[] as shown below (the file is POSTed to a web server; this code runs in Glassfish) and have found that some files start with a byte-order mark (BOM; in UTF-8 it is the three-byte sequence 0xEF, 0xBB, 0xBF, see: http://en.wikipedia.org/wiki/Byte_order_mark), and we want to remove this BOM. How would we detect and remove a BOM in this code? Thanks.
private final void serializePayloadToFile(File file, byte[] payload) throws IOException {
    FileOutputStream fos;
    DataOutputStream dos;
    fos = new FileOutputStream(file, true); // true for append
    dos = new DataOutputStream(fos);
    dos.write(payload);
    dos.flush();
    dos.close();
    fos.close();
    return;
}

How would we detect [...]
There's obviously no way to tell for sure if the three bytes are three random bytes or three bytes representing a BOM.
You could check if the array starts with 0xEF, 0xBB, 0xBF and in that case skip them.
[...] and remove a BOM in this code?
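Something like this should do (the (byte) casts are needed because Java's byte is signed, so payload[0] == 0xEF would compare -17 with 239 and never match):
int off = payload.length >= 3
        && payload[0] == (byte) 0xEF
        && payload[1] == (byte) 0xBB
        && payload[2] == (byte) 0xBF ? 3 : 0;
dos.write(payload, off, payload.length - off);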
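An equivalent, testable variant (the class and method names are hypothetical) that masks each byte with & 0xFF so it can be compared with the unsigned hex literals; because Java's byte is signed, some cast or mask is required either way. An in-memory stream stands in for the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BomStrip {
    // Returns 3 if payload starts with the UTF-8 BOM EF BB BF, otherwise 0.
    // & 0xFF converts each signed byte to 0..255 before the comparison.
    static int bomOffset(byte[] payload) {
        return payload.length >= 3
                && (payload[0] & 0xFF) == 0xEF
                && (payload[1] & 0xFF) == 0xBB
                && (payload[2] & 0xFF) == 0xBF ? 3 : 0;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file
        DataOutputStream dos = new DataOutputStream(sink);
        int off = bomOffset(payload);
        dos.write(payload, off, payload.length - off);
        dos.flush();
        System.out.println(sink.size()); // 2
    }
}
```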

DataOutputStream has a write() method with offsets and length
public void write(byte[] b, int off, int len);
So test for the byte order mark and set off (and len) appropriately.

The simplest solution seems to be adding another OutputStream implementation between dos and fos and buffering the first few bytes there, before actually committing them to fos. You might or might not want to throw them away, depending on their values.
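A minimal sketch of that idea as a FilterOutputStream (the class name BomSkippingOutputStream is hypothetical). It holds back up to three leading bytes, drops them if they form the UTF-8 BOM, and otherwise passes them through unchanged; it would sit between the two streams, e.g. new DataOutputStream(new BomSkippingOutputStream(fos)). One simplification to be aware of: held-back bytes are released only once the decision is made or on close(), so a flush() in between does not push them out.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BomSkippingOutputStream extends FilterOutputStream {
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private final byte[] head = new byte[3]; // leading bytes held back until we can decide
    private int headLen = 0;
    private boolean decided = false;

    BomSkippingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        if (decided) {
            out.write(b);
            return;
        }
        head[headLen++] = (byte) b;
        if (head[headLen - 1] != BOM[headLen - 1]) {
            // Not a BOM: release the held-back bytes unchanged
            out.write(head, 0, headLen);
            decided = true;
        } else if (headLen == BOM.length) {
            // Complete BOM seen: drop it
            decided = true;
        }
    }

    @Override
    public void close() throws IOException {
        if (!decided) {
            // Stream ended inside a partial BOM prefix: those bytes are data, keep them
            out.write(head, 0, headLen);
            decided = true;
        }
        super.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for fos
        try (OutputStream s = new BomSkippingOutputStream(sink)) {
            s.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'});
        }
        System.out.println(sink.size()); // 2
    }
}
```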

We Keep Coding
