I am trying the following:
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes:
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
mClientSocket.GetStream().Write(headerBytes, 0, headerBytes.Length);
//write text:
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
int headerSize = in.read();
int bytesRead = 0;
char[] input = new char[headerSize];
while (bytesRead < headerSize)
{
bytesRead += in.read(input, bytesRead, headerSize - bytesRead);
}
String resString = new String(input);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}
The string size equals 9, which is correct on both sides. But when I read the string itself on the Java side, the data looks wrong. The char buffer (the 'input' variable) looks like this:
",",",'H','e','l','l','o',''
I tried to change the endianness by reversing the byte array. I also tried switching the string encoding between ASCII and UTF-8. I still feel the problem is related to endianness, but I cannot figure out how to solve it. I know I can use other types of writers to write text data to the stream, but I am using raw byte arrays for the sake of learning.
These
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
are 4 bytes. And they aren't character data so it makes no sense to read them with a BufferedReader. Just read the bytes directly.
byte[] headerBytes = new byte[4];
// shortcut, make sure 4 bytes were actually read
in.read(headerBytes);
Now extract your text's length and allocate enough space for it
int length = ByteBuffer.wrap(headerBytes).getInt();
byte[] textBytes = new byte[length];
Then read the text
int remaining = length;
int offset = 0;
while (remaining > 0) {
int count = in.read(textBytes, offset, remaining);
if (-1 == count) {
// deal with it
break;
}
remaining -= count;
offset += count;
}
Now decode it as UTF-8
String text = new String(textBytes, StandardCharsets.UTF_8);
and you are done.
Endianness will have to match for those first 4 bytes. One way of ensuring that is to use "network order" (big-endian). So:
C# Client
byte[] headerBytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(stringToSend.Length));
Java Server
int length = ByteBuffer.wrap(headerBytes).order(ByteOrder.BIG_ENDIAN).getInt();
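The steps above can be combined into one small reader. This is only a sketch: the class name LengthPrefixedReader and the method readMessage are made up for illustration, and it assumes the header carries the number of UTF-8 bytes (the question sends stringToSend.Length, which is a character count and matches the byte count only for ASCII text). DataInputStream.readFully is used so that short reads do not need a hand-written loop.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class LengthPrefixedReader {

    /** Reads one message: a 4-byte length header in the given byte order, then that many UTF-8 bytes. */
    static String readMessage(InputStream in, ByteOrder headerOrder) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[4];
        data.readFully(header); // blocks until all 4 bytes arrive, or throws EOFException
        int length = ByteBuffer.wrap(header).order(headerOrder).getInt();
        if (length < 0) {
            throw new IOException("Negative length header: " + length);
        }
        byte[] text = new byte[length];
        data.readFully(text);
        return new String(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 9 as a little-endian int (what BitConverter.GetBytes yields on a little-endian machine),
        // followed by the 9 UTF-8 bytes of "Hello man".
        byte[] wire = {9, 0, 0, 0, 'H', 'e', 'l', 'l', 'o', ' ', 'm', 'a', 'n'};
        System.out.println(readMessage(new ByteArrayInputStream(wire), ByteOrder.LITTLE_ENDIAN));
    }
}
```

If the client switches to network order as suggested above, pass ByteOrder.BIG_ENDIAN instead; the rest stays the same.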
At first glance it appears you have a problem with your indexes.
Your C# code is sending an integer converted to 4 bytes.
But your Java code is only reading a single byte as the length of the string.
The next 3 bytes sent from C# are the three zero bytes that complete your string length.
Your Java code reads those 3 zero bytes and converts them to empty characters, which become the first 3 elements of your input[] array.
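The misalignment described above can be reproduced without a socket. The sketch below is an illustration only (the class HeaderMisreadDemo and its method are invented names): it feeds the bytes a little-endian client would send into the same read pattern the question uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class HeaderMisreadDemo {

    /** Mimics the question's server: one read() for the "header", then that many chars. */
    static char[] readLikeTheQuestion(byte[] wire) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        int headerSize = in.read(); // consumes only the first byte of the 4-byte header
        char[] input = new char[headerSize];
        int read = 0;
        while (read < headerSize) {
            int n = in.read(input, read, headerSize - read);
            if (n == -1) {
                break; // stream ended early
            }
            read += n;
        }
        return input;
    }

    public static void main(String[] args) throws IOException {
        // Little-endian 9, then "Hello man".
        byte[] wire = {9, 0, 0, 0, 'H', 'e', 'l', 'l', 'o', ' ', 'm', 'a', 'n'};
        for (char c : readLikeTheQuestion(wire)) {
            System.out.print((int) c + " "); // three 0 values first, then the codes for "Hello "
        }
        System.out.println();
    }
}
```

The three leftover header bytes take up the first three slots, so "man" is never read and stays in the stream for the next iteration.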
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes: the original code sent the length as a 4-byte integer here. A single byte only works for payloads of up to 255 bytes; for longer strings you'll need to send a wider type, perhaps an integer converted to 4 bytes, and read the same number of bytes on the server.
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
mClientSocket.GetStream().WriteByte((byte)textBytes.Length);
//write the entire text buffer
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
// the original client sent the length as 4 bytes, but only a single char was read here.
int headerSize = in.read();// reads a single char; it equals the length byte only for values below 128
int bytesRead = 0;
char[] input = new char[headerSize];
// a single read is attempted here instead of a loop; note that read may return fewer chars than requested:
bytesRead = in.read(input, 0, headerSize);
// if you do use a while statement, pass the current offset to each read,
// because otherwise the input will get overwritten on the next read.
String resString = new String(input, 0, Math.max(bytesRead, 0));
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}
I have this class to encode and decode a file. When I run it with .txt files it works, but when I run it with .jpg or .doc files, I cannot open the resulting file, or it is not identical to the original. I don't know why this is happening. I based this class on
http://myjeeva.com/convert-image-to-string-and-string-to-image-in-java.html, but I want to change this line
byte imageData[] = new byte[(int) file.length()];
to
byte example[] = new byte[1024];
and read the file as many times as needed. Thanks.
import java.io.*;
import java.util.*;
public class Encode {
// input = input file path, output = output file path, imageDataString = encoded string
String input;
String output;
String imageDataString;
public void setFileInput(String input){
this.input=input;
}
public void setFileOutput(String output){
this.output=output;
}
public String getFileInput(){
return input;
}
public String getFileOutput(){
return output;
}
public String getEncodeString(){
return imageDataString;
}
public String processCode(){
StringBuilder sb= new StringBuilder();
try{
File fileInput= new File( getFileInput() );
FileInputStream imageInFile = new FileInputStream(fileInput);
// I have seen examples where people create a byte[] with the same length as the file. I don't want that, because I will not know how long the file will be.
byte buff[] = new byte[1024];
int r = 0;
while ( ( r = imageInFile.read( buff)) > 0 ) {
String imageData = encodeImage(buff);
sb.append( imageData);
if ( imageInFile.available() <= 0 ) {
break;
}
}
} catch (FileNotFoundException e) {
System.out.println("File not found" + e);
} catch (IOException ioe) {
System.out.println("Exception while reading the file " + ioe);
}
imageDataString = sb.toString();
return imageDataString;
}
public void processDecode(String str) throws IOException{
byte[] imageByteArray = decodeImage(str);
File fileOutput= new File( getFileOutput());
FileOutputStream imageOutFile = new FileOutputStream( fileOutput);
imageOutFile.write(imageByteArray);
imageOutFile.close();
}
public static String encodeImage(byte[] imageByteArray) {
return Base64.getEncoder().withoutPadding().encodeToString( imageByteArray);
}
public static byte[] decodeImage(String imageDataString) {
return Base64.getDecoder().decode( imageDataString);
}
public static void main(String[] args) throws IOException {
Encode a = new Encode();
a.setFileInput( "C://Users//xxx//Desktop//original.doc");
a.setFileOutput("C://Users//xxx//Desktop//original-copied.doc");
a.processCode( );
a.processDecode( a.getEncodeString());
System.out.println("C O P I E D");
}
}
I tried changing
String imageData = encodeImage(buff);
to
String imageData = encodeImage(buff,r);
and the method encodeImage
public static String encodeImage(byte[] imageByteArray, int r) {
byte[] aux = new byte[r];
for ( int i = 0; i < aux.length; i++) {
aux[i] = imageByteArray[i];
if ( aux[i] <= 0 ) {
break;
}
}
return Base64.getDecoder().decode( aux);
}
But I get the error:
Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
You have two problems in your program.
The first, as mentioned by @Joop Eggen, is that you are not handling your input correctly.
In fact, Java does not promise that a read will fill the entire 1024-byte buffer, even in the middle of the file. It could read just 50 bytes, tell you it read 50 bytes, and then read 50 more bytes the next time.
Suppose you read 1024 bytes in the previous round, and in the current round you read only 50. Your byte array now contains the 50 new bytes, and the rest are old bytes from the previous read!
So you always need to copy exactly the number of bytes that were read into a new array, and pass that to your encoding function.
So, to fix this particular problem, you'll need to do something like:
while ( ( r = imageInFile.read( buff)) > 0 ) {
byte[] realBuff = Arrays.copyOf( buff, r );
String imageData = encodeImage(realBuff);
...
}
However, this is not the only problem here. Your real problem is with the Base64 encoding itself.
What Base64 does is take your bytes, break them into 6-bit chunks, and treat each chunk as a number N between 0 and 63. It then takes the Nth character from its character table to represent that chunk.
But this means it can't cleanly encode a single byte or two bytes: a byte contains 8 bits, which is one chunk of 6 bits plus 2 leftover bits. Two bytes have 16 bits: that's 2 chunks of 6 bits plus 4 leftover bits.
To solve this problem, Base64 encodes bytes in groups of 3 consecutive bytes (24 bits, exactly four 6-bit chunks). If the input length is not divisible by three, the last group is filled up with zero bits.
Here is a little program that demonstrates the problem:
package testing;
import java.util.Base64;
public class SimpleTest {
public static void main(String[] args) {
// An array containing six bytes to encode and decode.
byte[] fullArray = { 0b01010101, (byte) 0b11110000, (byte)0b10101010, 0b00001111, (byte)0b11001100, 0b00110011 };
// The same array broken into three chunks of two bytes.
byte[][] threeTwoByteArrays = {
{ 0b01010101, (byte) 0b11110000 },
{ (byte)0b10101010, 0b00001111 },
{ (byte)0b11001100, 0b00110011 }
};
Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
// Encode the full array
String encodedFullArray = encoder.encodeToString(fullArray);
// Encode the three chunks consecutively
StringBuilder encodedStringBuilder = new StringBuilder();
for ( byte [] twoByteArray : threeTwoByteArrays ) {
encodedStringBuilder.append(encoder.encodeToString(twoByteArray));
}
String encodedInChunks = encodedStringBuilder.toString();
System.out.println("Encoded full array: " + encodedFullArray);
System.out.println("Encoded in chunks of two bytes: " + encodedInChunks);
// Now decode the two resulting strings
Base64.Decoder decoder = Base64.getDecoder();
byte[] decodedFromFull = decoder.decode(encodedFullArray);
System.out.println("Byte array decoded from full: " + byteArrayBinaryString(decodedFromFull));
byte[] decodedFromChunked = decoder.decode(encodedInChunks);
System.out.println("Byte array decoded from chunks: " + byteArrayBinaryString(decodedFromChunked));
}
/**
* Convert a byte array to a string representation in binary
*/
public static String byteArrayBinaryString( byte[] bytes ) {
StringBuilder sb = new StringBuilder();
sb.append('[');
for ( byte b : bytes ) {
sb.append(Integer.toBinaryString(Byte.toUnsignedInt(b))).append(',');
}
if ( sb.length() > 1) {
sb.setCharAt(sb.length() - 1, ']');
} else {
sb.append(']');
}
return sb.toString();
}
}
So, imagine my 6-byte array is your image file. And imagine that your buffer is not reading 1024 bytes but 2 bytes each time. This is going to be the output of the encoding:
Encoded full array: VfCqD8wz
Encoded in chunks of two bytes: VfAqg8zDM
As you can see, the encoding of the full array gave us 8 characters. Each group of three bytes is converted into four chunks of 6 bits, which in turn are converted into four characters.
But the encoding of the three two-byte arrays gave you a string of 9 characters. It's a completely different string! Each group of two bytes was extended to three chunks of 6 bits by padding with zeros. And since you asked for no padding, it produces only 3 characters, without the extra = that usually marks when the number of bytes is not divisible by 3.
The output from the part of the program that decodes the 8-character, correct encoded string is fine:
Byte array decoded from full: [1010101,11110000,10101010,1111,11001100,110011]
But the result from attempting to decode the 9-character, incorrect encoded string is:
Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
at java.util.Base64$Decoder.decode0(Base64.java:734)
at java.util.Base64$Decoder.decode(Base64.java:526)
at java.util.Base64$Decoder.decode(Base64.java:549)
at testing.SimpleTest.main(SimpleTest.java:34)
Not good! A padded Base64 string always has a length that is a multiple of 4, and even without padding the length can never be one more than a multiple of 4. We have 9 characters.
Since you chose a buffer size of 1024, which is not a multiple of 3, that problem will happen. You need to encode a multiple of 3 bytes each time to produce the proper string. So in fact, you need to create a buffer sized 3072 or something like that.
But because of the first problem, be very careful at what you pass to the encoder. Because it can always happen that you'll be reading less than 3072 bytes. And then, if the number is not divisible by three, the same problem will occur.
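Both points can be handled together by filling the buffer completely before each encode call. The following is a minimal sketch under that assumption; the class ChunkedBase64 and its methods fill and encode are illustrative names, not code from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of the original answer.
class ChunkedBase64 {

    /** Fills buf as far as possible; returns fewer than buf.length bytes only at end of stream. */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Encodes the whole stream chunk by chunk; every chunk except the last is a multiple of 3 bytes. */
    static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[3072]; // 3072 = 3 * 1024, so full chunks need no padding
        int n;
        while ((n = fill(in, buf)) > 0) {
            // Pass only the bytes that were actually read in this round.
            sb.append(encoder.encodeToString(Arrays.copyOf(buf, n)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = new byte[5000];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = (byte) i;
        }
        String chunked = encode(new ByteArrayInputStream(sample));
        System.out.println(chunked.equals(Base64.getEncoder().encodeToString(sample)));
    }
}
```

Only the final chunk can have a length that is not divisible by 3, so padding (if any) appears only at the very end, exactly where a decoder expects it.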
Look at:
while ( ( r = imageInFile.read( buff)) > 0 ) {
String imageData = encodeImage(buff);
read returns -1 on end-of-file or the actual number of bytes that were read.
So the last buff might not be totally read, and even contain garbage from any prior read. So you need to use r.
As this is an assignment, the rest is up to you.
By the way:
byte[] array = new byte[1024];
is more conventional in Java. The syntax:
byte array[] = ...
was for compatibility with C/C++.
I have a ByteArrayOutputStream which holds a byte representation of an XML with 750MB size.
I need to convert it to String.
I wrote:
ByteArrayOutputStream xmlArchive = ...
String xmlAsString = xmlArchive.toString(UTF8);
However, although I am using a 4 GB heap, I get java.lang.OutOfMemoryError: Java heap space.
What is wrong? How can I know which heap size to use? I am using a 64-bit JDK.
UPDATE
I need it as String in order to remove all the characters before "<?xml"
Currently my code is:
String xmlAsString = xmlArchive.toString(UTF8);
int xmlBegin = xmlAsString.indexOf("<?xml");
if (xmlBegin >0){
return xmlAsString.substring(xmlBegin);
}
return xmlAsString;
I then convert it again to byte array.
UPDATED 2
The ByteArrayOutputStream is written like this:
HttpMethod method ..
InputStream response = method.getResponseBodyAsStream();
byte[] buf = new byte[5000];
while ( (len=response.read(buf)) != -1) {
output.write(buf, 0, len);
}
len is from the header of the response Content-Length
You could use the Scanner class:
Scanner scanner = new Scanner(response, StandardCharsets.UTF_8.name());
// skip to "<?xml"
scanner.skip(".*?(?=<\\?xml)");
// process rest of stream
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
// Do something with line
}
scanner.close();
Expanding on Jamie Cockburn's answer:
To fill in his while loop to match your expected behaviour (note that nextLine() strips the line terminator, so write one back if you need it):
byte[] buf = line.getBytes(StandardCharsets.UTF_8);
output.write(buf, 0, buf.length);
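If the data is already in a byte array, another option is to look for the marker at the byte level and never build a String at all. This is a different technique from the Scanner approach above, shown only as a sketch: XmlPrefixStripper and its methods are invented names. It relies on "<?xml" being pure ASCII, whose bytes never occur inside a multi-byte UTF-8 sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original answers.
class XmlPrefixStripper {

    /** Returns the index of the first occurrence of needle in haystack, or -1 if absent. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Drops everything before "<?xml"; returns the input unchanged if the marker is missing or at index 0. */
    static byte[] stripBeforeXmlDeclaration(byte[] data) {
        int start = indexOf(data, "<?xml".getBytes(StandardCharsets.US_ASCII));
        return start > 0 ? Arrays.copyOfRange(data, start, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] raw = "garbage<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(stripBeforeXmlDeclaration(raw), StandardCharsets.UTF_8));
    }
}
```

For very large inputs the copy in copyOfRange can be avoided as well by writing the range straight to the destination with write(data, start, data.length - start).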
How can I read the first 2 bytes from an input stream, convert them into the actual int length value, and then read and copy the rest of the message into a byte array?
The data array for the rest of the message can only be allocated after the first 2 bytes have been read from the stream. Does anyone know an efficient way to do this?
Use a DataInputStream. Use the readUnsignedShort() method to return the length word, then the readFully() method to read the following data.
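A minimal sketch of that suggestion, assuming the 2-byte length is sent in big-endian order (which is what readUnsignedShort() expects). The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original answer.
class ShortPrefixedReader {

    /** Reads a 2-byte big-endian length (0..65535), then exactly that many payload bytes. */
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int length = dis.readUnsignedShort();
        byte[] payload = new byte[length];
        dis.readFully(payload); // loops internally until the array is full, or throws EOFException
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {0, 3, 'a', 'b', 'c'};
        System.out.println(readFrame(new ByteArrayInputStream(wire)).length);
    }
}
```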
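This creates a string from a byte array. It reads a 4-byte length with readInt(); for a 2-byte length prefix, use readUnsignedShort() instead. Adapt as needed.
try {
InputStream in = socket.getInputStream();
DataInputStream dis = new DataInputStream(in);
// read the big-endian length prefix
int len = dis.readInt();
byte[] data = new byte[len];
if (len > 0) {
// blocks until exactly len bytes have been read
dis.readFully(data);
}
// decodes with the platform default charset
String sReturn = new String(data);
} catch (IOException e) {
// handle or rethrow the I/O failure
e.printStackTrace();
}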
I have a file which is split in two parts by "\n\n" - first part is not too long String and second is byte array, which can be quite long.
I am trying to read the file as follows:
byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {
final InputStreamReader isr = new InputStreamReader(fis);
final BufferedReader reader = new BufferedReader(isr);
String line;
// reading until \n\n
while (!(line = reader.readLine()).trim().isEmpty()){
// processing the line
}
// copying the rest of the byte array
result = IOUtils.toByteArray(reader);
reader.close();
}
Even though the resulting array is the size it should be, its contents are broken. If I try to use toByteArray directly on fis or isr, the contents of result are empty.
How can I read the rest of the file correctly and efficiently?
Thanks!
The reason your contents are broken is because the IOUtils.toByteArray(...) function reads your data as a string in the default character encoding, i.e. it converts the 8-bit binary values into text characters using whatever logic your default encoding prescribes. This usually leads to many of the binary values getting corrupted.
Depending on how exactly the charset is implemented, there is a slight chance that this might work:
result = IOUtils.toByteArray(reader, "ISO-8859-1");
ISO-8859-1 uses only a single byte per character. Not all character values are defined, but many implementations will pass them anyways. Maybe you're lucky with it.
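The difference between charsets is easy to check in isolation. The sketch below (CharsetRoundTrip is an invented name) decodes all 256 byte values to a String and encodes them back. With the JDK's ISO-8859-1 implementation every byte survives, while with UTF-8 the bytes that do not form valid sequences are replaced during decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class CharsetRoundTrip {

    /** Decodes the bytes to a String and encodes them back with the same charset. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    /** Returns an array holding every possible byte value once. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // prints true: each byte maps to exactly one char and back
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1)));
        // prints false: invalid UTF-8 sequences are replaced while decoding
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));
    }
}
```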
But a much cleaner solution would be to read the String at the beginning as binary data and then convert it to text via new String(bytes), rather than reading the binary data at the end as a String and then converting it back.
This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:
http://www.docjar.com/html/api/java/io/BufferedReader.java.html
It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
Alternatively, you could read the whole file into a byte array, find the position of \n\n, and split the array into the header line and the remaining bytes:
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
if (a[i] == '\n' && a[i + 1] == '\n') {
line = new String(a, 0, i);
// skip both '\n' bytes of the separator
int len = a.length - i - 2;
result = new byte[len];
System.arraycopy(a, i + 2, result, 0, len);
break;
}
}
Thanks for all the comments - the final implementation was done in this way:
try (final FileInputStream fis = new FileInputStream(file)) {
ByteBuffer buffer = ByteBuffer.allocate(64);
boolean wasLast = false;
String headerValue = null, headerKey = null;
byte[] result = null;
while (true) {
int next = fis.read();
if (next == -1) {
// end of file reached before \n\n
break;
}
byte current = (byte) next;
if (current == '\n') {
if (wasLast) {
// this is \n\n
break;
} else {
// just a new line in header
wasLast = true;
headerValue = new String(buffer.array(), 0, buffer.position());
buffer.clear();
}
} else if (current == '\t') {
// headerKey\theaderValue\n
headerKey = new String(buffer.array(), 0, buffer.position());
buffer.clear();
} else {
buffer.put(current);
wasLast = false;
}
}
// reading the rest
result = IOUtils.toByteArray(fis);
}
I am trying to first read 4 bytes (an int) specifying the size of the message, and then read the remaining bytes based on that byte count. I am using the following code to accomplish this:
DataInputStream dis = new DataInputStream(
mClientSocket.getInputStream());
// read the message length
int len = dis.readInt();
Log.i(TAG, "Reading bytes of length:" + len);
// read the message data
byte[] data = new byte[len];
if (len > 0) {
dis.readFully(data);
} else {
return "";
}
return new String(data);
Is there a better or more efficient way of doing this?
From JavaDocs of readUTF:
First, two bytes are read and used to construct an unsigned 16-bit
integer in exactly the manner of the readUnsignedShort method. This
integer value is called the UTF length and specifies the number of
additional bytes to be read. These bytes are then converted to
characters by considering them in groups. The length of each group is
computed from the value of the first byte of the group. The byte
following a group, if any, is the first byte of the next group.
The only problem with this is that your protocol seems to send 4 bytes for the payload length, while readUTF expects 2. Perhaps you can write a similar method but increase the size of the length prefix it reads to 4 bytes (32 bits).
Also, I see that you are just doing new String(bytes), which works only as long as the data is encoded in "the platform's default charset" (see the String javadoc). It would be much safer to specify the encoding explicitly: for example, if you know that the sender sends UTF-8, use new String(bytes,"UTF-8") instead.
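To make both points concrete, here is a sketch of a 4-byte length prefix combined with an explicit charset on both ends. Utf8Framing and its methods are illustrative names, and the sender side is an assumption, since the question shows only the receiver. Note that the prefix holds the number of encoded bytes, which differs from String.length() as soon as the text contains non-ASCII characters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class Utf8Framing {

    /** Builds a frame: 4-byte big-endian byte count, then the UTF-8 bytes of the text. */
    static byte[] encodeMessage(String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(payload.length); // byte count, not text.length()
        out.write(payload);
        out.flush();
        return buffer.toByteArray();
    }

    /** Reads one frame written by encodeMessage and decodes it as UTF-8. */
    static String decodeMessage(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        byte[] payload = new byte[dis.readInt()];
        dis.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo \u20ac"; // 7 chars, 10 UTF-8 bytes
        byte[] frame = encodeMessage(text);
        System.out.println(frame.length);                                   // 14
        System.out.println(decodeMessage(new ByteArrayInputStream(frame)).equals(text)); // true
    }
}
```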
How about
DataInputStream dis = new DataInputStream(new BufferedInputStream(
mClientSocket.getInputStream()));
return dis.readUTF();
You can use read(byte[] b, int off, int len) like this, but note that it may return before len bytes have arrived, so check the return value:
byte[] data = new byte[len];
dis.read(data,0,len);
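The difference between read and readFully shows up as soon as the underlying stream delivers data in small pieces, as a socket may. The sketch below simulates that with a made-up wrapper (trickle) that never hands out more than 3 bytes per call; all names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of the original answers.
class ShortReadDemo {

    /** Test stream that returns at most 3 bytes per bulk read, like a slow connection might. */
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    /** One call to read(byte[], int, int); returns how many bytes it delivered. */
    static int singleRead(byte[] source, byte[] dest) throws IOException {
        DataInputStream dis = new DataInputStream(trickle(source));
        return dis.read(dest, 0, dest.length);
    }

    /** readFully keeps reading until the whole array is filled. */
    static byte[] readAll(byte[] source) throws IOException {
        DataInputStream dis = new DataInputStream(trickle(source));
        byte[] dest = new byte[source.length];
        dis.readFully(dest);
        return dest;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(singleRead(source, new byte[10])); // 3, not 10
        System.out.println(readAll(source).length);           // 10
    }
}
```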