I'm making a socket application in Java that receives some HTML data from the server in ASCII and then parses the data accordingly.
byte[] receivedContent = new byte[12500];
receivedSize = inputStream.read(receivedContent);
if (receivedSize == -1) {
    System.out.println("ERROR! NO DATA RECEIVED");
    System.exit(-1);
}
receivedContent = Arrays.copyOf(receivedContent, receivedSize);
lastReceived = new String(receivedContent, StandardCharsets.US_ASCII);
This should really be quite straightforward, but it's not. I printed out some debug messages and found that despite receiving some bytes of data (for example, printing receivedSize tells me it received 784 bytes), the resulting string from those bytes is only a few chars long, like this:
Ard</a></li><li><a
I'm expecting a full HTML document, so this is clearly wrong. There's also no obvious pattern as to when this might happen; it seems totally random. Since I'm allocating new memory for the buffer, there really shouldn't be any old data in it that messes with the new data from the socket. Can someone shed some light on this strange behavior? Also, this seems to happen less frequently on my Windows machine running OracleJDK than on my remote Ubuntu machine running OpenJDK. Could that be the reason, and how would I fix it?
UPDATE:
In the end I manually inspected the byte array's ASCII encoding against an ASCII table and found that the server is intentionally sending garbled data. Mystery solved.
Instead of using:
inputStream.read(receivedContent);
A single read() call returns whatever bytes happen to be available at that moment, which may be only part of the response. You need to keep reading until you have all the data from the stream, using something like (from Apache Commons IO):
IOUtils.readFully(inputStream, receivedContent)
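If you'd rather not add a dependency, the same thing can be done with a small plain-Java loop (a sketch; the class and method names here are mine): since a single read() is not guaranteed to fill the buffer, keep calling it until EOF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamUtil {
    // Keep calling read() until EOF, accumulating whatever each call returns.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // append only the bytes actually read
        }
        return out.toByteArray();
    }
}
```

Note that for HTTP specifically, the server may keep the connection open, so "read until EOF" only works if the server closes the connection after sending; otherwise you need to honor Content-Length or chunked encoding.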
Related
I have been trying to implement decompression of text compressed in GZIP format. Below is the method I implemented:
private byte[] decompress(String compressed) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(compressed.getBytes(StandardCharsets.UTF_8));
    GZIPInputStream ungzip = new GZIPInputStream(in);
    byte[] buffer = new byte[256];
    int n;
    while ((n = ungzip.read(buffer)) >= 0) {
        out.write(buffer, 0, n);
    }
    return out.toByteArray();
}
And now I am testing the solution with the following compressed text:
H4sIAAAAAAAACjM0MjYxBQAcOvXLBQAAAA==
And I get a "Not in GZIP format" exception. I tried different ways, but the error persists. Does anyone have an idea what I am doing wrong?
That's not gzip-formatted. In general, compressed cannot be a String, because compressed data is bytes, and a string isn't bytes. Some languages / tutorials / 1980s thinking conflate the two, but it's the 2020s; we don't do that anymore (there are more characters than what's used in English).
It looks like perhaps the following has occurred:
Someone has some data.
They gzipped it.
They then turned the gzipped stream (which is bytes) into characters using Base64 encoding.
They sent it to you.
You now want to get back to the data.
Given that 2 transformations occurred (first, gzip it, then, base64 it), you need to also do 2 transformations, in reverse. You need to:
Take the input string, and de-base64 it, giving you bytes.
You then need to take these bytes and decompress them.
and now you have the original data back.
Thus:
byte[] gzipped = java.util.Base64.getDecoder().decode(compressed);
var in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
return in.readAllBytes();
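Put together, a corrected version of the method might look like this (a sketch; readAllBytes on the stream requires Java 9+, and I've added try-with-resources to close it):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

class GzipText {
    // Reverse the two transformations in order: de-base64 first, then gunzip.
    static byte[] decompress(String compressed) throws IOException {
        byte[] gzipped = Base64.getDecoder().decode(compressed);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return in.readAllBytes();
        }
    }
}
```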
Note:
Pushing the data from an input stream to an output stream like this is a waste of resources and a bunch of finicky code. There is no need to write it; just call readAllBytes.
If the incoming Base64 is large, there are ways to do this in a streaming fashion. This would require that this method take a Reader (instead of a String, which cannot be streamed), and return an InputStream instead of a byte[]. Of course, if the input is not particularly large, there is no need. The above approach is somewhat wasteful: the base64-ed data, the un-base64ed data, and the decompressed data are all in memory at the same time. You can't avoid this, nor can the garbage collector collect any of it in between (because the caller most likely continues to reference that base64-ed string).
In other words, if the compression ratio is, say, 50%, and the total uncompressed data is 100MB in size, this method takes MORE than:
100MB (uncompressed) + 50MB (compressed) + 67MB (compressed but base64-ed: 50 * 4/3) = ~217MB of memory.
You know better than we do how much heap your VM is running on, and how large the input data is likely to ever get.
NB: Base64 transfer is extremely inefficient, taking 4 bytes of base64 content for every 3 bytes of input data, and if the data transfer is in UTF-16, it's 8 bytes per 3, even. Ouch. Given that the content was GZipped, this feels a bit daft: First we painstakingly reduce the size of this thing, and then we casually inflate it by 33% for probably no good reason. You may want to check the 'pipe' that leads you to this, possibly you can just... eliminate the base64 aspect of this.
For example, if you have a wire protocol and someone decided that JSON was a good idea, then.. simply.. don't. JSON is not a good idea if you have the need to transfer a bunch of raw data. Use protobuf, or send a combination of JSON and blobs, etc.
DISCLAIMER: PeopleSoft knowledge is not mandatory in order to help me with this one!
How could I extract the data from that PeopleSoft table, from the PUBDATALONG column?
The description of the table is here:
http://www.go-faster.co.uk/peopletools/psiblogdata.htm
Currently I am using a program written in Java; below is a piece of the code:
Inflater inflater = new Inflater();
byte[] result = new byte[rs.getInt("UNCOMPDATALEN")];
inflater.setInput(rs.getBytes("PUBDATALONG"));
int length = inflater.inflate(result);
System.out.println(new String(result, 0, length, "UTF-8"));
System.out.println();
System.out.println("-----");
System.out.println();
How could I rewrite this using Python?
It is a question that has appeared in other forms on Stack Overflow but had no real answer.
I have a basic understanding of what the Java code does, but I don't know any library in Python I could work with to achieve the same thing.
Some recommended trying zlib, as it is compatible with the algorithm used by Java's Inflater class, but I did not succeed with that.
Considering the below facts from PeopleSoft manual:
When the message is received by the PeopleSoft database, the XML data is converted to UTF-8 to prevent any UCS2 byte order issues. It is also compressed using the deflate algorithm prior to storage in the database.
I tried something like this:
import zlib
import base64
UNCOMPDATALEN = 362 #this value is taken from the DB and is the dimension of the data after decompression.
PUBDATALONG = '789CB3B1AFC8CD51284B2D2ACECCCFB35532D43350B2B7E3E5B2F130F40C8977770D8977F4710D0A890F0E710C090D8EF70F0D09080DB183C8BAF938BAC707FBBBFB783ADA19DAE86388D904B90687FAC0F4DAD940CD70F67771B533B0D147E6DAE8A3A9D5C76B3F00E2F4355C=='
print zlib.decompress(base64.b64decode(PUBDATALONG), 0, 362)
and I get this:
zlib.error: Error -3 while decompressing data: incorrect header check
For sure I am doing something wrong, but I am not smart enough to figure it out by myself.
That string is not Base64-encoded. It is simply hexadecimal. (I have no idea why it ends in ==, which makes it look a little like a Base64 string.) You should be able to see by inspection that there are no lower-case letters, or for that matter upper-case letters after F, as there would be in a typical Base64 encoding of compressed, i.e. random-appearing, data.
Remove the equal signs at the end and use .decode("hex") in Python 2, or bytes.fromhex() in Python 3.
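In Python 3, the whole pipeline might look like this (a sketch; the function name is mine, and zlib handles the 789C-prefixed zlib/deflate stream directly):

```python
import zlib

def decode_pubdata(pubdatalong: str) -> str:
    # Strip the stray '=' padding, hex-decode (NOT base64), then inflate.
    raw = bytes.fromhex(pubdatalong.rstrip("="))
    # Per the PeopleSoft manual quoted above, the stored payload is
    # deflate-compressed UTF-8 XML.
    return zlib.decompress(raw).decode("utf-8")
```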
Hello StackOverflow community,
I have been around this community for a while, but I have never had a problem like this without any solution online, or at least I couldn't find one.
I'm using Java to make a client. As soon as the client connects to the server, it receives a packet containing sensitive and essential information, and of course it is encrypted. I successfully reverse-engineered the cryptography behind the whole process a long time ago and implemented it in C++ without any problem, fully tested with positive results.
Now I'm trying to rewrite the client in Java for science and better coding speed, but the problem is that the packet is different from what it should look like.
For example, sniffing the packet with a native C application gives me one buffer, but the same packet in my Java client comes out different.
What do I mean? I mean there are several 0xEF 0xBF 0xBD byte sequences around which are not valid, resulting in a corrupted buffer and then a decryption failure.
These are the screenshots to let you understand better:
[screenshot: original CORRECT packet]
[screenshot: packet dumped by Java, which is CORRUPTED]
I'm using a Reader as the reading class for the socket's input stream.
Do you have any idea about the cause of the problem?
private Reader _br = new InputStreamReader(socket.getInputStream());
char[] _data = new char[92];
this._br.read(_data);
_dump(toBytes(_data));
I just put the code related to the issue.
You are decoding random bytes (probably your ciphertext) to characters, using the platform default encoding, which appears to be UTF-8.
This generally doesn't work, of course, so the replacement character, "�" or U+FFFD, is substituted in the character stream wherever invalid byte sequences are encountered.
Then you print the characters, encoding the (now-corrupted) text into UTF-8. The UTF-8 encoding of U+FFFD is 0xEF 0xBF 0xBD.
The cause of the problem is that you are treating non-textual data as text.
Update:
The problem is that you are creating an InputStreamReader. Don't do that. That would only be useful if the input stream contains encoded text. Read your input into a byte array instead:
InputStream is = socket.getInputStream();
byte[] data = new byte[92];
for (int pos = 0; pos < data.length; ) {
int n = is.read(data, pos, data.length - pos);
if (n < 0)
throw new EOFException();
pos += n;
}
/* Print what you read for debugging... */
for (byte b : data)
System.out.printf("%02X", b & 0xFF);
System.out.println();
Now data contains your packet. You can parse it and decrypt the ciphertext. Perhaps the resulting plain text is actually text, and at that point, you can decode it into characters.
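The manual loop above can also be replaced by DataInputStream.readFully, which blocks until the requested number of bytes has arrived (a sketch; the helper class and method names are mine):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class PacketReader {
    // Read exactly n raw bytes; readFully throws EOFException
    // if the stream ends before n bytes have arrived.
    static byte[] readExactly(InputStream is, int n) throws IOException {
        byte[] data = new byte[n];
        new DataInputStream(is).readFully(data);
        return data;
    }
}
```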
My question may be quite vague; let me expand on it here.
I'm developing an application in which I'll be reading data from a file. I have a FileReader class which opens the file in the following fashion:
currentFileStream = new FileInputStream(currentFile);
fileChannel = currentFileStream.getChannel();
The data is read as follows:
bytesRead = fileChannel.read(buffer); // Data is buffered using a ByteBuffer
I'm processing the data in one of two forms: binary or character.
If it's processed as characters, I do an additional step of decoding the ByteBuffer into a CharBuffer:
CoderResult result = decoder.decode(byteBuffer, charBuffer, false);
Now my problem: I need to reposition the file to some offset during recovery mode, in case of a failure or crash in the application.
For this, I maintain a byteOffset which keeps track of the number of bytes processed in binary mode, and I persist this variable.
If something happens I reposition the file like this
fileChannel.position(byteOffset);
which is pretty straightforward.
But if the processing mode is character, I maintain a recordOffset which keeps track of the character position/offset in the file. During recovery I make calls to read() internally until I reach the persisted character offset, recordOffset+1.
Is there any way to get the corresponding bytes that were needed to decode the characters? For instance, if recordOffset is 400, its corresponding byteOffset might be 410 or 480 (depending on the charset). Then while repositioning I could do this:
fileChannel.position(recordOffset); //recordOffset equivalent value in number of bytes
instead of making repeated calls internally in my application.
The other approach I could think of was using InputStreamReader's skip method.
If there is any better approach for this, or if it is possible to get the byte-to-character mapping, please let me know.
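For what it's worth, one way to build such a mapping yourself (a sketch of an approach, not a built-in facility; the class name is mine) is to note the ByteBuffer's position before and after each decode call, which tells you how many bytes produced the characters in that batch:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class DecodeCounter {
    // Decode one batch and return how many bytes the decoder consumed,
    // so character offsets can be tied back to byte offsets.
    static int decodeAndCountBytes(ByteBuffer bytes, CharBuffer chars) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        int before = bytes.position();
        decoder.decode(bytes, chars, false);
        return bytes.position() - before;
    }
}
```

Accumulating these per-batch byte counts alongside the character counts would let you persist a byte offset for any character offset, at the granularity of your decode batches.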
In Java, is there another object like BufferedReader to read data received from a server?
I ask because the server sends a string without a newline, and the client doesn't print anything until the server closes the connection due to a timeout (the timeout message has a newline!). Only then does the client print the whole received message plus the timeout message sent by the server.
Help me, thanks!!
Just don't read by newlines using the readLine() method; read char-by-char using the read() method instead.
int c;
while ((c = reader.read()) != -1) {
    System.out.print((char) c);
}
You asked for another class to use, so in that case give Scanner a try. It's usually used for delimiting input based on patterns or on the types inferred from the input (e.g. reading on a byte-by-byte basis or an int-by-int basis, or some combination thereof). However, you can use it as just a general-purpose "reader" here as well, to cover your use case.
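As a sketch of that idea (the helper class is mine, and the US-ASCII charset is an assumption), a Scanner with an empty delimiter hands back the input one character per token, without waiting for newlines:

```java
import java.io.InputStream;
import java.util.Scanner;

class CharScanner {
    // Read the stream one character at a time using Scanner's empty delimiter.
    static String readAllChars(InputStream in) {
        Scanner sc = new Scanner(in, "US-ASCII");
        sc.useDelimiter(""); // every token is a single character
        StringBuilder sb = new StringBuilder();
        while (sc.hasNext()) {
            sb.append(sc.next());
        }
        return sb.toString();
    }
}
```

Note that, like read(), this still blocks while waiting for more input; it just doesn't require a newline before producing output.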
When you read anything from a server, you have to strictly follow the communication protocol. For example the server might be an HTTP server or an SMTP server, it may encrypt the data before sending it, some of the data may be encoded differently, and so on.
So you should basically ask: What kind of server do I want to access? How does it send the bytes to me? And has someone else already done the work of interpreting the bytes so that I can quickly get to the data that I really want?
If it is an HTTP server, you can use the code new URL("http://example.org/").openStream(). Then you will get a stream of bytes. How you convert these bytes into characters, strings and other stuff is another task.
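Once you have that stream, converting the bytes to a String still needs an explicit charset (a sketch; UTF-8 here is an assumption, since the real charset should come from the HTTP headers, and readAllBytes requires Java 9+):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BodyText {
    // Drain the stream and decode the collected bytes as UTF-8 text.
    static String asText(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```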
You could also try (from Apache Commons IO):
InputStream is = ... // from input
String text = IOUtils.toString(is, StandardCharsets.UTF_8);
This turns the whole input into text without waiting for newlines (and it preserves the original newlines as well).