I have a Java server communicating with a PHP script invoked from Apache. I am aiming to send JSON from the Java server to the PHP client when requested; however, something is getting prefixed to the message when it's received on the client.
JAVA
in = new BufferedReader(new InputStreamReader (socket.getInputStream()));
out = new DataOutputStream(socket.getOutputStream());
//The server receives a JSON from the PHP script and replies. It receives and converts to a Gson JSON no problem.
String reply = "{\"status\":\"reg\",\"token\":\""+client.getToken()+"\"}\r\n";
//reply = "HELLO\r";
out.writeUTF(reply);
PHP
$rec = socket_read($socket, 2048,PHP_NORMAL_READ);
echo "Receiving... ";
echo $rec;
The issue is that the received message is prefixed with some garbage.
Output From PHP
Receiving... 1{"status":"reg","token":"QOPIPCNDI4K97QP0NAQF"}
If I send "HELLO\r"
Receiving... >HELLO
You shouldn't use DataOutputStream.writeUTF() unless you are using DataInputStream.readUTF() to read the message.
Here is a snippet of the javadoc of writeUTF():
Writes a string to the underlying output stream using modified UTF-8
encoding in a machine-independent manner.
First, two bytes are written to the output stream as if by the
writeShort method giving the number of bytes to follow. This value is
the number of bytes actually written out, not the length of the
string. Following the length, each character of the string is output,
in sequence, using the modified UTF-8 encoding for the character. If
no exception is thrown, the counter written is incremented by the
total number of bytes written to the output stream. This will be at
least two plus the length of str, and at most two plus thrice the
length of str.
The part above about the two length bytes written first may tell you why you are getting weird characters at the beginning of your message.
Here is a workaround I believe will work in your case
BufferedOutputStream out = new BufferedOutputStream(socket.getOutputStream());
out.write(reply.getBytes("UTF-8")); // send the raw UTF-8 bytes, with no length prefix
out.flush();
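Alternatively, a Writer can do the character encoding for you. A minimal sketch, reusing the reply string from the question:
BufferedWriter writer = new BufferedWriter(
        new OutputStreamWriter(socket.getOutputStream(), "UTF-8"));
writer.write(reply); // reply already ends with \r\n, which PHP_NORMAL_READ stops at
writer.flush();      // push the buffered bytes onto the socket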
Reference: Why does DataOutputStream.writeUTF() add additional 2 bytes at the beginning?
Related
Suppose \u4404 means 'A' in Japanese.
The following code will print 'A' on the screen.
String str = "\u4404";
System.out.println(str);
This is true because encode("\u4404", unicode)='A'.
But I encountered one problem. When I process a message received over HTTP, I get output like the following:
{"name":"\u4404\u2424\u4022","age":"30"}
The HTTP header shows the reply is encoded as UTF-8. But why does the output look like this?
Here is my guess:
Suppose the stream I received from the web is mystream. Then, after we decode mystream as UTF-8, I get \u4404\u2424\u4022. I would have to decode mystream TWO times to get the right 'A' in Japanese.
Am I right? If I am right, why transfer data like this? Because of JSON? Thanks very much for your answer!
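What actually arrives over HTTP here is plain ASCII: the six characters \, u, 4, 4, 0, 4. UTF-8 decoding only gives you that literal text; it is the JSON parser, not a second round of charset decoding, that turns the escape into the real character. A minimal sketch of the distinction:
String escaped = "\\u4404"; // six ASCII characters, as found in the raw JSON text
String decoded = "\u4404";  // one character, what a JSON parser produces from the escape
System.out.println(escaped.length()); // prints 6
System.out.println(decoded.length()); // prints 1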
I'm trying to read a file, which has multiple delimited messages in it (in the thousands), how can I do this properly using Google protobufs?
This is how I'm writing the delimited:
MyMessage myMessage = MyMessage.parseFrom(msg); // msg is a byte[]
myMessage.writeDelimitedTo(fileOutputStream);   // fileOutputStream is a FileOutputStream
and this is how I'm reading the delimited file;
CodedInputStream is = CodedInputStream.newInstance(new FileInputStream("/location/to/file"));
while (!is.isAtEnd()) {
int size = is.readRawVarint32();
MyMessage msg = MyMessage.parseFrom(is.readRawBytes(size));
//do stuff with your messages
}
I'm kind of confused because the accepted answer in this question says to use .parseDelimitedFrom() to read the delimited bytes: Google Protocol Buffers - Storing messages into file
However, when using .parseDelimitedFrom(), it only reads the first message. (I don't know how to read the whole file using parseDelimitedFrom().)
This comment says to write the delimited messages using CodedOutputStream: Google Protocol Buffers - Storing messages into file (i.e. writer.writeRawVarint32()). I'm currently using the implementation of this comment to read the whole file. Does writeDelimitedTo() basically do the same thing as
writer.writeRawVarint32(bytes.length);
and
writer.writeRawBytes(bytes);
Also, if my way isn't the proper way of reading a whole file consisting of delimited messages, can you please show me what is?
Thank you.
Yes, writeDelimitedTo() simply writes the length as a varint followed by the bytes. There's no need to use CodedOutputStream directly if you're working in Java.
parseDelimitedFrom() parses one message, but you may call it repeatedly to parse all the messages in the InputStream. The method will return null when you reach the end of the stream.
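Putting that together, a minimal read loop over the whole file (a sketch, reusing MyMessage and the path from the question):
FileInputStream in = new FileInputStream("/location/to/file");
MyMessage msg;
while ((msg = MyMessage.parseDelimitedFrom(in)) != null) {
    // do stuff with each message
}
in.close();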
My team and I have this nasty problem with parsing a string received from our server. The server is pretty simple socket stuff done in qt here is the sendData function:
void sendData(QTcpSocket *client,QString response){
QString text = response.toUtf8();
QByteArray block;
QDataStream out(&block, QIODevice::WriteOnly);
out << (quint32)0;
out << text;
out.device()->seek(0);
out << (quint32)(block.size() - sizeof(quint32));
try{
client->write(block);
}
catch(...){...
The client is in Java and is also pretty standard socket stuff. Here is where we are now, after trying many different ways of decoding the response from the server:
Socket s;
try {
s = new Socket(URL, 1987);
PrintWriter output = new PrintWriter(s.getOutputStream(), true);
InputStreamReader inp = new InputStreamReader(s.getInputStream(), Charset.forName("UTF-8"));
BufferedReader rd = new BufferedReader( inp );
String st;
while ((st = rd.readLine()) != null){
System.out.println(st);
}...
If a connection is made with the server, it sends the string "Send Handshake", with the size of the string in bytes sent before it, as seen in the first block of code. This notifies the client that it should send authentication to the server. As of now, the string we get from the server looks like this:
������ ��������S��e��n��d�� ��H��a��n��d��s��h��a��k��e
We have used tools such as string encode/decode tool to try and assess how the string is encoded but it fails on every configuration.
We are out of ideas as to what encoding this is, if any, or how to fix it.
Any help would be much appreciated.
At a glance, the line where you convert the QString parameter to a Utf8 QByteArray and then back to a QString seems odd:
QString text = response.toUtf8();
When the QByteArray returned by toUtf8() is assigned to text, I think it is assumed that the QByteArray contains an ASCII (char*) buffer.
I'm pretty sure that QDataStream is intended to be used only within Qt. It provides a platform-independent way of serializing data that is then intended to be deserialized with another QDataStream somewhere else. As you noticed, it's including a lot of extra stuff besides your raw data, and that extra stuff is subject to change at the next Qt version. (This is why the documentation suggests including in your stream the version of QDataStream being used ... so it can use the correct deserialization logic.)
In other words, the extra stuff you are seeing is probably meta-data included by Qt and it is not guaranteed to be the same with the next Qt version. From the docs:
QDataStream's binary format has evolved since Qt 1.0, and is likely to
continue evolving to reflect changes done in Qt. When inputting or
outputting complex types, it's very important to make sure that the
same version of the stream (version()) is used for reading and
writing.
If you are going to another language, this isn't practical to use. If it is just text you are passing, use a well-known transport mechanism (JSON, XML, ASCII text, UTF-8, etc.) and bypass the QDataStream altogether.
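That said, if you must consume the existing framing from Java, the layout is predictable: the server writes a quint32 block size, and QDataStream serializes a QString as another quint32 byte count followed by UTF-16 big-endian data (this assumes QDataStream's default big-endian byte order). A hedged sketch of the read side:
DataInputStream in = new DataInputStream(s.getInputStream());
int blockSize = in.readInt();  // the leading quint32 the server seeks back to fill in
int byteCount = in.readInt();  // QString serialization: UTF-16 payload length in bytes
byte[] payload = new byte[byteCount];
in.readFully(payload);
String message = new String(payload, "UTF-16BE"); // should be "Send Handshake"
Note this is tied to the QDataStream version in use, which is exactly why a plain-text protocol is the safer option.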
I'm trying to read the response from a server using a socket and the information is UTF-8 encoded. I'm wrapping the InputStream from the socket in an InputStreamReader with the encoding set to "UTF-8".
For some reason, it seems like only part of the response is read; the reading then just hangs for about a minute or two before it finishes. If I set the encoding on the InputStreamReader to "ISO-8859-1" then I can read all of the data right away, but obviously not all of the characters are displayed correctly.
Code looks something like the following
socketConn = (SocketConnection)Connector.open(url);
InputStreamReader is = new InputStreamReader(socketConn.openInputStream(), "UTF-8");
Then I read through the headers and the content. The content is chunked and I read the line with the size of each chunk (convert to decimal from hex) to know how much to read.
I'm not understanding the difference between reading with the two encodings and the effect it can have, because it works without issue with ISO-8859-1, and it eventually works with UTF-8; there is just the long delay.
It's hard to pin down the reason for the delay from this alone.
You may try another way of getting the data from the network:
byte[] data = IOUtilities.streamToBytes(socketConn.openInputStream());
I believe the above should complete without delay. Then, having got the bytes from the network, you can start processing the data. Note you can always get a String from bytes representing a string in UTF-8 encoding:
String stringInUTF8 = new String(bytes, "UTF-8");
UPDATE: see the second comment to this post.
I was already removing the chunk sizes on the fly, so I ended up doing something similar to the IOUtilities answer. Instead of using an InputStreamReader I just used the InputStream. InputStream has a read method that can fill an array of bytes, so for each chunk the code looks something like this:
byte[] buf = new byte[size];
is.read(buf);
return new String(buf, "UTF-8");
This seems to work, doesn't cause any delays and I can remove the extra information about the chunks on the fly.
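One caveat: a single InputStream.read(buf) call is not guaranteed to fill the whole array; on a slow connection it can return after only part of the chunk has arrived. Wrapping the stream once in a DataInputStream and using readFully() is safer. A small sketch:
DataInputStream dis = new DataInputStream(is);
byte[] buf = new byte[size];
dis.readFully(buf); // blocks until all 'size' bytes have arrived
return new String(buf, "UTF-8");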
I have a file which is encoded as ISO-8859-1, and contains characters such as ô.
I am reading this file with Java code, something like:
File in = new File("myfile.csv");
InputStream fr = new FileInputStream(in);
byte[] buffer = new byte[4096];
while (true) {
int byteCount = fr.read(buffer, 0, buffer.length);
if (byteCount <= 0) {
break;
}
String s = new String(buffer, 0, byteCount,"ISO-8859-1");
System.out.println(s);
}
However, the ô character is always garbled, usually printing as a ?.
I have read around the subject (and learnt a little on the way) e.g.
http://www.joelonsoftware.com/articles/Unicode.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
http://www.ingrid.org/java/i18n/utf-16/
but still cannot get this working.
Interestingly, this works on my local PC (XP) but not on my Linux box.
I have checked that my JDK supports the required charsets (they are standard, so this is no surprise) using:
System.out.println(java.nio.charset.Charset.availableCharsets());
I suspect that either your file isn't actually encoded as ISO-8859-1, or System.out doesn't know how to print the character.
I recommend that to check for the first, you examine the relevant byte in the file. To check for the second, examine the relevant character in the string, printing it out with
System.out.println((int) s.charAt(index));
In both cases the result should be 244 decimal; 0xf4 hex.
See my article on Unicode debugging for general advice (the code presented is in C#, but it's easy to convert to Java, and the principles are the same).
In general, by the way, I'd wrap the stream with an InputStreamReader with the right encoding - it's easier than creating new strings "by hand". I realise this may just be demo code though.
EDIT: Here's a really easy way to prove whether or not the console will work:
System.out.println("Here's the character: \u00f4");
Parsing the file as fixed-size blocks of bytes is not good: what if some character has a byte representation that straddles two blocks? Use an InputStreamReader with the appropriate character encoding instead:
BufferedReader br = new BufferedReader(
    new InputStreamReader(
        new FileInputStream("myfile.csv"), "ISO-8859-1"));
char[] buffer = new char[4096]; // character (not byte) buffer
while (true)
{
int charCount = br.read(buffer, 0, buffer.length);
if (charCount == -1) break; // reached end-of-stream
String s = String.valueOf(buffer, 0, charCount);
// alternatively, we can append to a StringBuilder
System.out.println(s);
}
Btw, remember to check that the Unicode character can indeed be displayed correctly. You could also redirect the program output to a file and then compare it with the original file.
As Jon Skeet suggests, the problem may also be console-related. Try System.console().printf(s) to see if there is a difference.
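Another way to take the terminal's default charset out of the equation is to construct a PrintStream with an explicit encoding (a sketch, assuming an ISO-8859-1 terminal):
PrintStream out = new PrintStream(System.out, true, "ISO-8859-1");
out.println(s); // the emitted bytes now match what an ISO-8859-1 terminal expects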
@Joel - your own answer confirms that the problem is a difference between the default encoding on your operating system (UTF-8, the one Java has picked up) and the encoding your terminal is using (ISO-8859-1).
Consider this code:
public static void main(String[] args) throws IOException {
byte[] data = { (byte) 0xF4 };
String decoded = new String(data, "ISO-8859-1");
if (!"\u00f4".equals(decoded)) {
throw new IllegalStateException();
}
// write default charset
System.out.println(Charset.defaultCharset());
// dump bytes to stdout
System.out.write(data);
// will encode to default charset when converting to bytes
System.out.println(decoded);
}
By default, my Ubuntu (8.04) terminal uses the UTF-8 encoding. With this encoding, this is printed:
UTF-8
?ô
If I switch the terminal's encoding to ISO 8859-1, this is printed:
UTF-8
ôô
In both cases, the same bytes are being emitted by the Java program:
5554 462d 380a f4c3 b40a
The only difference is in how the terminal is interpreting the bytes it receives. In ISO 8859-1, ô is encoded as 0xF4. In UTF-8, ô is encoded as 0xC3B4. The other characters are common to both encodings.
If you can, try running your program in a debugger to see what's inside your 's' string after it is created. It is possible that it has the correct content, but the output is garbled after the System.out.println(s) call. In that case, there is probably a mismatch between what Java thinks is the encoding of your output and the character encoding of your terminal/console on Linux.
Basically, if it works on your local XP PC but not on Linux, and you are parsing the exact same file (i.e. you transferred it in a binary fashion between the boxes), then it probably has something to do with the System.out.println call. I don't know how you verify the output, but if you do it by connecting with a remote shell from the XP box, then there is the character set of the shell (and the client) to consider.
Additionally, what Zach Scrivena suggests is also true: you cannot assume that you can create strings from chunks of data in that way; either use an InputStreamReader or read the complete data into an array first (obviously not going to work for a large file). However, since it does seem to work on XP, I would venture that this is probably not your problem in this specific case.