Different Content Length between C# and Java


I am trying to read an uploaded file from the browser and then write it to a remote server. In Java, I found that request.getHeader("content-length") differs from the actual length of request.getInputStream(). In C#, however, Request.Headers["Content-Length"] is the same as the actual length of Request.InputStream.
Could anyone explain the difference?

Are you sure that the situations are comparable? One pitfall might be the possibility of zipped content.
Another one might be the character encoding if one implementation counts the characters and the other one counts the bytes sent over the connection.
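To illustrate the character-versus-byte pitfall mentioned above, here is a minimal sketch. The class name and the sample text are made up for illustration; it only shows that the same text has a different length depending on whether you count characters or encoded bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any servlet API.
class ContentLengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "na\u00EFve caf\u00E9"; // contains two non-ASCII characters
        System.out.println(charCount(text) + " chars, " + utf8ByteCount(text) + " bytes");
    }
}
```

If one side of the comparison reads the body through a Reader or a String and the other counts raw bytes, the two numbers can differ in exactly this way.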

Related

performance and size limitations on HttpServletResponse.getOutputStream.print(string) vs getWriter(String)

For a web project I'm writing large sections of text to a web page (a table), or even larger amounts (possibly several MB) to CSV files for download.
The Java method dealing with this receives the content as a StringBuilder, which originally (as written by the creator of this module) was sent char by char in a loop:
response.getOutputStream().write(content.charAt(i)).
When I asked about the loop, the reason given was that he thought the string might be too big to write in one go (using Java 1.6).
I can't find any size restrictions anywhere, which also raised the question of which method to use instead: print() or getWriter()?
The data in the string is all text.

He assumed wrong. If anything, writing one character at a time is inefficient, or at least pointless. If you have a String in memory, you can write it out in one go without worrying.
If you're only writing text, use a Writer. OutputStream is for binary data (although you can wrap it in an OutputStreamWriter to convert between the two). See Writer or OutputStream?
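A minimal sketch of writing the whole buffer in one call follows. It wraps a plain OutputStream in an OutputStreamWriter so that it runs without a servlet container; in a servlet you would presumably use the Writer returned by response.getWriter() instead. The class and method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to contrast with the char-by-char loop.
class WriteTextDemo {
    // Writes the whole text with a single call; the Writer handles encoding and buffering.
    static void writeText(CharSequence content, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.append(content);
        writer.flush(); // push buffered characters to the underlying stream
    }

    public static void main(String[] args) throws IOException {
        StringBuilder csv = new StringBuilder("id,name\n1,Alice\n2,Bob\n");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeText(csv, sink);
        System.out.println(sink.size() + " bytes written");
    }
}
```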

Efficient way to parse a datagram in Java

Right now I am using a socket and a datagram packet. This program is for a LAN and sends at least 30 packets a second at 500 bytes maximum.
This is how I am receiving my data:
payload = new String(incomingPacket.getData(), incomingPacket.getOffset(), incomingPacket.getLength(), "UTF-8");
Currently I am using no offset and I parse through each character one by one. I use the first 2 characters right now to determine what type of message it is, but that is subject to change. Then I break the payload down into variables, separating the data with an exclamation mark to tell me when the next variable begins. At the end I parse it and apply it to my program. Is there a faster way to break down and interpret datagram packets? Will there be a performance difference if I put the length of the variables in the offset? Maybe an example would be useful. Also, I think my variables are too small to use StringBuilder, so I use normal concatenation.

What you are talking about here is setting up your own protocol for communication. While I have this as the fourth part of my socket tutorial (I'm currently working on part 3, non-blocking sockets) I can explain some things here already.
There are several ways of setting up such a protocol, depending on your needs.
One way of doing it is having a byte in front of each piece of data declaring the size, in bytes. That way, you know the length of the byte array containing the next variable value. This makes it easy to read out whole variables in one go via the System.arraycopy method. This is a method I've used before. If the object being sent is always the same, this is all you need to do. Write the variables in a standardized order, add the size of the variable value and you're done.
If you have to send multiple types of objects through the stream, you might want to add a bit of metadata. This metadata can then be used to tell what kind of object is being sent and the order of the variables. The metadata can be put in a header which you add before the actual message. Once again, the values in the header are preceded by the size byte.
In my tutorial I'll write up a complete code example.
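Until then, the size-byte scheme described above could be sketched roughly as follows. This is not the tutorial's code: the class name, the UTF-8 encoding of each field, and the sample field values are assumptions made for illustration. A single size byte limits each field to 255 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical codec: every field is preceded by one byte holding its length.
class LengthPrefixedFields {
    static byte[] encode(List<String> fields) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            if (bytes.length > 255) {
                throw new IllegalArgumentException("field longer than 255 bytes");
            }
            encoded.add(bytes);
            total += 1 + bytes.length; // size byte + payload
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] bytes : encoded) {
            out[pos++] = (byte) bytes.length;
            System.arraycopy(bytes, 0, out, pos, bytes.length);
            pos += bytes.length;
        }
        return out;
    }

    // offset and length would come from DatagramPacket.getOffset() and getLength().
    static List<String> decode(byte[] data, int offset, int length) {
        List<String> fields = new ArrayList<>();
        int pos = offset;
        int end = offset + length;
        while (pos < end) {
            int size = data[pos++] & 0xFF; // read the size byte as an unsigned value
            if (pos + size > end) {
                throw new IllegalArgumentException("truncated field");
            }
            fields.add(new String(data, pos, size, StandardCharsets.UTF_8));
            pos += size;
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] packet = encode(Arrays.asList("MV", "12", "-7.5"));
        System.out.println(decode(packet, 0, packet.length));
    }
}
```

Because each field announces its own length, the receiver never has to scan for a delimiter such as the exclamation mark, and field values may contain any byte.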

Don't use a String at all. Just process the underlying byte array directly: scan it for delimiters, counts, what have you. You can use a DataInputStream wrapped around a ByteArrayInputStream wrapped around the byte array if you want an API oriented to Java primitive datatypes.
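A rough sketch of the DataInputStream approach is shown below. The message layout (one type byte followed by two ints) and the class name are invented for illustration; the real layout depends on your protocol. In the receiving code, the array, offset, and length would come from the DatagramPacket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical binary message: 1 byte type, then two big-endian ints.
class BinaryPayloadDemo {
    static byte[] encode(int type, int x, int y) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeByte(type);
        out.writeInt(x);
        out.writeInt(y);
        out.flush();
        return buffer.toByteArray();
    }

    // Reads the primitives straight from the byte array, with no String in between.
    static int[] decode(byte[] data, int offset, int length) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data, offset, length));
        int type = in.readUnsignedByte();
        int x = in.readInt();
        int y = in.readInt();
        return new int[] { type, x, y };
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = encode(2, 100, -5);
        int[] decoded = decode(payload, 0, payload.length);
        System.out.println("type=" + decoded[0] + " x=" + decoded[1] + " y=" + decoded[2]);
    }
}
```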

Parse byte array as HTTP object

In Java, how would I convert a byte array (TCP packet payload from a pcap file) into some kind of HTTP object that I can use to get HTTP headers and content body?

One of the stupid lovely things about Java is a total lack of unsigned types. So, a good place to start would be taking your byte array and converting it into a short array to make sure that you don't have any rollover problems. (16 bits versus 8 bits per number).
From there, you could use a BufferedOutputStream to write your data to a file and parse it with one of the Java built-in XML readers, such as JaxB or DOM. BufferedOutputStream writes hex directly to a file, and can take an input of an int, byte, or short array. After you write it out, using the OutputStream it should be very simple to parse the HTML out of it.
If you need any help with any of these individual steps, I'd be happy to help.
EDIT: as maerics has pointed out, perhaps I didn't grasp what you were asking. Regardless, writing your byte array with a BufferedOutputStream is the way to go in my opinion, and I could still help you build a parser if you want.

JNetPcap can do exactly this.
Here are examples for
Opening a pcap file
Parsing http (in the example, we extract an image)
Drawback: parsing HTTP in this library is deprecated*, but that doesn't mean it doesn't work.
*I can't post any more links without more reputation. Sorry. You can Google for "jnetpcap http deprecated".

How should I handle searching through byte arrays in Java?

Preliminary: I am writing my own HTTP client in Java. I am trying to parse out the contents of chunked encoding.
Here is my dilemma: since I am trying to parse chunked HTTP transfer encoding with a gzip payload, there is a mix of ASCII and binary. I can't just take the HTTP response content, convert it to a string, and use StringUtils, since the binary data can easily contain nil characters. So what I need are some basic tools for parsing out each chunk and its chunk length (as per the chunked transfer coding in the HTTP/1.1 spec).
Are there any helpful ways of searching through byte arrays of mixed binary and ASCII data for certain patterns (like a CR LF) instead of just a single byte? Or must I write the for loops for this?

Thus, you basically need a ChunkedInputStream. Google gives enough hints. Apache HttpCore's variant is pretty concise. I'd suggest you just use it rather than reinventing your own client.

If you really, really want to implement your own HTTP client stack from the ground up, try a Google search for "+java +bytestring". One of the first hits that I got was this ByteString class, which would seem to provide the kind of functionality you are asking for.
But I think @BalusC's approach is better.
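If you do end up writing the loops yourself, a minimal sketch could look like the following. The class and method names are hypothetical, and the search is a naive scan, which should be adequate for short patterns such as CR LF. The chunk-size helper assumes the size line is ASCII hex, optionally followed by a ";"-separated extension that is ignored here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for scanning a byte array without converting it to a String.
class ByteSearch {
    static final byte[] CRLF = { '\r', '\n' };

    // Returns the index of the first occurrence of pattern at or after from, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        if (pattern.length == 0) {
            throw new IllegalArgumentException("empty pattern");
        }
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    // Parses the hex chunk size from the line starting at from; only that line is decoded as text.
    static int chunkSize(byte[] data, int from) {
        int eol = indexOf(data, CRLF, from);
        if (eol < 0) {
            throw new IllegalArgumentException("no CRLF after chunk size");
        }
        String line = new String(data, from, eol - from, StandardCharsets.US_ASCII);
        int semicolon = line.indexOf(';');
        if (semicolon >= 0) {
            line = line.substring(0, semicolon); // drop any chunk extension
        }
        return Integer.parseInt(line.trim(), 16);
    }

    public static void main(String[] args) {
        byte[] response = { '1', 'a', '\r', '\n', 0, 1, 2 };
        System.out.println("CRLF at " + indexOf(response, CRLF, 0)
                + ", chunk size " + chunkSize(response, 0));
    }
}
```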

How do I identify that I am at the last byte of a serialized Java object?

Question
What are the terminating characters or byte sequences (if there are any) in serialized Java objects?
Background
I'm working on a small self-education project where I would like to serialize Java objects and write them to a stream, where they are then read and deserialized. Since I will need to identify the borders between serialized objects, and I can't be sure that the current object is not the last one, is there a terminating character that is always there that I can use as my identifier?
I noticed that there is a magic number ACED that allows me to identify the start of the object, so how do I identify the end?
EDIT:
If there is no terminating character, are there any safe terminating characters or sequences that I can insert to identify the end of the object?

In theory you should always be able to find the end of an object; in practice you cannot. As I understand it, the problem is that customised writeObject implementations that call neither defaultWriteObject nor writeFields produce a non-standard representation.
I've played about with serialisation in the past, including creating streams for use when I've been doing unusual things to the ObjectInputStream. It's not pleasant(!).
You can read the details in the spec, and the source is worth a read.

There are none. AFAIK the only requirement is that the deserialiser knows when to stop reading when given a corresponding serialisation. Subject to that, the serialiser can write whatever it wants, in any position, not just the last.
If you're old school, dump a 32-bit length field at the beginning and refuse to handle objects bigger than 4 GB.
New school: just make sure your read and write logic are consistent and don't care about the length.

You can add a terminating object to your object stream. e.g. null or a special String.
However, I suggest that you instead write each object through its own ObjectOutputStream into a byte[], and then write the length of the byte[] followed by its data. This way each serialized object is independent and you always know where it finishes.
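A minimal sketch of that length-prefix framing follows. The class and method names are made up for illustration, and the length is written as a signed 32-bit int, so a single frame is limited to about 2 GB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical framing helper: [4-byte length][serialized object bytes], repeated.
class FramedObjects {
    static void writeFramed(DataOutputStream out, Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj); // each object gets its own, self-contained stream
        }
        byte[] bytes = buffer.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    static Object readFramed(DataInputStream in) throws IOException, ClassNotFoundException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // blocks until the whole frame has arrived
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeFramed(out, "first");
        writeFramed(out, Integer.valueOf(42));

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(readFramed(in));
        System.out.println(readFramed(in));
    }
}
```

The reader never needs to look inside the serialized bytes to find a boundary; it reads the length, then exactly that many bytes, and is positioned at the start of the next frame.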

Have you considered applying a record-marking layer similar to HTTP Chunked encoding?
The Chunked encoding is intended to solve a generalization of this scenario: identifying the end of a message of indeterminate length that both itself contains no identifiable end, and is embedded in a longer stream without ending it.
