I'm trying to use Saxophone to parse JSON into a protobuf message on the fly, and I want to avoid creating String instances for each response.
For that I need to create a Bytes instance from an InputStream (provided by an Apache HTTP entity).
I've been digging through the sources for a while but can't find a way to do that. Any suggestions?
There are two ways you can do this.
// reuse a string builder if the String cannot be pooled easily
stringBuilder.setLength(0);
bytes.parseUTF(stringBuilder, StopCharTesters.ALL);
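The answer does not cover the first half of the question, getting from the InputStream to bytes. Below is a minimal JDK-only sketch of the same reuse idea. It assumes a plain java.nio.ByteBuffer is acceptable in place of the library's Bytes type (whether your Bytes version can wrap a ByteBuffer is something to check in its API). ReusableUtf8Reader is a hypothetical name: it reads a response into one reused buffer and decodes the UTF-8 into a caller-supplied StringBuilder, so no String is created per response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Saxophone or OpenHFT): all buffers are reused across calls.
class ReusableUtf8Reader {
    private ByteBuffer bytes = ByteBuffer.allocate(8 * 1024);
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

    /** Reads the whole stream into the reused buffer; grows it only for a larger response. */
    ByteBuffer readAll(InputStream in) throws IOException {
        bytes.clear();
        int n;
        while ((n = in.read(bytes.array(), bytes.position(), bytes.remaining())) != -1) {
            bytes.position(bytes.position() + n);
            if (!bytes.hasRemaining()) { // full: double the capacity, keep what was read
                ByteBuffer bigger = ByteBuffer.allocate(bytes.capacity() * 2);
                bytes.flip();
                bigger.put(bytes);
                bytes = bigger;
            }
        }
        bytes.flip(); // ready for reading: position 0, limit = number of bytes read
        return bytes;
    }

    /** Decodes UTF-8 into the caller's StringBuilder without creating a String. */
    void decodeInto(ByteBuffer src, StringBuilder out) throws CharacterCodingException {
        out.setLength(0); // same idea as stringBuilder.setLength(0) in the answer
        decoder.reset();
        CoderResult result;
        do {
            chars.clear();
            result = decoder.decode(src, chars, true);
            if (result.isError()) result.throwException();
            chars.flip();
            out.append(chars); // appends the decoded chars between position and limit
        } while (result.isOverflow()); // overflow: the char buffer was full, keep draining
        chars.clear();
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
    }

    public static void main(String[] args) throws IOException {
        ReusableUtf8Reader reader = new ReusableUtf8Reader();
        StringBuilder sb = new StringBuilder();
        byte[] body = "{\"name\":\"caf\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        reader.decodeInto(reader.readAll(new ByteArrayInputStream(body)), sb);
        System.out.println(sb);
    }
}
```

The byte buffer only grows when a response is larger than any seen before, so in steady state the helper itself allocates nothing per call.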
Or you can use the built-in String pool:
String s = bytes.parseUTF(StopCharTesters.ALL);
This works well if there is a relatively small number of possible Strings (at least most of the time).
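To see why a pool helps when there are few distinct values, here is a hypothetical fixed-size pool in plain Java. It is a sketch of the general technique, not the library's actual implementation: the characters are hashed straight from the StringBuilder, and a String is only created on a miss.

```java
// Hypothetical fixed-size pool, not the library's implementation: a hit returns the
// cached String, so a repeated value costs no allocation.
class SimpleStringPool {
    private final String[] slots;
    private final int mask;

    SimpleStringPool(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        slots = new String[capacity];
        mask = capacity - 1;
    }

    String intern(CharSequence cs) {
        int h = 0;
        for (int i = 0; i < cs.length(); i++) h = 31 * h + cs.charAt(i); // same formula as String.hashCode
        int index = (h ^ (h >>> 16)) & mask;
        String cached = slots[index];
        if (cached != null && cached.contentEquals(cs)) return cached; // hit: no new String
        String created = cs.toString(); // miss or collision: allocate once and replace the slot
        slots[index] = created;
        return created;
    }

    public static void main(String[] args) {
        SimpleStringPool pool = new SimpleStringPool(64);
        StringBuilder sb = new StringBuilder("USD");
        String first = pool.intern(sb);
        sb.setLength(0);
        sb.append("USD");
        System.out.println(first == pool.intern(sb)); // true: the same instance is reused
    }
}
```

Collisions simply overwrite the slot, so with many distinct values the pool keeps missing and allocating, which is exactly the case the caveat above warns about.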
I am using Redis as a centralized cache for a distributed system. Currently I use Jedis to connect to the Redis cluster, and I store values as byte[] instead of String. My question is: does storing a plain string versus a byte[] have an impact on reading the data? In my application I serialize my Java POJO to a byte[] and store that, whereas I could convert it to JSON and store that instead, so that when reading it back from Redis I could use the object directly instead of deserializing it. I have tried both, and the only difference I can see is the extra deserialization step.
In Redis, everything is a byte[]. What Redis calls strings are binary-safe byte sequences, i.e. byte[] in most programming languages.
When you store JSON, you still need to serialize it to a byte[] before saving it to Redis, and do the reverse when you read it back. This is no different from serializing a Java object. In other words, you always pay the cost of serialization and deserialization.
That said, different libraries have different serialization costs. Java serialization is known to be slow and inefficient. JSON is likely to be better than Java serialization, but it wastes memory in Redis because it is text-based. You can choose a better serialization library.
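As a rough illustration of both points (everything ends up as a byte[], but the formats differ in cost), this sketch serializes the same hypothetical User object three ways and compares only the resulting sizes, not speed. The compact binary variant stands in for schema-based formats in general; it is hand-written with DataOutputStream and is not Protocol Buffers or FlatBuffers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class SerializedSizes {
    // Hypothetical POJO standing in for the cached object.
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    /** Built-in Java serialization: the output also carries class and field descriptors. */
    static byte[] javaSerialization(User u) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(u);
        }
        return bos.toByteArray();
    }

    /** JSON text, hand-built here only for brevity (no escaping); use a JSON library in real code. */
    static byte[] json(User u) {
        String text = "{\"name\":\"" + u.name + "\",\"age\":" + u.age + "}";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Schema-style binary: both sides agree on field order, so no field names are stored. */
    static byte[] compactBinary(User u) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(u.name); // 2-byte length + UTF-8 bytes
            out.writeInt(u.age);  // 4 bytes
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        User u = new User("Alice", 30);
        System.out.println("java: " + javaSerialization(u).length
                + ", json: " + json(u).length
                + ", binary: " + compactBinary(u).length);
    }
}
```

For User("Alice", 30) the binary form is 11 bytes and the JSON text is 25 bytes; the Java serialization output is larger than both because it includes the class name and field descriptors.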
Kryo is a faster replacement for the Java serializer. MessagePack is like JSON but faster. Protocol Buffers and FlatBuffers are even better, but require you to declare a schema upfront. There are other serialization formats as well, each with its own tradeoffs.
The general recommendation: try to use the hash data type. It is efficient, and it lets you request specific fields instead of the whole object. Only if a hash does not work for you, pick something else based on your needs.
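A minimal sketch of what storing an object as a hash means on the Java side: the object is flattened into field/value strings instead of one opaque blob. The User class and its field names are hypothetical. The Jedis calls mentioned in the comment (hset with a Map, hget, hgetAll) exist in recent Jedis versions, but check them against the version you use.

```java
import java.util.HashMap;
import java.util.Map;

class UserHashMapping {
    // Hypothetical POJO; its field names become the Redis hash fields.
    static final class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    /** Flattens the object into field -> value pairs, the shape a Redis hash stores. */
    static Map<String, String> toHash(User u) {
        Map<String, String> hash = new HashMap<>();
        hash.put("name", u.name);
        hash.put("age", Integer.toString(u.age));
        return hash;
    }

    /** Rebuilds the object from the hash fields (e.g. the map returned by HGETALL). */
    static User fromHash(Map<String, String> hash) {
        return new User(hash.get("name"), Integer.parseInt(hash.get("age")));
    }

    public static void main(String[] args) {
        Map<String, String> hash = toHash(new User("Alice", 30));
        // With Jedis, this map could be stored via jedis.hset("user:1", hash),
        // and a single field read back via jedis.hget("user:1", "age").
        System.out.println(hash.get("age"));
    }
}
```

Reading one field then costs only that field's bytes, with no deserialization of the rest of the object.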
P.S. If you are into benchmarks, this website has several - https://github.com/eishay/jvm-serializers/wiki
Sometimes we need to create Thrift objects in unit tests. We can do it by creating the objects manually in Java code, like:
MyObj myObj = new MyObj();
myObj.setName("???");
myObj.setAge(111);
But that is not convenient. I'm looking for a way to create objects from some readable text.
We can convert Thrift objects to JSON with TSimpleJSONProtocol and get a very readable JSON string, like:
{ "name": "???", "age": 111 }
But the problem is that TSimpleJSONProtocol is write-only: Thrift can't read it back to construct an instance of MyObj.
There is also TJSONProtocol, which supports both serialization and deserialization, but the JSON it generates is not readable: it uses a very compact format in which the field names are replaced by numeric field IDs, so it is not convenient to write by hand in tests.
Is there any way to convert Thrift objects to a readable string and also convert that string back? If TSimpleJSONProtocol supported reading, it would be exactly what I'm looking for.
The main goal of Thrift is to provide efficient serialization and RPC mechanisms. What you want is something that is - at least partially - contrary to that. Human-readable data structures and machine processing efficiency are to a good extent conflicting goals, and Thrift favors the latter over the former.
You have already found TSimpleJSONProtocol and TJSONProtocol and their pros and cons, so there is not much to add. The only thing left to say is this: Thrift's protocol/transport stack is quite simple.
This simplicity makes it possible to add another protocol for your specific needs without much effort. One could probably even write an XML protocol (if anyone really wants such bloatware) in a short time.
The only caveat, especially vis-à-vis your specific case, is the fact that Thrift needs the field ID to deserialize the data. So you either need to store them in the data, or you need some other mechanism which is able to retrieve that field ID based on the field and structure names.
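The field ID lookup could look roughly like this. FieldIdRegistry is a hypothetical helper: it maps a structure name and a field name to the numeric ID a Thrift reader needs, and drops names it does not know. The IDs 1 and 2 for MyObj are assumptions, since the IDL is not shown. In generated Java code, the nested _Fields enum (findByName, getThriftFieldId) may already provide this mapping per struct.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry for a readable JSON protocol: Thrift needs the numeric field ID
// when reading, so the reader maps (struct name, field name) -> field ID.
class FieldIdRegistry {
    private final Map<String, Map<String, Short>> structs = new HashMap<>();

    void register(String struct, String field, short id) {
        structs.computeIfAbsent(struct, k -> new HashMap<>()).put(field, id);
    }

    /** Returns the field ID, or -1 for an unknown struct or field (a reader could skip it). */
    short idOf(String struct, String field) {
        Map<String, Short> fields = structs.get(struct);
        Short id = (fields == null) ? null : fields.get(field);
        return (id == null) ? (short) -1 : id.shortValue();
    }

    /** Converts {"name": ..., "age": ...} into the ID-keyed form a Thrift reader works with. */
    Map<Short, Object> toFieldIds(String struct, Map<String, Object> readable) {
        Map<Short, Object> byId = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : readable.entrySet()) {
            short id = idOf(struct, e.getKey());
            if (id >= 0) byId.put(id, e.getValue()); // unknown names are dropped
        }
        return byId;
    }

    public static void main(String[] args) {
        FieldIdRegistry registry = new FieldIdRegistry();
        registry.register("MyObj", "name", (short) 1); // assumed IDs from the .thrift IDL
        registry.register("MyObj", "age", (short) 2);
        Map<String, Object> readable = new LinkedHashMap<>();
        readable.put("name", "Alice");
        readable.put("age", 111);
        System.out.println(registry.toFieldIds("MyObj", readable)); // {1=Alice, 2=111}
    }
}
```

Dropping unknown names mirrors how Thrift readers skip unknown field IDs, which keeps older readers tolerant of newer data.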
I am building a new application and need to choose which format to use for exchanging data. I have tried String concatenation and XML before, but never JSON. Which of the three is better in terms of performance? I know XML is much better than String concatenation. So what should I use, XML or JSON? Or maybe a newer technology that I am not aware of?
Thanks in advance
By "better" I mean this: with String concatenation, I add the values and separators to a string, and then loop to find the separators on the device, as in this example:
String toSend = "test1////test2////test3////test4////test5";
Here the separator is "////" and I am sending 5 values. Extracting the values this way will be much slower than XML when there are thousands of values.
It depends. :)
Well, actually I think properly written code to split a string will be faster than an XML/JSON parser; however, XML/JSON parsers reliably give you back exactly the same data structure you sent. For instance, how would you handle a case where your data itself contains the separator? If that case is impossible under your business logic, you can just go with string joining/splitting. Otherwise it is better not to reinvent the wheel and just use XML/JSON (JSON is more lightweight).
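The separator problem is easy to demonstrate. This hypothetical DelimitedValues class first shows the naive String.split going wrong when a value contains "////", then a join/split pair that escapes '\' and '/' so that any value survives the round trip. It is a sketch of the escaping idea under those assumptions, not a format recommendation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelimitedValues {
    static final String DELIM = "////";

    /** Joins values, escaping '\' and '/' so no value can contain a bare delimiter. */
    static String join(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(DELIM);
            for (char c : values.get(i).toCharArray()) {
                if (c == '\\' || c == '/') sb.append('\\');
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Splits by scanning: an escaped char is literal, an unescaped "////" ends a value. */
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(s.charAt(i + 1));
                i += 2;
            } else if (s.startsWith(DELIM, i)) {
                out.add(cur.toString());
                cur.setLength(0);
                i += DELIM.length();
            } else {
                cur.append(c);
                i++;
            }
        }
        out.add(cur.toString()); // last value (note: an empty list comes back as [""])
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("test1", "a////b", "c:\\dir");
        String naive = String.join(DELIM, values);
        System.out.println(naive.split(DELIM).length); // 4, not 3: the data broke the format
        System.out.println(split(join(values)));       // [test1, a////b, c:\dir]
    }
}
```

Note that the naive String.split also silently drops trailing empty values, another rule a hand-rolled format has to deal with.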
It depends on the kind of Objects you will be exchanging.
It also depends on how you will request and use your objects.
If you want to provide a REST service that exposes simple objects accessed directly by a JavaScript GUI, I would also go for JSON. But don't build the JSON by hand with String concatenation; use a library.
But if you plan to exchange more complex data between various Java-based "services", I would probably go for XML, especially if you can first write the XSD that defines your XML objects. You will then be able to generate the Java classes and let JAXB do the boring marshalling/unmarshalling work.
I would choose JSON, it's very portable and lightweight (lighter than XML).
I'm working on a utility method that converts XML data into a formatted String, and before you think it's a trivial task for javax.xml.transform.Transformer, let me explain the specific constraints I've run into.
The input data does not exist at the moment the conversion starts. It is actually represented as a groovy.lang.Writable instance that I can write to any java.io.Writer instance. The method signature looks like this:
static String serializeToString(Writable source)
My current solution involves a few steps and does produce the expected result:
Create StringWriter, output source there and convert to String
Create javax.xml.transform.stream.StreamSource instance based on this string (using StringReader)
Create new StringWriter instance and wrap it into javax.xml.transform.stream.StreamResult
Perform transformation using instance of javax.xml.transform.Transformer
Convert StringWriter to String
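For reference, the five steps above map onto the standard javax.xml.transform API roughly as follows. Because Groovy is not assumed here, WriterSource is a hypothetical, simplified stand-in for groovy.lang.Writable (the real interface's writeTo returns the Writer).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlFormatter {
    /** Simplified stand-in for groovy.lang.Writable: something that writes itself to a Writer. */
    interface WriterSource {
        void writeTo(Writer out) throws IOException;
    }

    static String serializeToString(WriterSource source) throws IOException, TransformerException {
        // Step 1: render the source into an unformatted String
        StringWriter raw = new StringWriter();
        source.writeTo(raw);
        // Step 2: wrap that String in a StreamSource (it will be parsed again)
        StreamSource input = new StreamSource(new StringReader(raw.toString()));
        // Step 3: prepare the formatted output target
        StringWriter formatted = new StringWriter();
        StreamResult output = new StreamResult(formatted);
        // Step 4: identity transformation with indentation switched on
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(input, output);
        // Step 5: back to a String
        return formatted.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeToString(w -> w.write("<a><b>1</b></a>")));
    }
}
```

Steps 1 and 2 are where the extra String and the second parse come from, which is what the question wants to eliminate.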
While the solution works, I'm not happy with its efficiency. This method will be called very often and I want to optimize it. What I'd like to avoid is the multiple conversions along the way:
From Writable to String (unformatted)
From String to StreamSource (which means that data will be parsed again)
From StreamSource to String again (formatted)
So the question is: is it possible to build a pipe-like flow that eliminates the unnecessary conversions?
UPDATE #1:
To give a bit more context, I'm converting a GPathResult instance to a formatted string using the StreamingMarkupBuilder.bindNode() method, which produces a Writable instance. Unfortunately there is no way to tell StreamingMarkupBuilder to produce formatted output.
UPDATE #2:
I experimented with an implementation based on PipedWriter + PipedReader, but it didn't show much speed gain. It looks like this is not that critical an issue in this case.
I don't know exactly what you mean by "XML data", but you could consider representing the not-yet-existing data directly as a SAXSource, thereby bypassing the "to-string" and "parse-string" steps.
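A SAXSource needs an XMLReader that produces the SAX events. A closely related push-style alternative from the same JDK package is SAXTransformerFactory.newTransformerHandler(): the producer calls the SAX ContentHandler methods directly and an identity transformer writes the indented text, with no intermediate String and no second parse. In this sketch the events are hard-coded; producing them from a GPathResult would need a tree walker, which is assumed and not shown.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SaxPipeline {
    static String format() throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // identity transform
        Transformer t = handler.getTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out)); // attached after the output properties are set

        // Hypothetical producer: these calls stand in for walking the real document.
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "a", "a", noAttrs);
        handler.startElement("", "b", "b", noAttrs);
        char[] text = "1".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "b", "b");
        handler.endElement("", "a", "a");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(format());
    }
}
```

The output properties are set before setResult to be safe, since an implementation may read them when the result is attached.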
I'm using Java sockets for a client-server application. Sometimes the client needs to send a byte array (using a ByteArrayOutputStream) and sometimes a custom Java object. How can I read the information from the input stream on the server side and determine what is in the stream, so that I can process it properly?
Usually this is done by sending a "header" in front of the body, containing information about the body. Have a look at the HTTP protocol, for example. An HTTP message consists of a header, which is separated from the body by a blank line (two consecutive line breaks). The header in turn consists of several fields in name: value format, each on its own line. In HTTP you would use the Content-Type header to identify the data type of the body.
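A minimal sketch of such an HTTP-like framing over plain streams. HeaderFraming is a hypothetical name; it uses '\n' line endings for brevity where real HTTP uses CRLF, and adds a Content-Length header so the reader knows where the body ends. Header lines are read one byte at a time from the raw stream so that no body bytes get swallowed by a buffering reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical HTTP-like framing: "Name: value" header lines, a blank line, then the body.
class HeaderFraming {
    static void write(OutputStream out, String contentType, byte[] body) throws IOException {
        String header = "Content-Type: " + contentType + "\n"
                + "Content-Length: " + body.length + "\n\n";
        out.write(header.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }

    /** Reads one ASCII line (without the '\n') directly from the stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new IOException("stream ended inside the header");
            sb.append((char) b);
        }
        return sb.toString();
    }

    /** Reads header lines until the blank line, then exactly Content-Length body bytes. */
    static byte[] read(InputStream in, Map<String, String> headers) throws IOException {
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        byte[] body = new byte[Integer.parseInt(headers.get("Content-Length"))];
        new DataInputStream(in).readFully(body); // DataInputStream adds no buffering
        return body;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        write(wire, "application/octet-stream", new byte[] {1, 2, 3});
        Map<String, String> headers = new HashMap<>();
        byte[] body = read(new ByteArrayInputStream(wire.toByteArray()), headers);
        System.out.println(headers.get("Content-Type") + ", " + body.length + " bytes");
    }
}
```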
Since neither Java nor TCP/IP provides a standard facility for this, you need to specify and document the format you send over the wire in detail, so that the other side knows how to handle the stream. You can of course also adopt an existing standard, e.g. HTTP or FTP.
There are multiple ways to handle this.
One is object serialization, which sends the data with Java's ObjectOutputStream and reads it with ObjectInputStream. You do run into a small problem, though: knowing when to read the object off the stream.
Another is to marshal and unmarshal XML. It uses a bit more traffic but is easier to debug and get running. It helps to have a well-documented XML schema for this. An advantage here is that you can use existing XML libraries.
You could try a custom format if you wanted, but it would probably end up being just a sloppy, less verbose version of XML.
In general, I don't believe there is a feature built into Java that allows you to do this.
Instead, consider sending some more information along with each message that explains what type is coming next.
For example, you might prefix your messages with an integer, such that every time you receive a message, you read the first 4 bytes (an integer is 4 bytes) and interpret its value (e.g. 1=byte array, 2=custom Java object, 3=another custom Java object, ...).
You might also consider adding an integer containing the size of the message so that you know when the current message ends and the next message begins.
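Both suggestions combined (a type code, then a length) could look like this with DataOutputStream and DataInputStream. The type codes and the TypedFrames name are illustrative assumptions; in a real server the streams would wrap socket.getOutputStream() and socket.getInputStream() instead of byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical wire format: [int type][int length][length bytes of payload]
class TypedFrames {
    static final int RAW_BYTES = 1;   // example type codes, as in the answer
    static final int JAVA_OBJECT = 2;

    static final class Frame {
        final int type;
        final byte[] payload;
        Frame(int type, byte[] payload) { this.type = type; this.payload = payload; }
    }

    static void writeFrame(DataOutputStream out, int type, byte[] payload) throws IOException {
        out.writeInt(type);           // 4 bytes: what kind of message follows
        out.writeInt(payload.length); // 4 bytes: where this message ends
        out.write(payload);
        out.flush();
    }

    static Frame readFrame(DataInputStream in) throws IOException {
        int type = in.readInt();
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until the whole message has arrived
        return new Frame(type, payload);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        DataOutputStream out = new DataOutputStream(wire);
        writeFrame(out, RAW_BYTES, new byte[] {1, 2, 3});
        writeFrame(out, JAVA_OBJECT, "pretend-serialized".getBytes(StandardCharsets.UTF_8));

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        for (int i = 0; i < 2; i++) {
            Frame f = readFrame(in);
            switch (f.type) {
                case RAW_BYTES:   System.out.println("raw, " + f.payload.length + " bytes"); break;
                case JAVA_OBJECT: System.out.println("object, " + f.payload.length + " bytes"); break;
                default:          System.out.println("unknown type " + f.type);
            }
        }
    }
}
```

The length prefix is what lets the server use readFully: TCP delivers a byte stream, not messages, so a single read() may return only part of a message.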
I'm going to get called out for overkill here, but unless you seriously need this protocol to be economical, you might consider marshalling the data. I mean, without peeking at the data, you can't generally tell the difference between a byte array and something else, since you could conceivably represent everything as a byte array.
You can fairly easily use JAXB to marshal the data to and from XML, and JAXB will even turn byte arrays into hex or Base64 strings for you.
First, read the data into a byte array on the server. Write your own parsing routine that does nothing more than identify what is in the byte array.
Second, perform the full object parsing based on the identification from step one. If the parsing requires an InputStream, you can always wrap the byte array you read in step one in a new ByteArrayInputStream.
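A sketch of this two-step approach. The identification step relies on a real property of Java serialization: its streams start with the magic number 0xACED (ObjectStreamConstants.STREAM_MAGIC). Everything else is treated as raw bytes. TwoPhaseParse is a hypothetical name, and reading the complete message into the byte array in the first place still needs some framing, such as a length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

class TwoPhaseParse {
    /** Phase 1: only identify the content by peeking at the first two bytes. */
    static boolean looksLikeJavaObject(byte[] data) {
        return data.length >= 2
                && (short) (((data[0] & 0xFF) << 8) | (data[1] & 0xFF)) == ObjectStreamConstants.STREAM_MAGIC;
    }

    /** Phase 2: full parse, wrapping the already-read bytes in a ByteArrayInputStream. */
    static Object parse(byte[] data) throws IOException, ClassNotFoundException {
        if (!looksLikeJavaObject(data)) {
            return data; // treat it as a plain byte array
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject("hello");
        }
        System.out.println(parse(bos.toByteArray()));                      // hello
        System.out.println(parse(new byte[] {1, 2, 3}) instanceof byte[]); // true
    }
}
```

A raw byte array that happens to start with 0xAC 0xED would be misidentified, which is why an explicit type prefix, as suggested in the other answers, is more robust.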
You need to define a protocol to indicate what type of data follows. For instance, you could start each transfer with a string or enumerated value. The server would first read this, then read the following data based on the 'header' value.
What you could do is prepend an integer to any data you send, and use it to determine the type.
That way, you can read the first 4 bytes and then determine what type of data follows.
I think the easiest way is to use an object which contains the data that you will send along with its type information. Then you can just send this object and according to this object's data type property you can extract the data.
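With Java serialization on both sides, that wrapper object could look like this. Envelope and Kind are hypothetical names; the receiver reads one object and switches on its kind. ObjectOutputStream also takes care of the message boundaries, since each readObject call reads exactly one top-level object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class EnvelopeDemo {
    enum Kind { BYTES, TEXT } // hypothetical set of payload types

    /** Wrapper that carries its own type information and is sent as one serialized object. */
    static final class Envelope implements Serializable {
        private static final long serialVersionUID = 1L;
        final Kind kind;
        final Serializable data; // byte[] and String are both Serializable
        Envelope(Kind kind, Serializable data) { this.kind = kind; this.data = data; }
    }

    static void send(ObjectOutputStream out, Envelope e) throws IOException {
        out.writeObject(e);
        out.flush();
    }

    /** Server side: dispatch on the type property instead of guessing from the bytes. */
    static String handle(Envelope e) {
        switch (e.kind) {
            case BYTES: return "bytes: " + ((byte[]) e.data).length;
            case TEXT:  return "text: " + e.data;
            default:    throw new IllegalArgumentException("unknown kind " + e.kind);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        ObjectOutputStream out = new ObjectOutputStream(wire);
        send(out, new Envelope(Kind.BYTES, new byte[] {1, 2, 3}));
        send(out, new Envelope(Kind.TEXT, "hello"));

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(handle((Envelope) in.readObject())); // bytes: 3
        System.out.println(handle((Envelope) in.readObject())); // text: hello
    }
}
```

On a real socket, create one ObjectOutputStream and one ObjectInputStream per connection and keep them open, because each constructor writes or reads a stream header.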