Get a Publisher<ByteBuffer> from InputStream - java

I just upgraded my MongoDB Java driver, and the handy function GridFSBucket.uploadFromStream is gone. Instead we now have
GridFSUploadPublisher<ObjectId> uploadFromPublisher(String filename, Publisher<ByteBuffer> source);
Any ideas how to convert my InputStream into a Publisher<ByteBuffer>? Is there a utility function for this in the Java driver or in Reactor?

Spring Framework has a utility that converts an input stream into a Flux of DataBuffer, and each DataBuffer can be mapped directly to a ByteBuffer.
Flux<ByteBuffer> byteBufferFlux = DataBufferUtils.readByteChannel(() -> Channels.newChannel(inputStream), DefaultDataBufferFactory.sharedInstance, 4096).map(DataBuffer::asByteBuffer);

To just make it work, one could read the stream into a byte[] array, which can later be wrapped with ByteBuffer.wrap(byte[]).
I've looked into the older MongoDB driver's source code and found that the stream was consumed as an actual stream, so it was not sneakily converted to a ByteBuffer under the hood back then.
As far as I can tell, the driver itself offers no method to stream the data as it comes, so with this approach the whole file has to be loaded into memory. I can see how that is a problem in many scenarios, but maybe I'm missing something about this API.
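A minimal sketch of the "read everything, then wrap" idea from this answer, using only the JDK. The class and method names are made up for illustration, and the whole payload ends up on the heap, so this only suits files that comfortably fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class StreamToByteBuffer {

    /** Reads the whole stream into memory and wraps the resulting array. */
    static ByteBuffer readFully(InputStream in) {
        try (InputStream source = in) {
            byte[] all = source.readAllBytes(); // Java 9+; holds the entire content on the heap
            return ByteBuffer.wrap(all);        // wraps the array, no second copy
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("hello gridfs".getBytes(StandardCharsets.UTF_8));
        ByteBuffer buffer = readFully(in);
        System.out.println(buffer.remaining() + " bytes buffered");
    }
}
```

Assuming Reactor is on the classpath, Mono.just(buffer) would then give a Publisher<ByteBuffer> that emits this single buffer.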

Related

Redis value as byte[] vs plain string

I am using Redis as a centralized cache for a distributed system. Currently I use Jedis to connect to the Redis cluster, and I store values as byte[] instead of String. My question: does storing a plain string versus a byte[] have any impact on reading the data back? In my application I serialize my Java POJO to a byte[] and store that, whereas I could convert it to JSON and store that instead, so that when reading it from Redis I can use the object right away instead of deserializing it. I have tried both, and the only difference I can see is the extra deserialization step.
In Redis, everything is a byte[]. What Redis calls strings are actually byte[] in programming-language terms.
When you store JSON, you still need to serialize it to byte[] before saving to redis, and do the reverse when you read back. This is no different from serializing a java object. In other words, you always have to pay the cost of serialization and deserialization.
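To make that concrete, here is a rough JDK-only sketch showing that both routes end in a byte[] before anything reaches Redis. The User class and the hand-built JSON string are assumptions for illustration; a real application would use a JSON library and pass the bytes to its Redis client.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class RedisValueEncoding {

    /** Hypothetical cached value, not taken from the question. */
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    /** Text route: the JSON string still has to be encoded to bytes. */
    static byte[] jsonBytes(User u) {
        // Hand-built JSON without escaping, only good enough for this sketch.
        String json = "{\"name\":\"" + u.name + "\",\"age\":" + u.age + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Binary route: Java serialization produces the bytes directly. */
    static byte[] javaSerializedBytes(User u) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(u);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverse of javaSerializedBytes; this is the step paid on every read. */
    static User fromJavaSerializedBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (User) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User user = new User("ann", 30);
        System.out.println("JSON bytes: " + jsonBytes(user).length);
        System.out.println("Java-serialized bytes: " + javaSerializedBytes(user).length);
    }
}
```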
That said, different libraries have different serialization costs. Java serialization is known to be slow and inefficient. JSON is likely to be better than Java serialization, but it wastes memory in Redis because it is text-based. You can choose a better serialization library.
Kryo is a faster replacement for the Java serializer. MessagePack is like JSON but faster. Protocol Buffers / FlatBuffers are even better, but require you to declare a schema upfront. There are other serialization formats as well, each with its own tradeoffs.
The general recommendation - try to use the hash datatype. It is efficient, and lets you request specific fields instead of the whole object. Only if hash does not work for you, pick something else based on your needs.
P.S. If you are into benchmarks, this website has several - https://github.com/eishay/jvm-serializers/wiki

Chronicle Bytes from InputStream

I'm trying to use Saxophone to parse JSON into a protobuf message on the fly, and I want to avoid creating String instances for each response.
For that I need to create a Bytes instance from an InputStream (which is provided by the Apache HTTP entity).
I've been digging through the sources for a while but can't find a way to do that. Any suggestions?
There are two ways you can do this.
// reuse a string builder if the String cannot be pooled easily
stringBuilder.setLength(0);
bytes.parseUTF(stringBuilder, StopCharTesters.ALL);
or you can use the built-in String pool
String s = bytes.parseUTF(StopCharTesters.ALL);
This will work well if there is a relatively small number of possible Strings (at least most of the time).

Why should i use Serialization instead of File I/O in java

In the serialization mechanism, we write objects to a stream using ObjectInputStream and ObjectOutputStream. These objects are passed across the network, and the mechanism relies on object input/output streams. So can I use file input/output streams instead of implementing the serialization marker interface?
I guess you are mixing up serialization and general I/O.
Serialization is a way to transform objects into byte sequences (and back, which is called deserialization). This way, you can transmit serializable objects over the network and store them in files.
File input/output streams are for storing/reading any kind of data to/from files.
When you need to transfer your object over the network, you need to serialize it. The link below might be useful for you.
http://java.sun.com/developer/technicalArticles/Programming/serialization/
File I/O and serialization are two different things. File I/O is used to read and write files. The Serializable interface marks an object as convertible to a binary representation. So no, you can't use file streams for sending over the network. (Maybe there is some workaround for sending data over the network using file streams, but it's like trying to fly with a car.)
First let's concentrate on the definition:
Serialization: It is the process of converting object state into a format that can be stored and reconstructed later in the same way.
Plain file I/O, on the other hand, only reads and writes raw bytes or text; by itself it cannot store a data structure or object and reconstruct it later in the same way.
That's why we use serialization or a database (SQL, MongoDB, and so on).
JSON/XML can also be used for serialization, with a parser for that format.
Take an example in JavaScript (not Java, but treat it as language-agnostic):
var obj = { // it's an object in javascript (same like json)
a: "something",
b: 3,
c: "another"
};
Now if you use plain file I/O to save this to a file (say abc.txt), it will be saved as a string,
which means other code cannot read the file (abc.txt) later and access the object like this:
// readThisFile();
// obj.a;
But if you use serialization (in javascript using JSON natively), you can read it from the file
Since streams are additive, you can do something like
FileOutputStream fos = new FileOutputStream("/some/file/to/write/to");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(someObject);
Not sure this is what you were asking, but it's hard to tell.
Serialization/deserialization is used to read and write objects. The output is a binary format that is not human-readable (it is not compressed, though). File I/O is used for reading and writing files in general. It appears that you do not want to serialize; if so, do not use it. Read and write your files as text.
In the serialization mechanism, we write the object into a stream using
ObjectInputStream and ObjectOutputStream.
Ok
These objects are passed across the network. In this mechanism we use an
ObjectInput/Output stream.
I am following you.
So can I use File Input/Output streams instead of calling
serialization marker interface?
Here you lost me. Do you mean to send an object over the network or just to serialize it?
Of course you can use whichever Input/Output streams along with ObjectInput/ObjectOutput streams to serialize objects to different media.
For instance:
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("jedis.bin"));
out.writeObject(new Jedi("Luke"));
Would serialize the object into a file called jedis.bin
And the code
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(byteStream);
out.writeObject(new Jedi("Luke"));
Would serialize the object into a memory array.
So anything that is an output/input stream can be used as the underlying stream for ObjectInput/ObjectOutput streams.
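Here is a small sketch of that point: one write method and one read method that work against any OutputStream/InputStream, exercised once with a temp file and once with an in-memory array. The Jedi class body and the helper names are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SerializeToAnyStream {

    /** Assumed shape of the Jedi class used in the answer. */
    static final class Jedi implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Jedi(String name) { this.name = name; }
    }

    /** Serializes the value into whatever stream is passed in, then closes it. */
    static void write(Object value, OutputStream target) {
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one serialized object back from whatever stream is passed in. */
    static Object read(InputStream source) {
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Underlying stream 1: a file on disk.
        Path file = Files.createTempFile("jedis", ".bin");
        write(new Jedi("Luke"), Files.newOutputStream(file));
        Jedi fromFile = (Jedi) read(Files.newInputStream(file));
        Files.delete(file);

        // Underlying stream 2: a byte array in memory.
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        write(new Jedi("Luke"), memory);
        Jedi fromMemory = (Jedi) read(new ByteArrayInputStream(memory.toByteArray()));

        System.out.println(fromFile.name + " / " + fromMemory.name);
    }
}
```

The same two methods would work with a socket's streams, which is how serialized objects get sent over the network.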

Storing Serializable Objects in the Database

I'm writing an application which needs to write an object into database.
For simplicity, I want to serialize the object.
But ObjectOutputStream, which is needed for this, has only one constructor, and it takes any subclass of OutputStream as a parameter.
What parameter should be passed to it?
You can pass a ByteArrayOutputStream and then store the resulting stream.toByteArray() in the database as a BLOB.
Make sure you specify a serialVersionUID for the class, because otherwise you'll have a hard time when you add or remove a field.
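A rough sketch of those two points together, under some assumptions: the Settings class, the settings table and its columns are hypothetical, and the JDBC part is only compiled here, not run against a real database.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BlobSerialization {

    /** Hypothetical object to persist. */
    static final class Settings implements Serializable {
        // Fixed explicitly so that rows written earlier stay readable
        // after a compatible change such as adding a field.
        private static final long serialVersionUID = 1L;
        final String theme;
        Settings(String theme) { this.theme = theme; }
    }

    /** Serializes into a ByteArrayOutputStream and returns the bytes. */
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the object from bytes read out of the BLOB column. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stores the bytes in a hypothetical table settings(id BIGINT, payload BLOB). */
    static void save(Connection connection, long id, Serializable value) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO settings (id, payload) VALUES (?, ?)")) {
            ps.setLong(1, id);
            ps.setBytes(2, toBytes(value));
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        Settings restored = (Settings) fromBytes(toBytes(new Settings("dark")));
        System.out.println(restored.theme);
    }
}
```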
Also consider the XML flavor of object serialization, XMLEncoder, if you need somewhat more human-readable data.
And ultimately, you may want to translate your object model to the relational model via an ORM framework. JPA implementations (Hibernate/EclipseLink/OpenJPA) provide object-relational mapping so that you work with objects, while their fields and relations are persisted in an RDBMS.
Using ByteArrayOutputStream should be a simple enough way to convert to a byte[] (call toByteArray after you've flushed). Alternatively there is Blob.setBinaryStream (which actually returns an OutputStream).
You might also want to reconsider using the database as a database...
E.g. create a ByteArrayOutputStream and pass it to the ObjectOutputStream constructor.
One thing to add to this: Java serialization is a good, general-purpose tool, but it can be a bit verbose. You might want to try gzipping the serialized data. You can do this by putting a GZIP stream between the object stream and the byte stream. This will use a small amount of extra CPU, but that is often a worthwhile tradeoff against shipping the extra bytes over the network and shoving them into a DB.
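The stream stacking described here could look roughly like this with the JDK's GZIPOutputStream and GZIPInputStream. The class and method names are made up for the sketch, and how much space it saves depends entirely on the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipSerialization {

    /** object stream -> gzip stream -> byte stream */
    static byte[] toGzippedBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the object stream also finishes the gzip trailer.
        return bytes.toByteArray();
    }

    /** byte stream -> gzip stream -> object stream */
    static Object fromGzippedBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row " + (i % 10));
        }
        byte[] gzipped = toGzippedBytes(rows);
        Object restored = fromGzippedBytes(gzipped);
        System.out.println("gzipped size: " + gzipped.length + ", equal: " + rows.equals(restored));
    }
}
```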

Java sockets - how to determine data type on the server side?

I'm using Java sockets for client - server application. I have a situation when sometimes client needs to send a byte array (using byteArrayOutputStream) and sometimes it should send a custom java object. How can I read the information from the input stream on the server side and determine what is in the stream so that I can properly process that?
Usually this is done by sending a "header" in front of the body, containing information about the body. Have a look at, for example, the HTTP protocol. An HTTP stream consists of a header which is separated from the body by a double newline. The header in turn consists of several fields in name: value format, each separated by a single newline. In this particular case, in HTTP you would use the Content-Type header to identify the data type of the body.
Since Java and TCP/IP don't provide standard facilities for this, you would need to specify and document in detail the format you're going to send over the line, so that the other side knows how to handle the stream. You can of course also pick a standard specification, e.g. HTTP or FTP.
There are multiple ways to handle this.
One is object serialization, which sends the data over with Java's Object(In|Out)putStream. One small problem is knowing when to read the object off the stream, though.
Another is to marshal and unmarshal XML. Uses a bit more traffic but is easier to debug and get running. It helps to have a well documented XML schema for this. An advantage here is you can use existing XML libraries for it.
You could try a custom format if you wanted, but it would probably end up being just a sloppy, less verbose version of XML.
In general, I don't believe there is a feature built into Java that allows you to do this.
Instead, consider sending some more information along with each message that explains what type is coming next.
For example, you might prefix your messages with an integer, such that every time you receive a message, you read the first 4 bytes (an integer is 4 bytes) and interpret its value (e.g. 1=byte array, 2=custom Java object, 3=another custom Java object, ...).
You might also consider adding an integer containing the size of the message so that you know when the current message ends and the next message begins.
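A minimal sketch of such a framing scheme, assuming a 4-byte type tag followed by a 4-byte length and then the payload. The type constants and class names are invented for the example, and in-memory streams stand in for the socket's streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class TaggedFrames {
    static final int TYPE_BYTE_ARRAY = 1;  // payload is a raw byte array
    static final int TYPE_JAVA_OBJECT = 2; // payload is a serialized Java object

    static final class Frame {
        final int type;
        final byte[] payload;
        Frame(int type, byte[] payload) { this.type = type; this.payload = payload; }
    }

    /** Client side: type tag, then payload length, then the payload itself. */
    static void writeFrame(DataOutputStream out, int type, byte[] payload) {
        try {
            out.writeInt(type);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Server side: read the two integers first, then exactly that many bytes. */
    static Frame readFrame(DataInputStream in) {
        try {
            int type = in.readInt();
            int length = in.readInt();
            if (length < 0) {
                throw new IllegalStateException("negative frame length " + length);
            }
            byte[] payload = new byte[length];
            in.readFully(payload); // blocks until the whole message has arrived
            return new Frame(type, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeFrame(out, TYPE_BYTE_ARRAY, "raw".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, TYPE_BYTE_ARRAY, new byte[] {1, 2, 3, 4, 5});

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        for (int i = 0; i < 2; i++) {
            Frame frame = readFrame(in);
            System.out.println("type=" + frame.type + " length=" + frame.payload.length);
        }
    }
}
```

With a real connection, the streams would come from socket.getOutputStream() and socket.getInputStream(), and the server would branch on frame.type to decide how to decode the payload.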
I'm going to get called out for overkill for this, but unless you seriously need this protocol to be economical, you might consider marshalling the data. I mean, without peeking at the data, you can't generally tell the difference between something that's a byte array and something that's something else, since you could conceivably represent everything as a byte array.
You can pretty easily use JAXB to marshal the data to and from XML. And JAXB will even turn byte array objects into hex strings or Base64 for you.
First read the data into a byte array on the server. Write your own parsing routine to do nothing more than identify what is in the byte array.
Second perform the full object parsing based on the identification from step one. If the parsing requires passing an inputstream, you can always put the byte array you read in step one into a new ByteArrayInputStream instance.
You need to define a protocol to indicate what type of data follows. For instance, you could start each transfer with a string or enumerated value. The server would first read this, then read the following data based on the 'header' value.
What you could do is prefix any data you send with an integer that is used to determine the type.
That way, you can read the first 4 bytes and then determine what type of data follows.
I think the easiest way is to use an object which contains the data that you will send along with its type information. Then you can just send this object and according to this object's data type property you can extract the data.
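One way this could look, as a sketch only: a serializable envelope that carries a kind field next to the actual data, so the server always reads one object and then branches on the kind. The Envelope class, the Kind enum, and the helper names are assumptions for illustration; byte arrays stand in for the socket streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class EnvelopeDemo {

    enum Kind { RAW_BYTES, CUSTOM_OBJECT }

    /** Wrapper that travels over the wire; body is a byte[] or any Serializable. */
    static final class Envelope implements Serializable {
        private static final long serialVersionUID = 1L;
        final Kind kind;
        final Serializable body;
        Envelope(Kind kind, Serializable body) { this.kind = kind; this.body = body; }
    }

    /** What the client would write to the socket's output stream. */
    static byte[] toWire(Envelope envelope) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(envelope);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** What the server would read from the socket's input stream. */
    static Envelope fromWire(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Envelope) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server-side dispatch based on the kind carried by the envelope. */
    static String describe(Envelope envelope) {
        switch (envelope.kind) {
            case RAW_BYTES:
                return "raw bytes, length " + ((byte[]) envelope.body).length;
            case CUSTOM_OBJECT:
                return "object of " + envelope.body.getClass().getName();
            default:
                throw new IllegalArgumentException("unknown kind " + envelope.kind);
        }
    }

    public static void main(String[] args) {
        Envelope raw = new Envelope(Kind.RAW_BYTES, new byte[] {1, 2, 3});
        Envelope custom = new Envelope(Kind.CUSTOM_OBJECT, "any Serializable object");
        System.out.println(describe(fromWire(toWire(raw))));
        System.out.println(describe(fromWire(toWire(custom))));
    }
}
```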
