Using Thrift for offline serialisation?

I have used the Apache Thrift protocol for tablet-to-server and cross-language integration for a few years without problems.
The integration spans several languages (C#, C++, desktop Java, Dalvik Java), and Thrift is probably one of the simplest and safest options. I would therefore like to use the Thrift library to pack and repack complex data structures that have changed over the years. In Thrift terms, think of a kind of OfflineTransport or OfflineProtocol.
Scenario:
I want to build a backup solution that processes data offline, for example during an internet provider failure: serialise the data, store it, and try to process it in a few ways, such as sending the serialised data by ordinary email over a poor backup connection.
The question is: where in the Thrift philosophy is the best extension point for this?
I understand that only part of the online protocol can be backed up offline; for example, returning a value in real time is not possible, and that is OK.

Look at the serializers. There are various implementations, but they all share the same concept: a buffer, file, or stream serves as the transport medium.
Writing data in C#
For example, say we plan to store the bits in a byte[] buffer. One could write:
var trans = new TMemoryBuffer();
var prot = new TCompactProtocol( trans);
var instance = GetMeSomeDataInstanceToSerialize();
instance.Write(prot);
Now we can get a hold of the data:
var data = trans.GetBuffer();
Reading data in C#
Reading works similarly, except that you need to know from somewhere which root instance to construct:
var trans = new TMemoryBuffer( serializedBytes);
var prot = new TCompactProtocol( trans);
var instance = new MyCoolClass();
instance.Read(prot);
Additional Tweaks
One solution to the chicken-egg problem during load could be to use a union as an extra serialization container:
union GenericFileDataContainer {
1 : MyCoolClass coolclass;
2 : FooBar foobar
// more to come later
}
By always using this container as the root instance during serialization, it is easy to add more classes without breaking compatibility, and there is no need to know up front what exactly is in a file: you just read it and check which element is set in the union.
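The root-container idea can be illustrated without Thrift at all. The following is a minimal Java sketch, not Thrift's wire format or API: it writes a numeric field id in front of the payload, much like the union's field ids, so a reader can decide what it has found after loading the bytes. The class name OfflineContainerDemo, the constants, and the string payload are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy stand-in for a union used as root container (NOT Thrift's format):
// a field id is stored first and tells the reader which payload follows.
class OfflineContainerDemo {
    static final short COOL_CLASS = 1; // mirrors "1 : MyCoolClass coolclass"
    static final short FOO_BAR = 2;    // mirrors "2 : FooBar foobar"

    // Serialise one payload together with its field id into a byte array.
    static byte[] pack(short fieldId, String payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(fieldId);
            out.writeUTF(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the field id first, then report which element was stored.
    static String unpack(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            short fieldId = in.readShort();
            String payload = in.readUTF();
            switch (fieldId) {
                case COOL_CLASS: return "coolclass:" + payload;
                case FOO_BAR:    return "foobar:" + payload;
                default:         return "unknown(" + fieldId + ")";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = pack(FOO_BAR, "hello"); // could be written to a file or mailed
        System.out.println(unpack(stored));
    }
}
```

With real Thrift, the generated union class does this bookkeeping for you; the sketch only shows why a tagged root makes stored files self-describing.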

There is an RPC framework named "Thrifty" that uses the standard Thrift protocol. It has the same effect as defining the service in Thrift IDL; that is, Thrifty is compatible with code that uses Thrift IDL, which is very helpful for cross-platform work. It also has a ThriftSerializer class:
[ThriftStruct]
public class LogEntry
{
[ThriftConstructor]
public LogEntry([ThriftField(1)]String category, [ThriftField(2)]String message)
{
this.Category = category;
this.Message = message;
}
[ThriftField(1)]
public String Category { get; }
[ThriftField(2)]
public String Message { get; }
}
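// Distinct variable names; the original reused "s" for both the serializer and the bytes.
LogEntry entry = new LogEntry("category", "message");
ThriftSerializer serializer = new ThriftSerializer(ThriftSerializer.SerializeProtocol.Binary);
byte[] bytes = serializer.Serialize<LogEntry>(entry);
LogEntry copy = serializer.Deserialize<LogEntry>(bytes);
You can try it: https://github.com/endink/Thrifty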

Related

Can proto2 extensions be used with grpc?

I have a gRPC client written in Go and a gRPC server written in Java, both using the same proto files (proto2 syntax).
My grpc method takes a message that may contain extensions. I am able to construct a message containing desired extensions on the client and send it to the server. But when I try to read the message on the server, my extensions are available as unknown fields. (In other words, entity.hasExtension(extension) in Java returns false).
So my question is whether grpc allows extensions to be used in messages that are provided as method parameters. If not, is there a way to convert an unknown field to a field of specific type?
My proto file:
syntax = "proto2";
// proto file used as source for go client and java server as well
package my_services;
import "basic_types.proto";
// import "extension_types.proto";
// do not delete: options for generating java code
option java_multiple_files = true;
option java_package = "myservice.grpc";
option java_outer_classname = "MyServiceWrapper";
option objc_class_prefix = "Foo";
// Interface exposed by the server.
service DataService {
// Obtains all objects satisfying the request message
rpc MyMethod(DataRequest) returns (DataResponse) {}
}
message DataRequest {
optional IdDefinition id = 1;
repeated basic_types.Entity templates = 2;
}
message DataResponse {
repeated IdDefinition id = 1;
optional basic_types.DataResult result = 2;
}
message IdDefinition {
optional int32 myid = 1;
}
basic_types.Entity is a basic message containing extensions:
message Entity {
extensions 1 to max;
}
and may be extended e.g. like this:
extend basic_types.Entity {
optional Foo foo = 1000;
optional Bar bar = 1001;
}
Any help or hint would be much appreciated.

In Java it is possible, but you need to set an extension registry with ProtoLiteUtils.setExtensionRegistry(). This is an experimental API, and there may be a different way to do this in the future, but for the time being it should be usable.
More generally, all message encodings are supported by gRPC. We natively support Proto3, but there are a lot of existing Proto2 users that use gRPC. Since gRPC is encoding agnostic, you can even use things like Thrift or JSON if you really want to, though we don't automatically generate stubs for those.

Write data in Apache Parquet format

I have a scheduler that collects our cluster metrics and writes the data to an HDFS file using an older version of the Cloudera API. We recently updated our JARs, and the original code now fails with an exception:
java.lang.ClassCastException: org.apache.hadoop.io.ArrayWritable cannot be cast to org.apache.hadoop.hive.serde2.io.ParquetHiveRecord
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)
I need help using the ParquetHiveRecord class to write the data (which are POJOs) in Parquet format.
Code sample below:
Writable[] values = new Writable[20];
... // populate values with all values
ArrayWritable value = new ArrayWritable(Writable.class, values);
writer.write(value); // <-- Getting exception here
Details of "writer" (of type ParquetWriter):
MessageType schema = MessageTypeParser.parseMessageType(SCHEMA); // SCHEMA is a string with our schema definition
ParquetWriter<ArrayWritable> writer = new ParquetWriter<ArrayWritable>(fileName,
    new DataWritableWriteSupport() {
        @Override
        public WriteContext init(Configuration conf) {
            // Supply the schema if the configuration does not carry one yet
            if (conf.get(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA) == null)
                conf.set(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA, schema.toString());
            return super.init(conf);
        }
    });
Also, we were using CDH and CM 5.5.1 before, now using 5.8.3
Thanks!

I think you need to use DataWritableWriter rather than ParquetWriter. The class cast exception indicates the write support class is expecting an instance of ParquetHiveRecord instead of ArrayWritable. DataWritableWriter likely breaks down the individual records in ArrayWritable to individual messages in the form of ParquetHiveRecord and sends each to the write support.
Parquet is sort of mind bending at times. :)

Looking at the code of the DataWritableWriteSupport class:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
You can see that it already uses DataWritableWriter, so you do not need to create an instance of DataWritableWriter yourself. The idea of write support is that it lets you write different formats to Parquet.
What you do need is to wrap your writables in ParquetHiveRecord.

GWT - impossible to find working dir with Eclipse

I need to show the working directory on my panel.
I use String value = System.getProperty("user.dir"). Afterwards I put this string on a label, but I receive this message on the console:
The method getProperty(String, String) in the type System is not applicable for the arguments (String).
I use Eclipse.

Issue
I am guessing you have not gone through GWT 101: you cannot blindly use Java code on the client side.
Explanation
You can find the list of JRE classes and methods that GWT supports here:
https://developers.google.com/web-toolkit/doc/latest/RefJreEmulation
For System, only the following are supported:
err, out,
System(),
arraycopy(Object, int, Object, int, int),
currentTimeMillis(),
gc(),
identityHashCode(Object),
setErr(PrintStream),
setOut(PrintStream)
Solution
In your case, execute System.getProperty("user.dir") in your server-side code and access it using RPC or any other GWT server communication technique.

System.getProperty("key") is not supported,
but System.getProperty("key", "default") IS supported, though it will only return the default value, as there are no system properties per se.
If you need the working directory during GWT compilation, you need to use a custom linker or generator, grab the system property at build time, and emit it as a public resource file.
For linkers, you have to export an external file that GWT can download to get the compile-time data you want. For generators, you just inject the string you want into the compiled source.
Here's a slideshow on linkers that is actually very interesting:
http://dl.google.com/googleio/2010/gwt-gwt-linkers.pdf
If you don't want to use a linker and an extra HTTP request, you can use a generator instead, which is likely much easier (and faster):
interface BuildData {
String workingDirectory();
}
BuildData data = GWT.create(BuildData.class);
data.workingDirectory();
Then, you need to make a generator:
public class BuildDataGenerator extends IncrementalGenerator {
    @Override
    public RebindResult generateIncrementally(TreeLogger logger,
            GeneratorContext context, String typeName) {
        // generator boilerplate
        PrintWriter printWriter = context.tryCreate(logger, "com.foo", "BuildDataImpl");
        if (printWriter == null) {
            logger.log(Type.TRACE, "Already generated");
            return new RebindResult(RebindMode.USE_PARTIAL_CACHED, "com.foo.BuildDataImpl");
        }
        SourceFileComposerFactory composer =
                new SourceFileComposerFactory("com.foo", "BuildDataImpl");
        // must implement the interface we are generating to avoid a class cast exception
        composer.addImplementedInterface("com.foo.BuildData");
        SourceWriter sw = composer.createSourceWriter(printWriter);
        // write the generated class; the class definition is done for you
        sw.println("public String workingDirectory(){");
        // escape() keeps backslashes and quotes in the path valid inside the string literal
        sw.println("return \"" + escape(System.getProperty("user.dir")) + "\";");
        sw.println("}");
        // close the class body and commit the generated source
        sw.commit(logger);
        return new RebindResult(RebindMode.USE_ALL_NEW_WITH_NO_CACHING,
                "com.foo.BuildDataImpl");
    }
}
Finally, you need to tell GWT to use your generator on your interface:
<generate-with class="dev.com.foo.BuildDataGenerator">
<when-type-assignable class="com.foo.BuildData" />
</generate-with>

Is this PowerBuilder stats generation code appropriately object-oriented?

I am working on refactoring an existing application written in PowerBuilder and Java and which runs on Sybase EA Server (Jaguar). I am building a small framework to wrap around Jaguar API functions that are available in EA Server. One of the classes is to get runtime statistics from EA Server using the Monitoring class.
Without going into too much detail, Monitoring is a class in EA Server API that provides Jaguar Runtime Monitoring statistics (actual classes are in C++; EA Server provides a wrapper for these in Java, so they can be accessed through CORBA).
Below is the simplified version of my class. (I made a superclass which I inherit from for getting stats for components, conn. caches, HTTP etc).
public class JagMonCompStats {
    private String type = "Component";
    private String entity = "web_business_rules";
    private String[] header = {"Active", "Pooled", "invoke"};
    // This has a lot more keys, simplified for this discussion
    private static short[] compKeys = {
        (short) (MONITOR_COMPONENT_ACTIVE.value),
        (short) (MONITOR_COMPONENT_POOLED.value),
        (short) (MONITOR_COMPONENT_INVOKE.value)
    };
    private double[] data = null;
    ...
    public void dumpStats() {
        ...
        /* Call to Jaguar API */
        Monitoring jm = MonitoringHelper.narrow(session.create("Jaguar/Monitoring"));
        data = jm.monitor(type, entity, compKeys);
        ...
        printStats(entity, header, data);
        ...
    }
    protected void printStats(String entityName, String[] header, double[] data) {
        /* print the header and print data in a formatted way */
    }
}
The line data = jm.monitor is the call to Jaguar API. It takes the type of the entity, the name of the entity, and the keys of the stats we want. This method returns a double array. I go on to print the header and data in a formatted output.
The program works, but I would like to get experts' opinions on the OO design. For one, I want to be able to customize printStats so that it can print in different formats (e.g., a full-blown report or a one-liner). Apart from this, I am also thinking of showing the stats on a web page or a PowerBuilder screen, in which case printStats may not even be relevant. How would you do this in a real OO way?

Well, it's quite simple. Don't print stats from this class. Return them. And let the caller decide how the returned stats should be displayed.
Now that you can get stats, you can create a OneLinerStatsPrinter, a DetailedStatsPrinter, an HtmlStatsFormatter, or whatever you want.
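As a rough sketch of that separation, assuming a small value class and a formatter interface: the names Stats, StatsFormatter, OneLinerStatsFormatter and DetailedStatsFormatter are hypothetical, and the sample numbers are made up. The monitoring class would return a Stats object, and each output style would live in its own formatter.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Demo entry point: the caller picks the formatter, the stats source does not care.
class StatsFormatDemo {
    public static void main(String[] args) {
        Stats stats = new Stats("web_business_rules",
                new String[] {"Active", "Pooled", "invoke"},
                new double[] {3, 5, 120}); // made-up sample values
        System.out.println(new OneLinerStatsFormatter().format(stats));
        System.out.print(new DetailedStatsFormatter().format(stats));
    }
}

// Plain data holder that the monitoring class would return instead of printing.
final class Stats {
    final String entity;
    final String[] header;
    final double[] data;

    Stats(String entity, String[] header, double[] data) {
        this.entity = entity;
        this.header = header;
        this.data = data;
    }
}

// Each output style is one implementation of this interface.
interface StatsFormatter {
    String format(Stats stats);
}

// Renders everything on a single line: "entity: key=value, key=value".
class OneLinerStatsFormatter implements StatsFormatter {
    @Override
    public String format(Stats stats) {
        StringJoiner line = new StringJoiner(", ", stats.entity + ": ", "");
        for (int i = 0; i < stats.header.length; i++) {
            line.add(stats.header[i] + "=" + String.format(Locale.ROOT, "%.1f", stats.data[i]));
        }
        return line.toString();
    }
}

// Renders a small multi-line report, one statistic per line.
class DetailedStatsFormatter implements StatsFormatter {
    @Override
    public String format(Stats stats) {
        StringBuilder report = new StringBuilder("Statistics for " + stats.entity + "\n");
        for (int i = 0; i < stats.header.length; i++) {
            report.append(String.format(Locale.ROOT, "  %-10s %10.1f\n",
                    stats.header[i], stats.data[i]));
        }
        return report.toString();
    }
}
```

An HTML or PowerBuilder-facing formatter would be one more implementation of the same interface, with no change to the class that talks to the Jaguar API.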

Java DataStructure for writing 4 pieces of information

I need to extract two pieces of information about two IP addresses and then write that information together with the two addresses.
I was thinking of a Set of Pairs for the IP addresses, but which data structure can I use to write all of this information?
Thanks

PcapPacketHandler<String> jPacketHandler = new PcapPacketHandler<String>() {
    int totalLength = 0;

    public void nextPacket(PcapPacket packet, String user) {
        Ip4 ip = new Ip4();
        String sIP;
        String dIP;
        if (!packet.hasHeader(ip)) {
            return;
        }
        totalLength = totalLength + ip.getPayloadLength();
        sIP = org.jnetpcap.packet.format.FormatUtils.ip(ip.source());
        dIP = org.jnetpcap.packet.format.FormatUtils.ip(ip.destination());
        System.out.println("SIP = " + sIP + " " + "destIP = " + dIP + " " + "Payload Length = " + ip.getPayloadLength());
        System.out.println("Total Length = " + totalLength);
    }
};
pcap.loop(10, jPacketHandler, "");
pcap.close();

Even though this isn't a JavaScript app, you could use JSON, as it provides a concise way to read and store multiple pieces of data together. Check out the JSON Java documentation for details about the classes and to download the related source.

If you're just writing the information, you could always use a HashMap. Unless you know what you're planning to do with the data, it's hard to say what's best.

Just make a custom class (POJO), and depending on how you want to write it make it Serializable. That way you can clearly name your fields (and getters and setters) making your code easier to read (and extend).
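A minimal sketch of such a class, assuming the two extra pieces of information are a payload byte count and a packet count (the question does not say what they are, so those field names, like the class name FlowRecord, are hypothetical). It round-trips through standard Java serialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// POJO holding two addresses plus two pieces of information about them.
class FlowRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sourceIp;
    private final String destIp;
    private final long payloadBytes; // hypothetical first piece of information
    private final long packetCount;  // hypothetical second piece of information

    FlowRecord(String sourceIp, String destIp, long payloadBytes, long packetCount) {
        this.sourceIp = sourceIp;
        this.destIp = destIp;
        this.payloadBytes = payloadBytes;
        this.packetCount = packetCount;
    }

    String getSourceIp() { return sourceIp; }
    String getDestIp() { return destIp; }
    long getPayloadBytes() { return payloadBytes; }
    long getPacketCount() { return packetCount; }

    // Serialise a record to bytes; the same stream could wrap a FileOutputStream.
    static byte[] toBytes(FlowRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(record);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Restore a record from bytes produced by toBytes.
    static FlowRecord fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (FlowRecord) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FlowRecord copy = fromBytes(toBytes(new FlowRecord("10.0.0.1", "10.0.0.2", 1500, 3)));
        System.out.println(copy.getSourceIp() + " -> " + copy.getDestIp()
                + " bytes=" + copy.getPayloadBytes() + " packets=" + copy.getPacketCount());
    }
}
```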

Something like this:
class BigClass {
    private IPAddress addr1;
    private IPAddress addr2;
    private SomeInfo additionalInfo;
    // implement accessors
    // implement equals, hashCode
}
IPAddress and SomeInfo are your own user types. In Java, InetAddress represents an IP address. This may be much more than your custom type.
The choice of a suitable "set" data structure depends on many factors. Do you want to retain the order? Do you populate it from multiple threads? How many entries do you expect in the set? Hundreds? A million?
Why not post your code? It may be easier to give feedback with real code.

I don't quite understand exactly which graph you want to plot. What I would do is:
Dump all the data into an SQL database.
Run a query to produce the input for your chart.
Plot the chart, e.g. with JFreeChart or even Excel.
I imagine a query along these lines:
select source_ip, dest_ip, sum(time), sum(sent_bytes) group by source_ip, dest_ip
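If a database is more than the task needs, the same grouping can be approximated in memory. This is a sketch under assumptions, not part of the suggestion above: the class FlowAggregator and its method names are hypothetical, and the two summed columns simply mirror time and sent_bytes from the query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory equivalent of grouping by (source_ip, dest_ip) and summing two columns.
class FlowAggregator {
    // Running sums for one (source, destination) pair.
    static final class Totals {
        long time;
        long sentBytes;
    }

    // LinkedHashMap keeps the pairs in first-seen order.
    private final Map<String, Totals> totalsByPair = new LinkedHashMap<>();

    // Add one observation to the sums of its address pair.
    void add(String sourceIp, String destIp, long time, long sentBytes) {
        Totals totals = totalsByPair.computeIfAbsent(key(sourceIp, destIp), k -> new Totals());
        totals.time += time;
        totals.sentBytes += sentBytes;
    }

    // Returns the sums for a pair, or null if the pair was never seen.
    Totals totalsFor(String sourceIp, String destIp) {
        return totalsByPair.get(key(sourceIp, destIp));
    }

    private static String key(String sourceIp, String destIp) {
        return sourceIp + "->" + destIp;
    }

    public static void main(String[] args) {
        FlowAggregator aggregator = new FlowAggregator();
        aggregator.add("10.0.0.1", "10.0.0.2", 5, 100);
        aggregator.add("10.0.0.1", "10.0.0.2", 7, 250);
        aggregator.add("10.0.0.3", "10.0.0.2", 1, 40);
        Totals totals = aggregator.totalsFor("10.0.0.1", "10.0.0.2");
        System.out.println("time=" + totals.time + " sentBytes=" + totals.sentBytes);
    }
}
```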
