How to return Struct from Hive UDF?


I am having trouble finding documentation on how to use a Hive UDF to return a Struct.
My major questions are:
What types of objects do I start with in Java?
How do I convert them so they will be interpreted as a Struct in Hive?

Here is a very simple example of such a UDF.
It receives a User-Agent string, parses it using an external library, and returns a structure with four text fields:
STRUCT<type: string, family: string, os: string, device: string>
You need to extend the GenericUDF class and override its two most important methods: initialize and evaluate.
initialize() describes the structure itself and defines the data types inside it.
evaluate() fills the structure with actual values.
You don't need any special class for the return value: a struct<> in Hive is just an array of objects (Object[]) in Java, with one element per field, in the order the fields are declared in initialize().
import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;
import eu.bitwalker.useragentutils.UserAgent;
public class UAStructUDF extends GenericUDF {
    @Override
    public String getDisplayString(String[] arg0) {
        return "My display string";
    }
    @Override
    public ObjectInspector initialize(ObjectInspector[] arg0) throws UDFArgumentException {
        // This UDF expects exactly one argument: the User-Agent string
        if (arg0.length != 1) {
            throw new UDFArgumentException("UAStructUDF expects exactly one argument");
        }
        // Define the field names for the struct<> and their types
        ArrayList<String> structFieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();
        // type
        structFieldNames.add("type");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // family
        structFieldNames.add("family");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // OS name
        structFieldNames.add("os");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // device
        structFieldNames.add("device");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // The order of these names must match the order of the values returned by evaluate()
        StructObjectInspector si = ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames,
                structFieldObjectInspectors);
        return si;
    }
    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args == null || args.length < 1) {
            throw new HiveException("args is empty");
        }
        Object argObj = args[0].get();
        if (argObj == null) {
            throw new HiveException("args contains null instead of object");
        }
        // get argument
        String argument = null;
        if (argObj instanceof Text) {
            argument = ((Text) argObj).toString();
        } else if (argObj instanceof String) {
            argument = (String) argObj;
        } else {
            throw new HiveException("Argument is neither a Text nor String, it is a " + argObj.getClass().getCanonicalName());
        }
        // parse UA string and return struct, which is just an array of objects: Object[]
        return parseUAString(argument);
    }
    // Builds the struct as an Object[]: one Text per field, in the order declared in initialize()
    private Object parseUAString(String argument) {
        Object[] result = new Object[4];
        UserAgent ua = new UserAgent(argument);
        result[0] = new Text(ua.getBrowser().getBrowserType().getName());
        result[1] = new Text(ua.getBrowser().getGroup().getName());
        result[2] = new Text(ua.getOperatingSystem().getName());
        result[3] = new Text(ua.getOperatingSystem().getDeviceType().getName());
        return result;
    }
}
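To make the "struct is just an Object[]" point concrete, here is a plain-Java sketch with no Hive dependency. It assumes, as the UDF above does, that a value's position in the array is what ties it to a field name declared in initialize(). StructLayoutDemo, toStruct and getField are illustrative names only, and plain Strings stand in for the Text objects the real UDF returns.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration (no Hive): the i-th slot of the returned Object[]
// is read as the i-th field name declared in initialize().
public class StructLayoutDemo {

    // Same field order as the structFieldNames list built in initialize()
    static final List<String> FIELD_NAMES = Arrays.asList("type", "family", "os", "device");

    // Builds the "struct" positionally, just as parseUAString() does
    static Object[] toStruct(String type, String family, String os, String device) {
        return new Object[] {type, family, os, device};
    }

    // Mimics what a struct inspector does: resolve a field name to its index
    static Object getField(Object[] struct, String name) {
        int index = FIELD_NAMES.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown struct field: " + name);
        }
        return struct[index];
    }

    public static void main(String[] args) {
        Object[] struct = toStruct("Browser", "Chrome", "Windows 10", "Computer");
        System.out.println(getField(struct, "os")); // prints "Windows 10"
    }
}
```

If the values were written in a different order than the names (for example os before family), Hive would silently report them under the wrong field names, which is why the order in initialize() and evaluate() has to agree.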

There is a concept of a SerDe (serializer and deserializer) in Hive that can be used with the kind of data format you are working with. It serializes (complex) objects and deserializes them as needed.
For instance, if you have a JSON file that contains objects and values, you need a way to store that content in Hive.
For that you would use a JSON SerDe, which is a jar file containing parser code written in Java for handling JSON data.
So now you have a jar (the SerDe); the other requirement is a schema for that data.
For example, for XML files you need an XSD; similarly, for JSON you define how objects, arrays, and structs relate to each other in the Hive table definition.
You can check this link:
http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html
Please let me know if this helps :)

Related

Java declare a class object from the string of that class?

I have a script provided by the client like this:
segment-id Integer
segment-description String
Now I want to build a class with the following methods
Sample sample = new Sample();
// casting to the type specified in the script
(map.get("segment-id")) segmentId = sample.get("segment-id");
// Now it can be used as an Integer
Integer result = segmentId + 2;
Is it possible to do something like
Class<map.get("segment-id")> segmentId = new Class<map.get("segment-id")>();
Or is there a better solution? I need a way to create objects of a specific type that I don't know in advance.
My current solution is
public Integer getInteger(String key) {
return map.get(key);
}
but this way I have to know in advance that segment-id is of type Integer.

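You can use Class.forName() to get the class and getDeclaredConstructor().newInstance() to create a new instance (the older Class.newInstance() is deprecated since Java 9); both calls throw checked reflective exceptions that you need to handle:
Object createdObject = Class.forName("java.lang.String").getDeclaredConstructor().newInstance();
or, since Integer has no no-argument constructor, call one of its factory methods reflectively instead:
Object createdObject = Class.forName("java.lang.Integer").getMethod("valueOf", String.class).invoke(null, "42");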
If you need to know if it's a String:
if(createdObject instanceof String) {
String castValue = (String) createdObject;
...
}
if(createdObject instanceof Integer) {
Integer castValue = (Integer) createdObject;
...
}
But you could just test the incoming string:
if("java.lang.String".equals(nameOfClassToCreate)) {
....
}
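Because Java types are checked at compile time, a type name read from the client's script cannot become a variable's declared type; a common workaround is to convert values at runtime and have the caller name (or test) the type when reading. A minimal sketch, assuming the script only uses the type names registered below; TypedSample, declare, set and get are hypothetical names, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: typed values driven by a "name Type" script such as
//   segment-id Integer
//   segment-description String
public class TypedSample {

    // Hypothetical registry: type name from the script -> parser for raw text
    private static final Map<String, Function<String, Object>> PARSERS = new HashMap<>();
    static {
        PARSERS.put("Integer", Integer::valueOf);
        PARSERS.put("Double", Double::valueOf);
        PARSERS.put("String", s -> s);
    }

    private final Map<String, String> types = new HashMap<>();  // key -> declared type name
    private final Map<String, Object> values = new HashMap<>(); // key -> converted value

    // Parses one script line, e.g. "segment-id Integer"
    public void declare(String scriptLine) {
        String[] parts = scriptLine.trim().split("\\s+");
        if (parts.length != 2 || !PARSERS.containsKey(parts[1])) {
            throw new IllegalArgumentException("Unsupported declaration: " + scriptLine);
        }
        types.put(parts[0], parts[1]);
    }

    // Converts raw text to the declared type and stores it
    public void set(String key, String raw) {
        String type = types.get(key);
        if (type == null) {
            throw new IllegalArgumentException("Undeclared key: " + key);
        }
        values.put(key, PARSERS.get(type).apply(raw));
    }

    // The caller still names the expected type; the cast is checked at runtime
    public <T> T get(String key, Class<T> type) {
        return type.cast(values.get(key));
    }

    public static void main(String[] args) {
        TypedSample sample = new TypedSample();
        sample.declare("segment-id Integer");
        sample.declare("segment-description String");
        sample.set("segment-id", "40");
        sample.set("segment-description", "Returning customers");

        Integer segmentId = sample.get("segment-id", Integer.class);
        System.out.println(segmentId + 2); // prints 42
    }
}
```

A registry of parsers avoids the reflection pitfalls shown above (classes such as Integer have no no-argument constructor), and asking for the wrong type in get() fails fast with a ClassCastException.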

Get the data type of JSON element in Java

I get a JSON value from a Kafka queue and I want to get the right data type to save it in the DB.
Value can be: String, int, double or array.
How can I detect the right data type automatically and create a Java object from it?
My first steps:
check whether the JSON value is an array or not:
if (jsonValue.isJsonPrimitive()) {
// create new Object
//ToDo need to parse int, double not only to string
new ValueObject(time,jsonValue.getAsString());
} else if (jsonValue.isJsonArray()) {
//create new Object
//ToDo need to parse int, double string
new ValueObject(time,jsonValue.getAsJsonArray());
}
How can I design the ValueObject class to convert the value to the corresponding data type and return the right object?
Thanks for any ideas

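Have you tried:
// note: this does not create a new object; it only casts jsonValue to its own runtime class (e.g. JsonPrimitive)
Object output = jsonValue.getClass().cast(jsonValue);
If that doesn't work and you have the value as a plain Object (for example after converting it from JSON), you can try instanceof with a wrapper type, since instanceof does not accept primitives such as int:
if (value instanceof Integer) {
    int output = (Integer) value;
}...
I hope that helps.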

If you are using the Jackson library, you can do this:
JsonNode rootNode = objectMapper.readTree(json);
Iterator<String> fields = rootNode.fieldNames();
while(fields.hasNext()){
String field = fields.next();
JsonNode obj = rootNode.get(field);
System.out.println("value " + obj);
if (obj.isInt()) {
System.out.println("Integer");
}
if (obj.isDouble()) {
System.out.println("Double");
}
if (obj.isTextual()) {
System.out.println("String");
}
}
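For the ToDo in the question (parsing numbers, not only strings), one option is to pick the Java type from the primitive's text once you know it is a number. The sketch below assumes the caller passes that flag, for example from Gson's JsonPrimitive.isNumber(), together with the text from getAsString(); toTypedValue is a hypothetical helper, and only the standard library is used.

```java
// Sketch: map the text of a JSON primitive to Integer, Long, Double or String.
// isNumber is assumed to come from the JSON library (e.g. JsonPrimitive.isNumber()).
public class JsonScalars {

    static Object toTypedValue(String raw, boolean isNumber) {
        if (!isNumber) {
            return raw; // JSON strings stay String, even if they look numeric
        }
        try {
            return Integer.valueOf(raw); // e.g. "42"
        } catch (NumberFormatException notAnInt) {
            // too large for int, or not a whole number: try the next type
        }
        try {
            return Long.valueOf(raw); // e.g. "10000000000"
        } catch (NumberFormatException notALong) {
            // has a fraction or exponent: fall back to double
        }
        return Double.valueOf(raw); // e.g. "3.14", "1e3"
    }

    public static void main(String[] args) {
        System.out.println(toTypedValue("42", true).getClass().getSimpleName());   // Integer
        System.out.println(toTypedValue("3.14", true).getClass().getSimpleName()); // Double
        System.out.println(toTypedValue("42", false).getClass().getSimpleName());  // String
    }
}
```

Passing the number flag in, instead of guessing from the text alone, keeps a JSON string such as "42" from being turned into a number. A ValueObject could then store the returned Object and branch on instanceof when writing to the DB.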

Copy fields across objects of different type in gRPC

Suppose I have two proto buffer types:
message MessageType1 {
SomeType1 field1 = 1;
SomeType2 field2 = 2;
SomeType3 field3 = 3;
}
message MessageType2 {
SomeType1 field1 = 1;
SomeType2 field2 = 2;
SomeType4 field4 = 3;
}
Then in Java I would like to be able to use one object as a template to another:
MessageType1 message1 = ...;
MessageType2 message2 = MessageType2.newBuilder()
.usingTemplate(message1) // sets field1 & field2 only
.setField4(someValue)
.build();
instead of
MessageType1 message1 = ...;
MessageType2 message2 = MessageType2.newBuilder()
.setField1(message1.getField1())
.setField2(message1.getField2())
.setField4(someValue)
.build();
Why do I need this? My gRPC service is designed to take incoming data of one type (message1), which is almost identical to another message of a different type (message2) that needs to be sent out. The number of identical fields is huge, and the copy code is mundane. A manual solution also has the disadvantage that a newly added field can easily be missed.
There exists a template method (object.newBuilder(template)) which allows templating an object of the same type, but what about templating between different types?
I could, of course, write a small reflection utility which inspects all members (methods?) and copies the data over manually, but the generated code looks discouraging and ugly for this sort of task.
Is there any good approach to tackle this?

It turned out to be not so complicated. I wrote a small utility that evaluates and matches FieldDescriptors (which the protobuf compiler generates). In my case it is enough to match them by name and type. Full solution here (it uses Guava's Maps and Sets):
/**
* Copies fields from source to dest. Only copies fields if they are set, have matching name and type as their counterparts in dest.
*/
public static void copyCommonFields(@Nonnull GeneratedMessageV3 source, @Nonnull com.google.protobuf.GeneratedMessageV3.Builder<?> destBuilder) {
Map<FieldDescriptorKeyElements, Descriptors.FieldDescriptor> elementsInSource = Maps.uniqueIndex(source.getDescriptorForType().getFields(), FieldDescriptorKeyElements::new);
Map<FieldDescriptorKeyElements, Descriptors.FieldDescriptor> elementsInDest = Maps.uniqueIndex(destBuilder.getDescriptorForType().getFields(), FieldDescriptorKeyElements::new);
// those two above could even be cached if necessary as this is static info
Set<FieldDescriptorKeyElements> elementsInBoth = Sets.intersection(elementsInSource.keySet(), elementsInDest.keySet());
for (Map.Entry<Descriptors.FieldDescriptor, Object> entry : source.getAllFields().entrySet()) {
Descriptors.FieldDescriptor descriptor = entry.getKey();
FieldDescriptorKeyElements keyElements = new FieldDescriptorKeyElements(descriptor);
if (entry.getValue() != null && elementsInBoth.contains(keyElements)) {
destBuilder.setField(elementsInDest.get(keyElements), entry.getValue());
}
}
}
// used for convenient/quick lookups in a Set
private static final class FieldDescriptorKeyElements {
final String fieldName;
final Descriptors.FieldDescriptor.JavaType javaType;
final boolean isRepeated;
private FieldDescriptorKeyElements(Descriptors.FieldDescriptor fieldDescriptor) {
this.fieldName = fieldDescriptor.getName();
this.javaType = fieldDescriptor.getJavaType();
this.isRepeated = fieldDescriptor.isRepeated();
}
@Override
public int hashCode() {
return Objects.hash(fieldName, javaType, isRepeated);
}
@Override
public boolean equals(Object obj) {
if (obj == null || !(obj instanceof FieldDescriptorKeyElements)) {
return false;
}
FieldDescriptorKeyElements other = (FieldDescriptorKeyElements) obj;
return Objects.equals(this.fieldName, other.fieldName) &&
Objects.equals(this.javaType, other.javaType) &&
Objects.equals(this.isRepeated, other.isRepeated);
}
}

Answering your specific question: no, there is no template based way to do this. However, there are some other ways to get the same effect:
If you don't care about performance and the field numbers are the same between the messages, you can serialize the first message to bytes and deserialize them back as the new message. This requires that all the fields in the first message must match the type and id number of those in the second message (though, the second message can have other fields). This is probably not a good idea.
Extract the common fields to another message, and share that message. For example:
proto:
message Common {
SomeType1 field1 = 1;
SomeType2 field2 = 2;
SomeType3 field3 = 3;
}
message MessageType1 {
Common common = 1;
// ...
}
message MessageType2 {
Common common = 1;
// ...
}
Then, you can share the messages in code:
MessageType1 message1 = ...;
MessageType2 message2 = MessageType2.newBuilder()
.setCommon(message1.getCommon())
.build();
This is probably the better solution.
Lastly, as you mentioned, you could resort to reflection. This is probably the most verbose and slowest way, but it would allow you the most control (aside from manually copying over the fields). Not recommended.
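The reflection route can be sketched on plain Java classes. This is a generic illustration, not protobuf-specific: generated protobuf classes carry internal bookkeeping fields, so for messages the FieldDescriptor-based utility in the previous answer is the safer choice. Type1 and Type2 are hypothetical stand-ins for MessageType1 and MessageType2, and only fields declared directly on each class are considered.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

public class CommonFieldCopier {

    // Copies each instance field of source whose name and declared type match a
    // non-final instance field of dest (superclass fields are ignored here).
    public static void copyCommonFields(Object source, Object dest) {
        Map<String, Field> destFields = new HashMap<>();
        for (Field f : dest.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            if (!Modifier.isStatic(mods) && !Modifier.isFinal(mods)) {
                destFields.put(f.getName(), f);
            }
        }
        for (Field sf : source.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(sf.getModifiers())) {
                continue;
            }
            Field df = destFields.get(sf.getName());
            if (df == null || !df.getType().equals(sf.getType())) {
                continue; // no counterpart in dest, or the types differ
            }
            try {
                sf.setAccessible(true);
                df.setAccessible(true);
                df.set(dest, sf.get(source));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot copy field " + sf.getName(), e);
            }
        }
    }

    // Hypothetical types mirroring MessageType1 / MessageType2 from the question
    static class Type1 { String field1; int field2; double field3; }
    static class Type2 { String field1; int field2; long field4; }

    public static void main(String[] args) {
        Type1 t1 = new Type1();
        t1.field1 = "a";
        t1.field2 = 7;
        t1.field3 = 1.5;
        Type2 t2 = new Type2();
        t2.field4 = 99L;
        copyCommonFields(t1, t2);
        System.out.println(t2.field1 + " " + t2.field2 + " " + t2.field4); // prints "a 7 99"
    }
}
```

As the answer says, this is slower and more verbose than sharing a Common message, but it does pick up newly added fields automatically as long as name and type match.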

Deep copying an array of objects

I'm still pretty new to Java, and right now I'm trying to make a copy of Menu. I think I've done part of it: I created a new Menu object with new MenuItems in it. MenuItem is another class with two String variables and a double variable: the item name, the item description, and the item price. I'm trying to copy the contents (the three variables) of each original MenuItem into its copy, but I don't know how. I got stuck trying to set the copy's name to the original's name.
public class Menu
{
Menu()
{
}
final int maxItems = 50;
MenuItem[] food = new MenuItem[maxItems + 1];
public Object clone()
{
Menu menuClone = new Menu();
MenuItem[] foodClone = new MenuItem[maxItems + 1];
for(int i = 1; i <= maxItems + 1; i++)
{
foodClone[i] = new MenuItem();
foodClone[i] = food[i].setItemName();
}
}
This is the MenuItem class:
public class MenuItem
{
private String name;
private String descrip;
private double price;
MenuItem()
{
}
public String getItemName()
{
return name;
}
public String getItemDescrip()
{
return descrip;
}
public double getPrice()
{
return price;
}
public void setItemName(String itemName)
{
name = itemName;
}
public void setItemDescrip(String itemDescrip)
{
descrip = itemDescrip;
}
public void setPrice(double itemPrice) throws IllegalArgumentException
{
if(itemPrice >= 0.0)
price = itemPrice;
else
throw new IllegalArgumentException("Enter only positive values");
}
public String toString(){
return "Name: " + name + ", Desc: " + descrip;
}
}

You are almost there. Where you have:
foodClone[i] = food[i].setItemName();
you probably want the following (and the same for the other variables of MenuItem):
foodClone[i].setItemName(food[i].getItemName());
However, it's better to use the clone method or a copy constructor (arguably, a copy constructor is best).
I prefer using a copy constructor; an example would be:
MenuItem(MenuItem menuItemToClone)
{
this.name = menuItemToClone.name;
this.descrip = menuItemToClone.descrip;
this.price = menuItemToClone.price;
}
Then you would just do:
foodClone[i] = new MenuItem(food[i]);
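Putting the copy constructor together with fixes to the loop in the question (indices from 0 to length - 1, and skipping empty slots so a partly filled menu does not throw a NullPointerException) gives roughly the following. It is a sketch: the classes are trimmed versions of the question's Menu and MenuItem, and copy() is a hypothetical method name chosen to sidestep the Cloneable contract.

```java
// Sketch: deep copy of a Menu via a MenuItem copy constructor.
public class Menu {

    static class MenuItem {
        private String name;
        private String descrip;
        private double price;

        MenuItem(String name, String descrip, double price) {
            this.name = name;
            this.descrip = descrip;
            this.price = price;
        }

        // Copy constructor: Strings are immutable, so sharing them is safe
        MenuItem(MenuItem other) {
            this(other.name, other.descrip, other.price);
        }

        String getItemName() { return name; }

        void setItemName(String itemName) { name = itemName; }
    }

    final int maxItems = 50;
    MenuItem[] food = new MenuItem[maxItems + 1];

    // Returns a new Menu whose array holds new MenuItem objects
    public Menu copy() {
        Menu menuClone = new Menu();
        for (int i = 0; i < food.length; i++) {   // 0 .. length - 1
            if (food[i] != null) {                // leave empty slots empty
                menuClone.food[i] = new MenuItem(food[i]);
            }
        }
        return menuClone;
    }

    public static void main(String[] args) {
        Menu menu = new Menu();
        menu.food[0] = new MenuItem("Soup", "Tomato soup", 4.5);
        Menu copy = menu.copy();
        copy.food[0].setItemName("Salad");
        System.out.println(menu.food[0].getItemName()); // prints "Soup"
    }
}
```

Changing the copy leaves the original untouched because each slot in the new array points to a new MenuItem.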

The default clone() only provides a shallow copy, despite some of the previous recommendations.
A common solution to the deep copy problem is to use Java Object Serialization (JOS). The idea is simple: write the object to a byte array using JOS's ObjectOutputStream, and then use ObjectInputStream to reconstitute a copy of the object. The result will be a completely distinct object, with completely distinct referenced objects. JOS takes care of all of the details: superclass fields, following object graphs, and handling repeated references to the same object within the graph. The code below (Figure 3 in the source article) is a first draft of a utility class that uses JOS for making deep copies.
import java.io.IOException;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
/**
* Utility for making deep copies (vs. clone()'s shallow copies) of
* objects. Objects are first serialized and then deserialized. Error
* checking is fairly minimal in this implementation. If an object is
* encountered that cannot be serialized (or that references an object
* that cannot be serialized) an error is printed to System.err and
* null is returned. Depending on your specific application, it might
* make more sense to have copy(...) re-throw the exception.
*
* A later version of this class includes some minor optimizations.
*/
public class UnoptimizedDeepCopy {
/**
* Returns a copy of the object, or null if the object cannot
* be serialized.
*/
public static Object copy(Object orig) {
Object obj = null;
try {
// Write the object out to a byte array
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(orig);
out.flush();
out.close();
// Make an input stream from the byte array and read
// a copy of the object back in.
ObjectInputStream in = new ObjectInputStream(
new ByteArrayInputStream(bos.toByteArray()));
obj = in.readObject();
}
catch(IOException e) {
e.printStackTrace();
}
catch(ClassNotFoundException cnfe) {
cnfe.printStackTrace();
}
return obj;
}
}
Unfortunately, this approach has some problems:
It will only work when the object being copied, as well as all of the other objects referenced directly or indirectly by the object, are serializable. (In other words, they must implement java.io.Serializable.) Fortunately, it is often sufficient to simply declare that a given class implements java.io.Serializable and let Java's default serialization mechanisms do their thing.
Java Object Serialization is slow, and using it to make a deep copy requires both serializing and deserializing. There are ways to speed it up (e.g., by pre-computing serial version ids and defining custom readObject() and writeObject() methods), but this will usually be the primary bottleneck.
The byte array stream implementations included in the java.io package are designed to be general enough to perform reasonably well for data of different sizes and to be safe to use in a multi-threaded environment. These characteristics, however, slow down ByteArrayOutputStream and (to a lesser extent) ByteArrayInputStream.
Source: http://javatechniques.com/blog/faster-deep-copies-of-java-objects/
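The Javadoc in the quoted class notes that re-throwing may make more sense than printing the error and returning null. A variant along those lines is sketched below, assuming Java 7+ for try-with-resources and multi-catch. DeepCopy is a hypothetical class name; the Serializable bound only checks the top-level argument at compile time, while non-serializable nested references still surface at runtime (as a wrapped NotSerializableException).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public final class DeepCopy {

    private DeepCopy() {
    }

    // Serializes orig to a byte array and reads it back as a new object graph.
    // Failures are re-thrown instead of being printed and turned into null.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T copy(T orig) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(orig);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Deep copy failed", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("a"));
        ArrayList<StringBuilder> copy = copy(original);
        copy.get(0).append("b"); // mutates only the copied StringBuilder
        System.out.println(original.get(0) + " " + copy.get(0)); // prints "a ab"
    }
}
```

The performance caveats listed above still apply; this only changes how errors are reported.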

Get an existing object from a Map with a key

For an application written in Java (in Eclipse), I have created a Map where I store objects of a custom class.
This custom class is called Music and has this constructor:
public Music (String title, String autor, int code){
this.setTitle(title);
this.setAutor(autor);
this.setCode(code);
}
This class has 3 child classes: Vinyl, CD and cassette that extend it. Here is the CD class:
public CD(String title, String autor, String type, int code) {
super(title, autor, code);
this.setType(type);
}
Then, in other class called ManageMusic I have created some methods and the Map:
private final Map<Integer, Music> musicMap;
public ManageMusic() {
musicMap = new HashMap<Integer, Music>();
}
To add an object to the Map, I have a method that, in this example with the CD, basically does:
musicItem = new CD(title, autor, format, newCode);
musicMap.put(newCode, musicItem);
In all these cases, the code is a number that identifies a specific object when I put it into the Map, delete it, or get it from the Map.
Now, my question is: when I want to get an object from the Map and turn it into a String, I am doing this:
String object = musicMap.get(code).toString();
This way I should be getting the object from the Map and converting it to a String.
How can I manage the case when the code passed doesn't exist in the Map?
How could I catch an exception or do something to tell the user that there is no element inside the Map with that code?

You can use the ternary operator ?:
String object = musicMap.get(code) != null ? musicMap.get(code).toString() : "No item found.";
Edit (thanks to @user270349):
An even better approach:
Music m = musicMap.get(code);
String object = (m != null) ? m.toString() : "No item found.";

You can check whether the return value of get is null:
Music object = musicMap.get(code);
if (object == null) {
// do nothing
} else {
String str = object.toString();
}
You could also use containsKey() method :
if (musicMap.containsKey(code)) {
// your code
}

I am not sure if I understood, but you can always do:
Music music = musicMap.get(code);
if (music != null) {
    String object = music.toString();
}

You can use the containsKey method:
String str;
if (musicMap.containsKey(code)) {
    str = musicMap.get(code).toString();
} else {
// do something
// str = "some string";
}

I would suggest throwing an exception when there is no element in the map corresponding to the key.
This exception can be caught somewhere in your application (depending on how exceptions are handled there). This kind of implementation makes it easy to display different types of error or warning messages to the user.
Music object = musicMap.get(code);
if (object != null) {
// do something
} else {
throw new NoCDFoundException("no.item.found");
}
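The exception approach can be fleshed out as below. This is a sketch: NoCDFoundException is the answer's exception name, defined here (as an assumption) as a checked exception, and Music is a minimal stand-in for the question's class; ManageMusicSketch, add and describe are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: lookup that throws a custom checked exception when the code is absent.
public class ManageMusicSketch {

    // Hypothetical checked exception, named after the one used in the answer
    static class NoCDFoundException extends Exception {
        NoCDFoundException(String message) {
            super(message);
        }
    }

    // Minimal stand-in for the question's Music class
    static class Music {
        private final String title;

        Music(String title) {
            this.title = title;
        }

        @Override
        public String toString() {
            return "Title: " + title;
        }
    }

    private final Map<Integer, Music> musicMap = new HashMap<>();

    void add(int code, Music music) {
        musicMap.put(code, music);
    }

    // Returns the item's description, or throws if no item has this code
    String describe(int code) throws NoCDFoundException {
        Music music = musicMap.get(code);
        if (music == null) {
            throw new NoCDFoundException("No item found for code " + code);
        }
        return music.toString();
    }

    public static void main(String[] args) {
        ManageMusicSketch manager = new ManageMusicSketch();
        manager.add(1, new Music("Kind of Blue"));
        try {
            System.out.println(manager.describe(1)); // prints "Title: Kind of Blue"
            manager.describe(2);
        } catch (NoCDFoundException e) {
            // A UI layer would show this message to the user
            System.out.println(e.getMessage()); // prints "No item found for code 2"
        }
    }
}
```

Making the exception checked forces every caller to decide what to tell the user; an unchecked exception would be the lighter-weight choice if a missing code indicates a programming error.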
