Using JDBCInputFormat with variable tuple size (apache-flink) - java

I want to write a generic Flink job in Java that can take any SQL SELECT query, run it against a SQL database, and write the results into an Elasticsearch index.
One of the problems I have to solve is creating a DataSource for a JDBC connection. I want to use the JDBCInputFormat. I followed the example in the data sources documentation.
The problem is that the DataSource's generic type must be specified, and I can only use a Tuple type, because the JDBCInputFormat's generic type OUT extends Tuple. But I do not know at compile time which Tuple I will use.
Do I interpret something wrong?
Is there another jdbc InputFormat I can use?
Is there a way to specify Tuple as a generic type?
I use Java 7 and apache-flink 0.10.2.
I tried to use Tuple25 with only Strings in it, but I get an exception.
Here follow the code and then the exception:
DataSource<StringsTuple25> database = flink.createInput(
        JDBCInputFormat.buildJDBCInputFormat()//
                .setDrivername(getDatabaseDriverName())//
                .setDBUrl(getDatabaseUrl())//
                .setUsername(getDatabaseUsername())//
                .setPassword(getDatabasePassword())//
                .setQuery(getQuery())//
                .finish(),
        StringsTuple25.typeInformation()
);
My StringsTuple25 class:
public class StringsTuple25 extends
        Tuple25<String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String> {

    private static final long serialVersionUID = 1L;

    public static TypeInformation<?> typeInformation() {
        TypeInformation<String>[] types = new TypeInformation[25];
        Arrays.fill(types, STRING_TYPE_INFO);
        return new TupleTypeInfo<>(Tuple25.class, types);
    }
}
And I get this exception:
Caused by: java.io.IOException: Tuple size does not match columncount
at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.extractTypes(JDBCInputFormat.java:180)
at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:162)
at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:51)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:169)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)

As the error indicates, the number of attributes of the Tuple type you use must match the number of selected columns in your SQL query. Furthermore, the data type of each attribute must match.
For example, if you SELECT id, name FROM ... where id is INTEGER and name is VARCHAR, you would use DataSource<Tuple2<Integer, String>> (or define your own class MyResultType extends Tuple2<Integer, String> and use DataSource<MyResultType>) and provide a corresponding TypeInformation.
You can also go with the generic Tuple type. Your source would be DataSource<Tuple> (without specifying the number or types of the attributes). However, for the TypeInformation you still need to know the number of attributes:
Tuple t = Tuple.getTupleClass(numberOfAttributes).newInstance();
for (int i = 0; i < numberOfAttributes; i++) {
    t.setField("", i);
}
TypeInformation<Tuple> typeInfo = TypeExtractor.getForObject(t);
Thus, you need to infer the number of selected attributes from the given arguments that define your SQL query.
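One way to infer that count is to parse the column list of the query itself. A naive sketch (the ColumnCounter helper is hypothetical; it assumes a plain "SELECT a, b, c FROM ..." query with no commas inside function calls or subqueries):

```java
// Hypothetical helper: naively counts the columns of a simple SELECT query.
// Assumes a plain column list (no function calls or subqueries with commas).
public class ColumnCounter {
    public static int countSelectedColumns(String query) {
        String upper = query.toUpperCase();
        int select = upper.indexOf("SELECT");
        int from = upper.indexOf("FROM");
        if (select < 0 || from < 0 || from < select) {
            throw new IllegalArgumentException("Not a simple SELECT query: " + query);
        }
        // The text between SELECT and FROM is the comma-separated column list.
        String columnList = query.substring(select + "SELECT".length(), from);
        return columnList.split(",").length;
    }

    public static void main(String[] args) {
        System.out.println(countSelectedColumns("SELECT id, name FROM users")); // 2
    }
}
```

The result can then be passed to Tuple.getTupleClass(...) as shown above; for anything beyond trivial queries, a real SQL parser would be more robust.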

Related

Get Class Name for the given String

I have a use case where I have stored a list of Java data type names in a DB, like Byte, Character, Integer, Long, BigDecimal, BigInteger, Boolean.
My use case is: if I read the value Long, I need to obtain Long.class; if I read String, then String.class.
With Class<?> cls = Class.forName("java.lang.Long"); I can then use cls for my own purposes.
I can achieve this by having an enum of the above data types; as soon as I read the value from the DB, I pass it to the enum to get the class type. But I don't know whether that is efficient or not. Is there any method in Java which, for a given string (without the fully qualified name), returns the class type?
Storing a reference to the Class object is efficient but using the Class object for reflection can be expensive. If you're just using the Class for reference then you're fine.
enum Decodable {
    BIG_INTEGER(BigInteger.class),
    INTEGER(Integer.class);
    // etc.

    private final Class<?> decodableClass;

    private Decodable(Class<?> decodableClass) {
        this.decodableClass = decodableClass;
    }
}
You could also just maintain a Set of Class objects.
private static final Set<Class<?>> DECODABLE_CLASSES = ImmutableSet.of(Integer.class, BigInteger.class); //etc
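If you need lookups by the stored name, a Map from simple name to Class object may be even more direct than the enum. A sketch (the TypeRegistry name is illustrative, not from the original post):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Sketch: a lookup table from the simple type name stored in the DB to the
// corresponding Class object. Avoids Class.forName, which would require the
// fully qualified name.
public class TypeRegistry {
    private static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("Byte", Byte.class);
        TYPES.put("Character", Character.class);
        TYPES.put("Integer", Integer.class);
        TYPES.put("Long", Long.class);
        TYPES.put("BigDecimal", BigDecimal.class);
        TYPES.put("BigInteger", BigInteger.class);
        TYPES.put("Boolean", Boolean.class);
        TYPES.put("String", String.class);
    }

    public static Class<?> forSimpleName(String name) {
        Class<?> cls = TYPES.get(name);
        if (cls == null) {
            throw new IllegalArgumentException("Unknown type: " + name);
        }
        return cls;
    }

    public static void main(String[] args) {
        System.out.println(forSimpleName("Long").getName()); // java.lang.Long
    }
}
```

A plain HashMap lookup like this is O(1) and as efficient as the enum approach; which one you choose is mostly a matter of style.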

Flink: Declaring dynamic tuple size & type

Is there any way to declare the various types in the tuple dynamically?
I found a way to declare the number of columns in the Tuple dynamically:
env.readCsvFile(filePath).tupleType(Tuple.getTupleClass(3))
But without any type parameters, it throws an error:
Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to be parameterized by using generics.
I wanted to use all the elements in the Tuple as simple String. The following works:
env.readCsvFile(filePath).types(String.class, String.class);
This results in a Tuple2<String, String> type. But in my case, I don't know how many columns of data there are in the CSV. I'm fine reading all the columns as Strings, though. (I understand that there's a limit of at most 25 columns.)
I even tried reading by specifying the sub-type of CsvInputFormat:
env.readFile(new TupleCsvInputFormat(filePath,TypeInformation.of(String.class), filePath);
But couldn't get it to compile. Wasn't sure how to use this for my case. I was also unsure on how to extend the Tuple class to achieve the same (if possible). TypeHint seems to require me to know the number of columns before-hand.
I'm not sure about the other env.read...() methods. I tried a few, but some methods like ignoreFirstLine() were not available; they only come with the CsvReader.
So, can someone kindly help me figure out the best approach to read a csv if the number of columns can be arbitrary (passed by input), and to read each element of the Tuple as a simple String?
It is possible to write your own method to read CSV files. Maybe something like this:
public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    int n = 3; // number of columns here
    Class[] types = IntStream.range(0, n).mapToObj(i -> String.class).toArray(Class[]::new);
    DataSet<Tuple> csv = readCsv(env, "filename.csv", types);
    csv.print();
}

private static DataSource<Tuple> readCsv(ExecutionEnvironment env, String filename, Class[] fieldTypes) {
    TupleTypeInfo<Tuple> typeInfo = TupleTypeInfo.getBasicAndBasicValueTupleTypeInfo(fieldTypes);
    TupleCsvInputFormat<Tuple> inputFormat = new TupleCsvInputFormat<>(new Path(filename), typeInfo);
    return new DataSource<>(env, inputFormat, typeInfo, Utils.getCallLocationName());
}
Note: this method skips the configureInputFormat call that the CsvReader class performs. If you need that configuration, you can add it yourself.

Error converting Optional<String> to Integer from TextInputDialog

In this example I have tempSocket1 and tempSocket2, but I really just want one of them; I included both to show that I tried both methods. I keep getting the error "The method valueOf(String) in the type Integer is not applicable for the arguments (Optional<String>)." I thought both of these methods were the standard ways to convert a string to an integer, but I'm not sure how the Optional part changes things.
private void showTextInputDialog() {
    TextInputDialog changePort = new TextInputDialog("Settings");
    changePort.setHeaderText("Change Port");
    changePort.setContentText("Please enter port number to be used for establishing connection...");
    Optional<String> result = changePort.showAndWait();
    result.ifPresent(e -> {
        Integer tempSocket1 = Integer.valueOf(result); // does not compile
        Integer tempSocket2 = Integer.parseInt(result); // does not compile
    });
}
To convert an Optional<String> to an Integer, you need to invoke the get() method before the conversion:
Optional<String> cadena = Optional.of("333");
Integer num = Integer.valueOf(cadena.get());
You see, Integer.valueOf and Integer.parseInt methods need an argument of type String, but you are passing an Optional<String>. So that's why the error occurred. Optional string and string are not the same.
Just think about this, if Optional<String> were the same as String, would ArrayList<String> be the same as String? Would LinkedList<String> be the same as String? What about HashMap<String, Integer>? Would it be both a String and an Integer?
The chaos that treating generic types the same as their generic type arguments would bring is destructive! Imagine calling charAt on an optional string! Without the implementation, no one knows what will happen...
So yeah, never think that generic types are the same types as the generic type parameters.
Just to extend other answers it may looks better using map method, and even more with lambda and method reference:
Optional<String> result = changePort.showAndWait();
Integer tempSocket = result.map(Integer::valueOf).orElse(8080);
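For instance, packaged as a small helper (the toPort name and the 8080 default are illustrative, not from the question):

```java
import java.util.Optional;

public class OptionalPortDemo {
    // Maps the dialog result to an Integer, falling back to a default
    // when the user cancelled the dialog (empty Optional).
    static Integer toPort(Optional<String> result, int fallback) {
        return result.map(Integer::valueOf).orElse(fallback);
    }

    public static void main(String[] args) {
        System.out.println(toPort(Optional.of("9090"), 8080));       // 9090
        System.out.println(toPort(Optional.<String>empty(), 8080));  // 8080
    }
}
```

This avoids calling get() on an empty Optional, which would throw NoSuchElementException.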
You're trying to pass an Optional<String> instead of a plain String. You need to fetch the string first with .get() before converting your result to an integer, or use result.ifPresent(e -> ...), which unwraps the optional value so you can convert it to an Integer:
Optional<String> result = changePort.showAndWait();
result.ifPresent(e -> {
    Integer tempSocket1 = Integer.valueOf(e);
    Integer tempSocket2 = Integer.parseInt(e);
});

Casting from string to dynamic types?

I am trying to create objects dynamically from a CSV file. I have gotten to this part:
Assuming I have this class:
package com.example;

public class TestClass {
    public int hi = 0;
    public String ho = "ho";
}
I also have this csv:
hi,ho
1,"hello"
In the code below, classIdentifier is a string holding the class name (i.e. "com.example.TestClass"), and maps is a HashMap that stores the data read from the CSV file.
Class<T> c = (Class<T>) Class.forName(classIdentifier);
Set<String> keys = maps.keySet();
for (String key : keys) {
    System.out.println(c.getDeclaredField(key).getType());
    System.out.println(maps.get(key));
    c.getDeclaredField(key).set(classIdentifier, c.getDeclaredField(key).getType().cast(maps.get(key)));
}
The above code prints the expected values:
int
1
However this will throw the following error in the cast part:
java.lang.ClassCastException: Cannot cast java.lang.String to int
Although in the given example, hi is int and ho is string, I do want to extend the usage of this code to be able to convert with any type. That means I have no prior knowledge of the fields' type until I use the c.getDeclaredField(key).getType() (That's what I mean by dynamic).
I am just wondering how can I fix it?
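Class.cast cannot convert a String into an int; it only checks that the object already is of the target type. One common approach is to dispatch on the declared field type and parse the string accordingly. A hedged sketch (the convert helper is hypothetical, not from the question; extend the branches for further types):

```java
import java.lang.reflect.Field;

// Sketch: convert the raw CSV string into the field's declared type by
// dispatching on that type, then assign it via reflection.
public class CsvBinder {
    public static Object convert(Class<?> target, String raw) {
        if (target == int.class || target == Integer.class) return Integer.valueOf(raw);
        if (target == long.class || target == Long.class) return Long.valueOf(raw);
        if (target == double.class || target == Double.class) return Double.valueOf(raw);
        if (target == boolean.class || target == Boolean.class) return Boolean.valueOf(raw);
        if (target == String.class) return raw;
        throw new IllegalArgumentException("Unsupported type: " + target);
    }

    // Mirrors the TestClass from the question.
    public static class TestClass {
        public int hi = 0;
        public String ho = "ho";
    }

    public static void main(String[] args) throws Exception {
        TestClass instance = new TestClass();
        Field hi = TestClass.class.getDeclaredField("hi");
        hi.set(instance, convert(hi.getType(), "1"));   // Integer is unboxed into the int field
        Field ho = TestClass.class.getDeclaredField("ho");
        ho.set(instance, convert(ho.getType(), "hello"));
        System.out.println(instance.hi + " " + instance.ho); // 1 hello
    }
}
```

Note that Field.set needs an instance of the target class as its first argument (e.g. created via c.newInstance()), not the class-name string.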

Define generic/variable type for data structure

I have a data structure (a LinkedHashMap), but the problem is that the (second) value type should be variable, since I may put type String, type int, or any other primitive type there. So my question is:
Is there a way to define a value type for it that can hold any type?
This is what I'm having:
private LinkedHashMap<String, String> keyVal;
I want something like this:
private LinkedHashMap<String, anyValue> keyVal;
private LinkedHashMap<String, Object> keyVal;
You can use Object for that. But remember that when trying to get data back out of this map later, you may have difficulty casting the Object to your required data type, as you may not know which data type is actually present.
Hence, it's advisable to avoid such implementations.
You cannot have a generic type be a primitive type. If you want to be able to store anything in your map, you can have the "value" generic type for the map be Object:
private LinkedHashMap<String, Object> keyVal;
You can still store what look like primitive types thanks to autoboxing, i.e.
keyVal.put("one", 1);
will place an Integer, even though you specified an int.
No, the closest you can have is Object as a second argument.
Now, I would advise to rethink what you need to accomplish, since this is actually going against what generics were created for.
If you have a bound type and want to maintain some flexibility, then you could use something like <String, ? extends SomeType>.
Mixing several types of Objects in the same data-structure is not advisable in Java (if this is good or bad, is beside the point), but type safety goes a long way in preventing weird errors along the line.
Try to think about how you would deal with this when you actually need to retrieve the objects... will you assume they're Strings? What are you going to do with them?
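To illustrate the retrieval problem raised above, a minimal sketch (class and method names are illustrative): every read from a Map<String, Object> needs an instanceof check and a cast, which is exactly the type safety given up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MixedMapDemo {
    // Each retrieved value must be inspected and cast back by hand.
    static String describe(Object value) {
        if (value instanceof Integer) {
            int n = (Integer) value; // explicit unboxing cast
            return "int: " + n;
        } else if (value instanceof String) {
            return "string: " + value;
        }
        return "unknown: " + value;
    }

    public static void main(String[] args) {
        Map<String, Object> keyVal = new LinkedHashMap<>();
        keyVal.put("port", 8080);        // autoboxed to Integer
        keyVal.put("host", "localhost");
        System.out.println(describe(keyVal.get("port")));  // int: 8080
        System.out.println(describe(keyVal.get("host")));  // string: localhost
    }
}
```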
You say you want to have a Map<String, PrimitiveType>.
As specified by the JLS, primitive types are NumericType or boolean, and a NumericType is either an IntegralType or a FloatingPointType.
If your need is not all primitives but only NumericType, you may use java.lang.Number:
Map<String, Number>
Another way is to define a class Any which hold all the possible attributes:
enum Type {
    NULL,
    INTEGER,
    SHORT,
    FLOAT,
    ...
}

class Any {
    private int iValue;
    private short sValue;
    private float fValue;
    ...
    private Type active = Type.NULL;

    public void setInt(int value) {
        iValue = value;
        active = Type.INTEGER;
    }

    public void setFloat(float value) {
        fValue = value;
        active = Type.FLOAT;
    }
    ...

    public int getInt() {
        if (active != Type.INTEGER) {
            throw new ClassCastException(active.name() + " is not an integer");
        }
        return iValue;
    }
    ...
}
It's up to you to add checks and throw an exception if getInt() is called on a float holder. Everything is possible, even C-style transtyping, for example.
EDIT
You want String too, and String isn't a primitive.
You have to add the following below private short sValue; in the Any class (with a distinct name, since sValue is already taken by the short field):
private String strValue;
and the following below SHORT, in the Type enum:
STRING,
But, as others said, the best way is to avoid such weak catch-all ("fourre-tout" in French) types.
You can use
private LinkedHashMap<String, Object> keyVal;
to leave the second type argument as general as possible.
It allows you to store any object as a value, because every class extends Object.
This leads you to the problem that you don't know what kinds of things are inside your map; you only know that they are of type Object, which tells you nothing.
So, to use these objects again, you have to cast them back to their original type, which may cause a runtime exception: ClassCastException.
Generics are about defining data structures for different types with the same code, but if you want to use a generic class, you have to parameterize it with its type arguments. This ensures that the types are checked at compile time, which is the great advantage of generics (it avoids ClassCastException).
However, you can still specify a more general type that allows multiple types.
For example, if you define it the following way, you can hold any object that implements Serializable:
private LinkedHashMap<String, ? extends Serializable> keyVal;
As you can see, this allows you to restrict the permitted value types to a common property (i.e., being a subtype of a more general type). Note, however, that with a wildcard type you cannot put new values into the map; declare it as LinkedHashMap<String, Serializable> if you need to insert. Either way, you use the map's values as objects of the more general type, because that's everything you know (and want to know) about the objects.
It's better to have a look at the Generics lesson on Oracle.com.
Take care about when you should use wildcards (?) and when you should use generics.
Using Object, as in LinkedHashMap<String, Object> keyVal;, is not recommended.
As some people said, you can use Object as a generic variable type, especially when using a generic method or when you don't know what data type the user will provide, like in this simple example:
import java.util.Scanner;

public class GenericMethod {
    public static void main(String[] args) {
        System.out.println("Type something that's yours: ");
        Scanner sc = new Scanner(System.in);
        Object thing;
        thing = sc.next();
        isMine(thing);
    }

    // Generic method
    public static <T> void isMine(T x) {
        System.out.println(x + " is mine.");
    }
}
