I want to create a custom Spark Transformer in Java.
The Transformer is a text preprocessor that acts like a Tokenizer. It takes an input column and an output column as parameters.
I looked around and I found two Scala traits, HasInputCol and HasOutputCol.
How can I create a class that extends Transformer and implements HasInputCol and HasOutputCol?
My goal is to have something like this:
// Dataset that has a String column named "text"
Dataset<Row> dataset;

CustomTransformer customTransformer = new CustomTransformer();
customTransformer.setInputCol("text");
customTransformer.setOutputCol("result");

// result has two String columns, "text" and "result"
Dataset<Row> result = customTransformer.transform(dataset);
As SergGr suggested, you can extend UnaryTransformer. However, it is quite tricky.
NOTE: All the below comments apply to Spark version 2.2.0.
To address the issue described in SPARK-12606, where they were getting "...Param null__inputCol does not belong to...", you should implement String uid() like this:
@Override
public String uid() {
    return getUid();
}

private String getUid() {
    if (uid == null) {
        uid = Identifiable$.MODULE$.randomUID("mycustom");
    }
    return uid;
}
Apparently they were initializing uid in the constructor. But the thing is that UnaryTransformer's inputCol (and outputCol) is initialized before uid is initialized in the inheriting class. See HasInputCol:
final val inputCol: Param[String] = new Param[String](this, "inputCol", "input column name")
This is how Param is constructed:
def this(parent: Identifiable, name: String, doc: String) = this(parent.uid, name, doc)
Thus, when parent.uid is evaluated, the custom uid() implementation is called and at this point uid is still null. By implementing uid() with lazy evaluation you make sure uid() never returns null.
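The lazy-initialization idea can be shown in plain Java, detached from Spark (the class and uid prefix below are made up for illustration): even if uid() is invoked before the subclass's field initializers have run, the value is created on first access and then cached, so it is never null and never changes.

```java
import java.util.UUID;

// Hypothetical stand-in for the transformer: uid is created lazily on first
// access, so a call arriving before field initializers run still gets a
// non-null value, and every later call sees the same cached value.
public class LazyUid {
    private String uid;

    public String uid() {
        if (uid == null) {
            uid = "mycustom_" + UUID.randomUUID().toString();
        }
        return uid;
    }

    public static void main(String[] args) {
        LazyUid t = new LazyUid();
        System.out.println(t.uid().equals(t.uid())); // prints "true"
    }
}
```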
In your case though:
Param d7ac3108-799c-4aed-a093-c85d12833a4e__inputCol does not belong to fe3d99ba-e4eb-4e95-9412-f84188d936e3
it seems to be a bit different. Because "d7ac3108-799c-4aed-a093-c85d12833a4e" != "fe3d99ba-e4eb-4e95-9412-f84188d936e3", it looks like your implementation of the uid() method returns a new value on each call. Perhaps in your case it was implemented like this:
@Override
public String uid() {
    return Identifiable$.MODULE$.randomUID("mycustom");
}
By the way, when extending UnaryTransformer, make sure the transform function is Serializable.
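To see what Serializable buys you here, a plain-Java sketch (no Spark involved; SerFunction, TOKENIZE, and roundTrip are made-up names) that round-trips the function through Java serialization, the same mechanism Spark uses to ship the function to executors:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableFunctionDemo {
    // Both Function and Serializable, like the AbstractFunction1 subclass above.
    interface SerFunction<A, B> extends Function<A, B>, Serializable {}

    // A lambda whose target type extends Serializable is itself serializable.
    static final SerFunction<String, String[]> TOKENIZE =
        s -> s.toLowerCase().split("\\s");

    // Serialize and deserialize the object, mimicking shipping it to an executor.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);
            oos.flush();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SerFunction<String, String[]> f = roundTrip(TOKENIZE);
        System.out.println(String.join(",", f.apply("Hello World"))); // prints "hello,world"
    }
}
```

A non-serializable function compiles fine but fails at runtime with a NotSerializableException when Spark tries to ship it, which is why this is worth checking early.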
You probably want to inherit your CustomTransformer from org.apache.spark.ml.UnaryTransformer. You may try something like this:
import org.apache.spark.ml.UnaryTransformer;
import org.apache.spark.ml.util.Identifiable$;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import scala.Function1;
import scala.collection.JavaConversions$;
import scala.collection.immutable.Seq;
import java.util.Arrays;
public class MyCustomTransformer extends UnaryTransformer<String, scala.collection.immutable.Seq<String>, MyCustomTransformer> {

    private final String uid = Identifiable$.MODULE$.randomUID("mycustom");

    @Override
    public String uid() {
        return uid;
    }

    @Override
    public Function1<String, scala.collection.immutable.Seq<String>> createTransformFunc() {
        // can't use lambda syntax :(
        return new scala.runtime.AbstractFunction1<String, Seq<String>>() {
            @Override
            public Seq<String> apply(String s) {
                // do the logic
                String[] split = s.toLowerCase().split("\\s");
                // convert to a Scala type
                return JavaConversions$.MODULE$.iterableAsScalaIterable(Arrays.asList(split)).toList();
            }
        };
    }

    @Override
    public void validateInputType(DataType inputType) {
        super.validateInputType(inputType);
        if (inputType != DataTypes.StringType)
            throw new IllegalArgumentException("Input type must be string type but got " + inputType + ".");
    }

    @Override
    public DataType outputDataType() {
        return DataTypes.createArrayType(DataTypes.StringType, true); // or false? depends on your data
    }
}
I'm a bit late to the party, but I have a few examples of custom Java Spark transformers here: https://github.com/dafrenchyman/spark/tree/master/src/main/java/com/mrsharky/spark/ml/feature
Here's an example with just an input column, but you can easily add an output column following the same patterns. This doesn't implement the readers and writers though. You'll need to check the link above to see how to do that.
public class DropColumns extends Transformer implements Serializable, DefaultParamsWritable {

    private StringArrayParam _inputCols;
    private final String _uid;

    public DropColumns(String uid) {
        _uid = uid;
    }

    public DropColumns() {
        _uid = DropColumns.class.getName() + "_" + UUID.randomUUID().toString();
    }

    // Getters
    public String[] getInputCols() { return get(_inputCols).get(); }

    // Setters
    public DropColumns setInputCols(String[] columns) {
        _inputCols = inputCols();
        set(_inputCols, columns);
        return this;
    }

    public DropColumns setInputCols(List<String> columns) {
        String[] columnsString = columns.toArray(new String[columns.size()]);
        return setInputCols(columnsString);
    }

    public DropColumns setInputCols(String column) {
        String[] columns = new String[]{column};
        return setInputCols(columns);
    }

    // Overrides
    @Override
    public Dataset<Row> transform(Dataset<?> data) {
        List<String> dropCol = new ArrayList<String>();
        Dataset<Row> newData = null;
        try {
            for (String currColumn : this.get(_inputCols).get()) {
                dropCol.add(currColumn);
            }
            Seq<String> seqCol = JavaConverters.asScalaIteratorConverter(dropCol.iterator()).asScala().toSeq();
            newData = data.drop(seqCol);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return newData;
    }

    @Override
    public Transformer copy(ParamMap extra) {
        DropColumns copied = new DropColumns();
        copied.setInputCols(this.getInputCols());
        return copied;
    }

    @Override
    public StructType transformSchema(StructType oldSchema) {
        StructField[] fields = oldSchema.fields();
        List<StructField> newFields = new ArrayList<StructField>();
        List<String> columnsToRemove = Arrays.asList(get(_inputCols).get());
        for (StructField currField : fields) {
            String fieldName = currField.name();
            if (!columnsToRemove.contains(fieldName)) {
                newFields.add(currField);
            }
        }
        StructType schema = DataTypes.createStructType(newFields);
        return schema;
    }

    @Override
    public String uid() {
        return _uid;
    }

    @Override
    public MLWriter write() {
        return new DropColumnsWriter(this);
    }

    @Override
    public void save(String path) throws IOException {
        write().save(path);
    }

    public static MLReader<DropColumns> read() {
        return new DropColumnsReader();
    }

    public StringArrayParam inputCols() {
        return new StringArrayParam(this, "inputCols", "Columns to be dropped");
    }

    public DropColumns load(String path) {
        return ((DropColumnsReader) read()).load(path);
    }
}
Even later to the party, I have another update. I had a hard time finding information on extending Spark Transformers in Java, so I am posting my findings here.
I have also been working on custom transformers in Java. At the time of writing, it is a little easier to include save/load functionality. One can create writable parameters by implementing DefaultParamsWritable. Implementing DefaultParamsReadable, however, still results in an exception for me, but there is a simple work-around.
Here is the basic implementation of a column renamer:
public class ColumnRenamer extends Transformer implements DefaultParamsWritable {
    /**
     * A custom Spark transformer that renames the inputCols to the outputCols.
     *
     * We would also like to implement DefaultParamsReadable<ColumnRenamer>, but
     * there appears to be a bug in DefaultParamsReadable when used in Java, see:
     * https://issues.apache.org/jira/browse/SPARK-17048
     **/

    private final String uid_;
    private StringArrayParam inputCols_;
    private StringArrayParam outputCols_;
    private HashMap<String, String> renameMap;

    public ColumnRenamer() {
        this(Identifiable.randomUID("ColumnRenamer"));
    }

    public ColumnRenamer(String uid) {
        this.uid_ = uid;
        init();
    }

    @Override
    public String uid() {
        return uid_;
    }

    @Override
    public Transformer copy(ParamMap extra) {
        return defaultCopy(extra);
    }

    /**
     * The below method is a work-around, see:
     * https://issues.apache.org/jira/browse/SPARK-17048
     **/
    public static MLReader<ColumnRenamer> read() {
        return new DefaultParamsReader<>();
    }

    @Override
    public Dataset<Row> transform(Dataset<?> dataset) {
        Dataset<Row> transformedDataset = dataset.toDF();
        // Check schema.
        transformSchema(transformedDataset.schema(), true); // logging = true
        // Rename columns.
        for (Map.Entry<String, String> entry: renameMap.entrySet()) {
            String inputColName = entry.getKey();
            String outputColName = entry.getValue();
            transformedDataset = transformedDataset
                .withColumnRenamed(inputColName, outputColName);
        }
        return transformedDataset;
    }

    @Override
    public StructType transformSchema(StructType schema) {
        // Validate the parameters here...
        String[] inputCols = getInputCols();
        String[] outputCols = getOutputCols();
        // Create rename mapping.
        renameMap = new HashMap<>();
        for (int i = 0; i < inputCols.length; i++) {
            renameMap.put(inputCols[i], outputCols[i]);
        }
        // Rename columns.
        ArrayList<StructField> fields = new ArrayList<>();
        for (StructField field: schema.fields()) {
            String columnName = field.name();
            if (renameMap.containsKey(columnName)) {
                columnName = renameMap.get(columnName);
            }
            fields.add(new StructField(
                columnName, field.dataType(), field.nullable(), field.metadata()
            ));
        }
        // Return as StructType.
        return new StructType(fields.toArray(new StructField[0]));
    }

    private void init() {
        inputCols_ = new StringArrayParam(this, "inputCols", "input column names");
        outputCols_ = new StringArrayParam(this, "outputCols", "output column names");
    }

    public StringArrayParam inputCols() {
        return inputCols_;
    }

    public ColumnRenamer setInputCols(String[] value) {
        set(inputCols_, value);
        return this;
    }

    public String[] getInputCols() {
        return getOrDefault(inputCols_);
    }

    public StringArrayParam outputCols() {
        return outputCols_;
    }

    public ColumnRenamer setOutputCols(String[] value) {
        set(outputCols_, value);
        return this;
    }

    public String[] getOutputCols() {
        return getOrDefault(outputCols_);
    }
}
I need to get the enum name based on its value. I am given the enum class name and a value, and I need to pick the corresponding enum constant at run time.
I have a class called Information as below.
class Information {
    private String value;
    private String type;
    private String cValue;

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }

    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }

    public String getcValue() {
        return cValue;
    }

    public void setcValue(String cValue) {
        this.cValue = cValue;
    }

    public static void main(String args[]) {
        Information inf = new Information();
        inf.setType("com.abc.SignalsEnum");
        inf.setValue("1");
    }
}
enum SignalEnum {
    RED("1"), GREEN("2"), ORANGE("3");

    private String sign;

    SignalEnum(String pattern) {
        this.sign = pattern;
    }
}

enum MobileEnum {
    SAMSUNG("1"), NOKIA("2"), APPLE("3");

    private String mobile;

    MobileEnum(String mobile) {
        this.mobile = mobile;
    }
}
At run time I will know the enum class name from the type attribute of the Information class, and I am also given the value. I need to figure out the corresponding enum constant in order to set the cValue attribute of the Information class.
Just as an example I have provided two enums, SignalEnum and MobileEnum, but in my actual case it will be one among roughly 100 enum types, so I don't want a chain of checks and type casts. I am looking for a solution that uses reflection to set the cValue.
Here is a simple resolver for any enum class.
Since reflection operations are expensive, it's better to prepare all required data once and then just query for it.
class EnumResolver {
    private Map<String, Enum> map = new ConcurrentHashMap<>();

    public EnumResolver(String className) {
        try {
            Class enumClass = Class.forName(className);
            // look for the backing property field, e.g. "sign" in SignalEnum
            Field accessor = Arrays.stream(enumClass.getDeclaredFields())
                .filter(f -> f.getType().equals(String.class))
                .findFirst()
                .orElseThrow(() -> new NoSuchFieldException("Not found field to access enum backing value"));
            accessor.setAccessible(true);
            // populate the map with pairs like ["1" => SignalEnum.RED, "2" => SignalEnum.GREEN, etc.]
            for (Enum e : getEnumValues(enumClass)) {
                map.put((String) accessor.get(e), e);
            }
            accessor.setAccessible(false);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public Enum resolve(String backingValue) {
        return map.get(backingValue);
    }

    private <E extends Enum> E[] getEnumValues(Class<E> enumClass) throws ReflectiveOperationException {
        Field f = enumClass.getDeclaredField("$VALUES");
        f.setAccessible(true);
        Object o = f.get(null);
        f.setAccessible(false);
        return (E[]) o;
    }
}
And here is a simple JUnit test:
public class EnumResolverTest {

    @Test
    public void testSignalEnum() {
        EnumResolver signalResolver = new EnumResolver("com.abc.SignalEnum");
        assertEquals(SignalEnum.RED, signalResolver.resolve("1"));
        assertEquals(SignalEnum.GREEN, signalResolver.resolve("2"));
        assertEquals(SignalEnum.ORANGE, signalResolver.resolve("3"));
    }

    @Test
    public void testMobileEnum() {
        EnumResolver mobileResolver = new EnumResolver("com.abc.MobileEnum");
        assertEquals(MobileEnum.SAMSUNG, mobileResolver.resolve("1"));
        assertEquals(MobileEnum.NOKIA, mobileResolver.resolve("2"));
        assertEquals(MobileEnum.APPLE, mobileResolver.resolve("3"));
    }
}
And again, for performance's sake, you can instantiate these various resolvers once and put them into a separate map:
Map<String, EnumResolver> resolverMap = new ConcurrentHashMap<>();
resolverMap.put("com.abc.MobileEnum", new EnumResolver("com.abc.MobileEnum"));
resolverMap.put("com.abc.SignalEnum", new EnumResolver("com.abc.SignalEnum"));
// etc
Information inf = new Information();
inf.setType("com.abc.SignalEnum");
inf.setValue("1");

SignalEnum red = (SignalEnum) resolverMap.get(inf.getType()).resolve(inf.getValue());
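As a side note, the `$VALUES` field read above is an undocumented implementation detail of how javac compiles enums; the supported way to obtain the constants reflectively is `Class.getEnumConstants()`. A minimal sketch (the enum and method names here are invented for the example):

```java
// Sketch: the reflective "$VALUES" lookup can be replaced by the supported
// Class.getEnumConstants() API, which returns the same array without any
// setAccessible calls.
public class EnumConstantsDemo {
    enum Signal { RED, GREEN, ORANGE }

    static <E extends Enum<E>> E[] valuesOf(Class<E> enumClass) {
        return enumClass.getEnumConstants();
    }

    public static void main(String[] args) {
        Signal[] values = valuesOf(Signal.class);
        System.out.println(values.length); // prints "3"
        System.out.println(values[0]);     // prints "RED"
    }
}
```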
I am trying to build a POJO for my Firebase Realtime Database.
Am I doing it correctly, according to my Realtime Database? Link below.
detailData, detailContent, detailTitleContent, isDetail and titleContent are named the same everywhere; they just have different text in them.
public class POJO {
    private String titleContent;
    private String detailContent;
    private String detailTitleContent;
    private List<String> detailData = new ArrayList<>();
    private List<String> textInfo = new ArrayList<>();
    private boolean isDetail;
    private boolean isList;

    public POJO() {
    }

    public POJO(String titleContent, String detailContent, String detailTitleContent,
                List<String> detailData, List<String> textInfo,
                boolean isDetail, boolean isList) {
        this.titleContent = titleContent;
        this.detailContent = detailContent;
        this.detailTitleContent = detailTitleContent;
        this.detailData = detailData;
        this.textInfo = textInfo;
        this.isDetail = isDetail;
        this.isList = isList;
    }

    public String getTitleContent() {
        return titleContent;
    }

    public String getDetailContent() {
        return detailContent;
    }

    public String getDetailTitleContent() {
        return detailTitleContent;
    }

    public List<String> getDetailData() {
        return detailData;
    }

    public List<String> getTextInfo() {
        return textInfo;
    }

    public boolean isDetail() {
        return isDetail;
    }

    public boolean isList() {
        return isList;
    }
}
Based on the following response (which you've provided), I'll be creating the POJO classes.
{
"datas": [{
"detailData": [{
"detailContent": "<p>LOTS of information</p>",
"detailTitleContent": "Title"
}, {
"detailContent": "<p>Lots of more information!</p>",
"detailTitleContent": "Second Title"
}],
"isDetail": false,
"titleContent": "Last Title"
}]
}
Therefore, looking at this response, you can see that your first class (let's name it "MyPojo") will have an array of "datas" objects.
public class MyPojo {
    private Datas[] datas;

    public Datas[] getDatas() {
        return datas;
    }

    public void setDatas(Datas[] datas) {
        this.datas = datas;
    }
}
Now we have to make a model object for the "Datas":
public class Datas {
    private String isDetail;
    private String titleContent;
    private DetailData[] detailData;

    public String getIsDetail() {
        return isDetail;
    }

    public void setIsDetail(String isDetail) {
        this.isDetail = isDetail;
    }

    public String getTitleContent() {
        return titleContent;
    }

    public void setTitleContent(String titleContent) {
        this.titleContent = titleContent;
    }

    public DetailData[] getDetailData() {
        return detailData;
    }

    public void setDetailData(DetailData[] detailData) {
        this.detailData = detailData;
    }
}
Last but not least, the "DetailData" model, which is another array:
public class DetailData {
    private String detailTitleContent;
    private String detailContent;

    public String getDetailTitleContent() {
        return detailTitleContent;
    }

    public void setDetailTitleContent(String detailTitleContent) {
        this.detailTitleContent = detailTitleContent;
    }

    public String getDetailContent() {
        return detailContent;
    }

    public void setDetailContent(String detailContent) {
        this.detailContent = detailContent;
    }
}
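To sanity-check the mapping, the three classes can be wired up by hand (condensed here as static nested classes with package-private fields, purely for illustration) and navigated the same way a JSON mapper would populate them:

```java
// Self-contained check that the three-level structure matches the JSON shape:
// MyPojo -> datas[] -> detailData[], with scalar fields at each level.
public class PojoWiringDemo {
    static class DetailData { String detailTitleContent; String detailContent; }
    static class Datas { String isDetail; String titleContent; DetailData[] detailData; }
    static class MyPojo { Datas[] datas; }

    public static void main(String[] args) {
        DetailData first = new DetailData();
        first.detailTitleContent = "Title";
        first.detailContent = "<p>LOTS of information</p>";

        Datas entry = new Datas();
        entry.isDetail = "false";
        entry.titleContent = "Last Title";
        entry.detailData = new DetailData[] { first };

        MyPojo root = new MyPojo();
        root.datas = new Datas[] { entry };

        // Navigating the graph mirrors datas[0].detailData[0].detailTitleContent
        System.out.println(root.datas[0].detailData[0].detailTitleContent); // prints "Title"
    }
}
```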
From here, you should have a complete POJO for your JSON response, ready to be handled. Just want to point two things out for your benefit:
1. I highly recommend reading the following tutorial, Android JSON Parsing Tutorial, and pay close attention to the "The difference between [ and { – (Square brackets and Curly brackets)" section, as you want to gain an in-depth understanding of JSONArray and JSONObject.
2. Use JSONLint to validate your JSON response, as it's sometimes helpful, and also use "Convert XML or JSON to Java Pojo Classes - Online" to generate the POJO classes based on the JSON response (I used it myself in this case). The major benefit is accuracy; it takes less than a minute to copy and implement.
Good luck and let me know if you need further assistance :)
Edit: I was trying to simplify my problem at hand a little, but it turns out that created more confusion instead. Here's the real deal:
I am working with AWS's Java SDK for DynamoDB. Using the DynamoDBMapper class, I am trying to query DynamoDB to retrieve items from a particular table. I have several objects that map to my DynamoDB tables, and I was hoping to have a generic method that could accept the mapped objects, query the table, and return the item result.
Pseudo-code:
@DynamoDBTable(tableName="testTable")
public class DBObject {
    private String hashKey;
    private String attribute1;

    @DynamoDBHashKey(attributeName="hashKey")
    public String getHashKey() { return this.hashKey; }
    public void setHashKey(String hashKey) { this.hashKey = hashKey; }

    @DynamoDBAttribute(attributeName="attribute1")
    public String getAttribute1() { return this.attribute1; }
    public void setAttribute1(String attribute1) { this.attribute1 = attribute1; }
}

public class DatabaseRetrieval {
    public DatabaseRetrieval() {
        DBObject dbObject = new DBObject();
        dbObject.setHashKey("12345");
        DBRetrievalAgent agent = new DBRetrievalAgent();
        dbObject = agent.retrieveDBObject(dbObject.getClass(), dbObject);
    }
}

public class DBRetrievalAgent {
    public Object retrieveDBObject(Class<?> classType, Object dbObject) {
        DynamoDBQueryExpression<classType> temp = new DynamoDBQueryExpression<classType>().withHashKeyValues(dbObject);
        return this.dynamoDBMapper.query(classType, temp);
    }
}
Use a type witness within your method:
public <T> String getResult(Class<T> type) {
    List<T> newList = new ArrayList<>();
    // other code
}
Try this
ArrayList<T> newList = new ArrayList<>();
You can declare the type parameter T on getResult() to make it generic (i.e., it accepts any class), as shown below:
public <T> String getResult(T t) {
    String result = "";
    List<T> newList = new ArrayList<>();
    // perform actions
    return result;
}
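For comparison, both signatures can be exercised side by side in a small sketch (class and method names are invented): one fixes T through a Class<T> type witness, the other infers T from an instance argument.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeWitnessDemo {
    // T is fixed by the Class<T> token; no instance of T is needed.
    static <T> List<T> emptyListOf(Class<T> type) {
        return new ArrayList<>();
    }

    // T is inferred from the value passed in.
    static <T> List<T> listOf(T t) {
        List<T> list = new ArrayList<>();
        list.add(t);
        return list;
    }

    public static void main(String[] args) {
        List<String> a = emptyListOf(String.class);
        List<Integer> b = listOf(42);
        System.out.println(a.size() + " " + b.get(0)); // prints "0 42"
    }
}
```

The type-witness form is what DynamoDBMapper's query(Class<T>, ...) uses: the caller hands over the class object so the library can both constrain T at compile time and inspect the class reflectively at run time.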
I'm learning about Java enums and I was wondering what the best approach is to check multiple enums for a matching value in order to call a specific method. I have defined two separate enums below that are used by the getValue method's colName parameter to determine what method to execute. So the enum drives the method call. There has to be a more efficient way to do this than what I have below. Any suggestions?
I want to avoid having to do the below (pseudo code):
if (colName.equalsIgnoreCase("ATTRIBUTEONE") ||
    colName.equalsIgnoreCase("ATTRIBUTETWO") ||
    colName.equalsIgnoreCase("ATTRIBUTETHREE")) {
    callAsStringMethod();
} else if (colName.equalsIgnoreCase("ATTRIBUTEFOUR")) {
    callAsIntegerMethod();
}
My Attempt using enum:
public class RowHelper implements IRowHelper {

    public static enum StringAttributes {
        ATTRIBUTEONE,
        ATTRIBUTETWO,
        ATTRIBUTETHREE;
    }

    public static enum IntegerAttributes {
        ATTRIBUTEFOUR,
        ATTRIBUTEFIVE,
        ATTRIBUTESIX,
        ATTRIBUTESEVEN;
    }

    @Override
    public String getValue(String colName) throws Exception {
        boolean colFound = false;
        Object retValue = null;
        for (EConstants.StringAttributes attribute : EConstants.StringAttributes.values()) {
            if (colName.toUpperCase().equals(attribute.name())) {
                retValue = callAsStringMethod();
                colFound = true;
            }
        }
        for (EConstants.IntegerAttributes attribute : EConstants.IntegerAttributes.values()) {
            if (colName.toUpperCase().equals(attribute.name())) {
                retValue = callAsIntegerMethod();
                colFound = true;
            }
        }
        if (!colFound)
            throw new Exception("column not found");
        if (retValue instanceof String)
            return (String) retValue;
        else
            return retValue.toString();
    }
}
Try this:
public String getValue(String colName) throws Exception {
    final String name = colName != null ? colName.trim().toUpperCase() : "";
    try {
        EConstants.StringAttributes.valueOf(name);
        return callAsStringMethod().toString();
    } catch (Exception e1) {
        try {
            EConstants.IntegerAttributes.valueOf(name);
            return callAsIntegerMethod().toString();
        } catch (Exception e2) {
            throw new Exception("column not found");
        }
    }
}
The method's now returning the appropriate value, according to the latest edit of the question.
EDIT :
According to Kirk Woll and Louis Wasserman's benchmark, looping through values is significantly faster than doing a try/catch. So here's a simplified version of the original code, expect it to be a bit faster:
public String getValue(String colName) throws Exception {
    final String name = colName != null ? colName.trim().toUpperCase() : "";
    for (EConstants.StringAttributes attribute : EConstants.StringAttributes.values())
        if (name.equals(attribute.name()))
            return callAsStringMethod().toString();
    for (EConstants.IntegerAttributes attribute : EConstants.IntegerAttributes.values())
        if (name.equals(attribute.name()))
            return callAsIntegerMethod().toString();
    throw new Exception("column not found");
}
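The difference between the two versions comes down to how a miss is handled: valueOf throws IllegalArgumentException for an unknown name, while the loop simply falls through without exception-driven control flow. A self-contained sketch (names invented):

```java
public class ValueOfDemo {
    enum StringAttributes { ATTRIBUTEONE, ATTRIBUTETWO, ATTRIBUTETHREE }

    // Loop-based membership check: no exception on the miss path.
    static boolean isKnown(String name) {
        for (StringAttributes a : StringAttributes.values()) {
            if (a.name().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isKnown("ATTRIBUTEONE")); // prints "true"
        System.out.println(isKnown("NOPE"));         // prints "false"
        try {
            StringAttributes.valueOf("NOPE");
        } catch (IllegalArgumentException e) {
            System.out.println("valueOf threw");     // prints "valueOf threw"
        }
    }
}
```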
Well, this is a weird design ._. Anyway, you can use an enum, but I would do something like:
public interface RowAttribute {
    String getValue(IRowHelper rowHelper);
}

public class StringRowAttribute implements RowAttribute {
    @Override
    public String getValue(IRowHelper rowHelper) {
        return rowHelper.callAsStringMethod();
    }
}

public class IntegerRowAttribute implements RowAttribute {
    @Override
    public String getValue(IRowHelper rowHelper) {
        return rowHelper.callAsIntegerMethod().toString();
    }
}

public class RowHelper implements IRowHelper {
    private static final RowAttribute INTEGER_ATTRIBUTE = new IntegerRowAttribute();
    private static final RowAttribute STRING_ATTRIBUTE = new StringRowAttribute();

    private static enum Attribute {
        ATTRIBUTEONE(STRING_ATTRIBUTE),
        ATTRIBUTETWO(STRING_ATTRIBUTE),
        ATTRIBUTETHREE(STRING_ATTRIBUTE),
        ATTRIBUTEFOUR(INTEGER_ATTRIBUTE),
        ATTRIBUTEFIVE(INTEGER_ATTRIBUTE),
        ATTRIBUTESIX(INTEGER_ATTRIBUTE),
        ATTRIBUTESEVEN(INTEGER_ATTRIBUTE);

        private final RowAttribute attribute;

        private Attribute(RowAttribute attribute) {
            this.attribute = attribute;
        }

        public RowAttribute getAttributeResolver() {
            return this.attribute;
        }
    }

    @Override
    public String getValue(String colName) throws Exception {
        final String name = colName != null ? colName.trim() : "";
        for (Attribute attribute : Attribute.values()) {
            if (attribute.name().equalsIgnoreCase(name)) {
                return attribute.getAttributeResolver().getValue(this);
            }
        }
        throw new Exception(String.format("Attribute for column %s not found", colName));
    }
}
That way you don't need to create more than one enum, and you can use its power to iterate through the possible values. You would only need to make the callAsStringMethod/callAsIntegerMethod methods public. Another way is to put the implementations inside RowHelper. Something like this:
public class RowHelper implements IRowHelper {

    public interface RowAttribute {
        String getValue();
    }

    private static final RowAttribute INTEGER_ATTRIBUTE = new RowAttribute() {
        @Override
        public String getValue() {
            return callAsIntegerMethod().toString();
        }
    };

    private static final RowAttribute STRING_ATTRIBUTE = new RowAttribute() {
        @Override
        public String getValue() {
            return callAsStringMethod();
        }
    };

    ...

    @Override
    public String getValue(String colName) throws Exception {
        ...
        if (attribute.name().equalsIgnoreCase(name)) {
            return attribute.getAttributeResolver().getValue();
        }
        ...
    }
}
Anyway, I don't understand how your getValue method actually retrieves the attribute's value without the colName being passed down to it.
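The strategy-enum idea above can be condensed into a self-contained sketch using Supplier for the handlers (constant names follow the question; the handler bodies are placeholders, not the poster's real callAsStringMethod/callAsIntegerMethod):

```java
import java.util.function.Supplier;

// Each enum constant carries its own handler, so dispatch is one lookup
// over values() instead of an if/else chain on column names.
public class StrategyEnumDemo {
    enum Attribute {
        ATTRIBUTEONE(() -> "string-result"),      // stands in for callAsStringMethod
        ATTRIBUTEFOUR(() -> String.valueOf(42));  // stands in for callAsIntegerMethod

        private final Supplier<String> resolver;

        Attribute(Supplier<String> resolver) {
            this.resolver = resolver;
        }

        String resolve() {
            return resolver.get();
        }
    }

    static String getValue(String colName) {
        for (Attribute a : Attribute.values()) {
            if (a.name().equalsIgnoreCase(colName)) {
                return a.resolve();
            }
        }
        throw new IllegalArgumentException("column not found: " + colName);
    }

    public static void main(String[] args) {
        System.out.println(getValue("attributeone"));  // prints "string-result"
        System.out.println(getValue("ATTRIBUTEFOUR")); // prints "42"
    }
}
```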
The most efficient way to do this with multiple enums is, frankly, to make them the same enum. There isn't really a better way.
That said, instead of the loop you have, you can use Enum.valueOf(EnumClass.class, name) to find the enum value of that type with the specified name, rather than looping like you're doing.
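A minimal sketch of that lookup (the enum here is invented for the example); note that Enum.valueOf throws IllegalArgumentException when the name is absent, so callers that must tolerate unknown names still need to handle that:

```java
public class ValueOfLookupDemo {
    enum Column { ATTRIBUTEONE, ATTRIBUTEFOUR }

    // Enum.valueOf(Class, name) replaces the manual loop when the enum
    // class object is available at run time.
    static <E extends Enum<E>> E byName(Class<E> type, String name) {
        return Enum.valueOf(type, name);
    }

    public static void main(String[] args) {
        System.out.println(byName(Column.class, "ATTRIBUTEFOUR")); // prints "ATTRIBUTEFOUR"
    }
}
```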
I parse a CSV file and create domain objects using Super CSV. My domain object has one enum field, e.g.:
public class TypeWithEnum {
    private Type type;

    public TypeWithEnum(Type type) {
        this.type = type;
    }

    public Type getType() {
        return type;
    }

    public void setType(Type type) {
        this.type = type;
    }
}
My enum looks like this:
public enum Type {
CANCEL, REFUND
}
Trying to create beans out of this CSV file:
final String[] header = new String[]{ "type" };
ICsvBeanReader inFile = new CsvBeanReader(new FileReader(
getFilePath(this.getClass(), "learning/enums.csv")), CsvPreference.STANDARD_PREFERENCE);
final CellProcessor[] processors =
new CellProcessor[]{ TODO WHAT TO PUT HERE? };
TypeWithEnum myEnum = inFile.read(
TypeWithEnum.class, header, processors);
this fails with
Error while filling an object context: null offending processor: null
at org.supercsv.io.CsvBeanReader.fillObject(Unknown Source)
at org.supercsv.io.CsvBeanReader.read(Unknown Source)
Any hint on parsing enums? Should I write my own processor for this?
I already tried to write my own processor, something like this:
class MyCellProcessor extends CellProcessorAdaptor {
    public Object execute(Object value, CSVContext context) {
        Type type = Type.valueOf(value.toString());
        return next.execute(type, context);
    }
}
but it dies with the same exception.
The content of my enums.csv file is simple:
CANCEL
REFUND
The exception you're getting is because CsvBeanReader cannot instantiate your TypeWithEnum class, as it doesn't have a default (no arguments) constructor. It's probably a good idea to print the stack trace so you can see the full details of what went wrong.
Super CSV relies on the fact that you should have supplied a valid Java bean, i.e. a class with a default constructor and public getters/setters for each of its fields.
So you can fix the exception by adding the following to TypeWithEnum:
public TypeWithEnum(){
}
As for hints on parsing enums the two easiest options are:
1. Using the HashMapper processor
@Test
public void hashMapperTest() throws Exception {
    // two lines of input
    String input = "CANCEL\nREFUND";

    // you could also put the header in the CSV file
    // and use inFile.getCSVHeader(true)
    final String[] header = new String[] { "type" };

    // map from enum name to enum
    final Map<Object, Object> typeMap = new HashMap<Object, Object>();
    for (Type t : Type.values()) {
        typeMap.put(t.name(), t);
    }

    // HashMapper will convert from the enum name to the enum
    final CellProcessor[] processors =
        new CellProcessor[] { new HashMapper(typeMap) };

    ICsvBeanReader inFile =
        new CsvBeanReader(new StringReader(input), CsvPreference.STANDARD_PREFERENCE);

    TypeWithEnum myEnum;
    while ((myEnum = inFile.read(TypeWithEnum.class, header, processors)) != null) {
        System.out.println(myEnum.getType());
    }
}
2. Creating a custom CellProcessor
Create your processor
package org.supercsv;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.util.CSVContext;

public class TypeProcessor extends CellProcessorAdaptor {

    public TypeProcessor() {
        super();
    }

    public TypeProcessor(CellProcessor next) {
        super(next);
    }

    public Object execute(Object value, CSVContext context) {
        if (!(value instanceof String)) {
            throw new SuperCSVException("input should be a String!");
        }
        // parse the String to a Type
        Type type = Type.valueOf((String) value);
        // execute the next processor in the chain
        return next.execute(type, context);
    }
}
Use it!
@Test
public void customProcessorTest() throws Exception {
    // two lines of input
    String input = "CANCEL\nREFUND";
    final String[] header = new String[] { "type" };

    // TypeProcessor will convert from the enum name to the enum
    final CellProcessor[] processors =
        new CellProcessor[] { new TypeProcessor() };

    ICsvBeanReader inFile =
        new CsvBeanReader(new StringReader(input), CsvPreference.STANDARD_PREFERENCE);

    TypeWithEnum myEnum;
    while ((myEnum = inFile.read(TypeWithEnum.class, header, processors)) != null) {
        System.out.println(myEnum.getType());
    }
}
I'm working on an upcoming release of Super CSV. I'll be sure to update the website to make it clear that you have to have a valid Java bean - and maybe a description of the available processors, for those not inclined to read Javadoc.
Here is a generic cell processor for enums
/** A cell processor to convert strings to enums. */
public class EnumCellProcessor<T extends Enum<T>> implements CellProcessor {

    private Class<T> enumClass;
    private boolean ignoreCase;

    /**
     * @param enumClass the enum class used for conversion
     */
    public EnumCellProcessor(Class<T> enumClass) {
        this.enumClass = enumClass;
    }

    /**
     * @param enumClass the enum class used for conversion
     * @param ignoreCase if true, the conversion is made case insensitive
     */
    public EnumCellProcessor(Class<T> enumClass, boolean ignoreCase) {
        this.enumClass = enumClass;
        this.ignoreCase = ignoreCase;
    }

    @Override
    public Object execute(Object value, CsvContext context) {
        if (value == null)
            return null;
        String valueAsStr = value.toString();
        for (T s : enumClass.getEnumConstants()) {
            if (ignoreCase ? s.name().equalsIgnoreCase(valueAsStr) : s.name().equals(valueAsStr)) {
                return s;
            }
        }
        throw new SuperCsvCellProcessorException(valueAsStr + " cannot be converted to enum " + enumClass.getName(), context, this);
    }
}
and you will use it like this:
new EnumCellProcessor<Type>(Type.class);
I tried to reproduce your error but everything works for me. I use Super CSV 1.52:
private enum ENUMS_VALUES { TEST1, TEST2, TEST3 };

@Test
public void testEnum3() throws IOException {
    String testInput = new String("TEST1\nTEST2\nTEST3");
    ICsvBeanReader reader = new CsvBeanReader(new StringReader(testInput), CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
    final String[] header = new String[] { "header" };
    reader.read(this.getClass(), header, new CellProcessor[] { new CellProcessorAdaptor() {
        @Override
        public Object execute(Object pValue, CSVContext pContext) {
            return next.execute(ENUMS_VALUES.valueOf((String) pValue), pContext);
        }
    } });
}

@Test
public void testEnum4() throws IOException {
    String testInput = new String("TEST1\nTEST2\nTEST3");
    ICsvBeanReader reader = new CsvBeanReader(new StringReader(testInput), CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
    final String[] header = new String[] { "header" };
    reader.read(this.getClass(), header, new CellProcessor[] { new CellProcessorAdaptor() {
        @Override
        public Object execute(Object pValue, CSVContext pContext) {
            return ENUMS_VALUES.valueOf((String) pValue);
        }
    } });
}

public void setHeader(ENUMS_VALUES value) {
    System.out.println(value);
}