I'm learning about Java enums, and I was wondering what the best approach is to check multiple enums for a matching value in order to call a specific method. I have defined two separate enums below; the getValue method uses its colName parameter to find out which enum the column belongs to, and therefore which method to execute. So the enum drives the method call. There has to be a more efficient way to do this than what I have below. Any suggestions?
I want to avoid having to do the below (pseudo code):
if(colName.equalsIgnoreCase("ATTRIBUTEONE") ||
colName.equalsIgnoreCase("ATTRIBUTETWO") ||
colName.equalsIgnoreCase("ATTRIBUTETHREE")){
callAsStringMethod();
} else if(colName.equalsIgnoreCase("ATTRIBUTEFOUR")){
callAsIntegerMethod();
}
My attempt using enums:
public class RowHelper implements IRowHelper {
public static enum StringAttributes {
ATTRIBUTEONE,
ATTRIBUTETWO,
ATTRIBUTETHREE;
}
public static enum IntegerAttributes {
ATTRIBUTEFOUR,
ATTRIBUTEFIVE,
ATTRIBUTESIX,
ATTRIBUTESEVEN;
}
@Override
public String getValue(String colName) throws Exception{
boolean colFound=false;
Object retValue = null;
for (EConstants.StringAttributes attribute : EConstants.StringAttributes.values()) {
if(colName.toUpperCase().equals(attribute.name())){
retValue = callAsStringMethod();
colFound=true;
}
}
for (EConstants.IntegerAttributes attribute : EConstants.IntegerAttributes.values()) {
if(colName.toUpperCase().equals(attribute.name())){
retValue = callAsIntegerMethod();
colFound=true;
}
}
if(!colFound)
throw new Exception("column not found");
if(retValue instanceof String )
return (String) retValue;
else
return retValue.toString();
}
}
Try this:
public String getValue(String colName) throws Exception {
final String name = colName != null ? colName.trim().toUpperCase() : "";
try {
EConstants.StringAttributes.valueOf(name);
return callAsStringMethod().toString();
} catch (IllegalArgumentException e1) {
try {
EConstants.IntegerAttributes.valueOf(name);
return callAsIntegerMethod().toString();
} catch (IllegalArgumentException e2) {
throw new Exception("column not found");
}
}
}
The method now returns the appropriate value, according to the latest edit of the question.
EDIT:
According to Kirk Woll's and Louis Wasserman's benchmarks, looping through the values is significantly faster than using try/catch. So here's a simplified version of the original code, which should be a bit faster:
public String getValue(String colName) throws Exception {
final String name = colName != null ? colName.trim().toUpperCase() : "";
for (EConstants.StringAttributes attribute : EConstants.StringAttributes.values())
if (attribute.name().equals(name))
return callAsStringMethod().toString();
for (EConstants.IntegerAttributes attribute : EConstants.IntegerAttributes.values())
if (attribute.name().equals(name))
return callAsIntegerMethod().toString();
throw new Exception("column not found");
}
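If the two enums have to stay separate, one further option (not covered by the benchmark mentioned above) is to build a name-to-kind lookup table once and reuse it, so each call is a single hash lookup instead of a loop. This is a minimal sketch; ColumnKind, ColumnLookup and kindOf are hypothetical names, and the two enums are redeclared as top-level stand-ins for EConstants.StringAttributes/IntegerAttributes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-ins for the question's two enums (only the constant names matter here).
enum StringAttributes { ATTRIBUTEONE, ATTRIBUTETWO, ATTRIBUTETHREE }
enum IntegerAttributes { ATTRIBUTEFOUR, ATTRIBUTEFIVE, ATTRIBUTESIX, ATTRIBUTESEVEN }

// Hypothetical: which handler a column name should be routed to.
enum ColumnKind { STRING, INTEGER }

final class ColumnLookup {
    // Filled once when the class is initialized; lookups are then a single hash probe.
    private static final Map<String, ColumnKind> KINDS = new HashMap<>();

    static {
        for (StringAttributes a : StringAttributes.values()) {
            KINDS.put(a.name(), ColumnKind.STRING);
        }
        for (IntegerAttributes a : IntegerAttributes.values()) {
            KINDS.put(a.name(), ColumnKind.INTEGER);
        }
    }

    private ColumnLookup() {}

    /** Returns the kind for colName (case-insensitive, trimmed), or null if it is unknown. */
    static ColumnKind kindOf(String colName) {
        if (colName == null) {
            return null;
        }
        return KINDS.get(colName.trim().toUpperCase(Locale.ROOT));
    }
}
```

getValue could then switch on ColumnLookup.kindOf(colName), calling callAsStringMethod() for STRING and callAsIntegerMethod() for INTEGER, and throwing "column not found" when the result is null.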
Well, this is a weird design ._. Anyway, you can use an enum, but I would do something like:
public interface RowAttribute {
String getValue(IRowHelper rowHelper);
}
public class StringRowAttribute implements RowAttribute {
@Override
public String getValue(IRowHelper rowHelper) {
return rowHelper.callAsStringMethod();
}
}
public class IntegerRowAttribute implements RowAttribute {
@Override
public String getValue(IRowHelper rowHelper) {
return rowHelper.callAsIntegerMethod().toString();
}
}
public class RowHelper implements IRowHelper {
private static final RowAttribute INTEGER_ATTRIBUTE = new IntegerRowAttribute();
private static final RowAttribute STRING_ATTRIBUTE = new StringRowAttribute();
private static enum Attribute {
ATTRIBUTEONE(STRING_ATTRIBUTE),
ATTRIBUTETWO(STRING_ATTRIBUTE),
ATTRIBUTETHREE(STRING_ATTRIBUTE),
ATTRIBUTEFOUR(INTEGER_ATTRIBUTE),
ATTRIBUTEFIVE(INTEGER_ATTRIBUTE),
ATTRIBUTESIX(INTEGER_ATTRIBUTE),
ATTRIBUTESEVEN(INTEGER_ATTRIBUTE);
private final RowAttribute attribute;
private Attribute(RowAttribute attribute) {
this.attribute = attribute;
}
public RowAttribute getAttributeResolver() {
return this.attribute;
}
}
@Override
public String getValue(String colName) throws Exception {
final String name = colName != null ? colName.trim() : "";
for (Attribute attribute : Attribute.values()) {
if (attribute.name().equalsIgnoreCase(name)) {
return attribute.getAttributeResolver().getValue(this);
}
}
throw new Exception(String.format("Attribute for column %s not found", colName));
}
}
This way you need only one enum, and you can still iterate through its possible values. You would only need to make the methods callAsStringMethod/callAsIntegerMethod public. Another way is to put the implementations inside RowHelper, something like this:
public class RowHelper implements IRowHelper {
public interface RowAttribute {
String getValue();
}
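// Assumes callAsIntegerMethod()/callAsStringMethod() are static: an anonymous class
// stored in a static field cannot call RowHelper instance methods.
private static final RowAttribute INTEGER_ATTRIBUTE = new RowAttribute() {
@Override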
public String getValue() {
return callAsIntegerMethod().toString();
}
};
private static final RowAttribute STRING_ATTRIBUTE = new RowAttribute() {
@Override
public String getValue() {
return callAsStringMethod();
}
};
...
@Override
public String getValue(String colName) throws Exception {
...
if (attribute.name().equalsIgnoreCase(name)) {
return attribute.getAttributeResolver().getValue();
}
...
}
}
Anyway, I don't understand how your methods actually get the attribute value without receiving colName as a parameter.
The most efficient way to do this with multiple enums is, frankly, to make them the same enum. There isn't really a better way.
That said, instead of the loop you have, you can use Enum.valueOf(EnumClass.class, name) to find the constant of that enum type with the specified name; it throws an IllegalArgumentException if there is none.
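To illustrate merging the two enums into one and resolving the name with Enum.valueOf, here is a sketch; Column, Column.Kind and Columns.resolve are hypothetical names. Because Enum.valueOf(Class, String) throws IllegalArgumentException when no constant matches, that exception plays the role of "column not found".

```java
import java.util.Locale;

// Hypothetical single enum replacing StringAttributes and IntegerAttributes:
// each constant records which accessor (string or integer) it needs.
enum Column {
    ATTRIBUTEONE(Kind.STRING), ATTRIBUTETWO(Kind.STRING), ATTRIBUTETHREE(Kind.STRING),
    ATTRIBUTEFOUR(Kind.INTEGER), ATTRIBUTEFIVE(Kind.INTEGER),
    ATTRIBUTESIX(Kind.INTEGER), ATTRIBUTESEVEN(Kind.INTEGER);

    enum Kind { STRING, INTEGER }

    private final Kind kind;

    Column(Kind kind) {
        this.kind = kind;
    }

    Kind kind() {
        return kind;
    }
}

final class Columns {
    private Columns() {}

    /** Looks up colName case-insensitively; throws IllegalArgumentException if no constant matches. */
    static Column resolve(String colName) {
        String name = colName.trim().toUpperCase(Locale.ROOT);
        return Enum.valueOf(Column.class, name); // equivalent to Column.valueOf(name)
    }
}
```

getValue would then call Columns.resolve(colName).kind() and dispatch on the result, with no loop over values() at all.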
This would mean that the class was initialized, but the variables were not set.
A sample class:
public class User {
String id = null;
String name = null;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
}
The actual class is so big that I'd prefer not to check if(xyz == null) for each of the variables.
Another non-reflective solution for Java 8, along the lines of paxdiablo's answer but without a series of ifs, is to stream all the fields and check them for null:
return Stream.of(id, name)
.allMatch(Objects::isNull);
This remains quite easy to maintain while avoiding the reflection hammer.
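For completeness, here is the stream approach inside a small class that mirrors the question's User fields; allFieldsNull is a name chosen for illustration. The trade-off is that every new field has to be added to the Stream.of(...) call by hand.

```java
import java.util.Objects;
import java.util.stream.Stream;

class User {
    private String id;
    private String name;

    void setId(String id) { this.id = id; }
    void setName(String name) { this.name = name; }

    /** True when every listed field is null; new fields must be added to the list manually. */
    boolean allFieldsNull() {
        return Stream.of(id, name).allMatch(Objects::isNull);
    }
}
```

Swapping allMatch(Objects::isNull) for anyMatch(Objects::isNull) turns it into an "is anything missing?" check.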
Try something like this:
public boolean checkNull() throws IllegalAccessException {
for (Field f : getClass().getDeclaredFields())
if (f.get(this) != null)
return false;
return true;
}
Although it would probably be better to check each variable if at all feasible.
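One caveat with the reflective loop above: getDeclaredFields() also returns static fields (such as constants) and compiler-generated synthetic fields, and a non-null constant would make the check return false. A sketch that skips them, using a hypothetical Person class, might look like this. It only sees fields declared in the runtime class itself, not inherited ones.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class Person {
    private static final String KIND = "person"; // static and non-null: must be ignored
    private String id;
    private String name;

    void setName(String name) {
        this.name = name;
    }

    /** True if every non-static, non-synthetic field declared in the runtime class is null. */
    boolean checkNull() {
        for (Field f : getClass().getDeclaredFields()) {
            // Skip constants and compiler-generated fields such as this$0.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            try {
                if (f.get(this) != null) {
                    return false;
                }
            } catch (IllegalAccessException e) {
                // Can happen for private fields of a subclass; f.setAccessible(true) would be needed then.
                throw new IllegalStateException(e);
            }
        }
        return true;
    }
}
```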
This can be done fairly easily using a Lombok-generated equals and a static EMPTY object:
import lombok.Data;
public class EmptyCheck {
public static void main(String[] args) {
User user1 = new User();
User user2 = new User();
user2.setName("name");
System.out.println(user1.isEmpty()); // prints true
System.out.println(user2.isEmpty()); // prints false
}
@Data
public static class User {
private static final User EMPTY = new User();
private String id;
private String name;
private int age;
public boolean isEmpty() {
return this.equals(EMPTY);
}
}
}
Prerequisites:
Default constructor should not be implemented with custom behavior as that is used to create the EMPTY object
All fields of the class should have an implemented equals (built-in Java types are usually not a problem, in case of custom types you can use Lombok)
Advantages:
No reflection involved
As new fields are added to the class, no maintenance is required, because Lombok automatically includes them in the generated equals implementation
Unlike some other answers this works not just for null checks but also for primitive types which have a non-null default value (e.g. if field is int it checks for 0, in case of boolean for false, etc.)
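Without Lombok, the same sentinel idea works with a hand-written equals, at the cost of the no-maintenance advantage listed above, since every new field must be added to equals by hand. A sketch with a hypothetical Account class; its equals/hashCode are a hand-written stand-in for what @Data would generate, not Lombok's exact output.

```java
import java.util.Objects;

class Account {
    // Sentinel built with the no-arg constructor, so every field holds its default value.
    private static final Account EMPTY = new Account();

    private String id;
    private String name;
    private int age;

    void setName(String name) { this.name = name; }
    void setAge(int age) { this.age = age; }

    boolean isEmpty() {
        return equals(EMPTY);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        // The primitive age is compared against its default 0, reference fields against null.
        return age == other.age && Objects.equals(id, other.id) && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name, age);
    }
}
```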
If you want this for unit testing, I just use the hasNoNullFieldsOrProperties() method from AssertJ (note that it asserts the opposite condition: that no field or property is null):
assertThat(myObj).hasNoNullFieldsOrProperties();
How about streams?
public boolean checkFieldsIsNull(Object instance, List<String> fieldNames) {
return fieldNames.stream().allMatch(field -> {
try {
Field f = instance.getClass().getDeclaredField(field);
f.setAccessible(true); // needed to read private fields of another class
return Objects.isNull(f.get(instance));
} catch (IllegalAccessException | NoSuchFieldException e) {
return true; // or throw a RuntimeException if you prefer
}
});
}
"Best" is such a subjective term :-)
I would just use the method of checking each individual variable. If your class already has a lot of these, the increase in size is not going to be that much if you do something like:
public boolean anyUnset() {
if ( id == null) return true;
if (name == null) return true;
return false;
}
Provided you keep everything in the same order, code changes (and automated checking with a script if you're paranoid) will be relatively painless.
Alternatively (assuming they're all strings), you could put these values into a map of some sort (e.g. a HashMap) and keep a list of the key names for that map. That way, you could iterate through the list of keys, checking that each value is set.
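A sketch of that map-based alternative, assuming all values are strings; MapBackedUser, KEYS and the key names are hypothetical. Restricting set() to the known keys keeps a typo from silently creating a new entry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapBackedUser {
    // The list of key names that every instance is expected to carry.
    static final List<String> KEYS = Collections.unmodifiableList(Arrays.asList("id", "name"));

    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        if (!KEYS.contains(key)) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }

    /** True if any expected key is missing or mapped to null. */
    boolean anyUnset() {
        for (String key : KEYS) {
            if (values.get(key) == null) {
                return true;
            }
        }
        return false;
    }
}
```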
I think this solves your problem easily (it returns true if any of the fields is not null, i.e. the user is not empty):
public boolean isUserNotEmpty() {
return Stream.of(id, name)
.anyMatch(Objects::nonNull);
}
Another solution to the same task counts the non-null fields (change the comparison to == 0 to check that all the fields are null):
public boolean isUserNotEmpty() {
long nonNullCount = Stream.of(id, name)
.filter(Objects::nonNull)
.count();
return nonNullCount > 0;
}
The best way in my opinion is reflection, as others have recommended. Here's a sample that evaluates each declared field for null. If it finds one that is not null, the method returns false.
import java.lang.reflect.Field;
public class User {
String id = null;
String name = null;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public boolean isNull() {
Field[] fields = this.getClass().getDeclaredFields();
for (Field f : fields) {
try {
Object value = f.get(this);
if (value != null) {
return false;
}
} catch (IllegalArgumentException | IllegalAccessException e) {
// not expected for fields declared in this class itself
throw new IllegalStateException(e);
}
}
return true;
}
public static void main(String args[]) {
System.out.println(new User().isNull());
}
}
Alternatively, call the getter of each String field and check the result:
Field[] field = model.getClass().getDeclaredFields();
for(int j=0 ; j<field.length ; j++){
String name = field[j].getName();
name = name.substring(0,1).toUpperCase()+name.substring(1);
if(field[j].getType() == String.class){
// assumes a public getter named get<FieldName>() exists for each String field
Method m = model.getClass().getMethod("get"+name);
String value = (String) m.invoke(model);
if(value == null){
// ... something to do ...
}
}
}
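Best for me is to stream the declared fields and check their values:
Stream.of(getClass().getDeclaredFields())
.allMatch(f -> { try { return f.get(this) == null; } catch (IllegalAccessException e) { throw new IllegalStateException(e); } });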
It can be used in a custom annotation + annotation processor to automagically define a boolean isNull() method on the annotated classes.
Based on Irkwz's answer, but a different approach:
public class SomeClass{
private String field1;
private String field2;
private ComplexField field3;
private String field4;
private Integer field15;
public boolean isNullAllFields() {
return Stream.of(this.getClass().getDeclaredFields())
.allMatch(f -> { try { return f.get(this) == null; } catch (IllegalAccessException e) { throw new IllegalStateException(e); } });
}
}
At the end of the day, you invoke isNullAllFields to find out whether all of the object's fields are null.
If you want to do the opposite, i.e. check that some/all members of the class are non-null, check this answer.
To make sure that certain members of the class are always non-null, we can use Lombok's @NonNull annotation on the individual fields of the class.
import lombok.Data;
import lombok.NonNull;
@Data
public class DataClass {
@NonNull
private String data1;
private int data2;
@NonNull
private String data3;
@NonNull
private String data4;
@NonNull
private String data5;
private String data6;
DataClass(String data1,...) {
// constructor
}
}
The easiest way is to convert the object to a map, take its keys, and use a stream to check whether any or all of the values are null. You can also let the caller pass a specific set of keys to check.
The code below checks whether any of the values is null; switch between anyMatch and allMatch as your requirement dictates.
Just replace the isNullorEmpty method I have used with a proper null-or-empty check for the relevant map or collection.
public boolean checkIfAnyFieldIsNull(Object instance, Set<String> fields){
try {
Map<String, Object> instanceMap = new Gson().fromJson(new GsonBuilder().serializeNulls().create().toJson(instance), Map.class);
if(!isNullorEmpty(instanceMap)) {
fields = isNullorEmpty(fields) ? instanceMap.keySet() : fields;
return fields.stream().anyMatch(curField -> isNull(instanceMap.get(curField)));
}else{
return false;
}
}catch (Exception e){
return false;
}
}
Try this method (Kotlin); it works for me:
private fun checkIfAnyDataIsNull(model: YourModelClass): Boolean {
return Stream.of<Any?>(
model.date,
model.merchantName,
model.payment,
).anyMatch(Objects::isNull)
}
You can use the simple solution (this only works if User overrides equals, e.g. via Lombok's @Data):
if (user.equals(new User())) {
// your processing goes here
}
I have a piece of information that is stored in a database table as a string code value. I have defined an enum to make the code human-readable. An entity class defines the field with the enum.
When I display the data in a Vaadin grid, it displays the enumeration value, which is what is needed. However, I am also trying to display the same data in a form text field, and this behaves differently. I had to write a converter for the data binding to avoid a run-time error, but the result is the opposite of what I expect - it shows the code value instead of the enumeration.
Some code to illustrate:
The enumeration type:
public enum TaskType {
TASK_VIEW("00"), INTERACTIVE("01"), BATCH("02"), FOLDER("07"), URL("08"), USER_DEFINED("11");
private String codeValue;
private TaskType(String codeValue) {
this.codeValue = codeValue;
}
public String getCodeValue() {
return codeValue;
}
public static TaskType fromCodeValue(String value) {
switch (value) {
case "00":
return TASK_VIEW;
case "01":
return INTERACTIVE;
case "02":
return BATCH;
case "07":
return FOLDER;
case "08":
return URL;
case "11":
return USER_DEFINED;
default:
return null;
}
}
}
The entity class:
@Entity
public class TaskMaster extends AbstractEntity {
private TaskType type;
// other fields
public TaskType getType() {
return type;
}
public void setType(TaskType type) {
this.type = type;
}
// other methods
}
A converter between database field and enum type:
@Converter(autoApply = true)
public class TaskTypeConverter implements AttributeConverter<TaskType, String> {
@Override
public String convertToDatabaseColumn(TaskType type) {
if (type != null) {
return type.getCodeValue();
} else {
return null;
}
}
@Override
public TaskType convertToEntityAttribute(String dbData) {
if (dbData != null) {
return TaskType.fromCodeValue(dbData);
} else {
return null;
}
}
}
The grid view class:
public class TaskMasterListView extends VerticalLayout {
private Grid<TaskMaster> grid = new Grid<>(TaskMaster.class);
private TaskMasterService taskService;
public TaskMasterListView(TaskMasterService taskService) {
this.taskService = taskService;
...
}
@PostConstruct
public void init() {
List<TaskMaster> items = taskService.findAll();
grid.setItems(items);
}
private void configureGrid() {
grid.addClassName("tasks-grid");
grid.setColumns("internalTaskID", "taskID", "name", "type", "objectName", "version",
"formName");
grid.getColumns().forEach(col -> col.setAutoWidth(true));
}
...
}
The details view (where it displays incorrectly):
public class TaskDetailView extends FormLayout {
private static final Logger logger = LoggerFactory.getLogger(TaskDetailView.class);
private TextField type;
// other GUI objects
private Binder<TaskMaster> binder;
public TaskDetailView() {
configureView();
bindData();
}
public void loadTask(TaskTreeView.TaskSelectionEvent event) {
if (event.getSelected() != null) {
binder.setBean(event.getSelected());
}
}
private void bindData() {
binder = new Binder<>(TaskMaster.class);
binder.setBean(new TaskMaster());
binder.forField(type).withConverter(new TaskTypeConverter()).bind("type");
// other bindings
}
private static class TaskTypeConverter implements Converter<String, TaskType> {
@Override
public Result<TaskType> convertToModel(String value, ValueContext context) {
logger.info("convertToModel: value={}", value);
return Result.ok(TaskType.fromCodeValue(value));
}
@Override
public String convertToPresentation(TaskType value, ValueContext context) {
if (value != null) {
logger.info("convertToPresentation: value={}", value.toString());
return value.getCodeValue();
} else {
logger.info("convertToPresentation: null");
return "";
}
}
}
}
So, as an example, if an entity with type = 07 is displayed in the grid, it shows FOLDER, which is what I want. But, when I display the same object where the type is shown in a text field, it shows 07 instead of FOLDER.
Any idea what's going on here? It seems to be doing the opposite of what I need.
Your static TaskTypeConverter is what the binder uses to turn TaskType.FOLDER into text for the field. Now look at what your convertToPresentation() does: it calls value.getCodeValue(), so of course your TextField is filled with 07.
If you want the enum name instead, call the built-in enum method value.name() there, and conversely call TaskType.valueOf(value) inside convertToModel(). Don't forget to catch IllegalArgumentException and NullPointerException when calling valueOf()!
An even better idea is not to use .name() but a friendly name, e.g. "Folder", which you can store in your enum.
However, you should probably use a Select<TaskType> or ComboBox<TaskType> so that users choose from a predefined set of values (i.e. from the enum) instead of typing into a TextField.
Set the friendly name through setItemLabelGenerator(), or use a Renderer if you need to customize it beyond plain text.
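The friendly-name idea can be sketched independently of Vaadin: the enum carries a label, and two plain static methods do the conversions that convertToPresentation() and convertToModel() could delegate to. The labels and the TaskTypeText helper are assumptions for illustration. Inside a real Converter you would typically turn the IllegalArgumentException into Result.error(...) rather than let it escape.

```java
// TaskType extended with a friendly label; the labels are illustrative assumptions.
enum TaskType {
    TASK_VIEW("00", "Task view"), INTERACTIVE("01", "Interactive"), BATCH("02", "Batch"),
    FOLDER("07", "Folder"), URL("08", "URL"), USER_DEFINED("11", "User defined");

    private final String codeValue;
    private final String label;

    TaskType(String codeValue, String label) {
        this.codeValue = codeValue;
        this.label = label;
    }

    String getCodeValue() { return codeValue; }
    String getLabel() { return label; }
}

// Hypothetical helper holding the two conversion directions a Converter<String, TaskType> needs.
final class TaskTypeText {
    private TaskTypeText() {}

    /** Model to text field: show the friendly label, or "" for null. */
    static String toPresentation(TaskType value) {
        return value == null ? "" : value.getLabel();
    }

    /** Text field to model: accepts the label or the constant name, ignoring case; blank means null. */
    static TaskType toModel(String text) {
        if (text == null || text.trim().isEmpty()) {
            return null;
        }
        String trimmed = text.trim();
        for (TaskType t : TaskType.values()) {
            if (t.getLabel().equalsIgnoreCase(trimmed) || t.name().equalsIgnoreCase(trimmed)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown task type: " + text);
    }
}
```

The database mapping keeps using getCodeValue() in the JPA AttributeConverter, so the stored "07" is unaffected by what the form displays.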
I need to get an enum constant's name based on its value. I am given the enum class and a value, and I need to pick the corresponding name at run time.
I have a class called Information as below.
class Information {
private String value;
private String type;
private String cValue;
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getcValue() {
return cValue;
}
public void setcValue(String cValue) {
this.cValue = cValue;
}
public static void main(String args[]) {
Information inf = new Information();
inf.setType("com.abc.SignalEnum");
inf.setValue("1");
}
}
enum SignalEnum {
RED("1"), GREEN("2"), ORANGE("3");
private String sign;
SignalEnum(String pattern) {
this.sign = pattern;
}
}
enum MobileEnum {
SAMSUNG("1"), NOKIA("2"), APPLE("3");
private String mobile;
MobileEnum(String mobile) {
this.mobile = mobile;
}
}
At run time I will know the enum name from the type attribute of the Information class, and I also get the value. I need to find the corresponding enum constant and use it to set the cValue attribute of the Information class.
As an example I have provided two enums, SignalEnum and MobileEnum, but in my actual case I will get one of 100 enum types, so I don't want to check and cast each type individually. I am looking for a solution using reflection to set cValue.
Here is a simple resolver for any enum class.
Since reflection operations are expensive, it's better to prepare all required data once and then just query for it.
class EnumResolver {
private Map<String, Enum> map = new ConcurrentHashMap<>();
public EnumResolver(String className) {
try {
Class enumClass = Class.forName(className);
// look for backing property field, e.g. "sign" in SignalEnum
Field accessor = Arrays.stream(enumClass.getDeclaredFields())
.filter(f -> f.getType().equals(String.class))
.findFirst()
.orElseThrow(() -> new NoSuchFieldException("Not found field to access enum backing value"));
accessor.setAccessible(true);
// populate map with pairs like ["1" => SignalEnum.RED, "2" => SignalEnum.GREEN, etc]
for (Enum e : getEnumValues(enumClass)) {
map.put((String) accessor.get(e), e);
}
accessor.setAccessible(false);
} catch (ReflectiveOperationException e) {
throw new RuntimeException(e);
}
}
public Enum resolve(String backingValue) {
return map.get(backingValue);
}
private <E extends Enum> E[] getEnumValues(Class<E> enumClass) {
// getEnumConstants() is the public API for this; reading the compiler-generated
// $VALUES field depends on javac implementation details
return enumClass.getEnumConstants();
}
}
And here is a simple JUnit test:
public class EnumResolverTest {
@Test
public void testSignalEnum() {
EnumResolver signalResolver = new EnumResolver("com.abc.SignalEnum");
assertEquals(SignalEnum.RED, signalResolver.resolve("1"));
assertEquals(SignalEnum.GREEN, signalResolver.resolve("2"));
assertEquals(SignalEnum.ORANGE, signalResolver.resolve("3"));
}
@Test
public void testMobileEnum() {
EnumResolver mobileResolver = new EnumResolver("com.abc.MobileEnum");
assertEquals(MobileEnum.SAMSUNG, mobileResolver.resolve("1"));
assertEquals(MobileEnum.NOKIA, mobileResolver.resolve("2"));
assertEquals(MobileEnum.APPLE, mobileResolver.resolve("3"));
}
}
And again, for performance's sake, you can instantiate the various resolvers once and put them into a separate map:
Map<String, EnumResolver> resolverMap = new ConcurrentHashMap<>();
resolverMap.put("com.abc.MobileEnum", new EnumResolver("com.abc.MobileEnum"));
resolverMap.put("com.abc.SignalEnum", new EnumResolver("com.abc.SignalEnum"));
// etc
Information inf = new Information();
inf.setType("com.abc.SignalEnum");
inf.setValue("1");
SignalEnum red = (SignalEnum) resolverMap.get(inf.getType()).resolve(inf.getValue());
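If you are able to modify the enums, a lighter alternative is to have them implement a common interface and use Class.getEnumConstants(), which avoids reading private fields reflectively. Coded, code() and CodedLookup are hypothetical names; enums you cannot change still need the field-based resolver above.

```java
// Hypothetical common interface; this only works if the enums can be edited to implement it.
interface Coded {
    String code();
}

enum SignalEnum implements Coded {
    RED("1"), GREEN("2"), ORANGE("3");

    private final String sign;

    SignalEnum(String sign) { this.sign = sign; }

    @Override
    public String code() { return sign; }
}

enum MobileEnum implements Coded {
    SAMSUNG("1"), NOKIA("2"), APPLE("3");

    private final String mobile;

    MobileEnum(String mobile) { this.mobile = mobile; }

    @Override
    public String code() { return mobile; }
}

final class CodedLookup {
    private CodedLookup() {}

    /** Returns the constant of enumType whose code() equals code, or null if none does. */
    static Enum<?> resolve(Class<?> enumType, String code) {
        if (!enumType.isEnum() || !Coded.class.isAssignableFrom(enumType)) {
            throw new IllegalArgumentException(enumType.getName() + " is not an enum implementing Coded");
        }
        for (Object constant : enumType.getEnumConstants()) {
            if (((Coded) constant).code().equals(code)) {
                return (Enum<?>) constant;
            }
        }
        return null;
    }

    /** Same lookup, starting from a fully qualified class name such as Information's type field. */
    static Enum<?> resolve(String className, String code) {
        try {
            return resolve(Class.forName(className), code);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such class: " + className, e);
        }
    }
}
```

With this in place, inf.setcValue(CodedLookup.resolve(inf.getType(), inf.getValue()).name()) would fill cValue without a cast per enum type (assuming the lookup finds a match).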
I want to create a custom Spark Transformer in Java.
The Transformer is a text preprocessor that acts like a Tokenizer. It takes an input column and an output column as parameters.
I looked around and found two Scala traits, HasInputCol and HasOutputCol.
How can I create a class that extends Transformer and implements HasInputCol and HasOutputCol?
My goal is to have something like this:
// Dataset that has a String column named "text"
Dataset<Row> dataset;
CustomTransformer customTransformer = new CustomTransformer();
customTransformer.setInputCol("text");
customTransformer.setOutputCol("result");
// result that has 2 String columns named "text" and "result"
Dataset<Row> result = customTransformer.transform(dataset);
As SergGr suggested, you can extend UnaryTransformer. However, it is quite tricky.
NOTE: All the below comments apply to Spark version 2.2.0.
To address the issue described in SPARK-12606, where they were getting "...Param null__inputCol does not belong to...", you should implement String uid() like this:
@Override
public String uid() {
return getUid();
}
private String uid; // no initializer: one would reset the value assigned lazily during super()
private String getUid() {
if (uid == null) {
uid = Identifiable$.MODULE$.randomUID("mycustom");
}
return uid;
}
Apparently they were initializing uid in the constructor. But the thing is that UnaryTransformer's inputCol (and outputCol) is initialized before uid is initialized in the inheriting class. See HasInputCol:
final val inputCol: Param[String] = new Param[String](this, "inputCol", "input column name")
This is how Param is constructed:
def this(parent: Identifiable, name: String, doc: String) = this(parent.uid, name, doc)
Thus, when parent.uid is evaluated, the custom uid() implementation is called and at this point uid is still null. By implementing uid() with lazy evaluation you make sure uid() never returns null.
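The initialization-order problem described above can be reproduced in plain Java without Spark: a superclass constructor calls an overridable uid() before the subclass's field initializers run. The classes below are made up for illustration; Parent stands in for UnaryTransformer/HasInputCol, and paramParent stands in for the uid a Param records as its parent.

```java
import java.util.UUID;

// Stand-in for UnaryTransformer/HasInputCol: its constructor records this.uid(),
// the way new Param(this, ...) does in Spark.
abstract class Parent {
    final String paramParent;

    Parent() {
        paramParent = uid(); // runs before any subclass field initializer
    }

    abstract String uid();

    static String randomUid(String prefix) {
        return prefix + "_" + UUID.randomUUID();
    }
}

// Eager field: its initializer runs only after Parent() has returned,
// so paramParent captured null (the "null__inputCol" symptom).
class EagerUid extends Parent {
    private final String uid = randomUid("eager");

    @Override
    String uid() {
        return uid;
    }
}

// Lazy field without an initializer: the value assigned during Parent()
// survives, and every later call returns the same uid.
class LazyUid extends Parent {
    private String uid;

    @Override
    String uid() {
        if (uid == null) {
            uid = randomUid("lazy");
        }
        return uid;
    }
}
```

Returning a fresh randomUid(...) on every call instead would give the second symptom quoted below: the recorded parent and the later uid() would differ.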
In your case though:
Param d7ac3108-799c-4aed-a093-c85d12833a4e__inputCol does not belong to fe3d99ba-e4eb-4e95-9412-f84188d936e3
it seems to be a bit different. Because "d7ac3108-799c-4aed-a093-c85d12833a4e" != "fe3d99ba-e4eb-4e95-9412-f84188d936e3", it looks like your implementation of the uid() method returns a new value on each call. Perhaps in your case it was implemented like this:
@Override
public String uid() {
return Identifiable$.MODULE$.randomUID("mycustom");
}
By the way, when extending UnaryTransformer, make sure the transform function is Serializable.
You probably want to inherit your CustomTransformer from org.apache.spark.ml.UnaryTransformer. You may try something like this:
import org.apache.spark.ml.UnaryTransformer;
import org.apache.spark.ml.util.Identifiable$;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import scala.Function1;
import scala.collection.JavaConversions$;
import scala.collection.immutable.Seq;
import java.util.Arrays;
public class MyCustomTransformer extends UnaryTransformer<String, scala.collection.immutable.Seq<String>, MyCustomTransformer>
{
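// Lazily initialized: UnaryTransformer's inherited params call uid() before this
// class's field initializers run, so an eagerly initialized field would still be null then.
private String uid;
@Override
public String uid()
{
if (uid == null)
uid = Identifiable$.MODULE$.randomUID("mycustom");
return uid;
}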
@Override
public Function1<String, scala.collection.immutable.Seq<String>> createTransformFunc()
{
// can't use lambda syntax :(
return new scala.runtime.AbstractFunction1<String, Seq<String>>()
{
@Override
public Seq<String> apply(String s)
{
// do the logic
String[] split = s.toLowerCase().split("\\s");
// convert to Scala type
return JavaConversions$.MODULE$.iterableAsScalaIterable(Arrays.asList(split)).toList();
}
};
}
@Override
public void validateInputType(DataType inputType)
{
super.validateInputType(inputType);
if (inputType != DataTypes.StringType)
throw new IllegalArgumentException("Input type must be string type but got " + inputType + ".");
}
@Override
public DataType outputDataType()
{
return DataTypes.createArrayType(DataTypes.StringType, true); // or false? depends on your data
}
}
I'm a bit late to the party, but I have a few examples of custom Java Spark transforms here: https://github.com/dafrenchyman/spark/tree/master/src/main/java/com/mrsharky/spark/ml/feature
Here's an example with just an input column, but you can easily add an output column following the same patterns. This doesn't implement the readers and writers though. You'll need to check the link above to see how to do that.
public class DropColumns extends Transformer implements Serializable,
DefaultParamsWritable {
private StringArrayParam _inputCols;
private final String _uid;
public DropColumns(String uid) {
_uid = uid;
}
public DropColumns() {
_uid = DropColumns.class.getName() + "_" +
UUID.randomUUID().toString();
}
// Getters
public String[] getInputCols() { return get(_inputCols).get(); }
// Setters
public DropColumns setInputCols(String[] columns) {
_inputCols = inputCols();
set(_inputCols, columns);
return this;
}
public DropColumns setInputCols(List<String> columns) {
String[] columnsString = columns.toArray(new String[columns.size()]);
return setInputCols(columnsString);
}
public DropColumns setInputCols(String column) {
String[] columns = new String[]{column};
return setInputCols(columns);
}
// Overrides
@Override
public Dataset<Row> transform(Dataset<?> data) {
List<String> dropCol = new ArrayList<String>();
Dataset<Row> newData = null;
try {
for (String currColumn : this.get(_inputCols).get() ) {
dropCol.add(currColumn);
}
Seq<String> seqCol = JavaConverters.asScalaIteratorConverter(dropCol.iterator()).asScala().toSeq();
newData = data.drop(seqCol);
} catch (Exception ex) {
ex.printStackTrace();
}
return newData;
}
@Override
public Transformer copy(ParamMap extra) {
DropColumns copied = new DropColumns();
copied.setInputCols(this.getInputCols());
return copied;
}
@Override
public StructType transformSchema(StructType oldSchema) {
StructField[] fields = oldSchema.fields();
List<StructField> newFields = new ArrayList<StructField>();
List<String> columnsToRemove = Arrays.asList( get(_inputCols).get() );
for (StructField currField : fields) {
String fieldName = currField.name();
if (!columnsToRemove.contains(fieldName)) {
newFields.add(currField);
}
}
StructType schema = DataTypes.createStructType(newFields);
return schema;
}
@Override
public String uid() {
return _uid;
}
@Override
public MLWriter write() {
return new DropColumnsWriter(this);
}
@Override
public void save(String path) throws IOException {
write().saveImpl(path);
}
public static MLReader<DropColumns> read() {
return new DropColumnsReader();
}
public StringArrayParam inputCols() {
return new StringArrayParam(this, "inputCols", "Columns to be dropped");
}
public DropColumns load(String path) {
return ( (DropColumnsReader) read()).load(path);
}
}
Even later to the party, I have another update. I had a hard time finding information on extending Spark Transformers in Java, so I am posting my findings here.
I have also been working on custom transformers in Java. At the time of writing, it is a little easier to include save/load functionality. One can create writable parameters by implementing DefaultParamsWritable. Implementing DefaultParamsReadable, however, still results in an exception for me, but there is a simple work-around.
Here is the basic implementation of a column renamer:
public class ColumnRenamer extends Transformer implements DefaultParamsWritable {
/**
* A custom Spark transformer that renames the inputCols to the outputCols.
*
* We would also like to implement DefaultParamsReadable<ColumnRenamer>, but
* there appears to be a bug in DefaultParamsReadable when used in Java, see:
* https://issues.apache.org/jira/browse/SPARK-17048
**/
private final String uid_;
private StringArrayParam inputCols_;
private StringArrayParam outputCols_;
private HashMap<String, String> renameMap;
public ColumnRenamer() {
this(Identifiable.randomUID("ColumnRenamer"));
}
public ColumnRenamer(String uid) {
this.uid_ = uid;
init();
}
@Override
public String uid() {
return uid_;
}
@Override
public Transformer copy(ParamMap extra) {
return defaultCopy(extra);
}
/**
* The below method is a work around, see:
* https://issues.apache.org/jira/browse/SPARK-17048
**/
public static MLReader<ColumnRenamer> read() {
return new DefaultParamsReader<>();
}
public Dataset<Row> transform(Dataset<?> dataset) {
Dataset<Row> transformedDataset = dataset.toDF();
// Check schema.
transformSchema(transformedDataset.schema(), true); // logging = true
// Rename columns.
for (Map.Entry<String, String> entry: renameMap.entrySet()) {
String inputColName = entry.getKey();
String outputColName = entry.getValue();
transformedDataset = transformedDataset
.withColumnRenamed(inputColName, outputColName);
}
return transformedDataset;
}
@Override
public StructType transformSchema(StructType schema) {
// Validate the parameters here...
String[] inputCols = getInputCols();
String[] outputCols = getOutputCols();
// Create rename mapping.
renameMap = new HashMap<> ();
for (int i = 0; i < inputCols.length; i++) {
renameMap.put(inputCols[i], outputCols[i]);
}
// Rename columns.
ArrayList<StructField> fields = new ArrayList<> ();
for (StructField field: schema.fields()) {
String columnName = field.name();
if (renameMap.containsKey(columnName)) {
columnName = renameMap.get(columnName);
}
fields.add(new StructField(
columnName, field.dataType(), field.nullable(), field.metadata()
));
}
// Return as StructType.
return new StructType(fields.toArray(new StructField[0]));
}
private void init() {
inputCols_ = new StringArrayParam(this, "inputCols", "input column names");
outputCols_ = new StringArrayParam(this, "outputCols", "output column names");
}
public StringArrayParam inputCols() {
return inputCols_;
}
public ColumnRenamer setInputCols(String[] value) {
set(inputCols_, value);
return this;
}
public String[] getInputCols() {
return getOrDefault(inputCols_);
}
public StringArrayParam outputCols() {
return outputCols_;
}
public ColumnRenamer setOutputCols(String[] value) {
set(outputCols_, value);
return this;
}
public String[] getOutputCols() {
return getOrDefault(outputCols_);
}
}
I'm using Java 1.6, and I know that from Java 1.7 on there is an option to switch on strings. Here I use if/else-if to route on the type name; my question is whether there is an elegant way to change it to a switch anyway.
public static SwitchType<?> switchT(final String typeName,
final String memberName) {
if (typeName.equals("java.lang.String")) {
return new SwitchInputType<String>(new String(memberName + " "));
} else if (typeName.equals("char")) {
return new SwitchInputType<Character>(new Character('a'));
} else if (typeName.equals("decimal") ||
typeName.equals("java.math.BigDecimal")) {
return new SwitchInputType<BigDecimal>(new BigDecimal("34.58"));
} else if (typeName.equals("boolean")) {
// ...
}
// ...
}
You could use a Map<String, SwitchTypeFactory>:
public interface SwitchTypeFactory {
SwitchType<?> create(String memberName);
}
...
private static Map<String, SwitchTypeFactory> factories = new HashMap<String, SwitchTypeFactory>();
static {
factories.put("java.lang.String", new SwitchTypeFactory() {
@Override
public SwitchType<?> create(String memberName) {
return new SwitchInputType<String>(memberName + " ");
}
});
factories.put("char", new SwitchTypeFactory() {
@Override
public SwitchType<?> create(String memberName) {
return new SwitchInputType<Character>(Character.valueOf('a'));
}
});
...
}
public static SwitchType<?> switchT(final String typeName, final String memberName) {
return factories.get(typeName).create(memberName);
}
Many patterns are available, from an enumeration to a Map<String, Implementation>, but none of them will be more concise or faster than what you have in this particular case. They would only make sense if more code depended on this typeName.
Although it might be a little counterintuitive, using an enum has proven quite powerful in that regard.
Every enum has a valueOf(String) method that returns the constant with that name. You can then use the retrieved constant in a switch statement. The only ugly part is that valueOf(String) throws an IllegalArgumentException for unknown names, so catching it is equivalent to a default case.
enum Type {
JAVA_LANG_STRING,
CHAR,
DECIMAL,
BOOLEAN,
JAVA_MATH_BIGDECIMAL
}
public static SwitchType<?> switchT(final String typeName,
final String memberName) {
try {
Type t = Type.valueOf(typeName.toUpperCase().replace(".", "_"));
switch (t) {
// enum case labels must be unqualified (CHAR, not Type.CHAR)
case JAVA_LANG_STRING: return new SwitchInputType<String>(new String(memberName + " "));
case CHAR: return new SwitchInputType<Character>(new Character('a'));
case DECIMAL:
case JAVA_MATH_BIGDECIMAL: return new SwitchInputType<BigDecimal>(new BigDecimal("34.58"));
default: return null; // e.g. BOOLEAN, not handled here
}
} catch (IllegalArgumentException e) {
// default case: unknown type name
return null;
}
}
Enums may also implement interfaces, either by providing one implementation per constant or one shared implementation.
interface SwitchInputTypeFactory {
SwitchInputType<?> get(String arg);
}
enum TypeName implements SwitchInputTypeFactory {
CHAR {
public SwitchInputType<?> get(String arg) {
return new SwitchInputType<Character>(new Character('a'));
}
}
// [...]
}
public static SwitchType<?> switchT(final String typeName,
final String memberName) {
try {
SwitchInputTypeFactory t = TypeName.valueOf(typeName.toUpperCase().replace(".", "_"));
return t.get(memberName);
} catch (IllegalArgumentException e) {
// default case
return null;
}
}
The second way makes it very easy to extend the functionality (as long as it stays in one module; subclassing is not possible with enums).
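Putting the interface-based variant together, here is a self-contained sketch that sticks to Java 6 syntax (no diamond, no lambdas). SwitchType and SwitchInputType are minimal stand-ins whose shape is assumed, since the question does not show them; an unknown type name falls through to null as the default case.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Assumed minimal shapes for the question's types.
interface SwitchType<T> {
    T value();
}

class SwitchInputType<T> implements SwitchType<T> {
    private final T value;

    SwitchInputType(T value) { this.value = value; }

    public T value() { return value; }
}

interface SwitchInputTypeFactory {
    SwitchType<?> get(String memberName);
}

// Each constant supplies its own implementation of the interface method.
enum TypeName implements SwitchInputTypeFactory {
    JAVA_LANG_STRING {
        @Override
        public SwitchType<?> get(String memberName) {
            return new SwitchInputType<String>(memberName + " ");
        }
    },
    CHAR {
        @Override
        public SwitchType<?> get(String memberName) {
            return new SwitchInputType<Character>(Character.valueOf('a'));
        }
    },
    DECIMAL {
        @Override
        public SwitchType<?> get(String memberName) {
            return new SwitchInputType<BigDecimal>(new BigDecimal("34.58"));
        }
    },
    // "java.math.BigDecimal" shares DECIMAL's behaviour.
    JAVA_MATH_BIGDECIMAL {
        @Override
        public SwitchType<?> get(String memberName) {
            return DECIMAL.get(memberName);
        }
    };
}

final class Switches {
    private Switches() {}

    /** Maps e.g. "java.lang.String" to JAVA_LANG_STRING; returns null when no constant matches. */
    static SwitchType<?> switchT(String typeName, String memberName) {
        try {
            String key = typeName.toUpperCase(Locale.ENGLISH).replace('.', '_');
            return TypeName.valueOf(key).get(memberName);
        } catch (IllegalArgumentException e) {
            return null; // default case
        }
    }
}
```

Adding a new type name then means adding one constant, with no if/else chain to edit.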