Pass custom value to Reducer - java

I need to pass the rowkey along to the Reducer: the rowkey is calculated in advance, and the information needed to compute it is no longer available at that stage. (The Reducer executes a Put.)
First I tried to just use inner classes, e.g.:
public class MRMine {
    private byte[] rowkey;
    public void start(Configuration c, Date d) {
        // calc rowkey based on date
        TableMapReduceUtil.initTableMapperJob(...);
        TableMapReduceUtil.initTableReducerJob(...);
    }
    public class MyMapper extends TableMapper<Text, IntWritable> {...}
    public class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {...}
}
and both MyMapper and MyReducer have the default constructor defined. But this approach leads to the following exception(s):
java.lang.RuntimeException: java.lang.NoSuchMethodException: com.mycompany.MRMine$MyMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.NoSuchMethodException: com.company.MRMine$MyMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:2730)
at java.lang.Class.getDeclaredConstructor(Class.java:2004)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
I got rid of the exception by declaring the inner classes static (see "RuntimeException: java.lang.NoSuchMethodException: tfidf$Reduce.<init>()"), but then I'd have to make the rowkey static as well, and I'm running multiple jobs in parallel.
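A minimal, Hadoop-free sketch of why the static modifier matters here: a non-static inner class has no true no-argument constructor, because every constructor implicitly takes the enclosing instance, so a reflective lookup of <init>() fails. The class names below (Outer, InnerMapper, NestedMapper) are hypothetical stand-ins, not the classes from the job above.

```java
import java.lang.reflect.Constructor;

class Outer {
    // Non-static inner class: its constructor implicitly takes the enclosing Outer instance.
    class InnerMapper {
        InnerMapper() {}
    }

    // Static nested class: has a genuine no-argument constructor.
    static class NestedMapper {
        NestedMapper() {}
    }
}

class InnerClassReflectionDemo {
    /** Returns true if clazz can be created through a no-argument constructor via reflection. */
    static boolean hasNoArgConstructor(Class<?> clazz) {
        try {
            Constructor<?> ctor = clazz.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            // Same exception type as in the stack trace above.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inner:  " + hasNoArgConstructor(Outer.InnerMapper.class));
        System.out.println("nested: " + hasNoArgConstructor(Outer.NestedMapper.class));
    }
}
```

The lookup fails for the inner class and succeeds for the static nested one, which matches the behaviour described above.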
I found https://stackoverflow.com/a/6739905/1338732, where the configure method of the Reducer is overridden, but that method doesn't seem to be available anymore. In any case, I wouldn't be able to pass a value along that way.
I was thinking of (mis)using the Configuration by just adding a new key-value pair. Would this work, and is it the correct approach?
Is there a way to pass along any custom value to the reducer?
The versions I'm using are HBase 0.94.6.1 and Hadoop 1.0.4.

Your problem statement is a little unclear; however, I think something like this is what you are looking for.
The way I currently pass information to the reducer is through the configuration.
In the job setup, do the following:
conf.set("someName","someValue");
This creates an entry in the configuration named someName with the value someValue. It can later be retrieved in the Mapper/Reducer as follows:
Configuration conf = context.getConfiguration();
String someVariable = conf.get("someName");
This sets someVariable to "someValue", so the information reaches the reducer.
To pass multiple values, use setStrings(). I haven't tested this function yet, but according to the documentation it takes a variable number of strings and stores them as a single comma-separated value, so either of the following should work:
conf.setStrings("someName","value1,value2,value3");
conf.setStrings("someName","value1","value2","value3");
Retrieve the values using:
Configuration conf = context.getConfiguration();
String[] someValues = conf.getStrings("someName");
Hope this helps
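Since the value in the question is a byte[] rowkey and the configuration stores strings, the key has to be encoded on the way in and decoded on the way out. Here is a rough sketch of that round trip, with a plain Map standing in for Hadoop's Configuration so it runs without a cluster. The property name mrmine.rowkey and the helper names are made up for illustration, and java.util.Base64 assumes Java 8 or later; HBase's own Bytes.toStringBinary / Bytes.toBytesBinary may serve the same purpose.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class RowKeyConfigDemo {
    // Hypothetical property name; any string that is unique within the job works.
    static final String ROWKEY_PROPERTY = "mrmine.rowkey";

    /** Driver side: store the binary rowkey as a Base64 string (conf.set(...) in Hadoop). */
    static void putRowKey(Map<String, String> conf, byte[] rowKey) {
        conf.put(ROWKEY_PROPERTY, Base64.getEncoder().encodeToString(rowKey));
    }

    /** Task side, e.g. in the reducer's setup(): decode the string back to bytes. */
    static byte[] getRowKey(Map<String, String> conf) {
        String encoded = conf.get(ROWKEY_PROPERTY);
        if (encoded == null) {
            throw new IllegalStateException("missing property " + ROWKEY_PROPERTY);
        }
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // stand-in for Configuration
        byte[] rowKey = {0x00, (byte) 0xFF, 0x32, 0x30};
        putRowKey(conf, rowKey);
        System.out.println(Arrays.equals(rowKey, getRowKey(conf)));
    }
}
```

Because each job is set up with its own configuration, jobs running in parallel should not see each other's rowkey, which avoids the static-field problem from the question.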

The goal is a little unclear, but I have found that for many types of jobs involving HBase, you do not need a reducer to put data into HBase. The mapper reads a row, modifies it in some way, then writes it back.
Obviously there are jobs for which that is inappropriate (any type of aggregation for example), but the reduce stage can really slow down a job.

Related

How to print a Function<A,B>?

I would like to create a central logger class that logs messages like:
Couldn't find a record of type MyClass when searching for field MyField
Currently, I have this piece of code:
public static <T, A, B> String logNotFound(final Class<T> type, String field) {
return String.format(DATA_NOT_FOUND, type.getSimpleName(), field);
}
And I call it like:
Optional<Person> person = findPersonByLastName("Smith");
if (person.isEmpty()) logNotFound(Person.class, "lastName");
However, I don't quite like passing the field name as a string. I would like to call the log method as
logNotFound(Person.class, Person::getLastName)
passing a Function<A,B> as a parameter. I expect a message like
Couldn't find a record of type Person when searching for field Person::getLastName
Is there a way to do this?
This seems to be an XY problem: you actually need to log errors in a convenient way, but do not know how to refer to class fields in code. There are two options:
Lombok: @FieldNameConstants (there are some issues)
Implement an annotation processor that generates a metamodel for your classes (or, if you are on Hibernate, use the existing one: https://vladmihalcea.com/jpa-criteria-metamodel/)
It is possible. Use the safety-mirror library (or analyze its source code and implement it your own way):
assertEquals("isEmpty", Fun.getName(String::isEmpty));
You can read more details here: Printing debug info on errors with java 8 lambda expressions
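For illustration only, here is one way such a name lookup can be done with the JDK alone. This is a sketch of the serializable-lambda technique, not necessarily how safety-mirror does it. It assumes the getter is passed through a functional interface that also extends Serializable (NamedGetter below is a hypothetical name), and it relies on the compiler-generated writeReplace method of serializable lambdas, so treat it as fragile.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

/** A Function that is also Serializable, so the lambda carries a SerializedLambda form. */
interface NamedGetter<A, B> extends Function<A, B>, Serializable {}

class Person {
    private final String lastName;
    public Person(String lastName) { this.lastName = lastName; }
    public String getLastName() { return lastName; }
}

class MethodRefNames {
    static final String DATA_NOT_FOUND =
            "Couldn't find a record of type %s when searching for field %s";

    /** Extracts the name of the method a method reference points to. */
    static String nameOf(NamedGetter<?, ?> getter) {
        try {
            // Serializable lambdas expose a synthetic writeReplace() returning SerializedLambda.
            Method writeReplace = getter.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda lambda = (SerializedLambda) writeReplace.invoke(getter);
            return lambda.getImplMethodName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("not a serializable method reference", e);
        }
    }

    static <T> String logNotFound(Class<T> type, NamedGetter<T, ?> field) {
        return String.format(DATA_NOT_FOUND, type.getSimpleName(), nameOf(field));
    }

    public static void main(String[] args) {
        System.out.println(logNotFound(Person.class, Person::getLastName));
    }
}
```

With this sketch the message names the method (getLastName) rather than printing Person::getLastName literally. For a hand-written lambda instead of a method reference, the reported name would be a synthetic one, so the approach only makes sense for plain getter references.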

Using a passed-in value for @JmsListener's destination parameter

Is there a specific way to accomplish this? I tried to find a solution on here but couldn't find what I need. I have a Spring Boot application that will be accepting multiple arguments from the command line. The argument in question is the queue name (i.e. the destination). It can be one of several of our many queues. The JmsListener is in the form
@JmsListener(destination="dest_goes_here")
public void processOrder(Message message) { ... }
I have a class that basically looks like this
public class Arguments {
    private static String queue;
    private static String anotherArg;
    // ...
    // getters and setters
}
What I would like to write is destination = Arguments.getQueue(), but it seems destination can only be a static final variable? I assume this because the error shows a tooltip that alludes to that.
I also tested it: I have another class called Constants that, obviously, contains constants. If I hard-code the queue name as public static final String QUEUE = "MyQ"; and then write destination = Constants.QUEUE, that is accepted.
So I assumed I could do something like this in my listener class: private static final String QUEUE = Arguments.getQueue(); But the compiler doesn't accept that either. Alas, I am stumped.
So there are really two questions here, if anyone is willing to share their knowledge. Why is @JmsListener OK with destination being set as in my second attempt, but not as in the first and the last?
And then the main question (which I'd prefer you answer over the first): what strategies can I use to set destination to a value that originates from the command line (i.e. is dynamic)?
Edit: To clarify, I cannot keep the value in my Constants class, as the value will be coming from the command line and needs to be passed to the JmsListener class to be used as the destination.
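That's how Java works: an annotation element value such as destination must be a compile-time constant expression, and a method invocation isn't one. That is also why private static final String QUEUE = Arguments.getQueue(); is rejected: the field is final, but its initializer is not a constant expression, so QUEUE is not a constant variable. Take a look at the official language specification for more details. EDIT: you can also look at this answer.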
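The rule can be tried out without Spring. In the sketch below, @Listener is a hypothetical stand-in for @JmsListener with a single String element, and the class and field names are made up. The commented-out line is the variant the compiler rejects.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-in for @JmsListener with a single String element.
@Retention(RetentionPolicy.RUNTIME)
@interface Listener {
    String destination();
}

class Constants {
    // Constant variable: final, of type String, initialized with a constant expression.
    static final String QUEUE = "MyQ";

    // Final, but NOT a constant variable: the initializer is a method call.
    static final String RUNTIME_QUEUE = System.getProperty("queue", "MyQ");
}

class ListenerDemo {
    @Listener(destination = Constants.QUEUE) // accepted by the compiler
    // @Listener(destination = Constants.RUNTIME_QUEUE) // rejected: not a constant expression
    public void processOrder(String message) {}

    /** Reads the destination back from the annotation of the named method. */
    static String destinationOf(String methodName) {
        try {
            return ListenerDemo.class.getMethod(methodName, String.class)
                    .getAnnotation(Listener.class).destination();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(destinationOf("processOrder"));
    }
}
```

Uncommenting the second annotation (and removing the first) should produce a compile error along the lines of "element value must be a constant expression", which is the behaviour described in the question.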
As far as your second (and more important) question goes, I have several suggestions for you.
First, you can read the queue name from a configuration property, like so: destination="${jms.queue.name1}" where jms.queue.name1 is your configuration property. Then, since you are using Spring Boot, you can use command-line arguments to override your configuration properties (see externalized configuration documentation for more details). That way, you'll be able to specify the queue name at runtime by passing it as a command-line argument like so --jms.queue.name1=foo.
Second, you can use programmatic listener registration, like so:
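import org.springframework.context.annotation.Configuration;
import org.springframework.jms.annotation.EnableJms;
import org.springframework.jms.annotation.JmsListenerConfigurer;
import org.springframework.jms.config.JmsListenerEndpointRegistrar;
import org.springframework.jms.config.SimpleJmsListenerEndpoint;
@Configuration
@EnableJms
public class AppConfig implements JmsListenerConfigurer {
    @Override
    public void configureJmsListeners(JmsListenerEndpointRegistrar registrar) {
        SimpleJmsListenerEndpoint endpoint = new SimpleJmsListenerEndpoint();
        endpoint.setId("myJmsEndpoint");
        // The destination is resolved at runtime, so it does not need to be a constant.
        endpoint.setDestination(Arguments.getQueue());
        endpoint.setMessageListener(message -> {
            // processing
        });
        registrar.registerEndpoint(endpoint);
    }
}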

Dynamically add fields to an object using annotations (Java/Groovy)

I'm trying to use Java annotations to add specific fields to an object.
The need is the following: I have a class that processes configuration files in which keys are associated with values in the form key=value.
The problem is that I want to let the user define the required fields himself; if one of them is not present, an exception is thrown.
The easiest solution is to pass these fields to the constructor in a String[], but I also want the user to be able to use these required fields as if they were properties of the class, so that he can write something like:
@RequiredFields(
    field1,
    field2,
    field3)
MyClass myObject = new MyClass(String filePath);
String value = myObject.field1;
and have the field field1 also offered as a completion proposal?
I'm actually developing in Groovy, so if this is not possible in standard Java, would it be in Groovy?
Thanks!
What about a factory method which does the validation? Something like:
MyClass my = MyClassFactory.create("arg1", "arg2")
Or, with maps on Groovy:
def my = MyClassFactory.create arg1: "foo", arg2: "bar"
And the factory itself checks the properties file.
If you really want the annotation, maybe the type annotations introduced in JDK 8 are an option.
In Groovy, you can try a local AST transformation, which seems like an overengineered solution to me, or GContracts, which provides programming by contract.
You could combine a factory with GContracts to @Ensures that the resulting object contains all fields according to the properties file.
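The factory idea above could look roughly like this in plain Java. It is only a sketch: the create signature (a Properties object plus the required field names) is an assumption, and it does not give code completion on the field names, which would still need code generation or an AST transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class MyClass {
    private final Properties values;

    MyClass(Properties values) { this.values = values; }

    String get(String key) { return values.getProperty(key); }
}

class MyClassFactory {
    /** Builds a MyClass, failing fast if any required field is absent from the properties. */
    static MyClass create(Properties props, String... requiredFields) {
        List<String> missing = new ArrayList<>();
        for (String field : requiredFields) {
            if (props.getProperty(field) == null) {
                missing.add(field);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing required fields: " + missing);
        }
        return new MyClass(props);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("field1", "foo");
        props.setProperty("field2", "bar");

        MyClass ok = create(props, "field1", "field2");
        System.out.println(ok.get("field1"));

        try {
            create(props, "field1", "field3"); // field3 is not defined
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In real use, the Properties object would be loaded from the configuration file before being handed to the factory.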

Play: Using a configuration property as the value of an annotation

This (obviously) works:
@Every("10s")
public class Extinguisher extends Job {
    ...
}
...but this doesn't:
@Every(Play.configuration.getProperty("my.setting", "10s"))
public class Extinguisher extends Job {
    ...
}
When running auto-test, the app doesn't start and complains my controllers can't get enhanced because of a NullPointerException encountered by javassist.
Is there a way to configure a job scheduling from application.conf?
You can schedule your job manually from an @OnApplicationStart job:
@OnApplicationStart
public class ExtinguisherBootstrap extends Job {
    public void doJob() {
        new Extinguisher()
            .every(Play.configuration.getProperty("my.setting", "10s"));
    }
}
I don't know whether Play or javassist extend what you can do with the Java language, but I can at least point out that the following line is not legal Java:
@Every(Play.configuration.getProperty("my.setting", "10s"))
For an annotation with a parameter with type T and value V, the Java Language Specification requires that:
If T is a primitive type or String, V is a constant expression
In this case, T, the type of the annotation parameter, is a String, but the value you're trying to set to it isn't a (compile-time) constant expression.
The issue is that "configuration" won't be available at that stage.
I don't think what you want to do is possible (as far as I currently know Play; maybe someone knows a trick to make it work).
You may be able to "hack it" by having a job run every few seconds and, in that job, launching the target job according to the configuration. It's less efficient, but it may solve your issue.
You can do something like this:
@On("cron.noon")
Which will look for a line like this in application.conf:
cron.noon = 1s

Sending a variable to the Mapper Class

I am trying to get an input from the user and pass it to the mapper class I have created, but the value always initialises to zero instead of the actual value the user entered.
How can I make sure that the variable always keeps the same value whenever I read it? I have noticed that job1.setMapperClass(Parallel_for.class); creates an instance of the class, forcing the variable to reinitialize to its original value. Below are links to the two classes. I am trying to get the value of times from the RunnerTool class.
Link to Java TestFor class
Link to RunnerTool class
// setup method in the Mapper
@Override
public void setup(Context context) {
    int defaultValue = 1;
    times = context.getConfiguration().getInt("parallel_for_iteration", defaultValue);
    LOG.info(context.getConfiguration().get("parallel_for_iteration") + " Actually name from the commandline");
    LOG.info(times + " Actually number of iteration from the commandline");
}
// RunnerTools class
conf.setInt(ITERATION, times);
Note that the mapper class is instantiated anew on many cluster nodes, so any initialization done to a mapper instance in the driver will not affect those nodes. Technically, the relevant jar file(s) are distributed among the nodes, and the mappers are then created there.
So, as pointed out in the other answer, the standard way to pass a small value to the mappers is through the Configuration class.
The Mapper gets initialized by reflection, so you cannot let the user interact with the mapper class.
Instead, you have your Configuration object, which you have to provide when setting up your job. There you can set the value using conf.set("YOUR KEY", "YOUR VALUE"). In your Mapper class you can override a method called setup(Context context), where you can get the value using context.getConfiguration().get("YOUR KEY") and, if needed, save it in a field of your mapper.
