Hadoop Custom Partitioner not behaving according to the logic

I based this on an existing example that works, and tried the same approach on my dataset.
Sample Dataset:
OBSERVATION;2474472;137176;
OBSERVATION;2474473;137176;
OBSERVATION;2474474;137176;
OBSERVATION;2474475;137177;
Each line is treated as a string. My Mapper output is:
key -> string[2] (the ID in the third column), value -> the whole line.
My Partitioner code:
@Override
public int getPartition(Text key, Text value, int reducersDefined) {
String keyStr = key.toString();
if(keyStr == "137176") {
return 0;
} else {
return 1 % reducersDefined;
}
}
In my dataset most IDs are 137176. I declared 2 reducers. I expect two output files, one for 137176 and a second for the remaining IDs. I do get two output files, but the IDs are evenly distributed across both. What is going wrong in my program?

Explicitly set your custom Partitioner in the driver with job.setPartitionerClass(YourPartitioner.class);. If you don't, the default HashPartitioner is used.
Compare strings with .equals() instead of ==, i.e. change if(keyStr == "137176") { to if(keyStr.equals("137176")) {. The == operator compares object references, and the String returned by key.toString() is a new object, not the literal.
To save some time, it may be faster to declare a Text constant in the partitioner, e.g. Text KEY = new Text("137176");, and compare the incoming key with it directly (again using equals()) instead of converting the key to a String on every call. The two may well perform the same, though. So, what I suggest is:
private static final Text KEY = new Text("137176");

@Override
public int getPartition(Text key, Text value, int reducersDefined) {
    return key.equals(KEY) ? 0 : 1 % reducersDefined;
}
Another suggestion: if the network load is heavy, parse the map output key as a VIntWritable and change the Partitioner accordingly.
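The difference between == and equals() is easy to reproduce outside Hadoop. The sketch below is plain Java with no Hadoop dependency; the class name PartitionSketch and the helper partitionFor are made up for illustration, and the String parameter stands in for the Text key.

```java
// Hypothetical stand-in for the partitioner logic; no Hadoop classes involved.
final class PartitionSketch {
    static final String HOT_KEY = "137176";

    // Mirrors getPartition: the hot key goes to partition 0, everything else
    // to partition 1 (or to 0 when only one reducer is configured).
    static int partitionFor(String key, int reducersDefined) {
        return HOT_KEY.equals(key) ? 0 : 1 % reducersDefined;
    }

    public static void main(String[] args) {
        // Built at run time, like a key decoded from a record,
        // so it is a different object from the literal.
        String key = new String("137176");
        System.out.println(key == HOT_KEY);            // false: reference comparison
        System.out.println(key.equals(HOT_KEY));       // true: content comparison
        System.out.println(partitionFor(key, 2));      // 0
        System.out.println(partitionFor("137177", 2)); // 1
    }
}
```

With the == version, the first branch is effectively never taken, so every key would be sent to the same partition if the custom partitioner were in use at all.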

Related

How to compare GraphQL query tree in java

My Spring Boot application generates GraphQL queries, and I want to compare the generated query in my test.
So basically I have two strings: the first contains the actual value and the second the expected value.
I want to parse them into a class or tree node so I can check whether they are equal.
Even if the order of the fields differs, I need to know whether it is the same query.
For example, take these two queries:
Actual:
query Query {
car {
brand
color
year
}
person {
name
age
}
}
Expected:
query Query {
person {
age
name
}
car {
brand
color
year
}
}
I expect these two queries to be treated as semantically the same.
I tried:
Parser parser = new Parser();
Document expectedDocument = parser.parseDocument(expectedValue);
Document actualDocument = parser.parseDocument(actualValue);
if (expectedDocument.isEqualTo(actualDocument)) {
return MatchResult.exactMatch();
}
But I found out that this doesn't help, since isEqualTo only does this:
public boolean isEqualTo(Node o) {
if (this == o) {
return true;
} else {
return o != null && this.getClass() == o.getClass();
}
}
I know with JSON I can use Jackson for this purpose and compare treenodes, or parsing it into a Java object and have my own equals() implementation, but I don't know how to do that for GraphQL Java.
How can I parse my GraphQL query string into an object so that I can compare it?

I recently solved this problem myself. You can reduce the query to a hash and compare the hash values. Collecting the fields in a sorted tree structure makes the result independent of field order. You can take advantage of the QueryTraverser and QueryReducer to accomplish this.
First create the QueryTraverser; exactly how depends on where in the execution you do this. Assuming you are in the AsyncExecutor with access to the ExecutionContext, the snippet below will suffice, but you can also do this in instrumentation or in the data fetcher itself if you so choose:
val queryTraverser = QueryTraverser.newQueryTraverser()
    .schema(context.graphQLSchema)
    .document(context.document)
    .operationName(context.operationDefinition.name)
    .variables(context.executionInput?.variables ?: emptyMap())
    .build()
Next you need an implementation of the reducer, and an accumulation object that adds each field to a tree structure. Here is a simplified version of the accumulation object:
class MyAccumulation {
    /**
     * A sorted map of the field node of the query and its arguments
     */
    private val fieldPaths = TreeMap<String, String>()

    /**
     * Add a given field and arguments to the sorted map.
     */
    fun addFieldPath(path: String, arguments: String) {
        fieldPaths[path] = arguments
    }

    /**
     * Function to generate the query hash
     */
    fun toHash(): String {
        val joinedFields = fieldPaths.entries
            .joinToString("") { "${it.key}[${it.value}]" }
        // Placeholder: substitute whichever hash function you use
        return HashingLibrary.hashingfunction(joinedFields)
    }
}
A sample reducer implementation would look like this:
class MyReducer : QueryReducer<MyAccumulation> {
    override fun reduceField(
        fieldEnvironment: QueryVisitorFieldEnvironment,
        acc: MyAccumulation
    ): MyAccumulation {
        if (fieldEnvironment.isTypeNameIntrospectionField) {
            return acc
        }
        // Get your field path. This should account for the same node
        // with different parents, and you should recursively traverse
        // the parent environment to construct it.
        val fieldPath = getFieldPath(fieldEnvironment)
        // Provide a reproducible stringified form of the arguments
        val arguments = getArguments(fieldEnvironment.arguments)
        acc.addFieldPath(fieldPath, arguments)
        return acc
    }
}
Finally, put it all together:
val queryHash = queryTraverser
    .reducePreOrder(MyReducer(), MyAccumulation())
    .toHash()
You can now generate a reproducible hash for a query that does not depend on the order in which fields appear, only on the fields that were actually requested.
Note: These code snippets are in Kotlin but translate directly to Java.
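For readers working in Java, the accumulation object might be sketched as follows. This is an illustration only: the class name QueryAccumulation is made up, SHA-256 is an assumed stand-in for the unspecified hashing function, and the "car/brand" style of field path in the usage note is a hypothetical format. It does not touch graphql-java; it only shows why a sorted map makes the hash independent of insertion order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Java sketch of the accumulation object. SHA-256 is an assumption;
// the answer above leaves the hash function unspecified.
final class QueryAccumulation {
    // Sorted by field path, so insertion order does not affect the hash.
    private final TreeMap<String, String> fieldPaths = new TreeMap<>();

    void addFieldPath(String path, String arguments) {
        fieldPaths.put(path, arguments);
    }

    String toHash() {
        StringBuilder joined = new StringBuilder();
        for (Map.Entry<String, String> e : fieldPaths.entrySet()) {
            joined.append(e.getKey()).append('[').append(e.getValue()).append(']');
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(joined.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(ex);
        }
    }
}
```

Adding "car/brand" then "person/name" to one instance, and the same two paths in the opposite order to another, yields equal hashes, which is the property the test in the question needs.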

Depending on how important this comparison is, you can inspect all the elements of the Document to determine equality.
If the goal is an optimization that returns the same result for the same input, I would recommend simply comparing the strings and keeping two entries (one for each input string).
If you really want to go the deep-compare route, you can check the selectionSet and compare each selection.
Take a look at the screenshot:
You can also give EqualsBuilder.reflectionEquals(Object, Object) a try, but it might inspect too deeply (I tried it and it returned false).

Handle long min value condition

When I run my program, Long.MIN_VALUE is persisted instead of the original value coming from the backend.
I am using this code:
if (columnName.equals(Fields.NOTIONAL)) {
    orderData.notional(getNewValue(data));
}
As a result I get Long.MIN_VALUE instead of the original value.
I tried using this method to handle the scenario:
public String getNewValue(Object data) {
return ((Long)data).getLong("0")==Long.MIN_VALUE?"":((Long)data).toString();
}
but it doesn't work.
Please suggest a fix.

EDITED: I misread the code in the question; rereading it, I now get what the author is trying to do, and cleaned up the suggestion as a consequence.
((Long) data).getLong("0") does not do what you think. getLong is a static method, so the data instance plays no part in the result: the call is just Long.getLong("0"), which retrieves the system property named '0' and then attempts to parse it as a Long value. As in, if you start your VM with java -D0=1234 com.foo.YourClass, it returns 1234. I don't even know what you're attempting to accomplish with this call. If no such property is set it returns null, and comparing that null with Long.MIN_VALUE via == unboxes it and throws a NullPointerException. Either way, the check never looks at the value of data, so if data is in fact a Long representing MIN_VALUE you will not get the empty string you wanted.
Try this:
public String getNewValue(Object data) {
    if (data instanceof Number) {
        long v = ((Number) data).longValue();
        return v == Long.MIN_VALUE ? "" : data.toString();
    }
    // what do you want to return if the input isn't a numeric object at all?
    return "";
}
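To see the system-property behaviour of Long.getLong for yourself, a small self-contained check might look like this. The class name LongMinSketch and the two property names are hypothetical, and the choice to return an empty string for non-numeric input is an assumption carried over from the snippet above.

```java
// Shows that Long.getLong reads system properties and is unrelated to any Long instance.
final class LongMinSketch {
    static String getNewValue(Object data) {
        if (data instanceof Number) {
            long v = ((Number) data).longValue();
            return v == Long.MIN_VALUE ? "" : data.toString();
        }
        return ""; // assumption: non-numeric input is treated as "no value"
    }

    public static void main(String[] args) {
        // Hypothetical property names, used only for this demonstration.
        System.out.println(Long.getLong("example.unset.property")); // null: property is absent
        System.setProperty("example.set.property", "1234");
        System.out.println(Long.getLong("example.set.property"));   // 1234
        System.out.println(getNewValue(Long.MIN_VALUE).isEmpty());  // true
        System.out.println(getNewValue(42L));                       // 42
    }
}
```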

Hadoop: MapReduce MinMax result different from original dataset

I am new to Hadoop.
I am trying to use MapReduce to get the min and max monthly precipitation values for each year.
Here is what one year of the dataset looks like:
Product code,Station number,Year,Month,Monthly Precipitation Total (millimetres),Quality
IDCJAC0001,023000,1839,01,11.5,Y
IDCJAC0001,023000,1839,02,11.4,Y
IDCJAC0001,023000,1839,03,20.8,Y
IDCJAC0001,023000,1839,04,10.5,Y
IDCJAC0001,023000,1839,05,4.8,Y
IDCJAC0001,023000,1839,06,90.4,Y
IDCJAC0001,023000,1839,07,54.2,Y
IDCJAC0001,023000,1839,08,97.4,Y
IDCJAC0001,023000,1839,09,41.4,Y
IDCJAC0001,023000,1839,10,40.8,Y
IDCJAC0001,023000,1839,11,113.2,Y
IDCJAC0001,023000,1839,12,8.9,Y
And this is the result I get for the year 1839:
1839 1.31709005E9 1.3172928E9
Obviously, the result does not match the original data, but I cannot figure out why.

Your code has multiple issues.
(1) In MinMaxExposure, you write doubles but read ints. You also use the Double type (meaning that you care about nulls) but do not handle nulls in serialization/deserialization. If you really need nulls, you should write something like this:
// write
out.writeBoolean(value != null);
if (value != null) {
out.writeDouble(value);
}
// read
if (in.readBoolean()) {
value = in.readDouble();
} else {
value = null;
}
If you do not need to store nulls, replace Double with double.
(2) In the map function you wrap your code in IOException catch blocks. This doesn't make sense: if the input has records in an incorrect format, you will most probably get a NullPointerException or NumberFormatException from Double.parseDouble(), and you do not handle those.
Checking for nulls after calling parseDouble also doesn't make sense, because parseDouble returns a primitive double and never returns null.
(3) You pass the map key to the reducer as Text. I would recommend passing the year as an IntWritable (and configuring your job with job.setMapOutputKeyClass(IntWritable.class);).
(4) maxExposure must be handled the same way as minExposure in the reducer code. Currently you just return the value of the last record.
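Point (1) can be checked without a cluster. The sketch below uses only java.io; the class name NullableDoubleIo and its helpers are hypothetical, not part of the asker's code. It round-trips a nullable Double with the flag-then-value layout described above, and also shows what a reader gets when it calls readInt on bytes that were written with writeDouble.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper showing symmetric write/read of a nullable Double.
final class NullableDoubleIo {
    static void write(DataOutput out, Double value) throws IOException {
        out.writeBoolean(value != null); // presence flag first
        if (value != null) {
            out.writeDouble(value);
        }
    }

    static Double read(DataInput in) throws IOException {
        if (in.readBoolean()) {
            return in.readDouble();
        }
        return null;
    }

    // Serializes and deserializes in memory.
    static Double roundTrip(Double value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), value);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The mismatch: eight bytes written as a double, first four read back as an int.
    static int writeDoubleReadInt(double value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeDouble(value);
            return new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling writeDoubleReadInt(11.5) returns the upper 32 bits of the double's bit pattern, a large integer unrelated to 11.5, which is the kind of symptom a write/read mismatch produces.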

Your logic for finding the min and max exposure in the Reducer seems off. You set maxExposure twice and never check whether it is actually the maximum. I'd go with:
public void reduce(Text key, Iterable<MinMaxExposure> values,
        Context context) throws IOException, InterruptedException {
    // Start from the infinities: Double.MIN_VALUE is the smallest
    // positive double, not the most negative one.
    double minExposure = Double.POSITIVE_INFINITY;
    double maxExposure = Double.NEGATIVE_INFINITY;
    for (MinMaxExposure val : values) {
        if (val.getMinExposure() < minExposure) {
            minExposure = val.getMinExposure();
        }
        if (val.getMaxExposure() > maxExposure) {
            maxExposure = val.getMaxExposure();
        }
    }
    MinMaxExposure resultRow = new MinMaxExposure();
    resultRow.setMinExposure(minExposure);
    resultRow.setMaxExposure(maxExposure);
    context.write(key, resultRow);
}
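One detail worth knowing when seeding a running maximum: Double.MIN_VALUE is the smallest positive double, not the most negative one, so the infinities are the safer starting values. The sketch below applies the same fold to the 1839 rows from the question in plain Java; the class name MinMaxSketch and its minMax helper are made up for illustration.

```java
// Plain-Java version of the reducer's min/max fold, without Hadoop types.
final class MinMaxSketch {
    // Returns a two-element array: {min, max}.
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        // Monthly precipitation totals for 1839, copied from the sample data.
        double[] y1839 = {11.5, 11.4, 20.8, 10.5, 4.8, 90.4,
                          54.2, 97.4, 41.4, 40.8, 113.2, 8.9};
        double[] result = minMax(y1839);
        System.out.println(result[0] + " " + result[1]); // 4.8 113.2
        System.out.println(Double.MIN_VALUE > 0);        // true
    }
}
```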

Outputting single file for partitioner

I am trying to get as many reducers as the number of keys.
public class CustomPartitioner extends Partitioner<Text, Text>
{
public int getPartition(Text key, Text value,int numReduceTasks)
{
System.out.println("In CustomP");
return (key.toString().hashCode()) % numReduceTasks;
}
}
Driver class
job6.setMapOutputKeyClass(Text.class);
job6.setMapOutputValueClass(Text.class);
job6.setOutputKeyClass(NullWritable.class);
job6.setOutputValueClass(Text.class);
job6.setMapperClass(LastMapper.class);
job6.setReducerClass(LastReducer.class);
job6.setPartitionerClass(CustomPartitioner.class);
job6.setInputFormatClass(TextInputFormat.class);
job6.setOutputFormatClass(TextOutputFormat.class);
But I am getting the output in a single file.
Am I doing anything wrong?

You cannot control the number of reducers without specifying it :-). Even then, there is no guarantee that every key lands on a different reducer, because you do not know how many distinct keys the input contains, and your hash partition function may return the same number for two distinct keys. To get one reducer per key you have to know the number of distinct keys in advance and write your partition function accordingly.
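Both effects can be seen with plain Java arithmetic. In the sketch below, the class name ReducerIndexSketch and the single-letter keys are hypothetical; the masking with Integer.MAX_VALUE follows the approach of Hadoop's default HashPartitioner and keeps the index non-negative, which a bare hashCode() % n does not guarantee.

```java
// Computes a partition index from a key's hash, without Hadoop types.
final class ReducerIndexSketch {
    static int partitionFor(String key, int numReduceTasks) {
        // Mask off the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "e"};
        for (String key : keys) {
            // With a single reduce task every key lands in partition 0,
            // so the job writes a single output file.
            System.out.println(key + " -> " + partitionFor(key, 1)
                    + " (1 reducer), " + partitionFor(key, 4) + " (4 reducers)");
        }
    }
}
```

With four reducers, "a" and "e" both map to partition 1 here, so distinct keys can still share a reducer unless the partition function is written around the known key set.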

You need to set the number of reduce tasks equal to the number of keys, and your partitioner class must return the partition based on the key. For example, if your input has 4 keys (here wood, Masonry, Reinforced Concrete, etc.), your getPartition method would look like this:
public int getPartition(Text key, PairWritable value, int numReduceTasks) {
    String s = value.getone();
    if (numReduceTasks == 0) {
        return 0;
    }
    if (s.equalsIgnoreCase("wood")) {
        return 0;
    }
    if (s.equalsIgnoreCase("Masonry")) {
        return 1 % numReduceTasks;
    }
    if (s.equalsIgnoreCase("Reinforced Concrete")) {
        return 2 % numReduceTasks;
    }
    if (s.equalsIgnoreCase("Reinforced Masonry")) {
        return 3 % numReduceTasks;
    }
    else
        return 4 % numReduceTasks;
}
}
The corresponding output will be collected in the respective reducers. Try running it from the CLI instead of Eclipse.

You haven't configured the number of reducers to run.
You can configure it using the API below:
job.setNumReduceTasks(10); // change the number according to your cluster
You can also set it when running from the command line (on Hadoop 2 and later the property is named mapreduce.job.reduces; the old name is deprecated):
-D mapred.reduce.tasks=10
Hope this helps.

Veni, you need to chain the tasks as below:
Mapper1 --> Reducer --> Mapper2 (post-processing Mapper which creates a file for each key)
Mapper 2's InputFormat should be NLineInputFormat, so that for each key in the reducer output there is a corresponding mapper, and each mapper's output is a separate file for that key.
Mapper 1 and the Reducer are your existing MR job.
Hope this helps.
Cheers
Nag

ANDROID: If...Else as Switch on String

I'm writing an Android app for work that shows the status of our phone lines, but that's neither here nor there.
I make a call to one of our servers and get returned JSON text of the status. I then parse this putting each line into a SortedMap (TreeMap) with the Key being the name of the line and my own class as the value (which holds status and other details).
This all works fine.
When the app runs it should then show each line and the info I have retrieved, but nothing gets updated.
The JSON is returned and added to the Map correctly.
This is a snapshot of the code that isn't working. I simply iterate through the map and depending on the value of key update the relevant TextView. The problem I am having is that when it gets to the IF statement that matches it never runs that code. It skips it as if values don't match.
I can't see any errors. Is this the only way to do this? As far as I know you can't use switch...case on strings.
Can anyone see my error? I've been coding on Android for 1 week now, so it's probably a newbie error!
Thanks
Neil
Iterator iterator = mapLines.entrySet().iterator();
while(iterator.hasNext())
{
// key=value separator this by Map.Entry to get key and value
Map.Entry<String, Status> mapEntry = (Map.Entry<String, Status>)iterator.next();
// getKey is used to get key of Map
String key = (String)mapEntry.getKey();
// getValue is used to get value of key in Map
Status value = (Status)mapEntry.getValue();
if(key == "Ski")
{
TextView tvStatus = (TextView)findViewById(R.id.SkiStatus);
tvStatus.setText(value.Status);
}
else if(key == "Cruise")
{
TextView tvStatus = (TextView)findViewById(R.id.CruiseStatus);
tvStatus.setText(value.Status);
}
else if(key == "Villas")
{
TextView tvStatus = (TextView)findViewById(R.id.VillasStatus);
tvStatus.setText(value.Status);
}
}

You must use equals() to compare String objects in Java. Otherwise you just compare if the two objects are the same instance of the String class and don't compare their actual content:
if (key.equals("Ski")) {
...
}
Or, to avoid a NullPointerException if key might be null:
if ("Ski".equals(key)) {
...
}
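As a side note on the switch question: on Java 7 and later, a switch statement accepts String values and matches them by content, as equals() does. Whether that is available depends on the language level your Android toolchain targets, so treat the following as an option to check rather than a given. The class LineStatusSketch and the returned view names are hypothetical and stand in for the findViewById calls.

```java
// Maps a line name to the name of the view that should show its status.
final class LineStatusSketch {
    static String viewNameFor(String key) {
        if (key == null) {
            return "unknown"; // switch on a null String would throw NullPointerException
        }
        switch (key) { // matches by content, not by reference
            case "Ski":
                return "SkiStatus";
            case "Cruise":
                return "CruiseStatus";
            case "Villas":
                return "VillasStatus";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        // A key built at run time, like one parsed from JSON.
        String key = new String("Ski");
        System.out.println(key == "Ski");     // false
        System.out.println(viewNameFor(key)); // SkiStatus
    }
}
```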

I prefer to use maps in this case because they eliminate duplicated code and long if-else constructs. I don't know where in your code this snippet occurs, so this may not apply in your case, but I mention it anyway.
Use a Map to get the correct resource for your String and set the status.
The code would look something like this:
First initialize the map:
Map<String, Integer> textViews = new HashMap<String, Integer>();
textViews.put("Ski", R.id.SkiStatus);
textViews.put("Cruise", R.id.CruiseStatus);
textViews.put("Villas", R.id.VillasStatus);
Then retrieve the correct id and set the text:
((TextView) findViewById(textViews.get(key))).setText(value.Status);
This shrinks the big if-else construct considerably, and adding a TextView becomes as easy as adding an entry to the map.

When comparing key with the string literal "Ski", you can write it as below. This prevents a NullPointerException.
if ("Ski".equals(key))
{
...
}
