Pig UDF Throwing NullPointerException When Generating New Tuple - java

I have a Pig UDF which ingests some data and then attempts to transform that data in a minimal manner.
my_data = LOAD 'path/to/data' USING SomeCustomLoader();
my_other_data = FOREACH my_data GENERATE MyUDF(COL_1, COL_2, $param1, $param2) as output;
my_final_data = FOREACH my_other_data GENERATE output.NEW_COL1, output.NEW_COL2, output.NEW_COL3;
However, I keep getting the following error:
ERROR 0: Exception while executing [POUserFunc (Name: POUserFUnc(udf.MyUDF)[tuple] - scope-38 Operator Key: scope-38) children: null at []]: java.lang.NullPointerException
My UDF takes the data and transforms it:
public class MyUDF extends EvalFunc<Tuple> {
public Tuple exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
TupleFactory _factory;
Long fieldOne;
String fieldTwo;
String fieldThree;
_factory.getInstance();
try {
fieldOne = Long.valueOf(input.get(0).toString());
fieldTwo = input.get(1).toString();
fieldThree = input.get(2).toString();
fieldOne = doSomething(fieldOne);
fieldTwo = doSomething(fieldTwo);
fieldThree = doSomething(fieldThree);
return _factory.newTuple(Arrays.asList(fieldOne, fieldTwo, fieldThree));
} catch (Exception ex) {
return _factory.newTuple(Arrays.asList("ParseException", "", "", ""));
}
}
}
I have debugged and confirmed that fieldOne, fieldTwo, and fieldThree do exist prior to calling the tuple factory. It's also clear that the exception is being thrown because the code reaches the catch block and then throws this NullPointerException error.
What is not clear is why on earth this is happening.
According to the Pig docs (Pig 0.14.0 API), I should be able to call newTuple(java.util.List c) with the relevant items.
I have also defined my own Schema to ensure the types are correct when going back to the pig script.

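Your code never assigns a TupleFactory instance to _factory. The statement _factory.getInstance(); calls the static method and discards its result, so _factory is still uninitialized when you call newTuple on it. Assign the result of TupleFactory.getInstance() instead: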
public class ... {
TupleFactory _factory;
public Tuple exec(Tuple input) {
_factory = TupleFactory.getInstance();
...
}
}
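The effect can be reproduced without Pig. The sketch below uses a hypothetical WidgetFactory (not Pig's TupleFactory) to show that calling a static method through a null reference does not fail by itself, but also does not assign anything, so the next instance call throws:

```java
// Hypothetical stand-in for a factory with a static accessor; this is not Pig's TupleFactory.
class WidgetFactory {
    private static final WidgetFactory INSTANCE = new WidgetFactory();

    static WidgetFactory getInstance() {
        return INSTANCE;
    }

    String newWidget(String name) {
        return "widget:" + name;
    }
}

class StaticCallDemo {
    // Mirrors the question: the result of the static call is discarded.
    static boolean brokenVersionThrows() {
        WidgetFactory factory = null;
        factory.getInstance();          // static call: runs fine, but the result is thrown away
        try {
            factory.newWidget("a");     // factory is still null, so this instance call fails
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Mirrors the answer: keep the instance that getInstance() returns.
    static String fixedVersion() {
        WidgetFactory factory = WidgetFactory.getInstance();
        return factory.newWidget("a");
    }

    public static void main(String[] args) {
        System.out.println(brokenVersionThrows());
        System.out.println(fixedVersion());
    }
}
```

If the factory is needed on every call, assigning it once (for example in a field initializer) avoids repeating the lookup inside exec.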

Related

How to create Junit testcases for lambda expression using Mockito Junit 5?

I want to write JUnit test cases for a method that iterates over a List<Map<String,Object>> using forEach with a lambda expression. I want to mock the statement objectMapper.writeValueAsString(recordObj.get("value")); but I do not understand how to get hold of recordObj in the test.
public String apply(MyRequestWrapper requestWrapper) {
String resultStr=null;
final Map<String, List<PubSubEvent>> packagesEventList = AppUtilities.getPackagesEventsMappedList();
try {
logger.debug("Received Record:: " + requestWrapper.getBody().toString());
List<RecordProcessedResult> results = new ArrayList<>();
List<Map<String,Object>> recordMaps= string2List(objectMapper,requestWrapper.getBody().toString());
logger.debug("Parsed received payload ::: "+ LocalDateTime.now() + " batch size is ::: "+ recordMaps.size());
if(! ObjectUtils.isEmpty(recordMaps) && !recordMaps.isEmpty() ) {
recordMaps.forEach(recordObj ->{
ConsumerRecord record=objectMapper.convertValue(recordObj, ConsumerRecord.class);
String topicName = recordObj.get("topic").toString();
String key = null;
String value = null;
String offset = null;
String xTraceabilityId = ((Map<String, String>) recordObj.get("headers")).get(IdTypeConstants.XTRACEABILITYID);
String xCorrelationId = ((Map<String, String>) recordObj.get("headers")).get(IdTypeConstants.XCORRELATIONID);
MDC.put(IdTypeConstants.XTRACEABILITYID, xTraceabilityId);
MDC.put(IdTypeConstants.XCORRELATIONID, xCorrelationId);
try {
key = objectMapper.writeValueAsString(recordObj.get("key"));
value = objectMapper.writeValueAsString(recordObj.get("value"));
offset = objectMapper.writeValueAsString(recordObj.get("offset"));
MyEvent myEvent= objectMapper.readValue(value, MyEvent.class);
subscribedPackageProcessor.setInput(input);
subscribedPackageProcessor.setOutput(output);
subscribedPackageProcessor.setPackagesEventList(packagesEventList);
subscribedPackageProcessor.setRequesterType(requesterType);
subscribedPackageProcessor.processSubscribedPackage(myEvent.getPackageId());
RecordProcessedResult rpr = new RecordProcessedResult(record, true, null, xTraceabilityId, xCorrelationId, key, System.currentTimeMillis());
results.add(rpr);
}
catch(Exception e) {
RecordProcessedResult rpr = new RecordProcessedResult(record, false, ExceptionUtils.getStackTrace(e), xTraceabilityId, xCorrelationId, key, System.currentTimeMillis());
results.add(rpr);
logger.info("Exception occured while processing fund data :::out ", e);
}
MDC.clear();
});
}
resultStr = objectMapper.writeValueAsString(results);
}catch (Exception e) {
logger.debug(e.getMessage());
}
return resultStr;
}
I have tried the following test case.
@Test
void applyTest() throws Exception {
MyEvent myEvent = new MyEvent();
myEvent.setPackageId("test");
MyRequestWrapper flowRequestWrapper= getMyRequestWrapper();
List<Map<String, Object>> maps = string2List(objectMapper1, flowRequestWrapper.getBody().toString());
Map<String,Object> map = new HashMap<String, Object>();
Mockito.when(objectMapper.readValue(Mockito.anyString(), Mockito.any(TypeReference.class))).thenReturn(maps);
Mockito.when(objectMapper.writeValueAsString(Mockito.anyString())).thenReturn("test");
Mockito.when(objectMapper.readValue(Mockito.anyString(), Mockito.eq(MyEvent.class))).thenReturn(myEvent);
//doNothing().when(subscribedPackageProcessor).processSubscribedPackage("");
String response = processESignCompletedEventSvcFlow.apply(flowRequestWrapper);
Assertions.assertNotNull(response);
}
Please help, Thanks
Your method is way too complex to be unit tested. For example it declares dependencies by calling methods in the same class. You cannot mock those and it makes the testing many times more complicated.
List<Map<String,Object>> recordMaps =
string2List(objectMapper,requestWrapper.getBody().toString());
You need to extract the string2List method into a standalone class (with its own unit tests) that is injected into your class as a dependency.
Then you can just mock the string2List class and when you do that, you control the creation of recordObj instances from your unit test for this method.
Your second "sin" is abusing lambdas by creating one that is longer than two lines. Lambdas should be short. If it spans more than a few lines, it must be extracted into a standalone class that can be unit tested separately. And again, when you have extracted this lambda into a standalone class and unit tested it, you can't just go "new RecordObjConsumer(results)" in your method, as that creates a hard-coded dependency that you again cannot mock. You need to design the consumer so that it can be injected into your class as an external dependency.
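A minimal sketch of that refactoring, using plain java.util.function interfaces instead of Mockito so it runs on its own. RecordHandler, BatchProcessor and the "OK:"/"FAILED:" result strings are hypothetical names invented for illustration; in the real code the serializer would be backed by the ObjectMapper and the handler would build RecordProcessedResult objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-record logic, extracted from the lambda so it can be tested alone.
class RecordHandler implements Function<Map<String, Object>, String> {
    // Injected dependency; stands in for objectMapper.writeValueAsString(...).
    private final Function<Object, String> serializer;

    RecordHandler(Function<Object, String> serializer) {
        this.serializer = serializer;
    }

    @Override
    public String apply(Map<String, Object> recordObj) {
        try {
            return "OK:" + serializer.apply(recordObj.get("value"));
        } catch (RuntimeException e) {
            return "FAILED:" + e.getMessage();
        }
    }
}

// Hypothetical outer class: it only loops and delegates, so a test can inject a stub handler.
class BatchProcessor {
    private final Function<Map<String, Object>, String> handler;

    BatchProcessor(Function<Map<String, Object>, String> handler) {
        this.handler = handler;
    }

    List<String> process(List<Map<String, Object>> records) {
        List<String> results = new ArrayList<>();
        for (Map<String, Object> recordObj : records) {
            results.add(handler.apply(recordObj));
        }
        return results;
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        Map<String, Object> recordObj = new HashMap<>();
        recordObj.put("value", 42);
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(recordObj);

        RecordHandler handler = new RecordHandler(value -> String.valueOf(value));
        System.out.println(new BatchProcessor(handler).process(records));
    }
}
```

With this shape, the test for the handler builds recordObj itself and passes in a stub serializer, and the test for the outer class passes in a stub handler, so neither test has to reach inside a lambda.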

Java + MongoDB: how get a nested field value using complete path?

I have this path for a MongoDB field, main.inner.leaf, and any of these fields may be absent.
In Java, to avoid a NullPointerException, I have to write:
String leaf = "";
if (document.get("main") != null &&
document.get("main", Document.class).get("inner") != null) {
leaf = document.get("main", Document.class)
.get("inner", Document.class).getString("leaf");
}
In this simple example I set only 3 levels: main, inner and leaf but my documents are deeper.
Is there a way to avoid writing all these null checks?
Like this:
String leaf = document.getString("main.inner.leaf", "");
// "" is the default value if one of the levels doesn't exist
Or using a third party library:
String leaf = DocumentUtils.getNullCheck("main.inner.leaf", "", document);
Many thanks.
Since the intermediate attributes are optional you really have to access the leaf value in a null safe manner.
You could do this yourself using an approach like ...
String leafValue = "";
if (document.containsKey("main")) {
Document _main = document.get("main", Document.class);
if (_main.containsKey("inner")) {
Document _inner = _main.get("inner", Document.class);
if (_inner.containsKey("leaf")) {
leafValue = _inner.getString("leaf");
}
}
}
Note: this could be wrapped up in a utility to make it more user friendly.
Or use a third-party library such as Commons BeanUtils.
But, you cannot avoid null safe checks since the document structure is such that the intermediate levels might be null. All you can do is to ease the burden of handling the null safety.
Here's an example test case showing both approaches:
@Test
public void readNestedDocumentsWithNullSafety() throws IllegalAccessException, NoSuchMethodException, InvocationTargetException {
Document inner = new Document("leaf", "leafValue");
Document main = new Document("inner", inner);
Document fullyPopulatedDoc = new Document("main", main);
assertThat(extractLeafValueManually(fullyPopulatedDoc), is("leafValue"));
assertThat(extractLeafValueUsingThirdPartyLibrary(fullyPopulatedDoc, "main.inner.leaf", ""), is("leafValue"));
Document emptyPopulatedDoc = new Document();
assertThat(extractLeafValueManually(emptyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(emptyPopulatedDoc, "main.inner.leaf", ""), is(""));
Document emptyInner = new Document();
Document partiallyPopulatedMain = new Document("inner", emptyInner);
Document partiallyPopulatedDoc = new Document("main", partiallyPopulatedMain);
assertThat(extractLeafValueManually(partiallyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(partiallyPopulatedDoc, "main.inner.leaf", ""), is(""));
}
private String extractLeafValueUsingThirdPartyLibrary(Document document, String path, String defaultValue) {
try {
Object value = PropertyUtils.getNestedProperty(document, path);
return value == null ? defaultValue : value.toString();
} catch (Exception ex) {
return defaultValue;
}
}
private String extractLeafValueManually(Document document) {
Document inner = getOrDefault(getOrDefault(document, "main"), "inner");
return inner.get("leaf", "");
}
private Document getOrDefault(Document document, String key) {
if (document.containsKey(key)) {
return document.get(key, Document.class);
} else {
return new Document();
}
}
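The utility mentioned in the note above could look roughly like this. It is a sketch that works on any Map<String, Object>; it relies on org.bson.Document implementing Map<String, Object>, and the class name NestedLookup and its getString method are hypothetical, not part of the MongoDB driver.

```java
import java.util.HashMap;
import java.util.Map;

class NestedLookup {
    // Walks a dotted path such as "main.inner.leaf". Returns defaultValue when any
    // level is missing, is not a map, or when the leaf is not a String.
    static String getString(Map<String, Object> root, String path, String defaultValue) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return defaultValue;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current instanceof String ? (String) current : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("leaf", "leafValue");
        Map<String, Object> mainDoc = new HashMap<>();
        mainDoc.put("inner", inner);
        Map<String, Object> doc = new HashMap<>();
        doc.put("main", mainDoc);

        System.out.println(getString(doc, "main.inner.leaf", ""));
        System.out.println(getString(doc, "main.missing.leaf", "fallback"));
    }
}
```

The null checks have not gone away; they are just written once inside the loop instead of once per level at every call site.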

ActiveJDBC and Java Generics causing

I have the following two classes, where the Document extends an Abstract class that provides helper functions, one of which is a "find" method that builds queries to find records based on some simple logic.
public abstract class AbstractTable<T extends AbstractTable<T>> extends Model {
...
public T find (String[] columns) {
String whereClause = "";
List<Object> whereClauseData = new ArrayList<Object> ();
for (String column : columns) {
Object data = this.get(column);
if (data == null) {
whereClause += column + " is null AND ";
} else {
whereClause += column + " = ? AND ";
whereClauseData.add (data);
}
}
return findFirst (whereClause.substring(0, whereClause.length () - 5), whereClauseData.toArray());
}
}
public class Document extends AbstractTable<Document> {
...
public Document findExistingObject(Document document) {
String[] columns = new String[] {"court_case_id", "number", "name", "file_date"};
return super.find (columns);
}
}
When I run this code, and the "findExistingObject" method is called on a Document, I receive this exception:
Exception in thread "main" org.javalite.activejdbc.InitException:
failed to determine Model class name, are you sure models have been
instrumented?
I've made completely sure that I've instrumented the classes. When I move the code from AbstractTable into Document, everything works perfectly. I'm hoping someone can lend some advice, or help, that might show me what I'm doing wrong.
Thanks in advance.
The exact reason for your issue is not generics, but instrumentation. Instrumentation skips abstract models, which means that the method findFirst is called on class Model, and not on Document. You need to invoke a method findFirst on the model Document. Here is a version of code that will work for you:
public T find (String[] columns) throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
String whereClause = "";
List<Object> whereClauseData = new ArrayList<>();
for (String column : columns) {
Object data = get(column);
if (data == null) {
whereClause += column + " is null AND ";
} else {
whereClause += column + " = ? AND ";
whereClauseData.add (data);
}
}
Method findFirst = getClass().getDeclaredMethod("findFirst", String.class, Object[].class);
return (T) findFirst.invoke(null, whereClause.substring(0, whereClause.length () - 5), whereClauseData.toArray());
}
There is a bit of ugliness there, but at least you can apply this across all your models (if this is what you want).
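To see the reflection trick in isolation, here is a small sketch that does not depend on ActiveJDBC. BaseModel and Invoice are hypothetical classes; Invoice declares its own static findFirst by hand, standing in for the method that instrumentation would add to a concrete model.

```java
import java.lang.reflect.Method;

// Hypothetical base class; its static method plays the role of Model.findFirst.
abstract class BaseModel {
    public static String findFirst(String where, Object... params) {
        throw new IllegalStateException("called on BaseModel, not on a concrete model");
    }

    // Look the static method up on the runtime class instead of binding to BaseModel.findFirst.
    String find(String where, Object... params) throws ReflectiveOperationException {
        Method findFirst = getClass().getDeclaredMethod("findFirst", String.class, Object[].class);
        // The target is static, so the receiver argument is null.
        return (String) findFirst.invoke(null, where, params);
    }
}

// Hypothetical concrete model with its own static findFirst.
class Invoice extends BaseModel {
    public static String findFirst(String where, Object... params) {
        return "Invoice WHERE " + where + " [" + params.length + " params]";
    }
}

class ReflectionDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(new Invoice().find("name = ?", "a"));
    }
}
```

A plain call to findFirst inside BaseModel would be bound to BaseModel's own static method at compile time; going through getClass() is what moves the lookup to the concrete class.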

Using Jackcess to retrieve numeric values stored in a text field gives ClassCastException

I am working with Jackcess to read and categorize an Access database. It's simply meant to open the database, loop through each row, and print to the console the rows that meet certain conditions. It works fine, except when I try to read numeric values. My code is below. (This code is built into a Swing GUI and gets executed when a JButton is pressed.)
if (inv == null) { // Check to see if inventory file has been set. If not, then set it to the default reference path.
inv = rPath;
}
if (inventoryFile.exists()) { // Check to see if the reference path exists.
List<String> testTypes = jList1.getSelectedValuesList();
List<String> evalTypes = jList3.getSelectedValuesList();
List<String> grainTypes = jList2.getSelectedValuesList();
StringBuilder sb = new StringBuilder();
for (int i=0; i<=evalTypes.size()-1; i++) {
if (i<evalTypes.size()-1) {
sb.append(evalTypes.get(i)).append(" ");
}
else {
sb.append(evalTypes.get(i));
}
}
String evalType = sb.toString();
try (Database db = DatabaseBuilder.open(new File(inv));) {
Table sampleList = db.getTable("NTEP SAMPLES LIST");
Cursor cursor = CursorBuilder.createCursor(sampleList);
for (int i=0; i<=testTypes.size()-1; i++) {
if ("Sample Volume".equals(testTypes.get(i))) {
if (grainTypes.size() == 1 && "HRW".equals(grainTypes.get(0))) {
switch (evalType) {
case "GMM":
for (Row row : sampleList){
if (null != row.getString("CURRENTGAC")) {}
if ("HRW".equals(row.get("GRAIN")) && row.getDouble("CURRENTGAC")>=12.00) {
System.out.print(row.get("GRAIN") + "\t");
System.out.println(row.get("CURRENTGAC"));
}
}
break;
case "NIRT":
// some conditional code
break;
case "TW":
// some more code
break;
}
}
else {
JOptionPane.showMessageDialog(null, "Only HRW samples can be used for the selected test(s).", "Error", JOptionPane.ERROR_MESSAGE);
}
break;
}
}
}
catch (IOException ex) {
Logger.getLogger(SampleFilterGUI.class.getName()).log(Level.SEVERE, null, ex);
}
When the code is run I get the following error:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Double
The following condition looks to be what is throwing the error.
row.getDouble("CURRENTGAC")>=12.00
It appears that when the data is read from the database, the program is reading everything as a string, even though some fields are numeric. I was attempting to cast this field to a double, but Java doesn't seem to like that. I have tried using the Double.parseDouble() and Double.valueOf() methods to convert the value (as mentioned here) but without success.
My question is, how can I convert these fields to numeric values? Is trying to type cast the way to go, or is there a different method I'm not aware of? You will also notice in the code that I created a cursor, but am not using it. The original plan was to use it for navigating through the database, but I found some example code from the jackcess webpage and decided to use that instead. Not sure if that was the right move or not, but it seemed like a simpler solution. Any help is much appreciated. Thanks.
EDIT:
To ensure the program was reading a string value from my database, I input the following code
row.get("CURRENTGAC").getClass().getName()
The output was java.lang.String, so this confirms that it is a string. As was suggested, I changed the following code
case "GMM":
for (Row row : sampleList){
if (null != row.get("CURRENTGAC"))
//System.out.println(row.get("CURRENTGAC").getClass().getName());
System.out.println(String.format("|%s|", row.getString("CURRENTGAC")));
/*if ("HRW".equals(row.get("GRAIN")) && row.getDouble("CURRENTGAC")>=12.00 && row.getDouble("CURRENTGAC")<=14.00) {
System.out.print(row.get("GRAIN") + "\t");
System.out.println(row.get("CURRENTGAC"));
}*/
}
break;
The output to the console from these changes is below
|9.85|
|11.76|
|9.57|
|12.98|
|10.43|
|13.08|
|10.53|
|11.46|
...
Although this output looks numeric, it is still of type String. So when I run it with my conditional statement (which is commented out in the updated sample code) I still get the same java.lang.ClassCastException that I was getting before.
Jackcess does not return all values as strings. It will retrieve the fields (columns) of a table as the appropriate Java type for that Access field type. For example, with a test table named "Table1" ...
ID DoubleField TextField
-- ----------- ---------
1 1.23 4.56
... the following Java code ...
Table t = db.getTable("Table1");
for (Row r : t) {
Object o;
Double d;
String fieldName;
fieldName = "DoubleField";
o = r.get(fieldName);
System.out.println(String.format(
"%s comes back as: %s",
fieldName,
o.getClass().getName()));
System.out.println(String.format(
"Value: %f",
o));
System.out.println();
fieldName = "TextField";
o = r.get(fieldName);
System.out.println(String.format(
"%s comes back as: %s",
fieldName,
o.getClass().getName()));
System.out.println(String.format(
"Value: %s",
o));
try {
d = r.getDouble(fieldName);
} catch (Exception x) {
System.out.println(String.format(
"r.getDouble(\"%s\") failed - %s: %s",
fieldName,
x.getClass().getName(),
x.getMessage()));
}
try {
d = Double.parseDouble(r.getString(fieldName));
System.out.println(String.format(
"Double.parseDouble(r.getString(\"%s\")) succeeded. Value: %f",
fieldName,
d));
} catch (Exception x) {
System.out.println(String.format(
"Double.parseDouble(r.getString(\"%s\")) failed: %s",
fieldName,
x.getClass().getName()));
}
System.out.println();
}
... produces:
DoubleField comes back as: java.lang.Double
Value: 1.230000
TextField comes back as: java.lang.String
Value: 4.56
r.getDouble("TextField") failed - java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Double
Double.parseDouble(r.getString("TextField")) succeeded. Value: 4.560000
If you are unable to get Double.parseDouble() to parse the string values from your database then either
they contain "funny characters" that are not apparent from the samples you posted, or
you're doing it wrong.
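One way to check for "funny characters" is to print each value as a list of Unicode code points. The sketch below is only an illustration: the helper name describe is made up, and the trailing non-breaking space (U+00A0) is an assumed example of an invisible character, not something observed in the posted data.

```java
class HiddenCharDemo {
    // Lists every character as a Unicode code point so invisible ones stand out.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String clean = "9.85";
        String suspicious = "9.85\u00A0"; // assumed example: trailing non-breaking space

        System.out.println(describe(clean));
        System.out.println(describe(suspicious));

        try {
            Double.parseDouble(suspicious);
        } catch (NumberFormatException e) {
            // parseDouble only ignores ordinary leading/trailing whitespace, not U+00A0
            System.out.println("not parseable: " + e.getMessage());
        }
    }
}
```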
Additional information re: your sample file
Jackcess is returning CURRENTGAC as String because it is a Text field in the table.
The following Java code ...
Table t = db.getTable("NTEP SAMPLES LIST");
int countNotNull = 0;
int countAtLeast12 = 0;
for (Row r : t) {
String s = r.getString("CURRENTGAC");
if (s != null) {
countNotNull++;
Double d = Double.parseDouble(s);
if (d >= 12.00) {
countAtLeast12++;
}
}
}
System.out.println(String.format(
"Scan complete. Found %d non-null CURRENTGAC values, %d of which were >= 12.00.",
countNotNull,
countAtLeast12));
... produces ...
Scan complete. Found 100 non-null CURRENTGAC values, 62 of which were >= 12.00.
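If some rows might hold empty or non-numeric text, the parse can be wrapped so that a bad value is skipped instead of aborting the loop. This is a sketch with hypothetical helper names (TextNumbers, parse, atLeast); it takes the String that r.getString("CURRENTGAC") would return and does not touch Jackcess itself.

```java
import java.util.OptionalDouble;

class TextNumbers {
    // Parses text that should hold a number; empty result for null or unparseable input.
    static OptionalDouble parse(String raw) {
        if (raw == null) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    // True only when the text parses and the number is >= threshold.
    static boolean atLeast(String raw, double threshold) {
        OptionalDouble value = parse(raw);
        return value.isPresent() && value.getAsDouble() >= threshold;
    }

    public static void main(String[] args) {
        String[] samples = {"9.85", "12.98", null, "n/a"};
        for (String s : samples) {
            System.out.println(s + " -> " + atLeast(s, 12.00));
        }
    }
}
```

In the question's loop, the condition would then read along the lines of "HRW".equals(row.get("GRAIN")) && TextNumbers.atLeast(row.getString("CURRENTGAC"), 12.00).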

JNA Structure ByReference

Below are the method and the unit test for that method.
The problem is that I'm not able to return the value of result from the Load method.
The unit test below fails!
I thought that JNA objects were ByRef by default, so I also tried instantiating and passing LoadResults "without" .ByReference ...
Where is my mistake?
@Test
public void testLoad () {
MY_Processor proc = new MY_Processor();
// LoadResults result = new LoadResults ();
LoadResults.ByReference result = new LoadResults.ByReference();
ByteByReference [] pathToFile = new ByteByReference[256];
// fill pathToFile out ...
try {
proc.Load (pathToFile, result);
assertEquals(0, result.errorCode);
assertEquals(1, result.elaborationTime);
assertEquals(2, result.coreItem);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public Integer Load ( ByteByReference[] pathToFile,
LoadResults.ByReference result ) throws Exception {
// here result is correctly filled out !
LoadResults result = null;
result = native.getResult (numCore);
}
added the native code.
UPDATE
// header
typedef struct
{
int errorCode;
int elaborationTime;
int coreItem;
} LoadResults;
//[in] path
//[out] result
int Load (char path[MY_BUFFER_DEFINE], LoadResults* result);
// implementation ...
LoadResults* getResult (int numCore)
{
// some check ...
LoadResults *localResult = new LoadResults();
// fill out ...
return localResult;
}
there is a "free" method exposed by the native code but I didn't show in order to keep the focus on my problem :-)
/UPDATE
thanks!
O.
The problem is that you're passing a Structure as a parameter, then reassigning that parameter within the function. That will have no effect whatsoever on the argument.
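This is ordinary Java parameter passing and can be reproduced without JNA. In the sketch below, ResultHolder is a hypothetical plain class standing in for the structure:

```java
// Hypothetical plain class standing in for the JNA structure.
class ResultHolder {
    int errorCode = -1;
}

class ReassignDemo {
    // Rebinds the local parameter only; the caller's object is never touched.
    static void reassign(ResultHolder result) {
        result = new ResultHolder();
        result.errorCode = 0;
    }

    // Writes into the object the caller passed in.
    static void fillIn(ResultHolder result) {
        result.errorCode = 0;
    }

    public static void main(String[] args) {
        ResultHolder a = new ResultHolder();
        reassign(a);
        System.out.println("after reassign: " + a.errorCode);

        ResultHolder b = new ResultHolder();
        fillIn(b);
        System.out.println("after fillIn: " + b.errorCode);
    }
}
```

The reference itself is passed by value, so only changes made to the object it points at are visible to the caller.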
The pattern you need to follow is this:
Pointer p = mylib.getResult();
MyStructure m = new MyStructure(p);
// ....
mylib.free(p);
I'd recommend you pass in a native string (const char*) as your path rather than a fixed-size buffer of native char.
UPDATE
If you want to copy a result into the argument, then you'll need to copy the structure's contents, e.g.
public int Load(LoadResults arg) throws Exception {
// Effectively copy memory from result into arg
LoadResults result = native.getResult(numCore);
if (alternative_1) {
// Copy result's native memory into arg's memory, then sync Java fields
result.useMemory(arg.getPointer());
result.write();
arg.read();
}
else {
// Sync result's native memory with arg's Java fields
Pointer p = arg.getPointer();
arg.useMemory(result.getPointer());
arg.read();
arg.useMemory(p);
}
}
Just solved ...
1) I don't need to use LoadResults.ByReference.
2) The problem was that inside the Load method I replaced the reference passed in as input with another one:
public Integer Load ( ByteByReference[] pathToFile, LoadResults result ) throws Exception
{
// that's the problem!!!! storing the value into another object with another "address"
// and not the original "results".
// result = native.getResult (numCore);
// solved with this:
native.getResult (numCore, result);
}
