Cascading - cascading.tuple.TupleException: failed to set a value - java

I am trying to apply a ScrubFunction to each tuple and return the tuple with updated values, but I am getting an exception like:
Caused by: cascading.tuple.TupleException: failed to set a value, tuple may not be initialized with values, is zero length
Sample Code:
TupleEntry argument = functionCall.getArguments();
Tuple result = new Tuple();
result.setInteger(0, argument.getInteger(0));
result.setString(1, argument.getString(1).toUpperCase());
result.setString(2, argument.getString(2));
result.setString(3, argument.getString(3));
result.setString(4, argument.getString(4));
result.setString(5, argument.getString(5));
functionCall.getOutputCollector().add(result);
What if I want to update a few fields in a Tuple and return the updated values?
Can I update the TupleEntry directly and return it?

To your first question: don't set values on the tuple, add them instead. new Tuple() creates a zero-length tuple, so setInteger(0, ...) has no slot to write into, which is exactly what the exception says.
Tuple result = new Tuple();
result.addInteger(argument.getInteger(0));
// ...
To your second question: yes. See the API doc here: TupleEntry.setObject
Hope this helps :)

Related

Java Stream - Retrieving repeated records from CSV

I searched the site and didn't find anything similar. I'm new to Java streams, but I understand they can replace a loop. I would like to know whether there is a way to filter a CSV file using a stream, as shown below, so that only the repeated records are included in the result, grouped by the Center field.
Initial CSV file
Final result
In addition, the same pair cannot appear in the final result inversely, as shown in the table below:
This shouldn't happen
Is there a way to do it using stream and grouping at the same time, since theoretically, two loops would be needed to perform the task?
Thanks in advance.
You can do it in one pass as a stream with O(n) efficiency:
class PersonKey {
    // one field for every column that is used to detect duplicates
    String center, name, mother, birthdate;

    public PersonKey(String line) {
        // parse the CSV line into the fields above
    }

    // implement equals() and hashCode() using all fields
}
List<String> lines; // the input
Set<PersonKey> seen = new HashSet<>();
List<String> duplicates = lines.stream()
        .filter(p -> !seen.add(new PersonKey(p))) // keep only lines whose key was already seen
        .distinct()
        .collect(Collectors.toList());
The trick here is that a HashSet has constant time operations and its add() method returns false if the value being added is already in the set, true otherwise.
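The trick can be shown end-to-end with plain JDK code. In this sketch the data is made up and the whole line serves as the key for simplicity; a real PersonKey would parse the columns:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DuplicateFilter {
    // Returns the lines that occur more than once, each reported one time.
    static List<String> duplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        return lines.stream()
                .filter(line -> !seen.add(line)) // add() returns false for a repeat
                .distinct()                      // report each duplicate once
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a;1", "b;2", "a;1", "c;3", "a;1");
        System.out.println(duplicates(lines)); // [a;1]
    }
}
```

Because the stateful filter consumes the stream in order, this stays a single O(n) pass over the input.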
What I understood from your examples is that you consider an entry a duplicate if all the attributes have the same value except the ID. You can use anyMatch for this:
list.stream()
    .filter(x -> list.stream().anyMatch(y -> isDuplicate(x, y)))
    .collect(Collectors.toList())
So what does isDuplicate(x, y) do?
It returns a boolean: it checks whether two entries have the same value in every field except the ID:
private boolean isDuplicate(CsvEntry x, CsvEntry y) {
    return !x.getId().equals(y.getId())
            && x.getName().equals(y.getName())
            && x.getMother().equals(y.getMother())
            && x.getBirth().equals(y.getBirth());
}
I've assumed you've read all the entries as Strings; change the checks according to the type. This will give you the duplicate entries with their corresponding IDs.
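For completeness, here is a self-contained sketch of the anyMatch approach, with CsvEntry modeled as a hypothetical record (note that this is O(n²), since each entry is compared against the whole list):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AnyMatchDuplicates {
    // Hypothetical CSV row: an entry is a duplicate of another
    // when every field matches except the id.
    record CsvEntry(String id, String name, String mother, String birth) {}

    static boolean isDuplicate(CsvEntry x, CsvEntry y) {
        return !x.id().equals(y.id())
                && x.name().equals(y.name())
                && x.mother().equals(y.mother())
                && x.birth().equals(y.birth());
    }

    // Keeps every entry for which some other entry matches it.
    static List<CsvEntry> duplicates(List<CsvEntry> list) {
        return list.stream()
                .filter(x -> list.stream().anyMatch(y -> isDuplicate(x, y)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CsvEntry> rows = List.of(
                new CsvEntry("1", "Ana", "Maria", "2000-01-01"),
                new CsvEntry("2", "Ana", "Maria", "2000-01-01"),
                new CsvEntry("3", "Bia", "Clara", "1999-05-05"));
        System.out.println(duplicates(rows).size()); // 2
    }
}
```

Unlike the HashSet version, this keeps both members of each duplicate pair, which matches the asker's expected output of whole groups.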

Cannot unbox null value

I have the following code:
List<Details> detailsList = new ArrayList<>();
List<String[]> csv = csvReader.readAll();
final Map<String, Integer> mappedHeaders = mapHeaders(csv.get(0));
List<String[]> data = csv.subList(1, csv.size());
for (String[] entry : data) {
    Details details = new Details(
            entry[mappedHeaders.get("A")],
            entry[mappedHeaders.get("B")],
            entry[mappedHeaders.get("C")]);
    detailsList.add(details);
}
I'm essentially reading in a CSV file as a list of string arrays, where the first list item is the CSV file's headers and all remaining elements correspond to the data rows. However, since different CSV files with the same features might order the columns differently, I don't know the ordering in advance. For that, I have a mapHeaders method which maps the headers to indices so I can later properly put together the Details object (for example, if the headers are ["B", "A", "C"], the mappedHeaders would correspond to {B: 0; A: 1; C: 2}).
I also have some test data files of different column orderings and all but one of them work as they should. However, the one that doesn't work gives me
java.lang.NullPointerException: cannot unbox null value
when trying to evaluate entry[mappedHeaders.get("A")]. Additionally, when running the code in debugging mode, the mappedHeaders contains the correct keys and values and the value for "A" isn't null.
I have also tried entry[mappedHeaders.getOrDefault("A", Arrays.asList(csv.get(0)).indexOf("A"))], which returns -1. The only thing that works is entry[mappedHeaders.getOrDefault("A", 0)], since A is the first column in the failing case, but that workaround doesn't seem very feasible, as there might be more failing cases that I don't know about where the ordering is different. What might be the reason for such behavior? Might it be some weird encoding issue?
That's because you are trying to unbox a null value.
A method like intValue(), longValue(), or doubleValue() is being called on a null object.
Integer val = null;
if (val == 1) {
    // NullPointerException: val must be unboxed for the comparison
}

Integer val = null;
if (val == null) {
    // This works: comparing references, no unboxing
}

Integer val = 0;
if (val == 1) {
    // This works: val unboxes to 0
}
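As the asker suspects, an encoding artifact can produce exactly these symptoms: a UTF-8 BOM glued onto the first header makes the map key "\uFEFFA" rather than "A", yet the two usually print identically in consoles and debuggers. A minimal illustration (the BOM cause is an assumption, not confirmed by the question):

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxNull {
    public static void main(String[] args) {
        Map<String, Integer> mappedHeaders = new HashMap<>();
        // the BOM makes this a different key, though it displays as "A"
        mappedHeaders.put("\uFEFFA", 0);

        String[] entry = {"x", "y"};
        try {
            // get("A") returns null, which fails to unbox to the int array index
            String value = entry[mappedHeaders.get("A")];
            System.out.println(value);
        } catch (NullPointerException e) {
            System.out.println("cannot unbox null value");
        }
    }
}
```

Stripping the BOM from the first header line (or reading the file with a BOM-aware reader) before building mappedHeaders would avoid the null lookup entirely.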

Create Array of values from Array of objects with value as a property

I wish to get the value of name from [sailpoint.object.Identity#4099209b[id=8a029c656b800bf9016b801a2d130014,name=100], which is stored in a list. Please assist.
Code Snippet:
// getObjects returns Identity objects, e.g. sailpoint.object.Identity#43ac0a68[id=8a029c656b800bf9016b801a2eae0017,name=101]
List<Identity> identities = context.getObjects(Identity.class, query);
Results now:
[sailpoint.object.Identity#4099209b[id=8a029c656b800bf9016b801a2d130014,name=100], sailpoint.object.Identity#43ac0a68[id=8a029c656b800bf9016b801a2eae0017,name=101]]
Expected Output:
[100,101]
I'd stream the list and use a getter to extract the name:
List<String> result =
identities.stream().map(Identity::getName).collect(Collectors.toList());
Without a definition of the Identity object I can only assume that the name property is local (and exposed via Identity#getName()). In that case, you can simply map ("translate") each collection item:
List<String> names = context.getObjects(Identity.class, query).stream()
.map(Identity::getName)
.collect(Collectors.toList());
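Since sailpoint.object.Identity isn't available outside SailPoint, here is a self-contained sketch with a hypothetical stand-in class that shows the same map/collect pipeline:

```java
import java.util.List;
import java.util.stream.Collectors;

public class IdentityNames {
    // Minimal stand-in for sailpoint.object.Identity (fields assumed)
    static class Identity {
        private final String id;
        private final String name;

        Identity(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getName() { return name; }
    }

    // Extract each object's name into a plain list of strings.
    static List<String> namesOf(List<Identity> identities) {
        return identities.stream()
                .map(Identity::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Identity> identities = List.of(
                new Identity("8a029c656b800bf9016b801a2d130014", "100"),
                new Identity("8a029c656b800bf9016b801a2eae0017", "101"));
        System.out.println(namesOf(identities)); // [100, 101]
    }
}
```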

in which case TreeBasedTable.create().rowMap().get(rowKey) will return an empty map

In my project Guava tables are used, version 15.0. Somehow, in my logs, the mapping for a particular row key comes out empty, like {rowkey={}}, but I am not able to replicate it.
I tried the approaches below.
Table table = TreeBasedTable.create();
table.put(rowkey, null, null); // gives compilation error
table.put(rowkey, "", null);   // gives compilation error
table.put(rowkey, null, "");   // gives compilation error
table.put(rowkey, "", "");     // prints like {rowkey={=}}
Please help: how can I get {rowkey={}} when I print table.rowMap()?
That is, the map returned from table.rowMap().get(rowKey) is empty (not null).
table.rowMap().get(rowKey) returns a Map; if that map is null, the probable explanation is that there is no key 'rowkey' at all.
TreeBasedTable tbt = TreeBasedTable.create();
Object object = tbt.rowMap().get("rowkey");
System.out.println(object);
The output is null.
What about this?
TreeBasedTable tbt = TreeBasedTable.create();
tbt.put("rowKey", "columnKey", "value");
Map map = (Map) tbt.rowMap().get("rowKey");
System.out.println(map); // {columnKey=value}
map.clear();
System.out.println(map);          // output is {}
System.out.println(tbt.rowMap()); // in this case also {}, while it should be {rowKey={}}
If rowMap().get("rowKey") is not null, it means a value really was put into the TreeBasedTable, and then something cleared the map.

Retrieving timestamp from hbase row

Using Hbase API (Get/Put) or HBQL API, is it possible to retrieve timestamp of a particular column?
Assuming your client is configured and you have a table set up, doing a get returns a Result:
Get get = new Get(Bytes.toBytes("row_key"));
Result result_foo = table.get(get);
A Result is backed by KeyValues, and KeyValues contain the timestamps. You can get a list of KeyValues with list() or an array with raw(). A KeyValue has a getTimestamp() method.
result_foo.raw()[0].getTimestamp()
I think the following will be better:
KeyValue kv = result.getColumnLatest(family, qualifier);
String status = Bytes.toString(kv.getValue());
Long timestamp = kv.getTimestamp();
since Result#getValue(family, qualifier) is implemented as
public byte[] getValue(byte[] family, byte[] qualifier) {
    KeyValue kv = this.getColumnLatest(family, qualifier);
    return kv == null ? null : kv.getValue();
}
#codingFoo's answer assumes all timestamps are the same for all cells, but the OP's question was about a specific column. In that respect, similar to #peibin wang's answer, I would propose the following if you want the latest timestamp for your column:
Use the getColumnLatestCell method on your Result object, and then call the getTimestamp method like so:
Result res = ...
res.getColumnLatestCell(Bytes.toBytes("column_family"), Bytes.toBytes("column_qualifier")).getTimestamp();
If you want access to a specific timestamp, you could use getColumnCells, which returns all cells for a specified column; you will then have to choose among the cells with get(int index) and call getTimestamp().
result_foo.rawCells()[0].getTimestamp()
is a good style
