Java + MongoDB: how to get a nested field value using the complete path? - java

I have this path for a MongoDB field: main.inner.leaf, and any of these fields may be absent.
In Java, to avoid a NullPointerException, I have to write:
String leaf = "";
if (document.get("main") != null &&
document.get("main", Document.class).get("inner") != null) {
leaf = document.get("main", Document.class)
.get("inner", Document.class).getString("leaf");
}
In this simple example there are only 3 levels (main, inner and leaf), but my documents are deeper.
So is there a way to avoid writing all these null checks?
Like this:
String leaf = document.getString("main.inner.leaf", "");
// "" is the default value if one of the levels doesn't exist
Or using a third party library:
String leaf = DocumentUtils.getNullCheck("main.inner.leaf", "", document);
Many thanks.

Since the intermediate attributes are optional, you really have to access the leaf value in a null-safe manner.
You could do this yourself using an approach like ...
if (document.containsKey("main")) {
Document _main = document.get("main", Document.class);
if (_main.containsKey("inner")) {
Document _inner = _main.get("inner", Document.class);
if (_inner.containsKey("leaf")) {
leafValue = _inner.getString("leaf");
}
}
}
Note: this could be wrapped up in a utility to make it more user friendly.
Or use a third-party library such as Commons BeanUtils.
But you cannot avoid null-safe checks, since the document structure is such that the intermediate levels might be missing. All you can do is ease the burden of handling the null safety.
Here's an example test case showing both approaches:
@Test
public void readNestedDocumentsWithNullSafety() throws IllegalAccessException, NoSuchMethodException, InvocationTargetException {
Document inner = new Document("leaf", "leafValue");
Document main = new Document("inner", inner);
Document fullyPopulatedDoc = new Document("main", main);
assertThat(extractLeafValueManually(fullyPopulatedDoc), is("leafValue"));
assertThat(extractLeafValueUsingThirdPartyLibrary(fullyPopulatedDoc, "main.inner.leaf", ""), is("leafValue"));
Document emptyPopulatedDoc = new Document();
assertThat(extractLeafValueManually(emptyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(emptyPopulatedDoc, "main.inner.leaf", ""), is(""));
Document emptyInner = new Document();
Document partiallyPopulatedMain = new Document("inner", emptyInner);
Document partiallyPopulatedDoc = new Document("main", partiallyPopulatedMain);
assertThat(extractLeafValueManually(partiallyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(partiallyPopulatedDoc, "main.inner.leaf", ""), is(""));
}
private String extractLeafValueUsingThirdPartyLibrary(Document document, String path, String defaultValue) {
try {
Object value = PropertyUtils.getNestedProperty(document, path);
return value == null ? defaultValue : value.toString();
} catch (Exception ex) {
return defaultValue;
}
}
private String extractLeafValueManually(Document document) {
Document inner = getOrDefault(getOrDefault(document, "main"), "inner");
return inner.get("leaf", "");
}
private Document getOrDefault(Document document, String key) {
if (document.containsKey(key)) {
return document.get(key, Document.class);
} else {
return new Document();
}
}
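The "wrap it up in a utility" idea mentioned above could look roughly like the following. This is a minimal sketch, not part of the driver: the class name NestedPath and its getString method are hypothetical, and it works on plain java.util.Map so that it has no dependencies. It assumes every intermediate level is itself a map and that the leaf is a String; anything else yields the default value.

```java
import java.util.Map;

// Hypothetical helper (not part of the MongoDB driver): resolves a dotted path
// through nested maps and falls back to a default when any level is missing.
final class NestedPath {
    private NestedPath() {}

    static String getString(Map<String, ?> root, String dottedPath, String defaultValue) {
        Object current = root;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                // an intermediate level is absent (null) or is not a sub-document
                return defaultValue;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        // only a String leaf counts as a hit; null or any other type gives the default
        return current instanceof String ? (String) current : defaultValue;
    }
}
```

Because org.bson.Document implements Map<String, Object>, a Document can be passed in directly, e.g. NestedPath.getString(document, "main.inner.leaf", ""). Note that this simple split would misread field names that themselves contain a dot.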

Related

How to access an object attribute from a String in Java?

I have a String that tells me what attribute I should use to make some filtering. How can I use this String to actually access the data in the object ?
I have a method that returns a List of strings telling me how to filter my List of objects. Such as:
String[] { "id=123", "name=foo" }
So my first idea was to split the String into 2 parts with:
filterString.split("=") and use the first part of the String (e.g. "id") to identify the attribute being filtered.
Coming from a JS background, I would do it like this:
const attr = filterString.split('=')[0]; // grabs the "id" part from the string "id=123", for example
const filteredValue = filterString.split('=')[1]; // grabs the "123" part from the string "id=123", for example
items.filter(el => el[`${attr}`] === filteredValue) // returns an array with the items where the id == "123"
How would I be able to do that with Java?
You can use reflection to look up the fields of a class by name at run time.
@Test
void test() throws NoSuchFieldException, IllegalAccessException {
String[] filters = {"id=123", "name=foo"};
List<Item> list = newArrayList(new Item(123, "abc"), new Item(2, "foo"), new Item(123, "foo"));
Class<Item> itemClass = Item.class;
for (String filter : filters) {
String key = StringUtils.substringBefore(filter, "=");
String value = StringUtils.substringAfter(filter, "=");
Iterator<Item> iterator = list.iterator();
while (iterator.hasNext()) {
Item item = iterator.next();
Field field = itemClass.getDeclaredField(key);
field.setAccessible(true);
Object itemValue = field.get(item);
if (!value.equals(String.valueOf(itemValue))) {
iterator.remove();
}
}
}
assertEquals(1, list.size());
}
But I agree with the comment from sp00m: it's slow and potentially dangerous.
This code should work:
//create the filter map
Map<String, String> expectedFieldValueMap = new HashMap<>();
for (String currentDataValue : input) {
String[] keyValue = currentDataValue.split("=", 2); // limit 2 keeps any '=' inside the value intact
String expectedField = keyValue[0];
String expectedValue = keyValue[1];
expectedFieldValueMap.put(expectedField, expectedValue);
}
Then iterate over the list of input objects and check whether each one passes all the filters. (I used an Employee class with id and name fields and prepared a test list of Employee objects called inputEmployeeList.) Reflection, though slow, is one way to do it:
for (Employee e : inputEmployeeList) {
try {
boolean filterPassed = true;
for (String expectedField : expectedFieldValueMap.keySet()) {
String expectedValue = expectedFieldValueMap.get(expectedField);
Field fieldData = e.getClass().getDeclaredField(expectedField);
fieldData.setAccessible(true);
// compare as strings so that non-String fields (e.g. a numeric id) can match
if (!expectedValue.equals(String.valueOf(fieldData.get(e)))) {
filterPassed = false;
break;
}
}
if (filterPassed) {
System.out.println(e + " object passed the filter");
}
} catch (Exception any) {
any.printStackTrace();
// handle
}
}
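Given the concern above that reflection is slow and potentially dangerous, one alternative is to register an explicit accessor per filterable attribute. The following is only a sketch under assumptions: AccessorFilter and its nested Item class are hypothetical stand-ins for the Item/Employee classes used in the answers, and values are compared by their string form, as in the reflection examples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class AccessorFilter {
    // Hypothetical item type with the same two attributes as in the answers above.
    static final class Item {
        final int id;
        final String name;
        Item(int id, String name) { this.id = id; this.name = name; }
    }

    // Whitelist: only attributes registered here can be used in a filter string.
    private static final Map<String, Function<Item, Object>> ACCESSORS = new HashMap<>();
    static {
        ACCESSORS.put("id", item -> item.id);
        ACCESSORS.put("name", item -> item.name);
    }

    // Returns the items that satisfy every "attribute=value" filter.
    static List<Item> filter(List<Item> items, String[] filters) {
        List<Item> result = new ArrayList<>(items);
        for (String filter : filters) {
            String[] parts = filter.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed filter: " + filter);
            }
            Function<Item, Object> accessor = ACCESSORS.get(parts[0]);
            if (accessor == null) {
                throw new IllegalArgumentException("Unknown attribute: " + parts[0]);
            }
            String expected = parts[1];
            // drop every item whose attribute does not match the expected string
            result.removeIf(item -> !expected.equals(String.valueOf(accessor.apply(item))));
        }
        return result;
    }
}
```

The trade-off is that each new filterable attribute needs one extra line in the map, but unknown attribute names fail with a clear error and private fields are never exposed through setAccessible.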

Does PDFBox allow to remove one field from AcroForm?

I am using Apache PDFBox 2.0.8 and trying to remove one field, but I cannot find a way to do it the way I can with iText: PdfStamper.getAcroFields().removeField("signature3").
What I am trying to do: initially I have a template PDF with 3 digital signatures. In some cases I need just 2 signatures, so in those cases I need to remove the 3rd signature from the template. It seems I can't do that with PDFBox. The closest thing I found is flattening this field, but the problem is that if I flatten one particular PDField (not the whole form, just one field), all the other signatures lose their functionality; it looks like they are getting flattened as well.
Here is code that does it:
PDDocument document = PDDocument.load(file);
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = documentCatalog.getAcroForm();
List<PDField> flattenList = new ArrayList<>();
for (PDField field : acroForm.getFieldTree()) {
if (field instanceof PDSignatureField && "signature3".equals(field.getFullyQualifiedName())) {
flattenList.add(field);
}
}
acroForm.flatten(flattenList, true);
document.save(dest);
document.close();
As Tilman already mentioned in a comment, PDFBox doesn't have a method to remove a field from the field tree. Nonetheless it has methods to manipulate the underlying PDF structure, so one can write such a method oneself, e.g. like this:
PDField removeField(PDDocument document, String fullFieldName) throws IOException {
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = documentCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form defined.");
return null;
}
PDField targetField = null;
for (PDField field : acroForm.getFieldTree()) {
if (fullFieldName.equals(field.getFullyQualifiedName())) {
targetField = field;
break;
}
}
if (targetField == null) {
System.out.println("Form does not contain field with given name.");
return null;
}
PDNonTerminalField parentField = targetField.getParent();
if (parentField != null) {
List<PDField> childFields = parentField.getChildren();
boolean removed = false;
for (PDField field : childFields)
{
if (field.getCOSObject().equals(targetField.getCOSObject())) {
removed = childFields.remove(field);
parentField.setChildren(childFields);
break;
}
}
if (!removed)
System.out.println("Inconsistent form definition: Parent field does not reference the target field.");
} else {
List<PDField> rootFields = acroForm.getFields();
boolean removed = false;
for (PDField field : rootFields)
{
if (field.getCOSObject().equals(targetField.getCOSObject())) {
removed = rootFields.remove(field);
break;
}
}
if (!removed)
System.out.println("Inconsistent form definition: Root fields do not include the target field.");
}
removeWidgets(targetField);
return targetField;
}
void removeWidgets(PDField targetField) throws IOException {
if (targetField instanceof PDTerminalField) {
List<PDAnnotationWidget> widgets = ((PDTerminalField)targetField).getWidgets();
for (PDAnnotationWidget widget : widgets) {
PDPage page = widget.getPage();
if (page != null) {
List<PDAnnotation> annotations = page.getAnnotations();
boolean removed = false;
for (PDAnnotation annotation : annotations) {
if (annotation.getCOSObject().equals(widget.getCOSObject()))
{
removed = annotations.remove(annotation);
break;
}
}
if (!removed)
System.out.println("Inconsistent annotation definition: Page annotations do not include the target widget.");
} else {
System.out.println("Widget annotation does not have an associated page; cannot remove widget.");
// TODO: In this case iterate all pages and try to find and remove widget in all of them
}
}
} else if (targetField instanceof PDNonTerminalField) {
List<PDField> childFields = ((PDNonTerminalField)targetField).getChildren();
for (PDField field : childFields)
removeWidgets(field);
} else {
System.out.println("Target field is neither terminal nor non-terminal; cannot remove widgets.");
}
}
(RemoveField helper methods removeField and removeWidgets)
One can apply this to a document and field like this:
PDDocument document = PDDocument.load(SOURCE_PDF);
PDField field = removeField(document, "Signature1");
Assert.assertNotNull("Field not found", field);
document.save(TARGET_PDF);
document.close();
(RemoveField test testRemoveInvisibleSignature)
PS: I am not sure how much form related information PDFBox actually caches somewhere. Thus, I would propose not to manipulate the form information any further in the same document manipulation session, at least not without tests.
PPS: You'll find a TODO in the removeWidgets helper method. If the method outputs "Widget annotation does not have an associated page; cannot remove widget", you'll have to add the missing code.
Thanks to @mkl, I managed to do this with a shorter implementation using pdfbox-3.0.0-RC1; in this case, to hide a button (check):
var check = (PDPushButton) pdAcroForm.getField(name);
List<PDField> fields = pdAcroForm.getFields();
fields.removeIf(x -> x.getCOSObject().equals(check.getCOSObject()));
pdAcroForm.setFields(fields);
check.getWidgets().forEach(widget -> widget.setNoView(true));

How to change job parameter of a map-reduce job on run-time?

I have written a map job which takes a bunch of tweets and a list of keywords, and emits tweet counts for the keywords:
@Override
public void map(Object key, Text value, Context output) throws IOException,
InterruptedException {
JSONObject tweetObject = null;
ArrayList<String> keywords = this.getKeyWords();
try {
tweetObject = (JSONObject) parser.parse(value.toString());
} catch (ParseException e) {
e.printStackTrace();
}
if (tweetObject != null) {
String tweetText = (String) tweetObject.get("text");
StringTokenizer st = new StringTokenizer(tweetText);
ArrayList<String> tokens = new ArrayList<String>();
while (st.hasMoreTokens()) {
tokens.add(st.nextToken());
}
for (String keyword : keywords) {
for (String token : tokens) {
token = token.toLowerCase();
if (token.equals(keyword) || token.contains(keyword)) {
output.write(new Text(keyword), one);
break;
}
}
}
}
output.write(new Text("count"), one);
}
ArrayList<String> getKeyWords() {
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("vodka");
keywords.add("tequila");
keywords.add("mojito");
keywords.add("margarita");
return keywords;
}
Right now my keyword list is static, hard-coded in the map-reduce jar file. How can I make it dynamic, i.e. how can I change the keywords at run time?
What is the best way to do this?
Multiple ways off the top of my head: query a web service, read a file.
In any case you probably don't want to execute this for every record you map. It is fairly common to use a caching layer (e.g. Guava) to cache an external data source and invalidate it for example by time or modification.
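To illustrate the time-based invalidation mentioned above without pulling in a library, here is a minimal stdlib-only sketch. It is not Guava's API: the class ExpiringValue is hypothetical, and the loader passed to it stands for whatever reads the keywords from a file or web service.

```java
import java.util.function.Supplier;

// Hypothetical helper: caches one loaded value and reloads it once it is older than the TTL.
final class ExpiringValue<T> {
    private final Supplier<T> loader;
    private final long ttlNanos;
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringValue(Supplier<T> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    // Returns the cached value, calling the loader only on first use or after expiry.
    synchronized T get() {
        long now = System.nanoTime();
        if (!loaded || now - loadedAt >= ttlNanos) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

In the mapper, getKeyWords() could then delegate to a field such as an ExpiringValue<List<String>> whose loader reads the keyword source, so the external lookup happens at most once per TTL rather than once per record.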

Does rxjava with couchbase offer value for non-bulk operations

The new Couchbase SDK uses RxJava to make bulk operations easier to use and more performant. But is there any value in using Rx for operations on single values?
Consider a simple CAS/insert operation, i.e. if the value exists do a CAS replace, otherwise do an insert, and return the document value:
final String id = "id";
final String modified = "modified";
final int numCasRetries = 3;
Observable
.defer(() -> bucket.async().get(id))
.flatMap(document -> {
try {
if (document == null) {
JsonObject content = JsonObject.create();
content.put(modified, new Date().getTime());
document = bucket.insert(JsonDocument.create(id, content));
} else {
document.content().put(modified, new Date().getTime());
document = bucket.replace(document);
}
return Observable.just(document);
} catch (CASMismatchException e) {
return Observable.error(e);
}
})
.retry((count, error) -> {
// Only retry on CASMismatchException
return ((error instanceof CASMismatchException)
&& (count < numCasRetries));
})
.onErrorResumeNext(error -> {
return Observable.error(new Exception(error));
})
.toBlocking()
.single();
So toBlocking will block the calling thread until a result is available, and only one value is written to and read from Couchbase at a time. So I do not understand why, or even whether, this code would be any better than:
final String id = "id";
final String modified = "modified";
final int numCasRetries = 3;
JsonDocument document = null;
for (int i = 1; i <= numCasRetries; i++) {
document = bucket.get(id);
try {
if (document == null) {
JsonObject content = JsonObject.create();
content.put(modified, new Date().getTime());
document = bucket.insert(JsonDocument.create(id, content));
} else {
document.content().put(modified, new Date().getTime());
document = bucket.replace(document);
}
return document;
} catch (CASMismatchException e) {
if (i == numCasRetries) {
throw e;
}
}
}
If anything I'd argue that in this scenario the rx approach is less readable.
For an operation on a single document where ultimately you need to block, I'd tend to agree that your second example is clearer.
RxJava shines when you use asynchronous processing heavily, especially when you need advanced error handling, retry scenarios, or combinations of asynchronous flows.
The previous generation of Couchbase Java SDK (1.4.x) just had Future for that, and it didn't provide the elegant, powerful and expressive capabilities we found in RxJava.

How to deserialize beans given in a java properties file?

Please consider the following excerpt of a typical VMware configuration file (*.vmx):
memsize = "2048"
MemTrimRate = "-1"
mks.enable3d = "TRUE"
nvram = "Windows Server 2003 Standard Edition.nvram"
pciBridge0.pciSlotNumber = "17"
pciBridge0.present = "TRUE"
pciBridge4.functions = "8"
pciBridge4.pciSlotNumber = "18"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge5.pciSlotNumber = "19"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge6.pciSlotNumber = "20"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
pciBridge7.pciSlotNumber = "32"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
replay.filename = ""
replay.supported = "FALSE"
roamingVM.exitBehavior = "go"
By observing this configuration, one can imagine a PciBridge java bean type with the following signature:
class PciBridge
{
public int pciSlotNumber; // or public int getPciSlotNumber(){...} and public void setPciSlotNumber(int v){...}
public boolean present; // or get/is/set methods
public int functions; // or get/set methods
public String virtualDev; // or get/set methods
}
Moreover, the configuration manager responsible for reading the vmx files might expose the following method:
public <T> List<T> getObjects(final String prop, Class<T> clazz);
And then given the aforementioned configuration, invoking getObjects("pciBridge", PciBridge.class) would return a list of all the PciBridge objects specified in the configuration - the total of 5 in our case.
How do I implement this functionality? Of course, I have seen the same pattern in several different products, so I figure there should be something ready out there to implement this functionality.
Any ideas?
Thanks.
EDIT
Correction - I do not claim that VMWare utilizes the java properties file format (the double quotes are redundant), but the spirit is the same. Besides, there are proper Java applications utilizing the same pattern.
I am posting my own solution. The code depends on http://commons.apache.org/beanutils/ to reflect on the beans and on http://commons.apache.org/configuration/ to manage the property based configuration (because it supports property references using the ${} syntax).
public static <T> Collection<T> getBeans(String prop, Class<T> clazz) throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException {
Pattern pattern = Pattern.compile("^" + prop.replace(".", "\\.") + "(\\d*)\\.(\\w+)$");
Map<String, T> beans = new TreeMap<String, T>();
@SuppressWarnings("rawtypes")
Map description = null;
T tmpBean = null;
Iterator<String> itKeys = m_propStore.getKeys();
while (itKeys.hasNext()) {
String key = itKeys.next();
Matcher matcher = pattern.matcher(key);
boolean matchFound = matcher.find();
if (matchFound) {
if (description == null) {
tmpBean = clazz.newInstance();
description = BeanUtils.describe(tmpBean);
}
String beanPropName = matcher.group(2);
if (description.containsKey(beanPropName)) {
String beanKey = matcher.group(1);
T bean = beans.get(beanKey);
if (bean == null) {
bean = tmpBean == null ? clazz.newInstance() : tmpBean;
tmpBean = null;
beans.put(beanKey, bean);
}
try {
BeanUtils.setProperty(bean, beanPropName, m_propStore.getString(key));
} catch (Exception e) {
m_logger.error(String.format("[SystemConfiguration]: failed to set the %s.%s bean property to the value of the %s configuration property - %s",
bean.getClass().getName(), beanPropName, key, e.getMessage()));
}
}
}
}
return beans.values();
}
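The core of the solution above is the regular expression that splits a key such as pciBridge4.functions into an index ("4") and a bean property name ("functions"). The grouping step on its own can be sketched with the standard library only. This is an illustration under assumptions, not the author's code: IndexedProperties and group are hypothetical names, it reads from java.util.Properties instead of Commons Configuration, and it stops at a map of property values rather than populating beans.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexedProperties {
    private IndexedProperties() {}

    // Groups keys of the form <prefix><index>.<name> by index, e.g.
    // "pciBridge4.functions" -> groups.get("4").get("functions").
    static Map<String, Map<String, String>> group(Properties props, String prefix) {
        Pattern pattern = Pattern.compile("^" + Pattern.quote(prefix) + "(\\d*)\\.(\\w+)$");
        Map<String, Map<String, String>> groups = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            Matcher matcher = pattern.matcher(key);
            if (matcher.matches()) {
                groups.computeIfAbsent(matcher.group(1), index -> new TreeMap<>())
                      .put(matcher.group(2), props.getProperty(key));
            }
        }
        return groups;
    }
}
```

Each inner map could then be handed to a bean populator such as the BeanUtils.setProperty calls used above. As in the original, the outer TreeMap orders indices as strings, so "17" sorts before "4".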
