AWS Textract: extract the metadata and confidence score - Java

Hi all, I have extracted the document metadata from an AWS Textract asynchronous call using the Java SDK, but the metadata is split into multiple blocks and it is huge.
How can I extract the field name, its value and the confidence score separately using Java code? I want a result something like the one below:
[{
"Field" : "FirstName",
"Value" : "XXXXX",
"confidence Score" : "98.88"
},
{
"Field" : "LastName",
"Value" : "XXXXX",
"confidence Score" : "65.98"
}]
Could anyone please suggest how to extract the field, value and confidence score from the AWS Textract document metadata?
Does anyone have any idea about this?

AWS has provided an example for mapping key and value pairs in Python. You can use this code to understand the logic and come up with your own code in Java.
Source: https://docs.aws.amazon.com/textract/latest/dg/examples-extract-kvp.html
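As a rough idea of what that logic looks like in Java: index every block by id, pick the KEY_VALUE_SET blocks whose entity type is KEY, follow each key's VALUE relationship to its value block, and collect the text of both by following their CHILD relationships to WORD blocks. The sketch below is a minimal illustration under stated assumptions: `Blk`, `Field`, `extract` and `sample` are hypothetical stand-ins so the example runs without the SDK, not the SDK's own `Block` class. With the real SDK you would read the same information from the getters on `Block` and `Relationship`. It reports the value block's confidence and ignores selection elements (checkboxes).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KvSketch {
    // Hypothetical stand-in for the SDK's Block: only the fields the mapping needs.
    static class Blk {
        final String id, blockType, entityType, text;
        final float confidence;
        final Map<String, List<String>> relationships; // relationship type -> target block ids

        Blk(String id, String blockType, String entityType, String text,
            float confidence, Map<String, List<String>> relationships) {
            this.id = id;
            this.blockType = blockType;
            this.entityType = entityType;
            this.text = text;
            this.confidence = confidence;
            this.relationships = relationships;
        }
    }

    // One output row: field name, value and the value block's confidence.
    static class Field {
        final String field, value;
        final float confidence;

        Field(String field, String value, float confidence) {
            this.field = field;
            this.value = value;
            this.confidence = confidence;
        }

        String toJson() {
            return String.format(Locale.ROOT,
                "{\"Field\" : \"%s\", \"Value\" : \"%s\", \"confidence Score\" : \"%.2f\"}",
                field, value, confidence);
        }
    }

    // Joins the text of all WORD children of a block.
    static String textOf(Blk block, Map<String, Blk> byId) {
        StringBuilder sb = new StringBuilder();
        for (String childId : block.relationships.getOrDefault("CHILD", List.of())) {
            Blk child = byId.get(childId);
            if (child != null && "WORD".equals(child.blockType)) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(child.text);
            }
        }
        return sb.toString();
    }

    static List<Field> extract(List<Blk> blocks) {
        Map<String, Blk> byId = new HashMap<>();
        for (Blk b : blocks) byId.put(b.id, b);

        List<Field> result = new ArrayList<>();
        for (Blk key : blocks) {
            boolean isKey = "KEY_VALUE_SET".equals(key.blockType) && "KEY".equals(key.entityType);
            if (!isKey) continue;
            // A KEY block points at its VALUE block(s) through a VALUE relationship.
            for (String valueId : key.relationships.getOrDefault("VALUE", List.of())) {
                Blk value = byId.get(valueId);
                if (value == null) continue;
                result.add(new Field(textOf(key, byId), textOf(value, byId), value.confidence));
            }
        }
        return result;
    }

    // Made-up sample data shaped like one key-value pair.
    static List<Blk> sample() {
        return List.of(
            new Blk("k1", "KEY_VALUE_SET", "KEY", null, 97.10f,
                Map.of("CHILD", List.of("w1"), "VALUE", List.of("v1"))),
            new Blk("v1", "KEY_VALUE_SET", "VALUE", null, 98.88f,
                Map.of("CHILD", List.of("w2"))),
            new Blk("w1", "WORD", null, "FirstName", 99.0f, Map.of()),
            new Blk("w2", "WORD", null, "XXXXX", 99.0f, Map.of()));
    }

    public static void main(String[] args) {
        for (Field f : extract(sample())) {
            System.out.println(f.toJson());
        }
    }
}
```

For an asynchronous job you would first gather the blocks from every page of the result before building the id index, since a key and its value words can arrive in different pages.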

I have just begun with AWS Textract in Java too, and wow, what a great tool! I have included code in my answer at this link if you would like to take a look :)
It extracts the keys and values. I suggest you create a model with key, value and confidence score, and then create an object for each key-value pair.
public static ArrayList<KVPair> getKVObjects(List<Block> keyMap, List<Block> valueMap, List<Block> blockMap) {
    ArrayList<KVPair> labelValues = new ArrayList<>();
    for (Block keyBlock : keyMap) {
        // Find the VALUE block linked to this KEY block.
        Block valueBlock = findValueBlock(keyBlock, valueMap);
        if (valueBlock == null) {
            // Assumes findValueBlock returns null when the key has no linked value.
            continue;
        }
        String key = getText(keyBlock, blockMap);
        Float top = valueBlock.getGeometry().getBoundingBox().getTop();
        Float left = valueBlock.getGeometry().getBoundingBox().getLeft();
        Float confidenceScore = valueBlock.getConfidence();

        // Value text, position and confidence for this occurrence of the key.
        Property property = new Property();
        property.setValue(getText(valueBlock, blockMap));
        property.setLocationLeft(left);
        property.setLocationTop(top);
        property.setConfidenceScore(confidenceScore);

        // Reuse the existing pair if this label was already seen.
        Optional<KVPair> label = labelValues.stream().filter(x -> x.getLabel().equals(key)).findFirst();
        if (label.isPresent()) {
            label.get().setProperties(property);
        } else {
            KVPair pair = new KVPair();
            pair.setLabel(key);
            pair.setProperties(property);
            labelValues.add(pair);
        }
    }
    return labelValues;
}
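The method above relies on `KVPair` and `Property` model classes that are not shown in the answer. A minimal sketch of what such a model might look like follows; the field names are inferred from the setters used above, and treating `setProperties` as "append one more property to the pair" is an assumption, as is the wrapper class `KvModelSketch`.

```java
import java.util.ArrayList;
import java.util.List;

public class KvModelSketch {
    // One extracted value with its confidence and position on the page.
    static class Property {
        private String value;
        private Float confidenceScore;
        private Float locationTop;
        private Float locationLeft;

        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
        Float getConfidenceScore() { return confidenceScore; }
        void setConfidenceScore(Float confidenceScore) { this.confidenceScore = confidenceScore; }
        Float getLocationTop() { return locationTop; }
        void setLocationTop(Float locationTop) { this.locationTop = locationTop; }
        Float getLocationLeft() { return locationLeft; }
        void setLocationLeft(Float locationLeft) { this.locationLeft = locationLeft; }
    }

    // A label (field name) with every value found for it in the document.
    static class KVPair {
        private String label;
        private final List<Property> properties = new ArrayList<>();

        String getLabel() { return label; }
        void setLabel(String label) { this.label = label; }
        List<Property> getProperties() { return properties; }

        // Assumption: despite the setter-style name, this appends to the list,
        // so a label that occurs twice keeps both values.
        void setProperties(Property property) { properties.add(property); }
    }

    public static void main(String[] args) {
        Property property = new Property();
        property.setValue("XXXXX");
        property.setConfidenceScore(98.88f);

        KVPair pair = new KVPair();
        pair.setLabel("FirstName");
        pair.setProperties(property);

        System.out.println(pair.getLabel() + " -> " + pair.getProperties().get(0).getValue());
    }
}
```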
AWS-Textract-Key-Value-Pair Java - thread "main" java.lang.NullPointerException

Related

Reindex selected _source fields using Rest high level client in java

I want to reindex only selected fields from my documents in Elasticsearch using the REST High Level Client.
I know the Elasticsearch query to achieve that, but I don't know its equivalent using the REST client.
Following is the Elasticsearch query which I am trying to implement using the REST client:
{
"body" : {
"source" : {
"index" : "my source index name",
"_source" : "id, name, rollNo"
},
"dest" : {
"index" : "my destination index name"
}
}
}
To write its equivalent using the REST client in Java, I have used the following code:
ReindexRequest reindexRequest = new ReindexRequest();
reindexRequest.setSourceIndices("source index name").setDestIndex("destination index name");
reindexRequest.setDocTypes("id", "name", "rollNo", "_doc");
client.reindex(reindexRequest,RequestOptions.DEFAULT);
But the above code is not working as expected. It reindexes all the fields of my documents. I want only the 3 selected fields to be reindexed from each doc.
You need to use the code below, as setDocTypes is not used for source filtering.
There is no direct method available for setting a source filter, so you need to change the underlying search request using the code below.
ReindexRequest reindexRequest = new ReindexRequest();
reindexRequest.setSourceIndices("source index name").setDestIndex("destination index name");
reindexRequest.setDocTypes("_doc");
String[] include=new String[] {"id", "name", "rollNo"};
String[] exclude=new String[] {"test"};
reindexRequest.getSearchRequest().source().fetchSource(include, exclude);
client.reindex(reindexRequest,RequestOptions.DEFAULT);
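For comparison with the REST form of the request: in the `_reindex` body, `_source` under `source` takes an array of field names rather than one comma-separated string. The sketch below only builds that body as a string with the standard library so the two forms can be put side by side; `reindexBody` and `ReindexBodySketch` are hypothetical names, the index names are placeholders, and it assumes the names contain no characters that need JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ReindexBodySketch {
    // Builds a _reindex request body that copies only the listed fields.
    static String reindexBody(String sourceIndex, String destIndex, List<String> fields) {
        // Render the field list as a JSON array: ["id", "name", "rollNo"]
        String sourceFields = fields.stream()
                .map(f -> "\"" + f + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
        return "{ \"source\": { \"index\": \"" + sourceIndex + "\", \"_source\": " + sourceFields
                + " }, \"dest\": { \"index\": \"" + destIndex + "\" } }";
    }

    public static void main(String[] args) {
        System.out.println(reindexBody("source-index", "dest-index", List.of("id", "name", "rollNo")));
    }
}
```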

How to return only 1 field in MongoDB?

I'm trying to get the order number where a transactionId is equal to another variable I have in my code. My tolls.booths collection looks like this
In my code,
def boothsException = booths.find([ "pings.loc.transactionId": tollEvent.id, "pings.loc.order":1] as BasicDBObject).iterator()
println boothsException
I am getting boothsException = DBCursor{collection=DBCollection{database=DB{name='tolls'}
I would like to essentially say get where transactionId = 410527376 and give me that order number in boothsException (5233423).
This is using MongoDB Java Driver v3.12.2.
The code extracts the value from the returned cursor. I am using newer APIs, so you will find some differences in class names.
int transId = 410527376; // same as tollEvent.id
MongoCursor<Document> cursor = collection
.find(eq("pings.loc.transactionId", transId))
.projection(fields(elemMatch("pings.loc.transactionId"), excludeId()))
.iterator();
while (cursor.hasNext()) {
Document doc = cursor.next();
List<Document> pings = doc.get("pings", List.class);
Integer order = pings.get(0).getEmbedded(Arrays.asList("loc","order"), Double.class).intValue();
System.out.println(order.toString()); // prints 5233423
}
NOTES:
The query with projection gets the following one sub-document from the pings array:
"pings" : [
{
"upvote" : 575,
"loc" : {
"type" : "2dsphere",
"coordinates" : [ .... ],
"transactionId" : 410527376,
"order" : 5233423
},
...
}
]
The rest of the code loops over the cursor to extract the order value from that sub-document.
The following imports are used for the find method's filter and projection:
import static com.mongodb.client.model.Filters.*;
import static com.mongodb.client.model.Projections.*;

Check nested key existence in Video object in youtube-api to avoid NullPointerException

I want to check for the existence of a nested key in the Video object returned as a JSON response from a YouTube video search, using the code below:
YouTube.Videos.List searchvideostats = youtube.videos().list("snippet,statistics");
searchvideostats.setKey(apiKey);
Video v = searchvideostats.execute().getItems().get(0);
System.out.println(v.toPrettyString());
I got output like this:-
{
"etag" : "\"m2yskBQFythfE4irbTIeOgYYfBU/-TONXAYMx_10Caihcoac4XCyb4I\"",
"id" : "4Aa9GwWaRv0",
"kind" : "youtube#video",
"snippet" : {
"categoryId" : "10",
...................
.........
My goal is to check whether the categoryId key is present in this response or not, because calling v.getSnippet().getCategoryId() gives a NullPointerException if categoryId is not present in the JSON.
Tried:-
if (v.containsKey("id")) {
System.out.println("contains");
} else {
System.out.println("doesnt contains");
}
This returns contains, as expected.
if (v.containsKey("categoryId")) {
System.out.println("contains");
} else {
System.out.println("doesnt contains");
}
This returns doesnt contains, which is not expected. How would I check whether this nested key is available?
P.S. I have to check many such nested keys.
Thanks for the help.
You don't need string manipulation. Just use the YouTube library.
if(video.getSnippet()!=null && video.getSnippet().getCategoryId()!=null)
will do the job.
Note: checking for a zero-length categoryId might also be necessary.
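Since there are many such nested keys to check, the same null checks can be chained with `java.util.Optional` instead of being repeated by hand. The sketch below is only an illustration: `Video` and `Snippet` here are hypothetical stand-ins so that it runs without the YouTube client library, and `categoryId` is a made-up helper. With the real library the method references would point at the library's own getters.

```java
import java.util.Optional;

public class NestedKeySketch {
    // Hypothetical stand-ins for the library's model classes.
    static class Snippet {
        String categoryId;
        String getCategoryId() { return categoryId; }
    }

    static class Video {
        Snippet snippet;
        Snippet getSnippet() { return snippet; }
    }

    // Empty if the video, its snippet or the categoryId is missing or blank-length.
    static Optional<String> categoryId(Video video) {
        return Optional.ofNullable(video)
                .map(Video::getSnippet)       // stops here if snippet is null
                .map(Snippet::getCategoryId)  // stops here if categoryId is null
                .filter(id -> !id.isEmpty()); // treat a zero-length id as absent
    }

    public static void main(String[] args) {
        Video withoutSnippet = new Video();

        Video withCategory = new Video();
        withCategory.snippet = new Snippet();
        withCategory.snippet.categoryId = "10";

        System.out.println(categoryId(withoutSnippet).isPresent());
        System.out.println(categoryId(withCategory).orElse("none"));
    }
}
```

Each `map` step is skipped once a null is encountered, so no step can throw a NullPointerException regardless of how deep the chain goes.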

How to retrieve the value of a key(nested) in JSON which is stored in mongoDB using JAVA?

Below is the JSON file from which I want to retrieve the phone number:
"_data" : {
"Variable key" : {
"Name" : "Hello World",
"Phone" : "Phone : 123-456-6789 ",
"Region" : "New York",
"Description" : ""
}
}
My Java Code is:
BasicDBObject query = new BasicDBObject();
BasicDBObject field = new BasicDBObject();
field.put("_data.Phone", 1);
DBCursor cursor = table.find(query,field);
String str;
while (cursor.hasNext()) {
BasicDBObject obj = (BasicDBObject) cursor.next();
str=cursor.curr().get("_data.Phone").toString();
System.out.println(str);
}
which returns null, as I'm not taking the variable key into account.
My problem is that there are many JSON documents in the Mongo database, each with a different "Variable key", and this key may change after some time. Since the key may change over time, how can I retrieve the phone number?
Thank you!
Which phone numbers do you want? Your query will return all documents and you are trying to project out just the phone number, but with an incorrect projection specification. If you want all phone numbers, just leave out the projection specification entirely or project on { "_data" : 1 }. If you want the phone numbers associated with specific variable keys, project those out using dot notation like { "_data.key_name.Phone" : 1 }. If you don't know the names of the keys that you want to project on, then that is your root problem that you need to solve before you ask MongoDB to return something that you don't know that you want (or that you don't want).
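If the key names really cannot be known up front, one option that follows from projecting on the whole `_data` sub-document is to iterate over its entries on the client and read `Phone` from each one. The sketch below is a minimal illustration using plain `Map` objects as a stand-in for the driver's document type (an assumption made so that it runs without the driver); `phones` and `VariableKeySketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VariableKeySketch {
    // Collects the "Phone" value under every key of the "_data" sub-document.
    static List<String> phones(Map<String, Object> document) {
        List<String> result = new ArrayList<>();
        Object data = document.get("_data");
        if (!(data instanceof Map)) {
            return result; // no _data sub-document
        }
        for (Object entry : ((Map<?, ?>) data).values()) {
            if (entry instanceof Map) {
                Object phone = ((Map<?, ?>) entry).get("Phone");
                if (phone != null) {
                    result.add(phone.toString().trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same shape as the document in the question.
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("Name", "Hello World");
        inner.put("Phone", "Phone : 123-456-6789 ");
        inner.put("Region", "New York");

        Map<String, Object> data = new LinkedHashMap<>();
        data.put("Variable key", inner);

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("_data", data);

        System.out.println(phones(document));
    }
}
```

This moves the work to the client, so it suits cases where `_data` is small; it does not remove the underlying modelling problem the answer points out.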

How to access nested element of mongo query result using Java and R?

I am using Java and R to fetch data from my database and implement prediction. My json in mongodb is like:
{
"Server" : [
{
"deviceName" : "NEWSCVMM",
"availability" : 100,
"osVersion" : "6.3.9600",
"averageResponseTime" : 0.422,
"useswapmemory" : "983040",
"freeswapmemory" : "983040",
"model" : "Virtual Machine",
"numberOfCpu" : "1",
"vendor" : "Microsoft Corporation",
"vmList" : [ ],
"macadd" : [ ],
"cpuInfo" : "Intel(R) Xeon(R) CPU X5670 # 2.93GHz",
"memory" : "6188596",
"serialNo" : "00252-00000-00000-AA228",
"cpuUtilization" : 0,
}]
}
I want to access cpuUtilization from that JSON. I tried to access the nested values using dot notation (.), but the result is NULL. I also tried the same in the R shell, and again the result is NULL.
Here is what I have tried so far:
c.eval("query <- dbGetQuery(db,' mycollection','{\"hostId\":\"0.0.0.0\",\"windowsServer.cpuUtilization\":{\"$ne\":\"null\"},\"runtimeMillis\":{\"$ne\":\"null\"}}')");
c.eval("date <- query$runtimeMillis");
c.eval("host_id <- query$hostId");
c.eval("cpu <- query$Server.cpuUtilization");
c.eval("all_data<-data.frame(cpu,date)");
c.eval("training<- all_data");
c.eval("rf_fit<-randomForest(cpu~date,data=training,ntree=500)");
c.eval("new <- data.frame(date="+my_predicted_date+ ")");
c.eval("predictions<-predict(rf_fit,newdata=new)");
REXP memory_predictions= c.eval("predictions");
How do I access nested elements on either R shell or using java?
Dot notation should work in Java; I have been using it for quite a while now.
Check your code carefully, as there might be a slight mistake somewhere.
Can you please post a part of your code here? I can then take a look and comment.
Here is what I normally do.
Let's say my documents look like this:
{_id:0, start : {x:10, y:50}, end : {x:20, y:100}}
{_id:1, start : {x:20, y:50}, end : {x:30, y:100}}
{_id:2, start : {x:30, y:50}, end : {x:40, y:100}}
If I want to query based on the "x" value in the start field, I would write something like this:
MongoClient client = new MongoClient();
DB db = client.getDB("YourDBNAME");
DBCollection collection = db.getCollection("YOURCOLLECTIONAME");
// QueryBuilder.start is a static factory; compare against a number, since x is stored as a number
QueryBuilder builder = QueryBuilder.start("start.x").greaterThan(20);
DBCursor cursor = collection.find(builder.get());
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}
Compare this code snippet with yours; you should be able to get it working.
Let me know if this helps or if you need more help with this.
