How to retrieve AWS Textract response JSON using the Java SDK

I am using AWS Textract to OCR images and create a searchable PDF as outlined in this AWS blog post.
The basic request code looks like this:
// imageBytes is a java.nio.ByteBuffer holding the image content
AmazonTextract client = AmazonTextractClientBuilder.standard().build();
DetectDocumentTextRequest request = new DetectDocumentTextRequest()
    .withDocument(new Document()
        .withBytes(imageBytes));
DetectDocumentTextResult result = client.detectDocumentText(request);
List<Block> blocks = result.getBlocks();
This works well. However, I would also like to write out and keep the original response JSON, which contains all the information about what was detected and where.
Is there a way to get at the response JSON using the Java SDK?

AWS doesn't return the response JSON to you in raw form. The assumption may have been that it wouldn't be required once it has been converted to a DetectDocumentTextResult object.
You are able to convert the DetectDocumentTextResult object to JSON (example) which should provide identical values. Note that the variable names will not be identical (e.g.: DocumentMetadata vs documentMetadata) but the values of those variables will be the same.
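To illustrate the naming difference mentioned above: bean-based JSON serializers (Jackson's ObjectMapper is a common choice) typically derive field names from getters, so getDocumentMetadata() becomes documentMetadata rather than DocumentMetadata. Here is a minimal JDK-only sketch of that derivation. FakeResult is a hypothetical stand-in, not the SDK's DetectDocumentTextResult, and wireName is an illustrative helper that just upper-cases the first letter.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

public class BeanNames {
    // Hypothetical stand-in for an SDK result object; NOT the real Textract class.
    public static class FakeResult {
        public String getDocumentMetadata() { return "pages=1"; }
    }

    // Collect getter-derived property names and their values, sorted by name.
    public static Map<String, Object> describe(Object bean) {
        Map<String, Object> out = new TreeMap<>();
        try {
            // Stop at Object so the synthetic "class" property is not included.
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    out.put(pd.getName(), pd.getReadMethod().invoke(bean));
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not inspect bean", e);
        }
        return out;
    }

    // Illustrative helper: turn a bean property name back into an upper-camel name.
    public static String wireName(String propertyName) {
        if (propertyName.isEmpty()) {
            return propertyName;
        }
        return Character.toUpperCase(propertyName.charAt(0)) + propertyName.substring(1);
    }

    public static void main(String[] args) {
        Map<String, Object> props = describe(new FakeResult());
        for (Map.Entry<String, Object> e : props.entrySet()) {
            System.out.println(e.getKey() + " -> " + wireName(e.getKey()) + " = " + e.getValue());
        }
    }
}
```

If the original key casing matters for whatever consumes the file, a renaming step like wireName (or the equivalent naming option in your JSON library) is one way to get closer to it, though this sketch makes no claim that the result is byte-identical to what the service sent.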

Related

How to use google.protobuf.struct in proto file to store json result in gRPC java

I want to return a JSON response from server to client in gRPC.
One possible way is to convert it to a string, return that in the response, and convert it back to a JSON object on the client side, but I want to know whether we can do better.
I did some searching and found that this can be done with the help of google.protobuf.Struct,
but I didn't find any good example.
I would like an example of how to use it as JSON in Java.
If you are using proto3, one option is to define a protobuf message that mirrors the JSON object you wish to populate. Then you can use JsonFormat to convert between protobuf and JSON.
Using a com.google.protobuf.Struct instead of a self-defined message can also work. There is an example shown in the similar question.

How to get custom extension attributes with microsoft graph api

I'm trying to get user details with microsoft-graph
I'm looking for a custom extension element in my response, such as "extension_3a4189d71ad149c6ab5e65ac45bd6add_MyAttribute1"
When I retrieve the response as a String, I can see all the elements.
final ResponseEntity<String> response = restTemplate.exchange("http://graph.windows.net/tenant.com/me?api-version=1.6", HttpMethod.GET, new HttpEntity<>(headers), String.class);
But when I retrieve the response as com.microsoft.graph.models.extensions.User, I can't see the extension anymore.
final ResponseEntity<User> response = restTemplate.exchange("http://graph.windows.net/tenant.com/me?api-version=1.6", HttpMethod.GET, new HttpEntity<>(headers), User.class);
How can I retrieve the custom extension in a more elegant way than getting a String and looking for the elements one by one?
Because the extension attributes are specific to your tenant, they are non-standard. No out-of-the-box object class in the SDK will contain them, since the attribute name has the app ID embedded in it: extension_appid_attribname.
So you would have to handle it yourself. You can try to extend the User class and add a method to read or deserialize/map the JSON returned from the Graph API, similar to what Hury suggested, or something to that effect. There likely won't be an out-of-the-box solution for this.
There are also JSON libraries that can deserialize to a dynamic object of some sort, if you really don't want to map the object manually.
Update:
I dug into this a bit further. I don't think it's in extensions.extension; however, I did find that you can access it in the Java SDK. Here's the documentation: https://github.com/microsoftgraph/msgraph-sdk-java/wiki/Working-with-Open-Types
You would do something like
String ext =
user
.additionalDataManager()
.get("extension_2lkj3l12jl3j2kj3_yourproperty")
.getAsString();
Give that a try
Hopefully that helps.
Please use the https://graph.microsoft.com/beta/users API instead of https://graph.microsoft.com/v1.0/users. The beta endpoint will return all user data, including the users' custom data.

Get EC2 Instance XML Description using AWS Java SDK?

We have a scenario in which we need to retrieve the description info for EC2 instances running on AWS. To accomplish this, we are using the AWS Java SDK. In 90% of our use case, the com.amazonaws.services.ec2.model.Instance class is exactly what we need. However, there is also a small use-case where it would be beneficial to get the raw XML describing the instance. That is, the XML data before it is converted into the Instance object. Is there any way to obtain both the Instance object and the XML string using the AWS Java SDK? Is there a way to manually convert from one to the other? Or, would we be forced to make a separate call using HttpClient or something similar to get the XML data?
Build the EC2 client with a request handler and override its beforeUnmarshalling() method, like below:
AmazonEC2 ec2 = AmazonEC2ClientBuilder.standard().withRegion("us-east-1")
    .withRequestHandlers(
        new RequestHandler2() {
            @Override
            public HttpResponse beforeUnmarshalling(Request<?> request, HttpResponse httpResponse) {
                // httpResponse.getContent() is the raw XML response from AWS;
                // you can save it to a file or parse it into an XML document here.
                // If you consumed httpResponse.getContent(), you need to return a new HttpResponse.
                return new HttpResponse(...);
            }
        }
    ).build();
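The "if you consumed the content, provide a new response" comment comes down to a general stream issue: an InputStream can only be read once. A minimal JDK-only sketch of the buffering step is below. BufferedBody is a hypothetical helper name, and how the fresh stream gets attached to the replacement HttpResponse is deliberately left out, since that depends on the SDK version you use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedBody {
    private final byte[] bytes;

    private BufferedBody(byte[] bytes) {
        this.bytes = bytes;
    }

    // Drain the one-shot stream once and keep its bytes in memory.
    public static BufferedBody drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new BufferedBody(out.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The raw payload as text, e.g. to log it or write it to a file.
    // UTF-8 is an assumption here; use the response's declared charset if it differs.
    public String asText() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A new, unread stream over the same bytes for the downstream unmarshaller.
    public InputStream freshStream() {
        return new ByteArrayInputStream(bytes);
    }
}
```

Inside the handler you would call BufferedBody.drain(httpResponse.getContent()), keep asText() as your raw XML, and hand freshStream() to the response you return. Note that this holds the whole body in memory, which is probably fine for describe calls but worth keeping in mind for large responses.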
If you have xml (e.g. from using AWS rest API directly), then you can use com.amazonaws.services.ec2.model.transform.* classes to convert xml to java objects. Unfortunately, it only provides classes required for SDK itself. So you, for example, can convert raw XML to an Instance using InstanceStaxUnmarshaller, but can't convert Instance to XML unless you write such converter.
Here is an example of how to parse an Instance XML:
XMLEventReader eventReader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(instanceXml));
StaxUnmarshallerContext suc = new StaxUnmarshallerContext(eventReader, new TreeMap<>());
InstanceStaxUnmarshaller isu = new InstanceStaxUnmarshaller();
Instance i = isu.unmarshall(suc);
System.out.println(i.toString());
You can probably intercept the raw AWS response, so that you keep the raw XML while still using the SDK most of the time. But I wouldn't call that easy, as it will require quite a bit of coding.
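If you only need a handful of fields from the raw XML and would rather not depend on the SDK's transform classes, plain StAX from the JDK can do it. This is a rough sketch: InstanceXmlPeek and firstText are hypothetical names, and the sample XML in the usage note is a simplified fragment, not a full DescribeInstances response.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class InstanceXmlPeek {
    // Return the text of the first element with the given local name,
    // or null if no such element exists. Only suitable for text-only
    // elements: getElementText() fails on elements with child elements.
    public static String firstText(String xml, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }
}
```

For example, given a fragment like <item><instanceId>i-123</instanceId></item>, calling InstanceXmlPeek.firstText(xml, "instanceId") returns the instance ID as a string.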
You could use JAXB.marshal as follows. JAXB (Java Architecture for XML Binding) can convert Java objects to and from XML.
StringWriter sw = new StringWriter();
JAXB.marshal(instance, sw);
String xmlString = sw.toString();
You can use the AWS REST API in place of the Java SDK. A bonus will be a slight performance gain, because you won't send statistics data to Amazon as the SDK does.

How can I get json documents from couchbase in java?

I have a couchbase database that is shared between multiple applications, storing documents as json. I cannot seem to get data into my java app, since it appears to be dependent on native java binary serialization.
This code:
CouchbaseClient client = new CouchbaseClient(hosts,"bucket","");
System.out.println((String)client.get("someKey"));
results in
net.spy.memcached.transcoders.SerializingTranscoder: Failed to decompress data
java.util.zip.ZipException: Not in GZIP format
since it is trying to deserialize by default. I notice that I can provide my own transcoder, but I really only want the raw string data so I can json parse it myself using gson or whatever. None of the available transcoders seem to give me this.
The couchbase docs have an example for setting json, but none for reading it. How are people reading json into java?
First off, this problem will go away soon in that the Couchbase "2.0 SDKs" implement common flags between each other so this kind of problem doesn't come up. Michael's blogs are a good read if you want to see what's happening here. The reason for the problem in the first place is that in the 1.x series, Couchbase was trying to stay compatible with existing application code and memcached. In the memcached world, the clients were all written by different people at different times.
Based on the exception, I believe you're probably trying to read an item stored by .NET. I have a sample transcoder you can use for this from a few weeks ago.
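For reference, the core of a "just give me the raw string" decode step might look like the sketch below. It uses only the JDK: RawJsonDecoder is a hypothetical name, and it assumes the stored documents are UTF-8 JSON. In a custom transcoder this logic would sit in the decode method and be fed the item's raw bytes; that wiring is not shown. Instead of trusting flags written by another client library, it checks the actual bytes for the GZIP header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class RawJsonDecoder {
    // True if the payload starts with the GZIP magic bytes 0x1f 0x8b.
    public static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Decode stored bytes to a JSON string, decompressing only when the
    // bytes themselves look like GZIP. Assumes UTF-8 content.
    public static String decode(byte[] data) {
        if (!looksGzipped(data)) {
            return new String(data, StandardCharsets.UTF_8);
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting string can then be passed to Gson or any other JSON parser.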
Make sure you are using the latest CB Java client:
<dependencies>
    <dependency>
        <groupId>com.couchbase.client</groupId>
        <artifactId>couchbase-client</artifactId>
        <version>1.4.4</version>
    </dependency>
</dependencies>
See: Couchbase Java Client Library 1.4
My service that uses the CB client runs just fine. Here is how I create the client:
CouchbaseConnectionFactoryBuilder cfb = new CouchbaseConnectionFactoryBuilder();
cfb.setOpTimeout(10000);
cfb.setOpQueueMaxBlockTime(5000);
CouchbaseClient client = new CouchbaseClient(cfb.buildCouchbaseConnection(baseURIs, bucketName, ""));
And here is an example of how I get a raw string and convert it to a POJO:
MyPOJO get(CouchbaseClient client, String key)
{
    com.google.gson.Gson gson = new com.google.gson.Gson();
    String jsonValue = (String) client.get(key);
    return gson.fromJson(jsonValue, MyPOJO.class);
}
Also, update your question with the sample JSON doc that is causing this issue. Perhaps it has something to do with the format of the document itself.

Jersey Post request - How to perform a file upload with an unknown number of additional parameters?

I asked something like this previously, but upon re-reading my original post, it was not easy to understand what I was really asking. I have the following situation. We have (or at least I'm trying to get working) a custom file upload procedure that will take in the file, a set number of 'known' metadata values (and they will always be there), as well as potentially an unknown number of additional metadata values. The service that exists currently uses the Jersey framework (1.16)
I currently have both client and server code that handles dealing with the file upload portion and the known metadata values (server code below)
@POST
@Path("asset/{obfuscatedValue0}/")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public UUID uploadBlob(@PathParam("obfuscatedValue0") Integer obfuscatedValue0,
        @FormDataParam("obfuscatedValue1") String obfuscatedValue1,
        @FormDataParam("obfuscatedValue2") String obfuscatedValue2,
        @FormDataParam("obfuscatedValue3") String obfuscatedValue3,
        @FormDataParam("obfuscatedValue4") String obfuscatedValue4,
        @FormDataParam("obfuscatedValue5") String obfuscatedValue5,
        @FormDataParam("file") InputStream uploadedInputStream) {
    .....
}
...and excerpt of client code:
Builder requestBuilder = _storageService
.path("asset")
.path(obfuscatedValue0.toString())
.type(MediaType.MULTIPART_FORM_DATA)
.accept(MediaType.APPLICATION_JSON);
FormDataMultiPart part = new FormDataMultiPart()
.field("file", is, MediaType.TEXT_PLAIN_TYPE) // 'is' is an inputstream from earlier in code.
.field("obfuscatedValue1", obfuscatedValue1)
.field("obfuscatedValue2", obfuscatedValue2)
.field("obfuscatedValue3", obfuscatedValue3)
.field("obfuscatedValue4", obfuscatedValue4)
.field("obfuscatedValue5", obfuscatedValue5);
storedAsset = requestBuilder.post(UUID.class, part);
However, I need to pass a map of additional parameters that will have an unknown number of values/names. From what I've seen, there is no easy way to do this using the FormDataParam annotation like my previous example.
Based upon various internet searches related to Jersey file uploads, I've attempted to convert it to use MultivaluedMap with the content type set to "application/x-www-form-urlencoded" so it resembles this:
@POST
@Path("asset/{value}/")
@Consumes("application/x-www-form-urlencoded")
public UUID uploadBlob(@PathParam("value") Integer value, // parameter type assumed from the method above
        MultivaluedMap<String,String> formParams) {
    ....
}
It's my understanding that MultivaluedMap is intended to obtain a general map of form parameters (and as such, cannot play nicely together in the same method bearing @FormDataParam annotations). If I can pass all this information from the client inside some sort of map, I think I can figure out how to parse the map and 'doMagic()' on the data to get what I want done; I don't think I'll have a problem there.
What I AM fairly confused about is how to format the request client-side code when using this second method within the jersey framework. Can anyone provide some guidance for the situation, or some suggestions on how to proceed? I'm considering trying the solution proposed here and developing a custom xml adapter to deal with this situation, and sending xml instead of multipart-form-data but I'm still confused how this would interact with the InputStream value that will need to be passed. It appears the examples with MultivaluedMap that I've seen only deal with String data.
