Iterating and retrieving metadata of all objects in Amazon S3

I am using AWS Java SDK to interact with S3. I want to iterate through all the objects in the storage and retrieve metadata of each object. I can iterate through the objects using lists as:
ObjectListing list= s3client.listObjects("bucket name");
But the listing only gives me object summaries. Instead of a summary, I need the full metadata of each object, like what the getObjectMetadata() method of the S3Object class provides. How do I get that?

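The object summaries returned by listObjects already give you four pieces of default metadata: Last Modified, Storage Class, ETag, and Size.
To get the rest of an object's metadata (including user-defined metadata), you need to issue a HEAD Object request for that object, either by calling getObjectMetadata(bucketName, key) on the client or by passing it a request built with: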
GetObjectMetadataRequest(String bucketName, String key)
For example:
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucketName);
ObjectListing objectListing;
do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        /** Default metadata **/
        Date dtLastModified = objectSummary.getLastModified();
        String sEtag = objectSummary.getETag();
        long lSize = objectSummary.getSize();
        String sStorageClass = objectSummary.getStorageClass();
        /** To get user-defined metadata **/
        ObjectMetadata objectMetadata = s3client.getObjectMetadata(bucketName, objectSummary.getKey());
        Map<String, String> userMetadataMap = objectMetadata.getUserMetadata();
        Map<String, Object> rawMetadataMap = objectMetadata.getRawMetadata();
    }
    // Continue the listing from where the previous page ended
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
For more details, see the documentation for GetObjectMetadataRequest.
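Because this approach issues one HEAD request per key, a large bucket means many sequential round trips. Below is a minimal sketch of fanning those requests out over a thread pool. It is not part of the SDK: MetadataFetcher, fetchAll, and the headObject parameter are hypothetical names, and headObject stands in for something like key -> s3client.getObjectMetadata(bucketName, key).getUserMetadata(). The sketch uses only the standard library so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs one metadata lookup per key on a fixed-size thread pool. */
final class MetadataFetcher {

    /**
     * @param keys       object keys collected from the listing loop
     * @param headObject assumed stand-in for the per-object HEAD call
     * @param threads    number of lookups allowed to run at the same time
     * @return metadata per key, in the same order as {@code keys}
     */
    static Map<String, Map<String, String>> fetchAll(
            List<String> keys,
            Function<String, Map<String, String>> headObject,
            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every lookup first so they can run concurrently.
            List<Future<Map<String, String>>> pending = new ArrayList<>();
            for (String key : keys) {
                Callable<Map<String, String>> task = () -> headObject.apply(key);
                pending.add(pool.submit(task));
            }
            // Collect the results in listing order.
            Map<String, Map<String, String>> result = new LinkedHashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                result.put(keys.get(i), pending.get(i).get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching metadata", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Metadata lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In the loop above you would collect objectSummary.getKey() into a list for each page and hand that list to fetchAll, rather than calling getObjectMetadata inline. The pool size is a tuning choice; a small value keeps the request rate modest.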

Related

Mongo automatic POJO maps object with null values

I'm trying to get documents from MongoDB and map them to my object. Inserting works well, and the retrieved collection also seems to have everything in it. But I can't figure out why deserializing the object doesn't work as intended. I'm new to Java, so I may simply have forgotten something or misunderstood the documentation. But from what I've read, it should work.
I have this code
[...]
CodecProvider pojoCodecProvider = PojoCodecProvider.builder().automatic(true).build();
CodecRegistry pojoCodecRegistry = fromRegistries(MongoClientSettings.getDefaultCodecRegistry(), fromProviders(pojoCodecProvider));
_mongoClient = MongoClients.create(mongoClientURI);
_mongoDataBase = _mongoClient.getDatabase(dbName).withCodecRegistry(pojoCodecRegistry);
[...]
And in another function I get the collection and read it into an ArrayList:
MongoCollection<Document> collection = _mongoDataBase.getCollection(collectionName);
Document query = new Document();
List<StockTakingItem> stockItems = collection.find(query, StockTakingItem.class).into(new ArrayList<StockTakingItem>());
But when I print the values, the MongoCollection contains this (so the document with the proper values is there):
{ "_id" : "asdfasdfasdf", "Description" : "aaaaaaaaaaaaaaaa description ", "Note" : "nnnnnnnnnnnnnnnn Another note " }
But my StockTakingItem has null values in it (except for an id)
StockTakingItem:id: 'asdfasdfasdf', Description: 'null', Note: 'null
EDIT:
StockTakingItem class:
public class StockTakingItem {
    @BsonId
    String _id;
    String _description;
    String _note;
    [getters and setters]
}
This code for inserting is working:
var collection = _mongoDataBase.getCollection(collectionName);
collection.insertOne(docToInsert);
Why are there null values and how can I do it properly so it works as intended?

How to get single GridFS file using Java driver 3.7+?

I need to get a single GridFS file using Java driver 3.7+.
I have two collections holding the file in a database: photo.files and photo.chunks.
The photo.chunks collection contains the binary data of the file.
The photo.files collection contains the metadata of the document.
To find a document in a regular collection I wrote:
Document doc = collection_messages.find(eq("flag", true)).first();
String messageText = (String) Objects.requireNonNull(doc).get("message");
I tried to find the file in the same way as in the example above, based on my collections:
MongoDatabase database_photos = mongoClient.getDatabase("database_photos");
GridFSBucket photos_fs = GridFSBuckets.create(database_photos, "photos");
...
...
GridFSFindIterable gridFSFile = photos_fs.find(eq("_id", new ObjectId()));
String file = Objects.requireNonNull(gridFSFile.first()).getMD5();
And like:
GridFSFindIterable gridFSFile = photos_fs.find(eq("_id", new ObjectId()));
String file = Objects.requireNonNull(gridFSFile.first()).getFilename();
But I get an error:
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at project.Bot.onUpdateReceived(Bot.java:832)
at java.util.ArrayList.forEach(ArrayList.java:1249)
I also checked the docs for the 3.7 driver, but the example there shows how to find several files, and I need a single one:
gridFSBucket.find().forEach(
    new Block<GridFSFile>() {
        public void apply(final GridFSFile gridFSFile) {
            System.out.println(gridFSFile.getFilename());
        }
    });
Can someone show me an example of how to do this properly?
I mean getting the data, e.g. from the chunks collection by ObjectId, and the md5 field from the metadata collection, also by ObjectId.
Thanks in advance.

To find and use a specific file:
photos_fs.find(eq("_id", objectId)).forEach(
    (Block<GridFSFile>) gridFSFile -> {
        // to do something
    });
Alternatively, I can read a specific field of the file.
This can be done by first getting the ObjectId of the first file, then passing it to a find() call to get a GridFSFindIterable, and finally taking the matching GridFSFile and reading the field I need as a String.
MongoDatabase database_photos = mongoClient.getDatabase("database_photos");
GridFSBucket photos_fs = GridFSBuckets.create(database_photos, "photos");
...
...
ObjectId objectId = Objects.requireNonNull(photos_fs.find().first()).getObjectId();
GridFSFindIterable gridFSFindIterable = photos_fs.find(eq("_id", objectId));
GridFSFile gridFSFile = Objects.requireNonNull(gridFSFindIterable.first());
String file = Objects.requireNonNull(gridFSFile).getMD5();
But this reads files from the photo.files collection, not from the photo.chunks collection.
I'm also not sure this approach is safe, because of the warning shown while debugging, but it works despite the warning:
Inconvertible types; cannot cast 'com.mongodb.client.gridfs.model.GridFSFile' to 'com.mongodb.client.gridfs.GridFSFindIterableImpl'

Iterate through Files in Google Cloud Bucket

I am attempting to implement a relatively simple ETL pipeline that iterates through files in a google cloud bucket. The bucket has two folders: /input and /output.
What I'm trying to do is write a Java/Scala script to iterate through files in /input, and have the transformation applied to those that are not present in /output or those that have a timestamp later than that in /output. I've been looking through the Java API doc for a function I can leverage (as opposed to just calling gsutil ls ...), but haven't had any luck so far. Any recommendations on where to look in the doc?
Edit: There is a better way to do this than using data transfer objects:
public Page<Blob> listBlobs() {
    // [START listBlobs]
    Page<Blob> blobs = bucket.list();
    for (Blob blob : blobs.iterateAll()) {
        // do something with the blob
    }
    // [END listBlobs]
    return blobs;
}
Old method:
def getBucketFolderContents(
    bucketName: String
) = {
  val credential = getCredential
  val httpTransport = GoogleNetHttpTransport.newTrustedTransport()
  val requestFactory = httpTransport.createRequestFactory(credential)
  // The bucket name and the object path need a "/" between them
  val uri = "https://www.googleapis.com/storage/v1/b/" + URLEncoder.encode(
    bucketName,
    "UTF-8") +
    "/o/raw%2f"
  val url = new GenericUrl(uri)
  val request = requestFactory.buildGetRequest(url)
  val response = request.execute()
  response
}

You can list objects under a folder by setting the prefix string on the object listing API: https://cloud.google.com/storage/docs/json_api/v1/objects/list
The results of listing are sorted, so you should be able to list both folders and then walk through both in order and generate the diff list.
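The ordered walk described above can be sketched as follows. This is a minimal illustration, not code from the Cloud Storage client library: ListedFile, ListingDiff, and staleOrMissing are hypothetical names, and the sketch assumes each listing has already been reduced to (name relative to its folder, last-update time in milliseconds) pairs that are sorted by name in the same order String.compareTo gives, which holds for plain ASCII names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one listed object: name relative to its folder plus update time. */
final class ListedFile {
    final String name;
    final long updatedMillis;

    ListedFile(String name, long updatedMillis) {
        this.name = name;
        this.updatedMillis = updatedMillis;
    }
}

final class ListingDiff {

    /**
     * Walks two name-sorted listings in step and returns the names of input files
     * that are missing from the output or newer than their output counterpart.
     */
    static List<String> staleOrMissing(List<ListedFile> input, List<ListedFile> output) {
        List<String> todo = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < input.size()) {
            ListedFile in = input.get(i);
            if (j >= output.size()) {
                // Output is exhausted: every remaining input file is missing.
                todo.add(in.name);
                i++;
                continue;
            }
            ListedFile out = output.get(j);
            int cmp = in.name.compareTo(out.name);
            if (cmp < 0) {
                // Input name sorts before the current output name: no counterpart exists.
                todo.add(in.name);
                i++;
            } else if (cmp > 0) {
                // Output file with no matching input: skip it.
                j++;
            } else {
                // Same name in both folders: reprocess only if the input is newer.
                if (in.updatedMillis > out.updatedMillis) {
                    todo.add(in.name);
                }
                i++;
                j++;
            }
        }
        return todo;
    }
}
```

Each listing is read once, so the walk is linear in the total number of objects and does not need either listing loaded into a map first.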

Google Custom search API: how to get search result contents description (e.g snippets) for URL

How can we get the contents of the URLs retrieved using the Google Custom Search API?
I am new to working with such APIs, and the documentation gives no sample code that explains it. I am using google-api-services-customsearch-v1-rev36-1.17.0-rc.jar
Here is my code.
protected Result[] doSearch() {
    HttpRequestInitializer httpRequestInitializer = new HttpRequestInitializer() {
        @Override
        public void initialize(HttpRequest request) throws IOException {
        }
    };
    JsonFactory jsonFactory = new JacksonFactory();
    Customsearch csearch = new Customsearch(new NetHttpTransport(), jsonFactory, httpRequestInitializer);
    Customsearch.Cse.List listReqst = csearch.cse().list(query.getQueryString());
    listReqst.setKey(GOOGLE_KEY);
    // set the search engine ID got from API console
    listReqst.setCx("SEARCH_ENGINE_ID");
    // set the query string
    listReqst.setQ(query); //query contains search query string
    // language chosen is English for search results
    listReqst.setLr("lang_en");
    // set hit position of first search result
    listReqst.setStart((long) firstResult);
    // set max number of search results to return
    listReqst.setNum((long) maxResults);
    // perform search
    Search result = listReqst.execute();
}
After this I need to get the snippets and URLs of the corresponding websites, which I have to return from this function. How can we retrieve them?

The final line of your code executes the query, returns the results, and parses them into that 'Search' object, described here:
https://developers.google.com/resources/api-libraries/documentation/customsearch/v1/java/latest/com/google/api/services/customsearch/model/Search.html
So, to get the URL and snippet for each result you just do:
List<Result> results = result.getItems();
for (Result r : results) {
    String url = r.getLink();
    String snippet = r.getSnippet();
}
To return all the Results, as per your function signature above, you just need to convert the list to an array:
List<Result> results = result.getItems();
return results.toArray(new Result[results.size()]);

how to get all instances with a tag under my amazon account using aws java sdk

I want to get the IDs of all running instances with a particular tag under my AWS account using the AWS Java SDK. Can someone please guide me on how to do this? Thanks.

I did it with a filter. For example, this gets all the instances created with the same key pair. To filter by a tag instead, use the filter name tag:<key> (for example tag:Name) with the tag values you want to match.
DescribeInstancesRequest request = new DescribeInstancesRequest();
List<String> valuesT1 = new ArrayList<String>();
valuesT1.add("my-keypair-name");
Filter filter = new Filter("key-name", valuesT1);
DescribeInstancesResult result = ec2.describeInstances(request.withFilters(filter));
List<Reservation> reservations = result.getReservations();
for (Reservation reservation : reservations) {
    List<Instance> instances = reservation.getInstances();
    for (Instance instance : instances) {
        System.out.println(instance.getInstanceId());
    }
}
