Metadata, content-length for GCS objects - java

Two things:
I am trying to set custom metadata on a GCS object signed URL.
I am trying to set a maximum file size on a GCS object signed URL.
Using the following code:
Map<String, String> headers = new HashMap<>();
headers.put("x-goog-meta-" + usernameKey, username);
if (StringUtils.hasText(purpose)) {
    headers.put("x-goog-meta-" + purposeKey, purpose);
}
if (maxFileSizeMb != null) {
    headers.put("x-goog-content-length-range", String.format("0,%d", maxFileSizeMb * 1048576));
}

List<Storage.SignUrlOption> options = new ArrayList<>();
options.add(Storage.SignUrlOption.httpMethod(HttpMethod.POST));
options.add(Storage.SignUrlOption.withExtHeaders(headers));

String documentId = documentIdGenerator.generateDocumentId().getFormatted();
StorageDocument storageDocument =
    StorageDocument.builder().id(documentId).path(getPathByDocumentId(documentId)).build();
storageDocument.setFormattedName(documentId);

SignedUrlData.SignedUrlDataBuilder builder =
    SignedUrlData.builder()
        .signedUrl(storageInterface.signUrl(gcpStorageBucket, storageDocument, options))
        .documentId(documentId)
        .additionalHeaders(headers);
First of all, the generated signed URL works and I can upload a document.
I now expect to see the object metadata in the console view, but no metadata is set. The content-length-range is not respected either: I can upload a 1.3 MB file when the range is set to 0,1.
Something else happens when I upload a bigger file (~5 MB) that is still within the content-length-range: I receive the error message "Metadata part is too large."

As you can see here, content-length-range requires both a minimum and a maximum size, and the unit used for the range is bytes, as you can see in this example.
I also noticed that you used x-goog-content-length-range. I found this documentation for it; when using this header, take the following into account (see the sketch after this list):
Use a PUT request; otherwise the header is silently ignored.
If the size of the request's content is outside the specified range, the request fails and a 400 Bad Request code is returned in the response.
You have to set the minimum and maximum size in bytes.
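Putting that together, a minimal sketch of signing a V4 URL over PUT with a custom metadata header and x-goog-content-length-range might look like this (bucket, object name, and metadata values are placeholders; the client must send the same headers on the actual upload request, otherwise the signature check fails):
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.HttpMethod;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class SignedUploadUrlSketch {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Headers signed into the URL must also be sent with the upload request.
        Map<String, String> headers = new HashMap<>();
        headers.put("x-goog-meta-username", "jdoe");              // hypothetical metadata key/value
        headers.put("x-goog-content-length-range", "0,1048576");  // min,max in bytes (0 to 1 MiB)

        URL url = storage.signUrl(
                BlobInfo.newBuilder("my-bucket", "uploads/my-object").build(), // placeholder names
                15, TimeUnit.MINUTES,
                Storage.SignUrlOption.httpMethod(HttpMethod.PUT),   // PUT, not POST, or the range is ignored
                Storage.SignUrlOption.withExtHeaders(headers),
                Storage.SignUrlOption.withV4Signature());

        System.out.println(url);
    }
}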

Related

Obtain Folder size in Azure Data Lake Gen2 using Java

There is some literature on the internet about computing folder size with C#, but I could not find anything for Java.
1. Is there an easy way to get the folder size in Gen2?
2. If not, how can it be computed?
There are several examples on the internet for (2) in C# and PowerShell. Is there any way to do it with Java?
As far as I am aware, there is no API that directly provides the folder size in Azure Data Lake Gen2.
To do it recursively:
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .credential(new StorageSharedKeyCredential(storageAccountName, secret))
        .endpoint(endpoint)
        .buildClient();
DataLakeFileSystemClient container = dataLakeServiceClient.getFileSystemClient(containerName);
/**
 * Returns the size of the folder in bytes.
 *
 * @param folder path of the folder inside the file system
 * @return total size of all files under the folder, in bytes
 */
@Beta
public Long getSize(String folder) {
    DataLakeDirectoryClient directoryClient = container.getDirectoryClient(folder);
    if (directoryClient.exists()) {
        // listPaths(recursive=true, ...) walks the whole folder tree; sum the size of every file
        return directoryClient.listPaths(true, false, null, null)
                .stream()
                .filter(x -> !x.isDirectory())
                .mapToLong(PathItem::getContentLength)
                .sum();
    }
    throw new RuntimeException("Not a valid folder: " + folder);
}
This recursively iterates through the folders and obtains the size.
The default records per page is 5000. So if there are 12000 records (folders + files combined), it would need to make 3 API calls to fetch details. From the docs:
recursive – Specifies if the call should recursively include all paths.
userPrincipleNameReturned – If "true", the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If "false", the values will be returned as Azure Active Directory Object IDs. The default value is false. Note that group and application Object IDs are not translated because they do not have unique friendly names.
maxResults – Specifies the maximum number of blobs to return per page, including all BlobPrefix elements. If the request does not specify maxResults or specifies a value greater than 5,000, the server will return up to 5,000 items per page. If iterating by page, the page size passed to byPage methods such as PagedIterable.iterableByPage(int) will be preferred over this value.
timeout – An optional timeout value beyond which a RuntimeException will be raised.
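If you want to observe or tune the paging yourself, a rough sketch using iterableByPage (reusing the directoryClient from the method above; the page size of 5000 is simply the documented server maximum) could look like this:
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.http.rest.PagedResponse;
import com.azure.storage.file.datalake.models.PathItem;

// ...

PagedIterable<PathItem> paths = directoryClient.listPaths(true, false, null, null);

long totalBytes = 0;
int pages = 0;
for (PagedResponse<PathItem> page : paths.iterableByPage(5000)) {
    pages++; // each page corresponds to one list call against the service
    for (PathItem item : page.getValue()) {
        if (!item.isDirectory()) {
            totalBytes += item.getContentLength();
        }
    }
}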

Unable to change chunkSize in MongoDB

I have deployed a MongoDB instance that is the database for my Parse Server, but I'm not able to upload files larger than 700 kilobytes to Parse Server with the parseFile.saveFileInBackground() function in the Parse Android SDK. The exception thrown by the callback is an I/O failure, caused by a timeout. After considering many possibilities, I figured that changing the default chunk size that files are divided into in MongoDB might help. I used db.settings.save( { _id:"chunkSize", value: <size-in-megabytes> } ) in the MongoDB shell, but I cannot see any change: after this command, the chunkSize field of the entries stored under files.fs in MongoDB is still the same as before, 261120. What I actually want is to change this 261120 to 358400, so that my files are divided into chunks of 358400 bytes instead of 261120. Any help would be appreciated.
I managed to change the MongoDB chunk size from the default of 261120 bytes to 3000000 bytes. I am using multer-gridfs-storage, and below is the code I used to increase the chunk size:
const storage = new GridFsStorage({
    db: connection,
    file: (req, file) => {
        return new Promise(resolve => {
            const fileInfo = {
                filename: file.originalname,
                bucketName: "file_uploads",
                chunkSize: 3000000
            };
            resolve(fileInfo);
        });
    }
});
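For reference, the GridFS chunk size is chosen by the client driver at upload time rather than by a server-side setting, which is likely why the db.settings.save command had no visible effect. If the files are written from Java, the equivalent knob in the MongoDB Java driver is GridFSUploadOptions.chunkSizeBytes; a minimal sketch follows (connection string, database, bucket, and file names are placeholders):
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import com.mongodb.client.gridfs.model.GridFSUploadOptions;

import java.io.FileInputStream;
import java.io.InputStream;

public class GridFsChunkSizeSketch {
    public static void main(String[] args) throws Exception {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            GridFSBucket bucket = GridFSBuckets.create(client.getDatabase("mydb"), "file_uploads");

            // Ask the driver to split this upload into 358400-byte chunks (the size from the question).
            GridFSUploadOptions options = new GridFSUploadOptions()
                    .chunkSizeBytes(358400);

            try (InputStream in = new FileInputStream("document.pdf")) {
                bucket.uploadFromStream("document.pdf", in, options);
            }
        }
    }
}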

jsoup don't get full data

I have a school project that involves parsing web pages and using them like a database. When I tried to download data from https://www.marathonbet.com/en/betting/Football/, I didn't get all of it.
Here is my code:
Document doc = Jsoup.connect("https://www.marathonbet.com/en/betting/Football/").get();
Elements newsHeadlines = doc.select("div#container_EVENTS");
for (Element e : newsHeadlines.select("[id^=container_]")) {
    System.out.println(e.select("[class^=block-events-head]").first().text());
    System.out.println(e.select("[class^=foot-market]").select("[class^=event]").text());
}
As a result you get (this is the last of the displayed leagues):
Football. Friendlies. Internationals All bets Main bets
1. USA 2. Mexico 16 Apr 01:30 +124 7/5 23/10 111/50 +124
Above it, all the other leagues are displayed.
Why don't I get the full data? Thank you for your time!
Jsoup has a default body response limit of 2MB. You can change it to whatever you need with maxBodySize(int)
Set the maximum bytes to read from the (uncompressed) connection into
the body, before the connection is closed, and the input truncated.
The default maximum is 2MB. A max size of zero is treated as an
infinite amount (bounded only by your patience and the memory
available on your machine).
E.g.:
Document doc = Jsoup.connect(url).userAgent(ua).maxBodySize(0).get();
You might like to look at the other options in Connection, on how to set request timeouts, the user-agent, etc.
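For example, a few of those Connection options combined in one call (the user-agent string and timeout are arbitrary example values):
Document doc = Jsoup.connect("https://www.marathonbet.com/en/betting/Football/")
        .userAgent("Mozilla/5.0")   // some sites serve different or empty content to unknown agents
        .timeout(30_000)            // request timeout in milliseconds
        .maxBodySize(0)             // 0 = no limit on the response body size
        .get();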

Document feed limit

Is there a limit to the number of entries returned in a DocumentListFeed? I'm getting 100 results, and some of the collections in my account are missing.
How can I make sure I get all of the collections in my account?
DocsService service = new DocsService(APP_NAME);
service.setHeader("Authorization", "Bearer " + accessToken);
URL feedUrl = new URL("https://docs.google.com/feeds/default/private/full/-/folder?v=3&showfolders=true&showroot=true");
DocumentListFeed feed = service.getFeed(feedUrl, DocumentListFeed.class);
List<DocumentListEntry> entries = feed.getEntries();
The size of entries is 100.
A single request to the Documents List feed returns 100 elements by default, but you can configure that value by setting the ?max-results query parameter.
Regardless, in order to retrieve all documents and files you should always take into account sending multiple requests, one per page, as explained in the documentation:
https://developers.google.com/google-apps/documents-list/#getting_all_pages_of_documents_and_files
Please also note that it is now recommended to switch to the newer Google Drive API, which interacts with the same resources and has complete documentation and sample code in multiple languages, including Java:
https://developers.google.com/drive/
You can call
feed.getNextLink().getHref()
to get a URL that you can form another feed with. This can be done until the link is null, at which point all the entries have been fetched.
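A rough sketch of that loop, reusing the authorized DocsService ("service") from the question above (the gdata Documents List API v3 is deprecated in favour of the Drive API):
List<DocumentListEntry> allEntries = new ArrayList<>();
URL feedUrl = new URL("https://docs.google.com/feeds/default/private/full/-/folder?v=3&showfolders=true&showroot=true");
DocumentListFeed feed = service.getFeed(feedUrl, DocumentListFeed.class);
while (feed != null) {
    allEntries.addAll(feed.getEntries());
    // Follow the "next" link until there are no more pages.
    feed = (feed.getNextLink() == null)
            ? null
            : service.getFeed(new URL(feed.getNextLink().getHref()), DocumentListFeed.class);
}
// allEntries now holds every entry across all pages.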

Oracle Workflow API: adding and accessing file Attachments to a Human Task

I am using the Workflow Services Java API (11.1.1) for SOA Suite to access and manipulate human tasks. I would like to be able to access and add file attachments to existing human tasks. I am using the methods provided in the AttachmentType interface.
When adding an attachment, the problem I am running into is that an attachment does get created and associated with the task, however it is empty and has no content. I have attempted both setting the input stream of the attachment, as well as the content string and in each case have had no success (and setting the content string results in an exception when trying to update the corresponding task).
I have successfully added and accessed an attachment using the worklist application, however when trying to access the content of this attachment through code I receive an object with mostly null/0 values throughout, apart from the attachment name.
The code I am using to access attachments resembles:
List attachments = taskWithAttachments.getAttachment();
for (Object o : attachments) {
    AttachmentType a = (AttachmentType) o;
    String content = a.getContent();      // NULL
    InputStream str = a.getInputStream(); // NULL
    String name = a.getName();            // Has the attachment name
    String mime = a.getMimeType();        // Has the mime type
    long size = a.getSize();              // 0
    ...
}
As the APIs are not overly rich in documentation, I may well be using them incorrectly. I would really appreciate any help, suggestions, or alternatives for dealing with BPEL task attachments.
Thanks
After contacting Oracle for support, it turns out that the attachments portion of the Workflow API is broken in the current release. The fix will be included in a future release.
