Unable to change chunkSize in MongoDB - java

I have deployed MongoDB as the database for my Parse Server, but I am not able to upload files larger than 700 kilobytes to Parse Server with the parseFile.saveFileInBackground() function of the Parse Android SDK. The exception delivered to the callback is an I/O failure, and the cause is a timeout. After looking into a number of possible causes, I figured that changing the default size of the chunks that files are split into in MongoDB's GridFS might help. I ran db.settings.save( { _id:"chunkSize", value: <size-in-megabytes> } ) in the MongoDB shell, but I cannot see any change: after this command the chunkSize field of the documents stored in the fs.files collection is still 261120, the same as before. What I actually want is to change this 261120 to 358400, i.e. to have my files split into chunks of 358400 bytes instead of 261120. Any help would be appreciated.

I managed to change the MongoDB chunk size from the default of 261120 bytes to 3000000 bytes. I am using multer-gridfs-storage, and below is the code I used to increase the chunk size:
// assumes: const { GridFsStorage } = require("multer-gridfs-storage");
const storage = new GridFsStorage({
  db: connection,
  file: (req, file) => {
    return new Promise(resolve => {
      const fileInfo = {
        filename: file.originalname,
        bucketName: "file_uploads",
        chunkSize: 3000000 // GridFS chunk size in bytes for files stored through this storage engine
      };
      resolve(fileInfo);
    });
  }
});
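Since the original question is about a Java stack: the GridFS chunk size is chosen by the uploading client per file (261120 bytes is just the driver default), so if you control the upload path you can set it there. Below is a minimal sketch with the MongoDB Java driver; the connection string, database, bucket, and file names are placeholders, and Parse Server itself would need the equivalent option in its own files adapter:

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import com.mongodb.client.gridfs.model.GridFSUploadOptions;

import java.io.FileInputStream;
import java.io.InputStream;

public class GridFsChunkSizeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string and database name.
        MongoDatabase db = MongoClients.create("mongodb://localhost:27017").getDatabase("parse");

        // Ask the driver to split this upload into 358400-byte chunks
        // instead of its 261120-byte default.
        GridFSUploadOptions options = new GridFSUploadOptions().chunkSizeBytes(358400);

        GridFSBucket bucket = GridFSBuckets.create(db, "fs");
        try (InputStream in = new FileInputStream("video.mp4")) {
            bucket.uploadFromStream("video.mp4", in, options);
        }
    }
}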

Related

readFile causes "Could not fulfill resource requirements of job"

I have an S3 bucket with terabytes of data, split into small files of less than 5 MB each.
I am trying to use Flink to process them.
I create the source with the following code:
var inputFormat = new TextInputFormat(null);
inputFormat.setNestedFileEnumeration(true);
return streamExecutionEnvironment.readFile(inputFormat, "s3://name/");
But the used memory grows up to the limit, the job is killed, and it is not scheduled again, with the error:
Could not fulfill resource requirements of job
No data reaches the sink.
On a small set of data it works fine.
How can I read the files without using too much memory?
Thanks.
The same behaviour occurs with:
env.fromSource(
        FileSource.forRecordStreamFormat(
                new TextLineFormat(),
                new Path("s3://name/")
        )
        .monitorContinuously(Duration.ofMillis(10000L))
        .build(),
        WatermarkStrategy.noWatermarks(),
        "MySourceName"
)
The FileSource is the preferred way to ingest data from files. It should be able to handle the sort of scale you are talking about.
See the docs and javadocs.
Setting setQueueLimit on the Kinesis producer solved my problem: https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/datastream/kinesis/#backpressure
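For reference, a rough sketch of that setting on the legacy FlinkKinesisProducer, based on the backpressure section linked above; the region, credentials handling, stream name, and limit value are placeholders:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;

import java.util.Properties;

// ... inside the job that builds the sink ...
Properties producerConfig = new Properties();
producerConfig.put("aws.region", "us-east-1"); // placeholder region

FlinkKinesisProducer<String> producer =
        new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
producer.setDefaultStream("output-stream");    // placeholder stream name
producer.setDefaultPartition("0");

// Cap how many records the internal KPL queue may buffer. Once the limit is
// reached the sink blocks, so backpressure propagates upstream instead of
// letting the producer's buffer (and the job's memory) grow without bound.
producer.setQueueLimit(100);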

Metadata, content-length for GCS objects

Two things:
I am trying to set custom metadata on a GCS object signed URL.
I am trying to set a maximum file size on a GCS object signed URL.
Using the following code:
Map<String, String> headers = new HashMap<>();
headers.put("x-goog-meta-" + usernameKey, username);
if (StringUtils.hasText(purpose)) {
    headers.put("x-goog-meta-" + purposeKey, purpose);
}
if (maxFileSizeMb != null) {
    headers.put("x-goog-content-length-range", String.format("0,%d", maxFileSizeMb * 1048576));
}

List<Storage.SignUrlOption> options = new ArrayList<>();
options.add(Storage.SignUrlOption.httpMethod(HttpMethod.POST));
options.add(Storage.SignUrlOption.withExtHeaders(headers));

String documentId = documentIdGenerator.generateDocumentId().getFormatted();
StorageDocument storageDocument =
        StorageDocument.builder().id(documentId).path(getPathByDocumentId(documentId)).build();
storageDocument.setFormattedName(documentId);

SignedUrlData.SignedUrlDataBuilder builder =
        SignedUrlData.builder()
                .signedUrl(storageInterface.signUrl(gcpStorageBucket, storageDocument, options))
                .documentId(documentId)
                .additionalHeaders(headers);
First of all, the generated signed URL works and I can upload a document.
However, I expected to see the object metadata in the console view, and no metadata is set. Also, the content-length-range is not respected: I can upload a 1.3 MB file when the content-length-range is set to 0,1.
Something also happens when I upload a bigger file (~5 MB) that is still within the content-length-range: I receive an error message: "Metadata part is too large."
As you can see here, content-length-range requires both a minimum and a maximum size, and the unit used for the range is bytes, as shown in this example.
I also noticed that you used x-goog-content-length-range; I found documentation for it. When using this header, take into account:
Use a PUT request, otherwise it will be silently ignored.
If the size of the request's content is outside the specified range, the request fails and a 400 Bad Request code is returned in the response.
You have to set the minimum and maximum size in bytes.
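Putting those points together, a rough sketch of signing for PUT with the extension headers included; the bucket, object name, metadata values, and 15-minute expiry are placeholders, and the client has to send the exact same x-goog-* headers on the upload request for them to take effect:

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.HttpMethod;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class SignedUploadUrlExample {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        Map<String, String> headers = new HashMap<>();
        headers.put("x-goog-meta-username", "jdoe");             // custom metadata (placeholder)
        headers.put("x-goog-content-length-range", "0,1048576"); // min,max in bytes (here 0 to 1 MiB)

        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of("my-bucket", "docs/doc-123")).build();

        // Sign for PUT: the content-length-range header is ignored on other methods.
        URL signedUrl = storage.signUrl(
                blobInfo,
                15, TimeUnit.MINUTES,
                Storage.SignUrlOption.httpMethod(HttpMethod.PUT),
                Storage.SignUrlOption.withExtHeaders(headers),
                Storage.SignUrlOption.withV4Signature());

        System.out.println(signedUrl);
    }
}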

Processing huge files using AWS S3 and Lambda

I am trying to decrypt files that arrive periodically in our S3 bucket. How can I process them if the file size is huge (e.g. 10 GB), since the computing resources of Lambda are limited? I'm not sure whether it is necessary to download the whole file into Lambda and perform the decryption there, or whether there is some other way to chunk the file and process it.
Edit: Processing the file here includes decrypting it, then parsing each row and writing it to a persistent store like a SQL queue or database.
You can set the byte-range in the GetObjectRequest to load a specific range of bytes from an S3 object.
The following example comes from the official AWS documentation on the S3 GetObject API:
// Get a range of bytes from an object and print the bytes.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key).withRange(0, 9);
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
System.out.println("Printing bytes retrieved.");
displayTextInputStream(objectPortion.getObjectContent());
For more information, you can visit the documentation here:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html
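If the whole object really has to be walked, the same range mechanism can be applied in a loop so that only one chunk is held in memory at a time. A rough sketch with the AWS SDK for Java v1; the chunk size, method names, and the handleChunk step are placeholders, and this only helps if the decryption scheme can work on a stream or on independent chunks:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

import java.io.IOException;
import java.io.InputStream;

public class RangedS3Reader {

    private static final long CHUNK_SIZE = 8L * 1024 * 1024; // 8 MB per request (placeholder)

    public static void processInChunks(String bucket, String key) throws IOException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        long objectSize = s3.getObjectMetadata(bucket, key).getContentLength();

        for (long start = 0; start < objectSize; start += CHUNK_SIZE) {
            long end = Math.min(start + CHUNK_SIZE, objectSize) - 1; // byte ranges are inclusive
            GetObjectRequest request = new GetObjectRequest(bucket, key).withRange(start, end);

            try (S3Object part = s3.getObject(request);
                 InputStream in = part.getObjectContent()) {
                // Hand this chunk to the decryption / row-parsing step.
                handleChunk(in, start, end);
            }
        }
    }

    private static void handleChunk(InputStream in, long start, long end) {
        // placeholder for the actual decrypt-and-parse logic
    }
}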

Java/MongoDB message length error on Windows but not on Linux

We are currently working on importing huge JSON files (~100 MB) into MongoDB using the Java driver. Currently we split up the files into smaller chunks, since we first encountered problems with importing the whole file. Of course we are aware of the MongoDB limitation that the maximum document size is 16 MB; however, the chunks that we are now importing are far smaller than that.
Strangely enough, the import procedure works when running it on Linux (Eclipse), yet the same program throws an exception stating "can't say something" on Windows (Eclipse).
When observing the log from the database, the error message says
> "Thu Sep 13 11:38:48 [conn1] recv(): message len 1835627538 is too
> large1835627538"
Rerunning the import on the same dataset always leads to the same error message regarding the message length. We investigated the size of the documents to import (using .toString().length()) - the chunk that caused the error was only a few kB large.
It makes no difference which OS the Mongo database runs on; it only depends on where the import code is executed (using the same java-mongo-driver in both cases).
"we are currently working on importing huge JSON files (~100 MB) into
MongoDB using the java driver"
Are we talking about a JSON file containing 1000s of JSON objects OR one JSON object that is ~100 MB in size? Because if I remember correctly, the 16 MB limit is per object, not per JSON file containing 1000s of JSON objects.
Also!
"Thu Sep 13 11:38:48 [conn1] recv(): message len 1835627538 is too
large1835627538"
the chunk that caused the error was only some kB large.
If 1835627538 is indeed in kB, that is pretty big, because that's around ~1750 gigabytes!!
To get around a JSON file containing 1000s of JSON objects, why don't you iterate through your data file line by line and do your inserts that way? With this method it doesn't matter how large your data file is; the iterator is just a pointer to a specific line. It doesn't load the WHOLE FILE into memory and insert it.
NOTE: This assumes your data file contains one JSON object per line.
Using the Apache Commons IO FileUtils, you can use its LineIterator to iterate through your file, for example (not fully working code, you need to import the correct libs):
LineIterator line_iter;
try {
    line_iter = FileUtils.lineIterator(data_file);
    while (line_iter.hasNext()) {
        String line = line_iter.nextLine();
        try {
            if (line.charAt(0) == '{')
                this.mongodb.insert(line);
        } catch (IndexOutOfBoundsException e) {
            // blank line, nothing to insert
        }
    }
    line_iter.close(); // close the iterator
} catch (IOException e) {
    e.printStackTrace();
}
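this.mongodb.insert(line) above stands in for whatever wrapper is already in place; with the plain MongoDB Java driver a per-line insert could look roughly like the following sketch (connection string, database, and collection names are placeholders):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// ... set up once, outside the line loop ...
MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
        .getDatabase("mydb")
        .getCollection("imports");

// Parse one JSON object per line and insert it as a single document,
// so the whole 100 MB file is never held in memory at once.
collection.insertOne(Document.parse(line));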

mapping byte[] in Hibernate and adding file chunk by chunk

I have a web service which receives a 100 MB video file in chunks:
public void addFileChunk(Long fileId, byte[] buffer)
How can I store this file in a PostgreSQL database using Hibernate?
Using regular JDBC is very straightforward. I would use the following code inside my web service method:
LargeObject largeObject = largeObjectManager.open(fileId, LargeObjectManager.READWRITE);
int size = largeObject.size();
largeObject.seek(size);        // append after the existing data
largeObject.write(buffer);
largeObject.close();
How can I achieve the same functionality using Hibernate and store the file chunk by chunk?
Storing each file chunk in a separate row as bytea does not seem like a smart idea to me. Please advise.
It's not advisable to store 100 MB files in the database. I would instead store them in the filesystem; since transactions are involved, something along these lines seems reasonable:
Process the HTTP request so that the received file is stored in some temporary location.
Open a transaction, persist the file metadata including the temporary location, close the transaction.
Use some external process that monitors the temporary files and transfers each file to its final destination, from which it will be made available to the user through some Servlet.
See http://in.relation.to/Bloggers/PostgreSQLAndBLOBs
Yeah byteas would be bad. Hibernate has a way to continue to use large objects and you get to keep the streaming interface.
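A rough sketch of what that can look like, assuming an entity mapped with a java.sql.Blob field and an open Hibernate Session; the entity, table, and column names are made up, and the chunks are assumed to be assembled into one InputStream (for example from a temporary file) before the write:

import org.hibernate.Session;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Lob;
import java.io.InputStream;
import java.sql.Blob;

@Entity
class VideoFile {
    @Id
    Long id;

    @Lob
    @Column(name = "content")
    Blob content;   // with PostgreSQL this typically maps to a large object (oid)
}

class VideoDao {
    /** Streams the data into the database without materializing it as a byte[]. */
    void save(Session session, Long fileId, InputStream data, long length) {
        VideoFile file = new VideoFile();
        file.id = fileId;
        // LobHelper creates a Blob backed by the stream, keeping the streaming interface.
        file.content = session.getLobHelper().createBlob(data, length);
        session.persist(file);
    }
}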
