Play Framework eating up disk space - java

I am successfully serving videos using the Play framework, but I'm experiencing an issue: each time a file is served, the Play framework creates a copy in C:\Users\user\AppData\Temp. I'm serving large files so this quickly creates a problem with disk space.
Is there any way to serve a file in Play without creating a copy? Or have Play automatically delete the temp file?
Code I'm using to serve is essentially:
public Result video() {
    return ok(new File("whatever"));
}

Use Streaming
I use the following method for video streaming. This code does not create temp copies of the media file.
Basically, this code responds to the RANGE requests sent by the browser. If the browser does not send a Range header, I fall back to sending the whole file with Ok.sendFile (internally Play also tries to stream the file), which might create temp files. But this happens rarely, only when the browser does not use range requests.
GET /media controllers.MediaController.media
Put this code inside a Controller called MediaController
def media = Action { req =>
  val file = new File("/Users/something/Downloads/somefile.mp4")
  val rangeHeaderOpt = req.headers.get(RANGE)
  rangeHeaderOpt.map { range =>
    val strs = range.substring("bytes=".length).split("-")
    if (strs.length == 1) {
      // open-ended range, e.g. "bytes=100-": serve from `start` to the end of the file
      val start = strs.head.toLong
      val end = file.length() - 1L
      partialContentHelper(file, start, end)
    } else {
      val start = strs.head.toLong
      val end = strs.tail.head.toLong
      partialContentHelper(file, start, end)
    }
  }.getOrElse {
    // no Range header: fall back to sending the whole file
    Ok.sendFile(file)
  }
}

def partialContentHelper(file: File, start: Long, end: Long) = {
  val fis = new FileInputStream(file)
  fis.skip(start)
  val byteStringEnumerator = Enumerator.fromStream(fis).&>(Enumeratee.map(ByteString.fromArray(_)))
  val mediaSource = Source.fromPublisher(Streams.enumeratorToPublisher(byteStringEnumerator))
  PartialContent.sendEntity(HttpEntity.Streamed(mediaSource, None, None)).withHeaders(
    CONTENT_TYPE -> MimeTypes.forExtension("mp4").get,
    CONTENT_LENGTH -> ((end - start) + 1).toString,
    CONTENT_RANGE -> s"bytes $start-$end/${file.length()}",
    ACCEPT_RANGES -> "bytes",
    CONNECTION -> "keep-alive"
  )
}
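If you want the non-Range fallback to avoid Ok.sendFile (and its possible temp files) entirely, here is a minimal sketch that streams the whole file with Akka Streams, reusing the same HttpEntity.Streamed pattern as partialContentHelper above. It assumes Play 2.5+/2.6 with Akka Streams on the classpath, and the "video/mp4" content type is a placeholder. Play 2.6+ also ships a RangeResult helper that implements Range handling for you, which may be worth checking for your version.

def wholeFile(file: File) = {
  // FileIO.fromPath streams the file from disk without making a temp copy
  val source = akka.stream.scaladsl.FileIO.fromPath(file.toPath)
  Ok.sendEntity(HttpEntity.Streamed(source, Some(file.length()), Some("video/mp4")))
}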

Related

ParquetFileReader not working with file stream

I'm trying to use ParquetFileReader to read files I'm receiving from S3, using a custom InputFile class, since it's not a local file and I can't create a local temp file either.
Here is my custom class, based on this answer:
class ParquetInputFile(stream: ByteArray) : InputFile {
    var data: ByteArray = stream

    private class SeekableByteArrayInputStream(buf: ByteArray?) : ByteArrayInputStream(buf) {
        var pos: Long = -1
    }

    override fun getLength(): Long {
        return data.size.toLong()
    }

    override fun newStream(): SeekableInputStream {
        return object : DelegatingSeekableInputStream(SeekableByteArrayInputStream(data)) {
            override fun seek(newPos: Long) {
                (stream as SeekableByteArrayInputStream).pos = newPos
            }

            override fun getPos(): Long {
                return (stream as SeekableByteArrayInputStream).pos
            }
        }
    }

    override fun toString(): String {
        return "com.test.ParquetInputFile[]"
    }
}
And here is my code getting the file from S3 and using the class above:
val bucketFile = s3Client.getObject(
    GetObjectRequest(
        "bucket-name",
        "test_file.parquet"
    )
)

val file = ParquetInputFile(bucketFile.objectContent.readAllBytes())
val reader = ParquetFileReader.open(file)
I'm getting an exception in the .open() call when the reader tries to read the file footer: it says the file is not a Parquet file while checking the "magic" byte array.
As a quick test, I read the same S3 file from the local disk using a deprecated method of ParquetFileReader, and that works:
val local = ParquetFileReader.open(Configuration(), Path("/Users/casky/Documents/pocs/resources/test_file.parquet"))
Debugging the same readFooter method, I saw that when it reads the fileMetadataLength here, the size differs between the local file and the S3 file. When readIntLittleEndian() executes its four read calls here, the results returned in ch1, ch2, ch3 and ch4 are:
Local file: 242, 7, 0, 0, returning 2034
S3 file: 80, 65, 82, 49, returning 827474256
But, as you can see, the ch1-ch4 values from the S3 file are exactly the bytes that the Parquet MAGIC array expects.
Now I'm not sure if the custom class is messing this up somehow, or if the Path object does something with the file content while reading it from the local disk.
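As a side check of those numbers (a standalone snippet, written in Scala only for illustration, not part of the reader code): a Parquet file both starts and ends with the 4-byte magic "PAR1", and the 4 bytes just before the trailing magic hold the footer length in little-endian order. Decoding the two byte sequences shows the local read really is a footer length, while the S3 read is the magic itself, which suggests the custom stream is reading from the wrong position (the seek is not taking effect):

val localBytes = Array(242, 7, 0, 0)
val s3Bytes    = Array(80, 65, 82, 49)

// decode 4 bytes as a little-endian 32-bit integer, as readIntLittleEndian does
def littleEndianInt(b: Array[Int]): Int =
  b(0) | (b(1) << 8) | (b(2) << 16) | (b(3) << 24)

println(littleEndianInt(localBytes))      // 2034 -> a plausible footer length
println(littleEndianInt(s3Bytes))         // 827474256
println(s3Bytes.map(_.toChar).mkString)   // "PAR1" -> the Parquet magic bytes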

S3 / MinIO with Java / Scala: Saving byte buffers chunks of files to object storage

So, imagine that I have a Scala Vert.x Web REST API that receives file uploads via HTTP multipart requests. However, it doesn't receive the incoming file data as a single InputStream. Instead, each file is received as a series of byte buffers handed over via a few callback functions.
The callbacks basically look like this:
// the callback that receives byte buffers (chunks) of the file being uploaded
// it is called multiple times until the full file has been received
upload.handler { buffer =>
// send chunk to backend
}
// the callback that gets called after the full file has been uploaded
// (i.e. after all chunks have been received)
upload.endHandler { _ =>
// do something after the file has been uploaded
}
// callback called if an exception is raised while receiving the file
upload.exceptionHandler { e =>
// do something to handle the exception
}
Now, I'd like to use these callbacks to save the file into a MinIO bucket (MinIO, if you're unfamiliar, is basically self-hosted S3 and its API is pretty much the same as the S3 Java API).
Since I don't have a file handle, I need to use putObject() to put an InputStream into MinIO.
The inefficient work-around I'm currently using with the MinIO Java API looks like this:
// this is all inside the context of handling a HTTP request
val out = new PipedOutputStream()
val in = new PipedInputStream()
var size = 0
in.connect(out)

upload.handler { buffer =>
  out.write(buffer.getBytes)
  size += buffer.length()
}

upload.endHandler { _ =>
  minioClient.putObject(
    PutObjectArgs.builder()
      .bucket("my-bucket")
      .object("my-filename")
      .stream(in, size, 50000000)
      .build())
}
Obviously, this isn't optimal. Since I'm using a simple java.io stream here, the entire file ends up getting loaded into memory.
I don't want to save the File to disk on the server before putting it into object storage. I'd like to put it straight into my object storage.
How could I accomplish this using the S3 API and a series of byte buffers given to me via the upload.handler callback?
EDIT
I should add that I am using MinIO because I cannot use a commercially-hosted cloud solution, like S3. However, as mentioned on MinIO's website, I can use Amazon's S3 Java SDK while using MinIO as my storage solution.
I attempted to follow this guide on Amazon's website for uploading objects to S3 in chunks.
The solution I attempted looks like this:
context.request.uploadHandler { upload =>
  println(s"Filename: ${upload.filename()}")

  val partETags = new util.ArrayList[PartETag]
  val initRequest = new InitiateMultipartUploadRequest("docs", "my-filekey")
  val initResponse = s3Client.initiateMultipartUpload(initRequest)

  upload.handler { buffer =>
    println("uploading part", buffer.length())
    try {
      val request = new UploadPartRequest()
        .withBucketName("docs")
        .withKey("my-filekey")
        .withPartSize(buffer.length())
        .withUploadId(initResponse.getUploadId)
        .withInputStream(new ByteArrayInputStream(buffer.getBytes()))

      val uploadResult = s3Client.uploadPart(request)
      partETags.add(uploadResult.getPartETag)
    } catch {
      case e: Exception => println("Exception raised: ", e)
    }
  }

  // this gets called for EACH uploaded file sequentially
  upload.endHandler { _ =>
    // upload successful
    println("done uploading")
    try {
      val compRequest = new CompleteMultipartUploadRequest("docs", "my-filekey", initResponse.getUploadId, partETags)
      s3Client.completeMultipartUpload(compRequest)
    } catch {
      case e: Exception => println("Exception raised: ", e)
    }
    context.response.setStatusCode(200).end("Uploaded")
  }

  upload.exceptionHandler { e =>
    // handle the exception
    println("exception thrown", e)
  }
}
This works for small files (my small test file was 11 bytes), but not for large files.
In the case of large files, the processing inside upload.handler gets progressively slower as the file continues to upload. Also, upload.endHandler is never called, and the file somehow continues uploading after 100% of it has been received.
However, as soon as I comment out the s3Client.uploadPart(request) portion inside upload.handler and the s3Client.completeMultipartUpload parts inside upload.endHandler (basically throwing away the file instead of saving it to object storage), the file upload progresses as normal and terminates correctly.
I figured out what I was doing wrong (when using the S3 client). I was not accumulating bytes inside my upload.handler. I need to accumulate bytes until the buffer size is big enough to upload a part, rather than upload each time I receive a few bytes.
Since neither Amazon's S3 client nor the MinIO client did what I want, I decided to dig into how putObject() was actually implemented and make my own. This is what I came up with.
This implementation is specific to Vert.X, however it can easily be generalized to work with built-in java.io InputStreams via a while loop and using a pair of Piped- streams.
This implementation is also specific to MinIO, but it can easily be adapted to use the S3 client since, for the most part, the two APIs are the same.
In this example, Buffer is basically a container around a ByteArray and I'm not really doing anything special here. I replaced it with a byte array to ensure that it would still work, and it did.
package server

import com.google.common.collect.HashMultimap
import io.minio.MinioClient
import io.minio.messages.Part
import io.vertx.core.buffer.Buffer
import io.vertx.core.streams.ReadStream

import scala.collection.mutable.ListBuffer

class CustomMinioClient(client: MinioClient) extends MinioClient(client) {

  def putReadStream(bucket: String = "my-bucket",
                    objectName: String,
                    region: String = "us-east-1",
                    data: ReadStream[Buffer],
                    objectSize: Long,
                    contentType: String = "application/octet-stream"
                   ) = {
    val headers: HashMultimap[String, String] = HashMultimap.create()
    headers.put("Content-Type", contentType)
    var uploadId: String = null

    try {
      val parts = new ListBuffer[Part]()
      val createResponse = createMultipartUpload(bucket, region, objectName, headers, null)
      uploadId = createResponse.result.uploadId()

      var partNumber = 1
      var uploadedSize = 0

      // a buffer used to accumulate bytes from the incoming stream until we have
      // enough to make an `uploadPart` request
      var partBuffer = Buffer.buffer()

      // S3's minimum part size is 5 MB, except for the last part.
      // You should probably implement your own logic for determining how big to make
      // each part based on the total object size, to avoid unnecessary calls to S3
      // to upload small parts.
      val minPartSize = 5 * 1024 * 1024

      data.handler { buffer =>
        partBuffer.appendBuffer(buffer)

        val availableSize = objectSize - uploadedSize - partBuffer.length
        val isMinPartSize = partBuffer.length >= minPartSize
        val isLastPart = uploadedSize + partBuffer.length == objectSize

        if (isMinPartSize || isLastPart) {
          val partResponse = uploadPart(
            bucket,
            region,
            objectName,
            partBuffer.getBytes,
            partBuffer.length,
            uploadId,
            partNumber,
            null,
            null
          )

          parts.addOne(new Part(partNumber, partResponse.etag))
          uploadedSize += partBuffer.length
          partNumber += 1

          // empty the part buffer since we have already uploaded it
          partBuffer = Buffer.buffer()
        }
      }

      data.endHandler { _ =>
        completeMultipartUpload(bucket, region, objectName, uploadId, parts.toArray, null, null)
      }

      data.exceptionHandler { exception =>
        // should also probably abort the upload here
        println("Handler caught exception in custom putObject: " + exception)
      }
    } catch {
      // and abort it here as well...
      case e: Exception =>
        println("Exception thrown in custom `putObject`: " + e)
        abortMultipartUpload(
          bucket,
          region,
          objectName,
          uploadId,
          null,
          null
        )
    }
  }
}
This can all be used pretty easily.
First, set up the client:
private val _minioClient = MinioClient.builder()
.endpoint("http://localhost:9000")
.credentials("my-username", "my-password")
.build()
private val myClient = new CustomMinioClient(_minioClient)
Then, where you receive the upload request:
context.request.uploadHandler { upload =>
  myClient.putReadStream(objectName = upload.filename(), data = upload, objectSize = myFileSize)
  context.response().setStatusCode(200).end("done")
}
The only catch with this implementation is that you need to know the file sizes in advance for the request.
However, this can easily be solved the way I did it, especially if you're using a web UI:
- Before attempting to upload the files, the client sends a request to the server containing a map of file name to file size.
- That pre-request generates a unique ID for the upload.
- The server saves the group of filename -> filesize entries, using the upload ID as an index.
- The server sends the upload ID back to the client.
- The client sends the multipart upload request using the upload ID.
- The server pulls the list of files and their sizes and uses it to call .putReadStream().
A sketch of that pre-request flow is shown below.
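Here is a minimal sketch of that pre-registration step, assuming vertx-web with a BodyHandler installed on the registration route. The route wiring, the in-memory ConcurrentHashMap registry, and the names registerUpload/handleUpload are illustrative assumptions, not part of the MinIO or Vert.x APIs:

import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import io.vertx.ext.web.RoutingContext
import scala.jdk.CollectionConverters._

// uploadId -> (file name -> declared size in bytes)
val pendingUploads = new ConcurrentHashMap[String, Map[String, Long]]()

// Step 1: the client POSTs e.g. {"video.mp4": 73400320} and gets an upload ID back.
def registerUpload(context: RoutingContext): Unit = {
  val declared = context.getBodyAsJson()
  val sizes = declared.fieldNames().asScala
    .map(name => name -> declared.getLong(name).longValue())
    .toMap
  val uploadId = UUID.randomUUID().toString
  pendingUploads.put(uploadId, sizes)
  context.response().setStatusCode(200).end(uploadId)
}

// Step 2: the multipart upload carries the upload ID (here as a path parameter named
// "uploadId"), so each file's size can be looked up before calling putReadStream.
def handleUpload(context: RoutingContext): Unit = {
  val sizes = pendingUploads.get(context.pathParam("uploadId"))
  context.request.uploadHandler { upload =>
    myClient.putReadStream(
      objectName = upload.filename(),
      data = upload,
      objectSize = sizes(upload.filename())
    )
  }
  context.response().setStatusCode(200).end("done")
}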

Google Drive resumable upload in v3

I am looking for some help/example to perform a resumable upload to Google Drive using the new v3 REST API in Java.
I know there is a low-level description here: Upload files | Google Drive API. But at the moment I am not willing to work through these low-level requests if there is another, simpler method (like the former MediaHttpUploader, which is now deprecated...).
What I currently do is:
File fileMetadata = new File();
fileMetadata.setName(name);
fileMetadata.setDescription(...);
fileMetadata.setParents(parents);
fileMetadata.setProperties(...);
FileContent mediaContent = new FileContent(..., file);
drive.files().create(fileMetadata, mediaContent).execute();
But for large files, this isn't good if the connection interrupts.
I've just created an implementation of that recently. It will create a new file in your Drive folder and return its metadata when the task succeeds. While uploading, it will also update the listener with upload progress. I added comments to make it self-explanatory:
public Task<File> createFile(java.io.File yourfile, MediaHttpUploaderProgressListener uploadListener) {
    return Tasks.call(mExecutor, () -> {
        // Generates an input stream with your file content to be uploaded
        FileContent mediaContent = new FileContent("yourFileMimeType", yourfile);

        // Creates an empty Drive file
        File metadata = new File()
                .setParents(parents)
                .setMimeType(yourFileMimeType)
                .setName(yourFileName);

        // Builds up the upload request
        Drive.Files.Create uploadFile = mDriveService.files().create(metadata, mediaContent);

        // This will handle the resumable upload
        MediaHttpUploader uploader = uploadFile.getMediaHttpUploader();

        // choose your chunk size and it will automatically divide parts
        uploader.setChunkSize(MediaHttpUploader.MINIMUM_CHUNK_SIZE);

        // according to Google, this enables gzip in future versions (optional)
        uploader.setDisableGZipContent(false);

        // important: this enables resumable upload
        uploader.setDirectUploadEnabled(false);

        // listener to be updated
        uploader.setProgressListener(uploadListener);

        return uploadFile.execute();
    });
}
And make your Activity implement MediaHttpUploaderProgressListener so you have real-time updates on the upload progress:
@Override
public void progressChanged(MediaHttpUploader uploader) {
    String sizeTemp = "Uploading"
            + ": "
            + Formatter.formatShortFileSize(this, uploader.getNumBytesUploaded())
            + "/"
            + Formatter.formatShortFileSize(this, totalFileSize);
    runOnUiThread(() -> textView.setText(sizeTemp));
}
For calculating the progress percentage, you simply do (casting to double to avoid integer division):
double percentage = (double) uploader.getNumBytesUploaded() / totalFileSize;
Or use this one:
uploader.getProgress()
It gives you the fraction of bytes that have been uploaded, represented as a value between 0.0 (0%) and 1.0 (100%). But be sure to have your content length specified, otherwise it will throw an IllegalArgumentException.

How to upload multiple files to Google Cloud Storage in a single call using Java API?

We want to upload multiple files to Google Cloud Storage. Currently, we are uploading one by one using the Google Java API. The code is below:
public void uploadFile(File srcFile, String bucketName, String destPath) throws IOException {
    BlobId blobId = BlobId.of(bucketName, srcFile.getName());
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
    long startTime = System.currentTimeMillis();
    // Blob blob = storage.create(blobInfo, new FileInputStream(srcFile));
    try (WriteChannel writer = storage.writer(blobInfo)) {
        try (FileInputStream in = new FileInputStream(srcFile)) {
            byte[] buffer = new byte[1024 * 1024 * 100];
            writer.setChunkSize(buffer.length);
            int readSize = 0;
            while ((readSize = in.read(buffer)) > 0) {
                writer.write(ByteBuffer.wrap(buffer, 0, readSize));
            }
            long endTime = System.currentTimeMillis();
            double writeTime = (double) (endTime - startTime) / 1000;
            System.out.println("File write time : " + writeTime);
        }
    }
}
Our application needs to upload multiple files at a time. I looked for a method to upload multiple files in the Java API, but could not find any method that uploads multiple files in a single call.
If I loop and upload using multiple calls, it adds a huge network overhead and performance is very slow, which the application cannot afford.
My questions are:
Is it possible to upload multiple files via a single call, either using the Java API or REST API?
Could you please provide an example?
Neither the google-cloud-java library nor the underlying JSON API has an API call to upload multiple files, but you could accomplish what you're trying to do by spawning multiple threads and having each thread upload a file. The gsutil -m option does it this way.
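For illustration, here's a rough sketch of that multi-threaded approach (written in Scala for consistency with the other snippets on this page; the same pattern applies with a plain ExecutorService in Java). It reuses the uploadFile method from the question, and the pool size of 8 is an arbitrary choice:

import java.io.File
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

def uploadAll(files: Seq[File], bucketName: String, destPath: String): Unit = {
  val pool = Executors.newFixedThreadPool(8) // tune to your bandwidth and file sizes
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // each file is uploaded on its own thread, using the uploadFile method shown above
  val uploads = files.map(f => Future(uploadFile(f, bucketName, destPath)))

  // wait for all uploads to finish, then release the threads
  Await.result(Future.sequence(uploads), Duration.Inf)
  pool.shutdown()
}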

Upload file with signUrl to Google Cloud Storage using an application

I created a Scala application that, as part of its functionality, uploads files of any type (BZ2, CSV, TXT, etc.) to Google Cloud Storage.
Direct upload works fine for small files, but for uploading a big file Google recommends using signUrl, and this is not creating the file, not updating the file if I created it beforehand, and not throwing any exception with an error.
Works with small files:
val storage: Storage = StorageOptions.getDefaultInstance.getService
val fileContent = Files.readAllBytes(file.toPath)
val fileId: BlobId = BlobId.of(bucketName, s"$folderName/$fileName")
val fileInfo: BlobInfo = BlobInfo.newBuilder(fileId).build()
storage.create(fileInfo, fileContent)
Doesn't work:
val storage: Storage = StorageOptions.getDefaultInstance.getService
val outputPath = s"$folderName/$fileName"
val fileInfo: BlobInfo = BlobInfo.newBuilder(bucketName, outputPath).build()
val optionWrite: SignUrlOption = Storage.SignUrlOption.httpMethod(HttpMethod.PUT)
val signedUrl: URL = storage.signUrl(fileInfo, 30, TimeUnit.MINUTES, optionWrite)
val connection = signedUrl.openConnection
connection.setDoOutput(true)
val out = connection.getOutputStream
val inputStream = Files.newInputStream(file.toPath)
var nextByte = inputStream.read()
while (nextByte != -1) {
  out.write(nextByte)
  nextByte = inputStream.read()
}
out.flush()
inputStream.close()
out.close()
I tried reading/writing byte by byte, using an array of bytes, and using an OutputStreamWriter, but none of them worked.
The library that I'm using is:
"com.google.cloud" % "google-cloud-storage" % "1.12.0"
Anyone know why this is not working?
Thanks
I found an easier way to write big files to Google Cloud Storage, without using the signed URL.
I will add the code just in case someone finds it useful.
storage.create(fileInfo)

val writer: WriteChannel = storage.get(fileInfo.getBlobId).writer()
val inputStream: InputStream = Files.newInputStream(file.toPath)
val packetToRead: Array[Byte] = new Array[Byte](sizeBlockDefault.toInt)

while (inputStream.available() > 0) {
  val numBytesRead = inputStream.read(packetToRead, 0, sizeBlockDefault.toInt)
  writer.write(ByteBuffer.wrap(packetToRead, 0, numBytesRead))
}

inputStream.close()
writer.close()
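If memory use or request size is a concern, the WriteChannel also lets you set an explicit chunk size (the same setChunkSize call used in the Cloud Storage question above). A small sketch; the 16 MB value is just an example, not a recommendation:

val writer: WriteChannel = storage.get(fileInfo.getBlobId).writer()
writer.setChunkSize(16 * 1024 * 1024) // buffer and send roughly 16 MB per request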
