I created a Scala application that, as part of its functionality, uploads any type of file (BZ2, CSV, TXT, etc.) to Google Cloud Storage.
Direct upload works fine for small files, but for big files Google recommends using a signed URL ("signUrl"). With that approach the file is never created, an existing file is not updated if I create it beforehand, and no exception is thrown.
This works with small files:
val storage: Storage = StorageOptions.getDefaultInstance.getService
val fileContent = Files.readAllBytes(file.toPath)
val fileId: BlobId = BlobId.of(bucketName, s"$folderName/$fileName")
val fileInfo: BlobInfo = BlobInfo.newBuilder(fileId).build()
storage.create(fileInfo, fileContent)
This doesn't work:
val storage: Storage = StorageOptions.getDefaultInstance.getService
val outputPath = s"$folderName/$fileName"
val fileInfo: BlobInfo = BlobInfo.newBuilder(bucketName, outputPath).build()
val optionWrite: SignUrlOption = Storage.SignUrlOption.httpMethod(HttpMethod.PUT)
val signedUrl: URL = storage.signUrl(fileInfo, 30, TimeUnit.MINUTES, optionWrite)
val connection = signedUrl.openConnection
connection.setDoOutput(true)
val out = connection.getOutputStream
val inputStream = Files.newInputStream(file.toPath)
var nextByte = inputStream.read()
while (nextByte != -1) {
  out.write(nextByte)
  nextByte = inputStream.read()
}
out.flush()
inputStream.close()
out.close()
I tried reading/writing byte by byte, using an array of bytes, and using an OutputStreamWriter, but none of them work.
The library I'm using is:
"com.google.cloud" % "google-cloud-storage" % "1.12.0"
Does anyone know why this is not working?
Thanks
I found an easier way to write big files to Google Cloud Storage without using the signed URL.
I'll add the code in case someone finds it useful.
storage.create(fileInfo)
val writer: WriteChannel = storage.get(fileInfo.getBlobId).writer()
val inputStream: InputStream = Files.newInputStream(file.toPath)
val packetToRead: Array[Byte] = new Array[Byte](sizeBlockDefault.toInt)
// read() returns -1 at end of stream; available() is not a reliable end-of-stream check
var numBytesRead = inputStream.read(packetToRead, 0, packetToRead.length)
while (numBytesRead != -1) {
  writer.write(ByteBuffer.wrap(packetToRead, 0, numBytesRead))
  numBytesRead = inputStream.read(packetToRead, 0, packetToRead.length)
}
inputStream.close()
writer.close()
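For reference, the signed-URL attempt in the question most likely fails because java.net.URLConnection only executes the request once the response is read, and because setDoOutput(true) leaves the HTTP method as POST while the URL was signed for PUT. A minimal, untested sketch of that approach, reusing the signedUrl and file values from the question:
import java.net.HttpURLConnection
import java.nio.file.Files

val connection = signedUrl.openConnection.asInstanceOf[HttpURLConnection]
connection.setDoOutput(true)
connection.setRequestMethod("PUT") // must match the method the URL was signed for
connection.setFixedLengthStreamingMode(Files.size(file.toPath)) // stream instead of buffering the whole file
val out = connection.getOutputStream
Files.copy(file.toPath, out)
out.close()
// the request is only sent/completed when the response is consumed
val responseCode = connection.getResponseCode
require(responseCode == 200, s"Signed URL upload failed with HTTP $responseCode")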
Related
I have a project that splits a PDF file uploaded by a user, extracts the content inside the PDF, and then merges the pages based on that content using PDDocument; for merging the PDFs I use PDFMergerUtility. After merging, I save the merged PDF to the database as a byte array.
After saving to the DB, the user can also download the PDF that was split and merged based on content, and re-upload it when needed.
But I have found a problem: after merging, the PDF is bigger than the PDF before splitting.
I have tried to find a solution, but nothing I found works for my problem, such as:
Android PdfDocument file size
Is there a way to compress PDF to small size using Java?
and other solutions.
Is there any way to solve my problem?
I would be grateful for any help.
Here is my code:
// file: MultipartFile -> file is sent from the front-end via the API
val inpStream: InputStream = file.getInputStream()
val pdfDocument = PDDocument.load(inpStream)
var splitter = Splitter()
// splitting the pages of a PDF document
var pagesPdf = splitter.split(pdfDocument)
val n = pdfDocument.numberOfPages
val batchSize: Int = 200
val finalBatchSize: Int = n % batchSize
val numOfBatch: Int = (n - finalBatchSize) / batchSize
val batchFinal: Int = if (finalBatchSize == 0) numOfBatch else (numOfBatch + 1)
var batchNo: Int = 1
var startPage: Int
var endPage: Int = 0
while (batchNo <= batchFinal) {
    startPage = endPage + 1
    if (batchNo > numOfBatch) {
        endPage = endPage + finalBatchSize
    } else {
        endPage = endPage + batchSize
    }
    splitter = Splitter()
    splitter.setStartPage(startPage)
    splitter.setEndPage(endPage)
    // splitting the pages of the current batch
    pagesPdf = splitter.split(pdfDocument)
    batchNo++
    var i = startPage
    var groupPage: Int = i
    var pageNo = 0
    var pdfMerger: PDFMergerUtility = PDFMergerUtility()
    var mergedFileByteArrOut: ByteArrayOutputStream = ByteArrayOutputStream()
    pdfMerger.setDestinationStream(mergedFileByteArrOut)
    var fileObj: ByteArray? = null
    for (pd in pagesPdf) {
        pageNo++
        if (!pd.isEncrypted) {
            val stripper = PDFTextStripper()
            // CODE TO GET CONTENT
            if (condition1) {
                val fileByteArrOut: ByteArrayOutputStream = ByteArrayOutputStream()
                pd.save(fileByteArrOut)
                pd.close()
                val fileByteArrIn: ByteArrayInputStream = ByteArrayInputStream(fileByteArrOut.toByteArray())
                pdfMerger.addSource(fileByteArrIn)
                fileObj = fileByteArrOut.toByteArray()
            }
            if (condition2) {
                // I want to compress fileObj first before saving it to the DB
                // code to save to DB
                fileObj = null
                pdfMerger = PDFMergerUtility()
                mergedFileByteArrOut = ByteArrayOutputStream()
                pdfMerger.setDestinationStream(mergedFileByteArrOut)
            }
        }
    }
}
You can use cpdf (https://community.coherentpdf.com) to losslessly squeeze the PDF files afterward. This will reconcile any identical objects and common parts, and remove any unneeded parts.
From the command line
cpdf -squeeze in.pdf -o out.pdf
Or, from Java:
jcpdf.squeezeInMemory(pdf);
How do I upload a large (>4 MB) file as an AppendBlob using the Azure Storage Blob client library for Java?
I've successfully implemented BlockBlob uploading with large files, and it seems the library internally handles the 4 MB(?) limit per request and chunks the file into multiple requests.
Yet it seems the library is not capable of doing the same for AppendBlob, so how can this chunking be done manually? Basically I think this requires chunking an InputStream into smaller batches...
Using Azure Java SDK 12.14.1
Inspired by the answer below on SO (about doing this in C#):
c-sharp-azure-appendblob-appendblock-adding-a-file-larger-than-the-4mb-limit
... I ended up doing it like this in Java:
AppendBlobRequestConditions appendBlobRequestConditions = new AppendBlobRequestConditions()
        .setLeaseId("myLeaseId");

try (InputStream input = new BufferedInputStream(new FileInputStream(file))) {
    byte[] buf = new byte[AppendBlobClient.MAX_APPEND_BLOCK_BYTES];
    int bytesRead;
    while ((bytesRead = input.read(buf)) > 0) {
        // the last chunk is usually smaller than the full block size
        if (bytesRead != buf.length) {
            buf = Arrays.copyOf(buf, bytesRead);
        }
        try (InputStream byteStream = new ByteArrayInputStream(buf)) {
            appendBlobClient.appendBlockWithResponse(byteStream, bytesRead,
                    null, appendBlobRequestConditions, null, null);
        }
    }
}
Of course you need to do a bunch of things before this, like making sure the AppendBlob exists and, if not, creating it before trying to append any data; for example, something like the sketch below.
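A small sketch of that pre-check (in Scala, assuming the same appendBlobClient as above):
// Make sure the append blob exists before appending blocks to it.
// Assumes the same `appendBlobClient` (com.azure.storage.blob.specialized.AppendBlobClient) as above.
if (!appendBlobClient.exists()) {
  appendBlobClient.create() // creates an empty append blob
}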
I would like to upload a large Set<Integer> to Google Cloud Storage. I can do that with:
Blob result = storage.create(blobInfo, Joiner.on('\n').join(set).getBytes(UTF_8));
But this will create an intermediate String with all the content, which might be too large.
I found an example with WriteChannel.write():
Set<Integer> set = ...
String bucketName = "my-unique-bucket";
String blobName = "my-blob-name";
BlobId blobId = BlobId.of(bucketName, blobName);
byte[] content = Joiner.on('\n').join(set).getBytes(UTF_8);
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();
try (WriteChannel writer = storage.writer(blobInfo)) {
    writer.write(ByteBuffer.wrap(content, 0, content.length));
} catch (IOException ex) {
    // handle exception
}
However, if I do that, the entire set is converted to a String and then to byte[]. The String itself might be too big.
Is there an example of how to iterate over the set and transform it into a ByteBuffer? Or should I loop over chunks of the set?
The most straightforward approach I could think of would be:
try (WriteChannel writer = storage.writer(blobInfo)) {
    for (Integer val : set) {
        String valLine = val.toString() + '\n';
        writer.write(ByteBuffer.wrap(valLine.getBytes(UTF_8)));
    }
}
Mind you, this isn't very efficient: it creates a lot of small ByteBuffers. You could greatly improve on this by writing into a single larger ByteBuffer and periodically calling writer.write with it, along the lines of the sketch below.
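A rough Scala sketch of that buffered variant, assuming the same storage and blobInfo as above and a Scala Set[Int] named set; the 64 KiB buffer size is just an illustrative choice:
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8
import com.google.cloud.WriteChannel

val writer: WriteChannel = storage.writer(blobInfo)
try {
  val buffer = ByteBuffer.allocate(64 * 1024)

  def flushBuffer(): Unit = {
    buffer.flip()
    while (buffer.hasRemaining) writer.write(buffer) // the channel may consume the buffer over several calls
    buffer.clear()
  }

  for (value <- set) {
    val line = (value.toString + "\n").getBytes(UTF_8)
    if (buffer.remaining < line.length) flushBuffer() // flush before the buffer would overflow
    buffer.put(line)
  }
  flushBuffer() // flush whatever is left
} finally {
  writer.close()
}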
To avoid creating an intermediate String with all the bytes, you can also upload from a file; the Cloud Storage documentation has example code for uploading from a file in various languages.
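For instance, you could write the set to a temporary file first and let newer versions of google-cloud-storage stream that file with Storage#createFrom. A small Scala sketch, again assuming the same storage and blobInfo and a Set[Int] named set:
import java.nio.file.Files
import scala.jdk.CollectionConverters._ // Scala 2.13 converters; use scala.collection.JavaConverters on 2.12

val tmp = Files.createTempFile("set-upload", ".txt")
try {
  Files.write(tmp, set.map(_.toString).asJava) // one integer per line
  storage.createFrom(blobInfo, tmp)            // streams the file without holding it all in memory
} finally {
  Files.deleteIfExists(tmp)
}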
I am trying to add two small files to a zip, as that is the format the destination requires. Both files are less than 1000 KB, but when I run my code the program hangs indefinitely during zip.close(), with no errors.
What am I doing wrong?
val is = new PipedInputStream()
val os = new PipedOutputStream(is)
val cos = new CountingOutputStream(os)
val zip = new ZipOutputStream(cos)
val fis = new FileInputStream(file)
zip.putNextEntry(new ZipEntry(location))
var i = fis.read()
while (i != -1) {
  zip.write(i)
  i = fis.read()
}
zip.closeEntry()
fis.close()
zip.close()
When using piped streams, you need to read from the PipedInputStream at the same time you're writing to a PipedOutputStream, otherwise the pipe fills up and the writing will block.
Based on your code, you're not doing the reading part (which needs to happen in a separate thread, of course). You can test it with a FileOutputStream instead of the piped streams, and it should write the file nicely. A minimal sketch of the consumer thread is shown below.
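A minimal sketch of such a consumer, assuming the is and zip values from the question and a hypothetical destination OutputStream for the zipped bytes:
import java.util.concurrent.Executors

// Drain the PipedInputStream on another thread while the zip is being written.
val executor = Executors.newSingleThreadExecutor()
val consumer = executor.submit(new Runnable {
  override def run(): Unit = {
    val buf = new Array[Byte](8192)
    var n = is.read(buf)
    while (n != -1) {
      destination.write(buf, 0, n) // `destination` is wherever the zipped bytes should end up
      n = is.read(buf)
    }
  }
})

// ... write the entries to `zip` as in the question, then:
zip.close()        // closing the zip lets the consumer see end-of-stream
consumer.get()     // wait for the consumer to finish draining the pipe
executor.shutdown()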
I am successfully serving videos using the Play framework, but I'm experiencing an issue: each time a file is served, the Play framework creates a copy in C:\Users\user\AppData\Temp. I'm serving large files so this quickly creates a problem with disk space.
Is there any way to serve a file in Play without creating a copy? Or have Play automatically delete the temp file?
Code I'm using to serve is essentially:
public Result video() {
return ok(new File("whatever"));
}
Use Streaming
I use the following method for video streaming. This code does not create temp copies of the media file.
Basically, this code responds to the Range requests sent by the browser. If the browser does not send a Range header, I fall back to sending the whole file with Ok.sendFile (internally Play also tries to stream the file), which might create temp files; but that happens only in the rare case where the browser does not use range requests.
GET /media controllers.MediaController.media
Put this code inside a controller called MediaController:
def media = Action { req =>
  val file = new File("/Users/something/Downloads/somefile.mp4")
  val rangeHeaderOpt = req.headers.get(RANGE)
  rangeHeaderOpt.map { range =>
    val strs = range.substring("bytes=".length).split("-")
    if (strs.length == 1) {
      val start = strs.head.toLong
      val length = file.length() - 1L
      partialContentHelper(file, start, length)
    } else {
      val start = strs.head.toLong
      val length = strs.tail.head.toLong
      partialContentHelper(file, start, length)
    }
  }.getOrElse {
    Ok.sendFile(file)
  }
}
def partialContentHelper(file: File, start: Long, length: Long) = {
  val fis = new FileInputStream(file)
  fis.skip(start)
  val byteStringEnumerator = Enumerator.fromStream(fis).&>(Enumeratee.map(ByteString.fromArray(_)))
  val mediaSource = Source.fromPublisher(Streams.enumeratorToPublisher(byteStringEnumerator))
  PartialContent.sendEntity(HttpEntity.Streamed(mediaSource, None, None)).withHeaders(
    CONTENT_TYPE -> MimeTypes.forExtension("mp4").get,
    CONTENT_LENGTH -> ((length - start) + 1).toString,
    CONTENT_RANGE -> s"bytes $start-$length/${file.length()}",
    ACCEPT_RANGES -> "bytes",
    CONNECTION -> "keep-alive"
  )
}