Blobstore uploads files, but only the first 1 MB - Java

I have a working Java servlet which uploads files into the indicated bucket; however, it can only fully upload files smaller than 1 MB. If I upload a file larger than 1 MB, only the first MB of data is uploaded and the rest of the file is empty.
package com.google.appengine;

import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.nio.ByteBuffer;
import java.io.PrintWriter;
import javax.servlet.*;
import javax.servlet.http.*;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobInfo;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
import com.google.appengine.api.blobstore.BlobstoreInputStream;
import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

public class upload extends HttpServlet {

    private BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();

    @Override
    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        PrintWriter out = response.getWriter();
        HttpSession session = request.getSession();
        String Bucketname = (String) session.getAttribute("BUCKET");
        Map<String, List<BlobInfo>> blobsData = blobstoreService.getBlobInfos(request);
        for (String key : blobsData.keySet()) {
            for (BlobInfo blob : blobsData.get(key)) {
                byte[] b = new byte[(int) blob.getSize()];
                BlobstoreInputStream in = new BlobstoreInputStream(blob.getBlobKey());
                in.read(b);
                GcsService gcsService = GcsServiceFactory.createGcsService();
                GcsFilename filename = new GcsFilename(Bucketname, blob.getFilename());
                GcsFileOptions options = new GcsFileOptions.Builder()
                        .mimeType(blob.getContentType())
                        //.acl("authenticated-read")
                        .build();
                gcsService.createOrReplace(filename, options, ByteBuffer.wrap(b));
                in.close();
            }
        }
        String SharedMessage = "File has been Uploaded Successfully!";
        String SharedURL = "";
        session.setAttribute("SHAREDMESSAGE", SharedMessage);
        session.setAttribute("SHAREDURL", SharedURL);
        response.sendRedirect("SharedResult.jsp");
    }
}
Any help will be appreciated. Thank you

You're ignoring the return value of in.read(byte[]);. It doesn't necessarily fill the whole array, especially when the array is large. You'll need to read until you get -1, which means the stream has been exhausted.
ByteBuffer b = ByteBuffer.allocate((int) blob.getSize());
BlobstoreInputStream in = new BlobstoreInputStream(blob.getBlobKey());
byte[] buf = new byte[8192];
int bytes = 0;
while ((bytes = in.read(buf)) != -1) {
    b.put(buf, 0, bytes);
}
b.flip();
...
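If the files can be bigger than you want to hold in memory, another option is to stream the blob into the GCS output channel in chunks instead of building one large array. A rough sketch (untested, assuming the same appengine-gcs-client classes used above, plus java.nio.channels.Channels):

// Sketch: copy the blob to GCS chunk by chunk rather than via one big byte[].
GcsOutputChannel outChannel = gcsService.createOrReplace(filename, options);
try (InputStream in = new BlobstoreInputStream(blob.getBlobKey());
     OutputStream out = Channels.newOutputStream(outChannel)) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);  // write each chunk as it is read
    }
}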

Related

Copy files from Box folder to AWS S3 bucket

I am working on copying Box files to an S3 bucket. How do I get the file object from a Box file so that I can copy it into the S3 bucket using box-java-sdk?
I have tried to get the file's metadata from the Box folder, but ended up with limited documentation on how to get the file object.
import com.box.sdk.BoxAPIConnection;
import com.box.sdk.BoxFile;
import com.box.sdk.BoxFolder;
import com.box.sdk.BoxItem;
import com.box.sdk.Metadata;

String access_token = "some_access_token";
String refresh_token = "some_refresh_token";
BoxAPIConnection api = new BoxAPIConnection(client_id,
        client_secret,
        access_token,
        refresh_token);
BoxFolder folder = new BoxFolder(api, folderId); // folder whose contents should be copied
for (BoxItem.Info itemInfo : folder) {
    if (itemInfo instanceof BoxFile.Info) {
        // getting file info, metadata
        // have to upload the file content here to S3 bucket
    } else if (itemInfo instanceof BoxFolder.Info) {
        BoxFolder.Info folderInfo = (BoxFolder.Info) itemInfo;
        // Do something with the folder.
    }
}
Goal is to upload box content to S3 bucket.
So I came up with this Java code to copy files from a Box folder to AWS S3. I have used box-java-sdk and aws-java-sdk here.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;
import com.box.sdk.BoxAPIConnection;
import com.box.sdk.BoxFile;
import com.box.sdk.BoxFolder;
import com.box.sdk.BoxItem;
import com.box.sdk.Metadata;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.commons.io.FileUtils;

public static String fileObjKeyName = "";
public static String bucketName = "s3Bucket";

// store credentials in your local machine in aws config / credentials file.
public static ProfileCredentialsProvider credentialsProvider = new ProfileCredentialsProvider();
public static AmazonS3 s3Client = AmazonS3ClientBuilder
        .standard()
        .withCredentials(credentialsProvider)
        .withRegion(regionfOfS3Bucket)
        .build();

String access_token = "some_access_token";
String refresh_token = "some_refresh_token";
BoxAPIConnection api = new BoxAPIConnection(client_id,
        client_secret,
        access_token,
        refresh_token);
BoxFolder folder = new BoxFolder(api, folderId);
for (BoxItem.Info itemInfo : folder) {
    if (itemInfo instanceof BoxFile.Info) {
        // getting file info, metadata
        // have to upload the file content here to S3 bucket
        BoxFile file = new BoxFile(api, itemInfo.getID());
        BoxFile.Info info = file.getInfo();
        System.out.println(info.getName());
        FileOutputStream stream;
        try {
            stream = new FileOutputStream(info.getName());
            file.download(stream);
            stream.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        File file_new = FileUtils.getFile(info.getName());
        fileObjKeyName = itemInfo.getID() + "_" + info.getName();
        long contentLength = file_new.length();
        System.out.println(contentLength);
        long partSize = 5 * 1024 * 1024;
        List<PartETag> partETags = new ArrayList<PartETag>();
        InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, fileObjKeyName);
        InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest);
        long filePosition = 0;
        for (int i = 1; filePosition < contentLength; i++) {
            // Because the last part could be less than 5 MB, adjust the
            // part size as needed.
            partSize = Math.min(partSize, (contentLength - filePosition));
            // Create the request to upload a part.
            UploadPartRequest uploadRequest = new UploadPartRequest().withBucketName(bucketName).withKey(fileObjKeyName)
                    .withUploadId(initResponse.getUploadId()).withPartNumber(i).withFileOffset(filePosition).withFile(file_new).withPartSize(partSize);
            // Upload the part and add the response's ETag to our list.
            UploadPartResult uploadResult = s3Client.uploadPart(uploadRequest);
            partETags.add(uploadResult.getPartETag());
            filePosition += partSize;
        }
        CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(bucketName, fileObjKeyName, initResponse.getUploadId(), partETags);
        s3Client.completeMultipartUpload(compRequest);
        file_new.delete();
    } else if (itemInfo instanceof BoxFolder.Info) {
        BoxFolder.Info folderInfo = (BoxFolder.Info) itemInfo;
        // Do something with the folder.
    }
}
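If you would rather not write each file to local disk first, a possible variation (just a sketch, assuming the same box-java-sdk and aws-java-sdk set-up as above, plus imports for ByteArrayInputStream, ByteArrayOutputStream, ObjectMetadata and PutObjectRequest) is to buffer the Box download in memory and hand it to a single putObject call; this suits files small enough to fit in memory:

// Sketch: copy one Box file to S3 without a temporary file on disk.
// Assumes `api`, `s3Client` and `bucketName` are initialised as in the code above.
BoxFile file = new BoxFile(api, itemInfo.getID());
BoxFile.Info info = file.getInfo();

ByteArrayOutputStream buffer = new ByteArrayOutputStream();
file.download(buffer);                       // Box writes the file content into memory
byte[] bytes = buffer.toByteArray();

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.length);     // S3 needs the length for a single-part put

s3Client.putObject(new PutObjectRequest(bucketName,
        itemInfo.getID() + "_" + info.getName(),
        new ByteArrayInputStream(bytes), metadata));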

How to convert HTML to PDF with working hyperlinks using docx4j?

I am using Eclipse Luna 64-bit, Maven, and the docx4j API for PDF conversion, with a template letter format onto which I want to place my HTML code. This template is saved in my database.
I want to include a hyperlink in the PDF, so my users can click this link and open it in their browser.
This is my main class:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Properties;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.faces.bean.ManagedBean;
import javax.faces.bean.ManagedProperty;
import javax.faces.bean.ViewScoped;
import javax.faces.model.SelectItem;
import org.apache.commons.lang.StringUtils;
import org.docx4j.Docx4J;
import org.docx4j.XmlUtils;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.relationships.Namespaces;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Body;
import org.docx4j.wml.BooleanDefaultTrue;
import org.docx4j.wml.Document;
import org.docx4j.wml.P;
import org.docx4j.wml.PPrBase;
import org.docx4j.wml.R;
import org.docx4j.wml.Text;
import org.primefaces.context.RequestContext;
import org.primefaces.model.DefaultStreamedContent;
import org.primefaces.model.StreamedContent;
import org.primefaces.model.UploadedFile;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class LetterMaintenanceBean extends BaseManagedBean implements Serializable {

    public StreamedContent previewLetter() {
        String content = this.letter.getHtmlContent();
        String regex = "<a href=(\"[^\"]*\")[^<]*</a>"; //Digvijay
        Pattern p = Pattern.compile(regex); //Digvijay
        System.out.println("p: " + p);
        Matcher m = p.matcher(content); //Digvijay
        System.out.println("m: " + m);
        content = m.replaceAll("<strong><u><span style=\"color:#0099cc\">$1</span></u></strong>"); //Digvijay
        System.out.println("regex1: " + regex); //Digvijay
        Map<String, String> previewExamples = this.getPreviewExamples(this.letter.getMessageTypeCode());
        for (Entry<String, String> example : previewExamples.entrySet()) {
            if (StringUtils.isNotBlank(example.getKey()) && StringUtils.isNotBlank(example.getValue())) {
                content = content.replace(example.getKey(), example.getValue());
                System.out.println("content after map date");
            }
        }
        System.out.println("content1:: " + content);
        if (!content.startsWith("<div>")) {
            content = "<div>" + content + "</div>";
        }
        // Docx4j does not understand HTML codes for special characters. So replacing with Unicode values.
        content = content.replace("&nbsp;", " ");
        content = content.replace("&rsquo;", "’");
        content = content.replaceAll("</p>", "</p><br/>");
        content = content.replaceAll("\"</span>", "</span>");
        InputStream stream = null;
        try {
            System.out.println("content:" + content);
            if (this.letter.getHtmlContent().getBytes() != null && this.letter.getWfTemplateId() != null) {
                stream = new ByteArrayInputStream(this.HTMLToPDF(content.getBytes(), this.letter.getWfTemplateId()));
            } else {
                stream = new ByteArrayInputStream(this.HTMLToPDFWithoutTemplate(content.getBytes()));
            }
            StreamedContent file = new DefaultStreamedContent(stream, "application/pdf", this.letter.getLetterName() + ".pdf");
            return file;
        } catch (LetterMaintenanceException e) {
            this.processServiceException(e);
            StreamedContent file = new DefaultStreamedContent(
                    new ByteArrayInputStream(
                            "Unable to process your request. If the problem persists, please contact application support."
                                    .getBytes()), "application/pdf", "error" + ".pdf");
            return file;
        } catch (Exception e) {
            this.processGenericException(e);
            StreamedContent file = new DefaultStreamedContent(
                    new ByteArrayInputStream(
                            "Unable to process your request. If the problem persists, please contact application support."
                                    .getBytes()), "application/pdf", "error" + ".pdf");
            return file;
        }
    }
This is my HTMLToPDF() method:
    private byte[] HTMLToPDF(final byte[] htmlContent, final String templateId)
            throws Docx4JException, LetterMaintenanceException {
        LetterMaintenanceDelegate letterMaintenanceDelegate = new LetterMaintenanceDelegate();
        Template template = letterMaintenanceDelegate.retrieveTemplateById(templateId);
        if (template == null || template.getContent() == null) {
            throw new LetterMaintenanceException("Could not retrieve template");
        }
        InputStream is = new ByteArrayInputStream(template.getContent());
        WordprocessingMLPackage templatePackage = WordprocessingMLPackage.load(is);
        // Convert HTML to docx
        XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(templatePackage);
        XHTMLImporter.setHyperlinkStyle("Hyperlink");
        templatePackage
                .getMainDocumentPart()
                .getContent()
                .addAll(XHTMLImporter.convert(new ByteArrayInputStream(htmlContent), null));
        // Add content of content docx to template
        templatePackage.getMainDocumentPart().getContent().addAll(templatePackage.getMainDocumentPart().getContent());
        // Handle page breaks
        templatePackage = this.handlePagebreaksInDocx(templatePackage);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Docx4J.toPDF(templatePackage, baos);
        return baos.toByteArray();
    }
}
In this code I am trying to convert HTML (containing an href tag) to a PDF file, and in the PDF output the hyperlink must work.
The current output of this program is a PDF, but there are no working links in it.
How can I activate my links?

Jsoup reddit scraper 429 error

So I'm trying to use jsoup to scrape Reddit for images, but when I scrape certain subreddits such as /r/wallpaper, I get a 429 error and am wondering how to fix this. Totally understand that this code is horrible and this is a pretty noob question, but I'm completely new to this. Anyways:
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class javascraper {

    public static void main(String[] args) throws MalformedURLException {
        Scanner scan = new Scanner(System.in);
        System.out.println("Where do you want to store the files?");
        String folderpath = scan.next();
        System.out.println("What subreddit do you want to scrape?");
        String subreddit = scan.next();
        subreddit = ("http://reddit.com/r/" + subreddit);
        new File(folderpath + "/" + subreddit).mkdir();
        //test
        try {
            //gets http protocol
            Document doc = Jsoup.connect(subreddit).timeout(0).get();
            //get page title
            String title = doc.title();
            System.out.println("title : " + title);
            //get all links
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                //get value from href attribute
                String checkLink = link.attr("href");
                Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
                if (imgCheck(checkLink)) { // checks to see if it is an image link
                    System.out.println("link : " + link.attr("href"));
                    downloadImages(checkLink, folderpath);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static boolean imgCheck(String http) {
        String png = ".png";
        String jpg = ".jpg";
        String jpeg = "jpeg"; // no period so checker will only check last four characters
        String gif = ".gif";
        int length = http.length();
        if (http.contains(png) || http.contains("gfycat") || http.contains(jpg) || http.contains(jpeg) || http.contains(gif)) {
            return true;
        } else {
            return false;
        }
    }

    private static void downloadImages(String src, String folderpath) throws IOException {
        String folder = null;
        //Extract the name of the image from the src attribute
        int indexname = src.lastIndexOf("/");
        if (indexname == src.length()) {
            src = src.substring(1, indexname);
        }
        indexname = src.lastIndexOf("/");
        String name = src.substring(indexname, src.length());
        System.out.println(name);
        //Open a URL Stream
        URL url = new URL(src);
        InputStream in = url.openStream();
        OutputStream out = new BufferedOutputStream(new FileOutputStream(folderpath + name));
        for (int b; (b = in.read()) != -1;) {
            out.write(b);
        }
        out.close();
        in.close();
    }
}
Your issue is caused by the fact that your scraper is violating reddit's API rules. Error 429 means "Too many requests" – you're requesting too many pages too fast.
You can make one request every 2 seconds, and you also need to set a proper user agent (the format they recommend is <platform>:<app ID>:<version string> (by /u/<reddit username>)). As it currently stands, your code runs too fast and doesn't specify one, so it's going to be severely rate-limited.
To fix it, first off, add this to the start of your class, before the main method:
public static final String USER_AGENT = "<PUT YOUR USER AGENT HERE>";
(Make sure to specify an actual user agent).
Then, change this (in downloadImages)
URL url = new URL(src);
InputStream in = url.openStream();
to this:
URLConnection connection = (new URL(src)).openConnection();
try {
    Thread.sleep(2000); //Delay to comply with rate limiting
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
connection.setRequestProperty("User-Agent", USER_AGENT);
InputStream in = connection.getInputStream();
You'll also want to change this (in main)
Document doc = Jsoup.connect(subreddit).timeout(0).get();
to this:
Document doc = Jsoup.connect(subreddit).userAgent(USER_AGENT).timeout(0).get();
Then your code should stop running into that error.
Note that using reddit's API (i.e., /r/subreddit.json instead of /r/subreddit) would probably make this project easier, but it isn't required and your current code will work.
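For example, a rough sketch of fetching the JSON listing with jsoup used purely as an HTTP client (subredditName and the JSON parsing step are assumptions for illustration; USER_AGENT is the constant defined above):

// Sketch: request the subreddit's JSON listing instead of scraping the HTML page.
// ignoreContentType(true) is required because the response is application/json, not HTML.
String json = Jsoup.connect("http://reddit.com/r/" + subredditName + ".json")
        .userAgent(USER_AGENT)
        .ignoreContentType(true)
        .execute()
        .body();
// `json` now holds the listing; parse it with any JSON library to pull out the image URLs.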
As you can look up on Wikipedia, the 429 status code tells you that you have sent too many requests:
The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes.
A solution would be to slow down your scraper. There are several ways to do this; one would be to use sleep, as in the sketch below.
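A minimal sketch of that idea applied to the question's download loop (the 2-second delay is borrowed from the guideline in the other answer):

// Sketch: pause between image downloads so the request rate stays low.
for (Element link : links) {
    String checkLink = link.attr("href");
    if (imgCheck(checkLink)) {
        downloadImages(checkLink, folderpath);
        try {
            Thread.sleep(2000);   // roughly one request every 2 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
    }
}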

Map an image file through a Spring controller

Is there any way to map an image file using a Spring controller? In my Spring application, I want to store the images in the directory src/main/resources (I'm using Maven) and access them with a method like this:
@RequestMapping(value = "image/{theString}")
public ModelAndView image(@PathVariable String theString) {
    return new ModelAndView("what should be placed here?");
}
The string theString is the image name (without extension). With this approach, I should be able to access my images this way:
/webapp/controller_mapping/image/image_name
Can anyone point me in a direction to do that?
You can return HttpEntity<byte[]>. Construct a new instance providing the image byte array and the necessary headers, such as content length and MIME type, then return it from your method. The image bytes can be obtained using the classloader's getResourceAsStream method.
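A minimal sketch of that approach (untested; the images/ classpath folder, the .png extension and the use of commons-io are assumptions for illustration, not part of the original answer):

import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class ImageResourceController {

    @RequestMapping(value = "image/{theString}")
    public HttpEntity<byte[]> image(@PathVariable String theString) throws IOException {
        // files under src/main/resources/images end up on the classpath at runtime
        InputStream in = getClass().getClassLoader()
                .getResourceAsStream("images/" + theString + ".png");
        byte[] bytes = IOUtils.toByteArray(in);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.IMAGE_PNG);
        headers.setContentLength(bytes.length);
        return new HttpEntity<byte[]>(bytes, headers);
    }
}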
This works for me. It could use some cleaning up but it works. The ServiceException is just a simple base exception.
Good Luck!
package com.dhargis.example;

import java.io.File;
import java.io.IOException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.io.FileUtils;
import org.apache.log4j.Logger;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
@RequestMapping("/image")
public class ImageController {

    private static final Logger log = Logger.getLogger(ImageController.class);
    private String filestore = "C:\\Users\\dhargis";

    //produces = "application/octet-stream"
    @RequestMapping(value = "/{filename:.+}", method = RequestMethod.GET)
    public void get(@PathVariable String filename,
                    HttpServletRequest request,
                    HttpServletResponse response) {
        log.info("Getting file " + filename);
        try {
            byte[] content = null;
            File store = new File(filestore);
            if (store.exists()) {
                File file = new File(store.getPath() + File.separator + filename);
                if (file.exists()) {
                    content = FileUtils.readFileToByteArray(file);
                } else {
                    throw new ServiceException("File does not exist");
                }
            } else {
                throw new ServiceException("Report store is required");
            }
            ServletOutputStream out = response.getOutputStream();
            out.write(content);
            out.flush();
            out.close();
        } catch (ServiceException e) {
            log.error("Error on get", e);
        } catch (IOException e) {
            log.error("Error on get", e);
        }
    }
}

blobStoreService.serve() is not giving download file

I have a servlet in which I first download a PDF from http://www.cbwe.gov.in/htmleditor1/pdf/sample.pdf and upload its content to my blobstore; when a user sends a GET request in the browser, the blob should be downloaded, but instead of downloading it shows the data in some other format. Here is the code of my servlet:
package org.ritesh;

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import javax.servlet.http.*;
import org.apache.commons.io.IOUtils;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
import com.google.appengine.api.files.AppEngineFile;
import com.google.appengine.api.files.FileServiceFactory;
import com.google.appengine.api.files.FileService;
import com.google.appengine.api.files.FileWriteChannel;

@SuppressWarnings("serial")
public class BlobURLServlet extends HttpServlet {

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello, world");
        FileService fileService = FileServiceFactory.getFileService();
        // Create a new Blob file with mime-type "application/pdf"
        String url = "http://www.cbwe.gov.in/htmleditor1/pdf/sample.pdf";
        URL url1 = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) url1.openConnection();
        String content_type = conn.getContentType();
        InputStream stream = conn.getInputStream();
        AppEngineFile file = fileService.createNewBlobFile("application/pdf");
        file = new AppEngineFile(file.getFullPath());
        Boolean lock = true;
        FileWriteChannel writeChannel = fileService.openWriteChannel(file, lock);
        // This time we write to the channel directly
        String s1 = "";
        String s2 = "";
        byte[] bytes = IOUtils.toByteArray(stream);
        writeChannel.write(ByteBuffer.wrap(bytes));
        writeChannel.closeFinally();
        BlobKey blobKey = fileService.getBlobKey(file);
        BlobstoreService blobStoreService = BlobstoreServiceFactory.getBlobstoreService();
        blobStoreService.serve(blobKey, resp);
    }
}
I deployed this servlet at onemoredemo1.appspot.com. Please open this URL and notice that when you click on the BlobURL servlet it shows the content instead of a download dialog. What modification should I make in my code so that the browser shows a download dialog?
Look here:
resp.setContentType("text/plain");
You've said that the content is plain text, when it's not. You need to set the Content-Disposition header appropriately as an attachment, and set the content type to application/pdf.
Additionally, if you're going to serve binary content, you shouldn't also use the writer (which you're writing "Hello, world" with).
If you change your first couple of lines to:
resp.setContentType("application/pdf");
resp.setHeader("Content-Disposition", "attachment;filename=sample.pdf");
you may find that's all that's required.
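Putting both suggestions together, the start of doGet() would look roughly like this (a sketch only; the filename comes from the sample URL in the question):

// Sketch: declare the PDF as a download and skip the writer entirely.
resp.setContentType("application/pdf");
resp.setHeader("Content-Disposition", "attachment; filename=sample.pdf");
// do not call resp.getWriter() for binary content; build the blob as before and finish with:
// blobStoreService.serve(blobKey, resp);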
