Writing to a PDF from inside a GAE app - java

I need to read several megabytes (raw text strings) out of my GAE Datastore and then write them all to a new PDF file, and then make the PDF file available for the user to download.
I am well aware of the sandbox restrictions that prevent you from writing to the file system. I am wondering if there is a crafty way of creating a PDF in-memory (or a combo of memory and the blobstore) and then storing it somehow so that the client-side (browser) can actually pull it down as a file and save it locally.
This is probably a huge stretch, but my only other option is to farm this task out to a non-GAE server, which I would like to avoid at all cost, even if it takes a lot of extra development on my end. Thanks in advance.

You can definitely achieve your use case using GAE itself. Here are the steps that you should follow at a high level:
Download the excellent iText library, which is a Java library to work with PDFs. First build out your Java code to generate the PDF content. Check out various examples at : http://itextpdf.com/book/toc.php
Since you cannot write to a file directly, you need to generate your PDF content in bytes and then write a Servlet which will act as a Download Servlet. The Servlet will use the Response object to open a stream, manipulate the Mime Headers (filename, filetype) and write the PDF contents to the stream. A browser will automatically present a download option when you do that.
Your Download Servlet will have high level code that looks like this:
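import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DownloadPDF extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        // Extract some request parameters, fetch your data and generate your document
        String fileName = "<SomeFileName>.pdf";
        // Tell the browser this is a PDF that should be saved as a file
        res.setContentType("application/pdf");
        res.setHeader("Content-Disposition", "attachment;filename=\"" + fileName + "\"");
        // writePDF is your own method; it writes the PDF bytes to the response stream
        writePDF(<SomeObjectData>, res.getOutputStream());
    }
}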
Remember that the writePDF method above is your own method, where you use iText's Document and other classes to generate the data and write it to the output stream that you pass in as the second parameter.
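The "generate your PDF content in bytes" step can be sketched with the standard library alone. This is a minimal sketch, not the answer's actual code: InMemoryDownload, DocumentWriter, render, contentDisposition and PLAIN_TEXT are hypothetical names, and the plain-text writer is only a stand-in for an iText-based writePDF so that the example runs without iText.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of iText or the Servlet API.
final class InMemoryDownload {

    // Stand-in for the writePDF method: anything that writes a document to a stream.
    interface DocumentWriter {
        void write(List<String> chunks, OutputStream out) throws IOException;
    }

    // Plain-text writer used only so the sketch runs without iText.
    static final DocumentWriter PLAIN_TEXT = (chunks, out) -> {
        for (String chunk : chunks) {
            out.write(chunk.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    };

    // Renders the whole document into a byte array without touching the file system.
    static byte[] render(List<String> chunks, DocumentWriter writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writer.write(chunks, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Builds the Content-Disposition value used in the servlet above.
    static String contentDisposition(String fileName) {
        // Replace characters that would break the quoted header value.
        String safe = fileName.replaceAll("[\"\\r\\n]", "_");
        return "attachment;filename=\"" + safe + "\"";
    }
}
```

Inside doGet, the resulting byte array could be passed to res.setContentLength(bytes.length) and then written to res.getOutputStream(). Buffering first costs memory for the whole document, but it means the same bytes can also be stored elsewhere before being served.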

I'm not familiar with PDF generation on Google App Engine, especially in Java, but once you have the file you can definitely store it and serve it later.
I suppose generating the PDF will take more than 30 seconds, so you will have to consider using the Task Queue Java API for this process.
After you have the file in memory you can simply write it to the Blobstore and later serve it as a regular blob. In the overview you will find a fully functional example on how to upload, write and serve your binary data (blobs) on Google App Engine.

I found a couple of solutions by googling. Please note that I have not actually tried these libraries, but hopefully they will be of help.
PDFJet (commercial)
Write a Google Drive document and export to PDF

Related

Streaming audio file directly from blob

So I am working on this project where I want to store an audio file in a LARGEBLOB on a database, the size of the file is limited to about 10MB, and be able to load the data through a java servlet that allows for playing of the media file.
Most of the sources I have been able to find suggests storing it locally, however, I want to avoid this solution based on the fact that I'd like to rebuild the website somewhere completely different and not have to rely on the folder structure to be the same.
The main issue I am encountering is that the web browser misinterprets the binary data provided by the servlet. It manages to work out that it is an audio file of some sort; however, it is unable to determine the type of audio file, which leads me to believe that the servlet is either not providing enough data, or that I am not doing enough to instruct the web browser on how to play the file.
For example, say I have a file audio.mp3 which I have uploaded to the database into a table Tracks and stored in a column TrackFile. Assuming the query selects the right song from the table, what data would the servlet need to provide in order for the browser to play the file when accessing the servlet? Currently when I load the servlet, the browser seems to assume that the type is audio/mpeg instead of audio/mp3. The content currently delivered by the servlet also looks something like this:
response.setHeader("Content-Type", this.getServletContext().getMimeType(t.getTrackName() + '.' + t.getFileType()));
response.setHeader("Content-Length", String.valueOf(t.getTrackData().length));
response.setHeader("Content-Disposition", "inline; filename=\"" + t.getTrackName() + '.' + t.getFileType() + "\"");
response.getOutputStream().write(t.getTrackData());
where t is an object which holds all the data which can be retrieved from the database table about a specific track. The method getTrackData() returns a byte[] with the contents of the column TrackFile. The source of this method is: link, which I adapted in an attempt to make it work with audio files, without success.
Is there anything obvious that I have missed that would explain why the file won't play back, or is what I want to achieve generally impossible?
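One point that may help here: audio/mpeg is the registered media type for MP3 files, so a browser reporting audio/mpeg for audio.mp3 is expected rather than a misdetection. Separately, ServletContext.getMimeType returns null when the container has no mapping for an extension. A minimal sketch of an explicit fallback follows, assuming getFileType() yields a bare extension such as "mp3"; AudioMimeTypes and forExtension are hypothetical names, and the extension list is illustrative only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; the extension list is illustrative, not exhaustive.
final class AudioMimeTypes {
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("mp3", "audio/mpeg"); // registered media type for MP3
        BY_EXTENSION.put("ogg", "audio/ogg");  // registered media type for Ogg audio
    }

    // Returns a media type for the extension, or a generic binary type if unknown.
    static String forExtension(String extension) {
        String key = extension == null ? "" : extension.toLowerCase(Locale.ROOT);
        if (key.startsWith(".")) {
            key = key.substring(1);
        }
        return BY_EXTENSION.getOrDefault(key, "application/octet-stream");
    }
}
```

The result could be passed to response.setContentType(...) in place of the getMimeType call, which avoids sending a null Content-Type for unmapped extensions.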

SDMX-ML: SAS libname XML

Eurostat data can be downloaded via a REST API. The response format of the API is an XML file formatted according to the SDMX-ML standard. With SAS, very conveniently, one can access XML files with the libname statement and the XML or XMLv2 engine.
Currently, I am using the xmlv2 engine together with the automap= option to generate an xmlmap to access the data. It works. But the resulting SAS data sets are very unstructured, and for another data set to be downloaded the data structure might change. Also the request might depend on the DSD-file that Eurostat provides for each database item within a different XML file.
Here comes the code:
%let path = /your/working/directory/;
filename map "&path.map.txt";
filename resp "&path.resp.txt";
proc http
URL="http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011"
METHOD="GET"
OUT=resp;
run;quit;
libname resp XMLv2 automap=REPLACE xmlmap=map;
proc datasets;
copy out=WORK in=resp;
run;quit;
With the code above, you can view all downloaded data in your WORK library. It's a mess.
To download another time series, change the parameters of the URL according to Eurostat's description.
So here is my question:
Is there a way to easily generate an xmlmap from a call to the DSD file so that the data are stored in a well-structured way?
As the SDMX-ML standard is widely used in public institutions such as the ECB, Eurostat, OECD... I am wondering if somebody has already implemented requests to these databases. I know about the tool from Banca d'Italia which uses a javaObject. However, I was wondering if there might be a solution without the javaObject.

How do you import a specific element from an external JSON file to an excel file?

I have a long JSON file and I want to copy a specific element from it (I know its name) to an Excel file.
E.g. suppose I want to make an Excel file containing "Product" (Baleno, i20, Ford Figo etc.) imported from a JSON file. How do I do it using GET/POST or without AJAX?
So, obviously there are ways to write this yourself. What I recommend, however, is using a library or two (I'd recommend JSON Simple and/or Apache POI). Software engineering is about efficiency, and that includes efficiency for the engineer. Using libraries is not shameful. I'd recommend doing that first. Try out using libraries, okay?
-Batista
One simple method I have used, when you only require the content you have in the JSON and if the output needs no formatting!
Create/Construct/Return a CSV File containing the content.
Product,Q1Sales,Q2Sales,Q3Sales,Q4Sales
"Baleno",6000,5000,7000,5500
Return the Mimetype Filename as "BalenoSales.xls"
Make the Suffix of the Servlet URL ".xls" as well so Excel/IE likes it.
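The CSV approach above can be sketched as follows. This is a minimal sketch under the assumption that the values have already been pulled out of the JSON (for example with JSON Simple) into lists of strings; CsvExport, escape and toCsv are hypothetical names.

```java
import java.util.List;

// Hypothetical helper that turns rows of strings into CSV text.
final class CsvExport {

    // Quotes a field when it contains a comma, a quote or a line break,
    // doubling any embedded quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins each row with commas and terminates it with CRLF.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(escape(row.get(i)));
            }
            sb.append("\r\n");
        }
        return sb.toString();
    }
}
```

A servlet could write the returned string to the response after setting the download file name as described above.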

How to Retrieve the Download Link for Large Video Files

I would like to retrieve the download link for large video files. I have no problems with small video files but with large videos, the response from the server is that the file
"exceeds the maximum file size that Google can scan"
I want to use the link as the source to a video tag. But because that link gives me the error, I can't use it.
I'm using the Java SDK and I'm using File.getWebContentLink() to get the link. I've tried getDownloadLink() but that one doesn't even work.
Basically, is there any way I can get the download link for large video files?
getWebContentLink() is designed for interactive users (browsers).
Instead, at the raw API level you'll want to use File.get with alt=media AND also set the acknowledgeAbuse flag if the initial response is the 'Google can't scan' message. Read more on downloading files here and the abuse flag here.
In the Java client library, it'd look something like this:
String fileId = "0BwwA4oUTeiV1UVNwOHItT0xfa2M";
OutputStream outputStream = new ByteArrayOutputStream();
driveService.files().get(fileId)
.set("acknowledgeAbuse", true)
.executeMediaAndDownloadTo(outputStream);
Disclaimer, I haven't compiled the above.
Note: Do not use the .../host/id method mentioned in the other answer - that method is deprecated and scheduled to stop serving content by end of August, 2016 (this year)
Try using https://googledrive.com/host/id where id is the file's ID. Inspired by http://www.scriptscoop2.com/t/1eb5579419c6/issue-when-trying-to-stream-a-video-from-google-drive-inside-html5-vid.html.

Document processing in Liferay portal

I've been using Liferay a lot for past 2 years, but I have never needed any extensive document management.
Now I have a portlet where users upload documents (MS office OLE2 documents, ODS documents, PDF etc.) and I have to persist them with all metadata available.
I know how I would do that without using Liferay: I'd probably use Apache Solr with Apache Tika (UpdateRichDocuments and ExtractingRequestHandler) or Apache Jackrabbit, which uses Apache Tika under the hood (org.apache.jackrabbit.extractor.*).
The problem is that if I look at the trunk of Liferay, there are some key classes:
Hooks (JCRHook, FileSystemHook, CMISHook, s3Hook) that are employed from within DLLocalServiceImpl kinda directly
Another alternative is using DLAppLocalServiceImpl that is employing DLRepositoryLocalServiceImpl and the files are persisted into repository also via Hooks, but a lot of additional stuff is done in there.
There is no jackrabbit-text-extractors library in Liferay, so I suppose that if I wanted metadata to be extracted from PDF, DOC and ODS documents, I would have a very hard time, because the DL service layer doesn't accept additional properties.
I think I'd have to avoid using the DL services and the JCR hook and access Jackrabbit directly. But I would lose compatibility and the possibility to migrate my repository etc.
Could anybody please help with this one? Thank you
SOLR for indexing, Jackrabbit for document storage. Managing Liferay Document Library in code is fairly easy, just look at the DL*LocalServiceUtil classes, namely DLFolderLocalServiceUtil and DLFileLocalServiceUtil. By default Liferay just creates a matching folder/file structure on the hard drive (with names changed) so you'd only need to write code or use Jackrabbit if you wanted more than this since Liferay allows up/download and viewing out of the box via the control panel and various portlets.
I haven't used JackRabbit with Liferay but once configured everything should be managed under the covers and you shouldn't need to worry about it on the front end.
When you say "with all metadata available" I'm not sure what is retained, but aside from renaming the file so that it can be tracked there shouldn't be any other changes. It should be quick and easy to test by uploading a file of each type and checking the entries in the LIFERAY/data/document_library directory and subdirectories. Again this would be different if Jackrabbit is used.
Those two services, DLLocalServiceImpl and DLAppLocalServiceImpl, both are and, I suppose, will remain important. The former is for direct access to the repository. Notice that when adding a file via this service you need to persist the corresponding DLFileEntry into the database and then reference it in addFile(...., fileEntryId, ...).
The latter service is doing additional stuff for you, mainly asset management and workflow.
Regarding your use case, I would avoid using the document library, because no metadata can go down into the JCR repository. Actually, the only metadata you could store would be custom properties, AKA the Expando feature of Liferay portal.
The best way for you seems to be to implement your own Jackrabbit hook to store data into the repository and let the Liferay document library use that repository.
Think Edgar is correct. If you check the current trunk via http://svn.liferay.com/repos/public/portal/trunk/portal-service/src/com/liferay/documentlibrary/service/DLLocalService.java (login as guest and no password), you will no longer find the class DLFolderLocalServiceUtil. We are using the existing DLFolderLocalServiceUtil class as well. Thanks for the heads up. We will refactor our code so when 6.1 comes around we can still use the DocumentLibrary services.
You need to always use DLAppServiceUtil (as Liferay instructs specifically). Here is my working code that saves a file to the CMS:
public static void saveFileToCMS(ActionRequest aReq, long groupId, String fileName, File filenameWithPath) {
    try {
        ServiceContext serviceContext = ServiceContextFactory.getInstance(
                Group.class.getName(), aReq);
        // prevents duplicate entries based on unique title name
        Random rand = new Random();
        Integer suffix = new Integer(rand.nextInt(10000));
        DLAppServiceUtil.addFileEntry(groupId, 0, fileName, "application/vnd.ms-excel",
                fileName + suffix.toString(), "description goes here", "changelogname",
                filenameWithPath, serviceContext);
        //log.info("Successfully added the new file");
    } catch (PortalException pe) {
        log.error("Portal Exception occurred while saving file to CMS");
        pe.printStackTrace();
    } catch (SystemException e) {
        log.error("System Exception occurred while saving file to CMS");
        e.printStackTrace();
    }
}
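The random suffix in the code above is drawn from only 10,000 values, so two uploads with the same file name can still end up with the same title. A minimal sketch of one alternative, assuming the title only needs to be unique and not pretty; UniqueTitles and uniqueTitle are hypothetical names, and this is not part of the Liferay API.

```java
import java.util.UUID;

// Hypothetical helper for building a title that is very unlikely to collide.
final class UniqueTitles {

    // Appends a random UUID to the file name, e.g. "report.xls-<uuid>".
    static String uniqueTitle(String fileName) {
        return fileName + "-" + UUID.randomUUID();
    }
}
```

The returned string could be passed as the title argument in place of fileName + suffix.toString().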
