Passing a long/large string as an argument into a Java app? - java

I'm attempting to pass a large/long string, a PDF binary read from disk, into a Java app and so far am having very little success. The app works when I read the file in from a local disk, so I know the problem is related to how I'm importing the data. Roughly this is what I'm looking at:
I'm reading, manipulating the pdf in PHP and using exec(); to touch the Java app, this is where I'm at with the Java:
Works:
File input = new File("C:\\Users\\Jack\\Downloads\\col_terror.pdf");
document = PDDocument.load(input);
Does not work:
PHP:
exec("/path/to/jar/java -jar JavaAppHere.jar $pdf_string",$output);
Java:
public static void main(String[] args) throws Exception {
...
document = PDDocument.load( args[0] );
...
}
I feel this is something quite simple I am not understanding about passing strings as args, though it has been a couple years since I've made a venture into the land of Java.

Please read the Javadoc of PDDocument. Assuming this is PDFBox's PDDocument, you can see that you are passing the PDF data itself, while PDDocument.load(java.lang.String) expects a filename.
You also don't seem to escape $pdf_string with escapeshellarg().
Since there are too many variables involved in passing binary data around as shell arguments, I'd advise against it.
The easiest solution is to write the PDF to a temporary file and pass the filename to Java. The alternative is to pass the PDF data via stdin.
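A minimal sketch of the stdin route on the Java side, using only the standard library. The class name StdinPdfReader is made up for illustration, and the PHP side is assumed to write the raw PDF bytes to the process's standard input (for example via proc_open) instead of putting them on the command line.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of the original question.
class StdinPdfReader {

    // Reads every byte from the stream until end-of-stream is reached.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The PDF arrives as raw bytes on stdin, so no shell escaping is involved.
        byte[] pdfBytes = readAll(System.in);
        System.err.println("Received " + pdfBytes.length + " bytes on stdin");
        // Assumption: the bytes can then be wrapped in a ByteArrayInputStream
        // and handed to whichever load method of your PDF library accepts an
        // InputStream. Check the Javadoc of the version you use.
    }
}
```

Reading into a byte array rather than a String matters here: a PDF is binary data, and decoding it as text can corrupt it.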

Related

Replacing text in XWPFParagraph without changing format of the docx file

I am developing a font converter app that converts Unicode font text to Krutidev/Shree Lipi (Marathi/Hindi) font text. The original docx file contains formatted words (color, font, text size, hyperlinks, etc.).
I want the final docx to keep the same formatting as the original after the words are converted from Unicode to another font.
Here is my Code
try {
    fileInputStream = new FileInputStream("StartDoc.docx");
    document = new XWPFDocument(fileInputStream);
    XWPFWordExtractor extractor = new XWPFWordExtractor(document);
    List<XWPFParagraph> paragraph = document.getParagraphs();
    Converter data = new Converter();
    for (XWPFParagraph p : document.getParagraphs()) {
        for (XWPFRun r : p.getRuns()) {
            String string2 = r.getText(0);
            if (string2 == null) continue; // runs without text return null
            // assumes uniToShree returns the converted text rather than modifying its argument
            string2 = data.uniToShree(string2);
            r.setText(string2, 0);
        }
    }
    // Write the document to the file system
    FileOutputStream out = new FileOutputStream(new File("Output.docx"));
    document.write(out);
    out.close();
    System.out.println("Output.docx written successfully");
}
catch (IOException e) {
    System.out.println("We had an error while reading the Word Doc");
}
Thank you for asking.
I worked with POI some years ago, though on Excel workbooks rather than Word documents, but I'll still try to help you reach the root cause of your error.
The exception thrown at runtime already carries useful debugging information.
A good first step to narrow down the error is to not replace the exception's own message with a fixed string.
Try printing the result of e.getLocalizedMessage() or e.getMessage() and see what you get.
Printing the stack trace with the printStackTrace method is also often useful for pinpointing where the error lies.
Share your findings from the above method calls so that we can help debug the issue further.
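As a rough illustration of that advice, the catch block can report what the exception itself says instead of a fixed string. The class and method names below are invented for the sketch; only getMessage() and printStackTrace() come from the advice above.

```java
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical demo class showing how to keep the exception's own details.
class ExceptionReport {

    // Builds a one-line description from the exception type and its message.
    static String describe(Exception e) {
        return e.getClass().getName() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        try (FileInputStream in = new FileInputStream("StartDoc.docx")) {
            in.read();
        } catch (IOException e) {
            // Print the real cause rather than a generic message.
            System.out.println(describe(e));
            // The stack trace shows the exact line that failed.
            e.printStackTrace();
        }
    }
}
```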
[EDIT 1:]
So it seems you are able to process the file correctly as far as the font conversion of the data goes, but you are not able to reproduce the formatting of the original data in the converted file.
(Thus "We had an error while reading the Word Doc" is a lie getting printed ;) )
Now, there are 2 elements to a Word document:
Content
Structure or Schema
You are able to convert the data because you are working only on the content of your doc files.
To retain the formatting of the contents, your solution needs to be aware of the formatting of the doc files as well and take care of it.
MS Word, which defines the doc files and their extension (.docx), follows a particular set of schemas that define the formatting rules. These schemas are defined in Microsoft's XML Namespace packages [1].
You can obtain the XML (HTML) form of the doc file quite easily (see the steps in [1] or the code in [2]) and even apply different schemas, or possibly your own schema definitions based on those provided by MS's namespaces. There are two ways to do this. One is programmatic, for which you need to get familiar with XML, XSL and XSLT concepts (w3schools [3] is a good starting point), but this method is no less complex than writing your own version of MS Word. The other is to use MS Word's built-in tools as shown in [1].
[1]. https://www.microsoftpressstore.com/articles/article.aspx?p=2231769&seqNum=4#:~:text=During%20conversion%2C%20Word%20tags%20the,you%20can%20an%20HTML%20file.
[2]. https://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/testcases/org/apache/poi/hwpf/converter/TestWordToHtmlConverter.java
[3]. https://www.w3schools.com/xml/
My answer gives you a cursory overview of how to achieve what you want, but depending on your inclination and available time, use your discretion before committing to one path over the other.
Hope it helps!

Which type is BLOB?

I have about 100,000 BLOBs in my database and have to work with them. All is fine when someone tells me which type of BLOB I am dealing with. But there will be situations where I will not know which type it is. So how can I find out which type my BLOB is?
The last time I handled a BLOB I was given specific information about it: it was a zipped file. So I did this:
try {
byte[] str = this.jdbcTemplate.queryForObject("SELECT SAVEDATA FROM JDBEVPP1.TEVP005 WHERE GFNR = 357302", byte[].class); // packs the BLOB into a byte array
ByteArrayInputStream bys = new ByteArrayInputStream(str);
GZIPInputStream gzip = new GZIPInputStream(bys);
//...etc...
}
How can I find out the type of a BLOB using Java code?
The final answer to this question would be the comment of Robert Harvey:
"The usual way to identify a binary file of some type is to have some "magic numbers" at the beginning of the file that you can use to identify the type. See en.wikipedia.org/wiki/… and en.wikipedia.org/wiki/File_format#Magic_number"
And also comment of Erwin Smout:
"By reading the detailed specs of the database design. Absent that, by trying to locate the original author of the system and hoping he still remembers. Absent that, by trying to locate other code that uses the same BLOB and kind of re-engineering the spec from there. In most shops you will have to go all the way to the third step alas. "

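The magic-number approach from the first comment above can be sketched roughly as follows. The class BlobSniffer and its method names are hypothetical; the signatures checked are the well-known leading bytes of gzip, ZIP, PDF, PNG and JPEG files. The list is far from complete, and ZIP-based formats such as docx or jar share the same "PK" prefix, so a match narrows the type down rather than proving it.

```java
// Hypothetical helper: guesses a BLOB's type from its leading bytes.
class BlobSniffer {

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns a short type label, or "unknown" if no signature matches.
    static String guessType(byte[] data) {
        if (startsWith(data, 0x1F, 0x8B)) return "gzip";
        if (startsWith(data, 0x50, 0x4B, 0x03, 0x04)) return "zip";  // "PK\3\4"
        if (startsWith(data, 0x25, 0x50, 0x44, 0x46)) return "pdf";  // "%PDF"
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        return "unknown";
    }
}
```

With the byte array from the query in the question, guessType(str) would return "gzip" for data that GZIPInputStream can read.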
Android/Java How to read this file on website?

www.rgrfm.be/rgrsite/maxradio/android.php
www.rgrfm.be/rgrsite/maxradio/onair.txt
The track information of the music being played is contained in onair.txt. android.php is a php script I wrote.
I need to display the track information in my Android application. I do not want to download it to disk but keep it in memory. I am not sure whether the php script is useful, because it would create additional overhead, so it's probably better to simply parse onair.txt.
InputStream is = new URL("http://www.rgrfm.be/rgrsite/maxradio/onair.txt").openStream();
I am stuck with this. Has anyone got time to help me?
As described, the php script seems unnecessary, since you can read the text file directly. So first read it as text, then parse it.
URL url = new URL("http://www.rgrfm.be/rgrsite/maxradio/onair.txt");
String text = readAsText(url);
parse(text);
String readAsText(URL url) {
// read the url as text here.
}
void parse(String text) {
}
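One possible way to fill in readAsText, keeping everything in memory, is sketched below. The class name OnAirReader and the UTF-8 charset are assumptions, since the encoding of onair.txt is not stated. On Android the call also has to run off the main thread, because network access on the main thread is rejected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for reading a small text resource into memory.
class OnAirReader {

    // Reads the whole stream line by line into one String; nothing touches disk.
    static String readAsText(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // UTF-8 is an assumption; adjust if the file uses another encoding.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Convenience overload matching the readAsText(URL) stub above.
    static String readAsText(URL url) throws IOException {
        return readAsText(url.openStream());
    }
}
```

The parse step is left out because the layout of onair.txt is not shown.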

Writing to a PDF from inside a GAE app

I need to read several megabytes (raw text strings) out of my GAE Datastore and then write them all to a new PDF file, and then make the PDF file available for the user to download.
I am well aware of the sandbox restrictions that prevent you from writing to the file system. I am wondering if there is a crafty way of creating a PDF in-memory (or a combo of memory and the blobstore) and then storing it somehow so that the client-side (browser) can actually pull it down as a file and save it locally.
This is probably a huge stretch, but my only other option is to farm this task out to a non-GAE server, which I would like to avoid at all cost, even if it takes a lot of extra development on my end. Thanks in advance.
You can definitely achieve your use case using GAE itself. Here are the steps that you should follow at a high level:
Download the excellent iText library, which is a Java library to work with PDFs. First build out your Java code to generate the PDF content. Check out various examples at : http://itextpdf.com/book/toc.php
Since you cannot write to a file directly, you need to generate your PDF content in bytes and then write a Servlet which will act as a Download Servlet. The Servlet will use the Response object to open a stream, manipulate the Mime Headers (filename, filetype) and write the PDF contents to the stream. A browser will automatically present a download option when you do that.
Your Download Servlet will have high level code that looks like this:
public class DownloadPDF extends HttpServlet {
public void doGet(HttpServletRequest req, HttpServletResponse res)
throws ServletException, IOException {
//Extract some request parameters, fetch your data and generate your document
String fileName = "<SomeFileName>.pdf";
res.setContentType("application/pdf");
res.setHeader("Content-Disposition", "attachment;filename=\"" + fileName + "\"");
writePDF(<SomeObjectData>, res.getOutputStream());
}
}
Remember that the writePDF method above is your own method, in which you use iText's Document and other classes to generate the data and write it to the output stream passed as the second parameter.
I am not familiar with PDF generation on Google App Engine, especially in Java, but once you have the PDF you can definitely store it and serve it later.
I suppose generating the PDF will take more than 30 seconds, so you will have to consider using the Task Queue Java API for this process.
After you have the file in memory you can simply write it to the Blobstore and later serve it as a regular blob. In the overview you will find a fully functional example on how to upload, write and serve your binary data (blobs) on Google App Engine.
I found a couple of solutions by googling. Please note that I have not actually tried these libraries, but hopefully they will be of help.
PDFJet (commercial)
Write a Google Drive document and export to PDF

How to Generate KML file with Style using Geotools?

I searched a lot around this question and did not find any answer.
In a Java program, I have a "SimpleFeatureCollection" (geotools) and a "StyledLayerDescriptor" (geotools) which contains the style for my "SimpleFeatureCollection", and I need to generate a KML file using this style.
I can already generate a KML file (without any style) from my "SimpleFeatureCollection" object with the following code:
static public boolean collectionToKMLFile(File iKMLFile, SimpleFeatureCollection iPolygonsCollection, StyledLayerDescriptor iStyle) throws IOException
{
    Encoder lEncoder = new Encoder(new KMLConfiguration());
    lEncoder.setIndenting(true);
    // try-with-resources closes the stream even if encoding fails
    try (FileOutputStream lFileOutputStream = new FileOutputStream(iKMLFile)) {
        lEncoder.encode(iPolygonsCollection, KML.kml, lFileOutputStream);
    }
    // iStyle is not used yet; applying it is what this question is about
    return false;
}
I could not find any information on how to add a style. I do not think it is impossible; do you have an idea?
Thanks.
I finally decided to write a program to generate my own Styled KML File.
In fact, it is not really hard:
Write KML Header
Loop over your geometries and Write them
Write Kml Footer
All information about the KML elements defined in KML version 2.2 can be found here:
"KML Reference"
Enjoy.
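The three steps above (header, geometries, footer) might look roughly like this when written by hand. This is only a sketch under assumptions: the class StyledKmlSketch and its methods are invented, a single shared style is used, each geometry is a simple polygon ring of longitude/latitude pairs, and names are not XML-escaped. Extracting colors from the StyledLayerDescriptor is not shown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written KML generator following the three steps above.
class StyledKmlSketch {

    // Step 1: KML header with one shared style. KML colors are aabbggrr hex.
    static String header(String styleId, String lineColor, String fillColor) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n"
             + "<Document>\n"
             + "<Style id=\"" + styleId + "\">\n"
             + "<LineStyle><color>" + lineColor + "</color><width>2</width></LineStyle>\n"
             + "<PolyStyle><color>" + fillColor + "</color></PolyStyle>\n"
             + "</Style>\n";
    }

    // Step 2: one Placemark per geometry; each ring point is {longitude, latitude}.
    static String placemark(String name, String styleId, List<double[]> ring) {
        StringBuilder coords = new StringBuilder();
        for (double[] p : ring) {
            if (coords.length() > 0) {
                coords.append(' ');
            }
            coords.append(p[0]).append(',').append(p[1]).append(",0");
        }
        return "<Placemark>\n"
             + "<name>" + name + "</name>\n"
             + "<styleUrl>#" + styleId + "</styleUrl>\n"
             + "<Polygon><outerBoundaryIs><LinearRing>\n"
             + "<coordinates>" + coords + "</coordinates>\n"
             + "</LinearRing></outerBoundaryIs></Polygon>\n"
             + "</Placemark>\n";
    }

    // Step 3: KML footer closing the document.
    static String footer() {
        return "</Document>\n</kml>\n";
    }

    public static void main(String[] args) {
        List<double[]> ring = Arrays.asList(
                new double[]{4.0, 50.0}, new double[]{5.0, 50.0},
                new double[]{5.0, 51.0}, new double[]{4.0, 50.0});
        System.out.print(header("polyStyle", "ff0000ff", "7f00ff00")
                + placemark("Example polygon", "polyStyle", ring)
                + footer());
    }
}
```

In a real program the loop over the SimpleFeatureCollection would call placemark once per feature and write each piece to the output file.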
