I have recently discovered the OpenDoPE project. As I understand from the walkthrough, .docx files must be preprocessed, for example to replace repeatable content.
If I understand correctly, there are two ways to do it:
Using docx4j
Using a Macro
I am developing a Rails web platform, and I'd prefer the preprocessing to be done client-side, so with the macro. But if I can only do it with Java, I'll go with that.
Problem: when I click the "inject macro" button in the OpenDoPE add-in in Word 2010, nothing happens :O
So there are two possible answers:
Explain how I can install this macro in the document
Explain how I can get docx4j to preprocess the document, i.e. from a Linux terminal, what command with what parameters should I type to preprocess a document.docx file containing repeatable content?
I tried clicking the "inject macro" button in my Word 2010, and it worked, that is:
it prompted me to save a .docm file
when I opened the .docm file in Word, the macro ran
Trying to open the macro in Word's VBA editor though, I couldn't. Seems I obfuscated it :-(
I do have the source files floating around, which I'd be happy to put on GitHub.
Please note, however, that it is 4-year-old, unmaintained, 'proof of concept' level code (whereas the docx4j code is actively maintained and used by a variety of companies).
For non-interactive processing using Java, see samples/ContentControlBindingExtensions.java
To invoke it from a Linux command line, that sample would need modifying slightly; you would of course also need to pass a suitable class path.
The other way you could do it is by installing this simple web app in say Tomcat.
My JavaFX program needs to compute a file name path for various user-written files such as the product-specific preferences. For example, "Do you want to open your previous file the next time the program starts" and so on. I have successfully experimented with https://github.com/dlemmermann/JPackageScriptFX and "jpackage" for Windows at least, so it looks like I will shortly need a way to code, in a cross-platform manner, the "correct" path in which to store such files.
Is there a standard API or coding technique that will give me a file path that the program can write using user permissions that is "correct" for these native platforms?
I am not aware of a single piece of software that does that, but I think my answer to the question "Java - Cross-platform filepath" may be helpful to you. It also mentions how the same can be achieved on Android, for example.
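As a rough sketch of the usual per-platform conventions (this is not a standard API, just one common way to code it), the settings directory can be derived from the os.name and user.home system properties plus the APPDATA and XDG_CONFIG_HOME environment variables. The class name AppDirs and the app name "MyApp" are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class AppDirs {

    // Pure helper: every environment input is passed in, so the logic is easy to test.
    public static Path configDir(String osName, String userHome, String appData,
                                 String xdgConfigHome, String appName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            // macOS convention: ~/Library/Application Support/<app>
            return Paths.get(userHome, "Library", "Application Support", appName);
        }
        if (os.contains("win")) {
            // Windows convention: %APPDATA%\<app>, with a fallback if APPDATA is unset.
            return (appData != null && !appData.isEmpty())
                    ? Paths.get(appData, appName)
                    : Paths.get(userHome, "AppData", "Roaming", appName);
        }
        // Linux and other Unix-like systems: XDG base directory convention.
        return (xdgConfigHome != null && !xdgConfigHome.isEmpty())
                ? Paths.get(xdgConfigHome, appName)
                : Paths.get(userHome, ".config", appName);
    }

    // Convenience overload that reads the real environment of the running JVM.
    public static Path configDir(String appName) {
        return configDir(System.getProperty("os.name"), System.getProperty("user.home"),
                System.getenv("APPDATA"), System.getenv("XDG_CONFIG_HOME"), appName);
    }

    public static void main(String[] args) {
        // "MyApp" is a placeholder application name.
        System.out.println(configDir("MyApp"));
    }
}
```

The directory may not exist yet on first run, so the program would typically call Files.createDirectories on the result before writing the preferences file.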
I made a text editor in java. It has a FileExplorer class which allows me to read and write to a file. Now I would like to know how I could open said file (text.txt) with my editor application from outside my application. Basically when I double-click on the file (text.txt) it should start my application and pass some variables(like name and path). The application is a .jar file.
I am not going to bother anyone to go through the 2k lines of code, so I won't post it in here. But it is just a JTextPane in a JFrame and a PrintWriter/BufferedReader reading and writing to the file.
PS: should preferably work cross-platform.
Thanks
This question is not about Java. It is about mapping files to a specific application using your OS tools.
If you are on Windows you have to map the *.txt extension to your application. Take a look at this article for details.
Please note that your application must accept file path in command line.
To make the association easier, I'd recommend writing a batch file that runs your application and also accepts the file name on the command line. Then you just have to associate your batch file with the *.txt extension.
If you are on Linux, the association technique depends on your distribution, but you can google it. Obviously you will have to create a shell script instead of a batch file.
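On the Java side, "accepting the file path on the command line" just means reading it from the arguments to main. A minimal sketch, where EditorLauncher and fileToOpen are made-up names and the real editor would load the file into its JTextPane instead of printing:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class EditorLauncher {

    // Returns the file passed by the OS (first argument), or empty if the
    // application was started without a file.
    public static Optional<Path> fileToOpen(String[] args) {
        if (args.length == 0 || args[0].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(args[0]).toAbsolutePath().normalize());
    }

    public static void main(String[] args) {
        Optional<Path> file = fileToOpen(args);
        if (file.isPresent()) {
            Path p = file.get();
            // The name and the full path are both available from the Path object.
            System.out.println("Would open " + p.getFileName() + " at " + p);
        } else {
            System.out.println("No file given; starting with an empty document.");
        }
    }
}
```

The wrapper script then only has to forward its first argument, for example a one-line batch file along the lines of java -jar editor.jar "%1" (editor.jar being a placeholder for your jar).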
EDIT
Actually your question is mostly about installation process. There are a lot of installation tools that can do this work for you. Some of the tools are even cross platform (I can remember "install anywhere"). There are both commercial and free tools that do this.
I have developed a report using JasperReports (iReport tool), which needs to be exported to an MS Word document. I have Word 2007 on my machine and it works fine with the .docx extension (hardcoded). However, if a machine has MS Word 97-2003 installed, which uses the .doc format, this will cause a problem because I am using the hardcoded value .docx.
Is there any way to handle this?
If knowing the Word version on the client machine is the only option, how can I find out which version is installed? I think this part of the code would have to be in JavaScript/jQuery (to detect the MS Word version on the client machine).
Please let me know how this can be accomplished.
You can't inspect the contents of somebody's machine from a browser. That would be a severe security risk.
Give them the choice. JasperReports can export in lots of formats; give the user multiple options and, if necessary, add a brief explanation to each option. Alternatively, give the user a .doc file, which the latest versions of MS word and OpenOffice can also open.
My work has tasked me with determining the feasibility of migrating our existing in-house built change management services (web based) to a SharePoint solution. I've found everything to be easy, except that for each change management issue (several thousand) there may be any number of attachment files associated with it, called through JavaScript, that need to be downloaded and put into a document library.
(ex. ... onClick="DownloadAttachment(XXXXX,'ProjectID=YYYY');return false">Attachment... ).
To keep me from manually selecting them all I've been looking over posts of people wanting to do similar, and there seem to be many possible solutions, but they often seem more complicated than they need to be.
So I suppose in a nutshell I'm asking what would be the best way to approach this issue that yields some sort of desktop application or script that can interact with web pages and will let me select and organize all the attachments. (Making a purely web based app (php, javascript, rails, etc.) is not an option for me, so throwing that out there now).
Thanks in advance.
Given a document id and project id, XXXXX and YYYY respectively in your example, figure out the URL from which the file contents can be downloaded. You can observe a few URL links in the browser and detect the pattern which your web application uses.
Use a tool like Selenium to get a list of XXXXXs and YYYYs of documents you need to download.
Write a bash script with wget to download the files locally and put them in the correct folders.
This is a "one off" migration, right?
Get access to your in-house application's database, and create an SQL query which pulls out rows showing the attachment names (XXXXX?) and the issue/project (YYYY?), ex:
|file_id|issue_id|file_name |
| 5| 123|Feasibility Test.xls|
Analyze the DownloadAttachment method and figure out how it generates the URL that it calls for each download.
Start a script (personally I'd go for Python) that will do the migration work.
Program the script to connect and run the SQL query, or to read a CSV file you create manually from step #1.
Program the script to use the details to determine the target-filename and the URL to download from.
Program the script to download the file from the given URL, and place it on the hard drive with the proper name. (In Python, you might use urllib.)
Hopefully that will get you as far as a bunch of files categorized by "issue" like:
issue123/Feasibility Test.xls
issue123/Billing Invoice.doc
issue456/Feasibility Test.xls
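The last few steps could look roughly like this. It is a sketch in Java rather than the Python suggested above, and the download URL pattern (the base URL and the FileID/ProjectID parameter names) is purely hypothetical; the real one has to come from analyzing DownloadAttachment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AttachmentMigrator {

    // HYPOTHETICAL URL pattern; replace with whatever DownloadAttachment really calls.
    public static String downloadUrl(String baseUrl, int fileId, int issueId) {
        return baseUrl + "?FileID=" + fileId + "&ProjectID=" + issueId;
    }

    // Builds the local target, e.g. <root>/issue123/Feasibility Test.xls
    public static Path targetPath(Path root, int issueId, String fileName) {
        return root.resolve("issue" + issueId).resolve(fileName);
    }

    // Downloads one file, creating the issue folder if needed.
    public static void download(String url, Path target) throws IOException {
        Files.createDirectories(target.getParent());
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        // One row from the example query result: file_id=5, issue_id=123.
        String url = downloadUrl("http://intranet.example/download", 5, 123);
        Path target = targetPath(Paths.get("migration"), 123, "Feasibility Test.xls");
        // Only prints the plan; call download(url, target) once the real URL is known.
        System.out.println(url + " -> " + target);
    }
}
```

Looping over the rows of the SQL result or the CSV file and calling download for each one would produce the folder layout shown above.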
Thank you everyone. I was able to get what I needed using htmlunit and java to traverse a report I made of all change items with attachments, go to each one, copy the source code, traverse that to find instances of the download method, and copy the unique IDs of each attachment and build an .xls of all items and their attachments.
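For anyone doing something similar, the "find instances of the download method" part can be sketched with a plain regular expression over the page source. This is not the htmlunit code itself, only the extraction step, and it assumes both IDs are numeric and that the call looks like the DownloadAttachment(XXXXX,'ProjectID=YYYY') example from the question. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttachmentLinkScanner {

    // Matches DownloadAttachment(12345,'ProjectID=678'); assumes numeric IDs.
    private static final Pattern CALL = Pattern.compile(
            "DownloadAttachment\\(\\s*(\\d+)\\s*,\\s*'ProjectID=(\\d+)'\\s*\\)");

    // Returns one {attachmentId, projectId} pair per call found in the HTML.
    public static List<String[]> findAttachments(String html) {
        List<String[]> found = new ArrayList<>();
        Matcher m = CALL.matcher(html);
        while (m.find()) {
            found.add(new String[] { m.group(1), m.group(2) });
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"#\" onClick=\"DownloadAttachment(12345,'ProjectID=678');"
                + "return false\">Attachment</a>";
        for (String[] pair : findAttachments(html)) {
            System.out.println(pair[0] + "," + pair[1]); // prints 12345,678
        }
    }
}
```

Each pair can then be written out as one row of the report (CSV or .xls) listing the items and their attachments.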
Is there any free Java library for extracting text from PDF that is compatible with Google App Engine?
I've read about PDFJet, but it can't read PDF, can it?
Is there perhaps another way to extract text from PDF? I tried http://www.pdfdownload.org/, but unfortunately they don't handle non-English characters correctly.
iText now has a text parsing module (I'm one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
PdfBox does not run on GAE. It uses Java classes that are not allowed there.
(GAE only permits these http://code.google.com/appengine/docs/java/jrewhitelist.html)
I have partially modified a very old version of PdfBox (0.7.3) to be GAE compliant. Now I'm able to extract text from PDF (whole page or rectangular area). I only modified the minimum part needed for PDF text extraction, not the whole of PdfBox. :)
The idea was to remove references to java.awt.Rectangle and related classes, using my own "rectangle" class instead.
More info: http://fhtino.blogspot.com/2010/04/pdfbox-text-extration-gae.html
I modified the latest (1.8.0-Snapshot) version to run on Google AppEngine. Had to disable one Unit-Test, but it runs fine for simple text extraction.
Following the simple try-fail-fix approach, I had to modify 5 files in total. Pretty doable.
You'll also have to explicitly use a RandomAccessBuffer, like Fabrizio explained.
For the extra lazy, here's the compiled jar, the dependencies for text extraction, and the patch. Note that it might not work for every use case (e.g. rectangle-based extraction). I used it to extract the text of a whole page.
https://docs.google.com/folder/d/0B53n_gP2oU6iVjhOOVBNZHk0a0E/edit
I know there is http://pdfbox.apache.org/index.html
"Apache PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents."
but I've never tested it.
Last month I finished extracting text from PDF files in my project. I used the Xpdf tool to get the text and the text coordinates, but I used it from Xcode (Objective-C). The tool is open source, written in C++, and can be used from many languages. However, I don't know whether Xpdf would work with your Java code. Anyway, you can try this tool.