Any tool to help write programming articles with MS Office Word - java

Hi all! When I use MS Office Word 2010 to write an article about Java, I always need to embed some Java code in the document. I wonder whether there is a plugin for Word, or another software tool, that can help me with this.
Features for code formatting and keyword highlighting are especially welcome!

There are several ways to get syntax-highlighted text into Word. Here are some I know:
Eclipse + OpenOffice Writer (Word should do this as well)
  Mark and copy a code section in Eclipse
  Paste into Writer (Ctrl+Shift+V) or Word (Ctrl+Alt+V) as HTML or RTF text
Using Notepad++
  Paste the code section into Notepad++
  Select the appropriate syntax highlighting
  Use the NppExport plugin to export to an RTF file
  Open the file with Word
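The export step can also be approximated in plain Java. The sketch below is only an illustration, not how NppExport or Eclipse actually work: it writes an RTF document in a monospaced font with Java keywords in bold blue, which Word can open directly. The class name `RtfJavaExporter` and the short keyword list are assumptions, and comments and string literals are not treated specially.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns Java source into RTF with bold, blue keywords. */
public class RtfJavaExporter {

    // Deliberately incomplete keyword list, for illustration only.
    private static final Set<String> KEYWORDS = Set.of(
            "public", "private", "protected", "static", "final", "class",
            "interface", "void", "int", "long", "boolean", "return", "new",
            "if", "else", "for", "while", "import", "package", "try", "catch");

    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Escapes characters that have a special meaning in RTF. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append("\\par\n");
            } else if (c == '\t') {
                sb.append("\\tab ");
            } else if (c > 127) {
                // Non-ASCII: RTF unicode escape with a signed 16-bit value and '?' fallback.
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a complete RTF document from Java source text. */
    public static String toRtf(String source) {
        StringBuilder body = new StringBuilder();
        Matcher m = WORD.matcher(source);
        int last = 0;
        while (m.find()) {
            body.append(escape(source.substring(last, m.start())));
            String word = m.group();
            if (KEYWORDS.contains(word)) {
                // cf1 = color 1 from the color table below, b = bold.
                body.append("{\\cf1\\b ").append(word).append("}");
            } else {
                body.append(word);
            }
            last = m.end();
        }
        body.append(escape(source.substring(last)));
        return "{\\rtf1\\ansi\\deff0"
                + "{\\fonttbl{\\f0\\fmodern Courier New;}}"
                + "{\\colortbl;\\red0\\green0\\blue192;}"
                + "\\f0\\fs20 " + body + "}";
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RtfJavaExporter Input.java output.rtf
        String source = Files.readString(Paths.get(args[0]), StandardCharsets.UTF_8);
        Path out = Paths.get(args[1]);
        // The output is pure ASCII because non-ASCII characters are escaped.
        Files.writeString(out, toRtf(source), StandardCharsets.US_ASCII);
    }
}
```

A dedicated tool (Notepad++, Highlight, or the IDE itself) will handle comments, strings and themes far better; the point of the sketch is that RTF is plain text with inline formatting commands, which is why these exports paste into Word cleanly.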

http://www.ifcx.org/wiki/Wings.html lets you embed live code in a document and execute it, but it requires OpenOffice and some plugins.

There is also Highlight, which can export to multiple formats (including HTML, RTF and even SVG!), and has many themes.

Related

Preprocess OpenDoPE Word file (Macro or docx4j)

I have recently discovered the OpenDoPE project. From what I understand from the walkthrough, .docx files must be preprocessed, for example to replace repeatable content.
If I understand correctly, there are two ways to do it:
Using docx4j
Using a Macro
I am developing a Rails web platform, and I'd prefer the preprocessing to be done client-side, that is, with the macro. But if I can only do it with Java, I'll go with that.
Problem: when I click the "inject macro" button in the OpenDoPE add-in in Word 2010, nothing happens :O
So there are two possible answers:
Explain how I can install this macro in the document
Explain how I can have docx4j preprocess the document, i.e. from a Linux terminal, what command with what parameters should I type to preprocess a document.docx file containing repeatable content?
I tried clicking the "inject macro" button in my Word 2010, and it worked, that is:
it prompted me to save a .docm file
when I opened the .docm file in Word, the macro ran
When I tried to open the macro in Word's VBA editor, though, I couldn't. It seems I obfuscated it :-(
I do have the source files floating around, which I'd be happy to put on GitHub.
Please note, however, that it is four-year-old, unmaintained, proof-of-concept-level code (whereas the docx4j code is actively maintained and used by a variety of companies).
For non-interactive processing using Java, see samples/ContentControlBindingExtensions.java.
To invoke it from a Linux command line, it would need slight modification; you would also, of course, need to pass a suitable classpath.
Alternatively, you could install this simple web app in, say, Tomcat.
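Before running docx4j over a template, it can help to check which content controls actually carry OpenDoPE tags. The sketch below uses only the JDK (not docx4j) to unzip a .docx and list the `w:tag` values of its content controls. It assumes that OpenDoPE marks repeats with tag values beginning `od:repeat`, and the class name `ContentControlTags` is hypothetical; it does not do the preprocessing itself, for which the docx4j sample mentioned above is the right starting point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists content control tags in a .docx. */
public class ContentControlTags {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<String> listTags(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            // A .docx is a zip archive; the main body is word/document.xml.
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = dbf.newDocumentBuilder().parse(in);
            }
            // Each content control (w:sdt) may have a w:tag with a w:val attribute.
            NodeList tags = doc.getElementsByTagNameNS(W_NS, "tag");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < tags.getLength(); i++) {
                Element tag = (Element) tags.item(i);
                result.add(tag.getAttributeNS(W_NS, "val"));
            }
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        for (String tag : listTags(args[0])) {
            // Assumption: OpenDoPE repeat tags start with "od:repeat".
            String marker = tag.startsWith("od:repeat") ? "REPEAT  " : "        ";
            System.out.println(marker + tag);
        }
    }
}
```

From a Linux terminal this runs with `javac ContentControlTags.java && java ContentControlTags document.docx`; invoking a docx4j-based class works the same way, except that the docx4j jars must be on the classpath given to both commands.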

Is it possible to read text from a pdf using java FileReader or an alternative for java?

I want to make a simple word counter for my LaTeX documents so that I can double-check that my word count is accurate. More generally, it would be useful to know whether Java can interpret text from PDF files at all. A Google search brought nothing up, so I am thinking maybe not? If not, why?
You can't read text from a PDF with FileReader alone, because a PDF is not a plain text file; you need a library that understands the PDF format. Here are a couple of Java PDF libraries:
Apache PDFBox
iText
See also this page for an example of Java text extraction with PDFBox:
http://pdfbox.apache.org/userguide/text_extraction.html
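Once PDFBox has extracted the text (its `PDFTextStripper.getText(...)` returns the document text as a `String`), counting words needs only the JDK. A minimal sketch, assuming the extracted text has been saved to a UTF-8 file; the class name `WordCounter` and the definition of a "word" are assumptions of this sketch, not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts words in text already extracted from a PDF. */
public class WordCounter {

    // A word starts with a letter or digit and may contain apostrophes and
    // hyphens, so "don't" and "well-known" each count once and a lone "-" not at all.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}][\\p{L}\\p{N}'-]*");

    public static int countWords(String text) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java WordCounter extracted.txt
        String text = Files.readString(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(countWords(text));
    }
}
```

Expect some drift from a count taken on the LaTeX source: page numbers, running headers and words hyphenated across line breaks (counted as two) all end up in the extracted text.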

PDF, OpenOffice or MS Word

I am new to Java. I have to read a PDF, OpenOffice or MS Word file, make changes to it, and render it as a PDF document on my web page. Could someone tell me which of these formats has the easiest API or SDK to use, and which SDK is best for this, so that I can read, update and render easily? The file also contains tables but no images.
We use Apache POI to read Microsoft Office files. There are many libraries for PDF in Java; iText is one I have used. Once you pick the tools, do a targeted search on Stack Overflow. There are plenty of discussions about these tools.
Depending on the types of updates you are doing, modifying a PDF is going to be a problem: the format is not intended for editing. You might have to convert the PDF to some editable format first, then edit it. Depending on the types of changes you want to make and the documents you are working from, even editing DOC and Writer files is going to be tricky, since they are all different formats.
As Jayan mentioned, iText and POI may help you a little. OpenOffice Writer documents can be edited by unzipping them and then modifying the XML, or by using the UNO API. Word documents can be edited by using MS Office automation (bad idea), by converting them to OpenOffice first and then editing, or, if they are DOCX, by unzipping them and processing the XML.
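The DOCX route (unzip, edit the XML, zip again) can be sketched with the JDK alone. This is a minimal sketch under several assumptions: the class name `DocxTextReplacer` is hypothetical, only `word/document.xml` is edited (headers, footers and footnotes live in other parts of the archive), and a placeholder is only found if Word stored it inside a single `w:t` element, which it often does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: replaces text in a .docx by editing word/document.xml. */
public class DocxTextReplacer {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void replace(Path in, Path out, String target, String replacement)
            throws Exception {
        try (ZipFile zip = new ZipFile(in.toFile());
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(out))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                zos.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream is = zip.getInputStream(entry)) {
                    if (entry.getName().equals("word/document.xml")) {
                        zos.write(editDocumentXml(is, target, replacement));
                    } else {
                        is.transferTo(zos); // copy every other part unchanged
                    }
                }
                zos.closeEntry();
            }
        }
    }

    static byte[] editDocumentXml(InputStream xml, String target, String replacement)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = dbf.newDocumentBuilder().parse(xml);
        // Visible text lives in w:t elements (including those inside table cells).
        // Word may split one phrase across several runs; such text is NOT matched here.
        NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            Node t = texts.item(i);
            t.setTextContent(t.getTextContent().replace(target, replacement));
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(bos));
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxTextReplacer in.docx out.docx OLD NEW
        replace(Paths.get(args[0]), Paths.get(args[1]), args[2], args[3]);
    }
}
```

For anything beyond simple placeholder substitution, Apache POI (XWPF) or docx4j handle run splitting and the other document parts for you; the sketch only shows what "unzip and process the XML" means in practice.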
Good luck.

How can I pretty-print Java source code as a PDF?

I'm planning to put some Java code in an appendix to my report. The report is a PDF document, and I use Eclipse for Java.
How can I present it best and do this easily? Any recommendations?
For this purpose, I created a LaTeX doclet. This is a Javadoc doclet, which converts the javadoc comments to LaTeX code, and (if wanted) also includes a pretty-printed version of the source code of the documented methods.
You can then convert the generated LaTeX document to PDF, and append it to your report.
If you use Windows, install CutePDF. This adds a "printer" that, when you print to it, asks you for a file name and then writes the output to a PDF document on your hard drive. Hence it is a pseudo-printer: it acts like a printer, but is really a PDF file writer.
I don't know of solutions for other operating systems...
I usually prefer to install a PDF "pseudo" printer in whatever OS I am using. That way I can use the print facilities of whatever app I am using (Eclipse, for example) and get the result as a PDF file.
EDIT:
Here is one example of a pseudo-printer, for the Windows platform. Mac OS X has a built-in "print to PDF file" capability.
You can use doxygen to generate documentation for your project, which can include a formatted source file listing in addition to the Javadoc. doxygen can generate both HTML and PDF output. You'll need LaTeX to generate the PDF output.
Another way to pretty-print is with IntelliJ IDEA. It also works with the Community Edition.
It's advisable to install a PDF printer so that you can try printouts without wasting a lot of paper. Once you're satisfied with the result, you can print on the real printer. On Windows you can use CutePDF; on Ubuntu Linux, install the cups-pdf package with sudo apt-get install cups-pdf.
Note that IntelliJ prints the theme's background, so it's advisable to use a theme with a white background to avoid wasting ink.
To print, click File -> Print. The printer selection is in the next window, after you press the Print button.
Interestingly, you can also print only the selected text, which is useful if you don't want to print import statements.
Other options include line numbers, syntax highlighting and colour printing. In IntelliJ 14.0.3 on Linux, the default font size was a huge 14, so you might want to change that too.
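If you print from a tool that lacks these two options, a small preprocessing step can reproduce them. A minimal sketch, assuming you want a plain-text listing to paste into Word or send to a PDF printer; the class name `ListingFormatter` is hypothetical, and note that the numbers it adds restart at 1 after the package and import lines are removed, so they no longer match the original file.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: strips package/import lines and adds line numbers. */
public class ListingFormatter {

    public static List<String> format(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            // Skips "import x.y;", "import static ...;" and "package ...;" lines.
            if (t.startsWith("import ") || t.startsWith("package ")) {
                continue;
            }
            kept.add(line);
        }
        // Drop blank lines left at the top after removing the header.
        while (!kept.isEmpty() && kept.get(0).isBlank()) {
            kept.remove(0);
        }
        // Right-align numbers to the width of the largest one.
        int width = String.valueOf(kept.size()).length();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < kept.size(); i++) {
            // Tabs become four spaces so the layout survives pasting.
            String body = kept.get(i).replace("\t", "    ");
            out.add(String.format("%" + width + "d  %s", i + 1, body));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ListingFormatter Foo.java > Foo.txt
        for (String line : format(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```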
You could just copy & paste into Word (2007+) and save as PDF. It's a little more straightforward than the file printer, and you can format your code for best results in Word.
You could just copy & paste into OpenOffice/LibreOffice and export to PDF.

Extract text from PDF (google app engine)

Is there any free Java library for extracting text from PDF that is compatible with Google App Engine?
I've read about PDFJet, but it can't read PDF, can it?
Is there perhaps another way to extract text from a PDF? I tried http://www.pdfdownload.org/, but unfortunately it doesn't handle non-English characters correctly.
iText now has a text parsing module (I'm one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
PDFBox does not run on GAE, because it uses Java classes that are not allowed there.
(GAE only permits the classes on its JRE whitelist: http://code.google.com/appengine/docs/java/jrewhitelist.html)
I have partially modified a very old version of PDFBox (0.7.3) to be GAE-compliant. Now I'm able to extract text from a PDF (a whole page or a rectangular area). I only modified the minimum needed for PDF text extraction, not the whole of PDFBox. :)
The idea was to remove references to java.awt.Rectangle and similar classes by using my own "rectangle" class.
More info: http://fhtino.blogspot.com/2010/04/pdfbox-text-extration-gae.html
I modified the latest version (1.8.0-SNAPSHOT) to run on Google App Engine. I had to disable one unit test, but it runs fine for simple text extraction.
Following a simple try-fail-fix approach, I had to modify five files in total. Pretty doable.
You'll also have to explicitly use a RandomAccessBuffer, as Fabrizio explained.
For the extra lazy, here's the compiled jar, the dependencies for text extraction, and the patch. Note that it might not work for every use case (e.g. rectangle-based extraction). I used it to extract the text of whole pages.
https://docs.google.com/folder/d/0B53n_gP2oU6iVjhOOVBNZHk0a0E/edit
I know there is http://pdfbox.apache.org/index.html:
"Apache PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents."
But I've never tested it.
Last month I finished extracting text from PDF files in my project. I used the Xpdf tool to get the text and text coordinates, but I used it in Xcode (Objective-C). The tool is open source, written in C++, and can be used from many languages. However, I don't know whether Xpdf would work with Java. Anyway, you can try it.
