suggestions for PDF topic location - java

Short version: Please recommend the best place to post a question about converting a PDF to text using Java.
Details:
I've been trying to convert a PDF to a text file using Java while keeping the format as close to the PDF's as possible. So far I've been using a separate, free, third-party program to do the converting, and the Java program I made then does everything else I want. I've asked around in a lot of places and most recommend PDFBox, which didn't work. All PDFBox did was the same thing Adobe Reader X would do, which is create a huge mess of text. I've tried a lot of things and spent a lot of time on this. What I'm going to do now is share one of the PDFs I've been trying to convert, and hopefully someone can help me with some Java code that will convert it. I only really have permission to share this old file once (even though I'm 99% sure it would be fine to share it in a few other places), so I would like to post the question in the most effective spot.

Well, short version (and to answer my actual question):
Answer is: http://stackoverflow.com
Special thanks to #TilmanHausherr, who went above and beyond by following my updates and helping me further.
Longer version:
I'm still having some formatting issues, but I think I'll be able to work out the rest on my own. As for what I asked, getting the document formatted correctly using Java, that's done. The document is more or less formatted correctly; it just doesn't look as pretty as the output of some other tools I've used.
After that, I got stuck because the formatting was not correct after the conversion. I had asked around before, and most people said it would be too hard to explain. In the end I had to re-learn how to attach PDFBox and troubleshoot common problems others have had, and a single line of code from TilmanHausherr, shown in the comments, helped.
When I started this project I had to learn how to get PDFBox to work with my IDE, how to arrange the libs, and so on. I then found some old code via Google that used PDFBox to extract text from a PDF. I can't share the code I used for the conversion, but it takes about 4 to 5 minutes to search for the original poster's work. There were still some modifications I had to make to their code just to get it to work, but I just followed the prompts from my IDE, Eclipse.
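I used this code, instead of a Formatter, to write to the text file:
// PrintWriter is in java.io; its constructor can throw FileNotFoundException
String textFromMain = textForAll;
// try-with-resources closes (and flushes) the writer automatically
try (PrintWriter out = new PrintWriter("text.txt")) {
    out.println(textFromMain);
}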

Related

Retrieve just the intro text in Wikipedia

How can I retrieve just the intro text (those few lines at the beginning of each article) using Java?
I've seen a question like this here, but the code there was in PHP, and I need it in Java so I can use it in my Android app.
I've searched GitHub for a simple library that could help me get what I want, with no success.
I've also seen that this link: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Stack%20Overflow returns the intro for whatever title I search for. I just can't retrieve the data from Java because, as I said, the code posted in that question was in PHP.
Check this: https://www.mediawiki.org/api/rest_v1/#!/Page_content/get_page_summary_title
That is the API for accessing page content; once you have the response, you can parse the data as usual.
Hope this helps.
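For the Java side, here is a minimal sketch of calling the extracts URL from the question. The class name WikiIntro, its method names, and the User-Agent string are made up for illustration. The response is returned as raw JSON text, and parsing is left to whatever JSON library you already use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper; the class and method names are not from any library.
class WikiIntro {

    // Builds the same query URL as in the question, for an arbitrary article title.
    static String buildUrl(String title) {
        try {
            return "https://en.wikipedia.org/w/api.php"
                    + "?format=json&action=query&prop=extracts&exintro=&explaintext="
                    + "&titles=" + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Downloads the response body as a string (raw JSON, not parsed here).
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Placeholder User-Agent value, chosen for this example only.
        conn.setRequestProperty("User-Agent", "WikiIntroExample/0.1");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling WikiIntro.fetch(WikiIntro.buildUrl("Stack Overflow")) should give you the JSON that contains the intro text. On Android the network call has to run off the main thread.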

Passing a file into a program using Eclipse

OK, so this may be a dumb question, but how do I pass a text file into my Java program and have a method that reads over it? I know I need to use a Scanner, but so far I haven't even been able to get the program to recognize the text file. Any ideas?
Here's a link to one way to do it; try googling it as well if this doesn't help:
Java: How to read a text file
This is a pretty common thing in Java and there should be many examples online.
You could also use a reader of some sort, such as a BufferedReader: https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
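Since the question mentions a Scanner, here is a minimal sketch of reading a file line by line with one. The class name ReadTextFile and the method readLines are made up for this example, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical example class; the names are made up for illustration.
class ReadTextFile {

    // Reads every line of the file at the given path into a list.
    static List<String> readLines(String path) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // Assumes the file is UTF-8; the Scanner is closed automatically.
        try (Scanner scanner = new Scanner(new File(path), "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }
}
```

To pass the file in from Eclipse, put its path under Run Configurations > Arguments > Program arguments and call readLines(args[0]) from main. If the program does not find the file, the usual cause is a relative path: by default it should be resolved against the run configuration's working directory, which is normally the project folder rather than src.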

Java BlackBerry - How to call a php script properly with GET method?

Hey guys, what's up? I'm making a very simple game for the BlackBerry Curve 8520, and I need to get the ranking from the server. To get this data, the web programmers gave me PHP files that read the data from the database and return it as a dynamic XML file.
My question is:
How can I load that PHP file using Java code? How can I pass a GET parameter to that request?
How can I parse the XML retrieved from the PHP script?
Thanks in advance!
Francisco
You really have two problems here, and I think you should attempt to address them separately.
Problem 1 is getting the data from the Server
Problem 2 is parsing the data you get from the Server.
Both these problems have been covered extensively on this and other forums previously, so I suggest that you search here and elsewhere. Here are a few links from SO:
blackberry HttpConnection.GET
Parse XML file on BlackBerry
In addition, I recommend you review the documentation provided on the official BB site:
http://developer.blackberry.com/bbos/java/
including the following:
http://developer.blackberry.com/bbos/java/documentation/intro_networking_1984362_11.html
As you will see, the BB offers a number of ways of doing communication; in your case I would recommend the ConnectionFactory API:
http://developer.blackberry.com/bbos/java/documentation/network_api_1984363_11.html
And here is something on parsing XML:
http://supportforums.blackberry.com/t5/Java-Development/Use-the-XML-Parser/ta-p/445210
This should be enough to get you going. Please come back with specific questions if you have issues with any of this.

Merging PDFs with PDFBox creates an unnecessarily large file

Massive number of hits on this topic, but only crappy threads :(
I merge a bunch of PDF files with PDFBox. That's easy, with a class made for the purpose.
But the result is a very large file. I have no exact figure right now, but it's easily twice the size of a merge done by an ordinary desktop app.
Not acceptable, I'm afraid.
The problem seems to be similar to this one (a split in that case, same same but different):
https://issues.apache.org/jira/browse/PDFBOX-785
After some googling, I think the problem is that the merge produces a bare-bones merged PDF file, and a large one at that, without compression.
According to this blog, some Java PDF libs can handle compression:
http://pdf-house.blogspot.com/
iText handles this with PdfStamper's setFullCompression(), according to:
http://www.java2s.com/Tutorial/Java/0419_PDF/CompressPdfdocument.htm
But I also bumped into the Ghostscript project:
https://www.linux.com/news/software/applications/8229-putting-together-pdf-files
So, I need a second opinion. Ghostscript seems cool, but iText does the trick according to Google.
Am I on the right track? What should I choose? One of the above, or something entirely different?
Thanks!
Try combining PDFBox for merging with iText for compression.
See this Groovy example: http://pastebin.com/w8Rz8uha
I tested it with http://www.tobcon.ie/assets/files/test.pdf: uncompressed.pdf is 302 KB and compressed.pdf is 58 KB (100 duplicated pages).

Unable to parse PDF with JPedal

I'm facing a problem while parsing a PDF with JPedal.
When reading the word list from JPedal, I get garbled characters in it. The same happens when using OCR, and when I copy the text from the PDF and paste it into Word or a simple text editor. As far as I understand, this PDF was generated by the Quartz PDF context on Mac OS X 10.6.4, which is used to reduce the file size, yet the file is easily viewable in PDF viewers. I searched for a Java API that supports decoding this kind of PDF but was unsuccessful. I'm looking for any application or Java API that I can use to decode it; it must be usable on a Linux machine.
Hi everybody,
I'm posting a possible solution to the problem. Here is a link describing how Quartz parses the PDF, which of course needs to be implemented in code, because so far I haven't found any ready-made API for it. I believe that Stack Overflow is all about taking the initiative and answering questions that haven't been answered or asked before.
Regards,
Rituraj
