Parsing PDF files hosted in web servers - java

I have used iText to parse PDF files. It works well on local files, but I want to parse PDF files that are hosted on web servers, like this one:
"http://protege.stanford.edu/publications/ontology_development/ontology101.pdf"
I don't know how to do this. How can I do it using iText or another library?

You need to download the bytes of the PDF file. You can do this with:
URL url = new URL("http://.....");
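HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // getResponseCode() lives on HttpURLConnection
if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) { ..error.. }
String type = conn.getContentType(); // may be null, or carry parameters after the media type
if (type == null || !type.startsWith("application/pdf")) { ..error.. }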
InputStream byteStream = conn.getInputStream();
try {
... // give bytes from byteStream to iText
} finally { byteStream.close(); }
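The Content-Type comparison above deserves a little care: servers often send parameters after the media type (for example "application/pdf; charset=binary"), and getContentType() can return null. Here is a minimal sketch of a more tolerant check using only the JDK. The class and method names are made up for illustration.

```java
import java.util.Locale;

final class ContentTypeCheck {

    /** True when the media type is application/pdf, ignoring case and any parameters. */
    static boolean isPdf(String contentType) {
        if (contentType == null) {
            return false; // header missing or type unknown
        }
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return mediaType.trim().toLowerCase(Locale.ROOT).equals("application/pdf");
    }

    public static void main(String[] args) {
        System.out.println(isPdf("application/pdf"));
        System.out.println(isPdf("Application/PDF; charset=binary"));
        System.out.println(isPdf("text/html"));
    }
}
```

With a real connection you would call isPdf(conn.getContentType()). Keep in mind that some servers mislabel PDFs (for instance as application/octet-stream), so a failed check is a hint rather than proof.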

Use the URLConnection class:
URL reqURL = new URL("http://www.mysite.edu/mydoc.pdf" );
URLConnection urlCon = reqURL.openConnection();
Then read the content from the connection's input stream. The easiest way:
InputStream is = urlCon.getInputStream();
byte[] b = new byte[1024]; // buffer size; any reasonable size works
int len;
while((len = is.read(b)) != -1){
//Store the content in preferred way
}
is.close();
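One way to fill in the "store the content" step is to collect everything in a ByteArrayOutputStream and work with the resulting byte[]. The sketch below assumes the file fits comfortably in memory. StreamBytes and readAll are illustrative names, not part of iText or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {

    /** Reads the stream to the end and returns everything that was read. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len); // only the first len bytes of the buffer are valid
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for urlCon.getInputStream()
        InputStream is = new ByteArrayInputStream(new byte[5000]);
        System.out.println(readAll(is).length);
    }
}
```

With a real connection you would call readAll(urlCon.getInputStream()) and close the stream afterwards. As far as I know, iText's PdfReader also has a constructor that accepts a byte array, so the result can be handed over directly.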

Nothing to it. You can pass a URL directly into PdfReader, and let it handle the streaming for you:
URL url = new URL("http://protege.stanford.edu/publications/ontology_development/ontology101.pdf" );
PdfReader reader = new PdfReader( url );
The JavaDoc is your friend.

Related

WebResourceResponse can't read full inputstream from HttpConnection (Android)

I'm working on an Ionic (hybrid) application that plays videos. I want to add some headers to the request, so I override shouldInterceptRequest.
URL myUrl = new URL(real);
HttpURLConnection connection = (HttpURLConnection) myUrl.openConnection();
for (Map.Entry < String, String > entry: headers.entrySet()) {
connection.setRequestProperty(entry.getKey(), entry.getValue());
}
InputStream in = connection.getInputStream();
WebResourceResponse response = new WebResourceResponse("video/mp4", "UTF-8", in );
for (Map.Entry < String, List < String >> entry: connection.getHeaderFields().entrySet()) {
resHeaders.put(entry.getKey(), entry.getValue().get(0));
}
This code doesn't work: the video tag in the HTML can't play the video.
So I added some code.
byte[] bytes = new byte[30 * 1024 * 1024];
ByteArrayOutputStream byteBuffer = new ByteArrayOutputStream();
int len = 0;
while ((len = in.read(bytes)) != -1) {
byteBuffer.write(bytes, 0, len);
}
bytes = byteBuffer.toByteArray();
WebResourceResponse response = new WebResourceResponse("video/mp4", "UTF-8", new ByteArrayInputStream(bytes));
When I read the input stream into a byte array and pass the bytes to WebResourceResponse, the video plays. However, this means my application uses a lot of memory if the video is large.
So I want to know whether there is a way to play the video without loading the whole input stream into memory.
OK. Finally, I found a way.
Actually, my goal was to add custom headers to the resource request, and I found an easier way to do it.
For example, to load an image I would use an img tag like this, <img src="the_url_of_image">, and I can't add any headers to that request unless I intercept it.
However, we can use a blob instead: request the resource with something like Ajax, then use createObjectURL to create a URL that points to the resource.

Receive image file through rest Api

How can I receive an image file through a REST API? There is a MULTIPART_FORM_DATA option, which looks like it sends files in parts, as in more than one request.
I want to receive images on the server very quickly, around 2 images per second.
Simply wrap the image in a File and use the Response class to build the response.
Response.ok(new File("myimage.jpg"), "image/jpeg").build();
There are other variations of the same.
Read the image using the following.
URL url = new URL("http://localhost:8080/myimage/1");
URLConnection connection = url.openConnection();
InputStream input = connection.getInputStream();
byte[] buffer = new byte[1024];
int n;
OutputStream fos = new FileOutputStream("Output.jpg");
while ((n = input.read(buffer)) != -1)
{
fos.write(buffer, 0, n);
}
fos.close();
input.close();
You can use Apache HTTP client to make it prettier.
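On Java 7 and later, the manual copy loop can also be replaced by java.nio.file.Files.copy, which reads a stream to the end and writes it to a file. A minimal sketch follows. The class and method names are only for illustration, and an in-memory stream stands in for the connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SaveStream {

    /** Writes the whole stream to target, replacing an existing file; returns the bytes written. */
    static long save(InputStream input, Path target) throws IOException {
        try (InputStream in = input) { // closes the stream even if the copy fails
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for connection.getInputStream()
        Path target = Files.createTempFile("Output", ".jpg");
        long written = save(new ByteArrayInputStream(new byte[] {1, 2, 3}), target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

With the code above you would call save(connection.getInputStream(), Paths.get("Output.jpg")).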

How to retrieve the download link for a not supported doc hosted on Google Docs?

Google's official documentation says:
The download URL for files looks something like this:
https://doc-04-20-docs.googleusercontent.com/docs/secure/m7an0emtau/WJm12345/YzI2Y2ExYWVm?h=16655626&e=download&gd=true
It also shows that a document entry's XML looks like this:
<entry ...
...
<content type='application/zip' src='https://doc-0s-84-docs.googleusercontent.com/docs/securesc/4t...626&e=download&gd=true'/>
...
</entry>
But whatever I try with the gdata Java client library, I can't retrieve that URL. I tried all the .get*Link*() methods and .getContent(). Has anybody run into this issue and found a solution? I've also tried getting the MediaSource and working with its input stream.
The goal is to get the file's content (the file is binary, in a custom format) back to my Java application server (GAE), so it can send it to my client, which can parse and display it.
Cheers,
Ricola3D
Finally I found the solution myself; it seems I was just using the library wrong. Here is the code, in case somebody else needs it someday!
URL entryUrl = new URL("https://docs.google.com/feeds/default/private/full/"+mydocumentid);
DocumentEntry mydocument = client.getEntry(entryUrl, DocumentEntry.class);
if (mydocument!=null) { // if we can read the document
MediaContent content = (MediaContent) mydocument.getContent();
//URL exportUrl = new URL(content.getUri()); // download url
MediaSource source = client.getMedia(content);
InputStream inStream = null;
OutputStream outStream = null;
try {
inStream = source.getInputStream();
outStream = resp.getOutputStream();
int c;
while ((c = inStream.read()) != -1) {
outStream.write(c); // copies the stream one byte at a time, which is very slow!
}
} finally {
if (inStream != null) {
inStream.close();
}
if (outStream != null) {
outStream.flush();
outStream.close();
}
}
} // end of if (mydocument != null)
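As the comment in the loop says, copying one byte at a time is slow. A common alternative is to copy through a byte[] buffer. This is a sketch only: the 8 KB size is an arbitrary choice, the names are invented for the example, and in-memory streams stand in for the real ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BufferedCopy {

    /** Copies in to out in chunks and returns the number of bytes copied. Closes neither stream. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len); // write only the bytes that were actually read
            total += len;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for source.getInputStream() and resp.getOutputStream()
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(new byte[20000]), sink);
        System.out.println(copied);
    }
}
```

In the servlet code above, the body of the try block would become a single call, copy(inStream, outStream), with the finally block left as it is.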

Downloading a text file

I am trying to download a text file in Android. I know how to load an image file; how is downloading a text file different?
Moreover, how do I retrieve the contents of the downloaded file?
You are asking a few things. This should give you an idea of how to get a remote file using HttpURLConnection and associated classes:
URL u = new URL(url);
HttpURLConnection c = (HttpURLConnection) u.openConnection(); // one connection is enough
c.setRequestMethod("GET"); // setDoOutput(true) is only for sending a request body, so it is omitted
c.connect();
int fs = c.getContentLength(); // total size in bytes, or -1 if unknown
String PATH_op = Environment.getExternalStorageDirectory()
+ "/" + filename;
FileOutputStream f = new FileOutputStream(new File(PATH_op));
InputStream in = c.getInputStream();
byte[] buffer = new byte[1024];
int len1;
int completed = 0; // bytes written so far, e.g. for a progress display
while ((len1 = in.read(buffer)) != -1) {
f.write(buffer, 0, len1);
completed += len1;
}
f.close();
in.close();
There is no difference between downloading a text file, an image, or an XML file; the download itself is the same. What you do with the stream afterwards depends on the type of content.
If it's an image, decode the stream to convert it into an image.
If it's text, read the content character by character until read() returns -1, which marks the end of the stream.
If it's an XML file, pass the input stream object directly to the parser.
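For the text case, reading character by character works, but a BufferedReader over an InputStreamReader is usually more convenient. The sketch below assumes the file is UTF-8 encoded (the real encoding depends on the server). The class and method names are hypothetical, and an in-memory stream stands in for the connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class TextDownload {

    /** Reads the stream as UTF-8 text; every line in the result ends with '\n'. */
    static String readText(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) { // null marks the end of the stream
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for c.getInputStream()
        byte[] bytes = "first line\r\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.print(readText(new ByteArrayInputStream(bytes)));
    }
}
```

Specifying the charset explicitly matters: without it, InputStreamReader falls back to the platform default, which may not match what the server sent.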

How to play .wav file from URL in web browser embed - Java

I want to play a .wav sound file in the default embedded media player in IE. The sound file is at an HTTP location, and I can't get it to play in that player.
The code follows.
URL url = new URL("http://www.concidel.com/upload/myfile.wav");
URLConnection urlc = url.openConnection();
InputStream is = (InputStream)urlc.getInputStream();
fileBytes = new byte[is.available()];
while (is.read(fileBytes,0,fileBytes.length)!=-1){}
BufferedOutputStream out = new BufferedOutputStream(response.getOutputStream());
out.write(fileBytes);
Here is the HTML embed code.
<embed src="CallStatesTreeAction.do?ivrCallId=${requestScope.vo.callId}&agentId=${requestScope.vo.agentId}" type="application/x-mplayer2" autostart="0" playcount="1" style="width: 40%; height: 45" />
If I write to a FileOutputStream, it plays well.
If I change my code to read the file from my local hard disk instead of the URL, it also works fine.
I don't know why the file won't play over HTTP when it plays well from the local hard disk.
Please help.
Make sure you set the correct response type. IE is very picky in that regard.
[EDIT] Your copy loop is broken. Try this code:
URL url = new URL("http://www.concidel.com/upload/myfile.wav");
URLConnection urlc = url.openConnection();
InputStream is = (InputStream)urlc.getInputStream();
byte[] fileBytes = new byte[8192]; // fixed-size buffer; is.available() is not the size of the file
int len;
while ( (len = is.read(fileBytes,0,fileBytes.length)) !=-1){
response.getOutputStream().write(fileBytes, 0, len);
}
The problem with your code is: If the data isn't fetched in a single call to is.read(), it's not appended to fileBytes but instead the first bytes are overwritten.
Also, the output stream which you get from the response is already buffered.
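To see the overwrite described above without a network, the sketch below wraps a byte array in a stream that returns at most three bytes per read() call, roughly how a slow HTTP connection behaves. All names are invented for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartialReadDemo {

    /** Wraps data in a stream that hands out at most maxChunk bytes per read call. */
    static InputStream chunked(byte[] data, final int maxChunk) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    /** The broken pattern: every read starts at offset 0, so earlier bytes are overwritten. */
    static byte[] brokenRead(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size]; // size must be > 0, otherwise the loop never ends
        while (in.read(buf, 0, buf.length) != -1) {
            // nothing is saved between iterations
        }
        return buf;
    }

    /** The fixed pattern: store exactly the bytes returned by each read. */
    static byte[] correctRead(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4];
        int len;
        while ((len = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, len);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(data, brokenRead(chunked(data, 3), data.length)));
        System.out.println(Arrays.equals(data, correctRead(chunked(data, 3))));
    }
}
```

With the broken loop the buffer ends up starting with "ghf" instead of "abc", because the second and third reads land on top of the first, so the first comparison is false while the second is true.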
