How to check if URL contains XML file or not? - java

I have a project that fetches XML files from URLs, scrapes them, pulls out the data, and then processes it. The URL is built from user input, so I need to check whether the URL actually points to an XML file before scraping it. Any ideas how to do that? In short: how can I check whether a URL contains an XML file or not?

Ways to know whether GETting a URL will retrieve XML:
Before retrieving the file
  1. Have an out-of-band guarantee.
  2. Inspect the Content-Type HTTP header of the response to a HEAD request [1].
After retrieving the file
  1. Inspect the Content-Type HTTP header of the response [1].
  2. Sniff the root element.
  3. Call Files.probeContentType(path) on the saved file (an implementation-dependent guess).
  4. Parse via a conforming XML parser without getting any well-formedness errors.
Note: Only parsing via a conforming XML parser is guaranteed to provide 100% determination.
[1] MIME assignments for XML data:
  application/xml (RFC 7303, previously RFC 3023)
  text/xml (RFC 7303, previously RFC 3023)
  Other MIME assignments used with XML applications.
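As a rough illustration of the header check and the parser check, here is a minimal sketch using only the JDK (Java 11+). The class and method names (XmlUrlCheck, isXmlMediaType, headSaysXml, isWellFormedXml) are made up for this example, and treating every "+xml" suffix as XML is an assumption about what "other MIME assignments" should cover.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlUrlCheck {

    // True for application/xml, text/xml, and (by assumption) any "+xml" type
    // such as application/atom+xml.
    static boolean isXmlMediaType(String contentType) {
        if (contentType == null) return false;
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return type.equals("application/xml") || type.equals("text/xml") || type.endsWith("+xml");
    }

    // Before retrieving: ask for the headers only. This is a hint, not proof,
    // and some servers answer HEAD differently from GET.
    static boolean headSaysXml(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return response.headers().firstValue("Content-Type")
                .map(XmlUrlCheck::isXmlMediaType)
                .orElse(false);
    }

    // After retrieving: a parse without well-formedness errors is the definitive check.
    static boolean isWellFormedXml(InputStream body) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Safety choice for untrusted URLs: refuse DOCTYPE declarations.
            // Side effect: well-formed documents that carry a DOCTYPE are rejected too.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // DefaultHandler rethrows fatal errors, so malformed input ends up in the catch.
            factory.newSAXParser().parse(body, new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }
}
```

A caller might try headSaysXml(url) first to skip obviously wrong URLs cheaply, then run isWellFormedXml on the downloaded stream before scraping.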

Related

REST: Tell client to send only csv and text format

In a RESTful web service, how do I tell the client to send only CSV and text format files?
In the Content-Type header, the client states the format in which it is sending the request, and in the Accept header, the client states the format in which it wants to receive the response.
But how to tell client to send only content-type csv or file ? Is this through some documentation ?
The 415 status code seems to be suitable for this situation:
6.5.13. 415 Unsupported Media Type
The 415 (Unsupported Media Type) status code indicates that the
origin server is refusing to service the request because the payload
is in a format not supported by this method on the target resource.
The format problem might be due to the request's indicated
Content-Type or Content-Encoding, or as a result of inspecting the
data directly.
The response payload could contain a list of the media types supported by the server.
Imagine you have an endpoint called /textfiles. The developer using your API usually reads your documentation to learn how to call this endpoint, unless you're doing some auto-discovery magic (which I guess is still not common).
If we take Facebook for example, they just state in their documentation which files you can send:
We accept the following files: 3g2, 3gp, 3gpp, [...]
Your question:
But how to tell client to send only content-type csv or file ?
is also a bit unclear. When the user has sent the request, they have already attached the files they thought they could send. So here you would rather send an error with a message stating which files are allowed. Or are we talking about some kind of "pre"-request here?
From a backend developer's point of view, I can just tell you: it's in the documentation. Handle errors properly, document them, and your implementing developer will not hate you :)
If I were developing a RESTful application using Spring, I would set the consumes attribute of the mapping to CSV or plain text (https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/bind/annotation/RequestMapping.html). If the client sends a request body of any other type, it will receive an error: 415 Unsupported Media Type. (The produces attribute restricts the response format instead; a mismatch with the client's Accept header results in 406 Not Acceptable.)
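The 415 behaviour from the answers above can also be sketched without any framework. This is only an illustration: the class UploadMediaTypes, its method names, and the message wording are invented for the example, and the handler targets the JDK's built-in com.sun.net.httpserver rather than Spring.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

class UploadMediaTypes {

    // Assumption for this sketch: the endpoint accepts only CSV and plain text.
    static final List<String> SUPPORTED = List.of("text/csv", "text/plain");

    // Returns 415 when the request's Content-Type is missing or unsupported, 200 otherwise.
    static int statusFor(String contentType) {
        if (contentType == null) return 415;
        // Ignore parameters such as "; charset=UTF-8".
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(type) ? 200 : 415;
    }

    // Payload sent along with a 415 so the client learns what it may send.
    static String unsupportedMessage() {
        return "Unsupported media type. Supported: " + String.join(", ", SUPPORTED);
    }

    // Handler usable with the JDK's built-in HttpServer.
    static void handle(HttpExchange exchange) throws IOException {
        int status = statusFor(exchange.getRequestHeaders().getFirst("Content-Type"));
        String message = status == 415 ? unsupportedMessage() : "accepted";
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
        exchange.sendResponseHeaders(status, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```

To try it, register the handler with something like HttpServer.create(new InetSocketAddress(8080), 0).createContext("/textfiles", UploadMediaTypes::handle) and call start() on the server; the port is arbitrary. The documentation still has to tell clients up front what to send; the 415 only covers the case where they did not read it.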

Finding the downloaded filetype from HTTP message

I am trying to build a basic downloaded-file scanning extension for the popular open source security application ZAP. Using the built-in sniffer, I can access the HTTP response messages, but I am unable to determine the type of the file being downloaded. Although the Mozilla blog on HTTP talks about using the MIME type in the 'Content-Type' header to determine the file type, none of the response messages I get have anything other than application/json, text/html, or application/octet-stream. How do I determine whether the corresponding HTTP response body contains a particular file type? I am stuck at a dead end!
I am a beginner in this field, and there might be something that I am overlooking. Any help or pointers would be greatly appreciated.
The Content-Type entity-header field indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
Taken from https://www.rfc-editor.org/rfc/rfc2616 under "14.17 Content-Type"
They give this as an example:
Content-Type: text/html; charset=ISO-8859-4
This HTTP request or response contains text in the form of a body of HTML.
If you do not trust this header (which most of the time you can), the next step would be analyzing the file contents. For example, if the file contains opening and closing HTML tags, then there is a good chance that the file is an HTML file. If the file begins with a [ or { and ends with a ] or } then there is a good chance that it is a JSON file. An actual analysis would and should be much more detailed, of course.
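To make the content-analysis idea concrete, here is a deliberately naive sketch. The class BodySniffer and its return labels are invented for this example; the PDF and ZIP checks are an addition that looks at the well-known leading bytes of those formats, which helps when the header only says application/octet-stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class BodySniffer {

    static boolean startsWith(byte[] body, byte[] prefix) {
        if (body.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (body[i] != prefix[i]) return false;
        }
        return true;
    }

    // Returns a rough label: "pdf", "zip", "json", "html", "xml" or "unknown".
    static String guessType(byte[] body) {
        // Binary formats announce themselves with fixed leading bytes.
        if (startsWith(body, "%PDF-".getBytes(StandardCharsets.US_ASCII))) return "pdf";
        // ZIP signature; also matches ZIP-based formats such as .jar and .docx.
        if (startsWith(body, new byte[] {'P', 'K', 3, 4})) return "zip";

        // Assumption: textual bodies are UTF-8. Skip a byte order mark if present.
        String text = new String(body, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) text = text.substring(1);
        text = text.strip();
        if (text.isEmpty()) return "unknown";

        char first = text.charAt(0);
        char last = text.charAt(text.length() - 1);
        // Same heuristic as in the answer: matching outer brackets suggest JSON.
        if ((first == '{' && last == '}') || (first == '[' && last == ']')) return "json";

        String lower = text.toLowerCase(Locale.ROOT);
        if (lower.startsWith("<!doctype html") || lower.startsWith("<html")) return "html";
        if (lower.startsWith("<?xml")) return "xml";
        return "unknown";
    }
}
```

These are guesses only. A body that looks like JSON may still fail to parse, so a real scanner would follow up with a proper parser or a dedicated detection library.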

Returning file/files in JSON response (Java-Jersey-ReST)

I am working on a use case where I am displaying user's messages on a JSP. Details of the flow are:
All the messages will be shown in a table with an icon for attachments.
When the user clicks on an attachment, the file should get downloaded.
If there is more than one attachment, the user can select the required one to download.
The attachments will be stored on the local filesystem and the path for the attachments will be determined by the system.
I have tried to implement by referring to these SO questions:
Input and Output binary streams using JERSEY?
Return a file using Java Jersey
file downloading in restful web services
However, it's not solving my purpose. I have the following questions:
1. Is it possible to send message data (like subject, message, message id, etc) along with the attachments (InputStream) in one response?
2. If yes, what needs to be the MediaType for the @Produces annotation in my resource method? Currently my resource is annotated with @Produces(MediaType.APPLICATION_JSON). Will this work?
3. How to send the file data in the response?
Any pointers appreciated. TIA.
1. Yes. You can add custom data to the response headers, so message data such as the subject or message id can travel with the file in one response.
2. @Produces(MediaType.APPLICATION_JSON) will not work unless the clients accept JSON as a file, which they should not and will not do ;) The correct MediaType depends on what kind of file you want to send. You can use the default MIME type, MediaType.APPLICATION_OCTET_STREAM (application/octet-stream; see "Is there a “default” MIME type?"), but I think it's better to use the correct and exact MIME type for your file.
3. You will find working examples for sending file data with Jersey in "Input and Output binary streams using JERSEY?", so there is no need to answer this again :)
Hope this was helpful somehow, have a nice day.

Handling different XML response documents with one SAX Handler

I am developing a Java application that makes an HTTP Request to a web service, and XML is returned. If the response code is 200, then a requestSucceeded() callback method will send the XML to a SAXParser with a different SAX Handler, depending on what web service is being called. If the response code is not 200, then a requestFailed() callback method is being called.
The web service that I am calling will return two types of XML documents (with a response code of 200): an XML document containing the successful response information, or an XML error document containing error information (for example, if one of the request parameters wasn't formatted correctly).
My question is this: Given my current setup, what is the best way to look for / handle both kinds of XML documents (a successful XML response or an XML error document)? The SAX Handler is looking for all of the relevant response information and it is storing that information into an object, which is then processed by my application. Is there a better solution than just always first looking for the unique XML Error tags?
Thanks!
Option #1 - Change Response Code
Why are you returning an error with response code 200? 400 (Bad Request) or another error code might be a better option. Then you could process the XML based on the response code.
Option #2 - Swap Content Handlers
Below is a link to one of my previous answers where I explain how to swap content handlers while processing the document. You could have one content handler that determines if the response is content or error, and then swaps in the appropriate content handler to process the rest.
Using SAX to parse common XML elements
Option #3 - Use JAXB
If the end result is that the XML will be converted to an object, have you considered using JAXB? It will build the appropriate object based on which XML document is returned.
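For Option #2, a minimal sketch of swapping content handlers might look like the following. It is not the code from the linked answer. The root element name "error" is a hypothetical convention, and ResponseDispatcher and TextCollector are names invented for this example; your real handlers would take TextCollector's place.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Looks only at the root element, then hands the rest of the parse to the right handler.
class ResponseDispatcher extends DefaultHandler {
    private final XMLReader reader;
    private final DefaultHandler successHandler;
    private final DefaultHandler errorHandler;
    boolean sawError;

    ResponseDispatcher(XMLReader reader, DefaultHandler successHandler, DefaultHandler errorHandler) {
        this.reader = reader;
        this.successHandler = successHandler;
        this.errorHandler = errorHandler;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Hypothetical convention: error documents use <error> as their root element.
        sawError = "error".equals(qName);
        DefaultHandler next = sawError ? errorHandler : successHandler;
        // SAX allows registering a different handler in the middle of a parse.
        reader.setContentHandler(next);
        // Forward the root element so the new handler sees it; note that it
        // never receives startDocument.
        next.startElement(uri, localName, qName, atts);
    }

    // Parses the response and reports whether it was an error document.
    static boolean parse(String xml, DefaultHandler success, DefaultHandler error) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ResponseDispatcher dispatcher = new ResponseDispatcher(reader, success, error);
            reader.setContentHandler(dispatcher);
            reader.parse(new InputSource(new StringReader(xml)));
            return dispatcher.sawError;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse response", e);
        }
    }
}

// Stand-in for a real handler: collects all character data it sees.
class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}
```

With this shape, requestSucceeded() can call ResponseDispatcher.parse(xml, successHandler, errorHandler) once and branch on the returned flag, instead of every handler checking for the error tags itself.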

Posting contents of a file using HttpClient?

I want to send the contents of a file as part of an HTTP request using Apache HttpClient, but I could not figure out how to pass the file contents in the request body.
You didn't specify the format....
Most likely, you want to send a POST request, the contents will be multipart/form-data MIME type. This emulates what a browser sends from an <INPUT type="file" ...> form element. This requires some pretty sophisticated parsing on the server side to extract the multiple parts from the body and correctly extract the file data from the other fields (if any). Fortunately, commons-fileupload does this perfectly. The first answer regarding FilePart is exactly right.
Alternatively, you could simply post the raw contents of a file as the body of the request by using an InputStreamRequestEntity. This may be much simpler if you're writing your own server side to receive the data. The server side is as simple as streaming the request's InputStream to disk. I use this technique for uploads with Google Gears.
Check out FilePart and related.
Here's the sample.
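For comparison, the raw-body approach described above (posting the file contents directly as the request body) can be sketched with the HTTP client built into the JDK since Java 11. Note that this swaps in java.net.http.HttpClient and is not the Apache HttpClient API from the answers; the class name RawFileUpload, the target URI, and the application/octet-stream content type are assumptions for the example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

class RawFileUpload {

    // Builds a POST whose body is exactly the bytes of the file, with no multipart wrapping.
    static HttpRequest buildUpload(URI target, Path file) {
        try {
            return HttpRequest.newBuilder(target)
                    // Assumption: the server expects opaque binary data.
                    .header("Content-Type", "application/octet-stream")
                    .POST(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the request and returns the HTTP status code of the response.
    static int upload(URI target, Path file) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildUpload(target, file), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

On the server side, this matches the "stream the request's InputStream to disk" case. If the receiver expects a browser-style form upload, multipart/form-data is still the way to go.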
