I am trying to learn how to read an XML file from a URL over HTTP in Java and am pretty confused about where to start. I know how to parse an XML document, print the text associated with its elements to the screen, and do basic manipulation like that, but I am trying to take it a little further.
If anyone could give me somewhere to start or any tips, that would be much appreciated. I would be more than happy to provide more specifics if needed. Thanks!
It seems you already know how to deal with XML and are just asking how to get the XML over HTTP. This code should work:
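URLConnection connection = new URL(urlThatReturnsXml).openConnection();
InputStream is = connection.getInputStream();
// UTF-8 is an assumption; use the encoding the server actually declares
String responseAsString = org.apache.commons.io.IOUtils.toString(is, java.nio.charset.StandardCharsets.UTF_8);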
Make a java.net.URL object from the url string and call openStream() on it. You now have an InputStream to read from. That should get you going.
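A minimal sketch of that suggestion, using only the JDK: open the stream from the URL and hand it straight to a DOM parser. The class name XmlFromUrl and the method name fetch are made up for illustration, and the URL passed on the command line is a placeholder for whatever address returns your XML.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlFromUrl {
    // Opens the URL, streams the response into a DOM parser,
    // and closes the stream once parsing has finished.
    static Document fetch(URL url) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = url.openStream()) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is a placeholder for any URL that returns XML.
        Document doc = fetch(new URL(args[0]));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

From here you can work with the Document the same way you already do for XML read from a local file.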
Related
Is there any convenient way to dynamically render a page inside the application and then retrieve its contents as an InputStream or String?
For example, the simplest way is:
// generate url
Link link = linkSource.createPageRenderLink("SomePageLink");
String urlAsString = link.toAbsoluteURI() + "/customParam/" + customParamValue;
// get info stream from url
HttpGet httpGet = new HttpGet(urlAsString);
httpGet.addHeader("cookie", request.getHeader("cookie"));
HttpResponse response = new DefaultHttpClient().execute(httpGet);
InputStream is = response.getEntity().getContent();
...
But it seems there must be an easier way to achieve the same result. Any ideas?
I created tapestry-offline for exactly this purpose. Please be aware of the issue here (workaround included).
It's probably best to understand your exact use case first. If, for example, you are generating emails in a scheduled task, it's probably better to configure Jenkins or cron to hit a URL.
It's probably also worth mentioning the capture component from tapestry-stitch.
It is only useful when you want to capture part of a page as a String during page/component render.
www.rgrfm.be/rgrsite/maxradio/android.php
www.rgrfm.be/rgrsite/maxradio/onair.txt
The track information of the music being played is contained in onair.txt. android.php is a PHP script I wrote.
I need to display the track information in my Android application. I do not want to download it to disk but keep it in memory. I don't know whether the PHP script is useful, because it would create additional overhead, so it's probably better to simply parse onair.txt.
InputStream is = new URL("http://www.rgrfm.be/rgrsite/maxradio/onair.txt").openStream();
I am stuck with this. Has anyone got time to help me?
As described, the PHP script seems unnecessary, since you can read the text file directly. So first read it as text, then parse it.
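URL url = new URL("http://www.rgrfm.be/rgrsite/maxradio/onair.txt");
String text = readAsText(url);
parse(text);
String readAsText(URL url) throws IOException {
    // Read the response line by line into memory (UTF-8 assumed); nothing touches the disk.
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
        String line;
        while ((line = in.readLine()) != null) sb.append(line).append('\n');
    }
    return sb.toString();
}
void parse(String text) {
    // Extract the track information here; the logic depends on the format of onair.txt.
}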
Everything is okay when I read the data from the web page using InputStreamReader.
I have a problem parsing the data into an HTMLDocument.
The main reason is that the HTML contains some special characters that are used incorrectly.
There is an & sign twice ("&&"), and I believe that is causing the code to crash.
My code looks like this:
URL url = new URL(PageUrl);
URLConnection conn = url.openConnection();
// ... omitted ...
// parsing
HTMLDocument doc = (HTMLDocument)db.parse(conn.getInputStream());
Since I am making an Android application, I don't use the standard parsing functions, because the HTMLDocument object would be too large.
I found many existing examples of parsing HTML, such as using jsoup, but they are not what I want.
I want to write my own code for parsing so that the HTMLDocument object will be kept small.
Why don't you use one of the HTML parsers already available for Java?
They have community support, so they are the best option.
Open Source HTML Parsers in Java
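If a parser library really is not an option, here is a narrow workaround sketch for the "&&" problem specifically: escape every ampersand that does not start a character reference before handing the markup to the XML parser. The class and method names are made up for illustration. This only repairs bare ampersands; it assumes the rest of the page is well-formed XML, which real HTML often is not, and HTML-only named entities such as &nbsp; will still be rejected.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AmpersandFixer {
    // Matches '&' that does not begin a named, decimal, or hex character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Replaces each bare '&' with "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String markup) {
        return BARE_AMP.matcher(markup).replaceAll("&amp;");
    }

    // Cleans the markup first, then parses it with the standard DOM parser.
    static Document parse(String markup) throws Exception {
        String cleaned = escapeBareAmpersands(markup);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(cleaned)));
    }
}
```

For anything beyond this one defect, a lenient HTML parser remains the more robust choice.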
I have a problem once again where I can't find the data in the source code because it's hidden or something. When my Java program indexes the page, it finds everything but the info I need. I assume it's hidden for a reason, but is there any way around this?
It's just a bunch of tr/td tags that show up in Firebug but don't show up when viewing the page source or when I run the code below:
URL url = new URL("my url");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
    // ... inspect inputLine here ...
}
in.close();
I really have no idea how to get the info that I need...
This probably happens because those tags are injected into the DOM dynamically by JavaScript and are not part of the initial HTML, which is all you can fetch with a URLConnection. They might even be created using AJAX. You will need a JavaScript interpreter on your server if you want to fetch them.
If they don't show up in the page source, they're likely being added dynamically by JavaScript code. There's no way to get them from your server-side script short of including a JavaScript interpreter, which is rather high-overhead.
The information in the tags is presumably coming from somewhere, though. Why not track that down and grab it straight from there?
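As a rough sketch of "grabbing it straight from there": once the browser's network panel shows which URL the page's script requests, you can request that URL yourself instead of the page. The class name DataEndpointFetcher is made up for illustration, the endpoint is whatever you find for your site, and both the X-Requested-With header and the UTF-8 charset are assumptions that your server may or may not need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class DataEndpointFetcher {
    // Requests the endpoint that the page's script calls, rather than the page itself,
    // and returns the whole response body as a String.
    static String fetch(URL endpoint) throws IOException {
        URLConnection conn = endpoint.openConnection();
        // Assumption: some servers only answer script-style requests.
        // This header is a common convention, not a requirement.
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }
}
```

The response is often JSON or an HTML fragment, which is usually easier to parse than the full page.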
Try using Jsoup.
// put your page URL in place of the placeholder; 10000 is the timeout in milliseconds
Document doc = Jsoup.parse(new URL("http://..."), 10000);
System.out.print(doc.toString());
Assuming that the issue is that the "missing" content is being injected using JavaScript, the following SO question is pertinent:
What's a good tool to screen-scrape with Javascript support?
I'm trying to send a byte[] (using PUT) with Restlet but I can't find any info on how to do it. My code looks like this:
Request request = new Request(Method.PUT, url);
request.setEntity( WHAT DO I PUT HERE?, MediaType.APPLICATION_OCTET_STREAM);
I had expected to find something along the lines of ByteArrayRepresentation, just like there's a JsonRepresentation and a StringRepresentation, but I couldn't find anything.
I believe you want to use an InputRepresentation, like so:
Representation representation = new InputRepresentation(new ByteArrayInputStream(bytes), MediaType.APPLICATION_OCTET_STREAM);
request.setEntity(representation);
I'm not familiar with Restlet, but one way to do it would be to Base64-encode the data. Then you could handle it like a regular string.
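A small sketch of the Base64 round trip using the JDK's java.util.Base64 (Java 8 and later). The class name Base64Payload is made up for illustration, and how the resulting string is attached to the request is left to whatever string representation you already use. Keep in mind that Base64 makes the payload roughly a third larger and that the receiving side has to decode it.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Payload {
    // Turns raw bytes into an ASCII-safe string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Restores the original bytes on the receiving side.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, 2, (byte) 0xFF};
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(Arrays.equals(original, decode(encoded)));
    }
}
```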
You can try subclassing WritableRepresentation, which is designed especially for large representations.