Download and parse an XML file in Java - java

I have never had to download an XML file in Java before and parse it after. I'm looking to download and parse this file http://api.irishrail.ie/realtime/realtime.asmx/getStationDataByNameXML?StationDesc=Bayside
All I want to do is read the train times. I've been reading about parsing XML but I'm not really getting anywhere with it. I just keep reading about parsers like stax, after that I'm a bit lost.
Can anyone give me some basic advice of what I need to do?

You can use JAXB for this and any other XML processing needs. Start here.

You can use DOM parser and new Java Architecture for XML Binding JAXB it will help you in marshalling (for converting an xml to Object) and unmarshalling(for converting an Object to xml)
link for the example
http://www.vogella.com/tutorials/JAXB/article.html

You can use the DOM Parser to create a Document object from the XML file.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File(filename));
The DOM Parser creates a traversable tree from your XML data.
You can then pick out the data you need from the DOM tree using XPath.

try this library
http://x-stream.github.io/download.html
it is simple and fast to use to write and read xml from java.

Related

Special characters creates problem while writing xml

first of all please excuse my shallow understanding into coding as I am a business analyst. Now my question. I am writing java code to convert a csv into xml. I am able to read csv successfully into objects. However, while writing the xml, when special a space or "=" is encounteredan error is thrown.
Piece of the problematic code, I have imporovised the value in create element just to highlight the problem. In actual I am getting this value from an object:-
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
Document xmlDocument= documentBuilder.newDocument();
Element root = xmlDocument.createElement("Media NationalGroupId="8" AllFTA="1002" AllSTV="1001");
xmlDocument.appendChild(root);
My xml should look something like this
<Media DateCreated="20200224 145251" NationalGroupId="8" AllFTA="1002" AllSTV="1001" AllTV="1000" NextId="1000000">
createElement should only receive Media as the argument.
To add the other attributes (DateCreated, NationalGroupId, etc), you need to call setAttribute on root, one by one.

File-path from xPath-Object or Document-Object in Java

Is there a way to get the path of a XML-Document from a xPath- or Document-Object in the xPath-API ?
That´s how the Objects are initalized:
FileInputStream file = new FileInputStream(new File("C:\ExampleFile.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
So the question is:
Could the Objects xmlDocument or xpath somehow return "C:\ExampleFile.xml" ?
Using the Document object xmlDocument you can return the path of the file with:
xmlDocument.getDocumentURI();
Rather than creating a FileInputStream from the File and passing that to the parse method
FileInputStream file = new FileInputStream(new File("C:\\ExampleFile.xml"));
use the version of parse that takes a File directly.
File file = new File("C:\\ExampleFile.xml");
// rest of your code is unchanged - parse(file) is now the
// java.io.File version rather than the InputStream version
When you pass just a stream the parser has no way of knowing that that stream was created from a file, as far as the parser is concerned that could be a stream you received from a web server, or a ByteArrayInputStream, or some other non-file source. If you pass the File directly to parse then the parser will handle opening and closing the streams itself, and will be able to provide a meaningful URI to downstream components, and you'll get a sensible result from xmlDocument.getDocumentURI().
As an aside, if you want XPath to work reliably then you need to enable namespaces by calling builderFactory.setNamespaceAware(true) before you call newDocumentBuilder(). Even if your XML doesn't actually use any namespaces, you still need to parse with a namespace-aware DOM parser.

Storing html values in xml

Trying to figure out a way to strip out specific information(name,description,id,etc) from an html file leaving behind the un-wanted information and storing it in an xml file.
I thought of trying using xslt since it can do xml to html... but it doesn't seem to work the other way around.
I honestly don't know what other language i should try to accomplish this. i know basic java and javascript but not to sure if it can do it.. im kind of lost on getting this started.
i'm open to any advice/help. willing to learn a new language too as i'm just doing this for fun.
There are a number of Java libraries for handling HTML input that isn't well-formed (according to XML). These libraries also have built-in methods for querying or manipulating the document, but it's important to realize that once you've parsed the document it's usually pretty easy to treat it as though it were XML in the first place (using the standard Java XML interfaces). In other words, you only need these libraries to parse the malformed input; the other utilities they provide are mostly superfluous.
Here's an example that shows parsing HTML using HTMLCleaner and then converting that object into a standard org.w3c.dom.Document:
TagNode tagNode = new HtmlCleaner().clean("<html><div><p>test");
DomSerializer ser = new DomSerializer(new CleanerProperties());
org.w3c.dom.Document doc = ser.createDOM(tagNode);
In Jsoup, simply parse the input and serialize it into a string:
String text = Jsoup.parse("<html><div><p>test").outerHtml();
And convert that string into a W3C Document using one of the methods described here:
How to parse a String containing XML in Java and retrieve the value of the root node?
You can now use the standard JAXP interfaces to transform this document:
TransformerFactory tFact = TransformerFactory.newInstance();
Transformer transformer = tFact.newTransformer();
Source source = new DOMSource(doc);
Result result = new StreamResult(System.out);
transformer.transform(source, result);
Note: Provide some XSLT source to tFact.newTransformer() to do something more useful than the identity transform.
I would use HTMLAgilityPack or Chris Lovett's SGMLReader.
Or, simply HTML Tidy.
Ideally, you can treat your HTML as XML. If you're lucky, it will already be XHTML, and you can process it as HTML. If not, use something like http://nekohtml.sourceforge.net/ (a HTML tag balancer, etc.) to process the HTML into something that is XML compliant so that you can use XSLT.
I have a specific example and some notes around doing this on my personal blog at http://blogger.ziesemer.com/2008/03/scraping-suns-bug-database.html.
TagSoup
JSoup
Beautiful Soup

Java XML Parsing: Avoid entity reference resolution

I am currently parsing XHTML documents with a DOM parser, like:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(MY_ENTITY_RESOLVER);
db.setErrorHandler(MY_ERROR_HANDLER);
...
final Document doc = db.parse(inputSource);
And my problem is that when my document contains an entity reference like, for example:
<p>€</p>
My parser creates a Text node for that content containing "€" instead of "€". This is, it is resolving the entity in the way it is supposed to do it (the XHTML 1.0 Strict DTD links to the ENTITIES Latin1 DTD, which in turn establishes the equivalence of "€" with "€").
The problem is, I don't want the parser to do such thing. I would like to keep the "€" text unmodified.
I've already tried with:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setExpandEntityReferences(false);
But:
I don't like this because I fear this might make some parser implementations not navigate from the XHTML 1.0 Strict DTD to the ENTITIES Latin1 DTD and therefore not consider "€" as a declared entity.
When I do this, it weirdly creates two nodes: a "pound" Entity node, and a Text node with the "€" symbol after it.
Any ideas? Is it possible to configure this in a DOM Parser without resorting to preprocessing the XHTML and substituting all "&" symbols for something other?...
Solutions could be for a DOM parser or also a SAX one, I wouldn't mind using SAX parsing and then creating my DOM using a transformation...
Also, I cannot switch to a non standard XML parsing libray. No jdom, no jsoup, no HtmlCleaner, etc.
Thanks a lot.
The approach I took was to replace any entities with a unique marker that is treated as plain text by Xerces. Once converted into a Document object, the markers are replaced with Entity Reference objects.
See the convertStringToDocument() function in http://sourceforge.net/p/commonclasses/code/14/tree/trunk/src/com/redhat/ecs/commonutils/XMLUtilities.java

How to create XML file?

I have some data which my program discovers after observing a few things about files.
For instance, i know file name, time file was last changed, whether file is binary or ascii text, file content (assuming it is properties) and some other stuff.
i would like to store this data in XML format.
How would you go about doing it?
Please provide example.
If you want something quick and relatively painless, use XStream, which lets you serialise Java Objects to and from XML. The tutorial contains some quick examples.
Use StAX; it's so much easier than SAX or DOM to write an XML file (DOM is probably the easiest to read an XML file but requires you to have the whole thing in memory), and is built into Java SE 6.
A good demo is found here on p.2:
OutputStream out = new FileOutputStream("data.xml");
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(out);
writer.writeStartDocument("ISO-8859-1", "1.0");
writer.writeStartElement("greeting");
writer.writeAttribute("id", "g1");
writer.writeCharacters("Hello StAX");
writer.writeEndDocument();
writer.flush();
writer.close();
out.close();
Standard are the W3C libraries.
final Document docToSave = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
final Element fileInfo = docToSave.createElement("fileInfo");
docToSave.appendChild(fileInfo);
final Element fileName = docToSave.createElement("fileName");
fileName.setNodeValue("filename.bin");
fileInfo.appendChild(fileName);
return docToSave;
XML is almost never the easiest thing to do.
You can use to do that SAX or DOM, review this link: https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1044810.html
I think is that you want

Categories