I've written a program that reads a set of source files and converts them into XML files using the SrcML tool. The basic procedure is as follows.
for (------------------) {
-------------------
String xmlUri = GetXmlFile(sourceFileUri); // create xml file and get its uri
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlUri);
-------------------
}
For each source file, the program creates an XML file in the same location (overwriting the previously created file) and then reads that XML file. For some source files this procedure works fine, but for most of them it throws SAX parse exceptions such as the following:
Premature end of file.
Content is not allowed in prolog.
The element type "argcl" must be terminated by the matching end-tag "</argcl>". (This XML file doesn't even contain an element named "argcl".)
XML document structures must start and end within the same entity.
The SrcML tool creates valid XML documents. When I inspect the XML file after one of these exceptions, I can't find anything wrong with its format.
All exceptions pointed out to the same line in the code which is:
"Document doc = dBuilder.parse(xmlUri);"
I have gone through a number of discussions related to this topic on Stack Overflow as well as in other forums. None of them gave me a clue as to how to overcome this problem.
I would really appreciate it if someone could help me solve this problem.
Thank you.
Here's the source code written to read XML file:
private static Document GetXmlDom(String xmlFilePath)
throws SAXException, ParserConfigurationException, IOException {
File tempFile;
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFilePath);
if (doc.hasChildNodes()) {
return doc;
}
}
catch (IOException e) {
e.printStackTrace();
throw e;
}
catch (SAXParseException e) {
e.printStackTrace();
throw e;
}
return null;
}
private static String GetXmlFile(String inputFile) throws IOException {
if (new File(inputFile).isFile()) {
String outFile = FileNameHandler.GetNextNumberedFileName(FileNameHandler.getXmlFlePath(), "outFile.xml");
Process process = new ProcessBuilder("srcML\\src2srcml.exe", inputFile,
"-o", outFile).start();
return outFile;
}
else {
System.out.println("\nNo XML file is created. File does not exist: " + inputFile);
}
return null;
}
public static List<Tag> SourceToXML(String inputFile)
throws SAXException, ParserConfigurationException, IOException {
List<Tag> tagList = new LinkedList<Tag>();
String xmlUri = GetXmlFile(inputFile);
Document doc = GetXmlDom(xmlUri);
if (doc != null) {
LinkedList<Integer> id = new LinkedList<Integer>();
id.add(1);
TagHierarchy.CreateStructuredDom(new TagId(id), doc.getFirstChild(), tagList);
tagList.get(0).setAncestor(null);
TagHierarchy.SetTagHierarchy(tagList);
}
return tagList;
}
Here's the exception thrown:
[Fatal Error] outFile.xml:461:300: The element type "argcl" must be terminated by the matching end-tag "</argcl>".
org.xml.sax.SAXParseException; systemId: file:/E:/srcML/Output/outFile.xml; lineNumber: 461; columnNumber: 300; The element type "argcl" must be terminated by the matching end-tag "</argcl>".
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at vocab.util.file.FileConverter.SourceToXML(FileConverter.java:188)
    at vocab.CodeVocabulary.Create(CodeVocabulary.java:59)
    at vocab.CodeVocabulary.<init>(CodeVocabulary.java:53)
    at vocab.util.DataAcccessUtil.GetCodeVocabularies(DataAcccessUtil.java:331)
    at vocab.TestMain.main(TestMain.java:57)
It looks like you start a process that generates an XML file and then read the generated file immediately afterwards. ProcessBuilder.start() returns as soon as the process has been launched, so the parser reads the file while the process is still running and writing to it. The parser therefore never sees the complete generated file.
You should wait for the process to finish (for example with Process.waitFor()) before reading the file it generates.
You should also respect the Java naming conventions: method names start with a lowercase letter.
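A minimal sketch of the "wait before reading" advice above. The class name ProcessWaitSketch and the methods runAndWait and convertAndParse are hypothetical names introduced for illustration; the command line is the one from the question, and treating a non-zero exit code as a failure is an assumption about how src2srcml reports errors.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ProcessWaitSketch {

    // Starts the command and blocks until the process has exited.
    static int runAndWait(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the tool's output so it cannot stall on a full pipe
                .start();
        return process.waitFor(); // returns only after the process has terminated
    }

    // Runs src2srcml, then parses the output file once it is completely written.
    static Document convertAndParse(String inputFile, String outFile)
            throws IOException, InterruptedException, SAXException, ParserConfigurationException {
        int exitCode = runAndWait(Arrays.asList("srcML\\src2srcml.exe", inputFile, "-o", outFile));
        if (exitCode != 0) {
            // Assumption: a non-zero exit code means the conversion failed.
            throw new IOException("src2srcml exited with code " + exitCode);
        }
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(outFile));
    }
}
```

With this shape, the parse call cannot run before the tool has closed its output file, which should remove the truncated-file errors listed in the question.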
I'm using DocumentBuilder and NodeList in Android Studio to parse an XML document. I previously found that the XML was incorrect and had unescaped ampersands within the text. After taking care of this and double-checking with the W3C XML validator, I still get an unexpected token error:
e: "org.xml.sax.SAXParseException: Unexpected token (position:TEXT \n \n 601\n ...@5262:1 in java.io.StringReader@cd0db4a)"
However, when I open the XML and look at the line referred to, I don't see anything that looks troublesome:
... ...
5257 <WebSvcLocation>
5258 <Id>1521981</Id>
5259 <Name>Warehouse: Row 3</Name>
5260 <SiteName>Warehouse</SiteName>
5261 </WebSvcLocation>
5262 </ArrayOfWebSvcLocation>
I have also checked the XML for non-printing characters and have not found any. Below is the code I have been using:
public List<Location> SpinnerXML(String xml){
List<Location> list = new ArrayList<Location>();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
InputSource is;
String s = xml.replaceAll("[&]"," and ");
try {
builder = factory.newDocumentBuilder();
is = new InputSource(new StringReader(s));
Document doc = builder.parse(is);
NodeList lt = doc.getElementsByTagName("WebSvcLocation");
int id;
String name,siteName;
for (int i = 0; i < lt.getLength(); i++) {
Element el = (Element) lt.item(i);
id = Integer.parseInt(getValue(el, "Id"));
name = getValue(el, "Name");
siteName = getValue(el, "SiteName");
list.add(new Location(id, name, siteName));
}
} catch (ParserConfigurationException e){
} catch (SAXException e){
e.printStackTrace();
} catch (IOException e){
}
return list;
}
The XML I have been trying to read is hosted here.
Thanks in advance for the help!
InputSource seems to do some guessing as to the encoding, so here are some things to try.
One reference says:
Android note: The Android platform default (encoding) is always UTF-8.
Another says:
"Java stores strings as UTF-16 internally, but the encoding used externally, the "system default encoding", varies."
(1) I would initially recommend:
is.setEncoding("UTF-8");
(2) But it should do no harm to replace this:
Document doc = builder.parse(is);
With this:
Document doc = builder.parse(new ByteArrayInputStream(s.getBytes()));
(3) OR try this:
String s1 = URLDecoder.decode(s, "UTF-8");
Document doc = builder.parse(new ByteArrayInputStream(s1.getBytes()));
NOTE:
if you try (2) or (3) comment OUT:
is = new InputSource(new StringReader(s));
As it may mess up String s.
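A minimal sketch of option (2) with the charset spelled out, so the bytes handed to the parser do not depend on the platform default. The class name XmlStringParser and the method parseUtf8 are hypothetical; the sketch assumes the XML text either has no XML declaration or declares UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlStringParser {

    // Encodes the string as UTF-8 explicitly and lets the parser read the bytes.
    static Document parseUtf8(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8); // no reliance on the default charset
        return builder.parse(new ByteArrayInputStream(bytes));
    }
}
```

Calling s.getBytes() with no argument uses the platform default charset, which happens to be UTF-8 on Android but may differ elsewhere.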
Here's the XML I'm trying to parse: http://realtime.catabus.com/InfoPoint/rest/routes/get/51
@Override
protected Void doInBackground(String... Url) {
try {
URL url = new URL(Url[0]);
DocumentBuilderFactory dbf = DocumentBuilderFactory
.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
// Download the XML file
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
// Locate the Tag Name
nodelist = doc.getElementsByTagName("VehicleLocation");
} catch (Exception e) {
Log.e("Error", e.getMessage());
e.printStackTrace();
}
return null;
}
During runtime, when it reaches this line: DocumentBuilder db = dbf.newDocumentBuilder(); I get the following error:
Unexpected token (position:TEXT {"RouteId":51,"R...@1:1298 in java.io.InputStreamReader@850b9be)
It seems to have something to do with the encoding. My guess is that it's because the XML doesn't specify the encoding, but maybe not.
Is there a way to specify the encoding in the code (I can't change the XML itself)?
Thanks!
EDIT: This seems to only happen when parsing the XML from the url. Storing the file locally seems to work fine.
Is there a way to specify the encoding in the code (I can't change the
XML itself)?
You can call InputSource.setEncoding() to set the encoding.
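A minimal sketch of the setEncoding() suggestion above, assuming the data arrives as a byte stream. The class name EncodedXmlLoader and the method parse are hypothetical. According to the InputSource documentation, setEncoding() has no effect when the InputSource wraps a character stream (a Reader), so the sketch passes an InputStream.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodedXmlLoader {

    // Parses a byte stream, telling the parser which encoding to assume.
    static Document parse(InputStream in, String encoding) throws Exception {
        InputSource source = new InputSource(in); // byte stream, so the encoding hint applies
        source.setEncoding(encoding);
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(source);
        doc.getDocumentElement().normalize();
        return doc;
    }
}
```

In the question's code this would be called as EncodedXmlLoader.parse(url.openStream(), "UTF-8"). Note that the error excerpt quoted in the question begins with {"RouteId":51, which looks like JSON rather than XML; if the server is in fact returning JSON, setting the encoding alone would probably not help.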
I would suggest taking a look at XmlPullParser instead for parsing XML on Android.
The code I posted works and handles the writing, but I need advice on how to read the XML files back so that I can output their contents and/or delete the files. I have read about SAX, the DocumentBuilder.parse method and a few others, but I am confused about which one to use. I do not need you to write the code for me, just to point me in the right direction.
Each file is created separately in one folder, so I need to read them all at once if possible (the name of each file is the variable Id in the code below).
This is how I create XML files.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
{
try {
DocumentBuilder doc = factory.newDocumentBuilder();
Document doc1=doc.newDocument();
Element IdNumber = (Element) doc1.createElement(Id);
Element IDes=(Element) doc1.createElement("InitialDestination");
Element FinDes=(Element) doc1.createElement("FinalDestination");
Element HourTime=(Element) doc1.createElement("Hours");
Element Minutetime=(Element) doc1.createElement("Minutes");
Element Price=(Element) doc1.createElement("TicketPrice");
Element Tran=(Element) doc1.createElement("TransportAgency");
doc1.appendChild(IdNumber);
IDes.appendChild(doc1.createTextNode(InDes));
FinDes.appendChild(doc1.createTextNode(FDes));
HourTime.appendChild(doc1.createTextNode(Htime));
Minutetime.appendChild(doc1.createTextNode(Mtime));
Price.appendChild(doc1.createTextNode(TicketPrice));
Tran.appendChild(doc1.createTextNode(TransportAgency));
IdNumber.appendChild(IDes);
IdNumber.appendChild(FinDes);
IdNumber.appendChild(HourTime);
IdNumber.appendChild(Minutetime);
IdNumber.appendChild(Price);
IdNumber.appendChild(Tran);
Source S=new DOMSource(doc1);
File file1=new File("C:\\Users\\Lozanovski\\Desktop\\TransportMe");
file1.mkdirs();
File file=new File("C:\\Users\\Lozanovski\\Desktop\\TransportMe\\"+Id+".xml");
StreamResult R=new StreamResult(file);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(S, R);
}
catch(ParserConfigurationException except)
{
System.out.println(except);
}
catch(TransformerException except1)
{
System.out.println(except1);
}
catch(DOMException except2)
{
System.out.println(except2);
}
catch(NullPointerException except3){
System.out.println(except3);
}
}
I don't know how to properly post xml code (I will accept edit)
<?xml version="1.0" encoding="UTF-8"?>
<InsertIdentificationNumber>
<InitialDestination>Insert Initial Destination</InitialDestination>
<FinalDestination>Insert Final Destination</FinalDestination>
<Hours>Insert Hours</Hours>
<Minutes>Insert Minutes</Minutes>
<TicketPrice>Insert Ticket Price</TicketPrice>
<TransportAgency>Insert Transport Agency</TransportAgency>
</InsertIdentificationNumber>
Loop through the folder, and for each file found, create a Document and use it to display/output the content. You can also delete the file if needed:
File folder = new File("C:\\Users\\Lozanovski\\Desktop\\TransportMe");
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
if (folder.isDirectory()) {
    for (File file : folder.listFiles()) {
        Document doc = docBuilder.parse(file); // create an XML document
        file.delete(); // delete the file
    }
}
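A slightly fuller sketch of the same loop that also outputs the content of each file. The class name FolderXmlReader and the method describe are hypothetical; the sketch assumes each file has the flat structure shown in the question (one root element whose children hold text).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class FolderXmlReader {

    // Returns one "Tag=text" line per child element of every .xml file in the folder.
    static List<String> describe(File folder) throws Exception {
        List<String> lines = new ArrayList<>();
        File[] files = folder.listFiles((dir, name) -> name.toLowerCase().endsWith(".xml"));
        if (files == null) {
            return lines; // not a directory, or it could not be read
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (File file : files) {
            Document doc = builder.parse(file);
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                    lines.add(child.getNodeName() + "=" + child.getTextContent());
                }
            }
        }
        return lines;
    }
}
```

Printing the returned lines gives entries such as Hours=... and Minutes=... for each file; a call to file.delete() could be added inside the loop once the content has been read.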
I am converting XHTML to PDF using Flying Saucer. It works perfectly, but now I want to add bookmarks, and according to the Flying Saucer documentation this should be done like this:
<bookmarks>
<bookmark name='1. Foo bar baz' href='#1'>
<bookmark name='1.1 Baz quux' href='#1.2'>
</bookmark>
</bookmark>
<bookmark name='2. Foo bar baz' href='#2'>
<bookmark name='2.1 Baz quux' href='#2.2'>
</bookmark>
</bookmark>
</bookmarks>
That should be put into the HEAD section. I have done that, but the SAXParser won't read the file anymore, saying:
line 11 column 14 - Error: <bookmarks> is not recognized!
line 11 column 25 - Error: <bookmark> is not recognized!
I have a local entity resolver set up and have even added the bookmarks to a DTD,
<!--flying saucer bookmarks -->
<!ELEMENT bookmarks (#PCDATA)>
<!ATTLIST bookmarks %attrs;>
<!ELEMENT bookmark (#PCDATA)>
<!ATTLIST bookmark %attrs;>
But it just won't parse. I am out of ideas, please help.
EDIT
I am using the following code to parse:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
builder.setEntityResolver(new LocalEntityResolver());
document = builder.parse(is);
EDIT
Here is LocalEntityResolver:
class LocalEntityResolver implements EntityResolver {
private static final Logger LOG = ESAPI.getLogger(LocalEntityResolver.class);
private static final Map<String, String> DTDS;
static {
DTDS = new HashMap<String, String>();
DTDS.put("-//W3C//DTD XHTML 1.0 Strict//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
DTDS.put("-//W3C//DTD XHTML 1.0 Transitional//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
DTDS.put("-//W3C//ENTITIES Latin 1 for XHTML//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent");
DTDS.put("-//W3C//ENTITIES Symbols for XHTML//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent");
DTDS.put("-//W3C//ENTITIES Special for XHTML//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent");
}
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
InputSource input_source = null;
if (publicId != null && DTDS.containsKey(publicId)) {
LOG.debug(Logger.EVENT_SUCCESS, "Looking for local copy of [" + publicId + "]");
final String dtd_system_id = DTDS.get(publicId);
final String file_name = dtd_system_id.substring(
dtd_system_id.lastIndexOf('/') + 1, dtd_system_id.length());
InputStream input_stream = FileUtil.readStreamFromClasspath(
file_name, "my/class/path",
getClass().getClassLoader());
if (input_stream != null) {
LOG.debug(Logger.EVENT_SUCCESS, "Found local file [" + file_name + "]!");
input_source = new InputSource(input_stream);
}
}
return input_source;
}
}
My document builder factory implementation is :
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
Ugh, I finally found the problem. Sorry for making you debug the code. There was a call to JTidy.parse just before the DOM parsing, which left the content to be parsed empty, and I did not catch that. The actual error from SAX was "Premature end of file".
Thanks to Matt Gibson: while I was going through the code to put together a short input document, I found the bug.
My code now includes a check to see whether the content is null:
/**
* parses String content into a valid XML document.
* @param content the content to be parsed.
* @return the parsed document or <tt>null</tt>
*/
private static Document parse(final String content) {
Document document = null;
try {
if (StringUtil.isNull(content)) {
throw new IllegalArgumentException("cannot parse null "
+ "content into a DOM object!");
}
InputStream is = new ByteArrayInputStream(content
.getBytes(CONTEXT.getEncoding()));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
builder.setEntityResolver(new LocalEntityResolver());
document = builder.parse(is);
} catch (Exception ex) {
LOG.error(Logger.EVENT_FAILURE, "parsing failed "
+ "for content[" + content + "]", ex);
}
return document;
}
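To illustrate why the empty content was the real culprit, here is a small sketch showing that an empty input makes the DOM parser fail while a minimal document parses. The class name EmptyInputCheck and the method parses are hypothetical and not part of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class EmptyInputCheck {

    // Returns true if the content parses as XML, false if the parser rejects it.
    static boolean parses(String content) throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // For empty input the JDK parser reports "Premature end of file."
            return false;
        }
    }
}
```

Here parses("") returns false and parses("<html/>") returns true, so checking for empty content before parsing, as the method above does for null, gives a clearer error than the parser's message.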
I have written the following method, which runs hard-coded XPath queries against a hard-coded XML file. The method works perfectly, with one exception: some XML files contain the following declaration:
<!DOCTYPE WorkFlowDefinition SYSTEM "wfdef4.dtd">
When i try to run a query in that file i get the following exception:
java.io.FileNotFoundException:
C:\ProgramFiles\code\other\xPath\wfdef4.dtd (The system cannot find the file specified).
The question is: what can I do to instruct my program to ignore this DTD file?
I have also noticed that the path C:\ProgramFiles\code\other\xPath\wfdef4.dtd points to the directory I run my application from, not the one where the actual XML file is located.
Thank you in advance.
Here is my method:
public String evaluate(String expression,File file){
XPathFactory factory = XPathFactory.newInstance();
xPath = XPathFactory.newInstance().newXPath();
StringBuffer strBuffer = new StringBuffer();
try{
InputSource inputSource = new InputSource(new FileInputStream(file));
//evaluates the expression
NodeList nodeList = (NodeList)xPath.evaluate(expression,
inputSource,XPathConstants.NODESET);
//does other stuff, irrelevant with my question.
for (int i = 0 ; i <nodeList.getLength(); i++){
strBuffer.append(nodeList.item(i).getTextContent());
}
}catch (Exception e) {
e.printStackTrace();
}
return strBuffer.toString();
}
And the answer is:
xPath = XPathFactory.newInstance().newXPath();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// add this line to ignore the DTD
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
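The snippet above stops after configuring the factory, so here is a sketch of how the rest of the method might look. The class name DtdFreeXPath is hypothetical, and the feature URI is specific to Xerces-based parsers (including the JDK's built-in one); other parsers may reject it.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DtdFreeXPath {

    // Parses the file without loading its external DTD, then evaluates the XPath expression.
    static String evaluate(String expression, File file) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Xerces-specific feature: do not fetch the DTD named in the DOCTYPE.
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = dbf.newDocumentBuilder().parse(file);

        XPath xPath = XPathFactory.newInstance().newXPath();
        // Evaluate against the already-parsed Document instead of a raw InputSource.
        NodeList nodeList = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);

        StringBuilder result = new StringBuilder();
        for (int i = 0; i < nodeList.getLength(); i++) {
            result.append(nodeList.item(i).getTextContent());
        }
        return result.toString();
    }
}
```

The key change from the question's method is that the XPath expression is evaluated against a Document built by the configured factory; passing an InputSource directly to xPath.evaluate would bypass that factory and its DTD setting.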