Part way through creating an XML file with Stax I have some XML in the form of a String. I write this to the Stax output using:
public void addInnerXml(String xml) throws TinyException {
try {
parent.adjustStack(this);
XMLStreamReader2 sr = (XMLStreamReader2) ifact.createXMLStreamReader(new ByteArrayInputStream(xml.getBytes("UTF8")));
for (int type = sr.getEventType(); sr.hasNext(); type = sr.next()) {
switch (type) {
case XMLStreamConstants.COMMENT:
case XMLStreamConstants.DTD:
case XMLStreamConstants.START_DOCUMENT:
case XMLStreamConstants.END_DOCUMENT:
continue;
}
parent.getWriter().copyEventFromReader(sr, false);
}
sr.close();
} catch (XMLStreamException e) {
throw new TinyException("addInnerXml", e);
} catch (UnsupportedEncodingException e) {
throw new TinyException("addInnerXml", e);
}
}
It all works well, except that namespaces used in the passed-in XML, which are already declared on the root element, are declared again on the inner nodes. Note that the p prefix is repeated:
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:cSld> <p:spTree> <p:pic
xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
Is there a way to turn this off?
Notes: XMLStreamReader2 extends org.codehaus.stax2.typed.TypedXMLStreamReader, which in turn extends XMLStreamReader.
parent.getWriter() returns an XMLStreamWriter2 (copyEventFromReader is a method of that interface).
thanks - dave
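One possible way to avoid the repeated declaration, sketched here with the plain javax.xml.stream API rather than Woodstox's copyEventFromReader, is to copy start elements by hand and skip any namespace declaration the enclosing element has already made. The class name InnerXmlCopier, the inScope map and the wrapInSld helper are hypothetical names used only for illustration. The sketch assumes the fragment never rebinds an in-scope prefix to a different URI, and it drops processing instructions, which the original loop would copy.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class InnerXmlCopier {

    /**
     * Copies the fragment into the writer, skipping namespace declarations
     * whose prefix/URI pair is already listed in inScope (prefix -> URI).
     */
    static void copyFragment(String xml, XMLStreamWriter w, Map<String, String> inScope)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        writeStart(r, w, inScope);
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        w.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.SPACE:
                        w.writeCharacters(r.getText());
                        break;
                    case XMLStreamConstants.CDATA:
                        w.writeCData(r.getText());
                        break;
                    default:
                        break; // comments, DTD, PIs and document start/end are dropped
                }
            }
        } finally {
            r.close();
        }
    }

    private static void writeStart(XMLStreamReader r, XMLStreamWriter w,
                                   Map<String, String> inScope) throws XMLStreamException {
        w.writeStartElement(nz(r.getPrefix()), r.getLocalName(), nz(r.getNamespaceURI()));
        for (int i = 0; i < r.getNamespaceCount(); i++) {
            String prefix = nz(r.getNamespacePrefix(i));
            String uri = nz(r.getNamespaceURI(i));
            // Only emit declarations the enclosing element has not already made.
            if (!uri.equals(inScope.get(prefix))) {
                w.writeNamespace(prefix, uri);
            }
        }
        for (int i = 0; i < r.getAttributeCount(); i++) {
            String ns = r.getAttributeNamespace(i);
            if (ns == null || ns.isEmpty()) {
                w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
            } else {
                w.writeAttribute(r.getAttributePrefix(i), ns,
                        r.getAttributeLocalName(i), r.getAttributeValue(i));
            }
        }
    }

    /** The reader may report a missing prefix or URI as null; the writer expects "". */
    private static String nz(String s) {
        return s == null ? "" : s;
    }

    /** Usage example: wraps the fragment in a p:sld root that declares the p prefix once. */
    static String wrapInSld(String fragment) throws XMLStreamException {
        String pUri = "http://schemas.openxmlformats.org/presentationml/2006/main";
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("p", "sld", pUri);
        w.writeNamespace("p", pUri);
        Map<String, String> inScope = new HashMap<>();
        inScope.put("p", pUri);
        copyFragment(fragment, w, inScope);
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }
}
```

The trade-off is that the caller has to know which prefixes are in scope. With Woodstox it may be possible to query the writer's namespace context instead, but that is not shown here.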
I need to support the situation where a user submits an invalid XML file to me and I report back to them information about the error. Ideally the location of the error (line number and column number) and the nature of the error.
My sample code (see below) works well enough when there is a missing tag or similar error. In that case, I get an approximate location and a useful explanation. However my code fails spectacularly when the XML file contains non-UTF-8 characters. In this case, I get a useless error:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
I cannot find a way to determine the line number where the invalid character might be, nor the character itself. Is there a way to do this?
If, as one comment suggests, it may not be possible as we don't get to the parsing step, is there a way to process the XML file, not with a parser, but simply line-by-line, looking for and reporting non-UTF-8 characters?
Sample code follows. First a basic error handler:
public class XmlErrorHandler implements ErrorHandler {
@Override
public void warning(SAXParseException e) throws SAXException {
show("Warning", e); throw e;
}
@Override
public void error(SAXParseException e) throws SAXException {
show("Error", e); throw e;
}
@Override
public void fatalError(SAXParseException e) throws SAXException {
show("Fatal", e); throw e;
}
private void show(String type, SAXParseException e) {
System.out.println("Line " + e.getLineNumber() + " Column " + e.getColumnNumber());
System.out.println(type + ": " + e.getMessage());
}
}
And a trivial test program:
public class XmlTest {
public static void main(String[] args) {
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(new DefaultHandler());
reader.setErrorHandler(new XmlErrorHandler());
InputSource is = new InputSource(args[0]);
reader.parse(is);
}
catch (SAXException e) { // Useful error case
System.err.println(e);
e.printStackTrace(System.err);
}
catch (Exception e) { // Useless error case arrives here
System.err.println(e);
e.printStackTrace();
}
}
}
Sample XML File (with non-UTF-8 smart quotes from (say) a Word document):
<?xml version="1.0" encoding="UTF-8"?>
<example>
<![CDATA[Text with <91>smart quotes<92>.]]>
</example>
I had some success with identifying where the issue in the XML file is using a couple of approaches.
Adapting the code from my question to use a home-grown ContentHandler with a Locator (see below) demonstrated that the XML was being processed up until the invalid character is encountered. In particular, the line number is being tracked. Preserving the line number allowed it to be retrieved from the ContentHandler when the problematic exception occurs.
At this point, I came up with two possibilities. The first is to re-run the processing with a different encoding on the InputSource, e.g. Windows-1252. Parsing completed without error in this instance, and I was able to retrieve the characters on the line with the known issue. This allows for a reasonably useful error message to the user, i.e. the line number and the characters.
My second approach was to adapt the code from the top-rated answer to this SO question. This code allows you to find the first non-UTF-8 character in a byte stream. If you assume that 0x0A (linefeed) represents a new line in the XML (and this seems to work pretty well in practice), then the line number, column number and the invalid characters can be extracted easily enough for a precise error message.
// Modified test program
public class XmlTest {
public static void main(String[] args) {
ErrorFinder errorFinder = new ErrorFinder(0); // Create our own content handler
InputSource is = new InputSource(args[0]); // Declared outside the try so the catch block can reuse it
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(errorFinder); // Use instead of the default handler
reader.setErrorHandler(new XmlErrorHandler());
reader.parse(is);
}
catch (SAXException e) { // Useful error case
System.err.println(e);
e.printStackTrace(System.err);
}
catch (Exception e) { // Useless error case arrives here
System.err.println(e);
e.printStackTrace();
// Option 1: repeat parsing (see above) with a new ErrorFinder initialised thus:
ErrorFinder ef2 = new ErrorFinder(errorFinder.getCurrentLineNumber()); // and
is.setEncoding("Windows-1252");
}
}
}
// Content handler with irrelevant method implementations elided.
public class ErrorFinder implements ContentHandler {
private int lineNumber; // If non-zero, the line number to retrieve characters for.
private int currentLineNumber;
private char[] chars;
private Locator locator;
public ErrorFinder(int lineNumber) {
super();
this.lineNumber = lineNumber;
}
public void setDocumentLocator(Locator locator) {
this.locator = locator;
}
@Override
public void startDocument() throws SAXException {
currentLineNumber = locator.getLineNumber();
}
... // Skip other over-ridden methods as they have same code as startDocument().
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
currentLineNumber = locator.getLineNumber();
if (currentLineNumber == lineNumber) {
char[] c = new char[length];
System.arraycopy(ch, start, c, 0, length);
chars = c;
}
}
public int getCurrentLineNumber() {
return currentLineNumber;
}
public char[] getChars() {
return chars;
}
}
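The second approach described above, scanning the raw bytes for the first invalid UTF-8 sequence, could look roughly like the following. This is a minimal sketch and not the code from the referenced answer: it uses java.nio's CharsetDecoder in REPORT mode, and the class and method names (Utf8ErrorLocator, firstInvalidOffset, lineAndColumn) are hypothetical. As in the text, it assumes 0x0A marks a new line. The column is counted in bytes, so it can differ from the character column on lines that contain multi-byte characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8ErrorLocator {

    /** Returns the byte offset of the first invalid UTF-8 sequence, or -1 if there is none. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On an error the input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: discard the decoded characters and keep going
        }
    }

    /** Converts a byte offset into a 1-based {line, column} pair, treating 0x0A as a line break. */
    static int[] lineAndColumn(byte[] data, int offset) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (data[i] == 0x0A) {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }
}
```

The offending byte itself is data[offset], so it can be included in the error message, for example formatted in hex.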
Is there a method to hide all tags from the output of the YamlWriter using the yamlbeans lib?
public <T> String toYaml(T object) {
try (StringWriter stringWriter = new StringWriter()) {
YamlWriter writer = new YamlWriter(stringWriter);
writer.write(object);
writer.getConfig().writeConfig.setWriteRootTags(false);//replaces only root tag
writer.close(); // don't move this into a finally block, or the text will not be flushed before it is read
return removeTags(stringWriter.toString());
} catch (IOException e) {
throw new RuntimeException(e);
}
}
private String removeTags(String string) {
//a tag is a sequence of characters starting with ! and ending with whitespace
return removePattern(string, " ![^\\s]*");
}
Thank you.
Change the writer's config to never write class names and you're good to go:
writer.getConfig().writeConfig.setWriteClassname(YamlConfig.WriteClassName.NEVER);
We are having some issue with XML file generation using JAXB.
Even though marshalling completes successfully, the generated XML file is sometimes corrupted. (We generate around 200 XML files every day, each about 150 MB in size; so far only 2 files have been corrupted in two months.)
No errors were reported in the log file (catalina.out), even though we log every exception.
But when we rerun the job, the files are generated successfully.
The application uses the following code segment to marshal the Java object to XML.
public static String marshall(final Marshaller marshaller, final Object obj, final String transformFileName)
throws ServiceException {
File file1 = null;
try {
file1 = new File(transformFileName);
marshaller.marshal(obj, new StreamResult( file1 ) );
return file1.getAbsolutePath();
} catch (XmlMappingException e) {
throw new ServiceException(e);
} catch (IOException e) {
throw new ServiceException(e);
}
}
The following is the bean definition of marshaller
<bean id="bankStatementMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="contextPath" value="*.bankstatement" />
<property name="schema" value="classpath:Statement.xsd" />
</bean>
The marshaller is used by multiple threads at the same time. We have already checked the Spring code and found that Spring creates a new Marshaller object for every call to marshal, so we have ruled out a concurrency issue (as far as we understand).
In addition, we create all the XML files on an NFS file system.
While going through the JAXB Marshaller implementation, we found the following code segment, which ignores any IOException raised when flushing or closing the streams (see the cleanUp method below).
private void write(Object obj, XmlOutput out, Runnable postInitAction) throws JAXBException {
try {
if( obj == null )
throw new IllegalArgumentException(Messages.NOT_MARSHALLABLE.format());
if( schema!=null ) {
// send the output to the validator as well
ValidatorHandler validator = schema.newValidatorHandler();
validator.setErrorHandler(new FatalAdapter(serializer));
// work around a bug in JAXP validator in Tiger
XMLFilterImpl f = new XMLFilterImpl() {
@Override
public void startPrefixMapping(String prefix, String uri) throws SAXException {
super.startPrefixMapping(prefix.intern(), uri.intern());
}
};
f.setContentHandler(validator);
out = new ForkXmlOutput( new SAXOutput(f) {
@Override
public void startDocument(XMLSerializer serializer, boolean fragment, int[] nsUriIndex2prefixIndex, NamespaceContextImpl nsContext) throws SAXException, IOException, XMLStreamException {
super.startDocument(serializer, false, nsUriIndex2prefixIndex, nsContext);
}
@Override
public void endDocument(boolean fragment) throws SAXException, IOException, XMLStreamException {
super.endDocument(false);
}
}, out );
}
try {
prewrite(out,isFragment(),postInitAction);
serializer.childAsRoot(obj);
postwrite();
} catch( SAXException e ) {
throw new MarshalException(e);
} catch (IOException e) {
throw new MarshalException(e);
} catch (XMLStreamException e) {
throw new MarshalException(e);
} finally {
serializer.close();
}
} finally {
cleanUp();
}
}
private void cleanUp() {
if(toBeFlushed!=null)
try {
toBeFlushed.flush();
} catch (IOException e) {
// ignore
}
if(toBeClosed!=null)
try {
toBeClosed.close();
} catch (IOException e) {
// ignore
}
toBeFlushed = null;
toBeClosed = null;
}
Can anyone suggest what the potential issue could be?
For example:
Since we use multiple threads, could a concurrency issue cause the corrupted files?
Since we use NFS, could NFS being unavailable during generation cause the corrupted files?
Since we generate many large XML files, memory usage during generation reaches up to 80%. Could this cause the corrupted XML files?
Regards,
Mayuran
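Because cleanUp() swallows any IOException from flush() and close(), a write failure on NFS could plausibly go unreported when the marshaller opens and closes the file itself. One way to surface such failures, sketched below under assumptions, is to pass the marshaller a stream that the caller owns, then flush, sync and close it explicitly and only afterwards move the file to its final name. SafeFileWrite, StreamTask and writeSafely are hypothetical names. The StreamTask callback stands in for a call such as marshaller.marshal(obj, new StreamResult(out)). Whether this removes the corruption seen here is not established.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SafeFileWrite {

    /** Stand-in for the marshalling call; it writes the document to the given stream. */
    @FunctionalInterface
    interface StreamTask {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Writes to a temporary sibling file, forces the data to the device, and only
     * then renames it to the target. Any IOException from write, flush, sync or
     * close propagates to the caller instead of being ignored.
     */
    static Path writeSafely(Path target, StreamTask task) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            task.writeTo(out);
            out.flush();         // push buffered bytes to the file stream
            fos.getFD().sync();  // ask the OS to write them to the underlying device
        }
        return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With this shape, a failed run leaves at most a stray .tmp file behind, so a consumer of the target name would not pick up a half-written document.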
I am new to XML parsing and don't know how to go about getting certain details from an XML file. In the following code (Android Java), I get the location from the description tag. Very straightforward:
public void readXML(String xmlToRead) throws XmlPullParserException {
try {
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(true);
XmlPullParser xpp = factory.newPullParser();
xpp.setInput(new StringReader(xmlToRead));
WeatherDetails weatherDetails = new WeatherDetails();
xpp.next();
int eventType = xpp.getEventType();
while (xpp.getEventType()!=XmlPullParser.END_DOCUMENT) {
if (xpp.getEventType()==XmlPullParser.START_TAG) {
if (xpp.getName().equalsIgnoreCase("description")) {
weatherDetails.setWeatherLocation(xpp.nextText());
weather_userlocation.setText(weatherDetails.getWeatherLocation());
}
}
xpp.next();
}
} catch (XmlPullParserException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Here is an example XML I'm attempting to parse:
http://forecast.weather.gov/MapClick.php?lat=40.28331&lon=-84.1435136&unit=0&lg=english&FcstType=dwml
Near the bottom of the XML, there is a segment:
<parameters applicable-location="point1">
<temperature type="apparent" units="Fahrenheit" time-layout="k-p1h-n1-1">
<value>33</value>
</temperature>
The value I want is inside the value tag, but there are numerous value tags throughout the XML. How can I point and retrieve this specific one?
Thank you all!
Here's my implementation of your suggestion:
while (xpp.getEventType()!=XmlPullParser.END_DOCUMENT) {
if (xpp.getEventType()==XmlPullParser.START_TAG) {
String name = xpp.getName();
if (name.equalsIgnoreCase("description")) {
weatherDetails.setWeatherLocation(xpp.nextText());
weather_userlocation.setText(weatherDetails.getWeatherLocation());
}
if(name.equalsIgnoreCase("temperature")) {
weather_apparenttemp.setText("Found TEMPERATURE tag!");
xpp.nextTag(); // skip the whitespace text between <temperature> and <value>
if ("value".equals(xpp.getName())) {
weather_apparenttemp.setText(xpp.nextText());
}
}
}
xpp.next();
}
What you need to do is search until you find the temperature tag, using something like:
String name = xpp.getName();
if (name.equals("temperature")) {
// nextTag() skips whitespace and moves to the next START_TAG or END_TAG
if (xpp.nextTag() == XmlPullParser.START_TAG && xpp.getName().equals("value")) {
String temp = xpp.nextText(); // text content of <value>
}
}
I'm not 100% sure of the syntax, as it's been a while since I used this, but that should be the general idea.
Another alternative is to call another function that takes the parser as a parameter and search within that, but that's more for when the XML has an odd structure.
I am trying to use Jackson JSON to take a string and determine whether it is valid JSON. Can anyone suggest a code sample to use (Java)?
Not sure what your use case for this is, but this should do it:
public boolean isValidJSON(final String json) {
boolean valid = false;
try {
final JsonParser parser = new ObjectMapper().getJsonFactory()
.createJsonParser(json);
while (parser.nextToken() != null) {
}
valid = true;
} catch (JsonParseException jpe) {
jpe.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
}
return valid;
}
Although Perception's answer will probably fit many needs, there are some problems it won't catch. One of them is duplicate keys; consider the following example:
String json = "{ \"foo\" : \"bar\", \"foo\" : \"baz\" }";
As a complement, you can check for duplicate keys with the following code:
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.enable(DeserializationFeature.FAIL_ON_READING_DUP_TREE_KEY);
objectMapper.readTree(json);
It throws a JsonProcessingException on a duplicate key or any other error.
With Jackson I use this function:
public static boolean isValidJSON(final String json) throws IOException {
boolean valid = true;
try{
objectMapper.readTree(json);
} catch(JsonProcessingException e){
valid = false;
}
return valid;
}
I would recommend using Bean Validation API separately: that is, first bind data to a POJO, then validate POJO. Data format level Schemas are in my opinion not very useful: one usually still has to validate higher level concerns, and schema languages themselves are clumsy, esp. ones that use format being validated (XML Schema and JSON Schema both have this basic flaw).
Doing this makes code more modular, reusable, and separates concerns (serialization, data validation).
But I would actually go one step further, and suggest you have a look at DropWizard -- it integrates Jackson and Validation API implementation (from Hibernate project).
private boolean isValidJson(String json) {
try {
objectMapper.readTree(json);
} catch (JsonProcessingException e) {
return false;
}
return true;
}
Another option is to use java.util.Optional in Java 8. This lets the method return an object and allows a more functional approach in the calling code.
This is another possible implementation:
public Optional<JsonProcessingException> validateJson(final String json) {
try{
objectMapper.readTree(json);
return Optional.empty();
} catch(JsonProcessingException e){
return Optional.of(e);
} catch(IOException e) {
throw new RuntimeException(e);
}
}
Then you can use this method like this:
jsonHelper.validateJson(mappingData.getMetadataJson())
.map(e -> String.format("Error: %s at %s", e.getMessage(), e.getLocation().toString()))
.orElse("Valid JSON");
Improving on the other answers:
public static boolean isValidJSON(final String json) throws IOException {
boolean valid = true;
try{
objectMapper.enable(DeserializationFeature.FAIL_ON_TRAILING_TOKENS);
objectMapper.enable(DeserializationFeature.FAIL_ON_READING_DUP_TREE_KEY);
objectMapper.readTree(json);
} catch(JsonProcessingException e){
valid = false;
}
return valid;
}