SAX XML parser - localized error messages


The web application I am currently working on validates user-supplied XML files against an XSD stored on a server. The problem is that if an XML file fails validation, the error messages should be in Russian. My parser works, but it gives error messages only in English:
String parserClass = "org.apache.xerces.parsers.SAXParser";
String validationFeature = "http://xml.org/sax/features/validation";
String schemaFeature = "http://apache.org/xml/features/validation/schema";
XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);
reader.setFeature(validationFeature,true);
reader.setFeature(schemaFeature,true);
BatchContentHandler contentHandler = new BatchContentHandler(reader);
reader.setContentHandler(contentHandler);
BatchErrorHandler errorHandler = new BatchErrorHandler(reader);
reader.setErrorHandler(errorHandler);
reader.setFeature("http://apache.org/xml/features/continue-after-fatal-error", true);
reader.parse(new InputSource(new ByteArrayInputStream(streamedXML)));
It works fine, and the error messages are in English.
After reading the post "Locale specific messages in Xerces 2.11.0 (Java)" and also this post https://www.java.net//node/699069, I added these lines:
Locale l = new Locale("ru", "RU");
reader.setProperty("http://apache.org/xml/properties/locale", l);
I also added an XMLSchemaMessages_RU.properties file to the jar. Now I get a NullPointerException. Any ideas or hints? Thanks in advance!

I found this description of http://apache.org/xml/properties/locale:
Desc: The locale to use for reporting errors and warnings. When the value of this property is null the platform default returned from java.util.Locale.getDefault() will be used.
Type: java.util.Locale
Access: read-write
Since: Xerces-J 2.10.0
Note: If no messages are available for the specified locale the platform default will be used. If the platform default is not English and no messages are available for this locale then messages will be reported in English.
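A minimal sketch of setting that property defensively, assuming a JAXP SAX parser; the class and method names are hypothetical. Whether the property is recognized depends on which Xerces is actually loaded (the Apache jar or the JDK's internal copy), so the helper reports the outcome instead of failing:

```java
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class LocalePropertyDemo {
    // Property name from the Xerces documentation quoted above.
    static final String LOCALE_PROPERTY = "http://apache.org/xml/properties/locale";

    // Creates a SAX reader from whatever JAXP implementation is on the classpath
    // (Apache Xerces if its jar is present, otherwise the JDK's built-in parser).
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create SAX parser", e);
        }
    }

    // Returns true if the parser accepted the locale, false if it does not
    // recognize or support the property (messages then stay in the default language).
    static boolean trySetLocale(XMLReader reader, Locale locale) {
        try {
            reader.setProperty(LOCALE_PROPERTY, locale);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        boolean accepted = trySetLocale(reader, Locale.forLanguageTag("ru-RU"));
        System.out.println("Locale property accepted: " + accepted);
    }
}
```

If trySetLocale returns false, the parser in use does not know the property, and messages stay in the default language as the note above describes.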
I also found an example that creates a Locale object for the Russian language with this code:
Locale dLocale = new Locale.Builder().setLanguage("ru").setScript("Cyrl").build();
I don't know whether this will help; give it a try and let me know how it goes!
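Whichever way the Locale is built, a translated file is only found if its name matches what ResourceBundle looks for (and it sits in the same package directory as the original bundle). Assuming the parser loads its messages through the standard ResourceBundle lookup, this JDK-only sketch (hypothetical names) lists the candidate bundle names for both locales from this thread. The language part is the lowercase code ru, so a file named XMLSchemaMessages_RU.properties appears in neither list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleNameDemo {
    // Returns the bundle names the default ResourceBundle lookup tries for
    // baseName and locale, most specific first (the last one is the base bundle).
    static List<String> candidateBundleNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        String base = "XMLSchemaMessages";
        Locale ruRU = Locale.forLanguageTag("ru-RU"); // same as new Locale("ru", "RU")
        Locale ruCyrl = new Locale.Builder().setLanguage("ru").setScript("Cyrl").build();
        System.out.println(candidateBundleNames(base, ruRU));
        // [XMLSchemaMessages_ru_RU, XMLSchemaMessages_ru, XMLSchemaMessages]
        System.out.println(candidateBundleNames(base, ruCyrl));
        // [XMLSchemaMessages_ru_Cyrl, XMLSchemaMessages_ru, XMLSchemaMessages]
    }
}
```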

Related

PDF/A was validated correctly with preflight but online pdf-tools does not validate it

The Preflight tool (version 2.0.15) validates the generated PDF (created with PDFBox 2.0.15) as conforming, but online PDF tools (e.g. https://www.pdf-online.com/osa/validate.aspx) do not. I get the following errors:
Compliance pdfa-1b
Result Document does not conform to PDF/A.
Details
Validating file "file.pdf" for conformance level pdfa-1b
Anonymous RDF resources (rdf:Description without rdf:about attribute) are not allowed in XMP Metadata.
The appearance dictionary doesn't contain an entry.
The appearance dictionary doesn't contain an entry.
The appearance dictionary doesn't contain an entry.
The appearance dictionary doesn't contain an entry.
The appearance dictionary doesn't contain an entry.
The document does not conform to the requested standard.
The document contains annotations or form fields with ambigous or without appropriate appearances.
The document's meta data is either missing or inconsistent or corrupt.
The document does not conform to the PDF/A-1b standard.
Done.
To generate the metadata I use the code below:
private void addMetadata(PDDocument pdDocument,final String zzz,final String yyy) {
PDDocumentCatalog catalog = pdDocument.getDocumentCatalog();
PDDocumentInformation info = pdDocument.getDocumentInformation();
info.setCreationDate(Calendar.getInstance());
info.setModificationDate(Calendar.getInstance());
info.setAuthor(metadataAuthor);
info.setProducer(metadataProducer);
info.setTitle(zzz + "_" + yyy);
info.setKeywords("aaa");
info.setCreator("aaa");
info.setSubject("aaa");
PDMarkInfo markInfo = new PDMarkInfo();
markInfo.setMarked(true);
catalog.setMarkInfo(markInfo);
try {
PDMetadata metadataStream = new PDMetadata(pdDocument);
catalog.setMetadata( metadataStream );
XMPMetadata xmp = new XMPMetadata();
XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
xmp.addSchema(pdfaid);
pdfaid.setConformance("B");
pdfaid.setPart(1);
pdfaid.setAbout("");
XMPSchemaDublinCore dcSchema = xmp.addDublinCoreSchema();
dcSchema.setTitle( info.getTitle() );
dcSchema.addCreator("aaa");
dcSchema.setDescription( info.getSubject() );
XMPSchemaPDF pdfSchema = xmp.addPDFSchema();
pdfSchema.setKeywords( info.getKeywords() );
pdfSchema.setProducer( info.getProducer() );
XMPSchemaBasic basicSchema = xmp.addBasicSchema();
basicSchema.setModifyDate( info.getModificationDate() );
basicSchema.setCreateDate( info.getCreationDate() );
basicSchema.setCreatorTool( info.getCreator() );
metadataStream.importXMPMetadata(xmp.asByteArray());
InputStream colorProfile = getClass().getClassLoader().getResourceAsStream("icm/sRGB Color Space Profile.icm");
// create output intent
PDOutputIntent oi = new PDOutputIntent(pdDocument, colorProfile);
String value = "sRGB IEC61966-2.1";
oi.setInfo(value);
oi.setOutputCondition(value);
oi.setOutputConditionIdentifier(value);
oi.setRegistryName("http://www.color.org");
catalog.addOutputIntent(oi);
} catch (Exception e) {
e.printStackTrace();
}
}
Any suggestions?

As discussed in the comments:
1) The failure to report "The appearance dictionary doesn't contain an entry" is a bug in PDFBox preflight that will be fixed in 2.0.17; see PDFBOX-4586. According to this document:
An ISO 19005-1 validator shall FAIL otherwise conforming files in which a widget annotation lacks an appearance dictionary
2) The "rdf:Description without rdf:about attribute" message may or may not be a bug; VeraPDF doesn't consider it one. Your code uses a 1.8.* version. There, you can call dcSchema.setAbout("") to fix this. In 2.0.* the problem doesn't occur if you create the schema with metadata.createAndAddDublinCoreSchema().
I have created an issue in the VeraPDF project and they will bring this question for discussion at the next meeting of the Validation technical working group.
3) The widgets had no appearance entry because not enough information (e.g. the rectangle) was present at the time setValue() was called. That is why you got the message "widget of field aa has no rectangle, no appearance stream created".
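To see which schema the first message refers to, the XMP packet can be checked with plain JAXP before the PDF is written. A minimal sketch, assuming the packet is available as a String (for example the bytes passed to importXMPMetadata, decoded as UTF-8); the class name and sample packet are hypothetical. It flags every rdf:Description without an rdf:about attribute, which is what the setAbout("") call in point 2 is meant to add:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmpAboutCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Hypothetical sample packet: the first description has rdf:about, the second does not.
    static final String SAMPLE =
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
          + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
          + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">"
          + "<pdfaid:part>1</pdfaid:part></rdf:Description>"
          + "<rdf:Description xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
          + "<dc:format>application/pdf</dc:format></rdf:Description>"
          + "</rdf:RDF></x:xmpmeta>";

    // Counts rdf:Description elements that have no rdf:about attribute
    // (the "anonymous RDF resources" the online validator complains about).
    static int countAnonymousDescriptions(String xmpPacket) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmpPacket)));
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            int anonymous = 0;
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                // rdf:about="" counts as present; only a missing attribute is flagged
                if (!description.hasAttributeNS(RDF_NS, "about")) {
                    anonymous++;
                }
            }
            return anonymous;
        } catch (Exception e) {
            throw new IllegalArgumentException("XMP packet is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countAnonymousDescriptions(SAMPLE)); // prints 1
    }
}
```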

Invalid byte 1 of 1-byte UTF-8 sequence: RestTemplate [duplicate]

I am trying to fetch the XML below from the database using a Java method, but I get an error.
Code used to parse the XML:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new ByteArrayInputStream(cond.getBytes()));
Document doc = db.parse(is);
Element elem = doc.getDocumentElement();
// here we expect a series of <data><name>N</name><value>V</value></data>
NodeList nodes = elem.getElementsByTagName("data");
TableID jobId = new TableID(_processInstanceId);
Job myJob = Job.queryByID(_clientContext, jobId, true);
if (nodes.getLength() == 0) {
log(Level.DEBUG, "No data found on condition XML");
}
for (int i = 0; i < nodes.getLength(); i++) {
// loop through the <data> in the XML
Element dataTags = (Element) nodes.item(i);
String name = getChildTagValue(dataTags, "name");
String value = getChildTagValue(dataTags, "value");
log(Level.INFO, "UserData/Value=" + name + "/" + value);
myJob.setBulkUserData(name, value);
}
myJob.save();
The Data
<ContactDetails>307896043</ContactDetails>
<ContactName>307896043</ContactName>
<Preferred_Completion_Date>
</Preferred_Completion_Date>
<service_address>A-End Address: 1ST HELIERST HELIERJT2 3XP832THE CABLES 1 POONHA LANEST HELIER JE JT2 3XP</service_address>
<ServiceOrderId>315473043</ServiceOrderId>
<ServiceOrderTypeId>50</ServiceOrderTypeId>
<CustDesiredDate>2013-03-20T18:12:04</CustDesiredDate>
<OrderId>307896043</OrderId>
<CreateWho>csmuser</CreateWho>
<AccountInternalId>20100333</AccountInternalId>
<ServiceInternalId>20766093</ServiceInternalId>
<ServiceInternalIdResets>0</ServiceInternalIdResets>
<Primary_Offer_Name action='del'>MyMobile Blue £44.99 [12 month term]</Primary_Offer_Name>
<Disc_Reason action='del'>8</Disc_Reason>
<Sup_Offer action='del'>80000257</Sup_Offer>
<Service_Type action='del'>A-01-00</Service_Type>
<Priority action='del'>4</Priority>
<Account_Number action='del'>0</Account_Number>
<Offer action='del'>80000257</Offer>
<msisdn action='del'>447797142520</msisdn>
<imsi action='del'>234503184</imsi>
<sim action='del'>5535</sim>
<ocb9_ARM action='del'>false</ocb9_ARM>
<port_in_required action='del'>
</port_in_required>
<ocb9_mob action='del'>none</ocb9_mob>
<ocb9_mob_BB action='del'>
</ocb9_mob_BB>
<ocb9_LandLine action='del'>
</ocb9_LandLine>
<ocb9_LandLine_BB action='del'>
</ocb9_LandLine_BB>
<Contact_2>
</Contact_2>
<Acc_middle_name>
</Acc_middle_name>
<MarketCode>7</MarketCode>
<Acc_last_name>Port_OUT</Acc_last_name>
<Contact_1>
</Contact_1>
<Acc_first_name>.</Acc_first_name>
<EmaiId>
</EmaiId>
The error:
org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
I have read in some threads that it is caused by special characters in the XML.
How can I fix this issue?

How to fix this issue?
Read the data using the correct character encoding. The error message means that you are trying to read the data as UTF-8 (either deliberately or because that is the default encoding for an XML file that does not specify <?xml version="1.0" encoding="somethingelse"?>) but it is actually in a different encoding such as ISO-8859-1 or Windows-1252.
To be able to advise on how you should do this I'd have to see the code you're currently using to read the XML.
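To illustrate this explanation with plain JDK classes: bytes without an XML declaration are read as UTF-8, so a pound sign saved as ISO-8859-1 (the single byte 0xA3) triggers exactly this error, while telling the parser the real encoding through InputSource.setEncoding lets it read the same bytes. A sketch with hypothetical names and sample data:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DeclaredEncodingDemo {
    // Parses raw bytes and returns the root element's text. With encoding == null the
    // parser autodetects, which means UTF-8 for bytes without a BOM or XML declaration.
    static String rootText(byte[] xml, String encoding) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xml));
            if (encoding != null) {
                source.setEncoding(encoding); // tell the parser what the bytes really are
            }
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // rethrow fatal errors, no stderr output
            return db.parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // No XML declaration; 0xA3 is not a valid first byte of a UTF-8 sequence.
        byte[] latin1 = "<price>\u00A344.99</price>".getBytes(StandardCharsets.ISO_8859_1);
        try {
            rootText(latin1, null);
        } catch (IllegalStateException e) {
            System.out.println("Read as UTF-8: " + e.getMessage());
        }
        System.out.println("Read as ISO-8859-1: " + rootText(latin1, "ISO-8859-1"));
    }
}
```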

Open the xml in notepad
Make sure you don't have extra spaces at the beginning or end of the document.
Select File -> Save As
select save as type -> All files
Enter file name as abcd.xml
select Encoding - UTF-8 -> Click Save

Try:
InputStream inputStream= // Your InputStream from your database.
Reader reader = new InputStreamReader(inputStream,"UTF-8");
InputSource is = new InputSource(reader);
is.setEncoding("UTF-8"); // has no effect here: InputSource ignores the encoding when a character stream is supplied
saxParser.parse(is, handler);
If the data is in an encoding other than UTF-8, just change the encoding name accordingly.

I was getting the xml as a String and using xml.getBytes() and getting this error. Changing to xml.getBytes(Charset.forName("UTF-8")) worked for me.
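When the XML is already a String, as cond is in the question, another option is to skip the byte conversion altogether and give the parser a StringReader, so no charset (and no platform default) is involved. A sketch under that assumption; the class name and sample data are illustrative only:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseFromString {
    // Parses XML that is already a Java String. The StringReader hands the parser
    // characters, so no bytes are produced and no charset conversion happens.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String cond = "<data><name>Primary_Offer_Name</name>"
                + "<value>MyMobile Blue \u00A344.99 [12 month term]</value></data>";
        Document doc = parse(cond);
        System.out.println(doc.getElementsByTagName("value").item(0).getTextContent());
    }
}
```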

I had the same problem in my JSF application: a comment line in the XHTML page contained some special characters. When I compared it with the previous version in Eclipse, it had this comment:
//Some �  special characters found
I removed those characters and the page loaded fine. This is mostly related to XML files, so compare the failing file with the working version.

I had this problem even though the file was in UTF-8: somehow one character had got in that was not encoded in UTF-8. To solve the problem I did what is described in this thread, i.e. I validated the file:
How to check whether a file is valid UTF-8?
Basically you run the command:
$ iconv -f UTF-8 your_file -o /dev/null
If something is not valid UTF-8, iconv reports the position of the offending input so that you can find it.
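If iconv is not at hand (for example on Windows), the same check can be done from Java. A sketch using only the JDK's CharsetDecoder, configured to report rather than replace malformed input; the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // Returns the byte offset of the first malformed UTF-8 sequence, or -1 if all bytes are valid.
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT) // stop instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than there are bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    // 1-based line number of a byte offset, counting '\n' bytes before it.
    static int lineOf(byte[] data, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (data[i] == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utf8Check.java your_file (falls back to a small ISO-8859-1 sample)
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "ok\nbad \u00A3".getBytes(StandardCharsets.ISO_8859_1);
        int offset = firstInvalidOffset(data);
        System.out.println(offset < 0
                ? "valid UTF-8"
                : "invalid byte at offset " + offset + " (line " + lineOf(data, offset) + ")");
    }
}
```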

I happened to run into this problem because of an Ant build.
That Ant build took files and applied the expandproperties filterchain to them. During this filtering, my Windows machine's implicit default (non-UTF-8) character encoding was used to write the filtered files, so characters outside that character set could not be mapped correctly.
One solution was to provide Ant with an explicit environment variable for UTF-8.
In Cygwin, before launching Ant: export ANT_OPTS="-Dfile.encoding=UTF-8".

This error occurs when you try to load a JasperReports file with the .jasper extension, for example:
c://reports//EmployeeReport.jasper
A .jasper file is a compiled, binary report, so an XML parser cannot read it. Load the report source file with the .jrxml extension instead, for example:
c://reports//EmployeeReport.jrxml
Problem screenshot: https://i.stack.imgur.com/D5SzR.png
Solution screenshot: https://i.stack.imgur.com/VeQb9.png

I had a similar problem.
I had saved some XML in a file, and reading it into a DOM document failed because of a special character. I used the following code to fix it:
String enco = new String(Files.readAllBytes(Paths.get(listPayloadPath+"/Payload.xml")), StandardCharsets.UTF_8);
Document doc = builder.parse(new ByteArrayInputStream(enco.getBytes(StandardCharsets.UTF_8)));
Let me know if it works for you.
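Why this round trip "works" is worth knowing: new String(bytes, UTF_8) never fails on bad input; it replaces each malformed sequence with U+FFFD, so the re-encoded bytes are always valid UTF-8. A short JDK-only sketch (hypothetical names) of that behaviour; if the file really is in another encoding, the affected characters are replaced rather than repaired:

```java
import java.nio.charset.StandardCharsets;

class LossyRoundTrip {
    // Same conversion as new String(bytes, StandardCharsets.UTF_8) above:
    // malformed sequences become U+FFFD instead of causing an error.
    static String decodeLeniently(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A pound sign saved as ISO-8859-1 is the single byte 0xA3, which is invalid UTF-8.
        byte[] latin1 = "Blue \u00A344.99".getBytes(StandardCharsets.ISO_8859_1);
        String text = decodeLeniently(latin1);
        System.out.println(text.indexOf('\uFFFD') >= 0); // true: the pound sign is gone
        // Re-encoding yields valid UTF-8, so the parser stops complaining,
        // but the original character cannot be recovered from these bytes.
        byte[] reencoded = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(reencoded, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```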

I had the same problem, and after a long investigation of my XML file I found the cause: there were a few unescaped characters such as « and ».

If, like me, you understand character encoding principles, have read Joel's article (which, amusingly, contains wrong characters itself), and still can't figure out what is going on (spoiler alert: I'm a Mac user), then your solution may be as simple as deleting your local repository and cloning it again.
My code base had not changed since the last time it ran fine, so UTF errors made no sense, especially since our build system never complained... until I remembered that a few days earlier I had accidentally unplugged my computer while IntelliJ IDEA and the whole stack (Java/Tomcat/Hibernate) were running.
My Mac did a brilliant job of pretending nothing had happened and I carried on as usual, but the underlying file system had somehow been left corrupted. I wasted a whole day trying to figure this out. I hope it helps somebody.

I had the same issue. My problem was that the “-Dfile.encoding=UTF8” argument was missing from JAVA_OPTIONS in the startWebLogic.cmd file of the WebLogic server.

You may have a library that needs to be removed, such as the following:
implementation 'org.apache.maven.plugins:maven-surefire-plugin:2.4.3'

This error surprised me in production...
The error happens because the character encoding is wrong, so the best solution is to implement a way to auto-detect the input charset.
This is one way to do it:
...
import org.xml.sax.InputSource;
...
InputSource inputSource = new InputSource(inputStream);
someReader(
inputSource.getByteStream(), inputSource.getEncoding()
);
Input sample:
<?xml version="1.0" encoding="utf-16"?>
<rss xmlns:dc="https://purl.org/dc/elements/1.1/" version="2.0">
<channel>
...
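One caveat about the code above: per the SAX Javadoc, InputSource does not inspect the stream, and getEncoding() returns null unless setEncoding() was called. Detection happens inside the parser when it is given the raw byte stream (byte order mark plus XML declaration). A JDK-only sketch with hypothetical names, using a UTF-16 sample like the one above:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class AutodetectDemo {
    // InputSource only stores what setEncoding() was given, so this is null here.
    static String reportedEncoding(InputStream in) {
        return new InputSource(in).getEncoding();
    }

    // Handing the raw bytes to the parser lets it detect the encoding itself
    // from the byte order mark and the XML declaration.
    static String titleOf(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in))
                    .getElementsByTagName("title").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?>"
                + "<rss version=\"2.0\"><channel><title>\u00A344.99</title></channel></rss>";
        // Java's "UTF-16" encoder writes a big-endian byte order mark first.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(reportedEncoding(new ByteArrayInputStream(utf16))); // null
        System.out.println(titleOf(new ByteArrayInputStream(utf16)));
    }
}
```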

SAXParseException localized

I have a service that parses XML and produces a report listing the parser errors (SAXParseException, to be exact) using exception.getMessage() (exception.getLocalizedMessage() returns the same), so that humans can read and understand them. How can I localize these exception messages into a language other than English?

I've found a solution. First you need to get XMLSchemaMessages.properties from Apache Xerces. I downloaded Xerces-J-src.2.11.0.tar.gz from http://xerces.apache.org/, unpacked it, and took the file from: ...\src\org\apache\xerces\impl\msg.
Now rename this file to XMLSchemaMessages_pl.properties (or whichever locale you need) and place it on the classpath. My project uses Maven, so I put the file into: src\main\resources\com\sun\org\apache\xerces\internal\impl\msg.
And that's all. Changes to this file will be visible in exception messages.

According to the Javadoc, you need to extend SAXParseException and override getLocalizedMessage; the default implementation returns the same as getMessage.
Edit:
You can have a separate property file for each language, and in each one map a code to a localized message.
When you raise the exception, getLocalizedMessage then returns the appropriate message based on the locale and the code:
MySAXParseException ex = new MySAXParseException(<code>);
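A sketch of the subclass this answer describes, assuming the translations are kept in a map keyed by the code at the start of each Xerces schema message (e.g. cvc-complex-type.2.4.a). The class name, the key-extraction rule and the sample translation are hypothetical; in practice the map could be filled from a per-locale ResourceBundle:

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.SAXParseException;

// Keeps the parser's original text in getMessage() and returns a
// translated text from getLocalizedMessage().
class LocalizedSAXParseException extends SAXParseException {
    private static final long serialVersionUID = 1L;
    private final String localizedMessage;

    LocalizedSAXParseException(SAXParseException original, String localizedMessage) {
        super(original.getMessage(), original.getPublicId(), original.getSystemId(),
                original.getLineNumber(), original.getColumnNumber(), original);
        this.localizedMessage = localizedMessage;
    }

    @Override
    public String getLocalizedMessage() {
        return localizedMessage;
    }

    // Hypothetical lookup: uses the text before the first ':' as the key;
    // unknown keys fall back to the original message.
    static LocalizedSAXParseException translate(SAXParseException e, Map<String, String> messagesByKey) {
        String message = String.valueOf(e.getMessage());
        int colon = message.indexOf(':');
        String key = colon > 0 ? message.substring(0, colon) : message;
        return new LocalizedSAXParseException(e, messagesByKey.getOrDefault(key, message));
    }

    public static void main(String[] args) {
        Map<String, String> messages = new HashMap<>();
        messages.put("cvc-complex-type.2.4.a", "An element appears in the wrong place.");
        SAXParseException raw = new SAXParseException(
                "cvc-complex-type.2.4.a: Invalid content was found starting with element 'cID'.",
                null, null, 3, 15);
        LocalizedSAXParseException ex = translate(raw, messages);
        System.out.println(ex.getLocalizedMessage()); // An element appears in the wrong place.
        System.out.println(ex.getLineNumber());       // 3
    }
}
```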

Reading messages dynamically in Java1.6

I have multiple message files (messages_en.properties, messages_ch.properties).
These files contain some static HTML text and need dynamic input parameters, such as the username, so that the text reads "Dear {0}, thanks for subscription....".
I need to substitute the username after reading the content from the appropriate file.
How can I do that in Java? Is there any framework or sample code available?

See the I18N trail. Nutshell version from that tutorial, using newer API methods:
ResourceBundle messages = ResourceBundle.getBundle("MessageBundle", Locale.getDefault());
String output = MessageFormat.format(messages.getString("msg.key"), "Mike");
Depending on your actual use case there may be shortcuts (e.g., web frameworks often include direct support for localization via tag libraries, some libraries wrap up some of the busywork, etc.).

Check MessageFormat out:
String result = MessageFormat.format(
"Dear {0} , thanks for subscription....", username);
You can combine it with ResourceBundle getString method to read the message from your properties files through its key and output the formatted, dynamically filled message.
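Putting the two answers together: ResourceBundle supplies the text for the locale and MessageFormat fills in the placeholders. A self-contained sketch; the bundle is built from an in-memory string only so the example runs without files, and the keys are hypothetical. Note that MessageFormat treats a single quote as an escape character, so an apostrophe in a message must be written as two quotes:

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class GreetingDemo {
    // Stand-in for messages_en.properties; in a real project the bundle would be
    // loaded with ResourceBundle.getBundle("messages", locale) from the classpath.
    static final String MESSAGES_EN =
            "greeting=Dear {0}, thanks for subscribing!\n"
          + "reminder=Don''t forget to confirm your address, {0}.\n";

    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up a message and fills its {0}, {1}, ... placeholders.
    static String format(ResourceBundle bundle, Locale locale, String key, Object... args) {
        return new MessageFormat(bundle.getString(key), locale).format(args);
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(MESSAGES_EN);
        System.out.println(format(bundle, Locale.ENGLISH, "greeting", "Mike"));
        // Dear Mike, thanks for subscribing!
        System.out.println(format(bundle, Locale.ENGLISH, "reminder", "Mike"));
        // Don't forget to confirm your address, Mike.
    }
}
```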

Translating SAX exceptions

I have some Java code that validates XML against an XSD. I am using a modified version of the Error Handler found here: http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html to catch and log ALL exceptions while validating.
The errors are very terse, they look something like this:
http://www.w3.org/TR/xml-schema-1#cvc-complex-type.2.4.a?s:cID&{"http://www.myschema.com/schema":txn}
Other messages such as
http://www.w3.org/TR/xml-schema-1#cvc-complex-type.2.4.a?s:attributes&{"http://www.myschema.com/schema":sequence}
are even more cryptic.
Is there an easy way to get a clear and intelligible message out of SAX explaining what went wrong here? I think in the first error it was expecting txn and instead found the element cID. BUT... I don't know all the possible errors that might be generated by SAX so I'd rather not try to manually create a translation table.
The eventual users of this output are mostly non-technical so I need to be able generate simple and clear messages such as "element txn was out of sequence".
If it helps, here's the code (more or less) that's used for validation:
Source schema1 = new StreamSource(new File("resources/schema1.xsd"));
Source schema2 = new StreamSource(new File("resources/schema2.xsd"));
Source[] sources = {schema1,schema2};
validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources).newValidator();
ErrorHandler lenient = new ForgivingErrorHandler();
validator.setErrorHandler(lenient);
Elsewhere...
StreamSource xmlSource = new StreamSource(new StringReader(XMLData) );
try
{
validator.validate(xmlSource);
}
catch (SAXException e)
{
logger.error("XML Validation Error: ",e);
}

Well, it seems I had to add xsi:schemaLocation="http://www.mycompany.com/schema resources/schema1.xsd " to the XML document, because s:http://www.mycompany.com/schema is the default namespace: xmlns="s:http://www.mycompany.com/schema". Of course, I don't have access to modify the tool that generates the XML, so the following ugly hack was necessary:
xmlDataStr = xmlDataStr.replace("<rootNode ", "<rootNode xsi:schemaLocation=\"http://www.mycompany.com/schema resources/schema1.xsd \" ");
...of course now I'm getting double validation errors! A clear and intelligible one such as:
cvc-complex-type.2.4.a: Invalid content was found starting with element 's:cID'. One of '{"http://www.mycompany.ca/schema":tdr}' is expected.
Immediately followed by:
http://www.w3.org/TR/xml-schema-1#cvc-complex-type.2.4.a?s:cID&{"http://www.mycompany.com/schema":tdr}
The double-error is annoying but at least the first one is usable...
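The terse messages in the question appear to be the fallback format Xerces's error reporter produces when no message formatter is registered for the schema domain (domain#key?arg1&arg2). Independently of that, a handler that records every problem together with its line and column gives non-technical users something to act on. A JDK-only sketch with a hypothetical class name and a small sample schema:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class CollectingErrorHandler implements ErrorHandler {
    // Hypothetical schema: a txn must contain cID followed by amount.
    static final String SAMPLE_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"txn\"><xs:complexType><xs:sequence>"
          + "<xs:element name=\"cID\" type=\"xs:string\"/>"
          + "<xs:element name=\"amount\" type=\"xs:decimal\"/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        // Line and column let a reader find the spot even if the wording is technical.
        problems.add(level + " at line " + e.getLineNumber() + ", column "
                + e.getColumnNumber() + ": " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) {
        record("Warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("Error", e); // record and keep validating
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        record("Fatal", e);
        throw e; // not well-formed: the parser cannot continue anyway
    }

    // Validates xml against xsd and returns every problem reported.
    static List<String> validate(String xsd, String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.setErrorHandler(handler);
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            if (handler.problems.isEmpty()) {
                handler.problems.add("Fatal: " + e.getMessage()); // e.g. a broken schema
            }
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        String xml = "<txn>\n<amount>10</amount>\n<cID>42</cID>\n</txn>";
        for (String problem : validate(SAMPLE_XSD, xml)) {
            System.out.println(problem);
        }
    }
}
```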
