I have a service that parses XML and produces a report with a list of parser errors (SAXParseException, to be exact) using exception.getMessage() (exception.getLocalizedMessage() returns the same text) so that humans can read and understand them. How can I localize these exception messages into a language other than English?
I've found a solution. First, get XMLSchemaMessages.properties from Apache Xerces. I downloaded Xerces-J-src.2.11.0.tar.gz from http://xerces.apache.org/, unzipped it, and took the file from: ...\src\org\apache\xerces\impl\msg.
Now rename the file to XMLSchemaMessages_pl.properties (or whatever locale you need) and place it on the classpath. My project uses Maven, so I put the file into: src\main\resources\com\sun\org\apache\xerces\internal\impl\msg.
And that's all. Changes to this file will show up in the exception messages.
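For illustration, the entries in the renamed file are ordinary key = message pairs; for example (the key below is one of the standard XML Schema validation codes, and the message text is only a placeholder for your translation):

cvc-complex-type.2.4.a = <your translated message; placeholders such as {0} and {1} are filled in by Xerces>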
As per the Javadoc, you need to extend SAXParseException and override getLocalizedMessage; the default implementation returns the same text as getMessage.
Edit:
You can have a separate properties file for each language, and in each one map an error code to the localized message.
When you raise the SAXParseException, it returns the appropriate message based on the locale and the error code.
MySAXParseException ex = new MySAXParseException(<code>);
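A minimal sketch of such a subclass (the class name, bundle name, and error codes are hypothetical; it falls back to the default message when no translation is found):

import java.util.Locale;
import java.util.ResourceBundle;
import org.xml.sax.Locator;
import org.xml.sax.SAXParseException;

public class MySAXParseException extends SAXParseException {
    private final String code;

    public MySAXParseException(String code, Locator locator) {
        super(code, locator);
        this.code = code;
    }

    @Override
    public String getLocalizedMessage() {
        // looks up e.g. MyParserMessages_pl.properties on the classpath,
        // where each line maps an error code to a translated message
        ResourceBundle bundle = ResourceBundle.getBundle("MyParserMessages", Locale.getDefault());
        return bundle.containsKey(code) ? bundle.getString(code) : getMessage();
    }
}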
I'm looking for a way to access the name of the file being processed during the data transformation within a DoFn.
My pipeline is as shown below:
Pipeline p = Pipeline.create(options);
p.apply(FileIO.match()
        .filepattern(options.getInput())
        .continuously(Duration.standardSeconds(5), Watch.Growth.<String>never()))
 .apply(FileIO.readMatches()
        .withCompression(Compression.GZIP))
 .apply(XmlIO.<MyString>readFiles()
        .withRootElement("root")
        .withRecordElement("record")
        .withRecordClass(MyString.class)) // <-- this only returns the contents of the file
 .apply(ParDo.of(new ProcessRecord()))   // <-- I need to access the file name here
 .apply(ParDo.of(new FormatRecord()))
 .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(5))))
 .apply(new CustomWrite(options));
Each file being processed is an XML document. While processing its content, I also need access to the name of the file so I can include it in the transformed record.
Is there a way to achieve this?
This post asks a similar question, but since I'm trying to use XmlIO, I haven't found a way to access the file metadata.
Below is an approach I found online, but I'm not sure whether there is a way to use it in the pipeline described above.
p.apply(FileIO.match()
        .filepattern(options.getInput())
        .continuously(Duration.standardSeconds(5), Watch.Growth.<String>never())) // file metadata
 .apply(FileIO.readMatches()
        .withCompression(Compression.GZIP)) // readable files
 .apply(MapElements
        .into(TypeDescriptors.kvs(TypeDescriptors.strings(), new TypeDescriptor<ReadableFile>() {}))
        .via((ReadableFile file) ->
            KV.of(file.getMetadata().resourceId().getFilename(), file)));
Any suggestions are highly appreciated.
Thank you for your time reviewing this.
EDIT:
I took Alexey's advice and implemented a custom XmlIO. It would be nice if we could just extend the class we need and override the appropriate method. However, in this specific case there was a reference to a method that is protected within the SDK, because of which I couldn't easily override what I needed and instead ended up copying a whole bunch of files. While this works for now, I hope there will be a more straightforward way to access the file metadata in these IO implementations in the future.
I don't think it's possible to do this out of the box with the current implementation of XmlIO, since it returns a PCollection<T> where T is the type of your XML record and, if I'm not mistaken, there is no way to add a file name there. However, you can still try to "reimplement" ReadFiles and XmlSource so that they return the parsed payload together with the input file metadata.
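As an illustration of a lighter-weight alternative (a plain DoFn rather than a reimplementation of XmlSource), the hypothetical transform below parses each ReadableFile itself with StAX and JAXB, so the file name stays available next to every record; it assumes MyString is JAXB-annotated, which XmlIO already requires:

import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class ParseXmlWithFileName extends DoFn<FileIO.ReadableFile, KV<String, MyString>> {

  private transient JAXBContext jaxb;

  @Setup
  public void setup() throws Exception {
    jaxb = JAXBContext.newInstance(MyString.class);
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    FileIO.ReadableFile file = c.element();
    String fileName = file.getMetadata().resourceId().getFilename();

    // readFullyAsUTF8String() honours the compression chosen in FileIO.readMatches()
    String xml = file.readFullyAsUTF8String();

    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    Unmarshaller unmarshaller = jaxb.createUnmarshaller();

    // walk the document and unmarshal every <record> element into MyString
    while (reader.hasNext()) {
      if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
          && "record".equals(reader.getLocalName())) {
        MyString record = unmarshaller.unmarshal(reader, MyString.class).getValue();
        c.output(KV.of(fileName, record));
        // unmarshal() leaves the cursor just after </record>, so don't advance here
      } else {
        reader.next();
      }
    }
    reader.close();
  }
}

It would replace the XmlIO.<MyString>readFiles() step, e.g. .apply(ParDo.of(new ParseXmlWithFileName())) straight after FileIO.readMatches(), with the downstream ProcessRecord adjusted to consume KV<String, MyString>.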
I am trying to get information from many XML files in a directory.
How can I extract specific information from each one and write it to an Excel file in Java?
file 1.xml
file 2.xml
file 3.xml
*********
file.csv or file.xls with the information from the 'n' XML files
There are several libraries in Java that can help you do this.
For instance, to get information out of the XML you can use dom4j and extract the specific pieces with the XPath query language, which the library supports (examples). And to read all the XML files from a directory, Java 8 has an easy way of achieving that:
Files.list(Paths.get("/path/to/xml/files"))
.map(YourXMLParser::parse)
.forEach(XLSExporter::export);
where the parse method would look something like this:
public MyDataBean parse(Path path) {
    try (InputStream inputStream = Files.newInputStream(path)) {
        // build the document (SAX-based parsing) and pull out the fields you need, e.g. via XPath
        Document document = new SAXBuilder().build(inputStream);
        return new MyDataBean(/* ... values extracted from document ... */);
    } catch (IOException | JDOMException e) {
        throw new RuntimeException("Failed to parse " + path, e);
    }
}
Since Files.list() returns a Stream, you can take advantage of that and use map and forEach.
Once you have the information from each XML file, you can export it to XLS using the most widely used Java library for that: Apache POI.
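For the export step, a rough sketch with Apache POI (MyDataBean and its getters are hypothetical; in practice you would collect the parsed beans first and then write them in one go):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XLSExporter {

    public static void export(List<MyDataBean> beans) throws IOException {
        try (Workbook workbook = new XSSFWorkbook()) {
            Sheet sheet = workbook.createSheet("data");
            int rowIndex = 0;
            for (MyDataBean bean : beans) {
                // one row per parsed XML file; the getters are hypothetical
                Row row = sheet.createRow(rowIndex++);
                row.createCell(0).setCellValue(bean.getName());
                row.createCell(1).setCellValue(bean.getValue());
            }
            try (OutputStream out = Files.newOutputStream(Paths.get("file.xlsx"))) {
                workbook.write(out);
            }
        }
    }
}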
I hope this helps.
As I'm trying to automate the API testing process, I have to pass an XML file to the read method, for example:
Given request read ( varXmlFile )
FYI: the XML file is in the same folder as the feature file.
Doing this, it throws an exception like this:
com.intuit.karate.exception.KarateException: called: D:\workspace\APIAutomationDemo\target\test-classes\com\org\features\rci_api_testing.feature, scenario: Get Membership Details, line: 15
javascript evaluation failed: read (varXmlFile )
So does Karate not allow this, or is there some other alternative?
Any suggestions, please.
Thanks
Please ensure the variable is set:
* def varXmlFile = 'some-xml-file.xml'
Given request read(varXmlFile)
Or just use normally:
Given request read('some-xml-file.xml')
The problem got solved: the variable varXmlFile held the file name along with single quotes, like this: 'SampleXmlRequest.xml'.
So I removed the single quotes when returning the value from the method.
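For illustration only (the helper below is hypothetical), the method backing varXmlFile just needs to return the bare name:

public static String getXmlFileName() {
    // return the bare name; "'SampleXmlRequest.xml'" (with embedded quotes) breaks read()
    return "SampleXmlRequest.xml";
}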
We are in the process of converting over to using the XSLT compiler for page generation. I have a Xalan Java extension that uses the CSSDK to capture some metadata we have stored in the Extended Attributes for output to the page. There are no problems getting the EAs rendered to the output file.
The problem is that I don't know how to dynamically capture the file path and name of the output file.
So, just as a proof of concept, I have the CSVPath hard-coded to the output file in my Java extension. Here's a code sample:
CSSimpleFile sourceFile = (CSSimpleFile)client.getFile(new CSVPath("/some-path-to-the-output.jsp"));
Can someone point me to where in the CSSDK I could capture the output file path and name?
I found the answer.
First, get or create your CSClient. You can use the examples provided in cssdk/samples. I tweaked one so that it captures the CSClient in the method getClientForCurrentUser(). Watch out for SOAP vs. Java connections: in development I was using a SOAP connection, but for the make_toolkit build the Java connection was required for our purposes.
Check the following snippet. The CSClient is captured in the static variable client.
CSSimpleFile sourceFile = (CSSimpleFile)client.getFile(new CSVPath(XSLTExtensionContext.getContext().getOutputDirectory().toString() + "/" + XSLTExtensionContext.getContext().getOutputFileName()));
Background: I have a data supplier that is providing XML documents with a bogus character encoding. It is not a valid encoding name (but it is essentially ISO 8859-1.) I cannot get this supplier to change the format.
Attempting to parse these XML documents using a DOM parser results in an UnsupportedEncodingException being thrown. This is probably normal behavior, and I can work around it by writing a Charset that wraps the ISO-8859-1 character encoding and by writing a CharsetProvider to support it. When I add this provider to META-INF/services/java.nio.charset.spi.CharsetProvider, everything works well and my Charset is used to read the XML with no additional coding.
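For reference, this is roughly what that workaround looks like (the class and charset names below are hypothetical; the charset simply delegates to ISO-8859-1, and the provider is registered by putting its fully qualified class name in META-INF/services/java.nio.charset.spi.CharsetProvider):

import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

public class BogusEncodingCharsetProvider extends CharsetProvider {

    // the bogus encoding name the supplier puts in the XML declaration (hypothetical)
    private static final String BOGUS_NAME = "x-bogus-latin1";

    private final Charset charset = new BogusEncodingCharset();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singleton(charset).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        return BOGUS_NAME.equalsIgnoreCase(charsetName) ? charset : null;
    }

    // a Charset that answers to the bogus name but delegates all work to ISO-8859-1
    static class BogusEncodingCharset extends Charset {

        private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

        BogusEncodingCharset() {
            super(BOGUS_NAME, new String[0]);
        }

        @Override
        public boolean contains(Charset cs) {
            return LATIN1.contains(cs);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return LATIN1.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return LATIN1.newEncoder();
        }
    }
}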
Here is the problem that I cannot solve: how to get Hadoop to recognize this Charset and CharsetProvider. I am running a Hadoop job to read sequence files from HDFS, where each record is one of the above-described XML documents. I cannot get my Charset to be recognized and used by the DOM parser. The system is running Java 1.6, Hadoop 0.20.2, and the XML parser is the internal Xerces parser built into Java 1.6.
Some additional details:
I can force the CharsetProvider to load manually in my code, by doing the following (using the "context class loader"), but I still cannot instantiate the Charset, and the XML parsing fails:
ClassLoader cl = Thread.currentThread().getContextClassLoader();
ServiceLoader<CharsetProvider> serviceLoader = ServiceLoader.load(CharsetProvider.class, cl);
for (CharsetProvider i : serviceLoader) {
LOG.info("CharsetProvider[1]: " + i);
}
Looking at the list of available Charsets, I see my encoding is present when I run as a standalone Java app, but not when I run inside Hadoop.
Set<String> charsetNames = Charset.availableCharsets().keySet();
for (String name : charsetNames) {
LOG.info("Charset: " + name);
}
The following fails under Hadoop, but works otherwise:
Charset cs = Charset.forName(MY_CHARSET_NAME);
I suspect there is some magic configuration I need to tell Hadoop to load my CharsetProvider, but I cannot find out how.