Performance-effective way to transform XML data represented as Writable - java

I'm working on a utility method that converts XML data into a formatted String. Before you conclude that this is a trivial task for javax.xml.transform.Transformer, let me explain the specific constraints I'm facing.
The input data does not exist when the conversion starts. It is represented as a groovy.lang.Writable (javadoc) instance that I can write to any java.io.Writer instance. The method signature looks like this:
static String serializeToString(Writable source)
My current solution involves a few steps and produces the expected result:
Create a StringWriter, write source to it and convert the result to a String
Create a javax.xml.transform.stream.StreamSource instance based on this string (using a StringReader)
Create a new StringWriter instance and wrap it in a javax.xml.transform.stream.StreamResult
Perform the transformation using an instance of javax.xml.transform.Transformer
Convert the StringWriter to a String
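For reference, the steps above can be sketched roughly as follows. This is only a minimal sketch: XmlWritable is a hypothetical stand-in for groovy.lang.Writable (same writeTo shape) so that the example needs no Groovy dependency, and the class name, the output properties and the exception wrapping are assumptions rather than my actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BufferedXmlFormatter {

    /** Hypothetical stand-in for groovy.lang.Writable. */
    interface XmlWritable {
        Writer writeTo(Writer out) throws IOException;
    }

    static String serializeToString(XmlWritable source) {
        try {
            // Step 1: render the source into an unformatted string.
            StringWriter raw = new StringWriter();
            source.writeTo(raw);

            // Steps 2-3: wrap that string as a StreamSource and prepare the result writer.
            StreamSource input = new StreamSource(new StringReader(raw.toString()));
            StringWriter formatted = new StringWriter();

            // Step 4: identity transformation with indentation switched on.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.transform(input, new StreamResult(formatted));

            // Step 5: the formatted document as a String.
            return formatted.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new IllegalStateException("XML formatting failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = serializeToString(out -> {
            out.write("<root><child>text</child></root>");
            return out;
        });
        System.out.println(xml);
    }
}
```

The whole document is held in memory twice here (raw and formatted), which is exactly the overhead in question.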
While the solution works, I'm not happy with its efficiency. This method will be called very often, so I want to optimize it. What I'd like to avoid is the need to perform multiple conversions along the way:
From Writable to String (unformatted)
From String to StreamSource (which means that the data will be parsed again)
From StreamSource to String again (formatted)
So the question is whether it's possible to build a pipe-like flow that eliminates the unnecessary conversions?
UPDATE #1:
To give a little more context, I'm converting a GPathResult instance to a formatted string using the StreamingMarkupBuilder.bindNode() method, which produces a Writable instance. Unfortunately there is no way to tell StreamingMarkupBuilder to produce formatted output.
UPDATE #2:
I did experiment with an implementation based on PipedWriter + PipedReader, but the experiments didn't show much speed gain from this approach. It looks like this is not that critical an issue in this case.
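A minimal sketch of what such a piped variant could look like, for anyone who wants to measure it themselves. The names are assumptions: XmlWritable is a hypothetical stand-in for groovy.lang.Writable, and the per-call executor is only there to keep the example short (a real implementation would presumably share a thread pool).

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PipedXmlFormatter {

    /** Hypothetical stand-in for groovy.lang.Writable. */
    interface XmlWritable {
        Writer writeTo(Writer out) throws IOException;
    }

    static String serializeToString(XmlWritable source) {
        // Assumption for brevity: one executor per call; share a pool in real code.
        ExecutorService producer = Executors.newSingleThreadExecutor();
        try (PipedReader pipeOut = new PipedReader(8192);
             PipedWriter pipeIn = new PipedWriter(pipeOut)) {

            // Producer thread: writes the unformatted XML, then closes the pipe
            // so that the parser on the other end sees end of input.
            Future<?> writing = producer.submit(() -> {
                try (Writer out = pipeIn) {
                    source.writeTo(out);
                }
                return null;
            });

            // Consumer (calling thread): parses and re-serializes while the producer writes.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter formatted = new StringWriter();
            transformer.transform(new StreamSource(pipeOut), new StreamResult(formatted));

            writing.get(); // rethrows a failure from the producer thread, if any
            return formatted.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException | ExecutionException e) {
            throw new IllegalStateException("XML formatting failed", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while formatting XML", e);
        } finally {
            producer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String xml = serializeToString(out -> {
            out.write("<root><child>text</child></root>");
            return out;
        });
        System.out.println(xml);
    }
}
```

This removes the intermediate unformatted String, but the XML is still serialized to characters and parsed again, and a second thread is involved, which may explain why the gain is small.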

I don't know exactly what you mean by "XML data", but you could consider representing the "yet-to-be" content as a SAXSource directly, thereby bypassing the "to-string" and "parse-string" steps.

Related

Should I use XML Parser to modify XML template or convert to String and then modify?

I'm working on a project which requires me to insert values into a predefined XML template. Until now I have been using the StringBuilder class to convert the XML file into a string and make the required changes. Now I would like to know whether using an XML parser such as DOM, JDOM, SAX, etc. would be more efficient than the approach I'm using.
Since there are no implementation issues, I don't think any piece of code needs to be shared.
Please see this; it may help you:
https://www.mkyong.com/java/how-to-modify-xml-file-in-java-dom-parser/
Modifying an XML document by string replacement is in general a bad practice, as you can accidentally make your XML invalid, so for your task I would rather use a simple XSL transformation.
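To illustrate the parser-based route, here is a minimal sketch using the JDK's built-in DOM parser. The class name, the element names and the idea of keying values by tag name are assumptions made for the example, not part of the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TemplateFiller {

    /** Sets the text of every element whose tag name is a key in the map. */
    static String fill(String templateXml, Map<String, String> values) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(templateXml)));

            for (Map.Entry<String, String> entry : values.entrySet()) {
                NodeList nodes = doc.getElementsByTagName(entry.getKey());
                for (int i = 0; i < nodes.getLength(); i++) {
                    // The serializer escapes special characters such as & and <.
                    nodes.item(i).setTextContent(entry.getValue());
                }
            }

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not fill XML template", e);
        }
    }

    public static void main(String[] args) {
        String template = "<order><customer/><note/></order>";
        System.out.println(fill(template, java.util.Collections.singletonMap("customer", "Smith & Sons")));
    }
}
```

The point of the example is the escaping: a value like "Smith & Sons" would break the document with plain string replacement, while the DOM route writes it as valid XML.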

Perform Linear Regression on data (from .arff file) - JAVA, Weka

I want to perform linear regression on a collection of data using Java. I have a couple of questions.
What data types does the linear regression method accept?
I have tried loading the data in purely nominal format as well as numeric, but when I pass that 'data' (an instance variable created in the program) to LinearRegression, I get this exception: Cannot handle Multi-Valued nominal class
How can I print the linear regression output to the console in Java? I have not managed to write the code for this. After going through the predefined LinearRegression.java class, I learned that buildClassifier() is the method that takes 'data' as input, but I can't get any further. Can anyone help me understand the sequence of steps needed to get output on the console?
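import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
protected static void useLinearRegression() throws Exception {
    // Backslashes must be escaped inside a Java string literal
    BufferedReader reader = new BufferedReader(new FileReader("c:\\somePath\\healthCare.arff"));
    Instances data = new Instances(reader);
    // Use the last attribute as the class attribute
    data.setClassIndex(data.numAttributes() - 1);
    LinearRegression2 rl = new LinearRegression2();
    rl.buildClassifier(data); // What after this? or before
}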
Linear regression accepts both nominal and numeric attributes. It is only the target (class) attribute that cannot be nominal; it has to be numeric, which is why you get the "Cannot handle multi-valued nominal class" exception.
The model's toString() method should print the model (other classifier options may also be required depending on your needs). If you are also after predictions and summary statistics, you will need an Evaluation object as well. There, toSummaryString() gives further statistics about the generated model; toMatrixString() prints a confusion matrix, so it applies only when the class is nominal and not to a regression model.
Hope this Helps!

How to add a code snippet to method body with JDT/AST

I'm trying to generate Java source code with JDT/AST. I now have a MethodDeclaration and want to add a code snippet (from another source) to the method body. The code snippet can contain any Java code, even syntactically invalid code. I just can't find a way to do this.
With JCodeModel you would use JBlock#directStatement(String s) method.
Is there a way to do this with JDT/AST?
Since you have a well-formed tree for the rest of the application, and you want to insert non-well-formed text at a particular place, you pretty much can't do it with the standard tree node insertion mechanisms.
What matters is that you produce the text of the valid program with the fragment inserted at the right place. Somewhere in there must be a piece of logic that prints the AST as text. What you need to do is ask for the AST to be printed as text, and intercept that process at the precise point necessary to insert your arbitrary text.
Our DMS Software Reengineering Toolkit has enter/exit print-node hooks in its prettyprinter to allow this kind of thing to happen.
If such hooks don't exist in JDT/AST, you can try to modify its prettyprinter to give you one. Alternatively, you might consider modifying JDT/AST by adding another tree node type that isn't part of the standard set, one that simply holds arbitrary text but acts like a method node. Presumably each node controls what is printed; you could then define the prettyprinting for that tree node so that it outputs its text.
A final really hacky solution: insert a perfectly valid AST where the arbitrary text will go, containing somewhere a bogus identifier with a unique name, e.g., ZZZ. Then, print the AST to a string, and post-process the string to replace the bogus trees containing the unique name with the actual user text.
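The post-processing step of that last hack might look like the sketch below. It deliberately uses plain strings only: the printed source is assumed to come from whatever prints your AST, and the class name, method name and placeholder statement are made up for illustration.

```java
class SnippetSplicer {

    /** Replaces the single occurrence of a placeholder statement with raw snippet text. */
    static String splice(String printedSource, String placeholder, String rawSnippet) {
        int first = printedSource.indexOf(placeholder);
        if (first < 0) {
            throw new IllegalArgumentException("Placeholder not found: " + placeholder);
        }
        // Guard against the bogus name accidentally appearing twice.
        if (printedSource.indexOf(placeholder, first + placeholder.length()) >= 0) {
            throw new IllegalArgumentException("Placeholder is not unique: " + placeholder);
        }
        return printedSource.substring(0, first)
                + rawSnippet
                + printedSource.substring(first + placeholder.length());
    }

    public static void main(String[] args) {
        // Stand-in for the text produced by printing the AST.
        String printed = "public void myMethod() {\n  ZZZ_PLACEHOLDER_42();\n}\n";
        // The snippet does not have to be valid Java.
        System.out.print(splice(printed, "ZZZ_PLACEHOLDER_42();", "if (x { // deliberately invalid"));
    }
}
```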
You first need to parse the code snippet into an AST. You can use the ASTParser API for this purpose.
It is possible to get the compilation problems of a compilation unit (See CompilationUnit.getProblems()).
There are a couple of ways to modify Java code using JDT. I'd suggest that you consider the ASTRewrite API for modifying the body of a method.
You can manipulate the AST with the ASTRewrite API, and the output doesn't even have to compile.
Here's an example for your case:
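// Assumes an ASTRewrite for the method's AST, created for example like this
ASTRewrite rewriter = ASTRewrite.create(methodDeclaration.getAST());
String textToInsert = "Some text";
StringLiteral stringLiteral = methodDeclaration.getAST().newStringLiteral();
// Set the raw text through the rewriter rather than through setEscapedValue()
rewriter.set(stringLiteral, StringLiteral.ESCAPED_VALUE_PROPERTY, textToInsert, null);
ListRewrite methodStatements = rewriter.getListRewrite(methodDeclaration.getBody(), Block.STATEMENTS_PROPERTY);
methodStatements.insertFirst(stringLiteral, null);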
Result:
public void myMethod() {
Some text
}

JSON, XML or String concatenation

I am building a new application and need to choose which protocol to use in it. I have tried String concatenation and XML before, but never JSON. Which of these three is better in terms of performance? I am aware that XML is much better than string concatenation. So which should I use, XML or JSON? Or maybe a newer technology that I am not aware of?
Thanks in advance
By "XML is much better than string concatenation" I mean that with String concatenation I add the values and splitters to a string and then loop to find the splitters on the device, as in this example:
String toSend = "test1////test2////test3////test4////test5";
Here the splitter is "////" and I am sending 5 values. With thousands of values, extracting them this way will be much slower than with XML.
It depends. :)
Well, actually I think properly written code to split a string will be faster than an XML/JSON parser; however, XML/JSON parsers are reliable in that they return exactly the same data structure. For instance, how would you handle the case where your data itself includes splitters? If that cannot happen under your business logic, you can just go with string joining/splitting. Otherwise it is better not to reinvent the wheel and to use XML/JSON (JSON is more lightweight).
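A small sketch of the problem with splitters inside the data, using the "////" splitter from the question. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitterDemo {

    static final String SPLITTER = "////";

    static String join(List<String> values) {
        return String.join(SPLITTER, values);
    }

    static List<String> split(String message) {
        // Pattern.quote makes the splitter a literal; -1 keeps trailing empty values.
        return Arrays.asList(message.split(Pattern.quote(SPLITTER), -1));
    }

    public static void main(String[] args) {
        // Round trip works as long as no value contains the splitter: 5 values in, 5 out.
        List<String> clean = Arrays.asList("test1", "test2", "test3", "test4", "test5");
        System.out.println(split(join(clean)).size());

        // One value contains the splitter: 2 values in, 3 values out.
        List<String> tricky = Arrays.asList("a////b", "c");
        System.out.println(split(join(tricky)).size());
    }
}
```

Avoiding the second case means inventing an escaping scheme, which is the part that XML and JSON libraries already handle for you.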
It depends on the kind of Objects you will be exchanging.
It also depends on the way you will request and use your objects.
If you want to provide a REST service that exposes simple objects to be consumed directly by a JavaScript GUI, I would go for JSON. But don't build the JSON with hand-made String concatenation; use a library.
But if you plan to exchange more complex data between various Java-based "services", I would probably go for XML, especially if you can first write the XSD that defines your XML objects. You will then be able to generate Java classes and let JAXB do the boring marshalling/unmarshalling.
I would choose JSON, it's very portable and lightweight (lighter than XML).

Designing classes around a StAX parsing

This is a design question rather than a Java-specific question, but I'm designing it for Java.
I've been writing some XML pull parsing classes to handle a custom XML response and as I design them, I can't help but wonder whether there's something better. Maybe someone even has a design pattern for it.
So, here's what my XML may look like:
<ResponseRoot>
  <Header>
    <RequestId />
    <OtherHeaderMetaData />
  </Header>
  <Body>
    ...
    <!-- Lots of other elements and nested elements -->
    ...
  </Body>
</ResponseRoot>
So depending on the RequestId (a key of sorts), the Body element is different. Given that this is pull parsing, I'd have a large switch statement and lots of if-else-if blocks.
Would it be more efficient for one class with lots of static methods to handle the whole XML stream, or would it make for a better design to have one class responsible for each RequestId?
I was thinking of mapping RequestId to a class name, and then when I hit Body, I use a factory to retrieve the appropriate subparser. Inside that factory, I could even use a mapping of Class instances and use reflection to instantiate the appropriate parser (since not all parsers are needed all the time). Or... use reflection to grab the appropriate static parsing method instead, so I don't need to instantiate parsers that are really just 1-use classes...
Yea, I'm thinking too deeply, but since this is just a personal project, I just got curious about how people design parsing classes around a StAX parser.
So depending on the RequestId (a key of sorts), the Body element is different.
Can you redesign the XML so that a valid body element does not depend on the request ID, but is determined entirely by the surrounding response element? Then document validity (conformance to the DTD) would correspond to message response validity.
Instead of using a switch statement, consider using the state design pattern. That is, implement your document handler as a finite state machine.
I'd definitely go for a separate processor class per request type. Your factory approach sounds good, but don't bother with the reflection stuff; just create those processor objects directly. Thirty-something bytes of heap space are a very reasonable price for easily readable code.
One thing that is important with StAX, though: in the documentation of every processing method you should describe what state that method expects the input to be in (e.g. before or after having processed the opening <body> tag) and where it leaves the input after processing. You can save yourself hours of frustrating debugging this way.
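A minimal sketch of that layout with the JDK's StAX API. It assumes that RequestId carries its key as element text and that the body contains Item elements; both are invented for the example, as are the class and method names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ResponseParser {

    /**
     * Parses one kind of Body.
     * On entry: the reader is positioned on the <Body> start tag.
     * On exit: the reader is positioned on the matching </Body> end tag.
     */
    interface BodyParser {
        Object parseBody(XMLStreamReader reader) throws XMLStreamException;
    }

    private final Map<String, BodyParser> parsers = new HashMap<>();

    void register(String requestId, BodyParser parser) {
        parsers.put(requestId, parser);
    }

    Object parse(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String requestId = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("RequestId".equals(name)) {
                    requestId = reader.getElementText().trim();
                } else if ("Body".equals(name)) {
                    // Plain map lookup, no reflection: one small object per request type.
                    BodyParser parser = parsers.get(requestId);
                    if (parser == null) {
                        throw new IllegalStateException("No parser for RequestId " + requestId);
                    }
                    return parser.parseBody(reader);
                }
            }
            throw new IllegalStateException("No <Body> element found");
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed response", e);
        }
    }

    /** Hypothetical body parser: collects the text of every <Item> inside <Body>. */
    static List<String> parseItems(XMLStreamReader reader) throws XMLStreamException {
        List<String> items = new ArrayList<>();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT && "Item".equals(reader.getLocalName())) {
                items.add(reader.getElementText());
            } else if (event == XMLStreamConstants.END_ELEMENT && "Body".equals(reader.getLocalName())) {
                break; // stop on </Body>, as the contract above promises
            }
        }
        return items;
    }

    public static void main(String[] args) {
        ResponseParser parser = new ResponseParser();
        parser.register("listItems", ResponseParser::parseItems);
        System.out.println(parser.parse(
                "<ResponseRoot><Header><RequestId>listItems</RequestId></Header>"
                + "<Body><Item>a</Item><Item>b</Item></Body></ResponseRoot>"));
    }
}
```

The entry and exit positions are written into the interface's Javadoc, so every body parser can be implemented and debugged against the same contract.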
Instead of putting the responsibility of parsing each custom XML object on the StAX parser, why not have the StAX parser create an intermediate representation of the XML object? Then, you could have a factory which would construct a final representation of the XML object using the RequestID. The code would look similar to:
IntermediateObject io = StAX.parse(XML);
FinalObject fo = Factory.create(io.getRequestID(), io);
The upside of using this approach is that you're separating responsibilities. The StAX parser will only parse the XML, while the factory would be responsible for doing further processing with that information.
