JAXB: Get Tag as String - java

This question may have been answered before in some dark recess of the Interwebs, but I couldn't even figure out how to form a meaningful Google query to search for it.
So: Suppose I have a (simplified) XML document like so:
<root>
<tag1>Value</tag1>
<tag2>Word</tag2>
<tag3>
<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>
</tag3>
</root>
I know how to use JAXB to unmarshal this into a Java Object in the standard use cases.
What I don't know how to do is unmarshal tag3's contents wholesale into a String. By which I mean:
<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>
as a String, tags and all.

Use the @XmlAnyElement annotation.
I was looking for the same solution and expected to find an annotation that prevents the DOM from being parsed and leaves the content as it is, but I did not find one.
Detail at:
Using JAXB to extract inner text of XML element
and
http://blog.bdoughan.com/2011/04/xmlanyelement-and-non-dom-properties.html
I added one check to the getElement() method; without it we could get an IndexOutOfBoundsException:
if (xml.indexOf(START_TAG) < 0) {
return "";
}
I find the behavior of this solution quite strange. The getElement() method is called for every tag of the XML. The first call receives "Value", the second "ValueWord", and so on: each call appends the next tag's content to the previous one.
Update:
I noticed that this approach works only for ONE occurrence of the tag that we want to parse to a String. It is impossible to correctly parse the following example:
<root>
<parent1>
<tag1>Value</tag1>
<tag2>Word</tag2>
<tag3>
<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>
</tag3>
</parent1>
<parent2>
<tag1>Value</tag1>
<tag2>Word</tag2>
<tag3>
<something1>TheSecondFoo</something1>
<something2>TheSecondBar</something2>
<something3>TheSecondBaz</something3>
</tag3>
</parent2>
</root>
The "tag3" under "parent2" will contain the values from the first tag (Foo, Bar, Baz) instead of (TheSecondFoo, TheSecondBar, TheSecondBaz).
Any suggestions are appreciated.
Thanks.

I have a utility method that might come in handy for you in that case. See if it helps. I wrote some sample code using your example:
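import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String text = "<root><tag1>Value</tag1><tag2>Word</tag2><tag3><something1>Foo</something1><something2>Bar</something2><something3>Baz</something3></tag3></root>";
    System.out.println(extractTag(text, "<tag3>"));
}

// Returns the text between the first occurrence of the tag and its closing tag, or "" if it is absent.
public static String extractTag(String xml, String tag) {
    String value = "";
    String endTag = "</" + tag.substring(1);
    // Quote both tags so they match literally; DOTALL lets '.' span line breaks in formatted XML.
    Pattern p = Pattern.compile(Pattern.quote(tag) + "(.*?)" + Pattern.quote(endTag), Pattern.DOTALL);
    Matcher m = p.matcher(xml);
    if (m.find()) {
        value = m.group(1);
    }
    return value;
}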
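A regex only finds the first match and gets confused by nested tags of the same name, so here is a rough alternative sketch that uses only the JDK's DOM parser and identity Transformer instead of JAXB. It is not the @XmlAnyElement approach from the other answer; the class name InnerXmlExtractor and the method innerXml are made up for illustration. It returns the serialized children of every element with the given name, so repeated tag3 elements each get their own string:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXB: extracts inner XML with plain JDK APIs.
final class InnerXmlExtractor {

    // Returns one string per element named tagName, holding its children "tags and all".
    static List<String> innerXml(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // A Transformer without a stylesheet copies its input to the output unchanged.
            Transformer copy = TransformerFactory.newInstance().newTransformer();
            copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> result = new ArrayList<>();
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = 0; i < matches.getLength(); i++) {
                StringWriter out = new StringWriter();
                NodeList children = matches.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    copy.transform(new DOMSource(children.item(j)), new StreamResult(out));
                }
                result.add(out.toString());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract inner XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><parent1><tag3><something1>Foo</something1></tag3></parent1>"
                + "<parent2><tag3><something1>TheSecondFoo</something1></tag3></parent2></root>";
        innerXml(xml, "tag3").forEach(System.out::println);
    }
}
```

Note that the strings are re-serialized from the DOM, so insignificant details such as attribute quoting may differ from the original input.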

Related

How do I make an xpath with a variable for a selenide test

I want to write a testmethod which I can give a parameter which will define which element to test.
Something like;
public void addImage(String imageNr){
    $(By.xpath("(//input[@name='image'])['" + imageNr + "']"));
}
To get e.g. (//input[@name='image'])[2] or (//input[@name='image'])[3]
How would I go about that?
Selenide has something called an ElementsCollection. More information can be found on this page: https://selenide.gitbooks.io/user-guide/content/en/selenide-api/elements-collection.html
You can get an ElementsCollection instead of a single SelenideElement by using double dollar signs. For example:
$$(By.xpath("//input[@name='image']")).get(pageNr)
This first collects all matching elements, and .get, which takes a zero-based int index, then picks one element from the returned list.
You will still need to perform an action on the result, for example .click().
Good luck with it.
You can build the XPath expression with String.format, as follows:
public void addImage(String imageNr) {
    // %s is the String.format placeholder; {0} is MessageFormat syntax and would be left untouched.
    String xpath = String.format("(//input[@name='image'])[%s]", imageNr);
    $(By.xpath(xpath));
}
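To see that the formatted expression really selects the n-th match, here is a small sketch that evaluates it with the JDK's own XPath engine against a made-up snippet, without Selenide or a browser. The class name IndexedXPath, the sample markup, and the id attributes are assumptions for illustration only. Note that XPath positions are 1-based and the index must not be quoted:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: checks an indexed XPath expression with plain JDK APIs.
final class IndexedXPath {

    // Builds (//input[@name='image'])[n]; the parentheses make [n] apply to the whole result set.
    static String imageXPath(int index) {
        return String.format("(//input[@name='image'])[%d]", index);
    }

    // Returns the id attribute of the n-th matching input, or "" if there is no such element.
    static String idOfMatch(String xml, int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(imageXPath(index) + "/@id", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<form><input name='image' id='a'/>"
                + "<div><input name='image' id='b'/></div>"
                + "<input name='other' id='c'/>"
                + "<input name='image' id='d'/></form>";
        System.out.println(imageXPath(2));
        System.out.println(idOfMatch(xml, 2));
    }
}
```

The same string returned by imageXPath can be passed to By.xpath in the test method.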

How to extract a String from a changing template in Java?

I have a question regarding best practices considering Java regular expressions/Strings manipulation.
I have a changing String template, let's say this time it looks like this:
/get/{id}/person
I have another String that matches this pattern eg.
/get/1234ewq/person
Keep in mind that the pattern could change anytime, slashes could disappear etc.
I would like to extract the difference between the two of them i.e. the result of the processing would be 1234ewq.
I know I could iterate over them char by char and compare, but, if it is possible, I wanted to find some smart approach to it with regular expressions.
What would be the best Java approach?
Thank you.
To answer your question with a regex approach, I built a small example class that should point you in a direction you could take (see below).
The problem with this approach is that you dynamically create a regular expression that depends on your template strings. This means that you have to somehow verify that your templates do not interfere with the regex compilation and matching process itself.
Also, at the moment, if you use the same placeholder multiple times within a template, the resulting HashMap only contains the value for the last mapping of that placeholder.
Normally this is the expected behaviour, but it depends on your strategy for filling your templates.
For template processing in general you could have a look at the mustache library.
Also, as Uli Sotschok mentioned, you would probably be better off using something like google-diff-match-patch.
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringExtractionFromTemplate {
    public static void main(String[] args) {
        String template = "/get/{id}/person";
        String filledTemplate = "/get/1234ewq/person";
        System.out.println(diffTemplateInsertion(template, filledTemplate).get("id"));
    }

    private static HashMap<String, String> diffTemplateInsertion(String template, String filledTemplate) {
        //language=RegExp
        String placeHolderPattern = "\\{(.+)}";
        HashMap<String, String> templateTranslation = new HashMap<>();
        // Turn the template itself into a regex: every {placeholder} becomes a capturing group.
        String regexedTemplate = template.replaceAll(placeHolderPattern, "(.+)");
        Pattern pattern = Pattern.compile(regexedTemplate);
        // Matching the template yields the placeholder names; matching the filled string yields the values.
        Matcher templateMatcher = pattern.matcher(template);
        Matcher filledTemplateMatcher = pattern.matcher(filledTemplate);
        while (templateMatcher.find() && filledTemplateMatcher.find()) {
            if (templateMatcher.groupCount() == filledTemplateMatcher.groupCount()) {
                for (int i = 1; i <= templateMatcher.groupCount(); i++) {
                    templateTranslation.put(
                            templateMatcher.group(i).replaceAll(placeHolderPattern, "$1"),
                            filledTemplateMatcher.group(i)
                    );
                }
            }
        }
        return templateTranslation;
    }
}
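One way to keep the template's literal text from interfering with the regex, as cautioned above, is to quote every literal piece with Pattern.quote and only turn the placeholders into groups. The following is a minimal sketch of that idea, not a drop-in replacement: the class name TemplateExtractor and the choice of a reluctant (.+?) group per placeholder are my own assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps placeholder names in a template to the values found in a filled string.
final class TemplateExtractor {

    // Matches one {name} placeholder; the name may not contain braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    // Returns name -> value in template order, or an empty map if the filled string does not fit.
    static Map<String, String> extract(String template, String filled) {
        List<String> names = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders is quoted, so '.', '(' etc. lose their regex meaning.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            regex.append("(.+?)");
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Map<String, String> values = new LinkedHashMap<>();
        Matcher filledMatcher = Pattern.compile(regex.toString()).matcher(filled);
        if (filledMatcher.matches()) {
            for (int i = 0; i < names.size(); i++) {
                values.put(names.get(i), filledMatcher.group(i + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("/get/{id}/person", "/get/1234ewq/person").get("id"));
    }
}
```

As with the HashMap version, a placeholder name that appears twice keeps only its last value.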

Leave entities as-is when parsing XML with Woodstox

I'm using Woodstox to process an XML that contains some entities (most notably &nbsp;) in the value of one of the nodes. To use an extreme example, it's something like this:
<parent>&nbsp; &lt; &nbsp; &gt; &amp; &quot; &apos; &nbsp;</parent>
I have tried a lot of different configuration options for both WstxInputFactory (IS_REPLACING_ENTITY_REFERENCES, P_TREAT_CHAR_REFS_AS_ENTS, P_CUSTOM_INTERNAL_ENTITIES...) and WstxOutputFactory, but no matter what I try, the output is always something like this:
<parent>nbsp; &lt; nbsp; > &amp; " ' nbsp;</parent>
(&gt; gets converted to >, &lt; stays the same, &nbsp; loses the &...)
I'm reading the XML with an XMLEventReader created with
XMLEventReader reader = wstxInputFactory.createXMLEventReader(new StringReader(fulltext));
after configuring the WstxInputFactory.
Is there any way to configure Woodstox to just ignore all entities and output the text exactly as it was in the input String?
First of all, you need to include actual code, since "output is always something like this" makes no sense without explaining exactly how you are outputting the parsed content: you may be printing events, using some library, or perhaps using a Woodstox stream or event writer.
Second: in XML there is a difference between the small number of pre-defined entities (lt, gt, apos, quot, amp) and arbitrary user-defined entities, which is what nbsp would be here. The former you can use as-is, since they are already defined; the latter only exist if you define them in a DTD.
Handling of the two groups is different, too: the former will always be expanded no matter what, and this is required by the XML specification. The latter will be resolved (unless resolution is disabled) and then expanded, or, if not defined, an exception will be thrown.
You can also specify a custom resolver, as mentioned by the other answer, but this will only be used for custom entities (here, &nbsp;).
Finally, it is also good to explain not so much what you are doing as what you are trying to achieve. That will help people suggest things better than specific questions of "how do I do X", which may not be the right way to go about it.
And as to configuration of Woodstox, maybe this blog entry:
https://medium.com/@cowtowncoder/configuring-woodstox-xml-parser-woodstox-specific-properties-1ce5030a5173
will help (as well as 2 others in the series) -- it covers existing configuration settings.
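To illustrate the point about pre-defined entities, here is a small sketch that reads text through the standard StAX API. It uses whatever implementation XMLInputFactory.newInstance() finds (the JDK's built-in one unless Woodstox is on the classpath), and the class name PredefinedEntityDemo is made up. By the time the application sees the characters, the five pre-defined entities have already been expanded, so the original spelling cannot be recovered from the events:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: shows that pre-defined entities arrive already expanded.
final class PredefinedEntityDemo {

    // Concatenates all character data in the document.
    static String textOf(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Coalescing merges adjacent text segments into a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<parent>&lt; &gt; &amp; &quot; &apos;</parent>"));
    }
}
```

If the entities have to appear in the output again, that is the job of the writer (escaping on output), not of the parser.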
The five basic XML entities (quot, amp, apos, lt, gt) will always be processed. As far as I know there is no way to get their original source text with StAX.
For the other entities you can process them manually. You can capture the events until the end of the element and concatenate the values:
XMLInputFactory factory = WstxInputFactory.newInstance();
factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
XMLEventReader xmlr = factory.createXMLEventReader(
        this.getClass().getResourceAsStream(xmlFileName));
String value = "";
while (xmlr.hasNext()) {
    XMLEvent event = xmlr.nextEvent();
    if (event.isCharacters()) {
        value += event.asCharacters().getData();
    }
    if (event.isEntityReference()) {
        // Unexpanded entity: write it back in its original &name; form.
        value += "&" + ((EntityReference) event).getName() + ";";
    }
    if (event.isEndElement()) {
        // Assign it to the right variable
        System.out.println(value);
        value = "";
    }
}
For your example input:
<parent>&nbsp; &lt; &nbsp; &gt; &amp; &quot; &apos; &nbsp;</parent>
The output will be:
&nbsp; < &nbsp; > & " ' &nbsp;
Otherwise, if you want to convert all the entities, you could use a custom XMLResolver for undeclared entities:
public class NaiveHtmlEntityResolver implements XMLResolver {
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("nbsp", " ");
        ENTITIES.put("apos", "'");
        ENTITIES.put("quot", "\"");
        // and so on
    }

    @Override
    public Object resolveEntity(String publicID,
                                String systemID,
                                String baseURI,
                                String namespace) throws XMLStreamException {
        // For undeclared entities the entity name arrives in the last argument.
        if (publicID == null && systemID == null) {
            return ENTITIES.get(namespace);
        }
        return null;
    }
}
And then tell Woodstox to use it for the undeclared entities:
factory.setProperty(WstxInputProperties.P_UNDECLARED_ENTITY_RESOLVER, new NaiveHtmlEntityResolver());

Why Dom4JDriver is appending a newline at the beginning of the xml content?

I am using the XStream library (1.4.10) and the Dom4JDriver to generate XML content from a Java object. The problem is that it adds a newline at the beginning of the content. Is there any way to turn this off?
Dom4JDriver dom4JDriver = new Dom4JDriver();
dom4JDriver.getOutputFormat().setSuppressDeclaration(true);
XStream xStream = new XStream(dom4JDriver);
xStream.processAnnotations(MyClass.class);
String myContent = xStream.toXML(myClassInstance); //extra '\n' appended at the start of the string
MyClass.class:
@XStreamAlias("myClass")
public class MyClass{
private String something;
private String somethingElse;
...........
Generated xml:
\n<myClass>\n <something>blabla</something>\n......
I know that I can just use myContent.substring(...) to get rid of the first character, but it doesn't seem very clean to me. I am also doing this for a lot of operations, so I would rather not have that line to begin with, for performance's sake. Any advice? Thank you :)
Have you tried DomDriver in place of Dom4JDriver?

Java webservice json pattern issue

My Java web service returns this value:
{"employee":[{"address":"New Delhi","employeeId":"22222","employeeName":"Abhishek","jobType":"Marketing","salary":"50000"},{"address":"Noida","employeeId":"11111","employeeName":"Dineh Rajput","jobType":"Sr.Software Engineer","salary":"70000"}]}
but I want only
[{"address":"New Delhi","employeeId":"22222","employeeName":"Abhishek","jobType":"Marketing","salary":"50000"},{"address":"Noida","employeeId":"11111","employeeName":"Dineh Rajput","jobType":"Sr.Software Engineer","salary":"70000"}]
The main code of my Java web service is this:
@GET
@Path("/json/employees/")
@Produces("application/json")
public List<Employee> listEmployeesJSON(){
return new ArrayList<Employee>(employees.values());
}
The value you are expecting is not valid according to the W3 JSON definition. You can use the returned value the same way you would use the expected one anyway.
Finally, remember that JSON itself is nothing more than a string, so you can operate on it as a string, for example:
import java.util.regex.*;

public class ExpectedJSON {
    public static void main(String[] args) {
        String string = "{\"employee\":[{\"address\":\"New Delhi\",\"employeeId\":\"22222\",\"employeeName\":\"Abhishek\",\"jobType\":\"Marketing\",\"salary\":\"50000\"},{\"address\":\"Noida\",\"employeeId\":\"11111\",\"employeeName\":\"Dineh Rajput\",\"jobType\":\"Sr.Software Engineer\",\"salary\":\"70000\"}]}";
        // Skip everything before the first '[' and capture up to the final '}'.
        String regex = "^[^\\[]+(.+)\\}$";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        // group(1) may only be read after a successful find().
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}
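If you do go the string route, plain index arithmetic is a bit easier to follow than the regex. This is only a sketch under the assumption that the wrapper object has a single key whose value is the array and that no '[' appears before it; the class name JsonArrayUnwrap is made up. For anything less regular, a real JSON parser is the safer choice.

```java
// Hypothetical helper: strips a single-key wrapper object around a JSON array.
final class JsonArrayUnwrap {

    // Returns the text from the first '[' to the last ']', or the input unchanged if there is no such pair.
    static String unwrapArray(String json) {
        int start = json.indexOf('[');
        int end = json.lastIndexOf(']');
        if (start < 0 || end < start) {
            return json;
        }
        return json.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String wrapped = "{\"employee\":[{\"employeeId\":\"22222\"},{\"employeeId\":\"11111\"}]}";
        System.out.println(unwrapArray(wrapped));
    }
}
```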
How are you constructing the JSON object? If you want your JSON to not contain the employee key, then just remove it from where you are adding it. Both JSON formats are valid; you can check your JSON structure here.
I wonder why you need to remove the key of the JSON array object; if you remove it, you lose the reference to the values. That said, if you think this is your solution, you can use the gson library.
You can find the solution here.
Hope this helps.
