Java approach for XML Validation - java

I want to validate XML tag values in Java.
Scenario:
Suppose I have the following data:
Elements: element1, element2, element3, element4, element5
Values: value1, value2, value3, value4, value5, value6, value7, value8
The following are the possible combinations that I have to validate:
1. Element1 can have value1/value2 ( element1->value1/value2 )
2. Element2 can have value3/value4 ( element2->value3/value4 )
3. Element3 can have value5 if element1 has value1 ( element3->value5 if element1->value1 ),
   else Element3 can have value6 if element1 has value2 ( element3->value6 if element1->value2 )
4. Element4 can have value7 if element1->value1 and element2->value4
I could hard-code these requirements in a single Java file, but I want a flexible approach so that any new condition that comes up in the future can be added easily.
I considered Hibernate Validator, but later found out that it requires Java 6 or above. My constraint is that I have to use Java 1.5.
Please suggest an appropriate approach to fulfill the above requirement. Links to relevant material would also help.
Note: Schema validation is already being carried out.

Explore Schematron for applying business rules to XML data.
Schematron project site - http://www.schematron.com
Introduction tutorial - http://www.dpawson.co.uk/schematron/introduction.html
(see the part of the tutorial on element-to-element constraints)
One more tutorial - http://www.xml.com/lpt/a/1318

Check this out: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html. It covers some basic validation techniques, assuming you have the schema.

Use the DOMParser class in Java.
It will help you a lot.
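One way to keep the rules from the question flexible is to hold them as data and check them against the parsed DOM. The following is only a minimal sketch under several assumptions: the elements are direct children of a single root element, a conditional rule is simply skipped when its preconditions do not hold, and the names Rule and RuleValidator are hypothetical. The JDK's DocumentBuilder stands in for DOMParser here, and the code is intended to avoid language features newer than Java 1.5.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical rule: if all conditions hold, the element must have one of the allowed values. */
final class Rule {
    final String element;
    final List<String> allowed;
    final Map<String, String> conditions = new HashMap<String, String>();

    Rule(String element, String... allowed) {
        this.element = element;
        this.allowed = Arrays.asList(allowed);
    }

    /** Adds the precondition "otherElement has this value" and returns the rule for chaining. */
    Rule when(String otherElement, String value) {
        conditions.put(otherElement, value);
        return this;
    }
}

/** Hypothetical validator that applies a list of rules to the children of the root element. */
final class RuleValidator {
    private final List<Rule> rules = new ArrayList<Rule>();

    RuleValidator add(Rule rule) {
        rules.add(rule);
        return this;
    }

    /** Returns one message per violated rule; an empty list means no rule was violated. */
    List<String> validate(String xml) {
        Map<String, String> values = new HashMap<String, String>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Assumption: every element of interest is a direct child of the root.
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(n.getNodeName(), n.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> errors = new ArrayList<String>();
        for (Rule rule : rules) {
            String actual = values.get(rule.element);
            // Absent elements and rules whose preconditions do not hold are skipped.
            if (actual == null || !applies(rule, values)) {
                continue;
            }
            if (!rule.allowed.contains(actual)) {
                errors.add(rule.element + " has " + actual + ", expected one of " + rule.allowed);
            }
        }
        return errors;
    }

    private static boolean applies(Rule rule, Map<String, String> values) {
        for (Map.Entry<String, String> c : rule.conditions.entrySet()) {
            if (!c.getValue().equals(values.get(c.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** The combinations listed in the question, expressed as data. */
    static RuleValidator questionRules() {
        return new RuleValidator()
                .add(new Rule("element1", "value1", "value2"))
                .add(new Rule("element2", "value3", "value4"))
                .add(new Rule("element3", "value5").when("element1", "value1"))
                .add(new Rule("element3", "value6").when("element1", "value2"))
                .add(new Rule("element4", "value7")
                        .when("element1", "value1").when("element2", "value4"));
    }
}
```

A new condition then becomes one more add(...) call, or one more entry in whatever file the rules are loaded from, rather than a change to the checking logic. Whether an unmet precondition should count as a violation instead of being skipped depends on the real requirement.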


How to set defined key as example in generated documentation of a hashmap in springdoc?

I am documenting a Java API, but the keys of properties that are maps or associative arrays are represented as property1, property2, property3, etc.
Here is an example:
@Schema(
    description = " This object contains the plans selected by the user.",
    name = "plans")
var plans: Map<Int, Plan> = ConcurrentHashMap()
The generated example in the UI is good in terms of content, but I would like to replace the generated keys with the real ones,
i.e. property1 with 258.
Can someone help me get this done using springdoc and its annotations?

ElasticSearch - define custom letter order for sorting

I'm using ElasticSearch 2.4.2 (via HibernateSearch 5.7.1.Final from Java).
I have a problem with string sorting.
The language of my application has diacritics, which have a specific alphabetic
ordering. For example Ł goes directly after L, Ó goes after O, etc.
So you are supposed to sort the strings like this:
Dla
Dła
Doa
Dóa
Dza
Eza
ElasticSearch sorts the plain letters first and moves all the letters
with diacritics to the end:
Dla
Doa
Dza
Dła
Dóa
Eza
Can I add a custom letter ordering for ElasticSearch?
Maybe there are some plugins for this?
Do I need to write my own plugin? How do I start?
I found a plugin for Polish language for ElasticSearch,
but as I understand it is for analysing, and analysing is not a solution
in my case, because it will ignore diacritics and leave words with L and Ł mixed:
Dla
Dłb
Dlc
This would sometimes be acceptable, but is not acceptable in my specific use case.
I will be grateful for any remarks on this.
I've never used it, but there is a plugin that could fit your needs: the ICU collation plugin.
You will have to use the icu_collation token filter, which turns the tokens into collation keys. For that reason you will need to use a separate @Field (e.g. myField_sort) in Hibernate Search.
You can assign a specific analyzer to your field with @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer")), and define this analyzer (type, parameters) with something like this on one of your entities:
@Entity
@Indexed
@AnalyzerDef(
    name = "myCollationAnalyzer",
    filters = {
        @TokenFilterDef(
            name = "polish_collation",
            factory = ElasticsearchTokenFilterFactory.class,
            params = {
                @Parameter(name = "type", value = "'icu_collation'"),
                @Parameter(name = "language", value = "'pl'")
            }
        )
    }
)
public class MyEntity {
See the documentation for more information: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_custom_analyzers
It's admittedly a bit clumsy right now, but analyzer configuration will get a bit cleaner in the next Hibernate Search version with normalizers and analyzer definition providers.
Note: as usual, your field will need to be declared as sortable (@SortableField(forField = "myField_sort")).
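To get a feel for what "turning tokens into collation keys" means, the same idea can be tried locally with the JDK's own java.text.Collator. This is only an illustration of the technique, not the ICU plugin itself: the class name CollationKeyDemo is hypothetical, and whether the JDK's rules for the "pl" locale produce exactly the order the plugin would produce is an assumption that depends on the locale data shipped with the JDK.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical demo: sorts strings by locale-specific collation keys instead of code points. */
final class CollationKeyDemo {

    /** Returns a sorted copy of the input; the input list itself is left untouched. */
    static List<String> sortByCollationKey(List<String> words, Locale locale) {
        final Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<String>(words);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                // A collation key is a byte sequence whose plain comparison
                // follows the language's alphabetical order.
                CollationKey keyA = collator.getCollationKey(a);
                CollationKey keyB = collator.getCollationKey(b);
                return keyA.compareTo(keyB);
            }
        });
        return sorted;
    }
}
```

Calling sortByCollationKey with the six words from the question and Locale.forLanguageTag("pl") shows whether the local rules put Dła directly after Dla. An index that stores such keys in a separate sort field can then sort on them with an ordinary comparison, which is why the answer suggests a dedicated myField_sort field.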

Aggregating within Apache Jena

I'm using the Java API of Apache Jena to store and retrieve documents and the words within them. For this I decided to set up the following data structure:
_dataset = TDBFactory.createDataset("./database");
_dataset.begin(ReadWrite.WRITE);
Model model = _dataset.getDefaultModel();
Resource document= model.createResource("http://name.space/Source/DocumentA");
document.addProperty(RDF.value, "Document A");
Resource word = model.createResource("http://name.space/Word/aword");
word.addProperty(RDF.value, "aword");
Resource resource = model.createResource();
resource.addProperty(RDF.value, word);
resource.addProperty(RSS.items, "5");
document.addProperty(RDF.type, resource);
_dataset.commit();
_dataset.end();
The code example above represents a document ("Document A") consisting of five (5) words ("aword"). The occurrences of a word in a document are counted and stored as a property. A word can also occur in other documents, so the occurrence count relating to a specific word in a specific document is linked together by a blank node. (I'm not entirely sure if this structure makes any sense, as I'm fairly new to this way of storing information, so please feel free to suggest better solutions!)
My main question is: How can I get a list of all distinct words and the sum of their occurrences over all documents?
Your data model is a bit unconventional, in my opinion. With your code, you'll end up with data that looks like this (in Turtle notation), and which uses rdf:type and rdf:value in unconventional ways:
:doc rdf:value "document a" ;
     rdf:type :resource .
:resource rdf:value :word ;
          :items 5 .
:word rdf:value "aword" .
It's unusual, because usually you wouldn't have such complex information on the type attribute of a resource. From the SPARQL standpoint though, rdf:type and rdf:value are properties just like any other, and you can still retrieve the information you're looking for with a simple query. It would look more or less like this (though you'll need to define some prefixes, etc.):
select ?word (sum(?n) as ?nn) where {
  ?document rdf:type ?type .
  ?type rdf:value/rdf:value ?word ;
        :items ?n .
}
group by ?word
That query will produce a result for each word, and with each will be the sum of all the values of the :items properties associated with the word. There are lots of questions on Stack Overflow that have examples of running SPARQL queries with Jena. E.g., (the first one that I found with Google): Query Jena TDB store.

Faceting using SolrJ and Solr4

I've gone through the related questions on this site but haven't found a relevant solution.
When querying my Solr4 index using an HTTP request of the form
&facet=true&facet.field=country
The response contains all the different countries along with counts per country.
How can I get this information using SolrJ?
I have tried the following but it only returns total counts across all countries, not per country:
solrQuery.setFacet(true);
solrQuery.addFacetField("country");
The following does seem to work, but I do not want to have to explicitly set all the groupings beforehand:
solrQuery.addFacetQuery("country:usa");
solrQuery.addFacetQuery("country:canada");
Secondly, I'm not sure how to extract the facet data from the QueryResponse object.
So two questions:
1) Using SolrJ how can I facet on a field and return the groupings without explicitly specifying the groups?
2) Using SolrJ how can I extract the facet data from the QueryResponse object?
Thanks.
Update:
I also tried something similar to Sergey's response (below).
List<FacetField> ffList = resp.getFacetFields();
log.info("size of ffList:" + ffList.size());
for(FacetField ff : ffList){
String ffname = ff.getName();
int ffcount = ff.getValueCount();
log.info("ffname:" + ffname + "|ffcount:" + ffcount);
}
The above code shows ffList with size=1 and the loop goes through 1 iteration. In the output ffname="country" and ffcount is the total number of rows that match the original query.
There is no per-country breakdown here.
I should mention that on the same solrQuery object I am also calling addField and addFilterQuery. Not sure if this impacts faceting:
solrQuery.addField("user-name");
solrQuery.addField("user-bio");
solrQuery.addField("country");
solrQuery.addFilterQuery("user-bio:" + "(Apple OR Google OR Facebook)");
Update 2:
I think I got it, again based on what Sergey said below. I extracted the List object using FacetField.getValues().
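// Requires: import org.apache.solr.client.solrj.response.FacetField;
//           import org.apache.solr.client.solrj.response.FacetField.Count;
List<FacetField> fflist = resp.getFacetFields();
for (FacetField ff : fflist) {
    String ffname = ff.getName();
    int ffcount = ff.getValueCount();
    List<Count> counts = ff.getValues();   // one Count per facet value
    for (Count c : counts) {
        String facetLabel = c.getName();   // the facet value, e.g. a country
        long facetCount = c.getCount();    // the count for that facet value
    }
}
In the above code, facetLabel is the name of each facet group and facetCount is the corresponding count for that grouping.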
Actually, you only need to add the facet field, and faceting will be activated (check the SolrJ source code):
solrQuery.addFacetField("country");
Where did you look for the facet information? It should be in QueryResponse.getFacetFields() (then getValues() and getCount()).
In the Solr response, use QueryResponse.getFacetFields() to get the list of FacetField objects, which includes "country". If it is the only facet field, "country" is identified by QueryResponse.getFacetFields().get(0).
You then iterate over it to get the Count objects using
QueryResponse.getFacetFields().get(0).getValues().get(i)
and get the value name of each facet using QueryResponse.getFacetFields().get(0).getValues().get(i).getName()
and the corresponding count using
QueryResponse.getFacetFields().get(0).getValues().get(i).getCount()

Iterate and concat using XPath Expression

I have the following xml file:
<author>
<firstname>Akhilesh</firstname>
<lastname>Singh</lastname>
</author>
<author>
<firstname>Prassana</firstname>
<lastname>Nagaraj</lastname>
</author>
And I am using the following JXPath expression,
concat(author/firstName," ",author/lastName)
to get the value Akhilesh Singh ,Prassana Nagaraj, but
I am getting only Akhilesh Singh.
My requirement is to get the values of both authors by executing only one JXPath expression.
XPath 2.0 solution:
/*/author/concat(firstname, ' ', lastname, following-sibling::author/string(', '))
In XPath 1.0, when an argument of a type other than node set is expected, the first node of the node set (in document order) is selected and the type conversion is then applied to it (boolean conversion works somewhat differently).
So your expression (note: no capital letters):
concat(author/firstname," ",author/lastname)
It's the same as:
concat( string( (author/firstname)[1] ), " ", string( (author/lastname)[1] ) )
Depending on the host language you could use:
author/firstname|author/lastname
This evaluates to a node set containing the firstname and lastname elements in document order, so you could then iterate over this node set and extract the string values.
In XPath 2.0 you could use:
string-join(author/concat(firstname,' ', lastname),' ,')
Output:
Akhilesh Singh ,Prassana Nagaraj
Note: Now, with the sequence data type and function calls as steps, XPath resembles the functional language it claims to be. Higher-order functions and partial application must wait for XPath 2.1 ...
Edit: Thanks to Dimitre's comments, I've corrected the string separator.
concat() returns a single string. If you want both results, you need to iterate over the "author" elements and evaluate "concat(firstname," ",lastname)" for each one.
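The iteration approach can be sketched with the XPath 1.0 engine that ships with the JDK (javax.xml.xpath) rather than JXPath, so treat it as an illustration of the idea and not as JXPath code. The wrapper element <authors> and the class name AuthorNames are assumptions, since the fragment in the question has no single root element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper showing why concat() yields one author and how to get all of them. */
final class AuthorNames {
    // Assumption: the authors are wrapped in a single root element.
    static final String XML =
            "<authors>"
            + "<author><firstname>Akhilesh</firstname><lastname>Singh</lastname></author>"
            + "<author><firstname>Prassana</firstname><lastname>Nagaraj</lastname></author>"
            + "</authors>";

    /** Evaluates concat() once from the root: each argument uses only its first node. */
    static String firstOnly(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(
                    "concat(/*/author/firstname, ' ', /*/author/lastname)",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Selects every author node, then evaluates concat() relative to each one. */
    static String allAuthors(String xml, String separator) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList authors = (NodeList) xpath.evaluate(
                    "/*/author", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < authors.getLength(); i++) {
                if (i > 0) {
                    out.append(separator);
                }
                out.append(xpath.evaluate("concat(firstname, ' ', lastname)", authors.item(i)));
            }
            return out.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the sample data, firstOnly(AuthorNames.XML) gives "Akhilesh Singh", while allAuthors(AuthorNames.XML, " ,") gives "Akhilesh Singh ,Prassana Nagaraj". The joining is done in the host language because XPath 1.0 has no string-join().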
