For an educational project, I need some code (if any exists) that transforms XML files (specifically LOM metadata, but plain XML is fine) to RDF/XML.
I need this because I'm using an RDF store (4store) to query the triples and make searches faster.
I read that XSLT can transform any XML into another XML, so if you know of an actual class, library or piece of code, please tell me.
Thank you all.
My advice would be to use a software library to transform the XML to RDF/XML since the mapping may not be straightforward and RDF/XML has different XML semantics.
There are loads of different RDF APIs for different technology stacks, including
dotNetRDF, Jena, Sesame, ARC and Redland:
http://semanticweb.org/wiki/Tools
You also need to define how the LOM metadata should be serialised into RDF. There is a good article here:
http://www.downes.ca/xml/rss_lom.htm
Answering my own question.
I'm using a binding of key/value for the LOM file. So, this part of the metadata:
<general>
  <identifier xmlns="http://ltsc.ieee.org/xsd/LOM">
    <catalog>oai</catalog>
    <entry>oai:archiplanet.org:ap_44629</entry>
  </identifier>
</general>
catalog and entry will be converted like this:
s = the URI of my graph, it contains my filename or identifier.
p = "lom.general.identifier.catalog"
v = "oai"
,,,,,,
s = the URI of my graph, it contains my filename or identifier.
p = "lom.general.identifier.entry"
v = "oai:archiplanet.org:ap_44629"
And so it generates all the triples for the RDF file. I think this approach will make it easier to query for specific values or properties.
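The key/value binding described above could be sketched roughly as follows. This is a minimal illustration using only the JDK's DOM parser, not the code actually used; the class name LomFlattener, the wrapping <lom> root element and the example.org subject URI are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XML tree into subject / dotted-path / value lines.
class LomFlattener {

    // Returns one line per non-empty leaf element: <subject> dotted.path "value"
    static List<String> flatten(String xml, String subjectUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() returns the bare element name
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> triples = new ArrayList<>();
            Element root = doc.getDocumentElement();
            walk(root, root.getLocalName(), subjectUri, triples);
            return triples;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk that extends the dotted path with each child element name.
    private static void walk(Element element, String path, String subjectUri, List<String> out) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                walk((Element) child, path + "." + child.getLocalName(), subjectUri, out);
            }
        }
        if (!hasChildElements) {
            String value = element.getTextContent().trim();
            if (!value.isEmpty()) {
                // Note: quotes inside the value are not escaped in this sketch.
                out.add("<" + subjectUri + "> " + path + " \"" + value + "\"");
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<lom xmlns=\"http://ltsc.ieee.org/xsd/LOM\"><general><identifier>"
                + "<catalog>oai</catalog><entry>oai:archiplanet.org:ap_44629</entry>"
                + "</identifier></general></lom>";
        for (String triple : flatten(xml, "http://example.org/graph/ap_44629")) {
            System.out.println(triple);
        }
    }
}
```

Keep in mind that an RDF predicate has to be a URI, so before loading the result into a store, a dotted key such as lom.general.identifier.catalog would still need to be turned into one (for example by appending it to a namespace of your choice).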
IEEE LOM is not a straightforward structure. It contains a hierarchical taxonomy, which should be taken into account when you do the mapping. Here you can find instructions on how to map each IEEE LOM element to RDF, if this is your case.
Regarding the conversion, you can use a Java XML library to read the XML files and create the final RDF/XML file using Jena according to the ontology I mentioned. The LOM ontology is available here.
I need to extend a REST API in Java with Spring that talks to a MarkLogic database. I already have functionality using StructuredQueryBuilder and the search method from DocumentManagerImpl (package com.marklogic.client.impl), but the client expects highlighted fragments of the results that match the searched phrases in Polish, including forms derived from the stems (there may be several search keywords, with the condition that they all occur together in a result).
How can I extend the MarkLogic search query in the simplest way, using the MarkLogic Java API, to obtain the locations of the searched phrases in the returned objects in a single query to the database?
Should I add a custom dictionary for stemming to MarkLogic? Are there any sources recommended by MarkLogic where I can get dictionaries?
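You can get snippets with highlighting through the Java API with code like this:
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.query.MatchDocumentSummary;
import com.marklogic.client.query.MatchLocation;
import com.marklogic.client.query.MatchSnippet;
import com.marklogic.client.query.QueryManager;

// "client" is an already connected DatabaseClient
QueryManager mgr = client.newQueryManager();
SearchHandle handle = mgr.search(mgr.newStructuredQueryBuilder().term("quick"), new SearchHandle());
for (MatchDocumentSummary matchResult : handle.getMatchResults()) {
    for (MatchLocation matchLocation : matchResult.getMatchLocations()) {
        for (MatchSnippet snippet : matchLocation.getSnippets()) {
            // Each snippet is a piece of text plus a flag saying whether it matched the query
            System.out.println(snippet.getText());
            System.out.println(snippet.getIsHighlighted());
        }
    }
}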
Custom dictionaries are covered at https://docs.marklogic.com/guide/search-dev/custom-dictionaries. I believe that once you've created a dictionary, you'll want to modify the Language setting on your database to use the new dictionary (I have not tried that before, but that appears to be the expected approach).
As for a Polish dictionary - there's a link to a repository of dictionaries at https://developer.marklogic.com/code/dictionaries-and-thesauri/, but there's not a Polish dictionary there. Building a complete dictionary would of course be a significant effort, though it sounds like if you're mostly interested in stemming on certain keywords, you could build a custom dictionary containing just those keywords and their stems.
I need to create RDF that looks like this:
<rdf:Description rdf:about='uri1'>
  <namespace:level1>
    <rdf:Description>
      <namespace:blankNode rdf:resource='uri2'/>
      <namespace:text></namespace:text>
    </rdf:Description>
  </namespace:level1>
</rdf:Description>
<rdf:Description rdf:about="uri2">
  some properties here
</rdf:Description>
As you can see, there are nested structures, as well as blank nodes. (I don't know if that's the exact terminology for the "blankNode" property in my structure.) If I use
model.write(System.out, "RDF/XML-ABBREV");
then even the blank node is nested, which I don't want. Is there any way to get this kind of structure using Jena? Or is there any other library for Java that can handle this better?
I think you're going at this the wrong way.
Nesting is a concept that only makes sense when talking about trees. But RDF is not about trees, it's about triples. Forget for a while about the structure of the XML, and think about the triples that are encoded in the XML. I find model.write(System.out, "N-TRIPLES"); most useful for that.
You first need to understand what triples you want your RDF/XML file to express. As long as it expresses the right triples, it doesn't matter whether the one node is written nested inside the other or what order things appear in. These are purely “cosmetic” questions, like indentation.
Meanwhile, in RDF4J, I have found that org.eclipse.rdf4j.rio.rdfxml.util.RDFXMLPrettyWriter does exactly what I want: it produces a clean, compact, nicely nested RDF/XML document for a nested configuration.
While I agree with @cygri, it is often very desirable (e.g. in consulting situations) to have easily readable RDF/XML, and the default writer often spits out hard-to-digest RDF (probably because it streams and is optimized for memory consumption and speed).
I am trying to compare Document objects to find out whether they are well formed or not. I did some research and heard that XSD files are used for this comparison. Can you please give me some basic examples of comparing a document using XSD objects?
For example, what do I have to write in the XSD file, and how can I compare it with a Document object?
Thank you all
You don't need an XSD schema to determine if a document is well-formed. You only need it to determine if the document is valid against the schema.
I'm not sure what you mean by "comparing XML documents". What are you comparing them with?
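To make the distinction concrete, here is a minimal sketch using only the JDK's javax.xml APIs. The class name XmlChecks and the tiny person schema are made up for illustration. Note that a Document object that was parsed successfully is already well-formed; the XSD only comes into play for validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper showing well-formedness (parsing) versus validity (XSD).
class XmlChecks {

    // Illustrative schema: a <person> element containing exactly one <name>.
    static final String PERSON_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"name\" type=\"xs:string\"/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // Well-formedness: the parser either builds a Document or fails.
    static Document parseIfWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for schema validation of the DOM
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return null; // not well-formed (or could not be read)
        }
    }

    // Validity: check an already parsed Document against an XSD.
    static boolean isValid(Document doc, String xsd) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException | IOException e) {
            return false; // the validator throws on the first schema violation
        }
    }

    public static void main(String[] args) {
        Document good = parseIfWellFormed("<person><name>xy</name></person>");
        Document wrongShape = parseIfWellFormed("<person><age>3</age></person>");
        Document broken = parseIfWellFormed("<person><name>xy</person>");
        System.out.println("good is valid: " + isValid(good, PERSON_XSD));
        System.out.println("wrongShape is valid: " + isValid(wrongShape, PERSON_XSD));
        System.out.println("broken is well-formed: " + (broken != null));
    }
}
```

The second sample document parses fine (it is well-formed) but does not match the schema, which is exactly the case an XSD is meant to catch.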
<item>
<RelatedPersons>
<RelatedPerson>
<Name>xy</Name>
<Title>asd</Title>
<Address>abc</Address>
</RelatedPerson>
<RelatedPerson>
<Name>xy</Name>
<Title>asd</Title>
<Address>abc</Address>
</RelatedPerson>
</RelatedPersons>
</item>
I'd like to parse this data with a SAXParser. How can I do this?
I know the SAX tutorials and I can parse any normal RSS feed, but I can't parse this particular data.
Define your problem: What you can probably do is create a value object (POJO) called Person which has the properties name, title and address. Your aim in parsing this XML would then be to create an ArrayList<Person> object. Defining a definite data structure helps you build logic around it.
Choose a parser: You can then use a SAX parser or an XML pull parser to browse through the tags: see this link for a tutorial on DOM, SAX and XML pull parsers in Android.
Data population logic: While parsing, whenever you encounter a <RelatedPerson> start tag, instantiate a new Person object. When you encounter one of the property tags (Name, Title, Address), read the value and populate it in this object. When you encounter a closing </RelatedPerson>, add this Person object to the ArrayList. Depending on the parser you use, you will have to use the appropriate methods to browse to the child/nested nodes. (Refer to the link for details.)
By the time you are done parsing the last item node you will have all the values in your ArrayList.
Note that this is more of a theoretical answer; I hope it helps.
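The steps above could look roughly like this with the JDK's SAX parser. This is only a sketch: the class and field names are made up for illustration, and on Android the parser setup may differ slightly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical parser for the <RelatedPersons> structure shown in the question.
class RelatedPersonParser {

    // Simple value object for one <RelatedPerson> element.
    static class Person {
        String name;
        String title;
        String address;
    }

    static class Handler extends DefaultHandler {
        final List<Person> persons = new ArrayList<>();
        private Person current;                              // person being built, or null
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0);                               // start collecting fresh text
            if ("RelatedPerson".equals(qName)) {
                current = new Person();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);                  // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;                                      // outside any <RelatedPerson>
            }
            String value = text.toString().trim();
            switch (qName) {
                case "Name": current.name = value; break;
                case "Title": current.title = value; break;
                case "Address": current.address = value; break;
                case "RelatedPerson": persons.add(current); current = null; break;
                default: break;
            }
        }
    }

    static List<Person> parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.persons;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<item><RelatedPersons>"
                + "<RelatedPerson><Name>xy</Name><Title>asd</Title><Address>abc</Address></RelatedPerson>"
                + "<RelatedPerson><Name>xy</Name><Title>asd</Title><Address>abc</Address></RelatedPerson>"
                + "</RelatedPersons></item>";
        for (Person p : parse(xml)) {
            System.out.println(p.name + ", " + p.title + ", " + p.address);
        }
    }
}
```

Collecting the text in a StringBuilder matters because SAX is allowed to deliver the content of a single element in several characters() calls.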
I need to extract data from an incoming message that could be in any format. The extracted data to store is also dependent upon the format, i.e. format A could extract field X, Y, Z, but format B could extract field A, B, C. I also need to view Message B by searching for field C within the message.
Right now I'm configuring and storing the extraction strategy (XSLT) and executing it at runtime when its related format is encountered, but I'm storing the extracted data in an Oracle database as an XmlType column. Oracle seems to have pretty lax development/support for XmlType, as it requires an old jar that forces you to use a pretty old DOM DocumentBuilderFactory impl (looks like Java 1.4 code), which collides with Spring 3 and doesn't play very nicely with Hibernate. The XML queries are slow and non-intuitive as well.
I'm concluding that Oracle with XmlType isn't a very good way to store the extracted data, so my question is, what is the best way to store the serialized/queryable data?
NoSQL (Cassandra, CouchDB, MongoDB, etc.)?
A JCR like JackRabbit?
A blob with manual de/serialization?
Another Oracle solution?
Something else??
One alternative that you haven't listed is using an XML database. (Notice that Oracle is one of the ten or so XML database products.)
(Obviously, a blob type won't allow querying "inside" the persisted XML objects unless you read each blob instance into memory and do the querying there; e.g. using XSLT.)
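To illustrate what in-memory querying of blobs means in practice, here is a small sketch that uses the JDK's XPath API instead of XSLT. The class name BlobQuery and the <msg>/<fieldC> message layout are assumptions made up for the example; the point is that every stored blob has to be parsed before it can be filtered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: filters XML blobs that were already loaded from the database.
class BlobQuery {

    // Returns the blobs whose XPath result equals the expected string value.
    static List<String> matching(List<String> blobs, String xpathExpr, String expected) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> hits = new ArrayList<>();
        for (String blob : blobs) {
            try {
                // Each call parses the whole blob again; nothing is indexed.
                String value = xpath.evaluate(xpathExpr, new InputSource(new StringReader(blob)));
                if (expected.equals(value)) {
                    hits.add(blob);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath or unparsable blob", e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> blobs = Arrays.asList(
                "<msg><fieldA>1</fieldA><fieldC>42</fieldC></msg>",
                "<msg><fieldA>2</fieldA><fieldC>7</fieldC></msg>");
        System.out.println(matching(blobs, "/msg/fieldC", "42"));
    }
}
```

This works for small data sets, but the cost grows with the number and size of the stored messages, which is why a store that can index values inside the XML is usually preferable.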
I have had great success in storing complex xml objects in PostgreSQL. Together with the functional index features, you can even create indexes on node values of the stored xml files, and use those indexes to do very fast lookups using index scans without having to reparse the XML file.
However, this only works if you know your query patterns in advance; arbitrary XPath queries will still be slow.
Example (untested, so it may still contain errors):
Create a simple table:
create table test123 (
    id serial primary key,
    myxml xml
);
Now let's assume you have XML documents like:
<test>
<name>Peter</name>
<info>Peter is a <i>very</i> good cook</info>
</test>
Now create a functional index:
create index idx_test123_name on test123 (((xpath('/test/name/text()', myxml))[1]::text));
Now do your fast XML lookups:
SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text = 'Peter';
You should also consider creating an index using text_pattern_ops, so you can have fast prefix lookups like:
SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text like 'Pe%';