I have this piece of information in RDF/XML
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2012/CIM-schema-cim16#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:SynchronousMachineTimeConstantReactance rdf:ID="_54302da0-b02c-11e3-af35-080027008896">
<cim:IdentifiedObject.aliasName>GENCLS_DYN</cim:IdentifiedObject.aliasName>
<cim:IdentifiedObject.name>RoundRotor Dynamics</cim:IdentifiedObject.name>
<cim:SynchronousMachineTimeConstantReactance.tpdo>0.30000001192092896</cim:SynchronousMachineTimeConstantReactance.tpdo>
<cim:SynchronousMachineTimeConstantReactance.tppdo>0.15000000596046448</cim:SynchronousMachineTimeConstantReactance.tppdo>
I have learned a little bit about how to read the document but now I want to go farther. I am "playing" with API functions to try to get the values but I am lost (and I think I do not understand properly how JENA and RDF work). So, how can I get the values of each tag?
Greetings!
I would start with the Reading and Writing RDF in Apache Jena documentation, and then read The Core RDF Api. One important step in understanding the RDF Data Model is to seperate any notion of XML from your understanding of RDF. RDF is a graph data model that just so happens to have one serialization which is in XML.
You'll note that xml-specific language like "tags" actually don't show up at all in the discussion unless you are talking about how to serialize/deserialize RDF/XML.
In order to make the data you are looking at more human friendly, I'd suggest writing it out in TURTLE. TURTLE (or TTL) is another serialization of RDF that is much easier to read or write.
The following code will express your data in TURTLE and will be helpful in understanding what you see.
final InputStream yourInputFile = ...;
final Model model = ModelFactory.createDefaultModel();
model.read(yourInputFile, "RDF/XML");
model.write(System.out, null, "TURTLE");
You'll also want to provide minimal working examples whenever submitting questions on the subject area. For example, I had to add some missing end-tags to your data in order for it to be valid XML:
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2012/CIM-schema-cim16#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:SynchronousMachineTimeConstantReactance rdf:ID="_54302da0-b02c-11e3-af35-080027008896">
<cim:IdentifiedObject.aliasName>GENCLS_DYN</cim:IdentifiedObject.aliasName>
<cim:IdentifiedObject.name>RoundRotor Dynamics</cim:IdentifiedObject.name>
<cim:SynchronousMachineTimeConstantReactance.tpdo>0.30000001192092896</cim:SynchronousMachineTimeConstantReactance.tpdo>
<cim:SynchronousMachineTimeConstantReactance.tppdo>0.15000000596046448</cim:SynchronousMachineTimeConstantReactance.tppdo>
</cim:SynchronousMachineTimeConstantReactance>
</rdf:RDF>
Which becomes the following TURTLE:
<file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896>
a cim:SynchronousMachineTimeConstantReactance ;
cim:IdentifiedObject.aliasName "GENCLS_DYN" ;
cim:IdentifiedObject.name "RoundRotor Dynamics" ;
cim:SynchronousMachineTimeConstantReactance.tpdo "0.30000001192092896" ;
cim:SynchronousMachineTimeConstantReactance.tppdo "0.15000000596046448" .
RDF operates at the statement level, so to find out that your _54302da0-b02c-11e3-af35-080027008896 is a cim:SynchronousMachineTimeConstantReactance you would look for the corresponding triples. Jena's Model API (linked to above) will provide you with methods to identify the properties that resources have.
The following will list all statements whose subject is the aforementioned resource:
final Resource s = model.getResource("file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896");
final ExtendedIterator<Statement> properties = s.listProperties();
while( properties.hasNext() ) {
System.out.println(properties.next());
}
which produces:
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance.tppdo, "0.15000000596046448"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance.tpdo, "0.30000001192092896"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#IdentifiedObject.name, "RoundRotor Dynamics"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#IdentifiedObject.aliasName, "GENCLS_DYN"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance]
which produces:
Related
i'm trying to understand how to load firstly, and then read into RDF file in N-Turtle format.
I'm using Jena Java APIs (https://jena.apache.org/index.html).
I'm trying with this Java code:
Model model = ModelFactory.createDefaultModel();
FileInputStream inputStream = null;
try {
inputStream = new FileInputStream("path_in_my_pc");
} catch (FileNotFoundException e) {}
I have to research a word into the RDF file, and print the results that match.
I was thinking to save the RDF N-Turtle into a string, and then, using some String method to find what I need. Is there other method to do this?
It would be also useful understand how to iterate all the RDF file and print the entire document.
Thank you for the help.
There's no format called N-Turtle. There are N3, Turtle, and N-Triples. N3 and Turtle are more or less interchangeable (they're not actually the same, but most of the time, when people say N3, they actually mean Turtle). For reading from Turtle, the answer to
Jena read from turtle fails
should work for you. For N-Triples,
How to load N-TRIPLE file in apache jena?
can work. Also have a look at
How to load an N-Triples model and obtain the namespaces from RDF file?(newbie)
Trying to load a model from a CIM/XML file acording to IEC 61970 (Common Information Model, for power systems models), I found a problem;
According JAXB´s graphs between elements are provided by #XmlREF #XmlID and these both should be equals to match. But in CIM/RDF the references to a resource through an ID, i.e. rdf:resource="#_37C0E103000D40CD812C47572C31C0AD" contain the "#" character, consequently JAXB is unable to match "GeographicalRegion" vs. "SubGeographicalRegion.Region" when in the rdf:resource atribute the "#" character is present.
Here an example:
<cim:GeographicalRegion rdf:ID="_37C0E103000D40CD812C47572C31C0AD">
<cim:IdentifiedObject.name>GeoRegion</cim:IdentifiedObject.name>
<cim:IdentifiedObject.localName>OpenCIM3bus</cim:IdentifiedObject.localName>
</cim:GeographicalRegion>
<cim:SubGeographicalRegion rdf:ID="_ID_SubGeographicalRegion">
<cim:IdentifiedObject.name>SubRegion</cim:IdentifiedObject.name>
<cim:IdentifiedObject.localName>SubRegion</cim:IdentifiedObject.localName>
<cim:SubGeographicalRegion.Region rdf:resource="#_37C0E103000D40CD812C47572C31C0AD"/>
</cim:SubGeographicalRegion>
I realize you're asking for a solution using JAXB, but I would urge you to consider an RDF-based solution as it is more flexible and robust. You're basically trying to reinvent what RDF parsers already have built in. RDF/XML is a difficult format to parse, it doesn't make much sense to try and hack your own parsing together - especially since files that have very different XML structures can express exactly the same information: this only becomes apparent when looking at the level of the RDF. You may find that your JAXB parser workaround works on one CIM/RDF file but completely fails on another.
So, here's an example of how to process your file using the Sesame RDF API. No inferencing is involved, this just parses the file and puts it in an in-memory RDF model, which you can then manipulate and query from any angle.
Assuming the root element of your CIM file looks something like this:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:cim="http://example.org/cim/">
(only a guess of course, but I need prefixes for a proper example)
Then you can do the following, using Sesame's Rio RDF/XML parser:
String baseURI = "http://example.org/my/file";
FileInputStream in = new FileInputStream("/path/to/my/cim.rdf");
Model model = Rio.parse(in, baseURI, RDFFormat.RDFXML);
This creates an in-memory RDF model of your document. You can then simply filter-query over that. For example, to print out the properties of all resources that have _37C0E103000D40CD812C47572C31C0AD as their SubGeographicalRegion.Region:
String CIM_NS = "http://example.org/cim/";
ValueFactory vf = ValueFactoryImpl.getInstance();
URI subRegion = vf.createURI(CIM_NS, "SubGeographicalRegion.Region");
URI res = vf.createURI("http://example.org/my/file#_37C0E103000D40CD812C47572C31C0AD");
Set<Resource> subs = model.filter(null, subRegion, res).subjects();
for (Resource sub: subs) {
System.out.println("resource: " + sub + " has the following properties: ");
for (URI prop: model.filter(sub, null, null).predicates()) {
System.out.println(prop + ": " + model.filter(sub, prop, null).objectValue());
}
}
Of course at this point you can also choose to convert the model to some other syntax format for further handling by your application - as you see fit. The point is that the difference between the identifiers with the leading # and without has been resolved for you by the RDF/XML parser.
This is of course personal opinion only, since I don't know the details of your use case, but I think you'll find that this is quite quick and flexible. I should also point out that although the above solution keeps the entire model in memory, you can easily adapt this to a more streaming (and therefore less memory-intensive) approach if you find your files are too big.
I have about 3200 URLs to small XML files which have some data in the form of strings(obviously).The XML files are displayed(not downloaded) when I go to the URLs. So I need to extract some data from all those XMLs and save it in a single .txt file or XML file or whatever. How can I automate this process?
*Note: This is what the files look like. I need to copy the 'location' and 'title' from all of them and put them in one single file. Using what methodology can this be achieved?
<?xml version="1.0"?>
-<playlist xmlns="http://xspf.org/ns/0/" version="1">
-<tracklist>
<location>http://radiotool.com/fransn.mp3</location>
<title>France, Paris radio 104.5</title>
</tracklist>
</playlist>
*edit: Fixed XML.
It's easy enough with XQuery or XSLT, though the details will depend on how the URLs are held. If they're in a Java List, then (with Saxon at least) you can supply this list as a parameter to the following query:
declare variable urls as xs:string* external;
<data>{
for $u in $urls return doc($u)//*:tracklist
}</data>
The Java code would be something like:
Processor proc = new Processor();
XQueryCompiler c = proc.newXQueryCompiler();
XQueryEvaluator q = c.compile($query).load();
List<XdmItem> urls = new ArrayList();
for (url : inputUrls) {
urls.append(new XdmAtomicValue(url);
}
q.setExternalVariable(new QName("urls"), new XdmValue(urls));
q.setDestination(...)
run();
Have a look at the JSoup library here: http://jsoup.org/
It has facilities for pulling and fixing the contents of a URL, it is intended for HTML though, so I'm not sure it will be good for XML, but it is worth a look.
I'm going to extract values from this XML/RDF:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="urn:turismoculturale.itdc.filas-1.0.0-RC1#">
<j.0:Chiesa rdf:ID="turismoCulturale_POI_880">
<j.0:title xml:lang="en">Church of S. Giuda Taddeo or S. Onofrio - Gaeta</j.0:title>
<j.0:title xml:lang="it">Chiesa S. Giuda Taddeo o S. Onofrio - Gaeta</j.0:title>
</j.0:Chiesa>
</rdf:RDF>
I would like to get en title when I am in "en" language and "it" title otherwise. I am able to set the title value in the Poi bean by using:
Digester digester = new Digester();
digester.setNamespaceAware( true );
digester.setRuleNamespaceURI( "urn:turismoculturale.itdc.filas-1.0.0-RC1#" );
digester.addObjectCreate( "*/Chiesa", Poi.class);
digester.addBeanPropertySetter("*/title", "title");
...
but I don't know if it is the english title or the italian one.
Ok - first and foremost, don't try to parse RDF/XML with an XML parser. It's never going to work because the semantics of the XML document are irrelevant with respect to RDF/XML and it is a bad idea (if you know how RDF/XML works), especially in your case where the RDF/XML is being generated dynamically (you can tell by the namespaces). You need to use an RDF parser to parse RDF.
So that means don't use an XML to Java object mapping tool, use an RDF to Java Object mapping tool.
Here is a great link explaining how to do this:
http://answers.semanticweb.com/questions/859/best-way-to-convert-rdfxml-file-to-pojos
And another:
http://answers.semanticweb.com/questions/3251/experience-using-java-based-frameworks-for-rdf-to-pojo-and-vice-versa-mapping
Along with links to all the tools in the aforementioned resource:
Jenabean
Empire
AliBaba
RDFReactor
For an out and out RDF parser, look at Jena:
http://incubator.apache.org/jena
It's an Apache project that is also nicely Maven'ed up.
The Commons Digester FAQ says:
Occasionally, people ask how they can fire a rule for an element based on the value of an attribute
There is no simple way to do this with Digester; the built-in rule-matching engines only provide the ability to match on element name. There is no support available for XPath expressions
It might be possible to create a custom "filtering" rule that has a child rule, and fires that child rule only when the appropriate conditions are set. There are no examples of such a solution, however.
Digester isn't a very good tool. It's too simplistic. Consider using a more comprehensive event-based API such as StAX.
I have the following feeds from my vendor,
http://scores.cricandcric.com/cricket/getFeed?key=4333433434343&format=xml&tagsformat=long&type=schedule
I wanted to get the data from that xml files as java objects, so that I can insert into my database regularly.
The above data is nothing but regular updates from the vendor, so that I can update in my website.
can you please suggest me what are my options available to get this working
Should I use any webservices or just Xstream
to get my final output.. please suggest me as am a new comer to this concept
Vendor has suggested me that he can give me the data in following 3 formats rss, xml or json, I am not sure what is easy and less consumable to get it working
I would suggest just write a program that parses the XML and inserts the data directly into your database.
Example
This groovy script inserts data into a H2 database.
//
// Dependencies
// ============
import groovy.sql.Sql
#Grapes([
#Grab(group='com.h2database', module='h2', version='1.3.163'),
#GrabConfig(systemClassLoader=true)
])
//
// Main program
// ============
def sql = Sql.newInstance("jdbc:h2:db/cricket", "user", "pass", "org.h2.Driver")
def dataUrl = new URL("http://scores.cricandcric.com/cricket/getFeed?key=4333433434343&format=xml&tagsformat=long&type=schedule")
dataUrl.withReader { reader ->
def feeds = new XmlSlurper().parse(reader)
feeds.matches.match.each {
def data = [
it.id,
it.name,
it.type,
it.tournamentId,
it.location,
it.date,
it.GMTTime,
it.localTime,
it.description,
it.team1,
it.team2,
it.teamId1,
it.teamId2,
it.tournamentName,
it.logo
].collect {
it.text()
}
sql.execute("INSERT INTO matches (id,name,type,tournamentId,location,date,GMTTime,localTime,description,team1,team2,teamId1,teamId2,tournamentName,logo) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", data)
}
}
Well... you could use an XML Parser (stream or DOM), or a JSON parser (again stream of 'DOM'), and build the objects on the fly. But with this data - which seems to consist of records of cricket matches, why not go with a csv format?
This seems to be your basic 'datum':
<id>1263</id>
<name>Australia v India 3rd Test at Perth - Jan 13-17, 2012</name>
<type>TestMatch</type>
<tournamentId>137</tournamentId>
<location>Perth</location>
<date>2012-01-14</date>
<GMTTime>02:30:00</GMTTime>
<localTime>10:30:00</localTime>
<description>3rd Test day 2</description>
<team1>Australia</team1>
<team2>India</team2>
<teamId1>7</teamId1>
<teamId2>1</teamId2>
<tournamentName>India tour of Australia 2011-12</tournamentName>
<logo>/cricket/137/tournament.png</logo>
Of course you would still have to parse a csv, and deal with character delimiting (such as when you have a ' or a " in a string), but it will reduce your network traffic quite substantially, and likely parse much faster on the client. Of course, this depends on what your client is.
Actually you have RESTful store that can return data in several formats and you only need to read from this source and no further interaction is needed.
So, you can use any XML Parser to parse XML data and put the extracted data in whatever data structure that you want or you have.
I did not hear about XTREME, but you can find more information about selecting the best parser for your situation at this StackOverflow question.