Generating unique IRI from a filename - java

I have an ontology, created using Protégé 4.3.0, and I would like to use the OWL API to add some OWLNamedIndividual objects to an OWL file. I use the following instruction to create a new OWLNamedIndividual:
OWLNamedIndividual objSample = df.getOWLNamedIndividual(IRI.create(iri + "#" + id));
The variable id is a String; iri is the base IRI of the loaded ontology, which I obtained with the following instruction: iri = ontology.getOntologyID().getOntologyIRI().
So the new OWLNamedIndividual is added to the loaded ontology, and the ontology is then saved to the OWL file using the following instructions:
XMLWriterPreferences.getInstance().setUseNamespaceEntities(true);
OWLOntologyFormat format = manager.getOntologyFormat(ontology);
manager.saveOntology(ontology, format, IRI.create(file.toURI()));
The variable id is a String generated from the base name of a file (i.e., the file name without the extension). If the base name contains one or more spaces, the ontology is saved without any error, but when I open the newly saved OWL file, Protégé reports a parsing error at the first occurrence of the IRI containing spaces.
How could I create a valid IRI for an OWLNamedIndividual object using the base IRI of loaded ontology and the base name of a file?

IRIs are supposed to be a block that represents your resource. If I understand you correctly, you have an id such as big boat and you are creating IRIs that look like <http://example.com#big boat>. This is not a valid IRI; you need to replace the space with an _ or a -, so that you have <http://example.com#big_boat>. Even if you enter a modelling element name with a space in Protégé, it will automatically put a _ in the middle.
Take a look at this article for the invalid characters in an IRI:
Systems accepting IRIs MAY also deal with the printable characters in
US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
"{", "}", "|", "\", "^", and "`", in step 2 above. If these
characters are found but are not converted, then the conversion
SHOULD fail.
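A minimal sketch of that sanitization (my own illustration, not from the original answer; the helper name and the underscore replacement are assumptions):
import org.semanticweb.owlapi.model.IRI;

// Replace the characters quoted above, plus any whitespace, with an
// underscore before building the individual's IRI.
static IRI toIndividualIri(String baseIri, String baseName) {
    String safeId = baseName.replaceAll("[<>\"{}|\\\\^`\\s]", "_");
    return IRI.create(baseIri + "#" + safeId);
}
With that in place, df.getOWLNamedIndividual(toIndividualIri(iri, id)) should yield an IRI that Protégé can parse.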

Related

Removing special characters in dynamic schema

Requirement:
Source file:
abc,''test,data'',valid
xyz,''sample,data'',invalid
The data in the source file needs to be read dynamically; we currently read the entire record into one string column. The field delimiter is a comma, and one of the values also contains a comma. I have to load the data into the target table as follows, without the double quotes:
Target table:
Col1|Col2|Col3
abc|test,data|valid
xyz|sample,data|invalid
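No answer is attached to this question here, but a minimal Java sketch of the described transformation, assuming the doubled quotes in the sample are ordinary CSV double quotes (the class and method names are my own):
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSplitter {

    // Splits one line on commas that are outside double quotes and
    // drops the quote characters themselves.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes; // toggle quoted state, skip the quote
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Prints: abc|test,data|valid
        System.out.println(String.join("|", splitLine("abc,\"test,data\",valid")));
    }
}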

ParserException in Manchester Syntax

I am trying to use ManchesterOWLSyntaxParser from the OWL API. I need to convert a String in Manchester syntax into an OWL axiom that I can add to an existing ontology. The problem is that I always get a ParserException (something like the one below):
Exception in thread "main" org.semanticweb.owlapi.manchestersyntax.renderer.ParserException: Encountered Class: at line 1 column 1. Expected one of:
Class name
Object property name
Data property name
inv
Functional
inverse
InverseFunctional
(
Asymmetric
Transitive
Irreflexive
{
Symmetric
Reflexive
at org.semanticweb.owlapi.manchestersyntax.parser.ManchesterOWLSyntaxParserImpl$ExceptionBuilder.build(ManchesterOWLSyntaxParserImpl.java:2802)
at org.semanticweb.owlapi.manchestersyntax.parser.ManchesterOWLSyntaxParserImpl.parseAxiom(ManchesterOWLSyntaxParserImpl.java:2368)
at Main.main(Main.java:29)
I have read about Manchester syntax on the W3C website, but I don't know where the problem is. Maybe the Manchester parser should be used in a different way.
Here is the code, with an example of a string in Manchester syntax that I have tried to parse:
OWLOntology o = ontologyManager.loadOntologyFromOntologyDocument(new File("family.owl"));
OWLDataFactory df = o.getOWLOntologyManager().getOWLDataFactory();
ManchesterOWLSyntaxParser parser = new ManchesterOWLSyntaxParserImpl(ontologyManager.getOntologyConfigurator(), df);
parser.setStringToParse("Class: <somePrefix#Father>" +
" EquivalentTo: \n" +
" <somePrefix#Male>\n" +
" and <somePrefix#Parent>");
OWLAxiom ax = parser.parseAxiom();
The ontology does not have declarations for the classes and properties used in the fragment, and the parser cannot parse the fragment without knowing the entities involved. Just as when parsing a whole ontology, classes, properties and datatypes need declaration axioms in the ontology object.
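A minimal sketch of that fix (the somePrefix IRIs mirror the placeholders from the question; setDefaultOntology is, to the best of my knowledge, the call that lets the parser resolve names against the ontology, so verify it against your OWL API version):
// Declare the entities the fragment refers to before parsing.
OWLDataFactory df = ontologyManager.getOWLDataFactory();
OWLClass father = df.getOWLClass(IRI.create("somePrefix#Father"));
OWLClass male = df.getOWLClass(IRI.create("somePrefix#Male"));
OWLClass parent = df.getOWLClass(IRI.create("somePrefix#Parent"));
ontologyManager.addAxiom(o, df.getOWLDeclarationAxiom(father));
ontologyManager.addAxiom(o, df.getOWLDeclarationAxiom(male));
ontologyManager.addAxiom(o, df.getOWLDeclarationAxiom(parent));
// Point the parser at the ontology so it can see the declared entities.
parser.setDefaultOntology(o);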

Mongo database Invalid BSON field name exception

I tried to follow How to use dot in field name?, but it did not work; here is the code and the resulting exception:
protected Document setNestedField(Document doc, FieldValue parentField, String nestedFieldName, Object value, boolean concatenate) {
    if (concatenate) {
        doc.put(parentField.getSystemName() + "." + nestedFieldName, value);
    } else {
        doc.put(nestedFieldName, value);
    }
    return doc;
}
Exception: Invalid BSON field name photographs.inner_fields; nested exception is java.lang.IllegalArgumentException: Invalid BSON field name photographs.inner_fields.
How can I use a dot (.) in a field name? I have to use the dot, as I'm using a third-party API and have no option to replace it with something else like [dot]. Please advise.
In MongoDB, field names cannot contain the dot (.) character, as it is part of the dot-notation syntax; see the documentation.
What third-party API are you using? Are you sure you need a dot? Dots are commonly used when parsing JSON, and your third-party API should not need them.
So, a third-party API is both constructing the keys (with periods in them) and saving them to MongoDB? I suggest you open a bug ticket in said API's tracker.
If this is not the case, encode the periods somewhere in the persistence code, and decode them on the way up.
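A minimal sketch of such encode/decode helpers (the stand-in character is an arbitrary choice of mine, not something the MongoDB driver prescribes):
// Swap dots for a look-alike character on the way into MongoDB and
// restore them on the way out. U+FF0E (fullwidth full stop) is an
// arbitrary stand-in unlikely to occur in real field names.
static String encodeKey(String key) {
    return key.replace('.', '\uFF0E');
}

static String decodeKey(String key) {
    return key.replace('\uFF0E', '.');
}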

Java POJO to/from CSV, using field names as column titles

I’m looking for a Java library that can read/write a list of “simple objects” from/to a CSV file.
Let’s define a “simple object” as a POJO all of whose fields are primitive types or strings.
The matching between an object’s field and a CSV column must be defined by the name of the field and the title (first row) of the column: the two must be identical. No additional matching information should be required by the library! Such additional matching information is horrible code duplication (with respect to the definition of the POJO class) if you simply want the CSV titles to match the field names.
This last feature is something I’ve failed to find in all the libraries I looked at: OpenCSV, Super CSV and BeanIO.
Thanks!!
Ofer
uniVocity-parsers does not require you to provide the field names in your class: attribute names are matched against the column headers automatically. It offers annotations if you need to map a different name, or even to have data manipulation performed. It is also much faster than the other libraries you tried:
class TestBean {

    @NullString(nulls = { "?", "-" }) // if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
    @Parsed(defaultNullRead = "0") // if a value resolves to null, it will be converted to the String "0".
    private Integer quantity; // The attribute name will be matched against the column header in the file automatically.

    @Trim
    @LowerCase
    @Parsed
    private String comments;
    ...
}
To parse:
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
//And parse!
//this submits all rows parsed from the input to the BeanListProcessor
parser.parse(new FileReader(new File("/examples/bean_test.csv")));
List<TestBean> beans = rowProcessor.getBeans();
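A sketch of the write direction along the same lines (the writer classes mirror the parser ones; the exact method names should be checked against the current uniVocity docs):
// Write the beans back to CSV, with headers taken from the annotated fields.
BeanWriterProcessor<TestBean> writerProcessor = new BeanWriterProcessor<TestBean>(TestBean.class);
CsvWriterSettings writerSettings = new CsvWriterSettings();
writerSettings.setRowWriterProcessor(writerProcessor);
writerSettings.setHeaderWritingEnabled(true);

CsvWriter writer = new CsvWriter(new FileWriter("/examples/bean_out.csv"), writerSettings);
writer.processRecordsAndClose(beans);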
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Java: Search in a wrongly encoded String without modifying it

I have to find a user-defined String in a document (using Java) that is stored in a database BLOB. When I search for a String with special characters (umlauts: äöü etc.), it fails, meaning it does not return any positions at all. And I am not allowed to convert the document's content to UTF-8 (which would have fixed this problem, but raised a new, even bigger one).
Some additional information:
The document's content is returned as a String in "ISO-8859-1" (Latin-1).
Here is an example, what a String could look like:
Die Erkenntnis, daà der Künstler Schutz braucht, ...
This is how it should look:
Die Erkenntnis, daß der Künstler Schutz braucht, ...
If I am searching for Künstler it would fail to find it, because it looks for ü but only finds ü.
Is it possible to convert Künstler into Künstler so I can search for the wrongly encoded version instead?
Note:
We are using the Hibernate framework for database access. The original getter for the document's content returns a byte[]. The String is then created by calling
new String(getContent(), "ISO-8859-1")
The problem here is that I cannot change this to UTF-8, because it would mess up the rest of our application, which is based on a third-party application that delivers data this way.
Okay, looks like I've found a way to mess up the encoding on purpose.
new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")
By getting the bytes of the String Künstler in UTF-8 and then creating a new String from them, telling Java that they are Latin-1, it converts to Künstler. It's a hell of a hack, but it seems to work well.
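Put together, the search then runs against the Latin-1 view of the document without touching its bytes (a sketch; getContent() is the getter mentioned above):
import java.nio.charset.StandardCharsets;

// Mis-encode the search term the same way the document text is mis-encoded,
// then search with plain indexOf. StandardCharsets avoids the checked
// UnsupportedEncodingException of the String-name overloads.
String document = new String(getContent(), StandardCharsets.ISO_8859_1);
String needle = new String("Künstler".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1); // -> "Künstler"
int position = document.indexOf(needle); // >= 0 wherever the term occurs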
You already answered this yourself.
An altogether different approach:
If you can search the BLOB directly, you could search using
"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{Ascii}+", "%") + "%'"
This replaces non-ASCII sequences by the % wildcard: UTF-8 multibyte sequences are non-ASCII by design.
