I tried to follow this: How to use dot in field name?. But it results as shown in the picture; there is an additional space:
protected Document setNestedField(Document doc, FieldValue parentField, String nestedFieldName, Object value, boolean concatenate) {
    if (concatenate) {
        doc.put(parentField.getSystemName() + "." + nestedFieldName, value);
    } else {
        doc.put(nestedFieldName, value);
    }
    return doc;
}
Exception: Invalid BSON field name photographs.inner_fields; nested exception is java.lang.IllegalArgumentException: Invalid BSON field name photographs.inner_fields.
How can I use a dot (".") in a field name? I have to use the dot because I'm using a third-party API and have no option to replace it with something else like [dot]. Please suggest a solution.
In MongoDB, field names cannot contain the dot (.) character, as it is part of the dot-notation syntax; see the documentation.
What third-party API are you using? Are you sure you need a dot? Dots are commonly used when parsing JSON, and your third-party API should not need them.
So a third-party API is both constructing the keys (with periods in them) and saving them to MongoDB?
I suggest that you open a bug ticket in said API's tracker.
If that is not the case, encode the periods somewhere in the persistence code and decode them on the way up.
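A minimal sketch of that encode/decode idea, assuming you can hook into your own persistence layer; the placeholder character (U+FF0E, a full-width full stop) and the helper names are illustrative choices, not part of any API:
// Sketch: encode dots before writing keys to MongoDB, decode them after reading.
// DOT_PLACEHOLDER is an arbitrary character that cannot occur in your real keys.
private static final char DOT_PLACEHOLDER = '\uFF0E';

static String encodeKey(String key) {
    return key.replace('.', DOT_PLACEHOLDER);
}

static String decodeKey(String key) {
    return key.replace(DOT_PLACEHOLDER, '.');
}

// Hypothetical usage in the persistence code:
// doc.put(encodeKey(thirdPartyFieldName), value);   // on the way down
// String original = decodeKey(storedFieldName);     // on the way up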
Related
This is very new to me. I am reading data from a Cassandra table. The data is being extracted via a "SELECT JSON * ..." query, but here's the thing: the format of that JSON is
{"acct_ref_nb": 1401040701, "txn_pst_dt": "2020-02-26", "txn_pst_tm": 1934131, "txn_am": 15000.0 ....
Every field is in quotation marks, followed by a colon, followed by the value, then a comma and the next field, so on and so forth.
We need to reformat this and have a nested structure. We also need to change the names of the fields. So you would have something like...
"{
"ccEvent": {
"account": {
"accountReferenceNumber": 1401040701,
"transactionPostDate": "2020-02-26",
"transactionPostTime": 1934131,
"transactionAmount": 15000.0,
........
Is there a preferred library to do this? I'm literally lost even at a high level on how to do this. Thanks.
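No special library is strictly required; any JSON mapper can do it. A sketch with Jackson, assuming the field names and nesting from the example above (the class name and the set of mapped fields are illustrative):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class CcEventMapper {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Takes one flat JSON row (as returned by "SELECT JSON * ...") and rebuilds it
    // into the nested structure with the renamed fields.
    public static String toNestedJson(String flatRowJson) throws Exception {
        JsonNode flat = MAPPER.readTree(flatRowJson);

        ObjectNode account = MAPPER.createObjectNode();
        account.set("accountReferenceNumber", flat.get("acct_ref_nb"));
        account.set("transactionPostDate", flat.get("txn_pst_dt"));
        account.set("transactionPostTime", flat.get("txn_pst_tm"));
        account.set("transactionAmount", flat.get("txn_am"));
        // ... map the remaining fields the same way

        ObjectNode ccEvent = MAPPER.createObjectNode();
        ccEvent.set("account", account);

        ObjectNode root = MAPPER.createObjectNode();
        root.set("ccEvent", ccEvent);
        return MAPPER.writerWithDefaultPrettyPrinter().writeValueAsString(root);
    }
}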
I have implemented a custom bridge which maps all the dynamic fields with their related types. The type can be FieldType.STRING, FieldType.DOUBLE, or FieldType.BOOLEAN, based on the value.
When I looked at the mapping in my Elasticsearch schema, all the string fields are mapped with type TEXT, where I expect them to be keyword so that I can do a wildcard search.
Here is my problem: I want to filter on "AAA-VALUE" for the dynamically mapped field 'attribute.dynamic-field-1'.
I have an indexed value "AAA-VALUE" for the dynamically mapped field 'attribute.dynamic-field-1'.
When I tried a keyword search, I got the error 'Field bridge is not found'; I resolved that by ignoring the bridge with ignoreFieldBridge, and the error is gone.
Then I tried again to search for the value "AAA-VALUE" and the result is empty (no data found). Here I created the query using a keyword() query.
Then I tried a phrase query and it worked, but the problem is how I can do a wildcard search like '-VALUE'.
Regarding code, I followed similar implementation as given here https://github.com/hibernate/hibernate-search/blob/master/legacy/engine/src/test/java/org/hibernate/search/test/bridge/MultiFieldMapBridge.java
Only the type differs in my implementation, where the type can be a string or boolean or double.
My Hibernate Search version (hibernate-search and hibernate-search-elasticsearch) is 5.11.3.Final.
It started working after making the changes below.
This is how I added the fields before:
public class MultiFieldMapClassBridge implements MetadataProvidingFieldBridge {
    // ...
    luceneOptions.addFieldToDocument( fieldPrefix + "." + key, value, document );
    // ...
}
But the fields should be added as below.
public class MultiFieldMapClassBridge implements MetadataProvidingFieldBridge {
    // ...
    org.apache.lucene.document.Field field =
            new org.apache.lucene.document.StringField(fieldPrefix + "." + key, value, luceneOptions.getStore());
    document.add(field);
    // ...
}
I wrote the wildcard query as below:
queryBuilder.keyword().wildcard()
        .onField(prefixedPath)
        .ignoreFieldBridge()
        .matching(String.format("*%s*", matchingString.toLowerCase(Locale.getDefault())))
        .createQuery();
I realised this after reading this doc, where class bridges have to add the field as a StringField:
https://docs.jboss.org/hibernate/search/5.5/reference/en-US/html_single/#example-class-bridge
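For completeness, a sketch of how such a wildcard query might be executed with the Hibernate Search 5 ORM API; the entity class MyEntity, the session variable, and the field path are placeholders, not taken from the original code:
import java.util.List;
import org.apache.lucene.search.Query;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

// session is an open org.hibernate.Session; MyEntity is the indexed entity (both hypothetical).
FullTextSession fullTextSession = Search.getFullTextSession(session);
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
        .buildQueryBuilder().forEntity(MyEntity.class).get();

Query luceneQuery = queryBuilder.keyword().wildcard()
        .onField("attribute.dynamic-field-1")
        .ignoreFieldBridge()
        .matching("*-value*")          // wildcard terms are not analyzed, hence lowercase
        .createQuery();

List<?> results = fullTextSession.createFullTextQuery(luceneQuery, MyEntity.class).list();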
I'm using jsoup to get all text from websites.
Document doc = Jsoup.connect("URL").get();
String allText = doc.text().toLowerCase();
Then I'm using Hibernate to persist the object that holds all text to a MySQL DB:
...
@Column(name="all_text")
@Lob
private String allText = null;
...
Everything is good so far, except that sometimes I get a MySQL error when I try to save the object with allText:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8A s...' for column 'all_text' at row 1
Already looked this up and it's an encoding error; they probably have some special characters on their websites. I found a way to fix this by changing the encoding in the DB.
But my actual question is: what's the best way to filter and remove the special characters from the allText string and not persist them at all?
EDIT: To clarify, by special characters I mean emoticons and all that stuff; definitely anything that doesn't fit into MySQL's utf8 encoding (which only covers the Basic Multilingual Plane). I'm not concerned about ~ ^ etc...
Thanks in advance!
Just use regex:
allText = allText.replaceAll("\\p{C}", "");
Don't forget to import java.util.regex.Pattern if you compile the pattern yourself; String.replaceAll works without any import.
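Note that \p{C} removes control, format, and unassigned characters but does not necessarily remove the emoji code points themselves (most are in category So). If the goal is specifically to drop everything MySQL's 3-byte utf8 charset cannot store, here is a sketch that strips all characters outside the Basic Multilingual Plane:
// Sketch: remove every character outside the Basic Multilingual Plane (U+0000..U+FFFF),
// i.e. exactly the characters that MySQL's 3-byte "utf8" charset cannot store (emoji included).
allText = allText.replaceAll("[^\\u0000-\\uFFFF]", "");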
I have to find a user-defined String in a document (using Java), which is stored in a database in a BLOB. When I search for a String with special characters ("Umlaute", äöü etc.), it fails, meaning it does not return any positions at all. And I am not allowed to convert the document's content to UTF-8 (which would have fixed this problem but raised a new, even bigger one).
Some additional information:
The document's content is returned as String in "ISO-8859-1" (Latin1).
Here is an example, what a String could look like:
Die Erkenntnis, daÃ der KÃ¼nstler Schutz braucht, ...
This is how it should look like:
Die Erkenntnis, daß der Künstler Schutz braucht, ...
If I am searching for Künstler it would fail to find it, because it looks for ü but only finds Ã¼.
Is it possible to convert Künstler into KÃ¼nstler so I can search for the wrongly encoded version instead?
Note:
We are using the Hibernate framework for database access. The original getter for the document's content returns a byte[]. The String is then returned by calling
new String(getContent(), "ISO-8859-1")
The problem here is, that I cannot change this to UTF-8, because it would then mess up the rest of our application which is based on a third party application that delivers data this way.
Okay, looks like I've found a way to mess up the encoding on purpose.
new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")
By getting the bytes of the String Künstler in UTF-8 and then creating a new String, telling Java that this is Latin-1, it converts to KÃ¼nstler. It's a hell of a hack but it seems to work well.
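A minimal sketch of how that hack can be plugged into the actual search; getContent() is the getter mentioned above, everything else is illustrative:
import java.nio.charset.StandardCharsets;

// Mis-decode the search term the same way the document content was mis-decoded,
// then look for it in the Latin-1 view of the content.
String needle = new String("Künstler".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
String content = new String(getContent(), StandardCharsets.ISO_8859_1);
int position = content.indexOf(needle);   // >= 0 if the wrongly encoded term occurs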
Already answered by yourself.
An altogether different approach:
If you can search the blob, you could search using
"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{Ascii}+", "%") + "%'"
This replaces non-ASCII sequences by the % wildcard: UTF-8 multibyte sequences are non-ASCII by design.
Maybe I'm really missing something.
I have indexed a bunch of key/value pairs in Lucene (v4.1 if it matters). Say I have
key1=value1 and key2=value2, e.g. as read from a properties file.
They get indexed both as specific fields and into a catchall "ALL" field, e.g.
new Field("key1", "value1", aFieldTypeMimickingKeywords);
new Field("key2", "value2", aFieldTypeMimickingKeywords);
new Field("ALL", "key1=value1", aFieldTypeMimickingKeywords);
new Field("ALL", "key2=value2", aFieldTypeMimickingKeywords);
// then get added to the Document of course...
I can then do a wildcard search, using
new WildcardQuery(new Term("ALL", "*alue1"));
and it will find the hit.
But it would be nice to get more info, like: what was the complete value (e.g. "key1=value1") that goes with that hit?
The best I can figure out is to get the Document, then get the list of IndexableFields, then loop over all of them and see if field.stringValue().contains("alue1"). (I can look at the data structures in the debugger and all the info is there.)
This seems completely insane because isn't that what Lucene just did? Shouldn't the hit information return some of the Fields?
Is Lucene missing what seems like "obvious" functionality? Googling and staring at the APIs hasn't revealed anything straightforward, but I feel like I must be searching for the wrong thing.
You might want to try the IndexSearcher.explain() method. Once you have the ID of the matching document, prepare a query for each field (using the same search keywords) and invoke Explanation.isMatch() for each query: the ones that yield true will give you the matched fields. Example:
for (String field : fields) {
    Query query = new WildcardQuery(new Term(field, "*alue1"));
    Explanation ex = searcher.explain(query, docID);
    if (ex.isMatch()) {
        // the query matched on this field
    }
}
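As a follow-up, once a matching field is known, the stored value(s) that produced the hit can be read back from the document (assuming the fields were stored); searcher, docID, and field come from the snippet above:
import org.apache.lucene.document.Document;

Document doc = searcher.doc(docID);
for (String value : doc.getValues(field)) {    // e.g. "key1=value1" for the ALL field
    if (value.contains("alue1")) {
        // this is the complete stored value that goes with the hit
    }
}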