Modify the metamodel's schema to change/rename column names - java

I am using Apache MetaModel to get schema information. In one use case I need to create a CsvDataContext object for a CSV file with no header; I have the column names in a separate data structure (List<String> colNames).
The context object reports the column names as "A", "B", "C", etc.; I guess MetaModel assigns default column names to tables with no headers.
Is there any way to modify the schema which is held by the CsvDataContext object?
I believe UpdateableDataContext should work, but the documentation doesn't expose any method that allows modifying metadata such as column names.
How can I achieve this?

When you create your CsvDataContext, you specify a CsvConfiguration. One of the options in the CsvConfiguration is to provide a ColumnNamingStrategy. The default strategy is indeed to use alphabetic characters: A, B, C etc. But you can use a custom naming strategy, like this:
ColumnNamingStrategy columnNamingStrategy =
        ColumnNamingStrategies.customNames("id", "foo", "bar", "baz");
CsvConfiguration configuration = new CsvConfiguration(
        0, columnNamingStrategy, "UTF-8", ',', '"', '\\', true, false);
return new CsvDataContext(file, configuration);
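Since your column names already live in a List<String>, you can feed them straight into that strategy. A minimal sketch, assuming ColumnNamingStrategies.customNames also accepts a List<String> (if your MetaModel version only offers the varargs overload, convert with colNames.toArray(new String[0])):

import java.io.File;
import java.util.List;
import org.apache.metamodel.csv.CsvConfiguration;
import org.apache.metamodel.csv.CsvDataContext;
import org.apache.metamodel.schema.naming.ColumnNamingStrategies;
import org.apache.metamodel.schema.naming.ColumnNamingStrategy;

public class CsvWithExternalHeaders {
    public static CsvDataContext create(File file, List<String> colNames) {
        // Build the naming strategy from the externally held column names.
        ColumnNamingStrategy naming = ColumnNamingStrategies.customNames(colNames);
        // NO_COLUMN_NAME_LINE (0): the file itself contains no header row.
        CsvConfiguration configuration = new CsvConfiguration(
                CsvConfiguration.NO_COLUMN_NAME_LINE, naming,
                "UTF-8", ',', '"', '\\', true, false);
        return new CsvDataContext(file, configuration);
    }
}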


Spring Data MongoDB - projection and search

I am using "Wildcard text index" in order to search for a pattern in every fields of my class. I am also using projection in order to remove a certain field:
@Query(value = "{$text: { $search: ?0 }}", fields = "{'notWantedField': 0}")
However, I would like to prevent from matching something from the unwanted field.
In other words, I would like first to project (and remove fields), then search on the remaining fields.
Is there a way to combine projection and search while keeping the wildcard search?
Thanks a lot.
I am using spring-data-mongodb 1.10.8
A possible solution could be a $and operator combined with a $regex.
For example, following the MongoDB documentation (https://docs.mongodb.com/manual/reference/operator/query/text): suppose you create a text index combining subject and author (db.articles.createIndex({"author": "text", "subject": "text"})). You can then exclude the author field with this query:
db.articles.find( {$and: [{ $text: { $search: "coffee" } }, {"author": {'$regex' : '^((?!coffee).)*$', '$options' : 'i'}}]}, {"author": 0})
In your case, since your index is a wildcard index, you must use such a regex to exclude every field that you also remove in the projection.
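For completeness, a hedged sketch of how the same query might look as a Spring Data repository method (the Article class, repository, and method name are made up; the regex term is hard-coded here as in the shell example, whereas in practice it would be derived from the same search term):

import java.util.List;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;

public interface ArticleRepository extends MongoRepository<Article, String> {

    // Text-search all indexed fields, but reject matches that come only from
    // "author" by requiring that "author" does not contain the term.
    @Query(value = "{$and: [{$text: {$search: ?0}}, "
                 + "{'author': {$regex: '^((?!coffee).)*$', $options: 'i'}}]}",
           fields = "{'author': 0}")
    List<Article> searchExcludingAuthor(String term);
}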

ElasticSearch - define custom letter order for sorting

I'm using ElasticSearch 2.4.2 (via HibernateSearch 5.7.1.Final from Java).
I have a problem with string sorting.
The language of my application has diacritics, which have a specific alphabetic
ordering. For example, Ł goes directly after L, Ó goes after O, etc.
So the strings are supposed to sort like this:
Dla
Dła
Doa
Dóa
Dza
Eza
ElasticSearch sorts by typical letters first, and moves all the strange
letters to the end:
Dla
Doa
Dza
Dła
Dóa
Eza
Can I add a custom letter ordering for ElasticSearch?
Maybe there are some plugins for this?
Do I need to write my own plugin? How do I start?
I found a plugin for the Polish language for ElasticSearch,
but as I understand it, it is for analysing, and analysing is not a solution
in my case, because it will ignore diacritics and leave words with L and Ł mixed:
Dla
Dłb
Dlc
This would sometimes be acceptable, but is not acceptable in my specific usecase.
I will be grateful for any remarks on this.
I've never used it, but there is a plugin that could fit your needs: the ICU collation plugin.
You will have to use the icu_collation token filter, which turns the tokens into collation keys. For that reason you will need to use a separate @Field (e.g. myField_sort) in Hibernate Search.
You can assign a specific analyzer to your field with @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer")), and define this analyzer (type, parameters) with something like this on one of your entities:
@Entity
@Indexed
@AnalyzerDef(
        name = "myCollationAnalyzer",
        filters = {
                @TokenFilterDef(
                        name = "polish_collation",
                        factory = ElasticsearchTokenFilterFactory.class,
                        params = {
                                @Parameter(name = "type", value = "'icu_collation'"),
                                @Parameter(name = "language", value = "'pl'")
                        }
                )
        }
)
public class MyEntity {
See the documentation for more information: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_custom_analyzers
It's admittedly a bit clumsy right now, but analyzer configuration will get a bit cleaner in the next Hibernate Search version with normalizers and analyzer definition providers.
Note: as usual, your field will need to be declared as sortable (@SortableField(forField = "myField_sort")).
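Putting the pieces together on the entity, a minimal sketch of the sorted field (the field and analyzer names are the placeholders used above):

import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.SortableField;

// Inside MyEntity: the *_sort copy of the property goes through the collation
// analyzer, so Elasticsearch sorts on collation keys rather than raw letters.
@Field(name = "myField_sort",
       analyzer = @Analyzer(definition = "myCollationAnalyzer"))
@SortableField(forField = "myField_sort")
private String myField;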

Updating transaction in Datomic for an attribute that has many cardinality

I have searched for two days and haven't seen any code that is close to this. This is the only code in Java that I've seen, and it's not exactly what I want:
conn.transact(list(list("db.fn/cas", datomic_id, "attribute you want to update", old value, new value))).get();
I have tried this code with a single value as the old value and a single value as the new value, but it just stacks the new value next to the old one instead of replacing it.
Example: the old value is chicken and the new value is fish. After the transaction, the attribute holds [chicken, fish] instead of what I expected, which is just [fish], with chicken moved into history.
So the question is: how do you reference the old array of values, and how do you supply the new values as an array so it updates as expected, as stated above?
I remember reading somewhere that under the hood it's just a series of values linked to one attribute. If this is the case, does that mean I have to find the Datomic id of each string and change it, and also retract it if it's not in the new list?
FYI, these are the generic transaction functions I currently use for this kind of task (declared from Clojure, but should be fairly easy to adapt to Java if required):
[{:db/ident :bsu.fns/replace-to-many-scalars,
:db/doc "Given an entity's lookup ref, a to-many (scalar) attribute, and a list of new values,
yields a transaction that replaces the old values by new ones"
:db/id (d/tempid :db.part/user),
:db/fn (d/function
'{:lang :clojure,
:imports [],
:requires [[datomic.api :as d]],
:params [db entid attr new-vals],
:code (let [old-vals (if-let [e (d/entity db entid)] (get e attr) ())
to-remove (remove (set (seq new-vals)) old-vals)]
(concat
(for [ov to-remove] [:db/retract entid attr ov])
(for [nv new-vals] [:db/add entid attr nv]))
)}),
}
{:db/ident :bsu.fns/to-many-retract-all-but,
:db/doc "Given an entity lookup ref, a to-many (entity) attribute, and a list of lookup refs
expands to a transaction which will retract all the [origin `to-many-attr` target] relationships but those for which target is among the `to-spare-lookup-refs`"
:db/id (d/tempid :db.part/user),
:db/fn (d/function
'{:lang :clojure,
:imports [],
:requires [[datomic.api :as d]],
:params [db origin to-many-attr to-spare-lookup-refs],
:code (let [old-targets-ids (d/q '[:find [?t ...] :in $ ?to-many-attr ?origin :where [?origin ?to-many-attr ?t]]
db to-many-attr origin)
to-spare-ids (for [lr to-spare-lookup-refs] (:db/id (d/entity db lr)))
to-delete (->> old-targets-ids (remove (set to-spare-ids)))]
(for [eid to-delete] [:db/retract origin to-many-attr eid])
#_[old-targets-ids to-update-ids to-delete])}),
}]
I don't claim at all that they're optimal performance- or design-wise, but they've worked for me so far. HTH.
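For Java callers, a hedged sketch of invoking the first of those functions through the Datomic peer API, assuming it has already been transacted into the database (the attribute and values are placeholders):

import datomic.Connection;
import datomic.Util;

// Replace all values of a cardinality-many attribute with ["fish"];
// :bsu.fns/replace-to-many-scalars computes the retractions server-side.
conn.transact(Util.list(
        Util.list(":bsu.fns/replace-to-many-scalars",
                  datomicId,             // entity id or lookup ref
                  ":food/likes",         // hypothetical cardinality-many attribute
                  Util.list("fish")))).get();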
If you need a "last write wins" style consistent solution to replace all values of a cardinality-many attribute for a particular entity, your best bet is to go with a transaction function. You could take the following approach (a hedged sketch follows the list):
1. Get all datoms matching the entity + attribute you want to retract all values for.
2. Generate retractions for all of them.
3. Create add transactions for all new values (e.g. from a passed collection).
4. Remove any conflicts (i.e. if you have the same EAV with both an add and a retract).
5. Return the resulting transaction data.
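A hedged Java sketch of those steps, written as plain client-side code rather than an installed transaction function (so it is not atomic under concurrent writers); the method and attribute names are made up for illustration:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import datomic.Database;
import datomic.Datom;
import datomic.Util;

static List<Object> replaceManyValues(Database db, Object entid,
                                      String attr, Set<Object> newVals) {
    List<Object> tx = new ArrayList<Object>();
    // 1. Get all datoms for the entity + attribute.
    Set<Object> oldVals = new HashSet<Object>();
    for (Datom d : db.datoms(Database.EAVT, entid, attr)) {
        oldVals.add(d.v());
    }
    // 2 + 4. Retract old values, skipping values that would also be re-added.
    for (Object ov : oldVals) {
        if (!newVals.contains(ov)) {
            tx.add(Util.list(":db/retract", entid, attr, ov));
        }
    }
    // 3 + 4. Add only the genuinely new values.
    for (Object nv : newVals) {
        if (!oldVals.contains(nv)) {
            tx.add(Util.list(":db/add", entid, attr, nv));
        }
    }
    // 5. Return the resulting transaction data.
    return tx;
}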

Java POJO to/from CSV, using field names as column titles

I’m looking for a Java library that can read/write a list of “simple objects” from/to a CSV file.
Let’s define a “simple object” as a POJO all of whose fields are primitive types/strings.
The matching between an object’s field and a CSV column must be defined according to the name of the field and the title (first row) of the column: the two must be identical. No additional matching information should be required by the library! Such additional matching information is horrible code duplication (with respect to the definition of the POJO class) if you simply want the CSV titles to match the field names.
This last feature is something I’ve failed to find in all the libraries I looked at: OpenCSV, Super CSV and BeanIO.
Thanks!!
Ofer
uniVocity-parsers does not require you to provide the field names in your class: it matches them against the column headers automatically, and uses annotations only when you need to map a different name, or want data manipulations performed. It is also way faster than the other libraries you tried:
class TestBean {

    // If the value parsed in the quantity column is "?" or "-", it will be replaced by null.
    @NullString(nulls = { "?", "-" })
    // If a value resolves to null, it will be converted to the String "0".
    @Parsed(defaultNullRead = "0")
    private Integer quantity; // The attribute name will be matched against the column header in the file automatically.

    @Trim
    @LowerCase
    @Parsed
    private String comments;
    ...
}
To parse:
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
//And parse!
//this submits all rows parsed from the input to the BeanListProcessor
parser.parse(new FileReader(new File("/examples/bean_test.csv")));
List<TestBean> beans = rowProcessor.getBeans();
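The question also asks about writing; a hedged sketch of the reverse direction with the same library, assuming its BeanWriterProcessor and the header-writing switch on CsvWriterSettings (the output path is made up):

BeanWriterProcessor<TestBean> writerProcessor = new BeanWriterProcessor<TestBean>(TestBean.class);
CsvWriterSettings writerSettings = new CsvWriterSettings();
writerSettings.setRowWriterProcessor(writerProcessor);
writerSettings.setHeaderWritingEnabled(true); // emit the field names as the first row
CsvWriter writer = new CsvWriter(new File("/examples/bean_out.csv"), writerSettings);
// Converts each bean to a CSV row and closes the output when done.
writer.processRecordsAndClose(beans);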
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

hbase: querying for specific value with dynamically created qualifier

Hi,
HBase allows a column family to have different qualifiers in different rows. In my case a column family has the following specification:
abc[cnt] # where cnt is an integer that can be any positive integer
What I want to achieve is to get all the data from a different column family, but only if the value of the described qualifier (in the other column family) matches.
For narrowing the Scan down, I just add the two families I need for the query, but that is as far as I could get for now.
I already achieved the same behaviour with a SingleColumnValueFilter, but there the qualifier was known in advance. Here the qualifier can be abc1, abc2, ...: there would be too many options, and thus too many SingleColumnValueFilters.
Then I tried using the ValueFilter, but this filter only returns the columns that match the value, and thus the wrong column family.
Can you think of any way to achieve my goal, querying for a value within a dynamically created qualifier in one column family and returning the contents of that column family and another one (as specified when creating the Scan)? Preferably with only one query.
Thanks in advance for any input.
UPDATE: (for clarification as discussed in the comments)
in a more graphical way, a row may have the following:
colfam1:aaa
colfam1:aab
colfam1:aac
colfam2:abc1
colfam2:abc2
whereas I want to get all of the family colfam1 if any value of colfam2 is, e.g., x, bearing in mind that colfam2:abc[cnt] is created dynamically, with cnt being any positive integer.
I see two approaches for this: client-side filtering or server-side filtering.
Client-side filtering is more straightforward. The Scan adds only the two families "colfam1" and "colfam2". Then, for each Result you get from scanner.next(), you must filter according to the qualifiers in "colfam2".
byte[] queryValue = Bytes.toBytes("x");
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("colfam1"));
scan.addFamily(Bytes.toBytes("colfam2"));
ResultScanner scanner = myTable.getScanner(scan);
Result res;
while ((res = scanner.next()) != null) {
    // First pass: does any colfam2 cell hold the query value?
    NavigableMap<byte[], byte[]> colfam2 = res.getFamilyMap(Bytes.toBytes("colfam2"));
    boolean foundQueryValue = false;
    SearchForQueryValue:
    while (!colfam2.isEmpty()) {
        Entry<byte[], byte[]> cell = colfam2.pollFirstEntry();
        if (Bytes.equals(cell.getValue(), queryValue)) {
            foundQueryValue = true;
            break SearchForQueryValue;
        }
    }
    // Second pass: if so, rebuild a Result containing only colfam1.
    if (foundQueryValue) {
        NavigableMap<byte[], byte[]> colfam1 = res.getFamilyMap(Bytes.toBytes("colfam1"));
        LinkedList<KeyValue> listKV = new LinkedList<KeyValue>();
        while (!colfam1.isEmpty()) {
            Entry<byte[], byte[]> cell = colfam1.pollFirstEntry();
            listKV.add(new KeyValue(res.getRow(), Bytes.toBytes("colfam1"), cell.getKey(), cell.getValue()));
        }
        Result filteredResult = new Result(listKV);
    }
}
(This code was not tested)
And then finally filteredResult is what you want. This approach is not elegant and might also give you performance issues if you have a lot of data in those families. If "colfam1" has a lot of data, you don't want to transfer it all to the client only to discard it because value "x" is not in any qualifier of "colfam2".
Server-side filtering. This requires you to implement your own Filter class. I believe you cannot use the provided filter types to do this. Implementing your own Filter takes some work; you also need to compile it as a .jar and make it available to all RegionServers. But then, it helps you avoid sending loads of data of "colfam1" in vain.
It is too much work for me to show you how to implement a custom Filter here, so I recommend reading a good book (HBase: The Definitive Guide, for example). However, the Filter code will look pretty much like the client-side filtering I showed you (a rough skeleton follows), so that's half of the work done.
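An untested skeleton of such a filter (the class name is made up), written against the same 0.94-era API as the client-side snippet above; newer HBase versions use Cell and protobuf serialization instead of KeyValue and Writable:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class FamilyValueMatchFilter extends FilterBase {

    private final byte[] family = Bytes.toBytes("colfam2");
    private final byte[] queryValue = Bytes.toBytes("x");
    private boolean foundQueryValue = false;

    @Override
    public ReturnCode filterKeyValue(KeyValue kv) {
        // Keep every cell for now; the include/exclude decision is made per row.
        if (!foundQueryValue
                && Bytes.equals(kv.getFamily(), family)
                && Bytes.equals(kv.getValue(), queryValue)) {
            foundQueryValue = true;
        }
        return ReturnCode.INCLUDE;
    }

    @Override
    public boolean filterRow() {
        // true means "drop this row": drop it when no colfam2 cell matched.
        return !foundQueryValue;
    }

    @Override
    public boolean hasFilterRow() {
        return true;
    }

    @Override
    public void reset() {
        foundQueryValue = false; // called by the framework between rows
    }

    // No configurable state in this sketch, so the (0.94-style) Writable
    // serialization is empty; a real filter must serialize its parameters.
    @Override
    public void write(DataOutput out) throws IOException { }

    @Override
    public void readFields(DataInput in) throws IOException { }
}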
