Duke Fast Deduplication: java.lang.UnsupportedOperationException: Operation not yet supported?

I'm trying to use the Duke Fast Deduplication Engine to search for some duplicate records in the database at the company where I work.
I run it from the command line like this:
java -cp "C:\utils\duke-0.6\duke-0.6.jar;C:\utils\duke-0.6\lucene-core-3.6.1.jar" no.priv.garshol.duke.Duke --showmatches --verbose .\config.xml
But I get an error:
Exception in thread "main" java.lang.UnsupportedOperationException: Operation not yet supported
at sun.jdbc.odbc.JdbcOdbcResultSet.isClosed(Unknown Source)
at no.priv.garshol.duke.datasources.JDBCDataSource$JDBCIterator.close(JDBCDataSource.java:115)
at no.priv.garshol.duke.Processor.deduplicate(Processor.java:152)
at no.priv.garshol.duke.Duke.main_(Duke.java:135)
at no.priv.garshol.duke.Duke.main(Duke.java:38)
My configuration file looks like this:
<duke>
<schema>
<threshold>0.82</threshold>
<maybe-threshold>0.80</maybe-threshold>
<path>test</path>
<property type="id">
<name>ID</name>
</property>
<property>
<name>LNAME</name>
<comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
<low>0.6</low>
<high>0.8</high>
</property>
<property>
<name>FNAME</name>
<comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
<low>0.6</low>
<high>0.8</high>
</property>
<property>
<name>MNAME</name>
<comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
<low>0.3</low>
<high>0.5</high>
</property>
<property>
<name>SSN</name>
<comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
<low>0.0</low>
<high>1.0</high>
</property>
</schema>
<jdbc>
<param name="driver-class" value="sun.jdbc.odbc.JdbcOdbcDriver" />
<param name="connection-string" value="jdbc:odbc:VT_DeDupe" />
<param name="user-name" value="aleer" />
<param name="password" value="**" />
<param name="query" value="select SocialSecurityNumber, LastName, FirstName, MiddleName, empssn from T_Employees" />
<column name="SocialSecurityNumber" property="ID" />
<column name="LastName" property="LNAME" />
<column name="FirstName" property="FNAME" />
<column name="MiddleName" property="MNAME" />
<column name="empssn" property="SSN" />
</jdbc>
</duke>
It doesn't really tell me what is unsupported. I'm just trying it out, so there is nothing serious in the configuration yet.

As mbonaci says, the problem is that the JDBC driver's isClosed() method is not implemented, even though implementing it would be no harder than writing "return closed".
I have now added an ugly workaround for this issue. Please do an "hg pull" and try again.

Which Java version are you using?
sun.jdbc.odbc.JdbcOdbcResultSet.isClosed first appeared in Java 1.6, and it still looks like this in 1.7 (I haven't checked Java 8):
public boolean isClosed() throws SQLException {
throw new UnsupportedOperationException("Operation not yet supported");
}
So simply don't call that method. Use some other way of checking whether the result set is closed.
Or, if you cannot change the code, ask the project's authors for help (I see there was already an effort to fix an exception thrown when closing the result set).
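One way to avoid the call can be sketched as follows. This is only an illustration, not necessarily the workaround that went into Duke: it closes the result set directly instead of asking isClosed() first, relying on the JDBC rule that close() on an already-closed ResultSet is a no-op. The class SafeResultSetClose, the helper closeQuietly and the proxy-based legacyResultSet() are hypothetical names made up for this example; the proxy merely stands in for a driver whose isClosed() is unimplemented.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SafeResultSetClose {

    /** Closes rs without calling isClosed(); returns true if close() completed. */
    public static boolean closeQuietly(ResultSet rs) {
        if (rs == null) {
            return false;
        }
        try {
            // Per the JDBC Javadoc, close() on an already-closed ResultSet is a no-op,
            // so there is no need to ask isClosed() first.
            rs.close();
            return true;
        } catch (SQLException | UnsupportedOperationException e) {
            return false;
        }
    }

    /** Hypothetical stand-in for a driver whose isClosed() is not implemented. */
    public static ResultSet legacyResultSet() {
        return (ResultSet) Proxy.newProxyInstance(
                SafeResultSetClose.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("isClosed")) {
                        throw new UnsupportedOperationException("Operation not yet supported");
                    }
                    return null; // close() is void, so returning null is fine here
                });
    }

    public static void main(String[] args) {
        ResultSet rs = legacyResultSet();
        System.out.println("closed cleanly: " + closeQuietly(rs));
    }
}
```

If the calling code really needs to know the state, it can track a boolean of its own next to the ResultSet rather than asking the driver.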

Related

How to add Annotations elements in metadata generated by Apache Olingo V2.0?

I have developed an OData service for a System entity. It generates metadata, but I can't figure out how to add an Annotations element to it. A sample of the generated metadata follows:
<?xml version="1.0" encoding="utf-8"?>
<edmx:Edmx xmlns:edmx="http://schemas.microsoft.com/ado/2007/06/edmx" xmlns:sap="http://www.sap.com/Protocols/SAPData" Version="1.0">
<edmx:DataServices xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
m:DataServiceVersion="1.0">
<Schema xmlns="http://schemas.microsoft.com/ado/2008/09/edm" Namespace="myNamespace" sap:schema-version="1">
<EntityType Name="System">
<Key>
<PropertyRef Name="Id" />
</Key>
<Property Name="Id" Type="Edm.Int32" Nullable="false" />
<Property Name="name" Type="Edm.String" sap:label="System Name" sap:creatable="false"
sap:updatable="false" sap:sortable="false" sap:required-in-filter="true"/>
<Property Name="description" Type="Edm.String" />
<Property Name="status" Type="Edm.String" />
<Property Name="type" Type="Edm.String" />
</EntityType>
<EntityContainer Name="ODataEntityContainer" m:IsDefaultEntityContainer="true">
<EntitySet Name="Systems" EntityType="myNamespace.System" />
<FunctionImport Name="NumberOfSystems" ReturnType="Collection(myNamespace.System)"
m:HttpMethod="GET" />
</EntityContainer>
</Schema>
</edmx:DataServices>
</edmx:Edmx>
I need to add the following elements to the above metadata:
<Annotations Target="myNamespace.System"
xmlns="http://docs.oasis-open.org/odata/ns/edm">
<Annotation Term="com.sap.vocabularies.UI.v1.LineItem">
<Collection>
<Record Type="com.sap.vocabularies.UI.v1.DataField">
<PropertyValue Property="Value" Path="name" />
</Record>
<Record Type="com.sap.vocabularies.UI.v1.DataField">
<PropertyValue Property="Value" Path="description"/>
</Record>
<Record Type="com.sap.vocabularies.UI.v1.DataField">
<PropertyValue Property="Value" Path="status" />
</Record>
</Collection>
</Annotation>
</Annotations>
I came across the org.apache.olingo.commons.api.edm.provider.annotation package but can't find any suitable API. Please let me know how I should proceed.
Thanks in advance.

The annotations you would like to use were introduced with OData V3, which is why they are not directly supported by the Olingo V2 library.
You can use the EdmProvider AnnotationElement and AnnotationAttribute classes to mimic this behaviour, though. For example, you can create an AnnotationElement with the name "Annotations"; this element then gets an AnnotationAttribute Target=SomeString. Since an AnnotationElement can have child elements, you can put your Collection element there. Namespaces are also handled with AnnotationAttributes.
You can only attach the annotation to Edm elements that are derived from the EdmAnnotatable interface, which is a difference from V3.
This is currently the only way to get this behaviour with Olingo V2.

Hibernate View showing different results in app and Workbench

I'm working on a Java app connected to a MySQL database through Hibernate.
I'm using POJOs to define the classes and the Session class to connect to the database.
The problem is with the following view:
CREATE OR REPLACE VIEW INVENTARIO AS
SELECT
ID_ARTICULO,
ID_ESTRUCTURA,
ID_ESTRUCTURA_ORIGEN,
SUM(STOCK)STOCK,
STOCK_MIN,
NECESITA_REPO
FROM
HISTORICO_INVENTARIO
LEFT JOIN TIPOS_MOVIMIENTO
ON HISTORICO_INVENTARIO.ID_TIPO_MOV = TIPOS_MOVIMIENTO.ID_TIPO_MOV
GROUP BY ID_ARTICULO , ID_ESTRUCTURA , ID_ESTRUCTURA_ORIGEN , STOCK_MIN , NECESITA_REPO;
In Java, I'm mapping the view this way:
<hibernate-mapping>
<class name="Pojos.Inventario" table="INVENTARIO">
<id name="id_articulo" type="string" column="ID_ARTICULO"/>
<property name="id_estructura" type="string" column="ID_ESTRUCTURA" />
<property name="id_estructura_origen" type="string" column="ID_ESTRUCTURA_ORIGEN" />
<property name="stock" type="float" column="STOCK" />
<property name="stock_min" type="float" column="STOCK_MIN" />
<property name="necesita_repo" type="string" column="NECESITA_REPO" />
</class>
</hibernate-mapping>
I should say that the field "id_articulo" is not really the ID, but I have to choose one because the mapping requires an id.
If I execute this view in MySQL Workbench, I can see the results correctly. If I execute the same query in my app, I get different results.
Does anyone know why this could be happening?
Thanks in advance.
EDIT:
I've tried defining the XML with the SQL in the subselect tag:
<class name="Pojos.Inventario">
<subselect>
SELECT
ID_ARTICULO,
ID_ESTRUCTURA,
ID_ESTRUCTURA_ORIGEN,
SUM(STOCK) STOCK,
STOCK_MIN,
NECESITA_REPO
FROM
HISTORICO_INVENTARIO
LEFT JOIN TIPOS_MOVIMIENTO
ON HISTORICO_INVENTARIO.ID_TIPO_MOV = TIPOS_MOVIMIENTO.ID_TIPO_MOV
GROUP BY ID_ARTICULO , ID_ESTRUCTURA , ID_ESTRUCTURA_ORIGEN , STOCK_MIN , NECESITA_REPO
</subselect>
<synchronize table="HISTORICO_INVENTARIO"/>
<synchronize table="TIPOS_MOVIMIENTO"/>
<id name="id_articulo" type="string" column="ID_ARTICULO"/>
<property name="id_estructura" type="string" column="ID_ESTRUCTURA" />
<property name="id_estructura_origen" type="string" column="ID_ESTRUCTURA_ORIGEN" />
<property name="stock" type="float" column="STOCK" />
<property name="stock_min" type="float" column="STOCK_MIN" />
<property name="necesita_repo" type="string" column="NECESITA_REPO" />
</class>
I'm still getting the wrong result set.

Set your Hibernate show_sql parameter to true. Then capture the SQL from your log and try running it in your SQL workbench.
<property name="show_sql">true</property>

Done it!
The problem was caused by the ID. I've added one extra field, which is the new ID. Now I'm getting the correct result set.
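Why a non-unique id distorts the results can be illustrated with a small simulation. Hibernate's session keeps one entity instance per identifier, so rows that share the mapped id come back as the same object. The sketch below is not Hibernate code; DuplicateIdDemo, Row and loadThroughIdentityMap are hypothetical names, and the sample values are invented purely to show the effect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateIdDemo {

    /** One row of the INVENTARIO view, reduced to three columns for brevity. */
    public static final class Row {
        public final String idArticulo;
        public final String idEstructura;
        public final float stock;

        public Row(String idArticulo, String idEstructura, float stock) {
            this.idArticulo = idArticulo;
            this.idEstructura = idEstructura;
            this.stock = stock;
        }

        @Override
        public String toString() {
            return idArticulo + " / " + idEstructura + " / " + stock;
        }
    }

    /**
     * Mimics an identity map keyed by the mapped id: the first row seen for an id
     * is kept and handed back for every later row that carries the same id.
     */
    public static List<Row> loadThroughIdentityMap(List<Row> resultSet) {
        Map<String, Row> cache = new HashMap<>();
        List<Row> entities = new ArrayList<>();
        for (Row row : resultSet) {
            Row cached = cache.get(row.idArticulo);
            if (cached == null) {
                cache.put(row.idArticulo, row);
                cached = row;
            }
            entities.add(cached);
        }
        return entities;
    }

    public static void main(String[] args) {
        // Invented sample data: article A1 is stocked in two structures.
        List<Row> rows = Arrays.asList(
                new Row("A1", "E1", 10f),
                new Row("A1", "E2", 25f),
                new Row("B7", "E1", 3f));
        for (Row r : loadThroughIdentityMap(rows)) {
            System.out.println(r);
        }
    }
}
```

The second A1 row is replaced by a repeat of the first, which matches the symptom of the app showing different data than the workbench. A unique id column (or a composite id over the grouped columns) removes the collision.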

Duke deduplication engine: linking records not working?

I am attempting to use Duke to match records from one database to another. One db has song titles + writers. I am trying to match it against another db to find duplicates and corresponding records.
I have gotten Duke to run and I can see some of the records getting matched. But no matter what I do, "Correct links found" is always 0% and I just can't write to the link file.
This is what I have currently:
<duke>
<schema>
<threshold>0.79</threshold>
<maybe-threshold>0.70</maybe-threshold>
<path>test</path>
<property type="id">
<name>PublishingID</name>
</property>
<property type="id">
<name>AmgID</name>
</property>
<property>
<name>NAME</name>
<comparator>no.priv.garshol.duke.comparators.JaroWinkler</comparator>
<low>0.12</low>
<high>0.61</high>
</property>
<property>
<name>TITLE</name>
<comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
<low>0.09</low>
<high>0.93</high>
</property>
</schema>
<group>
<jdbc>
<param name="driver-class" value="com.mysql.jdbc.Driver"/>
<param name="connection-string" value="jdbc:mysql://127.0.0.1"/>
<param name="user-name" value="root"/>
<param name="password" value="root"/>
<param name="query" value="
SELECT pSongs.song_id, pSongs.songtitle, pSongs.publisher_id, pWriters.first_name AS writer_first_name, pWriters.last_name AS writer_last_name
FROM devel_matching.publisher_songs AS pSongs
INNER JOIN devel_matching.publisher_writers as pWriters ON pWriters.publisher_id = pSongs.publisher_id AND pWriters.song_id = pSongs.song_id
WHERE pSongs.writers LIKE '%LENNON, JOHN%'
LIMIT 20000;"/>
<column name="song_id" property="PublishingID"/>
<column name="songtitle" property="TITLE" cleaner="no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"/>
<column name="writer_first_name" property="NAME" cleaner = "no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"/>
</jdbc>
</group>
<group>
<jdbc>
<param name="driver-class" value="com.mysql.jdbc.Driver"/>
<param name="connection-string" value="jdbc:mysql://127.0.0.1"/>
<param name="user-name" value="root"/>
<param name="password" value="root"/>
<param name="query" value="
SELECT amgSong.id, amgSong.track, SUBSTRING_INDEX(SUBSTRING_INDEX(amgSong.composer, '/', numbers.n), '/', -1) composer
FROM
devel_matching.numbers INNER JOIN devel_matching.track as amgSong
ON CHAR_LENGTH(amgSong.composer) - CHAR_LENGTH(REPLACE(amgSong.composer, '/', '')) >= numbers.n - 1
WHERE amgSong.composer like '%lennon%'
LIMIT 5000;"/>
<column name="id" property = "AmgID"/>
<column name="track" property="TITLE" cleaner="no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"/>
<column name="composer" property="NAME" cleaner = "no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"/>
</jdbc>
</group>
</duke>
Output:
Total records: 5000
Total matches: 8284
Total non-matches: 1587
Correct links found: 0 / 0 (0.0%)
Wrong links found: 0 / 0 (0.0%)
Unknown links found: 8284
Percent of links correct 0.0%, wrong 0.0%, unknown 100.0%
Precision 0.0%, recall NaN%, f-number 0.0
Running on Spring STS:
program arguments = --progress --verbose --testfile=linked.txt --testdebug --showmatches duke.xml
It's not writing to linked.txt or finding any correct links. Not sure what I am doing wrong here. Any help would be awesome.

Actually, it is finding 8284 links. --testfile is for giving Duke a file containing known correct links, basically test data. What you want is --linkfile (here, --linkfile=linked.txt), which writes the links you've found into that file.
I guess I should add code that warns about an empty test file, since that very likely indicates a user error.
You'd probably be better off asking this question on the Duke mailing list, btw.
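The "0.0%" and "NaN%" figures in the output are what an empty test file would produce under the textbook definitions of precision and recall. The sketch below assumes those standard formulas; it is not taken from Duke's source, and LinkStats is a hypothetical class name.

```java
public class LinkStats {

    /** Share of reported links that the test file confirms, in percent. */
    public static double precision(int correct, int totalFound) {
        return 100.0 * correct / totalFound;
    }

    /** Share of the known correct links that were found, in percent. */
    public static double recall(int correct, int totalKnown) {
        return 100.0 * correct / totalKnown;
    }

    public static void main(String[] args) {
        // Numbers from the question: 8284 links found, none of them in the (empty) test file.
        System.out.println("Precision " + precision(0, 8284) + "%");
        // With zero known links the division is 0.0 / 0, which is NaN for doubles.
        System.out.println("Recall " + recall(0, 0) + "%");
    }
}
```

A NaN recall is therefore a hint that the test file contained no links at all, not that the matching itself failed.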

log4j2 how to read property variable from file into log4j2

Background: As usual, we have various lifecycles such as dev, stage, lt, and prod. The lifecycle is picked at deploy time from the environment variable ${lifecycle}.
We store the JNDI setting in ${lifecycle}.properties as the variable datasource.jndi.name=jdbc/xxx. Other beans also use this properties file, so I have verified that the variable is loaded and the file is on the classpath, but somehow I am not able to consume this variable in log4j2.xml in the JDBC Appender below.
<JDBC name="DBAppender" tableName="V1_QUERY_LOG" bufferSize="4" ignoreExceptions="false">
<DataSource jndiName="${sys:datasource.jndi.name}" />
<Column name="event_date" isUnicode="false" isEventTimestamp="true" />
<Column name="log_level" isUnicode="false" pattern="%level" />
<Column name="logger" isUnicode="false" pattern="%logger" />
<Column name="message" isUnicode="false" pattern="%message" />
<Column name="exception_msg" isUnicode="false" pattern="%ex{full}" />
</JDBC>
I have tried options like "${datasource.jndi.name}" too. Or is there any way I can fit the solution into something like this:
<Properties>
<Property name="datasource.jndi.name">get datasource.jndi.name from {lifecycle}.properties</Property>
</Properties>

If you are using environment variables rather than Java system properties, you should use the ${env:variable} prefix instead of the ${sys:variable} prefix. See also http://logging.apache.org/log4j/2.x/manual/lookups.html#EnvironmentLookup
In general the placeholders that work in Spring bean configuration files do not work in Log4j configuration. They look the same, but the syntax and underlying discovery mechanism are completely different.
For instance, ${sys:something} attempts to resolve a Java system property. System properties are usually passed to the JVM as command line arguments in the format -Dkey=value and are not stored in property files.
You can try the resource bundle syntax ${bundle:MyProperties:MyKey}; however, this will load from that specific file and will not perform any additional Spring substitutions.
See also:
http://logging.apache.org/log4j/2.x/manual/configuration.html#PropertySubstitution
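Another option, sketched here under assumptions rather than taken from the answers above, is to copy the values from the properties file into JVM system properties early at startup, so that ${sys:datasource.jndi.name} can resolve. The class name LifecyclePropertiesBridge and the method exportToSystemProperties are hypothetical, and the approach only helps if this code runs before Log4j reads its configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class LifecyclePropertiesBridge {

    /** Loads key=value pairs from the reader and copies each one into the system properties. */
    public static Properties exportToSystemProperties(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String key : props.stringPropertyNames()) {
            System.setProperty(key, props.getProperty(key));
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in a real deployment the reader would wrap the
        // ${lifecycle}.properties file found on the classpath.
        Reader source = new StringReader("datasource.jndi.name=jdbc/xxx\n");
        exportToSystemProperties(source);
        System.out.println(System.getProperty("datasource.jndi.name")); // prints jdbc/xxx
    }
}
```

The trade-off is that every key in the file becomes visible JVM-wide, so a filtered copy of only the keys Log4j needs may be preferable.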

Hibernate write followed by read causes object not found

I'm currently working on a quizzing tool that uses Hibernate and Spring. I'm actually building it as a Sakai LMS tool, which complicates this question a little more, but let me see if I can generalize.
My current scenario: users go to a StartQuiz page, and submitting the form on that page initializes an Attempt object (stored by Hibernate). It populates the object below:
<class name="org.quiztool.model.Attempt" table="QT_ATTEMPTS">
<cache usage="transactional" />
<id name="id" type="long">
<generator class="native">
<param name="sequence">QT_ATTEMPTS_ID_SEQ</param>
</generator>
</id>
<many-to-one name="quizId" class="org.quiztool.model.Quiz" cascade="none" />
<property name="score" type="int" not-null="true" />
<property name="outOf" type="int" not-null="true" />
<list name="responses" cascade="none" table="QT_RESPONSES" lazy="false">
<key column="id"/>
<index column="idxr"/>
<many-to-many class="org.quiztool.model.QuizAnswer" />
</list>
<list name="questionList" cascade="none" table="QT_ATTEMPT_QUESTIONS" lazy="false">
<key column="id"/>
<index column="idxq"/>
<many-to-many class="org.quiztool.model.QuizQuestion" />
</list>
<property name="userId" type="string" length="99" />
<property name="siteRole" type="string" length="99" />
<property name="startTime" type="java.util.Date" not-null="true" />
<property name="finishTime" type="java.util.Date" />
</class>
It randomly picks out a set of questions and sets the start time and a few other properties, then saves the object through Hibernate and redirects the user to the TakeTheQuiz page.
The TakeTheQuiz page loads the Attempt object by its ID, which is passed as a request param, then formats it into an HTML form for the user to fill out the quiz. About 2 in 5 concurrent users see no questions: the Attempt object loads, but its question list is empty.
My theory is that the question list in the Attempt object is either not being inserted into the database immediately (which is fine as long as the object goes to the Hibernate cache and I can then get it from the cache, which I can't seem to figure out how to do), or it is being saved to the database but my load of the object on the TakeTheQuiz page is reading an incomplete object from the cache.
Admittedly my Hibernate knowledge is limited, so if someone can help me understand what could be happening here and how to fix it, please let me know.

The answer, as I found out, was simple. It seemed that my save function was committing to the database lazily. Once I forced a commit for that object at the end of each transaction, the problem was solved.
I ended up writing my own hibernate session code which looks like this:
Session session = getSession();
session.beginTransaction();
session.saveOrUpdate(attempt);
// Committing flushes the pending inserts before the user is redirected.
session.getTransaction().commit();
session.close();
Problem solved.

My theory is that there is something wrong with the piece of code that randomly picks the questions. Are you sure that it works? Please paste some of your code.
A second theory is that there is something wrong with your transaction boundaries. When do you flush the session? And when is your transaction committed? Try setting the FlushMode on your session to ALWAYS. Does that change anything?
