I have some more issues with handling semantic data technologies:
I have a GraphDB triplestore running locally on my machine and try to run some SPARQL queries against it using RDF4J and Java. As you can see from the code below, 10 queries should be launched in a row, but only 5 actually run (I see the numbers 0 - 4 in the console). The problem seems to be that I am limited to 5 open HTTP connections for some reason, and calling repConn.close() does not seem to change anything. Any ideas?
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class testmain {

    public HTTPRepository rep;
    public RepositoryConnection repConn;

    public static void main(String[] args) {
        testmain test = new testmain();
        test.rep = new HTTPRepository("http://localhost:7200/repositories/test01");
        //test.repConn = test.rep.getConnection();
        for (int i = 0; i < 10; i++) {
            test.repConn = test.rep.getConnection();
            String queryString = "select ?archiveID where { ?video <http://www.some.ns/ontology##hasArchiveID> ?archiveID .}";
            try {
                TupleQuery tupleQuery = test.repConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
                TupleQueryResult queryResult = tupleQuery.evaluate();
            } finally {
                test.repConn.close();
            }
            System.out.println(i);
        }
    }
}
You also need to close the query result, otherwise repConn.close() does not do anything:
try {
    TupleQuery tupleQuery = test.repConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
    TupleQueryResult queryResult = tupleQuery.evaluate();
    queryResult.close(); // this should solve your issue
}
Or, even better, use the new RDF4J streams API (QueryResults.stream(gqr)), which closes everything for you. See http://docs.rdf4j.org/migration/ (point 2.6.5).
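For completeness, here is a minimal sketch of the loop above rewritten with try-with-resources (assuming an RDF4J version in which RepositoryConnection and TupleQueryResult are both AutoCloseable), so the result and the connection are always released:

import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class QueryLoop {
    public static void main(String[] args) {
        HTTPRepository rep = new HTTPRepository("http://localhost:7200/repositories/test01");
        String queryString = "select ?archiveID where { ?video <http://www.some.ns/ontology##hasArchiveID> ?archiveID .}";
        for (int i = 0; i < 10; i++) {
            // Both resources are closed automatically, so the HTTP connection is released every iteration.
            try (RepositoryConnection conn = rep.getConnection();
                 TupleQueryResult result = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate()) {
                while (result.hasNext()) {
                    result.next(); // consume (or process) each binding set
                }
            }
            System.out.println(i);
        }
        rep.shutDown();
    }
}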
(Note: I've resolved my problem and posted the code at the bottom.)
I'm playing around with TensorFlow, and the backend processing must take place in Java. I've taken one of the models from https://developers.google.com/machine-learning/crash-course and saved it with tf.saved_model.save(my_model, "house_price_median_income") (using a Docker container). I copied the model off and loaded it into Java (using the 2.0 bindings built from source, because I'm on Windows).
I can load the model and run it:
try (SavedModelBundle model = SavedModelBundle.load("./house_price_median_income", "serve")) {
    try (Session session = model.session()) {
        Session.Runner runner = session.runner();
        float[][] in = new float[][]{ {2.1518f} };
        Tensor<?> jack = Tensor.create(in);
        runner.feed("serving_default_layer1_input", jack);
        float[][] probabilities = runner.fetch("StatefulPartitionedCall").run().get(0).copyTo(new float[1][1]);
        for (int i = 0; i < probabilities.length; ++i) {
            System.out.println(String.format("-- Input #%d", i));
            for (int j = 0; j < probabilities[i].length; ++j) {
                System.out.println(String.format("Class %d - %f", i, probabilities[i][j]));
            }
        }
    }
}
The above is hardcoded to one input and one output, but I want to be able to read the model and provide some information so the end user can select the input and output, etc.
I can get the inputs and outputs with the Python command: saved_model_cli show --dir ./house_price_median_income --all
What I want to do is get the inputs and outputs via Java, so my code doesn't need to execute a Python script to get them. I can get the operations via:
Graph graph = model.graph();
Iterator<Operation> itr = graph.operations();
while (itr.hasNext()) {
    GraphOperation e = (GraphOperation) itr.next();
    System.out.println(e);
}
This outputs both the inputs and outputs as "operations", but how do I know whether a given operation is an input and/or an output? The Python tool uses the SignatureDef, but that doesn't seem to appear in the TensorFlow 2.0 Java bindings at all. Am I missing something obvious, or is it just missing from the TensorFlow 2.0 Java library?
NOTE: I've sorted my issue with the help of the answer below. Here is my full code in case somebody would like it in the future. Note that this is TF 2.0 and uses the SNAPSHOT mentioned below. I make a few assumptions, but it shows how to pull the input and output names and then use them to run a model.
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.exceptions.TensorFlowException;
import org.tensorflow.Session.Run;
import org.tensorflow.Graph;
import org.tensorflow.Operation;
import org.tensorflow.Output;
import org.tensorflow.GraphOperation;
import org.tensorflow.proto.framework.SignatureDef;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.tensorflow.proto.framework.MetaGraphDef;
import java.util.Map;
import org.tensorflow.proto.framework.TensorInfo;
import org.tensorflow.types.TFloat32;
import org.tensorflow.tools.Shape;
import java.nio.FloatBuffer;
import org.tensorflow.tools.buffer.DataBuffers;
import org.tensorflow.tools.ndarray.FloatNdArray;
import org.tensorflow.tools.ndarray.StdArrays;

public class v2tensor {

    public static void main(String[] args) {
        try (SavedModelBundle savedModel = SavedModelBundle.load("./house_price_median_income", "serve")) {
            SignatureDef modelInfo = savedModel.metaGraphDef().getSignatureDefMap().get("serving_default");
            TensorInfo input1 = null;
            TensorInfo output1 = null;
            Map<String, TensorInfo> inputs = modelInfo.getInputsMap();
            for (Map.Entry<String, TensorInfo> input : inputs.entrySet()) {
                if (input1 == null) {
                    input1 = input.getValue();
                    System.out.println(input1.getName());
                }
                System.out.println(input);
            }
            Map<String, TensorInfo> outputs = modelInfo.getOutputsMap();
            for (Map.Entry<String, TensorInfo> output : outputs.entrySet()) {
                if (output1 == null) {
                    output1 = output.getValue();
                }
                System.out.println(output);
            }
            try (Session session = savedModel.session()) {
                Session.Runner runner = session.runner();
                FloatNdArray matrix = StdArrays.ndCopyOf(new float[][]{ { 2.1518f } });
                try (Tensor<TFloat32> jack = TFloat32.tensorOf(matrix)) {
                    runner.feed(input1.getName(), jack);
                    try (Tensor<TFloat32> rezz = runner.fetch(output1.getName()).run().get(0).expect(TFloat32.DTYPE)) {
                        TFloat32 data = rezz.data();
                        data.scalars().forEachIndexed((i, s) -> {
                            System.out.println(s.getFloat());
                        });
                    }
                }
            }
        } catch (TensorFlowException ex) {
            ex.printStackTrace();
        }
    }
}
What you need to do is read the SavedModelBundle metadata as a MetaGraphDef; from there you can retrieve the input and output names from the SignatureDef, like in Python.
In TF Java 1.* (i.e. the client you are using in your example), the proto definitions are not available out of the box from the tensorflow artifact; you need to add a dependency on org.tensorflow:proto as well and deserialize the result of SavedModelBundle.metaGraphDef() into a MetaGraphDef proto.
In TF Java 2.* (the new client, currently only available as snapshots from here), the protos are available right away, so you can simply call this to retrieve the right SignatureDef:
savedModel.metaGraphDef().getSignatureDefMap().get("serving_default")
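For the TF Java 1.* route described above, here is a minimal sketch of the deserialization step (assuming the org.tensorflow:proto artifact is on the classpath; in 1.*, SavedModelBundle.metaGraphDef() returns the serialized MetaGraphDef bytes):

import org.tensorflow.SavedModelBundle;
import org.tensorflow.framework.MetaGraphDef;
import org.tensorflow.framework.SignatureDef;

public class SignatureInspector {
    public static void main(String[] args) throws Exception {
        try (SavedModelBundle model = SavedModelBundle.load("./house_price_median_income", "serve")) {
            // In TF Java 1.*, metaGraphDef() returns the raw bytes of the MetaGraphDef proto.
            MetaGraphDef metaGraph = MetaGraphDef.parseFrom(model.metaGraphDef());
            SignatureDef serving = metaGraph.getSignatureDefMap().get("serving_default");
            // Each entry maps a signature key to a TensorInfo whose getName() is the feed/fetch name.
            serving.getInputsMap().forEach((name, info) ->
                    System.out.println("input:  " + name + " -> " + info.getName()));
            serving.getOutputsMap().forEach((name, info) ->
                    System.out.println("output: " + name + " -> " + info.getName()));
        }
    }
}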
I am testing this framework, specifically the SQL module, with the Java API. Running it standalone in Java, I get better results (even running on the same machine) than with pure SQL (run through pgAdmin).
Executing exactly the same query against the same DB (Postgres 9.4 and 9.5), Spark is almost 5x faster. Below is my code:
// DbConnection and MapResult are helper classes defined elsewhere by the asker
// (a connection factory and a ResultSet-to-Object[] row mapper, as JdbcRDD expects).
private static final JavaSparkContext sc =
        new JavaSparkContext(new SparkConf()
                .setAppName("SparkJdbc").setMaster("local[*]"));

public static void main(String[] args) {
    StringBuilder sql = new StringBuilder();
    sql.append(" my query ");
    long time = System.currentTimeMillis();
    DbConnection dbConnection = new DbConnection("org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
    JdbcRDD<Object[]> jdbcRDD = new JdbcRDD<>(sc.sc(), dbConnection, sql.toString(),
            1, 1000000000, 20, new MapResult(), ClassManifestFactory$.MODULE$.fromClass(Object[].class));
    JavaRDD<Object[]> javaRDD = JavaRDD.fromRDD(jdbcRDD, ClassManifestFactory$.MODULE$.fromClass(Object[].class));
    // javaRDD.map((final Object[] record) -> {
    //     StringBuilder line = new StringBuilder();
    //     for (Object o : record) {
    //         line.append(o != null ? o.toString() : "").append(",");
    //     }
    //     return line.toString();
    // }).collect().forEach(rec -> System.out.println(rec));
    System.out.println("Total: " + javaRDD.count());
    System.out.println("Total time: " + (System.currentTimeMillis() - time));
}
Even if I uncomment the code that prints the results to the console, it still runs faster.
I am wondering how it can be faster if the source (the DB) is the same. Can anyone explain it? Maybe it's using parallel queries?
EDIT: One relevant piece of information about the Postgres processes: at idle, Postgres runs 12 processes, and running the query through pgAdmin does not change that number, but running it through Spark pushes it up to 20, which probably means Spark uses some parallel mechanism to do the job.
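That is most likely what is happening: JdbcRDD splits the bound range you pass it (1 to 1000000000 in 20 partitions above) and runs one JDBC query per partition, in parallel across the local[*] worker threads, which would match the extra Postgres backends you see. A rough sketch of how the per-partition bounds come out (illustrative only, not Spark's actual source; it assumes your query contains the two `?` placeholders JdbcRDD binds with each partition's lower and upper bound):

public class JdbcRddBoundsSketch {
    public static void main(String[] args) {
        long lowerBound = 1L;
        long upperBound = 1_000_000_000L;
        int numPartitions = 20;
        long length = upperBound - lowerBound + 1;
        for (int i = 0; i < numPartitions; i++) {
            // Each partition gets its own sub-range, bound into the query's two '?' parameters,
            // so up to 20 queries can hit Postgres concurrently.
            long start = lowerBound + (i * length) / numPartitions;
            long end = lowerBound + ((i + 1) * length) / numPartitions - 1;
            System.out.println("partition " + i + ": ... WHERE key BETWEEN " + start + " AND " + end);
        }
    }
}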
I'm switching a Java application from using com.netflix.astyanax:astyanax-core:1.56.44 to com.datastax.cassandra:cassandra-driver-core:3.1.0.
Right off the bat, with a simple test that inserts a row with a randomly generated key and then reads the row 1000 times, I'm seeing terrible performance compared to the code using Astyanax. I'm just using a single-node Cassandra instance running locally. The table I'm testing with is simple: a blob primary-key column named uuid and an int column named date.
Here's the basic code with the DataStax driver:
class DataStaxCassandra
{
final Session session;
final PreparedStatement preparedIDWriteCmd;
final PreparedStatement preparedIDReadCmd;
DataStaxCassandra()
{
final PoolingOptions poolingOptions = new PoolingOptions()
.setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
.setConnectionsPerHost(HostDistance.REMOTE, 1, 1)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 128)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 128)
.setPoolTimeoutMillis(0); // Don't ever wait for a connection to one host.
final QueryOptions queryOptions = new QueryOptions()
.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
.setPrepareOnAllHosts(true)
.setReprepareOnUp(true);
final LoadBalancingPolicy dcAwareRRPolicy = DCAwareRoundRobinPolicy.builder()
.withLocalDc("my_laptop")
.withUsedHostsPerRemoteDc(0)
.build();
final LoadBalancingPolicy loadBalancingPolicy = new TokenAwarePolicy(dcAwareRRPolicy);
final SocketOptions socketOptions = new SocketOptions()
.setConnectTimeoutMillis(1000)
.setReadTimeoutMillis(1000);
final RetryPolicy retryPolicy = new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE);
Cluster.Builder clusterBuilder = Cluster.builder()
.withClusterName("test cluster")
.withPort(9042)
.addContactPoints("127.0.0.1")
.withPoolingOptions(poolingOptions)
.withQueryOptions(queryOptions)
.withLoadBalancingPolicy(loadBalancingPolicy)
.withSocketOptions(socketOptions)
.withRetryPolicy(retryPolicy);
// I've tried both V3 and V2, with lower connections/host and higher reqs/connection settings
// with V3, and it doesn't noticeably affect the test performance. Leaving it at V2 because the
// Astyanax version is using V2.
clusterBuilder.withProtocolVersion(ProtocolVersion.V2);
final Cluster cluster = clusterBuilder.build();
session = cluster.connect();
preparedIDWriteCmd = session.prepare(
"INSERT INTO \"mykeyspace\".\"mytable\" (\"uuid\", \"date\") VALUES (?, ?) USING TTL 38880000");
preparedIDReadCmd = session.prepare(
"SELECT \"date\" from \"mykeyspace\".\"mytable\" WHERE \"uuid\"=?");
}
public List<Row> execute(final Statement statement, final int timeout)
throws InterruptedException, ExecutionException, TimeoutException
{
final ResultSetFuture future = session.executeAsync(statement);
try
{
final ResultSet readRows = future.get(timeout, TimeUnit.MILLISECONDS);
final List<Row> resultRows = new ArrayList<>();
// How far we can go without triggering the blocking fetch:
int remainingInPage = readRows.getAvailableWithoutFetching();
for (final Row row : readRows)
{
resultRows.add(row);
if (--remainingInPage == 0) break;
}
return resultRows;
}
catch (final TimeoutException e)
{
future.cancel(true);
throw e;
}
}
private void insertRow(final byte[] id, final int date)
throws InterruptedException, ExecutionException, TimeoutException
{
final ByteBuffer idKey = ByteBuffer.wrap(id);
final BoundStatement writeCmd = preparedIDWriteCmd.bind(idKey, date);
writeCmd.setRoutingKey(idKey);
execute(writeCmd, 1000);
}
public int readRow(final byte[] id)
throws InterruptedException, ExecutionException, TimeoutException
{
final ByteBuffer idKey = ByteBuffer.wrap(id);
final BoundStatement readCmd = preparedIDReadCmd.bind(idKey);
readCmd.setRoutingKey(idKey);
final List<Row> idRows = execute(readCmd, 1000);
if (idRows.isEmpty()) return 0;
final Row idRow = idRows.get(0);
return idRow.getInt("date");
}
}
void perfTest()
{
final DataStaxCassandra ds = new DataStaxCassandra();
final int perfTestCount = 10000;
final long startTime = System.nanoTime();
for (int i = 0; i < perfTestCount; ++i)
{
final String id = UUIDUtils.generateRandomUUIDString();
final byte[] idBytes = Utils.hexStringToByteArray(id);
final int date = (int)(System.currentTimeMillis() / 1000);
try
{
ds.insertRow(idBytes, date);
final int dateRead = ds.readRow(idBytes);
assert(dateRead == date) : "Inserted ID with date " +date +" but date read is " +dateRead;
}
catch (final InterruptedException | ExecutionException | TimeoutException e)
{
System.err.println("ERROR reading ID (test " +(i+1) +") - " +e.toString());
}
}
System.out.println(
perfTestCount +" insert+reads took " +
TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime) +" ms");
}
Is there something I'm doing wrong that would yield poor performance? I was hoping it'd be a decent speed improvement, given that I'm using a pretty old version of Astyanax.
I've tried not wrapping the load balancing policy with TokenAwarePolicy, and getting rid of the "setRoutingKey" lines, just because I know these things definitely shouldn't help when just using a single node as I'm currently doing.
My local Cassandra version is 2.1.15 (which supports native protocol V3), but the machines in our production environment are running Cassandra 2.0.12.156 (which only supports V2).
Keep in mind that this is targeted for an environment with a bunch of nodes and several data centers, which is why I've got the settings the way I do (with the actual values being set from a config file), even though I know that for this test I could skip using things like DCAwareRoundRobinPolicy.
Any help would be greatly appreciated! I can also post the code that's using Astyanax, I just figured first it'd be good to make sure nothing's blatantly wrong with my new code.
Thanks!
Tests of 10,000 writes+reads are taking around 30 seconds with the DataStax driver, while with Astyanax, they are taking in the 15-20 second range.
I upped the test count to 100,000 to see if maybe there's some overhead with the DataStax driver that just consumes ~10 seconds at startup, after which they might perform more similarly. But even with 100,000 read/writes:
AstyanaxCassandra 100,000 insert+reads took 156593 ms
DataStaxCassandra 100,000 insert+reads took 294340 ms
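One thing worth ruling out with the test above is that each iteration does a fully synchronous insert followed by a fully synchronous read, so per-request round-trip latency, rather than driver throughput, may dominate the numbers. Below is a minimal sketch of a hypothetical helper (not part of the code above) that pipelines a batch of reads with executeAsync, reusing a session and the prepared read statement, so the round trips overlap and the two costs can be separated:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

final class PipelinedReads
{
    // Fires all reads asynchronously first, then waits for the results together,
    // so network round-trip latency is paid once per batch instead of once per row.
    static List<Integer> readDates(final Session session,
                                   final PreparedStatement preparedIDReadCmd,
                                   final List<byte[]> ids)
            throws InterruptedException, ExecutionException
    {
        final List<ResultSetFuture> futures = new ArrayList<>(ids.size());
        for (final byte[] id : ids)
        {
            final BoundStatement readCmd = preparedIDReadCmd.bind(ByteBuffer.wrap(id));
            futures.add(session.executeAsync(readCmd));
        }
        final List<Integer> dates = new ArrayList<>(ids.size());
        for (final ResultSetFuture future : futures)
        {
            final Row row = future.get().one();
            dates.add(row == null ? 0 : row.getInt("date"));
        }
        return dates;
    }
}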
I am developing an Android app that needs to fetch data from a huge MS Access database, about 120 MB in size.
I have written code to establish connectivity and execute a simple query on the database. I run the same Java code on my laptop and on my Android device. Here's the code:
package practiceDB;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.ResultSet;
import java.util.Scanner;
import net.ucanaccess.converters.TypesMap.AccessType;
import net.ucanaccess.ext.FunctionType;
import net.ucanaccess.jdbc.UcanaccessConnection;
import net.ucanaccess.jdbc.UcanaccessDriver;
public class Example {
private Connection ucaConn;
public Example() {
try {
this.ucaConn = getUcanaccessConnection("VehicleDatabase2.mdb");
} catch (SQLException e) {
e.printStackTrace();
} catch(IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws ClassNotFoundException, SQLException {
System.out.println("Please enter an int");
new Scanner(System.in).nextInt();
try {
Example example = new Example();
example.executeQuery();
} catch (Exception ex) {
System.out.println("An exception : " + ex.getMessage());
}
}
private void executeQuery() throws SQLException {
Statement st = null;
try {
System.out.println("Please enter an int");
new Scanner(System.in).nextInt();
st = this.ucaConn.createStatement();
System.out.println("Please enter an int");
new Scanner(System.in).nextInt();
ResultSet rs = st.executeQuery("Select * from PersonData where EngNo = '1544256'");
System.out.println(" result:");
dump (rs, "executeQuery");
} catch(Exception ex) {
System.out.println("Sarah exception: " + ex.getMessage());
} finally {
if ( st != null ) {
st.close();
}
}
}
private Connection getUcanaccessConnection(String pathNewDB) throws SQLException, IOException {
String url = UcanaccessDriver.URL_PREFIX + "VehicleDatabase2.mdb;newDatabaseVersion=V2003";
return DriverManager.getConnection(url);
}
private void dump(ResultSet rs, String exName)
throws SQLException {
System.out.println("-------------------------------------------------");
System.out.println();
System.out.println();
int jk = 0;
while (rs.next()) {
System.out.print("| ");
int j=rs.getMetaData().getColumnCount();
for (int i = 1; i <=j ; ++i) {
Object o = rs.getObject(i);
System.out.print(o + " | ");
}
System.out.println();
System.out.println();
}
}
}
When it runs on my laptop, the connection takes only about a minute to establish.
But when it runs on my Android device, the connection takes more than 10 minutes and consumes all the heap space; when the device runs out of memory, the app crashes.
What should I do?
Note:
I made some slight changes to this code to run it on Android, like using toasts instead of System.out.println for debugging; I removed the static main function for Android, used Environment.getAbsolutePath() to locate the database, etc.
Also, for the code I am running on Android, I first used a 9 MB database to check whether it works. The code fetches the data as expected from the 9 MB database without any issues. With the 9 MB database, the connection takes around 10 seconds to establish on Android (on the desktop, it takes less than a second).
Yes, I know, it should work with a medium-sized DB. With a huge one...
Firstly, notice that the time you're measuring is the time of the very first connection to the database in the life of the VM; the following connections (if any) will be nearly instantaneous.
I've never tried something like that on Android, because your experiment is challenging; still, if it fits your requirements, you may try one of the following:
- use the MirrorFolder (or keepMirror) connection parameter (see the UCanAccess web site for details, and the sketch after this answer). In this case the very first connection to the DB will be very slow, but all the following ones (even after the VM ends) will be instantaneous. The Access database should then be updated only through UCanAccess and only on your Android device;
or, alternatively
- use a filter database (configured on Windows) that links the real database through just the subset of externally linked tables your app actually needs (this may bring memory usage down). In this case you'll have to use the remap connection parameter, because you're on a Linux-based OS.
See another suggestion related to Jackcess (the underlying I/O library) here, and use the latest UCanAccess release.
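Here is a minimal sketch of what the first option could look like (the mirror path is illustrative, and parameter support can vary by UCanAccess version, so check the UCanAccess site for the exact syntax):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

import net.ucanaccess.jdbc.UcanaccessDriver;

public class MirroredConnection {
    public static Connection open(String dbPath, String mirrorPath) throws SQLException {
        // keepMirror persists the HSQLDB mirror of the Access file at mirrorPath, so only the very
        // first connection pays the full conversion cost; later connections reuse the mirror.
        // memory=false keeps the mirror on disk instead of in the heap, which matters on Android.
        String url = UcanaccessDriver.URL_PREFIX + dbPath
                + ";memory=false"
                + ";keepMirror=" + mirrorPath;
        return DriverManager.getConnection(url);
    }
}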
I have followed the instructions provided on this website from section 5.3, and the code works fine.
My plan is to make a jar file (containing an interface/GUI), distribute that jar file to users, and then have them all read/write data from one Excel file. When I place the Excel file on a local drive it works, but when I place the file in a network folder/on a server, Java reports a problem:
java.exe has encountered a problem and needs to close. We are sorry
for the inconvenience.
or
Java Result: -1073741811
Any suggestions? Thank you
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Date;

public class TestIntoExcel
{
public String s;
public double number;
public Date d;
public void display()throws ClassNotFoundException, SQLException
{
Class.forName("oracle.jdbc.driver.OracleDriver");
Connection writeConnection = DriverManager.getConnection
("jdbc:odbc:usersavedataODBC");
writeConnection.setReadOnly(false);
Statement writeStatement = writeConnection.createStatement();
writeStatement.executeUpdate("CREATE TABLE TEST_INSERT(COL1 INT,COL2 VARCHAR(10),COL3 DATE)");
PreparedStatement writeStatement2 =
writeConnection.prepareStatement("INSERT INTO TEST_INSERT(COL1,COL2,COL3)VALUES(?,?,?)");
for(int i = 0; i<3;i++)
{
writeStatement2.setDouble(1, i);
writeStatement2.setString(2, "Row" + i);
writeStatement2.setDate(3, new java.sql.Date(new Date().getTime()));
writeStatement2.execute();
}
String query = "select *from[TEST_INSERT]";
ResultSet rset = writeStatement.executeQuery(query);
//System.out.println(rset);
while(rset.next())
{
number = rset.getDouble("COL1");
s = rset.getString("COL2");
d = rset.getDate("COL3");
System.out.println(number+"\n"+s+"\n"+d);
}
writeStatement.close();
writeStatement2.close();
writeConnection.close();
}
}