Vert.x queryStream seems to hang at about 4950 rows - java

I am troubleshooting a strange issue with Vert.x 3. I've created a request handler for an HTTP route which queries a PostgreSQL database. It's all very standard and it works, until the row count goes beyond about 4950. This is despite using queryStream, which is supposed to scale.
I've simplified the code below to illustrate the problem:
dbClient.getConnection(res -> {
    if (res.failed()) {
        event.fail(500);
        return;
    }
    try (final SQLConnection conn = res.result()) {
        conn.queryStream("select x, y, z from large_table", stream -> {
            if (stream.succeeded()) {
                final SQLRowStream rowStream = stream.result();
                rowStream.handler(row -> {
                    // Do something with row here, but leaving it empty now
                }).endHandler(endHandler -> {
                    response.end();
                });
            }
        });
    }
});
How do I go about troubleshooting this? When I run the query in psql or via regular JDBC in Java SE, it has no issues.
If I append "LIMIT 4000" to the query, it works fine.
Or have I misunderstood Vert.x's JDBC support, in that I have to execute this as blocking code because it takes so long?

It seems like upgrading to vert.x 3.5.2 solved the issue. I am not sure what the actual root cause was.
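For anyone troubleshooting something similar, here is a minimal sketch (an assumption on my part, not the confirmed root cause) of the same handler restructured so that the connection is closed only in the endHandler and stream errors are surfaced, instead of relying on try-with-resources, which closes the connection as soon as the registration lambda returns:

dbClient.getConnection(res -> {
    if (res.failed()) {
        event.fail(500);
        return;
    }
    final SQLConnection conn = res.result();
    conn.queryStream("select x, y, z from large_table", stream -> {
        if (stream.failed()) {
            conn.close();
            event.fail(500);
            return;
        }
        final SQLRowStream rowStream = stream.result();
        rowStream
            .exceptionHandler(err -> {
                // surface streaming errors instead of hanging silently
                conn.close();
                event.fail(500);
            })
            .handler(row -> {
                // process each row (a JsonArray) here
            })
            .endHandler(done -> {
                conn.close(); // release the connection only after the last row
                response.end();
            });
    });
});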

Related

JanusGraph Java unable to add vertex/edge

I've been trying to interact with my JanusGraph setup in Docker, but after many tries I still haven't succeeded.
How I connect to JanusGraph:
public boolean connect() {
    try {
        graph = traversal().withRemote("path/to/janusgraph-cql-lucene-server.properties");
        return true;
    } catch (Exception e) {
        log.error("Unable to create connection with graph", e);
        return false;
    }
}
How I try to add a vertex. It looks like this doesn't do anything.
GraphTraversal<Vertex, Vertex> yt = graph.addV("link")
        .property("url", "https://www.youtube.com/123")
        .property("page_type", "contact");
GraphTraversal<Vertex, Vertex> fb = graph.addV("link")
        .property("url", "https://www.facebook.com/456");
graph.tx().commit();
I've added a node with the Gremlin console. This works, so the setup is not invalid or anything like that. And when I fetch all nodes in my application, I get a valid response.
System.out.println(graph.V().hasLabel("link").count().next()); //returns 1 (the node I added manually)
My assumptions:
The setup is alright, because it works in the Gremlin console.
The connection must be alright, because the initialization doesn't throw an exception and we get a valid count response.
The only thing I'm not sure about is whether there's a transaction commit I am missing. I didn't find any other than graph.tx().commit();
Could you please help me and tell me what I am doing wrong?
The GraphTraversal object is only a "plan" to be carried out. To have it take effect, you need a terminal method like next, toList, etc., just as you did for the count.
The confusion probably arose from the fact that the Gremlin console automatically keeps nexting the traversal a configured number of times.
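To make that concrete, here is a minimal sketch of the insert code from the question with terminal steps added; graph is the same remote GraphTraversalSource as above, and the note about commit behaviour is an assumption that depends on the server configuration:

// "graph" is the GraphTraversalSource returned by traversal().withRemote(...) above
graph.addV("link")
        .property("url", "https://www.youtube.com/123")
        .property("page_type", "contact")
        .iterate();   // terminal step: actually sends the traversal for execution

graph.addV("link")
        .property("url", "https://www.facebook.com/456")
        .iterate();

// With a remote traversal source, iterated mutations are typically committed by the
// server, so the explicit graph.tx().commit() should normally not be required.
System.out.println(graph.V().hasLabel("link").count().next()); // should now count the new vertices too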

Groovy gmongo batch processing

I'm currently trying to run a batch processing job in Groovy with the GMongo driver. The collection is about 8 GB, and my problem is that my script tries to load everything in memory. Ideally I'd like to be able to process it in batches, similar to what Spring Boot Batch does, but in Groovy scripts.
I've tried batchSize(), but this function still retrieves the entire collection into memory and only then applies my logic to it in batches.
Here's my example:
mongoDb.collection.find().collect { it ->
    //logic
}
According to the official docs:
https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/#read-operations-cursors
def myCursor = db.collection.find()
while (myCursor.hasNext()) {
    print(myCursor.next())
}
After deliberation I found this solution to work best, for the following reasons:
Unlike the cursor, it doesn't retrieve documents one at a time for processing (which can be terribly slow).
Unlike the GMongo batch function, it also doesn't try to load the entire collection into memory only to cut it up into batches for processing, which tends to be heavy on machine resources.
The code below is efficient and light on resources, depending on your batch size.
def skipSize = 0
def limitSize = Integer.valueOf(batchSize) // if you're going to hard-code the batch size then you don't need the int conversion
def dbSize = Db.collectionName.count()
def dbRunCount = (dbSize / limitSize).round()

dbRunCount.times { it ->
    dstvoDsEpgDb.schedule.find()
        .skip(skipSize)
        .limit(limitSize)
        .collect { event ->
            //run your business logic processing
        }
    //calculate the next skipSize
    skipSize += limitSize
}
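Since GMongo wraps the MongoDB Java driver, another option is to stream the cursor directly and hand fixed-size chunks to the business logic; below is a minimal sketch using the modern Java sync driver, where the connection string, database, and collection names are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class BatchProcess {
    public static void main(String[] args) {
        int batchSize = 1000;
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> collection =
                    client.getDatabase("mydb").getCollection("schedule");

            List<Document> batch = new ArrayList<>(batchSize);
            // batchSize() only controls how many documents each network round trip fetches;
            // the cursor itself never holds the whole collection in memory.
            try (MongoCursor<Document> cursor = collection.find().batchSize(batchSize).iterator()) {
                while (cursor.hasNext()) {
                    batch.add(cursor.next());
                    if (batch.size() == batchSize) {
                        process(batch); // run your business logic per chunk
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                process(batch); // handle the final partial chunk
            }
        }
    }

    private static void process(List<Document> batch) {
        // business logic here
    }
}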

Spark RDD loaded into LevelDB via toLocalIterator creates corrupted database

I have a Spark computation that I want to persist into a simple LevelDB database once all the heavy lifting is done by Spark (in Scala here).
So my code goes like this:
private def saveRddToLevelDb(rdd: RDD[(String, Int)], target: File) = {
  import resource._

  val options = new Options()
  options.createIfMissing(true)
  options.compressionType(CompressionType.SNAPPY)

  for (db <- managed(factory.open(target, options))) { // scala-arm
    rdd.map { case (key, score) =>
      (bytes(key), bytes(score.toString))
    }.toLocalIterator.foreach { case (key, value) =>
      db.put(key, value)
    }
  }
}
And all is right with the world. But then if I try to open the created database and do a get on it:
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: .../leveldb-data/000081.sst: Invalid argument
org.fusesource.leveldbjni.internal.NativeDB.get(NativeDB.java:316)
org.fusesource.leveldbjni.internal.NativeDB.get(NativeDB.java:300)
org.fusesource.leveldbjni.internal.NativeDB.get(NativeDB.java:293)
org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:85)
org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:77)
I managed, however, to get it working by not simply opening the created LevelDB database but repairing it beforehand (in Java this time):
factory.repair(new File(levelDbDirectory, "leveldb-data"), options);
DB db = factory.open(new File(levelDbDirectory, "leveldb-data"), options);
So, everything's all right then?!
Yes, but my only question is why?
What am I doing wrong when I put all my data into LevelDB:
the open stream to the database is managed by scala-arm, and is therefore closed properly afterwards
my JVM is not killed or anything
there's only one process, and in fact only one thread (the driver), accessing the database (via the toLocalIterator method)
and finally, if I open the database in paranoid mode, LevelDB doesn't complain until I try to do a get on it, so the database is not exactly corrupted in its eyes
I've read that the put writes are actually asynchronous. I did not, however, try changing the WriteOptions to sync, but wouldn't the close method wait for everything to be flushed?
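For what it's worth, here is a minimal sketch, in Java for symmetry with the repair snippet above, of what the write path looks like with sync(true) WriteOptions and an explicit close; whether this actually avoids the repair step is only an assumption to verify, not a confirmed fix:

import org.iq80.leveldb.*;
import java.io.File;
import static org.fusesource.leveldbjni.JniDBFactory.*;

public class SyncedWrite {
    public static void main(String[] args) throws Exception {
        Options options = new Options()
                .createIfMissing(true)
                .compressionType(CompressionType.SNAPPY);

        DB db = factory.open(new File("leveldb-data"), options);
        try {
            // sync(true) forces each put to be flushed to stable storage
            WriteOptions wo = new WriteOptions().sync(true);
            db.put(bytes("some-key"), bytes("42"), wo);
        } finally {
            db.close(); // close the database explicitly in all cases
        }
    }
}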

Predicate implementation inquiry: FilteredRowSet in Java

I have the following predicate implementation:
public boolean evaluate(RowSet rowset) {
    try {
        int count = 0;
        CachedRowSet crs = (CachedRowSet) rowset;
        if (!crs.isAfterLast()) // CRUCIAL LINE
            for (int i = 0; i < index.length; i++) {
                if ((crs.getObject(index[i])).toString().compareTo(high[i].toString()) <= 0) {
                    if ((crs.getObject(index[i])).toString().compareTo(low[i].toString()) >= 0) {
                        count++;
                    }
                }
            }
        // ... rest of the method (catch block and return) omitted in the question
Now, if I comment out the "crucial line", i.e.:
if (!crs.isAfterLast())
I get a java.sql.SQLException: Invalid cursor position.
Why does this happen? Doesn't the rowset use the .next() method, which returns false when the next row is after the last?
I can traverse any ResultSet without having to check if (!crs.isAfterLast()); just using rs.next() is enough, as it returns false when the cursor moves past the last row.
Why does this not happen in the Predicate? Thanks in advance.
The issue you are encountering is a bug in Java 7 (even in 7.0_45), documented here: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7166598
A simple workaround to avoid the "Invalid cursor position" SQLException that worked for me is to add this logic to the start of your implementation of Predicate.evaluate(RowSet rs) to detect the end-of-rowset condition:
FilteredRowSet frs = (FilteredRowSet) rs;
int rowNum = frs.getRow();
if (rowNum == 0) { // Meaning "no current row" - should never happen...
return false; // Needed only to address a bug in Java 7
}
Your workaround of calling isAfterLast() also seems fine, and the ugly workaround of simply swallowing the SQLException also works, since the exception occurs after all processing is complete.
The problem is fixed under the Java 8 runtime, and recompilation is not necessary.
I have specifically tested a Predicate implementation that was failing under Java 7 against Java 8, both within IntelliJ IDEA and from the command line. Under Java 8 it works fine without needing the Java 7 workaround described above.
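Putting the workaround together with the range check from the question, a complete implementation might look like the sketch below; the index, low, and high arrays are the same assumptions as in the question, and values are compared as strings:

import java.sql.RowSet;
import java.sql.SQLException;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.Predicate;

public class RangePredicate implements Predicate {
    private final int[] index;
    private final String[] low;
    private final String[] high;

    public RangePredicate(int[] index, String[] low, String[] high) {
        this.index = index;
        this.low = low;
        this.high = high;
    }

    @Override
    public boolean evaluate(RowSet rs) {
        CachedRowSet crs = (CachedRowSet) rs;
        try {
            if (crs.getRow() == 0 || crs.isAfterLast()) {
                return false;     // Java 7 workaround: no current row
            }
            for (int i = 0; i < index.length; i++) {
                String value = crs.getObject(index[i]).toString();
                if (value.compareTo(low[i]) < 0 || value.compareTo(high[i]) > 0) {
                    return false; // value outside [low, high] for this column
                }
            }
            return true;          // every indexed column is within its range
        } catch (SQLException e) {
            return false;
        }
    }

    @Override
    public boolean evaluate(Object value, int column) {
        return true;              // not used for this row-level filter
    }

    @Override
    public boolean evaluate(Object value, String columnName) {
        return true;
    }
}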

Huge time accessing database from Java

I'm a junior Java programmer and I've finally made my first program all by myself, apart from school :).
The basics are: you can store data in it and retrieve it at any time. The main thing is, I want to be able to run this program on another computer (as a runnable .jar file).
Therefore I had to install the JRE and the Microsoft Access 2010 drivers (both are 32-bit), and the program works perfectly, but there is one small problem.
It takes ages (literally, 17 seconds) to store or delete something from the database.
What is the cause of this? Can I change it?
Edit:
Here's the code to insert an object of the class Woord into the database.
public static void ToevoegenWoord(Woord woord) {
    try (Connection conn = DriverManager.getConnection("jdbc:odbc:DatabaseSenne")) {
        PreparedStatement addWoord =
                conn.prepareStatement("INSERT INTO Woorden VALUES (?)");
        addWoord.setString(1, woord.getWoord());
        addWoord.executeUpdate();
    } catch (SQLException ex) {
        // SQLException is Iterable<Throwable>, so this walks the chained exceptions
        for (Throwable t : ex) {
            System.out.println("Het woord kon niet worden toegevoegd aan de databank.");
            t.printStackTrace();
        }
    }
}
Most likely, creating the Connection every time is the slow operation in your case (especially using the JDBC-ODBC bridge). To confirm this, put print statements with a timestamp before and after the line that gets the Connection from DriverManager. If that's the case, consider not opening a connection on every request but opening it once and reusing it; better yet, use some sort of connection pooling, there are plenty of options available.
If that's not the case, then the actual insert could be slow as well. Again, simple profiling with print statements should help you discover where your code is spending most of its time.
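As a rough illustration of the reuse idea, here is a minimal sketch (the DSN, table, and Woord class are reused from the question; error handling is kept minimal) that opens the connection once and keeps it for subsequent inserts:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Database {
    private static Connection conn;

    // Open the connection once and reuse it for every insert/delete.
    public static synchronized Connection getConnection() throws SQLException {
        if (conn == null || conn.isClosed()) {
            conn = DriverManager.getConnection("jdbc:odbc:DatabaseSenne");
        }
        return conn;
    }

    public static void toevoegenWoord(Woord woord) {
        try (PreparedStatement addWoord =
                     getConnection().prepareStatement("INSERT INTO Woorden VALUES (?)")) {
            addWoord.setString(1, woord.getWoord());
            addWoord.executeUpdate();
        } catch (SQLException ex) {
            ex.printStackTrace();
        }
    }
}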
First of all, congrats on your first independent foray. To answer your question / elaborate on maximdim's answer, the concern is that calling:
try (Connection conn = DriverManager.getConnection("jdbc:odbc:DatabaseSenne")) {
every time you're using this function may be a major bottleneck (or perhaps another section of your code is). Most importantly, you will want to understand the concept of using logging or even standard print statements to help diagnose where you are seeing an issue. Wrapping individual lines of code like so:
System.out.println("Before Connection retrieval: " + new Date().getTime());
try (Connection conn = DriverManager.getConnection("jdbc:odbc:DatabaseSenne")) {
System.out.println("AFTER Connection retrieval: " + new Date().getTime());
...to see how many milliseconds pass for each call can help you determine exactly where your bottleneck lies.
Advice: use another database, like Derby or HSQLDB. They are not so different from MS Access (they can also use a file-based DB), but they perform better than going through the JDBC-ODBC bridge, and can even be embedded in the application (without a separate installation of the DB).
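If you go the embedded route, here is a minimal sketch of what that looks like with embedded Derby; the database name, table, and inserted value are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DerbyVoorbeeld {
    public static void main(String[] args) throws SQLException {
        // ";create=true" creates the file-based database on first use;
        // the embedded driver needs no separate server or ODBC setup.
        try (Connection conn =
                     DriverManager.getConnection("jdbc:derby:woordenDb;create=true")) {
            try (PreparedStatement ps = conn.prepareStatement(
                    "CREATE TABLE Woorden (woord VARCHAR(100))")) {
                ps.executeUpdate();
            } catch (SQLException e) {
                if (!"X0Y32".equals(e.getSQLState())) { // X0Y32 = table already exists
                    throw e;
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO Woorden VALUES (?)")) {
                ps.setString(1, "voorbeeld");
                ps.executeUpdate();
            }
        }
    }
}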
