Colleagues,
I'm using MongoDB v2.2 and the Java Mongo driver 2.9.0.
Some business logic creates approximately 25 threads, and each thread creates 150 files on GridFS. Approximately 20 files per 1000 do not return a correct getId(); the result is null. I think (correct me if I'm wrong) that this is expected behaviour from a throughput perspective, but I really need this id. For a regular DBCollection I would set WriteConcern.FSYNC_SAFE, but I cannot see a setWriteConcern method on GridFS. Do you have any ideas how to force the files to be flushed?
Looking at the driver code in GridFS.java:
_filesCollection = _db.getCollection( _bucketName + ".files" );
I can resolve the collection with the same name after creating the GridFS instance, so my code for setting the write concern looks like this:
_fs = new GridFS(_db, "MyBucketName");
DBCollection col = _db.getCollection( "MyBucketName" + ".files" );
col.setWriteConcern(WriteConcern.SAFE);
After rerunning the tests, I can see that all files now return the correct id.
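For completeness, here is a minimal sketch of how the workaround fits into one of the worker threads. It assumes the legacy GridFSInputFile API of the 2.x driver; db is the existing com.mongodb.DB handle and buildFileContent() is a hypothetical helper producing the payload:
GridFS fs = new GridFS(db, "MyBucketName");

// workaround: raise the write concern on the backing ".files" collection
DBCollection filesCol = db.getCollection("MyBucketName" + ".files");
filesCol.setWriteConcern(WriteConcern.SAFE);

byte[] content = buildFileContent();        // hypothetical helper
GridFSInputFile file = fs.createFile(content);
file.save();                                // with SAFE, the insert into .files is acknowledged
Object id = file.getId();                   // reliably non-null after the acknowledged save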
While working with batch insertion in jOOQ (v3.14.4) I noticed some inconsistency when looking into PostgreSQL (v12.6) logs.
When doing context.batch(<query>).bind(<1st record>).bind(<2nd record>)...bind(<nth record>).execute() the logs show that the records are actually inserted one by one instead of all in one go.
Doing context.insert(<fields>).values(<1st record>).values(<2nd record>)...values(<nth record>), on the other hand, inserts everything in one go, judging by the Postgres logs.
Is this a bug in jOOQ itself, or am I using the batch(...) functionality incorrectly?
Here are two code snippets that are supposed to do the same thing, but in reality the first one inserts the records one by one while the second one actually inserts them all in one go.
public void batchInsertEdges(List<EdgesRecord> edges) {
Query batchQuery = context.insertInto(Edges.EDGES,
Edges.EDGES.SOURCE_ID, Edges.EDGES.TARGET_ID, Edges.EDGES.CALL_SITES,
Edges.EDGES.METADATA)
.values((Long) null, (Long) null, (CallSiteRecord[]) null, (JSONB) null)
.onConflictOnConstraint(Keys.UNIQUE_SOURCE_TARGET).doUpdate()
.set(Edges.EDGES.CALL_SITES, Edges.EDGES.as("excluded").CALL_SITES)
.set(Edges.EDGES.METADATA, field("coalesce(edges.metadata, '{}'::jsonb) || excluded.metadata", JSONB.class));
var batchBind = context.batch(batchQuery);
for (var edge : edges) {
batchBind = batchBind.bind(edge.getSourceId(), edge.getTargetId(),
edge.getCallSites(), edge.getMetadata());
}
batchBind.execute();
}
public void batchInsertEdges(List<EdgesRecord> edges) {
var insert = context.insertInto(Edges.EDGES,
Edges.EDGES.SOURCE_ID, Edges.EDGES.TARGET_ID, Edges.EDGES.CALL_SITES, Edges.EDGES.METADATA);
for (var edge : edges) {
insert = insert.values(edge.getSourceId(), edge.getTargetId(), edge.getCallSites(), edge.getMetadata());
}
insert.onConflictOnConstraint(Keys.UNIQUE_SOURCE_TARGET).doUpdate()
.set(Edges.EDGES.CALL_SITES, Edges.EDGES.as("excluded").CALL_SITES)
.set(Edges.EDGES.METADATA, field("coalesce(edges.metadata, '{}'::jsonb) || excluded.metadata", JSONB.class))
.execute();
}
I would appreciate some help figuring out why the first code snippet does not work as intended while the second one does. Thank you!
There's a difference between "batch processing" (as in JDBC batch) and "bulk processing" (as in what many RDBMS call "bulk updates").
This page of the manual about data import explains the difference.
Bulk size: The number of rows that are sent to the server in one SQL statement.
Batch size: The number of statements that are sent to the server in one JDBC statement batch.
These are fundamentally different things. Both help improve performance. Bulk data processing does so by helping the RDBMS optimise resource allocation algorithms as it knows it is about to insert 10 records. Batch data processing does so by reducing the number of round trips between client and server. Whether either approach has a big impact on any given RDBMS is obviously vendor specific.
In other words, both of your approaches work as intended.
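If you want larger bulk sizes without building one gigantic statement, one simple option is to chunk the VALUES-based insert from your second snippet; a rough sketch, reusing the same Edges code and assuming a hypothetical chunk size of 1000 rows per statement:
private static final int CHUNK_SIZE = 1000; // hypothetical value, tune for your setup

public void bulkInsertEdgesInChunks(List<EdgesRecord> edges) {
    for (int i = 0; i < edges.size(); i += CHUNK_SIZE) {
        var chunk = edges.subList(i, Math.min(i + CHUNK_SIZE, edges.size()));
        // one multi-row (bulk) INSERT per chunk
        var insert = context.insertInto(Edges.EDGES,
            Edges.EDGES.SOURCE_ID, Edges.EDGES.TARGET_ID, Edges.EDGES.CALL_SITES, Edges.EDGES.METADATA);
        for (var edge : chunk) {
            insert = insert.values(edge.getSourceId(), edge.getTargetId(), edge.getCallSites(), edge.getMetadata());
        }
        insert.onConflictOnConstraint(Keys.UNIQUE_SOURCE_TARGET).doUpdate()
            .set(Edges.EDGES.CALL_SITES, Edges.EDGES.as("excluded").CALL_SITES)
            .set(Edges.EDGES.METADATA, field("coalesce(edges.metadata, '{}'::jsonb) || excluded.metadata", JSONB.class))
            .execute();
    }
}
Each chunk is still a single SQL statement, so the chunks could in turn be grouped into a JDBC batch if round trips turn out to be the bottleneck.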
My use case is that I have to run a query against an RDS instance and it returns 2 million records. Now I want to copy the result directly to disk instead of bringing it into memory and then copying it to disk.
The following statement would bring all the records into memory; I want to stream the results directly to a file on disk instead.
Result<Record> abc = dslContext.selectQuery().fetch();
Can anyone suggest a pointer?
Update1:
I found the following way to read it:
try (Cursor<BookRecord> cursor = create.selectFrom(BOOK).fetchLazy()) {
while (cursor.hasNext()){
BookRecord book = cursor.fetchOne();
Util.doThingsWithBook(book);
}
}
How many records does it fetch at once, and are those records brought into memory first?
Update2:
The MySQL driver by default fetches all the records at once. If the fetch size is set to Integer.MIN_VALUE, it fetches one record at a time. If you want to fetch the records in batches, set useCursorFetch=true in the connection properties.
Related wiki : https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-implementation-notes.html
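For reference, a minimal sketch of what that connection setup could look like in plain JDBC; the host, schema and credentials are placeholders, and older Connector/J versions may additionally require useServerPrepStmts=true:
Properties props = new Properties();
props.setProperty("user", "myUser");           // placeholder
props.setProperty("password", "myPassword");   // placeholder
// tell Connector/J to use a server-side cursor so that setFetchSize(n) really fetches n rows at a time
props.setProperty("useCursorFetch", "true");

Connection conn = DriverManager.getConnection(
    "jdbc:mysql://my-rds-host:3306/mySchema", props);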
Your approach using the ResultQuery.fetchLazy() method is the way to go for jOOQ to fetch records one at a time from JDBC. Note that you can use Cursor.fetchNext(int) to fetch a batch of records from JDBC as well.
There's a second thing you might need to configure, and that's the JDBC fetch size, see Statement.setFetchSize(int). This configures how many rows are fetched by the JDBC driver from the server in a single batch. Depending on your database / JDBC driver (e.g. MySQL), the default would again be to fetch all rows in one go. In order to specify the JDBC fetch size on a jOOQ query, use ResultQuery.fetchSize(int). So your loop would become:
try (Cursor<BookRecord> cursor = create
.selectFrom(BOOK)
.fetchSize(size)
.fetchLazy()) {
while (cursor.hasNext()){
BookRecord book = cursor.fetchOne();
Util.doThingsWithBook(book);
}
}
Please read your JDBC driver's manual on how it interprets the fetch size, noting that MySQL is "special".
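For example, with MySQL Connector/J and without useCursorFetch=true, the only way to stream is the row-by-row mode triggered by Integer.MIN_VALUE; a sketch of what that looks like on the jOOQ side:
try (Cursor<BookRecord> cursor = create
        .selectFrom(BOOK)
        .fetchSize(Integer.MIN_VALUE) // MySQL quirk: streams one row at a time
        .fetchLazy()) {
    for (BookRecord book : cursor) {  // Cursor is Iterable, so records are fetched lazily
        Util.doThingsWithBook(book);
    }
}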
On our production application we recently started getting a weird error from DB2:
Caused by: com.ibm.websphere.ce.cm.StaleConnectionException: [jcc][t4][2055][11259][4.13.80] The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated your particular request due to an error or a force interrupt. ERRORCODE=-4499, SQLSTATE=58009
This occurs when Hibernate tries to select data from one big table (more than 6 million records and 320 columns).
I observed that when the ResultSet has fewer than 10 elements, Hibernate selects successfully.
Our architecture:
Spring 4.0.3
Hibernate 4.3.5
DB2 v10 z/Os
WebSphere 7.0.0.31 (with JDBC driver V9.7FP5)
This select works when I execute it in Data Studio, or when the app is started locally from Tomcat (connected to the production data source). I suppose the data source on WebSphere is not correctly configured, but I tried some modifications without results. I also tried to update the JDBC driver, but that did not help; I then got ERRORCODE = -1244.
Ok, so now I'm looking for any help ;).
I can obviously provide additional information when needed.
Maybe someone has fought this problem before?
Thanks in advance!
We had the same problem and finally solved it by running REORG and RUNSTATS on the table(s). In our case, the database and tables were damaged, and after running both of those operations the issue was resolved.
This occurs when Hibernate tries to select data from one big table (more than 6 million records and 320 columns)
6 million records with 320 columns seems huge to read at once through Hibernate. Have you tried creating a database cursor and streaming a few records at a time? In plain JDBC it is done as follows:
Statement stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(50); //fetch only 50 records at a time
while with Hibernate you would need code like the following:
Query query = session.createQuery(queryString); // queryString holds your HQL
query.setReadOnly(true);
query.setFetchSize(50);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
// iterate over results
while (results.next()) {
Object row = results.get();
// process row then release reference
// you may need to flush() and clear() the session periodically to free memory
}
results.close();
This allows you to stream over the result set; however, Hibernate will still cache the results in the Session, so you'll need to flush() and clear() the Session every so often to keep memory usage in check. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand.
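A rough sketch of the StatelessSession variant, assuming you have a SessionFactory at hand; the entity name is a placeholder, and since a StatelessSession has no first-level cache there is nothing to flush or clear:
StatelessSession session = sessionFactory.openStatelessSession();
try {
    Query query = session.createQuery("from BigEntity"); // placeholder entity
    query.setReadOnly(true);
    query.setFetchSize(50);
    ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        Object row = results.get(0);
        // process row; nothing is cached by the session
    }
    results.close();
} finally {
    session.close();
}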
Analyze the database table locking impact when using this approach.
I have to transfer around 5 million rows of data from Teradata to MySQL. Can anyone please suggest the fastest way to do this over the network, without using the filesystem? I am new to Teradata and MySQL. I want to run this transfer as a weekly batch job, so I am looking for a solution that can be fully automated. Any suggestions or hints will be greatly appreciated.
I have already written code using JDBC to get the records from Teradata and insert them into MySQL, but it is very slow, so I am looking to make that code more efficient. I kept the question generic because I didn't want the solution to be constrained by my implementation; along with making the existing code more efficient, I am open to other alternatives as well. But I don't want to use the file system, since such scripts are not easy to maintain or update.
My implementation:
Getting records from Teradata:
connection = DBConnectionFactory.getDBConnection(SOURCE_DB);
statement = connection.createStatement();
rs = statement.executeQuery(QUERY_SELECT);
List<Offer> offers = new ArrayList<>(); // collects every row in memory
while (rs.next()) {
Offer offer = new Offer();
offer.setExternalSourceId(rs.getString("EXT_SOURCE_ID"));
offer.setClientOfferId(rs.getString("CLIENT_OFFER_ID"));
offer.setUpcId(rs.getString("UPC_ID"));
offers.add(offer);
}
Inserting the records into MySQL:
int count = 0;
if (isUpdated) {
for (Offer offer : offers) {
count++;
stringBuilderUpdate = new StringBuilder();
stringBuilderUpdate = stringBuilderUpdate
.append(QUERY_INSERT);
stringBuilderUpdate = stringBuilderUpdate.append("'"
+ offer.getExternalSourceId() + "'");
statement.addBatch(stringBuilderUpdate.toString());
queryBuilder = queryBuilder.append(stringBuilderUpdate
.toString() + SEMI_COLON);
if (count > LIMIT) {
countUpdate = statement.executeBatch();
LOG.info("DB update count : " + countUpdate.length);
count = 0;
}
}
if (count > 0) {
// Execute the remaining statements in the final, partial batch
countUpdate = statement.executeBatch();
}
}
Can anybody please tell me how to make this code more efficient?
Thanks
PS: Please ignore any syntax errors in the above code, as it works fine in practice; some parts may be missing because of copy and paste.
The fastest method of importing data into MySQL is LOAD DATA INFILE (or mysqlimport, a command-line interface to LOAD DATA INFILE). It involves loading data from a file, preferably one residing on a local filesystem.
When loading a table from a text file, use LOAD DATA INFILE. This is
usually 20 times faster than using INSERT statements.
Therefore, despite the fact that you don't want to use the filesystem, I'd suggest considering creating a dump to a file, transferring it to the MySQL server and using the above-mentioned means to load the data.
All these tasks can be fully automated via scripting.
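The load step can still be driven from your Java batch job; a rough sketch, assuming the dump has already been written as a CSV, that allowLoadLocalInfile=true is enabled for the connection, and that the table and file names shown are placeholders:
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://mysql-host:3306/mySchema?allowLoadLocalInfile=true",
    "myUser", "myPassword"); // placeholders

try (Statement stmt = conn.createStatement()) {
    // bulk-load the previously written dump in a single statement
    stmt.execute(
        "LOAD DATA LOCAL INFILE '/tmp/offers.csv' " +
        "INTO TABLE offers " +
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " +
        "LINES TERMINATED BY '\\n'");
}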
I am performing a call to a function which is part of a DB package. This package is deployed in two locations. One local and another remote (across the Atlantic).
I am retrieving the data via the Spring JDBC template.
There is one function which returns approximately 1000 rows (not all that much), and this takes about 1.5 seconds when getting the data locally, but in the region of 12 seconds when getting the data remotely.
In all sample code, names have been changed and code has been simplified a little.
Please see an example of the current Java code:
SimpleJdbcCall simpleJdbcCall = new SimpleJdbcCall(getDataSource())
.withSchemaName(MY_SCHEMA_NAME)
.withCatalogName("REFCURSOR_PKG")
.withFunctionName("GET_DATA")
.returningResultSet("RESULT_SET", new DataEntryMapper());
SqlParameterSource params = new MapSqlParameterSource()
.addValue("the_name", name)
.addValue("the_rev", rev);
Map resultSet = simpleJdbcCall.execute(params);
ArrayList list = (ArrayList) resultSet.get("RESULT_SET");
The RowMapper class looks something like this:
class RouteDataEntryMapper implements RowMapper {
public RouteDataEntry mapRow(ResultSet resultSet, int rowNum) throws SQLException {
return new RouteDataEntry(resultSet.getString("name"),
Integer.parseInt(resultSet.getString("rev")));
}
}
SQL package spec snippet:
TYPE REF_CURSOR IS REF CURSOR;
SQL function:
FUNCTION GET_ROUTE_DATA(the_name VARCHAR2, the_rev VARCHAR2) RETURN REF_CURSOR AS
RESULT_SET REF_CURSOR;
BEGIN
OPEN RESULT_SET FOR
select *
from table_name tn
where tn.name = the_name
and tn.rev = the_rev;
RETURN RESULT_SET;
CLOSE RESULT_SET;
EXCEPTION WHEN OTHERS THEN
RAISE;
END GET_ROUTE_DATA;
I have tried using regular boilerplate JDBC as well (create connection, prepare statement, execute statement, retrieve data from the ResultSet, etc.) and I found that the vast majority of the time was spent looping over the ResultSet and extracting the data out of it into some POJOs. In the case of the Spring code above, most of the time was spent during the execute() method, but this is probably because it creates the objects using the RowMapper at that point.
So, the common area between the two approaches is performing calls such as:
rs.getString("name")
and I'm guessing that this is where the problem lies but I could be wrong.
As I said, locally the delay is fine but remotely it's taking way too long. Is this because it's going to the DB on every rs.get... ? Is there a better way to do this?
Thanks in advance.
rs.getString("name")
ResultSet.get*(String columnName) can be replaced with ResultSet.get*(int columnIndex), which is slightly faster, but I doubt that's the main problem here.
Is this because it's going to the DB on every rs.get... ?
While it really depends on the driver, I suspect it won't. For a cached result set it might go to the server when you scroll through the cursor, but it would still fetch a bunch of rows in every round trip.
Two more suggestions I have are:
Use a network sniffing utility to see the data actually being transferred.
Check your driver for any pre-fetch options and the like.
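On the pre-fetch point, with Spring you can set the JDBC fetch size on the JdbcTemplate that backs the SimpleJdbcCall, so that more rows come back per network round trip; a sketch, where 500 is an arbitrary example value. Note that, depending on the driver, result sets returned through a REF CURSOR may instead honour a connection-level setting (e.g. Oracle's defaultRowPrefetch property):
JdbcTemplate jdbcTemplate = new JdbcTemplate(getDataSource());
jdbcTemplate.setFetchSize(500); // arbitrary example value, tune for your data

SimpleJdbcCall simpleJdbcCall = new SimpleJdbcCall(jdbcTemplate)
        .withSchemaName(MY_SCHEMA_NAME)
        .withCatalogName("REFCURSOR_PKG")
        .withFunctionName("GET_DATA")
        .returningResultSet("RESULT_SET", new DataEntryMapper());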
Add this call:
.withoutProcedureColumnMetaDataAccess()
to the existing code:
SimpleJdbcCall simpleJdbcCall = new SimpleJdbcCall(getDataSource())
.withSchemaName(MY_SCHEMA_NAME)
.withCatalogName("REFCURSOR_PKG")
.withFunctionName("GET_DATA")
.withoutProcedureColumnMetaDataAccess() // avoid fetching parameter metadata from the database
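Note that once the metadata lookup is switched off, SimpleJdbcCall no longer discovers the parameter names and types, so you have to declare them yourself. A sketch of what that could look like for the call above; the parameter names and the Oracle cursor type are assumptions based on the function shown earlier (SqlParameter/SqlOutParameter come from org.springframework.jdbc.core, OracleTypes from the Oracle JDBC driver):
SimpleJdbcCall simpleJdbcCall = new SimpleJdbcCall(getDataSource())
        .withSchemaName(MY_SCHEMA_NAME)
        .withCatalogName("REFCURSOR_PKG")
        .withFunctionName("GET_DATA")
        .withoutProcedureColumnMetaDataAccess() // no metadata lookup, so declare everything by hand
        .declareParameters(
                new SqlOutParameter("RESULT_SET", OracleTypes.CURSOR, new DataEntryMapper()),
                new SqlParameter("the_name", Types.VARCHAR),
                new SqlParameter("the_rev", Types.VARCHAR));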