I am using ExecutorService to generate files from a database. I use JDBC and core Java to write the table data into files.
After creating the ExecutorService with 10 threads, I submit 60 tasks in a for loop to generate 60 files in parallel. This works fine with small data sets and tables with few columns. But for a huge table, or a table with many columns, the thread working on that table stops without writing anything to the log once the other threads have completed.
ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
for (String filename : filenames) {
EachFileThread worker = new EachFileThread(destdir, converter,
filename, this);
executor.execute(worker);
}
executor.shutdown();
Inside EachFileThread I read the XML to get the table and its columns, build a query, execute it, format the data and write it to a file:
forTable = (FileData) converter.convertFromXMLToObject( filename + ".xml");
String query = getQuery(forTable);
statement = connection.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
resultSet = statement.executeQuery(query);
resultSet.setFetchSize(3000);
WriteData(resultSet, filepath, forTable); // formats the data from the DB and writes it to a file
The problem is that you are not waiting for all the jobs to finish. As @msandiford suggested in the comments, you should call awaitTermination(..) after calling shutdown(), as in the sample shutdownAndAwaitTermination() method at https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html
For example, you can try something like this (choose timeouts long enough for your largest table):
ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
for (String filename : filenames) {
EachFileThread worker = new EachFileThread(destdir, converter, filename, this);
executor.execute(worker);
}
executor.shutdown();
try {
// Wait a while for existing tasks to terminate
if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
executor.shutdownNow(); // Cancel currently executing tasks
// Wait a while for tasks to respond to being cancelled
if (!executor.awaitTermination(60, TimeUnit.SECONDS))
System.err.println("Executor did not terminate");
}
} catch (InterruptedException ie) {
// (Re-)Cancel if current thread also interrupted
executor.shutdownNow();
// Preserve interrupt status
Thread.currentThread().interrupt();
}
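If the worker for the huge table dies with an exception, waiting alone will not tell you why. Here is a minimal sketch, assuming the per-file work can be expressed as a Callable (AwaitAllTasks and runAll are hypothetical names, not from the question). invokeAll blocks until every task has finished, and Future.get() rethrows whatever a task threw, so failures can be logged instead of disappearing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AwaitAllTasks {

    // Runs every task on a fixed pool, blocks until all of them have finished,
    // and returns one message per task that failed with an exception.
    static List<String> runAll(List<Callable<String>> tasks, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<String> failures = new ArrayList<>();
        try {
            // invokeAll returns only after every task has completed or failed
            List<Future<String>> futures = executor.invokeAll(tasks);
            for (Future<String> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    // The exception thrown inside the task is the cause
                    failures.add(String.valueOf(e.getCause()));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            failures.add("interrupted while waiting");
        } finally {
            executor.shutdown();
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "small.csv"); // stand-in for a file task that succeeds
        tasks.add(() -> { throw new IllegalStateException("huge table failed"); });
        System.out.println(runAll(tasks, 2));
    }
}
```

In the question's code, each EachFileThread would play the role of one Callable, and the returned failure list is what you would write to your log.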
To make the investigation clearer, I have only one thread producing entities and one thread consuming them. The two share a LinkedBlockingQueue. After consuming an entity, the consumer thread passes it on to another thread that saves it in the DB. The producing thread stops working after a few iterations of inserting entities into the queue and removing them from it. Debug logging makes it look as if the queue blocks the insert operation even when it is empty or has enough free space.
Producer code:
final BlockingQueue<Entity> queue = new LinkedBlockingQueue<>(8); //located in calling method
....................................................................................
do {
List<Entity> entityList = entityDatasource.getEntity();
for (Entity entity: entityList) {
try {
log.debug("Size before insert operation is: " + queue.size());
queue.put(entity);
log.debug("Size after insert operation is: " + queue.size());
} catch (InterruptedException ex) {
...
}
}
} while (atomicBool.get());
Consumer code:
CompletableFuture<Void> queueHandler = CompletableFuture.runAsync(() -> {
do {
try {
log.debug("Queue size is: " + queue.size());
Entity entity = queue.take();
log.debug("Queue size is: " + queue.size());
storeInDb(entity);
} catch (InterruptedException ex) {
...
}
} while (atomicBool.get());
}, asyncPoolQueueHandler); //ThreadPoolTaskExecutor
List<CompletableFuture<Void>> pool = new ArrayList<>();
IntStream.range(0, 1).forEach(i -> {
pool.add(queueHandler);
});
CompletableFuture.allOf(pool.toArray(CompletableFuture[]::new));
DB store:
CompletableFuture
.supplyAsync(() -> {
return entityRep.save(entity);
}, asyncPoolDbPerformer).join(); //ThreadPoolTaskExecutor
VisualVM screenshot
I watched the application in VisualVM, but saw nothing unexpected: when the producer gets stuck, the other parts of the pipeline are idle as well. I would be grateful for advice on what I could do about this issue.
The problem was a wrong design. Producer-consumer is not the right solution here. A more appropriate approach is a synchronous, blocking pipeline scaled to the throughput of the bottleneck. In my case I am bound by the performance of the database connection pool.
(dataSource->businessLogic->dataDestination) x N
where N is scale
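A minimal sketch of that layout, with hypothetical names (ScaledPipeline, run) and in-memory stand-ins for the data source and the database: each of the N workers runs the whole dataSource->businessLogic->dataDestination chain synchronously on its own thread, so there is no hand-off queue between stages.

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

class ScaledPipeline {

    // Starts n identical workers. Each worker repeatedly fetches one item,
    // applies the business logic and stores the result, all on its own thread.
    // The data source signals "no more data" by returning null (an assumption).
    static <A, B> void run(Supplier<A> dataSource, Function<A, B> businessLogic,
                           Consumer<B> dataDestination, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                A item;
                while ((item = dataSource.get()) != null) {
                    dataDestination.accept(businessLogic.apply(item));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the real data source and the database
        Queue<Integer> ids = new ConcurrentLinkedQueue<>(Arrays.asList(1, 2, 3, 4, 5));
        Queue<String> stored = new ConcurrentLinkedQueue<>();
        ScaledPipeline.<Integer, String>run(ids::poll, id -> "entity-" + id, stored::add, 2);
        System.out.println(stored.size() + " entities stored");
    }
}
```

Here N would be chosen to match the bottleneck, for example the size of the database connection pool.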
I am trying to execute a query on a table in BigQuery using its Java client library. I create a Job and then get its result using the job.getQueryResults().iterateAll() method.
This works, but for large results, such as 600k rows, it takes around 80-120 seconds. I can see that BigQuery fetches the data in batches of 40-45k rows, which take around 5-7 seconds each.
I want to get the results faster. I found on the internet that getting the temporary table that BigQuery creates for the Job, and then reading the data from that table in Avro or some other format, is really fast, but I don't see a way to do that in the BigQuery API (I am using version 1.124.7).
Does anyone know how to do that in Java, or how to fetch a large number of records faster?
Any help is appreciated.
Code to Read Table(Takes 20 sec)
Table table = bigQueryHelper.getBigQueryClient().getTable(TableId.of("project","dataset","table"));
String format = "CSV";
String gcsUrl = "gs://name/test.csv";
Job job = table.extract(format, gcsUrl);
// Wait for the job to complete
try {
Job completedJob = job.waitFor(RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
RetryOption.totalTimeout(Duration.ofMinutes(3)));
if (completedJob != null && completedJob.getStatus().getError() == null) {
log.info("job done");
// Job completed successfully
} else {
log.info("job has error");
// Handle error case
}
} catch (InterruptedException e) {
// Handle interrupted wait
}
Code to read same table using Query(Takes 90 Sec)
Job job = bigQueryHelper.getBigQueryClient().getJob(JobId.of(jobId));
for (FieldValueList row : job.getQueryResults().iterateAll()) {
System.out.println(row);
}
I tried several approaches and found the best one, so I am posting it here in case it helps someone in the future.
1: If we use job.getQueryResults().iterateAll() on the job or directly on the table, it takes the same time. If we don't specify a batch size, BigQuery fetches the data in batches of around 35-45k rows. So for 600k rows (180 MB) it takes 70-100 sec.
2: We can take the temp table details from the created job and use the table's extract job feature to write the result to GCS. This is faster and takes around 30-35 sec. This approach does not download the data locally; for that we would again need to call iterateAll() on the temp table, which takes the same time as option 1.
Example pseudo code:
try {
Job job = getBigQueryClient().getJob(JobId.of(jobId));
long start = System.currentTimeMillis();
// FieldList list = getFields(job);
Job completedJob =
job.waitFor(
RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
RetryOption.totalTimeout(Duration.ofMinutes(3)));
if (completedJob != null && completedJob.getStatus().getError() == null) {
log.info("job done");
String gcsUrl = "gs://bucketname/test";
//getting the temp table information of the Job
TableId destinationTableInfo =
((QueryJobConfiguration) job.getConfiguration()).getDestinationTable();
log.info("Total time taken in getting schema ::{}", (System.currentTimeMillis() - start));
Table table = bigQueryHelper.getBigQueryClient().getTable(destinationTableInfo);
// Use an extract job to write the data to GCS. The format argument must be a
// format name such as "CSV"; a custom delimiter (e.g. tab) would have to be set
// on an ExtractJobConfiguration instead.
Job newJob1 = table.extract("CSV", gcsUrl);
System.out.println("DestinationInfo::" + destinationTableInfo);
Job completedJob1 =
newJob1.waitFor(
RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
RetryOption.totalTimeout(Duration.ofMinutes(3)));
if (completedJob1 != null && completedJob1.getStatus().getError() == null) {
log.info("job done");
} else {
log.info("job has error");
}
} else {
log.info("job has error");
}
} catch (InterruptedException e) {
e.printStackTrace();
}
3: This is the best way, and the one I wanted. It downloads the result and writes it to a local file faster, in around 20 sec. It uses the BigQuery Storage API, a newer way that BigQuery provides; see the links below:
https://cloud.google.com/bigquery/docs/reference/storage#background
https://cloud.google.com/bigquery/docs/reference/storage/libraries#client-libraries-install-java
The server receives an HTTP request in a servlet, and the servlet calls a method in an EJB component.
public void ejbMethodVariant1(...) {
//calling stored proc
...
//calling same stored proc
}
public void ejbMethodVariant2(...) {
//calling stored proc
...
Thread t = new Thread(() -> {
//calling same stored proc
});
t.start();
try {
t.join();
} catch (InterruptedException e){
...
}
}
Stored proc is always the same.
"Calling stored proc" means:
Getting connection from data source
Creating callable statement
Executing callable statement
Closing statement
Closing connection
In variant 1, everything works perfectly, without errors. The connections in the first and second calls both have autoCommit=false.
In variant 2, the first call completes successfully, but the second times out after 2 minutes (com.microsoft.sqlserver.jdbc.SQLServerException: The query has timed out.). The connection in the first call has autoCommit=false; the one in the second call has autoCommit=true.
You're starting a new thread which doesn't have the transaction context, security context, etc copied to it. If you want to use a new thread to run the statement consider using the EE Concurrency utilities in Java EE 7.
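To illustrate why the new thread sees a different state, here is a plain-Java analogy, not the container's actual mechanism: it assumes the context is bound to the calling thread, much like a value held in a ThreadLocal. ThreadContextDemo and TX are hypothetical names.

```java
class ThreadContextDemo {

    // Stand-in for a per-thread transaction context (hypothetical)
    static final ThreadLocal<String> TX = new ThreadLocal<>();

    // Sets a context on the calling thread and returns what a freshly
    // started thread sees for the same ThreadLocal.
    static String seenByNewThread() {
        TX.set("tx-1");
        String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = TX.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        TX.remove();
        return seen[0];
    }

    public static void main(String[] args) {
        // The new thread does not inherit the caller's value
        System.out.println("new thread sees: " + seenByNewThread());
    }
}
```

The new thread gets null, because nothing copied the caller's context to it. That matches the autoCommit=true you observe in the second call, which runs outside the first call's transaction.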
I have simple vert.x app:
public class Main {
public static void main(String[] args) {
Vertx vertx = Vertx.vertx(new VertxOptions().setWorkerPoolSize(40).setInternalBlockingPoolSize(40));
Router router = Router.router(vertx);
long main_pid = Thread.currentThread().getId();
Handler<ServerWebSocket> wsHandler = serverWebSocket -> {
if(!serverWebSocket.path().equalsIgnoreCase("/ws")){
serverWebSocket.reject();
} else {
long socket_pid = Thread.currentThread().getId();
serverWebSocket.handler(buffer -> {
String str = buffer.getString(0, buffer.length());
long handler_pid = Thread.currentThread().getId();
log.info("Got ws msg: " + str);
String res = String.format("(req:%s)main:%d sock:%d handlr:%d", str, main_pid, socket_pid, handler_pid);
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
serverWebSocket.writeFinalTextFrame(res);
});
}
};
vertx
.createHttpServer()
.websocketHandler(wsHandler)
.listen(8080);
}
}
When I connect to this server with multiple clients, I see that it handles them all in one thread. But I want to handle each client connection in parallel. How should I change this code to do that?
This:
new VertxOptions().setWorkerPoolSize(40).setInternalBlockingPoolSize(40)
looks like you're trying to create your own HTTP connection pool, which is likely not what you really want.
The idea of Vert.x and other non-blocking, event-loop based frameworks is that we don't attempt a 1 thread -> 1 connection affinity. Instead, when a request currently being served by the event loop thread is waiting for IO (e.g. the response from a DB), that event loop thread is freed to service another connection. This allows a single event loop thread to service multiple connections concurrently.
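The idea can be sketched in plain Java, without Vert.x: a single-threaded scheduler stands in for the event loop, and a timer stands in for the IO wait. EventLoopSketch and serve are hypothetical names, and this is only an analogy for how the event loop behaves, not how Vert.x is implemented.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class EventLoopSketch {

    // Serves `clients` simulated requests on one "event loop" thread and
    // returns the names of the threads that produced the replies.
    static Set<String> serve(int clients, long delayMillis) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        Set<String> replyThreads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(clients);
        for (int i = 0; i < clients; i++) {
            // The handler does not sleep. It registers a timer and returns at
            // once, so the loop thread is free to accept the next request.
            loop.execute(() -> loop.schedule(() -> {
                replyThreads.add(Thread.currentThread().getName());
                done.countDown();
            }, delayMillis, TimeUnit.MILLISECONDS));
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        } finally {
            loop.shutdown();
        }
        return replyThreads;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Set<String> threads = serve(10, 500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("threads used: " + threads.size() + ", elapsed ms: " + elapsedMs);
    }
}
```

All ten replies come from the same thread, and because the waits overlap, the total time should be close to one delay rather than ten. A handler that called Thread.sleep(500) would instead serialize the requests.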
If you want to fully utilise all the cores on your machine, and you're only going to be running a single verticle, then set the number of instances to the number of cores when you deploy your verticle.
i.e.
Vertx.vertx().deployVerticle("MyVerticle", new DeploymentOptions().setInstances(Runtime.getRuntime().availableProcessors()));
Vert.x is a reactive framework, which means that it uses a single thread model to handle all your application load. This model is known to scale better than the threaded model.
The key point to know is that code you put in a handler must never block (like your Thread.sleep), since it will block the event loop thread. If you have blocking code (for example, a JDBC call), you should wrap it in an executeBlocking handler, e.g.:
serverWebSocket.handler(buffer -> {
String str = buffer.getString(0, buffer.length());
long handler_pid = Thread.currentThread().getId();
log.info("Got ws msg: " + str);
String res = String.format("(req:%s)main:%d sock:%d handlr:%d", str, main_pid, socket_pid, handler_pid);
vertx.executeBlocking(future -> {
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
serverWebSocket.writeFinalTextFrame(res);
future.complete();
});
});
Now all the blocking code will be run on a thread from the thread pool that you can configure as already shown in other replies.
If you would like to avoid writing all these execute blocking handlers and you know that you need to do several blocking calls then you should consider using a worker verticle, since these will scale at the event bus level.
A final note on multithreading: if you use one thread per connection, your server will not be as efficient as one built on a single event loop thread. For example, it won't be able to handle 10 million websockets, since 10 million threads, even on a modern machine (we're in 2016), will bring your OS scheduler to its knees.
It seems that Hibernate Search synchronous execution uses other threads than the calling thread for parallel execution.
How do I execute the Hibernate Search executions serially in the calling thread?
The problem seems to be in the org.hibernate.search.backend.impl.lucene.QueueProcessors class :
private void runAllWaiting() throws InterruptedException {
List<Future<Object>> futures = new ArrayList<Future<Object>>( dpProcessors.size() );
// execute all work in parallel on each DirectoryProvider;
// each DP has it's own ExecutorService.
for ( PerDPQueueProcessor process : dpProcessors.values() ) {
ExecutorService executor = process.getOwningExecutor();
//wrap each Runnable in a Future
FutureTask<Object> f = new FutureTask<Object>( process, null );
futures.add( f );
executor.execute( f );
}
// and then wait for all tasks to be finished:
for ( Future<Object> f : futures ) {
if ( !f.isDone() ) {
try {
f.get();
}
catch (CancellationException ignore) {
// ignored, as in java.util.concurrent.AbstractExecutorService.invokeAll(Collection<Callable<T>>
// tasks)
}
catch (ExecutionException error) {
// rethrow cause to serviced thread - this could hide more exception:
Throwable cause = error.getCause();
throw new SearchException( cause );
}
}
}
}
A serial synchronous execution would happen in the calling thread and would expose context information such as authentication information to the underlying DirectoryProvider.
Very old question, but I might as well answer it...
Hibernate Search does that to ensure single-threaded access to the Lucene IndexWriter for a directory (which is required by Lucene). I imagine the use of a single-threaded executor per directory was a way of dealing with the queueing problem.
If you want it all to run in the calling thread you need to re-implement the LuceneBackendQueueProcessorFactory and bind it to hibernate.search.worker.backend in your hibernate properties. Not trivial, but do-able.