Efficiently insert data into a database in Java [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm building an application with a database that will be used solely for logging. We log each incoming transaction ID with its start and end time. The application itself never reads from this database, so I want the insert to run as efficiently as possible without affecting the application. My idea is to run all of the database insert code in a separate thread, so that the inserts do not interfere with the actual work. Is there a design pattern for this kind of scenario, or is my thinking correct?

Your thinking is right. Post the data generated by your main thread(s) to a thread-safe blocking queue, and have the logging thread loop: block until a message appears in the queue, send that message to the database, and repeat.
If there is a chance, however small, that your application may be generating messages faster than your logging thread can process them, then consider giving the queue a maximum capacity, so that the application gets blocked when trying to enqueue a message in the event that the maximum capacity is reached. This will incur a performance penalty, but at least it will be controlled, whereas allowing the queue to grow without a limit may lead to degraded performance in all sorts of other unexpected and nasty ways, and even to out-of-memory errors.
Be advised, however, that plain insert operations (with no cursors and no returned fields) are quite fast as they are, so the gains from using a separate thread might be negligible.
Try running a benchmark while doing your logging a) from a separate logging thread as per your plan, and b) from within your main thread, and see whether it makes any difference. (And post your results here if you can, they would be interesting for others to see.)
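A minimal Java sketch of the bounded-queue logging thread described above. The names TxLog and AsyncTxLogger are made up for illustration, and the sink parameter stands in for the real database insert (e.g. a JDBC statement), which is not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical log record: transaction id plus start and end time. */
final class TxLog {
    final String transactionId;
    final long startMillis;
    final long endMillis;

    TxLog(String transactionId, long startMillis, long endMillis) {
        this.transactionId = transactionId;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

/** Hands log records to a single background thread through a bounded queue. */
final class AsyncTxLogger {
    // Sentinel ("poison pill") that tells the worker thread to stop.
    private static final TxLog STOP = new TxLog("", 0L, 0L);

    private final BlockingQueue<TxLog> queue;
    private final Thread worker;

    AsyncTxLogger(int capacity, Consumer<TxLog> sink) {
        BlockingQueue<TxLog> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    TxLog entry = q.take();   // blocks while the queue is empty
                    if (entry == STOP) {
                        return;
                    }
                    sink.accept(entry);       // assumption: the real sink runs the INSERT
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "tx-logger");
        this.worker.start();
    }

    /** Blocks the caller while the queue is full, which keeps memory use bounded. */
    void log(TxLog entry) {
        enqueue(entry);
    }

    /** Lets the worker drain everything queued so far, then stops it. */
    void close() {
        enqueue(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void enqueue(TxLog entry) {
        try {
            queue.put(entry);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }
}
```

Using put rather than offer is what makes the application thread wait when the queue is at capacity. Note that in this sketch an exception thrown by the sink ends the worker thread; a real logger would probably catch and report it.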

In my view, the best idea is a Java + RabbitMQ broker + background process architecture.
For example:
The Java process enqueues a JSON message in a RabbitMQ queue. This step can be done asynchronously through an ExecutorService if you want a thread pool, but it can also be done synchronously because enqueueing to RabbitMQ is fast.
A background process connects to the queue and starts consuming the messages. Its task is to read and interpret each message and insert its contents into the database.
This way you have two separate processes, and database operations won't affect the main process.


Spring micro-services: Kafka event processing issue [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 months ago.
I'm new to micro-services and need some suggestions on how to address the issue below.
I have two micro-services: an order micro-service and a delivery micro-service. The order micro-service pushes events to the delivery micro-service through Kafka.
I need some assistance with the following:
1) How do I track a particular event in both micro-services? This will help with logging, tracking event changes, etc.
For this, can I generate a random number and add it to the payload to use for tracking?
2) Let's assume that the order micro-service has 10 orders, all ten have been processed, and 10 events have been generated and pushed to Kafka. If there is a failure, how should I handle it?
I thought of creating an error queue for the order service; if there are any errors while processing events on the order micro-service side, they can be pushed there. The user can correct the issue and retry only those events.
3) At the delivery micro-service, how do I ensure that a particular event is not processed more than once?
4) Also, if there are any errors while processing events on the delivery micro-service side, how should they be handled?
Same as the second point?
5) Are there any more scenarios that I need to consider?
How do I track a particular event in both micro-services? This will help with logging, tracking event changes, etc.
For that, production environments usually implement a traceId: an identifier attached to the event when it is created and logged by every service that handles it.
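As a rough illustration of the traceId idea, here is a small Java sketch. The OrderEvent class, its field names and the log format are assumptions made up for this example. A random UUID is used instead of a plain random number because collisions are far less likely.

```java
import java.util.UUID;

/** Hypothetical event payload that carries a trace id from producer to consumer. */
final class OrderEvent {
    final String traceId;
    final String orderId;

    private OrderEvent(String traceId, String orderId) {
        this.traceId = traceId;
        this.orderId = orderId;
    }

    /** Called once, in the order service, when the event is first created. */
    static OrderEvent create(String orderId) {
        return new OrderEvent(UUID.randomUUID().toString(), orderId);
    }

    /** Naive JSON rendering for illustration only; it does not escape special characters. */
    String toJson() {
        return "{\"traceId\":\"" + traceId + "\",\"orderId\":\"" + orderId + "\"}";
    }

    /** Both services prefix their log lines with the same trace id. */
    String logLine(String service, String message) {
        return "[traceId=" + traceId + "] " + service + ": " + message;
    }
}
```

The delivery service would read the traceId back out of the payload (or a message header) and use it in its own log lines, so that one search finds the event in both services.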
Let us assume that the order micro-service has 10 orders, all ten have been processed, and 10 events have been generated and pushed to Kafka. If there is a failure, how should I handle it?
What is generally implemented is a retry template (if you are using Spring Boot; otherwise something similar in whichever framework you are working with). It retries a set number of times, which handles possible transient errors. If the error still persists, there are multiple possibilities. You could log appropriately and reprocess from that offset, or, in a less critical environment, you could save the event to a file and build an admin RESTful endpoint that takes the event and processes it in the same way you process the Kafka events.
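The retry-then-park idea can be sketched in plain Java as follows. This is not Spring's RetryTemplate; RetryingProcessor and its parameters are hypothetical names, the errorQueue is just an in-memory stand-in for an error topic or file, and back-off between attempts is left out for brevity.

```java
import java.util.Queue;
import java.util.function.Consumer;

final class RetryingProcessor {
    /**
     * Tries the handler up to maxAttempts times. If every attempt fails,
     * the event is parked in errorQueue so it can be corrected and retried later.
     * Returns true if some attempt succeeded.
     */
    static <T> boolean process(T event, Consumer<T> handler, int maxAttempts, Queue<T> errorQueue) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(event);
                return true;
            } catch (RuntimeException e) {
                // Assumption: any runtime exception is treated as possibly transient.
                if (attempt == maxAttempts) {
                    errorQueue.add(event);
                }
            }
        }
        return false;
    }
}
```

A real implementation would normally wait between attempts and distinguish errors that are worth retrying from those that are not.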
At the delivery micro-service, how do I ensure that a particular event is not processed more than once?
This depends on the logic you perform per Kafka message. If you are creating new data (inserting rows) in your database, you could simply check for uniqueness based on the primary key (which in this case would be the order ID or something similar) and refuse to add the new data, logging a warning instead. Otherwise, a common approach is to overwrite the previously inserted data. Both ways should perform as well as processing always-unique records if you leave the uniqueness check to the database and handle the exception properly.
Another approach to uniqueness is to change the producer so that it sends only unique messages.
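The two options (refuse duplicates, or overwrite) can be illustrated with a small in-memory sketch. DeliveryStore is a hypothetical stand-in: in a real service the map would be a database table whose primary key is the order ID, and insertIfNew would correspond to an INSERT that fails with a duplicate-key error.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a table keyed by order id. */
final class DeliveryStore {
    private final Map<String, String> rowsByOrderId = new ConcurrentHashMap<>();

    /** Option 1: keep the first version; returns false for a duplicate event. */
    boolean insertIfNew(String orderId, String payload) {
        return rowsByOrderId.putIfAbsent(orderId, payload) == null;
    }

    /** Option 2: overwrite, so processing the same event twice has the same end result. */
    void upsert(String orderId, String payload) {
        rowsByOrderId.put(orderId, payload);
    }

    String get(String orderId) {
        return rowsByOrderId.get(orderId);
    }
}
```

With option 1 the consumer would log a warning when insertIfNew returns false and skip the rest of the processing for that event.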
Also, if there are any errors while processing events on the delivery micro-service side, how should they be handled?
See the answer to the second question above.

Java: what is the best approach for high performance of multi-threading in a time-critical application? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm developing a network proxy application using Java 8. For ingress, the main logic is the data-processing loop: take a packet from the inbound queue, process its content (e.g. protocol-adoption), and put it in the send queue. The design allows multiple virtual TCP channels, so each thread in a list of data-processing threads handles a subset of the channels for a given period (e.g. the channels with channel.channelId % NUM_DATA_PROCESSING_THREADS = 0, as determined by a load-balancing scheduler). Channels are stored in an array and accessed using the channelId as the index. The array is wrapped by a class that provides methods such as register, deregister, getById and size; its instance is called CHANNEL_STORE in the program. These methods are called from the main logic (the data-processing loop) by different threads (at least the dispatcher thread, the data-processing threads, and the control thread that destroys a channel from the GUI), so I need to consider concurrency among these threads. I have several candidate approaches:
1) Use synchronized or reentrant locks around register, deregister, getById, etc. This is the simplest approach and it's thread-safe, but I have performance concerns about the lock (CAS) mechanisms, since I need to perform operations on the CHANNEL_STORE (especially getById) at a very high frequency.
2) Delegate the operations on CHANNEL_STORE to a SingleThreadExecutor via executor.execute(runnable) and/or executor.submit(callable). The concern is the cost of creating a runnable/callable for each such delegation in the data-processing loop: creating the runnable instance and calling execute. I have no idea whether this will be even more expensive than synchronized or reentrant locks. In practice (so far) there is no post-operation, so the data-processing loop only submits a runnable and does not need to wait for a callable's result, although a post-operation is needed in the control loop.
3) Delegate the operations on CHANNEL_STORE to a dedicated thread through a pair of ArrayBlockingQueues instead of an Executor. For each access to CHANNEL_STORE, put a task indicator together with its parameters on the first queue; the dedicated thread loops on this queue with the blocking method take and operates on the CHANNEL_STORE. It then puts the result on the second queue so that the delegating thread can continue with the post-operation (currently not needed, however). I regard this as the fastest, assuming the blocking queue in the JVM is lock-free. The concern is that the code is very messy and error-prone.
I think the 2nd and 3rd may be called "serialization".
The reason I cannot simply hand tasks to a thread pool for data processing and forget about them is that the TCP stream packets of each channel must not be reordered; they have to be processed serially on a per-channel basis.
Questions:
1) How does the performance of the second approach compare to the first?
2) What would you suggest for my situation?
I'm currently using stream I/O for LAN reads and writes. With NIO, the coordination between the NIO thread and the data-processing threads may bring additional complexity (e.g. post-operations). So I think this question is meaningful for time-critical (stream-based, multi-channel network) applications like mine.
If I understand your use case correctly, this is a common problem in concurrent programming. One solution is the ring buffer approach, which usually addresses both the synchronization problem and the problem of creating too many objects.
You can find a good implementation of this in the LMAX Disruptor library. See https://lmax-exchange.github.io/disruptor/ to learn more. But keep in mind that it is not magic and must be adapted to your use case.
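To give an idea of what a ring buffer looks like, here is a deliberately simplified single-producer/single-consumer sketch in Java. It is not the Disruptor API and leaves out everything that makes the Disruptor fast (padding against false sharing, batching, wait strategies); SpscRingBuffer and its method names are invented for this example. It is only safe when exactly one thread calls offer and exactly one thread calls poll.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified ring buffer for exactly one producer thread and one consumer thread. */
final class SpscRingBuffer<T> {
    private final Object[] slots;   // preallocated once, reused for every element
    private final int mask;
    private final AtomicLong head = new AtomicLong();  // next index to read; written only by the consumer
    private final AtomicLong tail = new AtomicLong();  // next index to write; written only by the producer

    SpscRingBuffer(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.slots = new Object[capacity];
        this.mask = capacity - 1;
    }

    /** Producer side: returns false instead of blocking when the buffer is full. */
    boolean offer(T value) {
        if (value == null) {
            throw new NullPointerException("null elements are not supported");
        }
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.set(t + 1);            // volatile write publishes the slot to the consumer
        return true;
    }

    /** Consumer side: returns null when the buffer is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T value = (T) slots[index];
        slots[index] = null;        // release the reference before handing the slot back
        head.set(h + 1);            // volatile write frees the slot for the producer
        return value;
    }
}
```

Because each counter has a single writer, no lock or CAS loop is needed, and the backing array is allocated only once. Elements come out in the order they went in, which fits the per-channel ordering requirement if each channel is always served by the same buffer.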

Not sure if its best to use a wrapper class or static for a variable that needs to be seen by multiple threads [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am still learning Java and need a recommendation on best practices.
Here is the scenario:
1) An encrypted file comes in.
2) A Java app picks up the file when it arrives.
3) In its main method, the class that listens for the file creates 5 BlockingQueues (one per consumer) and starts a producer thread and 5 consumer threads.
4) The producer thread reads the file and creates 1 big object consisting of 5 smaller objects.
5) The producer thread then puts each big object into the BlockingQueues.
6) Each consumer thread takes the big object from its own BlockingQueue, retrieves 1 of the 5 smaller objects, and writes a file with the information from that small object.
My problem:
If anything goes wrong in the producer thread while it's reading the file, I want the listening class (the one that starts everything up) to know about it so that it can change the extension of the encrypted file to .err.
I also want the 5 consumer threads to know if something goes wrong in the producer thread so that they can change the extension of the files they create to .err.
I'm not sure whether it is better to pass a wrapper class through the BlockingQueue, or to use a static variable in the listening or producer class that all the threads can check to see whether an error occurred. If there is a better solution, please let me know. Thank you for your help.
What if instead of having each child thread write out their results to a file, the results were aggregated back to a result handler? This way if there was an error, the result handler can handle it appropriately (by adding the .err extension).
Most of the performance advantages of concurrency have to do with better CPU usage, but since you're writing to a single piece of hardware (disk, probably) there really isn't an advantage to doing that concurrently anyway.
The main disadvantage to this approach would be that your memory overhead would be a little bigger, since you would have to keep the outputs from each consumer in memory until all five had finished writing, instead of being able to have them each finish and persist separately. Honestly, you'll have to do that anyway, since an error in one consumer could happen after some other consumer had already finished and persisted.
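One possible shape for such a result handler, sketched in Java. ResultHandler, the ".out" extension and the use of Callable tasks are assumptions for illustration. Each consumer returns its output as a string instead of writing it, and the handler decides on the final file names only after every consumer has finished.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ResultHandler {
    /**
     * Runs every consumer, keeps the outputs in memory, and returns
     * final file name -> content. If any consumer failed, every file
     * gets the ".err" extension; otherwise ".out" (an assumed default).
     */
    static Map<String, String> collect(Map<String, Callable<String>> consumers) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, consumers.size()));
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : consumers.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            Map<String, String> contents = new LinkedHashMap<>();
            boolean failed = false;
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                try {
                    contents.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    failed = true;                       // the consumer threw an exception
                    contents.put(e.getKey(), "");
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    failed = true;
                    contents.put(e.getKey(), "");
                }
            }
            Map<String, String> files = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : contents.entrySet()) {
                files.put(e.getKey() + (failed ? ".err" : ".out"), e.getValue());
            }
            return files;
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller would then write the returned map to disk in one place. A producer failure can be reported the same way, by letting it surface as an exception from the tasks that depend on it.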

Thinking in node if Java background [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am a Java developer, used to everything working sequentially (or concurrently with multiple threads), one step after another. It feels logical to arrange things sequentially.
But Node handles work concurrently on a single thread. How can that be beneficial if it runs on only one thread?
Frankly, I don't get the concept of a single thread in Node. Does only one thread handle everything?
Any advice on how I can start thinking in Node would be helpful.
Synchronous Programming (Java)
If you are familiar with synchronous programming (writing code that does one thing after another), as in Java or .NET, consider the following code.
For example:
var fs = require('fs');
var content = fs.readFileSync('simpleserver1.js','utf-8');
console.log('File content: ');
console.log(content);
It writes the source code of a simple web server to the console. The code works sequentially, executing one line after another; the next line is not executed until the previous line finishes.
Although this works well,
what if the file in this example were really large and took minutes to read?
How could other operations be executed while that long operation is running?
These questions do not arise when you work in Java, because you have many threads working for you (to serve multiple requests).
Asynchronous Programming (Node.js)
When you use Node, you have just a single thread, which serves all requests.
This is where asynchronous programming comes in to help you in JavaScript (Node).
To execute operations while other long operations are running, we use callback functions. The code below shows how to use an asynchronous callback function:
var fs = require('fs');
fs.readFile('simpleserver1.js', 'utf-8', function (err, data) {
    if (err) {
        throw err;
    }
    console.log('executed after the file finishes reading');
});
//xyz operation
Notice that the line 'executed after the file finishes reading' is printed only once the file has been read. The read itself happens in the background, which allows us to perform other operations in the meantime.
Now look at the //xyz operation in the code. While the file is being read, the server does not wait for the read to complete. It starts executing //xyz operation right away and gets back to the callback function passed to fs.readFile( when the file is ready.
So that's how asynchronous programming works in Node.
Also, if you want to compare Java and Node, you can read this article.
EDIT:
How is Node.js single-threaded?
Let's take a scenario where clients send requests to a server.
Assumptions:
1) There is a single server process, say serverProcess.
2) There are 2 clients sending requests to the server, say clientA and clientB.
3) clientA's request requires a file operation (like the one shown above using fs).
What happens here:
Flow:
1) clientA sends a request to serverProcess. The server receives the request and starts the file operation. It now waits until the file is ready to read (the callback has not been invoked yet).
2) clientB sends a request to serverProcess. The server is free at this point, since it is not busy serving clientA, so it serves clientB. In the meantime, the callback from fs.read notifies the server that the file data is ready and that it can operate on it.
3) Now the server resumes serving clientA.
As you can see, a single server thread handled both client requests.
What would have happened in Java? You would have created another server thread to serve clientB while the first thread served clientA and waited for the file to be read. So this is how Node is single-threaded: a single process handles all the requests.
Question:
If something else was invoked to prepare the data from the file system, how can you say Node is single-threaded?
See, I/O (files/database) is itself performed outside Node's single JavaScript thread. The difference is:
1) Node does not wait for everything to be ready (as Java does); instead it just starts its next piece of work (or serves other requests). Whatever happens, Node will not create a different thread to serve the remaining requests (unless you do so explicitly, which is not recommended).
2) Java, on the other hand, will itself create another thread to serve new requests.
This has been said a million times, but let me give you a short answer with respect to Java.
In Java, you create a separate Thread if you want to read a long file without blocking the main thread.
In JavaScript, you just read the file using callbacks.
Main differences between the two:
It is easier to introduce bugs with multiple threads (race conditions, etc.).
You do not need the power of a second CPU core to read a file; the bottleneck is slow I/O, not intensive computation.
With callbacks there is a single thread, as you said. It just asks the underlying system to read the file and continues executing your code. Once the file is read, JavaScript finishes the code it is currently executing and then comes back to run your callback.
Sometimes you also have to do computationally intensive work in JavaScript. In that case you can spawn a new process; look into the cluster module. But usually the computationally or I/O-heavy operations are already implemented for you, and you just use them through callbacks.
OK, here is a head start. It is not about threads, it's about tasks per second. In a threaded model, threads block while they wait for something.
In a non-blocking design, every time you wait for something you give the thread back and are woken up when the event you are waiting for occurs. Those events are known as futures. A future expresses: in the future, I want to do this once this and that have happened (or, in case of failure, do this other thing). That's basically it.
This is not specific to Node or JavaScript. It is popular in Scala too, and there are plenty of other languages. If you are a Java person, look into async processing. Jetty provides it. Vert.x is known for its share-nothing architecture.
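For a Java developer, the closest standard-library equivalent of this "do this when that has happened, or this other thing on failure" style is probably CompletableFuture. A small sketch, where AsyncRead and the slowRead parameter are made-up names and the supplier stands in for a slow file or network read:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class AsyncRead {
    /**
     * Starts slowRead on a background pool and returns immediately.
     * The continuation runs when the result is ready; the failure branch
     * runs instead if the read throws.
     */
    static CompletableFuture<String> describe(Supplier<String> slowRead) {
        return CompletableFuture.supplyAsync(slowRead)
                .thenApply(content -> "read " + content.length() + " chars")
                .exceptionally(error -> "read failed");
    }
}
```

The calling thread is free to do other work after describe returns. Calling join() on the returned future blocks until the result is available, which is mainly useful in tests or at the very edge of a program.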
So have fun with this. I use it regularly. I have a server storing 20GB of data in a custom datastore. Want to know how we scaled? We bought 512GB for the server and ran 20 of those stores in parallel, sharing nothing. It's like having 20 servers in one machine with no noticeable latency, and you scale with the cores. That's how we do business in today's world.
Hardware is cheap, so why fiddle with concurrency at the lowest level?

executing millions of thread concurrently in java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a requirement to handle millions of threads, and I know this depends heavily on the hardware configuration and the JVM.
I have used executors for the task.
Call flow of my project:
user(mobile)----->Server(Telecom) ------>Application----->Server(Telecom)----->User
Code call flow :
A------>B---------->C
//Code snippet of A
public static final int maxPoolSize = 100;
ExecutorService executorCU = Executors.newFixedThreadPool(maxPoolSize);
Runnable handleCalltask = new B(valans, sessionID, msisdn);
executorCU.execute(handleCalltask);
//Code snippet of B
public static final int maxPoolSize = 10;
ExecutorService executor = Executors.newFixedThreadPool(maxPoolSize);
Runnable handleCalltask = new c(valans, sessionID, msisdn);
executor.execute(handleCalltask);
There is also a shared map, which I implemented as a ConcurrentHashMap and which is loaded when the application starts.
Is my approach correct? If not, can anybody suggest how I can achieve maximum threading in my web application?
I have tested with JMeter and the results are not at all encouraging.
Thanks.
Is my approach is correct
IMO, no, it's definitely not the correct approach.
and if not can anybody suggest how i can achieved maximum threading in my web application.
Separate receiving messages from the client from processing the messages. That way, you can horizontally scale the two parts independently to meet your requirements without having millions of threads in a single JVM.
A few suggestions:
1) I'd make the web application as light as possible and submit any long running tasks to some sort of backend processor.
Within the same JVM, you could use a ThreadPoolExecutor with an ArrayBlockingQueue.
If you wanted to submit the jobs to another JVM, you could use JMS with competing consumers or something like Apache Kafka.
Again the benefit here is that you can add more nodes to either the backend or frontend of the app as required.
2) If required, make your application server's thread pool larger.
For instance, with Tomcat you'd tweak the parameters described here: http://tomcat.apache.org/tomcat-7.0-doc/config/executor.html. Explaining how to correctly tune these parameters is more than I can describe here. Among other things, the values you select will depend on the average number of concurrent requests, the maximum number of concurrent requests, the time required to serve a single request, and the number of application servers in your pool.
3) You'll get the most scalability by reducing statefulness.
If a request can be dispatched to any front end consumer and then processed by any backend consumer, you can add more instances of either to scale. If one request depends on another, you'll need to synchronize the processing of requests across nodes, which reduces scalability. Design things to be stateless from the start if at all possible.
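For the same-JVM variant mentioned in suggestion 1), a ThreadPoolExecutor backed by an ArrayBlockingQueue might be set up roughly like this. BackendProcessor and the sizes passed in are placeholders, and the choice of CallerRunsPolicy is an assumption: it makes the submitting thread run the task itself when the queue is full, which slows intake down instead of rejecting work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BackendProcessor {
    /** Fixed number of worker threads in front of a bounded task queue. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Stops accepting new tasks and waits for the queued ones to finish. */
    static boolean shutdownAndWait(ThreadPoolExecutor pool, long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The pool would be created once at application start-up and shared, rather than created per request. A few threads per core with a bounded queue is usually a more realistic starting point than trying to run millions of threads.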
I have tested with Jmeter and its result are not at all encouraging.
You need to profile your application to determine where the hot spots are. If you follow my recommendations above, you can easily add more horsepower where required.
