I am writing a scheduler that fetches XML data and inserts it into a MySQL DB. Simple, isn't it? The part I am trying to work out is the logic described below. NOTE: I want to run this in a Windows environment; in future it might be configured for other platforms.
The scheduler should run every 5 minutes.
The script should fetch the conditions/configuration that define what to parse and which data-fields to collect from the XML; these conditions are stored in a MySQL table.
This table also defines a delay: the interval at which the script should check whether the configured XML fields and the delay itself have changed.
So the script does two things: it runs every 5 minutes to collect the XML, and it checks the MySQL table for changes once per configured delay.
The script then reads and parses the XML, and collects only the data-fields defined in the MySQL table above.
The collected data is inserted into the MySQL DB only when there is a change in state; what counts as a state is also defined in the MySQL table.
Feedback/Suggestions:
Because of the delay, I am not sure how I should store the configuration in the script so that it is shared between scheduled runs.
Is there any way to use a static variable in the code to store this data, so that it is shared between different jobs or different schedules?
Basically, how should I implement this? I am looking for a better approach in terms of performance.
Thanks for your time.
UPDATE:
One suggestion is to run the Java code as a Windows service, so that some common data could be shared between different jobs. Does that make sense?
Reference:
Java Service Wrapper
Concurrency is the answer. Try creating a thread pool or an ExecutorService and have certain threads sleep for 5 minutes; you could also use synchronization if several threads will be working with the same resource.
Remember that more threads does not always mean the job finishes faster, e.g. 3 threads might take 2 minutes while 5 threads take 6 minutes.
* Read a tutorial about threads
* Create a few simple threads that wait for 5 minutes
* Read some tutorials about thread pools, synchronization and sharing resources (the script part)
* Test to find the optimal setup
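As a rough illustration of the thread-pool idea applied to the question, here is a minimal sketch that keeps the configuration in an AtomicReference shared by two scheduled jobs in one long-running JVM. The names ParseConfig, XmlScheduler, configLoader and collectXml are hypothetical, and the loader stands in for the MySQL query, which is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical holder for the parsing rules loaded from the MySQL table. */
final class ParseConfig {
    final List<String> fields;       // XML data-fields to collect
    final long refreshDelayMinutes;  // how often to re-read the config table

    ParseConfig(List<String> fields, long refreshDelayMinutes) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        this.refreshDelayMinutes = refreshDelayMinutes;
    }
}

final class XmlScheduler {
    // Shared by every job; swapping the reference is visible to all threads.
    private final AtomicReference<ParseConfig> config;
    private final Supplier<ParseConfig> configLoader; // assumed to wrap a JDBC query
    private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);

    XmlScheduler(Supplier<ParseConfig> configLoader) {
        this.configLoader = configLoader;
        this.config = new AtomicReference<>(configLoader.get());
    }

    /** Re-reads the configuration and publishes it to all jobs. */
    void refreshConfig() { config.set(configLoader.get()); }

    ParseConfig currentConfig() { return config.get(); }

    /** Starts the 5-minute collection job and the periodic config refresh. */
    void start(Consumer<ParseConfig> collectXml) {
        pool.scheduleAtFixedRate(() -> collectXml.accept(config.get()), 0, 5, TimeUnit.MINUTES);
        long delay = currentConfig().refreshDelayMinutes;
        pool.scheduleWithFixedDelay(this::refreshConfig, delay, delay, TimeUnit.MINUTES);
    }

    void stop() { pool.shutdownNow(); }
}
```

Because ParseConfig is immutable, each collection run sees one consistent snapshot. Note that this sketch reads the refresh delay only once at start-up; picking up a changed delay would need the refresh job to be rescheduled.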
Related
Multiple instances of my multi-threaded application (approx. 10 threads each) are running on different machines (approx. 10 machines), so overall about 100 threads of this application are active simultaneously.
Each of these threads produces 4 output sets, each containing 1k-5k rows. Each set is pushed to a single MySQL machine, into the same DB and the same table (insert or update operations). So there are 4 tables consuming the 4 sets produced by each thread.
I am using MyBatis as the ORM. These threads may spend more time writing output to the DB than processing the requests.
How can I optimize the database writes in this case?
1. Use MyBatis batch processing?
2. Write data to files that will be picked up by a single consumer thread and written into the DB?
3. Write each data set to a different file and use 4 consumer threads, each picking up the data that must be pushed to one table, so that locking is minimized?
Can you suggest other, better ways if possible?
Databases are made to handle concurrency. I am not sure what exactly MyBatis brings to the picture (I am not a huge fan of ORMs in general), but if using it is what makes you start thinking about hacks like intermediate files and single-threaded updates, you are probably much better off ripping it out and writing to the DB with plain JDBC, which should have no problem handling your use case, provided you batch your updates adequately.
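To make "batch your updates" concrete, here is a minimal plain-JDBC sketch. The class BatchWriter, the Row type, and the table and column names (output_set, id, value) are placeholders invented for illustration; the SQL uses MySQL's INSERT ... ON DUPLICATE KEY UPDATE form because the question mentions insert-or-update.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchWriter {
    /** Hypothetical row type: an id and a value destined for one output table. */
    static final class Row {
        final long id;
        final String value;
        Row(long id, String value) { this.id = id; this.value = value; }
    }

    /** Number of executeBatch() calls needed for rowCount rows. */
    static int batchCount(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    /** Writes all rows in batches of batchSize inside a single transaction. */
    static void write(Connection conn, List<Row> rows, int batchSize) throws SQLException {
        // Placeholder table and column names.
        String sql = "INSERT INTO output_set (id, value) VALUES (?, ?) "
                   + "ON DUPLICATE KEY UPDATE value = VALUES(value)";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Row r : rows) {
                ps.setLong(1, r.id);
                ps.setString(2, r.value);
                ps.addBatch();
                if (++pending == batchSize) {  // flush a full batch
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch(); // flush the remainder
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With a set of 5,000 rows and a batch size of 1,000 this issues 5 batch executions instead of 5,000 single statements. With MySQL Connector/J it may also be worth testing the rewriteBatchedStatements=true connection property; whether it helps in this workload is something to measure.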
I have a cleaner thread in my code that is created when the DB capacity is exceeded; the capacity is checked on every insertion into the DB. I would like to add more functionality to this cleaner so that it also cleans when the number of files exceeds, let's say, 10000 files. The new functionality should run on a schedule.
I want to be able to clean the DB in 2 ways:
1. On demand.
2. Scheduled, every day at hour X.
Which Java concurrency class should I use?
How can I make sure that the same thread is used for both of the above?
The code that performs the DB cleanup should be completely separated from the scheduling (single responsibility principle), so that you can execute it at any time from other code.
As for scheduling, I would suggest looking at the Quartz scheduler and getting familiar with cron expressions, so that you can move the schedule into properties and change the trigger without modifying your code.
You should synchronize your code so that no more than one cleanup runs at a time; this should be easy with the standard synchronized keyword.
If you wish to keep it very simple and don't want to add new dependencies, you can go with the standard Java solution: Timer. Timer#scheduleAtFixedRate provides fixed-rate execution. However, this means you'll have to add extra code whenever new requirements show up (e.g., don't run at the weekend).
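One dependency-free way to cover both the on-demand and the scheduled case is sketched below. It swaps Timer for a single-threaded ScheduledExecutorService, which is a substitution on my part rather than what the answers above propose: because there is only one worker thread, both kinds of run share it and cannot overlap. The class name DbCleaner and its methods are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DbCleaner {
    // The cleanup logic itself, kept separate from any scheduling concerns.
    private final Runnable cleanup;
    // One worker thread: on-demand and scheduled runs queue up behind each other.
    private final ScheduledExecutorService worker =
            Executors.newSingleThreadScheduledExecutor();

    DbCleaner(Runnable cleanup) { this.cleanup = cleanup; }

    /** On-demand cleanup; returns a Future the caller may wait on. */
    Future<?> cleanNow() { return worker.submit(cleanup); }

    /** On-demand cleanup that blocks until the run has finished. */
    void cleanNowAndWait() {
        try {
            cleanNow().get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for cleanup", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cleanup failed", e.getCause());
        }
    }

    /**
     * Scheduled cleanup: first run after initialDelayMinutes, then every 24 hours.
     * Note: if cleanup throws, scheduleAtFixedRate suppresses all later runs,
     * so the Runnable should catch and log its own exceptions.
     */
    void scheduleDaily(long initialDelayMinutes) {
        worker.scheduleAtFixedRate(cleanup, initialDelayMinutes,
                TimeUnit.DAYS.toMinutes(1), TimeUnit.MINUTES);
    }

    void shutdown() { worker.shutdown(); }
}
```

The caller would compute initialDelayMinutes as the time remaining until hour X. A fixed 24-hour period drifts across daylight-saving changes, which is one reason a cron-based scheduler such as Quartz may be preferable for wall-clock schedules.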
Can you help me with the following problems:
A. We have a table on which read and write operations happen simultaneously. Writes happen so frequently that reads are very slow; sometimes my web application does not come up because of the heavy write load on this table. How could I handle such a scenario? The writes come from a different Java application while the reads come from our web application, so the web application becomes very slow. Any ideas?
B. Writes to this table come from 200 threads; these threads take connections from a connection pool and write into the table, and this application runs 24x7. Could thread priority be the issue that is blocking read operations from the web application?
C. Can we have master-master replication for that table only, so that writes happen in one table and reads happen in the other, and every two minutes data migrates from one table to the other?
Please advise.
Thanks in advance.
Check the connection pool size: maybe it's too small and your threads waste time waiting for a connection from the pool.
Check your database settings; if you are just running it with out-of-the-box parameters, there may be a lot of room for improvement.
You probably need some kind of event-driven system: when a vehicle sends data, the DB is not updated directly; instead a message is added to a queue (e.g. JMS). Your app then caches the data on startup, and updates both the cache and the database upon receiving a message. The key point is that the only component that interacts with the DB is your app, and data changes only when you receive an event, so you don't need to query the DB to read the data, and you can do the updates in the background using only a few threads. There are quite good open-source messaging systems (e.g. Apache ActiveMQ) and caching libraries (e.g. Ehcache), so you can build a reasonably performant and fault-tolerant system without too much effort.
I guess introducing messaging would be a serious re-engineering effort, so to solve your immediate problem replication might be the best solution: merge data from the updatable table into another one every 2 minutes, and have the tracker read that other table. Obviously this works well only if the web app just reads the data and does not update it; otherwise you need to put a lot of effort into keeping the 2 tables in sync. A variation of that is batching: data from the vehicles is inserted into an intermediate table, and every 2 minutes it is transferred into the main table that the reader queries; the intermediate table is cleaned after each transfer.
The one true way to solve this is to use a queue of write events and to stop the writing periodically so that the reader has a chance.
Create a queue for incoming write updates
Create an AtomicXXX (see java.util.concurrent.atomic), for example an AtomicBoolean, to use as a lock
Create a thread pool to read from the queue and execute the updates when the lock is unset
Use a timer to periodically set the lock and read the table data. (java.util.Timer or a ScheduledExecutorService is a better fit here than javax.swing.Timer, which fires on the Swing event dispatch thread.)
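The steps above could be sketched roughly as follows. GatedWriteQueue and its method names are hypothetical, and the AtomicBoolean is only an advisory flag: it stops workers from starting new writes, but a write that was already dequeued when the flag was set can still finish while the read runs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

final class GatedWriteQueue {
    private final BlockingQueue<Runnable> writes = new LinkedBlockingQueue<>();
    private final AtomicBoolean readerActive = new AtomicBoolean(false); // the "lock"
    private final ExecutorService workers;
    private volatile boolean running = true;

    GatedWriteQueue(int writerThreads) {
        workers = Executors.newFixedThreadPool(writerThreads);
        for (int i = 0; i < writerThreads; i++) workers.execute(this::drainLoop);
    }

    /** Called by producers instead of writing to the DB directly. */
    void submitWrite(Runnable write) { writes.add(write); }

    private void drainLoop() {
        try {
            while (running) {
                if (readerActive.get()) {   // reader has the floor: back off
                    Thread.sleep(10);
                    continue;
                }
                Runnable w = writes.poll(100, TimeUnit.MILLISECONDS);
                if (w != null) {
                    try {
                        w.run();
                    } catch (RuntimeException e) {
                        // a real system would log this; keep draining
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit on shutdown
        }
    }

    /** Holds back new writes while the read runs, then releases them. */
    <T> T readWithWritesPaused(Supplier<T> read) {
        readerActive.set(true);
        try {
            return read.get();
        } finally {
            readerActive.set(false);
        }
    }

    void shutdown() {
        running = false;
        workers.shutdownNow();
    }
}
```

A timer or ScheduledExecutorService would call readWithWritesPaused periodically. If reads must never overlap a write, a ReentrantReadWriteLock would give a strict guarantee that this flag-based sketch does not.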
Before trying anything too complicated try this perhaps:
1) Don't use Thread priorities, they are rarely what you want.
2) Set up your own priority scheme, perhaps simply by having a (priority) queue for both reads and writes where reads are prioritized. That is: add read and write requests to a single queue and have them block or be notified of the result.
3) Check your database's features for optimizing write-heavy tables
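Point 2 could look something like the sketch below, which assumes a PriorityBlockingQueue and hypothetical DbRequest and DbRequestQueue classes. Reads sort ahead of writes, and a sequence number keeps requests of the same kind in arrival order.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class DbRequest implements Comparable<DbRequest> {
    enum Kind { READ, WRITE } // READ is declared first, so it sorts first

    private static final AtomicLong SEQ = new AtomicLong();

    final Kind kind;
    final String label;      // only used for illustration
    final Runnable action;   // the actual DB work
    private final long seq = SEQ.getAndIncrement(); // arrival order

    DbRequest(Kind kind, String label, Runnable action) {
        this.kind = kind;
        this.label = label;
        this.action = action;
    }

    @Override
    public int compareTo(DbRequest other) {
        int byKind = kind.compareTo(other.kind);
        // Same kind: fall back to FIFO order.
        return byKind != 0 ? byKind : Long.compare(seq, other.seq);
    }
}

final class DbRequestQueue {
    private final PriorityBlockingQueue<DbRequest> queue = new PriorityBlockingQueue<>();

    void submit(DbRequest request) { queue.put(request); }

    /** Blocks until a request is available; intended for worker threads. */
    DbRequest next() throws InterruptedException { return queue.take(); }

    /** Non-blocking variant; returns null when the queue is empty. */
    DbRequest pollNext() { return queue.poll(); }
}
```

Worker threads would loop on next() and run each request's action. With strict read priority, a steady stream of reads can starve the writes, so a real scheme may need aging or a cap on consecutive reads.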
I want to run some kind of thread continuously in App Engine. What the thread does is
check a HashMap and update its entries continuously according to some business logic.
My HashMap is a public member variable of class X, and X is a singleton class.
Now I know that App Engine does not support threads and that it has something called a backend.
Now my question is: if I run a backend continuously, 24x7, will I be charged?
There is no heavy processing in the backend. It just updates a HashMap based on some condition.
Can I apply some trick so that I am not charged? My web app is not for commercial use and is just for fun.
Yes, backends are billed per hour. It does not matter how much they are used: https://developers.google.com/appengine/docs/billing#Billable_Resource_Unit_Costs
Do you need this calculation to happen immediately? You could run a cron job, say every 5 minutes, and perform the task there.
Or you could enqueue a task that runs for up to 10 minutes and re-enqueues itself when it is close to its 10-minute time limit. To pass the state of the process on to the next task you can use the task parameters, or you can use the datastore.
We have a JDBC batch job. There are two tables:
BUSINESS_CONTRACT
CLASSIFY_RECORD
The table BUSINESS_CONTRACT stores information about business contracts. We classify the business contracts every month and store the classification results in the table CLASSIFY_RECORD.
The batch job runs once per month: it queries BUSINESS_CONTRACT for the business contracts that need to be classified, classifies them, and inserts the classification results into CLASSIFY_RECORD.
The batch job runs in a single thread right now, and I want to make it run with multiple threads.
How should I write the basic code structure using the dispatcher-worker pattern?
I have been learning Java multi-threading, but I have found mostly theoretical resources. Now I want to use multi-threading to solve a real problem, but I don't know how to write the first line of code.
First, do you need the added complexity of multi-threading? How long does your current process take to run? Do you have multiple CPUs or multiple CPU cores available on the server you would be running this on, that would make the multi-threading beneficial?
I'm not going to write your code for you, but can give you a few pointers...
How would you do this work manually? Assume you had these as paper records, and had to split the task with a co-worker. How would you divide up the work? Between 2 people or 20 people? (That's how many threads you could potentially split this into.)
Once you have these details figured out, you can create multiple threads (your workers, using parent "dispatcher" code) - each configured to select only a portion of the results from your query. You should keep references to each of your threads, and call .join() on each of them once they are all started in order to wait for the entire batch to complete. If there is a large amount of data that will be difficult to split into equal units of work (1,000 records divided into 500 and 500 may require 75% and 25% of the resources for whatever reason), you may want to consider splitting the work into much smaller units (more units than threads), then have the dispatcher continue to feed the units of work to the workers until all work has been assigned.
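A minimal sketch of that structure, assuming the contracts can be identified by a list of ids and using an ExecutorService in place of hand-managed threads and join(). ClassifyDispatcher and the worker function are hypothetical; the worker stands in for "classify these contracts and insert the results", and it returns how many records it processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ClassifyDispatcher {
    /** Splits the items into units of work of at most chunkSize each. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }

    /**
     * Dispatcher: hands every unit of work to the pool, then waits for all
     * of them. Using more units than threads lets fast workers pick up the
     * slack when units turn out to be uneven. Returns the total processed.
     */
    static int run(List<Long> contractIds, int threads, int chunkSize,
                   Function<List<Long>, Integer> worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Long> unit : chunk(contractIds, chunkSize)) {
                Callable<Integer> task = () -> worker.apply(unit);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // the equivalent of join(): wait for each unit
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a unit of work failed", e.getCause());
        } finally {
            pool.shutdownNow(); // also cancels remaining units after a failure
        }
    }
}
```

Each worker would normally use its own JDBC connection and transaction. In this sketch a failing unit stops the batch but does not roll back units that have already committed, so the rollback question still has to be decided separately.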
Also consider whether these units of work would be truly distinct. If one unit of work fails for some reason and needs to be rolled back in the database, does this mean that all of the other units of work need to be stopped and any existing inserts rolled back as well?
Are you using batch updates? It will probably make more of a difference than multiple threads doing single updates.