As you know, trading strategies take actions based on a real-time feed, such as when the bid or the last trade price changes. A data feed provider streams quotes to our desktop application asynchronously, on a thread separate from the main thread. This data feed thread is spawned when you make a request to the data feed provider and lives until you explicitly send a request to stop the streaming.
As it stands, the data feed thread executes trading strategies because most of them are designed to enter or update orders upon tick data. Do you see any problem with this approach? Is this design common in trading applications?
I'm using Java.
You definitely don't want to execute a trading strategy on the data feed thread, particularly if the execution takes a while. That execution should happen on a different thread. I am not that familiar with Java, but I assume you could make use of a thread pool there. In C# a very powerful way to spread out work over multiple threads would be using Tasks.
Another thing you might want to think about is what to do when there are new ticks for an instrument while you are still processing the previous tick. In many cases it makes sense to only process the most recent one. I have written up a little post on what I termed the most recent update pattern with a sample implementation in C#. Maybe you find that useful.
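Since the question is about Java, here is a minimal Java sketch of the "only process the most recent tick" idea. It is not the C# implementation from the post mentioned above; the class name `LatestTickBuffer` and its methods are made up for illustration, and it assumes a tick can be reduced to a symbol and a price.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Keeps only the newest unprocessed tick per instrument (hypothetical helper). */
final class LatestTickBuffer {
    private final Map<String, Double> latest = new ConcurrentHashMap<>();

    /** Called by the data feed thread: overwrites any tick not yet consumed. */
    void publish(String symbol, double price) {
        latest.put(symbol, price);
    }

    /** Called by a strategy thread: atomically removes and returns the newest price, or null if nothing is pending. */
    Double take(String symbol) {
        return latest.remove(symbol);
    }

    public static void main(String[] args) {
        LatestTickBuffer buffer = new LatestTickBuffer();
        // Three ticks arrive while the strategy is still busy.
        buffer.publish("EURUSD", 1.1000);
        buffer.publish("EURUSD", 1.1001);
        buffer.publish("EURUSD", 1.1002);
        System.out.println(buffer.take("EURUSD")); // the newest tick only
        System.out.println(buffer.take("EURUSD")); // nothing left to process
    }
}
```

The two older ticks are silently dropped, which is the point of the pattern. Whether dropping is acceptable depends on the strategy; one that needs every trade print would want a queue instead.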
As it stands, the data feed thread executes trading strategies because most of them are designed to enter or update orders upon tick data.
Not quite. The data feed thread triggers the execution of trading strategies. You don't want any other processing to slow down the data feed thread.
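A rough Java sketch of "the feed thread triggers, another thread executes", assuming a fixed pool of four strategy threads and a hypothetical `onTick` callback (a real feed API will have its own callback signature):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hands ticks from the feed thread to a pool of strategy threads (illustrative names). */
final class TickDispatcher {
    private final ExecutorService strategyPool = Executors.newFixedThreadPool(4);
    // Records which thread ran each strategy call, only to make the demo observable.
    final List<String> handledBy = new CopyOnWriteArrayList<>();

    /** Feed callback: enqueues the work and returns immediately. */
    void onTick(String symbol, double price) {
        strategyPool.execute(() -> runStrategy(symbol, price));
    }

    /** Placeholder for real strategy logic (enter or update orders). */
    private void runStrategy(String symbol, double price) {
        handledBy.add(Thread.currentThread().getName());
    }

    /** Stops accepting ticks and waits for queued strategy runs to finish. */
    boolean shutdownAndWait() throws InterruptedException {
        strategyPool.shutdown();
        return strategyPool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        TickDispatcher dispatcher = new TickDispatcher();
        for (int i = 0; i < 10; i++) {
            dispatcher.onTick("EURUSD", 1.10 + i * 0.0001);
        }
        dispatcher.shutdownAndWait();
        System.out.println("strategy runs: " + dispatcher.handledBy.size());
    }
}
```

One caveat with a multi-threaded pool: two ticks for the same instrument can be processed concurrently or out of order. If that matters, route each instrument to its own single-threaded executor.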
I built a small video frame analysis app with desktop Java 8. On each frame, I extract data (5 doubles now, but could expand to a 1920x1080x3 OpenCV Mat in the future). I would like to store this data into a database (Java DB, for example) to perform some time-series analysis, and periodically return the results to the user.
I am worried about hard-drive access times if I write to the database and run the app on a single thread, and the best solution that occurred to me would be to implement the producer/consumer pattern with multithreading. The examples I found all implement 3 threads:
the main thread
the producer thread
the consumer thread
Is there an advantage in doing that compared to a 2 thread implementation?
main and producer thread
consumer thread
And is that the right way to handle real-time data with a database?
It's limiting to use a fixed number of threads. My PC has (only) 8 cores; your intensive-sounding app is not going to use even half of them. Indeed, probably only the consumer is intensive, so it would use maybe 12.5% of the CPU (one core out of eight). To get the most out of the CPU you would need several of each thread, and then you'd spend a lot of effort managing threads.
The alternative is to use one of various existing systems for executing work in the background, for example ThreadPoolExecutor. With that you can just throw lots of work at it (Runnables) and it will queue the work up, and execution can be scaled to suit the hardware it's running on by customizing the number of worker threads.
Or if you're using Swing, then SwingWorker. The advantage of this is you can do some work on a background thread and post the results on the foreground (main/UI) thread easily.
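To make the ThreadPoolExecutor suggestion concrete, here is a minimal sketch. The pool size, the queue capacity of 64 and the `store` method are assumptions for illustration; `store` only counts frames where a real app would write to the database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Submits per-frame work to a pool sized to the machine (illustrative sketch). */
final class FrameWorkPool {

    static int processFrames(int frameCount) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors();
        // Bounded queue plus CallerRunsPolicy: when the queue is full, the submitting
        // thread runs the task itself, which slows the producer down instead of
        // letting pending work grow without limit.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(64),
                new ThreadPoolExecutor.CallerRunsPolicy());

        AtomicInteger stored = new AtomicInteger();
        for (int i = 0; i < frameCount; i++) {
            double[] features = {i, i * 0.5, 0.0, 0.0, 0.0}; // the "5 doubles" per frame
            pool.execute(() -> store(features, stored));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return stored.get();
    }

    /** Stand-in for the database write. */
    private static void store(double[] features, AtomicInteger stored) {
        stored.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("frames stored: " + processFrames(500));
    }
}
```

Changing `workers` is the only thing needed to scale this up or down, which is the advantage over hand-managed producer and consumer threads.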
Your question is rather conceptual, so I think it belongs here: Programmers
But as one short hint from my experience, you separate the producer from the main because your main control may freeze if something goes wrong with the producer. Things like frozen forms, not responding controls etc. may be the result. Give your system a chance to reestablish by command.
This question is semi-theory, semi-how to properly code.
I am thinking about making an app in Java that will accept streaming data and, as the data comes in, update a GUI.
So, what I am thinking of doing is just spawning off threads in Java that will:
collect data for X-milliseconds,
Take new data and update GUI with it
At the same time, start a new thread, collecting data for X milliseconds
This new thread must start off right where the first thread began
And, at the same time, all other parts of the program are going on in their own threads too.
So I need to make sure the threads don't collide, no data is lost in the mix, and I need to have an understanding of the speed limits. Say if the data is coming in at 1 Gbs vs 1 Mbs, what programming difference does that make?
The specific application includes data coming in from bluetooth and also data coming in from the Internet via an HTTPS rest API
If anyone has examples, either online or something quick and dirty right here, that'd be great. My Google searches came up dry..
The question is rather broad, but from an architectural point of view, I think the complexity decreases greatly if you change it to one thread reading from your device and putting the data into a buffer, and one thread reading from that buffer and updating the UI. This reduces the code that has to cope with concurrent access from multiple threads (ideally it reduces it to just the buffer) and makes synchronization much easier. It also decouples fetching the data from displaying it.
Writing the buffer can start off with using PipedInputStream and PipedOutputStream, however in one of my projects it turned out not to be fast enough if you really want to provide real-time processing and display, so you might end up writing yourself a low-latency buffer class.
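As a starting point, the one-reader, one-buffer, one-display-thread layout can be sketched with a `BlockingQueue` rather than piped streams. This is only an illustration: the `END` marker, the integer samples and the `displayed` list (standing in for the GUI update) are all assumptions, and it makes no claim about being fast enough for any particular data rate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** One reader thread fills a buffer; the calling thread drains it in batches. */
final class ReaderToUiBuffer {
    private static final int END = -1; // hypothetical end-of-stream marker

    static List<Integer> run(int samples) throws InterruptedException {
        BlockingQueue<Integer> buffer = new LinkedBlockingQueue<>();

        // Stand-in for the thread that reads from Bluetooth or the network.
        Thread reader = new Thread(() -> {
            for (int i = 0; i < samples; i++) {
                buffer.add(i);
            }
            buffer.add(END);
        }, "device-reader");
        reader.start();

        List<Integer> displayed = new ArrayList<>(); // stand-in for the GUI
        List<Integer> batch = new ArrayList<>();
        boolean done = false;
        while (!done) {
            batch.clear();
            batch.add(buffer.take()); // block until at least one sample exists
            buffer.drainTo(batch);    // then take everything else already waiting
            for (int value : batch) {
                if (value == END) {
                    done = true;
                    break;
                }
                displayed.add(value);
            }
        }
        reader.join();
        return displayed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("samples displayed: " + run(1000).size());
    }
}
```

Draining in batches means the display side does one update per batch instead of one per sample, so a faster feed produces larger batches rather than more GUI updates.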
I am also thinking of integrating the disruptor pattern in our application. I am a bit unsure about a few things before I start using the disruptor
I have 3 producers: a FIX thread which de-serialises the requests, another thread which continuously modifies the order price as the market moves, and one more thread which is responsible for de-serialising the requests sent from a GUI application. All three threads currently write to a blocking queue (hence we see a lot of contention on the queue).
The disruptor talks about a Single writer principle and from what I have read that approach scales the best. Is there any way we could make the above three threads obey the single writer principle?
Also, in a typical request/response application, and especially in our case, we have contention on an in-memory cache: we need to lock the cache when we update it with the response, while a request might be in progress for the same order. How do we handle this through the disruptor, i.e. how do I tie a response to a particular request? Can I eliminate the lock on the cache, and if so, how?
Any suggestions/pointers would be highly appreciated. We are currently using Java 1.6
I'm new to the Disruptor and am trying to understand as many use cases as possible. I have tried to answer your questions.
Yes, the Disruptor can be used to sequence calls from multiple producers. I understand that all 3 threads try to update the state of a shared object, and that a single consumer takes the necessary action on that shared object. Internally you can have the single consumer delegate calls to the appropriate single-threaded handler based on responsibility.
The Disruptor does exactly this. It sequences the calls such that the state is accessed by only one thread at a time. If there's a specific order in which the event handlers are to be invoked, set up the memory barrier. The latest version of the Disruptor has a DSL that lets you set up the order easily.
The cache can be abstracted and accessed through the Disruptor. At any one time, only a reader or a writer would get access to the cache, since all calls to the cache are sequential.
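The "all calls to the cache are sequential" idea can be illustrated without the Disruptor itself. The sketch below is not the Disruptor API: it swaps in a plain single-threaded executor as the one thread that owns the cache, so any number of producers can submit updates and no lock is needed on the map. The class and method names are made up, and the lambdas need Java 8, whereas the question mentions Java 1.6 (anonymous classes would be needed there).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** A cache touched by exactly one thread; producers only enqueue requests. */
final class SingleWriterCache {
    // A plain HashMap is safe here only because the owner thread is the sole accessor.
    private final Map<String, Double> orderPrices = new HashMap<>();
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    /** May be called from any producer thread (FIX, pricing, GUI). */
    void updatePrice(String orderId, double price) {
        owner.execute(() -> orderPrices.put(orderId, price));
    }

    /** Reads go through the same queue, so they are ordered with respect to writes. */
    Future<Double> readPrice(String orderId) {
        Callable<Double> read = () -> orderPrices.get(orderId);
        return owner.submit(read);
    }

    void close() throws InterruptedException {
        owner.shutdown();
        owner.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SingleWriterCache cache = new SingleWriterCache();
        cache.updatePrice("order-1", 10.0);
        cache.updatePrice("order-1", 11.0);
        System.out.println(cache.readPrice("order-1").get()); // latest price submitted by this thread
        cache.close();
    }
}
```

The executor's internal queue is still a contended multi-producer structure, so this shows only the single-writer ownership of the cache, not the Disruptor's performance characteristics.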
Our company has a batch application which runs every day. It mostly does database-related jobs, for example importing data from a file into a database table.
There are 20+ tasks defined in that application, and each one may or may not depend on other ones.
The application executes the tasks one by one, and the whole application runs in a single thread.
It takes 3~7 hours to finish all the tasks. I think that's too long, so I think maybe I can improve performance with multi-threading.
Since there are dependencies between tasks, I think it's not good (or at least not easy) to make the tasks run in parallel, but maybe I can use multi-threading to improve performance inside a task.
For example: we have a task defined as "ImportBizData", which copies data into a database table from a data file (usually containing 100,0000+ rows). I wonder whether it is worth using multi-threading for that.
As I know only a little about multi-threading, I hope someone can provide some tutorial links on this topic.
Multi-threading will improve your performance but there are a couple of things you need to know:
Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection is also a transaction.
Upload the data in chunks and commit once in a while to avoid accumulating huge rollback/undo tables.
Cut tasks into several work units where each unit does one job.
To elaborate the last point: Currently, you have a task that reads a file, parses it, opens a JDBC connection, does some calculations, sends the data to the database, etc.
What you should do:
One (!) thread to read the file and create "jobs" out of it. Each job should contain a small, but not too small, "unit of work". Push those into a queue
The next thread(s) wait(s) for jobs in the queue and do the calculations. This can happen while the thread in step #1 waits for the slow hard disk to return the new lines of data. The result of this conversion step goes into the next queue
One or more threads to upload the data via JDBC.
The first and the last threads are pretty slow because they are I/O bound (hard disks are slow and network connections are even worse). Plus inserting data in a database is a very complex task (allocating space, updating indexes, checking foreign keys)
Using different worker threads gives you lots of advantages:
It's easy to test each thread separately. Since they don't share data, you need no synchronization. The queues will do that for you
You can quickly change the number of threads for each step to tweak performance
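The three steps above can be sketched with two bounded queues. Everything specific here is an assumption for illustration: lines are taken to be two comma-separated integers, the "upload" step only counts rows where real code would use its own JDBC connection, and the `EOF` and `DONE` markers are one common way of telling downstream threads to stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Reader -> parser(s) -> uploader, connected only by queues. */
final class ImportPipeline {
    private static final String EOF = new String("EOF"); // sentinel, compared by identity
    private static final int[] DONE = new int[0];        // sentinel, compared by identity

    static int run(List<String> lines, int parserThreads) throws InterruptedException {
        BlockingQueue<String> rawLines = new ArrayBlockingQueue<>(1000);
        BlockingQueue<int[]> parsedRows = new ArrayBlockingQueue<>(1000);
        List<Thread> threads = new ArrayList<>();

        // Step 1: a single reader turns the file into jobs.
        threads.add(new Thread(() -> {
            try {
                for (String line : lines) rawLines.put(line);
                for (int i = 0; i < parserThreads; i++) rawLines.put(EOF); // one per parser
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reader"));

        // Step 2: any number of parsers do the CPU work.
        for (int i = 0; i < parserThreads; i++) {
            threads.add(new Thread(() -> {
                try {
                    for (String line = rawLines.take(); line != EOF; line = rawLines.take()) {
                        String[] parts = line.split(",");
                        parsedRows.put(new int[] {
                                Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
                    }
                    parsedRows.put(DONE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "parser-" + i));
        }

        // Step 3: the uploader (stand-in for JDBC inserts) stops after every parser is done.
        int[] uploaded = {0};
        threads.add(new Thread(() -> {
            try {
                int finishedParsers = 0;
                while (finishedParsers < parserThreads) {
                    int[] row = parsedRows.take();
                    if (row == DONE) finishedParsers++; else uploaded[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "uploader"));

        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        return uploaded[0];
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 500; i++) lines.add(i + "," + (i * 2));
        System.out.println("rows uploaded: " + run(lines, 3));
    }
}
```

Changing `parserThreads` is the "tweak the number of threads per step" knob. With several uploader threads, each would need its own connection, as noted above.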
Multi-threading may help. If the lines are uncorrelated, you could start two processes, one reading the even lines and the other the odd lines, get your DB connections from a connection pool (dbcp), and analyze the performance. But first I would investigate whether JDBC is the best approach; databases normally have optimized solutions for imports like this. These solutions may also temporarily switch off constraint checking on your table and turn it back on later, which is also great for performance. As always, it depends on your requirements.
Also you may want to checkout springbatch which is designed for batch processing.
As far as I know, the JDBC Bridge uses synchronized methods to serialize all calls to ODBC, so using multiple threads won't give you any performance boost unless it boosts your application itself.
I am not all that familiar with JDBC, but regarding the multithreading part of your question, keep in mind that parallel processing relies on dividing your problem into pieces that are independent of one another and then putting their outputs back together. If you don't know the underlying dependencies between tasks, you might end up with really odd errors/exceptions in your code. Even worse, it might all execute without any problems, but the results might be off from the true values. Multi-threading is tricky business: fun to learn (at least I think so), but a pain in the neck when things go south.
Here are a couple of links that might prove useful:
Oracle's java trail: best place to start
A good tutorial for java concurrency
an interesting article on concurrency
If you are serious about putting effort into multi-threading, I can recommend Brian Goetz's Java Concurrency in Practice. An amazing book, really.
Good luck
I had a similar task. But in my case, all the tables were unrelated to each other.
STEP1:
Use SQL Loader (Oracle) to upload the data into the database (very fast), or any similar bulk-load tool for your database.
STEP2:
Run each upload process in a different thread for unrelated tasks, and in a single thread for related tasks.
P.S. You could identify the inter-related jobs in your application, categorize them into groups, and run each group in a different thread.
Links to run you up:
JAVA Threading
follow the last example in the above link(Example: Partitioning a large task with multiple threads)
SQL Loader can dramatically improve performance
The fastest way I've found to insert large numbers of records into Oracle is with array operations. See the "setExecuteBatch" method, which is specific to OraclePreparedStatement. It's described in one of the examples here:
http://betteratoracle.com/posts/25-array-batch-inserts-with-jdbc
If multi-threading would complicate your work, you could go with async messaging. I'm not fully aware of what your needs are, so the following is based on what I can see so far.
Create a Java file reader whose purpose is to read the biz file and put messages into the JMS queue on the server. This could be plain Java with static void main()
Consume the JMS messages in message-driven beans (you can set a limit on the number of beans created in the pool, 50 or 100 depending on the need). If you have multiple servers, well and good: your job is now split across multiple servers.
For example, each row of data is asynchronously distributed between 2 servers with 50 beans on each server.
You do not have to deal with threads in the whole process. JMS is ideal because your data is within a transaction: if something fails before you send an ack to the server, the message will be resent to the consumer. The load is split between the servers without you doing anything special like multi-threading.
Also, spring is providing spring-batch which can help you. http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html#springBatchUsageScenarios
I need to read 200,000 or so records from a website and store them in a DB. The application is a desktop app implemented on top of the NetBeans Rich Client Platform. Using the Apache HttpComponents library, I can send a request to the website and retrieve the response that contains the record information; then, using regex, I can fairly easily extract the dozen or so fields that I need from the HTML.
I am thinking of having 2 worker threads besides the GUI thread. One worker thread handles the HTTP request/response part and also extracts the records from the HTML using regex, while the other worker thread stores the records in the DB. So there will be a data structure to hold the records, shared between the two worker threads. I am also considering having a buffer of size 100 (for example) in which the HTTP worker thread stores the records; when the buffer is full, it transfers 100 records at a time to the shared records holder.
Please comment on my design and also my questions are:
what is the proper data structure to hold the records?
how to synchronize it between the two worker threads?
how would the multi-threads be implemented in the modular system of Netbeans Platform?
what is the proper data structure to hold the records?
Depends on the data. Probably a simple class with a bunch of fields (preferably immutable to make using multiple threads safer).
how to synchronize it between the two worker threads?
One of the BlockingQueue implementations might be good for that. ArrayBlockingQueue can be used as a fixed-size buffer for passing work between the threads.
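A minimal sketch of both suggestions together: an immutable record class and an `ArrayBlockingQueue` of capacity 100 between the two workers. The field names, the `END` marker and the list standing in for the database are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** HTTP worker and DB worker connected by a fixed-size blocking queue. */
final class RecordHandoff {

    /** Immutable record: all fields final, so it can be passed between threads safely. */
    static final class ScrapedRecord {
        final int id;
        final String title;

        ScrapedRecord(int id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    private static final ScrapedRecord END = new ScrapedRecord(-1, ""); // end marker, compared by identity

    static List<Integer> run(int count) throws InterruptedException {
        // put() blocks when 100 records are waiting, so the scraper cannot outrun the DB writer.
        BlockingQueue<ScrapedRecord> queue = new ArrayBlockingQueue<>(100);

        Thread httpWorker = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(new ScrapedRecord(i, "record-" + i)); // stand-in for HTTP + regex
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "http-worker");

        List<Integer> storedIds = new ArrayList<>(); // stand-in for the database
        Thread dbWorker = new Thread(() -> {
            try {
                for (ScrapedRecord r = queue.take(); r != END; r = queue.take()) {
                    storedIds.add(r.id);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-worker");

        httpWorker.start();
        dbWorker.start();
        httpWorker.join();
        dbWorker.join();
        return storedIds;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("records stored: " + run(1000).size());
    }
}
```

The queue itself plays the role of the 100-record buffer from the question, so no separate hand-rolled buffer or explicit synchronization is needed.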
how would the multi-threads be implemented in the modular system of Netbeans Platform?
No idea whether NetBeans Platform has anything to say about that. Launching your own threads should work.
First of all, this kind of HTML parsing would slow down your app quite badly. Also, the code would be quite fragile, since HTML changes quite often for aesthetic enhancements. You should use 'HTML scraping' only as a last resort. Most customers agree to open up a web service/data service for this once you explain the disadvantages.
If you really have no other alternative, then I think your approach is good. But instead of waiting for the buffer to be full, you could have a set of threads writing into the buffer and a set of threads reading from the buffer simultaneously. I would suggest using more HTTP scraper threads and fewer DB-write threads, since the HTTP request-response cycle and HTML parsing would be orders of magnitude slower than a database write.