I have the following problem:
I have a Spring Boot service that has to handle a lot of traffic (around 5,000 incoming POST requests per second over several parallel TCP connections).
The incoming data contains some basic sales data. It is used to build a dashboard that is refreshed via a GET every second and displays only the past minute. The data does not have to be stored persistently; it can be thrown away, and even losing some data after a restart is not a problem.
My main concerns are memory and CPU usage: I want to use as little memory and CPU as possible.
My idea for handling this is to keep only the data from the last second. I would use a built-in data structure like a LinkedList to store the data (O(1) insertion).
Whenever the dashboard is updated (so a GET comes in), I make a copy of the LinkedList and create a new, empty LinkedList that will receive the new incoming data. I use the copy to compute the sum and average and return them to the display.
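To make that concrete, here is a minimal sketch of what I mean (class and method names are mine; note that a plain LinkedList would not be thread-safe under thousands of concurrent POSTs, so this sketch swaps a concurrent queue via an AtomicReference):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicReference;

    public class SalesWindow {
        // current bucket of sales amounts; swapped out atomically on each GET
        private final AtomicReference<Queue<Double>> current =
                new AtomicReference<>(new ConcurrentLinkedQueue<>());

        // called by the POST handler, ~5,000 times per second
        public void record(double amount) {
            current.get().add(amount);
        }

        // called by the GET handler, once per second
        public double[] sumAndAverage() {
            Queue<Double> snapshot = current.getAndSet(new ConcurrentLinkedQueue<>());
            double sum = 0;
            int n = 0;
            for (double v : snapshot) {
                sum += v;
                n++;
            }
            // writers still holding the old queue reference may add to it
            // during the swap; with loss-tolerant data that is acceptable
            return new double[] { sum, n == 0 ? 0 : sum / n };
        }
    }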
Am I missing something here? Is there a faster, less memory-consuming way to do this?
Given this method (do is a reserved word in Java, so the name here is process):

    public void process(RequestObject request, Callback<RequestObject> callback);
Where Callback is called when the request has been processed. One client has to write the status of each request to the database: the client fetches some items, passes them to the method above, and the callback sets the status.
It was working OK for a small number of items and slower IO. But now the IO has sped up and the status is written to the database very frequently. This is causing my database (MySQL) to make so many disk read/write calls that my disk usage goes through the roof.
I was thinking of aggregating the status updates, but power is not reliable here, so that is not a plausible solution. How should I redesign this?
EDIT
When the process is started I insert a row, and when there is an update I fetch the item and update it. #user2612030 Your question led me to believe that using Hibernate might be causing more reads than necessary.
I can upgrade my disk drive to an SSD, but that would only do so much. I want a solution that scales.
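For reference, if Hibernate is loading the row before every status write, the fetch-then-update round trip can be replaced with a single bulk JPQL UPDATE so no prior SELECT is issued. A sketch (the entity and field names are invented, and it must run inside a transaction):

    import javax.persistence.EntityManager;

    public class StatusWriter {
        // one UPDATE statement, no entity load beforehand
        public int markProcessed(EntityManager em, long itemId) {
            return em.createQuery(
                    "update WorkItem w set w.status = 'PROCESSED' where w.id = :id")
                .setParameter("id", itemId)
                .executeUpdate();
        }
    }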
An SSD is a good starting point, and more RAM for MySQL should also help. That won't get rid of the writes, but with enough RAM (and MySQL configured to use it!) there should be few physical reads. If you are using the default configuration, tune it; see for example https://www.percona.com/blog/2016/10/12/mysql-5-7-performance-tuning-immediately-after-installation/ or just search for MySQL memory configuration.
You could also add disks and spread the writes to multiple disks with multiple controllers. That should also help a bit.
It is hard to give good advice without knowing how you record status values. Inserts or updates? How many records are there? Data model? However, to really scale you need to shard the data somehow. That way one server can handle data in one range and another server data in another range and so on.
For write-heavy applications that is non-trivial to set up with MySQL unless you do the sharding in the application code. Most solutions with replication work best for read-mostly applications. You may want to look into a NoSQL database, for example MongoDB, that has been designed for distributing writes from the outset. MongoDB has other challenges (eventual consistency), but it can deliver scalable writes.
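To give a rough idea of sharding in application code, the routing itself can be as simple as a stable hash of the key; how each shard's DataSource is configured is left out of this sketch:

    import javax.sql.DataSource;
    import java.util.List;

    public class ShardRouter {
        private final List<DataSource> shards;

        public ShardRouter(List<DataSource> shards) {
            this.shards = shards;
        }

        // every record with the same key always lands on the same shard
        public DataSource shardFor(long key) {
            return shards.get((int) Math.floorMod(key, (long) shards.size()));
        }
    }

Be aware that a plain modulo makes adding shards later painful; consistent hashing helps if the number of shards may change.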
I have a big list of over 20,000 items to be fetched from the DB and processed daily in a simple console-based Java app.
What is the best way to do that? Should I fetch the list in small sets and process them, or should I fetch the complete list into an array and process it? Keeping it all in an array means a huge memory requirement.
Note: There is only one column to process.
Processing means I have to pass the string in that column somewhere else as a SOAP request.
The 20,000 items are strings of length 15.
It depends. 20000 is not really a big number. If you are only processing 20000 short strings or numbers, the memory requirement isn't that large. But if it's 20000 images that is a bit larger.
There's always a tradeoff. Multiple chunks of data mean multiple trips to the database, but a single trip means more memory. Which is more important to you? Also, can your data be chunked, or do you, for example, need record 1 in order to process record 1000?
These are all things to consider. Hopefully they help you arrive at the design that is best for you.
Correct me if I am wrong, but I would fetch it little by little, and also provide a rollback operation for it.
If the job can be done at the database level, I would do it using SQL scripts. Should that be impossible, I recommend loading small pieces of your data with two columns: the ID column and the column that needs to be processed.
This will give you better performance during processing, and if anything crashes you will not lose all the processed data. In a crash case, though, you will need to know which rows have been processed and which have not; this can be done using a third column, or by saving the last processed ID after each round, as in the sketch below.
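A sketch of that approach in plain JDBC (table and column names are invented, and the checkpoint and SOAP calls are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ChunkedProcessor {
        // fetches rows in ID order, chunkSize at a time, and checkpoints
        // the last processed ID so a crash can resume where it left off
        public void processAll(Connection conn, int chunkSize) throws SQLException {
            long lastId = loadCheckpoint();
            String sql = "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?";
            while (true) {
                boolean any = false;
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setLong(1, lastId);
                    ps.setInt(2, chunkSize);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            any = true;
                            sendSoapRequest(rs.getString("payload"));
                            lastId = rs.getLong("id");
                        }
                    }
                }
                if (!any) break;        // no more rows
                saveCheckpoint(lastId); // persist progress after each chunk
            }
        }

        private long loadCheckpoint() { return 0L; }              // e.g. read a status table
        private void saveCheckpoint(long id) { /* write it back */ }
        private void sendSoapRequest(String payload) { /* the actual processing */ }
    }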
I'm in the early stages of a web project that will require working with arrays containing around 500 elements of a custom object type. The objects will likely contain between 10 and 40 fields (based on user input), mostly booleans, strings and floats. I'm going to use PHP for this project, but I'm also interested to know how to treat this problem in Java.
I know that "premature optimization is the root of all evil", but I think I need to decide now how to handle these arrays. Do I keep them in the Session object, or do I store them in the database (MySQL) and keep just a minimal set of keys in the session? Keeping the data in the session would make the application faster, but as visitor numbers grow I risk using up too much memory. On the other hand, reading from and writing to the database all the time will degrade performance.
I'd like to know where the line is between those two approaches. How do I decide when it's too much data to keep inside session?
When I face a problem like this, I try to estimate the size of the per-user data that I want to keep fast.
In your case, suppose for example you have 500 elements with 40 fields each, and each field averages 50 bytes (across texts, numbers, dates, etc.). That comes to about 1 MB per user for this storage, so about 1 GB for every 1,000 users just for this cache.
Depending on your server's resource availability you can find the bottlenecks: 1,000 users consume CPU, memory, DB and disk access; so in this scenario, is that 1 GB the problem? If yes, keep the data in the DB; if not, keep it in memory.
Another option is to use an in-memory DB or a distributed cache solution that does it all for you, at some cost:
architectural complexity
possibly licence costs
I would be surprised if you had that amount of unique data for each user. Ideally, some of this data would be shared across users, and you could have some kind of application-level cache that stores the most recently used entries, and transparently fetches them from the database if they're missing.
This kind of design is relatively straightforward to implement in Java, but somewhat more involved (and possibly less efficient) with PHP since it doesn't have built-in support for application state.
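A minimal version of such a cache in Java can be built on LinkedHashMap's access-order mode; the loader below stands in for whatever database fetch is used (a sketch, not tuned for heavy concurrency):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.Function;

    public class LruCache<K, V> {
        private final Map<K, V> map;
        private final Function<K, V> loader;

        public LruCache(int maxEntries, Function<K, V> loader) {
            this.loader = loader;
            // accessOrder = true gives LRU eviction semantics
            this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        // returns the cached value, falling back to the loader on a miss
        public synchronized V get(K key) {
            return map.computeIfAbsent(key, loader);
        }
    }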
I've already built the web service API with the Jersey framework.
Now I want to limit the quotas for every client.
For example:
- one client can only make less than 10000 requests in one day.
- one client can only make less than 10 requests per second.
so on and so forth.
Should I store this information in a database table?
But if I do that, will handling the requests take a lot of time, because I have to update the table on every request?
I am looking for more efficient ways to solve this problem.
Because this is my first time doing this kind of job, I hope somebody can give me some advice on these problems.
Thanks~!
Without information about how you define a client, it's difficult to answer this question. However, one method would be to filter all incoming requests using a ContainerRequestFilter.
Then at that level you can define what a client is and log all accesses by that client to your Jersey application, perhaps by incrementing a value in a data structure or in a database, and then having a cron job flush that data every 24 hours.
Ideally you would want to store the data in an in-memory data structure: the data is transient, it won't grow to a large size, and it will be deleted after a short period anyway. However, this will become an issue if you ever scale up to multiple machines, or multiple instances on a single machine.
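A rough sketch of that filter with an in-memory counter (the API-key header used to identify a client and the limit are assumptions; a scheduled job would clear the map every 24 hours):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.ws.rs.container.ContainerRequestContext;
    import javax.ws.rs.container.ContainerRequestFilter;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.ext.Provider;

    @Provider
    public class QuotaFilter implements ContainerRequestFilter {
        // per-client request counters; reset daily by a cron/scheduled job
        private static final ConcurrentMap<String, AtomicLong> COUNTS = new ConcurrentHashMap<>();
        private static final long DAILY_LIMIT = 10_000;

        @Override
        public void filter(ContainerRequestContext ctx) {
            String client = ctx.getHeaderString("X-API-Key"); // how you define a client is up to you
            if (client == null) {
                ctx.abortWith(Response.status(Response.Status.UNAUTHORIZED).build());
                return;
            }
            long count = COUNTS.computeIfAbsent(client, k -> new AtomicLong()).incrementAndGet();
            if (count > DAILY_LIMIT) {
                ctx.abortWith(Response.status(429).entity("Daily quota exceeded").build());
            }
        }
    }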
Without more information from you, I can't really give any more detail.
When you store a message in a queue, isn't it more metadata, so that whoever pulls from the queue knows how to process the data? The actual message in the queue doesn't always hold all the information.
Say you have an app like Twitter, whenever someone posts a message, you would still need to store the actual message text in the database correct?
The queue would be more used to broadcast to other subscribers that a new message has arrived, and then those services could take further action.
Or could you actually store the tweet text in the queue also? (or you COULD, but that would be silly?)
Could a queue message have status fields, which subscribers can change as they process their part of the work flow? (or would you do that in the db?)
Just trying to get some clarification of when you would use a queue versus db.
When a process wants to farm data, and the processing of that data, out to another process (possibly on a different host), there are two strategies:
Stuff all your data into the queue item and let the receiving app worry about storing it in the database, along with whatever other processing.
Update your database, and then queue a tiny message to the other process just to notify it that there's new data to be massaged.
There are a number of factors that can be used to decide on which strategy:
If your database is fully ACID (one would hope) but your queueing system (QS) is not, your data would be safer in the DB. Even if the queue message gets lost in a server crash, you could run a script to process unprocessed data found in the DB. This would be a case for option 2.
If your data is quite large (say, 1 MB or more) then it might be cruel to burden your QS with it. If it's persistent, you'll end up writing the data twice, first to the QS's persister and later to the DB. This could be a drag on performance and influence you to go for option 1.
If your DB is slow or not even accessible to your app's front end, then option 1 it is.
If your second process is going to do something with the data but not store it in a DB, then option 1 may be the way to go.
Can't think of any more, but I hope you get the idea.
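For illustration, with the JMS 2.0 API option 2 boils down to sending a tiny message that carries only the record's key (the queue name and the ConnectionFactory wiring are assumptions):

    import javax.jms.ConnectionFactory;
    import javax.jms.JMSContext;

    public class NewDataNotifier {
        private final ConnectionFactory factory; // provider-specific, e.g. ActiveMQ

        public NewDataNotifier(ConnectionFactory factory) {
            this.factory = factory;
        }

        // call after the row is committed: the message carries only the ID
        // the consumer needs in order to look the data up in the DB
        public void notifyNewData(long recordId) {
            try (JMSContext ctx = factory.createContext()) {
                ctx.createProducer()
                   .send(ctx.createQueue("new.data"), Long.toString(recordId));
            }
        }
    }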
In general, a queue is used to 'smooth' out publish rate versus consume rate, by buffering incoming requests that can't be handled immediately. A queue is usually backed by some sort of non-volatile storage (such as a database table). So the distinction is not so clear cut.
Use a database when you want to perform many searches against your 'queue', or provide rich reporting.
I recommend that you look at Gregor Hohpe's book, Enterprise Integration Patterns, which explains many different patterns for messaging-based approaches.
We used JMS extensively at my last job where we were passing data around from machine to machine. In the end, we were both sending and storing the data at the same time; however, we stored far less data than we sent out. We had a lot of metadata surrounding the real values.
We used JMS as simply a messaging service and it worked very well for that. But, you don't want to use JMS to store your data as it has no persistence (aside from being able to log and replay the messages perhaps).
One of the main advantages that JMS gives you is the ability to send out your messages in the correct and appropriate order and ensure that everybody receives them in that order. This makes synchronization easy since the majority of the message handling is done for you.
My understanding is that Twitter would use both the DB and JMS in conjunction. When a tweet is written it is first stored in the database, and that is how it is displayed on the message board. But since this is a publisher/subscriber model, once the tweet is published it is then sent to the subscribers. So both are used.
I think your twitter example is good. You want the database for long term data. There wouldn't be much point in putting the tweet in the message because it has to go in the database. However, if you were running a chat room then you could go ahead and put the message in the JMS queue because you're not storing it long term anywhere.
It's not that you can't put the tweet in the JMS queue; it's that you need to put it in the database anyway.
I would use the queue whenever you can utilize a "fire-and-forget" pattern. In your Twitter example, I would use the queue to post the message from the client. The queue processor can then store it to the database when it gets to it.
If you require some sort of immediate success/failure status, then the message queue isn't for you.