I have the following requirement:
There are multiple devices producing data based on their device configuration. For example, two devices produce data at their own intervals: say d1 produces every 15 minutes and d2 every 30 minutes.
All this data will be sent to Kafka.
I need to consume the data and perform calculations for each device, based on the values produced in the current hour and the first value produced in the next hour. For example, if d1 produces data every 15 minutes from 12:00 AM to 1:00 AM, the calculation is based on the values produced in that hour and the first value produced between 1:00 AM and 2:00 AM. If no value is produced between 1:00 AM and 2:00 AM, I need to use only the data from 12:00 AM to 1:00 AM and save the result to the data repository (time series).
There will be 'n' such devices, each with its own configuration. In the scenario above, devices d1 and d2 produce data for every 1-hour window. Other devices might produce data for every 3 or 6 hours.
Currently this requirement is implemented in Java. As the number of devices grows, so do the computations, so I would like to know whether Spark/Spark Streaming can be applied to this scenario. Any articles on this kind of requirement would be a great help.
If, and this is a big if, the computations are going to be device-wise, you can make use of topic partitions and scale the number of partitions with the number of devices. Messages are delivered in order within a partition; this is the most powerful idea that you need to understand.
However, some words of caution:
The number of partitions can be increased but not decreased; if you want fewer, you may need to purge the topic and start again.
To ensure that the devices are uniformly distributed across partitions, you may consider assigning a GUID to each device.
If the calculations do not involve machine learning libraries and can be done in plain Java, it may be a good idea to use plain old consumers (or Kafka Streams) instead of abstracting them via Spark Streaming. The lower the level, the greater the flexibility.
You can check this. https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
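As a rough illustration of the "plain Java" route, the per-device window described in the question (all values in the hour, plus the first value of the next hour if one exists) could be sketched as below. The class and method names are hypothetical, the readings are assumed to belong to a single device and to be keyed by timestamp, and the Kafka consumer wiring is left out entirely.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

final class HourlyWindow {
    /**
     * Returns the values feeding the calculation for the hour starting at hourStart:
     * every reading in [hourStart, hourStart + 1h) plus, if present, the first
     * reading of the following hour. Assumes all readings come from one device.
     */
    static List<Double> valuesForHour(NavigableMap<Instant, Double> readings, Instant hourStart) {
        Instant hourEnd = hourStart.plus(1, ChronoUnit.HOURS);
        Instant nextHourEnd = hourEnd.plus(1, ChronoUnit.HOURS);

        // Readings of the current hour, in timestamp order.
        List<Double> result =
                new ArrayList<>(readings.subMap(hourStart, true, hourEnd, false).values());

        // First reading at or after the hour boundary, kept only if it
        // falls inside the next hour.
        Map.Entry<Instant, Double> next = readings.ceilingEntry(hourEnd);
        if (next != null && next.getKey().isBefore(nextHourEnd)) {
            result.add(next.getValue());
        }
        return result;
    }
}
```

A consumer would keep one such sorted map per device ID and call `valuesForHour` once the first reading of the next hour arrives, or once it decides that reading is not coming. For the 3-hour and 6-hour devices the window length would have to become a parameter.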
I have a datalogger that produces a CSV file containing UTC time and 4 parameters. The UTC time is logged ABOUT every 30ms, followed by the 4 parameters. The problem I have is twofold:
1) The CSV file is potentially huge if I run the datalogger for even an hour.
2) The UTC time is not exactly every 30ms.
In my simple design for a replay of the data I had planned to load the file, split each entry at character "'", then loop through the UTC time values and load the 4 parameters for each one, but with the file so large I am concerned it won't work or will be very slow. I am new to Java and am not sure if there is a better way to handle so much data (I suspect there is!).
My plan to loop through and repeatedly fill 4 variables with the parameters won't work, as the UTC entries are not exact. I had planned to drop a decimal place from the data, but that clearly loses me fidelity in the replay of my data. I want to be able to construct a "timeline" in my application to allow play/pause/stop style functionality, hence my problem handling the UTC time.
Here is a sample of some of the data when the timing is pretty tight; this isn't always the case:
,13:35:38.772,0,0,0,0.3515625
,13:35:38.792,0,0,-0.0439453125,0.3515625
,13:35:38.822,0,0,0,0.3515625
,13:35:38.842,0,0,0,0.3515625
,13:35:38.872,0,0,0.0439453125,0.3515625
,13:35:38.892,0,0,0,0.3076171875
,13:35:38.922,0,0,0,0.3076171875
,13:35:38.942,0,0,0,0.3076171875
,13:35:38.962,0,0,0.0439453125,0.3515625
,13:35:38.992,0,0,0,0.3515625
,13:35:39.012,0,0,0,0.3076171875
,13:35:39.042,0,0,-0.0439453125,0.3076171875
,13:35:39.072,0,0,0,0.3515625
,13:35:39.092,0,0,0,0.3515625
,13:35:39.112,0,0,0.0439453125,0.3076171875
,13:35:39.142,0,0,0,0.3515625
,13:35:39.162,0,0,0,0.3076171875
,13:35:39.192,0,0,0,0.3515625
,13:35:39.212,0,0,0,0.3076171875
,13:35:39.242,0,0,0,0.3515625
,13:35:39.262,0,0,0,0.3076171875
I realise this is a broad question, but I am looking for a general steer in how to tackle the problem. Code is welcome, but I am expecting to have to ask more questions as time goes on.
Thanks for the help,
Andy
I'm programming an app that will display the result of a team. I'd like to notify the user when there's a goal. To do this, I created a service that fetches the score and stores it in a String every 30 seconds, and creates a notification if it has changed. My question is: what's the best way to repeat this every 30 seconds?
Thanks, and sorry for my poor English.
Well, if you want to make a call every x seconds, that would require an Ajax setTimeout (not setInterval). If you are going to do that, it would be a good idea to send some data back to the server on each round trip so that your query is not massive and does not search the entire db table.
For example, you might send back the most recent timestamp on each round trip, and have the backend query count how many rows are newer than that timestamp and display the count to the user. So if there is one row, show the user "1 new row", and when they click it, query for the content. This should be a low-impact query for a high-impact activity (polling would be a better option). Good luck.
PS: If you want to get fancy and really learn some stuff tonight, I would do some research on asynchronous servlets, rather than just polling as another poster advised. That would lead you down the rabbit hole to some really sweet stuff.
You can do something like:
check if anything changed (yes - send new info, no - do nothing)
sleep 30 sec (Thread.sleep(30000))
repeat
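The check/sleep/repeat loop above could be sketched in plain Java roughly as follows. This version swaps the explicit `Thread.sleep` loop for a `ScheduledExecutorService`, which does the waiting for you. `ScorePoller`, `fetchScore` and `onChange` are hypothetical names: how the score is fetched and how the notification is shown are assumptions left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ScorePoller {
    private final Supplier<String> fetchScore; // hypothetical: returns the current score
    private final Consumer<String> onChange;   // hypothetical: shows the notification
    private String lastScore;                  // null until the first successful fetch

    ScorePoller(Supplier<String> fetchScore, Consumer<String> onChange) {
        this.fetchScore = fetchScore;
        this.onChange = onChange;
    }

    /** One polling step: fetch, compare with the previous score, notify on change. */
    void pollOnce() {
        String current = fetchScore.get();
        if (current == null) {
            return; // treat null as "fetch failed" and try again next round
        }
        if (lastScore != null && !lastScore.equals(current)) {
            onChange.accept(current);
        }
        lastScore = current;
    }

    /** Runs pollOnce now and then again periodSeconds after each run finishes. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so swallow it here.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Calling `start(30)` gives the 30-second cadence from the question; calling `shutdownNow()` on the returned executor stops it. This only keeps running while the hosting process is alive.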
I have managed to read the web service to get current time of any given city.
I could get 2 important values from web service, current time (String) and the offset.
The question is:
How to set time of any given city correctly?
Option 1:
Read machine/local time
Calculate UTC/GMT time out of machine time
City time = UTC time +/- offset value
But then what happens when the machine time is wrong? You will also get the wrong time, right?
Option 2:
Read current city time in String (2012-11-24 19:30)
Parse this time value and set it into Calendar
We get the correct city time
But what about the next minute? Of course, requesting the web service every minute to get the current time is not a good solution, right? Is it possible to keep this Calendar instance running automatically every minute once we have set it?
NB : I'm developing Android clock widget here.
Thanks
Option 1 is far better, in my eyes. Most cell phones have amazingly accurate time as time synchronization is an integral part of GSM and CDMA. Beyond that, I would far prefer a clock to work offline than to require internet connectivity.
If you are worried about ensuring accuracy in the face of incorrect system time, consider placing a call to a web service to get the current time for verification.
This verification could be done in the background, but keep in mind that web services are not the best time sync providers. I would let any difference under 5 minutes go, as it could be due to your server being out of sync or the call taking too long.
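A minimal sketch of Option 1 plus the background sanity check, using `java.time`. It assumes the web service reports the offset in seconds and that its current time can be converted to an `Instant`; `CityClock` and its method names are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class CityClock {
    /**
     * City time = UTC time +/- offset. offsetSeconds is the city's offset from
     * UTC as reported by the web service (assumed here to be in seconds).
     */
    static OffsetDateTime cityTime(Instant utcNow, int offsetSeconds) {
        return utcNow.atOffset(ZoneOffset.ofTotalSeconds(offsetSeconds));
    }

    /**
     * True if the device clock and the web service disagree by more than the
     * tolerance (for example 5 minutes, as suggested above).
     */
    static boolean deviceClockLooksWrong(Instant deviceNow, Instant serviceNow, Duration tolerance) {
        return Duration.between(deviceNow, serviceNow).abs().compareTo(tolerance) > 0;
    }
}
```

With this approach the widget would call `cityTime(Instant.now(), offsetSeconds)` each time it redraws, so nothing has to "keep running" between updates; the offset only needs refreshing when it can change, for example around daylight saving transitions.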
This may be an over-asked question, but my mind is drawing a blank at the moment. I know what a candlestick chart is and how to draw it daily, but how do I draw it intraday for a requested time period? I have a server, written in Java, that gives me trade depth (each trade done since the start of the day). It's just a stream of raw data: price, shares, timestamp.
How does one go about calculating candlestick data from that? Let's say they want 5-minute or 1-minute candlesticks. Or is there a library that will do that for me if I feed it data?
Any help is appreciated!
The exact implementation varies depending on how you're storing the data, but in general:
1) Sort the data by timestamp
2) Decide when the day starts (e.g. 9 AM EST, whatever) and find the timestamp of that time on the first day. You then know when each 5 minute (or whatever) bar begins and ends, by adding an appropriate offset to that number.
3) Find the index of the first data point that is not in the first bar - every data point whose index is lower than that is in the first bar. It's now straightforward to take the first, last, maximum, and minimum prices for a candlestick.
4) Repeat 3, substituting the last index of the previous candle for 0.
You now have the data partitioned into candles.
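The partitioning described above could be sketched like this. Instead of scanning for the index where each bar ends, this version computes each trade's bar directly from its timestamp, which yields the same partition. `Trade`, `Candle` and `Candles.build` are hypothetical names, timestamps are assumed to be epoch milliseconds, and bars with no trades are simply omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class Candles {
    /** Hypothetical trade record: timestamp in epoch milliseconds, price, shares. */
    static final class Trade {
        final long timestampMillis;
        final double price;
        final long shares;
        Trade(long timestampMillis, double price, long shares) {
            this.timestampMillis = timestampMillis;
            this.price = price;
            this.shares = shares;
        }
    }

    static final class Candle {
        final long startMillis;
        final double open;
        double high, low, close;
        long volume;
        Candle(long startMillis, Trade first) {
            this.startMillis = startMillis;
            this.open = first.price;
            this.high = first.price;
            this.low = first.price;
            this.close = first.price;
            this.volume = first.shares;
        }
        void add(Trade t) {
            high = Math.max(high, t.price);
            low = Math.min(low, t.price);
            close = t.price; // trades arrive in time order, so the last one seen is the close
            volume += t.shares;
        }
    }

    /** Groups trades into bars of barMillis length, counted from dayStartMillis. */
    static List<Candle> build(List<Trade> trades, long dayStartMillis, long barMillis) {
        List<Trade> sorted = new ArrayList<>(trades);
        sorted.sort(Comparator.comparingLong((Trade t) -> t.timestampMillis));

        TreeMap<Long, Candle> bars = new TreeMap<>();
        for (Trade t : sorted) {
            if (t.timestampMillis < dayStartMillis) {
                continue; // before the session start, not part of any bar
            }
            long barIndex = (t.timestampMillis - dayStartMillis) / barMillis;
            long barStart = dayStartMillis + barIndex * barMillis;
            Candle candle = bars.get(barStart);
            if (candle == null) {
                bars.put(barStart, new Candle(barStart, t));
            } else {
                candle.add(t);
            }
        }
        return new ArrayList<>(bars.values());
    }
}
```

For 5-minute candles, `barMillis` would be `5 * 60 * 1000`; for 1-minute candles, `60 * 1000`. For a live stream, the same per-trade update can be applied incrementally instead of rebuilding from the full list.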
Have you seen JFreeChart? It will draw candlesticks, and since it's incredibly configurable, it may well do what you want.