How to discard time intervals with Time Series / XYPlots using JFreeChart? - java

I am building a set of chart displays, one of which is for a month display of daily trading - that is, one point of data per day (closing).
Since there is no trade during weekends and holidays, I need to discard these data points. Not only that, but data points should still appear adjacent to each other, regardless of any gaps in time. This can be seen in any such chart e.g. in the 3 month graph for Nasdaq on Yahoo Finance - see how weekends are skipped.
My question is: how should one correctly implement this in JFreeChart?
Thanks in advance!

In addition to omitting the excluded data points, you can apply a SegmentedTimeline to the corresponding DateAxis. For example,
axis.setTimeline(SegmentedTimeline.newMondayThroughFridayTimeline());
Although SegmentedTimeline is deprecated in the current version, as discussed here, its implementation may guide the creation of a custom Timeline, as noted in a comment here.
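Independent of JFreeChart, the idea behind a Monday-through-Friday timeline can be sketched as a mapping from calendar dates to consecutive x positions, so that Friday and the following Monday end up adjacent. The class and method names below are hypothetical, and the sketch only skips weekends; holidays would need an extra set of excluded dates.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JFreeChart.
final class TradingDayIndex {

    // True for Saturday and Sunday; holidays are NOT handled here.
    static boolean isWeekend(LocalDate date) {
        DayOfWeek day = date.getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }

    // Assigns 0, 1, 2, ... to every weekday in [from, to], in date order.
    static Map<LocalDate, Integer> index(LocalDate from, LocalDate to) {
        Map<LocalDate, Integer> positions = new LinkedHashMap<>();
        int x = 0;
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            if (!isWeekend(d)) {
                positions.put(d, x++);
            }
        }
        return positions;
    }
}
```

Plotting each closing price at its index rather than at its timestamp removes the weekend gaps; the axis labels can then be looked up from the same map.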

Related

Date Time on x-axis using JFreeChart

I would like to create a graph using data from a database (Oracle).
The dataset contains DateTimeStamp values and numeric values.
The DateTimeStamp values are for intervals of 15 minutes and usually extend for a period greater than one calendar day.
The DateTimeStamp values need to be graphed against the X-axis and the data values (numeric) will be graphed against the Y-axis.
The X-axis major tick marks should be the calendar days.
I have purchased the enterprise version of JFreeChart but the username password provided does not allow me to ask questions on their forum, and sending email to the owner results in no response.
Thanks for any help you can offer. A working code block/snippet would be ideal and much appreciated.
Look at JDBCXYDataset, which uses the query's ResultSetMetaData to recognize Types related to date and time. Complete examples are seen here and here.
As an aside, access to the support forum is free; just click the register link.

How to specify a range of valid values in a Java GUI

I am creating an IMDB application which displays and organizes movies found on your computer (by looking up the metadata via an IMDB API).
In my search panel I want to give the user the option of looking for movies that were released in a specific range of years (e.g. between 1990 and 2005). Currently I use two JSpinners for this, one for the minimum year and one for the maximum year, and use cross validation to check that maxYear >= minYear. However, I don't think this is very user-friendly.
What I would like is a JSlider with two knobs, one for min and one for max. Is this possible? Do you have any other ideas on how to make this interface more user-friendly?
This looks promising: Creating a Java Swing range slider
And here's another example that I think came from the old Tame examples: MThumbSlider
You could have two JTextFields, and just let the user type the minimum and maximum years.
Otherwise, two JSpinners is another choice. Developing a custom component that your users have never seen is not user friendly.
You can cross connect the two JSpinners so that it's impossible for the user to enter a minimum year greater than a maximum year. I've not done this, so I don't have a code example to show you.
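One way the cross connection might look, as a minimal sketch rather than a tested UI: each SpinnerNumberModel gets a change listener that tightens the other model's bound. The class name and constructor parameters are hypothetical. As far as I know, SpinnerNumberModel.setValue does no bounds checking itself; the limits take effect through the spinner's arrows and editor.

```java
import javax.swing.SpinnerNumberModel;

// Hypothetical holder for two linked year models.
final class YearRangeModels {
    final SpinnerNumberModel min;
    final SpinnerNumberModel max;

    YearRangeModels(int lowest, int highest, int initialMin, int initialMax) {
        // The minimum year may not exceed the current maximum year, and vice versa.
        min = new SpinnerNumberModel(initialMin, lowest, initialMax, 1);
        max = new SpinnerNumberModel(initialMax, initialMin, highest, 1);

        // When one value changes, update the opposite model's bound.
        min.addChangeListener(e ->
                max.setMinimum(Integer.valueOf(min.getNumber().intValue())));
        max.addChangeListener(e ->
                min.setMaximum(Integer.valueOf(max.getNumber().intValue())));
    }
}
```

The models would then be passed to the components with new JSpinner(models.min) and new JSpinner(models.max).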

Calculating intraday candlesticks by time intervals

This may be an over-asked question, but my mind draws a blank at this moment. I know what a candlestick chart is and how to draw it daily. But how do I draw it intraday at requested time intervals? I have this server, written in Java, that gives me trade depth (each trade done since the start of the day). It's just a stream of raw data: price, shares, timestamp.
How does one go about calculating candlestick data from that? Let's say they want to have a 5 min candlestick or a 1 min candlestick. Or is there a library that will do that for me if I feed it data?
Any help is appreciated!
The exact implementation varies depending on how you're storing the data, but in general:
1. Sort the data by timestamp.
2. Decide when the day starts (e.g. 9 AM EST, whatever) and find the timestamp of that time on the first day. You then know when each 5 minute (or whatever) bar begins and ends, by adding an appropriate offset to that number.
3. Find the index of the first data point that is not in the first bar - every data point whose index is lower than that is in the first bar. It's now straightforward to take the first, last, maximum, and minimum prices for a candlestick.
4. Repeat step 3, substituting the last index of the previous candle for 0.
5. You now have the data partitioned into candles.
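The steps above can be sketched in Java roughly as follows. This is a single-pass variant that assumes the trades are already sorted by timestamp and that timestamps are epoch milliseconds; the class, field, and method names are hypothetical, not taken from any library.

```java
import java.util.ArrayList;
import java.util.List;

final class CandleBuilder {

    // One raw trade from the feed (hypothetical shape).
    static final class Trade {
        final long timeMillis;
        final double price;
        final long shares;

        Trade(long timeMillis, double price, long shares) {
            this.timeMillis = timeMillis;
            this.price = price;
            this.shares = shares;
        }
    }

    // Open/high/low/close plus volume for one bar.
    static final class Candle {
        final long startMillis;
        final double open;
        double high;
        double low;
        double close;
        long volume;

        Candle(long startMillis, double firstPrice) {
            this.startMillis = startMillis;
            this.open = firstPrice;
            this.high = firstPrice;
            this.low = firstPrice;
            this.close = firstPrice;
        }
    }

    // sortedTrades must be ordered by timeMillis; barMillis is e.g. 300_000 for 5 minutes.
    static List<Candle> build(List<Trade> sortedTrades, long dayStartMillis, long barMillis) {
        List<Candle> candles = new ArrayList<>();
        Candle current = null;
        for (Trade t : sortedTrades) {
            if (t.timeMillis < dayStartMillis) {
                continue; // ignore trades before the session start
            }
            // Start of the bar this trade falls into.
            long barStart = dayStartMillis
                    + ((t.timeMillis - dayStartMillis) / barMillis) * barMillis;
            if (current == null || current.startMillis != barStart) {
                current = new Candle(barStart, t.price);
                candles.add(current);
            }
            current.high = Math.max(current.high, t.price);
            current.low = Math.min(current.low, t.price);
            current.close = t.price;
            current.volume += t.shares;
        }
        return candles;
    }
}
```

Bars in which no trade occurred produce no candle in this sketch; whether to emit an empty or carried-forward candle for them is a design choice left open here.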
Have you seen JFreeChart? It will draw candlesticks, and since it's incredibly configurable, it may well do what you want.

Best fit curve for trend line

Problem Constraints
Size of the data set, but not the data itself, is known.
Data set grows by one data point at a time.
Trend line is graphed one data point at a time (using a spline/Bezier curve).
Graphs
The collage below shows data sets with reasonably accurate trend lines:
The graphs are:
Upper-left. By hour, with ~24 data points.
Upper-right. By day for one year, with ~365 data points.
Lower-left. By week for one year, with ~52 data points.
Lower-right. By month for one year, with ~12 data points.
User Inputs
The user can select:
the type of time series (hourly, daily, monthly, quarterly, annual); and
the start and end dates for the time series.
For example, the user could select a daily report for 30 days in June.
Trend Weight
To calculate the window size (i.e., the number of data points to average when calculating the trend line), the following expression is used:
data points / trend weight
Where data points is derived from user inputs and trend weight is 6.4. Even though a trend weight of 6.4 produces good fits, it is rather arbitrary, and might not be appropriate for different user inputs.
Question
How should trend weight be calculated given the constraints of this problem?
Based on the looks of the graphs I would say you have too many points for your 12 point graph (it is just a spline of the points given... which is visually pleasing, but actually does more harm than good when trying to understand the trend) and too few points for your 365 point graph. Perhaps try doing something a little exponential like:
(Data points)^1.2/14.1
I do realize this is even more arbitrary than what you already have, but arbitrary isn't the worst thing in the world.
(I got 14.1 by trying to keep the 52 point graph fixed, since that one looks nice, by taking (52^(1.2)/52)*6.4=14.1. Using this technique, you could try other powers besides 1.2 to see what you visually get.)
Dan
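To compare the two window-size formulas from this thread side by side, a small sketch may help. The class and method names are hypothetical, and rounding down with a floor of one point is an assumption, not something the question specifies.

```java
final class TrendWindow {

    // Original rule: data points / trend weight, with trend weight 6.4.
    static double linear(int dataPoints) {
        return dataPoints / 6.4;
    }

    // Suggested alternative: (data points)^1.2 / 14.1.
    static double power(int dataPoints) {
        return Math.pow(dataPoints, 1.2) / 14.1;
    }

    // Assumed rounding: round down, but never average fewer than one point.
    static int size(double rawWindow) {
        return Math.max(1, (int) Math.floor(rawWindow));
    }
}
```

By construction the two rules nearly coincide at 52 data points; the power rule yields a smaller window than the linear rule for 12 points and a larger one for 365.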
I voted this up for the quality of your results and the clarity of your write-up. I wish I could offer an answer that could improve on your already excellent work.
I fear that it might be a matter of trial and error with the trend weight until you see an improved fit.
It could be that you could make this an input from users as well: allow them to fiddle with the value, given realistic constraints, until they get satisfactory values.
I also wondered if the weight would be different for each graph, since the number of points in each is different. Are you trying to get a single weighting that works for all graphs?
Excellent work; a nice question. Well done. I wish I was more helpful. Perhaps someone else will have more wisdom to impart than I do.
It might look like the trend lines are accurate in those 4 graphs, but they are really quite off. (This is best seen at the beginning of the lower-left one and the beginning of the upper-right one.) I would think that you would want to use no fewer than half of your points when finding the trend line (though really you should use much more than half). I would suggest a trend weight of 2 at a maximum, though really you ought to stick closer to the 1-1.5 range. Since it is arbitrary, I would suggest you give your user an "accuracy of trend line" slider, where the most accurate setting uses a trend weight of 1 and the least accurate uses a weight of the number of data points + 1. This would use 0 points (assuming you always round down) and, I would assume, though your statistics software might be different, will generate a straight horizontal line.

Is there a good algorithm to check for changes in data over a specified period of time?

We have around 7k financial products whose closing prices should theoretically move up and down within a certain percentage range throughout a defined period of time (say a one week or month period).
I have access to an internal system that stores these historical prices (not a relational database!). I would like to produce a report that lists any products whose price has not moved at all or less than say 10% over the time period.
I can't just compare the first value (day 1) to the value at the end (day n): the price could have spiked somewhere in between and then moved back to where it started, which would lead to a false positive.
Are there any established algorithms to do this in reasonable compute time?
There isn't any way to do this without looking at every single day.
Suppose the data looks like this:
oooo0oooo
With that one-day spike in the middle. You're not going to catch that unless you check the day that the spike happens - in other words, you need to check every single day.
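A full scan is cheap to write. The sketch below tracks the minimum and maximum close over the period and flags a product when the spread stays under a threshold. Measuring the spread relative to the minimum is an assumption (the question does not say what the 10% is relative to), prices are assumed positive, and the names are hypothetical.

```java
final class FlatPriceCheck {

    // (max - min) / min over all closes in the period; assumes positive prices.
    static double relativeRange(double[] closes) {
        if (closes.length == 0) {
            throw new IllegalArgumentException("no prices");
        }
        double min = closes[0];
        double max = closes[0];
        for (double close : closes) {
            min = Math.min(min, close);
            max = Math.max(max, close);
        }
        return (max - min) / min;
    }

    // True when the price never strayed by the threshold fraction or more, e.g. 0.10 for 10%.
    static boolean movedLessThan(double[] closes, double threshold) {
        return relativeRange(closes) < threshold;
    }
}
```

With the spike pattern above, the first and last values are equal, but the scan still sees the spike and does not report the product as flat.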
If this needs to be checked often (for a large number of intervals, like daily for the last year, and for the same set of products), you can store the high and low values of each item per week/month. By combining the right weekly and/or monthly bounds with some raw data at the edges of the interval, you can get the minimum and maximum value over the interval.
If you can add data to kdb (i.e. you're not limited to read access) you might consider adding the 'number of days since last price change' as a new set of data (i.e. one number per financial instrument). A daily task would then fetch today's mark and yesterday's, and update the numbers stored. Similarly you could maintain recent (last month, last year) highs and lows in kdb. You'd have to run a job over the larger dataset to prime the values initially, but then your daily updates will involve much less data.
I recommend that, if you adopt something like this, you have some way to rerun it for all or part of the dataset (say, for adding a new product).
Lastly - is the history normalised against current prices? (i.e. are revaluations for stock splits or similar taken into account). If not, you'd need to detect these discontinuities and divide them out.
EDIT
I'd investigate using kdb+/Q to implement the signal processing, rather than extracting the raw data to a Java application. As you say, it's highly performant.
You can do this if you can keep track of the min and max value of the price during the time interval - this assumes that the time interval is not being constantly changed. One way of keeping track of the min and max values of a changing set of items is with two heaps placed 'back to back' - you could store this and some pointers necessary to find and remove old items in one or two arrays in your store. The idea of putting two heaps back to back is in Knuth's Art of Computer Programming Vol 3 as Exercise 31 in section 5.2.3. Knuth calls this sort of beast a priority deque, and this seems to be searchable. Min and max are available at constant cost. The cost of modifying it when a new price arrives is log n, where n is the number of items stored.
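As a simpler stand-in for the back-to-back heaps, the sketch below uses a TreeMap as a counted multiset plus an ArrayDeque that remembers arrival order. This is a different data structure from the priority deque described above: min and max lookups go through the tree rather than being constant-time, and each update costs O(log n). The class name and the fixed-capacity window are assumptions.

```java
import java.util.ArrayDeque;
import java.util.TreeMap;

// Tracks min and max over the most recent `capacity` prices.
final class WindowRange {
    private final int capacity;
    private final ArrayDeque<Double> order = new ArrayDeque<>();        // arrival order
    private final TreeMap<Double, Integer> counts = new TreeMap<>();    // price -> occurrences

    WindowRange(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void add(double price) {
        order.addLast(price);
        counts.merge(price, 1, Integer::sum);
        if (order.size() > capacity) {
            // Evict the oldest price and decrement (or remove) its count.
            Double oldest = order.removeFirst();
            counts.computeIfPresent(oldest, (k, c) -> c == 1 ? null : c - 1);
        }
    }

    // Both throw NoSuchElementException if nothing has been added yet.
    double min() {
        return counts.firstKey();
    }

    double max() {
        return counts.lastKey();
    }
}
```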
