Can Java be used to determine the duration of a download?

Just asking, and how would you go about doing this.
I know there are ways to get an overall percentage to inform users of the download's progress, but I haven't a clue how to do something similar for time.
E.g. "Time until download finishes: 5 minutes".
All I know is percentages: taking the bytes written so far, dividing by the total length, and turning that into a percentage (if I recall correctly; I haven't done this for a few months, so I'm rusty and very forgetful).
Thanks, out of curiosity.

For a completely linear model, you simply divide the number of bytes left to download by the so-far-average download speed:
double avgSpeed = (double) bytesDownloaded / timeElapsed; // bytes per millisecond
double timeLeft = bytesLeftToDownload / avgSpeed;         // milliseconds remaining
If you stick to milliseconds everywhere, timeLeft will contain the estimated number of milliseconds until the full file is downloaded.
To output that properly in terms of hours and/or minutes and/or seconds, I suggest you have a look at
How to convert milliseconds into human readable form?
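As a rough sketch of both steps together (the method and variable names here are just illustrative, not from any particular library):

// Minimal sketch: estimate the remaining time and format it as h:mm:ss.
// bytesDownloaded, bytesTotal and timeElapsedMillis are assumed to be tracked by your download loop.
static String formatEta(long bytesDownloaded, long bytesTotal, long timeElapsedMillis) {
    double avgSpeed = (double) bytesDownloaded / timeElapsedMillis;           // bytes per millisecond
    long timeLeftMillis = (long) ((bytesTotal - bytesDownloaded) / avgSpeed);
    long hours = timeLeftMillis / 3600000;
    long minutes = (timeLeftMillis / 60000) % 60;
    long seconds = (timeLeftMillis / 1000) % 60;
    return String.format("Time until download finishes: %d:%02d:%02d", hours, minutes, seconds);
}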

Yes, you can.
No, there's nothing "built-in".
And there are actually (at least) two parts to your question:
1) How do I determine the time?
2) How do I display it?
Suggestion:
Google for "Java status bar", and look at some code that might do what you're looking for.
Suggestion:
Look at MS Vista's "time remaining" algorithm. Whatever Vista is doing - don't do that ;)

Related

Sampling numerical arrays in java

I have a data set of time series data I would like to display on a line graph. The data is currently stored in an Oracle table and is sampled at 1 point/second. The question is: how do I plot the data over a 6-month period of time? Is there a way to downsample the data once it has been returned from Oracle (this can be done in various charts, but I don't want to move the data over the network)? For example, if a query returns 10K points, how can I downsample this to 1K points and still have the line graph keep the visual characteristics (peaks/valleys) of the 10K points?
I looked at Apache Commons, but without knowing exactly what the statistical name for this is, I'm a bit at a loss.
The data I am sampling is indeed time series data such as page hits.
It sounds like what you want is to segment the 10K data points into 1K buckets -- the value of each of these buckets may be any statistical computation that makes sense for your data (sorry, without actual context it's hard to say). For example, if you want to spot the trend of the data, you might want to use the median (the 50th percentile) to summarize the 10 points in each bucket. Apache Commons Math has helper functions for that. Then, with the 1K downsampled data points, you can plot the chart.
For example, if I have 10K data points of page load times, I might map that to 1K data points by taking a median over every 10 points -- that will tell me the most common load time within the range -- and plot that. Or maybe I can use the max to find the maximum load time in each period.
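A minimal sketch of that bucketing, assuming Apache Commons Math 3 (its Median class) and the points already loaded into a double[]:

import java.util.Arrays;
import org.apache.commons.math3.stat.descriptive.rank.Median;

// Sketch: collapse `points` into `buckets` values by taking the median of each bucket.
static double[] downsampleByMedian(double[] points, int buckets) {
    Median median = new Median();
    double[] result = new double[buckets];
    for (int b = 0; b < buckets; b++) {
        int from = (int) ((long) b * points.length / buckets);
        int to = (int) ((long) (b + 1) * points.length / buckets);
        result[b] = median.evaluate(Arrays.copyOfRange(points, from, to));
    }
    return result;
}

Swapping Median for Max (or a plain mean) gives the other summaries mentioned above.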
There are two options: you can do as @Adrian Pang suggests and use time bins, which means you have bins and hard boundaries between them. This is perfectly fine, and it's called downsampling if you're working with a time series.
You can also use a smooth bin definition by applying a sliding window average/function convolution to points. This will give you a time series at the same sampling rate as your original, but much smoother. Prominent examples are the sliding window average (mean/median of all points in the window, equally weighted average) and Gaussian convolution (weighted average where the weights come from a Gaussian density curve).
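For the equally weighted sliding-window case, a sketch (plain Java, no library assumed; the window size is your choice):

// Sketch: centered sliding-window mean; `window` should be odd (e.g. 11).
static double[] slidingMean(double[] points, int window) {
    double[] smoothed = new double[points.length];
    int half = window / 2;
    for (int i = 0; i < points.length; i++) {
        int from = Math.max(0, i - half);
        int to = Math.min(points.length - 1, i + half);
        double sum = 0;
        for (int j = from; j <= to; j++) {
            sum += points[j];
        }
        smoothed[i] = sum / (to - from + 1);
    }
    return smoothed;
}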
My advice is to average the values over shorter time intervals. Make the length of the shorter interval dependent on the overall time range. If the overall time range is short enough, just display the raw data. E.g.:
overall = 1 year: let subinterval = 1 day
overall = 1 month: let subinterval = 1 hour
overall = 1 day: let subinterval = 1 minute
overall = 1 hour: no averaging, just use raw data
You will have to make some choices about where to shift from one subinterval to another, e.g., for overall = 5 months, is subinterval = 1 day or 1 hour?
My advice is to make a simple scheme so that it is easy for others to comprehend. Remember that the purpose of the plot is to help someone else (not you) understand the data. A simple averaging scheme will help get you to that goal.
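A sketch of such a scheme (the cut-off values are illustrative choices, not prescriptions):

import java.time.Duration;

// Sketch: pick an averaging interval from the overall plotted range.
static Duration subintervalFor(Duration overall) {
    if (overall.compareTo(Duration.ofDays(90)) >= 0) return Duration.ofDays(1);    // months of data -> daily averages
    if (overall.compareTo(Duration.ofDays(7)) >= 0) return Duration.ofHours(1);    // weeks -> hourly averages
    if (overall.compareTo(Duration.ofHours(6)) >= 0) return Duration.ofMinutes(1); // a day -> per-minute averages
    return Duration.ZERO;                                                          // short range -> raw data
}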
If all you need is to reduce the number of points in your visualization without losing any visual information, I suggest using the code here. The tricky part of this approach is finding the correct threshold, where the threshold is the number of data points you want to end up with after downsampling. The lower the threshold, the more visual information you lose. However, going from 10K to 1K points is feasible; I have tried it with a similar amount of data.
As a side note, you should keep in mind:
The quality of your visualization depends on the number of points and the size (in pixels) of your chart, meaning that bigger charts need more data.
Any further analysis may not return correct results if it is applied to the downsampled data. Or at least I haven't seen anyone proving the opposite.

Format a duration for soccer (football)

In soccer (most places, they call it football), the game time is shown as mm:ss even if there are more than 59 minutes, so at one hour, 22 minutes, 32 seconds into the game, it would be displayed as 82:32.
I have the time as an android.text.format.Time, with hours, minutes and seconds set, which means that I can easily have it as a number of milliseconds since the epoch. Looking through the formatting options (like Time.format(String)), the format specifiers for minutes seem to always be limited to the range [0, 59). Short of writing my own formatter (not hard, but I'm worried about localization), is there a format call that will do what I want here? Thanks.
My specific fears about localization are as follows:
Time separator (':' vs. '.' or some other thing--I don't know the separators that various locales use).
Order. I could imagine some cultures displaying 32:82 in the example from above.
Something even more horrible that I haven't thought of yet.
Obviously, 1 and 2 are solvable with a smart, localized format string with good comments. 3 scares me, but I may just be being paranoid.
I really think you'll be best off either
a) not using Time.format at all - just doing simple math to figure out the minutes and seconds and then displaying mm:ss (see the sketch below), or
b) using localized format strings if you absolutely need them.
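For option (a), a minimal sketch (plain arithmetic; nothing Android-specific is assumed):

// Sketch: elapsed milliseconds -> "mm:ss", with minutes allowed to exceed 59.
static String formatMatchClock(long elapsedMillis) {
    long totalSeconds = elapsedMillis / 1000;
    long minutes = totalSeconds / 60;  // deliberately not capped at 59
    long seconds = totalSeconds % 60;
    return String.format("%d:%02d", minutes, seconds);
}
// formatMatchClock((1 * 3600 + 22 * 60 + 32) * 1000L) returns "82:32"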
Your first two worries will not be a problem in almost all cases: using a colon to separate minutes and seconds is common across many cultures, as is displaying the minutes before the seconds. In fact, it's the ISO standard: http://en.wikipedia.org/wiki/ISO_8601
If you're really worried about it and want to do a little investigating, I would find videos of soccer matches being broadcast in different languages / locations. I just found one on YouTube in Chinese, and it's using mm:ss to display the time.

avoid static and distortion when chaining together pcm samples

I have the problem of stitching together pcm audio samples from various parts of an audio recording. The idea is that it is audio feedback from a user seeking through a recording on a progress bar of sorts. They could be of an arbitrary length (say .1 to .5 seconds). My major problem is that when I play these samples back, they result in a significant amount of noise artifacts, distortions, etc.
I imagine this is a result of the amplitude jumps between samples. I have yet to come up with a good method of resolving this. The last thing I did was to try to truncate the samples at the point where they cross the origin (go from positive to negative or vice-versa), but that hasn't helped much. Anyone have any ideas?
Thanks
the "zero-crossing" trick usually works well, as does a short linear or cosine fade (~1/30 second). If you are using fades, the fades have to be long enough to avoid pops, but still be significantly shorter than the audio segments you are dealing with. If you use zero-crossing, you have to ensure that the audio you are dealing with actually crosses zero (which can be a problem for low-frequencies, and signals that have become offset. To avoid offsets, both problems, you can high-pass filter the signal first).
If your segments are frequently on the short end of the .1 to .5 second range, various psycho-acoustic phenomena might be getting in the way. You should start by limiting yourself to longer segments and see if that works, then see how short you can make them. That way you'll know whether the problem is in your code or simply in how short the segments are.

SQL filtering based on date

Let's say I'm performing a SQL query from a Java program to get timestamps (stored as milliseconds) from a table of timestamps that occur within the last 10 days.
I can think of the following two ways to do this:
db.execSql("select * from timestamps where timestamp > (SELECT strftime('%s', 'now', '-10 days') * 1000)");
or
// First calculate in the number of milliseconds in Java
long t = System.currentTimeMillis() - (10L * 86400000L /* millis in a day */);
db.execSql("select * from timestamps where timestamp > " + t);
Both get the job done and seem to be equivalent perf-wise when testing. Is one method preferred over the other? Does it matter? Is there an idiomatic way to do this?
How important is exactness on the boundaries? The time as seen by the server could be different than the time on the Java VM by several hours (e.g. timezones). I would generally go with the server-time based one, but my usual application would want to use the start of day and not NOW() so that the "bucket-boundaries" don't slide.
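A sketch of that start-of-day variant on the Java side (assuming java.time is available; the choice of zone is yours):

import java.time.LocalDate;
import java.time.ZoneId;

// Sketch: cutoff = midnight 10 days ago in a fixed zone, so the bucket boundary doesn't slide with every query.
ZoneId zone = ZoneId.of("UTC");
long cutoffMillis = LocalDate.now(zone)
        .minusDays(10)
        .atStartOfDay(zone)
        .toInstant()
        .toEpochMilli();
db.execSql("select * from timestamps where timestamp > " + cutoffMillis);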
I generally prefer to use the second method: if you end up needing to change the number of days, or the date from which the ten-day window is calculated, it will probably be more straightforward, understandable and maintainable to handle that in Java.
Although I have no conclusive tests to prove this and this is just conjecture, I think I'd want to compute it in the software (via Java) and put the "bottleneck" there. Although it may be the same performance-wise, I wouldn't want the DB to be doing unnecessary work if it doesn't have to.
My guess is that you may have many clients running the same piece of software, but they're probably all querying the same database. Therefore, I side with the latter example.

TextRank Run time

I implemented textrank in java but it seems pretty slow. Does anyone know about its expected performance?
If it's not expected to be slow, could any of the following be the problem:
1) It didn't seem like there was a way to create an edge and add a weight to it at the same time in JGraphT, so I calculate the weight and, if it's > 0, I add an edge. I later recalculate the weights to add them while looping through the edges. Is that a terrible idea?
2) I'm using JGraphT. Is that a slow library?
3) Anything else I could do to make it faster?
It depends what you mean by "pretty slow". A bit of googling found this paragraph:
"We calculated the total time for RAKE and TextRank (as an average over 100iterations) to extract keywords from the Inspec testing set of 500 abstracts, afterthe abstracts were read from files and loaded in memory. RAKE extracted key-words from the 500 abstracts in 160 milliseconds. TextRank extracted keywordsin 1002 milliseconds, over 6 times the time of RAKE."
(See http://www.scribd.com/doc/51398390/11/Evaluating-ef%EF%AC%81ciency for the context.)
So from this, I infer that a decent TextRank implementation should be capable of extracting keywords from ~500 abstracts in ~1 second.
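As for point 1 of the question: JGraphT does let you add an edge and set its weight in the same pass, which avoids the second loop over the edges. A sketch, assuming a SimpleWeightedGraph and a similarity function of your own (computeSimilarity here is a placeholder, not a JGraphT method):

import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;

SimpleWeightedGraph<String, DefaultWeightedEdge> graph =
        new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
graph.addVertex("sentenceA");
graph.addVertex("sentenceB");

double weight = computeSimilarity("sentenceA", "sentenceB"); // placeholder for your TextRank similarity
if (weight > 0) {
    DefaultWeightedEdge e = graph.addEdge("sentenceA", "sentenceB");
    graph.setEdgeWeight(e, weight);                          // weight set as the edge is created
}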
