I created a simple REST service with a POST method that consumes XML, and a REST client with Jersey that builds the request object. I am trying to see how the response time varies as I increase the length of the XML by giving larger inputs to my object (say the object holds simple employee details, which I keep enlarging). What I observe is that the response time varies inconsistently and, as far as I can tell, does not depend on the size of the XML. I am computing the time taken as follows.
long startTime = System.currentTimeMillis();
// the POST call goes here
long elapsedTime = System.currentTimeMillis() - startTime;
Please suggest if there is a better way of doing this.
What I would like clarified is this: my server is on localhost, so why does the response time vary so much (say, 88 ms one time and 504 ms the next)? I expected it to increase as I give larger inputs to my XML object, but that is not what I observe. Please clarify, or point me to a better site or book where I can read about this.
Note that your question is quite broad (and will likely be closed as such). My explanation is similarly broad and just meant to give you some background on why you might see the behavior that you are seeing.
It is unlikely that the way you measure time makes a big difference here, given that you are seeing variations of hundreds of milliseconds. It is more likely that the web service you are invoking sometimes takes longer to respond.
It may help to compare it to what you see when you type a search query into Google. Sometimes the response pops up "instantaneously", but sometimes it takes a few moments to load. Since you're using the same browser for every search, it can't be the browser causing the difference.
Instead it is likely something in the web service that is varying between the calls. In the Google search example, you might be routed to a different Google server, the server might be using a different storage to search, the query might be cached at some point, etc.
So there is no way to determine why the performance is different between invocations by simply looking at the client code. Instead, you should look at the server code and profile what happens there for different invocations.
Use System.nanoTime instead of currentTimeMillis. Also be aware that Java optimises on the fly: if you do not warm up the server first, you will be timing interpreted execution of the bytecode, and at some point it will switch over to compiled native code. The timings will differ by input length (in a step-like function), but you have a certain amount of noise to overcome before you will see that: a lot from GC, a lot from thread scheduling and OS issues.
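To make that concrete, here is a minimal sketch of the measurement loop with a warm-up phase and System.nanoTime, assuming a Jersey (JAX-RS 2.x) client; the endpoint URL and the XML payload are placeholders for your own service and object.

import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Entity;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

public class PostTimer {
    public static void main(String[] args) {
        Client client = ClientBuilder.newClient();
        String xml = "<employee><name>Jane</name></employee>"; // placeholder payload

        // Warm-up: let the JIT compile the hot path before measuring.
        for (int i = 0; i < 100; i++) {
            post(client, xml).close();
        }

        // Measured runs: take several samples and report each one.
        for (int i = 0; i < 20; i++) {
            long start = System.nanoTime();
            Response response = post(client, xml);
            long elapsedNanos = System.nanoTime() - start;
            response.close();
            System.out.printf("run %d: %.3f ms%n", i, elapsedNanos / 1_000_000.0);
        }
        client.close();
    }

    private static Response post(Client client, String xml) {
        return client.target("http://localhost:8080/myservice/employees") // placeholder URL
                     .request()
                     .post(Entity.entity(xml, MediaType.APPLICATION_XML));
    }
}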
I recommend the mechanical sympathy group for discussions on this sort of thing. There are many discussions on how to measure these types of systems. https://groups.google.com/forum/#!forum/mechanical-sympathy
I'm new to Java development and Android. I need to implement functionality for a currency converter.
I need to know how to take information from the Internet (and periodically update it) and save it in a database.
To make a currency converter, you need to fetch the conversion rates in real time. Here are some ways to achieve this in your client:
1) Make AJAX calls to APIs (find a third-party service that provides conversion rates in real time, something like this: How do I get currency exchange rates via an API such as Google Finance?), or alternatively use something like web scraping with Java, which requires writing your own back end that performs the lookup and retrieves the exchange rates.
2) Whenever a conversion is executed by an end user, call the API chosen in step 1 and use the updated exchange rate to calculate and return the result. I would say don't persist the values in a database, as the foreign exchange market is constantly "live", meaning it never closes, even at night, so the exchange rate is always changing (https://www.purefx.co.uk/foreign-currency-exchange-insight/view/do-exchange-rates-change-daily).
3) You could save the results in a cache and update the cache periodically using a cron job (or an ExecutorService if you are writing a back end in Java), but in this case the rates might not be the latest ones, and anything close to "live" values means very noisy API calls on the server side even when no users are actively using the client. You might want to trade off conversion accuracy against resource usage and update the server-side cache in intervals of, say, one hour (see the sketch after this list).
4) The same APIs can also be used to fetch the current rates if the end user just wants to check the exchange rates.
5) If you do want to save the values in a database, you could use a key-value store like Redis (https://redis.io/topics/client-side-caching).
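As a rough sketch of points 1) and 3) above, the following keeps an in-memory cache of rates and refreshes it on a schedule. The endpoint URL and the line-based response format are entirely hypothetical; substitute whichever rate API you settle on.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RateCache {
    private final Map<String, Double> rates = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Refresh every hour, as suggested above; tune the interval to your needs.
        scheduler.scheduleAtFixedRate(this::refresh, 0, 1, TimeUnit.HOURS);
    }

    private void refresh() {
        try {
            // Hypothetical endpoint returning one line per currency, e.g. "EUR,0.92".
            URL url = new URL("https://example.com/rates?base=USD");
            URLConnection conn = url.openConnection();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split(",");
                    if (parts.length == 2) {
                        rates.put(parts[0], Double.parseDouble(parts[1]));
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace(); // keep the previously cached rates if a refresh fails
        }
    }

    public double convert(double amount, String from, String to) {
        // Cross rate via the base currency; assumes both currencies are cached.
        return amount / rates.get(from) * rates.get(to);
    }
}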
Hope this helps.
I am working on a project where I was provided a Java matrix-multiplication program that can run on a distributed system. It is run like so:
usage: java Coordinator maxtrix-dim number-nodes coordinator-port-num
For example:
java blockMatrixMultiplication.Coordinator 25 25 54545
I want to extend this code with some kind of failsafe ability, and am curious how I would create checkpoints within a running matrix multiplication calculation. The general idea is to recover to where the computation was (it doesn't need to be that fine-grained; just recovering to the beginning, i.e. row 0, column 0, would do).
My first idea is to use log files (like Apache log4j ), where I would be logging the relevant matrix status. Then, if we forcibly shut down the app in the middle of a calculation, we could recover to a reasonable checkpoint.
Should I use MySQL for such a task (or maybe a more lightweight database)? Or would a basic log file (using some useful Apache libraries) be good enough? Any tips appreciated, thanks.
source-code :
MatrixMultiple
Coordinator
Connection
DataIO
Worker
If I understand the problem correctly, all you need to do is recover your place in a single matrix calculation in the event of a crash or if the application is quit half way through.
Minimum Viable Solution
The simplest approach would be to recover just the two matrixes you were actively multiplying, but none of your progress, and multiply them from the beginning next time you load the application.
The Process:
At the beginning of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, create a file, let's call it recovery_data.txt with the state of the two arrays being multiplied (parameters a and b). Alternatively, you could use a simple database for this.
At the end of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, right before you return, clear the contents of the file, or wipe your database.
When the program is initially run, most likely near the beginning of main(String[] args), you should check whether the file has contents; if it does, multiply the matrices stored in the file and display the output, otherwise proceed as usual.
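Here is a bare-bones sketch of those three steps. It assumes the file layout described in the implementation notes below (one matrix row per line, values separated by spaces, a blank line between the two matrices); the file name follows the question, everything else is illustrative.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Recovery {
    private static final Path RECOVERY_FILE = Paths.get("recovery_data.txt");

    // Step 1: call this at the start of multiplyMatrix to record the operands.
    public static void saveOperands(int[][] a, int[][] b) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(RECOVERY_FILE))) {
            writeMatrix(out, a);
            out.println();            // blank line separates the two matrices
            writeMatrix(out, b);
        }
    }

    // Step 2: call this right before multiplyMatrix returns.
    public static void clear() throws IOException {
        Files.deleteIfExists(RECOVERY_FILE);
    }

    // Step 3: call this near the start of main; returns null if there is nothing to recover.
    public static int[][][] loadOperands() throws IOException {
        if (!Files.exists(RECOVERY_FILE)) {
            return null;
        }
        List<String> lines = Files.readAllLines(RECOVERY_FILE);
        int split = lines.indexOf("");   // the blank separator line
        return new int[][][] { parse(lines.subList(0, split)),
                               parse(lines.subList(split + 1, lines.size())) };
    }

    private static void writeMatrix(PrintWriter out, int[][] m) {
        for (int[] row : m) {
            StringBuilder sb = new StringBuilder();
            for (int v : row) {
                sb.append(v).append(' ');
            }
            out.println(sb.toString().trim());
        }
    }

    private static int[][] parse(List<String> rows) {
        List<int[]> result = new ArrayList<>();
        for (String row : rows) {
            if (row.isEmpty()) continue;
            String[] parts = row.trim().split("\\s+");
            int[] r = new int[parts.length];
            for (int i = 0; i < parts.length; i++) {
                r[i] = Integer.parseInt(parts[i]);
            }
            result.add(r);
        }
        return result.toArray(new int[0][]);
    }
}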
Notes on implementation:
Using a simple text file or a full-fledged relational database is a decision you will have to make, mostly based on real-world data that only you know, but in my mind a text file wins out in most situations, and here are my reasons why. You are going to want to read the data sequentially to rebuild your matrix, so being relational is not that useful. Databases are harder to work with (not too hard, but compared to a text file there is no question), and since you would not be making much use of querying, that isn't balanced out by the ways they usually make a programmer's life easier.
Consider how you are going to store your arrays. In a text file you have several options; my recommendation would be to store each row on a line of text, with values separated by spaces, commas, or some other character, and then put an extra blank line before the second matrix. I think a similar approach is used in crAlexander's answer here, but I have not tested his code. Alternatively, you could use something more complicated like JSON, but I think that would be too heavy-handed to justify. If you are using a database, then the relational structure should make several logical arrangements for your data apparent as well.
Strategic Checkpoints
You expressed interest in saving some calculations by taking advantage of the possibility that some of them will already have been handled the last time the program ran. Let's first look at the pros and cons of adding checkpoints after every row has been processed, as best I can see them.
Pros:
Save computation time next time the program is run, if the system had been closed.
Cons:
Making the extra writes will either use more nodes if distributed (more on that later) or increase general latency, because you now have to add a write operation for every checkpoint.
More complicated to implement (but probably not by too much)
If my comments on the implementation of the Minimum Viable Solution about being able to get away with a text file convinced you not to add an RDBMS, I take back the parts about not leveraging queries and everything being accessed sequentially; with checkpoints, a database is now perhaps the smarter choice.
I'm not saying that checkpoints are definitely not the better solution, just that I don't know if they are worth it, but here is what I would consider:
Do you expect people to be quitting half way through a calculation frequently relative to the total amount of calculations they will be running? If you think this feature will be used a lot, then the pro of adding checkpoints becomes much more significant relative to the con of it slowing down calculations as a whole.
Does it take a long time to complete a typical calculation that people are providing the program? If so, the added latency I mentioned in the cons is (percentage wise) smaller, and so perhaps more tolerable, but users are already less happy with performance, and so that cancels out some of the effect there. It also makes the argument for checkpointing more significant because it has the potential to save more time.
And so I would only recommend checkpointing like this if you expect a relatively large amount of instances where this is happening, and if it takes a relatively large amount of time to complete a calculation.
If you decide to go with checkpoints, then modify the approach to:
After every row of the result matrix has been processed, append the contents of that row to your database, or, if you use the text file, to the end of the file after another blank line separating it from the last matrix.
On startup, if you need to finish a calculation that has already begun, solve and distribute only the rows that have yet to be computed, and retrieve the contents of the other rows from your database.
A quick point on implementing frequent checkpoints: You could greatly reduce the extra latency from adding in frequent checkpoints by pushing this task out to an additional thread. Doing this would use more processes, and there is always some latency in actually spawning the process or thread, but you do not have to wait for the entire write operation to be completed before proceeding.
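For illustration, one way to do that is to give a single background thread ownership of the recovery file, so the multiplication loop only hands it finished rows and never blocks on I/O. The file name and row format here are illustrative and match the earlier sketch.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncCheckpointer {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Called from the multiplication loop after a result row is complete.
    // Returns immediately; the append happens on the background thread.
    public void checkpointRow(int[] row) {
        writer.submit(() -> {
            StringBuilder sb = new StringBuilder();
            for (int v : row) {
                sb.append(v).append(' ');
            }
            sb.append(System.lineSeparator());
            try {
                Files.write(Paths.get("recovery_data.txt"),
                            sb.toString().getBytes(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                e.printStackTrace(); // a failed checkpoint should not kill the calculation
            }
        });
    }

    // Call once the final matrix has been produced so pending writes finish.
    public void shutdown() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(1, TimeUnit.MINUTES);
    }
}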
A quick warning on the implementation of any such failsafe method
If there is an unchecked edge case where some sort of invalid matrix crashes the program, this failsafe now bricks the program entirely by retrying it on every start. To combat this, I see some obvious solutions, but perhaps a bit of thought will let you modify my approaches into something you prefer:
Use plenty of try/catch statements; if you get any sort of error that seems to be caused by malformed data, wipe your recovery file, or modify it to add a note telling your program to treat it as a special case. A good treatment of this special case may be to display the two matrices at start-up with an explanation that your program failed to multiply them, likely due to malformed content.
Add data to your file/database recording how many times the program has quit while solving the current problem; if this is not the first resume, treat it like the special case in the option above.
I hope that this provided enough information for you to implement your failsafe in the way that makes the most sense given what you suspect the realistic use to be, and note that there are perhaps other ways you could approach this problem as well, and these could equally have their own lists of pros and cons to take into consideration.
We are using a Change Data Capture tool to migrate source data to a target database in near real-time.
The challenge is to identify, as accurately as possible, the data migration latency that exists between source and target. The latency reporting capabilities of the tool are not to our satisfaction, so I have been tasked with developing a process that will better monitor this specific metric.
There are two main reasons why we need to know this:
1: Provide our users with an accurate data availability matrix to support report scheduling. For example, how much time should pass after midnight before scheduling a daily reconciliation report for the previous day, given that we want this information as soon as possible?
2: Identify situations when the data mirroring process is running slower than usual (or has even stopped). This will trigger an email to our support team to investigate.
I am looking for some general ideas on how best to go about this seemingly simple task.
My preferred approach is a dedicated heartbeat or health-check table.
At the source, the table has an identity column (SQL Server) or a value from a sequence (Oracle) as the main identifier; a fixed task-name string; a fixed server string (if not already identified by the task name); and the current time.
Have a script/job on the source to insert a record every minute (or 2 minutes or 10 minutes)
In the CDC engine (if there is one), add a column with the time the change event was processed.
At the target, add a final column defaulting to the current time at insert.
A single target table can accommodate multiple sources/tasks.
The regular blips will let you see at a glance whether changes are coming through, and whether the application is generating changes at all.
A straightforward report can show the current latency, as well as the latency over time.
It is nice to be able to compare 'this Monday' with 'last Monday' to see whether things are similar, better, or worse.
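As a concrete sketch of the source-side job, the following inserts a heartbeat row every minute using plain JDBC and a ScheduledExecutorService. The table and column names (cdc_heartbeat, task_name, server_name, source_time), the connection string, and the credentials are assumptions; the identity/sequence column and the target-side default-time column are filled in by the databases themselves.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatJob {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(HeartbeatJob::beat, 0, 1, TimeUnit.MINUTES);
    }

    private static void beat() {
        String sql = "INSERT INTO cdc_heartbeat (task_name, server_name, source_time) VALUES (?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://source-host;databaseName=app", "user", "password"); // placeholders
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "orders_replication");    // fixed task-name string
            ps.setString(2, "SRCDB01");               // fixed server string
            ps.setTimestamp(3, Timestamp.from(Instant.now()));
            ps.executeUpdate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The latency for a task is then simply the difference between the target's insert-time column and source_time on the most recent row, and a gap in the heartbeats tells you the pipeline has stalled.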
Cheers, Hein.
I have been experimenting with ways to use the processing power of two computers together as one (not by physically connecting them, but by splitting a task in half so each computer does one half; the result from the "helper" computer is then sent back over the internet to be combined with the result from the "main" computer).
I've been using this method to compute fractal images and it works great. The left half and the right half of the image are computed on separate computers, then combined into one. The process of sending one half of the image to the other computer and combining them takes maybe a second, so the efficiency is great and cuts time down by about half.
The problem comes when you want to do this "multi computer processing" with something that needs data exchanged very frequently.
For example, I'd like to use this for something like an n-body simulation. You need the data exchange to happen multiple times per second, so if the exchange takes about a second, it actually takes much longer to use two computers than it would with one.
So how do online video games do it? The players around you, what they are doing, what they are wearing, everything going on has to be exchanged between everyone playing many times per second.
I'm just looking for general ideas on how to send larger amounts of data and at fast speeds.
The way I have been doing it is with PHP on a free hosting site. The helper computer computes its half of the data, then sends it to a PHP script which saves that data somewhere. The main computer then reads this and combines it with the data it has already computed.
I have a feeling PHP isn't the way to go, but I don't know much about this sort of thing.
Your first step will be to move from using HTTP Requests to using Sockets directly - this will give you much more control over the communication, and give you improved performance by reducing the overhead of the HTTP protocol (this is potentially pretty significant). Plus, with sockets you can more easily have your programs communicate to each other directly, rather than through the PHP-based software.
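As a minimal illustration of the direct-socket approach, here is a helper node that accepts a connection, reads a request, and writes its half-result straight back. The port number and line-based framing are illustrative; a real simulation exchange would use a compact binary protocol and keep the connection open.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class HelperNode {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000)) {   // helper listens on this port
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();            // e.g. the half of the work to do
                    String result = compute(request);          // placeholder for the real computation
                    out.println(result);                       // send the half-result straight back
                }
            }
        }
    }

    private static String compute(String request) {
        return "result-for:" + request;
    }
}

The main computer would open new Socket("helper-host", 9000), write its request with a PrintWriter and read the reply with a BufferedReader in the same way; keeping that single connection alive between exchanges avoids paying the connection setup cost on every simulation step.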
There are a ton of guides online as to how you would do this sort of system, and I would recommend Googling things like "game networking" and "distributed computing".
Here is one series of articles that I have found useful in the past, that covers the sort of things that you will want to read about: http://gafferongames.com/networking-for-game-programmers/
(He doesn't use Java, but the ideas are universal)
Now that the oligopoly of market data providers has successfully killed OpenQuant, is there any alternative to proprietary and expensive subscriptions for real-time market data?
Ideally I would like to be able to monitor tick by tick securities from the NYSE, NASDAQ and AMEX (about 6000 symbols).
Most vendors put a limit of 500 symbols watchable at the same time, which is unacceptable to me, even if one could imagine a rotation among the 500 symbols, i.e. making windows of 5 seconds of effective observation out of each minute for every symbol.
Currently I'm doing this with a Java thread pool calling Google Finance, but this is unsatisfactory for several reasons: one is that Google doesn't return the volume traded, but the main one is that Google promptly kills bots attempting to take advantage of this service ;-)
Any hint much appreciated,
Cheers
I think you'll find all you need to know by looking at this question: source of historical stock data
I don't know of any free data feeds other than Yahoo!, but it doesn't offer tick-by-tick data, it only offers 1 minute intervals with a 15 minute delay. If you want to use an already existing tool to download the historical data, then I would recommend EclipseTrader. It only saves the Open, Close, High, Low, and Volume.
You can write your own data scraper with very little effort. I've written an article on downloading real-time data from Yahoo on my blog, but it's in C#. If you're familiar with C#, you'll be able to translate it to Java pretty quickly. If you write your own data scraper, then you can get pretty much ANYTHING that Yahoo! shows on their web site: Bid, Ask, Dividend Share, Earnings Share, Day's High, Day's Low, etc., etc., etc.
If you don't know C# then don't worry, it's REALLY simple: Yahoo allows you to download CSV files with quotes just by modifying a URL. You can find out everything about the URL and the tags that are used on yahoo here: http://www.gummy-stuff.org/Yahoo-data.htm
Here are the basic steps you need to follow:
Construct a URL for the symbol or multiple symbols of your choice.
Add the tags which you're interested in downloading (Open, Close, Volume, Beta, 52 week high, etc, etc.).
Create a URLConnection with the URL you just constructed.
Use a BufferedReader to read the CSV file that is returned from the connection stream.
Your CSV will have the following format:
Each row is a different symbol.
Each column is a different tag.
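Here is a compact Java version of those steps using URLConnection and a BufferedReader. The URL and tag string follow the Yahoo CSV scheme described above (s= for symbols, f= for tags); whether that endpoint still responds today is another matter, so treat this purely as a sketch of the download-and-parse flow.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class QuoteDownloader {
    public static void main(String[] args) throws Exception {
        String symbols = "MSFT+GOOG+AAPL";          // step 1: symbols of your choice
        String tags = "sl1d1t1vhg";                 // step 2: symbol, last trade, date, time, volume, high, low
        URL url = new URL("http://download.finance.yahoo.com/d/quotes.csv?s=" + symbols + "&f=" + tags);

        URLConnection conn = url.openConnection();  // step 3
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String row;
            while ((row = in.readLine()) != null) { // step 4: one CSV row per symbol
                String[] columns = row.split(",");  // one column per requested tag
                System.out.println(String.join(" | ", columns));
            }
        }
    }
}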
Open a TDAmeritrade account and you will have free access to ThinkOrSwim real time trading and quotes platform. Live trading is real time and paper trading is delayed 15 minutes. I forget what the minimum required is to open a TDAmeritrade account but you can go to TDAMeritrade.com or thinkorswim.com to check them out.
Intrinio has a bunch of feeds with free and paid tiers. Essentially you only have to pay for what you need as opposed to the bigger data suppliers. Intrinio focuses on data quality and caters to developers as well, so I think it'd be a great option for you.
full disclosure - I work at Intrinio as a developer
There's a handy function in Google Sheets (ImportHTML) which I've been using for a while to reasonable effect.
For example -
=ImportHTML("http://www.bloomberg.com/markets/commodities/futures/metals/","table",1),5,3) returns the EUR Gold spot price.
It works with Yahoo too, so =Index(ImportHTML("http://finance.yahoo.com/q?s=DX-Y.NYB","table",0),2,2) returns the DXY.
The data updates with some small delay but it's usable.