Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have been experimenting with ways to use the processing power of two computers together as one: not by physically connecting them, but by splitting a task in half so that each computer does one half, with the result from the "helper" computer sent back over the internet to be combined with the result from the "main" computer.
I've been using this method to compute fractal images and it works great. The left half and the right half of the image are computed on separate computers, then combined into one. The process of sending one half of the image to the other computer and combining them takes maybe a second, so the efficiency is great and cuts time down by about half.
The problem comes when you want to do this "multi computer processing" with something that needs data exchanged very frequently.
For example, I'd like to use this for something like an n-body simulation. There the data exchange needs to happen multiple times per second, so if each exchange takes about a second, using two computers actually takes much longer than using one.
So how do online video games do it? The players around you, what they are doing, what they are wearing, everything going on has to be exchanged between everyone playing many times per second.
I'm just looking for general ideas on how to send larger amounts of data and at fast speeds.
The way I have been doing it is with PHP on a free hosting site. The helper computer computes its half of the data and sends it to a PHP script, which saves that data somewhere. The main computer then reads it back and combines it with the data it has already computed.
I have a feeling PHP isn't the way to go, but I don't know much about this sort of thing.
Your first step will be to move from HTTP requests to sockets. This gives you much more control over the communication and improves performance by removing the overhead of the HTTP protocol (which is potentially pretty significant). Plus, with sockets your programs can more easily communicate with each other directly, rather than through the PHP-based intermediary.
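As a very rough sketch of what direct socket communication could look like in Java (the class name, the one-line protocol, and the message strings are all made up for illustration), the helper can listen on a ServerSocket while the main computer connects and reads the result back directly:

```java
import java.io.*;
import java.net.*;

// Minimal sketch: the "helper" listens, the "main" computer connects
// and receives the helper's half of the result directly over TCP.
// The single-line request/reply protocol is illustrative only.
public class SocketSketch {

    // Helper side: accept one connection and send back a result line.
    static void runHelper(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String request = in.readLine();       // e.g. "compute:right-half"
            out.println("result-for:" + request); // send the computed half back
        }
    }

    // Main side: connect, send a request, read the helper's result.
    static String askHelper(int port, String request) throws IOException {
        try (Socket socket = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println(request);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);  // any free port
        Thread helper = new Thread(() -> {
            try { runHelper(server); } catch (IOException ignored) {}
        });
        helper.start();
        String reply = askHelper(server.getLocalPort(), "compute:right-half");
        System.out.println(reply);                  // result-for:compute:right-half
        helper.join();
        server.close();
    }
}
```

In a real setup the helper would keep listening, the request would carry the parameters of its half of the task, and for frequent exchanges you would keep one connection open rather than reconnecting for every message.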
There are a ton of guides online as to how you would do this sort of system, and I would recommend Googling things like "game networking" and "distributed computing".
Here is one series of articles that I have found useful in the past, that covers the sort of things that you will want to read about: http://gafferongames.com/networking-for-game-programmers/
(He doesn't use Java, but the ideas are universal)
Related
I have a small question that might be answered by a more experienced programmer than me.
I have to plot in real time two streams of data. In the end I need to build 5 different plots because in each plot I need to show different unit measures.
The first stream contains 4 values and one timestamp, the second stream has one value and one timestamp.
So my idea is to run two threads that handle the two different streams, which arrive at different intervals (6 s for the first and 1 s for the second). Both threads should be able to receive the data and update the graphs.
My question is: which language do you think would be better between Python and Java, to implement this?
My fear is that using Java will result in a very slow UI.
It would be nice to have an answer supported by some considerations.
Thank you.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am working on a project where I was provided a Java matrix-multiplication program that can run in a distributed system and is run like so:
usage: java Coordinator matrix-dim number-nodes coordinator-port-num
For example:
java blockMatrixMultiplication.Coordinator 25 25 54545
Here's a snapshot of what the output looks like:
I want to extend this code with some kind of failsafe ability, and am curious how I would create checkpoints within a running matrix-multiplication calculation. The general idea is to recover to where the computation was (it doesn't need to be that fine-grained; just recovering to the beginning, i.e. row 0, column 0, would do).
My first idea is to use log files (like Apache Log4j), where I would log the relevant matrix status. Then, if the app is forcibly shut down in the middle of a calculation, we could recover to a reasonable checkpoint.
Should I use MySQL for such a task (or maybe a more lightweight database)? Or would a basic log file (and some useful Apache libraries) be good enough? Any tips appreciated, thanks.
source-code :
MatrixMultiple
Coordinator
Connection
DataIO
Worker
If I understand the problem correctly, all you need to do is recover your place in a single matrix calculation in the event of a crash or if the application is quit halfway through.
Minimum Viable Solution
The simplest approach would be to recover just the two matrices you were actively multiplying, but none of your progress, and multiply them from the beginning the next time you load the application.
The Process:
At the beginning of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, create a file, let's call it recovery_data.txt, with the state of the two arrays being multiplied (parameters a and b). Alternatively, you could use a simple database for this.
At the end of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, right before you return, clear the contents of the file, or wipe your database.
When the program is initially run, most likely near the beginning of main(String[] args), check whether the contents of the text file are non-empty; if so, multiply the contents of the file and display the output, otherwise proceed as usual.
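A minimal sketch of those three steps, assuming the row-per-line text format suggested below and a file named recovery_data.txt (the class and helper names here are mine, not from the original code):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Sketch of the recovery flow: save both input matrices before
// multiplying, wipe the file on success, and reload on startup.
// Format: one row per line, values separated by spaces, with a
// blank line between the two matrices.
public class RecoverySketch {
    static final Path RECOVERY = Paths.get("recovery_data.txt");

    // Step 1: record the two operands before starting the multiplication.
    static void saveState(int[][] a, int[][] b) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(RECOVERY))) {
            writeMatrix(out, a);
            out.println();              // blank line separates the matrices
            writeMatrix(out, b);
        }
    }

    static void writeMatrix(PrintWriter out, int[][] m) {
        for (int[] row : m) {
            StringJoiner line = new StringJoiner(" ");
            for (int v : row) line.add(Integer.toString(v));
            out.println(line);
        }
    }

    // Step 2: call this right before multiplyMatrix returns.
    static void clearState() throws IOException {
        Files.deleteIfExists(RECOVERY);
    }

    // Step 3: on startup, return the saved pair of matrices,
    // or null if there is nothing to recover.
    static int[][][] loadState() throws IOException {
        if (!Files.exists(RECOVERY)) return null;
        List<int[]> a = new ArrayList<>(), b = new ArrayList<>();
        List<int[]> cur = a;
        for (String line : Files.readAllLines(RECOVERY)) {
            if (line.isEmpty()) { cur = b; continue; }  // switch to matrix b
            String[] parts = line.split(" ");
            int[] row = new int[parts.length];
            for (int i = 0; i < parts.length; i++) row[i] = Integer.parseInt(parts[i]);
            cur.add(row);
        }
        return new int[][][] { a.toArray(new int[0][]), b.toArray(new int[0][]) };
    }
}
```

The main method would then call loadState() first, rerun the multiplication if it returns a non-null pair, and wrap every normal run in saveState(...) / clearState().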
Notes on implementation:
Using a simple text file or a full-fledged relational database is a decision you will have to make, mostly based on real-world usage that only you really know, but in my mind a text file wins out in most situations, and here are my reasons why. You are going to read the data sequentially to rebuild your matrices, so being relational is not that useful. Databases are harder to work with (not too hard, but compared to a text file there is no question), and since you would not be making much use of querying, that cost isn't balanced out by the ways databases usually make a programmer's life easier.
Consider how you are going to store your arrays. In a text file you have several options; my recommendation would be to store each row on its own line, with values separated by spaces, commas, or some other character, and then put a blank line before the second matrix. I think a similar approach is used in crAlexander's answer here, but I have not tested his code. Alternatively, you could use something more complicated like JSON, but I think that would be too heavy-handed to justify. If you are using a database, the relational structure should make several logical arrangements for your data apparent as well.
Strategic Checkpoints
You expressed interest in saving some calculations by taking advantage of the possibility that some of them will already have been handled the last time the program ran. Let's first look at the pros and cons of adding checkpoints after every row has been processed, as best I can see them.
Pros:
Save computation time next time the program is run, if the system had been closed.
Cons:
Making the extra writes will either use more nodes if distributed (more on that later) or increase overall calculation latency, because you now have to throw in a write operation for every checkpoint.
More complicated to implement (but probably not by too much)
If my comments on the implementation of the Minimum Viable Solution convinced you that you could get away with a text file instead of an RDBMS, then for checkpoints I take back the parts about not leveraging queries and about everything being accessed sequentially, so a database is now perhaps the smarter choice.
I'm not saying that checkpoints are definitely not the better solution, just that I don't know if they are worth it, but here is what I would consider:
Do you expect people to quit halfway through a calculation frequently, relative to the total number of calculations they will be running? If you think this feature will be used a lot, then the pro of adding checkpoints becomes much more significant relative to the con of slowing down calculations as a whole.
Does a typical calculation take a long time to complete? If so, the added latency I mentioned in the cons is (percentage-wise) smaller and perhaps more tolerable, though users who are already less happy with performance cancel out some of that effect. It also makes the argument for checkpointing stronger, because it has the potential to save more time.
And so I would only recommend checkpointing like this if you expect a relatively large amount of instances where this is happening, and if it takes a relatively large amount of time to complete a calculation.
If you decide to go with checkpoints, then modify the approach to:
after every row of the output array has been processed, write the content of that row to your database, or, if you use the text file, append it at the end, after another empty line to separate it from the last matrix.
on startup, if you need to finish a calculation that was already begun, solve and distribute only the rows that have yet to be considered, and retrieve the content of the other rows from your database.
A quick point on implementing frequent checkpoints: you could greatly reduce the extra latency they add by pushing the writes out to an additional thread. Doing this uses more resources, and there is always some latency in actually spawning the thread, but you do not have to wait for the entire write operation to complete before proceeding.
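Here is a sketch of that idea, assuming a single writer thread owns the checkpoint file (the class name and file handling are my own illustration, not part of the original code):

```java
import java.io.*;
import java.nio.file.*;
import java.util.concurrent.*;

// Sketch: one background thread owns all checkpoint writes, so the
// computation never blocks on disk. A single writer thread also keeps
// the row order in the file correct without any extra locking.
public class AsyncCheckpoint implements AutoCloseable {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Path file;

    AsyncCheckpoint(Path file) { this.file = file; }

    // Called after each row is computed; returns immediately.
    void checkpointRow(int[] row) {
        int[] copy = row.clone();  // defensive copy: caller may reuse the array
        writer.submit(() -> {
            StringBuilder line = new StringBuilder();
            for (int v : copy) line.append(v).append(' ');
            line.append(System.lineSeparator());
            try {
                Files.write(file, line.toString().getBytes(),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                e.printStackTrace();  // a real version would surface this
            }
        });
    }

    // Waits for pending writes to finish before the program exits.
    @Override
    public void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The computation thread just calls checkpointRow after each row; only close() at shutdown waits for the disk.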
A quick warning on the implementation of any such failsafe method
If there is an unchecked edge case where some sort of invalid matrix would crash the program, this failsafe now bricks the program entirely by retrying the bad input on every start. To combat this, I see some obvious solutions, but perhaps a bit of thought would let you adapt my approaches to something you prefer:
Use try and catch statements liberally; if you get any sort of error that seems to be caused by malformed data, wipe your recovery file, or modify it to add a note that tells your program to treat it as a special case. A good treatment of this special case may be to display the two matrices at start with an explanation that your program failed to multiply them, likely due to malformed content.
Add data to your file/database on how many times the program has quit while solving the current problem; if this is not the first resume, treat it like the special case in the above option.
I hope that this provided enough information for you to implement your failsafe in the way that makes the most sense given what you suspect the realistic use to be. Note that there are perhaps other ways you could approach this problem as well, and these would equally have their own lists of pros and cons to take into consideration.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I created a simple REST service POST method that consumes XML. I built a REST client with Jersey, created my object, and am trying to see how the response time varies as I increase the XML length by giving larger inputs to my objects. Say my object holds simple employee details; I increase their size. I see that the response time varies inconsistently; from my observation it does not depend on the size of the XML. I am computing the time taken as follows.
long startTime = System.currentTimeMillis();
// code for the POST request goes here
long elapsedTime = System.currentTimeMillis() - startTime;
Please suggest if there is a better way of doing it.
What I would like to get clarified is: my server is on localhost, so why does the response time vary (say, once it is 88 ms and another time it is 504 ms)? I expect it to increase when I give larger XML inputs, but from what I observe it does not. Please clarify, or point me to a better site or book where I can read about this.
Note that your question is quite broad (and will likely be closed as such). My explanation is similarly broad and just meant to give you some background on why you might see the behavior that you are seeing.
It is unlikely that the way you measure time makes a big difference here, given that you are up to hundreds of milliseconds. It is more likely that the service you are invoking sometimes takes longer to respond.
It may help to compare it to what you see when you type a search query into Google. Sometimes the response pops up "instantaneously", but sometimes it takes a few moments to load. Since you're using the same browser for every search, it can't be the browser causing the difference.
Instead it is likely something in the web service that is varying between the calls. In the Google search example, you might be routed to a different Google server, the server might be using a different storage to search, the query might be cached at some point, etc.
So there is no way to determine why the performance is different between invocations by simply looking at the client code. Instead, you should look at the server code and profile what happens there for different invocations.
Use System.nanoTime instead of currentTimeMillis. Also be aware that Java optimizes on the fly: if you do not warm up the server first, you will be timing interpreted execution of the bytecode, which at some point flips over to compiled native code. The timings will differ by input length (in a step-like function), but you have a certain amount of noise to overcome before you will see that: a lot from GC, a lot from thread scheduling and OS issues.
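A sketch of that warm-up-then-measure pattern; doPost here is a stand-in I made up for the real Jersey POST call, so the work it does is meaningless except as something to time:

```java
// Sketch of warming up before measuring with System.nanoTime.
// doPost is a placeholder for the real HTTP POST; it just burns CPU
// proportional to the payload size so there is something to time.
public class TimingSketch {

    static int doPost(int payloadSize) {
        int acc = 0;
        for (int i = 0; i < payloadSize * 1000; i++) acc += i % 7;
        return acc;
    }

    // Average milliseconds per call over many iterations, to smooth
    // out GC pauses and scheduler noise.
    static double measureMillis(int payloadSize, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) doPost(payloadSize);
        return (System.nanoTime() - start) / 1e6 / iterations;
    }

    public static void main(String[] args) {
        // Warm-up pass: let the JIT compile the hot path before timing it.
        measureMillis(100, 1000);
        double avg = measureMillis(100, 1000);
        System.out.printf("avg: %.3f ms%n", avg);
    }
}
```

Averaging over many calls after a warm-up pass is what separates the 88 ms vs 504 ms outliers from the real trend against input size.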
I recommend the mechanical sympathy group for discussions on this sort of thing. There are many discussions on how to measure these types of systems. https://groups.google.com/forum/#!forum/mechanical-sympathy
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am thinking of assembling this system:
AMD CPU (A8-3870 APU, which has a Radeon HD 6550D inside: 400 stream processors, xxx GFLOPS), nearly $110
AMD graphics card: HD 7750 (512 stream processors, 819 GFLOPS peak performance), nearly $170
appropriate RAM (1600 MHz bus) and a mainboard
Can I achieve the 819+xxx GFLOPS peak performance mentioned on official sites using OpenCL and similar programs?
Can I use all 912 cores with OpenCL/JOCL, and is it important to add the CPU cores to the pot (4 of them, although of course 2 of them will be used for feeding the GPU)?
C++ or Java: which one has the better libraries for using the multiple GPUs or APUs present in a computer?
What happens if I drop both the APU and the GPU and buy a single Nvidia GTX 660 ($229, 1800 GFLOPS) with the cheapest simple 4-core CPU without an APU? Does that win?
I am not trying to start a versus question. I need to know what would be better for scientific computing (75% of the time) and gaming (25% of the time), because I have a low budget. By "scientific calculations" I mean fluid dynamics and solid-state physics simulations. By games I mean those that have OpenCL and PhysX.
Can you give a very minimal, simple example of OpenCL code using multiple GPUs?
Thank you.
Can I achieve the 819+xxx GFLOPS mentioned on official sites using OpenCL and similar programs?
This is the peak performance. One definition of peak performance: a manufacturer's guarantee that you will not exceed this rating.
You can most likely achieve this number, but not while doing something useful. What you can achieve for your specific requirement depends greatly on what it is. You might expect to get 0.1% to 10% of this value in reality.
C++ or Java, which one has the most yielding libraries for using multiple gpu's or apu's present on computer?
I would use whichever you are most comfortable with. You can call the GPU from either, and the kernel language you write is C-like, so it doesn't matter much what the "host" language is.
What happens if I cancel both APU and GPU and buy a single Nvidia GTX 660?
Impossible to say, but there is a good chance whatever you choose will be okay.
Can you give a very minimal, simple example of OpenCL code using multiple GPUs?
There are lots of examples on the web, but you really need to focus on what you will be using the system for.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
Now that the oligopoly of market data providers has successfully killed OpenQuant, does any alternative to proprietary and expensive subscriptions for realtime market data exist?
Ideally I would like to be able to monitor tick by tick securities from the NYSE, NASDAQ and AMEX (about 6000 symbols).
Most vendors put a limit of 500 symbols watchable at the same time, which is unacceptable to me, even if one can imagine a rotation among the 500 symbols, i.e., making windows of 5 s of effective observation out of each minute for every symbol.
Currently I'm doing this with a Java thread pool calling Google Finance, but this is unsatisfactory for several reasons, one being that Google doesn't return the volume traded, but the main one being that Google promptly kills bots attempting to take advantage of this service ;-)
Any hint much appreciated,
Cheers
I think you'll find all you need to know by looking at this question: source of historical stock data
I don't know of any free data feeds other than Yahoo!, but it doesn't offer tick-by-tick data; it only offers 1-minute intervals with a 15-minute delay. If you want to use an existing tool to download the historical data, then I would recommend EclipseTrader. It only saves the Open, Close, High, Low, and Volume.
You can write your own data scraper with very little effort. I've written an article on downloading real-time data from Yahoo on my blog, but it's in C#. If you're familiar with C# then you'll be able to translate it to Java pretty quickly. If you write your own data scraper then you can get pretty much ANYTHING that Yahoo! shows on their web site: Bid, Ask, Dividend/Share, Earnings/Share, Day's High, Day's Low, etc.
If you don't know C# then don't worry, it's REALLY simple: Yahoo allows you to download CSV files with quotes just by modifying a URL. You can find out everything about the URL and the tags that are used on yahoo here: http://www.gummy-stuff.org/Yahoo-data.htm
Here are the basic steps you need to follow:
Construct a URL for the symbol or multiple symbols of your choice.
Add the tags you're interested in downloading (Open, Close, Volume, Beta, 52-week high, etc.).
Create a URLConnection with the URL you just constructed.
Use a BufferedReader to read the CSV file that is returned from the connection stream.
Your CSV will have the following format:
Each row is a different symbol.
Each column is a different tag.
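A sketch of the reading step in Java; the network call is replaced here by a canned string so it runs anywhere (whether Yahoo still serves this CSV format is not guaranteed, and parseCsv is my own helper name). A real run would wrap new URL(url).openStream() in the same BufferedReader:

```java
import java.io.*;
import java.util.*;

// Sketch: read the CSV returned by the quote URL and split it into
// rows (one per symbol) and columns (one per requested tag).
// The sample simulates a response for tags symbol, open, volume.
public class QuoteCsvSketch {

    static List<String[]> parseCsv(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Naive split: fine as long as no tag value contains a comma.
                rows.add(line.split(","));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String simulatedResponse =
                "\"MSFT\",26.94,45678900\n\"AAPL\",363.32,12345678\n";
        for (String[] row : parseCsv(new StringReader(simulatedResponse))) {
            System.out.println(row[0] + " open=" + row[1] + " volume=" + row[2]);
        }
    }
}
```

Swapping the StringReader for an InputStreamReader over the URL connection stream gives you the real download, one symbol per row and one tag per column, exactly as described above.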
Open a TDAmeritrade account and you will have free access to the ThinkOrSwim real-time trading and quotes platform. Live trading is real time and paper trading is delayed 15 minutes. I forget what the minimum deposit required to open a TDAmeritrade account is, but you can go to TDAmeritrade.com or thinkorswim.com to check them out.
Intrinio has a bunch of feeds with free and paid tiers. Essentially you only have to pay for what you need as opposed to the bigger data suppliers. Intrinio focuses on data quality and caters to developers as well, so I think it'd be a great option for you.
full disclosure - I work at Intrinio as a developer
There's a handy function in Google Sheets (ImportHTML) which I've been using for a while to reasonable effect.
For example -
=Index(ImportHTML("http://www.bloomberg.com/markets/commodities/futures/metals/","table",1),5,3) returns the EUR Gold spot price.
It works with Yahoo too, so =Index(ImportHTML("http://finance.yahoo.com/q?s=DX-Y.NYB","table",0),2,2) returns the DXY.
The data updates with some small delay but it's usable.