LSH implementation for finding clusters [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Hi guys.
I am very new to Stack Exchange, and I am currently doing research on graph theory.
The questions I'm going to ask are very introductory, since I'm a beginner-level programmer (not yet acquainted with hashing, buckets, vectors and similar data structures).
My idea is to take in a dataset of the form (timestamp t, node i, node j), where each record says that there is an edge between i and j at time t. The idea is to look up the neighborhood set of each node and hash it. If their "vectors" (I don't understand what that is) hash into the same bucket, those nodes are candidates for cluster formation.
The problem is that I want to run experiments, but I have no idea how to implement a hash function and then group the nodes into buckets.
I'm not asking you to write the code for me, but a pointer (pseudocode) would be very helpful, e.g. telling me to initialize a hash table, and so on.

A hash code is an integer which is calculated from the properties of whatever it is you want to hash. That number is then used as an index into an array.
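To make "hash code as an array index" concrete, here is a minimal Java sketch: a fixed array of buckets, where each key's hashCode() is reduced to a valid index with Math.floorMod, and keys that land on the same index share a bucket. The class name BucketTable and the bucket count are hypothetical choices for illustration; in real code java.util.HashMap does all of this for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal hash table with separate chaining (one list per bucket).
class BucketTable {
    private final List<List<String>> buckets = new ArrayList<>();

    BucketTable(int bucketCount) {
        for (int b = 0; b < bucketCount; b++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode() can be negative, so floorMod keeps the index in [0, bucketCount).
    public int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    public void add(String key) {
        buckets.get(indexFor(key)).add(key);
    }

    // Every key stored in the same bucket as the given key (including itself, if added).
    public List<String> bucketOf(String key) {
        return buckets.get(indexFor(key));
    }

    public static void main(String[] args) {
        BucketTable table = new BucketTable(8);
        table.add("node-1");
        table.add("node-2");
        // Prints the bucket holding node-1, plus any key whose hash collided with it.
        System.out.println(table.bucketOf("node-1"));
    }
}
```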
In this case it seems that you want to use the N dimensions of your vector to calculate this hash code. It's up to you to write a function that calculates the hash codes in such a way that vectors that should be clustered together get the same hash code (or, in LSH, are very likely to).
Language-specific details about hash tables in Java or Python are easy to find with a web search.
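For sets such as neighborhoods, one common way to get "similar sets usually land in the same bucket" is MinHash with banding, a standard LSH family for Jaccard similarity. The sketch below is an illustration under stated assumptions, not the asker's or the answerer's method: it ignores the timestamps, treats each node's neighbor set as its "vector", uses random hash functions of the form (a*x + b) mod (2^31 - 1), and all names (NeighborhoodLsh, bands, rows, seed) are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: MinHash LSH over neighborhood sets built from (t, i, j) edges.
class NeighborhoodLsh {
    private static final long PRIME = 2_147_483_647L; // 2^31 - 1, a prime
    private final int bands;
    private final int rows;
    private final long[] a; // multipliers of h_k(x) = (a[k] * x + b[k]) mod PRIME
    private final long[] b; // offsets of h_k

    NeighborhoodLsh(int bands, int rows, long seed) {
        this.bands = bands;
        this.rows = rows;
        Random rnd = new Random(seed);
        a = new long[bands * rows];
        b = new long[bands * rows];
        for (int k = 0; k < a.length; k++) {
            a[k] = 1 + rnd.nextInt((int) PRIME - 1); // never 0
            b[k] = rnd.nextInt((int) PRIME);
        }
    }

    // Neighbor set of every node; the timestamp e[0] is ignored in this sketch.
    public static Map<Integer, Set<Integer>> neighborhoods(int[][] edges) {
        Map<Integer, Set<Integer>> nbrs = new HashMap<>();
        for (int[] e : edges) { // e = {t, i, j}
            nbrs.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[2]);
            nbrs.computeIfAbsent(e[2], k -> new HashSet<>()).add(e[1]);
        }
        return nbrs;
    }

    // MinHash signature: for each hash function, the smallest hash value over the set.
    public long[] signature(Set<Integer> set) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int x : set) {
            long v = Math.floorMod((long) x, PRIME);
            for (int k = 0; k < sig.length; k++) {
                sig[k] = Math.min(sig[k], (a[k] * v + b[k]) % PRIME); // fits in a long
            }
        }
        return sig;
    }

    // Split each signature into bands; nodes with an identical band share that band's bucket.
    public Map<String, Set<Integer>> buckets(Map<Integer, Set<Integer>> nbrs) {
        Map<String, Set<Integer>> table = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : nbrs.entrySet()) {
            long[] sig = signature(e.getValue());
            for (int band = 0; band < bands; band++) {
                long[] slice = Arrays.copyOfRange(sig, band * rows, (band + 1) * rows);
                String key = band + ":" + Arrays.toString(slice);
                table.computeIfAbsent(key, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return table;
    }

    // Candidate clusters: every bucket that ended up holding two or more nodes.
    public Set<Set<Integer>> candidateGroups(int[][] edges) {
        Set<Set<Integer>> groups = new HashSet<>();
        for (Set<Integer> bucket : buckets(neighborhoods(edges)).values()) {
            if (bucket.size() > 1) {
                groups.add(bucket);
            }
        }
        return groups;
    }
}
```

Nodes with identical neighbor sets always share every bucket; nodes with merely similar sets share at least one bucket with a probability that grows with their Jaccard similarity. More rows per band makes each bucket stricter, while more bands makes a match more likely.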

Related

Huffman tree without using priority queues [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 months ago.
For my project I have to build a Huffman tree, but my lecturer has said that I cannot use priority queues to build it.
But I don't know how to do that.
Are there any other ways I can create a Huffman tree without using priority queues?
This is an example of a Huffman tree, but it is built using priority queues:
[Images: example Huffman tree built using priority queues]

There's a trick that is often used to build Huffman trees in practice:
Create a list of your symbols with their probabilities, and sort it in ascending order of probability.
Create an initially empty list for combined symbols. This list will remain sorted as we work.
While there is more than one symbol in the two lists combined:
Remove the two smallest symbols. Each one is at the front of one of the two lists (both may come from the same list).
Combine them into a new symbol whose probability is the sum of theirs, and append it to the end of the combined list. Because each new combined symbol has a probability at least as high as any combined symbol created before it, this list remains sorted.
After the initial sort, the smallest-probability symbol is always at the front of one of the two lists, so no priority queue or searching is required to find it.
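The two-list procedure above can be sketched in Java as follows. This is an illustrative sketch, not the lecturer's or the answerer's code: the class name TwoQueueHuffman is hypothetical, the two "lists" are ArrayDeques used as plain FIFO queues (no priority queue anywhere), and ties between the two fronts go to the original symbol list.

```java
import java.util.*;

// Hypothetical sketch of the two-list (two-queue) Huffman construction.
class TwoQueueHuffman {
    static final class Node {
        final double weight;
        final Character symbol; // null for combined (internal) nodes
        final Node left;
        final Node right;

        Node(double weight, Character symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    // The smallest remaining node is always at the front of one of the two queues.
    private static Node takeSmallest(Deque<Node> leaves, Deque<Node> combined) {
        if (combined.isEmpty()) return leaves.pollFirst();
        if (leaves.isEmpty()) return combined.pollFirst();
        return leaves.peekFirst().weight <= combined.peekFirst().weight
                ? leaves.pollFirst()
                : combined.pollFirst();
    }

    public static Node build(Map<Character, Double> probabilities) {
        List<Node> sorted = new ArrayList<>();
        for (Map.Entry<Character, Double> e : probabilities.entrySet()) {
            sorted.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        sorted.sort(Comparator.comparingDouble(n -> n.weight)); // the only sort needed
        Deque<Node> leaves = new ArrayDeque<>(sorted);
        Deque<Node> combined = new ArrayDeque<>();
        while (leaves.size() + combined.size() > 1) {
            Node first = takeSmallest(leaves, combined);
            Node second = takeSmallest(leaves, combined);
            // Appending keeps 'combined' sorted: combined weights never decrease.
            combined.addLast(new Node(first.weight + second.weight, null, first, second));
        }
        return leaves.isEmpty() ? combined.peekFirst() : leaves.peekFirst();
    }

    // Walks the tree: left edge = "0", right edge = "1".
    public static void codes(Node node, String prefix, Map<Character, String> out) {
        if (node.symbol != null) {
            out.put(node.symbol, prefix.isEmpty() ? "0" : prefix); // single-symbol input
            return;
        }
        codes(node.left, prefix + "0", out);
        codes(node.right, prefix + "1", out);
    }
}
```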
This technique is quite clever, and your lecturer would not expect you to think of it yourself, so it was probably taught or referenced in class.

Run a math expression from string in java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to know if there is an efficient method in Java to run a math expression from a string, given some example inputs and the results of that function.
Starting from simple linear functions such as a*x+b, and moving on to more complex ones.
Or is there any good source I can start reading?

I take your task to be: take observed input-output pairs and learn some representation that is able to perform the same transformation on new inputs.
(Some) neural networks can learn an approximating function (see the universal approximation theorem), and probably other approaches can too, but there is something important to note:
Without assumptions about your function (e.g. smoothness), there can't be an algorithm that achieves what you want to do! Without assumptions, there are infinitely many approximating functions that are all equally good on your examples but behave arbitrarily differently on new data!
(I'm also ignoring special cases such as random data or cryptographic random generators, where this mapping can't be learned either: the former in theory, the latter at least in practice.)
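As an illustration of what such an assumption buys you: if you assume the hidden function is linear, a*x + b as in the question, ordinary least squares recovers a and b from example pairs. This is a standard technique swapped in for illustration rather than anything from the answer; the class name LinearFit is hypothetical, and the sketch says nothing about non-linear functions.

```java
// Hypothetical sketch: fit y ≈ a*x + b by ordinary least squares.
class LinearFit {
    // Returns {a, b} minimising the sum of (a*x_k + b - y_k)^2.
    public static double[] fit(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = xs.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int k = 0; k < n; k++) {
            sumX += xs[k];
            sumY += ys[k];
            sumXX += xs[k] * xs[k];
            sumXY += xs[k] * ys[k];
        }
        double denom = n * sumXX - sumX * sumX;
        if (denom == 0) {
            throw new IllegalArgumentException("all x values are equal");
        }
        double a = (n * sumXY - sumX * sumY) / denom;
        double b = (sumY - a * sumX) / n;
        return new double[] {a, b};
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 5, 7}; // generated by 2*x + 1
        double[] ab = fit(xs, ys);
        System.out.printf("a = %.3f, b = %.3f%n", ab[0], ab[1]); // a = 2.000, b = 1.000
    }
}
```

If the true function is not linear, this fit still returns some a and b; it just silently answers a different question, which is exactly the point the answer makes about assumptions.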

Creating my own Lists and Maps data structures [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
For practice I want to make my own lists and maps (like ArrayList, HashMap, HashSet etc.).
My goal is to have it as small and flexible as possible while still maintaining good performance. (long road...)
I have some questions:
1)
Unlike Sun, I don't have to take backwards compatibility into account.
So the first thing I wonder is: is there any good reason to keep both add and put?
Why not just one?
If I renamed put to add, would this cause problems, complexity, or confusion down the road?
2)
Are there any languages known to have really good data structures? (For example, ones smart enough to avoid a concurrency exception.)
3)
Lastly, more a request than a question: if you have any tips or a vision of how things could be done differently, please post them.

There are no duplicated methods: Collection's add method returns a boolean saying whether the collection changed, while Map's put method returns the value previously associated with the key (or null if there was none).
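A short demo with the JDK's own HashSet and HashMap shows why the two names carry two different contracts; renaming put to add would hide that the return values mean different things. The class name AddVersusPut is just for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddVersusPut {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("x")); // true: the set changed
        System.out.println(set.add("x")); // false: "x" was already present

        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("x", 1)); // null: no previous value for "x"
        System.out.println(map.put("x", 2)); // 1: the value that was replaced
    }
}
```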
There are plenty of example data structures; the real question is what you need your data structure to do best. Avoid concurrency problems? Keep things sorted? Be fast? Store data securely?
The examples you need are right in the Java source code, for instance:
List
ArrayList
HashMap
and so on.

Search for a "close point" efficiently [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm asking this because I'm sure it has been brought up before, but I'm not sure what it's called.
I need an efficient way to store and search points in a metric space. Specifically, I need to find the weather at some points in space and time. I have an API to do that, but I don't want to make another request if I have already queried a point a few inches away from the new point and a few seconds before it, since the weather there would be the same.
So when I receive a new point, I need to ask: do I have a point in the cache that is "close enough" (whose distance from the new point is below a threshold)?
If I do, I take the data associated with that point. Otherwise, I cache the new point.
This can be done easily with a sequential check over all cached points, but I'm interested in ways to do it more efficiently.
Thanks!

Suppose your threshold is t. You can split your search space into a grid of cells, each t wide and t high.
Every cell holds a list of the points that lie in it.
When given a new point, compute which cell it falls into; call it cell [i,j]. Check this cell and all of its neighbors (9 cells altogether) for points; any points they contain are your candidates for closer-than-threshold points.
You then compute the distance to each of these candidates.
Since the cells are t wide and t high, every point lying in a cell outside this 3×3 block is at least t away, so it never needs to be checked.
You can store the grid cells in a TreeMap, with a comparator based on the [i,j] pairs
(you only store the cells that have at least one point in them).
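A minimal Java sketch of the grid approach above, assuming 2D points (x, y) and a strict "closer than t" test; the class and method names (GridCache, findClose, lookupOrInsert) are hypothetical. It follows the answer's suggestion of a TreeMap with a comparator on [i, j] pairs and stores only non-empty cells.

```java
import java.util.*;

// Hypothetical sketch: grid-bucketed cache of 2D points with a distance threshold t.
class GridCache {
    private final double t; // threshold, also the cell width and height
    // Only non-empty cells are stored; keys are {i, j}, compared lexicographically.
    private final TreeMap<long[], List<double[]>> cells = new TreeMap<>(
            Comparator.<long[]>comparingLong(c -> c[0]).thenComparingLong(c -> c[1]));

    GridCache(double threshold) {
        this.t = threshold;
    }

    private long cellIndex(double coordinate) {
        return (long) Math.floor(coordinate / t);
    }

    // Returns a cached point closer than t to (x, y), or null if there is none.
    public double[] findClose(double x, double y) {
        long i = cellIndex(x);
        long j = cellIndex(y);
        for (long di = -1; di <= 1; di++) {
            for (long dj = -1; dj <= 1; dj++) { // the 3×3 block around [i, j]
                List<double[]> pts = cells.get(new long[] {i + di, j + dj});
                if (pts == null) continue;
                for (double[] p : pts) {
                    if (Math.hypot(p[0] - x, p[1] - y) < t) return p;
                }
            }
        }
        return null;
    }

    // The question's rule: reuse a close cached point if one exists, otherwise cache the new one.
    public double[] lookupOrInsert(double x, double y) {
        double[] hit = findClose(x, y);
        if (hit != null) return hit;
        double[] p = {x, y};
        cells.computeIfAbsent(new long[] {cellIndex(x), cellIndex(y)}, k -> new ArrayList<>()).add(p);
        return p;
    }
}
```

Adding time as a third coordinate would turn the 3×3 block into 3×3×3 = 27 cells, but you would first have to choose how to scale seconds against inches so that a single threshold makes sense; that scaling is an assumption this sketch does not make.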

Wouldn't you know. Going to a search engine called Google and using some time to string words together, you get Nearest neighbor search as the first result. Who'da thunk it.

Maybe what you need is a Voronoi diagram built from your cache points.

Algorithm for finding trends in data? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm looking for an algorithm that is able to find trends in large amounts of data. For instance, given pairs (t, x) of a time t and a variable x, such as {(1,1), (2,4), (3,9), (4,16)}, it should be able to figure out that the value of x for t=5 is 25. How is this normally implemented? Do most algorithms compute linear, quadratic, exponential, etc. curves of best fit and then choose the one with the lowest standard deviation? Are there other techniques for finding trends in data? Also, what happens when you increase the number of variables, e.g. to analyze large vectors?

This is a really complex question; try starting from: http://en.wikipedia.org/wiki/Interpolation
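As one concrete starting point from that page, Lagrange interpolation evaluates the unique polynomial of degree below n that passes through all n given points. This Java sketch (class name hypothetical) reproduces the question's example; it works here only because the data lie exactly on x = t², and with noisy data or far extrapolation an interpolating polynomial can behave badly, which is where regression comes in.

```java
// Hypothetical sketch: Lagrange interpolation through all given (t, x) points.
class LagrangeInterpolation {
    public static double interpolate(double[] ts, double[] xs, double t) {
        double result = 0;
        for (int k = 0; k < ts.length; k++) {
            double term = xs[k];
            for (int m = 0; m < ts.length; m++) {
                if (m != k) {
                    term *= (t - ts[m]) / (ts[k] - ts[m]); // k-th basis polynomial
                }
            }
            result += term;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] ts = {1, 2, 3, 4};
        double[] xs = {1, 4, 9, 16};
        System.out.println(interpolate(ts, xs, 5)); // approximately 25
    }
}
```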

There is no simple answer to a complex problem: http://en.wikipedia.org/wiki/Regression_analysis
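A small example of regression in that sense: least-squares polynomial fitting via the normal equations, solved with Gaussian elimination. This Java sketch is illustrative (class and method names are hypothetical, and a production version would prefer a QR decomposition for numerical stability); it needs at least degree + 1 distinct t values.

```java
// Hypothetical sketch: least-squares fit of x ≈ c0 + c1*t + ... + cd*t^d.
class PolynomialRegression {
    public static double[] fit(double[] ts, double[] xs, int degree) {
        int n = degree + 1;
        double[][] a = new double[n][n + 1]; // augmented normal equations [AᵀA | Aᵀx]
        for (int k = 0; k < ts.length; k++) {
            double[] powers = new double[2 * degree + 1];
            powers[0] = 1;
            for (int p = 1; p < powers.length; p++) powers[p] = powers[p - 1] * ts[k];
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) a[r][c] += powers[r + c];
                a[r][n] += powers[r] * xs[k];
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] coef = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = a[r][n];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * coef[c];
            coef[r] = s / a[r][r];
        }
        return coef;
    }

    // Evaluates the fitted polynomial with Horner's rule.
    public static double predict(double[] coef, double t) {
        double x = 0;
        for (int p = coef.length - 1; p >= 0; p--) x = x * t + coef[p];
        return x;
    }
}
```

Fitting degree 2 to the question's four points and predicting at t = 5 gives approximately 25. Fitting several models and keeping the one with the smallest error is close to what the question describes, but a higher-degree polynomial always fits the training points at least as well, so the comparison is only meaningful on data held out from the fit.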

A neural network might be a good candidate, especially if you want it to learn something nonlinear.
