Best way to implement friends list into a database? MySQL - java

So my project has a "friends list" and in the MySQL database I have created a table:
nameA
nameB
Primary Key (nameA, nameB)
This will lead to a lot of entries, but I'm not sure how else to do it while keeping the database normalised.
My project also uses Redis, so I could store them there instead.
When a person joins the server, I would then have to search all of the entries for rows where their name is nameA or nameB, and then pair those two names up as friends. This may also be inefficient.
Cheers.

The task is quite common. You want to store pairs where A|B has the same meaning as B|A. A table has columns, so one of the two will be stored in the first column and the other in the second, but which one goes first, which one second, and why?
One solution is to always store the lesser ID first and the greater ID second:
userid1 | userid2
--------+--------
1 | 2
2 | 5
2 | 6
4 | 5
This has the advantage that you store each pair only once, which feels natural, but the disadvantage that you must look up a person in both columns, and their friend is sometimes in the first column and sometimes in the second. That can make queries somewhat clumsy.
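As a rough illustration of the lesser-ID-first layout, here is an in-memory Java sketch. The class and method names are hypothetical, the rows live in a set instead of a MySQL table, and the SQL in the comment assumes a table named `friends`. It shows the two points above: the pair is normalized with Math.min/Math.max before it is stored, and a lookup has to check both columns.

```java
import java.util.*;

/** Hypothetical in-memory model of a table (userid1, userid2) that always stores the lesser ID first. */
class OrderedFriendPairs {
    private final Set<List<Integer>> rows = new LinkedHashSet<>();

    /** Normalizes the pair so that A|B and B|A end up as the same row. */
    void addFriendship(int a, int b) {
        if (a == b) throw new IllegalArgumentException("a user cannot befriend themselves");
        rows.add(List.of(Math.min(a, b), Math.max(a, b)));
    }

    /**
     * Looks the user up in BOTH columns and takes the value from the other column, roughly:
     * SELECT userid2 FROM friends WHERE userid1 = ? UNION SELECT userid1 FROM friends WHERE userid2 = ?
     */
    Set<Integer> friendsOf(int user) {
        Set<Integer> friends = new TreeSet<>();
        for (List<Integer> row : rows) {
            if (row.get(0) == user) friends.add(row.get(1));
            else if (row.get(1) == user) friends.add(row.get(0));
        }
        return friends;
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        OrderedFriendPairs f = new OrderedFriendPairs();
        int[][] pairs = {{1, 2}, {5, 2}, {2, 6}, {4, 5}}; // {5, 2} is stored as 2 | 5
        for (int[] p : pairs) f.addFriendship(p[0], p[1]);
        System.out.println(f.friendsOf(2)); // [1, 5, 6]
        System.out.println(f.friendsOf(5)); // [2, 4]
    }
}
```

Adding 2|1 after 1|2 does not create a second row, which is exactly the "store each pair only once" property.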
Another method is to store the pairs redundantly (by using a trigger typically):
userid1 | userid2
--------+--------
1 | 2
2 | 1
2 | 5
2 | 6
4 | 5
5 | 2
5 | 4
6 | 2
Here querying is easier: Look the person up in one column and find their friends in the other. However, it looks kind of weird to have all pairs duplicated. And you rely on a trigger, which some people don't like.
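For comparison, a minimal in-memory sketch of the redundant layout (again with hypothetical names, and assuming a `friends` table for the SQL in the comment). The addFriendship method writes both directions, which is the part a trigger would take care of in the database, so a lookup only ever searches one column.

```java
import java.util.*;

/** Hypothetical in-memory model of the redundant layout, where every pair is stored as A|B and B|A. */
class SymmetricFriendPairs {
    // userid1 -> all userid2 values, i.e. the table as seen through an index on its first column.
    private final Map<Integer, Set<Integer>> byFirstColumn = new HashMap<>();

    /** Writes both directions; in the database this is the job the trigger would do. */
    void addFriendship(int a, int b) {
        if (a == b) throw new IllegalArgumentException("a user cannot befriend themselves");
        byFirstColumn.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        byFirstColumn.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    /** Only one column to search, roughly: SELECT userid2 FROM friends WHERE userid1 = ? */
    Set<Integer> friendsOf(int user) {
        return byFirstColumn.getOrDefault(user, Collections.emptySet());
    }

    /** Number of stored rows: twice the number of friendships. */
    int rowCount() {
        int n = 0;
        for (Set<Integer> s : byFirstColumn.values()) n += s.size();
        return n;
    }

    public static void main(String[] args) {
        SymmetricFriendPairs f = new SymmetricFriendPairs();
        f.addFriendship(1, 2);
        f.addFriendship(2, 5);
        f.addFriendship(2, 6);
        f.addFriendship(4, 5);
        System.out.println(f.rowCount());   // 8
        System.out.println(f.friendsOf(2)); // [1, 5, 6]
    }
}
```

The row count of 8 for 4 friendships matches the duplicated table above.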
A third method is to store numbered friendships:
friendship | user_id
-----------+--------
1 | 1
1 | 2
2 | 2
2 | 5
3 | 2
3 | 6
4 | 4
4 | 5
This gives both users in the pair equal value. But in order to find friends, you need two passes: find the friendships for a user, then find the friends in these friendships. However, the design is very clear and even extensible, i.e. you could have friendships of three, four or more users.
No method is really much better than the others.
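The two passes of the third method (numbered friendships) can be sketched in memory as follows. The names are hypothetical, and the SQL in the comment assumes a table called `friendships` with the columns shown above; in SQL the two passes usually collapse into a single self-join.

```java
import java.util.*;

/** Hypothetical in-memory model of a (friendship, user_id) table. */
class NumberedFriendships {
    private final List<int[]> rows = new ArrayList<>(); // each entry is {friendship, user_id}
    private int nextId = 1;

    /** Creates one friendship with any number of members and returns its number. */
    int addFriendship(int... users) {
        int id = nextId++;
        for (int u : users) rows.add(new int[] {id, u});
        return id;
    }

    /**
     * The two passes, roughly equivalent to:
     * SELECT f2.user_id FROM friendships f1
     * JOIN friendships f2 ON f2.friendship = f1.friendship
     * WHERE f1.user_id = ? AND f2.user_id <> f1.user_id
     */
    Set<Integer> friendsOf(int user) {
        // Pass 1: find the friendships this user belongs to.
        Set<Integer> ids = new HashSet<>();
        for (int[] r : rows) if (r[1] == user) ids.add(r[0]);
        // Pass 2: collect the other members of those friendships.
        Set<Integer> friends = new TreeSet<>();
        for (int[] r : rows) if (ids.contains(r[0]) && r[1] != user) friends.add(r[1]);
        return friends;
    }

    public static void main(String[] args) {
        NumberedFriendships f = new NumberedFriendships();
        f.addFriendship(1, 2);
        f.addFriendship(2, 5);
        f.addFriendship(2, 6);
        f.addFriendship(4, 5);
        f.addFriendship(7, 8, 9); // a three-person friendship works the same way
        System.out.println(f.friendsOf(2)); // [1, 5, 6]
        System.out.println(f.friendsOf(8)); // [7, 9]
    }
}
```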

Related

Implement minhash LSH using Spark (Java)

This is quite long, and I am sorry about that.
I have been trying to implement the MinHash LSH algorithm discussed in chapter 3 using Spark (Java). I am using a toy problem like this:
+--------+------+------+------+------+
|element | doc0 | doc1 | doc2 | doc3 |
+--------+------+------+------+------+
| d | 1 | 0 | 1 | 1 |
| c | 0 | 1 | 0 | 1 |
| a | 1 | 0 | 0 | 1 |
| b | 0 | 0 | 1 | 0 |
| e | 0 | 0 | 1 | 0 |
+--------+------+------+------+------+
The goal is to identify which of these four documents (doc0, doc1, doc2 and doc3) are similar to each other. Obviously, the only possible candidate pair would be doc0 and doc3.
Using Spark's support, generating the following "characteristic matrix" is as far as I can get at this point:
+----+---------+-------------------------+
|key |value |vector |
+----+---------+-------------------------+
|key0|[a, d] |(5,[0,2],[1.0,1.0]) |
|key1|[c] |(5,[1],[1.0]) |
|key2|[b, d, e]|(5,[0,3,4],[1.0,1.0,1.0])|
|key3|[a, c, d]|(5,[0,1,2],[1.0,1.0,1.0])|
+----+---------+-------------------------+
And here are the code snippets:
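import org.apache.spark.ml.feature.CountVectorizer;
import org.apache.spark.ml.feature.MinHashLSH;
import org.apache.spark.ml.feature.MinHashLSHModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
// df has one row per document: "key" (document id) and "value" (array of its elements)
CountVectorizer vectorizer = new CountVectorizer().setInputCol("value").setOutputCol("vector").setBinary(false);
Dataset<Row> matrixDoc = vectorizer.fit(df).transform(df);
MinHashLSH mh = new MinHashLSH()
    .setNumHashTables(5)
    .setInputCol("vector")
    .setOutputCol("hashes");
MinHashLSHModel model = mh.fit(matrixDoc);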
Now, there seem to be two main methods on MinHashLSHModel that one can use: model.approxSimilarityJoin(...) and model.approxNearestNeighbors(...). Examples of using them are here: https://spark.apache.org/docs/latest/ml-features.html#lsh-algorithms
However, model.approxSimilarityJoin(...) joins two datasets, and I have only one dataset with 4 documents. I want to figure out which of these four are similar to each other, so I don't have a second dataset to join. Just to try it out, I joined my only dataset with itself. Based on the result, model.approxSimilarityJoin(...) seems to just do a pairwise Jaccard calculation, and changing the number of hash functions etc. had no visible impact, which left me wondering where exactly the MinHash signatures are calculated and where the band/row partitioning happens.
The other method, model.approxNearestNeighbors(...), asks for a comparison point (the key), and the model then identifies the nearest neighbor(s) to that point. This is not what I want either, since I have four toy documents and no extra reference point.
I was running out of ideas, so I went ahead and implemented my own version of the algorithm using Spark APIs, but without much support from MinHashLSHModel, which really made me feel bad. I think I must have missed something.
I would love to hear any thoughts, really wish to solve the mystery.
Thank you guys in advance!
The MinHash signature calculation happens inside model.approxSimilarityJoin(...) itself: model.transform(...) is called on each of the input datasets, and the hash signatures are calculated before the datasets are joined and the pairwise Jaccard distance is computed. So that is where the impact of changing the number of hash functions can be seen.
In model.approxNearestNeighbors(...), the same thing happens: transform(...) is applied to the input dataset (unless it already contains the hashes column), using the hash functions that minHash.fit(...) drew when the model was created.
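To make "where is the signature, where are the bands" concrete, here is a small hand-rolled MinHash plus banding sketch in plain Java on the toy matrix above. It is not Spark's implementation. According to the Spark ML documentation, MinHashLSH currently produces one vector of dimension 1 per hash table (AND-amplification is listed as future work), so there is no band/row split to tune. approxSimilarityJoin treats two rows as candidates if they collide in at least one hash table and then computes their exact Jaccard distance, which is why the result on a tiny dataset looks like a plain pairwise Jaccard calculation. The prime 7, the hash coefficients and the layout of 2 bands with 2 rows each below are arbitrary choices for illustration.

```java
import java.util.*;

/** Hypothetical hand-rolled MinHash + banding sketch; not Spark's MinHashLSH implementation. */
class MinHashBandingSketch {
    // Prime modulus larger than the number of distinct elements, so each hash permutes 0..PRIME-1.
    static final int PRIME = 7;
    // Arbitrary (a, b) coefficients for h(x) = (a * x + b) mod PRIME; one signature row each.
    static final int[][] COEFFS = {{1, 0}, {3, 1}, {2, 5}, {5, 3}};

    /** MinHash signature: for each hash function, the smallest hash of any element in the document. */
    static int[] signature(Set<Integer> doc) {
        int[] sig = new int[COEFFS.length];
        for (int i = 0; i < COEFFS.length; i++) {
            int min = Integer.MAX_VALUE;
            for (int x : doc) min = Math.min(min, (COEFFS[i][0] * x + COEFFS[i][1]) % PRIME);
            sig[i] = min;
        }
        return sig;
    }

    /** Exact Jaccard similarity |A intersect B| / |A union B|, used to verify candidates. */
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Banding: documents whose signatures agree on every row of at least one band become candidates. */
    static Set<List<Integer>> candidatePairs(List<int[]> sigs, int rowsPerBand) {
        Set<List<Integer>> pairs = new HashSet<>();
        int bands = sigs.get(0).length / rowsPerBand;
        for (int band = 0; band < bands; band++) {
            // Bucket documents by the slice of their signature that belongs to this band.
            Map<List<Integer>, List<Integer>> buckets = new HashMap<>();
            for (int doc = 0; doc < sigs.size(); doc++) {
                List<Integer> slice = new ArrayList<>();
                for (int r = band * rowsPerBand; r < (band + 1) * rowsPerBand; r++) slice.add(sigs.get(doc)[r]);
                buckets.computeIfAbsent(slice, k -> new ArrayList<>()).add(doc);
            }
            // Every pair of documents sharing a bucket is a candidate (lower index first).
            for (List<Integer> members : buckets.values())
                for (int i = 0; i < members.size(); i++)
                    for (int j = i + 1; j < members.size(); j++) pairs.add(List.of(members.get(i), members.get(j)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Element indices as assigned by the question's CountVectorizer: d=0, c=1, a=2, b/e=3/4.
        List<Set<Integer>> docs = List.of(
                Set.of(0, 2),     // doc0 = {a, d}
                Set.of(1),        // doc1 = {c}
                Set.of(0, 3, 4),  // doc2 = {b, d, e}
                Set.of(0, 1, 2)); // doc3 = {a, c, d}
        List<int[]> sigs = new ArrayList<>();
        for (Set<Integer> d : docs) sigs.add(signature(d));
        // 4 hash functions split into 2 bands of 2 rows each.
        for (List<Integer> p : candidatePairs(sigs, 2)) {
            double sim = jaccard(docs.get(p.get(0)), docs.get(p.get(1)));
            System.out.printf("candidate doc%d-doc%d, exact Jaccard %.2f%n", p.get(0), p.get(1), sim);
        }
    }
}
```

With these coefficients the signatures are doc0 = [0, 0, 2, 3], doc1 = [1, 4, 0, 1], doc2 = [0, 1, 4, 2] and doc3 = [0, 0, 0, 1]. Banding proposes doc0-doc3 (exact Jaccard 0.67) and doc1-doc3 (exact Jaccard 0.33); checking the exact similarity against a threshold such as 0.5 keeps only doc0-doc3. The false candidate doc1-doc3 is why LSH candidates are verified afterwards.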

Getting distinct combination of two columns

Using Hibernate, I want to get the rows of values such that:
col1 | col2
-----+-----
1 | 2
2 | 1
3 | 4
4 | 5
4 | 3
will produce:
col1 | col2
-----+-----
1 | 2
3 | 4
4 | 5
Can I get this in Hibernate on Grails? Or can anyone provide a MySQL implementation of this? I've been battling this long enough.
You can use MySQL's least() and greatest() functions to make sure that the smaller number always comes first and the larger one second. That way you can use distinct to eliminate duplicates:
select distinct least(col1, col2) as col1, greatest(col1, col2) as col2
from yourtable
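You can group the two columns like this (note that grouping by the sum treats different pairs with the same sum, such as 1 | 4 and 2 | 3, as duplicates, so this only works if no two distinct pairs share a sum):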
select * from yourtable group by (col1+col2);

Why does a Set insert the items in order?

As I understand it, a Set is not ordered. I just created a set and inserted the numbers 5, 2, 10.
When it is printed to the console, it prints 2, 5, 10.
Why, if a set is not ordered?
This is because it speeds up checking whether a certain element is part of the set.
The difference is that this behavior is not guaranteed. It may be beneficial to keep small sets ordered for fast lookup, but switch to a hash based implementation once a certain number of elements has been reached, at which point elements would suddenly be sorted by hash value.
Set is an interface with several implementations. HashSet does not guarantee insertion order (it is unordered), LinkedHashSet preserves insertion order, and TreeSet gives you a sorted set.
So when you insert 5, 2, 10 into a HashSet, you can't rely on getting them back in the same order.
Set is just an interface. Assuming you are talking about HashSet (because that's where this happens), it doesn't keep the elements sorted. For example:
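A quick way to see the difference is to add 5, 2, 10 to each implementation and print them. This is a minimal sketch; the HashSet line depends on OpenJDK's current bucket layout, where small Integer values land in the bucket matching their value, which is why it prints [2, 5, 10] there and looks sorted. That order is not promised by the Set contract.

```java
import java.util.*;

class SetOrderDemo {
    /** Adds the same numbers, in the same order, to three Set implementations. */
    static Map<String, Set<Integer>> fill(int... values) {
        Map<String, Set<Integer>> sets = new LinkedHashMap<>();
        sets.put("HashSet", new HashSet<>());             // order follows hash buckets, not guaranteed
        sets.put("LinkedHashSet", new LinkedHashSet<>()); // keeps insertion order
        sets.put("TreeSet", new TreeSet<>());             // keeps elements sorted
        for (Set<Integer> s : sets.values()) {
            for (int v : values) s.add(v);
        }
        return sets;
    }

    public static void main(String[] args) {
        // On OpenJDK: HashSet: [2, 5, 10], LinkedHashSet: [5, 2, 10], TreeSet: [2, 5, 10]
        fill(5, 2, 10).forEach((name, set) -> System.out.println(name + ": " + set));
    }
}
```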
HashSet<Integer> set = new HashSet<Integer>();
set.add(1);
set.add(16);
System.out.println(set);
Output
[16, 1]
This is because a HashSet uses the hashCode() method to compute the index at which an item is stored in an array-like structure. Since an element's hash code never changes, the set can find the element again by recomputing the hash code and checking the cell at that index.
The hashCode() method maps every object to an int:
System.out.println(Integer.valueOf(1).hashCode()); // 1
System.out.println(Integer.valueOf(1000).hashCode()); // 1000
System.out.println("Hello".hashCode()); // 69609650
Each class can define its own way of computing the hash code; Integer simply returns its own value.
As you can see, hash codes get big quickly, and we don't want an array with 1000 cells just to store two integers.
To avoid the problem, we can create an array with n cells and use the remainder of the hash code divided by n as the index.
For example, to find the index for 1000 in an array of 16 cells:
System.out.println(Integer.valueOf(1000).hashCode() % 16); // 8
So the set knows that the integer 1000 is at index 8. That is, roughly, how HashSet is implemented.
So why is [16, 1] not ordered? Because a HashSet starts with a capacity of 16 buckets (unless you specify otherwise) and grows as needed.
Let's compute the indexes for the keys 1 and 16 in a table with n = 16:
System.out.println(Integer.valueOf(1).hashCode() % 16); // 1
System.out.println(Integer.valueOf(16).hashCode() % 16); // 0
This means that the array backing the set will be:
| index | value |
|-------|-------|
| 0 | 16 |
| 1 | 1 |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | |
| 13 | |
| 14 | |
| 15 | |
Iterating over it, the order will be the one presented in this representation, so 16 will be before 1.
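The modulo arithmetic above is a simplification. OpenJDK's HashMap (which backs HashSet) keeps its capacity a power of two, mixes the high 16 bits of hashCode() into the low bits, and then masks with capacity - 1; for small non-negative Integer keys this gives the same index as hashCode() % capacity. The helper below is a hypothetical re-implementation for illustration only, not a public JDK method.

```java
class BucketIndexSketch {
    /**
     * Mirrors how OpenJDK's HashMap picks a bucket for a key:
     * spread the high bits of hashCode() into the low bits, then mask with (capacity - 1).
     * capacity must be a power of two. For small non-negative integers h >>> 16 is 0,
     * so the result equals h % capacity, the arithmetic used in the explanation above.
     */
    static int bucketIndex(Object key, int capacity) {
        int h = key.hashCode();
        int spread = h ^ (h >>> 16);
        return spread & (capacity - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(1, 16));       // 1
        System.out.println(bucketIndex(16, 16));      // 0
        System.out.println(bucketIndex(1000, 16));    // 8
        System.out.println(bucketIndex("Hello", 16)); // 4
    }
}
```

Because 16 lands in bucket 0 and 1 in bucket 1, iteration visits 16 first, which reproduces the [16, 1] output.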
Set is an interface. It only guarantees that the collection contains no duplicate elements.
HashSet internally uses a HashMap, and HashMap places entries according to their hash codes, so it won't return elements in any particular order. If you want insertion order, use LinkedHashSet (which is backed by a LinkedHashMap).
Set is just an interface. Ordering will depend on implementation. For example TreeSet is an ordered implementation of Set.

Interpret sentence and convert into their corresponding format

There are several input formats, each with a corresponding output:
1. 7 years 10 months ---> YRS:7 MNHS:10
2. 7 kgs 10 grms ---> KGS:7 GRMS:10
3. 7 kilograms 10 grams ---> KGS:7 GRMS:10
4. 7 thousand 9 hundred ---> 7900
5. seven years ten months ---> YRS:7 MNHS:10
6. seven kgs ten grms ---> KGS:7 GRMS:10
7. triple seven double five ---> 77755
I wrote a separate module for each of these, storing the information in a HashMap, and they work fine.
Now I need to write one main module whose input is a single sentence (utterance), in which I need to replace all of the substrings above with their corresponding outputs.
For example,
Input :- Dial number triple eight triple four three nine eight.
Output :- Dial number 888444398.
and many such utterances.
My questions:
I used a number of HashMaps in the smaller modules to store the meaning of keys, e.g. triple means 3 times, double means 2 times, and so on. The limitation is that whenever I need to add anything, I have to add an entry to a HashMap. Please suggest a good technique for this.
In the main module, I'm not sure how to extract the useful substrings (like the examples above) from a given utterance. Please suggest a good technique for this as well.
Project language: Java.
You should look at the Illinois Quantifier package:
http://cogcomp.cs.illinois.edu/page/software_view/Quantifier
http://cogcomp.cs.illinois.edu/demo/quantities/results.php
You might want to use some kind of formal grammar parser. Just designing the grammar can give you a clearer view of the problem. In the simplest case, your grammar could look like this:
STRING -> "" | STRING MEASUREMENT | STRING NUMBER | STRING WORD
MEASUREMENT -> NUMBER UNITS
UNITS -> kgs | grms | years | months | ...
NUMBER -> THOUSAND HUNDRED NUMBER_BELOW_HUNDRED | THOUSAND HUNDRED
THOUSAND -> "" | NUMBER_BELOW_HUNDRED thousand
HUNDRED -> "" | NUMBER_BELOW_HUNDRED hundred
NUMBER_BELOW_HUNDRED -> one | two | three | ... | ninety nine | 99 | 98 | ... | 1
WORD -> /* all other */
You can write a parser by yourself (in this case it seems to be pretty easy) or use a ready solution like Bison/Flex.
The usual alternative for your HashMaps are configuration files.
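For the digit-dictation part of the problem (the "triple eight triple four ..." example), a hand-written scanner may be enough before reaching for a full parser. This is a minimal sketch, not an implementation of the grammar above: it only recognizes single digit words and the multiplier words double and triple, copies every other word through unchanged, and does not handle measurements or thousand/hundred. The class name and vocabularies are hypothetical; keeping them in maps means they could be loaded from a configuration file instead of being hard-coded.

```java
import java.util.*;

/** Hypothetical scanner that turns spoken digit sequences into numerals inside a sentence. */
class DigitSpeechSketch {
    // Vocabularies; in practice these could be read from a configuration file.
    static final Map<String, String> DIGITS = Map.of(
            "zero", "0", "one", "1", "two", "2", "three", "3", "four", "4",
            "five", "5", "six", "6", "seven", "7", "eight", "8", "nine", "9");
    static final Map<String, Integer> REPEATS = Map.of("double", 2, "triple", 3);

    static String convert(String utterance) {
        List<String> out = new ArrayList<>();
        StringBuilder number = new StringBuilder(); // digits of the run currently being read
        String pendingRepeat = null;                 // "double"/"triple" still waiting for its digit
        for (String token : utterance.trim().split("\\s+")) {
            // Split off trailing punctuation such as "." so "eight." is still recognized.
            String word = token.replaceAll("\\p{Punct}+$", "");
            String punct = token.substring(word.length());
            String key = word.toLowerCase(Locale.ROOT);
            if (pendingRepeat == null && punct.isEmpty() && REPEATS.containsKey(key)) {
                pendingRepeat = word;
                continue;
            }
            if (DIGITS.containsKey(key)) {
                int times = pendingRepeat == null ? 1 : REPEATS.get(pendingRepeat.toLowerCase(Locale.ROOT));
                number.append(DIGITS.get(key).repeat(times));
                pendingRepeat = null;
                if (!punct.isEmpty()) { // punctuation ends the number
                    out.add(number + punct);
                    number.setLength(0);
                }
                continue;
            }
            // Any other word ends the current run of digits and is copied unchanged.
            if (number.length() > 0) { out.add(number.toString()); number.setLength(0); }
            if (pendingRepeat != null) { out.add(pendingRepeat); pendingRepeat = null; }
            out.add(token);
        }
        if (number.length() > 0) out.add(number.toString());
        if (pendingRepeat != null) out.add(pendingRepeat);
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(convert("Dial number triple eight triple four three nine eight."));
        // Dial number 888444398.
        System.out.println(convert("triple seven double five")); // 77755
    }
}
```

A "double" or "triple" that is not followed by a digit word is put back into the output as an ordinary word.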

Combinations Array Java

Given an array of any size (for instance [1 2 3]), I need a function that gives all combinations, like:
1 |
1 2 |
1 2 3 |
1 3 |
2 |
2 1 3 |
2 3 |
...
Since I'm guessing this is homework, I'll try to refrain from giving a complete answer.
Suppose you already had all combinations (or permutations if that is what you are looking for) of an array of size n-1. If you had that, you could use those combinations/permutations as a basis for forming the new combinations/permutations by adding the nth element to them in the appropriate way. That is the basis for what computer scientists call recursion (and mathematicians like to call a very similar idea induction).
So you could write a method that handles the n case, assuming the n-1 case has already been handled, and add a check to handle the base case as well.
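The listing in the question is ambiguous between combinations and permutations (it includes 2 1 3), so this sketch covers only one interpretation, combinations as subsets, and only to show the shape of the recursion described above: the subsets of the first n elements are the subsets of the first n-1 elements, plus each of those subsets with the nth element appended. The class and method names are illustrative.

```java
import java.util.*;

class SubsetsSketch {
    /** All subsets of the first n elements of arr, built from the subsets of the first n - 1. */
    static List<List<Integer>> subsets(int[] arr, int n) {
        if (n == 0) {
            // Base case: the only subset of zero elements is the empty subset.
            List<List<Integer>> base = new ArrayList<>();
            base.add(new ArrayList<>());
            return base;
        }
        List<List<Integer>> smaller = subsets(arr, n - 1);     // the n-1 case, assumed handled
        List<List<Integer>> result = new ArrayList<>(smaller); // subsets without arr[n - 1]
        for (List<Integer> s : smaller) {
            List<Integer> withLast = new ArrayList<>(s);
            withLast.add(arr[n - 1]); // the same subset, extended by the nth element
            result.add(withLast);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        for (List<Integer> s : subsets(arr, arr.length)) {
            if (!s.isEmpty()) System.out.println(s);
        }
    }
}
```

For [1 2 3] this prints [1], [2], [1, 2], [3], [1, 3], [2, 3] and [1, 2, 3], one per line. Permutations follow the same pattern, except that the nth element is inserted at every position of each smaller permutation instead of being appended.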
