Faster SQL data retrieval with Java and searching large data

I have a table with over 100 thousand rows consisting of number pairs. A sample is shown below.
A B
0010 0010
0010 0011
0010 0019
0010 0056
0011 0010
0011 0011
0011 0019
0011 0040
0019 0010
0019 0058
Each number in column A is paired in column B with the numbers it must not be combined with.
Explanation: the user will have a list of these numbers, somewhere between 10 and 100 of them. As we can see, 0010 is paired with 0011 and 0019. So if the user's list contains 0010 along with 0011, a warning should be shown that this pair is not allowed, and vice versa.
How should I approach this in Java?
Loading a hash map with all the data does not seem to be a good option, although the search would be much faster.
Please suggest. Thanks.
Test case:
String num = "0010"; // value from the list which the user will be passing
void test(String num) {
    if (num.equals("0019") || num.equals("0011")) // comparing with the database
        System.out.println("incompatible pair present");
}
The above example is very simple pseudocode. The actual problem will be much more complex.

Until the question is clearer...
Since you are handling a large amount of data that is already stored in a database, here is a recommendation: whatever you want to do, consider solving it in SQL instead of Java, or at least write a SQL query whose ResultSet is easy to evaluate in Java afterwards.
But the question is not that clear yet...

Are you trying to find entries where A is the same value but B is different?
SELECT t1.a, t1.b, t2.b
FROM MyTable t1, MyTable t2
WHERE t1.a = t2.a AND t1.b <> t2.b

If you're worried about running out of heap space, you could try using a persistent cache like ehcache. I suggest you check the actual memory consumed before going for this solution, though.

It seems your problem is limited to a very small domain. Why can't you instantiate a two-dimensional array of boolean and set an entry to true whenever the two numbers used as its indexes form an unsupported combination?
Example of usage:
if (forbidden[10][11] || forbidden[11][10])
{
    throw new Exception("pairs of '10' and '11' are not allowed");
}
You can populate this array from the database by going over the data once. You just need to translate 0010 to 10. Indexes 0-9 will be unused, but you can eliminate them by shifting the index, i.e. subtracting 10 from each number.
Does that hit your question?

If I have understood correctly what you want to do…
Create a unique index on t1(a,b). Put the user's new pair in an INSERT statement inside a try block. Catch key violation exceptions (this will be a SQLException, possibly a subclass depending on your RDBMS) and explain to the user that it is a forbidden pair.

A simple, though definitely not scalable, solution if your ranges really are 0000 - 9999:
Simply have a byte table with 100,000,000 entries (10,000 x 10,000).
Each entry is a simple 0 for allowed or 1 for not allowed.
You find an entry in the table by logically concatenating the two numbers of the pair (key = first * 10000 + second).
The more scalable database solution is to create a table with a composite primary key (pair1 and pair2), the mere presence of an entry indicating a disallowed pair.
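As a rough illustration of the lookup-table idea, here is a minimal sketch. It uses a java.util.BitSet instead of a byte array (one bit per pair, about 12.5 MB for the full range) and a multiplier of 10,000 so that two four-digit values can never collide. The class and method names are made up for this example.

```java
import java.util.BitSet;

final class ForbiddenPairs {
    private static final int RANGE = 10_000; // assumes numbers 0000..9999
    private final BitSet table = new BitSet(RANGE * RANGE);

    // "Concatenate" the two numbers into a single table index.
    private static int key(int first, int second) {
        return first * RANGE + second;
    }

    // Call once per database row while loading the table.
    void forbid(int a, int b) {
        table.set(key(a, b));
    }

    // A pair is forbidden regardless of the order it was stored in.
    boolean isForbidden(int a, int b) {
        return table.get(key(a, b)) || table.get(key(b, a));
    }

    // True if any two distinct entries of the user's list form a forbidden pair.
    boolean hasConflict(int[] numbers) {
        for (int i = 0; i < numbers.length; i++) {
            for (int j = i + 1; j < numbers.length; j++) {
                if (isForbidden(numbers[i], numbers[j])) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a user list of at most 100 numbers, hasConflict does at most 4,950 pair checks, each a constant-time bit lookup.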

To clarify the question:
You have a table in which each record contains two numbers that are declared 'incompatible'.
You have a user list of numbers and you want to check whether this list contains 'incompatible numbers'. Right?
Here you go with a simple SQL statement (I took your example from the comment):
SELECT *
FROM incompatible
WHERE A IN (1, 14, 67) AND B IN (1, 14, 67);
This SQL returns all incompatibilities. When the result set is empty, there are no incompatibilities and everything is fine. If you only want to retrieve this fact, you can write SELECT 1 ... instead.
The SQL has to be built dynamically to contain the user's numbers in the IN clauses, of course.
To speed up queries you can create a (unique) index over both columns, so the database can do an index range scan.
If this table does not yet have a primary key, you should create a primary key over both columns.
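Building the IN clauses dynamically could look roughly like the sketch below. It uses plain JDBC with placeholders rather than string-concatenated values. The table and column names (incompatible, A, B) are taken from the query above, the numbers are assumed to be stored as strings such as "0010", and the class and method names are invented for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class IncompatibilityCheck {
    // Builds "SELECT 1 ... WHERE A IN (?, ?, ...) AND B IN (?, ?, ...)".
    static String buildQuery(int count) {
        String placeholders = String.join(", ", Collections.nCopies(count, "?"));
        return "SELECT 1 FROM incompatible WHERE A IN (" + placeholders
                + ") AND B IN (" + placeholders + ")";
    }

    // True if at least one row matches, i.e. the list contains an incompatible pair.
    static boolean hasIncompatiblePair(Connection conn, List<String> numbers)
            throws SQLException {
        if (numbers.isEmpty()) {
            return false;
        }
        try (PreparedStatement ps = conn.prepareStatement(buildQuery(numbers.size()))) {
            int index = 1;
            // The list is bound twice: once for the A clause, once for the B clause.
            for (int pass = 0; pass < 2; pass++) {
                for (String number : numbers) {
                    ps.setString(index++, number);
                }
            }
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

If the table contains self-pairs such as (0010, 0010), as the sample data does, adding AND A <> B to the query would keep a single number from matching itself.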

Related

Efficient way to store and search number combinations

I have an application where I generate combinations of numbers, each of which is unique. I want to store the combinations in such a way that I can efficiently retrieve a few random combinations later.
Here is an example of one combination
35 36 37 38 39 40 50
There are ~4 millions of combinations in all.
How should I store the data so that I can retrieve combinations later?

Since your combinations are unique and you actually don't have a query criteria on your numbers, it does not matter how you store them in the database. Just insert them in some table. To retrieve X random combinations simply do:
SELECT * FROM table ORDER BY RANDOM() LIMIT X
See:
Select random row(s) in SQLite
On storing array of integers in SQLite:
Insert a table of integers - int[] - into SQLite database,

I think there might be a different solution; in the sense of: do you really have to store all those combinations?!
Assuming that those combinations are just "random", you could use some (smart) maths to build a function getCombinationFor(), like
public List<Integer> getCombinationFor(long whatever)
that uses a fixed algorithm to create a unique result for each incoming input.
Like:
getCombinationFor(0): gives 0 1 2 3 10 20 30
getCombinationFor(1): gives 1 2 3 4 10 20 30 40
The above is of course pretty simple, and depending on your requirements for those sequences you might need something much more complicated. But you can certainly define such a function to return a permutation of a fixed set of numbers within a certain range!
The important thing is: this function returns a unique List for each and any input; and also important: given a certain sequence, you can immediately determine the number that was used to create that sequence.
So instead of generating a huge set of data containing unique sequences, you simply define an algorithm that knows how to create unique sequences in a deterministic way. If that would work for you, it completely frees you from storing your sequences at all!
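One concrete way to get such a function is the combinatorial number system, which maps every non-negative index to exactly one strictly increasing combination of k numbers and back again. This is only a sketch of that particular technique, not necessarily what your generator produces. The class and method names are invented, and the numbers start at 0, so you may need to shift them into your own range.

```java
final class CombinationCodec {
    // Binomial coefficient C(n, k); returns 0 when k > n.
    static long binomial(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int j = 1; j <= k; j++) {
            // Exact at every step: the running product is itself a binomial coefficient.
            result = result * (n - k + j) / j;
        }
        return result;
    }

    // Maps an index to the k-combination with that rank (ascending order).
    static int[] unrank(long index, int k) {
        int[] combo = new int[k];
        for (int i = k; i >= 1; i--) {
            int c = i - 1; // smallest candidate, C(i - 1, i) == 0
            while (binomial(c + 1, i) <= index) {
                c++; // find the largest c with C(c, i) <= index
            }
            combo[i - 1] = c;
            index -= binomial(c, i);
        }
        return combo;
    }

    // Inverse of unrank: recovers the index from an ascending combination.
    static long rank(int[] combo) {
        long index = 0;
        for (int i = 1; i <= combo.length; i++) {
            index += binomial(combo[i - 1], i);
        }
        return index;
    }
}
```

To retrieve a few random combinations you would then pick random indexes and call unrank(index, 7), without storing anything.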
Edit: just remembered that I was looking into something kinda "close" to this question/answer ... see here!

Sending data:
Use a UNIQUE index. On a unique-violation exception, just re-randomize and send the next record.
Retrieving:
The simplest thing to implement: use a HashSet and feed it with random numbers (the maximum being the number of combinations in your database minus 1) while hashSet.size() is less than the desired number of records to retrieve. Use the numbers from the HashSet as IDs or row numbers of the records (combinations) to select the data: WHERE ID IN (ids here).
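The retrieval part might be sketched like this, assuming the rows have consecutive integer IDs from 0 to totalRows - 1 (the class name, method names and the ID column are placeholders):

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class RandomRowPicker {
    // Collects distinct random IDs in [0, totalRows) until enough have been gathered.
    static Set<Integer> pickIds(int wanted, int totalRows, Random rng) {
        if (wanted > totalRows) {
            throw new IllegalArgumentException("cannot pick more IDs than there are rows");
        }
        Set<Integer> ids = new HashSet<>();
        while (ids.size() < wanted) {
            ids.add(rng.nextInt(totalRows)); // duplicates are silently ignored by the set
        }
        return ids;
    }

    // Renders the IDs as a WHERE clause; safe here because the values are integers.
    static String toWhereClause(Set<Integer> ids) {
        StringBuilder sb = new StringBuilder("WHERE ID IN (");
        String separator = "";
        for (int id : ids) {
            sb.append(separator).append(id);
            separator = ", ";
        }
        return sb.append(")").toString();
    }
}
```

This works well when the number of wanted rows is small compared to the table size. If you need a large fraction of the rows, the loop spends more and more time on duplicates.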

Mapping Unique 16-Digit numeric ID to Unique Alphanumeric ID

In a project I'm working on, I need to generate 16 character long unique IDs, consisting of 10 numbers plus 26 uppercase letters (only uppercase). They must be guaranteed to be universally unique, with zero chance of a repeat ever.
The IDs are not stored forever. An ID is thrown out of the database after a period of time and a new unique ID must be generated. The IDs can never repeat with the thrown out ones either.
So randomly generating 16 digits and checking against a list of previously generated IDs is not an option because there is no comprehensive list of previous IDs. Also, UUID will not work because the IDs must be 16 digits in length.
Right now I'm using 16-Digit Unique IDs, that are guaranteed to be universally unique every time they're generated (I'm using timestamps to generate them plus unique server ID). However, I need the IDs to be difficult to predict, and using timestamps makes them easy to predict.
So what I need to do is map the 16 digit numeric IDs that I have into the larger range of 10 digits + 26 letters without losing uniqueness. I need some sort of hashing function that maps from a smaller range to a larger range, guaranteeing a one-to-one mapping so that the unique IDs are guaranteed to stay unique after being mapped.
I have searched and so far have not found any hashing or mapping functions that are guaranteed to be collision-free, but one must exist if I'm mapping to a larger space. Any suggestions are appreciated.

Brandon Staggs wrote a good article on Implementing a Partial Serial Number Verification System. The examples are written in Delphi, but could be converted to other languages.

EDIT: This is an updated answer, as I misread the constraints on the final ID.
Here is a possible solution.
Let us define:
UID16 = 16-digit unique ID
LUID = 16-symbol UID (using digits+letters)
SECRET = a secret string
HASH = some hash of SECRET+UID16
Now, you can compute:
LUID = BASE36(UID16) + SUBSTR(BASE36(HASH), 0, 5)
BASE36(UID16) will produce an 11-character string (because 16 / log10(36) ~= 10.28)
It is guaranteed to be unique because the original UID16 is fully included in the final ID. If you happen to get a hash collision with two different UID16, you'll still have two distinct LUID.
Yet, it is difficult to predict because the 5 other symbols are based on a non-predictable hash.
NB: you'll only get log2(36^5) ~= 26 bits of entropy on the hash part, which may or may not be enough depending on your security requirements. The less predictable the original UID16, the better.
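A minimal Java sketch of this scheme, with a few assumptions of my own: the hash is SHA-256 over SECRET + UID16, and the five hash symbols are the low-order base-36 digits of the hash rather than the leading ones (a small deviation from SUBSTR(..., 0, 5), because the leading digits of a fixed-width hash are not evenly distributed in base 36). The class and method names are made up.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class LuidGenerator {
    private static final BigInteger HASH_SPACE = BigInteger.valueOf(36).pow(5);

    // Upper-case base-36 rendering, left-padded with zeros to a fixed width.
    static String base36Padded(long value, int width) {
        String digits = Long.toString(value, 36).toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // LUID = 11 symbols carrying the UID itself + 5 symbols derived from the keyed hash.
    static String luid(long uid16, String secret) {
        byte[] hash;
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            hash = sha.digest((secret + uid16).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        long hashPart = new BigInteger(1, hash).mod(HASH_SPACE).longValue();
        return base36Padded(uid16, 11) + base36Padded(hashPart, 5);
    }
}
```

Because the first 11 symbols decode back to the original UID16, two different inputs can never produce the same LUID, whatever the hash does.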

One general solution to your problem is encryption. Because encryption is reversible it is always a one-to-one mapping from plaintext to cyphertext. If you encrypt the numbers [0, 1, 2, 3, ...] then you are guaranteed that the resulting cyphertexts are also unique, as long as you keep the same key, do not repeat a number or overflow the allowed size. You just need to keep track of the next number to encrypt, incrementing as needed, and check that it never overflows.
The problem then reduces to the size (in bits) of the encryption and how to present it as text. You say: "10 numbers plus 26 uppercase letters (only uppercase)." That is similar to Base32 encoding, which uses the digits 2, 3, 4, 5, 6, 7 and 26 letters. Not exactly what you require, but perhaps close enough and available off the shelf. 16 characters at 5 bits per Base32 character is 80 bits. You could use an 80 bit block cipher and convert the output to Base32. Either roll your own simple Feistel cipher or use Hasty Pudding cipher, which can be set for any block size. Do not roll your own if there is a major security requirement here. Your own Feistel cipher will give you uniqueness and obfuscation, not security. Hasty Pudding gives security as well.
If you really do need all 10 digits and 26 letters, then you are looking at a number in base 36. Work out the required bit size for 36^16 and proceed as before. Convert the cyphertext bits to a number expressed in base 36.
If you write your own cipher then it appears that you do not need the decryption function, which will save a little work.
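To make the "roll your own simple Feistel cipher" option concrete, here is a toy sketch. It encrypts a 64-bit counter with a four-round balanced Feistel network and renders the result in base 36, padded to 16 characters (2^64 is smaller than 36^16, so the output always fits). The round keys and the round function are arbitrary placeholders I picked for illustration. As said above, this gives uniqueness and obfuscation, not security.

```java
final class FeistelId {
    // Hypothetical round keys; in practice these would be kept secret.
    private static final int[] ROUND_KEYS = {0x1F2E3D4C, 0x5B6A7988, 0x0A1B2C3D, 0x4E5F6071};

    // Any function works here; a Feistel network is invertible regardless.
    private static int round(int half, int key) {
        int x = (half ^ key) * 0x9E3779B1;
        return x ^ (x >>> 15);
    }

    static long encrypt(long block) {
        int left = (int) (block >>> 32);
        int right = (int) block;
        for (int key : ROUND_KEYS) {
            int next = left ^ round(right, key);
            left = right;
            right = next;
        }
        return ((long) left << 32) | (right & 0xFFFFFFFFL);
    }

    // Only needed if you ever want the counter back; runs the rounds in reverse.
    static long decrypt(long block) {
        int left = (int) (block >>> 32);
        int right = (int) block;
        for (int i = ROUND_KEYS.length - 1; i >= 0; i--) {
            int previous = right ^ round(left, ROUND_KEYS[i]);
            right = left;
            left = previous;
        }
        return ((long) left << 32) | (right & 0xFFFFFFFFL);
    }

    // Encrypts the counter and renders it as 16 upper-case base-36 symbols.
    static String toId(long counter) {
        String digits = Long.toUnsignedString(encrypt(counter), 36).toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < 16; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }
}
```

Since encrypt is a permutation of the 64-bit values, distinct counters always yield distinct IDs as long as the round keys stay the same.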

You want to map from a space consisting of 10^16 distinct values to one with 36^16 values.
The ratio of the sizes of these two spaces is ~795,866,110.
Use BigDecimal and multiply each input value by the ratio to distribute the input keys equally over the output space. Then base-36 encode the resulting value.
Here's a sample of 16-digit values consisting of 11 digits "timestamp" and 5 digits server ID encoded using the above scheme.
Decimal ID Base-36 Encoding
---------------- ----------------
4156333000101044 -> EYNSC8L1QJD7MJDK
4156333000201044 -> EYNSC8LTY4Y8Y7A0
4156333000301044 -> EYNSC8MM5QJA9V6G
4156333000401044 -> EYNSC8NEDC4BLJ2W
4156333000501044 -> EYNSC8O6KXPCX6ZC
4156333000601044 -> EYNSC8OYSJAE8UVS
4156333000701044 -> EYNSC8PR04VFKIS8
4156333000801044 -> EYNSC8QJ7QGGW6OO
The first 11 digits form the "timestamp" and I calculated the result for a series incremented by 1; the last five digits are an arbitrary "server ID", in this case 01044.
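In Java the scaling could be sketched as follows. The ratio 36^16 / 10^16 is built exactly by shifting the decimal point, so no rounding is involved before the final truncation. The class and method names are placeholders.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class RangeScaler {
    // 36^16 / 10^16, exactly representable as a decimal number.
    private static final BigDecimal RATIO =
            new BigDecimal(BigInteger.valueOf(36).pow(16)).movePointLeft(16);

    // Scales a 16-digit decimal ID into the base-36 space and encodes it.
    static String encode(long id16) {
        BigInteger scaled = BigDecimal.valueOf(id16).multiply(RATIO).toBigInteger();
        String digits = scaled.toString(36).toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < 16; i++) {
            sb.append('0'); // left-pad small values to 16 symbols
        }
        return sb.append(digits).toString();
    }
}
```

Because the ratio is greater than 1, two different inputs always land on different outputs. Note that the mapping preserves order, so IDs generated close together in time share a long common prefix, as the sample table shows.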

Getting HBase table rows on the basis of timestamp in Java

I have been working in HBase for last some weeks. My question is:
I have an HBase table with 100 records; each record has three columns in one column family, and there is just one column family. Now I want to retrieve the rows on the basis of timestamp, meaning the row that was added last should be retrieved first, like LIFO. Is this functionality available in HBase? If yes, how can I do it? I am using 0.98.3.
NOTE: While inserting data I did not mention timestamps manually.
I am trying to do it in Java language.
Regards

Rows are naturally sorted lexicographically by row key (ascending). The timestamp is not involved at all when performing full table scans; the first row retrieved will be the one with the lowest row key.
This would be the order in case of string row keys:
STRING ROW
0 \x30
00 \x30\x30
0000 \x30\x30\x30\x30
0001 \x30\x30\x30\x31
0002 \x30\x30\x30\x32
...
0010 \x30\x30\x31\x30
1 \x31
10 \x31\x30
2 \x32
a \x61
ab \x61\x62
...
zzzz \x7A\x7A\x7A\x7A
This would be the order in case of 4 byte signed integer row keys:
INT ROW
1 \x00\x00\x00\x01
2 \x00\x00\x00\x02
3 \x00\x00\x00\x03
4 \x00\x00\x00\x04
...
100 \x00\x00\x00\x64
...
10000 \x00\x00\x27\x10
...
MAX_INT \x7F\xFF\xFF\xFF
If you need the scan to work as LIFO, you have to include the inverted timestamp as a prefix of your row key (although this design is not recommended for heavy-write environments due to hotspotting).
byte[] rowKey = Bytes.add( Bytes.toBytes( Long.MAX_VALUE - System.currentTimeMillis() ), "-myRow".getBytes());
If you don't invert the timestamp, it will work as FIFO (oldest rows first).
For more information take a look at this section of the HBase Book: https://hbase.apache.org/book/rowkey.design.html
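To see why the inverted timestamp gives newest-first ordering, here is a small self-contained sketch that does not need the HBase client library. It builds the key with java.nio.ByteBuffer (big-endian, the same byte layout Bytes.toBytes(long) produces) and compares keys byte by byte as unsigned values, which is how row keys are ordered. The class and method names are made up for the demonstration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ReverseTimestampKey {
    // 8 bytes of (Long.MAX_VALUE - timestamp) followed by the suffix bytes.
    static byte[] rowKey(long timestampMillis, String suffix) {
        byte[] tail = suffix.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + tail.length)
                .putLong(Long.MAX_VALUE - timestampMillis) // ByteBuffer is big-endian by default
                .put(tail)
                .array();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

A key built for a later timestamp compares as smaller than one built for an earlier timestamp, so a plain ascending scan returns the most recently written rows first.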

Exclusive or between N bit sets

I am implementing a program in Java using BitSets and I am stuck in the following operation:
Given N BitSets return a BitSet with 0 if there is more than 1 one in all the BitSets, and 1 otherwise
As an example, suppose we have these 3 sets:
10010
01011
00111
11100 expected result
For the following sets :
10010
01011
00111
10100
00101
01000 expected result
I am trying to do this exclusive with bit wise operations, and I have realized that what I need is literally the exclusive or between all the sets, but not in an iterative fashion,
so I am quite stumped with what to do. Is this even possible?
I wanted to avoid the costly solution of having to check each bit in each set, and keep a counter for each position...
Thanks for any help
Edit : as some people asked, this is part of a project I'm working on. I am building a time table generator and basically one of the soft constraints is that no student should have only 1 class in 1 day, so those Sets represent the attending students in each hour, and I want to filter the ones who have only 1 class.

You can do what you want with two values. One has the bits set at least once, the second has those set more than once. The combination can be used to determine those set once and no more.
int[] ints = {0b10010, 0b01011, 0b00111, 0b10100, 0b00101};
int setOnce = 0, setMore = 0;
for (int i : ints) {
setMore |= setOnce & i;
setOnce |= i;
}
int result = setOnce & ~setMore;
System.out.println(String.format("%5s", Integer.toBinaryString(result)).replace(' ', '0'));
prints
01000
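Since the question uses BitSets rather than ints, the same two-accumulator idea can be written with java.util.BitSet roughly as follows. The parse and format helpers are only there to reproduce the examples from the question, with the leftmost character mapped to bit index 0; all names are my own.

```java
import java.util.BitSet;

final class ExactlyOnce {
    // Returns the bits that are set in exactly one of the given sets.
    static BitSet exactlyOnce(BitSet... sets) {
        BitSet once = new BitSet(); // bits seen at least once
        BitSet more = new BitSet(); // bits seen more than once
        for (BitSet set : sets) {
            BitSet seenAgain = (BitSet) once.clone();
            seenAgain.and(set);   // already seen and set again now
            more.or(seenAgain);
            once.or(set);
        }
        once.andNot(more);        // keep only the bits never seen twice
        return once;
    }

    // Parses a string such as "10010"; the leftmost character becomes bit 0.
    static BitSet parse(String bits) {
        BitSet result = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                result.set(i);
            }
        }
        return result;
    }

    // Renders the first 'width' bits in the same left-to-right order.
    static String format(BitSet bits, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < width; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

Each input set is visited once and every step is a word-wise BitSet operation, so there is no per-bit counter.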

Well first of all, you can't do this without checking every bit in each set. If you could solve this question without checking some arbitrary bit, then that would imply that there exist two solutions (i.e. two different ones for each of the two values that bit can be).
If you want a more efficient way of computing the XOR of multiple bit sets, I'd consider representing your sets as integers rather than with sets of individual bits. Then simply XOR the integers together to arrive at your answer. Otherwise, it seems to me that you would have to iterate through each bit, check its value, and compute the solution on your own (as you described in your question).

BitMask operation in java

Consider the scenario
I have values assigned like these
Amazon -1
Walmart -2
Target -4
Costco -8
Bjs -16
In DB, data is stored by masking these values based on their availability for each product.
eg.,
Mask  Product   Description
1     laptop    Available in Amazon
17    iPhone    Available in Amazon and BJ's
24    Mattress  Available in Costco and BJ's
Like these all the products are masked and stored in the DB.
How do I retrieve all the Retailers based on the Masked value.,
eg., For Mattress the masked value is 24. Then how would I find or list Costco & BJ's programmatically. Any algorithm/logic would be highly appreciated.

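int mattress = 24;
int numStores = 5; // Amazon, Walmart, Target, Costco, BJ's
int mask = 1;
for (int i = 0; i < numStores; ++i) {
    // The parentheses are required: in Java, != binds tighter than &
    if ((mask & mattress) != 0) {
        System.out.println("Store " + i + " has mattresses!");
    }
    mask = mask << 1;
}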
The if statement lines up the bits: if the mattress value has the same bit set as the mask, then the store whose mask that is sells mattresses. An AND of the mattress value and the mask value will only be non-zero when the store sells mattresses. On each iteration we move the mask bit one position to the left.
Note that the mask values should be positive, not negative, if need be you can multiply by negative one.
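If you want the retailer names rather than store indexes, the same loop can be wrapped into a small helper. This is a sketch that assumes the retailers are listed in bit order (Amazon = 1, Walmart = 2, Target = 4, Costco = 8, BJ's = 16); the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

final class RetailerMask {
    // Index i corresponds to bit value 1 << i.
    private static final String[] RETAILERS = {"Amazon", "Walmart", "Target", "Costco", "BJ's"};

    // Lists every retailer whose bit is set in the mask.
    static List<String> decode(int mask) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < RETAILERS.length; i++) {
            if ((mask & (1 << i)) != 0) {
                result.add(RETAILERS[i]);
            }
        }
        return result;
    }

    // Builds the mask for a list of retailer names; unknown names are ignored.
    static int encode(List<String> names) {
        int mask = 0;
        for (int i = 0; i < RETAILERS.length; i++) {
            if (names.contains(RETAILERS[i])) {
                mask |= 1 << i;
            }
        }
        return mask;
    }
}
```

For the mattress row, decode(24) yields Costco and BJ's, since 24 = 8 + 16.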

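Assuming you mean in a SQL database, then in your retrieval SQL you can generally add a bitwise test, e.g. WHERE (MyField & 16) = 16, WHERE (MyField & 24) = 24 etc. The bitwise-AND syntax depends on the database: & works in MySQL, PostgreSQL and SQL Server, while Oracle uses BITAND(MyField, 16).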
However, note that if you're trying to optimise such retrievals, and the number of rows typically matching a query is much smaller than the total number of rows, then this probably isn't a very good way to represent this data. In that case, it would be better to have a separate "ProductStore" table that contains (ProductID, StoreID) pairs representing this information (and indexed on StoreID).

Are there at most two retailers whose inventories sum to the "masked" value in each case? If so you will still have to check all pairs to retrieve them, which will take n² time. Just use a nested loop.
If the value represents the sum of any number of retailers' inventories, then you are trying to solve the subset-sum problem, so unfortunately you cannot do it in better than 2^n time.
If you are able to augment your original data structure with information to lookup the retailers contributing to the sum, then this would be ideal. But since you are asking the question I am assuming you don't have access to the data structure while it is being built, so to generate all subsets of retailers for checking you will want to look into Knuth's algorithm [pdf] for generating all k-combinations (and run it for 1...k) given in TAOCP Vol 4a Sec 7.2.1.3.

http://www.antiifcampaign.com/
Remember this. If you can remove the "if" with another construct(map/strategy pattern), for me you can let it there, otherwise that "if" is really dangerous!! (F.Cirillo)
In this case you can use map of map with bitmask operation.
Luca.
