Consider this scenario.
I have values assigned like these:
Amazon -1
Walmart -2
Target -4
Costco -8
Bjs -16
In the DB, the data is stored by masking these values together, based on each product's availability.
E.g.,
Mask  Product   Description
1     laptop    Available in Amazon
17    iPhone    Available in Amazon and BJ's
24    Mattress  Available in Costco and BJ's
All the products are masked and stored in the DB this way.
How do I retrieve all the retailers based on the masked value?
E.g., for Mattress the masked value is 24. How would I find or list Costco and BJ's programmatically? Any algorithm/logic would be highly appreciated.
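int mattress = 24;
int numStores = 5; // Amazon, Walmart, Target, Costco, BJ's
int mask = 1;
for (int i = 0; i < numStores; ++i) {
    // Parentheses are required: in Java, != binds tighter than &
    if ((mask & mattress) != 0) {
        System.out.println("Store " + i + " has mattresses!");
    }
    mask = mask << 1;
}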
The if statement lines up the bits: if the mattress value has the same bit set as the mask, then the store that owns that mask sells mattresses. The AND of the mattress value and the mask is non-zero only when that store sells mattresses. On each iteration we move the mask bit one position to the left.
Note that the mask values should be positive, not negative; if need be, you can multiply them by negative one.
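To get retailer names instead of store indices, one option in Java is an enum that carries each retailer's bit. This is a minimal sketch assuming the positive bit values 1, 2, 4, 8 and 16 from the question; the enum Retailer and its method retailersFor are hypothetical names, not part of any existing code:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum: each retailer carries its bit value from the question.
enum Retailer {
    AMAZON(1), WALMART(2), TARGET(4), COSTCO(8), BJS(16);

    final int bit;

    Retailer(int bit) {
        this.bit = bit;
    }

    // Returns every retailer whose bit is set in the stored mask.
    static Set<Retailer> retailersFor(int mask) {
        Set<Retailer> result = EnumSet.noneOf(Retailer.class);
        for (Retailer r : values()) {
            if ((mask & r.bit) != 0) {
                result.add(r);
            }
        }
        return result;
    }
}
```

For example, Retailer.retailersFor(24) contains COSTCO and BJS, and Retailer.retailersFor(17) contains AMAZON and BJS. An EnumSet is itself represented internally as a bit vector, so it is a natural fit for flags like these.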
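Assuming you mean in a SQL database, then in your retrieval SQL you can generally use the database's bitwise AND operator (& in MySQL, PostgreSQL and SQL Server; BITAND() in Oracle), e.g. WHERE (MyField & 16) = 16 for products available at BJ's, or WHERE (MyField & 24) = 24 for products available at both Costco and BJ's.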
However, note that if you're trying to optimise such retrievals, and the number of rows typically matching a query is much smaller than the total number of rows, then this probably isn't a very good way to represent this data. In that case, it would be better to have a separate "ProductStore" table that contains (ProductID, StoreID) pairs representing this information (and indexed on StoreID).
Are there at most two retailers whose inventories sum to the "masked" value in each case? If so, you will still have to check all pairs to retrieve them, which will take n² time. Just use a nested loop.
If the value represents the sum of any number of retailers' inventories, then you are trying to solve the subset-sum problem, so unfortunately you cannot do it in better than 2^n time.
If you are able to augment your original data structure with information to look up the retailers contributing to the sum, then this would be ideal. But since you are asking the question, I am assuming you don't have access to the data structure while it is being built, so to generate all subsets of retailers for checking you will want to look into Knuth's algorithm for generating all k-combinations (and run it for 1...k), given in TAOCP Vol. 4A, Sec. 7.2.1.3.
http://www.antiifcampaign.com/
Remember this: if you can replace the "if" with another construct (a map or the Strategy pattern), do it; if you can't, for me you can leave it there, but otherwise that "if" is really dangerous!! (F. Cirillo)
In this case you can use a map of maps with a bitmask operation.
Luca.
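One possible reading of the map-based suggestion, sketched in Java (Map.of and List.of need Java 9+): keep a single map from each retailer's bit to its name, and visit only the bits that are actually set, so there is no per-store if. The class RetailerLookup and its map are hypothetical; a bit with no entry in the map would come back as null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RetailerLookup {
    // Hypothetical lookup table from each retailer's bit to its name.
    static final Map<Integer, String> BIT_TO_RETAILER = Map.of(
            1, "Amazon", 2, "Walmart", 4, "Target", 8, "Costco", 16, "BJ's");

    // Visits only the bits that are set, lowest first.
    static List<String> retailersFor(int mask) {
        List<String> names = new ArrayList<>();
        while (mask != 0) {
            int lowest = Integer.lowestOneBit(mask); // e.g. 24 -> 8
            names.add(BIT_TO_RETAILER.get(lowest));
            mask &= mask - 1;                        // clear that lowest set bit
        }
        return names;
    }
}
```

RetailerLookup.retailersFor(24) returns [Costco, BJ's]: the loop runs once per set bit rather than once per store.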
I have recently come across the problem of creating arrays whose values have a specified bit length. Say, an array of 13-bit values instead of 8, 16, 32 etc. I tried to look for a good tutorial/article about it, as I am new to bit operations, though I am not really sure what to search for. I presume the array would work with a backing array of bytes or longs...
My ultimate question is whether you can point me to a duplicate question or a tutorial out there.
If not, perhaps show me an example, and if you have the time, write a short explanation.
Thank you.
EDIT: The purpose is not to make an array of, say, longs and only use 40% of it. I want the values packed together to save space, to be compatible with the thing I'm making.
It's not possible to "create your own primitive types" in Java. Also, I don't think there is any library around to do what you want. I think most people would accept the overhead of losing some memory, especially at bit level. Maybe C or C++ would have been a wiser choice (and I'm not even sure).
You'll have to create your own bit-manipulation code. There are many ways to do it; I'll give you one. I began with a byte[], but that's more complex. As a rule, pick a storage word wide enough that one element never straddles more than two words. For 13-bit elements, 16-bit words work: an element starting at any of a word's 16 bit positions ends within the next word. So let's use 16-bit words, which in Java means a char[] (char is Java's only unsigned 16-bit type), for 100 of your 13-bit values. I'll use big-endian-style storage.
int count = 100;                            // number of 13-bit values
int wordCount = (count * 13 + 15) / 16 + 1; // ceil(count * 13 / 16), +1 spare so index + 1 stays in bounds
char[] data = new char[wordCount];
Now, how do you access the value at index 6, for example? You'll always need at most two words of your array, plus an int to assemble the value in.
int pos = 6;
int buffer = 0;
int bitIndex = pos * 13;                  // 78
int firstPart = data[bitIndex / 16];      // e.g. 1010 0110 1100 0011
int secondPart = data[bitIndex / 16 + 1]; // e.g. 1001 1110 0101 1111
int begin = bitIndex % 16;                // 14
The variable begin = 14 is the bit at which your number begins. So your 13-bit element has 16 - 14 = 2 bits in the first (left) word and the remaining 13 - 2 = 11 bits in the second (right) word.
The number you want is 1010 0110 1100 00{11 and 1001 1110 010}1 1111.
You're going to combine these two parts into one now. Right shift the secondPart 16 - 11 = 5 times (so its top 11 bits become the right part of your final number), left shift the firstPart 11 times, and OR them into the buffer: buffer = (firstPart << 11) | (secondPart >>> 5). Because it's a 13-bit element, you then need to clear (with the bitmask 0x1FFF) everything above the low 13 bits of the buffer, and voilà! When begin is 3 or less, the whole element fits in the first word and the shifts change accordingly.
I'll let you work out how to insert a value in the array (try doing the same steps, but in reverse), and be careful not to erase other values. And if you haven't looked yet: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html
Disclaimer: I didn't try the code, but you get the general idea. There might be some errors; maybe you'll have to add or remove 1 to begin. The first thing you should do is write a function that prints/logs any integer (or byte, or whatever) in its binary representation, because you're going to need it to test every step of your code. There are multiple possibilities here; see "Print an integer in binary format in Java".
I still think it's a bad idea to store your special numbers this way (seriously, memory is rarely going to be an issue), but I found the exercise interesting, and maybe you really need that kind of storage. If you're curious, take a look at ByteArrayOutputStream; I'm not sure you'll ever need it for what you're doing, but who knows.
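To check your own attempt against, here is one possible sketch of both reading and writing, under these assumptions: a hypothetical class PackedArray13, 16-bit char words as in the answer, and one simplification, namely joining the two adjacent words into a single 32-bit int so the one-word and two-word cases use the same shift (shift = 32 - begin - 13). It allocates one spare word at the end so words[w + 1] always exists.

```java
// Hypothetical class: packs 13-bit values into 16-bit words (char is unsigned 16-bit).
final class PackedArray13 {
    private static final int BITS = 13;
    private static final int WORD = 16;
    private static final int VALUE_MASK = (1 << BITS) - 1; // 0x1FFF

    private final char[] words;
    private final int length;

    PackedArray13(int length) {
        this.length = length;
        // ceil(length * 13 / 16) words, plus one spare so words[w + 1] is always in bounds
        this.words = new char[(length * BITS + WORD - 1) / WORD + 1];
    }

    int get(int index) {
        checkIndex(index);
        int bitIndex = index * BITS;
        int w = bitIndex / WORD;
        int begin = bitIndex % WORD;
        // Join two adjacent words into one 32-bit value, first word on the left.
        int combined = (words[w] << WORD) | words[w + 1];
        // The element occupies bits [begin, begin + 13) counted from the left.
        int shift = 2 * WORD - begin - BITS;
        return (combined >>> shift) & VALUE_MASK;
    }

    void set(int index, int value) {
        checkIndex(index);
        int bitIndex = index * BITS;
        int w = bitIndex / WORD;
        int begin = bitIndex % WORD;
        int combined = (words[w] << WORD) | words[w + 1];
        int shift = 2 * WORD - begin - BITS;
        // Clear only this element's 13 bits, then OR in the new value; neighbours are untouched.
        combined = (combined & ~(VALUE_MASK << shift)) | ((value & VALUE_MASK) << shift);
        words[w] = (char) (combined >>> WORD);
        words[w + 1] = (char) combined;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

For 100 values this uses 83 chars (166 bytes) instead of the 200 bytes a short[] would need.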
I was going through the code of the Guava library, and I was interested in understanding the probabilistic match code of mightContain. Could anyone explain what they are doing in the code, especially with the bitwise operators?
Here is the code:
public <T> boolean mightContain(T object, Funnel<? super T> funnel,
    int numHashFunctions, BitArray bits) {
  long hash64 = Hashing.murmur3_128().newHasher().putObject(object, funnel).hash().asLong();
  int hash1 = (int) hash64;
  int hash2 = (int) (hash64 >>> 32);
  for (int i = 1; i <= numHashFunctions; i++) {
    int nextHash = hash1 + i * hash2;
    if (nextHash < 0) {
      nextHash = ~nextHash;
    }
    // up to here, the code is identical with the previous method
    if (!bits.get(nextHash % bits.size())) {
      return false;
    }
  }
  return true;
}
Assuming this is code from the BloomFilter class, the logic goes like this:
Given the key, perform all of the chosen hashes on that key. Use each hash to pick a bit number and check if that bit is set. If any bits are not set in the filter at that position then this key cannot have been added.
If all of the bits are found to be set then we can only say that the filter might have had the key added. This is because it is possible for a different key (or a combination of a number of different keys) to result in all of the checked bits being set.
Note that adding a key to the filter does almost exactly the same thing, except that it sets all of the bits generated.
A Bloom Filter object operates as follows.
A number of hash functions are chosen; each will calculate the location of a bit in the filter (see Optimal number of hash functions for a discussion of how many).
Hold an arbitrary-length bit pattern; the length is unimportant, but it should be big enough (see Probability of false positives for a discussion of what big enough means).
Each time a key is added to the filter, all configured hash functions are performed on the key, resulting in a number of bits being set in the pattern.
To check whether a key has already been added, perform all of the hash functions and check the bits found there. If any are found to be zero, then this key certainly has not been added to the filter.
If all bits are found to be set, then it may be that this key has been added. You will need to perform further checks to confirm.
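The steps above can be sketched in Java with java.util.BitSet. This is a hypothetical SimpleBloomFilter, not Guava's code: it reuses the hash1 + i * hash2 scheme and the ~ trick from the snippet in the question, but replaces Murmur3 with a simple stand-in (the SplitMix64 finalizer applied to hashCode()), an assumption made to keep the example dependency-free.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter; not Guava's implementation.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    SimpleBloomFilter(int numBits, int numHashFunctions) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    // Stand-in for Murmur3: spread hashCode() over 64 bits with the SplitMix64 finalizer.
    private static long hash64(Object o) {
        long z = o.hashCode() + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // i-th bit position, derived from the two 32-bit halves as in the Guava snippet.
    private int position(long hash64, int i) {
        int hash1 = (int) hash64;
        int hash2 = (int) (hash64 >>> 32);
        int nextHash = hash1 + i * hash2;
        if (nextHash < 0) {
            nextHash = ~nextHash; // flip to a non-negative value
        }
        return nextHash % numBits;
    }

    void put(Object o) {
        long h = hash64(o);
        for (int i = 1; i <= numHashFunctions; i++) {
            bits.set(position(h, i));
        }
    }

    boolean mightContain(Object o) {
        long h = hash64(o);
        for (int i = 1; i <= numHashFunctions; i++) {
            if (!bits.get(position(h, i))) {
                return false; // a required bit is clear: definitely never added
            }
        }
        return true; // all bits set: possibly added (false positives are possible)
    }
}
```

An empty filter always answers false; once a key has been put, mightContain for that key always answers true, while keys that were never added may occasionally get a false positive.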
There are only two bitwise operators here: >>> and ~.
The >>> is the unsigned ("right shift, don't carry the sign bit") operator. In Java, by default, if you shift (an 8-bit illustration; Java actually widens a byte to int before shifting):
1000 1100
right by 3 (using >>) you will obtain:
1111 0001
Using >>> which does not carry the sign bit you will get:
0001 0001
The second (~) is the bitwise complement (NOT). For a negative x, ~x equals -x - 1, which is non-negative, so it is a simple way to obtain a non-negative number from a negative one, and it looks like they want non-negative numbers here (as an index, maybe?). Applying this operator to:
1100 1010
which is a negative byte in Java will yield:
0011 0101
which is positive.
Basically, what this code does is create a hash of the object using a fast hash function (Murmur3), use it to derive numHashFunctions bit positions in a BitArray (no idea what that is; probably an internal structure of BloomFilter), and report that the object is definitely NOT present as soon as one of those bits is not set.
I suspect the BitArray is updated each time you add to the BloomFilter (using .put(), or .putAll()).
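A small Java check of the two operators discussed above. Because Java widens a byte to int before shifting, the 8-bit results shown earlier only appear if you look at the low 8 bits (or, for >>>, mask the byte with 0xFF first). The class ShiftDemo and its helper bits8 are hypothetical:

```java
final class ShiftDemo {
    // Render the low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        return String.format("%8s", Integer.toBinaryString(v & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte b = (byte) 0b1000_1100;                 // -116
        System.out.println(bits8(b >> 3));           // 11110001 (sign bit copied in)
        System.out.println(bits8((b & 0xFF) >>> 3)); // 00010001 (zeros shifted in)
        System.out.println(b >>> 3);                 // 536870897: b was widened to a 32-bit int first

        byte c = (byte) 0b1100_1010;                 // -54
        System.out.println(bits8(~c));               // 00110101
        System.out.println(~c);                      // 53, i.e. -(-54) - 1
    }
}
```

The third line is why Guava shifts a long (hash64 >>> 32) rather than a byte: on the full-width type, >>> does exactly what the 8-bit picture suggests.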
As the question states: how do I calculate the optimal value to use for m, and how do I justify it?
Suppose we are going to build a hashtable that uses the following hash function:
h(k) = k mod m, where k is the key
Some sources tell me:
to use the number of elements to be inserted as the value of m
to use a prime close to m
that Java simply uses 31 as its value of m
And some people tell me to use the prime closest to 2^n as m.
I'm so confused at this point that I don't know what value to use for m. For instance, if we use the table size for m, what happens if we want to expand the table? Will I then have to rehash all the values with the new value of m? If so, why does Java simply use 31 as the prime value for m?
I've also heard that the table size should be twice the total number of elements in the hashtable, each time it rehashes. But then how come we use, for instance, m=10 for a table of 10 elements, when it should be m=20 to create that extra empty space?
Can someone please help me understand how to calculate the value of m to use in different scenarios, such as a static hashtable (where we know we will only insert, say, 10 elements) or a dynamic one (rehashed after a certain limit)?
Let's illustrate my problem with the following examples:
I have the values {1, 2, ..., n}.
Question: What would be an optimal value of m if I must use division (mod) in my hash function?
Scenario 1: n = 100?
Scenario 2: n = 5043?
Additional question:
Would the value of m be different if we used an open or a closed hashtable?
Note that I'm not trying to understand Java's hashtable, but hashtables in general, where I must use a division (mod) hash function.
Thank you for your time!
You have several issues here:
1) What should m equal?
2) How much free space should you have in your hash table?
3) Should you make the size of your table be a prime number?
1) As was mentioned in the comments, the h(k) you describe isn't the hash function, it gives you the index into your hash table. The idea is that every object produces some hash code, which is a positive integer. You use the hash code to figure out where to put the object in the hash table (so that you can find it again later). You clearly don't want a hash table of size MAX_INT, so you choose some size m. Then for any object, you take its hash code, compute k % m, and now you have an integer in the interval [0, m-1], which is a valid index into your hash table.
2) Because a hash table works by using a hash code to find the place in a table where an object should go, you get into trouble if multiple items are assigned to the same location. This is called a collision. Every hash table implementation must deal with collisions, either by putting items into nearby spots or by keeping a linked list of items in each location. Whatever the solution, more collisions mean lower performance for your hash table. For that reason, it is recommended that you not let your hash table fill up; otherwise, collisions are more likely. Keeping your hash table at least twice as large as the number of items is a common recommendation to reduce the probability of collisions. Obviously, this means you will have to resize your table as it fills up. Yes, this means that you have to rehash each item, since it will go into a different location when you are taking a modulus by a different value. That is the hidden cost of a hash table: it runs in constant time (assuming few or no collisions), but it can have a large coefficient (amortized resizing, rehashing, etc.).
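The resize-and-rehash cost described in point 2 shows up even in a minimal separate-chaining sketch. Assumptions: a hypothetical ChainedIntSet that stores int keys directly (so k mod m is the whole hash), uses Math.floorMod so negative keys still land in [0, m - 1], and grows m to 2m + 1 whenever the table would become less than twice as large as the number of items. 2m + 1 is not always prime; a real implementation might pick the next prime instead.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical separate-chaining set of ints using h(k) = k mod m.
final class ChainedIntSet {
    private List<LinkedList<Integer>> buckets;
    private int size;

    ChainedIntSet(int initialM) {
        buckets = newBuckets(initialM);
    }

    private static List<LinkedList<Integer>> newBuckets(int m) {
        List<LinkedList<Integer>> b = new ArrayList<>(m);
        for (int i = 0; i < m; i++) {
            b.add(new LinkedList<>());
        }
        return b;
    }

    // h(k) = k mod m; floorMod keeps negative keys inside [0, m - 1].
    private static int index(int key, int m) {
        return Math.floorMod(key, m);
    }

    boolean add(int key) {
        LinkedList<Integer> chain = buckets.get(index(key, buckets.size()));
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        size++;
        // Keep m at least twice the number of items (load factor <= 0.5).
        if (2 * size > buckets.size()) {
            resize(2 * buckets.size() + 1); // 2m + 1 is odd, but not necessarily prime
        }
        return true;
    }

    boolean contains(int key) {
        return buckets.get(index(key, buckets.size())).contains(key);
    }

    int m() {
        return buckets.size();
    }

    // Changing m changes k mod m, so every stored key has to be rehashed.
    private void resize(int newM) {
        List<LinkedList<Integer>> old = buckets;
        buckets = newBuckets(newM);
        for (LinkedList<Integer> chain : old) {
            for (int key : chain) {
                buckets.get(index(key, newM)).add(key);
            }
        }
    }
}
```

Starting from m = 11 and inserting 1..100, the table is rebuilt several times (11, 23, 47, 95, 191, 383); each rebuild touches every key, but because m roughly doubles, the total rehashing work stays proportional to the number of insertions.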
3) It is also often recommended that you make the size of your hash table be a prime number. This is because it tends to produce a better distribution of items in your hash table in certain common use cases, thus avoiding collisions. Rather than giving a complete explanation here, I will refer you to this excellent answer: Why should hash functions use a prime number modulus?
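A small illustration of one common case behind the prime-modulus advice (the keys are made up): when many keys share a factor with m, k mod m only ever hits a few buckets. The hypothetical ModulusDemo counts how many buckets the keys 0, 8, 16, ..., 792 occupy:

```java
import java.util.HashSet;
import java.util.Set;

final class ModulusDemo {
    // Counts how many distinct buckets k mod m fills for the given keys.
    static int bucketsUsed(int[] keys, int m) {
        Set<Integer> used = new HashSet<>();
        for (int k : keys) {
            used.add(Math.floorMod(k, m));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Hypothetical keys with a common factor: 0, 8, 16, ..., 792.
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 8 * i;
        }
        System.out.println(bucketsUsed(keys, 16)); // 2 of 16 buckets
        System.out.println(bucketsUsed(keys, 17)); // 17 of 17 buckets
    }
}
```

With m = 16 only buckets 0 and 8 are ever used; with the prime m = 17, which shares no factor with 8, all 17 buckets are used.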
I am implementing a program in Java using BitSets and I am stuck in the following operation:
Given N BitSets, return a BitSet with 1 at each position where exactly one of the BitSets has a 1, and 0 otherwise (in particular, 0 where more than one of them has a 1).
As an example, suppose we have these 3 sets:
10010
01011
00111
11100 expected result
For the following sets :
10010
01011
00111
10100
00101
01000 expected result
I am trying to do this exclusively with bitwise operations, and I have realized that what I need is literally the exclusive or between all the sets, but not in an iterative fashion,
so I am quite stumped about what to do. Is this even possible?
I wanted to avoid the costly solution of having to check each bit in each set and keep a counter for each position...
Thanks for any help
Edit: as some people asked, this is part of a project I'm working on. I am building a timetable generator, and one of the soft constraints is that no student should have only 1 class in a day. Those sets represent the students attending each hour, and I want to find the ones who have only 1 class.
You can do what you want with two values. One has the bits set at least once, the second has those set more than once. The combination can be used to determine those set once and no more.
int[] ints = {0b10010, 0b01011, 0b00111, 0b10100, 0b00101};
int setOnce = 0, setMore = 0;
for (int i : ints) {
setMore |= setOnce & i;
setOnce |= i;
}
int result = setOnce & ~setMore;
System.out.println(String.format("%5s", Integer.toBinaryString(result)).replace(' ', '0'));
prints
01000
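Since the question uses java.util.BitSet rather than int, the same setOnce/setMore idea translates directly to BitSet's or, and and andNot methods, which work a whole 64-bit word at a time rather than bit by bit. A minimal sketch; the class ExactlyOnce and its method are hypothetical names:

```java
import java.util.BitSet;
import java.util.List;

final class ExactlyOnce {
    // Returns the bits that are set in exactly one of the given sets.
    static BitSet exactlyOnce(List<BitSet> sets) {
        BitSet setOnce = new BitSet(); // bits seen at least once
        BitSet setMore = new BitSet(); // bits seen more than once
        for (BitSet s : sets) {
            BitSet overlap = (BitSet) setOnce.clone();
            overlap.and(s);            // bits already seen that appear again in s
            setMore.or(overlap);
            setOnce.or(s);
        }
        BitSet result = (BitSet) setOnce.clone();
        result.andNot(setMore);        // seen at least once, but not more than once
        return result;
    }
}
```

Note that BitSet index 0 is the least significant bit, so BitSet.valueOf(new long[] {0b10010L}) corresponds to the set written as 10010 in the question.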
Well, first of all, you can't do this without checking every bit in each set. If you could solve this problem without checking some arbitrary bit, that would imply there exist two solutions (one for each of the two values that bit can take).
If you want a more efficient way of computing the XOR of multiple bit sets, I'd consider representing your sets as integers rather than with sets of individual bits. Then simply XOR the integers together to arrive at your answer. Otherwise, it seems to me that you would have to iterate through each bit, check its value, and compute the solution on your own (as you described in your question).
Say that my application has a finite number of "stuff"; in my case they will be items in my game, but for the purposes of this question I'll use Strings.
Say I have 5 Strings :
James
Dave
John
Steve
Jack
There will be a set list of them; however, I will increase that list in the future.
Question: What is a good algorithm I can use to go from a random number (generated from a barcode) to one of the values above?
For example, if I have the value 4523542354254, then what algorithm could I use to map that onto Dave? If I have that same number again, I need to make sure it maps to Dave and not to something else each time.
One option I did consider was taking the last digit of the barcode and using its 0-9 to map onto 10 items, but it's not very future-proof if I add an 11th item.
Any suggestions?
Hmm... If it is OK for multiple values to map to the same name, you can use
String name = names[(int) (value % names.length)];
With the clarification that "If I have that same number again, I need to make sure it maps to Dave and not to something else each time." only applies as long as the set of strings doesn't change.
Simplest is what Maverik says: name = names[(int) (barcode % names.length)];
A Java long is big enough to store any UPC barcode; an int isn't, so I assume here that barcode is a long. Note that the last digit of a UPC barcode is a check digit. I leave it as an exercise for the reader how you actually map barcodes to numbers. One option is to just discard the check digit once you've established that it's correct: it's computed from the others, so it doesn't add any information or discriminate between any otherwise-equal codes.
But as Stephen C says, barcodes aren't random, so this might not give you a uniform distribution across the names.
To get a better distribution, you could first hash the barcode. For example name = names[String.valueOf(barcode).hashCode() % names.length];
This still might not be entirely uniform -- there are better but usually slower hash functions than String.hashCode -- but it probably avoids any major biases that there may be in real-life barcodes.
Also, Java's % operator returns a negative result for a negative left operand, and String.hashCode() can be negative, so you need to coerce the result into a non-negative range (or use Math.floorMod in Java 8+):
int idx = String.valueOf(barcode).hashCode() % names.length;
if (idx < 0) idx += names.length;
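Putting the answer's pieces together, a minimal Java sketch (the class BarcodeMapper is hypothetical): hash the barcode's decimal string, then use Math.floorMod (Java 8+) so the index is never negative.

```java
// Hypothetical helper: maps a barcode to one of a fixed list of names.
final class BarcodeMapper {
    private final String[] names;

    BarcodeMapper(String... names) {
        this.names = names.clone();
    }

    String nameFor(long barcode) {
        // Hash the decimal string to break up patterns in real barcodes.
        int h = String.valueOf(barcode).hashCode();
        // floorMod always returns a value in [0, names.length - 1], even for negative h.
        return names[Math.floorMod(h, names.length)];
    }
}
```

The same barcode always maps to the same name as long as the list is unchanged; adding a sixth name changes names.length and remaps most barcodes.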