Implementing hashing with random access files in Java

I am implementing hashing with random access files in Java, and I need to handle collisions. I need a method that generates the keys from a name in a way that minimizes collisions. With my current method, if I insert 100 records, I get 95 collisions.
Note that the hash method I use is the division (modulo) method, and the input strings have length 6.
Are there possible improvements to this method, or alternatives?
public int hashCode(String nombre ) {
int hash = 1;
hash = hash*31 + nombre.hashCode();
System.out.println("hsh " +hash);
return Math.abs(hash);
}

What your code boils down to is:
hash = 31 + nombre.hashCode();
If two strings are the same (or have the same hashCode()), you will get a collision.
You should change this to be more meaningful.
public int hashCode(String nombre ) {
int hash = new Random().nextInt(); // Note: do not create a new Random every time; create it once and just call nextInt()
hash = hash*31 + nombre.hashCode();
System.out.println("hsh " +hash);
return Math.abs(hash);
}
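Whatever formula is used, a hash for a file-based table must be deterministic: the same name has to map to the same slot every time, or a record written to the file cannot be found again. Below is a minimal sketch of a deterministic division-method hash for short strings. The table size of 101 (a prime, a common choice for the division method) and the 64-byte record size are hypothetical values, and collisions still have to be resolved separately, for example by probing or chaining.

```java
public class NameHash {
    // Hypothetical table size: a prime is a common choice for the division method
    static final int TABLE_SIZE = 101;
    // Hypothetical fixed record length in bytes for the RandomAccessFile
    static final int RECORD_SIZE = 64;

    // Polynomial string hash reduced modulo TABLE_SIZE at every step,
    // so every character influences the slot and the result is never negative
    static int slot(String name) {
        int h = 0;
        for (int i = 0; i < name.length(); i++) {
            h = (h * 31 + name.charAt(i)) % TABLE_SIZE;
        }
        return h; // always in [0, TABLE_SIZE)
    }

    // Byte offset of the slot in the file, e.g. for RandomAccessFile.seek(offset)
    static long offset(String name) {
        return (long) slot(name) * RECORD_SIZE;
    }

    public static void main(String[] args) {
        System.out.println("ABCDEF -> slot " + slot("ABCDEF") + ", offset " + offset("ABCDEF"));
    }
}
```

Reducing modulo the table size inside the loop keeps the intermediate value small, so there is no integer overflow and no need for Math.abs.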


How to return a random value within a range, from a Map with Integer and List of Longs in Java

I am trying to find a way to get a random value from a provided list of different ranges using ThreadLocalRandom, and return that one random value from a method. I've been trying different approaches, and not having much luck.
I've tried this:
private static final Long[][] values = {
{ 233L, 333L },
{ 377L, 477L },
{ 610L, 710L }
};
// This isn't correct
long randomValue = ThreadLocalRandom.current().nextLong(values[0][0]);
But I could not figure out how to get a random value out of it for a specific range, so I thought I'd try the Map approach. I tried creating a Map of Integers and Lists of Longs:
private static Map<Integer, List<Long>> mapValues = new HashMap<>();
// ranges I want: {233L, 333L}, {377L, 477L}, {610L, 710L}
I am not sure how to store those value ranges into the Map.
I've tried adding in values, for example:
// Need to get the other value for the range in here, in this case 333L
map.put(1, 233L);
I am not sure how to add the 333L to the List. I have searched and tried various things, but I always get errors, such as: found 'long', required List.
I want the Integer in the Map to be an ID for the associated range, for example, 1 for 233L-333L. That way I can first get a random Integer key from the Map, for example 1, and then call ThreadLocalRandom.current().nextLong(origin, bound) with origin 233L and bound 333L to return a random value within the range 233L-333L.
I am not sure if this is possible, or I am simply approaching this the wrong way - any guidance/help is appreciated!
It's pretty straightforward. Your long[][] will do fine.
First, select a random index, then select a long between values[index][0] and values[index][1] (upper bound exclusive, see note 1).
long[][] values = {
{ 233L, 333L },
{ 377L, 477L },
{ 610L, 710L }
};
// Select a random index
int index = ThreadLocalRandom.current().nextInt(0, values.length);
// Determine lower and upper bounds
long min = values[index][0];
long max = values[index][1];
long rnd = ThreadLocalRandom.current().nextLong(min, max);
Of course, you could also abstract it away into some convenient classes.
Note that, for the distribution of values to be even, all ranges must have the same size (which seems to be the case in your code).
Implementation with even distribution
However, if you want to support different ranges while the distribution has to remain even, another approach is required.
We could generate a single random number whose upper bound is the total number of possible values, and then check which 'bucket' (range) that number falls into.
Here is a working example. In order to test the distribution which is said to be even, a random number is generated a million times. As you can see, each value occurs approximately 200,000 times.
Note 1: In my examples, the upper bound is exclusive. This is consistent with many methods from the Java standard libraries, like ThreadLocalRandom.nextLong(origin, bound) or LongStream.range(long start, long end).
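A minimal sketch of the single-draw 'bucket' approach described above, using exclusive upper bounds. The class and method names (EvenRanges, pick) are illustrative. Because one number is drawn over the total count of values, every value in every range is equally likely even when the ranges differ in size.

```java
import java.util.concurrent.ThreadLocalRandom;

public class EvenRanges {
    // Each row is {origin, bound}; the bound is exclusive
    static final long[][] RANGES = { {233L, 333L}, {377L, 477L}, {610L, 710L} };

    // Draws one value uniformly from the union of all ranges
    static long pick(long[][] ranges) {
        long total = 0;
        for (long[] r : ranges) {
            total += r[1] - r[0]; // number of values in this range
        }
        long n = ThreadLocalRandom.current().nextLong(total); // 0 <= n < total
        for (long[] r : ranges) {
            long size = r[1] - r[0];
            if (n < size) {
                return r[0] + n; // n falls inside this bucket
            }
            n -= size; // skip past this bucket
        }
        throw new IllegalStateException("unreachable for non-empty ranges");
    }

    public static void main(String[] args) {
        System.out.println(pick(RANGES));
    }
}
```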
// Pick one of the ranges at random, then a value inside it (upper bound exclusive)
int range = ThreadLocalRandom.current().nextInt(values.length);
long randomValue = ThreadLocalRandom.current().nextLong(values[range][0], values[range][1]);
This will work with the array solution you tried first: first you select the range, then you get the random value.
The easiest approach is the most straightforward one.
private static final long[][] values = {
    { 233L, 333L },
    { 377L, 477L },
    { 610L, 710L }
};
public static void main(String[] args) {
for (long[] v : values) {
long low = v[0];
long high = v[1];
System.out.println("Between " + low + " and " + high + " -> "
+ getRandom(low, high));
}
}
public static long getRandom(long low, long high) {
// add 1 to high to make range inclusive
return ThreadLocalRandom.current().nextLong(low, high + 1);
}
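If you still want the Map layout from the question, one option is to map each ID to a two-element long[] instead of a List<Long>, which avoids the "found 'long', required List" error. A sketch under that assumption; the class name RangeMap and the IDs 1 to 3 are hypothetical. Picking the ID uniformly carries the same caveat as the array version: values are only evenly distributed when all ranges have the same size.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class RangeMap {
    // Hypothetical layout: id -> {origin, bound}, bound exclusive
    static final Map<Integer, long[]> RANGES = new HashMap<>();

    static {
        RANGES.put(1, new long[] {233L, 333L});
        RANGES.put(2, new long[] {377L, 477L});
        RANGES.put(3, new long[] {610L, 710L});
    }

    static long randomFromRandomRange() {
        // IDs are 1..size, so a random ID can be drawn directly
        int id = ThreadLocalRandom.current().nextInt(1, RANGES.size() + 1);
        long[] r = RANGES.get(id);
        return ThreadLocalRandom.current().nextLong(r[0], r[1]);
    }

    public static void main(String[] args) {
        System.out.println(randomFromRandomRange());
    }
}
```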

How can I maintain probability across multiple executions in Java

First, I am not the greatest at math, so please excuse any ignorance relating to that. I am trying to maintain probability-based randomness across multiple executions, but I am failing. I have this input in a JSONObject:
{
  "option1": 25,
  "option2": 25,
  "option3": 10,
  "option4": 40
}
This is my function that selects a value from the above JSONObject based on the probability assigned:
public static String selectRandomoptions(JSONObject options) {
    String selectedOption = null;
    if (options != null) {
        // Sum all weights; pad to 100 so the weights act as percentages
        int maxChance = 0;
        for (String option : options.keySet()) {
            maxChance += options.getInt(option);
        }
        if (maxChance < 100) {
            maxChance = 100;
        }
        Random r = new Random();
        int randomValue = r.nextInt(maxChance); // 0 <= randomValue < maxChance
        int chance = 0;
        for (String option : options.keySet()) {
            chance += options.getInt(option);
            // strict '>' gives each option exactly its weight's share of [0, maxChance)
            if (chance > randomValue) {
                selectedOption = option.toLowerCase();
                break;
            }
        }
    }
    return selectedOption;
}
The function behaves within a reasonable error margin if I call it many times in a single execution (I tested 100+ calls). The problem is that I run it every hour to generate sample data in an event-driven app, to verify our analytics process and data, and we need the results to be somewhat predictable, at least within a reasonable margin.
Has anyone any idea how I might approach this? I would rather not have to persist anything but I am not opposed to it if it makes sense or reduces complexity/time.
The values returned by Random.nextInt() are uniformly distributed, so that shouldn't be a problem.
If you would like to make random results repeatable, you may want to use a Random with a seed.
Rather than creating a new Random() object each time you want a new random number, create the Random object once per run and call its nextInt() method each time you need a value.
Looking at the documentation of the Random() constructor:
"This constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor."
It only guarantees that the seed is very likely to be different, which is a weaker contract than the one you get from successive nextInt() calls on the same object.
If you want to get the same sequence of numbers on each run, use the Random(long seed) or the setSeed(long seed) method of the random object. Both these methods set the seed of the generator. If you used the same seed for each invocation it's guaranteed that you will get the same sequence of numbers from the generator.
Random.setSeed(long).
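A sketch combining the two suggestions above: a single Random created once with a fixed seed, plus the cumulative-weight selection from the question. Every run with the same seed and the same option order then produces the same sequence of picks. It uses a LinkedHashMap<String, Integer> in place of the question's JSONObject (an assumption, since org.json documents JSONObject as an unordered collection), and the seed 12345 is arbitrary.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class SeededWeightedPick {
    // Picks a key with probability proportional to its weight, using the supplied Random
    static String pick(Map<String, Integer> weights, Random r) {
        int total = 0;
        for (int w : weights.values()) {
            total += w;
        }
        int roll = r.nextInt(total); // 0 <= roll < total
        int cumulative = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            cumulative += e.getValue();
            if (roll < cumulative) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("weights must be positive");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the mapping from roll to option is stable
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("option1", 25);
        weights.put("option2", 25);
        weights.put("option3", 10);
        weights.put("option4", 40);

        Random r = new Random(12345L); // fixed seed: every run produces the same sequence
        for (int i = 0; i < 5; i++) {
            System.out.println(pick(weights, r));
        }
    }
}
```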

Java HashMap of Vectors

I'm in a situation in which I'd like to have a hash map with an Integer as the key and an array of Double as the value (I know the length of the array).
So I want something like:
HashMap<Integer, Double[]> hash = new HashMap<Integer, Double[]>();
Next, I scan a ResultSet to find key/value pairs for the first attribute.
At the end of this first phase, I will have a hash map with some keys, and for each key a Double value representing a particular score.
Next, I want to scan a different ResultSet with different key/value pairs and populate my hash map with these values.
The problem is that here I can find elements for which I don't have an entry yet and elements for which I already have an entry.
I'd like to reach a situation in which, for a particular key, I can access all the different scores.
How can I add values to those arrays iteratively? If I use the usual hash.put(key, value), I have to pass a Double[] as the value, but I want to add the different scores to the hash map one at a time.
I think using a Vector could cause problems, because some keys may have empty values that will never be populated.
I'm not sure how well I understand your requirements, but I think you could do something like this:
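// Assumes the map is declared as HashMap<Integer, double[]> and that
// valuesToAdd is a double[] holding the new scores for this key
double[] arrayInMap = hash.get(key);
if (arrayInMap == null) {
    // First time we see this key: store a copy so later additions don't modify the caller's array
    hash.put(key, valuesToAdd.clone());
} else {
    if (arrayInMap.length != valuesToAdd.length) {
        throw new IllegalStateException("Key " + key + ": cannot add " + valuesToAdd.length
                + " values to array of length " + arrayInMap.length);
    }
    // Key already present: add the new scores element-wise
    for (int ix = 0; ix < arrayInMap.length; ix++) {
        arrayInMap[ix] += valuesToAdd[ix];
    }
}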
I hope you’ll at least be able to use it as inspiration.
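Alternatively, if the goal is to keep each ResultSet's score separately rather than summing them, a fixed-length Double[] per key can be created on first use with Map.computeIfAbsent. Slots that no ResultSet fills stay null, which addresses the concern about keys with missing values. A sketch, assuming one slot per ResultSet; NUM_SCORES = 2 and the class name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoreTable {
    // Hypothetical: one slot per ResultSet that contributes a score
    static final int NUM_SCORES = 2;

    private final Map<Integer, Double[]> scores = new HashMap<>();

    // Stores a score in the given slot, creating the key's array on first use;
    // slots that are never written stay null ("no score for this key")
    void put(int key, int slot, double score) {
        scores.computeIfAbsent(key, k -> new Double[NUM_SCORES])[slot] = score;
    }

    // Returns the stored score, or null if the key or the slot was never filled
    Double get(int key, int slot) {
        Double[] arr = scores.get(key);
        return arr == null ? null : arr[slot];
    }

    public static void main(String[] args) {
        ScoreTable t = new ScoreTable();
        t.put(1, 0, 0.5); // first ResultSet
        t.put(2, 0, 0.7);
        t.put(1, 1, 0.9); // second ResultSet: key 1 exists already, key 3 is new
        t.put(3, 1, 0.1);
        System.out.println(t.get(1, 1)); // prints 0.9
        System.out.println(t.get(2, 1)); // prints null
    }
}
```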

Code to generate a unique number of wrapper type Long

I want to create a unique number of type Long using Java. I have seen a few examples, but they all used a timestamp. Can I create a unique number of the wrapper type Long without using a timestamp? Please suggest. Thanks.
Generate each digit by calling random.nextInt. For uniqueness, you can keep track of the random numbers you have used so far by keeping them in a set and checking if the set contains the number you generate each time.
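// length must be between 1 and 18 so that the result always fits in a long
public static long generateRandom(int length) {
    Random random = new Random(); // java.util.Random
    char[] digits = new char[length];
    digits[0] = (char) (random.nextInt(9) + '1'); // first digit 1-9: no leading zero
    for (int i = 1; i < length; i++) {
        digits[i] = (char) (random.nextInt(10) + '0'); // remaining digits 0-9
    }
    return Long.parseLong(new String(digits));
}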
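The set-based uniqueness check described above can be sketched as follows. The set lives only in memory, so uniqueness holds only within one running JVM; across restarts the used values would have to be stored somewhere (not shown). The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class UniqueRandomLongs {
    private final Random random = new Random();
    private final Set<Long> used = new HashSet<>(); // every value handed out so far

    // Draws random longs until one that was not handed out before comes up
    synchronized long next() {
        long candidate;
        do {
            candidate = random.nextLong();
        } while (!used.add(candidate)); // add() returns false if the value is already present
        return candidate;
    }

    public static void main(String[] args) {
        UniqueRandomLongs gen = new UniqueRandomLongs();
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.next());
        }
    }
}
```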
Without using timestamp, you have these options:
Keep a record of all previously generated numbers; of course, you have to store them somewhere, which is unwieldy.
Store the previous number and increment it each time.
Simply assume that the PRNG will never come up with the same number twice. Since there are 2^64 == 1.8 * 10^19 possible values, this is a very safe bet.
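The second option, storing the previous number and incrementing it, might look like this with an AtomicLong, so that concurrent callers never receive the same value. How the last value is persisted between runs is left open; the start parameter is a hypothetical stand-in for that saved value.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceIds {
    private final AtomicLong last;

    // start: hypothetical last value saved by a previous run (0 on the very first run)
    SequenceIds(long start) {
        last = new AtomicLong(start);
    }

    // Thread-safe: each call returns a distinct, increasing value
    long next() {
        return last.incrementAndGet();
    }

    // The value to persist on shutdown so the next run can continue from it
    long current() {
        return last.get();
    }

    public static void main(String[] args) {
        SequenceIds ids = new SequenceIds(41L);
        System.out.println(ids.next()); // prints 42
    }
}
```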
Many of the answers suggest using Math.random() to generate the unique ID. However, Math.random() is only pseudo-random and does not in itself add anything unique. The apparent uniqueness comes from the default seeding of the underlying Random, which in some implementations is based on System.currentTimeMillis(), with the following code:
/**
* Construct a random generator with the current time of day in milliseconds
* as the initial state.
*
* @see #setSeed
*/
public Random() {
setSeed(System.currentTimeMillis() + hashCode());
}
So why not just remove Math.random() from the equation and use System.currentTimeMillis() in the counter directly?
Time-based unique numbers:
The following code implements a unique number generator based solely on time. The benefit of this is that you don't need to store any counters. The generated numbers are unique under the following condition: the code runs in only one JVM at any given time. This is important, as the timestamp is part of the key.
public class UniqueNumber {
private static UniqueNumber instance = null;
private long currentCounter;
private UniqueNumber() {
currentCounter = (System.currentTimeMillis() + 1) << 20;
}
private static synchronized UniqueNumber getInstance() {
if (instance == null) {
instance = new UniqueNumber();
}
return instance;
}
private synchronized long nextNumber() {
currentCounter++;
while (currentCounter > (System.currentTimeMillis() << 20)) {
try {
Thread.sleep(1);
} catch (InterruptedException e) {
}
}
return currentCounter;
}
static long getUniqueNumber() {
return getInstance().nextNumber();
}
}
The code allows up to 2^20 numbers to be generated per millisecond (provided your hardware is that fast). If this rate is exceeded, the code sleeps until the next tick of System.currentTimeMillis().
Testing the code:
public static void main(String[] args) {
for (int i = 0; i < 10; i++) {
System.out.println(UniqueNumber.getUniqueNumber());
}
}
Output:
1472534126716256257
1472534126716256258
1472534126716256259
1472534126716256260
1472534126716256261
1472534126716256262
1472534126716256263
1472534126716256264
1472534126716256265
1472534126716256266
Take a look at Commons Id; it has a LongGenerator that generates an incrementing number as a Long object.
This will simply create a random long number:
System.out.println((long)((Math.random())*1000000000000000000L));
You can generate random numbers using java.util.Random and add them to a java.util.Set; the set ensures that no duplicates are allowed.
Try using a UUID:
Long uniqueLong = UUID.randomUUID().getMostSignificantBits();
Here you can find a very good explanation of why this could be unique in terms of randomness.

How to create user friendly unique IDs, UUIDs or other unique identifiers in Java

I usually use the UUID class to generate unique IDs. This works fine if these IDs are used by technical systems only, they don't care how long they are:
System.out.println(UUID.randomUUID().toString());
> 67849f28-c0af-46c7-8421-94f0642e5d4d
Is there a nice way to create user-friendly unique IDs (like those from tinyurl) that are a bit shorter than UUIDs? Use case: you want to send out IDs by mail to your customers, who in turn visit your site and enter that number into a form, like a voucher ID.
I assume that UUIDs are generated uniformly across the whole 128-bit range. So would it be safe to use just the lower 64 bits, for instance?
System.out.println(UUID.randomUUID().getLeastSignificantBits());
Any feedback is welcome.
"I assume that UUIDs are generated uniformly across the whole 128-bit range."
First off, your assumption may be incorrect, depending on the UUID type (1, 2, 3, or 4). From the Java UUID docs:
"There exist different variants of these global identifiers. The methods of this class are for manipulating the Leach-Salz variant, although the constructors allow the creation of any variant of UUID (described below).
The layout of a variant 2 (Leach-Salz) UUID is as follows: The most significant long consists of the following unsigned fields:
0xFFFFFFFF00000000 time_low
0x00000000FFFF0000 time_mid
0x000000000000F000 version
0x0000000000000FFF time_hi
The least significant long consists of the following unsigned fields:
0xC000000000000000 variant
0x3FFF000000000000 clock_seq
0x0000FFFFFFFFFFFF node
The variant field contains a value which identifies the layout of the UUID. The bit layout described above is valid only for a UUID with a variant value of 2, which indicates the Leach-Salz variant.
The version field holds a value that describes the type of this UUID. There are four different basic types of UUIDs: time-based, DCE security, name-based, and randomly generated UUIDs. These types have a version value of 1, 2, 3 and 4, respectively."
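A quick way to see the fixed fields the documentation describes: UUID.randomUUID() produces version 4, variant 2 UUIDs, so the top two bits of getLeastSignificantBits() are always binary 10 and only 62 of its 64 bits are random. A small sketch; the class and helper names are illustrative.

```java
import java.util.UUID;

public class UuidFields {
    // Returns the two variant bits stored at the top of the least significant long
    static long variantBits(UUID u) {
        return u.getLeastSignificantBits() >>> 62;
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        System.out.println("version = " + u.version()); // 4 for randomUUID()
        System.out.println("variant = " + u.variant()); // 2 (Leach-Salz)
        System.out.println("variant bits = " + Long.toBinaryString(variantBits(u))); // 10
    }
}
```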
The best way to do what you're doing is to generate a random string with code that looks something like this (source):
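import java.nio.charset.StandardCharsets;
import java.util.Random;

public class RandomString {
    // One shared generator instead of a new Random for every character
    private static final Random RN = new Random();

    // Random lowercase string whose length is between lo and hi (inclusive)
    public static String randomstring(int lo, int hi) {
        int n = rand(lo, hi);
        byte[] b = new byte[n];
        for (int i = 0; i < n; i++) {
            b[i] = (byte) rand('a', 'z');
        }
        return new String(b, StandardCharsets.US_ASCII);
    }

    // Uniform int in [lo, hi]; nextInt(n) never returns a negative value
    private static int rand(int lo, int hi) {
        return lo + RN.nextInt(hi - lo + 1);
    }

    public static String randomstring() {
        return randomstring(5, 25);
    }

    /**
     * @param args unused
     */
    public static void main(String[] args) {
        System.out.println(randomstring());
    }
}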
If you're incredibly worried about collisions, I suggest you Base64-encode your UUID, which should cut down on its size.
Moral of the story: don't rely on individual parts of UUIDs as they are holistically designed. If you do need to rely on individual parts of a UUID, make sure you familiarize yourself with the particular UUID type and implementation.
Here is another approach for generating user friendly IDs:
http://thedailywtf.com/Articles/The-Automated-Curse-Generator.aspx
(But you should definitely use the bad-word filter.)
Any UUID/GUID is just 16 bytes of data. These 16 bytes can easily be encoded using Base64 (or Base64url), after which all of the "=" padding characters at the end of the string can be stripped.
This gives a nice, short string which still holds the same data as the UUID/Guid. In other words, it is possible to recreate the UUID/Guid from that data if such becomes necessary.
Here's a way to generate a URL-friendly 22-character UUID:
// needs java.nio.ByteBuffer, java.util.Base64, java.util.UUID
public static String generateShortUuid() {
    UUID uuid = UUID.randomUUID();
    long lsb = uuid.getLeastSignificantBits();
    long msb = uuid.getMostSignificantBits();
    // 16 bytes, most significant half first
    byte[] uuidBytes = ByteBuffer.allocate(16).putLong(msb).putLong(lsb).array();
    // URL-safe alphabet ('-' and '_' instead of '+' and '/') with the '==' padding dropped: 22 chars
    return Base64.getUrlEncoder().withoutPadding().encodeToString(uuidBytes);
}
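The 22-character form can be turned back into the original UUID. A sketch of both directions using java.util.Base64's URL-safe encoder and decoder (the decoder accepts input without '=' padding); the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class ShortUuid {
    // UUID -> 22-character URL-safe string
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // 22-character string -> original UUID (reverses encode)
    static UUID decode(String s) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(s));
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        String shortForm = encode(original);
        System.out.println(original + " -> " + shortForm + " -> " + decode(shortForm));
    }
}
```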
For your use case, it would be better to keep a running count of registered users and, for each value, generate a string token like this:
// Encodes a value in base 62, least significant digit first
public static String longToReverseBase62(long value /* must not be negative! */) {
    final char[] LETTERS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
    StringBuilder result = new StringBuilder(11); // Long.MAX_VALUE needs 11 base-62 digits
    do {
        result.append(LETTERS[(int) (value % 62)]);
        value /= 62L;
    } while (value != 0);
    return result.toString();
}
For security reasons, it would be better to make the values non-sequential: each time a user registers, you could increment the value by, let's say, 1024. (Since the value must stay positive, this still leaves room for 2^63 / 2^10 = 2^53 users, which is almost certainly more than you'd ever need :)
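Putting the two ideas together, a sketch that advances a counter by 1024 per registration and turns each value into a token with the same base-62 encoding. The in-memory AtomicLong and the class name are assumptions; a real service would need a persistent counter (for example, a database sequence).

```java
import java.util.concurrent.atomic.AtomicLong;

public class VoucherTokens {
    private static final char[] LETTERS =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
    // Spacing between consecutive counter values
    private static final long STEP = 1024;
    // Hypothetical in-memory counter; a real service would persist it
    private static final AtomicLong COUNTER = new AtomicLong();

    // Base 62, least significant digit first (value must not be negative)
    static String longToReverseBase62(long value) {
        StringBuilder result = new StringBuilder(11);
        do {
            result.append(LETTERS[(int) (value % 62)]);
            value /= 62L;
        } while (value != 0);
        return result.toString();
    }

    // Advances the counter by STEP and encodes the new value as a token
    static String nextToken() {
        return longToReverseBase62(COUNTER.addAndGet(STEP));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(nextToken()); // wg, 2x, yN on a fresh counter
        }
    }
}
```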
At the time of this writing, this question's title is:
How to create user friendly unique IDs, UUIDs or other unique identifiers in Java
The question of generating a user-friendly ID is a subjective one. If you have a unique value, there are many ways to format it into a "user-friendly" one, and they all come down to mapping unique values one-to-one with "user-friendly" IDs — if the input value was unique, the "user-friendly" ID will likewise be unique.
In addition, it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. There are also many things you should ask yourself if you want to generate unique identifiers (these come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are unique, random, and easy to type by end users. But other things you should think about are:
Are other users allowed to access the resource identified by the ID, whenever they know the ID? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG such as java.security.SecureRandom in Java). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Also, if you want IDs that have to be typed in by end users, you should consider choosing a character set carefully or allowing typing mistakes to be detected.
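One way to act on the character-set advice is to draw from an alphabet without look-alike characters (0/O, 1/I/L) using java.security.SecureRandom, which is mentioned above. The alphabet below is a hypothetical choice, and the IDs are random rather than guaranteed unique, so the application would still need to check for duplicates if it cannot tolerate them.

```java
import java.security.SecureRandom;

public class TypeableId {
    // Hypothetical alphabet: digits and upper-case letters without look-alikes (0, 1, I, L, O)
    private static final char[] ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ".toCharArray();
    private static final SecureRandom RNG = new SecureRandom();

    // Random, not guaranteed unique: callers still need a duplicate check if that matters
    static String newId(int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = ALPHABET[RNG.nextInt(ALPHABET.length)];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(newId(10));
    }
}
```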
Only for you :) :
private final static char[] idchars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
// One shared generator: seeding a new Random with System.currentTimeMillis() on every call
// would return identical IDs for calls made within the same millisecond
private static final Random r = new Random();
private static String createId(int len) {
    char[] id = new char[len];
    for (int i = 0; i < len; i++) {
        id[i] = idchars[r.nextInt(idchars.length)];
    }
    return new String(id);
}
How about this one? This code returns at most 13 characters (digits and lowercase letters).
import java.nio.ByteBuffer;
import java.util.UUID;
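/**
 * Generate short UUID (up to 13 characters)
 *
 * @return short UUID
 */
public static String shortUUID() {
    UUID uuid = UUID.randomUUID();
    // Note: only the first 8 characters of the UUID string are read here,
    // so the result carries just 32 bits of randomness
    long l = ByteBuffer.wrap(uuid.toString().getBytes()).getLong();
    return Long.toString(l, Character.MAX_RADIX);
}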
