Generate 9 million unique random numeric only string

Question 1: Can we generate 9-10 million unique 8-digit numeric-only strings?
Question 2: How do I generate 9 to 10 million unique 'numeric only' strings in one program run? These keys will be uploaded to a database and used for the next 6 months. I tried
Math.floor(Math.random() * 10000000) + 10000000;
in a loop, but it produces lots of duplicates. To eliminate duplicates I used a HashSet, but I get Exception in thread "main" java.lang.OutOfMemoryError: Java heap space once the set reaches about 140xxxx entries. Is there another approach to generate this output?

The standard approach to creating a block of unique random numbers is to first create the numbers in order (for instance, in an array), then shuffle them.
You need to be careful in your choice of shuffling algorithms; I hear Fisher-Yates is pretty good.
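
The create-then-shuffle idea above could be sketched as follows. This is only a minimal illustration, not the answerer's code: the class name UniqueCodes and the method pick are hypothetical, the range 10000000..19999999 is taken from the formula in the question, and a partial Fisher-Yates shuffle is used so that only the first `count` positions are shuffled.

```java
import java.util.Random;

class UniqueCodes {
    // Returns `count` distinct values from [lo, hiExclusive) in random order,
    // using a partial Fisher-Yates shuffle of the full range.
    static int[] pick(int lo, int hiExclusive, int count, Random rnd) {
        int n = hiExclusive - lo;
        if (count < 0 || count > n) {
            throw new IllegalArgumentException("count must be between 0 and " + n);
        }
        int[] pool = new int[n];
        for (int i = 0; i < n; i++) {
            pool[i] = lo + i;                 // the numbers in order
        }
        for (int i = 0; i < count; i++) {
            int j = i + rnd.nextInt(n - i);   // random index in [i, n)
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        int[] result = new int[count];
        System.arraycopy(pool, 0, result, 0, count);
        return result;
    }

    public static void main(String[] args) {
        // Assumption: same range as the question's formula, 9 million codes wanted.
        int[] codes = pick(10_000_000, 20_000_000, 9_000_000, new Random());
        System.out.println(codes.length + " codes, first: " + codes[0]);
    }
}
```

An int array of 10 million entries takes roughly 40 MB, far less than a HashSet of boxed values, and every value is unique by construction because each number appears exactly once in the pool.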

If it is a one-time run, just increase the heap with the command line option -Xmx2048M (2 GB is just an example).

Q1. Can we generate 9-10 million unique 8-digit numeric-only strings?
Yes. With the ten digits 0-9 there are 10^8 = 100,000,000 distinct 8-digit strings (leading zeros allowed), so 10,000,000 unique ones fit comfortably.
If your logic enumerates every possible combination correctly you will not get any duplicates, but to be on the safe side you can use a Set.
You are getting java.lang.OutOfMemoryError because you generate that many numbers and keep them all in memory. The solution is to generate a small chunk of numbers, save it to the database, clear the list, fill it with the next chunk, and repeat until all the numbers have been saved.
Q2. How do I generate 9 to 10 million unique 'numeric only' strings in one program run?
Here is combination code you can use to achieve your goal:
import java.util.ArrayList;
import java.util.Scanner;
public class Combination{
public static int count = 0;
public static ArrayList<String> list;
public Combination(){
list = new ArrayList<String>();
}
public static void main(String[] args){
Combination c = new Combination();
Scanner sc = new Scanner(System.in);
String str = sc.next();
int num = sc.nextInt();
if(num>str.length()){
System.out.println("This combination is not possible");
System.out.println(num+" should be less than or equal to the length of the string "+str);
}else{
System.out.println("Processing....");
char[] array = new char[num];
c.fillNthCharacter(0,array,str);
System.out.println("Total combination = "+count);
}
}
public static void fillNthCharacter(int n,char[] array,String str){
for(int i=0;i<str.length();i++){
array[n]=str.charAt(i);
if(n<array.length-1){
fillNthCharacter(n+1,array,str);
}else{
count++;
//System.out.println(new String(array));
list.add(new String(array));
if(list.size()>100000){
//code to add into database
list.clear();
}
}
}
}
}

I simply increased the vm memory size and ran the application to generate 9 million coupons. Thank you everyone for taking interest in answering this.

You can store them in a database and put an index on the column where you store them (obviously with a unique constraint and a loop to retry if a DuplicateKeyException occurs). Even better, you can write a stored procedure to do it and operate directly on the database. I use this approach when generating short codes for URLs (which can lead to duplicates). If your time requirements are not stringent, this is a viable option.

Related

NZEC error in Hackerearth problem in java

I'm trying to solve this HackerEarth problem: https://www.hackerearth.com/practice/basic-programming/input-output/basics-of-input-output/practice-problems/algorithm/anagrams-651/description/
I have searched the internet but couldn't find a solution to my problem.
This is my code:
String a = new String();
String b = new String();
a = sc.nextLine();
b = sc.nextLine();
int t = sc.nextInt();
int check = 0;
int againCheck =0;
for (int k =0; k<t; k++)
{
for (int i =0; i<a.length(); i++)
{
char ch = a.charAt(i);
for (int j =0; j<b.length(); j++)
{
check =0;
if (ch != b.charAt(j))
{
check=1;
}
}
againCheck += check;
}
}
System.out.println(againCheck*againCheck);
I expect the output to be 4, but it is showing the "NZEC" error
Can anyone help me, please?

The requirements state [1] that the input is a number (N) followed by 2 x N lines. Your code is reading two strings followed by a number. It is probably throwing an InputMismatchException when it attempts to parse the 3rd line of input as a number.
Hints:
It pays to read the requirements carefully.
Read this article on CodeChef about how to debug an NZEC: https://discuss.codechef.com/t/tutorial-how-to-debug-an-nzec-error/11221. It explains techniques such as catching exceptions in your code and printing out a Java stacktrace so that you can see what is going wrong.
[1] Admittedly, the requirements are not crystal clear. But in the sample input the first line is a number.
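
As a rough sketch of both points (reading the count first, and printing a stack trace instead of failing silently), something like the following could be used. It assumes the input really is N followed by N pairs of lines, as described above; the class name InputOrderSketch and the method readPairs are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class InputOrderSketch {
    // Reads a count N from the first line, then N pairs of lines.
    static List<String[]> readPairs(Reader in) {
        try {
            BufferedReader reader = new BufferedReader(in);
            int n = Integer.parseInt(reader.readLine().trim());
            List<String[]> pairs = new ArrayList<>();
            for (int k = 0; k < n; k++) {
                String a = reader.readLine();
                String b = reader.readLine();
                pairs.add(new String[] { a, b });
            }
            return pairs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            List<String[]> pairs = readPairs(new InputStreamReader(System.in));
            System.out.println("Read " + pairs.size() + " pairs");
        } catch (Exception e) {
            // Printing the stack trace shows what went wrong instead of a bare NZEC.
            e.printStackTrace();
        }
    }
}
```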

As I've written in other answers as well, it is best to write your code like this when submitting on such sites (shown here in Python):
def myFunction():
    try:
        pass  # MY LOGIC HERE
    except Exception as E:
        print("ERROR Occurred : {}".format(E))
This will clearly show you what error you are facing in each test case. For a site like HackerEarth, which feeds several different inputs through its test cases, this is a must.
Coming to your question, NZEC stands for Non-Zero Exit Code.
This could mean anything and everything, from an input error to a server earthquake.

Regardless of hacker-whatsoever.com, I am going to give you two useful things:
An easier algorithm that you can code yourself, because your algorithm will not work as you expect;
A Java 8+ solution with a totally different algorithm, more complex but more efficient.
SIMPLE ALGORITHM
In your solution you have a typical double for loop that you use to check whether every char in a is also in b. That part is good, but the rest can be discarded. Try to implement this:
For each character of a, find the first occurrence of that character in b.
If there is a match, remove that character from both a and b.
The number of characters remaining in the two strings is the number of deletions you have to perform to turn them into strings that have the same characters, aka anagrams. So, return the sum of the remaining lengths of a and b.
NOTE: It is important to keep track of what you have already encountered: with your approach you would have counted the same character several times!
As you can see, this is just pseudocode for a naive algorithm, meant as a hint to help you with your studying. In fact, this algorithm has a worst-case complexity of O(n^2) (because of the nested loop), which is generally bad. Now, a better solution.
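
Before moving on, the simple algorithm above could be sketched in Java roughly as follows. The class and method names are hypothetical, and the two strings in main are sample values chosen for illustration, not taken from the problem statement.

```java
class NaiveAnagramDeletions {
    // For each character of a, remove its first remaining match from b.
    // Whatever is left unmatched in either string has to be deleted.
    static int deletions(String a, String b) {
        StringBuilder restOfB = new StringBuilder(b);
        int unmatchedInA = 0;
        for (int i = 0; i < a.length(); i++) {
            int pos = restOfB.indexOf(String.valueOf(a.charAt(i)));
            if (pos >= 0) {
                restOfB.deleteCharAt(pos);   // matched: consume it so it is not counted twice
            } else {
                unmatchedInA++;              // no partner left in b
            }
        }
        return unmatchedInA + restOfB.length();
    }

    public static void main(String[] args) {
        System.out.println(deletions("cde", "abc")); // prints 4
    }
}
```

Deleting the matched character from the working copy of b is what keeps the same character from being counted several times.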
BETTER SOLUTION
My algorithm is just O(n). It works this way:
I build a map. (If you don't know what that is, put simply, it's a data structure that stores key-value pairs.) In this case the keys are characters, and the values are integer counters bound to the respective character.
Every time a character is found in a, its counter increases by 1;
Every time a character is found in b, its counter decreases by 1;
Now every counter represents the difference between the number of times its character is present in a and in b. So, the sum of the absolute values of the counters is the solution!
To implement it, I actually add an entry to the map whenever I find a character for the first time, instead of pre-constructing a map with the whole alphabet. I also made heavy use of lambda expressions, to give you a very different perspective.
Here's the code:
import java.util.HashMap;
public class HackerEarthProblemSolver {
private static final String a = "", //replace "" with your first input string
b = ""; //replace "" with your second input string
static int sum = 0; //the result, must be static because lambda
public static void main (String[] args){
HashMap<Character,Integer> map = new HashMap<>(); //creating the map
for (char c: a.toCharArray()){ //for each character in a
map.computeIfPresent(c, (k,i) -> i+1); //+1 to its counter
map.computeIfAbsent(c , k -> 1); //initialize its counter to 1 (0+1)
}
for (char c: b.toCharArray()){ //for each character in b
map.computeIfPresent(c, (k,i) -> i-1); //-1 to its counter
map.computeIfAbsent(c , k -> -1); //initialize its counter to -1 (0-1)
}
map.forEach((k,i) -> sum += Math.abs(i) ); //summing the absolute values of the counters
System.out.println(sum);
}
}
Basically, both solutions work out which letters the two strings have in common and count the rest, just with different approaches.
Hope I helped!

Questions regarding programming a single-line calculator in Java

I am currently a early CS student and have begun to start projects outside of class just to gain more experience. I thought I would try and design a calculator.
However, instead of using prompts like "Input a number" etc. I wanted to design one that would take an input of for example "1+2+3" and then output the answer.
I have made some progress, but I am stuck on how to make the calculator more flexible.
Scanner userInput = new Scanner(System.in);
String tempString = userInput.nextLine();
String calcString[] = tempString.split("");
Here, I take the user's input, 1+2+3 as a String that is then stored in tempString. I then split it and put it into the calcString array.
This works out fine, I get "1+2+3" when printing out all elements of calcString[].
for (i = 0; i <= calcString.length; i += 2) {
calcIntegers[i] = Integer.parseInt(calcString[i]);
}
I then convert the integer parts of calcString[] to actual integers by putting them into a integer array.
This gives me "1 0 2 0 3", where the zeroes are where the operators should eventually be.
if (calcString[1].equals("+") && calcString[3].equals("+")) {
int retVal = calcIntegers[0] + calcIntegers[2] + calcIntegers[4];
System.out.print(retVal);
}
This is where I am kind of stuck. This works out fine, but obviously isn't very flexible, as it doesn't account for multiple different operators at the same time, like 1 / 2 * 3 - 4.
Furthermore, I'm not sure how to expand the calculator to take in longer lines. I have noticed a pattern where the even elements will contain numbers, and then odd elements contain the operators. However, I'm not sure how to implement this so that it will convert all even elements to their integer counterparts, and all the odd elements to their actual operators, then combine the two.
Hopefully you guys can throw me some tips or hints to help me with this! Thanks for your time, sorry for the somewhat long question.

Create the string to hold the expression :
String expr = "1 + 2 / 3 * 4"; //or something else
Use the String method .split() :
String[] tokens = expr.split(" ");
Loop through the tokens array. If you encounter a number, push it onto a stack. If you encounter an operator and there are two numbers on the stack, pop them, apply the operator, and push the result back. Keep looping until no tokens remain; at the end a single number is left on the stack, and that is the answer. As described, this is the evaluation procedure for postfix input (operands before the operator, e.g. "1 2 +"); for infix input such as the expression above, remember the operator and apply it after the next number has been pushed.
The "stack" in Java can be represented by an ArrayList: use add() to push items onto the stack, and list.get(list.size()-1) followed by list.remove(list.size()-1) as the pop.
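
A minimal sketch of this stack-based loop for space-separated infix input is shown below. It is an illustration under stated assumptions rather than the answerer's code: it uses ArrayDeque instead of an ArrayList as the stack, remembers the operator until its right-hand number arrives, works strictly left to right (no operator precedence), and uses integer division. The names LeftToRightCalc, evaluate, and apply are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LeftToRightCalc {
    // Evaluates tokens like "1 + 2 / 3 * 4" strictly from left to right.
    static int evaluate(String expr) {
        String[] tokens = expr.trim().split("\\s+");
        Deque<Integer> stack = new ArrayDeque<>();
        String pendingOp = null;             // operator waiting for its right operand
        for (String token : tokens) {
            if (token.matches("\\d+")) {
                stack.push(Integer.parseInt(token));
                if (pendingOp != null) {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(apply(pendingOp, left, right));
                    pendingOp = null;
                }
            } else {
                pendingOp = token;
            }
        }
        return stack.pop();                  // the single remaining number
    }

    static int apply(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;   // integer division
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // ((1 + 2) / 3) * 4, because precedence is ignored here
        System.out.println(evaluate("1 + 2 / 3 * 4")); // prints 4
    }
}
```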

You are taking input from the user, and it can contain two-digit numbers too.
So
for (i = 0; i <= calcString.length; i += 2) {
calcIntegers[i] = Integer.parseInt(calcString[i]);
}
will not work for two-digit numbers, because the loop steps by i += 2 and assumes every number is a single character.
A better way is to check, for each char in the string, whether it falls in the digit range. You can use a condition based on ASCII values.
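
One way to apply that character-range check is to scan the input once and group consecutive digits into a single token. The sketch below assumes only digits, spaces, and single-character operators appear in the input; the class name Tokenizer and the method tokenize are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Tokenizer {
    // Splits "4*51/6-3" into [4, *, 51, /, 6, -, 3], keeping multi-digit numbers whole.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder number = new StringBuilder();
        for (char ch : input.toCharArray()) {
            if (ch >= '0' && ch <= '9') {        // ASCII digit range
                number.append(ch);
            } else {
                if (number.length() > 0) {       // a number just ended
                    tokens.add(number.toString());
                    number.setLength(0);
                }
                if (ch != ' ') {
                    tokens.add(String.valueOf(ch));
                }
            }
        }
        if (number.length() > 0) {
            tokens.add(number.toString());       // trailing number
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("4*51/6-3")); // prints [4, *, 51, /, 6, -, 3]
    }
}
```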

Since you have separated your entire input into strings, what you should do is check where the operations appear in your calcString array.
You can use this regex to check if any particular String is an operation:
Pattern.matches("[-+*/]", operation)
where operation is a String value in calcString
Use this check to separate values from operations: first find the elements that pass the check, then group together the neighboring elements that do not.
For example,
If user inputs
4*51/6-3
You should find that calcString[1],calcString[4] and calcString[6] are operations.
Then you should find the values you need to perform operations on by consolidating neighboring digits that are not separated by operations. In the above example, you must consolidate calcString[2] and calcString[3]
To consolidate such digits you can use a function like the following:
// list holds the single digits of the input, already parsed to integers
public int consolidate(int startPosition, int endPosition, ArrayList<Integer> list)
{
int number = list.get(endPosition);
int power = 10;
for(int i=endPosition-1; i>=startPosition; i--)
{
number = number + (power*(list.get(i)));
power*=10;
}
return number;
}
where startPosition is the position where you encounter the first digit in the list, or immediately following an operation,
and endPosition is the last position in the list till which you have not encountered another operation.
Your ArrayList containing user input must also be passed as an input here!
In the example above you can consolidate calcString[2] and calcString[3] by calling:
consolidate(2,3,calcString)
Remember to verify that only integers exist between the mentioned positions in calcString!
REMEMBER!
You should account for a situation where the user enters multiple operations consecutively.
You need a priority processing algorithm based on BODMAS (Brackets, Orders, Division, Multiplication, Addition, Subtraction) or another precedence rule of your preference.
Remember to specify that your program handles only +, -, * and /, and not power, root, etc. functions.
Choose your data types according to the range of inputs you expect. A Java int holds values from -2,147,483,648 to 2,147,483,647!
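
To make the precedence point concrete, here is one possible sketch for expressions without brackets. It is not the only way to do it and the names are hypothetical: multiplication and division are applied as soon as their right operand is complete, while additions and subtractions are deferred as signed terms and summed at the end. It uses long to reduce overflow risk, integer division, and it rejects consecutive operators.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PrecedenceCalc {
    // Evaluates non-negative integers combined with + - * / (no brackets); spaces are ignored.
    static long evaluate(String input) {
        Deque<Long> terms = new ArrayDeque<>();  // signed terms, summed at the end
        char op = '+';                           // operator in front of the current number
        long number = 0;
        boolean haveNumber = false;
        String s = input + "+";                  // sentinel operator flushes the last number
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == ' ') continue;
            if (ch >= '0' && ch <= '9') {
                number = number * 10 + (ch - '0');
                haveNumber = true;
                continue;
            }
            if ("+-*/".indexOf(ch) < 0) {
                throw new IllegalArgumentException("Unsupported character: " + ch);
            }
            if (!haveNumber) {
                throw new IllegalArgumentException("Operator without a number before it");
            }
            switch (op) {
                case '+': terms.push(number); break;
                case '-': terms.push(-number); break;
                case '*': terms.push(terms.pop() * number); break;  // binds tighter
                case '/': terms.push(terms.pop() / number); break;  // integer division
            }
            op = ch;
            number = 0;
            haveNumber = false;
        }
        long sum = 0;
        for (long t : terms) sum += t;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("4*51/6-3")); // prints 31
    }
}
```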

Iterate through only part of a large list in java

I'm trying to make a Boggle game in Java, and for my program once I randomize the board I have a method which iterates through the possible combinations and compares each one to a dictionary list to check if it's a valid word, and if yes, I put it in the key. It works fine, however the program takes three or four minutes to generate the key, which is mostly due to the size of the dictionary. The one I'm using has about 19k words and comparing every combination takes up a ton of time. Here's the part of the code I'm trying to make faster:
if (str.length()>3&&!key.contains(str)&&prefixes.contains(str.substring(0,3))&&dictionary.contains(str)){
key.add(str);
}
where str is the combination generated. prefixes is a list I generated based on dictionary that goes like this:
public void buildPrefixes(){
for (String word:dictionary){
if(!prefixes.contains(word.substring(0,3))){
prefixes.add(word.substring(0,3));
}
}
}
which just adds all the three-letter prefixes in the dictionary, such as "abb" and "mar", so that when str is gibberish like "xskfjh" it won't get checked against the whole dictionary, just against prefixes, which has something like 1k entries.
What I'm trying to do is cut down on time by iterating through only the words in the dictionary that have the same first letter as str, so if str is "abbey" then it will only check str against words that start with "a" instead of the whole list, which would cut down on time significantly. Or even better, it only checks str against words that have the same prefix. I am pretty new to Java so I would really appreciate if you're very descriptive in your answers, thanks!

What the comments are trying to say is: do not reinvent the wheel. Java is not assembler or C, and it is powerful enough to handle such trivial cases.
Here is simple code which shows that a plain Set can handle your vocabulary easily:
import java.util.Set;
import java.util.TreeSet;
public class Work {
public static void main(String[] args) {
long startTime=System.currentTimeMillis();
Set<String> allWords=new TreeSet<String>();
for (int i=0; i<20000;i++){
allWords.add(getRandomWord());
}
System.out.println("Total words "+allWords.size()+" in "+(System.currentTimeMillis()-startTime)+" milliseconds");
}
static String getRandomWord() {
int length=3+(int)(Math.random()*10);
String r = "";
for(int i = 0; i < length; i++) {
r += (char)(Math.random() * 26 + 97);
}
return r;
}
}
On my computer it shows
Total words 19875 in 47 milliseconds
As you can see, 125 of the 20,000 generated words were duplicates. That time covers not only generating 20,000 words in a very inefficient way, but also storing them and checking for duplicates.
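
Applied to the Boggle question, the same idea might look like the sketch below: keep the dictionary in a TreeSet, use contains for whole words, and use ceiling to test whether any word starts with a given prefix. The class name DictionaryLookup and its methods are hypothetical, and the three sample words are only for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

class DictionaryLookup {
    private final TreeSet<String> words;

    DictionaryLookup(Collection<String> dictionary) {
        this.words = new TreeSet<>(dictionary);
    }

    // Whole-word lookup, O(log n) instead of scanning a list.
    boolean isWord(String str) {
        return words.contains(str);
    }

    // True if at least one dictionary word starts with the given prefix.
    boolean hasPrefix(String prefix) {
        String candidate = words.ceiling(prefix);  // smallest word >= prefix
        return candidate != null && candidate.startsWith(prefix);
    }

    public static void main(String[] args) {
        DictionaryLookup lookup = new DictionaryLookup(Arrays.asList("abbey", "mark", "marble"));
        System.out.println(lookup.isWord("abbey"));   // prints true
        System.out.println(lookup.hasPrefix("mar"));  // prints true
        System.out.println(lookup.hasPrefix("xsk"));  // prints false
    }
}
```

Because hasPrefix works for a prefix of any length, it can also be used to stop extending a letter combination as soon as no word can start with it.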

Most efficient way to find unique entries in a large data set

Before anything, I am making it clear that this is an assignment and I do not expect full coded answers. All I seek is advice and maybe snippets of code that helps me.
So, I am reading in about 900,000 words all stored in a arrayList. I need to count unique words using a sorted array (or arraylist) in java.
So far, I am simply looping over the given arrayList and use
Collections.sort(words);
and Collections.binarySearch(words, wordToLook); to achieve it like the following:
OrderedSet set = new OrderedSet();
for(String a : words){
if(!set.contains(a)){
set.add(a);
}
}
and
public boolean contains(String word) {
Collections.sort(uniqueWords);
int result = Collections.binarySearch(uniqueWords, word);
if(result<0){
return false;
}else{
return true;
}
}
This code has a running time of about 60 seconds, but I was wondering if there is a better way to do this, because running a sort every time an element is added seems very inefficient (but of course necessary if I were to use binary search).
Any sort of feedback would be greatly appreciated. Thanks.

So, you are required to use a sorted array. That is ok, since you are (not yet) programming in the real world.
I will suggest two alternatives:
The first uses binary search (which you are using in your current code).
I would create a class that contains two fields: the word (a String) and the count for that word (an int). You will build a sorted array of these classes.
Start with an empty array and add to it as you read each word. For each word, do a binary search for the word in the array you are building. The search will either find the entry containing the word (and you will increment the count), or you will determine that the word is not yet in the array.
When your binary search ends without finding the word, you will create a new object to hold the word+count and add it to the array in the location where your search ended (be careful to make sure that your logic really puts it in the right spot to keep your list sorted). Of course, your count is set to 1 for new words.
Another alternative:
Read all of your words into a list and sort it. After sorting, all duplicates will be next to each other in the list.
You will walk down this sorted list once and create a list of word+count as you go. If the next word you see is the same as the last word+count, increment the count. If it is a new word, add a new word+count to your result list with count=1.
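
The first alternative (binary search, then insert at the position where the search ended) could be sketched as below. Counts are left out for brevity and a plain ArrayList of strings stands in for the array of word+count objects; the class and method names are hypothetical. The key detail is that Collections.binarySearch returns -(insertion point) - 1 when the word is absent, so the list stays sorted without ever calling sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedUniqueWords {
    // Builds a sorted list of the distinct words without re-sorting on every insert.
    static List<String> uniqueSorted(List<String> words) {
        List<String> unique = new ArrayList<>();
        for (String word : words) {
            int pos = Collections.binarySearch(unique, word);
            if (pos < 0) {
                unique.add(-pos - 1, word);   // insert at the insertion point
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "pear", "fig", "apple");
        System.out.println(uniqueSorted(words)); // prints [apple, fig, pear]
    }
}
```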

I would not use a sorted array. I would create a Map<String, Integer> where the key is your word and the value is the count of the number of occurrences of the word. As you read each word, do something like this:
Integer count = map.get(word);
if (count == null) {
count = 0;
}
map.put(word, count + 1);
Then just iterate over the map's entry set and do whatever you need to do with the counts.
If you know, or can estimate, the number of unique words then you should use this number in the HashMap constructor (so you don't grow the map many times).
If you use a sorted array, your run time cannot be better than proportional to NlogN (where N is the number of words in your list). If you use a HashMap, you can achieve a runtime that grows linearly with N (you save yourself the factor of logN).
Another advantage of using a Map is the memory used is proportional to the number of unique words, rather than the total number of words (assuming that you build the map while reading the words, rather than reading all words into a collection and then adding them to the map).

public static int countUnique(String[] array) {
if (array.length == 0) return 0;
int count = 1; // the first entry is always unique
for (int i = 1; i < array.length; i++) {
if (!array[i].equals(array[i - 1])) count++;
}
return count;
}
This is an O(N) algorithm for counting the number of unique entries in a sorted array. The idea behind it is that we count the number of transitions between groups of equal elements. Then, the number of unique entries is the number of transitions plus one (for the first entry).
Hopefully you see how to apply this algorithm to your array after the elements are sorted.

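You could always use a comparator to get the unique values. An ArrayList cannot take a Comparator, so this uses a TreeSet, which keeps only one element for each group the comparator treats as equal:
Set<String> newSet = new TreeSet<String>(new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o1.compareToIgnoreCase(o2);
}
});
newSet.addAll(words);
Now count:
words.size() - newSet.size() = no. of repeated values.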
Hope this helps!!!!

two dimensional array in java - difficulties

I'm used to python and django but I've recently started learning java. Since I don't have much time because of work I missed a lot of classes and I'm a bit confused now that I have to do a work.
EDIT
The program is supposed to assign points according to the time each athlete made in bike and race. I have 4 extra tables, for male and female, with points and times.
I have to compare them and find the corresponding points for each time (linear interpolation).
So my idea was to read the file and use an ArrayList.
One of the things I'm having difficulty with is creating a two-dimensional array.
I have a file similar to this one:
12 M 23:56 62:50
36 F 59:30 20:60
Where the first number is an athlete, the second the gender and next time of different races (which needs to be converted into seconds).
Since I can't make an array mixed (int and char), I have to convert the gender to 0 and 1.
So here is what I've done so far:
public static void main(String[] args) throws FileNotFoundException {
Scanner fileTime = new Scanner (new FileReader ("time.txt"));
while (fileTime.hasNext()) {
String value = fileTime.next();
// Replace gender with 0 and 1, so that the string can be converted into an integer
if (value.equals("F"))
value = "0";
else if (value.equals("M"))
value = "1";
// Verify which values has :
int index = value.indexOf(":");
if (index != -1) {
String [] temp = value.split(":");
for (int i=0; i<temp.length; i++) {
// convert string to int
int num = Integer.parseInt(temp[i]);
// I wanted to multiply the first number by 60 to convert into seconds and add the second number to the first
num * 60; // but this way I multiplying everything
}
}
}
I'm aware that there's probably easier ways to do this but honestly I'm a bit confused, any lights are welcome.

Just because an array works well to store the data in one language does not mean it is the best way to store the data in another language.
Instead of trying to make a two dimensional array, you can make a single array (or collection) of a custom class.
public class Athlete {
private int _id;
private boolean _isMale;
private int[] _times;
//...
}
How you intend to use the data may change the way you structure the class. But this is a simple direct representation of the data line you described.

Python is a dynamically-typed language, which means you can think of each row as a tuple, or even as a list/array if you like. The Java idiom is to be stricter in typing. So, rather than having a list of list of elements, your Java program should define a class that represents a the information in each line, and then instantiate and populate objects of that class. In other words, if you want to program in idiomatic Java, this is not a two-dimensional array problem; it's a List<MyClass> problem.

Try reading the file line by line:
while (fileTime.hasNext())
Instead of hasNext use hasNextLine.
Read the next line instead of next token:
String value = fileTime.next();
// can be
String line = fileTime.nextLine();
Split the line into four parts with something as follows:
String[] parts = line.split("\\s+");
Access the parts using parts[0], parts[1], parts[2] and parts[3]. And you already know what's in what. Easily process them.
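
Putting the line-by-line reading together with the custom-class suggestion from the earlier answers, a parsing sketch might look like this. The class and field names are hypothetical, and it assumes every line has the form shown in the question (id, gender, then any number of mm:ss times).

```java
import java.util.Arrays;

class AthleteParser {
    // One object per line of the file, instead of a row in a two-dimensional array.
    static final class Athlete {
        final int id;
        final boolean male;
        final int[] timesInSeconds;

        Athlete(int id, boolean male, int[] timesInSeconds) {
            this.id = id;
            this.male = male;
            this.timesInSeconds = timesInSeconds;
        }
    }

    // Converts "23:56" to 23 * 60 + 56 seconds.
    static int toSeconds(String minutesAndSeconds) {
        String[] parts = minutesAndSeconds.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    static Athlete parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        int id = Integer.parseInt(parts[0]);
        boolean male = parts[1].equals("M");
        int[] times = new int[parts.length - 2];
        for (int i = 2; i < parts.length; i++) {
            times[i - 2] = toSeconds(parts[i]);
        }
        return new Athlete(id, male, times);
    }

    public static void main(String[] args) {
        Athlete a = parseLine("12 M 23:56 62:50");
        // prints 12 true [1436, 3770]
        System.out.println(a.id + " " + a.male + " " + Arrays.toString(a.timesInSeconds));
    }
}
```

Multiplying only the part before the colon by 60 avoids the "multiplying everything" problem from the loop in the question.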
