How to remove duplicates from a parallel array in Java?

I have started learning Java and was wondering how to collapse parallel arrays of String and int so that each entry from the source arrays is stored exactly once. For example, I have two arrays parallel to each other: one stores the phone number as a String and the other stores the duration of each call from that number as an int.
String[] phoneNumbers;
phoneNumbers = new String[100];
int[] callDurations = new int[phoneNumbers.length];
int size = 0;
phoneNumbers[0] = "888-555-0000";
callDurations[0] = 10;
phoneNumbers[1] = "888-555-1234";
callDurations[1] = 26;
phoneNumbers[2] = "888-555-0000";
callDurations[2] = 90;
phoneNumbers[3] = "888-678-8766";
callDurations[3] = 28;
size = 4;
I wrote a method to find the details of a specific phone number, such as the duration of the calls from "888-555-1234".
Here is the method and how I called it:
public static void findAllCalls(String[] phoneNumbers, int[] callDurations, int size, String targetNumber) {
int match;
System.out.println("Calls from " + targetNumber + ":");
match = find(phoneNumbers, size, 0, targetNumber);
while (match >= 0) {
System.out.println(phoneNumbers[match] + " duration: " + callDurations[match] + "s");
match = find(phoneNumbers, size, match + 1, targetNumber);
}
}
System.out.println("\n\nAll calls from number: ");
findAllCalls(phoneNumbers, callDurations, size, "888-555-1234");
The output of this code is:
All calls from number:
Calls from 888-555-1234:
888-555-1234 duration: 26s
888-555-1234 duration: 28s
Process finished with exit code 0
Whereas the output I want to get instead is:
All calls from number:
Calls from 888-555-1234:
888-555-1234 duration: 54s
Process finished with exit code 0
(26s + 28s)
How is it possible in Java to make sure there are no duplicates stored in a parallel array and get the total duration for each phone number instead of having them separately in the arrays?

As already stated in the other answers, you can use a map, which avoids duplicates in both phoneNumber and callDuration (see "Java code to Prevent duplicate <Key,Value> pairs in HashMap/HashTable").
Or, if you want to stick with the array implementation, you can change the logic in the findAllCalls() method.
public static void findAllCalls(String[] phoneNumbers, int[] callDurations, int size, String targetNumber) {
    System.out.println("Calls from " + targetNumber + ":");
    // Sum the durations of every call whose number matches the target.
    int duration = 0;
    for (int i = 0; i < size; i++) {
        if (phoneNumbers[i].equals(targetNumber)) {
            duration += callDurations[i];
        }
    }
    System.out.println(targetNumber + " duration: " + duration + "s");
}
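If the parallel arrays themselves have to end up free of duplicates, one option is to merge repeated numbers in place. This is only a sketch and not part of the answer above: the class name ParallelArrayDedup and the method mergeDuplicates are made up for illustration, and it assumes the first size entries of both arrays are filled and non-null. It runs in O(n²) time, which is acceptable for a hundred entries but is exactly the cost that a map avoids.

```java
final class ParallelArrayDedup {

    /**
     * Merges entries with the same phone number into the first occurrence,
     * summing their durations. Returns the new logical size; entries at or
     * beyond that index are stale and should be ignored.
     */
    static int mergeDuplicates(String[] phoneNumbers, int[] callDurations, int size) {
        int unique = 0; // number of distinct phone numbers kept so far
        for (int i = 0; i < size; i++) {
            // Look for phoneNumbers[i] among the entries already kept.
            int j = 0;
            while (j < unique && !phoneNumbers[j].equals(phoneNumbers[i])) {
                j++;
            }
            if (j < unique) {
                // Seen before: add this call to the stored total.
                callDurations[j] += callDurations[i];
            } else {
                // First occurrence: move it into the next free "unique" slot.
                phoneNumbers[unique] = phoneNumbers[i];
                callDurations[unique] = callDurations[i];
                unique++;
            }
        }
        return unique;
    }
}
```

After the call, use the returned value as the new size when looping over the arrays.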

The question was: "How is it possible in java to make sure there are no duplicates stored in a parallel array and get total duration for each phone number instead of having them separately in the arrays?"
The answer is: There is no (inexpensive) way.
Use a hash map instead. Have a look at java.util.HashMap. A hash map is a structure that stores values (of any kind) associated with a specific key. In your case the values would be the durations and the keys would be your phone numbers. Therefore you should use a String-to-Integer hash map here.
On insert, do the following for each phone number/duration pair:
Is there already an element in the HashMap with the specified key?
  No -> Add the phone number and duration.
  Yes ->
    Get the stored duration.
    Add the current duration to the stored duration.
    Overwrite the existing item with the newly calculated duration.
Later you can perform lookups efficiently.
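The steps above can be sketched as follows. The class name CallTotals and the method totalDurations are hypothetical names chosen for this illustration; the parameters mirror the parallel arrays from the question, and the sketch assumes the first size entries are filled.

```java
import java.util.HashMap;
import java.util.Map;

final class CallTotals {

    /** Builds a phone number -> total duration map from the parallel arrays. */
    static Map<String, Integer> totalDurations(String[] phoneNumbers, int[] callDurations, int size) {
        Map<String, Integer> totals = new HashMap<>();
        for (int i = 0; i < size; i++) {
            String number = phoneNumbers[i];
            if (!totals.containsKey(number)) {
                // No entry yet: add phone number and duration.
                totals.put(number, callDurations[i]);
            } else {
                // Entry exists: read it, add the current duration, overwrite.
                int stored = totals.get(number);
                totals.put(number, stored + callDurations[i]);
            }
        }
        return totals;
    }
}
```

A lookup such as totals.get("888-555-0000") then returns the summed duration for that number in expected constant time.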

A Map is an object that maps keys to values.
In your case, you want phone numbers (stored in a String) to correspond to call duration (ints). Therefore, you'd declare your HashMap as follows (Note you can't instantiate Map, it is an interface):
Map<String, Integer> callRecords = new HashMap<String, Integer>();
This is a better version because you no longer need to keep track of two different arrays. Now, instead of
phoneNumbers[0] = "888-555-0000";
callDurations[0] = 10;
You can write:
callRecords.put("888-555-0000", 10);
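One caveat worth spelling out: put replaces any value already stored under the key, so calling it twice for the same number keeps only the last duration. To accumulate a total instead, Map.merge (available since Java 8) can be used. The wrapper class CallLog and its method names below are assumptions made for this sketch, not something from the answer.

```java
import java.util.HashMap;
import java.util.Map;

final class CallLog {
    private final Map<String, Integer> callRecords = new HashMap<>();

    /** Adds one call; durations for a repeated number are summed. */
    void addCall(String phoneNumber, int seconds) {
        // If the key is absent, stores seconds; otherwise stores old + seconds.
        callRecords.merge(phoneNumber, seconds, Integer::sum);
    }

    /** Total duration recorded for the number, or 0 if it never called. */
    int totalFor(String phoneNumber) {
        return callRecords.getOrDefault(phoneNumber, 0);
    }
}
```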

Related

Java: Most efficient way to loop through CSV and sum values of one column for each unique value in another Column

I have a CSV file with 500,000 rows of data and 22 columns. This data represents all commercial flights in the USA for one year. I have been tasked with finding the tail number of the plane that flew the most miles in the data set. Column 5 contains the airplane's tail number for each flight. Column 22 contains the total distance traveled.
Please see my extractQ3 method below. First, I created a HashMap for the whole CSV using the createHashMap() method. Then, I ran a for loop to identify every unique tail number in the dataset and stored them in a list called tailNumbers. Then, for each unique tail number, I looped through the entire HashMap to calculate the total miles of distance for that tail number.
The code runs fine on smaller datasets, but once the size increased to 500,000 rows the code became horribly inefficient and takes an eternity to run. Can anyone provide me with a faster way to do this?
public class FlightData {
HashMap<String,String[]> dataMap;
public static void main(String[] args) {
FlightData map1 = new FlightData();
map1.dataMap = map1.createHashMap();
String answer = map1.extractQ3(map1);
}
public String extractQ3(FlightData map1) {
ArrayList<String> tailNumbers = new ArrayList<String>();
ArrayList<Integer> tailMiles = new ArrayList<Integer>();
//Filling the Array with all tail numbers
for (String[] value : map1.dataMap.values()) {
if(Arrays.asList(tailNumbers).contains(value[4])) {
} else {
tailNumbers.add(value[4]);
}
}
for (int i = 0; i < tailNumbers.size(); i++) {
String tempName = tailNumbers.get(i);
int miles = 0;
for (String[] value : map1.dataMap.values()) {
if(value[4].contentEquals(tempName) && value[19].contentEquals("0")) {
miles = miles + Integer.parseInt(value[21]);
}
}
tailMiles.add(miles);
}
Integer maxVal = Collections.max(tailMiles);
Integer maxIdx = tailMiles.indexOf(maxVal);
String maxPlane = tailNumbers.get(maxIdx);
return maxPlane;
}
public HashMap<String,String[]> createHashMap() {
File flightFile = new File("flights_small.csv");
HashMap<String,String[]> flightsMap = new HashMap<String,String[]>();
try {
Scanner s = new Scanner(flightFile);
while (s.hasNextLine()) {
String info = s.nextLine();
String [] piecesOfInfo = info.split(",");
String flightKey = piecesOfInfo[4] + "_" + piecesOfInfo[2] + "_" + piecesOfInfo[11]; //Setting the Key
String[] values = Arrays.copyOfRange(piecesOfInfo, 0, piecesOfInfo.length);
flightsMap.put(flightKey, values);
}
s.close();
}
catch (FileNotFoundException e)
{
System.out.println("Cannot open: " + flightFile);
}
return flightsMap;
}
}

The answer depends on what you mean by "most efficient", "horribly inefficient" and "takes an eternity". These are subjective terms. The answer may also depend on specific technical factors (speed vs. memory consumption; the number of unique flight keys compared to the number of overall records; etc.).
I would recommend applying some basic streamlining to your code, to start with. See if that gets you a better (acceptable) result. If you need more, then you can consider more advanced improvements.
Whatever you do, take some timings to understand the broad impacts of any changes you make.
Focus on going from "horrible" to "acceptable" - and then worry about more advanced tuning after that (if you still need it).
Consider using a BufferedReader instead of a Scanner, although the Scanner may be just fine for your needs (i.e. if it's not a bottleneck).
Consider using logic within your scanner loop to capture tail numbers and accumulated mileage in one pass of the data. The following is deliberately basic, for clarity and simplicity:
// The string is a tail number.
// The integer holds the accumulated miles flown for that tail number:
Map<String, Integer> planeMileages = new HashMap<>();
if (planeMileages.containsKey(tailNumber)) {
// add miles to existing total:
int accumulatedMileage = planeMileages.get(tailNumber) + flightMileage;
planeMileages.put(tailNumber, accumulatedMileage);
} else {
// capture new tail number:
planeMileages.put(tailNumber, flightMileage);
}
After that, once you have completed the scanner loop, you can iterate over your planeMileages to find the largest mileage:
String maxMilesTailNumber = null;
int maxMiles = 0;
for (Map.Entry<String, Integer> entry : planeMileages.entrySet()) {
int planeMiles = entry.getValue();
if (planeMiles > maxMiles) {
maxMilesTailNumber = entry.getKey();
maxMiles = planeMiles;
}
}
WARNING - This approach is just for illustration. It will only capture one tail number. There could be multiple planes with the same maximum mileage. You would have to adjust your logic to capture multiple "winners".
The above approach removes the need for several of your existing data structures, and related processing.
If you still face problems, put in some timers to see which specific areas of your code are slowest - and then you will have more specific tuning opportunities you can focus on.
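Putting the two fragments together, a single-pass version might look like the sketch below. It is an illustration under stated assumptions rather than a drop-in replacement: the class and method names are invented, the column positions (index 4 for the tail number, index 21 for the distance) and the filter on index 19 being "0" are copied from the question's code, and rows that are too short or have a non-numeric distance (such as a header line) are simply skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class MileageByTail {

    /** Returns the tail number with the largest summed distance, or null if no row qualifies. */
    static String tailWithMostMiles(Reader source) {
        Map<String, Integer> planeMileages = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                if (cols.length < 22 || !cols[19].equals("0")) {
                    continue; // short row, or filtered out as in the question
                }
                int miles;
                try {
                    miles = Integer.parseInt(cols[21].trim());
                } catch (NumberFormatException e) {
                    continue; // header line or malformed distance
                }
                // Accumulate the mileage per tail number in the same pass.
                planeMileages.merge(cols[4], miles, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Second, much smaller pass: one entry per distinct tail number.
        String best = null;
        int bestMiles = -1;
        for (Map.Entry<String, Integer> entry : planeMileages.entrySet()) {
            if (entry.getValue() > bestMiles) {
                best = entry.getKey();
                bestMiles = entry.getValue();
            }
        }
        return best;
    }
}
```

The method can be given, for example, a java.io.FileReader opened on the CSV file. The work is proportional to the number of rows plus the number of distinct tail numbers, instead of their product. As with the loop above, ties are not reported: only one of the top planes is returned.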

I suggest you use the Java 8 Stream API, so that you can take advantage of parallel streams.
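As a rough sketch of what that could look like, the rows can be grouped by tail number and summed with Collectors.groupingBy and Collectors.summingInt. The class name MileageStreams and the List of String[] input are assumptions made for this example; the column indexes are taken from the question's code. Whether parallelStream() is actually faster than a sequential stream for this workload is something to measure rather than assume.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

final class MileageStreams {

    /** Each element of rows is one CSV line already split on commas. */
    static Optional<String> tailWithMostMiles(List<String[]> rows) {
        // Group rows by tail number (index 4) and sum the distance (index 21).
        Map<String, Integer> totals = rows.parallelStream()
            .filter((String[] r) -> r.length > 21 && r[19].equals("0"))
            .collect(Collectors.groupingBy(
                (String[] r) -> r[4],
                Collectors.summingInt((String[] r) -> Integer.parseInt(r[21]))));

        // Pick the entry with the largest total; empty if there were no rows.
        return totals.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }
}
```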

How to return a random value within a range, from a Map with Integer and List of Longs in Java

I am trying to find a way to get a random value from a provided list of different ranges using ThreadLocalRandom, and return that one random value from a method. I've been trying different approaches, and not having much luck.
I've tried this:
private static final Long[][] values = {
{ 233L, 333L },
{ 377L, 477L },
{ 610L, 710L }
};
// This isn't correct
long randomValue = ThreadLocalRandom.current().nextLong(values[0][0]);
But I could not figure out how to get a random value out of it for a specific range, so I thought I'd try the Map approach. I tried creating a Map of Integers to Lists of Longs:
private static Map<Integer, List<Long>> mapValues = new HashMap<>();
{{233L, 333L}, {377L, 477L}, {610L, 710L}} // ranges I want
I am not sure how to store those value ranges into the Map.
I've tried adding in values, for example:
// Need to get the other value for the range in here, in this case 333L
map.put(1, 233L);
I am not sure how to add the 333L to the List. I have searched and tried various things but always get errors, such as: found 'long', required List
I want the Integer in the Map to be an id for the associated range, for example 1 for 233L-333L. That way I can first get a random Integer key from the Map (for example 1), then call ThreadLocalRandom.current().nextLong(origin, bound), where origin would be 233L and bound would be 333L, and return a random value within that range of 233L-333L.
I am not sure if this is possible, or I am simply approaching this the wrong way - any guidance/help is appreciated!

It's pretty straightforward. Your long[][] will do fine.
First, select a random index, then select a long between values[index][0] (inclusive) and values[index][1] (exclusive).
long[][] values = {
{ 233L, 333L },
{ 377L, 477L },
{ 610L, 710L }
};
// Select a random index
int index = ThreadLocalRandom.current().nextInt(0, values.length);
// Determine lower and upper bounds
long min = values[index][0];
long max = values[index][1];
long rnd = ThreadLocalRandom.current().nextLong(min, max);
Of course, you could also abstract it away into some convenient classes.
Note that, for the distribution of values to be even, all ranges must have the same size (which seems to be the case in your code).
Implementation with even distribution
However, if you want to support ranges of different sizes while keeping the distribution even, another approach is required.
We could calculate a single random number whose upper bound is the total number of possible values, and then check which 'bucket' that number falls into.
In the linked working example, a random number is generated a million times to test that the distribution is even; each value occurs approximately 200,000 times.
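The bucket idea described above could be sketched like this. It is not the working example the answer refers to, just a minimal illustration: the class name UniformRangePicker and the method pick are invented, each range is assumed to be given as { origin, bound } with an exclusive bound, and at least one range must be non-empty.

```java
import java.util.concurrent.ThreadLocalRandom;

final class UniformRangePicker {

    /** Picks one value uniformly from the union of the half-open ranges [r[0], r[1]). */
    static long pick(long[][] ranges) {
        // Total number of possible values across all ranges.
        long total = 0;
        for (long[] r : ranges) {
            total += r[1] - r[0];
        }
        // One random offset into the concatenation of all ranges.
        long offset = ThreadLocalRandom.current().nextLong(total);
        // Walk the buckets until the offset falls inside one of them.
        for (long[] r : ranges) {
            long size = r[1] - r[0];
            if (offset < size) {
                return r[0] + offset;
            }
            offset -= size;
        }
        throw new IllegalStateException("offset exceeded total range size");
    }
}
```

Because every possible value corresponds to exactly one offset, a larger range is chosen proportionally more often, and each individual value stays equally likely.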
Note: In my examples, the upper bound is exclusive. This is consistent with many methods from the Java standard libraries, like ThreadLocalRandom.nextLong(origin, bound) or LongStream.range(long start, long end).

int range = ThreadLocalRandom.current().nextInt(3);
long randomValue = ThreadLocalRandom.current().nextLong(values[range][0],values[range][1]);
This will work with the array solution you tried first. First you select the range, then you get the random value.

The easiest approach is the most straightforward.
private static final long[][] values = {
{ 233L, 333L },
{ 377L, 477L },
{ 610L, 710L }
};
public static void main(String[] args) {
for (long[] v : values) {
long low = v[0];
long high = v[1];
System.out.println("Between " + low + " and " + high + " -> "
+ getRandom(low, high));
}
}
public static long getRandom(long low, long high) {
// add 1 to high to make range inclusive
return ThreadLocalRandom.current().nextLong(low, high + 1);
}

Network produces the same output regardless of inputs

public static double testElmanWithAnnealing(NeuralDataSet trainingSet,
NeuralDataSet validation,int maxEpoch)
{
// create an elman network
ElmanPattern pattern = new ElmanPattern();
pattern.setActivationFunction(new ActivationTANH());
pattern.setInputNeurons(trainingSet.getInputSize());
pattern.addHiddenLayer(8);
pattern.setOutputNeurons(trainingSet.getIdealSize());
BasicNetwork network = (BasicNetwork)pattern.generate();
network.reset();
// set up a hybrid strategy of resilient + simulated annealing
CalculateScore score = new TrainingSetScore(trainingSet);
final MLTrain trainAlt = new NeuralSimulatedAnnealing(
network, score, 10, 2, 100);
final MLTrain trainMain =
new ResilientPropagation(network, trainingSet);
trainMain.addStrategy(
new HybridStrategy(trainAlt,0.00001,100,3));
int epoch = 0;
do {
trainMain.iteration();
System.out
.println("Epoch #" + epoch + " Error:" + trainMain.getError());
epoch++;
} while(trainMain.getError() > 0.01 && epoch < maxEpoch);
int trueStuff = 0;
int falseStuff = 0;
for(MLDataPair pair: validation ) {
final MLData output = network.compute(pair.getInput());
System.out.println(
"actual=" + output.getData(0) + ",ideal=" + pair.getIdeal().getData(0));
if(output.getData(0) * pair.getIdeal().getData(0) > 0)
trueStuff++;
else
falseStuff++;
}
System.out.println("true classifications:" + trueStuff);
System.out.println("false classifications:" + falseStuff);
return network.calculateError(validation);
}
I have 8 floating-point inputs, normalized to values between -1 and 1 using a simple min/max scheme.
I am trying to classify them into either a negative or a positive value (binary classification), so in the training and validation sets the ideal is either 1 or -1.
The network always produces the same result, or at most one or two distinct results. For example: -0.05686225929855484 around 90% of the time, and some other values occasionally.
Am I using Encog wrong? Does anything in the code stand out to you as a bug?
Can I do anything to punish this behaviour of the neural network?
This is even worse than a random guess; surely there's a way to get better predictions.
Thanks in advance.

Java - Return random index of specific character in string

So given a string such as 0100101, I want to return a single random index from the positions that hold a 1 (1, 4, 6).
So far I'm using:
protected int getRandomBirthIndex(String s) {
ArrayList<Integer> birthIndicies = new ArrayList<Integer>();
for (int i = 0; i < s.length(); i++) {
if ((s.charAt(i) == '1')) {
birthIndicies.add(i);
}
}
return birthIndicies.get(Randomizer.nextInt(birthIndicies.size()));
}
However, it's causing a bottleneck in my code (45% of CPU time is spent in this method), as the strings are over 4000 characters long. Can anyone think of a more efficient way to do this?

If you're interested in a single index of one of the positions with 1, and assuming there is at least one 1 in your input, you can just do this:
String input = "0100101";
final int n=input.length();
Random generator = new Random();
char c=0;
int i=0;
do{
i = generator.nextInt(n);
c=input.charAt(i);
}while(c!='1');
System.out.println(i);
This solution is fast and does not consume much memory when, for example, 1s and 0s are distributed uniformly. As highlighted by @paxdiablo, it can perform poorly in some cases, for example when 1s are scarce.

You could use String.indexOf(int) to find each 1 (instead of iterating every character). I would also prefer to program to the List interface and to use the diamond operator <>. Something like,
private static Random rand = new Random();
protected int getRandomBirthIndex(String s) {
List<Integer> birthIndicies = new ArrayList<>();
int index = s.indexOf('1');
while (index > -1) {
birthIndicies.add(index);
index = s.indexOf('1', index + 1);
}
return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}
Finally, if you need to do this many times, save the List as a field and re-use it (instead of calculating the indices every time). For example with memoization,
private static Random rand = new Random();
private static Map<String, List<Integer>> memo = new HashMap<>();
protected int getRandomBirthIndex(String s) {
List<Integer> birthIndicies;
if (!memo.containsKey(s)) {
birthIndicies = new ArrayList<>();
int index = s.indexOf('1');
while (index > -1) {
birthIndicies.add(index);
index = s.indexOf('1', index + 1);
}
memo.put(s, birthIndicies);
} else {
birthIndicies = memo.get(s);
}
return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}

Well, one way would be to remove the creation of the list each time, by caching the list based on the string itself, assuming the strings are used more often than they're changed. If they're not, then caching methods won't help.
The caching method involves, rather than having just a string, having an object consisting of:
current string;
cached string; and
list based on the cached string.
You can provide a function to the clients to create such an object from a given string and it would set the string and the cached string to whatever was passed in, then calculate the list. Another function would be used to change the current string to something else.
The getRandomBirthIndex() function then receives this structure (rather than the string) and follows the rule set:
if the current and cached strings are different, set the cached string to be the same as the current string, then recalculate the list based on that.
in any case, return a random element from the list.
That way, if the list changes rarely, you avoid the expensive recalculation where it's not necessary.
In pseudo-code, something like this should suffice:
# Constructs fastie from string.
# Sets cached string to something other than
# that passed in (lazy list creation).
def fastie.constructor(string s):
    me.current = s
    me.cached = s + "!"
# Changes current string in fastie. No list update in
# case you change it again before needing an element.
def fastie.changeString(string s):
    me.current = s
# Get a random index, will recalculate list first but
# only if necessary. Empty list returns index of -1.
def fastie.getRandomBirthIndex():
    me.recalcListFromCached()
    if me.list.size() == 0:
        return -1
    return me.list[random(me.list.size())]
# Recalculates the list from the current string.
# Done on an as-needed basis.
def fastie.recalcListFromCached():
    if me.current != me.cached:
        me.cached = me.current
        me.list = empty
        for idx = 0 to me.cached.length() - 1 inclusive:
            if me.cached[idx] == '1':
                me.list.append(idx)
You also have the option of speeding up the actual search for the 1 characters by, for example, using indexOf() to locate them with the underlying Java libraries rather than checking each character individually in your own code (again, pseudo-code):
def fastie.recalcListFromCached():
    if me.current != me.cached:
        me.cached = me.current
        me.list = empty
        idx = me.cached.indexOf('1')
        while idx != -1:
            me.list.append(idx)
            idx = me.cached.indexOf('1', idx + 1)
This method can be used even if you don't cache the values. It's likely to be faster using Java's probably-optimised string search code than doing it yourself.
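For readers who prefer Java to pseudo-code, the same idea might be written roughly as below. This is a translation sketch, not the answer's own code: the class name BirthIndexCache is invented, and instead of the s + "!" trick it uses a null cached string to mean that the list has not been computed yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class BirthIndexCache {
    private final Random random = new Random();
    private final List<Integer> ones = new ArrayList<>();
    private String current;
    private String cached; // null until the list has been built once

    BirthIndexCache(String s) {
        this.current = s; // lazy: the list is built on first use
    }

    /** Changes the current string; the list is only rebuilt when next needed. */
    void changeString(String s) {
        this.current = s;
    }

    /** Returns a random index of a '1', or -1 if the string contains none. */
    int getRandomBirthIndex() {
        recalcListIfStale();
        if (ones.isEmpty()) {
            return -1;
        }
        return ones.get(random.nextInt(ones.size()));
    }

    private void recalcListIfStale() {
        if (current.equals(cached)) {
            return; // string unchanged since the list was built
        }
        cached = current;
        ones.clear();
        // indexOf skips ahead to each '1' using the library's search.
        for (int idx = cached.indexOf('1'); idx != -1; idx = cached.indexOf('1', idx + 1)) {
            ones.add(idx);
        }
    }
}
```

Note that the equals check itself walks the string when two different String objects have the same length, so the saving comes mainly from not rebuilding the list.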
However, you should keep in mind that your supposed problem of spending 45% of time in that code may not be an issue at all. It's not so much the proportion of time spent there as it is the absolute amount of time.
By that, I mean it probably makes no difference what percentage of the time is being spent in that function if it finishes in 0.001 seconds (and you don't need to process thousands of strings per second). You should only really become concerned if the effects become noticeable to the user of your software somehow. Otherwise, optimisation is pretty much wasted effort.

You can also try this. Its best-case complexity is O(1); in the worst case it may reach O(n), and in the very worst case it may never finish, since it depends entirely on the Randomizer function that you are using.
private static Random rand = new Random();
protected int getRandomBirthIndex(String s) {
List<Integer> birthIndicies = new ArrayList<>();
int index = s.indexOf('1');
while (index > -1) {
birthIndicies.add(index);
index = s.indexOf('1', index + 1);
}
return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}

If your Strings are very long and you're sure they contain a lot of 1s (or whatever String you're looking for), it's probably faster to randomly "poke around" in the String until you find what you are looking for. That way you save the time spent iterating over the String:
String s = "0100101";
int index = ThreadLocalRandom.current().nextInt(s.length());
while(s.charAt(index) != '1') {
System.out.println("got not a 1, trying again");
index = ThreadLocalRandom.current().nextInt(s.length());
}
System.out.println("found: " + index + " - " + s.charAt(index));
I'm not sure about the statistics, but in rare cases this solution might take much longer than the iterating solution. One such case is a long String with only a very few occurrences of the search string.
If the Source-String doesn't contain the search String at all, this code will run forever!
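One way to keep the speed of random probing without the risk of looping forever is to cap the number of probes and fall back to a full scan. This is a variation on the answer, not something it proposes: the class name BoundedProbe, the method name and the maxProbes parameter are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class BoundedProbe {

    /** Returns a uniformly random index of a '1' in s, or -1 if there is none. */
    static int randomIndexOfOne(String s, int maxProbes) {
        if (s.isEmpty()) {
            return -1;
        }
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        // Fast path: random probes, cheap when '1' characters are dense.
        for (int attempt = 0; attempt < maxProbes; attempt++) {
            int i = rnd.nextInt(s.length());
            if (s.charAt(i) == '1') {
                return i;
            }
        }
        // Slow path: collect every position once, then pick one of them.
        List<Integer> ones = new ArrayList<>();
        for (int i = s.indexOf('1'); i != -1; i = s.indexOf('1', i + 1)) {
            ones.add(i);
        }
        return ones.isEmpty() ? -1 : ones.get(rnd.nextInt(ones.size()));
    }
}
```

Both paths choose uniformly among the positions holding a 1, so the combined result is still uniform, and the method always terminates after at most maxProbes probes plus one scan.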

One possibility is to use a short-circuited Fisher-Yates style shuffle. Create an array of the indices and start shuffling it. As soon as the next shuffled element points to a one, return that index. If you have iterated through all the indices without finding a one, then the string contains only zeros, so return -1.
If the length of the strings is always the same, the array indices can be static as shown below, and does not need reinitializing on new invocations. If not, you'll have to move the declaration of indices into the method and initialize it each time with the correct index set. The code below was written for strings of length 7, such as your example of 0100101.
// delete this and uncomment below if string lengths vary
private static int[] indices = { 0, 1, 2, 3, 4, 5, 6 };
protected int getRandomBirthIndex(String s) {
int tmp;
/*
* int[] indices = new int[s.length()];
* for (int i = 0; i < s.length(); ++i) indices[i] = i;
*/
for (int i = 0; i < s.length(); i++) {
int j = randomizer.nextInt(indices.length - i) + i;
if (j != i) { // swap to shuffle
tmp = indices[i];
indices[i] = indices[j];
indices[j] = tmp;
}
if ((s.charAt(indices[i]) == '1')) {
return indices[i];
}
}
return -1;
}
This approach terminates quickly if 1's are dense, guarantees termination after s.length() iterations even if there aren't any 1's, and the locations returned are uniform across the set of 1's.

Java - Add user input to 3 arrays? (Parallel Arrays)

So I have 3 parallel arrays. I need a method that lets the user add to these arrays, another method to identify a certain item and remove it, and another method to identify an item and edit the contents of that item in the arrays.
These are my 3 arrays...
I need to add the brand name of the computer to:
String[] computerBrand
I need to add the processor speeds to:
double[] computerSpeed
and I need to add the computer's price to:
double[] computerPrice
The first array (String) holds the brand name of the computer. (Dell)
The second array (double) holds the processor speed of the computer. (2.5)
The third array (double) holds the price of the computer. (1500)
How do I take user input and put them in the array?
(I CANNOT USE ARRAYLISTS)

For taking input, look at the Scanner class (java.util.Scanner).
For adding the values to your arrays just do this:
computerBrand[i] = <value_brand>;
computerSpeed[i] = <value_speed>;
computerPrice[i] = <value_price>;
i++;
where these 3 values are the one read by the Scanner,
and i is some index/counter integer variable.
But first make sure you initialize your arrays e.g.:
computerBrand = new String[100];
computerSpeed = new double[100];
computerPrice = new double[100];
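To tie these pieces together, here is one possible sketch of add, remove and edit on the three parallel arrays without using an ArrayList. Everything beyond the three array names is an assumption made for illustration: the class name ComputerInventory, the count field that plays the role of i, the capacity of 100, and the input format (whitespace-separated brand, speed and price, with a one-word brand).

```java
import java.util.Scanner;

final class ComputerInventory {
    static final int CAPACITY = 100;
    final String[] computerBrand = new String[CAPACITY];
    final double[] computerSpeed = new double[CAPACITY];
    final double[] computerPrice = new double[CAPACITY];
    int count = 0; // number of used slots; the next free index

    /** Stores one computer at the same index in all three arrays. */
    boolean add(String brand, double speed, double price) {
        if (count == CAPACITY) {
            return false; // arrays are full
        }
        computerBrand[count] = brand;
        computerSpeed[count] = speed;
        computerPrice[count] = price;
        count++;
        return true;
    }

    /** Index of the first computer with this brand, or -1 if absent. */
    int indexOf(String brand) {
        for (int i = 0; i < count; i++) {
            if (computerBrand[i].equals(brand)) {
                return i;
            }
        }
        return -1;
    }

    /** Removes the entry by shifting later entries one slot to the left. */
    boolean remove(String brand) {
        int at = indexOf(brand);
        if (at < 0) {
            return false;
        }
        for (int i = at; i < count - 1; i++) {
            computerBrand[i] = computerBrand[i + 1];
            computerSpeed[i] = computerSpeed[i + 1];
            computerPrice[i] = computerPrice[i + 1];
        }
        count--;
        computerBrand[count] = null; // clear the now-unused last slot
        return true;
    }

    /** Overwrites speed and price of an existing entry. */
    boolean edit(String brand, double newSpeed, double newPrice) {
        int at = indexOf(brand);
        if (at < 0) {
            return false;
        }
        computerSpeed[at] = newSpeed;
        computerPrice[at] = newPrice;
        return true;
    }

    /** Reads "brand speed price" triples; throws NumberFormatException on bad numbers. */
    int readFrom(Scanner in) {
        int added = 0;
        while (in.hasNext()) {
            String brand = in.next();
            if (!in.hasNext()) break;
            double speed = Double.parseDouble(in.next());
            if (!in.hasNext()) break;
            double price = Double.parseDouble(in.next());
            if (!add(brand, speed, price)) break;
            added++;
        }
        return added;
    }
}
```

For interactive use, readFrom can be passed new Scanner(System.in); the numbers are parsed with Double.parseDouble so that input such as 2.5 is read the same way regardless of the default locale.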

// Create the arrays
// (in real code it is better to use double for price and speed)
String[] computerBrand = new String[5];
String[] computerSpeed = new String[5];
String[] computerPrice = new String[5];
// Now you have 3 arrays that hold the computer info.
// The computer at index 0 has its name in computerBrand[0], its speed in computerSpeed[0] and its price in computerPrice[0].
// Put info into the arrays. Random values are used here; in the real code you get the info from the user, in the same way as for any other array.
for (int i = 0; i < 5; ++i)
{
computerBrand[i] = "Computer " + i;
computerSpeed[i] = String.valueOf(Math.floor(Math.random()*10));
computerPrice[i] = String.valueOf(Math.floor(Math.random()*500));
}
// print info
//
for (int i = 0; i < 5; ++i)
{
// As you can see, I have used the same index in every array
System.out.println(
"Brand: " + computerBrand[i] +
" Speed: " + computerSpeed[i] +
" Price: " + computerPrice[i] + "E"
);
}
By reading the code you can see one simple thing: all the arrays share the same index.
For your real code, you just need to get the name, price and speed of the PC and put everything in the arrays using the same index. If you fill the arrays from separate pieces of code, you can store the last index and reuse it (the details depend on how it should work).
