Java Searching through two Arrays

I have two ArrayLists: list A has 8.1k elements and list B has 81k elements.
I need to iterate through B, search A for the matching item, and then change a field in the matched element of list B.
Here's my code:
private void mapAtoB(List<A> aList, ListIterator<B> it) {
    AtomicInteger i = new AtomicInteger(-1);
    while (it.hasNext()) {
        System.out.print(i.incrementAndGet() + ", ");
        B b = it.next();
        aList.stream().filter(a -> b.equalsB(a)).forEach(a -> {
            b.setId(String.valueOf(a.getRedirectId()));
            it.set(b);
        });
    }
    System.out.println();
}

public class B {
    public boolean equalsB(A a) {
        if (a == null) return false;
        if (this.getFullURL().contains(a.getFirstName())) return true;
        return false;
    }
}
But this is taking forever: the method takes close to 15 minutes to finish, which is way too long. Is there any way to optimize it?

I'd be happy to see a good and thorough solution; meanwhile, I can propose two ideas (or perhaps two incarnations of one).
The first is to speed up searching for all objects of type A within a single object of type B. For that, the Rabin-Karp algorithm seems applicable and simple enough to implement quickly; Aho-Corasick is harder to implement but will probably give better results, though I'm not sure by how much.
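To make the first idea concrete, here is a minimal sketch of a multi-pattern Rabin-Karp matcher, assuming the patterns are the A.getFirstName() values and the text is each B.getFullURL(). The class name RabinKarpMatcher, the base and the modulus are illustrative choices, not part of the answer. Patterns are grouped by length, so each URL is scanned once per distinct pattern length instead of once per A, and every hash hit is verified with regionMatches to rule out collisions:

```java
import java.util.*;

/** Sketch: multi-pattern Rabin-Karp. Patterns are grouped by length; for each length
 *  one rolling hash slides over the text and is looked up in a table of pattern hashes. */
class RabinKarpMatcher {
    private static final long BASE = 257;
    private static final long MOD = 1_000_000_007L;

    // pattern length -> (hash -> patterns with that length and hash)
    private final Map<Integer, Map<Long, List<String>>> byLength = new HashMap<>();

    RabinKarpMatcher(Collection<String> patterns) {
        for (String p : patterns) {
            if (p.isEmpty()) continue; // assumption: empty patterns are ignored
            byLength.computeIfAbsent(p.length(), k -> new HashMap<>())
                    .computeIfAbsent(hash(p, 0, p.length()), k -> new ArrayList<>())
                    .add(p);
        }
    }

    private static long hash(String s, int from, int to) {
        long h = 0;
        for (int i = from; i < to; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    /** Returns every pattern that occurs somewhere in text. */
    Set<String> findAll(String text) {
        Set<String> found = new HashSet<>();
        for (Map.Entry<Integer, Map<Long, List<String>>> e : byLength.entrySet()) {
            int len = e.getKey();
            if (len > text.length()) continue;
            Map<Long, List<String>> table = e.getValue();
            long pow = 1; // BASE^(len-1) mod MOD, used to remove the leading character
            for (int i = 1; i < len; i++) pow = (pow * BASE) % MOD;
            long h = hash(text, 0, len);
            for (int start = 0; ; start++) {
                List<String> candidates = table.get(h);
                if (candidates != null) {
                    for (String p : candidates) {
                        // a hash hit may be a collision, so compare the actual characters
                        if (text.regionMatches(start, p, 0, len)) found.add(p);
                    }
                }
                if (start + len >= text.length()) break; // no further window
                // roll the hash: drop text[start], append text[start + len]
                h = (h - text.charAt(start) * pow % MOD + MOD) % MOD;
                h = (h * BASE + text.charAt(start + len)) % MOD;
            }
        }
        return found;
    }
}
```

Built once from all first names, findAll(b.getFullURL()) would replace the inner stream over aList. If several names match one URL you still have to decide which one wins (the original loop keeps the last match in list order). With many distinct name lengths the gain shrinks, which is where Aho-Corasick, scanning the text once for all patterns, does better.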
The other option is to limit the number of objects of type B that have to be fully processed for each object of type A. For that you could, for example, build an inverted N-gram index: for each fullURL, take all of its substrings of length N ("N-grams") and build a map from each N-gram to the set of B's whose fullURL contains it. When searching for an object A, take all of its N-grams, look up the set of B's for each one and intersect these sets; the intersection contains all the B's that you still need to process fully. I implemented this approach quickly: for the sizes you specified, it gives a 6-7x speedup for N=4. As N grows, searching becomes faster but building the index becomes slower (so if you can reuse the index, you are probably better off choosing a bigger N). The index takes about 200 MB for the sizes you specified, so this approach will only scale so far as the collection of B's grows. Assuming that all strings are at least NGRAM_LENGTH characters long, here's quick and dirty code for building the index using Guava's SetMultimap and HashMultimap:
SetMultimap<String, B> idx = HashMultimap.create();
for (B b : bList) {
    for (int i = 0; i < b.getFullURL().length() - NGRAM_LENGTH + 1; i++) {
        idx.put(b.getFullURL().substring(i, i + NGRAM_LENGTH), b);
    }
}
And for the search:
private void mapAtoB(List<A> aList, SetMultimap<String, B> mmap) {
    for (A a : aList) {
        Collection<B> possible = null;
        for (int i = 0; i < a.getFirstName().length() - NGRAM_LENGTH + 1; i++) {
            String ngram = a.getFirstName().substring(i, i + NGRAM_LENGTH);
            Set<B> forNgram = mmap.get(ngram);
            if (possible == null) {
                possible = new ArrayList<>(forNgram);
            } else {
                possible.retainAll(forNgram); // forNgram is a Set, so each contains() check is cheap
            }
            if (possible.size() < 20) { // it's ok to scan through 20
                break;
            }
        }
        for (B b : possible) {
            if (b.equalsB(a)) {
                b.setId(String.valueOf(a.getRedirectId())); // setId takes a String in the question's code
            }
        }
    }
}
A possible direction for further optimization would be to use hashes instead of full N-grams, reducing the memory footprint and avoiding the need to compare N-gram strings as keys.
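A minimal sketch of that idea, using only the JDK so it runs without Guava: the index is keyed by an int hash of each N-gram (the same polynomial as String.hashCode(), computed without creating substrings) and maps to positions in a list of URLs rather than to B objects. The class name HashedNgramIndex and the position-based design are assumptions for illustration. Because different N-grams can share a hash, the intersection may contain extra candidates, so the final contains() check is what guarantees correctness:

```java
import java.util.*;

/** Sketch: inverted N-gram index keyed by int hashes instead of N-gram strings.
 *  Collisions only add false candidates, which the final contains() check removes. */
class HashedNgramIndex {
    private final int n;
    private final List<String> urls;
    private final Map<Integer, Set<Integer>> index = new HashMap<>();

    HashedNgramIndex(List<String> urls, int n) {
        this.n = n;
        this.urls = urls;
        for (int id = 0; id < urls.size(); id++) {
            String url = urls.get(id);
            for (int i = 0; i + n <= url.length(); i++) {
                index.computeIfAbsent(ngramHash(url, i), k -> new HashSet<>()).add(id);
            }
        }
    }

    /** Same polynomial as String.hashCode(), over s[from, from + n), without allocating. */
    private int ngramHash(String s, int from) {
        int h = 0;
        for (int i = from; i < from + n; i++) h = 31 * h + s.charAt(i);
        return h;
    }

    /** Positions (ascending) of URLs containing needle; needle must be at least n chars long. */
    List<Integer> findContaining(String needle) {
        if (needle.length() < n) throw new IllegalArgumentException("needle shorter than n");
        Set<Integer> candidates = null;
        for (int i = 0; i + n <= needle.length(); i++) {
            Set<Integer> posting = index.getOrDefault(ngramHash(needle, i), Collections.emptySet());
            if (candidates == null) candidates = new HashSet<>(posting);
            else candidates.retainAll(posting);
            if (candidates.isEmpty()) break; // no URL can contain the needle
        }
        List<Integer> result = new ArrayList<>();
        for (int id : candidates) {
            if (urls.get(id).contains(needle)) result.add(id); // verify the real match
        }
        Collections.sort(result);
        return result;
    }
}
```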

Related

Saving memory and CPU in java loops

This (obvious) code I've written works well, but for testing purposes I need to make it work on a one-million-element array in reasonable time, saving CPU cycles and as much memory as I can.
Any suggestions?
!!! The array is sorted in ascending order !!!
import java.util.Arrays;

class A {
    static boolean exists(int[] ints, int k) {
        for (int integer : ints) {
            if (integer == k) {
                return true;
            }
        }
        return false;
    }
}

Since your array is sorted in ascending order, one thing you could do is use a binary search instead of a linear search: it needs at most about 20 comparisons for a million elements instead of up to a million.
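A minimal sketch of that suggestion, using the JDK's Arrays.binarySearch (which returns a non-negative index only when the key is found, and requires the array to be sorted) and, for comparison, the same search written by hand. The class name SortedLookup is hypothetical. Neither version needs any memory beyond the original int[]:

```java
import java.util.Arrays;

class SortedLookup {
    /** JDK binary search: true iff k is in the sorted array, O(log n). */
    static boolean exists(int[] ints, int k) {
        return Arrays.binarySearch(ints, k) >= 0;
    }

    /** The same idea written out by hand. */
    static boolean existsManual(int[] ints, int k) {
        int lo = 0, hi = ints.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow for huge arrays
            if (ints[mid] < k) lo = mid + 1;
            else if (ints[mid] > k) hi = mid - 1;
            else return true;
        }
        return false;
    }
}
```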

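You could use a Set<Integer>, which relies on hashing, rather than an array that you iterate sequentially. Lookups then take constant time on average, but the boxed Integer objects make the set use considerably more memory than the int[], so this pays off mainly when you query the same data many times.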
static boolean exists(Set<Integer> ints, int k) {
    return ints.contains(k);
}
Convert the array to a Set once, then pass it to the method as many times as required:
Set<Integer> set = Arrays.stream(ints).boxed().collect(Collectors.toSet());
boolean isExist = exists(set, 15);
...
isExist = exists(set, 5005);
...
isExist = exists(set, 355);

Is it possible to get next element in the Stream?

I am trying to convert a for loop to functional code. I need to look ahead one value and also look behind one value. Is this possible using streams?
The following code converts a Roman numeral string to its numeric value.
I'm not sure whether the two- or three-argument reduce method can help here.
int previousCharValue = 0;
int total = 0;
for (int i = 0; i < input.length(); i++) {
    char current = input.charAt(i);
    RomanNumeral romanNum = RomanNumeral.valueOf(Character.toString(current));
    if (previousCharValue > 0) {
        total += (romanNum.getNumericValue() - previousCharValue);
        previousCharValue = 0;
    } else {
        if (i < input.length() - 1) {
            char next = input.charAt(i + 1);
            RomanNumeral nextNum = RomanNumeral.valueOf(Character.toString(next));
            if (romanNum.getNumericValue() < nextNum.getNumericValue()) {
                previousCharValue = romanNum.getNumericValue();
            }
        }
        if (previousCharValue == 0) {
            total += romanNum.getNumericValue();
        }
    }
}

No, this is not possible using streams, at least not easily. The Stream API abstracts away the order in which elements are processed: the stream might be processed in parallel, in which case elements are not necessarily handled in encounter order. So "the next element" and "the previous element" do not exist in the stream abstraction.
You should use the API best suited for the job: streams are excellent if you need to apply some operation to all elements of a collection and you are not interested in the order. If you need to process the elements in a certain order, you have to use iterators or access the list elements through indices.

I haven't seen such a use case with streams, so I can't say whether it is possible or not. But when I need to use streams with an index, I use IntStream.range(0, table.length) and then, in the lambdas, read the value from the table/list.
For example:
int[] arr = {1, 2, 3, 4};
int result = IntStream.range(0, arr.length)
        .map(idx -> idx > 0 ? arr[idx] + arr[idx - 1] : arr[idx])
        .sum();
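Applied to the question's Roman numeral problem, the same index-based idea gives a sketch like the following. The lookup map stands in for the question's RomanNumeral enum (whose definition isn't shown), the class name RomanStreams is hypothetical, and the input is assumed to contain only valid symbols. The rule "subtract a symbol if a larger one follows it, otherwise add it" matches the original loop for well-formed numerals:

```java
import java.util.Map;
import java.util.stream.IntStream;

class RomanStreams {
    // hypothetical lookup table standing in for the question's RomanNumeral enum
    private static final Map<Character, Integer> VALUES = Map.of(
            'I', 1, 'V', 5, 'X', 10, 'L', 50, 'C', 100, 'D', 500, 'M', 1000);

    /** Each symbol is subtracted if a larger symbol follows it, otherwise added. */
    static int toInt(String input) {
        return IntStream.range(0, input.length())
                .map(i -> {
                    int current = VALUES.get(input.charAt(i));
                    // look ahead through the index, not through the stream
                    boolean smallerThanNext = i + 1 < input.length()
                            && current < VALUES.get(input.charAt(i + 1));
                    return smallerThanNext ? -current : current;
                })
                .sum();
    }
}
```

For example, toInt("XIV") evaluates to 10 - 1 + 5 = 14.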

By the nature of a stream, you don't know the next element until you read it, so directly obtaining the next element while processing the current one is not possible. However, since you are reading the current element, you obviously know what was read before. So to achieve goals such as "accessing the previous element" and "accessing the next element", you can rely on the history of elements that were already processed.
The following two solutions are possible for your problem:
1. Keep access to previously read elements. This way you know the current element and a fixed number of previously read elements.
2. Assume that, while processing the stream, you are reading the next element and that the current element was read in the previous iteration. In other words, you treat the previously read element as "current" and the element currently being processed as "next" (see below).
Solution 1 - implementation
First we need a data structure that keeps track of the data flowing through the stream. A good choice is a Queue, because queues by their nature let data flow through them. We only need to bound the queue to the number of most recent elements we want to know (3 elements for your use case). For this we create a "bounded" queue that keeps the history, like this:
import java.util.LinkedList;

public class StreamHistory<T> {
    private final int numberOfElementsToRemember;
    private LinkedList<T> queue = new LinkedList<T>(); // queue will store at most numberOfElementsToRemember elements

    public StreamHistory(int numberOfElementsToRemember) {
        this.numberOfElementsToRemember = numberOfElementsToRemember;
    }

    public StreamHistory<T> save(T curElem) {
        if (queue.size() == numberOfElementsToRemember) {
            queue.pollLast(); // remove the oldest element to keep only the requested number of elements
        }
        queue.offerFirst(curElem);
        return this;
    }

    public LinkedList<T> getLastElements() {
        return queue; // or return an immutable copy or view of the queue, depending on what you want
    }
}
The generic parameter T is the type of the actual elements of the stream. The save method returns a reference to the current StreamHistory instance for better integration with the Java Stream API (see below); this is not strictly required.
Now the only thing left to do is to convert the stream of elements into a stream of StreamHistory instances (where each element of the new stream holds the last n objects that went through the original stream).
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamHistoryTest {
    public static void main(String[] args) {
        Stream<Character> charactersStream = IntStream.range(97, 123).mapToObj(code -> (char) code); // original stream
        StreamHistory<Character> streamHistory = new StreamHistory<>(3); // instance of StreamHistory which will store the last 3 elements
        charactersStream.map(character -> streamHistory.save(character)).forEach(history -> {
            history.getLastElements().forEach(System.out::print);
            System.out.println();
        });
    }
}
In the example above, we first create a stream of all letters of the alphabet. Then we create an instance of StreamHistory, which is passed to each invocation of map() on the original stream. Via map() we convert the original stream into a stream of references to our StreamHistory instance.
Note that each time an element flows through the original stream, the call to streamHistory.save(character) updates the content of the streamHistory object to reflect the current state of the stream. Because this relies on shared mutable state inside the lambda, it only behaves as described for a sequential stream; in a parallel stream the saved history would not be reliable.
Finally, in each iteration we print the last 3 saved characters. The output of this method is:
a
ba
cba
dcb
edc
fed
gfe
hgf
ihg
jih
kji
lkj
mlk
nml
onm
pon
qpo
rqp
srq
tsr
uts
vut
wvu
xwv
yxw
zyx
Solution 2 - implementation
While solution 1 will do the job in most cases and is fairly easy to follow, there are use cases where the ability to inspect the next and the previous element is really convenient. In such a scenario we are only interested in three-element tuples (previous, current, next), and the moments when only one or two elements have been read do not matter (as a simple example, consider the following riddle: "given a stream of numbers, return the tuple of three consecutive numbers that gives the highest sum"). To solve such use cases we might want a more convenient API than the StreamHistory class.
For this scenario we introduce a new variation of the StreamHistory class, which we call StreamNeighbours. The class lets us inspect the previous and the next element directly. Processing is done at time "T-1": the currently processed original element is considered the next element, and the previously processed original element is considered the current element. This way we, in some sense, inspect one element ahead.
The modified class is as follows:
import java.util.LinkedList;

public class StreamNeighbours<T> {
    private LinkedList<T> queue = new LinkedList<>(); // stores one element before current and one after
    private boolean threeElementsRead; // at least three items were added: only then can we inspect "next" and "previous"

    /**
     * Handles the situation when only one element has been read, so this instance of StreamNeighbours
     * is not yet ready to return the next element.
     */
    public boolean isFirst() {
        return queue.size() == 1;
    }

    /**
     * Reads the first element when fewer than three elements have been read, so this instance of
     * StreamNeighbours is not yet ready to return both the next and the previous element.
     * @return the first element that was added
     */
    public T getFirst() {
        if (isFirst()) {
            return queue.getFirst();
        } else if (isSecond()) {
            return queue.get(1);
        } else {
            throw new IllegalStateException("Call to getFirst() only possible when one or two elements were added. Call getCurrent() instead. To inspect the number of elements, call isFirst() or isSecond().");
        }
    }

    /**
     * Handles the situation when only two elements have been read, so this instance of StreamNeighbours
     * is not yet ready to return the next element (we always need three elements to have a previous and a next element).
     */
    public boolean isSecond() {
        return queue.size() == 2;
    }

    public T getSecond() {
        if (!isSecond()) {
            throw new IllegalStateException("Call to getSecond() only possible when exactly two elements were added. Call getFirst() or getCurrent() instead.");
        }
        return queue.getFirst();
    }

    /**
     * Checks whether this instance of StreamNeighbours is ready to return both the next and the previous element.
     * @return true once at least three elements have been added
     */
    public boolean areThreeElementsRead() {
        return threeElementsRead;
    }

    public StreamNeighbours<T> addNext(T nextElem) {
        if (queue.size() == 3) {
            queue.pollLast(); // remove the oldest element to keep only three
        }
        queue.offerFirst(nextElem);
        if (!areThreeElementsRead() && queue.size() == 3) {
            threeElementsRead = true;
        }
        return this;
    }

    public T getCurrent() {
        ensureReadyForReading();
        return queue.get(1); // the current element is always in the middle once three elements were read
    }

    public T getPrevious() {
        if (!isFirst()) {
            return queue.getLast();
        } else {
            throw new IllegalStateException("Unable to read the previous element of the first element. Call isFirst() to check whether this is the first element.");
        }
    }

    public T getNext() {
        ensureReadyForReading();
        return queue.getFirst();
    }

    private void ensureReadyForReading() {
        if (!areThreeElementsRead()) {
            throw new IllegalStateException("Queue is not ready for reading (fewer than three elements were added). Call areThreeElementsRead() to check whether it's OK to call getCurrent().");
        }
    }
}
Now, assuming that three elements have already been read, we can directly access the current element (the element that went through the stream at time T-1), the next element (the element going through the stream at this moment) and the previous element (the element that went through the stream at time T-2):
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamTest {
    public static void main(String[] args) {
        Stream<Character> charactersStream = IntStream.range(97, 123).mapToObj(code -> (char) code);
        StreamNeighbours<Character> streamNeighbours = new StreamNeighbours<>();
        charactersStream.map(character -> streamNeighbours.addNext(character)).forEach(neighbours -> {
            // NOTE: if you want access to the values before the StreamNeighbours instance is ready to serve
            // three elements, you can use the methods below: isFirst() -> getFirst(), isSecond() -> getSecond()
            //
            // if (neighbours.isFirst()) {
            //     Character currentChar = neighbours.getFirst();
            //     System.out.println("???" + " " + currentChar + " " + "???");
            // } else if (neighbours.isSecond()) {
            //     Character currentChar = neighbours.getSecond();
            //     System.out.println(String.valueOf(neighbours.getFirst()) + " " + currentChar + " " + "???");
            // }
            //
            // OTHERWISE: you are only interested in tuples of three elements, so three elements need to have been read
            if (neighbours.areThreeElementsRead()) {
                System.out.println(neighbours.getPrevious() + " " + neighbours.getCurrent() + " " + neighbours.getNext());
            }
        });
    }
}
The output is:
a b c
b c d
c d e
d e f
e f g
f g h
g h i
h i j
i j k
j k l
k l m
l m n
m n o
n o p
o p q
p q r
q r s
r s t
s t u
t u v
u v w
v w x
w x y
x y z
With the StreamNeighbours class it is easier to track the previous and next element (because it has methods with appropriate names), while with the StreamHistory class this is more cumbersome, since we need to manually "reverse" the order of the queue to achieve this.
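For comparison, the riddle mentioned above (three consecutive numbers with the highest sum) does not strictly need a helper class when the numbers are in an array: a sliding window over a plain loop is enough. A minimal sketch, with the hypothetical class name MaxTripleSum and ties resolved in favor of the first window:

```java
import java.util.Arrays;

class MaxTripleSum {
    /** Returns the three consecutive numbers with the largest sum (first such window on ties). */
    static int[] maxTriple(int[] nums) {
        if (nums.length < 3) throw new IllegalArgumentException("need at least three numbers");
        int bestStart = 0;
        int windowSum = nums[0] + nums[1] + nums[2];
        int bestSum = windowSum;
        for (int i = 3; i < nums.length; i++) {
            // slide the window one step: add the new element, drop the oldest one
            windowSum += nums[i] - nums[i - 3];
            if (windowSum > bestSum) {
                bestSum = windowSum;
                bestStart = i - 2;
            }
        }
        return Arrays.copyOfRange(nums, bestStart, bestStart + 3);
    }
}
```

Each step reuses the previous window sum, so the whole input is processed in one pass with constant extra memory.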

As others have stated, it's not feasible to get the next element from within an iterated Stream.
If IntStream is used as a substitute for a for loop, merely acting as a provider of iteration indices, you can use its range index just as you would with for. You do, however, need a way to skip the next element on the next iteration, e.g. an external skip variable, like this:
AtomicBoolean skip = new AtomicBoolean();
List<String> patterns = IntStream.range(0, ptrnStr.length())
        .mapToObj(i -> {
            if (skip.get()) {
                skip.set(false);
                return ""; // placeholder for the escaped character that was already consumed
            }
            char c = ptrnStr.charAt(i);
            if (c == '\\') {
                skip.set(true);
                return String.valueOf(new char[] { c, ptrnStr.charAt(i + 1) });
            }
            return String.valueOf(c);
        })
        .filter(s -> !s.isEmpty()) // drop the placeholders
        .collect(Collectors.toList());
It's not pretty, but it works.
On the other hand, with a plain for loop it can be as simple as:
List<String> patterns = new ArrayList<>();
for (int i = 0; i < ptrnStr.length(); i++) {
    char c = ptrnStr.charAt(i);
    patterns.add(
        c != '\\'
            ? String.valueOf(c)
            : String.valueOf(new char[] { c, ptrnStr.charAt(++i) }) // also consume the escaped character
    );
}
EDIT:
Condensed the code and added the for loop example.

Calling function from several vectors ordered by element value

I have several vectors with different element types, all extending a class that has a specific method. For example:
Vector<classone> one;
Vector<classtwo> two;
Vector<classthree> three;
classone, classtwo and classthree extend Number, and Number has two methods:
doThing()
getValue()
What I want is to call doThing() on all the elements, in the order of the values returned by getValue(), across all the vectors.
One cheap solution would be to concatenate all the vectors into a single Vector, sort it by value and iterate over it to call the method. But that means creating a huge new vector that occupies extra RAM, and since doThing will happen 60 times a second, it might be overkill if the vectors become big. I don't really want to create a new vector just to sort it. Is there any other solution that uses the existing vectors?
It's Java, by the way.

If one, two and three are each sorted, you could create a custom iterator that, for a given set of lists, checks which list has the smallest value at its current position and advances that list.
It should look similar to this (not tested):
class MultiListIterator {
    List<? extends Number>[] lists; // wildcard so that e.g. a Vector<classone> can be passed in
    int[] positions;

    MultiListIterator(List<? extends Number>... lists) {
        this.lists = lists;
        positions = new int[lists.length];
    }

    boolean hasNext() {
        for (int i = 0; i < lists.length; i++) {
            if (positions[i] < lists[i].size()) return true;
        }
        return false;
    }

    Number next() {
        int bestIndex = -1;
        Number bestNumber = null;
        for (int i = 0; i < lists.length; i++) {
            int p = positions[i];
            if (p >= lists[i].size()) continue; // this list is exhausted
            Number n = lists[i].get(p);
            if (bestNumber == null || n.getValue() < bestNumber.getValue()) {
                bestIndex = i;
                bestNumber = n;
            }
        }
        if (bestNumber == null) throw new RuntimeException("next() beyond hasNext()");
        positions[bestIndex]++; // advance only the list we took the element from
        return bestNumber;
    }
}
Usage:
MultiListIterator mli = new MultiListIterator(one, two, three);
while (mli.hasNext()) {
    mli.next().doThing();
}
You may want to let MultiListIterator implement Iterator<Number>.
Note that Java already has a built-in class Number. Using the same name for your class might lead to a lot of confusion when you forget to import it somewhere.
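Following the suggestion to implement Iterator, here is a minimal generic sketch that can be compiled and tested on its own: MergingIterator (a hypothetical name) implements Iterator<T> and takes a Comparator in place of the custom Number class's getValue(). Like the iterator above, it assumes each list is already sorted and scans the head of every list on each call, which is cheap for three lists:

```java
import java.util.*;

/** Sketch: lazily merges several individually sorted lists without copying them. */
class MergingIterator<T> implements Iterator<T> {
    private final List<? extends List<? extends T>> lists;
    private final Comparator<? super T> order;
    private final int[] positions; // next unread index in each list

    MergingIterator(List<? extends List<? extends T>> lists, Comparator<? super T> order) {
        this.lists = lists;
        this.order = order;
        this.positions = new int[lists.size()];
    }

    @Override
    public boolean hasNext() {
        for (int i = 0; i < positions.length; i++) {
            if (positions[i] < lists.get(i).size()) return true;
        }
        return false;
    }

    @Override
    public T next() {
        int bestIndex = -1;
        T best = null;
        for (int i = 0; i < positions.length; i++) {
            List<? extends T> list = lists.get(i);
            if (positions[i] >= list.size()) continue; // this list is exhausted
            T candidate = list.get(positions[i]);
            if (bestIndex < 0 || order.compare(candidate, best) < 0) {
                bestIndex = i;
                best = candidate;
            }
        }
        if (bestIndex < 0) throw new NoSuchElementException();
        positions[bestIndex]++; // advance only the list we took from
        return best;
    }
}
```

With the question's classes, T would be the custom Number class and the comparator would compare getValue() results. For many lists, a PriorityQueue of list heads would replace the linear scan.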

Premature optimizations are generally a bad idea.
Try the method that came to mind first: creating one giant ArrayList (rather than a Vector) and sorting it. If it turns out to be a performance issue, then you can start trying new things.
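For reference, that straightforward approach could be sketched as below. mergeBySort is a hypothetical helper; presizing the ArrayList avoids repeated regrowth while copying. Unlike the iterator approach, it does not require the input lists to be sorted:

```java
import java.util.*;

class ConcatAndSort {
    /** Copies all lists into one ArrayList and sorts it: the "simple first" approach. */
    @SafeVarargs
    static <T> List<T> mergeBySort(Comparator<? super T> order, List<? extends T>... lists) {
        int total = 0;
        for (List<? extends T> l : lists) total += l.size();
        List<T> all = new ArrayList<>(total); // presized to avoid regrowth
        for (List<? extends T> l : lists) all.addAll(l);
        all.sort(order);
        return all;
    }
}
```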

Java: Recursively Finding the minimum element in a list

I will preface this by saying it is homework; I am just looking for some pointers. I have been racking my brain with this one, and for the life of me I am just not getting it. We are asked to find the minimum element in a list. I know I need a sublist here, but after that I am not sure. Any pointers would be great. Thanks.
/** Find the minimum element in a list.
 *
 * @param t a list of integers
 *
 * @return the minimum element in the list
 */
public static int min(List<Integer> t) {
    if (t.size() == 1) {
        return t.get(0);
    } else {
        List<Integer> u = t.subList(1, t.size());
        // ... not sure how to continue from here
    }
}

The point of a recursive algorithm is that everything that must be computed is done through return values or additional parameters. You shouldn't have anything outside the local call of the recursive step.
Since you have to find the minimum element, consider the following:
the minimum of a list consisting of one element is that element
the minimum of a general list is the smaller of its first element and the minimum of the rest of the list
With these in mind it should be easy to implement, especially because recursive algorithms tend to look very similar to their description.
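Following the point about additional parameters, here is a minimal sketch that passes a start index down the recursion instead of creating sublists. The class and method names are hypothetical, and it throws on an empty list, since an empty list has no minimum:

```java
import java.util.List;

class RecursiveMin {
    /** Minimum of a non-empty list, recursing on an index parameter instead of sublists. */
    static int min(List<Integer> t) {
        if (t.isEmpty()) throw new IllegalArgumentException("empty list has no minimum");
        return minFrom(t, 0);
    }

    private static int minFrom(List<Integer> t, int from) {
        if (from == t.size() - 1) {
            return t.get(from); // base case: the minimum of a one-element list is that element
        }
        // recursive case: compare the first element with the minimum of the rest
        return Math.min(t.get(from), minFrom(t, from + 1));
    }
}
```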

You need to find the relationship between the function min applied to a list and the function min applied to a sublist.
min([a b c d e ...]) = f(a, min([b c d e ...]))
Now you just need to find the function f. Once you have the relationship, then to implement it is easy. Good luck.

In the most general sense, recursion is a concept based on breaking down work, and then delegating the smaller chunk of work to a copy of yourself. For recursion to work, you need three main things:
The breakdown of work. How are you going to make each step "simpler"?
The recursive call. At some point your function must call itself, but with less "work".
The base case. What is a (usually trivial) end case that will stop the recursion process?
In your case, you're trying to create a function min that operates on a list. You're correct in thinking that you could reduce (break down) your work by making the list one element smaller each time (taking a sublist without the first element). As others have mentioned, the idea is to compare the first element (which you just pulled off) against the minimum of "the rest of the list". Here's where the leap of faith comes in: at this point, you can "assume" that your min function will work on the sublist, and just call it on the sublist (the recursive call). Now you have to make sure that all your calls return (i.e. that the function will not recurse forever). That's where your base case comes in: if your list has size 1, its only element is the smallest one. There's no need to call min again; just return it (you already have that part in your original post).

/**
 * The function computes the minimum item of m (-1 if m is empty).
 * @param m the MyList whose minimum item we want to compute
 * @return the minimum item of m
 */
public int minimum(MyList<Integer> m) {
    int res = 0;
    int e0 = 0;
    int e1 = 0;
    // Scenario identification
    int scenario = 0;
    if (m.length() == 0) {
        // Type 1. MyList is empty
        scenario = 1;
    } else if (m.length() == 1) {
        // Type 2. MyList has exactly one element
        scenario = 2;
    } else {
        // Type 3. MyList has two or more elements
        scenario = 3;
    }
    // Scenario implementation
    switch (scenario) {
        // If MyList is empty
        case 1:
            res = -1;
            break;
        // If there is exactly one element, it is the minimum
        // (recursing down to the empty case would mix the -1 sentinel into the comparison)
        case 2:
            res = m.getElement(0);
            break;
        // If there are two or more elements
        case 3:
            //1. Get and store the first element of the list
            e0 = m.getElement(0);
            //2. Remove the first element we just checked from MyList
            m.removeElement(0);
            //3. Recursively solve the smaller problem
            e1 = minimum(m);
            //4. Compare and store the result
            if (e0 < e1) {
                res = e0;
            } else {
                res = e1;
            }
            //5. Put the removed element back into the list
            m.addElement(0, e0);
            break;
    }
    //6. Return the result
    return res;
}

There you go; try this method:
public static Integer minimum(List<Integer> t) {
    int minInt;
    if (t.size() == 1) {
        return t.get(0);
    } else {
        int first = t.get(0);
        List<Integer> u = t.subList(1, t.size());
        // compare the first element with the minimum of the rest of the list
        minInt = Math.min(first, minimum(u));
    }
    return minInt;
}
Hopefully this solves your issue.

Implementing edit distance method using recursion results in object heap error

private static int editDistance(ArrayList<String> s1, ArrayList<String> s2) {
    if (s1.size() == 0) {
        return s2.size();
    } else if (s2.size() == 0) {
        return s1.size();
    } else {
        String temp1 = s1.remove(s1.size() - 1);
        String temp2 = s2.remove(s2.size() - 1);
        if (temp1.equals(temp2)) {
            return editDistance((ArrayList<String>) s1.clone(), (ArrayList<String>) s2.clone());
        } else {
            s1.add(temp1);
            int first = editDistance((ArrayList<String>) s1.clone(), (ArrayList<String>) s2.clone()) + 1;
            s2.add(temp2);
            s1.remove(s1.size() - 1);
            int second = editDistance((ArrayList<String>) s1.clone(), (ArrayList<String>) s2.clone()) + 1;
            s2.remove(s2.size() - 1);
            int third = editDistance((ArrayList<String>) s1.clone(), (ArrayList<String>) s2.clone()) + 1;
            if (first <= second && first <= third) {
                return first;
            } else if (second <= first && second <= third) {
                return second;
            } else {
                return third;
            }
        }
    }
}
For example, the input can be ["div","table","tr","td","a"] and ["table","tr","td","a","strong"] and the corresponding output should be 2.
My problem is that when either input list is too big, e.g. 40 strings, the program generates a "can't reserve enough space for object heap" error. The JVM parameters are -Xms512m -Xmx512m. Could my code really need so much heap space? Or is it due to logical bugs in my code?
Edit: With or without cloning the lists, this recursive approach does not seem to work. Could someone please help me estimate the total heap memory it requires? I assume it would be shocking. Anyway, I guess I have to turn to the dynamic programming approach instead.

You clone() each ArrayList instance before each recursive call of your method. That means you get yet another copy of the whole list (a shallow copy of its backing array; the String objects themselves are shared) for each call, and that can easily add up to a very large amount of memory for large recursion depths.
You should consider using List#subList() instead of clone(), or even adding index parameters to your method so that all calls work on a single pair of initial List objects.
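Beyond memory, the running time is a problem too: whenever the last tokens differ, the method makes three recursive calls, so the number of calls can grow exponentially with the list lengths. The dynamic programming approach the question's edit mentions fills a table of prefix distances instead. Below is a minimal bottom-up sketch (the class name ListEditDistance is hypothetical) that keeps only two rows of the table, so it needs O(n*m) time and O(m) memory and never copies the input lists:

```java
import java.util.List;

class ListEditDistance {
    /** Levenshtein distance between two token lists, computed bottom-up. */
    static int editDistance(List<String> s1, List<String> s2) {
        int n = s1.size(), m = s2.size();
        int[] prev = new int[m + 1]; // distances for the previous prefix of s1
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) prev[j] = j; // empty prefix of s1: j insertions
        for (int i = 1; i <= n; i++) {
            cur[0] = i; // empty prefix of s2: i deletions
            for (int j = 1; j <= m; j++) {
                int substitution = prev[j - 1] + (s1.get(i - 1).equals(s2.get(j - 1)) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = cur[j - 1] + 1;
                cur[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev; // reuse the two rows instead of allocating new ones
            prev = cur;
            cur = tmp;
        }
        return prev[m];
    }
}
```

For the question's example, ["div","table","tr","td","a"] and ["table","tr","td","a","strong"], this returns 2 (delete "div", insert "strong").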
