Optimizing searching techniques for large data sets - java

I'm currently working on a project where I need to work with a .csv file that is around 3 million lines long and various .xlsx files that range in size from 10 lines to over 1,000 lines. I'm trying to find commonalities between cells in my .xlsx files and my .csv file.
To do this, I've read in my .csv file and .xlsx files and stored both in ArrayLists.
I have what I want working; however, the method I'm using is O(n^3): three nested for loops searching between the two lists.
// This is our .xlsx file stored in an ArrayList
for (int i = 1; i < finalKnowledgeGraph.size(); i += 3) {
    // loop through our knowledgeGraph again
    for (int j = 1; j < finalKnowledgeGraph.size(); j += 3) {
        // loop through .csv file which is stored in an ArrayList
        for (int k = 1; k < storeAsserions.size(); k++) {
            if (finalKnowledgeGraph.get(i).equals(storeAsserions.get(k)) && finalKnowledgeGraph.get(j + 1).equals(storeAsserions.get(k + 1))) {
                System.out.println("Do Something");
            } else if (finalKnowledgeGraph.get(i + 1).equals(storeAsserions.get(k)) && finalKnowledgeGraph.get(j).equals(storeAsserions.get(k + 1))) {
                System.out.println("Do something else");
            }
        }
    }
}
At the moment in my actual code, my System.out.println("Do something") is just writing specific parts of each file to a new .csv file.
Now, with what I'm doing out of the way, my problem is optimization. Obviously, if I'm running three nested for loops over millions of inputs, it won't finish running in my lifetime, so I'm wondering what ways I can optimize the code.
One of my friends suggested storing the files in memory so that reads/writes will be several times quicker. Another friend suggested storing the files in hashtables instead of ArrayLists to help speed up the process, but since I'm essentially searching EVERY element in said hashtable, I don't see how that's going to speed things up; it just seems like it will transfer the searching from one data structure to another. However, I said I'd also post the question here and see if people had any tips/suggestions on how I'd go about optimizing this code. Thanks
Note: I have literally no knowledge of optimization myself, and I found other questions on S/O too specific for my knowledge of the field, so if this question seems like a duplicate, I've probably already seen the question you're talking about and couldn't understand the content
Edit: Everything stored in both ArrayLists is verb:noun:noun pairs where I'm trying to compare nouns between each ArrayList. Since I'm not concerned with verbs, I start searching at index 1. (Just for some context)

One possible solution would be using a database, which -- given a proper index -- could do the search pretty fast. Assuming the data fit in memory, you can be even faster.
The principle
For problems like
for (X x : xList) {
    for (Y y : yList) {
        if (x.someAttr() == y.someAttr()) doSomething(x, y);
    }
}
you simply partition one list into buckets according to the attribute like
Map<A, List<Y>> yBuckets = new HashMap<>();
yList.forEach(y -> yBuckets.compute(y.someAttr(), (k, v) -> {
    if (v == null) v = new ArrayList<>();
    v.add(y);
    return v;
}));
Now, you iterate the other list and only look at the elements in the proper bucket like
for (X x : xList) {
    List<Y> smallList = yBuckets.get(x.someAttr());
    if (smallList != null) {
        for (Y y : smallList) {
            if (x.someAttr() == y.someAttr()) doSomething(x, y);
        }
    }
}
The comparison can actually be left out, as it's always true, but that's not the point. The speed comes from never looking at cases where equals would return false.
The complexity gets reduced from quadratic to linear plus the number of calls to doSomething.
Your case
Your data structure obviously does not fit. You're flattening your triplets into one list, and this is wrong. You surely can work around it somehow, but creating a class Triplet {String verb, noun1, noun2} makes everything simpler. For storeAsserions, it looks like you're working with pairs. They seem to overlap, but that may be a typo; anyway, it doesn't matter. Let's use Triplets and Pairs.
Let me also rename your lists, so that the code fits better in this tiny window:
for (Triplet x : fList) {
    for (Triplet y : fList) {
        for (Pair z : sList) {
            if (x.noun1.equals(z.noun1) && y.noun2.equals(z.noun2)) {
                doSomething();
            } else if (x.noun2.equals(z.noun1) && y.noun1.equals(z.noun2)) {
                doSomethingElse();
            }
        }
    }
}
Now, we need some loops over buckets, so that at least one of the equals tests is always true, which saves us from dealing with non-matching data. Let's concentrate on the first condition
x.noun1.equals(z.noun1) && y.noun2.equals(z.noun2)
I suggest a loop like
for (Pair z : sList) {
    for (Triplet x : smallListOfTripletsHavingNoun1SameAsZ) {
        for (Triplet y : smallListOfTripletsHavingNoun2SameAsZ) {
            doSomething();
        }
    }
}
where the small lists get computed as in the first section.
No non-matching entries are ever compared, so the complexity gets reduced from cubic to the number of matches (= the number of lines your code would print).
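Putting this together for your case, a minimal sketch (assuming Triplet and Pair have public String fields as suggested above, and a doSomething stub taking the matched elements):

Map<String, List<Triplet>> byNoun1 = new HashMap<>();
Map<String, List<Triplet>> byNoun2 = new HashMap<>();
for (Triplet t : fList) {
    byNoun1.computeIfAbsent(t.noun1, k -> new ArrayList<>()).add(t);
    byNoun2.computeIfAbsent(t.noun2, k -> new ArrayList<>()).add(t);
}
for (Pair z : sList) {
    List<Triplet> xs = byNoun1.getOrDefault(z.noun1, Collections.emptyList());
    List<Triplet> ys = byNoun2.getOrDefault(z.noun2, Collections.emptyList());
    for (Triplet x : xs) {
        for (Triplet y : ys) {
            doSomething(x, y); // every x, y here already satisfies the first condition
        }
    }
}

The second condition is handled the same way with the roles of the two bucket maps swapped (look up z.noun1 in byNoun2 and z.noun2 in byNoun1).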
Addendum - yBuckets
Let's assume xList looks like
[
    {id: 1, someAttr: "a"},
    {id: 2, someAttr: "a"},
    {id: 3, someAttr: "b"},
]
Then yBuckets should be
{
    "a": [
        {id: 1, someAttr: "a"},
        {id: 2, someAttr: "a"},
    ],
    "b": [
        {id: 3, someAttr: "b"},
    ],
}
One simple way to create such a Map is
yList.forEach(y -> yBuckets.compute(y.someAttr(), (k, v) -> {
    if (v == null) v = new ArrayList<>();
    v.add(y);
    return v;
}));
In plaintext:
For each y from yList,
get a corresponding map entry in the form of (k, v),
when v is null, then create a new List
otherwise work with the List v
In any case, add y to it
and store it back to the Map (which is a no-op unless a new List was created in the third step).
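As a side note, the same map can be built a little more directly with computeIfAbsent, which creates and stores the missing list for you:

yList.forEach(y -> yBuckets.computeIfAbsent(y.someAttr(), k -> new ArrayList<>()).add(y));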

Related

Java streams with 2d Array

Good afternoon. I've recently learned about streams in my Java class and would like to expand my knowledge.
I am trying to stream through a 2d array and add the sum of the elements in the previous row to each element in the next row.
I also want to make this stream parallel, but this brings up an issue: if the stream begins working on the second row before the first is finished, the data integrity will be questionable.
Is there any way to do this in Java?
If I understood you correctly, this piece of code does what you are asking for:
int[][] matrix = new int[][]{{1, 2, 3}, {3, 2, 1}, {1, 2, 3}};
BiConsumer<int[], int[]> intArraysConsumer = (ints, ints2) -> {
    for (int i = 0; i < ints.length; i++) {
        ints[i] = ints[i] + ints2[i];
    }
};
int[] collect = Arrays.stream(matrix).collect(() -> new int[matrix[0].length], intArraysConsumer, intArraysConsumer);
System.out.println(Arrays.toString(collect));
This outputs: [5, 6, 7]
From what I understand of the streams API, it decides how to split the work when you run the stream in parallel; that's why you need to provide a creator of the (initially empty) accumulation object, a way to accumulate elements into it, and a way to combine two partial accumulation objects. For this particular case, the accumulate and combine operations are the same.
Please refer to: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#collect-java.util.function.Supplier-java.util.function.BiConsumer-java.util.function.BiConsumer-
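Note that a stream runs sequentially unless you explicitly request otherwise; a hedged variant of the same call asking for parallel execution (the combiner then merges the partial sums, so the result is unchanged) would be:

int[] collect = Arrays.stream(matrix)
        .parallel() // request parallel execution; partial arrays are merged by the combiner
        .collect(() -> new int[matrix[0].length], intArraysConsumer, intArraysConsumer);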

Assigning values to zip codes

What I am trying to do is assign a number value to a group of zip codes and then use that number to do a calculation later on in the program. What I have now is
if (zip == 93726 || zip == 93725 || zip == 92144) {
    return 100;
} else if (zip == 94550 || zip == 34599 || zip == 77375) {
    return 150;
}
and then I take the variable and use it in a calculation. Assigning the number to the variable and the calculation all work, but what I have run into is that apparently Android only allows you to have so many lines of code, and I have run out of lines just using if-else statements. My question is: what would be a better way to go about this?
I am not trying to assign a city to each zip; I have seen from other posters that there are services that do that.
a. You can either use a
switch (zip)
{
    case 93726: case 93725: case 92144: return 100;
    case 94550: case 34599: case 77375: return 150;
}
-
b. or create a HashMap<Integer,Integer> and fill it with 'zip to value' entries, which should give you much better performance if you have that many cases.
Map<Integer,Integer> m = new HashMap<>();
m.put(93726,100);
later you could call
return m.get(zip);
-
c. If your zip count is in the tens of thousands and you want to work entirely in memory, then you should consider just holding a hundred-thousand-element int array:
int[] arr=new int[100000];
and filling it with the data:
arr[93726]=100;
.
.
.
You should probably use String constants for your ZIP codes since
in some places, some start with 0
in some places, they may contain letters as well as numbers
If you are using an Object (either String or Integer) for your constants, I have often used Collection.contains(zip) as a lazy shortcut for the condition in each if statement. That collection contains all the constants for that condition, and you would probably use an implementation geared towards lookup, such as HashSet. Keep in mind that if you use a HashMap solution as suggested elsewhere, your keys will be Integer objects too, so you will do hashing on the keys in any case; you just won't need to store the result values in the collection suggestion.
I suspect that for a large collection of constants, hashing may turn out to be faster than having to work through the large number of == conditions in the if statement until you get to the right condition. (It may help a bit if the most-used constants come first, and in the first if statement...)
On a completely different note (i.e. strategy instead of code), you should see if you could group your ZIPs. What I mean is, for example, that if you know that all (or most) ZIPs of the forms "923xx" and "924xx" result in a return of 250, you could potentially shorten your conditionals considerably. E.g. zip.startsWith("923") || zip.startsWith("924") for String ZIPs, or (zip / 100) == 923 || (zip / 100) == 924 for int.
A small number of more specific exceptions to the groups can still be handled, you just need to have the more specific conditionals before the more general conditionals.
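For illustration, a minimal sketch of the Collection.contains approach (the set names are made up; the values are from the question):

private static final Set<Integer> ZIPS_100 = new HashSet<Integer>(Arrays.asList(93726, 93725, 92144));
private static final Set<Integer> ZIPS_150 = new HashSet<Integer>(Arrays.asList(94550, 34599, 77375));

// replaces the long chains of == comparisons with two hash lookups
if (ZIPS_100.contains(zip)) return 100;
if (ZIPS_150.contains(zip)) return 150;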
Use declarative data. Especially as the zip codes might get updated, extended, corrected:
Using for instance a zip.properties
points = 100, 150
zip100 = 93726, 93725, 92144
zip150 = 94550, 34599, 77375,\
88123, 12324, 23424
And
Map<Integer, Integer> zipToPoints = new HashMap<>();
If you have ZIP codes with leading zeroes, it may be better to use String, or take care to parse them explicitly with base 10 (a leading zero makes an integer literal octal, base 8, in Java source).
Whether such a line limitation really exists I do not know, but the extra effort of a bit of coding is worth it to have everything as data.
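A minimal sketch of loading such a file, assuming the zip.properties layout above (the points/zip100/zip150 key names are just this example's convention):

Properties props = new Properties();
FileInputStream in = new FileInputStream("zip.properties");
try {
    props.load(in);
} finally {
    in.close();
}
Map<Integer, Integer> zipToPoints = new HashMap<>();
for (String points : props.getProperty("points").split(",")) {
    int value = Integer.parseInt(points.trim());
    // e.g. "zip100" maps to the comma-separated list of ZIPs worth 100 points
    for (String zip : props.getProperty("zip" + value).split(",")) {
        zipToPoints.put(Integer.parseInt(zip.trim(), 10), value); // base 10, see the leading-zero note
    }
}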
Have a map of the zip codes and just return the value.
Map<Integer,Integer> zipValues = new HashMap<Integer,Integer>();
zipValues.put(93726,100);
.
.
And so on. Or you can read from a properties file and populate the map.
Then, instead of using the if(s),
return zipValues.get(zipCode);
So, say zipCode = 93726, it will return 100.
Cheers.
You could create a Map which maps each zip code to an integer. For example:
Map<Integer, Integer> zipCodeMap = new HashMap<>();
zipCodeMap.put(93726, 100);
And later you can retrieve values from there.
Integer value = zipCodeMap.get(93726);
Now you still have to map each zipcode to a value. I would not do that in Java code but rather use a database or read from a text file (csv for example). This depends mostly on your requirements.
Example csv file:
93726, 100
93727, 100
93728, 150
Reading from a csv file:
InputStream is = getClass().getResourceAsStream("/data.csv");
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
try {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] row = line.split(",");
        int zip = Integer.parseInt(row[0].trim());
        int value = Integer.parseInt(row[1].trim());
        zipCodeMap.put(zip, value);
    }
}
catch (IOException ex) {
    // handle exceptions here
}
finally {
    try {
        is.close();
    }
    catch (IOException e) {
        // handle exceptions here
    }
}

Efficient way of comparing two arraylists in order to retain a delta between them

Here's the problem that I have:
I need to compare two ArrayLists and return whether they are the same; if they are different, return the elements that are new in one of them, the pivot so to speak.
Here's the behaviour of the data-set:
the two ArrayLists are made of Strings
they're populated from the same source so are most of the time the same
are ordered (in the sense of the custom logic attached to them)
there will never be any empty Strings
all of the Strings have the same length, always
no duplicates
What I want:
the fastest possible way of achieving my two goals, whichever the case
using only Java 1.6 standard library features; I'd prefer not to implement a hybrid class that emulates something from List and Set, for example.
Example:
A: [ 'a', 'b', 'c', 'd']
B: [ 'a', 'c', 'd']
Result: Lists are different, return the element 'b'; A will be the 'work' List; we will make comparisons based on what's new in this ArrayList, since B will never change.
Thanks for any replies and your input.
Your fastest possible requirement bothers me a lot--I'm quite against optimizing--I generally consider early optimization one of the worst programming practices around.
If you really want to do that, just walk the two lists in order.
If the first entries match, put that one into a "Same" pile and increment both indexes. If they are different, put the first (lower/less-than) one into a "Different" pile and increment that list's index. Loop this way until you hit the end of one list (any remaining in the other obviously go into the "Different" collection).
That should give you "close" to the fastest way. If you want the absolute fastest then you have to start by using arrays, not lists, and then pay a lot of attention to what else you do every step of the way--but the algorithm should still be pretty close to optimal.
As an example of sub-optimal but much more readable you could use some set manipulation.
Set<String> same = new HashSet<String>(list1);
same.retainAll(list2); // retainAll modifies the set in place, so work on a copy
Set<String> different = new HashSet<String>(list1);
different.addAll(list2);
different.removeAll(same);
// at this point 'same' contains all the shared values and 'different' the ones that don't match. Done.
This is short and readable and probably more performant than you would think. It would be bad practice to code your own if something like this works well enough (Say, in GUI code where speed is unlikely to matter).
Pretty simple (assuming the lists are ordered ascending; it can easily be changed for descending order):
ArrayList<String> delta(ArrayList<String> a, ArrayList<String> b, Comparator<String> comp) {
    ArrayList<String> delta = new ArrayList<String>();
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        int c = comp.compare(a.get(i), b.get(j));
        if (c == 0) {
            // strings are equal -> present in both lists -> omit them
            i++;
            j++;
        } else if (c < 0) {
            // a's element is not part of b
            delta.add(a.get(i++));
        } else {
            // b's element is not part of a
            delta.add(b.get(j++));
        }
    }
    // add remaining items from whichever list is not yet exhausted
    while (i < a.size()) delta.add(a.get(i++));
    while (j < b.size()) delta.add(b.get(j++));
    return delta;
}
The code is essentially one pass of a merge over the two sorted lists: equal elements are skipped, and whichever side compares smaller cannot occur in the other list, so it goes into the delta.
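For example, a hedged usage with the lists from the question and natural String ordering (assuming the method above is in scope):

ArrayList<String> a = new ArrayList<String>(Arrays.asList("a", "b", "c", "d"));
ArrayList<String> b = new ArrayList<String>(Arrays.asList("a", "c", "d"));
Comparator<String> natural = new Comparator<String>() { // Java 1.6 compatible, no lambdas
    public int compare(String x, String y) { return x.compareTo(y); }
};
System.out.println(delta(a, b, natural)); // prints [b]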

Calculate term frequency on large data set

I want to calculate term frequency for a large list from an even larger data set.
The list (of pairs) is in the format of
{
source_term0, target_term0;
source_term1, target_term1;
...
source_termX, target_termX }
Where X is about 3.9 million.
The searching data set (pairs) is in the format of
{
source_sentence0, target_sentence0;
source_sentence1, target_sentence1;
...
source_sentenceY, target_sentenceY }
Where Y is about 12 million.
The term frequency is counted when source_termN appears in source_sentenceM AND target_termN appears in target_sentenceM.
My challenge is the computational time. I can run a nested loop, but it takes very long to complete. Just wondering whether there is a better algorithm for this case?
One way to do this is to build posting lists from the source sentences and target sentences. That is, for the source sentences, you have a dictionary that contains the term and a list of source sentences the term appears in. You do the same thing for the target sentences.
So given this:
source_sentence1 = "Joe married Sue."
target_sentence1 = "The bridge is blue."
source_sentence2 = "Sue has big feet."
target_sentence2 = "Blue water is best."
Then you have:
source_sentence_terms:
    joe, [1]
    married, [1]
    sue, [1,2]
    has, [2]
    big, [2]
    feet, [2]
target_sentence_terms:
    the, [1]
    bridge, [1]
    is, [1,2]
    blue, [1,2]
    water, [2]
    best, [2]
Now you can go through your search terms. Let's say that your first pair is:
source_term1=sue, target_term1=bridge
You look "sue" up in the source_sentence_terms and you get the list [1,2], meaning that the term occurs in those two source sentences.
You look "bridge" up in the target_sentence_terms and you get the list [1].
Now you do a set intersection on those two lists and you wind up with [1].
Building the posting lists from the sentences is O(n), where n is the total number of words in all of the sentences. You only have to do that once.
For each pair, you do two hash table lookups. Those are O(1). Doing a set intersection is O(m + n), where m and n are the sizes of the individual sets. It's hard to say how large those sets will be. It depends on the frequency of terms overall, and whether you're querying frequent terms.
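A minimal sketch of the posting lists in Java (assuming sentences are indexed by position and tokenized on whitespace; the method names are made up):

static Map<String, Set<Integer>> buildPostings(List<String> sentences) {
    Map<String, Set<Integer>> postings = new HashMap<>();
    for (int i = 0; i < sentences.size(); i++) {
        for (String term : sentences.get(i).toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(i); // term -> sentence ids
        }
    }
    return postings;
}

static int pairFrequency(Map<String, Set<Integer>> sourcePostings,
                         Map<String, Set<Integer>> targetPostings,
                         String sourceTerm, String targetTerm) {
    Set<Integer> a = sourcePostings.getOrDefault(sourceTerm, Collections.emptySet());
    Set<Integer> b = targetPostings.getOrDefault(targetTerm, Collections.emptySet());
    Set<Integer> smaller = a.size() <= b.size() ? a : b;
    Set<Integer> larger = (smaller == a) ? b : a;
    int count = 0;
    for (Integer id : smaller) {
        if (larger.contains(id)) count++; // intersect by probing the larger set
    }
    return count;
}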
An idea comes to mind: sort the whole data set. A good sorting algorithm is O(n log n). You said you were currently at O(n^2), so this would be an improvement. Once the data is sorted, you can iterate over it linearly.
I'm not sure if I understood your situation correctly, so this might be inappropriate.
Map<String, Map<String, Integer>> terms = new HashMap<>();
for each sourceTerm, targetTerm {
    // Java 7 or earlier
    Map<String, Integer> targetTerms = terms.get(sourceTerm);
    if (targetTerms == null)
        terms.put(sourceTerm, targetTerms = new HashMap<>());
    // Java 8
    Map<String, Integer> targetTerms =
        terms.computeIfAbsent(sourceTerm, k -> new HashMap<>());
    targetTerms.put(targetTerm, 0);
}
for each sourceSentence, targetSentence {
    String[] sourceSentenceTerms = sourceSentence.split("\\s+");
    String[] targetSentenceTerms = targetSentence.split("\\s+");
    for (String sourceSentenceTerm : sourceSentenceTerms) {
        for (String targetSentenceTerm : targetSentenceTerms) {
            Map<String, Integer> targetTerms = terms.get(sourceSentenceTerm);
            if (targetTerms != null) {
                // Java 7 or earlier
                Integer termFreq = targetTerms.get(targetSentenceTerm);
                if (termFreq != null)
                    targetTerms.put(targetSentenceTerm, termFreq + 1);
                // Java 8
                targetTerms.computeIfPresent(targetSentenceTerm,
                    (t, f) -> f + 1);
            }
        }
    }
}

Efficient algorithm to find all the paths from A to Z?

With a set of random inputs like this (20k lines):
A B
U Z
B A
A C
Z A
K Z
A Q
D A
U K
P U
U P
B Y
Y R
Y U
C R
R Q
A D
Q Z
Find all the paths from A to Z.
A - B - Y - R - Q - Z
A - B - Y - U - Z
A - C - R - Q - Z
A - Q - Z
A - B - Y - U - K - Z
A location cannot appear more than once in the path, hence A - B - Y - U - P - U - Z is not valid.
Locations are named AAA to ZZZ (presented here as A - Z for simplicity), and the input is random in such a way that there may or may not be a location ABC, all locations may be XXX (unlikely), or there may not be a possible path at all if locations are "isolated".
Initially I'd thought that this is a variation of the unweighted shortest path problem, but I find it rather different and I'm not sure how the algorithm there applies here.
My current solution goes like this:
Pre-process the list such that we have a hashmap which points a location (left) to a list of locations (right).
Create a hashmap to keep track of "visited locations". Create a list to store "found paths".
Store X (starting location) in the "visited locations" hashmap.
Search for X in the first hashmap (location A will give us (B, C, Q) in O(1) time).
For each found location (B, C, Q), check if it is the final destination (Z). If so, store it in the "found paths" list. Otherwise, if it doesn't already exist in the "visited locations" hashmap, recurse to step 3 with that location as "X". (actual code below)
With this current solution, it takes forever to map all (not shortest) possible routes from "BKI" to "SIN" for the provided data.
I was wondering if there's a more effective (time-wise) way of doing it. Does anyone know of a better algorithm to find all the paths from an arbitrary position A to an arbitrary position Z?
Actual Code for current solution:
import java.util.*;
import java.io.*;

public class Test {
    private static HashMap<String, List<String>> left_map_rights;

    public static void main(String args[]) throws Exception {
        left_map_rights = new HashMap<>();
        BufferedReader r = new BufferedReader(new FileReader("routes.text"));
        String line;
        HashMap<String, Void> lines = new HashMap<>();
        while ((line = r.readLine()) != null) {
            if (lines.containsKey(line)) { // ensure no duplicate lines
                continue;
            }
            lines.put(line, null);
            int space_location = line.indexOf(' ');
            String left = line.substring(0, space_location);
            String right = line.substring(space_location + 1);
            if (left.equals(right)) { // rejects entries whereby left = right
                continue;
            }
            List<String> rights = left_map_rights.get(left);
            if (rights == null) {
                rights = new ArrayList<String>();
                left_map_rights.put(left, rights);
            }
            rights.add(right);
        }
        r.close();
        System.out.println("start");
        List<List<String>> routes = GetAllRoutes("BKI", "SIN");
        System.out.println("end");
        for (List<String> route : routes) {
            System.out.println(route);
        }
    }

    public static List<List<String>> GetAllRoutes(String start, String end) {
        List<List<String>> routes = new ArrayList<>();
        List<String> rights = left_map_rights.get(start);
        if (rights != null) {
            for (String right : rights) {
                List<String> route = new ArrayList<>();
                route.add(start);
                route.add(right);
                Chain(routes, route, right, end);
            }
        }
        return routes;
    }

    public static void Chain(List<List<String>> routes, List<String> route, String right_most_currently, String end) {
        if (right_most_currently.equals(end)) {
            routes.add(route);
            return;
        }
        List<String> rights = left_map_rights.get(right_most_currently);
        if (rights != null) {
            for (String right : rights) {
                if (!route.contains(right)) {
                    List<String> new_route = new ArrayList<String>(route);
                    new_route.add(right);
                    Chain(routes, new_route, right, end);
                }
            }
        }
    }
}
As I understand your question, Dijkstra's algorithm cannot be applied as is, since the shortest-path problem by definition finds a single path in the set of all possible paths. Your task is to find all paths per se.
Many optimizations of Dijkstra's algorithm involve cutting off search trees with higher costs. You won't be able to cut off those parts in your search, as you need all findings.
And I assume you mean all paths excluding cycles.
Algorithm:
Pump the network into a 2-dim 26x26 array of boolean/integer, fromTo[i,j].
Set 1/true for an existing link.
Starting from the first node, trace all following nodes (search links for 1/true).
Keep visited nodes in some structure (array/list). Since the maximal
depth seems to be 26, this should be possible via recursion.
And as #soulcheck has written below, you may think about cutting off paths you have already visited. You may keep a list of paths towards the destination in each element of the array. Adjust the breaking condition accordingly.
Break when
visiting the end node (store the result)
visiting a node that has been visited before (cycle)
visiting a node for which you have already found all paths to the destination; merge your current path with all the existing ones from that node.
Performance-wise, I'd vote against using hashmaps and lists and prefer static structures.
Hmm, while re-reading the question, I realized that the names of the nodes cannot be limited to A-Z. You are writing something about 20k lines; with 26 letters, a fully connected A-Z network would require far fewer links. Maybe you can skip recursion and static structures :)
Ok, with valid names from AAA to ZZZ, an array would become far too large. So you'd better create a dynamic structure for the network as well. Counter question: regarding performance, what is the best data structure for a sparsely populated array, as my algorithm would require? I'd vote for a 2-dim ArrayList. Anyone?
What you're proposing is a scheme for DFS with backtracking. It's correct, unless you want to permit cyclic paths (you didn't specify whether you do).
There are two gotchas, though.
You have to keep an eye on nodes you already visited on the current path (to eliminate cycles).
You have to know how to select the next node when backtracking, so that you don't descend into the same subtree of the graph when you have already visited it on the current path.
The pseudocode is more or less as follows:
getPaths(A, current_path):
    if (A is destination node): return [current_path]
    for B = next-not-visited-neighbor(A):
        if (not B already on current path)
            result = result + getPaths(B, current_path + B)
    return result

list_of_paths = getPaths(A, [A])
which is almost what you said.
Be careful though, as finding all paths in a complete graph is pretty time- and memory-consuming.
edit
For clarification: the algorithm has Ω(n!) time complexity in the worst case, as it has to list all paths from one vertex to another in a complete graph of size n, and there are at least (n-2)! paths of the form <A, permutation of all nodes except A and Z, Z>. There is no way to do better if merely listing the result takes that long.
Your data is essentially an adjacency list which allows you to construct a tree rooted at the node corresponding to A. In order to obtain all the paths between A & Z, you can run any tree traversal algorithm.
Of course, when you're building the tree you have to ensure that you don't introduce cycles.
I would proceed recursively where I would build a list of all possible paths between all pairs of nodes.
I would start by building, for all pairs (X, Y), the list L_2(X, Y) which is the list of paths of length 2 that go from X to Y; that's trivial to build since that's the input list you are given.
Then I would build the lists L_3(X, Y), recursively, using the known lists L_2(X, Z) and L_2(Z, Y), looping over Z. For example, for (C, Q), you have to try all Z in L_2(C, Z) and L_2(Z, Q) and in this case Z can only be R and you get L_3(C, Q) = {C -> R -> Q}. For other pairs, you might have an empty L_3(X, Y), or there could be many paths of length 3 from X to Y.
However, you have to be careful when building the paths, since some of them must be rejected because they have cycles. If a path contains the same node twice, it is rejected.
Then you build L_4(X, Y) for all pairs by combining all paths L_2(X, Z) and L_3(Z, Y) while looping over all possible values for Z. You still remove paths with cycles.
And so on... until you get to L_17576(X, Y).
One worry with this method is that you might run out of memory to store those lists. Note however that after having computed the L_4's, you can get rid of the L_3's, etc. Of course you don't want to delete L_3(A, Z) since those paths are valid paths from A to Z.
Implementation detail: you could put L_3(X, Y) in a 17576 x 17576 array, where the element at (X, Y) is some structure that stores all paths between (X, Y). However, if most elements are empty (no paths), you could instead use a HashMap<Pair, Set<Path>>, where Pair is just some object that stores (X, Y). It's not clear to me whether most elements of L_3(X, Y) are empty, and if so, whether it is also the case for L_4334(X, Y).
Thanks to #Lie Ryan for pointing out this identical question on MathOverflow. My solution is basically the same as the one by MRA; Huang claims it's not valid, but by removing the paths with duplicate nodes, I think my solution is fine.
I guess my solution needs fewer computations than the brute force approach; however, it requires more memory. So much so that I'm not even sure it is possible on a computer with a reasonable amount of memory.
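For concreteness, a hedged sketch of one combination step (building L_{k+1} from L_2 and L_k), assuming a hypothetical Pair class with from/to fields and proper equals/hashCode, and paths stored as node lists:

static Map<Pair, Set<List<String>>> nextLevel(Map<Pair, Set<List<String>>> l2,
                                              Map<Pair, Set<List<String>>> lk) {
    Map<Pair, Set<List<String>>> result = new HashMap<>();
    for (Map.Entry<Pair, Set<List<String>>> e : l2.entrySet()) {
        for (Map.Entry<Pair, Set<List<String>>> f : lk.entrySet()) {
            if (!e.getKey().to.equals(f.getKey().from)) continue; // paths must meet at Z
            for (List<String> head : e.getValue()) {
                for (List<String> tail : f.getValue()) {
                    List<String> path = new ArrayList<>(head);
                    path.addAll(tail.subList(1, tail.size())); // drop the shared middle node
                    if (new HashSet<>(path).size() == path.size()) { // reject paths with cycles
                        result.computeIfAbsent(new Pair(e.getKey().from, f.getKey().to),
                                k -> new HashSet<>()).add(path);
                    }
                }
            }
        }
    }
    return result;
}

Iterating lk's whole entry set here is wasteful; indexing L_k by its from endpoint (as in the bucket idea from the first answer) would avoid scanning non-matching pairs.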