List implementation that is a view over multiple sublists? - java

I'm working on a piece of software that very frequently needs to return a single list that consists of the first (up to) N elements of a number of other lists. The return is not modified by its clients -- it's read-only.
Currently, I am doing something along the lines of (code simplified for readability):
List<String> ret = new ArrayList<String>();
for (List<String> aList : lists) {
    // add the first N elements, if they exist
    ret.addAll(aList.subList(0, Math.min(aList.size(), MAXMATCHESPERLIST)));
    if (ret.size() >= MAXMATCHESTOTAL) {
        break;
    }
}
return ret;
I'd like to avoid the creation of the new list and the use of addAll() as I don't need to be returning a new list, and I'm dealing with thousands of elements per second. This method is a major bottleneck for my application.
What I'm looking for is an implementation of List that simply consists of the subList() results (those are cheap views, not actual copies) of each of the contained lists.
I've looked through the usual suspects including java.util, Commons Collections, Commons Lang, etc., but can't for the life of me find any such implementation. I'm pretty sure it has to have been implemented at some point though and hopefully I've missed something obvious.
So I'm turning to you, Stack Overflow -- is anyone aware of such an implementation? I can write one myself, but I hate re-inventing the wheel if the wheel is out there.
Suggestions for alternative more efficient approaches are very welcome!
Optional background detail (probably not all that relevant to my question, but just in case it helps you understand what I'm trying to do): this is for a program to fill crossword-style grids with words that revolve around a theme. Each theme may have any number of candidate word lists, ordered in decreasing order of theme relevancy. For instance, the "film" theme may start with a list of movie titles, then a list of actors, then a generic list of places that may or may not be film-relevant, then a generic list of English words. The lists are each stored in a wildcarded trie structure to allow fast lookups that meet the grid constraints (e.g. "CAT" would be stored in the trie'd lists against the keys "CAT", "CA?", "C??", "?AT", ... "???" etc.). Lists vary from a few words to several tens of thousands of words.
For any given query, e.g. "C??", I want to return a list that contains up to N (say 50) matching words, ordered in the same order as the source lists. So if list 1 contains 3 matches for "C??", list 2 contains 7, and list 3 contains 100, I need a return list that contains first the 3 matches from list 1, then the 7 matches from list 2, then 40 of the matches from list 3. And I want that returned "conjoined list view" operation to be more efficient than having to continuously call addAll(), in a similar manner to the implementation of subList().
Caching the returned lists is not an option due to memory constraints -- my trie is already consuming the vast majority of my (32 bit) max-sized heap.
PS this isn't homework, it's for a real project. Any help much appreciated!

Do you need random access to the resulting list, or does your client code only iterate over the result?
If you only need to iterate over the result, create a custom List implementation which holds the list of the original lists as an instance field. Return a custom iterator which takes items from every list one by one and stops when there are no more items in any of the underlying lists, or when it has already returned MAXMATCHESTOTAL items.
With some thought you can do the same for random access.
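A minimal sketch of that idea, with class and field names of my own invention (it assumes the per-list cap is still applied via subList() before the parts are handed in). Extending AbstractList gives you an iterator for free, and get() provides random access at the cost of a scan over the (presumably small) number of parts:
import java.util.AbstractList;
import java.util.List;

// Read-only view over the concatenation of several lists, capped at a maximum size.
// Nothing is copied; get() delegates to the underlying lists.
class ConcatView<T> extends AbstractList<T> {
    private final List<List<T>> parts; // the underlying lists (or subList() views)
    private final int size;            // total number of visible elements

    ConcatView(List<List<T>> parts, int maxTotal) {
        this.parts = parts;
        int n = 0;
        for (List<T> p : parts) {
            n += p.size();
        }
        this.size = Math.min(n, maxTotal);
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        for (List<T> p : parts) {
            if (index < p.size()) {
                return p.get(index);
            }
            index -= p.size();
        }
        throw new IllegalStateException("unreachable");
    }

    @Override
    public int size() {
        return size;
    }
}
// usage, roughly mirroring the original loop:
//   parts.add(aList.subList(0, Math.min(aList.size(), MAXMATCHESPERLIST)));
//   return new ConcatView<>(parts, MAXMATCHESTOTAL);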

Use list.addAll() multiple times. Simple, requires no external jars, and inefficient.
The Jakarta (Apache Commons) collections framework has such a list. It is efficient, but requires an external jar and does not support generics.
Check out Guava from Google. I think it has something that you are looking for.
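If Guava is an option, something along these lines might be what the previous answer has in mind; I haven't benchmarked it against the original loop, but Iterables.concat and Iterables.limit return lazy views, so nothing is copied unless you materialize the result at the end:
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Iterables;

import java.util.ArrayList;
import java.util.List;

// Lazily concatenate per-list prefixes and cap the total; "lists", MAXMATCHESPERLIST
// and MAXMATCHESTOTAL are the names from the question.
List<Iterable<String>> prefixes = new ArrayList<>();
for (List<String> aList : lists) {
    prefixes.add(Iterables.limit(aList, MAXMATCHESPERLIST)); // view, no copy
}
Iterable<String> view = Iterables.limit(Iterables.concat(prefixes), MAXMATCHESTOTAL);
// iterate "view" directly, or materialize only if a real List is required:
List<String> ret = ImmutableList.copyOf(view);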

What's wrong with returning the sublist? That is the fastest way, since the sublist is not a copy but a view backed by the original list, and clients are read-only - seems perfect to me.
EDIT:
I understand why you want to group up the contents of several lists to make a larger chunk, but can you change your clients to not need such a large chunk? See my other answer re BlockingQueue and the producer/consumer approach.

Have you considered using a BlockingQueue and having consumers pull items from the queue one by one as they need them, rather than getting items in chunks (lists)? It seems you are attempting to reinvent the producer/consumer pattern here.
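A bare-bones illustration of that idea; the queue capacity and the "match" variable are placeholders, and the producer would be whatever walks the tries:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Producer hands matches over one at a time; consumers block until one is available.
BlockingQueue<String> matches = new ArrayBlockingQueue<>(64);

// producer side, as each match is found:
matches.put(match);           // blocks if consumers are falling behind

// consumer side:
String next = matches.take(); // blocks until a match arrives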


Java: an efficient way to read logs from a file

I'm looking for the most efficient way to get all the elements from a List<String> which contain some String value ("value1"), for example.
First thought - simple iteration, adding the elements which contain "value1" to another List<String>. But this task must be done very often and by many users.
I thought about list.removeAll(), but how do I remove all elements which don't contain "value1"?
So, what is the way to do this most efficiently?
UPDATE:
The whole picture - I need to read the logs from a file very often and for multiple users simultaneously. The logs must be filtered by username; each line in the file contains a username.
In terms of time efficiency, you cannot do better than linear (O(n)) if you have to look at the whole list.
Deciding between LinkedList and ArrayList etc. is most likely irrelevant as the differences are small.
If you want a better time than linear to list size, you need to build on some assumptions and prerequisites:
if you know beforehand what string you'll search for, you can build another list along with your original list containing only relevant records
if you know you're going to query one list multiple times, you could build an index
If you just have a list on input that someone gave you, and you need to read through this one input once and find the relevant strings, then you're stuck with linear time since you cannot avoid reading the list at least once.
From your comments it seems like your list is a couple of log statements that should be grouped by user id (which would be your "value1"). If you really need to read the logs very often and for multiple users simultaneously you might consider some caching, possibly with grouping by user id.
As an example you could maintain an additional log file per user and just display it when needed. Alternatively you could keep the latest log statements in memory by employing some FIFO buffer grouped by user id (could be a buffer per user, and maybe another LIFO layer on top of that).
However, depending on your use case it might not be worth the effort and you might just go and filter the list whenever the user requests to do so. In that case I'd recommend reading the file line by line and only adding the matching lines to the list. If you first read everything into a single list and then remove non-matching elements it'll be less efficient (you'd have to iterate more often, shift elements etc.) and temporarily use more memory (as opposed by discarding every non-matching line right after checking it).
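A minimal sketch of that line-by-line filtering, with method and parameter names of my own choosing (it assumes one log statement per line, containing the username somewhere):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Keep only the lines for the requested user; non-matching lines are discarded
// immediately instead of being loaded into memory first.
static List<String> linesForUser(String path, String username) throws IOException {
    List<String> result = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(username)) {
                result.add(line);
            }
        }
    }
    return result;
}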
Instead of a List, use a TreeSet with a Comparator that puts all Strings containing "value1" at the beginning. When iterating, as soon as a string does not contain "value1", none of the remaining ones do either, and you can stop iterating.
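A possible comparator for that approach (Java 8+); note that a TreeSet also silently drops duplicate strings, which may or may not be acceptable here:
import java.util.Comparator;
import java.util.TreeSet;

// Strings containing "value1" sort before all others; ties fall back to natural order
// (the comparator must never report two distinct elements as equal).
Comparator<String> value1First = Comparator
        .comparing((String s) -> !s.contains("value1")) // false (contains) sorts first
        .thenComparing(Comparator.naturalOrder());
TreeSet<String> set = new TreeSet<>(value1First);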
The iteration is likely the only way, but you can let Java optimize it as much as possible (and use an elegant, non-imperative syntax) by employing Java 8's streams:
// test list
List<String> original = new ArrayList<String>() {
    {
        add("value1"); add("foo"); add("foovalue1"); add("value1foo");
    }
};
List<String> trimmed = original
        .stream()
        .filter((s) -> s.contains("value1"))
        .collect(Collectors.toList());
System.out.println(trimmed);
Output
[value1, foovalue1, value1foo]
Notes
One part of your question that may require more information is "performed often, by many users" - this may call for some concurrency-handling mechanism.
The actual functionality is not very clear. You may still have room to optimize your code early by fetching and collecting the "value1"-containing Strings prior to building your List.
OK, here I can suggest the simplest approach, which I have used myself.
Using an Iterator makes it easier; if you instead call list.remove(val), where val = "value1", it may give you an UnsupportedOperationException on some List implementations.
List<String> list = yourList; // contains "value1"
for (Iterator<String> itr = list.iterator(); itr.hasNext();) {
    String val = itr.next();
    if (!val.contains("value1")) {
        itr.remove();
    }
}
Try this one and let me know. :)

What's better to use, arrays or lists, to create a multiplication calculator?

I'm making a calculator for multiplication. I want it to show each step like we do for long multiplication worked out by hand.
To do this I need to take the number apart digit by digit; which is better for this, lists or arrays?
In general, you want an array when you need to quickly look at any single element of data (like, "I want the 4th digit"). You'll want a list if you're going to keep the data in the same order, and will not need to look up elements at random locations in your set of elements.
The behavior of lists and arrays will vary depending on programming language, but conceptually they derive from these low-level ideas: an array is a contiguous block of memory (that is, memory that's all clumped together). You can read all of the elements in an array by counting indexes -- you start somewhere (say, position 0), then add one to get the next element. It's fast and easy for the computer, but the operating system has to do more work to ensure you have a contiguous block.
A list is a way of organizing related data by storing an element of data along with the location of the next element in the list. You retrieve data by looking at an element, then following the address to the next element, until you get to the end of the list. This makes it easier to structure the data (your pointers to 'next' describe how elements are related), and the operating system and programming language have more flexibility in how to get your program the memory it needs -- but the trade-off is that finding any individual element requires you to "walk" the list (follow the pointers).
All of that is general guidance, however; use caution when asking this question in different programming languages and runtime environments, because different environments might call something an array when it's really a list, or vice versa.
The short answer is: "it depends, but usually you'll use lists if you want to read things in order and use arrays if you want to read elements at random."
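For the concrete task in the question (taking a number apart digit by digit) a plain array is a natural fit in Java; a small sketch, with the method name being mine:
// Split a number into its digits, most significant first.
static int[] digitsOf(int number) {
    String s = Integer.toString(Math.abs(number));
    int[] digits = new int[s.length()];
    for (int i = 0; i < s.length(); i++) {
        digits[i] = s.charAt(i) - '0'; // characters '0'..'9' map to 0..9
    }
    return digits;
}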

Structure like Java's EnumSet that can hold repeated elements

I need some structure in which to store N enums, some of them repeated, and be able to easily extract them. So far I've tried to use EnumSet like this:
cards = EnumSet.of(
        BEST_OF_THREE,
        BEST_OF_THREE,
        SIMPLE_QUESTION,
        SIMPLE_QUESTION,
        STAR);
But now I see it can only hold one of each. Conceptually, which would be the best structure to use for this problem?
Regards
jose
You can use a Map from your enum type to Integer, where the integer indicates how many of each there are. The Google Guava "Multiset" does this for you, and handles the edge cases of adding an enum when there is no entry yet and removing an entry when its count drops to zero.
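A sketch of the Multiset route using Guava's enum-specialized variant; CardEnum stands in for whatever the actual enum type is:
import com.google.common.collect.EnumMultiset;
import com.google.common.collect.Multiset;

// Counts how many of each card type we hold; duplicates are fine.
Multiset<CardEnum> cards = EnumMultiset.create(CardEnum.class);
cards.add(CardEnum.BEST_OF_THREE);
cards.add(CardEnum.BEST_OF_THREE);
cards.add(CardEnum.SIMPLE_QUESTION);
cards.add(CardEnum.STAR);

int n = cards.count(CardEnum.BEST_OF_THREE); // 2
cards.remove(CardEnum.BEST_OF_THREE);        // count drops to 1; removing past 0 is a no-op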
Another strategy is to use the enum's ordinal index. Because this index is unique, you can use it to index into an int array sized to the number of enum constants, where the count in each array slot indicates how many of each constant you have. Like this:
// initialize the array for counting each enumeration type
// (a new int[] in Java is guaranteed to be zero-initialized)
int[] cardCount = new int[CardEnum.values().length];
...
// incrementing the count for an enumeration (when we add)
cardCount[BEST_OF_THREE.ordinal()]++;
...
// decrementing the count for an enumeration (when we remove)
cardCount[BEST_OF_THREE.ordinal()]--;
// DEBUG: assert cardCount[BEST_OF_THREE.ordinal()] >= 0
...
// getting the count for an enumeration
int count = cardCount[BEST_OF_THREE.ordinal()];
... Some time later
Having read the clarifying comments underneath the original post, it is clear that you're best off with a linear structure with an entry per element. I didn't realize that you don't need detailed information on how many of each you hold. Storing them in a Multiset or an equivalent counting structure makes it hard to pick at random, as you need to attribute an index picked uniformly from [0, size) to a particular container, which takes log time.
Sets don't allow duplicates, so if you want repeats you'll need either a List or a Map.
If you just need the number of duplicates, an EnumMap with Integer values is probably your best bet.
If the order is important, and you need quick access to the number of each type, you'll probably need to roll your own data structure.
If the order is important (but the count of each is not), then a List is the way to go; which implementation depends on how you will use it.
LinkedList - Best when there will be many inserts/removals from the beginning of the List. Indexing into a LinkedList is very expensive, and should be avoided whenever possible. If a List is built by shifting data onto the front of the list, but any later additions are at the end, conversion to an ArrayList once the initial List is built is a good idea - especially if indexing into the List is anticipated at any point.
ArrayList - When in doubt, this is a good place to start. Inserting or removing items anywhere but the end requires shifting, so if this is a common operation look elsewhere.
TreeList - This is a good all-around option, and insertions and removals anywhere in the List are inexpensive. This does require the Apache Commons Collections library, and uses a bit more memory than the others.
Benchmarks, and the code used to generate them, can be found in this gist.
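And a sketch of the plain-List route described above, for the case where order matters and counts are only needed occasionally (CardEnum again stands in for the real enum type):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Order-preserving, duplicates allowed.
List<CardEnum> cards = new ArrayList<>();
Collections.addAll(cards,
        CardEnum.BEST_OF_THREE, CardEnum.BEST_OF_THREE,
        CardEnum.SIMPLE_QUESTION, CardEnum.SIMPLE_QUESTION,
        CardEnum.STAR);

// occasional counting is still easy, just not O(1):
int bestOfThree = Collections.frequency(cards, CardEnum.BEST_OF_THREE); // 2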

Which list implementation is optimal for removing and inserting from the front and back?

I am working on an algorithm that will store a small list of objects as a sublist of a larger set of objects. The objects are inherently ordered, so an ordered list is required.
The most common operations performed will be, in order of frequency:
retrieving the nth element from the list (for some arbitrary n)
inserting a single element at the beginning or end of the list
removing the first or last n elements from the list (for some arbitrary n)
Removing and inserting from the middle will never be done, so there is no need to consider the efficiency of that.
My question is: which implementation of List is most efficient for this use case in Java (i.e. LinkedList, ArrayList, Vector, etc.)? Please defend your answer by explaining the implementations of the different data structures so that I can make an informed decision.
Thanks.
NOTE
No, this is not a homework question. No, I do not have an army of research assistants who can do the work for me.
Based on your first criteria (arbitrary access) you should use an ArrayList. ArrayLists (and arrays in general) provide lookup/retrieval in constant time. In contrast, it takes linear time to look up items in a LinkedList.
For ArrayLists, insertion or deletion at the end is (amortized) constant time. The same may be true for LinkedLists that keep a tail reference, but that is an implementation-specific detail (it's linear otherwise).
For ArrayLists, insertion or deletion at front requires linear time (with consistent reuse of space, these may become constant depending on implementation). LinkedList operations at front of list are constant.
The last two usage cases somewhat balance each other out, however your most common case definitely suggests array-based storage.
As far as basic implementation details:
ArrayLists are basically just sequential sections of memory. If you know where the beginning is, you can just do a single addition to find the location of any element. Operations at the front are expensive because elements may have to be shifted to make room.
LinkedLists are disjoint in memory and consist of nodes linked to each other (with a reference to the first node). To find the nth node, you have to start at the first node and follow links until you reach the desired node. Operations at the front just require creating a node and updating your start pointer.
I vote for a doubly linked list. http://docs.oracle.com/javase/6/docs/api/java/util/Deque.html
Probably the best data structure for this purpose would be a deque implemented with a dynamic array, which is basically an ArrayList that starts adding elements to the middle of the internal array instead of the beginning. Unfortunately Java's ArrayDeque does not support looking up an nth element.
It is, however, pretty easy to implement one yourself (or lookup an existing implementation), and then all three of the described operations can be done in O(1).
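A rough, untested sketch of such a structure, with all names mine: an array-backed deque where get(i), addFirst/addLast and removeFirst/removeLast are all O(1) (amortized for the adds):
// Minimal indexed deque backed by a circular array. Not thread-safe; sketch only.
class IndexedDeque<T> {
    private Object[] buf = new Object[16];
    private int head = 0; // index of the first element within buf
    private int size = 0;

    @SuppressWarnings("unchecked")
    public T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return (T) buf[(head + i) % buf.length];
    }

    public void addLast(T value) {
        growIfFull();
        buf[(head + size) % buf.length] = value;
        size++;
    }

    public void addFirst(T value) {
        growIfFull();
        head = (head - 1 + buf.length) % buf.length;
        buf[head] = value;
        size++;
    }

    public T removeFirst() {
        T v = get(0);
        buf[head] = null;
        head = (head + 1) % buf.length;
        size--;
        return v;
    }

    public T removeLast() {
        T v = get(size - 1);
        buf[(head + size - 1) % buf.length] = null;
        size--;
        return v;
    }

    public int size() { return size; }

    private void growIfFull() {
        if (size < buf.length) return;
        Object[] bigger = new Object[buf.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = buf[(head + i) % buf.length]; // unwrap into the new array
        }
        buf = bigger;
        head = 0;
    }
}
Removing the first or last n elements is then just n O(1) removals from the appropriate end.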
You can do all of them with an ArrayList with minimal confusion, if you're not worried about efficiency.
I would use some sort of queue or stack if I were only inserting at the front or end. They have the least overhead. Or you could also use a linked list.
To remove n elements from the front or end I would use a linked list: you can just unlink one node and the ones before or after it are gone. I.e. if I delete the first 5 elements, I just unlink the 5th element and the ones before it disappear; if I delete the last 6 elements, I just unlink the 6th-to-last one and the rest disappear, and Java will do the garbage collecting for you. This would be O(1) for this operation.
Is this a homework question?
Definitely go for LinkedList. For both inserting a value at the beginning/end of the list and removing the first/last element in the list, it runs in O(1). This is because all that needs to be changed to carry out these operations is a couple of pointers, a minimally costly operation.
Although ArrayLists retrieve the nth element in O(1) while LinkedLists retrieve the nth element in O(n), ArrayLists run the danger of having to resize when elements are inserted. What do you suppose happens when the memory allotted for the ArrayList is used up and you try to insert another element? What happens is the ArrayList allocates a larger backing array (about 1.5 times the old capacity in the Oracle implementation) and copies all of its elements into it, a costly operation. LinkedLists don't have this problem since, again, all that is done is the addition of a node and a couple of pointer updates.
I don't know a whole lot about Java Vectors, but if they're anything like C++ vectors, then they're very similar to ArrayLists.
I hope this helps.
Use a java.util.TreeMap of Long to Object, and look element i up under the key i + tm.firstKey().
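My reading of that suggestion, spelled out: the keys form a contiguous run of longs, so front/back insertion, removal and positional lookup are all O(log n). Whether that beats an array-backed deque is another question:
import java.util.TreeMap;

// firstKey()/lastKey() mark the current ends of the "list".
TreeMap<Long, String> tm = new TreeMap<>();
tm.put(0L, "b");
tm.put(1L, "c");

tm.put(tm.firstKey() - 1, "a");           // insert at the front
tm.put(tm.lastKey() + 1, "d");            // insert at the back
String third = tm.get(tm.firstKey() + 2); // element at index 2 ("c")
tm.remove(tm.lastKey());                  // remove from the back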

Is there a native or faster way to partition a list in java?

Say I have a list like this: {1,2,3,4,5,6,7,8}. Is there a fast way to create two lists (one with the first half and the other with the second half)? The location of the split is always half the size of the list (the lists always have an even number of elements).
Currently my approach is to divide the size by 2 and then iterate the list, adding anything below that index to list1 and anything above it to list2. I was just wondering if there was a quicker way (I need to do over a billion of these, so even a slight improvement in performance can save me a lot of time).
As far as built-in functionality, you could use List#subList(int, int):
int size = original.size();
List<Integer> first = original.subList(0, size / 2);
List<Integer> second = original.subList(size / 2, size);
Whether or not you want to use subList() depends on what you're doing with the contents. subList() returns views into the original list (instead of actually copying). Make sure you read the javadoc. Here's a snippet:
Returns a view of the portion of this list between the specified
fromIndex, inclusive, and toIndex, exclusive. (If fromIndex and
toIndex are equal, the returned list is empty.) The returned list is
backed by this list, so non-structural changes in the returned list
are reflected in this list, and vice-versa. The returned list supports
all of the optional list operations supported by this list.
Also, I can't really speak to the performance of subList() and how it may or may not meet your requirements. I'd imagine since it's just creating views and not copying that it'd be relatively fast. But again, pretty situational, and you'd need to profile it either way to know.
OK, you have to do over a billion of these - fun.
You should profile the different methods.
My bet would be to use arrays and System.arraycopy.
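A sketch of the arraycopy route; it assumes the data is, or can once be turned into, an array, and whether it beats the subList() views depends entirely on what the consumer does with the halves:
// Copy the two halves out of an Integer[] snapshot of the original list.
Integer[] all = original.toArray(new Integer[0]);
int half = all.length / 2;

Integer[] first = new Integer[half];
Integer[] second = new Integer[all.length - half];
System.arraycopy(all, 0, first, 0, half);
System.arraycopy(all, half, second, 0, second.length);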
If your code isn't the code producing the lists, it depends on which implementation of List you are being provided with.
You'd get a better answer if you provided more details about the requirements. Is a copy necessary, or is a view (as provided by subList()) enough? Which implementation of List are you using/being provided? Etc.
The speed is also going to be affected by the JVM (version and platform).
