StringBuilder insert() vs append() performance? - java

Is there any difference in performance between insert() and append() in the StringBuilder class? I will be building plenty of short strings as text identifiers and asked myself this question. Should I initialize the StringBuilder with a separator and use insert + append, or just use append?

Knowing that:
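An insert at the end of the string representation is equivalent to an append in terms of time complexity (both are O(n), where n is the length of the text being added).
An insert anywhere other than at the end can't be obtained with an append (they have different purposes).
For info, in the OpenJDK implementation an insert may involve up to 3 System.arraycopy (native) calls (growing the buffer, shifting the existing characters, copying the new ones), while an append involves one fewer, since nothing has to be shifted.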
You can easily conclude:
If you want to insert at the end of the string representation, use append
Otherwise, use insert
Doing so, you will get the best performance. But again, since these two methods serve two different purposes (with the exception of inserting at the end), there is no real question here.
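A small sketch of the equivalence, assuming identifiers of the hypothetical form "prefix-id": inserting at sb.length() gives the same result as append, while the "separator first, then insert in front" variant from the question has to shift the existing characters. The class and method names are illustrative only.

```java
final class InsertVsAppend {
    // Builds "prefix-id" using append only.
    static String withAppend(String prefix, int id) {
        return new StringBuilder(prefix).append('-').append(id).toString();
    }

    // Same result with insert at the current end, which behaves like append.
    static String withInsertAtEnd(String prefix, int id) {
        StringBuilder sb = new StringBuilder(prefix);
        sb.insert(sb.length(), '-');
        sb.insert(sb.length(), id);
        return sb.toString();
    }

    // The question's variant: start from the separator and insert the prefix in front.
    // This shifts the characters already in the buffer to the right.
    static String withInsertAtFront(String prefix, int id) {
        StringBuilder sb = new StringBuilder("-");
        sb.insert(0, prefix);
        sb.append(id);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withAppend("user", 42));        // user-42
        System.out.println(withInsertAtEnd("user", 42));   // user-42
        System.out.println(withInsertAtFront("user", 42)); // user-42
    }
}
```

All three produce user-42; only the last one has to move existing characters, which is why plain append is the natural choice when you build left to right.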

They have different functionalities and different complexities.
insert:
(ensures the capacity of the backing array, copying the old one into a larger array if necessary)
shifts the existing elements from the insertion index (offset) onward to the right to make room
Whereas append:
(ensures the capacity of the backing array, copying the old one into a larger array if necessary)
adds the new elements to the tail of the array
So if you always add to the tail, the performance will be the same, since insert will not have to shift any elements.
So I would use append; it is just cleaner.
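The steps above can be modelled with a deliberately simplified, hypothetical char buffer. This is not the JDK source, only a sketch assuming the real class follows the same grow/shift/copy pattern; the name TinyBuilder is made up.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a growable char buffer; not the real JDK code.
final class TinyBuilder {
    private char[] value = new char[4];
    private int count = 0;

    // Step 1 (both operations): grow the backing array if needed, copying the old contents.
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > value.length) {
            value = Arrays.copyOf(value, Math.max(minCapacity, value.length * 2));
        }
    }

    TinyBuilder append(String s) {
        ensureCapacity(count + s.length());
        // Step 2 for append: copy the new chars to the tail; nothing is shifted.
        s.getChars(0, s.length(), value, count);
        count += s.length();
        return this;
    }

    TinyBuilder insert(int offset, String s) {
        if (offset < 0 || offset > count) throw new StringIndexOutOfBoundsException(offset);
        ensureCapacity(count + s.length());
        // Step 2 for insert: shift chars [offset, count) right by s.length().
        // When offset == count this moves zero chars, so insert behaves like append.
        System.arraycopy(value, offset, value, offset + s.length(), count - offset);
        // Step 3: copy the new chars into the gap.
        s.getChars(0, s.length(), value, offset);
        count += s.length();
        return this;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}
```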

According to the Java API docs, you must provide an offset if you use insert:
sb.insert(5, "String");
but sb.append("string") doesn't need one. I assume append has better performance than insert.

Related

Java: the efficient way to read logs from a file

I'm looking for the most efficient way to get all the elements from a List<String> which contain some String value ("value1", for example).
First thought: simple iteration, adding the elements which contain "value1" to another List<String>. But this task must be done very often and by many users.
I thought about list.removeAll(), but how do I remove all elements which don't contain "value1"?
So, what is the most efficient way to do this?
UPDATE:
The whole picture: I need to read the logs from a file very often and for multiple users simultaneously. The logs must be filtered by username, and each line in the file contains a username.
In terms of time efficiency, you cannot get to better result than linear (O(n)) if you want to iterate through the whole list.
Deciding between LinkedList and ArrayList etc. is most likely irrelevant as the differences are small.
If you want better than linear time in the list size, you need to build on some assumptions and prerequisites:
if you know beforehand what string you'll search for, you can build another list along with your original list containing only relevant records
if you know you're going to query one list multiple times, you could build an index
If you just have a list on input that someone gave you, and you need to read through this one input once and find the relevant strings, then you're stuck with linear time since you cannot avoid reading the list at least once.
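If the same list is queried many times, the index mentioned above could look like the following sketch. It assumes, purely for illustration, that each log line starts with the username followed by a space; LogIndex and userOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LogIndex {
    // Assumption for illustration: the username is the first space-delimited token of each line.
    static String userOf(String line) {
        int space = line.indexOf(' ');
        return space < 0 ? line : line.substring(0, space);
    }

    // Builds the index once (linear); each later lookup is a single hash-map get instead of a full scan.
    static Map<String, List<String>> byUser(List<String> lines) {
        return lines.stream().collect(Collectors.groupingBy(LogIndex::userOf));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alice login", "bob login", "alice logout");
        Map<String, List<String>> index = byUser(lines);
        System.out.println(index.get("alice")); // [alice login, alice logout]
    }
}
```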
From your comments it seems like your list is a couple of log statements that should be grouped by user id (which would be your "value1"). If you really need to read the logs very often and for multiple users simultaneously you might consider some caching, possibly with grouping by user id.
As an example, you could maintain an additional log file per user and just display it when needed. Alternatively, you could keep the latest log statements in memory by employing some FIFO buffer grouped by user id (this could be a buffer per user, maybe with another LIFO layer on top of that).
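A minimal sketch of the per-user FIFO buffer idea, assuming a fixed number of recent lines per user is enough. RecentLogs is a hypothetical class; it uses ArrayDeque as the FIFO and coarse synchronized methods for the concurrent access mentioned in the question, and it does not implement the optional LIFO layer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory cache: keeps only the latest `capacity` log lines per user (FIFO eviction).
final class RecentLogs {
    private final int capacity;
    private final Map<String, Deque<String>> buffers = new HashMap<>();

    RecentLogs(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    // synchronized because the question mentions many simultaneous users.
    synchronized void add(String user, String line) {
        Deque<String> buf = buffers.computeIfAbsent(user, u -> new ArrayDeque<>());
        if (buf.size() == capacity) {
            buf.removeFirst(); // evict the oldest line
        }
        buf.addLast(line);
    }

    // Returns a snapshot copy, oldest line first.
    synchronized List<String> latest(String user) {
        Deque<String> buf = buffers.get(user);
        return buf == null ? new ArrayList<>() : new ArrayList<>(buf);
    }
}
```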
However, depending on your use case it might not be worth the effort, and you might just filter the list whenever the user requests it. In that case I'd recommend reading the file line by line and only adding the matching lines to the list. If you first read everything into a single list and then remove the non-matching elements, it'll be less efficient (you'd have to iterate more often, shift elements, etc.) and temporarily use more memory (as opposed to discarding every non-matching line right after checking it).
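Reading line by line and keeping only matches can be sketched like this. It takes a Reader so it works with a file (for example via Files.newBufferedReader(Paths.get("app.log")), where the path is just a placeholder) or with in-memory text; LogFilter is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class LogFilter {
    // Reads line by line and keeps only matching lines, so non-matching lines are
    // discarded immediately instead of being stored and removed later.
    static List<String> linesContaining(Reader source, String needle) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(needle)) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```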
Instead of a List, use a TreeSet with a Comparator that puts all Strings containing "value1" at the beginning (breaking ties by natural order, so that distinct strings are not treated as duplicates). When iterating, as soon as a string does not contain "value1", none of the remaining ones do either, and you can stop iterating.
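A sketch of the TreeSet idea. Two caveats worth knowing: a TreeSet treats elements that compare as equal as duplicates, so the comparator below breaks ties by natural order, and identical log lines will still collapse into one; also, each insertion costs O(log n), so building the set is O(n log n). MatchesFirst is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class MatchesFirst {
    static TreeSet<String> build(Iterable<String> lines, String needle) {
        // Lines containing the needle sort first (Boolean.FALSE sorts before TRUE);
        // the natural-order tie-breaker keeps distinct strings distinct.
        Comparator<String> cmp = Comparator
                .comparing((String s) -> !s.contains(needle))
                .thenComparing(Comparator.<String>naturalOrder());
        TreeSet<String> set = new TreeSet<>(cmp);
        for (String line : lines) {
            set.add(line);
        }
        return set;
    }

    static List<String> matching(TreeSet<String> set, String needle) {
        List<String> result = new ArrayList<>();
        for (String s : set) {
            if (!s.contains(needle)) {
                break; // everything after this point also lacks the needle
            }
            result.add(s);
        }
        return result;
    }
}
```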
The iteration is likely the only way, but you can let Java optimize it as much as possible (and use an elegant, non-imperative syntax) by employing Java 8's streams:
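import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterExample {
    public static void main(String[] args) {
        // test list (Arrays.asList avoids the double-brace anonymous-subclass idiom)
        List<String> original = Arrays.asList("value1", "foo", "foovalue1", "value1foo");

        // keep only the elements that contain "value1"
        List<String> trimmed = original
                .stream()
                .filter(s -> s.contains("value1"))
                .collect(Collectors.toList());

        System.out.println(trimmed);
    }
}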
Output
[value1, foovalue1, value1foo]
Notes
One part of your question that may require more information is "performed often, by many users" - this may call for some concurrency-handling mechanism.
The actual functionality is not very clear. You may still have room to optimize your code early by fetching and collecting the "value1"-containing Strings before building your List.
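OK, I can suggest the simplest approach, which I have used myself.
Using an Iterator makes it easier: if you call list.remove(val) while looping over the list with a for-each loop, you can get a ConcurrentModificationException, whereas Iterator.remove() is safe. (Either way, a fixed-size or unmodifiable list, such as one returned by Arrays.asList, throws UnsupportedOperationException.)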
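import java.util.Iterator;
import java.util.List;

List<String> list = yourList; // must be a modifiable list
for (Iterator<String> itr = list.iterator(); itr.hasNext(); ) {
    String val = itr.next();
    // remove every element that does not contain "value1"
    if (!val.contains("value1")) {
        itr.remove();
    }
}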
Try this one and let me know. :)
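On Java 8 and later, Collection.removeIf does the same in-place filtering without an explicit Iterator, which also covers the list.removeAll() part of the question. RemoveIfExample and keepContaining are illustrative names; the list must be modifiable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveIfExample {
    // Removes, in place, every element that does NOT contain the needle.
    static void keepContaining(List<String> list, String needle) {
        list.removeIf(s -> !s.contains(needle));
    }

    public static void main(String[] args) {
        // removeIf needs a modifiable list, so copy the fixed-size Arrays.asList result.
        List<String> list = new ArrayList<>(Arrays.asList("value1", "foo", "foovalue1", "bar"));
        keepContaining(list, "value1");
        System.out.println(list); // [value1, foovalue1]
    }
}
```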

What data structure to use for indexing data for partial %infix% searching?

Imagine you have a huge cache of data that is to be searched in 4 ways:
exact match
prefix%
%suffix
%infix%
I'm using Trie for the first 3 types of searching, but I can't figure out how to approach the fourth one other than sequential processing of huge array of elements.
If your dataset is huge, consider using a search platform like Apache Solr so that you don't end up in a performance mess.
You can construct a navigable map or set (e.g. TreeMap or TreeSet) for #2 (prefix%, with keys in normal order) and #3 (%suffix, with each key stored reversed).
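A minimal sketch of options 2 and 3 with TreeSet, assuming the common subSet(prefix, prefix + Character.MAX_VALUE) range trick (which presumes no key contains that character). PrefixSuffixIndex is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixSuffixIndex {
    private final NavigableSet<String> forward = new TreeSet<>();
    private final NavigableSet<String> reversed = new TreeSet<>(); // each key stored backwards

    void add(String word) {
        forward.add(word);
        reversed.add(new StringBuilder(word).reverse().toString());
    }

    // prefix%: all keys in the range [prefix, prefix + Character.MAX_VALUE), O(log n) to locate.
    SortedSet<String> withPrefix(String prefix) {
        return forward.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    // %suffix: a suffix of the word is a prefix of the reversed word.
    List<String> withSuffix(String suffix) {
        String rev = new StringBuilder(suffix).reverse().toString();
        List<String> result = new ArrayList<>();
        for (String r : reversed.subSet(rev, rev + Character.MAX_VALUE)) {
            result.add(new StringBuilder(r).reverse().toString());
        }
        return result;
    }
}
```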
For option 4 you can construct a collection with a key for every starting letter position, i.e. every suffix of every entry, so that an infix search becomes a prefix search over the suffixes. You can simplify this depending on your requirements. This uses more space but gives O(log n) lookup times.
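Option 4 can be sketched by storing every suffix of every entry in a TreeMap: an infix is a prefix of some suffix, so the same range trick applies. This is a naive version assuming memory is not a concern; the substring copies cost O(m^2) characters for an entry of length m, which a real suffix array or suffix tree avoids. InfixIndex is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Every suffix of every entry becomes a key, so %infix% turns into a prefix search
// over suffixes (an infix is a prefix of some suffix).
final class InfixIndex {
    private final TreeMap<String, Set<String>> suffixes = new TreeMap<>();

    void add(String word) {
        for (int i = 0; i < word.length(); i++) {
            suffixes.computeIfAbsent(word.substring(i), k -> new TreeSet<>()).add(word);
        }
    }

    // Assumes no key contains Character.MAX_VALUE (used as the exclusive upper bound).
    Set<String> containing(String infix) {
        Set<String> result = new TreeSet<>();
        for (Set<String> words : suffixes.subMap(infix, infix + Character.MAX_VALUE).values()) {
            result.addAll(words);
        }
        return result;
    }
}
```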
For #4, I am thinking that if you pre-compute the number of occurrences of each character, you can look up in that table the entries that have at least as many occurrences of each character as the search string.
How efficient this algorithm is will probably depend on the nature of the data and the search string. It might be useful to give some examples of both here to get better answers.

Which list implementation is optimal for removing and inserting from the front and back?

I am working on an algorithm that will store a small list of objects as a sublist of a larger set of objects. The objects are inherently ordered, so an ordered list is required.
The most common operations performed will be, in order of frequency:
retrieving the nth element from the list (for some arbitrary n)
inserting a single element at the beginning or end of the list
removing the first or last n elements from the list (for some arbitrary n)
Removing and inserting from the middle will never be done, so there is no need to consider the efficiency of that.
My question is: which implementation of List is most efficient for this use case in Java (i.e. LinkedList, ArrayList, Vector, etc.)? Please defend your answer by explaining the implementations of the different data structures so that I can make an informed decision.
Thanks.
NOTE
No, this is not a homework question. No, I do not have an army of research assistants who can do the work for me.
Based on your first criteria (arbitrary access) you should use an ArrayList. ArrayLists (and arrays in general) provide lookup/retrieval in constant time. In contrast, it takes linear time to look up items in a LinkedList.
For ArrayLists, insertion or deletion at the end is amortized constant time. It is also constant time for Java's LinkedList, which is doubly linked and keeps a reference to its last node; only a linked list without a tail reference would need linear time.
For ArrayLists, insertion or deletion at front requires linear time (with consistent reuse of space, these may become constant depending on implementation). LinkedList operations at front of list are constant.
The last two usage cases somewhat balance each other out, however your most common case definitely suggests array-based storage.
As far as basic implementation details:
ArrayLists are basically just sequential sections of memory. If you know where the beginning is, you can just do a single addition to find the location of any element. Operations at the front are expensive because elements may have to be shifted to make room.
LinkedLists are disjoint in memory and consist of nodes linked to each other (with a reference to the first node). To find the nth node, you have to start at the first node and follow links until you reach the desired node. Operations at the front just require creating a node and updating your start pointer.
I vote for a doubly linked list. http://docs.oracle.com/javase/6/docs/api/java/util/Deque.html
Probably the best data structure for this purpose would be a deque implemented with a dynamic array, which is basically an ArrayList that starts adding elements to the middle of the internal array instead of the beginning. Unfortunately Java's ArrayDeque does not support looking up an nth element.
It is, however, pretty easy to implement one yourself (or look up an existing implementation), and then all three of the described operations can be done in O(1).
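A minimal, hypothetical sketch of such an array-backed deque with indexed access (IndexedDeque is not a JDK class). get, addFirst and addLast are O(1), the adds amortized because the array doubles when full. Dropping n elements could be O(1) by only moving head and size; this version also nulls the freed slots so the garbage collector can reclaim the objects, which makes it O(n) in the number removed.

```java
// Hypothetical minimal circular-buffer deque with indexed access (not a JDK class).
final class IndexedDeque<E> {
    private Object[] items = new Object[8];
    private int head = 0; // physical slot of logical element 0
    private int size = 0;

    int size() { return size; }

    @SuppressWarnings("unchecked")
    E get(int n) {
        if (n < 0 || n >= size) throw new IndexOutOfBoundsException("index " + n);
        return (E) items[(head + n) % items.length];
    }

    void addFirst(E e) {
        grow();
        head = (head - 1 + items.length) % items.length;
        items[head] = e;
        size++;
    }

    void addLast(E e) {
        grow();
        items[(head + size) % items.length] = e;
        size++;
    }

    // Drops the first n elements, nulling slots so the GC can reclaim them.
    void removeFirst(int n) {
        checkCount(n);
        for (int i = 0; i < n; i++) {
            items[(head + i) % items.length] = null;
        }
        head = (head + n) % items.length;
        size -= n;
    }

    // Drops the last n elements.
    void removeLast(int n) {
        checkCount(n);
        for (int i = size - n; i < size; i++) {
            items[(head + i) % items.length] = null;
        }
        size -= n;
    }

    private void checkCount(int n) {
        if (n < 0 || n > size) throw new IllegalArgumentException("cannot remove " + n + " of " + size);
    }

    // Doubles the capacity when full, copying so that logical index 0 lands at slot 0.
    private void grow() {
        if (size < items.length) return;
        Object[] bigger = new Object[items.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = items[(head + i) % items.length];
        }
        items = bigger;
        head = 0;
    }
}
```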
You can do all of them with an ArrayList with minimal confusion if you're not worried about efficiency.
I would use some sort of queue or stack if I am only inserting at the front or end. They have the least overhead. Or you could also use a linked list.
To remove N elements from the front or the end I would use a linked list: you can cut the list at a single node, and the nodes before or after it are gone. I.e. if I delete the first 5 elements, I just unlink the list after the 5th node and the ones before it will disappear; likewise, to delete the last 6 elements, I unlink before the 6th-to-last node. Java will do the garbage collecting for you. The unlinking itself is O(1), although you still have to walk N nodes to reach the cut point, and java.util.LinkedList does not expose its nodes, so this needs your own list implementation.
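In standard Java you cannot get hold of LinkedList's nodes, so a more practical idiom for dropping the first or last n elements is subList(from, to).clear(), which works on any modifiable List. It is not O(1) either: on a LinkedList it walks to the range and unlinks each node, and on an ArrayList dropping from the front shifts the remaining elements. DropEnds is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

final class DropEnds {
    // Removes the first n elements; works for any modifiable List.
    static void dropFirst(List<?> list, int n) {
        list.subList(0, n).clear();
    }

    // Removes the last n elements.
    static void dropLast(List<?> list, int n) {
        list.subList(list.size() - n, list.size()).clear();
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        List<Integer> linked = new LinkedList<>(array);
        dropFirst(array, 2);
        dropLast(array, 1);
        dropFirst(linked, 2);
        dropLast(linked, 1);
        System.out.println(array + " " + linked); // [3, 4, 5] [3, 4, 5]
    }
}
```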
Is this a homework question?
Definitely go for LinkedList. For both inserting a value at the beginning/end of the list and removing the first/last element in the list, it runs in O(1). This is because all that needs to be changed to carry out these operations is a couple of pointers, a minimally costly operation.
Although ArrayLists retrieve the nth element in O(1) while LinkedLists retrieve the nth element in O(n), ArrayLists may have to resize when elements are inserted. What do you suppose happens when the memory allotted for the ArrayList is used up and you try to insert another element? The ArrayList allocates a larger backing array (about 1.5 times the old capacity in OpenJDK) and copies all the existing elements into it, which is a costly O(n) operation. Because the capacity grows geometrically this happens rarely, so appending is still amortized O(1). LinkedLists don't have this problem since, again, all that is done is the addition of a node.
I don't know a whole lot about Java Vectors, but if they're anything like C++ vectors, then they're very similar to ArrayLists.
I hope this helps.
Use a java.util.TreeMap of Long to Object, keyed by position, and look up the ith element with the key i + tm.firstKey().
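A hypothetical sketch of the TreeMap suggestion: elements are stored under consecutive Long keys, so index i maps to key firstKey() + i. Lookups become O(log n) rather than O(1), and removing n elements from either end clears a headMap or tailMap view. TreeMapList is a made-up name.

```java
import java.util.TreeMap;

// Hypothetical wrapper: elements live under consecutive Long keys,
// so the i-th element is at key firstKey() + i.
final class TreeMapList<E> {
    private final TreeMap<Long, E> map = new TreeMap<>();

    E get(long i) {
        if (i < 0 || i >= map.size()) throw new IndexOutOfBoundsException("index " + i);
        return map.get(map.firstKey() + i); // O(log n)
    }

    void addFirst(E e) {
        map.put(map.isEmpty() ? 0L : map.firstKey() - 1, e);
    }

    void addLast(E e) {
        map.put(map.isEmpty() ? 0L : map.lastKey() + 1, e);
    }

    // Drops the first n elements: headMap(toKey) excludes toKey.
    void removeFirst(long n) {
        if (n <= 0 || map.isEmpty()) return;
        map.headMap(map.firstKey() + n).clear();
    }

    // Drops the last n elements: tailMap(fromKey) includes fromKey.
    void removeLast(long n) {
        if (n <= 0 || map.isEmpty()) return;
        map.tailMap(map.lastKey() - n + 1).clear();
    }

    int size() { return map.size(); }
}
```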

Merge two ArrayLists in Java

I have two identical array lists in Java, each holding a string value and an integer count. Now I have to merge these array lists into a single one: if the value is already present, I will just increment the count; if the value is not present, I will just add the value and the count as they are.
The question is, is there any way I can do it gracefully other than iterating in a for loop and checking every value with an if?
You can't, there's too much custom logic. Iterate, check and add - that's the best approach, and will be more readable.
Technically, you can use a Multiset from Guava, but there the count is taken care of by the collection itself rather than by you, so it might require some more work.
The question is, is there any way I can do it gracefully other than iterating in a for loop and checking every value with an if?
Short answer is no.
You would be better off using a HashMap as the container; at least the merging operation would perform faster. You need a loop in any case (since there is no addAll / putAll which could update your counts).
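If the pairs are kept in a HashMap<String, Integer> as suggested, Java 8's Map.merge handles the "increment if present, otherwise add" step in one call. A loop is still there (forEach), as the answers say. This assumes the two lists have already been turned into maps; MergeCounts is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

final class MergeCounts {
    // Adds every (value, count) pair from `other` into a copy of `base`:
    // an existing value gets its count incremented, a new value is inserted as-is.
    static Map<String, Integer> merge(Map<String, Integer> base, Map<String, Integer> other) {
        Map<String, Integer> result = new HashMap<>(base);
        other.forEach((value, count) -> result.merge(value, count, Integer::sum));
        return result;
    }
}
```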

Help needed creating a HashSet from a HashMap

I've been able to read a four-column text file into a HashMap and write it to an output file. However, I need to get the second column (distinct values) into a HashSet and write it to the output file. I've been able to create the HashSet, but it is grabbing everything and not sorting. By the way, I'm new, so please take this into consideration when you answer. Thanks
Neither HashSet nor HashMap are meant to sort. They're fundamentally unsorted data structures. You should use an implementation of SortedSet, such as TreeSet.
Some guesses, related to Mr Skeet's answer and your apparent confusion...
Are you sure you are not inserting the whole line into the TreeSet? If you are going to use ONLY the second column, you will need to split() the strings (representing the lines) into columns - that's not done automatically.
Also, if you are actually trying to sort the whole file using the second column as the key, you will need a TreeMap instead, using the 2nd column as the key and the whole line as the value. But that won't solve the splitting; it only keeps the relation between the line and the key.
Edit: Here is some terminology for you, you might need it.
You have a Set. It's a collection of other objects - like String. You add other objects to it, and then you can fetch all objects in it by iterating through the set. Adding is done through the method add(), and iterating can be done using the enhanced for loop syntax or using the iterator() method.
The set doesn't "grab" or "take" stuff; you add something to the set - in this case a String - not an array of Strings, which is written as String[].
(It's possible to add an array to a HashSet (arrays are objects too), but arrays use identity-based equals() and hashCode(), so two arrays with the same contents are not treated as duplicates. Adding an array to a TreeSet without a Comparator throws a ClassCastException, because arrays don't implement Comparable. Maybe that's what you are doing.)
String key = splittedLine[1]; // 2:nd element
"The second element of the keys" doesn't make sense at all. And what are the duplicates you're talking about? (Note the correct use of apostrophes... :-)
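Putting the pieces from these answers together: split each line, take element 1 (the splittedLine[1] fragment above), and add that String, not the whole array, to a TreeSet, which keeps the values distinct and sorted. The sketch assumes whitespace-separated columns and uses a hard-coded list instead of the file; SecondColumn is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class SecondColumn {
    // Collects the distinct values of the 2nd column in sorted order.
    // Assumption for illustration: columns are separated by whitespace.
    static SortedSet<String> distinctSecondColumn(List<String> lines) {
        SortedSet<String> values = new TreeSet<>();
        for (String line : lines) {
            String[] splittedLine = line.trim().split("\\s+");
            if (splittedLine.length > 1) {
                String key = splittedLine[1]; // 2nd column
                values.add(key);              // add the String, not the whole String[]
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 pear 10 x", "2 apple 20 y", "3 pear 30 z");
        for (String v : distinctSecondColumn(lines)) {
            System.out.println(v); // apple, then pear
        }
    }
}
```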
