Operation performance between ArrayList and single String - java

Performance-wise, is it better to use an ArrayList to store a list of values, or a single String (built with concat/+)? Intuitively, I'd think Strings would perform better since they'd likely have less overhead than an ArrayList, but I haven't been able to find anything online.
Also, the entries wouldn't be anything too large (~10).

ArrayList operations
You can get a value from an ArrayList in O(1) and add a value in amortized O(1).
Furthermore, ArrayList already has built-in operations that help you retrieve and add elements.
String operations
Concatenation: with concat and slice operations, it is worse. A String is, roughly speaking, an array of characters. For example, "Hello" + "Stack" can be represented as the arrays ['H', 'e', 'l', 'l', 'o'] and ['S', 't', 'a', 'c', 'k'].
Now, if you want to concatenate these two Strings, you have to combine all the elements of both arrays, giving you an array of length 10. Therefore the concatenation, i.e. creating the new char array, is an O(n + m) operation.
Worse, if you concatenate n Strings this way, you end up with O(n^2) complexity.
Splitting: the complexity of splitting a String is usually O(n) or more, depending on the regex you pass to the split operation.
Operations on Strings are often not very readable and can be tricky to debug.
Long story short
An ArrayList is usually better than operating on a String, but it all depends on your use case.
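To make the difference concrete, here is a minimal runnable sketch (class and variable names are made up) contrasting repeated concatenation with an ArrayList, plus the usual StringBuilder fix when a single String is needed at the end:
import java.util.ArrayList;
import java.util.List;

public class ConcatVsList {
    public static void main(String[] args) {
        // Repeated concatenation: each += copies everything built so far -> O(n^2) overall.
        String joined = "";
        for (int i = 0; i < 10; i++) joined += "item" + i;

        // ArrayList: each add is amortized O(1) -> O(n) overall.
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 10; i++) values.add("item" + i);

        // If one String is needed at the end, StringBuilder keeps the whole build O(n).
        StringBuilder sb = new StringBuilder();
        for (String v : values) sb.append(v);
        System.out.println(joined.equals(sb.toString())); // true
    }
}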

Just use an ArrayList: it stores references to your object values, and a reference is not big at all; that's the point of using references. I keep wondering why you would want to store the values inside a String... that's just odd. An ArrayList is fast enough for storing values and getting them back, and the String implementation uses arrays internally anyway... so... use ArrayList.

Java performance doesn't come out of "clever" Java source code.
It comes out of the JIT doing a good job at runtime.
You know what the JIT needs to be able to do a good job?
- code that looks like the code everybody else is writing (the JIT is optimised to produce optimal results for the sort of code everybody else writes)
- many thousands of method invocations.
Meaning: the JIT decides at runtime whether it makes sense to re-compile code. When the JIT decides to do so, and only then, you want to make sure it can do that well.
"Custom" clever Java source code ideas, such as what you are proposing here, might achieve the opposite.
Thus: don't invent a clever strategy to mangle values into a String. Write simple, human understandable code that uses a List. Because Lists are the concept that Java offers here.
The only exception would be: if you experience a real performance bottleneck, and you did good profiling, and then you figure: the list isn't good enough, then you would start making experiments using other approaches. But I guess that didn't happen yet. You assume you have a performance problem, and you assume that you should fix it that way. Simply wrong, a waste of time and energy.

ArrayList is the better choice, because a String is underneath an array of chars. So every concatenation just copies the whole old string to a new place, with the new value added: time O(n) for each operation.
When you use an ArrayList, it has an initial capacity; until it is filled, every add operation runs in O(1). Adding a new String to an ArrayList is only slower when the backing array is full, because it then needs to be copied to a new place with more capacity. But only the references need to be moved, not the whole Strings, which is far faster.
You can make ArrayList perform even better by setting the initial capacity when you know how many elements you have:
List<String> list = new ArrayList<String>(elementsCount);
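Since Java 7 the diamond operator works as well:
List<String> list = new ArrayList<>(elementsCount);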

Related

Is LinkedList.toString().replace() O(2*k)?

I know that converting a suitable object (e.g., a linked list) to a string using the toString() method is an O(n) operation, where 'n' is the length of the linked list. However, if you then wanted to replace something in that string using the replace() method, is that also an O(k) operation, where 'k' is the length of the string?
For example, for the line String str = path.toString().replace("[", "").replace("]", "").replace(",", "");, does this run through the length of the linked list 1 time, and then the length of the string an additional 3 times? If so, is there a more efficient way to do what that line of code does?
Yes, it would. replace has no idea that [ and ] are only found at the start and end. In fact, it's worse: you get another loop for copying the string over (the string has an underlying array, and that needs to be cloned in its entirety to lop a character out of it).
If your intent is to replace every [ in the string, then no, there is no faster way. However, if your actual intent is simply not to have the opening and closing braces, then write your own loop to toString the contents. Something like:
LinkedList<Foo> foos = ...;
StringBuilder out = new StringBuilder();
for (Foo f : foos) out.append(out.length() == 0 ? "" : ", ").append(f);
return out.toString();
Or, if the list elements are already CharSequences (such as Strings):
String.join(", ", foos);
Or even:
foos.stream().map(Object::toString).collect(Collectors.joining(", "));
None of this is the same thing as .replace("[", "") - after all, if a [ symbol is part of the toString() of any Foo object, it would be stripped out as well with .replace("[", "") - though you probably didn't want that to happen.
Note that the way modern CPUs work, unless that list has over a few thousand elements in it, looping it 4 times is essentially free and takes no measurable time. The concept of O(n) 'kicks in' after a certain number of loops, and on modern hardware it tends to be a lot of loops before it matters. Often other concerns are much more important. As a simple example: linked lists, in general, have horrible performance relative to something like ArrayList, even in cases where, O(k)-wise, they should be faster. That is due to the way linked lists create extra objects and how these tend to be non-contiguous (not near each other in memory). Modern CPUs can't read main memory directly. They can only ask the memory controller to take one of the on-die cache pages and replace it with the contents of another memory page, which takes 500 to 1,000 cycles; the CPU will ask the memory controller to do that and then go to sleep for those cycles. You can see how reducing the number of times that happens can have a rather marked effect on performance, and yet the O(k) analysis doesn't and cannot take it into account.
Do not worry about performance unless you have a real life scenario where the program appears to run slower than you think it should. Then, use a profiler to figure out which 1% of the code is eating 99% of the resources (because it's virtually always a 1% 'hot path' that is responsible) and then optimize just that 1%. It's pretty much impossible to predict what the 1% is going to be. So, don't bother trying to do so while writing code, it just leads you to writing harder to maintain, less flexible code - which ironically enough tends to lead to situations where adjusting the hot path is harder. Worrying about performance, in essence, slows down the code. Hence why it's very very important not to worry about that, and worry instead about code that is easy to read and easy to modify.

Is an if statement in a for loop faster than one-by-one if statements in Java?

I wonder: if I use a HashMap to collect the conditions and loop over each one with a single if statement, can I reach higher performance compared to writing one-by-one if/else-if statements?
In my opinion, the one-by-one if/else-if statements may be faster, because the for loop evaluates one extra condition on each iteration (has the counter reached the target number?). So each if statement effectively runs 2 checks. Of course the insides of the statements differ, but if we talk about just statement performance, I think the one-by-one type would be better?
Edit: this is just sample code; my question is about the performance differences between the usage of these statements.
Map<String, Integer> words = new HashMap<String, Integer>();
String letter = "d";
int n = 4;
words.put("a", 1);
words.put("b", 2);
words.put("c", 3);
words.put("d", 4);
words.put("e", 5);
words.forEach((word, number) -> {
    if (letter.equals(word)) {
        System.out.println(number * n);
    }
});
String letter = "d";
int n = 4;
if (letter.equals("a")) {
    System.out.println(1 * n);
} else if (letter.equals("b")) {
    System.out.println(2 * n);
} else if (letter.equals("c")) {
    System.out.println(3 * n);
} else if (letter.equals("d")) {
    System.out.println(4 * n);
} else if (letter.equals("e")) {
    System.out.println(5 * n);
}
For your example, having a HashMap but then doing an iterative lookup seems to be a bad idea. The point of using a HashMap is to be able to do a hash-based lookup. That is much faster than doing an iterative lookup.
Also, from your example, cascading if-then tests will definitely be faster, since they avoid the overhead of the map iterator and the extra function calls. They also avoid the overhead of the map iterator skipping empty storage locations in the hash map's backing array. A better question is whether the cascading if-thens are faster than iterating across a simple list. That is hard to answer. Cascading if-thens seem likely to be faster, except that if there are a lot of if-thens, the cost of loading the code should be added.
For string lookups, a list data structure provides adequate behavior up to a limiting value, above which a more sophisticated data structure must be used. What the limiting value is depends on the environment. For string comparisons, I've found the transition to be between 20 and 100 elements.
For particular lookups, and where low-level optimizations are available, the transition value may be much larger. For example, when doing integer lookups in C, which can do direct memory lookups, the transition value is much higher.
Typical data structures are HashMaps, tries, and sorted arrays. Each fits particular patterns of access. For example, sorted arrays are fastest and most compact, but are expensive to update. HashMaps support dynamic updates and, with good hash functions, provide constant-time lookups. But HashMaps are space-inefficient, since they depend on having empty cells between hash values.
For cases which do not involve "very large" data sets, and which are not in critical "hot" code paths, a HashMap is the usual structure to use.
If you have a Map and you want to retrieve one letter, I'm not sure why you would loop at all?
Map<String, Integer> words = new HashMap<String, Integer>();
String letter = "d";
int n = 4;
words.put("a", 1);
words.put("b", 2);
words.put("c", 3);
words.put("d", 4);
words.put("e", 5);
if (words.containsKey(letter)) {
    System.out.println(words.get(letter) * n);
} else {
    System.out.println(letter + " doesn't exist in Map");
}
If you aren't using the benefits of a Map, then why use a Map at all?
A forEach will actually touch every key in the map. The number of checks in your if/else chain depends on where the matching letter sits and how long the list of available letters is. If the letter you choose is the last one, it completes all the checks before printing; if it is the first, it only does one check, which is much faster than having to check them all.
It would be easy for you to write the two examples and run a timer to determine which is actually faster.
https://www.baeldung.com/java-measure-elapsed-time
There are a lot of wasted comparisons if you have to run through 1 million if/else statements and only select one, which could be anywhere in the chain. This doesn't include typos and the horror of code maintenance. Using a Map with an index would be much quicker. If you are only talking about 100 if/else statements (still too many, in my opinion), then you may be able to break even on speed.
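For example, a rough sketch of such a timing harness (coarse System.nanoTime measurements only; a real benchmark would use something like JMH to account for JIT warm-up):
import java.util.HashMap;
import java.util.Map;

public class LookupTiming {
    public static void main(String[] args) {
        Map<String, Integer> words = new HashMap<>();
        words.put("a", 1); words.put("b", 2); words.put("c", 3);
        words.put("d", 4); words.put("e", 5);
        String letter = "d";
        int n = 4;

        long start = System.nanoTime();
        Integer number = words.get(letter); // direct hash lookup, O(1) on average
        if (number != null) System.out.println(number * n);
        System.out.println("map lookup: " + (System.nanoTime() - start) + " ns");

        start = System.nanoTime();
        // Cascading if/else: cost depends on where the match sits in the chain.
        if (letter.equals("a")) System.out.println(1 * n);
        else if (letter.equals("b")) System.out.println(2 * n);
        else if (letter.equals("c")) System.out.println(3 * n);
        else if (letter.equals("d")) System.out.println(4 * n);
        else if (letter.equals("e")) System.out.println(5 * n);
        System.out.println("if/else: " + (System.nanoTime() - start) + " ns");
    }
}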

How to increase efficiency

I have the following homework question:
Suppose you are given two sequences S1 and S2 of n elements, possibly containing duplicates, on which a total order relation is defined. Describe an efficient algorithm for determining if S1 and S2 contain the same set of elements. Analyze the running time of this method
To solve this question I have compared the elements of the two collections using retainAll and a HashSet.
Set1.retainAll(new HashSet<Integer>(Set2));
This would solve the problem in constant time.
Do I need to sort the two arrays before the retainAll step to increase efficiency?
I suspect from the code you've posted that you are missing the point of the assignment. The idea is not to use a Java library to check whether two collections are equal (for that you could use collection1.equals(collection2)). Rather, the point is to come up with an algorithm for comparing the collections. The Java API does not specify an algorithm: it's hidden away in the implementation.
Without providing an answer, let me give you an example of an algorithm that would work, but is not necessarily efficient:
for each element in coll1
if element not in coll2
return false
remove element from coll2
return coll2 is empty
The problem specifies that a total order relation is defined on the elements, which means you can sort them; that lets you do much better than the algorithm above.
In general if you are asked to demonstrate an algorithm it's best to stick with native data types and arrays - otherwise the implementation of a library class can significantly impact efficiency and hide the data you want to collect on the algorithm itself.
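For a sense of what "much better" can look like, one approach (a sketch only, assuming int elements and treating duplicates as multiset equality) is to sort copies of both sequences and compare them position by position, which is O(n log n) overall:
import java.util.Arrays;

public class SameElements {
    // True if s1 and s2 contain the same elements with the same multiplicities.
    // Sorting dominates the cost: O(n log n); the final comparison is O(n).
    static boolean sameElements(int[] s1, int[] s2) {
        if (s1.length != s2.length) return false;
        int[] a = s1.clone();
        int[] b = s2.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }
}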

Performance of HashMap

I have to process 450 unique strings about 500 million times. Each string has a unique integer identifier. There are two options for me to use:
1. I can append the identifier to the string, and on arrival of the string I can split the string to get the identifier and use it.
2. I can store the 450 strings in a HashMap<String, Integer>, and on arrival of the string, I can query the HashMap to get the identifier.
Can someone suggest which option will be more efficient in terms of processing?
It all depends on the sizes of the strings, etc.
You can do all sorts of things.
You can use a binary search to get the index in a list, and at that index is the identifier.
You can hash just the first 2 characters rather than the entire string; that would likely be faster than the binary search, assuming the strings have an OK distribution.
You can use the first character, or first two characters, if they're unique, as a "perfect index" into a 256- or 65,536-entry array that points to the identifier.
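For instance, a sketch of that "perfect index" idea (hypothetical: it only works if every one of the 450 strings starts with a distinct character in the 0-255 range):
class PerfectIndex {
    // Hypothetical scheme: valid only if all strings begin with distinct chars <= 255.
    private final int[] idByFirstChar = new int[256];

    void register(String s, int id) {
        idByFirstChar[s.charAt(0)] = id;   // filled once at startup
    }

    int lookup(String s) {
        return idByFirstChar[s.charAt(0)]; // one array read, no full-string hashing
    }
}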
Also, if your identifier is numeric, it's better to pre-calculate it rather than convert it on the fly all the time. Text -> binary is actually rather expensive (binary -> text is worse), so it's nice to avoid that if possible.
But it behooves you to work the problem. 1 million of anything at 1 ms each is roughly 17 minutes of processing. At 500M, every microsecond wasted adds up to 8+ minutes of extra processing. You may well not care, but this demonstrates that at these scales "every little bit helps".
So, don't take our word for it: test different things to find what gives you the best result for your work set, and then go with that. Also consider excessive object creation, and avoid it. Normally I don't give it a second thought; object creation is fast, but a nano-second is a nano-second.
If you're working in Java and you don't REALLY need Unicode (i.e. you're working with single characters in the 0-255 range), I wouldn't use Strings at all; I'd work with raw bytes. Strings are based on Java characters, which are UTF-16. Java Readers convert UTF-8 into UTF-16 every. single. time. 500 million times. Yup! Another few microseconds each, and 8 microseconds per operation adds over an hour to your processing.
So, again, look in all the corners.
Or, don't, write it easy, fire it up, run it over the weekend and be done with it.
If each String has a unique identifier, then retrieval is O(1) only in the case of a HashMap.
I wouldn't suggest the first method, because you would be splitting a string for each of the 500M arrivals, unless your order is the same string 500M times and then on to the next. As Will said, appending a numeric identifier to the strings and then parsing it back out might seem straightforward but is not recommended.
So if your data is static (just the 450 strings), put them in a HashMap and experiment with it. Good luck.
Use a HashMap<String, Integer>. Splitting a string to get the identifier is an expensive operation because it involves creating new Strings.
I don't think anyone is going to be able to give you a convincing "right" answer, especially since you haven't provided all of the background / properties of the computation. (For example, the average length of the strings could make a lot of difference.)
So I think your best bet would be to write a benchmark ... using the actual strings that you are going to be processing.
I'd also look for a way to extract and test the "unique integer identifier" that doesn't entail splitting the string.
Splitting the string should work faster if you write your code well enough. In fact, if you already have the int id, I see no reason to send only the string and maintain a mapping.
Putting it into a HashMap requires hashing the incoming string every time. So you are basically comparing the performance of the hashing function vs. the code you write to append (prepending might be a bit more tricky) on the sending end and to parse on the receiving end.
OTOH, 450 strings aren't a big deal, and if you're into it, writing your own hashing function would actually be the most elegant and performant option.

How to make all possible power subsets from an ArrayList including a particular item?

Say I have an ArrayList of strings like [a, b, c, d, ....]. Can anybody help me with sample code showing how to produce all possible subsets of this list which include a particular string from the list (except the single-element and empty subsets)?
For example: if I want all the subsets including a from the example list, the output would be:
[a,b], [a,c], [a,d], [a,b,c], [a,b,d], [a,c,d] without the empty and single subset ([a])
Similarly, if I want them for b, the output would be:
[b,a], [b,c], [b,d], [b,a,c], [b,a,d], [b,c,d] without the empty and single subset ([b])
Since all of the items in the example list are strings, there might be a memory problem when the collection of subsets gets too rich, because I need to keep these subsets in memory for a single string at a time. So I also need help with what an optimized solution for this scenario would be.
I need the help in Java. As I am not that good at Java, please pardon me if I made any mistake!
Thanks!
If your initial ArrayList of strings has 30 or fewer items, you can use Guava's Sets.powerSet method (http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Sets.html#powerSet%28java.util.Set%29 - thanks, Jochen). The documentation claims the memory usage is only O(n) for the set-of-sets returned by that method. You can then iterate over that, with an if condition to only consider sets which contain "a" and are of size 2 or greater.
I recommend you try the above or a similar simple solution first and see if you run into memory problems.
If you do run into memory issues, you can try to optimize by minimizing the number of copies of the strings you hold in memory. For example, you can use lists of bytes, shorts, or ints (depending on how long your ArrayList is) where each value is an index into your ArrayList of strings.
The ultimate way to reduce memory usage, however, would be to only hold one subset in memory at a time (if possible). I.e. generate (A, B), process it, discard it, then generate (A, C), process it, discard it, etc.
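For instance, a minimal bitmask-based sketch (names made up) that generates, processes, and discards one qualifying subset at a time, for lists of up to roughly 30 elements:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubsetsContaining {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        String target = "a";
        int idx = items.indexOf(target);
        int n = items.size();

        for (int mask = 0; mask < (1 << n); mask++) {
            if ((mask & (1 << idx)) == 0) continue;   // subset must contain the target
            if (Integer.bitCount(mask) < 2) continue; // skip the empty and single subsets
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) subset.add(items.get(i));
            }
            System.out.println(subset); // process here, then let it be garbage collected
        }
    }
}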
