which is better option? - java

i need to maintain a list containing two values of string type say v1,v2 for each key say k.
What is a better option
Hashmap with value containing a string containing v1 and v2 and using split() to retrive the correct value after selection.
Hashmap with value containing array of two string variables
I am creating an android app, so just concern about performance. In second case i can access directly but each value will contain another array ( i dont know but it appears like a complicated way) , while in 1st case it will use split function on every access like v.split(",")[0]
Please guide me.
Map<String,String[]> listMap= new HashMap<String, String[]>();
Map<String,String> listMap1= new HashMap<String, String>();;
for (int i = 1; i < tl.getChildCount(); i++) {
TableRow row = (TableRow) tl.getChildAt(i);
COLOR_TABLE clr = (COLOR_TABLE) row.getTag();
if (clr == COLOR_TABLE.green) {
//comp
String x1=listMap1.get( ((TextView) row.getChildAt(0)).getText());
String x2=listMap.get( ((TextView) row.getChildAt(0)).getText());
// now i have to add two string values in a list seperately
}
}

Never abuse Strings! It may be slow at times and they are not made for that purpose.
You could use a generic Pair class if you want to do it more object oriented:
public class Pair<A, B> {
public A first;
public B second;
public Pair(A first, B second) {
this.first = first;
this.second = second;
}
}
Of course you can do it better, with accessors and so.

i think split function runs with O(string size) complexity but reach element of an array is a constant

The string will be much slower than the array, and also have more complicated code. (With the caveat that it is hard to be sure about performance differences until one has actually measured it.)
But if it was me, I would use the simplest solution, and use an object. Later, if the program turns out to be too slow, and measurement shows this to be a performance bottleneck, I would consider other solutions.

Using an array or a custom class as the value would be the preferred approach from a design perspective. If you pack two strings into one separated by a character you need to make sure that the first string won't ever contain that same character, or you must use some kind of an escape mechanism to be able to include it.
If you are only worried about performance test all options and pick the best one. It's hard make predictions because the results will depend on the data you store and on how libraries have been implemented. For example, any structure that stores references to independent strings will potentially suffer a cache miss on access; if the string split or a similar method turns out to be cheaper than a cache miss it will be faster.

Related

Java concatenate strings vs static strings

I try to get a better understanding of Strings. I am basically making a program that requires a lot of strings. However, a lot of the strings are very, very similar and merely require a different word at the end of the string.
E.g.
String one = "I went to the store and bought milk"
String two = "I went to the store and bought eggs"
String three = "I went to the store and bought cheese"
So my question is, what approach would be best suited to take when dealing with strings? Would concatenating 2 strings together have any benefits over just having static strings in, say for example, performance or memory management?
E.g.
String one = "I went to the store and bought "
String two = "milk"
String three = "cheese"
String four = one + two
String five = one + three
I am just trying to figure out the most optimal way of dealing with all these strings. (If it helps to put a number of strings I am using, I currently have 50 but the number could surplus a huge amount)
As spooky has said the main concern with the code is readability. Unless you are working on a program for a phone you do not need to manage your resources. That being said, it really doesn't matter whether you create a lot of Strings that stand alone or concatenate a base String with the small piece that varies. You won't really notice better performance either way.
You may set the opening sentence in a string like this
String openingSentence = "I went to the store and bought";
and alternate defining each word alone, by defining one array of strings like the following ::
String[] thingsToBeBought = { "milk", "water", "cheese" .... };
then you can do foreach loop and concatenate each element in the array with the opening sentence.
In Java, if you concatenate two Strings (e.g. using '+') a new String is created, so the old memory needs to be garbage collected. If you want to concatenate strings, the correct way to do this is to use a StringBuilder or StringBuffer.
Given your comment about these strings really being URLs, you probably want to have a StringBuilder/StringBuffer that is the URL base, and then append the suffixes as needed.
Performance wise final static strings are always better as they are generated during compile time. Something like this
final static String s = "static string";
Non static strings and strings concatenated as shown in the other example are generated at runtime. So even though performance will hardly matter for such a small thing, The second example is not as good as the first one performance wise as in your code :
// not as good performance wise since they are generated at runtime
String four = one + two
String five = one + three
Since you are going to use this string as URL, I would recommend to use StringJoiner (in case your are using JAVA 8). It will be as efficient as StringBuilder (will not create a new string every time you perform concatenation) and will automatically add "/" between strings.
StringJoiner myJoiner = new StringJoiner("/")
There will be no discernable difference in performance, so the manner in which you go about this is more a matter of preference. I would likely declare the first part of the sentence as a String and store the individual purchase items in an array.
Example:
String action = "I went to the store and bought ";
String [] items = {"milk", "eggs", "cheese"};
for (int x = 0; x< items.length; x++){
System.out.println(action + items[x]);
}
Whether you declare every possible String or separate Strings to be concatenated isn't going to have any measurable impact on memory or performance in the example you give. In the extreme case of declaring truly large numbers of String literals, Java's native hash table of interned Strings will use more memory if you declare every possible String, because the table's cached values will be longer.
If you are concatenating more than 2 Strings using the + operator, you will be creating extra String objects to be GC'd. For example if you have Strings a = "1" and b = "2", and do String s = "s" + a + b;, Java will first create the String "s1" and then concatenate it to form a second String "s12". Avoid the intermediate String by using something like StringBuilder. (This wouldn't apply to compile-time declarations, but it would to runtime concatenations.)
If you happen to be formatting a String rather than simply concatenating, use a MessageFormat or String.format(). It's prettier and avoids the intermediate Strings created when using the + operator. So something like, String urlBase = "http://host/res?a=%s&b=%s"; String url = String.format(urlBase, a, b); where a and b are the query parameter String values.

Efficient data structure that checks for existence of String

I am writing a program which will add a growing number or unique strings to a data structure. Once this is done, I later need to constantly check for existence of the string in it.
If I were to use an ArrayList I believe checking for the existence of some specified string would iterate through all items until a matching string is found (or reach the end and return false).
However, with a HashMap I know that in constant time I can simply use the key as a String and return any non-null object, making this operation faster. However, I am not keen on filling a HashMap where the value is completely arbitrary. Is there a readily available data structure that uses hash functions, but doesn't require a value to be placed?
If I were to use an ArrayList I believe checking for the existence of some specified string would iterate through all items until a matching string is found
Correct, checking a list for an item is linear in the number of entries of the list.
However, I am not keen on filling a HashMap where the value is completely arbitrary
You don't have to: Java provides a HashSet<T> class, which is very much like a HashMap without the value part.
You can put all your strings there, and then check for presence or absence of other strings in constant time;
Set<String> knownStrings = new HashSet<String>();
... // Fill the set with strings
if (knownString.contains(myString)) {
...
}
It depends on many factors, including the number of strings you have to feed into that data structure (do you know the number by advance, or have a basic idea?), and what you expect the hit/miss ratio to be.
A very efficient data structure to use is a trie or a radix tree; they are basically made for that. For an explanation of how they work, see the wikipedia entry (a followup to the radix tree definition is in this page). There are Java implementations (one of them is here; however I have a fixed set of strings to inject, which is why I use a builder).
If your number of strings is really huge and you don't expect a minimal miss ratio then you might also consider using a bloom filter; the problem however is that it is probabilistic; but you can get very quick answers to "not there". Here also, there are implementations in Java (Guava has an implementation for instance).
Otherwise, well, a HashSet...
A HashSet is probably the right answer, but if you choose (for simplicity, eg) to search a list it's probably more efficient to concatenate your words into a String with separators:
String wordList = "$word1$word2$word3$word4$...";
Then create a search argument with your word between the separators:
String searchArg = "$" + searchWord + "$";
Then search with, say, contains:
bool wordFound = wordList.contains(searchArg);
You can maybe make this a tiny bit more efficient by using StringBuilder to build the searchArg.
As others mentioned HashSet is the way to go. But if the size is going to be large and you are fine with false positives (checking if the username exists) you can use BloomFilters (probabilistic data structure) as well.

How should I map string keys to values in Java in a memory-efficient way?

I'm looking for a way to store a string->int mapping. A HashMap is, of course, a most obvious solution, but as I'm memory constrained and need to store 2 million pairs, 7 characters long keys, I need something that's memory efficient, the retrieval speed is a secondary parameter.
Currently I'm going along the line of:
List<Tuple<String, int>> list = new ArrayList<Tuple<String, int>>();
list.add(...); // load from file
Collections.sort(list);
and then for retrieval:
Collections.binarySearch(list, key); // log(n), acceptable
Should I perhaps go for a custom tree (each node a single character, each leaf with result), or is there an existing collection that fits this nicely? The strings are practically sequential (UK postcodes, they don't differ much), so I'm expecting nice memory savings here.
Edit: I just saw you mentioned the String were UK postcodes so I'm fairly confident you couldn't get very wrong by using a Trove TLongIntHashMap (btw Trove is a small library and it's very easy to use).
Edit 2: Lots of people seem to find this answer interesting so I'm adding some information to it.
The goal here is to use a map containing keys/values in a memory-efficient way so we'll start by looking for memory-efficient collections.
The following SO question is related (but far from identical to this one).
What is the most efficient Java Collections library?
Jon Skeet mentions that Trove is "just a library of collections from primitive types" [sic] and, that, indeed, it doesn't add much functionality. We can also see a few benchmarks (by the.duckman) about memory and speed of Trove compared to the default Collections. Here's a snippet:
100000 put operations 100000 contains operations
java collections 1938 ms 203 ms
trove 234 ms 125 ms
pcj 516 ms 94 ms
And there's also an example showing how much memory can be saved by using Trove instead of a regular Java HashMap:
java collections oscillates between 6644536 and 7168840 bytes
trove 1853296 bytes
pcj 1866112 bytes
So even though benchmarks always need to be taken with a grain of salt, it's pretty obvious that Trove will save not only memory but will always be much faster.
So our goal now becomes to use Trove (seen that by putting millions and millions of entries in a regular HashMap, your app begins to feel unresponsive).
You mentioned 2 million pairs, 7 characters long keys and a String/int mapping.
2 million is really not that much but you'll still feel the "Object" overhead and the constant (un)boxing of primitives to Integer in a regular HashMap{String,Integer} which is why Trove makes a lot of sense here.
However, I'd point out that if you have control over the "7 characters", you could go even further: if you're using say only ASCII or ISO-8859-1 characters, your 7 characters would fit in a long (*). In that case you can dodge altogether objects creation and represent your 7 characters on a long. You'd then use a Trove TLongIntHashMap and bypass the "Java Object" overhead altogether.
You stated specifically that your keys were 7 characters long and then commented they were UK postcodes: I'd map each postcode to a long and save a tremendous amount of memory by fitting millions of keys/values pair into memory using Trove.
The advantage of Trove is basically that it is not doing constant boxing/unboxing of Objects/primitives: Trove works, in many cases, directly with primitives and primitives only.
(*) say you only have at most 256 codepoints/characters used, then it fits on 7*8 == 56 bits, which is small enough to fit in a long.
Sample method for encoding the String keys into long's (assuming ASCII characters, one byte per character for simplification - 7 bits would be enough):
long encode(final String key) {
final int length = key.length();
if (length > 8) {
throw new IndexOutOfBoundsException(
"key is longer than 8 characters");
}
long result = 0;
for (int i = 0; i < length; i++) {
result += ((long) ((byte) key.charAt(i))) << i * 8;
}
return result;
}
Use the Trove library.
The Trove library has optimized HashMap and HashSet classes for primitives. In this case, TObjectIntHashMap<String> will map the parameterized object (String) to a primitive int.
First of, did you measure that LinkedList is indeed more memory efficient than a HashMap, or how did you come to that conclusion? Secondly, a LinkedList's access time of an element is O(n), so you cannot do efficient binary search on it. If you want to do such approach, you should use an ArrayList, which should give you the beast compromise between performance and space. However, again, I doubt that a HashMap, HashTable or - in particular - a TreeMap would consume that much more memory, but the first two would provide constant access and the tree map logarithmic and provide a nicer interface that a normal list. I would try to do some measurements, how much the difference in memory consumption really is.
UPDATE: Given, as Adamski pointed out, that the Strings themselves, not the data structure they are stored in, will consume the most memory, it might be a good idea to look into data structures that are specific for strings, such as tries (especially patricia tries), which might reduce the storage space needed for the strings.
What you are looking for is a succinct-trie - a trie which stores its data in nearly the least amount of space theoretically possible.
Unfortunately, there are no succinct-trie classes libraries currently available for Java. One of my next projects (in a few weeks) is to write one for Java (and other languages).
In the meanwhile, if you don't mind JNI, there are several good native succinct-trie libraries you could reference.
Have you looked at tries. I've not used them but they may fit with what you're doing.
A custom tree would have the same complexity of O(log n), don't bother. Your solution is sound, but I would go with an ArrayList instead of the LinkedList because the linked list allocates one extra object per stored value, which will amount to a lot of objects in your case.
As Erick writes using the Trove library is a good place to start as you save space in storing int primitives rather than Integers.
However, you are still faced with storing 2 million String instances. Given that these are keys in the map, interning them won't offer any benefit so the next thing I'd consider is whether there's some characteristic of the Strings that can be exploited. For example:
If the Strings represent sentences of common words then you could transform the String into a Sentence class, and intern the individual words.
If the Strings only contain a subset of Unicode characters (e.g. only letters A-Z, or letters + digits) you could use a more compact encoding scheme than Java's Unicode.
You could consider transforming each String into a UTF-8 encoded byte array and wrapping this in class: MyString. Obviously the trade-off here is the additional time spent performing look-ups.
You could write the map to a file and then memory map a portion or all of the file.
You could consider libraries such as Berkeley DB that allow you to define persistent maps and cache a portion of the map in memory. This offers a scalable approach.
maybe you can go with a RadixTree?
Use java.util.TreeMap instead of java.util.HashMap. It makes use of a red black binary search tree and doesn't use more memory than what is required for holding notes containing the elements in the map. No extra buckets, unlike HashMap or Hashtable.
I think the solution is to step a little outside of Java. If you have that many values, you should use a database. If you don't feel like installing Oracle, SQLite is quick and easy. That way the data you don't immediately need is stored on the disk, and all of the caching/storage is done for you. Setting up a DB with one table and two columns won't take much time at all.
I'd consider using some cache as these often have the overflow-to-disk ability.
You might create a key class that matches your needs. Perhaps like this:
public class MyKey implements Comparable<MyKey>
{
char[7] keyValue;
public MyKey(String keyValue)
{
... load this.keyValue from the String keyValue.
}
public int compareTo(MyKey rhs)
{
... blah
}
public boolean equals(Object rhs)
{
... blah
}
public int hashCode()
{
... blah
}
}
try this one
OptimizedHashMap<String, int[]> myMap = new OptimizedHashMap<String, int[]>();
for(int i = 0; i < 2000000; i++)
{
myMap.put("iiiiii" + i, new int[]{i});
}
System.out.println(myMap.containsValue(new int[]{3}));
System.out.println(myMap.get("iiiiii" + 1));
public class OptimizedHashMap<K,V> extends HashMap<K,V>
{
public boolean containsValue(Object value) {
if(value != null)
{
Class<? extends Object> aClass = value.getClass();
if(aClass.isArray())
{
Collection values = this.values();
for(Object val : values)
{
int[] newval = (int[]) val;
int[] newvalue = (int[]) value;
if(newval[0] == newvalue[0])
{
return true;
}
}
}
}
return false;
}
Actually HashMap and List are too general for such specific task as a lookup of int by zipcode. You should use advantage of knowledge which data is used. One of the options is to use a prefix tree with leaves that stores the int value. Also, it could be pruned if (my guess) a lot of codes with same prefixes map to the same integer.
Lookup of the int by zipcode will be linear in such tree and will not grow if number of codes is increased, compare to O(log(N)) in case of binary search.
Since you are intending to use hashing, you can try numerical conversions of the strings based on ASCII values.
the simplest idea will be
int sum=0;
for(int i=0;i<arr.length;i++){
sum+=(int)arr[i];
}
hash "sum" using a well defined hash functions. You would use a hash function based on the expected input patterns.
e.g. if you use division method
public int hasher(int sum){
return sum%(a prime number);
}
selecting a prime number which is not close to an exact power of two improves performances and gives better uniformly hashed distribution of keys.
another method is to weigh the characters based on their respective position.
e.g: if you use the above method, both "abc" and "cab" will be hashed into a same location. but if you need them to be stored in two distinct location give weights for locations like we use the number systems.
int sum=0;
int weight=1;
for(int i=0;i<arr.length;i++){
sum+= (int)arr[i]*weight;
weight=weight*2; // using powers of 2 gives better results. (you know why :))
}
As your sample is quite large, you'd avoid collisions by a chaining mechanism rather than using a probe sequence.
After all,What method you would choose totally depends on the nature of your application.
The problem is objects' memory overhead, but using some tricks you can try to implement your own hashset. Something like this. Like others said strings have quite large overhead so you need to "compress" it somehow. Also try not to use too many arrays(lists) in hashtable (if you do chaining type hashtable) as they are also objects and also have overhead. Better yet do open addressing hashtable.

Data Structure for Storing Multi-key and Value Pairs in Java

Good evening,
I am searching for an elegant solution to implement a transition table (Specifically, for a universal pushdown automaton) that uses several discrete values for a given transition. They say a picture is worth a thousand words so here is part of my table:
State InputSymbol StackSymbol Move(NewState, Action)
------------------------------------------------------------
0 a Z0 (0, push)
0 a a (0, push)
0 a b (0, pop)
0 b Z0 (1, push)
...
Now, I have considered multi-dimensional arrays, ArrayLists of ArrayLists, and other solutions of the sort but all seem rather brutish. This is further complicated by the fact that every possible combination of my three symbols (a, b, and Z0) is not represented in the table.
I have been contemplating using a HashMap, but I am not entirely sure how to make this work with multiple key values. I was considering concatenating all three symbols together and using the resultant string as my key but that, too, seems less than elegant.
And, for the record, this is homework but actually giving an elegant solution is not strictly required. I just enjoy good code.
Thank you in advance for your assistance.
Make a class like this:
class Key{
int state;
char inputSysmbol;
String StackSymbol;
}
Then use the map Map<Key,Move>.
Make a class for Move just like above and don't forget to override hashcode and equals method in both classes.
Encapsulate {State, InputSymbol, StackSymbol} in an object. Then, use the "hashcode()" of the object as the key for your hash table. For more information regarding hash codes see Doc.
Take a look on MultiHashMap from Jakarta commons collections framework.
http://commons.apache.org/collections/api-release/org/apache/commons/collections/MultiHashMap.html
Unfortunately jakarta collection framework is not parametrized, i.e. does not support generics, so you will have to cast you your specific type.
Other solution is to implement something similar your self. Create class Move that holds your data. Create class Table that encapsulate the logic. This class may contain a list (better LinkedList) of Moves and N maps:
Map> stateIndex
Map> inputSymbolIndex
etc, etc.
implementation of add method is trivial.
Implementation of get is more interesting:
public Move get(Integer state, Character inputSymbol, StackSymbol stackSymbol) {
List<Move> stateList = stateIndex.get(state);
List<Move> inputSymbolList = stateIndex.get(inputSymbol);
List<Move> stackSymbolList = stateIndex.get(stackSymbol);
Set<Move> result = new HashSet<Move>(stateList);
result.retainAll(inputSymbolList);
result.retainAll(stackSymbolList);
if (result.size() > 1) {
throw new IllegalStateException("Unique constraint violation");
}
return result.size() == 0 ? null : result.interator().next();
}

What is the fastest way to find an array within another array in Java?

Is there any equivalent of String.indexOf() for arrays? If not, is there any faster way to find an array within another other than a linear search?
Regardless of the elements of your arrays, I believe this is not much different than the string search problem.
This article provides a general intro to the various known algorithms.
Rabin-Karp and KMP might be your best options.
You should be able to find Java implementations of these algorithms and adapt them to your problem.
List<Object> list = Arrays.asList(myArray);
Collections.sort(list);
int index = Collections.binarySearch(list, find);
OR
public static int indexOf(Object[][] array, Object[] find){
for (int i = 0; i < array.length(); i ++){
if (Arrays.equals(array[i], find)){
return i;
}
}
return -1;
}
OR
public static int indexOf(Object[] array, Object find){
for (int i = 0; i < array.length(); i ++){
if (array[i].equals(find)){
return i;
}
}
return -1;
}
OR
Object[] array = ...
int index = Arrays.asList(array).indexOf(find);
As far as I know, there is NO way to find an array within another without a linear search. String.indexOf uses a linear search, just inside a library.
You should write a little library called indexOf that takes two arrays, then you will have code that looks just like indexOf.
But no matter how you do it, it's a linear search under the covers.
edit:
After looking at #ahmadabolkader's answer I kind of take this back. Although it's still a linear search, it's not as simple as just "implement it" unless you are restricted to fairly small test sets/results.
The problem comes when you want to see if ...aaaaaaaaaaaaaaaaaab fits into a string of (x1000000)...aaaaaaaaab (in other words, strings that tend to match most places in the search string).
My thought was that as soon as you found a first character match you'd just check all subsequent characters one-on-one, but that performance would degrade terrifyingly when most of the characters matched most of the time. There was a rolling hash method in #a12r's answer that sounded much better if this is a real-world problem and not just an assignment.
I'm just going to vote for #a12r's answer because of those awesome Wikipedia references.
The short answer is no - there is no faster way to find an array within an array by using some existing construct in Java. Based on what you described, consider creating a HashSet of arrays instead of an array of arrays.
Normally the way you find things in collections in java is
put them in a hashmap (dictionary) and look them up by their hash.
loop through each object and test its equality
(1) won't work for you because an array object's hash won't tell you that the contents are the same. You could write some sort of wrapper that would create a hashcode based on the contents (you'd also have to make sure equals returned values consistent with that).
(2) also will require a bit of work because object equality for arrays will only test that the objects are the same. You'd need to wrap the arrays with a test of the contents.
So basically, not unless you write it yourself.
You mean you have an array which elements also are array elements? If that is the case and the elements are sorted you might be able to use binarysearch from java.util.Arrays

Categories