Storing 15,000 items in Java

I have a document with 15,000 items. Each item contains 6 variables (strings and integers). I have to copy all of these into some sort of two-dimensional array. What's the best way to do it?
Here are my ideas so far:
1. Make a GIANT 2D array or ArrayList the same way you make any other array.
Pros: Simple.
Cons: Messy (I would create a class just for this), a huge amount of code, if I make a mistake it will be impossible to find where it is, and all variables would have to be strings, even the ints, which will make my job harder down the road.
2. Make a new class with a super that takes in all the variables I need. Create each item as a new instance of this class. Add all of the instances to a 2D array or ArrayList.
Pros: Simple, less messy, easier to find a mistake, not all the variables need to be strings (which makes it much easier later when I don't have to convert string to int), a little less typing for me.
Cons: Slower? Will instances make my array compile slower? And will they make the overall array slow when I'm searching for items in it?
These ideas don't seem all too great :( and before I start the three-week, five-hour-a-day process of adding these items I would like to find the best way so I won't have to do it again... Suggestions on my current ideas or any new ideas?
Data example:
0: 100, west, sports, 10.89, MA, united
*not actual data

Your second option seems good. You can create a class containing all the fields of an item and create an array of that class.
You may use the following:
1. Read the document line by line using a BufferedReader, so that memory issues will not occur.
2. Create a class containing your item's fields.
3. Create a List of that type and store the elements in it.
Let me know in case you face further problems.
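The three steps above can be sketched roughly as follows. This is only an illustration: the field names (code, region, category, price, state, brand) are guesses based on the sample line in the question, and the input format is assumed to be one comma-separated item per line with an optional "0:" row prefix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ItemLoader {
    // Hypothetical item type; the real field names depend on your data.
    static class Item {
        final int code;
        final String region;
        final String category;
        final double price;
        final String state;
        final String brand;

        Item(int code, String region, String category, double price, String state, String brand) {
            this.code = code;
            this.region = region;
            this.category = category;
            this.price = price;
            this.state = state;
            this.brand = brand;
        }
    }

    // Parses one line such as "0: 100, west, sports, 10.89, MA, united".
    static Item parseLine(String line) {
        String data = line.substring(line.indexOf(':') + 1); // drop the "0:" prefix if present
        String[] f = data.split(",");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + line);
        }
        return new Item(Integer.parseInt(f[0].trim()), f[1].trim(), f[2].trim(),
                Double.parseDouble(f[3].trim()), f[4].trim(), f[5].trim());
    }

    // Reads every non-blank line from the reader into a list of items.
    static List<Item> readItems(BufferedReader reader) throws IOException {
        List<Item> items = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                items.add(parseLine(line));
            }
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the real document here.
        String sample = "0: 100, west, sports, 10.89, MA, united\n"
                + "1: 101, east, books, 4.50, NY, acme\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            List<Item> items = readItems(reader);
            System.out.println(items.size() + " items loaded");
        }
    }
}
```

For the real document you would wrap a FileReader (or Files.newBufferedReader) instead of the StringReader.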

If you already have the document with the 15000 * 6 items, in my experience you would be better served by writing a program that parses it with regex and outputs the contents of the Java array in the format you want. With such a parsing program in place, it will also be very easy to change the format of the 15000 lines if you want to generate it differently.
As to the final format, I would have an ArrayList of your bean. Going by your text so far, you don't necessarily need a super that takes in the variables, unless you need subtypes that are differentiated.
You'll probably run out of static space in a single class. So what I do is break up a big class like that into a file with a bunch of inner nested classes that each have a 64K (or less) part of the data as static final arrays, and then I merge them together in the main class in that file.
I have this in a class of many names to fix:
class FixName {
    static String[][] testStrings;

    static int add(String[][] aTestStrings, int lastIndex) {
        for (int i = 0; i < aTestStrings.length; ++i) {
            testStrings[++lastIndex] = aTestStrings[i];
        }
        return lastIndex;
    }

    static {
        testStrings = new String[
            FixName1.testStrings.length
            + FixName2.testStrings.length
            + FixName3.testStrings.length
            + FixName4.testStrings.length
            /**/ ][];
        int lastIndex = -1;
        lastIndex = add(FixName1.testStrings, lastIndex);
        lastIndex = add(FixName2.testStrings, lastIndex);
        lastIndex = add(FixName3.testStrings, lastIndex);
        lastIndex = add(FixName4.testStrings, lastIndex);
    /**/ }
}

class FixName1 {
    static String[][] testStrings = {
        {"key1", "name1", "other1"},
        {"key2", "name2", "other2"},
        //...
        {"keyN", "nameN", "otherN"}
    };
}

Create a wrapper (Item) if you have not already (your question does not state it clearly).
If the number of elements is fixed, i.e. 15,000, use an array; otherwise use a LinkedList (write your own linked list or use the Collections framework).
If there are other operations that you need to support on this collection of items, such as further inserts or (in particular) searches, use a balanced binary search tree.
From my understanding of the question, I would say a linked list is the better option.

If the items have a unique property (name or id or row number or any other unique identifier) I recommend using a HashMap with a wrapper around the item. If you are going to do any kind of lookup on your data (find the item with id x and do operation y) this is the fastest option and is also very clean; it just requires a wrapper, and you can use a data structure that is already implemented.
If you are not doing any lookups and need to process the items en masse in no specific order I would recommend an ArrayList; it is very optimized as it is the most commonly used collection in Java. You would still need the wrapper to keep things clean, and a list is far cleaner than an array at almost no extra cost.
There is little point in making your own collection as your needs are not extremely specific. Just use one that is already implemented and never worry about your code breaking; if it does, it is Oracle's fault ;)
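As a rough sketch of the HashMap suggestion, assuming each item carries a unique integer id (the Item fields and the indexById helper are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

class ItemIndex {
    // Hypothetical wrapper; only the id needs to be unique.
    static class Item {
        final int id;
        final String name;
        final double price;

        Item(int id, String name, double price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    // Builds an id -> item lookup table and rejects duplicate ids.
    static Map<Integer, Item> indexById(Item... items) {
        Map<Integer, Item> byId = new HashMap<>();
        for (Item item : items) {
            if (byId.put(item.id, item) != null) {
                throw new IllegalArgumentException("Duplicate id: " + item.id);
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<Integer, Item> byId = indexById(
                new Item(100, "ball", 10.89),
                new Item(101, "bat", 24.00));
        Item found = byId.get(101); // average O(1) lookup, no scan over all items
        System.out.println(found.name);
    }
}
```

If you only ever walk through all items in order, the same Item objects can simply go into an ArrayList instead.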

Related

Assign the size and elements of multi dimensional string array dynamically

I'm trying to build a dynamic grid view with titles/separation. For this I took a reference project from this link (this is the studio project link). In that reference project I'm not able to update the multi-dimensional String array dynamically.
Multi dimensional String array Code
AUTHORS = new String[] { "Roberto Bolaño",
"David Mitchell", "Haruki Murakami", "Thomas Pynchon" };
BOOKS = new String[][] {
{ "The Savage Detectives", "2666" },
{ "Ghostwritten", "number9dream", "Cloud Atlas",
"Black Swan Green", "The Thousand Autumns of Jacob de Zoet" },
{ "A Wild Sheep Chase",
... }};
Notes:
I have updated the AUTHORS array dynamically, but I'm not able to update the BOOKS array or define its size dynamically.
Please check the studio project I attached here, because we have to update the multi-dimensional array (BOOKS) with the help of the normal array (AUTHORS). (If you run the project you will get a clear view of my question.)

Just go one step further: when your requirement is to deal with a dynamic number of elements, use Java's List interface, or rather one of its implementation classes, for your work.
Arrays in Java are not dynamic. The only thing you can do after an array has been created is change the content of a given array slot. Yes, when you have a two-dimensional array, which is actually an array filled with other arrays, you can "dynamically" put in new arrays ... but again: that is simply inconvenient, cumbersome and error-prone.

It is possible to over-allocate the array to leave room for further expansion. However, this basically gives you a naive implementation of a List, so it is preferable to use a List (or, more accurately, one of its concrete implementations) and avoid the extra work. Besides relieving you of a lot of work, these classes also provide better functionality and are more stable.
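One way to follow this advice for the AUTHORS/BOOKS case is sketched below. The Library class and its method names are invented for illustration; it assumes the grid code still wants plain arrays at the end, so the collections are converted back only when needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Library {
    // LinkedHashMap keeps authors in the order they were first added.
    private final Map<String, List<String>> booksByAuthor = new LinkedHashMap<>();

    // Adds a title, creating the author's list on first use.
    void addBook(String author, String title) {
        booksByAuthor.computeIfAbsent(author, a -> new ArrayList<>()).add(title);
    }

    // Snapshot of the authors as a plain array.
    String[] authors() {
        return booksByAuthor.keySet().toArray(new String[0]);
    }

    // Snapshot of the books as a jagged array, row i belonging to authors()[i].
    String[][] books() {
        String[][] result = new String[booksByAuthor.size()][];
        int i = 0;
        for (List<String> titles : booksByAuthor.values()) {
            result[i++] = titles.toArray(new String[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        Library library = new Library();
        library.addBook("David Mitchell", "Ghostwritten");
        library.addBook("David Mitchell", "Cloud Atlas");
        library.addBook("Haruki Murakami", "A Wild Sheep Chase");
        System.out.println(library.authors().length + " authors, "
                + library.books()[0].length + " books for the first one");
    }
}
```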

Perform arithmetic on number array without iterating

For example, I would like to do something like the following in java:
int[] numbers = {1,2,3,4,5};
int[] result = numbers*2;
//result now equals {2,4,6,8,10};
Is this possible to do without iterating through the array? Would I need to use a different data type, such as ArrayList? The current iterating step is taking up some time, and I'm hoping something like this would help.

No, you can't multiply each item in an array without iterating through the entire array. As pointed out in the comments, even if you could use the * operator in such a way the implementation would still have to touch each item in the array.
Further, a different data type would have to do the same thing.

I think a different answer from the obvious may be beneficial to others who have the same problem and don't mind a layer of complexity (or two).
In Haskell, there is something known as "Lazy Evaluation", where you could do something like multiply an infinitely large array by two, and Haskell would "do" that. When you accessed the array, it would try to evaluate everything as needed. In Java, we have no such luxury, but we can emulate this behavior in a controllable manner.
You will need to create or extend your own List class and add some new functions. You would need functions for each mathematical operation you wanted to support. I have examples below.
LazyList ll = new LazyList();
// Add a couple million objects
ll.multiplyList(2);
The internal implementation of this would be to create a Queue that stores all the primitive operations you need to perform, so that order of operations is preserved. Now, each time an element is read, you perform all operations in the Queue before returning the result. This means that reads are very slow (depending on the number of operations performed), but we at least get the desired result.
If you find yourself iterating through the whole array each time, it may be useful to de-queue at the end instead of preserving the original values.
If you find that you are making random accesses, I would preserve the original values and return modified results when called.
If you need to update entries, you will need to decide what that means. Are you replacing a value there, or are you replacing a value after the operations were performed? Depending on your answer, you may need to run backwards through the queue to get a "pre-operations" value to replace an older value. The reasoning is that on the next read of that same object, the operations would be applied again and then the value would be restored to what you intended to replace in the list.
There may be other nuances with this solution, and again the way you implement it would be entirely different depending on your needs and how you access this (sequentially or randomly), but it should be a good start.
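A minimal sketch of that idea is shown below. It is read-only on purpose (updates raise the questions discussed above), it works on int values only, and the class name LazyIntList and the addToList method are made up for illustration; multiplyList mirrors the call in the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

class LazyIntList {
    private final int[] original;                                     // untouched source values
    private final List<IntUnaryOperator> pending = new ArrayList<>(); // operations, in call order

    LazyIntList(int... values) {
        this.original = values.clone();
    }

    // O(1): only records the operation, no element is touched yet.
    void multiplyList(int factor) {
        pending.add(x -> x * factor);
    }

    void addToList(int amount) {
        pending.add(x -> x + amount);
    }

    // Applies every recorded operation to one element at read time.
    int get(int index) {
        int value = original[index];
        for (IntUnaryOperator op : pending) {
            value = op.applyAsInt(value);
        }
        return value;
    }

    int size() {
        return original.length;
    }

    public static void main(String[] args) {
        LazyIntList ll = new LazyIntList(1, 2, 3, 4, 5);
        ll.multiplyList(2);
        ll.addToList(1);
        System.out.println(ll.get(4)); // computed on demand as (5 * 2) + 1
    }
}
```

Each read costs time proportional to the number of queued operations, which is the trade-off described above.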

With the introduction of Java 8 this task can be done using streams.
private long tileSize(int[] sizes) {
    return IntStream.of(sizes).reduce(1, (x, y) -> x * y);
}
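The snippet above reduces the array to a single product. For the element-wise case in the question (multiplying every number by 2), a stream version could look like the sketch below; the class and method names are made up, and the stream still visits every element internally.

```java
import java.util.Arrays;

class DoubleAll {
    // Returns a new array with every element multiplied by two.
    static int[] timesTwo(int[] numbers) {
        return Arrays.stream(numbers).map(n -> n * 2).toArray();
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(timesTwo(numbers)));
    }
}
```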

No, it isn't. If your collection is really big and you want to do it faster, you can try to operate on the elements in two or more threads, but you have to take care of synchronization (use a synchronized collection) or divide your collection into two (or more) collections and operate on one collection in each thread. I'm not sure whether it will be faster than just iterating through the array; it depends on the size of your collection and on what you want to do with each element. If you want to use this solution you will have to measure whether it is faster in your case: it might be slower, and it will definitely be much more complicated.
Generally, if it's not a critical part of the code and the execution time isn't too long, I would leave it as it is now.
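If you do want to try the multi-threaded route without managing threads by hand, one option (a different technique from splitting the collection yourself) is Arrays.parallelSetAll from Java 8, sketched below. Each index is written by exactly one task, so no synchronized collection is needed. The class and method names are illustrative, and whether this beats a plain loop has to be measured.

```java
import java.util.Arrays;

class ParallelDouble {
    // Fills result[i] = numbers[i] * 2, letting the runtime split the index range across threads.
    static int[] timesTwo(int[] numbers) {
        int[] result = new int[numbers.length];
        Arrays.parallelSetAll(result, i -> numbers[i] * 2);
        return result;
    }

    public static void main(String[] args) {
        int[] numbers = new int[1_000_000];
        Arrays.setAll(numbers, i -> i);
        int[] doubled = timesTwo(numbers);
        System.out.println(doubled[doubled.length - 1]);
    }
}
```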

Best way to store a table of data

I have a table of data (the number of columns can vary in length on different rows). I also need to be able to delete or add new rows of data.
What is the best way to store this data?
My first guess would be an ArrayList.

Two approaches:
1. Convert everything to strings and use an ArrayList<List<String>> where each entry is a row represented by an ArrayList<String>.
Advantage: Don't need to create your own class to represent a "row".
Disadvantage: Need to convert data, can't do mathematical operations without converting data back, need to make sure all rows are the same length.
2. As dystroy said, create a class representing a Row in the table, and use an ArrayList<Row>.
Advantage: entries keep their actual types, rows don't have variable lengths (unless you want them to), and you can have meaningful ways to access columns (e.g. row.getDate() instead of row.get(3)).
Disadvantage: might be more work to create the additional class.
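A small sketch of the second approach follows. The Row columns (date, label, amount) are invented for the example, since the question does not say what the table holds.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class TableDemo {
    // Hypothetical row type; each column keeps its real type.
    static class Row {
        private final LocalDate date;
        private final String label;
        private final double amount;

        Row(LocalDate date, String label, double amount) {
            this.date = date;
            this.label = label;
            this.amount = amount;
        }

        LocalDate getDate() { return date; }
        String getLabel() { return label; }
        double getAmount() { return amount; }
    }

    static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        table.add(new Row(LocalDate.of(2020, 1, 15), "rent", 950.0));
        table.add(new Row(LocalDate.of(2020, 2, 1), "books", 42.5));
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        table.add(new Row(LocalDate.of(2020, 3, 9), "travel", 120.0)); // add a row
        table.remove(0);                                               // delete the first row
        for (Row row : table) {
            System.out.println(row.getDate() + " " + row.getLabel() + " " + row.getAmount());
        }
    }
}
```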

I'd choose LinkedList, especially if you expect your list to work as a stack.
The main drawback of ArrayList is that it allocates a larger backing array when capacity is reached, and the allocation and copy cost performance.
With LinkedList, on the other hand, there is no concept of capacity since everything works by pointers.
In my opinion, the main (and in most cases probably the only) reason to prefer ArrayList over LinkedList is when you mainly want to read a particular index. With ArrayList that is O(1), whereas with LinkedList it is O(n).
You can read this post for more information :
When to use LinkedList over ArrayList?

How do I create an array of strings without specifying its length in the beginning?

I want to create an array of strings, but I do not know its length in the beginning. The array length depends on many factors and is only decided when I fill strings/words into it. However, Processing does not allow me to do that; it asks me to specify the length in the beginning. How can I get around this? Thanks for all help. Any suggestion will be appreciated.
Amrita

List<String> strs = new ArrayList<String>();
strs.add("String 1");
strs.add("String 2");
strs.add("String 3");
System.out.println(strs.size()); //3
System.out.println(strs.get(1)); //String 2
Something like that is all you need! You don't need to worry about resizing, copying stuff in memory or whatever - the list will just expand as it needs to. All of the performance details are taken care of and unless you're really interested in how it works, you don't need to read about those details to use it.

You can use ArrayList: http://processing.org/reference/ArrayList.html

I would start by using ArrayList and resizing it when necessary. Java pre-allocates memory for ArrayList so that not every resize means that the contents are copied in memory. Access to ArrayList is faster than to LinkedList (it's O(1) instead of O(n)). Only if you find that the resizing of the ArrayList takes too much time, would I think of switching to LinkedList.
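If a real String[] is still needed at the end, one common pattern is to collect into an ArrayList first and convert once the final size is known. A sketch, with a made-up WordCollector class that splits text on whitespace:

```java
import java.util.ArrayList;
import java.util.List;

class WordCollector {
    // Collects words into a growable list, then converts to an array of exactly the right length.
    static String[] collect(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) { // split() yields an empty first token for leading whitespace
                words.add(token);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = collect("  the length is only known at the end ");
        System.out.println(words.length + " words");
    }
}
```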

Use the typed ArrayList as @berry120 suggests (otherwise, you'll need to cast from Object to String all the time).
Also, if it helps, Processing has some functions for handling Arrays (like append() and expand()). Look under Array Functions in the Processing reference.
Behind the scenes the above mentioned Array Functions use System.arraycopy(), if that's of any use.
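To show what such an array function amounts to, here is a hand-written append built on System.arraycopy. It is an illustrative stand-in, not Processing's actual implementation.

```java
class ArrayAppend {
    // Returns a new array one element longer than the input, with value in the last slot.
    static String[] append(String[] array, String value) {
        String[] bigger = new String[array.length + 1];
        System.arraycopy(array, 0, bigger, 0, array.length); // copy the old contents
        bigger[array.length] = value;
        return bigger;
    }

    public static void main(String[] args) {
        String[] words = new String[0];
        words = append(words, "String 1");
        words = append(words, "String 2");
        System.out.println(words.length);
    }
}
```

Copying the whole array on every append is what makes this slower than an ArrayList, which grows its backing array in larger steps.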

You could use a LinkedList: it grows one node at a time, so there is no set limit and its contents are never copied when you add to it. (Unlike ArrayList, it has no initial-capacity constructor.) An ArrayList copies its contents when you exceed the current capacity, but because the capacity grows geometrically those copies are infrequent, so measure before assuming a LinkedList is more efficient.

Are there reasons to prefer Arrays over ArrayLists? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Benefits of arrays
Hey there,
are there any reasons to prefer Arrays (MyObject[]) over ArrayLists (List<MyObject>)? The only remaining place where I use Arrays is for primitive data types (int, boolean, etc.). However, I have no reasonable explanation for this; it just makes the code a little bit slimmer.
In general I use List in order to maintain a better flexibility. But are there reasons left to use real Arrays?
I would like to know,
best regards

I prefer to use Arrays over ArrayLists whenever I know I am only going to work with a fixed number of elements. My reasons are mostly subjective, but I'm listing them here anyway:
Using Collection classes for primitives is appreciably slower since they have to use autoboxing and wrappers.
I prefer the more straightforward [] syntax for accessing elements over ArrayList's get(). This really becomes more important when I need multidimensional arrays.
ArrayLists usually allocate more memory than you need now (OpenJDK grows the backing array by about 50% each time it fills up) so that you can append items very fast. So there is wastage if you are never going to add any more items.
(Possibly related to the previous point) I think ArrayList accesses are slower than plain arrays in general. The ArrayList implementation uses an underlying array, but all accesses have to go through the get(), set(), remove(), etc. methods which means it goes through more code than a simple array access. But I have not actually tested the difference so I may be wrong.
Having said that, I think the choice actually depends on what you need it for. If you need a fixed number of elements or if you are going to use multiple dimensions, I would suggest a plain array. But if you need a simple random access list and are going to be making a lot of inserts and removals to it, it just makes a lot more sense to use an ArrayList.

Generally arrays have their problems, e.g. type safety:
Integer[] ints = new Integer[10];
Number[] nums = ints; //that shouldn't be allowed
nums[3] = Double.valueOf(3.14); //Ouch! Compiles, but throws ArrayStoreException at runtime
They don't play well with collections, either. So generally you should prefer Collections over arrays. There are just a few things where arrays may be more convenient. As you already say, primitive types would be a reason (although you could consider using collection-like libs like Trove). If the array is hidden in an object and doesn't need to change its size, it's OK to use arrays, especially if you need all the performance you can get (say 3D and 4D vectors and matrices for 3D graphics). Another reason for using arrays may be if your API has lots of varargs methods.
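The type-safety point can be made concrete with a short sketch (class and method names invented): the array assignment compiles and only fails at runtime, while the equivalent generic assignment is rejected by the compiler.

```java
import java.util.ArrayList;
import java.util.List;

class CovarianceDemo {
    // Returns true if storing a Double through a Number[] view of an Integer[] fails at runtime.
    static boolean arrayStoreFails() {
        Integer[] ints = new Integer[10];
        Number[] nums = ints;               // legal: arrays are covariant
        try {
            nums[3] = Double.valueOf(3.14); // compiles, but the array is really an Integer[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayStoreException thrown: " + arrayStoreFails());

        List<Integer> intList = new ArrayList<>();
        // List<Number> numList = intList;  // does not compile: generic types are invariant
        intList.add(3);
    }
}
```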
BTW: There is a cute trick using an array if you need mutable variables for anonymous classes:
public void f() {
    final int[] a = new int[1];
    new Thread(new Runnable() {
        public void run() {
            while (true) {
                System.out.println(a[0]++);
            }
        }
    }).start();
}
Note that you can't do this with an int variable, as it must be final.
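An alternative to the one-element array, not mentioned in the answer, is java.util.concurrent.atomic.AtomicInteger: the reference stays final while the value changes, and the increment is also safe across threads. A sketch with a bounded loop so that it terminates (the class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    // Increments a shared counter from a worker thread and returns the final value.
    static int runCounter(final int increments) {
        final AtomicInteger counter = new AtomicInteger(); // final reference, mutable value
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < increments; i++) {
                    counter.incrementAndGet();
                }
            }
        });
        worker.start();
        try {
            worker.join(); // wait so the result is complete and visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runCounter(1000));
    }
}
```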

I think the main difference between an array and a list is that an array has a fixed length. Once it's full, it's full. ArrayLists have a flexible length and are implemented on top of arrays. When the ArrayList runs out of capacity, the data gets copied to another array with a larger capacity (that's what I was taught once).
An array can still be used if your data length is fixed. Because arrays are pretty primitive, they don't have many methods to call. The advantage of using arrays is not so big anymore, because ArrayLists are good wrappers for what you want in Java or any other language.
I think you can even give an ArrayList an initial capacity nowadays, so even that advantage collapses.
So is there any reason to prefer them? Probably not, but an array does leave you a little more space in memory because of its basic functionality. The ArrayList is a pretty big wrapper with a lot of flexibility, which you do not always want.

One for you:
Sorting a List (via j.u.Collections) first copies it to an array, sorts the array (which clones the array once again for the merge sort), and then writes the result back to the List.
You do understand that ArrayList has a backing Object[] under the covers.
Back in the day there was a case where ArrayList.get was not inlined by the -client HotSpot compiler, but I think that's fixed now. Thus the performance penalty of ArrayList compared to Object[] is not that severe: the cast to the appropriate type still costs a few clocks (but it should be predicted by the CPU 99.99% of the time), and accessing the elements of the ArrayList may cost one more cache miss or so (mostly on the first access).
So it does depend on what you do with your code in the end.
Edit
I forgot that you can have atomic access to the elements of an array (i.e. CAS stuff); one implementation is j.u.c.atomic.AtomicReferenceArray. It's not the most practical one since it doesn't allow CAS on an Object[][], but Unsafe comes to the rescue.
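For reference, a compare-and-set on an array slot with AtomicReferenceArray looks roughly like this. The claim helper and the "slot owner" scenario are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class CasDemo {
    // Atomically claims an empty slot; returns false if another owner got there first.
    static boolean claim(AtomicReferenceArray<String> slots, int index, String owner) {
        return slots.compareAndSet(index, null, owner); // succeeds only if the slot is still null
    }

    public static void main(String[] args) {
        AtomicReferenceArray<String> slots = new AtomicReferenceArray<>(4);
        System.out.println(claim(slots, 0, "worker-1"));
        System.out.println(claim(slots, 0, "worker-2"));
        System.out.println(slots.get(0));
    }
}
```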
