Store/Retrieve Data from Database in Java

What's the best way to store data retrieved from a database in Java for processing?
Here's some context:
Every month, data from Excel files is stored in the database, and our web application (plain JSP/Servlet) does some processing on that data. Currently we use an ArrayList of HashMaps to store the data retrieved from the tables, but it seems very clunky. Is there a better way or data structure to do this?
It's not really possible to create some sort of Model class for each of these because there's no logical "User" object or anything like that. It's basically chunks of random data that need to be processed. A stored procedure isn't an answer either, because the processing logic is quite complicated.

Try using Java APIs for faster processing:
Apache POI
Java Excel API (JXL)
A sample tutorial using JXL: Link
If your Excel files are in CSV format, use openCSV.

It's not really possible to create some sort of Model class for each of these because there's no logical "User" object or anything like that. It's basically chunks of random data that need to be processed. A stored procedure isn't an answer either, because the processing logic is quite complicated.
Then there is not really a better way than using a List<Map<String, Object>>. To soften the pain, you could hide the Map<String, Object> by extending it into another class, e.g. Row, with convenience methods that cast the values (.getAsDouble(columnName), or even T get(columnName, Class<T> type), etc.) so that traversing and manipulating the data is less scary.
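For illustration, a minimal sketch of such a Row class (the name and helper methods are just one possible shape, not an established API):

import java.util.HashMap;

// A Map<String, Object> with typed convenience getters for one result row.
public class Row extends HashMap<String, Object> {

    // Returns the column as a Double, or null if absent or not numeric.
    public Double getAsDouble(String columnName) {
        Object value = get(columnName);
        return (value instanceof Number) ? ((Number) value).doubleValue() : null;
    }

    // Casts the column value to the requested type, failing fast on a mismatch.
    public <T> T get(String columnName, Class<T> type) {
        return type.cast(get(columnName));
    }
}

Your DAO could then return a List<Row> instead of a List<Map<String, Object>>, so the processing code reads like row.getAsDouble("amount") rather than a chain of casts.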

Related

Single data column vs multiple columns in Cassandra

I'm working on a project with an existing Cassandra database.
The schema looks like this:
partition key (bigint) | clustering key1 (timestamp) | data (text)
1 | 2021-03-10 11:54:00.000 | {a:"somedata", b:2, ...}
My question is: is there any advantage to storing the data in a JSON string?
Will it save some space?
So far I have discovered only disadvantages:
You cannot (easily) add/drop columns at runtime, since the application could overwrite the JSON string column.
Parsing the JSON string is currently the bottleneck regarding performance.
No, there is no real advantage to storing JSON as a string in Cassandra unless the underlying data in the JSON is really schema-less. It will also not save space; in fact it will use more, because each item has to store a key+value instead of just the value.
If you can, I would recommend mapping the keys to CQL columns so you can store the values natively and accessing the data is more flexible. Cheers!
Erick is spot-on-correct with his answer.
The only thing I'd add, would be that storing JSON blobs in a single column makes updates (even more) problematic. If you update a single JSON property, the whole column gets rewritten. Also the original JSON blob is still there...just "obsoleted" until compaction runs. The only time that storing a JSON blob in a single column makes any sense, is if the properties don't change.
And I agree, mapping the keys to CQL columns is a much better option.
I don't disagree with the excellent and already accepted answer by @erick-ramirez.
However, there is often a good case to be made for using frozen UDTs instead of separate columns for related data that is only ever set and retrieved at the same time and will not be specifically filtered as part of your query.
The "frozen" part is important, as it means less work for Cassandra, but it does mean that you rewrite the whole value on each update.
This can give a large performance boost over a large number of separate columns. The nice ScyllaDB people have a great post on that:
If You Care About Performance, Employ User Defined Types
(I know ScyllaDB is not exactly Cassandra, but I've seen multiple articles that say the same thing about Cassandra.)
One downside is that you add work to the application layer and sometimes mapping complex UDTs to your Java types will be interesting.
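As a rough sketch of the idea with the DataStax Java driver (the keyspace, type, and table names below are made up for illustration, and a reachable cluster with an existing keyspace "ks" is assumed):

import com.datastax.oss.driver.api.core.CqlSession;

public class FrozenUdtExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            session.execute(
                "CREATE TYPE IF NOT EXISTS ks.reading (a text, b int)");
            session.execute(
                "CREATE TABLE IF NOT EXISTS ks.events ("
                + "id bigint, ts timestamp, data frozen<ks.reading>, "
                + "PRIMARY KEY (id, ts))");
            // The frozen UDT is written (and rewritten) as a single value:
            session.execute(
                "INSERT INTO ks.events (id, ts, data) "
                + "VALUES (1, toTimestamp(now()), {a: 'somedata', b: 2})");
        }
    }
}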

What is the best approach for saving statistical data in a file using the Spring framework?

What is the best approach for saving statistical data in a file using the Spring framework? Is there any available library that offers reading and updating data in a file, or should I build my own IO code?
I already have a relational database, but I don't like the approach of creating an additional table to save the calculated values across multiple tables with joins, and I don't want to add more complexity to the project by using an additional database like MongoDB for just one task.
To understand the complexity of this report, imagine you are drawing a chart of the total number of daily transactions for a full year, over billions of records at any time, with a lot of extra information (totals and averages in different currencies at different rates).
So my approach was to generate this data in a file on a regular basis, so that later I don't need to generate it again when requested, and only append the new dates, when available, to the file.
Is this approach fine? And what is the best library to do that efficiently?
Update
I found this answer useful for why people sometimes prefer flat files over relational or non-relational databases:
Is it faster to access data from files or a database server?
I would prefer to use MongoDB for such purposes, but if you need a simple approach, you can write your data to a CSV/Excel file.
Just use I/O:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Build semicolon-delimited rows, then write them to my.csv
List<String> data = new ArrayList<>();
data.add("head1;head2;head3");
data.add("a;b;c");
data.add("e;f;g");
data.add("9;h;i");
Files.write(Paths.get("my.csv"), data);
That is all)
How to convert your own object to such a string ('field1;field2') I think you know.
Also you can use Apache POI or a CSV library, but I think this plain I/O approach is much faster.
Files.write(Paths.get("my.csv"), data, StandardOpenOption.APPEND);
If you want to append data to an existing file; there are many other options in java.nio.file.StandardOpenOption.
For reading you can use Files.readAllLines(Paths.get("my.csv")); it will return a list of strings.
You can also read lines in a given range, for example:
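This sketch streams just a slice of the file with Files.lines (the skip/limit values are arbitrary):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Read lines 11-20 only, without loading the whole file into memory at once
try (Stream<String> lines = Files.lines(Paths.get("my.csv"))) {
    List<String> range = lines.skip(10).limit(10).collect(Collectors.toList());
    range.forEach(System.out::println);
}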
But if you need to retrieve a single column, or update two columns based on a condition, and so on, you should read about MongoDB or other non-relational databases. It is difficult to cover MongoDB here; read the documentation.
Enjoy)
I found a library, Jackson data formats, that can be used to read/write CSV files easily and map them to objects as well.
Find an example with Spring.
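A minimal sketch with jackson-dataformat-csv (the Stat bean and file name are hypothetical, just to show the object mapping):

import java.io.File;
import java.util.Arrays;
import java.util.List;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class StatCsvExample {
    // A hypothetical bean representing one row of the statistics file
    public static class Stat {
        public String date;
        public long total;
        public double average;
    }

    public static void main(String[] args) throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(Stat.class).withHeader();
        File file = new File("stats.csv");

        // Write beans out as CSV with a header row
        List<Stat> stats = Arrays.asList(new Stat());
        mapper.writer(schema).writeValue(file, stats);

        // Read the CSV back into Stat objects
        try (MappingIterator<Stat> it =
                 mapper.readerFor(Stat.class).with(schema).readValues(file)) {
            List<Stat> loaded = it.readAll();
            System.out.println(loaded.size());
        }
    }
}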

Get all documents from Couchbase bucket

I am writing a Couchbase DAO using the Java API. I store all documents for one entity in a particular bucket. I wonder, what is the best way to get all documents from this bucket?
Thanks in advance!
First: do you plan to store each entity type in its own bucket? That will probably not work in the long run, unless you plan to never have more than 10 entity types in total. Buckets are not made to organize data like that: they are meant to store a variety of different types of data.
Second: do you really want to get all data from a bucket? That seems like a very uncommon use case. It's almost like asking "how do I query all data from all tables in a relational database"
That being said, I could imagine a very specialized situation where you'd want to do this. So, you could:
Create a PRIMARY index and execute a N1QL query like SELECT * FROM mybucket;
Create a very simple map/reduce view index of the data.
Both of these things can be done with the Java SDK.
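With the Java SDK (3.x), the N1QL route could look roughly like this (the connection details and bucket name are placeholders):

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.query.QueryResult;

public class AllDocsExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("127.0.0.1", "username", "password");
        // One-time setup so the bucket can be queried with N1QL
        cluster.query("CREATE PRIMARY INDEX ON mybucket");
        QueryResult result = cluster.query("SELECT mybucket.* FROM mybucket");
        for (JsonObject row : result.rowsAsObject()) {
            System.out.println(row);
        }
        cluster.disconnect();
    }
}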

Parse Excel Spreadsheet into model in Java

I have an Excel spreadsheet that contains service delivery information for a single client at a time. For example, Max Inc will be provided with health assessments at 3 of their corporate offices. An office may have deliveries of health assessments (service type) on multiple days, performed by different doctors.
I've created what I believe to be JavaBeans to ultimately represent all this information and the relationship between entities such as client, delivery, service, and individual sessions in a delivery.
My problem now is, what is the best way to read in and parse the data from the excel spreadsheet?
I was thinking I could create a static util class (like a factory class) that reads the Excel spreadsheet (using Apache POI HSSF) in one method and then uses a variety of other methods to parse the data, ultimately returning a client object which contains all the other objects, and so on.
At some point during this process I will also need some data from a SQL Server DB, which I thought I would just pull in using JDBC as needed.
Am I heading in the right direction with this approach? Or would recommend doing this another way?
Try simplifying this a little. Save the spreadsheet as a CSV file, which is really easy to import into Java and subsequently into a database.
You could use a BufferedReader to read in the lines and split them at each "," delimiter. Each split gives you a String array whose values you can add to your database, as sketched below.
Definitely try and avoid interfacing with Excel.
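A minimal sketch of that (the file name and column layout are assumptions; note that a naive split breaks if fields can contain quoted commas):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CsvImportExample {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("clients.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                // e.g. fields[0] = office, fields[1] = service type, fields[2] = doctor
                System.out.println(String.join(" | ", fields));
            }
        }
    }
}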
Simply reading cells from an Excel sheet is easy with POI. See: http://poi.apache.org/spreadsheet/quick-guide.html#CellContents
This will avoid any manual steps like converting to another format.
All you really need is a factory method to read in the spreadsheet and return back a list of objects. Then you can do your thing.
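A rough sketch of that factory method with POI (the Delivery bean and the column layout are hypothetical, standing in for your actual JavaBeans):

import java.io.File;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class SpreadsheetFactory {
    // Hypothetical bean; your real model classes would replace this.
    public static class Delivery {
        private String office, serviceType;
        private Date sessionDate;
        public void setOffice(String o) { this.office = o; }
        public void setServiceType(String s) { this.serviceType = s; }
        public void setSessionDate(Date d) { this.sessionDate = d; }
    }

    // Reads the first sheet and maps each data row to a Delivery bean.
    public static List<Delivery> readDeliveries(File excelFile) throws Exception {
        List<Delivery> deliveries = new ArrayList<>();
        try (Workbook workbook = WorkbookFactory.create(excelFile)) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                if (row.getRowNum() == 0) continue; // skip the header row
                Delivery d = new Delivery();
                d.setOffice(row.getCell(0).getStringCellValue());
                d.setServiceType(row.getCell(1).getStringCellValue());
                d.setSessionDate(row.getCell(2).getDateCellValue());
                deliveries.add(d);
            }
        }
        return deliveries;
    }
}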

What's the most efficient way to load data from a file to a collection on-demand?

I'm working on a Java project that will allow users to parse multiple files with potentially thousands of lines. The information parsed will be stored in different objects, which will then be added to a collection.
Since the GUI won't require loading ALL these objects at once and keeping them in memory, I'm looking for an efficient way to load/unload data from files, so that data is only loaded into the collection when a user requests it.
I'm just evaluating options right now. I've also thought about the case where, after loading a subset of the data into the collection and presenting it in the GUI, what the best way would be to reload previously viewed data: re-run the parser and repopulate the collection and the GUI, find a way to keep the collection in memory, or serialize/deserialize the collection itself?
I know that loading/unloading subsets of data can get tricky if some sort of data filtering is performed. Let's say I filter on ID, so my new subset will contain data from two previously analyzed subsets. This would be no problem if I kept a master copy of the whole data in memory.
I've read that google-collections are good and efficient when handling big amounts of data, and offer methods that simplify lots of things, so this might be an alternative that would allow me to keep the collection in memory. This is just general talk; the question of which collection to use is a separate and complex thing.
Do you know what the general recommendation is for this type of task? I'd like to hear what you've done in similar scenarios.
I can provide more specifics if needed.
You can embed a database into the application, like HSQLDB. That way you parse the files the first time and then use SQL to do simple and complex queries.
HSQLDB (HyperSQL DataBase) is the leading SQL relational database engine written in Java. It has a JDBC driver and supports nearly full ANSI-92 SQL (BNF tree format) plus many SQL:2008 enhancements. It offers a small, fast database engine which offers in-memory and disk-based tables and supports embedded and server modes. Additionally, it includes tools such as a command line SQL tool and GUI query tools.
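A minimal sketch of the embedded, in-memory flavor (table and column names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class EmbeddedDbExample {
    public static void main(String[] args) throws SQLException {
        // "jdbc:hsqldb:mem:..." keeps everything in memory; the default user is SA
        try (Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:parsed", "SA", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE records (id INT PRIMARY KEY, line VARCHAR(1024))");
            stmt.execute("INSERT INTO records VALUES (1, 'first parsed line')");
            try (ResultSet rs = stmt.executeQuery("SELECT line FROM records WHERE id = 1")) {
                while (rs.next()) {
                    System.out.println(rs.getString("line"));
                }
            }
        }
    }
}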
If you have tons of data, lots of files, and you are short on memory, you can do an initial scan of each file to index it. If the file is divided into records by line feeds, and you know how to read a record, you can index your records by byte location. Later, when you want to read a certain set of indices, you do a fast lookup to find which byte ranges you need and read those from the file's InputStream. When you don't need those items anymore, they will be garbage-collected. You will never hold more items than you need in the heap.
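A bare-bones sketch of that indexing idea with RandomAccessFile (the file name and record number are arbitrary):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class RecordIndexExample {
    public static void main(String[] args) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile("data.txt", "r")) {
            // Index pass: remember the byte offset where each line starts
            long pos = raf.getFilePointer();
            while (raf.readLine() != null) {
                offsets.add(pos);
                pos = raf.getFilePointer();
            }
            // On demand: jump straight to record 42 (assuming the file has one)
            raf.seek(offsets.get(42));
            System.out.println(raf.readLine());
        }
    }
}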
This would be a simple solution. I'm sure you can find a library to provide you with more features.
