Lightweight Java solution for creating views on large data

I need to create a Java (J2EE) application that allows people to generate "views" on large CSV/TSV tabular data. Views might include things like: pagination through the data, sorting, filtering, pivoting and perhaps charting.
My current thinking is to load the data into temporary tables in a database, use SQL to perform the view tasks and then discard the tables.
Can someone recommend a better approach for this that is also fast?
My constraints are:
This is a real-time transaction, so Hadoop/Hive is not an option
Fast response times are important
I would like to be able to do this in a stateless way where individual requests describe the view (but not at the cost of performance)
I would like to not have to hand-code view generation, hence the preference for SQL databases.

Answering my own question: I found that HSQLDB does exactly what I need. It looks like Text Tables in HSQLDB are what I would use to create views the way I need.
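For reference, here is a minimal sketch of the Text Table approach, assuming an HSQLDB file-based catalog (Text Tables are not available in purely in-memory databases) and a comma-separated file called data.csv with a header row; the table and column names are made up:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CsvViewDemo {
        public static void main(String[] args) throws Exception {
            // File-based catalog; Text Tables need a file-backed database.
            Connection conn = DriverManager.getConnection("jdbc:hsqldb:file:viewdb", "SA", "");
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TEXT TABLE uploads "
                         + "(id INTEGER, category VARCHAR(100), amount DOUBLE)");
                // Attach the CSV file as the table's backing source
                // (comma separator, skip the header row).
                st.execute("SET TABLE uploads SOURCE 'data.csv;fs=,;ignore_first=true'");

                // Sorting, filtering and pivot-style aggregation are now plain SQL.
                ResultSet rs = st.executeQuery(
                    "SELECT category, SUM(amount) FROM uploads "
                  + "GROUP BY category ORDER BY 2 DESC");
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
                }
                st.execute("DROP TABLE uploads");  // discard the view when done
            } finally {
                conn.close();
            }
        }
    }

Each request can create the text table, run whatever SELECT describes the requested view, and drop the table again, which keeps individual requests stateless.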

Related

How to cache objects for the display tag in JSP/JSTL

I am using displaytag for pagination.
The DB has millions of records, and moving from one page to the next takes quite a long time.
Is there a way we can cache the objects that need to be shown, so that moving between pages is faster?
Requirement: we are querying and displaying the number of files in a directory under a Linux environment; each folder has thousands of files.
How are you reading from the DB? It would be good to see more of your implementation.
As a general guideline:
If you read all your data into a list from the DB and only display a page, you will be wasting resources (processing and memory). This can kill your app. Try an approach that only fetches the page you need.
If you are using a framework like Hibernate, you can implement caching and paging without much trouble.
If you are using direct JDBC, you will have to limit the rows returned by your query. The proper technique depends on the database engine you're using, so please provide that information.
Be aware that your problem might be the amount of information read rather than a caching problem (it just depends on the implementation).
As a sample, in Oracle you would need to know the page and the page size. With both, you could limit the query with something like "WHERE ROWNUM < pagesize * page" (or something similar, depending on how you index), and navigate to the first row you need with the absolute(int) method of ResultSet. On other engines there may be more efficient options.
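For illustration, a rough sketch of that idiom with plain JDBC against Oracle, using the nested ROWNUM query; the table and column names are invented:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PageDao {
        // Fetch only the rows for one page (page numbers start at 1).
        public ResultSet fetchPage(Connection conn, int page, int pageSize) throws Exception {
            String sql =
                "SELECT * FROM ("
              + "  SELECT t.*, ROWNUM rn FROM ("
              + "    SELECT id, name FROM records ORDER BY id"
              + "  ) t WHERE ROWNUM <= ?"
              + ") WHERE rn > ?";
            PreparedStatement ps = conn.prepareStatement(sql);
            ps.setInt(1, page * pageSize);        // last row of the requested page
            ps.setInt(2, (page - 1) * pageSize);  // rows already shown on earlier pages
            return ps.executeQuery();             // caller iterates and closes
        }
    }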
Now, if you're paginating with some framework, normally they support some implementation of a "DataProvider" so you can control how to fetch results for each page.

Justification of the need for an in-memory database

My use case is as follows --
I have a database table with around 1000+ entries; the table is updated/edited infrequently, but I expect that to change in the future. Some of the columns contain strings of considerable length.
Now I am in the process of writing a UI application that will have some mouseover events displaying text derived from the aforementioned database table.
For my use case, I have decided to write a backend 'server' hosting an in-memory database that holds all the data from the aforementioned table. On startup, the UI app will cache the required data from the in-memory database hosted by the backend server.
Does my use case justify using an in-memory database ? If not, what are the alternatives I should consider ?
EDIT 1 --
My use case also involves running multiple searches of varying complexity on the database very frequently.
Seems like an excellent use-case for an in-memory database. Writing it yourself, on the other hand, is probably not the way to go.
There are plenty of existing options for just about any imaginable scenario: http://en.wikipedia.org/wiki/In-memory_database
If you're doing complex searches on text data, Lucene is quite excellent. It has special in-memory storage backends, but really, it doesn't matter for such a tiny dataset - it will always be quickly cached anyway.
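As a hedged sketch of what an in-memory Lucene index can look like (API roughly as of Lucene 5.x-7.x; RAMDirectory was later deprecated in favour of ByteBuffersDirectory), with made-up field names:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.RAMDirectory;

    public class TooltipIndex {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Index one row's worth of tooltip text entirely in memory.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new StringField("id", "42", Field.Store.YES));
                doc.add(new TextField("text", "Tooltip text pulled from the DB table", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Search it back.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                ScoreDoc[] hits = searcher.search(
                        new QueryParser("text", analyzer).parse("tooltip"), 10).scoreDocs;
                for (ScoreDoc hit : hits) {
                    System.out.println(searcher.doc(hit.doc).get("text"));
                }
            }
        }
    }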

Options when storing all data in memory doesn't scale

I've written a Java application that users install on their desktops. It crawls websites, storing the data about each page in a LinkedList. The application allows users to view all the crawled pages in a JTable.
This works great for small sites but doesn't scale very well. Currently, users have to allocate more memory (via the -Xmx flag when starting Java) for larger crawls.
My current thinking is to move to storing all the data in a database, possibly using something like HSQLDB.
Are there any other approaches I should be considering?
A relational DB is not a good place to store web page data. Could you save the pages on disk? If you want to search the crawl results, try the Apache Lucene search engine. Loading all the results into memory at once is not reasonable. You can paginate the JTable model, and use soft references to cache some results while paginating.
A relational database is probably the right approach for this case. Reasons:
It'll enable you to handle larger-than-memory crawls.
If you keep the link data in separate tables from the considerably larger volumes of page data, you may still be able to fit all your links in memory, which will be pretty important from a performance and searching perspective
It will give you an easy way of persisting crawled data (in case this is needed in the future)
It's pretty well known / standard technology
There are good open source database implementations available (H2 or JavaDB would probably be my first choices, as they are embeddable and written in pure Java; see the sketch after this list)
The relational features could turn out to be useful, for example queries on link data
It doesn't sound like you have the data volumes or availability requirements that might push you towards a NoSQL-type solution
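As a sketch of the embedded-database idea, assuming H2 is on the classpath; the schema is purely illustrative, not a recommended crawl model:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class CrawlStore {
        private final Connection conn;

        public CrawlStore(String dbPath) throws Exception {
            // e.g. new CrawlStore("./crawl") creates a database file next to the app
            conn = DriverManager.getConnection("jdbc:h2:" + dbPath);
            try (Statement st = conn.createStatement()) {
                // Small link metadata and bulky page bodies in one table here;
                // splitting them into two tables is the next step.
                st.execute("CREATE TABLE IF NOT EXISTS pages "
                         + "(url VARCHAR(2048) PRIMARY KEY, title VARCHAR(512), body CLOB)");
            }
        }

        public void save(String url, String title, String body) throws Exception {
            // MERGE = insert or update by primary key, so re-crawled pages overwrite.
            try (PreparedStatement ps = conn.prepareStatement(
                    "MERGE INTO pages (url, title, body) VALUES (?, ?, ?)")) {
                ps.setString(1, url);
                ps.setString(2, title);
                ps.setString(3, body);
                ps.executeUpdate();
            }
        }
    }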
You have basically 4 options:
Store the data in flat files
Store the data in a database
Somehow transmit the data to "the cloud" (I have no idea how)
Somehow "pare" the data down to the essentials, knowing that you can re-extract the full info when needed
You can also do a variant of 4 to gain some space: rather than a "rich" object structure, compress each distinct datum into a single String or byte[] that you keep in an array or ArrayList rather than a LinkedList. This can reduce your storage requirements by 2x or more. It's less "object oriented", but sometimes reality intervenes.
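A small sketch of that variant, keeping each page body as a GZIP-compressed byte[] instead of a rich object (class and method names are made up):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class PageCompressor {
        // Shrink a page's HTML before keeping it in the in-memory list.
        static byte[] compress(String html) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(html.getBytes("UTF-8"));
            }
            return bos.toByteArray();
        }

        // Re-expand only when the row is actually displayed.
        static String decompress(byte[] data) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = gz.read(buf)) != -1) {
                    bos.write(buf, 0, n);
                }
            }
            return bos.toString("UTF-8");
        }
    }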
Try storing the page data in db4o (http://community.versant.com), an object database. Object databases handle complex objects (e.g. with lots of siblings) better than relational databases.

High scores table

I am looking to add a (local, not online) high scores table to my Android app and I wanted to get some insight on the best way to approach the problem.
I have a list of users (right now being saved out to a file and read back in as an array of User objects), and the high scores need to reference this data to populate the table with the user's name and photo, etc.
For the display, I think a TableLayout would probably be my best option. I can make columns for picture, name, score, etc.
For the information itself, I'm thinking maybe a SQLite table would be the best way to organize my data? If so, then it may be worthwhile to move my user data to a SQLite table as well so I can ensure the data is cross-referenced properly.
Does anybody here have any helpful information or opinions on this? Thanks.
UPDATE: I went with the SQLite database (using two tables) and it works great! Wasn't too hard to learn and get working either. For the layout, it turns out a ListView with a custom adapter was the best way to accomplish what I wanted (more flexible than a TableLayout).
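For anyone following the same route, a minimal sketch of a two-table SQLiteOpenHelper; the schema is illustrative rather than the exact one used above:

    import android.content.Context;
    import android.database.sqlite.SQLiteDatabase;
    import android.database.sqlite.SQLiteOpenHelper;

    public class HighScoreDbHelper extends SQLiteOpenHelper {
        public HighScoreDbHelper(Context context) {
            super(context, "highscores.db", null, 1);
        }

        @Override
        public void onCreate(SQLiteDatabase db) {
            // Users in one table, scores in another, cross-referenced by user_id.
            db.execSQL("CREATE TABLE users (_id INTEGER PRIMARY KEY, name TEXT, photo BLOB)");
            db.execSQL("CREATE TABLE scores (_id INTEGER PRIMARY KEY, user_id INTEGER, "
                     + "score INTEGER, recorded_at TEXT, "
                     + "FOREIGN KEY(user_id) REFERENCES users(_id))");
        }

        @Override
        public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
            // Simplest possible migration: rebuild from scratch.
            db.execSQL("DROP TABLE IF EXISTS scores");
            db.execSQL("DROP TABLE IF EXISTS users");
            onCreate(db);
        }
    }

A JOIN between the two tables can then feed the ListView's custom adapter one row per score.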
I haven't done a lot with Android, but I believe you are on the right path. Using SQLite will not only keep your data structured and organized, it will also let your data set grow, and you can use a standard DAO pattern to go from the SQL database to objects. I would recommend pairing SQLite with MyBatis, an object-relational mapping utility for Java that lets you write SQL and get your results back as Java objects.
I personally think it would be simpler to use a SQLite table for this. From what I can tell you only really need one table, so it might be a bit of overkill, but to me it seems much simpler than dealing with file I/O.
Also, with a database you can more readily add extra information, such as the date the score was recorded.
For something as dirt simple and non-critical as a high scores list, a database feels a little like overkill to me. I'd be inclined to use a CSV file myself.
Of course, take that with a suitable grain of salt, as I don't develop on Android. If the overhead for SQLite is trivial (or if it's already on Android waiting for you to use it!), may as well use it for your data storage needs, if only to keep your code consistent.
ScoreNinja, the open-source score library for Android, might save you some time, or give you some ideas.

What's the most efficient way to load data from a file to a collection on-demand?

I'm working on a Java project that will allow users to parse multiple files, each with potentially thousands of lines. The parsed information will be stored in different objects, which will then be added to a collection.
Since the GUI won't require loading ALL these objects at once and keeping them in memory, I'm looking for an efficient way to load/unload data from files, so that data is only loaded into the collection when a user requests it.
I'm just evaluating options right now. I've also thought about the case where, after loading a subset of the data into the collection and presenting it in the GUI, I need to reload previously viewed data. Should I re-run the parser and repopulate the collection and the GUI, or find a way to keep the collection in memory, or serialize/deserialize the collection itself?
I know that loading/unloading subsets of data can get tricky if some sort of data filtering is performed. Let's say I filter on ID, so my new subset will contain data from two previously analyzed subsets. This would be no problem if I kept a master copy of the whole data in memory.
I've read that google-collections is good and efficient when handling large amounts of data, and offers methods that simplify lots of things, so it might be an alternative that lets me keep the collection in memory. This is just general talk; the question of which collection to use is a separate and complex matter.
Do you know what's the general recommendation on this type of task? I'd like to hear what you've done with similar scenarios.
I can provide more specifics if needed.
You can embed a database into the application, like HSQLDB. That way you parse the files the first time and then use SQL to do simple and complex queries (a minimal sketch follows the quoted description below).
HSQLDB (HyperSQL DataBase) is the leading SQL relational database engine written in Java. It has a JDBC driver and supports nearly full ANSI-92 SQL (BNF tree format) plus many SQL:2008 enhancements. It offers a small, fast database engine which offers in-memory and disk-based tables and supports embedded and server modes. Additionally, it includes tools such as a command line SQL tool and GUI query tools.
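A minimal sketch of the embed-HSQLDB approach mentioned above, assuming an in-memory catalog and a tab-separated input file; all names are illustrative:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ParsedFileStore {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:parsed", "SA", "");
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE records (id VARCHAR(64), payload VARCHAR(4096))");
            }

            // One pass over the file; each line becomes a row instead of a long-lived object.
            try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
                 PreparedStatement ps = conn.prepareStatement("INSERT INTO records VALUES (?, ?)")) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed format: id<TAB>payload
                    ps.setString(1, parts[0]);
                    ps.setString(2, parts.length > 1 ? parts[1] : "");
                    ps.addBatch();
                }
                ps.executeBatch();
            }

            // Later requests (e.g. the filter-by-ID case mentioned above) are plain SQL.
            PreparedStatement query = conn.prepareStatement("SELECT payload FROM records WHERE id = ?");
            query.setString(1, "some-id");
            try (ResultSet rs = query.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }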
If you have tons of data, lots of files, and you are short on memory, you can do an initial scan of each file to index it. If the file is divided into records by line feeds, and you know how to read a record, you can index your records by byte location. Later, when you want to read a certain set of indices, you do a fast lookup to find which byte ranges you need and read only those from the file. When you don't need those items anymore, they will be garbage-collected. You will never hold more items than you need on the heap.
This would be a simple solution. I'm sure you can find a library to provide you with more features.
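A rough sketch of that byte-offset index with RandomAccessFile (assumes newline-terminated records and an 8-bit-compatible encoding; names are made up):

    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.List;

    public class LineIndex {
        private final List<Long> offsets = new ArrayList<Long>();
        private final RandomAccessFile file;

        // One scan records where each line starts; nothing else is kept on the heap.
        public LineIndex(String path) throws Exception {
            file = new RandomAccessFile(path, "r");
            offsets.add(0L);
            while (file.readLine() != null) {
                offsets.add(file.getFilePointer());
            }
            offsets.remove(offsets.size() - 1);   // last entry points past end of file
        }

        // Read only the requested record, seeking straight to its byte offset.
        public String record(int index) throws Exception {
            file.seek(offsets.get(index));
            return file.readLine();
        }
    }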
