I have a very long SQL statement whose size is more than 150000 bytes. I tried to run it in the Derby editor.
When I try to run it, the editor throws this error:
class file format limit(s) exceeded: method1:e1 code length
(158045 > 65535) in generated class
Is there any way to increase this limit?
PS: I don't want to split the code into several pieces, and I don't want to use a temporary table.
This has been a problem for Derby over the years. See, for example, DERBY-176, DERBY-732, DERBY-766, and DERBY-1714 (there are more like these).
In all the cases that I'm familiar with, it has been possible to rewrite the SQL that the application issues so that it stays under the limit the JVM can handle (this is more a Java limitation than a Derby limitation).
Are you able to share the SQL that generates this problem? Perhaps the community can suggest a better way to frame your SQL that doesn't generate such extensive bytecodes.
Unfortunately, there is no way to increase this limit. See the relevant part of the JVM spec.
I would suggest that rather than hard-coding the SQL into the class, you read it in from a text file, either as a resource on the classpath or from a known location on the file system.
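For example, a minimal sketch of loading the statement text from a classpath resource (the class name and resource name here are just placeholders):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class SqlLoader {
    // "/big-query.sql" is a placeholder resource name on the classpath
    public static String loadSql() throws IOException {
        try (InputStream in = SqlLoader.class.getResourceAsStream("/big-query.sql");
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }
}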
If the Derby editor itself is generating the class, you may not be able to use it with this particular query. It might be wise to rethink the query; I would have thought there is a better way to write it. It can't be very maintainable at that length!
There is no way to increase this limit, so I used SQLite instead of Derby.
It can run as an in-memory database too, and its default limit on statement length is 1,000,000 bytes. You can also raise that limit to 1,000,000,000 bytes at compile time.
We have a requirement to incorporate an Excel-based tool into a Java web application. This Excel tool has a set of master data and a couple of result outputs produced by formula calculations on the master data.
The master data can be captured in relational database tables. We are looking for the best way to capture, validate and evaluate the formulas.
So far we have looked at using the Nashorn scripting engine and providing formula support through eval. We would like to know how others have approached this.
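Roughly what we have prototyped so far with Nashorn looks like the sketch below (the formula text and variable names are purely illustrative):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class FormulaEvalPrototype {
    public static void main(String[] args) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");

        // bind master-data values as script variables
        engine.put("unitPrice", 12.5);
        engine.put("quantity", 40);

        // the formula text would come from the Excel sheet / user input
        Object result = engine.eval("unitPrice * quantity * 1.18");
        System.out.println(result); // 590.0
    }
}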
I've searched and found two possible libraries that could be useful for you; please have a look:
http://mathparser.org/
http://mathparser.org/mxparser-hello-world/mxparser-hello-world-java/
https://lallafa.objecthunter.net/exp4j/
https://lallafa.objecthunter.net/exp4j/#Evaluating_an_expression_asynchronously
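For instance, evaluating a formula with exp4j looks roughly like this (a small sketch based on the builder API from its documentation; the formula itself is just an example):

import net.objecthunter.exp4j.Expression;
import net.objecthunter.exp4j.ExpressionBuilder;

public class Exp4jExample {
    public static void main(String[] args) {
        Expression e = new ExpressionBuilder("3 * sin(y) - 2 / (x - 2)")
                .variables("x", "y")
                .build()
                .setVariable("x", 2.3)
                .setVariable("y", 3.14);
        double result = e.evaluate(); // evaluates the expression with the bound variables
        System.out.println(result);
    }
}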
It depends on how big your data is and what your required SLA is, and also on what kinds of formulas and other functions you want to support.
For example, consider a function like sum or max. Say the master data is in a relational table containing 10K rows. You could pull all of this data into a Java app and compute the sum (or run any other function) there. However, imagine the table contained 500K rows: streaming all 500K rows to the Java app takes time and consumes a lot of CPU and network bandwidth (database resources, local CPU resources). A better-optimized approach in that case is to index that column in the database and let the database do the hard work for you, as sketched below.
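A rough sketch of pushing the aggregation down to the database over plain JDBC (the connection URL, table and column names are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SumInDatabase {
    public static void main(String[] args) throws SQLException {
        // "jdbc:yourdb://host/db", master_data and amount are placeholders for your schema
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pass");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT SUM(amount) FROM master_data")) {
            rs.next();
            System.out.println("total = " + rs.getDouble(1)); // only one row crosses the network
        }
    }
}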
Personally, I don't like using eval. I would rather parse the user input to determine what actions to take.
I am assuming that the data is not big enough to require big data tools.
I'm looking for the fastest approach, in Java, to store ~1 billion records of ~250 bytes each (storage will happen only once) and then being able to read it multiple times in a non-sequential order.
The source records are generated as simple Java value objects, and I would like to read them back in the same format.
For now my best guess is to store these objects in a flat file using a fast serialization library such as Kryo, and then to use a Java FileChannel for direct random access to read the records at specific positions in the file. When storing the data, I will keep a hashmap (also to be saved on disk) of each record's position in the file so that I know where to read it.
Also, there is no need to optimize disk space. My key concern is to optimize read performance, while having a reasonable write performance (that, again, will happen only once).
One last detail: while the records are all of the same type (the same Java value object), their size in bytes is variable (e.g. they contain strings).
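To make that concrete, the read path I have in mind looks roughly like the sketch below (class and method names are placeholders; error handling is trimmed):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class RecordFileReader {
    private final FileChannel channel;

    public RecordFileReader(FileChannel channel) {
        this.channel = channel;
    }

    // position and length come from the in-memory index built while writing
    public byte[] readRecord(long position, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, position + buf.position());
            if (n < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        return buf.array(); // these bytes would then be deserialized (e.g. with Kryo)
    }
}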
Is there any better approach than what I mentioned above? Any hint or suggestion would be greatly appreciated!
Many thanks,
Thomas
You can use Apache Lucene; it will take care of everything you have mentioned above :)
It is super fast, and you can search results more quickly than ever.
Apache Lucene persists objects in files and indexes them. We have used it in a couple of apps and it is super fast.
You could just use an embedded Derby database. It's written in Java and you can run it embedded within your process, so there is no overhead of inter-process or networked communication. It will store the data and allow you to query it, handling all the complexity and indexing for you.
Using Java I would like to create a Map that can grow and grow and potentially be larger than the size of the memory available. Now obviously using a standard POJO HashMap we're going to run out of memory and the JVM will crash. So I was thinking along the lines of a Map that if it becomes aware of memory running low, it can write the current contents to disk.
Has anyone implemented anything like this or knows of any existing solutions out there?
What I'm trying to do is read a very large ASCII file (say 50Gb) a line at a time. Each line contains a key and a value. Keys can be duplicated in the file. I'll then store each line in a Map, which is Keys to a List of values. This Map is the object that will just grow and grow.
Any advice greatly appreciated.
Phil
Update:
Thanks for all the comments and advice everyone. With the problem that I described, a database is the correct, scalable solution. I should have stated that this is a temporary Map that needs to be created and used for a short period of time to aid in the parsing of a file. In this case, Michael's suggestion to "store only the line number instead of the actual value" is the most appropriate, so I'm marking Michael's answer(s) as the recommended solution.
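For anyone finding this later, a rough sketch of that idea, here using byte offsets rather than line numbers so a value can be re-read with a seek (this assumes ASCII key<TAB>value lines and that the keys themselves fit in memory):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetIndexer {
    // First pass: remember only the byte offset of each line, keyed by the line's key.
    public static Map<String, List<Long>> buildIndex(String path) throws IOException {
        Map<String, List<Long>> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long offset = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                String key = line.split("\t", 2)[0];
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(offset);
                offset = raf.getFilePointer();
            }
        }
        // later: raf.seek(offset); raf.readLine(); re-reads a value on demand
        return index;
    }
}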
I think you are looking for a database.
A NoSQL database will probably be easy to set up, and it is more akin to a map.
Check BerkeleyDB Java Edition, now from Oracle.
It has a map-like interface and can be embedded, so no complex setup is needed.
Sounds like a job for dumping your huge file into a database.
Well, I had a similar situation. In my case everything was in TXT format and every line in the file had the same format. So what I did was split the file into several pieces (each no larger than my JVM could process), then processed the files one by one.
Another way: you can load your data directly into a database.
Seriously, choose a simple database as advised. It's not overhead: you don't have to use JPA or whatnot, just plain JDBC with native SQL. Derby or HSQL, for example, can run in embedded mode, with no need to define users or access rights, or to start the server separately.
The "overhead" will stab you in the back when you've plodded far into the hash map solution and it turns out that you need yet another optimization to avoid the OutOfMemoryError, or the file is not 50 GB but 75... Really, don't go there.
If you're just wanting to build up the map for data processing (rather than random access in response to requests), then MapReduce may be what you want, with no need to work with a database.
Edit: Note that although many MapReduce introductions focus on the ability to run many nodes, you should still get benefit from sidestepping the requirement to hold all the data in memory on one machine.
How much memory do you have? Unless you have enough memory to keep most of the data in memory, it's going to be so slow it may as well have failed. A program which is heavily paging can be 1000x slower or more. Some PCs have 16-24 GB, and you might consider getting more memory.
Let's assume there are enough duplicates that you can keep most of the data in memory. I suggest you use a byte-based String class of your own making, since you have ASCII data, and store your values as another of these "String" types (with a separator). You may find you can keep the working data set in memory.
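A rough sketch of what such a byte-backed "String" key could look like (my illustration of the idea, not a class from any library):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// One byte per character instead of a char[] (2 bytes per char on pre-Java-9 JVMs)
public final class AsciiKey {
    private final byte[] bytes;

    public AsciiKey(String s) {
        this.bytes = s.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AsciiKey && Arrays.equals(bytes, ((AsciiKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}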
I use BerkeleyDB for this, though it is more complicated than a Map (they do have a Map wrapper, but I don't really recommend it for anything but simple applications).
http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
It is also available in Maven http://www.oracle.com/technetwork/database/berkeleydb/downloads/maven-087630.html
<dependencies>
<dependency>
<groupId>com.sleepycat</groupId>
<artifactId>je</artifactId>
<version>3.3.75</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>oracleReleases</id>
<name>Oracle Released Java Packages</name>
<url>http://download.oracle.com/maven</url>
<layout>default</layout>
</repository>
</repositories>
It also has the disadvantage of vendor lock-in (i.e. you are tied to this tool, though there may be Map wrappers for other databases).
So just choose according to your needs.
Most cache APIs work like maps and support overflow to disk. Ehcache, for example, supports that. Or follow this tutorial for Guava.
Alright. So I have a very large amount of binary data (let's say, 10GB) distributed over a bunch of files (let's say, 5000) of varying lengths.
I am writing a Java application to process this data, and I wish to institute a good design for the data access. Typically what will happen is such:
One way or another, all the data will be read during the course of processing.
Each file is (typically) read sequentially, requiring only a few kilobytes at a time. However, it is often necessary to have, say, the first few kilobytes of each file simultaneously, or the middle few kilobytes of each file simultaneously, etc.
There are times when the application will want random access to a byte or two here and there.
Currently I am using the RandomAccessFile class to read into byte buffers (and ByteBuffers). My ultimate goal is to encapsulate the data access into some class such that it is fast and I never have to worry about it again. The basic functionality is that I will be asking it to read frames of data from specified files, and I wish to minimize the I/O operations given the considerations above.
Examples for typical access:
Give me the first 10 kilobytes of all my files!
Give me byte 0 through 999 of file F, then give me byte 1 through 1000, then give me 2 through 1001, etc, etc, ...
Give me a megabyte of data from file F starting at such and such byte!
Any suggestions for a good design?
Use Java NIO and MappedByteBuffers, and treat your files as a list of byte arrays. Then let the OS worry about the details of caching, reading, flushing, etc.
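A minimal sketch of that approach (the file name is a placeholder and the file is assumed to be large enough for the reads; assumes Java 7+ NIO):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data/file0001.bin"); // placeholder file name
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            byte[] first10k = new byte[10 * 1024];
            map.get(first10k);                 // "the first 10 kilobytes of the file"

            byte single = map.get(123_456);    // random access to a byte here and there
            System.out.println(single);
        }
    }
}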
#Will
Pretty good results. A quick comparison of ways to read a large binary file:
Test 1 - Basic sequential read with RandomAccessFile.
2656 ms
Test 2 - Basic sequential read with buffering.
47 ms
Test 3 - Basic sequential read with MappedByteBuffers and further frame buffering optimization.
16 ms
Wow. You are basically implementing a database from scratch. Is there any possibility of importing the data into an actual RDBMS and just using SQL?
If you do it yourself you will eventually want to implement some sort of caching mechanism, so the data you need comes out of RAM if it is there, and you are reading and writing the files in a lower layer.
Of course, this also entails a lot of complex transactional logic to make sure your data stays consistent.
I was going to suggest that you follow up on Eric's database idea and learn how databases manage their buffers—effectively implementing their own virtual memory management.
But as I thought about it more, I concluded that most operating systems already do a better job of implementing file-system caching than you are likely to manage without low-level access in Java.
There is one lesson from database buffer management that you might consider, though. Databases use an understanding of the query plan to optimize the management strategy.
In a relational database, it's often best to evict the most-recently-used block from the cache. For example, a "young" block holding a child record in a join won't be looked at again, while the block containing its parent record is still in use even though it's "older".
Operating system file caches, on the other hand, are optimized to reuse recently used data (and reading ahead of the most recently used data). If your application doesn't fit that pattern, it may be worth managing the cache yourself.
You may want to take a look at an open source, simple object database called jdbm - it has a lot of this kind of thing developed, including ACID capabilities.
I've made a number of contributions to the project, and it would be worth reviewing the source code, if nothing else, to see how we solved many of the same problems you might be working on.
Now, if your data files are not under your control (i.e. you are parsing text files generated by someone else, etc...) then the page-structured type of storage that jdbm uses may not be appropriate for you - but if all of these files are files that you are creating and working with, it may be worth a look.
#Eric
But my queries are going to be much, much simpler than anything I can do with SQL. And wouldn't a database access be much more expensive than a binary data read?
This is to answer the part about minimizing I/O traffic. On the Java side, all you can really do is wrap your input streams in BufferedInputStreams. Aside from that, your operating system will handle other optimizations, like keeping recently read data in the page cache and doing read-ahead on files to speed up sequential reads. There's no point in doing additional buffering in Java (although you'll still need a byte buffer to return the data to the client).
I had someone recommend hadoop (http://hadoop.apache.org) to me just the other day. It looks like it could be pretty nice, and might have some marketplace traction.
I would step back and ask yourself why you are using files as your system of record, and what gains that gives you over using a database. A database certainly gives you the ability to structure your data. Given the SQL standard, it might be more maintainable in the long run.
On the other hand, your file data may not be structured so easily within the constraints of a database. The largest search company in the world :) doesn't use a database for their business processing. See here and here.
I need to store some data that follows the simple pattern of mapping an "id" to a full table (with multiple rows) of several columns (i.e. some integer values [u, v, w]). The size of one of these tables would be a couple of KB. Basically what I need is to store a persistent cache of some intermediary results.
This could quite easily be implemented as simple SQL, but there are a couple of problems. Namely, I need to compress this structure on disk as much as possible (because of the number of values I'm storing). Also, it's not transactional; I just need to write once and simply read the contents of the entire table, so a relational DB isn't actually a very good fit.
I was wondering if anyone had any good suggestions? For some reason I can't seem to come up with something decent at the moment. Something with a Java API would be especially nice.
This sounds like a job for.... new ObjectOutputStream(new FileOutputStream(STORAGE_DIR + "/" + key + ".dat")); !!
Seriously - the simplest method is to just create a file for each data table that you want to store, serialize the data into it, and look it up using the key as the filename when you want to read.
On a decent file system, writes can be made atomic (by writing to a temp file and then renaming it); read/write speed is measured in tens of MBit/second; lookups can be made very efficient by creating a simple directory tree like STORAGE_DIR + "/" + key.substring(0,2) + "/" + key.substring(0,4) + "/" + key, which should still be efficient with millions of entries and even more efficient if your file system uses indexed directories; lastly, it's trivial to implement a memory-backed LRU cache on top of this for even faster retrievals.
Regarding compression - you can use Jakarta's commons-compress to apply gzip or even bzip2 compression to the data before you store it. But this is an optimization problem, and depending on your application and available disk space you may be better off investing the CPU cycles elsewhere.
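A small sketch of the write/read round trip, using the JDK's built-in GZIP streams rather than commons-compress just to keep it dependency-free (file naming and cache eviction are left out):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedStore {
    // write one table (any serializable object) compressed to its own file
    public static void store(String file, Serializable table) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(file)))) {
            out.writeObject(table);
        }
    }

    public static Object load(String file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            return in.readObject();
        }
    }
}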
Here is a sample implementation that I made: http://geek.co.il/articles/geek-storage.zip. It uses a simple interface (which is far from clean - it's just a demonstration of the concept) that offers methods for storing and retrieving objects from a cache with a set maximum size. A cache miss is passed to a user implementation for handling, and the cache periodically checks that it doesn't exceed the storage requirements and removes old data.
I also included a MySQL-backed implementation for completeness, and a benchmark to compare the disk-based and MySQL-based implementations. On my home machine (an old Athlon 64) the disk implementation comes out more than twice as fast as the MySQL implementation in the enclosed benchmark (9.01 seconds vs. 18.17 seconds). Even though the DB implementation could probably be tweaked for slightly better performance, I believe it demonstrates the problem well enough.
Feel free to use this as you see fit.
I'd use EHCache; it's used by Hibernate and other Java EE libraries, and is really simple and efficient:
To add a table:
List<List<Integer>> myTable = new ArrayList<>();
cache.put(new Element("myId", myTable));
To read:
List<List<Integer>> myTable = (List<List<Integer>>) cache.get("myId").getObjectValue();
Have you looked at Berkeley DB? That sounds like it may fit the bill.
Edit:
I forgot to add that you can gzip the values themselves before you store them, then just unzip them when you retrieve them.
Apache Derby might be a good fit if you want something embedded (not a separate server).
There is a list of other options at Lightweight Data Bases in Java
It seems that Key=>Value databases are what you are looking for.
Maybe SuperCSV is the best framework for you!
If you don't want to use a relational database, you can use JAXB to store your objects as XML files!
There are also other libraries for this, such as XStream.
If you prefer XML, then use JAXB or XStream. Otherwise you should have a look at CSV libraries such as SuperCSV. People who can live with serialized Java files can use the default persistence mechanism like Guss said. Direct Java persistence may be the fastest way.
You can use JOAFIP http://joafip.sourceforge.net/
It lets you put your whole data model in a file, and you can access and update it without reloading everything into memory.
If you have a couple of KB, I don't understand why you need to "compress the size of this structure on disk as much as possible". Given that 181 MB of disk space costs 1 cent, I would suggest that anything less than this isn't worth spending too much time worrying about.
However, to answer your question, you can compress the file as you write it. As well as ObjectOutputStream, you can use XMLEncoder to serialize your map. This will be more compact than just using ObjectOutputStream, and if you decompress the file you will be able to read or edit the data.
import java.beans.XMLEncoder;
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

// serialize the map as compressed XML
XMLEncoder xe = new XMLEncoder(
    new GZIPOutputStream(
        new FileOutputStream(filename + ".xml.gz")));
xe.writeObject(map);
xe.close();