What I think I'm looking for is a NoSQL, library-embedded, on-disk (i.e. not in-memory) database that's accessible from Java (and preferably runs inside my instance of the JVM). That's not really much of a database, and I'm tempted to roll my own. Basically I'm looking for the "should we keep this in memory or put it on disk" portion of a database.
Our model has grown to several gigabytes. Right now this is all done in memory, meaning we're pushing the JVM to upward of several gigabytes. It's currently all stored in a flat XML file, serialized and deserialized with XStream and compressed with Java's built-in gzip libraries. That worked well while our model stayed under 100MB, but now that it's larger than that it's becoming a problem.
Loosely speaking, that model can be broken down as:
Project
a configuration component (a directed acyclic graph), not at all database-friendly
a list of a dozen "experiment" structures
each containing a list of about a dozen "run-model" structures.
each run-model contains hundreds of megs of data. Once written they are never edited.
What I'd like to do is have something that conforms to a map interface, of guid -> run-model. This mini-database would keep a flat table of these objects. On our experiment model, we would replace the list of run-models with a list of guids, and add, at the application layer, a get call to this map, which would pull it off the disk and into memory.
That means we can keep configuration of our program in XML (which I'm very happy with) and keep a table of the big data in a DBMS that will keep us from consuming multi-GB of memory. On program start and exit I could then load and unload the two portions of our model (the config section in XML, and the run-models in the database format) from an archiving format.
I'm sort of feeling gung-ho about this, and think I could probably implement it with some of XStream's XML inspection strategies and a custom map implementation, but a voice in the back of my head is telling me I should find a library to do it instead.
Should I roll my own, or is there a database small enough to fit the bill?
Thanks guys,
-Geoff
Take a look at MapDB: http://www.mapdb.org/
Also take a look at this question: Alternative to BerkeleyDB?
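For illustration, here is a minimal sketch of the guid -> run-model map on top of MapDB, assuming the 3.x API (the file name, map name, and byte-array value type are placeholders for your own serialized run-models):

import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;

import java.util.concurrent.ConcurrentMap;

public class RunModelStore {
    public static void main(String[] args) {
        // Open (or create) a file-backed database; entries live on disk, not the heap.
        DB db = DBMaker.fileDB("run-models.db").make();

        // A disk-backed map: guid -> serialized run-model bytes.
        ConcurrentMap<String, byte[]> runModels = db
                .hashMap("runModels", Serializer.STRING, Serializer.BYTE_ARRAY)
                .createOrOpen();

        runModels.put("guid-1234", new byte[]{1, 2, 3}); // your serialized run-model here
        byte[] data = runModels.get("guid-1234");        // pulled off disk on demand
        System.out.println(data.length);

        db.close();
    }
}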
Since MapDB is a possible solution for your problem, Chronicle Map is also worth considering. It's an embeddable Java key-value store, optionally persistent, offering a very similar programming model to MapDB: it is also accessed via the vanilla java.util.Map interface and provides transparent serialization of keys and values.
The major difference is that, according to third-party benchmarks, Chronicle Map is several times faster than MapDB.
Regarding stability, no bugs have been reported in Chronicle Map's data storage for months now, while it is in active use in many projects.
Disclaimer: I'm the developer of Chronicle Map.
I am studying Data Structures in a Fundamentals of Software Development course. I have encountered the following data structures:
Structs
Arrays
Lists
Queues
Hash Tables
... among others. I have a good understanding of how they work, but I'm struggling to understand when and where to use them.
I can identify a use for the Queue data structure, as it would be helpful in printer and/or thread queuing and prioritizing.
Knowing the strengths and weaknesses of a data structure and implementing it in code are different things, and I am finding the former difficult.
What is a simple example of the use of each of the data structures listed above?
For example:
Queue: first-in, first-out → used for printer queue to queue docs
I had trouble understanding these when I first started programming, so I'll give you a heads-up to start with.
I'll try to keep this as simple as possible; try the Oracle docs for further details.
Struct: whenever you need an object-like structure where you can group related data, use a struct. Structs are very rarely used in Java, though (objects are created in their place).
Arrays: arrays are contiguous memory. Whenever you want constant-time access by index, arrays are very fast (unlike linked lists), so use them. The drawback with arrays is that you need to know the size at initialization time. Arrays also do not support higher-level methods such as add(), remove(), clear(), contains(), indexOf(), etc.
List: an interface which can be implemented using arrays (ArrayList) or linked lists (LinkedList). Lists support all the higher-level methods mentioned earlier. Lists also resize themselves whenever they run out of space. You can specify the initial size with which the underlying array will be created, but whenever that limit is reached, the list allocates a bigger underlying structure and then copies over the contents of the original.
Queue or Stack: an implementation technique and not really a data structure. If you want FIFO behaviour, you implement a Queue on top of either an array or a linked list (yes, you can implement this technique on both of these data structures).
https://en.wikibooks.org/wiki/Data_Structures/Stacks_and_Queues
HashMap: a HashMap is used whenever you want to store key-value pairs. If you notice, you cannot use arrays, linked lists, or any of the other mentioned data structures for this purpose. A key can be anything from a String to an Object (note that it has to be an object and cannot be a primitive), and a value can also be any object.
Google each data structure for more details.
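A minimal sketch showing each of these in plain Java (the names and values are made up):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class DataStructureExamples {
    public static void main(String[] args) {
        // Array: fixed size, constant-time access by index.
        int[] scores = new int[3];
        scores[0] = 95;

        // List: resizes itself and supports add(), remove(), contains(), etc.
        List<String> names = new ArrayList<>();
        names.add("Alice");
        names.add("Bob");

        // Queue: FIFO - the first document added is the first one printed.
        Queue<String> printerQueue = new ArrayDeque<>();
        printerQueue.add("report.pdf");
        printerQueue.add("essay.docx");
        String next = printerQueue.poll(); // "report.pdf"

        // HashMap: key-value pairs; keys must be objects, not primitives.
        Map<String, Integer> pageCounts = new HashMap<>();
        pageCounts.put("report.pdf", 12);
        System.out.println(next + " has " + pageCounts.get(next) + " pages");
    }
}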
It depends on what you need. If you read and learn more about these data structures, you will find convenient ways to implement them.
Maybe read this book? http://www.amazon.com/Data-Structures-Abstraction-Design-Using/dp/0470128704
All these data structures are used based on the needs of the program. Try to find the advantages of one data structure over another; that should make things clearer for you. What I say may not be entirely clear, but I'll give it a shot.
Like for example,
Structs are used to create a data type: say you want a data type for a book, grouping fields such as the book's name into a Book structure.
Lists are easier to access in both directions if you use linked lists, and are sometimes better than arrays. Queues, well, you can imagine them as real-life queues: first in will be first out. So you can use them when you need to enforce that ordering.
Like I said, looking for advantages of one over the other should get things clear for you.
I have an XML file with 100,000 fragments, with 6 fields in every fragment. I want to search that XML for different strings at different times.
What is the best XML reader for Java?
OK, let's say you've got a million elements of size 50 characters each, say 50MB of raw XML. In DOM that may well occupy 500MB of memory; with a more compact representation such as Saxon's TinyTree it might be 250MB. That's not impossibly big by today's standards.
If you're doing many searches of the same document, then the key factor is search speed rather than parsing speed. You don't want to be doing SAX parsing as some people have suggested because that would mean parsing the document every time you do a search.
The next question, I think, is what kind of search you are doing. You suggest you are basically looking for strings in the content, but it's not clear to what extent these are sensitive to the structure. Let's suppose you are searching using XPath or XQuery. I would suggest three possible implementations:
a) use an in-memory XQuery processor such as Saxon. Parse the document into Saxon's internal tree representation, making sure you allocate enough memory. Then search it as often as you like using XQuery expressions (see the sketch after this list). If you use the Home Edition of Saxon, the search will typically be a sequential search with no indexing support.
b) use an XML database such as MarkLogic or eXist. Initial processing of the document to load the database will take a bit longer, but it won't tie up so much memory, and you can make queries faster by defining indexes.
c) consider use of Lux (http://luxdb.org) which is something of a hybrid: it uses the Saxon XQuery processor on top of Lucene, which is a free text database. It seems specifically designed for the kind of scenario you are describing. I haven't used it myself.
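For option (a), a minimal sketch using Saxon's s9api, assuming Saxon HE is on the classpath (the file name and the query are placeholders):

import net.sf.saxon.s9api.*;

import java.io.File;

public class SearchOnce {
    public static void main(String[] args) throws SaxonApiException {
        Processor proc = new Processor(false); // false = Home Edition
        // Parse the document once into Saxon's internal tree (TinyTree).
        XdmNode doc = proc.newDocumentBuilder().build(new File("data.xml"));

        // Compile the query once, then run it as often as you like.
        XQueryEvaluator query = proc.newXQueryCompiler()
                .compile("//fragment[field1 = 'needle']")
                .load();
        query.setContextItem(doc);
        for (XdmItem item : query) {
            System.out.println(item.getStringValue());
        }
    }
}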
Are you loading the XML document into memory once and then searching it many times? In that case, it's not so much the speed of parsing that should be the concern, but rather the speed of searching. But if you are parsing the document once for every search, then it's fast parsing you need. The other factors are the nature of your searches, and the way in which you want to present the results.
You ask for the "best" XML reader in the body of your question, but in the title you ask for the "fastest". It's not always true that the best choice is the fastest: because parsing is a mature technology, different parsing approaches might only differ by a few microseconds in performance. Would you be prepared to spend four times as much development effort in return for 5% faster performance?
The solution for handling very big XML files is to use a SAX parser. With DOM parsing, any library would fail on a very big XML file. Well, failure is relative to the amount of memory you have and how efficient the DOM parser is.
But in any case, handling large XML files requires a SAX parser. Think of SAX as something that just throws elements out of the XML file as it reads. It is an event-based sequential parser: event-based because you are handed events such as start-element and end-element. You have to know which elements you are interested in and handle them properly.
I would advise you to play with this simple example to understand SAX:
http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
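To give a feel for it, here is a minimal sketch with the JDK's built-in SAX parser; the file name and the title element it searches for are made up:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.io.File;

public class SaxSearch {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("data.xml"), new DefaultHandler() {
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                inTitle = "title".equals(qName); // react only to elements you care about
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    System.out.println(new String(ch, start, length));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inTitle = false;
            }
        });
    }
}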
Suppose I use standard Java object serialization to write/read small (< 1K) Java objects to/from a memory buffer. The most critical part is deserialization, i.e. reading Java objects from the memory buffer (a byte array).
Is there any faster alternative to standard Java serialization for this case?
You might also want to have a look at FST. It also provides tools for off-heap reading/writing.
Have a look at Kryo. It's much, much faster than the built-in serialization mechanism (which writes out a lot of strings and relies heavily on reflection), but a bit harder to use.
Edit: R.Moeller below suggested FST, which I'd never heard of until now, but it looks to be both faster than Kryo and compatible with Java's built-in serialization (which should make it even easier to use), so I'd look at that first.
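To show the programming model, here is a minimal sketch with Kryo's classic API (the Point class is just an example; exact details vary between Kryo versions):

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.ByteArrayOutputStream;

public class KryoExample {
    static class Point { int x; int y; }

    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        kryo.register(Point.class); // registration keeps the output small

        Point p = new Point();
        p.x = 3;
        p.y = 4;

        // Serialize to a byte array.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Output output = new Output(bos);
        kryo.writeObject(output, p);
        output.close();

        // Deserialize from the byte array.
        Input input = new Input(bos.toByteArray());
        Point copy = kryo.readObject(input, Point.class);
        input.close();

        System.out.println(copy.x + "," + copy.y); // 3,4
    }
}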
Try Google protobuf or Thrift.
The standard serialization adds a lot of type information which is then verified when the object is deserialized. When you know the type of the object you are deserializing, this is usually not necessary.
What you could do is create your own serialization method for each class, which just writes all the values of the object to a byte buffer, and a constructor (or factory method, if you swing that way) which takes such a byte buffer and reads all the variables back from it.
But just like AlexR I wonder if you really need that. Serialization is usually only needed when the data leaves the program (like getting stored on disk or sent over the network to another program).
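As a rough illustration of that approach, a minimal sketch with a hypothetical Point class and a ByteBuffer:

import java.nio.ByteBuffer;

public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Reconstruct the object from a buffer: read fields in the same order they were written.
    public Point(ByteBuffer buf) {
        this.x = buf.getInt();
        this.y = buf.getInt();
    }

    // Write all field values to a buffer; no type information is included.
    public void writeTo(ByteBuffer buf) {
        buf.putInt(x);
        buf.putInt(y);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        new Point(3, 4).writeTo(buf);
        buf.flip(); // switch the buffer from writing to reading
        Point copy = new Point(buf);
        System.out.println(copy.x + "," + copy.y); // 3,4
    }
}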
Java's standard serialisation is known to be slow, and to use a huge amount of bytes on disk. It is very simple to do your own custom serialisation.
Java's standard serialisation is nice for demo projects, but for the above reasons it is not well suited to professional projects. Furthermore, versioning is not well under your control.
Java provides all you need for custom serialisation; see the demo code in my post at
Java partial (de)serialization of objects
With that approach you can even specify the binary file format, such that it could be read in C or C#, too.
Another advantage: custom-serialised objects need less space than they do in main memory (a boolean needs 4 bytes in main memory but only 1 byte when custom-serialised as a byte).
If different project partners have to read your serialised data, Google's Protobuf is an alternative to look at.
Now that the oligopoly of market data providers has successfully killed OpenQuant, does any alternative to proprietary and expensive subscriptions for real-time market data still exist?
Ideally I would like to be able to monitor tick by tick securities from the NYSE, NASDAQ and AMEX (about 6000 symbols).
Most vendors put a limit of 500 symbols watchable at the same time; this is unacceptable to me, even if one can imagine a rotation among the 500 symbols, i.e. making windows of 5 sec. of effective observation out of each minute for every symbol.
Currently I'm doing this with a Java thread pool calling Google Finance, but this is unsatisfactory for several reasons, one being that Google doesn't return the volume traded, but the main one being that Google promptly kills bots attempting to take advantage of this service ;-)
Any hint much appreciated,
Cheers
I think you'll find all you need to know by looking at this question: source of historical stock data
I don't know of any free data feeds other than Yahoo!, but it doesn't offer tick-by-tick data; it only offers 1-minute intervals with a 15-minute delay. If you want to use an existing tool to download the historical data, then I would recommend EclipseTrader. It only saves the Open, Close, High, Low, and Volume.
You can write your own data scraper with very little effort. I've written an article on downloading real-time data from Yahoo on my blog, but it's in C#. If you're familiar with C# then you'll be able to translate it into Java pretty quickly. If you write your own data scraper then you can get pretty much ANYTHING that Yahoo! shows on their web site: Bid, Ask, Dividend Share, Earnings Share, Day's High, Day's Low, etc, etc, etc.
If you don't know C# then don't worry, it's REALLY simple: Yahoo allows you to download CSV files with quotes just by modifying a URL. You can find out everything about the URL and the tags that are used on Yahoo here: http://www.gummy-stuff.org/Yahoo-data.htm
Here are the basic steps you need to follow:
Construct a URL for the symbol or multiple symbols of your choice.
Add the tags which you're interested in downloading (Open, Close, Volume, Beta, 52 week high, etc, etc.).
Create a URLConnection with the URL you just constructed.
Use a BufferedReader to read the CSV file that is returned from the connection stream.
Your CSV will have the following format:
Each row is a different symbol.
Each column is a different tag.
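A minimal sketch of steps 3 and 4, assuming the Yahoo CSV endpoint described above is still serving data (the tags s, l1 and v follow the gummy-stuff reference: symbol, last trade, volume):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Arrays;

public class YahooCsvQuotes {
    public static void main(String[] args) throws Exception {
        // Steps 1 and 2: symbols and tags are encoded in the URL itself.
        URL url = new URL("http://finance.yahoo.com/d/quotes.csv?s=MSFT+GOOG&f=sl1v");
        URLConnection conn = url.openConnection(); // step 3

        // Step 4: read the returned CSV line by line.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String row;
            while ((row = reader.readLine()) != null) { // one row per symbol
                String[] columns = row.split(",");      // one column per tag
                System.out.println(Arrays.toString(columns));
            }
        }
    }
}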
Open a TDAmeritrade account and you will have free access to ThinkOrSwim, a real-time trading and quotes platform. Live trading is real time and paper trading is delayed 15 minutes. I forget what the minimum required to open a TDAmeritrade account is, but you can go to TDAmeritrade.com or thinkorswim.com to check them out.
Intrinio has a bunch of feeds with free and paid tiers. Essentially you only have to pay for what you need as opposed to the bigger data suppliers. Intrinio focuses on data quality and caters to developers as well, so I think it'd be a great option for you.
Full disclosure: I work at Intrinio as a developer.
There's a handy function in Google Sheets (ImportHTML) which I've been using for a while to reasonable effect.
For example -
=Index(ImportHTML("http://www.bloomberg.com/markets/commodities/futures/metals/","table",1),5,3) returns the EUR Gold spot price.
It works with Yahoo too, so =Index(ImportHTML("http://finance.yahoo.com/q?s=DX-Y.NYB","table",0),2,2) returns the DXY.
The data updates with some small delay but it's usable.
The assigned homework problem needs a function to store a user-selected video game. This is a small school project, and I was wondering what would be the best way to store the information. The program must access the video game bookings, but I think a database is a little overblown for such a small task.
What would you do?
Usually, for small school projects, I invent my own flat file format.
Usually it is a simple CSV-like file with key-value pairs of some sort.
Depending on the type of information you need to save XML may be the way to go.
Also, if the information only needs to be saved for a short period of time (one run of the application), and the amount of data being saved is relatively small, simply keeping all of it in memory will most certainly make the program much faster, and usually easier to write.
Kai's right, although a good way to manage 'bookings' info for video games would still be a small database (try SQLite). I'm sure you'd want to associate bookings info with a user, and any such relationship would also justify the use of a database.
I think the easiest solution is to serialize the object that holds your data, then write it to disk (choose whatever file extension makes you happy). Simply read the file and you've got your object back!
// Deserialize: read the object back from disk (Foo must implement java.io.Serializable)
FileInputStream fis = new FileInputStream(filename);
ObjectInputStream in = new ObjectInputStream(fis);
Foo f = (Foo) in.readObject();
in.close();
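For completeness, the write side is the mirror image (assuming f is the Foo instance holding your data):

FileOutputStream fos = new FileOutputStream(filename);
ObjectOutputStream out = new ObjectOutputStream(fos);
out.writeObject(f); // writes the whole object graph to disk
out.close();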
Here's a great primer on the whole process: Discover the secrets of the Java Serialization API
It would be helpful if you were a little more specific, but...
I think what the assignment is trying to get you to do is understand that the program will not know the data types and the size of the data (row and column wise) until runtime.
From what you're telling me, I would try modeling a table through a mutable list. Program it generically so you can swap out the implementation:
List<List<String>> table = new ArrayList<List<String>>();
Is this just video games? If so, I would create a VideoGame object, store fields such as name, maker, system, etc., put it into a mutable data structure, and voila! It all depends on the operations you will be performing on the list... are you searching and sorting? Do you care about retrieval times?
If you want retrieval to be O(1), or in inaccurate layman's terms "about one instruction", consider using a Map. If the key is a video game's name, it will return in O(1). If there are multiple entries per key, consider using a List as the value.
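A minimal sketch of that idea (the VideoGame fields are made up):

import java.util.HashMap;
import java.util.Map;

public class GameStore {
    static class VideoGame {
        String name;
        String maker;
        String system;

        VideoGame(String name, String maker, String system) {
            this.name = name;
            this.maker = maker;
            this.system = system;
        }
    }

    public static void main(String[] args) {
        // Key the map on the game's name for O(1) retrieval.
        Map<String, VideoGame> games = new HashMap<>();
        games.put("Tetris", new VideoGame("Tetris", "Nintendo", "NES"));

        VideoGame picked = games.get("Tetris"); // constant-time lookup
        System.out.println(picked.name + " on " + picked.system);
    }
}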
I hope this wasn't too long and confusing, but please specify whether the number of fields is known or whether it has to be entirely generic. If it has to be entirely generic, just use a database! It's made to be generic... or if you really don't want to do that, use the first method I described.
Hope it helps.
Create your own text file (CSV etc)
Pro: Easy to edit
Con: You have to do all that marshalling yourself and make up the file format. You will likely end up with a badly written database. Done badly, changing the objects could be a real pain.
Serialize the objects (in either binary or XML)
Pro: Something else handles the marshalling (built in with binary, and with XML if using "beans")
Con: Changing between versions of Java could break a binary-formatted serialization. XML requires beans.
Custom XML file
Pro: XML is well supported in Java
Con: You will likely end up getting scared off by the existing marshalling APIs and rolling your own (using the Java XML APIs, one hopes). Then you end up in the same space as text files (a badly written database).
Embedded Database
Pro: SQL is sexy.
Con: Unless you already know SQL and/or are using an ORM product, it can be a bit of a pain.
I'd likely go with an embedded database + JPA.
It depends on what kind of data you are talking about, how much of it there is, how you will be accessing and presenting it, and whether you might need to do any further querying and analysis of it.
I would say CSV or SQLite.