I'm planning to write a Java application which relies on a small graph (around 3000 nodes) to represent its structure. The data should be loaded from a custom file at startup to create an in-memory graph database. I've looked into Neo4j but saw that you can't run it purely in-memory out of the box. Googling around a bit, I found that Google JIMFS (Java in-memory file system) might suit my needs.
Does anyone have experience with getting Neo4j to work on a JIMFS FileSystem?
Are there better-suited alternatives that work in Java (ideally in-memory out of the box, like HSQLDB) for small-scale graphs and still provide a declarative query language like Cypher?
Note that performance is not much of an issue for me; this is more of a playground to gather some experience with graph databases, but I don't want the application to create database files on disk.
Since performance is not much of an issue for you, you can go for Neo4j's ImpermanentGraphDatabase, which is created like this:
GraphDatabaseService graphDb = new TestGraphDatabaseFactory().newImpermanentDatabase();
It doesn't create any files on the filesystem.
Source:
http://neo4j.com/docs/stable/tutorials-java-unit-testing.html
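For context, here is a minimal sketch of how this could look, assuming Neo4j 2.x with the neo4j-kernel test artifact (test-jar) on the classpath; the label and properties are made up for illustration:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.graphdb.Transaction;
import org.neo4j.test.TestGraphDatabaseFactory;

public class ImpermanentDbExample {
    public static void main(String[] args) {
        // Purely in-memory database; nothing is written to disk.
        GraphDatabaseService graphDb = new TestGraphDatabaseFactory().newImpermanentDatabase();

        try (Transaction tx = graphDb.beginTx()) {
            // Create a node and query it back with Cypher.
            graphDb.execute("CREATE (n:Quantity {name: 'Length', symbol: 'm'})");
            Result result = graphDb.execute("MATCH (n:Quantity) RETURN n.name AS name");
            while (result.hasNext()) {
                System.out.println(result.next().get("name"));
            }
            tx.success();
        }

        graphDb.shutdown();
    }
}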
I don't know why you wouldn't want the application to create database files on disk, but there are many options. I used Neo4j and in most cases found its query methodology clear and its visualizer very useful, which, in my limited experience, makes it my number one choice. However, considering your requirements, you might find this interesting:
https://bitbucket.org/lambdazen/bitsy/wiki/Home
Related
I'm currently getting into socket programming and building a multi-threaded console application where I need to register/login users. The data needs to be saved locally, but I cannot seem to find the right structure for it.
Here are the ideas I thought about:
Simply saving the data to a .txt file (this would be troublesome for searching and authenticating logins).
Using the Java Preferences API, but since the application is multi-threaded I keep overwriting the data each time a new client connects to my server. Can I create a new node for each new user?
What do you guys think is the ideal structure for saving login credentials? (security isn't currently a concern for this application)
I would consider the H2 database engine.
quote:"Very fast, open source, JDBC API Embedded and server modes; in-memory
databases Browser based Console application Small footprint: around 2
MB jar file size"
http://www.h2database.com
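As a rough illustration of what an embedded, in-memory H2 store for credentials could look like (the table and column names are just placeholders), assuming the h2 jar is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UserStore {
    // DB_CLOSE_DELAY=-1 keeps the in-memory database alive for the whole JVM.
    private static final String URL = "jdbc:h2:mem:users;DB_CLOSE_DELAY=-1";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL)) {
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS users(name VARCHAR PRIMARY KEY, password VARCHAR)");

            // Register a user.
            try (PreparedStatement insert =
                         conn.prepareStatement("INSERT INTO users VALUES (?, ?)")) {
                insert.setString(1, "alice");
                insert.setString(2, "secret"); // plain text only because security is out of scope here
                insert.executeUpdate();
            }

            // Authenticate.
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT 1 FROM users WHERE name = ? AND password = ?")) {
                select.setString(1, "alice");
                select.setString(2, "secret");
                try (ResultSet rs = select.executeQuery()) {
                    System.out.println(rs.next() ? "login ok" : "login failed");
                }
            }
        }
    }
}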
It really depends on what you want to do with the application. The result would be different, depending on what you would answer to the following questions:
Do you want/need to persist the databases?
Is there any other data which you need to store along with that?
Are you using plain Java or a framework like Spring?
Some options:
If you're just prototyping and don't need persistence: consider in-memory storage. For simplicity in coding and dependencies, something like a ConcurrentMap can be completely sufficient (see the sketch after this list). If you wrap it properly, you can swap it out later, and you don't add dependencies and complexity at an early stage.
If you're prototyping but you still need persistence, using properties files on top of the ConcurrentMaps can give you a quick win.
There might be some more stages to this, depending on where you want to go with this, choosing a database at one point can be an option. Depending on your experience and needs, you can use a SQL or NoSQL database. Personally, I get faster results with NoSQL (MongoDB in my case) but prefer SQL in production for use cases like account management.
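Here is a minimal sketch of the ConcurrentMap idea, wrapped in a small class so it could later be swapped for a database; the class and method names are just illustrative:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InMemoryUserStore {
    // Thread-safe map, suitable for a multi-threaded socket server.
    private final ConcurrentMap<String, String> passwordsByUser = new ConcurrentHashMap<>();

    /** Returns true if the user was newly registered, false if the name is already taken. */
    public boolean register(String user, String password) {
        return passwordsByUser.putIfAbsent(user, password) == null;
    }

    /** Returns true if the credentials match a registered user. */
    public boolean authenticate(String user, String password) {
        return password.equals(passwordsByUser.get(user));
    }
}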
I have a quick question about Neo4j: is it possible to migrate from MySQL to Neo4j? Based on what I've read it seems that you can, but so far all the tutorials are aimed at web services. I was wondering if there is a plain-Java (POJO) way to do this. Currently I have over 300k records being exported to CSV, and I plan to load them into Neo4j using Spring. Can I just read them with JDBC and create new nodes in Neo4j? Thanks!
It's possible to migrate a MySQL database to Neo4j, but it depends on how you want to do it and what results you expect.
You can use CSV export/import. It's simple to use, but it has some limitations. For a one-time operation it should be good enough.
The second option is to write your own script or program that transforms the data from the RDBMS into the graph (see the sketch after this list). It is more powerful: you can do cleaning and transformation easily. You can also use Spring Data Neo4j to create persisted entities.
The next option is to use the GraphAware Neo4j Importer. It's a framework for importing data from an RDBMS into Neo4j with a lot of powerful features, but the learning curve is steep.
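As a rough sketch of the "write your own program" option, assuming an embedded Neo4j 3.x instance, a MySQL JDBC driver on the classpath, and an invented person table; for 300k+ rows you would batch the transaction in practice:

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class MySqlToNeo4j {
    public static void main(String[] args) throws Exception {
        GraphDatabaseService graphDb =
                new GraphDatabaseFactory().newEmbeddedDatabase(new File("target/graph.db"));

        try (Connection mysql = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "password");
             Statement stmt = mysql.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM person")) {

            try (Transaction tx = graphDb.beginTx()) {
                while (rs.next()) {
                    // One node per MySQL row.
                    Node node = graphDb.createNode(Label.label("Person"));
                    node.setProperty("mysqlId", rs.getLong("id"));
                    node.setProperty("name", rs.getString("name"));
                }
                tx.success();
            }
        } finally {
            graphDb.shutdown();
        }
    }
}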
Summary:
I am trying to write a utility program that is based on information contained in a separate file. The objective is that any piece of information in the physical file can be retrieved quickly and updated quickly as well.
Details:
The file is a normal ANSI-encoded file that stores definitions of the physical quantities in the SI system. What I really want is to be able to read and write changes to the definitions whenever required. I'll be using markers (like ":") to separate the headings and definitions, like:
Length:metre:m:"..length of path traveled by light in vacuum in 1/299792458th of a second"
and so on.
So in this case, is extending RandomAccessFile an option? Will it help me with quick retrieval and syncing of data? Or should I use another approach?
If you want these things, then I'd advise you to use an embedded ACID database like H2:
Guarantee that you don't lose changes that you made
Have more than one program access the info
This is because correctly coding this with low-level facilities like RandomAccessFile is quite hard. Storing persistent application state in an embedded DB is commonly done, and H2 is probably the most popular of the DBs implemented in pure Java.
On how to actually do this, see this: Embedding the Java h2 database programmatically
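A minimal sketch of what that could look like for the SI-definitions use case, assuming the h2 jar is on the classpath; the table layout is just an example:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DefinitionStore {
    public static void main(String[] args) throws Exception {
        // File-based embedded database; use jdbc:h2:mem:... if persistence isn't needed.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./si-definitions")) {
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS quantity(" +
                    "name VARCHAR PRIMARY KEY, unit VARCHAR, symbol VARCHAR, definition VARCHAR)");

            // Insert or update a definition (MERGE is H2's upsert).
            try (PreparedStatement merge = conn.prepareStatement(
                    "MERGE INTO quantity KEY(name) VALUES (?, ?, ?, ?)")) {
                merge.setString(1, "Length");
                merge.setString(2, "metre");
                merge.setString(3, "m");
                merge.setString(4, "length of path traveled by light in vacuum in 1/299792458th of a second");
                merge.executeUpdate();
            }

            // Quick retrieval by name.
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT definition FROM quantity WHERE name = ?")) {
                select.setString(1, "Length");
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString("definition"));
                    }
                }
            }
        }
    }
}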
You probably want to look at an introduction to relational DBs and SQL if you aren't familiar with them.
I want to develop a desktop application that allows users to search through json files.
These files (around 50,000) are predefined. They should be shipped with the application itself.
My question is: what would be the best way to ship these documents with the application and at the same time allow users to search for documents containing certain values, e.g. in SQL terms: show all documents where some JSON value within the document is LIKE %Example%.
I thought about using some kind of NoSQL solution, preloading the files into the db and bundle it with the app. I've looked at some solutions, but I'm not really sure which one would be best suited for my needs or if it's even the best approach.
The bottom line is that I can't have my users install a DB on their system; that is way too complicated.
I'd prefer a solution suitable for java or python.
Thanks for your help!
You can use an embedded database: either a memory-based database (like HSQLDB) or a file-based database (like SQLite).
Neither requires any installation by your end users. You just have to package the libraries (and, of course, the engine itself) as part of your application's install bundle.
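As a rough sketch of the file-based approach, assuming the xerial sqlite-jdbc driver is bundled with the application and the JSON documents have been preloaded into a docs table (the schema and file name are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DocumentSearch {
    public static void main(String[] args) throws Exception {
        // The .db file ships alongside the application; no server installation needed.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:documents.db");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT path FROM docs WHERE json_text LIKE ?")) {
            stmt.setString(1, "%Example%");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("path"));
                }
            }
        }
    }
}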
If you are looking for a k/v store, then the good ol' Berkeley DB should suffice. If you are really looking for an "embedded NoSQL solution", try MooDB.
Mongo DB comes in an embeddable version: https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de
I've used it for integration testing (mocking a Mongo server) and it works really well!
Any time I read "document" and "search", I also think of Solr: http://lucene.apache.org/solr/
We are using AppEngine and the datastore for our application where we have a moderately large table of information containing a list with entries.
I would like to summarize the list of entries in a report specifying how many times each one appears. In SQL I would normally use SELECT DISTINCT on the column, then loop over every entry and use SELECT COUNT(x) WHERE value = valueOfEntry.
While the count portion is easily done, the distinct problem is "a problem". The only solution I could find remotely close to this is MapReduce and most of the samples are based on Python. There is this blog entry which is very helpful but somewhat outdated since it predated the reduce portion. Then there is the video here and a few more resources I was able to find.
However, it's really hard for me to understand how to build the summary table if I can't write to a separate entity and I don't have a reduce stage.
This seems like something trivial and simple to accomplish, but it requires jumping through so many hoops. Is there no sample or existing reporting engine I can just plug into AppEngine without all the friction?
I saw BigQuery, but it seems like a huge hassle to move the data out of app engine and into that store. I tried downloading the data as CSV but ran into many issues with that as well. It doesn't seem like a practical solution in the long run either.
There is a document explaining some of the concepts of MapReduce for Java. Although it is incomplete, it shares most of the architecture with the Python version. In that document there's also a pointer to a complete Java sample MapReduce app that reads from the datastore.
For writing the results, you specify an Output class. To write the results to a new datastore entity, you would need to create your own Output class. But you could also use the blobstore (see BlobFileOutput.java).
Another alternative is that whenever you write one of your entities, you also write/update another entry in an EntityDistinct data model (see the sketch below).
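A rough sketch of that counter-on-write idea using the low-level datastore API; the EntityDistinct kind and property names are invented for illustration, and for heavy write traffic you would want sharded counters instead of a single read-modify-write per value:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class DistinctCounter {
    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    /** Call this whenever you write an entity containing the given value. */
    public void increment(String value) {
        Key key = KeyFactory.createKey("EntityDistinct", value);
        Transaction tx = datastore.beginTransaction();
        try {
            Entity counter;
            try {
                counter = datastore.get(tx, key);
            } catch (EntityNotFoundException e) {
                // First time we see this value.
                counter = new Entity("EntityDistinct", value);
                counter.setProperty("count", 0L);
            }
            long count = (Long) counter.getProperty("count");
            counter.setProperty("count", count + 1);
            datastore.put(tx, counter);
            tx.commit();
        } finally {
            if (tx.isActive()) {
                tx.rollback();
            }
        }
    }
}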
If you plan on performing complex reports and you can anticipate all your needs now, I would suggest you look again at BigQuery. BigQuery is really powerful and works perfectly on very massive datasets. You can inspect http://code.google.com/p/log2bq/ which is a Python project that loads logs into BigQuery using MapReduce. Or you could have a cron job that, every once in a while, fetches all new entities and moves them into BigQuery.
Regarding the friction, remember that this is a NoSQL database, and as such it has some advantages, but some things are inherently different from SQL. Also remember that you can always use Google Cloud SQL, given that your dataset is of limited size, but you would lose the replication and fault-tolerance capabilities.
I think this could help you: http://jjmpsj.blogspot.ro/2008/05/appengine-output-tricks-reporting.html?m=1