I'm planning to use the EM algorithm from the Java Weka library to assign each object a probability of belonging to a certain cluster, and then work with these probabilities.
Furthermore, the properties of those objects will be loaded from a database, so I would like to load them into the clusterer directly from memory, instead of dumping them to an arff file as in the examples I have found around the web (e.g. Serialization).
Firstly, I would like to know whether the Weka library is the proper one for my purpose, or whether there is a better-suited one such as Apache Commons Math.
Secondly, is there any example that creates Instances without going through a file?
I would be grateful for any help.
I have some Java code which I'm currently running as a jar. This code checks for a specific file in a given directory, which is currently hard-coded.
To give more flexibility without touching the code, I would like to have the list of folders managed in a separate file, so that the code reads this config file, gets the list of folders on each run, and processes them.
I would like to know the best option for maintaining the folder list outside the code so that anyone can update it. Can a properties file be used for this? Can we read values from a properties file dynamically?
In Java you have java.util.Properties, which allows you to load flat key/value data from external resources.
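As a minimal sketch of that approach (not a definitive implementation): the key name `folders`, the comma-separated value format, and the default file name `folders.properties` are all assumptions made for illustration, since the question does not specify them. The file is re-read on every call, so edits take effect without rebuilding the jar.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class FolderConfig {

    // Hypothetical key name; pick whatever suits your config file.
    static final String FOLDERS_KEY = "folders";

    // Extracts the folder list from an already loaded Properties object.
    // Assumes a comma-separated value, e.g. folders=/data/in,/data/out
    static List<String> foldersFrom(Properties props) {
        List<String> folders = new ArrayList<>();
        for (String part : props.getProperty(FOLDERS_KEY, "").split(",")) {
            String folder = part.trim();
            if (!folder.isEmpty()) {
                folders.add(folder);
            }
        }
        return folders;
    }

    // Loads the file from disk each time it is called.
    static List<String> readFolders(Path configFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(configFile)) {
            props.load(reader);
        }
        return foldersFrom(props);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name, overridable from the command line.
        Path configFile = Paths.get(args.length > 0 ? args[0] : "folders.properties");
        for (String folder : readFolders(configFile)) {
            System.out.println("Checking " + folder);
        }
    }
}
```

Note that a backslash is an escape character in the properties format, so Windows paths need forward slashes or doubled backslashes.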
If you need something that can be dynamically updated, there's also the more sophisticated Preferences library. This one allows you to:
Keep data organized in tree structures (it's a tree of nodes, each node storing its own key/value preferences).
Make use of basic types (primitive types, strings and binary data).
Make use of platform-dependent "native" stores transparently (under the hood, it's going to use the file system on Unix systems and registries on Windows by default).
Plug in your own backing store if needed.
Get any data changes performed within the application persisted transparently.
Register node/preference change listeners and react to any change if needed.
The API is quite old and hasn't been updated, but it doesn't mean it's deprecated. It is used mostly with GUI applications (notably, IntelliJ IDEA was storing its configuration using Preferences the last time I checked).
There's also an attempt to revive this library that I made with a project called cross-preferences by integrating modern distributed config stores (such as zookeeper, etcd or consul) as backing stores for java.util.prefs.Preferences and providing a web console for preference management.
Unfortunately I couldn't find anything specific to this topic / to my problem. Here we go:
I'm building a JavaFX business application for a friend of mine. Unfortunately I do not have any possibility to connect to a database. I want the application to load a save state from a file. The application contains a list of clients, and each client has some specific properties. I do not want to hardcode this into a .prop or .txt file, because I'm sure that there's a different way of doing this, isn't there?
Thanks in advance, appreciate it!
Lots of choices for persisting data to local storage. The exact choice depends on your needs. You do not describe enough details to make a specific recommendation.
Here is a list of possibilities, roughly in increasing order of complexity of your data.
Text file
If you have small amounts of simple data, save to a text file. You can store each piece in a separate file, or combine into a single file. Recent versions of Java have new classes to make this easier than ever. See Oracle Tutorial.
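A minimal sketch of the text-file option using java.nio.file.Files, assuming one client name per line in UTF-8. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ClientTextFile {

    // Writes one entry per line, replacing any previous content of the file.
    static void save(Path file, List<String> clientNames) throws IOException {
        Files.write(file, clientNames, StandardCharsets.UTF_8);
    }

    // Reads the whole file back into memory, one list element per line.
    static List<String> load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file keeps the demo from touching real data.
        Path file = Files.createTempFile("clients", ".txt");
        save(file, Arrays.asList("Alice", "Bob"));
        System.out.println(load(file));
        Files.delete(file);
    }
}
```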
Comma-separated & tab-delimited
For sets of structured data, write to text files in comma-separated values (CSV) or tab-delimited values. For example a list of people with rows for each person, and columns for name, phone number, and email address.
While reading/writing such files is easy enough to program yourself, I suggest using an established library to eliminate the drudgery, avoid bugs, and save yourself some time. There are a few such libraries written in Java.
My favorite is the Apache Commons CSV project. This library makes easy work of the chore of reading/writing such files. Despite the name, this library supports tab-delimited as well as comma-separated formats. I've written a few Answers here on Stack Overflow showing how to use this library, as you can see here, here, and here.
By the way, plain old ASCII defines a few character positions explicitly for delimiting in data files, with four levels of grouping (document, group, record/row, and field). Unicode, of course, inherits these from ASCII as code points. I am puzzled why these have remained so obscure and so infrequently used. Seems much more logical to me than using commas and tabs which may well exist inside the data payload.
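To illustrate the idea, here is a small sketch that uses two of those ASCII control characters, the record separator (0x1E) between rows and the unit separator (0x1F) between fields. The class and method names are hypothetical, and the sketch assumes the data itself never contains these two characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsciiDelimited {

    static final String RECORD_SEPARATOR = "\u001E"; // ASCII 30, between rows
    static final String UNIT_SEPARATOR = "\u001F";   // ASCII 31, between fields

    // Joins fields with the unit separator and rows with the record separator.
    static String encode(List<List<String>> rows) {
        List<String> records = new ArrayList<>();
        for (List<String> row : rows) {
            records.add(String.join(UNIT_SEPARATOR, row));
        }
        return String.join(RECORD_SEPARATOR, records);
    }

    // Reverses encode(); the -1 limit keeps trailing empty fields.
    static List<List<String>> decode(String text) {
        List<List<String>> rows = new ArrayList<>();
        if (text.isEmpty()) {
            return rows;
        }
        for (String record : text.split(RECORD_SEPARATOR, -1)) {
            rows.add(Arrays.asList(record.split(UNIT_SEPARATOR, -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Commas and tabs inside the payload need no quoting or escaping.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Doe, Jane", "555-0100", "jane@example.com"),
                Arrays.asList("Tab\tMan", "", ""));
        System.out.println(decode(encode(rows)).equals(rows));
    }
}
```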
Serialization
You can write out the data values stored within an object. This is called serialization. Java has a serialization facility built-in, but be sure to study up on the details.
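A minimal sketch of the built-in facility, assuming a hypothetical `Client` class with two string fields (the question does not say what properties a client has). Anything written this way must implement java.io.Serializable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical data class; the field names are illustrative only.
class Client implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String email;

    Client(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

class ClientSaveState {

    // Writes the whole list as a single serialized object.
    static void save(Path file, List<Client> clients) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(clients));
        }
    }

    // Reads the list back; the cast is unchecked because of type erasure.
    @SuppressWarnings("unchecked")
    static List<Client> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<Client>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("clients", ".ser");
        List<Client> clients = new ArrayList<>();
        clients.add(new Client("Alice", "alice@example.com"));
        save(file, clients);
        for (Client c : load(file)) {
            System.out.println(c.name + " <" + c.email + ">");
        }
        Files.delete(file);
    }
}
```

One of the details worth studying is versioning: if the class changes incompatibly after data has been saved, reading the old file can fail. Only deserialize files you trust.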
To more simply write out an object’s values and later read them back in to reconstitute an object, I have enjoyed using the Simple XML Serialization project. This works well for relatively simple needs, and is aimed at the situation where you want the structure of a class to drive the process of determining what to write.
Java has other XML binding facilities both built-in and third-party. These are much more powerful in their flexibility. They are especially good for when you want to define and verify the XML structure in a rigid fashion such as defining a XML DTD or XML Schema against which to validate the data and perhaps even generate the Java class in which to represent the data.
Embedded database
For more complicated data, use an embedded relational database.
The SQLite database is bundled with many platforms. This is a C-based library, not pure Java. As the name indicates, SQLite is indeed quite “lite”, lacking rigid data types and many other common database features. SQLite is meant more as an alternative to writing text files than as a competitor to more serious databases. It is a great product if your needs fit the sweet spot of its capabilities.
My first choice for an embedded database would be H2 Database Engine. Built in pure Java. Can be run inside your app, or separately as a server (your choice). Has sophisticated relational database features. Has been around for years, often updated, and is well-worn. The principal author has much experience in the field.
For analysis.
I know we can use the save function and then load the model in a Spark application. But that works only in a Spark application (Java, Scala, Python).
We can also use PMML to export the model to other types of application.
Is there any way to use a Spark model in a plain Java application?
I am one of the creators of MLeap. Check us out, it is meant for exactly your use case. If there is a transformer you need that is not currently supported, get in touch with me and we will get it in there.
Our serialization format is solely JSON/Protobuf right now, so it is very portable and supports large models like RandomForest. You can serialize your model to a zip file then load it up wherever.
Take a look at our demo to get a use case:
https://github.com/TrueCar/mleap-demo
Currently no, your options are to use PMML for those models that support it, or write your own framework for using models outside of Spark.
There is movement towards enabling this (see this issue). You could also check out MLeap.
Summary:
I am trying to write a utility program that is based on the information contained in a separate file. The object has to be such that any information on the physical file can be retrieved quickly and can be updated quickly as well.
Details:
The file is a normal ANSI-encoded file that is supposed to store definitions of the physical quantities stated in the SI system. What I really want is to be able to read and write changes to the definitions whenever required. I'll be using markers (like ":") to separate the headings and definitions, like:
Length:metre:m:"..length of path traveled by light in vacuum in 1/299792458th of a second"
and so on.
So in this case is extending RandomAccessFile an option? Will it help me in quick retrieval and syncing of data? Should I use another approach?
If you want these things, then I'd advise you to use an embedded ACID database like H2:
Guarantee that you don't lose changes that you made
Have more than one program access the info
This is because coding up something that correctly does this using low level facilities like RandomAccessFile is quite hard. Storing persistent application state in embedded DBs is commonly done. H2 is probably the most popular among DBs implemented in pure Java.
On how to actually do this, see this: Embedding the Java h2 database programmatically
You probably want to look at an introduction to relational DBs & SQL if you aren't familiar with them.
Basically, I am looking for a simple way to list and access a set of strings in stream form in an abstract manner. The only issue is that Java's file-accessing API can only be used for listing and reading files, and any sort of non-filesystem storage of the data uses a different API. My question is whether there is some common API I could use (whether included in Java or as an external API) so that I could access both in an abstract manner, but also somewhat efficiently.
Essentially I want a set of lazily streamed text files. Something like Set might be reasonable, except on a filesystem, you would have to open the text streams even if you don't end up wanting to access that file.
Some sort of api like
String[] TextStorage.list()
InputStream TextStorage.open(String elementname);
which could abstractly be used to access either filesystems or databases, or some other storage mechanism I invent in the future (maybe fetching something across the internet).
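As a rough sketch of what such an abstraction could look like if written by hand: the interface follows the two methods above, except that it returns a `List<String>` instead of an array. The two implementations (`DirectoryTextStorage` and `InMemoryTextStorage`) are hypothetical names for illustration; a database-backed one would implement the same interface. No stream is opened until `open` is called.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

interface TextStorage {
    // Names only; nothing is opened here.
    List<String> list() throws IOException;

    // Opens a single element on demand; the caller closes the stream.
    InputStream open(String elementName) throws IOException;
}

// Backed by the regular files directly inside one directory.
class DirectoryTextStorage implements TextStorage {
    private final Path directory;

    DirectoryTextStorage(Path directory) {
        this.directory = directory;
    }

    @Override
    public List<String> list() throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(directory)) {
            entries.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.getFileName().toString()));
        }
        Collections.sort(names);
        return names;
    }

    @Override
    public InputStream open(String elementName) throws IOException {
        return Files.newInputStream(directory.resolve(elementName));
    }
}

// Backed by a map; stands in for a database or any other non-file store.
class InMemoryTextStorage implements TextStorage {
    private final Map<String, String> texts = new TreeMap<>();

    void put(String name, String text) {
        texts.put(name, text);
    }

    @Override
    public List<String> list() {
        return new ArrayList<>(texts.keySet());
    }

    @Override
    public InputStream open(String elementName) throws IOException {
        String text = texts.get(elementName);
        if (text == null) {
            throw new FileNotFoundException(elementName);
        }
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```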
Is there a library which already does this? Can I do this with the already existing Java API? Do I need to write this myself? I'd be surprised if no-one has encountered this problem before, but my google-fu and stackoverflow searches don't seem to find anything.
You might use HSQLDB:
http://hsqldb.org/