Best approach storing and accessing Java application data - java

I'm in the middle of a massive refactoring project. The code has a 5000-line main class which was injected into everything, stored everything and held all of the common code.
I'm no expert on analysis and design but I've separated out things to the best of my ability and I'm about 80% through refactoring the classes that depend on the main class to use the new classes I've created.
There are some types of data which are initialised when the application starts and accessed by pretty much everything throughout the life of the application. For instance there is a Config class which holds hundreds of parameters.
The approach I've taken is to create several singletons; the two most central are GUIData and ClientData. GUIData contains a reference to the main frame of the application, and ClientData maintains references to the config and other similar classes.
This allows me to call ClientData.getInstance().getConfig().getParam("param") from anywhere in the code but I don't feel like this is the best approach.
I considered individual static classes instead of these data singletons which contain instances of the classes but some of the classes do need constructors.
I've been googling on and off for a week trying to find a better way to do this, but somehow I always end up on threads talking about database caching.

Immutable (configuration) instances provide "thread-safe application-wide data access".
Typesafe's config (as suggested in a comment by Brian Kent) does exactly that.
Note that this does not involve static classes or singletons. Static classes and singletons may serve your purposes now,
but they could prove bothersome in the future. They can be handy, of course, but try to limit their use.
Initialization will have to be done after reading and parsing the configuration data. It is typically done at application startup, before other processing threads are started. The initialization will have to validate the configuration data as much as possible in order to fail fast and terminate the program if the configuration data is no good.
Having a lot of configuration data bundled together can create "hidden lines of communication". E.g. you update one value and the application fails because it required updates to other values as well. It's perfectly fine to put all configuration data in one file and load it from there, but your application (with hundreds of configuration options) should divide the configuration data in sets that are used by different parts of your application. This improves isolation, helps unit-testing and makes it possible to change the application in the future without getting too many nasty surprises.
There are two ways to use a set of configuration data:
from within an object call a singleton Settings.getInstance().getConfigForThisModule().
provide each object that uses configuration data with the configuration data via the constructor or via setConfig(ConfigForThisModule config).
The first approach depends on a convention not to call Settings.getInstance().getConfigForACompletelyUnrelatedModule() which could be a weakness. The second approach is more in line with "dependency injection" and could be more future proof.
You could mix both approaches while you are refactoring, just make sure to be consistent (e.g. only use the singleton approach for configuration data that is used in all parts of the application).
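The second approach can be sketched roughly as follows. This is only an illustration: MailConfig and MailSender are hypothetical names standing in for one configuration set and one module that uses it.

```java
import java.util.Objects;

// Hypothetical immutable configuration set for one module.
final class MailConfig {
    private final String host;
    private final int port;

    MailConfig(String host, int port) {
        // Validate eagerly so that bad configuration fails fast at startup.
        this.host = Objects.requireNonNull(host, "host");
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.port = port;
    }

    String host() { return host; }
    int port() { return port; }
}

// Hypothetical consumer: it can only see the configuration set it was given.
final class MailSender {
    private final MailConfig config;

    MailSender(MailConfig config) {
        this.config = Objects.requireNonNull(config, "config");
    }

    String describeTarget() {
        return config.host() + ":" + config.port();
    }
}
```

Because MailSender never touches a global, a unit test can hand it any MailConfig it likes, and there is no way for it to reach into an unrelated module's settings.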
To further improve your design for using the configuration data, keep the following (likely) future functional requirement in mind: when the configuration file is updated, configuration data is reloaded and used in the application. Most logging frameworks manage to support this functional requirement without affecting the performance of multi-threaded applications. Among other things, it requires the following of your application:
if the new configuration data is no good, the program is not terminated; instead, an error is logged and the old configuration data remains in use. Your initialization procedure will need to handle both "load at fresh start" and "reload" scenarios. The main thing to take away from this is that your initialization procedure needs to be re-usable and should not affect other (running) parts of your application (isolation, again).
long-lived objects must not keep a local copy of configuration data or a reference to an instance of ConfigForThisModule; instead, Settings.getInstance()... (or some other method that can return an updated instance) should be called regularly.
replacing old configuration with new configuration must not result in errors. Technically, replacing the configuration is as simple as updating an AtomicReference with the new configuration instance that Settings.getInstance()... then returns. But this is also where the isolation of the configuration data sets is tested: there should be no problem using an old set in one module and a new set in another module at the same time.
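A minimal sketch of the reload requirements above, assuming the raw configuration arrives as a Map of strings and that the parser throws a RuntimeException on bad data. ReloadableConfig is a hypothetical name, not part of any library.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical holder for one immutable configuration set of type T.
final class ReloadableConfig<T> {
    private final AtomicReference<T> current;
    private final Function<Map<String, String>, T> parser;

    ReloadableConfig(Map<String, String> raw, Function<Map<String, String>, T> parser) {
        this.parser = parser;
        // Fresh start: a parse failure propagates so the program can terminate.
        this.current = new AtomicReference<>(parser.apply(raw));
    }

    // Callers fetch the instance on each use instead of caching it.
    T get() {
        return current.get();
    }

    // Reload: the old instance stays in use if the new data does not validate.
    boolean reload(Map<String, String> raw) {
        try {
            current.set(parser.apply(raw));
            return true;
        } catch (RuntimeException e) {
            System.err.println("Configuration reload rejected: " + e.getMessage());
            return false;
        }
    }
}
```

The same parser runs for the fresh start and for the reload, which is the re-usability point made above; only the reaction to a failure differs.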
Configuration data can be seen as a sort of "global state". With that in mind, further design points on what to do and what to avoid (partially blatantly copied to this answer) are discussed in the following two questions:
Why is Global State so Evil?
How are globals any different from a database?

Sorry, the question is a bit vague: are you looking to store the config or the cached objects used by other parts of your program?
Since you have hundreds of params, start by splitting up the config into manageable blocks:
1) Split up your configuration parameters into logical blocks that have a 1:1 correspondence with a simple properties file (it's going to take some time).
2) These property files must be externalized so that you can change them at any point in time; make sure that you pass the base location to the program via an environment variable.
3) Write a utility class (singleton) that wraps Apache Commons Configuration to hold your config (read *.properties from the base location and merge the properties into one configuration object). This must be done before any threads are kicked off.
4) Refer to the configuration params in your code using config.getXXXX() methods.
Apache Commons Configuration also has the ability to reload the config when your properties file changes on the filesystem.
Once this is done, use a DI container like Spring or Guice to cache the configured objects.
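To illustrate the "read *.properties from the base location and merge" part of step 3, here is a rough sketch that uses only java.util.Properties as a stand-in; it is not the Apache Commons Configuration API. MergedProperties is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class MergedProperties {
    private MergedProperties() { }

    // Reads every *.properties file in baseDir into one Properties object.
    // If two files define the same key, whichever file is read last wins.
    static Properties loadAll(Path baseDir) throws IOException {
        Properties merged = new Properties();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(baseDir, "*.properties")) {
            for (Path file : files) {
                try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    merged.load(reader);
                }
            }
        }
        return merged;
    }
}
```

The base directory could come from something like System.getenv("APP_CONFIG_DIR"), where APP_CONFIG_DIR is a made-up variable name for this example.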

If it's just String property values you need, you don't even need a class for that - a global facility exists for you already: System.getProperties()
All you need do is first load the property values on start up:
System.setProperty("myKey", "myValue"); // see below for how to load properties from a file
Then read it anywhere in your code:
String myValue = System.getProperty("myKey");
or
String myValue = System.getProperty("myKey", "my desired default");
If your container doesn't support property loading out of the box, to load properties from an external file that looks like this:
key1=value
key2=some other value
etc...
you can use this code:
Files.lines(Paths.get("path/to/file"))
.filter(line -> !line.startsWith("#") && line.contains("=")) // ignore comment/blank
.map(line -> line.split("=", 2)) // split into key/value
.forEach(split -> System.setProperty(split[0], split[1])); // load as property

You can use the java.util.Properties class; it's basically a Hashtable.
reference : https://docs.oracle.com/javase/7/docs/api/java/util/Properties.html
you create a file fileName.properties and store your data in key value pairs, for example:
username=your name
port=8080
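Then you load it into a Properties object and get the data like the following:
// needs java.io.FileInputStream, java.io.InputStream and java.util.Properties
Properties prop = new Properties();
try (InputStream in = new FileInputStream("fileName.properties")) {
    prop.load(in); // load the file
}
String userName = prop.getProperty("username");
String port = prop.getProperty("port"); // you can parse it to int if needed
What I suggest is to create a properties file for each type of configuration, like: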
clientData.properties
appConfig.properties
you can follow this simple tutorial
http://www.mkyong.com/java/java-properties-file-examples/

Related

Should I use an application.properties or create an alone file for huge fields?

We work on a huge application that is integrated with many external APIs,
so we use a lot of static key <=> value fields.
Integrating with just one system, PayPal payments for instance, requires these fields:
payment.paypal.live-mode=false
payment.paypal.url.charge=xxxxxxx
payment.paypal.url.redirect=xxxx
payment.paypal.url.exchange=xxxx
payment.paypal.secret-key= xxxx
payment.paypal.publishable-key= xxxxx
payment.paypal.sources=xxxx,xxx,xxx,xxx
and more fields
What is the best practice to use?
Application.properties or create a new JSON file that handles all these fields.
Note that the application has more than one profile.
Maybe in your case, it makes sense to have a separate file to handle long and specific configurations that are somewhat core to your application. The downside of this is that you would need to handle its parsing on your own (with the help of Jackson for example).
You might also consider using yaml files instead of properties since at least it would avoid repetitions such as payment.paypal.url and it is easier to read and organize.
Another option is to put the data in application.properties and have a component class read it using the @Value annotation. This lets sensitive data be fed in through the environment.
You can use @Configuration with @Profile if you want to, but that might complicate things. You can instead document these properties, perhaps in a JSON file in your resources directory.

Adding Writable objects to Hadoop Configuration

I see that the Configuration class in Hadoop is Writable http://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html. However, I do not see any exposed methods that can be used to add a Writable object (I see a lot of methods to set and get primitive types like int and long). Let us say I have my own Writable object and I want to add it to the configuration for all my mappers and reducers to use. How do I do this?
Thanks,
Venkat
The configuration is really not for passing entire objects. The configuration should be used more for setting simple parameters that are needed for the setup of the Mappers/Reducers. Think of the conf as you set the variables at the beginning of the job. If you make changes during the middle of a run to the configuration, it most likely won't be there at the end as it's not really meant to dynamically pass data.
What you are looking for if you want to pass around entire Objects between nodes is the Distributed Cache. Technically speaking these are files, but you can use standard object serialization to add them. About the Distributed Cache.
*apologies for linking different hadoop versions, their pages are a bit muddled and hard to find what you need sometimes.
You can check the HBase sources (starting from HBase 0.94.6): the MultiTableInputFormat.setConf() method and the corresponding TableMapReduceUtil code (for example .initTableMapperJob()). They pass Scan objects through the configuration. The earlier TableInputFormat.setConf() method uses a very similar mechanism.
Usually only minimal attributes are passed through the config, but this is probably a case closer to yours.
Hope it will help.
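The general idea of carrying a small object inside a string-valued setting can be sketched with plain Java serialization and Base64. This is only an illustration of the technique under those assumptions; it is not HBase's or Hadoop's code, and ObjectStrings is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

final class ObjectStrings {
    private ObjectStrings() { }

    // Serializes an object and encodes the bytes so they fit in a string setting.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses encode(); the caller casts the result to the expected type.
    static Object decode(String text) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }
}
```

The encoded string could then be stored with Configuration.set(name, value) before the job starts and read back with Configuration.get(name) in the mapper's setup. This only makes sense for small objects; anything bigger belongs in the Distributed Cache as described above.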

Design pattern for parameter settings that is maintainable in decent size java project

I am looking for concrete ideas of how to manage a lot of different parameter settings for my java program. I know this question is a bit diffuse but I need some ideas about the big picture so that my code becomes more maintainable.
What my project does is to perform many processing steps on data, mostly text. These processing steps are algorithms of varying complexity that often have many settings. I would also like to change which processing steps are used by e.g. configuration files.
The reason for my program is to do repeatable experiments, and because of this I need to be able to get a complete view of all the parameters used in the different parts of the code, preferably in a nice format.
At this (prototype) stage I have the settings in source code like:
public static final double param1 = 0.35;
and each class that is responsible for some processing step has its own hard coded settings. It is actually quite scary because there is no simple way to change things or to even see what is done and with what parameters/settings.
My idea is to have a central key/value store for all settings that also supports a dump of all settings. Example:
k:"classA_parameter1",v:"0.35"
k:"classC_parameter5",v:"false"
However, I would not really like to just store the parameters as strings but have them associated to an actual java class or object.
Is it smarter to have a singleton "SettingsManager" that manages everything, or to have a SettingsManager object in each class that main has access to? I don't really like storing string descriptions of the settings, but I can't see any other way (let's say one setting is the SAXParser implementation that is used and another parameter is a double, e.g. a percentage), since I really don't want to store them as Objects and cast them.
Experience and links to pages about relevant design patterns are greatly appreciated.
To clarify, my experiments could be viewed as a series of algorithms that are working on data from files/databases. These algorithms are grouped into different classes depending on their task in the whole process, e.g.
Experiment //main
InternetLookup //class that controls e.g. web scraping
ThreadedWebScraper
LanguageDetection //from "text analysis" package
Statistics //Calculate and store statistics
DatabaseAccess
DecisionMaking //using the data that we have processed earlier, make decisions (machine learning)
BuildModel
Evaluate
Each of the lowest-level classes has its own, different parameters, but I still want to get a view of everything that is going on.
You have the following options, starting with the simplest one:
A Properties file
Apache Commons Configuration
Spring Framework
The latter allows creation of any Java object from an XML config file but note that it's a framework, not a library: this means that it affects the design of the whole application (it promotes the Inversion of Control pattern).
This wheel has been invented multiple times already.
From the most basic java.util.Properties to the more advanced frameworks like Spring, which offers advanced features like value injection and type conversion.
Building it yourself is probably the worst approach.
Maybe not a complete answer to your question, but some points to consider:
Storing values as strings (and parsing the strings into other types via your SettingsManager) is the usual approach. If your configuration value is too complex to do this then it's probably not really a configuration value, but part of your implementation.
Consider injecting the individual configuration values required by each class via constructor arguments, rather than just passing in the whole SettingsManager object (see Law of Demeter)
Avoid creating a Singleton SettingsManager if possible, singletons harm testability and damage the design of your application in various ways.
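As a rough sketch of the first two points, assuming settings are kept as strings and parsed on read: SettingsStore is a hypothetical name, and LanguageDetection is borrowed from the question's class list purely for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical string-backed store: values are kept as text and parsed on read.
final class SettingsStore {
    private final Map<String, String> values = new TreeMap<>();

    void put(String key, String value) {
        values.put(key, value);
    }

    double getDouble(String key) {
        return Double.parseDouble(require(key));
    }

    boolean getBoolean(String key) {
        return Boolean.parseBoolean(require(key));
    }

    // One line per setting, sorted by key, e.g. for logging an experiment's parameters.
    String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    private String require(String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing setting: " + key);
        }
        return value;
    }
}

// Processing step that receives only the one value it needs, not the whole store.
final class LanguageDetection {
    private final double threshold;

    LanguageDetection(double threshold) {
        this.threshold = threshold;
    }

    double threshold() {
        return threshold;
    }
}
```

Main would then wire things up with something like new LanguageDetection(store.getDouble("classA_parameter1")) and write store.dump() next to the experiment's results.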
If the number of parameters is big, I would split them into several config files. Apache Commons Configuration, as mentioned by @Pino, is a really nice library for handling them.
On the Java side I would probably create one config class per file and wrap a Commons Configuration object to load settings, e.g.:
class StatisticsConfig {
private Configuration config = ... ;
public double getParameter1() {
return config.getDouble("classA_parameter1");
}
}
This may need quite a lot of boilerplate code if the number of parameters is big, but I think it is quite a clean solution (and easy to refactor).

Configuration design pattern across Java system

The question is old hat - what is a proper design to support a configuration file or system configurations across our system? I've identified the following requirements:
Should be able to reload live and have changes picked up instantly with no redeploying
For software applications that rely on the same, e.g., SQL or memcached credentials, should be possible to introduce the change in an isolated place and deploy in one swoop, even if applications are on separate machines in separate locations
Many processes/machines running the same application supported
And the parts of this design I am struggling with:
Should each major class take its own "Config" class as an input parameter to the constructor? Should there be a factory responsible for instantiating with respect to the right config? Or should each class just read from its own config and reload somewhat automatically?
If class B derives from class A, or composes around it, would it make sense for the Config file to be inherited?
Say class A is constructed by M1 and M2 (M for "main") and M1 is responsible for instantiating a resource. Say the resource relies on MySQL credentials that I expect to be common between M1 and M2, is there a way to avoid the tradeoff of break ownership and put in A's config v. duplicate the resource across M1 and M2's config?
These are the design issues I'm dealing with right now and don't really know the design patterns or frameworks that work here. I'm in Java so any libraries that solve this are very welcome.
You may want to check out Apache Commons Config, which provides a wide range of features. You can specify multiple configuration sources, and arrange these into a hierarchy. One feature of particular interest is the provision for Configuration Events, allowing your components to register their interest in configuration changes.
The goal of changing config on the fly is seductive, but requires some thought around the design. You need to manage those changes carefully (e.g. what happens if you shrink queue sizes - do you throw away existing elements on the queue ?)
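The idea of components registering their interest in changes can be sketched in plain Java like this. It is not the Commons Configuration event API, just a minimal observer under the assumption that values are non-null strings; ObservableConfig is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

final class ObservableConfig {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    // Registers a callback that receives (key, newValue) after each change.
    void addListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    String get(String key) {
        return values.get(key);
    }

    // Stores the value and notifies listeners only if it actually changed.
    void set(String key, String newValue) {
        String old = values.put(key, newValue);
        if (!newValue.equals(old)) {
            for (BiConsumer<String, String> listener : listeners) {
                listener.accept(key, newValue);
            }
        }
    }
}
```

A listener is the natural place to decide what a change means for a running component, for example what to do with queued elements when a queue size shrinks.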
Should each major class take its own "Config" class as an input parameter to the constructor?
No, that sounds like an awful design which would unnecessarily overcomplicate a lot of code. I would recommend you to implement a global Configuration class as a singleton. Singleton means that there is only one configuration object, which is a private static variable of your Configuration class and can be acquired with a public static getInstance() method whenever it is needed.
This configuration object should store all configuration parameters as key/value pairs.
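What this answer describes would look roughly like the following. GlobalConfiguration, setParam and getParam are names chosen for this sketch; loading the values from a file is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class GlobalConfiguration {
    // The single instance, created when the class is first initialized.
    private static final GlobalConfiguration INSTANCE = new GlobalConfiguration();

    // All configuration parameters as key/value pairs.
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Private constructor: nobody else can create a second instance.
    private GlobalConfiguration() { }

    public static GlobalConfiguration getInstance() {
        return INSTANCE;
    }

    public void setParam(String key, String value) {
        values.put(key, value);
    }

    // Returns null if the key has not been set.
    public String getParam(String key) {
        return values.get(key);
    }
}
```

Any class can then call GlobalConfiguration.getInstance().getParam("some.key") without being handed a config object.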

GWT+Java: Globals, Singletons, and Headaches

So here's my project:
I am building a central interface/dashboard to present the test data for several test types of multiple product versions. We're using TestNG on our massive product, and while not enough tests are being written, that's a discussion for another topic. Here's what the directory structure looks like:
Filesystem/productVersion+testType/uniqueDateAndBuildID/testng-results.xml
That results.xml file contains tags with child test tags, which correspond to a filesystem directory and then xml files containing actual test case results (pass, fail, etc)
The XML parsing and filesystem traversal is all well and good/reliable.
Flow of control:
Client accesses main page --> server opens properties file --> server checks for web server property (either Websphere or Tomcat, if I'm working locally) --> server sets bunch of constants based on that. Constants include: root filesystem directory, filesystem separator (translation), "like types (basically same tests on different platforms)", and a base URL to append onto. --> server then reads properties file some more and does all of its XML processing. Results are cached in memory as well as to the filesystem using ObjectOutputStream. --> A big list of results is sent back to the client to do the UI processing/display.
Here's where I run into a problem: I can't access those Global variables (contained/set in a Globals class...bad I know :-/ ) back on the client, even though they're in the shared folder. If you're wondering why I can't just load the properties again, it's because the client is GWT-ified Javascript which doesn't include File(). So my next thought, having done a little bit of upper level Java reading was to maybe use a Globals singleton object and pass that back too..but it seems like that's just as bad if not impossible. Suggestions here would be great.
This whole thing is pretty tightly coupled, something my previous Java education hadn't really gotten into yet. And since this is just an internal portal for devs to check, there doesn't seem to be much of a point in actually testing my code. As long as it displays correctly, logs properly, and handles errors gracefully, right? All in all it's <15 classes, so it's not really a big big deal I guess. Should I refactor to clean it all up and make it "better Java", comment everything to clearly delineate flow of control, or not worry too much about it because it's small? I know in the future to think more about things before I design them, but I really didn't know a large amount of the higher Java principles I've been exposed to since starting.
Edit: after doing a bit of thinking, I came up with a possible workaround. What if, instead of passing back only a list of results, I passed back some other custom list implementation that included a globals 'header' object? I could preserve state.
A simple solution would be the Dictionary class:
Provides dynamic string lookup of key/value string pairs defined in a module's host HTML page. Each unique instance of Dictionary is bound to a named JavaScript object that resides in the global namespace of the host page's window object. The bound JavaScript object is used directly as an associative array.
You just need to add some dynamic content to your host HTML page - make the server print the values read from the properties file in the form of a JavaScript object:
var GlobalProperties = {
property1: "value1",
property2: "value2"
};
Then, use Dictionary in your code to read those values:
Dictionary globalProperties = Dictionary.getDictionary("GlobalProperties");
String property1 = globalProperties.get("property1");
PS: If you are looking for good ideas/advice on how to make your code less coupled -> more testable, I'd recommend Misko Hevery's blog. He's got many interesting posts, like why singletons are usually bad (global state, not the pattern itself). But most importantly, it has an awesome guide to writing testable code (some guidelines used internally at Google).
You could pass those Global variables using a simple object with a HashMap through a GWT-RPC call, or just include this HashMap with the result you already retrieve in the first place (along with the "big list of results [that] is sent back to the client to do the UI processing/display.")
You can't access serverside singletons from the compiled javascript.
You have two options basically. You can make a Serializable class in the client code, that represents the global variables, or pass your global variables object, but this is a rather inefficient solution.
The simplest is to use a HashMap<String, String> in a serializable object, which you can retrieve with an RPC call:
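import java.io.Serializable;
import java.util.HashMap;
public class GwtGlobalVariables implements Serializable {
private HashMap<String, String> map = new HashMap<String, String>();
public void put(String key, String value) { map.put(key, value); } // a delegate put method of choice
public String get(String key) { return map.get(key); } // a delegate get method
public HashMap<String, String> getMap() { return map; } // a getter / setter for the map if you need it
public void setMap(HashMap<String, String> map) { this.map = map; }
}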
Ensure the class is within a GWT module's source folders, i.e. in the same place as your entry point maybe.
Fill the map out with the values needed, pass it through rpc and you have it in your client side code.
