How to efficiently initialize a large, static data table with Tomcat

I am starting with Java/Tomcat, and I am struggling with a problem that was very easy to solve with C++.
My webservice (single webapp) works by using the input values to look up the numeric answer in a large, pre-calculated table. I am struggling with the initialization of this table.
Problem details:
The data table is huge (3000x3000);
The data is pre-computed and this computation is very costly (it takes hours);
The data is static, it will never change after it is calculated for a given instance;
In C++, I would just define a static const array and initialize it inline. I was not able to do this in Java: apparently there is no concept of static data initialization in Java, so the compiler has to generate initialization code, and this code cannot be larger than 64k. In fact, I couldn't even load the file with the static initialization in Eclipse; it would hang.
So I need to initialize the table from a static file on disk. I tried to place a .csv file in WEB-INF/static, but found no way to open it reliably from inside my Java code (the absolute path will be in different places on my development and production environments, for example).
This is my current class definition (with mocked-up data for the initialization):
package com.hmt.restjersey;
public final class G {
static public final float[][] data = new float[3000][3000];
//TODO: actual initialization from file
static {
Logger.writeEventLog("Initializing G table...");
for (int alpha = 0; alpha < 3000; alpha++) {
for (int beta = 0; beta < 3000; beta++) {
data[alpha][beta] = 1.0f / (1 + alpha + beta);
}
}
Logger.writeEventLog("G table initialized.");
}
}
So, my questions:
How to reliably access the data file (WEB-INF/static/data.csv) to initialize the table?
Is a .csv file the best way to load numeric data efficiently?
Also, since the table is huge I would like to have a single instance of it in the server to save memory and speed up initialization. How do I assure that there will be a single instance shared by all servlet processes?

That's my two cents:
Regarding memory sharing: if all your servlets are in the same WAR (webapp), they share static variables (because they use the same classloader), but it's even nicer to use ServletContext, which is meant for exactly this.
As the ServletContext example below shows, you don't necessarily need a static initializer: you can use a ServletContextListener to initialize on application startup (you could also initialize on demand, in the getter of your huge data).
If you'd like to share memory between 2 different WARs, I don't know a straightforward solution. Theoretically it can be shared if the class with the static variable is in TOMCAT_HOME/lib, but IMHO it's confusing and weird.
Putting the calculation in file/storage is a great idea, because you might find yourself restarting Tomcat!
As to how to locate the file, I agree with dmitrievanthony's comment regarding getResourceAsStream. Basically it allows you to take the file from your classpath (the same one used for locating code); one simple example would be putting it in /WEB-INF/classes/data.csv (I personally like it when this approach is wrapped in "Resource" from the Spring framework, but that could be overkill).
Please note: as mentioned in my comment above, I tried to answer your direct questions for the design you chose, but if I were in your shoes I'd stop to reconsider this design (e.g. is it easy to distribute between servers? Is it modular and unit-testable? Could "data.csv" be replaced with a database, or MongoDB, or even a separate "dataService" WAR?). But please ignore this remark if you've already considered it...
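As a rough sketch of the getResourceAsStream approach described above, the following reads a CSV table from the classpath. The class name TableLoader, the resource name, and the file layout (one table row per line, comma-separated values) are assumptions made for illustration; the question does not show the actual file format.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: parses a rows x cols table of floats from CSV text.
final class TableLoader {

    // Assumed layout: one table row per line, values separated by commas.
    static float[][] parseCsv(InputStream in, int rows, int cols) throws IOException {
        float[][] table = new float[rows][cols];
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            for (int r = 0; r < rows; r++) {
                String line = reader.readLine();
                if (line == null) {
                    throw new IOException("Expected " + rows + " rows, found " + r);
                }
                String[] cells = line.split(",");
                if (cells.length != cols) {
                    throw new IOException("Row " + r + " has " + cells.length + " columns");
                }
                for (int c = 0; c < cols; c++) {
                    table[r][c] = Float.parseFloat(cells[c].trim());
                }
            }
        }
        return table;
    }

    // Resolves the resource against the classpath (e.g. WEB-INF/classes in a WAR),
    // so no absolute file-system path is needed.
    static float[][] loadFromClasspath(String resource, int rows, int cols) throws IOException {
        InputStream in = TableLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + resource);
        }
        return parseCsv(in, rows, cols);
    }
}
```

In the webapp this might be called once at startup, for example as TableLoader.loadFromClasspath("/data.csv", 3000, 3000), assuming the file was placed in /WEB-INF/classes.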
Edited: ServletContext example, without static fields:
// Class to encapsulate data:
public class G {
    private final double[][] data;
    private G(double[][] data) { this.data = data; }
    public static G loadData() {
        double[][] data = ...; // complex loading
        return new G(data);
    }
}
// Usage in ServletContextListener:
public class MyListener implements ServletContextListener {
    public void contextInitialized(ServletContextEvent event) {
        G g = G.loadData();
        event.getServletContext().setAttribute("myData", g);
    }
    public void contextDestroyed(ServletContextEvent event) { }
}
// Usage in Servlet:
doGet(...) {
    G g = (G) getServletContext().getAttribute("myData");
}
Singleton pattern alternative (but I suggest care in terms of testability and modularity; you may also want to have a look at frameworks such as Spring MVC, but let's start simple):
// Singleton:
public class G {
    private static volatile G instance;
    private final double[][] data;
    private G(double[][] data) { this.data = data; }
    public static G getInstance() {
        // Not synchronized: relies on the ServletContextListener to initialize once at startup
        if (instance == null) {
            double[][] data = ...; // complex loading
            instance = new G(data);
        }
        return instance;
    }
}
// ServletContextListener:
public void contextInitialized(ServletContextEvent event) {
    G.getInstance();
}
// Usage in servlet:
doGet() {
    G g = G.getInstance(); // I don't like it in terms of OOD, but it works
}
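The answer above does not address whether CSV is the most efficient format. One possible alternative, offered only as a sketch, is to store the values in binary so that no text parsing is needed at startup; a 3000x3000 table of 4-byte floats comes to about 36 MB. The class name BinaryTable and the file layout (row count, column count, then the values) are assumptions, and the table is assumed to be rectangular.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical binary codec: int rows, int cols, then rows*cols big-endian floats.
final class BinaryTable {

    static void write(float[][] table, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(new BufferedOutputStream(out));
        data.writeInt(table.length);
        data.writeInt(table.length == 0 ? 0 : table[0].length);
        for (float[] row : table) {
            for (float value : row) {
                data.writeFloat(value);
            }
        }
        data.flush(); // the caller owns and closes the underlying stream
    }

    static float[][] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        int rows = data.readInt();
        int cols = data.readInt();
        float[][] table = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                table[r][c] = data.readFloat(); // throws EOFException if the data is truncated
            }
        }
        return table;
    }
}
```

The file would be written once after the costly computation and read from the classpath or the file system at startup; whether this is actually faster than CSV for a given deployment is something to measure.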

Related

What is the best way of reading configuration parameters from configuration file in Java?

Let us assume that the configuration details are not known until runtime (the user may need to set these parameters in a config file before running the application).
I want to read those configuration details and reuse them wherever I need them in my application. For that I want to make them global constants (public static final).
So my question is: are there any performance implications if I read the config file directly from the class that needs it? I ask because values known only at runtime cannot be put directly in a separate interface.
I think it will impact performance. Please suggest a better way to do this.
UPDATE: Can I use a separate final class for configuration details?
That is, put all configuration details as constants in a separate public final class
(to read all configuration details at once from the configuration file and store them as global constants for later use in the application).

I am thinking it will impact performance.
I doubt that this will be true.
Assuming that the application reads the configuration file just once at startup, the time taken to read the file is probably irrelevant to your application's overall performance. Indeed, the longer the application runs, the less important startup time will be.
Standard advice is to only optimize for application performance when you have concrete evidence (i.e. measurements) to say that performance is a significant issue. Then, only optimize those parts of your code that profiling tells you are really a performance bottleneck.
Can I use separate final class for configuration details
Yes, it is possible to do that. Nobody is going to stop you (see footnote 1).
However, it is a bad idea. Anything that means that you need to recompile your code to change configuration parameters is a bad idea. IMO.
To read all configuration details at once from the configuration file and storing them as global constants for later use in application.
Ah ... so you actually want to read the values of the "constants" instead of hard-wiring them.
Yes, that is possible. And it makes more sense than hard-wiring configuration parameters into the code. But it is still not a good idea (IMO).
Why? Well lets look at what the code has to look like:
public final class Config {
public static final int CONST_1;
public static final String CONST_2;
static {
int c1;
String c2;
try (Scanner s = new Scanner(new File("config.txt"))) {
c1 = s.nextInt();
c2 = s.next();
} catch (IOException ex) {
throw new RuntimeException("Cannot load config properties", ex);
}
CONST_1 = c1;
CONST_2 = c2;
}
}
The first observation is that it makes no difference that the class is final. It is declaring the fields as final that makes them constant. (Declaring the class as final prevents subclassing, but that has no impact on the static fields. Static fields are not affected by inheritance.)
The next observation is that this code is fragile in a number of respects:
If something goes wrong in the static initializer block, the unchecked exception that is thrown by the block will get wrapped as an ExceptionInInitializerError (yes ... it is an Error!!), and the Config class will be marked as erroneous.
If that happens, there is no realistic hope of recovering, and it is possibly even a bad idea to try to diagnose the Error.
The code above gets executed when the Config class is initialized, but determining when that happens can be tricky.
If the configuration filename is a parameter, then you have the problem of getting hold of the parameter value ... before the static initialization is triggered.
Next, the code is pretty messy compared with loading the state into instance variables. And that messiness is largely a result of having to work within the constraints of static initializers. Here's what the code looks like if you use final instance variables instead.
public final class Config {
public final int CONST_1;
public final String CONST_2;
public Config(File file) throws IOException {
try (Scanner s = new Scanner(file)) {
CONST_1 = s.nextInt();
CONST_2 = s.next();
}
}
}
Finally, the performance benefits of static final fields over final fields are tiny:
probably one or two machine instructions each time you access one of the constants,
possibly nothing at all if the JIT compiler is smart, and you handle the singleton Config reference appropriately.
In either case, in the vast majority of cases the benefits will be insignificant.
1 - OK ... if your code is code-reviewed, then someone will probably stop you.

Have you ever heard of apache commons configuration http://commons.apache.org/proper/commons-configuration/ ?
It is the best configuration reader I have found, and I have been using it in an application that has been running in production for a year. I have never had any issues with it; it is easy to understand and use, and it performs well. I know it adds a dependency to your application, but trust me, you will like it.
All you need to do is
Configuration config = new ConfigSelector().getPropertiesConfiguration(configFilePath);
String value = config.getString("key");
int value1 = config.getInt("key1");
String[] value2 = config.getStringArray("key2");
List<Object> value3 = config.getList("key3");
And that's it. Your config object will hold all the config values, and you can pass that object to as many classes as you want. With so many helpful methods available, you can extract whichever type of value you need.

It will be only a one-time cost if you put them in a property file, read the file at the start of your application, set all the parameters as system properties (System.setProperty), and then define constants in your code like
public static final String MY_CONST = System.getProperty("my.const");
But make sure this initialization happens at the start of your application, before any class that defines such constants is loaded.
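A minimal sketch of that ordering requirement follows. Bootstrap and Constants are illustrative names, and the "unset" default is an addition that makes a missed initialization visible. The constant is read when Constants is first initialized, so Bootstrap.install has to run before any code touches it.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical startup helper: copies entries from a properties source into system properties.
final class Bootstrap {
    static void install(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String key : props.stringPropertyNames()) {
            System.setProperty(key, props.getProperty(key));
        }
    }
}

// Initialized by the JVM on first access; the value is fixed from then on.
final class Constants {
    // "unset" is an assumed default, used only when the property was never installed.
    static final String MY_CONST = System.getProperty("my.const", "unset");
}
```

If Constants were touched before Bootstrap.install ran, MY_CONST would stay "unset" for the lifetime of the class, which is the fragility the answer warns about.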

There are different types of configuration.
Usually some sort of bootstrapping configuration, for example to connect to a database or service, is needed to be able to start the application. The J2EE way to specify database connection parameters is via a 'datasource' specified in your container's JNDI registry (Glassfish, JBoss, Websphere, ...). This datasource is then looked up by name in your persistence.xml. In non-J2EE applications it is more common to specify these in a Spring context or even a .properties file. In any case, you usually need something to connect your application to some sort of data store.
After bootstrapping to a data store an option is to manage config values inside this datastore. For example if you have a database you can use a separate table (represented by e.g. a JPA Entity in your application) for configuration values. If you don't want/need this flexibility you can use simple .properties file for this instead. There is good support for .properties files in Java (ResourceBundle) and in frameworks like Spring. The vanilla ResourceBundle just loads the properties once, the Spring helper offers configurable caching and reloading (this helps with the performance aspect which you mentioned). Note: you can also use Properties backed by a data store instead of a file.
Often both approaches coexist in an application. Values that never change within a deployed application (like the application name) can be read from a properties file. Values that might need to be changed by an application maintainer at runtime without redeployment (e.g. the session timeout interval) might better be kept in a reloadable .properties file or in a database. Values that can be changed by users of the application should be kept in the application's data store and usually have an in-application screen to edit them.
So my advice is to separate your configuration settings into categories (e.g. bootstrap, deployment, runtime and application) and select an appropriate mechanism to manage each. This also depends on the scope of your application, i.e. is it a J2EE web app, a desktop app, a command-line utility, or a batch process?

What kind of configuration file do you have in mind? If it is a properties file, this might suit you:
public class Configuration {
// the configuration file is stored in the root of the class path as a .properties file
private static final String CONFIGURATION_FILE = "/configuration.properties";
private static final Properties properties;
// use static initializer to read the configuration file when the class is loaded
static {
properties = new Properties();
try (InputStream inputStream = Configuration.class.getResourceAsStream(CONFIGURATION_FILE)) {
properties.load(inputStream);
} catch (IOException e) {
throw new RuntimeException("Failed to read file " + CONFIGURATION_FILE, e);
}
}
public static Map<String, String> getConfiguration() {
// ugly workaround to get String as generics
Map temp = properties;
Map<String, String> map = new HashMap<String, String>(temp);
// prevent the returned configuration from being modified
return Collections.unmodifiableMap(map);
}
public static String getConfigurationValue(String key) {
return properties.getProperty(key);
}
// private constructor to prevent initialization
private Configuration() {
}
}
You could also return the Properties object immediately from the getConfiguration() method, but then it could potentially be modified by the code that access it. The Collections.unmodifiableMap() does not make the configuration constant (since the Properties instance gets its values by the load() method after it was created), however since it is wrapped in an unmodifiable map, the configuration cannot be changed by other classes.
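One way to avoid the raw-type workaround in getConfiguration(), shown here as a sketch: Properties.stringPropertyNames() returns only the keys whose key and value are both strings, so a typed copy can be built without an unchecked cast. PropertyMaps is a hypothetical helper name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for exposing Properties as a read-only, typed map.
final class PropertyMaps {

    // Copies the string-valued entries into an unmodifiable Map<String, String>.
    static Map<String, String> toUnmodifiableMap(Properties properties) {
        Map<String, String> copy = new HashMap<>();
        for (String key : properties.stringPropertyNames()) {
            copy.put(key, properties.getProperty(key));
        }
        return Collections.unmodifiableMap(copy);
    }
}
```

Because the result is a copy, later changes to the Properties object are not reflected in the returned map.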

Well, this is a problem everyone faces once in a while. It can be solved by creating a singleton class whose instance variables mirror the entries in the configuration file, with default values. The class should have a method like getInstance() that reads the properties once and afterwards always returns the same object. To locate the file, we can use an environment variable, e.g. System.getenv("Config_path"). A readProperties() method should read each item from the config file and set the corresponding instance variable of the singleton object. A single object then holds all the configuration values, and if a parameter is empty, the default value is used.

One more way is to define a class and read the properties file in that class.
This class needs to be at the Application level and can be marked as Singleton.
Marking the class as Singleton will avoid multiple instances to be created.

Putting configuration keys directly to classes is bad: configuration keys will be scattered over the code. Best practice is separation of application code and configuration code. Usually dependency injection framework like spring is used. It loads a configuration file and constructs the objects using configuration values. If you need some configuration value in your class you should create a setter for this value. Spring will set this value during context initialization.

I recommend using JAXB or a similar binding framework that works with text-based files. A JAXB implementation was bundled with the JRE up to Java 10 (from Java 11 on it is a separate dependency), so it's pretty easy to use. Like Denis, I advise against configuration keys.
Here is a simple example of an easy-to-use and still pretty powerful way to configure your application with XML and JAXB. When you use a DI framework you can just add a similar config object to the DI context.
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class ApplicationConfig {
private static final JAXBContext CONTEXT;
public static final ApplicationConfig INSTANCE;
// configuration properties with defaults
private int number = 0;
private String text = "default";
@XmlElementWrapper
@XmlElement(name = "text")
private List<String> texts = new ArrayList<>(Arrays.asList("default1", "default2"));
ApplicationConfig() {
}
static {
try {
CONTEXT = JAXBContext.newInstance(ApplicationConfig.class);
} catch (JAXBException ex) {
throw new IllegalStateException("JAXB context for " + ApplicationConfig.class + " unavailable.", ex);
}
File applicationConfigFile = new File(System.getProperty("config", new File(System.getProperty("user.dir"), "config.xml").toString()));
if (applicationConfigFile.exists()) {
INSTANCE = loadConfig(applicationConfigFile);
} else {
INSTANCE = new ApplicationConfig();
}
}
public int getNumber() {
return number;
}
public String getText() {
return text;
}
public List<String> getTexts() {
return Collections.unmodifiableList(texts);
}
public static ApplicationConfig loadConfig(File file) {
try {
return (ApplicationConfig) CONTEXT.createUnmarshaller().unmarshal(file);
} catch (JAXBException ex) {
throw new IllegalArgumentException("Could not load configuration from " + file + ".", ex);
}
}
// usage
public static void main(String[] args) {
System.out.println(ApplicationConfig.INSTANCE.getNumber());
System.out.println(ApplicationConfig.INSTANCE.getText());
System.out.println(ApplicationConfig.INSTANCE.getTexts());
}
}
The configuration file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<applicationConfig>
<number>12</number>
<text>Test</text>
<texts>
<text>Test 1</text>
<text>Test 2</text>
</texts>
</applicationConfig>

protected java.util.Properties loadParams() throws IOException {
    // Loads /conf/config.properties from the classpath into a Properties object
    Properties prop = new Properties();
    try (InputStream in = getClass().getResourceAsStream("/conf/config.properties")) {
        if (in == null) {
            throw new FileNotFoundException("/conf/config.properties not found on classpath");
        }
        prop.load(in);
    }
    return prop;
}
Properties prop = loadParams();
String prop1 = prop.getProperty("x.y.z");

Given the prevalence of YML to express configuration, I'd recommend creating a YML file with the configuration inside it and then loading that once, at startup, into a POJO, then accessing the fields of that POJO to get the configuration:
user: someuser
password: somepassword
url: jdbc://mysql:3306/MyDatabase
With Java Class
public class Config {
private String user;
private String password;
private String url;
// getters/setters
}
Jackson can be used to load YAML, as can SnakeYAML directly.
On top of this, you could use the open-source project I've been working on - https://github.com/webcompere/lightweight-config - which allows you to wrap this up, and even express placeholders in your file to interpolate environment variables:
user: ${USER}
password: ${PASSWORD}
url: jdbc://${DB_HOST}:3306/MyDatabase
then
Config config = ConfigLoader.loadYmlConfigFromResource("config.yml", Config.class);

Alternative To Singleton Util Class

So I have a class like so:
public class HBaseUtil {
private final String fileName = "hbase.properties";
private Configuration config;
private HBaseUtil() {
try {
config = new PropertiesConfiguration(fileName);
} catch (ConfigurationException e) {
// some exception handling logging
}
}
// now some getters pulling data out of the config object
public static String getProperty(String fieldKeyName) {...}
public static String getColumnFamily(String fieldName) {...}
// ... some more getters
// NO setters (thus making this a read-only class)
}
Thus, basically I have for myself a Singleton class, that the very first time that it is put to use, sets up a configuration object, and then simply keeps listening for get calls. There are a number of problems with this class:
Unit testing the static methods within class HBaseUtil becomes difficult because of a tight-knit coupling between the Singleton and the configurations file.
What I really want is me being able to supply the filename/filename+path to the class so that it can go in there, read the configuration properties from that file and offer them to incoming read requests. One important note here though: I need this flexibility in specifying the properties file ONLY ONCE per JVM launch. So I certainly don't need to maintain state.
Here is what I was able to come up with:
Instead of a Singleton, I have a normal class with all static methods and no explicit constructor defined.
public class HBaseUtil {
// directly start with getters
public static String getProperty(Configuration config, String fieldKeyName) {...}
public static String getColumnFamily(Configuration config, String fieldKeyName) {...}
// ...and so on
}
And then, instead of using the class in my other code like such:
HBaseUtil.getProperty(String fieldKeyName)
I'd use it like so:
Configuration externalConfig = new PropertiesConfiguration("my-custom-hbase.properties");
HBaseUtil.getProperty(externalConfig, fieldKeyName)
My questions:
Am I even thinking in the right direction? My requirement is to have the flexibility in the class only ONCE per JVM. All that needs to be configurable in my project for this, is the location/contents of the HBase .properties file. I was thinking having a Singleton is overkill for this requirement.
What other better approaches are there for my requirement (stated in above point)?
Thanks!
Note: I've read this StackOverflow discussion, but now it's gotten me even more confused.

You should avoid all static methods and instead design a class which does not mandate its lifecycle: it can be a typical immutable POJO with a public constructor.
Then, when you need it as a singleton, use it as a singleton. For testing, use it in some other way.
Usually, dependency injection is the preferred avenue to solve these problems: instead of hard-coding a pulling mechanism for your configuration object, you have the object delivered to any class which needs it. Then you can decide late what bean you will deliver.
Since you are probably not using Spring (otherwise dependency injection would be your default), consider using Guice, which is a very lightweight and non-intrusive approach to dependency injection.
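As a sketch of what this looks like without any framework: the settings class below is an ordinary immutable object, and the class that needs it receives it through its constructor. HBaseSettings, TableClient, and the key "column.family" are illustrative names, not taken from the question's code, and java.util.Properties stands in for the Commons Configuration object to keep the example dependency-free.

```java
import java.util.Properties;

// Immutable settings object; nothing here dictates that it must be a singleton.
final class HBaseSettings {
    private final Properties values;

    HBaseSettings(Properties source) {
        this.values = new Properties();
        this.values.putAll(source); // defensive copy: later changes to source are not seen
    }

    String getProperty(String key) {
        return values.getProperty(key);
    }
}

// Hypothetical consumer: receives its settings instead of looking them up statically.
final class TableClient {
    private final HBaseSettings settings;

    TableClient(HBaseSettings settings) {
        this.settings = settings;
    }

    String columnFamily() {
        return settings.getProperty("column.family"); // assumed key name
    }
}
```

Production code can create one HBaseSettings at startup and share it, while a unit test simply builds one from an in-memory Properties object.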

Read File Contents Into Memory At Java Compile

This is probably a basic question with a solution I am not aware of. I have an Apache Tomcat web application that hosts a lot of different sites, and each visitor needs access to the contents of an XML file. There are about 6 different XML files that this could be. If I allow the file to be opened each time (the file is used in lots of included pages and assets), I get "too many open files" errors; if I store it in the session, I use too much memory.
What I would like is when I compile the classes to have one class read each of the files into memory and then to access that data like a constant. Is there an easy way of doing this?

This is the classic case where a singleton would be useful. A singleton is often used to load content only once.
A modified example from the wikipedia page on Singletons (http://en.wikipedia.org/wiki/Singleton_pattern):
public class Singleton {
private static final Singleton INSTANCE = new Singleton();
private String xmlFileContents;
private Singleton() {
// Call method to populate xmlFileContents field from XML file
}
public static Singleton getInstance() {
return INSTANCE;
}
public String getXMLFileContents() {
return xmlFileContents;
}
}
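Since the question mentions about six different files, a variation on the singleton idea is a small cache that loads each file at most once and shares the result across requests. This is only a sketch: XmlCache is a hypothetical name, and the loader function (which would read the file into a String) is left to the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: loads each named document at most once and shares it across requests.
final class XmlCache {
    private final Map<String, String> contents = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    XmlCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String name) {
        // computeIfAbsent runs the loader only when the name is not cached yet.
        return contents.computeIfAbsent(name, loader);
    }
}
```

A single XmlCache instance could be stored in the ServletContext at startup, so each file is opened once no matter how many visitors request it.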

Is it OK to use static "database helper" class?

I have some Android projects, and most of them use SQLite databases. I'd like to know whether it is good programming practice (or a bad habit) to use a static class like "DatabaseHelper.class" that contains all the static methods for database manipulation. For example
public static int getId(Context context, String name) {
dbInit(context);
Cursor result = db.rawQuery("SELECT some_id FROM table WHERE some_name = '" + name + "'", null);
result.moveToFirst();
int id = result.getInt(result.getColumnIndex("some_id"));
result.close();
return id;
}
where dbInit(context) (which is used in all my static methods for database manipulation) is
private static void dbInit(Context context) {
if (db == null) {
db = context.openOrCreateDatabase(DATABASE_NAME, Context.MODE_PRIVATE, null);
}
}
Then when I need something I can easily call those method(s) with for example
int id = DatabaseHelper.getId(this, "Abc");
EDIT: Do I have to use dbClose on every connection, or leave it open per activity and close it per activity? Do I have to change the code above to something like this?
...
dbClose();
return id;
}
private static void dbClose() {
if (db != null) {
db.close();
}
}

I would suggest you get into the habit of getting a database connection every time you need one, and releasing it back when you are done with it. The usual name for such a facility is a "database connection pool".
This moves the connection logic out of your actual code and into the pool, and allows you to do many things later when you need them. For example, the pool could log how long each connection object was in use, giving you information about database usage.
Your initial pool can be very simple if you only need a single connection.
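A very simple pool of that kind might look like the following sketch. SingleResourcePool is a hypothetical name, and the type parameter T stands in for whatever connection type is used (on Android that would presumably be the database object); the usage-time bookkeeping illustrates the logging idea mentioned above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical single-resource pool; T stands in for the connection type.
final class SingleResourcePool<T> {
    private final BlockingQueue<T> slot = new ArrayBlockingQueue<>(1);
    private long acquiredAtNanos; // written only by the thread currently holding the resource
    private long totalHeldNanos;  // guarded by this

    SingleResourcePool(T resource) {
        slot.add(resource);
    }

    // Blocks until the resource is free, then hands it out.
    T acquire() throws InterruptedException {
        T resource = slot.take();
        acquiredAtNanos = System.nanoTime();
        return resource;
    }

    // Returns the resource to the pool and records how long it was held.
    synchronized void release(T resource) {
        totalHeldNanos += System.nanoTime() - acquiredAtNanos;
        slot.add(resource);
    }

    synchronized long totalHeldNanos() {
        return totalHeldNanos;
    }
}
```

Callers would pair acquire() with release() in a try/finally block so the resource is always handed back.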

I would definitely have your database related code in a separate class, but would really recommend against using a static class or Singleton. It might look good at first because of the convenience, but unfortunately it tightly couples your classes, hides their dependencies, and also makes unit testing harder.
The drawbacks section in wikipedia gives you a small overview of why you might want to explore other techniques. You can also head over here or over there where they give concrete examples of a class that uses a database access singleton, and how using dependency injection instead can solve some of the issues I mentioned.
As a first step, I would recommend using a normal class that you instantiate in your constructor, for ex:
public class MyActivity extends Activity {
private DBAccess dbAccess;
public MyActivity() {
dbAccess = new DBAccess(this);
}
}
As a second step, you might want to investigate frameworks like RoboGuice to break the hard dependency. Your code would look something like:
public class MyActivity extends Activity {
@Inject private DBAccess dbAccess;
public MyActivity() {
}
}
Let us know if you want more details!

If you're going to use a singleton the very minimum requirement is that you make it stateless/threadsafe. If you use your getId method as it stands concurrent invocations could potentially cause all manner of strange bugs...
dbInit(context);
May be called on Thread A, which then stops processing before reaching the query statement. Thread B then executes getId and also calls dbInit, passing in a different context altogether. Thread A would then resume and attempt to execute the query on B's context.
Maybe this isn't a problem in your application but I'd recommend sticking a synchronized modifier on that getId method!
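To illustrate what the synchronized modifier buys here, the following sketch uses a plain Object as a stand-in for the database handle; LazyResource and initCount are illustrative names. With synchronized, the null check and the assignment happen as one atomic step, so the initialization runs exactly once even when several threads call get() at the same time.

```java
// Hypothetical stand-in for the static helper: the expensive initialization runs exactly once.
final class LazyResource {
    private static Object resource; // guarded by the class lock
    private static int initCount;   // guarded by the class lock

    // synchronized makes the null check and the assignment one atomic step.
    static synchronized Object get() {
        if (resource == null) {
            resource = new Object();
            initCount++;
        }
        return resource;
    }

    static synchronized int initCount() {
        return initCount;
    }
}
```

Note that this only addresses the initialization race; it does not change which context the first caller happened to pass in.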

java Properties - to expose or not to expose?

This might be an age old problem and I am sure everyone has their own ways.
Suppose I have some properties defined such as
secret.user.id=user
secret.password=password
website.url=http://stackoverflow.com
Suppose I have 100 different classes and places where I need to use these properties.
Which one is good
(1) I create a Util class that will load all properties and serve them using a key constant
Such as :
Util is a singleton that loads all properties on the first getInstance() call and keeps them.
Util myUtil = Util.getInstance();
String user = myUtil.getConfigByKey(Constants.SECRET_USER_ID);
String password = myUtil.getConfigByKey(Constants.SECRET_PASSWORD);
..
//getConfigByKey() in turn invokes properties.get(..)
doSomething(user, password)
So wherever I need these properties, I can do steps above.
(2) I create a meaningful Class to represent these properties; say,
ApplicationConfig and provide getters to get specific properties.
So above code may look like:
ApplicationConfig config = ApplicationConfig.getInstance();
doSomething(config.getSecretUserId(), config.getPassword());
//ApplicationConfig would have instance variables that are initialized during
// getInstance() after loading from properties file.
Note: The properties file as such will have only minor changes in the future.
My personal choice is (2) - let me hear some comments?

Do it the most straightforward way (a class with static values):
package com.domain.packagename;
public class Properties {
private static String hostName;
public static String getHostName() { return hostName; }
private static int port;
public static int getPort() { return port; }
public static void load() {
//do IO stuff, probably
hostName = ??;
port = ??;
//etc
}
}

I find the first approach to be more verbose than necessary. (Especially if the properties are not expected to change very much.) Also, by using the second approach you can handle casting/type issues when the properties are loaded instead of when they are used.
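The point about handling type issues at load time can be sketched as follows. TypedConfig and the key "website.port" are assumptions for illustration ("website.url" appears in the question); a malformed or missing value is rejected when the object is constructed rather than at some later point of use.

```java
import java.util.Properties;

// Hypothetical typed view over a properties file; values are validated once, at load time.
final class TypedConfig {
    private final String websiteUrl;
    private final int port;

    TypedConfig(Properties props) {
        this.websiteUrl = require(props, "website.url");
        String rawPort = require(props, "website.port"); // assumed key, not in the question
        int parsed;
        try {
            parsed = Integer.parseInt(rawPort.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("website.port is not an integer: " + rawPort, e);
        }
        this.port = parsed;
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value;
    }

    String getWebsiteUrl() { return websiteUrl; }

    int getPort() { return port; }
}
```

Callers then work with getPort() as an int and never need to parse or cast the raw string themselves.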

Your option (2) to keep application specific getters sounds better and clean.
Exposing public static final keys from an interface has long been considered bad design in Java.

I guess my first question is why you want to create an instance of something you're saying is a singleton (you mentioned using code like Util.getInstance()). A singleton only has 1 instance so you shouldn't try to instantiate multiple copies in your code.
If the data is static (like this appears to be) I'd create a singleton and retrieve the values from it.

I don't think there is any significant advantage of one method over the other and I don't think the solution (1) is more secure, just because it provides a property key instead of a java getter for getting passwords.
If I had to choose one, though, I would take option (2).
