Global Hadoop configuration storage - java

Is there any way to set and get a global Hadoop configuration object, something like the pseudocode below? Of course I can create my own class with static methods that do what I need, but it would be better if something like this already existed in the Hadoop Java API, so as not to make dependencies more complex. So far I have not found anything usable. Any advice?
In some application level configurator.
Configuration conf = new Configuration();
conf.set(...);
<something>.SetGlobalConfig(conf);
In lower layer client code.
Configuration conf = <something>.GetGlobalConfig();
// ... something that needs configuration.
UPDATE 1: I know about Hadoop .xml configuration files and actually it's one of possible solutions but it's preferable to have all configuration done in code, without external files.
UPDATE 2: Based on the points provided, the decision is to use the 'usual' .xml configuration packaged together with the job code. The client can 'tune' some parameters via command-line parameters that keep Hadoop-like semantics because they go through the same Tool. To isolate the rest of the code from Hadoop configuration / user aspects even more, application code requests configuration through a special Configurator singleton.
The original question is considered solved, though I am still open to useful ideas.
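For illustration, the Configurator singleton mentioned in UPDATE 2 could look roughly like the sketch below. This is a minimal sketch under stated assumptions, not part of the Hadoop API: the class and method names are hypothetical, and java.util.Properties stands in for org.apache.hadoop.conf.Configuration so that the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

// Hypothetical application-level holder; NOT part of the Hadoop API.
// java.util.Properties is a stand-in for org.apache.hadoop.conf.Configuration.
final class Configurator {
    private static volatile Properties global;

    private Configurator() {}

    // Called once by the application-level configurator.
    static void setGlobalConfig(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf); // defensive copy: later changes to conf are not visible
        global = copy;
    }

    // Called by lower-layer client code.
    static Properties getGlobalConfig() {
        Properties current = global;
        if (current == null) {
            throw new IllegalStateException("global configuration has not been set");
        }
        Properties copy = new Properties();
        copy.putAll(current); // callers get their own copy
        return copy;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("fs.defaultFS", "hdfs://namenode:8020"); // example value only
        Configurator.setGlobalConfig(conf);
        System.out.println(Configurator.getGlobalConfig().getProperty("fs.defaultFS"));
    }
}
```

Handing out copies keeps lower layers from changing the shared state by accident; whether that is worth the extra allocation depends on how often the configuration is read.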

When you call
Configuration conf = new Configuration();
it looks on the classpath of your project for core-default.xml and core-site.xml.
core-site.xml is exactly where you want to put your "global" configuration.

Related

Spring Web App Deployment:: how do you hide data in application.properties?

Hi everyone!
This is going to be my first time pushing a newly developed Spring Boot app, and I was wondering if there is a way to protect passwords and other sensitive information written in the application.properties file.
Assuming we have the following lines:
# PostgreSQL connection settings
spring.datasource.jdbc-url=jdbc:postgresql://localhost:5432/bdreminder
spring.datasource.username=username
spring.datasource.password=password
The source code is to be first stored on GitHub and having the credentials stored in plain text does not seem to be a good idea.
So, I could probably add the file to .gitignore, or I could set some environment variables on the host, but how would they populate the .properties file afterward? Also, this seems quite cumbersome in terms of scaling later on.
So, basically, I am trying to see how it is done in real life :)
Please, help :)
The simplest option is to create a profile-specific application.properties file and activate that profile. For example, create application-private.properties and activate the profile private. Of course, you have to take care not to commit this file.
Alternatively, and probably a safer option, define a file outside your project and import it in your application.properties with the following line:
spring.config.import=file:../path/to/your/external.properties
Spring Boot has extensive support for external configuration. The usual approach is to use one of environment variables, configuration provided by a platform such as Kubernetes, or a specialized configuration system through Spring Cloud Config; these all keep secrets (or just environment-specific information) entirely outside of the code. They also have the advantages of providing a common style of configuration for other applications that do not use Spring Boot.
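To make the environment-variable option concrete: Spring Boot's documented rule for mapping a property name to an environment variable name is to replace dots with underscores, remove dashes, and convert to uppercase, so spring.datasource.password can be supplied as SPRING_DATASOURCE_PASSWORD without touching the .properties file. The sketch below only illustrates that naming rule and an "environment wins over file" lookup; the class and method names are hypothetical and this is not how Spring implements it internally.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Illustration only; names are hypothetical and this is not Spring code.
final class EnvOverrides {
    private EnvOverrides() {}

    // Dots become underscores, dashes are removed, result is upper-cased.
    static String toEnvName(String propertyKey) {
        return propertyKey.replace(".", "_").replace("-", "").toUpperCase(Locale.ROOT);
    }

    // Prefer the environment variable, fall back to the value from the file.
    static String lookup(String propertyKey, Map<String, String> env, Properties fileProps) {
        String fromEnv = env.get(toEnvName(propertyKey));
        return fromEnv != null ? fromEnv : fileProps.getProperty(propertyKey);
    }

    public static void main(String[] args) {
        Properties fileProps = new Properties();
        fileProps.setProperty("spring.datasource.username", "username");

        // A real application would pass System.getenv() here.
        Map<String, String> env = new HashMap<>();
        env.put("SPRING_DATASOURCE_PASSWORD", "value-from-host");

        System.out.println(toEnvName("spring.datasource.jdbc-url"));
        System.out.println(lookup("spring.datasource.password", env, fileProps));
        System.out.println(lookup("spring.datasource.username", env, fileProps));
    }
}
```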

Multi-file config for Spring Boot app

So, I have a Spring Boot application that is based on a plugin architecture, with its config properties, as you would expect, in an application.yml. Due to the fact that plugins may or may not be enabled however, I am keeping the config for each plugin in a separate file.
What's more, I would like to differentiate between these files (e.g. by naming them differently - preferably after the name of the plugin itself) and not have them all as application.yml.
I know that I can use spring.config.name to add the names of all the property files, depending on what plugin is enabled, but I would like a more dynamic approach.
For example, a config directory, with an application.yml and sub-folders named after each plugin - with a separate application.yml in each one...
Ideally, I would then just set spring.config.location to the path of the config folder and Spring would pick up all these files, by looking up in the sub-folders recursively.
So, my question to you, dear Spring experts, is: what magic dust do I have to sprinkle on my config to make this happen?
Is there any other approach you would recommend I take?

Link to properties file outside webservice

I have a web service that uses Java, REST and Jersey and runs on Tomcat 8. The web service requires access to a database. Depending on where we are in the process, we may be using a test database, a production database or something else. Ideally we would like to be able to set which database to use without requiring a code change and recompile.
The approach we have tried is to have a properties file defining the database parameters and to use an environment variable to point to the file. This has proved troublesome. First, we've had a hard time defining system properties on the Tomcat server that we can read from the application. Also, it seems that all the files have to be on the classpath, i.e. already configured ahead of time and part of the codebase.
This seems like a fairly common scenario, so I'm sure there is a recommended way to handle situations like this?
Zack Macomber has a point here. Don't enable your app/service to look up its settings dynamically.
Make your build process dynamic instead.
Maven, Gradle and friends all provide simple ways to modify the output depending on build parameters and/or tasks/profiles.
In your code always link to the same file (name). The actual file will then be included based on your task and/or build environment. Test config for tests. Production config for production.
In many cases a complete recompilation is not necessary and will therefore be skipped (this depends on your tool, of course).
No code changes at all. Moreover the code will be dumb as hell as it does not need to know anything about context.
Especially when several people work on something, this approach provides the most stable long-term solution. It is customizable for those who need some special, local config and, most importantly, transparent for all who don't need or don't want to know about runtime environment requirements!
We have a similar case. We have created a second web service on the same endpoint (/admin) which we call to set a few configuration parameters. We also have a DB for persisting the configuration once set. To make life easier, we also created a simple UI to set these values. The user configures the values in the UI, the UI calls the /admin web service, and the /admin service sets the configuration in memory (as properties) as well as in the DB. The main web service uses the properties as dynamic configuration.
Note: we use JWT-based authorization to prevent unauthorized access to /admin. Depending on your needs, you can leave it unsecured, use basic HTTP auth, or go with something more detailed.
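The in-memory part of this setup can be sketched as a small thread-safe holder that the /admin handler writes to and the main service reads from. This is only a sketch under assumptions: the class name and the db.url key are hypothetical, and persisting to the DB and the JWT check are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store for dynamic configuration.
// Persistence to a database and authorization are deliberately omitted.
final class DynamicConfig {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Called by the /admin handler when a value is changed.
    void set(String key, String value) {
        values.put(key, value);
    }

    // Called by the main web service on each use, so changes apply immediately.
    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        DynamicConfig config = new DynamicConfig();
        System.out.println(config.get("db.url", "jdbc:postgresql://localhost/test"));
        config.set("db.url", "jdbc:postgresql://localhost/prod"); // example values only
        System.out.println(config.get("db.url", "jdbc:postgresql://localhost/test"));
    }
}
```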
Not sure if in this particular case it is wise, but it is possible indeed to create a .properties file anywhere on the filesystem - and link it into your application by means of a Resources element.
https://tomcat.apache.org/tomcat-8.0-doc/config/resources.html
The Resources element represents all the resources available to the web application. This includes classes, JAR files, HTML, JSPs and any other files that contribute to the web application. Implementations are provided to use directories, JAR files and WARs as the source of these resources and the resources implementation may be extended to provide support for files stored in other forms such as in a database or a versioned repository.
You would need a PreResources element here, linking to a folder, the contents of which will be made available to the application at /WEB-INF/classes.
<Context antiResourceLocking="false" privileged="true" docBase="${catalina.home}/webapps/myapp">
    <Resources className="org.apache.catalina.webresources.StandardRoot">
        <!-- external res folder (contains settings.properties) -->
        <PreResources className="org.apache.catalina.webresources.DirResourceSet"
                      base="/home/whatever/path/config/"
                      webAppMount="/WEB-INF/classes" />
    </Resources>
</Context>
Your application now 'sees' the files in /home/whatever/path/config/ as if they were located at /WEB-INF/classes.
Typically, the Resources element is put inside a Context element. The Context element can be put in a file named after the context path (ROOT.xml for the root context), located at:
$CATALINA_BASE/conf/[enginename]/[hostname]/ROOT.xml
See https://tomcat.apache.org/tomcat-8.0-doc/config/context.html#Defining_a_context
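Once the external folder is mounted at /WEB-INF/classes, settings.properties is on the classpath, so the application can read it like any bundled resource. A minimal sketch, assuming the file name from the comment in the Context example; the class name and the db.url key are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper for reading a .properties file from the classpath.
final class ClasspathSettings {
    private ClasspathSettings() {}

    // Parses properties from any stream; I/O failures become unchecked exceptions.
    static Properties parse(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns empty if the resource is not on the classpath.
    static Optional<Properties> load(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClasspathSettings.class.getClassLoader();
        }
        try (InputStream in = cl.getResourceAsStream(resourceName)) {
            return in == null ? Optional.empty() : Optional.of(parse(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In the web application: ClasspathSettings.load("settings.properties")
        byte[] demo = "db.url=jdbc:postgresql://localhost/test\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parse(new ByteArrayInputStream(demo)).getProperty("db.url"));
        System.out.println(load("settings.properties").isPresent());
    }
}
```

The context class loader is used because, inside Tomcat, it is the web application's class loader, which is the one that sees /WEB-INF/classes.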

Spring Best approach for multiple environments

I have the following:
System A - Authorization (REST API)
System B - Needs to check for auth
System C - Needs to check for auth
System D - Needs to check for auth
And I have several environments:
Development
Homolog
Production
Each one will have a different URL for System A. So I want to create a project that will integrate those systems. Since all systems use Jersey and Spring, I can create one filter (Jersey) that will abort the request if the user is not authorized.
So the idea is to create an Integration System that will be a JAR with Jersey filters and that uses the parent's configuration (the active profile from Spring) to get the correct URL. I might even use this JAR to make System B communicate with System D as well, if I can make this work.
The trick is making this JAR pick up the correct .properties file based on the environment (set on the parent project). To be honest, I don't know where to begin.
Reading the DOCs for Spring Environment I found:
Do not use profiles if a simpler approach can get the job done. If the only thing changing between profiles is the value of properties, Spring's existing PropertyPlaceholderConfigurer / <context:property-placeholder/> may be all you need.
I could have 3 different properties files (development, homolog or production) or I could have one properties file with different keys:
system.a.url.development=http://localhost:8080/systemA/authorize
system.a.url.homolog=http://localhost:8081/systemA/authorize
system.a.url.production=http://api.systemA.com/authorize
What is the best approach? What would you do?
In such a "simple" case I would use only a property file to configure the URLs, and have different config files for different environments (dev, prod, ...), each with one identically named property, e.g.
system.a.url=http://localhost:8081/systemA/authorize
You can manage your property files manually (e.g. outside your jar/war) or you can use maven profiles to make jar/war file specific for your environment. But I don't see the need for spring profiles.
EDIT: Alternatively, you can use environment variables to "configure" settings specific to an environment (what a coincidence in the names :)). Note that you can also have different environments on one machine.
export AUTH_URL="http://localhost:8081/systemA/authorize"
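On the Java side, the shared JAR could read that variable once at startup and fail fast if it is missing. A minimal sketch; the class and method names are hypothetical, and only the AUTH_URL variable name comes from the example above:

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper for the shared integration JAR.
final class AuthUrl {
    private AuthUrl() {}

    // Reads AUTH_URL from the given environment map and fails fast if it is absent.
    static URI fromEnv(Map<String, String> env) {
        String raw = env.get("AUTH_URL");
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalStateException("AUTH_URL is not set");
        }
        return URI.create(raw.trim());
    }

    public static void main(String[] args) {
        // A real application would call AuthUrl.fromEnv(System.getenv()).
        Map<String, String> env =
                Collections.singletonMap("AUTH_URL", "http://localhost:8081/systemA/authorize");
        System.out.println(fromEnv(env));
    }
}
```

Taking the environment as a Map rather than calling System.getenv() inside the method keeps the lookup easy to test.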

Akka BalancingDispatcher Config

I have created a file application.conf in src/main/resources that looks like this:
balancing-dispatcher {
  type = BalancingDispatcher
  executor = "thread-pool-executor"
}
There is nothing else in the file.
Upon creating a new Actor (through my test suite using Akka TestKit) that tries to use the dispatcher, I receive this error message:
[WARN] [04/13/2013 21:55:28.007] [default-akka.actor.default-dispatcher-2] [Dispatchers] Dispatcher [balancing-dispatcher] not configured, using default-dispatcher
My program then runs correctly, albeit using only a single thread.
Furthermore, I intend to package my program into a library. The akka docs state this:
If you are writing an Akka application, keep your configuration in application.conf at
the root of the class path. If you are writing an Akka-based library, keep its
configuration in reference.conf at the root of the JAR file.
I have tried both of these methods so far, but neither has worked.
Any ideas?
Since your application.conf is not found I can only assume that src/main/resources is not part of your build path (cannot comment further without knowing which tool you use for building).
One small thing: why do you use "thread-pool-executor" in there? We found the default "fork-join-executor" to scale better.
Your comment about the one thread suggests that you are creating just one actor. Using a BalancingDispatcher does not automagically create more actors; you will have to tell Akka to do that somehow (e.g. by creating multiple instances of that same actor manually or via a Router).
The question of reference.conf vs. application.conf is more one of the nature of the settings. If your library wants to get its own settings from the config, then default values should go into reference.conf; that is the design concept and the reason why this file is always implicitly merged in. Defaults should only be in that file, never in the code.
