How to manage properties and secret values in Dataflow Java pipelines?


I have some Dataflow pipelines written in Java that run on GCP in different environments/projects (development, UAT, production). Currently, the environment configuration (mainly connection parameters for Cloud SQL instances and BigQuery datasets) is managed using a static map in a Java class (key = env, value = map of properties) and a utility class that dynamically loads additional files from Cloud Storage.
What are the best practices (if any) for managing the configuration in such a context?
Essentially, I see two kinds of configuration parameters:
plain values (something that in a Spring application you'd store in a plain property file)
secret values (property files containing data that must be encrypted - username/password for a database, API keys - something that in a K8S context can be mounted as Secret)
Thanks.

I think you will find the tutorial "Access Secret Manager from Dataflow Pipeline" helpful. It says:
"As of today, Dataflow does not provide native support for storing and
accessing secrets. To secure those secrets, the common approach is to
use Cloud KMS to encrypt the secret and decrypt it when running the
data pipeline. With the newly launched Secret Manager, we can now
store those secrets in Secret Manager and access them from our
pipeline to provide better security and ease of use."
The following code uses Secret Manager SDK to access the secret given a JDBC URL secret name.
private static String jdbcUrlTranslator(String jdbcUrlSecretName) {
    try (SecretManagerServiceClient client = SecretManagerServiceClient.create()) {
        AccessSecretVersionResponse response = client.accessSecretVersion(jdbcUrlSecretName);
        return response.getPayload().getData().toStringUtf8();
    } catch (IOException e) {
        // Keep the original exception as the cause so the failure can be diagnosed.
        throw new RuntimeException("Unable to read JDBC URL secret", e);
    }
}

public static void main(String[] args) {
    PipelineOptionsFactory.register(MainPipelineOptions.class);
    MainPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(MainPipelineOptions.class);
    // The secret is resolved lazily, when the value provider is first read.
    NestedValueProvider<String, String> jdbcUrlValueProvider =
        NestedValueProvider.of(
            options.getJdbcUrlSecretName(), MainPipeline::jdbcUrlTranslator);
    Pipeline pipeline = Pipeline.create(options);
    pipeline
        .apply("SQL Server - Read Sales.Customers_Archive",
            JdbcIO.<KV<Integer, String>>read()
                .withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create(
                        StaticValueProvider.of("com.microsoft.sqlserver.jdbc.SQLServerDriver"),
                        jdbcUrlValueProvider)));
    // Other transforms
    pipeline.run();
}

One way to handle secret values on Google Cloud Platform is to use Secret Manager, which takes care of encryption and access control for the stored passwords.
In your Java code you can use the Google Cloud Secret Manager Maven module to read the stored secret values.
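One way to keep the two kinds of configuration apart is sketched below, under stated assumptions: plain values stay in an ordinary per-environment map (or property file), while secrets are referenced only by name and resolved from Secret Manager at run time. The class name, project IDs, dataset names, and secret ID are all hypothetical placeholders. The `projects/<project>/secrets/<id>/versions/<version>` pattern is the resource-name format that Secret Manager uses.

```java
import java.util.Map;

/** Hypothetical per-environment configuration: plain values plus secret names only. */
final class PipelineEnvironmentConfig {

    // Placeholder values; replace with your own projects and datasets.
    private static final Map<String, Map<String, String>> PLAIN_VALUES = Map.of(
            "dev", Map.of("project", "my-project-dev", "bigQueryDataset", "sales_dev"),
            "prod", Map.of("project", "my-project-prod", "bigQueryDataset", "sales"));

    private PipelineEnvironmentConfig() {}

    /** Returns a non-secret setting, failing fast on unknown environments or keys. */
    static String plainValue(String env, String key) {
        Map<String, String> values = PLAIN_VALUES.get(env);
        if (values == null || !values.containsKey(key)) {
            throw new IllegalArgumentException("No value for " + key + " in environment " + env);
        }
        return values.get(key);
    }

    /** Builds the Secret Manager resource name for the latest version of a secret. */
    static String secretVersionName(String env, String secretId) {
        return String.format("projects/%s/secrets/%s/versions/latest",
                plainValue(env, "project"), secretId);
    }

    public static void main(String[] args) {
        System.out.println(secretVersionName("dev", "jdbc-url"));
    }
}
```

The resulting name could be passed as the pipeline option that jdbcUrlTranslator in the answer above receives, so that no secret value appears in code or in job parameters.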

Related

Couchbase Analytics Java SDK connection creation + Security Roles

I am using the Couchbase Java SDK to query a couchbase analytics service. The process is covered in this tutorial: https://docs.couchbase.com/java-sdk/2.7/analytics-using-sdk.html
The Java SDK provides a Bucket object as the means of accessing couchbase. However a bucket is a separate entity from the analytics data-set. For example, my bucket is called data, and I have an analytics data-set that I want to query called requests.
I cannot find a means of connecting to just the requests data-set. The SDK will only connect to the data bucket. From there I can query the requests data-set by writing some N1QL. This work-around means that the user credentials I'm using to run analytics queries must also have access to my main production data bucket, which I'd rather prevent.
Is there a way to connect to simply the analytics data-set using the SDK?
The code I have currently creating the connection looks like this:
public class CouchbaseConfig {
    @Bean
    public Bucket bucket(CouchbaseProperties properties) {
        return cluster().openBucket("data"); // Changing this to the data-set name returns error
    }

    private Cluster cluster() {
        Cluster cluster = CouchbaseCluster.create("localhost");
        cluster.authenticate("Administrator", "password");
        return cluster;
    }
}
Using the requests data-set name in the bucket name results in this error:
Failed to instantiate [com.couchbase.client.java.Bucket]: Factory method 'bucket' threw exception; nested exception is com.couchbase.client.java.error.BucketDoesNotExistException: Bucket "requests" does not exist.
Using the data bucket name, but authenticating as "analytics-reader" / "password" (a user with only the Analytics Reader role), results in this error:
Could not load bucket configuration: FAILURE({"message":"Forbidden. User needs one of the following permissions","permissions":["cluster.bucket[data].settings!read"]})
The only workaround I have found is to give the analytics-reader user 'Application Access' to the data bucket 😢

Connecting directly to analytics is possible with SDK 3 and Couchbase 6.5. In all previous versions (SDK 2.7 included), the only way to query analytics is to connect to a bucket first.

Azure list Azure Database for PostgreSQL servers in Resource group using Azure Java SDK

What is the best and correct way to list Azure Database for PostgreSQL servers present in my Resource Group using Azure Java SDK?
Currently, we have deployments that happen using ARM templates and once the resources have been deployed we want to be able to get the information about those resources from Azure itself.
I have tried doing in the following way:
PagedList<SqlServer> azureSqlServers = azure1.sqlServers().listByResourceGroup("resourceGrpName");
//PagedList<SqlServer> azureSqlServers = azure1.sqlServers().list();
for (SqlServer azureSqlServer : azureSqlServers) {
    System.out.println(azureSqlServer.fullyQualifiedDomainName());
}
System.out.println(azureSqlServers.size());
But the list size returned is 0.
However, for virtual machines, I am able to get the information in the following way:
PagedList<VirtualMachine> vms = azure1.virtualMachines().listByResourceGroup("resourceGrpName");
for (VirtualMachine vm : vms) {
    System.out.println(vm.name());
    System.out.println(vm.powerState());
    System.out.println(vm.size());
    System.out.println(vm.tags());
}
So, what is the right way of getting the information about the Azure Database for PostgreSQL using Azure Java SDK?
P.S.
Once I get the information regarding Azure Database for PostgreSQL, I would need similar information about the Azure Database for MySQL Servers.
Edit: I have seen this question which was asked 2 years back and would like to know if Azure added Support for Azure Database for PostgreSQL/MySQL servers or not.
Azure Java SDK for MySQL/PostgreSQL databases?

I implemented it in the following way, which can be treated as an alternative approach.
Looking at the Azure SDK for Java repo on GitHub (https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/postgresql), it looks like PostgreSQL support is in beta, so I searched for the artifact on mvnrepository and added the following dependency to my project (azure-mgmt-postgresql is still in beta):
<!-- https://mvnrepository.com/artifact/com.microsoft.azure.postgresql.v2017_12_01/azure-mgmt-postgresql -->
<dependency>
<groupId>com.microsoft.azure.postgresql.v2017_12_01</groupId>
<artifactId>azure-mgmt-postgresql</artifactId>
<version>1.0.0-beta-5</version>
</dependency>
The following is the gist of how I did it in code.
I already have a service principal and its details.
Anyone trying this will need the clientId, tenantId, clientSecret, and subscriptionId, as @Jim Xu explained.
// create the credentials object
ApplicationTokenCredentials credentials = new ApplicationTokenCredentials(clientId, tenantId, clientSecret, AzureEnvironment.AZURE);
// build a rest client object configured with the credentials created above
RestClient restClient = new RestClient.Builder()
    .withBaseUrl(credentials.environment(), AzureEnvironment.Endpoint.RESOURCE_MANAGER)
    .withCredentials(credentials)
    .withSerializerAdapter(new AzureJacksonAdapter())
    .withResponseBuilderFactory(new AzureResponseBuilder.Factory())
    .withInterceptor(new ProviderRegistrationInterceptor(credentials))
    .withInterceptor(new ResourceManagerThrottlingInterceptor())
    .build();
// use the PostgreSQLManager
PostgreSQLManager psqlManager = PostgreSQLManager.authenticate(restClient, subscriptionId);
PagedList<Server> azurePsqlServers = psqlManager.servers().listByResourceGroup(resourceGrpName);
for (Server azurePsqlServer : azurePsqlServers) {
    System.out.println(azurePsqlServer.fullyQualifiedDomainName());
    System.out.println(azurePsqlServer.userVisibleState().toString());
    System.out.println(azurePsqlServer.sku().name());
}
Note: Server class refers to com.microsoft.azure.management.postgresql.v2017_12_01.Server
Also, if you take a look at the Azure class, you will notice this is how they do it internally.
For reference, you can use SqlServerManager sqlServerManager in the Azure class and look at how they have used it and created an authenticated manager in case you want to use some services that are still in preview or beta.

According to my test, you can use the Java SDK module azure-mgmt-resources to do this. For example:
Create a service principal:
az login
# creates a service principal and assigns it the Contributor role
az ad sp create-for-rbac -n "MyApp" --scope "/subscriptions/<subscription id>" --sdk-auth
Code:
String tenantId = "<the tenantId you copy>";
String clientId = "<the clientId you copy>";
String clientSecret = "<the clientSecret you copy>";
String subscriptionId = "<the subscription id you copy>";
// Argument order: client ID, tenant (domain), client secret, environment
ApplicationTokenCredentials creds =
    new ApplicationTokenCredentials(clientId, tenantId, clientSecret, AzureEnvironment.AZURE);
RestClient restClient = new RestClient.Builder()
    .withBaseUrl(AzureEnvironment.AZURE, AzureEnvironment.Endpoint.RESOURCE_MANAGER)
    .withSerializerAdapter(new AzureJacksonAdapter())
    .withReadTimeout(150, TimeUnit.SECONDS)
    .withLogLevel(LogLevel.BODY)
    .withResponseBuilderFactory(new AzureResponseBuilder.Factory())
    .withCredentials(creds)
    .build();
ResourceManager resourceClient = ResourceManager.authenticate(restClient).withSubscription(subscriptionId);
ResourceManagementClientImpl client = resourceClient.inner();
String filter = "resourceType eq 'Microsoft.DBforPostgreSQL/servers'"; // The filter to apply on the operation
String expand = null; // The $expand query parameter. You can expand createdTime and changedTime. For example, to expand both properties, use $expand=changedTime,createdTime
Integer top = null; // The number of results to return. If null is passed, returns all resources.
PagedList<GenericResourceInner> results = client.resources().list(filter, expand, top);
while (true) {
    for (GenericResourceInner resource : results.currentPage().items()) {
        System.out.println(resource.id());
        System.out.println(resource.name());
        System.out.println(resource.type());
        System.out.println(resource.location());
        System.out.println(resource.sku().name());
        System.out.println("------------------------------");
    }
    if (results.hasNextPage()) {
        results.loadNextPage();
    } else {
        break;
    }
}
You can also use the Azure REST API to do this. For more details, please refer to https://learn.microsoft.com/en-us/rest/api/resources/resources
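As a rough sketch of the REST route, the helper below only builds the request URI for listing resources of one type in a resource group; it does not send anything. The class and method names are hypothetical, and the api-version value is an assumption that should be checked against the REST reference linked above. The filter expression is the same one used in the SDK code.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the "list resources in a resource group" URI. */
final class AzureResourceListUri {

    // Assumed API version; verify it against the REST API reference before use.
    private static final String API_VERSION = "2021-04-01";

    private AzureResourceListUri() {}

    static URI forResourceType(String subscriptionId, String resourceGroup, String resourceType) {
        String filter = "resourceType eq '" + resourceType + "'";
        // URLEncoder produces form encoding, so turn '+' back into %20 for a URI query.
        String encodedFilter = URLEncoder.encode(filter, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create("https://management.azure.com/subscriptions/" + subscriptionId
                + "/resourceGroups/" + resourceGroup
                + "/resources?$filter=" + encodedFilter
                + "&api-version=" + API_VERSION);
    }

    public static void main(String[] args) {
        System.out.println(forResourceType("<subscription id>".replace(" ", "-").replace("<", "").replace(">", ""),
                "resourceGrpName", "Microsoft.DBforPostgreSQL/servers"));
    }
}
```

The request itself would be a GET with an Authorization: Bearer header carrying a token for the service principal. Passing Microsoft.DBforMySQL/servers as the resource type should cover the MySQL case from the question in the same way.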

How to secure the password used by TextEncryptor in spring boot

I need to encrypt some details for the users of my application (not the password; I am using bcrypt for that). I will need to read these details at some point in the future, so I must be able to decrypt them. To do that, I have the following class in my Spring Boot application. My question is: how do I secure the password used to encrypt the text?
import org.springframework.security.crypto.encrypt.Encryptors;
import org.springframework.security.crypto.encrypt.TextEncryptor;
public class Crypto
{
    final static String password = "How_to_Secure_This_Password?";

    public static String encrypt(String textToEncrypt, String salt)
    {
        if (textToEncrypt != null && !textToEncrypt.isEmpty())
        {
            TextEncryptor encryptor = Encryptors.text(password, salt);
            String encryptedText = encryptor.encrypt(textToEncrypt);
            return encryptedText;
        }
        return null;
    }

    public static String decrypt(String encryptedText, String salt)
    {
        if (encryptedText != null && !encryptedText.isEmpty())
        {
            TextEncryptor decryptor = Encryptors.text(password, salt);
            String decryptedText = decryptor.decrypt(encryptedText);
            return decryptedText;
        }
        return null;
    }
}
From my research so far, I can suggest the following solutions:
1- Get the password from a properties file and use Spring Cloud Config's encryption/decryption feature for the properties file (values prefixed with the string {cipher}); a good starting point is here. I don't like this solution because I don't need the client/server config structure, and I'd rather not adopt it for the sake of one feature. I believe the Spring Framework should have a similar feature.
2- Use the Jasypt library, or its 'unofficial' support for Spring Boot from here. Again, I am not sure whether the problem is just a matter of encrypting this password in a properties file.
3- Use Vault, which looks built for something similar to what I need here (API keys, secrets, etc.), but it is too much overhead to build, maintain, and integrate.
My argument here is that if an attacker is able to access my database machine(s), then they will most likely be able to access the application machine(s) too, which means they may be able to reverse-engineer the class and decrypt all the details I want to secure. I feel confused here: what is the best practice and the industry standard?

The best solution so far is to use Spring Cloud Vault. As I am already using Spring Boot, it can offer more than securing this one password: it can also secure API keys, database passwords, and so on (it is an RC release at the time of writing). However, I am not yet convinced that this is the ultimate solution, because my application still needs to authenticate against Vault. I have to say, though, that this is done in a more advanced way and is one step further than keeping passwords in config files.
This is a chicken-and-egg problem, and it turns out that SO has many similar questions for similar scenarios (saving a database password in config, hiding it in code, hiding a password in a PBE store, etc.).
This is well explained by Mark Paluch in his getting started article:
Encrypted data is one step better than unencrypted. Encryption imposes on the other side the need for decryption on the user side which requires a decryption key to be distributed. Now, where do you put the key? Is the key protected by a passphrase? Where do you put the passphrase? On how many systems do you distribute your key and the passphrase?
As you see, encryption introduces a chicken-egg problem. Storing a
decryption key gives the application the possibility to decrypt data.
It also allows an attack vector. Someone who is not authorized could
get access to the decryption key by having access to the machine. That
person can decrypt data which is decryptable by this key. The key is
static so a leaked key requires the change of keys. Data needs to be
re-encrypted and credentials need to be changed. It’s not possible to
discover such leakage with online measure because data can be
decrypted offline once it was obtained.
.......
Vault isn’t the answer for all security concern. It’s worth to check
the Vault Security Model documentation to get an idea of the threat
model.
Ironically enough, the Vault storage backend needs to be configured with plain-text passwords in most cases (MySQL, S3, Azure, ...). I am not accepting this as an answer to my question yet, but this is what I have found so far. I am waiting for more input from fellow SO contributors, with thanks!

Vault is a good solution, but another option is to provide the password manually when you initialise the component, so that it is held only in memory and never written to any config file.
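A minimal sketch of that idea, assuming the password is supplied once at start-up (for example typed by an operator) and then kept only in memory. EncryptionPasswordHolder and its methods are hypothetical names, not part of Spring.

```java
import java.util.Arrays;

/** Hypothetical in-memory holder for a password supplied at start-up. */
final class EncryptionPasswordHolder {

    private static volatile char[] password;

    private EncryptionPasswordHolder() {}

    /** Stores a private copy of the password and wipes the caller's array. */
    static void initialise(char[] supplied) {
        if (supplied == null || supplied.length == 0) {
            throw new IllegalArgumentException("Password must not be empty");
        }
        password = Arrays.copyOf(supplied, supplied.length);
        Arrays.fill(supplied, '\0');
    }

    /** Returns the password, or fails if none has been supplied yet. */
    static String get() {
        char[] current = password;
        if (current == null) {
            throw new IllegalStateException("Password has not been supplied yet");
        }
        return new String(current);
    }
}
```

At start-up the application could call EncryptionPasswordHolder.initialise(System.console().readPassword("Encryption password: ")), bearing in mind that System.console() returns null when no terminal is attached, and the Crypto class could then call Encryptors.text(EncryptionPasswordHolder.get(), salt) instead of using a hard-coded constant. The trade-off is that someone has to supply the password on every restart, and the value is still readable by anyone who can inspect the process memory.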

Documents added from java code not syncing with couchbase lite

I set up syncing between a local Couchbase Server and my Android and iOS applications, and it works in both directions (mobile to server and server to mobile). I then inserted a document into the local server from a Java web application, which succeeded. The problem is that the document inserted by the Java web application does not sync to either the iOS or the Android application. My Java code for inserting the document into the local server is as follows:
public class CouchBase {
    public static void main(String args[]) {
        Cluster cluster = CouchbaseCluster.create("127.0.0.1");
        Bucket bucket = cluster.openBucket("test");
        JsonObject user = JsonObject.empty()
            .put("name", "amol")
            .put("city", "mumbai");
        JsonDocument doc = JsonDocument.create("102", user);
        bucket.insert(doc);
        System.out.println(doc.content().getString("name"));
    }
}
In this code I open a bucket, create a JSON object holding the required values, wrap that object in a JSON document, and finally insert the document into the bucket.
Now my mobile side code to create document:
Document document = database.getDocument(etId.getText().toString());
Map<String, Object> map = new HashMap<String, Object>();
map.put("name", etName.getText().toString());
map.put("city", etCity.getText().toString());
try {
    document.putProperties(map);
} catch (CouchbaseLiteException e) {
    Log.e(TAG, "Error putting", e);
}
In this code I simply create a document and put values in it.
My syncing code is as follows:
Replication pullReplication = database.createPullReplication(syncUrl);
Replication pushReplication = database.createPushReplication(syncUrl);
pullReplication.setContinuous(true);
pushReplication.setContinuous(true);
pullReplication.start();
pushReplication.start();
This sets up bidirectional syncing.
I can't see what is wrong with my Java code. Please help me solve this problem.

Sync Gateway doesn't track documents inserted through the Couchbase Server Java SDK. It is also not advised to insert data directly into the Sync Gateway bucket through the Java SDK; you can use bucket shadowing for that.
If you want to insert data through your web application, you can use the Sync Gateway REST API: http://developer.couchbase.com/documentation/mobile/1.1.0/develop/references/sync-gateway/rest-api/index.html

At the time of this writing, it's not possible to use the Server SDKs on the bucket used by Sync Gateway. That's because when a new document revision is saved in a Sync Gateway database, it goes through the Sync Function, which routes documents to channels and grants users and roles access to channels. Some of that metadata is persisted under the _sync property of the document in Couchbase Server. The Server SDKs are not currently aware of the revision-based system, so they update the fields of the document without creating a new revision.
The recommended way to read/write the Sync Gateway data from a Java Web app is to use the Sync Gateway REST API.
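As a rough illustration, the sketch below builds (but does not send) a PUT request that would create the same document through the Sync Gateway public REST API. It assumes Sync Gateway listens on its default public port 4984 on localhost and that the Sync Gateway database is also called test; both are assumptions, as are the class and method names. It uses the java.net.http client, so it needs Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that prepares Sync Gateway REST requests. */
final class SyncGatewayRequests {

    private SyncGatewayRequests() {}

    /** Serialises a flat map of strings to JSON; a real application would use a JSON library. */
    static String toJson(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            json.add(quote(entry.getKey()) + ":" + quote(entry.getValue()));
        }
        return json.toString();
    }

    // Minimal escaping (backslash and double quote only), enough for this illustration.
    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds PUT {baseUrl}/{db}/{docId} with the document body as JSON. */
    static HttpRequest putDocument(String baseUrl, String db, String docId, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + db + "/" + docId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(toJson(fields)))
                .build();
    }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Because the write goes through Sync Gateway, it gets a revision and passes through the Sync Function, so the mobile replications can pick it up. Updating an existing document additionally requires its current revision ID.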

Querying Views in Couchbase, Java Client

I'm using the 1.4.3 version of the java client and am attempting to connect to the Couchbase server I have running locally but I'm getting auth errors. After looking through the code (isn't open source great?) of how their client library is using the variables amongst their classes I've come to the conclusion that if I want to be able to connect to a "bucket" that I have to create a user for each "bucket" with the same user name as that bucket. This makes no sense to me. I have to be wrong. Aren't I? There has to be another way. What is that way?
For reference, here is what I'm using to create a connection (it's Scala but would look nearly identical in Java):
val cf = new CouchbaseConnectionFactoryBuilder()
  .setViewTimeout(opTimeout)
  .setViewWorkerSize(workerSize)
  .setViewConnsPerNode(conPerNode)
  .buildCouchbaseConnection(nodes, bucket, password)
new CouchbaseClient(cf)
which follows directly from their examples.
Their Code
If I look into the code in which they're connecting to the "view" itself I see the following:
public ViewConnection createViewConnection(
    List<InetSocketAddress> addrs) throws IOException {
  return new ViewConnection(this, addrs, bucket, pass);
}
which is then passed to a constructor:
public ViewConnection(final CouchbaseConnectionFactory cf,
    final List<InetSocketAddress> seedAddrs, final String user,
    final String password) //more code...
and that user variable is actually used in the HTTP Basic Auth to form the Authentication header. That user variable being, of course, equivalent to the bucket variable in the CouchbaseConnectionFactory.

You are correct: each bucket is authenticated with the bucket name as the user. However, there aren't any users to 'create'. You're just using whatever (bucket) name and password you set up when you created the bucket in the Cluster UI.
Note that people usually use one bucket per application (don't think bucket == table, think bucket == database) and so you wouldn't typically need more than a couple of buckets for most applications.
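To make the mechanism from the question concrete, here is a small standalone sketch of how an HTTP Basic Auth header is formed when the bucket name sits in the user position. It is an illustration only, not the client library's actual code, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: builds a Basic Auth header with the bucket name as the user. */
final class BucketBasicAuth {

    private BucketBasicAuth() {}

    static String authorizationHeader(String bucket, String password) {
        // Basic Auth is "Basic " followed by base64("user:password").
        String credentials = bucket + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

So for a bucket created with a given name and password, the value sent in the Authorization header is just those two strings joined with a colon and Base64-encoded; no separate user account is involved.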
