I am using CloudBees to deploy my Java EE application. The application needs to write and read files, but I can't find any cloud file system offered by CloudBees. Please suggest a free cloud file storage service and the Java code to access it.
Using jclouds you can store files in several different clouds while using a consistent API: http://www.jclouds.org/
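For instance, a minimal sketch (not an official jclouds example) of writing a file through the portable BlobStore API; the provider string and credentials are placeholders, and switching clouds is mostly a matter of changing them:

    import org.jclouds.ContextBuilder;
    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.BlobStoreContext;
    import org.jclouds.blobstore.domain.Blob;

    public class BlobWriter {
        public static void main(String[] args) {
            // Placeholder provider and credentials; "aws-s3" is one of many providers.
            try (BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
                    .credentials("accessKey", "secretKey")
                    .buildView(BlobStoreContext.class)) {
                BlobStore store = context.getBlobStore();
                store.createContainerInLocation(null, "my-container");
                Blob blob = store.blobBuilder("notes.txt")
                                 .payload("hello from jclouds")
                                 .build();
                store.putBlob("my-container", blob);
            }
        }
    }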
You can store files, but they will be ephemeral and not shared across the cluster. To get persistent, shared storage you would need to store the files in a database, S3, or similar (WebDAV is also an option).
The file system on RUN@cloud is indeed neither persistent nor distributed. Files stored there will "disappear" when the application is redeployed or restarted, and will not be replicated if the application scales out to multiple nodes in a cluster.
The best option, as far as I know, is to use a storage service (Amazon S3, to benefit from low network latency from your RUN instance) through the provider-neutral jclouds API (http://www.jclouds.org/documentation/quickstart/aws/). jclouds can also be configured to use filesystem storage (http://www.jclouds.org/documentation/quickstart/filesystem/) so that you can test on your own computer, and you can cache the file store content in the temp directory, as defined by System.getProperty("java.io.tmpdir"), to get the best performance.
This will require a "warm-up" phase for the cache to be populated from the file store, but you'll then get the best of both worlds.
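As a hedged sketch of the local-testing setup described above, here is the jclouds filesystem provider pointed at the JVM temp directory (property name as documented in the jclouds filesystem quickstart):

    import org.jclouds.ContextBuilder;
    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.BlobStoreContext;
    import org.jclouds.filesystem.reference.FilesystemConstants;

    import java.util.Properties;

    public class LocalBlobStore {
        public static void main(String[] args) {
            // Back the blob store with the local temp dir for testing or caching.
            Properties overrides = new Properties();
            overrides.setProperty(FilesystemConstants.PROPERTY_BASEDIR,
                                  System.getProperty("java.io.tmpdir"));

            try (BlobStoreContext context = ContextBuilder.newBuilder("filesystem")
                    .overrides(overrides)
                    .buildView(BlobStoreContext.class)) {
                BlobStore store = context.getBlobStore();
                store.createContainerInLocation(null, "test-container");
            }
        }
    }

Because the code depends only on the portable API, pointing it back at "aws-s3" in production requires no changes beyond the provider and credentials.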
We have a web-based Java application which we are planning to migrate to the cloud, with the intention that multiple clients will use it in a SaaS environment. The current architecture of the application is quite asynchronous in nature. There are 4 different modules, each having a database of its own. When data needs to be exchanged between the modules, we push it using Pentaho and use a directory structure to store the interim data file, which is then picked up by the other module to populate its database. Given the nature of our application, this asynchronous communication is very important for us.
Now we are facing a couple of challenges while migrating this application to the cloud:
We are planning to use multi-tenancy on our database server, but how do we ensure that the flat files we use for transferring data between the different modules are also channeled to their respective tenants in the DB?
Since we are planning to host this in the cloud, we would like your views on whether keeping a text file on a cloud server is safe from a data security perspective.
File storage in the cloud is safe, and you can use IAM roles to control the permissions on a file. Cloud providers like Google (Cloud Storage) and Amazon (AWS S3) provide a secure and scalable infrastructure for maintaining files in the cloud.
In a typical setup, cloud storage provides you with buckets tagged with a globally unique identifier. For a multi-tenant setup you can create a separate bucket per tenant and store the necessary data feeds in it. Next, you can have batch or streaming jobs using Kettle (Pentaho) push the data to the right database based on the unique bucket name.
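For illustration, a minimal sketch of routing per-tenant flat files into per-tenant buckets with the AWS SDK for Java; the bucket naming scheme and tenant ids here are assumptions, not part of the setup above:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    import java.io.File;

    public class TenantFeedStore {
        private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Hypothetical convention: one bucket per tenant; names must be globally unique.
        private String bucketFor(String tenantId) {
            return "myapp-feeds-" + tenantId;
        }

        /** Upload an interim data file into the tenant's own bucket. */
        public void pushFeed(String tenantId, File feed) {
            s3.putObject(bucketFor(tenantId), feed.getName(), feed);
        }
    }

A downstream Kettle job can then read only from its tenant's bucket, which keeps the per-tenant channeling explicit.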
Alternatively, as other answers suggest, you can push the data to a streaming setup (ActiveMQ, Kafka, etc.) with tenant-specific topics, and have a streaming service (in Java or Pentaho) ingest the data into the respective database based on the topic.
Hope this helps :)
I cannot realistically give any specific advice without knowing more about your system. However, based on my experience, I would recommend switching to message queues; something like Kafka would work nicely.
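As a hedged sketch of that queue-based approach, here is a Kafka producer publishing each data exchange onto a per-tenant topic (broker address, topic naming, and serialization are assumptions):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class FeedPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String tenantId = "tenant-42";                // hypothetical tenant id
                String payload = "...interim data record...";
                // One topic per tenant keeps each tenant's feed isolated.
                producer.send(new ProducerRecord<>("feeds-" + tenantId, payload));
            }
        }
    }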
Yes, cloud providers offer enough security for static file storage. You can limit access however you see fit, for example using AWS S3.
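As one example of limiting access, S3 objects can stay private and be exposed only through short-lived pre-signed URLs; a minimal sketch with the AWS SDK for Java, where the bucket and key names are placeholders:

    import com.amazonaws.HttpMethod;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    import java.net.URL;
    import java.util.Date;

    public class SignedLink {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // The link expires in 15 minutes; until then the private object is readable.
            Date expiration = new Date(System.currentTimeMillis() + 15 * 60 * 1000);
            URL url = s3.generatePresignedUrl("my-secure-bucket", "data/feed.txt",
                                              expiration, HttpMethod.GET);
            System.out.println(url);
        }
    }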
1. Multi-tenancy may create some issues while transferring the files, but from the information you have given, the flat file movement across the application will not be impacted. Still, you could consider moving to an MQ-based mode for passing the data across.
2. From a data security view, AWS provides many features at the access level, MFA, etc. If it needs to be highly secure, I would recommend an AWS private cloud (VPC), where nothing is shared with anyone at any level.
My Apache Spark application takes various input files and stores the results and logs in other files. The input files are provided along with the application which is supposed to run on the Amazon cloud (EMR seemed preferable to EC2).
Now, I know that I'm supposed to create an uber-jar containing my input files and the application that accesses them. However, how do I retrieve the generated files from the cloud, once the execution finishes?
As additional info, the files are created and written using relative paths in the code.
Assuming you mean that you want to access the output generated by the Spark application outside the cluster, the usual thing to do is to write it to S3. Then you may of course read the data directly from S3 from outside the EMR cluster.
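A minimal sketch of writing results to S3 from a Java Spark job on EMR; the bucket path is a placeholder, and EMR's EMRFS resolves the s3:// scheme:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.Arrays;

    public class SaveToS3 {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SaveToS3");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<String> results = sc.parallelize(Arrays.asList("a", "b", "c"));
                // Target an S3 bucket instead of a relative local path, so the
                // output survives cluster termination and is readable from outside.
                results.saveAsTextFile("s3://my-bucket/output/run-001");
            }
        }
    }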
I use a Neo4j database to store image ids, but what is the best way to structure my folders so that the file system can access the images as fast as possible?
Store all images in a single folder
Create new folders when the original reaches some maximum size
Store all images in an individual folder per user, place, service, etc.
Or something else...
Serving a file is a low-level operation, but Java runs in a VM. You are better off using an Apache httpd server to serve the images at a low level, and a Tomcat for the application, connected to Apache via AJP.
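On the folder-structure question itself, a common approach (a hedged sketch, not from the answer above) is to shard images into subdirectories derived from a hash of the id, so that no single directory grows unbounded:

    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ImagePaths {
        private static final Path ROOT = Paths.get("/var/images"); // assumed image root

        /** Map an image id to a two-level directory, e.g. 3f/a2/<id>.jpg. */
        public static Path pathFor(String imageId) {
            String hex = String.format("%08x", imageId.hashCode());
            String level1 = hex.substring(0, 2);   // up to 256 first-level dirs
            String level2 = hex.substring(2, 4);   // up to 256 subdirs in each
            return ROOT.resolve(level1).resolve(level2).resolve(imageId + ".jpg");
        }
    }

Because the layout is a pure function of the id stored in Neo4j, no extra path needs to be persisted in the database.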
I am maintaining/developing a web application which is deployed on multiple nodes of a WebSphere cell. There are two nodes in the WAS cell. Each node has a web server on which my web application is deployed, so there are two instances of the web application.
I can use the URL provider to read the property file from the web application. (Reference)
But I have to maintain an identical property file on each server; when I need a change, I have to change it on both servers.
Is there any way I can maintain a single property file and access it from the web application deployed in different places? Or is there a better way to do this?
If you read your property files using a URL resource (a good practice), then you can host your property file on a single internal web server. The URL resource reference in each of your web containers would point to this internal web server, and you would then only have to change the property file in the internal web server's document root.
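A minimal sketch of the reading side, loading a java.util.Properties file over HTTP; the URL is a placeholder, and in WebSphere it would normally come from the configured URL resource rather than being hard-coded:

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Properties;

    public class RemoteConfig {
        public static Properties load() throws Exception {
            // Placeholder address for the internal configuration web server.
            URL source = new URL("http://config.internal.example.com/app.properties");
            Properties props = new Properties();
            try (InputStream in = source.openStream()) {
                props.load(in);
            }
            return props;
        }
    }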
This practice has several drawbacks.
Security - By externalizing your configuration, you now have another attack vector. You could apply mutual-auth SSL to this scenario, but that gets more complicated than simply maintaining two property files.
Availability - Now your internal web server is a single point of failure. You could cluster it; but then you have more servers to manage, precisely what you were trying to avoid.
Latency - Reading configuration off-box involves more latency than reading it from the local file system.
I believe the property file per node is going to work best. If copying a file twice is so much more onerous than copying it once, just script it; that will scale to however many nodes you opt to deploy.
CAVEAT: This is a bit outside my experience, but if I'm understanding your question correctly...
If you have federated the servers as a single cluster, I believe the Intelligent Management tools can take care of copying an equivalent configuration out to all of them. Each would then read the configuration information from their own local copy.
If I have an application that stores information in its datastore, is there a way to access that same datastore from a second application?
Yes you can, with the Remote API.
For example, you can use Remote API to access a production datastore from an app running on your local machine. You can also use Remote API to access the datastore of one App Engine app from a different App Engine app.
You need to configure the servlet (see the documentation for that) and import appengine-remote-api.jar into your project (you can find it in ..\appengine-java-sdk\lib\).
Just remember that ancestor queries do not work over the Remote API (see this).
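A hedged sketch of the client side, based on the documented RemoteApiInstaller flow; the server host is a placeholder, and the credential call varies between SDK versions:

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.tools.remoteapi.RemoteApiInstaller;
    import com.google.appengine.tools.remoteapi.RemoteApiOptions;

    public class RemoteDatastoreClient {
        public static void main(String[] args) throws Exception {
            RemoteApiOptions options = new RemoteApiOptions()
                    .server("other-app.appspot.com", 443)    // placeholder app host
                    .useApplicationDefaultCredential();      // newer SDKs; older ones used credentials()
            RemoteApiInstaller installer = new RemoteApiInstaller();
            installer.install(options);
            try {
                // Once installed, the ordinary datastore API talks to the remote app.
                DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
                ds.put(new Entity("Greeting"));
            } finally {
                installer.uninstall();
            }
        }
    }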
You didn't mention why you wanted to access the datastore of one application from another, but depending on the nature of your situation, App Engine modules might be a solution. These are structurally similar to separate applications, but they run under the same application "umbrella" and can access a common datastore.
You cannot directly access the datastore of another application. Your application must actively serve that data in order for another application to be able to access it. The easiest way to achieve this is via the Remote API, which needs a piece of code installed in order to serve the data.
If you would like to have two separate code bases (even serving different hostnames/URLs), see the new App Engine Modules. They give you the ability to run totally different code on separate URLs and with different runtime settings (instances), while still being one application sharing all stateful services (datastore, task queue, memcache, ...).