I am new to this topic. I have decided to develop a parallel processing framework in Java for cloud data processing applications as my project. The framework has to divide the given sequential Java code into parts and process those parts on different virtual machines in the cloud. It also has to allocate and deallocate resources dynamically according to the load. My problem is how to develop the framework.
Are there any libraries available for scheduling Java code onto different virtual machines in the cloud? Please let me know if anything is available.
Terracotta and GridGain are excellent solutions. Those cited by yerlikayaoglu (Hadoop and Hazelcast) are excellent too in their domains, but all four are very different, and the right choice depends on the use case. That covers the map/reduce problem.
Another problem is the allocation/deallocation of virtual machines. That depends on your cloud provider, among other things. You can have a look at jClouds.
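The map/reduce side of the question can be tried out locally before involving any of the frameworks named above. The following is only a minimal sketch under the assumption that local threads stand in for virtual machines; the class name `ChunkedSum` and the method `parallelSum` are made up for illustration and belong to none of those libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: local worker threads stand in for cloud VMs.
class ChunkedSum {

    // Splits the input into roughly equal chunks, sums each chunk on a
    // worker thread (the "map" step) and adds the partial sums (the "reduce" step).
    static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("chunk processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // sum of 1..1000 = 500500
    }
}
```

A real framework additionally has to ship code and data to remote machines and cope with failures, which is what the libraries above provide.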
There are solutions such as Hazelcast and Hadoop. You can look at these projects.
Have a look at Hadoop, a framework which allows basically the same thing, and supports automatic code deployment over the cluster.
If you want to do real-time processing, you can take a look at Storm.
Also, Akka provides a nice remote actors API for Scala and Java.
I am trying to monitor CPU, memory, processes and so on.
Are there studies or articles on approaches to monitoring such systems?
For example:
How long a metric should stay above 80% before an alert is sent, and, after an alert, under what circumstances it is considered resolved.
This is more of a literature review than jumping into the implementation aspect.
The project is written in Java.
If you want to use a standard API for Java, you can use Java Management Extensions (JMX) to monitor various server metrics. Oracle offers a good tutorial. There are various implementations, and you should research them on your own. If you are already using Apache ActiveMQ in your application, you could try its built-in support. I have heard good things from people I work with about the Sigar version (I am not affiliated with them in any way).
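To make the "stays above 80% before alerting, then resolves" idea from the question concrete, here is a minimal sketch. It assumes one common approach (a required number of consecutive samples plus a lower clearing threshold, i.e. hysteresis); the class `SustainedAlert` and its parameters are hypothetical, and only the `java.lang.management` calls in `main` are part of the standard JMX platform API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: the name and thresholds are illustrative only.
class SustainedAlert {
    private final double raiseAbove;  // e.g. 0.80
    private final double clearBelow;  // e.g. 0.70, lower than raiseAbove (hysteresis)
    private final int samplesNeeded;  // consecutive samples required to change state
    private int streak = 0;
    private boolean firing = false;

    SustainedAlert(double raiseAbove, double clearBelow, int samplesNeeded) {
        this.raiseAbove = raiseAbove;
        this.clearBelow = clearBelow;
        this.samplesNeeded = samplesNeeded;
    }

    // Feeds one sample and returns whether the alert is firing afterwards.
    boolean update(double value) {
        // While idle we count samples above the raise threshold;
        // while firing we count samples below the clear threshold.
        boolean towardChange = firing ? value < clearBelow : value > raiseAbove;
        streak = towardChange ? streak + 1 : 0;
        if (streak >= samplesNeeded) {
            firing = !firing;
            streak = 0;
        }
        return firing;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() returns a negative value where it is unavailable.
        double loadPerCpu = os.getSystemLoadAverage() / os.getAvailableProcessors();
        SustainedAlert alert = new SustainedAlert(0.80, 0.70, 3);
        System.out.println("load per CPU: " + loadPerCpu
                + ", firing: " + alert.update(loadPerCpu));
    }
}
```

In practice `update` would be called on a fixed schedule, so `samplesNeeded` multiplied by the sampling interval gives the duration a metric must stay out of range before the state changes.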
I am a final-year engineering student doing a project in cloud computing. I am confident about the concept, but I don't know how to simulate it in the cloud. For a PG-level student, which cloud computing simulation environment is easy to use? Kindly give your valuable suggestions. (I am currently implementing the concept in Java.)
Try taking a look at OpenShift; it's free and very easy to use if you're familiar with Unix/Git. I host my blog there on a Java/Unix/MySQL stack and have been very satisfied.
First, I recommend that you understand the difference between IaaS and PaaS. Wikipedia is a good place to find this information. You could then compare the two cloud computing models.
You will see that on a PaaS it is much easier to get started with a service, since you don't need to install or configure anything. Usually you just click a button to make a specific service available, and deploying your application doesn't take many steps.
You should look for the "How to start" guides of different PaaS providers. You can start with this How to start tutorial, and afterwards look for similar guides and compare the most important providers. You will see that it is really easy to start working with this cloud model.
Agree: PaaS might be a good starting point. I don't have any experience with Java, though; a quick Google search suggests http://www.cloudbees.com/ might be something.
If you want to go a bit deeper, you should try out Amazon's EC2. I believe they have done a very good job, plus they offer a free tier for one year.
If you want to build cloud computing simulations in Java, take a look at CloudSim Plus. It is a modern, full-featured, highly extensible and easier-to-use Java 8 Framework for Modeling and Simulation of Cloud Computing Infrastructures and Services.
It is an actively maintained, completely redesigned, better organized and extensively documented project. It has a large number of exclusive features and is the only cloud simulation framework available on Maven Central.
Some of its main characteristics and features include:
Vertical VM scaling, which performs on-demand up and down allocation of VM resources such as RAM, bandwidth and PEs (CPUs).
Horizontal VM scaling, allowing dynamic creation of VMs according to an overload condition. Such a condition is defined by a predicate that can check the usage of different VM resources such as CPU, RAM or BW.
Parallel execution of simulations, allowing several simulations to be run simultaneously, in an isolated way, inside a multi-core computer.
Listeners to enable simulation monitoring.
Classes and interfaces to allow implementation of heuristics such as Tabu Search, Simulated Annealing, Ant Colony Systems and so on. See an example using Simulated Annealing here.
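The idea of an overload condition expressed as a predicate can be illustrated in plain Java. This is a sketch of the concept only and deliberately does not use the CloudSim Plus API: `VmUsage`, `HorizontalScalingSketch`, the 70%/80% thresholds and the "one extra VM per overloaded VM" rule are all assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical type for illustration; this is NOT a CloudSim Plus class.
class VmUsage {
    final double cpu; // fraction of CPU in use, 0.0 to 1.0
    final double ram; // fraction of RAM in use, 0.0 to 1.0

    VmUsage(double cpu, double ram) {
        this.cpu = cpu;
        this.ram = ram;
    }
}

class HorizontalScalingSketch {
    // Assumed overload rule: CPU above 70% or RAM above 80%.
    static final Predicate<VmUsage> OVERLOADED = vm -> vm.cpu > 0.7 || vm.ram > 0.8;

    // Requests one extra VM for every VM the predicate flags as overloaded.
    static int desiredVmCount(List<VmUsage> vms, Predicate<VmUsage> overloaded) {
        int extra = 0;
        for (VmUsage vm : vms) {
            if (overloaded.test(vm)) {
                extra++;
            }
        }
        return vms.size() + extra;
    }

    public static void main(String[] args) {
        List<VmUsage> vms = Arrays.asList(
                new VmUsage(0.9, 0.5),  // overloaded on CPU
                new VmUsage(0.3, 0.4),
                new VmUsage(0.2, 0.2));
        System.out.println(desiredVmCount(vms, OVERLOADED)); // 3 VMs + 1 extra = 4
    }
}
```

Because the condition is just a predicate, a different rule (for example one based on bandwidth) can be swapped in without touching the scaling logic.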
I am trying to pick the right web technology for both I/O-heavy and CPU-heavy tasks. Node.js is perfect for handling heavy load, and it can also be scaled out. However, I am stuck on the CPU-heavy part. Is it possible to integrate another technology (e.g. Java) into Node, so that it runs my algorithms in other threads and I can then use the results in Node? Is there any existing solution? Any other suggestions are welcome.
You can integrate Node.js with Java using node-java.
As mentioned in a previous answer, you can use node-java which is an npm module that talks to Java. You can also use J2V8 which wraps Node.js as a Java library and provides a Node.js API in Java.
The answer is lambda architecture.
Node.js is nice by itself: it handles fast queries in a lightweight manner, without doing any extra computation on the data.
The CPU-heavy tasks can easily be delegated to specialized components based on the JVM (well, the most famous ones are on the JVM). This is nicely implemented using message brokers and microservices.
An event-based architecture works well here: Node.js can be hooked up to databases like Cassandra or MongoDB and to cluster computing frameworks like Apache Spark (not necessarily, though; it depends on the problem) to handle the CPU-heavy parts of the system. Lightweight containers are the icing on the cake, providing nicely isolated runtime environments for each of the components to live in.
That's my conclusion so far regarding this question.
I think the suggestions above sort of eliminate the need to wrap node under java or other JVM based solution for cpu-heavy tasks.
Node.js is based on the V8 JavaScript engine, which is written in C++.
It is therefore possible to write fully native addons in C++ for Node.js. Check out some of these resources:
https://github.com/nodejs/node-addon-api
https://github.com/nodejs/node-addon-examples
I'm looking into a NoSQL database for use with Vert.x.
Based on the not-so-favorable results, MongoDB is out, so I'm looking at CouchDB/Couchbase, not least because some of our data collection runs on a Raspberry Pi fed by Arduino I/O (with a Raspberry Pi CouchDB instance for offline collection).
What Java library would be suitable/best for use with CouchDB and Vert.x?
I don't know a lot about Vert.x, but it appears to run on the JVM, so you should just be able to use Ektorp, which is pretty much the standard Java library for CouchDB nowadays. It covers all the core functionality, it's fairly well thought out, and the maintainer has been reasonably responsive to pull requests etc., as far as I've seen.
There's more documentation on Ektorp here.
I have a small cluster of Linux machines and an account on all of them.
I have ssh access to all of them with no-password login.
How can I use actors or another of Scala's concurrency abstractions to achieve distribution?
What is the simplest path?
Is there a library that can distribute the processes for me?
The computers are unreliable: they can go on and off whenever their users (students) feel like it.
Is there a library that can distribute the processes for me and watch for available computers?
Can I avoid bash scripts?
I'd use Akka in your place. It is a distributed computing platform for Scala and Java, based on Erlang's Actor model. In particular, the let-it-fail philosophy it inherits from Erlang is well suited to an environment where the nodes might go off at any time.
You could use Scala's built-in support for remote actors. Browse the web or see Scala remote actors for additional information.
You could also take a look at GridGain, a very easy-to-use grid computing framework for Java and Scala.
What you are looking for is a grid.
For a free one take a look at http://www.jppf.org/
JavaSpaces allows you to create distributed Data Structures that facilitate Distributed Computing. Not cheap, but look at GigaSpaces for a robust implementation.
This book gave me an eye opening experience of the possibilities.