I am an Engineering final year student. I am doing project in cloud computing. I have confident idea about the concept. But i don't know how to simulate the concept in cloud. For PG student level Which cloud computing simulation environment is easy to use? Kindly give your
valuable suggestion. ( Now i am implementing the concept in java )
Try taking a look at OpenShift, its free and very easy to use if your familiar with Unix/Git. I host my blog there on a Java/Unix/MySql stack and have been very satisfied.
Firstly, I recommend you to understand the difference between an IaaS and a PaaS. Wikipedia is always a good place where you can find this information. Maybe you could compare both cloud computer models.
You will see that on PaaS is much easier start with a service since you don't need to install, neither to configure anything. Usually, you just need a button to make available a specific service and not a lot of steps to deploy your application.
You should look for the "How to start" of different PaaS providers. You can start for this How to start tutorial and after this, look for similar guides and compare the most important providers. You could see that it is really easy start working on this cloud model.
Agree: PaaS might be a good starting point. I don't have any experience with Java though, a quick Google search: http://www.cloudbees.com/ might be something.
If you want to go a bit deeper, you should try out Amazon's EC2. I believe they have done a very good job, plus they offer a free tier for one year.
If you want to build cloud computing simulations in Java, take a look at CloudSim Plus. It is a modern, full-featured, highly extensible and easier-to-use Java 8 Framework for Modeling and Simulation of Cloud Computing Infrastructures and Services.
It is an actively maintained, totally re-designed, better organized and largely documented project. It has a large number of exclusive features and is the only cloud simulation framework available at maven central.
Some of its main characteristics and features include:
Vertical VM Scaling
that performs on-demand up and down allocation of VM resources such as Ram, Bandwidth and PEs (CPUs).
Horizontal VM scaling, allowing dynamic creation of VMs according to an overload condition. Such a condition is defined by a predicate that can check different VM resources usage such as CPU, RAM or BW.
Parallel execution of simulations, allowing several simulations to be run simultaneously, in a isolated way, inside a multi-core computer.
Listeners to enable simulation monitoring.
Classes and interfaces to allow implementation of heuristics such as
Tabu Search, Simulated Annealing,
Ant Colony Systems and so on. See an example using Simulated Annealing here.
Related
I am trying to wrap my head around Apache Mesos and need clarification on a few items.
My understanding of Mesos is that it is an executable that gets installed on every physical/VM server ("node") in a cluster, and then provides a Java API (somehow) that treats each individual node as a collective pool of computing resources (CPU/RAM/etc.). Hence, to programs coding against the Java API, they only see 1 single set of resources, and don't have to worry about how/where the code is deployed.
So for one, I could be fundamentally wrong in my understanding here (in which case, please correct me!). But if I'm on target, then how does the Java API (provided by Mesos) allow Java clients to tap into these resources?!? Can someone give a concrete example of Mesos in action?
Update
Take a look at my awful drawing below. If I understand the Mesos architecture correctly, we have a cluster of 3 physical servers (phys01, phys02 and phys03). Each of these physicals is running an Ubuntu host (or whatever). Through a hypervisor, say, Xen, we can run 1+ VMs.
I am interested in Docker & CoreOS, so I'll use those in this example, but I'm guessing the same could apply to other non-container setups.
So on each VM we have CoreOS. Running on each CoreOS instance is a Mesos executable/server. All Mesos nodes in a cluster see everything underneath them as a single pool of resources, and artifacts can be arbitrarily deployed to the Mesos cluster and Mesos will figure out which CoreOS instance to actually deploy them to.
Running on top of Mesos is a "Mesos framework" such as Marathon or Kubernetes. Running inside Kubernetes are various Docker containers (C1 - C4).
Is this understanding of Mesos more or less correct?
Your summary is almost right but it does not reflect the essence of what mesos represents. The vision of mesosphere, the Company behind the project, is to create a "Datacenter Operating System" and the mesos is the kernel of it in analogy to the kernel of a normal OS.
The API is not limited to Java, you can use C, C++, Java/Scala, or Python.
If you have set-up your mesos cluster, as you describe in your question and want to use your resources, you usually do this through a framework instead of running your workload directly on it. This doesn't mean that this is complicated here is a very small example in Scala which demonstrates this. Frameworks exist for multiple popular distributed data processing systems like Apache Spark, Apache Cassandra. There are other frameworks such as Chronos a cron on data center level or Marathon which allows you to run Docker based applications.
Update:
Yes, mesos will take care about the placement in the cluster as that's what a kernel does -- scheduling and management of limited resources. The setup you have sketched raises several obvious questions, however.
Layers below mesos:
Installing mesos on CoreOS is possible but cumbersome as I think. This is not a typical scenario for running mesos -- usually it is moved to the lowest possible layer (above Ubuntu in your case). So I hope you have good reasons for running CoreOS and a hypervisor.
Layers above mesos:
Kubernetes ist available as framework and mesosphere seems to put much effort in it. It is, however, without question that there are partly overlapping in terms of functionality -- especially with regard to scheduling. If you want to schedule basic workloads based on Containers, you might be better off with Marathon or in the future maybe Aurora. So also here I hope you have good reasons for this very arrangement.
Sidenote: Kubernetes is similar to Marathon with a broader approach and pretty opinionated.
I would like to build a distributed NoSQL database or key-value store using golang, to learn golang and practice distribute system knowledge I've learnt from school. The target use case I can think of is running MapReduce on top of it, and implement a HDFS-compatible "filesystem" to expose the data to Hadoop, similar to running Hadoop on Ceph and Amazon S3.
My question is, what difficulties should I expect to integrate such an NoSQl database with Hadoop? Or integrate with other languages (e.g., providing Ruby/Python/Node.js/C++ APIs?) if I use golang to build the system.
Ok, I'm not much of a Hadoop user so I'll give you some more general lessons learned about the issues you'll face:
Protocol. If you're going with REST Go will be fine, but expect to find some gotchas in the default HTTP library's defaults (not expiring idle keepalive connections, not necessarily knowing when a reader has closed a stream). But if you want something more compact, know that: a. the Thrift implementation for Go, last I checked, was lacking and relatively slow. b. Go has great support for RPC but it might not play well with other languages. So you might want to check out protobuf, or work on top the redis protocol or something like that.
GC. Go's GC is very simplistic (STW, not generational, etc). If you plan on heavy memory caching in the orders of multiple Gs, expect GC pauses all over the place. There are techniques to reduce GC pressure but the straight forward Go idioms aren't usually optimized for that.
mmap'ing in Go is not straightforward, so it will be a bit of a struggle if you want to leverage that.
Besides slices, lists and maps, you won't have a lot of built in data structures to work with, like a Set type. There are tons of good implementations of them out there, but you'll have to do some digging up.
Take the time to learn concurrency patterns and interface patterns in Go. It's a bit different than other languages, and as a rule of thumb, if you find yourself struggling with a pattern from other languages, you're probably doing it wrong. A good talk about Go concurrency is this one IMHO http://www.youtube.com/watch?v=QDDwwePbDtw
A few projects you might want to have a look at:
Groupcache - a distributed key/value cache written in Go by Brad Fitzpatrick for Google's own use. It's a great implementation of a simple yet super robust distributed system in Go. https://github.com/golang/groupcache and check out Brad's presentation about it: http://talks.golang.org/2013/oscon-dl.slide
InfluxDB which includes a Go based version of the great Raft algorithm: https://github.com/influxdb/influxdb
My own humble (pretty dead) project, a redis compliant database that's based on a plugin architecture. My Go has improved since, but it has some nice parts, and it includes a pretty fast server for the redis protocol. https://bitbucket.org/dvirsky/boilerdb
I'm going to develop an on-line IVR application using Java (without PBX).
In the software requirements there are some mathematical calculations and database communication which I prefer to implement on Java side.
As you know, different technologies are ready to integrate with Java, such as JTAPI, Zanzibar OpenIVR, Moho, VoiceXML, CCXML, Jive, Prophecy, Voicent, Voxeo etc.
Now the question is: What is the best solution? Which one is easiest to reach? Which one have the best efficiency? Do you recommend Open Source frameworks? Is there any Windows API for handling IVR systems?
If you're going to do VoiceXML with Java, you should take a look at Rivr, an open-source VoiceXML dialogue engine.
Rivr let you code your callflow naturally in the Java language. Thus you can reuse all the available Java tools (e.g. debugger, unit testing framework, coverage test tool) to develop the callflow. You also benefit from all your IDE features too (refactorings, source navigation, version control, etc).
The API is very simple. You can code a complete callflow with a single method. No need to define "states" or to manipulate templates or XML files.
Integration with server-side logic is trivial since you are only coding for the server side.
There is far too little information here to provide a direct answer, but I'll try to give you some basics.
The standards for IVR application development is VoiceXML for dialog (caller interaction) and CCXML for call control. The latter is not as commonly available. There are also numerous proprietary solutions. Your choice of an open standard versus a proprietary solution should be more about vendor/solution lock in. Even with the open standards, you'll likely use custom enhancements and have some amount of lock in, but portability will be easier. You can code directly to the telephony boards (challenging and usually poorly documented if you are someone new to telephony) or work with solutions that provide end to end capability. I find very few people porting IVR applications so I would focus on supportability of your application, features and ease of use in your decision.
Platform choices run the spectrum. You have premise (onsite) and hosted solutions. You mostly have high end enterprise solutions and low end solutions. There are very few middle ground solutions. Features (telephony and integration capabilities) vary dramatically.
From a telephony perspective, take nothing for granted. In particular, transfers. There are many ways to transfer a call. How it is done will be constrained by your connection. An analog line to the CO (phone company) can have multiple mechanisms and the one in place will typically be dictated to you. Not all telephony platforms will support what you need. Hangup detection, at least on analog lines, can also catch the novice out. Hosted solutions will typically allow you to avoid most of these problems. VoIP solutions are even more complicated due to compatibility between devices (yes there are standards, lots of them, with lots of optional parts and then there are custom flavors).
For windows specifically, you can use Lync, but it is complicated...though many of the solutions you will explore will be complicated.
In short, there is no best solution. Your knowledge of the technologies, requirements and budget are going to drive the decision. I've generally worked with enterprise IVRs in on premise and hosted configurations that are typically fronting large call centers. I have come in contact with many of the open source solutions. Anything on premise is likely to be complicated because of the system and telephony configuration. Hosted solutions have typically done most of that for you.
I know that those are "de jure standards". But you should also take Asterisk(with AGI/AMI) as a consideration for your project. If you decide to try Asterisk and Java, take a look of astivetoolkit.org it may be very helpful.
Ricky from Twilio here.
For me, picking the best tool for a particular problem is one of my favorite tasks a developer. One technique to figuring this out is blocking off a day and spending an hour or two with each potential option. A few question I'll typically explore:
Which tool is the easiest to get started with?
Which tool has the best documentation?
Which tool has an engaged community that I can learn from?
I'm sure there are a ton more questions depending on your scenario you'd want to explore (Does it fit within my budget? Can I use it with the technologies I already know and love?).
If you're looking at building an IVR, we have an API that could help. We just dropped some new tutorials including a non-trivial, production ready IVR application using Java.
At present I have a set of benchmark tests for recording the speed at which a Java application connects submits and returns data from varying RDBMS housed on varying server platforms. The application uses a simple algorithm for recording the time taken associated with each test. The application itself is a simple Java interface for a user to specify the tests, this seemed easier than hard coding each test or using an IDE to perform each test (bare in mind with the combination of RDBMS, Server O.S and client O.S there are in the region of several hundred individual tests). I would like to further my findings by introducing the cpu usage and memory usage during these tests on the client side where the application resides, I could hard code the algorithm for doing so in my application(My Preference) or use a third party software for monitoring this (Bare in mind it would need to be suitable for cross platform use, Windows 7, Solaris and Ubuntu).
So my question is how could I record the usage of CPU and Memory during a test through either hard coding in my Java application or Using a third party software? If you believe a third party would be the solution please could you mention the actual product and how it is possible to do this?
Thankyou to all who take the time to answer.
Check VisualVM. Has a lot of features
I used VisualVM and help to much to get memory leaks.
Here has a video who show most important VisualVM features
There are plenty of commercial products for this. JProbe is my favorite these days, but I'm also using YourKit. In the free arena, Eclipse has "TPTT" -- "Test Platform something something" -- but it seems to be a rare person who can actually get the darn thing to work. Never works for me.
I have a small cluster of Linux machines and an account on all of them.
I have ssh access to all of them with no-password login.
How can I use actors or other Scala's concurrency abstraction to achieve distribution ?
What is the simplest path?
Does some library can distribute the processes for me?
The computers are unreliable, they can go on and off wheter they (students) feel like it.
Does some library can distribute the processes for me and watch for ready computers?
Can I avoid bash scripts?
I'd use Akka in your place. It is a distributed computing platform for Scala and Java, based on Erlang's Actor model. In particular, the let-it-fail philosophy it inherits from Erlang is particularly well suited to an environment where the nodes might go off at any time.
You could use Scala's build in support for remote actors. Browse the web or see Scala remote actors for additional information.
You could also take a look at GridGain a very easy to use grid computing framework for Java and Scala.
What you are looking for is a grid.
For a free one take a look at http://www.jppf.org/
JavaSpaces allows you to create distributed Data Structures that facilitate Distributed Computing. Not cheap, but look at GigaSpaces for a robust implementation.
This book gave me an eye opening experience of the possibilities.