I need to set up HBase remotely and, since I have a 12-month trial AWS account, I chose to do it on AWS.
This document https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-create.html describes the initial steps, but it's not enough.
Could you explain to me how to set up HBase on AWS properly so that I can connect to it from my Java application? (How do I expose the port, enable inbound/outbound traffic, etc.?)
Related
On October 7, 2020 and January 21, 2021, Google introduced unidirectional server streaming and bidirectional WebSockets, respectively, for Cloud Run. Here are the blog posts:
https://cloud.google.com/blog/products/serverless/cloud-run-now-supports-http-grpc-server-streaming
https://cloud.google.com/blog/products/serverless/cloud-run-gets-websockets-http-2-and-grpc-bidirectional-streams
From the second link:
This means you can now build a chat app on top of Cloud Run using a protocol like WebSockets, or design streaming APIs using gRPC.
This raises some questions:
How does it work with auto scaling?
Say we build a chat app and we have WebSocket connections distributed across multiple instances and need to push a message to all of them. How would we do that?
Is it okay for the instances to keep state now (the WebSocket connection)? What are the consequences of this?
What I am really trying to ask is: how do we build a scalable chat application, with features like private messages and public chat rooms, using Cloud Run and the other managed tools available in Google Cloud?
How does it work with auto-scaling?
Each WebSocket connection consumes one of the 250 concurrent connections available per container instance. (250 is subject to change; the limit was 80 until it was recently raised to 250.) This limit is documented in the Google Cloud Run limits doc. When all 250 of an instance's connections are occupied, another container instance is started automatically.
Say we build a chat app and we have WebSocket connections distributed across multiple instances and need to push a message to all of them. How would we do that?
You would have to use some form of central datastore or pub/sub to solve that problem, e.g. Google Cloud Pub/Sub, or you can set up a Redis instance and use Redis' pub/sub feature. There are many ways to tackle this problem.
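To make the pub/sub idea concrete, here is a minimal sketch of the local fan-out half of the pattern. It assumes an in-memory registry of the sessions connected to this one instance; in a real deployment, onPubSubMessage would be driven by a Cloud Pub/Sub or Redis subscriber, and Session stands in for a real WebSocket session object.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Each Cloud Run instance keeps ONLY its own WebSocket sessions in memory;
// messages published to a shared channel (Cloud Pub/Sub, Redis, etc.) are
// re-broadcast locally on every instance, so all clients receive them.
public class ChatFanOut {
    interface Session { void send(String message); } // stand-in for a WebSocket session

    // room name -> sessions connected to THIS instance only
    private final Map<String, List<Session>> localSessions = new ConcurrentHashMap<>();

    public void register(String room, Session s) {
        localSessions.computeIfAbsent(room, r -> new CopyOnWriteArrayList<>()).add(s);
    }

    // Called by the pub/sub subscriber when a message arrives on the shared
    // channel; every instance runs this callback for its own sessions.
    public void onPubSubMessage(String room, String message) {
        localSessions.getOrDefault(room, List.of()).forEach(s -> s.send(message));
    }

    public static void main(String[] args) {
        ChatFanOut instance = new ChatFanOut();
        StringBuilder received = new StringBuilder();
        instance.register("lobby", received::append);
        instance.onPubSubMessage("lobby", "hello");
        System.out.println(received); // prints "hello"
    }
}
```

The key point is that no instance tries to hold all connections; the shared channel is the single source of truth and each instance only delivers to its local sessions.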
Is it okay for the instances to keep state now (the WebSocket connection)? What are the consequences of this?
It is always safe to keep state in a container, but you need to make sure the container can be terminated at any time when there isn't an active connection. Also, according to the docs, Google Cloud Run terminates all HTTP requests (including WebSockets) at the configured request timeout, which defaults to 5 minutes and can be increased to 15 minutes. Your WebSocket connections will therefore likely be dropped after 15 minutes, and you should have logic to handle automatic reconnection. The Google Cloud Run docs talk about this limit explicitly.
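Since connections get dropped at the timeout, the reconnection logic matters. Here is a small sketch of a capped exponential backoff schedule for the reconnect attempts; the constants are illustrative, and wiring this into an actual WebSocket client library is left out.

```java
// Hedged sketch: Cloud Run drops WebSocket connections at the request
// timeout (up to 15 min), so the client should reconnect automatically.
// This shows only the backoff schedule, not the WebSocket wiring.
public class Backoff {
    // Delay before attempt n (0-based): 1s, 2s, 4s, ... capped at 30s.
    public static long delayMillis(int attempt) {
        long base = 1000L << Math.min(attempt, 5); // 1s * 2^attempt, shift capped
        return Math.min(base, 30_000L);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 6; i++) {
            System.out.println("attempt " + i + " -> " + delayMillis(i) + " ms");
        }
    }
}
```

A small random jitter is usually added on top so that many clients dropped at the same moment do not all reconnect simultaneously.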
I am fairly new to the H2 database. As part of a PoC, I am using H2 (version 1.4.187) to mock an MS SQL Server database. I have one application, say app1, which generates data and saves it into H2. Another application, app2, needs to read from the H2 database and process the data. I am trying to use auto server mode so that even if one of the applications is down, the other one is still able to read/write to/from the database.
After reading multiple examples, I found how to build the H2 URL, as shown below:
jdbc:h2:~/datafactory;MODE=MSSQLServer;AUTO_SERVER=TRUE;
I enabled TCP and remote access as below:
org.h2.tools.Server.createTcpServer("-tcpAllowOthers","-webAllowOthers").start()
With this, I am able to write to the database. Now I want to read the data using the H2 web console. I am able to do that from my local machine; however, I don't understand how to connect to this database remotely from another machine.
My plan is to run these two apps on an Ubuntu machine and monitor the data using the web console from my own machine. Is that not possible with this approach?
How can I solve this?
Or do I need to use server mode and explicitly start the h2 server? Any help would be appreciated.
By default, remote connections are disabled in H2 for protection. To enable remote access to the TCP server, you need to start the TCP server with the option -tcpAllowOthers (the web and PostgreSQL servers have the analogous flags -webAllowOthers and -pgAllowOthers).
To start both the web console server (the H2 Console tool) and the TCP server with remote connections enabled, you can use something like this:
java -jar /path/to/h2.jar -web -webAllowOthers -tcp -tcpAllowOthers -browser
More information can be found in the docs here and console settings can be configured from here
I'm not entirely sure, but looking at the documentation and other questions answered previously on the same topic, the URL should be something like this:
jdbc:h2:tcp://<host>:<port>/~/datafactory;MODE=MSSQLServer;AUTO_SERVER=TRUE;
It seems that the host may not be localhost and the database may not be in memory
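Assuming that URL shape is right, assembling and using it in code looks roughly like this. The host name is a placeholder, 9092 is H2's default TCP port, and AUTO_SERVER is left out here since (as far as I know) it only applies to the embedded URL; the actual connection call needs the H2 jar on the classpath, so it is only shown as a comment.

```java
// Sketch of assembling the remote H2 URL for the TCP server.
// "ubuntu-box" is a placeholder for the machine running the apps.
public class H2Url {
    public static String remoteUrl(String host, int port, String dbPath) {
        return "jdbc:h2:tcp://" + host + ":" + port + "/" + dbPath
                + ";MODE=MSSQLServer";
    }

    public static void main(String[] args) {
        String url = remoteUrl("ubuntu-box", 9092, "~/datafactory");
        System.out.println(url);
        // With the H2 jar on the classpath you would then connect with:
        // java.sql.Connection conn =
        //     java.sql.DriverManager.getConnection(url, "sa", "");
    }
}
```

The same tcp:// URL is what you would paste into the web console's connection form on the remote machine.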
Is there a need for the H2 web console?
You can use a different SQL tool using the TCP server you have already started. I use SQuirreL SQL Client (http://squirrel-sql.sourceforge.net/) to connect to different databases.
If you need a web interface, you could use Adminer (https://www.adminer.org/), which can connect to different database vendors, including MS SQL, which happens to be the mode you're running H2 in. There is an Adminer Debian package that should work on Ubuntu.
I am planning to buy a web hosting package to host a site which shows customer details to the customers registered with that web site. I have a Java application which runs on my local computer, and I am going to use it to update the database of my hosted web site.
What I need to know is: is it possible to connect to the databases which we buy from the web hosting package sellers from our locally running Java applications?
Q: Is it possible to connect to the databases which we buy from the web hosting package sellers from our locally running Java applications?
A: No: not usually.
At a MINIMUM, you need at LEAST two things:
1) RDBMS-specific client software (including, but not necessarily limited to, the relevant JDBC driver(s)) loaded on to EACH client PC
... and ...
2) All firewalls between the remote RDBMS server and each of your client PCs must be open to your RDBMS protocol (for example, port 1433 for MS Sql Server).
It's much more common for your web app, web server and RDBMS to be co-located, and your clients simply communicate via HTTPS.
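To illustrate that co-located setup, here is a self-contained sketch where the local client talks HTTP to the web tier instead of talking to the database directly. A throwaway in-process server stands in for the hosted web app, the handler returns canned JSON where the real app would query the RDBMS, and the endpoint path and payload are made up for illustration; production would use HTTPS and authentication.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// The client never opens a database connection; it calls an HTTP endpoint
// on the web tier, and only the web tier talks to the RDBMS.
public class HttpTierDemo {
    static String fetchCustomer() throws Exception {
        // In-process stand-in for the hosted web app, bound to a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/customers/42", exchange -> {
            // The real handler would query the database here.
            byte[] body = "{\"id\":42,\"name\":\"demo\"}".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        try {
            URI uri = URI.create("http://localhost:"
                    + server.getAddress().getPort() + "/customers/42");
            return HttpClient.newHttpClient()
                    .send(HttpRequest.newBuilder(uri).build(),
                          HttpResponse.BodyHandlers.ofString())
                    .body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchCustomer()); // prints {"id":42,"name":"demo"}
    }
}
```

This shape also sidesteps both requirements above: no JDBC driver on the client, and no database port opened through the firewall, only 443.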
Yes, it's possible to connect to remote databases from Java applications. The JDBC API will help you create tables, insert values, query the tables, retrieve the results of queries, and update the tables.
As far as a remote connection is concerned, here are two good links to begin with:
1) Link 1
2) Link 2
You haven't specified the type of database you need to connect to. Depending upon that you can learn how to create data sources, database roles etc as shown over here.
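In outline, a remote JDBC connection looks like the following. MySQL is assumed purely for illustration, and the table and column names are made up; the URL, port, and driver all depend on what your host actually provides. The connection is only attempted when a host argument is given, since the vendor's driver jar is not bundled here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RemoteDbClient {
    // Assembles a MySQL-style JDBC URL; adjust for your host's RDBMS.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            // No host supplied: just show the URL shape.
            System.out.println(jdbcUrl("dbhost.example.com", 3306, "customers"));
            return;
        }
        // Requires the vendor's JDBC driver on the classpath, and the
        // hosting firewall must allow the database port (see the answer above).
        try (Connection conn = DriverManager.getConnection(
                     jdbcUrl(args[0], 3306, "customers"), "user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM customer WHERE id = ?")) {
            ps.setInt(1, 42);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) System.out.println(rs.getString("name"));
            }
        }
    }
}
```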
I have developed a Java server in Eclipse that accepts TCP socket connections from an Android client, performs some computations, and returns the result to the Android phone over the same socket. I tried it over Wi-Fi.
I now want to move the Java server to the cloud, basically Amazon EC2. Is this possible? I am just using a simple TCP socket connection. I checked and couldn't find an example, but I came across "Elastic Beanstalk". Any help is appreciated, maybe a link or tutorial with such an example.
Can I convert my Java project to a .war and use that, or can I install Eclipse on the cloud and run it as I do locally?
It is definitely possible. And you don't have to convert your project to a .war, unless you want to.
All you have to do is:
Pick the Amazon Machine Image (AMI) you want to use - Amazon Linux is a good place to start, but there are plenty of other options, including Ubuntu and Windows.
Set up a security group - you need to set an incoming rule for your server's port number. It is pretty easy to do this from the Amazon web-based console.
Start a machine and assign it to the security group you created. Again, this is easily accomplished from the amazon web console.
Once the machine is up, log in (using SSH for Linux or Remote Desktop for Windows) and install your server.
A few things to remember:
Since you are now running on a public server, sooner or later your server will be attacked. EVERYONE gets attacked. If all you are opening is your single application port, make sure it is secured.
An Amazon server has a private and public IP. Your client application will connect to the public IP.
Servers can fail, and new servers get new public IPs! You need to prepare for this. Either make the IP in the client configurable, or look into something like Amazon Elastic IPs or dynamic DNS.
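The socket code itself does not change when moving to EC2; only the host the client connects to does. Here is a minimal self-contained echo sketch standing in for the real server (the echo reply stands in for your computation; port 0 picks a free port for local testing, while on EC2 you would bind the port you opened in the security group, and the Android client would connect to the instance's public IP):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServer {
    // Bind to the port you opened in the EC2 security group; 0 means
    // "pick any free port", which is handy for local testing.
    static ServerSocket start(int port) throws Exception {
        ServerSocket server = new ServerSocket(port);
        Thread t = new Thread(() -> {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                // Real code would run the computation here instead of echoing.
                out.println("echo: " + in.readLine());
            } catch (Exception ignored) { }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = start(0);
        // The Android client would use the instance's PUBLIC IP here,
        // read from configuration rather than hard-coded.
        try (Socket s = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println("hello");
            System.out.println(in.readLine()); // prints "echo: hello"
        }
        server.close();
    }
}
```

Keeping the host and port in configuration, as suggested above, means a replacement instance with a new public IP only requires a config change on the client.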
I want to build a Java web app and deploy it on EC2. It will be written in Java and will use MySQL. I was hoping to get some pointers on the actual deployment process and configuration. In particular I'm interested in the following topics:
machine images (DIY vs ready made)
mysql replication and backup to S3
ways of deploying and redeploying the app to EC2 without interruptions
firewalls?
load balancing and auto scaling
cloudtools (or alternative tools)
I can only speak to a few of your discussion points from experience. I've had to strip out hyperlinks to the various Amazon products because I'm new to Stackoverflow and don't have enough rep to post more than one link.
Machine Images: While you can certainly start with your own machine image and convert it to an AMI with the EC2 AMI Tools, I prefer starting with one of Amazon's ready made images and customizing it to suit my needs. The advantage here is that you already know that the base image will deploy, you're more likely to get help on the forum or from the EC2 staff, and you don't have to go through the trouble of setting up a physical machine or your own VM in order to bundle the image and upload it. If you're using the EC2 API Tools, you can get a list of the available base images with ec2-describe-images -o amazon.
MySQL Replication and Backup: Check out the new(ish) Amazon Relational Database Service. It's designed to work with MySQL, can perform automatic backups, and scales easily.
Firewalls: Handling the firewalls for your instances is easy with the API tools. For example, you can create a group,
ec2-add-group condor -d "Condor Workers"
set up firewall rules for that group (a bad example - it opens all UDP and TCP ports for a CIDR range),
ec2-authorize condor -P tcp -p 0-65535 -s 129.127.0.0/16
ec2-authorize condor -P udp -p 0-65535 -s 129.127.0.0/16
and then launch your instances as part of the group, so that they inherit the firewall rules.
ec2-run-instances ami-12345678 -g condor -k mykeypair
The tricky part is going the other direction -- allowing your EC2 instances to communicate with your company/school/personal network. Since you don't know what IP your instances will have before they start (Amazon Elastic IP can alleviate this to some extent), you're generally forced to allow some subnet of the EC2 cloud.
You can also set up iptables or additional firewalls on your instances.
Load Balancing: Consider Amazon Elastic Load Balancing. If that doesn't suit your needs, you can create your own "virtual cluster" and use whatever framework you like.