Query on Kerberized HBase hangs - java

I have a web server which presents some data from a Cloudera cluster. The data is stored in HBase and the cluster is secured with Kerberos. When I try to perform a get, the server hangs without logging any error.
So far I've tried:
Launching the web server from the command line after a kinit (the server is just for testing purposes, so login duration and complex startup procedures are not an issue).
The runAs approach described here, both with and without the configuration file import from this answer.
The CLASSPATH configuration approach described here.
Global authentication with UserGroupInformation.loginUserFromKeytab (with and without all the configurations from points 2 and 3).
I've executed all the gets from the HBase shell after kinit-ing as the web server's user, and they complete in reasonable time (less than a second, whereas the last time I left the connection open the server didn't respond for over an hour), so it's not a performance or authorization issue. Inside the same web server, with every configuration listed above, I'm able to perform other actions, such as connecting to HBase and getting table instances.
I've also checked the logs from Kerberos, HBase and my web server, and none of them shows any error. In fact, I rather suspect that the authentication works but the client gets stuck in some loop during the get.
UPDATE: After more testing, I've verified that a user is set right before the call to HBase's API. I've also checked, and no calls actually reach HBase. So this is not an authentication problem but something else. Has anybody had the same problem?
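For reference, the doAs-based attempt (points 2 and 4) looked roughly like the sketch below; the principal, keytab path, and table/row names are illustrative, and it assumes the HBase 1.x client API with the cluster's hbase-site.xml and core-site.xml on the classpath:

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberizedGet {
        public static void main(String[] args) throws Exception {
            // hbase-site.xml and core-site.xml from the cluster must be on the
            // classpath so the Kerberos security settings are picked up.
            final Configuration conf = HBaseConfiguration.create();
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "webuser@EXAMPLE.COM", "/etc/security/keytabs/webuser.keytab");

            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    try (Connection connection = ConnectionFactory.createConnection(conf);
                         Table table = connection.getTable(TableName.valueOf("my_table"))) {
                        // This is the call that hangs in my setup.
                        Result result = table.get(new Get(Bytes.toBytes("row-key")));
                        System.out.println("Result empty? " + result.isEmpty());
                    }
                    return null;
                }
            });
        }
    }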

Related

Remotely executing a legacy tool on another machine: SSH vs TCP/IP vs RPC vs others

We have a Java web application which collects inputs from the user and executes legacy simulation tools on a remote machine, sharing the inputs as JSON files.
The simulation tools don't have the option to run via HTTP requests, so we have opted for SSH. But the current design has some bottlenecks:
To avoid the manual steps of SSH authentication by keys/password, we have implemented Kerberos ticket-based authentication, which often breaks when Windows updates, when the user changes their password, and in some other scenarios.
Inputs and results are JSON files in specified directories, and each workflow processes them by matching the workflow ID with the directory name.
Monitoring is a very tedious task in the current design, because some of the workflows run for hours or days, so we need to update the application with the current status (Running/Up-To-Date/Completed/Error). The workflow runs in command-line mode and pushes its logs/progress from the remote system to the server machine via SSH, and we write this content into files. When the user clicks the Refresh button, the application checks the progress file in a specific directory and updates the status. This involves additional steps to get the current status, and it is slow because it involves file operations and copying from one machine to another.
Is there any way we could optimize our design?
I also found other options, like TCP/IP and RPC, to communicate with remote systems and trigger the workflow. Our application is a Spring Boot/React web application; is it fine to mix HTTP and raw TCP/IP in a single application?
We are using Waffle SSO with Spring Security; how can we implement TCP/IP without compromising security?
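For the triggering part, here is a minimal sketch of executing the tool over SSH from Java with the JSch library (an assumption on my side, not necessarily your current stack; the host, key paths, command, and the idea of streaming progress lines over stdout are all illustrative):

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    import com.jcraft.jsch.ChannelExec;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class RemoteWorkflowRunner {
        public static void main(String[] args) throws Exception {
            JSch jsch = new JSch();
            // Key-based auth avoids the Kerberos-ticket breakage described above.
            jsch.addIdentity("/home/app/.ssh/id_rsa");
            jsch.setKnownHosts("/home/app/.ssh/known_hosts");
            Session session = jsch.getSession("simuser", "sim-host.example.com", 22);
            session.connect();

            ChannelExec channel = (ChannelExec) session.openChannel("exec");
            channel.setCommand("/opt/tools/run-simulation.sh /data/workflows/wf-1234/input.json");
            InputStream out = channel.getInputStream(); // must be obtained before connect()
            channel.connect();

            // Stream progress lines back live instead of copying progress files around;
            // each line could be pushed to the browser over a WebSocket/STOMP topic.
            BufferedReader reader = new BufferedReader(new InputStreamReader(out));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("progress: " + line);
            }
            int exitCode = channel.getExitStatus(); // valid once the channel has closed
            channel.disconnect();
            session.disconnect();
            System.out.println("workflow finished with exit code " + exitCode);
        }
    }

Streaming the status this way (or over a message queue) would remove the refresh-button polling and the file copying from the critical path.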

Synchronous use of database insertion

I have created a Java application that inserts data into a MySQL database. Under some conditions, I need to send some of this data via email, using another Java application that I will write as well.
My problem is that I am not sure how I should implement this.
From what I understand, I could use a UDF inside MySQL to execute a Java application, but there are many opinions against doing that. Besides, both the database and the mail client application will reside in a VM that I don't have admin access to, and I don't want to install anything that neither I nor the admin knows.
The other alternative I can think of is to set up the mail client (or some other application) to run every minute just to check for newly inserted data. Is this a better approach? Isn't it going to use resources doing almost nothing? At the moment the VM might not be heavily loaded, but I have no idea how many applications might end up running on the same machine.
Is there any other alternative I should consider?
You also need to consider network speed, database server load, and system resources. If you have enough memory and a light insert load, or the database load is not too high, you can approach this with a cron setup. On Linux, call a script every 5 minutes. The script performs the following:
1. Fetch unread emails as files.
2. Run a shell script to read the needed data.
3. Write the data to MySQL.
4. Delete the email.
If you have a heavily loaded system, you may need to run this only once or twice an hour, or vary the interval.
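For the polling alternative from the question, here is a minimal Java sketch; the pending_notifications table with its sent flag, the JDBC URL, and all addresses are hypothetical, and it assumes the MySQL Connector/J and JavaMail libraries are on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.Properties;

    import javax.mail.Message;
    import javax.mail.MessagingException;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    public class MailPoller {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/mydb", "user", "password")) {
                // Select only the rows that have not been emailed yet.
                try (PreparedStatement select = db.prepareStatement(
                         "SELECT id, payload FROM pending_notifications WHERE sent = 0");
                     ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        sendMail(rs.getString("payload"));
                        // Mark the row so the next run skips it.
                        try (PreparedStatement mark = db.prepareStatement(
                                "UPDATE pending_notifications SET sent = 1 WHERE id = ?")) {
                            mark.setLong(1, rs.getLong("id"));
                            mark.executeUpdate();
                        }
                    }
                }
            }
        }

        static void sendMail(String body) throws MessagingException {
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.example.com");
            Session session = Session.getInstance(props);
            Message msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("app@example.com"));
            msg.setRecipient(Message.RecipientType.TO, new InternetAddress("ops@example.com"));
            msg.setSubject("New data inserted");
            msg.setText(body);
            Transport.send(msg);
        }
    }

Run it from cron every few minutes; it exits quickly when there is nothing to send, so the idle cost is negligible.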

Launching test SMTP server from ColdFusion

We are in the process of creating a training mode for our ColdFusion (9) sites.
The system will allow our users, after logging in, to switch from production mode to training mode by clicking on a link.
When they switch, the data-sources will be switched allowing the data to be safely modified.
We are also going to implement a test SMTP server, using the SubEthaSMTP Java project, in order to capture the emails that are sent from the training mode and display them to the user in a web page.
We can launch the SMTP server as a standalone process or service without much trouble.
The nicer solution would be to launch the server as part of the ColdFusion runtime at the point that the user switches to training mode.
We would create a true Java thread that would persist in a server-level scope for the length of any training session plus some arbitrary timeout period. If the server times out and a new training session is initiated, we would start a new SMTP server.
My essential question is, therefore: is it a bad idea to run an ongoing thread in the ColdFusion runtime this way?
I can't see a problem with doing this, although you ought to test to see what resources SubEthaSMTP uses and make sure it's not going to cause you issues. It looks to have minimal dependencies (essentially just SLF4J, which ColdFusion 9 & 10 already provide).
From the example page it looks pretty easy to set up and drop into a long-running scope. I think you're right to pick the server scope, as you may have problems using application or anything more volatile: there'll be situations where the application scope times out and is reset, and you'd lose all references to the mail server instance.
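For what it's worth, the Java side is only a few lines with Wiser, the capturing test server that ships with SubEthaSMTP (a sketch; the port is illustrative):

    import org.subethamail.wiser.Wiser;
    import org.subethamail.wiser.WiserMessage;

    public class TrainingMailServer {
        public static void main(String[] args) throws Exception {
            // Start a capturing SMTP server for the training session;
            // Wiser stores every received message in memory.
            Wiser wiser = new Wiser();
            wiser.setPort(2500); // any free, non-privileged port
            wiser.start();

            // ... training mode sends mail to localhost:2500 ...

            // Display the captured emails to the user.
            for (WiserMessage message : wiser.getMessages()) {
                System.out.println("from: " + message.getEnvelopeSender());
                System.out.println("to:   " + message.getEnvelopeReceiver());
                // message.getMimeMessage() yields a javax.mail MimeMessage for rendering
            }

            // Stop when the training session times out.
            wiser.stop();
        }
    }

Holding the Wiser instance in the server scope, as you suggest, would let you start and stop it per training session.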
Please update the post with your findings, as I'd be interested in seeing what you find.

Uploading a Rails app to a Passenger server

I've uploaded my Rails app from an SVN repository to my server. On localhost it works OK, but when I try to access it in Chrome I get the following error:
There appears to be a database problem.
Your config/database.yml may not be written correctly. Please check it and fix any errors.
Your database schema may be out of date or nonexistant. Please run rake db:migrate to ensure that the database schema is up-to-date.
The database server may not be running. Please check whether it's running, and start it if it isn't.
I have no idea what to do, because this is the first time I've deployed a Rails project, and I'm doing the same things on the remote server that I do on localhost.
By the way, should I run rails s through the SSH connection too?
Update: the first thing to do is to get educated on deploying: start here on the Rails site.
So the thing is, on your server you need to have a database set up, like you do on your local machine. Checking out the code from SVN only gets you the application, not the database.
You mention in the title that you have Passenger set up on the server. Passenger is a module for Apache (or Nginx) which replaces the rails s command you are using in development. It is in the Passenger configuration that you'll need to set RailsEnv <something> to determine what and how the app starts and runs.
If the database servers are the same (e.g. MySQL on both platforms) and the environments are the same (e.g. "development"), and if the config/database.yml file is checked into source control, then skip ahead.
If your database and environment are different (e.g. SQLite in development and MySQL in production), then you'll need to add the necessary configuration (database name, host, port, username, password) for the environment in database.yml, and specify the proper database gem in your Gemfile based on the environment. If you are storing passwords, I don't recommend checking in the database.yml file, but that's a separate topic. In the end, you need to have the right database config in database.yml on the server.
Then, you can run bundle exec rake db:setup on the server from your app's root directory. This will initialize the database with the current schema, and run any seeds.rb setup needed. Check with rake -T to see other options you might consider.
Once that's done, subsequent deployments require that you check out your latest code from SVN and usually restart the app (with Passenger, this is done with the command touch tmp/restart.txt from the app's root directory). If you have made changes to the database structure, run bundle exec rake db:migrate before restarting.
Oh, and in production, if you're using the default environment, you'll also need to run bundle exec rake assets:precompile the first time, and again whenever you have added images or changed JavaScript/CoffeeScript or CSS/Sass files.
Having said this, #rwilliams' comment about Capistrano is definitely something you'll want to think about. Deploying is tricky, as you can see, and as your app gets bigger you'll want it to be simple. Capistrano allows you to set up a script that turns deployment into a command like cap deploy or cap staging deploy:migrations. It's a lot to learn, but worth the effort.

Access an SQLite database on the web from a desktop application

I have an SQLite database on a web server. I would like to access the database from a typical Java desktop application. Presently, I'm doing this:
Download the database file to a local directory and perform the queries as necessary.
But I'm unable to perform any update queries this way [on the actual database]. How can I do this?
Another question: is it possible to directly access the database on the web from Java, and make queries, updates, etc. directly?
I've written code to connect Java to SQLite, and it works fine if the db file is in a local directory. What changes do I have to make to establish a link to the file on the web server without having to download the database file?
Thanks in advance...
CL. is right in saying that if you need to access a web database from desktop applications, SQLite is not an appropriate choice.
Using SQLite is fine for small web sites and applications where your data is accessed from, and only from, the web site itself; but if you need to access your data from, say, your desktop without downloading the data file, you can't achieve that with SQLite and HTTP.
An appropriate choice for your web application would be MySQL or another client/server database, so that you would be able to connect to the database service from any place other than your web application, provided the server access rules permit it (e.g. firewalls, granted authentication, etc.).
In your usage scenario, you would face several kinds of problems.
1) Security
You would be forced to violate the safety principle that database files must be protected from direct web exposure; to access your web SQLite database file from your desktop, you would have to expose it directly, and this is wrong, as anyone would be able to download it and access your data, which by definition should be accessible only to you.
2) Updatability without downloading
Using HTTP to access the database file can only result in downloading the requested resource, because HTTP is a stateless protocol: when you issue a GET or even a POST request for the database, the web server hands you the file in one go, full stop.
In short, there is no way to write changes directly back to the database file.
3) Updatability with downloading
You could download your file with an HTTP GET request, read the data, make changes and so on, but what if your online database changes in the meantime? Data consistency would easily be compromised.
There could be a way
If you give up using HTTP for your desktop application's access to the database, then you could pick FTP (provided you have access credentials to the resource).
FTP lets you read data from and write data to files, so on Linux you could use FUSE to mount a remote FTP share and access it just as if it were part of your local file system (see this article, for example).
In short, you:
Create a mount point (i.e. a local directory) for the FTP share
Use curlftpfs to link the remote FTP resource to your mount point
Access this directory from your application as if it were a conventional directory (see the sketch below)
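Once mounted, your existing Java/SQLite code only needs the new path. Here is a sketch assuming the xerial sqlite-jdbc driver, with an illustrative mount point and schema:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RemoteSqliteViaFtpMount {
        public static void main(String[] args) throws Exception {
            // The remote FTP share is mounted at /mnt/ftp, e.g. with:
            //   curlftpfs ftp://user:pass@yourwebsite.com /mnt/ftp
            // so the database file behaves like an ordinary local file.
            String url = "jdbc:sqlite:/mnt/ftp/data/app.db";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, name FROM users")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                }
            }
        }
    }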
This way you could preserve security, keeping the database file from being exposed on the web, and you would be able to access it from your desktop application.
That said, please consider that concurrent access by several processes (the desktop app plus the webserver instance) to a single database file could lead to problems (see this SO post for an idea). Keep that in mind before architecting your solution.
Finally, in your usage scenario my suggestion is to write a server-side web service or REST interface that, behind authentication, lets you interact with the database file and perform the key operations you need.
It is safe, reliable, and flexible enough to let you do whatever you want.
EDIT:
MySQL is widely used for web sites and web applications, as it is fast, quite scalable and reasonably reliable. Setting up a MySQL server is a bit off-topic for Stack Overflow and quite long-winded to cover here, but you can google around to find plenty of articles discussing the topic for your operating system of choice.
Then use MySQL JDBC driver to access the database from your Java desktop application.
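A minimal sketch of that desktop-side connection; the host, schema, credentials, and the users table are placeholders, and the MySQL server must be configured to accept remote connections:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class DesktopMySqlClient {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://yourwebsite.com:3306/appdb?useSSL=true";
            try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "UPDATE users SET name = ? WHERE id = ?")) {
                ps.setString(1, "New Name");
                ps.setLong(2, 42);
                System.out.println(ps.executeUpdate() + " row(s) updated");
            }
        }
    }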
If your idea is to stick with SQLite, though, you could basically prepare four web endpoints:
http://yourwebsite.com/select
http://yourwebsite.com/insert
http://yourwebsite.com/update
http://yourwebsite.com/delete
(Notice I specified "http", but you should consider moving to an SSL-encrypted HTTP connection, a.k.a. "https"; find details here and here. I don't know which web server you are running, but a little googling should point you to a good resource on properly configuring https.)
Obviously you could add any endpoint you like for any kind of operation, even a more generic execute, but bear with me for a moment.
Requests to those endpoints are POSTs, and every endpoint receives the appropriate parameters, such as:
table name
fields
where clause
... and the like. But most important is security, so you have to remember two things:
1. Sign every request. You could achieve this by defining a secret operation key (a string known to your client and your server but which never travels in clear text) and using it in a hashing function to produce a digest that is sent together with the other parameters, as incontrovertible proof for the server that the request comes from a genuine source. This saves you from sending a username and password in every request, which would introduce the problem of password encryption if you don't use https; it also means the server must be able to reconstruct the same signature for the same request using the same algorithm. (I flew over this at 400 mph, as the topic is too large to treat properly here, but I hope it points you in the right direction; see the sketch after point 2.)
2. Properly escape request parameters. Some call it "sanitizing" parameters, and I think the metaphor is correct. Generally speaking, this process involves some filtering performed by the server's endpoint, but it basically boils down to "use prepared statements for your queries". If you don't, it is likely that some malicious attacker will inject SQL code into requests to exploit your server.
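Here is a sketch of the signing idea from point 1, assuming HMAC-SHA256 and an illustrative canonical form for the request parameters; the secret and parameter names are placeholders:

    import java.nio.charset.StandardCharsets;

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public class RequestSigner {
        // Shared secret known only to client and server; it never travels on the wire.
        private static final String SECRET_KEY = "change-me";

        // Returns a hex HMAC-SHA256 digest of the canonical request body.
        public static String sign(String canonicalBody) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(SECRET_KEY.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(canonicalBody.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            // The client POSTs the parameters plus the signature; the server rebuilds
            // the same canonical string and compares digests before executing anything.
            String body = "table=users&where=id%3D42";
            System.out.println(body + "&sig=" + sign(body));
        }
    }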
SQLite is an embedded database and assumes that the database file is directly accessible.
Your application is not an appropriate use of SQLite.
You should use a client/server database.
In any case, you should never make a database directly accessible on the internet; the data should go through a web service.
