Two exclusive OptionGroup with Apache Commons CLI - java

I am building a command-line Java application and I have a problem with parsing the command line parameters with Apache Commons CLI.
I am trying to cover a scenario where I need two mutually exclusive command-line parameter groups, each supporting long (--abc) as well as short (-a) arguments.
Use case 1
short params: -d oracle -j jdbc:oracle:thin:@//host:port/database
same but with long params: --dialect oracle --jdbcUrl jdbc:oracle:thin:@//host:port/database
Use case 2:
short params: -d oracle -h host -p 1521 -s database -U user -P pwd
same but with long params: --dialect oracle --host host --port 1521 --sid database --user user --password pwd
So I created two OptionGroup with the proper Option items:
OptionGroup jdbcUrlGroup = new OptionGroup();
jdbcUrlGroup.setRequired(true);
jdbcUrlGroup.addOption(jdbcUrl);
second group:
OptionGroup customConfigurationGroup = new OptionGroup();
customConfigurationGroup.setRequired(true);
customConfigurationGroup.addOption(host);
customConfigurationGroup.addOption(port);
customConfigurationGroup.addOption(sid);
customConfigurationGroup.addOption(user);
customConfigurationGroup.addOption(password);
Then I build the Options object this way:
Options options = new Options();
options.addOptionGroup(jdbcUrlGroup);
options.addOptionGroup(customConfigurationGroup);
options.addOption(dialect);
But this does not work, because the parser expects both groups to be defined.
This is how the dialect Option is defined:
Option dialect = Option
.builder("d")
.longOpt("dialect")
.required(false)
.hasArg()
.argName("DIALECT")
.desc("supported SQL dialects: oracle. Default value: oracle")
.build();
The other mandatory Option definitions look similar except this one property:
.required(true)
Result:
-d oracle: Missing required options: [-j ...], [-h ..., -p ..., -s ..., -U ..., -P ...]
-d oracle -jdbcUrl xxx: Missing required option: [-h ..., -p ..., -s ..., -U ..., -P ...]
-d oracle -h yyy: Missing required option: [-j ...]
But what I want is the following: if the JDBC URL is provided, then the host, port, etc. params are not needed, and vice versa.
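For reference, the parsing step is not shown above; a minimal sketch of what it presumably looks like (the options and args variables are assumptions) is:
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.ParseException;

CommandLineParser parser = new DefaultParser();
try {
    CommandLine cmd = parser.parse(options, args);
    // use cmd.getOptionValue("d"), cmd.getOptionValue("j"), ...
} catch (ParseException e) {
    // with both groups marked as required, this is where the
    // "Missing required option(s)" messages above come from
    System.err.println(e.getMessage());
}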

I think it is time to move on from Apache Commons CLI and treat it as a legacy library. If you only have a few command-line arguments you can still use it, but otherwise it is better avoided. Even though the project was updated recently (17 February 2019), many features are still missing and working with the library is a bit painful.
The picocli project looks like a better candidate for parsing command-line parameters. It is a quite intuitive library, easy to use, and it has nice, comprehensive documentation as well. I think that an average tool with excellent documentation is better than a shiny project without any documentation.
Anyway, picocli is a very nice library with excellent documentation, so I give it a double plus-plus :)
This is how I covered my use cases with picocli:
import picocli.CommandLine;
import picocli.CommandLine.ArgGroup;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;
import picocli.CommandLine.Parameters;
@Command(name = "SqlRunner",
sortOptions = false,
usageHelpWidth = 100,
description = "SQL command line tool. It executes the given SQL and shows the result on the standard output.\n",
parameterListHeading = "General options:\n",
footerHeading = "\nPlease report issues at arnold.somogyi@gmail.com.",
footer = "\nDocumentation, source code: https://github.com/zappee/sql-runner.git")
public class SqlRunner implements Runnable {
/**
* Definition of the general command line options.
*/
#Option(names = {"-?", "--help"}, usageHelp = true, description = "Display this help and exit.")
private boolean help;
#Option(names = {"-d", "--dialect"}, defaultValue = "oracle", showDefaultValue = CommandLine.Help.Visibility.ALWAYS, description = "Supported SQL dialects: oracle.")
private static String dialect;
@ArgGroup(exclusive = true, multiplicity = "1", heading = "\nProvide a JDBC URL:\n")
MainArgGroup mainArgGroup;
/**
* Two exclusive parameter groups:
* (1) JDBC URL parameter
* (2) Custom connection parameters
*/
static class MainArgGroup {
/**
* JDBC URL option (only one parameter).
*/
#Option(names = {"-j", "--jdbcUrl"}, arity = "1", description = "JDBC URL, example: jdbc:oracle:<drivertype>:#//<host>:<port>/<database>.")
private static String jdbcUrl;
/**
* Custom connection parameter group.
*/
@ArgGroup(exclusive = false, multiplicity = "1", heading = "\nCustom configuration:\n")
CustomConfigurationGroup customConfigurationGroup;
}
/**
* Definition of the SQL which will be executed.
*/
@Parameters(index = "0", arity = "1", description = "SQL to be executed. Example: 'select 1 from dual'")
String sql;
/**
* Custom connection parameters.
*/
static class CustomConfigurationGroup {
#Option(names = {"-h", "--host"}, required = true, description = "Name of the database server.")
private static String host;
#Option(names = {"-p", "--port"}, required = true, description = "Number of the port where the server listens for requests.")
private static String port;
#Option(names = {"-s", "--sid"}, required = true, description = "Name of the particular database on the server. Also known as the SID in Oracle terminology.")
private static String sid;
#Option(names = {"-U", "--user"}, required = true, description = "Name for the login.")
private static String user;
#Option(names = {"-P", "--password"}, required = true, description = "Password for the connecting user.")
private static String password;
}
/**
* The entry point of the executable JAR.
*
* @param args command line parameters
*/
public static void main(String[] args) {
CommandLine cmd = new CommandLine(new SqlRunner());
int exitCode = cmd.execute(args);
System.exit(exitCode);
}
/**
* Business logic of the command; picocli invokes this method after a successful parse.
*/
@Override
public void run() {
int exitCode = 0; //executeMyStaff();
System.exit(exitCode);
}
}
And this is what the generated help looks like:
$ java -jar target/sql-runner-1.0-shaded.jar --help
Usage: SqlRunner [-?] [-d=<dialect>] (-j=<jdbcUrl> | (-h=<host> -p=<port> -s=<sid> -U=<user>
-P=<password>)) <sql>
SQL command line tool. It executes the given SQL and shows the result on the standard output.
General options:
<sql> SQL to be executed. Example: 'select 1 from dual'
-?, --help Display this help and exit.
-d, --dialect=<dialect> Supported SQL dialects: oracle.
Default: oracle
Custom configuration:
-h, --host=<host> Name of the database server.
-p, --port=<port> Number of the port where the server listens for requests.
-s, --sid=<sid> Name of the particular database on the server. Also known as the SID in
Oracle terminology.
-U, --user=<user> Name for the login.
-P, --password=<password> Password for the connecting user.
Provide a JDBC URL:
-j, --jdbcUrl=<jdbcUrl> JDBC URL, example: jdbc:oracle:<drivertype>:@//<host>:<port>/<database>.
Please report issues at arnold.somogyi@gmail.com.
Documentation, source code: https://github.com/zappee/sql-runner.git
This looks much better than the help generated by Apache Commons CLI.

Related

Weka Experiment with API "ExpType"

I am quite new to Java and I am trying to use it to conduct experiments with Weka. I am trying to use this code: https://waikato.github.io/weka-wiki/files/ExperimentDemo.java provided by Weka.
However, I don't know how to set the experiment type in this code.
// 1. setup the experiment
System.out.println("Setting up...");
Experiment exp = new Experiment();
exp.setPropertyArray(new Classifier[0]);
exp.setUsePropertyIterator(true);
String option;
// classification or regression
option = Utils.getOption("exptype", args);
if (option.length() == 0)
throw new IllegalArgumentException("No experiment type provided!");
SplitEvaluator se = new weka.experiment.ClassifierSplitEvaluator (); //my
Classifier sec = null;
"I could not find how to set "exptype" on API. What should be written there for "classification".
Thank you in advance !
If you take a look at the Javadoc of the main method:
/**
* Expects the following parameters:
* <ul>
* <li>-classifier "classifier incl. parameters"</li>
* <li>-exptype "classification|regression"</li>
* <li>-splittype "crossvalidation|randomsplit"</li>
* <li>-runs "# of runs"</li>
* <li>-folds "# of cross-validation folds"</li>
* <li>-percentage "percentage for randomsplit"</li>
* <li>-result "arff file for storing the results"</li>
* <li>-t "dataset" (can be supplied multiple times)</li>
* </ul>
*
* @param args the commandline arguments
* @throws Exception if something goes wrong
*/
These are the parameters (as a java.lang.String array) that you have to supply when executing the main method. This example class is designed to be called from the command line with parameters.
Here is an example command line:
java -cp weka.jar:. ExperimentDemo -classifier "weka.classifiers.trees.J48 -M 2" -exptype classification -splittype crossvalidation -runs 10 -folds 10 -result results.arff -t dataset.arff
NB: This command assumes Linux (use ; as path separator under Windows) and that the weka.jar file, the class file of the aforementioned example class and the dataset dataset.arff are all in the current directory.
Or if you want to call the main method yourself, construct a String array accordingly:
String[] options = new String[]{
"-classifier",
"weka.classifiers.trees.J48 -M 2",
"-exptype",
"classification",
"-splittype",
"crossvalidation",
"-runs",
"10",
"-folds",
"10",
"-result",
"results.arff",
"-t",
"dataset.arff"
};
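and then invoke the demo's main method with that array. A minimal sketch (the wrapper class name is illustrative; ExperimentDemo.main declares throws Exception, as shown in the Javadoc above):
public class RunClassificationExperiment {
    public static void main(String[] args) throws Exception {
        String[] options = new String[]{
            "-classifier", "weka.classifiers.trees.J48 -M 2",
            "-exptype", "classification",
            "-splittype", "crossvalidation",
            "-runs", "10",
            "-folds", "10",
            "-result", "results.arff",
            "-t", "dataset.arff"
        };
        // delegate to the Weka example class from the question
        ExperimentDemo.main(options);
    }
}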

How to run Spark code in Airflow?

Hello people of the Earth!
I'm using Airflow to schedule and run Spark tasks.
All I have found so far is Python DAGs that Airflow can manage.
DAG example:
spark_count_lines.py
import logging
from airflow import DAG
from airflow.operators import PythonOperator
from datetime import datetime
args = {
'owner': 'airflow'
, 'start_date': datetime(2016, 4, 17)
, 'provide_context': True
}
dag = DAG(
'spark_count_lines'
, start_date = datetime(2016, 4, 17)
, schedule_interval = '@hourly'
, default_args = args
)
def run_spark(**kwargs):
import pyspark
sc = pyspark.SparkContext()
df = sc.textFile('file:///opt/spark/current/examples/src/main/resources/people.txt')
logging.info('Number of lines in people.txt = {0}'.format(df.count()))
sc.stop()
t_main = PythonOperator(
task_id = 'call_spark'
, dag = dag
, python_callable = run_spark
)
The problem is that I'm not good with Python code and I have some tasks written in Java. My question is: how do I run a Spark Java jar in a Python DAG? Or maybe there is another way to do it? I found spark-submit: http://spark.apache.org/docs/latest/submitting-applications.html
But I don't know how to connect everything together. Maybe someone has used it before and has a working example. Thank you for your time!
You should be able to use BashOperator. Keeping the rest of your code as is, import the required class and system packages:
from airflow.operators.bash_operator import BashOperator
import os
import sys
set required paths:
os.environ['SPARK_HOME'] = '/path/to/spark/root'
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'bin'))
and add operator:
spark_task = BashOperator(
task_id='spark_java',
bash_command='spark-submit --class {{ params.class }} {{ params.jar }}',
params={'class': 'MainClassName', 'jar': '/path/to/your.jar'},
dag=dag
)
You can easily extend this to provide additional arguments using Jinja templates.
You can of course adjust this for a non-Spark scenario by replacing bash_command with a template suitable for your case, for example:
bash_command = 'java -jar {{ params.jar }}'
and adjusting params.
Airflow, as of version 1.8 (released today), has:
SparkSqlOperator - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_sql_operator.py ;
SparkSQLHook code - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_sql_hook.py
SparkSubmitOperator - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py
SparkSubmitHook code - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
Notice that these new Spark operators/hooks are in the "contrib" branch as of version 1.8, so they are not (well) documented.
So you can use SparkSubmitOperator to submit your Java code for Spark execution.
Here is an example of SparkSubmitOperator usage for Spark 2.3.1 on Kubernetes (minikube instance):
"""
Code that goes along with the Airflow located at:
http://airflow.readthedocs.org/en/latest/tutorial.html
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from airflow.models import Variable
from datetime import datetime, timedelta
default_args = {
'owner': 'user@mail.com',
'depends_on_past': False,
'start_date': datetime(2018, 7, 27),
'email': ['user@mail.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
'end_date': datetime(2018, 7, 29),
}
dag = DAG(
'tutorial_spark_operator', default_args=default_args, schedule_interval=timedelta(1))
t1 = BashOperator(
task_id='print_date',
bash_command='date',
dag=dag)
print_path_env_task = BashOperator(
task_id='print_path_env',
bash_command='echo $PATH',
dag=dag)
spark_submit_task = SparkSubmitOperator(
task_id='spark_submit_job',
conn_id='spark_default',
java_class='com.ibm.cdopoc.DataLoaderDB2COS',
application='local:///opt/spark/examples/jars/cppmpoc-dl-0.1.jar',
total_executor_cores='1',
executor_cores='1',
executor_memory='2g',
num_executors='2',
name='airflowspark-DataLoaderDB2COS',
verbose=True,
driver_memory='1g',
conf={
'spark.DB_URL': 'jdbc:db2://dashdb-dal13.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;',
'spark.DB_USER': Variable.get("CEDP_DB2_WoC_User"),
'spark.DB_PASSWORD': Variable.get("CEDP_DB2_WoC_Password"),
'spark.DB_DRIVER': 'com.ibm.db2.jcc.DB2Driver',
'spark.DB_TABLE': 'MKT_ATBTN.MERGE_STREAM_2000_REST_API',
'spark.COS_API_KEY': Variable.get("COS_API_KEY"),
'spark.COS_SERVICE_ID': Variable.get("COS_SERVICE_ID"),
'spark.COS_ENDPOINT': 's3-api.us-geo.objectstorage.softlayer.net',
'spark.COS_BUCKET': 'data-ingestion-poc',
'spark.COS_OUTPUT_FILENAME': 'cedp-dummy-table-cos2',
'spark.kubernetes.container.image': 'ctipka/spark:spark-docker',
'spark.kubernetes.authenticate.driver.serviceAccountName': 'spark'
},
dag=dag,
)
t1.set_upstream(print_path_env_task)
spark_submit_task.set_upstream(t1)
The code uses values stored in Airflow Variables (via Variable.get(...)).
Also, you need to create a new Spark connection, or edit the existing 'spark_default' one, with the
extra dictionary {"queue":"root.default", "deploy-mode":"cluster", "spark-home":"", "spark-binary":"spark-submit", "namespace":"default"}.
Go to Admin -> Connection -> Create in the Airflow UI. Create a new SSH connection by providing host = IP address, port = 22 and extra as {"key_file": "/path/to/pem/file", "no_host_key_check":true}.
This host should be the Spark cluster master from which you can submit Spark jobs. Next, you need to create a DAG with an SSHOperator. The following is a template for this.
with DAG(dag_id='ssh-dag-id',
schedule_interval="05 12 * * *",
catchup=False) as dag:
spark_job = ("spark-submit --class fully.qualified.class.name "
"--master yarn "
"--deploy-mode client "
"--driver-memory 6G "
"--executor-memory 6G "
"--num-executors 6 "
"/path/to/your-spark.jar")
ssh_run_query = SSHOperator(
task_id="random_task_id",
ssh_conn_id="name_of_connection_you just_created",
command=spark_job,
get_pty=True,
dag=dag)
ssh_run_query
That's it. You also get the complete logs for this Spark job in Airflow.

(Jython) Problems to run python script from java

I wrote 2 python scripts named runner.py and connect.py.
The runner script starts a traffic simulation on a specific port, and the other one connects and is able to send commands. Both scripts work fine in my Python IDE, but I want to start both scripts from Java to receive data.
import org.python.core.PyInstance;
import org.python.util.PythonInterpreter;
import org.python.core.PyObject;
import org.python.core.PyString;
public class PythonHandler {
PythonInterpreter interpreter = null;
String script_dir;
public PythonHandler() {
PythonInterpreter.initialize(System.getProperties(), System.getProperties(), new String[0]);
this.interpreter = new PythonInterpreter();
this.script_dir = System.getProperty("user.dir");
}
void execfile(final String fileName) {
this.interpreter.execfile(fileName);
}
PyInstance createClass(final String className, final String opts) {
return (PyInstance) this.interpreter.eval(className + "(" + opts + ")");
}
/**
* This method will start the python script runner.py
* NOTE: doesn't work if there is not a main method in the python script
*/
public void startRunner() {
String runner_dir = script_dir + "\\src\\de\\uniol\\inf\\is\\odysseus\\pgtaxi\\traci\\traci4python\\runner.py";
PythonHandler ie = new PythonHandler();
ie.execfile(runner_dir);
}
/**
* This method will start the python script connect.py
* NOTE: doesn't work if there is not a main method in the python script
*/
public void startConnect() {
String connect_dir = script_dir
+ "\\src\\de\\uniol\\inf\\is\\odysseus\\pgtaxi\\traci\\traci4python\\connect.py";
PythonHandler ie = new PythonHandler();
ie.execfile(connect_dir);
}
/**
* This method will start the python script connect.py
* If there is not a main method you can run a specific function
*
* @param function
* name of the function you want to start
*/
public void callConnectFunction(String function) {
String connect_dir = script_dir
+ "\\src\\de\\uniol\\inf\\is\\odysseus\\pgtaxi\\traci\\traci4python\\connect.py";
PythonHandler ie = new PythonHandler();
ie.execfile(connect_dir);
PyInstance run = ie.createClass("Connection", "None");
run.invoke(function);
}
/**
* This method will start the python script runner.py
* If there is not a main method you can run a specific function
*
* @param function
* name of the function you want to start
*/
public void callRunnerFunction(String function) {
String runner_dir = script_dir + "\\src\\de\\uniol\\inf\\is\\odysseus\\pgtaxi\\traci\\traci4python\\connect.py";
PythonHandler ie = new PythonHandler();
ie.execfile(runner_dir);
PyInstance run = ie.createClass("Runner", "None");
run.invoke(function);
}
}
Both scripts start, but an error occurs in connect.py. I don't understand why I'm able to run the script from the Python IDE but not from my Java code.
Here is the code of the Python scripts:
runner.py
import sys
import subprocess
import os
PORT = 8873
class Runner:
__gui = None
def __init__(self, gui):
self.__gui = gui
print "Starting runner..."
def runLocal(self):
sumoBinary = os.path.abspath(os.curdir)
sumoBinary = sumoBinary.split('de.uniol.inf.is.odysseus.pgtaxi')[0]
sumoBinary = sumoBinary + 'de.uniol.inf.is.odysseus.pgtaxi\\sumo\\bin\\sumo-gui'
scenario = os.path.abspath(os.curdir)
scenario = sumoBinary.split('de.uniol.inf.is.odysseus.pgtaxi')[0]
scenario = scenario + 'de.uniol.inf.is.odysseus.pgtaxi\\scenario\\oldenburg.sumocfg'
sumoProcess = subprocess.Popen([sumoBinary, "-c", scenario, "--remote-port", str(PORT)], stdout=sys.stdout, stderr=sys.stderr)
if __name__ == '__main__':
conn = Runner('None')
conn.runLocal()
connect.py
import sys
import os
PORT = 8873
class Connection:
__gui = None
def __init__(self, gui):
self.__gui = gui
print "Starting connect..."
def initTraci(self):
tools = os.path.abspath(os.curdir)
tools = tools.split('de.uniol.inf.is.odysseus.pgtaxi')[0]
tools = tools + 'de.uniol.inf.is.odysseus.pgtaxi\\sumo\\tools'
sys.path.append(tools)
import traci
traci.init(PORT)
step = 0
while step < 1000:
traci.simulationStep()
step += 1
traci.close()
sys.stdout.flush()
def getFreePort(self):
tools = os.path.abspath(os.curdir)
tools = tools.split('de.uniol.inf.is.odysseus.pgtaxi')[0]
tools = tools + 'de.uniol.inf.is.odysseus.pgtaxi\\sumo\\tools'
sys.path.append(tools)
import sumolib
PORT = sumolib.miscutils.getFreeSocketPort()
if __name__ == '__main__':
conn = Connection('None')
conn.initTraci()
I get this Exception:
Exception in thread "MainThread" Traceback (most recent call last):
File "C:\Users\FEPREUSS\Desktop\PG\workspace\de.uniol.inf.is.odysseus.pgtaxi\src\de\uniol\inf\is\odysseus\pgtaxi\traci\traci4python\connect.py", line 44, in <module>
conn.initTraci()
File "C:\Users\FEPREUSS\Desktop\PG\workspace\de.uniol.inf.is.odysseus.pgtaxi\src\de\uniol\inf\is\odysseus\pgtaxi\traci\traci4python\connect.py", line 25, in initTraci
traci.init(PORT)
File "C:\Users\FEPREUSS\Desktop\PG\workspace\de.uniol.inf.is.odysseus.pgtaxi\sumo\tools\traci\__init__.py", line 65, in init
return getVersion()
File "C:\Users\FEPREUSS\Desktop\PG\workspace\de.uniol.inf.is.odysseus.pgtaxi\sumo\tools\traci\__init__.py", line 82, in getVersion
return _connections[""].getVersion()
AttributeError: 'NoneType' object has no attribute 'getVersion'
And the two methods from the library that cause the exception:
def init(port=8813, numRetries=10, host="localhost", label="default"):
"""
Establish a connection to a TraCI-Server and store it under the given
label. This method is not thread-safe. It accesses the connection
pool concurrently.
"""
_connections[label] = connect(port, numRetries, host)
switch(label)
return getVersion()
def getVersion():
return _connections[""].getVersion()
Hope someone can help me.
You stumbled on a Jython bug which was masked by SUMO hiding the error message of the failed socket connection. Unfortunately you cannot work around it easily other than by editing SUMO/tools/traci/connection.py at line 49.
Just replace self._socket = socket() with
self._socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
A workaround was committed to the SUMO repository as well.

How to use property=value in the Commons CLI library

I am trying to use the OptionBuilder.withArgName( "property=value" )
If my Option is called status and my command line was:
--status p=11 s=22
It only succeeds in identifying the first argument, which is 11, and fails to identify the second one...
Option status = OptionBuilder.withLongOpt("status")
.withArgName( "property=value" )
.hasArgs(2)
.withValueSeparator()
.withDescription("Get the status")
.create('s');
options.addOption(status);
Thanks in advance for your help
You can access the passed properties with a simple modification of the passed command-line options:
--status p=11 --status s=22
or with your short syntax
-s p=11 -s s=22
In this case you can access your properties simply with this code:
if (cmd.hasOption("status")) {
Properties props = cmd.getOptionProperties("status");
System.out.println(props.getProperty("p"));
System.out.println(props.getProperty("t"));
}
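For completeness, here is a self-contained sketch of this approach using the non-deprecated Option.builder API (the class name and sample arguments are illustrative):
import java.util.Properties;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
public class StatusCli {
    public static void main(String[] args) throws ParseException {
        Option status = Option.builder("s")
                .longOpt("status")
                .argName("property=value")
                .numberOfArgs(2)         // each occurrence carries a key and a value
                .valueSeparator('=')     // split "p=11" into "p" and "11"
                .desc("Get the status")
                .build();
        Options options = new Options();
        options.addOption(status);
        // e.g. args = {"--status", "p=11", "--status", "s=22"}
        CommandLine cmd = new DefaultParser().parse(options, args);
        if (cmd.hasOption("status")) {
            Properties props = cmd.getOptionProperties("status");
            System.out.println(props.getProperty("p"));  // 11
            System.out.println(props.getProperty("s"));  // 22
        }
    }
}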
If you need to use your syntax strictly, you can manually parse your property=value pairs.
In this case you should remove the .withValueSeparator() call, and then use:
String [] propvalues = cmd.getOptionValues("status");
for (String propvalue : propvalues) {
String [] values = propvalue.split("=");
System.out.println(values[0] + " : " + values[1]);
}

How to Execute SQL Script File in Java?

I want to execute an SQL script file in Java without reading the entire file content into a big query and executing it.
Is there any other standard way?
There is a great way of executing SQL scripts from Java without reading them yourself, as long as you don't mind having a dependency on Ant. In my opinion such a dependency is very well justified in your case. Here is sample code, where the SQLExec class lives in ant.jar:
private void executeSql(String sqlFilePath) {
final class SqlExecuter extends SQLExec {
public SqlExecuter() {
Project project = new Project();
project.init();
setProject(project);
setTaskType("sql");
setTaskName("sql");
}
}
SqlExecuter executer = new SqlExecuter();
executer.setSrc(new File(sqlFilePath));
executer.setDriver(args.getDriver());
executer.setPassword(args.getPwd());
executer.setUserid(args.getUser());
executer.setUrl(args.getUrl());
executer.execute();
}
There is no portable way of doing that. You can execute a native client as an external program to do that though:
import java.io.*;
public class CmdExec {
public static void main(String argv[]) {
try {
String line;
Process p = Runtime.getRuntime().exec
("psql -U username -d dbname -h serverhost -f scripfile.sql");
BufferedReader input =
new BufferedReader
(new InputStreamReader(p.getInputStream()));
while ((line = input.readLine()) != null) {
System.out.println(line);
}
input.close();
}
catch (Exception err) {
err.printStackTrace();
}
}
}
The code sample was extracted from here and modified to answer the question, assuming that the user wants to execute a PostgreSQL script file.
The Flyway library is really good for this:
Flyway flyway = new Flyway();
flyway.setDataSource(dbConfig.getUrl(), dbConfig.getUsername(), dbConfig.getPassword());
flyway.setLocations("classpath:db/scripts");
flyway.clean();
flyway.migrate();
This scans the locations for scripts and runs them in order. Scripts can be versioned like V01__name.sql, so if only migrate is called then only the scripts not already run will be executed. Flyway uses a table called 'schema_version' to keep track of things. It can do other things too, see the Flyway docs.
The clean call isn't required, but useful to start from a clean DB.
Also, be aware of the location (the default is "classpath:db/migration"); there is no space after the ':', which is what caught me out.
No, you must read the file, split it into separate queries and then execute them individually (or use the batch API of JDBC).
One of the reasons is that every database defines its own way to separate SQL statements (some use ;, others /, some allow both or even let you define your own separator).
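A minimal sketch of that manual approach (assuming the script uses ';' as the statement separator and contains no procedural blocks; the file name and JDBC URL are placeholders):
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
public class SimpleScriptRunner {
    public static void main(String[] args) throws Exception {
        String script = new String(Files.readAllBytes(Paths.get("script.sql")));
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:test", "sa", "");
             Statement stmt = conn.createStatement()) {
            for (String sql : script.split(";")) {
                if (!sql.trim().isEmpty()) {
                    stmt.execute(sql);  // or addBatch()/executeBatch() for plain DML
                }
            }
        }
    }
}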
You cannot do this using plain JDBC, as it does not support it. A workaround would be to include iBATIS (a persistence framework) and call the ScriptRunner constructor as shown in the iBATIS documentation.
However, it is not great to include a heavyweight persistence framework like iBATIS just to run a simple SQL script, which you can anyway do from the command line:
$ mysql -u root -p db_name < test.sql
Since JDBC doesn't support this option, the best way to solve this is to execute command lines via the Java program. Below is an example for PostgreSQL:
private void executeSqlFile() {
try {
Runtime rt = Runtime.getRuntime();
String executeSqlCommand = "psql -U (user) -h (domain) -f (script_name) (dbName)";
Process pr = rt.exec(executeSqlCommand);
int exitVal = pr.waitFor();
System.out.println("Exited with error code " + exitVal);
} catch (Exception e) {
System.out.println(e.toString());
}
}
The Apache iBatis solution worked like a charm.
The script example I used was exactly the script I was running from MySQL Workbench.
There is an article with examples here:
https://www.tutorialspoint.com/how-to-run-sql-script-using-jdbc#:~:text=You%20can%20execute%20.,to%20pass%20a%20connection%20object.&text=Register%20the%20MySQL%20JDBC%20Driver,method%20of%20the%20DriverManager%20class.
This is what I did:
pom.xml dependency
<!-- IBATIS SQL Script runner from Apache (https://mvnrepository.com/artifact/org.apache.ibatis/ibatis-core) -->
<dependency>
<groupId>org.apache.ibatis</groupId>
<artifactId>ibatis-core</artifactId>
<version>3.0</version>
</dependency>
Code to execute script:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import org.apache.ibatis.jdbc.ScriptRunner;
import lombok.extern.slf4j.Slf4j;
@Slf4j
public class SqlScriptExecutor {
public static void executeSqlScript(File file, Connection conn) throws Exception {
Reader reader = new BufferedReader(new FileReader(file));
log.info("Running script from file: " + file.getCanonicalPath());
ScriptRunner sr = new ScriptRunner(conn);
sr.setAutoCommit(true);
sr.setStopOnError(true);
sr.runScript(reader);
log.info("Done.");
}
}
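A possible call site for this helper (the JDBC URL, credentials and file name are placeholders):
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
public class SqlScriptExecutorDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password")) {
            SqlScriptExecutor.executeSqlScript(new File("schema.sql"), conn);
        }
    }
}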
For my simple project, the user should be able to select SQL files which then get executed.
As I was not happy with the other answers and I am using Flyway anyway, I took a closer look at the Flyway code. DefaultSqlScriptExecutor does the actual execution, so I tried to figure out how to create an instance of it.
Basically, the following snippet loads a String, splits it into single statements and executes them one by one.
Flyway also provides other LoadableResources than StringResource, e.g. FileSystemResource, but I have not taken a closer look at them.
As DefaultSqlScriptExecutor and the other classes are not officially documented by Flyway, use the code snippet with care.
public static void execSqlQueries(String sqlQueries, Configuration flyWayConf) throws SQLException {
// create dependencies FlyWay needs to execute the SQL queries
JdbcConnectionFactory jdbcConnectionFactory = new JdbcConnectionFactory(flyWayConf.getDataSource(),
flyWayConf.getConnectRetries(),
null);
DatabaseType databaseType = jdbcConnectionFactory.getDatabaseType();
ParsingContext parsingContext = new ParsingContext();
SqlScriptFactory sqlScriptFactory = databaseType.createSqlScriptFactory(flyWayConf, parsingContext);
Connection conn = flyWayConf.getDataSource().getConnection();
JdbcTemplate jdbcTemp = new JdbcTemplate(conn);
ResourceProvider resProv = flyWayConf.getResourceProvider();
DefaultSqlScriptExecutor scriptExec = new DefaultSqlScriptExecutor(jdbcTemp, null, false, false, false, null);
// Prepare and execute the actual queries
StringResource sqlRes = new StringResource(sqlQueries);
SqlScript sqlScript = sqlScriptFactory.createSqlScript(sqlRes, true, resProv);
scriptExec.execute(sqlScript);
}
The simplest external tool that I found that is also portable is jisql - https://www.xigole.com/software/jisql/jisql.jsp .
You would run it as:
java -classpath lib/jisql.jar:\
lib/jopt-simple-3.2.jar:\
lib/javacsv.jar:\
/home/scott/postgresql/postgresql-8.4-701.jdbc4.jar
com.xigole.util.sql.Jisql -user scott -password blah \
-driver postgresql \
-cstring jdbc:postgresql://localhost:5432/scott -c \; \
-query "select * from test;"
JDBC does not support this option (although a specific DB driver may offer this).
Anyway, there should not be a problem with loading all file contents into memory.
Try this code:
String strProc =
"DECLARE \n" +
" sys_date DATE;"+
"" +
"BEGIN\n" +
"" +
" SELECT SYSDATE INTO sys_date FROM dual;\n" +
"" +
"END;\n";
try{
DriverManager.registerDriver ( new oracle.jdbc.driver.OracleDriver () );
Connection connection = DriverManager.getConnection ("jdbc:oracle:thin:@your_db_IP:1521:your_db_SID","user","password");
PreparedStatement psProcToexecute = connection.prepareStatement(strProc);
psProcToexecute.execute();
}catch (Exception e) {
System.out.println(e.toString());
}
If you use Spring you can use DataSourceInitializer:
@Bean
public DataSourceInitializer dataSourceInitializer(@Qualifier("dataSource") final DataSource dataSource) {
ResourceDatabasePopulator resourceDatabasePopulator = new ResourceDatabasePopulator();
resourceDatabasePopulator.addScript(new ClassPathResource("/data.sql"));
DataSourceInitializer dataSourceInitializer = new DataSourceInitializer();
dataSourceInitializer.setDataSource(dataSource);
dataSourceInitializer.setDatabasePopulator(resourceDatabasePopulator);
return dataSourceInitializer;
}
Used to set up a database during initialization and clean up a database during destruction.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/datasource/init/DataSourceInitializer.html
