Failed to execute Kettle job file from Java - java

This is my first time dealing with Pentaho Data Integration. I'm trying to execute a job from Java, but I failed to do so, although the job works fine from Spoon.
This is my code:
jobMeta = new JobMeta(LogWriter.getInstance(), "E:\\rubbish\\job.kjb", null);
Job job = new Job(LogWriter.getInstance(), null , jobMeta);
job.start();
job.waitUntilFinished();
And this is the result, which contains an error:
INFO 01-07 16:32:03,305 - Using "C:\Users\AALKHA~1\AppData\Local\Temp\vf_cache" as temporary files store.
ERROR 01-07 16:32:03,468 - null.0 - Unable to read Job Entry copy info from XML node : org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for SPECIAL
ERROR 01-07 16:32:03,471 - null.0 - org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for SPECIAL
at org.pentaho.di.job.entry.JobEntryCopy.<init>(JobEntryCopy.java:110)
at org.pentaho.di.job.JobMeta.loadXML(JobMeta.java:922)
at org.pentaho.di.job.JobMeta.<init>(JobMeta.java:726)
at org.pentaho.di.job.JobMeta.<init>(JobMeta.java:693)
at ingramint.steps.fullIntStep.step1(fullIntStep.java:62)
at ingramint.steps.fullIntStep.<init>(fullIntStep.java:32)
at ingramint.libs.menuLib.mainMenu(menuLib.java:75)
at ingramint.Main.main(Main.java:24)
Jul 01, 2014 4:32:03 PM ingramint.steps.fullIntStep step1
SEVERE: null
org.pentaho.di.core.exception.KettleXMLException:
Unable to load the job from XML file [E:\rubbish\job.kjb]
Unable to load job info from XML node
Unable to read Job Entry copy info from XML node : org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for SPECIAL
No valid step/plugin specified (jobPlugin=null) for SPECIAL
at org.pentaho.di.job.JobMeta.<init>(JobMeta.java:734)
at org.pentaho.di.job.JobMeta.<init>(JobMeta.java:693)
at ingramint.steps.fullIntStep.step1(fullIntStep.java:62)
at ingramint.steps.fullIntStep.<init>(fullIntStep.java:32)
at ingramint.libs.menuLib.mainMenu(menuLib.java:75)
at ingramint.Main.main(Main.java:24)
Caused by: org.pentaho.di.core.exception.KettleXMLException:
Unable to load job info from XML node
Unable to read Job Entry copy info from XML node : org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for SPECIAL
No valid step/plugin specified (jobPlugin=null) for SPECIAL
at org.pentaho.di.job.JobMeta.loadXML(JobMeta.java:968)
at org.pentaho.di.job.JobMeta.<init>(JobMeta.java:726)
... 5 more
Caused by: org.pentaho.di.core.exception.KettleXMLException:
Unable to read Job Entry copy info from XML node : org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for SPECIAL
No valid step/plugin specified (jobPlugin=null) for SPECIAL
at org.pentaho.di.job.entry.JobEntryCopy.<init>(JobEntryCopy.java:134)
at org.pentaho.di.job.JobMeta.loadXML(JobMeta.java:922)
... 6 more
Caused by: org.pentaho.di.core.exception.KettleStepLoaderException:
No valid step/plugin specified (jobPlugin=null) for SPECIAL
at org.pentaho.di.job.entry.JobEntryCopy.<init>(JobEntryCopy.java:110)
... 7 more
I'm using these JARs:
kettle-core-3.2.2.jar
kettle-db-3.2.2.jar
kettle-engine-3.2.2.jar
So, any suggestions please?
Thank you.

Have a look at the code pasted below; it worked. You need to add all the JARs present in data-integration/lib to the classpath, not only the ones you mentioned above.
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class ExecuteJob {
    public static void main(String[] args) throws Exception {
        String filename = args[0];
        // Initialize the Kettle environment (plugin registry, VFS, etc.) before loading any job.
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta(filename, null);
        Job job = new Job(null, jobMeta);
        job.start();
        job.waitUntilFinished();
        if (job.getErrors() != 0) {
            System.out.println("Error encountered!");
        }
    }
}
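As far as I can tell, the SPECIAL entry mentioned in the error is the job's built-in START/DUMMY entry; it only resolves after Kettle's environment (and with it the job-entry registry) has been initialized. That is why KettleEnvironment.init() has to run before the JobMeta is constructed, and why the full set of data-integration/lib JARs needs to be on the classpath.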

Related

Pass argument via JNLP file into java web start application

I'm a beginner in Java programming, and I'm trying to make a Java Web Start app. I need to pass some args to the application through JNLP. When I hardcode the values in the app, the JNLP without args works fine and starts the application. Also, when I launch the application from cmd with arguments, it works. So I think the problem is in the JNLP file. When I try to launch the app with a JNLP that contains arguments, the start fails with this error:
java.lang.ArrayIndexOutOfBoundsException: 0
JNLP file looks like:
<application-desc main-class="SpeedScanLaunch" >
<argument>http://localhost/TestApp/Dam/</argument >
<argument>x</argument >
<argument>50004</argument >
</application-desc >
and Java class looks like:
import java.io.*;
import javax.swing.JOptionPane;

public class SpeedScanLaunch
{
    public static void main(String args[]) throws IOException
    {
        String p = args[0];
        String t = "/type:" + args[1];
        String id = "/id:" + args[2];
        JOptionPane.showMessageDialog(null, p);
        JOptionPane.showMessageDialog(null, t);
        JOptionPane.showMessageDialog(null, id);
        ...
Does anyone know what the problem with JNLP is?
In addition, a screenshot from JaNeLA:
Screenshot from JaNeLA
And text report from JaNeLa:
JaNeLA Report - version 11.05.17
Report for file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch1.jnlp
Content type application/xml does not equal expected type of application/x-java-jnlp-file
cvc-complex-type.2.4.d: Invalid content was found starting with element 'security'. No child element is expected at this point.
cvc-complex-type.2.4.d: Invalid content was found starting with element 'security'. No child element is expected at this point.
XML encoding not known, but declared as utf-8
Codebase '.' is a malformed URL! Defaulting to file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch1.jnlp
Codebase + href 'file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch.jnlp' is not equal to actual location of 'file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch1.jnlp'.
Optimize this application for off-line use by adding the <offline-allowed /> flag.
Codebase '.' is a malformed URL! Defaulting to file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch1.jnlp
Codebase '.' is a malformed URL! Defaulting to file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch1.jnlp
Codebase '.' is a malformed URL! Defaulting to file:/E:/eclipse_workspace/TestProjekt/bin/SpeedScanLaunch1.jnlp
Downloads can be optimized by specifying a resource size for 'SpeedScanLaunch.jar'.
The resource download at SpeedScanLaunch.jar can be optimized by removing the (default) value of download='eager'.
The resource download at SpeedScanLaunch.jar can be optimized by removing the (default) value of main='false'.
It might be possible to optimize the start-up of the app. by specifying download='lazy' for the SpeedScanLaunch.jar resource.
Lazy downloads might not work as expected for SpeedScanLaunch.jar unless the download 'part' is specified.

What are the proper values of HADOOP_HOME and PIG_CLASSPATH for Apache Hadoop version 2.8.0?

I have a problem executing a Hadoop command from the Pig command line. The command and the error stack are below.
My instructor suspects that it is because HADOOP_HOME and PIG_CLASSPATH are incorrect. I am on Hadoop version 2.8.0.
So, originally I had HADOOP_HOME as
HADOOP_HOME=<CELLAR_DIRECTORY>/hadoop/2.8.0/
Then I switched to the following setup:
HADOOP_HOME=<CELLAR_DIRECTORY>/hadoop/2.8.0/libexec/etc/hadoop
PIG_CLASSPATH is defined as $HADOOP_HOME
Commands I used in pig:
A = LOAD '/Users/anarinsky/Downloads/loaddata1.txt';
B = MAPREDUCE '/Users/anarinsky/workspace/wordcount/target/wordcount-1.jar' STORE A INTO '/Users/anarinsky/Downloads/tempwrite2' LOAD '/Users/anarinsky/Downloads/tempwrite2' AS (word:chararray, count:int) `com.systemskills.hadoop.wordcount.WordCountDriver /wordcountdata /Users/anarinsky/Downloads/pigoptdir`;
Pig Stack Trace
ERROR 2025: Expected leaf of reduce plan to always be POStore. Found PONative
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:1019)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:747)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:564)
at org.apache.pig.Main.main(Main.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias B
at org.apache.pig.PigServer.storeEx(PigServer.java:1122)
at org.apache.pig.PigServer.store(PigServer.java:1081)
at org.apache.pig.PigServer.openIterator(PigServer.java:994)
... 13 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2025: Expected leaf of reduce plan to always be POStore. Found PONative
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:321)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:629)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:152)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:308)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1474)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1459)
at org.apache.pig.PigServer.storeEx(PigServer.java:1118)
... 15 more
Alex!
Unfortunately, it's not related to the Pig paths (I tried it on my configured Hadoop cluster with the same result). The error you get comes from a bug in the physical plan compiler's compile method. So, to make your attempt work, you have two possibilities:
Run the native MR job using hadoop directly and, after it finishes, process its results in Pig (see the note after the code below).
Edit the Pig source code and compile your own version. You'll need to edit the org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler#compile method and replace
for (PhysicalOperator op : leaves) {
if (!(op instanceof POStore)) {
int errCode = 2025;
String msg = "Expected leaf of reduce plan to " +
"always be POStore. Found " + op.getClass().getSimpleName();
throw new MRCompilerException(msg, errCode, PigException.BUG);
}
}
with
for (PhysicalOperator op : leaves) {
if (!(op instanceof POStore) && !(op instanceof PONative)) {
int errCode = 2025;
String msg = "Expected leaf of reduce plan to " +
"always be POStore. Found " + op.getClass().getSimpleName();
throw new MRCompilerException(msg, errCode, PigException.BUG);
}
}
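As for the first option, roughly: run the word-count jar with hadoop jar directly against /wordcountdata, wait for it to finish, and then LOAD the job's output directory in a plain Pig script instead of wrapping the job in the MAPREDUCE operator.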

Found an unexpected Node #text with name = null and with text content =

While deploying a service to WebLogic, I got the following exception, despite CXF successfully generating sources:
weblogic.management.DeploymentException: Error encountered during prepare phase of deploying WebService module 'brt-service-1.0.war'. While deploying WebService module 'brt-service-1.0.war'. Error encountered while attempting to Load WSDL Definitions for WSDL: 'zip:C:/devapps/weblogicDomain/servers/myserver/tmp/_WL_user/_appsdir_brt-service-1.0_war/l2mgsr/war/WEB-INF/lib/_wl_cls_gen.jar!/brt.wsdl'. Found an un expeced Node #text with name = null and with text content =
at weblogic.wsee.deploy.WSEEModule.prepare(WSEEModule.java:149)
at weblogic.wsee.deploy.AppDeploymentExtensionFactory.prepare(AppDeploymentExtensionFactory.java:79)
at weblogic.wsee.deploy.AppDeploymentExtensionFactory.access$100(AppDeploymentExtensionFactory.java:15)
at weblogic.wsee.deploy.AppDeploymentExtensionFactory$1.prepare(AppDeploymentExtensionFactory.java:219)
at weblogic.application.internal.flow.AppDeploymentExtensionFlow.prepare(AppDeploymentExtensionFlow.java:23)
Truncated. see log file for complete stacktrace
Caused By: weblogic.wsee.wsdl.WsdlException: Found an un expeced Node #text with name = null and with text content =
at weblogic.wsee.wsdl.WsdlReader.checkDomElement(WsdlReader.java:106)
at weblogic.wsee.wsdl.internal.WsdlExtensibleImpl.parse(WsdlExtensibleImpl.java:114)
at weblogic.wsee.wsdl.internal.WsdlDefinitionsImpl.parseChild(WsdlDefinitionsImpl.java:564)
at weblogic.wsee.wsdl.internal.WsdlExtensibleImpl.parse(WsdlExtensibleImpl.java:116)
at weblogic.wsee.wsdl.internal.WsdlDefinitionsImpl.parse(WsdlDefinitionsImpl.java:501)
Truncated. see log file for complete stacktrace
I'm not proud of how long this took me to find. The simple things are so often the hardest.
<wsdl:part name="response" element="beans:changeRequestSubmitRes"/>]
That errant typo'd bracket didn't cause a problem for CXF, but WebLogic, at least at 10.3.6, didn't know what to do with it.
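In case it is useful, the fixed line should presumably be the same declaration with the stray trailing bracket removed:
<wsdl:part name="response" element="beans:changeRequestSubmitRes"/>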

Apache Pig Input Path error using Cloudera quick-start vm and pig shell

I tried to run the following Pig commands for a Yelp assignment:
-- ******* PIG LATIN SCRIPT for Yelp Assignment ******************
-- 0. get function defined for CSV loader
register /usr/lib/pig/piggybank.jar;
define CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
-- The data-fu jar file has a CSVLoader with more options, like reading multiline records,
-- but for this assignment we don't need it, so the next line is commented out
-- register /home/cloudera/incubator-datafu/datafu-pig/build/libs/datafu-pig-incubating-1.3.0-SNAPSHOT.jar;
-- 1 load data
Y = LOAD '/usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv' USING CSVLoader() AS(business_id:chararray,cool,date,funny,id,stars:int,text:chararray,type,useful:int,user_id,name,full_address,latitude,longitude,neighborhoods,open,review_count,state);
Y_good = FILTER Y BY (useful is not null and stars is not null);
--2 Find max useful
Y_all = GROUP Y_good ALL;
Umax = FOREACH Y_all GENERATE MAX(Y_good.useful);
DUMP Umax
Unfortunately, I get the following Error:
Failed!
Failed Jobs: JobId Alias Feature Message Outputs
job_1455222366557_0010 Umax,Y,Y_all,Y_good GROUP_BY,COMBINER Message:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 18 more
hdfs://quickstart.cloudera:8020/tmp/temp864992621/tmp897146964,
Input(s): Failed to read data from "/usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv"
Output(s): Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp864992621/tmp897146964"
Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0
Job DAG: job_1455222366557_0010
2016-02-15 06:22:16,643 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-02-15 06:22:16,686 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias Umax
Details at logfile: /home/cloudera/pig_1455546020789.log
I have checked the path to the file here (see the image below). It seems to match the path seen in the error:
hdfs://quickstart.cloudera:8020/usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv
So, I do not know how else to resolve it. Could there be something else I am not seeing? Any help will be appreciated. Thanks in advance.
You need to upload your CSV into HDFS (using hadoop dfs -put) and give that path in the LOAD command (LOAD '{hdfs path of csv file}').
When running on the cluster, Pig reads from HDFS, and a relative path in LOAD resolves against your HDFS home directory, which on the quickstart VM is /user/cloudera/. The original file exists only on the local filesystem, not in HDFS, which is why Pig reports that the input path does not exist.
So, the way out is to put the file somewhere under that directory, for example:
hdfs dfs -put /usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv /user/cloudera/pigin
and use the following path in the LOAD:
Y = LOAD 'pigin/index_data.csv' USING CSVLoader() AS(business_id:chararray,cool,date,funny,id,stars:int,text:chararray,type,useful:int,user_id,name,full_address,latitude,longitude,neighborhoods,open,review_count,state);
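If you want to double-check before re-running the script, hdfs dfs -ls pigin should list the uploaded index_data.csv under /user/cloudera/pigin.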

How to create an XMLConfiguration on demand

I'm using commons-configuration 1.10 and would like to have my configuration file created only when needed.
For now, I have
XMLConfiguration config= new XMLConfiguration(file);
config.setReloadingStrategy(new FileChangedReloadingStrategy());
config.setAutoSave(true);
But, when I try to call
config.setProperty("portal.0.name", portal.getName());
I get
Caused by: org.apache.commons.configuration.ConfigurationException: org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
at org.apache.commons.configuration.XMLConfiguration.createDocument(XMLConfiguration.java:914)
at org.apache.commons.configuration.XMLConfiguration.save(XMLConfiguration.java:1034)
at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration$FileConfigurationDelegate.save(AbstractHierarchicalFileConfiguration.java:570)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:557)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:524)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:474)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:441)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:418)
at org.apache.commons.configuration.AbstractFileConfiguration.possiblySave(AbstractFileConfiguration.java:749)
... 29 more
Caused by: org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
at org.apache.xerces.dom.CoreDocumentImpl.createElement(Unknown Source)
at org.apache.commons.configuration.XMLConfiguration$XMLBuilderVisitor.insert(XMLConfiguration.java:1529)
at org.apache.commons.configuration.HierarchicalConfiguration$BuilderVisitor.visitBeforeChildren(HierarchicalConfiguration.java:1734)
at org.apache.commons.configuration.HierarchicalConfiguration$Node.visit(HierarchicalConfiguration.java:1401)
at org.apache.commons.configuration.HierarchicalConfiguration$Node.visit(HierarchicalConfiguration.java:1407)
at org.apache.commons.configuration.XMLConfiguration$XMLBuilderVisitor.processDocument(XMLConfiguration.java:1504)
at org.apache.commons.configuration.XMLConfiguration.createDocument(XMLConfiguration.java:908)
... 37 more
This seems to indicate the file can't be saved by the autosave mechanism.
So, is there something I'm doing wrong?
And how can I create the configuration while creating the file only when needed (because otherwise it will require some weird XML file copy)?
I used portal.0.name where I should have used portal(0).name, which was solved by ... well ... reading the doc.
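To make the fix concrete, here is a minimal sketch of the corrected key syntax. The class name, file name, and stored value are placeholders, and it assumes the configuration file already exists and contains at least one portal element, as in the snippet above:
import java.io.File;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.XMLConfiguration;
import org.apache.commons.configuration.reloading.FileChangedReloadingStrategy;

public class PortalConfigDemo {
    public static void main(String[] args) throws ConfigurationException {
        // Hypothetical file name; substitute your own configuration file.
        XMLConfiguration config = new XMLConfiguration(new File("portals.xml"));
        config.setReloadingStrategy(new FileChangedReloadingStrategy());
        config.setAutoSave(true);

        // Indexed keys use parentheses: "portal(0).name" addresses the first
        // portal element. A key like "portal.0.name" makes the XML builder try
        // to create an element literally named "0", which is not a valid XML
        // element name and triggers the INVALID_CHARACTER_ERR seen above.
        config.setProperty("portal(0).name", "my-portal");
    }
}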
