PHP/Java bridge problem - java

I am using Tomcat 6 on Windows. Here is the code I am testing.
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.StringReader;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
/**
* Create and run THREAD_COUNT PHP threads, concurrently accessing a
* shared resource.
*
* Create 5 script engines, passing each a shared resource allocated
* from Java. Each script engine has to implement Runnable.
*
* Java accesses the Runnable script engine using
* scriptEngine.getInterface() and calls thread.start() to invoke each
* PHP Runnable implementations concurrently.
*/
class PhpThreads {
public static final String runnable = new String("<?php\n" +
"function run() {\n" +
" $out = java_context()->getAttribute('sharedResource', 100);\n" +
" $nr = (string)java_context()->getAttribute('nr', 100);\n" +
" echo \"started thread: $nr\n\";\n" +
" for($i=0; $i<100; $i++) {\n" +
" $out->write(ord($nr));\n" +
" java('java.lang.Thread')->sleep(1);\n" +
" }\n" +
"}\n" +
"?>\n");
static final int THREAD_COUNT = 5;
public static void main(String[] args) throws Exception {
ScriptEngineManager manager = new ScriptEngineManager();
Thread threads[] = new Thread[THREAD_COUNT];
ScriptEngine engines[] = new ScriptEngine[THREAD_COUNT];
ByteArrayOutputStream sharedResource = new ByteArrayOutputStream();
StringReader runnableReader = new StringReader(runnable);
// create THREAD_COUNT PHP threads
for (int i=0; i<THREAD_COUNT; i++) {
engines[i] = manager.getEngineByName("php-invocable");
if (engines[i] == null)
throw new NullPointerException ("php script engine not found");
engines[i].put("nr", new Integer(i+1));
engines[i].put("sharedResource", sharedResource);
engines[i].eval(runnableReader);
runnableReader.reset();
// cast the whole script to Runnable; note also getInterface(specificClosure, type)
Runnable r = (Runnable) ((Invocable)engines[i]).getInterface(Runnable.class);
threads[i] = new Thread(r);
}
// run the THREAD_COUNT PHP threads
for (int i=0; i<THREAD_COUNT; i++) {
threads[i].start();
}
// wait for the THREAD_COUNT PHP threads to finish
for (int i=0; i<THREAD_COUNT; i++) {
threads[i].join();
((Closeable)engines[i]).close();
}
// print the output generated by the THREAD_COUNT concurrent threads
String result = sharedResource.toString();
System.out.println(result);
// Check result
Object res=manager.getEngineByName("php").eval(
"<?php " +
"exit((int)('10011002100310041005'!=" +
"@system(\"echo -n "+result+"|sed 's/./&\\\n/g'|sort|uniq -c|tr -d ' \\\n'\")));" +
"?>");
System.exit(((Number)res).intValue());
}
}
I have added all the libraries. When I run the file I get the following error -
run:
Exception in thread "main" javax.script.ScriptException: java.io.IOException: Cannot run program "php-cgi": CreateProcess error=2, The system cannot find the file specified
at php.java.script.InvocablePhpScriptEngine.eval(InvocablePhpScriptEngine.java:209)
at php.java.script.SimplePhpScriptEngine.eval(SimplePhpScriptEngine.java:178)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:232)
at PhpThreads.main(NewClass.java:53)
Caused by: java.io.IOException: Cannot run program "php-cgi": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)
at php.java.bridge.Util$Process.start(Util.java:1064)
at php.java.bridge.Util$ProcessWithErrorHandler.start(Util.java:1166)
at php.java.bridge.Util$ProcessWithErrorHandler.start(Util.java:1217)
at php.java.script.CGIRunner.doRun(CGIRunner.java:126)
at php.java.script.HttpProxy.doRun(HttpProxy.java:63)
at php.java.script.CGIRunner.run(CGIRunner.java:111)
at php.java.bridge.ThreadPool$Delegate.run(ThreadPool.java:60)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:81)
at java.lang.ProcessImpl.start(ProcessImpl.java:30)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 8 more
What am I missing?

Just add this to your command line:
-Dphp.java.bridge.php_exec=/usr/bin/php
Problem solved!
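To confirm that the flag actually reached the JVM and points at a real executable, a small check like the one below can help. This is only a sketch: the class name CheckPhpExec is made up for illustration, and it assumes nothing about the bridge beyond the property name given above. On Windows the value would presumably be the full path to php-cgi.exe rather than a Unix path; that is an assumption based on the error message in the question, not something the answer states.

```java
import java.io.File;

// Hypothetical helper, not part of the PHP/Java bridge.
class CheckPhpExec {
    /** Describes whether the configured path points at an executable file. */
    static String check(String configured) {
        if (configured == null) {
            return "not set";
        }
        File f = new File(configured);
        if (f.isFile() && f.canExecute()) {
            return "ok: " + f.getAbsolutePath();
        }
        return "not an executable file: " + configured;
    }

    public static void main(String[] args) {
        // Reads the same system property that the -D flag sets.
        String value = System.getProperty("php.java.bridge.php_exec");
        System.out.println("php.java.bridge.php_exec -> " + check(value));
    }
}
```

Run it with the same flag as the real program, for example: java -Dphp.java.bridge.php_exec=/usr/bin/php CheckPhpExec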

Copy php5ts.dll and php-cgi.exe from the correct version of PHP into the "WEB-INF\cgi\amd64-windows" directory, then restart Tomcat. Good luck.

...php-cgi...The system cannot find the file specified
I'm guessing that manager.getEngineByName("php-invocable") should return a wrapper around a system call to run PHP - but that wrapper doesn't know where to find the PHP executable.
A quick glance at the website for the PHP/Java bridge, and I infer that the path is hard coded in the Java - "For further information please see the INSTALL.J2EE file from the documentation download"
The Javadoc is decidedly vague on the topic.
You need to specifically build the -cgi version of PHP at compile time. Assuming you've done that and it is called php-cgi, then as a quick hack you could pepper your filesystem with links named "php-cgi" (it's probably expected to be in /bin, /usr/bin, /usr/local/bin, or the Java code may be smart enough to check $PATH).
C.
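Whether php-cgi is visible on the PATH that the JVM inherits can be checked from Java itself. The sketch below is an illustration only: FindOnPath and its find method are hypothetical names, and it does not reproduce how the bridge actually locates PHP (which, as noted above, is not clear from the documentation). It simply walks the PATH entries and looks for an executable file with the given name.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper, not part of the PHP/Java bridge.
class FindOnPath {
    /** Searches each directory of a PATH-style string for an executable with the given name. */
    static Optional<File> find(String name, String pathValue, String[] extensions) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String ext : extensions) {
                File candidate = new File(dir, name + ext);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        // On Windows the executable carries an .exe suffix.
        String[] extensions = windows ? new String[] {".exe", ""} : new String[] {""};
        Optional<File> php = find("php-cgi", System.getenv("PATH"), extensions);
        System.out.println(php.map(f -> "found: " + f.getAbsolutePath())
                .orElse("php-cgi is not on PATH"));
    }
}
```

If this reports that php-cgi is not on PATH when run the same way as the failing program, the "cannot find the file specified" error is the expected result.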

When you get an error like
Fatal Error: Failed to start PHP ["php-cgi", "-v"], reason: java.io.IOException:
Cannot run program ""php-cgi"" (in directory "C:\Documents and Settings\Adminis
trator"): CreateProcess error=2, The system cannot find the file specified
Could not start FCGI server: java.io.IOException: PHP not found. Please install
php-cgi. PHP test command was: [php-cgi, -v]
php.java.bridge.http.FCGIConnectException: Could not connect to server
at php.java.bridge.http.NPChannelFactory.test(NPChannelFactory.java:64)
at php.java.bridge.http.FCGIConnectionPool.<init>(FCGIConnectionPool.jav
a:175)
at php.java.bridge.http.FCGIConnectionPool.<init>(FCGIConnectionPool.jav
a:189)
at php.java.servlet.ContextLoaderListener.createConnectionPool(ContextLo
aderListener.java:541)
at php.java.servlet.ContextLoaderListener.contextInitialized(ContextLoad
erListener.java:185)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContex
t.java:4135)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4
630)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase
.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:77
1)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.jav
a:1041)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.j
ava:964)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502
)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java
:321)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(Lifecycl
eSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:445
)
at org.apache.catalina.core.StandardService.start(StandardService.java:5
19)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710
)
at org.apache.catalina.startup.Catalina.start(Catalina.java:581)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: java.io.IOException: File \\.\pipe\C:\Documents and Settings\Administ
rator\Desktop\softwares\apache-tomcat-6.0.29\temp\JavaBridge3144995283109409611.
socket not writable
at php.java.bridge.http.FCGIConnectException.<init>(FCGIConnectException
.java:37)
... 29 more
Caused by: java.io.IOException: PHP not found. Please install php-cgi. PHP test
command was: [php-cgi, -v]
at php.java.bridge.Util$Process.start(Util.java:1145)
at php.java.servlet.fastcgi.FCGIProcess.start(FCGIProcess.java:68)
at php.java.bridge.http.NPChannelFactory.doBind(NPChannelFactory.java:94
)
at php.java.bridge.http.FCGIConnectionFactory.runFcgi(FCGIConnectionFact
ory.java:88)
at php.java.bridge.http.FCGIConnectionFactory$1.run(FCGIConnectionFactor
y.java:109)
with the JavaBridge.war deployment on Windows with Tomcat, add the path of your PHP installation to your PATH environment variable.

Related

How to resolve "Error: long vectors not supported yet: qap_encode.c:36"?

I am trying to connect java with R using Rserve
Java: 1.8.0_151
R: 3.5.0
OS: Mac 10.13.4 HighSierra
To connect R with Java, I typed the following on RStudio
install.packages("Rserve")
library(Rserve)
Rserve(args="--no-save")
Things went smoothly and I was happy about it.
Then I switched back to Java (in Eclipse) and continued. Here is what I wrote there:
package rserve;
import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.REngineException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
public class WordCloud1 {
public static void main(String[] args) throws REngineException,
REXPMismatchException {
RConnection c = new RConnection();
String path = "/Users/JinhoShin/Desktop/study/R/r_temp2";
String file = "seoul_new.txt";
c.parseAndEval("library(KoNLP)");
c.parseAndEval("useSejongDic()");
c.parseAndEval("library(wordcloud)");
c.parseAndEval("library(RColorBrewer)");
c.parseAndEval("setwd('" + path + "')");
c.parseAndEval("data1=readLines('" + file + "')");
c.parseAndEval("data2 = sapply(data1,extractNoun,USE.NAMES=F)");
c.parseAndEval("data3 = unlist(data2)");
c.parseAndEval("data3=gsub('seoul','',data3)");
c.parseAndEval("data3=gsub('request','',data3)");
c.parseAndEval("data3=gsub('place','',data3)");
c.parseAndEval("data3=gsub('transportation','',data3)");
c.parseAndEval("data3=gsub(' ','',data3)");
c.parseAndEval("data3=gsub('-','',data3)");
c.parseAndEval("data3=gsub('OO','',data3)");
c.parseAndEval("write(unlist(data3),'seoul_2.txt')");
c.parseAndEval("data4 = read.table('seoul_2.txt')"); // this is the line that fails
c.parseAndEval("wordcount=table(data4)");
c.parseAndEval("palete = brewer.pal(9,'Set3')");
c.parseAndEval(
"wordcloud(names(wordcount),freq = wordcount,scale=c(5,1),rot.per=0.25, min.freq = 1," +
" random.order=F, random.color = T, colors=palete)");
c.parseAndEval("savePlot('0517seoul.png', type = 'png')");
c.parseAndEval("dev.off()");
c.close();
}
}
As the stack trace shows, the failing line is:
c.parseAndEval("data4 = read.table('seoul_2.txt')"); => at rserve.WordCloud1.main(WordCloud1.java:30)
I have no idea why it can't read my text file, given that it was able to write that file.
This is what the Eclipse console keeps showing me:
Exception in thread "main" org.rosuda.REngine.REngineException: eval failed
at org.rosuda.REngine.Rserve.RConnection.parseAndEval(RConnection.java:499)
at org.rosuda.REngine.REngine.parseAndEval(REngine.java:108)
at rserve.WordCloud1.main(WordCloud1.java:30)
Caused by: org.rosuda.REngine.Rserve.RserveException: eval failed
at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:261)
at org.rosuda.REngine.Rserve.RConnection.parseAndEval(RConnection.java:497)
... 2 more
and this is what RStudio keeps showing me
Error: long vectors not supported yet: qap_encode.c:36
Fatal error: unable to initialize the JIT
I tried everything I could do to resolve this issue, but still I am on the same spot.

Activating virtualenv via Java ProcessBuilder

Getting the following when trying to programmatically activate Python's virtualenv via the code below:
java.io.IOException: Cannot run program "." (in directory "/Users/simeon.../..../reporting"): error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at VirtualEnvCreateCmdTest.runCommandInDirectory(VirtualEnvCreateCmdTest.java:30)
at VirtualEnvCreateCmdTest.createVirtEnv(VirtualEnvCreateCmdTest.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at ......
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 25 more
Code:
public class VirtualEnvCreateCmdTest {
private final static Logger LOG = LoggerFactory.getLogger(VirtualEnvCreateCmdTest.class);
private void runCommandInDirectory(String path,String ... command) throws Throwable
{
LOG.info("Running command '"+String.join(" ",command)+"' in path '"+path+"'");
ProcessBuilder builder = new ProcessBuilder(command)
.directory(new File(path))
.inheritIO();
Process pr = builder.start();
final String failureMsg = format("Failed to run '%s' in path '%s'. Got exit code: %d", join(" ",command), path, pr.exitValue());
LOG.info("prepared message {}: ", failureMsg);
if(!pr.waitFor(120, TimeUnit.SECONDS))
{
throw new Exception(failureMsg);
}
int output = IOUtils.copy(pr.getInputStream(), System.out);
int exitCode=pr.exitValue();
if(exitCode!=0)
throw new Exception(failureMsg);
}
@Test
public void createVirtEnv() throws Throwable {
String path = "/Users/simeon/.../reporting";
String [] commands = new String[]{".", "activate"};
//String [] commands = new String [] {"/bin/bash", "-c", ". /Users/simeon/..../venv2.7/bin/activate"};
runCommandInDirectory(path, commands);
}
Changing the permissions on the file doesn't seem to work:
chmod u+x ./bin/activate
Running it via /bin/bash doesn't work either, though it fails with a different error.
At the same time the following works fine on the command line:
. /Users/simeon/.../venv2.7/bin/activate
Any samples of how one would invoke python virtualenv activate command from within Java?
=== What worked: =====
The following ended up working for me:
private void runDjangoMigrate() throws Throwable {
final String REPORTING_PROJECT_LOCATION = "/Users/simeon.../.../reporting";
final String UNIX_SHELL_LOCATION = "/bin/bash";
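final String PYTHON_VIRTUALENV_ACTIVATE_SCRIPT_LOCATION = "/Users/simeon.../.../venv2.7/bin/activate";
final String PYTHON_VIRTUALENV_ACTIVATOR_COMMAND = ". " + PYTHON_VIRTUALENV_ACTIVATE_SCRIPT_LOCATION + ";";
final String PYTHON_INTERPRETER = "python"; // assumed value: this constant is not defined in the original snippet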
final String DJANGO_MANAGE_MODULE = " manage.py";
final String[] DJANGO_MIGRATE_COMMAND = new String[] { UNIX_SHELL_LOCATION, "-c", PYTHON_VIRTUALENV_ACTIVATOR_COMMAND
+ PYTHON_INTERPRETER + DJANGO_MANAGE_MODULE + " migrate --noinput --fake-initial" };
runCommandInDirectory(REPORTING_PROJECT_LOCATION, DJANGO_MIGRATE_COMMAND);
}
private void runCommandInDirectory(String path, String... command) throws Throwable {
LOG.info(format("Running '%s' command in path '%s'", join(" ",command),path));
ProcessBuilder builder = new ProcessBuilder(command).directory(new File(path)).inheritIO();
Process pr = null;
pr = builder.start();
IOUtils.copy(pr.getInputStream(), System.out);
boolean terminated = pr.waitFor(SPAWNED_PYTHON_PROCESS_TTL_SEC, SECONDS);
int exitCode = -1;
if (terminated) {
exitCode = pr.exitValue();
}
if (exitCode != 0) {
final String failureMsg = format("Failed to run '%s' in path '%s'. Got exit code: %d", join(" ", command),
path, exitCode); // exitCode stays -1 on timeout; calling pr.exitValue() on a running process would throw
throw new Exception(failureMsg);
}
}
and producing the following expected output:
[java] System check identified some issues:
[java]
[java] WARNINGS:
[java] ?: (urls.W005) URL namespace 'admin' isn't unique. You may not be able to reverse all URLs in this namespace
[java] Building permissions...
[java] Operations to perform:
[java] Apply all migrations: admin, auth, contenttypes, sessions
[java] Running migrations:
[java] No migrations to apply.
[java] System check identified some issues:
[java]
[java] WARNINGS:
[java] ?: (urls.W005) URL namespace 'admin' isn't unique. You may not be able to reverse all URLs in this namespace
[java] Building permissions...
[java] User exists, exiting normally due to --preserve
[java] System check identified some issues:
[java]
You can activate a Python virtual environment in bash (and perhaps zsh) and in Python, but not in Java or any other environment. The main thing to understand is that virtual environment activation doesn't do any system magic: the activation script just changes the current environment, and Python virtual environments are designed to change a shell or a Python process, nothing else.
In terms of Java, this means that when you call runCommandInDirectory() it starts a new shell; that shell briefly activates the virtual env, but then the shell exits and all the changes that "activated" the virtual env are gone.
This in turn means that if you need to run shell or Python commands in a virtual env, you have to activate the environment in every runCommandInDirectory() invocation:
String [] commands1 = new String [] {"/bin/bash", "-c", ". /Users/simeon/..../venv2.7/bin/activate; script1.sh"};
runCommandInDirectory(path, commands1);
String [] commands2 = new String [] {"/bin/bash", "-c", ". /Users/simeon/..../venv2.7/bin/activate; script2.sh"};
runCommandInDirectory(path, commands2);
For Python scripts it's a bit simpler as you can run python from the environment and it automagically activates the environment:
String [] commands = new String [] {"/Users/simeon/..../venv2.7/bin/python", "script.py"};
runCommandInDirectory(path, commands);
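The point that environment changes do not outlive the shell that made them can be checked without any virtualenv at all. The following is a minimal sketch under stated assumptions: ShellEnvDemo, run and the variable DEMO_VENV_FLAG are hypothetical names chosen for the demonstration, /bin/sh is assumed to exist (so it is Unix-only), and exporting a variable stands in for what an activate script does to the environment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; exporting a variable stands in for "activate".
class ShellEnvDemo {
    /** Runs the script in a fresh /bin/sh process and returns its trimmed output. */
    static String run(String script) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("/bin/sh", "-c", script)
                .redirectErrorStream(true)
                .start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        p.waitFor();
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Set and read the variable inside the same shell: the change is visible.
        System.out.println(run("export DEMO_VENV_FLAG=active; echo ${DEMO_VENV_FLAG:-unset}"));
        // A second, separate shell starts from the JVM's environment again.
        System.out.println(run("echo ${DEMO_VENV_FLAG:-unset}"));
    }
}
```

Provided DEMO_VENV_FLAG is not already set in the JVM's own environment, the first call should report the variable as active and the second as unset, which is why the activation and the command that needs it have to share one "-c" string.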

Getting Spring-XD and the hdfs sink to work for maprfs

This is a question about spring-xd release 1.0.1 working together with maprfs, which is officially not yet supported. Still I would like to get it to work.
So this is what we did:
1) adjusted the xd-shell and xd-worker and xd-singlenode shell scripts to accept the parameter --hadoopDistro mapr
2) added libraries to the new directory $XD_HOME/lib/mapr
avro-1.7.4.jar jersey-core-1.9.jar
hadoop-annotations-2.2.0.jar jersey-server-1.9.jar
hadoop-core-1.0.3-mapr-3.0.2.jar jetty-util-6.1.26.jar
hadoop-distcp-2.2.0.jar maprfs-1.0.3-mapr-3.0.2.jar
hadoop-hdfs-2.2.0.jar protobuf-java-2.5.0.jar
hadoop-mapreduce-client-core-2.2.0.jar spring-data-hadoop-2.0.2.RELEASE-hadoop24.jar
hadoop-streaming-2.2.0.jar spring-data-hadoop-batch-2.0.2.RELEASE-hadoop24.jar
hadoop-yarn-api-2.2.0.jar spring-data-hadoop-core-2.0.2.RELEASE-hadoop24.jar
hadoop-yarn-common-2.2.0.jar spring-data-hadoop-store-2.0.2.RELEASE-hadoop24.jar
3) run bin/xd-singlenode --hadoopDistro mapr and shell/bin/xd-shell --hadoopDistro mapr.
When creating and deploying a stream via stream create foo --definition "time | hdfs" --deploy, data is written to a file tmp/xd/foo/foo-1.txt.tmp on maprfs. Yet when undeploying the stream, the following exception appears:
org.springframework.data.hadoop.store.StoreException: Failed renaming from /xd/foo/foo-1.txt.tmp to /xd/foo/foo-1.txt; nested exception is java.io.FileNotFoundException: Requested file /xd/foo/foo-1.txt does not exist.
at org.springframework.data.hadoop.store.support.OutputStoreObjectSupport.renameFile(OutputStoreObjectSupport.java:261)
at org.springframework.data.hadoop.store.output.TextFileWriter.close(TextFileWriter.java:92)
at org.springframework.xd.integration.hadoop.outbound.HdfsDataStoreMessageHandler.doStop(HdfsDataStoreMessageHandler.java:58)
at org.springframework.xd.integration.hadoop.outbound.HdfsStoreMessageHandler.stop(HdfsStoreMessageHandler.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:201)
at com.sun.proxy.$Proxy120.stop(Unknown Source)
at org.springframework.integration.endpoint.EventDrivenConsumer.doStop(EventDrivenConsumer.java:64)
at org.springframework.integration.endpoint.AbstractEndpoint.stop(AbstractEndpoint.java:100)
at org.springframework.integration.endpoint.AbstractEndpoint.stop(AbstractEndpoint.java:115)
at org.springframework.integration.config.ConsumerEndpointFactoryBean.stop(ConsumerEndpointFactoryBean.java:303)
at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:229)
at org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:51)
at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:363)
at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:202)
at org.springframework.context.support.DefaultLifecycleProcessor.stop(DefaultLifecycleProcessor.java:106)
at org.springframework.context.support.AbstractApplicationContext.stop(AbstractApplicationContext.java:1186)
at org.springframework.xd.module.core.SimpleModule.stop(SimpleModule.java:234)
at org.springframework.xd.dirt.module.ModuleDeployer.destroyModule(ModuleDeployer.java:132)
at org.springframework.xd.dirt.module.ModuleDeployer.handleUndeploy(ModuleDeployer.java:111)
at org.springframework.xd.dirt.module.ModuleDeployer.undeploy(ModuleDeployer.java:83)
at org.springframework.xd.dirt.server.ContainerRegistrar.undeployModule(ContainerRegistrar.java:261)
at org.springframework.xd.dirt.server.ContainerRegistrar$StreamModuleWatcher.process(ContainerRegistrar.java:884)
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:67)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.io.FileNotFoundException: Requested file /xd/foo/foo-1.txt does not exist.
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:805)
at com.mapr.fs.MapRFileSystem.delete(MapRFileSystem.java:629)
at org.springframework.data.hadoop.store.support.OutputStoreObjectSupport.renameFile(OutputStoreObjectSupport.java:258)
... 29 more
I had a look at the OutputStoreObjectSupport.renameFile() function. When a file on hdfs is finished, this method tries to rename the file /xd/foo/foo-1.txt.tmp to /xd/foo/foo-1.txt. This is the relevant code:
try {
FileSystem fs = path.getFileSystem(getConfiguration());
boolean succeed;
try {
fs.delete(toPath, false);
log.info("Renaming path=[" + path + "] toPath=[" + toPath + "]");
succeed = fs.rename(path, toPath);
} catch (Exception e) {
throw new StoreException("Failed renaming from " + path + " to " + toPath, e);
}
if (!succeed) {
throw new StoreException("Failed renaming from " + path + " to " + toPath + " because hdfs returned false");
}
}
When the target file does not exist on hdfs, maprfs seems to throw an exception when fs.delete(toPath, false) is called. Yet throwing an exception in this case does not make sense. I assume that other FileSystem implementations behave differently, but this is a point I still need to verify. Unfortunately I cannot find the sources for MapRFileSystem.java. Is it closed source? Having the sources would help me to better understand the issue. Does anybody have experience with writing from spring-xd to maprfs, or with renaming files on maprfs with spring-data-hadoop?
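One way to sidestep a delete() that throws for a missing target is to check for existence first. The sketch below illustrates only that pattern and is not the actual spring-data-hadoop code or a confirmed fix: the Fs interface and the StrictFs class are hypothetical stand-ins for the Hadoop FileSystem calls used in renameFile(), with StrictFs mimicking the behaviour observed on maprfs so the example runs without a cluster.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class GuardedRename {
    /** Hypothetical stand-in for the three file system calls used by renameFile(). */
    interface Fs {
        boolean exists(String path) throws IOException;
        boolean delete(String path) throws IOException;
        boolean rename(String from, String to) throws IOException;
    }

    /** In-memory file system whose delete() throws for a missing path, as observed with maprfs. */
    static class StrictFs implements Fs {
        final Set<String> files = new HashSet<>();

        public boolean exists(String path) {
            return files.contains(path);
        }

        public boolean delete(String path) throws IOException {
            if (!files.remove(path)) {
                throw new FileNotFoundException("Requested file " + path + " does not exist.");
            }
            return true;
        }

        public boolean rename(String from, String to) {
            if (!files.remove(from)) {
                return false;
            }
            files.add(to);
            return true;
        }
    }

    /** Deletes the target only if it exists, then renames the source onto it. */
    static boolean renameReplacing(Fs fs, String from, String to) throws IOException {
        if (fs.exists(to)) {
            fs.delete(to);
        }
        return fs.rename(from, to);
    }

    public static void main(String[] args) throws IOException {
        StrictFs fs = new StrictFs();
        fs.files.add("/xd/foo/foo-1.txt.tmp");
        // The target does not exist yet, so delete() is never called.
        System.out.println(renameReplacing(fs, "/xd/foo/foo-1.txt.tmp", "/xd/foo/foo-1.txt"));
    }
}
```

With the real Hadoop API the same guard would use FileSystem.exists(Path) before FileSystem.delete(Path, boolean); whether that is acceptable depends on how much the extra round trip to the file system matters.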
Edit
I managed to reproduce the issue outside of spring XD with a simple test case (see below). Note that this exception is only thrown if the inWritingSuffix or the inWritingPrefix is set. Otherwise spring-hadoop will not attempt to rename the file. So the still somewhat unsatisfactory workaround for me is to refrain from using inWritingPrefix and inWritingSuffix.
@ContextConfiguration("context.xml")
@RunWith(SpringJUnit4ClassRunner.class)
public class MaprfsSinkTest {
@Autowired
Configuration configuration;
@Autowired
FileSystem filesystem;
@Autowired
DataStoreWriter<String> storeWriter;
@Test
public void testRenameOnMaprfs() throws IOException, InterruptedException {
Path testPath = new Path("/tmp/foo.txt");
filesystem.delete(testPath, true);
TextFileWriter writer = new TextFileWriter(configuration, testPath, null);
writer.setInWritingSuffix("tmp");
writer.write("some entity");
writer.close();
}
@Test
public void testStoreWriter() throws IOException {
this.storeWriter.write("something");
}
}
I created a new branch for spring-hadoop which supports maprfs:
https://github.com/blinse/spring-hadoop/tree/origin/2.0.2.RELEASE-mapr
Building this release and using the resulting jar works fine with the hdfs sink.

how to set Hadoop DistributedCache?

When I run Hadoop code that adds a third-party jar, like the following:
public static void addTmpJar(String jarPath, JobConf conf) throws IOException {
System.setProperty("path.separator", ":");
FileSystem fs = FileSystem.getLocal(conf);
String newJarPath = new Path(jarPath).makeQualified(fs).toString();
String tmpjars = conf.get("tmpjars");
if (tmpjars == null || tmpjars.length() == 0) {
conf.set("tmpjars", newJarPath);
} else {
conf.set("tmpjars", tmpjars + "," + newJarPath);
}
}
I get the following exception:
Error initializing attempt_201405281453_0053_m_000002_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/hadoop/distcache/-7315515059647727905_-860888033_1107570546/nn.hadoop.dev/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201405281453_0053/libjars/mahout-core-0.8-job.jar
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:173)
at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:187)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1320)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1311)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1226)
at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2603)
at java.lang.Thread.run(Thread.java:744)
Can anyone tell me how to solve this problem? Thanks!
From the command line you can add a jar to the distributed cache using -libjars. The only prerequisite is that your MR program implements Tool and is run through ToolRunner, which uses GenericOptionsParser; the latter takes care of adding the jar to the cache.
This page explains the above in more detail

Pig ERROR 1066: Unable to open iterator for alias returned from UDF

I am trying to load my own UDF in Pig. I have packaged it into a jar using Eclipse's export function. I am getting this 1066 error when running my Pig script. I suspect the problem is in the B = ... line, since I can dump A but cannot dump B.
Script
REGISTER myudfs.jar;
DEFINE HOUR myudfs.HOUR;
A = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, getRequset:chararray, status:int, port:int);
B = FOREACH A GENERATE HOUR(ip);
DUMP B;
Function
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
public class HOUR extends EvalFunc<String>
{
@SuppressWarnings("deprecation")
public String exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
try{
String str = (String)input.get(0);
return str.toUpperCase();
}catch(Exception e){
throw WrappedIOException.wrap("Caught exception processing input row ", e);
}
}
}
Running command
pig -x mapreduce 2.pig
Data Format
199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
| | | | |
ip date getRequest status port
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias B
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:828)
... 12 more
I am extremely unfamiliar with pig, and any and all pointers would be greatly appreciated. I know this is a lot of information to look at, but I have had no luck in mutating any data in a UDF, and I am just not sure where I went wrong.
Thanks
