Run a String through Java using Pig - java

I have a UDF jar that takes a String as input through Pig. The Java code works fine through Pig when I pass a hard-coded string, as in this command:
B = foreach f generate URL_UDF.mathUDF('stack.overflow');
This gives me the output I expect.
My question: I am trying to read data from a text file and use my UDF on it. I load a file and want to pass a column of the loaded data to the UDF.
LoadData = load 'data.csv' using PigStorage(',');
f = foreach LoadData generate $0 as col0, $1 as chararray
$1 is the column I need and, from researching the data types (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#Data+Types), a chararray is the type to use.
I then tried the following command
B = foreach f generate URL_UDF.mathUDF($1);
to pass the data into the jar, which fails with
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
If anybody has any solution to this that would be great.
The java code I am running is as follows
package URL_UDF;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.EvalFunc;
import org.apache.pig.PigWarning;
import org.apache.pig.data.Tuple;
import org.apache.commons.logging.Log;
import org.apache.*;
public class mathUDF extends EvalFunc<String> {
public String exec(Tuple arg0) throws IOException {
// TODO Auto-generated method stub
try{
String urlToCheck = (String) arg0.get(0);
return urlToCheck;
}catch (Exception e) {
// Throwing an exception will cause the task to fail.
throw new IOException("Something bad happened!", e);
}
}
}
Thanks

You can specify the schema in the LOAD statement:
LoadData = load 'data.csv' using PigStorage(',') AS (col0: chararray, col1:chararray);
and then pass col1 to the UDF.
Or cast the field explicitly:
B = foreach LoadData generate (chararray)$1 AS col1:chararray;
The underlying cause is a bug in Pig (PIG-2315), which will be fixed in 0.12.1: the AS clause in foreach does not work as one would expect, so the field still reaches the UDF as a bytearray.
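On the Java side, the UDF can also be made tolerant of fields that arrive without a declared type, by converting the value instead of casting it. The following is only a minimal sketch: FieldText and asText are hypothetical names, and it assumes that calling toString() on the untyped field yields its text (this is how Pig's DataByteArray is generally understood to behave, but it is worth verifying for your Pig version). To stay runnable without the Pig jar, a StringBuilder stands in for the non-String field.

```java
// Hypothetical helper, not part of Pig: converts a field value to text
// without assuming it is already a java.lang.String.
class FieldText {
    static String asText(Object field) {
        if (field == null) {
            return null;            // fields can be null
        }
        if (field instanceof String) {
            return (String) field;  // already a String, no conversion needed
        }
        return field.toString();    // any other type: use its text form (assumption)
    }

    public static void main(String[] args) {
        // Stand-in for a field that is not a String (assumption for this demo).
        Object untyped = new StringBuilder("stack.overflow");
        System.out.println(asText(untyped));
        System.out.println(asText("stack.overflow"));
    }
}
```

Inside exec, the cast would then become something like String urlToCheck = FieldText.asText(arg0.get(0)); declaring the schema in Pig, as described above, is still the cleaner fix.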

Related

Error: Could not find or load main class using external libraries

I am trying to execute a Java program I wrote. Compiling is no problem (I used: javac -cp \* *.java). But when I try to run it (using: java -cp \* FirstTestCase 1 1 data), it says Error: Could not find or load main class FirstTestCase.
This is the code:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Timer;
public class FirstTestCase {
public static void main(String[] args) throws IOException, InterruptedException, CertificateException {
// TODO Auto-generated method stub
int amount = 100000;
int threads = 150;
String file = "newData";
if(args.length == 3)
{
amount = Integer.parseInt(args[0]);
threads = Integer.parseInt(args[1]);
file = args[2];
}
X509Downloader d = new X509Downloader(amount, threads, file);
d.getCertificates();
}
}
We are also using external libraries, 4 to be precise, and use -cp \* to include them all. When I run java FirstTestCase 1 1 data, the program starts but then throws an exception about not finding the classes that we use from the external libraries.
I normally use an IDE, so I have barely any experience starting Java programs from the terminal.
I am testing on a Mac at the moment, but we will later run this program on a Linux server (in case the operating system matters).

Unable to import a Scala object into the Java project

I created a Scala object:
package myapp.data
import java.io.File
import myapp.models.NodeViewModel
import com.thoughtworks.xstream.XStream
import com.thoughtworks.xstream.io.xml.DomDriver
object ForumSerializer {
def openFile(file : File) : NodeViewModel = {
// doing something
}
def saveToFile(model : NodeViewModel) : Unit = {
// doing something
}
}
Then I tried to import it in another Java file
import myapp.ForumSerializer;
The error I get is:
Import myapp.ForumSerializer cannot be resolved.
What am I doing wrong?
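Import it as ForumSerializer$.
Scala appends a $ to the generated class name so that the compiler does not confuse the object with a class of the same name when both exist. You can then access the singleton object through the generated MODULE$ field. Note also that the object is declared in package myapp.data, so the import path should start with myapp.data rather than myapp.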

why i am not able to read Html Content from a website in a file?

I have written a Java program that should read the HTML content of any website, using the Scanner class and the URL passed in as a command-line argument. I am not getting any output when I run it.
The code is below.
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Scanner;
public class ReadWebsite
{
public static void main(String[] args) throws Exception
{
URL oracle = new URL(args[0]);
Scanner s=new Scanner(oracle.openStream());
while (s.hasNext())
{
System.out.println(s.nextLine());
}
s.close();
}
}
Output shown:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at oodlesTech.ReadWebsite.main(ReadWebsite.java:15)
If you are running from Eclipse, you have to pass the arguments yourself.
Right-click the program -> Run As -> Run Configurations -> Arguments -> Program arguments. In this tab, enter the actual URL; it will be passed as args[0] to your main method.
You are not passing the argument to your Java program.
For testing you can either hard-code it, e.g. URL oracle = new URL("http://www.google.com");, or pass an argument to your Java program.
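Both suggestions can be combined by checking the argument array before indexing into it. This is a minimal sketch, not the asker's code: ReadWebsiteSafe, chooseUrl and DEFAULT_URL are hypothetical names, and the fallback URL is simply the one used in the answer above.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

class ReadWebsiteSafe {
    // Hypothetical fallback used when no argument is supplied.
    static final String DEFAULT_URL = "http://www.google.com";

    // Returns args[0] if present, otherwise the fallback,
    // so args[0] is never read from an empty array.
    static String chooseUrl(String[] args) {
        return (args != null && args.length > 0) ? args[0] : DEFAULT_URL;
    }

    public static void main(String[] args) throws IOException {
        URL site = new URL(chooseUrl(args));
        // try-with-resources closes the Scanner (and the stream) automatically.
        try (Scanner s = new Scanner(site.openStream())) {
            while (s.hasNextLine()) {
                System.out.println(s.nextLine());
            }
        }
    }
}
```

With this guard, running the class without program arguments reads the fallback page instead of throwing ArrayIndexOutOfBoundsException.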

Post data from Matlab to Pachube (Cosm) using Java Methods

I am using JPachube.jar and Matlab in order to send data to my datastream. This java code works on my machine:
package smartclassroom;
import Pachube.Data;
import Pachube.Feed;
//import Pachube.FeedFactory;
import Pachube.Pachube;
import Pachube.PachubeException;
public class SendFeed {
public static void main(String arsg[]) throws InterruptedException{
SendFeed s = new SendFeed(0.0);
s.setZainteresovanost(0.3);
double output = s.getZainteresovanost();
System.out.println("zainteresovanost " + output);
try {
Pachube p = new Pachube("MYAPIKEY");
Feed f = p.getFeed(MYFEED);
f.updateDatastream(0, output);
} catch (PachubeException e) {
System.out.println(e.errorMessage);
}
}
private double zainteresovanost;
public SendFeed(double vrednost) {
zainteresovanost = vrednost;
}
public void setZainteresovanost(double vrednost) {
zainteresovanost = vrednost;
}
public double getZainteresovanost() {
return zainteresovanost;
}
}
But I need to do this from Matlab. I tried adapting an example (the example from the link works on my machine): I compiled the Java class with javac, added JPachube.jar and SendFeed.class to the path, and then used this code in Matlab:
javaaddpath('C:\work')
javaMethod('main','SendFeed','');
pachubeValue = SendFeed(0.42);
I get an error:
??? Error using ==> javaMethod
No class SendFeed can be located on Java class path
Error in ==> post_to_pachube2 at 6
javaMethod('main','SendFeed','');
This is strange because, as I said, the example from the link works.
Afterwards, I decided to include JPachube directly in Matlab code and to write equivalent code in Matlab:
javaaddpath('c:\work\JPachube.jar')
import Pachube.Data.*
import Pachube.Feed.*
import Pachube.Pachube.*
import Pachube.PachubeException.*
pachube = Pachube.Pachube('MYAPIKEY');
feed = pachube.getFeed(MYFEED);
feed.updateDatastream(0, 0.54);
And I get this error:
??? No method 'updateDatastream' with matching signature found for class 'Pachube.Feed'.
Error in ==> post_to_pachube2 at 12
feed.updateDatastream(0, 0.54);
So I have tried almost everything and nothing works! Any method that makes this work will be fine for me. Thanks in advance for your help!
This did the trick for me (answer from here)
javaaddpath('c:\work\httpcore-4.2.2.jar');
javaaddpath('c:\work\httpclient-4.2.3.jar');
import org.apache.http.impl.client.DefaultHttpClient
import org.apache.http.client.methods.HttpPost
import org.apache.http.entity.StringEntity
httpclient = DefaultHttpClient();
httppost = HttpPost('http://api.cosm.com/v2/feeds/FEEDID/datastreams/0.csv?_method=put');
httppost.addHeader('Content-Type','text/plain');
httppost.addHeader('X-ApiKey','APIKEY');
params = StringEntity('0.7');
httppost.setEntity(params);
response = httpclient.execute(httppost);
I would rather use built-in methods. Matlab has urlread/urlwrite, which could work if all you wish to do is request some CSV data from the Cosm API. If you do need to use JSON, it can be handled in Matlab via a plugin.
As for passing the Cosm API key, that can be done via the key parameter, like so:
cosm_feed_url = 'https://api.cosm.com/v2/feeds/61916.csv?key=<API_KEY>';
cosm_feed_csv = urlread(cosm_feed_url)
However, the standard library methods urlread/urlwrite are rather limited. In fact, the urlwrite function is only designed for file input, and I cannot even see any official example of how one could use a formatted string instead. Creating a temporary file would be reasonable, unless it's only a few lines of CSV.
You will probably need to use urlread2 for anything more serious.
UPDATE: it appears that urlread2 can be problematic.

Error: Could not find or load main class- Novice

Hi, I am a novice in Java. I have been getting a file-not-found exception in spite of the file existing in the very location I have specified in the path which is
Initially I had the issue of file not found. However, after performing a clean and re-run, now I am having an issue which says
Error: Could not find or load main class main.main
import Message.*;
import java.util.*;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
public class main{
public static void main(String[] args) {
Message msg=new Message("bob","alice","request","Data####");
MPasser passerObj=new MPasser("C:\\Workspace\\config.txt","process1");
}
}
Also in the MPasser Constructor the following piece of relevant code is there
public MPasser(String file_name,String someVariable){
InputStream input;
try {
input =new RandomAccessFile(file_name,"r");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
Yaml yaml = new Yaml();
Map<String, String> Object = (Map<String, String>) yaml.load(input);
}
Sorry I have made edits from initial query so that it is more clear
On this line:
input = RandomAccessFile("C:\Workspace\conf.txt",'r');
You need to escape the \'s
input = RandomAccessFile("C:\\Workspace\\conf.txt",'r');
"C:\Workspace\conf.txt"
Those are escape sequences. You probably meant:
"C:\\Workspace\\conf.txt"
You also appear to call it config.txt in one snippet and conf.txt in the other?
Make sure the java process has permissions to read the file.
You have to escape the backslashes:
input = RandomAccessFile("C:\\Workspace\\conf.txt",'r');
The call also needs new, and the mode argument must be a String ("r") rather than a char ('r'):
input = new RandomAccessFile("C:\\Workspace\\conf.txt","r");
Also, why do you have two different file names, conf.txt and config.txt?
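The points from the answers above (escaped backslashes, new, and a String mode) can be sketched together as follows. OpenConfig and firstLine are hypothetical names introduced for illustration, and the path is the one from the question; this is not the asker's MPasser class.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

class OpenConfig {
    // Opens the file read-only and returns its first line (null if the file is empty).
    static String firstLine(String fileName) throws IOException {
        // The mode is the String "r"; try-with-resources closes the file afterwards.
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            return file.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each \\ in the source is a single backslash in the resulting string.
        String path = "C:\\Workspace\\config.txt";
        System.out.println(path);
        try {
            System.out.println(firstLine(path));
        } catch (FileNotFoundException e) {
            // Reached when the file is missing or not readable by this process.
            System.out.println("Cannot open " + path + ": " + e.getMessage());
        }
    }
}
```

Printing the path first makes it easy to confirm that the string the program actually uses matches the file name on disk.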
