Pig error 1070 when using a UDF (Java)

I am trying to load my own UDF in Pig. I packaged it into a jar using Eclipse's export function. I am running it locally to make sure it works before I put the jar on HDFS, and I get the following error:
ERROR 1070: Could not resolve myudfs.MONTH using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Script
REGISTER myudfs.jar;
--DEFINE MONTH myudfs.MONTH;
A = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, getRequset:chararray, status:int, port:int);
B = FOREACH A GENERATE myudfs.MONTH(date);
DUMP B;
Function
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
@SuppressWarnings("deprecation")
public class HOUR extends EvalFunc<String>
{
public String exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
try{
String str = (String)input.get(0);
return str.substring(1, 3);
}catch(Exception e){
throw WrappedIOException.wrap("Caught exception processing input row ", e);
}
}
}
Working Directory
1.pig 2.pig bin myudfs.jar
pig.jar pig-withouthadoop.jar src/
Running command
pig -x local 2.pig
Structure of jar
0 Thu May 02 12:16:26 MDT 2013 META-INF/
68 Thu May 02 12:16:26 MDT 2013 META-INF/MANIFEST.MF
0 Thu May 02 12:05:50 MDT 2013 myudfs/
573 Thu May 02 12:15:10 MDT 2013 myudfs/HOUR.java
I am really close to chucking monitors, so I am just looking for some help and direction. Let me know what could be wrong.

Your UDF class is named HOUR, so shouldn't your Pig Latin be this?
B = FOREACH A GENERATE myudfs.HOUR(date);
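The error message itself shows how the name is looked up: the name from the script is tried as written and then with each listed import prefix, so myudfs.MONTH can only resolve if a class with exactly that fully qualified name is on the classpath. The sketch below models that lookup with plain Class.forName. It is a simplified illustration inferred from the message, not Pig's actual resolution code, and UdfNameResolver is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

public class UdfNameResolver {
    // The prefixes listed in the ERROR 1070 message; "" means "use the name as written".
    static final List<String> IMPORTS =
            Arrays.asList("", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin.");

    /** Returns the first class found by trying each prefix in order (simplified model). */
    public static Class<?> resolve(String name) throws ClassNotFoundException {
        for (String prefix : IMPORTS) {
            try {
                // initialize=false: only check that the class exists, don't run static initializers
                return Class.forName(prefix + name, false, UdfNameResolver.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                // not found under this prefix; try the next one
            }
        }
        throw new ClassNotFoundException("Could not resolve " + name + " using imports: " + IMPORTS);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"java.lang.String", "myudfs.MONTH"}) {
            try {
                System.out.println(name + " -> " + resolve(name).getName());
            } catch (ClassNotFoundException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

In this model, resolve("myudfs.MONTH") fails unless a compiled class with that exact name is inside a jar on the classpath, which is why the script name and the class name must match.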

Related

PowerMock or EasyMock JUnit tests

I am unable to write a simple test class for this method.
public void buildOBXSegmentForSeriesInfo( ImmunizationData immunizationData, int obxSetId , int observationSubId) throws HL7Exception {
obx.getSetIDOBX().setValue(String.valueOf(obxSetId));
obx.getValueType().setValue("CE");
obx.getObservationIdentifier().getIdentifier().setValue("59783-1");
obx.getObservationIdentifier().getText().setValue("Status in immunization series");
obx.getObservationIdentifier().getNameOfCodingSystem().setValue("LN");
obx.getObservationSubID().setValue(String.valueOf(immunizationData.getSeries().toString()));
obx.getObservationResultStatus().setValue("F");
System.out.println("Hellooo"+obx.encode());
}
For now I have a dummy immunizationData XML file. The class under test is OBXSegmentBuilder, and OBX comes from a third-party jar.
I have written the test below, but it fails:
public void buildOBXSegmentForSeriesInfoTest() throws Exception {
final OBX obx=PowerMock.createMock(OBX.class);
//OBXSegmentBuilder OBXSegmentBuilder=EasyMock.createMock(OBXSegmentBuilder.class);
OBXSegmentBuilder OBXSegmentBuilder1=PowerMock.createMock(OBXSegmentBuilder.class);
URL inputFile = this.getClass().getResource("immunizationData.xml");
ImmunizationData immunizationData = null;
immunizationData = JAXBUtilities.unmarshal(ImmunizationData.class, inputFile);
PowerMock.replay(OBXSegmentBuilder1);
final int FIVE=5;
OBXSegmentBuilder1.obx=obx;
PowerMock.expectPrivate(OBXSegmentBuilder1, "buildOBXSegmentForSeriesInfo",immunizationData, FIVE, FIVE).andReturn(null);
// OBXSegmentBuilder.buildOBXSegmentForSeriesInfo(immunizationData, FIVE, FIVE);
System.out.println(OBXSegmentBuilder1.obx.encode());
}
Exception:
java.lang.AssertionError:
Unexpected method call OBXSegmentBuilder.buildOBXSegmentForSeriesInfo(ImmunizationData [immunizationID=1, immunizationGroupId=1, inactive=N, personID=1346775390650600, secondaryDocumentID=1394369211250690, vaccineGroupName=null, vaccineName=IPV, medicalDisplayName=IPV, series=1, wasGiven=Y, reasonNotGiven=null, historical=N, historicalSource=N, vfcEligibility=0, drugDescID=123, drugNameID=789, genericProductIndex=17345, knowledgeBaseDrugCode=137, ndcCode=125, cvxCode=89, administeredDose=0.5, administeredDoseUnits=mL, route=Intramuscular, routeCode=IM, site=Left Thigh, siteCode=LT, manufacturer=Sanofi Pasteur, manufacturerCode=SPF, lotNumber=123, expirationDate=Sun Oct 12 00:00:00 UTC 2014, visPublishedDate=Wed Oct 12 00:00:00 UTC 2011, administeredByPVId=1363787536000611, administeredDate=Wed Oct 12 00:00:00 UTC 2011, administeredComments=null, advReactionDateTime=null, advReactionComments=null, advReactionCmtByPVId=null, isSigned=null, signedByPVId=1363787536000610, signedDate=null, filedInError=N, reasonRemoved=null, stopdate=null, dbCreateDate=Sun May 05 20:27:05 UTC 2013, createdBy=1363787536000610, dbUpdatedDate=Sun May 05 20:27:05 UTC 2013, lastModifiedBy=1363787536000610, administeredDateType=null, advReactionDateTimeType=null, reasonNotGivenMedical=null, reasonNotGivenMedicalDetail=null, administeredDoseWasted=null, visGiven=null, visGivenDate=null, fundingSource=null, administeredByName=null, advReactionCmtByName=null, signedByName=null, allergyGroupId=null, signedbyUserDetail=null, administeredbyUserDetail=null, createdbyUserDetail=null, documentDetail=null, allergyDetail=null, medicationInfo=null, isCombo=null, vfcEligibilityDesc =null, fundingSourceDesc =null, reasonNotGivenMedicalDetailDesc =null], 5, 5):
at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:44)
at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:85)
at org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:94)
at com.gehcit.cp.hl7.v25.segment.builder.OBXSegmentBuilder$$EnhancerByCGLIB$$624c52dd.buildOBXSegmentForSeriesInfo(<generated>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.powermock.reflect.internal.WhiteboxImpl.performMethodInvocation(WhiteboxImpl.java:1819)
at org.powermock.api.easymock.PowerMock.doExpectPrivate(PowerMock.java:2034)
at org.powermock.api.easymock.PowerMock.expectPrivate(PowerMock.java:1223)
at org.powermock.api.easymock.PowerMock.expectPrivate(PowerMock.java:1234)
at org.powermock.api.easymock.PowerMock.expectPrivate(PowerMock.java:1180)
at com.gehcit.cp.hl7.v25.segment.builder.OBXSegmentBuilderTest.buildOBXSegmentForSeriesInfoTest(OBXSegmentBuilderTest.java:35)
It looks mostly OK, but whether the recorded call matches depends on the behavior of ImmunizationData.equals(). Since it is not matching, you can be confident that equals() returned false.
I recommend upgrading your EasyMock version. Newer versions give a better error message that lists the current recordings, so it is easier to see what's wrong.
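EasyMock's default argument matching compares each recorded argument with the actual one using equals(). A class that does not override equals() inherits Object.equals(), which is identity, so two separately built but identical-looking instances never match. The two hypothetical classes below stand in for ImmunizationData to show the difference; if you cannot change the real class, EasyMock matchers such as anyObject() or same() loosen or tighten the comparison instead.

```java
import java.util.Objects;

public class EqualsMatchingDemo {
    /** Hypothetical stand-in for ImmunizationData that does NOT override equals(). */
    static final class DataWithoutEquals {
        final int series;
        DataWithoutEquals(int series) { this.series = series; }
    }

    /** Same data, but with value-based equals() and a matching hashCode(). */
    static final class DataWithEquals {
        final int series;
        DataWithEquals(int series) { this.series = series; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof DataWithEquals)) return false;
            return series == ((DataWithEquals) o).series;
        }

        @Override
        public int hashCode() { return Objects.hash(series); }
    }

    public static void main(String[] args) {
        // Object.equals() is identity: two separately built instances are never equal.
        System.out.println(new DataWithoutEquals(1).equals(new DataWithoutEquals(1))); // false
        // Value-based equals(): an expectation recorded with one instance matches a call with another.
        System.out.println(new DataWithEquals(1).equals(new DataWithEquals(1)));       // true
    }
}
```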

Java threads behave unexpectedly

I am reading this article about Java threads, which contains this example:
import java.util.Vector;
class Producer extends Thread {
static final int MAXQUEUE = 5;
private Vector messages = new Vector();
@Override
public void run() {
try {
while (true) {
putMessage();
//sleep(5000);
}
} catch (InterruptedException e) {
}
}
private synchronized void putMessage() throws InterruptedException {
while (messages.size() == MAXQUEUE) {
wait();
}
messages.addElement(new java.util.Date().toString());
System.out.println("put message");
notify();
//Later, when the necessary event happens, the thread that is running it calls notify() from a block synchronized on the same object.
}
// Called by Consumer
public synchronized String getMessage() throws InterruptedException {
notify();
while (messages.size() == 0) {
wait();//By executing wait() from a synchronized block, a thread gives up its hold on the lock and goes to sleep.
}
String message = (String) messages.firstElement();
messages.removeElement(message);
return message;
}
}
class Consumer extends Thread {
Producer producer;
Consumer(Producer p) {
producer = p;
}
@Override
public void run() {
try {
while (true) {
String message = producer.getMessage();
System.out.println("Got message: " + message);
//sleep(200);
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
public static void main(String args[]) {
Producer producer = new Producer();
producer.start();
new Consumer(producer).start();
}
}
And the author said:
A possible output sequence:
Got message: Fri Dec 02 21:37:21 EST 2011
put message
put message
put message
put message
put message
Got message: Fri Dec 02 21:37:21 EST 2011
Got message: Fri Dec 02 21:37:21 EST 2011
Got message: Fri Dec 02 21:37:21 EST 2011
Got message: Fri Dec 02 21:37:21 EST 2011
Got message: Fri Dec 02 21:37:21 EST 2011
put message
put message
put message
put message
put message
Got message: Fri Dec 02 21:37:21 EST 2011
Got message: Fri Dec 02 21:37:21 EST 2011
Got message: Fri Dec 02 21:37:21 EST 2011
But when I run this code, I get this result:
put message
put message
put message
put message
put message
put message
Got message: Tue Sep 24 16:44:59 CST 2013
Got message: Tue Sep 24 16:45:00 CST 2013
put message
Got message: Tue Sep 24 16:45:00 CST 2013
put message
Got message: Tue Sep 24 16:45:00 CST 2013
put message
..............
What is the problem? Can anyone explain it to me?
The author's entire point is that the order of tasks between different threads is unpredictable. He printed a possible output sequence, but many, many others are possible.
In addition to the explanation of the output already given, I must add that the book you are reading doesn't seem to be a very good source to learn from. It teaches you to:
extend Thread, a notoriously bad practice;
invoke wait and notify on a Thread instance, another known, documented bad practice;
use the wait and notify mechanism in the first place, which has mostly been superseded by much more convenient and simpler java.util.concurrent tools such as CountDownLatch, Semaphore, and Phaser.
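For this particular producer/consumer example, the usual java.util.concurrent replacement is a BlockingQueue. It is not one of the three classes named above, but it comes from the same package and does the waiting internally. The sketch below assumes that approach: QueueDemo and runPipeline are hypothetical names, the work is done by Runnables rather than Thread subclasses, and the capacity of 5 mirrors MAXQUEUE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    static final int MAXQUEUE = 5;

    /** Produces `count` messages on one thread, consumes them on another, returns what was consumed. */
    public static List<String> runPipeline(int count) throws InterruptedException {
        // Bounded queue: put() blocks while 5 items are waiting, take() blocks while it is empty.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(MAXQUEUE);
        List<String> received = new ArrayList<>(); // written only by the consumer thread

        Runnable producer = () -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put("message " + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag instead of swallowing it
            }
        };
        Runnable consumer = () -> {
            try {
                for (int i = 0; i < count; i++) {
                    received.add(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread p = new Thread(producer);
        Thread c = new Thread(consumer);
        p.start();
        c.start();
        p.join();
        c.join(); // join() makes the consumer's writes to `received` visible to this thread
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String message : runPipeline(12)) {
            System.out.println("Got message: " + message);
        }
    }
}
```

The relative timing of the two threads still varies from run to run, but there is no explicit wait() or notify() to get wrong.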
Your output is the correct one, because a "Got message" line is not possible without a preceding "put message": if there is no message in the queue, how could you retrieve one? The author probably gave a made-up sample rather than the output of an actual run.
Remember:
[Count of "Got message" lines up to a given line] will always be <= [Count of "put message" lines up to that same line]
The only strange thing is that there are 6 consecutive "put message" lines, which seems impossible because the maximum queue size is 5.
But this is because the code sequence
producer.getMessage()
System.out.println("Got message: " + message);
is, of course, not atomic: the consumer had already taken a message off the queue but was interrupted by a thread switch before it printed "Got message".
By the way, always use notifyAll() instead of notify().
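The reason is that producers and consumers wait on the same monitor. notify() wakes one arbitrary waiter, and when several threads of each kind are waiting it can wake one that cannot make progress while the one that could stays asleep. Below is a minimal sketch of the same bounded queue using notifyAll(), a while loop around every wait(), and a private lock object instead of a Thread instance; BoundedBuffer is a hypothetical name, not code from the book.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Bounded FIFO buffer guarded by a private lock, using wait()/notifyAll(). */
public class BoundedBuffer {
    private final Object lock = new Object(); // private monitor: outside code cannot wait/notify on it
    private final Deque<String> messages = new ArrayDeque<String>();
    private final int capacity;

    public BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void put(String message) throws InterruptedException {
        synchronized (lock) {
            while (messages.size() == capacity) {
                lock.wait(); // always re-check the condition in a loop after waking up
            }
            messages.addLast(message);
            lock.notifyAll(); // wake every waiter; each one re-checks its own condition
        }
    }

    public String take() throws InterruptedException {
        synchronized (lock) {
            while (messages.isEmpty()) {
                lock.wait();
            }
            String message = messages.removeFirst();
            lock.notifyAll(); // a slot is now free for any blocked producer
            return message;
        }
    }

    public int size() {
        synchronized (lock) {
            return messages.size();
        }
    }
}
```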

JNetPcap project

I have made a jNetPcap project for which the code is below:
When I run the program, I get this in the console tab:
"Can't read list of devices, error is"
Since jNetPcap depends on the libpcap C library, which requires root privileges, I am fairly sure that is why my program could not run. Could someone help me with this problem?
Note: I have tried to create a .jar file in the src folder and build an .exe from that .jar file, but it throws the following errors:
Exception in thread "main" java.lang.NoClassDefFoundError: org/jnetpcap/packet/PcapPacketHandler
Caused by: java.lang.ClassNotFoundException: org.jnetpcap.packet.PcapPacketHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: Pcap_capture. Program will exit.
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.jnetpcap.Pcap;
import org.jnetpcap.PcapIf;
import org.jnetpcap.packet.PcapPacket;
import org.jnetpcap.packet.PcapPacketHandler;
/**
* Here is the output generated by this example :
*
* Network devices found:
* #0: \Device\NPF_{BC81C4FC-242F-4F1C-9DAD-EA9523CC992D} [Intel(R) PRO/100 VE]
* #1: \Device\NPF_{E048DA7F-D007-4EEF-909D-4238F6344971} [VMware Virtual Ethernet Adapter]
* #2: \Device\NPF_{5B62B373-3EC1-460D-8C71-54AA0BF761C7} [VMware Virtual Ethernet Adapter]
* #3: \Device\NPF_GenericDialupAdapter [Adapter for generic dialup and VPN capture]
*
* Choosing 'Intel(R) PRO/100 VE) ' on your behalf:
* Received packet at Tue Nov 03 18:52:42 EST 2009 caplen=1362 len=1362 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=82 len=82 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=145 len=145 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=62 len=62 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=164 len=164 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=62 len=62 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=54 len=54 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=1073 len=1073 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=1514 len=1514 jNetPcap rocks!
* Received packet at Tue Nov 03 18:52:45 EST 2009 caplen=279 len=279 jNetPcap rocks!
*/
public class Pcap_capture {
/**
* Main startup method
*
* @param args
* ignored
* @throws IOException
*/
public static void main(String[] args) throws IOException {
List<PcapIf> alldevs = new ArrayList<PcapIf>(); // Will be filled with NICs
StringBuilder errbuf = new StringBuilder(); // For any error msgs
boolean retval = false;
/***************************************************************************
* First get a list of devices on this system
**************************************************************************/
Process suprocess = Runtime.getRuntime().exec("su");
DataOutputStream os = new DataOutputStream(suprocess.getOutputStream());
DataInputStream is = new DataInputStream(suprocess.getInputStream());
if (null != os && null != is)
{
// Getting the id of the current user to check if this is root
os.writeBytes("id\n");
os.flush();
String currUid = is.readLine();
boolean exitSu = false;
if (null == currUid)
{
retval = false;
exitSu = false;
}
else if (true == currUid.contains("uid=0"))
{
retval = true;
exitSu = true;
}
else
{
retval = false;
exitSu = true;
}
if (exitSu)
{
os.writeBytes("exit\n");
os.flush();
}
}
int r = Pcap.findAllDevs(alldevs, errbuf);
if (r == Pcap.NOT_OK || alldevs.isEmpty()) {
System.err.printf("Can't read list of devices, error is %s", errbuf
.toString());
return;
}
System.out.println("Network devices found:");
int i = 0;
for (PcapIf device : alldevs) {
String description =
(device.getDescription() != null) ? device.getDescription()
: "No description available";
System.out.printf("#%d: %s [%s]\n", i++, device.getName(), description);
}
PcapIf device = alldevs.get(0); // We know we have at least 1 device
System.out
.printf("\nChoosing '%s' on your behalf:\n",
(device.getDescription() != null) ? device.getDescription()
: device.getName());
/***************************************************************************
* Second we open up the selected device
**************************************************************************/
int snaplen = 64 * 1024; // Capture all packets, no truncation
int flags = Pcap.MODE_PROMISCUOUS; // capture all packets
int timeout = 10 * 1000; // 10 seconds in millis
Pcap pcap =
Pcap.openLive(device.getName(), snaplen, flags, timeout, errbuf);
if (pcap == null) {
System.err.printf("Error while opening device for capture: "
+ errbuf.toString());
return;
}
/***************************************************************************
* Third we create a packet handler which will receive packets from the
* libpcap loop.
**************************************************************************/
PcapPacketHandler<String> jpacketHandler = new PcapPacketHandler<String>() {
public void nextPacket(PcapPacket packet, String user) {
System.out.printf("Received packet at %s caplen=%-4d len=%-4d %s\n",
new Date(packet.getCaptureHeader().timestampInMillis()),
packet.getCaptureHeader().caplen(), // Length actually captured
packet.getCaptureHeader().wirelen(), // Original length
user // User supplied object
);
}
};
/***************************************************************************
* Fourth we enter the loop and tell it to capture 10 packets. The loop
* method does a mapping of pcap.datalink() DLT value to JProtocol ID, which
* is needed by JScanner. The scanner scans the packet buffer and decodes
* the headers. The mapping is done automatically, although a variation on
* the loop method exists that allows the programmer to specify exactly
* which protocol ID to use as the data link type for this pcap interface.
**************************************************************************/
pcap.loop(10, jpacketHandler, "jNetPcap rocks!");
/***************************************************************************
* Last thing to do is close the pcap handle
**************************************************************************/
pcap.close();
}
}
I'm not sure whether you want help with (1) the "Can't read list of devices" error or (2) the "Exception in thread ..." error.
Error (2) means the JVM cannot find the class named in the error message (org.jnetpcap.packet.PcapPacketHandler) on the classpath.
Extract the "com" and "org" folders from jnetpcap.jar and place them in your program's .jar file. For example, if your program's .jar file is called "test.jar", put those folders inside test.jar.
You can add or remove files and folders in a .jar file with any archive program, such as 7-Zip.
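To confirm the merge worked, you can list what actually ended up in your jar. The sketch below uses java.util.jar.JarFile; JarCheck and entriesUnder are hypothetical names, and test.jar stands for your own jar as in the example above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarCheck {
    /** Lists entries in the jar whose names start with prefix, e.g. "org/jnetpcap/". */
    public static List<String> entriesUnder(String jarPath, String prefix) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarCheck test.jar
        String needed = "org/jnetpcap/packet/PcapPacketHandler.class";
        boolean present = entriesUnder(args[0], "org/jnetpcap/").contains(needed);
        System.out.println(present ? "found " + needed : "MISSING " + needed);
    }
}
```

The usual alternative to merging is to keep jnetpcap.jar separate and list both jars on the classpath, for example java -cp test.jar;jnetpcap.jar Pcap_capture on Windows (use : instead of ; on Linux).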

Search new tweets with a specific hashtag from a stream with twitter4j

I need my Java app to be notified of every new tweet with a specific hashtag (let's say #test), using twitter4j. To avoid continuously polling the Twitter REST API, I'm going to use the Twitter Streaming API.
With this code, I expect to:
Run the code
Open a browser, go to Twitter and post a tweet that contains #test
See output in my app only for tweets with that hashtag
But I do not see anything in my app (see the EDIT).
Can someone give me some advice?
ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setDebugEnabled(true)
.setOAuthConsumerKey(CONSUMER_KEY)
.setOAuthConsumerSecret(CONSUMER_SECRET)
.setOAuthAccessToken(ACCESS_TOKEN)
.setOAuthAccessTokenSecret(ACCESS_TOKEN_SECRET);
TwitterStream twitterStream = new TwitterStreamFactory(cb.build()).getInstance();
StatusListener listener = new StatusListener() {
@Override
public void onStatus(Status status) {
System.out.println("#" + status.getUser().getScreenName() + " - " + status.getUser().getId() + " - " + status.getText());
}
@Override
public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
System.out.println("Post deleted: " + statusDeletionNotice.getStatusId());
}
@Override
public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
System.err.println("Limitation: " + numberOfLimitedStatuses);
}
@Override
public void onScrubGeo(long userId, long upToStatusId) {
System.out.println("Got scrub_geo event userId:" + userId + " upToStatusId:" + upToStatusId);
}
@Override
public void onStallWarning(StallWarning warning) {
System.out.println("Got stall warning:" + warning);
}
@Override
public void onException(Exception ex) {
ex.printStackTrace();
}
};
List<String> queries = new ArrayList<String>();
queries.add("#test");
twitterStream.addListener(listener);
twitterStream.firehose(20);
String[] trackQueries = (String[]) queries.toArray(new String[queries.size()]);
FilterQuery filterQuery = new FilterQuery();
twitterStream.filter(filterQuery.track(trackQueries));
EDIT:
After fixing some errors, I now see output in System.out, but after some errors it looks like random stuff:
[Tue Jul 02 14:58:30 CEST 2013]Establishing connection.
[Tue Jul 02 14:58:30 CEST 2013]Establishing connection.
[Tue Jul 02 14:58:31 CEST 2013]Connection established.
[Tue Jul 02 14:58:31 CEST 2013]Receiving status stream.
[Tue Jul 02 14:58:31 CEST 2013]Connection established.
[Tue Jul 02 14:58:31 CEST 2013]Receiving status stream.
[Tue Jul 02 14:58:31 CEST 2013]Stream closed.
[Tue Jul 02 14:58:31 CEST 2013]Stream closed.
[Tue Jul 02 14:58:31 CEST 2013]Waiting for 250 milliseconds
Stream closed.
Relevant discussions can be found on the Internet at:
http://www.google.co.jp/search?q=a8fd061d or
http://www.google.co.jp/search?q=00070a0c
TwitterException{exceptionCode=[a8fd061d-00070a0c a8fd061d-0007099d], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
at twitter4j.StatusStreamBase.handleNextElement(StatusStreamBase.java:199)
at twitter4j.StatusStreamImpl.next(StatusStreamImpl.java:57)
at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:478)
Caused by: java.io.IOException: the end of the stream has been reached
at twitter4j.StatusStreamBase.handleNextElement(StatusStreamBase.java:88)
... 2 more
Stream closed.
Relevant discussions can be found on the Internet at:
http://www.google.co.jp/search?q=a8fd061d or
http://www.google.co.jp/search?q=00070a0c
TwitterException{exceptionCode=[a8fd061d-00070a0c a8fd061d-0007099d], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
at twitter4j.StatusStreamBase.handleNextElement(StatusStreamBase.java:199)
at twitter4j.StatusStreamImpl.next(StatusStreamImpl.java:57)
at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:478)
Caused by: java.io.IOException: the end of the stream has been reached
at twitter4j.StatusStreamBase.handleNextElement(StatusStreamBase.java:88)
... 2 more
[Tue Jul 02 14:58:31 CEST 2013]Unhandled event: {"disconnect":{"code":7,"stream_name":"urbanspirit5-statuses1734930","reason":"admin logout"}}
[Tue Jul 02 14:58:32 CEST 2013]Establishing connection.
[Tue Jul 02 14:58:33 CEST 2013]Connection established.
[Tue Jul 02 14:58:33 CEST 2013]Receiving status stream.
And then some random tweets like this:
#markinhos_jm - 300044892 - CRAZY PARTY AGUARDEM!!!
#bindibba7 - 511548205 - ??????? .. ??????? = ?? ??? ????
#Tamirmdty - 1521605059 - Salah ngetik
#PAOLOves_you - 222755754 - Hirap ah. Text, Chat FB at Tweet? SIYET! #MedyoAZARitu
#agokichi_mmmm - 793252368 - ???DVD??????
Access to the Twitter firehose is only given to selected companies through resellers, so unless you are one of those companies, you want to use sample() instead of firehose(). You should still see some log output, though. Are you sure your loggers are set up correctly?
EDIT: Read this section of the documentation and add a valid twitter4j.properties file. Note that you have to sign up with Twitter to get an access token. Once you have a token and have filled in the config file, you can just put it on your classpath.
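A quick way to rule out configuration problems is to check that twitter4j.properties really is on the classpath and that the four OAuth entries are filled in. The property names below are the ones twitter4j documents for OAuth; Twitter4jConfigCheck and missingKeys are hypothetical names, and this only checks that values are present, not that Twitter accepts them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class Twitter4jConfigCheck {
    // OAuth property names read from twitter4j.properties.
    static final String[] REQUIRED = {
        "oauth.consumerKey", "oauth.consumerSecret", "oauth.accessToken", "oauth.accessTokenSecret"
    };

    /** Returns the required OAuth keys that are missing or blank in the given properties source. */
    public static List<String> missingKeys(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Look for the file at the root of the classpath, where twitter4j loads it from.
        InputStream in = Twitter4jConfigCheck.class.getResourceAsStream("/twitter4j.properties");
        if (in == null) {
            System.out.println("twitter4j.properties is not on the classpath");
            return;
        }
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            System.out.println("Missing keys: " + missingKeys(reader));
        }
    }
}
```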

How to collect directory listing along with each file CRC checksum?

I use the following command to get a directory listing on *nix platforms (Linux, AIX, SunOS, HP-UX):
Command
ls -latr
Output
drwxr-xr-x 2 ricky support 4096 Aug 29 11:59 lib
-rwxrwxrwx 1 ricky support 924 Aug 29 12:00 initservice.sh
The cksum command is used to get the CRC checksum.
How can the CRC checksum be appended after each file in the listing (which also includes directories), as below, while keeping this format on all of these *nix platforms (Linux, AIX, SunOS, HP-UX)?
drwxr-xr-x 2 ricky support 4096 Aug 29 11:59 lib
-rwxrwxrwx 1 ricky support 924 Aug 29 12:00 initservice.sh 4287252281
Update: no third-party applications. I am using Java/Groovy to parse the output into a given format, which ultimately becomes an XML document built with Groovy's XmlSlurper (the generated XML files are around 5 MB):
"permission","hardlink","owner","group","fsize","month","date","time","filename","checksum"
All Suggestions are welcome! :)
Update with my code
Here, though, I am calculating an MD5 hex digest, which gives output similar to the Linux md5sum command. So it's no longer cksum, because I cannot use Jacksum due to a licensing issue :(
class CheckSumCRC32 {
public def getFileListing(String file){
def dir = new File(file)
def filename = null
def md5sum = null
def filesize = null
def lastmodified = null
def lastmodifiedDate = null
def lastmodifiedTime = null
def permission = null
Format formatter = null
def list=[]
if(dir.exists()){
dir.eachFileRecurse (FileType.FILES) { fname ->
list << fname
}
list.each{fileob->
try{
md5sum=getMD5CheckSum(fileob.toString())
filesize=fileob.length()+"b"
lastmodified=new Date(fileob.lastModified())
lastmodifiedDate=lastmodified.format('dd/MM/yyyy')
formatter=new SimpleDateFormat("hh:mm:ss a")
lastmodifiedTime=formatter.format(lastmodified)
permission=getReadPermissions(fileob)+getWritePermissions(fileob)+getExecutePermissions(fileob)
filename=getRelativePath("E:\\\\temp\\\\recurssive\\\\",fileob.toString())
println "$filename, $md5sum, $lastmodifiedDate, $filesize, $permission, $lastmodifiedDate, $lastmodifiedTime "
}
catch(IOException io){
println io
}
catch(FileNotFoundException fne){
println fne
}
catch(Exception e){
println e
}
}
}
}
public def getReadPermissions(def file){
String temp="-"
if(file.canRead())temp="r"
return temp
}
public def getWritePermissions(def file){
String temp="-"
if(file.canWrite())temp="w"
return temp
}
public def getExecutePermissions(def file){
String temp="-"
if(file.canExecute())temp="x"
return temp
}
public def getRelativePath(def main, def file){
return file.toString().replaceAll(main, "")
}
public static void main(String[] args) {
CheckSumCRC32 crc = new CheckSumCRC32();
crc.getFileListing("E:\\temp\\recurssive")
}
}
Output
release.zip, 25f995583144bebff729086ae6ec0eb2, 04/06/2012, 6301510b, rwx, 04/06/2012, 02:46:32 PM
file\check\release-1.0.zip, 3cc0f2b13778129c0cc41fb2fdc7a85f, 18/07/2012, 11786307b, rwx, 18/07/2012, 04:13:47 PM
file\Dedicated.mp3, 238f793f0b80e7eacf5fac31d23c65d4, 04/05/2010, 4650908b, rwx, 04/05/2010, 10:45:32 AM
But I still need a way to get the hard-link count, owner and group. From what I found online, Java 7 has this capability, but I am stuck with Java 6. Any help?
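On Java 7 and later, NIO.2 exposes owner, group and permissions through PosixFileAttributes. The sketch below assumes a POSIX file system; it will not run on Java 6, and on Windows readAttributes() throws UnsupportedOperationException. The hard-link count comes from the "unix:nlink" attribute, which is an OpenJDK implementation detail rather than part of the Java SE API, so the code falls back to "?" where it is unavailable. PosixListing and describe are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

public class PosixListing {
    /** Returns "permission,hardlink,owner,group,fsize,filename" for one path (Java 7+, POSIX only). */
    public static String describe(Path path) throws IOException {
        PosixFileAttributes attrs =
                Files.readAttributes(path, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);

        // "unix:nlink" is OpenJDK-specific, not part of the Java SE API; fall back to "?".
        Object nlink;
        try {
            nlink = Files.getAttribute(path, "unix:nlink", LinkOption.NOFOLLOW_LINKS);
        } catch (UnsupportedOperationException | IllegalArgumentException e) {
            nlink = "?";
        }

        String type = attrs.isDirectory() ? "d" : attrs.isSymbolicLink() ? "l" : "-";
        return type + PosixFilePermissions.toString(attrs.permissions())
                + "," + nlink
                + "," + attrs.owner().getName()
                + "," + attrs.group().getName()
                + "," + attrs.size()
                + "," + path.getFileName();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PosixListing somefile
        System.out.println(describe(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```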
Take a look at Jacksum (http://www.jonelo.de/java/jacksum/index.html); it is reported to provide cksum-compatible CRC32 checksums.
BTW, I tried using java.util.zip.CRC32 to calculate checksums, and it gives a different value than cksum does, so cksum must use a slightly different algorithm.
EDIT: I tried Jacksum, and it works, but you have to tell it to use the 'cksum' algorithm; apparently that is different from crc32, which Jacksum also supports.
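For reference, the cksum CRC differs from java.util.zip.CRC32 because POSIX cksum processes bits most-significant first, starts from 0 rather than all ones, and feeds the input length into the CRC before the final complement. The sketch below implements that algorithm in plain Java 6 with no third-party library, which may sidestep the Jacksum licensing issue from the question. Cksum is a hypothetical name, and it is worth comparing its output with the real cksum on a few files before relying on it.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Cksum {
    // CRC-32 polynomial, processed MSB-first (java.util.zip.CRC32 works LSB-first).
    private static final int POLY = 0x04C11DB7;

    private static int update(int crc, int octet) {
        crc ^= (octet & 0xFF) << 24;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80000000) != 0 ? (crc << 1) ^ POLY : crc << 1;
        }
        return crc;
    }

    /** Returns the value cksum prints in its first column for the given stream. */
    public static long cksum(InputStream in) throws IOException {
        int crc = 0;
        long length = 0;
        int b;
        while ((b = in.read()) != -1) {
            crc = update(crc, b);
            length++;
        }
        // Then the length, least significant byte first, using only as many bytes as needed.
        for (long n = length; n != 0; n >>>= 8) {
            crc = update(crc, (int) (n & 0xFF));
        }
        return ~crc & 0xFFFFFFFFL; // complement and treat as unsigned 32-bit
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            InputStream in = new BufferedInputStream(new FileInputStream(name));
            try {
                // Same three columns as cksum: CRC, byte count, file name.
                System.out.println(cksum(in) + " " + new File(name).length() + " " + name);
            } finally {
                in.close();
            }
        }
    }
}
```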
Well, you could run the command and then, for each line, run cksum on the file and append its result to the line.
I did the following:
dir = "/home/will"
"ls -latr $dir".execute().in.eachLine { line ->
// let's omit the first line, which starts with "total"
if (line =~ /^total/) return
// for directories, we just print the line
if (line =~ /^d/)
{
println line
}
else
{
// for files, we split the line by one or more spaces and join
// the last pieces with single spaces to form the filename
// (there must be a better way to do this)
def fileName = line.split(/ {1,}/)[8..-1].join(" ")
// now we get the first part of the cksum output; passing a list
// keeps a filename containing spaces as a single argument
def cksum = ["cksum", "$dir/$fileName"].execute().in.text.split(/ {1,}/)[0]
// concat the result to the original line and print it
println "$line $cksum"
}
}
Note my comment "there must be a better way to do this".
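On the "there must be a better way" point: splitting with a limit stops after the eighth separator, so the filename comes back intact, spaces included, without re-joining. The Java sketch below produces the field order from the question (LsLine, parse and toRecord are hypothetical names). It assumes the usual nine-column ls -l layout, so symlinks ("name -> target"), older files whose time column shows a year, and locale-dependent dates still need handling on top of it.

```java
import java.util.Arrays;

public class LsLine {
    /** Splits one "ls -l" line into permission, hardlink, owner, group, fsize, month, date, time, filename. */
    public static String[] parse(String line) {
        // Limit 9: the pattern is applied at most 8 times, so everything after the
        // eighth run of whitespace (the filename, spaces included) stays in one field.
        return line.split("\\s+", 9);
    }

    /** Quotes the parsed fields plus a checksum as one CSV-style record. */
    public static String toRecord(String line, String checksum) {
        String[] fields = Arrays.copyOf(parse(line), 10); // short lines are padded with null
        fields[9] = checksum;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(fields[i]).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "-rwxrwxrwx 1 ricky support 924 Aug 29 12:00 initservice.sh";
        System.out.println(toRecord(line, "4287252281"));
    }
}
```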
