Trimmomatic not acknowleding commands over Linux cluster - java

I am trying to use the program Trimmomatic to removed adapter sequences from an Illumina paired-end read over a computer cluster. While I can get the program to open, it will either not acknowledge the commands I enter or will return an error message. I have tried all kinds of permutations of the input commands without success. Examples of input code and error messages are below
Code:
java -classpath /*filepath*/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
-phred33 -trimlog /Results/log.txt \
~/*filepath*/data_R1.fq ~/*filepath*/data_R2.fq \
ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:"true"
Results: (the o/s seems to find and execute the software, but is not feeding in the command; I get the same result if I use the java -jar option for executing Trimmomatic)
TrimmomaticPE [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
Code: (If I add in the command PE immediately before all other commands, the program executes and can find the fasta file containing the adapter sequences, but then searches for and cannot fund a file called 'PE')
java -classpath /*filepath*/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
PE -phred33 -trimlog /Results/log.txt \
~/*filepath*/data_R1.fq ~/*filepath*/data_R2.fq \
ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:"true"
Results: (Programs rus and finds the fasta file of adapter sequences, but then fails to execute. Why is it looking for a PE file?)
TrimmomaticPE: Started with arguments: PE -phred33 -trimlog /Results/log.txt /*filepath*/data_R1.fq /*filepath*/data_R2.fq ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:true
Multiple cores found: Using 12 threads
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "main" java.io.FileNotFoundException: PE (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:127)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:251)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:498)
at org.usadellab.trimmomatic.TrimmomaticPE.main(TrimmomaticPE.java:506)

I've never used trimmomatic but it looks like you are passing in the incorrect parameters.
the trimmomatic webpage lists the usage from version 0.27+ as:
java -jar <path to trimmomatic.jar> PE [-threads <threads] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
or using the "old way"
java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
Where the only difference is the new way is specifying "PE" as the main class instead of a fully qualified path.
First, addressing your 2nd problem:
You look like you are doing both: specifying a fully qualified class name as well as the PE part. This makes trimmomatic think you have a fastq file named "PE" which doesn't exist.
If you get rid of the "PE" OR the qualfited class name; it will call the correct class. Which is what you do first in your first problem.
1st problem
I don't think you have the correct number of arguments listed in your first invocation so trimmomatic displays the usage to tell you what parameters are required. It would be nice if it told you what was wrong but it doesn't.
Solution
It looks like you are only providing 2 fastq files but trimmmoatic needs 6 file paths. You are missing the output paired and unpaired files paths for the read 1 and read 2 data which I assume get created by the program when it runs.
I guess your 2nd attempt got further along in the program since it saw enough parameters that you potentially had enough file paths specified (however, it turns out you had optional step parameters)

Following the advice of dkatzel below and user blakeoft on SeqAnswers (http://seqanswers.com/forums/showthread.php?t=45094), I dropped the PE flag and added individual file names for each output file and the program executed properly.
java -classpath /*filepath*/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
-phred33 \
~/refs/lec12/data_R1.fq ~/refs/lec12/data_R2.fq \
lane1_forward_paired.fq lane1_forward_unpaired.fq lane1_reverse_paired.fq lane1_reverse_unpaired.fq \
ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:true
NB: I also tried using the -baseout flag rather than a list of four files, and the program would open but not execute any commands
NB: The a log file could be generated using the flag -trimlog filename, but only if I first made a blank text file with the same name as the intended log file.

Related

Using SnpSift, only 0.52% of VCF is annotated by dbsnp database

I generated a coordinate sorted vcf file from a cram using the following commands:
samtools sort -# 10 -o /output/sorted.cram
samtools index -# 10 /output/sorted.cram
bcftools mpileup -f reference.fa -r chrz:zzzz-zzzzx -a INFO/AD,FORMAT/DP --threads 10 -O v -o /output/mpileup.vcf /input/sorted.cram
I am trying to annotate the coordinate sorted vcf file (ref genome Hg38) with snpsift. I am using the following command:
java -jar SnpSift.jar annotate -v /dbsnp/file.vcf.gz /input/mpileup.vcf > /output/annotated.vcf
I have downloaded the dbsnp vcf file and tab index here: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/GATK/
However, only 0.52% of the vcf is being annotated... This seems strange. Additionally, when I try to use the ensemble web interface (https://useast.ensembl.org/Multi/Tools/VEP?db=core) to annotate my vcf I get the error "invalid input". This leads me believe something is wrong with my vcf file? I am only trying to annotate one gene, is it normal for only 0.52% of a gene to be annotated by dbsnp? Thank you in advance for any assistance!
Update! If use bcftools mpileup | bcftools call --variants-only then the ensembl tool works. Additionally, this artificially increases the % of SNPs annotated.

when running batch file from java , 1 randomly appears before >>

i am trying to run a batch file from my java code
this is the batch file line :
C:\Users\abdelk\workspace\Symmetrix>symconfigure -sid 13 -cmd "create dev count=16, size=139840, emulation=FBA , config=TDEV;" commit -nop >> out_file.txt
when i run the batch file from my code, "1" randomly appears before the ">>". so line in cmd becomes like that :
C:\Users\abdelk\workspace\Symmetrix>symconfigure -sid 13 -cmd "create dev count=16, size=139840, emulation=FBA , config=TDEV;" commit -nop **1>>** outfile.txt
I don't know how can i remove this random appearing "1"
this is how i run the batch file from my code
rt.exec("cmd.exe /c start "+functions_object.edit_host_name(current_host_name)+"_Meta.bat",null,new File("C:\\Users\\abdelk\\workspace\\Project"));
First, take a look on Microsoft's TechNet article using command redirection operators.
Numeric 1 is an equivalent for handle stdout (standard output).
In batch files numeric 1 is omitted on redirection stdout.
For example put those 2 lines into a batch file and run it
echo This is just a redirect test.>CapturedStandardOutput.txt
#pause
You will see that cmd.exe automatically inserts  1 (space and 1) left to the redirection operator >.
In general it is not advisable to add already in batch file 1 for stdout.
Why?
Look what is executed with:
echo This is just a redirect test.1>CapturedStandardOutput.txt
#pause
You see in console window:
echo This is just a redirect test.1 1>CapturedStandardOutput.txt
And the file CapturedStandardOutput.txt contains the line:
This is just a redirect test.1
The solution is to use in batch file:
echo This is just a redirect test. 1>CapturedStandardOutput.txt
This results in execution of the line:
echo This is just a redirect test. 1>CapturedStandardOutput.txt
And there is now the line below in file CapturedStandardOutput.txt:
This is just a redirect test.
What you can't see here in the browser window is that the line in the text file ends now with a trailing space in comparison to first example. Therefore best is to use > and >> always without 1 as otherwise it is not really simple to control what is written to the text file.
One more hint:
To redirect a text to a file which ends with 1, 2, ..., 9 it is necessary to escape the number with ^.
Execution of a batch file with
echo Number is ^1>CapturedStandardOutput.txt
#pause
results in executing the command line
echo Number is 1 1>CapturedStandardOutput.txt
and in file CapturedStandardOutput.txt the line
Number is 1
with no trailing space at end of the line.
0 left to > and >> must not be escaped to get number 0 written into a text file.

error while running CRF++ from a java project

I want to call CRF++ toolkit from a java program. I type the following:
Process process = runtime.exec("/home/toshiba/Bureau/CRF++-0.54/.libs/lt-crf_learn /home/toshiba/Bureau/CRF++-0.54/example/atb/template /home/toshiba/Bureau/CRF++-0.54/example/atb/tr_java.data");
process.waitFor();
But, I have the the following error:
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2009 Taku Kudo, All rights reserved.
Usage: /home/toshiba/Bureau/CRF++-0.54/.libs/lt-crf_learn [options] files
-f, --freq=INT use features that occuer no less than INT(default 1)
-m, --maxiter=INT set INT for max iterations in LBFGS routine(default 10k)
-c, --cost=FLOAT set FLOAT for cost parameter(default 1.0)
-e, --eta=FLOAT set FLOAT for termination criterion(default 0.0001)
-C, --convert convert text model to binary model
-t, --textmodel build also text model file for debugging
-a, --algorithm=(CRF|MIRA) select training algorithm
-p, --thread=INT number of threads(default 1)
-H, --shrinking-size=INT set INT for number of iterations variable needs to be optimal before considered for shrinking. (default 20)
-v, --version show the version and exit
-h, --help show this help and exit
I 'm wondering if any one could help me?
I don't think that's a bug in CRF++, since you are able to run it from command line. So the actual question is how to pass arguments properly when starting a process using Runtime.exec(). I would suggest trying the following:
String[] cmd = {"/home/toshiba/Bureau/CRF++-0.54/.libs/lt-crf_learn",
"/home/toshiba/Bureau/CRF++-0.54/example/atb/template",
"/home/toshiba/Bureau/CRF++-0.54/example/atb/tr_java.data"};
Process p = Runtime.getRuntime().exec(cmd);
This may help since Runtime.exec() sometimes splits the command line into arguments in a rather strange fashion.
Another potential problem is mentioned here: Java Runtime.exec()
There's a simple solution for this. Just write your command into a temporary file and execute that file as Runtime.getRuntime.exec("sh <temp-filename>"). Later you can delete this file. I will explain reason behind this if this solution works for you.

How can I invoke a Java program from Perl?

I'm new with Perl and I'm trying to do something and can't find the answer.
I created a Java project that contains a main class that gets several input parameters.
I want to wrap my Java with Perl: I want to create a Perl script that gets the input parameters, and then passes them to the Java program, and runs it.
For example:
If my main is called mymain, and I call it like this: mymain 3 4 hi (3, 4 and hi are the input parameters), I want to create a Perl program called myperl which when it is invoked as myperl 3 4 hi will pass the arguments to the Java program and run it.
How can I do that?
Running a Java program is just like running any other external program.
Your question has two parts :
How do I get the arguments from Perl to the Java program?
How do I run the program in Perl?
For (1) you can do something like
my $javaArgs = " -cp /path/to/classpath -Xmx256";
my $className = myJavaMainClass;
my $javaCmd = "java ". $javaArgs ." " . $className . " " . join(' ', #ARGV);
Notice the join() function - it will put all your arguments to the Perl program and separate them with space.
For (2) you can follow #AurA 's answer.
Using the system() function
my $ret = system("$javaCmd");
This will not capture (i.e. put in the variable $ret) the output of your command, just the return code, like 0 for success.
Using backticks
my $out = `$javaCmd`;
This will populate $out with the whole output of the Java program ( you may not want this ).
Using pipes
open(FILE, "-|", "$javaCmd");
my #out = <FILE>
This is more complicated but allows more operations on the output.
For more information on this see perldoc -f open.
$javaoutput = `java javaprogram`;
or
system "java javaprogram";
For a jar file
$javaoutput = `java -jar filename.jar`;
or
system "java -jar filename.jar";

odd .bat file behavior

I have a bat file with the following contents:
set logfile= D:\log.txt
java com.stuff.MyClass %1 %2 %3 >> %logfile%
when I run the bat file though, I get the following:
C:\>set logfile= D:\log.txt
C:\>java com.stuff.MyClass <val of %1> <val of %2> <val of %3> 1>>D:\log.txt
The parameter is incorrect.
I'm almost positive the "The parameter is incorrect." is due to the extraneous 1 in there. I also think this might have something with the encoding of the .bat file, but I can't quite figure out what is causing it. Anyone ever run into this before or know what might be causing it and how to fix it?
Edit
And the lesson, as always, is check if its plugged in first before you go asking for help. The bat file, in version control, uses D:\log.txt because it is intended to be run from the server which contains a D drive. When testing my changes and running locally, on my computer which doesn't have a D drive, I failed to make the change to use C:\log.txt which is what caused the error. Sorry for wasting you time, thanks for the help, try to resist the urge to downvote me too much.
I doubt that that's the problem - I expect the command processor to deal with that part for you.
Here's evidence of it working for me:
Test.java:
public class Test
{
public static void main(String args[]) throws Exception
{
System.out.println(args.length);
for (String arg : args)
{
System.out.println(arg);
}
}
}
test.bat:
set logfile= c:\users\jon\test\test.log
java Test %1 %2 %3 >> %logfile%
On the command line:
c:\Users\Jon\Test> [User input] test.bat first second third
c:\Users\Jon\Test>set logfile= c:\users\jon\test\test.log
c:\Users\Jon\Test>java Test first second third 1>>c:\users\jon\test\test.log
c:\Users\Jon\Test> [User input] type test.log
3
first
second
third
the 1 is not extraneous: it is inserted by cmd.exe meaning stdout (instead of ">>", you can also write "1>>". contrast this to redirecting stderr: "2>>"). so the problem must be with your parameters.
This may seem like a stupid question, but is there an existing D: drive in the context that the bat file runs in?
Once I had a case where a bat file was used as the command line of a task within the Task Manager, but the Run As user was set to a local user on the box, giving no access to network drives.
Interpolated for your case, if the D: drive were a network drive, running the bat file as, say, the local administrator account on that machine instead of a domain user account would likely fail to have access to D:.

Categories