Using SnpSift, only 0.52% of VCF is annotated by dbsnp database - java

I generated a coordinate sorted vcf file from a cram using the following commands:
samtools sort -@ 10 -o /output/sorted.cram
samtools index -@ 10 /output/sorted.cram
bcftools mpileup -f reference.fa -r chrz:zzzz-zzzzx -a INFO/AD,FORMAT/DP --threads 10 -O v -o /output/mpileup.vcf /input/sorted.cram
I am trying to annotate the coordinate sorted vcf file (ref genome Hg38) with snpsift. I am using the following command:
java -jar SnpSift.jar annotate -v /dbsnp/file.vcf.gz /input/mpileup.vcf > /output/annotated.vcf
I have downloaded the dbsnp vcf file and tab index here: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/GATK/
However, only 0.52% of the vcf is being annotated... This seems strange. Additionally, when I try to use the Ensembl web interface (https://useast.ensembl.org/Multi/Tools/VEP?db=core) to annotate my vcf I get the error "invalid input". This leads me to believe something is wrong with my vcf file. I am only trying to annotate one gene; is it normal for only 0.52% of a gene to be annotated by dbsnp? Thank you in advance for any assistance!
Update! If I use bcftools mpileup | bcftools call --variants-only then the Ensembl tool works. Additionally, this artificially increases the % of SNPs annotated.


How can I access calendars on my Mac with Java?

I want to read data stored locally by the Apple Calendar app on my Mac (12.1 Monterey).
The data is stored in subdirectories of ~/Library/Calendars/ with one subdirectory per calendar.
The problem: When I try to get a list of files from there, Java returns null:
String userHomeDir = System.getProperty("user.home");
File calendarRoot = new File(userHomeDir + "/Library/Calendars/");
File[] calendars = calendarRoot.listFiles();
System.out.println("Number of files: " + calendars.length); // NPE thrown here
File permissions are as follows:
~/Library: drwx------+ (owner: my user)
~/Library/Calendars: drwxr-xr-x@ (owner: my user)
Listing files in Library works fine.
How can I access that folder?
Short answer
Give it up. Apple has made it next to impossible to elegantly get a Java app to read calendar data.
Long answer
Since some macOS version (Catalina?), the directory ~/Library/Calendars/ and all its subdirectories (and the files therein) have been protected by macOS using extended attributes, namely com.apple.quarantine.
It used to be possible to grant applications the specific right to access calendar data using System Settings - Security and Privacy - Privacy - Calendar. However, the manual +-Button has gone now.
What I will do is use some zsh script to export the desired calendar events to another directory and remove the com.apple.quarantine attribute from there, too.
This is not elegant and leaves the Java world, but for my case, having a Java command line application being started from a designated shell script, it works rather nicely.
Here's what I came up with:
#!/bin/zsh
calendars="/Users/yourUserName/Library/Calendars"
target="/Users/yourUserName/some/other/directory/Calendar_Export"
cd ${calendars}
calsource=""
for f in *.calendar
do
linesFound=`grep -c '<string>Your Calendar Name</string>' ${f}/Info.plist`
if [[ ${linesFound} -eq 1 ]]
then
echo "The relevant calendar resides at " ${f}", copying all events"
calsource=${calendars}/${f}/Events
fi
done
if [[ ${calsource} != "" ]]
then
rm ${target}/*
cp ${calsource}/* ${target}/
xattr -d com.apple.quarantine ${target}/*
fi
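On the Java side, the NullPointerException in the question arises because File.listFiles() returns null when the directory cannot be read. A minimal sketch of a more defensive listing follows, using Files.newDirectoryStream, which throws an IOException describing the failure instead of returning null. The class name CalendarExportLister and the export path are placeholders taken from the script above, not part of any real API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists a directory without risking a NullPointerException.
final class CalendarExportLister {

    /** Returns the entry names in dir, or an empty list if dir cannot be read. */
    static List<String> listNames(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            // The exception type and message say why the listing failed
            // (missing directory, not a directory, access denied, ...).
            System.err.println("Cannot list " + dir + ": " + e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Placeholder path: the export directory used by the shell script.
        Path exportDir = Paths.get(System.getProperty("user.home"),
                "some", "other", "directory", "Calendar_Export");
        System.out.println("Number of files: " + listNames(exportDir).size());
    }
}
```

Pointing this at the exported copy rather than at ~/Library/Calendars keeps the Java program away from the protected directory altogether.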

Trimmomatic not acknowledging commands over Linux cluster

I am trying to use the program Trimmomatic to remove adapter sequences from Illumina paired-end reads on a computer cluster. While I can get the program to open, it will either not acknowledge the commands I enter or will return an error message. I have tried all kinds of permutations of the input commands without success. Examples of input code and error messages are below.
Code:
java -classpath /*filepath*/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
-phred33 -trimlog /Results/log.txt \
~/*filepath*/data_R1.fq ~/*filepath*/data_R2.fq \
ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:"true"
Results: (the o/s seems to find and execute the software, but is not feeding in the command; I get the same result if I use the java -jar option for executing Trimmomatic)
TrimmomaticPE [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
Code: (If I add in the command PE immediately before all other commands, the program executes and can find the fasta file containing the adapter sequences, but then searches for and cannot find a file called 'PE')
java -classpath /*filepath*/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
PE -phred33 -trimlog /Results/log.txt \
~/*filepath*/data_R1.fq ~/*filepath*/data_R2.fq \
ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:"true"
Results: (Program runs and finds the fasta file of adapter sequences, but then fails to execute. Why is it looking for a PE file?)
TrimmomaticPE: Started with arguments: PE -phred33 -trimlog /Results/log.txt /*filepath*/data_R1.fq /*filepath*/data_R2.fq ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:true
Multiple cores found: Using 12 threads
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "main" java.io.FileNotFoundException: PE (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:127)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:251)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:498)
at org.usadellab.trimmomatic.TrimmomaticPE.main(TrimmomaticPE.java:506)
I've never used Trimmomatic, but it looks like you are passing in the incorrect parameters.
The Trimmomatic webpage lists the usage from version 0.27+ as:
java -jar <path to trimmomatic.jar> PE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
or using the "old way"
java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
The only difference is that the new way specifies "PE" instead of a fully qualified class name.
First, addressing your 2nd problem:
It looks like you are doing both: specifying a fully qualified class name as well as the PE part. This makes Trimmomatic think you have a fastq file named "PE", which doesn't exist.
If you get rid of either the "PE" or the fully qualified class name, it will call the correct class, which is what you do in your first invocation.
1st problem
I don't think you have the correct number of arguments listed in your first invocation so trimmomatic displays the usage to tell you what parameters are required. It would be nice if it told you what was wrong but it doesn't.
Solution
It looks like you are only providing 2 fastq files, but Trimmomatic needs 6 file paths. You are missing the paired and unpaired output file paths for the read 1 and read 2 data, which I assume get created by the program when it runs.
I guess your 2nd attempt got further along in the program since it saw enough parameters that you potentially had enough file paths specified (however, it turns out you had optional step parameters)
Following the advice of dkatzel below and user blakeoft on SeqAnswers (http://seqanswers.com/forums/showthread.php?t=45094), I dropped the PE flag and added individual file names for each output file and the program executed properly.
java -classpath /*filepath*/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
-phred33 \
~/refs/lec12/data_R1.fq ~/refs/lec12/data_R2.fq \
lane1_forward_paired.fq lane1_forward_unpaired.fq lane1_reverse_paired.fq lane1_reverse_unpaired.fq \
ILLUMINACLIP:/*filepath*/Trimmomatic-0.32/adapters/TruSeq3-PE-2.fa:2:30:10:3:true
NB: I also tried using the -baseout flag rather than a list of four files, and the program would open but not execute any commands.
NB: A log file could be generated using the flag -trimlog filename, but only if I first made a blank text file with the same name as the intended log file.
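To make the argument order explicit, here is a small sketch that assembles the paired-end command as a list, following the "java -jar ... PE" usage quoted above: two inputs, then paired and unpaired outputs for read 1, then paired and unpaired outputs for read 2, then the trimming step. The class name TrimmomaticCommand and the build method are hypothetical; the sketch only builds and prints the command and does not run Trimmomatic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles a Trimmomatic PE command line.
final class TrimmomaticCommand {

    static List<String> build(String jar, String in1, String in2,
                              String outPrefix, String step) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-jar", jar, "PE", "-phred33"));
        // Two input files.
        cmd.add(in1);
        cmd.add(in2);
        // Four output files, in the order given by the usage message.
        cmd.add(outPrefix + "_forward_paired.fq");
        cmd.add(outPrefix + "_forward_unpaired.fq");
        cmd.add(outPrefix + "_reverse_paired.fq");
        cmd.add(outPrefix + "_reverse_unpaired.fq");
        // Trimming step(s) come last.
        cmd.add(step);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder paths; substitute the real locations.
        List<String> cmd = build("trimmomatic-0.32.jar",
                "data_R1.fq", "data_R2.fq", "lane1",
                "ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:3:true");
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the resulting list to a ProcessBuilder would launch the job; printing it first is a cheap way to check that all six file paths are present.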

error while running CRF++ from a java project

I want to call CRF++ toolkit from a java program. I type the following:
Process process = runtime.exec("/home/toshiba/Bureau/CRF++-0.54/.libs/lt-crf_learn /home/toshiba/Bureau/CRF++-0.54/example/atb/template /home/toshiba/Bureau/CRF++-0.54/example/atb/tr_java.data");
process.waitFor();
But I get the following error:
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2009 Taku Kudo, All rights reserved.
Usage: /home/toshiba/Bureau/CRF++-0.54/.libs/lt-crf_learn [options] files
-f, --freq=INT use features that occuer no less than INT(default 1)
-m, --maxiter=INT set INT for max iterations in LBFGS routine(default 10k)
-c, --cost=FLOAT set FLOAT for cost parameter(default 1.0)
-e, --eta=FLOAT set FLOAT for termination criterion(default 0.0001)
-C, --convert convert text model to binary model
-t, --textmodel build also text model file for debugging
-a, --algorithm=(CRF|MIRA) select training algorithm
-p, --thread=INT number of threads(default 1)
-H, --shrinking-size=INT set INT for number of iterations variable needs to be optimal before considered for shrinking. (default 20)
-v, --version show the version and exit
-h, --help show this help and exit
I'm wondering if anyone could help me?
I don't think that's a bug in CRF++, since you are able to run it from command line. So the actual question is how to pass arguments properly when starting a process using Runtime.exec(). I would suggest trying the following:
String[] cmd = {"/home/toshiba/Bureau/CRF++-0.54/.libs/lt-crf_learn",
"/home/toshiba/Bureau/CRF++-0.54/example/atb/template",
"/home/toshiba/Bureau/CRF++-0.54/example/atb/tr_java.data"};
Process p = Runtime.getRuntime().exec(cmd);
This may help since Runtime.exec() sometimes splits the command line into arguments in a rather strange fashion.
Another potential problem is mentioned here: Java Runtime.exec()
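As a variation on the array form above, here is a sketch using ProcessBuilder, which also takes each argument as a separate element and makes it easy to capture what the tool prints. The class name ExecWithArgs and the run method are made up for illustration, and the default echo command assumes a Unix-like system.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs a command given as separate arguments
// and returns everything it wrote to stdout and stderr.
final class ExecWithArgs {

    static String run(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        StringBuilder output = new StringBuilder();
        try {
            Process process = builder.start();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Pass the program and each of its arguments as separate elements,
        // e.g. the lt-crf_learn path followed by its file arguments.
        List<String> command = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("echo", "hello world");
        System.out.print(run(command));
    }
}
```

Reading the output before calling waitFor() avoids the child blocking on a full pipe, and the captured text shows any usage message the tool prints.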
There's a simple solution for this. Just write your command into a temporary file and execute that file with Runtime.getRuntime().exec("sh <temp-filename>"). Later you can delete this file. I will explain the reason behind this if this solution works for you.

What is the purpose of @SmallTest, @MediumTest, and @LargeTest annotations in Android?

I'm new to Android and I've seen example code using these annotations. For example:
@SmallTest
public void testStuff() {
TouchUtils.tapView(this, anEditTextView);
sendKeys("H E L P SPACE M E PERIOD");
assertEquals("help me.", anEditTextView.getText().toString());
}
What does that annotation accomplish?
This blog post explains it best. Basically, it is the following:
Small: this test doesn't interact with any file system or network.
Medium: Accesses file systems on box which is running tests.
Large: Accesses external file systems, networks, etc.
Per the Android Developers blog, a small test should take < 100ms, a medium test < 2s, and a large test < 120s.
The answer from azizbekian shows how to utilize the annotation when running your tests.
Also, this old out-of-date page has even more information, specifically on how to use the am instrument tool with adb shell. Here are the pertinent parts:
am instrument options
The am instrument tool passes testing options to InstrumentationTestRunner or a subclass in the form of key-value pairs, using the -e flag, with this syntax:
-e <key> <value>
Some keys accept multiple values. You specify multiple values in a comma-separated list. For example, this invocation of InstrumentationTestRunner provides multiple values for the package key:
$ adb shell am instrument -w -e package com.android.test.package1,com.android.test.package2 \
> com.android.test/android.test.InstrumentationTestRunner
The following table describes the key-value pairs and their result. Please review the Usage Notes following the table.
Key
Value
Description
size
[small | medium | large]
Runs a test method annotated by size. The annotations are @SmallTest, @MediumTest, and @LargeTest.
So reading the above, you could specify small tests like this:
$ adb shell am instrument -w \
> -e package com.android.test.package1,com.android.test.package2 \
> -e size small \
> com.android.test/android.test.InstrumentationTestRunner
As an addition to Davidann's answer and mainly OP's question in the comment:
In the context of the code above, does it actually DO anything except leave a note for other developers? Does it enforce anything? Are there any tools that utilize this annotation? What's its purpose in Android development?
You can run a group of tests annotated with specific annotation.
From AndroidJUnitRunner documentation:
Running a specific test size i.e. annotated with SmallTest or MediumTest or LargeTest:
adb shell am instrument -w -e size [small|medium|large] com.android.foo/android.support.test.runner.AndroidJUnitRunner
You may also set up those params through Gradle:
android {
...
defaultConfig {
...
testInstrumentationRunnerArgument 'size', 'Large'
}
}
Or via a Gradle command-line property:
-Pandroid.testInstrumentationRunnerArguments.size=small
See Doug Stevenson blog post as well as this blog post for more details.
You can also annotate POJO unit tests with @Category(MediumTest.class) or @Category(LargeTest.class), etc. by defining your own Categories; see the test-categories repo for an example.
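To show the general mechanism behind "run only the tests of a given size", here is a plain-Java sketch that selects methods by annotation using reflection. It is not the AndroidJUnitRunner implementation: the annotations Small and Large, the class SizeFilterSketch, and the sample test class are all hypothetical stand-ins so that the example runs without the Android SDK.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: mimics size-based test selection with made-up annotations.
final class SizeFilterSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Small {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Large {}

    // Stand-in for a test class with size-annotated methods.
    static class SampleTests {
        @Small public void testParsing() {}
        @Large public void testDownload() {}
        @Small public void testFormatting() {}
    }

    /** Returns the sorted names of methods in testClass carrying the given annotation. */
    static List<String> select(Class<?> testClass, Class<? extends Annotation> size) {
        List<String> names = new ArrayList<>();
        for (Method method : testClass.getDeclaredMethods()) {
            if (method.isAnnotationPresent(size)) {
                names.add(method.getName());
            }
        }
        Collections.sort(names); // reflection order is unspecified
        return names;
    }

    public static void main(String[] args) {
        System.out.println(select(SampleTests.class, Small.class));
        // prints [testFormatting, testParsing]
    }
}
```

The annotation itself does nothing at run time; it only becomes meaningful when a runner filters on it, as the -e size option does.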

How would I run an .sh file using NSTask and get its output?

I need to run an .sh file and get its output. I need to see the setup of the file as well.
The .sh file simply runs a java app through terminal.
Any ideas? I'm truly stuck on this.....
Elijah
The server.sh file:
echo Starting Jarvis Program D.
ALICE_HOME=.
SERVLET_LIB=lib/servlet.jar
ALICE_LIB=lib/aliceserver.jar
JS_LIB=lib/js.jar
# Set SQL_LIB to the location of your database driver.
SQL_LIB=lib/mysql_comp.jar
# These are for Jetty; you will want to change these if you are using a different http server.
HTTP_SERVER_LIBS=lib/org.mortbay.jetty.jar
PROGRAMD_CLASSPATH=$SERVLET_LIB:$ALICE_LIB:$JS_LIB:$SQL_LIB:$HTTP_SERVER_LIBS
java -classpath $PROGRAMD_CLASSPATH -Xms64m -Xmx128m org.alicebot.server.net.AliceServer $1
My current code:
NSTask *server = [NSTask new];
[server setLaunchPath:@"/bin/sh"];
[server setArguments:[NSArray arrayWithObject:@"/applications/jarvis/brain/server.sh"]];
NSPipe *outputPipe = [NSPipe pipe];
[server setStandardInput:[NSPipe pipe]];
[server setStandardOutput:outputPipe];
[server launch];
NSMutableString *outputString = [NSMutableString string];
while ([outputString rangeOfString:@"Jarvis>"].location == NSNotFound) {
[outputString appendString:[[[NSString alloc] initWithData:[[outputPipe fileHandleForReading] readDataToEndOfFile] encoding:NSUTF8StringEncoding] autorelease]];
NSRunAlertPanel(@"", outputString, @"", @"", @"");
}
The NSRunAlertPanel is just for checking the output. Now my code is freezing and not even getting to the alertpanel.
See answer to this question.
There are a couple of things that should be fixed in your script:
The script should begin with a shebang. Also make sure that the script has its executable bit set.
Because the environment variables are set up relative to the shell script directory, you need to make sure that the script directory is the current directory.
You need to export the environment variables that should be visible to the Java process.
In the last line you can use exec to replace the shell process with the Java executable that runs Jetty.
Here is a revised version of your script:
#!/bin/sh
echo Starting Jarvis Program D.
cd "`dirname \"$0\"`"
export ALICE_HOME=.
export SERVLET_LIB=lib/servlet.jar
export ALICE_LIB=lib/aliceserver.jar
export JS_LIB=lib/js.jar
# Set SQL_LIB to the location of your database driver.
export SQL_LIB=lib/mysql_comp.jar
# These are for Jetty; you will want to change these if you are using a different http server.
export HTTP_SERVER_LIBS=lib/org.mortbay.jetty.jar
export PROGRAMD_CLASSPATH=$SERVLET_LIB:$ALICE_LIB:$JS_LIB:$SQL_LIB:$HTTP_SERVER_LIBS
exec java -classpath $PROGRAMD_CLASSPATH -Xms64m -Xmx128m org.alicebot.server.net.AliceServer $1
Invoking the shell script in Objective-C with multiple arguments:
NSTask *server = [NSTask new];
[server setLaunchPath:@"/bin/sh"];
[server setArguments:[NSArray arrayWithObjects:@"/applications/jarvis/brain/server.sh", @"argument", nil]];
...
Using AMShellWrapperTest.app you can filter (save, ...) the stdout stream of server.sh by modifying "- (void)appendOutput:(NSString *)output" in BannerController.m. (... but maybe there is a better way to do this ...)
/*
// output from stdout
- modified AMShellWrapper/AMShellWrapperTest/BannerController.m (http://www.harmless.de/cocoa-code.php)
to print server.sh setup information to "Error Messages:" text output field (or Console.app as an
alternative) and the Q & A dialog to the "Output:" text field
- use of default charliebot, http://sourceforge.net/projects/charliebot/, modified only to run server.sh
with complete path (here: ~/Desktop/charliebot/server.sh) in AMShellWrapperTest.app
*/
- (void)appendOutput:(NSString *)output
{
NSMutableString *outputString = [NSMutableString string];
if (
([output rangeOfString:@"Charlie>"].location != NSNotFound ) || \
([output rangeOfString:@"[Charlie] user>"].location != NSNotFound )
) {
[self write: output];
[self write: @"\n"];
} else {
[outputString appendString: output];
//[outputString writeToFile:@"/dev/console" atomically: NO]; // alternative
[errorOutlet setString:[[errorOutlet string] stringByAppendingString: outputString]];
}
}
Yes, but why isn't my code (posted above) working?
I guess your "Jarvis>" line is the first line of the server.sh output stream that expects some user input, which means that this line is incomplete, without a terminating newline character "\n". If server.sh had been run in Terminal.app, the user would have to press the return key to let the dialog continue. The conditional code of the while loop (NSNotFound) cannot finish its job on this incomplete line (which would be to abort the while loop) and gets stuck.
You have to drop the while loop and use the 'readInBackgroundAndNotify' mode on NSFileHandle to get non-blocking I/O stdout stream behaviour!
See: NSTask/NSPipe STDIN hangs on large data, sometimes...
So, if you like, just transform the source code of AMShellWrapperTest.app into a pure command-line tool by removing the GUI code.
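The underlying idea, consuming output incrementally until the prompt appears instead of reading to end of file, is language independent. Here is a minimal sketch of it in Java rather than Objective-C, purely for illustration. The class PromptReader and the method readUntil are hypothetical, and a StringReader stands in for the server's stdout stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative only: reads a stream character by character and stops
// as soon as the expected prompt has been seen, without waiting for EOF.
final class PromptReader {

    static String readUntil(Reader in, String prompt) {
        StringBuilder seen = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                seen.append((char) c);
                // Check whether the text read so far ends with the prompt.
                int start = seen.length() - prompt.length();
                if (start >= 0 && seen.indexOf(prompt, start) == start) {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        // Simulated server output: a banner, then a prompt with no trailing newline.
        Reader simulated = new StringReader("Starting Jarvis Program D.\nJarvis> ");
        System.out.println(readUntil(simulated, "Jarvis>"));
    }
}
```

With a real process the read would still block while no data is available, so in a GUI application it belongs on a background thread or, in Cocoa, in the notification-based approach described above.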
