Regex working in the test program but not in the WebSphinx crawler - java

Here is my code for Regex matching which worked for a webpage:
public class RegexTestHarness {
public static void main(String[] args) {
File aFile = new File("/home/darshan/Desktop/test.txt");
FileInputStream inFile = null;
try {
inFile = new FileInputStream(aFile);
} catch (FileNotFoundException e) {
e.printStackTrace(System.err);
System.exit(1);
}
BufferedInputStream in = new BufferedInputStream(inFile);
DataInputStream data = new DataInputStream(in);
String string = new String();
try {
while (data.read() != -1) {
string += data.readLine();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Pattern pattern = Pattern
.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>");
Matcher matcher = pattern.matcher(string);
boolean found = false;
while (matcher.find()) {
System.out.println("Name: " + matcher.group(1) );
found = true;
}
if(!found){
System.out.println("Pattern Not found");
}
}
}
But the same code doesn't work in the crawler for which I'm testing the regex. My crawler code is (I'm using WebSphinx):
// Our own Crawler class extends the WebSphinx Crawler
public class MyCrawler extends Crawler {
MyCrawler() {
super(); // Do what the parent crawler would do
}
// We could choose not to visit a link based on certain circumstances
// For now we always visit the link
public boolean shouldVisit(Link l) {
// String host = l.getHost();
return false; // always visit a link
}
// What to do when we visit the page
public void visit(Page page) {
System.out.println("Visiting: " + page.getTitle());
String content = page.getContent();
System.out.println(content);
Pattern pattern = Pattern.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>");
Matcher matcher = pattern.matcher(content);
boolean found = false;
while (matcher.find()) {
System.out.println("Name: " + matcher.group(1) );
found = true;
}
if(!found){
System.out.println("Pattern Not found");
}
}
}
This is my code for running the crawler:
public class WebSphinxTest {
public static void main(String[] args) throws MalformedURLException, InterruptedException {
System.out.println("Testing Websphinx. . .");
// Make an instance of our own crawler
Crawler crawler = new MyCrawler();
// Create a "Link" object and set it as the crawler's root
Link link = new Link("http://justeat.in/restaurant/spices/5633/indian-tandoor-chinese-and-seafood/sarjapur-road/bangalore");
crawler.setRoot(link);
// Start running the crawler!
System.out.println("Starting crawler. . .");
crawler.run(); // Blocking function, could implement a thread, etc.
}
}
A little detail about the crawler code: shouldVisit(Link link) filters whether to visit a link or not, and visit(Page page) decides what to do when we get the page.
In the above example, test.txt and content contain the same String.

In your RegexTestHarness you're reading lines from a file and concatenating them without line breaks before you do your matching (readLine() returns the contents of the line without the line break!).
The input to your MyCrawler class, however, probably does contain line break characters. Since the regex metacharacter . does not match line break characters by default, the pattern fails in MyCrawler.
To fix this, prepend (?s) to every pattern that contains a . metacharacter. So:
Pattern.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>")
would become:
Pattern.compile("(?s)<div class=\"rest_title\">.*?<h1>(.*?)</h1>")
The DOT-ALL flag, (?s), will cause the . to match any character, including line break chars.
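To see the difference in isolation, here is a minimal sketch. The HTML snippet and the class and method names (DotAllDemo, firstName) are made up for illustration; only the two patterns come from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Returns the first captured group, or null when the regex does not match anywhere.
    static String firstName(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical page content with a line break between the <div> and the <h1>.
        String html = "<div class=\"rest_title\">\n  <h1>Spices</h1>\n</div>";
        String plain = "<div class=\"rest_title\">.*?<h1>(.*?)</h1>";

        // Without (?s) the dot stops at the line break, so nothing is found.
        System.out.println(firstName(plain, html));
        // With (?s) the dot also matches line breaks, so the name is captured.
        System.out.println(firstName("(?s)" + plain, html));
    }
}
```

Passing Pattern.DOTALL as the second argument to Pattern.compile should have the same effect as the inline (?s) flag.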

Related

how to stop a java spell checker program from correcting repetitive words

I've implemented a program that does the following:
Scan all of the words in a web page into a string (using jsoup)
Filter out all of the HTML markup and code
Put these words into a spell checking program and offer suggestions
The spell checking program loads a dictionary.txt file into an array and compares the string input to the words inside the dictionary.
My current problem is that when the input contains the same word multiple times, such as "teh program is teh worst", the code will print out
You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?
Sometimes a website will repeat the same words over and over again, and this can become messy.
If it's possible, printing the word along with how many times it was spelled incorrectly would be perfect, but limiting each word to being printed once would be good enough.
My program has a handful of methods and two classes, but the spell checking method is below:
Note: the original code contains some 'if' statements that remove punctuation marks but I've removed them for clarity.
static boolean suggestWord;
public static String checkWord(String wordToCheck) {
String wordCheck;
String word = wordToCheck.toLowerCase();
if ((wordCheck = (String) dictionary.get(word)) != null) {
suggestWord = false; // no need to ask for suggestion for a correct
// word.
return wordCheck;
}
// If after all of these checks a word could not be corrected, return as
// a misspelled word.
return word;
}
TEMPORARY EDIT: As requested, the complete code:
Class 1:
public class ParseCleanCheck {
static Hashtable<String, String> dictionary;// To store all the words of the
// dictionary
static boolean suggestWord;// To indicate whether the word is spelled
// correctly or not.
static Scanner urlInput = new Scanner(System.in);
public static String cleanString;
public static String url = "";
public static boolean correct = true;
/**
* PARSER METHOD
*/
public static void PageScanner() throws IOException {
System.out.println("Pick an english website to scan.");
// This do-while loop allows the user to try again after a mistake
do {
try {
System.out.println("Enter a URL, starting with http://");
url = urlInput.nextLine();
// This creates a document out of the HTML on the web page
Document doc = Jsoup.connect(url).get();
// This converts the document into a string to be cleaned
String htmlToClean = doc.toString();
cleanString = Jsoup.clean(htmlToClean, Whitelist.none());
correct = false;
} catch (Exception e) {
System.out.println("Incorrect format for a URL. Please try again.");
}
} while (correct);
}
/**
* SPELL CHECKER METHOD
*/
public static void SpellChecker() throws IOException {
dictionary = new Hashtable<String, String>();
System.out.println("Searching for spelling errors ... ");
try {
// Read and store the words of the dictionary
BufferedReader dictReader = new BufferedReader(new FileReader("dictionary.txt"));
while (dictReader.ready()) {
String dictInput = dictReader.readLine();
String[] dict = dictInput.split("\\s"); // create an array of
// dictionary words
for (int i = 0; i < dict.length; i++) {
// key and value are identical
dictionary.put(dict[i], dict[i]);
}
}
dictReader.close();
String user_text = "";
// Initializing a spelling suggestion object based on probability
SuggestSpelling suggest = new SuggestSpelling("wordprobabilityDatabase.txt");
// get user input for correction
{
user_text = cleanString;
String[] words = user_text.split(" ");
int error = 0;
for (String word : words) {
if(!dictionary.contains(word)) {
checkWord(word);
dictionary.put(word, word);
}
suggestWord = true;
String outputWord = checkWord(word);
if (suggestWord) {
System.out.println("Suggestions for " + word + " are: " + suggest.correct(outputWord) + "\n");
error++;
}
}
if (error == 0) {
System.out.println("No mistakes found");
}
}
} catch (IOException e) {
e.printStackTrace();
System.exit(-1);
}
}
/**
* METHOD TO SPELL CHECK THE WORDS IN A STRING. IS USED IN SPELL CHECKER
* METHOD THROUGH THE "WORD" STRING
*/
public static String checkWord(String wordToCheck) {
String wordCheck;
String word = wordToCheck.toLowerCase();
if ((wordCheck = (String) dictionary.get(word)) != null) {
suggestWord = false; // no need to ask for suggestion for a correct
// word.
return wordCheck;
}
// If after all of these checks a word could not be corrected, return as
// a misspelled word.
return word;
}
}
There is a second class (SuggestSpelling.java) which holds a probability calculator but that isn't relevant right now, unless you planned on running the code for yourself.
Use a HashSet to detect duplicates -
Set<String> wordSet = new HashSet<>();
Store each word of the input sentence in it. If a word already exists in the HashSet, don't call checkWord(String wordToCheck) for that word. Something like this -
String[] words = // split input sentence into words
for(String word: words) {
if(!wordSet.contains(word)) {
checkWord(word);
// do stuff
wordSet.add(word);
}
}
Edit
// ....
{
user_text = cleanString;
String[] words = user_text.split(" ");
Set<String> wordSet = new HashSet<>();
int error = 0;
for (String word : words) {
// wordSet is a separate data structure. It's only for duplicate checking, don't mix it with dictionary
if(!wordSet.contains(word)) {
// put all your logic here
wordSet.add(word);
}
}
if (error == 0) {
System.out.println("No mistakes found");
}
}
// ....
Your code has other bugs as well. For example, the loop calls dictionary.put(word, word) for every word that is not yet in the dictionary, so a misspelled word is added to the dictionary and the checkWord(word) call that follows treats it as correct. Please check the other parts as well.
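The question also asked for a count per misspelled word. One possible sketch, assuming the dictionary is available as a Set of lowercase words: count every unknown word in a LinkedHashMap and print each one once. The class and method names (MisspellingCounter, countMisspellings) and the tiny dictionary are hypothetical, and the suggestion lookup from SuggestSpelling is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MisspellingCounter {
    // Counts how often each word missing from the dictionary occurs, in first-seen order.
    static Map<String, Integer> countMisspellings(String text, Set<String> dictionary) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase();
            if (word.isEmpty() || dictionary.contains(word)) {
                continue; // skip empty tokens and correctly spelled words
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the words loaded from dictionary.txt.
        Set<String> dictionary = new HashSet<>(Arrays.asList("program", "is", "the", "worst"));
        Map<String, Integer> counts = countMisspellings("teh program is teh worst", dictionary);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("You entered '" + entry.getKey() + "' " + entry.getValue() + " time(s)");
        }
    }
}
```

Each misspelled word is reported once, with its count, and the suggestion could be looked up inside the final loop.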

Using trim() in Java to remove parts of an output

I wrote some code that sends the output of a batch file to a JTextArea. The batch file runs an Active Directory query for the computer name, but the result contains a lot of extra text that I want removed from the variable String trimmedLine. Currently everything is still printed, and I can't figure out how to make only the computer name appear.
Output: "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET"
I want the output to instead just show only this:
FDCD111304
Can anyone show me how to fix my code to only output the computer name and nothing else?
Look at console output (Ignore top line in console output)
btnPingComputer.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent arg0) {
String line;
BufferedWriter bw = null;
BufferedWriter writer =null;
try {
writer = new BufferedWriter(new FileWriter(tempFile));
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String lineToRemove = "OU=Workstations";
String s = null;
Process p = null;
try {
p = Runtime.getRuntime().exec("c:\\computerQuery.bat");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
StringBuffer sbuffer = new StringBuffer(); // new trial
BufferedReader in = new BufferedReader(new InputStreamReader(p
.getInputStream()));
try {
while ((line = in.readLine()) != null) {
System.out.println(line);
textArea.append(line);
textArea.append(String.format(" %s%n", line));
sbuffer.append(line + "\n");
s = sbuffer.toString();
String trimmedLine = line.trim();
if(trimmedLine.equals(lineToRemove)) continue;
writer.write(line + System.getProperty("line.separator"));
}
fw.write("commandResult is " + s);
String input = "CN=FDCD511304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(.*?)\\=(.*?)\\,");
Matcher m = pattern.matcher(input);
while(m.find()) {
String currentVar = m.group().substring(3, m.group().length() - 1);
System.out.println(currentVar); //store or do whatever you want
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally
{
try {
fw.close();
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
try {
in.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});
You could also use javax.naming.ldap.LdapName when dealing with distinguished names. It also handles escaping which is tricky with regex alone (i.e. cn=foo\,bar,dc=fl,dc=net is a perfectly valid DN)
String dn = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
LdapName ldapName = new LdapName(dn);
String commonName = (String) ldapName.getRdn(ldapName.size() - 1).getValue();
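As a slightly fuller sketch of the LdapName approach: the class and method names (CommonNameExtractor, commonName) are made up for illustration. It looks for the RDN whose type is CN instead of relying on its position, and wraps the checked InvalidNameException that the LdapName constructor can throw.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

public class CommonNameExtractor {
    // Returns the value of the first CN component found, or null when the DN has none.
    static String commonName(String dn) {
        try {
            for (Rdn rdn : new LdapName(dn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return String.valueOf(rdn.getValue());
                }
            }
            return null;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid distinguished name: " + dn, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(commonName("CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET"));
        // The escaped comma stays part of the value instead of splitting the name.
        System.out.println(commonName("cn=foo\\,bar,dc=fl,dc=net"));
    }
}
```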
Well, I would personally use the split() function to split the line into parts first and then parse those again. So my (probably unprofessional and buggy) code would be:
String args[] = line.split(",");
String args2[] = args[0].split("=");
String computerName = args2[1];
And that would be where this is:
while ((line = in.readLine()) != null) {
System.out.println(line);
String trimmedLine = line.trim();
if (trimmedLine.equals(lineToRemove))
continue;
writer.write(line
+ System.getProperty("line.separator"));
textArea.append(trimmedLine);
textArea.append(String.format(" %s%n", line));
}
You can use a different regular expression and Matcher.matches() to find only the value you're looking for:
String str = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(?:.*,)?CN=([^,]+).*");
Matcher matcher = pattern.matcher(str);
if(matcher.matches()) {
System.out.println(matcher.group(1));
} else {
System.out.println("No value for CN found");
}
FDCD111304
That regular expression will find the value for CN regardless of where in the string it is. The first group is to discard anything in front of CN= (we use a group starting with ?: here to indicate that the contents of the group should not be kept), then we match CN=, then the value, which may not contain a comma and then the rest of the string (which we don't care about).
You can also use a different regex and Matcher.find() to get both the keys and values and choose which keys to act on:
String str = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("([^=]+)=([^,]+),?");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String key = matcher.group(1);
String value = matcher.group(2);
if("CN".equals(key) || "DC".equals(key)) {
System.out.printf("%s: %s%n", key, value);
}
}
CN: FDCD111304
DC: FL
DC: NET
Try using substring to chop off the parts you don't need, which creates a new string.
There are a few options. The simplest and dumbest:
str.substring(str.indexOf("=") + 1, str.indexOf(","))
A second, more flexible approach would be to build a HashMap, which would also help in the future when reading other values.
Edit: Second method
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.HashMap;
public class HelloWorld{
public static void main(String []args){
String input = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(.*?)\\=(.*?)\\,");
Matcher m = pattern.matcher(input);
while(m.find()) {
String currentVar = m.group().substring(0, m.group().length() - 1); // drop the trailing comma
System.out.println(currentVar); //store or do whatever you want
}
}
}
This one will print every pair that is followed by a comma, like CN=FDCD111304 (the final DC=NET has no trailing comma, so this pattern skips it). You can split each pair on '=' and store it in a key/value container like HashMap, or just in a list.
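A rough sketch of the key/value idea, assuming a plain split on ',' and '=' is good enough (it does not handle escaped commas). Because OU and DC occur more than once, each key maps to a list of values. The names DnToMap and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DnToMap {
    // Splits "K=V,K=V,..." into a map from key to all of its values, keeping the original order.
    static Map<String, List<String>> parse(String dn) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pair : dn.split(",")) {
            String[] keyValue = pair.split("=", 2);
            if (keyValue.length < 2) {
                continue; // ignore fragments without '='
            }
            result.computeIfAbsent(keyValue[0].trim(), k -> new ArrayList<>()).add(keyValue[1].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parts = parse("CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET");
        System.out.println(parts.get("CN").get(0)); // the computer name
        System.out.println(parts);
    }
}
```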

Android Regular expression grab an image url from a site

I am making an app and I have the following problem.
I use pattern-matching code to find the image URL of an article on a site.
The problem is that my approach grabs the first photo, which is extra small.
Pattern p = Pattern.compile("http://planetaris.gr/media/k2/items/cache.*\\.jpg");
There is an XL image whose URL I would like to grab instead.
I would like to use a pattern that requires the link to end with _XL.jpg, like this:
Pattern p = Pattern.compile("(http://planetaris.gr/media/k2/items/cache.)+(.*\\[_XL]+(.jpg))");
or
Pattern p = Pattern.compile("http://planetaris.gr/media/k2/items/cache.*\\_XL.jpg");
This is where I need your help.
Here is the code
public void run() {
//Pattern p = Pattern.compile("http://planetaris.gr/media/k2/items/cache.*\\.jpg");
//Pattern p = Pattern.compile("http://planetaris.gr/media/k2/items/cache.*\\._XL.jpg");
Pattern p = Pattern.compile("(http://planetaris.gr/media/k2/items/cache.)+(.*\\[_XL]+(.jpg))");
try {
URL url = new URL(selectedRssItem.getLink());
URLConnection urlc = url.openConnection();
Log.d("MIMIS_LINK", url.toString());
BufferedInputStream buffer = new BufferedInputStream(urlc.getInputStream());
builder = new StringBuilder();
int byteRead;
while ((byteRead = buffer.read()) != -1)
builder.append((char) byteRead);
buffer.close();
} catch (MalformedURLException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}
Matcher m = p.matcher(builder.toString());
if (m.find()) {
try {
bitmap = BitmapFactory.decodeStream((InputStream)new URL(m.group(0)).getContent());
} catch (MalformedURLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Log.d("MIMIS_MATCHER", selectedRssItem.getDescription().toString());
};
handler.sendEmptyMessage(0);
}
}.start();
}
private Handler handler = new Handler() {
//@SuppressWarnings("null")
@Override
public void handleMessage(Message msg) {
mSpinner.clearAnimation();
mSpinner.setVisibility(View.GONE);
//progressDialog.dismiss();
myimageview.setImageBitmap(bitmap);
if (bitmap==null){
myimageview.setImageResource(R.drawable.aris_no_image);
};
}
};
I need this because the site also has a .jpg whose name contains _XL.
These are all the .jpg references on the given page:
href="/media/k2/items/cache/df95c3d9029788dcdb6f520e9151056c_XL.jpg"
/media/k2/items/cache/df95c3d9029788dcdb6f520e9151056c_L.jpg"
"/images/stories/atnea2.jpg"
/images/stories/diarkeias-bc.jpg"
This regex: /(media|images)/[^\.]*\.jpg
matches all your samples:
href="/media/k2/items/cache/df95c3d9029788dcdb6f520e9151056c_XL.jpg"
/media/k2/items/cache/df95c3d9029788dcdb6f520e9151056c_L.jpg"
"/images/stories/atnea2.jpg"
/images/stories/diarkeias-bc.jpg"
String url = "http://planetaris.gr/media/k2/items/cache.sample_XL.jpg";
String regex = "[0-9a-zA-Z\\-\\._/:]*[XL]\\.jpg$";
System.out.println(url.matches(regex)); // prints true when the string ends with X.jpg or L.jpg (which includes XL.jpg)
If you only want to check that the string ends with '.jpg', use this regex:
String regex = "[\\x20-\\x7E]*\\.jpg$";
If you want an exact match for files ending in XL.jpg:
String url = "http://planetaris.gr/media/k2/items/cache.sample_XL.jpg";
String regex = "[0-9a-zA-Z\\-\\._/:]*XL\\.jpg$";
System.out.println(url.matches(regex)); // prints true if the string matches
If your URL string may contain spaces or special characters in addition to 0-9, a-z and A-Z, use the following regex (it returns true for any string of printable ASCII characters that ends with XL.jpg):
String url = "http://planetaris.gr/media/k2/items %!##$%/cache.sample_ssXL.jpg";
String regex = "[\\x20-\\x7E]*XL\\.jpg$";
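Since the app searches the whole page source rather than a single URL, Matcher.find() may be a better fit than String.matches(). A minimal sketch, assuming the page contains relative paths like the samples listed in the question. The class and method names (XlImageFinder, findXlImage) and the surrounding HTML are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XlImageFinder {
    // A cache path without quotes or whitespace that ends in _XL.jpg.
    private static final Pattern XL_IMAGE =
            Pattern.compile("/media/k2/items/cache/[^\"\\s]+_XL\\.jpg");

    // Returns the first _XL image path found in the page source, or null if there is none.
    static String findXlImage(String html) {
        Matcher m = XL_IMAGE.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment built from the sample paths in the question.
        String html = "<img src=\"/media/k2/items/cache/df95c3d9029788dcdb6f520e9151056c_L.jpg\">\n"
                + "<a href=\"/media/k2/items/cache/df95c3d9029788dcdb6f520e9151056c_XL.jpg\">";
        String path = findXlImage(html);
        if (path != null) {
            // The host is the one used in the question's patterns.
            System.out.println("http://planetaris.gr" + path);
        }
    }
}
```

Excluding quotes and whitespace from the middle part keeps the match inside one attribute value, so the smaller _L.jpg image is skipped.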

Extracting Sub-Trees Using Stanford Tregex

I have made a class to extract subtrees using Tregex. I used some code snippets from "TregexPattern.java", as I don't want the program to use the console commands.
In general, given a tree for a sentence, I want to extract a certain subtree (no user interaction).
What I did so far is the following:
package edu.stanford.nlp.trees.tregex;
import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;
import java.io.*;
import java.util.*;
public abstract class Test {
abstract TregexMatcher matcher(Tree root, Tree tree, Map<String, Tree> namesToNodes, VariableStrings variableStrings);
public TregexMatcher matcher(Tree t) {
return matcher(t, t, new HashMap<String, Tree>(), new VariableStrings());
}
public static void main(String[] args) throws ParseException, IOException {
String encoding = "UTF-8";
TregexPattern p = TregexPattern.compile("NP < NN & <<DT"); //"/^MWV/" or "NP < (NP=np < NNS)"
TreeReader r = new PennTreeReader(new StringReader("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))) (PUNCT .))"), new LabeledScoredTreeFactory(new StringLabelFactory()));
Tree t = r.readTree();
treebank = new MemoryTreebank();
treebank.add(t);
TRegexTreeVisitor vis = new TRegexTreeVisitor(p, encoding);
treebank.apply(vis); //line 26
if (TRegexTreeVisitor.printMatches) {
System.out.println("There were " + vis.numMatches() + " matches in total.");
}
}
private static Treebank treebank; // used by main method, must be accessible
static class TRegexTreeVisitor implements TreeVisitor {
private static boolean printNumMatchesToStdOut = false;
static boolean printNonMatchingTrees = false;
static boolean printSubtreeCode = false;
static boolean printTree = false;
static boolean printWholeTree = false;
static boolean printMatches = true;
static boolean printFilename = false;
static boolean oneMatchPerRootNode = false;
static boolean reportTreeNumbers = false;
static TreePrint tp;
PrintWriter pw;
int treeNumber = 0;
TregexPattern p;
//String[] handles;
int numMatches;
TRegexTreeVisitor(TregexPattern p, String encoding) {
this.p = p;
//this.handles = handles;
try {
pw = new PrintWriter(new OutputStreamWriter(System.out, encoding), true);
} catch (UnsupportedEncodingException e) {
System.err.println("Error -- encoding " + encoding + " is unsupported. Using ASCII print writer instead.");
pw = new PrintWriter(System.out, true);
}
// tp.setPrintWriter(pw);
}
public void visitTree(Tree t) {
treeNumber++;
if (printTree) {
pw.print(treeNumber + ":");
pw.println("Next tree read:");
tp.printTree(t, pw);
}
TregexMatcher match = p.matcher(t);
if (printNonMatchingTrees) {
if (match.find()) {
numMatches++;
} else {
tp.printTree(t, pw);
}
return;
}
Tree lastMatchingRootNode = null;
while (match.find()) {
if (oneMatchPerRootNode) {
if (lastMatchingRootNode == match.getMatch()) {
continue;
} else {
lastMatchingRootNode = match.getMatch();
}
}
numMatches++;
if (printFilename && treebank instanceof DiskTreebank) {
DiskTreebank dtb = (DiskTreebank) treebank;
pw.print("# ");
pw.println(dtb.getCurrentFile());
}
if (printSubtreeCode) {
pw.println(treeNumber + ":" + match.getMatch().nodeNumber(t));
}
if (printMatches) {
if (reportTreeNumbers) {
pw.print(treeNumber + ": ");
}
if (printTree) {
pw.println("Found a full match:");
}
if (printWholeTree) {
tp.printTree(t, pw);
} else {
tp.printTree(match.getMatch(), pw); //line 108
}
// pw.println(); // TreePrint already puts a blank line in
} // end if (printMatches)
} // end while match.find()
} // end visitTree
public int numMatches() {
return numMatches;
}
} // end class TRegexTreeVisitor
}
But it gives the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.trees.tregex.Test$TRegexTreeVisitor.visitTree(Test.java:108)
at edu.stanford.nlp.trees.MemoryTreebank.apply(MemoryTreebank.java:376)
at edu.stanford.nlp.trees.tregex.Test.main(Test.java:26)
Java Result: 1
Any modifications or ideas?
A NullPointerException usually indicates a bug in the code. In the code you posted, the static field tp is declared but never assigned, so the call tp.printTree(...) on line 108 dereferences null.
I had the same task in the past. The sentence was parsed with a dependency parser.
I decided to put the resulting parse tree into XML (DOM) and run XPath queries over it.
To improve performance you don't need to serialize the XML to a String; just keep the whole XML structure in memory as a DOM (e.g. http://www.ibm.com/developerworks/xml/library/x-domjava/).
Using XPath for querying tree-like data structure gave me the following benefits:
Load/Save/Transfer results of sentence parsing easily.
Robust syntax/capabilities of XPath.
Many people know XPath (everyone can customize your query).
XML and XPath are cross platform.
Plenty of stable implementations of XPath and XML/DOM libraries.
Ability to use XSLT.
Integration with an existing XML-based pipeline: XSLT+XPath -> XSD -> perform actions (e.g. users have specified their email address, and what to do with it, somewhere inside a free-text complaint).
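To make the XPath idea concrete, here is a small sketch using only the JDK's XML APIs. The XML encoding of the tree (one element per node label) is an assumption made for illustration, not something Tregex or the Stanford parser produces. The query //NP[NN and .//DT] is meant as a rough counterpart of the Tregex pattern NP < NN & <<DT from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTreeQuery {
    // Hypothetical XML encoding of the parse tree from the question: one element per node label.
    static final String TREE_XML =
            "<VP><VP><VBZ>Try</VBZ><NP><NP><DT>this</DT><NN>wine</NN></NP><CC>and</CC>"
            + "<NP><DT>these</DT><NNS>snails</NNS></NP></NP></VP><PUNCT>.</PUNCT></VP>";

    // Parses the XML into an in-memory DOM and returns all nodes selected by the XPath expression.
    static NodeList select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // NP elements with an NN child and a DT descendant.
        NodeList matches = select(TREE_XML, "//NP[NN and .//DT]");
        for (int i = 0; i < matches.getLength(); i++) {
            System.out.println(matches.item(i).getTextContent());
        }
        System.out.println("There were " + matches.getLength() + " matches in total.");
    }
}
```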

Determining the path to Outlook.exe from java?

I want to invoke Outlook from the command line (for various reasons) and wanted to know how to go about discovering the path to the Outlook.exe file.
I'm pretty sure it's stored in the registry, but was wondering how to go about reading that from Java.
thanks
I found a Microsoft page that describes the procedure, just not in Java.
So I guess the question becomes: how do I access the registry from Java?
I found this site that might be able to help you. It's a Java Registry wrapper, seems to have a lot of features but no idea how robust the implementation is.
Using Otis' answer the following code does it nicely.
static String getOutlookPath() {
// Message message = new Message();
final String classID;
final String outlookPath;
{ // Fetch the Outlook Class ID
int[] ret = RegUtil.RegOpenKey(RegUtil.HKEY_LOCAL_MACHINE, "SOFTWARE\\Classes\\Outlook.Application\\CLSID", RegUtil.KEY_QUERY_VALUE);
int handle = ret[RegUtil.NATIVE_HANDLE];
byte[] outlookClassID = RegUtil.RegQueryValueEx(handle, "");
classID = new String(outlookClassID).trim(); // zero terminated bytes
RegUtil.RegCloseKey(handle);
}
{ // Using the class ID from above pull up the path
int[] ret = RegUtil.RegOpenKey(RegUtil.HKEY_LOCAL_MACHINE, "SOFTWARE\\Classes\\CLSID\\" + classID + "\\LocalServer32", RegUtil.KEY_QUERY_VALUE);
int handle = ret[RegUtil.NATIVE_HANDLE];
byte[] pathBytes = RegUtil.RegQueryValueEx(handle, "");
outlookPath = new String(pathBytes).trim(); // zero terminated bytes
RegUtil.RegCloseKey(handle);
}
return outlookPath;
}
Below is a solution modified slightly from a similar problem: https://stackoverflow.com/a/6194710/854664
Notice I'm using .pst instead of .xls
import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ShowOutlookInstalled {
public static void main(String argv[]) {
try {
Process p = Runtime.getRuntime()
.exec(new String[] { "cmd.exe", "/c", "assoc", ".pst" });
BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
String extensionType = input.readLine();
input.close();
// extract type
if (extensionType == null) {
outlookNotFoundMessage("File type PST not associated with Outlook.");
} else {
String fileType[] = extensionType.split("=");
p = Runtime.getRuntime().exec(
new String[] { "cmd.exe", "/c", "ftype", fileType[1] });
input = new BufferedReader(new InputStreamReader(p.getInputStream()));
String fileAssociation = input.readLine();
// extract path
Pattern pattern = Pattern.compile("\".*?\"");
Matcher m = pattern.matcher(fileAssociation);
if (m.find()) {
String outlookPath = m.group(0);
System.out.println("Outlook path: " + outlookPath);
} else {
outlookNotFoundMessage("Error parsing PST file association");
}
}
} catch (Exception err) {
err.printStackTrace();
outlookNotFoundMessage(err.getMessage());
}
}
private static void outlookNotFoundMessage(String errorMessage) {
System.out.println("Could not find Outlook: \n" + errorMessage);
}
}
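Note that m.group(0) in the code above still includes the surrounding double quotes. A small sketch of pulling out the bare path with a capturing group; the sample ftype line is hypothetical (the real output depends on the installed Office version), and the names FtypePathParser and extractExecutable are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FtypePathParser {
    // Captures the text between the first pair of double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Returns the first quoted value without its quotes, or null if the line has none.
    static String extractExecutable(String ftypeLine) {
        if (ftypeLine == null) {
            return null;
        }
        Matcher m = QUOTED.matcher(ftypeLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical line in the shape that "ftype" prints: type="path" arguments
        String sample = "Outlook.File.pst=\"C:\\Program Files\\Microsoft Office\\OUTLOOK.EXE\" /f \"%1\"";
        System.out.println("Outlook path: " + extractExecutable(sample));
    }
}
```

The null check also covers the case where readLine() returns nothing because the ftype command printed no output.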
