Java call from Python without loading classpath

I am making a Java jar file call from Python.
def extract_words(file_path):
    """
    Extract words and bounding boxes
    Arguments:
        file_path {[str]} -- [Input file path]
    Returns:
        [Document]
    """
    extractor = PDFBoxExtractor(file_path=file_path, jar_path="external/pdfbox-app-2.0.15.jar", class_path="external")
    document = extractor.run()
    return document
And somewhere:
pipe = subprocess.Popen(['java',
                         '-cp',
                         '.:%s:%s' % (self._jar_path, self._class_path),
                         'PrintTextLocations',
                         self._file_path],
                        stdout=subprocess.PIPE)
output = pipe.communicate()[0].decode()
This works fine, but the jar is heavy: when I call it repeatedly in a loop, it takes 3-4 seconds to load the jar file each time. Over 100 iterations, that adds 300-400 seconds to the process.
Is there any way to keep the classpath alive for Java and not load the jar file every time? What's the best way to do this in a time-optimised manner?

You can encapsulate your PDFBoxExtractor in a class by making it a class member. Initialize the PDFBoxExtractor in the constructor of the class, like below:
class WordExtractor:
    def __init__(self):
        # NOTE: file_path is not defined in this scope as written; it has to be
        # supplied somehow (for example as a constructor argument).
        self.extractor = PDFBoxExtractor(file_path=file_path, jar_path="external/pdfbox-app-2.0.15.jar", class_path="external")

    def extract_words(self, file_path):
        """
        Extract words and bounding boxes
        Arguments:
            file_path {[str]} -- [Input file path]
        Returns:
            [Document]
        """
        document = self.extractor.run()
        return document
The next step is to create an instance of the WordExtractor class outside the loop.
word_extractor = WordExtractor()
# your loop would go here
while True:
    document = word_extractor.extract_words(file_path)
This is just example code to explain the concept. You may tweak it the way you want as per your requirement.
Hope this helps !
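One caveat worth spelling out: if PDFBoxExtractor.run() starts a new java process on every call, as the Popen snippet in the question suggests, then reusing the Python object does not by itself avoid the JVM startup cost. A different way to cut the total wall-clock time is to overlap several launches with a thread pool. The following is only a minimal sketch under that assumption; build_command, run_command, extract_many and max_workers are hypothetical names introduced here, and the command line simply mirrors the one in the question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(file_path,
                  jar_path="external/pdfbox-app-2.0.15.jar",
                  class_path="external"):
    """Build the same java command line that the question uses (hypothetical helper)."""
    return ["java",
            "-cp",
            ".:%s:%s" % (jar_path, class_path),
            "PrintTextLocations",
            file_path]


def run_command(command):
    """Run one command and return its decoded standard output."""
    completed = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return completed.stdout.decode()


def extract_many(file_paths, make_command=build_command, max_workers=4):
    """Run one process per file, several at a time.

    Each process still pays its own startup cost; the launches merely overlap.
    Results come back in the same order as file_paths.
    """
    commands = [make_command(path) for path in file_paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))
```

Usage would look like outputs = extract_many(list_of_pdf_paths). This does not keep a JVM alive between calls; it only hides part of the startup latency, and the best value for max_workers depends on how many cores and how much memory the machine has.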

Related

File wildcard use *

I am trying to read a file named K2Ssal.timestamp.
I want to treat the timestamp part of the file name as a wildcard.
How can I achieve this?
I tried putting * after the file name, but it does not work.
var getK2SSal: Iterator[String] = Source.fromFile("C://Users/nrakhad/Desktop/Work/Data stage migration/Input files/K2Ssal.*").getLines()
You can use Files.newDirectoryStream with directory + glob:
import java.nio.file.{Paths, Files}
val yourFile = Files.newDirectoryStream(
  Paths.get("/path/to/the/directory"), // where is the file?
  "K2Ssal.*"                           // glob of the file name
).iterator.next                        // get first match
Misconception on your end: unless the library call is specifically implemented to do so, using a wildcard simply doesn't work like you expect it to.
Meaning: a file system doesn't know about wildcards. It only knows about existing files and folders. The fact that you can put * in certain commands, and that the wildcard is replaced with file names, is a property of the tool(s) you are using. Most often, programming APIs that let you query the file system do not include that special wildcard handling.
In other words: there is no sense in adding that asterisk like that.
You have to step back and write code that actively searches for files itself. Here are some examples for scala.
You can read the directory and filter the files by name.
import java.io.File
val l = new File("""C://Users/nrakhad/Desktop/Work/Data stage migration/Input files/""").listFiles
val s = l.filter(_.getName.startsWith("K2Ssal."))
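The point made in the answers, that the program itself has to expand the pattern, is not specific to Scala. As an illustration only, here is a minimal Python sketch of the same idea; find_first_match is a hypothetical helper and the timestamp suffix is made up for the demonstration.

```python
import glob
import os
import tempfile


def find_first_match(directory, pattern):
    """Return the first path in `directory` matching the glob `pattern`, or None."""
    # glob.escape keeps special characters in the directory name literal.
    matches = sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return matches[0] if matches else None


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    real_name = os.path.join(tmp, "K2Ssal.20190101")
    with open(real_name, "w") as handle:
        handle.write("example\n")

    # No file is literally called "K2Ssal.*", so opening that name would fail.
    literal_exists = os.path.exists(os.path.join(tmp, "K2Ssal.*"))

    # Expanding the pattern in code finds the real file.
    found = find_first_match(tmp, "K2Ssal.*")
```

Here literal_exists is False while found points at the real file, which is the distinction the answers describe.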

Call parameters from JAVA code in R code to dynamically pass csv file and unique ID

I have an R script to which I want to pass parameters from Java code. The parameters are the CSV file name and a unique ID, which has to be used to name the output file.
My R script is :
df1 <- read.csv("filename.csv")
vs=colnames(df1)
md=formula(paste(vs[3],"~",vs[1],"+",vs[2]))
fit <- summary(aov(md, data=df1))[[1]]
#text output
names(fit)[1:4]=c("DF","SS","MS","F")
sink("test.txt")
In this code the first line df1 <- read.csv("filename.csv") should take file name dynamically from JAVA code and the last line sink("test.txt") should take unique ID and create the output file.
The java code is :
buildCommand.add("Rscript ");
buildCommand.add(scriptName);
buildCommand.add(inputFileWithPathExtension);
buildCommand.add(uniqueIdForR);
I have seen other posts, but I am unsure whether they will help in my case. Similar posts also mention the rJava package, but I didn't get a clear idea from them.
Any help will be highly appreciated. Thanks in advance!
Here is a very simple example of reading command line arguments in your case:
args <- commandArgs(TRUE)
input <- args[1]
output <- paste0(args[2], ".txt")
cat("Reading from", input, "\n")
cat("Writing to", output, "\n")
Example:
$ Rscript foo.R foo.csv 1234567
Reading from foo.csv
Writing to 1234567.txt

Calling a Java class with PHP's exec function

I need to split a text into sentences, and I am trying to use Stanford Core NLP. I have downloaded the library. Since it's a Java library, I am using PHP's exec command (please see below) to call it. My PHP script works well and I can parse a text into sentences. Currently, the script needs an input file to parse. My question is whether I can use a PHP string variable instead of an input .txt file. That would be very convenient, since I will be retrieving the text from a MySQL database. If it is not possible, I would need to create a corresponding text file for the command line input. Any feedback you provide will be greatly appreciated.
Here is my small PHP script
$text = "Maria and Ted Bobola grow sweet corn. But with little corn to harvest because of a plant-withering drought, the Bobolas were forced to buy corn from Georgia to supply their produce and flower shop just outside Dover. And most years, the strawberry picking season runs three to five weeks, but not this year, store manager Dee Chambers said. ``It was so hot and dry that even though we irrigated, we had only two weeks in the season,'' she said. They did not recoup $70,000 they paid for strawberry plants last year in this year's harvest.";
$parser = "stanford-corenlp-3.5.0.jar";
$class = "edu.stanford.nlp.process.DocumentPreprocessor";
$input = "sample.txt";
$output = "output.txt";
if (exec("java -cp $parser -Xmx2g $class -file $input", $result))
{
    echo "success";
}
else
{
    echo "failure";
}
// Optional
echo '<pre>' . print_r($result, true);
What I am trying to do here is replace $input (i.e. the .txt file) with $text (i.e. a PHP variable).

Make a Java Call from within Progress 4gl

Currently, I have a batch file that is basically running an executable jar.
Like this...
java -jar foo.jar
I have code in Progress that executes that batch file and pipes the values it returns into a text document. I then read that text document back in and parse the info accordingly.
However, this is an ugly way of handling it and could lead to many issues down the road. I am basically just looking for a way in Progress to execute an OS command and retrieve its results without writing them to a file and reading them back in.
I am running OpenEdge 10.1C
DEFINE INPUT PARAMETER iJarInput AS CHARACTER NO-UNDO.
DEFINE OUTPUT PARAMETER oJarOutput AS CHARACTER NO-UNDO.
DEFINE VARIABLE cOut AS CHARACTER NO-UNDO.
DEFINE VARIABLE cCmd AS CHARACTER NO-UNDO.
ASSIGN
cCmd = batchFile + " " + iJarInput.
OS-COMMAND SILENT VALUE(cCmd).
INPUT FROM VALUE(outFile).
REPEAT:
IMPORT UNFORMATTED cOut.
oJarOutput = oJarOutput + cOut.
END.
You can call external shared libraries.
http://documentation.progress.com/output/OpenEdge112/oe112html/ABL/wwhelp/wwhimpl/common/html/wwhelp.htm#href=Programming%20Interfaces/15dvpinch08epi.089.5.html&single=true
You could, for instance, use that capability to create a "shim" to your JAR.

Save Word Document with JACOB (Java)

I'm trying to make a simple Java program that opens an existing Word document, changes something, and saves it as an .html file.
The part that is not working is saving it as .html.
The problem is that I get an .html file, but it's only a renamed .doc file, not a real HTML file I can work with.
This is what I found with Google:
Object oWordBasic = Dispatch.call(oWord, "WordBasic").getDispatch();
Dispatch.call((Dispatch) oWordBasic, "FileSaveAs", path);
What do I have to do to get an HTML file as output?
Thank you in advance.
It's using the OLE Automation Object to save the file, so you have to find the method or parameter to indicate filetype.
This is the macro I could record using Word:
ActiveDocument.SaveAs filename:="asdd.htm", FileFormat:=wdFormatHTML, _
LockComments:=False, Password:="", AddToRecentFiles:=True, WritePassword _
:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
False
So it means you have to indicate FileFormat := wdFormatHTML (or the constant value) parameter to the SaveAs method. That's left as an exercise to the reader :)
I figured it out, thanks to helios for the tip.
The correct code is:
Object oWordBasic = Dispatch.call(oWord, "WordBasic").getDispatch();
Dispatch.call((Dispatch) oWordBasic, "FileSaveAs", path, new Variant(8));
The parameter of the Variant is the output format (for example, 8 is HTML, 6 is RTF, 17 is PDF).
You can find the full list at: WdSaveFormat Enumeration
