Error with PDF scraping using Tabulizer library - java

I'm trying to extract tables from several PDF files using the tabulizer library. However, when I call the extract_tables function, I keep getting this error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.IllegalAccessException: class RJavaTools cannot access a member of class java.util.ArrayList$Itr (in module java.base) with modifiers "public".
I have macOS Big Sur (M1, 2020) and the latest versions of Java and R installed. I would appreciate any help and guidance.
Here is the code that I used:
library(tabulizer)
library(rJava)
library(pdftools)
setwd(file.path(dirname(rstudioapi::getActiveDocumentContext()$path),'Input Files'))
# Escape the dot so that only names ending in ".pdf" match
pdf_files <- list.files(pattern = "\\.pdf$")
pdf_joined<- pdf_combine(pdf_files,output = "joined.pdf")
f <- "joined.pdf"
out1 <- extract_tables(f)
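For anyone hitting the same message: the exception text can be reproduced in plain Java, without R. The sketch below is not rJava or tabulizer code. It only assumes that the failure comes from a reflective call to a public method on a non-public JDK class (here the private iterator class inside ArrayList), which is what the message describes. The class and method names are made up for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReflectiveIteratorAccess {

    /**
     * Looks up hasNext() on lookupClass and invokes it reflectively on the
     * iterator. Returns "ok: <result>" on success, otherwise the simple name
     * of the reflection exception that was thrown.
     */
    static String tryHasNext(Class<?> lookupClass, Iterator<?> it) {
        try {
            Method m = lookupClass.getMethod("hasNext");
            return "ok: " + m.invoke(it);
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        Iterator<String> it = list.iterator();

        // Method looked up on the concrete class (java.util.ArrayList$Itr),
        // which is not a public class.
        System.out.println(tryHasNext(it.getClass(), it));

        // Same method looked up on the public Iterator interface.
        System.out.println(tryHasNext(Iterator.class, it));
    }
}
```

Looking the method up on the public Iterator interface rather than on the concrete class avoids the access check failure. Whether this is the exact path taken inside rJava for a given Java version is an assumption here.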

Related

Renv and Java: "Error in rJava::.jinit() : Unable to create a Java class loader"

I have a script that works perfectly when I'm not using Renv. However, when running it in a project with Renv enabled, the last command line returns the following message:
> r5r_core <- setup_r5(data_path = data_path, verbose = FALSE)
Error in rJava::.jinit() : Unable to create a Java class loader.
Just run the code below inside a renv project to have a reproducible example:
options(java.parameters = "-Xmx2G")
library(r5r)
library(rJava)
library(data.table) # provides fread(), used below
data_path <- system.file("extdata/poa", package = "r5r")
list.files(data_path)
poi <- fread(file.path(data_path, "poa_points_of_interest.csv"))
head(poi)
points <- fread(file.path(data_path, "poa_hexgrid.csv"))
points <- points[ c(sample(1:nrow(points), 10, replace=TRUE)), ]
head(points)
# Indicate the path where OSM and GTFS data are stored
r5r_core <- setup_r5(data_path = data_path, verbose = FALSE)
My Java version is compatible with the one used in this package, but it looks like R is having a hard time communicating with Java under Renv. Could anyone tell me what is going wrong?

H2O h2o.importFile Error: 'Cannot determine file type. for nfs://.../model.zip', caused by water.parser.ParseDataset$H2OParse

I am trying to import an h2o model from a .zip file that was exported as a POJO, using R. The following error is all I get:
model_file <- "/Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip"
m <- h2o.importFile(model_file)
Error: DistributedException from localhost/127.0.0.1:54321: 'Cannot determine file type. for nfs://Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip', caused by water.parser.ParseDataset$H2OParseException: Cannot determine file type. for nfs://Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip
I already ran file.exists(model_file) and that returns TRUE, so the file exists. Did the same with normalizePath(model_file) and same result. When I try to import it into my R session, it seems that h2o finds the file but can't import it for some reason.
Here's my R Session info:
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] h2o_3.26.0.2 lares_4.7 data.table_1.12.2 lubridate_1.7.4 forcats_0.4.0
[6] stringr_1.4.0 dplyr_0.8.3 purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
[11] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1
Hope you guys can help me import my POJO model into R. Thanks!
h2o models are not zip files. Try this
# path to your file
model_file <- "/Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip"
# prediction based on your mojo/pojo file.
preds = h2o.mojo_predict_df(df, model_file, genmodel_jar_path = NULL, classpath = NULL, java_options = NULL, verbose = F)
If they are zipped, then unzip and run them again. More info is here http://docs.h2o.ai/h2o/latest-stable/h2o-docs/save-and-load-model.html
https://rdrr.io/cran/h2o/man/h2o.mojo_predict_df.html
OK, I actually found the solution I needed. The trick is to convert your data frame (df) to JSON format, and then use the .zip file generated with h2o to predict with h2o.predict_json instead of h2o.mojo_predict_df. I think it's pretty straightforward and less complicated. At least it worked as I needed it to.
library(jsonlite)
library(h2o)
json <- toJSON(df)
output <- h2o.predict_json(zip_directory, json)
NOTE: No need to unzip the zip file.
If by any chance you've used the lares package, simply use the h2o_predict_MOJO function.
Hope it helps any other people trying to achieve the same result.

Internal Action was not loaded Error: java.lang.ClassNotFoundException

I am trying to run some Jason code that uses internal actions. The interpreter reports that it could not find the Java code of the internal actions, as shown:
Server running on http://191.36.8.42:3272
[aslparser] [peleus.asl:29] warning: The internal action class for 'org.soton.peleus.act.plan(Goals)' was not loaded! Error:
java.lang.ClassNotFoundException: org.soton.peleus.act.plan
[aslparser] [peleus.asl:42] warning: The internal action class for 'org.soton.peleus.act.isTrue(H)' was not loaded! Error:
java.lang.ClassNotFoundException: org.soton.peleus.act.isTrue
[peleus] Could not finish intention: intention 1: +des([on(b3,table),on(b2,b3),on(b1,b2)])[source(self)] <- ... org.soton.peleus.act.plan(Goals); !checkGoals(Goals); .print("Goals ",Goals," were satisfied") /
{Goals=[on(b3,table),on(b2,b3),on(b1,b2)]}Trigger: +des([on(b3,table),on(b2,b3),on(b1,b2)])[noenv,code(org.soton.peleus.act.plan([on(b3,table),on(b2,b3),on(b1,b2)])),code_line(29),code_src("peleus.asl"),error(action_failed),error_msg("no environment configured!"),source(self)]
[peleus] Adding belief clear(table)
The mas2j file is as follows:
MAS peleus {
infrastructure: Centralised
agents:
peleus;
}
Part of the agent code (written by Felipe Meneguzzi) is shown below:
//The next line is line 28
+des(Goals) : true
<- org.soton.peleus.act.plan(Goals);
!checkGoals(Goals);
.print("Goals ",Goals," were satisfied").
+!checkGoals([]) : true <- true.
//The next line is line 40
+!checkGoals([H|T]) : true
<- .print("Checking ", H);
org.soton.peleus.act.isTrue(H);
!checkGoals(T).
I guess it is about the folder structure. How do I set up Jason to search for Java files in specific locations?
The folder structure is like this:
Peleus\src\org\soton\peleus for java files
Peleus\examples for mas2j and asl tested project
It all depends on how you are executing the application.
If you are using java, the CLASSPATH should be defined to include the missing classes.
If you are using the jason script (which uses Ant), the .mas2j file should include the class path as well.
More on that in the FAQ. Note that CLASSPATH is where .class files are found, not .java source files. The error is about a missing class, not missing source code.
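To check whether a compiled class is actually visible at run time, a small stand-alone probe can help. This is a minimal sketch, not part of Jason. The class name ClasspathCheck and its helper are hypothetical, and it only tests the classpath of the JVM it runs in, so it should be started with the same classpath settings as the failing application.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Same exception type as in the warning printed by the interpreter.
            return false;
        }
    }

    public static void main(String[] args) {
        // Shows which directories and jars the JVM searches for .class files.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        // A JDK class, always loadable.
        System.out.println("java.util.ArrayList: " + isLoadable("java.util.ArrayList"));

        // The internal action from the question; loadable only if the compiled
        // .class file (not the .java source) is on the classpath.
        System.out.println("org.soton.peleus.act.plan: "
                + isLoadable("org.soton.peleus.act.plan"));
    }
}
```

If the second check reports false, the directory or jar containing the compiled org/soton/peleus/act classes is most likely missing from the classpath.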

Scala: no corresponding Java Class found

While compiling a Scala 2.10 project, I got an error I cannot even understand:
java.lang.NoClassDefFoundError: no Java class corresponding to MongoPersistable.this.type found
at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToJavaClass(JavaMirrors.scala:1218) ~[scala-reflect-2.10.0.jar:na]
at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:202) ~[scala-reflect-2.10.0.jar:na]
at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:65) ~[scala-reflect-2.10.0.jar:na]
...
How is it even possible to get such an error message if the code compiled fine in Eclipse in the first place?
The following line of code produces this error:
trait MongoPersistable {
def save() {
val dao : MongoDAO[MongoPersistable.this.type] = MongoDAO[this.type];
....
I would upload more code if I knew where to look.

Can't import sun.org.mozilla.javascript.internal in NetBeans

In my Java program I make heavy use of Sun's implementation of the Rhino script engine. Very recently, however, my JDK no longer seems to import the rt.jar file automatically when compiling.
What's strange is that NetBeans reports 0 live errors; they only show up when doing a complete Clean & Build. This wasn't happening before when I was importing NativeArray, so I'm really confused about why it suddenly stopped working.
Specs:
OS - Windows
Java version - java version "1.6.0_20"
Javac version - javac 1.6.0_20
NetBeans version - 6.9
Check to see if it exists:
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\src>javap sun.org.mozilla.javascript.internal.WrappedException
Compiled from "WrappedException.java"
public class sun.org.mozilla.javascript.internal.WrappedException extends sun.org.mozilla.javascript.internal.EvaluatorException{
static final long serialVersionUID;
public sun.org.mozilla.javascript.internal.WrappedException(java.lang.Throwable);
public java.lang.Throwable getWrappedException();
public java.lang.Object unwrap();
}
OK, it exists, so here's some test code:
package testapp;
import sun.org.mozilla.javascript.internal.WrappedException;
public class Main {
public static void main(String[] args) {
WrappedException e = new WrappedException(null);
}
}
Netbeans output:
init:
deps-clean:
Updating property file: C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build\built-clean.properties
Deleting directory C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build
clean:
init:
deps-jar:
Created dir: C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build
Updating property file: C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build\built-jar.properties
Created dir: C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build\classes
Created dir: C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build\empty
Compiling 1 source file to C:\Documents and Settings\LordQuackstar\Desktop\TestApp\build\classes
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\src\testapp\Main.java:8: package sun.org.mozilla.javascript.internal does not exist
import sun.org.mozilla.javascript.internal.WrappedException;
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\src\testapp\Main.java:16: cannot find symbol
symbol : class WrappedException
location: class testapp.Main
WrappedException e = new WrappedException(null);
^
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\src\testapp\Main.java:16: cannot find symbol
symbol : class WrappedException
location: class testapp.Main
WrappedException e = new WrappedException(null);
^
3 errors
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\nbproject\build-impl.xml:528: The following error occurred while executing this line:
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\nbproject\build-impl.xml:261: Compile failed; see the compiler error output for details.
BUILD FAILED (total time: 0 seconds)
Command line output:
C:\Documents and Settings\LordQuackstar\Desktop\TestApp\src\testapp>javac Main.java
Main.java:3: package sun.org.mozilla.javascript.internal does not exist
import sun.org.mozilla.javascript.internal.WrappedException;
^
Main.java:7: cannot find symbol
symbol : class WrappedException
location: class testapp.Main
WrappedException e = new WrappedException(null);
^
Main.java:7: cannot find symbol
symbol : class WrappedException
location: class testapp.Main
WrappedException e = new WrappedException(null);
^
3 errors
So what would cause this to fail all of a sudden? It was working just fine yesterday. I didn't change anything besides importing 2 more classes from the same package. None of my dependencies changed.
I will test on Linux to see if the problem still exists.
Before you say it: no, I'm not downloading Rhino separately, and no, I'm not changing IDEs.
There are two indications that you shouldn't use this class: sun and internal. These mean that this is an internal class that shouldn't be used by third parties, because it can change or be removed in future releases, i.e. it is not part of an API. So download Rhino separately.
If you are using the scripting API, use only the API classes/interfaces, i.e. javax.script.
I agree with the above advice that you're better off not trying to use the Sun internal packages.
This raises the question: how do you access JavaScript arrays without sun.org.mozilla.javascript.internal.NativeArray?
What worked for me is code as follows. This creates a Java array called vars based off a JavaScript array called vars.
int varsLength = ((Double)engine.eval("vars.length;")).intValue();
Object[] vars = new Object[varsLength];
for(int i=0; i<vars.length; i++){
vars[i] = engine.eval("vars["+i+"];");
}
I had the same error. You must manually add rt.jar from the JRE directory to the project libraries. This is the only solution that seems to work. You can also see a tutorial on this approach here by Rob Di Marco
This is an old question now; however, when I had this problem, my solution was to do more work in the JavaScript environment and then return a primitive type (String / Boolean) rather than an object.
Of course, this will not satisfy everyone and all requirements, but it may help in some cases.
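As a rough illustration of that last approach, the sketch below joins a JavaScript array inside the script and hands only a string back to Java, using nothing but javax.script. The class name JoinInScript, the sample array, and the choice of join are assumptions made for the example. Note that recent JDKs (15 and later) no longer bundle a JavaScript engine, so the lookup can return null unless an engine is added separately.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class JoinInScript {

    /**
     * Joins a JavaScript array inside the script and returns a plain String,
     * or null when no JavaScript engine is available on this JDK.
     */
    static String joinedVars() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null; // no bundled engine, e.g. on JDK 15 or later
        }
        try {
            engine.eval("var vars = ['a', 'b', 'c'];");
            // The join runs in JavaScript, so only a string crosses into Java
            // and no engine-internal array class is needed.
            Object result = engine.eval("vars.join(',');");
            return String.valueOf(result);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinedVars());
    }
}
```

The Java side can then split the string itself if it needs the individual values, at the cost of losing the original element types.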
