tabula-py unable to read pdf file

My code:
import tabula
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(dir_path, 'ALPINE_' + str(20191107) + '.pdf')
print(file_path)
df = tabula.read_pdf('ALPINE_20191107.pdf',multiple_tables=True, pages="all")
result:
runfile('C:/Users/Admin/Documents/lucas/testTabula.py.py', wdir='C:/Users/Admin/Documents/lucas')
Traceback (most recent call last):
  File "<ipython-input-29-a6b390aef3cf>", line 1, in <module>
    runfile('C:/Users/Admin/Documents/lucas/sem título0.py', wdir='C:/Users/Admin/Documents/lucas')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Admin/Documents/lucas/sem título0.py", line 12, in <module>
    df = tabula.read_pdf('ALPINE_20191107.pdf',multiple_tables=True, pages="all")
  File "C:\ProgramData\Anaconda3\lib\site-packages\tabula\io.py", line 332, in read_pdf
    return _extract_from(raw_json, pandas_options)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tabula\io.py", line 664, in _extract_from
    df[c] = pd.to_numeric(df[c], errors="ignore")
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 138, in to_numeric
    raise TypeError("arg must be a list, tuple, 1-d array, or Series")
TypeError: arg must be a list, tuple, 1-d array, or Series
The function doesn't seem to work. I also tried typing the path directly to keep it simpler, but that didn't work either. It could be a problem with the PDF file, but I have already seen the same script work with the same file in another environment.
I already have Java set on both possible PATHs ('C:\Program Files\Java\jre1.8.0_231\bin') as the documentation describes, but it doesn't seem to matter: the error occurs with or without them set on PATH. I also tried adding the JDK, but that didn't solve it either.
I notice the error mentions pandas, so maybe tabula-py conflicts with my pandas version (the latest), but I'm not sure.
Python is 3.7.4 and Java is the latest version as of this date.

I had the same issue. I was using the version installed with pip, tabula-py 2.0.0. I uninstalled it and installed from conda-forge instead, using conda install -c conda-forge tabula-py. That gave me tabula-py 1.4.1, which resolved the issue.
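Since the fix above comes down to which tabula-py version is installed, it can help to check the installed versions before and after reinstalling. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version is made up for illustration and is not part of tabula-py or pandas.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used by pip and conda
    for name in ("tabula-py", "pandas"):
        print(name, installed_version(name))
```

Running it in the environment that fails and in the one that works should show whether the two differ in their tabula-py or pandas versions.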

Related

Error in building iotivity for android on windows

Can someone help me with this error? I can't seem to identify the problem, and I am new to SCons. I need to get past this to obtain the .aar and .apk files. I am using IoTivity for a project that lets users transfer images between devices on any platform without internet access.
Command Prompt:
C:\Users\derrick\Desktop\iotivity-2.0.1.1\iotivity-2.0.1.1>scons TARGET_OS=android
scons: Reading SConscript files ...
Processing using SCons version 3.1.1
Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)] on win32
NameError: name 'host_arch' is not defined:
  File "C:\Users\derrick\Desktop\iotivity-2.0.1.1\iotivity-2.0.1.1\SConstruct", line 32:
    SConscript('build_common/SConscript')
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 668:
    return method(*args, **kw)
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 605:
    return _SConscript(self.fs, *files, **subst_kw)
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 286:
    exec(compile(scriptdata, scriptname, 'exec'), call_stack[-1].globals)
  File "C:\Users\derrick\Desktop\iotivity-2.0.1.1\iotivity-2.0.1.1\build_common\SConscript", line 1025:
    env.SConscript(target_os + '/SConscript')
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 605:
    return _SConscript(self.fs, *files, **subst_kw)
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 286:
    exec(compile(scriptdata, scriptname, 'exec'), call_stack[-1].globals)
  File "C:\Users\derrick\Desktop\iotivity-2.0.1.1\iotivity-2.0.1.1\build_common\android\SConscript", line 19:
    SConscript('#/extlibs/android/ndk/SConscript')
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 668:
    return method(*args, **kw)
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 605:
    return _SConscript(self.fs, *files, **subst_kw)
  File "c:\python27\lib\site-packages\scons\SCons\Script\SConscript.py", line 286:
    exec(compile(scriptdata, scriptname, 'exec'), call_stack[-1].globals)
  File "C:\Users\derrick\Desktop\iotivity-2.0.1.1\iotivity-2.0.1.1\extlibs\android\ndk\SConscript", line 24:
    if host_arch in ['x86_64']:

It's broken, and I guess I'm the one who broke it when I tried to clean up that part of the build some years ago. The iotivity project's CI system does not build Android binaries on a Windows host (it uses a Linux builder for that), and I guess no developer did either, so nothing detected the problem. As the error message says, host_arch is undefined.
This is not fundamental to iotivity; it is just dependency work to set up the Android NDK, and once you have one set up, this step is skipped for subsequent builds. The previous version switched on target_arch, which wasn't right: the bundle to download depends on the host, not on what you're building for. I think the Android project stopped supporting 32-bit bundles a while back anyway, so the simplest way forward is to remove the test (unless for some reason you have 32-bit Windows). That is, change this chunk starting at line 23:
else:
    if host_arch in ['x86_64']:
        ndk_url = ndk_url_base + '-windows-x86_64.exe'
    else:
        ndk_url = ndk_url_base + '-windows-x86.exe'
    ndk_bundle = 'android-ndk-' + NDK_VER + '.exe'
to the simpler form:
else:
    ndk_url = ndk_url_base + '-windows-x86_64.exe'
    ndk_bundle = 'android-ndk-' + NDK_VER + '.exe'
(In case it wasn't clear, that means editing the file named in the last line of the traceback, C:\Users\derrick\Desktop\iotivity-2.0.1.1\iotivity-2.0.1.1\extlibs\android\ndk\SConscript.)
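If you would rather keep the 32-bit/64-bit test than delete it, the missing piece is a definition of host_arch. The following is only a sketch of one way to derive it with the standard platform module. The alias table and the helper names normalize_arch and ndk_suffix are assumptions made for illustration; they are not taken from the iotivity build scripts.

```python
import platform

# Hypothetical alias table: platform.machine() reports e.g. "AMD64" on 64-bit
# Windows, while the build script compares against "x86_64".
_ARCH_ALIASES = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "x64": "x86_64",
    "i386": "x86",
    "i686": "x86",
    "x86": "x86",
}


def normalize_arch(machine):
    """Map a platform.machine() value onto the names used in the build script."""
    key = machine.lower()
    return _ARCH_ALIASES.get(key, key)


def ndk_suffix(machine):
    """Pick the Windows NDK bundle suffix for the given host machine string."""
    if normalize_arch(machine) == "x86_64":
        return "-windows-x86_64.exe"
    return "-windows-x86.exe"


# Value that could stand in for the undefined host_arch
host_arch = normalize_arch(platform.machine())
```

Removing the test, as described above, is still the smaller change; this variant only matters if 32-bit Windows hosts need to keep working.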

H2O h2o.importFile Error: 'Cannot determine file type. for nfs://.../model.zip', caused by water.parser.ParseDataset$H2OParse

I am trying to import an h2o model, exported as a POJO .zip file, with R. The following error is all I get:
model_file <- "/Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip"
m <- h2o.importFile(model_file)
Error: DistributedException from localhost/127.0.0.1:54321: 'Cannot determine file type. for nfs://Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip', caused by water.parser.ParseDataset$H2OParseException: Cannot determine file type. for nfs://Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip
I already ran file.exists(model_file) and that returns TRUE, so the file exists. Did the same with normalizePath(model_file) and same result. When I try to import it into my R session, it seems that h2o finds the file but can't import it for some reason.
Here's my R Session info:
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] h2o_3.26.0.2 lares_4.7 data.table_1.12.2 lubridate_1.7.4 forcats_0.4.0
[6] stringr_1.4.0 dplyr_0.8.3 purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
[11] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1
Hope you guys can help me import my POJO model into R. Thanks!

h2o.importFile parses data files into an H2O frame; it does not load models, which is why the parser cannot determine the file type of your .zip. Try this instead:
# path to your file
model_file <- "/Users/bernardo/Desktop/DRF_1_AutoML_20190816_133251.zip"
# prediction based on your mojo/pojo file
preds <- h2o.mojo_predict_df(df, model_file, genmodel_jar_path = NULL, classpath = NULL, java_options = NULL, verbose = FALSE)
If the model files are zipped, then unzip them and run again. More info is here:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/save-and-load-model.html
https://rdrr.io/cran/h2o/man/h2o.mojo_predict_df.html

OK, I found the solution I needed. The trick is to convert your data frame (df) to JSON and then predict with h2o.predict_json, instead of h2o.mojo_predict_df, using the .zip file generated by h2o. I think it's pretty straightforward and less complicated. At least it worked as I needed it to.
library(jsonlite)
library(h2o)
# df is the data frame to score; zip_directory is the path to the .zip file generated by h2o
json <- toJSON(df)
output <- h2o.predict_json(zip_directory, json)
NOTE: There is no need to unzip the .zip file.
If you happen to use the lares package, simply use its h2o_predict_MOJO function.
I hope this helps anyone else trying to achieve the same result.

JPype (Python): importing folder of jar's

I am using JPype to work with Java classes in Python.
I have a folder that contains multiple self-written .jar files.
I know how to import multiple .jar files the long way:
...
CLASSPATH = "/path/to/jars/first.jar:/path/to/jars/second.jar"
jpype.startJVM(jpype.getDefaultJVMPath(), "-ea", "-Djava.class.path=%s" % CLASSPATH)
MYLIB= jpype.JPackage("org").mylib
MyClass = MYLIB.MyClass
myObj = MyClass()
This works fine, but I think there might be a better way.
I already tried this:
CLASSPATH = "/path/to/jars/*.jar"
and this:
CLASSPATH = "/path/to/jars/*"
In both cases the following error occurs:
user#user:~/path/to/python/$ python test.py
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    myObj = MyClass()
  File "/usr/local/lib/python2.7/dist-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg/jpype/_jpackage.py", line 60, in __call__
    raise TypeError("Package {0} is not Callable".format(self.__name))
TypeError: Package org.mylib.MyClass is not Callable
My Question:
Is there any way to easily import a folder that contains multiple .jar's in JPype?

You can build the classpath in Python by joining the jar file names, instead of hardcoding each one:
import os
CLASSPATH = ":".join(os.path.join("/path/to/jars", name) for name in os.listdir("/path/to/jars") if name.endswith(".jar"))
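The same idea can be wrapped in a small helper. This is a sketch rather than anything JPype itself provides: the function name build_classpath is made up for illustration, and it uses os.pathsep so that the separator is ":" on Linux and ";" on Windows. It sidesteps the wildcard forms from the question, which, as far as I know, are expanded only by the java command-line launcher and not when the classpath is passed as -Djava.class.path.

```python
import glob
import os


def build_classpath(jar_dir):
    """Join every .jar file directly inside jar_dir into one classpath string."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return os.pathsep.join(jars)


# Usage with the startJVM call from the question (commented out so that the
# sketch runs without JPype installed):
# jpype.startJVM(jpype.getDefaultJVMPath(), "-ea",
#                "-Djava.class.path=%s" % build_classpath("/path/to/jars"))
```

Sorting the result keeps the classpath order stable between runs, which matters if two jars happen to contain the same class.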

Ansible: Changing permission of a directory issue

I'm running the following Ansible task to change permission of a directory and its content.
- name: Change ownership of everything below /opt/as2/app-server
file: path=/opt/as2/app-server state=directory recurse=yes owner=adrt group=adrt
When running it I get the following issue:
TASK [appserver : Change ownership of everything below /opt/as2/app-server] ****
fatal: [192.168.1.182]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_UrBo6x/ansible_module_file.py\", line 451, in \r\n main()\r\n File \"/tmp/ansible_UrBo6x/ansible_module_file.py\", line 335, in main\r\n changed |= recursive_set_attributes(module, to_bytes(file_args['path'], errors='surrogate_or_strict'), follow, file_args)\r\n File \"/tmp/ansible_UrBo6x/ansible_module_file.py\", line 146, in recursive_set_attributes\r\n changed |= module.set_fs_attributes_if_different(tmp_file_args, changed)\r\n File \"/tmp/ansible_UrBo6x/ansible_modlib.zip/ansible/module_utils/basic.py\", line 1163, in set_fs_attributes_if_different\r\n File \"/tmp/ansible_UrBo6x/ansible_modlib.zip/ansible/module_utils/basic.py\", line 929, in set_owner_if_different\r\n File \"/tmp/ansible_UrBo6x/ansible_modlib.zip/ansible/module_utils/basic.py\", line 842, in user_and_group\r\nOSError: [Errno 2] No such file or directory: '/opt/as2/app-server-1.0.0/apps/station/WEB-INF/classes/org/adroitlogic/isuite/metrics/As2MetricsService/usr/bin/python$tt__collectStats_closure14.class'\r\n", "msg": "MODULE FAILURE"}
Basically it says there is no such file or directory as,
/opt/as2/app-server-1.0.0/apps/station/WEB-INF/classes/org/adroitlogic/isuite/metrics/As2MetricsService/usr/bin/python$tt__collectStats_closure14.class
The content of the directory, /opt/as2/app-server/apps/station/WEB-INF/classes/org/adroitlogic/isuite/metrics/ is,
As2MetricsService$_$tt__CountStatisticsLists_closure3.class
As2MetricsService$_$tt__collectStats_closure10.class
As2MetricsService$_$tt__collectStats_closure11.class
As2MetricsService$_$tt__collectStats_closure12.class
As2MetricsService$_$tt__collectStats_closure13.class
As2MetricsService$_$tt__collectStats_closure14.class
As2MetricsService$_$tt__collectStats_closure15.class
As2MetricsService$_$tt__collectStats_closure4.class
As2MetricsService$_$tt__collectStats_closure5.class
As2MetricsService$_$tt__collectStats_closure6.class
As2MetricsService$_$tt__collectStats_closure7.class
As2MetricsService$_$tt__collectStats_closure8.class
As2MetricsService$_$tt__collectStats_closure9.class
As2MetricsService$_CountStatisticsLists_closure1.class
As2MetricsService$_collectStats_closure2.class
As2MetricsService.class
There are no subdirectories.
Also when I run the command chown -R adrt:adrt . inside the directory /opt/as2/app-server it executes without any issue.
Help me to understand what is happening here.

Help me to understand what is happening here.
You have just found a bug in Ansible that causes modules to fail when the names of the files they process contain the sequence $_.
The name is passed without escaping the $ character (or rather, with an explicit conversion request, os.path.expandvars(filename)), and the sequence $_ is processed as a built-in variable that resolves to the path of the current process (/usr/bin/python in this case, as Ansible uses Python to run its modules).
As a result, the file name:
As2MetricsService$_$tt__collectStats_closure14.class
is interpreted as:
As2MetricsService/usr/bin/python$tt__collectStats_closure14.class
and the system throws an error that the file does not exist (which is true).
Until it is fixed, I guess you have to call chown with the command module.
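The expansion described above can be reproduced outside Ansible. The sketch below is a standalone illustration, not Ansible's code: it sets the _ environment variable by hand, on the assumption that this mimics what the shell exported for the module process, and the helper name expand_like_module is made up.

```python
import os


def expand_like_module(filename, interpreter="/usr/bin/python"):
    """Expand filename with os.path.expandvars while $_ points at interpreter."""
    saved = os.environ.get("_")
    os.environ["_"] = interpreter  # assumption: mimics the value set by the shell
    try:
        return os.path.expandvars(filename)
    finally:
        # Restore the caller's environment
        if saved is None:
            del os.environ["_"]
        else:
            os.environ["_"] = saved


name = "As2MetricsService$_$tt__collectStats_closure14.class"
print(expand_like_module(name))
```

On a POSIX system this prints the mangled name from the error message, As2MetricsService/usr/bin/python$tt__collectStats_closure14.class. The second variable reference, $tt__collectStats_closure14, is left alone because no environment variable of that name exists.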

Using a Python Script in Java (Eclipse)

I've been looking to incorporate a Python Script a friend made for me into a Java application that I am trying to develop. After some trial and error I finally found out about 'Jython' and used the PythonInterpreter to try and run the script.
However, when I run it, I get an error inside the Python script. This is odd, because when I run the script outside of Java (the Eclipse IDE in this case), it works fine and does exactly what I need it to: extract all the images from the .docx files stored in the same directory.
Can someone help me out here?
Java:
import org.python.core.PyException;
import org.python.util.PythonInterpreter;
public class SPImageExtractor
{
    public static void main(String[] args) throws PyException
    {
        try
        {
            PythonInterpreter.initialize(System.getProperties(), System.getProperties(), new String[0]);
            PythonInterpreter interp = new PythonInterpreter();
            interp.execfile("C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Image-Extractor2.py");
        }
        catch(Exception e)
        {
            System.out.println(e.toString());
            e.printStackTrace();
        }
    }
}
Java Error regarding Python Script:
Traceback (most recent call last):
  File "C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Image-Extractor2.py", line 19, in
    thisDir,_ = path.split(path.abspath(argv[0]))
IndexError: index out of range: 0
Traceback (most recent call last):
  File "C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Image-Extractor2.py", line 19, in
    thisDir,_ = path.split(path.abspath(argv[0]))
IndexError: index out of range: 0
Python:
from os import path, chdir, listdir, mkdir, getcwd
from sys import argv
from zipfile import ZipFile
from time import sleep
#A few notes -
#(1) when I do something like " _,variable = something ", that is because
#the function returns two variables, and I only need one. I don't know if it is a
#common convention to use the '_' symbol as the name for the unused variable, but
#I saw it in some guy's code in the past, and I started using it.
#(2) I use "path.join" because on unix operating systems and windows operating systems
#they use different conventions for paths like '\' vs '/'. path.join works on all operating
#systems for making paths.
#Defines what extensions to look for within the file (you can add more to this)
IMAGE_FILE_EXTENSIONS = ('.bmp', '.gif', '.jpg', '.jpeg', '.png', '.tif', '.tiff')
#Changes to the directory in which this script is contained
thisDir = getcwd()
chdir(thisDir)
#Lists all the files/folders in the directory
fileList = listdir('.')
for file in fileList:
    #Checks if the item is a file (opposed to being a folder)
    if path.isfile(file):
        #Fetches the files extension and checks if it is .docx
        _,fileExt = path.splitext(file)
        if fileExt == '.docx':
            #Creates directory for the images
            newDirectory = path.join(thisDir, file + "-Images")
            if not path.exists(newDirectory):
                mkdir(newDirectory)
            currentFile = open(file,"r")
            for line in currentFile:
                print line
            sleep(5)
            #Opens the file as if it is a zipfile
            #Then lists the contents
            try:
                zipFileHandle = ZipFile(file)
                nameList = zipFileHandle.namelist()
                for archivedFile in nameList:
                    #Checks if the file extension is in the list defined above
                    #And if it is, it extracts the file
                    _,archiveExt = path.splitext(archivedFile)
                    if archiveExt in IMAGE_FILE_EXTENSIONS:
                        zipFileHandle.extract(archivedFile, newDirectory)
            except:
                pass

My guess is that you don't get command-line arguments when the script is run through the interpreter (not that surprising: where would it get the values from, and what would the correct value be?). In your Java code, the third argument to PythonInterpreter.initialize appears to be the argv array, and you pass new String[0], so argv is empty and argv[0] raises the IndexError.
The documentation for os.getcwd() says:
Return a string representing the current working directory.
That would return the working directory, but presumably that's not what you want.
Not tested, but I think os.path.dirname(os.path.realpath(__file__)) should work.
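The options mentioned above can be combined into one best-effort lookup. This is a sketch, not tested under Jython: whether __file__ is defined when a script is run through execfile from Java is an assumption to verify, and the helper name resolve_script_dir is made up for illustration.

```python
import os
import sys


def resolve_script_dir(module_globals, argv=None):
    """Best-effort directory of the running script.

    Tries __file__ first, then argv[0], and falls back to the current
    working directory when neither is available.
    """
    if argv is None:
        argv = sys.argv
    script = module_globals.get("__file__")
    if script is None and argv and argv[0]:
        script = argv[0]
    if script is None:
        return os.getcwd()
    return os.path.dirname(os.path.realpath(script))


# Inside the script itself this would be called as:
# thisDir = resolve_script_dir(globals())
```

With an empty argv and no __file__, as in the embedded case from the question, the helper returns the working directory instead of raising IndexError.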
