Read AVRO file using Python - java

I have an Avro file (created by Java) that seems to be some kind of compressed file for Hadoop/MapReduce. I want to 'unzip' (deserialize) it into a flat file, one record per row.
I learned that there is an Avro package for Python, and I installed it correctly. Then I ran the example to read the Avro file. However, it fails with the error below, and I am wondering what is going wrong with even the simplest example. Can anyone help me interpret this error?
>>> reader = DataFileReader(open("/tmp/Stock_20130812104524.avro", "r"), DatumReader())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python2.7/site-packages/avro/datafile.py", line 240, in __init__
    raise DataFileException('Unknown codec: %s.' % self.codec)
avro.datafile.DataFileException: Unknown codec: snappy.
By the way, if I run 'head' on the file or open it in vi, I can see the schema definition along with some weird binary characters, probably the compressed content.
The beginning of the raw Avro file looks like this:
Obj^A^D^Tavro.codec^Lsnappy^Vavro.schemaØ${"type":"record","name":"Stoc...
I don't know whether I need to supply the schema to read the Avro file, with something like this:
from avro.datafile import DataFileReader
from avro.io import DatumReader
import avro.schema
schema = avro.schema.parse(open("schema").read())
# include the schema somehow?
reader = DataFileReader(open("Stock_20130812104524.avro", "rb"), DatumReader())
Thanks in advance.

Try pip install python-snappy, but make sure you have installed the native snappy library first.

On macOS, if the Xcode command line tools are not installed, you cannot get snappy working. You can check whether they are installed by typing gcc at the command prompt. If they are not, run xcode-select --install to install them. After that, installing python-snappy should work. Thanks Bin!
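The codec name in the error comes from the file header, which is also what the raw bytes shown in the question contain. According to the Avro specification, an object container file starts with the magic bytes Obj followed by byte 1, then a metadata map whose keys include avro.codec and avro.schema, encoded as zig-zag variable-length longs and length-prefixed byte strings. The following is a minimal stdlib-only Python sketch, assuming that layout, that reads the map so you can see which codec a file needs before choosing a reader. The function names are illustrative, not part of the avro package.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (zig-zag encoded, variable-length integer)."""
    shift = 0
    result = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("truncated Avro long")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zig-zag: 0,1,2,3 -> 0,-1,1,-2


def read_bytes(stream):
    """Decode Avro bytes/string: a long length followed by that many bytes."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated Avro bytes")
    return data


def read_header_metadata(source):
    """Return the header metadata map of a binary file object or bytes."""
    stream = io.BytesIO(source) if isinstance(source, (bytes, bytearray)) else source
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a block with count 0 ends the map
            break
        if count < 0:  # negative count: the block's size in bytes follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return metadata
```

For example, `with open("Stock_20130812104524.avro", "rb") as f: print(read_header_metadata(f).get("avro.codec"))` should print b'snappy' for the file in the question. That value means the Python reader needs python-snappy to be importable.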

wget http://www.us.apache.org/dist/avro/avro-1.7.5/java/avro-tools-1.7.5.jar
java -jar avro-tools-1.7.5.jar tojson input.avro > input.json
More information is available here.
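Because the goal is one record per row, the tojson output can be flattened one step further. avro-tools tojson writes each record as a JSON object on its own line by default. The following is a minimal Python sketch that turns those lines into CSV. It assumes all records share one flat schema. Nested values, including Avro's JSON union wrappers such as {"double": 2.0}, are kept as JSON text rather than flattened. The function name and the field names in the test are illustrative.

```python
import csv
import json


def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON record in `lines` to `out`; return the row count.

    Assumes every record has the same top-level fields (one Avro schema).
    Nested values (records, arrays, maps, union wrappers) are written as JSON text.
    """
    writer = None
    rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if writer is None:
            # Take the column order from the first record
            writer = csv.DictWriter(out, fieldnames=list(record))
            writer.writeheader()
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        })
        rows += 1
    return rows
```

Typical use: open input.json for reading and a CSV file for writing with newline="", then call `json_lines_to_csv(src, dst)`.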


Javah Error: Could not find class file for package.name

I am trying to generate a C header file for JNI (Linux). I have read the documentation and questions on javah, but I still get the same error:
Error: Could not find class file for 'org.sqlite.core.NativeDB'
I think I'm making a very obvious mistake, but I really can't see it. I need to generate the header file from NativeDB.class, and its path is:
/u/users/maas/user123/sqlite/sqlite-jdbc-3.21.0/target/common-lib/org/sqlite/core/NativeDB.class
I cd into the common-lib folder and run javah from:
/u/users/maas/user123/sqlite/sqlite-jdbc-3.21.0/target/common-lib/
The commands I tried:
javah -classpath "/u/users/maas/user123/sqlite/sqlite-jdbc-3.21.0/target/common-lib/org/sqlite/core" org.sqlite.core.NativeDB
javah org.sqlite.core.NativeDB
The error I get:
Error: Could not find class file for 'org.sqlite.core.NativeDB'
I can see the NativeDB.class file in the directory I mentioned. NativeDB.java (which is not in the same folder as NativeDB.class, in case that matters) declares this package:
package org.sqlite.core;
I found the issue (it was really obvious and dumb).
The second command in the question is correct when run from common-lib. For the first one, -classpath should point to the package root (common-lib), not to org/sqlite/core. The real problem: I am using USS (UNIX System Services) on z/OS and transferring files with ftp, and I didn't notice that the .class files were sent in text mode rather than binary. Because of the resulting encoding conversion, Java couldn't read the classes.
All you need to do is turn on binary mode in ftp, like so:
ftp server.name.com
..login...
bi
mput *class
The bi command enables binary mode. The .class files are expected to be in this format.
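Every valid .class file starts with the four bytes 0xCAFEBABE. A text-mode transfer can rewrite bytes (on z/OS it translates between ASCII and EBCDIC), so damaged files will generally fail that check. The following is a small Python sketch for spotting such files, assuming Python is available on the machine that holds the transferred files. The function names are illustrative.

```python
import os

CLASS_FILE_MAGIC = b"\xca\xfe\xba\xbe"  # first four bytes of every valid .class file


def is_class_file_header(data):
    """True if `data` starts with the Java class file magic number."""
    return data[:4] == CLASS_FILE_MAGIC


def damaged_class_files(root):
    """List .class files under `root` that do not start with 0xCAFEBABE."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".class"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    if not is_class_file_header(f.read(4)):
                        bad.append(path)
    return sorted(bad)
```

Running `damaged_class_files("/u/users/maas/user123/sqlite/sqlite-jdbc-3.21.0/target/common-lib")` after the transfer should return an empty list once the files were sent with bi.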

Get File Version of .exe in java on Linux

Question: get the file version of an .exe in Java on Linux, for a somewhat unusual client requirement.
What I tried:
I used the JNA library to read the file version from Java. The code below runs fine on Windows, but on a Linux Docker image it throws this error:
"Unable to load library 'version': Error loading shared library libversion.so: No such file or directory Error loading shared library libversion.so: No such file or directory Native library (linux-x86-64/libversion.so) not found in resource path..".
import java.io.File;
import com.sun.jna.Pointer;
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.VerRsrc;
import com.sun.jna.platform.win32.Version;
import com.sun.jna.platform.win32.WinBase;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.ptr.PointerByReference;

// Windows only: Version and Kernel32 are bound to version.dll and kernel32.dll.
private String getFileVersion(String filePath) {
    String path = new File(filePath).getAbsolutePath();
    int infoSize = Version.INSTANCE.GetFileVersionInfoSize(path, null);
    if (infoSize == 0) {
        return null; // no version resource, or the file could not be read
    }
    Pointer buffer = Kernel32.INSTANCE.LocalAlloc(WinBase.LMEM_ZEROINIT, infoSize);
    try {
        if (!Version.INSTANCE.GetFileVersionInfo(path, 0, infoSize, buffer)) {
            return null;
        }
        IntByReference outputSize = new IntByReference();
        PointerByReference pointer = new PointerByReference();
        // "\\" selects the root block, i.e. the VS_FIXEDFILEINFO structure
        if (!Version.INSTANCE.VerQueryValue(buffer, "\\", pointer, outputSize)) {
            return null;
        }
        VerRsrc.VS_FIXEDFILEINFO info = new VerRsrc.VS_FIXEDFILEINFO(pointer.getValue());
        long ms = info.dwFileVersionMS.longValue();
        long ls = info.dwFileVersionLS.longValue();
        // Each part is an unsigned 16-bit value; casting it to short could make it negative
        return String.format("%d.%d.%d.%d", ms >> 16, ms & 0xffff, ls >> 16, ls & 0xffff);
    } catch (Exception exception) {
        return null;
    } finally {
        // Memory obtained with LocalAlloc must be released with LocalFree
        Kernel32.INSTANCE.LocalFree(buffer);
    }
}
I will start by answering the question that you asked, though I doubt it is what you actually need to know.
The types of different executable file formats are encoded in the first few bytes of the file. For example, ELF files (executables, shared libraries) are described in this Wikipedia page.
So there are a number of ways to find out, from Java, what kind of executable a file is:
Write some code that reads the first few bytes and decodes the file header information, as per the format described in the Wikipedia link above.
Find an existing Java library that does this and work out how to use it. (Google for "java file magic library" and see what you can find.)
Read about the Linux file command and write some Java code to run file on each library and parse the output.
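The first option in the list above can be sketched as follows. This is a minimal Python sketch (the function name is illustrative) that recognises only two signatures: ELF files start with 0x7F followed by "ELF", and byte 4 of the ELF identification (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. Windows .exe/.dll files start with a DOS "MZ" header. A Java version would do the same comparisons on a byte[] read from a FileInputStream.

```python
def describe_executable(header):
    """Guess an executable's format from its first bytes (pass at least 5)."""
    if header[:4] == b"\x7fELF":
        # Byte 4 of the ELF identification (EI_CLASS): 1 = 32-bit, 2 = 64-bit
        ei_class = header[4] if len(header) > 4 else None
        return "ELF " + {1: "32-bit", 2: "64-bit"}.get(ei_class, "unknown class")
    if header[:2] == b"MZ":
        # DOS "MZ" header; Windows PE executables (.exe, .dll) begin with one
        return "MZ (DOS/Windows)"
    return "unknown"


def describe_file(path):
    """Read the first bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return describe_executable(f.read(64))
```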
What I think you actually need to do is a bit different:
Locate the file or files in the file system that the Java is looking for: apparently libversion.so or linux-x86-64/libversion.so. (The file could well be a symlink. Follow it.)
Run file on each file to check that it is the right kind of library. It needs to be 32-bit or 64-bit to match the JVM you are running, and it must have the correct ABI and ISA for the platform.
Check that the files are where the JVM expects to find them. The JVM searches for libraries in directories listed in the "java.library.path" system property. You can (if necessary) set the path using a -Djava.library.path=... JVM option.
See "java.library.path – What is it and how to use" for more information on library loading.
(There is absolutely no need to do step 2 "from" or "in" Java.)
I think I have finally worked out what you are doing.
The Version you are using actually comes from the package com.sun.jna.platform.win32. It is not part of the core JNA library (jna.jar) but of jna-platform.jar. If I understand things correctly, it is a JNA mapping of functions exported by the Windows version.dll, which is why the error complains that it cannot load the library 'version'.
If I have that correct, you would need the Windows native libraries compiled and built for the Linux platform to do what you are trying to do.
AFAIK, that's not possible.
So how could you make this work? Basically you need to do one of the following:
Look for an existing pure Java library for extracting the version information from a Windows ".exe" file. I don't think it is likely that you will find one.
Find the specification for the Windows ".exe" file format and write your own Java code to extract the version information. I haven't looked for the spec to see how much work it would be.
Then rewrite the code in your question to use the alternative API.
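A cheaper shortcut than a full parser for the .exe format is a heuristic rather than the documented way to walk the PE resource directory. The version resource contains a VS_FIXEDFILEINFO structure that begins with the signature 0xFEEF04BD, followed by dwStrucVersion, dwFileVersionMS and dwFileVersionLS. These are the same fields the JNA code above reads. The following is a minimal Python sketch that scans the raw bytes for that signature, under the assumption that the first match is the real structure. The function names are illustrative. Porting it to Java with a little-endian ByteBuffer should be straightforward.

```python
import struct

# VS_FIXEDFILEINFO.dwSignature as it appears in the file (little-endian)
VS_FFI_SIGNATURE = struct.pack("<I", 0xFEEF04BD)


def file_version_from_bytes(data):
    """Heuristically find VS_FIXEDFILEINFO in raw .exe bytes; return 'a.b.c.d' or None."""
    offset = data.find(VS_FFI_SIGNATURE)
    if offset < 0 or offset + 16 > len(data):
        return None
    # dwSignature, dwStrucVersion, dwFileVersionMS, dwFileVersionLS
    _signature, _struc_version, ms, ls = struct.unpack_from("<4I", data, offset)
    return "%d.%d.%d.%d" % (ms >> 16, ms & 0xFFFF, ls >> 16, ls & 0xFFFF)


def file_version(path):
    """Read the whole file and apply the heuristic."""
    with open(path, "rb") as f:
        return file_version_from_bytes(f.read())
```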
The "libversion" file that I mentioned in my other answer is not relevant. It is something else. It is a red herring.

Matlab installation (LD_LIBRARY_PATH) messes up other library files

I am trying to install Matlab on a Linux machine, but setting LD_LIBRARY_PATH (as the installation requires) breaks other library files. I am not a Linux expert, but I have tried several things and cannot get it working correctly. I have even contacted Matlab support, got the issue escalated to the dev team, and was basically told "haha sucks to suck". I have seen a few other people online with the same issue, but either their questions were never answered or they had a slightly different problem and their solution didn't apply to me.
Installing on a VM running Ubuntu:
I set LD_LIBRARY_PATH as the instructions say, then it breaks network files. I can ping google.com, but I cannot nslookup google.com or visit it in a browser. Nslookup provides this error:
nslookup: /usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64/libcrypto.so.1.0.0: no version information available (required by /usr/lib/libdns.so.100)
03-Feb-2016 11:32:22.361 ENGINE_by_id failed (crypto failure)
03-Feb-2016 11:32:22.362 error:25070067:DSO support routines:DSO_load:could not load the shared library:dso_lib.c:244:
03-Feb-2016 11:32:22.363 error:260B6084:engine routines:DYNAMIC_LOAD:dso not found:eng_dyn.c:447:
03-Feb-2016 11:32:22.363 error:2606A074:engine routines:ENGINE_by_id:no such engine:eng_list.c:418:id=gost
(null): dst_lib_init: crypto failure
The installation worked though (I can run my Java programs that reference compiled Matlab functions). Unsetting LD_LIBRARY_PATH fixes the network files but then I can't run programs anymore.
Installing on EC2 instance:
On an EC2 instance it does not break the network files (nslookup is fine). Instead it messes up Python library files. Trying to use any aws cli command, I get the error:
  File "/usr/bin/aws", line 19, in <module>
    import awscli.clidriver
  File "/usr/lib/python2.7/dist-packages/awscli/clidriver.py", line 16, in <module>
    import botocore.session
  File "/usr/lib/python2.7/dist-packages/botocore/session.py", line 25, in <module>
    import botocore.config
  File "/usr/lib/python2.7/dist-packages/botocore/config.py", line 18, in <module>
    from botocore.compat import six
  File "/usr/lib/python2.7/dist-packages/botocore/compat.py", line 139, in <module>
    import xml.etree.cElementTree
  File "/usr/lib64/python2.7/xml/etree/cElementTree.py", line 3, in <module>
    from _elementtree import *
ImportError: PyCapsule_Import could not import module "pyexpat"
Printing sys.path in Python shows that lib-dynload is already there, though, so that doesn't seem to be the problem.
And when trying to run the program, I get:
Exception in thread "main" java.lang.LinkageError: libXt.so.6: cannot open shared object file: No such file or directory
at com.mathworks.toolbox.javabuilder.internal.DynamicLibraryUtils.dlopen(Native Method)
at com.mathworks.toolbox.javabuilder.internal.DynamicLibraryUtils.loadLibraryAndBindNativeMethods(DynamicLibraryUtils.java:134)
at com.mathworks.toolbox.javabuilder.internal.MWMCR.<clinit>(MWMCR.java:1529)
at VectorAddExample.VectorAddExampleMCRFactory.newInstance(VectorAddExampleMCRFactory.java:48)
at VectorAddExample.VectorAddExampleMCRFactory.newInstance(VectorAddExampleMCRFactory.java:59)
at VectorAddExample.VectorAddClass.<init>(VectorAddClass.java:62)
at com.mypackage.Example.main(Example.java:13)
I'm at a brick wall and really have no clue how to proceed.
Maybe something else already needs LD_LIBRARY_PATH set in order to work. Make sure you prepend to it rather than overwrite it:
export LD_LIBRARY_PATH=new/path:$LD_LIBRARY_PATH
Edit:
OK, if LD_LIBRARY_PATH was initially empty, this suggests that Matlab comes with shared libraries that are incompatible with your system ones:
nslookup: /usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64/libcrypto.so.1.0.0: no version information available (required by /usr/lib/libdns.so.100)
suggests that /usr/lib/libdns.so.100 needs libcrypto.so.1.0.0, which is now being resolved to the one that comes with MATLAB, which is incompatible.
You can check the dependencies of a shared library with:
ldd /usr/lib/libcrypto.so.1.0.0
and hopefully you can find a configuration that keeps both MATLAB and your system happy. Unfortunately, this may involve a lot of trial and error.
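ldd reports which file each dependency resolves to under the current LD_LIBRARY_PATH. To make the trial and error less painful, you can parse that output and list the dependencies that ended up inside the MATLAB runtime directory. The following is a small Python sketch that does this. It assumes glibc's usual "name => path (address)" line format. The function names are illustrative.

```python
def parse_ldd(output):
    """Map each dependency name to the path ldd resolved it to ('not found' if missing)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vDSO or the dynamic loader itself
        name, _, rest = line.partition("=>")
        target = rest.strip().split(" (")[0].strip()
        if target.startswith("("):
            target = ""  # some ldd versions print only a load address here
        resolved[name.strip()] = target
    return resolved


def resolved_under(resolved, prefix):
    """Dependencies whose resolved path starts with `prefix` (e.g. the MATLAB runtime)."""
    return {name: path for name, path in resolved.items() if path.startswith(prefix)}
```

With LD_LIBRARY_PATH set, feed it something like `subprocess.run(["ldd", "/usr/lib/libdns.so.100"], capture_output=True, text=True).stdout` and call `resolved_under(..., "/usr/local/MATLAB/")` to see which system libraries are being shadowed.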
If there is no such configuration, you can try setting LD_LIBRARY_PATH only when you run MATLAB:
LD_LIBRARY_PATH=$MATLAB_LD_LIBRARY_PATH matlab
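If your MATLAB-compiled programs are started from a script, the same "set it only for that process" idea can be applied to each child process. The following is a minimal Python sketch of this. The function names are made up for illustration. It prepends rather than overwrites, drops empty entries, and never modifies the parent's environment, so tools such as nslookup or the aws CLI launched elsewhere keep the system libraries.

```python
import os
import subprocess


def prepend_library_dirs(current, new_dirs):
    """Return new_dirs followed by the existing entries, dropping empty ones."""
    existing = [entry for entry in (current or "").split(":") if entry]
    return ":".join(list(new_dirs) + existing)


def run_with_library_dirs(cmd, new_dirs, **kwargs):
    """Run cmd with new_dirs prepended to LD_LIBRARY_PATH in the child only."""
    env = dict(os.environ)  # a copy: the parent's environment is left untouched
    env["LD_LIBRARY_PATH"] = prepend_library_dirs(env.get("LD_LIBRARY_PATH"), new_dirs)
    return subprocess.run(cmd, env=env, **kwargs)
```

For example, `run_with_library_dirs(["java", "-jar", "myapp.jar"], ["/usr/local/MATLAB/MATLAB_Runtime/v90/runtime/glnxa64", "/usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64"])`, where myapp.jar is a placeholder for your own program.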
Edit 2:
Well, the Python issue seems to boil down to pyexpat, which is a wrapper around the standard expat XML parser. Try the following (the file name is a guess, since I don't have a Linux machine at hand right now):
ldd /usr/local/lib/python2.7/site-packages/libpyexpat.so
and see what that depends on. Probably, it will be libexpat.so, which is now being resolved to MATLAB's version.
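To avoid guessing the file name to pass to ldd, Python can report where a module would be loaded from. importlib.util.find_spec only locates a top-level module without executing it, so it should work even while the import itself is failing. The following is a minimal Python 3 sketch; the helper name is illustrative. On the Python 2.7 in the question, `python2.7 -c "import pyexpat; print(pyexpat.__file__)"` run with LD_LIBRARY_PATH unset gives the same information.

```python
import importlib.util


def module_origin(name):
    """Return where `name` would be imported from: a file path, 'built-in', or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

Then run ldd on whatever `module_origin("pyexpat")` returns, if it is a file path.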
Try the following command:
export LD_LIBRARY_PATH=/usr/local/MATLAB/MATLAB_Runtime/v90/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v90/sys/os/glnxa64:$LD_LIBRARY_PATH
This may not help the OP, but if you are generating a Python package with MATLAB, you can modify the __init__.py file that MATLAB generates for your package.
Specifically, the generated __init__.py file contains the following line (as of MATLAB 2017a):
PLATFORM_DICT = {'Windows': ['PATH','dll',''], 'Linux': ['LD_LIBRARY_PATH','so','libmw'], 'Darwin': ['DYLD_LIBRARY_PATH','dylib','libmw']}
For Linux platform, you could simply replace LD_LIBRARY_PATH with something else such as MCR_LIBRARY_PATH to prevent mucking with your shared libs.
sed -i -e 's/\bLD_LIBRARY_PATH/MCR_LIBRARY_PATH/g' /MY/PACKAGE/BUILD/PATH/__init__.py
Then, of course, export MCR_LIBRARY_PATH before using Python.
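A plain substring replacement of LD_LIBRARY_PATH also rewrites the macOS entry, because DYLD_LIBRARY_PATH contains LD_LIBRARY_PATH. The following is a minimal Python sketch of the same edit using a word-boundary match, which leaves the Darwin entry alone. The function name is made up for illustration. GNU sed accepts the same \b pattern.

```python
import re


def rename_linux_path_var(source, new_name="MCR_LIBRARY_PATH"):
    """Replace the LD_LIBRARY_PATH token but not the one inside DYLD_LIBRARY_PATH."""
    # \b needs a word boundary before 'L', so 'DYLD_...' (where 'L' follows 'Y') never matches
    return re.sub(r"\bLD_LIBRARY_PATH\b", new_name, source)
```

Apply it to the text of the generated __init__.py and write the result back.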

Why can't "org.antlr.v4.runtime.misc.TestRig" be found or loaded?

So here's my problem. My ANTLR4 code compiles without errors, and now I want to test it. The ANTLR4 documentation says that to test my grammar, I should run:
java org.antlr.v4.runtime.misc.TestRig
I've tried this and got following error:
Error: Could not find or load main class org.antlr.v4.runtime.misc.TestRig
I checked my CLASSPATH, and everything was set correctly. I also tried moving the file directly into my test folder, opened CMD there and tried again, but I get the same error. Searching the Internet didn't help, as no one seemed to have encountered this error with ANTLR4 before.
Specs:
Java 1.7.0.55
ANTLR 4.4
There seems to be something wrong with your classpath, contrary to your belief that everything is okay.
When I download the ANTLR 4 JAR and run TestRig:
wget http://www.antlr.org/download/antlr-4.4-complete.jar
...
java -cp antlr-4.4-complete.jar org.antlr.v4.runtime.misc.TestRig
I see the following on my console:
java org.antlr.v4.runtime.misc.TestRig GrammarName startRuleName
[-tokens] [-tree] [-gui] [-ps file.ps] [-encoding encodingname]
[-trace] [-diagnostics] [-SLL]
[input-filename(s)]
Use startRuleName='tokens' if GrammarName is a lexer grammar.
Omitting input-filename makes rig read from stdin.
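To test the classpath claim directly, you can look for the class file in every classpath entry the way the JVM resolves a class name. For example, org.antlr.v4.runtime.misc.TestRig becomes org/antlr/v4/runtime/misc/TestRig.class inside a jar or directory. The following is a rough Python sketch with an illustrative function name. It ignores wildcard entries such as lib/*. It also does not emulate the JVM's fallback to the current directory when no classpath is set.

```python
import os
import zipfile


def find_class_on_classpath(class_name, classpath):
    """Return the classpath entries (directories or jars) that contain `class_name`."""
    rel_path = class_name.replace(".", "/") + ".class"  # a.b.C -> a/b/C.class
    hits = []
    for entry in classpath.split(os.pathsep):  # ':' on Unix, ';' on Windows
        if not entry:
            continue
        if os.path.isdir(entry):
            if os.path.isfile(os.path.join(entry, *rel_path.split("/"))):
                hits.append(entry)
        elif zipfile.is_zipfile(entry):  # also False for files that do not exist
            with zipfile.ZipFile(entry) as jar:
                if rel_path in jar.namelist():
                    hits.append(entry)
    return hits
```

`print(find_class_on_classpath("org.antlr.v4.runtime.misc.TestRig", os.environ.get("CLASSPATH", "")))` printing an empty list would mean the JVM cannot see TestRig through CLASSPATH either.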

Jython 2.5.3 on Unix: interactive shell with command completion

After spending 4 days searching for a working solution, I guess I need to ask.
So far I'm successfully working with Jython 2.5.2 or 2.5.3, using a modified thinClient.sh that loads what I need. It connects successfully to a Deployment Manager with either the IPC or the SOAP connector.
However, it lacks the readline module:
wsadmin>import readline
WASX7015E: Exception running command: "import readline"; exception information:
com.ibm.bsf.BSFException: exception from Jython:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr2/produits/websphere7/AppServer/thinClient/lib/jython/Lib/readline.py", line 20, in <module>
    raise ImportError("Cannot access JLineConsole")
ImportError: Cannot access JLineConsole
The goal is to make it interactive, with a colored prompt and so forth.
So far I have tried the following, with absolutely no success:
org.python.util.JLineConsole / org.python.util.ReadlineConsole (misses readline module)
Readline-1.7 (does nothing)
java-readline / libreadline-java-0.8.0 (misses readline module)
JLine (won't load the jar)
pyreadline (won't integrate to jython)
So:
Is it possible, with Jython 2.5.3 on IBM AIX 64-bit and a thin client (jython-installer-2.5.3.jar), to have a real interactive shell with bash-like completion and command recall using the arrow keys, without having to build or compile anything?
If yes, could somebody please describe a working solution:
What is the solution called?
What is in your wsadmin.properties?
Which libraries do you load in LIBPATH?
Which classes do you put in CLASSPATH?
Which command line do you use to invoke java?
There are so many "solutions" or "suggestions" for this frequently asked question on the web, but nowhere did I find a well-described or working solution. Too much information scattered all around just becomes a mess :(
Thanks for any help!
OK, I found a workaround, and it was easy enough that I can answer this myself:
rlwrap -H $THIN_CLIENT_HOME/logs/rlwrap.history.log -f $THIN_CLIENT_HOME/etc/rlwrap.jython.words.txt -r -pBlue -z $THIN_CLIENT_HOME/etc/rlwrap.prompt.pl $CMDLINE
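Note that org.python.util.* and com.ibm.ws.scripting.WasxShell are mutually incompatible, so the Jython console classes cannot add readline support inside wsadmin; rlwrap adds it from the outside instead.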
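rlwrap's -f option adds the words in the given file to its completion list. The following is a minimal Python sketch for generating a file like $THIN_CLIENT_HOME/etc/rlwrap.jython.words.txt from Python's keyword list plus the standard wsadmin scripting objects. The function names are illustrative. keyword.kwlist comes from whichever Python runs the script, not from Jython 2.5, so the list is approximate. Any extra words, such as specific AdminTask commands, are up to you.

```python
import keyword

# Standard wsadmin scripting objects; extend with whatever you type often
WSADMIN_OBJECTS = ["AdminApp", "AdminConfig", "AdminControl", "AdminTask", "Help"]


def completion_words(extra=()):
    """Sorted, de-duplicated word list for rlwrap's -f option."""
    return sorted(set(keyword.kwlist) | set(WSADMIN_OBJECTS) | set(extra))


def write_words_file(path, extra=()):
    """Write one completion word per line to `path`."""
    with open(path, "w") as f:
        f.write("\n".join(completion_words(extra)) + "\n")
```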
