Transferring 4D array from Python to Jython - java

My question is basically the title: I have a number of large numpy arrays that I want to port to a Java application, eventually. The only way I see myself doing this is by first transferring this data to Jython. However, I am not sure how to do this as numpy doesn't exist in Jython.

Well, Python will easily let you serialize your data to files in whatever format you want, in a few lines of code. What formats can your Java application read?
If you don't want to write data to disk, or can't afford to duplicate the in-memory data to pass it to another process, one thing to check out is Cap'n Proto: https://capnproto.org/
One way of serializing the arrays as JSON-encoded data files is simply:
import json

# tolist() converts the numpy array to nested Python lists, which json can serialize directly.
with open("myfile.json", "wt") as f:
    json.dump(myarray.tolist(), f)
If your Java side can read JSON, that is all you need.
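On the Java side, a JSON library can map those nested lists straight onto a nested array. A minimal sketch using Gson (my choice here, not something the question prescribes), assuming the array holds doubles and was written to myfile.json as above:

import com.google.gson.Gson;
import java.io.FileReader;
import java.io.Reader;

public class LoadArray {
    public static void main(String[] args) throws Exception {
        // Gson maps the nested JSON lists onto a nested Java array in one call.
        try (Reader reader = new FileReader("myfile.json")) {
            double[][][][] data = new Gson().fromJson(reader, double[][][][].class);
            System.out.println("First element: " + data[0][0][0][0]);
        }
    }
}

Jackson or any other JSON mapper would work the same way.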

Related

Get CSV data using Java and validate it against expected results

I have the following data in a CSV file:
video1duration,video2duration,video3duration
00:01:00, 00:00:24, 00:00:15
00:01:00, 00:00:24, 00:00:15
00:01:00, 00:00:24, 00:00:15
The file is stored in a folder locally in my computer.
I need help with writing code to do the following:
- pass the path of the CSV file to access its data, then validate each cell/value against expected data that will be written in the IDE as follows:
video1duration,video2duration,video3duration
00:02:00, 00:05:24, 00:00:15
00:04:00, 00:10:24, 00:00:15
00:01:00, 00:00:24, 00:00:15
As I understand your question, you have a two-stage process. Trying to merge these two separate things into one will certainly result in less legible and harder-to-maintain code (everything as one giant package/class/function).
Your first stage is to import a .csv file and parse it using any of these three methods (a sketch of this stage follows after this answer):
- Using java.util.Scanner
- Using the String.split() method
- Using a third-party library like OpenCSV
It is possible to validate that your .csv is valid, and that it contains tabular data without knowing or caring about what the data will later be used for.
In the second stage, take the tabular data (e.g. an array of arrays) and turn it into a tree. At this point, your hierarchy package will be doing validation, but it will only be validating the tree structure (e.g. every node except the root has one parent, etc.). If you want to delve further, this might be interesting: https://www.npmjs.com/package/csv-file-validator.
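A minimal sketch of the first stage using String.split(), assuming the actual and expected files both follow the layout shown in the question (the file names are placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class CsvValidator {

    // Read a CSV file into a list of rows, each row being an array of trimmed cells.
    static List<String[]> readCsv(String path) throws Exception {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        List<String[]> actual = readCsv("videos.csv");     // path passed in from your test setup
        List<String[]> expected = readCsv("expected.csv"); // or build this data directly in the IDE

        // Compare cell by cell, skipping the header row, and report mismatches.
        for (int row = 1; row < expected.size(); row++) {
            for (int col = 0; col < expected.get(row).length; col++) {
                String a = actual.get(row)[col];
                String e = expected.get(row)[col];
                if (!a.equals(e)) {
                    System.out.printf("Mismatch at row %d, column %d: expected %s but found %s%n",
                            row, col, e, a);
                }
            }
        }
    }
}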

Is serializing in Java the best/easiest way to store and later access (a small amount of) data?

I am relatively new to Java and have much more experience with Matlab. I was wondering what the best way is to store a relatively small amount of data, which has been calculated in one program, that should be used in another program.
Example: program A computes 100 values to be stored in an array. Now I would like to access this array in program B, as it needs these values. Of course, I could just write one program all together, which also implements the part of A. However, now every time I want to execute the total program, all the values have to be calculated again (in part A), which is a waste of resources. In Matlab, I was able to easily save the array in a .mat file and load it in a different script.
Looking around to find my answer I found the option of serializing (What is object serialization?), which I think would be suitable for doing what I want. My question: is serializing the easiest and quickest solution to store a small amount of data in Java, or is there a quicker, more user-friendly option (like .mat files in Matlab)?
I think you have several options for doing this job. Java object serialization is one possible way (sketched below). From my point of view, there are other options for serializing the data:
- Write and read a simple text file to store the computed values.
- Use Java Architecture for XML Binding (JAXB) to write annotated Java classes to an XML file; the same is available for JSON.
- Use a lightweight database like SQLite or HSQLDB (a native Java database).
- Use Apache Thrift or Protocol Buffers to de/serialize Java objects to files.
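As a rough sketch of the plain Java object serialization route for the 100-value example from the question (the file name values.ser and the computation are placeholders):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ArrayStore {
    public static void main(String[] args) throws Exception {
        // "Program A": compute the values and serialize the array to a file.
        double[] values = new double[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = Math.sqrt(i); // placeholder for the real computation
        }
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("values.ser"))) {
            out.writeObject(values);
        }

        // "Program B": deserialize the array and use it without recomputing.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("values.ser"))) {
            double[] loaded = (double[]) in.readObject();
            System.out.println("Loaded " + loaded.length + " values, first = " + loaded[0]);
        }
    }
}

Arrays of primitives are serializable out of the box; your own classes only need to implement java.io.Serializable.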

My JSON files are too big to fit into memory, what can I do?

In my program, I am reading a series of text files from the disk. With each text file, I process out some data and store the results as JSON on the disk. In this design, each file has its own JSON file. In addition to this, I also store some of the data in a separate JSON file, which stores relevant data from multiple files. My problem is that the shared JSON grows larger and larger with every file parsed, and eventually uses too much memory. I am on a 32-bit machine and have 4 GB of RAM, and cannot increase the memory size of the Java VM anymore.
Another constraint to consider is that I often refer back to the old JSON. For instance, say I pull out ObjX from FileY. In pseudo code, the following happens (using Jackson for JSON serialization/deserialization):
// In the main method.
FileYJSON = parse(FileY);
ObjX = FileYJSON.get(some_key);
sharedJSON.add(ObjX);

// In the sharedJSON object.
List objList;

function add(obj) {
    if (!objList.contains(obj)) {
        objList.add(obj);
    }
}
The only thing I can think to do is use streaming JSON, but the problem is that I frequently need to access the JSON that came before, so I don't know that streaming will work. Also, my data types are not only strings, which I believe prevents me from using Jackson's streaming capabilities. Does anyone know of a good solution?
If you're getting to the point where your data structures are so large that you're running out of memory, you'll have to start using something else. I would recommend that you use a database, which will significantly speed up data retrieval and storage. It will also make the limit of your data structure the size of your hard drive, instead of the size of your RAM.
Try this page for an introduction to Java and Databases.
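A minimal sketch of that approach with an embedded database (SQLite via the sqlite-jdbc driver is just one choice, and the table layout here is an assumption): store each extracted object as a row keyed by its identifier, so the shared JSON lives on disk and lookups replace the in-memory objList.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SharedStore {
    public static void main(String[] args) throws Exception {
        // The database keeps the shared data on disk instead of in the Java heap.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:shared.db")) {
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS shared (key TEXT PRIMARY KEY, json TEXT)");

            // Equivalent of sharedJSON.add(ObjX): insert only if the key is not already present.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT OR IGNORE INTO shared (key, json) VALUES (?, ?)")) {
                ps.setString(1, "ObjX-key");
                ps.setString(2, "{\"example\": true}");
                ps.executeUpdate();
            }

            // Referring back to old JSON becomes a lookup instead of a scan of a huge list.
            try (ResultSet rs = conn.createStatement()
                    .executeQuery("SELECT json FROM shared WHERE key = 'ObjX-key'")) {
                if (rs.next()) {
                    System.out.println(rs.getString("json"));
                }
            }
        }
    }
}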
I can't believe that you really need nearly 4 GB of RAM only for text files and JSON.
I see three possible solutions:
1. Switch to plain text if possible; that is not as memory hungry.
2. Just open and close the files as you need them. You can organize the files by a specific naming convention, like the first two/three/... digits of their hashes, and open them as you need them.
3. If you have that much data, you could switch to a database. That would save a lot of resources.
I would prefer option 3 if it's possible for you.
You could also put the data behind an API and read the response body from it.

Writing a file in binary or bytecode

I am storing large amounts of information inside of text files that are written via java. I have two questions relating to this:
Is there any efficiency boost to writing in binary or bytecode over Strings?
What would I use to write the data type into a file?
I already have a setup based around Strings, but I want to compare and at least know how to write the file in bytecode or binary.
When I read in the file, it will be translated into Strings again, but according to my research, writing the file straight to binary removes the added step of translating between Strings and bytes on both ends, both when writing the file and when reading it.
cHao has a good point about just using Strings anyway, but I am still interested in how to write varied data types to the file.
In other words, can I still use FileReader and BufferedReader to read and translate back to Strings, or is there something else to use? Also, for a binary writer, is it still the FileWriter class that I use?
If you want to write it in "binary", and you want to save space, why not just zip it using the JDK? That meets all your requirements.
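A minimal sketch of that idea with java.util.zip: keep the existing String-based format, but compress it on the way to disk (the file name and payload are placeholders):

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ZippedTextWriter {
    public static void main(String[] args) throws Exception {
        // The Writer chain stays String-based; GZIPOutputStream compresses the bytes underneath.
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream("data.txt.gz")),
                StandardCharsets.UTF_8))) {
            out.write("large amounts of text data...\n");
        }
        // To read it back, wrap a FileInputStream in a GZIPInputStream and an InputStreamReader.
    }
}

If you really want typed binary fields rather than compressed text, DataOutputStream and DataInputStream are the standard JDK classes for that.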

Palm Database (PDB) files in Java?

Has anybody written any classes for reading and writing Palm Database (PDB) files in Java? (I mean on a server, not on the Palm device itself.) I tried to google, but all I got were Protein Data Bank references.
I wrote a Perl program that does it using Palm::PDB.pm, but I want to turn it into a servlet for a GWT app.
The jSyncManager project at http://www.jsyncmanager.org/ is under the LGPL and includes classes to read and write PDB files -- look in jSyncManager/API/Protocol/Util/DLPDatabase.java in its source code. It looks like the core code you need from this could be isolated from the rest of the library with a little effort.
There are a few ways that you can go about this:
- Easiest but slowest: find a Perl-to-Java bridge. This will not be quick, but it will work and it should involve the least amount of work.
- Find a C++/C# implementation that you have the source to and convert it (this should be the fastest solution).
- Find a Java reader... there seem to be a few listed on Google; however, I do not have any experience with these.
Depending on what your intended usage is, you might look into writing a simple reader yourself. The format is pretty simple and you only need to handle a couple of simple fields to parse it.
Basically there is a header for the entire file which has a 2-byte integer at the end that specifies the number of records. So just skip your way through the bytes for all the other fields in the header, and then read that last field, which is the number of records in the file. Be aware that the PDB format writes integers with the most significant byte first.
Following this, there will be a record header for each record, the first field of which is the actual offset into the file for the record itself. Again, be aware of the byte order.
So, now you have the offsets into the file for each record in the file, which should make it very easy to read the actual records as long as you know the format of these for the type of PDB file you are trying to read.
Wikipedia has a nice overview of the header formats.
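A minimal sketch of that header/offset walk using DataInputStream, which reads big-endian by default; the 76-byte skip assumes the standard PDB header layout, so verify it against the format description you are following:

import java.io.DataInputStream;
import java.io.FileInputStream;

public class PdbReader {
    public static void main(String[] args) throws Exception {
        try (DataInputStream in = new DataInputStream(new FileInputStream("example.pdb"))) {
            // Skip the fixed header fields that precede the record count
            // (76 bytes in the standard layout: name, attributes, dates, type, creator, ...).
            in.skipBytes(76);
            int numRecords = in.readUnsignedShort(); // 2-byte big-endian record count

            // Each record list entry: 4-byte big-endian file offset, 1 attribute byte, 3-byte unique ID.
            long[] offsets = new long[numRecords];
            for (int i = 0; i < numRecords; i++) {
                offsets[i] = in.readInt() & 0xFFFFFFFFL;
                in.skipBytes(4);
            }
            System.out.println(numRecords + " records, first record at offset " + offsets[0]);
        }
    }
}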
Maybe JPilot can help? They must have a lot of Java code dealing with Palm OS data.