get netcdf variable start offset - java

I am using netcdf-java to access netCDF files and variables. How can I get each variable's start offset and seek to that offset directly? Below is my current code, which gets a variable via the findVariable method:
NcHdfsRaf raf = new NcHdfsRaf(file, job.getConfiguration());
NetcdfFile ncfile = WRFFile.openFile(raf, path.toString());
Variable timesVar = ncfile.findVariable("Temperature");

Huh. Why do you want that? It's probably a terrible idea. You already have an interface for reading data, and that interface insulates you from any underlying file format changes. Furthermore, remember that netcdf is a portable file format: if you read 1000 bytes from a given offset, those bytes might not be what you expect -- the library will deal with endian conversions and any possible type conversions that must happen.
With all that out of the way, if for some goofy reason you wanted to get the offset, I don't see anything in the Java class that will let you do that:
http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/javadoc/ucar/nc2/Variable.html
If you were doing this in the C library, underneath the public API, you could peek at types of NC_Var: the 'begin' member of the struct is where data begins:
https://github.com/Unidata/netcdf-c/blob/master/include/nc3internal.h#L159


Efficient way to parse a datagram in Java

Right now I am using a socket and a datagram packet. This program is for a LAN network and sends at least 30 packets a second at 500 bytes maximum.
This is how I receive my data:
payload = new String(incomingPacket.getData(), incomingPacket.getOffset(), incomingPacket.getLength(), "UTF-8");
Currently I use no offset and parse the payload one character at a time. I use the first 2 characters to determine the message type (this is subject to change), then I break out the variables, which are separated by an exclamation mark that tells me where the next variable begins. At the end I parse the values and apply them to my program. Is there a faster way to break down and interpret datagram packets? Would there be a performance difference if I put the lengths of the variables in the packet instead of using delimiters? An example would be useful. Also, I think my variables are too small to justify StringBuilder, so I use normal concatenation.
What you are talking about here is setting up your own protocol for communication. While I have this as the fourth part of my socket tutorial (I'm currently working on part 3, non-blocking sockets) I can explain some things here already.
There are several ways of setting up such a protocol, depending on your needs.
One way of doing it is having a byte in front of each piece of data declaring the size, in bytes. That way, you know the length of the byte array containing the next variable value. This makes it easy to read out whole variables in one go via the System.arraycopy method. This is a method I've used before. If the object being sent is always the same, this is all you need to do. Write the variables in a standardized order, add the size of the variable value and you're done.
If you have to send multiple types of objects through the stream, you might want to add a bit of metadata. This metadata tells the receiver what kind of object is being sent and the order of the variables. Put it in a header that precedes the actual message. Once again, each value in the header is preceded by its size byte.
In my tutorial I'll write up a complete code example.
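A minimal sketch of the size-byte layout described above, assuming every field is text of at most 255 bytes once encoded as UTF-8. The class name SizePrefixedFields and its methods are hypothetical, not taken from the tutorial mentioned in the answer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each field is stored as [1 size byte][field bytes].
class SizePrefixedFields {

    static byte[] encode(List<String> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            if (bytes.length > 255) {
                throw new IllegalArgumentException("field does not fit in one size byte");
            }
            out.write(bytes.length);           // size byte
            out.write(bytes, 0, bytes.length); // field value
        }
        return out.toByteArray();
    }

    static List<String> decode(byte[] data, int offset, int length) {
        List<String> fields = new ArrayList<>();
        int pos = offset;
        int end = offset + length;
        while (pos < end) {
            int size = data[pos++] & 0xFF;     // read the size byte as unsigned
            if (pos + size > end) {
                throw new IllegalArgumentException("truncated field");
            }
            byte[] value = new byte[size];
            System.arraycopy(data, pos, value, 0, size); // copy the whole field in one go
            fields.add(new String(value, StandardCharsets.UTF_8));
            pos += size;
        }
        return fields;
    }
}
```

Because the size comes first, the reader never scans for a delimiter and the field values may contain any character, including an exclamation mark.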
Don't use a String at all. Just process the underlying byte array directly: scan it for delimiters, counts, what have you. You can use a DataInputStream wrapped around a ByteArrayInputStream wrapped around the byte array if you want an API oriented to Java primitive datatypes.
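As a rough illustration of the DataInputStream approach, here is a sketch that assumes a made-up message layout: a 1-byte type, a 4-byte id, and two 4-byte floats. The class BinaryPacketCodec and the Position type are hypothetical; only the java.io stream classes are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical codec for an assumed layout: [type:1][id:4][x:4][y:4], big-endian.
class BinaryPacketCodec {

    static final class Position {
        final int type;
        final int id;
        final float x;
        final float y;

        Position(int type, int id, float x, float y) {
            this.type = type;
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    static byte[] encode(Position p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(p.type);
            out.writeInt(p.id);
            out.writeFloat(p.x);
            out.writeFloat(p.y);
        }
        return bytes.toByteArray();
    }

    // Reads primitives straight from the packet's byte array; no String is created.
    static Position decode(byte[] buf, int offset, int length) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf, offset, length));
        int type = in.readUnsignedByte();
        int id = in.readInt();
        float x = in.readFloat();
        float y = in.readFloat();
        return new Position(type, id, x, y);
    }
}
```

On the receiving side this would be called as decode(incomingPacket.getData(), incomingPacket.getOffset(), incomingPacket.getLength()).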

Why to avoid using ByteStream much in Java

The Sun documentation says we shouldn't use byte streams:
actually it represents a kind of low-level I/O that you should avoid.
What exactly is low-level I/O, and what is the problem with using byte streams?
So the Java docs say:
CopyBytes seems like a normal program, but it actually represents a
kind of low-level I/O that you should avoid. Since xanadu.txt contains
character data, the best approach is to use character streams, as
discussed in the next section. There are also streams for more
complicated data types. Byte streams should only be used for the most
primitive I/O.
The byte streams give you access to the file as it is. Just the bytes. No interpretation of any kind. That means no character set conversion, no handling of ints or floats in binary or ASCII representation, no dealing with byte order, or any of that. The higher-level streams provide some of these.
Of course, a program that copies a file is actually a pretty good example of something that needs a raw byte stream, because it doesn't need or want to do any kind of interpretation of the data; it just wants to copy it verbatim.
So what they really mean is: use byte streams if you think you need them, but be sure you know what you are doing :)
The suggestion is in the context of reading a text file that is discussed in the tutorial. For that purpose it is better to use character streams to handle character set translation properly:
The Java platform stores character values using Unicode conventions.
Character stream I/O automatically translates this internal format to
and from the local character set.
A program that uses character streams in place of byte streams
automatically adapts to the local character set and is ready for
internationalization — all without extra effort by the programmer.
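To make the difference concrete, here is a small sketch that reads the same bytes once through a byte stream and once through a character stream. The class name ByteVsCharStreams is hypothetical, and the charset is fixed to UTF-8 as an assumption so the result does not depend on the local character set that the tutorial refers to.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: counts units read by a byte stream versus a character stream.
class ByteVsCharStreams {

    // Byte stream: every read() returns one raw byte, with no interpretation.
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    // Character stream: the reader decodes bytes into chars (UTF-8 assumed here).
    static int countChars(byte[] data) throws IOException {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }
}
```

For text such as "na\u00efve" encoded in UTF-8, the byte stream sees one more unit than the character stream, because the accented letter occupies two bytes but is a single character.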

How to initialize huge float arrays in java, android?

I am creating an OpenGL Android application and trying to render an OpenGL object with more than 50,000 vertices.
float itemVerts [] = {
// f 231/242/231 132/142/132 131/141/131
0.172233487787643f, -0.0717437751698985f, 0.228589675538813f,
0.176742968653347f, -0.0680393472738536f, 0.2284149434494f,
0.167979223684599f, -0.0670168837233226f, 0.24286384937854f,
// f 131/141/131 230/240/230 231/242/231
0.167979223684599f, -0.0670168837233226f, 0.24286384937854f,
0.166391290343292f, -0.0686544011752973f, 0.241920432968569f,......
and many more. But when I do this in a function or constructor, I get a compile error saying that the code of method () is exceeding the 65535 bytes limit. So I was wondering if there is a different way to do this.
I tried storing the value in file and reading it back. But the IO operation, with string parsing of such huge record is very slow. Takes more than 60 sec. Which is not good.
Please let me know if there is any other way to do this. Thank you for your time and help.
But when i do this in a function or constructor i get a error while
compiling that The code of method () is exceeding the 65535 bytes
limit. So I was wondering if there is a different way to do this.
Put it outside the constructor (as a class variable or field)? If this doesn't change, just make it a constant. If it does change, make it a constant anyway and copy it in the constructor.
I tried storing the value in file and reading it back. But the IO
operation, with string parsing of such huge record is very slow. Takes
more than 60 sec. Which is not good.
If you do decide to keep it in an external file and read it in, don't read it as a string, just serialize it somehow (Java serialization, Protocol Buffers, etc.).
The program doesn't have to parse the floats if we preprocess the data.
Write another program that writes all the floats to a binary file using DataOutputStream.
In your program, read them back using DataInputStream. You might want to chain it with BufferedInputStream.
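A minimal sketch of that preprocessing step and the matching reader. The class FloatFileIO is hypothetical, and the leading element count is an assumption added here so the reader knows how large an array to allocate.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: stores floats as [count:int][float]... in binary form.
class FloatFileIO {

    // Run once, offline, to convert the vertex data to a binary file.
    static void write(Path file, float[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(values.length);
            for (float v : values) {
                out.writeFloat(v);
            }
        }
    }

    // Run in the application: no text parsing, just 4 bytes per float.
    static float[] read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            float[] values = new float[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readFloat();
            }
            return values;
        }
    }
}
```

On Android the input stream would more likely come from an asset or raw resource than from a Path; the DataInputStream part stays the same.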
In these cases I normally use the assets folder to store files in binary format (you can even define some kind of file format that includes the vertices, normals, etc.) and load the data at application initialization, as wannik explains.
I would preprocess and store the floats in binary form, then mmap the file as a byte buffer and create a float array out of it. This way you get the float array without parsing or allocating space.
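A sketch of the memory-mapping idea using java.nio. The class MappedFloats is hypothetical, and big-endian byte order is an assumption that only needs to match between writer and reader. Note that the result is a FloatBuffer view over the mapped file rather than a float[]; that is often sufficient, since buffer-based OpenGL calls accept a java.nio buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: raw floats on disk, viewed through a memory-mapped buffer.
class MappedFloats {

    // Preprocessing step: dump the floats as raw big-endian bytes.
    static void write(Path file, float[] values) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Float.BYTES)
                .order(ByteOrder.BIG_ENDIAN);
        bytes.asFloatBuffer().put(values); // writes through the view into 'bytes'
        Files.write(file, bytes.array());
    }

    // Maps the file and exposes it as floats without parsing or copying.
    static FloatBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
                    .order(ByteOrder.BIG_ENDIAN)
                    .asFloatBuffer();
        }
    }
}
```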

How do I identify that I am at the last byte of a serialized Java object?

Question
What is (if there is any) terminating characters/byte sequences in serialized java objects?
Background
I'm working on a small self-education project where I would like to serialize Java objects and write them to a stream, where they are read and then deserialized. Since I will need to identify the borders between serialized objects, and I can't be sure that the current object is not the last one, is there a terminating character that is always there that I can use as my identifier?
I noticed that there is a magic number ACED that allows me to identify the start of the object, so how do I identify the end?
EDIT:
If there is no terminating character, is there any safe terminating characters/sequences that I can use (insert) to identify the end of the object?
In theory you should always be able to find the end of an object; in practice you cannot. As I understand it, the problem is that customised writeObject implementations that don't call either defaultWriteObject or writeFields produce a non-standard representation.
I've played about with serialisation in the past. Including creating streams for use when I've been doing unusual things to the ObjectInputStream. It's not pleasant(!).
You can read the details in the spec, and the source is worth a read.
There are none. AFAIK the only requirement is that the deserialiser knows when to stop reading, when given a corresponding serialisation. Subject to that, the serialiser can write whatever it wants, in any position, not just the last.
If you're old skool, dump a 32-bit length field at the beginning and refuse to handle objects bigger than 4 gig.
Nu skool, you just make sure your read and your write logic are consistent and don't care about the length.
You can add a terminating object to your object stream. e.g. null or a special String.
However, I suggest that you instead serialize each object to its own byte[] and write the length of the byte[] followed by its data. This way each serialized object is independent and you always know where it finishes.
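The length-then-data suggestion could look roughly like this. The class FramedObjects and its method names are hypothetical; the 4-byte length prefix is one possible choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical framing: [length:int][serialized object bytes] per object.
class FramedObjects {

    static void writeFramed(DataOutputStream out, Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // A fresh ObjectOutputStream per object keeps every frame self-contained.
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        byte[] bytes = buffer.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    static Object readFramed(DataInputStream in) throws IOException, ClassNotFoundException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // the frame ends exactly here, whatever the object wrote
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```

The reader can also skip a frame it is not interested in by skipping length bytes, without deserializing anything.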
Have you considered applying a record-marking layer similar to HTTP Chunked encoding?
The Chunked encoding is intended to solve a generalization of this scenario: identifying the end of a message of indeterminate length that both itself contains no identifiable end, and is embedded in a longer stream without ending it.

Developing a (file) exchange format for java

I want to come up with a binary format for passing data between application instances in a form of POFs (Plain Old Files ;)).
Prerequisites:
should be cross-platform
information to be persisted includes a single POJO & arbitrary byte[]s (files actually, the POJO stores their names in a String[])
only sequential access is required
should be a way to check data consistency
should be small and fast
should prevent an average user with archiver + notepad from modifying the data
Currently I'm using DeflaterOutputStream + OutputStreamWriter together with InflaterInputStream + InputStreamReader to save/restore objects serialized with XStream, one object per file. Readers/Writers use UTF8.
Now I need to extend this to support the requirements described above.
My idea of format:
{serialized to XML object}
{delimiter}
{String file name}{delimiter}{byte[] file data}
{delimiter}
{another String file name}{delimiter}{another byte[] file data}
...
{delimiter}
{delimiter}
{MD5 hash for the entire file}
Does this look sane?
What would you use for a delimiter and how would you determine it?
The right way to calculate MD5 in this case?
What would you suggest to read on the subject?
TIA.
It looks INsane.
why invent a new file format?
why try to prevent only stupid users from changing file?
why use a binary format ( hard to compress ) ?
why use a format that cannot be parsed while being received? (receiver has to receive entire file before being able to act on the file. )
XML is already a serialization format that is compressible. So you are serializing a serialized format.
Would serialization of the model (if you are into MVC) not be another way? I'd prefer to use things in the language (or standard libraries) rather than roll my own if possible. The only issue I can see with that is that the file size may be larger than you want.
1) Does this look sane?
It looks fairly sane. However, if you are going to invent your own format rather than just using Java serialization then you should have a good reason. Do you have any good reasons (they do exist in some cases)? One of the standard reasons for using XStream is to make the result human readable, which a binary format immediately loses. Do you have a good reason for a binary format rather than a human readable one? See this question for why human readable is good (and bad).
Wouldn't it be easier just to put everything in a signed jar? There are already standard Java libraries and tools to do this, and you get compression and verification provided.
2) What would you use for a delimiter and how determine it?
Rather than a delimiter I'd explicitly store the length of each block before the block. It's just as easy, and prevents you having to escape the delimiter if it comes up on its own.
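A sketch of what storing the length before each block might look like for the name-plus-data entries in the question. The class BlockFormat is hypothetical; writeUTF already stores its own 2-byte length before the string, and the 4-byte data length is an assumption.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical entry layout: [name via writeUTF][data length:int][data bytes].
class BlockFormat {

    static void writeEntry(DataOutputStream out, String name, byte[] data) throws IOException {
        out.writeUTF(name);        // length-prefixed string
        out.writeInt(data.length); // explicit block length instead of a delimiter
        out.write(data);
    }

    static Map.Entry<String, byte[]> readEntry(DataInputStream in) throws IOException {
        String name = in.readUTF();
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        return new AbstractMap.SimpleEntry<>(name, data);
    }
}
```

Since nothing is scanned for, the file data may contain any byte sequence and nothing needs escaping.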
3) The right way to calculate MD5 in this case?
There is example code here which looks sensible.
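For reference, a minimal sketch of one way to hash everything that is written and append the digest at the end, using the JDK's MessageDigest and DigestOutputStream. The class TrailingMd5 and its method names are hypothetical, and this is not the code the answer links to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical layout: [payload bytes][16-byte MD5 of the payload].
class TrailingMd5 {

    static byte[] withTrailingMd5(byte[] payload) throws IOException, NoSuchAlgorithmException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Everything written through 'out' is hashed on its way to 'sink'.
        DigestOutputStream out = new DigestOutputStream(sink, md5);
        out.write(payload);
        out.flush();
        // Write the digest to the underlying stream so it is not hashed itself.
        sink.write(md5.digest());
        return sink.toByteArray();
    }

    static boolean verify(byte[] file) throws NoSuchAlgorithmException {
        int bodyLength = file.length - 16; // an MD5 digest is 16 bytes long
        if (bodyLength < 0) {
            return false;
        }
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(file, 0, bodyLength);
        byte[] expected = Arrays.copyOfRange(file, bodyLength, file.length);
        return MessageDigest.isEqual(md5.digest(), expected);
    }
}
```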
4) What would you suggest to read on the subject?
On the subject of serialization? I'd read about the Java serialization, JSON, and XStream serialization so I understood the pros and cons of each, especially the benefits of human readable files. I'd also look at a classic file format, for example from Microsoft, to understand possible design decisions from back in the days that every byte mattered, and how these have been extended. For example: The WAV file format.
Let's see, this should be pretty straightforward.
Prerequisites:
0. should be cross-platform
1. information to be persisted includes a single POJO & arbitrary byte[]s (files actually, the POJO stores their names in a String[])
2. only sequential access is required
3. should be a way to check data consistency
4. should be small and fast
5. should prevent an average user with archiver + notepad from modifying the data
Well, guess what: you pretty much have it already. It's built into the platform: Object Serialization.
If you need to reduce the amount of data sent over the wire and provide custom serialization (for instance, you can send only 1, 2, 3 for a given object, without attribute names or anything similar, and read them back in the same sequence), you can use this somewhat "hidden feature".
If you really need it as plain text you can also encode it; with Base64, as in the example below, it takes roughly a third more bytes.
For instance this bean:
import java.io.*;
public class SimpleBean implements Serializable {
    private String website = "http://stackoverflow.com";
    public String toString() {
        return website;
    }
}
Could be represented like this:
rO0ABXNyAApTaW1wbGVCZWFuPB4W2ZRCqRICAAFMAAd3ZWJzaXRldAASTGphdmEvbGFuZy9TdHJpbmc7eHB0ABhodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20=
See this answer
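A sketch of how such a text representation can be produced with standard classes: serialize to a byte array, then Base64-encode it. The class Base64Serialization is hypothetical and is not the code from the linked answer. The exact text depends on the class being serialized, so no output is shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper: Java serialization wrapped in Base64 text.
class Base64Serialization {

    static String toBase64(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static Object fromBase64(String text) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Any Serializable object, such as the SimpleBean above, can be passed to toBase64. The result always starts with "rO0AB", which is the Base64 form of the serialization stream header.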
Additionally, if you need a sound protocol you can also check out Protobuf, Google's internal exchange format.
You could use a zip (rar / 7z / tar.gz / ...) library. Many exist, most are well tested, and it'll likely save you some time.
Possibly not as much fun though.
I agree in that it doesn't really sound like you need a new format, or a binary one.
If you truly want a binary format, why not consider one of these first:
Binary XML (fast infoset, Bnux)
Hessian
Google Protocol Buffers
But besides that, many textual formats should work just fine (or perhaps better) too; easier to debug, extensive tool support, compresses to about same size as binary (binary compresses poorly, and information theory suggests that for same effective information, same compression rate is achieved -- and this has been true in my testing).
So perhaps also consider:
Json works well; binary support via base64 (with, say, http://jackson.codehaus.org/)
XML not too bad either; efficient streaming parsers, some with base64 support (http://woodstox.codehaus.org/, "typed access API" under 'org.codehaus.stax2.typed.TypedXMLStreamReader').
So it kind of sounds like you just want to build something of your own. Nothing wrong with that, as a hobby, but if so you need to consider it as such.
It likely is not a requirement for the system you are building.
Perhaps you could explain how this is better than using an existing file format such as JAR.
Most standard file formats of this type just use a CRC, as it's faster to calculate. MD5 is more appropriate if you want to prevent deliberate modification.
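For comparison, computing a CRC with the JDK takes only a few lines with java.util.zip.CRC32. The wrapper class Crc32Example is hypothetical.

```java
import java.util.zip.CRC32;

// Hypothetical wrapper around the JDK's CRC-32 implementation.
class Crc32Example {

    // Returns the CRC-32 of the data as an unsigned 32-bit value in a long.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```

A CRC detects accidental corruption but is trivial to recompute, so it offers no protection against someone editing the file on purpose.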
Bencode could be the way to go.
Here's an excellent implementation by Daniel Spiewak.
Unfortunately, bencode spec doesn't support utf8 which is a showstopper for me.
Might come to this later but currently xml seems like a better choice (with blobs serialized as a Map).
