Java Automatic Field Value Serialization

Let's say we have a struct-like Java class
public class Person {
private int height;
private byte nChildren;
private int salary;
public byte[] serializeField() {
ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4);
buf.order(ByteOrder.BIG_ENDIAN);
buf.putInt(height);
buf.put(nChildren);
buf.putInt(salary);
return buf.array();
}
/*
* setters and getters
*/
}
Is there a library that can perform the serializeField() function automatically for any given class? It should be able to maintain the exact order of the fields as defined in the class and perhaps have the ability to ignore certain fields (like the serialVersionUID).

You can write one using reflection.
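A minimal sketch of such a reflection-based serializer, assuming only int, long and byte fields (the class and method names here are made up for illustration; note that getDeclaredFields() does not formally guarantee declaration order, so a real implementation should pin the field order down explicitly, e.g. with an annotation):
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class ReflectiveSerializer {
    public static byte[] serializeFields(Object obj) throws IllegalAccessException {
        Field[] fields = obj.getClass().getDeclaredFields();
        // First pass: work out the buffer size, skipping static fields such as serialVersionUID.
        int size = 0;
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            size += sizeOf(f.getType());
        }
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.BIG_ENDIAN);
        // Second pass: write each field value in declaration order.
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == int.class) buf.putInt(f.getInt(obj));
            else if (t == long.class) buf.putLong(f.getLong(obj));
            else if (t == byte.class) buf.put(f.getByte(obj));
            else throw new IllegalArgumentException("Unsupported field type: " + t);
        }
        return buf.array();
    }

    private static int sizeOf(Class<?> t) {
        if (t == int.class) return 4;
        if (t == long.class) return 8;
        if (t == byte.class) return 1;
        throw new IllegalArgumentException("Unsupported field type: " + t);
    }
}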
If performance matters, you can pass in a ByteBuffer to append to, to avoid creating lots of buffers that you then discard:
public void serializeField(ByteBuffer buf) {
buf.putInt(height);
buf.put(nChildren);
buf.putInt(salary);
}
You can also generate your code ahead of time (using reflection etc.) to avoid using reflection at runtime, i.e. use reflection once to generate the code above or the next example; a rough sketch of such a generator follows.
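A rough sketch of such a generator, which prints the serializeField(ByteBuffer) source for a class so it can be pasted or compiled in (the SerializerGenerator name is made up; it assumes the Person class above and only handles int, long and byte fields):
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public final class SerializerGenerator {
    public static String generate(Class<?> clazz) {
        StringBuilder sb = new StringBuilder("public void serializeField(ByteBuffer buf) {\n");
        for (Field f : clazz.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue; // skip serialVersionUID etc.
            Class<?> t = f.getType();
            if (t == int.class) sb.append("    buf.putInt(").append(f.getName()).append(");\n");
            else if (t == long.class) sb.append("    buf.putLong(").append(f.getName()).append(");\n");
            else if (t == byte.class) sb.append("    buf.put(").append(f.getName()).append(");\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(Person.class)); // emits the hand-written method shown above
    }
}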
Alternatively, you can map the fields' getters and setters onto a ByteBuffer so there is no serialization/deserialization as such.
Say you have an object with a width and a height. In the example below the values are read from and written to a ByteBuffer (e.g. a direct buffer or a memory-mapped file), so there is no serialization overhead as such.
class Sized {
private ByteBuffer buffer;
private static final int HEIGHT_OFFSET = 0;
private static final int WIDTH_OFFSET = 4;
public void setBuffer(ByteBuffer buffer) {this.buffer = buffer; }
public int getHeight() { return buffer.getInt(HEIGHT_OFFSET); }
public void setHeight(int height) { buffer.putInt(HEIGHT_OFFSET, height); }
public int getWidth() { return buffer.getInt(WIDTH_OFFSET); }
public void setWidth(int width) { buffer.putInt(WIDTH_OFFSET, width); }
}
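A brief usage sketch of the class above (the 8-byte buffer just covers the two int fields):
ByteBuffer buffer = ByteBuffer.allocateDirect(8); // or a MappedByteBuffer over a file
Sized sized = new Sized();
sized.setBuffer(buffer);
sized.setHeight(100);
sized.setWidth(50);
System.out.println(sized.getHeight() + " x " + sized.getWidth()); // reads straight from the buffer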
To handle the memory-mapped data you can use a library like this one:
https://github.com/peter-lawrey/Java-Chronicle
This library allows you to create any number of events/records (billions) which can be updated (but not grown in size once written) very quickly, e.g. millions of records per second with sub-microsecond latency. It can also be shared between processes. Once you exceed your main memory, the main limitation becomes the speed of your hard drive, so an SSD really helps. It also has a mode you can simply enable where it uses the Unsafe class directly (more low-level than ByteBuffer).

Related

Calling Module32First Gives Invalid Parameter (Error Code 87)

I am using JNA and writing some code to enumerate through all of the modules available in a process. I successfully obtain the snapshot handle via CreateToolhelp32Snapshot, however on my first Module32First call, I am getting error code 87 for "Invalid Parameter".
Below is the relevant code that I am using:
private void getModule() {
Pointer hSnapshot = this.kernel32.CreateToolhelp32Snapshot(WinAPI.TH32CS_SNAPMODULE, this.processId);
if(hSnapshot != null) {
WinAPI.MODULEENTRY32 moduleEntry32 = new WinAPI.MODULEENTRY32();
moduleEntry32.write(); // Write the struct in memory (for dwSize to be registered)
Pointer moduleEntry32Pointer = moduleEntry32.getPointer();
if(this.kernel32.Module32First(hSnapshot, moduleEntry32Pointer)) {
// irrelevant stuff here
}
}
System.out.println(this.kernel32.GetLastError()); // Prints out "87"
this.kernel32.CloseHandle(hSnapshot);
}
Kernel32 Mapper Class:
Pointer CreateToolhelp32Snapshot(int dwFlags, int th32ProcessID);
boolean Module32First(Pointer hSnapshot, Pointer lpme);
boolean Module32Next(Pointer hSnapshot, Pointer lpme);
ModuleEntry32 Structure:
public static class MODULEENTRY32 extends Structure {
public int dwSize = this.size();
public int th32ModuleID;
public int th32ProcessID;
public int GlblcntUsage;
public int ProccntUsage;
public Pointer modBaseAddr;
public int modBaseSize;
public Pointer hModule;
public String szModule;
public String szExePath;
public MODULEENTRY32() {
super();
}
public MODULEENTRY32(Pointer p) {
super(p);
}
protected List<String> getFieldOrder() {
return Arrays.asList(new String[] {"dwSize", "th32ModuleID", "th32ProcessID", "GlblcntUsage", "ProccntUsage", "modBaseAddr", "modBaseSize", "hModule", "szModule", "szExePath"});
}
}
And finally, this is "WinAPI.TH32CS_SNAPMODULE":
public static final int TH32CS_SNAPMODULE = 0x00000008;
Note: The process I am enumerating is already opened with OpenProcess and the handle is valid. The processId is valid as well and is obtained properly.
Any help would be appreciated.
Mapping the second argument as a Pointer, while technically correct, forces you to do a lot of extra work writing the structure yourself. It is better to simply use the structure type as the argument. Structures are treated as .ByReference when used as function/method arguments, and JNA handles all of the auto-read and auto-write for you. So if you do this, you can omit your write() call:
boolean Module32First(Pointer hSnapshot, WinAPI.MODULEENTRY32 lpme);
boolean Module32Next(Pointer hSnapshot, WinAPI.MODULEENTRY32 lpme);
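With those mappings the call site simplifies to something like this (a sketch using the question's own kernel32 interface and WinAPI constants):
Pointer hSnapshot = this.kernel32.CreateToolhelp32Snapshot(WinAPI.TH32CS_SNAPMODULE, this.processId);
if (hSnapshot != null) {
    WinAPI.MODULEENTRY32 moduleEntry32 = new WinAPI.MODULEENTRY32();
    // No explicit write()/read() needed: JNA auto-writes the structure before the
    // native call and auto-reads it afterwards when it is passed by reference.
    if (this.kernel32.Module32First(hSnapshot, moduleEntry32)) {
        // use moduleEntry32 here, then loop with Module32Next(...)
    }
    this.kernel32.CloseHandle(hSnapshot);
}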
That said, the comment on your write call points to the root cause here: you need to set the dwSize value to the size of the structure. However, because of the order of initialization of Java Objects, the Structure does not have the information it needs to determine the size when initializing the dwSize variable, so this part is not giving you the correct size:
public int dwSize = this.size();
To solve this, just declare that variable and set the size later on in the constructor, e.g.,
public static class MODULEENTRY32 extends Structure {
public int dwSize;
// other declarations
public MODULEENTRY32() {
super();
dwSize = size();
}
}
In addition your mappings for szModule and szExePath are incorrect. You've only provided Pointers (a total of 16 bytes) in the structure's memory allocation, but these are fixed length character arrays. They should be defined:
public char[] szModule = new char[MAX_MODULE_NAME32 + 1];
public char[] szExePath = new char[MAX_PATH];
Note that the char[] mapping is for the Unicode (_W) version of the API, which is a reasonably safe assumption.
But rather than writing your own, you should use the mappings that already exist in the user-contributed jna-platform artifact. The MODULEENTRY32W type is already there (yes, it uses DWORD, so your version is simpler, but you can look at its code to see how they handled the size), and the Module32FirstW and Module32NextW functions are also mapped. (The W suffix is for wide characters (Unicode), which all modern Windows systems use now.)
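A rough sketch of what that looks like with the jna-platform classes; exact class and method names should be checked against the JNA version you use, and this assumes the Kernel32, Tlhelp32 and WinDef mappings from recent jna-platform releases:
import com.sun.jna.Native;
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.Tlhelp32;
import com.sun.jna.platform.win32.WinDef;
import com.sun.jna.platform.win32.WinNT;

public class ListModules {
    public static void listModules(int processId) {
        Kernel32 kernel32 = Kernel32.INSTANCE;
        WinNT.HANDLE snapshot = kernel32.CreateToolhelp32Snapshot(
                Tlhelp32.TH32CS_SNAPMODULE, new WinDef.DWORD(processId));
        // Error handling (INVALID_HANDLE_VALUE, GetLastError) omitted for brevity.
        try {
            Tlhelp32.MODULEENTRY32W entry = new Tlhelp32.MODULEENTRY32W();
            if (kernel32.Module32FirstW(snapshot, entry)) {
                do {
                    System.out.println(Native.toString(entry.szModule));
                } while (kernel32.Module32NextW(snapshot, entry));
            }
        } finally {
            kernel32.CloseHandle(snapshot);
        }
    }
}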

Array of structures in a structure in JNA

My native code is
typedef struct driver_config {
unsigned int dllVersion;
unsigned int channelCount;
unsigned int reserved[10];
ChannelConfig channel[64];
} DriverConfig;
In Java my class looks like this
public class DriverConfig extends Structure {
public int dllVersion;
public int channelCount;
public int[] reserved= new int[10];
ChannelConfig[] channel = new ChannelConfig[64];
public DriverConfig() {
super();
init();
}
private void init() {
for (int i = 0; i < channel.length; i++) {
channel[i]= new ChannelConfig();
}
}
@Override
protected List<String> getFieldOrder() {
return Arrays.asList(new String[] { "dllVersion", "channelCount", "reserved" });
}
//toString()...
}
The method declaration is
int getDriverConfig(DriverConfig driverConfig);
I tried to access the method like this
DriverConfig driverConfig = new DriverConfig();
status = dll.INSTANCE.getDriverConfig(driverConfig);
System.out.println("DriverConfig Status: " + status);
System.out.println(driverConfig.toString());
If channel.length is replaced with a value of less than 50, the array is initialized correctly, but with channel.length it does not work. It does not even show any error, just nothing.
Your getFieldOrder() array does not include the last element (channel) of your structure. I see in your comments that you attempted to do this but received an error because you have not declared it public. All elements of your structure must be listed in the FieldOrder and also declared public so they can be found with reflection.
Also, with JNA 5.x (which you should be using) the @FieldOrder annotation is preferred.
You haven't identified the mapping for ChannelConfig, but your question title and this API link matching your structure indicate that it is a nested structure array. Structure arrays must be allocated using contiguous memory, either by directly allocating the native memory (new Memory()) which requires knowing the structure size, or by using Structure.toArray(). Allocating in a loop as you have done will end up with memory for each new structure allocated at possibly/probably non-contiguous locations in native memory. Given that you state that it appears to work for some values, you might be getting lucky with contiguous allocations, but your behavior is certainly undefined.
Your structure mapping should therefore be:
@FieldOrder({"dllVersion", "channelCount", "reserved", "channel"})
public class DriverConfig extends Structure {
public int dllVersion;
public int channelCount;
public int[] reserved= new int[10];
public ChannelConfig[] channel = (ChannelConfig[]) new ChannelConfig().toArray(64);
}

Library for serializing java objects to fixed-width byte arrays

I would like to store a very simple pojo object in binary format:
public class SampleDataClass {
private long field1;
private long field2;
private long field3;
}
To do this, I have written a simple serialize/deserialize pair of methods:
public class SampleDataClass {
// ... Fields as above
public static void deserialize(ByteBuffer buffer, SampleDataClass into) {
into.field1 = buffer.getLong();
into.field2 = buffer.getLong();
into.field3 = buffer.getLong();
}
public static void serialize(ByteBuffer buffer, SampleDataClass from) {
buffer.putLong(from.field1);
buffer.putLong(from.field2);
buffer.putLong(from.field3);
}
}
Simple and efficient, and most importantly the size of the objects in binary format is fixed. I know the size of each serialized record will be 3 x long, i.e. 3 x 8 bytes = 24 bytes.
This is crucial, as I will be recording these sequentially and I need to be able to find them by index later on, i.e. "Find me the 127th record".
This is working fine for me, but I hate the boilerplate, and the fact that at some point I'm going to make a mistake and end up writing a load of data that can't be read back because there's an inconsistency between my serialize/deserialize methods.
Is there a library that can generate something like this for me?
Ideally I'm looking for something like protobuf, with a fixed-length encoding scheme. Later on, I'd like to encode strings too. These will also have a fixed length. If a string exceeds the length, it's truncated to n bytes. If a string is too short, I'll null-terminate it (or similar).
Finally, protobuf supports different versions of the protocol. It is inevitable I'll need to do that eventually.
I was hoping someone had a suggestion before I start rolling my own.
Make your class implement the java.io.Serializable interface. Then you can use java.io.ObjectOutputStream and java.io.ObjectInputStream to serialize/deserialize objects to/from streams; wrap a ByteArrayOutputStream/ByteArrayInputStream if you need the result as a byte array.
To make it fixed length, standardize the size of the byte[] arrays used.
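A minimal sketch of that padding idea, assuming the serialized form always fits within a chosen record size (the FixedWidth name and RECORD_SIZE value are illustrative; standard Java serialization is not fixed-size by design, so the fit has to be verified for your class):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public final class FixedWidth {
    static final int RECORD_SIZE = 256; // chosen arbitrarily for illustration

    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        if (bytes.size() > RECORD_SIZE) {
            throw new IOException("Record too large: " + bytes.size() + " bytes");
        }
        // Pad with zeros up to the fixed record size; ObjectInputStream simply
        // ignores the trailing bytes when reading the object back.
        return Arrays.copyOf(bytes.toByteArray(), RECORD_SIZE);
    }

    public static Object deserialize(byte[] record) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(record))) {
            return in.readObject();
        }
    }
}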
The most difficult part here is capping your strings or collections. You can do this for Strings with Kryo by overriding the default serializers. Placing strings into a custom buffer class (e.g. a FixedSerializableBuffer) which stores, or is annotated with, the length to cut to also makes sense.
public class KryoDemo {
static class Foo{
String s;
long v;
Foo() {
}
Foo(String s, long v) {
this.s = s;
this.v = v;
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("Foo{");
sb.append("s='").append(s).append('\'');
sb.append(", v=").append(v);
sb.append('}');
return sb.toString();
}
}
public static void main(String[] args) {
Kryo kryo = new Kryo();
Foo foo = new Foo("test string", 1);
kryo.register(String.class, new Serializer<String>() {
{
setImmutable(true);
setAcceptsNull(true);
}
public void write(Kryo kryo, Output output, String s) {
if (s.length() > 4) {
s = s.substring(0, 4);
}
output.writeString(s);
}
public String read(Kryo kryo, Input input, Class<String> type) {
return input.readString();
}
});
// serialization part, data is binary inside this output
ByteBufferOutput output = new ByteBufferOutput(100);
kryo.writeObject(output, foo);
System.out.println("before: " + foo);
System.out.println("after: " + kryo.readObject(new Input(output.toBytes()), Foo.class));
}
}
This prints:
before: Foo{s='test string', v=1}
after: Foo{s='test', v=1}
If the only additional requirement over standard serialization is efficient random access to the n-th entry, there are alternatives to fixed-size entries, and the fact that you will be storing variable-length entries (such as strings) makes me think that these alternatives deserve consideration.
One such alternative is to have a "directory" with fixed length entries, each of which points to the variable length content. Random access to an entry is then implemented by reading the corresponding pointer from the directory (which can be done with random access, as the directory entries are fixed size), and then reading the block it points to. This approach has the disadvantage that an additional I/O access is required to access the data, but permits a more compact representation of the data, as you don't have to pad variable length content, which in turn speeds up sequential reading. Of course, neither the problem nor the above solution is novel - file systems have been around for a long time ...
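A hypothetical sketch of that directory idea, with one long offset per record at the front of a file (the class name and layout are illustrative, not any particular library; unwritten directory slots would need validation in real code):
import java.io.IOException;
import java.io.RandomAccessFile;

// A fixed-size directory of long offsets, followed by variable-length records.
public final class DirectoryFile {
    private final RandomAccessFile file;
    private final long maxRecords;

    public DirectoryFile(RandomAccessFile file, long maxRecords) {
        this.file = file;
        this.maxRecords = maxRecords;
    }

    // Append the record bytes and store their offset in directory slot n.
    public void write(long n, byte[] record) throws IOException {
        long offset = Math.max(file.length(), maxRecords * 8L); // data region starts after the directory
        file.seek(offset);
        file.writeInt(record.length);
        file.write(record);
        file.seek(n * 8L); // directory entry n sits at a fixed position
        file.writeLong(offset);
    }

    // Random access: one read of the directory entry, then one read of the record.
    public byte[] read(long n) throws IOException {
        file.seek(n * 8L);
        long offset = file.readLong();
        file.seek(offset);
        byte[] record = new byte[file.readInt()];
        file.readFully(record);
        return record;
    }
}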

Are there any tricks to reduce memory usage when storing String data type in hashmap?

I need to store value pair (word and number) in the Map.
I am trying to use TObjectIntHashMap from the Trove library with char[] as the key, because I need to minimize the memory usage. But with this approach I cannot get the value back when I use the get() method.
I guess I cannot use a primitive char array as a Map key because of hashCode issues.
I tried to use TCharArrayList but that also takes a lot of memory.
I read another Stack Overflow question similar to my purpose, with a suggestion to use TLongIntHashMap and store the encoded value of the String word in a long. In my case the words may contain Latin characters or various other characters that appear in Wikipedia collections, so I do not know whether a long is enough to encode them.
I have tried using a trie data structure to store them, but I also need to consider performance and choose the best trade-off between memory usage and speed.
Do you have any idea or suggestion for this issue?
It sounds like the most compact way to store the data is to use a byte[] encoded in UTF-8 or similar. You can wrap this in your own class, or write your own HashMap which allows byte[] as a key.
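For example, a minimal wrapper (the Utf8Key name is made up) that gives the byte[] the equals() and hashCode() it needs to work as a map key:
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Wraps a UTF-8 encoded byte[] so it can be used as a HashMap (or TObjectIntHashMap) key.
public final class Utf8Key {
    private final byte[] bytes;

    public Utf8Key(String word) {
        this.bytes = word.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8Key && Arrays.equals(bytes, ((Utf8Key) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
Used with Trove, a TObjectIntHashMap<Utf8Key> keeps the int values primitive while the keys stay compact.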
I would reconsider how much time it is worth spending to save some memory. If you are talking about a PC or server, at minimum wage saving 1 GB is worth about an hour's work, so if you are only looking to save 100 MB, that's about 6 minutes including testing.
Write your own class that implements CharSequence, and write your own implementation of equals() and hashCode(). The implementation would also pre-allocate large shared char[] storage, and use bits of it at a time. (You can definitely incorporate @Peter Lawrey's excellent suggestion into this, too, and use byte[] storage.)
There's also an opportunity to do a 'soft intern()' using an LRU cache. I've noted where the cache would go.
Here's a simple demonstration of what I mean. Note that if you need heavily concurrent writes, you can try to improve the locking scheme below...
public final class CompactString implements CharSequence {
private final char[] _data;
private final int _offset;
private final int _length;
private final int _hashCode;
private static final Object _lock = new Object();
private static char[] _storage;
private static int _nextIndex;
private static final int LENGTH_THRESHOLD = 128;
private CompactString(char[] data, int offset, int length, int hashCode) {
_data = data; _offset = offset; _length = length; _hashCode = hashCode;
}
private static final CompactString EMPTY = new CompactString(new char[0], 0, 0, "".hashCode());
private static void allocateStorage() {
synchronized (_lock) {
_storage = new char[1024];
_nextIndex = 0;
}
}
private static CompactString storeInShared(String value) {
synchronized (_lock) {
if (_nextIndex + value.length() > _storage.length) {
allocateStorage();
}
int start = _nextIndex;
// You would need to change this loop and length to do UTF encoding.
for (int i = 0; i < value.length(); ++i) {
_storage[_nextIndex++] = value.charAt(i);
}
return new CompactString(_storage, start, value.length(), value.hashCode());
}
}
static {
allocateStorage();
}
public static CompactString valueOf(String value) {
// You can implement a soft .intern-like solution here.
if (value == null) {
return null;
} else if (value.length() == 0) {
return EMPTY;
} else if (value.length() > LENGTH_THRESHOLD) {
// You would need to change .toCharArray() and length to do UTF encoding.
return new CompactString(value.toCharArray(), 0, value.length(), value.hashCode());
} else {
return storeInShared(value);
}
}
// left to reader: implement equals etc.
}
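The parts marked "left to reader" could look something like this (a sketch of methods to be added inside CompactString; equality walks the shared storage directly, and the hash code is the one cached from String at construction time):
@Override
public int length() {
    return _length;
}

@Override
public char charAt(int index) {
    if (index < 0 || index >= _length) throw new IndexOutOfBoundsException("index: " + index);
    return _data[_offset + index];
}

@Override
public CharSequence subSequence(int start, int end) {
    String s = toString().substring(start, end);
    return new CompactString(_data, _offset + start, end - start, s.hashCode());
}

@Override
public String toString() {
    return new String(_data, _offset, _length);
}

@Override
public int hashCode() {
    return _hashCode;
}

@Override
public boolean equals(Object o) {
    if (!(o instanceof CompactString)) return false;
    CompactString other = (CompactString) o;
    if (_length != other._length || _hashCode != other._hashCode) return false;
    for (int i = 0; i < _length; i++) {
        if (_data[_offset + i] != other._data[other._offset + i]) return false;
    }
    return true;
}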

Can I allocate objects contiguously in java?

Assume I have a large array of relatively small objects, which I need to iterate frequently.
I would like to optimize my iteration by improving cache performance, so I would like to allocate the objects [and not just the references] contiguously in memory; that way I'll get fewer cache misses, and the overall performance could be significantly better.
In C++, I could just allocate an array of the objects and it would lay them out as I wanted, but in Java, when allocating an array, I only allocate the references, and the allocation is done one object at a time.
I am aware that if I allocate the objects "at once" [one after the other], the JVM is most likely to allocate the objects as contiguously as it can, but that might not be enough if the memory is fragmented.
My questions:
Is there a way to tell the JVM to defragment the memory just before I start allocating my objects? Will that be enough to ensure [as much as possible] that the objects will be allocated contiguously?
Is there a different solution to this issue?
New objects are created in the Eden space. The Eden space is never fragmented; it is always empty after a GC.
The problem you have is that when a GC is performed, objects can be rearranged randomly in memory, or even, surprisingly, in the reverse order in which they are referenced.
A workaround is to store the fields as a series of arrays. I call this a column-based table instead of a row-based table.
e.g. Instead of writing
class PointCount {
double x, y;
int count;
}
PointCount[] pc = ...; // an array of lots of small objects
use columns based data types.
class PointCounts {
double[] xs, ys;
int[] counts;
}
or
class PointCounts {
TDoubleArrayList xs, ys;
TIntArrayList counts;
}
The arrays themselves could be in up to three different places, but the data within each array is always contiguous. This can even be marginally more efficient if you perform operations on a subset of the fields.
public int totalCount() {
int sum = 0;
// counts are contiguous, with nothing between the values.
for(int i: counts) sum += i;
return sum;
}
A solution I use to avoid GC overhead when holding large amounts of data is to use an interface to access a direct or memory-mapped ByteBuffer:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
public class MyCounters {
public static void main(String... args) {
Runtime rt = Runtime.getRuntime();
long used1 = rt.totalMemory() - rt.freeMemory();
long start = System.nanoTime();
int length = 100 * 1000 * 1000;
PointCount pc = new PointCountImpl(length);
for (int i = 0; i < length; i++) {
pc.index(i);
pc.setX(i);
pc.setY(-i);
pc.setCount(1);
}
for (int i = 0; i < length; i++) {
pc.index(i);
if (pc.getX() != i) throw new AssertionError();
if (pc.getY() != -i) throw new AssertionError();
if (pc.getCount() != 1) throw new AssertionError();
}
long time = System.nanoTime() - start;
long used2 = rt.totalMemory() - rt.freeMemory();
System.out.printf("Creating an array of %,d used %,d bytes of heap and tool %.1f seconds to set and get%n",
length, (used2 - used1), time / 1e9);
}
}
interface PointCount {
// set the index of the element referred to.
public void index(int index);
public double getX();
public void setX(double x);
public double getY();
public void setY(double y);
public int getCount();
public void setCount(int count);
public void incrementCount();
}
class PointCountImpl implements PointCount {
static final int X_OFFSET = 0;
static final int Y_OFFSET = X_OFFSET + 8;
static final int COUNT_OFFSET = Y_OFFSET + 8;
static final int LENGTH = COUNT_OFFSET + 4;
final ByteBuffer buffer;
int start = 0;
PointCountImpl(int count) {
this(ByteBuffer.allocateDirect(count * LENGTH).order(ByteOrder.nativeOrder()));
}
PointCountImpl(ByteBuffer buffer) {
this.buffer = buffer;
}
@Override
public void index(int index) {
start = index * LENGTH;
}
@Override
public double getX() {
return buffer.getDouble(start + X_OFFSET);
}
@Override
public void setX(double x) {
buffer.putDouble(start + X_OFFSET, x);
}
@Override
public double getY() {
return buffer.getDouble(start + Y_OFFSET);
}
@Override
public void setY(double y) {
buffer.putDouble(start + Y_OFFSET, y);
}
@Override
public int getCount() {
return buffer.getInt(start + COUNT_OFFSET);
}
@Override
public void setCount(int count) {
buffer.putInt(start + COUNT_OFFSET, count);
}
@Override
public void incrementCount() {
setCount(getCount() + 1);
}
}
Run with the -XX:-UseTLAB option (to get accurate memory allocation sizes), this prints:
Creating an array of 100,000,000 used 12,512 bytes of heap and took 1.8 seconds to set and get
As it's off heap, it has next to no GC impact.
Sadly, there is no way of ensuring objects are created/stay at adjacent memory locations in Java.
However, objects created in sequence will most likely end up adjacent to each other (of course this depends on the actual VM implementation). I'm pretty sure that the writers of the VM are aware that locality is highly desirable and don't go out of their way to scatter objects randomly around.
The Garbage Collector will probably move the objects at some point. If your objects are short-lived, that should not be an issue. For long-lived objects it then depends on how the GC implements moving the survivor objects. Again, I think it's reasonable to assume that the people writing the GC have given the matter some thought and will perform copies in a way that does not hurt locality more than is unavoidable.
There are obviously no guarantees for any of the above assumptions, but since we can't do anything about it anyway, stop worrying :)
The only thing you can do at the java source level is to sometimes avoid composition of objects - instead you can "inline" the state you would normally put in a composite object:
class MyThing {
int myVar;
// ... more members
// composite object
Rectangle bounds;
}
instead:
class MyThing {
int myVar;
// ... more members
// "inlined" rectangle
int x, y, width, height;
}
Of course this makes the code less readable and potentially duplicates a lot of code.
Ordering class members by access pattern seems to have a slight effect (I noticed a slight alteration in a benchmarked piece of code after I had reordered some declarations), but I've never bothered to verify whether it's true. It would make sense, though, if the VM does not reorder members.
On the same topic, it would also be nice (from a performance point of view) to be able to reinterpret an existing primitive array as another type (e.g. cast int[] to float[]). And while you're at it, why not wish for union members as well? I sure do.
But we'd have to give up a lot of platform and architecture independency in exchange for these possibilities.
Doesn't work that way in Java. Iteration is not a matter of increasing a pointer. There is no performance impact based on where on the heap the objects are physically stored.
If you still want to approach this in a C/C++ way, think of a Java array as an array of pointers to structs. When you loop over the array, it doesn't matter where the actual structs are allocated, you are looping over an array of pointers.
I would abandon this line of reasoning. It's not how Java works and it's also sub-optimization.
