Reflection, hashtable or methods for performance? - java

I'm trying to write a Java program to decode and encode Ogg streams. I've got a decoder working, but I didn't like the fact that I had duplicate code, so I started writing something like this:
Decoder oggDecoder = new Decoder(
new StringDecoder( "Ogg" ),
new IntDecoder( "something" )//, ...
);
I wrote encoders and decoders for some "types" and then use them to build the whole thing.
But then I don't know how to store the result. I have 3 options I know:
- keep the data in an array of bytes and provide a get( String name ) and set( String name, Object value ) methods that will work directly on the bytes.
- use a dictionary.
- use a class and use reflection to set the properties.
I'm not too concerned about performance: I don't mind it being slow as long as it's fast enough to play music. I know that writing the accessor functions by hand would make it faster, but I want to write just one function that works for all the properties.
So what do you think would be the fastest way of doing this?
Another way to ask this question would be:
Given a set of field names as an array of String, what is the most appropriate data structure to store the corresponding values that got decoded from a byte stream:
- keep them as bytes
- store them in a dictionary
- store them in a class using reflection
Thanks in advance for your answer.

KISS - just use a HashMap<String, byte[]>. No reflection needed.
Update
I don't think I understood at first what you want, but now I think what you are looking for is a heterogeneous map structure.
Here's a question that might be of more use to you.
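As a rough illustration of the heterogeneous map idea, here is a minimal sketch. The class name DecodedFields and its get/set methods are my own invention for this example, not part of any library: values are stored as Object and cast back on read using a Class token, so a single generic accessor covers every field without reflection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container for decoded fields; the name is illustrative only.
final class DecodedFields {
    // LinkedHashMap keeps the fields in the order they were decoded.
    private final Map<String, Object> values = new LinkedHashMap<>();

    void set(String name, Object value) {
        values.put(name, value);
    }

    // Returns the value cast to the requested type.
    // Throws IllegalArgumentException if the field is missing and
    // ClassCastException if it holds a different type.
    <T> T get(String name, Class<T> type) {
        Object value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("No such field: " + name);
        }
        return type.cast(value);
    }
}
```

Usage would look like fields.set("version", 0) followed by fields.get("version", Integer.class). The map lookup per access is very unlikely to matter for audio decoding, but that is something to measure rather than assume.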

Related

How to modify a large file with small content changes at specific indexes

I need to modify a file. We've already written a reasonably complex component to build sets of indexes describing where interesting things are in this file, but now I need to edit this file using that set of indexes and that's proving difficult.
Specifically, my dream API is something like this
//if you'll let me use kotlin for a second, assume we have a simple tuple class
data class IdentifiedCharacterSubsequence(val indexOfFirstChar: Int, val existingContent: String)
//given these two structures
List<IdentifiedCharacterSubsequence> interestingSpotsInFile = scanFileAsPerExistingBusinessLogic(file, businessObjects);
Map<IdentifiedCharacterSubsequence, String> newContentByPreviousContentsLocation = generateNewValues(interestingSpotsInFile, moreBusinessObjects);
//I want something like this:
try(MutableFile mutableFile = new com.maybeGoogle.orApache.MutableFile(file)){
    for(IdentifiedCharacterSubsequence seqToReplace : interestingSpotsInFile){
        String newContent = newContentByPreviousContentsLocation.get(seqToReplace);
        mutableFile.replace(seqToReplace.indexOfFirstChar, seqToReplace.existingContent.length(), newContent);
        //very similar to StringBuilder interface
        //'enqueues' data changes in memory, doesn't actually modify file until flush call...
    }
    mutableFile.flush();
    // ...at which point a single write-pass is made.
    // assumption: changes will change many small regions of text (instead of large portions of text)
    // -> buffering makes sense
}
Some notes:
I can't use RandomAccessFile because my changes are not in-place (the length of newContent may be longer or shorter than that of seq.existingContent)
The files are often many megabytes in size, so simply reading the whole thing into memory and modifying it as an array is not appropriate.
Does something like this exist, or am I reduced to writing my own implementation using BufferedWriters and the like? It seems like such an obvious evolution from io.Streams for a language which typically emphasizes index-based behaviour heavily, but I can't find an existing implementation.
Lastly: I have very little domain experience with files and encoding schemes, so I have made no effort to address the 'two-index' characters described in questions like these: Java charAt used with characters that have two code units. Any help on this front is much appreciated. Is this perhaps why I'm having trouble finding an implementation like this? Because indexes in UTF-8 encoded files are so pesky and bug-prone?
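For what it's worth, the "enqueue edits, then make one write pass" idea can be sketched with plain Readers and Writers. This is not an existing library class: QueuedReplacer and its methods are hypothetical names. It assumes the indexes count Java chars (UTF-16 code units) of the decoded text, that edits do not overlap, and that the output goes to a different file than the input (for example a temporary file that is renamed afterwards).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: queues replacements in memory, then applies them in one sequential pass.
final class QueuedReplacer {
    private static final class Edit {
        final long start;
        final int length;
        final String replacement;

        Edit(long start, int length, String replacement) {
            this.start = start;
            this.length = length;
            this.replacement = replacement;
        }
    }

    private final List<Edit> edits = new ArrayList<>();

    // Records a replacement of `length` chars starting at char index `start`; nothing is written yet.
    void replace(long start, int length, String replacement) {
        edits.add(new Edit(start, length, replacement));
    }

    // Copies `in` to `out`, substituting the queued edits in ascending index order.
    void apply(Reader in, Writer out) throws IOException {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong((Edit e) -> e.start));
        long pos = 0;
        for (Edit e : sorted) {
            if (e.start < pos) {
                throw new IllegalStateException("Overlapping edit at index " + e.start);
            }
            transfer(in, out, e.start - pos);   // untouched text before the edit
            transfer(in, null, e.length);       // skip the old content
            out.write(e.replacement);
            pos = e.start + e.length;
        }
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {      // copy the remainder of the input
            out.write(buf, 0, n);
        }
    }

    // Reads exactly `count` chars from `in`; writes them to `out` unless `out` is null.
    private static void transfer(Reader in, Writer out, long count) throws IOException {
        char[] buf = new char[8192];
        long remaining = count;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                throw new EOFException("Edit extends past end of input");
            }
            if (out != null) {
                out.write(buf, 0, n);
            }
            remaining -= n;
        }
    }
}
```

Only the queued edits and one 8 KB buffer are held in memory, so file size should not matter. Opening the file through a Reader with the correct charset is what keeps the indexes consistent with String indexes; byte offsets into a UTF-8 file would need a different scheme.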

Java - How to read a custom map format

I am trying to create a custom map format for my own little 2D RPG, so my question is how to read and write a custom map format properly and flexibly. First off, I am writing my code in Java. The idea was to have a class called 'TileMap'. This class defines a 2-dimensional integer array where all my entities are stored (I'm using an entity system to implement my game). I also want to save and parse some information about the size of the map before the actual reading process happens. The map file should look much like this:
#This is a test map
width=4
height=3
layercount=1
tilesize=32
[1;0;0;0]
[23;1;0;0]
[5;0;1;0]
where layercount is the number of layers in the z-dimension and tilesize is the size of every tile in pixels. Entities are defined between the brackets. The pattern is: [entity_id;x_pos;y_pos;z_pos]. I already wrote the code to parse a file like this, but it's not very flexible: a single whitespace character in front of the square brackets is enough to stop the map from loading. I just need a few helpful tips to do this in a flexible way. Can anybody help me out?
I think there are 3 different ways to solve this:
First, you can use a Map of Maps: Map<Serializable,Map<String,Object>>, where the Serializable key is your entity_id and the inner map holds the attributes that you need, like ("width",4), ("height",3):
public static final String WIDTH = "WIDTH";
public static final String HEIGHT = "HEIGHT";
...
Map<String,Object> mapProperties = new HashMap<String,Object>();
mapProperties.put(WIDTH, 4);
mapProperties.put(HEIGHT, 3);
....
Map<Serializable,Map<String,Object>> map = new HashMap<Serializable,Map<String,Object>>();
map.put(myEntity.getId(), mapProperties);
Second way could be like this: http://java.dzone.com/articles/hashmap-%E2%80%93-single-key-and
Third way could be like this: Java Tuple Without Creating Multiple Type Parameters
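On the whitespace problem from the question itself, a parser that trims every line and every value before interpreting it is usually enough. The following is only a sketch under my own assumptions: the class name TileMapParser is hypothetical, all header values are taken to be integers, and each entity line is taken to hold exactly four values as in the sample file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical whitespace-tolerant parser for the map format shown in the question.
final class TileMapParser {
    final Map<String, Integer> header = new HashMap<>();
    final List<int[]> entities = new ArrayList<>(); // each entry: {entity_id, x_pos, y_pos, z_pos}

    static TileMapParser parse(Iterable<String> lines) {
        TileMapParser result = new TileMapParser();
        for (String raw : lines) {
            String line = raw.trim();                       // tolerate leading/trailing whitespace
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                                   // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                String[] parts = line.substring(1, line.length() - 1).split(";");
                if (parts.length != 4) {
                    throw new IllegalArgumentException("Expected 4 values in: " + raw);
                }
                int[] entity = new int[4];
                for (int i = 0; i < 4; i++) {
                    entity[i] = Integer.parseInt(parts[i].trim());
                }
                result.entities.add(entity);
            } else {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("Expected key=value in: " + raw);
                }
                String key = line.substring(0, eq).trim();
                int value = Integer.parseInt(line.substring(eq + 1).trim());
                result.header.put(key, value);
            }
        }
        return result;
    }
}
```

The lines can come from Files.readAllLines or a BufferedReader. Failing loudly with the offending line in the message makes hand-edited maps much easier to debug than silently refusing to load.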

Optimized way of doing string.endsWith() work.

I need to inspect every web request received by the application server and check whether the URL has an extension such as .css, .gif, etc.
I looked at how Tomcat handles each request and picks the configured servlet to serve it:
CharChunk, MessageBytes, Mapper
Here is my idea for the implementation:
Load all the extensions we want to compare and get the byte representation of each.
Get a unique value for each extension by summing up the bytes in the byte array // eg: "css".getBytes()
Add the resulting value to a sorted list
Whenever we receive a request, get the byte representation of the URL // eg: "flipkart.com/eshopping/images/theme.css".getBytes()
Start summing the bytes from the byte array's last index and break when we encounter the "." (dot) byte value
Search the sorted list for the value thus summed // Use binary search here
Kindly give your feedback about the implementation and any issues.
-With thanks, Krishna
This sounds way more complicated than it needs to be.
Use String.lastIndexOf to find the last dot in the URL
Use String.substring to get the extension based on that
Have a HashSet<String> for a set of supported extensions, or a HashMap<String, Whatever> if you want to map the extension to something else
I would be absolutely shocked to discover that this simple approach turned out to be a performance bottleneck - and indeed I suspect it would be more efficient than the approach you suggested, given that it doesn't require the entire URL to be converted into a byte array... (It's not clear why your approach uses byte arrays anyway instead of forming the hash from char values.)
Fundamentally, my preferred approach to performance is:
Do up-front design and testing around things which are hard to change later, architecturally
For everything else:
Determine the performance criteria first so you know when you can stop
Write the simplest code that works
Test it with realistic data
If it doesn't perform well enough, use profilers (etc) to work out where the bottleneck is, and optimize that making sure that you can prove the benefits using your existing tests
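The simple approach described at the top of this answer might look roughly like the sketch below. The class name StaticResourceMatcher is made up for illustration, and the code assumes the URL carries no query string or fragment; strip those first if they can occur.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper that checks whether a URL ends in one of a fixed set of extensions.
final class StaticResourceMatcher {
    private final Set<String> extensions;

    // Extensions are given without the dot and in lower case, e.g. "css", "gif".
    StaticResourceMatcher(String... extensions) {
        this.extensions = new HashSet<>(Arrays.asList(extensions));
    }

    boolean matches(String url) {
        int dot = url.lastIndexOf('.');
        // No dot at all, or the last dot belongs to the host or a directory name.
        if (dot < 0 || dot < url.lastIndexOf('/')) {
            return false;
        }
        String extension = url.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(extension);
    }
}
```

Note that summing bytes, as proposed in the question, cannot distinguish "css" from "scs", whereas a HashSet lookup compares the actual strings after hashing.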

How do I identify that I am at the last byte of a serialized Java object?

Question
What are (if there are any) the terminating characters/byte sequences in serialized Java objects?
Background
I'm working on a small self-education project where I would like to serialize Java objects and write them to a stream, from which they are read and then deserialized. Since I will need to identify the borders between serialized objects and I can't be sure that the current object is not the last one, is there a terminating character that is always there that I can use as my identifier?
I noticed that there is a magic number ACED that allows me to identify the start of the object, so how do I identify the end?
EDIT:
If there is no terminating character, are there any safe terminating characters/sequences that I can use (insert) to identify the end of the object?
In theory you should always be able to find the end of an object; in practice you cannot. As I understand it, the problem is that customised writeObject implementations that call neither defaultWriteObject nor writeFields have a non-standard representation.
I've played about with serialisation in the past. Including creating streams for use when I've been doing unusual things to the ObjectInputStream. It's not pleasant(!).
You can read the details in the spec, and the source is worth a read.
There are none. AFAIK the only requirement is that the deserialiser knows when to stop reading when given a corresponding serialisation. Subject to that, the serialiser can write whatever it wants -- in any position, not just the last.
If you're old skool, dump a 32-bit length field at the beginning and refuse to handle objects bigger than 4 gig.
Nu scool, you just make sure your read and your write logic are consistent and don't care about the length.
You can add a terminating object to your object stream. e.g. null or a special String.
However, I suggest that you instead write each object through its own ObjectOutputStream into a byte[], and then write the length of the byte[] followed by its data. This way each object stream is independent and you always know where it finishes.
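A minimal sketch of that length-prefix framing, using only java.io classes. The class and method names (FramedObjects, writeFramed, readFramed) are hypothetical, and the 4-byte int prefix limits a single serialized object to about 2 GB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: frames each serialized object as [4-byte length][payload].
final class FramedObjects {
    static void writeFramed(DataOutputStream out, Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // A fresh ObjectOutputStream per object keeps every frame self-contained.
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        byte[] bytes = buffer.toByteArray();
        out.writeInt(bytes.length);   // length prefix
        out.write(bytes);             // payload
    }

    static Object readFramed(DataInputStream in) throws IOException, ClassNotFoundException {
        int length = in.readInt();    // throws EOFException when no further frame exists
        byte[] bytes = new byte[length];
        in.readFully(bytes);          // blocks until the whole frame has arrived
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```

The reader never has to look inside the payload to find its end, which is the point of the suggestion. The cost is that every frame repeats the serialization stream header and class descriptors.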
Have you considered applying a record-marking layer similar to HTTP Chunked encoding?
The Chunked encoding is intended to solve a generalization of this scenario: identifying the end of a message of indeterminate length that both itself contains no identifiable end, and is embedded in a longer stream without ending it.

Developing a (file) exchange format for java

I want to come up with a binary format for passing data between application instances in a form of POFs (Plain Old Files ;)).
Prerequisites:
should be cross-platform
information to be persisted includes a single POJO & arbitrary byte[]s (files actually, the POJO stores their names in a String[])
only sequential access is required
there should be a way to check data consistency
should be small and fast
should prevent an average user with archiver + notepad from modifying the data
Currently I'm using DeflaterOutputStream + OutputStreamWriter together with InflaterInputStream + InputStreamReader to save/restore objects serialized with XStream, one object per file. Readers/Writers use UTF-8.
Now I need to extend this to support the requirements described above.
My idea of format:
{serialized to XML object}
{delimiter}
{String file name}{delimiter}{byte[] file data}
{delimiter}
{another String file name}{delimiter}{another byte[] file data}
...
{delimiter}
{delimiter}
{MD5 hash for the entire file}
Does this look sane?
What would you use for a delimiter and how would you determine it?
The right way to calculate MD5 in this case?
What would you suggest to read on the subject?
TIA.
It looks INsane.
Why invent a new file format?
Why try to prevent only stupid users from changing the file?
Why use a binary format (hard to compress)?
Why use a format that cannot be parsed while being received? (The receiver has to receive the entire file before being able to act on it.)
XML is already a serialization format that is compressible. So you are serializing a serialized format.
Would serialization of the model (if you are into MVC) not be another way? I'd prefer to use things in the language (or standard libraries) rather than roll my own if possible. The only issue I can see with that is that the file size may be larger than you want.
1) Does this look sane?
It looks fairly sane. However, if you are going to invent your own format rather than just using Java serialization then you should have a good reason. Do you have any good reasons (they do exist in some cases)? One of the standard reasons for using XStream is to make the result human readable, which a binary format immediately loses. Do you have a good reason for a binary format rather than a human readable one? See this question for why human readable is good (and bad).
Wouldn't it be easier just to put everything in a signed jar? There are already standard Java libraries and tools to do this, and you get compression and verification provided.
2) What would you use for a delimiter and how would you determine it?
Rather than a delimiter I'd explicitly store the length of each block before the block. It's just as easy, and prevents you having to escape the delimiter if it comes up on its own.
3) The right way to calculate MD5 in this case?
There is example code here which looks sensible.
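To make points 2) and 3) concrete, here is one possible sketch: each block is written with an explicit length in front of it, and an MD5 digest of everything written so far is appended at the end. The layout and the names (ChecksummedContainer, write, verify) are assumptions of mine for illustration, not a standard format. verify takes the whole file as a byte[] only to keep the example short.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical container layout:
// [block count] then per block [name length][name bytes][data length][data], then [16-byte MD5].
final class ChecksummedContainer {
    private static final int MD5_LENGTH = 16;

    static void write(OutputStream target, String[] names, byte[][] blocks)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Everything written through `out` is also fed into the digest.
        DataOutputStream out = new DataOutputStream(new DigestOutputStream(target, md5));
        out.writeInt(names.length);
        for (int i = 0; i < names.length; i++) {
            byte[] name = names[i].getBytes(StandardCharsets.UTF_8);
            out.writeInt(name.length);
            out.write(name);
            out.writeInt(blocks[i].length);
            out.write(blocks[i]);
        }
        out.flush();
        // The digest itself is written straight to the target so it is not hashed.
        target.write(md5.digest());
        target.flush();
    }

    // Recomputes the MD5 over all bytes except the trailing digest and compares the two.
    static boolean verify(byte[] file) throws NoSuchAlgorithmException {
        if (file.length < MD5_LENGTH) {
            return false;
        }
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(file, 0, file.length - MD5_LENGTH);
        byte[] stored = Arrays.copyOfRange(file, file.length - MD5_LENGTH, file.length);
        return MessageDigest.isEqual(md5.digest(), stored);
    }
}
```

Because lengths are explicit, the data blocks never need escaping. A digest stored inside the file catches accidental corruption, but anyone who edits the file can recompute it, so it is a consistency check rather than tamper protection.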
4) What would you suggest to read on the subject?
On the subject of serialization? I'd read about the Java serialization, JSON, and XStream serialization so I understood the pros and cons of each, especially the benefits of human readable files. I'd also look at a classic file format, for example from Microsoft, to understand possible design decisions from back in the days that every byte mattered, and how these have been extended. For example: The WAV file format.
Let's see, this should be pretty straightforward.
Prerequisites:
0. should be cross-platform
1. information to be persisted includes a single POJO & arbitrary byte[]s (files actually, the POJO stores their names in a String[])
2. only sequential access is required
3. there should be a way to check data consistency
4. should be small and fast
5. should prevent an average user with archiver + notepad from modifying the data
Well, guess what: you pretty much have it already, built into the platform: Object Serialization
If you need to reduce the amount of data sent over the wire, you can provide a custom serialization (for instance, write only the values 1, 2, 3 for a given object without the attribute names, and read them back in the same sequence) using this somewhat "hidden feature".
If you really need it as plain text you can also encode it, for example as Base64, at the cost of roughly a third more bytes.
For instance this bean:
import java.io.*;
public class SimpleBean implements Serializable {
private String website = "http://stackoverflow.com";
public String toString() {
return website;
}
}
Could be represented like this:
rO0ABXNyAApTaW1wbGVCZWFuPB4W2ZRCqRICAAFMAAd3ZWJzaXRldAASTGphdmEvbGFuZy9TdHJpbmc7eHB0ABhodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20=
See this answer
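A string like the one above can be produced by serializing the object and Base64-encoding the bytes. The sketch below uses java.util.Base64 (Java 8 and later); the helper class and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper: Java serialization wrapped in Base64 so the result is plain text.
final class Base64Serialization {
    static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    static Object decode(String text) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```

Every Java serialization stream begins with the bytes AC ED 00 05, which is why such Base64 strings always start with "rO0AB".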
Additionally, if you need a sound protocol you can also check out Protobuf (Protocol Buffers), Google's data interchange format.
You could use a zip (rar / 7z / tar.gz / ...) library. Many exist, most are well tested and it'll likely save you some time.
Possibly not as much fun though.
I agree in that it doesn't really sound like you need a new format, or a binary one.
If you truly want a binary format, why not consider one of these first:
Binary XML (fast infoset, Bnux)
Hessian
Google Protocol Buffers
But besides that, many textual formats should work just fine (or perhaps better) too: they are easier to debug, have extensive tool support, and compress to about the same size as binary (binary compresses poorly, and information theory suggests that for the same effective information, the same compression rate is achieved -- and this has been true in my testing).
So perhaps also consider:
Json works well; binary support via base64 (with, say, http://jackson.codehaus.org/)
XML not too bad either; efficient streaming parsers, some with base64 support (http://woodstox.codehaus.org/, "typed access API" under 'org.codehaus.stax2.typed.TypedXMLStreamReader').
So it kind of sounds like you just want to build something of your own. Nothing wrong with that, as a hobby, but if so you need to consider it as such.
It likely is not a requirement for the system you are building.
Perhaps you could explain how this is better than using an existing file format such as JAR.
Most standard file formats of this type just use a CRC, as it's faster to calculate. MD5 is more appropriate if you want to prevent deliberate modification.
Bencode could be the way to go.
Here's an excellent implementation by Daniel Spiewak.
Unfortunately, the bencode spec doesn't support UTF-8, which is a showstopper for me.
I might come back to this later, but currently XML seems like a better choice (with blobs serialized as a Map).
