Parsing of nested structures and object model

I am trying to parse a binary file format with nested structures. In procedural pseudo-code, the process would be as such:
// A structure contains:
// tag | oneof(a, b, c) | oneof(oneof(aa, ab, ac), oneof(ba, bb, bc), oneof(ca, cb, cc))
PROCEDURE parse() {
RECORD read_type;
read_tag(read_type);
if (read_type == TYPE_A) {
read_a(read_type);
if (read_type == TYPE_AA) {
read_aa();
} else if (read_type == TYPE_AB) {
read_ab();
} else if (read_type == TYPE_AC) {
read_ac();
}
} else if (read_type == TYPE_B) {
// see above
} else if (read_type == TYPE_C) {
// see above
}
}
A nested structure such as AA cannot be interpreted without context from its parent object A, which in turn requires its tag/header to be interpreted. When working with these structures, it makes sense to manipulate structures that contain A, that contain AA, etc., but never only the A or AA portion of a structure.
My question is then how to create a class model for this procedure. Should the structure be:
class Base;
class A: Base;
class B: Base;
class C: Base;
class AA: A;
class AB: A;
class AC: A;
// ...
In which case, AA might be constructed as such:
AA::AA(): A() {
read_aa();
}
A::A(): Base() {
read_a();
}
Base::Base() {
read_tag();
}
However, the issue would be that it would not be possible to know what derived object to construct without first constructing the base object. This could be worked around by having a constructor AA::AA(A*) that copy constructs its parent, but this seems like an unnecessary inefficiency. Further, this would require an external factory function such as:
Base *read_object() {
    Base *base = new Base();
    if (base->tag_type == TYPE_A) {
        A *a = new A(base);
        if (a->tag_type == TYPE_AA) {
            return new AA(a);
        } else if (a->tag_type == TYPE_AB) {
            // ...
        } else if (a->tag_type == TYPE_AC) {
            // ...
        }
    } else if (base->tag_type == TYPE_B) {
        // ...
    } else if (base->tag_type == TYPE_C) {
        // ...
    }
}
The other option is to have classes that refer to sub-regions of the structure such as:
class CompleteStructure;
class StructureA;
class StructureB;
class StructureC;
class StructureAA;
class StructureAB;
class StructureAC;
// ...
class CompleteStructure {
union {StructureA a, StructureB b, StructureC c} sub;
}
class StructureA {
CompleteStructure *parent;
union {StructureAA aa, StructureAB ab, StructureAC ac} sub;
}
class StructureAA {
StructureA *parent;
}
In this case, the constructor CompleteStructure::CompleteStructure() would read the tag and then construct one of StructureA, StructureB, or StructureC, which would in turn construct its own sub-structure. The issue with this is that each sub-structure would need an explicit reference to its parent in order to "cast" up the hierarchy and implement its methods/functions.
Is one of these approaches better than the other in terms of space/time efficiency and "cleanness"? Is there a superior third approach?
EDIT:
To respond to the two answers below, the question is both about parsing and object behavior. My initial goal is merely to read the structures from the file, print out their fields, and then write them back to disk in the same order. Later on, there will be additional goals such as finding all instances of A-derived structures and sorting them by certain fields or checking for illegal combinations of structures (e.g. having both BA and BB).
EDIT2:
Here is an example schema of one of the structures I refer to (with generic field names). u8/16/32 refer to integer types, sz is a C string, upper-case names are fields that need to be read, and constants are prefixed with underscores.
DEF AA {
// Identifies and delimits complete records.
TAG {
u32 SYNC_CODE = 0xFFFFFFFF;
}
// Metadata for high level identification of data.
A {
u32 TYPE = __TYPE_A;
u16 CATEGORY = __CATEGORY_1; // A defines the "category" of the following file data
u32 NUM_OF_KV_PAIRS;
for (int i = 0; i < NUM_OF_KV_PAIRS; ++i) { // unspecified metadata
sz KEY;
sz VALUE;
}
u8 HAS_EXTENSION_FLAG = true; // indicates presence of next record
if (!HAS_EXTENSION_FLAG) {
DEFAULT_PARAMS; // legacy
}
}
// Indicates a specific data layout and version.
AA {
u32 TYPE = __TYPE_AA;
u8[16] ACCESS_KEY;
u32 NUM_OFFSETS;
for (int i = 0; i < NUM_OFFSETS; ++i) {
// stuff
}
}
}

It is difficult to say whether one approach is better in terms of efficiency without a more concrete problem description. Below you can find some food for thought.
Point 1: When thinking about class design, it is worth examining the desired behaviour and not just the data. The fact that the binary storage format may or may not imply a hierarchy should of course be taken into account, but it should not be the primary concern.
As an example, assume we have a Person class that has a height field and a Rectangle class that also has a height field. They both share some data but having only this information makes them rather irrelevant to each other. If we define the context and we say we want to draw them on the screen then suddenly the height field has a more specific meaning. Now inheriting a Drawable perhaps makes more sense.
The question in your case is how we will use them? What common operations can we do if we have a list of {A, B} or {AA, BB} or even {A, BB}? Can we somehow manage them together? This is an important point that you should take into account.
Point 2: You say that "it makes sense to manipulate structures that contain A, that contain AA, etc, but never only the A or AA portion of a structure". So I understand that AA is-a A, and also that an A only ever exists as part of a complete structure such as AA. If this is the case, then it makes sense to make Base, A, B, C abstract classes and only allow the last level (AA, BB etc.) to be instantiated directly.
Point 3: On the other hand it might be better to use composition over inheritance if the different structures just define some data and not some behaviour. E.g. will we invoke a method on them like process() that would operate on the data? Or do we want to use the structures themselves as data?
class X {
Base base;
A a;
AA aa;
process() {
// this is different than calling base.process() + a.process() + aa.process()
// do we need one over the other? both?
process(base) + process(a) + process(aa);
}
}
Point 4: Regarding the order of instantiation while reading, this should not be a problem. Perhaps you could read the information as you go storing it temporarily and only instantiate a class after you know its full type (i.e. you reach the last level).
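A minimal sketch of Point 4 in Java, assuming a much simplified layout (sync code, type, a u16 category, subtype, then one AA-specific field). The type codes, the StructureReader class and the field selection are invented for illustration and are not the real format; the point is only that the intermediate values live in local variables until the leaf type is known, so no Base or A object has to be built first.

```java
import java.nio.ByteBuffer;

// Abstract levels mirror the file layout; only leaf classes can be instantiated.
abstract class Base {
    final int syncCode;
    Base(int syncCode) { this.syncCode = syncCode; }
}

abstract class A extends Base {
    final int category;
    A(int syncCode, int category) {
        super(syncCode);
        this.category = category;
    }
}

final class AA extends A {
    final int numOffsets;
    AA(int syncCode, int category, int numOffsets) {
        super(syncCode, category);
        this.numOffsets = numOffsets;
    }
}

final class AB extends A {
    AB(int syncCode, int category) { super(syncCode, category); }
}

final class StructureReader {
    // Hypothetical type codes; the real format defines its own.
    static final int TYPE_A = 1;
    static final int TYPE_AA = 11;
    static final int TYPE_AB = 12;

    static Base read(ByteBuffer in) {
        int syncCode = in.getInt();              // tag level
        int type = in.getInt();                  // selects A, B or C
        if (type != TYPE_A) {
            throw new IllegalArgumentException("unsupported type " + type);
        }
        int category = in.getShort() & 0xFFFF;   // A-level field, read as u16
        int subType = in.getInt();               // selects AA, AB or AC
        switch (subType) {
            case TYPE_AA:
                // Only now is the full type known, so construct the leaf once.
                return new AA(syncCode, category, in.getInt());
            case TYPE_AB:
                return new AB(syncCode, category);
            default:
                throw new IllegalArgumentException("unsupported subtype " + subType);
        }
    }
}
```

With this shape the nested if/else of the procedural version survives in exactly one place, and no object is ever copy-constructed from its parent.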
I hope that helps

The question doesn't clearly explain what you think you are doing, or what the actual problem is (ie. what you should be doing).
You need to very clearly define which of A, AA, AB are entities with their own distinct existence -- and where are the child relationships which you're supposedly parsing. You say nested structure but don't detail it.
As another answer mentioned -- OO is about behaviour, not about data modelling.
Leaning heavily on inheritance, especially since you don't know what you're constructing, sounds like it would be a complete mistake. Inheritance hierarchies in general are only useful when you need behavior (methods that calculate or do things) and can efficiently divide that behavior-space by some class hierarchy and benefit from that.
Your problem as stated above, is just a parsing problem. You could as well use a Stack and some internal state (say a StringBuilder, at the most trivial) to read & build up parsing-state while using the Stack to push & pop nesting levels.
In fact, the above is a great way to implement most kinds of parsers.
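As a toy illustration of the stack idea, the sketch below parses a made-up text notation such as "A{AA{}AB{}}" (a name, then its children in braces). The notation and the NestedNode and StackParser names are assumptions for this example only; a binary format would read tags instead of characters, but the push-on-open and pop-on-close pattern is the same.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class NestedNode {
    final String name;
    final List<NestedNode> children = new ArrayList<>();
    NestedNode(String name) { this.name = name; }
}

final class StackParser {
    // Returns a synthetic root whose children are the top-level structures.
    static NestedNode parse(String text) {
        Deque<NestedNode> stack = new ArrayDeque<>();
        NestedNode root = new NestedNode("root");
        stack.push(root);
        StringBuilder name = new StringBuilder();   // name being accumulated
        for (char ch : text.toCharArray()) {
            if (ch == '{') {
                // A nesting level opens: attach the node to the current parent.
                NestedNode node = new NestedNode(name.toString());
                name.setLength(0);
                stack.peek().children.add(node);
                stack.push(node);
            } else if (ch == '}') {
                if (stack.size() == 1) {
                    throw new IllegalArgumentException("unbalanced '}'");
                }
                stack.pop();                        // the nesting level closes
            } else {
                name.append(ch);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("unclosed '{'");
        }
        return root;
    }
}
```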
A more-sophisticated alternative (also common in parsers) is to build an AST. These are very efficient & light-weight to build & traverse.
class AstNode {
    protected AstNode down; // first child.
    protected AstNode across; // next sibling.
    public AstNode getDown() { return down; }
    public AstNode getAcross() { return across; }
    public void addChild (AstNode child) {
        if (getDown() == null) {
            // First Child;
            this.down = child;
            return;
        }
        // Sibling to existing Children.
        AstNode last = down;
        while (last.getAcross() != null)
            last = last.getAcross();
        last.across = child;
        // done.
    }
}
With an AST, you can also put properties/ members on for NodeType, Data, Type (lexical) etc and effectively build a powerful data-structure in its own right.
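A small sketch of such a node carrying a type label, with a traversal over the first-child/next-sibling links. The TypedAstNode class, the string labels and the count method are hypothetical; the traversal is the kind of query the question mentions later, such as finding all structures of a given type.

```java
final class TypedAstNode {
    final String nodeType;          // e.g. "A" or "AA"; labels are illustrative
    private TypedAstNode down;      // first child
    private TypedAstNode across;    // next sibling

    TypedAstNode(String nodeType) { this.nodeType = nodeType; }

    // Appends a child after the existing children and returns this for chaining.
    TypedAstNode addChild(TypedAstNode child) {
        if (down == null) {
            down = child;
        } else {
            TypedAstNode last = down;
            while (last.across != null) {
                last = last.across;
            }
            last.across = child;
        }
        return this;
    }

    // Counts this node and all descendants whose nodeType equals wanted.
    int count(String wanted) {
        int total = nodeType.equals(wanted) ? 1 : 0;
        for (TypedAstNode child = down; child != null; child = child.across) {
            total += child.count(wanted);
        }
        return total;
    }
}
```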
Hope this helps.

Related

Java set variable and method types from one place

I am trying to make a flexible data structure that I can copy paste around. I want it to handle only one data type, but be able to change this easily when copy pasting this around. Consider this simple example:
class Data {
//customize these
int a;
int b;
int val;
private int f() {
return a + b;
}
//built-in functions that I don't customize
public void build() {
int temp = f();
this.val = temp;
}
public int query() {
return this.val;
}
}
Suppose I want a and b to be arrays and f to be concatenation. I would have to change the type of a, b, val, f, query, and temp. This would be a pain to change if I have more code that is dependent on the type. I want something like this:
ClassType c = Integer; //or maybe Long, int[], ArrayList ...
c a;
c b;
private c f() {
...
I am not looking for generics. In any particular program, I will have a single fixed type. This is purely to make copy-pasting this data structure around easier, across multiple independent programs. i.e., I do not want to specify these data types on each instantiation (with generics).
Also, ideally, this would not add overhead for int which is a common use case for me (so simply creating a class of whatever data type would not be ideal).
More specifics (kind of unrelated): I want a Segment Tree that I can "customize" to be range min query, sort the range as array, 2D Segment Tree, etc. (note that these 2 cases have different data types, and thus different method signatures) I already have the methods and data types I need figured out, and now I need to implement it.

In Java, the data type of a variable is resolved at compile time, while its value is resolved at runtime. That is why a type cannot be assigned to a variable and then used as the declared type of other variables.

Hierarchical data structure for configuration parameters

I'm writing a setup routine with several configuration parameters (for now, could be int, double, or String). In order to avoid long method signatures, I'd like to create a data structure to store these parameters. The main setup routine calls several methods/factories, each with their own set of required parameters, and I'd like to only pass the relevant piece of the data structure to each routine.
My initial thought was to use some kind of recursive/tree dictionary-like structure, where String keys could be associated with either a value, or a subtree that behaves exactly like the top-level tree.
A rough example of what I'm trying to do follows:
public class A {
// some common methods here
}
public class A1 extends A {
public A1(int x, double y) {
// do stuff
}
}
public class A2 extends A {
public A2(double z) {
// do stuff
}
}
public A SetupA(ConfigOptions opts) {
if (opts.get("typeA").equals("A1")) {
return SetupA1(opts.get("A1opts"));
} else {
return SetupA2(opts.get("A2opts"));
}
}
public A1 SetupA1(ConfigOptions A1opts) {
return new A1(A1opts.get("x"), A1opts.get("y"));
}
public A2 SetupA2(ConfigOptions A2opts) {
return new A2(A2opts.get("z"));
}
When I fill the ConfigOptions structure, I'd like to have a character, e.g. :, that separates levels of the tree. I.e. for the example above, I could specify that I want an A2 object with z = 1.0 something like this:
opts.set("typeA", "A2");
opts.set("A2opts:z", 1.0);
I know that what I've just laid out won't work as written since it's not type-safe. What modifications to the interface, and/or what implementation techniques could I consider that would allow me to accomplish my objectives?
Notes:
The setup routine has a relatively small one-time cost. Therefore readability and a convenient interface is a much higher priority for me than speed.
This is a learning project for me, so I'm looking for a solution that doesn't depend on external libraries or built-in collection classes.
Error checking does not need to be considered by the data structure, since it will be handled by the routines that receive the parameters.
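One possible shape for such a structure, as a rough sketch only: a hand-rolled linked list of entries (to respect the no-collections constraint) where a value is either a leaf or a nested ConfigOptions, and ':' separates levels in a path. The typed getters getInt, getDouble, getString and subtree are hypothetical names; they move the type check to the point of use, and a mismatch surfaces as a ClassCastException.

```java
final class ConfigOptions {
    // One key/value pair in a singly linked list; value is a leaf or a subtree.
    private static final class Entry {
        final String key;
        Object value;
        final Entry next;
        Entry(String key, Object value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Entry head;

    // Stores value under a path such as "A2opts:z", creating subtrees as needed.
    void set(String path, Object value) {
        int sep = path.indexOf(':');
        if (sep < 0) {
            put(path, value);
            return;
        }
        String first = path.substring(0, sep);
        Entry existing = find(first);
        ConfigOptions child;
        if (existing != null && existing.value instanceof ConfigOptions) {
            child = (ConfigOptions) existing.value;
        } else {
            child = new ConfigOptions();
            put(first, child);
        }
        child.set(path.substring(sep + 1), value);
    }

    ConfigOptions subtree(String path) { return get(path, ConfigOptions.class); }
    int getInt(String path) { return get(path, Integer.class); }
    double getDouble(String path) { return get(path, Double.class); }
    String getString(String path) { return get(path, String.class); }

    private <T> T get(String path, Class<T> type) {
        int sep = path.indexOf(':');
        if (sep >= 0) {
            // Descend one level and resolve the rest of the path there.
            return subtree(path.substring(0, sep)).get(path.substring(sep + 1), type);
        }
        Entry entry = find(path);
        if (entry == null) {
            throw new IllegalArgumentException("missing key: " + path);
        }
        return type.cast(entry.value);
    }

    private Entry find(String key) {
        for (Entry e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e;
            }
        }
        return null;
    }

    private void put(String key, Object value) {
        Entry existing = find(key);
        if (existing != null) {
            existing.value = value;
        } else {
            head = new Entry(key, value, head);
        }
    }
}
```

With this, SetupA could pass opts.subtree("A2opts") to SetupA2, which would then call A2opts.getDouble("z").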

Using a comparable on 3 different classes

I'm trying to implement a function that returns the maximum object of a given Comparable (generic) list.
I have 3 classes whose compareTo methods return 1 if this is bigger than other, -1 if this is smaller than other, and 0 if they're equal.
Now my problem is understanding how to work with a generic input Comparable list.
Here's the signature of my function, and the code I wrote so far (that refuses to work on me):
public static Comparable<?> getMax(List<Comparable<?>> ls) {
    LinkedList<Comparable<?>> tmpComp = new LinkedList<Comparable<?>>();
    for (Comparable<?> c : ls)
        tmpComp.add(c);
    Comparable<?> maxObj = tmpComp.get(0);
    for (Comparable<?> c : tmpComp)
        if (c.compareTo(maxObj) > 0)
            maxObj = c;
    return maxObj;
}
I'm writing a system that has users and ads in it. Users and ads are both classes with a "profit" field, and all I do in their compareTo methods is compare which of the two (this or other) has more profit and return the corresponding value. The 3rd class is compared via another field, also an int, that indicates the level of the Quest.
Also that if statement, specifically, gives me an error of the type "is not applicable for the arguments".
Any clues?
Thanks in advance!

Reading your comment, I suggest you redesign your model to be:
interface ProfitGenerating {
double getProfit();
}
class User implements ProfitGenerating {
...
}
class Advert implements ProfitGenerating {
...
}
List<ProfitGenerating> profits = ...;
Optional<ProfitGenerating> maxProfit = profits.stream()
.max(Comparator.comparingDouble(ProfitGenerating::getProfit));

The answer by Mạnh Quyết Nguyễn is good. But it does not account for the situation where you have multiple potential types T, which appears to be your situation.
So in that situation, just wrap your various classes with a single class and use his solution.
If you have a User class and an Ad class, then create a wrapper like so:
class ProfitMaker implements Comparable<ProfitMaker> {
    User user;
    Ad ad;
    public int compareTo(ProfitMaker p) {
        //check my profit and compare with profit of p
    }
}
Use that class as the "T" when using the getMax from Mạnh Quyết Nguyễn.
Alternatively, use an interface
interface ProfitMaker extends Comparable<ProfitMaker> {
int getProfit();
}
Make both your User and Ad classes implement that interface, and then use that interface as the "T" along with the getMax method from Mạnh Quyết Nguyễn.
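The interface variant could be sketched as follows. The referenced getMax is not reproduced on this page, so the generic signature below is only one plausible shape for it; the default compareTo, the int profit fields and the MaxFinder class are likewise assumptions made for this example.

```java
import java.util.List;

interface ProfitMaker extends Comparable<ProfitMaker> {
    int getProfit();

    // Shared ordering: more profit compares as greater.
    @Override
    default int compareTo(ProfitMaker other) {
        return Integer.compare(getProfit(), other.getProfit());
    }
}

final class User implements ProfitMaker {
    private final int profit;
    User(int profit) { this.profit = profit; }
    @Override public int getProfit() { return profit; }
}

final class Ad implements ProfitMaker {
    private final int profit;
    Ad(int profit) { this.profit = profit; }
    @Override public int getProfit() { return profit; }
}

final class MaxFinder {
    // Returns the greatest element according to its natural ordering.
    static <T extends Comparable<? super T>> T getMax(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("empty list");
        }
        T max = items.get(0);
        for (T item : items) {
            if (item.compareTo(max) > 0) {
                max = item;
            }
        }
        return max;
    }
}
```

Because the list is typed as List<ProfitMaker>, users and ads can be mixed freely and the compareTo call type-checks, which the wildcard version Comparable<?> cannot do.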

Your three classes must be comparable to each other. For this they will need to implement Comparable<SomeX> where SomeX is their lowest common superclass. In the worst case, SomeX is Object.
If this is the case, you can simply do:
ls.stream().max(Comparator.naturalOrder())
Alternatively, instead of forcing your classes to implement Comparable<...>, you could capture comparison semantics in a Comparator<...> and then do:
ls.stream().max(comparator)
Using a comparator is better for cases where the order is not really "natural" for the type or where there may be different orders. I think this is the case here since you actually compare instances of different types. It is hard to argue that some order is "natural" for these instances as they don't even belong to one type.
If you compare your instances based on some property they share (like int getProfit()), it would make sense creating a common interface like Profitable. Then you could do:
ls.stream().max(Comparator.comparingInt(Profitable::getProfit))
Note that if you compare on primitive types, you should use comparingInt/comparingLong/comparingDouble instead of comparing to avoid unnecessary boxing and unboxing.
If you for some reason can't create and implement a common interface like Profitable, you can still use comparingInt and likes. You'll just have a much uglier lambda:
ls.stream().max(Comparator.comparingInt(l -> {
    if (l instanceof Ad) { return ((Ad) l).getProfit(); }
    else if (l instanceof Ransom) { return ((Ransom) l).getProfit(); }
    // ...
    else { throw new IllegalArgumentException(...); }
}))
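For comparison, the shared-interface variant mentioned earlier in this answer might look like this when written out in full. The Profitable interface follows the name suggested above; the Ad and Ransom bodies and the MaxProfit helper are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

interface Profitable {
    int getProfit();
}

final class Ad implements Profitable {
    private final int profit;
    Ad(int profit) { this.profit = profit; }
    @Override public int getProfit() { return profit; }
}

final class Ransom implements Profitable {
    private final int profit;
    Ransom(int profit) { this.profit = profit; }
    @Override public int getProfit() { return profit; }
}

final class MaxProfit {
    // Empty Optional for an empty list, otherwise the item with the largest profit.
    static Optional<Profitable> of(List<Profitable> items) {
        return items.stream().max(Comparator.comparingInt(Profitable::getProfit));
    }
}
```

The ordering lives in the Comparator, so neither class has to claim a "natural" order by implementing Comparable.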

Constraint solve inferred type's in order to refactor Java code

We created a Java test project which has an ArrayList.
private ArrayList mainList;
public void AddTest(int number)
{
Test t = new Test();
mainList.add(t);
mainList.add(number);
}
As can be seen, we add an integer and an instance of class Test.
In Rascal we create an object flow graph which consists of the following:
OFG: {
<|java+class:///java/util/ArrayList/this|,|java+constructor:///java/util/ArrayList/ArrayList()|>,
<|java+variable:///test1/Main/AddTest(int)/t|,|java+field:///test1/Main/mainList|>,
<|java+class:///test1/Main/this|,|java+constructor:///test1/Main/Main()|>,
<|java+parameter:///test1/Main/AddTest(int)/scope(number)/scope(0)/number|,|java+field:///test1/Main/mainList|>,
<|java+class:///test1/Test/this|,|java+constructor:///test1/Test/Test()|>,
<|java+class:///test1/Test/this|,|java+field:///test1/Main/mainList|>
}
As can be seen in the OFG, an integer and a Test get added to mainList. Using this knowledge, we want to indicate that the ArrayList should contain type Object, thus:
private ArrayList mainList -> private ArrayList<Object> mainList
For this we need a constraint solver which finds the lowest type or generalization. Therefore we want to augment the solve function of the following propagation method:
rel[loc,&T] propagate(OFG g, rel[loc,&T] gen, rel[loc,&T] kill, bool back) {
rel[loc,&T] IN = { };
rel[loc,&T] OUT = gen + (IN - kill);
gi = g<to,from>;
set[loc] pred(loc n) = gi[n];
set[loc] succ(loc n) = g[n];
solve (IN, OUT) {
IN = { <n,\o> | n <- carrier(g), p <- (back ? pred(n) : succ(n)), \o <- OUT[p] };
OUT = gen + (IN - kill);
}
return OUT;
}
However, we find it difficult to get started on this in Rascal.
We have experience with IBM ILOG, so constraint programming is not new to us.

One idea, you could write another function or group of functions which relate type parameter positions to possible or necessary types:
a many-to-many rel[loc typeparameter, TypeSymbol bound] could encode which types the type parameter should at least be a subtype of, according to the objects which flow into it in your flow analysis.
then an algorithm which computes a tight upperbound based on the alternatives, would combine several supertypes for the same typeparameter and compute the least type which includes them all. This algorithm would make the rel[loc typeparameter, TypeSymbol bound] smaller and smaller until only one solution remains for every type parameter.
You could use the extends and implements relations in the M3 model to find out about common super types, but you should also build in some knowledge about the Java type system, such as the fact the java.lang.Object is the top type for both classes and interfaces in Java, that classes have single inheritance and interfaces multiple inheritance.
TypeSymbol can be found in lang::java::m3::TypeSymbol
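The "least type which includes them all" step can be illustrated outside Rascal. The sketch below is in Java and works on runtime Class objects via reflection rather than on M3 TypeSymbol values, so it is an analogy and not Rascal code. It assumes class types only: it walks the single-inheritance superclass chain and ignores interfaces, falling back to Object when no common class is found. LeastUpperBound, join and joinAll are hypothetical names.

```java
final class LeastUpperBound {
    // Most specific superclass of a that is also a supertype of b.
    static Class<?> join(Class<?> a, Class<?> b) {
        Class<?> candidate = a;
        while (candidate != null && !candidate.isAssignableFrom(b)) {
            candidate = candidate.getSuperclass();
        }
        // getSuperclass() yields null for interfaces and primitives; use the top type.
        return candidate == null ? Object.class : candidate;
    }

    // Folds join over every type that flows into one type parameter.
    static Class<?> joinAll(Class<?>... types) {
        if (types.length == 0) {
            throw new IllegalArgumentException("no types given");
        }
        Class<?> result = types[0];
        for (Class<?> type : types) {
            result = join(result, type);
        }
        return result;
    }
}
```

For the example in the question, an Integer and a Test flowing into mainList share no superclass other than Object, which matches the desired ArrayList<Object>.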

Use a certain objects throughout the program based on a value at start

I'm writing a program that reads the bit depth of an image at startup. The bit depth is guaranteed to be one of 8, 16 or 32 bits. Once I have it, I do some processing on the image and create a few new images based on the output. To create the images I need to use specific classes, i.e. FloatProcessor, ShortProcessor and ByteProcessor, and their corresponding arrays, float[], short[] and byte[].
What I would like is to avoid having a switch or a bunch of ifs at every place where I need to determine which one to use. The three classes all extend a common class, but even if I wrapped the choice in a method, I would still have to return the base class, and I still wouldn't know which type to cast it to at the point of use.
Edit: What I really want is something along the lines of if(depth == 8) #define type ByteProcessor etc for 16 and 32

What about using Generics instead of inheritance? My Java is rusty, so I'll use C++ to demonstrate:
template<class DataT>
class Foo
{
public:
    DataT data;
    void processData()
    {
        // Do something here
    }
};
If you still need a switch statement in the processData function, you would still avoid having to put it all over your code. You may be able to use Generics in combination with the factory method pattern to get what you want.

Hypothetically speaking, let's say that all three inherit from a base TypeProcessor class:
abstract class TypeProcessor {
public abstract Image ProcessImage(Image input);
}
And, you have your specific classes:
class ByteProcessor extends TypeProcessor{
public byte data[];
public Image ProcessImage(Image input) {
//do stuff
return ret;
}
}
Obviously, if your program holds a TypeProcessor reference, you can't access data directly (without doing a bunch of type checking and casting and stuff).
The proper solution is to move the code that needs to access data into the class itself:
class ByteProcessor extends TypeProcessor {
    public byte data[];
    public Image ProcessImage(Image input) {
        //do stuff
        data = whatever;
        FrobData();
        Image ret = new Image(data);
        return ret;
    }
    void FrobData() {
        for (int i = 0; i < data.length; i++) {
            // the cast is needed because byte arithmetic is promoted to int
            data[i] = (byte) ((data[i] + 1) % 64);
        }
    }
}
This is obviously a contrived and very incorrect example, but it should give you the general idea.
This will introduce some code redundancy, but I assume that the calculations are different enough between different types not to warrant a more complicated solution.
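A self-contained sketch of that idea, with the bit-depth decision made once in a factory method. The class names PixelBuffer, BytePixels, ShortPixels and FloatPixels are invented here so that the example does not depend on any imaging library, and increment is just a stand-in for real depth-specific processing.

```java
abstract class PixelBuffer {
    abstract int bitDepth();
    abstract int length();
    abstract void increment();   // stand-in for depth-specific processing

    // The only place in the program that switches on the bit depth.
    static PixelBuffer forDepth(int depth, int pixelCount) {
        switch (depth) {
            case 8:  return new BytePixels(pixelCount);
            case 16: return new ShortPixels(pixelCount);
            case 32: return new FloatPixels(pixelCount);
            default: throw new IllegalArgumentException("unsupported bit depth " + depth);
        }
    }
}

final class BytePixels extends PixelBuffer {
    final byte[] data;
    BytePixels(int n) { data = new byte[n]; }
    @Override int bitDepth() { return 8; }
    @Override int length() { return data.length; }
    @Override void increment() {
        for (int i = 0; i < data.length; i++) data[i]++;
    }
}

final class ShortPixels extends PixelBuffer {
    final short[] data;
    ShortPixels(int n) { data = new short[n]; }
    @Override int bitDepth() { return 16; }
    @Override int length() { return data.length; }
    @Override void increment() {
        for (int i = 0; i < data.length; i++) data[i]++;
    }
}

final class FloatPixels extends PixelBuffer {
    final float[] data;
    FloatPixels(int n) { data = new float[n]; }
    @Override int bitDepth() { return 32; }
    @Override int length() { return data.length; }
    @Override void increment() {
        for (int i = 0; i < data.length; i++) data[i] += 1.0f;
    }
}
```

Callers hold a PixelBuffer and invoke its methods; nothing outside forDepth needs to know which array type is underneath, so no casts or repeated switches are required.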
