I have an immutable class with invariant checking. Following Effective Java 2nd Ed item 76, it has a readObject method that throws an InvalidObjectException if the deserialized object violates the invariants:
// readObject method with validity checking
private void readObject(ObjectInputStream s)
        throws IOException, ClassNotFoundException {
    // Populate the fields from the stream before checking them
    s.defaultReadObject();
    // Check that our invariants are satisfied
    if (/* some condition */)
        throw new InvalidObjectException("Invariant violated");
}
I know how to test serialization and deserialization, but this tests only the happy path. There is an ugly way of triggering the InvalidObjectException, where you hardcode a tampered byte stream (shamelessly stolen from EJ2 item 76):
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.ObjectInputStream;

public class BogusPeriod {
    // manipulated byte stream
    private static final byte[] serializedForm = new byte[] {
        (byte)0xac, (byte)0xed, 0x00, 0x05, /* ca. 100 more bytes omitted */ };

    // Returns the object with the specified serialized form
    private static Object deserializeBogusPeriod() {
        try {
            InputStream is = new ByteArrayInputStream(serializedForm);
            ObjectInputStream ois = new ObjectInputStream(is);
            return ois.readObject();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
This is really ugly and will probably break as soon as the serializable class changes.
I wonder if there is a simpler method of creating test cases like that? Maybe there is a library that knows at which offsets of a byte stream specific values are located to allow tampering at run time?
You assume that the object can be deserialized from a well-formed (non-corrupt) stream and want to run additional checks afterwards (for example, whether a date stored in a string is formatted correctly).
For a unit test, you could use a library such as Serialysis (https://weblogs.java.net/blog/2007/06/12/disassembling-serialized-java-objects) to inspect the byte streams generated from legitimately serialized objects, find out where in the stream your data is located, and modify that data during test setup.
If you trust the source of the data and have been able to deserialize it, it is better to use an interceptor or validator provided by your framework of choice (Spring, Java EE, etc.) at the moment the object reaches your application.
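The "modify your data during test setup" idea can be sketched without any library, under some assumptions. The example below is not the asker's class: Range, TamperedStreams and replaceLong are hypothetical names. It relies on the default serialized form writing long fields as 8 big-endian bytes, so a field set to an unusual sentinel value can be located in the stream and overwritten at run time instead of hardcoding the whole byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

// Hypothetical immutable class with the invariant start <= end.
final class Range implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long start;
    private final long end;

    Range(long start, long end) {
        if (start > end) throw new IllegalArgumentException("start after end");
        this.start = start;
        this.end = end;
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        if (start > end) throw new InvalidObjectException("start after end");
    }
}

// Hypothetical test helpers.
final class TamperedStreams {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Returns a copy of the stream in which the single occurrence of oldValue
    // (8 bytes, big-endian) is overwritten with newValue.
    static byte[] replaceLong(byte[] stream, long oldValue, long newValue) {
        byte[] needle = ByteBuffer.allocate(Long.BYTES).putLong(oldValue).array();
        byte[] patch = ByteBuffer.allocate(Long.BYTES).putLong(newValue).array();
        int found = -1;
        for (int i = 0; i + needle.length <= stream.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (stream[i + j] != needle[j]) { match = false; break; }
            }
            if (match) {
                if (found >= 0) throw new IllegalStateException("sentinel is not unique");
                found = i;
            }
        }
        if (found < 0) throw new IllegalStateException("sentinel not found");
        byte[] copy = stream.clone();
        System.arraycopy(patch, 0, copy, found, patch.length);
        return copy;
    }
}
```

A test would serialize a valid Range whose start is a sentinel such as 0x1122334455667788L, call replaceLong to swap it for Long.MAX_VALUE, and expect deserialize to throw InvalidObjectException. Because the offset is searched for rather than hardcoded, the test should survive most changes to the class, though it still depends on the field keeping its type.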
I'm doing my student project and building a testing tool for regression testing.
Main idea: capture all constructors/methods/functions invocations using AOP during runtime and record all data into a database. Later retrieve the data, run constructors/methods/functions in the same order, and compare return values.
I'm trying to serialize objects (and arrays of objects) into a byte array, record it into PostgreSQL as a blob, and later (in another runtime) retrieve that blob and deserialize it back to object. But when I deserialize data in another runtime it changes and, for example, instead of boolean, I retrieve int. If I do exactly the same operations in the same runtime (serialize - insert into the database - SELECT from the database - deserialize) everything seems to work correctly.
Here is how I record data:
private void writeInvocationRecords(InvocationData invocationData, boolean isConstructor) {
final List<InvocationData> invocationRecords = isConstructor ? constructorInvocationRecords : methodInvocationRecords;
final String recordsFileName = isConstructor ? "constructor_invocation_records.json" : "method_invocation_records.json";
byte[] inputArgsBytes = null;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = null;
try {
out = new ObjectOutputStream(bos);
inputArgsBytes = bos.toByteArray();
} catch (IOException e) {
} finally {
try {
} catch (IOException ex) {
// ignore close exception
byte[] returnValueBytes = null;
ByteArrayOutputStream rvBos = new ByteArrayOutputStream();
ObjectOutputStream rvOut = null;
try {
rvOut = new ObjectOutputStream(rvBos);
returnValueBytes = rvBos.toByteArray();
} catch (IOException e) {
} finally {
try {
} catch (IOException ex) {
// ignore close exception
if (invocationRecords.size() >= (isConstructor ? CONSTRUCTORS_CACHE_SIZE : METHODS_CACHE_SIZE)) {
List<InvocationData> tempRecords = new ArrayList<InvocationData>(invocationRecords);
try {
for (InvocationData record : tempRecords) {
SerialBlob blob = new javax.sql.rowset.serial.SerialBlob(inputArgsBytes);
SerialBlob rvBlob = new javax.sql.rowset.serial.SerialBlob(returnValueBytes);
psInsert.setString(1, record.className);
psInsert.setString(2, record.methodName);
psInsert.setArray(3, conn.createArrayOf("text", record.inputArgsTypes));
psInsert.setBinaryStream(4, blob.getBinaryStream());
psInsert.setString(5, record.returnValueType);
psInsert.setBinaryStream(6, rvBlob.getBinaryStream());
psInsert.setLong(7, record.invocationTimeStamp);
psInsert.setLong(8, record.invocationTime);
psInsert.setLong(9, record.orderId);
psInsert.setLong(10, record.threadId);
psInsert.setString(11, record.threadName);
psInsert.setInt(12, record.objectHashCode);
psInsert.setBoolean(13, isConstructor);
} catch (Exception e) {
Here is how I retrieve data:
List<InvocationData> constructorsData = new LinkedList<InvocationData>();
List<InvocationData> methodsData = new LinkedList<InvocationData>();
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery(SQL_SELECT);
while (rs.next()) {
Object returnValue = new Object();
byte[] returnValueByteArray = new byte[rs.getBinaryStream(7).available()];
returnValueByteArray = rs.getBytes(7);
final String returnType = rs.getString(6);
ByteArrayInputStream rvBis = new ByteArrayInputStream(returnValueByteArray);
ObjectInputStream rvIn = null;
try {
rvIn = new ObjectInputStream(rvBis);
switch (returnType) {
case "boolean":
returnValue = rvIn.readBoolean();
case "double":
returnValue = rvIn.readDouble();
case "int":
returnValue = rvIn.readInt();
case "long":
returnValue = rvIn.readLong();
case "char":
returnValue = rvIn.readChar();
case "float":
returnValue = rvIn.readFloat();
case "short":
returnValue = rvIn.readShort();
returnValue = rvIn.readObject();
} catch (IOException e) {
} catch (ClassNotFoundException e) {
} finally {
try {
if (rvIn != null) {
} catch (IOException ex) {
// ignore close exception
Object[] inputArguments = new Object[0];
byte[] inputArgsByteArray = new byte[rs.getBinaryStream(5).available()];
ByteArrayInputStream bis = new ByteArrayInputStream(inputArgsByteArray);
ObjectInput in = null;
try {
in = new ObjectInputStream(bis);
inputArguments = (Object[])in.readObject();
} catch (IOException e) {
} catch (ClassNotFoundException e) {
} finally {
try {
if (in != null) {
} catch (IOException ex) {
// ignore close exception
InvocationData invocationData = new InvocationData(
if (rs.getBoolean(14)) {
} else {
An explosion of errors and misguided ideas inherent in this question:
Your read and write code is broken.
available() doesn't work. Well, it does what the javadoc says it does, and if you read the javadoc, and read it very carefully, you should come to the correct conclusion that what that is, is utterly useless. If you ever call available(), you've messed up. You're doing so here. More generally your read and write code doesn't work. For example, .read(byteArr) also doesn't do what you think it does. See below.
The entire principle behind what you're attempting to do, doesn't work
You can't 'save the state' of arbitrary objects. Even where you can, it is certainly not in the way you're doing it, and in general it is advanced Java that involves hacking the JDK itself. Think of an InputStream that represents data flowing over a network connection. What do you imagine the 'serialization' of this InputStream object should look like? If you consider serialization as 'just represent the underlying data in memory', then what you'd get is a number that represents the OS 'pipe handle', and possibly some IP, port, and sequence numbers. This is a tiny amount of data, and all of it is completely useless: it doesn't say anything meaningful about that connection and it cannot be used to reconstitute it, at all. That holds even within the 'scope' of a single session (i.e. where you serialize, and then deserialize almost immediately afterwards), because a network connection is a stream: once you grab a byte (or send a byte), it's gone.
The only useful serialization strategy, especially for the notion of 'let's replay everything that happened as a test', involves actually 'recording' all the bytes that were picked up, as it happens, on the fly. This is not a thing that you can do as a 'moment in time' concept; it's continuous. You need a system that records everything (every InputStream, every OutputStream, every time System.currentTimeMillis() is invoked, every time a random number is generated, etc.), and that then uses those recordings when your API is asked to 'save' an arbitrary state.
Serialization instead is a thing that objects need to opt into, and where they may have to write custom code to properly deal with it. Not all objects can even be serialized (an InputStream representing a network pipe, as above, is one example of an object that cannot be serialized), and for some, serializing them requires some fancy footwork, and the only hope you have is that the authors of the code that powers this object put in that effort. If they didn't, there is nothing you can do.
The serialization framework of java awkwardly captures both of these notions. It does mean that your code, even if you fix the bugs in it, will fail on most objects that can exist in a JVM. Your testing tool can only be used to test the most simplistic code.
If you're okay with that, read on. But if not, you need to completely rethink what you're going to do with this.
ObjectOutputStream sucks
This is not just my opinion, the openjdk team itself is broadly in agreement (they probably wouldn't quite put it like that, of course). The data emitted by OOS is a weird, inefficient, and underspecced binary blob. You can't analyse this data in any feasible way other than spending a few years reverse engineering the protocol, or just deserializing it (which requires having all the classes, and a JVM - this can be an acceptable burden, depends on your use case).
Contrast to e.g. Jackson which serializes data into JSON, which you can parse with your eyeballs, or in any language, and even without the relevant class files. You can construct 'serialized JSON' yourself without the benefit of first having an object (for testing purposes this sounds like a good idea, no? You need to test this testing framework too!).
How do I fix this code?
If you understand all the caveats above and somehow still conclude that this project, as written and continuing to use the ObjectOutputStream API is still what you want to do (I really, really doubt that's the right call):
Use the newer APIs. available() does not return the size of that blob. read(someByteArray) is not guaranteed to fill the entire byte array. Just read the javadoc, it spells it out.
There is no way to determine the size of an InputStream by asking that InputStream. You may be able to ask the DB itself (usually, LENGTH(theBlobColumn) works great in a SELECT query).
If you somehow (e.g. using LENGTH(tbc)) know the full size, you can use DataInputStream's readFully method (or InputStream's readNBytes on Java 9 and later), which will actually read all the bytes, vs. read, which reads at least 1 but is not guaranteed to read all of them. The idea is: it'll read the smallest chunk that is available. Imagine a network pipe where bytes are dribbling into the network card's buffer, one byte a second. If so far 250 bytes have dribbled in and you call .read(some500SizeByteArr), then you get 250 bytes (250 of the 500 bytes are filled in, and 250 is returned). If you call .readFully(some500SizeByteArr), then the code will wait about 250 seconds and return only once all 500 bytes are filled in (readFully returns nothing; it throws EOFException if the stream ends early). That's the difference, and that explains why read works the way it does. Said differently: if you do not check what read() is returning, your code is definitely broken.
If you do not know how much data there is, your only option involves a while loop, or to call a helper method that does that. You need to make a temporary byte array, then in a loop keep calling read until it returns -1. For every loop, take the bytes in that array from 0 to (whatever the read call returned), and send these bytes someplace else. For example, a ByteArrayOutputStream.
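The loop described above can be sketched as follows. Streams and readAll are hypothetical names, and the 8192-byte buffer size is an arbitrary choice. On Java 9 and later, InputStream.readAllBytes() does the same job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: drains a stream of unknown length into a byte array.
final class Streams {
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns how many bytes it actually stored, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            // copy only the part of the buffer that this call filled
            sink.write(buffer, 0, n);
        }
        return sink.toByteArray();
    }
}
```

With this, the retrieval code would not need available() at all: the blob stream can be passed to readAll and the resulting array handed to a ByteArrayInputStream.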
Class matching
when I deserialize data in another runtime it changes and, for example, instead of boolean, I retrieve int
The java serialization system isn't magically changing your stuff on you. Well, put a pin in that. Most likely the class file available in the first run (where you saved the blob in the db) was different from what it looked like in your second run. Voila, problem.
More generally this is a problem in serialization. If you serialize, say, class Person {Date dob; String name;}, and then in a later version of the software you realize that using a j.u.Date to store a date of birth is a very silly idea, as Date is an unfortunately named class (it represents an instant in time and not a date at all), so you replace it with a LocalDate instead, thus ending up with class Person{LocalDate dob; String name;}, then how do you deal with the problem that you now want to deserialize a BLOB that was made back when the Person.class file still had the broken Date dob; field?
The answer is: You can't. Java's baked in serialization mechanism will flat out throw an exception here, it will not try to do this. This is the serialVersionUID system: Classes have an ID and changing anything about them (such as that field) changes this ID; the ID is stored in the serialized data. If the IDs don't match, deserialization cannot be done. You can force the ID (make a field called serialVersionUID - you can search the web for how to do that), but then you'd still get an error, java's deserializer will attempt to deserialize a Date object into a LocalDate dob; field and will of course fail.
Classes can write their own code to solve this problem. This is non-trivial and is irrelevant to you, as you're building a framework and presumably can't pop in and write code for your testing framework's userbase's custom class files.
I told you to put a pin in 'the serialization mechanism isn't going to magically change types on you'. Put in sufficient effort with overriding serialVersionUID and such and you can end up there. But that'd be because you wrote code that confuses types, e.g. in your readObject implementation (again, search the web for java's serialization mechanism, readObject/writeObject, or just start reading the javadoc of java.io.Serializable; that's a good starting-off point).
Style issues
You create objects for no purpose; you seem to have some trouble with the distinction between a variable/reference and an object. You aren't using try-with-resources. The way your SELECT calls are made suggests you have an SQL injection security issue. e.printStackTrace() as the only line in a catch block is always incorrect.
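For the try-with-resources point, a minimal sketch of the serialize and deserialize steps might look like this. SerializationRoundTrip, toBytes and fromBytes are hypothetical names, not part of the question's code. It always uses writeObject and readObject, so primitives travel as their boxed types and no per-type switch is needed on the reading side; that is a design assumption of this sketch, not a diagnosis of the original bug.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: object <-> byte[] using try-with-resources.
final class SerializationRoundTrip {
    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } // closing the stream flushes it, so bos is complete from here on
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Exceptions are left to propagate to the caller rather than being caught and ignored, so a failed write can never be silently stored as an empty or truncated blob.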
I have a class that takes an InputStream as an argument to read data.
public class Foo {
    private DataInput in;

    public Foo(InputStream ism) {
        in = new DataInputStream(ism);
    }

    public byte readByte() throws IOException {
        return in.readByte();
    }
}
Sometimes this InputStream might come from a Socket, e.g.,
ism = new BufferedInputStream(sock.getInputStream());
foo = new Foo(ism);
My question is, is it possible to check from within Foo that the input stream comes from a Socket, i.e., that it's network I/O rather than local I/O? Since the sock.getInputStream() call returns the abstract class InputStream, I don't know which concrete input stream implementation to test for.
Edit: the motivation is that there is a piece of big Java software that has this structure. Foo is created in many places, some with a file input stream and others with a socket input stream. The software can perform poorly when the read is across the network. So I want to see if it's possible to do tracing to differentiate the two scenarios for this software without changing much of its code. I'm using AspectJ to write the tracing in the hope of not making a mess of this existing software.
The problem is that an InputStream can be a FilterInputStream that is constructed around another InputStream and that socket just returns an InputStream.
One approach, very dirty and buggy: find the root InputStream, that is, loop or recurse while the stream is an instance of FilterInputStream and check its wrapped InputStream (the protected field in). Then check the class of the root; the name probably contains "Socket" if it comes from a Socket.
AspectJ idea (I do not have that much experience with it): you should be able to add an aspect to the getInputStream method of Socket that stores the returned InputStream in a list (or similar) for later checking, or somehow marks that InputStream (adding a flag/method to it?).
You can create two subclasses of BufferedInputStream and wrap the stream in one of them before passing it into the Foo class.
NetworkInputStream nis = new NetworkInputStream(sock.getInputStream());
Foo networkFoo = new Foo(nis);
FileInputStream fis = new FileInputStream(file.getInputStream());
Foo fileFoo = new Foo(fis);
// Marker subclasses. Note that this FileInputStream is your own class, not java.io.FileInputStream.
public class NetworkInputStream extends BufferedInputStream {
    public NetworkInputStream(InputStream in) { super(in); }
}
public class FileInputStream extends BufferedInputStream {
    public FileInputStream(InputStream in) { super(in); }
}
Then, in the Foo class:
public Foo(InputStream ism) {
    if (ism instanceof NetworkInputStream) {
        // Do whatever if it's from a network stream
    }
    if (ism instanceof FileInputStream) {
        // Do whatever else
    }
    in = new DataInputStream(ism);
}
class CSVReader {
    private List<String> output;
    private InputStream input;

    public CSVReader(InputStream input) {
        this.input = input;
    }

    public void read() throws Exception {
        // do something with the inputstream
        // create output list.
    }

    public List<String> getOutput() {
        return Collections.unmodifiableList(output);
    }
}
I am trying to create a simple class which will be part of a library. I would like to create code that satisfies the following conditions:
- handles all potential errors, or wraps them into library errors and throws them
- creates meaningful and complete object states (no incomplete object structures)
- is easy to use for developers using the library
Now, when I evaluated the code above, against the goals, I realized that I failed badly. A developer using this code would have to write something like this -
CSVReader reader = new CSVReader(new FileInputStream("test.csv"));
I see the following issues straight away -
- developer has to call read first before getOutput. There is no way for him to know this intuitively and this is probably bad design.
So, I decided to fix the code and write something like this
public List<String> getOutput() throws IOException {
    return Collections.unmodifiableList(output);
}
OR this
public List<String> getOutput() {
    if (output == null) {
        throw new IncompleteStateException("invoke read before getoutput()");
    }
    return Collections.unmodifiableList(output);
}
OR this
public CSVReader(InputStream input) {
    read(); // throw runtime exception
}
OR this
public List<String> read() throws IOException {
    // read and create output list.
    // return list
}
What is a good way to achieve my goals? Should the object state be always well defined? - there is never a state where "output" is not defined, so I should create the output as part of constructor? Or should the class ensure that a created instance is always valid, by calling "read" whenever it finds that "output" is not defined and just throw a runtime exception? What is a good approach/ best practice here?
I would make read() private and have getOutput() call it as an implementation detail. If the point of exposing read() is to lazy-load the file, you can do that by exposing only getOutput:
public List<String> getOutput() {
    if (output == null) {
        try {
            output = read();
        } catch (IOException e) {
            // here you either wrap it into your own exception (an unchecked one is used as a stand-in),
            // or don't catch it at all and make getOutput `throws IOException`
            throw new UncheckedIOException(e);
        }
    }
    return Collections.unmodifiableList(output);
}
The advantage of this is that the interface of your class is very trivial: you give me an input (via constructor) I give you an output (via getOutput), no magic order of calls while preserving lazy-loading which is nice if the file is big.
Another advantage of removing read from the public API is that you can go from lazy-loading to eager-loading and vice versa without affecting your clients. If you expose read, you have to account for it being called in all possible states of your object (before it's loaded, while it's loading, after it has already loaded). In short, always expose as little as possible.
So to address your specific questions:
Yes, the object state should always be well-defined. The fact that a client cannot know it has to call read first is indeed a design smell.
Yes, you could call read in the constructor and eagerly load everything upfront. Whether to lazy-load is an implementation detail that depends on your context; it should not matter to a client of your class.
Throwing an exception if read has not been called again puts the burden of calling things in the right, implicit order on the client. That is unnecessary: as you note, output is never really undefined, so the implementation itself can make the risk-free decision of when to call read.
I would suggest you make your class as small as possible, dropping the getOutput() method altogether.
The idea is to have a class that reads a CSV file and returns a list, representing the result. To achieve this, you can expose a single read() method, that will return a List<String>.
Something like:
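public class CSVReader {
    private final InputStream input;

    // FileInputStream's constructor throws the checked FileNotFoundException
    public CSVReader(String filename) throws FileNotFoundException {
        this.input = new FileInputStream(filename);
    }

    public List<String> read() {
        // perform the actual reading here
    }
}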
You have a well defined class, a small interface to maintain and the instances of CSVReader are immutable.
Have getOutput check if it is null (or out of date) and load it in automatically if it is. This allows for a user of your class to not have to care about internal state of the class's file management.
However, you may also want to expose a read function so that the user can choose to load the file when it is convenient. If you are writing the class for a concurrent environment, I would recommend doing so.
The first approach takes away some flexibility from the API: before the change the user could call read() in a context where an exception is expected, and then call getOutput() exception-free as many times as he pleases. Your change forces the user to catch a checked exception in contexts where it wasn't necessary before.
The second approach is how it should have been done in the first place: since calling read() is a prerequisite of calling getOutput(), it is a responsibility of your class to "catch" your users when they "forget" to make a call to read().
The third approach hides IOException, which may be a legitimate exception to catch. There is no way to let the user know if the exception is going to be thrown or not, which is a bad practice when designing runtime exceptions.
The root cause of your problem is that the class has two orthogonal responsibilities:
Reading a CSV, and
Storing the result of a read for later use.
If you separate these two responsibilities from each other, you would end up with a cleaner design, in which the users would have no confusion over what they must call, and in what order:
interface CSVData {
    List<String> getOutput();
}
class CSVReader {
    public static CSVData read(InputStream input) throws IOException {
        // ...
    }
}
You could combine the two into a single class with a factory method:
class CSVData {
    private CSVData() { // No user instantiation
    }
    // Getting data is exception-free
    public List<String> getOutput() {
        // ...
    }
    // Creating instances requires a factory call
    public static CSVData read(InputStream input) throws IOException {
        // ...
    }
}
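A filled-in version of the factory-method variant might look like the sketch below. The parsing is an assumption made only to keep the example runnable: each line of the input becomes one list entry, the input is treated as UTF-8, and closing the stream is left to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CSVData {
    private final List<String> output;

    // No user instantiation: instances only come from read()
    private CSVData(List<String> output) {
        this.output = Collections.unmodifiableList(output);
    }

    // Getting data is exception-free: the list was fully built before the object existed
    public List<String> getOutput() {
        return output;
    }

    // Creating instances requires a factory call; all I/O errors surface here
    public static CSVData read(InputStream input) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line); // assumption: one entry per line, no further parsing
        }
        return new CSVData(lines);
    }
}
```

A caller writes `CSVData data = CSVData.read(stream);` and can then call `data.getOutput()` any number of times. There is no state in which the object exists but its output does not.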
I have some input that I add to a serialized object.
Now when I read the serialized object, I want to check whether it exists. If not, loop until it has a value in it.
How do I modify the deserialization function to handle that?
There is basically a delay in populating my serializable object, so if I read that object in the meantime, it is going to be empty. I want to add a check so that I read it only when it has data in it; if not, it should wait until it has some data.
public String _displayResults() {
    String SomeData = "";
    try {
        FileInputStream fis = new FileInputStream("SomeDataobj");
        ObjectInputStream ois = new ObjectInputStream(fis);
        SomeData = (String) ois.readObject();
    }
    catch (Exception e) {
        System.out.println("Exception during deserialization: ");
    }
    return SomeData;
}
What I tried:
I added a wait of 2 seconds, repeated up to 10 times. Is there a cleaner way?
while ( ois.readObject().toString().equalsIgnoreCase("") && i <10){
Java provides an interface called Externalizable, which allows you to customize (de)serialization. Serializable is a marker interface that indicates the object can be written to an output stream. Externalizable declares two methods, readExternal() and writeExternal(), which you implement to define the behavior yourself.
Your question is not so clear about what you want to achieve, so I am not sure if the above information is helpful for you
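As a rough illustration of the two Externalizable methods, here is a sketch with a hypothetical Message class. Rejecting an empty string in readExternal is an assumption about what "has data in it" means in the question; it does not wait for data, it only makes an empty object fail loudly so the caller can decide to retry.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical class that controls its own serialized form.
final class Message implements Externalizable {
    private String text = "";

    // Externalizable requires a public no-arg constructor for deserialization
    public Message() { }

    public Message(String text) {
        this.text = text;
    }

    public String getText() {
        return text;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(text);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        String value = in.readUTF();
        // assumption: an empty string means the writer has not populated the object yet
        if (value.isEmpty()) throw new InvalidObjectException("no data yet");
        text = value;
    }
}
```

Instances are still written and read with ObjectOutputStream.writeObject and ObjectInputStream.readObject; the difference from Serializable is that these two methods decide exactly which bytes go into the stream.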
I'm writing GC friendly code to read and return to the user a series of byte[] messages. Internally I reuse the same ByteBuffer which means I'll repeatedly return the same byte[] instance most of the time.
I'm considering writing cautionary javadoc and exposing this to the user as an Iterator<byte[]>. AFAIK it won't violate the Iterator contract, but the user certainly could be surprised if they do Lists.newArrayList(myIterator) and get back a List populated with the same byte[] in each position!
The question: is it bad practice for a class that may mutate and return the same object to implement the Iterator interface?
If so, what is the best alternative? "Don't mutate/reuse your objects" is an easy answer. But it doesn't address the cases when reuse is very desirable.
If not, how do you justify violating the principle of least astonishment?
Two minor notes:
I'm using Guava's AbstractIterator so remove() isn't really of concern.
In my use case the user is me and the visibility of this class will be limited, but I've tried to ask this generally enough to apply more broadly.
Update: I'm accepting Louis' answer because it has 3x more votes than Keith's, but note that in my use case I'm planning to take the code that I left in a comment on Keith's answer to production.
EnumMap did essentially exactly this in its entrySet() iterator, which causes confusing, crazy, depressing bugs to this day.
If I were you, I just wouldn't use an Iterator -- I'd write a different API (possibly quite dissimilar from Iterator, even) and implement that. For example, you might write a new API that takes as input the ByteBuffer to write the message into, so users of the API could control whether or not the buffer gets reused. That seems reasonably intuitive (the user can write code that obviously and cleanly reuses the ByteBuffer), without creating unnecessarily cluttered code.
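One possible shape for such an API is sketched below. MessageSource, readInto and ListMessageSource are hypothetical names, and the list-backed implementation only stands in for whatever really produces the messages.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.List;

// Hypothetical API: the caller supplies the buffer, so the caller decides about reuse.
interface MessageSource {
    // Clears dst, writes the next message into it and flips it for reading.
    // Returns false (leaving dst untouched) when there are no more messages.
    boolean readInto(ByteBuffer dst);
}

// Stand-in implementation backed by an in-memory list of messages.
final class ListMessageSource implements MessageSource {
    private final Iterator<byte[]> messages;

    ListMessageSource(List<byte[]> messages) {
        this.messages = messages.iterator();
    }

    @Override
    public boolean readInto(ByteBuffer dst) {
        if (!messages.hasNext()) return false;
        dst.clear();
        dst.put(messages.next()); // throws BufferOverflowException if dst is too small
        dst.flip();
        return true;
    }
}
```

A caller that wants reuse allocates one ByteBuffer and passes it to every readInto call in a loop; a caller that wants to keep each message passes a fresh buffer each time. Either way the reuse is visible in the calling code rather than hidden behind an Iterator.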
I would define an intermediate object which you can invalidate. So your function would return an Iterator<ByteArray>, and ByteArray is something like this:
class ByteArray {
    private byte[] data;
    ByteArray(byte[] d) { data = d; }
    byte[] getData() {
        if (data == null) throw new BadUseOfIteratorException();
        return data;
    }
    void invalidate() { data = null; }
}
Then your iterator can invalidate the previously returned ByteArray so that any future access (via getData, or any other accessor you provide) will fail. If someone then does something like Lists.newArrayList(myIterator), they will at least get an error (when the first invalid ByteArray is accessed) instead of silently getting the wrong data.
Of course, this won't catch all possible bad uses, but probably the common ones. If you're happy with never returning the raw byte[] and providing accessors like byte get(int idx) instead, then it should catch all cases.
You will have to allocate a new ByteArray for each iterator return, but hopefully that's a lot less expensive than copying your byte[] for each iterator return.
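To show how the iterator side of this could fit together, here is a sketch under simplifying assumptions: ReusingIterator is a hypothetical name, the messages come from an in-memory list, every message has exactly the buffer's length, and IllegalStateException stands in for BadUseOfIteratorException to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.List;

// Wrapper that can be invalidated once the iterator moves on.
final class ByteArray {
    private byte[] data;

    ByteArray(byte[] d) { data = d; }

    byte[] getData() {
        if (data == null) throw new IllegalStateException("stale ByteArray: the iterator has moved on");
        return data;
    }

    void invalidate() { data = null; }
}

// Hypothetical iterator that reuses one buffer and invalidates the previous wrapper.
final class ReusingIterator implements Iterator<ByteArray> {
    private final Iterator<byte[]> source; // stand-in for the real message source
    private final byte[] shared;           // the single reused buffer
    private ByteArray previous;

    ReusingIterator(List<byte[]> messages, int messageSize) {
        this.source = messages.iterator();
        this.shared = new byte[messageSize];
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public ByteArray next() {
        byte[] message = source.next(); // throws NoSuchElementException when exhausted
        if (previous != null) previous.invalidate();
        // assumption: message.length == shared.length
        System.arraycopy(message, 0, shared, 0, message.length);
        previous = new ByteArray(shared);
        return previous;
    }
}
```

Only the small wrapper is allocated per element; the byte[] itself is allocated once. Code that keeps a wrapper past the next call to next() gets an exception from getData() instead of another message's bytes.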
Just like Keith Randall I'd also create Iterator<ByteArray>, but working quite differently (the annotations below come from lombok):
public class ByteArray {
@Getter private final byte[] data;
private final ByteArrayIterable source;
void allowReuse() {
public class ByteArrayIterable implements Iterable<ByteArray> {
private boolean allowReuse;
public void allowReuse() {
allowReuse = true;
public Iterator<ByteArray> iterator() {
return new AbstractIterator<ByteArray>() {
private ByteArray nextElement;
public ByteArray computeNext() {
if (noMoreElements()) return endOfData();
if (!allowReuse) nextElement =
new ByteArray(new byte[length], ByteArrayIterable.this);
allowReuse = false;
Now in calls like Lists.newArrayList(myIterator) always a new byte array gets allocated, so everything works. In your loops like
for (ByteArray a : myByteArrayIterable) {
the buffer gets reused. No harm may result, unless you call allowReuse() by mistake. If you forget to call it, then you get worse performance but correct behavior.
Now I see it could work without ByteArray, the important thing is that myByteArrayIterable.allowReuse() gets called, which could be done directly.