using BerkeleyDB to replace java.util.List - java

Does someone have sample code showing how to replace a Java List (LinkedList or ArrayList) with something similar in BerkeleyDB? My problem is that I have to replace Lists to scale beyond main-memory limits. Some simple sample code would be really nice.
I've now used a simple TupleBinding for the Integer keys and a SerialBinding for the Diff class (the data values).
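For reference, the binding setup looks roughly like this (a simplified sketch; the environment path and database names are placeholders, not the real ones from my project):

import java.io.File;

import com.sleepycat.bind.EntryBinding;
import com.sleepycat.bind.serial.SerialBinding;
import com.sleepycat.bind.serial.StoredClassCatalog;
import com.sleepycat.bind.tuple.TupleBinding;
import com.sleepycat.collections.StoredMap;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

// Open (or create) a transactional environment and two databases:
// one for the class catalog, one for the actual data.
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);
Environment env = new Environment(new File("/tmp/diff-env"), envConfig);

DatabaseConfig dbConfig = new DatabaseConfig();
dbConfig.setAllowCreate(true);
dbConfig.setTransactional(true);
Database catalogDb = env.openDatabase(null, "classCatalog", dbConfig);
Database diffDb = env.openDatabase(null, "diffs", dbConfig);

// Integer keys via a tuple binding, Diff values via a serial binding.
StoredClassCatalog catalog = new StoredClassCatalog(catalogDb);
EntryBinding<Integer> keyBinding = TupleBinding.getPrimitiveBinding(Integer.class);
EntryBinding<Diff> valueBinding = new SerialBinding<Diff>(catalog, Diff.class);

// writeAllowed = true; the StoredMap then behaves like a java.util.Map.
StoredMap<Integer, Diff> map = new StoredMap<Integer, Diff>(diffDb, keyBinding, valueBinding, true);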
Now I'm receiving the Error:
14:03:29.287 [pool-5-thread-1] ERROR o.t.g.view.model.TraverseCompareTree - org.treetank.diff.Diff; local class incompatible: stream classdesc serialVersionUID = 8484615870884317488, local class serialVersionUID = -8805161170968505227
java.io.InvalidClassException: org.treetank.diff.Diff; local class incompatible: stream classdesc serialVersionUID = 8484615870884317488, local class serialVersionUID = -8805161170968505227
The listener and TransactionRunner classes which I'm using are:
/** {@inheritDoc} */
@Override
public void diffListener(final EDiff paramDiff, final IStructuralItem paramNewNode,
        final IStructuralItem paramOldNode, final DiffDepth paramDepth) {
    try {
        mRunner.run(new PopulateDatabase(mDiffDatabase, mKey++,
            new Diff(paramDiff, paramNewNode.getNodeKey(), paramOldNode.getNodeKey(), paramDepth)));
    } catch (final Exception e) {
        LOGWRAPPER.error(e.getMessage(), e);
    }
}
private static class PopulateDatabase implements TransactionWorker {
    private StoredMap<Integer, Diff> mMap;
    private int mKey;
    private Diff mValue;

    public PopulateDatabase(final DiffDatabase paramDatabase, final int paramKey, final Diff paramValue) {
        Objects.requireNonNull(paramDatabase);
        Objects.requireNonNull(paramValue);
        mMap = paramDatabase.getMap();
        mKey = paramKey;
        mValue = paramValue;
    }

    @Override
    public void doWork() throws DatabaseException {
        mMap.put(mKey, mValue);
    }
}
I don't know why it doesn't work :-/
Edit: Sorry, I just had to delete the generated environment/database and create a new one (the class catalog still contained the descriptor of the old version of the Diff class, hence the serialVersionUID mismatch).

I'm afraid it won't be that simple. As a first step, you might want to refactor your code so that all accesses to the list go through a separate class (call it a DAO, if you like). Then it will be a lot easier to move to a database instead of the list.
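For example, a first cut of such a DAO might look like this (a hypothetical sketch; the names are made up):

// Once all list accesses go through an interface like this,
// the backing store can be swapped without touching the callers.
public interface DiffDao {
    void add(Diff diff);
    Diff get(int index);
    int size();
}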

Berkeley DB is severe overkill for this type of task. It's a fair beast to configure and set up, plus I believe the license is now commercial. You'll be much better off using a disk-backed list or map. As an example of the latter, take a look at Kyoto Cabinet. It's extremely fast, implements the standard Java Collections interface and is as easy to use as a List or Map. See my other answer for example code.
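From memory, the Java binding is roughly this simple (a sketch only; the file name is a placeholder, and the API should be checked against the current docs):

import kyotocabinet.DB;

public class KyotoSketch {
    public static void main(String[] args) {
        DB db = new DB();
        // ".kch" selects a file hash database; OWRITER|OCREATE creates it if missing
        if (!db.open("records.kch", DB.OWRITER | DB.OCREATE)) {
            System.err.println("open error: " + db.error());
            return;
        }
        db.set("0", "first value");   // keyed by index to mimic a list
        db.set("1", "second value");
        System.out.println(db.get("0"));
        db.close();
    }
}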

Java/Spring -> how to structure (Design Pattern) the relationship between multiple classes involved in the same process

TL;DR:
Does my DailyRecordDataManager class have a code smell? Is it a 'God Class', and how can I improve the structure?
Hi,
I'm working on my first project with Spring. It's going to fetch COVID-19 data from the government website of Madrid (where I live), organise it by locality, and serve it up through an API.
Here is a sample of the JSON data I'm consuming.
{
  "codigo_geometria": "079603",
  "municipio_distrito": "Madrid-Retiro",
  "tasa_incidencia_acumulada_ultimos_14dias": 23.4668991007149,
  "tasa_incidencia_acumulada_total": 1417.23308497532,
  "casos_confirmados_totales": 1691,
  "casos_confirmados_ultimos_14dias": 28,
  "fecha_informe": "2020/07/01 09:00:00"
}
Each JSON object is a record of cases and the infection rate on a specific date and for a specific municipal district.
After fetching the data the program parses it, filters it, trims/rounds some properties, maps it by locality, uses it to create an object for each locality (DistrictData), and writes the DistrictData objects to a MongoDB instance.
At the moment I have split each of these steps into separate classes, as per the single responsibility principle, as can be seen in the linked screenshot:
[screenshot of IntelliJ package structure]
My problem is I don't know how to link these multiple classes together.
At the moment I have a Manager class which smells a bit like a God Class to me:
@Service
public class DailyRecordDataManager implements DataManager {

    private final Logger logger = LoggerFactory.getLogger(DailyRecordDataManager.class);

    private final DailyRecordDataCollector<String> dataCollector;
    private final DataVerifier<String> dataVerifier;
    private final JsonParser<DailyRecord> dataParser;
    private final DataFilter<List<DailyRecord>> dataFilter;
    private final DataTrimmer<List<DailyRecord>> dataTrimmer;
    private final DataSorter<List<DailyRecord>> dataSorter;
    private final DataMapper<List<DailyRecord>> dataMapper;
    private final DataTransformer dataTransformer;
    private final DistrictDataService districtDataService;

    public DailyRecordDataManager(DailyRecordDataCollector<String> collector,
                                  DataVerifier<String> verifier,
                                  JsonParser<DailyRecord> parser,
                                  DataFilter<List<DailyRecord>> dataFilter,
                                  DataTrimmer<List<DailyRecord>> dataTrimmer,
                                  DataSorter<List<DailyRecord>> dataSorter,
                                  DataMapper<List<DailyRecord>> dataMapper,
                                  DataTransformer dataConverter,
                                  DistrictDataService districtDataService) {
        this.dataCollector = collector;
        this.dataVerifier = verifier;
        this.dataParser = parser;
        this.dataFilter = dataFilter;
        this.dataTrimmer = dataTrimmer;
        this.dataSorter = dataSorter;
        this.dataMapper = dataMapper;
        this.dataTransformer = dataConverter;
        this.districtDataService = districtDataService;
    }

    @Override
    public boolean newData() {
        String data = dataCollector.collectData();
        if (!dataVerifier.verifyData(data)) {
            logger.debug("Data is not new.");
            return false;
        }
        List<DailyRecord> parsedData = dataParser.parse(data);
        if (parsedData.isEmpty()) {
            return false;
        }
        List<DailyRecord> filteredData = dataFilter.filter(parsedData);
        List<DailyRecord> trimmedData = dataTrimmer.trim(filteredData);
        List<DailyRecord> sortedData = dataSorter.sort(trimmedData);
        Map<String, List<DailyRecord>> mappedData = dataMapper.map(sortedData);
        List<DistrictData> convertedData = dataTransformer.transform(mappedData);
        districtDataService.save(convertedData);
        return true;
    }
}
I also thought about linking all of the involved classes together in a chain of injected dependencies, so that each class has the next class in the process as a dependency and, provided nothing goes wrong with the data, calls it when its own step is done.
However, I also feel there must be a design pattern that solves the problem I have!
Thanks!
For anyone who finds this and wonders what I went with: I ended up opting for the Pipeline pattern.
It allowed me to easily organise all of the individual classes I was using into one clean workflow, and it made each stage of the process, as well as the pipeline class itself, very easy to test.
I highly recommend anyone interested in the pattern in Java to check out this article, which I used extensively.
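The core idea can be sketched like this (a hypothetical generic version, not the exact code from the article):

// A minimal generic pipeline; Step and Pipeline are illustrative names.
interface Step<I, O> {
    O process(I input);
}

class Pipeline<I, O> {
    private final Step<I, O> current;

    Pipeline(Step<I, O> current) {
        this.current = current;
    }

    // Compose this pipeline with a further step.
    <N> Pipeline<I, N> pipe(Step<O, N> next) {
        return new Pipeline<>(input -> next.process(current.process(input)));
    }

    O execute(I input) {
        return current.process(input);
    }
}

Each of the existing classes (parser, filter, trimmer, and so on) becomes a Step, and newData() collapses into building the pipeline once and calling execute() on the collected data.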

Constructor over-injection and Facade Service concept

I have a pretty simple interface which manages the update of business proposals; specifically, during a nightly batch process each record is submitted here (but it might be used in other scenarios).
This interface is used inside an EJB 2.0 bean, which fetches records and "cycles" through them.
Be aware that names are translated from Italian to English, so pardon possible errors. I have also simplified some concepts.
public interface ProposalUpdateService {
    void updateProposal(final ProposalTypeOne proposal);
    void updateProposal(final ProposalTypeTwo proposal);
}
The implementation of this interface has quite a lot of dependencies:
public class ProposalUpdateDefaultService implements ProposalUpdateService {

    private final ComplexService complexService;
    private final OtherComplexService otherComplexService;

    private final ProposalStep<Proposal> stepOne;
    private final ProposalStep<Proposal> stepTwo;
    private final ProposalStep<ProposalTypeTwo> stepThree;
    private final ProposalStep<Proposal> stepFour;

    public ProposalUpdateDefaultService(
            final ComplexService complexService,
            final OtherComplexService otherComplexService,
            final YetAnotherComplexService yetAnotherComplexService,
            final SimpleService simpleService,
            final OtherSimpleService otherSimpleService,
            final YetAnotherSimpleService yetAnotherSimpleService,
            final Converter<ProposalTypeOne, ComplexServiceType> converterProposalTypeOne,
            final Converter<ProposalTypeTwo, OtherComplexServiceType> converterProposalTypeTwo) {
        this.complexService = complexService;
        this.otherComplexService = otherComplexService;
        stepOne = new StepOne(yetAnotherComplexService);
        stepTwo = new StepTwo(
            complexService,
            otherComplexService,
            yetAnotherComplexService,
            converterProposalTypeOne,
            converterProposalTypeTwo);
        stepThree = new StepThree(
            simpleService,
            otherSimpleService,
            yetAnotherSimpleService);
        stepFour = new StepFour();
    }
...
As you can see, this class encapsulates the update of a Proposal object, and the process is split into four phases, each representing a single concept (such as "should this proposal be expired?" or "should I advance its state?"). Those four phases may be arranged differently for different types of Proposal.
Here is the highly simplified implementation of those two updateProposal methods:
@Override
public void updateProposal(final ProposalTypeOne proposal) {
    stepOne.process(proposal);
    stepTwo.process(proposal);
    if (...) {
        stepFour.process(proposal);
    }
}

@Override
public void updateProposal(final ProposalTypeTwo proposal) {
    stepOne.process(proposal);
    stepTwo.process(proposal);
    stepThree.process(proposal);
    stepFour.process(proposal);
}
The two private fields
private final ComplexService complexService;
private final OtherComplexService otherComplexService;
are used by private helper methods.
As you can see, this class just organizes and delegates work; however, it depends on too many other classes. The same could be said of certain ProposalStep(s).
The *Service(s) are used inside each step to retrieve details from the database, to update dependent entries, etc.
Would you accept this number of dependencies?
How would you refactor to simplify?
I've read about the Facade Service concept as a way to reduce dependencies, and how I should group clusters of dependencies together, but here I don't really understand what to do.
I may group the Converter(s) and the Service(s) which use them, but they'll still be too many anyway.
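Something like this, maybe (a sketch; the update and convert method names are invented purely for illustration):

// Hypothetical facade grouping one service with the converter that feeds it.
public class ProposalTypeOneFacade {

    private final ComplexService complexService;
    private final Converter<ProposalTypeOne, ComplexServiceType> converter;

    public ProposalTypeOneFacade(final ComplexService complexService,
            final Converter<ProposalTypeOne, ComplexServiceType> converter) {
        this.complexService = complexService;
        this.converter = converter;
    }

    // Callers depend on this one method instead of on both collaborators.
    public void update(final ProposalTypeOne proposal) {
        complexService.update(converter.convert(proposal));
    }
}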
Let me know if other details are needed.
The issue I can see is that ProposalUpdateDefaultService is doing too many things and knows too much. It accepts a lot of services, creates the steps, and executes them; instead, it should only accept a single parameter object and run the update without knowing what the steps are.
First, I would try to reduce the number of constructor parameters of ProposalUpdateDefaultService by creating a separate class which contains the services and converters.
public class ServicesAndConverters {
    ComplexService complexService;
    OtherComplexService otherComplexService;
    //...
}
That way the code becomes much cleaner:
public class ProposalUpdateDefaultService implements ProposalUpdateService {

    private final ServicesAndConverters servicesAndConverters;

    public ProposalUpdateDefaultService(final ServicesAndConverters servicesAndConverters) {
        this.servicesAndConverters = servicesAndConverters; // maybe group them in two different classes?
    }
}
The second issue I can see is creating the steps in ProposalUpdateDefaultService itself. This should be the responsibility of a different class, something like below:
public class ProposalUpdateDefaultService implements ProposalUpdateService {

    private final ServicesAndConverters servicesAndConverters;
    private final StepCreator stepCreator = new StepCreator();

    public ProposalUpdateDefaultService(final ServicesAndConverters servicesAndConverters) {
        this.servicesAndConverters = servicesAndConverters;
        stepCreator.createSteps(this.servicesAndConverters);
    }
}
And the StepCreator class would look like this:
public class StepCreator {

    private ProposalStep<Proposal> stepOne;
    private ProposalStep<Proposal> stepTwo;
    private ProposalStep<ProposalTypeTwo> stepThree;
    private ProposalStep<Proposal> stepFour;

    public void createSteps(ServicesAndConverters s) {
        // construct the steps here, wiring in the services/converters they need
    }

    // plus getStepOne(), getStepTwo(), etc., used below
}
Now ProposalUpdateDefaultService can execute the steps without knowing what the steps are or which services they need:
@Override
public void updateProposal(final ProposalTypeOne proposal) {
    stepCreator.getStepOne().process(proposal);
    stepCreator.getStepTwo().process(proposal);
    if (...) {
        stepCreator.getStepFour().process(proposal);
    }
}
The solution I found most convenient was simply removing the ProposalUpdateService abstraction and letting the EJB bean manage the various steps.
This abstraction layer was unnecessary for now, and each step is still usable individually. Both ProposalUpdateService method invocations became private methods in the EJB bean.

Why am I getting a NotSerializableException here?

I'm trying to map a function across a JavaRDD in Spark, and I keep getting a NotSerializableException on the map call.
public class SparkPrunedSet extends AbstractSparkSet {

    private final ColumnPruner pruner;

    public SparkPrunedSet(@JsonProperty("parent") SparkSet parent, @JsonProperty("pruner") ColumnPruner pruner) {
        super(parent);
        this.pruner = pruner;
    }

    public JavaRDD<Record> getRdd(SparkContext context) {
        JavaRDD<Record> rdd = getParent().getRdd(context);
        Function<Record, Record> mappingFunction = makeRecordTransformer(pruner);
        // The line below throws the error
        JavaRDD<Record> mappedRdd = rdd.map(mappingFunction);
        return mappedRdd;
    }

    private Function<Record, Record> makeRecordTransformer(ColumnPruner pruner) {
        return new Function<Record, Record>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Record call(Record record) throws Exception {
                // Obviously I'd like to do something more useful in here, but this is enough
                // to throw the error
                return record;
            }
        };
    }
}
When it runs, I get:
java.io.NotSerializableException: com.package.SparkPrunedSet
Record is an interface that extends Serializable, and MapRecord is an implementation of it. Similar code to this exists and works in the codebase, except it uses rdd.filter instead. I've read through most of the other Stack Overflow entries on this, and none of them seem to help. I thought it might have to do with trouble serializing SparkPrunedSet (although I don't understand why it would even need to do this), so I set all of the fields on it to transient, but that didn't help either. Does anyone have any ideas?
The Function you are creating for the transformation is, in fact, an (anonymous) inner class of SparkPrunedSet. Therefore, every instance of that function has an implicit reference to the SparkPrunedSet object that created it, and serializing the function requires serializing SparkPrunedSet as well.
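One common fix, as a sketch (assuming ColumnPruner is itself Serializable), is to make the factory method static so the anonymous class no longer captures the enclosing instance:

// A static factory method has no enclosing instance, so the
// anonymous Function captures only 'pruner'.
private static Function<Record, Record> makeRecordTransformer(final ColumnPruner pruner) {
    return new Function<Record, Record>() {
        private static final long serialVersionUID = 1L;

        @Override
        public Record call(Record record) throws Exception {
            return record; // the real pruning logic would use 'pruner' here
        }
    };
}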

Serialize an Object Array to send over Sockets

I have an array that I have created from a database ResultSet. I am trying to serialize it so that I can send it over a socket stream. At the moment I am getting an error telling me that the array is not serializable. The code I have is down below; the first part is the class for the objects the array holds:
class ProteinData
{
    private int ProteinKey;

    public ProteinData(Integer ProteinKey)
    {
        this.ProteinKey = ProteinKey;
    }

    public Integer getProteinKey() {
        return this.ProteinKey;
    }

    public void setProteinKey(Integer ProteinKey) {
        this.ProteinKey = ProteinKey;
    }
}
The code to populate the array:
public List<ProteinData> readJavaObject(String query, Connection con) throws Exception
{
    List<ProteinData> tableData = new ArrayList<ProteinData>();
    PreparedStatement stmt = con.prepareStatement(query);
    ResultSet query_results = stmt.executeQuery();
    while (query_results.next())
    {
        ProteinData pro = new ProteinData(query_results.getInt("ProteinKey"));
        tableData.add(pro);
    }
    query_results.close();
    stmt.close();
    return tableData;
}
And the code to call this is:
List<ProteinData> dataList = this.readJavaObject(query, con);
ObjectOutputStream output_stream = new ObjectOutputStream(socket.getOutputStream());
output_stream.writeObject(dataList);
And the code receiving this is:
List dataList = (List) input_stream.readObject();
Can someone help me serialize this array? All I can find in forums is simple arrays (e.g. int[]).
I tried adding Serializable to the class and the UID number, but then got a java.lang.ClassNotFoundException: socketserver.ProteinData error message. Does anyone know why?
Thanks for any help.
Basically, the classes you want to serialize need to implement Serializable. And if you want to avoid the warning related to the serial version, each one should also have a long serialVersionUID, which is a code used to distinguish your specific version of the class. Read a tutorial like this one to get additional info; serialization is not so hard to handle.
However, remember that serialization can be problematic when used between two different versions of the JVM (and it has some flaws in general).
Just a side note: the Serializable interface doesn't actually give any required feature to the class itself (it's a marker interface, not a typical one) and is used just to distinguish classes that are supposed to be sent over streams from all the others. Of course, if a class is Serializable, all the components it uses (instance variables) must be serializable too, to be able to send the whole object.
Change your class declaration to:
class ProteinData implements Serializable {
...
}
I would have thought as a minimum that you would need
class ProteinData implements Serializable
and a
private static final long serialVersionUID = 1234556L;
in the class (Eclipse will generate the magic number for you).

Make Java runtime ignore serialVersionUIDs?

I have to work with a large number of compiled Java classes which didn't explicitly specify a serialVersionUID. Because their UIDs were arbitrarily generated by the compiler, many of the classes which need to be serialized and deserialized end up causing exceptions, even though the actual class definitions match up. (This is all expected behavior, of course.)
It is impractical for me to go back and fix all of this 3rd-party code.
Therefore, my question is: Is there any way to make the Java runtime ignore differences in serialVersionUIDs, and only fail to deserialize when there are actual differences in structure?
If you have access to the code base, you could use the SerialVer task for Ant to insert and to modify the serialVersionUID in the source code of a serializable class and fix the problem once and for all.
If you can't, or if this is not an option (e.g. if you have already serialized some objects that you need to deserialize), one solution would be to extend ObjectInputStream. Augment its behavior to compare the serialVersionUID of the stream descriptor with the serialVersionUID of the class in the local JVM that this descriptor represents and to use the local class descriptor in case of mismatch. Then, just use this custom class for the deserialization. Something like this (credits to this message):
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DecompressibleInputStream extends ObjectInputStream {

    private static Logger logger = LoggerFactory.getLogger(DecompressibleInputStream.class);

    public DecompressibleInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
        ObjectStreamClass resultClassDescriptor = super.readClassDescriptor(); // initially the stream's descriptor
        Class<?> localClass; // the class in the local JVM that this descriptor represents
        try {
            localClass = Class.forName(resultClassDescriptor.getName());
        } catch (ClassNotFoundException e) {
            logger.error("No local class for " + resultClassDescriptor.getName(), e);
            return resultClassDescriptor;
        }
        ObjectStreamClass localClassDescriptor = ObjectStreamClass.lookup(localClass);
        if (localClassDescriptor != null) { // only if the class implements Serializable
            final long localSUID = localClassDescriptor.getSerialVersionUID();
            final long streamSUID = resultClassDescriptor.getSerialVersionUID();
            if (streamSUID != localSUID) { // check for serialVersionUID mismatch
                final StringBuffer s = new StringBuffer("Overriding serialized class version mismatch: ");
                s.append("local serialVersionUID = ").append(localSUID);
                s.append(" stream serialVersionUID = ").append(streamSUID);
                Exception e = new InvalidClassException(s.toString());
                logger.error("Potentially Fatal Deserialization Operation.", e);
                resultClassDescriptor = localClassDescriptor; // use the local class descriptor for deserialization
            }
        }
        return resultClassDescriptor;
    }
}
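Usage is then a drop-in replacement for ObjectInputStream; a sketch (the stream source and MyObject are placeholders):

// A fragment: exception handling omitted.
try (DecompressibleInputStream in =
        new DecompressibleInputStream(new FileInputStream("objects.ser"))) {
    MyObject obj = (MyObject) in.readObject();
    // ... use obj ...
}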
Use CGLIB to insert them into the binary classes?
How impractical is this to fix? If you have the source and can rebuild, can you not just run a script over the entire codebase to insert a
private static final long serialVersionUID = 1L;
everywhere?
The serialization errors at runtime tell you explicitly what the ID is expected to be. Just change your classes to declare these as the ID and everything will be OK. This does involve making changes, but I don't believe it can be avoided.
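For example, the declaration would look like this (hypothetical value):

// The value is taken verbatim from the
// "stream classdesc serialVersionUID = ..." part of the exception.
private static final long serialVersionUID = 8484615870884317488L;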
You could possibly use AspectJ to 'introduce' the field into each serializable class as it is loaded. I would first introduce a marker interface into each class by package, and then introduce the field using a hash of the class for the serialVersionUID:
public aspect SerializationIntroducerAspect {

    // introduce the marker into each class in the org.simple package
    declare parents: (org.simple.*) implements SerialIdIntroduced;

    public interface SerialIdIntroduced {}

    // add the field to each class marked with the interface above
    private long SerialIdIntroduced.serialVersionUID = createIdFromHash();

    private long SerialIdIntroduced.createIdFromHash() {
        if (serialVersionUID == 0) {
            serialVersionUID = getClass().hashCode();
        }
        return serialVersionUID;
    }
}
You will need to add the AspectJ load-time weaving agent to the VM (the -javaagent:aspectjweaver.jar flag) so it can weave the advice into your existing 3rd-party classes. It's funny, though: once you get around to setting AspectJ up, it's remarkable how many uses you will find for it.
HTH
ste
