Groovy deep copy json map - java

I am trying to create a deep copy of a JSON map in Groovy for a build config script.
I have tried the selected answer
def deepcopy(orig) {
    def bos = new ByteArrayOutputStream()
    def oos = new ObjectOutputStream(bos)
    oos.writeObject(orig); oos.flush()
    def bin = new ByteArrayInputStream(bos.toByteArray())
    def ois = new ObjectInputStream(bin)
    return ois.readObject()
}
from this existing question, but it fails for JSON maps with java.io.NotSerializableException: groovy.json.internal.LazyMap.
How can I create a deep copy of the JSON map?

Once you parse the JSON again, you have your copy: JsonSlurper builds a brand-new object graph on every parse, so a round trip through JSON text is itself a deep copy.
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
def json = new JsonSlurper().parseText('''{"l1": {"l2": {"l3": 42}}}''')
json.l1.l2.l3 = 23
assert '''{"l2":{"l3":23}}''' == JsonOutput.toJson(json.l1)
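In Groovy itself the same trick is a one-liner, new JsonSlurper().parseText(JsonOutput.toJson(json)). As a general illustration of the round trip (my sketch, not part of the answer above), the equivalent in plain Java with Jackson would look something like this:
import java.io.IOException;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

static Map<String, Object> deepCopy(Map<String, Object> original) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    // Writing to JSON text and parsing it back builds a brand-new object
    // graph, so the copy shares no references with the original map.
    return mapper.readValue(mapper.writeValueAsString(original), Map.class);
}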

Related

Creating a deep copy of a multi-level list?

I have:
ArrayList<ArrayList<ArrayList<Task>>> optimalPaths = new ArrayList<ArrayList<ArrayList<Task>>>();
I would like to create a deep copy of optimalPaths. The copy itself should contain no references whatsoever to optimalPaths. Would the following code work?
ArrayList<ArrayList<ArrayList<Task>>> altPaths = new ArrayList<ArrayList<ArrayList<Task>>>();
for (ArrayList<ArrayList<Task>> e : optimalPaths) {
    altPaths.add((ArrayList<ArrayList<Task>>) e.clone()); // Create deep copy of optimalPaths
}
I'm not sure if there are still references within altPaths on some level.
You can do it yourself:
for (ArrayList<ArrayList<Task>> outer : optimalPaths) {
    ArrayList<ArrayList<Task>> newOuter = new ArrayList<>();
    for (ArrayList<Task> inner : outer) {
        ArrayList<Task> newInner = new ArrayList<>();
        for (Task task : inner) {
            newInner.add((Task) task.clone());
        }
        newOuter.add(newInner);
    }
    altPaths.add(newOuter);
}
You can copy by serialization and deserialization, provided the Task class implements Serializable and doesn't have any transient fields that you want copied:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(optimalPaths);
out.flush();
ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
ObjectInputStream in = new ObjectInputStream(bis);
@SuppressWarnings("unchecked")
ArrayList<ArrayList<ArrayList<Task>>> copied = (ArrayList<ArrayList<ArrayList<Task>>>) in.readObject();
or use an existing utility class to do it: SerializationUtils from Apache Commons Lang.
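With Commons Lang that's a one-liner; a minimal sketch, again assuming Task (and therefore the whole nested list structure) implements Serializable:
import org.apache.commons.lang3.SerializationUtils;

// clone() serializes and deserializes the entire object graph in one call
ArrayList<ArrayList<ArrayList<Task>>> altPaths = SerializationUtils.clone(optimalPaths);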

Reading json objects separated by new line

I am trying to write a test case where I want to stream JSON objects, separated by newlines, from a JSON file into Java.
I want to stream one event object at a time and serialize it.
The JSON file is of the form:
{"event":[{"D49-64":0,"Bezeichnung":"A 41","D33-48":0}]}
{"event":[{"D49-64":1,"Bezeichnung":"A 41","D33-48":0}]}
Any suggestions on how to stream the objects in Java would be appreciated.
The blob that you have posted is not a valid JSONObject, but two individual objects.
To stream this, you would end up with something like the following:
import java.io.BufferedReader;
import java.io.FileReader;
import org.json.JSONObject;

String pathToFile = "/path/to/something.txt";
try (BufferedReader someReader = new BufferedReader(new FileReader(pathToFile))) {
    String someData;
    while ((someData = someReader.readLine()) != null) {
        JSONObject o = new JSONObject(someData);
        doSomethingWith(o);
    }
}
The library I generally use for JSON manipulation is org.json.
I was solving the same problem: reading data from a file which just has a sequence of JSON objects in it. I am using the com.fasterxml.jackson library for JSON manipulation. While it does not have an obvious direct method for exactly this, the solution is still quite simple:
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// InputStream in - input stream with your data
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(in);
JsonNode nextObject;
do {
    nextObject = mapper.readTree(parser); // returns null once the end of the stream is reached
    if (nextObject != null) {
        // process your object here
    }
} while (nextObject != null);
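For what it's worth, Jackson does also have a direct API for a sequence of root-level values: ObjectReader.readValues, which returns a MappingIterator. A minimal sketch of that route (my addition, assuming the same in stream; on Jackson versions before 2.6, readerFor is called reader):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
// readValues iterates over whitespace/newline-separated root-level JSON values
MappingIterator<JsonNode> it = mapper.readerFor(JsonNode.class).readValues(in);
while (it.hasNext()) {
    JsonNode event = it.next();
    // process event here
}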

Java: storing a big map in resources

I need to use a big file that contains String,String pairs, and because I want to ship it with a JAR, I opted to include a serialized and gzipped version in the resource folder of the application. This is how I created the serialization:
ObjectOutputStream out = new ObjectOutputStream(
        new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();
I chose to use a HashMap<String,String>; the resulting file is 60MB and the map contains about 4 million entries.
Now when I need the map and I deserialize it using:
final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();
this takes about 10-15 seconds. Is there a better way to store such a big map in a JAR? I ask because I also use the Stanford CoreNLP library, which uses big model files itself but seems to perform better in that regard. I tried to locate the code where the model files are read but gave up.
Your problem is that you zipped the data. Store it as plain text.
The performance hit is most probably in unzipping the stream. JAR files are already compressed, so there's no space saving in storing the file zipped inside one.
Basically:
Store the file in plain text
Use Files.lines(Paths.get("myfilename.txt")) to stream the lines
Consume each line with minimal code
Something like this, assuming the data is in key=value form (like a Properties file):
Map<String, String> map = new HashMap<>();
Files.lines(Paths.get("myfilename.txt"))
     .map(s -> s.split("=", 2)) // limit of 2, so values may themselves contain '='
     .forEach(a -> map.put(a[0], a[1]));
Disclaimer: Code may not compile or work as it was thumbed in on my phone (but there's a reasonable chance it will work)
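One caveat worth adding (mine, not the original answer's): Files.lines needs a real filesystem path, so it won't reach a file packed inside the JAR. For a classpath resource you would stream it instead; a minimal sketch, assuming a hypothetical resource /map.txt with the same key=value layout:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

Map<String, String> map = new HashMap<>();
// getResourceAsStream works inside a JAR, where Paths.get does not;
// MyApp is a placeholder for any class in your application
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        MyApp.class.getResourceAsStream("/map.txt"), StandardCharsets.UTF_8))) {
    reader.lines()
          .map(s -> s.split("=", 2))
          .forEach(a -> map.put(a[0], a[1]));
}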
What you can do is apply a technique from the book Java Performance: The Definitive Guide by Scott Oaks, which stores the zipped content of the object in a byte array. For this we need a wrapper class, which I call MapHolder here:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MapHolder implements Serializable {
    // This will contain the zipped content of my map
    private byte[] content;
    // My actual map, defined as transient because I don't want to serialize
    // its content directly, only its zipped content
    private transient Map<String, String> map;

    public MapHolder(Map<String, String> map) {
        this.map = map;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(baos);
             ObjectOutputStream oos = new ObjectOutputStream(
                 new BufferedOutputStream(zip))) {
            oos.writeObject(map);
        }
        this.content = baos.toByteArray();
        out.defaultWriteObject();
        // Clear the temporary field content
        this.content = null;
    }

    private void readObject(ObjectInputStream in) throws IOException,
            ClassNotFoundException {
        in.defaultReadObject();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(content);
             GZIPInputStream zip = new GZIPInputStream(bais);
             ObjectInputStream ois = new ObjectInputStream(
                 new BufferedInputStream(zip))) {
            this.map = (Map<String, String>) ois.readObject();
            // Clear the temporary field content
            this.content = null;
        }
    }

    public Map<String, String> getMap() {
        return this.map;
    }
}
Your code will then simply be:
final ByteArrayInputStream in = new ByteArrayInputStream(
        Files.readAllBytes(Paths.get("/tmp/map.ser")));
final ObjectInputStream ois = new ObjectInputStream(in);
MapHolder holder = (MapHolder) ois.readObject();
map = holder.getMap();
ois.close();
As you may have noticed, you no longer zip the content yourself; it is zipped internally when the MapHolder instance is serialized.
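For completeness (this sketch is mine, not part of the original answer), the writing side is just plain serialization of the holder; the gzipping happens inside its writeObject:
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("/tmp/map.ser"))) {
    oos.writeObject(new MapHolder(map)); // map is gzipped inside MapHolder.writeObject
}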
You could consider one of many fast serialization libraries:
protobuf (https://github.com/google/protobuf)
flat buffers (https://google.github.io/flatbuffers/)
cap'n proto (https://capnproto.org)

How can I translate this deserialization code from java to scala?

I'm a Scala/Java noob, so sorry if this has a relatively easy solution, but I'm trying to access a model in an external file (an Apache OpenNLP model) and I'm not sure where I'm going wrong. Here's how you'd do it in Java, and here's what I'm trying:
import java.io._
val nlpModelPath = new java.io.File( "." ).getCanonicalPath + "/lib/models/en-sent.bin"
val modelIn: InputStream = new FileInputStream(nlpModelPath)
which works fine, but trying to instantiate an object based off the model in that binary file is where I'm failing:
val sentenceModel = new modelIn.SentenceModel // type SentenceModel is not a member of java.io.InputStream
val sentenceModel = new modelIn("SentenceModel") // not found: type modelIn
I've also tried a DataInputStream:
val file = new File(nlpModelPath)
val dis = new DataInputStream(file)
val sentenceModel = dis.SentenceModel() // value SentenceModel is not a member of java.io.DataInputStream
I'm not sure what I'm missing; maybe some method to convert the stream to some binary object from which I can pull in methods? Thank you for any pointers.
The problem is that you're using the wrong syntax (please don't take it personally, but why not read a beginner Java book, or even just a tutorial, first if you're planning to stick with Java or Scala for some time?).
In Java you would write
SentenceModel model = new SentenceModel(modelIn);
and in Scala it looks similar:
val model: SentenceModel = new SentenceModel(modelIn)
// or just
val model = new SentenceModel(modelIn)
The problem with this syntax is that you forgot to import the definition of SentenceModel, so the compiler simply has no clue what SentenceModel is.
Add
import opennlp.tools.sentdetect.SentenceModel
at the top of your .scala file and this will fix it.
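Put together in Java (a sketch of mine; the Scala translation is mechanical, and SentenceDetectorME is OpenNLP's companion class for actually using the model):
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

InputStream modelIn = new FileInputStream("lib/models/en-sent.bin");
try {
    SentenceModel model = new SentenceModel(modelIn);        // deserialize the model
    SentenceDetectorME detector = new SentenceDetectorME(model);
    String[] sentences = detector.sentDetect("First sentence. Second sentence.");
} finally {
    modelIn.close();
}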

How to test a Weka Text Classification (FilteredClassifier)

Looked at lots of examples for this, and so far no luck. I'd like to classify free text.
Configure a text classifier. (FilteredClassifier using StringToWordVector and LibSVM)
Train the classifier (add in lots of documents, train on filtered text)
Serialize the FilteredClassifier to disk, quit the app
Then later
Load up the serialized FilteredClassifier
Classify stuff!
It works fine up to the point where I try to read from disk and classify things. All the documents and examples show the training list and testing list being built at the same time, and in my case I'm trying to build a testing list after the fact.
A FilteredClassifier alone is not enough to create a testing Instance with the same "dictionary" as the original training set, so how do I save everything I need to classify at a later date?
http://weka.wikispaces.com/Use+WEKA+in+your+Java+code just says "Instances loaded from somewhere" and doesn't say anything about using a similar dictionary.
ClassifierFramework cf = new WekaSVM();
if (!cf.isTrained()) {
    train(cf); // Train, save to disk
    cf = new WekaSVM(); // reloads from file
}
cf.test("this is a test");
Ends up throwing
java.lang.ArrayIndexOutOfBoundsException: 2
    at weka.core.DenseInstance.value(DenseInstance.java:332)
    at weka.filters.unsupervised.attribute.StringToWordVector.convertInstancewoDocNorm(StringToWordVector.java:1587)
    at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:688)
    at weka.classifiers.meta.FilteredClassifier.filterInstance(FilteredClassifier.java:465)
    at weka.classifiers.meta.FilteredClassifier.distributionForInstance(FilteredClassifier.java:495)
    at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
    at ratchetclassify.lab.WekaSVM.test(WekaSVM.java:125)
Serialize your Instances header, which holds the definition of the training data (the "similar dictionary" you are after), at the same time as your classifier:
Instances trainInstances = ... //
Instances trainHeader = new Instances(trainInstances, 0);
trainHeader.setClassIndex(trainInstances.classIndex());
OutputStream os = new FileOutputStream(fileName);
ObjectOutputStream objectOutputStream = new ObjectOutputStream(os);
objectOutputStream.writeObject(classifier);
if (trainHeader != null)
    objectOutputStream.writeObject(trainHeader);
objectOutputStream.flush();
objectOutputStream.close();
To deserialize:
Classifier classifier = null;
Instances trainHeader = null;
InputStream is = new BufferedInputStream(new FileInputStream(fileName));
ObjectInputStream objectInputStream = new ObjectInputStream(is);
classifier = (Classifier) objectInputStream.readObject();
try { // see if we can load the header
    trainHeader = (Instances) objectInputStream.readObject();
} catch (Exception e) {
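    // no Instances header was stored alongside this classifier; carry on without it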
}
objectInputStream.close();
Use trainHeader to create a new Instance:
int numAttributes = trainHeader.numAttributes();
double[] vals = new double[numAttributes];
for (int i = 0; i < numAttributes - 1; i++) {
    Attribute attribute = trainHeader.attribute(i);
    if (attribute.isNominal() || attribute.isString()) {
        vals[i] = attribute.indexOfValue(myStrVal); // get myStrVal from your source
    } else { // numeric attribute
        vals[i] = myNumericVal; // get myNumericVal from your source
    }
}
// the class attribute sits at the last index and is what we want predicted,
// so leave it missing (index numAttributes would be out of bounds)
vals[numAttributes - 1] = Instance.missingValue();
// note: in Weka 3.7+ use Utils.missingValue() and new DenseInstance(1.0, vals)
Instance instance = new Instance(1.0, vals);
instance.setDataset(trainHeader);
return instance;
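From there (again a sketch of mine, not part of the original answer), classification is the standard call on the deserialized classifier, and the header also maps the numeric prediction back to a label:
// instance built from trainHeader as above
double prediction = classifier.classifyInstance(instance);
String label = trainHeader.classAttribute().value((int) prediction);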
