ObjectOutputStream.writeObject() freezing when trying to write object between classes - java

I am a beginner programmer so please excuse any technically incorrect statements/incorrect use of terminology.
I am trying to make a program that reduces CNF SAT in DIMACS format to 3SAT, then 3SAT to 3-Graph-Coloring, and then 3-Graph-Coloring back to SAT again. The idea is to make it circular, so that the output from one reduction can be piped straight into the input of another; i.e. if you reduce a CNF to 3SAT, the program should automatically reduce the 3SAT to Graph Coloring afterwards if the user specifies it to.
I have chosen to represent CNFs in a LinkedHashMap in a class called CNFHandler. The LinkedHashMap maps a File (the DIMACS-formatted CNF file) to the CNF object (which contains an ArrayList of Clause objects) that corresponds to it.
In my CNFHandler class, I have a reduce method, and it's in this method that I am trying to initiate my piping functionality:
package CNFHandler;

import SAT_to_3SAT_Reducer.Reducer;

import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class CNFHandler {
    private Map<File, CNF> allCNFs = new LinkedHashMap<>();
    private CNFReader reader;
    private Reducer reducer = new Reducer();

    // PIPES
    private Optional<ObjectInputStream> inputPipe;
    private Optional<ObjectOutputStream> outputPipe;

    // Instantiate Pipes
    public void setInputPipe(ObjectInputStream inputStream) {
        this.inputPipe = Optional.of(inputStream);
    }

    public void setOutputPipe(ObjectOutputStream outputStream) {
        this.outputPipe = Optional.of(outputStream);
    }

    //...
    // Skipping lines for brevity
    //...

    public void reduce(String filePath) {
        File path = new File(filePath);
        addCNF(filePath);
        CNF result = reducer.reduce(allCNFs.get(path));
        if (!outputPipe.isPresent()) {
            System.out.println(result.toDIMACS());
        } else {
            try {
                outputPipe.get().writeObject(result);
                outputPipe.get().close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
When I try to run "writeObject" (within the try block in the reduce() method) the program doesn't seem to go past that point. I've tried using breakpoints in IntelliJ to see what's going on, but the best I could figure out was as follows:
A Native method called waitForReferencePendingList() seems to be stuck waiting for something, and that's why it won't go past the writeObject method
IntelliJ tells me "Connected to the target VM, address: '127.0.0.1:51236', transport: 'socket'" but I'm not sure why because I'm not using Sockets anywhere in my program
Here is the code for my Main method where I instantiate the ObjectOutputStreams:
import CNFHandler.CNFHandler;
import GraphHandler.GraphHandler;

import java.io.*;

public class Main {
    public static void main(String[] args) {
        try {
            String inFile = "short_cnf.cnf";

            PipedOutputStream _S_3S_OUT_PIPE_STREAM = new PipedOutputStream();
            PipedInputStream _S_3S_IN_PIPE_STREAM = new PipedInputStream();
            _S_3S_IN_PIPE_STREAM.connect(_S_3S_OUT_PIPE_STREAM);
            ObjectOutputStream _S_3S_OUT_OBJECT_STREAM = new ObjectOutputStream(_S_3S_OUT_PIPE_STREAM);
            ObjectInputStream _S_3S_IN_OBJEECT_STREAM = new ObjectInputStream(_S_3S_IN_PIPE_STREAM);

            CNFHandler handler = new CNFHandler();
            handler.setOutputPipe(_S_3S_OUT_OBJECT_STREAM);
            handler.reduce(inFile);

            PipedOutputStream _3S_G_OUT = new PipedOutputStream();
            PipedInputStream _3S_G_IN = new PipedInputStream();
            _3S_G_IN.connect(_3S_G_OUT);
            ObjectOutputStream _3S_G_OUT_STREAM = new ObjectOutputStream(_3S_G_OUT);
            ObjectInputStream _3S_G_IN_STREAM = new ObjectInputStream(_3S_G_IN);

            GraphHandler graphHandler = new GraphHandler();
            graphHandler.setInputPipe(_S_3S_IN_OBJEECT_STREAM);
            graphHandler.setOutputPipe(_3S_G_OUT_STREAM);
            graphHandler.reduce();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The other weird thing is that the writeObject() method seems to work if I use a different kind of object. For example, if I instantiate a String at the same place writeObject() is being called in reduce(), or if I instantiate a new CNF object in the same place, it WILL write the object. But I can't do it this way because I have to pass along the values of the object as well (the clauses, etc.), so I don't know what to do.
This is my CNF class, in brief:
package CNFHandler;

import java.io.Serializable;
import java.util.*;

public class CNF implements Serializable {
    protected int numVars;
    protected int numClauses;
    protected String fileName;

    // store all variables with no duplicates
    protected Set<String> allLiterals = new HashSet<>();
    protected ArrayList<Clause> clauses = new ArrayList<>();

    /*
     for printing to DIMACS: keep track of the max # of
     literals that are needed to print a clause.
     For example, if all clauses in the CNF file contain
     2 literals, and only one contains 3 literals,
     then the literalSize will be 3 to ensure things
     are printed with proper spacing
    */
    protected int literalSize = -20;

    /*
     keep track of the label referring to the highest-#ed literal,
     just in case they are not stored in order -- this way when we perform
     reductions we can just add literals to the end and be sure we are not
     duplicating any
    */
    protected int highestLiteral = -10;

    public CNF(String fileName) {
        this.fileName = fileName;
    }

    protected void addClause(String[] inputs) {
        try {
            Clause clauseToAdd = new Clause();
            // add literals to the hashset, excluding dashes that indicate negative literals
            for (int i = 0; i < inputs.length - 1; i++) {
                // removing whitespace from the input
                String toAdd = inputs[i].replaceAll("\\s+", "");
                // in case the variable is false (has a dash before the int), remove the dash to standardize storage
                String moddedToAdd = inputs[i].replaceAll("[-]*", "");
                /*
                 if an unknown variable is in the stream, reject it.
                 (we're basically checking here if the variable set is full,
                 and if it is and the variable we're trying to add is new,
                 then it can't be added)
                */
                if ((!allLiterals.contains(moddedToAdd)) && (allLiterals.size() == numVars) && (moddedToAdd.trim().length() > 0)) {
                    throw new FailedCNFException();
                }
                // add the original input (so not the regex'd one, but the one that would be false if it had been input as false)
                clauseToAdd.addLiteral(toAdd);
                if (!allLiterals.contains(moddedToAdd) && !moddedToAdd.equalsIgnoreCase("")) {
                    allLiterals.add(moddedToAdd);
                    // change the highestLiteral value if the literal being added is "bigger" than the others that have been seen
                    if (highestLiteral < Integer.parseInt(moddedToAdd)) {
                        highestLiteral = Integer.parseInt(moddedToAdd);
                    }
                }
            }
            if (clauseToAdd.getNumberOfLiterals() > literalSize) {
                literalSize = clauseToAdd.getNumberOfLiterals();
            }
            clauses.add(clauseToAdd);
        } catch (FailedCNFException e) {
            System.out.println("The number of variables that have been introduced is too many!");
        }
    }

    public void makeClause(String[] inputs) {
        try {
            if (inputs[inputs.length - 1].equals("0")) {
                addClause(inputs);
            } else throw new FailedCNFException();
        } catch (FailedCNFException f) {
            System.out.println("There is no 0 at the end of this line: ");
            for (String s : inputs) {
                System.out.print(s + " ");
            }
            System.out.println();
        }
    }

    public void initializeClauses(String[] inputs) {
        setNumVars(inputs[2]);
        setNumClauses(inputs[3]);
    }

    public String toDIMACS() {
        String toReturn = "p cnf " + getNumVars() + " " + getNumClauses() + "\n";
        for (int i = 0; i < clauses.size() - 1; i++) {
            Clause c = clauses.get(i);
            toReturn += c.toDIMACS(literalSize) + "\n";
        }
        toReturn += clauses.get(clauses.size() - 1).toDIMACS(literalSize);
        return toReturn;
    }

    /*
     Override toString method to print clauses in human-readable format
    */
    @Override
    public String toString() {
        if (highestLiteral != -10) {
            String toReturn = "(";
            for (int i = 0; i < clauses.size() - 1; i++) {
                Clause c = clauses.get(i);
                toReturn += c + "&&";
            }
            toReturn += clauses.get(clauses.size() - 1).toString() + ")";
            return toReturn;
        } else {
            return "Add some clauses!";
        }
    }

    public String toString(boolean addFile) {
        String toReturn = "";
        if (addFile) {
            toReturn += "src/test/ExampleCNFs/" + fileName + ".cnf: \n";
        }
        toReturn += "(";
        for (int i = 0; i < clauses.size() - 1; i++) {
            Clause c = clauses.get(i);
            toReturn += c + "&&";
        }
        toReturn += clauses.get(clauses.size() - 1).toString() + ")";
        return toReturn;
    }

    //=============================================================================
    // HELPER FUNCTIONS
    //=============================================================================

    public void setNumVars(String vars) {
        numVars = Integer.parseInt(vars);
    }

    public void setNumClauses(String clauses) {
        numClauses = Integer.parseInt(clauses);
    }

    public Clause getClause(int index) {
        return clauses.get(index);
    }

    public void addLiteral(int newLiteral) {
        allLiterals.add(String.valueOf(newLiteral));
    }

    public void addLiterals(Set<String> newLiterals) {
        allLiterals.addAll(newLiterals);
    }

    public void addClauses(ArrayList<Clause> toAdd, int maxLiterals) {
        clauses.addAll(toAdd);
        numClauses += toAdd.size();
        // update literalSize if need be
        if (maxLiterals > literalSize) {
            literalSize = maxLiterals;
        }
    }

    //=============================================================================
    // GETTERS AND SETTERS
    //=============================================================================

    public void setNumVars(int numVars) {
        this.numVars = numVars;
    }

    public void setNumClauses(int numClauses) {
        this.numClauses = numClauses;
    }

    public int getNumVars() {
        return numVars;
    }

    public int getNumClauses() {
        return numClauses;
    }

    public ArrayList<Clause> getClauses() {
        return clauses;
    }

    public Set<String> getAllLiterals() {
        return allLiterals;
    }

    //
    // LITERAL SIZE REPRESENTS THE MAXIMUM NUMBER OF LITERALS A CLAUSE CAN CONTAIN
    //
    public int getLiteralSize() {
        return literalSize;
    }

    public void setLiteralSize(int literalSize) {
        this.literalSize = literalSize;
    }

    public String getFilePath() {
        return "src/test/ExampleCNFs/" + fileName + ".cnf";
    }

    public String getFileName() {
        return fileName;
    }

    public void setFileName(String fileName) {
        this.fileName = fileName;
    }

    //
    // HIGHEST LITERAL REPRESENTS THE HIGHEST NUMBER USED TO REPRESENT A LITERAL
    // IN THE DIMACS CNF FORMAT
    //
    public int getHighestLiteral() {
        return highestLiteral;
    }

    public void setHighestLiteral(int highestLiteral) {
        this.highestLiteral = highestLiteral;
    }

    public void setHighestLiteral(String highestLiteral) {
        this.highestLiteral = Integer.parseInt(highestLiteral);
    }
}
Can someone give me some insight as to what's going on here, please? Thank you very much.

First of all, neither of the symptoms is actually relevant to your question:
A Native method called waitForReferencePendingList() seems to be stuck waiting for something.
You appear to have found an internal thread that is dealing with the processing of Reference objects following a garbage collection. It is normal for it to be waiting there.
IntelliJ tells me "Connected to the target VM, address: '127.0.0.1:51236', transport: 'socket'"
That is Intellij saying that it has connected to the debug agent in the JVM that is running your application. Again, this is normal.
If you are trying to find the cause of a problem via a debugger, you need to find the application thread that is stuck. Then drill down to the point where it is actually stuck and look at the corresponding source code to figure out what it is doing. In this case, you need to look at the standard Java SE library source code for your platform. Randomly looking for clues rarely works ...
Now to your actual problem.
Without a stacktrace or a minimal reproducible example, it is not possible to say with certainty what is happening.
However, I suspect that writeObject is simply stuck waiting for something to read from the "other end" of the pipeline. You have set up a PipedInputStream / PipedOutputStream pair, and that pair has only a limited amount of buffering (1024 bytes by default). If the "writer" writes too much to the output stream, it will block until the "reader" has read some data from the input stream. Note that in your main method everything runs on a single thread: handler.reduce(inFile) tries to write the whole serialized CNF before graphHandler.reduce() ever starts reading, so once the pipe's buffer fills up, writeObject blocks forever.
The other weird thing is that the writeObject() method seems to work if I use a different kind of object ...
The other kind of object probably has a smaller serialization which fits into the available buffer space.
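To make the failure mode and the usual fix concrete, here is a minimal, self-contained sketch (not your classes; the PipeDemo name and the int[] payload are made up for illustration). The reader runs on its own thread, so the writer can never fill the pipe's buffer without someone draining it:
import java.io.*;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut); // default buffer is 1024 bytes

        // The reader runs on its own thread, so the writer's data is drained as it arrives
        Thread reader = new Thread(() -> {
            try (ObjectInputStream in = new ObjectInputStream(pipeIn)) {
                Object received = in.readObject();
                System.out.println("Received: " + received);
            } catch (IOException | ClassNotFoundException e) {
                e.printStackTrace();
            }
        });
        reader.start();

        try (ObjectOutputStream out = new ObjectOutputStream(pipeOut)) {
            // ~400KB serialized; writing this on the same thread as the reader would deadlock
            out.writeObject(new int[100_000]);
        }
        reader.join();
    }
}
Alternatively, you can enlarge the pipe's buffer with new PipedInputStream(pipeOut, 1 << 20), but that only raises the threshold; any object bigger than the buffer will deadlock a single-threaded writer again, so a separate reader thread is the robust fix.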

Related

Java: How to search duplicate files in a folder not only by name, but also by size and content?

I want to create a Java application to identify duplicates. So far I can find duplicates only by name, but I also need size, file type, and maybe content. This is my code so far, using a HashMap:
public static void find(Map<String, List<String>> lists, File dir) {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            String hash = f.getName() + f.length();
            List<String> list = lists.get(hash);
            if (list == null) {
                list = new LinkedList<String>();
                lists.put(hash, list);
            }
            list.add(f.getAbsolutePath());
        }
    }
}
I used MessageDigest, checked some files, and found the duplicates according to all the criteria I listed in the title and description. Thank you all.
private static MessageDigest messageDigest;
static {
    try {
        messageDigest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("cannot initialize SHA-512 hash function", e);
    }
}
and this is the result after implementing it in the duplicate-search code:
public static void find(Map<String, List<String>> lists, File dir) {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            try {
                // read the whole file; Files.readAllBytes (java.nio.file.Files) avoids the
                // partial reads that a single InputStream.read(byte[]) call can return
                byte[] fileData = Files.readAllBytes(f.toPath());
                // create the unique hash ID for the current file
                String hash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                List<String> list = lists.get(hash);
                if (list == null) {
                    list = new LinkedList<String>();
                }
                // add the path to the list
                list.add(f.getAbsolutePath());
                // add the updated list to the hash table
                lists.put(hash, list);
            } catch (IOException e) {
                throw new RuntimeException("cannot read file " + f.getAbsolutePath(), e);
            }
        }
    }
}
Considering 2 files equal if they have the same extension and the same file size is simply a matter of creating an object that represents this 'equality'. So, you'd make something like:
public class FileEquality {
    private final String fileExtension;
    private final long fileSize;
    // constructor, toString, equals, hashCode, and getters here.
}
(and fill in all the missing boilerplate: constructor, toString, equals, hashCode, and getters. See Project Lombok's @Value to make this easy if you like). You can get the file extension from a file name by using fileName.lastIndexOf('.') and fileName.substring(lastIndex). With Lombok, all you'd have to write is:
@lombok.Value
public class FileEquality {
    String fileExtension;
    long fileSize;
}
Then use FileEquality objects as keys in your hashmap instead of strings. However, just because you have, say, 'foo.txt' and 'bar.txt' that both happen to be 500 bytes in size doesn't mean these 2 files are duplicates. So you want content involved too, but if you extend your FileEquality class to include the content of the file, then 2 things come up:
1. If you're checking content anyway, what do the size and file extension matter? If the content of foo.txt and bar.jpg are precisely the same, they are duplicates, no? So why bother? You can convey the content as a byte[], but note that writing proper hashCode() and equals() implementations (which are required if you want to use this object as a key for hashmaps) becomes a little trickier. Fortunately, Lombok's @Value will get it right, so I suggest you use that.
2. This implies the entirety of the file content is in your JVM's process memory. Unless you're only checking very small files, you'll just run out of memory. You can abstract this away somewhat by not storing the file's entire content, but a hash of the content. Google around for how to calculate the SHA-256 hash of a file in Java. Put this hash value in your FileEquality and you avoid the memory issue. It is theoretically possible to have 2 files with different contents that nevertheless hash to the exact same SHA-256 value, but the chances of that are astronomical; more to the point, SHA-256 is designed such that it is not mathematically feasible to intentionally make 2 such files to mess with your application. Therefore, I suggest you just trust the hash :)
Note, of course, that hashing an entire file requires reading the entire file, so if you run your duplicate finder on a directory containing, say, 500GB worth of files, your application will have to read at least 500GB, which will take some time.
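For illustration, a hedged sketch of the hash-of-content idea (the ContentHash class and hashOf method are made-up names; MessageDigest and Files are standard JDK):
import java.nio.file.*;
import java.security.MessageDigest;
import java.util.HexFormat;

public final class ContentHash {
    // Returns the SHA-256 digest of the file's bytes as a hex string.
    // Fine for moderate file sizes; very large files should be streamed
    // through MessageDigest.update() in chunks instead of read whole.
    static String hashOf(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(Files.readAllBytes(file));
        return HexFormat.of().formatHex(digest); // HexFormat is Java 17+; otherwise format the bytes manually
    }
}
You would then store this string (or the raw digest bytes) in FileEquality instead of the full content.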
I made this application long ago; I found some of its source code for you, if you want to learn from it.
This method works by comparing the bytes of both files:
public static boolean checkBinaryEquality(File file1, File file2) {
    if (file1.length() != file2.length()) return false;
    try (FileInputStream f1 = new FileInputStream(file1);
         FileInputStream f2 = new FileInputStream(file2)) {
        byte[] bus1 = new byte[1024];
        byte[] bus2 = new byte[1024];
        int read1;
        // comparing the files chunk by chunk; any unmatched byte means they are not equal
        // (note: read() may return fewer bytes than requested, so compare the read counts too)
        while ((read1 = f1.read(bus1)) >= 0) {
            int read2 = f2.read(bus2);
            if (read1 != read2) return false;
            for (int i = 0; i < read1; i++) {
                if (bus1[i] != bus2[i]) return false;
            }
        }
        // passed
        return true;
    } catch (IOException exp) {
        // problems occurred, so let's consider them not equal
        return false;
    }
}
combine this method with name and extension checking and you are ready to go.
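For instance, a small sketch of gluing the checks together, cheapest first (isDuplicate is a made-up name; checkBinaryEquality is the method above):
// hypothetical glue: name check first (cheap), then length, then bytes (expensive)
static boolean isDuplicate(File a, File b) {
    return a.getName().equalsIgnoreCase(b.getName()) // same name implies same extension
            && a.length() == b.length()
            && checkBinaryEquality(a, b);
}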
copy-paste-example
create a class that extends File
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class MyFile extends File {

    private static final long serialVersionUID = 1L;

    public MyFile(final String pathname) {
        super(pathname);
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if ((obj == null) || (this.getClass() != obj.getClass())) {
            return false;
        }
        final MyFile other = (MyFile) obj;
        if (!Arrays.equals(this.getContent(), other.getContent())) {
            return false;
        }
        if (this.getName() == null) {
            if (other.getName() != null) {
                return false;
            }
        } else if (!this.getName().equals(other.getName())) {
            return false;
        }
        if (this.length() != other.length()) {
            return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = prime;
        result = (prime * result) + Arrays.hashCode(this.getContent());
        result = (prime * result) + ((this.getName() == null) ? 0 : this.getName().hashCode());
        result = (prime * result) + (int) (this.length() ^ (this.length() >>> 32));
        return result;
    }

    private byte[] getContent() {
        try (final FileInputStream fis = new FileInputStream(this)) {
            return fis.readAllBytes();
        } catch (final IOException e) {
            e.printStackTrace();
            return new byte[] {};
        }
    }
}
read base directory
import java.io.File;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Vector;

public class FileTest {

    public FileTest() {
        super();
    }

    public static void main(final String[] args) {
        final Map<MyFile, List<MyFile>> duplicates = new HashMap<>();
        FileTest.handleDirectory(duplicates, new File("[path to base directory]"));
        final Iterator<Entry<MyFile, List<MyFile>>> iterator = duplicates.entrySet().iterator();
        while (iterator.hasNext()) {
            final Entry<MyFile, List<MyFile>> next = iterator.next();
            if (next.getValue().size() == 0) {
                iterator.remove();
            } else {
                System.out.println(next.getKey().getName() + " - " + next.getKey().getAbsolutePath());
                for (final MyFile file : next.getValue()) {
                    System.out.println(" ->" + file.getName() + " - " + file.getAbsolutePath());
                }
            }
        }
    }

    private static void handleDirectory(final Map<MyFile, List<MyFile>> duplicates, final File directory) {
        final File dir = directory;
        if (dir.isDirectory()) {
            final File[] files = dir.listFiles();
            for (final File file : files) {
                if (file.isDirectory()) {
                    FileTest.handleDirectory(duplicates, file);
                    continue;
                }
                final MyFile myFile = new MyFile(file.getAbsolutePath());
                if (!duplicates.containsKey(myFile)) {
                    duplicates.put(myFile, new Vector<>());
                } else {
                    duplicates.get(myFile).add(myFile);
                }
            }
        }
    }
}

Block of code is not touched

I'm building a small application in Java, small game mechanics but nothing serious. I have a class whose purpose is to fetch data from a file. But when I create the two reader objects, the program just seems to skip everything and continue. As a result, when I try to access the respective lists it gives me a NullPointerException. Code of the method that fetches data below:
public void getData(int l, player tmp, level le) {
    String[] dataPlayer;
    String[] dataLevel;
    try {
        //FileReader f = new FileReader(this.levelPath.concat(Integer.toString(l)));
        File f = new File(this.levelPath.concat(Integer.toString(l)));
        BufferedReader buff = new BufferedReader(new FileReader(f));
        System.out.println("Reached");
        boolean eof = false;
        while (!eof) {
            String b = buff.readLine();
            if (b == null)
                eof = true;
            else {
                if (b.contains("player")) {
                    dataPlayer = b.split("-");
                    for (int i = 0; i < dataPlayer.length; i++) {
                        if (i == 0)
                            continue;
                        items it = new items(dataPlayer[i]);
                        tmp.setInventory1(it);
                    }
                } else if (b.contains("level")) {
                    dataLevel = b.split("-");
                    for (int i = 0; i < dataLevel.length; i++) {
                        if (i == 0)
                            continue;
                        items it = new items(dataLevel[i]);
                        le.setSpecific(it);
                    }
                }
            }
        }
    } catch (IOException i) {
        i.getMessage();
    }
}
File contents of the file "levelData1":
player-hat
player-flashlight
level-flower
level-rock
player-adz
The problem here turned out to be the path: it needed to be absolute, like /home/toomlg4u/IdeaProjects/javaProject/src/Data/levelData.
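As an aside, a quick way to see where a relative path actually points is to print its absolute form (the path below is made up for the example):
File f = new File("Data/levelData1");
System.out.println(f.getAbsolutePath()); // shows where the JVM is really looking
System.out.println(f.exists());          // false when the working directory is not what you expect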
You're doing a lot of things inside that try/catch that may not throw an IOException. If you get any other exception, it's not going to be caught. Depending on what other exception handling you have in place, that may cause weird behavior. For debugging, you could catch all exceptions, and see if you're getting something else.
If you want to keep your loop-based code, you can refactor it to look like this:
public void getData(int l, player tmp, level le) {
    try (BufferedReader buff = new BufferedReader(new FileReader(new File(this.levelPath + l)))) {
        String b;
        while ((b = buff.readLine()) != null) {
            if (b.contains("player")) {
                String[] dataPlayer = b.split("-");
                items it = new items(dataPlayer[1]); // because you know you will have an array with only 2 elements
                tmp.setInventory1(it);
            } else if (b.contains("level")) {
                String[] dataLevel = b.split("-");
                items it = new items(dataLevel[1]); // because you know you will have an array with only 2 elements
                le.setSpecific(it);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
It is a little bit better than what you have: easier to debug and to read. I advise you to read about try-with-resources.
As a rule of thumb, every time you open a stream you have to close it. If you didn't open the stream yourself, don't close it.
This is how a decent Java program might look:
private Stream<Items> asStreamOfItems(String line) {
    return Stream.of(line.split("-")).skip(1).map(Items::new);
}

public void parseFile(String pathToTheFile) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(pathToTheFile));
    List<Items> players = lines.stream().filter(line -> line.contains("player")).flatMap(this::asStreamOfItems).collect(Collectors.toList());
    List<Items> levels = lines.stream().filter(line -> line.contains("level")).flatMap(this::asStreamOfItems).collect(Collectors.toList());
    ........
}
In this case all your weird errors will vanish.
After you edited the post I saw your file content. In this case the code should look like this one:
class Items {

    private final String name;

    public Items(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public static Items parse(String line) {
        return new Items(line.split("-")[1]);
    }
}

public void parseFile(String pathToTheFile) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(pathToTheFile));
    List<Items> players = lines.stream().filter(line -> line.contains("player")).map(Items::parse).collect(Collectors.toList());
    List<Items> levels = lines.stream().filter(line -> line.contains("level")).map(Items::parse).collect(Collectors.toList());
    ..............
}
Btw, you broke a lot of Java and general programming rules, like:
- using continue is bad practice; it should be used only in extreme cases because it makes the code difficult to read.
- class names in Java should be in CamelCase notation.
- one method should have only one responsibility.
- DON'T mutate an object passed into a method (example: tmp.setInventory1(it);), a very, very bad practice.
- when you work with streams, use try-with-resources or try/catch/finally to close your stream after you finish reading.
- before jumping in to write code, explore the Java IO APIs to look for better methods to read from files.

merging sorted files Java

I'm implementing external merge sort using Java.
So, given a file, I split it into smaller ones, then sort the smaller portions, and finally merge the sorted (smaller) files.
The last step is what I'm having trouble with.
I have a list of files, and at each step I want to take the minimum value of the first rows of each file and then remove that line.
So it is supposed to be something like this:
public static void mergeSortedFiles(List<File> sorted, File output) throws IOException {
    BufferedWriter wf = new BufferedWriter(new FileWriter(output));
    String curLine = "";
    while (!sorted.isEmpty()) {
        curLine = findMinLine(sorted);
        wf.write(curLine);
    }
}

public static String findMinLine(List<File> sorted) throws IOException {
    List<BufferedReader> brs = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
        brs.add(new BufferedReader(new FileReader(sorted.get(i))));
    }
    List<String> lines = new ArrayList<>();
    for (BufferedReader br : brs) {
        lines.add(br.readLine());
    }
    Collections.sort(lines);
    return lines.get(0);
}
I'm not sure how to update the files; can anyone help with that?
Thanks for helping!
You can create a Comparable wrapper around each file and then place the wrappers in a heap (for example a PriorityQueue).
public class ComparableFile<T extends Comparable<T>> implements Comparable<ComparableFile<T>> {

    private final Deserializer<T> deserializer;
    private final Iterator<String> lines;
    private T buffered;

    public ComparableFile(File file, Deserializer<T> deserializer) {
        this.deserializer = deserializer;
        try {
            this.lines = Files.newBufferedReader(file.toPath()).lines().iterator();
        } catch (IOException e) {
            // deal with it differently if you want, I'm just providing a working example
            // and wanted to use the constructor in a lambda function
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public int compareTo(ComparableFile<T> that) {
        T mine = peek();
        T theirs = that.peek();
        if (mine == null) return theirs == null ? 0 : -1;
        if (theirs == null) return 1;
        return mine.compareTo(theirs);
    }

    public T pop() {
        T tmp = peek();
        if (tmp != null) {
            buffered = null;
            return tmp;
        }
        throw new NoSuchElementException();
    }

    public boolean isEmpty() {
        return peek() == null;
    }

    private T peek() {
        if (buffered != null) return buffered;
        if (!lines.hasNext()) return null;
        return buffered = deserializer.deserialize(lines.next());
    }
}
Then, you can merge them this way:
public class MergeFiles<T extends Comparable<T>> {

    private final PriorityQueue<ComparableFile<T>> files;

    public MergeFiles(List<File> files, Deserializer<T> deserializer) {
        this.files = new PriorityQueue<>(files.stream()
                .map(file -> new ComparableFile<>(file, deserializer))
                .filter(comparableFile -> !comparableFile.isEmpty())
                .collect(toList()));
    }

    public Iterator<T> getSortedElements() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !files.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ComparableFile<T> head = files.poll();
                T next = head.pop();
                if (!head.isEmpty()) files.add(head);
                return next;
            }
        };
    }
}
And here's some code to demonstrate it works:
public static void main(String[] args) throws IOException {
    List<File> files = Arrays.asList(
            newTempFile(Arrays.asList("hello", "world")),
            newTempFile(Arrays.asList("english", "java", "programming")),
            newTempFile(Arrays.asList("american", "scala", "stackoverflow"))
    );
    Iterator<String> sortedElements = new MergeFiles<>(files, line -> line).getSortedElements();
    while (sortedElements.hasNext()) {
        System.out.println(sortedElements.next());
    }
}

private static File newTempFile(List<String> words) throws IOException {
    File tempFile = File.createTempFile("sorted-", ".txt");
    Files.write(tempFile.toPath(), words);
    tempFile.deleteOnExit();
    return tempFile;
}
Output:
american
english
hello
java
programming
scala
stackoverflow
world
So what you want to do is swap two lines in a text file? You can do it by using a RandomAccessFile, however this will be horribly slow, since every time you swap two lines you have to wait for the next IO burst.
So I highly recommend using the following code to be able to do the merge sort on the heap:
List<String> lines1 = Files.readAllLines(yourFile1);
List<String> lines2 = Files.readAllLines(yourFile2);
// use merge sort on these lines
List<String> merged = ...; // result of merging lines1 and lines2
FileWriter writer = new FileWriter(yourOutputFile);
for (String str : merged) {
    writer.write(str + System.lineSeparator());
}
writer.close();
The standard merge technique between a fixed number of files (say, 2) is:
have a variable for the value of the ordering key of the current record of each file (in Java, make that variable Comparable).
start the process by reading the first record of each file (and fill in the corresponding variable).
loop (until end-of-file on both) through a code block that says essentially:
if (key_1.compareTo(key_2) == 0) { process both files; then read both files }
else if (key_1.compareTo(key_2) < 0) { process file 1; then read file 1 }
else { process file 2; then read file 2 }
Note how this code does essentially nothing more than determine the file with the lowest key, and process that.
If your number of files is variable, then your number of key variables is variable too, and "determining the file with the lowest current key" cannot be done as per the above. Instead, have as many current-key-value objects as there are files, and store them all in a TreeSet. Now the first element of the TreeSet will be the lowest current key value of all the files. If you maintain a link between each key value and its file, you just process that file, delete the just-processed key value from the TreeSet, read a new record from the processed file, and add its key value to the TreeSet. A sketch of this loop follows below.
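A hedged sketch of that TreeSet variant (FileHead and TreeSetMerge are made-up names; it uses Java 16+ record syntax). Note that a TreeSet silently drops duplicate keys, so the comparator breaks ties by reader identity; a PriorityQueue, as in the answer above, sidesteps that problem entirely:
import java.io.*;
import java.util.*;

// One entry per input file: the current line plus the reader it came from.
record FileHead(String line, BufferedReader reader) {}

public class TreeSetMerge {
    public static void merge(List<File> inputs, Writer out) throws IOException {
        // Order heads by current line; break ties by reader identity so equal
        // lines from different files are not collapsed (good enough for a sketch).
        TreeSet<FileHead> heads = new TreeSet<>(
                Comparator.comparing(FileHead::line)
                          .thenComparing(h -> System.identityHashCode(h.reader())));
        for (File f : inputs) {
            BufferedReader r = new BufferedReader(new FileReader(f));
            String first = r.readLine();
            if (first != null) heads.add(new FileHead(first, r));
            else r.close();
        }
        while (!heads.isEmpty()) {
            FileHead min = heads.pollFirst();      // file with the lowest current key
            out.write(min.line() + System.lineSeparator());
            String next = min.reader().readLine(); // refill from the same file
            if (next != null) heads.add(new FileHead(next, min.reader()));
            else min.reader().close();
        }
    }
}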

How to update the ArrayList? [duplicate]

This question already has answers here:
Why does the foreach statement not change the element value?
(6 answers)
Closed 5 years ago.
I'm new to Java programming and I just got an assignment at school that I'm struggling a bit with. The code you see below is the only code I'm allowed to edit.
The assignment is to find the word "ADJEKTIV" in a txt file and replace it with a random adjective from another txt document containing only adjectives. This part I think I nailed. But when I try to use the write method from another class called OutputWriter, it seems like it won't pick up the new updates to the Strings containing "ADJEKTIV". Do I need to "update" the ArrayList somehow to keep the changes?
import java.util.*;

/**
 * Class documentation needed!
 */
public class StoryCreator
{
    private InputReader reader;
    private OutputWriter writer;
    private Random random;

    public StoryCreator()
    {
        reader = new InputReader();
        writer = new OutputWriter();
        random = new Random();
    }

    public String randomAdjective(String adjectivesFilename)
    {
        ArrayList<String> adjectives = reader.getWordsInFile(adjectivesFilename);
        int index = random.nextInt(adjectives.size());
        return adjectives.get(index);
    }

    public void createAdjectiveStory(String storyFilename, String adjectivesFilename, String outputFilename)
    {
        ArrayList<String> story = reader.getWordsInFile(storyFilename);
        for (String s : story)
        {
            if (s.contains("ADJEKTIV."))
            {
                s = randomAdjective(adjectivesFilename) + ". ";
            }
            if (s.contains("ADJEKTIV"))
            {
                s = randomAdjective(adjectivesFilename);
            }
        }
        writer.write(story, outputFilename);
    }
}
This is the method from the OutputWriter-class:
public void write(ArrayList<String> output, String filename)
{
    try {
        FileWriter out = new FileWriter(filename);
        for (String word : output) {
            out.write(word + " ");
        }
        out.close();
    }
    catch (IOException exc) {
        System.out.println("Error writing output file: " + exc);
    }
}
You are not updating the list with
s = randomAdjective(adjectivesFilename);
You are assigning a new String to the local variable s; it is no longer the instance that is in the list.
You need to set that value into the list. For that, you need to keep track of the index and use List.set(int, E) to update the list at a specific place.
The easiest fix at your level is to change the loop to:
for (int i = 0; i < story.size(); i++) {
    String s = story.get(i);
    if (s.contains("ADJEKTIV.")) {
        // replace the value with a new one
        s = randomAdjective(adjectivesFilename) + ". ";
        story.set(i, s);
        /* OR, shorter:
        story.set(i, randomAdjective(adjectivesFilename) + ". ");
        */
    }
    ...
}
Instead of iterating over the ArrayList yourself, you can also replace all the occurrences at once using the replaceAll method.
public void createAdjectiveStory(String storyFilename, String adjectivesFilename, String outputFilename)
{
    ArrayList<String> story = reader.getWordsInFile(storyFilename);
    story.replaceAll(new UnaryOperator<String>() {
        public String apply(String original) {
            if (original.contains("ADJEKTIV."))
                return randomAdjective(adjectivesFilename) + ". ";
            if (original.contains("ADJEKTIV"))
                return randomAdjective(adjectivesFilename);
            return original;
        }
    });
    writer.write(story, outputFilename);
}
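With Java 8+, the same replaceAll collapses to a lambda (same behavior, just shorter syntax):
story.replaceAll(original -> {
    if (original.contains("ADJEKTIV.")) return randomAdjective(adjectivesFilename) + ". ";
    if (original.contains("ADJEKTIV")) return randomAdjective(adjectivesFilename);
    return original;
});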

One whole exception gets split into 2 maps when using Hadoop to catch exceptions from raw logs

I want to use Hadoop to fetch and parse exceptions from raw logs.
I encounter a problem: some exceptions (spanning multiple lines) end up being part of 2 different splits, and thus 2 different mappers.
I have an idea to avoid this problem: I could override the getSplits() method to make every split have a little redundant data. I think this solution would come at too high a cost for me.
So does anyone have a better solution for this problem?
I would go for a preprocessing job where you tag the exceptions with XML tags. Next you can use an XmlInputFormat to process the files. (This is only the start of a solution; based on your feedback we might make things more concrete.)
This link provides a tutorial to write your own XmlInputFormat, which you can customize to look for 'exception' characteristics. The main point of this tutorial is this sentence:
In the event that a record spans a InputSplit boundary, the record
reader will take care of this so we will not have to worry about this.
I will copy-paste the information from the website, since it might go offline in the future, which could be very frustrating for people reviewing this later:
The inputformat:
package org.undercloud.mapreduce.example3;

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class XmlInputFormat extends FileInputFormat {

    public RecordReader getRecordReader(InputSplit input, JobConf job, Reporter reporter)
            throws IOException {
        reporter.setStatus(input.toString());
        return new XmlRecordReader(job, (FileSplit) input);
    }
}
The record reader:
NOTE: The logic for reading past the end of the split is in the readUntilMatch function, which reads past the end of the split if there is an open tag. This is really what you are looking for, I think!
package org.undercloud.mapreduce.example3;

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class XmlRecordReader implements RecordReader {

    private String startTagS = "";
    private String endTagS = "";
    private byte[] startTag;
    private byte[] endTag;
    private long start;
    private long end;
    private FSDataInputStream fsin;
    private DataOutputBuffer buffer = new DataOutputBuffer();
    private LineRecordReader lineReader;
    private LongWritable lineKey;
    private Text lineValue;

    public XmlRecordReader(JobConf job, FileSplit split) throws IOException {
        lineReader = new LineRecordReader(job, split);
        lineKey = lineReader.createKey();
        lineValue = lineReader.createValue();
        startTag = startTagS.getBytes();
        endTag = endTagS.getBytes();

        // Open the file and seek to the start of the split
        start = split.getStart();
        end = start + split.getLength();
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(job);
        fsin = fs.open(split.getPath());
        fsin.seek(start);
    }

    public boolean next(Text key, XmlContent value) throws IOException {
        // Get the next line
        if (fsin.getPos() < end) {
            if (readUntilMatch(startTag, false)) {
                try {
                    buffer.write(startTag);
                    if (readUntilMatch(endTag, true)) {
                        key.set(Long.toString(fsin.getPos()));
                        value.bufferData = buffer.getData();
                        value.offsetData = 0;
                        value.lenghtData = buffer.getLength();
                        return true;
                    }
                } finally {
                    buffer.reset();
                }
            }
        }
        return false;
    }

    private boolean readUntilMatch(byte[] match, boolean withinBlock) throws IOException {
        int i = 0;
        while (true) {
            int b = fsin.read();
            // end of file -> no match
            if (b == -1) return false;
            // otherwise save to buffer:
            if (withinBlock) buffer.write(b);
            if (b == match[i]) {
                i++;
                if (i >= match.length) return true;
            } else i = 0;
            // see if we've passed the stop point:
            if (!withinBlock && i == 0 && fsin.getPos() >= end) return false;
        }
    }

    public Text createKey() {
        return new Text("");
    }

    public XmlContent createValue() {
        return new XmlContent();
    }

    public long getPos() throws IOException {
        return lineReader.getPos();
    }

    public void close() throws IOException {
        lineReader.close();
    }

    public float getProgress() throws IOException {
        return lineReader.getProgress();
    }
}
And finally the writable:
package org.undercloud.mapreduce.example3;

import java.io.*;

import org.apache.hadoop.io.*;

public class XmlContent implements Writable {

    public byte[] bufferData;
    public int offsetData;
    public int lenghtData;

    public XmlContent(byte[] bufferData, int offsetData, int lenghtData) {
        this.bufferData = bufferData;
        this.offsetData = offsetData;
        this.lenghtData = lenghtData;
    }

    public XmlContent() {
        this(null, 0, 0);
    }

    public void write(DataOutput out) throws IOException {
        out.write(bufferData);
        out.writeInt(offsetData);
        out.writeInt(lenghtData);
    }

    public void readFields(DataInput in) throws IOException {
        in.readFully(bufferData);
        offsetData = in.readInt();
        lenghtData = in.readInt();
    }

    public String toString() {
        return Integer.toString(offsetData) + ", "
                + Integer.toString(lenghtData) + ", "
                + bufferData.toString();
    }
}
This looks like a really useful tutorial, addressing the issue of records spanning multiple splits. Let me know if you are able to adapt this example to your problem.
The classes TextInputFormat and NLineInputFormat might be helpful. TextInputFormat splits the file by line, so if each exception ends with a newline (and contains none within it) this should work. If the exceptions contain a fixed number of lines, the NLineInputFormat class should be what you want, as you can set the number of lines to take.
Unfortunately, if the exception(s) can contain a variable number of newlines within them, this won't work.
In that case I recommend looking for Mahout's XmlInputFormat. It crosses split boundaries, so it will work for most stuff. Just run a pre-processor to put the exceptions inside an <exception></exception> tag, and specify that as the start/end tags.
Example pre-processor, using regex to identify exceptions
String input;       // code this to the input string
String regex;       // make this equal to the exception regex
BufferedWriter bw;  // make this go to the file where output will be stored

String toProcess = input;
boolean continueLoop = true;
Pattern p = Pattern.compile(regex); // compile the pattern once, outside the loop
while (continueLoop) {
    Matcher m = p.matcher(toProcess);
    if (m.find()) {
        bw.write("<exception>" + toProcess.substring(m.start(), m.end()) + "</exception>");
        toProcess = toProcess.substring(m.end());
    } else {
        continueLoop = false;
    }
}
Thanks for all your solutions. I think they are useful for me.
Especially notice the above comment:
"In the event that a record spans a InputSplit boundary, the record reader will take care of this so we will not have to worry about this."
Then I looked into the source code to see how LineRecordReader reads the data from a split. I found that LineRecordReader already has some logic to read records spanning an InputSplit boundary, because line records at the bottom of a split always get split into 2 different splits due to the size limitation of a block.
So I think what I need to do is add to the amount of data that LineRecordReader reads across the split boundary.
Now my solution is: override the method "nextKeyValue()" in LineRecordReader.
public boolean nextKeyValue() throws IOException {
    if (key == null) {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
        value = new Text();
    }
    int newSize = 0;
    while (pos < end) {
        newSize = in.readLine(value, maxLineLength,
                Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
                        maxLineLength));
        // ...
Change the line "while (pos < end)" to "while (pos < end + {param})".
The {param} means the size of the redundant data that the RecordReader reads across the split boundary; a hedged sketch of the resulting method is below.
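For what it's worth, an untested sketch of that change (the fields shown are private in the real LineRecordReader, so you would modify a copy of the class rather than subclass it; OVERFLOW is a made-up stand-in for {param}):
// sketch only: a modified copy of LineRecordReader.nextKeyValue()
private static final int OVERFLOW = 4096; // stands in for {param}: how far a record may spill past the split

public boolean nextKeyValue() throws IOException {
    if (key == null) key = new LongWritable();
    key.set(pos);
    if (value == null) value = new Text();
    int newSize = 0;
    // allow reads up to OVERFLOW bytes past `end`, so a record that starts in
    // this split but finishes in the next one is still read here in full
    while (pos < end + OVERFLOW) {
        newSize = in.readLine(value, maxLineLength,
                Math.max((int) Math.min(Integer.MAX_VALUE, end + OVERFLOW - pos), maxLineLength));
        if (newSize == 0) break;            // end of file
        pos += newSize;
        if (newSize < maxLineLength) break; // got a complete line
    }
    if (newSize == 0) { key = null; value = null; return false; }
    return true;
}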
