I am using Tess4j to work with Tesseract OCR, and I have been using the following code:
During testing I wanted to exercise the catch clause, so I fed wrong information to Tesseract, which should result in a TesseractException.
I managed to induce a TesseractException from the createDocuments() method.
Here is the stack trace:
Note that the stack trace includes doOcr()'s line 125, which is inside the try-catch block. Yet even though the console shows a TesseractException being thrown, the code moves on to line 126 and returns true.
I use net.sourceforge.tess4j.Tesseract to start the OCR process, but I also tried net.sourceforge.tess4j.Tesseract1, which produced the same red console output from Tess4j, but no TesseractException.
My question is: what am I doing wrong? I am assuming there is an issue with my code, because a TesseractException is being thrown but my code is not catching it.
Look at the source code of Tesseract.java:
@Override
public void createDocuments(String[] filenames, String[] outputbases, List<RenderedFormat> formats) throws TesseractException {
if (filenames.length != outputbases.length) {
throw new RuntimeException("The two arrays must match in length.");
}
init();
setTessVariables();
try {
for (int i = 0; i < filenames.length; i++) {
File workingTiffFile = null;
try {
String filename = filenames[i];
// if PDF, convert to multi-page TIFF
if (filename.toLowerCase().endsWith(".pdf")) {
workingTiffFile = PdfUtilities.convertPdf2Tiff(new File(filename));
filename = workingTiffFile.getPath();
}
TessResultRenderer renderer = createRenderers(outputbases[i], formats);
createDocuments(filename, renderer);
api.TessDeleteResultRenderer(renderer);
} catch (Exception e) {
// skip the problematic image file
logger.error(e.getMessage(), e);
} finally {
if (workingTiffFile != null && workingTiffFile.exists()) {
workingTiffFile.delete();
}
}
}
} finally {
dispose();
}
}
/**
* Creates documents.
*
* @param filename input file
* @param renderer renderer
* @throws TesseractException
*/
private void createDocuments(String filename, TessResultRenderer renderer) throws TesseractException {
api.TessBaseAPISetInputName(handle, filename); //for reading a UNLV zone file
int result = api.TessBaseAPIProcessPages(handle, filename, null, 0, renderer);
if (result == ITessAPI.FALSE) {
throw new TesseractException("Error during processing page.");
}
}
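The exception is thrown at line 579 of Tesseract.java, in the private createDocuments method. That method is called by the public method above it, at line 551. The call sits inside a try-catch block whose catch body (line 555) is logger.error(e.getMessage(), e);, so the exception is logged and never reaches your code.
Now the question is: what do you really want to achieve?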
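To make the effect concrete, here is a minimal sketch of the same pattern in plain Java. The class and method names (SwallowDemo, processOne, processAll, callerSees) are hypothetical stand-ins, not Tess4j API; the only point is that a method can declare a checked exception and still never let one escape.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a library that logs and swallows per-file failures.
class SwallowDemo {
    // Stand-in for the library's logger output.
    static final List<String> LOG = new ArrayList<>();

    // Inner worker: throws a checked exception, like the private createDocuments.
    private static void processOne(String name) throws Exception {
        if (name.isEmpty()) {
            throw new Exception("Error during processing page.");
        }
    }

    // Public entry point: declares "throws Exception" but catches inside the loop.
    static void processAll(String[] names) throws Exception {
        for (String name : names) {
            try {
                processOne(name);
            } catch (Exception e) {
                LOG.add("ERROR " + e.getMessage()); // logged, then dropped
            }
        }
    }

    // Caller-side wrapper mirroring doOcr(): its catch block is never reached here.
    static boolean callerSees(String[] names) {
        try {
            processAll(names);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(callerSees(new String[] {"ok.png", ""}));
        System.out.println(LOG);
    }
}
```

The caller gets true even though one input failed; the only trace of the failure is the log entry.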
If you don't want to see this log, you can configure slf4j to not print the log from this library.
If you want to get the actual exception, that is not possible, because the library swallows it. I am not familiar with the library, but looking at the code there doesn't seem to be any nice option: the method that throws the exception is private and is used only in this one place, inside the try-catch block. However, the exception is thrown when api.TessBaseAPIProcessPages(...) returns ITessAPI.FALSE, and api has a getter. So you could get it, call the TessBaseAPIProcessPages(...) method yourself and check the result. This might not be ideal, as you will probably be processing every image twice. Another solution is to fork the source code and modify it yourself. You might also want to contact the author and ask for advice; you could take it further and submit a pull request for them to approve and release.
Add the pdf.ttf file to the tessdata path (tessdata/pdf.ttf).
How can we write a byte array to a file (and read it back from that file) in Java?
Yes, we all know there are already lots of questions like that, but they get very messy and subjective due to the fact that there are so many ways to accomplish this task.
So let's reduce the scope of the question:
Domain:
Android / Java
What we want:
Fast (as possible)
Bug-free (in a rigidly meticulous way)
What we are not doing:
Third-party libraries
Any libraries that require Android API later than 23 (Marshmallow)
(So, that rules out Apache Commons, Google Guava, java.nio, and leaves us with good ol' java.io)
What we need:
Byte array is always exactly the same (content and size) after going through the write-then-read process
Write method only requires two arguments: File file, and byte[] data
Read method returns a byte[] and only requires one argument: File file
In my particular case, these methods are private (not a library) and are NOT responsible for the following, (but if you want to create a more universal solution that applies to a wider audience, go for it):
Thread-safety (file will not be accessed by more than one process at once)
File being null
File pointing to non-existent location
Lack of permissions at the file location
Byte array being too large
Byte array being null
Dealing with any "index," "length," or "append" arguments/capabilities
So... we're sort of in search of the definitive bullet-proof code that people in the future can assume is safe to use because your answer has lots of up-votes and there are no comments that say, "That might crash if..."
This is what I have so far:
Write Bytes To File:
private void writeBytesToFile(final File file, final byte[] data) {
try {
FileOutputStream fos = new FileOutputStream(file);
fos.write(data);
fos.close();
} catch (Exception e) {
Log.i("XXX", "BUG: " + e);
}
}
Read Bytes From File:
private byte[] readBytesFromFile(final File file) {
RandomAccessFile raf;
byte[] bytesToReturn = new byte[(int) file.length()];
try {
raf = new RandomAccessFile(file, "r");
raf.readFully(bytesToReturn);
} catch (Exception e) {
Log.i("XXX", "BUG: " + e);
}
return bytesToReturn;
}
From what I've read, the possible Exceptions are:
FileNotFoundException : Am I correct that this should not happen as long as the file path being supplied was derived using Android's own internal tools and/or if the app was tested properly?
IOException : I don't really know what could cause this... but I'm assuming that there's no way around it if it does.
So with that in mind... can these methods be improved or replaced, and if so, with what?
It looks like these are going to be core utility/library methods which must run on Android API 23 or later.
Concerning library methods, I find it best to make no assumptions about how applications will use these methods. In some cases the application may want to receive checked IOExceptions (because data from a file must exist for the application to work); in other cases the application may not even care if data is not available (because data from a file is only a cache that is also available from a primary source).
When it comes to I/O operations, there is never a guarantee that operations will succeed (e.g. user dropping phone in the toilet). The library should reflect that and give the application a choice on how to handle errors.
To optimize I/O performance, always assume the "happy path" and catch errors to figure out what went wrong. This is counterintuitive compared to normal programming, but essential when dealing with storage I/O. For example, just checking whether a file exists before reading from it can make your application twice as slow; these kinds of I/O actions add up fast and slow your application down. Just assume the file exists, and only if you get an error check whether the file exists.
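As a rough illustration of that ordering, the sketch below opens the file straight away and only inspects it after the open has failed. The class name HappyPathRead and the method readFirstByte are made up for this example, and it sticks to java.io.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical helper showing "try first, diagnose only on failure".
class HappyPathRead {
    // Returns the first byte of the file, or -1 if the file is empty.
    static int readFirstByte(File f) throws IOException {
        try (FileInputStream in = new FileInputStream(f)) { // no exists() check up front
            return in.read();
        } catch (FileNotFoundException e) {
            // Only now spend extra file-system calls on working out what went wrong.
            String reason = !f.exists() ? "does not exist"
                    : f.isDirectory() ? "is a directory"
                    : "cannot be opened";
            throw new FileNotFoundException(f.getPath() + " " + reason);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(readFirstByte(new File(args.length > 0 ? args[0] : "missing.bin")));
        } catch (IOException e) {
            System.out.println("Read failed: " + e.getMessage());
        }
    }
}
```

On the successful path this costs a single open call; the exists() and isDirectory() checks are paid only when something has already gone wrong.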
So given those ideas, the main functions could look like:
public static void writeFile(File f, byte[] data) throws FileNotFoundException, IOException {
try (FileOutputStream out = new FileOutputStream(f)) {
out.write(data);
}
}
public static int readFile(File f, byte[] data) throws FileNotFoundException, IOException {
try (FileInputStream in = new FileInputStream(f)) {
return in.read(data);
}
}
Notes about the implementation:
The methods can also throw runtime-exceptions like NullPointerExceptions - these methods are never going to be "bug free".
I do not think buffering is needed or wanted in the methods above, since only one native call is made.
The application now also has the option to read only the beginning of a file.
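One caveat about the single read call: the contract of InputStream.read(byte[]) allows it to return fewer bytes than the array can hold, even when more data is still available. A more defensive variant loops until the array is full or the end of the file is reached. This is a sketch, not a drop-in replacement: the class name ReadLoop is hypothetical, and unlike the single-call version it returns 0 rather than -1 for an empty file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical variant of readFile(File, byte[]) that tolerates short reads.
class ReadLoop {
    // Fills data from the start of the file; returns the number of bytes actually read.
    static int readFile(File f, byte[] data) throws IOException {
        try (FileInputStream in = new FileInputStream(f)) {
            int off = 0;
            while (off < data.length) {
                int r = in.read(data, off, data.length - off);
                if (r == -1) {
                    break; // end of file reached before the array was full
                }
                off += r;
            }
            return off;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("readloop", ".bin");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] {1, 2, 3});
        }
        byte[] buffer = new byte[8];
        System.out.println("bytes read: " + readFile(f, buffer));
        f.delete();
    }
}
```

The caller can still read only the beginning of a file by passing a smaller array.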
To make it easier for an application to read a file, an additional method can be added. But note that it is up to the library to detect any errors and report them to the application since the application itself can no longer detect those errors.
public static byte[] readFile(File f) throws FileNotFoundException, IOException {
int fsize = verifyFileSize(f);
byte[] data = new byte[fsize];
int read = readFile(f, data);
verifyAllDataRead(f, data, read);
return data;
}
private static int verifyFileSize(File f) throws IOException {
long fsize = f.length();
if (fsize > Integer.MAX_VALUE) {
throw new IOException("File size (" + fsize + " bytes) for " + f.getName() + " too large.");
}
return (int) fsize;
}
public static void verifyAllDataRead(File f, byte[] data, int read) throws IOException {
if (read != data.length) {
throw new IOException("Expected to read " + data.length
+ " bytes from file " + f.getName() + " but got only " + read + " bytes from file.");
}
}
This implementation adds another hidden point of failure: an OutOfMemoryError at the point where the new data array is created.
To accommodate applications further, additional methods can be added to help with different scenarios. For example, let's say the application really does not want to deal with checked exceptions:
public static void writeFileData(File f, byte[] data) {
try {
writeFile(f, data);
} catch (Exception e) {
fileExceptionToRuntime(e);
}
}
public static byte[] readFileData(File f) {
try {
return readFile(f);
} catch (Exception e) {
fileExceptionToRuntime(e);
}
return null;
}
public static int readFileData(File f, byte[] data) {
try {
return readFile(f, data);
} catch (Exception e) {
fileExceptionToRuntime(e);
}
return -1;
}
private static void fileExceptionToRuntime(Exception e) {
if (e instanceof RuntimeException) { // e.g. NullPointerException
throw (RuntimeException)e;
}
RuntimeException re = new RuntimeException(e.toString());
re.setStackTrace(e.getStackTrace());
throw re;
}
The method fileExceptionToRuntime is a minimal implementation, but it shows the idea here.
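One possible refinement, sketched under my own naming (ExceptionWrap and toRuntime are not part of the code above): pass the original exception as the cause instead of copying its stack trace, and return the RuntimeException so that the caller writes throw toRuntime(e). The compiler then knows the catch block always ends in a throw, which makes the trailing return null and return -1 statements unnecessary.

```java
import java.io.IOException;

// Hypothetical alternative to fileExceptionToRuntime that keeps the cause chain.
class ExceptionWrap {
    // Returns the exception to throw; runtime exceptions pass through unchanged.
    static RuntimeException toRuntime(Exception e) {
        if (e instanceof RuntimeException) { // e.g. NullPointerException
            return (RuntimeException) e;
        }
        return new RuntimeException(e.toString(), e); // original kept as the cause
    }

    // Example caller: no dummy return value is needed after the catch block.
    static int alwaysFails() {
        try {
            throw new IOException("disk unplugged");
        } catch (Exception e) {
            throw toRuntime(e);
        }
    }

    public static void main(String[] args) {
        try {
            alwaysFails();
        } catch (RuntimeException e) {
            System.out.println("cause: " + e.getCause());
        }
    }
}
```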
The library could also help an application to troubleshoot when an error does occur. For example, a method canReadFile(File f) could check if a file exists and is readable and is not too large. The application could call such a function after a file-read fails and check for common reasons why a file cannot be read. The same can be done for writing to a file.
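A minimal sketch of such a troubleshooting helper could look like the following. The name canReadFile comes from the paragraph above; the class FileDiagnostics, the companion method whyCannotRead and the exact set of checks are my own assumptions.

```java
import java.io.File;

// Hypothetical diagnostics meant to be called only after a read has already failed.
class FileDiagnostics {
    // Returns null when no obvious problem is found, otherwise a short reason.
    static String whyCannotRead(File f) {
        if (f == null) {
            return "file is null";
        }
        if (!f.exists()) {
            return "file does not exist";
        }
        if (f.isDirectory()) {
            return "path is a directory";
        }
        if (!f.canRead()) {
            return "file is not readable";
        }
        if (f.length() > Integer.MAX_VALUE) {
            return "file is too large for a byte array";
        }
        return null;
    }

    static boolean canReadFile(File f) {
        return whyCannotRead(f) == null;
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : "missing.bin");
        System.out.println(f + ": " + whyCannotRead(f));
    }
}
```

Because these checks cost several extra file-system calls, they belong on the error path, not in front of every read.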
Although you can't use third party libraries, you can still read their code and learn from their experience. In Google Guava for example, you usually read a file into bytes like this:
FileInputStream reader = new FileInputStream("test.txt");
byte[] result = ByteStreams.toByteArray(reader);
The core implementation of this is toByteArrayInternal. Before calling this, you should check:
A non-null file is passed (NullPointerException)
The file exists (FileNotFoundException)
After that, it is reduced to handling an InputStream, and this is where IOExceptions come from. When reading streams, a lot of things outside the control of your application can go wrong (bad sectors and other hardware issues, malfunctioning drivers, OS access rights) and manifest themselves as an IOException.
I am copying here the implementation:
private static final int BUFFER_SIZE = 8192;
/** Max array length on JVM. */
private static final int MAX_ARRAY_LEN = Integer.MAX_VALUE - 8;
private static byte[] toByteArrayInternal(InputStream in, Queue<byte[]> bufs, int totalLen)
throws IOException {
// Starting with an 8k buffer, double the size of each successive buffer. Buffers are retained
// in a deque so that there's no copying between buffers while reading and so all of the bytes
// in each new allocated buffer are available for reading from the stream.
for (int bufSize = BUFFER_SIZE;
totalLen < MAX_ARRAY_LEN;
bufSize = IntMath.saturatedMultiply(bufSize, 2)) {
byte[] buf = new byte[Math.min(bufSize, MAX_ARRAY_LEN - totalLen)];
bufs.add(buf);
int off = 0;
while (off < buf.length) {
// always OK to fill buf; its size plus the rest of bufs is never more than MAX_ARRAY_LEN
int r = in.read(buf, off, buf.length - off);
if (r == -1) {
return combineBuffers(bufs, totalLen);
}
off += r;
totalLen += r;
}
}
// read MAX_ARRAY_LEN bytes without seeing end of stream
if (in.read() == -1) {
// oh, there's the end of the stream
return combineBuffers(bufs, MAX_ARRAY_LEN);
} else {
throw new OutOfMemoryError("input is too large to fit in a byte array");
}
}
As you can see, most of the logic has to do with reading the file in chunks. This handles situations where you don't know the size of the InputStream before you start reading. In your case, you only need to read files and you should be able to know the length beforehand, so this complexity could be avoided.
The other thing to check for is OutOfMemoryError. On a standard desktop JVM the heap limit is usually large enough not to matter, but on Android it is a much smaller value. Before trying to read the file, you should check that enough memory is available.
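A rough way to do that check with only java.lang.Runtime is sketched below. The names MemoryCheck, availableHeap and ensureFits are hypothetical, and the figure is only an estimate: other threads may allocate in the meantime, so the allocation can still fail even when the check passes.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical pre-read check based on the heap figures reported by Runtime.
class MemoryCheck {
    // Estimate of heap bytes this process could still allocate.
    static long availableHeap() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // Throws if the file is clearly too large to load into a single byte array.
    static void ensureFits(File f) throws IOException {
        long size = f.length();
        if (size > Integer.MAX_VALUE) {
            throw new IOException("File too large for a byte array: " + size + " bytes");
        }
        long available = availableHeap();
        if (size > available) {
            throw new IOException("File needs " + size + " bytes but only about "
                    + available + " bytes of heap are available");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("approx. available heap: " + availableHeap() + " bytes");
        File f = File.createTempFile("memcheck", ".bin");
        ensureFits(f); // an empty file always fits
        f.delete();
    }
}
```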
I have been trying many of the examples provided and have yet to be successful. Here is the code I am currently trying, but getting an error in Eclipse on Paths.of (the of is underlined in red) that says: "rename in file".
String content;
try {
content = Files.readAllLines(Paths.of("C:", "Calcs.txt"));
} catch (IOException e1) {
e1.printStackTrace ();
}
System.out.println (content);
First, readAllLines returns a list, and a list cannot be assigned to a String. So you must write:
List<String> content;
Second, according to the Java 8 documentation, the Paths class has no method named of. You can use the method get instead:
List<String> content = Files.readAllLines(Paths.get("C:", "Calcs.txt"));
Alternatively, the Path class has had a method named of since Java 11, so there you can write:
List<String> content = Files.readAllLines(Path.of("C:", "Calcs.txt"));
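Putting the pieces together, a complete Java 8 version might look like the sketch below. The class name ReadLines and the helper readAll are my own. The printing happens inside the try block, which also avoids the "variable content might not have been initialized" compile error you would get from printing after the catch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical wrapper around Files.readAllLines for Java 8.
class ReadLines {
    // Reads every line of the file and joins the lines with "\n".
    static String readAll(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path);
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : "Calcs.txt");
        try {
            System.out.println(readAll(path));
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
```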
You're probably looking for Paths.get:
String content;
try {
content = String.join("\n", Files.readAllLines(Paths.get("/home/hassan", "Foo.java")));
} catch (IOException e1) {
e1.printStackTrace ();
}
I am seeing strange behaviour in my Java code and would like some advice.
In a multithreading application I wrote this code:
scratchDir.resolve(directoryTree).toFile().mkdirs();
Because of a bug, the object scratchDir is null. I was expecting a stack trace in the logs, but there is nothing about the error.
I have checked the code, and I never catch the NullPointerException.
Here is the complete method code:
@Override
public void write(JsonObject jsonObject) throws FileSystemException {
Path directoryTree = getRelativePath();
scratchDir.resolve(directoryTree).toFile().mkdirs();
String newFileName = getHashFileName(jsonObject);
Path filePath = scratchDir.resolve(directoryTree).resolve(newFileName);
logger.debug("Write new file Json {} to persistent storage dir {}", newFileName, scratchDir);
File outputFile = filePath.toFile();
if (outputFile.exists()) {
throw new FileAlreadyExistsException(filePath.toString());
}
try (FileWriter fileWriter = new FileWriter(outputFile)) {
fileWriter.write(jsonObject.toString());
fileWriter.flush();
} catch (Exception e) {
logger.error(e);
}
}
Why don't I see the exception in my logs?
Why are you doing this?
The proper way to do this is:
Files.createDirectories(scratchDir.resolve(directoryTree));
Don't mix the old and new APIs. The old mkdirs() API DEMANDS that you check the return value; if it is false, the operation failed, and you do not get the benefit of an exception to tell you why. This is the primary reason there is a new API in the first place.
Are you sure you aren't confused - and that is the actual problem? The line as you have it will happily do absolutely nothing whatsoever (no directories, and no logs or exceptions). The line above will throw if it can't make the directories, so start there.
Then, if that line IS being run and nothing is logged, then you've caught the NPE and discarded it, someplace you didn't paste.
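To see the difference between the two directory APIs in isolation, here is a small sketch. The class and method names are made up; it tries to create a directory underneath a regular file, which cannot succeed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical comparison of File.mkdirs() and Files.createDirectories().
class MkdirsVsCreateDirectories {
    // Old API: failure is reported only through the return value.
    static boolean oldApiSucceeded(Path dir) {
        return dir.toFile().mkdirs();
    }

    // New API: failure is reported as an IOException describing the cause.
    static boolean newApiThrows(Path dir) {
        try {
            Files.createDirectories(dir);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path blocker = Files.createTempFile("blocker", ".tmp"); // a regular file
        Path impossible = blocker.resolve("child");             // cannot become a directory
        System.out.println("mkdirs returned: " + oldApiSucceeded(impossible));
        System.out.println("createDirectories threw: " + newApiThrows(impossible));
        Files.delete(blocker);
    }
}
```

If the return value of mkdirs() is ignored, as in the question, the failure leaves no trace at all.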
With the ImageIO.write() API call, I get a NullPointerException when I pass a non-existent path like "\\abc\abc.png". I pass the non-existent path on purpose to test something, but instead of getting a FileNotFoundException, I get an NPE. Why is that?
The ImageIO.write() API is supposed to throw an IOException, but I don't know why I get an NPE.
I use the exception message string to show it in a message box to the user, but in this case NPE.getLocalizedMessage() returns an empty string, and hence the popup is empty with just an icon on it.
He is right, though. For example, this code:
public static void main(String[] args) throws IOException {
BufferedImage image = new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);
File out = new File("\\\\ABC\\abc.png");
ImageIO.write(image, "png", out);
}
gives
java.io.FileNotFoundException: \\ABC\abc.png (The network path was not found)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at javax.imageio.stream.FileImageOutputStream.<init>(FileImageOutputStream.java:69)
at com.sun.imageio.spi.FileImageOutputStreamSpi.createOutputStreamInstance(FileImageOutputStreamSpi.java:55)
at javax.imageio.ImageIO.createImageOutputStream(ImageIO.java:419)
at javax.imageio.ImageIO.write(ImageIO.java:1530)
at javaapplication145.JavaApplication145.main(JavaApplication145.java:24)
Exception in thread "main" java.lang.NullPointerException
at javax.imageio.ImageIO.write(ImageIO.java:1538)
at javaapplication145.JavaApplication145.main(JavaApplication145.java:24)
The reason is that FileImageOutputStreamSpi.createOutputStreamInstance swallows the FileNotFoundException and then the NPE comes when ImageIO.write tries to close a stream that didn't open.
Why the exception is suppressed so brutally, I don't know. The code fragment is
try {
return new FileImageOutputStream((File)output);
} catch (Exception e) {
e.printStackTrace();
return null;
}
The only solution is to verify the path before attempting to use ImageIO.
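A minimal sketch of such a check is shown below. The class SafeImageWrite and the method writePng are hypothetical names, and the checks are only the obvious ones (the parent directory exists and is writable); they turn the failure into an exception with a usable message before ImageIO is involved.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical wrapper that validates the target path before calling ImageIO.write.
class SafeImageWrite {
    static void writePng(BufferedImage image, File out) throws IOException {
        File parent = out.getAbsoluteFile().getParentFile();
        if (parent == null || !parent.isDirectory()) {
            throw new FileNotFoundException("Target directory does not exist: " + out);
        }
        if (!parent.canWrite()) {
            throw new IOException("Target directory is not writable: " + parent);
        }
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);
        try {
            writePng(image, new File("\\\\ABC\\abc.png"));
        } catch (IOException e) {
            System.out.println("Could not save image: " + e.getMessage());
        }
    }
}
```

The message box then has a non-empty text to show, because the exception is created with an explicit message.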
I found out the reason for NPE for issue mentioned in this thread. Peter Hull is absolutely right in saying
public static void main(String[] args) throws IOException {
BufferedImage image = new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);
File out = new File("\\\\ABC\\abc.png");
ImageIO.write(image, "png", out);
}
This is exactly what my code looks like. Thanks, Peter, for highlighting it.
The reason for this issue is that new FileImageOutputStream() throws a FileNotFoundException, but some Sun programmer caught the exception, printed the stack trace, and returned null. That is why it is no longer possible to catch the FileNotFoundException: it has already been printed. Shortly afterwards, the returned null value causes a NullPointerException, which is what is thrown from the method I called. When I printed the stack trace of the exception, I could see the FileNotFoundException along with the NPE, for the reason mentioned above.
-Nayan
How can I override the removeEldestEntry method to save the eldest entry to a file? Also, how can I limit the size of the file, as I did for the LinkedHashMap? Here is the code:
import java.util.*;
public class level1 {
private static final int max_cache = 50;
private Map cache = new LinkedHashMap(max_cache, .75F, true) {
protected boolean removeEldestEntry(Map.Entry eldest) {
return size() > max_cache;
}
};
public level1() {
for (int i = 1; i < 52; i++) {
String string = String.valueOf(i);
cache.put(string, string);
System.out.println("\rCache size = " + cache.size() +
"\tRecent value = " + i + " \tLast value = " +
cache.get(string) + "\tValues in cache=" +
cache.values());
}
}
}
I tried to use FileOutputStream:
private Map cache = new LinkedHashMap(max_cache, .75F, true) {
protected boolean removeEldestEntry(Map.Entry eldest) throws IOException {
boolean removed = super.removeEldestEntry(eldest);
if (removed) {
FileOutputStream fos = new FileOutputStream("t.tmp");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(eldest.getValue());
oos.close();
}
return removed;
}
But I got an error:
Error(15,27): removeEldestEntry(java.util.Map.Entry) in cannot override removeEldestEntry(java.util.Map.Entry) in java.util.LinkedHashMap; overridden method does not throw java.io.IOException
Without the throws IOException clause, the compiler asks me to handle IOException and FileNotFoundException.
Maybe there is another way? Please show me some example code. I am new to Java and just trying to understand the basic principles of two-level caching. Thanks.
You first need to make sure your method properly overrides the parent. You can make some small changes to the signature, such as only throwing a more specific checked exception that is a subclass of a checked exception declared in the parent. In this case, the parent does not declare any checked exception, so you cannot refine that further and may not throw any checked exceptions. So you will have to handle the IOException locally. There are several ways you can do that: convert it to a RuntimeException of some kind and/or log it.
If you are concerned about the file size, you probably do not want to keep just the last removed entry but many of them - so you should open the file for append.
You need to return true from the method to actually remove the eldest and you need to decide if the element should be removed.
When working with files you should use try/finally to ensure that you close the resource even if there is an exception. This can get a little ugly - sometimes it's nice to have a utility method to do the close so you don't need the extra try/catch.
Generally you should also use some buffering for file I/O, which greatly improves performance; in this case, wrap the file stream in a java.io.BufferedOutputStream and provide that to the ObjectOutputStream.
Here is something that may do what you want:
private static final int MAX_ENTRIES_ALLOWED = 100;
private static final long MAX_FILE_SIZE = 1L * 1024 * 1024; // 1 MB
protected boolean removeEldestEntry(Map.Entry eldest) {
if (size() <= MAX_ENTRIES_ALLOWED) {
return false;
}
File objFile = new File("t.tmp");
if (objFile.length() > MAX_FILE_SIZE) {
// Do something here to manage the file size, such as renaming the file
// You won't be able to easily remove an object from the file without a more
// advanced file structure since you are writing arbitrary sized serialized
// objects. You would need to do some kind of tagging of each entry or include
// a record length before each one. Then you would have to scan and rebuild
// a new file. You cannot easily just delete bytes earlier in the file without
// even more advanced structures (like having an index, fixed size records and
// free space lists, or even a database).
}
FileOutputStream fos = null;
try {
fos = new FileOutputStream(objFile, true); // Open for append
ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(fos));
oos.writeObject(eldest.getValue());
oos.close(); // Close the object stream to flush remaining generated data (if any).
return true;
} catch (IOException e) {
// Log error here or....
throw new RuntimeException(e.getMessage(), e); // Convert to RuntimeException
} finally {
if (fos != null) {
try {
fos.close();
} catch (IOException e2) {
// Log failure - no need to throw though
}
}
}
}
You can't change the method signature when overriding a method. So you need to handle the exception in the overridden method instead of throwing it.
This contains a good explanation on how to use try and catch: http://download.oracle.com/javase/tutorial/essential/exceptions/try.html
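For completeness, here is a compact, self-contained sketch of a cache that handles the IOException inside removeEldestEntry. It is not the code from the answers above: the class name SpillingCache and its constructor parameters are hypothetical, and it deliberately appends plain key=value text lines instead of using ObjectOutputStream, because every new ObjectOutputStream opened for append writes a fresh stream header, which makes the file awkward to read back with a single ObjectInputStream.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical first-level cache that spills evicted entries to a text file.
class SpillingCache extends LinkedHashMap<String, String> {
    private final int maxEntries;
    private final File spillFile;

    SpillingCache(int maxEntries, File spillFile) {
        super(16, 0.75f, true); // true = access order, so the eldest is least recently used
        this.maxEntries = maxEntries;
        this.spillFile = spillFile;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        if (size() <= maxEntries) {
            return false; // still within the limit, keep everything
        }
        // Append mode keeps earlier evictions; try-with-resources closes the writer.
        try (BufferedWriter w = new BufferedWriter(new FileWriter(spillFile, true))) {
            w.write(eldest.getKey() + "=" + eldest.getValue());
            w.newLine();
        } catch (IOException e) {
            // The overridden signature allows no checked exceptions, so wrap it.
            throw new UncheckedIOException("Could not spill entry " + eldest.getKey(), e);
        }
        return true; // tell LinkedHashMap to drop the eldest entry
    }

    public static void main(String[] args) throws IOException {
        File spill = File.createTempFile("cache-spill", ".txt");
        SpillingCache cache = new SpillingCache(3, spill);
        for (int i = 1; i <= 5; i++) {
            cache.put(String.valueOf(i), String.valueOf(i));
        }
        System.out.println("in memory: " + cache.keySet());
        System.out.println("spill file: " + spill);
    }
}
```

Limiting the size of the spill file would still need a policy of its own, such as rotating to a new file once spillFile.length() passes a threshold.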