Java OCR library recommendations? [duplicate] - java

This question already has answers here:
Java OCR implementation [closed]
(5 answers)
Closed 9 years ago.
I need to check a tonne of pictures to see if they have a keyword on them. Can anyone recommend a good, reliable OCR library? I'll happily sacrifice speed for accuracy.

There are no pure Java OCR libraries with accuracy worth mentioning. Depending on your budget, you may choose something that is not purely Java but can be called from Java:
If you have plenty of time but zero budget, your choice is Tesseract. It is definitely the best among the open source engines.
If you have a small budget and only need to run this recognition once, a Cloud OCR API service would be your best choice. It is based on a leading commercial-grade OCR engine and offers quite affordable per-project prices. Disclaimer: I work for ABBYY.
If you need to run this recognition as an ongoing process, it may be more economical to purchase dedicated conversion software, for example this one; it has an API and can be called from Java too. There are actually a lot of alternatives if you are prepared to invest some budget in licensing.

If you plan to recognize characters other than Latin letters and digits, it is better not to look for a Java library at all. Pick one of the external tools instead and get your text from it some other way (1).
On Linux I have used cuneiform (2) via its command line interface.
(1) A command line interface and a pipe, for example.
(2) cuneiform has been ported to Linux, but I don't know whether its command line interface works on Windows.
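
As a rough sketch of the "command line interface and pipe" approach, the class below starts an external OCR program from Java, reads its standard output, and checks the recognized text for a keyword. The class and method names are made up for illustration. It assumes that Tesseract is installed and on the PATH and is invoked as `tesseract <image> stdout`; another tool would need a different command line.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class OcrKeywordCheck {

    // Runs an external command and returns everything it wrote to standard output.
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the tool's warnings go to our own stderr so they do not mix with the text.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = stdout.read(buffer)) != -1) {
                captured.write(buffer, 0, read);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode + ": " + command);
        }
        return new String(captured.toByteArray(), StandardCharsets.UTF_8);
    }

    // Case-insensitive search that also tolerates line breaks inside the keyword.
    static boolean containsKeyword(String recognizedText, String keyword) {
        String text = recognizedText.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        String wanted = keyword.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        return text.contains(wanted);
    }

    // Assumption: the tesseract executable is on the PATH.
    static boolean imageContainsKeyword(String imagePath, String keyword)
            throws IOException, InterruptedException {
        String text = runAndCapture(Arrays.asList("tesseract", imagePath, "stdout"));
        return containsKeyword(text, keyword);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: OcrKeywordCheck <image> <keyword>");
            return;
        }
        System.out.println(imageContainsKeyword(args[0], args[1]));
    }
}
```

OCR output is noisy, so an exact substring match like this will miss keywords with misrecognized characters; a fuzzier comparison may be needed in practice.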

Related

Java - Executable JAR? Easy Decompiling? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I recently started with Java and, besides variables, logical operators, loops and the like, I played around with JFrames, and a couple of questions came to mind.
I noticed that when I save my program as an executable JAR, the program appears to be one file. Does that mean that when you develop something that uses many resources (images, audio files, etc.), they will all be stored in this JAR file?
The second thing I noticed is that I can extract the JAR file and decompile the .class files fairly easily. With that in mind, let's say I use a MySQL database and connect to the MySQL server through the JDBC driver: must I hardcode the password?
Is Java suitable for 3D games? I'm really far from this, but let me know. I saw games written with a Java 3D game engine like jMonkeyEngine and I'm impressed, but I have read posts saying that Java is slow and not that suitable for 3D games, which left me a little confused.
You can do that, if you want. In Oracle's Java Tutorials you can read about creating executable JAR files: Packaging Programs in JAR Files
Any kind of program written in any language can be reverse engineered, so regardless of whether it's Java or not, you should never hard-code passwords. With Java byte code it's fairly easy to decompile.
More than 10 years ago, when Java was still young, it was relatively slow compared to languages that are compiled to native code directly such as C or C++. However, many advances have been made in the Java virtual machine over the years, and the performance of Java programs is comparable to C++ in many cases. People who still complain that Java is slow aren't up to date or don't have a lot of current experience with Java. Java is certainly fast enough, as you saw from demos with jMonkeyEngine. However, for commercial 3D graphics games, C++ seems to be the traditional de-facto programming language that's used. Note that to squeeze the last bit of performance out of the hardware, you'll need to write code specifically for that hardware. Java isn't the right tool for that, as it's designed to be platform-independent.
That is an option. It's certainly not mandatory, you can load resources from external files as well.
That's true. The common architecture for such an application is to put an application server in between the clients and the actual database. The application server that you control is what knows the database password. It only exposes to the clients the operations that they are supposed to be allowed to perform.
Minecraft is written in Java, so QED. You won't likely be pushing the limits of modern GPU performance with a game written in Java, but that isn't the goal of every game.
Yes, you can put everything in one JAR.
Yes, but giving untrusted users raw access to a database is rarely a good idea anyway. The better way is to build a server app with a public API and authenticate users.
It depends. If you want to write a 3D app focused on rich, high-quality graphic effects, then Java is probably not the best choice. However, 3D in Java is easy and quite high-level, so you save a lot of development time compared to lower-level technologies.
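
For the first question, here is a minimal sketch of both options the answers mention: reading a resource bundled inside the JAR via the classpath, and falling back to an external file when it is not bundled. The class name `ResourceLoader` and its methods are hypothetical; only `Class.getResourceAsStream` and `Files.readAllBytes` are standard library calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ResourceLoader {

    // Reads the stream to the end and returns its bytes.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    // Looks for the resource on the classpath first (this is where files packed
    // into the JAR end up); if it is not there, reads the external file instead.
    static byte[] load(String classpathName, Path externalFallback) throws IOException {
        InputStream bundled = ResourceLoader.class.getResourceAsStream("/" + classpathName);
        if (bundled != null) {
            try (InputStream in = bundled) {
                return readFully(in);
            }
        }
        return Files.readAllBytes(externalFallback);
    }
}
```

A call such as `ResourceLoader.load("images/logo.png", Paths.get("assets/logo.png"))` (both paths invented here) would then work whether or not the image was packed into the JAR.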

Working with long strings (heredocs) in Java - the readable approach? [duplicate]

This question already has answers here:
Does Java have support for multiline strings?
(43 answers)
Closed 7 years ago.
I need to work with long strings containing line breaks in Java. They are for HTML generation, but that is not the most important point here.
I'm aware Java is crippled in that it doesn't have heredocs. But there are other mechanisms I could use:
1) String concatenation (or StringBuilders): not very readable or copy-pasteable.
2) Storing strings in .properties files: also not very readable, but easier to copy and paste.
3) Storing each "heredoc" in a separate .txt file: quite readable and copy-pasteable, but resulting in a horde of .txt files.
4) Template engines, like Velocity or Freemarker: this moves design out of Java and requires a lot of map operations. It would be quite good, but the Velocity syntax and loop/if abilities aren't as readable as, for example, those of Smarty.
Each has pros and cons. I'd like to choose 3, but the management prefers 1 for purely ideological reasons. I'd like to have some standard for working with heredocs in Java, possibly a library that makes things easier. I'd be grateful for any suggestions (with good arguments) on how to work with heredocs.
Thanks
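
To make option 3 concrete, here is a small sketch that keeps each "heredoc" in its own text file and fills in `${name}` placeholders at runtime. The class `TextTemplate` and the placeholder syntax are invented for this example and are not taken from any library; the values are inserted as-is, with no HTML escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

class TextTemplate {

    private final String body;

    TextTemplate(String body) {
        this.body = body;
    }

    // Loads the whole file as one UTF-8 string, line breaks included.
    static TextTemplate fromFile(Path file) throws IOException {
        return new TextTemplate(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    // Replaces every ${key} with its value. Replacements run one after another,
    // so a value that itself contains ${...} may be substituted again.
    String render(Map<String, String> values) {
        String result = body;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }
}
```

On Java 15 and later, text blocks (`"""`) let a multi-line literal live directly in the source file, which covers much of the same need without extra files.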
I hate to be "that guy" who suggests that you take a completely different approach from the one you asked about, but have you looked at Groovy? It's a JVM language, can be mixed freely with Java, and in addition to a bunch of other really nice language features, it has heredocs.
Try the Rythm template engine, which is built as a high-performance Java template engine with Razor-like clean syntax.
Links:
Check the full featured demonstration
read a brief introduction to Rythm
download the latest package or
fork it
Updates
Rythm now has a web site: http://rythmengine.org, and a fiddle site: http://fiddle.rythmengine.org

Is there any tool that converts Objective C code to java for Android [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Objective-C to Java cross compiler
I have a working iPhone app and want to convert it to an Android app with minimal effort. Can anyone suggest a tool?
I don't think so. Your best bet would have been to develop the application from scratch using a platform like Appcelerator or PhoneGap.
The commenter makes an excellent point: the platforms are fundamentally different. A straight conversion of code won't work. You also have to convert framework/api calls and restructure all of your UI. Not only is the framework different, but the assumptions made by the platform are totally different as well.
Possibly the best way to reuse the most code (keep in mind that this isn't necessarily the easiest) would be to convert as much Objective-C code as possible into C or C++ and make use of the Android NDK. You won't be able to reuse any of the UI code, but you might be able to reuse a significant amount of your application logic, depending on what your application does.

NetLogo vs. Repast Simphony? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
I would like to simulate some scenarios using the multiagent paradigm, and it seems NetLogo and Repast are the most popular tools for that.
I'd like to know if anyone has had any experience with either one and could tell me more about them. For example, I've noticed that there is a flowchart-like modeling option for Repast, but I believe it is rather limited. I've looked through the tutorials and documentation on the official site, and the documentation seems to be lacking. While there are some examples, extending it to simulate an environment it has not been specifically prepared for seems like an unreachable goal at the moment, even though Repast is obviously very robust and apparently able to handle it, given enough familiarity with it.
On the other hand, NetLogo has more examples and overall I've liked it more for its simplicity, but it seems to be more focused on simulating the propagation of diseases or similar models. I've found a programming book teaching Logo, so I figure it'd be easier to get started with it too.
Currently, I am thinking of simulating botnets and IDSes as multiagents. The problem, however, is that I would have to abstract the network and transport layers to an extent to be able to do it, as well as generate traffic between the nodes. Repast is apparently more fitting for this, but given its complexity and lack of documentation I'm thinking of using NetLogo. While there are some examples of NetLogo with traditional applications (ex: Tetris or Pac-Man), I'm not sure about how appropriate it'd be for that.
I have a webpage with a couple dozen NetLogo multiagent simulations. I use NetLogo for teaching and I have found that, once you get past the learning curve, you can develop simulations amazingly fast. Stuff that would take you 80 man-hours in other so-called agent environments (Jade, Repast, which are really mostly just programming libraries) can be done in 2 hours.
On the other hand, NetLogo is not really good for simulations that require an immense amount of detail, like, say, simulating a network all the way from TCP/IP to HTTP. That would just require large amounts of code, regardless of programming language, and NetLogo currently sucks if your program ends up being more than 10 pages long. Having said that, most people would be amazed at what you can get done in 10 pages of NetLogo code.
Short answer: it depends on the programming paradigm or language you want to use, and the design you want for your agents:
If you want a low-entry-high-ceiling language allowing quick prototyping but sophisticated simulations, and are willing to learn a new paradigm (avoiding loops) use NetLogo. Good documentation.
If you want to make a real application to run on highly parallelized clusters, or just want to use Java or Groovy, or need a specific Java library for your purpose, use Repast or, better, Repast for High Performance Computing (but avoid ReLogo, which is very slow). Mild documentation.
If you want to model cognitive agents (instead of reactive ones) with FIPA communications, use Jason or, better, JaCaMo, which supports AgentSpeak + Java (so you can also use your favourite Java libraries) and requires no Groovy. Bad documentation (many features and commands are not described in detail, and the examples are overly complex and uncommented).
Long answer:
Disclaimer: I am more experienced with NetLogo but I also used Repast and a few others like Jason.
Basically, the difference between NetLogo and Repast is that with NetLogo you get a simpler framework but need to learn how to program in a turtle-and-patch-oriented paradigm, while with Repast you have to learn that plus the mechanisms behind Java and Groovy, but you eventually get more flexibility. Speed isn't really a criterion here (see below).
To be clearer, you can program efficiently in NetLogo if you make maximum use of the native functions of turtles and patches. For example, if you want to implement A*, instead of implementing a list of nodes, you should use the patches directly and filter them with constructs like these:
ask patches with [criteria1 = value and criteria2 = value2] [do-some-stuff]
ask patches with-min [criteria] [do]
let var [somevalue] of min-one-of patches [criteria]
Also if you can't find a way to efficiently do what you want, be sure to check if maybe an extension exists (check also here under Libraries and Tools) for your purpose, like the now native matrix extension which allowed me to make an efficient neural network in NetLogo.
On the other hand, Repast is potentially more flexible than NetLogo (since you have access to the whole range of Java libraries), but a bit more complex since you have to know how to handle Groovy.
If you are solely interested in speed, do NOT use ReLogo (NetLogo-like syntax for Repast), which has been shown to be a whole lot slower than NetLogo (see the 2012 paper below). In any case, your best bet would be either to try an implementation in NetLogo using the tricks above or, if you want to use your application for real later, to use the distribution called Repast for High Performance Computing, which removes most of the overhead that comes with turtle and patch objects and can thus be used for real applications. A similar extension exists for NetLogo to compute on clusters with parallelization, but it's not an official distribution.
If you want more information about the various platforms, here is a nice review from 2006:
Railsback, S. F., Lytinen, S. L., & Jackson, S. K. (2006). Agent-based Simulation Platforms: Review and Development Recommendations. SIMULATION, 82(9), 609-623.
And an updated version of this paper in 2012 dealing with NetLogo vs ReLogo:
Lytinen, S. L., & Railsback, S. F. (2012, April). The evolution of agent-based simulation platforms: A review of netlogo 5.0 and relogo. In Proceedings of the Fourth International Symposium on Agent-Based Modeling and Simulation.
/EDIT: I cited Jason but didn't give any more details. If you want to model cognitive agents (instead of reactive agents), you can do that in NetLogo using the unofficial BDI extension, which works well but is a bit limited (although easily extensible, since it's pure NetLogo). Your best bet, however, is to use a framework specifically designed to model cognitive agents with full support for AgentSpeak.
Jason is very nice since you have access to the full AgentSpeak language plus Java to implement the technical side. In fact, you can do whole projects using only AgentSpeak (which I did), but you can also make more Java-oriented versions. It's up to you how you design your program; the result will be more or less the same. This offers you a lot of flexibility in your design workflow.
Tip: search for "Jason internal actions" in the documentation to get a good description of the available AgentSpeak commands.
Also, if you are interested in Jason, you might be interested in JaCaMo (= Jason + Cartago + Moise), which is the result of a cooperation between the authors of three projects to make a full-fledged cognitive agent framework that can also model complex environments (with artifact theory) and multi-agent organisations (roles, groups, missions, etc.).
A last framework I know of is Mason, which supports 2D and 3D environments. I never had a chance to try it, so I don't know how it compares with the others, but you can try it out.
Here's a generic comparison.
http://www.duncanrobertson.com/research/AMLE.pdf
I had more or less the same problem a few months ago when I had to choose a framework for my simulation. I looked at Repast, NetLogo, Swarm and Jade.
NetLogo was nice and I tried to write some simple test applications, but since I wanted to use Java as my programming language, NetLogo wasn't the best candidate. Repast has pretty much everything you need to write larger simulations and there are many projects (especially in the social sciences) where Repast is used. My problems with Repast were: bad API documentation, parameters passed to methods or constructors that are never used and don't make any sense at all (have a look at the source code), and a lot of boilerplate code.
I'm using Jade (http://jade.tilab.com/) now and I'm really happy with it. The community is good and their mailing list is VERY active. Okay, Jade is just a library and a framework for agent-based modelling. You don't get anything like the visual editors in Repast, and you'll have to write your own tool for visualising the results.
Cheers
You could simulate the traffic using an agent type called "packet" that is spawned and sent from an agent called "bot" to another agent called "bot" or "server". Instead of sending the packets to an IP address, you would be sending them to a pair of X and Y coordinates.
NetLogo has an example of how a virus spreads in a network; this might be a good starting point.
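
The packet idea above can be sketched in a few lines of plain Java, independent of any agent framework. Everything here (the class `PacketSimulation`, its nested types, and the one-tick delivery rule) is an illustrative assumption and not part of the NetLogo or Repast APIs.

```java
import java.util.ArrayList;
import java.util.List;

class PacketSimulation {

    // A "bot" or "server" agent, addressed by grid coordinates instead of an IP address.
    static final class Node {
        final String name;
        final int x;
        final int y;
        int received = 0; // packets delivered to this node so far

        Node(String name, int x, int y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    // A "packet" agent: spawned by a sender and addressed to a pair of coordinates.
    static final class Packet {
        final Node sender;
        final int targetX;
        final int targetY;

        Packet(Node sender, int targetX, int targetY) {
            this.sender = sender;
            this.targetX = targetX;
            this.targetY = targetY;
        }
    }

    private final List<Node> nodes = new ArrayList<>();
    private List<Packet> inFlight = new ArrayList<>();

    Node addNode(String name, int x, int y) {
        Node node = new Node(name, x, y);
        nodes.add(node);
        return node;
    }

    void send(Node sender, int targetX, int targetY) {
        inFlight.add(new Packet(sender, targetX, targetY));
    }

    // Advances the simulation by one tick: each packet in flight is delivered to the
    // node at its target coordinates, or dropped if no node is there.
    // Returns the number of packets delivered in this tick.
    int step() {
        List<Packet> toDeliver = inFlight;
        inFlight = new ArrayList<>();
        int delivered = 0;
        for (Packet packet : toDeliver) {
            for (Node node : nodes) {
                if (node.x == packet.targetX && node.y == packet.targetY) {
                    node.received++;
                    delivered++;
                    break;
                }
            }
        }
        return delivered;
    }
}
```

A botnet or IDS model would add behaviour on top of this, for example nodes that react to `received` counts, but the addressing-by-coordinates part stays the same.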
I have never tried NetLogo, but I have tried Repast-J and Simphony. Simphony seems good, but at the moment I am stuck at changing the edge type from a straight line to a curved one. There is not enough documentation and there are not enough examples available.
I once tried Mason, which is based on Java too. It is similar to Repast-J, yet it was faster. But recently there has not been much development in Mason.
I would like to try out Jade later.
If you can already code in Java, you can also look at the following paper for a comparison between RePast, Swarm, Quicksilver, and VSEit, four freely available programming libraries that support social-scientific agent-based computer simulation:
Tobias, Robert, and Carole Hofmann. "Evaluation of free Java-libraries for social-scientific agent based simulation." Journal of Artificial Societies and Social Simulation 7.1 (2004).
Repast is definitely more flexible than NetLogo, but the documentation for Repast Simphony is not very detailed.

Java: Text to Speech engines overview [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm searching for a Java Text to Speech (TTS) framework. During my investigations I've found several frameworks that are (partially) compatible with JSAPI 1.0, listed on the JSAPI Implementations page, as well as a pair of Java TTS frameworks which do not appear to follow the JSAPI spec (Mary, Say-It-Now). I've also noted that no reference implementation currently exists for JSAPI.
Brief tests I've done with FreeTTS (the first one listed on the JSAPI implementations page) show that it is far from able to read simple and obvious words (examples: ABC, blackboard). Other tests are currently in progress.
And here goes the question (6, actually):
Which of the Java-based TTS frameworks have you used?
Which ones, in your opinion, are capable of reading the largest vocabulary?
What about their voice quality?
What about their performance?
Which non-Java frameworks with Java bindings are there on the scene?
Which of them would you recommend?
Thank you in advance for your comments and suggestions.
I've actually had pretty good luck with FreeTTS.
Google Translate has a secret TTS API:
https://translate.google.com/translate_tts?ie=utf-8&tl=en&q=Hello%20World
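
A small sketch of how such a URL could be assembled from Java for arbitrary text. It only builds the string shown above; the helper name `TranslateTtsUrl.build` is made up, and since the endpoint is undocumented it may reject requests, require extra parameters, or change without notice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TranslateTtsUrl {

    // Builds a URL of the same shape as the one quoted above.
    // URLEncoder produces form encoding (space becomes '+'), so the '+' signs are
    // turned into %20 afterwards; a literal '+' in the text is already %2B by then.
    static String build(String languageCode, String text) {
        try {
            String encodedLanguage = URLEncoder.encode(languageCode, "UTF-8");
            String encodedText = URLEncoder.encode(text, "UTF-8").replace("+", "%20");
            return "https://translate.google.com/translate_tts?ie=utf-8&tl="
                    + encodedLanguage + "&q=" + encodedText;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("en", "Hello World"));
    }
}
```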
Actually, there is not a big choice:
Festival, the oldest. Written in C++ but has bindings to Java.
eSpeak, quick and simple, used by Google Translate.
MBROLA
Pure Java:
FreeTTS, whose code was ported from Festival; it was then open-sourced and development stopped.
MaryTTS - more powerful and looks production ready.
There are also proprietary programs such as:
Acapela
Nuance Vocalizer
If your software is Windows only, you can use the Microsoft Speech API.
I've used Mary before and I was very impressed with the quality of the voices. Unfortunately, I haven't used any of the other ones.
I've used AT&T Natural Voices which provides JSAPI and MS SAPI hooks. It provides excellent quality voices, a good "general" speech dictionary, many controls over pronunciation, and multiple languages. It's a little pricey, but works very well.
I used it to read important sensor telemetry to drivers in a mobile sensor application. We had no complaints about the voice quality. It had about 75% out-of-the-box accuracy with scientific terms and a much higher (maybe 90%+) with normal dialogue. We got it up to about 99+% accuracy by using markups (most errors were on scientific terms with unusual phoneme combinations).
It was a bit hard on the processor (we were running on a Pentium-III equivalent machine and it was pushing 50%-75% peak CPU). This uses a native speech engine (Windows, Linux, and Mac compatible) with a Java interface.
There's a huge variety of voices and languages...
I used FreeTTS but had a major problem getting the MBROLA voices to run on my MacBook Pro. I did get the MBROLA voices to run on Windows (painfully) and Linux. I've had no luck loading any other voice packages on FreeTTS, which is a shame because the supplied voices are horrible IMO. Outside of that I had a little success with Cloudgarden as well, but that only runs on Windows AFAIK. I'd be interested to hear others' successes and failures with voice engines, as this type of work is particularly challenging. I'm also toying a bit with Sphinx4. I just pulled down JVXML (which appears to be based on Sphinx4) last night but could not get it to run for some strange reason.
I've contributed to Mary. I feel it has potential if someone smarter than me separated the HMM voices out of the core (those voices don't need large data sets and sound OK). I'm also trying to add an event system to FreeTTS that sends an event whenever it says a word. I've had success, but it is broken on Linux now (probably because of a timer bug).
Thanks a lot everyone, the trick is in the FreeTTS source. Briefly: when run as java -jar freetts.jar some-more-args-here, it reads fewer words correctly than when run via bin/Server.jar and bin/Client.jar.
I was fairly comfortable with MaryTTS. It supports multiple languages and its voice is clear and easy to understand.
To convert speech to text, the better option is sphinx4-5prealpha.
I give it a thumbs up because its recognizer and grammar are adjustable, flexible, and modifiable.
