Implementing Public Suffix extraction using Java


I need to extract the top domain of a URL, and I found http://publicsuffix.org/index.html.
The Java implementation is at http://guava-libraries.googlecode.com, but I could not find
any example of extracting the domain name.
For example:
example.google.com
returns google.com
and bing.bing.bing.com
returns bing.com
Can anyone show me how to implement this using this library, with an example?

It looks to me like InternetDomainName.topPrivateDomain() does exactly what you want. Guava maintains a list of public suffixes (based on Mozilla's list at publicsuffix.org) that it uses to determine what the public suffix part of the host is... the top private domain is the public suffix plus its first child.
Here's a quick example:
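import com.google.common.collect.ImmutableList;
import com.google.common.net.InternetDomainName;
import java.net.URI;
import java.net.URISyntaxException;
public class Test {
    public static void main(String[] args) throws URISyntaxException {
        ImmutableList<String> urls = ImmutableList.of(
                "http://example.google.com", "http://google.com",
                "http://bing.bing.bing.com", "http://www.amazon.co.jp/");
        for (String url : urls) {
            System.out.println(url + " -> " + getTopPrivateDomain(url));
        }
    }
    private static String getTopPrivateDomain(String url) throws URISyntaxException {
        // Extract the host part of the URL, then let Guava find the public suffix
        String host = new URI(url).getHost();
        InternetDomainName domainName = InternetDomainName.from(host);
        // toString() returns the domain name; older Guava releases exposed this as name()
        return domainName.topPrivateDomain().toString();
    }
}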
Running this code prints:
http://example.google.com -> google.com
http://google.com -> google.com
http://bing.bing.bing.com -> bing.com
http://www.amazon.co.jp/ -> amazon.co.jp
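To make the "public suffix plus its first child" rule concrete, here is a minimal sketch in plain Java that does not use Guava. The class name TopPrivateDomainSketch and the five-entry suffix set are assumptions made up for illustration; the real list is far larger, and this is not how Guava is implemented internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TopPrivateDomainSketch {
    // Tiny stand-in for the real Public Suffix List (assumption: illustration only).
    private static final Set<String> PUBLIC_SUFFIXES =
            new HashSet<>(Arrays.asList("com", "jp", "co.jp", "uk", "co.uk"));

    /** Returns the longest matching public suffix plus one more label, or null if there is none. */
    static String topPrivateDomain(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // Candidates are tried from longest to shortest, so the first hit is the longest suffix.
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (PUBLIC_SUFFIXES.contains(candidate)) {
                if (i == 0) {
                    return null; // the host is itself a public suffix
                }
                return labels[i - 1] + "." + candidate;
            }
        }
        return null; // no known suffix matched
    }

    public static void main(String[] args) {
        System.out.println(topPrivateDomain("example.google.com"));
        System.out.println(topPrivateDomain("www.amazon.co.jp"));
    }
}
```

Trying the longest candidate first is what keeps www.amazon.co.jp from being cut down to co.jp.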

I recently implemented a Public Suffix List API:
PublicSuffixList suffixList = new PublicSuffixListFactory().build();
assertEquals(
"google.com", suffixList.getRegistrableDomain("example.google.com"));
assertEquals(
"bing.com", suffixList.getRegistrableDomain("bing.bing.bing.com"));
assertEquals(
"amazon.co.jp", suffixList.getRegistrableDomain("www.amazon.co.jp"));

EDIT: Sorry, I was a little too fast; I didn't think of co.jp, co.uk, and so on. You will need to get a list of possible TLDs from somewhere. You could also take a look at http://commons.apache.org/validator/ to validate a TLD.
I think something like this should work, but maybe a standard Java function already exists.
String url = "http://www.foobar.com/someFolder/index.html";
// Strip the scheme and the path so that only the host remains
if (url.contains("://")) {
    url = url.split("://")[1];
}
if (url.contains("/")) {
    url = url.split("/")[0];
}
// You need to get your TLDs from somewhere, ordered longest first
// so that "co.uk" is tried before "uk"
List<String> magicListofTLD = getTLDsFromSomewhere();
int positionOfTLD = -1;
String usedTLD = null;
for (String tld : magicListofTLD) {
    // Match the TLD only at the end of the host, not anywhere inside it
    if (url.endsWith("." + tld)) {
        positionOfTLD = url.length() - tld.length() - 1;
        usedTLD = tld;
        break;
    }
}
if (positionOfTLD > 0) {
    url = url.substring(0, positionOfTLD);
} else {
    return;
}
String[] strings = url.split("\\.");
String foo = strings[strings.length - 1] + "." + usedTLD;
System.out.println(foo);

Related

Java - Replace host in url?

In Java, I'd like to replace the host part of a URL with a new host, where both the host and the URL are supplied as strings.
This should take into account the fact that the host could have a port in it, as defined in the RFC.
So for example, given the following inputs
url: http://localhost/me/out?it=5
host: myserver:20000
I should get the following output from a function that did this correctly
http://myserver:20000/me/out?it=5
Does anyone know of any libraries or routines that do Host replacement in an url correctly?
EDIT: For my use case, I want my host replacement to match what a java servlet would respond with. I tried this out by running a local java web server, and then tested it using curl -H 'Host:superduper.com:80' 'http://localhost:8000/testurl' and having that endpoint simply return the url from request.getRequestURL().toString(), where request is a HttpServletRequest. It returned http://superduper.com/testurl, so it removed the default port for http, so that's what I'm striving for as well.

The Spring Framework provides the UriComponentsBuilder. You can use it like this:
import org.springframework.web.util.UriComponentsBuilder;
String initialUri = "http://localhost/me/out?it=5";
UriComponentsBuilder builder = UriComponentsBuilder.fromHttpUrl(initialUri);
String modifiedUri = builder.host("myserver").port("20000").toUriString();
System.out.println(modifiedUri);
// ==> http://myserver:20000/me/out?it=5
Here you need to provide the hostname and the port in separate calls to get the right encoding.
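Since the question supplies the new host as a single string such as myserver:20000, it has to be split before host and port can be set separately. A minimal sketch using only java.net.URI follows; the class and method names are hypothetical, and the "http://" prefix is just a placeholder scheme so that the string parses as an authority.

```java
import java.net.URI;

class AuthoritySplitSketch {
    // Prefix a placeholder scheme so java.net.URI treats the input as host[:port].
    private static URI parse(String authority) {
        return URI.create("http://" + authority);
    }

    /** Returns the host part of "host" or "host:port". */
    static String hostOf(String authority) {
        return parse(authority).getHost();
    }

    /** Returns the port, or -1 when the authority does not specify one. */
    static int portOf(String authority) {
        return parse(authority).getPort();
    }

    public static void main(String[] args) {
        System.out.println(hostOf("myserver:20000") + " / " + portOf("myserver:20000"));
        System.out.println(hostOf("myserver") + " / " + portOf("myserver"));
    }
}
```

A port of -1 means "no port given", which is the same convention java.net.URI and java.net.URL use elsewhere on this page.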

You were right to use java.net.URI. The host and port (and user/password, if they exist) are collectively known as the authority component of the URI:
public static String replaceHostInUrl(String originalURL,
String newAuthority)
throws URISyntaxException {
URI uri = new URI(originalURL);
uri = new URI(uri.getScheme().toLowerCase(Locale.US), newAuthority,
uri.getPath(), uri.getQuery(), uri.getFragment());
return uri.toString();
}
(A URI’s scheme is required to be lowercase, so while the above code can be said not to perfectly preserve all of the original URL’s non-authority parts, an uppercase scheme was never actually legal in the first place. And, of course, it won’t affect the functionality of the URL connections.)
Note that some of your tests are in error. For instance:
assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com:4300/me/out?it=5","super:443"));
assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://www.test.com:4300/me/out?it=5","super:80"));
Although https://super/me/out?it=5 is functionally identical to https://super:443/me/out?it=5 (since the default port for https is 443), if you specify an explicit port in a URI, then the URI has a port specified in its authority and that’s how it should stay.
Update:
If you want an explicit but unnecessary port number to be stripped, you can use URL.getDefaultPort() to check for it:
public static String replaceHostInUrl(String originalURL,
String newAuthority)
throws URISyntaxException,
MalformedURLException {
URI uri = new URI(originalURL);
uri = new URI(uri.getScheme().toLowerCase(Locale.US), newAuthority,
uri.getPath(), uri.getQuery(), uri.getFragment());
int port = uri.getPort();
if (port > 0 && port == uri.toURL().getDefaultPort()) {
uri = new URI(uri.getScheme(), uri.getUserInfo(),
uri.getHost(), -1, uri.getPath(),
uri.getQuery(), uri.getFragment());
}
return uri.toString();
}

I quickly tried java.net.URI, javax.ws.rs.core.UriBuilder, and org.apache.http.client.utils.URIBuilder. None of them seemed to handle a Host header that may include a port, so as far as I could see each needed extra logic to keep the port from being "doubled up" in some cases and not replaced correctly in others.
Since java.net.URL doesn't require any extra libraries, I used it. I know that URL.equals can be a problem because it may perform DNS lookups, but I don't call it, and this approach covers my use cases, as shown by the pseudo unit test.
I put together the following, which you can also try online at repl.it:
import java.net.URL;
import java.net.MalformedURLException;
class Main
{
public static void main(String[] args)
{
testReplaceHostInUrl();
}
public static void testReplaceHostInUrl()
{
assertEquals("http://myserver:20000/me/out?it=5", replaceHostInUrl("http://localhost/me/out?it=5","myserver:20000"));
assertEquals("http://myserver:20000/me/out?it=5", replaceHostInUrl("http://localhost:19000/me/out?it=5","myserver:20000"));
assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://localhost:19000/me/out?it=5","super"));
assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://www.test.com/me/out?it=5","super"));
assertEquals("https://myserver:20000/me/out?it=5", replaceHostInUrl("https://localhost/me/out?it=5","myserver:20000"));
assertEquals("https://myserver:20000/me/out?it=5", replaceHostInUrl("https://localhost:19000/me/out?it=5","myserver:20000"));
assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com/me/out?it=5","super"));
assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com:4300/me/out?it=5","super"));
assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com:4300/me/out?it=5","super:443"));
assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://www.test.com:4300/me/out?it=5","super:80"));
assertEquals("http://super:8080/me/out?it=5", replaceHostInUrl("http://www.test.com:80/me/out?it=5","super:8080"));
assertEquals("http://super/me/out?it=5&test=5", replaceHostInUrl("http://www.test.com:80/me/out?it=5&test=5","super:80"));
assertEquals("https://super:80/me/out?it=5&test=5", replaceHostInUrl("https://www.test.com:80/me/out?it=5&test=5","super:80"));
assertEquals("https://super/me/out?it=5&test=5", replaceHostInUrl("https://www.test.com:80/me/out?it=5&test=5","super:443"));
assertEquals("http://super:443/me/out?it=5&test=5", replaceHostInUrl("http://www.test.com:443/me/out?it=5&test=5","super:443"));
assertEquals("http://super:443/me/out?it=5&test=5", replaceHostInUrl("HTTP://www.test.com:443/me/out?it=5&test=5","super:443"));
assertEquals("http://SUPERDUPER:443/ME/OUT?IT=5&TEST=5", replaceHostInUrl("HTTP://WWW.TEST.COM:443/ME/OUT?IT=5&TEST=5","SUPERDUPER:443"));
assertEquals("https://SUPERDUPER:23/ME/OUT?IT=5&TEST=5", replaceHostInUrl("HTTPS://WWW.TEST.COM:22/ME/OUT?IT=5&TEST=5","SUPERDUPER:23"));
assertEquals(null, replaceHostInUrl(null, null));
}
public static String replaceHostInUrl(String url, String newHost)
{
if (url == null || newHost == null)
{
return url;
}
try
{
URL originalURL = new URL(url);
boolean hostHasPort = newHost.indexOf(":") != -1;
int newPort = originalURL.getPort();
if (hostHasPort)
{
URL hostURL = new URL("http://" + newHost);
newHost = hostURL.getHost();
newPort = hostURL.getPort();
}
else
{
newPort = -1;
}
// Use implicit port if it's a default port
boolean isHttps = originalURL.getProtocol().equals("https");
boolean useDefaultPort = (newPort == 443 && isHttps) || (newPort == 80 && !isHttps);
newPort = useDefaultPort ? -1 : newPort;
URL newURL = new URL(originalURL.getProtocol(), newHost, newPort, originalURL.getFile());
String result = newURL.toString();
return result;
}
catch (MalformedURLException e)
{
// Keep the original exception as the cause so the stack trace is not lost
throw new RuntimeException("Couldn't replace host in url, originalUrl=" + url + ", newHost=" + newHost, e);
}
}
public static void assertEquals(String expected, String actual)
{
if (expected == null && actual == null)
{
System.out.println("TEST PASSED, expected:" + expected + ", actual:" + actual);
return;
}
if (expected == null || ! expected.equals(actual))
throw new RuntimeException("Not equal! expected:" + expected + ", actual:" + actual);
System.out.println("TEST PASSED, expected:" + expected + ", actual:" + actual);
}
}

I realize this is a pretty old question, but I'm posting a simpler solution in case someone else needs it.
String newUrl = new URIBuilder(URI.create(originalURL)).setHost(newHost).build().toString();

I've added a method to do this in the RawHTTP library, so you can simply do this:
URI uri = RawHttp.replaceHost(oldUri, "new-host");
Added in this commit: https://github.com/renatoathaydes/rawhttp/commit/cbe439f2511f7afcb89b5a0338ed9348517b9163#diff-ff0fec3bc023897ae857b07cc3522366
Feedback welcome; I will release it soon.

Or using some regex magic:
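public static String replaceHostInUrl(String url, String newHost) {
    if (url == null || newHost == null) {
        return null;
    }
    // Only handles hosts of the form [www.]name[.com][:port]; dots are escaped so they match literally
    String s = url.replaceFirst("(?i)(?<=(https?)://)(www\\.)?\\w*(\\.com)?(:\\d*)?",
            Matcher.quoteReplacement(newHost));
    // Lowercase the scheme first so the default-port checks below also apply to HTTP:// and HTTPS://
    Matcher m = Pattern.compile("^(?i)https?").matcher(s);
    if (m.find()) {
        s = m.group().toLowerCase() + s.substring(m.end());
    }
    // Drop an explicit port when it is the default for the scheme
    if (s.startsWith("http://")) {
        s = s.replaceFirst(":80(?=/)", "");
    } else if (s.startsWith("https://")) {
        s = s.replaceFirst(":443(?=/)", "");
    }
    return s;
}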

Fast java routine to determine if request URI is within another URL

In a class (Java8), I have a String representing an HTTP URL, e.g. String str1="http://www.foo.com/bar", and another string containing a request URI e.g. str2="/bar/wonky/wonky.html".
What is the fastest way in terms of code execution to determine if str2 is within the context of str1 (e.g. the context is /bar) and then construct the complete url String result = "http://www.foo.com/bar/wonky/wonky.html"?

I don't know of a faster way than simply using String.indexOf(). Here is an approach that I think covers the example you gave:
public static boolean overlap(String a, String b_context) {
    // Assume URL a starts with http:// or https://; the next / is the start of a_context
    int root_index = a.indexOf("/", 8);
    if (root_index < 0) {
        // a has no path, so every request URI is within its root context
        return true;
    }
    String a_context = a.substring(root_index);
    return b_context.startsWith(a_context);
}
Here is a function that uses the same logic to combine the two URLs if they overlap, or to throw an exception if they don't:
public static String combine(String a, String b_context) {
    // Assume URL a starts with http:// or https://; the next / is the start of a_context
    int root_index = a.indexOf("/", 8);
    if (root_index < 0) {
        // a has no path, so the request URI is appended to the host as-is
        return a + b_context;
    }
    String a_context = a.substring(root_index);
    String a_host = a.substring(0, root_index);
    if (b_context.startsWith(a_context)) {
        return a_host + b_context;
    } else {
        throw new RuntimeException("urls do not overlap");
    }
}
And here is an example of using them
public static void main(String ... args) {
System.out.println(combine("http://google.com/search", "/search?query=Java+String+Combine"));
System.out.println(combine("http://google.com/search", "/mail?inbox=Larry+Page"));
}
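One caveat with a plain startsWith check is that /barista would count as being inside the context /bar. If that matters, a variant that parses the base URL with java.net.URI and only accepts a match on a path-segment boundary could look like the sketch below. The class and method names are hypothetical, it assumes the base is an absolute http or https URL, and it is probably somewhat slower than the indexOf version above.

```java
import java.net.URI;

class ContextJoinSketch {
    /** True when requestUri equals the base URL's path or lies below it on a segment boundary. */
    static boolean isWithin(String baseUrl, String requestUri) {
        String context = URI.create(baseUrl).getPath();
        if (context.isEmpty() || context.equals("/")) {
            return requestUri.startsWith("/"); // root context contains every absolute path
        }
        if (context.endsWith("/")) {
            context = context.substring(0, context.length() - 1);
        }
        if (!requestUri.startsWith(context)) {
            return false;
        }
        if (requestUri.length() == context.length()) {
            return true; // exact match
        }
        // The character after the context must end the segment: "/bar" must not match "/barista".
        char next = requestUri.charAt(context.length());
        return next == '/' || next == '?' || next == '#';
    }

    /** Builds scheme://authority + requestUri, or throws if the request is outside the context. */
    static String join(String baseUrl, String requestUri) {
        if (!isWithin(baseUrl, requestUri)) {
            throw new IllegalArgumentException("request URI is outside the context of " + baseUrl);
        }
        URI base = URI.create(baseUrl);
        return base.getScheme() + "://" + base.getRawAuthority() + requestUri;
    }

    public static void main(String[] args) {
        System.out.println(join("http://www.foo.com/bar", "/bar/wonky/wonky.html"));
        System.out.println(isWithin("http://www.foo.com/bar", "/barista/menu.html"));
    }
}
```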

How can I remove the subdomain part of a URL

I am trying to remove the subdomain and keep only the domain name followed by the extension.
It is difficult to find the subdomain because I do not know how many dots to expect in a URL. Some URLs end in .com and some in .co.uk, for example.
How can I remove the subdomain safely, so that foo.bar.com becomes bar.com and foo.bar.co.uk becomes bar.co.uk?
if(!rawUrl.startsWith("http://")&&!rawUrl.startsWith("https://")){
rawUrl = "http://"+rawUrl;
}
String url = new java.net.URL(rawUrl).getHost();
String urlWithoutSub = ???

What you need is a Public Suffix List, such as the one available at https://publicsuffix.org/. Basically, there is no algorithm that can tell you which suffixes are public, so you need a list. And you'd better use one that is public and well maintained.
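For illustration, here is a rough sketch of how rules in the list's text format could be applied. In that format, lines starting with // are comments, a rule like *.ck is a wildcard, and a rule like !www.ck is an exception to a wildcard. The class name and the sample rules are assumptions made up for this example, and the matching is simplified: it takes the longest matching rule and does not handle every corner case of the official algorithm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PublicSuffixRulesSketch {
    private final Set<String> exact = new HashSet<>();
    private final Set<String> wildcard = new HashSet<>();  // "*.ck" is stored as "ck"
    private final Set<String> exception = new HashSet<>(); // "!www.ck" is stored as "www.ck"

    /** Parses list text: one rule per line, "//" comments and blank lines are skipped. */
    PublicSuffixRulesSketch(String listText) {
        for (String raw : listText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue;
            }
            String rule = line.split("\\s+")[0].toLowerCase(Locale.ROOT);
            if (rule.startsWith("!")) {
                exception.add(rule.substring(1));
            } else if (rule.startsWith("*.")) {
                wildcard.add(rule.substring(2));
            } else {
                exact.add(rule);
            }
        }
    }

    /** Returns the public suffix plus one label, or null if the host is itself a public suffix. */
    String registrableDomain(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        int suffixStart = n - 1; // default rule "*": the last label alone is the suffix
        for (int i = 0; i < n; i++) { // longest candidate first
            String candidate = join(labels, i);
            if (exception.contains(candidate)) {
                suffixStart = i + 1; // an exception rule drops its leftmost label
                break;
            }
            if (exact.contains(candidate)
                    || (i + 1 < n && wildcard.contains(join(labels, i + 1)))) {
                suffixStart = i;
                break;
            }
        }
        return suffixStart == 0 ? null : join(labels, suffixStart - 1);
    }

    private static String join(String[] labels, int from) {
        return String.join(".", Arrays.copyOfRange(labels, from, labels.length));
    }

    public static void main(String[] args) {
        String sample = "// sample rules\ncom\nuk\nco.uk\n*.ck\n!www.ck\n";
        PublicSuffixRulesSketch rules = new PublicSuffixRulesSketch(sample);
        System.out.println(rules.registrableDomain("foo.bar.com"));
        System.out.println(rules.registrableDomain("foo.bar.co.uk"));
    }
}
```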

I just stumbled upon this question and decided to write the following function.
Example input -> output:
http://example.com -> http://example.com
http://www.example.com -> http://example.com
ftp://www.a.example.com -> ftp://example.com
SFTP://www.a.example.com -> SFTP://example.com
http://www.a.b.example.com -> http://example.com
http://www.a.c.d.example.com -> http://example.com
http://example.com/ -> http://example.com/
https://example.com/aaa -> https://example.com/aaa
http://www.example.com/aa/bb../d -> http://example.com/aa/bb../d
FILE://www.a.example.com/ddd/dd/../ff -> FILE://example.com/ddd/dd/../ff
HTTPS://www.a.b.example.com/index.html?param=value -> HTTPS://example.com/index.html?param=value
http://www.a.c.d.example.com/#yeah../..! -> http://example.com/#yeah../..!
The same goes for second-level domains:
http://some.thing.co.uk/?ke -> http://thing.co.uk/?ke
something.co.uk/?ke -> something.co.uk/?ke
www.something.co.uk/?ke -> something.co.uk/?ke
www.something.co.uk -> something.co.uk
https://www.something.co.uk -> https://something.co.uk
Code:
public static String removeSubdomains(String url, ArrayList<String> secondLevelDomains) {
// We need our URL in three parts, protocol - domain - path
String protocol= getProtocol(url);
url = url.substring(protocol.length());
String urlDomain=url;
String path="";
if(urlDomain.contains("/")) {
int slashPos = urlDomain.indexOf("/");
path=urlDomain.substring(slashPos);
urlDomain=urlDomain.substring(0, slashPos);
}
// Done, now let us count the dots . .
int dotCount = Strng.countOccurrences(urlDomain, ".");
// example.com <-- nothing to cut
if(dotCount==1){
return protocol+url;
}
int dotOffset=2; // subdomain.example.com <-- default case, we want to remove everything before the 2nd last dot
// however, somebody had the glorious idea, to have second level domains, such as co.uk
for (String secondLevelDomain : secondLevelDomains) {
// we need to check if our domain ends with a second level domain
// example: something.co.uk we don't want to cut away "something", since it isn't a subdomain, but the actual domain
if(urlDomain.endsWith("." + secondLevelDomain)) { // the leading dot keeps "taco.uk" from matching "co.uk"
// we increase the dot offset with the amount of dots in the second level domain (co.uk = +1)
dotOffset += Strng.countOccurrences(secondLevelDomain, ".");
break;
}
}
// if we have something.co.uk, we have a offset of 3, but only 2 dots, hence nothing to remove
if(dotOffset>dotCount) {
return protocol+urlDomain+path;
}
// if we have sub.something.co.uk, we have a offset of 3 and 3 dots, so we remove "sub"
int pos = Strng.nthLastIndexOf(dotOffset, ".", urlDomain)+1;
urlDomain = urlDomain.substring(pos);
return protocol+urlDomain+path;
}
public static String getProtocol(String url) {
String containsProtocolPattern = "^([a-zA-Z]*:\\/\\/)|^(\\/\\/)";
Pattern pattern = Pattern.compile(containsProtocolPattern);
Matcher m = pattern.matcher(url);
if (m.find()) {
return m.group();
}
return "";
}
public static ArrayList<String> getPublicSuffixList(boolean loadFromPublicSuffixOrg) {
    ArrayList<String> secondLevelDomains = new ArrayList<String>();
    if (!loadFromPublicSuffixOrg) {
        // Built-in fallback list of common second-level domains
        secondLevelDomains.addAll(Arrays.asList(
                "co.uk", "co.at", "or.at", "ac.at", "gv.at", "ac.uk", "gov.uk", "ltd.uk",
                "fed.us", "isa.us", "nsn.us", "dni.us",
                "ac.ru", "com.ru", "edu.ru", "gov.ru", "int.ru", "mil.ru", "net.ru", "org.ru", "pp.ru",
                "com.au", "net.au", "org.au", "edu.au", "gov.au"));
        return secondLevelDomains;
    }
    try {
        // URLHelpers.getHTTP is a helper that is not shown here; it returns the response body as a String
        String a = URLHelpers.getHTTP("https://publicsuffix.org/list/public_suffix_list.dat", false, true);
        Scanner scanner = new Scanner(a);
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            // Skip comments and wildcard rules; keep only multi-label suffixes
            if (!line.startsWith("//") && !line.startsWith("*") && line.contains(".")) {
                secondLevelDomains.add(line);
            }
        }
        scanner.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return secondLevelDomains;
}
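The code above calls Strng.countOccurrences and Strng.nthLastIndexOf, which are not shown in the answer. A possible stand-in, guessed from how they are called (the class name StrngSketch and the exact semantics are assumptions, not the original helpers):

```java
class StrngSketch {
    /** Counts non-overlapping occurrences of needle in haystack. */
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    /** Index of the n-th occurrence of needle counted from the end (n = 1 is the last), or -1. */
    static int nthLastIndexOf(int n, String needle, String haystack) {
        if (n < 1) {
            return -1;
        }
        int pos = haystack.length();
        for (int i = 0; i < n; i++) {
            // Search backwards, starting just before the previous hit.
            pos = haystack.lastIndexOf(needle, pos - 1);
            if (pos == -1) {
                return -1;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        // Second-to-last dot in "sub.example.com" sits right before "example.com".
        int pos = nthLastIndexOf(2, ".", "sub.example.com") + 1;
        System.out.println("sub.example.com".substring(pos));
    }
}
```

With these semantics, the `+1` in removeSubdomains skips the dot itself, so the substring starts at the first character of the remaining domain.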

Fetch all the hyperlinks from a webpage and recursively doing that in java

1. Fetch all contents from a webpage.
2. Fetch the hyperlinks from the webpage.
3. Repeat steps 1 and 2 for each fetched hyperlink.
4. Repeat the process until 200 hyperlinks are registered or there are no more hyperlinks to fetch.
I wrote a sample program, but due to my poor understanding of recursion, it became an infinite loop.
Please suggest how to fix the code so that it meets these requirements.
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Content
{
private static final String HTML_A_HREF_TAG_PATTERN =
"\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))";
Pattern pattern;
public Content ()
{
pattern = Pattern.compile(HTML_A_HREF_TAG_PATTERN);
}
private void fetchContentFromURL(String strLink) {
String content = null;
URLConnection connection = null;
try {
connection = new URL(strLink).openConnection();
Scanner scanner = new Scanner(connection.getInputStream());
scanner.useDelimiter("\\Z");
content = scanner.next();
}catch ( Exception ex ) {
ex.printStackTrace();
return;
}
fetchURL(content);
}
private void fetchURL ( String content )
{
Matcher matcher = pattern.matcher( content );
while(matcher.find()) {
String group = matcher.group();
if(group.toLowerCase().contains( "http" ) || group.toLowerCase().contains( "https" )) {
group = group.substring( group.indexOf( "=" )+1 );
group = group.replaceAll( "'", "" );
group = group.replaceAll( "\"", "" );
System.out.println("lINK "+group);
fetchContentFromURL(group);
}
}
System.out.println("DONE");
}
/**
* @param args
*/
public static void main ( String[] args )
{
new Content().fetchContentFromURL( "http://www.google.co.in" );
}
}
I am open to any other solution as well, but I want to stick to the core Java API only, with no third-party libraries.

One possible option here is to remember all visited links to avoid cyclic paths. Here's how to achieve it with an additional Set that stores the already visited links:
public class Content {
private static final String HTML_A_HREF_TAG_PATTERN =
"\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))";
private Pattern pattern;
private Set<String> visitedUrls = new HashSet<String>();
public Content() {
pattern = Pattern.compile(HTML_A_HREF_TAG_PATTERN);
}
private void fetchContentFromURL(String strLink) {
String content = null;
URLConnection connection = null;
try {
connection = new URL(strLink).openConnection();
Scanner scanner = new Scanner(connection.getInputStream());
scanner.useDelimiter("\\Z");
if (scanner.hasNext()) {
content = scanner.next();
visitedUrls.add(strLink);
fetchURL(content);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
private void fetchURL(String content) {
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
String group = matcher.group();
if (group.toLowerCase().contains("http") || group.toLowerCase().contains("https")) {
group = group.substring(group.indexOf("=") + 1);
group = group.replaceAll("'", "");
group = group.replaceAll("\"", "");
System.out.println("lINK " + group);
if (!visitedUrls.contains(group) && visitedUrls.size() < 200) {
fetchContentFromURL(group);
}
}
}
System.out.println("DONE");
}
/**
* @param args
*/
public static void main(String[] args) {
new Content().fetchContentFromURL("http://www.google.co.in");
}
}
I also fixed some other issues in the fetching logic; it now works as expected.
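As a side note, the link extraction itself can be made a little tidier by letting the regex capture the attribute value directly instead of stripping `href=` and quotes afterwards. A sketch using only java.util.regex follows; the class name and the pattern are my own variation, not the pattern from the code above, and regex-based HTML parsing remains fragile in general.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractorSketch {
    // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
    private static final Pattern HREF = Pattern.compile(
            "(?i)href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^'\">\\s]+))");

    /** Returns the href values that start with http:// or https://, in document order. */
    static List<String> absoluteLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String value = m.group(1) != null ? m.group(1)
                    : m.group(2) != null ? m.group(2) : m.group(3);
            String lower = value.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                links.add(value);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.example/x\">A</a> <a href=/relative>B</a>";
        System.out.println(absoluteLinks(html));
    }
}
```

Checking the prefix rather than using contains("http") also avoids following relative links that merely contain the text "http" somewhere in the path.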

Inside the fetchContentFromURL method, you should record which URL you are currently fetching, and skip it if it has already been fetched. Otherwise, two pages A and B that link to each other will cause your code to keep fetching forever.

In addition to JK1's answer, to achieve target 4 of your question you might want to keep the count of hyperlinks in an instance variable. Rough pseudocode might look like this (you can adjust the exact count; alternatively, you can use the size of the HashSet to find out how many hyperlinks your program has parsed so far):
if (!visitedUrls.contains(group) && noOfHyperlinksVisited++ < 200) {
fetchContentFromURL(group);
}
However, I was not sure whether you want a total of 200 hyperlinks or want to traverse to a depth of 200 links from the starting page. If it is the latter, you might wish to explore breadth-first search, which lets you know when you have reached your target depth.
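To illustrate the breadth-first idea without any network access, here is a sketch that walks an in-memory link graph. The Map stands in for "fetch a page and extract its links"; the class name, the method name, and the two limits are assumptions for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BfsCrawlSketch {
    /**
     * Visits pages breadth-first from start, following links at most maxDepth hops
     * away and stopping after maxPages pages. Returns the pages in visiting order.
     */
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int maxDepth, int maxPages) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty() && order.size() < maxPages) {
            String url = queue.remove();
            order.add(url);
            int d = depth.get(url);
            if (d == maxDepth) {
                continue; // target depth reached: do not follow links from this page
            }
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                // Set.add returns false for links seen before, which breaks cycles like A <-> B.
                if (visited.add(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", List.of("B", "C"));
        graph.put("B", List.of("A", "D"));
        System.out.println(crawl(graph, "A", 2, 200));
    }
}
```

Because the queue replaces recursion, the depth of each page is known when it is dequeued, and the stack cannot overflow on long link chains.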

Using jOrtho spell checker

How do I use the jOrtho spell checker? I have downloaded the latest dictionary (XML file) from Wiktionary. How do I compile it and use it in my program?

I found the solution; these are the steps to add spell-checking functionality.
First, download the jar and the pre-compiled dictionary from here: http://sourceforge.net/projects/jortho/files/
The following is the code snippet:
SpellChecker.setUserDictionaryProvider(new FileUserDictionary());
SpellChecker.registerDictionaries(this.getClass().getResource("/dictionary"), "en");
SpellChecker.register(messageWriter);
Here, messageWriter is a JEditorPane.
Refer to the documentation for an explanation. Put the dictionaries.cnf and dictionary_en.ortho files inside the src/dictionary folder.
You can also manipulate the pop-up menu options. Here is an example of what I have done:
SpellCheckerOptions sco=new SpellCheckerOptions();
sco.setCaseSensitive(true);
sco.setSuggestionsLimitMenu(10);
JPopupMenu popup = SpellChecker.createCheckerPopup(sco);
messageWriter.addMouseListener(new PopupListener(popup));
This restricts the suggestions in the menu to 10. See the docs.

First, you have to download the library. http://sourceforge.net/projects/jortho/files/JOrtho%20Library/0.5/ Their zip file should include one or more .jar files. You will need to add these into your classpath. The way you do this depends on how you do your development. If you're using Netbeans, it's different than the way you would do it in Eclipse.
If their zip file includes documentation for their API, you should be able to use that to add it to your Java program. If it does not, you might need to look for an alternative. It looks like the links on their site are dead, which is usually a bad sign.
There are alternatives. It didn't take me long to find this one http://jazzy.sourceforge.net/ for example. It looks like it's the one used by Lucene internally. It also has a better license than jortho does.
Good luck.

For use in an application without a GUI:
public class Checker {
private static Map<String, Method> methods;
public static void main(String[] args) throws NoSuchMethodException, ClassNotFoundException, InvocationTargetException, IllegalAccessException {
SpellChecker.registerDictionaries(Checker.class.getResource("/dictionary/"), "en");
methods = new HashMap<>();
setAccessibleMethod(LanguageBundle.class, "get", Locale.class);
setAccessibleMethod(LanguageBundle.class,
"existInDictionary",
String.class,
Checker.class.getClassLoader().loadClass("com.inet.jortho.Dictionary"),
com.inet.jortho.SpellCheckerOptions.class,
boolean.class
);
setAccessibleMethod(SpellChecker.class, "getCurrentDictionary");
while (SpellChecker.getCurrentLocale() == null) {
try {
Thread.sleep(10);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
Object dictionary = invokeMethod(SpellChecker.class, "getCurrentDictionary", null);
LanguageBundle bundle = (LanguageBundle) invokeMethod(LanguageBundle.class, "get", null, SpellChecker.getCurrentLocale());
Set<String> errors = new HashSet<>();
StringTokenizer st = new StringTokenizer("A sentence with a error in the Hitchhiker's Guide tot he Galaxy");
boolean newSentence = true;
while (st.hasMoreTokens()) {
String word = st.nextToken();
boolean b = true;
boolean nextNewSentence = false;
if (word.length() > 1) {
if ('.' == word.charAt(word.length() - 1)) {
nextNewSentence = true;
word = word.substring(0, word.length() - 1);
}
b = (Boolean) invokeMethod(LanguageBundle.class, "existInDictionary", bundle,
word,
dictionary,
SpellChecker.getOptions(),
newSentence);
}
if (!b)
errors.add(word);
newSentence = nextNewSentence;
}
System.out.println(StringUtils.join(errors, " , "));
}
private static void setAccessibleMethod(Class<?> cls, String name, Class<?>... parameterTypes) throws NoSuchMethodException {
Method method = cls.getDeclaredMethod(name, parameterTypes);
method.setAccessible(true);
methods.put(cls.getName() + "." + name, method);
}
private static Object invokeMethod(Class<?> cls, String name, Object obj, Object... args) throws InvocationTargetException, IllegalAccessException {
return methods.get(cls.getName() + "." + name).invoke(obj, args);
}
}
