I'm building a webapp using Spring MVC and am curious whether there is any clean way to make SEO-friendly URLs.
For example, instead of http://mysite.com/articles/articleId and the like, have:
http://mysite.com/articles/my-article-subject
If you're using the new Spring MVC annotations, you can use the @RequestMapping and @PathVariable annotations:
@RequestMapping("/articles/{subject}")
public ModelAndView findArticleBySubject(@PathVariable("subject") String subject)
{
    // strip out the '-' characters from the subject
    // then the usual logic returning a ModelAndView or similar
}
I think it is still necessary to strip out the - character.
This might be of interest to you:
http://tuckey.org/urlrewrite/
If you are familiar with mod_rewrite on Apache servers, this is a similar concept.
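For example, here is a rough sketch of a rule in WEB-INF/urlrewrite.xml; the target path and parameter name are assumptions for illustration, not something from the question:

<urlrewrite>
    <rule>
        <!-- Match the SEO-friendly form and forward it to the real handler -->
        <from>^/articles/([a-z0-9-]+)$</from>
        <to>/articles?subject=$1</to>
    </rule>
</urlrewrite>

The filter itself (org.tuckey.web.filters.urlrewrite.UrlRewriteFilter) is registered in web.xml in front of your servlets.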
http://mysite.com/articles/my-article-subject is a much stronger URL than http://mysite.com/articles/articleId - especially if the title and header tags match "my-article-subject" too and you have "my", "article" and "subject" in the content of the page.
For example, if you want the URL
http://mysite.com/blog/11/12/2009/my-hello-world-post/
then configure the servlet and servlet mapping in web.xml:
<servlet>
    <servlet-name>blog</servlet-name>
    <servlet-class>com.blog.Blog</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>blog</servlet-name>
    <url-pattern>/blog/*</url-pattern>
</servlet-mapping>
and in the servlet code
String url = request.getPathInfo();
StringTokenizer tokens = new StringTokenizer(url, "/");
while (tokens.hasMoreTokens()) {
    out.println(tokens.nextToken());
}
Use these tokens to fetch the data from the database and display it to the user.
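For instance, here is a minimal sketch of how those tokens might drive the lookup; ArticleDao, Article and the JSP path are illustrative names, not part of the answer above:

String path = request.getPathInfo();            // e.g. "/11/12/2009/my-hello-world-post/"
String[] parts = path.substring(1).split("/");  // ["11", "12", "2009", "my-hello-world-post"]
String day = parts[0], month = parts[1], year = parts[2], slug = parts[3];

Article article = articleDao.findBySlug(slug);  // hypothetical database lookup by slug
request.setAttribute("article", article);
request.getRequestDispatcher("/WEB-INF/jsp/article.jsp").forward(request, response);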
The standard Java web frameworks are not ready for this kind of URL.
AFAIK, Spring MVC does not support this kind of URL.
There are two frameworks I'm sure support this kind of URL: Mentawai and VRaptor.
If you're only looking for SEO optimization, you could design your URLs this way:
http://mysite.com/articles/my-article-subject/articleId
or
http://mysite.com/articles/articleId/my-article-subject
and just ignore the my-article-subject part when evaluating the URLs.
Amazon does something like that with their URLs:
http://www.amazon.com/Dark-Crystal-Jean-Pierre-Amiel/dp/B00000JPH6/ref=sr_1_1?ie=UTF8&s=dvd&qid=1240561659&sr=8-1
Here the text "Dark-Crystal-Jean-Pierre-Amiel" is irrelevant because the article is identified by the ID B00000JPH6.
Edit: In fact I just noticed that right here on SO this exact technique is used to generate SEO-friendly URLs...
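A minimal sketch of that technique with Spring MVC annotations, assuming the id-first URL form above; the controller name, view name and slug handling are illustrative:

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.servlet.ModelAndView;

@Controller
public class ArticleController {

    // The slug exists only for SEO; the lookup uses the numeric id alone
    @RequestMapping("/articles/{articleId}/{slug}")
    public ModelAndView showArticle(@PathVariable("articleId") long articleId,
                                    @PathVariable("slug") String ignoredSlug) {
        // fetch the article by articleId and hand it to the view
        return new ModelAndView("article", "articleId", articleId);
    }
}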
I have used PrettyFaces (http://ocpsoft.org/prettyfaces/) for our application because it was JSF based. It is a very neat solution, though I'm not sure whether it will work for Spring MVC.
Have a look at our URLs:
http://www.skill-guru.com/cat/certification-mock-test
http://www.skill-guru.com/test/81/core-spring-3.0-certification-mock
http://www.skill-guru.com/tutor/Pro+ESL
Earlier we had non-SEO-friendly URLs with jsessionid values appended to them. Now it is all neat and clean with the help of the PrettyFaces filter.
This is very much in line with WordPress-style URLs.
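For reference, here is a rough sketch of what a PrettyFaces mapping in WEB-INF/pretty-config.xml looks like; the pattern, parameter names and view id are illustrative, not taken from the site above:

<pretty-config>
    <url-mapping id="viewTest">
        <!-- /test/81/core-spring-3.0-certification-mock style URLs -->
        <pattern value="/test/#{testId}/#{testSlug}" />
        <view-id value="/faces/test.xhtml" />
    </url-mapping>
</pretty-config>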
Generate URLs containing both the id and a description, like this URL: http://stackoverflow.com/questions/784891/java-and-seo-urls. In the servlet, parse the URL and then use the id to fetch the data from the database. The same technique is applied on this Stack Overflow page too; look at its URL: http://stackoverflow.com/questions/784891/java-and-seo-urls.
However, only the question id is considered and the description is ignored. Try http://stackoverflow.com/questions/784891 or http://stackoverflow.com/questions/784891/abcdxyz; you will get the same page. This is a very good technique for generating SEO URLs.
A small question regarding how to use Java to retain only the base URL of a REST API, please.
As input: many strings, all valid REST API URLs.
For instance, the inputs:
https://some-host.com/v1/someapi
https://another-host.fr/api/compute
https://somewhere.host.com/public/api/v3/getsomething
I would like to retain only the base part: the https, the ://, and the host name. Everything that comes after the host, I would like to discard.
Currently, I am trying some kind of string.split based on the / character, then trying to re-concatenate the pieces, but I have a feeling I am not going in the right direction.
What would be the most appropriate way, please?
Thank you.
You could just try java.net.URL or java.net.URI. They behave pretty similarly.
For example:
URL url = new URL("http://example.com/a/b/c");
url.getProtocol();
url.getHost();
url.getPath();
or:
URI uri = new URI("http://example.com/a/b/c");
uri.getScheme();
uri.getHost();
uri.getPath();
There are several methods in both classes to extract lots of different parts.
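For the "base URL only" case, here is a small sketch built on java.net.URI; the sample input is one of the URLs above:

import java.net.URI;

public class BaseUrl {
    public static void main(String[] args) throws Exception {
        URI uri = new URI("https://somewhere.host.com/public/api/v3/getsomething");
        // Rebuild just scheme://host (plus the port, if one was given explicitly)
        String base = uri.getScheme() + "://" + uri.getHost()
                + (uri.getPort() != -1 ? ":" + uri.getPort() : "");
        System.out.println(base); // prints https://somewhere.host.com
    }
}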
I'd like to change the URL of some pages in my website in the same way as foursquare is doing:
from www.foursquare.com/v/anystring/venueid
to www.foursquare.com/v/venue-name/venueid
For example central park in new york:
https://foursquare.com/v/writeherewhatyouwant/412d2800f964a520df0c1fe3
becomes
https://foursquare.com/v/central-park/412d2800f964a520df0c1fe3
I'm developing a pure JSP/Servlet app, no frameworks, in a Tomcat container.
I thought of using Tuckey's UrlRewriteFilter, but I don't see how I can use dynamic values coming from the servlet itself there (the venue name).
How can I accomplish this?
Off the top of my head, here's something you could try:
1) Create a servlet with a servlet-mapping matching the common (prefix) part of the URL (e.g. for foursquare the pattern would be /v/*).
2) In your servlet, retrieve the remaining part of the URL path using request.getPathInfo(). You can then parse it using regular string utilities and convert it to the new path you'd like.
3) Assuming your updated path is in a variable called newUrl, call response.sendRedirect(newUrl) to tell the browser to update its URL. This will also call your servlet again with the new path, so it needs to handle both cases.
See the javadoc for HttpServletResponse.sendRedirect() for more info about how it handles relative vs absolute paths, etc.
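Putting those three steps together, here is a hedged sketch of what such a servlet could look like; the class name and the slug lookup are illustrative, not an existing API:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Mapped to /v/* in web.xml
public class VenueServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // For /v/writeherewhatyouwant/412d2800f964a520df0c1fe3,
        // getPathInfo() returns "/writeherewhatyouwant/412d2800f964a520df0c1fe3"
        String[] parts = request.getPathInfo().substring(1).split("/");
        String slug = parts[0];
        String venueId = parts[1];

        String canonicalSlug = lookupSlugForVenue(venueId); // hypothetical lookup
        if (!canonicalSlug.equals(slug)) {
            // The browser gets the canonical URL and requests this servlet again
            response.sendRedirect(request.getContextPath() + "/v/" + canonicalSlug + "/" + venueId);
            return;
        }
        // render the venue page for venueId ...
    }

    private String lookupSlugForVenue(String venueId) {
        return "central-park"; // placeholder for a real database lookup
    }
}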
This is a struts2 question.
Currently, I am using i18n for internationalization in my webapp.
Some of my JSP pages have query strings to store the request information.
For example,
http://myWebsite.com/myWebsite/myPage?productId=12345
When users try to switch languages, I rewrite the URL via JavaScript to
http://myWebsite.com/myWebsite/myPage?request_locale=zh_CN
and it loses the query string.
And my urls are used in different ways:
http://myWebsite.com/myWebsite/myPage
http://myWebsite.com/myWebsite/myPage?productId=12345#myAnchor
http://myWebsite.com/myWebsite/myPage?productId=12345&key2=value2&key3=value3#myAnchor
http://myWebsite.com/myWebsite/myPage?productId=12345&key2=value2&key3=value3&request_locale=zh_CN
http://myWebsite.com/myWebsite/myPage?productId=12345&key2=value2&key3=value3
...
When I try to handle all these differences in JavaScript, it becomes complicated.
Is there any good way to retain the query strings and anchors after switching the locale?
When you rewrite the URL using JS, you should do it based on the current URL.
So, if the browser is showing
http://myWebsite.com/myWebsite/myPage?productId=12345
you should rewrite it as
http://myWebsite.com/myWebsite/myPage?productId=12345&request_locale=zh_CN
For that, using JS you should get the current URL displayed in the browser. Take a look at this question to see how.
I have a service which takes user-supplied rich text (it can have HTML tags) and saves it into the database. That data gets used by some other application. But sometimes the user-supplied data has missing HTML tags or wrong closing tags. I want to validate whether the user-supplied data is valid HTML, and depending on that I want to warn the user.
Are there any java libraries to do HTML validation?
You can try JTidy, but it's too slow for simple HTML cleaning.
If you just want to process HTML, you can try NekoHTML; it's lightweight and fast.
You can try JTidy.
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer.
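Here is a rough sketch of how JTidy might be used just to count problems in a snippet and warn the user; treat the exact setup as an assumption based on the JTidy docs:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.w3c.tidy.Tidy;

public class TidyCheck {
    public static void main(String[] args) throws Exception {
        String html = "<p>unclosed paragraph <b>bold";
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(true);
        tidy.parse(new ByteArrayInputStream(html.getBytes("UTF-8")),
                   new ByteArrayOutputStream());
        // Non-zero counts suggest the input was not clean HTML
        System.out.println("errors: " + tidy.getParseErrors()
                + ", warnings: " + tidy.getParseWarnings());
    }
}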
You can use Jsoup (see the project README).
Here is an example:
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
...
String markup = "<body><head>...";
boolean valid = Jsoup.isValid(markup, Whitelist.relaxed());
You can pass a different Whitelist object as the second parameter to the isValid method to control which tags and attributes are considered valid.
Plus, you can easily add this library as a Gradle dependency.
Validator.nu, which implements the HTML5 spec, is worth a look, IMO.
There's a great thing called NekoHTML which is just a thin wrapper over the Apache Xerces parser that turns on error-recovery/correction. It doesn't validate so much as error-correct, so you can process the result as XML, i.e. run it through XPaths or XSLTs. It has worked flawlessly for me for several months on completely arbitrary HTML from 3rd-party sites.
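For example, here is a small sketch of using NekoHTML's DOMParser to error-correct a fragment and walk the result as a DOM; treat the exact setup as an assumption:

import java.io.StringReader;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NekoExample {
    public static void main(String[] args) throws Exception {
        String html = "<p>missing closing tags <b>bold";
        DOMParser parser = new DOMParser();
        parser.parse(new InputSource(new StringReader(html)));
        // NekoHTML has repaired the markup; the result is a normal W3C DOM
        Document doc = parser.getDocument();
        System.out.println(doc.getDocumentElement().getTagName());
    }
}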
I'm looking for an HTML sanitizer which I can call via an API to sanitize strings which I get from my webapp. Are there some useful, easy-to-use libs available? Does anyone know of one or two?
I don't need something big; it just has to be able to find unclosed tags and close them.
https://github.com/OWASP/java-html-sanitizer is now marked ready for production use.
A fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS.
You can use prepackaged policies
Sanitizers.FORMATTING.and(Sanitizers.LINKS)
or the tests show how you can configure your own easily:
new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
or write custom policies to do things like changing h1s to divs with a certain class:
new HtmlPolicyBuilder()
    .allowElements("h1", "p")
    .allowElements(
        new ElementPolicy() {
            public String apply(String elementName, List<String> attrs) {
                attrs.add("class");
                attrs.add("header-" + elementName);
                return "div";
            }
        }, "h1")
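To actually apply a policy, a builder chain like the ones above is turned into a PolicyFactory (via toFactory()) and used to sanitize the untrusted string. Here is a small sketch with the prepackaged policies; the sample input is made up:

import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class SanitizeExample {
    public static void main(String[] args) {
        // Prepackaged policies can be combined into a single PolicyFactory
        PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
        String untrusted = "<b>Hello</b><script>alert('xss')</script>"
                + "<a href=\"http://example.com/\">link</a>";
        // Everything outside the policy (here, the script element) is stripped
        String safe = policy.sanitize(untrusted);
        System.out.println(safe);
    }
}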
JTidy may help you.
The HTML Parser JSoup also supports sanitisation by policy: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
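For example, here is a minimal sketch of Jsoup's approach (in newer jsoup releases the Whitelist class is named Safelist):

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class JsoupClean {
    public static void main(String[] args) {
        String unsafe = "<p><a href=\"http://example.com/\" onclick=\"stealCookies()\">Link</a></p>";
        // basic() keeps simple text formatting and links but drops the onclick handler
        String safe = Jsoup.clean(unsafe, Whitelist.basic());
        System.out.println(safe);
    }
}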
Apart from JTidy you can also take a look at:
Nekohtml
TagSoup
Getting text in HTML document
http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/