Make GWT Crawlable (SEO) - java

I'd like to make my GWT app crawlable by Googlebot. I found this article (https://developers.google.com/webmasters/ajax-crawling/). It says there should be a servlet filter that serves a different view to Googlebot. But how can this work? If I use, for example, the Activities and Places pattern, then page changes happen on the client side only and no servlet is involved, so a servlet filter does not work here.
Can someone give me an explanation? Or is there another good tutorial, tailored to GWT, on how to do this?

If you use Activities & Places, your "pages" will have bookmarkable URLs (usually composed of the HTML host page, a #, and some tokens separated by ! or another character).
Thus, you can place links (<a> elements) in your application to make it crawlable. If a link has the proper structure (the one with # and tokens), it will navigate to the proper Place.
Have a look at https://developers.google.com/web-toolkit/doc/latest/DevGuideMvpActivitiesAndPlaces

So here is the solution to the actual problem:
I wanted to make my GWT app (running on Google App Engine) crawlable by Googlebot and followed this documentation: "https://developers.google.com/webmasters/ajax-crawling/". I was trying to apply a servlet filter that intercepts every request to my app, checks for the special _escaped_fragment_ parameter that Googlebot adds to the URL, and presents a special view, rendered with a headless browser, to the bot.
But the filter did not work for the "MyApp.html" file. I then found out that all such files are treated as static files and are not affected by the filter. I had to exclude the ".html" files from these static files, which I did by adding an exclude entry for them to the static files section in "appengine-web.xml".
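The core of such a filter is mapping the crawler's "ugly" URL back to the hash-bang URL a user would see. A minimal sketch under assumptions: the class and method names are hypothetical, "MoviePlace:42" stands in for whatever token your PlaceHistoryMapper produces, and the headless-browser rendering is not shown. Inside a javax.servlet Filter you might call prettyUrl(req.getRequestURI(), req.getQueryString()) and, when it returns a value, render that URL and write the HTML snapshot instead of calling chain.doFilter(...).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class EscapedFragment {
    private static final String PARAM = "_escaped_fragment_";

    /**
     * Maps a crawler request such as /MyApp.html?_escaped_fragment_=MoviePlace:42
     * back to the hash-bang URL a user would see (/MyApp.html#!MoviePlace:42).
     * Returns empty when the query string has no _escaped_fragment_ parameter,
     * i.e. when the request came from a normal browser.
     */
    static Optional<String> prettyUrl(String path, String queryString) {
        if (queryString == null) return Optional.empty();
        List<String> kept = new ArrayList<>();
        String fragment = null;
        for (String pair : queryString.split("&")) {
            if (pair.equals(PARAM) || pair.startsWith(PARAM + "=")) {
                String raw = pair.length() > PARAM.length() ? pair.substring(PARAM.length() + 1) : "";
                // The crawler percent-escapes the fragment, so decode it back.
                fragment = URLDecoder.decode(raw, StandardCharsets.UTF_8);
            } else if (!pair.isEmpty()) {
                kept.add(pair); // other query parameters pass through unchanged
            }
        }
        if (fragment == null) return Optional.empty();
        String query = kept.isEmpty() ? "" : "?" + String.join("&", kept);
        return Optional.of(path + query + "#!" + fragment);
    }
}
```

As the post above found, the filter only ever sees these requests if App Engine is not serving MyApp.html as a static file.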
I hope this will help some people with the same problem to save some time :)
Thanks and best regards
jan


Create new site with REST API

I want to create a new site in Alfresco through the REST API. First I tried the URL /alfresco/service/api/sites; the site was created, but I could not open it. I read the method description, and it says:
Note: this method only creates a site at the repository level, it
does not create a fully functional site. It should be considered for
internal use only at the moment. Currently, creating a site
programmatically needs to be done in the Share context, using the
create-site module. Further information can be found at the address
http://your_domain:8080/share/page/index/uri/modules/create-site.post
within your Alfresco installation.
I tried the suggested URL, but it returns a 404!
Any help or suggestions?
Note: the links/references in this answer all assume you've got the Alfresco Share application installed and available at http://localhost:8081/share/; tweak as needed for your machine.
When wanting to understand or discover webscripts, the first place you want to head to is http://localhost:8081/share/service/index - the Share WebScripts home. (The Alfresco repo tier has an equivalent one too, available at a similar URL).
When there, you'll see all of the Share-side WebScripts listed, of which there are a lot. You could search through that list for create-site. However, you can restrict the webscript listing by URL or by module. For the Create Site webscripts, the URL to see just those is http://localhost:8081/share/page/index/uri/modules/create-site
Head there, and you'll discover there are two create-site related webscripts, a GET and a POST. As you've already discovered, the one you want is the POST webscript. Click on it to get the details, at http://localhost:8081/share/page/script/org/alfresco/modules/create-site.post; that's the latest (Alfresco 5.x) URL for the thing you were directed to in your question. If your Share installation is at a different URL, navigating from the Share webscripts home will give you the specific one for your machine.
Finally, you'd need to POST the required JSON to that webscript's URI, which is given in the webscript listing, e.g. http://localhost:8081/share/page/modules/create-site. The easiest way to see exactly what JSON you need is to use Firebug or your browser's developer tools to inspect the handful of keys/values that Share sends when you create a site through the UI.
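A minimal sketch of that last step using the JDK's java.net.http client (Java 11+). The JSON keys and values (sitePreset "site-dashboard", shortName, title, description, visibility "PUBLIC") are assumptions based on what Share's create-site dialog typically sends, so confirm them in your browser's developer tools first. The sketch does not authenticate: a real call to Share will most likely need a logged-in session (and, depending on version and configuration, a CSRF token), which is not shown. Class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateSiteRequest {
    /** Builds the request body; the field names are assumptions, verify them first. */
    static String json(String shortName, String title, String description, String visibility) {
        return "{\"sitePreset\":\"site-dashboard\""
                + ",\"shortName\":\"" + escape(shortName) + "\""
                + ",\"title\":\"" + escape(title) + "\""
                + ",\"description\":\"" + escape(description) + "\""
                + ",\"visibility\":\"" + escape(visibility) + "\"}";
    }

    // Minimal JSON string escaping (backslash and quote only; no control characters).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds (but does not send) the POST to the create-site webscript URI. */
    static HttpRequest build(URI endpoint, String json) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        URI endpoint = URI.create("http://localhost:8081/share/page/modules/create-site");
        HttpRequest request = build(endpoint, json("my-site", "My Site", "Created from Java", "PUBLIC"));
        // Without an authenticated Share session this will most likely be rejected.
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```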

Using .JAR file within Web Application

I am trying to learn how to develop web applications and I am a little stuck.
I want to create a movie review application. It has a few Java classes/packages inside a JAR file:
moviewebapp.jar
    package moviewebapp.movie
        Movie.class
    package moviewebApp.servlet
        MovieServlet.class
The servlet class has methods such as doPost, doGet, getMovie, updateMovie, etc.
I have added the .JAR file to my build path.
I have developed a simple page where a user can click add movie review, which opens up a form where they can input the movie name and rating.
I now want to save the user's input to a datastore. How do I use the methods of the servlet class from my JavaScript to handle posting the information about the movie?
I have tried doing imports and then creating a servlet object, but I don't think I have the correct syntax, or maybe that's not how I'm meant to do it.
Any help would be appreciated!
JARs are libraries; you should not modify their code unless it is your last option. You can import their classes and call the methods of those classes.
From your question it seems you are new to Java, so I suggest you start with "hello world" tutorials for servlets and JDBC before developing applications. Good luck.
Trying to provide some guidance:
Easiest solution (not the best): submit your page to send the user's entry to the server. Doing this, you lose the state of the page on the client side.
A better solution is to send the user's input with an asynchronous JavaScript (AJAX) request, so the server can process it without reloading the page.
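To make the "send it to the server" idea concrete: JavaScript in the browser cannot call the servlet's Java methods directly. It sends an HTTP POST (a plain form submit or an AJAX request) to the servlet's URL, the container calls doPost, and doPost calls into the classes from your JAR. The sketch below shows only that last part, as a plain class so it runs without a servlet container; inside doPost you might call handler.handlePost(request.getParameterMap()) and pass the result to response.setStatus(...). The class, the field names "name" and "rating", the 1 to 5 rating range and the in-memory map standing in for the datastore are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class MovieReviewHandler {
    /** Hypothetical stand-in for moviewebapp.movie.Movie from the JAR. */
    static final class Movie {
        final String name;
        final int rating;

        Movie(String name, int rating) {
            this.name = name;
            this.rating = rating;
        }
    }

    // In-memory stand-in for the real datastore.
    private final Map<String, Movie> store = new LinkedHashMap<>();

    /**
     * Validates the posted fields and stores the review. The parameter map has the
     * same shape as HttpServletRequest.getParameterMap(); returns an HTTP status code.
     */
    int handlePost(Map<String, String[]> params) {
        String name = first(params, "name");
        String rating = first(params, "rating");
        if (name == null || name.isBlank() || rating == null) return 400; // Bad Request
        int value;
        try {
            value = Integer.parseInt(rating.trim());
        } catch (NumberFormatException e) {
            return 400;
        }
        if (value < 1 || value > 5) return 400;
        store.put(name.trim(), new Movie(name.trim(), value));
        return 201; // Created
    }

    Optional<Movie> find(String name) {
        return Optional.ofNullable(store.get(name));
    }

    private static String first(Map<String, String[]> params, String key) {
        String[] values = params.get(key);
        return (values == null || values.length == 0) ? null : values[0];
    }
}
```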

JSP/Tomcat: Navigation system with sub-folders but one page

My JSP project is the back end of a fairly simple site whose purpose is to show many submissions. They are organized in categories, much like a typical forum.
The content is loaded entirely from a database since making separate files for everything would be extremely redundant.
However, I want to give users the ability to navigate the site properly and also give each submission a unique link.
So for example a link can be: site.com/category1/subcategory2/submission3.jsp
I know how to generate those links, but is there a way to automatically redirect all the theoretically possible links to the main site.com/index.jsp?
The Java code of the JSP needs access to the original link, of course.
Hope someone has an idea.
Big thanks in advance! :)
Alright, in case someone stumbles across this one day...
The way I was able to solve this was by using a servlet. Eclipse can create servlets directly in the project, and the wizard even lets you set the URL mapping, for example /main/*, so you don't have to edit web.xml yourself.
The doGet method simply forwards the request as follows:
request.getRequestDispatcher("/index.jsp").forward(request,response);
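For the JSP to know which submission was requested, the servlet can take the original link apart before forwarding. With the mapping /main/*, request.getPathInfo() in the servlet returns the part after /main, e.g. "/category1/subcategory2/submission3.jsp". A minimal sketch of parsing it, with a hypothetical class name and the assumption that the last segment is the submission (".jsp" suffix optional); in doGet you would store the result with request.setAttribute(...) before the forward, because after a forward the JSP's own getPathInfo() reflects the forward target (the original URI is also available as the request attribute "javax.servlet.forward.request_uri").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SubmissionPath {
    final List<String> categories; // e.g. [category1, subcategory2]
    final String submission;       // e.g. submission3

    private SubmissionPath(List<String> categories, String submission) {
        this.categories = List.copyOf(categories);
        this.submission = submission;
    }

    /** Parses the servlet path info; empty if there is nothing after the mapping. */
    static Optional<SubmissionPath> parse(String pathInfo) {
        if (pathInfo == null) return Optional.empty();
        List<String> parts = new ArrayList<>();
        for (String part : pathInfo.split("/")) {
            if (!part.isEmpty()) parts.add(part); // skips the leading and doubled slashes
        }
        if (parts.isEmpty()) return Optional.empty();
        String last = parts.remove(parts.size() - 1);
        if (last.endsWith(".jsp")) last = last.substring(0, last.length() - ".jsp".length());
        return Optional.of(new SubmissionPath(parts, last));
    }
}
```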
This kind of forwarding unfortunately breaks all relative links in the page, because the browser still resolves them against the original URL. This can be solved, for example, by linking relative to the root directory instead. See the neat responses here for alternatives: Browser can't access/find relative resources like CSS, images and links when calling a Servlet which forwards to a JSP
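Why the relative links break can be reproduced with java.net.URI, which resolves references the same way a browser does: a page-relative href is resolved against the "folder" of the URL in the address bar, and those folders don't exist. The page URL and file names below are illustrative. Note that root-relative links only work unchanged if the app is deployed at the context root; otherwise prefix them with the context path (in a JSP, ${pageContext.request.contextPath}).

```java
import java.net.URI;

public class RelativeLinks {
    /** Resolves an href against the page URL, as the browser does. */
    static URI resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href);
    }

    public static void main(String[] args) {
        String page = "http://site.com/main/category1/subcategory2/submission3.jsp";
        // Page-relative: prints http://site.com/main/category1/subcategory2/css/style.css
        System.out.println(resolve(page, "css/style.css"));
        // Root-relative: prints http://site.com/css/style.css regardless of the page URL
        System.out.println(resolve(page, "/css/style.css"));
    }
}
```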

Get URL hierarchy from a base link

Before asking my question (which is basically what the title says), I want to provide some background, to give a better picture of my situation.
I am writing a little application in Java, mainly for academic purposes, but also with a very specific task in mind. What this application does is basically build a URL hierarchy starting from a base URL, and later on give the ability to organize the links and perform some actions on them.
Imagine the following URLs:
http://www.example.com
http://www.example.com/sub001
http://www.example.com/sub002
http://www.example.com/sub002/ultrasub
I would like my program to retrieve this hierarchy when provided with the base URL http://www.example.com (or http://www.example.com/).
In my code I have a class capable of encoding URLs, and I have already thought of a way to validate them; I just couldn't find a way to discover the URL hierarchy beneath the base URL.
Is there a direct way of doing it, or do I just have to download the files from the base URL and start building the hierarchy from the relative and absolute links present in the file?
I am not asking for specific code, just a (somewhat) complete explanation of what way I could take to do it, with maybe some skeleton code to guide me.
Also, I am storing the URLs in a TreeMap<URL,Boolean> structure, in which the Boolean states whether the URL has already been analyzed. I chose this structure after a quick peek at the Java 7 API specification, but would you suggest a structure better suited to this specific purpose?
Thanks in advance :)
There is no way in the HTTP protocol to request all the URLs that are 'under' a given URL. You are out of luck.
Some protocols (ftp://..., for example) do have explicit mechanisms.
Some HTTP servers will print an index page if you request a 'directory', but this practice is not recommended and not many servers do it.
Bottom line is that you have to follow links in order to determine what the server hierarchy is, and even then you may not discover a link to all the areas of the hierarchy.
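A minimal sketch of "follow links" as a breadth-first crawl, under loud assumptions: a regex is used to pull out href values (a real crawler should use an HTML parser), robots.txt is not handled, and the fetch function is injected so it can be backed by an HTTP client or, for testing, by a map. Class and method names are hypothetical. It also uses java.net.URI instead of URL for the visited set: URI implements Comparable and compares purely textually, whereas URL does not implement Comparable (so a TreeMap<URL,Boolean> without a Comparator throws ClassCastException) and URL.equals may perform DNS lookups.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkHierarchy {
    // Naive href extraction; stops at quotes and '#', so fragment-only links are skipped.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Links on the page that resolve to a URI under the base URI. */
    static Set<URI> childLinks(URI page, String html, URI base) {
        Set<URI> result = new TreeSet<>(); // URI is Comparable, unlike URL
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI target;
            try {
                target = page.resolve(m.group(1).trim()).normalize();
            } catch (IllegalArgumentException e) {
                continue; // skip malformed links
            }
            if (isUnder(target, base)) result.add(target);
        }
        return result;
    }

    // Same scheme and host, and a path starting with the base path (a plain prefix test).
    static boolean isUnder(URI target, URI base) {
        return Objects.equals(target.getScheme(), base.getScheme())
                && Objects.equals(target.getHost(), base.getHost())
                && target.getPath() != null
                && target.getPath().startsWith(base.getPath());
    }

    /** Breadth-first crawl; fetch returns the page body, or null if unavailable. */
    static Set<URI> crawl(URI base, Function<URI, String> fetch, int maxPages) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> frontier = new ArrayDeque<>(List.of(base));
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            URI next = frontier.poll();
            if (!visited.add(next)) continue; // already analyzed
            String html = fetch.apply(next);
            if (html == null) continue;
            for (URI child : childLinks(next, html, base)) {
                if (!visited.contains(child)) frontier.add(child);
            }
        }
        return visited;
    }
}
```

Even this only finds pages that something links to, as noted above.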
EDIT: I should add that you should, as a well-behaved netizen, obey the robots.txt file on any servers you access.
EDIT2: (after comment on FTP mechanism)
The FTP protocol has many commands; see this wiki list. One of the commands is NLST, which "Returns a list of file names in a specified directory."
The URL specification (RFC 1738) makes special provision in the URL format for FTP URLs, in section 3.2.2:
The url-path of a FTP URL has the following syntax:
<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
....
If the typecode is "d", perform a NLST (name list) command with <name> as the argument, and interpret the results as a file directory listing.
I can see the effects when I try this from the command line (not from a browser):
rolf#home ~ $ curl 'ftp://sunsite.unc.edu/README'
Welcome to ftp.ibiblio.org, the public ftp server of ibiblio.org. We
hope you find what you're looking for.
If you have any problems or questions, please see
http://www.ibiblio.org/help/
Thanks!
and with ;type=d appended I get:
rolfl#home ~ $ curl 'ftp://sunsite.unc.edu/README;type=d'
HEADER.images
incoming
HEADER.html
pub
unc
README
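To make the RFC 1738 rule quoted above concrete, here is a sketch that takes an ftp:// URL apart into the <cwd> segments, <name> and typecode, and lists the FTP commands section 3.2.2 describes for it. It only plans the commands and does not talk to a server; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FtpUrlPath {
    final List<String> cwds; // <cwd1>..<cwdN>: directories to CWD into, in order
    final String name;       // <name>: the last path segment (may be empty)
    final char typecode;     // 'a', 'i' or 'd'; 0 when no ;type= is present

    private FtpUrlPath(List<String> cwds, String name, char typecode) {
        this.cwds = List.copyOf(cwds);
        this.name = name;
        this.typecode = typecode;
    }

    static FtpUrlPath parse(URI uri) {
        if (!"ftp".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an ftp URL: " + uri);
        }
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        if (path.startsWith("/")) path = path.substring(1);
        char type = 0;
        int semi = path.lastIndexOf(";type=");
        if (semi >= 0) {
            String code = path.substring(semi + ";type=".length()).toLowerCase(Locale.ROOT);
            if (code.length() != 1 || "aid".indexOf(code.charAt(0)) < 0) {
                throw new IllegalArgumentException("bad typecode: " + code);
            }
            type = code.charAt(0);
            path = path.substring(0, semi);
        }
        String[] segments = path.split("/", -1);
        List<String> cwds = new ArrayList<>();
        for (int i = 0; i < segments.length - 1; i++) cwds.add(decode(segments[i]));
        return new FtpUrlPath(cwds, decode(segments[segments.length - 1]), type);
    }

    private static String decode(String segment) {
        // URLDecoder is meant for form data, so protect '+' to keep it literal.
        return URLDecoder.decode(segment.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    /** One CWD per <cwd>, then NLST for type d, otherwise optional TYPE plus RETR. */
    List<String> commands() {
        List<String> cmds = new ArrayList<>();
        for (String dir : cwds) cmds.add("CWD " + dir);
        if (typecode == 'd') {
            cmds.add(name.isEmpty() ? "NLST" : "NLST " + name);
        } else if (!name.isEmpty()) {
            if (typecode != 0) cmds.add("TYPE " + Character.toUpperCase(typecode));
            cmds.add("RETR " + name);
        }
        return cmds;
    }
}
```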

Crawlers in JSP/Struts/Session controlled Webapps

I've got a Struts web application (running on Tomcat 6) in which all files except the first one (which invokes a starting action) are located in WEB-INF, and you always need a session to use it; otherwise you are redirected to the starting action and starting page again.
The app's main function is a search that provides products from a database. How does a crawler navigate my app? Does it trigger the search, which could lead it to error pages? Or can it only follow links that are not embedded in forms? (Struts turns nearly everything into forms, so there are only a few links and mostly onclick redirects and form actions.)
How can I provide useful, indexable information to a crawler like this?
Thanks for any advice :)
Sounds like you would be best off reading up on some SEO guidelines: http://www.google.com.au/search?q=seo+guidelines&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a&safe=high
To answer your questions:
Crawlers will generally navigate to your app from external links on the web, or after you submit your site to the search engine.
The crawler won't fill in inputs and submit forms; it will follow hyperlinks between your pages.
If you want the crawler to index your search results (I can't really see why you would want this), you can put links to common searches on one of your already indexed pages.
You should make sure that your product pages are SEO friendly and are indexed instead of your search results.
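A small sketch of the last two points: give crawlers plain GET links to your product pages instead of form posts. It assumes a hypothetical Struts action reachable as /product.do?id=... that works with a simple GET and without an existing session, since a crawler typically won't carry your session along; the class and method names are hypothetical too. The resulting HTML fragment could be placed on a page that is already indexed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ProductLinks {
    /** Escapes text for HTML element content and double-quoted attribute values. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an HTML list of plain links; "/product.do?id=" is a hypothetical action URL. */
    static String linkList(Map<String, String> productNamesById) {
        StringBuilder html = new StringBuilder("<ul>\n");
        productNamesById.forEach((id, name) -> html
                .append("  <li><a href=\"/product.do?id=")
                .append(escapeHtml(URLEncoder.encode(id, StandardCharsets.UTF_8)))
                .append("\">")
                .append(escapeHtml(name))
                .append("</a></li>\n"));
        return html.append("</ul>\n").toString();
    }
}
```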
