Enabling the search engines to index data from web application


I am building a social web application using Java and Cassandra. I want some of the data from my database to be visible to search engines.
Since my application is completely dynamic and its data lives only in the database, not in static pages, how do crawlers read this data?
1. How can I ensure that the data stored on my servers can be seen by search engines? My application contains user-specific data.
2. How do search engines access that data?
3. How can I limit search-engine crawling to some specific data?

Read the explanations from Google.
Search engines access your data the way any other user of your website does: by browsing it and following all the links they find. Content reachable only through AJAX is harder for search engines to index.
Access can be restricted using a robots.txt file. Explanations are given in the link above.
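
As a rough illustration of the robots.txt approach, here is a minimal Java sketch that builds the file's text from lists of paths. The class name `RobotsTxt` and the example paths are hypothetical and not taken from the question; a real application would serve this text at `/robots.txt`.

```java
import java.util.List;

// Hypothetical helper: builds the text of a robots.txt file.
// The paths used in main() are placeholders for illustration only.
final class RobotsTxt {

    // Emits one rule group that applies to every crawler ("User-agent: *").
    static String build(List<String> allowed, List<String> disallowed) {
        StringBuilder sb = new StringBuilder("User-agent: *\n");
        for (String path : allowed) {
            sb.append("Allow: ").append(path).append('\n');
        }
        for (String path : disallowed) {
            sb.append("Disallow: ").append(path).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed layout: public profiles are crawlable, account pages are not.
        System.out.print(build(
                List.of("/profiles/public/"),
                List.of("/account/", "/messages/")));
    }
}
```

Keep in mind that robots.txt is only a request to well-behaved crawlers. It does not protect user-specific data, which still has to sit behind authentication.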

1) You need to separate user-specific information from public information. Either have separate public and private pages, or decorate your public page with user specifics through session-based Ajax calls.
Meaning: the browser just loads the public version of the page, while JavaScript loads the user's specifics and injects them into the page.
2) and 3) could be solved by uploading a sitemap to Google.
Or do you want Google to talk to Cassandra directly? Then ignore all of the above, I think.
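
To make the sitemap suggestion concrete, here is a small sketch that generates sitemap XML from a list of public URLs, for example ones read from the database. The class name `SitemapBuilder` and the example.com URLs are assumptions for illustration; the XML namespace is the one defined by the sitemaps.org protocol.

```java
import java.util.List;

// Hypothetical helper: turns a list of public page URLs into sitemap XML.
final class SitemapBuilder {

    // Escapes the five characters that XML reserves, as URLs often contain '&'.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    static String build(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(escapeXml(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder URLs; in practice these would come from the public data set.
        System.out.print(build(List.of(
                "https://example.com/profiles/alice",
                "https://example.com/search?tag=java&page=2")));
    }
}
```

A sitemap only tells crawlers which URLs exist. Keeping crawlers away from other pages is still the job of robots.txt.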

Related

Search indexing of AngularJS application in Java

I have an AngularJS application already in production and I need to make it Google-friendly. I have read about the Ajax crawling mechanism and I have the following two problems:
1. Since my backend is written in Java, I tried HtmlUnit to make static snapshots, but it didn't work well with AngularJS. How can I serve snapshots of my AngularJS pages using Java?
2. As I mentioned, the application is already published and it uses a simple hash without !, e.g. /#/about, /#/home. Is it possible to keep this scheme? Changing to /#!/ would require modifying all links and would break all existing links (posted on the web).
Thanks in advance!

SEO is always an issue with single-page applications.
I suggest a quick PhantomJS implementation that serves already rendered pages to Google's bots. Check out this link for more information.
PhantomJS acts as a gateway to which you redirect every indexing-bot request; it renders your app the way a normal user's browser would and then sends the rendered page back to the bot.
Also, it would be better to change your /#/ to just /, which is better for your users and for SEO. You just need to redirect every request to the index.html page and use '/#/' as a fallback for old browsers that don't support pushState.
There are also paid solutions such as https://prerender.io/, which works beautifully.
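
On the Java side, the "redirect every indexing-bot request" step needs some way to tell bots from ordinary visitors. The following is only a sketch under assumptions: the class name `CrawlerDetector` is hypothetical, the token list is deliberately short, and the `_escaped_fragment_` check applies only if the Ajax crawling scheme mentioned in the question is in use.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper a servlet filter could call before deciding whether to
// hand the request to a pre-rendering gateway instead of the normal app.
final class CrawlerDetector {

    // Lowercase substrings of well-known crawler User-Agent headers (not exhaustive).
    private static final List<String> BOT_TOKENS = List.of("googlebot", "bingbot");

    static boolean isCrawler(String userAgent, String queryString) {
        // Crawlers following the Ajax crawling scheme request ?_escaped_fragment_=...
        if (queryString != null && queryString.contains("_escaped_fragment_=")) {
            return true;
        }
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        System.out.println(isCrawler(bot, null));
        System.out.println(isCrawler("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0", null));
    }
}
```

User-Agent sniffing is easy to spoof, so this is suitable for choosing which representation to serve, not for any access-control decision.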

Using HTTP 3XX redirect to Google Cloud Storage objects?

I am developing an application on Google AppEngine (Java) that generates an HTML report. The report is viewed frequently and modified occasionally, so I am thinking of optimizing performance by scheduling the report to be generated and uploaded to Google Cloud Storage, and having that serve the report instead of AppEngine. So, userA and userB can create reports and access them from userA-report.myapp.com and userB-report.myapp.com, where the content is generated in AppEngine and stored in Cloud Storage.
I do, however, have a few constraints:
- Some of the reports have access restrictions, which I would like my application to keep controlling; in other words, I don't want to use ACLs and maintain them for restricting access;
- I do not have a way to dynamically configure my CNAME entries, so I still need to handle the request on AppEngine and redirect to Cloud Storage.
What I am thinking of doing is this: if I detect that the report is already available on Cloud Storage, I send an HTTP 3XX redirect to http://storage.googleapis.com. I realize that this is not as performant since it involves another round trip, but it should still be faster than generating the page again. I can also handle any authentication as needed.
Besides the performance concern above, it feels "backwards" to me to go to the content server first and then redirect to the CDN. Is there a way to configure Cloud Storage so that, when a file is not found, the request hits a different server? Or is my approach completely pointless?

First, for the access restriction question, coming to your server first and redirecting to storage.googleapis.com is a perfectly reasonable approach. You might also want to consider using Signed URLs. With this feature, instead of setting ACLs on objects for access control, simply keep the ACL as private and create limited-time signed URLs when you decide a user should have access.
For your second question about when files are not found, you can use the Website Configuration feature with a custom NotFoundPage. That can be an HTML file, so you can use it to redirect to your application server.

How to create login page in google app engine using java?

I would like to create my own login page instead of using the default one from Google App Engine. After the user clicks the login button, it should redirect them to the home page of my website. Is there a way to do so?
I am using Java as the programming language.
I would appreciate it if anyone could help. Thanks.

Doing this in App Engine is no different from doing it in any other Java servlet environment, except that you're storing your user data in the datastore. How exactly you implement it is up to you and depends on a number of factors; a complete tutorial on how to build a user authentication system seems out of scope for a Stack Overflow answer.
I'd strongly recommend using a prebuilt authentication solution, however; if you don't like Google user authentication, App Engine comes with built-in OpenID authentication as an option. Rolling your own makes you responsible for issues like secure password storage, which is hard to get right, exposes your users to potential security issues, and forces them to create yet another user account instead of using an existing one.

GWT Dynamic page generation

I have written a webpage with GWT which contains auto-generated hyperlinks. These hyperlinks currently don't point to anything; however, I want them to display certain dynamic information based on the name of the hyperlink. For instance, if the hyperlink says iPhone, it should open another URL with dynamic information about the iPhone, which I retrieve from my database. I know JSP/Servlets are used to generate dynamic information on webpages, but how can I integrate such functionality into my GWT webpage?
Thanks
Great, this certainly helps in giving me an idea of how I can go about my design.
As a follow-up, though, I have a question about how I can access my backend DB. I have stored some data in a SQLite DB which I want to be displayed on the webpage. I was able to implement backend access via GWT's RPC; however, it doesn't seem to allow transfer of a ResultSet object returned by a query. How can ResultSets be transferred? In my browsing I have seen a few keywords such as DTO, JPA etc. thrown around, but I don't quite have a picture of how they will plug in.

How about this:
[CLIENT]: Add a ClickHandler to your hyperlinks in which you execute the following steps:
1. [CLIENT]: Retrieve the token from the hyperlink (e.g. iPhone).
2. [CLIENT]: Access the backend (RPC, RequestFactory or a normal RequestBuilder) and pass the token (iPhone) to the backend.
3. [SERVER]: On the backend (servlet, Python, PHP, etc.), handle the AJAX call from your GWT app and return the dynamic information based on the token.
4. [CLIENT]: Display the dynamic information returned by the server call (step 3) in an HTMLPanel or SimplePanel.

How to make data from a database accessible from other web applications

I have a database from which I want to expose data.
Ideally I would like to be able to just add a URL into some other web page and that URL would then call the correct datum using the web app I use to interact with the database.
Would a web service be the best option?

Looks to me like a perfect job for ODATA:
The Open Data Protocol (OData) is a Web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData does this by applying and building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores.
See it in action (showing query results in a browser is just one way to use OData).

A URL-based solution as you describe would only work if:
a) the web app framework you use can resolve the URL automatically as it parses and sends the HTML to the browser, or
b) the browser resolves the URL (e.g. the IMG element)
If the web app framework you use can resolve the URL (or if you can extend it so that it does), then you still need something that listens at that URL and retrieves the correct element from the database.

The approach here depends on whether you are doing Ajax style web pages or simple HTML, where each UI update refreshes the whole page.
The latter, a traditional page-by-page web site, is probably the simplest thing. For this, explore JSP technologies. The idea is that you write what looks like an HTML page, but embed in it references to Java objects (or even Java code). In this case you should read up on simple frameworks such as Struts. The broad-brush idea is that you get this sequence of processing:
1. A request arrives from the browser; interpret it to figure out what the user wants to see.
2. Some Java code talks to the database, gets the data and puts it in a Java object.
3. A JSP is chosen; that JSP picks items from the Java object we just prepared.
4. The JSP renders HTML, which is sent to the browser.
In the case of Ajax, JavaScript in the browser decides to display some data and calls a service to get it. So here, yes, a "Web Service" of some kind is needed. Usually we use REST services that return a payload in JSON format, so the data is effectively transferred as JavaScript.
There are plenty of libraries for creating RESTful Web Services, for example Apache Wink.
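
For a feel of what such a service does, here is a minimal sketch that uses only the JDK's built-in `com.sun.net.httpserver.HttpServer` rather than a REST library. Everything specific in it is an assumption for illustration: the class name `DatumService`, the `/items/` path, port 8080, and the in-memory map standing in for the real database. The JSON is assembled by hand only to keep the example dependency-free; a real service would use a JSON library.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical service: GET /items/{id} returns one datum as JSON.
final class DatumService {

    // Stand-in for the real database; the entry is made up for illustration.
    private static final Map<String, String> FAKE_DB = Map.of("42", "iPhone");

    // Minimal escaping (backslash and double quote only), enough for this demo data.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Returns the JSON payload for the given id, or null when the id is unknown.
    static String lookupJson(String id) {
        String name = FAKE_DB.get(id);
        if (name == null) {
            return null;
        }
        return "{\"id\":\"" + escapeJson(id) + "\",\"name\":\"" + escapeJson(name) + "\"}";
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/items/", exchange -> {
            String id = exchange.getRequestURI().getPath().substring("/items/".length());
            String json = lookupJson(id);
            int status = (json == null) ? 404 : 200;
            byte[] body = (json == null ? "{}" : json).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start(); // Serves until the process is stopped.
    }
}
```

With the server running, JavaScript in another page could request `http://localhost:8080/items/42` and receive the JSON object for that datum, which is the URL-per-datum idea from the question.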

