Multiple questions in Java (Validating URLs, adding applications to startup, etc.)

Multiple questions in Java (Validating URLs, adding applications to startup, etc.) - java

I have an application to build in Java, and I've got some questions to put.
Is there some way to know if one URL of a webpage is real? The user enters the URL and I have to test if it's real, or not.
How can I konw if one webpage has changes since one date, or what is the date of the last update?
In java how can I put an application running on pc boot, the application must run since the user turns on the computer.

I'm not sure what kind of application you want to build. I'll assume it's a desktop application. In order to check if a URL exists, you should make a HTTP HEAD Request, and parse the results. HEAD can be used to check if the page has been modified. In order for an application to start when the PC boots, you have to add a registry entry under Windows; this process is explained here

To check whether a url is valid you could try using a regular expression (regex for urls).
To know if a webpage has changed you can take a look at the http headers (reading http headers in java).
You can't make the program startup automatically on boot, the user must do that. However, you can write code to help the user set the program as startup app; this however depends on the operating system.

I'm not sure what you mean by "real". If you mean "valid", then you can just construct a java.net.URL from a String and catch the resulting MalformedURLException if it's not valid. If you mean that there's actually something there, you could issue an HTTP HEAD request like Geo says, or you could just retrieve the content. HTTPUnit is particularly handy for retrieving web content.
HTTP headers may indicate when the content has changed, as nan suggested above. If you don't want to count on that you can just retrieve the page and store it, or even better, store a hash of the page content. See DigestOutputStream for generating a hash. On a subsequent check for changes you would simply compare the new hash with the the one you stored last time.
Nan is right about start on boot. What OS are you targeting?

Related

Combine two http responses into one

Is it possible to send extra data attached to a http response via Java or Php?
My Website is a homework-platform: One User enters homeworks into a database, and all users can then see the homeworks on the website. The current load is very inefficient, as the browser makes two requests for eveything to load: One for the index file and one for the homeworks. For the homeworks request the client also sends settings of the user to the server, based on which the returned homeworks are generated by a Php script.
Now, I wonder, if it is possible, to combine those two requests into one? Is it maybe possible to detect the http request with Java or Php on the server, read the cookies (where the settings are saved), then get the homeworks from the database and send the data attached to the http response to the client? Or, even better, firstly only return the index file and as soon as possible and the homework data afterwards as a second response, because the client needs some time to parse the Html & build the DOM-tree when it can't show the homeworks anyway.
While browsing the web I stumbled across terms like "Server-side rendering" and "SPDY", but I don't know if those are the right starting points.
Any help is highly appreciated, as I'm personally very interested in a solution and it would greatly improve the load time of my website.

A simple solution to your problem is to initialize your data in the index file.
You would create a javascript object, and embed it right into the html, rendered by your server. You could place this object in the global namespace (such as under window.initData), so that it can be accessed by the code in your script.
<scipt>
window.initData = {
someVariable: 23,
}; // you could use json_encode if you use php, or Jackson if you use java
</script>
However, it is not a huge problem if your data is fetched in a separate server request. Especially when it takes more time to retrieve the data from the database/web services, you can provide better user experience by first fetching the static content very quickly and displaying a spinner while the (slower) data is being loaded.

Passing Browser Tab information at backend?

Scenario:
We have one analysis which gives different results based on different inputs. So if the user open the same analysis in two different browser tabs, the session variables being common will get overridden and output will be same in both tabs though we want different outputs based on different user inputs in tabs.
So we plan to send a tab-id at the backend so that we save session variables per tab-id.
Is there some automatic way that tab information is being sent to the server like may be in request header or something like that??
Or we will have to generate a tab-id ourselves and send it with every request?

You'll have to generate your tab-id and pass it back with each request, but the following might make it a bit easier:
You can use sessionStorage from Web Storage API to store values unique to each tab. Every tab in the browser starts a new session so they are always distinct.
https://developer.mozilla.org/en/docs/Web/API/Window/sessionStorage
It should work with most common browsers (even IE8+): http://caniuse.com/#search=web%20storage
Hope that helps!

Java web crawling - static URLS

I'm going to look a bit more into the techniques because obviously there is a lot to learn but I was wondering what the best approach to handling static URLs is. I would guess it has to do with cookies but I'm not positive.
E.x. So I search my query on a site example.com, example.com/search?string=blah then sends me to a url that is specific to the search string. From there I can then go further (to the data I actually want) but the link to the results is a static URL example.com/results.php?id=33, the id stays the same regardless of search string. So the only logical thing is a cookie being passed right? If so, how would I have Java open a connection, grab the cookies, then open a new connection and pass the cookies? I tried something like this with two methods, one that opens the initial connection and grabs the cookies then opens a new connection and passes the cookies to that method.
There are definitely multiple cookies if that helps.
Also, any links/resources that you think I might find helpful would be greatly appreciated.

How do I send a query to a website and parse the results?

I want to do some development in Java. I'd like to be able to access a website, say for example
www.chipotle.com
On the top right, they have a place where you can enter in your zip code and it will give you all of the nearest locations. The program will just have an empty box for user input for their zip code, and it will query the actual chipotle server to retrieve the nearest locations. How do I do that, and also how is the data I receive stored?
This will probably be a followup question as to what methods I should use to parse the data.
Thanks!

First you need to know the parameters needed to execute the query and the URL which these parameters should be submitted to (the action attribute of the form). With that, your application will have to do an HTTP request to the URL, with your own parameters (possibly only the zip code). Finally parse the answer.
This can be done with standard Java API classes, but it won't be very robust. A better solution would be HttpClient. Here are some examples.

This will probably be a followup question as to what methods I should use to parse the data.
It very much depends on what the website actually returns.
If it returns static HTML, use an regular (strict) or permissive HTML parser should be used.
If it returns dynamic HTML (i.e. HTML with embedded Javascript) you may need to use something that evaluates the Javascript as part of the content extraction process.
There may also be a web API designed for programs (like yours) to use. Such an API would typically return the results as XML or JSON so that you don't have to scrape the results out of an HTML document.
Before you go any further you should check the Terms of Service for the site. Do they say anything about what you are proposing to do?
A lot of sites DO NOT WANT people to scrape their content or provide wrappers for their services. For instance, if they get income from ads shown on their site, what you are proposing to do could result in a diversion of visitors to their site and a resulting loss of potential or actual income.
If you don't respect a website's ToS, you could be on the receiving end of lawyers letters ... or worse. In addition, they could already be using technical means to make life difficult for people to scrape their service.

HTTP Request (POST) field size limit & Request.BinaryRead in JSPs

First off my Java is beyond rusty and I've never done JSPs or servlets, but I'm trying to help someone else solve a problem.
A form rendered by JavaScript is posting back to a JSP.
Some of the fields in this form are over 100KB in size.
However when the form field is being retrieved on the JSP side the value of the field is being truncated to 100KB.
Now I know that there is a similar problem in ASP Request.Form which can be gotten around by using Request.BinaryRead.
Is there an equivalent in Java?
Or alternatively is there a setting in Websphere/Apache/IBM HTTP Server that gets around the same problem?

Since the posted request must be kept in-memory by the servlet container to provide the functionality required by the ServletRequest API, most servlet containers have a configurable size limit to prevent DoS attacks, since otherwise a small number of bogus clients could provoke the server to run out of memory.
It's a little bit strange if WebSphere is silently truncating the request instead of failing properly, but if this is the cause of your problem, you may find the configuration options here in the WebSphere documentation.

We have resolved the issue.
Nothing to do with web server settings as it turned out and nothing was being truncated in the post.
The form field prior to posting was being split into 102399 bytes sized chunks by JavaScript and each chunk was added to the form field as a value so it was ending up with an array of values.
Request.Form() appears to automatically concatenate these values to reproduce the single giant string but Java getParameter() does not.
Using getParameterValues() and rebuilding the string from the returned values however did the trick.

You can use getInputStream (raw bytes) or getReader (decoded character data) to read data from the request. Note how this interacts with reading the parameters. If you don't want to use a servlet, have a look at using a Filter to wrap the request.
I would expect WebSphere to reject the request rather than arbitrarily truncate data. I suspect a bug elsewhere.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Multiple questions in Java (Validating URLs, adding applications to startup, etc.) - java

Related

Combine two http responses into one

Passing Browser Tab information at backend?

Java web crawling - static URLS

How do I send a query to a website and parse the results?

HTTP Request (POST) field size limit & Request.BinaryRead in JSPs

Categories

Resources