original url : http://pricecheckindia.com/go/store/ebay/52440?ref=velusliv
redirected url : http://www.ebay.in/itm/Asus-Zenfone-6-A600CG-A601CG-White-16-GB-/111471688863?pt=IN_Mobile_Phones&aff_source=DA
I need a program that takes the original URL and prints the redirected URL.
How can I do this in Java?
public static void main(String[] args) throws IOException, InterruptedException
{
String url = "http://pricecheckindia.com/go/store/ebay/52440?ref=velusliv";
Response response = Jsoup.connect(url).followRedirects(false).execute();
System.out.println(response.url());
}
It seems that you are being redirected via JavaScript code, which Jsoup doesn't support (it is a simple HTML parser, not a browser emulator). Your options are either to use a tool that supports JavaScript, such as Selenium WebDriver, or to parse the page and take the URL from the "click here" link in the text "If it is taking too long to redirect, then please click here".
You can use Jsoup to get this link by adding the following to your current code:
Document doc = response.parse();
String redirectUrl = doc.select("a:contains(click here)").attr("href");
System.out.println(redirectUrl);
which will return and print
http://rover.ebay.com/rover/1/4686-127726-2357-15/2?&site=Partnership_PRCCHK&aff_source=DA&mpre=http%3A%2F%2Fwww.ebay.in%2Fitm%2FAsus-Zenfone-6-A600CG-A601CG-White-16-GB-%2F111471688863%3Fpt%3DIN_Mobile_Phones%26aff_source%3DDA
So now all we need to do is parse the query of this URL to get the value of the mpre key, whose encoded version looks like
http%3A%2F%2Fwww.ebay.in%2Fitm%2FAsus-Zenfone-6-A600CG-A601CG-White-16-GB-%2F111471688863%3Fpt%3DIN_Mobile_Phones%26aff_source%3DDA
but which, after decoding, actually represents
http://www.ebay.in/itm/Asus-Zenfone-6-A600CG-A601CG-White-16-GB-/111471688863?pt=IN_Mobile_Phones&aff_source=DA
To get the value of this key and decode it, you can use one of the solutions from this question: Parse a URI String into Name-Value Collection. With the help of the splitQuery method from the accepted answer to that question, we can just invoke
URL address = new URL(redirectUrl);
Map<String, List<String>> urlQueryMap = splitQuery(address);
String redirected = urlQueryMap.get("mpre").get(0);
System.out.println(redirected);
to see the result
http://www.ebay.in/itm/Asus-Zenfone-6-A600CG-A601CG-White-16-GB-/111471688863?pt=IN_Mobile_Phones&aff_source=DA
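The splitQuery helper is not reproduced in this answer, so here is a minimal sketch of what such a method could look like. The class name QueryParams and the String parameter (rather than a URL object) are assumptions made for illustration; it is not the code from the linked question. It assumes UTF-8 percent-encoding and needs Java 10 or later for the URLDecoder overload used.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named here only for illustration.
class QueryParams {

    // Splits the raw query part of a URL into decoded name -> values pairs.
    static Map<String, List<String>> splitQuery(String url) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null) {
            return params; // no query part at all
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate a stray '&', as in "?&site=..."
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.computeIfAbsent(decode(name), k -> new ArrayList<>()).add(decode(value));
        }
        return params;
    }

    // Decodes one percent-encoded component, assuming UTF-8.
    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8); // Java 10+
    }
}
```

Splitting on the raw query before decoding matters here: the mpre value itself contains encoded & and = characters, which would otherwise be mistaken for separators.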
If I have an HTML String object, using Selenium in Java, how can I get the browser to open that String as an HTML page? I have seen this done before but I don't remember the format that the URL needs to be in.
For this example, let's say the string is :
<h2>This is a <i>test</i></h2>
I looked through this page and couldn't find the answer but I might be overlooking it. For example I tried this URL and it didn't work for me:
data:<h2>This is a <i>test</i></h2>
The data URI scheme is documented at http://en.wikipedia.org/wiki/Data_URI_scheme. You need to specify the MIME type of the data. Try data:text/html,<h2>This is a <i>test</i></h2>
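For markup that contains characters such as #, % or non-ASCII text, it may be safer to Base64-encode the payload instead of pasting it raw. The following is only a sketch: the class DataUri and the method forHtml are hypothetical names, and the driver mentioned in the comment is assumed to be an already created Selenium WebDriver.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, named here only for illustration.
class DataUri {

    // Wraps an HTML string in a Base64 data: URI with an explicit MIME type.
    static String forHtml(String html) {
        String encoded = Base64.getEncoder()
                .encodeToString(html.getBytes(StandardCharsets.UTF_8));
        return "data:text/html;charset=utf-8;base64," + encoded;
    }

    // Assumed usage with an existing Selenium WebDriver instance:
    //   driver.get(DataUri.forHtml("<h2>This is a <i>test</i></h2>"));
}
```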
I am trying to grab the data from a JSON file through Java. If I navigate to the URL using my browser, everything displays fine, but if I try to get the data using Java I get a bunch of characters that cannot be interpreted or parsed. Note that this code works with other JSON files. Could this be a server-side issue with the way the JSON file is created? I tried different character sets and that did not fix the problem.
public static void main(String[] args) throws Exception {
URL url = new URL("http://www.minecraftpvp.com/api/ping.json");
URLConnection connection = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
boolean hasLine = true;
while (hasLine) {
String line = in.readLine();
if (line != null) {
System.out.println(line);
} else {
hasLine = false;
}
}
}
The output I get from this is just a ton of strange characters that make no sense at all. If I change the URL to something like google.com, it works fine.
EDIT: "JSON URL from StackExchange API returning jibberish?" seems to have answered my question. I searched before asking to make sure the answer wasn't already here and couldn't find anything. Guess I didn't look hard enough.
Yes, that URL is returning gzip-encoded content by default.
You can do one of three things:
1. Explicitly set the Accept-Encoding header in your request. A web service should not return gzip-compressed content unless gzip is listed as an accepted encoding in the request, so this website is not being very friendly. I suspect your browser lists it as accepted, which is why the content displays correctly there. Set the header to an empty value and, per the spec, the server should return non-encoded responses; your mileage may vary on this one.
2. Or use the answer to "How to handle non-UTF8 html page in Java?", which shows how to decompress the response. This should be the preferred option over #1.
3. And/or ask the person hosting the service to implement the recommended scheme, which is to provide compressed responses only if the client says it can handle them or if that can be inferred from the browser fingerprint with high confidence.
Best of luck C.
You need to inspect the Content-Encoding header. The URL in question improperly returns gzip-compressed content even when you don't ask for it, and you'll need to run it through a decoder.
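A minimal sketch of that decoding step follows. The class GzipAwareReader and its method names are hypothetical, the body is assumed to be UTF-8 text, and InputStream.readAllBytes needs Java 9 or later. With the connection from the question, it would be called as readBody(connection.getInputStream(), connection.getContentEncoding()).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, named here only for illustration.
class GzipAwareReader {

    // Reads the whole body as UTF-8 text, unwrapping gzip when the
    // Content-Encoding header value says the body is compressed.
    static String readBody(InputStream body, String contentEncoding) {
        try (InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(body)
                : body) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compresses text in memory; only here so the sketch can be tried offline.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```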
I have two websites, A and B. When I open website A, I am redirected to website B automatically.
Which method can I use on website B to get the full path of the page on website A that I was redirected from?
I was trying to start with:
logger.info(request.getPathInfo());
logger.info(request.getPathTranslated());
logger.info(request.getServletPath());
logger.info(request.getLocalName());
logger.info(request.getRemoteAddr());
logger.info(request.getRemoteHost());
logger.info(request.getRequestURI());
logger.info(request.getServerName());
but none of them returns the right value.
For redirecting I use response.sendRedirect inside the controller.
Thanks for help.
You can try using the optional referer header:
request.getHeader("referer");
But it is important to note that this header may not always be populated (Internet Explorer in particular may omit it).
A better solution, if you are in control of both of the websites, is to pass the value somehow when you are doing the redirect. For example, as a GET or POST parameter.
Edit:
As suggested above, you can append query strings to your redirect URL. For example, you might try something like this:
String redirectUrl = "http://my.redirect.com/";
redirectUrl += "?referer=";
redirectUrl += URLEncoder.encode(request.getRequestURL().toString(), "UTF-8");
Then you can just pull this out of the request on the other side.
Use this as a starting point. You may need to manually append other query parameters that may not be part of the getRequestURL() output.
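Putting both sides together, a sketch under stated assumptions: the class RefererParam and its methods are hypothetical names, plain strings stand in for the servlet calls request.getRequestURL().toString() and request.getQueryString(), and the Charset overloads of URLEncoder and URLDecoder need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named here only for illustration.
class RefererParam {

    // getRequestURL() omits the query string, so re-attach it when present.
    static String fullUrl(String requestUrl, String queryString) {
        return queryString == null ? requestUrl : requestUrl + "?" + queryString;
    }

    // Site A: build the redirect target with the original URL as a parameter.
    static String buildRedirectUrl(String target, String originalUrl) {
        return target + "?referer=" + URLEncoder.encode(originalUrl, StandardCharsets.UTF_8);
    }

    // Site B: decode a raw parameter value. In a servlet,
    // request.getParameter("referer") already returns the decoded value.
    static String decodeReferer(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
    }
}
```

The result of buildRedirectUrl is what would be passed to response.sendRedirect on website A.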
None of these would get you the page that redirected you to the current page. What you can try is:
String refererPage = request.getHeader("referer");
However keep in mind that this is also browser dependent and may not always be present.
If I have a general url (not restricted to twitter or google) like this:
http://t.co/y4o14bI
is there an easy way to check if this url is shortened?
In the above case, I as a human can of course see that it was shortened, but is there an automatic and elegant way?
You could do a request to the URL, look if you get redirected and if so, assume it's a shortening service. For this you'd have to read the HTTP status codes.
On the other hand, you could whitelist some URL shortening services (t.co, bit.ly, and so on) and assume all links to those domains are shortened.
The drawback of the first method is that it isn't certain, because some sites use redirects internally. The drawback of the second method is that you'd have to keep adding shortening services, although only a few are used widely.
One signal may be to request the URL and see if it results in a redirect to another domain. However, without a good definition of what "shortened" means, there is no generic way.
If you know all the domains that can be used to shorten your URLs, check whether the URL starts with one of them:
String[] domains = {"bit.ly", "t.co" /* ... */};
for(String domain : domains){
if(url.startsWith("http://" + domain)){
return true;
}
}
return false;
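A variant of the same whitelist idea compares the parsed host instead of a string prefix, so that https links match and a domain such as t.com is not mistaken for t.co. This is only a sketch: the class ShortenerList is a hypothetical name, the set holds just the two domains named above, and Set.of needs Java 9 or later.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, named here only for illustration.
class ShortenerList {

    // Assumed whitelist; extend it with the services you care about.
    static final Set<String> KNOWN = Set.of("bit.ly", "t.co"); // Java 9+

    // True when the URL's host is exactly one of the known shortener domains.
    static boolean isKnownShortener(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URL
        }
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        return KNOWN.contains(host);
    }
}
```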
You can't; you will have to work from assumptions.
Assumptions:
Does www exist in the URL?
Does the server name end with a valid domain (e.g. com, edu, etc.), or does it have the form co.xx, where xx is a valid country or organization code?
You can add more assumptions based on other URL-shortening links.
You can't.
You can only check if you list a couple of shorteners and check if the url starts with it.
You can also try checking whether the url is shorter than a given length (and contains path/query string), but some shorteners (tinyurl for example) may have longer urls than normal sites (aol.com)
I would prefer the list of known shorteners.
Here's what you could do in Java, groovy and the like.
Get the url you want to test;
Open the url with HttpURLConnection
Check the response code
If it is a valid code, 200 for example, then you can retrieve the URL string from the connection object: in long form if it was shortened, or in its original form if it wasn't.
We all love to see some code, don't we? It's crude, but hey!
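String addr = "http://t.co/y4o14bI";
URL url = new URL(addr);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// Redirects are followed by default (though not across protocols, e.g.
// http to https), so after a 200 response getURL() holds the final address.
if (connection.getResponseCode() == 200) {
String longUrl = connection.getURL().toString();
System.out.println(longUrl);
} else {
// You decide what you want to do here!
}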
Actually, you as a human can't. The only way you know that it's shortened is that it's a t.co domain. The y4o14bI could be a CMS identifier for all you know.
The best way would be to use a list of known shortener urls, and lookup against that.
And even then you would have problems. I use bit.ly with a personal domain, wtn.gd
So http://wtn.gd/random would also be a shortened URL.
You could maybe do an HTTP HEAD request and check for a 301/302?
If you request a URL like this, your HTTP client should receive an HTTP redirect instead of an HTML page. This wouldn't be proof, but at least a hint.
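The HEAD-request check can be sketched as follows. The class RedirectProbe and its method names are hypothetical, and treating "redirects to a different host" as the signal is an assumption taken from the suggestions in this thread, not a reliable definition of a shortened URL. The decision logic is kept separate from the network call so that it can be tried without a connection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, named here only for illustration.
class RedirectProbe {

    static boolean isRedirectStatus(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // True when the response is a redirect whose Location points at another host.
    // Assumes the Location value is a syntactically valid URI.
    static boolean redirectsToOtherHost(String originalUrl, int status, String location) {
        if (!isRedirectStatus(status) || location == null) {
            return false;
        }
        URI original = URI.create(originalUrl);
        URI target = original.resolve(location); // handles relative Location values
        String from = original.getHost();
        String to = target.getHost();
        return from != null && to != null && !from.equalsIgnoreCase(to);
    }

    // Sends a HEAD request without following redirects and applies the check.
    static boolean probe(String url) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestMethod("HEAD");
        connection.setInstanceFollowRedirects(false);
        try {
            return redirectsToOtherHost(url, connection.getResponseCode(),
                    connection.getHeaderField("Location"));
        } finally {
            connection.disconnect();
        }
    }
}
```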
Evaluate the URL and look for some clues:
the Path meets certain criteria
only has one step (i.e. not multiple slashes)
does not end with filename extensions
not longer than X characters (would need to evaluate various URL shortening services and adjust the upper bounds for the max token length)
HttpUrlConnection returns a redirect responseCode (i.e. 301, 302)
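The path-based clues in this list can be sketched as a single check. The class ShortUrlHeuristic is a hypothetical name and the upper bound of 10 characters is an assumed value that would need tuning against real shortening services. Ordinary short paths such as /about also pass, so this is a hint to combine with the redirect check rather than a test on its own.

```java
import java.net.URI;

// Hypothetical helper, named here only for illustration.
class ShortUrlHeuristic {

    // Assumed upper bound for a shortener token; adjust per service.
    static final int MAX_TOKEN_LENGTH = 10;

    // True when the path is a single short segment without a file extension.
    // Throws IllegalArgumentException if the input is not a valid URI.
    static boolean pathLooksShortened(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.length() < 2 || !path.startsWith("/")) {
            return false; // no path, or just "/"
        }
        String token = path.substring(1);
        return !token.contains("/")      // only one step
                && !token.contains(".")  // no filename extension
                && token.length() <= MAX_TOKEN_LENGTH;
    }
}
```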
I would suggest using android.util.Patterns.WEB_URL
public static List<String> findUrls(String input) {
List<String> links = new ArrayList<>();
Matcher m = android.util.Patterns.WEB_URL.matcher(input);
while (m.find()) {
String url = m.group();
links.add(url);
}
return links;
}
Use the unshorten URL service like https://unshorten.me
They have an API as well https://unshorten.me/api
If the URL is shortened it will return the original URL.
If not you will get the same one back.