Diffbot URL encode - java

I have a problem with URL encoding when calling the Diffbot API.
I have a URL, and I pass it to the Diffbot API like this:
//JsonNode json= (JsonNode)client.analyze(DiffbotClient.ResponseType.Jackson,url);
But I get an error message about URL encoding. This is the error message:
{"errorCode":500,"error":"URL encoding"}
So I changed my code like this:
//JsonNode json= (JsonNode) client.analyze(DiffbotClient.ResponseType.Jackson,u.getHost()+u.getPath()+URLEncoder.encode("?"+u.getQuery(),"UTF-8"));
But that doesn't work either, and Diffbot returns this:
{"errorCode":500,"error":"Error."}
What kind of encoding does the Diffbot API expect?

You're supposed to encode only the URL whose contents you're processing with Diffbot, not the entire API string. For example, replace {{token}} below with your own token and visit the URL in the browser. It will work.
Use this as inspiration to build your own URL for the API call:
http://api.diffbot.com/v3/article?token={{token}}&url=http%3A%2F%2Fwww.sitepoint.com%2Fdiffbot-crawling-visual-machine-learning%2F
As you can see, only the url query param is encoded, and it's no special encoding, it's just standard URL percent-encoding (the same thing URLEncoder.encode produces in Java).
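A minimal sketch of building that API call in Java, using only the JDK. The endpoint and the SitePoint URL are taken from the example above; the class name, the method name, and the YOUR_TOKEN placeholder are made up for illustration. It does not assume anything about how the DiffbotClient wrapper from the question handles encoding internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class DiffbotUrlSketch {
    // Endpoint copied from the example URL in the answer.
    static final String ENDPOINT = "http://api.diffbot.com/v3/article";

    // Encodes only the parameter values, never the API URL as a whole.
    static String buildApiUrl(String token, String targetUrl) {
        try {
            return ENDPOINT
                    + "?token=" + URLEncoder.encode(token, "UTF-8")
                    + "&url=" + URLEncoder.encode(targetUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // YOUR_TOKEN is a placeholder, not a real token.
        System.out.println(buildApiUrl("YOUR_TOKEN",
                "http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/"));
    }
}
```

With the placeholder token this prints the same shape as the URL shown above, with the target URL percent-encoded and the rest of the API string left alone.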

Related

How to use a Url having special characters in HttpGet(URL) in java

I am using HttpClient, and it works fine for any URL without special characters.
But when I send a URL that contains special characters, the request fails.
I tried the URL API, but it is deprecated.
I also tried encoding with UTF-8, but that did not work either.
Can you suggest a simple way of making the HttpGet call for the URL below?
http://example.com/?status!~^(notdeleted|presesnt)$&env~check_test
String link = "http://example.com/?"
+ URLEncoder.encode("status!~^(notdeleted|presesnt)$&env~check_test", "UTF-8");
If the & is meant to introduce the next URL parameter, encode the two parts around it separately and join them with a literal &.
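The two-part variant could look like the sketch below. It assumes the & in the question's URL really is a parameter separator; the class and helper names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryPartsSketch {
    // Percent-encodes one query component as UTF-8.
    static String enc(String part) {
        try {
            return URLEncoder.encode(part, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encodes each side of the separator on its own, so the '&' stays literal.
    static String buildLink() {
        return "http://example.com/?"
                + enc("status!~^(notdeleted|presesnt)$")
                + "&"
                + enc("env~check_test");
    }

    public static void main(String[] args) {
        System.out.println(buildLink());
    }
}
```

This prints http://example.com/?status%21%7E%5E%28notdeleted%7Cpresesnt%29%24&env%7Echeck_test, which can then be passed to new HttpGet(...). Encoding the whole query in one call, as in the snippet above, would turn the & into %26 as well.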

URL percent encoding query param Bing API Java

I'm trying to URL percent encode my query param value while using URIBuilder to make an HTTP request to Bing API.
The url looks like
"https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/v1/Web?$format=json&Query="
Where the Query String must be like
%27Test%20query%27
URLEncoder.encode(string, code) turns a string such as "test query" into "test+query", which is unacceptable.
URIUtil.encodeQuery()
returns "test%20query", which is almost acceptable, except that it needs the %27 at the beginning and end.
When I try to just concatenate the string to make it valid as such, and then load this into URIBuilder, URIBuilder ends up with
https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/v1/Web?%24format=json&Query=%2527test%2520query%2527
which is again unacceptable.
How can I remedy this issue? It's driving me insane.
Thanks for any help.
That is the encoded URI:
$ is %24
a blank (space) is %20
If you want the original URI, you need to decode it.
I think the decode method will work well for you.
Reference:
http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/util/URIUtil.html
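For context, the %2527 in the question is what appears when an already-encoded value is encoded a second time: the % itself becomes %25. One way to avoid that, sketched here with the JDK only, is to encode the raw value exactly once and then hand the finished string to something that does not encode again. The class and method names are hypothetical, and this sidesteps URIBuilder rather than configuring it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

final class PercentEncodeSketch {
    // Base URL copied from the question.
    static final String BASE =
            "https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/v1/Web?$format=json&Query=";

    // URLEncoder writes spaces as '+'; a literal '+' in the input is already
    // written as %2B, so replacing the remaining '+' with %20 is safe.
    static String encodeQueryValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // URI.create parses the string as given and does not encode it again.
    static URI buildUri(String query) {
        return URI.create(BASE + encodeQueryValue("'" + query + "'"));
    }

    public static void main(String[] args) {
        System.out.println(encodeQueryValue("'Test query'")); // %27Test%20query%27
        System.out.println(buildUri("Test query"));
    }
}
```

The key point is to pass the raw value, quotes and spaces included, through a single encoding step, and never feed an already-encoded string to a builder that encodes its input.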

url encoding skipping the fqdn

I have a question regarding URL encoding. I am trying to encode a URL and cannot get it working. I tried java.net.URLEncoder.
I have the URL http://msnbcmedia4.msn.com/i/MSNBC/Components/Photo/_new/130409_luke hancock.jpg and I need to encode it. From online forums, my understanding is that I should encode only the query parameters and the URL path, excluding the FQDN (http://msnbcmedia4.msn.com). Do I need to encode the / in the URL path and the ? and & in the parameters, or should I skip encoding these? I am trying to download the content from this location using Java. Any info would be appreciated.
URLEncoder is the right choice. You need to encode only the individual query string parameter names and values, not the entire URL. If you encode the whole URL, it will also encode the http:// scheme and other URL parts, which we don't want.
Check out this awesome answer >> https://stackoverflow.com/a/10786112/2093375
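One caveat for the URL in this question: the space sits in the path, not in a query parameter, and URLEncoder would write it as +, which is a form-encoding convention rather than a path escape. A possible alternative for the path, sketched below, is the multi-argument java.net.URI constructor, which quotes illegal characters such as the space and leaves the / separators alone. The class and method names are made up for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PathEncodeSketch {
    // Builds a URI from unencoded parts; the constructor percent-encodes
    // characters that are not legal in a path, such as the space.
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePath("http", "msnbcmedia4.msn.com",
                "/i/MSNBC/Components/Photo/_new/130409_luke hancock.jpg"));
    }
}
```

This prints http://msnbcmedia4.msn.com/i/MSNBC/Components/Photo/_new/130409_luke%20hancock.jpg, with the host and the slashes untouched.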

Returned URL as String is not valid in JSF

I'm trying to use the Google API for text-to-speech. So I build a String and then pass it as a URL to a component to obtain an MP3 with the spoken words.
So, this is my code:
URI uri = new URI("http://translate.google.com/translate_tts?tl=es&q="+ URLEncoder.encode((String)this.text.getValue(), "UTF-8"));
When I call uri.toString(), it returns a well-formed URL. If I copy and paste this output into the browser, it works perfectly.
But if I assign the returned String to the source property of an ice:outputMedia, it does not work. When I inspect the HTML generated in the page, the String in the src property is:
http://translate.google.com/translate_tts?tl=es&amp;q=Bobby+need+peanuts
The & symbol has been replaced by &amp;.
How can I avoid this and get a valid URL?
You need to decode the URL on the client side using JavaScript.
var decoded = decodeURI(URI);

response.sendredirect with url with foreign chars - how to encode?

I have a JSF app with international users, so form inputs can contain non-Western strings such as kanji and Chinese. If I hit my URL with ..?q=東日本大, the output on the page is correct and the q input in my form is populated fine. But if I enter that same string into my form and submit, my app redirects back to itself after constructing the URL with the populated parameters (this seems redundant, but it is due to a third-party integration), and the redirect does not encode the string properly. I have
url = new String(url.getBytes("ISO-8859-1"), "UTF-8");
response.sendRedirect(url);
But the redirect URL ends up being q=????. I've played around with various encoding strings in the String constructor (I switched around ISO and UTF-8 and just got a bunch of gibberish in the URL), but none of them gets me q=東日本大. Any ideas as to what I need to do to get q=東日本大 populated in the redirect properly? Thanks.
How are you making your URL? URIs can't directly contain non-ASCII characters; they have to be turned into bytes (using a particular encoding) and then %-encoded.
URLEncoder.encode should be given an encoding argument, to ensure this is the right encoding. Otherwise you get the default encoding, which is probably wrong and always to be avoided.
String q= "\u6771\u65e5\u672c\u5927"; // 東日本大
String url= "http://example.com/query?q="+URLEncoder.encode(q, "utf-8");
// http://example.com/query?q=%E6%9D%B1%E6%97%A5%E6%9C%AC%E5%A4%A7
response.sendRedirect(url);
This URI will display as the IRI ‘http://example.com/query?q=東日本大’ in the browser address bar.
Make sure you're serving your pages as UTF-8 (using Content-Type header/meta) and interpreting query string input as UTF-8 (server-specific; see this faq for Tomcat.)
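To see where the q=???? in the question comes from, the sketch below reproduces the lossy round trip with the JDK only. ISO-8859-1 has no bytes for these characters, so each one is replaced by ? before the UTF-8 decoding step ever runs. The class and method names are invented for the example, and the servlet call is left out so the code runs on its own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class RedirectEncodingSketch {
    // Mirrors the question's conversion: unmappable characters become '?'.
    static String lossyRoundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Mirrors the answer: percent-encode the value as UTF-8 bytes instead.
    static String redirectUrl(String q) {
        try {
            return "http://example.com/query?q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String q = "\u6771\u65e5\u672c\u5927"; // 東日本大
        System.out.println(lossyRoundTrip(q)); // ????
        System.out.println(redirectUrl(q));
    }
}
```

The second line printed is the percent-encoded URL from the answer, which is the string that would be handed to response.sendRedirect.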
Try
response.setContentType("text/html; charset=UTF-16");
response.setCharacterEncoding("utf-16");
