How to detect encoding when I'm using BufferedReader - Java

I know this question has been asked many times, but I'm stuck on this problem and nothing I've read has helped.
I have this code:
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while((line = reader.readLine()) != null)content += line+"\r\n";
reader.close();
I'm trying to get the content of this web page, http://www.garazh.com.ua/tires/catalog/Marangoni/E-COMM/description/, and all non-Latin characters are displayed incorrectly.
I tried setting the encoding like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), "WINDOWS-1251"));
At that point everything worked! But I can't hardcode the encoding for each website I try to parse, so I need a general solution.
I know that detecting the encoding is not as easy as it seems, but I really need it. If anyone has had this problem, please explain how you solved it!
Any help is appreciated!
This is the entire code of the function I'm using to get the content:
protected Map<String, String> getFromUrl(String url){
Map<String, String> mp = new HashMap<String, String>();
String newCookie = "", redirect = null;
try{
String host = this.getHostName(url), content = "", header = "", UA = this.getUA(), cookie = this.getCookie(host, UA), referer = "http://"+host+"/";
URL U = new URL(url);
URLConnection conn = U.openConnection();
conn.setRequestProperty("Host", host);
conn.setRequestProperty("User-Agent", UA);
conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
conn.setRequestProperty("Accept-Language", "ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3");
conn.setRequestProperty("Accept-Encoding", "gzip,deflate");
conn.setRequestProperty("Accept-Charset", "utf-8;q=0.7,*;q=0.7");
conn.setRequestProperty("Keep-Alive", "115");
conn.setRequestProperty("Connection", "keep-alive");
if(referer != null)conn.setRequestProperty("Referer", referer);
if(cookie != null && !cookie.contentEquals(""))conn.setRequestProperty("Cookie", cookie);
for(int i=0; ; i++){
String name = conn.getHeaderFieldKey(i);
String value = conn.getHeaderField(i);
if(name == null && value == null)break;
else if(name != null)if(name.contentEquals("Set-Cookie"))newCookie += value + " ";
else if(name.toLowerCase().trim().contentEquals("location"))redirect = value;
header += name + ": " + value + "\r\n";
}
if(!newCookie.contentEquals("") && !newCookie.contentEquals(cookie))this.setCookie(host, UA, newCookie.trim());
try{
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while((line = reader.readLine()) != null)content += line+"\r\n";
reader.close();
}
catch(Exception e){/*System.out.println(url+"\r\n"+e);*/}
mp.put("url", url);
mp.put("header", header);
mp.put("content", content);
}
catch(Exception e){
mp.put("url", "");
mp.put("header", "");
mp.put("content", "");
}
if(redirect != null && this.redirectCount < 3){
mp = getFromUrl(redirect);
this.redirectCount++;
}
return mp;
}

Use jsoup, for example. Detecting the character encoding of an arbitrary website is a complex issue, because headers can be wrong or missing and there are 2 different meta tags that can declare the charset. For example, the page you linked doesn't send the charset in its Content-Type header.
And you're going to need an HTML parser anyway; you weren't thinking of going with a regex, were you?
Here's an example usage:
Connection connection = Jsoup.connect("http://www.garazh.com.ua/tires/catalog/Marangoni/E-COMM/description/");
connection
.header("Host", host)
.header("User-Agent", UA)
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.header("Accept-Language", "ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3")
.header("Accept-Encoding", "gzip,deflate")
.header("Accept-Charset", "utf-8;q=0.7,*;q=0.7")
.header("Keep-Alive", "115")
.header("Connection", "keep-alive");
connection.followRedirects(true);
Document doc = connection.get();
Map<String, String> cookies = connection.response().cookies();
Elements titles = doc.select(".title");
for( Element title : titles ) {
System.out.println(title.ownText());
}
Output:
Шины Marangoni E-COMM
Описание шины Marangoni E-COMM
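If adding jsoup is not an option, the meta-tag part of the detection can be approximated by hand. The following is only a minimal sketch and not what jsoup does internally: the class name MetaCharsetSniffer, the 4096-byte window and the regular expression are assumptions made for illustration. It looks for the two meta forms mentioned above and otherwise returns a caller-supplied fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or the JDK.
final class MetaCharsetSniffer {
    // Covers <meta charset="x"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Scans the first bytes of the raw body for a declared charset.
    static Charset sniff(byte[] body, Charset fallback) {
        // Assumed window size; declarations normally sit near the top of the page.
        int len = Math.min(body.length, 4096);
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives intact.
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: fall through to the fallback.
            }
        }
        return fallback;
    }
}
```

To use it, the response would have to be read as raw bytes first and decoded afterwards, for example with new String(bytes, MetaCharsetSniffer.sniff(bytes, StandardCharsets.UTF_8)).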

You want to look for the 'Content-Type' header:
Content-Type: text/html; charset=utf-8
The "charset" part there is what you're looking for.
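As a rough sketch of how that header value could be turned into a Charset: the class and method names below are hypothetical, and the fallback is whatever default the caller considers acceptable when the server sends no charset (as the page in the question does).

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=utf-8"; returns the fallback if it is missing or unusable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop optional quotes around the value.
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With a URLConnection named conn, this could be wired in as new InputStreamReader(conn.getInputStream(), ContentTypeCharset.fromContentType(conn.getContentType(), StandardCharsets.UTF_8)).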

Related

How to create a http response for images in java?

I have been trying to create a simple Java web server. Everything works fine for files such as HTML or CSS, but I'm unable to send image responses correctly. The problem is obviously with the image data that I'm sending, but I'm not sure how to do it properly. I have been searching for information about it for a long time now and I just can't find anything useful that would fix my problem.
Part of my code:
public void Send(String path) throws IOException {
File file = new File(path);
if(file.exists()) {
if(!file.isDirectory()) {
if(isImage(file)) {
InputStream is = new FileInputStream(file);
byte[] bytes = IOUtils.toByteArray(is);
String response = "HTTP/1.1 200 OK" + CRLF + "Content-Length: " + bytes.length + CRLF;
response += "content-type: image/jpeg" + CRLF + CRLF;
outputStream.write(response.getBytes());
outputStream.write(bytes);
outputStream.write((CRLF + CRLF).getBytes());
outputStream.flush();
} else {
String data = "";
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null) {
data += st;
}
int length = data.getBytes().length;
String response = "HTTP/1.1 200 OK" + CRLF + "Content-Length: " + length + CRLF;
response += CRLF + data + CRLF + CRLF;
br.close();
outputStream.write(response.getBytes());
}
return;
}
}
SendError("404 Not Found");
}
outputStream is the OutputStream from a Socket.
I saw this but I think I'm only using streams at least for the image part.
I'm new to this so any help would be appreciated!
EDIT (more information):
Browser information:
Headers
Preview
The isImage(file) method works fine (I have tested it), but here it is:
private boolean isImage(File file) {
String mimetype = new MimetypesFileTypeMap().getContentType(file);
String type = mimetype.split("/")[0];
return type.equals("image");
}
And the image is 2.jpg
EDIT 2
I wrote this code to write the content of the array in a text file:
String out = "";
for(int i = 0; i < bytes.length; i++) {
if(i%16 == 0) {
out += "\n";
}
out += String.format("%02X ", bytes[i]);
}
BufferedWriter writer = new BufferedWriter(new FileWriter("test.txt"));
writer.write(out);
writer.close();
So I checked the start of both the image and the array and they seem to be identical.
Start of the image data
Start of the array
After that I tried to create a client for testing:
private static void Get2(String link) throws IOException {
URL url = new URL(link);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("GET");
con.setRequestProperty("Connection", "keep-alive");
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/85.0.564.68");
con.setRequestProperty("Accept", "image/webp,image/apng,image/*,*/*;q=0.8");
con.setRequestProperty("Sec-Fetch-Site", "same-origin");
con.setRequestProperty("Sec-Fetch-Mode", "no-cors");
con.setRequestProperty("Sec-Fetch-Dest", "image");
con.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
con.setRequestProperty("Accept-Language", "sl,en;q=0.9,en-GB;q=0.8,en-US;q=0.7");
con.setConnectTimeout(5000);
con.setReadTimeout(5000);
con.setInstanceFollowRedirects(false);
int status = con.getResponseCode();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer content = new StringBuffer();
int i = 0;
while ((inputLine = in.readLine()) != null) {
if(i < 5) {
System.out.println(inputLine);
} else {
content.append(inputLine);
}
i++;
}
in.close();
con.disconnect();
BufferedWriter writer = new BufferedWriter(new FileWriter("test2.txt"));
writer.write(content.toString());
writer.close();
}
I called the function: Get2("http://localhost:8080/images/2.jpg");
The data was saved in test2.txt. Inside I saw some parts that looked similar, but something is clearly wrong with it. I'm not sure whether I'm using this test client correctly, so if I'm doing something wrong or should be using something else, let me know.
Image (left test2.txt, right test.txt)
Thanks to everyone who has already helped or will help with suggestions.
I finally figured it out. Actually my bad for not providing everything.
String CRLF = "\n\r";
But apparently, it should only be \n.
I read somewhere that Windows automatically adds \r after \n. I don't know if that's true, but removing \r fixed my problem: before, I had 2 empty lines right after GET / HTTP/1.1, so the other content was considered part of the data.
As soon as I changed that everything worked fine.
Again thanks for your help!
EDIT
Never mind. What I actually got wrong was the order of \n and \r. It should be \r\n, not \n\r.
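To make the fix concrete, here is a minimal sketch of assembling the image response with the correct line terminator. The class name ImageResponse and the method build are hypothetical. HTTP header lines end with \r\n, one empty line ends the header block, and the body is exactly Content-Length bytes long, so nothing is appended after the image bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class ImageResponse {
    // HTTP line terminator: carriage return first, then line feed.
    private static final String CRLF = "\r\n";

    // Builds status line, headers, the empty separator line and the raw body.
    static byte[] build(byte[] body, String contentType) {
        String head = "HTTP/1.1 200 OK" + CRLF
                + "Content-Type: " + contentType + CRLF
                + "Content-Length: " + body.length + CRLF
                + CRLF; // empty line that ends the header block
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] response = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, response, 0, headBytes.length);
        // The body is copied as-is; no trailing CRLF, since that would exceed Content-Length.
        System.arraycopy(body, 0, response, headBytes.length, body.length);
        return response;
    }
}
```

In the Send method from the question this would be used as outputStream.write(ImageResponse.build(bytes, "image/jpeg")); followed by outputStream.flush();.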

How to set parameters in a GET request in Java

I want to send a GET request with parameters, but the API only seems to let me set the URL the request is sent to. Unlike with a POST request, I see no way to pass parameters.
This is how I send the GET request now, without parameters (it might be wrong):
String url = "http://api.netatmo.net/api/getuser";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
// optional default is GET
con.setRequestMethod("GET");
//add request header
con.setRequestProperty("User-Agent", USER_AGENT);
int responseCode = con.getResponseCode();
Log.v(TAG, ("\nSending 'GET' request to URL : " + url));
Log.v(TAG, ("Response Code : " + responseCode));
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
//print result
Log.v(TAG, (response.toString()));
How I send the POST request with parameters:
String url = "https://api.netatmo.net/oauth2/token";
URL obj = new URL(url);
HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();
//add request header
con.setRequestMethod("POST");
con.setRequestProperty("User-Agent", USER_AGENT);
con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
String urlParameters = "grant_type=password&client_id=myid&client_secret=mysecret&username=myusername&password=mypass";
// Send post request
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(urlParameters);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
Log.v(TAG, "\nSending 'POST' request to URL : " + url);
Log.v(TAG, "Post parameters : " + urlParameters);
Log.v(TAG, "Response Code : " + responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
//print result
Log.v(TAG, response.toString());
access_token = response.substring(17, 74);
refresh_token = response.substring(93,150);
getRequest = "/api/getuser?access_token=" + access_token + " HTTP/1.1";
Log.v(TAG, access_token);
Log.v(TAG, refresh_token);
Log.v(TAG, getRequest);
Per the HTTP specification, GET parameters are passed in the URL (as path or query parameters), so you cannot put them in the request body as you do with a POST request.
As Sotirios mentioned in the comments, technically you can still send parameters in a GET body, but APIs that respect the spec will not give you a way to do it.
Have you tried adding the query parameters to the request URL (java.net.URL)?
String url = "http://api.netatmo.net/api/getuser?access_token=" + access_token;
URL obj = new URL(url);
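When there are several parameters, or values that may contain reserved characters, the query string can be built with URLEncoder. This is a sketch under assumptions: the class name QueryStrings and the method withParams are made up for illustration, and form-style encoding (space becomes +) is assumed to be acceptable to the API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper for illustration only.
final class QueryStrings {
    // Appends URL-encoded key=value pairs to baseUrl as a query string.
    static String withParams(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        // Start with '?' unless the URL already carries a query.
        char separator = baseUrl.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be supported on every Java platform.
            throw new IllegalStateException(ex);
        }
    }
}
```

The resulting string can be passed to new URL(...) exactly as in the snippet above.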
I was encountering the same problem, trying this:
String bla = "http://api.netatmo.net/api/devicelist?access_token=" + AUTH_TOKEN;
URL url = new URL(bla);
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
String line = "";
String message = "";
while ((line = reader.readLine()) != null)
{
message += line;
}
I got an exception saying that the syntax was not correct. When I changed the syntax (for example, by encoding it as UTF-8), the API would just return errors (such as 404 Not Found).
I finally got it working using this:
try
{
System.out.println("Access Token: " + AUTH_TOKEN);
String url = "http://api.netatmo.net/api/devicelist";
String query = "access_token=" + URLEncoder.encode(AUTH_TOKEN, CHARSET);
URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept-Charset", CHARSET);
InputStream response = connection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(response));
String line = "";
String message = "";
while ((line = reader.readLine()) != null)
{
message += line;
}
return message;
} catch (MalformedURLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Note: CHARSET = "UTF-8"
Turns out the url the API provided confused me greatly. I fixed the url and it works now.

Java HttpURLConnection no cookies at login

I'm currently trying to log in with HttpURLConnection and then get the session cookies.
I already tried this on some test pages on my own server, and it works perfectly. When I send a=3&b=5, I get 8 as the cookie (the PHP page adds the two together)!
But when I try it on the other page, the output is just the page as if I had sent nothing with POST :(
General suggestions for improvement are welcome! :)
My Code:
HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
conn.setDoInput(true);
conn.setDoOutput(true);
conn.setRequestMethod("POST");
conn.setRequestProperty("useragent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
conn.setRequestProperty("Connection", "keep-alive");
DataOutputStream out = new DataOutputStream(conn.getOutputStream());
out.writeBytes("USER=tennox&PASS=*****");
out.close();
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
String response = new String();
while ((line = in.readLine()) != null) {
response = response + line + "\n";
}
in.close();
System.out.println("headers:");
int i = 0;
String header;
while ((header = conn.getHeaderField(i)) != null) {
String key = conn.getHeaderFieldKey(i);
System.out.println(((key == null) ? "" : key + ": ") + header);
i++;
}
String cookies = conn.getHeaderField("Set-Cookie");
System.out.println("\nCookies: \"" + cookies + "\"");
The cookie should initially be set with ; Path=/, and the cookie then also has to be sent back in the Cookie request header of the POST.
Better rewrite it all with an HttpClient.
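An alternative that stays with HttpURLConnection, offered here as a swap-in and not as what this answer proposes, is the JDK's java.net.CookieManager. Once it is installed as the default CookieHandler, HttpURLConnection stores cookies from responses and sends them back on later requests. The class name SessionCookies and the ACCEPT_ALL policy are assumptions for this sketch.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Hypothetical helper for illustration only.
final class SessionCookies {
    // Installs a JVM-wide in-memory cookie store used by HttpURLConnection.
    static CookieManager install() {
        // null store = default in-memory store; ACCEPT_ALL is an assumption,
        // a stricter policy may be more appropriate.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }
}
```

Calling SessionCookies.install() once before opening the first connection should be enough; the cookies received so far can be inspected through manager.getCookieStore().getCookies().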

sending post with Java

I have a bash script in which I log in to a web page and then parse the HTML. The command I use is wget:
wget --save-cookies=cookies.txt --post-data "uid=USER&pass=PWD" http://www.spanishtracker.com/login.php
wget --load-cookies=cookies.txt "http://www.spanishtracker.com/torrents.php" -O OUTPUT
Now I'm trying to do the same in Java. First of all, I'm trying to POST the request, but when I run it, the output doesn't look as if I were logged in. This is the Java code:
try {
data = URLEncoder.encode("uid", "UTF-8") + "=" + URLEncoder.encode("USER", "UTF-8");
data += "&" + URLEncoder.encode("pass", "UTF-8") + "=" + URLEncoder.encode("PASS", "UTF-8");
// Send the request
URL url = new URL("http://www.spanishtracker.com/index.php");
URLConnection conn = url.openConnection();
conn.setDoOutput(true);
OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream());
//write parameters
writer.write(data);
writer.flush();
// Get the response
StringBuffer answer = new StringBuffer();
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
answer.append(line);
}
writer.close();
reader.close();
// temporary to build request cookie header
StringBuilder sb = new StringBuilder();
// find the cookies in the response header from the first request
List<String> cookies = conn.getHeaderFields().get("Set-Cookie");
if (cookies != null) {
System.out.println("Hay cookies para guardar");
for (String cookie : cookies) {
if (sb.length() > 0) {
sb.append("; ");
}
// only want the first part of the cookie header that has the value
String value = cookie.split(";")[0];
sb.append(value);
}
}
Could you help me, please?
Many thanks, and sorry for my English!
Use the Apache HttpClient library.
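If staying with URLConnection, the cookie-collecting loop from the question can be factored into a small helper whose result is sent on the second request, mirroring wget's --load-cookies step. This is only a sketch; the class name CookieHeader and the method fromSetCookie are hypothetical.

```java
import java.util.List;

// Hypothetical helper for illustration only.
final class CookieHeader {
    // Turns Set-Cookie response values into one Cookie request header value,
    // keeping only the leading name=value pair of each entry.
    static String fromSetCookie(List<String> setCookieValues) {
        StringBuilder sb = new StringBuilder();
        if (setCookieValues == null) {
            return "";
        }
        for (String value : setCookieValues) {
            // Attributes such as Path or HttpOnly follow the first ';' and are dropped.
            String pair = value.split(";", 2)[0].trim();
            if (pair.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(pair);
        }
        return sb.toString();
    }
}
```

The second connection would then call setRequestProperty("Cookie", CookieHeader.fromSetCookie(cookies)) before reading torrents.php. It may also be worth checking the target URL: the wget command posts to login.php, whereas the Java code posts to index.php.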

Using Java to send data to a form on a website hosted locally

I have a program in Java where I retrieve contents from a database.
Now I have a form in the program, and what I want to do is, on the press of a button, some string (text) content retrieved from the database, should be sent over to a website that I'm hosting locally. The content so sent, should be displayed on the website when refreshed.
Can someone guide me as to how I can achieve this (the sending of data to be displayed over the website)?
I would appreciate it a lot if you could show some sample snippets or point me to a tutorial that can help.
Okay, so I found a link to a snippet that's supposed to do this, but at this stage I'm unable to understand how exactly it works. Can someone please help me understand it better?
Here's the code:
try {
// Construct data
String data = URLEncoder.encode("key1", "UTF-8") + "=" + URLEncoder.encode("value1", "UTF-8");
data += "&" + URLEncoder.encode("key2", "UTF-8") + "=" + URLEncoder.encode("value2", "UTF-8");
// Send data
URL url = new URL("http://hostname:80/cgi");
URLConnection conn = url.openConnection();
conn.setDoOutput(true);
OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
wr.write(data);
wr.flush();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while ((line = rd.readLine()) != null) {
// Process line...
}
wr.close();
rd.close();
} catch (Exception e) {
}
I'm not sure how you store and manage the records, but from Java you can send an HTTP POST to the URL (in your case probably http://localhost/).
Have a look at http://www.exampledepot.com/egs/java.net/post.html for a snippet on how to do this.
Your website could then store the received information in a database and display it when you refresh.
Update: here's the function.
Just a side note: this is by no means the best way to do this, and I have no idea how it scales, but for simple solutions it has worked for me in the past.
/**
* Posts a set of form variables to the remote HTTP host.
* @param url The URL to post to and read
* @param params The parameters to post to the remote host
* @return The content of the remote page, or null if no data was returned
*/
public String post(String url, Map<String, String> params) {
//Check if Valid URL
if(!url.toLowerCase().contains("http://")) return null;
StringBuilder bldr = new StringBuilder();
try {
//Build the post data
StringBuilder post_data = new StringBuilder();
//Build the posting variables from the map given
for (Map.Entry<String, String> entry : params.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
if(key.length() > 0 && value.length() > 0) {
if(post_data.length() > 0) post_data.append("&");
post_data.append(URLEncoder.encode(key, "UTF-8"));
post_data.append("=");
post_data.append(URLEncoder.encode(value, "UTF-8"));
}
}
// Send data
URL remote_url = new URL(url);
URLConnection conn = remote_url.openConnection();
conn.setDoOutput(true);
OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
wr.write(post_data.toString());
wr.flush();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String inputLine;
while ((inputLine = rd.readLine()) != null) {
bldr.append(inputLine);
}
wr.close();
rd.close();
} catch (Exception e) {
//Handle Error
}
return bldr.length() > 0 ? bldr.toString() : null;
}
You would then use the function as follows:
Map<String, String> params = new HashMap<String, String>();
params.put("var_a", "test");
params.put("var_b", "test");
params.put("var_c", "test");
String response = post("http://localhost/", params);
if(response == null) { /* error */ }
else {
System.out.println(response);
}
The big question is how will you authenticate the "update" from your Java program to your website?
You could easily write a handler on your website, say "/update" which saves the POST body (or value of a request parameter) to a file or other persistent store but how will you be sure that only you can set that value, instead of anybody who discovers it?
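One simple possibility, assumed here purely for illustration since the answer does not prescribe a mechanism, is a shared secret that the Java program sends with each update and that the /update handler checks before saving anything. The class name UpdateAuth and the method isAuthorized are hypothetical, and this sketch does not replace HTTPS or a proper authentication scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper for illustration only.
final class UpdateAuth {
    // Returns true only if the token sent by the client matches the expected secret.
    static boolean isAuthorized(String presentedToken, String expectedToken) {
        if (presentedToken == null || expectedToken == null) {
            return false;
        }
        // MessageDigest.isEqual is used instead of String.equals to reduce
        // timing differences between matching and non-matching tokens.
        return MessageDigest.isEqual(
                presentedToken.getBytes(StandardCharsets.UTF_8),
                expectedToken.getBytes(StandardCharsets.UTF_8));
    }
}
```

The client could send the token as one more POST parameter or as a request header; the handler would reject the update whenever isAuthorized returns false.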
