JSP encoding while inserting non-English text in MySQL database - java

I am using an Ajax call to insert Indian-language characters into a MySQL database, and I am facing a UTF-8 encoding problem somewhere in the flow of my application.
When I insert the non-English characters directly via JDBC (not using an Ajax call), the database shows "????".
When I include
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");
in my JSP file, I get the following in my database (question marks instead of the non-English characters):
????????
When I do not include the above lines, the database shows junk characters like this:
મà«?àª?પà«?ષà«?àª
Whereas the actual value is
મખપષ
So the actual problem lies in, or after, sending the INSERT statement to MySQL from the JSP through the JDBC connector.
I have included the following directives in all my JSP files to ensure the character encoding:
<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
and
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I have checked that the MySQL tables are Unicode-enabled, and I am able to enter non-English text correctly through the terminal.
How is this problem caused and how can I solve it?
Now I am able to write using a plain INSERT statement, but when I mix other queries with the INSERT, my application returns the following error:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
The following are my database variables:
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | utf8_general_ci            |
| collation_database       | utf8_general_ci            |
| collation_server         | utf8_general_ci            |
| completion_type          | 0                          |
| concurrent_insert        | 1                          |
| connect_timeout          | 10                         |
+--------------------------+----------------------------+

When I insert the non-English characters directly via JDBC (not using an Ajax call), the database shows "????".
This will only happen when both sides are aware of the character-encoding difference between them. Any character not covered by the character encoding used on the other side is replaced by a question mark ?. Otherwise you would have seen mojibake.
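This replacement is easy to reproduce in plain Java, using the sample value from the question (ISO-8859-1 stands in here for whatever non-Unicode charset is applied on the wrong side):

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    public static void main(String[] args) {
        String gujarati = "મખપષ"; // the value from the question

        // Encoding to a charset that cannot represent these characters
        // silently replaces each one with '?': exactly what ends up in
        // the table when one side converts with the wrong charset.
        byte[] latin1 = gujarati.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ????

        // A UTF-8 round trip preserves the characters.
        byte[] utf8 = gujarati.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // મખપષ
    }
}
```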
In this particular case, those sides are the Java side and the database side, with the JDBC driver as mediator. To fix this, you need to tell the JDBC driver what encoding those characters are in. You can do that by setting the useUnicode=true&characterEncoding=UTF-8 parameters in the JDBC connection URL.
jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
Then, depending on how you're sending the parameters from the client to the server, you may also need to fix the request encoding. Given that you're seeing mojibake when you remove request.setCharacterEncoding("UTF-8"), you're apparently using POST, so that part is fine.
If you were using GET to send the parameters instead, you would need to configure the URI encoding on the server side. It's unclear which server you're using, but in the case of, for example, Tomcat, it's a matter of editing the <Connector> entry in /conf/server.xml as follows:
<Connector ... URIEncoding="UTF-8">
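To see why the URI encoding matters for GET, note that the same percent-encoded bytes decode to entirely different text depending on which charset the container applies. A small sketch (the sample byte sequence is the UTF-8 percent-encoding of the Gujarati letter મ):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {
    public static void main(String[] args) {
        // %E0%AA%AE is the percent-encoded UTF-8 form of મ (U+0AAE)
        String encoded = "%E0%AA%AE";

        // Decoded as UTF-8 (what URIEncoding="UTF-8" makes Tomcat do):
        System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8));      // મ

        // Decoded as ISO-8859-1 (a common container default): mojibake
        System.out.println(URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1)); // àª®
    }
}
```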
See also:
Unicode - How to get the characters right?

Please help me to solve this issue...
You need to figure out where in the processing chain things are going wrong.
You say that you have created the tables correctly and that you can enter and display text from your terminal. You have something that is known to be handling these characters correctly, so try the following experiments ... in this order ... to isolate where things are going wrong.
1. Using the mysql command, attempt to insert a row containing the problem characters, and then select and display the inserted row.
2. Write a simple Java program to do the same, using the JDBC URL that your application currently uses.
3. Modify your app to capture and log the request parameter strings it receives from the browser.
4. (If possible) capture the requests as received by your server and as sent by the browser. Check both the request parameters and the headers.
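For the step where you log request parameters, dumping the raw code points makes it obvious whether the container decoded the string correctly, even when the log viewer itself mangles non-ASCII text. A small helper sketch (the class and method names are just illustrations):

```java
public class CodePointDump {
    // Renders each character of a string as U+XXXX so that mojibake is
    // visible regardless of how the log file is displayed.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // A correctly decoded Gujarati letter shows its real code point:
        System.out.println(codePoints("મ")); // U+0AAE
        // A mis-decoded parameter would instead show a run of Latin-1
        // code points (U+00C3, U+00AA, ...), i.e. mojibake.
    }
}
```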

One aside:
Place the page directives together, with no whitespace between them (you can even combine the attributes inside a single @page directive), because HTTP headers should be set before any HTML content is written. Due to page buffering this is not strictly necessary, but it is formally correct.
The other answers so far are correct.
One additional issue is the database, table, and column definitions, each of which can have its own default and actual character set.
Of course, one should be really careful: મ��પ�ષ�ઠ might be a wrong display of correct data, as the displaying program might not be using UTF-8.

Related

UTF-8 letters not displayed correctly

I am querying JSON-formatted data using Apache Drill on Windows 10 from a DOS prompt, following their guide.
I have the very basic JSON object {"år":"2018", "æøå":"ÆØÅ"}, and when I query it from Apache Drill the output is not displayed correctly.
select * from dfs.`C:\Users\foo\Downloads\utf8.json`;
+-------+------+
| Õr | µ°Õ |
+-------+------+
| 2018 | ãÏ┼ |
+-------+------+
1 row selected (0,114 seconds)
The file is saved in UTF-8 format (using sublime text). I have also tried to save it in UTF-8 with BOM but it did not make a difference.
Setting the environment variable as mentioned in this SO-thread using
set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
does not help.
EDIT:
Shortly after posting, I found an SO thread that suggested changing the Windows codepage to 65001 (UTF-8). This shows the correct letters, but it also prevents the command history (arrow-up) from working properly.
chcp 65001
sqlline.bat -u "jdbc:drill:zk=local"
select * from dfs.`C:\Users\cgu\Downloads\utf8.json`;
+-------+------+
| år | æøå |
+-------+------+
| 2018 | ÆØÅ |
+-------+------+

Why are question marks replacing certain characters in mysql database with collation utf-8?

I'm using Jsoup to scrape a webpage. It takes the text and enters it directly into the database.
The text on the target webpage looks perfectly fine, but after entering it into the database I get question marks replacing certain characters.
For example the single right quotation marks (U+2019) in the following sentence:
I can’t imagine uh, a domain of human endeavor that isn’t impacted by
the imagination.
It shows up like this in the database and on the webpage where I output it:
I can?t imagine uh, a domain of human endeavor that isn?t impacted by
the imagination.
Initially I thought this was just a problem with the charset/collation of the database, but after trying out different types the problem persists.
The SQL database I'm currently working with is in UTF-8:
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
And the meta is set:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I've tried specifically setting it in Java like so:
url = "jdbc:mysql://localhost:3306/somedb?useUnicode=true&characterEncoding=utf-8";
I've tried sql queries like:
SET NAMES 'utf8'
SET CHARACTER SET utf8
I've tried creating a new database, and nothing seems to work.
Any ideas why this might be happening?
Jsoup automatically detects the charset for the webpage being crawled.
However, many websites do not declare a character set at all: the charset attribute is simply missing from the Content-Type response header.
If you crawl such a webpage, Jsoup parses it using the platform's default character set. That also means you might not get the expected results, as the platform's default character set might differ from that of the webpage you are crawling.
This can result in characters being lost or being parsed/printed incorrectly.
To avoid this behavior, read the URL as an InputStream and specify the desired character set manually in Jsoup's parse method, as given below:
import java.io.InputStream;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String page = "http://www.somepage.com";
// get an input stream from the URL
try (InputStream in = new URL(page).openStream()) {
    // parse the document using the input stream, explicitly naming the charset
    Document doc = Jsoup.parse(in, "ISO-8859-1", page);
    // ...do your processing
}
There are several steps to make a page work correctly.
See "question mark" in Trouble with UTF-8 characters; what I see is not what I stored

Saving filepath in mysql using java, am i doing it right?

I'm working on a mini-project that can upload certain files to a server. I'm using Java and MySQL. According to what I have read, the 'right way' is to save the file to a folder on my server and store the file path in MySQL.
A part of my code:
File source = new File(fileChooser.getSelectedFile().getPath());
File dest = new File(destChooser.getSelectedFile().getPath());
String tempUpdate1 = "Insert into images (ID, file_name, file_path)"
+ "VALUES ('"+ID+"' , '"+fileChooser.getSelectedFile().getName()+"' , '"+destChooser.getSelectedFile().getPath()+"')";
conn = Connect.ConnectDB();
Statement stemt2 = conn.createStatement();
stemt2.executeUpdate(tempUpdate1);
FileUtils.copyFileToDirectory(source, dest);
JOptionPane.showMessageDialog(null, "File Uploaded Successfully!");
Then I tried running it. It successfully copied the file to my desired folder. The problem is in my MySQL table, where I save the path of the file.
The table's like this:
| ID | file_name | file_path |
| 1 | sample.docx | C:UsersMEDesktopest |
| 2 | sample.jpg | C:UsersMEDesktopest |
I tried inspecting the file path myself using JOptionPane, and it displayed normally, with the separators; but when I put the same variable into MySQL, I get what is seen above: no separators.
Am I really doing it right? I did what was instructed in the related topics, but no one else seems to be complaining about the path, so I'm wondering if I did something wrong somewhere.
I'm planning to access the file using the file path as my project progresses. I would just like to ask whether the path is still usable, since I don't see any separator such as '//', '/' or '\'.
It is almost good.
Imagine that you have two files with the same name: you can keep only one of them on your server. I would save each file under a generated name, such as the time in millis at the moment of saving.
Your table would then contain the original name sample.jpg and a server file name like 49473996034732 (in a separate column, of course). This way you can store as many files named sample.jpg as you like. The request link should point to the file name 49473996034732, so you always know which file a request is for.
Then, in the response, when you set the content to the proper type, you can set the user-friendly name.
Modified table:
| ID | file_name   | server_file_name | file_path           |
| 1  | sample.docx | 49473996034732   | C:UsersMEDesktopest |
| 3  | sample.jpg  | 49473996034850   | C:UsersMEDesktopest |
| 2  | sample.jpg  | 49473996034988   | C:UsersMEDesktopest |
The second thing I noticed is that you allow the user to pick the destination folder for the file. Shouldn't it be a fixed location? That would be more like server behavior. But I'm guessing this is just example code, quickly coded in Swing.
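One more aside, about the disappearing separators in file_path: inside a plain SQL string literal, MySQL treats the backslash as an escape character, so concatenating a Windows path into the statement strips the backslashes (and \t even turns into a tab). A minimal sketch of that parsing rule, covering only a few of MySQL's recognised escapes:

```java
public class MysqlEscapeDemo {
    // Minimal sketch of how MySQL's string-literal parser handles
    // backslashes: a few recognised escapes become control characters,
    // and for any other sequence the backslash is silently dropped.
    static String mysqlUnescape(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length()) {
                char next = literal.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case 'r': out.append('\r'); break;
                    default:  out.append(next); // backslash dropped
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // C:\Users\ME\Desktop\test -> C:UsersMEDesktop<TAB>est
        System.out.println(mysqlUnescape("C:\\Users\\ME\\Desktop\\test"));
    }
}
```

The robust fix is a PreparedStatement ("INSERT INTO images (ID, file_name, file_path) VALUES (?, ?, ?)" with setString for each parameter): the path travels as a bound value, never through the literal parser, and this also closes the SQL-injection hole in the concatenated version.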

Jasper Reports - Limit number of rows in a group

I have a CSV datasource something like this:
User,Site,Requests
user01,www.facebook.com,54220
user01,plusone.google.com,2015
user01,www.twitter.com,33564
user01,www.linkedin.com,54220
user01,weibo.com,2015
user02,www.twitter.com,33564
user03,www.facebook.com,54220
user03,plusone.google.com,2015
user03,www.twitter.com,33564
In the report I want to display the first 3 rows (max) for each user, while the other rows will only contribute to the group total. How do I limit the report to only print 3 rows per group?
e.g
User Site Requests
user01 | www.facebook.com | 54220
plusone.google.com | 2015
www.twitter.com | 33564
| 146034
user02 | www.twitter.com | 33564
| 33564
user03 | www.facebook.com | 54220
user03 | plusone.google.com | 2015
user03 | www.twitter.com | 33564
| 89799
It is really just the line limiting I am struggling with, the rest is working just fine.
I found a way to do it; if anyone can come up with a more elegant answer I would be happy to see it, as this feels a bit hacky!
For each item in the detail band:
<reportElement... isRemoveLineWhenBlank="true">
<printWhenExpression><![CDATA[$V{userGroup_COUNT} < 4]]></printWhenExpression>
</reportElement>
where userGroup is the field I am grouping by. I only seemed to need the isRemoveLineWhenBlank attribute for the first element.
You may consider using a subreport: query the grouping fields in the main report, then pass them as parameters into the subreport. The merit of this method is that it avoids the report engine actually looping through all the unrequired rows (even though they are not shown), and it saves unnecessary server-to-server or server-to-client bandwidth, especially when the returned dataset is large.
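A third option is to shape the data before it reaches the report at all: keep only the first three rows per user for display, but compute the group totals over every row. A plain-Java sketch of that split (the Row class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupLimiter {
    static class Row {
        final String user; final String site; final int requests;
        Row(String user, String site, int requests) {
            this.user = user; this.site = site; this.requests = requests;
        }
    }

    // Keeps at most `limit` rows per user, preserving input order.
    static Map<String, List<Row>> displayRows(List<Row> rows, int limit) {
        Map<String, List<Row>> out = new LinkedHashMap<>();
        for (Row r : rows) {
            List<Row> group = out.computeIfAbsent(r.user, k -> new ArrayList<>());
            if (group.size() < limit) group.add(r);
        }
        return out;
    }

    // Totals computed over ALL rows, not just the displayed ones.
    static Map<String, Integer> totals(List<Row> rows) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Row r : rows) out.merge(r.user, r.requests, Integer::sum);
        return out;
    }
}
```

Feeding displayRows(...) to the detail band (e.g. via a JRBeanCollectionDataSource) and totals(...) to report parameters would reproduce the layout above: user01 displays three sites, while its total of 146034 covers all five rows.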

Looking for database design advice

I am working on a log analyzer system, which reads Tomcat logs and displays them in a chart/table on a web page.
(I know there are existing log analyzer systems; I am reinventing the wheel. But this is my job; my boss wants it.)
Our Tomcat logs are saved by day. For example:
2011-01-01.txt
2011-01-02.txt
......
The following is my approach for exporting logs to the DB and reading them:
1. The DB structure
I have three tables:
1) log_current: saves the logs generated today.
2) log_past: saves the logs generated before today.
These two tables share the SAME schema.
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| Id | hostip | username | datasend | method | uri | queryStr | protocol | status | time | browser | platform | refer |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| 44359 | 127.0.0.1 | - | 0 | GET | / | | HTTP/1.1 | 404 | 2011-02-17 08:08:25 | Unknown | Unknown | - |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
3) log_record: saves bookkeeping information for log_past; it records the days whose logs have been exported to the log_past table.
+-----+------------+
| Id | savedDate |
+-----+------------+
| 127 | 2011-02-15 |
| 128 | 2011-02-14 |
..................
+-----+------------+
The table shows that the logs of 2011-02-15 have been exported.
2. Export (to DB)
I have two scheduled jobs.
1) The daily job.
At 00:05:00, check the Tomcat log directory (/tomcat/logs) to find the log files of the latest 30 days (this of course includes yesterday's log).
Check the log_record table to see whether each day's logs have been exported; for example, if 2011-02-16 is not found in log_record, I read 2011-02-16.txt and export it to log_past.
After exporting yesterday's log, I start the file monitor for today's log (2011-02-17.txt), whether it exists yet or not.
2) The file monitor.
Once the monitor is started, it reads the file hour by hour. Each log entry it reads is saved in the log_current table.
3. Tomcat server restart
Sometimes we have to restart Tomcat, so whenever Tomcat starts, I delete all rows from log_current and then run the daily job.
4. My problem
1) Two tables (log_current and log_past).
I keep them separate because, if I saved today's logs straight into log_past, I could not be sure that every log file (xxxx-xx-xx.txt) had been exported to the DB; the check at 00:05:00 every day only guarantees that logs from before today have been exported.
But this makes it difficult to query logs across yesterday and today.
For example, a query from 2011-02-14 00:00:00 to 2011-02-15 00:00:00 finds all its logs in log_past.
But what about a query from 2011-02-14 00:00:00 to 2011-02-17 08:00:00 (supposing it is 2011-02-17 09:00:00 now)?
It is complex to query across tables.
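That cross-table bookkeeping can be made concrete: under the two-table design, any query helper first has to work out which tables a time range touches. A hypothetical sketch, with "today" passed in explicitly for clarity:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class TableRouter {
    // Which tables does a [from, to) query touch, given that log_current
    // holds only today's rows and log_past holds everything older?
    static List<String> tablesFor(LocalDateTime from, LocalDateTime to, LocalDate today) {
        LocalDateTime startOfToday = today.atStartOfDay();
        List<String> tables = new ArrayList<>();
        if (from.isBefore(startOfToday)) tables.add("log_past");
        if (to.isAfter(startOfToday))    tables.add("log_current");
        return tables;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2011, 2, 17);
        // 2011-02-14 00:00 .. 2011-02-15 00:00 -> only log_past
        System.out.println(tablesFor(LocalDateTime.of(2011, 2, 14, 0, 0),
                                     LocalDateTime.of(2011, 2, 15, 0, 0), today));
        // 2011-02-14 00:00 .. 2011-02-17 08:00 -> both tables, i.e. a UNION query
        System.out.println(tablesFor(LocalDateTime.of(2011, 2, 14, 0, 0),
                                     LocalDateTime.of(2011, 2, 17, 8, 0), today));
    }
}
```

With a single logs table, this helper disappears entirely and every range becomes one plain query.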
2) Also, I keep thinking my design for the tables and the workflow (the scheduled export/read jobs) is not ideal, so can anyone give a good suggestion?
I just need to export and read logs and do near-real-time analysis, where "real-time" means I have to make the current day's logs visible in charts/tables and so on.
First of all, IMO you don't need two different tables, log_current and log_past. You can insert all the rows into the same table, say logs, and retrieve a day's worth with a date-range query such as
select * from logs where time >= 'YOUR_DATE' and time < 'YOUR_DATE' + interval 1 day
This will give you all the logs of that particular day, and a range spanning several days is just a wider pair of bounds in the same query.
Once you remove the current/past distinction between tables this way, I think the problem you are asking about here is solved. :)
