UTF-8 letters not displayed correctly - java

I am querying JSON-formatted data using Apache Drill on Windows 10 from a command prompt, following their guide.
I have the very basic JSON object {"år":"2018", "æøå":"ÆØÅ"}, and when I query it from Apache Drill the output is not displayed correctly.
select * from dfs.`C:\Users\foo\Downloads\utf8.json`;
+-------+------+
| Õr | µ°Õ |
+-------+------+
| 2018 | ãÏ┼ |
+-------+------+
1 row selected (0,114 seconds)
The file is saved in UTF-8 format (using Sublime Text). I have also tried saving it as UTF-8 with BOM, but it did not make a difference.
Setting the environment variable mentioned in this SO thread, using
set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
does not help.
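One way to check whether that setting actually reaches the JVM that sqlline starts is to print the default charset from a small Java program run from the same prompt (a minimal sketch, not Drill-specific):

import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // Value picked up from -Dfile.encoding, if any
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        // Charset the JVM actually uses by default for console and file I/O
        System.out.println("defaultCharset() = " + Charset.defaultCharset());
    }
}

If file.encoding reports UTF8 here but the Drill output is still garbled, the remaining mismatch is the console code page rather than the JVM default.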
EDIT:
Shortly after posting I found an SO thread that suggested changing the Windows code page to 65001 (UTF-8). This shows the correct letters, but it also prevents the command history (arrow up) from working properly.
chcp 65001
sqlline.bat -u "jdbc:drill:zk=local"
select * from dfs.`C:\Users\cgu\Downloads\utf8.json`;
+-------+------+
| år | æøå |
+-------+------+
| 2018 | ÆØÅ |
+-------+------+
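For context, the garbling happens because the JVM encodes the output with its default charset (e.g. windows-1252) while the console decodes it with its own code page (e.g. 850). A tiny sketch that reproduces the effect outside Drill, purely for illustration:

import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class ConsoleDemo {
    public static void main(String[] args) throws Exception {
        String s = "år æøå ÆØÅ";
        // Encoded with the JVM default charset; on a cp850 console this
        // typically shows up as mojibake such as "Õr" for "år".
        System.out.println(s);
        // Explicit UTF-8 bytes; these display correctly only after chcp 65001.
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8.name());
        utf8Out.println(s);
    }
}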

Related

Order Strings with Polish letters in Postgres

What is the best way to order entities by a String field that contains Polish letters?
Is there a way to do that with Spring Data? Can I include a Locale in the Pageable for this:
Page<Collection> findByInstitutionIdAndIsDeletedFalse(Long institutionId, Pageable pageable);
and
Sort.Order entityOrder = Sort.Order.by("title").ignoreCase();
PageRequest pageable = PageRequest.of(page, perPage, Sort.by(entityOrder));
When I do like that, I have:
alaska
lalka
termos
łóżko
But "łóżko" should be after "lalka".
I tried to change the locale in the Postgres database, but it didn't work for either db1 or db2.
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
db1 | user | UTF8 | pl_PL | pl_PL |
db2 | user | UTF8 | pl_PL.utf8 | pl_PL.utf8 |
db3 | user | UTF8 | en_US.utf8 | en_US.utf8 |
Thanks for any suggestions!
OK, I've changed the Postgres image from alpine to 9.6, and in the Dockerfile I have these lines:
RUN localedef -i pl_PL -c -f UTF-8 -A /usr/share/locale/locale.alias pl_PL.UTF-8
ENV LANG PL_PL.utf8
ENV LC_ALL="pl_PL.UTF-8"
ENV LC_CTYPE="pl_PL.UTF-8"
Now it's working and I have the right order. But I'm still wondering how to build my Postgres image from a docker-compose file.

Saving a file path in MySQL using Java, am I doing it right?

I'm working on a mini-project that can upload certain files to a server. I'm using Java and MySQL. From what I've read, the 'right way' is to save the file to a folder on the server and put the file path in MySQL.
A part of my code:
File source = new File(fileChooser.getSelectedFile().getPath());
File dest = new File(destChooser.getSelectedFile().getPath());
String tempUpdate1 = "Insert into images (ID, file_name, file_path)"
+ "VALUES ('"+ID+"' , '"+fileChooser.getSelectedFile().getName()+"' , '"+destChooser.getSelectedFile().getPath()+"')";
conn = Connect.ConnectDB();
Statement stemt2 = conn.createStatement();
stemt2.executeUpdate(tempUpdate1);
FileUtils.copyFileToDirectory(source, dest);
JOptionPane.showMessageDialog(null, "File Uploaded Successfully!");
Then I tried running it. It successfully copied the file to my desired folder. The problem is in my MySQL table where I save the path of the file.
The table's like this:
| ID | file_name | file_path |
| 1 | sample.docx | C:UsersMEDesktopest |
| 2 | sample.jpg | C:UsersMEDesktopest |
I tried viewing the file path myself using JOptionPane and it showed up normally, with the separators, but when I put the same variable into MySQL, that's what I get, as seen above: no separators.
Am I really doing it right? I followed what was instructed in the related topics, but no one else seems to be complaining about the path. I'm wondering if I did something wrong somewhere.
I'm planning to access the file using the file path as my project progresses. I would just like to ask if the file path is still usable, since I don't see any separators such as '//', '/' or '\'.
It is almost good.
Imagine that you have two files with the same name: you can keep only one of them on your server. I would save the file under a generated name, such as the time in millis at the moment of the save action.
Your table would then contain the original name sample.jpg and, for example, the file name 49473996034732 (in a separate column, of course). This way you can store as many sample.jpg files as you like. The request link should point to the file name 49473996034732, so you always know which file is being requested.
Then in the response, when you set the content to the proper type, you can set the user-friendly name.
Modified table:
| ID | file_name   | server_file_name | file_path           |
| 1  | sample.docx | 49473996034732   | C:UsersMEDesktopest |
| 3  | sample.jpg  | 49473996034850   | C:UsersMEDesktopest |
| 2  | sample.jpg  | 49473996034988   | C:UsersMEDesktopest |
The second thing I noticed is that you allow the user to pick the destination folder for the file. Shouldn't it be a fixed location? That would be more like server behavior. But I'm guessing this is just example code quickly coded in Swing.
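A minimal sketch of this scheme as a helper method, using a PreparedStatement so the path is bound as a parameter (Connect.ConnectDB(), FileUtils and the modified table are taken from the code above; everything else is an assumption). As a side effect, binding the path also keeps the backslashes that the concatenated string literal was losing, which is why the separators vanished in the original table.

import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.commons.io.FileUtils;

// SQLException/IOException propagate to the caller (the upload handler)
static void saveUpload(File source, File destDir, Object ID) throws Exception {
    // Generated server-side name, e.g. the time in millis at the moment of saving
    String serverFileName = String.valueOf(System.currentTimeMillis());
    String sql = "INSERT INTO images (ID, file_name, server_file_name, file_path) VALUES (?, ?, ?, ?)";
    try (Connection conn = Connect.ConnectDB();
         PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setObject(1, ID);                         // works whether ID is numeric or a string
        ps.setString(2, source.getName());           // original name, e.g. sample.jpg
        ps.setString(3, serverFileName);             // unique name used on disk
        ps.setString(4, destDir.getAbsolutePath());  // stored verbatim, separators intact
        ps.executeUpdate();
    }
    // Copy under the generated name so several "sample.jpg" uploads can coexist
    FileUtils.copyFile(source, new File(destDir, serverFileName));
}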

Java tabled output to file

I'm interested in a way of outputting some objects in a table-like form. A concrete example would be something like:
-------------------------------------------
| Name | foo | bar |
-------------------------------------------
| asdas | dsfsd |1233.23 |
| adasdasd | fsdfs |3.23 |
| sdasjd | knsdfsd |13.23 |
| lkkkj | dsfsd |2343.23 |
-------------------------------------------
Or an MS Office / OpenOffice spreadsheet file. (Is there API documentation for this kind of output in specific editors, e.g. how to define a table in OpenOffice?)
I'm asking because I would like to know the best way of doing this.
PS: there is no need to deserialise.
docx4j is a library for creating and manipulating .docx, .pptx and .xlsx files.
If you do not feel like using docx4j, or it does not fit your needs, you can try these:
Apache POI
OpenOffice API
The easiest is to export to comma-separated values (CSV), which you can open in Excel.
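If plain CSV is enough, a minimal sketch without any extra library (the column names and rows are just the ones from the example above):

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvExport {
    public static void main(String[] args) throws IOException {
        String[][] rows = {
            {"Name", "foo", "bar"},
            {"asdas", "dsfsd", "1233.23"},
            {"adasdasd", "fsdfs", "3.23"},
        };
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(Paths.get("table.csv"), StandardCharsets.UTF_8))) {
            for (String[] row : rows) {
                // Naive CSV: fine as long as the values contain no commas or quotes
                out.println(String.join(",", row));
            }
        }
    }
}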
You can use the data-exporter library.

JSP encoding while inserting non-English text in MySQL database

I am using an Ajax call to insert Indian-language characters into a MySQL database. I am facing a UTF-8 encoding problem somewhere in the flow of my application.
When I insert the non-English characters directly by JDBC (not using an Ajax call), it shows "????" in the database.
When I include
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");
in my JSP file, I receive the following in my database (question marks instead of the non-English characters):
????????
When I do not include the above lines, it shows junk characters like this in the database:
મà«?àª?પà«?ષà«?àª
Whereas the actual value is
મખપષ
So the actual problem lies in or after sending the insert request to MySQL from the JSP through the JDBC connector.
I have included the following directives in all my JSP files to ensure the character encoding:
<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
and
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I checked that the MySQL tables are Unicode-enabled, and I am able to enter non-English text correctly through the terminal.
How is this problem caused and how can I solve it?
Now I am able to write using an insert statement only, but when I mix some queries with the insert statement, my application returns the following error:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
The following are my database variables:
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 10 |
When I insert the non-English characters directly by JDBC (not using an Ajax call), it shows "????" in the database.
This will only happen when both sides are perfectly aware of the character encoding differences on each side. Any character which is not covered by the character encoding used on the other side will be replaced by a question mark ?. Otherwise you would have seen Mojibake.
In this particular case, those sides are the Java side and the database side, with the JDBC driver as mediator. To fix this, you need to tell the JDBC driver what encoding those characters are in. You can do that by setting the useUnicode=true&characterEncoding=UTF-8 parameters in the JDBC connection URL.
jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
Then, depending on how you're sending the parameters from the client to the server, you possibly also need to fix the request encoding. Given the fact that you're seeing Mojibake when you remove request.setCharacterEncoding("UTF-8"), you're using POST. So that part is fine.
If you were using GET to send the parameters, you would need to configure the URI encoding on the server side. It's unclear which server you're using, but in the case of, for example, Tomcat, it's a matter of editing the <Connector> entry in /conf/server.xml as follows:
<Connector ... URIEncoding="UTF-8">
See also:
Unicode - How to get the characters right?
You need to figure out where in the processing chain things are going wrong.
You say that you have created the tables correctly and that you can enter and display text from your terminal. You have something that is known to handle these characters correctly, so try the following experiments ... in this order ... to isolate where things go wrong.
Using the mysql command, attempt to insert a row containing the problem characters, and then select and display the inserted row.
Write a simple Java program to do the same, using the JDBC URL that your application currently uses (see the sketch after this list).
Modify your app to capture and log the request parameter strings it is receiving from the browser.
(If possible) capture the requests as received by your server and as sent by the browser. Check both the request parameters and the headers.
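A sketch of the simple Java program from step 2 above (the table and column names are made up for the test; the URL carries the useUnicode/characterEncoding parameters mentioned in the other answer):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UnicodeRoundTrip {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/dbname"
                   + "?useUnicode=true&characterEncoding=UTF-8";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO test_unicode (text_value) VALUES (?)")) {
                // The source file itself must be compiled as UTF-8 for this literal
                ps.setString(1, "મખપષ");
                ps.executeUpdate();
            }
            try (PreparedStatement ps = conn.prepareStatement("SELECT text_value FROM test_unicode");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // If this round-trips correctly but the webapp does not,
                    // the JDBC/DB side is fine and the problem is in the request handling.
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}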
One aside:
Place the directives together without spaces (you can even combine the attributes inside a single @page directive), because HTTP headers should be set before any HTML content is written. Because of page buffering this is not strictly needed, but formally it is.
The other answers so far are correct.
One additional issue is the database, table and column definitions, which can each have a default and an actual character set.
Of course one should be really careful: મ��પ�ષ�ઠ might be a wrong display of correct data, as the displaying program might not be using UTF-8.

Looking for a database design approach

I am working on a log analyzer system, which reads Tomcat logs and displays them in charts/tables on a web page.
(I know there are existing log analyzer systems; I am reinventing the wheel. But this is my job; my boss wants it.)
Our Tomcat logs are saved by day. For example:
2011-01-01.txt
2011-01-02.txt
......
The following is my approach for exporting logs to the DB and reading them:
1 The DB structure
I have three tables:
1) log_current: saves the logs generated today.
2) log_past: saves the logs generated before today.
The above two tables have the SAME schema.
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| Id | hostip | username | datasend | method | uri | queryStr | protocol | status | time | browser | platform | refer |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| 44359 | 127.0.0.1 | - | 0 | GET | / | | HTTP/1.1 | 404 | 2011-02-17 08:08:25 | Unknown | Unknown | - |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
3) log_record: saves information about log_past; it records the days whose logs have been exported to the log_past table.
+-----+------------+
| Id | savedDate |
+-----+------------+
| 127 | 2011-02-15 |
| 128 | 2011-02-14 |
..................
+-----+------------+
The table shows that the logs of 2011-02-15 have been exported.
2 Export (to db)
I have two scheduled jobs.
1) Daily job.
At 00:05:00, check the Tomcat log directory (/tomcat/logs) to find the log files of the latest 30 days (of course this includes yesterday's log).
Check the log_record table to see whether the logs of a given day have already been exported; for example, if 2011-02-16 is not found in log_record, I read 2011-02-16.txt and export it to log_past.
After exporting yesterday's log, I start the file monitor for today's log (2011-02-17.txt), whether it exists yet or not.
2) The file monitor
Once the monitor is started, it reads the file hour by hour. Each log entry it reads is saved in the log_current table.
3 Tomcat server restart
Sometimes we have to restart Tomcat, so once Tomcat has started, I delete all rows from log_current and then run the daily job.
4 My problem
1) Two tables (log_current and log_past).
The reason for two tables: if I saved today's logs into log_past, I could not be sure that every log file (xxxx-xx-xx.txt) had been exported to the DB, since the check I run at 00:05:00 every day only guarantees that logs from before today have been exported.
But this makes it difficult to query logs across yesterday and today.
For example, a query from 2011-02-14 00:00:00 to 2011-02-15 00:00:00 only needs log_past.
But what about a query from 2011-02-14 00:00:00 to 2011-02-17 08:00:00 (suppose it is 2011-02-17 09:00:00 now)?
It is complex to query across tables.
Also, I have always felt that my design for the tables and the workflow (the scheduled export/read jobs) is not perfect, so can anyone give a good suggestion?
I just need to export and read logs and do an almost real-time analysis, where real-time means I have to make the current day's logs visible in charts/tables, etc.
First of all, IMO you don't need two different tables, log_current and log_past. You can insert all the rows into the same table, say logs, and retrieve them using:
select * from logs where id = (select id from log_record where savedDate = 'YOUR_DATE')
This will give you all the logs of the particular day.
Now, once you remove the current/past distinction between the tables in the above way, I think the problem you are asking about here would be solved. :)
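If the merged logs table keeps the time column from the schema above, a query that spans yesterday and today becomes a single range query. A sketch in JDBC (the table name logs, the connection details and the credentials are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class LogRangeQuery {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/logdb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT hostip, uri, status, time FROM logs WHERE time >= ? AND time < ?")) {
            // Spans 2011-02-14 through this morning without touching a second table
            ps.setTimestamp(1, Timestamp.valueOf("2011-02-14 00:00:00"));
            ps.setTimestamp(2, Timestamp.valueOf("2011-02-17 08:00:00"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s %s %d %s%n",
                        rs.getString("hostip"), rs.getString("uri"),
                        rs.getInt("status"), rs.getTimestamp("time"));
                }
            }
        }
    }
}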
