Mac Chrome and Mac Safari produce different Unicode code points - java

I'm working on a legacy project where we use Java 6 with Spring, Grails, etc.
The problem I'm running into is that I have a file upload form where we support German filenames.
In this case I have a file whose name contains "für", and I'm having difficulties with it.
I have tried converting it to Unicode code points to see if that solved the problem, and I am now able to see why there's a problem.
On Mac Chrome, the "ü" arrives as "u" followed by U+0308 (combining diaeresis), while on Mac Safari it arrives as U+00FC (precomposed "ü"). With Safari it works and inserts correctly in MySQL; with Chrome it fails.
The error from MySQL:
#1366 - Incorrect string value: '\xCC\x88r' for column `name` at row 1
When I try running this code:
UPDATE `X` SET `name` = 'für' WHERE `skabelon`.`id` = 1302
Why is there a difference, and how can I fix it so that it works with Chrome on Mac? Chrome on Windows and Ubuntu works flawlessly.
UPDATE
It's now working after Normalizing the string.

It worked after I put it through normalizing in Java. Never knew that I needed that before now.
I used the Normalizer from java.text
Normalizer.normalize("String", Normalizer.Form.NFC)
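For anyone landing here, a minimal sketch of that normalization step follows. The class and method names (FilenameNormalizer, toNfc) are made up for illustration; only java.text.Normalizer comes from the answer above. It assumes the filename has already been decoded into a Java String and should be normalized before it is written to MySQL.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the original project.
final class FilenameNormalizer {

    // Returns the NFC (composed) form, so "u" + U+0308 becomes the single code point U+00FC.
    static String toNfc(String filename) {
        if (filename == null) {
            return null;
        }
        return Normalizer.normalize(filename, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "fu\u0308r"; // the form Chrome on Mac sent in the question
        String composed = "f\u00FCr";    // the form Safari sent

        System.out.println(decomposed.equals(composed));        // false
        System.out.println(toNfc(decomposed).equals(composed)); // true
        System.out.println(toNfc(decomposed).length());         // 3
    }
}
```

Applying this once at the point where the upload is received keeps every later comparison and insert working on the same form of the string.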

Related

JavaFx application in Windows is not displaying text correctly

So I have an application written in JavaFX 2.2 that has been packaged for Linux, Mac, and Windows. I am getting a strange issue with some of the text fields, though. The application reads a file and populates some labels based on what's found in the file. When run on Ubuntu or Mac, the special accent character over the c looks just fine. On Windows, however, the same character is displayed incorrectly. Any idea why this is happening? I was a bit confused, as it is the exact same application on all three. Thanks.
Make sure to specify character encoding when reading the file, in order to avoid using the platform's default encoding, which varies between operating systems. Just by coincidence, the default on Linux and Mac happens to match the file encoding and produces correct output, but you should not rely on it.
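A minimal sketch of reading with an explicit encoding, assuming the file is UTF-8 (the question does not say which encoding it actually uses). ExplicitCharsetReader and readAll are illustrative names, not part of the asker's application.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper showing the idea; adapt the charset to the real file.
final class ExplicitCharsetReader {

    // Reads the whole file using the named charset instead of the platform default.
    static String readAll(File file, String charsetName) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName));
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }
}
```

Calling `ExplicitCharsetReader.readAll(file, "UTF-8")` should then give the same label text on Linux, Mac, and Windows, provided the assumption about the file's encoding holds.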

JodConverter with LibreOffice outputs all letters as squares after docx-to-pdf conversion

In order to convert docx files to PDF (or PDF/A, to be precise), we are using JodConverter along with LibreOffice. This had been working fine for a week or so, but then suddenly all letters were represented as squares (usually indicating some control character) in the converted PDF (the Word file looked fine). After restarting the LibreOffice service, things went back to normal and letters were output just fine.
But we were left worried, as we have no guarantee that it won't happen again. I also have no idea why this happened. We had some trouble in the environment prior to this, but none on the server doing the docx-to-PDF conversion in particular.
Has anyone else encountered this problem, or does anyone have a theory as to why it occurred?
I have no theory as to why it occurred, but I recommend using SoftMaker FreeOffice instead of LibreOffice. All included apps offer direct PDF export, and it works excellently. By the way, if you have to exchange documents with Microsoft Office users: this is the office suite with the best interoperability on the market. You can get it from this website for either Windows or Linux without charge: freeoffice.com/en

Character encoding issues in Eclipse for Java using Webdriver

I'm currently using Eclipse with TestNG, running Selenium WebDriver with Java. I am using Jexcelapi to import data from an OpenOffice spreadsheet in order to compare strings on the website I'm testing with values in the spreadsheet. The problem is that we have different regions, including Germany and the Nordics (Sweden, Norway and Denmark). These sites have strings with accented and other special characters. These are copied correctly into my spreadsheet, and running the scripts in debug mode shows the correct characters from the spreadsheet, but when I get my results, they display invalid characters such as ? and whitespace. I have looked through the forum and searched everywhere for the past few days and seen various solutions, but none seemed to work. I'm not sure if the problem is with Eclipse, Jexcelapi or OpenOffice.
I changed the encoding settings in Eclipse to UTF-8 as advised in some places but still the same problem. I instantiated the class 'WorkbookSettings' and set the encoding and used it with my getWorkbook method and I still get those bad characters that make my scripts show failures.
Can anyone help with this please?
Thanks in advance
We had a similar problem when running webdriver on a remote machine and trying to paste text into forms. The tests were working on our development machines.
The solution was setting the environment variable
JAVA_TOOL_OPTIONS = -Dfile.encoding=UTF8
After that, the webdriver copied Swedish characters with the right encoding.

Java Updatable Resultset Anomaly with UTF-8 data in English Windows

I am facing a weird problem. I get an exception when I try to update or delete a row in an updatable ResultSet that contains non-English UTF-8 characters. However, inserts work fine.
java.sql.SQLException: refreshRow() called on row that has been deleted or had primary key changed.
The weirdest things are:
This error happens only when the compiled jar is run on Windows.
However, the same jar runs fine on Linux for the same data.
The same project also runs fine on Windows when started from within the IDE.
Other information in case that will be helpful
OS: Windows XP (English with non-english language support installed)
DB: MySQL, encoding utf8, collation - utf8_general_ci
IDE: Netbeans 6.9.1
JDK: 6 update 23
Connector/J 5.1.15 (Just switch to check if this works but same problem with version 14 too)
Connection string includes: "useUnicode=true" and "characterEncoding=utf8"
Initially thought that IDE has something to do so posted this message in netbeans forum
http://forums.netbeans.org/topic36558.html
Also cross posted in mysql JDBC forums hoping to find some answer
http://forums.mysql.com/read.php?39,408795,408795
but couldn't get any help there.
So far, the problem seems to be Windows. Maybe this is just a minor issue, but I can't think of any workaround.
Need some suggestion
Thanks and regards
Deepak
It seems like your IDE is overriding the default encoding that you get when you run your application from the command line. If you check the actual JVM arguments the IDE uses (normally available in the output window of your IDE), you will probably see the inclusion of a file-encoding argument, like this:
-Dfile.encoding="UTF-8"
Try to start your application with this JVM argument and see if it makes any difference, and if not - compare the actual encoding used when run from the IDE and on the command line like this:
System.out.println(System.getProperty("file.encoding"));
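A slightly fuller diagnostic along the same lines, as a sketch: besides the file.encoding property it prints the charset the JVM actually uses by default and the bytes that charset produces for a sample non-ASCII character. EncodingReport is a hypothetical class name, and U+00E0 is an arbitrary sample character.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic; run it once from the IDE and once from the command line.
final class EncodingReport {

    static String describe() {
        StringBuilder hex = new StringBuilder();
        // getBytes() without arguments uses the platform default charset.
        for (byte b : "\u00e0".getBytes()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", bytes for U+00E0=" + hex.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two runs print different lines, the difference in default encoding is a likely suspect for the behaviour described in the question.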
I had the same problem and solved it. I don't understand why this happens, but it occurs when the primary key of your MySQL table is a composite key. In my database there are many tables that have a composite primary key and others that have an auto-increment key. I noticed that this problem didn't occur in tables with an auto-increment primary key.

Form encoding in Tapestry

I have a problem with a Tapestry form.
My XML database is very sensitive to encoding and needs UTF-8.
When I put the character 'à' in my form, Tapestry receives 'Ó' and my core gets an error: Invalid byte 2 of 3-byte UTF-8 sequence.
I don't have the problem in Eclipse with the local default configuration for Tomcat.
But whatever the Tomcat configuration, I think my application must do the conversion itself.
So I tried:
charset="utf-8" in form => FAIL
buildUtf8Filter in AppModule => FAIL
The charset of every page is always utf-8.
So, what could I do before resorting to the Java Charset encoder?
Thank you for helping me. :)
I wouldn't think there's anything wrong with your application. Tapestry does everything in UTF-8 by default; that wiki page is fairly out of date (referring to the 5.0.5 beta, where apparently forms with file uploads still didn't use UTF-8 properly).
You're saying you don't have the problem locally. Have you tried running on a different server? If you do not have the problem there, there's probably something wrong with the codepage settings of the operating system on the server.
Purely anecdotal evidence below
I have once had a similar character set problem in a Tapestry 5 app on the production server (running SUSE Linux) that I could not reproduce on any other server. All seemed fine with the application, the Tomcat server, and the codepage settings of the system, but POST data would end up decoded as ISO 8859-1 instead of UTF-8 in the application. The app had run on that server for a year before the problem manifested - maybe through an update in the operating system.
After a day of not getting anywhere, we ended up just re-installing the whole server OS, and everything was fine again.
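To illustrate the symptom described in this anecdote (UTF-8 POST data decoded as ISO 8859-1), here is a small sketch that reproduces the garbling and reverses it. MisdecodedPostData and its methods are made-up names, and the repair step is only a workaround that assumes the wrong decoding really was ISO 8859-1; it does not address the server configuration itself.

```java
import java.nio.charset.Charset;

// Hypothetical demonstration, not Tapestry API.
final class MisdecodedPostData {

    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    // Simulates a server that decodes UTF-8 request bytes as ISO 8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(UTF_8), LATIN_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(LATIN_1), UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u00e0";                 // 'à' as typed into the form
        String received = misdecode(sent);      // two characters: U+00C3 U+00A0
        System.out.println(received.length());  // 2
        System.out.println(repair(received).equals(sent)); // true
    }
}
```

The round trip works because ISO 8859-1 maps every byte value to a character, so no information is lost by the wrong decoding.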
The problem was the default charset of the JVM when launched from the Windows shell.
It caused trouble with FileWriter and then showed bad characters in the console :)
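Since FileWriter relies on the JVM's default charset, one way to sidestep this is to write through an OutputStreamWriter with the charset named explicitly. A minimal sketch, assuming UTF-8 is the desired output encoding; ExplicitCharsetWriter and write are illustrative names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical replacement for a plain FileWriter.
final class ExplicitCharsetWriter {

    // Writes text using the named charset, independent of the platform default.
    static void write(File file, String text, String charsetName) throws IOException {
        Writer writer = new OutputStreamWriter(new FileOutputStream(file), charsetName);
        try {
            writer.write(text);
        } finally {
            writer.close();
        }
    }
}
```

With `ExplicitCharsetWriter.write(file, "à", "UTF-8")` the file contains the same two bytes whether the JVM was started from the IDE, a Linux shell, or the Windows shell.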
