I have a database with some Croatian characters in it, like Đ. In the database the character is stored correctly, and a PrimeFaces datatable also shows it correctly on the web page.
The problem is that when I send it to out.println(), the character Đ in the name is missing.
for (People p : people) {
System.out.println(p.getName());
}
I tried converting the name with p.getName().getBytes("ISO-8859-2"), but it is still not working.
I assume you are using UTF-8 as the default encoding for the database and for PrimeFaces.
Also have a look at this:
Display special characters using System.out.println
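A minimal sketch of the two things that usually matter here, assuming the data itself is fine: whether the JVM's default charset can represent Đ (U+0110) at all, and writing to stdout through a PrintStream with an explicit UTF-8 encoding. The class name and the sample name are made up for illustration, and the console or IDE still has to display UTF-8 for the character to appear.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class CroatianOutput {

    // True if every character of s can be represented in the named charset
    static boolean canEncode(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = "\u0110uro"; // hypothetical name starting with D with stroke (U+0110)

        Charset def = Charset.defaultCharset();
        System.out.println("Default charset: " + def);
        System.out.println("Default charset can encode the name: " + canEncode(name, def.name()));

        // Wrap stdout so characters are written as UTF-8 bytes; the console or IDE
        // must also decode UTF-8 for the character to show up
        PrintStream utf8Out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println(name);
    }
}
```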
Related
I want to filter a table and check a string with Selenium, but that string on the web page contains a Lithuanian special letter, so I get something like "M?nesis" instead of "Mėnesis":
ElementsCollection activePlans = $$(".view-content .tile__title").filterBy(text("Mėnesis"));
How can I do that?
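The question doesn't say where the "?" comes from. If it is the string literal itself being mangled (for example because the source file is saved or compiled with an encoding that has no ė), one option is to write the letter as a Unicode escape, or to compile with javac -encoding UTF-8. A small sketch under that assumption; the class and constant names are hypothetical:

```java
public class LithuanianLiteral {

    // "Menesis" with the e-with-dot-above (U+0117) written as a Unicode escape,
    // so the literal is the same no matter which encoding javac reads the file with
    static final String MONTH = "M\u0117nesis";

    public static void main(String[] args) {
        System.out.println(MONTH.length());
        System.out.println(Integer.toHexString(MONTH.charAt(1)));
        // In the Selenide call from the question, this would be used as:
        // $$(".view-content .tile__title").filterBy(text(MONTH));
    }
}
```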
I'm using jsoup to get all text from websites.
Document doc = Jsoup.connect("URL").get();
String allText = doc.text().toLowerCase();
Then I'm using Hibernate to persist the object that holds all the text to a MySQL DB:
...
@Column(name="all_text")
@Lob
private String allText = null;
...
Everything works so far, except that sometimes I get a MySQL error when I try to save the object with allText:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8A s...' for column 'all_text' at row 1
I already looked this up, and it's an encoding error; the websites probably contain some special characters. I found a way to fix this by changing the encoding in the DB.
But my actual question is: what's the best way to filter out the special characters from the allText string so that they are not persisted at all?
EDIT: To clarify, by special characters I mean emoticons and the like: anything that doesn't fit into the column's UTF-8 encoding. I'm not concerned about ~ ^ etc.
Thanks in advance!
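Just use a regex:
allText = allText.replaceAll("\\p{C}", "");
replaceAll returns a new String, so assign the result. If you compile the pattern yourself instead, don't forget to import java.util.regex.Pattern.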
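One caveat: the character in the error, \xF0\x9F\x98\x8A, is U+1F60A, which takes four bytes in UTF-8, while MySQL's utf8 charset stores at most three bytes per character. U+1F60A belongs to Unicode category So, not C, so \p{C} is not a dependable way to target it. A minimal sketch (Java 8+, hypothetical class name) that instead drops every code point above U+FFFF, i.e. exactly the characters that need four bytes:

```java
public class EmojiFilter {

    // Drops every code point outside the Basic Multilingual Plane (above U+FFFF).
    // In UTF-8 these are exactly the 4-byte sequences (such as \xF0\x9F\x98\x8A)
    // that MySQL's 3-byte "utf8" charset rejects.
    static String stripSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> cp <= 0xFFFF)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String allText = "nice day \uD83D\uDE0A see you"; // contains U+1F60A
        System.out.println(stripSupplementary(allText));    // prints "nice day  see you"
    }
}
```

This throws those characters away; converting the column to utf8mb4 on the DB side keeps them instead.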
I tried to insert some special characters into an Oracle table via Java and then retrieve them again, assuming my encoding would work.
Below is the code I tried:
String s=new String("yesterday"+"\u2019"+"s");
...
statement.executeUpdate("INSERT into test1 values ('"+s+"')");
ResultSet rs=statement.executeQuery("select * from test1");
while (rs.next()) {
System.out.println(new String(rs.getString(1).getBytes("UTF-8"),"UTF-8"));
}
...
Now, when I look at the output of a command-line run, the special character is always displayed incorrectly, although I expect yesterday’s.
My question is: why, even after using the encoding, does it not show the expected result, i.e. yesterday’s? Is the code above incorrect, or is some modification required?
P.S.: In Eclipse the code may print yesterday’s correctly, but when executed from the command line it does not.
I am using:
-- JDK 1.6
-- Oracle: 11.1.0.6.0
-- NLS_DATABASE_PARAMETERS: NLS_CHARACTERSET WE8MSWIN1252
-- Windows
Edit:
\u2019 is the RIGHT SINGLE QUOTATION MARK, and it is the only character I am interested in.
Check the Java property "file.encoding" when you run on the command line; it may be set to something other than "UTF-8", causing the text to display incorrectly when you print it on the command line.
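To see what that means in practice, here is a small diagnostic sketch (the class name is made up): it prints file.encoding and the default charset, then the bytes that ’ (U+2019) turns into in a few charsets. In windows-1252, the charset behind WE8MSWIN1252, ’ is the single byte 0x92; a Windows console running an OEM code page such as 437 or 850 shows that byte as Æ, which matches the SQL*Plus output in the next answer.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Hex dump of s encoded with the named charset; unmappable characters
    // become the charset's replacement byte (usually '?', 0x3F)
    static String bytesOf(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String quote = "\u2019"; // RIGHT SINGLE QUOTATION MARK
        System.out.println("UTF-8:        " + bytesOf(quote, "UTF-8"));        // E2 80 99
        System.out.println("windows-1252: " + bytesOf(quote, "windows-1252")); // 92
        System.out.println("ISO-8859-1:   " + bytesOf(quote, "ISO-8859-1"));   // 3F ('?')
    }
}
```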
Here is an illustration of what I suggested in a comment (changing the character set of your client), straight from my SQL*Plus session:
SQL> select unistr('\2019') from dual;
U
-
Æ
SQL> $chcp 1252
Active code page: 1252
SQL> select unistr('\2019') from dual;
U
-
’
If this works for you, you may want to add $chcp 1252 to your [g]login.sql.
The problem is that the character code for the apostrophe is \u0027, not \u2019.
I ran this from the command line:
public class Yesterday{
public static void main(String[] args) {
String s = new String("yesterday" + "\u0027" +"s");
System.out.println(s);
}
}
it resulted in:
yesterday's
I have to find a user-defined String in a document (using Java) that is stored in a database as a BLOB. When I search for a String with special characters (umlauts, äöü etc.), the search fails, meaning it does not return any positions at all. And I am not allowed to convert the document's content into UTF-8 (which would fix this problem but raise a new, even bigger one).
Some additional information:
The document's content is returned as a String decoded as "ISO-8859-1" (Latin-1).
Here is an example of what a String could look like:
Die Erkenntnis, daÃ der KÃ¼nstler Schutz braucht, ...
This is what it should look like:
Die Erkenntnis, daß der Künstler Schutz braucht, ...
If I search for Künstler, the search fails to find it, because it looks for ü but only finds Ã¼.
Is it possible to convert Künstler into KÃ¼nstler so I can search for the wrongly encoded version instead?
Note:
We are using the Hibernate framework for database access. The original getter for the document's content returns a byte[]. The String is then returned by calling
new String(getContent(), "ISO-8859-1")
The problem is that I cannot change this to UTF-8, because that would break the rest of our application, which is based on a third-party application that delivers data this way.
Okay, looks like I've found a way to mess up the encoding on purpose.
new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")
By getting the bytes of the String Künstler in UTF-8 and then creating a new String from them while telling Java that they are Latin-1, it converts to KÃ¼nstler. It's a hell of a hack, but it seems to work well.
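The hack can be wrapped in a small helper. A minimal sketch, assuming Java 7+ for StandardCharsets and a made-up sample document: it mangles the search key the same way the content was mangled, and then a plain indexOf finds it.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeSearch {

    // Encode as UTF-8, then decode those bytes as Latin-1: this reproduces the
    // mis-decoded form the stored content has (a u-umlaut becomes two chars)
    static String toMojibake(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical document content, decoded the same way as in the question
        String content = toMojibake("Die Erkenntnis, da\u00df der K\u00fcnstler Schutz braucht, ...");
        String key = "K\u00fcnstler";

        System.out.println(content.indexOf(key));             // -1: the correct spelling is not there
        System.out.println(content.indexOf(toMojibake(key))); // 25: the mangled spelling is found
    }
}
```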
Already answered by yourself.
An altogether different approach:
If you can search the BLOB in the database itself, you could search using
"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{ASCII}+", "%") + "%'"
This replaces every run of non-ASCII characters with the % wildcard. UTF-8 multibyte sequences are non-ASCII by design, so the wildcard matches them however they end up stored.
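A minimal sketch of building that LIKE pattern in Java. Java's Pattern documents the POSIX class as \p{ASCII} (upper case). The class name, and the table and column in the commented-out PreparedStatement lines, are hypothetical; binding the pattern as a parameter avoids concatenating the key into the SQL. The trade-off is that % can match more than the intended characters, so hits may need re-checking.

```java
public class LikePattern {

    // Replaces each run of non-ASCII characters with the SQL LIKE wildcard '%',
    // so the pattern matches the key however those characters were encoded
    static String likePattern(String key) {
        return "%" + key.replaceAll("\\P{ASCII}+", "%") + "%";
    }

    public static void main(String[] args) {
        System.out.println(likePattern("K\u00fcnstler")); // %K%nstler%
        // Hypothetical use with a bind parameter instead of string concatenation:
        //   PreparedStatement ps = conn.prepareStatement(
        //       "SELECT id FROM documents WHERE content LIKE ?");
        //   ps.setString(1, likePattern(key));
    }
}
```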
I am puzzled by this behavior. In my application I get data either from the server or from my own database (I clone the server database).
.replaceAll("\r\n", "<br/>");
When the data comes from the server, the line breaks are replaced. But when the data comes from the SQLite database, they are not replaced. I have tried .replaceAll("a", "??"), and that works.
The data in the database is:
Bradley Ambrose is the freelance cameraman who recorded the John Key and John Banks tea meeting.\r\n\r\nHe intentionally placed a black bag with a recording device on the table where Key and Banks were sitting, although he claims it was a mistake, If that were true then how did so many people get a copy of it???\r\n\r\nAlso this guy bloody changed his name from Brad White what the hell is this guy an international man of mystery or something.
I have also debugged the issue in detail, but the \r\n is not replaced even though the code executes the line above successfully.
I have also tried:
replaceAll("\n", "<br/>")
replaceAll("\r", "<br/>")
Does the input string contain actual CR and LF characters, or pairs of \ and r and \ and n?
The regex won't work in the latter case. It would require .replaceAll("\\\\r\\\\n", "<br/>").
Can you try with Pattern#quote() ?
Something like:
System.out.println("hello\r\n\r\n something".replaceAll(Pattern.quote("\r\n"), ""));
The code is fine. The data you are seeing in the debug screen is wrong. Run the same debug session with a System.out.println added, and compare its output with what the debug screen shows.
Unless you mean that the database actually contains the string "\r\n". The above assumes that the database contains actual carriage return and line feed characters. If your database instead has the backslash character followed by the 'n' character, your regex needs a simple tweak: s.replaceAll("\\\\r\\\\n", "")
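If you can't be sure which form the data has, one option is to handle both. A minimal sketch (hypothetical class name) using String.replace, which works on literal text and so needs no regex escaping:

```java
public class LineBreaks {

    // Replaces both real CR LF pairs and the four-character text backslash-r-backslash-n
    // with <br/>. String.replace works on literal text, so no regex escaping is needed.
    static String toBr(String s) {
        return s.replace("\r\n", "<br/>")    // actual carriage return + line feed
                .replace("\\r\\n", "<br/>"); // literal backslash, 'r', backslash, 'n'
    }

    public static void main(String[] args) {
        String fromServer = "line one\r\nline two";
        String fromSqlite = "line one\\r\\nline two"; // assumed form of the stored data
        System.out.println(toBr(fromServer)); // line one<br/>line two
        System.out.println(toBr(fromSqlite)); // line one<br/>line two
    }
}
```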