Due to repetitive errors with one of our Java applications:
Engine engine_0: Error in application action.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13)
was found in the element content of the document.
I need to "fix" some Unicode character in an Oracle database, ideally in a programmatic fashion. Once identified, what would be a simple way to "search and replace" it?
Assuming the characters are present in a text field:
update TABLE set COLUMN=REPLACE(convert(varchar(5000), COLUMN), 'searchstring', 'replacestring')
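(Note that this will only work on a text field with no more than 5000 characters; for larger text fields, increase the number in the query. CONVERT(varchar(...), ...) is SQL Server syntax; on Oracle, REPLACE can be applied to the column directly, and the character from the error above can be written as CHR(19).)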
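If the data also has to be guarded on the application side before it reaches the XML parser, a filter along these lines may help. This is only a sketch: the class and method names are made up for illustration, and it assumes the document is XML 1.0, whose Char production allows tab, line feed, carriage return and the ranges shown in the code.

```java
final class XmlCharCleaner {
    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input without characters that XML 1.0 rejects,
    // such as the 0x13 control character from the error message.
    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlCharCleaner::isValidXmlChar)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "abc" + (char) 0x13 + "def";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

Filtering in the application hides the bad rows rather than fixing them, so it is probably best combined with a one-off cleanup in the database.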
I am trying to display ŵ in my JSF page but am unable to do so. The text with special characters is read from a properties file, but on my application screen it becomes something else. I tried to use entity values without success. For example, if the original text is:
ŵyhsne klqdw dwql
then after replacing the character with an entity or hex value:
&wcirc ;yhsne klqdw dwql
my page displays the text literally, exactly as written.
I can only guess at what you are asking. Please edit the question and improve it.
If you are displaying the text on the web, you should use &wcirc; or the numeric reference &#373; (note: without spaces), but this also requires a font on the client side that supports the character.
If the string is in your code: replace the character with \u0175.
But probably the best way is to use just ŵ directly, whether in code, on the web, or in any file, and to make sure that those files (or sources) are interpreted as UTF-8 and that the pages you deliver are UTF-8. If you are not using UTF-8, check in the same way that you are consistently using the correct encoding.
And sending a character doesn't mean it can be displayed. There is always the possibility that a font will not have all "special" characters in it.
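As a rough illustration of the two escaped forms mentioned above, the following sketch converts non-ASCII characters either to backslash-u escapes (useful when a .properties file is loaded as ISO-8859-1, which is what Properties.load(InputStream) does) or to numeric character references for HTML. The class and method names are hypothetical, and whether your JSF setup actually reads the bundle as ISO-8859-1 is an assumption to verify.

```java
final class NonAsciiEscaper {
    // Writes every non-ASCII UTF-16 unit as a backslash-u escape with four
    // hex digits, the form understood by .properties files and Java source.
    static String toPropertiesEscapes(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Writes every non-ASCII code point as a decimal numeric character
    // reference, which does not depend on named entities being defined.
    static String toNumericReferences(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }
}
```

For the text in the question, the first method yields \u0175yhsne klqdw dwql and the second yields &#373;yhsne klqdw dwql. If the output component escapes markup, the numeric reference will still show up literally, so the properties-file form is likely the more useful one here.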
I have an issue inserting PDF text into a MySQL table. The error message is as follows:
"Incorrect string value: '\xF0\x9D\x9B\xBC i...' for column 'text' at row 1"
I know that this code refers to the Greek letter alpha. I have set the character set to UTF-8 both for the column text and for the MySQL connection. I have also tried utf8mb4. However, none of it worked.
The Greek letter alpha occurs in different font types. I am not sure if this matters.
Any ideas why this does not work?
I also created a pdf file myself which contained an alpha in the text. For this example, my programme runs without any errors. Although I know that the error message refers to the alpha, there seems to be an additional issue.
Thanks in advance!
UPDATE:
After some checking, I found that some really strange symbols were created from a formula which contained the Greek letter alpha. So, apparently, these unknown symbols led to the error.
However, I still do not know how to exclude any unknown symbols from the text. What is the easiest way to do this?
These are the symbols:
I restricted the string in Java to only Latin symbols. Maybe that's not the most general way of getting rid of those strange symbols, but it works for now.
In MySQL, use CHARACTER SET utf8mb4.
Add ?useUnicode=yes&characterEncoding=UTF-8 to the JDBC URL
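The bytes in the error message, F0 9D 9B BC, are the four-byte UTF-8 encoding of U+1D6FC (MATHEMATICAL ITALIC SMALL ALPHA), which lies outside the Basic Multilingual Plane and therefore does not fit MySQL's three-byte utf8. A small check like the one below can show whether a string contains such characters, or drop them if switching to utf8mb4 is not possible. The class and method names are invented for this sketch.

```java
final class SupplementaryChars {
    // True if the string contains any code point above U+FFFF, i.e. one
    // that takes four bytes in UTF-8 and needs utf8mb4 in MySQL.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Returns a copy of the string without code points above U+FFFF.
    static String dropSupplementary(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !Character.isSupplementaryCodePoint(cp))
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String mathAlpha = new String(Character.toChars(0x1D6FC));
        System.out.println(needsUtf8mb4(mathAlpha + " i"));
        System.out.println(needsUtf8mb4("plain text"));
    }
}
```

Unlike restricting the text to Latin letters, this keeps ordinary Greek, accented and other BMP characters and removes only what a three-byte column cannot store.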
I tried using Teradata FastLoad.
Here is the sample file provided on the official website:
L_INDEX,L_TIMESTAMP,L_TEXT
1,2010-08-11 13:19:05.1,some text
2,2010-08-11 13:19:05.1,
3,2010-08-11 13:19:05.1,more text
4,,text
5,,
It runs perfectly with the above file.
Then I modified ONLY the first row, so that some text became "some, text". The following is a perfectly legitimate CSV:
L_INDEX,L_TIMESTAMP,L_TEXT
1,2010-08-11 13:19:05.1,"some, text" // this row was slightly modified
2,2010-08-11 13:19:05.1,
3,2010-08-11 13:19:05.1,more text
4,,text
5,,
However, I got an error saying that the first row contains 4 values but only 3 values were expected.
As far as I understand, I must specify the text qualifier ". How can I do this?
I read the documentation, but nothing is mentioned about this.
According to the FastLoad Utility documentation pertaining to the selection of a delimiter for use with the SET RECORD command and a VARTEXT layout:
Any character sequence that appears in the data cannot be used as a
delimiter. No control character other than a tab character can be used
in a delimiter.
This would likely extend to the use of the FastLoad API mechanism leveraged within the Teradata JDBC driver.
EDIT
FastLoad has been around for 15+ years and does what it was intended to do well -- load lots of data fast. Your other options would be to create a fixed-length record, where you do not have to rely on a delimiter, or to create an INMOD to parse the file as it is streamed into FastLoad.
Other alternatives include MultiLoad, Teradata Parallel Transporter, TPump, or a proper ETL tool to load your data. Each has its own advantages and disadvantages that have to be weighed against the format of the data being supplied to the environment.
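Another possibility, given the delimiter restriction quoted above, is to preprocess the file before loading it: parse the quoted CSV and write it out again with a delimiter that does not occur in the data. The sketch below assumes a pipe as the replacement delimiter, double quotes as the text qualifier, and no line breaks inside quoted fields; the class and method names are hypothetical and nothing here is part of FastLoad itself.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvRedelimiter {
    // Splits one CSV line on commas, honouring double-quoted fields and
    // treating a doubled quote inside a quoted field as a literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of the escaped pair
                } else {
                    inQuotes = false;
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Re-joins the parsed fields with the new delimiter and refuses to
    // continue if that delimiter already occurs inside a field.
    static String redelimit(String line, char delimiter) {
        List<String> fields = parseLine(line);
        for (String field : fields) {
            if (field.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException("Delimiter occurs in data: " + field);
            }
        }
        return String.join(String.valueOf(delimiter), fields);
    }
}
```

Applied to the modified first row, redelimit(line, '|') gives 1|2010-08-11 13:19:05.1|some, text, which has three fields again once the pipe is declared as the VARTEXT delimiter.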
In our application we have a textfield that is controlled by TinyMCE. If the customer pastes text from Word into the textfield, Oracle balks when we are trying to store this text in our database:
ORA-01461: can bind a LONG value only for insert into a LONG column
Cleaning the text in, say, Notepad makes the problem go away, so my guess is that the input string contains some kind of binary junk that Oracle uses as a delimiter between the values in the SQL insert string.
Upgrading our ancient TinyMCE will probably fix the problem, but I also want to ensure the text really is clean when it is passed to the lower layers. So I thought that I might ensure the text is true ASCII and, if not, remove everything that does not pass as ASCII by looping through the lines in the input and doing the following:
line.replaceAll("[^\\p{ASCII}]", "")
Is this a viable solution, and if not, what are the pitfalls?
What about cleaning the pasted content as I described here?
This might also remove junk.
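On the pitfalls of the ASCII-only approach: the regular expression silently deletes every accented letter and all of Word's typographic punctuation, while control characters, which are part of ASCII, are kept. The sketch below puts the expression from the question next to a gentler alternative that only translates common Word punctuation; the class name, the method names and the choice of replacements are assumptions made for illustration.

```java
final class PasteCleaner {
    // The approach from the question: delete every non-ASCII character.
    static String asciiOnly(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    // A gentler alternative: map typical Word punctuation to ASCII
    // equivalents and leave all other characters untouched.
    static String replaceWordPunctuation(String s) {
        return s.replace('\u2018', '\'').replace('\u2019', '\'') // single quotes
                .replace('\u201C', '"').replace('\u201D', '"')   // double quotes
                .replace('\u2013', '-').replace('\u2014', '-')   // dashes
                .replace("\u2026", "...")                        // ellipsis
                .replace('\u00A0', ' ');                         // no-break space
    }

    public static void main(String[] args) {
        String pasted = "caf\u00E9 \u201Cok\u201D";
        System.out.println(asciiOnly(pasted));
        System.out.println(replaceWordPunctuation(pasted));
    }
}
```

For the sample string, asciiOnly returns caf ok, losing both the accent and the quotes, whereas replaceWordPunctuation keeps the é and turns the curly quotes into straight ones.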
There is no problem when I insert the characters "ñÑ" into the MySQL database. However, when I retrieve the same data, the characters returned by the query appear as a null value, a ?, or a square.
Please help me with this; I have been struggling with it for many weeks and I just cannot understand it anymore. The code is written in Java.
The "�" is the replacement character, used when something processing characters can't display or otherwise handle a character. A box is sometimes used for the same purpose, or indicates that the font being used doesn't have a glyph for some character.
To resolve this, check that the character sets being used for the various components, such as the column and connection, are correct.
See also: "Setting the default Java character encoding?"
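To see how a mismatch between components produces these symptoms, the following sketch encodes a string with one charset and decodes the resulting bytes with another, which is roughly what happens when the column and the connection disagree. It uses only standard Java classes; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    // Encodes the text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        String original = "\u00F1\u00D1"; // the two characters from the question

        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Written as ISO-8859-1 but read as UTF-8: the bytes are not valid
        // UTF-8, so the decoder substitutes the replacement character.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // Written through a charset that cannot represent the characters:
        // each one is stored as a question mark and cannot be recovered.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII));
    }
}
```

If the last case matches what you see, the data was probably already damaged when it was written, so fixing the settings will only help for rows inserted afterwards.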