From client to server, my app correctly accepts text of any kind (I'm using prepared statements to stay safe on the database end). When I display text from server to client, however, I'm running into a problem with text that contains quotation marks, and perhaps there are other cases I haven't thought of or encountered yet.
label:"<%=myObj.getText()%>"
is being translated into:
label:""Hello World""
I think what I need is for the quotation marks in the data to be escaped like so: \"
Is there a class that will do this encoding already so I don't have to write my own parser?
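To make the goal concrete, here is the kind of transformation I'm after. (A minimal sketch; StringEscapeUtils.escapeJavaScript from Apache Commons Lang is one candidate I've come across, but I haven't verified it covers every case.)
import org.apache.commons.lang.StringEscapeUtils;

public class EscapeDemo {
    public static void main(String[] args) {
        String text = "\"Hello World\"";  // stands in for myObj.getText()
        // Quotes become \" so the surrounding string literal stays intact.
        System.out.println("label:\"" + StringEscapeUtils.escapeJavaScript(text) + "\"");
        // prints: label:"\"Hello World\""
    }
}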
Searching through Stack Overflow, I found a promising answer in java.net.URLEncoder, which encodes special characters. In reading through its documentation, though, I found one translation I didn't want:
The space character " " is converted into a plus sign "+".
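A quick demonstration of the behavior that rules it out for this purpose:
import java.net.URLEncoder;

public class UrlEncoderDemo {
    public static void main(String[] args) throws Exception {
        // URLEncoder targets application/x-www-form-urlencoded, not JS strings.
        System.out.println(URLEncoder.encode("Hello World", "UTF-8"));
        // prints: Hello+World
    }
}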
I have a table of projects with a project name column, and that name may contain special characters, alphanumeric values, or any combination of numbers, words, and special characters.
Now I need to implement a keyword search on it, and the search term itself may contain special characters.
So my question is: how can we search for single or multiple special characters in the database?
I am using MySQL 5.0 with the Java Hibernate API.
This should be possible with some simple sanitization of your query.
E.g. a search for \#(%*#$\ becomes:
SELECT * FROM foo WHERE name LIKE "%\\#(\%*#$\\%";
When evaluated, the backslashes escape the special characters, so the search ends up matching anything that contains "\#(%*#$\".
In general, anything that's a special character in a string can be escaped via a backslash. This only really becomes tricky if you have a name such as "\\foo\\bar\\", which properly escaped becomes "\\\\foo\\\\bar\\\\".
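For illustration, here is one way to do that escaping in Java before binding the term as a prepared-statement parameter (a sketch: escapeLike is my own helper, the table and column names are assumptions, and '!' is an arbitrary escape character declared via SQL's ESCAPE clause):
static String escapeLike(String term) {
    // Escape the escape character itself first, then the LIKE wildcards.
    return term.replace("!", "!!").replace("%", "!%").replace("_", "!_");
}

PreparedStatement ps = connection.prepareStatement(
        "SELECT * FROM project WHERE name LIKE ? ESCAPE '!'");
ps.setString(1, "%" + escapeLike(searchTerm) + "%");
Binding the term as a parameter also keeps the search safe from SQL injection, which hand-built LIKE strings are prone to.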
A side note: please proofread your posts before finalizing them. It's really disheartening and shows a lack of effort when your question's title has spelling errors in it.
I work on a Java EE web application that uses a combination of Dojo and plain JavaScript for the front-end.
We've discovered that when ResourceBundle properties are used in JavaScript, in some cases they end up breaking code.
Specifically, this happens when the properties contain quotes (single and double) and escape sequences (\n, \s, ...).
The solution seems to be to include extra escape characters. For instance, \n needs to be prefixed with one more backslash (\\n) when used in a JS alert to correctly render the line break, and quotes, if not escaped, truncate the content prematurely for obvious reasons.
Our solution to the above issues so far has been to put the extra escape characters in the property files themselves. But this is something that we would like to move away from.
It seems like this might be a widespread problem and I'd like to hear from the experts on how you might have solved this problem.
Current Usage: key=A newline is represented with \\n and this \" is within quotes \".
Envisioned Usage : key=A newline is represented with \n and this " is within quotes ".
PS: We typically use the <fmt:message> tag to access these values in the front end and for use in JavaScript.
Consider using StringEscapeUtils. It has a method to escape input like yours.
http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringEscapeUtils.html#escapeJava(java.lang.String)
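A sketch of how that might look at render time, so the .properties files can hold plain, unescaped text (escapeJavaScript is from the commons-lang library linked above; the bundle and key names are placeholders):
import java.util.ResourceBundle;
import org.apache.commons.lang.StringEscapeUtils;

ResourceBundle bundle = ResourceBundle.getBundle("messages");
String raw = bundle.getString("key");                   // plain quotes, \n, ...
String safe = StringEscapeUtils.escapeJavaScript(raw);  // safe to embed in JS
You could then expose this as an EL function or custom tag so that JSPs apply it wherever a bundle value is written into JavaScript.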
I have a Java JSP/servlet application running in Tomcat, fronted by Apache.
The server side checks to make sure only letters in ranges [A..Z][a..z], digits, and punctuation symbols are accepted.
However, when, for example, a Chinese character is entered, the value on the server side looks something like '&#5960;' (an HTML numeric character reference).
Hence, as far as the server side is concerned, these are valid punctuation symbols and digits.
Any pointers that can help? This is driving me insane after a 10-hour coding marathon.
You can use Apache Commons StringEscapeUtils.unescapeHtml() in Java.
unescapeHtml(String str)
does the following:
Unescapes a string containing entity escapes to a string containing
the actual Unicode characters corresponding to the escapes.
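For example (a sketch; &#22920; is the numeric character reference for one Chinese character):
import org.apache.commons.lang.StringEscapeUtils;

String received = "&#22920;&#22920;";                     // what the server saw
String actual = StringEscapeUtils.unescapeHtml(received); // the real characters
You can then run your validation against the actual characters instead of the entity text.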
You need to process the text using a Unicode encoding like UTF-8.
First make sure your server is handling requests with UTF-8 encoding. Where you set or configure that will depend on how you're implementing your JSPs/Servlets, but see: http://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#setCharacterEncoding(java.lang.String)
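One common way to apply that in a single place is a servlet filter mapped in front of everything else (a minimal sketch; the class name is mine, and the call must happen before any request parameter is read):
import java.io.IOException;
import javax.servlet.*;

public class CharsetFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        req.setCharacterEncoding("UTF-8");  // decode request parameters as UTF-8
        res.setCharacterEncoding("UTF-8");  // encode the response writer as UTF-8
        chain.doFilter(req, res);
    }
    public void init(FilterConfig config) {}
    public void destroy() {}
}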
I'm having trouble when dealing with non-English filenames.
The problem is that my program cannot guarantee those directories and filenames are in English; if some filenames use Japanese or Chinese characters, it will display characters like '?'.
Can anybody suggest what I need to do to access non-English filenames?
The problem is apparently that "it" is using the wrong character set to display the filenames. The solution depends on whether "it" is your program (via a GUI), some other application, the command shell / terminal emulator, or the user's web browser. If you could provide more information, maybe I could offer some suggestions.
But turning the characters into underscores is most likely a bad solution. It is liable to lead to filename clashes, and those Chinese / Japanese / etc characters are most likely meaningful to the people who created the files.
By the way, the correct term for "english" letters is Latin.
EDIT
For your use-case, you don't need to store the PDF file using a filename that bears any relation to the supplied filename. I suggest that you try to solve the problem by using a filename consisting of Latin letters and digits generated from (say) System.currentTimeMillis(). If that fails, then your real problem has nothing to do with the filenames at all.
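Something along these lines (a sketch; uploadDir and the .pdf suffix are assumptions based on your description):
// Generate a storage name that is guaranteed to be ASCII-only; keep the
// user's original (possibly non-Latin) name in a database column if needed.
String storedName = "upload-" + System.currentTimeMillis() + ".pdf";
java.io.File target = new java.io.File(uploadDir, storedName);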
EDIT 2
You ask about the statement
if (fileName.startsWith("=?iso-8859"))
This seems to be trying to unpick a filename in MIME encoded-word format; see RFC 2047 Section 2
Firstly, I think that code may be unnecessary. The javadoc is not specific, but I think the Part.getFileName() method should deal with decoding of the filename.
Second, if the decoding is necessary, then you are going about it the wrong way. The stuff after the charset cannot simply be treated as the value of the filename. Look at the RFC.
Third, if you do need to decode it, you should use the relevant MimeUtility methods to decode "word" tokens ... like the filename (see the sketch after these points).
Fourthly, ISO-8859-1 is NOT a suitable encoding for characters in non-Latin character sets.
Finally, examine the raw email headers of the emails that you are trying to decode and look for the header line that starts
Content-Disposition: attachment; filename=...
If the filename looks like "=?iso-8859-1?...", and the filename is supposed to contain Japanese / Chinese / etc characters, then the problem is in the client (or whatever) that constructed the email. The character set needs to be "utf-8" or one of the other multibyte character sets.
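Here is a sketch of the MimeUtility route from point three (decodeFileName is my own wrapper; depending on the mail provider, Part.getFileName() may already return a decoded value, so check before decoding twice):
import javax.mail.Part;
import javax.mail.internet.MimeUtility;

static String decodeFileName(Part part) throws Exception {
    String raw = part.getFileName();  // may be null, or already decoded
    // decodeText understands the =?charset?encoding?text?= encoded-word syntax
    return (raw == null) ? null : MimeUtility.decodeText(raw);
}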
Java uses Unicode natively - you don't need to replace special characters, as Unicode has no special characters - every code point is treated equally. Your replaceSpChars() may be the culprit here.
We are using Java and Oracle for development.
I have a table in an Oracle database which has a CLOB column. Some XYZ application dumps a text file into this column. The text file has multiple rows.
Is it possible that, while reading the same CLOB through a Java application, the escape sequences (newline characters, etc.) may get lost?
The reason I ask is that we are going to parse this file line by line, and if the escape sequences are lost, we would be in trouble. I would have done this analysis myself, but I am on vacation and my team needs urgent help.
I would really appreciate any thoughts/inputs.
You need to ensure that you use the one correct and same character encoding throughout the whole process. I strongly recommend you pick UTF-8 for that. It covers every human character known to the world. Every step that involves handling of character data should be instructed to use the very same encoding.
In the SQL context, ensure that the DB and table are created with the UTF-8 charset. In the JDBC context, ensure that the JDBC driver is using UTF-8; this is often configurable via the JDBC connection string. In the Java code context, ensure that you're using UTF-8 when reading/writing character data from/to streams; you can specify it as the 2nd constructor argument of InputStreamReader and OutputStreamWriter.
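For the stream step, that looks something like this (a sketch; the file names are illustrative):
import java.io.*;

// The 2nd constructor argument pins the charset instead of relying on
// the platform default.
Reader in = new InputStreamReader(new FileInputStream("dump.txt"), "UTF-8");
Writer out = new OutputStreamWriter(new FileOutputStream("copy.txt"), "UTF-8");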
A CLOB stores character data. Carriage returns and line feeds are valid characters, though unprintable ones. As long as your XYZ app is correctly filling your CLOBs, the contents should be just as manageable to you as if they had come from the file.
Depending on the platform and the nature of said "XYZ app," lines could be separated by either \r (classic Mac), \r\n (DOS/Windows) or \n (Unix/Linux), and you should make allowance for this fact if necessary. This is one aspect where BufferedReader.readLine() is more convenient, as it transparently gets rid of this difference for you.
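Concretely, reading the CLOB line by line might look like this (a sketch; the column name and process() are placeholders):
import java.io.BufferedReader;
import java.sql.Clob;

Clob clob = resultSet.getClob("doc_text");
try (BufferedReader reader = new BufferedReader(clob.getCharacterStream())) {
    String line;
    while ((line = reader.readLine()) != null) {
        process(line);  // readLine() handles \n, \r and \r\n uniformly
    }
}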
I'm not 100% sure what you mean by escape sequences in this context. Within a (for example) Java literal string, "\n" is an escape sequence representing a newline, but once that string is outputted into something (say, a database), it's not an escape sequence any more, it's an actual newline character.
Anyhow, to your direct question: yes, Java can read text from Oracle CLOBs perfectly fine. Newlines are not lost.