How to get Unicode in Jar [duplicate]

I wrote a program that reads and displays Unicode (Telugu) text. I am using Windows XP and Eclipse. When I run the program in the IDE, it shows the Unicode text correctly, but when I export it as a .jar file the Unicode is not rendered and shows as boxes.
I followed these instructions to install Unicode support on my computer, including links to install the Telugu fonts on my system.
Can anyone please tell me how I can get Unicode working in Jar files?

While not answering the question directly, here is a small howto on how to read/write text files correctly, in an OS-independent way.
The first thing to know is that the JVM has a file.encoding property. It defines the default encoding used for all file read/write operations and for all readers/writers created without specifying an encoding.
As such, you don't want to use the default constructors; define the encoding each time instead. In Java, the class which "embodies" an encoding is Charset. If you want UTF-8, you will use:
StandardCharsets.UTF_8 (Java 7+),
Charset.forName("UTF-8") (Java 6-),
Charsets.UTF_8 (if you use Guava).
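For instance, a minimal sketch of obtaining the charset; the UTF8 variable defined here is the one referenced in the samples below:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final Charset UTF8 = StandardCharsets.UTF_8;       // Java 7+
// final Charset UTF8 = Charset.forName("UTF-8");  // Java 6 and earlier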
In order to read a file correctly, open an InputStream to that file, then an InputStreamReader over that InputStream (in the code samples below, UTF8 is the UTF-8 charset obtained from one of the methods above):
final InputStream in = new FileInputStream(...);
final Reader reader = new InputStreamReader(in, UTF8);
In order to write a file correctly, open an OutputStream to it, then an OutputStreamWriter over that OutputStream:
final OutputStream out = new FileOutputStream(...);
final Writer writer = new OutputStreamWriter(out, UTF8);
And, of course, do not forget to .close() all of the streams/readers/writers in a finally block. Hint: if you don't use Java 7, use Closer from Guava 14.0+. It is the most secure way to deal with multiple I/O resources and to ensure they are dealt with correctly.
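With Java 7+, a minimal sketch of the same read/write pattern using try-with-resources instead of an explicit finally block (the file names input.txt and output.txt are just placeholders):

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf8Copy {
    public static void main(String[] args) throws IOException {
        // both resources are closed automatically, even if an exception is thrown
        try (Reader reader = new InputStreamReader(
                 new FileInputStream("input.txt"), StandardCharsets.UTF_8);
             Writer writer = new OutputStreamWriter(
                 new FileOutputStream("output.txt"), StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }
}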

You have the code already (I didn't follow the links), but you may compare your code with How to import a font - registerFont is crucial.
Also, in a jar file all paths are case-sensitive. You may inspect the jar with 7-Zip or WinZip.
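For reference, a minimal sketch of loading a font bundled inside the jar and registering it; the resource path /fonts/Telugu.ttf is a hypothetical placeholder:

import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.GraphicsEnvironment;
import java.io.IOException;
import java.io.InputStream;

public class FontLoader {
    public static Font loadTeluguFont() throws IOException, FontFormatException {
        // load the font file as a classpath resource, so it also works inside the jar
        try (InputStream in = FontLoader.class.getResourceAsStream("/fonts/Telugu.ttf")) {
            if (in == null) {
                throw new IOException("font resource not found in jar");
            }
            Font font = Font.createFont(Font.TRUETYPE_FONT, in);
            // make the font available to the whole application
            GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(font);
            return font.deriveFont(Font.PLAIN, 16f);
        }
    }
}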

Related

BufferedOutputStream not working with Korean characters as expected

I'm trying to write Korean characters to a file, but it writes what looks like gibberish that I have to work around to display as Korean when I open it as CSV. How can I achieve my requirement without the workaround of decoding back to UTF-8 to show the Korean data?
File localExport = File.createTempFile("char-test", ".csv");
try (
    FileOutputStream fos = new FileOutputStream(localExport);
    BufferedOutputStream bos = new BufferedOutputStream(fos);
    OutputStreamWriter outputStreamWriter =
        new OutputStreamWriter(bos, StandardCharsets.UTF_8)
) {
    ArrayList<String> rows = new ArrayList<>();
    rows.add("\"가짜 사용자\",사용자123,saint1_user123");
    rows.add("\"페이크유저루노도스트레스 성도1\",saint1_user1");
    for (int i = 0; i < 2; i++) {
        String csvUserStr = rows.get(i);
        outputStreamWriter.write(csvUserStr);
    }
}
It's writing garbled data instead of what I'm actually writing to the file.
There is absolutely nothing wrong with your Java code. You are writing those characters, including the Korean ones, precisely as written.
Whatever tool you are using to look at this file? That's the broken one. Tell it that the file is UTF-8. If you can't, get a better tool, or figure out which encoding it does read and update your Java code to write that encoding.
Note that CSV files, text files, etc. do not store the encoding that was used to write the data. All the programs that read/write the file simply need to know what encoding it is; there's no real way to know other than being told.
UPDATE: From a comment it looks like 'the tool that is reading this' is Excel.
Excel asks for the encoding of the file when you use the 'import CSV' dialog. Pick UTF-8 in the dropdown. It depends on which version/OS you're on, but usually the setting is called 'File Origin'.
If you prefer that your client need not mess with that dialog: the default is usually something like MacRoman or Windows-1252, and with such an encoding it is in fact impossible to get Korean characters; they simply aren't in that set.
If you want the fire-and-forget approach, generate the Excel file yourself, for example using Apache POI.
CSV files don't have any means to carry encoding information "in-band", that is, in the file itself. I'm guessing the default character encoding used for Excel CSV imports is the system default, so if that isn't a Korean-capable encoding, your client will have to specify the encoding when they import the CSV. If their requirement is CSV, they have no choice but to accept that behavior.
However, if their requirement is to open your file in Excel (and not that the file has to be in CSV format), you could write an Excel spreadsheet instead. The various Excel file formats do include character encoding information, so they would be able to open the file without manually specifying the encoding.
Library recommendations are off-topic, but libraries such as Apache POI make writing simple Excel sheets fairly easy. There are additional benefits as well, such as taking care of any necessary escaping for you, so that your file doesn't repeatedly break when unanticipated values end up in the spreadsheet.
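A minimal sketch with Apache POI (assuming a recent poi-ooxml artifact on the classpath; the sheet and file names are just placeholders):

import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelExport {
    public static void main(String[] args) throws IOException {
        try (Workbook wb = new XSSFWorkbook();
             FileOutputStream out = new FileOutputStream("char-test.xlsx")) {
            Sheet sheet = wb.createSheet("users");
            // strings are stored as Unicode inside the xlsx format,
            // so Korean text needs no encoding hints on the Excel side
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("가짜 사용자");
            row.createCell(1).setCellValue("사용자123");
            row.createCell(2).setCellValue("saint1_user123");
            wb.write(out);
        }
    }
}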
As mentioned, Excel fails to detect that the text is encoded in UTF-8. One solution is to write the invisible BOM character as the first one, before the loop from the question:
outputStreamWriter.write("\uFEFF");
for (int i = 0; i < 2; i++) {
    outputStreamWriter.write(rows.get(i));
}
This marker is normally superfluous and ugly for the various UTF encodings, but here it lets Excel detect that the file is UTF-8.
By the way, take a look at the class java.nio.file.Files; it can reduce the code to one line.
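For instance, a minimal sketch with Files.write, which also appends a line separator after each row (something the question's loop did not do); the file name is a placeholder:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class OneLiner {
    public static void main(String[] args) throws Exception {
        List<String> rows = Arrays.asList(
            "\uFEFF\"가짜 사용자\",사용자123,saint1_user123",   // BOM on the first row
            "\"페이크유저루노도스트레스 성도1\",saint1_user1");
        // one line: write all rows as UTF-8
        Files.write(Paths.get("char-test.csv"), rows, StandardCharsets.UTF_8);
    }
}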

How to read textfiles with unknown encoding?

I want to read several text files (e.g. CSV), but I don't know the encoding.
As the text files may contain special characters like umlauts, choosing the right encoding seems to be crucial.
new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));
I tried reading with ISO_8859_1, which did not work properly with encoded umlauts. Then I tried UTF-8, which works.
But I don't know whether this might also cause problems with different files in the future, and I never know before reading a file which encoding it is in.
So how should I best read files whose encoding is unknown?
Strictly speaking, the other two answers are right - you just have to know what the encoding is to be guaranteed of anything. However, there are libraries out there that will allow you to make educated guesses about the encoding. Check out ICU4J or jchardet, for example.
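For example, a minimal sketch using ICU4J's CharsetDetector (assuming the icu4j jar is on the classpath; the result is a statistical guess, not a guarantee):

import java.nio.file.Files;
import java.nio.file.Paths;
import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

public class GuessEncoding {
    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        CharsetDetector detector = new CharsetDetector();
        detector.setText(data);
        CharsetMatch match = detector.detect();  // best guess only
        System.out.println(match.getName() + " (confidence " + match.getConfidence() + "%)");
        // decode the bytes using the guessed charset
        String text = match.getString();
        System.out.println(text);
    }
}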
You have to know the encoding; you cannot read the files correctly if you don't know it. As UTF-8 works, just keep using it. Also check with the producer of the files whether they will keep producing them in UTF-8; they should document this.
It is impossible to reliably recognize the encoding of a text file programmatically. The only way is to try opening it in a text editor with different encodings until you can read the text.

Java Serialization save object in files

In Java, when serializing objects:
FileOutputStream fileOut = new FileOutputStream("src/employee.ser");
ObjectOutputStream out = new ObjectOutputStream(fileOut);
out.writeObject(em);
out.close();
fileOut.close();
Can we use any kind of extension, such as .bin or .txt, for the output file? Why is .ser the most preferable?
.ser is shorthand for Serializable and the common three-letter file extension for it. You can use any other extension you like, or no extension at all; the file will be created without problems. Test it. After you demonstrate to yourself that this is possible, I would recommend defining a proper extension for the generated files, or just keep using .ser, since Java developers commonly recognize it as serialized binary data.
Imagine you use .txt as the extension rather than .ser or a custom extension. Another, non-developer user of the PC accidentally enters the folder containing your binary data file with a .txt extension (probably on a Windows or Mac environment, and hardly but not impossibly on Linux :) ), opens it, and sees gibberish, because after all it is serialized data. This user may do nothing about the gibberish, or may think the file is corrupted and delete it. IMO this is why it is better to use a non-common extension for files containing binary data.
Note: you can open almost any file with almost any program despite its extension and see the program choke on an unrecognized format or display gibberish, but that's outside the scope of the question.
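For completeness, a minimal round-trip sketch with try-with-resources; the Employee class and the file name are just placeholders (the class must implement Serializable):

import java.io.*;

public class SerializationDemo {
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Employee(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        // write: the extension is arbitrary; .ser is just a convention
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream("employee.ser"))) {
            out.writeObject(new Employee("Kim"));
        }
        // read the object back
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream("employee.ser"))) {
            Employee em = (Employee) in.readObject();
            System.out.println(em.name);
        }
    }
}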

Is it possible to create InputStream for a UTF-8 file?

We are making some code changes to our production code.
Earlier we used an InputStream (basically a FileInputStream) for reading a file from a file path, and this InputStream is passed on to many methods.
Now we have realized that the file can also contain Chinese characters, so we want to use UTF-8 encoding.
I have the file path as a String, and sometimes the file contains Chinese characters and sometimes not.
I am reluctant to make changes in so many methods, so I was trying to somehow use UTF-8 encoding while creating the InputStream (FileInputStream).
I searched the internet, but all I could find reads the output through a BufferedReader/InputStreamReader (for example Reading InputStream as UTF-8 or http://www.mkyong.com/java/how-to-read-utf-8-encoded-data-from-a-file-java/).
So is it possible to read a file from a file path, handle Chinese characters as well, and turn it into an InputStream?
An InputStream does not handle text, so it does not care about the encoding. The direct answer to your question, then, is: no, you can't create an InputStream with UTF-8 encoding.
You can however handle UTF-8 files just fine with an InputStream by simply carrying the bytes around and never manipulating them in any way.
If you want to read text from a file you need to construct a Reader and then you'll need to specify the encoding (UTF-8 for you) in the constructor.
If you show us the point where data from the InputStream gets turned into String or char[] objects, then I can show you the place where you need to change your code.
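A minimal sketch of what that conversion point typically looks like; the method name readAll is hypothetical:

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf8Boundary {
    // the InputStream stays encoding-agnostic; the charset matters
    // only at the byte-to-char boundary below
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}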

Does endsWith consider a path's file extension, and is this a security vulnerability?

When I call endsWith(".pdf"), would this open malware.pdf.exe or just malware.pdf?
String sFileName = request.getParameter("fName");
if (sFileName.toLowerCase().endsWith(".pdf")) {
    // open the file
} else {
    // don't open the file
}
String.endsWith works as documented. However, there are a couple of obvious problems here.
A NUL character \0 will typically terminate the string as far as the OS file API is concerned (because it'll be using C strings), so a name like malware.exe\0.pdf could pass the endsWith(".pdf") check while the OS actually opens malware.exe.
If served up, the file's type may be guessed from its content rather than its extension, possibly being "magicked" into a different type.
It's generally dangerous to run PDFs downloaded from the internet from the local filesystem. (Chrome warns about this; see also Billy Rios on Content Smuggling.)
.endsWith("string") will perform as you intend. However, that doesn't mean that the file is actually a pdf. Check out this SO question or others for more information on how to check the header.
