“UTF-8” encoding is not working in java build [closed] - java

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I saved my Java source file with its encoding set to UTF-8 in Eclipse, and it works fine there.
When I create a build with Maven and run it on my system, Unicode characters are not displayed correctly.
This is my code :
byte[] bytes = new byte[dataLength];
buffer.readBytes(bytes);
String s = new String(bytes, Charset.forName("UTF-8"));
System.out.println(s);
Eclipse console & Windows console screenshot attached.
I expect the same output as in Eclipse on other systems (Windows Command Prompt, PowerShell, a Linux machine, etc.).

You could use the Console class for that. The following code could give you some inspiration:
import java.io.Console;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class Foo {
    public static void main(String[] args) throws IOException {
        String s = "öäü";
        write(s);
    }

    private static void write(String s) throws IOException {
        // the encoding the JVM uses by default for System.out
        String encoding = new OutputStreamWriter(System.out).getEncoding();
        Console console = System.console();
        if (console != null) {
            // if there is a console attached to the jvm, use it.
            System.out.println("Using encoding " + encoding + " (Console)");
            try (PrintWriter writer = console.writer()) {
                writer.write(s);
                writer.flush();
            }
        } else {
            // fall back to "normal" system out
            System.out.println("Using encoding " + encoding + " (System out)");
            System.out.print(s);
        }
    }
}
Tested on Windows 10 (PowerShell) and Ubuntu 16.04 (bash) with default settings. Also works from within IntelliJ (Windows and Linux).

From what I can tell, you either have the wrong character, which I don't think is the case, or you are trying to display it on a terminal that doesn't handle the character. I have written a short test to separate the issues.
public static void main(String[] args) {
    String testA = "ֆޘᜅᾮ";
    String testB = "\u0586\u0798\u1705\u1FAE";
    System.out.println(testA.equals(testB));
    System.out.println(testA);
    System.out.println(testB);
    try (BufferedWriter check = Files.newBufferedWriter(
            Paths.get("uni-test.txt"),
            StandardCharsets.UTF_8,
            StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING)) {
        check.write(testA);
        check.write("\n");
        check.write(testB);
        // try-with-resources closes the writer, so no explicit close() is needed
    } catch (IOException ioc) {
        // report the failure instead of silently swallowing it
        ioc.printStackTrace();
    }
}
You could replace the values with the characters you want.
The first line should print true if the string is the actual string you want. After that, it is a matter of displaying the characters. For example, if I open the text file with less, half of them are broken. If I open it with Firefox, I see all four characters, but some are wonky. You'll need a font that has glyphs for the corresponding Unicode values.
One thing you can do is open the file in a word processor and select a font that displays the characters you want correctly.
As suggested by the OP, adding -Dfile.encoding=UTF8 to the JVM arguments causes the characters to display correctly when using System.out.println. This is similar to this question, which changes the encoding of System.out.
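As an alternative to the JVM flag, System.out itself can be replaced with a PrintStream that always encodes as UTF-8. The following is only a minimal sketch: the class name Utf8Stdout and the helper utf8 are made up for illustration, and the terminal still has to be able to display UTF-8 for the result to look right.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the original question or answers.
class Utf8Stdout {
    // Wraps a stream so that printed text is encoded as UTF-8,
    // independent of the file.encoding system property.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM must support, so this is not expected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.setOut(utf8(System.out));
        // written with escapes so the source file encoding does not matter
        System.out.println("\u00f6\u00e4\u00fc");
    }
}
```

This only changes how the JVM encodes the bytes; whether they render correctly is still up to the console and its font.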

Related

how do i get the data from a database and store it into a text file?

I am new to databases in Java and I am trying to export the data from one table and store it in a text file. At the moment the code below writes to the text file, but everything ends up on one line. Can anyone help?
My Code
private static String listHeader() {
    String output = "Id Priority From Label Subject\n";
    output += "== ======== ==== ===== =======\n";
    return output;
}

public static String Export_Message_Emails() {
    String output = listHeader();
    output += "\n";
    try {
        ResultSet res = stmt.executeQuery("SELECT * from messages ORDER BY ID ASC");
        while (res.next()) { // there is a result
            output += formatListEntry(res);
            output += "\n";
        }
    } catch (Exception e) {
        System.out.println(e);
        return null;
    }
    return output;
}

public void exportCode(String File1) {
    try {
        if ("Messages".equals(nameOfFile)) {
            fw = new FileWriter(f);
            //what needs to be written here
            //fw.write(MessageData.listAll());
            fw.write(MessageData.Export_Message_Emails());
            fw.close();
        }
    } catch (IOException e) {
        // report write failures
        System.out.println(e);
    }
}
Don't use a hard coded value of "\n". Instead use System.getProperty("line.separator"); or if you are using Java 7 or greater, you can use System.lineSeparator();
Try String.format("%n") instead of "\n".
Unless you're trying to practice your Java programming (which is perfectly fine of course!), you can export all the data from one table and store it in a file by using the SYSCS_UTIL.SYSCS_EXPORT_TABLE system procedure: http://db.apache.org/derby/docs/10.11/ref/rrefexportproc.html
I'm going to assume you are using Windows and that you are opening your file with Notepad. If that is correct, then the problem is not really with your output but with the editor you are viewing it in.
1. Try a nicer editor, e.g. Notepad++.
2. Do as the other answers suggest and use System.getProperty("line.separator"); or similar.
3. Use a Writer implementation such as PrintWriter.
Personally I prefer "\n" over the system line separator, which on Windows is "\r\n".
EDIT: Added option 3
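The line-separator suggestions above can be sketched as follows. PrintWriter.println ends each row with the platform line separator, so there is no hard-coded "\n". The class name, the render helper and the sample rows are invented for illustration; in the question's code the rows would come from the ResultSet.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; the rows stand in for the database results.
class LineSeparatorDemo {
    // Writes each row followed by the platform line separator.
    static String render(List<String> rows) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            for (String row : rows) {
                out.println(row); // println appends the line.separator value
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList(
                "Id Priority From Label Subject",
                "== ======== ==== ===== =======")));
    }
}
```

The same PrintWriter could wrap a FileWriter instead of a StringWriter to write straight to the export file.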

How to get encoding type of a .txt or .sql file

Is it possible to determine the encoding of an existing .txt file? For example: you know a customer needs a specific encoding and you want to automate the process of .sql data delivery. You read the required encoding from a client config and compare it to the current encoding of the file to be delivered; if they differ, you change the encoding. I could not find a solution so far. Any help would be appreciated.
There is no explicit declaration of text encoding in files, but you can guess the encoding by analyzing specific byte sequences that are characteristic of a certain encoding.
Chardet does exactly that and tries to guess. If it can't say for sure what the encoding is, it will give you a list with confidence values (e.g. "90% this is utf8"). The project includes both a Python module and a command line tool. For a Java version, see JChardet.
My 2 cents: if you just need a quick way to detect, the command line chardet tool is the way to go.
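To give a feel for what "analyzing specific byte sequences" means, here is a deliberately small sketch using only the JDK. It checks for a byte order mark and tests whether the bytes are well-formed UTF-8. It is not how chardet works internally and it is far less capable; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a real detector uses statistics over many more encodings.
class EncodingGuess {
    // Returns a charset name if the data starts with a known BOM, otherwise null.
    // Note: FF FE is also the start of a UTF-32LE BOM, which this sketch ignores.
    static String fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "\u00e4 ".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(latin1)); // prints false
    }
}
```

A file that passes isValidUtf8 is only probably UTF-8: pure ASCII passes too, and so do some byte sequences written in other encodings.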
juniversalchardet is one of the best available APIs for detecting the encoding type. Please check out this link. You can go through the list of encoding types supported by it.
Working example from the site:
import org.mozilla.universalchardet.UniversalDetector;

public class TestDetector {
    public static void main(String[] args) throws java.io.IOException {
        byte[] buf = new byte[4096];
        String fileName = args[0];
        java.io.FileInputStream fis = new java.io.FileInputStream(fileName);
        // (1)
        UniversalDetector detector = new UniversalDetector(null);
        // (2)
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        // (3)
        detector.dataEnd();
        // (4)
        String encoding = detector.getDetectedCharset();
        if (encoding != null) {
            System.out.println("Detected encoding = " + encoding);
        } else {
            System.out.println("No encoding detected.");
        }
        // (5)
        detector.reset();
    }
}
Hope this helps!

to read unicode character in java

I am trying to read Unicode characters from a text file saved in UTF-8 using Java.
My text file is as follows:
अ, अदेबानि ,अन, अनसुला, अनसुलि, अनफावरि, अनजालु, अनद्ला, अमा, अर, अरगा, अरगे, अरन, अराय, अलखद, असे, अहा, अहिंसा, अग्रं, अन्थाइ, अफ्रि,
बियन, खियन, फियन, बन, गन, थन, हर, हम, जम, गल, गथ, दरसे, दरनै, थनै, थथाम, सथाम,
खफ, गल, गथ, मिख, जथ, जाथ, थाथ, दद, देख, न, नेथ, बर, बुंथ, बिथ, बिख, बेल, मम,
आ, आइ, आउ, आगदा, आगसिर
I have tried the following code:
import java.io.*;
import java.util.*;
import java.lang.*;

public class UcharRead
{
    public static void main(String args[])
    {
        try
        {
            String str;
            BufferedReader bufReader = new BufferedReader(new InputStreamReader(new FileInputStream("research_words.txt"), "UTF-8"));
            while ((str = bufReader.readLine()) != null)
            {
                System.out.println(str);
            }
        }
        catch (Exception e)
        {
            // print the error so that read failures are not silently hidden
            e.printStackTrace();
        }
    }
}
I am getting this output:
????????????????????????
Can anyone help me?
You are (most likely) reading the text correctly, but when you write it out, you also need to enable UTF-8. Otherwise every character that cannot be printed in your default encoding will be turned into question marks.
Try writing it to a File instead of System.out (and specify the proper encoding):
Writer w = new OutputStreamWriter(
new FileOutputStream("x.txt"), "UTF-8");
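The question-mark effect described in this answer can be reproduced without any console at all. The sketch below (class and method names are invented for illustration) encodes a string with a given charset and decodes it again: with US-ASCII every Devanagari character is replaced by '?', while with UTF-8 the text survives.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the answer.
class QuestionMarks {
    // Encodes and decodes with the same charset; characters the charset
    // cannot represent are replaced by '?' during encoding.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String word = "\u0905\u0928"; // two Devanagari letters, written as escapes
        System.out.println(roundTrip(word, StandardCharsets.US_ASCII)); // prints ??
        System.out.println(roundTrip(word, StandardCharsets.UTF_8).equals(word)); // prints true
    }
}
```

This is the same replacement that happens when System.out uses a default encoding that cannot represent the characters read from the file.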
If you are reading the text properly using UTF-8 encoding, then make sure that your console also supports UTF-8. If you are using Eclipse, you can enable UTF-8 encoding for your console via:
Run Configuration -> Common -> Encoding -> Select UTF-8
Here is the eclipse screenshot.
You're reading it correctly - the problem is almost certainly just that your console can't handle the text. The simplest way to verify this is to print out each char within the string. For example:
public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        System.out.printf("%c - %04x\n", c, (int) c);
    }
}
You can then verify that each character is correct using the Unicode code charts.
Once you've verified that you're reading the file correctly, you can then work on the output side of things - but it's important to try to focus on one side of it at a time. Trying to diagnose potential failures in both input and output encodings at the same time is very hard.
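A variation on the dump idea, sketched here under the assumption that Java 8 or later is available: it returns the code points as plain ASCII text in U+XXXX form, so the diagnostic output itself cannot be mangled by the console encoding. The class and method names are hypothetical.

```java
// Hypothetical helper; complements dumpString by emitting ASCII only.
class CodePointDump {
    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("\u0905\u0928")); // prints U+0905 U+0928
    }
}
```

Because it iterates over code points rather than chars, characters outside the Basic Multilingual Plane show up as a single value instead of a surrogate pair.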

Adding new lines to a .txt file [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I'm studying Chinese.
I have an iPhone app with optical character recognizer that can capture vocab lists in this format: (character TAB pronunciation TAB definition)
淫秽 TAB yin2hui4 TAB obscene; salacious; bawdy
网站 TAB wang3zhan4 TAB website
专项 TAB zhuan1xiang4 TAB attr. earmarked
but the flashcard app I use requires this format: (Character NEWLINE Pronunciation NEWLINE Definition)
淫秽
yin2hui4
obscene; salacious; bawdy
网站
wang3zhan4
<computing> website
专项
zhuan1xiang4
attr. earmarked
I only know a little Java. How do I convert the first format to the second format?
Obviously, we don't want to do your homework. But we don't want to leave you stranded either.
I've left many things open, and the code below is just Java-looking pseudocode. You can start here...
FileReader reader = ... // open the file reader using the input file
FileWriter writer = ... // open a file for writing output
while (the stream doesn't end) { // provide the condition, as needed
    String line = ... // read a line from the reader
    String[] fields = line.split("\t", 3); // character, pronunciation, definition
    String character = fields[0],
           pronunciation = fields[1],
           definition = fields[2]; // assumes exactly two tabs per line; you need to handle lines that don't match
    writeLineToFile(character)
    writeLineToFile(pronunciation)
    writeLineToFile(definition)
}
close the reader and writer
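For reference, one way the pseudocode could be fleshed out with only the JDK is sketched below. The class name, the convert helper and the use of command-line arguments for the file names are assumptions made for illustration, as is the choice of UTF-8 for both files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter; file names are passed as arguments.
class TabToLines {
    // Turns "character TAB pronunciation TAB definition" into three lines.
    static String convert(String line) {
        // limit 3 keeps any further tabs inside the definition field
        String[] fields = line.split("\t", 3);
        return String.join("\n", fields);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: TabToLines <input> <output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        List<String> result = new ArrayList<>();
        // assumes the vocabulary list is saved as UTF-8
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                result.add(convert(line));
            }
        }
        Files.write(out, result, StandardCharsets.UTF_8);
    }
}
```

Reading and writing with an explicit charset matters here, because the platform default may not be able to represent the Chinese characters.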
Even though it looks like an exercise, ideally you can do the following:
Get the file contents (use commons-io).
Replace TAB with a newline and write to the file.
Example code:
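import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

public class Test {
    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        String path = "C:/test.txt";
        File file = new File(path);
        // Read and write explicitly as UTF-8 (assuming the file is saved as UTF-8)
        // so the Chinese characters do not depend on the platform default encoding.
        String string = FileUtils.readFileToString(file, StandardCharsets.UTF_8);
        // replace() is enough here; no regular expression is needed
        String finalString = string.replace("\t", "\n");
        FileUtils.write(file, finalString, StandardCharsets.UTF_8);
    }
}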
The file now would look like
淫秽
yin2hui4
obscene; salacious; bawdy
网站
wang3zhan4
website
专项
zhuan1xiang4
attr. earmarked

Java Apache FileUtils readFileToString and writeStringToFile problems

I need to read a file (actually a .pdf) into a String and write it back to a file. Between those steps I'll apply some patches to the string, but that is not important here.
I've developed the following JUnit test case:
String f1String=FileUtils.readFileToString(f1);
File temp=File.createTempFile("deleteme", "deleteme");
FileUtils.writeStringToFile(temp, f1String);
assertTrue(FileUtils.contentEquals(f1, temp));
This test converts a file to a string and writes it back. However, the test is failing.
I think it may be because of the encodings, but the FileUtils documentation does not give much detail about this.
Can anyone help?
Thanks!
Added for further understanding:
Why I need this?
I have very large PDFs on one machine that are replicated on another one. The first machine is in charge of creating those PDFs. Due to the low connectivity of the second machine and the large size of the PDFs, I don't want to sync the whole files, only the changes.
To create and apply patches, I'm using the Google library DiffMatchPatch. This library creates patches between two strings. So I need to load a PDF into a string, apply a generated patch, and write it back to a file.
A PDF is not a text file. Decoding (into Java characters) and re-encoding of binary files that are not encoded text is asymmetrical. For example, if the input bytestream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short - don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
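The asymmetry is easy to demonstrate with a few bytes. In the sketch below (the class and method names are made up for illustration), bytes that are not valid UTF-8 are decoded to the replacement character U+FFFD, and re-encoding that character produces different bytes than the original.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo of a lossy decode/encode round trip.
class LossyDecode {
    // True if decoding and re-encoding with the charset returns the same bytes.
    static boolean survives(byte[] data, Charset cs) {
        byte[] back = new String(data, cs).getBytes(cs);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] text = {'%', 'P', 'D', 'F'};             // plain ASCII
        byte[] binary = {(byte) 0xFF, (byte) 0xFE};     // not valid UTF-8
        System.out.println(survives(text, StandardCharsets.UTF_8));   // prints true
        System.out.println(survives(binary, StandardCharsets.UTF_8)); // prints false
    }
}
```

A real PDF contains many such byte sequences in its compressed streams, which is why the string round trip in the question cannot be expected to reproduce the file.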
Just a few thoughts:
There might actually be some BOM (byte order mark) bytes in one of the files that either get stripped when reading or added during writing. Is there a difference in the file size (if it is the BOM, the difference should be 2 or 3 bytes)?
The line breaks might not match, depending which system the files are created on, i.e. one might have CR LF while the other only has LF or CR. (1 byte difference per line break)
According to the JavaDoc both methods should use the default encoding of the JVM, which should be the same for both operations. However, try and test with an explicitly set encoding (JVM's default encoding would be queried using System.getProperty("file.encoding")).
Ed Staub's answer points out why my solution is not working, and he suggested using bytes instead of Strings. In my case I need a String, so the final working solution I've found is the following:
@Test
public void testFileRWAsArray() throws IOException {
    byte[] bytes = FileUtils.readFileToByteArray(f1);
    // StringBuilder avoids the quadratic cost of repeated String concatenation
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b : bytes) {
        sb.append((char) b);
    }
    String f1String = sb.toString();
    File temp = File.createTempFile("deleteme", "deleteme");
    byte[] newBytes = new byte[f1String.length()];
    for (int i = 0; i < f1String.length(); ++i) {
        char c = f1String.charAt(i);
        newBytes[i] = (byte) c;
    }
    FileUtils.writeByteArrayToFile(temp, newBytes);
    assertTrue(FileUtils.contentEquals(f1, temp));
}
By using a cast between byte-char, I have the symmetry on conversion.
Thank you all!
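A similar byte-per-character symmetry can be had from the standard ISO-8859-1 charset, which maps each of the 256 byte values to the character with the same number. This is a sketch of that alternative rather than the code from the answer; the class and method names are made up. It differs from the cast in one detail: (char) b sign-extends negative bytes to U+FF80..U+FFFF, whereas ISO-8859-1 yields U+0080..U+00FF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: a lossless bytes <-> String mapping via ISO-8859-1.
class Latin1RoundTrip {
    // Every byte becomes exactly one char in the range U+0000..U+00FF.
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Inverse of toText, as long as the text only contains chars up to U+00FF.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        System.out.println(Arrays.equals(data, toBytes(toText(data)))); // prints true
    }
}
```

If a patch ever introduces a character above U+00FF into the string, toBytes would replace it with '?', so the patched text has to stay within that range.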
Try this code...
public static String fetchBase64binaryEncodedString(String path) {
    File inboundDoc = new File(path);
    byte[] pdfData;
    try {
        pdfData = FileUtils.readFileToByteArray(inboundDoc);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    byte[] encodedPdfData = Base64.encodeBase64(pdfData);
    String attachment = new String(encodedPdfData);
    return attachment;
}

//How to decode it
public void testConversionPDFtoBase64() throws IOException
{
    String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
    File origFile = new File(path);
    String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);
    //now decode it
    byte[] decodeData = Base64.decodeBase64(encodedString.getBytes());
    String decodedString = new String(decodeData);
    //or actually give the path to pdf file.
    File decodedfile = File.createTempFile("DECODED", ".pdf");
    FileUtils.writeByteArrayToFile(decodedfile, decodeData);
    Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
    // Frame frame = new Frame("PDF Viewer");
    // frame.setLayout(new BorderLayout());
}
