Android Resources.openRawResource() encoding issue [duplicate] - java

I am reading a properties file that contains a message in the UTF-8 character set.
Problem
The output is not in the appropriate format. I am using an InputStream.
The property file looks like
username=LBSUSER
password=Lbs#123
url=http://localhost:1010/soapfe/services/MessagingWS
timeout=20000
message=Spanish character are = {á é í, ó,ú ,ü, ñ, ç, å, Á, É, Í, Ó, Ú, Ü, Ñ, Ç, ¿, °, 4° año = cuarto año, €, ¢, £, ¥}
And I am reading the file like this,
Properties props = new Properties();
props.load(new FileInputStream("uinsoaptest.properties"));
String username = props.getProperty("username", "test");
String password = props.getProperty("password", "12345");
String url = props.getProperty("url", "12345");
int timeout = Integer.parseInt(props.getProperty("timeout", "8000"));
String messagetext = props.getProperty("message");
System.out.println("This is soap msg : " + messagetext);
The output of the above code shows the message garbled. You can see the message in the console after the line
{************************ SOAP MESSAGE TEST***********************}
I would be obliged for any help reading this file properly. I can read this file with another approach, but I am looking for one with less code modification.

Use an InputStreamReader with Properties.load(Reader reader):
FileInputStream input = new FileInputStream(new File("uinsoaptest.properties"));
props.load(new InputStreamReader(input, Charset.forName("UTF-8")));
As a method, this may resemble the following:
private Properties read(final Path file) throws IOException {
    final var properties = new Properties();
    try (final var in = new InputStreamReader(
            new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
        properties.load(in);
    }
    return properties;
}
Don't forget to close your streams. Java 7 introduced StandardCharsets.UTF_8.
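On Java 8 or later the same idea can be written more compactly with Files.newBufferedReader, which decodes as UTF-8 by default and is closed by the try-with-resources (a sketch equivalent to the method above):
private Properties read(final Path file) throws IOException {
    final Properties properties = new Properties();
    // Files.newBufferedReader(Path) uses UTF-8 when no charset is given
    try (Reader in = Files.newBufferedReader(file)) {
        properties.load(in);
    }
    return properties;
}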

Use props.load(new FileReader("uinsoaptest.properties")) instead. FileReader uses the platform default encoding, Charset.forName(System.getProperty("file.encoding")), so this only helps when that default is UTF-8. Note that file.encoding is read once at JVM startup, so it must be set with the command-line parameter -Dfile.encoding=UTF-8; calling System.setProperty("file.encoding", "UTF-8") at runtime has no reliable effect on the default charset.

If somebody uses the @Value annotation, they could try StringUtils from Apache Commons Lang to re-decode a value that was read as ISO-8859-1:
@Value("${title}")
private String pageTitle;

public String getPageTitle() {
    return StringUtils.toEncodedString(pageTitle.getBytes(Charset.forName("ISO-8859-1")), Charset.forName("UTF-8"));
}
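A cleaner alternative, if you control the configuration class, is to declare the encoding on @PropertySource itself (the encoding attribute exists since Spring 4.3; the file name below is only illustrative):
@Configuration
@PropertySource(value = "classpath:messages.properties", encoding = "UTF-8")
public class AppConfig {
}
With that in place the injected value arrives correctly decoded and no byte-level round trip is needed.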

You should specify the UTF-8 encoding when you read the file. Note that FileInputStream has no constructor that accepts a charset; the charset belongs on the reader that wraps the stream:
props.load(new InputStreamReader(new FileInputStream("uinsoaptest.properties"), StandardCharsets.UTF_8));
If you want your JVM to read files as UTF-8 by default, add this to the JAVA_TOOL_OPTIONS environment variable (or your JVM options):
-Dfile.encoding=UTF-8

If anybody comes across this problem in Kotlin, like me:
The accepted solution of @Würgspaß works here as well. The corresponding Kotlin syntax:
Instead of the usual
val properties = Properties()
filePath.toFile().inputStream().use { stream -> properties.load(stream) }
I had to use
val properties = Properties()
InputStreamReader(FileInputStream(filePath.toFile()), StandardCharsets.UTF_8).use { stream -> properties.load(stream) }
With this, special UTF-8 characters are loaded correctly from the properties file given in filePath.

Related

How to set 'charset' for DatumWriter || write avro that contains arabic characters to HDFS

Some of the data contains Arabic text, and when the data is written, the reader code and the hadoop fs -text command show ?? instead of the Arabic characters.
1) Writer
// avro object is provided as SpecificRecordBase
Path path = new Path(pathStr);
DatumWriter<SpecificRecord> datumWriter = new SpecificDatumWriter<>();
FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf); // HDFS File System
FSDataOutputStream outputStream = fs.create(path);
DataFileWriter<SpecificRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
Schema schema = getSchema(); // method to get schema
dataFileWriter.setCodec(CodecFactory.snappyCodec());
dataFileWriter.create(schema, outputStream);
dataFileWriter.append(avroObject);
2) Reader
Configuration conf = new Configuration();
FsInput in = new FsInput(new Path(hdfsFilePathStr), conf);
DatumReader<Row> datumReader = new GenericDatumReader<>();
DataFileReader<Row> dataFileReader = new DataFileReader<>(in, datumReader);
GenericRecord outputData = (GenericRecord) dataFileReader.iterator().next();
I've tried the hadoop fs -text {filePath} command; there, too, the Arabic values appear as ??.
It will be really difficult to change the format in which data is written because there are numerous consumers of the same file.
Tried reading through SpecificRecordBase, still getting ??.
Edit
Also tried these (in both reader and writer):
Configuration conf = new Configuration();
conf.set("file.encoding", StandardCharsets.UTF_16.displayName());
AND
System.setProperty("file.encoding", StandardCharsets.UTF_16.displayName());
Doesn't help.
Apparently, HDFS does not support a lot of non-English characters. To work around that, change the field from string to bytes in your Avro schema.
To convert your value from String to bytes, use:
ByteBuffer.wrap(str.getBytes(StandardCharsets.UTF_8)).
Then, while reading, to convert it back to String use:
new String(byteData.array(), StandardCharsets.UTF_8).
Rest of the code in your reader and writer stays the same.
Doing this, the hadoop fs -text command will show proper text for English characters, but for non-English characters it might show gibberish; your reader will still be able to build the UTF-8 String from the ByteBuffer.
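Putting the two one-liners together, a minimal sketch (the record and its bytes field "body" are hypothetical, not from the original post):
// Writer side: store the Arabic text in a bytes field of the record.
GenericRecord record = new GenericData.Record(schema);
record.put("body", ByteBuffer.wrap(str.getBytes(StandardCharsets.UTF_8)));

// Reader side: decode the bytes field back into a UTF-8 String.
ByteBuffer raw = (ByteBuffer) outputData.get("body");
String text = new String(raw.array(), StandardCharsets.UTF_8);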

How to convert UTF-8 characters to UTF-16 Unicode - scala

Ref: https://www.branah.com/unicode-converter
I'm new to Scala and Java and am trying to write a .properties file (in a few languages like Chinese, French, German etc.) using Scala, for internationalization functionality. For that I'm using the following code:
for ((key, value) <- jsonData.get.asInstanceOf[Map[String, String]]) {
  var file: PrintWriter = null
  file = new PrintWriter(filepath, "UTF-8")
  prop.setProperty(key, value) // renamed from "val", which is a reserved word in Scala
  prop.store(file, "")
  file.close()
}
So this code is working, but it writes the file in UTF-8, like:
传播特征 设计师 考虑 测量
düşünce
which is not rendering properly in the browser, so instead of that I want to convert it into the escaped Unicode (\uXXXX) format, like:
\u4f20\u64ad\u7279\u5f81 \u8bbe\u8ba1\u5e08 \u8003\u8651 \u6d4b\u91cf
\u0064\u00fc\u015f\u00fc\u006e\u0063\u0065
As per this converter: https://www.branah.com/unicode-converter
I don't have access to the client side so I can't post that code here, but I'm sure it fetches data from the .properties file through Ajax and renders it in the browser.
How can I convert the output to escaped Unicode so that it renders properly in the browser?
Any help would be appreciated.
You can use the same code and use "UTF-16" for the Charset:
for ((key, value) <- jsonData.get.asInstanceOf[Map[String, String]]) {
  var file: PrintWriter = null
  file = new PrintWriter(filepath, "UTF-16")
  prop.setProperty(key, value)
  prop.store(file, "")
  file.close()
}
Please check the supported Java Charsets:
https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
(Note that Properties.load(InputStream) assumes ISO-8859-1, so a UTF-16 file can only be read back with load(Reader) using the same charset.)
Solved it myself:
I replaced the PrintWriter with a FileOutputStream and it worked :) Properties.store(OutputStream) writes ISO-8859-1 and escapes every other character as \uXXXX, which is exactly the escaped format wanted above.
var file: File = null
file = new File(filepath)
var fos: FileOutputStream = null
fos = new FileOutputStream(file)
prop.setProperty(key, value)
prop.store(fos, "")
Thanks @Jon Skeet for the help.

Encoding issue when reading from Google translator API and writing to properties file

I am using the Google Translate API to generate an Arabic property file from an English property file.
Making a URL connection and issuing a GET request, passing the original language, the translation language, and the value to be translated:
URLConnection urlCon = null;
String urlStr = "https://www.googleapis.com/language/translate/v2";
URL url = new URL(urlStr + "?key=" + apikey + "&source=" + origlang + "&target=" + translateToLang + "&q=" + value);
urlCon = url.openConnection();
urlCon.setConnectTimeout(1000 * 60 * 5);
urlCon.setReadTimeout(1000 * 60 * 5);
urlCon.setDoInput(true);
urlCon.setDoOutput(true);
urlCon.setUseCaches(false);
((HttpURLConnection) urlCon).setRequestMethod("GET");
urlCon.setRequestProperty("Accept-Charset", "UTF-8");
Reading the response from the URL connection through an InputStreamReader, passing UTF-8 as the encoding parameter:
BufferedReader br = new BufferedReader(new InputStreamReader(((URLConnection) urlCon).getInputStream(), "UTF-8"));
/* Reading the response line by line */
StringBuffer responseString = new StringBuffer();
String nextLine = null;
while ((nextLine = br.readLine()) != null) {
responseString.append(nextLine);
}
// if response is null or empty, throw exception
String response = responseString.toString();
Parsing the received JSON through the GSON parser:
JsonElement jelement = new JsonParser().parse(response);
JsonObject jobject = jelement.getAsJsonObject();
jobject = jobject.getAsJsonObject("data");
JsonArray jarray = jobject.getAsJsonArray("translations");
jobject = jarray.get(0).getAsJsonObject();
String result = jobject.get("translatedText").toString();
Writing the translated value to a new property file through a FileOutputStream:
FileOutputStream foutStream = new FileOutputStream(outFile);
foutStream.write(key.getBytes());
foutStream.write("=".getBytes());
foutStream.write(transByte.getBytes());
foutStream.write("\n".getBytes());
The issue is that I am getting garbled text (?????) written to the new property file for Arabic.
When you call transByte.getBytes(), the Arabic translation is encoded with your platform default encoding, which will only handle Arabic if your machine is configured for UTF-8 or Arabic. Otherwise, characters will be replaced by '�' or '?'.
Create a new Properties instance, and populate it using setProperty() calls. Then when you store it, the proper escaping will be applied to your Arabic text, which is necessary because property files are encoded with ISO-8859-1 (an encoding for Western Latin characters).
Alternatively, you can store the Properties using a Writer instance that is configured with whatever encoding you choose, but the encoding isn't stored in the file itself, so you will need meta-data or a convention to set the correct encoding when reading the file again.
Finally, you can store the Properties in an XML format, which will use UTF-8 by default, or you can specify another encoding. The file itself will specify the encoding, so it's easier to use an optimal encoding for each language.
Trying to emit a file format using custom string concatenation, as you are doing, is an oft-repeated recipe for disaster. Whether it's XML, JSON, or a simple properties file, it's far too easy to overlook special cases that require escape sequences, etc. Use a library designed to emit the format instead.
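As a sketch of the first and third suggestions, reusing key, result, and outFile from the code above (the XML file name is only illustrative):
Properties translated = new Properties();
translated.setProperty(key, result);
// store() writes ISO-8859-1 and escapes the Arabic text as \uXXXX
try (FileOutputStream out = new FileOutputStream(outFile)) {
    translated.store(out, null);
}
// storeToXML() uses UTF-8 by default and records the encoding in the header
try (FileOutputStream xmlOut = new FileOutputStream("translations_ar.xml")) {
    translated.storeToXML(xmlOut, null, "UTF-8");
}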

Java How to write a string in file as bytes and get string back

I need to write a string to a file as UTF-8 bytes and then read those bytes back from the file and convert them to a string again, ending up with the same string. Maybe it sounds easy, but there is a hidden problem, such as incorrect symbols in the file. I mean that after appending, the file must contain something like:
00000008 d0bad0bb d18ed187 00000010 etc...
But it contains stuff like that:
mystring ---a lot of space--- (and the symbol that doesn't display here)
So, what have I already done? I've tried this way:
Before reading the code, note: I keep strings in a HashMap<String, String>, which is why my code contains get(...) etc.
try {
    FileOutputStream oStream = new FileOutputStream("filename.txt");
    Set<String> keySet = storage.keySet();
    // 4 bytes, so array() yields exactly the int; allocate(1024) would pad
    // the file with 1020 zero bytes per length (the "lot of space" above)
    ByteBuffer buffer = ByteBuffer.allocate(4);
    for (String key : keySet) {
        byte[] keyInByte = key.getBytes("UTF-8");
        byte[] valueInByte = storage.get(key).getBytes("UTF-8");
        oStream.write(buffer.putInt(0, keyInByte.length).array()); // key length, then key
        oStream.write(keyInByte);
        oStream.write(buffer.putInt(0, valueInByte.length).array()); // value length, then value
        oStream.write(valueInByte);
    }
    oStream.close();
} catch (Exception e) {
    System.err.println("permission denied");
}
I have also tried PrintWriter, FileWriter, etc., but they don't give what I need. For example, some of them need a toString() call, and after toString() I lose the ability to work with the bytes.
Note: I've also tried switching my editor to UTF-8 encoding, but it gives no result.
If you want to use Properties you can do this:
Properties props = new Properties();
props.putAll(map);
props.store(new FileOutputStream("filename.properties"), "");
to load the properties
Properties prop = new Properties();
prop.load(new FileInputStream("filename.properties"));
map.putAll(prop);
To save copying the data you can use Properties as a Map.
key1=value1
key2=value2
Note: a key cannot contain a = or a newline.
This is how I would do it in a binary format
DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("filename.dat")));
dos.writeInt(map.size());
for (Map.Entry<String, String> entry : map.entrySet()) {
    dos.writeUTF(entry.getKey());
    dos.writeUTF(entry.getValue());
}
dos.close();
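And this is how I would read it back, a minimal sketch assuming the writeInt/writeUTF layout above:
DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("filename.dat")));
int size = dis.readInt();
Map<String, String> loaded = new HashMap<String, String>();
for (int i = 0; i < size; i++) {
    loaded.put(dis.readUTF(), dis.readUTF()); // key, then value, as written
}
dis.close();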
This is how I would write it in text, using UTF-8:
PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FileOutputStream("filename.txt"), "UTF-8"));
pw.println(map.size());
for (Map.Entry<String, String> entry : map.entrySet()) {
    pw.println(encode(entry.getKey()));
    pw.println(encode(entry.getValue()));
}
pw.close();
public static String encode(String s) {
    // if you can assume no new lines, don't do anything.
    return s.replaceAll("\\\\", "\\\\\\\\").replaceAll("\n", "\\\\n");
}
This will produce a file like
key1
value1
key2
value2
This you can edit fairly easily. If you can assume the key doesn't contain an =, :, or tab, you can put each pair on one line like a properties file.
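For reading such a file back, a decode counterpart (my sketch; the original answer omits it) that reverses the escaping above:
public static String decode(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '\\' && i + 1 < s.length()) {
            char next = s.charAt(++i);
            sb.append(next == 'n' ? '\n' : next); // "\n" -> newline, "\\" -> backslash
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}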

JDK 1.5 Properties load with unicode characters

The JDK 1.5 Properties load method only accepts an InputStream, while the JDK 1.6+ load method also accepts a Reader. When a string with Unicode characters is loaded into a Properties object on JDK 1.6+ with load(Reader), there is no problem. But on JDK 1.5 there is only the load(InputStream) method, and with it Unicode characters are not loaded correctly.
Properties props = new Properties();
ByteArrayInputStream bis = null;
Reader reader = null;
try {
    bis = new ByteArrayInputStream(someStringWithUnicodeChars.getBytes("UTF-8"));
    reader = new InputStreamReader(bis, "UTF-8");
} catch (UnsupportedEncodingException ex) {
    ex.printStackTrace();
}
props.load(reader); // This reads unicode characters correctly on JDK 1.6+
// There is no props.load(reader) method on JDK 1.5, so the method below is used
props.load(bis);
// but Unicode characters are not loaded correctly.
How can I load the following example string with Unicode characters into a Properties object?
key1=test İ Ş Ğ
key2=ÇÇÇÇ
From the 1.5 javadoc: "The stream is assumed to be using the ISO 8859-1 character encoding"
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Properties.html#load(java.io.InputStream)
Try this:
InputStream in = new ByteArrayInputStream(someStringWithUnicodeChars.getBytes("ISO-8859-1"));
Properties props = new Properties();
props.load(in);
Note, though, that this only works for characters ISO-8859-1 can represent; characters like İ, Ş and Ğ are outside that set and getBytes("ISO-8859-1") will turn them into '?', so they have to be escaped as \uXXXX instead.
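A JDK 1.5-compatible sketch of that escaping (my addition; it is essentially what the native2ascii tool in the next answer does for whole files):
// Escape every character above 0xFF as \uXXXX so the ISO-8859-1-based
// Properties.load(InputStream) can parse it.
static String escapeToLatin1(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c > 0xFF) {
            sb.append(String.format("\\u%04x", c));
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}
// Usage:
// props.load(new ByteArrayInputStream(
//         escapeToLatin1(someStringWithUnicodeChars).getBytes("ISO-8859-1")));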
For exactly this purpose the JDK ships with the tool native2ascii[.exe].
1) Create the properties file as UTF-8, named for example: sample.native
2) Convert the native properties file to Unicode escape sequences: native2ascii sample.native > sample.properties
3) Load and process the properties file:
// example: you will see the right UTF-8 characters only if your console supports UTF-8
// (written to stay JDK 1.5-compatible: no try-with-resources, no stringPropertyNames)
class PropsFile {
    public static void main(String[] args) throws Exception {
        FileInputStream fis = new FileInputStream("sample.properties");
        try {
            Properties props = new Properties();
            props.load(fis);
            for (Enumeration<?> names = props.propertyNames(); names.hasMoreElements();) {
                String name = (String) names.nextElement();
                System.out.println(name + "=" + props.getProperty(name));
            }
        } finally {
            fis.close();
        }
    }
}
