Displaying Arabic on Device J2ME - java

I am using some Arabic text in my app. On the simulator, the Arabic text displays fine, but on the device it does not display properly.
On the simulator it appears as مَرْحَبًا.
But on the device it appears as مرحبا.
What I need is مَرْحَبًا, with the diacritics intact.

This answer shows how to create text resources for a MIDP application, and how to load them at run time. The technique is Unicode-safe, so it is suitable for all languages, and the run-time code is small, fast, and uses relatively little memory.
Creating the Text Source
The process starts with creating a text file. When the file is loaded, each line becomes a separate String object, so you can create a file like:
اَللّٰهُمَّ اِنِّىْ اَسْئَلُكَ رِزْقًاوَّاسِعًاطَيِّبًامِنْ رِزْقِكَ
مَرْحَبًا
The file needs to be in UTF-8 format. On Windows, you can create UTF-8 files in Notepad; make sure you use Save As..., and select UTF-8 encoding.
Name the file arb.utf8.
The file then needs to be converted to a format that the MIDP application can read easily. MIDP does not provide convenient ways to read text files (there is no equivalent of J2SE's BufferedReader), and Unicode support can also be a problem when converting between bytes and characters. The easiest way to read text is DataInput.readUTF(), but to use it, the text must first have been written with DataOutput.writeUTF().
Below is a simple J2SE command-line program that will read the .utf8 file you saved from Notepad and create a .res file to go in the JAR.
import java.io.*;
import java.util.*;

public class TextConverter {
    public static void main(String[] args) {
        if (args.length == 1) {
            String language = args[0];
            List<String> text = new Vector<String>();
            try {
                // read text from the Notepad UTF-8 file
                InputStream in = new FileInputStream(language + ".utf8");
                try {
                    BufferedReader bufin = new BufferedReader(new InputStreamReader(in, "UTF-8"));
                    String s;
                    while ((s = bufin.readLine()) != null) {
                        // remove the byte order mark (U+FEFF) Notepad adds at the start of the file
                        s = s.replaceAll("\ufeff", "");
                        text.add(s);
                    }
                } finally {
                    in.close();
                }
                // write it for easy reading in J2ME
                OutputStream out = new FileOutputStream(language + ".res");
                DataOutputStream dout = new DataOutputStream(out);
                try {
                    // first item is the number of strings
                    dout.writeShort(text.size());
                    // then the strings themselves
                    for (String s : text) {
                        dout.writeUTF(s);
                    }
                } finally {
                    dout.close();
                }
            } catch (Exception e) {
                System.err.println("TextConverter: " + e);
            }
        } else {
            System.err.println("syntax: TextConverter <language-code>");
        }
    }
}
To convert arb.utf8 to arb.res, run the converter as:
java TextConverter arb
Using the Text at Runtime
Place the .res file in the JAR.
In the MIDP application, the text can be read with this method:
public String[] loadText(String resName) throws IOException {
    String[] text;
    InputStream in = getClass().getResourceAsStream(resName);
    try {
        DataInputStream din = new DataInputStream(in);
        int size = din.readShort();
        text = new String[size];
        for (int i = 0; i < size; i++) {
            text[i] = din.readUTF();
        }
    } finally {
        in.close();
    }
    return text;
}
Load and use the text like this (the leading slash makes the resource name absolute within the JAR, regardless of which package the loading class is in):
String[] text = loadText("/arb.res");
System.out.println("my arabic word from arb.res file :: " + text[0] + " second from arb.res file :: " + text[1]);
Hope this will help you. Thanks

Related

How to retrieve the encoding of an XML file to parse it correctly? (Best Practice)

My application downloads XML files that happen to be encoded either in UTF-8 or ISO-8859-1 (the software that generates those files is crappy, so it does that). I'm from Germany, so we use umlauts (ä, ü, ö), which means it really makes a difference how those files are encoded.
I know that XmlPullParser has a method .getInputEncoding() which correctly detects how my files are encoded. However, I have to set the encoding on my FileInputStream before I get to call .getInputEncoding(). So far I'm just using a BufferedReader to read the XML file, searching for the entry that specifies the encoding, and then instantiating my PullParser afterwards.
private void setFileEncoding() {
    try {
        bufferedReader.reset();
        String firstLine = bufferedReader.readLine();
        int start = firstLine.indexOf("encoding=") + 10; // +10 skips encoding=" (9 characters plus the opening quote)
        String encoding = firstLine.substring(start, firstLine.indexOf("\"", start));
        // now set the encoding on the reader to be used for parsing afterwards
        bufferedReader = new BufferedReader(new InputStreamReader(fileInputStream, encoding));
        bufferedReader.mark(0);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Is there a different way to do this? Can I take advantage of the .getInputEncoding method? Right now the method seems kinda useless to me because how does my encoding matter if I've already had to set it before being able to check for it.
If you trust the creator of the XML to have set the encoding correctly in the XML declaration, you can sniff it as you're doing. However, be aware that it can be wrong; it can disagree with the actual encoding.
If you want to detect the encoding directly, independently of the (potentially wrong) XML declaration encoding setting, use a library such as ICU CharsetDetector or the older jChardet.
ICU CharsetDetector:
CharsetDetector detector;
CharsetMatch match;
byte[] byteData = ...;
detector = new CharsetDetector();
detector.setText(byteData);
match = detector.detect();
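Once you have a match, you can ask it for the detected charset name, a confidence score, or a Reader that decodes the input with the detected charset:
String charsetName = match.getName();   // e.g. "UTF-8" or "ISO-8859-1"
int confidence = match.getConfidence(); // 0-100
Reader reader = match.getReader();      // reads byteData using the detected charset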
jChardet:
// Initialize the nsDetector()
int lang = (argv.length == 2) ? Integer.parseInt(argv[1]) : nsPSMDetector.ALL;
nsDetector det = new nsDetector(lang);

// Set an observer...
// Notify() will be called when a matching charset is found.
det.Init(new nsICharsetDetectionObserver() {
    public void Notify(String charset) {
        HtmlCharsetDetector.found = true;
        System.out.println("CHARSET = " + charset);
    }
});

URL url = new URL(argv[0]);
BufferedInputStream imp = new BufferedInputStream(url.openStream());

byte[] buf = new byte[1024];
int len;
boolean done = false;
boolean isAscii = true;

while ((len = imp.read(buf, 0, buf.length)) != -1) {
    // Check if the stream is only ASCII.
    if (isAscii)
        isAscii = det.isAscii(buf, len);
    // DoIt if non-ASCII and not done yet.
    if (!isAscii && !done)
        done = det.DoIt(buf, len, false);
}
det.DataEnd();

if (isAscii) {
    System.out.println("CHARSET = ASCII");
    found = true;
}
You may be able to get the correct character-set from the content-type header, if your server sends it correctly.
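For example, at download time (a sketch; the URL is hypothetical and the header parsing is deliberately minimal):
URLConnection conn = new URL("http://example.com/data.xml").openConnection();
String contentType = conn.getContentType(); // e.g. "text/xml; charset=ISO-8859-1"
String charset = "UTF-8"; // fallback when the header carries no charset parameter
if (contentType != null && contentType.contains("charset=")) {
    charset = contentType.substring(contentType.indexOf("charset=") + "charset=".length()).trim();
}
Reader reader = new InputStreamReader(conn.getInputStream(), charset);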

Populating ArrayList from file without including formatting info and back-slashes

To start, no, this is not a homework assignment. I am fresh out of high school and am trying to do some personal projects before college. I've been trying to populate an ArrayList with elements from a document. The document looks like:
item1
item2
item3
...
itemN
After failing many times on my own, I tried different solutions from this website. Most recently, this one got me the closest to what I desire:
public static void main(String[] args) throws IOException {
    List<String> names = new ArrayList<String>();
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader("/Users/MyName/Desktop/names.txt"));
        String line = null;
        while ((line = reader.readLine()) != null) {
            names.add(line);
        }
    } finally {
        reader.close();
    }
    for (int i = 0; i < names.size(); i++) {
        System.out.println(names.get(i));
    }
    // String[] array = names.toArray(new String[0]); // not necessary that it is in an array
}
The only problem is that this returns something rather ugly in the console:
{\rtf1\ansi\ansicpg1252\cocoartf1347\cocoasubrtf570
{\fonttbl\f0\froman\fcharset0 Times-Roman;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx577\tx1155\tx1733\tx2311\tx2889\tx3467\tx4045\tx4623\tx5201\tx5779\tx6357\tx6935\tx7513\tx8091\tx8669\tx9247\tx9825\tx10403\tx10981\tx11559\tx12137\tx12715\tx13293\tx13871\tx14449\tx15027\tx15605\tx16183\tx16761\tx17339\tx17917\tx18495\tx19072\tx19650\tx20228\tx20806\tx21384\tx21962\tx22540\tx23118\tx23696\tx24274\tx24852\tx25430\tx26008\tx26586\tx27164\tx27742\tx28320\tx28898\tx29476\tx30054\tx30632\tx31210\tx31788\tx32366\tx32944\tx33522\tx34100\tx34678\tx35256\tx35834\tx36412\tx36990\tx37567\tx38145\tx38723\tx39301\tx39879\tx40457\tx41035\tx41613\tx42191\tx42769\tx43347\tx43925\tx44503\tx45081\tx45659\tx46237\tx46815\tx47393\tx47971\tx48549\tx49127\tx49705\tx50283\tx50861\tx51439\tx52017\tx52595\tx53173\tx53751\tx54329\tx54907\tx55485\tx56062\tx56640\tx57218\tx57796\li577\fi-578
\f0\fs24 \cf0 \CocoaLigature0 item1\
item2\
item3\
...
itemN\
}
How can I get it to read from the file without including all of the back-slashes and formatting info?
You just need to actually save the file as a plain-text file; your file looks like an RTF file at the moment. Open the file in the Pages application, go to File... Export To... Plain Text..., and save it into a new file.
Looks like your names.txt file got saved as RTF (Rich Text Format). Make sure you convert it to plain text.
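Once the file really is plain text, the read loop above works as-is. On Java 7 and later you can also shorten it to a single call (a sketch, assuming the same path and UTF-8 content):
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

List<String> names = Files.readAllLines(
        Paths.get("/Users/MyName/Desktop/names.txt"), StandardCharsets.UTF_8);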

Apache-POI formatted text issue in .doc file

I am using this example in my project. Everything is fine, and replacing text also works, but in my output file, text that should be "centered" has become left-aligned. The input file is .doc. I have a feeling this is breaking the formatting of the document, but I'm not sure what the problem is. How do I resolve this?
public class HWPFTest {
    public static void main(String[] args) {
        String filePath = "F:\\Sample.doc";
        POIFSFileSystem fs = null;
        try {
            fs = new POIFSFileSystem(new FileInputStream(filePath));
            HWPFDocument doc = new HWPFDocument(fs);
            doc = replaceText(doc, "$VAR", "MyValue1");
            saveWord(filePath, doc);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static HWPFDocument replaceText(HWPFDocument doc, String findText, String replaceText) {
        Range r1 = doc.getRange();
        for (int i = 0; i < r1.numSections(); ++i) {
            Section s = r1.getSection(i);
            for (int x = 0; x < s.numParagraphs(); x++) {
                Paragraph p = s.getParagraph(x);
                for (int z = 0; z < p.numCharacterRuns(); z++) {
                    CharacterRun run = p.getCharacterRun(z);
                    String text = run.text();
                    if (text.contains(findText)) {
                        run.replaceText(findText, replaceText);
                    }
                }
            }
        }
        return doc;
    }

    private static void saveWord(String filePath, HWPFDocument doc) throws FileNotFoundException, IOException {
        FileOutputStream out = null;
        try {
            out = new FileOutputStream(filePath);
            doc.write(out);
        } finally {
            out.close();
        }
    }
}
HWPF is not usable for writing .doc files. It may work for very simple file content, but little extras break it. I fear you are out of luck here. If it is an option for you, you might want to use RTF files and work on those; Word will open them correctly if you rename the .rtf extension to .doc (if you need the .doc extension).
(I developed a custom and working variant of HWPF for a client and know how difficult things can get there. The standard HWPF library will get in trouble when characters outside the 8-bit encoding exist, when tables are used, when text boxes are used, when graphics are embedded, ... . Some things in .doc files are also different from what is described in the official specification from Microsoft. Making a working HWPF library is "non-trivial" and requires a lot of spec-reading and investigation. If you want to go down the road to fix those bugs, you need to expect at least half a year of development effort.)
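If you do go the RTF route suggested above, placeholder replacement can be done on the raw file contents, because RTF is a plain-text format. A minimal sketch (file names are illustrative; it assumes the placeholder $VAR is stored as a contiguous run of plain characters in the RTF source, and uses ISO-8859-1 because that encoding round-trips every byte unchanged):
import java.io.*;

public class RtfTemplate {
    public static void main(String[] args) throws IOException {
        // Read the whole RTF template; ISO-8859-1 maps each byte to one char and back
        Reader in = new InputStreamReader(new FileInputStream("F:\\Template.rtf"), "ISO-8859-1");
        StringBuilder rtf = new StringBuilder();
        for (int ch = in.read(); ch != -1; ch = in.read()) {
            rtf.append((char) ch);
        }
        in.close();

        // Replace the placeholder in the raw RTF text
        String result = rtf.toString().replace("$VAR", "MyValue1");

        // Save with a .doc extension; Word detects RTF content regardless of extension
        Writer out = new OutputStreamWriter(new FileOutputStream("F:\\Sample.doc"), "ISO-8859-1");
        out.write(result);
        out.close();
    }
}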

java convert to utf-8 from postgres

I'm generating HTML files with this data (stored in Postgres):
The HTML files are generated as UTF-8, but the strings come out looking exactly as they appear in the DB.
How can I make the text appear correctly, like: Últiles de Escritorio?
Note: I'm not able to change the Postgres configuration. I'm using Java 1.6, Postgres 8.4, and JDBC.
UPDATE:
I use this code to create the html files:
public static void stringToFile(String file_name, String file_content) throws IOException {
    OutputStream out = new FileOutputStream(file_name);
    OutputStreamWriter writer = new OutputStreamWriter(out, "UTF-8");
    try {
        try {
            writer.write(file_content);
        } finally {
            writer.close();
        }
    } finally {
        out.close();
    }
}
And I use it like:
StringBuilder html_content = new StringBuilder();
ResultSet result_set = statement.executeQuery(sql_query);
while (result_set.next()) {
    html_content.append(String.format("<li>%s</li>", result_set.getString(1)));
}
Utils.stringToFile("thehtmlfile.html", html_content.toString());
UPDATE: [SOLVED]
This works for me:
new String(str.getBytes("ISO-8859-1"), "UTF-8")
(This re-encodes the mis-decoded string back to its original bytes using ISO-8859-1, then decodes those bytes as UTF-8; it repairs text that was UTF-8 all along but had been decoded as ISO-8859-1 somewhere in the pipeline.)
I hope you're sure that the error isn't happening before you add the string to the DB.
Because you can't change your DB settings, this isn't so easy to handle. First of all, you have to know in which format the text is saved in your DB; see here.
Then you should be familiar with UTF-16; see here and here.
Now, once you are familiar with the codecs you want to use, you have to create the correct UTF-16 value for each character you get from the DB. I will just point out how it could work; the correct implementation you have to do on your own.
public char createUTF16char(char first, char second) {
    // combine the low 8 bits of each char into a single 16-bit char
    return (char) ((first << 8) | (second & 0xFF));
}
This code just combines the low 8 bits of each char (first and second) into a new char.
Maybe this is the operation you need, but it depends on the encoding used on the server.
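For example, under that sketch, combining the bytes 0x4E and 0x0A produces the single UTF-16 code unit U+4E0A:
char c = createUTF16char((char) 0x4E, (char) 0x0A); // c == '\u4E0A'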
Sincerely

How to save Chinese Characters to file with java?

I use the following code to save Chinese characters into a .txt file, but when I opened it with Wordpad, I couldn't read it.
StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;
FileOutputStream fos;
fos = new FileOutputStream(FileName, Append);
for (int i = 0; i < Shanghai_StrBuf.length(); i++) {
    fos.write(Shanghai_StrBuf.charAt(i));
}
fos.close();
What can I do ? I know if I cut and paste Chinese characters into Wordpad, I can save it into a .txt file. How do I do that in Java ?
There are several factors at work here:
Text files have no intrinsic metadata for describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular)
The default encoding for Windows is still an 8-bit (or double-byte) "ANSI" character set with a limited range of values; text files written in this format are not portable
To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark at the start of the file (not strictly true - Raymond Chen explains). In theory, the BOM is there to tell you the endianess (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to automatically figure out that it is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, it would be an error to write a BOM to a script file, for example, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want it on Windows, either, like when you're appending data to an existing file: fos = new FileOutputStream(FileName,Append);
Here is a method of reliably appending UTF-8 data to a file:
private static void writeUtf8ToFile(File file, boolean append, String data)
        throws IOException {
    boolean skipBOM = append && file.isFile() && (file.length() > 0);
    Closer res = new Closer();
    try {
        OutputStream out = res.using(new FileOutputStream(file, append));
        Writer writer = res.using(new OutputStreamWriter(out, Charset.forName("UTF-8")));
        if (!skipBOM) {
            writer.write('\uFEFF');
        }
        writer.write(data);
    } finally {
        res.close();
    }
}
Usage:
public static void main(String[] args) throws IOException {
    String chinese = "\u4E0A\u6D77";
    boolean append = true;
    writeUtf8ToFile(new File("chinese.txt"), append, chinese);
}
Note: if the file already existed, you chose to append, and the existing data wasn't UTF-8 encoded, the only thing that code will create is a mess.
Here is the Closer type used in this code:
public class Closer implements Closeable {
    private Closeable closeable;

    public <T extends Closeable> T using(T t) {
        closeable = t;
        return t;
    }

    @Override
    public void close() throws IOException {
        if (closeable != null) {
            closeable.close();
        }
    }
}
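On Java 7 and later, try-with-resources makes a helper like Closer unnecessary; here is a sketch of the same append logic under the same assumptions (uses java.nio.charset.StandardCharsets):
private static void writeUtf8ToFile(File file, boolean append, String data)
        throws IOException {
    boolean skipBOM = append && file.isFile() && (file.length() > 0);
    try (Writer writer = new OutputStreamWriter(
            new FileOutputStream(file, append), StandardCharsets.UTF_8)) {
        // write the BOM only when starting a new file
        if (!skipBOM) {
            writer.write('\uFEFF');
        }
        writer.write(data);
    }
}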
This code makes a Windows-style best guess about how to read the file based on byte order marks:
private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
        Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };

private static Charset getEncoding(InputStream in) throws IOException {
    charsetLoop: for (Charset encoding : UTF_ENCODINGS) {
        byte[] bom = "\uFEFF".getBytes(encoding);
        in.mark(bom.length);
        for (byte b : bom) {
            if ((0xFF & b) != in.read()) {
                in.reset();
                continue charsetLoop;
            }
        }
        return encoding;
    }
    return Charset.defaultCharset();
}
private static String readText(File file) throws IOException {
    Closer res = new Closer();
    try {
        InputStream in = res.using(new FileInputStream(file));
        InputStream bin = res.using(new BufferedInputStream(in));
        Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
        StringBuilder out = new StringBuilder();
        for (int ch = reader.read(); ch != -1; ch = reader.read()) {
            out.append((char) ch);
        }
        return out.toString();
    } finally {
        res.close();
    }
}
Usage:
public static void main(String[] args) throws IOException {
    System.out.println(readText(new File("chinese.txt")));
}
(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
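If you need UTF-8 console output regardless of the platform default, one option (assuming your terminal can actually render it) is to re-wrap System.out:
// FileDescriptor.out is the raw stdout stream; PrintStream's
// (OutputStream, autoFlush, encoding) constructor lets us force UTF-8
PrintStream utf8Out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
System.setOut(utf8Out);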
If you can rely on the default character encoding being UTF-8 (or some other Unicode encoding), you may use the following:
Writer w = new FileWriter("test.txt");
w.append("上海");
w.close();
The safest way is to always explicitly specify the encoding:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
w.append("上海");
w.close();
P.S. You may use any Unicode characters in Java source code, even as method and variable names, if the -encoding parameter for javac is configured right. That makes the source code more readable than the escaped \uXXXX form.
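For example (the source file name is hypothetical):
javac -encoding UTF-8 MyClass.java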
Be very careful with the approaches proposed. Even specifying the encoding for the file as follows:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
will not work if you're running under an operating system like Windows. Even setting the system property for file.encoding to UTF-8 does not fix the issue. This is because Java fails to write a byte order mark (BOM) for the file. Even if you specify the encoding when writing out to a file, opening the same file in an application like Wordpad will display the text as garbage because it doesn't detect the BOM. I tried running the examples here in Windows (with a platform/container encoding of CP1252).
The following bug exists to describe the issue in Java:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
The solution for the time being is to write the byte order mark yourself to ensure the file opens correctly in other applications. See this for more details on the BOM:
http://mindprod.com/jgloss/bom.html
and for a more correct solution see the following link:
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Here's one way among many. Basically, we're just specifying that the conversion be done to UTF-8 before outputting bytes to the FileOutputStream:
String FileName = "output.txt";
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer writer = new OutputStreamWriter(new FileOutputStream(FileName,Append), "UTF-8");
writer.write(Shanghai_StrBuf.toString(), 0, Shanghai_StrBuf.length());
writer.close();
I manually verified this against the images at http://www.fileformat.info/info/unicode/char/ . In the future, please follow Java coding standards, including lower-case variable names. It improves readability.
Try this,
StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;
Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(FileName, Append), "UTF8"));
for (int i = 0; i < Shanghai_StrBuf.length(); i++) {
    out.write(Shanghai_StrBuf.charAt(i));
}
out.close();
