Java split giving opposite order of arabic characters


I am splitting the following string using \\| in Java (Android) with the IntelliJ 12 IDE.
Everything is fine except the last part: somehow split picks those characters up in the opposite order:
As you can see, the real positions 34, 35, 36 are correct and match the string, but when they are picked out into split part no. 5, they are in the wrong order: 36, 35, 34...
Is there any way I can get them in the right order?
My Code:
public ArrayList<Book> getBooksFromDatFile(Context context, String fileName)
{
ArrayList<Book> books = new ArrayList<Book>();
try
{
// load csv from assets
InputStream is = context.getAssets().open(fileName);
try
{
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = reader.readLine()) != null)
{
String[] RowData = line.split("\\|");
books.add(new Book(RowData[0], RowData[1], RowData[2], RowData[3], RowData[4], RowData[5]));
}
}
catch (IOException ex)
{
Log.e(TAG, "Error parsing csv file!");
}
finally
{
try
{
is.close();
}
catch (IOException e)
{
Log.e(TAG, "Error closing input stream!");
}
}
}
catch (IOException ex)
{
Log.e(TAG, "Error reading .dat file from assets!");
}
return books;
}

The characters in the String should always be in linguistic order, regardless of whether they're right-to-left or left-to-right characters. So we should see [34] = '١', [35] = '-', [36] = '٧'. It is up to rendering engines to display them using the correct right-to-left or left-to-right layout.
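To check that split itself does not reorder anything, you can print the code points of the split part instead of looking at how the IDE or device renders it. A minimal sketch, using a made-up sample line because the original string is not shown here; the class and helper names are hypothetical:

```java
import java.util.Arrays;

public class SplitOrderDemo {
    // Returns the code points of s as "U+XXXX" strings, in the order they are stored
    static String[] codePointsOf(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Hypothetical sample line: a Latin field, then an Arabic-Indic "1-7" field
        String line = "abc|\u0661-\u0667";
        String[] parts = line.split("\\|");
        // Prints [U+0661, U+002D, U+0667]: split keeps the stored (logical) order
        System.out.println(Arrays.toString(codePointsOf(parts[1])));
    }
}
```

If the printed order matches the file, the reversal you see is purely a display effect of bidirectional rendering.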

In Unicode, there are strong and weak (or neutral) characters. Some characters without a strong direction are:
"\\", "/", "+", "-", "=", ";", "$"
They are called "weak" characters because they carry no strong directional information of their own, so the software decides which direction to place them in based on the surrounding text; more info here.
To fix the display, you can wrap the weak character in directional formatting characters, for example:
RightToLeftEmbedding + weakCharacter + PopDirectionalFormatting
using these constant values
char RightToLeftEmbedding = (char)0x202B;
char PopDirectionalFormatting = (char)0x202C;
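In Java, the expression above only works if at least one operand is a String: with three char operands, + performs numeric addition and yields an int. A minimal sketch that builds a String instead; the helper name embedRtl is hypothetical, and whether the embedding changes what you see depends on the renderer honoring bidi control characters:

```java
public class BidiWrapDemo {
    // U+202B RIGHT-TO-LEFT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING
    static final char RIGHT_TO_LEFT_EMBEDDING = '\u202B';
    static final char POP_DIRECTIONAL_FORMATTING = '\u202C';

    // Wraps text in RLE ... PDF. Because text is a String, + is string
    // concatenation here rather than numeric addition of char values.
    static String embedRtl(String text) {
        return RIGHT_TO_LEFT_EMBEDDING + text + POP_DIRECTIONAL_FORMATTING;
    }

    public static void main(String[] args) {
        String wrapped = embedRtl("-");
        // The two control characters are invisible, so this prints 3
        System.out.println(wrapped.length());
    }
}
```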

Related

Read file, one line at a time and run code

I have a file with text in this format:
text:text2:text3
text4:text5:text6
text7:text8:text9
Now what I want to do is read the first line, split the words at the ":", and save the three strings into different variables. Those variables are then used as parameters for a method, before the program reads the next line and does the same thing again, and so on. So far I've got this:
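import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class ReadLines {
    public static void main(String[] args) {
        BufferedReader reader = null;
        try {
            File file = new File("C://Users//Patrick//Desktop//textfile.txt");
            reader = new BufferedReader(new FileReader(file));
            String line;
            // print each line of the file
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // reader is still null if the file could not be opened
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}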
Also, I've tried this for splitting (although I'm not sure an array is the best option):
String[] strArr = sCurrentLine.split("\\:");

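Use String[] parts = line.split(":"); to get an array containing text, text2 and so on. You can then pass parts[0], parts[1] and parts[2] to your method, or loop through parts and call the method with each item in the array.
You don't need the escape, because : is not a special character in regex; you only have to escape the delimiter when it is a special character (such as | or .). Java does accept a backslash before any non-alphabetic character, so "\\:" also matches a colon; note that your snippet splits sCurrentLine rather than the line variable read in the loop.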
More information here.
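The read-split-call loop described above can be sketched as follows. This is a minimal sketch, assuming each line has exactly three ":"-separated fields; handle is a hypothetical stand-in for your method, and lines with fewer fields are skipped (an assumption, not something the question specifies):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ColonLineReader {
    // Hypothetical stand-in for "a method" that takes the three values of one line
    static String handle(String first, String second, String third) {
        return first + "/" + second + "/" + third;
    }

    // Reads the input one line at a time, splits each line at ":" and calls handle(...)
    static List<String> processAll(Reader in) {
        List<String> results = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(":");
                if (parts.length < 3) {
                    continue; // skip lines that do not have three fields
                }
                results.add(handle(parts[0], parts[1], parts[2]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        // In the question, the Reader would be a FileReader on the text file
        String sample = "text:text2:text3\ntext4:text5:text6\ntext7:text8:text9\n";
        for (String r : processAll(new StringReader(sample))) {
            System.out.println(r);
        }
    }
}
```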

Java XML Parsing - incorrect string version of the data with VTD-XML

I am parsing an XML document in UTF-8 encoding with Java using VTD-XML.
A small excerpt looks like:
<literal>𠀋</literal>
<literal>𠂉</literal>
<literal>𠂢</literal>
I want to iterate through each literal and print it out to the console. However, what I get is:
¢
I am correctly navigating to each element. The way that I get the text value is by calling:
private static String toNormalizedString(String name, int val, final VTDNav vn) throws NavException {
String strValue = null;
if (val != -1) {
strValue = vn.toNormalizedString(val);
}
return strValue;
}
I've also tried vn.getXPathStringVal();, however it yields the same results.
I know that each of the literals above is not just a string of length one. Rather, each seems to be a Unicode "character" made up of two chars. I can correctly parse and output kanji characters whose length is just one.
My question is - how can I correctly parse and output these characters using VTD-XML? Is there a way to get the underlying bytes of the text between the literal tags so that I can parse the bytes myself?
EDIT
Code to process each line of the XML - converting it to a byte array and then back to a String.
try (BufferedReader br = new BufferedReader(new FileReader("res/sample.xml"))) {
String line;
while ((line = br.readLine()) != null) {
byte[] myBytes = null;
try {
myBytes = line.getBytes("UTF-8");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
System.exit(-1);
}
System.out.println(new String(myBytes));
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}

You are probably trying to get a string containing characters at or above U+10000 (supplementary characters). That bug is known and is in the process of being addressed... I will notify you once the fix is out.
This question may be identical to this one...
Map supplementary Unicode characters to BMP (if possible)
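As background on why these literals have length two: Java strings are UTF-16, so a character at or above U+10000 is stored as a surrogate pair of two chars. A small sketch using U+2000B as a sample supplementary CJK character; it only illustrates the standard String API and is not a workaround for the VTD-XML bug described above:

```java
public class SurrogateDemo {
    // A supplementary character built from its code point (stored as two chars)
    static final String SAMPLE = new String(Character.toChars(0x2000B));

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                              // 2 chars
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));    // 1 code point
        System.out.println(Character.isSupplementaryCodePoint(SAMPLE.codePointAt(0))); // true
        // Iterate by code point rather than by char so the pair stays together
        SAMPLE.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp))); // 2000b
    }
}
```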

Words read from file are null

I am attempting to read some words from the file "words.txt" and then use them in other classes of my program when it runs. This is what I found on the internet, and it doesn't seem to work properly.
public static List<String> wordsList;
public static void refreshWords(){
String fileName = "words.txt";
String line = null;
try {
FileReader fileReader =
new FileReader(fileName);
BufferedReader bufferedReader =
new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
for(String tempWord : line.split(" ")){
wordsList.add(tempWord);
}
}
bufferedReader.close();
}
catch(FileNotFoundException ex) {
System.out.println(
"Unable to open file '" +
fileName + "'");
}
catch(IOException ex) {
System.out.println(
"Error reading file '"
+ fileName + "'");
}
}
public static List<String> getListOfWords(){
return wordsList;
}
From the error message displayed as soon as the program starts (which stops the whole thing), I can tell that the error comes from adding tempWord to wordsList. I would assume that tempWord is null, but I can't find a reason why it would be.
All that I have in the file are a bunch of random words that I thought of off the top of my head, formatted like the following:
this game turtle forest soccer football ball java list annoyed

What you are using there is the old way of doing it (before Java 7).
With Java 7 / 8, reading a file is much easier. So rather than looking for bugs, I'd rewrite this using the new API:
List<String> lines = Files.readAllLines(yourFile.toPath(), StandardCharsets.UTF_8);
See Files.readAllLines(Path, Charset)
Also, in your question you are splitting lines into words. That's highly unusual; word lists are almost always one word per line.
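Whichever API you use, note that in the question's code wordsList is declared but never assigned, so it is null and wordsList.add(tempWord) throws a NullPointerException; tempWord itself is not null. A minimal sketch combining Files.readAllLines with an initialized list; splitting on \\s+ instead of a single space, and assuming the file is UTF-8, are choices made for this example:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WordList {
    // Must be initialized: a field declared only as "List<String> wordsList;" is null,
    // and calling add on it throws a NullPointerException
    private static final List<String> wordsList = new ArrayList<>();

    // Splits each line on runs of whitespace and drops empty tokens (e.g. from blank lines)
    static List<String> splitWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Re-reads the file (assumed to be UTF-8) and replaces the current word list
    public static void refreshWords(Path file) throws IOException {
        List<String> words = splitWords(Files.readAllLines(file, StandardCharsets.UTF_8));
        wordsList.clear();
        wordsList.addAll(words);
    }

    public static List<String> getListOfWords() {
        return wordsList;
    }

    public static void main(String[] args) throws IOException {
        refreshWords(Paths.get("words.txt")); // relative to the working directory
        System.out.println(getListOfWords());
    }
}
```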

Buffer Reader code to read input file

I have a text file named "message.txt" which is read using BufferedReader. Each line of the text file contains both a "word" and a "meaning", as in this example:
"PS:Primary school"
where PS is the word and Primary school is the meaning.
When the file is read, each line is split into "word" and "meaning" at the ":".
If the "meaning" equals the given input string "f_msg3", "f_msg3" is displayed in the text view "txtView". Otherwise, "f_msg" is displayed.
But the "if condition" is not working properly in this code. For example, if "f_msg3" is equal to "Primary school", the text view should show "Primary school", but it shows "f_msg" instead of "f_msg3". ("f_msg3" does not contain any extra characters.)
Can someone explain where I have gone wrong?
try {
BufferedReader file = new BufferedReader(new InputStreamReader(getAssets().open("message.txt")));
String line = "";
while ((line = file.readLine()) != null) {
try {
/*separate the line into two strings at the ":" */
StringTokenizer tokens = new StringTokenizer(line, ":");
String word = tokens.nextToken();
String meaning = tokens.nextToken();
/*compare the given input with the meaning of the read line */
if(meaning.equalsIgnoreCase(f_msg3)) {
txtView.setText(f_msg3);
} else {
txtView.setText(f_msg);
}
} catch (Exception e) {
txtView.setText("Cannot break");
}
}
} catch (IOException e) {
txtView.setText("File not found");
}

Try this
............
meaning = meaning.replaceAll("\\s+", " ");
/*compare the given input with the meaning of the read line */
if(meaning.equalsIgnoreCase(f_msg3)) {
txtView.setText(f_msg3);
} else {
txtView.setText(f_msg);
}
............
Otherwise, comment out the else part and it will work.
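One likely reason the comparison seems to fail when the file has more than one line (an interpretation, not confirmed in the question): txtView.setText runs once per line, so a match on an early line such as "PS:Primary school" is overwritten by the else branch on any later line that does not match. That would also explain why removing the else part appears to help. A minimal sketch that decides once and stops at the first match; MeaningLookup and containsMeaning are hypothetical names, and the Android view code is left out so it runs as plain Java:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class MeaningLookup {
    // Returns true if any "word:meaning" line has a meaning equal to target (ignoring case
    // and surrounding whitespace). It stops at the first match, so later lines cannot undo it.
    static boolean containsMeaning(Reader in, String target) {
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] pair = line.split(":", 2);
                if (pair.length == 2 && pair[1].trim().equalsIgnoreCase(target.trim())) {
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }

    public static void main(String[] args) {
        String file = "PS:Primary school\nHS:High school\n";
        // In the Android code, this result would feed a single txtView.setText(...) after the loop
        String shown = containsMeaning(new StringReader(file), "Primary school") ? "f_msg3" : "f_msg";
        System.out.println(shown);
    }
}
```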

I don't see any obvious error in your code; maybe it is just a matter of cleaning the string (i.e. removing leading and trailing spaces, newlines and so on) before comparing it.
Try trimming meaning, e.g. like this:
...
String meaning = tokens.nextToken();
if(meaning != null) {
meaning = meaning.trim();
}
if(f_msg3.equalsIgnoreCase(meaning)) {
txtView.setText(f_msg3);
} else {
txtView.setText(f_msg);
}
...

StringTokenizer is a legacy class whose use is discouraged in new code (its Javadoc recommends String.split or the java.util.regex package instead), and it adds complexity you don't need here. A regex split is simpler:
String[] pair = line.split("\\s*\\:\\s*", 2);
if (pair.length == 2) {
String word = pair[0];
String meaning = pair[1];
...
}
This splits the line into at most 2 parts (the optional second parameter is the limit) using a regular expression. \s* matches zero or more whitespace characters, such as spaces and tabs, so whitespace around the colon is dropped.
You could also load everything into a Properties object. In a properties file, key=value is the convention, but key:value is also allowed. However, some escaping may then be needed (for example, for spaces in keys).
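A minimal sketch of the Properties approach, assuming the "word:meaning" format from the question; the class and the load helper are hypothetical. Properties.load(Reader) lets you pick the character encoding yourself (the InputStream overload assumes ISO-8859-1). Because the question looks up by meaning rather than by word, the example uses containsValue:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesDemo {
    // Loads "key:value" lines; Properties accepts ':' as well as '=' as the separator
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("PS:Primary school\nHS:High school\n");
        System.out.println(props.getProperty("PS"));               // Primary school
        System.out.println(props.containsValue("Primary school")); // true
    }
}
```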

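import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        // vals: list of names; jmeno ("name") is added only if it is not already present
        ArrayList<String> vals = new ArrayList<>();
        String jmeno = "Adam";
        vals.add("Honza");
        vals.add("Petr");
        vals.add("Jan");
        if (!vals.contains(jmeno)) {
            vals.add(jmeno);
        } else {
            System.out.println("Adam is already in the list");
        }
        for (String jmena : vals) {
            System.out.println(jmena);
        }
        // Read dokument.txt line by line and print the first two ';'-separated fields of each line
        try (BufferedReader br = new BufferedReader(new FileReader("dokument.txt"))) {
            String aktualni = br.readLine();
            while (aktualni != null) {
                String[] znak = aktualni.split(";");
                if (znak.length >= 2) {
                    System.out.println(znak[0] + " " + znak[1]);
                }
                aktualni = br.readLine();
            }
        } catch (IOException e) {
            System.out.println("Failed");
        }
        // Write all names to dokument2.txt, separated by spaces
        // (try-with-resources closes the writer, so no explicit close() is needed)
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("dokument2.txt"))) {
            int pocetpr = 0;
            while (pocetpr < vals.size()) {
                bw.write(vals.get(pocetpr));
                bw.append(" ");
                pocetpr++;
            }
        } catch (IOException e) {
            System.out.println("Failed");
        }
    }
}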

Struggling to parse different text files based on their delimiters

I've been working on this on and off today.
Here is my method, which basically needs to accept the location of a .data (txt) file, go through the contents of that text file, and break it up into strings based on the delimiters present. These are the two files:
The person file.
Person ID,First Name,Last Name,Street,City
1,Ola,Hansen,Timoteivn,Sandnes
2,Tove,Svendson,Borgvn,Stavanger
3,Kari,Pettersen,Storgt,Stavanger
The order file.
Order ID|Order Number|Person ID
10|2000|1
11|2001|2
12|2002|1
13|2003|10
public static void openFile(String url) {
//initialize array for data to be held
String[][] myStringArray = new String[10][10];
int row = 0;
try {
//open the file
FileInputStream fstream = new FileInputStream(url);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
//ignores any blank entries
if (!"".equals(strLine)) {
//splits by comma(\\| for order) and places individually into array
String[] splitStr = new String[5];
//splitStr = strLine.split("\\|");
/*
* This is the part that i am struggling with getting to work.
*/
if (strLine.contains("\\|")) {
splitStr = strLine.split("\\|");
} else if (strLine.contains(",")) {
splitStr = strLine.split(",");
}else{
System.out.println("error no delimiter detected");
}
for (int i = 0; i < splitStr.length; i++) {
myStringArray[row][i] = splitStr[i];
System.out.println(myStringArray[row][i]);
}
}
}
//Close the input stream
br.close();
} catch (FileNotFoundException ex) {
Logger.getLogger(Client.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(Client.class.getName()).log(Level.SEVERE, null, ex);
}
}
The person file is read and parsed correctly, but the order file with the "|" delimiter doesn't work at all; I just get null printouts.
What's confusing me is that when I just have splitStr = strLine.split("\\|"); it works, but I need this method to detect the delimiter present and then apply the correct split.
Any help will be much appreciated.

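Apart from the fact that this should be done with a CSV library, the reason this code fails is that contains doesn't accept a regular expression: contains("\\|") looks for a literal backslash followed by a pipe, which never occurs in the file. The order lines therefore fall through to the else branch, and splitStr stays the initial array of five nulls, which is what gets printed. Remove the escape characters so the pipe character can be detected: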
if (strLine.contains("|")) {
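A minimal sketch of the detection logic with that fix applied; splitRow is a hypothetical helper, and returning the whole line as a single field when no delimiter is found is an assumption (the question's code just prints an error). Pattern.quote is used so neither delimiter needs hand-written regex escaping:

```java
import java.util.regex.Pattern;

public class DelimiterSplit {
    // Picks the delimiter by plain substring search (contains takes no regex),
    // then splits with Pattern.quote so '|' is treated literally
    static String[] splitRow(String line) {
        if (line.contains("|")) {
            return line.split(Pattern.quote("|"));
        } else if (line.contains(",")) {
            return line.split(Pattern.quote(","));
        }
        return new String[] { line }; // no known delimiter: keep the line as one field
    }

    public static void main(String[] args) {
        for (String line : new String[] { "10|2000|1", "1,Ola,Hansen,Timoteivn,Sandnes" }) {
            System.out.println(String.join(" / ", splitRow(line)));
        }
    }
}
```

For real data with quoted fields or embedded delimiters, a CSV library remains the more robust option.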
