HashMap does not behave as expected for Chinese characters

China-中国,CN
Angola-安哥拉,AO
Afghanistan-阿富汗,AF
Albania-阿尔巴尼亚,AL
Algeria-阿尔及利亚,DZ
Andorra-安道尔共和国,AD
Anguilla-安圭拉岛,AI
In Java, I'm reading the above text from a file and creating a map where the keys will be the part before the comma and the values will be the region code after the comma.
Here is the code:
public static void main(String[] args) {
BufferedReader br;
Map<String,String> mymap = new HashMap<String,String>();
try {
br = new BufferedReader(new InputStreamReader(new FileInputStream("C:/Users/IBM_ADMIN/Desktop/region_code_abbreviations_Chinese.csv"), "UTF-8"));
String line;
while ((line = br.readLine()) != null) {
//System.out.println(line);
String[] arr= line.split(",");
mymap.put(arr[0], arr[1]);
}
br.close();
} catch (IOException e) {
System.out.println("Failed to read users file.");
} finally {}
for(String s: mymap.keySet()){
System.out.println(s);
if(s.equals("China-中国")){
System.out.println("Got it");
break;
}
}
System.out.println("----------------");
System.out.println("Returned from map "+ mymap.get("China-中国"));
mymap = new HashMap<String,String>();
mymap.put("China-中国","Explicitly Put");
System.out.println(mymap.get("China-中国"));
System.out.println("done");
}
The output:
:
:
Egypt-埃及
Guyana-圭亚那
New Zealand-新西兰
China-中国
Indonesia-印度尼西亚
Laos-老挝
Chad-乍得
Korea-韩国
:
:
Returned from map null
Explicitly Put
done
Map is loaded correctly but when I search the map for "China-中国" - I do not get the value.
If I explicitly put "China-中国" in map, then it returns a value.
Why is this happening?

Check whether your resource file is UTF-8 with a BOM (byte order mark) at the start. A BOM would only interfere with the first value, though. If you change the test to a key from the middle of the file, do you get a value? If not, the BOM is not the problem.
A second possibility is that your source code file is not saved (or not compiled) as UTF-8. In that case the literal "China-中国" in your source code is decoded to a different string than the one read from the resource file, so the lookup fails. When you put the literal into the map yourself and look it up again, the same string from the source code is used both times, so it is found.
Either way, this is not a problem with HashMap but with character or file encoding.
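To make the BOM case concrete, here is a minimal sketch. It assumes the file begins with the UTF-8 BOM bytes EF BB BF; the class name BomLookupDemo and the in-memory byte array are made up for illustration, and the Chinese characters are written as Unicode escapes so the example does not depend on how the source file itself is saved.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class BomLookupDemo {
    // "China-中国" written with Unicode escapes (U+4E2D, U+56FD).
    static final String KEY = "China-\u4E2D\u56FD";

    // Decodes a UTF-8 line preceded by the BOM bytes EF BB BF, as a BOM-prefixed file would deliver it.
    static String firstLineWithBom() {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] text = (KEY + ",CN").getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[bom.length + text.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(text, 0, all, bom.length, text.length);
        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
        return new String(all, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] arr = firstLineWithBom().split(",");
        Map<String, String> map = new HashMap<>();
        map.put(arr[0], arr[1]);
        System.out.println(map.get(KEY));              // prints null: the stored key starts with U+FEFF
        System.out.println(map.get("\uFEFF" + KEY));   // prints CN
        System.out.println(arr[0].charAt(0) == '\uFEFF'); // prints true
    }
}
```

The key prints identically on the console in both cases because U+FEFF is invisible, which is why the map looks correctly loaded.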

Since you are having a problem with the first value, I would check to see if the file starts with a BOM (Byte Order Mark).
If so, try stripping the BOM before processing.
See: Byte order mark screws up file reading in Java
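If you would rather not add a library, stripping can be done by hand. The following is only a sketch under the assumption that the input is "key,value" lines decoded as UTF-8; BomStripper, stripBom, and load are hypothetical names, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class BomStripper {
    // Removes a single leading U+FEFF if present; otherwise returns the line unchanged.
    static String stripBom(String line) {
        return line.startsWith("\uFEFF") ? line.substring(1) : line;
    }

    // Reads "key,value" lines into a map, stripping a BOM from the first line only.
    static Map<String, String> load(BufferedReader reader) throws IOException {
        Map<String, String> map = new HashMap<>();
        String line;
        boolean first = true;
        while ((line = reader.readLine()) != null) {
            if (first) {
                line = stripBom(line);
                first = false;
            }
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                map.put(parts[0], parts[1]);
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: a BOM followed by "China-中国,CN".
        String data = "\uFEFF" + "China-\u4E2D\u56FD,CN\n";
        Map<String, String> map = load(new BufferedReader(new StringReader(data)));
        System.out.println(map.get("China-\u4E2D\u56FD")); // prints CN
    }
}
```

For a real file, the same load method can be handed the BufferedReader built from the InputStreamReader in the question.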

You can use org.apache.commons.io.input.BOMInputStream.
BufferedReader br = new BufferedReader(new InputStreamReader(new BOMInputStream(new FileInputStream("filepath")), "UTF-8"));

Related

assigning properties to strings in text file

Hopefully my explanation does me some justice. I am pretty new to java. I have a text file that looks like this
Java
The Java Tutorials
http://docs.oracle.com/javase/tutorial/
Python
Tutorialspoint Java tutorials
http://www.tutorialspoint.com/python/
Perl
Tutorialspoint Perl tutorials
http://www.tutorialspoint.com/perl/
I have properties for language name, website description, and website url. Right now, I just want to list the information from the text file exactly how it looks, but I need to assign those properties to them.
The problem I am getting is "index 1 is out of bounds for length 1"
try {
BufferedReader in = new BufferedReader(new FileReader("Tutorials.txt"));
while (in.readLine() != null) {
TutorialWebsite tw = new TutorialWebsite();
str = in.readLine();
String[] fields = str.split("\\r?\\n");
tw.setProgramLanguage(fields[0]);
tw.setWebDescription(fields[1]);
tw.setWebURL(fields[2]);
System.out.println(tw);
}
} catch (IOException e) {
e.printStackTrace();
}
I wanted to test something, so I removed the newlines, put commas in instead, and changed the call to str.split(","), which printed everything just fine. But I'm sure I would get points taken off if I changed the format.

readLine() returns a "string containing the contents of the line, not including any line-termination characters", so why are you trying to split each line on "\\r?\\n"?
Where is str declared? Why are you reading two lines for each iteration of the loop, and ignoring the first one?
I suggest you start from
String str;
while ((str = in.readLine()) != null) {
System.out.println(str);
}
and work from there.
The first readLine() call gets the language, the second gets the description, and the third gets the URL, and then the pattern repeats. There is nothing to stop you calling readLine() three times in each iteration of the while loop.
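A rough sketch of that three-reads-per-record loop follows. Since the TutorialWebsite class is not shown in the question, each record is kept as a plain String[] here; ThreeLineRecords and read are made-up names, and a StringReader stands in for Tutorials.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ThreeLineRecords {
    // Reads records made of three consecutive lines: language, description, URL.
    static List<String[]> read(Reader source) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String language;
            while ((language = in.readLine()) != null) {
                String description = in.readLine();
                String url = in.readLine();
                if (description == null || url == null) {
                    break; // incomplete trailing record: stop instead of storing nulls
                }
                records.add(new String[] {language, description, url});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "Java\nThe Java Tutorials\nhttp://docs.oracle.com/javase/tutorial/\n"
                + "Perl\nTutorialspoint Perl tutorials\nhttp://www.tutorialspoint.com/perl/\n";
        for (String[] r : read(new StringReader(text))) {
            System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
        }
    }
}
```

In the asker's program, the three strings would go into the TutorialWebsite setters instead of an array.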

You can read the whole file into a String like this:
// str will hold all the file contents; it is declared outside the try so it is still in scope afterwards
StringBuilder str = new StringBuilder();
// try-with-resources makes sure the BufferedReader is closed safely
try (BufferedReader in = new BufferedReader(new FileReader("Tutorials.txt"))) {
    String line;
    while ((line = in.readLine()) != null) {
        str.append(line);
        str.append("\n");
    }
} catch (IOException e) {
    e.printStackTrace();
}
Later you can split the string with
String[] fields = str.toString().split("[\\n\\r]+");

Why not try it like this:
Allocate a List to hold the TutorialWebsite instances.
Use try-with-resources to open the file, read the lines, and trim any whitespace.
Put the lines in an array.
Iterate over the array, filling in a class instance for every nFields lines.
Then print the list.
The loop bound rounds the array length down to a multiple of nFields and discards any remainder, so if the total number of lines is not divisible by nFields, the leftover lines at the end of the file are not processed. You would still have to adjust the setters if additional fields were added.
int nFields = 3;
List<TutorialWebsite> list = new ArrayList<>();
try (BufferedReader in = new BufferedReader(new FileReader("tutorials.txt"))) {
String[] lines = in.lines().map(String::trim).toArray(String[]::new);
for (int i = 0; i < (lines.length/nFields)*nFields; i+=nFields) {
TutorialWebsite tw = new TutorialWebsite();
tw.setProgramLanguage(lines[i]);
tw.setWebDescription(lines[i+1]);
tw.setWebURL(lines[i+2]);
list.add(tw);
}
} catch (IOException ioe) {
ioe.printStackTrace();
}
list.forEach(System.out::println);
An improvement would be to add a constructor and pass the strings to it when each instance is created.
And remember that the file name as specified is relative to the directory in which the program is run.
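As a sketch of the constructor suggestion: the asker's TutorialWebsite class is not shown, so the field names below are guesses derived from the setter names (setProgramLanguage, setWebDescription, setWebURL), and the toString format is an assumption chosen to print the three lines as they appear in the file.

```java
class TutorialWebsite {
    private final String programLanguage;
    private final String webDescription;
    private final String webURL;

    // All three values are supplied at construction time, so no setters are needed.
    TutorialWebsite(String programLanguage, String webDescription, String webURL) {
        this.programLanguage = programLanguage;
        this.webDescription = webDescription;
        this.webURL = webURL;
    }

    // Prints the record the way it appears in the text file: one field per line.
    @Override
    public String toString() {
        return String.join("\n", programLanguage, webDescription, webURL);
    }

    public static void main(String[] args) {
        System.out.println(new TutorialWebsite(
                "Java", "The Java Tutorials", "http://docs.oracle.com/javase/tutorial/"));
    }
}
```

With this in place, the body of the loop shrinks to list.add(new TutorialWebsite(lines[i], lines[i+1], lines[i+2])).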

Print data from a text file imported into a hash map, ignore characters

I have a text file that contains the following:
example.txt
#ignore
#ignore line
#ignore line again
1234567
8940116
12131415
I want to read in the example.txt file into eclipse and add the data into a hashmap. I want the list to be arranged in numerical order and I want it to ignore any comments(any text with #) in the text file. I would like to print the hashmap as follows:
output:
1234567
8940116
12131415

You don't need a HashMap for storing just Strings. Maps are for key-value pairs. If you want to put each line from the file into a collection, use a List. ArrayList and LinkedList both maintain insertion order, so you can use either of them. If you want a sorted list, sort it afterwards with Collections.sort (the JDK has no TreeList class; a TreeSet keeps its elements sorted but drops duplicates).
BufferedReader reader;
List<String> list = new ArrayList<String>();
try {
reader = new BufferedReader(new FileReader(
"example"));
String line = reader.readLine();
while (line != null) {
if (!line.startsWith("#")) {
list.add(line);
}
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}

The purpose of a Map is to store key/value pairs. For a single collection of values you can use a List, which is simpler and more efficient. The printing part is your job, whatever the type of collection is.
List<String> values = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new FileReader("filename"))) {
String line;
while ((line = reader.readLine()) != null) {
if (!line.startsWith("#")) {
values.add(line);
}
}
} catch (IOException e) {
e.printStackTrace();
}
for (String v : values)
System.out.println(v);
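The question also asks for numerical order, which neither listing covers. Here is one possible sketch, assuming every non-comment line is a whole number that fits in a long; NumericLines and parseSorted are made-up names, and the lines are passed in as a list rather than read from example.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NumericLines {
    // Skips blank lines and lines starting with '#', parses the rest as numbers, and sorts them.
    static List<Long> parseSorted(List<String> lines) {
        List<Long> numbers = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            numbers.add(Long.parseLong(trimmed)); // throws NumberFormatException on non-numeric input
        }
        Collections.sort(numbers);
        return numbers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#ignore", "#ignore line", "12131415", "8940116", "1234567");
        for (Long n : parseSorted(lines)) {
            System.out.println(n); // prints 1234567, then 8940116, then 12131415
        }
    }
}
```

Parsing to numbers matters here: sorting the raw strings would compare them character by character and place "12131415" before "1234567".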

How to output an string of ArrayList into a new file?

Hi guys, I have this sample text file in which the names of the people are stuck together without any spacing between them. Is it possible to read this with a BufferedReader, store the values in an ArrayList of strings, and then separate the strings by name?
Text file details:
charles_luiharry_pinkertonarlene_purcellwayne_casanova
My code:
try {
BufferedReader in = new BufferedReader(new FileReader(filename));
String str;
List<String> list = new ArrayList<String>();
while ((str = in.readLine()) != null) {
list.add(str);
}
String[] stringArr = list.toArray(new String[0]);
FileWriter writer = new FileWriter("new_users.txt");
for (String ss : list) {
writer.write(ss);
}
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
Expected output :
charles_lui
harry_pinkerton
arlene_purcell
wayne_casanova
Real output:
A duplicate of the sample file.

Just add a line separator to your writer:
writer.write(ss);
writer.write(System.lineSeparator());
If that causes problems on your OS, use System.getProperty("line.separator") instead.

BufferedReader.readLine() reads and returns one line of input, where a line ends with \n, \r, or \r\n. It cannot detect the boundaries between the names in your input file. It is better to prepare the input so that each name is on its own line.

It's difficult even for a human to separate the last name of one entry from the first name of the next, so how can you expect a computer to do so?
Proposed solution:
Modify the sample file and add a separator (say ';') between each pair of names.
Build one long string by concatenating all the lines in the file, removing the '\n' or '\r\n' from the end of each line. (Optional: use a StringBuilder for performance.)
Split that string into an array of names by calling split(";") on it, with the separator as the argument.
Then print from the array.
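Assuming the file has been edited so that the names are separated by ';', the split step might look like the sketch below. NameSplitter and its methods are hypothetical names, and the input is a string literal rather than the file.

```java
class NameSplitter {
    // Splits a ';'-separated string into individual names.
    static String[] splitNames(String joined) {
        return joined.split(";");
    }

    // Puts each name on its own line, ready to be written to the output file.
    static String toLines(String joined) {
        return String.join(System.lineSeparator(), splitNames(joined));
    }

    public static void main(String[] args) {
        String joined = "charles_lui;harry_pinkerton;arlene_purcell;wayne_casanova";
        System.out.println(toLines(joined));
    }
}
```

The result of toLines can be passed to writer.write(...) in place of the loop over the list.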

BufferedReader does not read all the lines in text file

I have a function.
public ArrayList<String> readRules(String src) {
try (BufferedReader br = new BufferedReader(new FileReader(src))) {
String sCurrentLine;
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
lines.add(sCurrentLine);
}
} catch (IOException e) {
e.printStackTrace();
}
return lines;
}
My file has 26,400 lines, but this function only reads the last 3,400 lines of the file.
How do I read all the lines in the file?
Thanks!

Why don't you use the utility method Files.readAllLines() (available since Java 7)?
This method ensures that the file is closed when all bytes have been read, or when an I/O error or other runtime exception is thrown.
Bytes from the file are decoded into characters using the specified charset.
public List<String> readRules(String src) throws IOException {
    // readAllLines takes a Path rather than a String and returns a List<String>
    return Files.readAllLines(Paths.get(src), Charset.defaultCharset());
}

while ((sCurrentLine = br.readLine()) != null)
This loop only stops at the end of the stream: readLine() returns an empty string for an empty line, and null only when there is nothing left to read. If you want to rule the loop out anyway, you can try a Scanner instead (hasNextLine() and nextLine() are Scanner methods, not BufferedReader methods):
Scanner sc = new Scanner(new File(src));
while (sc.hasNextLine()) {
    String current = sc.nextLine();
}
sc.close();
Edit: Another possibility is that the editor you view the file in wraps long lines automatically, so that a single line looks like several. Unless the Return key was used, BufferedReader treats it as a single line.
Notepad++ is a good tool for avoiding this confusion, because it numbers lines by actual line breaks. You could open your input file in Notepad++ and check whether the line numbers match.
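One way to check how many lines BufferedReader actually delivers, independent of what an editor or console displays, is to count them. This is only a sketch; LineCounter and countLines are made-up names, and a StringReader is used so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineCounter {
    // Counts every line readLine() returns, including empty ones.
    static int countLines(Reader source) {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            while (br.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // The empty middle line is returned as "" and does not end the loop.
        System.out.println(countLines(new StringReader("first\n\nthird\n"))); // prints 3
    }
}
```

For the real file, pass new FileReader(src) and compare the count with the line count the editor reports.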

You can also read the file into a List of strings using readAllLines() and then loop through it.
List<String> myfilevar = Files.readAllLines(Paths.get("/PATH/TO/MY/FILE.TXT"));
for(String x : myfilevar)
{
System.out.println(x);
}

Parsing in Java with C style?

I am new to java text parsing and I'm wondering what is the best way to parse a file when the format of each line is known.
I have a file that has the following format for each line:
Int;String,double;String,double;String,double;String,double;String,double
Note how the String,double act as a pair separated by a comma and each pair is separated by a semicolon.
A few examples:
1;art,0.1;computer,0.5;programming,0.6;java,0.7;unix,0.3
2;291,0.8;database,0.6;computer,0.2;java,0.9;undegraduate,0.7
3;coffee,0.5;colombia,0.2;java,0.1;export,0.4;import,0.5
I'm using the following code to read each line:
public static void main(String args[]) {
try {
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
// Close the input stream
in.close();
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
Thanks in advance :)

You could use the Scanner class, for starters:
A simple text scanner which can parse primitive types and strings using regular expressions.

If you are truly trying to do "C" style parsing, where is the buffer which contains the characters being accumulated for the "next" field? Where is the check that sees if the field separator was read, and where is the code that flushes the current field into the correct data structure once the end of line / field separator is read?
A character by character read loop in Java looks like
int readChar = 0;
while ((readChar = in.read()) != -1) {
// do something with the new readChar.
}
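To show what such a loop might do with each character, here is a minimal sketch that accumulates characters into a buffer and flushes the buffer whenever a field separator or end of line is read. It assumes ';' and ',' are the only separators, as in the question's format; CharFieldParser and fields are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CharFieldParser {
    // Reads one character at a time and returns the fields found between ';', ',' and line breaks.
    static List<String> fields(Reader in) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder(); // buffer for the field being accumulated
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == ';' || c == ',' || c == '\n') {
                    out.add(current.toString()); // separator seen: flush the current field
                    current.setLength(0);
                } else if (c != '\r') {
                    current.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            out.add(current.toString()); // last field when the input does not end with a newline
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields(new StringReader("1;art,0.1;computer,0.5")));
        // prints [1, art, 0.1, computer, 0.5]
    }
}
```

Wrapping a FileReader in a BufferedReader before passing it in avoids a system call per character.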

You can provide a pattern and use the Scanner
String input = "fish1-1 fish2-2";
java.util.Scanner s = new java.util.Scanner(input);
s.findInLine("(\\d+)");
java.util.regex.MatchResult result = s.match();
for (int i=1; i<=result.groupCount(); i++)
System.out.println(result.group(i));
s.close();
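Applied to the line format from the question, a Scanner with a custom delimiter could look roughly like this. It is a sketch that assumes every line has the exact Int;String,double;... shape; PairLineScanner and parsePairs are made-up names, and Locale.US is set so that "0.1" is parsed with a dot as the decimal separator regardless of the machine's locale.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

class PairLineScanner {
    // Parses "id;word,weight;word,weight;..." and returns the word/weight pairs in file order.
    static Map<String, Double> parsePairs(String line) {
        Map<String, Double> pairs = new LinkedHashMap<>();
        try (Scanner s = new Scanner(line)) {
            s.useDelimiter("[;,]").useLocale(Locale.US);
            s.nextInt(); // leading integer id, skipped in this sketch
            while (s.hasNext()) {
                String word = s.next();        // read as text even if it looks numeric, e.g. "291"
                double weight = s.nextDouble();
                pairs.put(word, weight);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parsePairs("1;art,0.1;computer,0.5;programming,0.6;java,0.7;unix,0.3"));
    }
}
```

A malformed line makes nextInt() or nextDouble() throw InputMismatchException or NoSuchElementException, so real input may need a try/catch around the call.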
