Regarding word search error (Encoding error Java) - java

I have a list of French words that I am trying to search for in my database. The words are "thé Mariage frères", "thé Lipton", etc.
When I read my file in Java, the accented characters come out garbled, so it fails to get the correct words.
I don't know how to correct this.
Help me, please!

Your file is in one encoding (maybe Latin-1/ISO-8859-1) and you're reading it in another encoding.
See if this post helps: How to read a file in Java with specific character encoding?

Try this:
try (FileInputStream fis = new FileInputStream("input.txt");
     InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
     BufferedReader reader = new BufferedReader(isr)) {
    String line;
    while ((line = reader.readLine()) != null)
        System.out.println(line);
}

Try creating the Scanner object like this:
Scanner s = new Scanner(new File("French_Tea_keywords/filter_keywords.txt"), "UTF8");
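A minimal usage sketch with the file from the question, reading it line by line:
Scanner s = new Scanner(new File("French_Tea_keywords/filter_keywords.txt"), "UTF8");
while (s.hasNextLine()) {
    System.out.println(s.nextLine());
}
s.close();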

Related

How to read a file from internal storage?

I've just created a file in MainActivity using the code:
FileOutputStream outputStream = openFileOutput("user", Context.MODE_PRIVATE);
outputStream.close();
Now what should I use if I want to read the file (given that the file will obviously be created in internal storage)?
And is there a way to make the file reading work like the Scanner class in Java (where the content is read word by word and line by line)?
Use openFileInput() with the same file name you passed to openFileOutput().
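For example, a minimal sketch inside the same Activity (assuming the file was written with the name "user" as above); wrapping the stream in a Scanner gives the word-by-word / line-by-line reading you asked about:
FileInputStream in = openFileInput("user");  // same name used with openFileOutput()
Scanner scanner = new Scanner(in);
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // process the line (use scanner.next() instead for word-by-word reading)
}
scanner.close();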
This is the solution, as far as I understood your question:
FileInputStream fIn = new FileInputStream(new File("FILE_PATH"));
BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(fIn));
String str = bufferedReader.readLine();

character encoding issue in converting csv data using fileInputStream class in java

Hi, I have a CSV file and am trying to read each line of it and update the content to a database table. I am doing this using Java.
Here is what I did to achieve this:
FileInputStream fileInputStream = FileUtils.openInputStream(new File("filename.csv"));
DataInputStream dataInputStream = new DataInputStream(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(dataInputStream));
String strLine;
while ((strLine = bufferedReader.readLine()) != null) {
    System.out.println(strLine);
}
but it prints something like blocks on the console, not the actual data the CSV contains.
Can anyone please help me solve this issue?
You should find out what encoding the file is in and then declare it while reading. Example for UTF-16:
new InputStreamReader(zipFile.getInputStream(entry), "UTF-16")
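Applied to the CSV above, a minimal sketch (this assumes the file is UTF-8; substitute whatever encoding the file actually uses):
FileInputStream fileInputStream = new FileInputStream("filename.csv");
BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(fileInputStream, "UTF-8"));
String strLine;
while ((strLine = bufferedReader.readLine()) != null) {
    System.out.println(strLine);  // accented characters now decode correctly
}
bufferedReader.close();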

Java - ignoring certain characters while reading a text file

I'm trying to read a simple text file that contains the following:
LOAD
Bill's Beans
1200
20
15
30
QUIT
I need to store and print the contents line by line. I am doing so using the following code:
String inputFile = "(file path here)";
Scanner input = null;
try {
    input = new Scanner(new File(inputFile));
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
String currentLine = "";
while (!currentLine.equals("QUIT}")) {
    currentLine = input.nextLine();
    System.out.println(currentLine);
}
input.close();
However, the output is very "messy". I am trying to avoid storing all new line characters and anything else that doesn't appear in the text file. Output is:
{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf540
{\fonttbl\f0\fmodern\fcharset0 Courier;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\deftab720
\pard\pardeftab720\ql\qnatural
\f0\fs26 \cf0 LOAD\
Bill's Beans\
1200\
20\
15\
30\
QUIT}
Any help would be greatly appreciated, thank you!
This looks like you're reading an RTF file, doesn't it, by any chance?
Otherwise, I find reading text files most natural using this construct:
BufferedReader reader = new BufferedReader(
        new FileReader(new File("yourfile.txt")));
String text = null;
// repeat until all lines are read
while ((text = reader.readLine()) != null) {
    // do whatever with the text line
}
Because this is an RTF file, look into this for example: RTFEditorKit
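For example, a minimal sketch using RTFEditorKit to pull the plain text out of the file (the file name here is a placeholder):
// RTFEditorKit lives in javax.swing.text.rtf
RTFEditorKit kit = new RTFEditorKit();
Document doc = kit.createDefaultDocument();
try (FileInputStream fis = new FileInputStream("yourfile.rtf")) {
    kit.read(fis, doc, 0);  // parse the RTF into a styled document
    String plainText = doc.getText(0, doc.getLength());
    System.out.println(plainText);  // LOAD, Bill's Beans, ... without the RTF markup
} catch (IOException | BadLocationException e) {
    e.printStackTrace();
}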
If you insist on writing your own RTF reader, the correct approach would be for you to extend FilterInputStream and handle the RTF metadata in its implementation.
Just add the following code to your class, then call it with a path parameter. It returns all the lines as a List object:
public List<String> readStudentsNoFromText(String path) throws IOException {
    List<String> result = new ArrayList<String>();
    // Open the file at the given path
    FileInputStream fstream = new FileInputStream(new File(path));
    // Get the object of DataInputStream
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    // Read the file line by line
    while ((strLine = br.readLine()) != null) {
        // Print the content on the console
        System.out.println(strLine);
        result.add(strLine.trim());
    }
    // Close the input stream
    in.close();
    return result;
}

Check line for unprintable characters while reading text file

My program must read text files - line by line.
The files are in UTF-8.
I am not sure that the files are correct - they can contain unprintable characters.
Is it possible to check for this without going down to the byte level?
Thanks.
Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.
E.g. (without error checking), using try-with-resources (available in vaguely modern Java versions, i.e. Java 7 and later):
String line;
try (
    InputStream fis = new FileInputStream("the_file_name");
    InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(isr);
) {
    while ((line = br.readLine()) != null) {
        // Deal with the line
    }
}
While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
You can then do whatever you like with those lines.
EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.
Just found out that with Java NIO (java.nio.file.*) you can easily write:
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
for (String line : lines) {
    System.out.println(line);
}
instead of dealing with FileInputStreams and BufferedReaders...
If you want to check whether a string has unprintable characters you can use a regular expression:
[^\p{Print}]
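For example, a sketch of applying that pattern to each line (note that Java's \p{Print} class is US-ASCII only by default, so accented or non-Latin characters will also be flagged unless you compile the pattern with Pattern.UNICODE_CHARACTER_CLASS):
Pattern nonPrintable = Pattern.compile("[^\\p{Print}]");
if (nonPrintable.matcher(line).find()) {
    System.out.println("Unprintable character found in: " + line);
}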
How about the following:
FileReader fileReader = new FileReader(new File("test.txt"));
BufferedReader br = new BufferedReader(fileReader);
String line = null;
// readLine() returns null when there are no more lines
while ((line = br.readLine()) != null) {
    // reading lines until the end of the file
}
Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html
I can find the following ways to do it:
private static final String fileName = "C:/Input.txt";

public static void main(String[] args) throws IOException {
    Stream<String> lines = Files.lines(Paths.get(fileName));
    lines.toArray(String[]::new);

    List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
    readAllLines.forEach(s -> System.out.println(s));

    File file = new File(fileName);
    Scanner scanner = new Scanner(file);
    while (scanner.hasNext()) {
        System.out.println(scanner.next());
    }
}
The answer by @T.J.Crowder is for Java 6 - in Java 7 the valid answer is the one by @McIntosh - though its use of Charset.forName for UTF-8 is discouraged:
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"),
        StandardCharsets.UTF_8);
for (String line : lines) { /* DO */ }
Reminds a lot of the Guava way posted by Skeet above - and of course same caveats apply. That is, for big files (Java 7):
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
for (String line = reader.readLine(); line != null; line = reader.readLine()) {}
If every char in the file is properly encoded in UTF-8, you won't have any problem reading it using a reader with the UTF-8 encoding. It's then up to you to check every char of the file and decide whether you consider it printable or not.
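For instance, a per-character sketch (here "printable" is taken to mean "not an ISO control character"; adjust the test to whatever definition you need):
static boolean isPrintable(String line) {
    for (int i = 0; i < line.length(); i++) {
        if (Character.isISOControl(line.charAt(i))) {
            return false;  // found a control character
        }
    }
    return true;
}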

Reading hebrew from text file with Java

I'm having trouble reading a UTF-8 encoded text file in Hebrew.
I read all Hebrew characters successfully, except for two letters: 'מ' and 'א'.
Here is how I read it:
FileInputStream fstream = new FileInputStream(SCHOOLS_LIST_PATH);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read the file line by line
while ((strLine = br.readLine()) != null) {
    if (strLine.contains("zevel")) {
        continue;
    }
    schools.add(getSchoolFromLine(strLine));
}
Any idea?
Thanks,
Tomer
You're using InputStreamReader without specifying the encoding, so it's using the default for your platform - which may well not be UTF-8.
Try:
new InputStreamReader(in, "UTF-8")
Note that it's not obvious why you're using DataInputStream here... just create an InputStreamReader around the FileInputStream.
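Putting those two points together, a minimal sketch of the corrected reading code (reusing the identifiers from the question):
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(SCHOOLS_LIST_PATH), "UTF-8"));
String strLine;
while ((strLine = br.readLine()) != null) {
    if (strLine.contains("zevel")) {
        continue;
    }
    schools.add(getSchoolFromLine(strLine));
}
br.close();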
