I am currently trying to read multiple files (UTF-8) within a directory and store each element in that text file into an array.
I am able to get the text to print to the console, but it shows some funny characters I can't seem to get rid of (see image; what it should look like is displayed on the right).
Currently, I have a method that builds an array with all file names in that directory then using a for loop I send each of these file names to a read method which puts it into a string.
The below method writes these file names to an array.
public static ArrayList<String> readModelFilesInModelDir() {
File folder = new File("Models/");
File[] listOfFiles = folder.listFiles();
String random = "";
assert listOfFiles != null;
ArrayList<String> listOfModelFiles = new ArrayList<>();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
//System.out.println("File " + listOfFiles[i].getName());
listOfModelFiles.add(listOfFiles[i].getName());
} else if (listOfFiles[i].isDirectory()) {
System.out.println("Directory " + listOfFiles[i].getName());
}
}
System.out.println(listOfModelFiles);
return listOfModelFiles;
}
The below for loop then sends these file names to the read method.
ArrayList<String> modelFiles = readModelFilesInModelDir();
for (int i = 0; i < modelFiles.size(); i++) {
String thisString = readModelFileIntoArray(modelFiles.get(i));
System.out.println(thisString);
}
The below method then reads the string into an array, which is outputting what the images show.
public static String readModelFileIntoArray(String modelFilePath) {
StringBuilder fileHasBeenRead = new StringBuilder();
try {
Reader reader = new InputStreamReader(new FileInputStream(("Models/" + modelFilePath)), StandardCharsets.UTF_8);
String s;
BufferedReader bufferedReader = new BufferedReader(reader);
while ((s = bufferedReader.readLine()) != null) {
fileHasBeenRead.append(s + "\n");
}
reader.close();
} catch (Exception e) {
System.out.print(e);
}
return fileHasBeenRead.toString().trim();
}
Finally, how would I fix this output issue as well as store each of these files that have been read into a separate array that I can use elsewhere? Thanks!
I agree with Johnny Mopp, your file is encoded in UTF-16, not in UTF-8. The two �� at the beginning of your output look like a byte order mark (BOM). In UTF-16, each character in the ASCII range is encoded in two bytes. Since your text only contains characters in the ASCII range, one byte of each pair is always 0x00. This is why you're seeing all these ▯: they correspond to the non-printable character 0x00. I would even say that since the two characters following �� are ▯ and a in this order, your file is using big-endian UTF-16.
Instead of UTF-8, use StandardCharsets.UTF_16. It will also take the BOM into account and use the appropriate endianness.
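To make the point above concrete, here is a minimal sketch of what the two charsets do with the same bytes. The class and method names are made up for illustration, and the byte array stands in for the first few bytes of a file like the one in the question (a big-endian BOM followed by "ab").

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the question's code.
final class Utf16BomSketch {

    // UTF_16 consumes a leading BOM and takes the byte order from it
    // (big-endian is assumed when no BOM is present).
    static String decodeAsUtf16(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16);
    }

    // Decoding the same bytes as UTF-8 keeps the 0x00 bytes as NUL characters
    // and turns the BOM bytes into replacement characters.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // FE FF is the big-endian BOM, followed by 'a' and 'b' as two-byte units
        byte[] raw = {(byte) 0xFE, (byte) 0xFF, 0x00, 'a', 0x00, 'b'};
        System.out.println(decodeAsUtf16(raw));
        System.out.println(decodeAsUtf8(raw).length() + " chars when misread as UTF-8");
    }
}
```

In the question's read method this would amount to passing StandardCharsets.UTF_16 to the InputStreamReader instead of StandardCharsets.UTF_8.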
It's much easier (and usually better) to use existing libraries for common stuff. FileUtils from Apache commons-io provides this functionality out of the box, reducing your file-reading code to a one-liner:
String thisString = FileUtils.readFileToString(new File("Models/" + modelFilePath), StandardCharsets.UTF_8);
... or whatever charset your file is using...
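For the second half of the question (keeping each file's contents separately so they can be used elsewhere), one possible approach is a map from file name to that file's lines. This is only a sketch using the JDK's java.nio.file API; the class name, the method names and the temporary directory in the demo are assumptions made for illustration, and the charset is passed in so you can use whatever your files really are.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; names are illustrative only.
final class ModelDirSketch {

    // Returns file name -> lines for every regular file directly inside dir.
    static Map<String, List<String>> readAllModels(Path dir, Charset charset) throws IOException {
        Map<String, List<String>> linesByFile = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    linesByFile.put(entry.getFileName().toString(),
                            Files.readAllLines(entry, charset));
                }
            }
        }
        return linesByFile;
    }

    // Builds a throwaway directory with two UTF-16 files and reads it back.
    static Map<String, List<String>> demo() {
        try {
            Path dir = Files.createTempDirectory("models");
            Files.write(dir.resolve("a.txt"), "one\ntwo\n".getBytes(StandardCharsets.UTF_16));
            Files.write(dir.resolve("b.txt"), "three\n".getBytes(StandardCharsets.UTF_16));
            return readAllModels(dir, StandardCharsets.UTF_16);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the question's layout, the call would be something like readAllModels(Paths.get("Models"), StandardCharsets.UTF_16), and each file's lines can then be looked up by name.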
I have to read a file called test.p2b with the following content:
I tried reading it like this:
static void branjeIzDatoteke(String location){
byte[] allBytes = new byte[10000];
try {
InputStream input = new FileInputStream(location);
int byteRead;
int j=0;
while ((byteRead = input.read())!=-1){
allBytes[j] = (byte)input.read();
}
String str = new String(allBytes,"UTF-8");
for (int i=0;i<=str.length()-8;i+=8){
//int charCode = Integer.parseInt(str.substring(i,i+8),2);
//System.out.println((char)charCode);
int drek = (int)str.charAt(i);
System.out.println(Integer.toBinaryString(drek));
}
} catch (IOException ex) {
Logger.getLogger(Slika.class.getName()).log(Level.SEVERE, null, ex);
}
}
I tried just printing out the string (when I created String str = new String(allBytes,"UTF-8");), but all I get is a square at the beginning and then 70+ blank lines with no text.
Then I tried the int charCode = Integer.parseInt(str.substring(i,i+8),2); and printing out each individual character, but then I got a NumberFormatException.
I even tried just converting
Finally I tried the Integer.toBinaryString I have at the end but in this case I get 1s and 0s. That's not what I want, I need to read the actual text but no method seems to work.
I've actually read a binary file before using the method I already tried:
int charCode = Integer.parseInt(str.substring(i,i+8),2);
System.out.println((char)charCode);
but like I said, I get a NumberFormatException.
I don't understand why these methods won't work.
If you want to read all the bytes you can use the java.nio.file.Files utility class:
Path path = Paths.get("test.p2b");
byte[] allBytes = Files.readAllBytes(path);
String str = new String(allBytes, "UTF-8");
System.out.print(str);
Your iteration over the content of str might not work. Certain Unicode characters are expressed as surrogate pairs, that is, code points that span more than one char (as explained here). Since you are working with UTF-8 text, you should use the String#codePoints() method to iterate over the code points instead of the chars.
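As a small illustration of the difference between chars and code points, here is a sketch that labels each code point of a string. The class and method names are invented for the example; the test string contains U+1F600, which Java stores as a surrogate pair.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
final class CodePointSketch {

    // Formats every code point of the text as U+XXXX.
    static List<String> codePointLabels(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as its surrogate pair, then 'b'
        String text = "a\uD83D\uDE00b";
        System.out.println("chars: " + text.length());
        System.out.println("code points: " + codePointLabels(text));
    }
}
```

The string has four chars but only three code points, which is why stepping through it with charAt can split a character in half.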
I've got an oddball problem here. I've got a little java program that filters Minecraft log files to make them easier to read. On each line of these logs, there are usually multiple instances of the character "§", which returns a hex value of FFFD.
I am filtering out this character (as well as the character following it) using:
currentLine = currentLine.replaceAll("\uFFFD.", "");
Now, when I run the program through NetBeans, it works swell. My lines get outputted looking like this:
CxndyAnnie: Mhm
CxndyAnnie: Sorry
But when I build the .jar file and wrap it into a .exe file using JSmooth, that character no longer gets filtered out when I run the .exe, and my lines come out looking like this:
§e§7[§f$65§7] §1§nCxndyAnnie§e: Mhm
§e§7[§f$65§7] §1§nCxndyAnnie§e: Sorry
(note: the additional square brackets and $65 show up because their filtering is dependent on the special character and its following character being removed first)
Any ideas why this would no longer work after putting it through JSmooth? Is there a different way to do the text replace that would preserve its function?
By the way, I also attempted to remove this character using
currentLine = currentLine.replaceAll("§.", "");
but that didn't work in Netbeans nor as a .exe.
I'll go ahead and paste the full method below:
public static String[] filterLines(String[] allLines, String filterType, Boolean timeStamps) throws IOException {
String currentLine = null;
FileWriter saveFile = new FileWriter("readable.txt");
String heading;
String string1 = "[L]";
String string2 = "[A]";
String string3 = "[G]";
if (filterType.equals(string1)) {
heading = "LOCAL CHAT LOGS ONLY \r\n\r\n";
}
else if (filterType.equals(string2)) {
heading = "ADVERTISING CHAT LOGS ONLY \r\n\r\n";
}
else if (filterType.equals(string3)) {
heading = "GLOBAL CHAT LOGS ONLY \r\n\r\n";
}
else {
heading = "CHAT LINES CONTAINING \"" + filterType + "\" \r\n\r\n";
}
saveFile.write(heading);
for (int i = 0; i < allLines.length; i++) {
if ((allLines[i] != null ) && (allLines[i].contains(filterType))) {
currentLine = allLines[i];
if (!timeStamps) {
currentLine = currentLine.replaceAll("\\[..:..:..\\].", "");
}
currentLine = currentLine.replaceAll("\\[Client thread/INFO\\]:.", "");
currentLine = currentLine.replaceAll("\\[CHAT\\].", "");
currentLine = currentLine.replaceAll("\uFFFD.", "");
currentLine = currentLine.replaceAll("\\[A\\].", "");
currentLine = currentLine.replaceAll("\\[L\\].", "");
currentLine = currentLine.replaceAll("\\[G\\].", "");
currentLine = currentLine.replaceAll("\\[\\$..\\].", "");
currentLine = currentLine.replaceAll(".>", ":");
currentLine = currentLine.replaceAll("\\[\\$100\\].", "");
saveFile.write(currentLine + "\r\n");
//System.out.println(currentLine);
}
}
saveFile.close();
ProcessBuilder openFile = new ProcessBuilder("Notepad.exe", "readable.txt");
openFile.start();
return allLines;
}
FINAL EDIT
Just in case anyone stumbles across this and needs to know what finally worked, here's the snippet of code where I pull the lines from the file and re-encode it to work:
BufferedReader fileLines;
fileLines = new BufferedReader(new FileReader(file));
String[] allLines = new String[numLines];
int i=0;
while ((line = fileLines.readLine()) != null) {
byte[] bLine = line.getBytes();
String convLine = new String(bLine, Charset.forName("UTF-8"));
allLines[i] = convLine;
i++;
}
I also had a problem like this in the past with Minecraft logs. I don’t remember the exact details, but the issue came down to a file encoding problem: reading the file as UTF-8 worked correctly, but some other text encodings, including the system default, did not.
First:
Make sure that you specify UTF8 encoding when reading the byteArray from file so that allLines contains the correct info like so:
Path fileLocation = Paths.get("C:/myFileLocation/logs.txt");
byte[] data = Files.readAllBytes(fileLocation);
String allLines = new String(data , Charset.forName("UTF-8"));
Second:
Using \uFFFD is not going to work, because \uFFFD is only used to replace an incoming character whose value is unknown or unrepresentable in Unicode.
However, if you use the correct encoding (shown in my first point), then \uFFFD is not necessary, because the character § is known in Unicode, so you can simply use
currentLine.replaceAll("§", "");
or specifically use the Unicode escape for that character, U+00A7, like so
currentLine.replaceAll("\u00A7", "");
or just use both those lines in your code.
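Here is a minimal sketch of both points. It assumes, as in the question's own pattern, that the character following each § is a formatting code that should be removed as well, so the regex is "\u00A7." rather than just the sign. The class and method names are made up, and the sample line is the one from the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class SectionSignSketch {

    // Removes every section sign together with the single character after it.
    static String stripFormatCodes(String line) {
        return line.replaceAll("\u00A7.", "");
    }

    // Shows where U+FFFD comes from: a lone 0xA7 byte (the section sign in
    // ISO-8859-1) is malformed UTF-8, so the decoder substitutes U+FFFD.
    static boolean misdecodedStartsWithReplacementChar() {
        byte[] latin1 = {(byte) 0xA7, 'e'};
        return new String(latin1, StandardCharsets.UTF_8).charAt(0) == '\uFFFD';
    }

    public static void main(String[] args) {
        String line = "\u00A7e\u00A77[\u00A7f$65\u00A77] \u00A71\u00A7nCxndyAnnie\u00A7e: Mhm";
        System.out.println(stripFormatCodes(line));
        System.out.println(misdecodedStartsWithReplacementChar());
    }
}
```

Writing the character as the escape \u00A7 keeps the pattern independent of the encoding the source file itself is compiled with, which may be part of why a literal § in the source behaved differently.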
My guess is, the code I've written doesn't work with .CSV files, but only .txt.
The purpose of my code is to take the user input from field1, and check against my .CSV file to see if there is an instance of the user input located within the file. If there is, then it will be replaced by the user input from field2.
This works with my .txt file, but not with my .CSV file.
Here's the code that is activated at the push of a button (save button):
try{
// Input the file location into Path variable 'p'
//Cannot write to CSV files
//Path p = Paths.get("C:\\Users\\myname\\Documents\\Stock Take Program\\tiger.csv");
Path p = Paths.get("C:\\Users\\myname\\Desktop\\test.txt");
//Read the whole file to a ArrayList
List<String> fileContent = new ArrayList<>(Files.readAllLines(p));
//Converting user input from editSerialField to a string
String strSerial = editSerialField.getText();
//Converting user input from editLocationField to a string
String strLocation = editLocationField.getText();
//This structure looks for a duplicate in the text file, if so, replaces it with the user input from editLocationField.
for (int i = 0; i < fileContent.size(); i++)
{
if (fileContent.get(i).equals(strSerial))
{
fileContent.set(i, strLocation);
}
break;
}
// write the new String with the replaced line OVER the same file
Files.write(p, fileContent);
}catch(IOException e)
{
e.printStackTrace();
}
My question is, how can I update my code to work with updating and replacing the contents of a .CSV file with the user input, the same way as it works for my .txt files.
When writing to a text file, it replaces only the first line, but when writing to a .CSV file, it does not replace anything.
Is there any way I should be writing my code differently to replace text within a .CSV file?
Any help is greatly appreciated, thanks.
I'm an idiot. My '.CSV' file is actually titled tiger.csv as a text file. I've now saved an actual CSV version and it is now working.
Haha, thanks for the help guys.
Probably should be in another question, but the problem relating to the change only working on the first line is due to the break being called and shortcutting the loop on the first way round. Put it within the if block.
for (int i = 0; i < fileContent.size(); i++)
{
if (fileContent.get(i).equals(strSerial))
{
fileContent.set(i, strLocation);
break;
}
}
Or leave it off completely if you want it to be able to update multiple lines.
HTH,
// Needs: java.io.IOException, java.nio.file.Files, java.nio.file.Path,
// java.nio.file.Paths, java.util.List
public static void main(String[] args) throws IOException {
    String userEnteredValue = "Karnataka";
    Path csvFile = Paths.get("C:/Users/GOOGLE/Desktop/sample/temp.csv");
    String csvSplitBy = ",";
    // Read every line first: opening a writer on the same file before
    // reading it would truncate the file to zero length.
    List<String> lines = Files.readAllLines(csvFile);
    for (int row = 0; row < lines.size(); row++) {
        // use comma as separator; the -1 limit keeps trailing empty fields
        String[] country = lines.get(row).split(csvSplitBy, -1);
        for (int i = 0; i < country.length; i++) {
            if (country[i].contains(userEnteredValue)) {
                country[i] = "NEW INDIA";
            }
        }
        lines.set(row, String.join(csvSplitBy, country));
    }
    // Write the updated rows back over the same file
    Files.write(csvFile, lines);
    System.out.println("done!");
}
Problem: Arabic words in my text files read by java show as series of question marks : ??????
Here is the code:
File[] fileList = mainFolder.listFiles();
BufferedReader bufferReader = null;
Reader reader = null;
try{
for(File f : fileList){
reader = new InputStreamReader(new FileInputStream(f.getPath()), "UTF8");
bufferReader = new BufferedReader(reader);
String line = null;
while((line = bufferReader.readLine())!= null){
System.out.println(new String(line.getBytes(), "UTF-8"));
}
}
}
catch(Exception exc){
exc.printStackTrace();
}
finally {
//Close the BufferedReader
try {
if (bufferReader != null)
bufferReader.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
As you can see I have specified the UTF-8 encoding in different places and still I get question marks, do you have any idea how can I fix this??
Thanks
Instead of trying to print out the line directly, print out the Unicode values of each character. For example:
char[] chars = line.toCharArray();
for (int i = 0; i < chars.length; i++)
{
System.out.println(i + ": " + chars[i] + " - " + (int) chars[i]);
}
Then look up the relevant characters in the Unicode code charts.
If you find it's printing 63, then those really are question marks... which would suggest that your text file isn't truly UTF-8 to start with.
If, on the other hand for some characters it's printing out "?" but then a value other than 63, then that would suggest it's a console display issue and you're reading the data correctly.
Replace
System.out.println(new String(line.getBytes(), "UTF-8"));
by
System.out.println(line);
Called without a charset argument, String#getBytes() uses the platform default encoding to get the bytes from the string, which is not necessarily UTF-8. You're already decoding the bytes as UTF-8 with InputStreamReader, so you don't need to massage the text back and forth afterwards.
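A short sketch of why that round trip is harmful. To keep it reproducible, the platform default is passed in explicitly as a parameter instead of being read from the system; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class RoundTripSketch {

    // Mimics new String(line.getBytes(), "UTF-8"), with the platform
    // default charset made an explicit argument.
    static String roundTrip(String line, Charset platformDefault) {
        return new String(line.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters
        // Harmless only when the default happens to be UTF-8
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));
        // A default that cannot represent Arabic replaces each letter with '?'
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII));
    }
}
```

When the default charset cannot encode the characters, getBytes substitutes the charset's replacement byte, so the question marks are already in the data before anything is printed.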
Further, ensure that your display console (where you're reading those lines) supports UTF-8. In for example Eclipse, you can do that by Window > Preferences > General > Workspace > Text File Encoding > Other > UTF-8.
See also:
Unicode - How to get the characters right?
I want to read a text file containing space separated values. Values are integers.
How can I read it and put it in an array list?
Here is an example of contents of the text file:
1 62 4 55 5 6 77
I want to have it in an arraylist as [1, 62, 4, 55, 5, 6, 77]. How can I do it in Java?
You can use Files#readAllLines() to get all lines of a text file into a List<String>.
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
// ...
}
Tutorial: Basic I/O > File I/O > Reading, Writing and Creating text files
You can use String#split() to split a String in parts based on a regular expression.
for (String part : line.split("\\s+")) {
// ...
}
Tutorial: Numbers and Strings > Strings > Manipulating Characters in a String
You can use Integer#valueOf() to convert a String into an Integer.
Integer i = Integer.valueOf(part);
Tutorial: Numbers and Strings > Strings > Converting between Numbers and Strings
You can use List#add() to add an element to a List.
numbers.add(i);
Tutorial: Interfaces > The List Interface
So, in a nutshell (assuming that the file doesn't have empty lines nor trailing/leading whitespace).
List<Integer> numbers = new ArrayList<>();
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
for (String part : line.split("\\s+")) {
Integer i = Integer.valueOf(part);
numbers.add(i);
}
}
If you happen to be at Java 8 already, then you can even use Stream API for this, starting with Files#lines().
List<Integer> numbers = Files.lines(Paths.get("/path/to/test.txt"))
.map(line -> line.split("\\s+")).flatMap(Arrays::stream)
.map(Integer::valueOf)
.collect(Collectors.toList());
Tutorial: Processing data with Java 8 streams
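The loops above assume there are no empty lines and no leading or trailing whitespace; otherwise split() yields an empty first element and Integer.valueOf() throws a NumberFormatException. If your file may contain those, a slightly more tolerant variant could look like the sketch below. The class and method names are made up, and the lines are passed in as a list so it works with the result of Files.readAllLines().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class TolerantIntParserSketch {

    // Parses whitespace-separated integers, skipping blank lines and
    // ignoring leading/trailing whitespace on each line.
    static List<Integer> parseInts(List<String> lines) {
        List<Integer> numbers = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing to parse on a blank line
            }
            for (String part : trimmed.split("\\s+")) {
                numbers.add(Integer.valueOf(part));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 62 4", "", "  55 5 6 77  ");
        System.out.println(parseInts(lines));
    }
}
```

Tokens that are not integers still cause a NumberFormatException, which is usually what you want for malformed input.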
Java 1.5 introduced the Scanner class for handling input from files and streams.
It can be used to read integers from a file and would look something like this:
List<Integer> integers = new ArrayList<Integer>();
Scanner fileScanner = new Scanner(new File("c:\\file.txt"));
while (fileScanner.hasNextInt()){
integers.add(fileScanner.nextInt());
}
Check the API though. There are many more options for dealing with different types of input sources, differing delimiters, and differing data types.
This example code shows you how to read file in Java.
import java.io.*;
/**
* This example code shows you how to read file in Java
*
* IN MY CASE RAILWAY IS MY TEXT FILE WHICH I WANT TO DISPLAY YOU CHANGE WITH YOUR OWN
*/
public class ReadFileExample
{
public static void main(String[] args)
{
System.out.println("Reading File from Java code");
//Name of the file
String fileName="RAILWAY.txt";
try{
//Create object of FileReader
FileReader inputFile = new FileReader(fileName);
//Instantiate the BufferedReader Class
BufferedReader bufferReader = new BufferedReader(inputFile);
//Variable to hold the one line data
String line;
// Read file line by line and print on the console
while ((line = bufferReader.readLine()) != null) {
System.out.println(line);
}
//Close the buffer reader
bufferReader.close();
}catch(Exception e){
System.out.println("Error while reading file line by line:" + e.getMessage());
}
}
}
Look at this example, and try to do your own:
import java.io.*;
public class ReadFile {
public static void main(String[] args){
String string = "";
String file = "textFile.txt";
// Reading
try{
InputStream ips = new FileInputStream(file);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
String line;
while ((line = br.readLine()) != null){
System.out.println(line);
string += line + "\n";
}
br.close();
}
catch (Exception e){
System.out.println(e.toString());
}
// Writing
try {
FileWriter fw = new FileWriter (file);
BufferedWriter bw = new BufferedWriter (fw);
PrintWriter fileOut = new PrintWriter (bw);
fileOut.println (string+"\n test of read and write !!");
fileOut.close();
System.out.println("the file " + file + " is created!");
}
catch (Exception e){
System.out.println(e.toString());
}
}
}
Just for fun, here's what I'd probably do in a real project, where I'm already using all my favourite libraries (in this case Guava, formerly known as Google Collections).
String text = Files.toString(new File("textfile.txt"), Charsets.UTF_8);
List<Integer> list = Lists.newArrayList();
for (String s : text.split("\\s")) {
list.add(Integer.valueOf(s));
}
Benefit: Not much own code to maintain (contrast with e.g. this). Edit: Although it is worth noting that in this case tschaible's Scanner solution doesn't have any more code!
Drawback: you obviously may not want to add new library dependencies just for this. (Then again, you'd be silly not to make use of Guava in your projects. ;-)
Use Apache Commons (IO and Lang) for simple/common things like this.
Imports:
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.ArrayUtils;
Code:
String contents = FileUtils.readFileToString(new File("path/to/your/file.txt"));
String[] array = ArrayUtils.toArray(contents.split(" "));
Done.
Using Java 7 to read files with NIO.2
Import these packages:
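import java.io.BufferedReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.StringTokenizer;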
This is the process to read a file:
Path file = Paths.get("C:\\Java\\file.txt");
if(Files.exists(file) && Files.isReadable(file)) {
try {
// File reader
BufferedReader reader = Files.newBufferedReader(file, Charset.defaultCharset());
String line;
// read each line
while((line = reader.readLine()) != null) {
System.out.println(line);
// tokenize each number
StringTokenizer tokenizer = new StringTokenizer(line, " ");
while (tokenizer.hasMoreElements()) {
// parse each integer in file
int element = Integer.parseInt(tokenizer.nextToken());
}
}
reader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
To read all lines of a file at once:
Path file = Paths.get("C:\\Java\\file.txt");
List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
All the answers so far given involve reading the file line by line, taking the line in as a String, and then processing the String.
There is no question that this is the easiest approach to understand, and if the file is fairly short (say, tens of thousands of lines), it'll also be acceptable in terms of efficiency. But if the file is long, it's a very inefficient way to do it, for two reasons:
Every character gets processed twice, once in constructing the String, and once in processing it.
The garbage collector will not be your friend if there are lots of lines in the file. You're constructing a new String for each line, and then throwing it away when you move to the next line. The garbage collector will eventually have to dispose of all these String objects that you don't want any more. Someone's got to clean up after you.
If you care about speed, you are much better off reading a block of data and then processing it byte by byte rather than line by line. Every time you come to the end of a number, you add it to the List you're building.
It will come out something like this:
private List<Integer> readIntegers(File file) throws IOException {
List<Integer> result = new ArrayList<>();
RandomAccessFile raf = new RandomAccessFile(file, "r");
byte buf[] = new byte[16 * 1024];
final FileChannel ch = raf.getChannel();
int fileLength = (int) ch.size();
final MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0,
fileLength);
int acc = 0;
while (mb.hasRemaining()) {
int len = Math.min(mb.remaining(), buf.length);
mb.get(buf, 0, len);
for (int i = 0; i < len; i++)
if ((buf[i] >= 48) && (buf[i] <= 57))
acc = acc * 10 + buf[i] - 48;
else {
result.add(acc);
acc = 0;
}
}
ch.close();
raf.close();
return result;
}
The code above assumes that this is ASCII (though it could be easily tweaked for other encodings), and that anything that isn't a digit (in particular, a space or a newline) represents a boundary between digits. It also assumes that the file ends with a non-digit (in practice, that the last line ends with a newline), though, again, it could be tweaked to deal with the case where it doesn't.
It's much, much faster than any of the String-based approaches also given as answers to this question. There is a detailed investigation of a very similar issue in this question. You'll see there that there's the possibility of improving it still further if you want to go down the multi-threaded line.
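As for the tweak mentioned above (a file that does not end with a non-digit), one possible way is to track whether a number is currently in progress and flush it at the end of the data. The sketch below works on a plain byte array rather than a mapped buffer to keep it short; the class and method names are made up, and it shares the original's assumptions of ASCII input and non-negative values that fit in an int.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class DigitScannerSketch {

    // Scans ASCII bytes and collects each run of digits as one integer.
    static List<Integer> parseAsciiInts(byte[] data) {
        List<Integer> result = new ArrayList<>();
        int acc = 0;
        boolean inNumber = false;
        for (byte b : data) {
            if (b >= '0' && b <= '9') {
                acc = acc * 10 + (b - '0');
                inNumber = true;
            } else if (inNumber) {
                // First non-digit after a number: the number is complete
                result.add(acc);
                acc = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            result.add(acc); // data ended in the middle of a number
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = "1 62  4\r\n55".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseAsciiInts(data));
    }
}
```

The inNumber flag also means that runs of separators, such as a double space or "\r\n", do not add spurious zeros to the list.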
Read the file into a list and then do whatever you want with the lines.
With Java 8:
Files.lines(Paths.get("c://lines.txt")).collect(Collectors.toList());