Java printing stray character instead of actual - java

I have a simple java program that reads a line from file and writes it to another file.
My source file has words like: it's but the destination file is having words like it�s.
I am using BufferedReader br = new BufferedReader(new FileReader(inputFile)); to read the source file and PrintWriter writer = new PrintWriter(resultFile, "UTF-8"); to write the destination file.
How to get the actual character in my destination file too?

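You need to specify a charset when reading the file; otherwise the platform default encoding is used. BufferedReader has no constructor that accepts an encoding, so wrap an InputStreamReader and pass the encoding the source file was actually saved in (UTF-8 here):
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile), "UTF-8"));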
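A minimal sketch of the whole read/write loop with both encodings made explicit. The class and method names are hypothetical, and the source encoding is a parameter because the question does not say what the input file uses; windows-1252 is only a guess that fits the stray-character symptom.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class TranscodeLines {
    // Copies a text file line by line, decoding with sourceCharset and
    // always encoding the result as UTF-8.
    static void copy(String inputFile, String resultFile, Charset sourceCharset)
            throws IOException {
        try (BufferedReader br = new BufferedReader(
                 new InputStreamReader(new FileInputStream(inputFile), sourceCharset));
             PrintWriter writer = new PrintWriter(resultFile, "UTF-8")) {
            String line;
            while ((line = br.readLine()) != null) {
                writer.println(line);
            }
        }
    }
}
```

A call could look like TranscodeLines.copy("in.txt", "out.txt", Charset.forName("windows-1252")), with the file names and the charset replaced by the real ones.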

I know this question is a bit old, but I thought I'd put down my answer anyway.
You can use java.nio.file.Files to read the file and java.io.RandomAccessFile to write to your destination file. For example:
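public void copyContentsOfFile(File source, File destination){
Path p = Paths.get(source.toURI());
// try-with-resources closes the file even if a write fails
try (RandomAccessFile raf = new RandomAccessFile(destination, "rw")) {
byte[] bytes = Files.readAllBytes(p);
// "rw" does not truncate, so drop any previous content first
raf.setLength(0);
// write the raw bytes; writeBytes(new String(bytes)) would decode with the
// platform default charset and then keep only the low byte of each char
raf.write(bytes);
} catch (IOException e) {
e.printStackTrace();
}
}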

Related

Read UTF-8 properties file and save as UTF-8 txt file

I am currently trying to analyze all of my properties files, and for one step I need them as .txt files. The problem is that German umlauts such as Ä, Ü, Ö are not carried over correctly, so my program does not work. (If I convert the files to .txt manually there are no problems, but the whole thing should run dynamically.)
Here is the code I am currently using:
private static void createTxt(String filePath, String savePath) throws IOException {
final File file = new File(filePath);
final BufferedReader bReader = new BufferedReader(new FileReader(file.getPath()));
final List<String> stringList= new ArrayList<>();
String line = bReader.readLine();
while (line != null) {
stringList.add(line);
line = bReader.readLine();
}
final Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(savePath), "UTF-8"));
try {
for (final String s : stringList) {
out.write(s + "\n");
}
}
finally {
out.close();
}
}
The .txt file is also encoded as UTF-8. I think the problem is caused by the BufferedReader or by caching the lines in the ArrayList.
Thank you for your time and help,
Best regards, Pascal
When reading and writing files you should always set a charset. Since Java 11, FileReader has a constructor that takes a Charset:
new FileReader(file, StandardCharsets.UTF_8)
If you just want to read all lines from a file, use Files.readAllLines(path, StandardCharsets.UTF_8);
To write, you can use Files.write(path, listOfStrings, StandardCharsets.UTF_8);
And if you only want to copy the file, use Files.copy(source, target);
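Put together, the createTxt method from the question could shrink to something like the sketch below. It assumes the properties files really are UTF-8; the class name is hypothetical. If a file is not valid UTF-8, Files.readAllLines throws a MalformedInputException instead of silently producing wrong characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class PropertiesToTxt {
    // Reads every line as UTF-8 and writes the lines back out as UTF-8.
    static void createTxt(String filePath, String savePath) throws IOException {
        Path source = Paths.get(filePath);
        Path target = Paths.get(savePath);
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        // Files.write terminates each line with the platform line separator.
        Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```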

Why does introducing a FileWriter delete all the content in the file?

I have a text file with some text in it, and I'm planning to replace certain characters in it. For this I read the file using a BufferedReader that wraps a FileReader.
File file = new File("new.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
But since I have to edit characters, I introduced a FileWriter and added a call to the String method replaceAll. The overall code looks like this:
File file = new File("new.txt");
FileWriter fw = new FileWriter(file);
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while ((line = br.readLine()) != null) {
System.out.println(line);
fw.write(br.readLine().replaceAll("t", "1") + "\n");
}
The problem is that as soon as I introduce the FileWriter (even with just the initialization), running the program deletes the content of the file, regardless of whether I add the following line:
fw.write(br.readLine().replaceAll("t", "1") + "\n");
Why is this occurring? Am I following the correct approach to edit characters in a text file?
Or is there any other way of doing this?
Thank you.
public FileWriter(String fileName,
boolean append)
Parameters:
fileName - String The system-dependent filename.
append - boolean if true, then data will be written to the end of the
file rather than the beginning.
To append data use
new FileWriter(file, true);
The problem is that you're trying to write to the file while you're reading from it. A better solution would be to create a second file, put the transformed data into it, then replace the first file with it when you're done. Or if you don't want to do that, read all of the data out of the file first, then open it for writing and write the transformed data.
Also, have you considered using a text-processing language solution such as awk, sed or perl: https://unix.stackexchange.com/questions/112023/how-can-i-replace-a-string-in-a-files
You need to read the file first, and then, only after you read the entire file, you can write to it.
Or you open a different file for writing and then afterwards you replace the old file with the new one.
The reason is that once you start writing to a file, it is truncated (the data that was in the file is deleted).
The only way to avoid that is to open the file in "append" mode. With that mode, you start writing at the end of the file, so you don't delete its content. However, you won't be able to modify the existing content, you will only add content.
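The second-file approach described above could be sketched roughly like this. The class and method names are made up for illustration, and UTF-8 is an assumed encoding since the question does not state one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ReplaceInFile {
    // Writes the transformed lines to a temporary file next to the original,
    // then moves the temporary file over the original.
    static void replaceAll(Path file, String regex, String replacement)
            throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "replace", ".tmp");
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line.replaceAll(regex, replacement));
                bw.newLine();
            }
        }
        // Both streams are closed here, so the original can be replaced safely.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the original is only replaced after the temporary file has been written completely, a failure halfway through leaves the original untouched.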
Maybe like this
public static void main(String[] args) throws IOException {
try {
File file = new File("/Users/alexanderkrum/IdeaProjects/printerTest/src/atmDep.txt");
Scanner myReader = new Scanner(file);
ArrayList<Integer> numbers = new ArrayList<>();
// hasNextInt() avoids a NoSuchElementException on a trailing newline
while (myReader.hasNextInt()) {
numbers.add(myReader.nextInt() + 1);
}
myReader.close();
FileWriter myWriter = new FileWriter(file);
for (Integer number :
numbers) {
myWriter.write(number.toString() + '\n');
}
myWriter.close();
} catch (FileNotFoundException e) {
System.out.println("An error occurred.");
e.printStackTrace();
}
}
Also add this at the end:
fw.close();
Closing the writer flushes any buffered output to the file. Note that it does not prevent the truncation that happens when the FileWriter is opened.

java detect if file is UTF-8 or Ansi

In Java, is there a way to detect whether a file is ANSI or UTF-8? The problem I am having is that if someone creates a CSV file in Excel it's UTF-8, but if they create it using Notepad it's ANSI.
I am wondering if I can detect the type of file and then handle it accordingly.
Thanks.
You could try something like this. It relies on Excel including a Byte Order Mark (BOM), which a quick search suggests it does although I can't verify it, and on the fact that Java decodes the UTF-8 BOM as the character \uFEFF.
FileInputStream fis = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
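String line = br.readLine();
// line is null for an empty file, so guard before checking for the BOM
if (line != null && line.startsWith("\uFEFF")) {
// it's UTF-8, throw away the BOM character and continue
line = line.substring(1);
} else {
// no BOM, so assume it's not UTF-8 (UTF-8 files without a BOM also end up here); reopen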
br.close(); // also closes fis
fis = new FileInputStream(file); // reopen from the start
br = new BufferedReader(new InputStreamReader(fis, "Cp1252"));
line = br.readLine();
}
// now line contains the first line, and br.readLine() will get the next
Some more information on the UTF-8 Byte Order Mark and detection of encoding at http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
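The same BOM check can also be done on the raw bytes, before any Reader is opened. The sketch below adds a second heuristic that is not part of the answer above: strictly decoding the file as UTF-8 and treating a decoding error as "not UTF-8". Both methods and the class name are hypothetical, and neither is a reliable detector on its own: pure ASCII files pass the strict check, and BOM-less UTF-8 files fail the BOM check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Sniffer {
    // True if the file starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0xEF && in.read() == 0xBB && in.read() == 0xBF;
        }
    }

    // True if the whole file decodes as UTF-8 without errors.
    // Reads the file into memory, so it is only meant for small files.
    static boolean isValidUtf8(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

One possible policy is to read as UTF-8 when either check succeeds and fall back to Cp1252 otherwise.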

Copying and merging two files

I am using Java file streams. I have two files. The first file may or may not be empty. The second file contains strings and floats. If the first file is empty, I want to copy the second file into it; otherwise I want to merge the files.
I have tried RandomAccessFile but it's not working.
If you want to copy a file, use Files.copy():
public static Path copy(Path source,
Path target,
CopyOption... options)
throws IOException
If you want to merge them, open the file you want to append to in append mode:
BufferedWriter bw = new BufferedWriter(new FileWriter("file.txt", true));
and then write the data you have read from the source file to bw.
My solution would look like this:
public void CopyFile(File one, File two) throws IOException {
// Declare the reader and the writer
BufferedReader in = new BufferedReader(new FileReader(one));
BufferedWriter out;
String contentOfFileOne = "";
// Read the content of the first file
while(in.ready()){
contentOfFileOne += in.readLine();
}
// Trim all whitespace (trim() returns a new String, so reassign it)
contentOfFileOne = contentOfFileOne.trim();
// If the first file is empty
if(contentOfFileOne.isEmpty()){
// Create a new Writer to the first file and a reader
// from the second file
in.close();
out = new BufferedWriter(new FileWriter(one));
in = new BufferedReader(new FileReader(two));
while(in.ready()){
String currentLine = in.readLine();
out.write(currentLine);
out.newLine(); // readLine() strips the line terminator
}
// Close them accordingly
in.close();
out.close();
} else {
// If the first file contains something
in.close();
out = new BufferedWriter(new FileWriter(one,true));
in = new BufferedReader(new FileReader(two));
// Copy the content of file two at the end of file one
while(in.ready()){
String currentLine = in.readLine();
out.write(currentLine);
out.newLine(); // readLine() strips the line terminator
}
in.close();
out.close();
}
}
The comments should explain the functionality.
I think this is supposed to be the most efficient option
FileChannel f1 = FileChannel.open(Paths.get("1"), StandardOpenOption.APPEND);
FileChannel f2 = FileChannel.open(Paths.get("2"));
f1.transferFrom(f2, f1.size(), Long.MAX_VALUE);
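For small files, the copy-or-merge decision from the question can also be written with java.nio.file.Files alone. This is only a sketch: it assumes "merge" means appending the second file to the end of the first, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

class CopyOrMerge {
    // If target is missing or empty, copy source over it; otherwise append
    // the bytes of source to the end of target.
    static void copyOrAppend(Path source, Path target) throws IOException {
        if (Files.notExists(target) || Files.size(target) == 0) {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } else {
            // readAllBytes loads the whole source into memory.
            Files.write(target, Files.readAllBytes(source), StandardOpenOption.APPEND);
        }
    }
}
```

Working on bytes rather than characters sidesteps encoding questions entirely, as long as both files already use the same encoding.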

Character corruption going from BufferedReader to BufferedWriter in java

In Java, I am trying to parse an HTML file that contains complex text such as greek symbols.
I encounter a known problem when text contains a left facing quotation mark. Text such as
mutations to particular “hotspot” regions
becomes
mutations to particular “hotspot�? regions
I have isolated the problem by writing a simple text copy method:
public static int CopyFile()
{
try
{
StringBuffer sb = null;
String NullSpace = System.getProperty("line.separator");
Writer output = new BufferedWriter(new FileWriter(outputFile));
String line;
BufferedReader input = new BufferedReader(new FileReader(myFile));
while((line = input.readLine())!=null)
{
sb = new StringBuffer();
//Parsing would happen
sb.append(line);
output.write(sb.toString()+NullSpace);
}
return 0;
}
catch (Exception e)
{
return 1;
}
}
Can anybody offer some advice as how to correct this problem?
★My solution
InputStream in = new FileInputStream(myFile);
Reader reader = new InputStreamReader(in,"utf-8");
Reader buffer = new BufferedReader(reader);
Writer output = new BufferedWriter(new FileWriter(outputFile));
int r;
// read through the BufferedReader; reading from reader directly bypasses the buffer
while ((r = buffer.read()) != -1)
{
if (r<126)
{
output.write(r);
}
else
{
output.write("&#"+Integer.toString(r)+";");
}
}
output.flush();
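One caveat with the loop above: read() returns UTF-16 code units, so a character outside the Basic Multilingual Plane would be written as two surrogate references rather than one. A possible variation, sketched here with a hypothetical helper name, escapes by code point on a String that has already been decoded:

```java
class HtmlEscaper {
    // Replaces every non-ASCII code point with a decimal numeric character
    // reference such as &#8220; and leaves ASCII untouched.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.append((char) cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }
}
```

Since the result is pure ASCII, it survives being written with almost any output encoding.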
The file read is not in the same encoding (probably UTF-8) as the file written (probably ISO-8859-1).
Try the following to generate a file with UTF-8 encoding:
BufferedWriter output = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile),"UTF8"));
Unfortunately, determining the encoding of a file is very difficult. See Java : How to determine the correct charset encoding of a stream
In addition to what Thierry-Dimitri Roy wrote, if you know the encoding you have to create your FileReader with a bit of extra work. From the docs:
Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
The Javadoc for FileReader says:
The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
In your case the default character encoding is probably not appropriate. Find what encoding the input file uses, and specify it. For example:
FileInputStream fis = new FileInputStream(myFile);
InputStreamReader isr = new InputStreamReader(fis, "charset name goes here");
BufferedReader input = new BufferedReader(isr);
