Remove escape characters from String loaded from file - java

I am using the method below to load a string from a file into a variable.
private static String readFile(String path) throws IOException {
FileInputStream stream = new FileInputStream(new File(path));
try {
FileChannel fc = stream.getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
/* Instead of using default, pass in a decoder. */
return Charset.defaultCharset().decode(bb).toString();
}
finally {
stream.close();
}
}
The problem is that my variable has escape characters in it. I want my variables to contain:
some string
but instead it looks like:
some string&#xd
How can I improve my method to prevent that?

You can use a Reader instead, in particular a BufferedReader, to read lines from a text file:
BufferedReader br = new BufferedReader(new FileReader(path));
String line = br.readLine(); // this strips line termination characters for you
If you want to read the whole file, there are lots of utility classes that provide this functionality (like Google Guava):
String contents = Files.toString(new File(path), charset);
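If the stray character is a carriage return (`&#xd;` is the character reference for `\r`, which would suggest the file ends with a Windows line ending; this is an assumption, since the question does not show the file's bytes), one option is to read the whole file and strip the trailing line terminators yourself. A minimal sketch; the names `FileText`, `readTrimmed` and `stripTrailingLineBreaks` are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class FileText {

    // Removes any run of '\r' and '\n' at the end of the text; the rest is untouched.
    static String stripTrailingLineBreaks(String text) {
        int end = text.length();
        while (end > 0 && (text.charAt(end - 1) == '\r' || text.charAt(end - 1) == '\n')) {
            end--;
        }
        return text.substring(0, end);
    }

    // Reads the whole file with an explicit charset, then drops trailing line terminators.
    static String readTrimmed(String path, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return stripTrailingLineBreaks(new String(bytes, charset));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Brackets make any remaining whitespace visible.
            System.out.println("[" + readTrimmed(args[0], StandardCharsets.UTF_8) + "]");
        }
    }
}
```

Line breaks in the middle of the text are kept, so this only helps when the unwanted character sits at the end of the file.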

I would think that there are some hidden characters in your .txt file.
You could try:
return Charset.defaultCharset()
.newDecoder()
.onMalformedInput(CodingErrorAction.IGNORE)
.onUnmappableCharacter(CodingErrorAction.IGNORE)
.decode(bb)
.toString();

Related

Read UTF-8 properties file and save as UTF-8 txt file

I am currently trying to analyze all of my properties files, and for one part I need them in the form of .txt files. The problem is that German umlauts such as Ä, Ü and Ö are not carried over correctly, so my program does not work. (If I convert the files to txt manually there are no problems, but the whole thing should run dynamically.)
Here is my code I am currently using:
private static void createTxt(String filePath, String savePath) throws IOException {
final File file = new File(filePath);
final BufferedReader bReader = new BufferedReader(new FileReader(file.getPath()));
final List<String> stringList= new ArrayList<>();
String line = bReader.readLine();
while (line != null) {
stringList.add(line);
line = bReader.readLine();
}
final Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(savePath), "UTF-8"));
try {
for (final String s : stringList) {
out.write(s + "\n");
}
}
finally {
out.close();
}
}
The encoding of the txt is also UTF-8 - I think the problem is due to the bufferedReader or caching into the ArrayList
Thank you for your time and help,
LG Pascal
When reading and writing files you should always specify a charset. FileReader has a constructor that takes a Charset (since Java 11).
new FileReader(file, StandardCharsets.UTF_8)
If you just want to read all lines from a file just use Files.readAllLines(path, StandardCharsets.UTF_8);
To write you can use Files.write(path, listOfStrings, StandardCharsets.UTF_8);
And if you only want to copy the files, just use Files.copy(source, target);
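Putting those calls together, the conversion could be sketched roughly as follows. It assumes the .properties files really are UTF-8 encoded, as the question title says; the class name `PropertiesToTxt` and the method `copyAsUtf8` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class PropertiesToTxt {

    // Reads every line as UTF-8 and writes the lines back out as UTF-8.
    // Files.write terminates each line with the platform line separator.
    static void copyAsUtf8(Path source, Path target) throws IOException {
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            copyAsUtf8(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```

Because the charset is named on both the read and the write side, the platform default encoding no longer matters.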

Copying and merging two files

I am using Java file streams. I have two files. The first file may or may not be empty. The second file contains strings and floats. If the first file is empty, I want to copy the second file into it; otherwise I want to merge the files.
I have tried RandomAccessFile but it's not working.
If you want to copy a file then use
public static Path copy(Path source,
Path target,
CopyOption... options)
throws IOException
Files.copy()
If you want to merge them, open the file you want to append to in append mode:
BufferedWriter bw = new BufferedWriter(new FileWriter("file.txt", true));
and then write the data you have read from the source file to bw.
My solution would look like this:
public void CopyFile(File one, File two) throws IOException {
// Declare the reader and the writer
BufferedReader in = new BufferedReader(new FileReader(one));
BufferedWriter out;
String contentOfFileOne = "";
// Read the content of the first file
while(in.ready()){
contentOfFileOne += in.readLine();
}
// Trim all whitespaces
contentOfFileOne = contentOfFileOne.trim();
// If the first file is empty
if(contentOfFileOne.isEmpty()){
// Create a new Writer to the first file and a reader
// from the second file
in.close();
out = new BufferedWriter(new FileWriter(one));
in = new BufferedReader(new FileReader(two));
while(in.ready()){
String currentLine = in.readLine();
out.write(currentLine);
out.newLine(); // readLine() strips the line terminator, so add it back
}
// Close them accordingly
in.close();
out.close();
} else {
// If the first file contains something
in.close();
out = new BufferedWriter(new FileWriter(one,true));
in = new BufferedReader(new FileReader(two));
// Copy the content of file two at the end of file one
while(in.ready()){
String currentLine = in.readLine();
out.write(currentLine);
out.newLine(); // readLine() strips the line terminator, so add it back
}
in.close();
out.close();
}
}
The comments should explain the functionality.
I think this is supposed to be the most efficient option
FileChannel f1 = FileChannel.open(Paths.get("1"), StandardOpenOption.APPEND);
FileChannel f2 = FileChannel.open(Paths.get("2"));
f1.transferFrom(f2, f1.size(), Long.MAX_VALUE);
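For comparison, the copy-or-append decision can also be sketched with java.nio.file.Files alone. This assumes "merge" simply means appending the bytes of the second file to the first, which the question does not spell out; `CopyOrMerge` and `copyOrAppend` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

class CopyOrMerge {

    // If 'first' is missing or empty, replace it with a copy of 'second';
    // otherwise append the bytes of 'second' to the end of 'first'.
    static void copyOrAppend(Path first, Path second) throws IOException {
        if (Files.notExists(first) || Files.size(first) == 0) {
            Files.copy(second, first, StandardCopyOption.REPLACE_EXISTING);
        } else {
            try (OutputStream out = Files.newOutputStream(
                    first, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                Files.copy(second, out); // streams all bytes of 'second' into 'out'
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            copyOrAppend(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```

Working on bytes rather than lines avoids any charset or line-terminator handling, at the cost of not being able to inspect the content while merging.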

How to modify text of a text-file which is read using FileInputStream

I have to use a method whose signature is like this
aMethod(FileInputStream);
I call that method like this
FileInputStream inputStream = new FileInputStream(someTextFile);
aMethod(inputStream);
I want to remove or edit some characters read from someTextFile before the stream is passed to aMethod(inputStream).
I cannot change aMethod's signature or overload it, and it only takes an InputStream.
If the method took a String as a parameter, I wouldn't be asking this question.
I am an InputStream noob. Please advise.
You can convert a string into an input stream:
String str = "Converted stuff from reading the other inputfile and modifying it";
InputStream is = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));
Here is something that might help. It will grab your .txt file. Then it will load it and go through line by line. You have to fill in the commented areas to do what you want.
public void parseFile() {
String inputLine;
String filename = "YOURFILE.txt";
Thread thisThread = Thread.currentThread();
ClassLoader loader = thisThread.getContextClassLoader();
InputStream is = loader.getResourceAsStream(filename);
try {
FileWriter fstream = new FileWriter("path/to/NEWFILE.txt");
BufferedWriter out = new BufferedWriter(fstream);
BufferedReader reader = new BufferedReader(
new InputStreamReader(is));
while((inputLine = reader.readLine()) != null) {
String[] str = inputLine.split("\t");
if(/* IF WHAT YOU WANT IS IN THE FILE ADD IT */) {
// DO SOMETHING OR ADD WHAT YOU WANT
out.append(inputLine); // str is a String[], which a Writer cannot append directly
out.newLine();
}
}
reader.close();
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
Have you looked at another class FilterInputStream which also extends InputStream which may fit into your requirement?
From the documentation for the class
A FilterInputStream contains some other input stream, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Also have a look at this question which also seems to be similar to your question.
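To make the FilterInputStream idea concrete, here is a minimal sketch of a filter that drops every occurrence of one byte value (for example '\r'). It only helps if aMethod accepts a plain InputStream, since a filter is not a FileInputStream. The class name `DropByteInputStream` is hypothetical, and skip() and available() are left as inherited, so their results may count dropped bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical filter that silently drops every occurrence of one byte value.
class DropByteInputStream extends FilterInputStream {
    private final int unwanted;

    DropByteInputStream(InputStream in, byte unwanted) {
        super(in);
        this.unwanted = unwanted & 0xFF;
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();
        } while (b == unwanted); // -1 (end of stream) never equals a value in 0..255
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream
            }
            int kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if ((b & 0xFF) != unwanted) {
                    buf[off + kept++] = b; // compact the kept bytes to the front
                }
            }
            if (kept > 0) {
                return kept;
            }
            // Every byte in this chunk was dropped: read again rather than return 0.
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\r\nb\r\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new DropByteInputStream(new ByteArrayInputStream(data), (byte) '\r')) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            System.out.println(sb.toString().equals("a\nb\n")); // prints "true"
        }
    }
}
```

Filtering on a single byte is safe for ASCII characters in UTF-8 text, because ASCII byte values never occur inside a multi-byte UTF-8 sequence. Editing anything more complex is usually easier with the String and ByteArrayInputStream approach shown earlier.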

Check line for unprintable characters while reading text file

My program must read text files - line by line.
Files in UTF-8.
I am not sure the files are correct; they may contain unprintable characters.
Is it possible to check for this without going down to the byte level?
Thanks.
Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.
E.g. (without error checking), using try-with-resources (available since Java 7):
String line;
try (
InputStream fis = new FileInputStream("the_file_name");
InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
BufferedReader br = new BufferedReader(isr);
) {
while ((line = br.readLine()) != null) {
// Deal with the line
}
}
While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
You can then do whatever you like with those lines.
EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.
Just found out that with the Java NIO (java.nio.file.*) you can easily write:
List<String> lines=Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
for(String line:lines){
System.out.println(line);
}
instead of dealing with FileInputStreams and BufferedReaders...
If you want to check a string has unprintable characters you can use a regular expression
[^\p{Print}]
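One caveat when applying that expression in Java: `\p{Print}` is a POSIX class that covers printable ASCII only, so umlauts and other non-ASCII characters, as well as tabs, also count as "unprintable". A small sketch of both that check and one possible alternative follows; the class and method names are hypothetical, and treating "unprintable" as "ISO control character other than tab" is an assumption, not something the question defines.

```java
import java.util.regex.Pattern;

class PrintableCheck {

    // Matches anything outside printable ASCII, which includes every non-ASCII character.
    private static final Pattern NON_PRINTABLE_ASCII = Pattern.compile("[^\\p{Print}]");

    static boolean hasNonPrintableAscii(String line) {
        return NON_PRINTABLE_ASCII.matcher(line).find();
    }

    // Alternative assumption: "unprintable" means an ISO control character other than tab.
    static boolean hasControlChars(String line) {
        return line.codePoints().anyMatch(cp -> Character.isISOControl(cp) && cp != '\t');
    }

    public static void main(String[] args) {
        String line = "Gr\u00FC\u00DFe\tfrom a file";
        System.out.println(hasNonPrintableAscii(line)); // true: non-ASCII letters and a tab
        System.out.println(hasControlChars(line));      // false: the tab is explicitly allowed
    }
}
```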
How about below:
FileReader fileReader = new FileReader(new File("test.txt"));
BufferedReader br = new BufferedReader(fileReader);
String line = null;
// if no more lines the readLine() returns null
while ((line = br.readLine()) != null) {
// reading lines until the end of the file
}
Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html
I found the following ways to do it:
private static final String fileName = "C:/Input.txt";
public static void main(String[] args) throws IOException {
Stream<String> lines = Files.lines(Paths.get(fileName));
lines.toArray(String[]::new);
List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
readAllLines.forEach(s -> System.out.println(s));
File file = new File(fileName);
Scanner scanner = new Scanner(file);
while (scanner.hasNext()) {
System.out.println(scanner.next());
}
}
The answer by @T.J.Crowder is for Java 6. In Java 7 the valid answer is the one by @McIntosh, though its use of Charset.forName for UTF-8 is discouraged:
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"),
StandardCharsets.UTF_8);
for(String line: lines){ /* DO */ }
This closely resembles the Guava approach posted by Skeet above, and the same caveats apply. For big files, read line by line instead (Java 7):
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
for (String line = reader.readLine(); line != null; line = reader.readLine()) {}
If every char in the file is properly encoded in UTF-8, you won't have any problem reading it using a reader with the UTF-8 encoding. Up to you to check every char of the file and see if you consider it printable or not.
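If you are not sure the bytes are well-formed UTF-8 in the first place, a strict decoder can tell you before you look at individual characters. A minimal sketch, with the hypothetical names `Utf8Check` and `isValidUtf8`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes)); // throws instead of substituting U+FFFD
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8("gr\u00FC\u00DFe".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0x28}));             // false
    }
}
```

This checks the whole byte array at once, so for very large files it would need to be applied in a streaming fashion instead.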

How can I read a .txt file into a single Java string while maintaining line breaks?

Virtually every code example out there reads a TXT file line-by-line and stores it in a String array. I do not want line-by-line processing because I think it's an unnecessary waste of resources for my requirements: All I want to do is quickly and efficiently dump the .txt contents into a single String. The method below does the job, however with one drawback:
private static String readFileAsString(String filePath) throws java.io.IOException{
byte[] buffer = new byte[(int) new File(filePath).length()];
BufferedInputStream f = null;
try {
f = new BufferedInputStream(new FileInputStream(filePath));
f.read(buffer);
if (f != null) try { f.close(); } catch (IOException ignored) { }
} catch (IOException ignored) { System.out.println("File not found or invalid path.");}
return new String(buffer);
}
... the drawback is that the line breaks are converted into long spaces e.g. " ".
I want the line breaks to be converted from \n or \r to <br> (HTML tag) instead.
Thank you in advance.
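What about using a Scanner and adding the line breaks yourself:
StringBuilder buf = new StringBuilder();
// Scanner(String) would scan the literal text "sample.txt", so wrap the name in a File
java.util.Scanner sc = new java.util.Scanner(new java.io.File("sample.txt"));
while (sc.hasNextLine()) {
buf.append(sc.nextLine());
buf.append("<br />");
}
sc.close();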
I don't see where you get your long spaces from.
You can read directly into the buffer and then create a String from the buffer:
File f = new File(filePath);
FileInputStream fin = new FileInputStream(f);
byte[] buffer = new byte[(int) f.length()];
new DataInputStream(fin).readFully(buffer);
fin.close();
String s = new String(buffer, "UTF-8");
You could add this code:
return new String(buffer).replaceAll("(\r\n|\r|\n|\n\r)", "<br>");
Is this what you are looking for?
The code will read the file contents as they appear in the file - including line breaks.
If you want to change the breaks into something else, for example to display the text in HTML, you will either need to post-process the string or read the file line by line. Since you do not want the latter, you can replace your return statement with the following, which should do the conversion (it handles \r\n, a lone \r and a lone \n):
return (new String(buffer)).replaceAll("\r\n?|\n", "<br>");
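As a self-contained variant of the same idea, the whole read-and-convert step might be sketched like this. UTF-8 is an assumption here, because the question does not state the file's encoding; `LineBreaksToHtml`, `toHtmlBreaks` and `readAsHtml` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class LineBreaksToHtml {

    // Replaces \r\n, a lone \r and a lone \n with "<br>".
    // \r\n is listed first so a Windows line ending yields a single tag.
    static String toHtmlBreaks(String text) {
        return text.replaceAll("\r\n|\r|\n", "<br>");
    }

    // Reads the whole file in one call, without line-by-line processing.
    static String readAsHtml(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        return toHtmlBreaks(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(readAsHtml(args[0]));
        }
    }
}
```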
StringBuilder sb = new StringBuilder();
try {
InputStream is = getAssets().open("myfile.txt");
byte[] bytes = new byte[1024];
int numRead = 0;
try {
while((numRead = is.read(bytes)) != -1)
sb.append(new String(bytes, 0, numRead));
}
catch(IOException e) {
}
is.close();
}
catch(IOException e) {
}
The resulting String is then: String result = sb.toString();
You can then replace whatever you want in this result.
I agree with the general approach by @Sanket Patel, but using Commons IO you would likely want FileUtils.
So your code would look like:
String myString = FileUtils.readFileToString(new File(filePath));
There is also another version to specify an alternate character encoding.
You should try org.apache.commons.io.IOUtils.toString(InputStream is) to get the file content as a String. You can pass it the InputStream object that you get from
getAssets().open("xml2json.txt") // belongs to Android; returns an InputStream
in your Activity. To get the String, use this:
String xml = IOUtils.toString(getAssets().open("xml2json.txt"));
So, in general:
String xml = IOUtils.toString(/* pass your InputStream object here */);
