I am trying to read a txt file (with text inside) line by line, and then process the lines later.
Here is my code:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Fypio {
    public static void main(String args[]) {
        String fileName = "e://input.txt";

        //read file into stream, try-with-resources
        try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
            stream.forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
However, I get the following error, even though I am definitely sure that the directory is correct.
Error:
Exception in thread "main" java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1
at java.io.BufferedReader$1.hasNext(BufferedReader.java:574)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at fypio.Fypio.main(Fypio.java:21)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at java.io.BufferedReader$1.hasNext(BufferedReader.java:571)
... 4 more
Or can any sample code be provided to read a txt file line by line?
Update: my txt file should be encoded as ANSI.
MalformedInputException means your text file is not in the charset (encoding) you requested.
Although your code does not explicitly specify a charset, the Files.lines method always uses UTF-8:
Read all lines from a file as a Stream. Bytes from the file are decoded into characters using the UTF-8 charset.
Since your text file is not a UTF-8 text file, you’ll need to specify its charset in your code. If you aren’t sure, the file probably uses the system’s default charset:
try (Stream<String> stream = Files.lines(Paths.get(fileName), Charset.defaultCharset())) {
Update:
You have stated in a comment that your text file is “ANSI,” which is the (technically incorrect) name Windows uses for its one-byte charsets. On a US version of Windows, you’d probably want to use:
try (Stream<String> stream = Files.lines(Paths.get(fileName), Charset.forName("windows-1252"))) {
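For completeness, here is a minimal, self-contained sketch of the full program with an explicit charset, assuming the file really is in the Windows-1252 "ANSI" encoding mentioned in the update (swap in Charset.defaultCharset() if you are unsure):

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReadAnsiFile {
    public static void main(String[] args) {
        String fileName = "e://input.txt"; // path taken from the question

        // Decode with the Windows "ANSI" charset instead of the UTF-8 default of Files.lines
        try (Stream<String> stream = Files.lines(Paths.get(fileName), Charset.forName("windows-1252"))) {
            stream.forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}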
Related
I'm reading a file line by line, and I want to be able to overwrite a line when it fits my specific parameters (in my case, when it begins with a certain word).
My current code:
try {
    FileInputStream fis = new FileInputStream(myFile);
    DataInputStream in = new DataInputStream(fis);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
        if (line.startsWith("word")) {
            // replace line code here
        }
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
...where myFile is a File object.
As always, any help, examples, or suggestions are much appreciated.
Thanks!
RandomAccessFile seems a good fit. Its javadoc says:
Instances of this class support both reading and writing to a random access file. A random access file behaves like a large array of bytes stored in the file system. There is a kind of cursor, or index into the implied array, called the file pointer; input operations read bytes starting at the file pointer and advance the file pointer past the bytes read. If the random access file is created in read/write mode, then output operations are also available; output operations write bytes starting at the file pointer and advance the file pointer past the bytes written. Output operations that write past the current end of the implied array cause the array to be extended. The file pointer can be read by the getFilePointer method and set by the seek method.
That said, since text files are a sequential file format, you cannot replace a line with a line of a different length without moving all subsequent characters around, so replacing lines will in general amount to reading and writing the entire file. This may be easier to accomplish if you write to a separate file and rename the output file once you are done. This is also more robust in case something goes wrong, as one can simply retry with the contents of the initial file. The only advantage of RandomAccessFile is that you do not need the disk space for the temporary output file, and may get slightly better performance out of the disk due to better access locality.
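Here is a minimal sketch of the separate-file-and-rename approach described above. The trigger word "word" comes from the question; the replacement text and the UTF-8 charset are assumptions for illustration only:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LineReplacer {

    // Rewrites 'source' line by line; lines starting with "word" are replaced.
    static void replaceLines(Path source) throws IOException {
        Path temp = source.resolveSibling(source.getFileName() + ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("word")) {
                    line = "replacement text"; // hypothetical replacement
                }
                writer.write(line);
                writer.newLine();
            }
        }
        // Swap the rewritten file in only after the whole rewrite succeeded
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }
}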
Your best bet here is likely going to be reading the file into memory (something like a StringBuilder) and writing what you want your output file to look like into the StringBuilder. After you're done reading the file completely, write the contents of the StringBuilder to the file.
If the file is too large to hold in memory, you can always read the contents of the file line by line and write them to a temporary file instead of a StringBuilder. After that is done, you can delete the old file and move the temporary one into its place.
An old question, but I recently worked on this. Sharing my experience:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static void updateFile(Path file) {
    // Get all the lines
    try (Stream<String> stream = Files.lines(file, StandardCharsets.UTF_8)) {
        // Do the replace operation
        List<String> list = stream.map(line -> line.replaceAll("test", "new")).collect(Collectors.toList());
        // Write the content back
        Files.write(file, list, StandardCharsets.UTF_8);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
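Given the myFile File object from the question above, this could be invoked, for example, as:

updateFile(myFile.toPath());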
Related
I got my last question marked as a duplicate of Which encoding does Process.getInputStream() use?, but that's not actually what I'm asking. In my second example, UTF-8 can successfully parse the special character. However, when the special character is read from the process input stream, it can no longer be parsed correctly as UTF-8. Why does this happen, and does that mean ISO_8859_1 is the only option I can choose?
I'm working on a plugin which retrieves an Azure Key Vault secret at runtime. However, there's an encoding issue. I stored a string containing the special character ç; the string is as follows: HrIaMFBc78!?%$timodagetwiçç99. However, with the following program, the special character ç cannot be parsed correctly:
package com.buildingblocks.azure.cli;

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Test {
    static String decodeText(String command) throws IOException, InterruptedException {
        Process p;
        StringBuilder output = new StringBuilder();
        p = Runtime.getRuntime().exec("cmd.exe /c \"" + command + "\"");
        p.waitFor();
        InputStream stream;
        if (p.exitValue() != 0) {
            stream = p.getErrorStream();
        } else {
            stream = p.getInputStream();
        }
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        String line = "";
        while ((line = reader.readLine()) != null) {
            output.append(line + "\n");
        }
        return output.toString();
    }

    public static void main(String[] arg) throws IOException, InterruptedException {
        System.out.println(decodeText("az keyvault secret show --name \"test-password\" --vault-name \"test-keyvault\""));
    }
}
The output is: "value": "HrIaMFBc78!?%$timodagetwi��99"
If I use the following program to parse the String, the special character ç can be parsed successfully.
package com.buildingblocks.azure.cli;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Test {
    static String decodeText(String input, String encoding) throws IOException {
        return new BufferedReader(
                new InputStreamReader(
                        new ByteArrayInputStream(input.getBytes()),
                        Charset.forName(encoding)))
                .readLine();
    }

    public static void main(String[] arg) throws IOException {
        System.out.println(decodeText("HrIaMFBc78!?%$timodagetwiçç99", StandardCharsets.UTF_8.toString()));
    }
}
Both use a BufferedReader with the same setup, but the one parsing the output from the process fails. Does anybody know the reason for this?
You are reading with UTF-8:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(stream, StandardCharsets.UTF_8));
Your second example writes the String as UTF-8, so it can be read by the code shown above and works well.
But your first example executes cmd.exe (so Windows) and fetches the stream data returned by the OS.
On Windows the default charset is normally CP1252, which is not UTF-8.
You could either set up the default character encoding for Windows to be UTF-8 (see
Save text file in UTF-8 encoding using cmd.exe for a how-to),
or just use the system encoding of your OS (on Windows normally CP1252) when creating your InputStreamReader (instead of StandardCharsets.UTF_8).
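As a sketch of that second option, only the reader construction from the question needs to change (this assumes the command output really is in the platform default charset; it also needs import java.nio.charset.Charset):

// Decode the process output with the OS default charset instead of UTF-8
BufferedReader reader = new BufferedReader(
        new InputStreamReader(stream, Charset.defaultCharset()));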
The ç has two bytes in UTF-8 encoding, so two of them would be four bytes. The two placeholder characters � suggest that only two bytes were there. In ISO 8859-1 encoding, a ç is one byte, so this suggests that the encoding was not UTF-8, but may have been ISO 8859-1.
The InputStream does not use any encoding, it just transfers the bytes. The encoding is used in the InputStreamReader.
A hex-dump of the input might be useful. Alternatively, you can try to interpose a script between the Java program and the program you want to call, and analyse the situation there. Or just try with ISO 8859-1 instead.
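If you want to try the hex-dump route, a small diagnostic sketch along these lines could help (the echoed string is only a placeholder; the raw bytes are printed before any charset is applied, so you can see what the process actually sends):

import java.io.IOException;
import java.io.InputStream;

public class HexDump {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec("cmd.exe /c echo çç");
        try (InputStream in = p.getInputStream()) {
            int b;
            while ((b = in.read()) != -1) {
                // Print each raw byte in hex so the actual encoding can be identified
                System.out.printf("%02X ", b);
            }
        }
        System.out.println();
        p.waitFor();
    }
}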
The charset you select in Java should match the encoding used by the command you execute. It's not UTF-8, and is probably ISO-8859-1. Because the encoding used by the command is likely to default to something different on different machines, you might try setting it explicitly to a known value before executing your command:
chcp 65001 && <command>
Or, in your context:
Runtime.getRuntime().exec("cmd.exe /c \"chcp 65001 && " + command + "\"");
Windows code page 65001 is UTF-8.
Note that failing to consume the output of the subprocess can cause it to block and never terminate, so your waitFor() may block because you consume the output only afterward. The standard output of the process may have a large enough buffer to complete, but if there is output to standard error, it is more likely to block. An alternative is to redirect standard error to the stderr of the parent Java process.
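A sketch of that combination using ProcessBuilder: stderr is merged into stdout, the output is consumed before waitFor(), and chcp 65001 switches the console to UTF-8 so it matches the reader's charset (the >nul just hides chcp's own message; treat this as an illustration, not a drop-in replacement):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class RunCommand {
    static String run(String command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("cmd.exe", "/c", "chcp 65001 >nul && " + command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n'); // consume output *before* waitFor()
            }
        }
        p.waitFor();
        return output.toString();
    }
}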
The CMD.EXE you launch with ProcessBuilder / Runtime.getRuntime will be sending a stream in the default platform charset. This is not necessarily UTF-8, or the same as your JVM default charset (as you may have changed that with the system property -Dfile.encoding=XYZ).
You may be able to determine the charset of the CMD.EXE stream for use in your first method by running the JVM without extra parameters and seeing what value of file.encoding is printed:
C:\> java -XshowSettings:properties
Property settings:
...
file.encoding = Cp1252 (or whatever)
Related
There is something wrong with GZIPInputStream or GZIPOutputStream. Just please read the following code (or run it and see what happens):
def main(a: Array[String]) {
  val name = "test.dat"
  new GZIPOutputStream(new FileOutputStream(name)).write(10)
  println(new GZIPInputStream(new FileInputStream(name)).read())
}
It creates a file test.dat, writes a single byte 10 in GZIP format, and then reads the byte back from the same file in the same format.
And this is what I got running it:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
at java.util.zip.GZIPInputStream.read(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
at nbt.Test$.main(Test.scala:13)
at nbt.Test.main(Test.scala)
The reading line seems to go wrong for some reason.
I googled the error Unexpected end of ZLIB input stream and found some bug reports filed with Oracle around 2007-2010. So I guess the bug still remains in some way, but I'm not sure if my code is right, so let me post this here and hear your advice. Thank you!
You have to call close() on the GZIPOutputStream before you attempt to read it. The final bytes of the file will only be written when the stream object is actually closed.
(This is irrespective of any explicit buffering in the output stack. The stream only knows to compress and write the last bytes when you tell it to close. A flush() won't help ... though calling finish() instead of close() should work. Look at the javadocs.)
Here's the correct code (in Java):
package test;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZipTest {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        String name = "/tmp/test";
        GZIPOutputStream gz = new GZIPOutputStream(new FileOutputStream(name));
        gz.write(10);
        gz.close(); // Remove this to reproduce the reported bug
        System.out.println(new GZIPInputStream(new FileInputStream(name)).read());
    }
}
(I've not implemented resource management or exception handling / reporting properly as they are not relevant to the purpose of this code. Don't treat this as an example of "good code".)
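As a small follow-up, the same fix can be written with try-with-resources, which guarantees close() runs (and therefore the GZIP trailer is written) even if an exception is thrown:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZipTryWithResources {
    public static void main(String[] args) throws IOException {
        String name = "/tmp/test";
        // close() runs automatically at the end of the block, flushing the GZIP trailer
        try (GZIPOutputStream gz = new GZIPOutputStream(new FileOutputStream(name))) {
            gz.write(10);
        }
        try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(name))) {
            System.out.println(in.read()); // prints 10
        }
    }
}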