To get the content of a .txt file, I usually use a Scanner and iterate over each line:
Scanner sc = new Scanner(new File("file.txt"));
while(sc.hasNextLine()){
String str = sc.nextLine();
}
Does the Java API provide a way to get the content in one line of code, like:
String content = FileUtils.readFileToString(new File("file.txt"));
Not the built-in API - but Guava does, amongst its other treasures. (It's a fabulous library.)
String content = Files.toString(new File("file.txt"), Charsets.UTF_8);
There are similar methods for reading any Readable, or loading the entire contents of a binary file as a byte array, or reading a file into a list of strings, etc.
Note that this method is now deprecated. The new equivalent is:
String content = Files.asCharSource(new File("file.txt"), Charsets.UTF_8).read();
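For the other cases mentioned above, a sketch assuming an older Guava version (these convenience methods exist there, though newer Guava prefers the asByteSource/asCharSource style):
byte[] data = Files.toByteArray(new File("file.bin")); // whole binary file as a byte array
List<String> lines = Files.readLines(new File("file.txt"), Charsets.UTF_8); // file as a list of strings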
With Java 7 there is an API along those lines.
Files.readAllLines(Path path, Charset cs)
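If you want a single String rather than a List, you can join the lines yourself (String.join requires Java 8; the "\n" separator is an assumption about what you want):
List<String> lines = Files.readAllLines(Paths.get("file.txt"), StandardCharsets.UTF_8);
String content = String.join("\n", lines);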
commons-io has:
IOUtils.toString(new FileInputStream("file.txt"), "UTF-8");
(Note: the encoding-taking overload works on an InputStream; a FileReader would already have decoded the bytes with the platform default charset.)
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public static void main(String[] args) throws IOException {
String content = Files.readString(Paths.get("foo"));
}
From https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Files.html#readString(java.nio.file.Path)
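If you need a charset other than the UTF-8 default, there is also an overload taking an explicit java.nio.charset.Charset (Java 11+):
String content = Files.readString(Paths.get("foo"), StandardCharsets.ISO_8859_1);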
You could use the FileReader class together with the BufferedReader to read the text file.
File fileToRead = new File("file.txt");
try( FileReader fileStream = new FileReader( fileToRead );
BufferedReader bufferedReader = new BufferedReader( fileStream ) ) {
String line = null;
while( (line = bufferedReader.readLine()) != null ) {
//do something with line
}
} catch ( FileNotFoundException ex ) {
//exception Handling
} catch ( IOException ex ) {
//exception Handling
}
After a bit of testing, I found BufferedReader and Scanner both problematic under various circumstances (the former often fails to detect new lines and the latter often strips spaces, for instance, from a JSON string exported by the org.json library). Other methods are available, but they are only supported after certain Java versions (which is bad for an Android developer, for example), and you might not want to pull in Guava or Apache Commons just for a single purpose like this. Hence, my solution is to read the whole file as bytes and convert that to a string. The code below is taken from one of my hobby projects:
/**
* Get byte array from an InputStream most efficiently.
* Taken from sun.misc.IOUtils
* @param is InputStream
* @param length Length of the buffer, -1 to read the whole stream
* @param readAll Whether to read the whole stream
* @return Desired byte array
* @throws IOException If maximum capacity exceeded.
*/
public static byte[] readFully(InputStream is, int length, boolean readAll)
throws IOException {
byte[] output = {};
if (length == -1) length = Integer.MAX_VALUE;
int pos = 0;
while (pos < length) {
int bytesToRead;
if (pos >= output.length) {
bytesToRead = Math.min(length - pos, output.length + 1024);
if (output.length < pos + bytesToRead) {
output = Arrays.copyOf(output, pos + bytesToRead);
}
} else {
bytesToRead = output.length - pos;
}
int cc = is.read(output, pos, bytesToRead);
if (cc < 0) {
if (readAll && length != Integer.MAX_VALUE) {
throw new EOFException("Detect premature EOF");
} else {
if (output.length != pos) {
output = Arrays.copyOf(output, pos);
}
break;
}
}
pos += cc;
}
return output;
}
/**
* Read the full content of a file.
* @param file The file to be read
* @param emptyValue Empty value if no content is found
* @return File content as string
*/
@NonNull
public static String getFileContent(@NonNull File file, @NonNull String emptyValue) {
if (file.isDirectory()) return emptyValue;
try (FileInputStream in = new FileInputStream(file)) { // try-with-resources so the stream is closed
return new String(readFully(in, -1, true), Charset.defaultCharset());
} catch (IOException e) {
e.printStackTrace();
return emptyValue;
}
}
You can simply use getFileContent(file, "") to read the content of a file.
We have some Java code that processes a user-provided file by looping through it with BufferedReader.readLine() to read each line.
The problem is that when the user uploads a file that has extremely long lines, like an arbitrary binary JPG, this can cause out-of-memory issues. Even the first readLine() may not return. We want to reject files with long lines before the OOM happens.
Is there a standard Java idiom to handle this, or do we just change to read() and write our own safe version of readLine()?
You will need to read the file character by character (or chunk by chunk) yourself (via some form of read()), and then form the lines into Strings when you encounter a newline character. This way you can throw an Exception (avoiding the OOM error) if some maximum number of characters is hit before a newline is encountered.
If you use a Reader instance it should not be too difficult to implement this code, just read from the Reader into a buffer (which you allocate to your maximum possible line length), and then convert the buffer to String when you encounter a newline (or throw an exception if you don't).
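A minimal sketch of that idea, assuming a LineTooLongException like the one defined in the next answer; maxLen is whatever limit you choose:
static String readLimitedLine(Reader reader, int maxLen)
        throws IOException, LineTooLongException {
    char[] buf = new char[maxLen];
    int used = 0;
    int c;
    while ((c = reader.read()) != -1 && c != '\n') {
        if (used == maxLen) throw new LineTooLongException(); // reject before buffering more
        buf[used++] = (char) c;
    }
    if (c == -1 && used == 0) return null; // end of stream, no pending line
    return new String(buf, 0, used);
}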
There doesn't appear to be any way to set a line length limit for BufferedReader.readLine(), so it will accumulate the entire line before feeding it to your code, however long that line may be.
Therefore, you'll have to do the line-splitting part yourself, and give up once a line is too long.
You might use the following as a starting point:
class LineTooLongException extends Exception {}
class ShortLineReader implements AutoCloseable {
final Reader reader;
final char[] buf = new char[8192];
int nextIndex = 0;
int maxIndex = 0;
boolean eof;
public ShortLineReader(Reader reader) {
this.reader = reader;
}
public String readLine() throws IOException, LineTooLongException {
if (eof) {
return null;
}
for (;;) {
for (int i = nextIndex; i < maxIndex; i++) {
if (buf[i] == '\n') {
String result = new String(buf, nextIndex, i - nextIndex);
nextIndex = i + 1;
return result;
}
}
if (maxIndex - nextIndex > 6000) {
throw new LineTooLongException();
}
System.arraycopy(buf, nextIndex, buf, 0, maxIndex - nextIndex);
maxIndex -= nextIndex;
nextIndex = 0;
int c = reader.read(buf, maxIndex, buf.length - maxIndex);
if (c == -1) {
eof = true;
return new String(buf, nextIndex, maxIndex - nextIndex);
} else {
maxIndex += c;
}
}
}
@Override
public void close() throws Exception {
reader.close();
}
}
public class Test {
public static void main(String[] args) throws Exception {
File file = new File("D:\\t\\output.log");
// try (OutputStream fos = new BufferedOutputStream(new FileOutputStream(file))) {
// for (int i = 0; i < 10000000; i++) {
// fos.write(65);
// }
// }
try (ShortLineReader r = new ShortLineReader(new FileReader(file))) {
String s;
while ((s = r.readLine()) != null) {
System.out.println(s);
}
}
}
}
Note: This assumes unix-style line termination.
Use BufferedInputStream to read binary data rather than BufferedReader. For example, if it is an image file, you can read it with ImageIO from either a File or an InputStream:
File file = new File("image.gif");
BufferedImage image = ImageIO.read(file);
InputStream is = new BufferedInputStream(new FileInputStream("image.gif"));
image = ImageIO.read(is);
hope it helps...
There doesn't appear to be a definite way but a few things you can do:
Check file headers. jMimeMagic seems to be a pretty good library for this purpose.
Check the type of characters the file contains. Essentially do statistical analysis on the first 'x' bytes of the file and use that to estimate the rest of the content.
Check for newlines '\n' or '\r' in the files; binary files usually won't contain newlines (see the sketch below).
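A rough sketch of that last heuristic (the 4096-byte sample size and the control-byte threshold are arbitrary assumptions; tune to taste):
static boolean looksBinary(File f) throws IOException {
    byte[] head = new byte[4096];
    int n;
    try (InputStream in = new FileInputStream(f)) {
        n = in.read(head); // sample the start of the file
    }
    if (n <= 0) return false; // empty file: treat as text
    boolean sawNewline = false;
    int control = 0;
    for (int i = 0; i < n; i++) {
        int b = head[i] & 0xFF;
        if (b == '\n' || b == '\r') sawNewline = true;
        else if (b < 0x09 || (b > 0x0D && b < 0x20)) control++; // non-text control bytes
    }
    // no newlines at all, or a noticeable share of control bytes, suggests binary content
    return !sawNewline || control > n / 100;
}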
Hope that helps.
I have a java ee application where I use a servlet to print a log file created with log4j. When reading log files you are usually looking for the last log line and therefore the servlet would be much more useful if it printed the log file in reverse order. My actual code is:
response.setContentType("text");
PrintWriter out = response.getWriter();
try {
FileReader logReader = new FileReader("logfile.log");
try {
BufferedReader buffer = new BufferedReader(logReader);
for (String line = buffer.readLine(); line != null; line = buffer.readLine()) {
out.println(line);
}
} finally {
logReader.close();
}
} finally {
out.close();
}
The implementations I've found in the internet involve using a StringBuffer and loading all the file before printing, isn't there a code light way of seeking to the end of the file and reading the content till the start of the file?
[EDIT]
By request, I am prepending this answer with the sentiment of a later comment: If you need this behavior frequently, a "more appropriate" solution is probably to move your logs from text files to database tables with a DB appender (e.g. log4j's JDBCAppender). Then you could simply query for the latest entries.
[/EDIT]
I would probably approach this slightly differently than the answers listed.
(1) Create a subclass of Writer that writes the encoded bytes of each character in reverse order:
public class ReverseOutputStreamWriter extends Writer {
private OutputStream out;
private Charset encoding;
public ReverseOutputStreamWriter(OutputStream out, Charset encoding) {
this.out = out;
this.encoding = encoding;
}
public void write(int ch) throws IOException {
byte[] buffer = this.encoding.encode(String.valueOf((char) ch)).array();
// write the bytes in reverse order to this.out
}
// other overloaded methods
}
(2) Create a subclass of log4j WriterAppender whose createWriter method would be overridden to create an instance of ReverseOutputStreamWriter.
(3) Create a subclass of log4j Layout whose format method returns the log string in reverse character order:
public class ReversePatternLayout extends PatternLayout {
// constructors
public String format(LoggingEvent event) {
return new StringBuilder(super.format(event)).reverse().toString();
}
}
(4) Modify my logging configuration file to send log messages to both the "normal" log file and a "reverse" log file. The "reverse" log file would contain the same log messages as the "normal" log file, but each message would be written backwards. (Note that the encoding of the "reverse" log file would not necessarily conform to UTF-8, or even any character encoding.)
(5) Create a subclass of InputStream that wraps an instance of RandomAccessFile in order to read the bytes of a file in reverse order:
public class ReverseFileInputStream extends InputStream {
private RandomAccessFile in;
private byte[] buffer;
// The index of the next byte to read.
private int bufferIndex;
public ReverseFileInputStream(File file) throws IOException {
this.in = new RandomAccessFile(file, "r");
this.buffer = new byte[4096];
this.bufferIndex = this.buffer.length;
this.in.seek(file.length());
}
public void populateBuffer() throws IOException {
// record the old position
long oldPos = this.in.getFilePointer();
if (oldPos == 0) return; // nothing left to read
// seek to a new, previous position
long newPos = Math.max(oldPos - this.buffer.length, 0);
int len = (int) (oldPos - newPos);
this.in.seek(newPos);
// read from the new position to the old position, into the tail of the buffer
this.in.readFully(this.buffer, this.buffer.length - len, len);
// reverse the bytes just read
for (int i = 0; i < len / 2; i++) {
byte tmp = this.buffer[this.buffer.length - len + i];
this.buffer[this.buffer.length - len + i] = this.buffer[this.buffer.length - 1 - i];
this.buffer[this.buffer.length - 1 - i] = tmp;
}
// leave the file pointer at the new position for the next refill
this.in.seek(newPos);
this.bufferIndex = this.buffer.length - len;
}
public int read() throws IOException {
if (this.bufferIndex == this.buffer.length) {
populateBuffer();
if (this.bufferIndex == this.buffer.length) {
return -1;
}
}
return this.buffer[this.bufferIndex++] & 0xFF; // mask so bytes 128-255 are not returned as negative
}
// other overridden methods
}
Now if I want to read the entries of the "normal" log file in reverse order, I just need to create an instance of ReverseFileInputStream, giving it the "reverse" log file.
This is an old question. I also wanted to do the same thing, and after some searching I found there is a class in Apache commons-io to achieve this:
org.apache.commons.io.input.ReversedLinesFileReader
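Usage is straightforward. A sketch, assuming a commons-io version recent enough to have the (File, Charset) constructor:
try (ReversedLinesFileReader reader =
        new ReversedLinesFileReader(new File("logfile.log"), StandardCharsets.UTF_8)) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line); // lines come out last-to-first
    }
}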
I think a good choice for this would be using the RandomAccessFile class. There is some sample code for back-reading using this class on this page. Reading bytes this way is easy; however, reading strings might be a bit more challenging.
If you are in a hurry and want the simplest solution without worrying too much about performance, I would try using an external process to do the dirty job (given that you are running your app on a Un*x server, as any decent person would do XD):
new BufferedReader(new InputStreamReader(Runtime.getRuntime()
.exec(new String[] { "sh", "-c", "tail -n 50 yourlogfile.txt | rev" })
.getInputStream()))
A simpler alternative, because you say that you're creating a servlet to do this, is to use a LinkedList to hold the last N lines (where N might be a servlet parameter). When the list size exceeds N, you call removeFirst().
From a user experience perspective, this is probably the best solution. As you note, the most recent lines are the most important. Not being overwhelmed with information is also very important.
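A minimal sketch of that idea; N and the out writer are assumed to come from the servlet context:
Deque<String> lastLines = new LinkedList<>();
try (BufferedReader reader = new BufferedReader(new FileReader("logfile.log"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        lastLines.addLast(line);
        if (lastLines.size() > N) {
            lastLines.removeFirst(); // keep only the most recent N lines
        }
    }
}
// print newest first
Iterator<String> it = lastLines.descendingIterator();
while (it.hasNext()) {
    out.println(it.next());
}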
Good question. I'm not aware of any common implementations of this. It's not trivial to do properly either, so be careful what you choose. It should deal with character set encoding and detection of different line break methods. Here's the implementation I have so far that works with ASCII and UTF-8 encoded files, including a test case for UTF-8. It does not work with UTF-16LE or UTF-16BE encoded files.
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import junit.framework.TestCase;
public class ReverseLineReader {
private static final int BUFFER_SIZE = 8192;
private final FileChannel channel;
private final String encoding;
private long filePos;
private ByteBuffer buf;
private int bufPos;
private byte lastLineBreak = '\n';
private ByteArrayOutputStream baos = new ByteArrayOutputStream();
public ReverseLineReader(File file, String encoding) throws IOException {
RandomAccessFile raf = new RandomAccessFile(file, "r");
channel = raf.getChannel();
filePos = raf.length();
this.encoding = encoding;
}
public String readLine() throws IOException {
while (true) {
if (bufPos < 0) {
if (filePos == 0) {
if (baos == null) {
return null;
}
String line = bufToString();
baos = null;
return line;
}
long start = Math.max(filePos - BUFFER_SIZE, 0);
long end = filePos;
long len = end - start;
buf = channel.map(FileChannel.MapMode.READ_ONLY, start, len);
bufPos = (int) len;
filePos = start;
}
while (bufPos-- > 0) {
byte c = buf.get(bufPos);
if (c == '\r' || c == '\n') {
if (c != lastLineBreak) {
lastLineBreak = c;
continue;
}
lastLineBreak = c;
return bufToString();
}
baos.write(c);
}
}
}
private String bufToString() throws UnsupportedEncodingException {
if (baos.size() == 0) {
return "";
}
byte[] bytes = baos.toByteArray();
for (int i = 0; i < bytes.length / 2; i++) {
byte t = bytes[i];
bytes[i] = bytes[bytes.length - i - 1];
bytes[bytes.length - i - 1] = t;
}
baos.reset();
return new String(bytes, encoding);
}
public static void main(String[] args) throws IOException {
File file = new File("my.log");
ReverseLineReader reader = new ReverseLineReader(file, "UTF-8");
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
}
public static class ReverseLineReaderTest extends TestCase {
public void test() throws IOException {
File file = new File("utf8test.log");
String encoding = "UTF-8";
FileInputStream fileIn = new FileInputStream(file);
Reader fileReader = new InputStreamReader(fileIn, encoding);
BufferedReader bufReader = new BufferedReader(fileReader);
List<String> lines = new ArrayList<String>();
String line;
while ((line = bufReader.readLine()) != null) {
lines.add(line);
}
Collections.reverse(lines);
ReverseLineReader reader = new ReverseLineReader(file, encoding);
int pos = 0;
while ((line = reader.readLine()) != null) {
assertEquals(lines.get(pos++), line);
}
assertEquals(lines.size(), pos);
}
}
}
You can use RandomAccessFile to implement this function, such as:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import com.google.common.io.LineProcessor;
public class FileUtils {
/**
* Read a text file (UTF-8) backwards; lines are separated by \r\n
*
* @param <T>
* @param file
* @param step step size for the backward seek
* @param lineprocessor
* @throws IOException
*/
public static <T> T backWardsRead(File file, int step,
LineProcessor<T> lineprocessor) throws IOException {
RandomAccessFile rf = new RandomAccessFile(file, "r");
long fileLen = rf.length();
long pos = fileLen - step;
// look for the first line, in reverse order: \r
while (true) {
if (pos < 0) {
// handle the first line
rf.seek(0);
lineprocessor.processLine(rf.readLine());
return lineprocessor.getResult();
}
rf.seek(pos);
char c = (char) rf.readByte();
while (c != '\r') {
c = (char) rf.readByte();
}
rf.readByte();//read '\n'
pos = rf.getFilePointer();
if (!lineprocessor.processLine(rf.readLine())) {
return lineprocessor.getResult();
}
pos -= step;
}
}
use:
FileUtils.backWardsRead(new File("H:/usersfavs.csv"), 40,
new LineProcessor<Void>() {
//TODO implements method
.......
});
The simplest solution is to read through the file in forward order, using an ArrayList<Long> to hold the byte offset of each log record. You'll need to use something like Jakarta Commons CountingInputStream to retrieve the position of each record, and will need to carefully organize your buffers to ensure that it returns the proper values:
FileInputStream fis = // .. logfile
BufferedInputStream bis = new BufferedInputStream(fis);
CountingInputStream cis = new CountingInputStream(bis);
InputStreamReader isr = new InputStreamReader(cis, "UTF-8");
And you probably won't be able to use a BufferedReader, because it will attempt to read-ahead and throw off the count (but reading a character at a time won't be a performance problem, because you're buffering lower in the stack).
To write the file, you iterate the list backwards and use a RandomAccessFile. There is a bit of a trick: to properly decode the bytes (assuming a multi-byte encoding), you will need to read the bytes corresponding to an entry, and then apply a decoding to it. The list, however, will give you the start and end position of the bytes.
One big benefit to this approach, versus simply printing the lines in reverse order, is that you won't damage multi-line log messages (such as exceptions).
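A hedged sketch of the output side, assuming offsets is the List<Long> built during the forward pass and logFile/out are in scope:
RandomAccessFile raf = new RandomAccessFile(logFile, "r");
offsets.add(raf.length()); // sentinel marking the end of the last record
for (int i = offsets.size() - 2; i >= 0; i--) {
    long start = offsets.get(i);
    byte[] record = new byte[(int) (offsets.get(i + 1) - start)];
    raf.seek(start);
    raf.readFully(record);
    out.print(new String(record, "UTF-8")); // decode each record's bytes as a unit
}
raf.close();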
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
/**
* Inside of C:\\temp\\vaquar.txt we have following content
* vaquar khan is working into Citi He is good good programmer programmer trust me
* @author vaquar.khan@gmail.com
*
*/
public class ReadFileAndDisplayResultsinReverse {
public static void main(String[] args) {
try {
// read data from file
Object[] wordList = ReadFile();
System.out.println("File data=" + wordList);
//
Set<String> uniquWordList = null;
for (Object text : wordList) {
System.out.println((String) text);
List<String> tokens = Arrays.asList(text.toString().split("\\s+"));
System.out.println("tokens" + tokens);
uniquWordList = new HashSet<String>(tokens);
// If multiple line then code into same loop
}
System.out.println("uniquWordList" + uniquWordList);
Comparator<String> wordComp= new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
if (o1 == null && o2 == null) return 0;
if (o1 == null) return o2.length();
if (o2 == null) return o1.length();
return o2.length() - o1.length();
}
};
List<String> fs=new ArrayList<String>(uniquWordList);
Collections.sort(fs,wordComp);
System.out.println("uniquWordList" + fs);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
static Object[] ReadFile() throws IOException {
List<String> list = Files.readAllLines(new File("C:\\temp\\vaquar.txt").toPath(), Charset.defaultCharset());
return list.toArray();
}
}
Output:
[Vaquar khan is working into Citi He is good good programmer programmer trust me
tokens[vaquar, khan, is, working, into, Citi, He, is, good, good, programmer, programmer, trust, me]
uniquWordList[trust, vaquar, programmer, is, good, into, khan, me, working, Citi, He]
uniquWordList[programmer, working, vaquar, trust, good, into, khan, Citi, is, me, He]
If you want to sort A to Z, then write one more comparator.
Concise solution using Java 7 try-with-resources and Java 8 streams:
try (Stream<String> logStream = Files.lines(Paths.get("C:\\logfile.log"))) {
logStream
.sorted(Comparator.reverseOrder())
.limit(10) // last 10 lines
.forEach(System.out::println);
}
Big drawback: this only works when lines sort correctly in natural order, e.g. log files where every line is prefixed with a timestamp and there are no multi-line entries such as stack traces.
I have a Linux server and many clients with many operating systems. The server takes an input file from clients. Linux has the end-of-line char LF, classic Mac has the end-of-line char CR, and Windows has CR+LF.
The server needs LF as the end-of-line char. Using Java, I want to ensure that the file will always use the Linux EOL char LF. How can I achieve it?
Could you try this?
content.replaceAll("\\r\\n?", "\n")
Combining the two answers (by Visage & eumiro):
EDIT: After reading the comments: System.getProperty("line.separator") has no use here, then.
Before sending the file to the server, open it, replace all the EOLs, and write it back.
Make sure to use DataStreams to do so, and write in binary.
String fileString;
//..
//read from the file
//..
//for windows
fileString = fileString.replaceAll("\\r\\n", "\n");
fileString = fileString.replaceAll("\\r", "\n");
//..
//write to file in binary mode.. something like:
DataOutputStream os = new DataOutputStream(new FileOutputStream("fname.txt"));
os.write(fileString.getBytes());
//..
//send file
//..
The replaceAll method has two arguments, the first one is the string to replace and the second one is the replacement. But, the first one is treated as a regular expression, so, '\' is interpreted that way. So:
"\\r\\n" is converted to "\r\n" by Regex
"\r\n" is converted to CR+LF by Java
Had to do this for a recent project. The method below will normalize the line endings in the given file to the line ending specified by the OS the JVM is running on. So if your JVM is running on Linux, this will normalize all line endings to LF (\n).
Also works on very large files due to the use of buffered streams.
public static void normalizeFile(File f) {
File temp = null;
BufferedReader bufferIn = null;
BufferedWriter bufferOut = null;
try {
if(f.exists()) {
// Create a new temp file to write to
temp = new File(f.getAbsolutePath() + ".normalized");
temp.createNewFile();
// Get a stream to read from the file un-normalized file
FileInputStream fileIn = new FileInputStream(f);
DataInputStream dataIn = new DataInputStream(fileIn);
bufferIn = new BufferedReader(new InputStreamReader(dataIn));
// Get a stream to write to the normalized file
FileOutputStream fileOut = new FileOutputStream(temp);
DataOutputStream dataOut = new DataOutputStream(fileOut);
bufferOut = new BufferedWriter(new OutputStreamWriter(dataOut));
// For each line in the un-normalized file
String line;
while ((line = bufferIn.readLine()) != null) {
// Write the original line plus the operating-system dependent newline
bufferOut.write(line);
bufferOut.newLine();
}
bufferIn.close();
bufferOut.close();
// Remove the original file
f.delete();
// And rename the original file to the new one
temp.renameTo(f);
} else {
// If the file doesn't exist...
log.warn("Could not find file to open: " + f.getAbsolutePath());
}
} catch (Exception e) {
log.warn(e.getMessage(), e);
} finally {
// Clean up, temp should never exist
FileUtils.deleteQuietly(temp);
IOUtils.closeQuietly(bufferIn);
IOUtils.closeQuietly(bufferOut);
}
}
Here is a comprehensive helper class to deal with EOL issues. It is partially based on the solution posted by tyjen.
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
/**
* Helper class to deal with end-of-line markers in text files.
*
* Loosely based on these examples:
* - http://stackoverflow.com/a/9456947/1084488 (cc by-sa 3.0)
* - https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/buildutil/CheckEol.java (Apache License v2.0)
*
* This file is posted here to meet the "ShareAlike" requirement of cc by-sa 3.0:
* http://stackoverflow.com/a/27930311/1084488
*
* #author Matthias Stevens
*/
public class EOLUtils
{
/**
* Unix-style end-of-line marker (LF)
*/
private static final String EOL_UNIX = "\n";
/**
* Windows-style end-of-line marker (CRLF)
*/
private static final String EOL_WINDOWS = "\r\n";
/**
* "Old Mac"-style end-of-line marker (CR)
*/
private static final String EOL_OLD_MAC = "\r";
/**
* Default end-of-line marker on current system
*/
private static final String EOL_SYSTEM_DEFAULT = System.getProperty( "line.separator" );
/**
* The supported end-of-line marker modes
*/
public static enum Mode
{
/**
* Unix-style end-of-line marker ("\n")
*/
LF,
/**
* Windows-style end-of-line marker ("\r\n")
*/
CRLF,
/**
* "Old Mac"-style end-of-line marker ("\r")
*/
CR
}
/**
* The default end-of-line marker mode for the current system
*/
public static final Mode SYSTEM_DEFAULT = ( EOL_SYSTEM_DEFAULT.equals( EOL_UNIX ) ? Mode.LF : ( EOL_SYSTEM_DEFAULT
.equals( EOL_WINDOWS ) ? Mode.CRLF : ( EOL_SYSTEM_DEFAULT.equals( EOL_OLD_MAC ) ? Mode.CR : null ) ) );
static
{
// Just in case...
if ( SYSTEM_DEFAULT == null )
{
throw new IllegalStateException( "Could not determine system default end-of-line marker" );
}
}
/**
* Determines the end-of-line {@link Mode} of a text file.
*
* @param textFile the file to investigate
* @return the end-of-line {@link Mode} of the given file, or {@code null} if it could not be determined
* @throws Exception
*/
public static Mode determineEOL( File textFile )
throws Exception
{
if ( !textFile.exists() )
{
throw new IOException( "Could not find file to open: " + textFile.getAbsolutePath() );
}
FileInputStream fileIn = new FileInputStream( textFile );
BufferedInputStream bufferIn = new BufferedInputStream( fileIn );
try
{
int prev = -1;
int ch;
while ( ( ch = bufferIn.read() ) != -1 )
{
if ( ch == '\n' )
{
if ( prev == '\r' )
{
return Mode.CRLF;
}
else
{
return Mode.LF;
}
}
else if ( prev == '\r' )
{
return Mode.CR;
}
prev = ch;
}
throw new Exception( "Could not determine end-of-line marker mode" );
}
catch ( IOException ioe )
{
throw new Exception( "Could not determine end-of-line marker mode", ioe );
}
finally
{
// Clean up:
IOUtils.closeQuietly( bufferIn );
}
}
/**
* Checks whether the given text file has Windows-style (CRLF) line endings.
*
* @param textFile the file to investigate
* @return
* @throws Exception
*/
public static boolean hasWindowsEOL( File textFile )
throws Exception
{
return Mode.CRLF.equals( determineEOL( textFile ) );
}
/**
* Checks whether the given text file has Unix-style (LF) line endings.
*
* @param textFile the file to investigate
* @return
* @throws Exception
*/
public static boolean hasUnixEOL( File textFile )
throws Exception
{
return Mode.LF.equals( determineEOL( textFile ) );
}
/**
* Checks whether the given text file has "Old Mac"-style (CR) line endings.
*
* @param textFile the file to investigate
* @return
* @throws Exception
*/
public static boolean hasOldMacEOL( File textFile )
throws Exception
{
return Mode.CR.equals( determineEOL( textFile ) );
}
/**
* Checks whether the given text file has line endings that conform to the system default mode (e.g. LF on Unix).
*
* @param textFile the file to investigate
* @return
* @throws Exception
*/
public static boolean hasSystemDefaultEOL( File textFile )
throws Exception
{
return SYSTEM_DEFAULT.equals( determineEOL( textFile ) );
}
/**
* Convert the line endings in the given file to Unix-style (LF).
*
* @param textFile the file to process
* @throws IOException
*/
public static void convertToUnixEOL( File textFile )
throws IOException
{
convertLineEndings( textFile, EOL_UNIX );
}
/**
* Convert the line endings in the given file to Windows-style (CRLF).
*
* @param textFile the file to process
* @throws IOException
*/
public static void convertToWindowsEOL( File textFile )
throws IOException
{
convertLineEndings( textFile, EOL_WINDOWS );
}
/**
* Convert the line endings in the given file to "Old Mac"-style (CR).
*
* @param textFile the file to process
* @throws IOException
*/
public static void convertToOldMacEOL( File textFile )
throws IOException
{
convertLineEndings( textFile, EOL_OLD_MAC );
}
/**
* Convert the line endings in the given file to the system default mode.
*
* @param textFile the file to process
* @throws IOException
*/
public static void convertToSystemEOL( File textFile )
throws IOException
{
convertLineEndings( textFile, EOL_SYSTEM_DEFAULT );
}
/**
* Line endings conversion method.
*
* @param textFile the file to process
* @param eol the end-of-line marker to use (as a {@link String})
* @throws IOException
*/
private static void convertLineEndings( File textFile, String eol )
throws IOException
{
File temp = null;
BufferedReader bufferIn = null;
BufferedWriter bufferOut = null;
try
{
if ( textFile.exists() )
{
// Create a new temp file to write to
temp = new File( textFile.getAbsolutePath() + ".normalized" );
temp.createNewFile();
// Get a stream to read from the file un-normalized file
FileInputStream fileIn = new FileInputStream( textFile );
DataInputStream dataIn = new DataInputStream( fileIn );
bufferIn = new BufferedReader( new InputStreamReader( dataIn ) );
// Get a stream to write to the normalized file
FileOutputStream fileOut = new FileOutputStream( temp );
DataOutputStream dataOut = new DataOutputStream( fileOut );
bufferOut = new BufferedWriter( new OutputStreamWriter( dataOut ) );
// For each line in the un-normalized file
String line;
while ( ( line = bufferIn.readLine() ) != null )
{
// Write the original line plus the operating-system dependent newline
bufferOut.write( line );
bufferOut.write( eol ); // write EOL marker
}
// Close buffered reader & writer:
bufferIn.close();
bufferOut.close();
// Remove the original file
textFile.delete();
// And rename the original file to the new one
temp.renameTo( textFile );
}
else
{
// If the file doesn't exist...
throw new IOException( "Could not find file to open: " + textFile.getAbsolutePath() );
}
}
finally
{
// Clean up, temp should never exist
FileUtils.deleteQuietly( temp );
IOUtils.closeQuietly( bufferIn );
IOUtils.closeQuietly( bufferOut );
}
}
}
Use
System.getProperty("line.separator")
That will give you the (local) EOL character(s). You can then analyze the incoming file to determine what 'flavour' it is and convert accordingly.
Alternatively, get your clients to standardise!
public static String normalize(String val) {
return val.replace("\r\n", "\n")
.replace("\r", "\n");
}
For HTML:
public static String normalize(String val) {
return val.replace("\r\n", "<br/>")
.replace("\n", "<br/>")
.replace("\r", "<br/>");
}
Solution to change the file line endings, with a recursive search of a path:
package handleFileLineEnd;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
public class handleFileEndingMain {
static int carriageReturnTotal;
static int newLineTotal;
public static void main(String[] args) throws IOException
{
processPath("c:/temp/directories");
System.out.println("carriageReturnTotal (files have issue): " + carriageReturnTotal);
System.out.println("newLineTotal: " + newLineTotal);
}
private static void processPath(String path) throws IOException
{
File dir = new File(path);
File[] directoryListing = dir.listFiles();
if (directoryListing != null) {
for (File child : directoryListing) {
if (child.isDirectory())
processPath(child.toString());
else
checkFile(child.toString());
}
}
}
private static void checkFile(String fileName) throws IOException
{
Path path = FileSystems.getDefault().getPath(fileName);
byte[] bytes= Files.readAllBytes(path);
for (int counter=0; counter<bytes.length; counter++)
{
if (bytes[counter] == 13)
{
carriageReturnTotal = carriageReturnTotal + 1;
System.out.println(fileName);
modifyFile(fileName);
break;
}
if (bytes[counter] == 10)
{
newLineTotal = newLineTotal+ 1;
//System.out.println(fileName);
break;
}
}
}
private static void modifyFile(String fileName) throws IOException
{
Path path = Paths.get(fileName);
Charset charset = StandardCharsets.UTF_8;
String content = new String(Files.readAllBytes(path), charset);
content = content.replaceAll("\r\n", "\n");
content = content.replaceAll("\r", "\n");
Files.write(path, content.getBytes(charset));
}
}
Although String.replaceAll() is simpler to code, this should perform better since it doesn't go through the regex infrastructure.
/**
* Accepts a non-null string and returns the string with all end-of-lines
* normalized to a \n. This means \r\n and \r will both be normalized to \n.
* <p>
* Impl Notes: Although regex would have been easier to code, this approach
* will be more efficient since it's purpose built for this use case. Note we only
* construct a new StringBuilder and start appending to it if there are new end-of-lines
* to be normalized found in the string. If there are no end-of-lines to be replaced
* found in the string, this will simply return the input value.
* </p>
*
* @param inputValue !null, input value that may or may not contain new lines
* @return the input value that has new lines normalized
*/
static String normalizeNewLines(String inputValue){
StringBuilder stringBuilder = null;
int index = 0;
int len = inputValue.length();
while (index < len){
char c = inputValue.charAt(index);
if (c == '\r'){
if (stringBuilder == null){
stringBuilder = new StringBuilder();
// build up the string builder so it contains all the prior characters
stringBuilder.append(inputValue.substring(0, index));
}
if ((index + 1 < len) &&
inputValue.charAt(index + 1) == '\n'){
// this means we encountered a \r\n ... move index forward one more character
index++;
}
stringBuilder.append('\n');
}else{
if (stringBuilder != null){
stringBuilder.append(c);
}
}
index++;
}
return stringBuilder == null ? inputValue : stringBuilder.toString();
}
Since Java 12 you can use
var normalized = str.indent(0);
which implicitly normalizes the EOL characters (each line terminator becomes \n, and a trailing \n is appended).
Or, more explicitly:
var normalized = str.lines().collect(Collectors.joining("\n", "", "\n"));
I have a log file which gets updated every second. I need to read the log file periodically, and once I do a read, I need to store the file pointer position at the end of the last line I read and in the next periodic read I should start from that point.
Currently, I am using a RandomAccessFile in Java, using the getFilePointer() method to get the offset value and the seek() method to go to the offset position.
However, I have read in most articles, and even in the Java doc recommendations, to use BufferedReader for efficient reading of a file. How can I achieve this (getting the file pointer and moving to the last line) using a BufferedReader, or is there any other efficient way to achieve this task?
A couple of ways that should work:
open the file using a FileInputStream, skip() the relevant number of bytes, then wrap the BufferedReader around the stream (via an InputStreamReader);
open the file (with either FileInputStream or RandomAccessFile), call getChannel() on the stream/RandomAccessFile to get an underlying FileChannel, call position() on the channel, then call Channels.newInputStream() to get an input stream from the channel, which you can pass to InputStreamReader -> BufferedReader.
I haven't honestly profiled these to see which is better performance-wise, but you should see which works better in your situation.
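A minimal sketch of the first option, assuming lastOffset was saved from the previous read and falls on a line boundary:
FileInputStream fis = new FileInputStream("app.log");
long skipped = fis.skip(lastOffset); // skip() may skip fewer bytes than asked; loop if needed
BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "UTF-8"));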
The problem with RandomAccessFile is essentially that its readLine() method is very inefficient. If it's convenient for you to read from the RAF and do your own buffering to split the lines, then there's nothing wrong with RAF per se; it's just that its readLine() is poorly implemented.
Neil Coffey's solution is good if you are reading fixed-length files. However, for files that have variable length (data keeps coming in) there are some problems with using a BufferedReader directly on a FileInputStream or FileChannel input stream via an InputStreamReader. For example, consider these cases:
1)
You want to read data from some offset to the current file length. So you use a BR on the FileInputStream/FileChannel (via an InputStreamReader) and use its readLine method. But while you are busy reading, some data may get added, which causes BR's readLine to read more data than you expected (beyond the previous file length).
2)
You finished the readLine work, but when you then read the current file length/channel position, some data got added suddenly, which causes the current file length/channel position to increase; yet you have already read less data than this.
In both of the above cases it is difficult to know the actual amount of data you have read (you cannot just use the length of the data returned by readLine, because it skips some chars like the carriage return).
So it is better to read the data as buffered bytes and use a BufferedReader wrapper around those. I wrote some methods like this:
/** Read data from offset to length bytes in RandomAccessFile using BufferedReader
* @param offset
* @param length
* @param accessFile
* @throws IOException
*/
public static void readBufferedLines(long offset, long length, RandomAccessFile accessFile) throws IOException{
if(accessFile == null) return;
int bufferSize = BYTE_BUFFER_SIZE;// constant say 4096
if(offset < length && offset >= 0){
int index = 1;
long curPosition = offset;
/*
* iterate (length-from)/BYTE_BUFFER_SIZE times to read into buffer no matter where new line occurs
*/
while((curPosition + (index * BYTE_BUFFER_SIZE)) < length){
accessFile.seek(offset); // seek to last parsed data rather than last data read in to buffer
byte[] buf = new byte[bufferSize];
int read = accessFile.read(buf, 0, bufferSize);
index++;// Increment whether or not read successful
if(read > 0){
int lastnewLine = getLastLine(read,buf);
if(lastnewLine <= 0){ // no new line found in the buffer reset buffer size and continue
bufferSize = bufferSize+read;
continue;
}
else{
bufferSize = BYTE_BUFFER_SIZE;
}
readLine(buf, 0, lastnewLine); // read the lines from buffer and parse the line
offset = offset+lastnewLine; // update the last data read
}
}
// Read last chunk. The last chunk size in worst case is the total file when no newline occurs
if(offset < length){
accessFile.seek(offset);
byte[] buf = new byte[(int) (length-offset)];
int read = accessFile.read(buf, 0, buf.length);
if(read > 0){
readLine(buf, 0, read);
offset = offset+read; // update the last data read
}
}
}
}
private static void readLine(byte[] buf, int from , int lastnewLine) throws IOException{
String readLine = "";
BufferedReader reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(buf,from,lastnewLine) ));
while( (readLine = reader.readLine()) != null){
//do something with readLine
System.out.println(readLine);
}
reader.close();
}
private static int getLastLine(int read, byte[] buf) {
if(buf == null ) return -1;
if(read > buf.length) read = buf.length;
while( read > 0 && !(buf[read-1] == '\n' || buf[read-1] == '\r')) read--;
return read;
}
public static void main(String[] args) throws IOException {
RandomAccessFile accessFile = new RandomAccessFile("C:/sri/test.log", "r");
readBufferedLines(0, accessFile.length(), accessFile);
accessFile.close();
}
I had a similar problem, and I created this class to take lines from a BufferedReader and count how many bytes have been read so far, using getBytes(). We assume the line separator is a single byte by default, and we re-instantiate the BufferedReader for seek() to work.
public class FileCounterIterator {
public Long position() {
return _position;
}
public Long fileSize() {
return _fileSize;
}
public FileCounterIterator newlineLength(Long newNewlineLength) {
this._newlineLength = newNewlineLength;
return this;
}
private Long _fileSize = 0L;
private Long _position = 0L;
private Long _newlineLength = 1L;
private RandomAccessFile fp;
private BufferedReader itr;
public FileCounterIterator(String filename) throws IOException {
fp = new RandomAccessFile(filename, "r");
_fileSize = fp.length();
this.seek(0L);
}
public FileCounterIterator seek(Long newPosition) throws IOException {
this.fp.seek(newPosition);
this._position = newPosition;
itr = new BufferedReader(new InputStreamReader(new FileInputStream(fp.getFD())));
return this;
}
public Boolean hasNext() throws IOException {
return this._position < this._fileSize;
}
public String readLine() throws IOException {
String nextLine = itr.readLine();
if (nextLine != null) { // guard against end of file
this._position += nextLine.getBytes().length + _newlineLength;
}
return nextLine;
}
}
I have read a file into a String. The file contains various names, one name per line. Now the problem is that I want those names in a String array.
For that I have written the following code:
String [] names = fileString.split("\n"); // fileString is the string representation of the file
But I am not getting the desired results: the array obtained after splitting the string is of length 1. It means that fileString doesn't contain the "\n" character, even though the file does.
How do I get around this problem?
What about using Apache Commons (Commons IO and Commons Lang)?
String[] lines = StringUtils.split(FileUtils.readFileToString(new File("...")), '\n');
The problem is not with how you're splitting the string; that bit is correct.
You have to review how you are reading the file to the string. You need something like this:
private String readFileAsString(String filePath) throws IOException {
StringBuffer fileData = new StringBuffer();
BufferedReader reader = new BufferedReader(
new FileReader(filePath));
char[] buf = new char[1024];
int numRead=0;
while((numRead=reader.read(buf)) != -1){
String readData = String.valueOf(buf, 0, numRead);
fileData.append(readData);
}
reader.close();
return fileData.toString();
}
In particular, I love this one, using the java.nio.file package (also described here).
You can optionally include the Charset as a second argument in the String constructor.
String content = new String(Files.readAllBytes(Paths.get("/path/to/file")));
Cool huhhh!
As suggested by Garrett Rowe and Stan James you can use java.util.Scanner:
try (Scanner s = new Scanner(file).useDelimiter("\\Z")) {
String contents = s.next();
}
or
try (Scanner s = new Scanner(file).useDelimiter("\\n")) {
while(s.hasNext()) {
String line = s.next();
}
}
This code does not have external dependencies.
WARNING: you should specify the charset encoding as the second parameter of the Scanner's constructor. In this example I am using the platform's default, but this is most certainly wrong.
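To do it correctly, pass the charset name as the second constructor argument (this overload exists on java.util.Scanner):
try (Scanner s = new Scanner(file, "UTF-8").useDelimiter("\\Z")) {
    String contents = s.next();
}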
Here is an example of how to use java.util.Scanner with correct resource and error handling:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.Iterator;
class TestScanner {
public static void main(String[] args)
throws FileNotFoundException {
File file = new File(args[0]);
System.out.println(getFileContents(file));
processFileLines(file, new LineProcessor() {
@Override
public void process(int lineNumber, String lineContents) {
System.out.println(lineNumber + ": " + lineContents);
}
});
}
static String getFileContents(File file)
throws FileNotFoundException {
try (Scanner s = new Scanner(file).useDelimiter("\\Z")) {
return s.next();
}
}
static void processFileLines(File file, LineProcessor lineProcessor)
throws FileNotFoundException {
try (Scanner s = new Scanner(file).useDelimiter("\\n")) {
for (int lineNumber = 1; s.hasNext(); ++lineNumber) {
lineProcessor.process(lineNumber, s.next());
}
}
}
static interface LineProcessor {
void process(int lineNumber, String lineContents);
}
}
You could read your file into a List instead of a String and then convert to an array:
//Setup a BufferedReader here
List<String> list = new ArrayList<String>();
String line = reader.readLine();
while (line != null) {
list.add(line);
line = reader.readLine();
}
String[] arr = list.toArray(new String[0]);
There is no built-in method in Java which can read an entire file. So you have the following options:
Use a non-standard library method, such as Apache Commons, see the code example in romaintaz's answer.
Loop around some read method (e.g. FileInputStream.read, which reads bytes, or FileReader.read, which reads chars; both read into a preallocated array). Both classes use system calls, so you'll have to speed them up with buffering (BufferedInputStream or BufferedReader) if you are reading just a small amount of data (say, less than 4096 bytes) at a time.
Loop around BufferedReader.readLine. This has a fundamental problem: it discards the information of whether there was a '\n' at the end of the file, so e.g. it is unable to distinguish an empty file from a file containing just a newline.
I'd use this code:
// charsetName can be null to use the default charset.
public static String readFileAsString(String fileName, String charsetName)
throws java.io.IOException {
java.io.InputStream is = new java.io.FileInputStream(fileName);
try {
final int bufsize = 4096;
int available = is.available();
byte[] data = new byte[available < bufsize ? bufsize : available];
int used = 0;
while (true) {
if (data.length - used < bufsize) {
byte[] newData = new byte[data.length << 1];
System.arraycopy(data, 0, newData, 0, used);
data = newData;
}
int got = is.read(data, used, data.length - used);
if (got <= 0) break;
used += got;
}
return charsetName != null ? new String(data, 0, used, charsetName)
: new String(data, 0, used);
} finally {
is.close();
}
}
The code above has the following advantages:
It's correct: it reads the whole file, not discarding any byte.
It lets you specify the character set (encoding) the file uses.
It's fast (no matter how many newlines the file contains).
It doesn't waste memory (no matter how many newlines the file contains).
FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
String strline;
String arr[] = new String[10]; // 10 is the max number of lines
int i = 0;
while ((strline = br.readLine()) != null)
{
arr[i++] = strline;
}
The simplest solution for reading a text file line by line and putting the results into an array of strings without using third party libraries would be this:
ArrayList<String> names = new ArrayList<String>();
Scanner scanner = new Scanner(new File("names.txt"));
while(scanner.hasNextLine()) {
names.add(scanner.nextLine());
}
scanner.close();
String[] namesArr = names.toArray(new String[0]); // a plain (String[]) cast of toArray() would throw ClassCastException
I always use this way:
String content = "";
String line;
BufferedReader reader = new BufferedReader(new FileReader(...));
while ((line = reader.readLine()) != null)
{
content += "\n" + line;
}
// Cut off the first newline
content = content.substring(1);
// Close the reader
reader.close();
You can also use java.nio.file.Files to read an entire file into a String List then you can convert it to an array etc. Assuming a String variable named filePath, the following 2 lines will do that:
List<String> strList = Files.readAllLines(Paths.get(filePath), Charset.defaultCharset());
String[] strarray = strList.toArray(new String[0]);
A simpler (without loops), but less correct way, is to read everything to a byte array:
FileInputStream is = new FileInputStream(file);
byte[] b = new byte[(int) file.length()];
is.read(b, 0, (int) file.length());
String contents = new String(b);
Also note that read() is not guaranteed to fill the whole array in a single call, and that no charset is specified, so this approach is fragile.
If you have only InputStream, you can use InputStreamReader.
SmbFileInputStream in = new SmbFileInputStream("smb://host/dir/file.ext");
InputStreamReader r=new InputStreamReader(in);
char buf[] = new char[5000];
int count=r.read(buf);
String s=String.valueOf(buf, 0, count);
You can add a loop and a StringBuffer if needed.
You can try Cactoos:
import org.cactoos.io.TextOf;
import java.io.File;
new TextOf(new File("a.txt")).asString().split("\n")
Fixed version of @Anoyz's answer:
import java.io.FileInputStream;
import java.io.File;
public class App {
public static void main(String[] args) throws Exception {
File f = new File("file.txt");
FileInputStream is = new FileInputStream(f);
byte[] b = new byte[(int) f.length()];
is.read(b, 0, (int) f.length());
is.close();
String contents = new String(b);
}
}
}