Is there a way to create a StringBuilder from a byte[]?
I want to improve memory usage using StringBuilder but what I have first is a byte[], so I have to create a String from the byte[] and then create the StringBuilder from the String and I don't see this solution as optimal.
Thanks
Basically, your best option seems to be using CharsetDecoder directly.
Here's how:
byte[] srcBytes = getYourSrcBytes();
//Whatever charset your bytes are encoded in
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
//ByteBuffer.wrap simply wraps the byte array, it does not allocate new memory for it
ByteBuffer srcBuffer = ByteBuffer.wrap(srcBytes);
//Now, we decode our srcBuffer into a new CharBuffer (yes, new memory allocated here, no can do)
CharBuffer resBuffer = decoder.decode(srcBuffer);
//CharBuffer implements the CharSequence interface, which StringBuilder fully supports in its methods
StringBuilder yourStringBuilder = new StringBuilder(resBuffer);
ADDED:
After some tests it seems that the simple new String(bytes) is much faster and it seems there is no simple way to make it faster than that. Here is the test I ran:
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.text.ParseException;
public class ConsoleMain {
public static void main(String[] args) throws IOException, ParseException {
StringBuilder sb1 = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
for (int i=0;i<19;i++) {
sb1.append(sb1);
}
System.out.println("Size of buffer: "+sb1.length());
byte[] src = sb1.toString().getBytes("UTF-8");
StringBuilder res;
long startTime = System.currentTimeMillis();
res = testStringConvert(src);
System.out.println("Conversion using String time (msec): "+(System.currentTimeMillis()-startTime));
if (!res.toString().equals(sb1.toString())) {
System.err.println("Conversion error");
}
startTime = System.currentTimeMillis();
res = testCBConvert(src);
System.out.println("Conversion using CharBuffer time (msec): "+(System.currentTimeMillis()-startTime));
if (!res.toString().equals(sb1.toString())) {
System.err.println("Conversion error");
}
}
private static StringBuilder testStringConvert(byte[] src) throws UnsupportedEncodingException {
String s = new String(src, "UTF-8");
StringBuilder b = new StringBuilder(s);
return b;
}
private static StringBuilder testCBConvert(byte[] src) throws CharacterCodingException {
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
ByteBuffer srcBuffer = ByteBuffer.wrap(src);
CharBuffer resBuffer = decoder.decode(srcBuffer);
StringBuilder b = new StringBuilder(resBuffer);
return b;
}
}
Results:
Size of buffer: 13631488
Conversion using String time (msec): 91
Conversion using CharBuffer time (msec): 252
And a modified (less memory-consuming) version on IDEONE: Here.
If it is short statements you want, then there is no way around the String step in between. The String constructor mixes conversion and object construction for convenience in a very common case, but there is no such convenience constructor for a StringBuilder.
If it is performance you are interested in, then you might avoid the intermediate String object by using something like this:
new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)))
If you want to be able to fine-tune performance, you can control the decode process yourself. For example, you might want to avoid using too much memory, by using averageCharsPerByte as an estimate of how much memory will be needed. Instead of resizing the buffer if that estimate was too short, you could use the resulting StringBuilder to accumulate all the parts.
CharsetDecoder cd = Charset.forName(charsetName).newDecoder();
cd.onMalformedInput(CodingErrorAction.REPLACE);
cd.onUnmappableCharacter(CodingErrorAction.REPLACE);
int lengthEstimate = (int) Math.ceil(cd.averageCharsPerByte()*inBytes.length) + 1;
ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
CharBuffer outBuf = CharBuffer.allocate(lengthEstimate);
StringBuilder out = new StringBuilder(lengthEstimate);
CoderResult cr;
while (true) {
cr = cd.decode(inBuf, outBuf, true);
outBuf.flip(); // switch to reading so that append sees the decoded chars
out.append(outBuf);
outBuf.clear();
if (cr.isUnderflow()) break;
if (!cr.isOverflow()) cr.throwException();
}
cr = cd.flush(outBuf);
if (!cr.isUnderflow()) cr.throwException();
outBuf.flip();
out.append(outBuf);
I doubt that the above code will be worth the effort in most applications, though. If an application is that interested in performance, it probably shouldn't be dealing with StringBuilder either, but handle everything at the buffer level.
What I want to do is save 4x8bytes as a 64bit Long.
And decode that 64bit Long into 4x8bytes again.
This may be hard to follow, but I have an Encoder that uses
8-bit bytes to build a 64-bit long.
I save several of those as text, for example: "-223784 2432834 -34233566",
and I want to read every number, splitting on the " " character, and put them in a long[].
Currently I have this Code:
FileInputStream fin = new FileInputStream( IOUtils.path + File.separator + "eclipse.hm" );
String c = "";
long[] longs = new long[1000000];
int b,ggg=0;
while((b=fin.read())!=-1) {
if( (char)b==' ' ) {
longs[ggg++] = Long.parseLong(c);
c = "";
} else {
c+=(char) b;
}
fetched++;
}
fin.close();
The Method of my "Decoder" is as follows:
public static Object decode(long[] input) throws DataFormatException, IOException, ClassNotFoundException {
byte[] toInflate = BitSet.valueOf(input).toByteArray();
Inflater inflater = new Inflater();
inflater.setInput(toInflate);
byte[] deflated = new byte[ toInflate.length*2 ];
inflater.inflate(deflated);
inflater.end();
ObjectInputStream ois = new ObjectInputStream( new ByteArrayInputStream(deflated) );
Object r = ois.readObject();
ois.close();
return r;
}
The Decoder works: I tested it with my Encoder by feeding it the Encoder's output directly.
So there must be a read error.
I'm at a loss, and I can't think of anything that would fix this problem...
Thanks for the help, sincerely, Richee.
You can use the built-in String methods to split the string. After that, convert each element you get from the split step from String to long.
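As a minimal sketch of that suggestion, assuming the file content has already been read into a String and holds decimal longs separated by whitespace, as in the question's example (the class and method names are made up for illustration):

```java
import java.util.Arrays;

final class LongListParser {
    // Splits the text on runs of whitespace and parses every token as a decimal long.
    static long[] parseLongs(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new long[0];
        }
        String[] tokens = trimmed.split("\\s+");
        long[] values = new long[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Long.parseLong(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        long[] values = parseLongs("-223784 2432834 -34233566");
        System.out.println(Arrays.toString(values));
    }
}
```

Unlike a loop that only parses when it sees a space, this also picks up the last number when the input does not end with a separator.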
To fix this issue, I tried removing the Deflater from my Encoder and the Inflater from my Decoder.
After that it worked, so now I am wondering why the Deflater/Inflater "destroys" my stream header...
Anyway, sorry for taking your time...
Given the file
Orange
Purple
Indigo
Pink
Why won't the myWay method in the code below give me the content of the ByteBuffer via the Charset.decode? Notice I validate that the ByteBuffer has the file content, but it seems no matter what methodology I use from within myWay, I cannot get the generated CharBuffer to have the content. The otherWay method works as expected. Does anyone know what's going on? I've read the javadoc for ByteBuffer and CharBuffer but didn't really see anything that explains this (or I just missed it.) What difference would it make to use FileChannel.read vs FileChannel.map if I can show the content of the buffer with read?
public class Junk {
private static final int BUFFER_SIZE = 127;
private static final String CHARSET = "UTF-8";
public static void main(String[] args) {
try {
String fileName = "two.txt";
myWay(fileName);
otherWay(fileName);
} catch (IOException e) {
throw new IllegalStateException(e);
}
}
private static void myWay(String fileName) throws IOException {
System.out.println("I did it MY WAY!......");
FileChannel channel = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
// I tried both `allocate` and `allocateDirect`
ByteBuffer buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
int bytesRead = channel.read(buffer);
channel.close();
// Manually build the string from the ByteBuffer.
// This is ONLY to validate the buffer has the content
StringBuilder sb = new StringBuilder();
for(int i=0;i<bytesRead;i++){
sb.append((char)buffer.get(i));
}
System.out.println("manual string='"+sb+"'");
CharBuffer charBuffer = Charset.forName(CHARSET).decode(buffer);
// WHY FOR YOU NO HAVE THE CHARS??!!
System.out.println("CharBuffer='" + new String(charBuffer.array()) + "'");
System.out.println("CharBuffer='" + charBuffer.toString() + "'");
System.out.println("........My way sucks.");
}
private static void otherWay(String fileName) throws IOException{
System.out.println("The other way...");
FileChannel channel = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
channel.close();
Charset chars = Charset.forName(CHARSET);
CharBuffer cbuf = chars.decode(buffer);
String str = new String(cbuf.array());
System.out.println("str = '" + str + "'");
System.out.println("...works.");
}
}
The output:
I did it MY WAY!......
manual string='Orange
Purple
Indigo
Pink'
CharBuffer='������������������������������������������������������������������������������������������������������'
CharBuffer='������������������������������������������������������������������������������������������������������'
........My way sucks.
The other way...
str = 'Orange
Purple
Indigo
Pink'
...works.
Simple and subtle: You don't rewind your buffer.
When you call FileChannel#read(ByteBuffer), then this method will advance the position() of the buffer:
System.out.println("Before "+buffer.position()); // prints 0
int bytesRead = channel.read(buffer);
System.out.println("After "+buffer.position()); // prints 28
When you afterwards decode this into a CharBuffer, then you essentially decode exactly those 99 bytes that have never been written to (and that are all still 0).
Just add
buffer.rewind(); // (or buffer.position(0))
buffer.limit(bytesRead);
after you have read the data from the file channel, so that the decode method grabs exactly the part that has received data.
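The same fix is usually written with flip(), which sets the limit to the current position and then resets the position to 0. A small self-contained sketch, with a ByteArrayInputStream-backed channel standing in for the file (that substitution and the class name are assumptions made for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class FlipBeforeDecode {
    // Reads once from the channel into a 127-byte buffer and decodes only what was read.
    static String readAndDecode(ReadableByteChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(127);
        channel.read(buffer);   // the position advances past the bytes just written
        buffer.flip();          // limit = old position, position = 0
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Orange\nPurple\nIndigo\nPink".getBytes(StandardCharsets.UTF_8);
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));
        System.out.println(readAndDecode(channel));
    }
}
```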
I am trying several ways to decode the bytes of a file into characters.
Using java.io.Reader and Channels.newReader(...)
public static void decodeWithReader() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
Reader reader = Channels.newReader(channel, decoder, -1);
final char[] buffer = new char[4096];
for(;;) {
if(-1 == reader.read(buffer)) {
break;
}
}
fis.close();
}
Using buffers and a decoder manually:
public static void readWithBuffers() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
final long fileLength = channel.size();
long position = 0;
final int bufferSize = 1024 * 1024; // 1MB
CharBuffer cbuf = CharBuffer.allocate(4096);
while(position < fileLength) {
MappedByteBuffer bbuf = channel.map(MapMode.READ_ONLY, position, Math.min(bufferSize, fileLength - position));
for(;;) {
CoderResult res = decoder.decode(bbuf, cbuf, false);
if(CoderResult.OVERFLOW == res) {
cbuf.clear();
} else if (CoderResult.UNDERFLOW == res) {
break;
}
}
position += bbuf.position();
}
fis.close();
}
For a 200MB text file, the first approach consistently takes 300ms to complete. The second approach consistently takes 700ms. Do you have any idea why the reader approach is so much faster?
Can it run even faster with another implementation?
The benchmark is performed on Windows 7, and JDK7_07.
For comparison can you try.
public static void readWithBuffersISO_8859_1() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
MappedByteBuffer bbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
while(bbuf.remaining()>0) {
char ch = (char)(bbuf.get() & 0xFF);
}
fis.close();
}
This assumes ISO-8859-1. If you want maximum speed, treating the text like a binary format can help, if that's an option.
As @EJP points out, you are changing a number of things at once; you need to start with the simplest comparable example and see how much difference each element adds.
Here is a third implementation that does not use mapped buffers. Under the same conditions as before, it runs consistently in 220ms. The default charset on my machine is "windows-1252"; if I force the simpler "ISO-8859-1" charset, the decoding is even faster (about 150ms).
It looks like the usage of native features like mapped buffers actually hurts performance (for this very use case). Also interesting, if I allocate direct buffers instead of heap buffers (look at the commented lines) then the performance is reduced (a run then takes around 400ms).
So far the answer seems to be: to decode characters as fast as possible in Java (provided you can't enforce the usage of one charset), use a decoder manually, write the decode loop with heap buffers, do not use mapped buffers or even native ones. I have to admit that I don't really know why it is so.
public static void readWithBuffers() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
// CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
ByteBuffer bbuf = ByteBuffer.allocate(4096);
// ByteBuffer bbuf = ByteBuffer.allocateDirect(4096);
CharBuffer cbuf = CharBuffer.allocate(4096);
// CharBuffer cbuf = ByteBuffer.allocateDirect(2 * 4096).asCharBuffer();
for(;;) {
if(-1 == channel.read(bbuf)) {
decoder.decode(bbuf, cbuf, true);
decoder.flush(cbuf);
break;
}
bbuf.flip();
CoderResult res = decoder.decode(bbuf, cbuf, false);
if(CoderResult.OVERFLOW == res) {
cbuf.clear();
} else if (CoderResult.UNDERFLOW == res) {
bbuf.compact();
}
}
fis.close();
}
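For reference, here is a hedged sketch of the same heap-buffer decode loop that also keeps the decoded characters, so its result can be checked. It is not the benchmarked code: collecting into a StringBuilder, the drain helper, and the class name are assumptions added for illustration, and a channel over a byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class HeapBufferDecoder {
    static String decodeAll(ReadableByteChannel channel, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer bbuf = ByteBuffer.allocate(4096);   // heap buffers, not direct or mapped
        CharBuffer cbuf = CharBuffer.allocate(4096);
        StringBuilder out = new StringBuilder();
        boolean endOfInput = false;
        while (!endOfInput) {
            endOfInput = channel.read(bbuf) == -1;
            bbuf.flip();                               // switch bbuf from writing to reading
            CoderResult res;
            do {
                res = decoder.decode(bbuf, cbuf, endOfInput);
                if (res.isError()) {
                    res.throwException();
                }
                drain(cbuf, out);
            } while (res.isOverflow());                // output was full: decode the rest
            bbuf.compact();                            // keep an incomplete trailing sequence
        }
        CoderResult res;
        do {
            res = decoder.flush(cbuf);
            drain(cbuf, out);
        } while (res.isOverflow());
        return out.toString();
    }

    // Moves the decoded characters into the StringBuilder and empties the CharBuffer.
    private static void drain(CharBuffer cbuf, StringBuilder out) {
        cbuf.flip();
        out.append(cbuf);
        cbuf.clear();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(bytes));
        System.out.println(decodeAll(channel, StandardCharsets.UTF_8));
    }
}
```

The compact() call is what carries a multi-byte character that straddles two reads over to the next pass.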
Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing this than just iterating through the bytes and converting each one?
public String openFileToString(byte[] _bytes)
{
String file_string = "";
for(int i = 0; i < _bytes.length; i++)
{
file_string += (char)_bytes[i];
}
return file_string;
}
Look at the constructor for String
String str = new String(bytes, StandardCharsets.UTF_8);
And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:
String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
Java String class has a built-in-constructor for converting byte array to string.
byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
String value = new String(byteArray, "UTF-8");
To convert utf-8 data, you can't assume a 1-1 correspondence between bytes and characters.
Try this:
String file_string = new String(bytes, "UTF-8");
(Bah. I see I'm way too slow in hitting the Post Your Answer button.)
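To make the byte/character mismatch concrete, here is a small sketch comparing the question's cast-each-byte loop with the charset-aware constructor. The sample text and class name are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthDemo {
    // Mirrors the question's loop: treats every byte as one character.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                                 // 4 characters
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);      // the last character takes 2 bytes
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(bytes.length);                              // 5
        System.out.println(decoded.equals(original));                  // true
        System.out.println(castEachByte(bytes).equals(original));      // false
    }
}
```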
To read an entire file as a String, do something like this:
public String openFileToString(String fileName) throws IOException
{
InputStream is = new BufferedInputStream(new FileInputStream(fileName));
try {
InputStreamReader rdr = new InputStreamReader(is, "UTF-8");
StringBuilder contents = new StringBuilder();
char[] buff = new char[4096];
int len;
// read() returns -1 at end of stream; every pass has to read again
while ((len = rdr.read(buff)) >= 0) {
contents.append(buff, 0, len);
}
return contents.toString();
} finally {
try {
is.close();
} catch (Exception e) {
// log error in closing the file
}
}
}
You can use the String(byte[] bytes) constructor for that. See this link for details.
EDIT You also have to consider your platform's default charset, as per the Javadoc:
Constructs a new String by decoding the specified array of bytes using
the platform's default charset. The length of the new String is a
function of the charset, and hence may not be equal to the length of
the byte array. The behavior of this constructor when the given bytes
are not valid in the default charset is unspecified. The
CharsetDecoder class should be used when more control over the
decoding process is required.
You could use the methods described in this question (especially since you start off with an InputStream): Read/convert an InputStream to a String
In particular, if you don't want to rely on external libraries, you can try this answer, which reads the InputStream via an InputStreamReader into a char[] buffer and appends it into a StringBuilder.
Knowing that you are dealing with a UTF-8 byte array, you'll definitely want to use the String constructor that accepts a charset name. Otherwise you may leave yourself open to some charset encoding based security vulnerabilities. Note that it throws UnsupportedEncodingException which you'll have to handle. Something like this:
public String openFileToString(byte[] _bytes) {
String file_string;
try {
file_string = new String(_bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
// this should never happen because "UTF-8" is hard-coded.
throw new IllegalStateException(e);
}
return file_string;
}
Here's a simplified function that will read in bytes and create a string. It assumes you probably already know what encoding the file is in (and otherwise defaults).
static final int BUFF_SIZE = 2048;
static final String DEFAULT_ENCODING = "utf-8";
public static String readFileToString(String filePath, String encoding) throws IOException {
if (encoding == null || encoding.length() == 0)
encoding = DEFAULT_ENCODING;
StringBuilder content = new StringBuilder();
// Decode through a Reader so that a multi-byte character split across two reads is not corrupted
Reader reader = new InputStreamReader(new FileInputStream(new File(filePath)), encoding);
try {
char[] buffer = new char[BUFF_SIZE];
int charsRead;
while ((charsRead = reader.read(buffer)) != -1)
content.append(buffer, 0, charsRead);
} finally {
reader.close();
}
return content.toString();
}
String has a constructor that takes byte[] and charsetname as parameters :)
This also involves iterating, but this is much better than concatenating strings as they are very very costly.
public String openFileToString(byte[] _bytes)
{
StringBuilder s = new StringBuilder(_bytes.length);
for(int i = 0; i < _bytes.length; i++)
{
s.append((char)_bytes[i]);
}
return s.toString();
}
Why not get what you are looking for from the get go and read a string from the file instead of an array of bytes? Something like:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), Charset.forName("UTF-8")));
then readLine from in until it's done.
I use this way
String strIn = new String(_bytes, 0, numBytes);
I have a java ee application where I use a servlet to print a log file created with log4j. When reading log files you are usually looking for the last log line and therefore the servlet would be much more useful if it printed the log file in reverse order. My actual code is:
response.setContentType("text");
PrintWriter out = response.getWriter();
try {
FileReader logReader = new FileReader("logfile.log");
try {
BufferedReader buffer = new BufferedReader(logReader);
for (String line = buffer.readLine(); line != null; line = buffer.readLine()) {
out.println(line);
}
} finally {
logReader.close();
}
} finally {
out.close();
}
The implementations I've found on the internet involve using a StringBuffer and loading the whole file before printing. Isn't there a lightweight way of seeking to the end of the file and reading the content back to the start of the file?
[EDIT]
By request, I am prepending this answer with the sentiment of a later comment: If you need this behavior frequently, a "more appropriate" solution is probably to move your logs from text files to database tables with DBAppender (part of log4j 2). Then you could simply query for latest entries.
[/EDIT]
I would probably approach this slightly differently than the answers listed.
(1) Create a subclass of Writer that writes the encoded bytes of each character in reverse order:
public class ReverseOutputStreamWriter extends Writer {
private OutputStream out;
private Charset encoding;
public ReverseOutputStreamWriter(OutputStream out, Charset encoding) {
this.out = out;
this.encoding = encoding;
}
public void write(int ch) throws IOException {
byte[] buffer = this.encoding.encode(String.valueOf((char) ch)).array();
// write the bytes in reverse order to this.out
}
// other overloaded methods
}
(2) Create a subclass of log4j WriterAppender whose createWriter method would be overridden to create an instance of ReverseOutputStreamWriter.
(3) Create a subclass of log4j Layout whose format method returns the log string in reverse character order:
public class ReversePatternLayout extends PatternLayout {
// constructors
public String format(LoggingEvent event) {
return new StringBuilder(super.format(event)).reverse().toString();
}
}
(4) Modify my logging configuration file to send log messages to both the "normal" log file and a "reverse" log file. The "reverse" log file would contain the same log messages as the "normal" log file, but each message would be written backwards. (Note that the encoding of the "reverse" log file would not necessarily conform to UTF-8, or even any character encoding.)
(5) Create a subclass of InputStream that wraps an instance of RandomAccessFile in order to read the bytes of a file in reverse order:
public class ReverseFileInputStream extends InputStream {
private RandomAccessFile in;
private byte[] buffer;
// The index of the next byte to read.
private int bufferIndex;
public ReverseFileInputStream(File file) throws IOException {
this.in = new RandomAccessFile(file, "r");
this.buffer = new byte[4096];
this.bufferIndex = this.buffer.length;
this.in.seek(file.length());
}
public void populateBuffer() throws IOException {
// record the old position
// seek to a new, previous position
// read from the new position to the old position into the buffer
// reverse the buffer
}
public int read() throws IOException {
if (this.bufferIndex == this.buffer.length) {
populateBuffer();
if (this.bufferIndex == this.buffer.length) {
return -1;
}
}
// mask to 0..255 so that a negative byte is not mistaken for end of stream
return this.buffer[this.bufferIndex++] & 0xFF;
}
// other overridden methods
}
Now if I want to read the entries of the "normal" log file in reverse order, I just need to create an instance of ReverseFileInputStream, giving it the "reverse" log file.
This is an old question. I also wanted to do the same thing, and after some searching I found there is a class in Apache commons-io to achieve this:
org.apache.commons.io.input.ReversedLinesFileReader
I think a good choice for this would be the RandomAccessFile class. There is some sample code for back-reading using this class on this page. Reading bytes this way is easy; reading strings, however, might be a bit more challenging.
If you are in a hurry and want the simplest solution without worrying too much about performance, I would give a try to use an external process to do the dirty job (given that you are running your app in a Un*x server, as any decent person would do XD)
new BufferedReader(new InputStreamReader(Runtime.getRuntime().exec(new String[] {"sh", "-c", "tail -n 50 yourlogfile.txt | tac"}).getInputStream()))
A simpler alternative, because you say that you're creating a servlet to do this, is to use a LinkedList to hold the last N lines (where N might be a servlet parameter). When the list size exceeds N, you call removeFirst().
From a user experience perspective, this is probably the best solution. As you note, the most recent lines are the most important. Not being overwhelmed with information is also very important.
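A minimal sketch of that sliding window, assuming the servlet only needs the last N lines with the newest first. The class and method names are hypothetical, and a StringReader stands in for the log file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

final class TailLines {
    // Keeps at most maxLines lines in memory and returns them newest first.
    static List<String> lastLinesNewestFirst(Reader source, int maxLines) throws IOException {
        LinkedList<String> window = new LinkedList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            window.addLast(line);
            if (window.size() > maxLines) {
                window.removeFirst();   // drop the oldest line
            }
        }
        Collections.reverse(window);
        return window;
    }

    public static void main(String[] args) throws IOException {
        Reader log = new StringReader("line 1\nline 2\nline 3\nline 4");
        for (String line : lastLinesNewestFirst(log, 2)) {
            System.out.println(line);
        }
    }
}
```

The whole file is still read once from the start, but memory use is bounded by N lines rather than by the file size.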
Good question. I'm not aware of any common implementations of this. It's not trivial to do properly either, so be careful what you choose. It should deal with character set encoding and detection of different line break methods. Here's the implementation I have so far that works with ASCII and UTF-8 encoded files, including a test case for UTF-8. It does not work with UTF-16LE or UTF-16BE encoded files.
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import junit.framework.TestCase;
public class ReverseLineReader {
private static final int BUFFER_SIZE = 8192;
private final FileChannel channel;
private final String encoding;
private long filePos;
private ByteBuffer buf;
private int bufPos;
private byte lastLineBreak = '\n';
private ByteArrayOutputStream baos = new ByteArrayOutputStream();
public ReverseLineReader(File file, String encoding) throws IOException {
RandomAccessFile raf = new RandomAccessFile(file, "r");
channel = raf.getChannel();
filePos = raf.length();
this.encoding = encoding;
}
public String readLine() throws IOException {
while (true) {
if (bufPos < 0) {
if (filePos == 0) {
if (baos == null) {
return null;
}
String line = bufToString();
baos = null;
return line;
}
long start = Math.max(filePos - BUFFER_SIZE, 0);
long end = filePos;
long len = end - start;
buf = channel.map(FileChannel.MapMode.READ_ONLY, start, len);
bufPos = (int) len;
filePos = start;
}
while (bufPos-- > 0) {
byte c = buf.get(bufPos);
if (c == '\r' || c == '\n') {
if (c != lastLineBreak) {
lastLineBreak = c;
continue;
}
lastLineBreak = c;
return bufToString();
}
baos.write(c);
}
}
}
private String bufToString() throws UnsupportedEncodingException {
if (baos.size() == 0) {
return "";
}
byte[] bytes = baos.toByteArray();
for (int i = 0; i < bytes.length / 2; i++) {
byte t = bytes[i];
bytes[i] = bytes[bytes.length - i - 1];
bytes[bytes.length - i - 1] = t;
}
baos.reset();
return new String(bytes, encoding);
}
public static void main(String[] args) throws IOException {
File file = new File("my.log");
ReverseLineReader reader = new ReverseLineReader(file, "UTF-8");
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
}
public static class ReverseLineReaderTest extends TestCase {
public void test() throws IOException {
File file = new File("utf8test.log");
String encoding = "UTF-8";
FileInputStream fileIn = new FileInputStream(file);
Reader fileReader = new InputStreamReader(fileIn, encoding);
BufferedReader bufReader = new BufferedReader(fileReader);
List<String> lines = new ArrayList<String>();
String line;
while ((line = bufReader.readLine()) != null) {
lines.add(line);
}
Collections.reverse(lines);
ReverseLineReader reader = new ReverseLineReader(file, encoding);
int pos = 0;
while ((line = reader.readLine()) != null) {
assertEquals(lines.get(pos++), line);
}
assertEquals(lines.size(), pos);
}
}
}
You can use RandomAccessFile to implement this, for example:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import com.google.common.io.LineProcessor;
public class FileUtils {
/**
* Reads a text file (UTF-8) backwards; lines in the file are separated by \r\n
*
* @param <T>
* @param file
* @param step step size used when searching backwards
* @param lineprocessor
* @throws IOException
*/
public static <T> T backWardsRead(File file, int step,
LineProcessor<T> lineprocessor) throws IOException {
RandomAccessFile rf = new RandomAccessFile(file, "r");
long fileLen = rf.length();
long pos = fileLen - step;
// Find the first line in reverse order: \r
while (true) {
if (pos < 0) {
// Process the first line
rf.seek(0);
lineprocessor.processLine(rf.readLine());
return lineprocessor.getResult();
}
rf.seek(pos);
char c = (char) rf.readByte();
while (c != '\r') {
c = (char) rf.readByte();
}
rf.readByte();//read '\n'
pos = rf.getFilePointer();
if (!lineprocessor.processLine(rf.readLine())) {
return lineprocessor.getResult();
}
pos -= step;
}
}
}
Usage:
FileUtils.backWardsRead(new File("H:/usersfavs.csv"), 40,
new LineProcessor<Void>() {
//TODO implements method
.......
});
The simplest solution is to read through the file in forward order, using an ArrayList<Long> to hold the byte offset of each log record. You'll need to use something like Jakarta Commons CountingInputStream to retrieve the position of each record, and will need to carefully organize your buffers to ensure that it returns the proper values:
FileInputStream fis = // .. logfile
BufferedInputStream bis = new BufferedInputStream(fis);
CountingInputStream cis = new CountingInputStream(bis);
InputStreamReader isr = new InputStreamReader(cis, "UTF-8");
And you probably won't be able to use a BufferedReader, because it will attempt to read-ahead and throw off the count (but reading a character at a time won't be a performance problem, because you're buffering lower in the stack).
To write the file, you iterate the list backwards and use a RandomAccessFile. There is a bit of a trick: to properly decode the bytes (assuming a multi-byte encoding), you will need to read the bytes corresponding to an entry, and then apply a decoding to it. The list, however, will give you the start and end position of the bytes.
One big benefit to this approach, versus simply printing the lines in reverse order, is that you won't damage multi-line log messages (such as exceptions).
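A rough sketch of the two passes follows. It makes simplifying assumptions that are not part of the answer: every record is a single line ending in \n, the file is UTF-8, and a plain byte scan replaces CountingInputStream. A real log would need its own rule for where a record begins, for example a leading timestamp. All names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class OffsetIndexReverseReader {
    static List<String> readRecordsNewestFirst(Path file) throws IOException {
        // Pass 1: read forward and remember the byte offset at which each record starts.
        List<Long> starts = new ArrayList<>();
        long fileLength = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            boolean atRecordStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atRecordStart) {
                    starts.add(fileLength);
                    atRecordStart = false;
                }
                if (b == '\n') {            // 0x0A never occurs inside a multi-byte UTF-8 sequence
                    atRecordStart = true;
                }
                fileLength++;
            }
        }
        // Pass 2: walk the offsets backwards and decode each record's bytes as a whole.
        List<String> records = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = fileLength;
            for (int i = starts.size() - 1; i >= 0; i--) {
                long start = starts.get(i);
                byte[] bytes = new byte[(int) (end - start)];
                raf.seek(start);
                raf.readFully(bytes);
                records.add(stripLineEnd(new String(bytes, StandardCharsets.UTF_8)));
                end = start;
            }
        }
        return records;
    }

    // Removes trailing \r and \n characters from a decoded record.
    private static String stripLineEnd(String s) {
        int len = s.length();
        while (len > 0 && (s.charAt(len - 1) == '\n' || s.charAt(len - 1) == '\r')) {
            len--;
        }
        return s.substring(0, len);
    }
}
```

Decoding happens only after a record's full byte range has been read, which is the trick described above for multi-byte encodings.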
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
/**
* Inside of C:\\temp\\vaquar.txt we have following content
* vaquar khan is working into Citi He is good good programmer programmer trust me
* @author vaquar.khan@gmail.com
*
*/
public class ReadFileAndDisplayResultsinReverse {
public static void main(String[] args) {
try {
// read data from file
Object[] wordList = ReadFile();
System.out.println("File data=" + wordList);
//
Set<String> uniquWordList = null;
for (Object text : wordList) {
System.out.println((String) text);
List<String> tokens = Arrays.asList(text.toString().split("\\s+"));
System.out.println("tokens" + tokens);
uniquWordList = new HashSet<String>(tokens);
// If multiple line then code into same loop
}
System.out.println("uniquWordList" + uniquWordList);
Comparator<String> wordComp= new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
if(o1==null && o2 ==null) return 0;
if(o1==null ) return o2.length()-0;
if(o2 ==null) return o1.length()-0;
//
return o2.length()-o1.length();
}
};
List<String> fs=new ArrayList<String>(uniquWordList);
Collections.sort(fs,wordComp);
System.out.println("uniquWordList" + fs);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
static Object[] ReadFile() throws IOException {
List<String> list = Files.readAllLines(new File("C:\\temp\\vaquar.txt").toPath(), Charset.defaultCharset());
return list.toArray();
}
}
Output:
[Vaquar khan is working into Citi He is good good programmer programmer trust me
tokens[vaquar, khan, is, working, into, Citi, He, is, good, good, programmer, programmer, trust, me]
uniquWordList[trust, vaquar, programmer, is, good, into, khan, me, working, Citi, He]
uniquWordList[programmer, working, vaquar, trust, good, into, khan, Citi, is, me, He]
If you want to sort A to Z, write one more comparator.
Concise solution using Java 7 try-with-resources (AutoCloseable) and Java 8 Streams:
try (Stream<String> logStream = Files.lines(Paths.get("C:\\logfile.log"))) {
logStream
.sorted(Comparator.reverseOrder())
.limit(10) // last 10 lines
.forEach(System.out::println);
}
Big drawback: this only works when the lines are strictly in natural order, such as log files whose lines are prefixed with timestamps and that contain no multi-line exceptions.