I'm trying to print a few of the results at a time with a line break, to avoid exceeding the 80-character column in the output. I don't want to use a hard-coded line value; I'd rather print out a few elements at a time until complete. The ArrayList is a value in a map, so for each word I'd like the output to look like this:
word =
[ 12, 456, 345, 134,
536, 6346, 3426, 2346,
2346, 2347, 36787, 46789]
rather than this:
word = [12, 456, 345, 134, 536, 6346...
Class involved:
package java112.analyzer;
import java.io.*;
import java.util.*;
/**
* This is the KeywordAnalyzer class. <br>
*
* @author mgundrum
*
*/
public class KeywordAnalyzer implements Analyzer {
private Map<String, List<Integer>> keywordMap = new TreeMap<String, List<Integer>>();
private Properties properties;
private int tokenOccurence = 1;
/**
*This is the default constructor for the KeywordAnalyzer class
*/
public KeywordAnalyzer() {
}
/**
*This is a constructor method for the KeywordAnalyzer class
*
* @param properties The properties object passed from AnalyzeFile
*/
public KeywordAnalyzer(Properties properties) {
this.properties = properties;
readFile();
}
public Map<String, List<Integer>> getKeywordMap() {
return keywordMap;
}
public void readFile() {
BufferedReader input = null;
try {
//create a BufferedReader to read the input file
input = new BufferedReader(new FileReader(properties.getProperty("file.path.keywords")));
String inputLine = "";
//loop through the input file one line at a time, adding each line as a keyword
while (input.ready()) {
inputLine = input.readLine();
keywordMap.put(inputLine, new ArrayList<Integer>() );
}
} catch (java.io.FileNotFoundException fnfe) {
System.out.println("Failed to read input file");
fnfe.printStackTrace();
} catch (Exception exception) {
System.out.println("General Error");
exception.printStackTrace();
} finally {
//Don't forget to close!
try {
if (input != null) {
input.close();
}
} catch (java.io.IOException ioe) {
System.out.println("Failed to close input file");
ioe.printStackTrace();
}
}
}
/**
* This method records the position of each keyword as tokens are passed in
*
* @param token - tokens passed from AnalyzeFile
*/
public void processToken(String token) {
if (keywordMap.containsKey(token)) {
List<Integer> list = keywordMap.get(token);
list.add(tokenOccurence);
keywordMap.put(token, list);
}
tokenOccurence++;
}
/**
* This method writes the keyword output file.
*
* @param inputFilePath - original file
*/
public void writeOutputFile(String inputFilePath) {
PrintWriter output = null;
try {
output = new PrintWriter( new BufferedWriter( new FileWriter(
properties.getProperty("output.dir") +
properties.getProperty("output.file.keywords"))));
for (Map.Entry<String, List<Integer>> entry : keywordMap.entrySet()) {
output.println(entry.getKey() + " = ");
output.println(entry.getValue());
output.println();
}
} catch (IOException ioException) {
System.out.println("FileWriter caused an error");
ioException.printStackTrace();
} finally {
if (output != null) {
output.close();
}
}
}
}
Any help would be great!
I wrote this static method that takes in a string and returns it word wrapped to a specified character length. I tested some basic edge cases, but you probably want to test it more thoroughly before you use it for anything!
In essence, it starts at the char length, then goes back to look for spaces. If it finds one, it replaces it with a newline. If it can't find any, it looks forward for them. If that fails too, it gives up and returns whatever it has.
Here it is:
public static String wordWrap(String rawInput, int maxLineLength) {
StringBuilder input = new StringBuilder(rawInput);
StringBuilder output = new StringBuilder();
while(input.length() > maxLineLength) {
for(int i = maxLineLength; i >= 0; i--) {
//you can change this delimiter to whatever (or add it as an argument =D)
if(input.charAt(i) == ' ') {
output.append(input.substring(0, i));
output.append('\n');
input.delete(0, i+1);
break;
}
if(i == 0) {
for(int j = maxLineLength; j < input.length(); j++) {
if(input.charAt(j) == ' ') {
output.append(input.substring(0, j));
output.append('\n');
input.delete(0, j+1);
break;
}
if(j == input.length() - 1) {
return output.append(input).toString();
}
}
}
}
}
return output.append(input).toString(); // append the remaining short tail rather than dropping it
}
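Applied to the map from the original question, a minimal usage sketch (assuming the keywordMap and output PrintWriter from the question's writeOutputFile, and that wordWrap is visible from there) could look like this:
for (Map.Entry<String, List<Integer>> entry : keywordMap.entrySet()) {
    output.println(entry.getKey() + " =");
    // List.toString() yields "[12, 456, 345, ...]", which wordWrap can break on the spaces
    output.println(wordWrap(entry.getValue().toString(), 80));
    output.println();
}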
I want to split data based on character values: two right parentheses )) as the start of a substring and a carriage return CR as the end of the substring. The data comes in the form of bytes, and I am stuck on how to split it. This is what I have come up with so far.
public class ByteDecoder {
public static void main(String[] args) throws IOException {
InputStream is = null;
DataInputStream dis = null;
try{
is = new FileInputStream("byte.log");
dis = new DataInputStream(is);
int count = is.available();
byte[] bs = new byte[count];
dis.readFully(bs); // read() may return fewer bytes than requested; readFully fills the array
for (byte b:bs)
{
char c = (char)b;
System.out.println(c);
//convert bytes to hex string
// String c = DatatypeConverter.printHexBinary( bs);
}
}catch(Exception e){
e.printStackTrace();
}finally{
if(is!=null)
is.close();
if(dis!=null)
dis.close();
}
}
}
CR (unlucky 13) as the end marker of binary data might be a bit dangerous. More dangerous still is how the text and bytes were written in the first place: the text must have been written as bytes in some encoding.
But considering that, one could wrap the FileInputStream in your own ByteLogInputStream and hold the reading state there:
/**
* An InputStream converting bytes between ASCII "))" and CR to hexadecimal.
* Typically wrapped as:
* <pre>
* try (BufferedReader in = new BufferedReader(
* new InputStreamReader(
* new ByteLogInputStream(
* new FileInputStream(file)), "UTF-8"))) {
* ...
* }
* </pre>
*/
public class ByteLogInputStream extends InputStream {
private enum State {
TEXT,
AFTER_RIGHT_PARENT,
BINARY
}
private final InputStream in;
private State state = State.TEXT;
private int nextHexDigit = 0;
public ByteLogInputStream(InputStream in) {
this.in = in;
}
@Override
public int read() throws IOException {
if (nextHexDigit != 0) {
int hex = nextHexDigit;
nextHexDigit = 0;
return hex;
}
int ch = in.read();
if (ch != -1) {
switch (state) {
case TEXT:
if (ch == ')') {
state = State.AFTER_RIGHT_PARENT;
}
break;
case AFTER_RIGHT_PARENT:
if (ch == ')') {
state = State.BINARY;
}
break;
case BINARY:
if (ch == '\r') {
state = State.TEXT;
} else {
String hex2 = String.format("%02X", ch);
ch = hex2.charAt(0);
nextHexDigit = hex2.charAt(1);
}
break;
}
}
return ch;
}
}
As one binary byte results in two hexadecimal digits, you need to buffer a nextHexDigit for the next digit.
I did not override available (to account for a possible nextHexDigit).
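If you did want to override it, a minimal sketch for the class above (it only corrects for the single buffered digit, so it remains an estimate) might be:
@Override
public int available() throws IOException {
    // one extra character is pending whenever a second hex digit is buffered
    return in.available() + (nextHexDigit != 0 ? 1 : 0);
}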
If you want to check whether \r\n follows, one could use a PushbackInputStream. I used an InputStream rather than a Reader, as you did not specify the encoding.
I have a big test file with 70 million lines of text.
I have to read the file line by line.
I used two different approaches:
InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath),"unicode");
BufferedReader br = new BufferedReader(isr);
while((cur=br.readLine()) != null);
and
LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");
while(it.hasNext()) cur=it.nextLine();
Is there another approach that can make this task faster?
1) I am sure there is no difference speedwise, both use FileInputStream internally and buffering
2) You can take measurements and see for yourself
3) Though there are no performance benefits, I like the Java 7 approach
try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
for (String line = null; (line = br.readLine()) != null;) {
//
}
}
4) Scanner based version
try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
while (sc.hasNextLine()) {
String line = sc.nextLine();
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
}
5) This may be faster than the rest
try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
ByteBuffer bb = ByteBuffer.allocateDirect(1000);
for(;;) {
StringBuilder line = new StringBuilder();
int n = ch.read(bb);
// add chars to line
// ...
}
}
It requires a bit of coding, but it can be really fast because of ByteBuffer.allocateDirect, which allows the OS to read bytes from the file into the ByteBuffer directly, without copying.
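A rough sketch of the missing line-assembly part (this assumes a single-byte charset such as ASCII; anything else would need a CharsetDecoder):
try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(8192);
    StringBuilder line = new StringBuilder();
    while (ch.read(bb) != -1) {
        bb.flip();
        while (bb.hasRemaining()) {
            char c = (char) bb.get();      // single-byte assumption
            if (c == '\n') {
                // process line.toString() here
                line.setLength(0);
            } else if (c != '\r') {
                line.append(c);
            }
        }
        bb.clear();
    }
}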
6) Parallel processing could further increase speed: make a big byte buffer, run several tasks that read bytes from the file into that buffer in parallel, and when ready, find the first end of line, make a String, find the next one, and so on.
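A much simpler (and different) shortcut, if the per-line work rather than the I/O is the bottleneck, is to let a parallel stream spread the processing over cores. This is not the buffer scheme described in 6), just a sketch of the easy alternative:
try (Stream<String> lines = Files.lines(Paths.get("test.txt"))) {
    long nonEmpty = lines.parallel()
                         .filter(l -> !l.isEmpty())
                         .count();
    System.out.println(nonEmpty);
}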
If you are looking at performance, you could have a look at the java.nio.* packages; those are supposedly faster than java.io.*.
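For example, a minimal java.nio sketch using a memory-mapped file (assuming the file fits in a single MappedByteBuffer, i.e. under 2 GB, and '\n'-terminated lines):
try (FileChannel ch = FileChannel.open(Paths.get("test.txt"), StandardOpenOption.READ)) {
    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    long lineCount = 0;
    while (map.hasRemaining()) {
        if (map.get() == '\n') {   // count newline bytes as lines
            lineCount++;
        }
    }
    System.out.println(lineCount + " lines");
}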
In Java 8, for anyone now looking to read large files line by line:
try (Stream<String> lines = Files.lines(Paths.get("c:\\myfile.txt"))) {
    lines.forEach(l -> {
        // Do anything line by line
    });
}
I actually researched this topic for months in my free time and came up with a benchmark. Here is code to benchmark all the different ways to read a file line by line; individual performance may vary based on the underlying system.
I ran it on a Windows 10, Java 8, Intel i5 HP laptop. Here is the code.
import java.io.*;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class ReadComplexDelimitedFile {
private static long total = 0;
private static final Pattern FIELD_DELIMITER_PATTERN = Pattern.compile("\\^\\|\\^");
@SuppressWarnings("unused")
private void readFileUsingScanner() {
String s;
try (Scanner stdin = new Scanner(new File(this.getClass().getResource("input.txt").getPath()))) {
while (stdin.hasNextLine()) {
s = stdin.nextLine();
String[] fields = FIELD_DELIMITER_PATTERN.split(s, 0);
total = total + fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
//Winner
private void readFileUsingCustomBufferedReader() {
try (CustomBufferedReader stdin = new CustomBufferedReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = FIELD_DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReader() {
try (BufferedReader stdin = new BufferedReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = FIELD_DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingLineReader() {
try (LineNumberReader stdin = new LineNumberReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = FIELD_DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingStreams() {
try (Stream<String> stream = Files.lines((new File(this.getClass().getResource("input.txt").getPath())).toPath())) {
total += stream.mapToInt(s -> FIELD_DELIMITER_PATTERN.split(s, 0).length).sum();
} catch (IOException e1) {
e1.printStackTrace();
}
}
private void readFileUsingBufferedReaderFileChannel() {
try (FileInputStream fis = new FileInputStream(this.getClass().getResource("input.txt").getPath())) {
try (FileChannel inputChannel = fis.getChannel()) {
try (CustomBufferedReader stdin = new CustomBufferedReader(Channels.newReader(inputChannel, "UTF-8"))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = FIELD_DELIMITER_PATTERN.split(s, 0);
total = total + fields.length;
}
}
} catch (Exception e) {
System.err.println("Error");
}
} catch (Exception e) {
System.err.println("Error");
}
}
public static void main(String args[]) {
//JVM warmup
for (int i = 0; i < 100000; i++) {
total += i;
}
// We know scanner is slow-Still warming up
ReadComplexDelimitedFile readComplexDelimitedFile = new ReadComplexDelimitedFile();
List<Long> longList = new ArrayList<>(50);
for (int i = 0; i < 50; i++) {
total = 0;
long startTime = System.nanoTime();
//readComplexDelimitedFile.readFileUsingScanner();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingScanner");
longList.forEach(System.out::println);
// Actual performance test starts here
longList = new ArrayList<>(10);
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReaderFileChannel();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReaderFileChannel");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReader");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingStreams();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingStreams");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingCustomBufferedReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingCustomBufferedReader");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingLineReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingLineReader");
longList.forEach(System.out::println);
}
}
I had to rewrite BufferedReader to avoid synchronization and a couple of boundary checks that are not needed (at least, that's what I felt). It is not unit tested, so use it at your own risk.
import com.sun.istack.internal.NotNull;
import java.io.*;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
/**
* Reads text from a character-input stream, buffering characters so as to
* provide for the efficient reading of characters, arrays, and lines.
* <p>
* <p> The buffer size may be specified, or the default size may be used. The
* default is large enough for most purposes.
* <p>
* <p> In general, each read request made of a Reader causes a corresponding
* read request to be made of the underlying character or byte stream. It is
* therefore advisable to wrap a CustomBufferedReader around any Reader whose read()
* operations may be costly, such as FileReaders and InputStreamReaders. For
* example,
* <p>
* <pre>
* CustomBufferedReader in
* = new CustomBufferedReader(new FileReader("foo.in"));
* </pre>
* <p>
* will buffer the input from the specified file. Without buffering, each
* invocation of read() or readLine() could cause bytes to be read from the
* file, converted into characters, and then returned, which can be very
* inefficient.
* <p>
* <p> Programs that use DataInputStreams for textual input can be localized by
* replacing each DataInputStream with an appropriate CustomBufferedReader.
*
* @author Mark Reinhold
* @see FileReader
* @see InputStreamReader
* @see java.nio.file.Files#newBufferedReader
* @since JDK1.1
*/
public class CustomBufferedReader extends Reader {
private final Reader in;
private char cb[];
private int nChars, nextChar;
private static final int INVALIDATED = -2;
private static final int UNMARKED = -1;
private int markedChar = UNMARKED;
private int readAheadLimit = 0; /* Valid only when markedChar > 0 */
/**
* If the next character is a line feed, skip it
*/
private boolean skipLF = false;
/**
* The skipLF flag when the mark was set
*/
private boolean markedSkipLF = false;
private static int defaultCharBufferSize = 8192;
private static int defaultExpectedLineLength = 80;
private ReadWriteLock rwlock;
/**
* Creates a buffering character-input stream that uses an input buffer of
* the specified size.
*
* @param in A Reader
* @param sz Input-buffer size
* @throws IllegalArgumentException If {@code sz <= 0}
*/
public CustomBufferedReader(@NotNull final Reader in, int sz) {
super(in);
if (sz <= 0)
throw new IllegalArgumentException("Buffer size <= 0");
this.in = in;
cb = new char[sz];
nextChar = nChars = 0;
rwlock = new ReentrantReadWriteLock();
}
/**
* Creates a buffering character-input stream that uses a default-sized
* input buffer.
*
* @param in A Reader
*/
public CustomBufferedReader(@NotNull final Reader in) {
this(in, defaultCharBufferSize);
}
/**
* Fills the input buffer, taking the mark into account if it is valid.
*/
private void fill() throws IOException {
int dst;
if (markedChar <= UNMARKED) {
/* No mark */
dst = 0;
} else {
/* Marked */
int delta = nextChar - markedChar;
if (delta >= readAheadLimit) {
/* Gone past read-ahead limit: Invalidate mark */
markedChar = INVALIDATED;
readAheadLimit = 0;
dst = 0;
} else {
if (readAheadLimit <= cb.length) {
/* Shuffle in the current buffer */
System.arraycopy(cb, markedChar, cb, 0, delta);
markedChar = 0;
dst = delta;
} else {
/* Reallocate buffer to accommodate read-ahead limit */
char ncb[] = new char[readAheadLimit];
System.arraycopy(cb, markedChar, ncb, 0, delta);
cb = ncb;
markedChar = 0;
dst = delta;
}
nextChar = nChars = delta;
}
}
int n;
do {
n = in.read(cb, dst, cb.length - dst);
} while (n == 0);
if (n > 0) {
nChars = dst + n;
nextChar = dst;
}
}
/**
* Reads a single character.
*
* @return The character read, as an integer in the range
* 0 to 65535 (<tt>0x00-0xffff</tt>), or -1 if the
* end of the stream has been reached
* @throws IOException If an I/O error occurs
*/
public char readChar() throws IOException {
for (; ; ) {
if (nextChar >= nChars) {
fill();
if (nextChar >= nChars)
return (char) -1;
}
return cb[nextChar++];
}
}
/**
* Reads characters into a portion of an array, reading from the underlying
* stream if necessary.
*/
private int read1(char[] cbuf, int off, int len) throws IOException {
if (nextChar >= nChars) {
/* If the requested length is at least as large as the buffer, and
if there is no mark/reset activity, and if line feeds are not
being skipped, do not bother to copy the characters into the
local buffer. In this way buffered streams will cascade
harmlessly. */
if (len >= cb.length && markedChar <= UNMARKED && !skipLF) {
return in.read(cbuf, off, len);
}
fill();
}
if (nextChar >= nChars) return -1;
int n = Math.min(len, nChars - nextChar);
System.arraycopy(cb, nextChar, cbuf, off, n);
nextChar += n;
return n;
}
/**
* Reads characters into a portion of an array.
* <p>
* <p> This method implements the general contract of the corresponding
* <code>{@link Reader#read(char[], int, int) read}</code> method of the
* <code>{@link Reader}</code> class. As an additional convenience, it
* attempts to read as many characters as possible by repeatedly invoking
* the <code>read</code> method of the underlying stream. This iterated
* <code>read</code> continues until one of the following conditions becomes
* true: <ul>
* <p>
* <li> The specified number of characters have been read,
* <p>
* <li> The <code>read</code> method of the underlying stream returns
* <code>-1</code>, indicating end-of-file, or
* <p>
* <li> The <code>ready</code> method of the underlying stream
* returns <code>false</code>, indicating that further input requests
* would block.
* <p>
* </ul> If the first <code>read</code> on the underlying stream returns
* <code>-1</code> to indicate end-of-file then this method returns
* <code>-1</code>. Otherwise this method returns the number of characters
* actually read.
* <p>
* <p> Subclasses of this class are encouraged, but not required, to
* attempt to read as many characters as possible in the same fashion.
* <p>
* <p> Ordinarily this method takes characters from this stream's character
* buffer, filling it from the underlying stream as necessary. If,
* however, the buffer is empty, the mark is not valid, and the requested
* length is at least as large as the buffer, then this method will read
* characters directly from the underlying stream into the given array.
* Thus redundant <code>CustomBufferedReader</code>s will not copy data
* unnecessarily.
*
* @param cbuf Destination buffer
* @param off Offset at which to start storing characters
* @param len Maximum number of characters to read
* @return The number of characters read, or -1 if the end of the
* stream has been reached
* @throws IOException If an I/O error occurs
*/
public int read(char cbuf[], int off, int len) throws IOException {
int n = read1(cbuf, off, len);
if (n <= 0) return n;
while ((n < len) && in.ready()) {
int n1 = read1(cbuf, off + n, len - n);
if (n1 <= 0) break;
n += n1;
}
return n;
}
/**
* Reads a line of text. A line is considered to be terminated by any one
* of a line feed ('\n'), a carriage return ('\r'), or a carriage return
* followed immediately by a linefeed.
*
* @param ignoreLF If true, the next '\n' will be skipped
* @return A String containing the contents of the line, not including
* any line-termination characters, or null if the end of the
* stream has been reached
* @throws IOException If an I/O error occurs
* @see java.io.LineNumberReader#readLine()
*/
String readLine(boolean ignoreLF) throws IOException {
StringBuilder s = null;
int startChar;
bufferLoop:
for (; ; ) {
if (nextChar >= nChars)
fill();
if (nextChar >= nChars) { /* EOF */
if (s != null && s.length() > 0)
return s.toString();
else
return null;
}
boolean eol = false;
char c = 0;
int i;
/* Skip a leftover '\n', if necessary */
charLoop:
for (i = nextChar; i < nChars; i++) {
c = cb[i];
if ((c == '\n')) {
eol = true;
break charLoop;
}
}
startChar = nextChar;
nextChar = i;
if (eol) {
String str;
if (s == null) {
str = new String(cb, startChar, i - startChar);
} else {
s.append(cb, startChar, i - startChar);
str = s.toString();
}
nextChar++;
return str;
}
if (s == null)
s = new StringBuilder(defaultExpectedLineLength);
s.append(cb, startChar, i - startChar);
}
}
/**
* Reads a line of text. A line is considered to be terminated by any one
* of a line feed ('\n'), a carriage return ('\r'), or a carriage return
* followed immediately by a linefeed.
*
* @return A String containing the contents of the line, not including
* any line-termination characters, or null if the end of the
* stream has been reached
* @throws IOException If an I/O error occurs
* @see java.nio.file.Files#readAllLines
*/
public String readLine() throws IOException {
return readLine(false);
}
/**
* Skips characters.
*
* @param n The number of characters to skip
* @return The number of characters actually skipped
* @throws IllegalArgumentException If <code>n</code> is negative.
* @throws IOException If an I/O error occurs
*/
public long skip(long n) throws IOException {
if (n < 0L) {
throw new IllegalArgumentException("skip value is negative");
}
rwlock.readLock().lock();
long r = n;
try{
while (r > 0) {
if (nextChar >= nChars)
fill();
if (nextChar >= nChars) /* EOF */
break;
if (skipLF) {
skipLF = false;
if (cb[nextChar] == '\n') {
nextChar++;
}
}
long d = nChars - nextChar;
if (r <= d) {
nextChar += r;
r = 0;
break;
} else {
r -= d;
nextChar = nChars;
}
}
} finally {
rwlock.readLock().unlock();
}
return n - r;
}
/**
* Tells whether this stream is ready to be read. A buffered character
* stream is ready if the buffer is not empty, or if the underlying
* character stream is ready.
*
* @throws IOException If an I/O error occurs
*/
public boolean ready() throws IOException {
rwlock.readLock().lock();
try {
/*
* If newline needs to be skipped and the next char to be read
* is a newline character, then just skip it right away.
*/
if (skipLF) {
/* Note that in.ready() will return true if and only if the next
* read on the stream will not block.
*/
if (nextChar >= nChars && in.ready()) {
fill();
}
if (nextChar < nChars) {
if (cb[nextChar] == '\n')
nextChar++;
skipLF = false;
}
}
} finally {
rwlock.readLock().unlock();
}
return (nextChar < nChars) || in.ready();
}
/**
* Tells whether this stream supports the mark() operation, which it does.
*/
public boolean markSupported() {
return true;
}
/**
* Marks the present position in the stream. Subsequent calls to reset()
* will attempt to reposition the stream to this point.
*
* @param readAheadLimit Limit on the number of characters that may be
* read while still preserving the mark. An attempt
* to reset the stream after reading characters
* up to this limit or beyond may fail.
* A limit value larger than the size of the input
* buffer will cause a new buffer to be allocated
* whose size is no smaller than limit.
* Therefore large values should be used with care.
* @throws IllegalArgumentException If {@code readAheadLimit < 0}
* @throws IOException If an I/O error occurs
*/
public void mark(int readAheadLimit) throws IOException {
if (readAheadLimit < 0) {
throw new IllegalArgumentException("Read-ahead limit < 0");
}
rwlock.readLock().lock();
try {
this.readAheadLimit = readAheadLimit;
markedChar = nextChar;
markedSkipLF = skipLF;
} finally {
rwlock.readLock().unlock();
}
}
/**
* Resets the stream to the most recent mark.
*
* @throws IOException If the stream has never been marked,
* or if the mark has been invalidated
*/
public void reset() throws IOException {
rwlock.readLock().lock();
try {
if (markedChar < 0)
throw new IOException((markedChar == INVALIDATED)
? "Mark invalid"
: "Stream not marked");
nextChar = markedChar;
skipLF = markedSkipLF;
} finally {
rwlock.readLock().unlock();
}
}
public void close() throws IOException {
rwlock.readLock().lock();
try {
in.close();
} finally {
cb = null;
rwlock.readLock().unlock();
}
}
public Stream<String> lines() {
Iterator<String> iter = new Iterator<String>() {
String nextLine = null;
@Override
public boolean hasNext() {
if (nextLine != null) {
return true;
} else {
try {
nextLine = readLine();
return (nextLine != null);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
}
@Override
public String next() {
if (nextLine != null || hasNext()) {
String line = nextLine;
nextLine = null;
return line;
} else {
throw new NoSuchElementException();
}
}
};
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
iter, Spliterator.ORDERED | Spliterator.NONNULL), false);
}
}
And now the results:
Time taken for readFileUsingBufferedReaderFileChannel
2902690903
1845190694
1894071377
1815161868
1861056735
1867693540
1857521371
1794176251
1768008762
1853089582
Time taken for readFileUsingBufferedReader
2022837353
1925901163
1802266711
1842689572
1899984555
1843101306
1998642345
1821242301
1820168806
1830375108
Time taken for readFileUsingStreams
1992855461
1930827034
1850876033
1843402533
1800378283
1863581324
1810857226
1798497108
1809531144
1796345853
Time taken for readFileUsingCustomBufferedReader
1759732702
1765987214
1776997357
1772999486
1768559162
1755248431
1744434555
1750349867
1740582606
1751390934
Time taken for readFileUsingLineReader
1845307174
1830950256
1829847321
1828125293
1827936280
1836947487
1832186310
1820276327
1830157935
1829171481
Process finished with exit code 0
Inference:
The test was run on a 200 MB file.
The test was repeated several times.
The data looked like this
Start Date^|^Start Time^|^End Date^|^End Time^|^Event Title ^|^All Day Event^|^No End Time^|^Event Description^|^Contact ^|^Contact Email^|^Contact Phone^|^Location^|^Category^|^Mandatory^|^Registration^|^Maximum^|^Last Date To Register
9/5/2011^|^3:00:00 PM^|^9/5/2011^|^^|^Social Studies Dept. Meeting^|^N^|^Y^|^Department meeting^|^Chris Gallagher^|^cgallagher@schoolwires.com^|^814-555-5179^|^High School^|^2^|^N^|^N^|^25^|^9/2/2011
Bottom line: there is not much difference between BufferedReader and my CustomBufferedReader; the gap is minuscule, so you can use either to read your file.
Trust me, you don't have to break your head. Use BufferedReader with readLine; it is properly tested. At worst, if you feel you can improve it, just override it and change StringBuffer to StringBuilder, to shave off half a second.
I had a similar problem, but I only needed the bytes from the file. I read through the links provided in the various answers, and ultimately tried writing one similar to #5 in Evgeniy's answer. They weren't kidding; it took a lot of code.
The basic premise is that each line of text is of unknown length. I start with a SeekableByteChannel, read data into a ByteBuffer, then loop over it looking for EOL. When a record is a "carryover" between loop iterations, it extends the end offset and ultimately moves the SeekableByteChannel position back and re-reads the entire record.
It is verbose ... but it works. It was plenty fast for what I needed, but I'm sure there are more improvements that can be made.
The process method is stripped down to the basics for kicking off reading the file.
private long startOffset;
private long endOffset;
private SeekableByteChannel sbc;
private final ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
public void process() throws IOException
{
startOffset = 0;
sbc = Files.newByteChannel(FILE, EnumSet.of(READ));
byte[] message = null;
while((message = readRecord()) != null)
{
// do something
}
}
public byte[] readRecord() throws IOException
{
endOffset = startOffset;
boolean eol = false;
boolean carryOver = false;
byte[] record = null;
while(!eol)
{
byte data;
buffer.clear();
final int bytesRead = sbc.read(buffer);
if(bytesRead == -1)
{
return null;
}
buffer.flip();
for(int i = 0; i < bytesRead && !eol; i++)
{
data = buffer.get();
if(data == '\r' || data == '\n')
{
eol = true;
endOffset += i;
if(carryOver)
{
final int messageSize = (int)(endOffset - startOffset);
sbc.position(startOffset);
final ByteBuffer tempBuffer = ByteBuffer.allocateDirect(messageSize);
sbc.read(tempBuffer);
tempBuffer.flip();
record = new byte[messageSize];
tempBuffer.get(record);
}
else
{
record = new byte[i];
// Need to move the buffer position back since the get moved it forward
buffer.position(0);
buffer.get(record, 0, i);
}
// Skip past the newline characters
if(isWindowsOS())
{
startOffset = (endOffset + 2);
}
else
{
startOffset = (endOffset + 1);
}
// Move the file position back
sbc.position(startOffset);
}
}
if(!eol && sbc.position() == sbc.size())
{
// We have hit the end of the file, just take all the bytes
record = new byte[bytesRead];
eol = true;
buffer.position(0);
buffer.get(record, 0, bytesRead);
}
else if(!eol)
{
// The EOL marker wasn't found, continue the loop
carryOver = true;
endOffset += bytesRead;
}
}
// System.out.println(new String(record));
return record;
}
This article is a great way to start.
Also, you should create test cases in which you read the first 10k lines (or some other number, but it shouldn't be too small) and calculate the reading times accordingly.
Threading might be a good way to go, but it's important that we know what you will be doing with the data.
Another thing to consider is how you will store that amount of data.
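For the timing suggestion above, a rough sketch (the file name is just a placeholder; limit() keeps the sample to the first 10k lines):
long start = System.nanoTime();
try (Stream<String> lines = Files.lines(Paths.get("test.txt"))) {
    lines.limit(10_000).forEach(line -> {
        // process the line
    });
}
System.out.println("First 10k lines took " +
        (System.nanoTime() - start) / 1_000_000 + " ms");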
I tried the following three methods; my file size is 1 MB, and I got the results below.
I ran the program several times, and it looks like BufferedReader is the fastest.
@Test
public void testLargeFileIO_Scanner() throws Exception {
long start = new Date().getTime();
String fileName = "/Downloads/SampleTextFile_1000kb.txt"; //this path is on my local
InputStream inputStream = new FileInputStream(fileName);
try (Scanner fileScanner = new Scanner(inputStream, StandardCharsets.UTF_8.name())) {
while (fileScanner.hasNextLine()) {
String line = fileScanner.nextLine();
//System.out.println(line);
}
}
long end = new Date().getTime();
long time = end - start;
System.out.println("Scanner Time Consumed => " + time);
}
@Test
public void testLargeFileIO_BufferedReader() throws Exception {
long start = new Date().getTime();
String fileName = "/Downloads/SampleTextFile_1000kb.txt"; //this path is on my local
try (BufferedReader fileBufferReader = new BufferedReader(new FileReader(fileName))) {
String fileLineContent;
while ((fileLineContent = fileBufferReader.readLine()) != null) {
//System.out.println(fileLineContent);
}
}
long end = new Date().getTime();
long time = (long) (end - start);
System.out.println("BufferedReader Time Consumed => " + time);
}
@Test
public void testLargeFileIO_Stream() throws Exception {
long start = new Date().getTime();
String fileName = "/Downloads/SampleTextFile_1000kb.txt"; //this path is on my local
try (Stream<String> inputStream = Files.lines(Paths.get(fileName), StandardCharsets.UTF_8)) {
//inputStream.forEach(System.out::println);
}
long end = new Date().getTime();
long time = end - start;
System.out.println("Stream Time Consumed => " + time);
}
I am trying to build a servlet that parses form input and creates a .csv file from them, but the bufferedWriter object truncates a lot of characters for no (apparent to me) reason.
String filepath = getServletContext().getRealPath("\\") + "temp";
String filename = "csv"+dateFormat.format(date)+".csv";
File file = new File(filepath + filename);
file.createNewFile();
BufferedWriter fwrite = new BufferedWriter(new FileWriter(file));
for(int i=0; i<list.size(); i++) {
String[] dataEntry = list.get(i);
for (int j=0; j<dataEntry.length;j++)
fwrite.write("test1-2");
//fwrite.append(dataEntry[j]+";");
fwrite.newLine();
}
fwrite.close();
URI fileUri = file.toURI();
stream = response.getOutputStream();
response.setContentType("text/csv");
response.addHeader("Content-Disposition", "attachment; filename="
+ filename);
URLConnection urlConn = fileUri.toURL().openConnection();
response.setContentLength((int) urlConn.getContentLength());
buf = new BufferedInputStream(urlConn.getInputStream());
while (buf.read() != -1)
stream.write(buf.read());
} finally {
if (stream != null)
stream.close();
if (buf != null)
buf.close();
}
}
Sorry if the code is a bit slapdash. My current output when writing the "test1-2" string for each entry is
et-ts12et-ts12et-ts12ÿ
Any further comments on the code itself would be appreciated. I'm just experimenting with stuff I find on the net; I have no actual best-practices points of reference.
I'd maybe add a few methods so that no individual method is overly large. For example:
/**
* Saves the List of String[] to the File.
*
* @param f
* @param list
*
* @throws IOException
*/
void saveList(File f, List<String[]> list) throws IOException {
FileWriter fw = null;
try {
fw = new FileWriter(f);
saveList(fw, list);
} finally {
if (null != fw) {
// Ensure that fw is closed.
fw.close();
}
}
}
/**
* Saves the List of String[] to the Writer.
*
* @param w
* @param list
*
* @throws IOException
*/
void saveList(Writer w, List<String[]> list) throws IOException {
BufferedWriter bw = new BufferedWriter(w);
for (int i = 0; i < list.size(); i++) {
String[] dataEntry = list.get(i);
for (int j = 0; j < dataEntry.length; j++) {
bw.write("test1-2");
// bw.append(dataEntry[j]+";");
}
bw.newLine();
}
bw.flush();
}
/**
* Copies in's contents to out.
*
* @param in
* Must not be null.
* @param out
* Must not be null.
*
* @throws IOException
*/
void copyStream(InputStream in, OutputStream out) throws IOException {
if (null == in) {
throw new NullPointerException("in must not be null");
}
if (null == out) {
throw new NullPointerException("out must not be null");
}
byte[] buf = new byte[1024 * 8];
int read = -1;
while ((read = in.read(buf)) > -1) {
out.write(buf, 0, read);
}
}
/**
* Copies in's contents to out, and ensures that in is closed afterwards.
*
* @param in
* Must not be null.
* @param out
* Must not be null.
*
* @throws IOException
*/
void copyStreamAndCloseIn(InputStream in, OutputStream out) throws IOException {
try {
copyStream(in, out);
} finally {
in.close();
}
}
public void service(HttpServletRequest request, HttpServletResponse response) throws IOException {
String filepath = getServletContext().getRealPath("\\") + "temp";
String filename = "csv" + dateFormat.format(date) + ".csv";
File file = new File(filepath + filename);
file.createNewFile();
saveList(file, list);
long length = file.length();
response.setContentType("text/csv");
response.addHeader("Content-Disposition", "attachment; filename=" + filename);
response.setContentLength((int) length);
copyStreamAndCloseIn(new FileInputStream(file), response.getOutputStream());
}
As for the strange output et-ts12et-ts12et-ts12ÿ: the original copy loop calls read() twice per iteration, once in the while condition and once inside stream.write(buf.read()), so every other byte is dropped; and when the condition's read consumes the last byte, the write's read returns -1, which gets written out as the byte 0xFF (ÿ). Reading into a buffer and writing exactly what was read, as copyStream above does, avoids this.
Also, how are you viewing this value? Printing to the console, reading the file afterwards? Both printing to the console and opening the file in another editor could produce strange results depending on the character encoding in use.
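If you do keep a byte-at-a-time loop instead of the buffered copyStream above, a minimal corrected version reads only once per iteration:
int b;
while ((b = buf.read()) != -1) {
    stream.write(b);
}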
I want to send a file over a serial port, and I have to use the Z-modem protocol in Java.
I looked at the protocol and it looks difficult for me, and I can't buy a commercial solution.
Any idea how I can make this easier?
Thank you for the help.
class TModem {
protected final byte CPMEOF = 26; /* control/z */
protected final int MAXERRORS = 10; /* max times to retry one block */
protected final int SECSIZE = 128; /* cpm sector, transmission block */
protected final int SENTIMOUT = 30; /* timeout time in send */
protected final int SLEEP = 30; /* timeout time in recv */
/* Protocol characters used */
protected final byte SOH = 1; /* Start Of Header */
protected final byte EOT = 4; /* End Of Transmission */
protected final byte ACK = 6; /* ACKnowlege */
protected final byte NAK = 0x15; /* Negative AcKnowlege */
protected InputStream inStream;
protected OutputStream outStream;
protected PrintWriter errStream;
/** Construct a TModem */
public TModem(InputStream is, OutputStream os, PrintWriter errs) {
inStream = is;
outStream = os;
errStream = errs;
}
/** Construct a TModem with default files (stdin and stdout). */
public TModem() {
inStream = System.in;
outStream = System.out;
errStream = new PrintWriter(System.err);
}
/** A main program, for direct invocation. */
public static void main(String[] argv) throws
Exception, IOException, InterruptedException {
/* argc must == 2, i.e., `java TModem -s filename' */
if (argv.length != 2)
usage();
if (argv[0].charAt(0) != '-')
usage();
TModem tm = new TModem( );
tm.setStandalone(true);
boolean OK = false;
switch (argv[0].charAt(1)){
case 'r':
OK = tm.receive(argv[1]);
break;
case 's':
OK = tm.send(argv[1]);
break;
default:
usage();
}
System.out.print(OK?"Done OK":"Failed");
System.exit(0);
}
/* give user minimal usage message */
protected static void usage()
{
System.err.println("usage: TModem -r/-s file");
// not errStream, not die(), since this is static.
System.exit(1);
}
/** If we're in a standalone app it is OK to System.exit() */
protected boolean standalone = false;
public void setStandalone(boolean is) {
standalone = is;
}
public boolean isStandalone() {
return standalone;
}
/** A flag used to communicate with inner class IOTimer */
protected boolean gotChar;
/** An inner class to provide a read timeout for alarms. */
class IOTimer extends Thread {
String message;
long milliseconds;
/** Construct an IO Timer */
IOTimer(long sec, String mesg) {
milliseconds = 1000 * sec;
message = mesg;
}
public void run() {
try {
Thread.sleep(milliseconds);
} catch (InterruptedException e) {
// can't happen
e.printStackTrace();
}
/** Implement the timer */
if (!gotChar) {
errStream.println("Timed out waiting for " + message);
//System.out.println("Timed out waiting for " + message);
die(1);
}
}
}
/*
* send a file to the remote
*/
public boolean send(String tfile) throws Exception, IOException, InterruptedException
{
Parameters param;
param = new Parameters();
param.setPort("COM1");
param.setBaudRate("115200");
param.setParity("N");
param.setByteSize("8");
Com com = new Com(param);
char checksum, index, blocknumber, errorcount;
byte character;
byte[] sector = new byte[SECSIZE];
int nbytes;
DataInputStream foo;
foo = new DataInputStream(new FileInputStream(tfile));
errStream.println( "file open, ready to send");
System.out.println( "file open, ready to send");
errorcount = 0;
blocknumber = 1;
// The C version uses "alarm()", a UNIX-only system call,
// to detect if the read times out. Here we do detect it
// by using a Thread, the IOTimer class defined above.
gotChar = false;
new IOTimer(SENTIMOUT, "NAK to start send").start();
do {
character = getchar(com);
gotChar = true;
if (character != NAK && errorcount < MAXERRORS)
++errorcount;
} while (character != NAK && errorcount < MAXERRORS);
errStream.println( "transmission beginning");
System.out.println( "transmission beginning");
if (errorcount == MAXERRORS) {
xerror();
}
while ((nbytes = foo.read(sector)) > 0) { /* read from the opened file; read() returns -1 at EOF */
if (nbytes<SECSIZE)
sector[nbytes]=CPMEOF;
errorcount = 0;
while (errorcount < MAXERRORS) {
errStream.println( "{" + blocknumber + "} ");
System.out.println( "{" + blocknumber + "} ");
putchar(com, SOH); /* here is our header */
putchar(com, blocknumber); /* the block number */
putchar(com, ~blocknumber); /* & its complement */
checksum = 0;
for (index = 0; index < SECSIZE; index++) {
putchar(com, sector[index]);
checksum += sector[index];
}
putchar(com, checksum); /* tell our checksum */
if (getchar(com) != ACK)
++errorcount;
else
break;
}
if (errorcount == MAXERRORS)
xerror();
++blocknumber;
}
boolean isAck = false;
while (!isAck) {
putchar(com, EOT);
isAck = getchar(com) == ACK;
}
errStream.println( "Transmission complete.");
//System.out.println( "Transmission complete.");
return true;
}
/*
* receive a file from the remote
*/
public boolean receive(String tfile) throws Exception
{
Parameters param;
param = new Parameters();
param.setPort("COM1");
param.setBaudRate("115200");
param.setParity("N");
param.setByteSize("8");
Com com = new Com(param);
char checksum, index, blocknumber, errorcount;
byte character;
byte[] sector = new byte[SECSIZE];
DataOutputStream foo;
foo = new DataOutputStream(new FileOutputStream(tfile));
System.out.println("you have " + SLEEP + " seconds...");
/* wait for the user or remote to get his act together */
gotChar = false;
new IOTimer(SLEEP, "receive from remote").start();
errStream.println("Starting receive...");
//System.out.println("Starting receive...");
putchar(com, NAK);
errorcount = 0;
blocknumber = 1;
rxLoop:
do {
character = getchar(com);
gotChar = true;
if (character != EOT) {
try {
byte not_ch;
if (character != SOH) {
errStream.println( "Not SOH");
//System.out.println( "Not SOH");
if (++errorcount < MAXERRORS)
continue rxLoop;
else
xerror();
}
character = getchar(com);
not_ch = (byte)(~getchar(com));
errStream.println( "[" + character + "] ");
//System.out.println( "[" + character + "] ");
if (character != not_ch) {
errStream.println( "Blockcounts not ~");
//System.out.println("Blockcounts not ~");
++errorcount;
continue rxLoop;
}
if (character != blocknumber) {
errStream.println( "Wrong blocknumber");
//System.out.println( "Wrong blocknumber");
++errorcount;
continue rxLoop;
}
checksum = 0;
for (index = 0; index < SECSIZE; index++) {
sector[index] = getchar(com);
checksum += sector[index];
}
if (checksum != getchar(com)) {
errStream.println( "Bad checksum");
//System.out.println( "Bad checksum");
errorcount++;
continue rxLoop;
}
putchar(com, ACK);
blocknumber++;
try {
foo.write(sector);
} catch (IOException e) {
errStream.println("write failed, blocknumber " + blocknumber);
//System.out.println("write failed, blocknumber " + blocknumber);
}
} finally {
if (errorcount != 0)
putchar(com, NAK);
}
}
} while (character != EOT);
foo.close();
putchar(com, ACK); /* tell the other end we accepted his EOT */
putchar(com, ACK);
putchar(com, ACK);
errStream.println("Receive Completed.");
//System.out.println("Receive Completed.");
return true;
}
protected byte getchar(Com com) throws Exception {
return (byte)com.receiveSingleDataInt();
// return (byte)inStream.read();
}
protected void putchar(Com com, int c) throws Exception {
com.sendSingleData(c);
// outStream.write(c);
}
protected void xerror()
{
errStream.println("too many errors...aborting");
//System.out.println("too many errors...aborting");
die(1);
}
protected void die(int how)
{
if (standalone)
System.exit(how);
else
System.out.println(("Error code " + how));
}
}