Reading files bits and saving them - java

I have a file reader which reads an entire file and writes out its bits.
I have this class which helps with reading:
import java.io.*;
public class FileReader extends ByteArrayInputStream{
private int bitsRead;
private int bitPosition;
private int currentByte;
private int myMark;
private final static int NUM_BITS_IN_BYTE = 8;
private final static int END_POSITION = -1;
private boolean readingStarted;
/**
* Create a BitInputStream for a File on disk.
*/
public FileReader( byte[] buf ) throws IOException {
super( buf );
myMark = 0;
bitsRead = 0;
bitPosition = NUM_BITS_IN_BYTE-1;
currentByte = 0;
readingStarted = false;
}
/**
* Read a binary "1" or "0" from the File.
*/
public int readBit() throws IOException {
int theBit = -1;
if( bitPosition == END_POSITION || !readingStarted ) {
currentByte = super.read();
bitPosition = NUM_BITS_IN_BYTE-1;
readingStarted = true;
}
theBit = (0x01 << bitPosition) & currentByte;
bitPosition--;
if( theBit > 0 ) {
theBit = 1;
}
return( theBit );
}
/**
* Return the next byte in the File as lowest 8 bits of int.
*/
public int read() {
currentByte = super.read();
bitPosition = END_POSITION;
readingStarted = true;
return( currentByte );
}
/**
*
*/
public void mark( int readAheadLimit ) {
super.mark(readAheadLimit);
myMark = bitPosition;
}
/**
* Add needed functionality to super's reset() method. Reset to
* the last valid position marked in the input stream.
*/
public void reset() {
super.pos = super.mark-1;
currentByte = super.read();
bitPosition = myMark;
}
/**
* Returns the number of bits still available to be read.
*/
public int availableBits() throws IOException {
return( ((super.available() * 8) + (bitPosition + 1)) );
}
}
In the class where I call this, I do:
FileInputStream inputStream = new FileInputStream(file);
byte[] fileBits = new byte[inputStream.available()];
inputStream.read(fileBits, 0, inputStream.available());
inputStream.close();
FileReader bitIn = new FileReader(fileBits);
and this works correctly.
However, I have problems with big files above 100 MB, because a byte[] has a size limit and the whole file must fit in memory.
So I want to read bigger files. Could someone suggest how I can improve this code?
Thanks.

If scaling to large file sizes is important, you'd be better off not reading the entire file into memory. The downside is that handling the IOException in more locations can be a little messy. Also, it doesn't look like your application needs something that implements the InputStream API, it just needs the readBit() method. So, you can safely encapsulate, rather than extend, the InputStream.
class FileReader {
private final InputStream src;
private final byte[] bits = new byte[8192];
private int len;
private int pos;
FileReader(InputStream src) {
this.src = src;
}
int readBit() throws IOException {
int idx = pos / 8;
if (idx >= len) {
int n = src.read(bits);
if (n < 0)
return -1;
len = n;
pos = 0;
idx = 0;
}
// Most significant bit first, matching the readBit() in the question.
return (bits[idx] >> (7 - (pos++ % 8))) & 1;
}
}
Usage would look similar.
FileInputStream src = new FileInputStream(file);
try {
FileReader bitIn = new FileReader(src);
...
} finally {
src.close();
}
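The readBit() in the question returns the most significant bit of each byte first. A minimal sketch of a streaming reader that keeps that order, assuming only readBit() is needed; the class name MsbBitReader is hypothetical and the input is wrapped in a BufferedInputStream so that single-byte reads stay cheap:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical name: streams bits without loading the whole file into memory.
class MsbBitReader {
    private final InputStream in;
    private int currentByte;   // last byte read, 0..255
    private int bitsLeft = 0;  // unread bits remaining in currentByte

    MsbBitReader(InputStream src) {
        this.in = new BufferedInputStream(src);
    }

    /** Returns the next bit (0 or 1), most significant bit first, or -1 at end of stream. */
    int readBit() throws IOException {
        if (bitsLeft == 0) {
            currentByte = in.read();
            if (currentByte < 0) {
                return -1;
            }
            bitsLeft = 8;
        }
        bitsLeft--;
        return (currentByte >> bitsLeft) & 1;
    }

    public static void main(String[] args) throws IOException {
        MsbBitReader r = new MsbBitReader(new ByteArrayInputStream(new byte[] {(byte) 0xA5}));
        StringBuilder sb = new StringBuilder();
        for (int b = r.readBit(); b != -1; b = r.readBit()) {
            sb.append(b);
        }
        System.out.println(sb); // 0xA5 is 10100101 in binary
    }
}
```

For a real file, the constructor argument would be a FileInputStream opened and closed by the caller, as in the usage above.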
If you really do want to read in the entire file, and you are working with an actual file, you can query the length of the file first.
File file = new File(path);
if (file.length() > Integer.MAX_VALUE)
throw new IllegalArgumentException("File is too large: " + file.length());
int len = (int) file.length();
FileInputStream inputStream = new FileInputStream(file);
try {
byte[] fileBits = new byte[len];
for (int pos = 0; pos < len; ) {
int n = inputStream.read(fileBits, pos, len - pos);
if (n < 0)
throw new EOFException();
pos += n;
}
/* Use bits. */
...
} finally {
inputStream.close();
}

org.apache.commons.io.IOUtils.copy(InputStream in, OutputStream out)
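IOUtils.copy from Apache Commons IO copies everything from an input stream to an output stream. If adding the dependency is not an option, a plain buffered loop does the same job. The following is only a sketch of that idea, not the Commons IO implementation; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    /** Copies all bytes from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 only at end of stream; n may be smaller than the buffer.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream("some bytes".getBytes()), out);
        System.out.println(copied + " bytes copied");
    }
}
```

Because only one buffer is held at a time, memory use does not grow with the size of the file.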

Related

Incremental string decoding in Java

Suppose I receive bytes in chunks and I want to efficiently decode them to a string (that is going to be Unicode obviously), also I want to know, as soon as I can, if that string begins with a certain sequence.
One way could be:
public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
byte[] buff = new byte[1024];
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int len;
while ((len = inputStream.read(buff)) > 0){
byteArrayOutputStream.write(buff, 0, len);
String decoded = new String(byteArrayOutputStream.toByteArray(), Charset.defaultCharset());
if (decoded.startsWith(match)){
return true;
}
}
return false;
}
but this involves allocating a new array from the byteArrayOutputStream every time there is a new chunk and String will do another copy in the constructor. All this seems to me pretty inefficient. Also string will do a decode of the bytes in the constructor, every single time, doing it from the beginning once again.
How can I make this process faster?
Actually you don't need a ByteArrayOutputStream at all.
First turn your String match into a byte[], using your desired encoding.
Then just compare each incoming chunk with the next part of that array:
public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
byte[] compare = match.getBytes(Charset.defaultCharset());
int n = compare.length;
int compareAt = 0;
byte[] buff = new byte[n];
int len;
while (compareAt < n && (len = inputStream.read(buff, 0, n-compareAt)) > 0) {
for (int i=0; i < len && compareAt < n; i++, compareAt++) {
if (compare[compareAt] != buff[i]) {
// found contradicting byte
return false;
}
}
}
// No byte was found which contradicts that the streamed data begins with compare.
// Did we actually read enough bytes?
return compareAt >= n;
}
You might find this version more readable:
public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
byte[] compare = match.getBytes(Charset.defaultCharset());
int n = compare.length;
int compareAt = 0;
byte[] buff = new byte[n];
int len;
while (compareAt < n && (len = inputStream.read(buff, 0, n-compareAt)) > 0) {
if (!isSubArray(compare, compareAt, buff, len)) {
return false;
}
compareAt += len;
}
return compareAt >= n;
}
private boolean isSubArray(byte[] searchIn, int searchInOffset, byte[] searchFor, int searchForLength)
{
if (searchInOffset + searchForLength > searchIn.length) {
// can not match
return false;
}
for (int i=0; i < searchForLength; i++) {
if (searchIn[searchInOffset+i] != searchFor[i]) {
return false;
}
}
return true;
}
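Comparing raw bytes only works if the stream and the search string use the same encoding. A minimal sketch that makes the charset an explicit parameter instead of relying on the platform default; the names PrefixMatch and startsWith are hypothetical, and it assumes an encoding such as UTF-8 in which a string always encodes to the same bytes without a byte order mark:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PrefixMatch {
    /** Returns true if the stream starts with the bytes of prefix in the given charset. */
    static boolean startsWith(InputStream in, String prefix, Charset charset) throws IOException {
        byte[] expected = prefix.getBytes(charset);
        byte[] buf = new byte[Math.max(1, expected.length)];
        int matched = 0;
        while (matched < expected.length) {
            // Never read past the prefix, so the rest of the stream stays unconsumed.
            int n = in.read(buf, 0, expected.length - matched);
            if (n < 0) {
                return false; // stream ended before the prefix was complete
            }
            for (int i = 0; i < n; i++) {
                if (buf[i] != expected[matched++]) {
                    return false; // first contradicting byte
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("GET /index.html".getBytes(StandardCharsets.UTF_8));
        System.out.println(startsWith(in, "GET ", StandardCharsets.UTF_8));
    }
}
```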

Split large file into chunks

I have a method which accepts a file and a chunk size and returns a list of chunk files. The main problem is that a line in the file can be broken across two chunks. For example, in the main file I have these lines:
|1|aaa|bbb|ccc|
|2|ggg|ddd|eee|
After split I could have in one file:
|1|aaa|bbb
In another file:
|ccc|2|
|ggg|ddd|eee|
Here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
int counter = 1;
List<File> files = new ArrayList<>();
int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
byte[] buffer = new byte[sizeOfChunk];
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
String name = file.getName();
int tmp = 0;
while ((tmp = bis.read(buffer)) > 0) {
File newFile = new File(file.getParent(), name + "."
+ String.format("%03d", counter++));
try (FileOutputStream out = new FileOutputStream(newFile)) {
out.write(buffer, 0, tmp);
}
files.add(newFile);
}
}
return files;
}
Should I use the RandomAccessFile class for this purpose (the main file is really big, more than 5 GB)?
If you don't mind having chunks of different lengths (each at most sizeOfChunk, but as close to it as whole lines allow), then here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
int counter = 1;
List<File> files = new ArrayList<File>();
int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
String eof = System.lineSeparator();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String name = file.getName();
String line = br.readLine();
while (line != null) {
File newFile = new File(file.getParent(), name + "."
+ String.format("%03d", counter++));
try (OutputStream out = new BufferedOutputStream(new FileOutputStream(newFile))) {
int fileSize = 0;
while (line != null) {
byte[] bytes = (line + eof).getBytes(Charset.defaultCharset());
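// Always write at least one line per file, so an oversized line cannot loop forever.
if (fileSize > 0 && fileSize + bytes.length > sizeOfChunk)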
break;
out.write(bytes);
fileSize += bytes.length;
line = br.readLine();
}
}
files.add(newFile);
}
}
return files;
}
The only problem here is the file charset, which is the default system charset in this example. If you want to be able to change it, let me know and I'll add a third parameter to the splitFile function for it.
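A minimal sketch of what such a charset parameter could look like. This is not the answerer's code: the names LineSplitter and split are hypothetical, the chunk size is given in bytes rather than megabytes to keep the example small, and it assumes a charset without a byte order mark, such as UTF-8:

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    /** Splits source into parts of at most maxChunkBytes each, never breaking a line. */
    static List<Path> split(Path source, long maxChunkBytes, Charset charset) throws IOException {
        List<Path> parts = new ArrayList<>();
        byte[] newline = System.lineSeparator().getBytes(charset);
        try (BufferedReader reader = Files.newBufferedReader(source, charset)) {
            String line = reader.readLine();
            int counter = 1;
            while (line != null) {
                Path part = source.resolveSibling(
                        source.getFileName() + "." + String.format("%03d", counter++));
                try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(part))) {
                    long written = 0;
                    while (line != null) {
                        byte[] bytes = line.getBytes(charset);
                        long size = bytes.length + newline.length;
                        // The first line of a part is always accepted, even if oversized,
                        // so a very long line cannot make the outer loop spin forever.
                        if (written > 0 && written + size > maxChunkBytes) {
                            break;
                        }
                        out.write(bytes);
                        out.write(newline);
                        written += size;
                        line = reader.readLine();
                    }
                }
                parts.add(part);
            }
        }
        return parts;
    }
}
```

A call such as LineSplitter.split(path, 100L * 1024 * 1024, StandardCharsets.UTF_8) would produce parts of roughly 100 MB. Since the source is read line by line, its total size does not matter, so RandomAccessFile is not required for this.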
Just in case anyone is interested in a Kotlin version.
It creates an iterator of ByteArray chunks:
class ByteArrayReader(val input: InputStream, val chunkSize: Int, val bufferSize: Int = 1024*8): Iterator<ByteArray> {
var eof: Boolean = false
init {
if ((chunkSize % bufferSize) != 0) {
throw RuntimeException("ChunkSize(${chunkSize}) should be a multiple of bufferSize (${bufferSize})")
}
}
override fun hasNext(): Boolean = !eof
override fun next(): ByteArray {
var buffer = ByteArray(bufferSize)
var chunkWriter = ByteArrayOutputStream(chunkSize) // no need to close - implementation is empty
var bytesRead = 0
var offset = 0
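while (input.read(buffer, 0, minOf(bufferSize, chunkSize - offset)).also { bytesRead = it } > 0) { // never read past the chunk boundary, even after a short read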
if (chunkWriter.use { out ->
out.write(buffer, 0, bytesRead)
out.flush()
offset += bytesRead
offset == chunkSize
}) {
return chunkWriter.toByteArray()
}
}
eof = true
return chunkWriter.toByteArray()
}
}
Split a file into multiple chunks (in-memory operation). Here I'm splitting a file into chunks of 500 KB (500000 bytes) and adding them to a list:
public static List<ByteArrayOutputStream> splitFile(File f) {
List<ByteArrayOutputStream> datalist = new ArrayList<>();
try {
int sizeOfFiles = 500000;
byte[] buffer = new byte[sizeOfFiles];
try (FileInputStream fis = new FileInputStream(f); BufferedInputStream bis = new BufferedInputStream(fis)) {
int bytesAmount = 0;
while ((bytesAmount = bis.read(buffer)) > 0) {
try (OutputStream out = new ByteArrayOutputStream()) {
out.write(buffer, 0, bytesAmount);
out.flush();
datalist.add((ByteArrayOutputStream) out);
}
}
}
} catch (Exception e) {
//get the error
}
return datalist;
}
Split a file into chunks according to your chunk size:
val f = FileInputStream(file)
val data = ByteArray(f.available()) // Size of original file
var subData: ByteArray
f.read(data)
var start = 0
var end = CHUNK_SIZE
val max = data.size
if (max > 0) {
while (end < max) {
subData = data.copyOfRange(start, end)
start = end
end += CHUNK_SIZE
if (end >= max) {
end = max
}
//Function to upload your chunk
uploadFileInChunk(subData, isLast = false)
}
// For the last chunk: copy through the end of the array so the final byte is not dropped
subData = data.copyOfRange(start, max)
uploadFileInChunk(subData, isLast = true)
}
If you receive the file from the user through an intent, you may get the file URI as a content URI. In that case:
Uri uri = data.getData();
InputStream inputStream = getContext().getContentResolver().openInputStream(uri);
fileInBytes = IOUtils.toByteArray(inputStream);
Add the dependency to your build.gradle to use IOUtils:
implementation 'commons-io:commons-io:2.11.0'
Now make a small modification to the above code to send your file to the server.
var subData: ByteArray
var start = 0
var end = CHUNK_SIZE
val max = fileInBytes.size
if (max > 0) {
while (end < max) {
subData = fileInBytes.copyOfRange(start, end)
start = end
end += CHUNK_SIZE
if (end >= max) {
end = max
}
uploadFileInChunk(subData, isLast = false)
}
// For the last chunk: copy through the end of the array so the final byte is not dropped
subData = fileInBytes.copyOfRange(start, max)
uploadFileInChunk(subData, isLast = true)
}
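The index bookkeeping in loops like the ones above is easy to get wrong around the last chunk. A minimal Java sketch of the same in-memory chunking that derives each end index with Math.min, so the final chunk simply runs to the end of the array; the names Chunker and chunks are hypothetical, and the upload call is replaced by a print:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Chunker {
    /** Splits data into consecutive chunks of chunkSize bytes; the last one may be shorter. */
    static List<byte[]> chunks(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> result = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            // Clamp the end index to the array length for the final chunk.
            int end = start + Math.min(chunkSize, data.length - start);
            result.add(Arrays.copyOfRange(data, start, end));
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> parts = chunks(new byte[10], 4);
        for (int i = 0; i < parts.size(); i++) {
            boolean isLast = (i == parts.size() - 1);
            System.out.println("chunk " + i + ": " + parts.get(i).length + " bytes, isLast=" + isLast);
        }
    }
}
```

This still holds the whole file in memory, like the code above, so it only suits files that fit comfortably in the heap.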

Fast read text file character-by-character(java)

Sorry for my English. I am trying to read a big text file character by character as fast as possible (without using readLine()), but I have not managed to make it fast yet. My code:
for(int i = 0; (i = textReader.read()) != -1; ) {
char character = (char) i;
}
It reads a 1 GB text file in 56666 ms. How can I read it faster?
Update:
This method reads a 1 GB file in 28833 ms:
FileInputStream fIn = null;
FileChannel fChan = null;
ByteBuffer mBuf;
int count;
try {
fIn = new FileInputStream(textReader);
fChan = fIn.getChannel();
mBuf = ByteBuffer.allocate(128);
do {
count = fChan.read(mBuf);
if(count != -1) {
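mBuf.flip(); // switch the buffer from filling to draining
for(int i = 0; i < count; i++) {
char c = (char)mBuf.get();
}
mBuf.clear(); // make the whole buffer available for the next read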
}
} while(count != -1);
}catch(Exception e) {
}
The fastest way to read input is to use a buffer. Here is an example of a class that has an internal buffer.
class Parser
{
final private int BUFFER_SIZE = 1 << 16;
private DataInputStream din;
private byte[] buffer;
private int bufferPointer, bytesRead;
public Parser(InputStream in)
{
din = new DataInputStream(in);
buffer = new byte[BUFFER_SIZE];
bufferPointer = bytesRead = 0;
}
public int nextInt() throws Exception
{
int ret = 0;
byte c = read();
while (c <= ' ') c = read();
//boolean neg = c == '-';
//if (neg) c = read();
do
{
ret = ret * 10 + c - '0';
c = read();
} while (c > ' ');
//if (neg) return -ret;
return ret;
}
private void fillBuffer() throws Exception
{
bytesRead = din.read(buffer, bufferPointer = 0, BUFFER_SIZE);
if (bytesRead == -1) buffer[0] = -1;
}
public byte read() throws Exception
{
if (bufferPointer == bytesRead) fillBuffer();
return buffer[bufferPointer++];
}
}
This parser has a nextInt() function; if you want the next char, you can call the read() function.
This is the fastest way to read from a file (as far as I know).
You would initialize this parser like this:
Parser p = new Parser(new FileInputStream("text.txt"));
int c;
while((c = p.read()) != -1)
System.out.print((char)c);
This code reads 250 MB in 7782 ms.
Disclaimer:
the code is not mine, it has been posted as a solution to a problem on CodeChef by the user 'Kamalakannan CM'
I would use a BufferedReader, which buffers its reads. A short sample:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.CharBuffer;
public class Main {
public static void main(String... args) {
try (FileReader fr = new FileReader("a.txt")) {
try (BufferedReader reader = new BufferedReader(fr)) {
CharBuffer charBuffer = CharBuffer.allocate(8192);
reader.read(charBuffer);
} catch (IOException e) {
e.printStackTrace();
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The single-argument constructor uses a default buffer size of 8192 characters. If you want a different buffer size, you can use the BufferedReader(Reader in, int sz) constructor. Alternatively, you can read into an array buffer:
....
char[] buffer = new char[255];
reader.read(buffer);
....
or read one character at a time:
int ch = reader.read();
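Putting the array variant into a complete loop: the sketch below visits every character but pulls them from the reader in blocks instead of making one read() call per character. It is only an illustration, with hypothetical names (CharScanner, count) and with counting a target character standing in for the real per-character work:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CharScanner {
    /** Counts occurrences of target, reading the input in blocks of 8192 chars. */
    static long count(Reader reader, char target) throws IOException {
        char[] buffer = new char[8192];
        long hits = 0;
        int n;
        // read(char[]) returns the number of chars actually read, or -1 at end of input.
        while ((n = reader.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == target) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(new StringReader("banana"), 'a'));
    }
}
```

For a file, the Reader would typically be a BufferedReader over a FileReader, as in the sample above. How much faster this is than the single-character loop depends on the machine and the JVM, so it is worth measuring on the actual file.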

Removing ASCII characters in a string with encoding

I have a byte array which is filled by a serial port event and code is shown below:
private InputStream input = null;
......
......
public void SerialEvent(SerialEvent se){
if(se.getEventType() == SerialPortEvent.DATA_AVAILABLE){
int length = input.available();
if(length > 0){
byte[] array = new byte[length];
int numBytes = input.read(array);
String text = new String(array);
}
}
}
The variable text contains the below characters,
"\033[K", "\033[m", "\033[H2J", "\033[6;1H" ,"\033[?12l", "\033[?25h", "\033[5i", "\033[4i", "\033i" and similar types..
As of now, I use String.replace to remove all these characters from the string.
I have tried new String(array, charset) with every available charset, but I was not able to remove them that way.
Is there any way where I can remove those characters without using replace method?
I first gave an unsatisfying answer; thanks to @OlegEstekhin for pointing that out.
As no one else has answered yet, and a solution is not a two-liner, here it goes.
Make a wrapping InputStream that throws away escape sequences. I have used a PushbackInputStream, so that a partially read sequence that turns out not to be an escape sequence can be pushed back and read again. A FilterInputStream would also suffice here.
public class EscapeRemovingInputStream extends PushbackInputStream {
public static void main(String[] args) {
String s = "\u001B[kHello \u001B[H12JWorld!";
byte[] buf = s.getBytes(StandardCharsets.ISO_8859_1);
ByteArrayInputStream bais = new ByteArrayInputStream(buf);
EscapeRemovingInputStream bin = new EscapeRemovingInputStream(bais);
try (InputStreamReader in = new InputStreamReader(bin,
StandardCharsets.ISO_8859_1)) {
int c;
while ((c = in.read()) != -1) {
System.out.print((char) c);
}
System.out.println();
} catch (IOException ex) {
Logger.getLogger(EscapeRemovingInputStream.class.getName()).log(
Level.SEVERE, null, ex);
}
}
private static final Pattern ESCAPE_PATTERN = Pattern.compile(
"\u001B\\[([Kk]|m|H\\d+J|\\d+[;:]\\d+H|\\?\\d+\\w|\\d*i)");
private static final int MAX_ESCAPE_LENGTH = 20;
private final byte[] escapeSequence = new byte[MAX_ESCAPE_LENGTH];
private int escapeLength = 0;
private boolean eof = false;
public EscapeRemovingInputStream(InputStream in) {
super(in, MAX_ESCAPE_LENGTH);
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
for (int i = 0; i < len; ++i) {
int c = read();
if (c == -1) {
return i == 0 ? -1 : i;
}
b[off + i] = (byte) c;
}
return len;
}
@Override
public int read() throws IOException {
int c = eof ? -1 : super.read();
if (c == -1) { // Throw away a trailing half escape sequence.
eof = true;
return c;
}
if (escapeLength == 0 && c != 0x1B) {
return c;
} else {
escapeSequence[escapeLength] = (byte) c;
++escapeLength;
String esc = new String(escapeSequence, 0, escapeLength,
StandardCharsets.ISO_8859_1);
if (ESCAPE_PATTERN.matcher(esc).matches()) {
escapeLength = 0;
} else if (escapeLength == MAX_ESCAPE_LENGTH) {
escapeLength = 0;
unread(escapeSequence);
return super.read(); // No longer registering the escape
}
return read();
}
}
}
The control flow is as follows:
The user calls EscapeRemovingInputStream.read().
That read may itself call read() several times to fill the byte buffer escapeSequence
(a push-back may be done by calling unread()).
Then the original read returns.
The recognition of an escape sequence seems grammatical: command letter, numerical argument(s). Hence I use a regular expression.
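If the data has already been decoded into a String, the same grammar can be applied in one pass with a single regular expression instead of one String.replace call per sequence. A minimal sketch, assuming every unwanted sequence has the shape ESC, '[', optional digits, ';' or '?' characters, and one final letter; sequences of another shape, such as "\033i", are not covered. The class name AnsiStripper is hypothetical:

```java
import java.util.regex.Pattern;

class AnsiStripper {
    // Assumed shape: ESC '[' followed by parameter characters and one final letter.
    private static final Pattern CSI = Pattern.compile("\u001B\\[[0-9;?]*[A-Za-z]");

    /** Removes every substring matching the assumed escape-sequence shape. */
    static String strip(String text) {
        return CSI.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("\u001B[KHello \u001B[6;1HWorld\u001B[?25h!"));
    }
}
```

Unlike the stream wrapper above, this cannot handle an escape sequence that is split across two serial events, because each String is processed on its own.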

File transfer over socket using JAVA

I was searching for Java code to send multiple files over a socket, and I found this code, which consists of a TX main, an RX main and a class that I assume does all the dirty work. The code runs with no errors, but I have some questions for the experts:
Where exactly in the code does the user specify the files that he/she wants to send to the server?
And in the server main, where does the server store the received file, and under what name?
Where exactly in this code (TX / RX / ByteStream) should I make changes to specify which file goes in?
I would like to enter the filename myself on the client (TX) side, and later on I would add a JFileChooser so the user can select graphically which file to send.
package file_rx;
import java.io.*;
import java.net.*;
public class File_RX implements Runnable
{
private static final int port = 4711;
private Socket socket;
public static void main(String[] args)
{
try
{
ServerSocket listener = new ServerSocket(port);
while (true)
{
File_RX file_rec = new File_RX();
file_rec.socket = listener.accept();
new Thread(file_rec).start();
}
}
catch (java.lang.Exception e)
{
e.printStackTrace(System.out);
}
}
public void run()
{
try
{
InputStream in = socket.getInputStream();
int nof_files = ByteStream.toInt(in);
for (int cur_file = 0; cur_file < nof_files; cur_file++)
{
String file_name = ByteStream.toString(in);
File file = new File(file_name);
ByteStream.toFile(in, file);
}
}
catch (java.lang.Exception e)
{
e.printStackTrace(System.out);
}
}
}
package file_tx;
import java.io.*;
import java.net.*;
public class File_TX
{
private static final int port = 4711;
private static final String host = "localhost";
public static void main(String[] args)
{
try
{
Socket socket = new Socket(host, port);
OutputStream os = socket.getOutputStream();
int cnt_files = args.length;
ByteStream.toStream(os, cnt_files);
for (int cur_file = 0; cur_file < cnt_files; cur_file++)
{
ByteStream.toStream(os, args[cur_file]);
ByteStream.toStream(os, new File(args[cur_file]));
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
package file_rx;
import java.io.*;
public class ByteStream
{
private static byte[] toByteArray(int in_int)
{
byte a[] = new byte[4];
for (int i = 0; i < 4; i++)
{
int b_int = (in_int >> (i*8)) & 255;
byte b = (byte) (b_int);
a[i] = b;
}
return a;
}
private static int toInt(byte[] byte_array_4)
{
int ret = 0;
for (int i = 0; i < 4; i++)
{
int b = (int) byte_array_4[i];
if (i < 3 && b < 0)
{
b = 256 + b;
}
ret += b << (i * 8);
}
return ret;
}
public static int toInt(InputStream in) throws java.io.IOException
{
byte[] byte_array_4 = new byte[4];
byte_array_4[0] = (byte)in.read();
byte_array_4[1] = (byte)in.read();
byte_array_4[2] = (byte)in.read();
byte_array_4[3] = (byte)in.read();
return toInt(byte_array_4);
}
public static String toString(InputStream ins) throws java.io.IOException
{
int len = toInt(ins);
return toString(ins, len);
}
private static String toString(InputStream ins, int len) throws java.io.IOException
{
String ret = new String();
for (int i = 0; i < len; i++)
{
ret += (char) ins.read();
}
return ret;
}
public static void toStream(OutputStream os, int i) throws java.io.IOException
{
byte [] byte_array_4 = toByteArray(i);
os.write(byte_array_4);
}
public static void toStream(OutputStream os, String s) throws java.io.IOException
{
int len_s = s.length();
toStream(os, len_s);
for (int i = 0; i < len_s; i++)
{
os.write((byte) s.charAt(i));
}
os.flush();
}
private static byte[] toByteArray(InputStream ins, int an_int) throws java.io.IOException
{
byte[] ret = new byte[an_int];
int offset = 0;
int numRead = 0;
int outstanding = an_int;
while ((offset < an_int) && (numRead = ins.read(ret, offset, outstanding)) > 0)
{
offset += numRead;
outstanding = an_int - offset;
}
if (offset < ret.length)
{
//throw new Exception("Could not completely read from stream, numRead =" + numRead + ", ret.lenght = " + ret.length);
}
return ret;
}
private static void toFile(InputStream ins, FileOutputStream fos, int len, int buf_size) throws java.io.IOException, java.io.FileNotFoundException
{
byte[] buffer = new byte[buf_size];
int len_read = 0;
int total_len_read = 0;
while (total_len_read + buf_size <= len)
{
len_read = ins.read(buffer);
total_len_read += len_read;
fos.write(buffer, 0, len_read);
}
if (total_len_read < len)
{
toFile(ins, fos, len - total_len_read, buf_size / 2);
}
}
private static void toFile(InputStream ins, File file, int len) throws java.io.IOException, java.io.FileNotFoundException
{
// try-with-resources closes the file even if the transfer fails
try (FileOutputStream fos = new FileOutputStream(file))
{
toFile(ins, fos, len, 1024);
}
}
public static void toFile (InputStream ins, File file) throws java.io.IOException, java.io.FileNotFoundException
{
int len = toInt(ins);
toFile(ins, file, len);
}
public static void toStream(OutputStream os, File file) throws java.io.IOException, java.io.FileNotFoundException
{
toStream(os, (int) file.length());
byte b[] = new byte[1024];
// try-with-resources closes the source file once it has been sent
try (InputStream is = new FileInputStream(file))
{
int numRead = 0;
while ((numRead = is.read(b)) > 0)
{
os.write(b, 0, numRead);
}
}
os.flush();
}
}
The names (and paths) of the files to be transmitted are specified as command-line arguments to the main method of the File_TX class. On the server side (File_RX class), each file is created with new File(file_name), so a relative name is resolved against the current working directory of the server process (not the location of the File_RX.class file), using the same relative path as the argument passed on the client.
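To see where a received file would end up, it can help to print the absolute form of the path before writing. The sketch below shows that, plus one possible alternative that always stores files under a chosen directory and keeps only the last path component of the transmitted name. Both helper names and the "inbox" directory are hypothetical and not part of the original code:

```java
import java.io.File;

class ReceivedFileLocation {
    /** The location that new File(name) refers to; relative names resolve against the working directory. */
    static File defaultTarget(String name) {
        return new File(name).getAbsoluteFile();
    }

    /** Hypothetical alternative: store under dir, ignoring any directories in the transmitted name. */
    static File targetIn(File dir, String name) {
        return new File(dir, new File(name).getName());
    }

    public static void main(String[] args) {
        System.out.println(defaultTarget("report.txt"));
        System.out.println(targetIn(new File("inbox"), "some/dir/report.txt"));
    }
}
```

In File_RX.run(), the line File file = new File(file_name) is the place where such a helper could be used. On the client side, the args array in File_TX.main is the place to substitute names obtained another way, for example from a JFileChooser.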
