Java compressing a .txt file - java

I am trying to write a program that reads a compressed file, which is stored as raw bits, and converts those bits into strings of 0s and 1s.
My school provided a class with a method that reads one bit and returns it as a char. So to read one bit and convert it to a char, all I need to do is write this in my code:
char oneBit = inputFile.readBit();
in my main method.
How do I get my program to read every bit in the compressed file and convert each one to a char using the readBit method? And how would I turn all of those '0' and '1' chars into strings of 0s and 1s?
The readBit method:
public char readBit() {
char c = 0;
if (bitsRead == 8)
try {
if (in.available() > 0) { // We have not reached the end of the
// file
buffer = (char) in.read();
bitsRead = 0;
} else
return 0;
} catch (IOException e) {
System.out.println("Error reading from file ");
System.exit(0); // Terminate the program
}
// return next bit from the buffer; bit is converted first to char
if ((buffer & 128) == 0)
c = '0';
else
c = '1';
buffer = (char) (buffer << 1);
++bitsRead;
return c;
}
where in is the input file.

Try using this resource
Sample implementation.
public class BitAnswer {
final static int RADIX = 10;
public static void main(String[] args) {
BitInputStream bis = new BitInputStream("<file_name>");
int result = bis.readBit();
while( result != -1 ) {
System.out.print(Character.forDigit(result, RADIX));
result = bis.readBit();
}
System.out.println("\nAll bits read!");
}
}
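If the goal is simply one long String of '0' and '1' characters, a StringBuilder filled in a loop should be enough. The sketch below is a minimal, self-contained illustration and not the school-provided class: BitsToString and readAllBits are hypothetical names, and it reads bytes straight from an InputStream instead of calling readBit(). It tests each bit from the most significant end, the same way readBit does with (buffer & 128).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the school-provided code.
class BitsToString {
    /** Reads every byte from the stream and appends its 8 bits, most significant bit first. */
    static String readAllBits(InputStream in) throws IOException {
        StringBuilder bits = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {           // -1 means end of stream
            for (int mask = 128; mask != 0; mask >>= 1) {
                bits.append((b & mask) == 0 ? '0' : '1');
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xA5, 0x01};        // sample input instead of a real file
        System.out.println(readAllBits(new ByteArrayInputStream(data)));
    }
}
```

With the school's class the loop would have the same shape: call readBit() repeatedly and append each returned char to a StringBuilder until the method signals the end of the file (in the code shown in the question it returns the char value 0).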

public void compress(){
String inputFileName = "c://tmp//content.txt";
String outputFileName = "c://tmp//compressedContent.txt";
FileOutputStream fos = null;
StringBuilder sb = new StringBuilder();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
OutputStream outputStream= null;
try (BufferedReader br = new BufferedReader(new FileReader(new File(inputFileName)))) {
String line;
while ((line = br.readLine()) != null) {
sb.append(line).append('\n'); // readLine() strips the line terminator, so put one back
}
outputStream = new DeflaterOutputStream(byteArrayOutputStream); // GZIPOutputStream(byteArrayOutputStream) - use if you want unix .gz format
outputStream.write(sb.toString().getBytes());
outputStream.close(); // finish the deflater so the complete compressed data reaches the byte array
String compressedText = Base64.getEncoder().encodeToString(byteArrayOutputStream.toByteArray());
fos=new FileOutputStream(outputFileName);
fos.write(compressedText.getBytes());
System.out.println("done compress");
} catch (Exception e) {
e.printStackTrace();
}finally{
try{
if (outputStream != null) {
outputStream.close();
}
if (byteArrayOutputStream != null) {
byteArrayOutputStream.close();
}
if(fos != null){
fos.close();
}
}catch (Exception e) {
e.printStackTrace();
}
System.out.println("closed streams !!! ");
}
}
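For comparison, here is a shorter sketch of the same idea (deflate, then Base64) together with the reverse direction, so the result can be checked with a round trip. It is only an illustration under some assumptions: DeflateRoundTrip, compress and decompress are made-up names, the text is treated as UTF-8, and everything stays in memory instead of going through files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class for illustration only.
class DeflateRoundTrip {
    /** Deflates the text and returns the compressed bytes as a Base64 string. */
    static String compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream finishes the deflater before the bytes are read
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reverses compress(): Base64-decodes, then inflates back to text. */
    static String decompress(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String packed = compress("hello hello hello hello");
        System.out.println(packed);
        System.out.println(decompress(packed));
    }
}
```

The try-with-resources block matters here: the byte array is only complete once the DeflaterOutputStream has been closed or finished.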

Related

Given InputStream replace character and produce OutputStream

I have a lot of massive files that I need to convert to CSV by replacing certain characters.
I am looking for a reliable approach that, given an InputStream, returns an OutputStream with every character c1 replaced by c2.
The trick here is to read and write in parallel, because I can't fit a whole file in memory.
Do I need to run it in a separate thread if I want to read and write at the same time?
Thanks a lot for your advice.
To copy data from an input stream to an output stream, you write the data as you read it, either a byte (or character) at a time or a line at a time. No second thread is needed.
Here is an example that reads a file and converts all 'x' characters to 'y'.
BufferedInputStream in = new BufferedInputStream(new FileInputStream("input.dat"));
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("output.dat"));
int ch;
while((ch = in.read()) != -1) {
if (ch == 'x') ch = 'y';
out.write(ch);
}
out.close();
in.close();
Or, if you can use a Reader and process a line at a time, you can use this approach:
BufferedReader reader = new BufferedReader(new FileReader("input.dat"));
PrintWriter writer = new PrintWriter(
new BufferedOutputStream(new FileOutputStream("output.dat")));
String str;
while ((str = reader.readLine()) != null) {
str = str.replace('x', 'y'); // replace character at a time
str = str.replace("abc", "ABC"); // replace string sequence
writer.println(str);
}
writer.close();
reader.close();
BufferedInputStream and BufferedReader read ahead and, by default, keep 8K bytes or characters in a buffer for performance. Very large files can therefore be processed while holding only about 8K of data in memory at a time.
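The same copy loop can also be written against arbitrary streams with an explicit block buffer, which is closer to the "given an InputStream, produce an OutputStream" wording of the question. This is a rough sketch, not a definitive implementation: StreamReplace and copyReplacing are hypothetical names, and it assumes both characters are single-byte (ASCII) so that a byte-level comparison is safe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class StreamReplace {
    /** Copies in to out, replacing every byte equal to 'from' with 'to' (ASCII only). */
    static void copyReplacing(InputStream in, OutputStream out, char from, char to)
            throws IOException {
        byte[] buf = new byte[8192];              // only this block is held in memory
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == (byte) from) {
                    buf[i] = (byte) to;
                }
            }
            out.write(buf, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "a;b;c".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyReplacing(new ByteArrayInputStream(input), out, ';', ',');
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

Reading and writing alternate inside one loop on one thread, so memory use stays at the buffer size regardless of the file size.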
FileWriter writer = new FileWriter("Report.csv");
BufferedReader reader = new BufferedReader(new InputStreamReader(YOURSOURCE, Charsets.UTF_8));
String line;
while ((line = reader.readLine()) != null) {
line = line.replace(c1, c2); // String is immutable, so keep the result; c1 and c2 are char variables
writer.append(line);
writer.append('\n');
}
writer.flush();
writer.close();
You can find a related answer here: Filter (search and replace) array of bytes in an InputStream
I took @aioobe's answer in that thread and built a replacing input stream module in Java, which you can find in my GitHub gist: https://gist.github.com/lhr0909/e6ac2d6dd6752871eb57c4b083799947
Putting the source code here as well:
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;
/**
* Created by simon on 8/29/17.
*/
public class ReplacingInputStream extends FilterInputStream {
private Queue<Integer> inQueue, outQueue;
private final byte[] search, replacement;
public ReplacingInputStream(InputStream in, String search, String replacement) {
super(in);
this.inQueue = new LinkedList<>();
this.outQueue = new LinkedList<>();
this.search = search.getBytes();
this.replacement = replacement.getBytes();
}
private boolean isMatchFound() {
Iterator<Integer> iterator = inQueue.iterator();
for (byte b : search) {
if (!iterator.hasNext() || b != iterator.next()) {
return false;
}
}
return true;
}
private void readAhead() throws IOException {
// Work up some look-ahead.
while (inQueue.size() < search.length) {
int next = super.read();
inQueue.offer(next);
if (next == -1) {
break;
}
}
}
@Override
public int read() throws IOException {
// Next byte already determined.
while (outQueue.isEmpty()) {
readAhead();
if (isMatchFound()) {
for (byte a : search) {
inQueue.remove();
}
for (byte b : replacement) {
outQueue.offer((int) b);
}
} else {
outQueue.add(inQueue.remove());
}
}
return outQueue.remove();
}
@Override
public int read(byte b[]) throws IOException {
return read(b, 0, b.length);
}
// copied straight from the InputStream implementation, just needed to use `read()` from this class
@Override
public int read(byte b[], int off, int len) throws IOException {
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
int c = read();
if (c == -1) {
return -1;
}
b[off] = (byte)c;
int i = 1;
try {
for (; i < len ; i++) {
c = read();
if (c == -1) {
break;
}
b[off + i] = (byte)c;
}
} catch (IOException ee) {
}
return i;
}
}

Fast read text file character-by-character(java)

Sorry for my English. I am trying to read a large text file very fast, character by character (without using readLine()), but I have not managed it yet. My code:
for(int i = 0; (i = textReader.read()) != -1; ) {
char character = (char) i;
}
It reads a 1 GB text file in 56666 ms. How can I read it faster?
Update:
This method reads the 1 GB file in 28833 ms:
FileInputStream fIn = null;
FileChannel fChan = null;
ByteBuffer mBuf;
int count;
try {
fIn = new FileInputStream(textReader);
fChan = fIn.getChannel();
mBuf = ByteBuffer.allocate(128);
do {
count = fChan.read(mBuf);
if(count != -1) {
mBuf.rewind();
for(int i = 0; i < count; i++) {
char c = (char)mBuf.get();
}
mBuf.clear(); // reset position and limit so the next read can fill the buffer again
}
} while(count != -1);
}catch(Exception e) {
}
The fastest way to read input is to use a buffer. Here is an example of a class that has an internal buffer.
class Parser
{
final private int BUFFER_SIZE = 1 << 16;
private DataInputStream din;
private byte[] buffer;
private int bufferPointer, bytesRead;
public Parser(InputStream in)
{
din = new DataInputStream(in);
buffer = new byte[BUFFER_SIZE];
bufferPointer = bytesRead = 0;
}
public int nextInt() throws Exception
{
int ret = 0;
byte c = read();
while (c <= ' ') c = read();
//boolean neg = c == '-';
//if (neg) c = read();
do
{
ret = ret * 10 + c - '0';
c = read();
} while (c > ' ');
//if (neg) return -ret;
return ret;
}
private void fillBuffer() throws Exception
{
bytesRead = din.read(buffer, bufferPointer = 0, BUFFER_SIZE);
if (bytesRead == -1) buffer[0] = -1;
}
public byte read() throws Exception // public so that callers can use it directly; yields -1 at end of input
{
if (bufferPointer == bytesRead) fillBuffer();
return buffer[bufferPointer++];
}
}
This parser has a nextInt function; if you want the next char, you can call the read() function.
This is the fastest way to read from a file (as far as I know).
You would initialize this parser like this:
Parser p = new Parser(new FileInputStream("text.txt"));
int c;
while((c = p.read()) != -1)
System.out.print((char)c);
This code reads 250mb in 7782ms.
Disclaimer:
the code is not mine, it has been posted as a solution to a problem on CodeChef by the user 'Kamalakannan CM'
I would use BufferedReader, it reads buffered. A short sample:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.CharBuffer;
public class Main {
public static void main(String... args) {
try (FileReader fr = new FileReader("a.txt")) {
try (BufferedReader reader = new BufferedReader(fr)) {
CharBuffer charBuffer = CharBuffer.allocate(8192);
reader.read(charBuffer);
} catch (IOException e) {
e.printStackTrace();
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The single-argument constructor uses a default buffer size of 8192 characters. In case you want a different buffer size, you can use the BufferedReader(Reader, int) constructor. Alternatively, you can read into an array buffer:
....
char[] buffer = new char[255];
reader.read(buffer);
....
or read one character at a time:
int ch = reader.read(); // `char` is a reserved word and cannot be used as a variable name
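To process a whole file this way, the array read is usually put in a loop and the characters are handled block by block. The following is only a sketch of that pattern: ChunkedCharReader and countChars are hypothetical names, the 64K buffer size is an arbitrary choice, and counting occurrences of one character stands in for whatever per-character work is actually needed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class for illustration only.
class ChunkedCharReader {
    /** Reads the whole reader in blocks and counts how often 'target' occurs. */
    static long countChars(Reader reader, char target) throws IOException {
        char[] buf = new char[1 << 16];           // 64K chars per read call
        long count = 0;
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < n; i++) {         // only the first n chars are valid
                if (buf[i] == target) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader here.
        System.out.println(countChars(new StringReader("banana"), 'a'));
    }
}
```

Each character is still visited, but the number of read calls drops from one per character to one per block, which is typically where most of the time goes.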

Read large file error "outofmemoryerror"(java)

Sorry for my English. I want to read a large file, but an OutOfMemoryError occurs while reading it. I do not understand how to work with memory in the application. The following code does not work:
try {
StringBuilder fileData = new StringBuilder(1000);
BufferedReader reader = new BufferedReader(new FileReader(file));
char[] buf = new char[8192];
int bytesread = 0,
bytesBuffered = 0;
while( (bytesread = reader.read( buf )) > -1 ) {
String readData = String.valueOf(buf, 0, bytesread);
bytesBuffered += bytesread;
fileData.append(readData); //this is error
if (bytesBuffered > 1024 * 1024) {
bytesBuffered = 0;
}
}
System.out.println(fileData.toString().toCharArray());
} finally {
}
You need to preallocate a large buffer to avoid reallocation.
File file = ...;
StringBuilder fileData = new StringBuilder((int) file.length()); // File has length(), not size(); the cast assumes the file is smaller than 2 GB
And run with a larger heap size:
java -Xmx2G
==== update
A while loop that uses a buffer doesn't need much memory to run. Treat the input as a stream and match your search string against the stream. It's a really simple state machine. If you need to search for multiple words, you can find a trie implementation (one that supports streams) for that.
// the match state model
...xxxxxxabxxxxxaxxxxxabcdexxxx...
ab a abcd
File file = new File("path_to_your_file");
String yourSearchWord = "abcd";
int matchIndex = 0;
boolean matchPrefix = false;
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
int chr;
while ((chr = reader.read()) != -1) {
if (matchPrefix == false) {
char searchChar = yourSearchWord.charAt(0);
if (chr == searchChar) {
matchPrefix = true;
matchIndex = 0;
}
} else {
char searchChar = yourSearchWord.charAt(++matchIndex);
if (chr == searchChar) {
if (matchIndex == yourSearchWord.length() - 1) {
// match!!
System.out.println("match: " + matchIndex);
matchPrefix = false;
matchIndex = 0;
}
} else {
matchPrefix = false;
matchIndex = 0;
}
}
}
}
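One thing to watch with a hand-written state machine like this is what happens after a partial match fails: for input such as "aabcd" the second 'a' has to be reconsidered as a possible start of the word. A common way to handle that is a Knuth-Morris-Pratt style failure table. The sketch below swaps in that technique and is not the code above: StreamSearch and indexOf are hypothetical names, and it returns the character offset of the first match only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class for illustration only.
class StreamSearch {
    /** Returns the 0-based char offset of the first match, or -1 if the word never occurs. */
    static long indexOf(Reader reader, String word) throws IOException {
        if (word.isEmpty()) {
            return 0;
        }
        // failure[i] = length of the longest proper prefix of word[0..i] that is also a suffix of it
        int[] failure = new int[word.length()];
        for (int i = 1, k = 0; i < word.length(); i++) {
            while (k > 0 && word.charAt(i) != word.charAt(k)) {
                k = failure[k - 1];
            }
            if (word.charAt(i) == word.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        int matched = 0;                          // how many chars of word currently match
        long pos = 0;                             // chars consumed so far
        int chr;
        while ((chr = reader.read()) != -1) {
            while (matched > 0 && chr != word.charAt(matched)) {
                matched = failure[matched - 1];   // fall back instead of restarting from zero
            }
            if (chr == word.charAt(matched)) {
                matched++;
            }
            pos++;
            if (matched == word.length()) {
                return pos - word.length();
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(indexOf(new StringReader("xxaabcdxx"), "abcd"));
    }
}
```

For a real file the Reader would be a BufferedReader around a FileReader, as in the answer; memory use is limited to the search word, the table and the reader's buffer.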
Try this. This might be helpful :-
try{
BufferedReader reader = new BufferedReader(new FileReader(file));
String txt = "";
while( (txt = reader.readLine()) != null){ // readLine() returns a String, or null at end of file
System.out.println(txt);
}
}catch(Exception e){
System.out.println("Error : "+e.getMessage());
}
You should not hold such big files in memory, because you will run out of it, as you can see. Since you use Java 7, you need to read the file manually as a stream and check the content on the fly. Otherwise you could use the Stream API of Java 8. This is just an example. It works, but keep in mind that the position of the found word could vary due to encoding issues, so this is not production code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
public class FileReader
{
private static String wordToFind = "SEARCHED_WORD";
private static File file = new File("YOUR_FILE");
private static int currentMatchingPosition;
private static int foundAtPosition = -1;
private static int charsRead;
public static void main(String[] args) throws IOException
{
try (FileInputStream fis = new FileInputStream(file))
{
System.out.println("Total size to read (in bytes) : " + fis.available());
int c;
while ((c = fis.read()) != -1)
{
charsRead++;
checkContent(c);
}
if (foundAtPosition > -1)
{
System.out.println("Found word at position: " + (foundAtPosition - wordToFind.length()));
}
else
{
System.out.println("Didn't find the word!");
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
private static void checkContent(int c)
{
if (currentMatchingPosition >= wordToFind.length())
{
//already found....
return;
}
if (wordToFind.charAt(currentMatchingPosition) == (char)c)
{
foundAtPosition = charsRead;
currentMatchingPosition++;
}
else
{
currentMatchingPosition = 0;
foundAtPosition = -1;
}
}
}
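For the Java 8 route mentioned above, a lazily evaluated line stream keeps only the current line in memory. This is a minimal sketch under a few assumptions: it needs Java 8 or later, LineSearch and containsWord are hypothetical names, the file is read as UTF-8, and a word that is split across a line break will not be found.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper class for illustration only.
class LineSearch {
    /** Returns true if any line of the file contains the word; stops at the first hit. */
    static boolean containsWord(Path file, String word) throws IOException {
        // Files.lines reads lazily, so the whole file is never loaded at once.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.anyMatch(line -> line.contains(word));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineSearch <file> <word>
        System.out.println(containsWord(Paths.get(args[0]), args[1]));
    }
}
```

The try-with-resources block closes the underlying file once the search has finished.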

writing in text file in java

I have a text file that contains a large number of words, and I want to divide the words by writing ** after every 4 words.
What I have done so far is add the first ** (after the first 4 words), and I am having some difficulty adding the other stars.
Here is my code so far (I am using Java):
import java.io.*;
public class Insert {
public static void main(String args[]){
try {
INSERT In = new INSERT();
int tc=4;
In.insertStringInFile (new File("D:/Users//im080828/Desktop/Souad/project/reduction/weight/outdata/d.txt"), tc, "**");
}
catch (Exception e) {
e.printStackTrace();
}
}
public void insertStringInFile(File inFile, int lineno, String lineToBeInserted)
throws Exception {
// temp file
File outFile = new File("$$$$$$$$.tmp");
// input
FileInputStream fis = new FileInputStream(inFile);
BufferedReader in = new BufferedReader
(new InputStreamReader(fis));
// output
FileOutputStream fos = new FileOutputStream(outFile);
PrintWriter out = new PrintWriter(fos);
String thisLine = "";
int i =1;
while ((thisLine = in.readLine()) != null) {
if(i == lineno) out.println(lineToBeInserted);
out.println(thisLine);
i++;
}
out.flush();
out.close();
in.close();
inFile.delete();
outFile.renameTo(inFile);
}
}
Please.. give me some ideas
Thanks :)
When you do if (i == lineno), the condition is true only when i == 4, so the behavior you see is expected. You need to use the modulo operator, if ((i % lineno) == 0), to get stars every four lines.
http://www.dreamincode.net/forums/topic/273783-the-use-of-the-modulo-operator/
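The question talks about every 4 words, not every 4 lines, so the same modulo check can be applied to a word counter instead of a line counter. Here is a small sketch of that idea. It makes some assumptions: WordMarker and markEvery are hypothetical names, words are whatever is separated by whitespace, and the original line breaks are not preserved.

```java
// Hypothetical helper class for illustration only.
class WordMarker {
    /** Appends the marker after every n-th word; words are separated by whitespace. */
    static String markEvery(String text, int n, String marker) {
        String[] words = text.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i]);
            if ((i + 1) % n == 0) {               // true for the 4th, 8th, 12th ... word when n is 4
                sb.append(' ').append(marker);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(markEvery("a b c d e f g h i", 4, "**"));
        // prints: a b c d ** e f g h ** i
    }
}
```

For a large file the same counter could be kept across lines while reading with a BufferedReader, so that the count does not restart on each line.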

Start with new line after certain amount of characters in java

I have a program that reads a file; I can change the content of this file, and after that it is written to another file. The input file looks like this: http://gyazo.com/4ee1ade01378238e2c765e593712de7f and the output has to look like this: http://gyazo.com/5a5bfd00123df9d7791a74b4e77f6c10 My current output is http://gyazo.com/87a83f4c6d48aebda3d11060ebad66c2 So how do I change my code so that it starts a new line after 12 characters? I also want to delete the last !.
public class readFile {
String line;
StringBuilder buf = new StringBuilder();
public void readFile(){
BufferedReader reader = null;
try {
File file = new File("C:/Users/Sybren/Desktop/Invoertestbestand1.txt");
reader = new BufferedReader(new FileReader(file));
//String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
//buf.append(line);
processInput();
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
};
}
}
public void processInput(){
buf.append(line);
if (buf.length()>7){
buf.append("-");
//buf.append(System.getProperty("line.separator"));
}
/* start with a new line if the line length is bigger than 12 - in progress*/
/* I know this if doesn't work but how to fix it? */
if (buf.length()>12){
buf.append(System.getProperty("line.separator"));
}
/* if a * is followed by * change them to a !*/
for (int index = 0; index < buf.length(); index++) {
if (buf.charAt(index) == '*' && buf.charAt(index+1) == '*') {
buf.setCharAt(index, '!');
buf.deleteCharAt(index+1);
//buf.deleteCharAt(buf.length()-1);
}
// get last character from stringbuilder and delete
//buf.deleteCharAt(buf.length()-1);
}
}
public void writeFile() {
try {
String content = buf.toString();
File file = new File("C:/Users/Sybren/Desktop/test.txt");
// if file doesnt exists, then create it
if (!file.exists()) {
file.createNewFile();
}
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
bw.write(content);
bw.close();
System.out.println("Done");
} catch (IOException e) {
e.printStackTrace();
}
}
}
Update the code so that the decision is made while reading the file:
int sevenCount = 0;
int fourteenCount = 0;
int data = 0;
while ((data = reader.read()) != -1) {
sevenCount++;
fourteenCount++;
if(sevenCount==7)
{
buf.append("-"); // append - at every 7th character
sevenCount = 0;
}
if(fourteenCount==14)
{
buf.append("\n"); // change line after every 14th character
fourteenCount = 0;
}
if(((char)data) == '*')
{
data = '!'; // replace * with !
}
buf.append((char)data); // append the character, replaced or not
}
If you want to insert a newline in a string every 12 chars:
str = str.replaceAll(".{12}", "$0\n");
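A quick way to see what that one-liner does is to run it on a short string. The wrapper below is just for demonstration; WrapEvery and wrap are hypothetical names, and the width is passed in rather than hard-coded to 12.

```java
// Hypothetical helper class for illustration only.
class WrapEvery {
    /** Inserts a newline after every full group of 'width' characters. */
    static String wrap(String s, int width) {
        // $0 is the whole match, so each group is kept and followed by "\n".
        return s.replaceAll(".{" + width + "}", "$0\n");
    }

    public static void main(String[] args) {
        System.out.println(wrap("abcdefghijklmnopqrstuvwxyz", 12));
        // prints:
        // abcdefghijkl
        // mnopqrstuvwx
        // yz
    }
}
```

Two details are worth knowing: a string whose length is an exact multiple of the width ends with a trailing newline, and by default . does not match line terminators, so existing line breaks restart the count.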
