How to convert binary text into useable file - java

I use the following methods to convert any kind of file into a binary text file (by which I mean only 1's and 0's in human-readable form).
The file is read into a byte array by 'convertFileToByteArray()' and then written to a .txt file by 'convertByteArrayToBitTextFile()'.
public static byte[] convertFileToByteArray(String path) throws IOException
{
File file = new File(path);
byte[] fileData;
fileData = new byte[(int)file.length()];
FileInputStream in = new FileInputStream(file);
in.read(fileData);
in.close();
return fileData;
}
public static boolean convertByteArrayToBitTextFile(String path, byte[] bytes)
{
String content = convertByteArrayToBitString(bytes);
try
{
PrintWriter out = new PrintWriter(path);
out.println(content);
out.close();
return true;
}
catch (FileNotFoundException e)
{
return false;
}
}
public static String convertByteArrayToBitString(byte[] bytes)
{
String content = "";
for (int i = 0; i < bytes.length; i++)
{
content += String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');
}
return content;
}
Edit: Additional Code:
public static byte[] convertFileToByteArray(String path) throws IOException
{
File file = new File(path);
byte[] fileData;
fileData = new byte[(int)file.length()];
FileInputStream in = new FileInputStream(file);
in.read(fileData);
in.close();
return fileData;
}
public static boolean convertByteArrayToBitTextFile(String path, byte[] bytes)
{
try
{
PrintWriter out = new PrintWriter(path);
for (int i = 0; i < bytes.length; i++)
{
out.print(String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0'));
}
out.close();
return true;
}
catch (FileNotFoundException e)
{
return false;
}
}
public static boolean convertByteArrayToByteTextFile(String path, byte[] bytes)
{
try
{
PrintWriter out = new PrintWriter(path);
for(int i = 0; i < bytes.length; i++)
{
out.print(bytes[i]);
}
out.close();
return true;
}
catch (FileNotFoundException e)
{
return false;
}
}
public static boolean convertByteArrayToRegularFile(String path, byte[] bytes)
{
try
{
PrintWriter out = new PrintWriter(path);
for(int i = 0; i < bytes.length; i++)
{
out.write(bytes[i]);
}
out.close();
return true;
}
catch (FileNotFoundException e)
{
return false;
}
}
public static boolean convertBitFileToByteTextFile(String path)
{
try
{
byte[] b = convertFileToByteArray(path);
convertByteArrayToByteTextFile(path, b);
return true;
}
catch (IOException e)
{
return false;
}
}
I do this to try methods of compression on a very fundamental level, so please let's not discuss why I use a human-readable form.
Now this works quite well so far, but I have two problems.
1)
It takes forever (more than 20 minutes to convert 230 KB into binary text). Is this just a by-product of the relatively complicated conversion, or are there faster ways to do this?
2) and main problem:
I have no idea how to convert the files back to what they used to be. Renaming from .txt to .exe does not work (not too surprising, as the resulting file is two times larger than the original).
Is this still possible, or did I lose information about what the file is supposed to represent by converting it to a human-readable text file?
If so, do you know any alternative that prevents this?
Any help is appreciated.

The thing that'll cost you most time is the construction of an ever increasing String. A better approach would be to write the data as soon as you have it.
The other problem is very easy. You know that every sequence of eight characters ('0' or '1') was made from a byte. Hence, you know the values of each character in an 8-character block:
01001010
       ^----- 0*1
      ^------ 1*2
     ^------- 0*4
    ^-------- 1*8
   ^--------- 0*16
  ^---------- 0*32
 ^----------- 1*64
^------------ 0*128
              -----
              64+8+2 = 74
You only need to add the values where a '1' is present.
You can do it in Java like this, without even knowing the individual bit values:
String sbyte = "01001010";
int bytevalue = 0;
for (int i = 0; i < 8; i++) {
    bytevalue *= 2; // shifts the bit pattern to the left 1 position
    if (sbyte.charAt(i) == '1') bytevalue += 1;
}
// bytevalue is now 74; cast it to (byte) before writing it to a file

Use StringBuilder to avoid generating enormous numbers of unused String instances.
Better yet, write directly to the PrintWriter instead of building it in-memory at all.
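Loop through every 8-character subsequence and call Integer.parseInt(text, 2), casting the result to byte, to parse it back to a byte. (Byte.parseByte(text, 2) throws a NumberFormatException for any value above 127, for example "11111111".)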
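Putting the suggestions from both answers together, a minimal sketch of the round trip could look like the following. The class name BitTextCodec and its method names are made up for illustration; it assumes the bit text contains only '0' and '1' plus optional line breaks, and it streams the data instead of building one large String.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original question.
class BitTextCodec {

    // Writes every byte of source as eight '0'/'1' characters, most significant bit first.
    static void encode(Path source, Path bitText) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             BufferedWriter out = Files.newBufferedWriter(bitText, StandardCharsets.US_ASCII)) {
            byte[] buffer = new byte[8192];
            char[] bits = new char[8];
            int n;
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    int value = buffer[i] & 0xFF; // treat the byte as unsigned
                    for (int bit = 0; bit < 8; bit++) {
                        bits[bit] = ((value >> (7 - bit)) & 1) == 1 ? '1' : '0';
                    }
                    out.write(bits);
                }
            }
        }
    }

    // Reads groups of eight '0'/'1' characters and writes the raw bytes to target.
    static void decode(Path bitText, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(bitText, StandardCharsets.US_ASCII);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            int value = 0;
            int count = 0;
            int c;
            while ((c = in.read()) != -1) {
                if (c != '0' && c != '1') {
                    continue; // skip line breaks, such as the one println() appends
                }
                value = (value << 1) | (c - '0');
                if (++count == 8) {
                    out.write(value); // binary stream: writes exactly one byte
                    value = 0;
                    count = 0;
                }
            }
            if (count != 0) {
                throw new IOException("Bit text length is not a multiple of 8");
            }
        }
    }
}
```

The decoded file is written through an OutputStream rather than a PrintWriter, because a Writer encodes characters and would not reproduce the original bytes.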

How to convert Reader to InputStream in java

I need to convert a Reader object into InputStream. My solution right now is below. But my concern is since this will handle big chunks of data, it will increase the memory usage drastically.
private static InputStream getInputStream(final Reader reader) {
char[] buffer = new char[10240];
StringBuilder builder = new StringBuilder();
int charCount;
try {
while ((charCount = reader.read(buffer, 0, buffer.length)) != -1) {
builder.append(buffer, 0, charCount);
}
reader.close();
} catch (final IOException e) {
e.printStackTrace();
}
return new ByteArrayInputStream(builder.toString().getBytes(StandardCharsets.UTF_8));
}
Since I use StringBuilder, this keeps the full content of the Reader in memory, which I want to avoid. Is there a way to pipe the Reader object instead? Any help with this is highly appreciated.
Using the Apache Commons IO library, you can do this conversion in one line:
//import org.apache.commons.io.input.ReaderInputStream;
InputStream inputStream = new ReaderInputStream(reader, StandardCharsets.UTF_8);
You can read the documentation for this class at https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/ReaderInputStream.html
It might be worth trying this to see if it solves the memory issue too.
First: this is a rare requirement. Usually the conversion goes the other way around, or there is a FileChannel, so one can use a ByteBuffer.
A PipedInputStream would be possible, fed by a PipedOutputStream in a second thread. However, that is not needed.
A Reader gives chars. A Unicode code point is represented by either one or two chars (the latter a surrogate pair), so the stream below encodes one code point at a time.
/**
* InputStream that supplies the UTF-8 bytes of the text read from a Reader.
*/
public class ReaderInputStream extends InputStream {
private final Reader reader;
private boolean eof;
private int byteCount;
private byte[] bytes = new byte[6];
public ReaderInputStream(Reader reader) {
this.reader = reader;
}
@Override
public int read() throws IOException {
if (byteCount > 0) {
int c = bytes[0] & 0xFF; // read() must return 0..255; a negative byte would look like EOF
--byteCount;
for (int i = 0; i < byteCount; ++i) {
bytes[i] = bytes[i + 1];
}
return c;
}
if (eof) {
return -1;
}
int c = reader.read();
if (c == -1) {
eof = true;
return -1;
}
char ch = (char) c;
String s;
if (Character.isHighSurrogate(ch)) {
c = reader.read();
if (c == -1) {
// Error, low surrogate expected.
eof = true;
//return -1;
throw new IOException("Expected a low surrogate char instead of EOF");
}
char ch2 = (char) c;
if (!Character.isLowSurrogate(ch2)) {
throw new IOException("Expected a low surrogate char");
}
s = new String(new char [] {ch, ch2});
} else {
s = Character.toString(ch);
}
byte[] bs = s.getBytes(StandardCharsets.UTF_8);
byteCount = bs.length;
System.arraycopy(bs, 0, bytes, 0, byteCount);
return read();
}
}
Path source = Paths.get("...");
Path target = Paths.get("...");
try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
InputStream in = new ReaderInputStream(reader)) {
Files.copy(in, target);
}
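To see why the class above handles surrogate pairs separately, here is a small sketch (the class name SurrogateDemo is made up for illustration). It shows that a code point outside the Basic Multilingual Plane occupies two chars in a String but is encoded as a single four-byte UTF-8 sequence, so the two chars must be encoded together.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class SurrogateDemo {

    // Number of UTF-8 bytes needed for the code point that starts at index in s.
    static int utf8Length(String s, int index) {
        int codePoint = s.codePointAt(index);
        String single = new String(Character.toChars(codePoint));
        return single.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a', e with acute accent, and U+1F600 written as a surrogate pair
        String s = "a\u00E9\uD83D\uDE00";
        System.out.println("chars: " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("UTF-8 bytes: " + s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Since a single code point never needs more than four UTF-8 bytes, the six-byte buffer in the class above is large enough.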

finding the number of occurrences for a specific char using recursion

This code below is part of a program, that will find the number of occurrences of the input character in a text file
public static void main(String[] args){
[...]
java.io.File file1=new java.io.File(dirPath1);
FileInputStream fis = new FileInputStream(file1);
System.out.println(" return "+rec(sc.next().charAt(0),fis));
}
public static int rec (char ch, FileInputStream fis)throws IOException{
char current=0;
if(fis.available()==0){
return 0;
}
if(fis.read()!=-1){
current = (char) fis.read();
}
if(current==ch) {
return 1+rec(ch,fis);
}else
return rec(ch,fis);
}
}
The problem is:
If the file has one character and ch is that character, it returns 0. When I traced the code, I found that it doesn't enter if(current==ch), although they are the same char.
If there is more than one character, some of the matching chars enter the if block and others don't.
How can I fix this?
Is there another way to find the number of occurrences recursively?
Another question: should I use try and catch in the rec method to catch IOException?
Thanks in advance.
P.S. This program is from an assignment; I have to use recursion and compare it with iteration.
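You call fis.read() twice: the first call (in the if condition) reads a character and throws it away, and the second call reads the next one, or returns -1 at the end of the file.
Read once, store the result, and test that value: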
public static int rec(char ch, FileInputStream fis) throws IOException {
char current = 0;
if (fis.available() == 0) {
return 0;
}
int read = fis.read();
if (read != -1) {
current = (char) read;
}
if (current == ch) {
return 1 + rec(ch, fis);
}
else
return rec(ch, fis);
}
My suggestion would be as follows:
Read the whole text file into a java.lang.String
Then use the library Apache Commons Lang and use this method for counting the occurrences:
http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#countMatches-java.lang.CharSequence-java.lang.CharSequence-
You should use FileReader to read chars from text file.
Reader reader = new FileReader("MyFile.txt");
I think using while ((i=reader.read()) != -1) is a better approach instead of three if and an else.
So you can achieve this with fewer lines of code:
public static int rec (char ch, Reader reader)throws IOException{
char current=0;
int i;
while ((i=reader.read()) != -1) {
current = (char) i;
if(current==ch) {
return 1+rec(ch,reader);
}else
return rec(ch,reader);
}
return 0;
}
I think there is no need to use try and catch in rec method to catch IOException. I have used it here:
try {
Reader reader = new FileReader("MyFile.txt");
System.out.println(" return " + rec('a', reader));
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
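Since the assignment asks for a comparison with iteration, here is a minimal sketch that puts both variants side by side. The class and method names are made up for illustration, and it takes any Reader so it can be tried with a StringReader as well as a FileReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, not part of the original answers.
class CharCounter {

    // Recursive variant: reads one char per call, so it needs one stack frame per character.
    static int countRecursive(char ch, Reader reader) throws IOException {
        int c = reader.read();
        if (c == -1) {
            return 0; // end of input
        }
        return (c == ch ? 1 : 0) + countRecursive(ch, reader);
    }

    // Iterative variant: same result with constant stack depth.
    static int countIterative(char ch, Reader reader) throws IOException {
        int count = 0;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == ch) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countRecursive('a', new StringReader("banana")));
        System.out.println(countIterative('a', new StringReader("banana")));
    }
}
```

One practical difference worth noting in the comparison: the recursive variant can fail with a StackOverflowError on large files, while the iterative one cannot.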

String serialization unit test failed

My JUnit test for String serialization fails with the following result:
Expected: "netmodel"
Actual: "l"
The serialize method is as follows:
public static void serializeString(String objectToSerialize, OutputStream outputStream) {
byte[] bytesArr = objectToSerialize.getBytes();
serializeInt(bytesArr.length, outputStream);
try {
outputStream.write(bytesArr);
} catch (IOException e) {
e.printStackTrace();
}
}
And my deserialize method is as follows:
public static String deserializeString(InputStream inputStream) {
String deserializeObject = "";
char asciiToChar;
int stringByteArrayLength = deserializeInt(inputStream);
byte[] databytesArr = new byte[stringByteArrayLength];
try {
inputStream.read(databytesArr, 0, stringByteArrayLength);
}
catch (IOException e) {
e.printStackTrace();
}
for (int i = 0; i < databytesArr.length; i++) {
asciiToChar = (char) databytesArr[i];
deserializeObject = "" + Character.toString(asciiToChar);
}
return deserializeObject;
}
Finally, I wrote a unit test as follows:
public class StringSerializerTest {
private InputStream iStream;
private ByteArrayOutputStream oStream;
@Before
public void init() {
oStream = new ByteArrayOutputStream();
}
String serialzeAndDeserializeObject(String stringValue) {
OutputStreamUtil.serializeString(stringValue, oStream);
iStream = new ByteArrayInputStream(oStream.toByteArray());
return InputStreamUtil.deserializeString(iStream);
}
@Test
public void equals_equal() {
String stringValue = "netmodel";
String deserializedStringValue = serialzeAndDeserializeObject(stringValue);
assertThat(deserializedStringValue).isEqualTo(stringValue);
}
}
What is wrong, and how do I fix it?
You are reassigning the entire value of deserializeObject during each iteration of
for (int i = 0; i < databytesArr.length; i++) {
asciiToChar = (char) databytesArr[i];
deserializeObject = "" + Character.toString(asciiToChar);
}
This results in only the last character (l in this case) being stored in deserializeObject. This loop should append the next character to the deserializeObject as in the following:
for (int i = 0; i < databytesArr.length; i++) {
asciiToChar = (char) databytesArr[i];
deserializeObject += Character.toString(asciiToChar);
}
The corrected deserialization logic would be:
public static String deserializeString(InputStream inputStream) {
String deserializeObject = "";
char asciiToChar;
int stringByteArrayLength = deserializeInt(inputStream);
byte[] databytesArr = new byte[stringByteArrayLength];
try {
inputStream.read(databytesArr, 0, stringByteArrayLength);
}
catch (IOException e) {
e.printStackTrace();
}
for (int i = 0; i < databytesArr.length; i++) {
asciiToChar = (char) databytesArr[i];
deserializeObject += Character.toString(asciiToChar);
}
return deserializeObject;
}
The error was already reported by Justin Albano.
However, also take care of strings with non-ASCII content, such as special characters.
Something like the following. One should also close the stream at the end to ensure flushing in case of a buffered stream. And in theory a single read may return only part of the requested bytes (whatever is available without blocking), so the read has to loop until the array is full. DataOutputStream has nice methods for this, though you seem to roll your own serialisation.
public static void serializeString(String objectToSerialize, OutputStream outputStream)
throws IOException {
byte[] bytesArr = objectToSerialize.getBytes(StandardCharsets.UTF_8);
serializeInt(bytesArr.length, outputStream);
outputStream.write(bytesArr);
}
public static String deserializeString(InputStream inputStream)
throws IOException {
int stringByteArrayLength = deserializeInt(inputStream);
byte[] databytesArr = new byte[stringByteArrayLength];
readFully(inputStream, databytesArr);
return new String(databytesArr, StandardCharsets.UTF_8);
}
private static void readFully(InputStream inputStream, byte[] bytes) throws IOException {
int i = 0;
while (i < bytes.length) {
int nread = inputStream.read(bytes, i, bytes.length - i);
if (nread <= 0) {
throw new IOException("Premature EOF");
}
i += nread;
}
}
Mind that on Android StandardCharsets is only available from API level 19; on older versions use Charset.forName("UTF-8") instead.
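For comparison, here is a sketch of the same length-prefixed format built on the DataOutputStream and DataInputStream methods mentioned above. The class name StringCodec is made up for illustration, and the 4-byte big-endian length written by writeInt is an assumption that may not match what the asker's own serializeInt produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not the asker's OutputStreamUtil/InputStreamUtil.
class StringCodec {

    // Writes a 4-byte length followed by the UTF-8 bytes of the string.
    static void serializeString(String value, OutputStream out) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(bytes.length);
        data.write(bytes);
        data.flush();
    }

    // Reads the length, then blocks until exactly that many bytes have arrived.
    static String deserializeString(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] bytes = new byte[length];
        data.readFully(bytes); // throws EOFException if the stream ends early
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializeString("netmodel", out);
        System.out.println(deserializeString(new ByteArrayInputStream(out.toByteArray())));
    }
}
```

DataInputStream.readFully does the same job as the hand-written readFully loop, and decoding the whole array at once avoids the per-byte cast to char that breaks non-ASCII text.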

Huffman Coding - Dealing with unicode

I've implemented a Huffman coding in java, that works on byte data from an input file. However, it only works when compressing ascii. I'd like to extend it so that it can deal with characters that are larger than 1 byte long, but I'm not sure how to do this exactly.
private static final int CHARS = 256;
private int [] getByteFrequency(File f) throws FileNotFoundException {
try {
FileInputStream fis = new FileInputStream(f);
byte [] bb = new byte[(int) f.length()];
int [] aa = new int[CHARS];
if(fis.read(bb) == bb.length) {
System.out.print("Uncompressed data: ");
for(int i = 0; i < bb.length; i++) {
System.out.print((char) bb[i]);
aa[bb[i]]++;
}
System.out.println();
}
return aa;
} catch (FileNotFoundException e) { throw new FileNotFoundException();
} catch (IOException e) { e.printStackTrace(); }
return null;
}
For example, this is what I'm using to get the frequency of the characters in the file, and obviously it only works on a single byte. If I give it a unicode file, I get an ArrayIndexOutOfBoundsException at aa[bb[i]]++;, and the index bb[i] is normally a negative number. I know this is because aa[bb[i]]++; is only looking at one byte, and the unicode character will be more than one, but I'm not sure how I can change it.
Can anybody give me some pointers?
Try the following:
private static final int CHARS = 256;
private int [] getByteFrequency(File f) throws FileNotFoundException {
try {
FileInputStream fis = new FileInputStream(f);
byte [] bb = new byte[(int) f.length()];
int [] aa = new int[CHARS];
if(fis.read(bb) == bb.length) {
System.out.print("Uncompressed data: ");
for(int i = 0; i < bb.length; i++) {
System.out.print((char) bb[i]);
aa[((int)bb[i])&0xff]++;
}
System.out.println();
}
return aa;
} catch (FileNotFoundException e) { throw new FileNotFoundException();
} catch (IOException e) { e.printStackTrace(); }
return null;
}
If I'm correct (I haven't tested it), your problem is that byte is a signed type in Java, so byte values of 0x80 and above turn into negative array indices. Widening to int and masking with 0xff maps them to 128..255, which handles them correctly.
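A short sketch of the masking idea in isolation (the class name ByteFrequency is made up for illustration). It counts bytes rather than characters, which is enough for a byte-oriented Huffman coder: a multi-byte UTF-8 character simply contributes several byte values to the table.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class ByteFrequency {

    // Returns a 256-entry table with the number of occurrences of each byte value.
    static int[] count(byte[] data) {
        int[] freq = new int[256];
        for (byte b : data) {
            freq[b & 0xFF]++; // b alone would be -128..127; the mask yields 0..255
        }
        return freq;
    }

    public static void main(String[] args) {
        // e with acute accent is encoded in UTF-8 as the two bytes 0xC3 0xA9
        byte[] data = "a\u00E9".getBytes(StandardCharsets.UTF_8);
        int[] freq = count(data);
        System.out.println(freq['a'] + " " + freq[0xC3] + " " + freq[0xA9]);
    }
}
```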

Java: Read up to x chars from a file into array

I want to read a text file and store its contents in an array where each element of the array holds up to 500 characters from the file (i.e. keep reading 500 characters at a time until there are no more characters to read).
I'm having trouble doing this because I'm having trouble understanding the difference between all of the different ways to do IO in Java and I can't find any that performs the task I want.
And will I need to use an array list since I don't initially know how many items are in the array?
It would be hard to avoid using ArrayList or something similar. If you know the file is ASCII, you could do
int partSize = 500;
File f = new File("file.txt");
String[] parts = new String[(int) ((f.length() + partSize - 1) / partSize)]; // length() is a long; an array size must be an int
But if the file uses a variable-width encoding like UTF-8, this won't work. This code will do the job.
static String[] readFileInParts(String fname) throws IOException {
    int partSize = 500;
    List<String> parts = new ArrayList<String>();
    // try-with-resources closes the reader even if read() throws
    try (FileReader fr = new FileReader(fname)) {
        char[] buf = new char[partSize];
        int pos = 0;
        for (;;) {
            // read() may return fewer chars than requested, so fill buf incrementally
            int nRead = fr.read(buf, pos, partSize - pos);
            if (nRead == -1) {
                if (pos > 0)
                    parts.add(new String(buf, 0, pos)); // last, shorter part
                break;
            }
            pos += nRead;
            if (pos == partSize) {
                parts.add(new String(buf));
                pos = 0;
            }
        }
    }
    return parts.toArray(new String[parts.size()]);
}
Note that FileReader uses the platform default encoding. To specify a specific encoding, replace it with new InputStreamReader(new FileInputStream(fname), charSet). It's a bit ugly, but that's the best way to do it.
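As a sketch of that replacement, the variant below takes the charset and the part size as parameters. The class name ChunkReader and the parameter names are made up for illustration. Note that the parts are measured in Java chars (UTF-16 units), so a character outside the Basic Multilingual Plane could be split across two parts.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class ChunkReader {

    // Reads the file with an explicit charset and splits it into parts of up to partSize chars.
    static List<String> readInParts(String fname, Charset charset, int partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<String> parts = new ArrayList<>();
        try (Reader reader = new InputStreamReader(new FileInputStream(fname), charset)) {
            char[] buf = new char[partSize];
            int pos = 0;
            int n;
            // read() may deliver fewer chars than requested, so keep filling the buffer
            while ((n = reader.read(buf, pos, partSize - pos)) != -1) {
                pos += n;
                if (pos == partSize) {
                    parts.add(new String(buf));
                    pos = 0;
                }
            }
            if (pos > 0) {
                parts.add(new String(buf, 0, pos)); // last, shorter part
            }
        }
        return parts;
    }
}
```

Returning a List avoids having to know the number of parts in advance; call toArray on the result if an array is required.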
An ArrayList will definitely be more suitable as you don't know how many elements you're going to have.
There are many ways to read a file, but as you want to keep the count of characters to get 500 of them, you could use the read() method of the Reader object that will read character by character. Once you collected the 500 characters you need (in a String I guess), just add it to your ArrayList (all of that in a loop of course).
The Reader variable needs to be initialized with an object of a class that extends Reader, such as an InputStreamReader (this one takes an InputStream as its parameter: a FileInputStream when working with a file as input).
Not sure if this will work, but you might want to try something like this (Caution: untested code):
private void doStuff() {
    ArrayList<String> stringList = new ArrayList<String>();
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader("file.txt"));
        String str;
        // temp lives outside the loop so that a chunk can span several lines
        StringBuilder temp = new StringBuilder();
        while ((str = in.readLine()) != null) {
            // note: readLine() strips the line terminators
            for (int i = 0; i < str.length(); i++) {
                temp.append(str.charAt(i));
                if (temp.length() == 500) {
                    stringList.add(temp.toString());
                    temp.setLength(0);
                }
            }
        }
        // keep the last, shorter chunk
        if (temp.length() > 0) {
            stringList.add(temp.toString());
        }
    } catch (IOException e) {
        // handle
    } finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
