Text to binary conversion and writing to file. help please - java

I'm trying to convert plain text data to binary format so that it becomes non-readable.
The data needs to be written to a file. The conversion works when I print it to the console window: the original text cannot be read there.
However, when it is written to a file, the same original text appears.
How can I write this binary data to a file, without encrypting it, but making it non-readable?
This file later needs to be processed by a third-party tool which accepts binary data. This is why I cannot encrypt it using my own algorithm.
This is my code:
import java.io.*;
import java.lang.*;
public class convert{
public static void main( String args[] ){
String s= "This is text";
try{
File file= new File("test.php");
file.createNewFile();
FileWriter fw= new FileWriter( file.getName(), true );
BufferedWriter bw= new BufferedWriter( fw );
byte[] b= s.getBytes();
for(int x=0; x<b.length; x++){
byte c=b[x];
bw.write( c );
System.out.println(c);
}
bw.close();
}catch( Exception e ){ e.printStackTrace(); }
}//main
}//class

Plain text is in itself a binary format, where each byte is interpreted as a character (or other variants, depending on assumed or specified encoding, e.g. UTF-8, UTF-16...).
This means that if you write an ASCII character as a byte to a file, it will look identical and will be readable by anyone. In fact, a lot of binary file formats still save strings as normal bytes, which means they can be read when the file is opened in a hex editor.
What you will need to do to make it unreadable is to serialize it into some common format that is not readable. Strings written by normal Java serialization are unfortunately still readable, although there are ways to obscure them. You can also use ZIP or a similar compression algorithm.
Another, more hacky way is to shift all your character bytes by some known value. This turns each of them into a different character, so the text becomes unreadable. This can be seen as a very basic Caesar cipher.
But in the end, all that matters is which formats your target program is able to read.
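The byte-shifting idea above can be sketched roughly as follows. This is only an illustration: the class name ByteShift, the key value 100 and the temp file are assumptions made for the example, and shifting is obfuscation, not encryption. Note that the bytes are written with Files.write rather than through a Writer, so no character encoding is applied on the way to disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ByteShift {
    // Adds 'key' to every byte, wrapping around modulo 256.
    // Shifting by -key restores the original bytes.
    static byte[] shift(byte[] data, int key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] + key);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "This is text".getBytes(StandardCharsets.UTF_8);

        // Hypothetical output location: a temp file instead of the asker's test.php.
        Path file = Files.createTempFile("shifted", ".bin");
        Files.write(file, shift(plain, 100));   // raw bytes, no Writer involved

        byte[] restored = shift(Files.readAllBytes(file), -100);
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // This is text
        Files.delete(file);
    }
}
```

Whether the third-party tool can do anything with such a file is a separate question; it only helps if the reader knows the key and undoes the shift.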

Well, you can use a RandomAccessFile for this. It becomes easy.
RandomAccessFile raf = new RandomAccessFile(path, mode); // mode is "r", "rw", "rws" or "rwd"; there is no write-only "w" mode
raf.seek(0); // this sets the file pointer to the first position
raf.writeUTF("whatever you want to write to the file");
// to read the file, use readUTF()
raf.seek(0); // seek lets you read from any part of the file
System.out.println(raf.readUTF());
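A runnable version of the snippet above might look like this. The class name RandomAccessDemo and the temp file are assumptions for the example. Keep in mind that writeUTF stores a two-byte length followed by the string in modified UTF-8, so ASCII text remains readable in the resulting file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessDemo {
    // Writes 'text' at the start of 'file' with writeUTF, then reads it back.
    static String roundTrip(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(0);
            raf.writeUTF(text);   // 2-byte length prefix + modified UTF-8 bytes
            raf.seek(0);          // rewind before reading
            return raf.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("raf-demo", ".bin");
        System.out.println(roundTrip(file, "This is text")); // This is text
        System.out.println(file.length());                   // 14 (2 + 12 bytes)
        file.delete();
    }
}
```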


high-level streams and low-level streams in IO Java? [duplicate]

I would like to know the specific difference between BufferedReader and FileReader.
I do know that BufferedReader is much more efficient as opposed to FileReader, but can someone please explain why (specifically and in detail)? Thanks.
First, you should understand "streaming" in Java, because all Readers in Java are built upon this concept.
File Streaming
File streaming is carried out by the FileInputStream object in Java.
// it reads a byte at a time and stores into the 'byt' variable
int byt;
while((byt = fileInputStream.read()) != -1) {
fileOutputStream.write(byt);
}
This loop reads one byte (8 bits) at a time from the input stream and writes it to the output file.
A practical useful application of it would be to work with raw binary/data files, such as images or audio files (use AudioInputStream instead of FileInputStream for audio files).
On the other hand, it is inconvenient and slow for text files: looping through one byte at a time, processing it, and storing the processed byte back is tedious and time-consuming.
You also need to know the character set of the text file (for example, whether the characters are Latin or Chinese). Otherwise, the program decodes and encodes 8 bits at a time, and you'd see weird characters printed on the screen or written to the output file whenever a character is more than 1 byte long, i.e. for non-ASCII characters.
File Reading
This is just a fancy way of saying "file streaming" with built-in charset handling: the reader decodes bytes into characters for you, using the platform default charset unless one is given (FileReader accepts a Charset since Java 11; on older versions, wrap a FileInputStream in an InputStreamReader).
The FileReader class is specifically designed to deal with text files.
As you've seen earlier, file streaming is best suited to raw binary data, but it is not so convenient for text.
So the Java-dudes added the FileReader class to deal specifically with text files. Each read() call returns one decoded character (a 16-bit char) rather than one raw byte; how many bytes that consumes from the file depends on the charset. A remarkable improvement in convenience over the preceding FileInputStream!
So the streaming operation looks like this:
int c;
while ((c = fileReader.read()) != -1) { /* some logic */ }
Please note, both classes use an int variable to store the value retrieved from the input. read() returns an int so that -1 can signal the end of the stream; cast it to char (or byte) to use the value.
The only advantage here is that this class deals only with text files, so decoding is handled for you along with a few other details. It provides an out-of-the-box solution for most text file processing cases. It also supports internationalization and localization.
But again, it's still very slow (imagine fetching one character at a time and looping through it!).
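To make the role of the charset concrete, here is a small sketch that decodes the same bytes with two different charsets, one char per read() call. It uses InputStreamReader (the parent class of FileReader) over an in-memory byte array so that it runs without a file; the class name CharsetDecodeDemo is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecodeDemo {
    // Decodes 'bytes' with the given charset, one char per read() call.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {   // -1 marks the end of the input
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'n' followed by e-acute: 3 bytes in UTF-8 (1 + 2).
        byte[] utf8 = "n\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length());      // 2
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // 3
    }
}
```

Decoding with the wrong charset does not fail; it silently yields different characters, which is why stating the charset explicitly tends to be the safer habit.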
Buffering streams
To tackle the problem of continuously looping over one byte or character at a time, the Java-dudes added another spectacular piece of functionality: creating a buffer of data before processing.
The concept is much like streaming a video on YouTube: the video is buffered ahead of playback to provide a flawless watching experience. (Though the browser keeps buffering until the whole video is buffered ahead of time.) The same technique is used by the BufferedReader class.
A BufferedReader object takes a FileReader object as input, which holds all the necessary information about the text file that needs to be read (such as the file path and charset).
BufferedReader br = new BufferedReader( new FileReader("example.txt") );
When a read instruction is given to the BufferedReader object, it uses the FileReader object to read the data from the file. Rather than asking for one character at a time, it requests a large block of characters in one go (the default buffer holds 8192 chars), and readLine() then scans that buffer until it hits '\n', '\r' or '\r\n' (the end-of-line symbols).
Once a line has been returned, the reader waits patiently until the next line is requested; the underlying file is only touched again when the buffer runs empty.
Meanwhile, the BufferedReader object keeps a dedicated area of memory (in RAM), called the "buffer", where it stores all the data fetched from the FileReader object.
// this variable points to the buffered line
String line;
// Keep buffering the lines and print it.
while ((line = br.readLine()) != null) {
printWriter.println(line);
}
Now here, instead of going to the file for one character at a time, a whole block is fetched and stored in RAM, and lines are handed to you from that buffer. When you are done processing the data, you can write the whole line back to the hard disk. This makes the process run way faster than fetching one character at a time.
But again, why do we need to pass a FileReader object to the BufferedReader? Can't we just say "buffer this file" and let the BufferedReader take care of the rest? Wouldn't that be sweet?
Well, the BufferedReader class is designed so that it only knows how to create a buffer and store incoming data. It does not care which object the data is coming from. So the same class can be used for many other input sources than just text files.
That being said, when you provide a FileReader object as input, it buffers the file; in the same way, if you provide an InputStreamReader wrapping System.in, it buffers the terminal/console input until it hits a newline symbol, such as:
// Object that reads console inputs
InputStreamReader console = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(console);
System.out.println(br.readLine());
This way, you can read (or buffer) multiple kinds of streams with the same BufferedReader class, such as text files, consoles, printers, networking data etc., and all you have to remember is
bufferedReader.readLine();
to get the next line of whatever you've buffered.
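As a rough sketch of the point that BufferedReader works with any Reader, the helper below reads all lines from whatever Reader it is given. The class name ReadLinesDemo and the helper readLines are invented for the example, and a StringReader stands in for new FileReader("example.txt") so that the code runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLinesDemo {
    // Reads every line from 'source'; the BufferedReader does not care
    // whether the chars come from a file, the console or a string.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) { // null means end of input
                lines.add(line);                     // terminator already stripped
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Mixed line endings on purpose: readLine() accepts \n, \r and \r\n.
        System.out.println(readLines(new StringReader("first\nsecond\r\nthird")));
        // [first, second, third]
    }
}
```

The try-with-resources block closes the BufferedReader, which in turn closes the wrapped Reader.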
In simple terms:
A FileReader class is a general tool to read in characters from a File. The BufferedReader class can wrap around Readers, like FileReader, to buffer the input and improve efficiency. So you wouldn't use one over the other, but both at the same time by passing the FileReader object to the BufferedReader constructor.
In detail
FileReader is used for input of character data from a disk file. The input file can be an ordinary ASCII, one byte per character text file. A Reader stream automatically translates the characters from the disk file format into the internal char format. The characters in the input file might be from other alphabets supported by the UTF format, in which case there will be up to three bytes per character. In this case, also, characters from the file are translated into char format.
As with output, it is good practice to use a buffer to improve efficiency. Use BufferedReader for this. This is the same class we've been using for keyboard input. These lines should look familiar:
BufferedReader stdin =
new BufferedReader(new InputStreamReader( System.in ));
These lines create a BufferedReader, but connect it to an input stream from the keyboard, not to a file.
Source: http://www.oopweb.com/Java/Documents/JavaNotes/Volume/chap84/ch84_3.html
BufferedReader requires a Reader, of which FileReader is one - it descends from InputStreamReader, which descends from Reader.
FileReader - read character files
BufferedReader - "Read text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines."
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
http://docs.oracle.com/javase/7/docs/api/java/io/FileReader.html
Actually BufferedReader makes use of Readers like FileReader.
The FileReader class helps in reading from a file, but its efficiency is low since it has to retrieve one character at a time from the file. BufferedReader takes chunks of data and stores them in a buffer, so instead of retrieving one character at a time from the file, retrieval becomes cheap using the buffer.
BufferedReader - a class that you can actually use as a substitute for Scanner; it can read from a file or from console input.
FileReader - as the name suggests.

Data loss when writing bytes to a file

I'm working on a string compressor for a school assignment.
There's one bug that I can't seem to work out. The compressed data, represented by a byte array, is being written to a file using a FileWriter. The compression algorithm returns an input stream, so the data flows as such:
piped input stream
-> input stream reader
-> data stored in char buffer
-> data written to file with file writer.
Now, the bug is, that with some very specific strings, the second to last byte in the byte array is written wrong. and it's always the same bit values "11111100".
Every time it's this bit values and always the second to last byte.
Here are some samples from the code:
InputStream compress(InputStream){
//...
//...
PipedInputStream pin = new PipedInputStream();
PipedOutputStream pout = new PipedOutputStream(pin);
ObjectOutputStream oos = new ObjectOutputStream(pout);
oos.writeObject(someobject);
oos.flush();
DataOutputStream dos = new DataOutputStream(pout);
dos.writeFloat(//);
dos.writeShort(//);
dos.write(SomeBytes); // ---Here
dos.flush();
dos.close();
return pin;
}
void write(char[] cbuf, int off, int len){
//....
//....
InputStreamReader s = new InputStreamReader(
c.compress(new ByteArrayInputStream(str.getBytes())));
s.read(charbuffer);
out.write(charbuffer);
}
A string which triggers it is "hello and good evenin" for example.
I have tried to iterate over the byte array and write them one by one, it didn't help.
It's also worth noting that when I tried to write to a file using the output stream in the algorithm itself it worked fine. This design was not my choice btw.
So I'm not really sure what I'm doing wrong here.
Considering that you're saying:
Now, the bug is, that with some very specific strings, the second to
last byte in the byte array is written wrong. and it's always the same
bit values "11111100".
You are taking a
binary stream (the compressed data)
-> reading it as chars
-> then writing it as chars.
And you are converting bytes to chars without clearly defining the encoding.
I'd say that the problem is that your InputStreamReader is translating some byte sequences in a way that you're not expecting.
Remember that in encodings like UTF-8, two or three bytes may become one single char.
It can't be a coincidence that the very byte pattern you pointed out (11111100) is one of the UTF-8 lead-byte patterns (1111110x, from the original UTF-8 definition). Check the table in the Wikipedia article on UTF-8 and you'll see that decoding as UTF-8 is destructive for arbitrary binary data: if a byte starts with 1111110x, the next must start with 10xxxxxx, and anything else is treated as malformed input.
Meaning that if using utf-8 to convert
bytes1[] -> chars[] -> bytes2[]
in some cases bytes2 will be different from bytes1.
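I recommend changing your code to remove those readers, or specifying a single-byte encoding to see if that prevents the translations. ISO-8859-1 maps every byte value to exactly one char, whereas plain ASCII only covers the values 0 to 127.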
I solved this by encoding and decoding the bytes with Base64.
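The lossy round trip described above is easy to reproduce in isolation. The sketch below is not the asker's code; the class name RoundTripDemo and the three sample bytes are assumptions chosen so that one of them is the 11111100 pattern from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // True if bytes -> String -> bytes through 'charset' gives back the same bytes.
    static boolean survives(byte[] data, Charset charset) {
        byte[] back = new String(data, charset).getBytes(charset);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] binary = { 0x48, (byte) 0xFC, 0x0A };   // 0xFC is 11111100

        // UTF-8 replaces the malformed byte with U+FFFD, so the data changes.
        System.out.println(survives(binary, StandardCharsets.UTF_8));      // false
        // ISO-8859-1 maps each byte to exactly one char and back.
        System.out.println(survives(binary, StandardCharsets.ISO_8859_1)); // true
    }
}
```

Writing the bytes with an OutputStream and skipping the char conversion entirely avoids the issue without relying on any charset.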

How to convert byte array in String format to byte array?

I have created a byte array of a file.
FileInputStream fileInputStream=null;
File file = new File("/home/user/Desktop/myfile.pdf");
byte[] bFile = new byte[(int) file.length()];
try {
fileInputStream = new FileInputStream(file);
fileInputStream.read(bFile);
fileInputStream.close();
}catch(Exception e){
e.printStackTrace();
}
Now, I have an API which expects JSON input, so I have to put the above byte array into it in String format. After reading the byte array in String format, I need to convert it back to a byte array again.
So, help me to find;
1) How to convert byte array to String and then back to the same byte array?
The general problem of byte[] <-> String conversion is easily solved once you know the actual character set (encoding) that has been used to "serialize" a given text to a byte stream, or which is needed by the peer component to accept a given byte stream as text input - see the perfectly valid answers already given on this. I've seen a lot of problems due to lack of understanding character sets (and text encoding in general) in enterprise java projects even with experienced software developers, so I really suggest diving into this quite interesting topic. It is generally key to keep the character encoding information as some sort of "meta" information with your binary data if it represents text in some way. Hence the header in, for example, XML files, or even suffixes as parts of file names as it is sometimes seen with Apache htdocs contents etc., not to mention filesystem-specific ways to add any kind of metadata to files. Also, when communicating via, say, http, the Content-Type header fields often contain additional charset information to allow for correct interpretation of the actual Contents.
However, since in your example you read a PDF file, I'm not sure if you can actually expect pure text data anyway, regardless of any character encoding.
So in this case - depending on the rest of the application you're working on - you may want to transfer binary data within a JSON string. A common way to do so is to convert the binary data to Base64 and, once transferred, recover the binary data from the received Base64 string.
How do I convert a byte array to Base64 in Java?
is a good starting point for such a task.
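A minimal sketch of the Base64 approach, using java.util.Base64 (available since Java 8). The class name Base64Demo and the helper names toBase64 and fromBase64 are invented for the example, and the sample bytes merely stand in for the contents of the PDF.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Demo {
    // Encodes arbitrary bytes as an ASCII-only string that is safe inside JSON.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the original bytes from the Base64 string.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Stand-in for file contents; includes bytes that are not valid text.
        byte[] bFile = { 0x25, 0x50, 0x44, 0x46, (byte) 0xFC, (byte) 0x80, 0x00 };

        String forJson = toBase64(bFile);
        byte[] restored = fromBase64(forJson);

        System.out.println(forJson);
        System.out.println(Arrays.equals(bFile, restored)); // true
    }
}
```

The encoded string is roughly a third larger than the binary data, which is the usual price for carrying bytes inside a text format.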
String class provides an overloaded constructor for this.
String s = new String(byteArray, "UTF-8");
byteArray = s.getBytes("UTF-8");
Providing an explicit charset is encouraged because different encoding schemes may have different byte representations. Note that this only round-trips when the bytes are valid text in that charset; for arbitrary binary data such as a PDF, UTF-8 will not give back the same bytes.
Also, your input stream may not read all the contents in one go. You have to read in a loop until there is nothing left to be read. Read the documentation: read() returns the number of bytes read.
Reads up to b.length bytes of data from this input stream into an
array of bytes. This method blocks until some input is available
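A read loop along those lines could look like the sketch below. The class name ReadFullyDemo and the helper readFully are assumptions for the example, and a ByteArrayInputStream stands in for the FileInputStream from the question so that the code runs without a file. For whole files, Files.readAllBytes does the looping for you.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyDemo {
    // Keeps calling read() until 'buffer' is full or the stream ends.
    // Returns the number of bytes actually stored in 'buffer'.
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // read() may return fewer bytes than requested, so ask again
            // for the part of the buffer that is still empty.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break;   // end of stream reached before the buffer was full
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = { 1, 2, 3, 4, 5 };
        byte[] bFile = new byte[source.length];
        try (InputStream in = new ByteArrayInputStream(source)) {
            System.out.println(readFully(in, bFile)); // 5
        }
    }
}
```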
String.getBytes() and String(byte[] bytes) are methods to consider.
Convert byte array to String
String s = new String(bFile , "ISO-8859-1" );
Convert String to byte array
byte bArray[] =s.getBytes("ISO-8859-1");

BufferedWriter is acting strange

I am trying to make a game with a working highscore mechanism and I am using java.io.BufferedWriter to write to a highscore file. I don't have an encryption on the highscore and I am using Slick2D and LWJGL for rendering and user input. The program executes this code:
FileWriter fstream = new FileWriter("res/gabjaphou.txt");
BufferedWriter writer = new BufferedWriter(fstream);
writer.write(score); // score is an int value
writer.close(); // gotta save m'resources! lol
I open the text file generated by this and all it reads is a question mark. I don't know why this happens, and I used other code from another project I was making and I had no problem with that... Does anyone know why? This is really annoying! :C
BufferedWriter.write(int) is meant to write a single character, not an integer.
public void write(int c)
throws IOException
Writes a single character.
Overrides: write in class Writer
Parameters: c - int specifying a character to be written
Throws: IOException - If an I/O error occurs
Try
writer.write(String.valueOf(score));
Please use writer.write(String.valueOf(score)); otherwise it writes score as a character.
See the documentation:
Writes a single character. The character to be written is contained in the 16 low-order bits of the given integer value; the 16 high-order bits are ignored.
What you want to use is Writer.write(String); convert score to a String using String.valueOf or Integer.toString.
writer.write(String.valueOf(score));
BufferedWriter writes characters to the file, not numbers: write(int) treats its argument as a character code, so the score has to be converted to text first.
Consider using FileWriter directly, with something as simple as:
fileWriter.write(Integer.toString(score));
write takes a String here, but the output should be the same.
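The difference between the two write overloads can be seen without touching the file system. In this sketch a StringWriter replaces the FileWriter from the question, and the class and method names (ScoreWriteDemo, asChar, asText) are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class ScoreWriteDemo {
    // What the question's code does: the int is taken as a character code.
    static String asChar(int score) throws IOException {
        StringWriter sink = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(sink)) {
            writer.write(score);
        }
        return sink.toString();
    }

    // What the answers suggest: convert the number to its decimal text first.
    static String asText(int score) throws IOException {
        StringWriter sink = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(sink)) {
            writer.write(String.valueOf(score));
        }
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(asChar(65)); // A
        System.out.println(asText(65)); // 65
    }
}
```

A small score usually lands on a control character this way, which a text editor may well render as a question mark or a box.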

how to write hexadecimal values to a binary file

I'm currently trying to build a save editor for a video game. Now that I've figured out how to write to the binary file with an output stream rather than a writer, I'm running into a problem. I'm trying to overwrite certain hexadecimal values, but every time I try I end up replacing the whole file. There's probably an easy explanation for this. I also wanted advice on how to replace the hex values: converting the hex values (e.g. 5acd) from a string only gives me the byte data for the string's characters. Here's what I'm doing:
String textToWrite = inputField.getText();
byte[] charsToWrite = textToWrite.getBytes();
FileOutputStream out = new FileOutputStream(theFile);
out.write(charsToWrite, 23, charsToWrite.length);
Use a RandomAccessFile. This has the methods that you are looking for. FileOutputStream will only allow you to overwrite or append. However, note, as Murali VP alluded to, that this will only allow you to perform direct replacements (byte-for-byte), not removal or insertion of bytes.
Converting from a hex string to a byte array is essentially what you need; see this SO post for that.
HTH
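Both steps can be sketched together as follows. Everything named here (HexPatchDemo, hexToBytes, patch, the 30-byte temp file) is an assumption for illustration; only the offset 23 and the sample value 5acd come from the question.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

class HexPatchDemo {
    // Parses a hex string such as "5acd" into the bytes {0x5A, 0xCD}.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Overwrites bytes in place starting at 'offset'; the rest of the file
    // is left untouched and the file is not truncated.
    static void patch(File file, long offset, byte[] replacement) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(offset);
            raf.write(replacement);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical save file: 30 zero bytes in a temp file.
        File save = File.createTempFile("save", ".bin");
        Files.write(save.toPath(), new byte[30]);

        patch(save, 23, hexToBytes("5acd"));

        byte[] after = Files.readAllBytes(save.toPath());
        System.out.println(after.length);                          // 30
        System.out.printf("%02x%02x%n", after[23], after[24]);     // 5acd
        save.delete();
    }
}
```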
