Reading/writing a BINARY File with Strings? - java

How can I write/read a string from a binary file?
I've tried using writeUTF / readUTF (DataOutputStream/DataInputStream) but it was too much of a hassle.
Thanks.

Forget about FileWriter, DataOutputStream for a moment.
For binary data one uses OutputStream and InputStream classes. They handle byte[].
For text data one uses Reader and Writer classes. They handle String, which can store all kinds of text, as it internally uses Unicode.
The crossover from text to binary data is done by specifying the encoding, which otherwise defaults to the platform encoding (UTF-8 since Java 18).
new OutputStreamWriter(outputStream, encoding)
string.getBytes(encoding)
So if you want to avoid byte[] and use String you must abuse a single-byte encoding which covers all 256 byte values, in any order. So no "UTF-8", but "ISO-8859-1" works; "windows-1252" (also named "Cp1252") comes close, but leaves a few byte values undefined.
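To illustrate the point about encodings that cover all 256 byte values, here is a minimal sketch. It uses ISO-8859-1, which is defined for every byte value. The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteStringRoundTrip {
    /** Decodes the bytes to a String and encodes them back with the same charset. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    /** Returns every possible byte value once, 0x00 through 0xFF. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // ISO-8859-1 maps byte n to char n, so nothing is lost.
        System.out.println("ISO-8859-1 lossless: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1)));
        // Bytes 0x80-0xFF in this order are not valid UTF-8 and get replaced.
        System.out.println("UTF-8 lossless: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));
    }
}
```

The first line should report true and the second false, which is why UTF-8 is unsuitable for smuggling arbitrary bytes through a String.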
But internally there is a conversion, and in very rare cases problems might happen. For instance, é can in Unicode be one code point, or two: e followed by a combining acute accent (U+0301). There exists a conversion function (java.text.Normalizer) for that.
One case where this has already led to problems is file names on different operating systems; macOS uses a different Unicode normalisation than Windows, and hence such names need special attention in version control systems.
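The two spellings of é can be compared with a small sketch like the following. The class and the compose helper are hypothetical names; java.text.Normalizer is the standard class mentioned above.

```java
import java.text.Normalizer;

class NormalizeDemo {
    /** Returns the canonical composed form (NFC) of the text. */
    static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // é as a single code point
        String decomposed = "e\u0301";  // e followed by COMBINING ACUTE ACCENT

        // The raw strings differ even though they render the same.
        System.out.println(precomposed.equals(decomposed));
        // After normalising both to NFC they compare equal.
        System.out.println(compose(precomposed).equals(compose(decomposed)));
    }
}
```

Normalising both sides to the same form before comparing (or before using text as a file name or key) avoids this kind of mismatch.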
So on principle it is better to use the more cumbersome byte arrays, or ByteArrayInputStream, or java.nio buffers. Mind also that String chars are 16 bit.

If you want to write text you can use Writers and Readers.
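You can use Data*Stream writeUTF/readUTF, but each string's encoded form (modified UTF-8) has to fit in 65535 bytes, so strings have to be shorter than 64K characters, and shorter still if they contain non-ASCII characters.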
public static void main(String... args) throws IOException {
    // generate a million pseudo-random words.
    List<String> words = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++)
        words.add(Long.toHexString(System.nanoTime()));
    writeStrings("words", words);
    List<String> words2 = readWords("words");
    System.out.println("Words are the same is " + words.equals(words2));
}
public static List<String> readWords(String filename) throws IOException {
    // try-with-resources closes the stream even if a read fails
    try (DataInputStream dis = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)))) {
        int count = dis.readInt();
        List<String> words = new ArrayList<String>(count);
        while (words.size() < count)
            words.add(dis.readUTF());
        return words;
    }
}
public static void writeStrings(String filename, List<String> words) throws IOException {
    try (DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(filename)))) {
        // the count comes first so the reader knows how many strings follow
        dos.writeInt(words.size());
        for (String word : words)
            dos.writeUTF(word);
    }
}
prints
Words are the same is true
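writeUTF stores the length in two bytes, so it cannot hold longer strings. One possible workaround, sketched here as an assumption rather than as part of the answer above, is to write a 4-byte length followed by the UTF-8 bytes yourself. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LengthPrefixedStrings {
    /** Writes a 4-byte length followed by the UTF-8 bytes of the string. */
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads a string previously written by writeString. */
    static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes); // blocks until the whole string has been read
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Writes the string to an in-memory buffer and reads it back. */
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                writeString(out, s);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return readString(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = new char[100_000]; // too long for writeUTF
        Arrays.fill(chars, 'x');
        String big = new String(chars);
        System.out.println("Round trip ok: " + big.equals(roundTrip(big)));
    }
}
```

The same two methods work on a DataOutputStream/DataInputStream wrapped around file streams, as in the program above.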

Related

How to read/write extended ASCII characters as a string into ANSI coded text file in java

This is my encryption program. Primarily used to encrypt Files(text)
This part of the program converts List<Integer> elements into byte[] and writes it into a text file. Unfortunately I cannot provide the algorithm.
void printit(List<Integer> prnt, File outputFile) throws IOException
{
StringBuilder building = new StringBuilder(prnt.size());
for (Integer element : prnt)
{
int elmnt = element;
//building.append(getascii(elmnt));
building.append((char)elmnt);
}
String encryptdtxt=building.toString();
//System.out.println(encryptdtxt);
byte [] outputBytes = encryptdtxt.getBytes();
FileOutputStream outputStream =new FileOutputStream(outputFile);
outputStream.write(outputBytes);
outputStream.close();
}
This is the decryption program, which gets its input from a .enc file:
void getfyle(File inputFile) throws IOException
{
FileInputStream inputStream = new FileInputStream(inputFile);
byte[] inputBytes = new byte[(int)inputFile.length()];
inputStream.read(inputBytes);
inputStream.close();
String fylenters = new String(inputBytes);
for (char a:fylenters.toCharArray())
{
usertext.add((int)a);
}
for (Integer bk : usertext)
{
System.out.println(bk);
}
}
Because the methods in my algorithm work on List<Integer>, the byte[] is converted to a String first and then to a List<Integer>, and vice versa.
The elements written to the file during encryption do not match the elements read back from the .enc file.
Is my method of converting List<Integer> to byte[] correct, or is something else wrong? I know that Java can't print extended ASCII characters, so I used this approach, but even this failed: it gives a lot of ?s.
Is there a solution?
Please help. Also, how can I do this for other formats (.png, .mp3, etc.)?
The format of the encrypted file can be anything (it needn't be .enc).
Thanks.
There are thousands of different 'extended ASCII' codes and Java supports about a hundred of them,
but you have to tell it which 'Charset' to use or the default often causes data corruption.
While representing arbitrary "binary" bytes in hex or base64 is common and often necessary,
IF the bytes will be stored and/or transmitted in ways that preserve all 256 values, often called "8-bit clean",
and File{Input,Output}Stream does, you can use "ISO-8859-1" which maps Java char codes 0-255 to and from bytes 0-255 without loss, because Unicode is based partly on 8859-1.
On input, read into a byte[] and then call new String(bytes, charset), where charset is either the name "ISO-8859-1"
or the java.nio.charset.Charset object for that name, available as java.nio.charset.StandardCharsets.ISO_8859_1;
or create an InputStreamReader on a stream reading the bytes from a buffer or directly from the file, using that charset name or object, and read chars and/or a String from the Reader.
On output, use String.getBytes(charset), where charset is that charset name or object, and write the byte[];
or create an OutputStreamWriter on a stream writing the bytes to a buffer or the file, using that charset name or object, and write chars and/or a String to the Writer.
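The String-based route described above could look roughly like this. It is only a sketch: it assumes every list element is in the range 0-255, and the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Latin1Codes {
    /** Turns codes in the range 0-255 into bytes by way of a String. */
    static byte[] toBytes(List<Integer> codes) {
        StringBuilder text = new StringBuilder(codes.size());
        for (int code : codes) {
            text.append((char) code);
        }
        // Naming the charset avoids the lossy platform default.
        return text.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Turns bytes back into codes 0-255 by way of a String. */
    static List<Integer> toCodes(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        List<Integer> codes = new ArrayList<>(text.length());
        for (char c : text.toCharArray()) {
            codes.add((int) c);
        }
        return codes;
    }

    public static void main(String[] args) {
        List<Integer> codes = Arrays.asList(0, 65, 128, 233, 255);
        System.out.println(codes.equals(toCodes(toBytes(codes))));
    }
}
```

The only change from the code in the question is that both getBytes and new String name the charset explicitly.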
But you don't actually need char and String and Charset at all. You actually want to write a series of Integers as bytes, and read a series of bytes as Integers. So just do that:
void printit(List<Integer> prnt, File outputFile) throws IOException
{
    byte[] outputBytes = new byte[prnt.size()];
    int i = 0;
    // unbox to int first: an Integer cannot be cast directly to byte
    for (Integer element : prnt) outputBytes[i++] = (byte) element.intValue();
    FileOutputStream outputStream = new FileOutputStream(outputFile);
    outputStream.write(outputBytes);
    outputStream.close();
    // or replace the previous three lines by one:
    // java.nio.file.Files.write(outputFile.toPath(), outputBytes);
}
void getfyle(File inputFile) throws IOException
{
    FileInputStream inputStream = new FileInputStream(inputFile);
    byte[] inputBytes = new byte[(int)inputFile.length()];
    inputStream.read(inputBytes);
    inputStream.close();
    // or replace those four lines with:
    // byte[] inputBytes = java.nio.file.Files.readAllBytes(inputFile.toPath());
    // b & 0xFF turns the signed byte back into a value 0-255
    for (byte b: inputBytes) System.out.println(b & 0xFF);
    // or if you really wanted a list not just a printout
    ArrayList<Integer> list = new ArrayList<Integer>(inputBytes.length);
    for (byte b: inputBytes) list.add(b & 0xFF);
    // return list or store it or whatever
}
Arbitrary data bytes are not all convertible to any character encoding and encryption creates data bytes including all values 0 - 255.
If you must convert the encrypted data to a string format the standard methods are to convert to Base64 or hexadecimal.
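As a rough sketch of those two standard methods: java.util.Base64 is the standard class (Java 8 and later), while the hex helpers and the class name are hand-written for this example.

```java
import java.util.Arrays;
import java.util.Base64;

class BinaryToText {
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    /** Two lowercase hex digits per byte, no separators. */
    static String toHex(byte[] data) {
        StringBuilder hex = new StringBuilder(data.length * 2);
        for (byte b : data) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    /** Inverse of toHex; expects an even number of hex digits. */
    static byte[] fromHex(String hex) {
        byte[] data = new byte[hex.length() / 2];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] encrypted = {0, 0x7F, (byte) 0x80, (byte) 0xFF};
        System.out.println(toHex(encrypted));
        System.out.println(toBase64(encrypted));
        System.out.println(Arrays.equals(encrypted, fromBase64(toBase64(encrypted))));
    }
}
```

Both text forms survive any charset that can represent ASCII, at the cost of a larger file (about 33% larger for Base64, 100% for hex).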
In the encryption part:
for (Integer element : prnt)
{
int elmnt = element;
//building.append(getascii(elmnt));
char b = Integer.toString(elmnt).charAt(0);
building.append(b);
}
This converts each int to the first character of its decimal representation, e.g. 1 to '1' and 5 to '5'; any further digits are dropped.

Read faster a file & convert it into HEX

I need to read a file that is in ASCII and convert it into hex before applying some functions (searching for a specific character).
To do this, I read the file, convert it to hex and write the result into a new file. Then I open the new hex file and apply my functions.
My issue is that reading and converting takes far too long (approx. 8 s for a 9 MB file).
My reading method is:
public static void convertToHex2(PrintStream out, File file) throws IOException {
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
int value = 0;
StringBuilder sbHex = new StringBuilder();
StringBuilder sbResult = new StringBuilder();
while ((value = bis.read()) != -1) {
sbHex.append(String.format("%02X ", value));
}
sbResult.append(sbHex);
out.print(sbResult);
bis.close();
}
Do you have any suggestions to make it faster ?
Did you measure what your actual bottleneck is? You seem to read a very small amount of data in each loop iteration and process it each time. You might as well read larger chunks of data and process those, e.g. using DataInputStream or whatever. That way you would benefit more from the optimized reads of your OS, file system, their caches etc.
Additionally, you fill sbHex and append that to sbResult, to print that somewhere. Looks like an unnecessary copy to me, because sbResult will always be empty in your case and with sbHex you already have a StringBuilder for your PrintStream.
Try this:
static String[] xx = new String[256];
static {
for( int i = 0; i < 256; ++i ){
xx[i] = String.format("%02X ", i);
}
}
and use it:
sbHex.append(xx[value]);
Formatting is a heavy operation: it does not only do the conversion, it also has to parse the format string each time.
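Combining the suggestions in this thread (larger reads, one StringBuilder, no String.format in the loop) might look like the sketch below. The class name, the 8 KiB chunk size and the digit table are assumptions made for illustration, and no timings are claimed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FastHex {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    /** Appends the first len bytes of buf as "XX " groups. */
    static void appendHex(StringBuilder out, byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            int value = buf[i] & 0xFF;
            out.append(DIGITS[value >>> 4]).append(DIGITS[value & 0xF]).append(' ');
        }
    }

    /** Converts a whole stream to hex text, reading it in 8 KiB chunks. */
    static String toHex(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            appendHex(out, chunk, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "Hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(new ByteArrayInputStream(sample))); // 48 65 6C 6C 6F
    }
}
```

For a real file, pass a FileInputStream to toHex. If the goal is only to search for a specific byte value, it may be simpler to search the byte[] chunks directly and skip the hex text altogether.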

java convert utf-8 2 byte char to 1 byte char

There are many similar questions, but no one helped me.
UTF-8 can be 1 byte or 2, 3 or 4.
ISO-8859-15 is always 2 bytes.
But I need a 1-byte character set like code page 863 (IBM863).
http://en.wikipedia.org/wiki/Code_page_863
For example "é" is code point 233 and is 2 bytes long in utf 8, how can I convert it to IBM863 (1 byte) in Java?
Running on JVM -Dfile.encoding=UTF-8 possible?
Of course that conversion would mean that some characters can be lost, because IBM863 is smaller.
But I need the language specific characters, like french, è, é etc.
Edit1:
String text = "text with é";
Socket socket = getPrinterSocket( printer);
BufferedWriter bwOut = getPrinterWriter(printer,socket);
...
bwOut.write("PRTXT \"" + text + "\n");
...
if (socket != null)
{
bwOut.close();
socket.close();
}
else
{
bwOut.flush();
}
It's going to a label printer running Fingerprint 8.2.
Edit 2:
private BufferedWriter getPrinterWriter(PrinterLocal printer, Socket socket)
throws IOException
{
return new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
}
First of all: there is no such thing as "1 byte char" or, in fact, "n byte char" for whatever n.
In Java, a char is a UTF-16 code unit; depending on the (Unicode) code point, either one, or two chars, are necessary to represent a code point.
You can use the following methods:
Character.toChars() to turn a Unicode code point into a char array representing this code point;
a CharsetEncoder to perform the char[] to byte[] conversion;
a CharsetDecoder to perform the byte[] to char[] conversion.
You obtain the two latter from a Charset's .new{Encoder,Decoder}() methods.
It is crucially important here to know what your input is exactly: is it a code point, is it an encoded byte array? You'll have to adapt your code depending on this.
Final note: the file.encoding setting defines the default charset to use when you don't specify one, for instance in the FileReader constructors; you should always specify a charset to begin with!
byte[] someUtf8Bytes = ...
String decoded = new String(someUtf8Bytes, StandardCharsets.UTF_8);
byte[] someIso15Bytes = decoded.getBytes("ISO-8859-15");
byte[] someCp863Bytes = decoded.getBytes("cp863");
If you start with a string, use just getBytes with a proper encoding.
If you want to write strings with a proper encoding to a socket, you can either use OutputStream instead of PrintStream or Writer and send byte arrays, or you can do:
new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "cp863"))
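The following sketch shows the byte counts involved and how a CharsetEncoder can tell you in advance whether a character would be lost. The class and method names are hypothetical. IBM863 may not be installed in every Java runtime, so the example checks for it and otherwise falls back to ISO-8859-1 purely for demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SingleByteEncoding {
    /** Picks IBM863 when the runtime provides it, otherwise ISO-8859-1 (demo fallback). */
    static Charset printerCharset() {
        return Charset.isSupported("IBM863")
                ? Charset.forName("IBM863")
                : StandardCharsets.ISO_8859_1;
    }

    /** Encodes the text; unmappable characters become the charset's replacement byte. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    /** True if every character of the text has a mapping in the charset. */
    static boolean isRepresentable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // é
        Charset target = printerCharset();
        System.out.println("UTF-8 bytes: " + encode(text, StandardCharsets.UTF_8).length);
        System.out.println(target + " bytes: " + encode(text, target).length);
        System.out.println("Chinese representable: " + isRepresentable("\u4E0A", target));
    }
}
```

The same Charset object can be passed to the OutputStreamWriter constructor instead of the charset name.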

Java Binary type in file

I have a problem with a binary file. I have a binary file with data in which every element is separated by "_". I am using
DataInputStream in = new DataInputStream(new FileInputStream("C:/Data/"+names)); where names is the name of my binary file. How can I read this file and save the elements in an array? Is this possible?
When writing to a binary file, there is no need to separate the items with '_'. The program knows how many bytes are allocated for each item.
The following code writes 2 doubles without '_' in between. After that, it reads them back from the file and prints the data.
public class Test {
public static void main(String[] args) throws Exception {
DataOutputStream dos = new DataOutputStream(new FileOutputStream("a.bin"));
dos.writeDouble(1.2);
dos.writeDouble(3.4);
dos.close();
DataInputStream dis = new DataInputStream(new FileInputStream("a.bin"));
System.out.println(dis.readDouble());
System.out.println(dis.readDouble());
dis.close();
}
}
The program outputs:
1.2
3.4
But if you didn't write the file and there is a '_' between items, you can use readChar() after reading each item from the binary file, as @Bhaskar already mentioned.
Finally, using ObjectOutputStream can write the whole array at once.
public class Test {
public static void main(String[] args) throws Exception {
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("a.bin"));
double[] a = {1.2, 3.4};
oos.writeObject(a);
oos.close();
ObjectInputStream ois = new ObjectInputStream(new FileInputStream("a.bin"));
double[] b = (double[]) ois.readObject();
System.out.println(b[0]);
System.out.println(b[1]);
ois.close();
}
}
It depends on how the data was written to that file. If it was written using DataOutputStream's writeXXX(), where XXX stands for the actual data type of the elements, and the elements were separated by a writeChar('_'), then you can easily read them back using DataInputStream's readXXX() methods. Just make sure that you read the elements in the exact sequence in which they were written, and that you call readChar() wherever you expect the _ to be present (i.e. between two elements).
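A sketch of that scheme follows. It assumes the elements are doubles and that the separator was written with writeChar('_'), which occupies two bytes; the real file may differ. The class name and the in-memory buffer are only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DelimitedDoubles {
    /** Writes the values with a two-byte '_' between consecutive elements. */
    static byte[] write(double[] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int i = 0; i < values.length; i++) {
                if (i > 0) {
                    out.writeChar('_');
                }
                out.writeDouble(values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the values back in the same order, consuming each separator. */
    static List<Double> read(byte[] data) {
        List<Double> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                if (!values.isEmpty() && in.readChar() != '_') {
                    throw new IllegalStateException("separator '_' expected");
                }
                values.add(in.readDouble());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(read(write(new double[] {1.2, 3.4})));
    }
}
```

For a file on disk, wrap a FileInputStream instead and loop until readDouble() throws EOFException, since available() is not a reliable end-of-file test for every stream type.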
You can use read(byte[]) or read(byte[], offset, length) to read the content of the file into a byte array.

How to save Chinese Characters to file with java?

I use the following code to save Chinese characters into a .txt file, but when I opened it with Wordpad, I couldn't read it.
StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;
FileOutputStream fos;
fos = new FileOutputStream(FileName, Append);
for (int i = 0;i < Shanghai_StrBuf.length(); i++) {
fos.write(Shanghai_StrBuf.charAt(i));
}
fos.close();
What can I do ? I know if I cut and paste Chinese characters into Wordpad, I can save it into a .txt file. How do I do that in Java ?
There are several factors at work here:
Text files have no intrinsic metadata for describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular)
The default encoding for Windows is still an 8bit (or doublebyte) "ANSI" character set with a limited range of values - text files written in this format are not portable
To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark at the start of the file (not strictly true - Raymond Chen explains). In theory, the BOM is there to tell you the endianness (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to automatically figure out that it is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, it would be an error to write a BOM to a script file, for example, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want it on Windows, either, like when you're appending data to an existing file: fos = new FileOutputStream(FileName,Append);
Here is a method of reliably appending UTF-8 data to a file:
private static void writeUtf8ToFile(File file, boolean append, String data)
throws IOException {
boolean skipBOM = append && file.isFile() && (file.length() > 0);
Closer res = new Closer();
try {
OutputStream out = res.using(new FileOutputStream(file, append));
Writer writer = res.using(new OutputStreamWriter(out, Charset
.forName("UTF-8")));
if (!skipBOM) {
writer.write('\uFEFF');
}
writer.write(data);
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
String chinese = "\u4E0A\u6D77";
boolean append = true;
writeUtf8ToFile(new File("chinese.txt"), append, chinese);
}
Note: if the file already existed and you chose to append and existing data wasn't UTF-8 encoded, the only thing that code will create is a mess.
Here is the Closer type used in this code:
public class Closer implements Closeable {
private Closeable closeable;
public <T extends Closeable> T using(T t) {
closeable = t;
return t;
}
@Override public void close() throws IOException {
if (closeable != null) {
closeable.close();
}
}
}
This code makes a Windows-style best guess about how to read the file based on byte order marks:
private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };
private static Charset getEncoding(InputStream in) throws IOException {
charsetLoop: for (Charset encodings : UTF_ENCODINGS) {
byte[] bom = "\uFEFF".getBytes(encodings);
in.mark(bom.length);
for (byte b : bom) {
if ((0xFF & b) != in.read()) {
in.reset();
continue charsetLoop;
}
}
return encodings;
}
return Charset.defaultCharset();
}
private static String readText(File file) throws IOException {
Closer res = new Closer();
try {
InputStream in = res.using(new FileInputStream(file));
InputStream bin = res.using(new BufferedInputStream(in));
Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
StringBuilder out = new StringBuilder();
for (int ch = reader.read(); ch != -1; ch = reader.read())
out.append((char) ch);
return out.toString();
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
System.out.println(readText(new File("chinese.txt")));
}
(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
If you can rely that the default character encoding is UTF-8 (or some other Unicode encoding), you may use the following:
Writer w = new FileWriter("test.txt");
w.append("上海");
w.close();
The safest way is to always explicitly specify the encoding:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
w.append("上海");
w.close();
P.S. You may use any Unicode characters in Java source code, even as method and variable names, if the -encoding parameter for javac is configured right. That makes the source code more readable than the escaped \uXXXX form.
Be very careful with the approaches proposed. Even specifying the encoding for the file as follows:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
will not work if you're running under an operating system like Windows. Even setting the system property for file.encoding to UTF-8 does not fix the issue. This is because Java fails to write a byte order mark (BOM) for the file. Even if you specify the encoding when writing out to a file, opening the same file in an application like Wordpad will display the text as garbage because it doesn't detect the BOM. I tried running the examples here in Windows (with a platform/container encoding of CP1252).
The following bug exists to describe the issue in Java:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
The solution for the time being is to write the byte order mark yourself to ensure the file opens correctly in other applications. See this for more details on the BOM:
http://mindprod.com/jgloss/bom.html
and for a more correct solution see the following link:
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Here's one way among many. Basically, we're just specifying that the conversion be done to UTF-8 before outputting bytes to the FileOutputStream:
String FileName = "output.txt";
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer writer = new OutputStreamWriter(new FileOutputStream(FileName,Append), "UTF-8");
writer.write(Shanghai_StrBuf.toString(), 0, Shanghai_StrBuf.length());
writer.close();
I manually verified this against the images at http://www.fileformat.info/info/unicode/char/ . In the future, please follow Java coding standards, including lower-case variable names. It improves readability.
Try this,
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(FileName,Append), "UTF8"));
for (int i=0;i<Shanghai_StrBuf.length();i++) out.write(Shanghai_StrBuf.charAt(i));
out.close();
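On Java 7 and later the same append can be written with java.nio.file and try-with-resources. This is a sketch rather than a replacement for the answers above: it writes no byte order mark, and the class name, method names and temporary file are assumptions for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class AppendUtf8 {
    /** Appends text to the file as UTF-8, creating the file if needed. */
    static void append(Path file, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(text);
        }
    }

    /** Appends two strings to a temporary file and returns the decoded content. */
    static String appendTwiceAndReadBack(String first, String second) {
        try {
            Path file = Files.createTempFile("utf8-append", ".txt");
            try {
                append(file, first);
                append(file, second);
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String shanghai = "\u4E0A\u6D77";
        System.out.println(shanghai.equals(appendTwiceAndReadBack("\u4E0A", "\u6D77")));
    }
}
```

If the file has to open correctly in Windows tools that rely on a BOM, write '\uFEFF' first when the file is new or empty, as in the writeUtf8ToFile method earlier in this thread.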
