How to write a one-byte value in a binary file-java


I have tried many ways to write a program that writes a one-byte value to a file as it is, for example 01010101. I then want to read the file back and print what I wrote, so it should display 01010101. None of my attempts worked. Any help?
Because I am writing a compression program, it is essential that the value takes 1 byte and not 8.
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.io.File;
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Main2 {
public static void main(String[] args) throws Exception {
DataOutputStream dos = new DataOutputStream(new FileOutputStream("Text.t"));
dos.writeBytes(String.valueOf(01010101));
File file = new File("Text.t");
BufferedReader br = new BufferedReader(
new InputStreamReader(
new FileInputStream(file)));
System.out.println(br.readLine());
dos.close();
br.close();
}
}
It works well with binary codes that start with 1, but not with ones that start with 0. For example, for 01010101 it shows 266305.

The reason that "for 01010101 it shows 266305" is that 01010101 is an octal literal: because of the leading 0, the compiler reads it as base 8 (octal).
Use 1010101 when writing the literal. Leading zeros mean nothing to numbers, but they do mean something to how the Java code is parsed!
Decimal numbers that are displayed as "00xyz" are usually zero-padded, and the padding is applied to the string representation; the number itself is xyz.
From the comment, I believe the desired operation is to use a binary literal. You'll have to emit it using a "bit converter" to display it as expected: the bit converter takes the value of, e.g., 0b11 (integer 3) and turns it into the string "11". You may also want to pad to an assumed output width; again, 0b01 == 0b1, and the leading 0 means nothing to an integer.
The following emits the decimal string representation of the Huffman bit sequence, without any leading zeros (here "74"). Paired with the conversion described above, it should get you on the right track.
dos.writeBytes(String.valueOf(0b01001010));
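As a minimal sketch of the round trip the question asks for, the following writes the binary literal as one raw byte (rather than as text) and reads it back. The class name OneByteDemo, its helper names, and the use of a temporary file are assumptions made for illustration; they are not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class OneByteDemo {
    // Formats the low 8 bits of b as a zero-padded binary string.
    static String toBits(byte b) {
        return String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    // Writes exactly one byte to a temporary file and reads it back.
    static byte writeAndReadBack(byte value) {
        try {
            Path file = Files.createTempFile("one-byte", ".bin");
            try {
                Files.write(file, new byte[] { value });
                byte[] read = Files.readAllBytes(file);
                if (read.length != 1) {
                    throw new IllegalStateException("expected 1 byte, got " + read.length);
                }
                return read[0];
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte back = writeAndReadBack((byte) 0b01010101);
        System.out.println(toBits(back)); // prints 01010101
    }
}
```

The leading zeros come back only because toBits pads to eight digits; the byte stored in the file is just the number 85.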

I would parse the string as a radix-2 Byte, e.g. Byte.parseByte("00010001", 2).
The problem is that Java's byte is signed, so this won't work for values whose first digit is 1 (these are the negative values): Byte.parseByte("10010011", 2) throws a NumberFormatException.
The trick is to first replace the leading digit with 0 if it is 1, parse the result, and then set the top bit back to 1. Then store this byte in your file.
private static byte binaryStringToByte(String s) {
    //also check for null, length = 8, contain 0/1 only etc.
    if (s.startsWith("0")) {
        return Byte.parseByte(s, 2);
    } else {
        StringBuilder sBuilder = new StringBuilder(s);
        sBuilder.setCharAt(0, '0');
        byte temp = Byte.parseByte(sBuilder.toString(), 2);
        return (byte) (temp | (1 << 7));
    }
}
Then, to get the binary String representation of a byte use this code:
byte b = binaryStringToByte("10001000");
String s1 = String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
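An alternative sketch, not the approach used above: parse the eight digits as an int, where 0..255 always fits, and narrow the result to a byte. The class name BinaryStrings and its method names are hypothetical; the validation is the check that the comment in binaryStringToByte asks for.

```java
class BinaryStrings {
    // Parses exactly eight '0'/'1' characters into a byte.
    static byte parse(String bits) {
        if (bits == null || !bits.matches("[01]{8}")) {
            throw new IllegalArgumentException("expected 8 binary digits: " + bits);
        }
        // The int result is in 0..255; the cast keeps the low 8 bits,
        // so "1xxxxxxx" becomes the matching negative byte.
        return (byte) Integer.parseInt(bits, 2);
    }

    // Formats a byte back into eight binary digits.
    static String format(byte b) {
        return String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte b = parse("10001000");
        System.out.println(b);         // prints -120
        System.out.println(format(b)); // prints 10001000
    }
}
```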

Related

The readChar() method displays japanese character

I'm trying to write code that picks a word from a file according to an index entered by the user. The problem is that the readChar() method of the RandomAccessFile class returns Japanese characters. I must admit it's not the first time I've seen this on my Lenovo laptop: some installation wizards show normal characters mixed with Japanese ones. Do you think it comes from the laptop, or rather from the code?
This is the code:
package com.project;
import java.io.*;
import java.util.StringTokenizer;
public class Main {
public static void main(String[] args) throws IOException {
int N, i=0;
char C;
char[] charArray = new char[100];
String fileLocation = "file.txt";
BufferedReader buffer = new BufferedReader(new InputStreamReader(System.in));
do {
System.out.println("enter the index of the word");
N = Integer.parseInt(buffer.readLine());
if (N!=0) {
RandomAccessFile word = new RandomAccessFile(new File(fileLocation), "r");
do {
word.seek((2*(N-1))+i);
C = word.readChar();
charArray[i] = C;
i++;
}while(charArray[i-1] != ' ');
System.out.println("the word of index " + N + " is: " );
for (char carTemp : charArray )
System.out.print(carTemp);
System.out.print("\n");
}
}while(N!=0);
buffer.close();
}
}
I get this output:
瑯潕啰灰灥敲牃䍡慳獥攨⠩⤍ഊੴ瑯潌䱯潷睥敲牃䍡慳獥攨⠩⤍ഊ੣捯潭浣捡慴琨⡓却瑲物楮湧朩⤍ഊ੣捨桡慲牁䅴琨⡩楮湴琩⤍ഊੳ獵畢扳獴瑲物楮湧木⠠⁳獴瑡慲牴琠⁩楮湤摥數砬Ⱐ⁥敮湤搠⁩楮湤摥數砩⤍ഊੴ瑲物業洨⠩Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 100 out of bounds for length 100
at Main.main(Main.java:21)

There are many things wrong, all of which have to do with fundamental misconceptions.
First off: A file on your disk - never mind the File interface in Java, or any other programming language; the file itself - does not and cannot store text. Ever. It stores bytes. That is, raw data, as (on every machine that's been relevant for decades, but historically there have been other ways to do it) quantified in bits, which are organized into groups of 8 that are called bytes.
Text is an abstraction; an interpretation of some particular sequence of byte values. It depends - fundamentally and unavoidably - on an encoding. Because this isn't a blog, I'll spare you the history lesson here, but suffice to say that Java's char type does not simply store a character of text. It stores an unsigned two-byte value, which may represent a character of text. Because there are more characters of text in Unicode than two bytes can represent, sometimes two adjacent chars in an array are required to represent a character of text. (And, of course, there is probably code out there that abuses the char type simply because someone wanted an unsigned equivalent of short. I may even have written some myself. That era is a blur for me.)
Anyway, the point is: using .readChar() is going to read two bytes from your file, and store them into a char within your char[], and the corresponding numeric value is not going to be anything like the one you wanted - unless your file happens to be encoded using the same encoding that Java uses natively, called UTF-16.
You cannot properly read and interpret the file without knowing the file encoding. Full stop. You can at best delude yourself into believing that you can read it. You also cannot have "random access" to a text file - i.e., indexing according to a number of characters of text - unless the encoding in question is constant width. (Otherwise, of course, you can't just calculate the distance-in-bytes into the file where a given character of text is; it depends on how many bytes the previous characters took up, which depends on which characters they are.) Many text encodings are not constant width. One of the most popular, which frankly is the sane default recommendation for most tasks these days, is not. In which case you are simply out of luck for the problem you describe.
At any rate, once you know the encoding of your file, the expected way to retrieve a character of text from a file in Java is to use one of the Reader classes, such as InputStreamReader:
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
(Here, charset simply means an instance of the class that Java uses to represent text encodings.)
You may be able to fudge your problem description a little bit: seek to a byte offset, and then grab the text characters starting at that offset. However, there is no guarantee that the "text characters starting at that offset" make any sense, or in fact can be decoded at all. If the offset happens to be in the middle of a multi-byte encoding for a character, the remaining part isn't necessarily valid encoded text.
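To make the point concrete, here is a small sketch that reads the same two bytes both ways. The bytes 0x74 0x6F are ASCII "to"; read as one big-endian 16-bit char they become U+746F, which appears to be the first character of the output in the question. The class and method names are hypothetical, and an in-memory stream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Reads two bytes and combines them into one 16-bit char, as readChar() does.
    static char firstCharViaReadChar(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readChar();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the bytes with an explicit charset and returns the first char.
    static char firstCharViaReader(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return (char) reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "toUpperCase()".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("readChar: U+%04X%n", (int) firstCharViaReadChar(data)); // prints readChar: U+746F
        System.out.println("Reader:   " + firstCharViaReader(data));               // prints Reader:   t
    }
}
```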

char is 16 bits, i.e. 2 bytes.
seek seeks to a byte boundary.
If the file contains chars then they are at even offsets: 0, 2, 4...
The expression (2*(N-1))+i is even iff i is even; if i is odd, you are sure to land in the middle of a char, and thus read garbage.
i starts at zero, but you increment by 1, i.e., half a character.
Your seek argument should probably be (2*(N-1+i)).
Alternative explanation: your file does not contain chars at all; for example, you created an ASCII file in which a character is a single byte.
In that case, the error is attempting to read ASCII (an obsolete character encoding) with a readChar function.
But if the file contains ASCII, the purpose of multiplying by 2 in the seek argument is obscure. It apparently serves no useful purpose.

I changed the encoding of the file to UTF-16 and modified the program to display the right indexes, those that represent the beginning of each word. Now it works fine. Thank you, guys.
import java.io.*;
public class Main {
public static void main(String[] args) throws IOException {
int N, i=0, j=0, k=0;
char C;
char[] charArray = new char[100];
String fileLocation = "file.txt";
BufferedReader buffer = new BufferedReader(new InputStreamReader(System.in));
DataInputStream in = new DataInputStream(new FileInputStream(fileLocation));
boolean EOF=false;
do {
try {
j++;
C = in.readChar();
if((C==' ')||(C=='\n')){
System.out.print(j+1+"\t");
}
}catch (IOException e){
EOF=true;
}
}while (EOF!=true);
System.out.println("\n");
do {
System.out.println("enter the index of the word");
N = Integer.parseInt(buffer.readLine());
if (N!=0) {
RandomAccessFile word = new RandomAccessFile(new File(fileLocation), "r");
do {
word.seek((2*(N-1+i)));
C = word.readChar();
charArray[i] = C;
i++;
}while(charArray[i-1] != ' ' && charArray[i-1] != '\n');
System.out.print("the word of index " + N + " is: " );
for (char carTemp : charArray )
System.out.print(carTemp);
System.out.print("\n");
i=0;
charArray = new char[100];
}
}while(N!=0);
buffer.close();
}
}

Java - read UTF-8 file with a single emoji symbol

I have a file with a single unicode symbol.
The file is encoded in UTF-8.
It contains a single symbol represented as 4 bytes.
https://www.fileformat.info/info/unicode/char/1f60a/index.htm
F0 9F 98 8A
When I read the file I get two symbols/chars.
The program below prints
?
2
?
?
55357
56842
======================================
😊
16
&
======================================
?
2
?
======================================
Is this normal... or a bug? Or am I misusing something?
How do I get that single emoji symbol in my code?
EDIT: And also... how do I escape it for XML?
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
public class Test008 {
public static void main(String[] args) throws Exception{
BufferedReader in = new BufferedReader(
new InputStreamReader(
new FileInputStream("D:\\DATA\\test1.txt"), "UTF8"));
String s = "";
while ((s = in.readLine()) != null) {
System.out.println(s);
System.out.println(s.length());
System.out.println(s.charAt(0));
System.out.println(s.charAt(1));
System.out.println((int)(s.charAt(0)));
System.out.println((int)(s.charAt(1)));
String z = org.apache.commons.lang.StringEscapeUtils.escapeXml(s);
String z3 = org.apache.commons.lang3.StringEscapeUtils.escapeXml(s);
System.out.println("======================================");
System.out.println(z);
System.out.println(z.length());
System.out.println(z.charAt(0));
System.out.println("======================================");
System.out.println(z3);
System.out.println(z3.length());
System.out.println(z3.charAt(0));
System.out.println("======================================");
}
in.close();
}
}

Yes, this is normal: the Unicode symbol takes 2 UTF-16 chars (1 char is 2 bytes).
int codePoint = s.codePointAt(0); // Your code point.
System.out.printf("U+%04X, chars: %d%n", codePoint, Character.charCount(codePoint));
U+1F60A, chars: 2
After comments
Java, using a Stream:
public static String escapeToAsciiHTML(String s) {
    StringBuilder sb = new StringBuilder();
    s.codePoints().forEach(cp -> {
        if (cp < 128) {
            sb.append((char) cp);
        } else {
            sb.append("&#").append(cp).append(";");
        }
    });
    return sb.toString();
}

StringEscapeUtils is broken. Don't use it. Try NumericEntityEscaper.
Or, better yet, as apache commons libraries tend to be bad API** and broken*** anyway, guava*'s XmlEscapers
Java is Unicode, yes, but 'char' is a lie. 'char' does not represent characters; it represents a single, unsigned 16-bit number. The actual method to get a character out of, say, a j.l.String object isn't charAt, which is a misnomer; it's codePointAt, and friends.
This (char being a fakeout) normally doesn't matter; most actual characters fit in the 16-bit char type. But when they don't, this matters, and that emoji doesn't fit. In the unicode model used by java and the char type, you then get 2 char values (representing a single unicode character). This pair is called a 'surrogate pair'.
Note that the right methods tend to work in int (you need the 32 bits to represent one single unicode symbol, after all).
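A short sketch of the difference between chars and code points, using the emoji from the question (U+1F60A, UTF-8 bytes F0 9F 98 8A). The class name CodePointDemo and its helper names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class CodePointDemo {
    // The four UTF-8 bytes of U+1F60A, as listed in the question.
    static final byte[] UTF8_SMILE = { (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x8A };

    // Decodes UTF-8 bytes into a String.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Number of Unicode code points, as opposed to s.length() (UTF-16 chars).
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Formats the first code point as U+XXXX.
    static String describeFirst(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        String s = decodeUtf8(UTF8_SMILE);
        System.out.println(s.length());         // prints 2 (a surrogate pair)
        System.out.println(codePointLength(s)); // prints 1
        System.out.println(describeFirst(s));   // prints U+1F60A
    }
}
```

The two char values are 55357 and 56842, the same numbers the program in the question printed.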
*) guava has its own issues, by being aggressively not backwards compatible with itself, it tends to lead to dependency hell. It's a pick your poison kind of deal, unfortunately.
**) Utils-anything is usually a sign of bad API design; 'util' is almost meaningless as a term and usually implies you've broken the object oriented model. The right model is of course to have an object representing the process of translating data in one form (say, a raw string) to another (say, a string that can be dumped straight into an XML file, escaped and well) - and such a thing would thus be called an 'escaper', and would live perhaps in a package named 'escapers' or 'text'. Later editions of apache libraries, as well as guava, fortunately 'fixed' this.
***) As this very example shows, these APIs often don't do what you want them to. Note that apache is open source; if you want these APIs to be better, they accept pull requests :)

In what format is data sent on, getInputStream on a URLconnection Object?

I'm trying to connect to a PHP script on a server and retrieve the text the script echoes. To accomplish this I used the following code.
CODE:=
import java.net.*;
import java.io.*;
class con{
public static void main(String[] args){
try{
int c;
URL tj = new URL("http://www.thejoint.cf/test.php");
URLConnection tjcon = tj.openConnection();
InputStream input = tjcon.getInputStream();
while(((c = input.read()) != -1)){
System.out.print((char) c);
}
input.close();
}catch(Exception e){
System.out.println("Caught this Exception:"+e);
}
}
}
I do get the desired output, which is the text "You will be Very successful". But when I remove the (char) cast, it yields a 76-digit number,
8911111732119105108108329810132118101114121321151179999101115115102117108108
which I'm not able to make sense of. I read that getInputStream returns a byte stream, so shouldn't the output be the number of characters times 8 digits long?
Any insight would be very helpful. Thank you.

It does not print one number 76 digits long. You have a loop there, it prints a lot of numbers, each up to three digits long (one byte).
In ASCII, 89 = "Y", 111 = "o" ....
What the cast to char that you removed did was that it interpreted that number as a Unicode code point and printed the corresponding characters instead (also one at a time).
This way of reading text byte by byte is very fragile. It basically only works with ASCII. You should be using a Reader to wrap the InputStream. Then you can read char and String directly (and it will take care of character sets such as Unicode).
Oh I thought it would give out the byte representation of the individual letter.
But that's exactly what it does.
You can see it more clearly if you use println instead of print (then it will print each number on its own line).
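A minimal sketch of the Reader approach suggested above. An in-memory stream stands in for tjcon.getInputStream(), and UTF-8 is an assumption here; in practice the charset should be whatever the server actually sends. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReadTextDemo {
    // Decodes the whole stream into a String using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c); // c is a decoded char here, not a raw byte
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = "You will be Very successful".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
    }
}
```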

java huffman compressor output bigger than original

I'm writing a Huffman compressor for homework. I managed to build the Huffman tree and the codes of 0s and 1s for all the chars, but the output file is bigger than the original.
There was a question like mine here:
Unable to compress file during Huffman Encoding in Java
but I didn't understand it very well.
My code:
this.HuffmanTreeBulid(); // create the Huffman tree
HuffmanNode root = tree;
this.codeGenerator(root, codes); // create the hash map
try
{
FileOutputStream out2 = new FileOutputStream(fileOut);//for the new file
FileInputStream in = new FileInputStream(fileInput);//for reading again the orignal file
FileWriter out = new FileWriter(fileOut);
//String code;
char currentchar;
int currentByte;//int for going on all the bytes from the file
if(!fileOut.exists())//if new file exits then replace it if not create it
fileOut.createNewFile();
else
{
fileOut.delete();
fileOut.createNewFile();
}
while((currentByte = in.read())!=-1)
{
int currentint =currentByte& 0xff;//"& 0xff" is for unsigned int
currentchar=(char)currentint;
byte[] c=(huffmanCodes.get(currentchar)).getBytes();
//out.write(huffmanCodes.get(code2));
//out.write(huffmanCodes.get(currentchar));//for FileWriter
out2.write(c);
}
in.close();
out.close();
out2.close();
}
catch (IOException e)
{
e.printStackTrace();
}
Update 1:
I understand the problem, so I tried doing this:
int bitIndex = 0;
for (int i=0;i<codes.length();i++)
{
if(codes.charAt(i)=='1')
buffer.set(bitIndex++);
else
buffer.clear(bitIndex++);
}
It still doesn't work :(
Update 2: I'm doing this to get the bytes from the string:
byte[] bytes = new BigInteger(binaryString, 2).toByteArray();
for (byte b : bytes)
{
out2.write(b);
}
It still won't work, but it's the closest I have got so far.
Maybe the bytes are fine, but I'm writing them in the wrong way?

The problem is the following line:
byte[] c=(huffmanCodes.get(currentchar)).getBytes();
You are trying to turn your coded String into bare bits and bytes. But getBytes() just returns the string's characters encoded in your platform's default charset. So for each bit of your code you get, for example, the UTF-8 byte for the character "1" or the UTF-8 byte for the character "0": a whole byte per bit.
You have to parse your String into a byte. You can see how to do so here:
java: convert binary string to int
or here:
How to convert binary string to a byte?
You can read more about the getBytes method here:
https://beginnersbook.com/2013/12/java-string-getbytes-method-example/
As @9000 mentioned, you do not have a bitstream.
For a compressor, working with a bitstream is more suitable than working with complete bytes. Parsing each code into its own complete byte will not compress anything, because every char would still take up a full byte.
What you can do is concatenate the resulting binary strings and parse the combined string into bytes at the end. Be aware of trailing zeros.

I would suggest adding something like this:
class BitstreamPacker {
    private int bitPos; // Actual values 0..7; where to add the next bit.
    private ArrayList<Byte> data;
    public void addBit(boolean bit) {
        // Add the bit to the last byte of data; allocate more if does not fit.
        // Adjusts bitPos as it goes.
    }
    public void writeBytes(OutputStream output) {
        // Writes the number of bytes, then the last bit pos, then the bytes.
    }
}
Similarly,
class BitstreamUnpacker {
    private byte[] data; // Or ArrayList if you wish.
    private int currentBytePos;
    private int currentBitPos; // Could be enough to track the global bit position.
    public static BitstreamUnpacker fromByteStream(InputStream input) {
        // A factory method; reads the stream and creates an instance.
        // Uses the byte count to allocate the right amount of bytes;
        // uses the bit count to limit the last byte to the actual number of bits.
        return ...;
    }
    public Boolean getNextBit() {
        // Reads bits sequentially from the internal data.
        // Returns null when the end of data is reached.
        // Or feel free to implement an iterator / iterable.
    }
}
Note that the bit stream may end at the middle of the byte, so storing the count of bits in the last byte is required.
To help you better understand the idea, here's some Python code (because Python is easy to play with interactively):
class BitstreamPacker(object):
    def __init__(self):
        self.data = []  # A list of bytes.
        self.bit_offset = 0  # 0..7.
    def add_bit(self, bit):
        if self.bit_offset == 0:  # We must begin a new byte.
            self.data.append(0)  # Append a new byte.
        # We use addition because we know that the bit we're affecting is 0.
        # [-1] means last element.
        self.data[-1] += (bit << self.bit_offset)
        self.bit_offset += 1
        if self.bit_offset > 7:  # We've exceeded one byte.
            self.bit_offset = 0  # Shift the offset to the beginning of a byte.
    def get_bytes(self):
        # Just returning the data instead of writing, to simplify interactive use.
        return (len(self.data), self.bit_offset, self.data)
How does it work from Python REPL?
>>> bp = BitstreamPacker()
>>> bp.add_bit(1)
>>> bp.add_bit(1)
>>> bp.get_bytes()
(1, 2, [3]) # One byte, two bits in it are used.
>>> bp.add_bit(0)
>>> bp.add_bit(0)
>>> bp.add_bit(0)
>>> bp.add_bit(1)
>>> bp.add_bit(1)
>>> bp.add_bit(1)
>>> bp.get_bytes()
(1, 0, [227]) # Whole 8 bits of one byte were used.
>>> bp.add_bit(1)
>>> bp.get_bytes()
(2, 1, [227, 1]) # Two bytes used: one full, and one bit in the next.
>>> assert 0b11100011 == 227 # The binary we sent matches.
>>> _
I hope this helps.
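For comparison, the same packing idea can be sketched in Java. This is one possible shape, not a definitive implementation: the class name BitPacker, the addCode helper for Huffman code strings, and the least-significant-bit-first order (chosen to match the Python session) are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class BitPacker {
    private final ByteArrayOutputStream full = new ByteArrayOutputStream();
    private int current = 0;  // bits collected for the byte in progress
    private int bitCount = 0; // how many bits of 'current' are used (0..7)

    // Appends one bit, least significant bit first.
    void addBit(boolean bit) {
        if (bit) {
            current |= 1 << bitCount;
        }
        bitCount++;
        if (bitCount == 8) { // the byte is complete; store it and start a new one
            full.write(current);
            current = 0;
            bitCount = 0;
        }
    }

    // Appends a code given as a string of '0' and '1' characters.
    void addCode(String code) {
        for (int i = 0; i < code.length(); i++) {
            addBit(code.charAt(i) == '1');
        }
    }

    // Bits used in the last byte; 0 means the last byte is completely full.
    int bitsInLastByte() {
        return bitCount;
    }

    // Returns all bytes, including a partially filled last byte if there is one.
    byte[] toByteArray() {
        byte[] done = full.toByteArray();
        if (bitCount == 0) {
            return done;
        }
        byte[] out = Arrays.copyOf(done, done.length + 1);
        out[done.length] = (byte) current;
        return out;
    }

    public static void main(String[] args) {
        BitPacker packer = new BitPacker();
        packer.addCode("11000111"); // same bit sequence as the Python session
        packer.addBit(true);
        byte[] bytes = packer.toByteArray();
        System.out.println(bytes.length);            // prints 2
        System.out.println(bytes[0] & 0xFF);         // prints 227
        System.out.println(packer.bitsInLastByte()); // prints 1
    }
}
```

A compressor would write bitsInLastByte() alongside the bytes, so that the reader knows where the real data ends inside the final byte.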

Converting string of number to Hex, Binary, and Decimal

I'm trying to answer the question below:
Use an array or ArrayList and generate 20 random numbers (Integer
values between 0 and 100. 100 not inclusive). The program should
perform the following tasks.
Write the numbers from the array or ArrayList to a file.
Read the numbers from the file and display them on the console in decimal, hexadecimal and binary.
So far I have the random generator working well, and the file is being written. As for re-reading the file and displaying the numbers from the file as hex, decimal and binary... I am completely lost. Here is what I have so far.
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Random;
public class Write {
public static void main(String[] args) throws IOException {
Random generator = new Random();
ArrayList numList = new ArrayList();
int n = 0;
while( n < 20 ) {
int numGen = generator.nextInt(100);
numList.add(numGen);
n++;
}
String result = numList.toString().replaceAll("[\\[\\]]", "");
result = result.replace(",", " ");
System.out.print(result);
String filePath = "C:/Users/Username/Desktop/FileIOTest/coding_assignment.txt";
File f = new File(filePath);
FileOutputStream fileout = new FileOutputStream (f);
DataOutputStream dataOut = new DataOutputStream(fileout);
dataOut.writeBytes(result);
dataOut.close();
}
}

I guess you should try to complete the lines by yourself, so I only give you some basic input.
To read from file, three lines:
fileIn
dataIn
readBytes
These lines should be easy. To get the numbers, use
split
Integer.parseInt
and to display, you may refer to
Integer.toHexString
Integer.toBinaryString

From your code, it looks like you have not yet started writing the part that reads the numbers from the file.
To read the values, you can use BufferedReader to read the file line by line, and then use String.split, e.g. split(" "), to split each line into an array of numbers.
To convert an int value to binary and hex, you can use the toBinaryString and toHexString methods of the Integer class, like
int i = 20;
System.out.println(i);//Print int value
System.out.println(Integer.toBinaryString(i)); //Print Binary string
System.out.println(Integer.toHexString(i)); // Print Hex string
Output in Console is as follows:
20
10100
14
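Putting the pieces together, a minimal sketch of the parse-and-display step might look like this. It assumes the file's text has already been read into a String; the class name NumberFormats and the output layout are hypothetical. Note that the code in the question replaces each "," in a ", " separator with a space, so the numbers end up separated by two spaces; splitting on "\\s+" instead of a single " " avoids empty tokens.

```java
class NumberFormats {
    // Returns one line per number: decimal, hexadecimal and binary.
    static String describe(String line) {
        StringBuilder sb = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            int n = Integer.parseInt(token);
            sb.append(n).append(' ')
              .append(Integer.toHexString(n)).append(' ')
              .append(Integer.toBinaryString(n)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two spaces between numbers, as the question's writer produces.
        System.out.print(describe("20  14  99"));
    }
}
```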

To convert numbers to binary and hexadecimal from base 10, you can simply use the methods:
Integer.toBinaryString(n);
Integer.toHexString(n);
However, if you actually want to code these for yourself, try checking out the following website:
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/toBaseK.html
It helps provide a simple algorithm that converts from base 10 to any other number base.
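One common way to do the conversion by hand is repeated division: take the remainder as the next digit, divide by the base, and reverse the digits at the end. The sketch below assumes that approach, which may differ in detail from the linked page; the class name BaseConverter is made up for this example.

```java
class BaseConverter {
    private static final String DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Converts a non-negative int to the given base (2..36) by repeated division.
    static String toBase(int n, int base) {
        if (n < 0 || base < 2 || base > DIGITS.length()) {
            throw new IllegalArgumentException("unsupported input: " + n + ", base " + base);
        }
        if (n == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.append(DIGITS.charAt(n % base)); // the remainder is the next digit
            n /= base;
        }
        return sb.reverse().toString(); // digits were produced lowest first
    }

    public static void main(String[] args) {
        System.out.println(toBase(20, 2));  // prints 10100
        System.out.println(toBase(20, 16)); // prints 14
    }
}
```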
