How to see the contents of a file written with a ByteBuffer - Java

When I use a ByteBuffer to write, say, the integer 1 to a .txt file:
FileChannel fc;
buffer.putInt(1);
fc.write(buffer);
and then open the file with a text editor, I do not see a 1 there, although the value can be read back through a buffer correctly. If I write characters such as 'a' or 'b' to the file instead, they show up fine.
Is it expected that integers written with a ByteBuffer cannot be read by eye when I open the file?

To see the integer you wrote to the file, you must first convert it to readable characters. The integer 20000 is not the same thing as the string "20000": the integer is stored as 4 bytes in the file, whereas the readable string consists of 5 characters, which take at least 5 bytes. So when you write the integer value to the text file, what you see is the text editor trying to interpret the 4 bytes that make up the integer as 4 ASCII characters (which may or may not be visible).
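The difference between the two forms can be sketched as follows. This is only an illustration under assumptions: the class name IntVersusText, the helper names and the temporary files are made up for the example and do not come from the question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class IntVersusText {

    // Writes the whole buffer to the file through a FileChannel, then reads the file back.
    private static byte[] writeAndReadBack(Path file, ByteBuffer buffer) throws IOException {
        try (FileChannel fc = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buffer.hasRemaining()) {
                fc.write(buffer);
            }
        }
        return Files.readAllBytes(file);
    }

    // Stores the int as its raw 4 bytes (big-endian by default), as putInt does.
    public static byte[] writeBinary(Path file, int value) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
        buffer.putInt(value);
        buffer.flip(); // switch from filling the buffer to draining it
        return writeAndReadBack(file, buffer);
    }

    // Stores the decimal digits of the int as ASCII characters.
    public static byte[] writeText(Path file, int value) throws IOException {
        byte[] digits = Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
        return writeAndReadBack(file, ByteBuffer.wrap(digits));
    }

    public static void main(String[] args) throws IOException {
        Path bin = Files.createTempFile("int-binary", ".txt");
        Path txt = Files.createTempFile("int-text", ".txt");
        System.out.println(Arrays.toString(writeBinary(bin, 20000))); // [0, 0, 78, 32]
        System.out.println(Arrays.toString(writeText(txt, 20000)));   // [50, 48, 48, 48, 48]
    }
}
```

Only the second file shows 20000 in a text editor; the first one holds two NUL bytes followed by the bytes for 'N' and a space.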

All computer files everywhere are just a sequence of bits and bytes.
Humans have come up with a way to represent human readable characters with bit sequences. These are known as character sets or character encodings. A very basic one is ASCII.
For example, the English upper-case character A is represented with the binary value
100 0001
the decimal value
65
or the hex value
41
When you write
buffer.putInt(1);
fc.write(buffer); // assuming you've positioned the ByteBuffer
you're writing the value 1 to the file in binary. The value 1, as an int, in binary, is
00000000 00000000 00000000 00000001
since an int is 4 bytes.
When you open the file with a text editor (or any editor), it sees those 4 bytes and tries to show their textual representation. Byte values 0 and 1 are non-printing control characters (NUL and SOH in ASCII), not the digit '1', whose ASCII code is 49 (hex 31), so nothing readable appears.
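To make the bit patterns concrete, here is a small sketch that prints the four bytes putInt stores for a given value. The class and method names are hypothetical, chosen only for this example.

```java
import java.nio.ByteBuffer;

public class IntBytes {

    // Returns the four bytes written by putInt, each shown as eight binary digits.
    public static String bits(int value) {
        byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            String s = Integer.toBinaryString(b & 0xFF); // mask to treat the byte as unsigned
            sb.append("00000000".substring(s.length())).append(s).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(bits(1));   // 00000000 00000000 00000000 00000001
        System.out.println(bits('1')); // 00000000 00000000 00000000 00110001
        System.out.println(bits('A')); // 00000000 00000000 00000000 01000001
    }
}
```

The integer 1 and the character '1' differ in their last byte: 1 versus 49.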

Related

Why can't characters with codes above 127 be printed?

In my project I am trying to convert a binary number to an integer and then convert the integer to a character. But for numbers above 128 only the '?' character is printed. Please help me print characters for values up to 250. My code is
class b
{
    public static void main(String[] args)
    {
        String dec1 = "11011001";
        System.out.println(dec1);
        // Parse the string as a base-2 number: 11011001 -> 217
        int dec = Integer.parseInt(dec1, 2);
        System.out.println(dec);
        // Cast the number to a char (a UTF-16 code unit) and wrap it in a String
        String str = String.valueOf((char) dec);
        System.out.println("decrypted number is " + str);
    }
}
Thank you.
Not every byte value has a printable character associated with it. In ASCII the range 0x00 - 0x1f consists of unprintable control characters such as DC1, Bell and Backspace, and Unicode reserves the same first 32 code points as non-printable.
Byte values above 127 (0x7f) have different meanings in different encodings, and there are many encodings. Historically ASCII was the default encoding and there were many extensions to it. These days the standard is Unicode, which comes in several encoding forms including UTF-8, UTF-16 (LE, BE and BOM) and UTF-32 (LE, BE and BOM). UTF-8 is common for interchange, especially over the net, and UTF-16 is used internally in many systems.
Depending on the encoding and the glyph (displayed representation), it may take from one to over 16 bytes to represent a single glyph. Most emoji are in plane 1, meaning that they require more than 16 bits for their code point (Unicode is a 21-bit code space). Additionally, some glyphs are represented by a sequence of code points; examples are flags, which are encoded as a pair of regional indicator code points, and emoji combined with "joiners".
In the case of 217 (0xd9), the single byte 0xd9 is not a valid UTF-8 sequence on its own, but 217 as a 16-bit integer (0x00d9) is a valid UTF-16 representation of Ù.
See ASCII and Unicode.
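A short sketch of how the same character, code 217, turns into different bytes depending on the encoding. The class name and the hex helper are assumptions made for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeChar217 {

    // Encodes the text with the given charset and renders the bytes as hex.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Same conversion as in the question: binary string -> int -> char
        String s = String.valueOf((char) Integer.parseInt("11011001", 2));
        System.out.println(hex(s, StandardCharsets.ISO_8859_1)); // D9
        System.out.println(hex(s, StandardCharsets.UTF_8));      // C3 99
        System.out.println(hex(s, StandardCharsets.UTF_16BE));   // 00 D9
        System.out.println(hex(s, StandardCharsets.US_ASCII));   // 3F, the code of '?'
    }
}
```

The last line shows one way a '?' can appear: ASCII has no representation for the character, so the encoder substitutes a question mark.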
As per your code, the binary string is first converted to an integer, and the integer is then cast to a character, which yields the character whose code equals the value of dec. ASCII only defines values up to 127, so only values of dec up to 127 are ASCII characters; for larger values the result depends on the encoding in use, and where the character cannot be represented you get ?. The first 32 values are non-printable control characters, so you will get some strange symbol for them, but for values of dec in the range 32-126 you get the character assigned to that value in the ASCII table. Since the value 127 is assigned to DEL, you will also get a strange symbol when dec is 127.
The issue is that your console's encoding doesn't match the encoding of the output of your Java program. I don't know what console you're using, but on Windows, you can run this command to see your current encoding:
chcp
The default console code page is 437 in the USA and 850 in Western Europe and Canada. Both contain the 128 characters of ASCII plus 128 additional characters that differ from one code page to another. You get nothing useful beyond the 128 ASCII characters because the encoding of your Java output doesn't match the console's encoding. You have to change one of them to match the other.
You can change your console's encoding to UTF-8 by running this command:
chcp 65001
If you're not on Windows, you'll have to search for the equivalent commands for your system. But I believe on most Linux & Unix derived systems, you can use the locale command to see the current encoding and the export command to change it.
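The other direction, changing the encoding of the Java output rather than the console, might look roughly like this. It is a sketch that assumes the console has already been switched to UTF-8; the class and helper names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wraps any output stream in a PrintStream that encodes text as UTF-8.
    public static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    // Returns the bytes that would be sent to the console for the given text.
    public static byte[] encode(String text) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = utf8(sink);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // \u00D9 is the character with code 217 from the question
        out.println("decrypted number is \u00D9");
    }
}
```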
I get the following output from your code, so I assume you are running the program in an environment/console that doesn't support the character. You need a console that supports UTF-8, UTF-16 or a similar encoding to be able to print all the characters you have numerical values for.
11011001
217
decrypted number is Ù

The logic behind the difference between the FileInputStream and Scanner classes

I'm trying to understand the difference between Scanner.nextByte() and FileInputStream.read(). I read similar topics, but I didn't find the answer to my question. A similar question is asked in the topic: Scanner vs FileInputStream
Let me say what I understand:
Say that a .txt file includes
1
Then,
FileInputStream.read() will return 49
Scanner.nextByte() will return 1
If .txt file includes
a
FileInputStream.read() will return 97.
Scanner.nextByte() will throw a java.util.InputMismatchException.
The answer I linked to says:
FileInputStream.read() will evaluate the 1 as a byte, and return its
value: 49. Scanner.nextByte() will read the 1 and try to evaluate
it as an integer regular expression of radix 10, and give you: 1.
FileInputStream.read() will evaluate the a as a byte, and return its
value: 97. Scanner.nextByte() will read the a and try to evaluate
it as an integer regular expression of radix 10, and throw a
java.util.InputMismatchException.
But I didn't understand what they actually mean. Can you explain this in simple words with clearer examples? I looked at the ASCII table, and the character 1 corresponds to 49. Is that the reason FileInputStream.read() returns 49?
I'm totally confused. Please explain it to me in simple words.
Files contain bytes. FileInputStream reads these bytes. So if a file contains one byte whose value is 49, stream.read() will return 49. If the file contains two identical bytes 49, calling read() twice will return 49, then 49.
Characters like 'a', '1' or 'Z' can be stored in files. To be stored in files, they first have to be transformed into bytes, because that's what files contain. There are various ways (called "character encodings") to transform characters to bytes. Some of them (like ASCII, ISO-8859-1 or UTF-8) transform the character '1' into the byte 49.
Scanner reads characters from a file. So it transforms the bytes in the file to characters (using the character encoding, but in the other direction: from bytes to characters). Some sequences of characters form decimal numbers, like for example '123', '-5265', or '1'. Some don't, like 'abc'.
When you call nextByte() on a Scanner, you ask the scanner to read the next sequence of characters (until the next whitespace, or until the end of the file if there is no whitespace), then to check whether this sequence of characters represents a valid decimal number, and to check that this decimal number fits into a byte (i.e. is a number between -128 and 127). If it does, the sequence of characters is parsed as a decimal number, stored in a byte, and returned.
So if the file contains the byte 49 twice, the sequence of characters read and parsed by nextByte() would be '11', which would be transformed into the byte 11.
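The behaviour described in this answer can be tried out with a small sketch. To keep it self-contained it reads from an in-memory ByteArrayInputStream rather than a real FileInputStream, which is an assumption of the example; read() behaves the same way on both. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.InputMismatchException;
import java.util.Scanner;

public class ReadVersusNextByte {

    // Returns the first raw byte of the content, as InputStream.read() reports it.
    public static int firstRawByte(String content) throws IOException {
        byte[] bytes = content.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return in.read();
        }
    }

    // Decodes the bytes to characters and parses the first token as a decimal byte.
    public static byte firstToken(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.US_ASCII);
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(bytes), "US-ASCII")) {
            return scanner.nextByte();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstRawByte("11")); // 49, the code of the character '1'
        System.out.println(firstToken("11"));   // 11, the number spelled by the characters
        System.out.println(firstRawByte("a"));  // 97
        try {
            firstToken("a");
        } catch (InputMismatchException e) {
            System.out.println("'a' is not a decimal number");
        }
    }
}
```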

Append byte array as unicode character(s) to a string and display it and write that string back to a file

I have the byte array E2 80 94, which is the long dash "\u2014". I want to append those bytes to a string so that I see a long dash when I display it. How do I do that? After displaying it, how do I write the string to a file so that the long dash is stored as E2 80 94?
My bytes vary in length: 1 - 8 bytes. I want to write them literally to a string (and then that string to a file). I have no means to know if those bytes are one character or multiple. I am reading them from a binary file(.mobi).
"0x01 to 0x08: "literals": the byte is interpreted as a count from 1 to 8, and that many literals are copied unmodified from the compressed stream to the decompressed stream." -WikiBooks, PalmDoc Compression
You can easily build a String with
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#String%28byte[],%20java.nio.charset.Charset%29
and then work with it. Make sure you correctly identify the encoding of your byte array, otherwise the constructor will not decode it properly.
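A minimal sketch of that round trip, assuming the text is UTF-8 (which the bytes E2 80 94 suggest, but the question does not confirm). The class name, helper names and the temporary file are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class DashRoundTrip {

    // Decodes raw bytes into a String, assuming they are UTF-8.
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encodes a String back into UTF-8 bytes.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {(byte) 0xE2, (byte) 0x80, (byte) 0x94};

        StringBuilder sb = new StringBuilder("before ");
        sb.append(decode(raw)).append(" after");
        System.out.println(sb); // shows the dash if the console encoding supports it

        // Writing with the same charset stores the dash as E2 80 94 again.
        Path out = Files.createTempFile("dash", ".txt");
        Files.write(out, encode(sb.toString()));
        System.out.println(Arrays.equals(encode(decode(raw)), raw)); // true
    }
}
```

If a run of literal bytes can end in the middle of a multi-byte character, it is probably safer to collect all decompressed bytes first (for example in a ByteArrayOutputStream) and decode them once at the end, rather than decoding each run separately.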

Double byte character in Java

The code below prints the number of bytes in a String that contains a double-byte Japanese character. As I understand it, the output of this program should be 2, but it comes out as 3. Why is this the case?
String j = "大";
System.out.println(j.getBytes().length);
If this is always the case, should I assume the following:
1, for a single-byte character, the output of the program will always be 1
2, for a double-byte character, the output of the program will always be 3
A character encoded in UTF-8 takes between 1 and 4 bytes, so your code is printing the correct byte length for the Japanese character.
I believe the code point for that character is 0x5927, which when represented as UTF-8 is the three bytes E5 A4 A7. (Not all non-ASCII characters take 3 bytes in UTF-8, only those with code points in the range 0x0800 to 0xFFFF.)
The .getBytes() method uses the default system encoding (on Linux it's usually UTF-8).
Since you mentioned "one-byte" and "two-byte Japanese characters", I guess you want to use SJIS encoding. You do it this way:
String j = "大";
System.out.println(j.getBytes("SJIS").length);
prints 2.
As a guideline, never use .getBytes without specifying an encoding, and never use any other method or class that relies on the default system encoding. Otherwise your code may stop working as soon as you run it on a different computer.
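The effect of the encoding on the byte count can be checked with a sketch like this one. The class name is hypothetical, the character is written as the escape \u5927 so the example does not depend on the source file's encoding, and the Shift_JIS line is guarded because that charset is not guaranteed to be present in every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengths {

    // Number of bytes the text occupies in the given encoding.
    public static int length(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String j = "\u5927"; // the character from the question

        System.out.println(length(j, StandardCharsets.UTF_8));    // 3
        System.out.println(length(j, StandardCharsets.UTF_16BE)); // 2
        System.out.println(length("a", StandardCharsets.UTF_8));  // 1

        // Optional charset: only available if the runtime ships it.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println(length(j, Charset.forName("Shift_JIS")));
        }
    }
}
```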

Java - bytes and binary

This is a basic question.
When I use a byte stream to write bytes to a file, am I creating a binary file?
For example: I use a byte stream to write text data to a Notepad file, and when I open that file in a hex viewer I see the corresponding hex value for each character. But why not the binary values (i.e. 0s and 1s)?
I also learned that I can read/write binary files using a DataOutputStream/DataInputStream.
I guess my confusion is about what it means to write bytes and what it means to write binary data.
When I use a byte stream to write bytes to a file, am I creating a binary file?
You write the bytes as is, i.e., as the ones and zeroes they are. If these bytes represent characters then commonly no, it's just a text file (everything is ones and zeroes after all). Otherwise the answer is: it depends. The term binary file is misleading, but it usually refers to a file which can contain arbitrary data.
when I open the notepad in a HEX viewer I see the corresponding hex value for each character. But why not the binary values
HEX is just another representation of bytes. The following three are equal
10 (Decimal value 10)
0xA (Hex value 10)
00001010 (Binary value 10)
A computer only stores binary values. But editors may choose to represent (display) those in another way, such as Hex or decimal form. Given enough bytes, it can even be represented as an image.
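As a quick illustration of one value shown three ways, here is a sketch using the standard Integer formatting methods. The class and method names are invented for the example.

```java
public class SameValue {

    // Returns the decimal, hex and 8-bit binary renderings of a byte-sized value.
    public static String[] views(int value) {
        String binary = String.format("%8s", Integer.toBinaryString(value & 0xFF))
                .replace(' ', '0'); // left-pad with zeroes to eight digits
        return new String[] {
            Integer.toString(value),
            "0x" + Integer.toHexString(value).toUpperCase(),
            binary
        };
    }

    public static void main(String[] args) {
        for (String view : views(10)) {
            System.out.println(view); // 10, then 0xA, then 00001010
        }
        // The three literals below denote the same int.
        System.out.println(10 == 0xA && 0xA == 0b00001010); // true
    }
}
```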
what does it mean to write bytes and what does it mean to write a binary data
Binary data means ones and zeroes, e.g., 00001010, which is 8 bits. 8 bits make a byte.
The confusion could be caused by the application you are using. If you open something in a hex viewer, it is displayed in hex, not in binary.
The notions of "text" and "binary" files are mostly a notional understanding for you and me as "consumers" of the file. Strictly speaking, every file consists of 1's and 0's, and so all files are binary in the truest sense of the word. Hexadecimal representations, encodings for a particular character set and image file formats are all just ways of interpreting or displaying those bits. You can spin up an array of 100 random bytes, spit it out to a file, and it's just as "binary" as any other file. It's the context in which the bytes are interpreted that makes the difference.
Here's an example. In old tried-and-true ASCII, an upper-case "A" is encoded as decimal 65. You can represent that to people as 0x41 (hex) in a hex viewer, or as an "A" in an editor, but ultimately, when you write that byte to a file, it's just a byte translated to a series of eight bits, 01000001.
Typically you create a text file using Writer(s), and a binary file using other means (Streams, Channels, etc.). However, if your 'binary' file contains text and only text, it is a text file regardless.
Regarding hexadecimal format, that is merely a compact (preferred) way of viewing byte values.
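The point about Writers and streams producing the same kind of file can be sketched like this. The example writes to in-memory sinks instead of real files, which is an assumption made to keep it self-contained, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterVersusStream {

    // "Text" route: characters go through a Writer, which encodes them to bytes.
    public static byte[] viaWriter(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.US_ASCII)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    // "Binary" route: byte values are written to the stream directly.
    public static byte[] viaStream(int... values) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (int value : values) {
            sink.write(value);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = viaWriter("A");
        byte[] binary = viaStream(65);
        System.out.println(Arrays.equals(text, binary)); // true
        System.out.println(Integer.toHexString(text[0])); // 41
    }
}
```

Both routes end with the single byte 65 in the output, so nothing in the data itself marks one result as "text" and the other as "binary".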
