Java: How to read Unicode characters from a socket? [duplicate]

This question already has an answer here:
DataInputStream and UTF-8
(1 answer)
Closed 9 years ago.
The server receives a byte array as an InputStream, and I wrapped the stream with a DataInputStream. The first 2 bytes indicate the length of the byte array, the next 2 bytes are a flag, and the remaining bytes are the content. My problem is that the content contains Unicode characters that take 2 bytes each. How can I read those characters? My previous code is:
DataInputStream dis = new DataInputStream(is);
int length = dis.readUnsignedShort();
int flag = dis.readUnsignedShort();
String content = "";
int c;
for (int i = 0; i < length - 4; i++) {
c = dis.read();
content += (char) c;
}
It can only read ASCII. Thanks for your help!

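This depends on the encoding scheme of your input. If you do not want to do the heavy lifting yourself, you could use IOUtils from Apache Commons IO to convert the bytes to a Unicode string.
Example:
IOUtils.toString(bytes, "UTF-8")
With only the JDK, new String(bytes, StandardCharsets.UTF_8) does the same conversion.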
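The idea can be sketched against the frame layout described in the question. This is only an illustration under two assumptions that the question does not confirm: the content is UTF-8, and the 2-byte length counts the 4 header bytes as well (which is what the `length - 4` in the original loop suggests). The class name `FrameReader` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FrameReader {
    /** Reads one frame: 2-byte length, 2-byte flag, then (length - 4) content bytes. */
    static String readContent(InputStream is) throws IOException {
        DataInputStream dis = new DataInputStream(is);
        int length = dis.readUnsignedShort(); // assumed to include the 4 header bytes
        int flag = dis.readUnsignedShort();   // consumed to advance the stream; unused here
        if (length < 4) {
            throw new IOException("frame length too small: " + length);
        }
        byte[] payload = new byte[length - 4];
        dis.readFully(payload);               // blocks until every content byte has arrived
        // Decode all bytes at once so multi-byte characters are not split.
        return new String(payload, StandardCharsets.UTF_8); // assumed encoding
    }

    /** Convenience overload for an in-memory frame. */
    static String readContent(byte[] frame) {
        try {
            return readContent(new ByteArrayInputStream(frame));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes
        byte[] frame = new byte[4 + payload.length];
        frame[1] = (byte) frame.length; // length = 10, big-endian, high byte stays 0
        frame[3] = 1;                   // arbitrary flag value
        System.arraycopy(payload, 0, frame, 4, payload.length);
        System.out.println(readContent(frame));
    }
}
```

Casting each byte to `char`, as in the question, only works for ASCII because it treats every byte as a complete character; decoding the whole byte array with a charset avoids that.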


Writing a Java program that encrypts .txt files with an integer key

This is my first question on StackOverflow. Hope it's gonna be clear and detailed enough.
So I need to write 2 methods, encrypt and decrypt.
My encrypt function is:
public void cifra() throws FileNotFoundException,IOException {
FileInputStream in=new FileInputStream(file);
String s="";
int b;
while(in.read()!=-1) {
b=in.read()+key;
s+=b;
}
in.close();
PrintStream ps=new PrintStream(file);
ps.println(s);
ps.close();
}
My decrypt function is the same but with
b=in.read()-key;
But it doesn't work. The output file is not the same as the original, unencrypted file.
Thanks for the help!
Change your while loop to this:
while ((b = in.read()) != -1) {
b += key;
s += b;
}
Currently you read twice per iteration, once inside the while condition and once inside the loop body, so you discard every other byte.
in.read() is reading in a single byte of the file, as an integer. You are then converting that integer to a string via s+=b.
So say in.read() gives you 97 (ASCII for 'a') and your key is 5, you are turning around and writing literally 102 to the file, instead of an 'f', which would be the "encoded" character.
Your loop should be building a byte array (or byte stream) and you should write that byte array to the file.
Here are the docs for the ByteArrayOutputStream, which your loop should write to, which you can in-turn write to a file.
You are reading bytes (each one into an int).
A String, however, is not an array of bytes; it contains Unicode text and can combine Greek, Chinese and whatever else. (In fact, String uses chars, where every char is two bytes.) Converting external bytes to a String involves a charset encoding. For arbitrary binary data that conversion will go wrong, and it also uses more memory and is slow. Hence one generally does not use String here.
FileInputStream in = new FileInputStream(file);
ByteArrayOutputStream out = new ByteArrayOutputStream();
int b;
while((b = in.read()) !=-1) {
b = (b + key) % 256;
out.write(b);
}
in.close();
byte[] data = out.toByteArray();
FileOutputStream out2 = new FileOutputStream(file);
out2.write(data);
out2.close();
The other problem is that bytes have a range of 0 to 255 (or, as signed bytes, -128 to 127).
Hence my %, the modulo operator. One also sees & 0xFF (bitwise AND with 255, 0b1111_1111).
Note that println(someInt) writes the textual representation of the integer: 'A', being int 65, will be stored as "65", that is, as 2 bytes: 54 and 53 (the codes of '6' and '5').
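A small in-memory sketch of the same byte shift may make the wrap-around easier to see. The class `ByteShift` and its `shift` method are hypothetical names chosen for illustration; casting to `byte` keeps only the low 8 bits, which has the same effect as the modulo above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteShift {
    /** Returns a copy of data with key added to every byte, wrapping around modulo 256. */
    static byte[] shift(byte[] data, int key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] + key); // the cast keeps the low 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] encrypted = shift(plain, 5);      // 'a' (97) becomes 'f' (102)
        byte[] decrypted = shift(encrypted, -5); // the negative key undoes the shift
        System.out.println(new String(encrypted, StandardCharsets.US_ASCII));
        System.out.println(Arrays.equals(plain, decrypted));
    }
}
```

For whole files, `java.nio.file.Files.readAllBytes` and `Files.write` could supply and store the byte arrays, so the data never passes through a String.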

How to convert a character into an integer in Java?

I am a beginner at Java, trying to figure out how to convert characters from a text file into integers. In the process, I wrote a program which generates a text file showing what characters are generated by what integers.
package numberchars;
import java.io.FileWriter;
import java.io.IOException;
import java.io.FileReader;
import java.lang.Character;
public class Numberchars {
public static void main(String[] args) throws IOException {
FileWriter outputStream = new FileWriter("NumberChars.txt");
//Write to the output file the char corresponding to the decimal
// from 1 to 255
int counter = 1;
while (counter <256)
{
outputStream.write(counter);
outputStream.flush();
counter++;
}
outputStream.close();
This generated NumberChars.txt, which had all the numbers, all the letters both upper and lower case, surrounded at each end by other symbols and glyphs.
Then I tried to read this file and convert its characters back into integers:
FileReader inputStream = new FileReader("NumberChars.txt");
FileWriter outputStream2 = new FileWriter ("CharNumbers.txt");
int c;
while ((c = inputStream.read()) != -1)
{
outputStream2.write(Character.getNumericValue(c));
outputStream2.flush();
}
}
}
The resulting file, CharNumbers.txt, began with the same glyphs as NumberChars.txt but then was blank. Opening the files in MS Word, I found NumberChars had 248 characters (including 5 spaces) and CharNumbers had 173 (including 8 spaces).
So why didn't the Character.getNumericValue(c) result in an integer written to CharNumbers.txt? And given that it didn't, why at least didn't it write an exact copy of NumberChars.txt? Any help much appreciated.
Character.getNumericValue doesn't do what you think it does. If you read the Javadoc:
Returns the int value that the specified character (Unicode code point) represents. For example, the character '\u216C' (the Roman numeral fifty) will return an int with a value of 50.
If the character has no numeric value, it returns -1 (which looks like 0xFF_FF_FF_FF in two's complement).
Most characters don't have such a "numeric value," so you write the ints out, each padded to 2 bytes (more on that later), read them back in the same way, and then start writing a whole lot of 0xFFFF (-1 truncated to 2 bytes) courtesy of a misplaced Character.getNumericValue. I'm not sure what MS Word is doing, but it's probably getting confused what the encoding of your file is and glomming all those bytes into 0xFF_FF_FF_FF (because the high bits of each byte are set) and treating that as one character. (Use a text editor more suited to this kind of stuff like Notepad++, btw.) If you were to measure your file's size on disk in bytes it will probably still be 256 chars * 2 bytes/chars = 512 bytes.
I'm not sure what you meant to do here, so I'll note that InputStreamReader and OutputStreamWriter work on a (Unicode) character basis, with an encoder that defaults to the system one. That's why your ints are padded/truncated to 2 bytes. If you wanted pure byte IO, use FileInputStream/FileOutputStream. If you wanted to read and write the ints as Strings, you need to use FileWriter/FileReader, but not like you did.
// Just bytes
// This is a try-with-resources. It executes the code with the decls in it
// but is also like an implicit finally block that calls `close()` on each resource.
try(FileOutputStream fos = new FileOutputStream("bytes.bin")) {
for(int b = 0; b < 256; b++) { // Bytes are signed so we use int.
// This takes an int and truncates it for the lowest byte
fos.write(b);
// Can also fill a byte[] and dump it all at once with overloaded write.
}
}
byte[] bytes = new byte[256];
try(FileInputStream fis = new FileInputStream("bytes.bin")) {
// Reads up to bytes.length bytes into bytes
fis.read(bytes);
}
// Foreach loop. If you don't know what this does, I think you can figure out from the name.
for(byte b : bytes) {
System.out.println(b);
}
// As Strings
try(FileWriter fw = new FileWriter("strings.txt")) {
for(int i = 0; i < 256; i++) {
// You need a delimiter lest you not be able to tell 12 from 1,2 when you read
// Uses system default encoding
fw.write(Integer.toString(i) + "\n");
}
}
byte[] bytes = new byte[256];
try(
FileReader fr = new FileReader("strings.txt");
// FileReaders can't do stuff like "read one line to String" so we wrap it
BufferedReader br = new BufferedReader(fr);
) {
for(int i = 0; i < 256; i++) {
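// Byte.valueOf would throw NumberFormatException for 128..255, so parse as int and truncate
bytes[i] = (byte) Integer.parseInt(br.readLine());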
}
}
for(byte b : bytes) {
System.out.println(b);
}
public class MyClass {
public static void main(String[] args)
{
char x='b';
System.out.println(+x);// just by writing a plus sign before the variable you get its ASCII value; this prints 98.
}
}
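To make the difference between a character's code and its "numeric value" concrete, here is a short sketch. The class name `CharToInt` is made up for illustration; the methods called are standard `java.lang.Character` methods.

```java
class CharToInt {
    /** The character's UTF-16 code unit, which equals the ASCII code for ASCII characters. */
    static int codeOf(char c) {
        return c; // implicit widening from char to int, same as (int) c or +c
    }

    public static void main(String[] args) {
        System.out.println(codeOf('b'));                      // 98
        System.out.println(Character.getNumericValue('7'));   // 7
        System.out.println(Character.getNumericValue('b'));   // 11: letters count as digits 10..35
        System.out.println(Character.getNumericValue('$'));   // -1: no numeric value
        System.out.println(Character.digit('7', 10));         // 7
        System.out.println(Character.digit('b', 10));         // -1: not a decimal digit
    }
}
```

So for the program in the question, writing the code itself as text (for example `Integer.toString(c)` followed by a delimiter) is probably closer to the intent than `Character.getNumericValue(c)`.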

Sending string as byte array from C# to Java via socket

I am trying the following:
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes:
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
mClientSocket.GetStream().Write(headerBytes, 0, headerBytes.Length);
//write text:
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
int headerSize = in.read();
int bytesRead = 0;
char[] input = new char[headerSize];
while (bytesRead < headerSize)
{
bytesRead += in.read(input, bytesRead, headerSize - bytesRead);
}
String resString = new String(input);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}
The string size equals 9. That's correct on both sides. But when I read the string itself on the Java side, the data looks wrong. The content of the char buffer (the 'input' variable) looks like this:
",",",'H','e','l','l','o',''
I tried to change the endianness by reversing the byte array. I also tried changing the string encoding between ASCII and UTF-8. I still feel like it relates to an endianness problem, but I cannot figure out how to solve it. I know I can use other types of writers to write text data to the stream, but I am trying to use raw byte arrays for the sake of learning.
These
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
are 4 bytes. And they aren't character data so it makes no sense to read them with a BufferedReader. Just read the bytes directly.
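// `in` must be the raw stream here (InputStream in = clientSocket.getInputStream();), not a Reader
byte[] headerBytes = new byte[4];
// shortcut, make sure 4 bytes were actually read
in.read(headerBytes);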
Now extract your text's length and allocate enough space for it
int length = ByteBuffer.wrap(headerBytes).getInt();
byte[] textBytes = new byte[length];
Then read the text
int remaining = length;
int offset = 0;
while (remaining > 0) {
int count = in.read(textBytes, offset, remaining);
if (-1 == count) {
// deal with it
break;
}
remaining -= count;
offset += count;
}
Now decode it as UTF-8
String text = new String(textBytes, StandardCharsets.UTF_8);
and you are done.
Endianness will have to match for those first 4 bytes. One way of ensuring that is to use "network order" (big-endian). So:
C# Client
byte[] headerBytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(stringToSend.Length));
Java Server
int length = ByteBuffer.wrap(headerBytes).order(ByteOrder.BIG_ENDIAN).getInt();
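To see why the byte order matters, here is a small sketch that decodes the same 4 header bytes both ways. It assumes the C# client runs on a little-endian machine, where `BitConverter.GetBytes(9)` yields the bytes `09 00 00 00`. The class `LengthHeader` and its `decode` method are hypothetical helpers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LengthHeader {
    /** Interprets 4 header bytes as an int using the given byte order. */
    static int decode(byte[] header, ByteOrder order) {
        return ByteBuffer.wrap(header).order(order).getInt();
    }

    public static void main(String[] args) {
        // What a little-endian sender produces for the length 9 (assumed sender platform).
        byte[] header = {9, 0, 0, 0};
        System.out.println(decode(header, ByteOrder.LITTLE_ENDIAN)); // 9
        System.out.println(decode(header, ByteOrder.BIG_ENDIAN));    // 150994944
    }
}
```

`ByteBuffer` defaults to big-endian, so either the sender converts to network order as shown above, or the receiver asks for `ByteOrder.LITTLE_ENDIAN`; the two sides only need to agree.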
At first glance it appears you have a problem with your indexes.
Your C# code is sending an integer converted to 4 bytes.
But your Java code is only reading a single byte as the length of the string.
The next 3 bytes sent from C# are the three zero bytes of your string length.
Your Java code is reading those 3 zero bytes and converting them to empty characters, which become the first 3 empty characters of your input[] array.
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes: the original code sent the length as a 4-byte integer here. A single byte only works for up to 255 bytes; if your string is longer, you'll need to send another data type, perhaps an integer converted to 4 bytes.
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
mClientSocket.GetStream().WriteByte((byte)textBytes.Length);
//write text the entire buffer
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
// the original client sent the length as a 4-byte integer, but only a single char was read here.
int headerSize = in.read();// reads a single char from the input (one byte for values below 128)
int bytesRead = 0;
char[] input = new char[headerSize];
// no need for a while statement here:
bytesRead = in.read(input, 0, headerSize);
// if you are going to use a while statement, process the input in each iteration,
// because it will get overwritten on the next read.
String resString = new String(input);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}

String to byte[] and byte[] to String [duplicate]

This question already has answers here:
How to convert String into Byte and Back
(2 answers)
How do you convert binary data to Strings and back in Java?
(4 answers)
Closed 7 years ago.
I'm dealing with some code that converts a String into a byte[], then from byte[] to a String (a String which is a binary representation of the original String), and then I'm supposed to do something with that String. When I try to convert that String to a byte[] and the byte[] back to the original String, something is not working.
byte[] binary = "Example".getBytes(StandardCharsets.UTF_8);
String x = new String();
for(byte b : binary)
{
x += Integer.toBinaryString(b);
}
byte[] b = new byte[x.length()];
for (int i = 0; i < b.length; i++)
{
b[i] = (byte) (x.charAt(i) - '0');
}
String str = new String(b, StandardCharsets.UTF_8);
System.out.println(str);
As you can see in that code, I'm using an example String called "Example" and I'm trying to do what I wrote above.
When I print str, I'm not getting that "Example" string.
Does anyone know a way to do this? I searched for a solution on Stack Overflow itself, but I can't figure out a solution.
This should work without the middle section.
byte[] binary = "Example".getBytes(StandardCharsets.UTF_8);
String str = new String(binary, StandardCharsets.UTF_8);
System.out.println(str);
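If the intermediate binary string is actually needed, the middle section can be made reversible. The code in the question fails for two reasons: `Integer.toBinaryString` drops leading zeros (so 'E', 69, becomes the 7 digits "1000101"), and the second loop turns every single digit into its own byte of value 0 or 1. The following is a sketch with hypothetical names (`BinaryText`, `toBits`, `fromBits`) that pads each byte to exactly 8 digits and parses them back in groups of 8.

```java
import java.nio.charset.StandardCharsets;

class BinaryText {
    /** Encodes each byte as exactly 8 binary digits. */
    static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            // & 0xFF prevents sign extension to 32 digits for bytes >= 0x80
            String bits = Integer.toBinaryString(b & 0xFF);
            sb.append(String.format("%8s", bits).replace(' ', '0')); // left-pad with zeros
        }
        return sb.toString();
    }

    /** Decodes a string produced by toBits back into bytes. */
    static byte[] fromBits(String bits) {
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return out;
    }

    public static void main(String[] args) {
        String bits = toBits("Example".getBytes(StandardCharsets.UTF_8));
        System.out.println(bits);
        System.out.println(new String(fromBits(bits), StandardCharsets.UTF_8));
    }
}
```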

Java TCP Socket receiving bytes with specified length

I am trying to first read 4 bytes(int) specifying the size of the message and then read the remaining bytes based on the byte count. I am using the following code to accomplish this:
DataInputStream dis = new DataInputStream(
mClientSocket.getInputStream());
// read the message length
int len = dis.readInt();
Log.i(TAG, "Reading bytes of length:" + len);
// read the message data
byte[] data = new byte[len];
if (len > 0) {
dis.readFully(data);
} else {
return "";
}
return new String(data);
Is there a better/efficient way of doing this?
From JavaDocs of readUTF:
First, two bytes are read and used to construct an unsigned 16-bit
*integer* in exactly the manner of the readUnsignedShort method . This
integer value is called the UTF length and specifies the number of
additional bytes to be read. These bytes are then converted to
characters by considering them in groups. The length of each group is
computed from the value of the first byte of the group. The byte
following a group, if any, is the first byte of the next group.
The only problem with this is that your protocol seems to send 4 bytes for the payload length, whereas readUTF expects a 2-byte length. Perhaps you can write a similar method but increase the size of the length prefix to 4 bytes (32 bits).
Also, I see that you are just doing new String(bytes), which works fine only as long as the encoding of the data is the same as "the platform's default charset" (see the Javadoc). It would be much safer to specify the encoding explicitly (e.g. if you know that the sender sends UTF-8, use new String(bytes, "UTF-8") instead).
How about
DataInputStream dis = new DataInputStream(new BufferedInputStream(
mClientSocket.getInputStream()));
return dis.readUTF();
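You can use read(byte[] b, int off, int len) like this, but note that read may return before len bytes have arrived, so check its return value (readFully does that looping for you):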
byte[] data = new byte[len];
dis.read(data,0,len);
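Putting the suggestions from these answers together, a 4-byte length prefix with an explicit charset could look like the sketch below. It assumes UTF-8 and a big-endian length, which is what `DataOutputStream.writeInt` and `DataInputStream.readInt` use; the class `LengthPrefixed` and its method names are invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixed {
    /** Builds a frame: 4-byte big-endian payload length followed by the UTF-8 payload. */
    static byte[] encode(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads one frame from the stream and decodes the payload as UTF-8. */
    static String decode(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int len = dis.readInt();
        if (len < 0) {
            throw new IOException("negative length: " + len);
        }
        byte[] data = new byte[len];
        dis.readFully(data); // loops internally until all len bytes are read
        return new String(data, StandardCharsets.UTF_8);
    }

    /** Convenience wrapper for an in-memory frame. */
    static String decodeFrame(byte[] frame) {
        try {
            return decode(new ByteArrayInputStream(frame));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeFrame(encode("h\u00e9llo")));
    }
}
```

On a real socket, `decode` would be given `socket.getInputStream()`, ideally wrapped in a `BufferedInputStream`; a sanity limit on `len` would also be worth adding before allocating the array.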
