When I use a PrintWriter to write to a file, does it write ASCII codes or a binary format?
Thanks.
A Writer writes characters, so the binary data that ends up in the file depends on the encoding.
For example, with a 16-bit encoding such as UTF-16, every ASCII character is written as two bytes, one of which is zero:
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

public class TestWriter
{
    public static void main(String[] args)
        throws UnsupportedEncodingException
    {
        // Collect the encoded bytes in memory instead of in a file
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        final OutputStreamWriter out = new OutputStreamWriter(baos, "UTF-16");
        final PrintWriter writer = new PrintWriter(out);
        writer.printf("abc");
        writer.close();
        // Dump every byte the writer produced as hex
        for (final byte b : baos.toByteArray())
        {
            System.out.printf("0x%02x ", b);
        }
        System.out.printf("\n");
    }
}
This prints 0xfe 0xff 0x00 0x61 0x00 0x62 0x00 0x63; the leading 0xfe 0xff is the byte order mark.
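To make the dependence on the encoding more concrete, here is a minimal sketch that pushes the same text through a PrintWriter with several charsets and compares the bytes. The class name EncodingComparison and the helpers encode and hex are made up for this illustration; only the JDK classes are real.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class EncodingComparison {
    // Returns the bytes a PrintWriter produces for the text in the given charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(sink, charset))) {
            writer.print(text);
        }
        return sink.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "ab\u00e9"; // 'a', 'b' and 'é'
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": " + hex(encode(text, cs)));
        }
    }
}
```

The ASCII letters come out as the same byte values in ISO-8859-1 and UTF-8, while the é takes one byte in the first and two in the second.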
It will output character data. But this can be beyond the ASCII set, which holds only 128 characters, of which the first 32 are special control characters.
You can write most ASCII characters as text or as binary and get the same result with most character encodings. Two ASCII characters have a special meaning in text files: newline \n and carriage return \r. This means that if you write a String containing these characters to a text file, it can be hard or impossible for the reader to distinguish an end-of-line written by the file format, e.g. println(), from an end-of-line embedded in a String, e.g. print("1\n2\n3\n").
Related
If a string of data contains characters with different encodings, is there a way to change charset encoding after an input stream is created or suggestions on how it could be achieved?
Example to help explain:
// data need to read first 4 characters using UTF-8 and next 4 characters using ISO-8859-2?
String data = "testўёѧẅ";
// use default charset of platform, could pass in a charset
try (InputStream in = new ByteArrayInputStream(data.getBytes())) {
// probably an input stream reader to use char instead of byte would be clearer but hopefully the idea comes across
byte[] bytes = new byte[4];
while (in.read(bytes) != -1) {
// TODO: change the charset here to UTF-8 then read values
// TODO: change the charset here to ISO-8859-2 then read values
}
}
Been looking at decoders, might be the way to go:
What is CharsetDecoder.decode(ByteBuffer, CharBuffer, endOfInput)
Encoding conversion in java
Attempt using same input stream:
String data = "testўёѧẅ";
InputStream inputStream = new ByteArrayInputStream(data.getBytes());
Reader r = new InputStreamReader(inputStream, "UTF-8");
int intch;
int count = 0;
while ((intch = r.read()) != -1) {
System.out.println((char)intch);
if ((++count) == 4) {
r = new InputStreamReader(inputStream, Charset.forName("ISO-8859-2"));
}
}
//outputs test and not the 2nd part
Assuming that you know there will be n UTF-8 characters and m ISO 8859-2 characters in your stream (n=4, m=4 in your example), you can do this by using two different InputStreamReaders working on the same InputStream:
try (InputStream in = new ByteArrayInputStream(data.getBytes())) {
InputStreamReader inUtf8 = new InputStreamReader(in, StandardCharsets.UTF_8);
InputStreamReader inIso88592 = new InputStreamReader(in, Charset.forName("ISO-8859-2"));
// read `n` characters using inUtf8, then read `m` characters using inIso88592
}
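Note that you need to count characters, not bytes (i.e. track how many characters have been read so far, because in UTF-8 a single character may be encoded as 1 to 4 bytes). Also be aware that an InputStreamReader may read ahead and buffer more bytes from the underlying stream than the current read needs, so the second reader is not guaranteed to start at the right byte.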
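An alternative that avoids sharing one stream between two readers is to work on the raw bytes: find where the first n UTF-8 code points end by inspecting the lead bytes, then decode each part separately. The following is only a sketch under that assumption (well-formed UTF-8 prefix, everything after it in the second charset); the class MixedCharsetDecode and its methods are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class MixedCharsetDecode {
    // Returns how many bytes the first `codePoints` UTF-8 code points occupy.
    static int utf8PrefixLength(byte[] bytes, int codePoints) {
        int pos = 0;
        for (int i = 0; i < codePoints; i++) {
            int lead = bytes[pos] & 0xFF;
            if (lead < 0x80) pos += 1;                    // 0xxxxxxx
            else if ((lead & 0xE0) == 0xC0) pos += 2;     // 110xxxxx
            else if ((lead & 0xF0) == 0xE0) pos += 3;     // 1110xxxx
            else if ((lead & 0xF8) == 0xF0) pos += 4;     // 11110xxx
            else throw new IllegalArgumentException("Not a UTF-8 lead byte at offset " + pos);
        }
        return pos;
    }

    // Decodes the UTF-8 prefix and the remainder with two different charsets.
    static String decodeMixed(byte[] bytes, int utf8CodePoints, Charset second) {
        int split = utf8PrefixLength(bytes, utf8CodePoints);
        String head = new String(bytes, 0, split, StandardCharsets.UTF_8);
        String tail = new String(bytes, split, bytes.length - split, second);
        return head + tail;
    }

    // Joins two byte arrays into one.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        // Test data: 4 code points in UTF-8 followed by 4 characters in ISO-8859-2
        byte[] mixed = concat("t\u0119st".getBytes(StandardCharsets.UTF_8),
                              "\u017c\u00f3\u0142w".getBytes(latin2));
        System.out.println(decodeMixed(mixed, 4, latin2));
    }
}
```

Note that the count is in code points: a 4-byte UTF-8 sequence becomes two Java chars after decoding.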
String contains Unicode so it can combine all language scripts.
String data = "testўёѧẅ";
For that, String uses a char array, where each char is a UTF-16 code unit. Some Unicode symbols (code points) need to be encoded as two chars, so a char corresponds exactly to a code point for only part of Unicode. Here it might do:
String d1 = data.substring(0, 4);
byte[] b1 = d1.getBytes(StandardCharsets.UTF_8); // Binary data, UTF-8 text
String d2 = data.substring(4);
Charset charset = Charset.forName("ISO-8859-2");
byte[] b2 = d2.getBytes(charset); // Binary data, Latin-2 text
The number of bytes does not need to correspond to the number of code points.
Also, é might be one code point, é, or two code points: e followed by a zero-width combining acute accent (U+0301).
To split text by script or Unicode block:
data.codePoints().forEach(cp -> System.out.printf("%-35s - %-25s - %s%n",
Character.getName(cp),
Character.UnicodeBlock.of(cp),
Character.UnicodeScript.of(cp)));
Name: Unicode block: Script:
LATIN SMALL LETTER T - BASIC_LATIN - LATIN
LATIN SMALL LETTER E - BASIC_LATIN - LATIN
LATIN SMALL LETTER S - BASIC_LATIN - LATIN
LATIN SMALL LETTER T - BASIC_LATIN - LATIN
CYRILLIC SMALL LETTER SHORT U - CYRILLIC - CYRILLIC
CYRILLIC SMALL LETTER IO - CYRILLIC - CYRILLIC
CYRILLIC SMALL LETTER LITTLE YUS - CYRILLIC - CYRILLIC
LATIN SMALL LETTER W WITH DIAERESIS - LATIN_EXTENDED_ADDITIONAL - LATIN
This is my encryption program, primarily used to encrypt (text) files.
This part of the program converts the List<Integer> elements into a byte[] and writes it to a text file. Unfortunately, I cannot provide the algorithm.
void printit(List<Integer> prnt, File outputFile) throws IOException
{
StringBuilder building = new StringBuilder(prnt.size());
for (Integer element : prnt)
{
int elmnt = element;
//building.append(getascii(elmnt));
building.append((char)elmnt);
}
String encryptdtxt=building.toString();
//System.out.println(encryptdtxt);
byte [] outputBytes = encryptdtxt.getBytes();
FileOutputStream outputStream =new FileOutputStream(outputFile);
outputStream.write(outputBytes);
outputStream.close();
}
This is the decryption program, which gets its input from a .enc file:
void getfyle(File inputFile) throws IOException
{
FileInputStream inputStream = new FileInputStream(inputFile);
byte[] inputBytes = new byte[(int)inputFile.length()];
inputStream.read(inputBytes);
inputStream.close();
String fylenters = new String(inputBytes);
for (char a:fylenters.toCharArray())
{
usertext.add((int)a);
}
for (Integer bk : usertext)
{
System.out.println(bk);
}
}
Since the methods in my algorithm require a List<Integer>, the byte[] is converted to a String first and then to a List<Integer>, and vice versa.
The elements written to the file during encryption do not match the elements read back from the .enc file.
Is my method of converting List<Integer> to byte[] correct?
Or is something else wrong? I know that Java can't print extended ASCII characters, so I used this. But even this failed; it gives a lot of ?s.
Is there a solution?
Please help me. Also, how do I do this for other formats (.png, .mp3, etc.)?
The format of the encrypted file can be anything (it needn't be .enc).
Thanks.
There are thousands of different 'extended ASCII' codes and Java supports about a hundred of them, but you have to tell it which Charset to use, or the default often causes data corruption.
While representing arbitrary "binary" bytes in hex or base64 is common and often necessary, if the bytes will be stored and/or transmitted in ways that preserve all 256 values (often called "8-bit clean"), and File{Input,Output}Stream does, you can use "ISO-8859-1", which maps Java char codes 0-255 to and from bytes 0-255 without loss, because Unicode is based partly on 8859-1.
On input, read (into) a byte[] and then call new String(bytes, charset), where charset is either the name "ISO-8859-1" or the java.nio.charset.Charset object for that name, available as java.nio.charset.StandardCharsets.ISO_8859_1; or create an InputStreamReader on a stream reading the bytes from a buffer or directly from the file, using that charset name or object, and read chars and/or a String from the Reader.
On output, use String.getBytes(charset), where charset is that charset name or object, and write the byte[]; or create an OutputStreamWriter on a stream writing the bytes to a buffer or the file, using that charset name or object, and write chars and/or a String to the Writer.
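The lossless mapping can be checked with a short sketch that round-trips all 256 byte values through a String. The class Latin1RoundTrip and its method names are invented for this illustration; the UTF-8 comparison is added only to show the contrast.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not from the original answer.
class Latin1RoundTrip {
    // Decodes the bytes into a String and encodes them back with the same charset.
    static byte[] roundTrip(byte[] original, Charset charset) {
        String asText = new String(original, charset);
        return asText.getBytes(charset);
    }

    // Builds an array holding every byte value 0..255 once.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1 lossless: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8 lossless:      "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));
    }
}
```

With UTF-8 the round trip fails because bytes above 127 that do not form valid sequences are replaced during decoding.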
But you don't actually need char and String and Charset at all. You actually want to write a series of Integers as bytes, and read a series of bytes as Integers. So just do that:
void printit(List<Integer> prnt, File outputFile) throws IOException
{
    byte[] outputBytes = new byte[prnt.size()]; int i = 0;
    // Iterate as int so that the narrowing cast to byte compiles
    for (int element : prnt) outputBytes[i++] = (byte)element;
    FileOutputStream outputStream = new FileOutputStream(outputFile);
    outputStream.write(outputBytes);
    outputStream.close();
    // or replace the previous three lines by one
    // java.nio.file.Files.write (outputFile.toPath(), outputBytes);
}
void getfyle(File inputFile) throws IOException
{
FileInputStream inputStream = new FileInputStream(inputFile);
byte[] inputBytes = new byte[(int)inputFile.length()];
inputStream.read(inputBytes);
inputStream.close();
// or replace those four lines with
// byte[] inputBytes = java.nio.file.Files.readAllBytes (inputFile.toPath());
for (byte b: inputBytes) System.out.println (b&0xFF);
// or if you really wanted a list not just a printout
ArrayList<Integer> list = new ArrayList<Integer>(inputBytes.length);
for (byte b: inputBytes) list.add (b&0xFF);
// return list or store it or whatever
}
Arbitrary data bytes are not all convertible to any character encoding and encryption creates data bytes including all values 0 - 255.
If you must convert the encrypted data to a string format the standard methods are to convert to Base64 or hexadecimal.
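As a minimal sketch of the Base64 route, assuming the encrypted values are already in the range 0 to 255 as in the question: the class Base64Transport and its method names are made up for this example, while java.util.Base64 is the standard JDK API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper class for illustration only.
class Base64Transport {
    // Packs the values (assumed to be 0..255) into bytes and encodes them as Base64 text.
    static String toBase64(List<Integer> values) {
        byte[] raw = new byte[values.size()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) values.get(i).intValue();
        }
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes Base64 text back into the list of unsigned byte values.
    static List<Integer> fromBase64(String text) {
        List<Integer> values = new ArrayList<>();
        for (byte b : Base64.getDecoder().decode(text)) {
            values.add(b & 0xFF);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Integer> encrypted = Arrays.asList(0, 65, 128, 255);
        String text = toBase64(encrypted);
        System.out.println(text);
        System.out.println(fromBase64(text));
    }
}
```

The resulting text contains only printable ASCII characters, so it survives any charset that is a superset of ASCII.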
In encryption part:
`for (Integer element : prnt)
{
int elmnt = element;
//building.append(getascii(elmnt));
char b = Integer.toString(elmnt).charAt(0);
building.append(b);
}`
--> this converts each int to a char, e.g. 1 to '1' and 5 to '5' (note that charAt(0) keeps only the first character of the decimal string)
There are many similar questions, but none of them helped me.
UTF-8 characters can be 1, 2, 3 or 4 bytes long.
ISO-8859-15 is always 2 bytes.
But I need 1-byte characters, as in code page 863 (IBM863).
http://en.wikipedia.org/wiki/Code_page_863
For example, "é" is code point 233 and is 2 bytes long in UTF-8; how can I convert it to IBM863 (1 byte) in Java?
Is it possible while running the JVM with -Dfile.encoding=UTF-8?
Of course, that conversion means that some characters can be lost, because IBM863 is smaller.
But I need the language-specific characters, like the French è, é etc.
Edit1:
String text = "text with é";
Socket socket = getPrinterSocket( printer);
BufferedWriter bwOut = getPrinterWriter(printer,socket);
...
bwOut.write("PRTXT \"" + text + "\n");
...
if (socket != null)
{
bwOut.close();
socket.close();
}
else
{
bwOut.flush();
}
It is going to a label printer with Fingerprint 8.2.
Edit 2:
private BufferedWriter getPrinterWriter(PrinterLocal printer, Socket socket)
throws IOException
{
return new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
}
First of all: there is no such thing as "1 byte char" or, in fact, "n byte char" for whatever n.
In Java, a char is a UTF-16 code unit; depending on the (Unicode) code point, either one, or two chars, are necessary to represent a code point.
You can use the following methods:
Character.toChars() to turn a Unicode code point into a char array representing this code point;
a CharsetEncoder to perform the char[] to byte[] conversion;
a CharsetDecoder to perform the byte[] to char[] conversion.
You obtain the two latter from a Charset's .new{Encoder,Decoder}() methods.
It is crucially important here to know what your input is exactly: is it a code point, is it an encoded byte array? You'll have to adapt your code depending on this.
Final note: the file.encoding setting defines the default charset used when you don't specify one, for instance in the FileReader constructors; you should always specify a charset explicitly to begin with!
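The encoder route described above might look like the following sketch. The class SingleByteEncoding and the method encodeStrict are hypothetical names; the encoder is configured to report unmappable characters instead of silently substituting them. Whether "IBM863" is available depends on the charsets shipped with the Java runtime, so the example checks Charset.isSupported first.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class for illustration only.
class SingleByteEncoding {
    // Encodes text with the named charset and fails if a character has no mapping.
    static byte[] encodeStrict(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[encoded.remaining()];
            encoded.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "Text cannot be represented in " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Fall back to ISO-8859-1 if this runtime does not ship IBM863
        String name = Charset.isSupported("IBM863") ? "IBM863" : "ISO-8859-1";
        byte[] bytes = encodeStrict("\u00e9", name); // "é"
        System.out.printf("%s: %d byte(s), value 0x%02x%n", name, bytes.length, bytes[0] & 0xFF);
    }
}
```

Reporting errors makes the data loss mentioned in the question visible: a character missing from the target code page raises an exception rather than turning into a question mark.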
byte[] someUtf8Bytes = ...
String decoded = new String(someUtf8Bytes, StandardCharsets.UTF_8);
byte[] someIso15Bytes = decoded.getBytes("ISO-8859-15");
byte[] someCp863Bytes = decoded.getBytes("cp863");
If you start with a string, use just getBytes with a proper encoding.
If you want to write strings with a proper encoding to a socket, you can either use OutputStream instead of PrintStream or Writer and send byte arrays, or you can do:
new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "cp863"))
I need to write a simple CSV file using OutputStreamWriter. Everything works OK, but I have a problem: at the far left of the first header in the CSV, an extra character or sequence of characters is improperly added to the String. Here is my Java code:
private final Character SEPARATOR=';';
private final Character LINE_FEED='\n';
public void createCSV(final String fileName)//......
{
try
(final OutputStream outputStream = new FileOutputStream(fileName);
final OutputStreamWriter writer=new OutputStreamWriter(outputStream,StandardCharsets.UTF_16);)
{
final StringBuilder builder = new StringBuilder().append("Fecha").append(SEPARATOR)
.append("NºExp").append(SEPARATOR)
.append("NºFactura").append(SEPARATOR).append(LINE_FEED);
writer.append(builder.toString());
writer.append(builder.toString());
writer.flush();
}catch (IOException e){e.printStackTrace();}
}
Unfortunately I am receiving this output. It always happens on the first line; if I repeat the same output on the second line of the CSV, everything works smoothly. Is this a Java problem, or is my Excel giving me nightmares? Thanks a lot.
OUTPUT
This is a superfluous byte order mark (BOM), \uFEFF, a zero-width no-break space, whose byte encoding is used to determine whether the data is UTF-16LE (little endian) or UTF-16BE (big endian).
Write "UTF-16LE", which has the Windows/Intel ordering of least significant byte first, then most significant byte.
StandardCharsets.UTF_16LE
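The difference can be seen by encoding the same short text with each of the UTF-16 variants. This is just a sketch; the class BomDemo and the helper hex are invented names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the original answer.
class BomDemo {
    // Encodes the text with the charset and formats the bytes as hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Plain UTF_16 prepends the byte order mark; the LE and BE variants do not
        System.out.println("UTF_16   : " + hex("Fe", StandardCharsets.UTF_16));
        System.out.println("UTF_16LE : " + hex("Fe", StandardCharsets.UTF_16LE));
        System.out.println("UTF_16BE : " + hex("Fe", StandardCharsets.UTF_16BE));
    }
}
```

Only the plain UTF_16 charset writes the two extra bytes fe ff at the start, which is why the stray character shows up once, at the beginning of the first line.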
Good day.
I have an ASCII file with Spanish words. They contain only characters between A and Z, plus Ñ, ASCII Code 165 (http://www.asciitable.com/).
I get this file with this source code:
InputStream is = ctx.getAssets().open(filenames[lang_code][w]);
InputStreamReader reader1 = new InputStreamReader(is, "UTF-8");
BufferedReader reader = new BufferedReader(reader1, 8000);
try {
while ((line = reader.readLine()) != null) {
workOn(line);
// do a lot of things with line
}
reader.close();
is.close();
} catch (IOException e) { e.printStackTrace(); }
What here I called workOn() is a function that should extract the characters codes from the strings and is something like that:
private static void workOn(String s) {
byte b;
for (int w = 0; w < s.length(); w++) {
b = (byte)s.charAt(w);
// etc etc etc
}
}
Unfortunately what happens here is that I cannot identify b as an ASCII code when it represents the letter Ñ. The value of b is correct for any ASCII letter, but is -3 when dealing with Ñ, which, converted to unsigned, is 253, or the ASCII character ². Nothing similar to Ñ...
What happens here? How should I get this simple ASCII code?
What is driving me mad is that I cannot find a matching encoding. Even if I browse the UTF-8 table (http://www.utf8-chartable.de/), Ñ is 209 dec, 253 dec is ý, and 165 dec is ¥. Again, not even related to what I need.
So... help me please! :(
Are you sure that your source file you are reading is UTF-8 encoded? In UTF-8 encoding, all values greater than 127 are reserved for a multi-byte sequence, and they are never seen standing on their own.
My guess is that the file you are reading is encoded using "code page 437", which is the original IBM PC character set. In that character set, the Ñ is represented by decimal 165.
Many modern systems use ISO-8859-1, which happens to be equivalent to the first 256 characters of the Unicode character set. In those, the Ñ character is decimal 209. In a comment, the author clarified that a 209 is actually in the file.
If the file was really UTF-8 encoded, then the Ñ would be represented as a two-byte sequence, and would be neither the value 165 nor the value 209.
Based on the above assumption that the file is ISO-8859-1 encoded, you should be able to solve the situation by using:
InputStreamReader reader1 = new InputStreamReader(is, "ISO-8859-1");
This will translate to the Unicode characters, and you should then find the character Ñ represented by decimal 209.
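The -3 from the question can be reproduced with a small sketch, assuming the file really contains the single byte 209 (0xD1) for Ñ: decoded as UTF-8, that lone byte is malformed and becomes the replacement character U+FFFD, and casting U+FFFD to byte keeps only the low 8 bits, 0xFD, which is -3. The class ReplacementCharDemo and the method codeAt are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the original answer.
class ReplacementCharDemo {
    // Decodes the raw bytes with the charset and returns the char code at the index.
    static int codeAt(byte[] raw, Charset charset, int index) {
        return new String(raw, charset).charAt(index);
    }

    public static void main(String[] args) {
        // 'A', then 0xD1 (Ñ in ISO-8859-1), then 'A'
        byte[] raw = {0x41, (byte) 0xD1, 0x41};

        int asUtf8 = codeAt(raw, StandardCharsets.UTF_8, 1);
        int asLatin1 = codeAt(raw, StandardCharsets.ISO_8859_1, 1);

        System.out.println("UTF-8 decode:      " + asUtf8 + " -> (byte) " + (byte) asUtf8);
        System.out.println("ISO-8859-1 decode: " + asLatin1);
    }
}
```

With the right charset the char code can be used directly as an int; there is no need to cast it to byte at all.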