Java OutputStreamWriter UTF-16 CVS wrong character on StartLine - java

i need to write a simple CSV file using OutputStreamWriter everything works OK but i have a problem a have in the first Header on the CSV the outer left on every line seems to ADD improperly a Character or a sequence of them in the String here is my Java Code
private final Character SEPARATOR=';';
private final Character LINE_FEED='\n';
public void createCSV(final String fileName)//......
{
try
(final OutputStream outputStream = new FileOutputStream(fileName);
final OutputStreamWriter writer=new OutputStreamWriter(outputStream,StandardCharsets.UTF_16);)
{
final StringBuilder builder = new StringBuilder().append("Fecha").append(SEPARATOR)
.append("NºExp").append(SEPARATOR)
.append("NºFactura").append(SEPARATOR).append(LINE_FEED);
writer.append(builder.toString());
writer.append(builder.toString());
writer.flush();
}catch (IOException e){e.printStackTrace();}
}
unfortunalety i am receiving this ouput always happens in the first line if i repeat the same output to the second line in the CSV everything works smoothly is a Java problem or is my Excel gives me nightmares??.. thank a lot..
OUTPUT

This is a superfluous byte order mark (BOM), \uFFFE, a zero width space, its byte encoding used to determine whether it is UTF-16LE (little endian) or UTF-16-BE (big endian).
Write "UTF16-LE", which has the Windows/Intel ordering of least significant byte, most significant byte.
StandardCharsets.UTF_16LE

Related

Base64 Encoded to Decoded File Conversion Problem

I am processing very large files (> 2Gig). Each input file is Base64 encoded, andI am outputting to new files after decoding. Depending on the buffer size (LARGE_BUF) and for a given input file, my input to output conversion either works fine, is missing one or more bytes, or throws an exception at the outputStream.write line (IllegalArgumentException: Last unit does not have enough bits). Here is the code snippet (could not cut and paste so my not be perfect):
.
.
final int LARGE_BUF = 1024;
byte[] inBuf = new byte[LARGE_BUF];
try(InputStream inputStream = new FileInputStream(inFile); OutputStream outStream new new FileOutputStream(outFile)) {
for(int len; (len = inputStream.read(inBuf)) > 0); ) {
String out = new String(inBuf, 0, len);
outStream.write(Base64.getMimeDecoder().decode(out.getBytes()));
}
}
For instance, for my sample input file, if LARGE_BUF is 1024, output file is 4 bytes too small, if 2*1024, I get the exception mentioned above, if 7*1024, it works correctly. Grateful for any ideas. Thank you.
First, you are converting bytes into a String, then immediately back into bytes. So, remove the use of String entirely.
Second, base64 encoding turns each sequence of three bytes into four bytes, so when decoding, you need four bytes to properly decode three bytes of original data. It is not safe to create a new decoder for each arbitrarily read sequence of bytes, which may or may not have a length which is an exact multiple of four.
Finally, Base64.Decoder has a wrap(InputStream) method which makes this considerably easier:
try (InputStream inputStream = Base64.getDecoder().wrap(
new BufferedInputStream(
Files.newInputStream(Paths.get(inFile))))) {
Files.copy(inputStream, Paths.get(outFile));
}

java convert utf-8 2 byte char to 1 byte char

There are many similar questions, but no one helped me.
utf-8 can be 1 byte or 2,3,4.
ISO-8859-15 is allways 2 bytes.
But I need 1 byte character like code page Code "page 863" (IBM863).
http://en.wikipedia.org/wiki/Code_page_863
For example "é" is code point 233 and is 2 bytes long in utf 8, how can I convert it to IBM863 (1 byte) in Java?
Running on JVM -Dfile.encoding=UTF-8 possible?
Of course that conversion would mean that some characters can be lost, because IBM863 is smaller.
But I need the language specific characters, like french, è, é etc.
Edit1:
String text = "text with é";
Socket socket = getPrinterSocket( printer);
BufferedWriter bwOut = getPrinterWriter(printer,socket);
...
bwOut.write("PRTXT \"" + text + "\n");
...
if (socket != null)
{
bwOut.close();
socket.close();
}
else
{
bwOut.flush();
}
Its going a label printer with Fingerprint 8.2.
Edit 2:
private BufferedWriter getPrinterWriter(PrinterLocal printer, Socket socket)
throws IOException
{
return new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
}
First of all: there is no such thing as "1 byte char" or, in fact, "n byte char" for whatever n.
In Java, a char is a UTF-16 code unit; depending on the (Unicode) code point, either one, or two chars, are necessary to represent a code point.
You can use the following methods:
Character.toChars() to turn a Unicode code point into a char array representing this code point;
a CharsetEncoder to perform the char[] to byte[] conversion;
a CharsetDecoder to perform the byte[] to char[] conversion.
You obtain the two latter from a Charset's .new{Encoder,Decoder}() methods.
It is crucially important here to know what your input is exactly: is it a code point, is it an encoded byte array? You'll have to adapt your code depending on this.
Final note: the file.encoding setting defines the default charset to use when you don't specify a charset to use, for instance in a FileReader constructors; you should avoid not specifying a charset to begin with!
byte[] someUtf8Bytes = ...
String decoded = new String(someUtf8Bytes, StandardCharsets.UTF8);
byte[] someIso15Bytes = decoded.getBytes("ISO-8859-15");
byte[] someCp863Bytes = decoded.getBytes("cp863");
If you start with a string, use just getBytes with a proper encoding.
If you want to write strings with a proper encoding to a socket, you can either use OutputStream instead of PrintStream or Writer and send byte arrays, or you can do:
new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "cp863"))

convert charset X to unicode in Java

How do you convert a specific charset to unicode in Java?
charsets have been discussed quite a lot here, but I think this one hasn't been covered yet.
I have a hex-string that meets the criteria length%4==0 (e.g. \ud3faef8e). usually I just display this in an HTML container and add &#x to the front and ; to the back of each hex quadruple.
but in this case the following procedure led to the correct output (non-Java)
paste hex string into Hex-Editor and save the file to test.txt (utf-8)
open the file with Notepad++
change the encoding to Simplified Chinese (GB2312)
Now I'm trying to do the same in Java.
// having hex convert to ascii
String ascii = "";
for (int cnt = 0; cnt <= unicode.length() - 2; cnt += 2) {
String tmp = unicode.substring(cnt, cnt + 2);
int decimal = Integer.parseInt(tmp, 16);
ascii += (char) decimal;
}
// writing ascii to file at this point leads to the same result as in step 2 before
try {
// get the bytes
byte[] utf8 = ascii.getBytes("UTF-8"); // == UTF8
// convert to gb2312
String converted = new String(utf8, "GB2312"); // == EUC_CN
// write to file (writer with declared UTF-8)
writeToFile(converted, 20 + cntu);
cntu++;
} catch (Exception e) {
System.err.println(e.getMessage());
}
the output looks according the should-output, except the fact that randomly the following character is displayed: � why does this one come up? and how can I get rid of it?
in the end, what I'd like to get is the converted unicode again to be able to display it with my original approach (폴), but I haven't figured out a way to get to the hex values again (they don't match the criteria length%4==0). how do I get the hex values of the characters?
update1
to be more precise, regarding the input, I'm assuming that it is Unicode, because of the start of the String with \u, which would be sufficient for my usual approach, but not in the case I am describing above.
update2
the writeToFile method
FileOutputStream fos = new FileOutputStream("test" + id + ".txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(str);
out.close();
I tried with GB2312 as well, but there is no change. I still get the ? inbetween the correct characters.
update3
the expected output for \ud3f6ef8e is 遇飵 , you get to it when following the steps 1 to 3. (HxD as an example of an hex editor)
there was no indication that I should delete my question, thus I'm writing my final comment as the answer
I was misinterpreting the incoming hex-digits. they were in a specific charset and not uni-code, so they represented the hex-values of a character in that charset. What I'm doing now is new String(byteArray, "CharsetName"); and get (int)s.charAt(i) to get the unicode value and write it to HTML. thanks for your ideas and hints
for more details see this answer here: https://stackoverflow.com/a/4049781/1338732 , and this question here: How to convert UTF-8 to unicode in Java?

Java reading hex file

As part of a larger program, I need to read values from a hex file and print the decimal values.
It seems to be working fine; However all hex values ranging from 80 to 9f are giving wrong values.
for example 80 hex gives a decimal value of 8364
Please help.
this is my code :
String filename = "pidno5.txt";
FileInputStream ist = new FileInputStream("sb3os2tm1r01897.032");
BufferedReader istream = new BufferedReader(new InputStreamReader(ist));
int b[]=new int[160];
for(int i=0;i<160;i++)
b[i]=istream.read();
for(int i=0;i<160;i++)
System.out.print((b[i])+" ");
If you were trying to read raw bytes this is not what you are doing.
You are using a Reader, which reads characters (in an encoding you did not specify, so it defaults to something, maybe UTF-8).
To read bytes, use an InputStream (and do not wrap it in a Reader).
You may also use a different encoding:
BufferedReader istream = new BufferedReader(new InputStreamReader(ist, "ISO-8859-15"));

Using a PrintWriter to write ASCII codes to a file

I want to know that when I use PrintWriter for writing to a file. It will write with ASCII code in the file or binary format?
Thanks.
A Writer writes characters, so the binary data that ends up in the file depends on the encoding.
For example, if you have a 16-bit encoding like UTF-16 then there will be an extra zero byte for each ASCII byte:
public class TestWriter
{
public static void main(String[] args)
throws UnsupportedEncodingException
{
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
final OutputStreamWriter out = new OutputStreamWriter(baos, "UTF-16");
final PrintWriter writer = new PrintWriter(out);
writer.printf("abc");
writer.close();
for (final byte b : baos.toByteArray())
{
System.out.printf("0x%02x ", b);
}
System.out.printf("\n");
}
}
prints 0xfe 0xff 0x00 0x61 0x00 0x62 0x00 0x63.
It will output character data. But this can be beyond the ASCII set, which holds only 128 characters, of which the first 32 are special control characters.
You can write most ASCII letters as text or binary and you will get the same outcome with most characters encodings. Two of the ASCII characters have a special meaning in text files, are newline \n and carriage return \r. This means that if you have text file and you want to write a String which contains these characters it can be hard or impossible for the reader to distinguish between end-of-line in the text file e.g. println() and end-of-line you put in a String e.g. print("1\n2\n3\n")

Categories