How to decode bytes from ByteBuffer to UTF-8 symbols in NIO? - java

I need to read and print some text from the file using NIO. Code works fine with English, but for Russian I need to decode bytes in UTF-8.
I don't understand the order for converting bytes to UTF-8 symbols. Can you help?
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.file.*;
public class Practice {
public static void main(String[] args) {
try (FileChannel fChan = (FileChannel) Files.newByteChannel(Paths.get("D:/test.txt"))) {
ByteBuffer byteBuf = ByteBuffer.allocate(16);
int count;
do {
count = fChan.read(byteBuf);
if(count != -1) {
byteBuf.rewind();
for(int i = 0; i < count; i++) {
System.out.print((char) byteBuf.get());
}
}
} while(count != -1);
} catch(InvalidPathException e) {
System.out.println("Path exception " + e);
} catch(IOException e) {
System.out.println("IO Exception " + e);
}
}
}

To read UTF-8 encoded text from a ByteBuffer, you can decode it as a CharBuffer:
CharBuffer charBuffer = StandardCharsets.UTF_8.decode(byteBuffer);
For more fine-grained access, use the underlying CharsetDecoder:
CharsetDecoder charsetDecoder = StandardCharsets.UTF_8.newDecoder();
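To illustrate why the decoder matters when reading through a small buffer: with a 16-byte buffer, a two-byte Russian letter can be split across two reads, and the one-shot decode() would then see an incomplete character. The following is a minimal sketch, not a definitive implementation. The class name ChunkedUtf8Decode and the helper decodeInChunks are hypothetical, and a byte array stands in for the FileChannel so the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ChunkedUtf8Decode {
    // Hypothetical helper: feeds `data` to a UTF-8 decoder in pieces of at most
    // `chunkSize` bytes, the way a small ByteBuffer would be refilled from a channel.
    static String decodeInChunks(byte[] data, int chunkSize) {
        if (chunkSize < 4) {
            // The longest UTF-8 sequence is 4 bytes; a smaller buffer could never hold one.
            throw new IllegalArgumentException("chunkSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer byteBuf = ByteBuffer.allocate(chunkSize);
        // UTF-8 never produces more chars than it consumes bytes, so this cannot overflow.
        CharBuffer charBuf = CharBuffer.allocate(chunkSize);
        StringBuilder out = new StringBuilder();
        int offset = 0;
        do {
            // Fill the free space of the buffer (stand-in for channel.read(byteBuf)).
            int n = Math.min(byteBuf.remaining(), data.length - offset);
            byteBuf.put(data, offset, n);
            offset += n;
            byteBuf.flip();
            boolean endOfInput = (offset == data.length);
            decoder.decode(byteBuf, charBuf, endOfInput);
            charBuf.flip();
            out.append(charBuf);
            charBuf.clear();
            // Keep the bytes of an incomplete character for the next round.
            byteBuf.compact();
        } while (offset < data.length);
        decoder.flush(charBuf);
        charBuf.flip();
        out.append(charBuf);
        return out.toString();
    }
}
```

The important calls are compact(), which carries the leftover bytes of a split character over to the next read, and the endOfInput flag, which tells the decoder that no more bytes will follow.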
If you truly want to take the raw bytes yourself and decode them using UTF-8, then you first need to learn how UTF-8 works, so do a web search for UTF-8 and start reading, because the way the question is written, it sounds like you don't know that yet. To actually write code for that, you also need to know how to do bit-manipulation in Java, so if you don't know that either, do another web search and start reading. If you can't put that information together to do it, write a new question, explain what you do know, and what is stopping you from applying what you know to the problem.

Related

Problem in reading text from the file using FileInputStream in Java

I have a file input.txt in my system and I want to read data from that file using FileInputStream in Java. There is no error in the code, but still it does not work. It does not display the output. Here is the code, any one help me out kindly.
package com.company;
import java.io.FileInputStream;
import java.io.InputStream;
public class Main {
public static void main(String[] args) {
// write your code here
byte[] array = new byte[100];
try {
InputStream input = new FileInputStream("input.txt");
System.out.println("Available bytes in the file: " + input.available());
// Read byte from the input stream
input.read(array);
System.out.println("Data read from the file: ");
// Convert byte array into string
String data = new String(array);
System.out.println(data);
// Close the input stream
input.close();
} catch (Exception e) {
e.getStackTrace();
}
}
}
Use utility class Files.
Path path = Paths.get("input.txt");
try {
String data = Files.readString(path, Charset.defaultCharset());
System.out.println(data);
} catch (Exception e) {
e.printStackTrace();
}
For binary data, non-text, one should use Files.readAllBytes.
available() is not guaranteed to be the file length; it is only an estimate of the number of bytes that can be read without blocking. Reading more may block while the data is physically read from the disk device.
String.getBytes(Charset) and new String(byte[], Charset) explicitly specify the charset of the actual bytes. String will then keep the text in Unicode, so it may combine all scripts of the world.
Java was designed with text as Unicode, due to the situation then with C and C++. So in a String you can mix Arabic, Greek, Chinese and math symbols. For that, binary data (byte[], InputStream, OutputStream) must be given the encoding (Charset) that the bytes are in, and then a conversion to Unicode happens for text (String, char, Reader, Writer).
FileInputStream.read(byte[]) may fill only part of the array: you must use its return value (the number of bytes actually read, or -1 at the end of the file) and repeat the call until it returns -1.
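As a rough sketch of that last point, a read loop that honours the return value could look like this. The class name ReadLoopSketch and the helper readAll are made up for illustration; the 100-byte buffer mirrors the array size in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class ReadLoopSketch {
    // Hypothetical helper: reads the stream to its end and decodes the bytes once,
    // so a multi-byte character split across two reads is still decoded correctly.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[100];
        int n;
        // read() reports how many bytes it stored in the buffer, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            // Only the first n bytes of the buffer are valid for this round.
            collected.write(buffer, 0, n);
        }
        return new String(collected.toByteArray(), charset);
    }
}
```

It could be called as readAll(new FileInputStream("input.txt"), StandardCharsets.UTF_8), ideally inside a try-with-resources block so the stream is closed.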

Is there an -1 at the end of an Inputstream?

I am quite new to programming.
While reading the article Byte Streams in "Basic I/O" in The Java Tutorials by Oracle, I came across this code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("xanadu.txt");
out = new FileOutputStream("outagain.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
I do not understand the condition of the while-loop. Is -1 some kind of sign that the Message is over? Does the FileOutputStream add it at the end?
Thank you all for your attention. I hope you have a wonderful Silvester.
To add to the other answers, the tool for figuring this out is the documentation.
For the 'read' method of FileInputStream:
public int read() throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by: read in class InputStream
Returns: the next byte of data, or -1 if the end of the file is reached.
This is definitive.
All standard Java classes are documented in this manner. In case of uncertainty, a quick check will reassure you.
EDIT: The documentation for EOFException says: "Signals that an end of file or end of stream has been reached unexpectedly during input.
This exception is mainly used by data input streams, which generally expect a binary file in a specific format, and for which an end of stream is an unusual condition. Most other input streams return a special value on end of stream."
So catching EOFException is the right way to detect the end of a file only for methods that cannot return a special value, such as those of DataInputStream. In this case read() returns the next byte as an int between 0 and 255, or -1 (not null) at the end of the stream. No byte value is negative, so -1 can never be mistaken for data. Checking while ((c = in.read()) >= 0) {} is equivalent, so you can use != -1 and it will work.
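A small sketch makes the value range visible. The class name EndOfStreamSketch and the helper readValues are invented for this example, and an in-memory stream replaces the file so it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class EndOfStreamSketch {
    // Hypothetical helper: collects every value returned by read() before the -1.
    static List<Integer> readValues(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int c;
        while ((c = in.read()) != -1) {
            values.add(c);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // The byte 0xFF is -1 as a Java byte, but read() reports it as 255.
        byte[] data = {0x41, (byte) 0xFF, 0x00};
        System.out.println(readValues(new ByteArrayInputStream(data))); // prints [65, 255, 0]
    }
}
```

The -1 is never stored in the file and is never written by FileOutputStream; it is only the return value read() uses to say that nothing is left.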

Java NIO scan through ByteBuffer for certain bytes and work with sections

Okay, so I'm trying to do something that seemed like it should be fairly simple, but with these new NIO interfaces, things are confusing the hell out of me! Here's what I'm trying to do: I need to scan through a file as bytes until encountering certain bytes! When I encounter those certain bytes, I need to grab that segment of the data and do something with it, and then move on and do this again. I would have thought that with all these markers and positions and limits in ByteBuffer, I'd be able to do this, but I can't seem to make it work! Here's what I have so far.
test.txt:
this is a line of text a
this is line 2b
line 3
line 4
line etc.etc.etc.
Test.java:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class Test {
public static final Charset ENCODING = Charset.forName("UTF-8");
public static final byte[] NEWLINE_BYTE = {0x0A, 0x0D};
public Test() {
String pathString = "test.txt";
//the path to the file
Path path = Paths.get(pathString);
try (FileChannel fc = FileChannel.open(path,
StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
if (fc.size() > 0) {
int n;
ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
do {
n = fc.read(buffer);
} while (n != -1 && buffer.hasRemaining());
buffer.flip();
int pos = 0;
System.out.println("FILE LOADED: |" + new String(buffer.array(), ENCODING) + "|");
do {
byte b = buffer.get();
if (b == NEWLINE_BYTE[0] || b == NEWLINE_BYTE[1]) {
System.out.println("POS: " + pos);
System.out.println("POSITION: " + buffer.position());
System.out.println("LENGTH: " + Integer.toString(buffer.position() - pos));
ByteBuffer lineBuffer = ByteBuffer.wrap(buffer.array(), pos + 1, buffer.position() - pos);
System.out.println("LINE: |" + new String(lineBuffer.array(), ENCODING) + "|");
pos = buffer.position();
}
} while (buffer.hasRemaining());
}
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
public static void main(String args[]) {
Test t = new Test();
}
}
So the first part is working, the fc.read(buffer) function only ever runs once and pulls the entire file into the ByteBuffer. Then in the second do loop, I'm able to loop through byte by byte just fine and it does hit the if statement when it hits a \n (or \r), but then I can't figure out how to get that PORTION of the bytes I've just looked through into a separate byte array to work with! I've tried slice and various flips, and I've tried wrap as shown in the code above, but can't seem to make it work; both buffers always have the complete file and so does anything I slice or wrap off it!
I just need to loop through the file byte by byte, looking at a certain section at a time, and then my end goal, when I've looked through and found the right spot, I want to insert some data to the right spot! I need that lineBuffer as outputted at "LINE: " to have ONLY the portion of the bytes I've looped through so far! Help and thank you!
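Leaving the I/O aside, once you have content in the ByteBuffer it would be a lot simpler to convert it to a CharBuffer by decoding it, e.g. ENCODING.decode(buffer). (asCharBuffer() is not a substitute here: it reinterprets each pair of bytes as one 16-bit char rather than decoding UTF-8.) CharBuffer implements CharSequence, which gives you a lot of String and regex methods to use.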
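A minimal sketch of that approach, assuming the whole file is already in the buffer and is UTF-8 encoded. The class name DecodeAndSplit and the method lines are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DecodeAndSplit {
    // Matches \r\n first so a Windows line ending counts as one break, not two.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\r|\n");

    // Hypothetical helper: decodes the remaining bytes and splits the text into lines.
    static List<String> lines(ByteBuffer buffer) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(buffer);
        // CharBuffer is a CharSequence, so it can be handed to the regex API directly.
        return Arrays.asList(LINE_BREAK.split(chars));
    }
}
```

This gives up byte offsets, so it suits reading and searching; inserting data back at an exact file position still needs the byte-level positions.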
Here is the solution I ended up with, using the bulk relative get function of ByteBuffer to get the chunk each time. I think I'm using the mark() functionality as it's intended, though am using an additional variable (pos) to keep track of the mark since I can't find a function in ByteBuffer to return the relative position of the mark itself. Also, I've got explicit functionality to look for either \r, \n, or both in sequence. Keep in mind this code will only work on UTF-8 encoded data. I hope this helps someone else.
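import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class Test {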
public static final Charset ENCODING = Charset.forName("UTF-8");
public static final byte[] NEWLINE_BYTES = {0x0A, 0x0D};
public Test() {
//test text file sequence of any strings followed by newline
String pathString = "test.txt";
Path path = Paths.get(pathString);
try (FileChannel fc = FileChannel.open(path,
StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
if (fc.size() > 0) {
int n;
ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
do {
n = fc.read(buffer);
} while (n != -1 && buffer.hasRemaining());
buffer.flip();
int newlineByteCount = 0;
buffer.mark();
do {
//get one byte at a time
byte b = buffer.get();
if (b == NEWLINE_BYTES[0] || b == NEWLINE_BYTES[1]) {
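newlineByteCount++;
//treat \r\n as a single line break; only peek ahead if a byte is left
if (b == NEWLINE_BYTES[1] && buffer.hasRemaining()) {
if (buffer.get() == NEWLINE_BYTES[0]) {
newlineByteCount++;
} else {
buffer.position(buffer.position() - 1);
}
}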
int pos = buffer.position();
//reset the buffer back to the mark() position
buffer.reset();
//create an array just the right length and get the bytes we just measured out
int length = pos - buffer.position() - newlineByteCount;
byte[] lineBytes = new byte[length];
buffer.get(lineBytes, 0, length);
String lineString = new String(lineBytes, ENCODING);
System.out.println("LINE: " + lineString);
buffer.position(buffer.position() + newlineByteCount);
buffer.mark();
newlineByteCount = 0;
}
} while (buffer.hasRemaining());
}
} catch (IOException ioe) { ioe.printStackTrace(); }
}
public static void main(String args[]) { new Test(); }
}
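For comparison, here is a shorter sketch of the same idea that avoids mark() and reset(). The wrap attempt in the question printed the whole file because array() always returns the entire backing array, regardless of the buffer's position and limit. Copying through a duplicate() with its own position and limit yields only the wanted window. The names SliceLines, lines and copyToString are hypothetical, and UTF-8 input is assumed as before.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SliceLines {
    // Hypothetical helper: scans byte by byte and copies out each line's bytes.
    static List<String> lines(ByteBuffer buffer) {
        List<String> result = new ArrayList<>();
        int lineStart = buffer.position();
        while (buffer.hasRemaining()) {
            byte b = buffer.get();
            if (b == '\n' || b == '\r') {
                int lineEnd = buffer.position() - 1; // index of the line-break byte
                result.add(copyToString(buffer, lineStart, lineEnd));
                // Treat \r\n as one line break: skip the \n if it follows directly.
                if (b == '\r' && buffer.hasRemaining()
                        && buffer.get(buffer.position()) == '\n') {
                    buffer.get();
                }
                lineStart = buffer.position();
            }
        }
        // A last line without a trailing line break.
        if (lineStart < buffer.limit()) {
            result.add(copyToString(buffer, lineStart, buffer.limit()));
        }
        return result;
    }

    // Copies the bytes in [from, to) without disturbing the scanning buffer.
    private static String copyToString(ByteBuffer buffer, int from, int to) {
        ByteBuffer window = buffer.duplicate(); // shares content, separate position/limit
        window.position(from);
        window.limit(to);
        byte[] bytes = new byte[window.remaining()];
        window.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The from and to values are also the byte offsets in the file (when the buffer holds the whole file from position 0), which is what is needed later for inserting data at the right spot.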
I needed something similar but more general than splitting a single buffer. In my case, I've multiple buffers; in fact, my code is a modification of Spring StringDecoder that can convert a Flux<DataBuffer>(DataBuffer) to Flux<String>.
https://stackoverflow.com/a/48111196/839733

Reading binary data in Java

So for a project I am working on, I need to be reading binary data from .FRX files into my Java project. Java's standard byte reader however, keeps returning the wrong bytes for me, which I believe could be a result of Java's modified UTF8-encoding. If I use C#'s binary reading methods, I get the output that I require. An obvious (but proving to be difficult) solution is using C# and a DLL to wrap into the Java project, and I was just wondering if anyone has any simpler alternatives in Java, perhaps an alternative standard byte-reader which can be implemented in Java relatively easily.
Any help is greatly appreciated!
Question update
Here is my C# program, which returns the output I am looking for.
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
public class GetFromFRX
{
public string getFromFRX(string filename, int pos)
{
StringBuilder buffer = new StringBuilder();
using (BinaryReader b = new BinaryReader(File.Open("frmResidency.frx", FileMode.Open)))
{
try
{
b.BaseStream.Seek(pos, SeekOrigin.Begin);
int length = b.ReadInt32();
for (int i = 0; i < length; i++)
{
buffer.Append(b.ReadChar());
}
}
catch (Exception e)
{
return "Error obtaining resource\n" + e.Message;
}
}
return buffer.ToString();
}
}
And here is some slightly differently formatted Java code:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
public class JavaReader {
public static void main(String[] args) throws Exception {
InputStream i = null;
BufferedInputStream b = null;
try{
// open file
i = new FileInputStream("frmResidency.frx");
// input stream => buffed input stream
b = new BufferedInputStream(i);
int numByte = b.available();
byte[] buf = new byte[numByte];
b.read(buf, 2, 3);
for (byte d : buf) {
System.out.println((char)d+":" + d);
}
}catch(Exception e){
e.printStackTrace();
}finally{
if(i!=null)
i.close();
if(b!=null)
b.close();
}
}
}
In your Java code:
You are using available() in a way which is specifically warned against in the Javadoc.
You aren't checking the result returned by the read() method.
You are reading into the buffer at offset 2 and then checking the entire buffer.
You are reading bytes where your C# code reads characters.
You aren't reading the length word.
You aren't using methods like DataInputStream.readInt() which correspond to your C# code.
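Putting those points together, a rough Java counterpart of the C# method might look like the sketch below. It rests on assumptions that are not confirmed by the question: .NET's BinaryReader.ReadInt32 reads little-endian, whereas DataInputStream.readInt() reads big-endian, so the sketch uses a ByteBuffer with an explicit byte order instead; and it assumes each character is stored as a single byte, decoded here as ISO-8859-1. The names FrxStringSketch and readLengthPrefixed are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class FrxStringSketch {
    // Hypothetical helper: reads a 4-byte length word at `pos`, then that many bytes of text.
    static String readLengthPrefixed(byte[] fileBytes, int pos) {
        // Assumption: the length word is little-endian, as written by .NET's BinaryWriter.
        ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(pos);
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("Implausible length " + length + " at offset " + pos);
        }
        byte[] text = new byte[length];
        buf.get(text);
        // Assumption: one byte per character; swap in the real charset if it differs.
        return new String(text, StandardCharsets.ISO_8859_1);
    }
}
```

The file content can be obtained with Files.readAllBytes(Paths.get("frmResidency.frx")), which also sidesteps the available() problem.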

Integration test for image download java

I'm trying to write an integration test to see if a file is downloaded correctly from a url.
I'm not sure how to test this because I expect to get the file as byte[], but I'm not really sure what to compare it to.
I thought about downloading the file manually, converting it to bytes, pasting the result into the code as the expected value, and then comparing it to the result I get.
If you have a better idea I would be glad to hear it.
Thanks:)
Comparing the images' hash value will be helpful.
Compute the hash value before and after downloading the file.
Compare the hash values. If they are equal, your file's integrity is good.
You can use hash algorithms like MD5 or SHA-1. If the files are small, MD5 is good. For comparing a large number of files, SHA-1 will be more useful since there will be fewer collisions.
Since you expect to get the file in byte[], you can pass the bytes directly to java.security.MessageDigest. There is also an input stream decorator, java.security.DigestInputStream, so that you can compute the digest while reading the input stream.
import java.io.*;
import java.security.MessageDigest;
public class MD5Checksum {
public static byte[] createChecksum(String filename) throws Exception {
byte[] buffer = new byte[1024];
MessageDigest complete = MessageDigest.getInstance("MD5");
// try-with-resources closes the stream even if read() throws
try (InputStream fis = new FileInputStream(filename)) {
int numRead;
while ((numRead = fis.read(buffer)) != -1) {
complete.update(buffer, 0, numRead);
}
}
return complete.digest();
}
public static String getMD5Checksum(String filename) throws Exception {
byte[] b = createChecksum(filename);
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString( ( b[i] & 0xff ) + 0x100, 16).substring( 1 );
}
return result;
}
public static void main(String args[]) {
try {
System.out.println(getMD5Checksum("apache-tomcat-5.5.17.exe"));
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Here you can find other good code snippets as well.
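For the integration test itself, where the download is already a byte[], the comparison can be sketched without touching the file system. This example swaps in SHA-256 instead of the MD5 used above; the class name DownloadDigestCheck and both methods are hypothetical, and the expected hex string would be computed once from a trusted copy of the image.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DownloadDigestCheck {
    // Hypothetical helper: hex-encoded SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // True if the downloaded bytes hash to the expected (lowercase hex) digest.
    static boolean matches(byte[] downloaded, String expectedHex) {
        return sha256Hex(downloaded).equals(expectedHex);
    }
}
```

In a test this becomes a single assertion on matches(downloadedBytes, EXPECTED_HEX), which keeps the expected value short compared with pasting the whole image into the code.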
