Incremental string decoding in Java

Incremental string decoding in Java - java

Suppose I receive bytes in chunks and I want to efficiently decode them to a string (that is going to be Unicode obviously), also I want to know, as soon as I can, if that string begins with a certain sequence.
One way could be:
public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
byte[] buff = new byte[1024];
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int len;
while ((len = inputStream.read(buff)) > 0){
byteArrayOutputStream.write(buff, 0, len);
String decoded = new String(byteArrayOutputStream.toByteArray(), Charset.defaultCharset());
if (decoded.startsWith(match)){
return true;
}
}
return false;
}
but this involves allocating a new array from the byteArrayOutputStream every time there is a new chunk and String will do another copy in the constructor. All this seems to me pretty inefficient. Also string will do a decode of the bytes in the constructor, every single time, doing it from the beginning once again.
How can I make this process faster?

Actually you don't need a ByteArrayOutputStream at all.
First turn your String match into a byte[], using your desired encoding.
Then just compare each incoming chunk with the next part of that array:
public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
byte[] compare = match.getBytes(Charset.defaultCharset());
int n = compare.length;
int compareAt = 0;
byte[] buff = new byte[n];
int len;
while (compareAt < n && (len = inputStream.read(buff, 0, n-compareAt)) > 0) {
for (int i=0; i < len && compareAt < n; i++, compareAt++) {
if (compare[compareAt] != buff[i]) {
// found contradicting byte
return false;
}
}
}
// No byte was found which contradicts that the streamed data begins with compare.
// Did we actually read enough bytes?
return compareAt >= n;
}
You might find this version more readable:
public boolean inputBytesMatch(InputStream inputStream, String match) throws IOException {
byte[] compare = match.getBytes(Charset.defaultCharset());
int n = compare.length;
int compareAt = 0;
byte[] buff = new byte[n];
int len;
while (compareAt < n && (len = inputStream.read(buff, 0, n-compareAt)) > 0) {
if (!isSubArray(compare, compareAt, buff, len)) {
return false;
}
compareAt += len;
}
return compareAt >= n;
}
private boolean isSubArray(byte[] searchIn, int searchInOffset, byte[] searchFor, int searchForLength)
{
if (searchInOffset + searchForLength >= searchIn.length) {
// can not match
return false;
}
for (int i=0; i < searchForLength; i++) {
if (searchIn[searchInOffset+i] != searchFor[i]) {
return false;
}
}
return true;
}

Related

How to convert Reader to InputStream in java

I need to convert a Reader object into InputStream. My solution right now is below. But my concern is since this will handle big chunks of data, it will increase the memory usage drastically.
private static InputStream getInputStream(final Reader reader) {
char[] buffer = new char[10240];
StringBuilder builder = new StringBuilder();
int charCount;
try {
while ((charCount = reader.read(buffer, 0, buffer.length)) != -1) {
builder.append(buffer, 0, charCount);
}
reader.close();
} catch (final IOException e) {
e.printStackTrace();
}
return new ByteArrayInputStream(builder.toString().getBytes(StandardCharsets.UTF_8));
}
Since I use StringBuilder this will keep the full content of the reader object in memory. I want to avoid this. Is there a way I can pipe Reader object? Any help regarding this highly appreciated.

Using the Apache Commons IO library, you can do this conversion in one line:
//import org.apache.commons.io.input.ReaderInputStream;
InputStream inputStream = new ReaderInputStream(reader, StandardCharsets.UTF_8);
You can read the documentaton for this Class at https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/ReaderInputStream.html
It might be worth trying this to see if it solves the memory issue too.

First: a rare requirement, often it is the other way around, or there is a FileChannel, so one can use a ByteBuffer.
A PipedInputStream would be possible, starting a PipedOutputStream in a second thread. However that is unneeded.
A Reader gives chars. Unicode code points are derived from either one or two chars (the latter a surrogate pair).
/**
* Reader for an InputSteam of UTF-8 text bytes.
*/
public class ReaderInputStream extends InputStream {
private final Reader reader;
private boolean eof;
private int byteCount;
private byte[] bytes = new byte[6];
public ReaderInputStream(Reader reader) {
this.reader = reader;
}
#Override
public int read() throws IOException {
if (byteCount > 0) {
int c = bytes[0];
--byteCount;
for (int i = 0; i < byteCount; ++i) {
bytes[i] = bytes[i + 1];
}
return c;
}
if (eof) {
return -1;
}
int c = reader.read();
if (c == -1) {
eof = true;
return -1;
}
char ch = (char) c;
String s;
if (Character.isHighSurrogate(ch)) {
c = reader.read();
if (c == -1) {
// Error, low surrogate expected.
eof = true;
//return -1;
throw new IOException("Expected a low surrogate char i.o. EOF");
}
char ch2 = (char) c;
if (!Character.isLowSurrogate(ch2)) {
throw new IOException("Expected a low surrogate char");
}
s = new String(new char [] {ch, ch2});
} else {
s = Character.toString(ch);
}
byte[] bs = s.getBytes(StandardCharsets.UTF_8);
byteCount = bs.length;
System.arraycopy(bs, 0, bytes, 0, byteCount);
return read();
}
}
Path source = Paths.get("...");
Path target = Paths.get("...");
try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
InputStream in = new ReaderInputStream(reader)) {
Files.copy(in, target);
}

Java - Write content from one file chunk by chunk (e.g. 8 Bytes) alternately into multiple files

So I've been trying to read the content of a text file and write the content chunk by chunk alternately into e.g. 2 new files.
I already tried multiple ways to do that but it won't work (OutputStream and FileOutputStream seems to be the most suitable).
Before i tried to part the file in e.g. 3 Parts and wrote the first part in one file, the second part in another and so on. Which worked perfectly fine with OutputStream and FileOutputStream.
But it won't work when i want to do it alternately.
To do it alternately i use the round robin algorithm, which on its own works fine.
I would be really thankful if you could show me some examples to do it!
public void splitFile(String filePath, int numberOfParts, long sizeOfParts[]) throws FileNotFoundException, IOException, SQLException {
long bytes = 8;
OutputStream partsPath[] = new OutputStream[numberOfParts];
long bytePositition[] = new long[numberOfParts];
long copy_size[] = new long[numberOfParts];
for (int i = 0; i < numberOfParts; i++) {
copy_size[i] = sizeOfParts[i];
partsPath[i] = new FileOutputStream(path); //Gets Path from my Database (works)
//System.out.println(cloudsTable.getCloudsPathsFromDatabase(i) + '\\' + name + (i + 1) + fileType);
}
InputStream file = new FileInputStream(filePath);
while (true) {
boolean done = true;
for (int i = 0; i < numberOfParts; i++) {
if (copy_size[i] > 0) {
done = false;
if (copy_size[i] > bytes) {
copy_size[i] -= bytes;
bytePositition[i] += bytes;
System.out.println("file " + i + " " + bytePositition[i]);
readWrite(file, bytePositition[i], partsPath[i]);
} else {
bytePositition[i] += copy_size[i];
System.out.println("rest file " + i + " " + bytePositition[i]);
readWrite(file, bytePositition[i], partsPath[i]);
copy_size[i] = 0;
}
}
}
if (done == true) {
break;
}
}
file.close();
for (int i = 0; i < partsPath.length; i++) {
partsPath[i].close();
}
}
private void readWrite(InputStream file, long bytes, OutputStream path) throws IOException {
byte[] buf = new byte[(int) bytes];
while (file.read(buf) != -1) {
path.write(buf);
path.flush();
}
}
What the code does is, it only write the content of the Originalfile in the first-copied file and the following files are empty
EDIT:
To clarify what the code should do is write the first 8 bytes to go to file 1, second 8 bytes to go to file 2, third 8 bytes to go to file 3, fourth 8 bytes to go to file 1, and so on, round robin, until file 1 is sizeOfParts[0] long, file 2 is sizeOfParts[1] long, and file 3 is sizeOfParts[2] long.

The main problem is that the readWrite() method is only supposed to copy one 8-byte block of bytes, but has a loop that makes it copy all the remaining bytes in the input file.
In addition, the code should be enhanced to use try-finally to close the files, and to correctly handle end-of-file, in case the input file is shorter than the sum of parts.
I would eliminate the readWrite() method, and consolidate the logic to prevent duplicate code, like this:
public void splitFile(String inPath, long[] sizeOfParts) throws IOException, SQLException {
final int numberOfParts = sizeOfParts.length;
String[] outPath = new String[numberOfParts];
// Gets Paths from Database here
InputStream in = null;
OutputStream[] out = new OutputStream[numberOfParts];
try {
in = new BufferedInputStream(new FileInputStream(inPath));
for (int part = 0; part < numberOfParts; part++)
out[part] = new BufferedOutputStream(new FileOutputStream(outPath[part]));
byte[] buf = new byte[8];
long[] remain = sizeOfParts.clone();
for (boolean done = false; ! done; ) {
done = true;
for (int part = 0; part < numberOfParts; part++) {
if (remain[part] > 0) {
int len = in.read(buf, 0, (int) Math.min(remain[part], buf.length));
if (len == -1) {
done = true;
break;
}
remain[part] -= len;
System.out.println("file " + part + " " + (sizeOfParts[part] - remain[part]));
out[part].write(buf, 0, len);
done = false;
}
}
}
} finally {
if (in != null)
in.close();
for (int part = 0; part < out.length; part++)
if (out[part] != null)
out[part].close();
}
}

captured photo have grey line at bottom and size of image is very small in codenameone

I am capturing an image, and I am storing it in my file system. Whereas I preview the photo by reading the byte stream.
Somewhere its not reading the byte data completely due to which I get the dark line at bottom of image. How I can use readfully() or readAll() here. When I tried using readall() method, the half of image didnt load properly. So, I am confused how to use this. I cant use File URI here as my image is having unique ID. Its working fine on simulator, but having issue on device. Any help on this would be really appreciated like how to use readfully() or readAll() method if that's the only solution.
public void saveFromFile(final String file, final boolean launchPreviewIfAnnotateAllowed) {
FileSystemStorage fss = FileSystemStorage.getInstance();
InputStream is = fss.openInputStream(file);
final byte[] bytes = getBytes(is);
is.close();
is = null;
model.updateMediaData(bytes);
Image im = Image.createImage(bytes, 0, bytes.length);
}
private static byte[] getBytes(InputStream is) throws IOException {
int len;
int size = 1024;
byte[] buf;
if (is instanceof ByteArrayInputStream) {
size = is.available();
buf = new byte[size];
len = is.read(buf, 0, size);
} else {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
buf = new byte[size];
while ((len = is.read(buf, 0, size)) != -1)
bos.write(buf, 0, len);
buf = bos.toByteArray();
}
return buf;
}
the updateMediaData() looks something like this
public void updateMediaData(byte[] binaryData) {
updateMediaData(binaryData, null);
}
public void updateMediaData(final byte[] binaryData, final String filename) {
mcHelper.updateMediaData(binaryData, photoId, filename);
}
public void updateMediaData(byte[] binaryData, String componentId, String filename) {
if (binaryData != null) {
String userName = pj;
userName = Strings.replaceAll(userName, " ", "_"); //USERNAME RENAME
//now strip any bizzare chars from it
String altered = "";
for (int i = 0; i < userName.length(); i++) {
char c = userName.charAt(i);
if (c == '_' || Character.isDigit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
altered += c;
}
}
userName = altered;
final String muid = userName + DateTime.getMediaFileTimeStamp() + "_" + new Guid() + "_" + componentId
+ "." + getExtension(binaryData);
mediaData = new MediaData(muid);
mediaData.setFormFieldId(owner.getIdInt());
mediaData.setMediaType(MediaData.MEDIA_TYPE_PHOTO);
if (muid != null && muid.indexOf(".jpg") != -1) {
mediaData.setFilename(muid);
}
else {
mediaData.setFilename(filename != null ? filename : muid); // If filename isn't specified, use MUID.
}
mediaData.setRevisionNumber(0);
mediaData.setData(binaryData);
mediaData.setMimeType(getMimeType(binaryData));
if (mediaData.getMimeType().indexOf("octet-stream") != -1) {
mediaData.setMimeType("image/jpeg"); //can happen when encrytion used
}
// Save the data to persistent storage.
try {
mm.saveData(mediaData);
} catch (DataAccessException e) {
Application.log(Application.LOG_LEVEL_WARNING, "Failed to save media data for MediaComponent.", e);
}
}
}

Replace getBytes with:
private static byte[] getBytes(InputStream is) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
Util.copy(is, bos);
return bos.bos.toByteArray();
}
This performs the stream copy correctly.

Removing ASCII characters in a string with encoding

I have a byte array which is filled by a serial port event and code is shown below:
private InputStream input = null;
......
......
public void SerialEvent(SerialEvent se){
if(se.getEventType == SerialPortEvent.DATA_AVAILABLE){
int length = input.available();
if(length > 0){
byte[] array = new byte[length];
int numBytes = input.read(array);
String text = new String(array);
}
}
}
The variable text contains the below characters,
"\033[K", "\033[m", "\033[H2J", "\033[6;1H" ,"\033[?12l", "\033[?25h", "\033[5i", "\033[4i", "\033i" and similar types..
As of now, I use String.replace to remove all these characters from the string.
I have tried new String(array , 'CharSet'); //Tried with all CharSet options but I couldn't able to remove those.
Is there any way where I can remove those characters without using replace method?

I gave a unsatisfying answer, thanks to #OlegEstekhin for pointing that out.
As noone else answered yet, and a solution is not a two-liner, here it goes.
Make a wrapping InputStream that throws away escape sequences. I have used a PushbackInputStream, where a partial sequence skipped, may still be pushed back for reading first. Here a FilterInputStream would suffice.
public class EscapeRemovingInputStream extends PushbackInputStream {
public static void main(String[] args) {
String s = "\u001B[kHello \u001B[H12JWorld!";
byte[] buf = s.getBytes(StandardCharsets.ISO_8859_1);
ByteArrayInputStream bais = new ByteArrayInputStream(buf);
EscapeRemovingInputStream bin = new EscapeRemovingInputStream(bais);
try (InputStreamReader in = new InputStreamReader(bin,
StandardCharsets.ISO_8859_1)) {
int c;
while ((c = in.read()) != -1) {
System.out.print((char) c);
}
System.out.println();
} catch (IOException ex) {
Logger.getLogger(EscapeRemovingInputStream.class.getName()).log(
Level.SEVERE, null, ex);
}
}
private static final Pattern ESCAPE_PATTERN = Pattern.compile(
"\u001B\\[(k|m|H\\d+J|\\d+:\\d+H|\\?\\d+\\w|\\d*i)");
private static final int MAX_ESCAPE_LENGTH = 20;
private final byte[] escapeSequence = new byte[MAX_ESCAPE_LENGTH];
private int escapeLength = 0;
private boolean eof = false;
public EscapeRemovingInputStream(InputStream in) {
this(in, MAX_ESCAPE_LENGTH);
}
#Override
public int read(byte[] b, int off, int len) throws IOException {
for (int i = 0; i < len; ++i) {
int c = read();
if (c == -1) {
return i == 0 ? -1 : i;
}
b[off + i] = (byte) c;
}
return len;
}
#Override
public int read() throws IOException {
int c = eof ? -1 : super.read();
if (c == -1) { // Throw away a trailing half escape sequence.
eof = true;
return c;
}
if (escapeLength == 0 && c != 0x1B) {
return c;
} else {
escapeSequence[escapeLength] = (byte) c;
++escapeLength;
String esc = new String(escapeSequence, 0, escapeLength,
StandardCharsets.ISO_8859_1);
if (ESCAPE_PATTERN.matcher(esc).matches()) {
escapeLength = 0;
} else if (escapeLength == MAX_ESCAPE_LENGTH) {
escapeLength = 0;
unread(escapeSequence);
return super.read(); // No longer registering the escape
}
return read();
}
}
}
User calls EscapeRemovingInputStream.read
this read may call some read's itself to fill an byte buffer escapeSequence
(a push-back may be done calling unread)
the original read returns.
The recognition of an escape sequence seems grammatical: command letter, numerical argument(s). Hence I use a regular expression.

Reading files bits and saving them

i have file reader which read entire file and write it's bits.
I have this class which help reading:
import java.io.*;
public class FileReader extends ByteArrayInputStream{
private int bitsRead;
private int bitPosition;
private int currentByte;
private int myMark;
private final static int NUM_BITS_IN_BYTE = 8;
private final static int END_POSITION = -1;
private boolean readingStarted;
/**
* Create a BitInputStream for a File on disk.
*/
public FileReader( byte[] buf ) throws IOException {
super( buf );
myMark = 0;
bitsRead = 0;
bitPosition = NUM_BITS_IN_BYTE-1;
currentByte = 0;
readingStarted = false;
}
/**
* Read a binary "1" or "0" from the File.
*/
public int readBit() throws IOException {
int theBit = -1;
if( bitPosition == END_POSITION || !readingStarted ) {
currentByte = super.read();
bitPosition = NUM_BITS_IN_BYTE-1;
readingStarted = true;
}
theBit = (0x01 << bitPosition) & currentByte;
bitPosition--;
if( theBit > 0 ) {
theBit = 1;
}
return( theBit );
}
/**
* Return the next byte in the File as lowest 8 bits of int.
*/
public int read() {
currentByte = super.read();
bitPosition = END_POSITION;
readingStarted = true;
return( currentByte );
}
/**
*
*/
public void mark( int readAheadLimit ) {
super.mark(readAheadLimit);
myMark = bitPosition;
}
/**
* Add needed functionality to super's reset() method. Reset to
* the last valid position marked in the input stream.
*/
public void reset() {
super.pos = super.mark-1;
currentByte = super.read();
bitPosition = myMark;
}
/**
* Returns the number of bits still available to be read.
*/
public int availableBits() throws IOException {
return( ((super.available() * 8) + (bitPosition + 1)) );
}
}
In class where i call this, i do:
FileInputStream inputStream = new FileInputStream(file);
byte[] fileBits = new byte[inputStream.available()];
inputStream.read(fileBits, 0, inputStream.available());
inputStream.close();
FileReader bitIn = new FileReader(fileBits);
and this work correctly.
However i have problems with big files above 100 mb because byte[] have the end.
So i want to read bigger files. Maybe some could suggest how i can improve this code ?
Thanks.

If scaling to large file sizes is important, you'd be better off not reading the entire file into memory. The downside is that handling the IOException in more locations can be a little messy. Also, it doesn't look like your application needs something that implements the InputStream API, it just needs the readBit() method. So, you can safely encapsulate, rather than extend, the InputStream.
class FileReader {
private final InputStream src;
private final byte[] bits = new byte[8192];
private int len;
private int pos;
FileReader(InputStream src) {
this.src = src;
}
int readBit() throws IOException {
int idx = pos / 8;
if (idx >= len) {
int n = src.read(bits);
if (n < 0)
return -1;
len = n;
pos = 0;
idx = 0;
}
return ((bits[idx] & (1 << (pos++ % 8))) == 0) ? 0 : 1;
}
}
Usage would look similar.
FileInputStream src = new FileInputStream(file);
try {
FileReader bitIn = new FileReader(src);
...
} finally {
src.close();
}
If you really do want to read in the entire file, and you are working with an actual file, you can query the length of the file first.
File file = new File(path);
if (file.length() > Integer.MAX_VALUE)
throw new IllegalArgumentException("File is too large: " + file.length());
int len = (int) file.length();
FileInputStream inputStream = new FileInputStream(file);
try {
byte[] fileBits = new byte[len];
for (int pos = 0; pos < len; ) {
int n = inputStream.read(fileBits, pos, len - pos);
if (n < 0)
throw new EOFException();
pos += n;
}
/* Use bits. */
...
} finally {
inputStream.close();
}

org.apache.commons.io.IOUtils.copy(InputStream in, OutputStream out)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Incremental string decoding in Java - java

Related

How to convert Reader to InputStream in java

Java - Write content from one file chunk by chunk (e.g. 8 Bytes) alternately into multiple files

captured photo have grey line at bottom and size of image is very small in codenameone

Removing ASCII characters in a string with encoding

Reading files bits and saving them

Categories

Resources