Reading different encoding from the same InputStream [duplicate]


I'm working through the problems in Programming Pearls, 2nd edition, Column 1. One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file. Since Java is the language I'm the most familiar with, I've decided to use it even though the author seems to have had C and C++ in mind.
Since I'm pretending memory is limited for the purpose of the problem I'm working on, I'd like to make sure the process of reading the file has no buffering at all.
I thought InputStreamReader would be a good solution, until I read this in the Java documentation:
To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
Ideally, only the bytes that are necessary would be read from the stream -- in other words, I don't want any buffering.

One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file.
This implies that you need to read the file as bytes (not characters).
Assuming that you do have a genuine requirement to read from a file without buffering, then you should use the FileInputStream class. It does no buffering. It reads (or attempts to read) precisely the number of bytes that you asked for.
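As a rough sketch of that approach (not the book's solution): the helper below reads one byte per call and sets a bit for every number it sees. The class name NumberBitmap, the method readNumbers, and the one-number-per-line ASCII file format are my assumptions. A ByteArrayInputStream stands in for the file here; for a real file you would pass a FileInputStream instead. Note that 10,000,000 bits is roughly 1.25 MB.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class NumberBitmap {
    // Reads ASCII decimal numbers separated by non-digit bytes, one byte per
    // read() call, and sets the bit whose index equals each number.
    // Assumes every number fits in an int (true for 7-digit numbers).
    static BitSet readNumbers(InputStream in, int maxExclusive) throws IOException {
        BitSet seen = new BitSet(maxExclusive);
        int value = 0;
        boolean inNumber = false;
        int b;
        while ((b = in.read()) != -1) {
            if (b >= '0' && b <= '9') {
                value = value * 10 + (b - '0');
                inNumber = true;
            } else if (inNumber) {
                seen.set(value);   // a separator ends the current number
                value = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            seen.set(value);       // last number had no trailing newline
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "1234567\n0000042\n9999999\n".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            BitSet seen = readNumbers(in, 10_000_000);
            System.out.println(seen.get(1234567)); // true
            System.out.println(seen.get(7654321)); // false
        }
    }
}
```

Because the digits are plain ASCII, no character decoding is needed at all, which sidesteps the read-ahead concern in the question.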
If you then need to convert those bytes to characters, you could do this by applying the appropriate String constructor to a byte[]. Note that for multibyte character encodings such as UTF-8, you would need to read sufficient bytes to complete each character. Doing that without the possibility of read-ahead is a bit tricky ... and entails "knowledge" of the character encoding you are reading.
(You could avoid that knowledge by using a CharsetDecoder directly. But then you'd need to use the decode method that operates on Buffer objects, and that is a bit complicated too.)
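One way the CharsetDecoder idea could look, as a sketch only: feed the decoder one byte at a time and stop as soon as it emits a character, so nothing beyond the current character is pulled from the stream. The class UnbufferedCharReader and its next() method are hypothetical names of mine, and I have only reasoned this through for UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class UnbufferedCharReader {
    private final InputStream in;
    private final CharsetDecoder decoder;
    // Holds only the bytes of the character currently being assembled.
    private final ByteBuffer pending = ByteBuffer.allocate(8);

    UnbufferedCharReader(InputStream in, CharsetDecoder decoder) {
        this.in = in;
        this.decoder = decoder;
    }

    // Returns the next decoded character as a String (two chars for a
    // surrogate pair), or null at end of stream.
    String next() throws IOException {
        CharBuffer out = CharBuffer.allocate(2);
        while (true) {
            int b = in.read();
            if (b == -1) {
                if (pending.position() > 0) {
                    throw new IOException("truncated character at end of stream");
                }
                return null;
            }
            pending.put((byte) b);
            pending.flip();
            CoderResult result = decoder.decode(pending, out, false);
            pending.compact();          // keep bytes the decoder has not consumed
            if (result.isError()) {
                result.throwException(); // malformed or unmappable input
            }
            if (out.position() > 0) {
                out.flip();
                return out.toString();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes) in UTF-8
        byte[] data = "a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        UnbufferedCharReader reader = new UnbufferedCharReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8.newDecoder());
        for (String s; (s = reader.next()) != null; ) {
            System.out.println(Integer.toHexString(s.codePointAt(0)));
        }
    }
}
```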
For what it is worth, Java makes a clear distinction between stream-of-byte and stream-of-character I/O. The former is supported by InputStream and OutputStream, and the latter by Reader and Writer. The InputStreamReader class is a Reader that adapts an InputStream. You should not be considering using it for an application that wants to read stuff byte-wise.

Related

performance and size limitations on HttpServletResponse.getOutputStream.print(string) vs getWriter(String)

For a web project I'm writing large sections of text to a webpage(table) or even bigger (could be several MB) to CSV files for download.
The java method dealing with this receives a StringBuilder content string, which originally (by the creator of this module) was being sent char by char in a loop:
response.getOutputStream().write(content.charAt(i)).
Upon questioning about the loop, the reason given was that he thought the string might be too big for writing in one go. (using java 1.6).
I can't find any size restrictions anywhere, and then also the question came which method to use instead: print() or getWriter()?
The data in the string is all text.

He assumed wrong. If anything, writing one character at a time is inefficient, or at least pointless. If you have a String in memory, you can write it out in one go without worrying.
If you're only writing text, use a Writer. OutputStream is for binary data (although you can wrap it in an OutputStreamWriter to convert between the two). See Writer or OutputStream?
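To illustrate, here is a minimal sketch that writes a whole StringBuilder with a single call. The class WriteAllAtOnce and the helper send are made-up names, and a StringWriter stands in for the Writer you would get from response.getWriter() in a servlet.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class WriteAllAtOnce {
    // Writes the whole buffer with a single call; no per-character loop.
    static void send(CharSequence content, Writer out) throws IOException {
        out.append(content);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            content.append("row ").append(i).append(",value\n");
        }
        StringWriter out = new StringWriter(); // stand-in for response.getWriter()
        send(content, out);
        System.out.println(out.getBuffer().length() == content.length()); // true
    }
}
```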

Both Reader and Stream give the same result , what is the difference? [duplicate]

Today I was asked this question, and I think I answered it very badly. I said a stream is data that flows, and a reader is a technique for reading from data that is static. I know this is an awful answer, so please give me a crisp definition of each and the difference between them, with an example in Java.
Thanks.

An InputStream is byte-oriented. A Reader is character-oriented.
The javadocs are your friend, explaining the difference. Reader, InputStream

As others have said, the use cases for each are slightly different (even though they often can be used interchangeably)
Since readers are for reading characters, they are better when you are dealing with input that is of a textual nature (or data represented as characters). I say better because Readers (in the context of typical usage) are essentially streams with methods that easily facilitate reading character input.

Stream is for reading bytes, Reader is for reading characters. One character may take one byte or more, depending on character set.
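A small sketch of that point, assuming UTF-8 as the character set (the class and method names are just for illustration): the same data yields 6 reads through an InputStream but only 5 through a Reader, because U+00E9 occupies two bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytesVersusChars {
    // Counts how many read() calls an InputStream needs: one per byte.
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }

    // Counts how many read() calls a Reader needs: one per char.
    static int countChars(byte[] data, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data));                         // 6
        System.out.println(countChars(data, StandardCharsets.UTF_8)); // 5
    }
}
```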

Stream classes are byte-oriented: all InputStream classes (buffered and non-buffered) read data from the stream as bytes, and all OutputStream classes (buffered and non-buffered) write data to the stream as bytes. Stream classes are useful when you have small data or are dealing with binary files such as images.
On the other hand, Reader/Writer are character-based classes. They read or write characters rather than bytes. These classes extend either java.io.Reader (all character input classes) or java.io.Writer (all character output classes). They are useful if you are dealing with a text file or another textual stream. These classes also come in buffered and non-buffered variants.

Why to avoid using ByteStream much in Java

We shouldn't use byte Stream as Sun Doc says -
actually it represents a kind of low-level I/O that you should avoid.
What is actually low-level I/O and what is exact problem using byte stream.

So the Java docs say:
CopyBytes seems like a normal program, but it actually represents a
kind of low-level I/O that you should avoid. Since xanadu.txt contains
character data, the best approach is to use character streams, as
discussed in the next section. There are also streams for more
complicated data types. Byte streams should only be used for the most
primitive I/O.
The byte streams give you access to the file as it is. Just the bytes. No interpretation of any kind. That means no character set conversion, no handling of ints or floats in binary or ASCII representation, no dealing with byte orders, or any of that. The higher-level streams provide some of these.
Of course a program that copies a file is actually a pretty good example of something that needs a raw byte stream, because it doesn't need or want to do any kind of interpretation of the data; it just wants to copy it verbatim.
So what they really mean is, use byte streams if you think you need them, but be sure you know what you are doing :)
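For the verbatim-copy case, a minimal sketch might look like the following. This is my own example, not the tutorial's CopyBytes; in-memory streams stand in for the files, and the sample bytes are deliberately not valid text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class VerbatimCopy {
    // Copies every byte unchanged and returns how many bytes were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary binary data, including bytes that are invalid as UTF-8.
        byte[] original = {0x00, (byte) 0xFF, 0x41, (byte) 0xC3, 0x0A};
        ByteArrayOutputStream copyTarget = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(original), copyTarget);
        System.out.println(copied);                                          // 5
        System.out.println(Arrays.equals(original, copyTarget.toByteArray())); // true
    }
}
```

Pushing the same data through a Reader and Writer could silently alter bytes that do not form valid characters, which is why a byte stream is the right tool here.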

The suggestion is in the context of reading a text file that is discussed in the tutorial. For that purpose it is better to use character streams to handle character set translation properly:
The Java platform stores character values using Unicode conventions.
Character stream I/O automatically translates this internal format to
and from the local character set.
A program that uses character streams in place of byte streams
automatically adapts to the local character set and is ready for
internationalization — all without extra effort by the programmer.

Is it possible to search a file with compressed objects in java?

I read from ORACLE of the following bit:
Can I execute methods on compressed versions of my objects, for example isempty(zip(serial(x)))?
This is not really viable for arbitrary objects because of the encoding of objects. For a particular object (such as String) you can compare the resulting bit streams. The encoding is stable, in that every time the same object is encoded it is encoded to the same set of bits.
So I got this idea, say if I have a char array of 4M something long, is it possible for me to compress it to several hundreds of bytes using GZIPOutputStream, and then map the whole file into memory, and do random search on it by comparing bits? Say if I am looking for a char sequence of "abcd", could I somehow get the bit sequence of compressed version of "abcd", and then just search the file for it? Thanks.

You cannot use GZIP or similar to do this, as the encoding of each byte changes as the stream is processed, i.e. the only way to determine what a byte means is to read all the previous bytes.
If you want to access the data randomly, you can break the String into smaller sections. That way you only need to decompress a relatively short section of data.
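A rough sketch of the sectioning idea, with made-up names (ChunkedGzip, compressInChunks, charAt) and some simplifying assumptions: sections have a fixed length in chars, the text contains no surrogate pairs, and a search term that straddles two sections is not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ChunkedGzip {
    // Compresses each fixed-size section of the text independently.
    static List<byte[]> compressInChunks(String text, int chunkChars) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkChars) {
            String part = text.substring(start, Math.min(text.length(), start + chunkChars));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(part.getBytes(StandardCharsets.UTF_8));
            }
            chunks.add(bytes.toByteArray());
        }
        return chunks;
    }

    // Decompresses a single section back into text.
    static String decompressChunk(byte[] chunk) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(chunk))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Random access: only the section containing the index is decompressed.
    static char charAt(List<byte[]> chunks, int chunkChars, int index) throws IOException {
        return decompressChunk(chunks.get(index / chunkChars)).charAt(index % chunkChars);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append("abcd").append(i % 10);
        }
        String text = sb.toString();
        List<byte[]> chunks = compressInChunks(text, 4096);
        System.out.println(charAt(chunks, 4096, 123_456) == text.charAt(123_456)); // true
    }
}
```

Smaller sections mean less work per lookup but a worse compression ratio, since each section starts with an empty dictionary.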

Java I/O streams; what are the differences?

java.io has many different I/O streams, (FileInputStream, FileOutputStream, FileReader, FileWriter, BufferedStreams... etc.) and I am confused in determining the differences between them. What are some examples where one stream type is preferred over another, and what are the real differences between them?

Streams: one byte at a time. Good for binary data.
Readers/Writers: one character at a time. Good for text data.
Anything "Buffered": many bytes/characters at a time. Good almost all the time.

When learning Java I made this mental scheme about java.io:
Streams
byte oriented stream (8 bit)
good for binary data such as a Java .class file
good for "machine-oriented" data
Readers/Writers
char (utf-16) oriented stream (16 bit)
good for text such as a Java source
good for "human-oriented" data
Buffered
always useful unless proven otherwise

This is a big topic! I would recommend that you begin by reading I/O Streams:
An I/O Stream represents an input
source or an output destination. A
stream can represent many different
kinds of sources and destinations,
including disk files, devices, other
programs, and memory arrays.
Streams support many different kinds
of data, including simple bytes,
primitive data types, localized
characters, and objects. Some streams
simply pass on data; others manipulate
and transform the data in useful ways.

Separate each name into words: each capital is a different word.
File Input Stream is to get Input from a File using a Stream.
File Output Stream is to write Output to a File using a Stream
And so on and so forth
As mmyers wrote :
Streams: one byte at a time.
Readers/Writers: one character at a time.
Buffered*: many bytes/characters at a time.

The specialisations you mention are specific types used to provide a standard interface to a variety of data sources. For example, FileInputStream and ObjectInputStream both extend the abstract class InputStream, but the first reads raw bytes from a file while the second reads serialized objects from another stream.

Java input and output is defined in terms of an abstract concept called a “stream”, which is a sequence of data.
There are 2 kinds of streams.
Byte streams (8-bit bytes) → abstract classes are: InputStream and OutputStream
Character streams (16-bit Unicode) → abstract classes are: Reader and Writer
java.io.* classes use the decorator design pattern. The decorator design pattern attaches
responsibilities to objects at runtime. Decorators are more flexible than inheritance because the inheritance
attaches responsibility to classes at compile time. The java.io.* classes use the decorator pattern to construct
different combinations of behavior at runtime based on some basic classes.
from the book Java/J2EE Job Interview Companion By K.Arulkumaran & A.Sivayini
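The decorator idea above can be seen in a short sketch (my own example, not from the book; the class name DecoratorChain is made up): each wrapper adds one responsibility to the stream beneath it, here character decoding and then buffering with line reading.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DecoratorChain {
    static List<String> readLines(InputStream raw) throws IOException {
        // raw: bytes; InputStreamReader: bytes -> chars; BufferedReader: buffering + readLine()
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            for (String line; (line = reader.readLine()) != null; ) {
                lines.add(line);
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(raw)); // [first, second]
    }
}
```

Swapping the ByteArrayInputStream for a FileInputStream or a socket's input stream changes the source without touching the rest of the chain.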

In Java 1.0, byte streams were the only stream type and were used for both character and binary data. Java 1.1 introduced character streams (Reader and Writer), which are now the preferred way to handle text; the byte streams themselves were not deprecated and remain the right tool for binary data. For example,
BufferedReader gets characters from a source, and its constructor looks like
BufferedReader(Reader inputReader)
Here Reader is an abstract class, and one of its concrete subclasses is InputStreamReader, which converts bytes into characters and can take its input from the keyboard (System.in).
BufferedReader: contains an internal buffer that holds characters read from the underlying stream. An internal counter keeps track of the next character to be returned by read(). InputStreamReader takes bytes as input and converts them internally into characters.

