I have a problem parsing an InputStream and getting the loading percentage of its data. My method needs to parse the InputStream, put it into a StringBuffer, track the total number of bytes read, and return a String built from the StringBuffer.
private String processPercent(InputStream content, HttpResponse response) throws IOException
{
    InputStream in = content;
    int totalBytes = Integer.parseInt(response.getFirstHeader("Content-Length").getValue());
    int processedByte;
    int loaded = 0;
    StringBuffer sb = new StringBuffer();
    while ((processedByte = in.read()) != -1)
    {
        sb.append((char) processedByte);
        if (this.asyncTask instanceof IProgressPercent)
        {
            lastProcessed = processedByte;
            loaded += processedByte;
            float percent = ((100 * loaded) / totalBytes);
            this.progressPercent = (int) percent;
            this.asyncTask.doProgress(this.progressPercent);
        }
    }
    in.close();
    return new String(sb);
}
The problem is that when I display the value of the percent variable, I get a value higher than 100, so I think there is a calculation problem.
Any idea? Thanks.
Replace
loaded += processedByte;
with
++loaded;
and actually move it before the if. You want to count each byte that is read (increment by one), not add up the byte values themselves.
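Applied to the loop in the question, the corrected body would look roughly like this (field names taken from the question's code):
++loaded; // count one byte read, don't add the byte's value
if (this.asyncTask instanceof IProgressPercent) {
    lastProcessed = processedByte;
    float percent = (100f * loaded) / totalBytes;
    this.progressPercent = (int) percent;
    this.asyncTask.doProgress(this.progressPercent);
}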
You have several problems here.
First of all, you are using an InputStream, and an InputStream reads bytes. A byte is not a char, and a char is not two bytes either: a char is a UTF-16 code unit.
Second, you only read byte by byte. If your content length is 1 MB, you will call this code about a million times. Is this really what you want?
You should change your code in the following ways:
first of all, use an InputStreamReader over your InputStream -- and initiate it with the correct charset;
use this reader's read() method which reads into a preallocated char array; use the number of chars read to update your progress bar;
add the contents of your char[] to your StringBuffer (which should really be a StringBuilder).
Because the decoding process may produce fewer chars than there are bytes, your counter will be a little off; this can be cured with more sophisticated mechanisms, but it requires quite a bit of code.
Of course, if the size is not too large, an easier way is to download all the content into a ByteArrayOutputStream and then decode that ByteArrayOutputStream's contents into a String.
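A rough sketch of that approach follows; the buffer size and UTF-8 charset are assumptions here, so use whatever matches your actual response:
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Sketch only: totalBytes would come from the Content-Length header as in the question
static String readWithProgress(InputStream in, int totalBytes) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
        char[] buffer = new char[4096];
        int read;
        long processed = 0;
        while ((read = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, read);
            processed += read; // counts chars, not bytes, so the percentage is approximate
            int percent = (int) Math.min(100, (100L * processed) / totalBytes);
            // report 'percent' to the progress callback here
        }
    }
    return sb.toString();
}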
Instead of
float percent = ((100*loaded) / totalBytes);
use
float percent = (100f * loaded) / totalBytes;
so that the division is carried out in floating point rather than truncated integer arithmetic.
Related
Reading the following from the Oracle website, I understand that the int variable returned by inputStream.read() holds a character value in its last 16 bits.
So does it always waste 2 bytes?
CopyCharacters is very similar to CopyBytes. The most important difference is that CopyCharacters uses FileReader and FileWriter for input and output in place of FileInputStream and FileOutputStream. Notice that both CopyBytes and CopyCharacters use an int variable to read to and write from. However, in CopyCharacters, the int variable holds a character value in its last 16 bits; in CopyBytes, the int variable holds a byte value in its last 8 bits.
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        FileReader inputStream = null;
        FileWriter outputStream = null;
        try {
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("characteroutput.txt");

            int c;
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}
So does it always waste 2 bytes?
Ermm ... yes. Either 2 bytes in the Reader case or 3 bytes in the InputStream case.
This wastage is necessary for the following reasons:
Both InputStream.read() and Reader.read() need to return a value to represent the "end of stream". As the javadocs say:
InputStream.read(): Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
Reader.read(): Returns the character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached.
The extra end-of-stream value means that the return type of read() cannot be (respectively) byte or char. (See also the last reason ...)
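To see why, consider a hypothetical read() that returned a byte instead (this is an illustration, not the real API):
// If read() returned byte, a legitimate 0xFF data byte would collide with the EOF marker.
byte b = (byte) 0xFF;          // a perfectly valid data byte
System.out.println(b == -1);   // prints "true" -- indistinguishable from end of stream
// Returning int keeps 0..255 for data and reserves -1 exclusively for end of stream.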
It turns out that the "wasted" 2 or 3 bytes are of no consequence. Even a trivial Java program is going to use megabytes of memory. (Indeed, even a trivial C program is going to use tens or hundreds of kilobytes of memory ... if you account for the library code that they use.)
Returning a byte or char probably wouldn't save memory anyway. In a typical modern system, local variables (even byte and char) are stored word-aligned on the stack, because accessing memory at a word-aligned address is typically faster.
Replacing the -1 with an exception would be inefficient in another way. Throwing and catching exceptions in Java is significantly more expensive than a simple test for -1.
I am working on a Huffman coding application in Java and I'm almost done. I have one problem, though. I need to save a String such as "101011101010" to a file. With my current code it is saved as characters, which take up one byte for every 0 or 1. I'm pretty sure it's possible to save every 0/1 as a single bit.
I already tried some things with BitSet and Integer.valueOf but I can't get them to work. This is my current code:
FileOutputStream fos = new FileOutputStream("encoded.bin");
fos.write(encoded.getBytes());
fos.close();
Where 'encoded' is a String which can be like: "0101011101".
If I try to save it as an integer, the leading 0 is removed.
Thanks in advance!
EDIT: Huffman is a compression method, so the output file should be as small as possible.
I think I found my answer. I put the 1's and 0's in a BitSet using the following code:
BitSet bitSet = new BitSet(encoded.length());
int bitcounter = 0;
for (Character c : encoded.toCharArray()) {
    if (c.equals('1')) {
        bitSet.set(bitcounter);
    }
    bitcounter++;
}
After that I save it to the file using bitSet.toByteArray()
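For example, the save and load steps might look like this (a sketch; the file name is an assumption):
try (FileOutputStream fos = new FileOutputStream("encoded.bin")) {
    fos.write(bitSet.toByteArray());
}
byte[] raw = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("encoded.bin"));
BitSet set = BitSet.valueOf(raw);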
When I want to read it again, I convert the bytes read back from the file into a BitSet using BitSet.valueOf(bytes). Then I loop through the BitSet like this:
String binaryString = "";
for (int i = 0; i < set.length(); i++) {
    if (set.get(i)) {
        binaryString += "1";
    } else {
        binaryString += "0";
    }
}
Thanks to everyone who helped me.
Binary files are limited to storing bits in multiples of eight. You can solve this problem by chopping the string into eight-bit chunks, parsing each chunk with Integer.parseInt(eightCharString, 2) and casting the result to a byte (Byte.parseByte would overflow on chunks that start with 1, since a byte only holds -128..127), then adding the results to a byte array; a sketch follows the list:
Compute the length of the byte array by dividing the length of your bit string by eight, rounding up
Allocate an array of bytes of the desired length
Run a loop that takes substrings from the string at positions representing multiples of eight
Parse each chunk, and put the result into the corresponding byte
Call fos.write() on the byte array
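A sketch of those steps (the variable names here are made up for illustration; fos is the stream from the question):
String bits = encoded;                            // e.g. "0101011101"
int byteLen = (bits.length() + 7) / 8;            // length in bytes, rounded up
byte[] out = new byte[byteLen];
for (int i = 0; i < byteLen; i++) {
    // take the next 8-bit chunk, padding the last one with trailing zeros
    String chunk = bits.substring(8 * i, Math.min(8 * i + 8, bits.length()));
    while (chunk.length() < 8) chunk += "0";
    // parse as int first: a chunk like "10000000" would overflow Byte.parseByte
    out[i] = (byte) Integer.parseInt(chunk, 2);
}
fos.write(out);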
Try this.
String encoded = "0101011101";
FileOutputStream fos = new FileOutputStream("encoded.bin");
// pad with trailing zeros so the length is a multiple of 8
String s = encoded.length() % 8 == 0
        ? encoded
        : encoded + "00000000".substring(encoded.length() % 8);
for (int i = 0, len = s.length(); i < len; i += 8)
    fos.write((byte) Integer.parseInt(s.substring(i, i + 8), 2));
fos.close();
Every object in Parse.com has its own ObjectId, which is a 10-character string apparently generated by this regex: [0-9a-zA-Z]{10}.
Example of ObjectId in Parse:
X12wEq4sFf
Weg243d21s
zwg34GdsWE
I would like to convert this String to a long, because it would save memory and improve searching. (A 10-char String stored in memory takes roughly 40 bytes once you count the two-byte chars and object overhead, while a long takes 8 bytes.)
If we calculate the combinations, we can find:
String ObjectId: 62^10 = 839299365868340224 different values;
long: 2^64 = 18446744073709551616 different values.
So we can convert these values without losing information. Is there a simple way to do it safely? Please consider any kind of character encoding (UTF-8, UTF-16, etc.).
EDIT: I can only think of a hard way to solve it. I am asking whether there is an easy way.
Your character set is a subset of the commonly-used Base64 encoding, so you could just use that. Java has the Base64 class, no need to roll your own codec for this.
Are you sure this is actually valuable? "because it will save memory and improve searching" seems like an untested assertion; saving a few bytes on the IDs may very well be offset by the added cost of encoding and decoding every time you want to use something.
EDIT: Also, why are you using UTF-8 strings for guaranteed-ASCII data? If you represent the 10-char IDs as a byte[10], that's just 10 bytes instead of 40 (i.e. much closer to the 8 for a long), and you don't need to do any fancy conversions.
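For instance (a sketch; objectId stands for the 10-character string):
byte[] id = objectId.getBytes(java.nio.charset.StandardCharsets.US_ASCII); // exactly 10 bytes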
Here's a straightforward solution using 6 bits to store a single character.
public class Converter {
    private static final String CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static int convertChar(char c) {
        int ret = CHARS.indexOf(c);
        if (ret == -1)
            throw new IllegalArgumentException("Invalid character encountered: " + c);
        return ret;
    }

    public static long convert(String s) {
        if (s.length() != 10)
            throw new IllegalArgumentException("String length must be 10, was " + s.length());
        long ret = 0;
        for (int i = 0; i < s.length(); i++) {
            ret = (ret << 6) + convertChar(s.charAt(i));
        }
        return ret;
    }
}
I'll leave the conversion from long to String for you to implement, it's basically the same in reverse.
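A possible sketch of that reverse direction, reusing the same CHARS table (this is not part of the original answer):
public static String convertBack(long value) {
    char[] out = new char[10];
    for (int i = 9; i >= 0; i--) {
        out[i] = CHARS.charAt((int) (value & 0x3F)); // low 6 bits select one character
        value >>>= 6;
    }
    return new String(out);
}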
P.s.: If you really want to save space, don't use Long; it adds nothing compared to the primitive long except overhead.
P.s. 2: Also note that you aren't really saving much with this conversion: storing the ASCII characters can be done in 10 bytes, while a long takes up 8. What you save here is mostly the overhead you'd get if you stored those 10 bytes in a byte array.
I have an InputStream from a socket and I want to read each char until I find a comma ",".
Here's my code:
private static Packet readPacket(InputStream is) throws Exception
{
    int ch;
    Packet p = new Packet();
    String type = "";
    while ((ch = is.read()) != 44) // 44 is the "," in ISO-8859-1 codification
    {
        if (ch == -1)
            throw new IOException("EOF");
        type += new String(ch, "ISO-8859-1"); // <---- DOES NOT COMPILE
    }
    ...
}
The String constructor does not take an int, only an array of bytes. I read the documentation and it says:
read():
Reads the next byte of data from the input stream.
How can I convert this int to a byte, then? Does it use only the least significant 8 bits of the int's 32 bits?
Since I'm working with Java, I want to keep it fully platform compatible (little endian vs. big endian, etc.). What's the best approach here, and why?
PS: I don't want to use any ready-to-use classes like DataInputStream, etc.
The String constructor takes a byte[] (an array), not a single int:
type += new String(new byte[] { (byte) ch }, "ISO-8859-1");
Btw., it would be more elegant to use a StringBuilder for type and make use of its append methods. It's faster and also shows the intent better:
private static Packet readPacket(InputStream is) throws Exception {
    int ch;
    Packet p = new Packet();
    StringBuilder type = new StringBuilder();
    while ((ch = is.read()) != 44) {
        if (ch == -1)
            throw new IOException("EOF");
        // NOTE: conversion from byte to char here is iffy, this works for ISO8859-1/US-ASCII
        // but fails horribly for UTF etc.
        type.append((char) ch);
    }
    String data = type.toString();
    ...
}
Also, to make it more flexible (e.g. to work with other character encodings), your method would be better off taking an InputStreamReader that handles the conversion from bytes to characters for you (take a look at the InputStreamReader(InputStream, Charset) constructor's javadoc).
For this you can use an InputStreamReader, which reads encoded character data from a raw byte stream:
InputStreamReader reader = new InputStreamReader(is, "ISO-8859-1");
You may now use reader.read(), which will consume the correct number of bytes from is, decode as ISO-8859-1, and return a Unicode code point that can be correctly cast to a char.
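A sketch of that approach, using the question's charset (the Packet handling from the question is omitted):
InputStreamReader reader = new InputStreamReader(is, "ISO-8859-1");
StringBuilder type = new StringBuilder();
int c;
while ((c = reader.read()) != ',') {
    if (c == -1)
        throw new IOException("EOF");
    type.append((char) c); // safe here: reader.read() already returns a decoded character
}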
Edit: Responding to comment about not using any "ready-to-use" classes:
I don't know if InputStreamReader counts. If it does, check out Durandal's answer, which is sufficient for certain single-byte encodings (like US-ASCII, arguably, or ISO-8859-1).
For multibyte encodings, if you do not want to use any other classes, you would first buffer all data into a byte[] array, then construct a String from that.
Edit: Responding to a related question in the comments on Abhishek's answer.
Q:
Abhishek wrote: Can you please enlighten me a little more? i have tried casting integer ASCII to character..it has worked..can you kindly tell where did i go wrong?
A:
You didn't go "wrong", per se. The reason ASCII works is the same reason that Brian pointed out that ISO-8859-1 works. US-ASCII is a single byte encoding, and bytes 0x00-0x7f have the same value as their corresponding Unicode code points. So a cast to char is conceptually incorrect, but in practice, since the values are the same, it works. Same with ISO-8859-1; bytes 0x00-0xff have the same value as their corresponding code points in that encoding. A cast to char would not work in e.g. IBM01141 (a single byte encoding but with different values).
And, of course, a single byte to char cast would not work for multibyte encodings like UTF-16, as more than one input byte must be read (a variable number, in fact) to determine the correct value of a corresponding char.
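A small illustration of that point (the string literal is just an example):
// Per-byte casting happens to work for US-ASCII/ISO-8859-1 but breaks for multibyte UTF-8:
byte[] utf8 = "é".getBytes(java.nio.charset.StandardCharsets.UTF_8);  // two bytes: 0xC3 0xA9
StringBuilder naive = new StringBuilder();
for (byte b : utf8)
    naive.append((char) (b & 0xFF));                                  // cast each byte on its own
System.out.println(naive);                                            // prints "Ã©"
System.out.println(new String(utf8, java.nio.charset.StandardCharsets.UTF_8)); // prints "é"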
type += String.valueOf((char) ch);
Partial answer: Try replacing
type += new String(ch, "ISO-8859-1");
with
type += (char) ch;
This works if you receive the ASCII value of the char; the cast converts the ASCII value into a char.
It's better to avoid lengthy code, and this would work just fine. The read() function can be used in several ways:
One way is int c = inpstr.read();
Another is inpstr.read(byteArray), which reads into a byte array.
So it's up to you which method you want to use; they serve different purposes.
I am reading a char array from a file and then converting it to a String using the String constructor.
read = fromSystem.read(b);
String s = new String(b);
This code has been in the program for ages and works fine, although until now it has always read the full size of the array (255 chars) each time. Now I am reusing the class for another purpose, and the size of what it reads varies. The problem is that if it reads, say, 20 chars and then 15, the last 5 chars of the previous read are still in the byte array. To work around this I added a null char at the end of what had been read.
read = fromSystem.read(b);
if (read < bufferLength) {
    b[read] = '\0';
}
String s = new String(b);
If I then did
System.out.println(b);
It works; the end of the buffer doesn't show. However, if I pass that string into a message dialog, it still shows. Is there some other way that I should terminate the string?
Use:
String s = new String(b, 0, read)
instead.
You need to use the String constructor that allows you to specify the range of bytes that are valid in the byte array.
String(byte[] bytes, int offset, int length);
Using it like this:
read = fromSystem.read(b);
String s = new String(b, 0, read);