issues in reading google text document with google apis - java

I am trying to use the following code to read a Google text document, but the value returned is a stream of garbage characters instead of the real contents. How can I fix this?
for (DocumentListEntry entry : resultFeed.getEntries()) {
    String docId = entry.getDocId();
    String docType = entry.getType();
    URL exportUrl = new URL("https://docs.google.com/feeds/download/"
            + docType
            + "s/Export?docID="
            + docId
            + "&exportFormat=doc");
    MediaContent mc = new MediaContent();
    mc.setUri(exportUrl.toString());
    MediaSource ms = client.getMedia(mc);
    InputStream inStream = null;
    try {
        inStream = ms.getInputStream();
        int c;
        while ((c = inStream.read()) != -1) {
            System.out.print((char) c);
        }
    } finally {
        if (inStream != null) {
            inStream.close();
        }
    }
}

From a quick read of the documentation, it looks like you are reading the raw bytes of a Microsoft Word-encoded document.
Try changing the &exportFormat=doc to html or txt and see if the output makes more sense.
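For example, only the exportFormat parameter in the original export URL needs to change (txt shown here, html works the same way):
URL exportUrl = new URL("https://docs.google.com/feeds/download/"
        + docType
        + "s/Export?docID="
        + docId
        + "&exportFormat=txt"); // or "html"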

I suspect that the files you are trying to print out have some other encoding but you're printing them byte by byte in ASCII way. I would try to read the whole stream as byte array and then convert it to string using some other encoding (e.g. UTF8).
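A minimal sketch of that suggestion, assuming the exported content really is UTF-8 text (ms is the MediaSource from the question):
InputStream inStream = ms.getInputStream();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] chunk = new byte[4096];
int read;
while ((read = inStream.read(chunk)) != -1) {
    bos.write(chunk, 0, read); // collect the raw bytes first
}
inStream.close();
// decode the whole array at once with an explicit charset instead of casting byte-by-byte
String text = new String(bos.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
System.out.println(text);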

Related

Recover data from BLOB file S-Health

I am trying to get data from a blob file I am getting with the Samsung S-health SDK.
What I actually do is:
Cursor c = null;
int i = 0;
c = result.getResultCursor();
if (c != null) {
    while (c.moveToNext()) {
        byte[] live_data = c.getBlob(c.getColumnIndex(HealthConstants.Exercise.LIVE_DATA));
        if (live_data != null) {
            // Do something with data.
        } else {
            Log.d(APP_TAG, "there is no live data.");
        }
    }
} else {
    Log.d(APP_TAG, "There is no result.");
}
"live_data" is a compressed file containing a json with all the data.
I tried to decompress it with ZipInputStream without success.
How can I do?
Finally I found the solution after several attempts.
The problem was that the blob file was a gzip file.
Below is simple code to decompress the byte array (in my case live_data):
ByteArrayInputStream inputStream = new ByteArrayInputStream(live_data);
GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
BufferedReader bf = new BufferedReader(new InputStreamReader(gzipInputStream, "UTF-8"));
String outStr = "";
String line;
while ((line = bf.readLine()) != null) {
    outStr += line;
}
Where outStr is the content of the file I wanted to retrieve and live_data is the blob byte array.
I hope this can be of help to others as well.
Have you tried decompressing the BLOB file with WinRAR?
I tried this for many days and finally changed the extension to .rar; I then got plaintext about 10x larger, with data samples (e.g. {"calorie":0.0,"distance":0.0,"heart_rate":89.0,"start_time":1521297002178},).
I have S-Health 5.16.0.043 and the older DataViewer1.2.0.apk.

Remove Base64 prefix from InputStream

I have a Base64 encoded Image String residing in a File Server. The encoded String has a prefix (ex: "data:image/png;base64,") for support in popular modern browsers (it's obtained via JavaScript's Canvas.toDataURL() method). The client sends a request for the image to my server which verifies them and returns a stream of the Base64 encoded String.
If the client is a web client, the image can be displayed as is within an <img> tag by setting the src to the Base64 encoded String. However, if the client is an Android client, the String needs to be decoded into a Bitmap without the prefix. Though, this can be done fairly easily.
The Problem:
In order to simplify my code and not reinvent the wheel, I'm using an image library on the Android client to handle loading, displaying, and caching the images (Facebook's Fresco library, to be exact). However, no library seems to support Base64 decoding (I want to have my cake and eat it too). A solution I came up with is to decode the Base64 String on the server as it is being streamed to the client.
The Attempt:
S3Object obj = s3Client.getObject(new GetObjectRequest(bucketName, keyName));
Base64.Decoder decoder = Base64.getDecoder();
//decodes the stream as it is being read
InputStream stream = decoder.wrap(obj.getObjectContent());
try {
    return new StreamingOutput() {
        @Override
        public void write(OutputStream output) throws IOException, WebApplicationException {
            int nextByte = 0;
            while ((nextByte = stream.read()) != -1) {
                output.write(nextByte);
            }
            output.flush();
            output.close();
            stream.close();
        }
    };
} catch (Exception e) {
    e.printStackTrace();
}
Unfortunately, the Fresco library still has a problem displaying the image (with no stack traces!). As there doesn't seem to be an issue on my server when decoding the stream (no stack traces either), it leads me to believe it must be an issue with the prefix, which leaves me with a dilemma.
The Question: How do I remove the Base64 prefix from a Stream being sent to the client without storing and editing the entire Stream on the server? Is this possible?
Fresco does support decoding data URIs, just as the web client does.
The demo app has an example of this.
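For instance, a minimal sketch with Fresco's SimpleDraweeView (assuming the data URI string is already in memory; the view id and variable names here are illustrative, not from the question):
// dataUri is the full string, e.g. "data:image/png;base64,iVBORw0..."
SimpleDraweeView draweeView = (SimpleDraweeView) findViewById(R.id.image_view);
draweeView.setImageURI(Uri.parse(dataUri));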
How do I remove the Base64 prefix from a Stream being sent to the client without storing and editing the entire Stream on the server?
Removing the prefix while sending the stream to the client turns out to be a pretty complex task. If you don't mind storing the whole String on the server you could simply do:
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
    br = new BufferedReader(new InputStreamReader(stream));
    while ((line = br.readLine()) != null) {
        sb.append(line);
    }
    String result = sb.toString();
    //the comma is the character which separates the prefix and the Base64 String
    int i = result.indexOf(",");
    result = result.substring(i + 1);
    //Now that we have just the Base64 encoded String, we can decode it
    Base64.Decoder decoder = Base64.getDecoder();
    byte[] decoded = decoder.decode(result);
    //Now, just write each byte from the byte array to the output stream
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (br != null) {
        try {
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
But being more efficient and not storing the entire Stream on the server is a much more complicated task. We could use the Base64.Decoder.wrap() method, but the problem is that it throws an IOException when it reaches a value that cannot be decoded (wouldn't it be nice if they provided a method that just left the bytes as-is if they can't be decoded?). And unfortunately, the Base64 prefix can't be decoded because it's not Base64 encoded, so it would throw an IOException.
To get around this problem, we would have to use an InputStreamReader to read the InputStream with the appropriate Charset. Then we would have to cast the ints received from the InputStream's read() method calls to chars. Once we have read enough chars, we compare them with the start of the Base64 prefix ("data"). If it's a match, we know the Stream contains the prefix, so we continue reading until we reach the prefix's end character (the comma: ","). Finally, we can begin streaming out the bytes after the prefix. Example:
S3Object obj = s3Client.getObject(new GetObjectRequest(bucketName, keyName));
Base64.Decoder decoder = Base64.getDecoder();
InputStream stream = obj.getObjectContent();
InputStreamReader reader = new InputStreamReader(stream);
try {
    return new StreamingOutput() {
        @Override
        public void write(OutputStream output) throws IOException, WebApplicationException {
            //for checking whether the string has the Base64 prefix
            char[] pre = new char[4]; //"data" takes four bytes in a UTF-8 encoding
            boolean containsPre = false;
            int count = 0;
            int nextByte = 0;
            while ((nextByte = stream.read()) != -1) {
                if (count < pre.length) {
                    pre[count] = (char) nextByte;
                    count++;
                } else if (count == pre.length) {
                    //determine whether it has the prefix or not and act accordingly
                    count++;
                    containsPre = new String(pre).toLowerCase().equals("data");
                    if (!containsPre) {
                        //doesn't have the Base64 prefix, so write all the bytes up to this point
                        for (int i = 0; i < pre.length; i++) {
                            output.write((int) pre[i]);
                        }
                        output.write(nextByte);
                    }
                } else if (containsPre && count < 25) {
                    //the comma character (,) is considered the end of the Base64 prefix,
                    //so look for the comma; but be realistic: if we don't find it within about 25 characters
                    //we can assume the String is not encoded correctly
                    containsPre = !Character.toString((char) nextByte).equals(",");
                    count++;
                } else {
                    output.write(nextByte);
                }
            }
            output.flush();
            output.close();
            stream.close();
        }
    };
} catch (Exception e) {
    e.printStackTrace();
    return null;
}
This seems like a bit of a hefty task to do on the server, so I think decoding on the client side is the better choice. Unfortunately, most Android client-side libraries don't have support for Base64 decoding (especially with the prefix). However, as @tyronen pointed out, Fresco does support it if the String has already been obtained. Though, this removes one of the key reasons to use an image loading library.
Android Client Side Decoding
To decode on the client side application is pretty easy. First obtain the String from the InputStream:
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
    br = new BufferedReader(new InputStreamReader(stream));
    while ((line = br.readLine()) != null) {
        sb.append(line);
    }
    return sb.toString();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (br != null) {
        try {
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Then decode the String using Android's Base64 class:
int i = result.indexOf(",");
result = result.substring(i + 1);
byte[] decodedString = Base64.decode(result, Base64.DEFAULT);
Bitmap bitMap = BitmapFactory.decodeByteArray(decodedString, 0, decodedString.length);
The Fresco library seems hard to update due to them using a lot of delegation. So, I moved on to using the Picasso image loading library and created my own fork of it with the Base64 decoding ability.

Converting a set of characters into a different string array using java

I am trying to read data from a file. I have the following code.
public void ReadFile()
{
    File sdcard = android.os.Environment.getExternalStorageDirectory();
    File directory = new File(sdcard.getAbsolutePath() + "/MyDirectory");
    File file = new File(directory, "textfile1.txt");
    try (FileInputStream fis = new FileInputStream(file)) {
        char stringComma = new Character(',');
        System.out.println("Total file size to read (in bytes) : " + fis.available());
        int content;
        while ((content = fis.read()) != -1) {
            // convert to char and display it
            Log.d(TAG, "reading a file");
            System.out.print((char) content);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
I have file format as follows [textfile1.txt]
[12],84359768069 //some numbers
[34],56845745740
[44],36344679992
[99],46378467467
When I read this file, each character is read one at a time. I want to split each line and store the parts in different strings, like
str1 = [12]
str2 = 84359768069
How can I achieve this?
You're currently reading a byte at a time, because you're using InputStream. That's the first thing to fix - you should be using a Reader for text data. The best approach is to wrap your InputStream in an InputStreamReader.
Next, it sounds like you want to read a line at a time rather than just a character at a time. The easiest way of doing that is to use a BufferedReader wrapping an InputStreamReader.
(If you were using Java 7+, all of this could be achieved very nicely using Files.newBufferedReader - you just need to supply the Path and the Charset. Until Android supports that, you'll need to just do the wrapping manually. It's not too painful though.)
One you're reading a line at a time, you then need to split the line by comma - look at using String.split for this. I would then suggest you create a class to store these two separate values. So each line will be transformed into an instance of your class.
Finally, create a List<YourCustomClass> and add to it as you read the file.
That's given an overview of how to achieve each step - hopefully enough detail to enable you to get going, but not spoon-feeding you enough to hamper you actually learning from the experience.
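A rough sketch of that approach (the Entry class and its field names are illustrative, not part of any library; uses java.io.* and java.util.*):
// one parsed line, e.g. "[12],84359768069"
class Entry {
    final String key;   // e.g. "[12]"
    final String value; // e.g. "84359768069"
    Entry(String key, String value) { this.key = key; this.value = value; }
}

List<Entry> entries = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file)))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] parts = line.split(",", 2); // split on the first comma only
        if (parts.length == 2) {
            entries.add(new Entry(parts[0], parts[1]));
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}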
A simple solution would be to parse the characters as they are read:
public void ReadFile()
{
    File sdcard = android.os.Environment.getExternalStorageDirectory();
    File directory = new File(sdcard.getAbsolutePath() + "/MyDirectory");
    File file = new File(directory, "textfile1.txt");
    try (FileInputStream fis = new FileInputStream(file)) {
        System.out.println("Total file size to read (in bytes) : " + fis.available());
        int content;
        String str1 = "";
        String str2 = "";
        boolean commaFound = false;
        while ((content = fis.read()) != -1) {
            // convert to char and inspect it
            Log.d(TAG, "reading a file");
            if ((char) content == ',')
            {
                commaFound = true;
            }
            else if ((char) content == '\n')
            {
                System.out.println("str1=" + str1 + "\nstr2=" + str2);
                commaFound = false;
                str1 = "";
                str2 = "";
            }
            else
            {
                if (commaFound)
                {
                    str2 += (char) content;
                }
                else
                {
                    str1 += (char) content;
                }
            }
            System.out.print((char) content);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Java: Reading file in two parts - partly as String and partly as byte[]

I have a file which is split into two parts by "\n\n" - the first part is a not-too-long String and the second is a byte array, which can be quite long.
I am trying to read the file as follows:
byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {
    final InputStreamReader isr = new InputStreamReader(fis);
    final BufferedReader reader = new BufferedReader(isr);
    String line;
    // reading until \n\n
    while (!(line = reader.readLine()).trim().isEmpty()) {
        // processing the line
    }
    // copying the rest of the byte array
    result = IOUtils.toByteArray(reader);
    reader.close();
}
Even though the resulting array is the size it should be, its contents are broken. If I try to use toByteArray directly on fis or isr, the contents of result are empty.
How can I read the rest of the file correctly and efficiently?
Thanks!
The reason your contents are broken is because the IOUtils.toByteArray(...) function reads your data as a string in the default character encoding, i.e. it converts the 8-bit binary values into text characters using whatever logic your default encoding prescribes. This usually leads to many of the binary values getting corrupted.
Depending on how exactly the charset is implemented, there is a slight chance that this might work:
result = IOUtils.toByteArray(reader, "ISO-8859-1");
ISO-8859-1 uses only a single byte per character. Not all character values are defined, but many implementations will pass them anyways. Maybe you're lucky with it.
But a much cleaner solution would be to instead read the String in the beginning as binary data first and then converting it to text via new String(bytes) rather than reading the binary data at the end as a String and then converting it back.
This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:
http://www.docjar.com/html/api/java/io/BufferedReader.java.html
It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
Alternatively, you could read the whole file into a byte array, find the \n\n position, and split the array into the line and the remaining bytes:
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
    if (a[i] == '\n' && a[i + 1] == '\n') {
        line = new String(a, 0, i);
        // skip both '\n' characters so the delimiter is not copied into the result
        int len = a.length - i - 2;
        result = new byte[len];
        System.arraycopy(a, i + 2, result, 0, len);
        break;
    }
}
Thanks for all the comments - the final implementation was done in this way:
try (final FileInputStream fis = new FileInputStream(file)) {
    ByteBuffer buffer = ByteBuffer.allocate(64);
    boolean wasLast = false;
    String headerValue = null, headerKey = null;
    byte[] result = null;
    while (true) {
        byte current = (byte) fis.read();
        if (current == '\n') {
            if (wasLast) {
                // this is \n\n
                break;
            } else {
                // just a new line in the header
                wasLast = true;
                headerValue = new String(buffer.array(), 0, buffer.position());
                buffer.clear();
            }
        } else if (current == '\t') {
            // headerKey\theaderValue\n
            headerKey = new String(buffer.array(), 0, buffer.position());
            buffer.clear();
        } else {
            buffer.put(current);
            wasLast = false;
        }
    }
    // reading the rest
    result = IOUtils.toByteArray(fis);
}

How can I read a .txt file into a single Java string while maintaining line breaks?

Virtually every code example out there reads a TXT file line-by-line and stores it in a String array. I do not want line-by-line processing because I think it's an unnecessary waste of resources for my requirements: All I want to do is quickly and efficiently dump the .txt contents into a single String. The method below does the job, however with one drawback:
private static String readFileAsString(String filePath) throws java.io.IOException {
    byte[] buffer = new byte[(int) new File(filePath).length()];
    BufferedInputStream f = null;
    try {
        f = new BufferedInputStream(new FileInputStream(filePath));
        f.read(buffer);
        if (f != null) try { f.close(); } catch (IOException ignored) { }
    } catch (IOException ignored) { System.out.println("File not found or invalid path."); }
    return new String(buffer);
}
... the drawback is that the line breaks are converted into long spaces e.g. " ".
I want the line breaks to be converted from \n or \r to <br> (HTML tag) instead.
Thank you in advance.
What about using a Scanner and adding the linefeeds yourself:
StringBuilder buf = new StringBuilder();
java.util.Scanner sc = new java.util.Scanner(new java.io.File("sample.txt"));
while (sc.hasNextLine()) {
    buf.append(sc.nextLine());
    buf.append("<br />");
}
sc.close();
I don't see where you get your long spaces from.
You can read directly into the buffer and then create a String from the buffer:
File f = new File(filePath);
FileInputStream fin = new FileInputStream(f);
byte[] buffer = new byte[(int) f.length()];
new DataInputStream(fin).readFully(buffer);
fin.close();
String s = new String(buffer, "UTF-8");
You could add this code:
return new String(buffer).replaceAll("(\r\n|\r|\n|\n\r)", "<br>");
Is this what you are looking for?
The code will read the file contents as they appear in the file - including line breaks.
If you want to change the breaks into something else, such as for display in HTML, you will either need to post-process the result or read the file line by line. Since you do not want the latter, you can replace your return statement with the following, which should do the conversion:
return (new String(buffer)).replaceAll("\r[\n]?", "<br>");
StringBuilder sb = new StringBuilder();
try {
    InputStream is = getAssets().open("myfile.txt");
    byte[] bytes = new byte[1024];
    int numRead = 0;
    try {
        while ((numRead = is.read(bytes)) != -1)
            sb.append(new String(bytes, 0, numRead));
    }
    catch (IOException e) {
    }
    is.close();
}
catch (IOException e) {
}
Your resulting String: String result = sb.toString();
Then replace whatever you want in this result.
I agree with the general approach by @Sanket Patel, but using Commons I/O you would likely want FileUtils.
So your code would look like:
String myString = FileUtils.readFileToString(new File(filePath));
There is also another version to specify an alternate character encoding.
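For example, the overload that takes an encoding (UTF-8 shown here):
String myString = FileUtils.readFileToString(new File(filePath), "UTF-8");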
You should try org.apache.commons.io.IOUtils.toString(InputStream is) to get the file content as a String. There you can pass the InputStream object which you get from
getAssets().open("xml2json.txt") (an Android call that returns an InputStream)
in your Activity. To get the String, use this:
String xml = IOUtils.toString((getAssets().open("xml2json.txt")));
So,
String xml = IOUtils.toString(*pass_your_InputStream_object_here*);
