I have a text file with the contents
12        13        14
There are 8 spaces between 12 and 13, and between 13 and 14.
My Java method receives the text as an InputStream argument, stores the contents in a byte array, and then converts each byte to a character:
public class FileUpload implements RequestStreamHandler{
String fileObjKeyName = "sample1.txt";
String bucketName="";
/**
* @param args
*/
@Override
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
LambdaLogger logger = context.getLogger();
byte[] bytes = IOUtils.toByteArray(inputStream);
StringBuilder sb = new StringBuilder();
StringBuilder sb1 = new StringBuilder();
sb.append("[ ");
sb1.append("[ ");
for (byte b : bytes) {
sb.append(b);
char ch = (char) b;
sb1.append(ch);
}
sb.append("]");
sb1.append("] ");
logger.log(sb.toString());
logger.log(sb1.toString());
}
}
The decimal value of each byte is printed correctly, as shown below:
[ 4950323232323232323249513232323232323232324952]
However, when the bytes are converted to characters, only one space (decimal value 32) appears between the values; all the remaining space bytes in between seem to be skipped.
[ 12 13 14]
Can anyone suggest the reason for this?
How are you converting the bytes to a String? The result should be the same either way. See the code below:
public static void main(String[] args) {
byte[] bytes = "12 13 14".getBytes();
System.out.println(Arrays.toString(bytes));
String str = new String(bytes,StandardCharsets.UTF_8);
System.out.println(str);
}
Your example shows that you're using AWS, for which you will often check the results and the produced logs online, with a tool that supports HTML.
And in HTML, when you write several consecutive spaces, they are displayed as only one.
Your String object, within Java, does contain the 8 spaces. But when you give it to a logger to be eventually displayed in a webpage, the spaces are collapsed and displayed as only one.
This is easy to prove: just add the following code at the end of your method:
String s = sb1.toString();
logger.log("s length: " + s.length());
for(int i = 0; i < s.length(); i++) {
logger.log("s[" + i + "]: " + s.charAt(i));
}
It demonstrates the length and exact content of the String. If you're not seeing that exact content when displaying the String, it is the fault of the tool that displays it.
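Another way to check this, sketched below, is to replace every space with a visible character before logging, so a viewer that collapses whitespace cannot hide anything. The class and method names (VisibleSpaces, showSpaces, spaces) are made up for this illustration and are not part of the original code.

```java
import java.util.Arrays;

public class VisibleSpaces {
    // Replaces each space with a middle dot (U+00B7) so that a viewer which
    // collapses consecutive whitespace still shows every position.
    static String showSpaces(String s) {
        return s.replace(' ', '\u00B7');
    }

    // Builds a run of n spaces without relying on how this listing is rendered.
    static String spaces(int n) {
        char[] c = new char[n];
        Arrays.fill(c, ' ');
        return new String(c);
    }

    public static void main(String[] args) {
        String text = "12" + spaces(8) + "13" + spaces(8) + "14";
        System.out.println(showSpaces(text));
        System.out.println("length: " + text.length()); // 2 + 8 + 2 + 8 + 2 = 22
    }
}
```

In the Lambda handler, the equivalent would be to pass showSpaces(sb1.toString()) to logger.log instead of the raw String.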
I am writing a Hive UDF to convert EBCDIC characters to hexadecimal.
The EBCDIC characters are stored in a Hive table. I am currently able to convert them, but a few characters are ignored during the conversion.
Example:
This is the EBCDIC value stored in table:
AGNSAñA¦ûÃÃÂõÂjÂq  à ()
Converted hexadecimal:
c1c7d5e2000a5cd4f6ef99187d07067203a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
What I want as output:
c1c7d5e200010a5cd4f6ef99187d0706720103a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
The following EBCDIC characters are not being converted:
01 - start of heading
10 - escape
15 - new line
Below is the code I have tried so far:
public class EbcdicToHex extends UDF {
public String evaluate(String edata) throws UnsupportedEncodingException {
byte[] ebcdiResult = getEBCDICRawData(edata);
String hexResult = getHexData(ebcdiResult);
return hexResult;
}
public byte[] getEBCDICRawData (String edata) throws UnsupportedEncodingException {
byte[] result = null;
String ebcdic_encoding = "IBM-037";
result = edata.getBytes(ebcdic_encoding);
return result;
}
public String getHexData(byte[] result){
String output = asHex(result);
return output;
}
public static String asHex(byte[] buf) {
char[] HEX_CHARS = "0123456789abcdef".toCharArray();
char[] chars = new char[2 * buf.length];
for (int i = 0; i < buf.length; ++i) {
chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
}
return new String(chars);
}
}
During conversion, a few EBCDIC characters are ignored. How can I get them converted to hexadecimal as well?
I think the problem lies elsewhere. I created a small test case that builds a String from the 3 bytes you say are being ignored, and in my output they are converted correctly:
private void run(String[] args) throws Exception {
byte[] bytes = new byte[] {0x01, 0x10, 0x15};
String str = new String(bytes, "IBM-037");
byte[] result = getEBCDICRawData(str);
for(byte b : result) {
System.out.print(Integer.toString(( b & 0xff ) + 0x100, 16).substring(1) + " ");
}
System.out.println();
System.out.println(evaluate(str));
}
Output:
01 10 15
011015
Based on this, both your getEBCDICRawData and evaluate methods seem to be working correctly, which makes me believe your String value may already be incorrect to start with. Could it be that the String is already missing those characters? Or perhaps, as a long shot, the charset is incorrect? There are different EBCDIC charsets, so maybe the String was composed using a different one, although I doubt this would make much difference for the 01, 10 and 15 bytes.
As a final remark, but probably unrelated to your problem, I usually prefer to use the encode/decode functions on the charset object to do such conversions:
String charset = "IBM-037";
Charset cs = Charset.forName(charset);
ByteBuffer bb = cs.encode(str);
CharBuffer cb = cs.decode(bb);
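As a rough sketch of how the Charset-based approach could feed the hex conversion directly, the example below encodes a String and formats each byte as two hex digits. The class and method names (EbcdicHexSketch, toHex) are invented for this illustration. It assumes the JDK in use ships the IBM037 charset, which is why the call is guarded with Charset.isSupported.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EbcdicHexSketch {
    // Encodes text with the named charset and returns the bytes as lowercase hex.
    static String toHex(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        ByteBuffer bb = cs.encode(text);
        StringBuilder hex = new StringBuilder();
        while (bb.hasRemaining()) {
            // Mask with 0xFF so negative bytes do not sign-extend.
            hex.append(String.format("%02x", bb.get() & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: IBM037 is available; it is an extended charset in the JDK.
        if (Charset.isSupported("IBM037")) {
            System.out.println(toHex("AGNS", "IBM037")); // c1c7d5e2, matching the question's prefix
        } else {
            System.out.println("IBM037 is not available on this JVM");
        }
    }
}
```

Note that this only helps if the String reaching the UDF still contains the control characters. If they were lost when the table data was turned into a String, no encoding step afterwards can bring them back.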
I am a beginner at Java, trying to figure out how to convert characters from a text file into integers. In the process, I wrote a program which generates a text file showing what characters are generated by what integers.
package numberchars;
import java.io.FileWriter;
import java.io.IOException;
import java.io.FileReader;
import java.lang.Character;
public class Numberchars {
public static void main(String[] args) throws IOException {
FileWriter outputStream = new FileWriter("NumberChars.txt");
//Write to the output file the char corresponding to the decimal
// from 1 to 255
int counter = 1;
while (counter <256)
{
outputStream.write(counter);
outputStream.flush();
counter++;
}
outputStream.close();
This generated NumberChars.txt, which had all the numbers, all the letters both upper and lower case, surrounded at each end by other symbols and glyphs.
Then I tried to read this file and convert its characters back into integers:
FileReader inputStream = new FileReader("NumberChars.txt");
FileWriter outputStream2 = new FileWriter ("CharNumbers.txt");
int c;
while ((c = inputStream.read()) != -1)
{
outputStream2.write(Character.getNumericValue(c));
outputStream2.flush();
}
}
}
The resulting file, CharNumbers.txt, began with the same glyphs as NumberChars.txt but then was blank. Opening the files in MS Word, I found NumberChars had 248 characters (including 5 spaces) and CharNumbers had 173 (including 8 spaces).
So why didn't the Character.getNumericValue(c) result in an integer written to CharNumbers.txt? And given that it didn't, why at least didn't it write an exact copy of NumberChars.txt? Any help much appreciated.
Character.getNumericValue doesn't do what you think it does. If you read the Javadoc:
Returns the int value that the specified character (Unicode code point) represents. For example, the character '\u216C' (the Roman numeral fifty) will return an int with a value of 50.
If the character has no numeric value, it returns -1 (which looks like 0xFF_FF_FF_FF in 2s complement).
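A quick way to see the difference between a character's numeric value and its character code is a small test like the one below. The class name NumericValueDemo is made up for this illustration.

```java
public class NumericValueDemo {
    public static void main(String[] args) {
        // A decimal digit has the numeric value you would expect.
        System.out.println(Character.getNumericValue('7'));      // 7
        // Latin letters count as digits in bases up to 36, so 'a' is 10.
        System.out.println(Character.getNumericValue('a'));      // 10
        // The Roman numeral fifty from the Javadoc example.
        System.out.println(Character.getNumericValue('\u216C')); // 50
        // Punctuation has no numeric value at all.
        System.out.println(Character.getNumericValue('!'));      // -1
        // The character code is obtained with a plain cast instead.
        System.out.println((int) '7');                           // 55
        System.out.println((int) 'a');                           // 97
    }
}
```

If the goal was to write the character codes as readable numbers, the cast to int (followed by Integer.toString) is the operation to use, not getNumericValue.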
Most characters don't have such a "numeric value", so for them Character.getNumericValue returns -1, and FileWriter.write(int) keeps only the low 16 bits of that, so you end up writing a whole lot of 0xFFFF characters. Digits return 0 to 9 and Latin letters return 10 to 35, which are written as control characters rather than as readable numbers. I'm not sure what MS Word is doing, but it's probably getting confused about the encoding of your file. (Use a text editor more suited to this kind of stuff, like Notepad++, btw.) The file's size on disk in bytes depends on the encoding the writer used, so the character count Word reports is not a reliable measure of what was written.
I'm not sure what you meant to do here, so I'll note that FileReader and FileWriter (like InputStreamReader and OutputStreamWriter, which they extend) work on a (Unicode) character basis, with an encoding that defaults to the system one. write(int) treats the low 16 bits of the int as a char, which is why your ints are truncated to 2 bytes. If you wanted pure byte I/O, use FileInputStream/FileOutputStream. If you wanted to read and write the ints as Strings, you need to use FileWriter/FileReader, but not the way you did.
// Just bytes
// This is a try-with-resources. It executes the code with the decls in it
// but is also like an implicit finally block that calls `close()` on each resource.
try(FileOutputStream fos = new FileOutputStream("bytes.bin")) {
for(int b = 0; b < 256; b++) { // Bytes are signed so we use int.
// This takes an int and truncates it for the lowest byte
fos.write(b);
// Can also fill a byte[] and dump it all at once with overloaded write.
}
}
byte[] bytes = new byte[256];
try(FileInputStream fis = new FileInputStream("bytes.bin")) {
// Reads up to bytes.length bytes into bytes
fis.read(bytes);
}
// Foreach loop. If you don't know what this does, I think you can figure out from the name.
for(byte b : bytes) {
System.out.println(b);
}
// As Strings
try(FileWriter fw = new FileWriter("strings.txt")) {
for(int i = 0; i < 256; i++) {
// You need a delimiter lest you not be able to tell 12 from 1,2 when you read
// Uses system default encoding
fw.write(Integer.toString(i) + "\n");
}
}
byte[] bytes = new byte[256];
try(
FileReader fr = new FileReader("strings.txt");
// FileReaders can't do stuff like "read one line to String" so we wrap it
BufferedReader br = new BufferedReader(fr);
) {
for(int i = 0; i < 256; i++) {
// Byte.valueOf rejects 128..255, so parse as int and narrow to byte
bytes[i] = (byte) Integer.parseInt(br.readLine());
}
}
for(byte b : bytes) {
System.out.println(b);
}
public class MyCLAss {
public static void main(String[] args)
{
char x='b';
System.out.println(+x); // Unary plus promotes the char to int, so this prints its character code: 98
}
}
I have a list of tweets stored in HDFS as input, and I am trying to run a map-reduce job on it. This is my mapper implementation:
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
try {
String[] fields = value.toString().split("\t");
StringBuilder sb = new StringBuilder();
for (int i = 1; i < fields.length; i++) {
if (i > 1) {
sb.append("\t");
}
sb.append(fields[i]);
}
tid.set(fields[0]);
content.set(sb.toString());
context.write(tid, content);
} catch(DecoderException e) {
e.printStackTrace();
}
}
As you can see, I tried to split the input by "\t", but the input (value.toString()) looks like this when I print it out:
2014\x091880284777\x09argento_un\x090\x090\x09RT #topmusic619: #RETWEET THIS!!!!!\x5CnFOLLOW ME &
; EVERYONE ELSE THAT RETWEETS THIS FOR 35+ FOLLOWERS\x5Cn#TeamFollowBack #Follow2BeFollowed #TajF\xE2\x80\xA6
here is another example:
2014\x0934447260\x09RBEKP\x090\x090\x09\xE2\x80\x9C#LENEsipper: Wild lmfaooo RT #Yerrp08: L**o some
n***a nutt up while gettin twerked
I noticed that \x09 should be a tab character (ASCII 09 is tab), so I tried to use the Hex class from Apache Commons Codec:
String tmp = value.toString();
byte[] bytes = Hex.decodeHex(tmp.toCharArray());
But the decodeHex function returns null.
This is weird, since some of the characters are in hex while others are not. How can I decode them?
Edit:
Also note that besides tab, emojis are also encoded as hex values.
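One possible approach, sketched below, is to decode only the \xNN escapes and leave everything else untouched. A pure hex decoder expects the whole input to consist of hex digit pairs, which this mixed text is not. The class and method names (HexEscapeDecoder, decode, isEscapeAt) are invented for this illustration, and the sketch assumes the escaped bytes are UTF-8, which is consistent with the \xE2\x80\xA6 sequence in the sample (the ellipsis character).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HexEscapeDecoder {
    // Turns every \xNN escape into the raw byte NN, copies all other
    // characters as their UTF-8 bytes, then decodes the result as UTF-8.
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (isEscapeAt(s, i)) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                int cp = s.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if a backslash, an 'x' and two hex digits start at index i.
    static boolean isEscapeAt(String s, int i) {
        return i + 3 < s.length()
                && s.charAt(i) == '\\'
                && s.charAt(i + 1) == 'x'
                && Character.digit(s.charAt(i + 2), 16) >= 0
                && Character.digit(s.charAt(i + 3), 16) >= 0;
    }

    public static void main(String[] args) {
        String line = "2014\\x091880284777\\x09argento_un";
        String[] fields = decode(line).split("\t");
        System.out.println(fields.length); // 3
    }
}
```

In the mapper, this would mean calling decode(value.toString()) before the split on "\t". Note that \x5C decodes to a literal backslash, so the \x5Cn sequences in the sample become the two characters \n rather than a line break.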
I have this class to encode and decode a file. When I run it with .txt files, it works. But when I run it with .jpg or .doc files, I cannot open the resulting file, or it is not equal to the original. I don't know why this is happening. I adapted the class from
http://myjeeva.com/convert-image-to-string-and-string-to-image-in-java.html. But I want to change this line
byte imageData[] = new byte[(int) file.length()];
to
byte example[] = new byte[1024];
and read the file as many times as needed. Thanks.
import java.io.*;
import java.util.*;
public class Encode {
// input = input file path, output = output file path, imageDataString = encoded string
String input;
String output;
String imageDataString;
public void setFileInput(String input){
this.input=input;
}
public void setFileOutput(String output){
this.output=output;
}
public String getFileInput(){
return input;
}
public String getFileOutput(){
return output;
}
public String getEncodeString(){
return imageDataString;
}
public String processCode(){
StringBuilder sb= new StringBuilder();
try{
File fileInput= new File( getFileInput() );
FileInputStream imageInFile = new FileInputStream(fileInput);
// I have seen examples where people create a byte[] with the same length as the file.
// I don't want this because I will not know in advance how long the file is.
byte buff[] = new byte[1024];
int r = 0;
while ( ( r = imageInFile.read( buff)) > 0 ) {
String imageData = encodeImage(buff);
sb.append( imageData);
if ( imageInFile.available() <= 0 ) {
break;
}
}
} catch (FileNotFoundException e) {
System.out.println("File not found" + e);
} catch (IOException ioe) {
System.out.println("Exception while reading the file " + ioe);
}
imageDataString = sb.toString();
return imageDataString;
}
public void processDecode(String str) throws IOException{
byte[] imageByteArray = decodeImage(str);
File fileOutput= new File( getFileOutput());
FileOutputStream imageOutFile = new FileOutputStream( fileOutput);
imageOutFile.write(imageByteArray);
imageOutFile.close();
}
public static String encodeImage(byte[] imageByteArray) {
return Base64.getEncoder().withoutPadding().encodeToString( imageByteArray);
}
public static byte[] decodeImage(String imageDataString) {
return Base64.getDecoder().decode( imageDataString);
}
public static void main(String[] args) throws IOException {
Encode a = new Encode();
a.setFileInput( "C://Users//xxx//Desktop//original.doc");
a.setFileOutput("C://Users//xxx//Desktop//original-copied.doc");
a.processCode( );
a.processDecode( a.getEncodeString());
System.out.println("C O P I E D");
}
}
I tried changing
String imageData = encodeImage(buff);
to
String imageData = encodeImage(buff,r);
and changing the method encodeImage to
public static String encodeImage(byte[] imageByteArray, int r) {
byte[] aux = new byte[r];
for ( int i = 0; i < aux.length; i++) {
aux[i] = imageByteArray[i];
if ( aux[i] <= 0 ) {
break;
}
}
return Base64.getDecoder().decode( aux);
}
But I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
You have two problems in your program.
The first, as mentioned by @Joop Eggen, is that you are not handling your input correctly.
In fact, Java does not promise you that even in the middle of the file, you'll be reading the entire 1024 bytes. It could just read 50 bytes, and tell you it read 50 bytes, and then the next time it will read 50 bytes more.
Suppose you read 1024 bytes in the previous round. And now, in the current round, you're only reading 50. Your byte array now contains 50 of the new bytes, and the rest are the old bytes from the previous read!
So you always need to copy exactly the number of bytes that were read into a new array, and pass that on to your encoding function.
So, to fix this particular problem, you'll need to do something like:
while ( ( r = imageInFile.read( buff)) > 0 ) {
byte[] realBuff = Arrays.copyOf( buff, r );
String imageData = encodeImage(realBuff);
...
}
However, this is not the only problem here. Your real problem is with the Base64 encoding itself.
What Base64 does is take your bytes, break them into 6-bit chunks, and then treat each of those chunks as a number N between 0 and 63. Then it takes the Nth character from its character table to represent that chunk.
But this means it can't just encode a single byte or two bytes, because a byte contains 8 bits, which means one chunk of 6 bits and 2 leftover bits. Two bytes have 16 bits. That's 2 chunks of 6 bits and 4 leftover bits.
To solve this problem, Base64 always encodes 3 consecutive bytes. If the input does not divide evenly by three, it adds additional zero bits.
Here is a little program that demonstrates the problem:
package testing;
import java.util.Base64;
public class SimpleTest {
public static void main(String[] args) {
// An array containing six bytes to encode and decode.
byte[] fullArray = { 0b01010101, (byte) 0b11110000, (byte)0b10101010, 0b00001111, (byte)0b11001100, 0b00110011 };
// The same array broken into three chunks of two bytes.
byte[][] threeTwoByteArrays = {
{ 0b01010101, (byte) 0b11110000 },
{ (byte)0b10101010, 0b00001111 },
{ (byte)0b11001100, 0b00110011 }
};
Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
// Encode the full array
String encodedFullArray = encoder.encodeToString(fullArray);
// Encode the three chunks consecutively
StringBuilder encodedStringBuilder = new StringBuilder();
for ( byte [] twoByteArray : threeTwoByteArrays ) {
encodedStringBuilder.append(encoder.encodeToString(twoByteArray));
}
String encodedInChunks = encodedStringBuilder.toString();
System.out.println("Encoded full array: " + encodedFullArray);
System.out.println("Encoded in chunks of two bytes: " + encodedInChunks);
// Now decode the two resulting strings
Base64.Decoder decoder = Base64.getDecoder();
byte[] decodedFromFull = decoder.decode(encodedFullArray);
System.out.println("Byte array decoded from full: " + byteArrayBinaryString(decodedFromFull));
byte[] decodedFromChunked = decoder.decode(encodedInChunks);
System.out.println("Byte array decoded from chunks: " + byteArrayBinaryString(decodedFromChunked));
}
/**
* Convert a byte array to a string representation in binary
*/
public static String byteArrayBinaryString( byte[] bytes ) {
StringBuilder sb = new StringBuilder();
sb.append('[');
for ( byte b : bytes ) {
sb.append(Integer.toBinaryString(Byte.toUnsignedInt(b))).append(',');
}
if ( sb.length() > 1) {
sb.setCharAt(sb.length() - 1, ']');
} else {
sb.append(']');
}
return sb.toString();
}
}
So, imagine my 6-byte array is your image file. And imagine that your buffer is not reading 1024 bytes but 2 bytes each time. This is going to be the output of the encoding:
Encoded full array: VfCqD8wz
Encoded in chunks of two bytes: VfAqg8zDM
As you can see, the encoding of the full array gave us 8 characters. Each group of three bytes is converted into four chunks of 6 bits, which in turn are converted into four characters.
But the encoding of the three two-byte arrays gave you a string of 9 characters. It's a completely different string! Each group of two bytes was extended to three chunks of 6 bits by padding with zeros. And since you asked for no padding, it produces only 3 characters, without the extra = that usually marks when the number of bytes is not divisible by 3.
The output from the part of the program that decodes the 8-character, correct encoded string is fine:
Byte array decoded from full: [1010101,11110000,10101010,1111,11001100,110011]
But the result from attempting to decode the 9-character, incorrect encoded string is:
Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
at java.util.Base64$Decoder.decode0(Base64.java:734)
at java.util.Base64$Decoder.decode(Base64.java:526)
at java.util.Base64$Decoder.decode(Base64.java:549)
at testing.SimpleTest.main(SimpleTest.java:34)
Not good! A padded Base64 string always has a multiple of 4 characters, and even without padding the length can never be one more than a multiple of 4. We have 9.
Since you chose a buffer size of 1024, which is not a multiple of 3, that problem will happen. You need to encode a multiple of 3 bytes each time to produce the proper string. So in fact, you need to create a buffer sized 3072 or something like that.
But because of the first problem, be very careful about what you pass to the encoder, because it can always happen that you read fewer than 3072 bytes. And then, if that number is not divisible by three, the same problem will occur.
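Putting both points together, one way to do it is sketched below: keep reading until the buffer is completely full (or the stream ends) before encoding, so that only the very last chunk can have a length that is not a multiple of 3. The names ChunkedBase64, encode, fill and encodeBytes are made up for this illustration. The sketch also uses the default padded encoder, which is an assumption on my part; with it, padding can only appear at the very end of the combined string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

public class ChunkedBase64 {
    // Encodes the whole stream in chunks of 3072 bytes (a multiple of 3).
    static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getEncoder();
        byte[] buff = new byte[3072];
        StringBuilder sb = new StringBuilder();
        int filled;
        while ((filled = fill(in, buff)) > 0) {
            // Copy only the bytes actually read in this round.
            sb.append(encoder.encodeToString(Arrays.copyOf(buff, filled)));
        }
        return sb.toString();
    }

    // Reads until the buffer is full or the stream ends; returns the byte count.
    static int fill(InputStream in, byte[] buff) throws IOException {
        int total = 0;
        while (total < buff.length) {
            int r = in.read(buff, total, buff.length - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        return total;
    }

    // Convenience wrapper for in-memory data, used here only for the demo.
    static String encodeBytes(byte[] data) {
        try {
            return encode(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[5000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        String chunked = encodeBytes(data);
        String whole = Base64.getEncoder().encodeToString(data);
        System.out.println(chunked.equals(whole)); // true
    }
}
```

For a file, encode would be called with a FileInputStream. The decoding side can stay as it is, since the combined string is the same as encoding the whole file at once.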
Look at:
while ( ( r = imageInFile.read( buff)) > 0 ) {
String imageData = encodeImage(buff);
read returns -1 on end-of-file or the actual number of bytes that were read.
So the last buff might not be totally read, and even contain garbage from any prior read. So you need to use r.
As this is an assignment, the rest is up to you.
By the way:
byte[] array = new byte[1024]
is more conventional in Java. The syntax:
byte array[] = ...
was for compatibility with C/C++.
How do I convert a string to a stream of bits (zeroes and ones)?
What I did was take a string, convert it to an array of char, and then use the method
forDigit(char,int), but it does not give me the character as a stream of 0s and 1s.
Could you help, please?
Also, how can I do the reverse, from bits back to a char? Please show me a sample.
It's easiest if you take two steps. String supports converting to and from byte[], and BigInteger can convert a byte[] into binary text and back.
String text = "Hello World!";
System.out.println("Text: "+text);
String binary = new BigInteger(text.getBytes()).toString(2);
System.out.println("As binary: "+binary);
String text2 = new String(new BigInteger(binary, 2).toByteArray());
System.out.println("As text: "+text2);
Prints
Text: Hello World!
As binary: 10010000110010101101100011011000110111100100000010101110110111101110010011011000110010000100001
As text: Hello World!
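Note that the binary string above is 95 digits long rather than 96, because BigInteger drops the leading zero bit. If a fixed 8 bits per byte is needed, a variant along the following lines may be easier to work with. This is only a sketch: the names BitStrings, toBits and fromBits are invented for this example, and it assumes UTF-8 as the encoding.

```java
import java.nio.charset.StandardCharsets;

public class BitStrings {
    // Converts text to a string of 0s and 1s, exactly 8 digits per UTF-8 byte.
    static String toBits(String text) {
        StringBuilder bits = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            String s = Integer.toBinaryString(b & 0xFF); // mask avoids sign extension
            for (int i = s.length(); i < 8; i++) {
                bits.append('0'); // left-pad to 8 digits
            }
            bits.append(s);
        }
        return bits.toString();
    }

    // Reverses toBits; assumes the input length is a multiple of 8.
    static String fromBits(String bits) {
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(bits.substring(8 * i, 8 * i + 8), 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bits = toBits("abc");
        System.out.println(bits);           // 011000010110001001100011
        System.out.println(fromBits(bits)); // abc
    }
}
```

Because every byte takes exactly 8 digits, the reverse direction can split the string at fixed positions without needing a delimiter.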
I tried this one ..
public String toBinaryString(String s) {
char[] cArray=s.toCharArray();
StringBuilder sb=new StringBuilder();
for(char c:cArray)
{
String cBinaryString=Integer.toBinaryString((int)c);
sb.append(cBinaryString);
}
return sb.toString();
}
String strToConvert = "abc";
byte [] bytes = strToConvert.getBytes();
StringBuilder bits = new StringBuilder(bytes.length * 8);
System.err.println(strToConvert + " contains " + bytes.length +" number of bytes");
for(byte b:bytes) {
bits.append(Integer.toString(b, 2));
}
System.err.println(bits);
char [] chars = new char[bits.length()];
bits.getChars(0, bits.length(), chars, 0); // the last argument is the start offset in the destination array