Say I open a text file like this:
public static void main(String[] args) throws IOException {
    String file_name = "file.txt";
    try {
        ReadFile file = new ReadFile(file_name);
        String[] Lines = file.openFile();
        for (int i = 0; i < Lines.length; i++) {
            System.out.println(Lines[i]);
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
Now I want to convert the result to binary (for later conversion into AMI coding). I suppose I should first convert it to ASCII, though I'm not 100% certain that's necessary. Should I convert it to chars first, or is there an easier way?
Please bear in mind that I'm just a beginner.
Do you know for sure that the files will be ASCII encoded? Assuming they are, you can just use the getBytes() method of String:
byte[] lineDefault = line.getBytes();
There is also an overload of getBytes() that takes a charset name, for when you don't want to use the default encoding. I often use:
byte[] lineUtf8 = line.getBytes("UTF-8");
which gives byte sequences that are identical to ASCII for characters whose code points are below 0x80.
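To get from those bytes to a binary representation, something along these lines might work. It is only a sketch: the class and method names (TextToBinary, toBinary) are made up for illustration, and it assumes the text really is ASCII (with US_ASCII, any other character is encoded as '?').

```java
import java.nio.charset.StandardCharsets;

class TextToBinary {
    // Encodes the text as ASCII and renders each byte as 8 binary digits,
    // separated by single spaces.
    static String toBinary(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so a negative byte is not sign-extended.
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad with zeros to a fixed width of 8.
            for (int i = bits.length(); i < 8; i++) {
                sb.append('0');
            }
            sb.append(bits).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("Hi")); // 01001000 01101001
    }
}
```

You would call toBinary on each element of Lines; the resulting string of 0s and 1s can then be fed into whatever does the AMI step.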
I'm teaching myself Java IO currently and I'm able to read basic ASCII characters from a .txt file, but when I get to other Latin-1 characters, or anything else within the 255 range, it prints 194 instead of the correct decimal character code.
For example, I can read abcdefg from the txt file, but if I throw in a character like © I don't get 169; for some reason I get 194. I tried testing this by just printing all chars between 1 and 255 with a loop, and that works. Reading this input doesn't seem to, though, so I'm a little perplexed. I understand I can use a Reader object or whatever, but I want to cover the basics first by learning the byte streams. Here is what I have:
InputStream io = null;
try {
    io = new FileInputStream("thing.txt");
    int yeet = io.read();
    System.out.println(yeet);
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
See the UTF-8 encoding table and Unicode characters.
You can see there that the hex code for © is c2 a9, i.e. 194 169. Your file is evidently UTF-8 encoded, where © takes two bytes, and read() returns one byte at a time, so the first value you get is 194; a second read() would return 169.
P.S. "Read a file character by character/UTF8" is another good example of Java encodings, code points, etc.
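As a rough illustration of the difference, the sketch below reads the same two bytes once as a raw byte stream and once through a Reader that decodes UTF-8. The class and method names are invented for the example, and a ByteArrayInputStream stands in for thing.txt so that it runs without a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CopyrightBytes {
    // Returns the first raw byte, as InputStream.read() does in the question.
    static int firstByte(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the bytes as UTF-8 and returns the code of the first character.
    static int firstChar(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The copyright sign, encoded the way a UTF-8 text file would store it.
        byte[] data = "\u00A9".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstByte(data)); // 194
        System.out.println(firstChar(data)); // 169
    }
}
```

With a real file you would wrap the FileInputStream in the InputStreamReader instead; the byte stream itself is working correctly, it just knows nothing about characters.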
I have some solutions for you.
The first solution
There is a full understanding of the book on this site
The second solution
I have a sample code for you
public class Example {
public static void main(String[] args) throws Exception {
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
String result = new String(charset, "UTF-8");
System.out.println(result);
}
}
Output:
hey捦
Let us walk through the program above. First we encode the given Unicode string as UTF-8 bytes using the getBytes() method:
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
Then we decode the charset byte array back into a String by creating a new String object with the same encoding:
String result = new String(charset, "UTF-8");
System.out.println(result);
Good luck
I am quite new to Java. I need to save XML to CSV using Java, but the problem is that I cannot use CSVWriter because the XML also contains UTF-8 encoded data.
I found out that it is possible to use OutputStreamWriter, which can encode to UTF-8.
For strings everything is OK, but for an integer I cannot get the correct number.
Sample code:
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.*;
public class UTF8WriterDemo {
public static void main(String[] args) {
Writer out = null;
try {
out = new BufferedWriter(
new OutputStreamWriter(new FileOutputStream("c://java2//file.csv"), "windows-1250"));
//for (int i=0; i<4; i++ ) {
String text = "This tečt will be added to File !!";
int hu = 4;
out.write('\ufeff');
out.write(text+ '\n');
out.write(hu+ '\n');
//}
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
I get a picture instead of a number.
I suppose it's because:
An OutputStreamWriter is a bridge from character streams to byte streams: Characters written to it are encoded into bytes using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
And that's why it's not displayed correctly.
So I would like to ask: is there any way to get integers displayed correctly using OutputStreamWriter?
If not, how can I convert XML data into CSV using Java when it contains UTF-8 encoded characters?
Thank you
Java distinguishes between double quotes and single quotes.
"foo" is a String.
'f' is a char (or Character).
'foo' will not compile, because a char literal can hold only one character.
'\n' is also one character, specifically the newline character. Adding an int and a char is numeric addition: hu + '\n' is 4 + 10 = 14, which is still an int, and write(int) then outputs the single character with code 14 (a control character) rather than the text "4".
Using double quotes should fix your issue: hu + "\n" is string concatenation and gives "4" followed by a newline.
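A small sketch of the difference, using an in-memory StringWriter in place of the file writer so it can be run on its own (the class and method names are just for illustration):

```java
import java.io.StringWriter;

class CharPlusInt {
    // int + char is numeric addition; '\n' counts as its code, 10.
    static int plusChar(int n) {
        return n + '\n';
    }

    // int + String is concatenation; the int becomes decimal text first.
    static String plusString(int n) {
        return n + "\n";
    }

    // Mimics the two kinds of out.write(...) call with an in-memory Writer.
    static String written(int n) {
        StringWriter out = new StringWriter();
        out.write(n + '\n'); // write(int): one char whose code is n + 10
        out.write(n + "\n"); // write(String): the digits, then a newline
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusChar(4));                // 14
        System.out.println(plusString(4).length());     // 2
        System.out.println((int) written(4).charAt(0)); // 14
        System.out.println(written(4).charAt(1));       // 4
    }
}
```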
import java.io.*;
public class UTF8WriterDemo {
public static void main(String[] args) {
Writer out = null;
try {
out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("file.csv"), "windows-1250"));
//for (int i = 0; i < 4; i++) {
String text = "This text will be added to File !!";
int hu = 4;
String text2 = String.valueOf(hu); // the decimal text "4", not the character with code 4
out.write('\ufeff');
out.write(text + '\n');
out.write(text2 + '\n');
// }
out.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
System.out.println("The process is completed.");
}
}
}
Actually, I need to rewrite this construction:
FileWriter fileWriter = new FileWriter("C:\\java\\test\\EEexample3.csv");
CSVWriter csvWriter = new CSVWriter(fileWriter);
csvWriter.writeNext(new String[] {
..
..
..
..
}
..code.. code..
String homeCurrencyPriceString = iit.getHomeCurrency().getPrice()!=null?iit.getHomeCurrency().getPrice().toString():"";
String headerDateString = invoiceHeaderType.getDateTax()!=null?invoiceHeaderType.getDateTax().toString():"";
String invoiceTypeString = invoiceHeaderType.getInvoiceType()!=null?invoiceHeaderType.getInvoiceType().value():"";
String headeraccountno= invoiceHeaderType.getAccount().getAccountNo()!=null?invoiceHeaderType.getAccount().getAccountNo().toString():"";
String headertext = invoiceHeaderType.getText()!=null?invoiceHeaderType.getText():"";
String invoiceitemtext= iit.getText()!=null?iit.getText():"";
String headericdph = invoiceHeaderType.getPartnerIdentity().getAddress().getIcDph()!=null?invoiceHeaderType.getPartnerIdentity().getAddress().getIcDph():"";
String symVar = invoiceHeaderType.getSymVar()!=null?invoiceHeaderType.getSymVar():"";
csvWriter.writeNext(new String[] {
invoiceHeaderType.getPartnerIdentity().getAddress().getIco(), headericdph, invoiceHeaderType.getPartnerIdentity().getAddress().getCompany(),symVar, invoiceHeaderType.getId().toString(), iit.getId().toString(), homeCurrencyPriceString, detailcentreString,headercentreString, headerDateString, invoiceTypeString,headeraccountno, headertext,invoiceitemtext
});
(where the objects are filled from the XML) into an OutputStreamWriter construction.
So first I am trying OutputStreamWriter in a simple program to be sure it works; once it does, I want to rewrite the whole code.
With CSVWriter everything worked smoothly, but now texts encoded in UTF-8/windows-1250 have been added :( so I need to fix the construction of the code.
Even numeric objects like the price are converted using .toString(), so maybe it will work without int.
I hope the writer from OutputStreamWriter is able to do what is necessary.
I am going to try.
I have to read a file called test.p2b with the following content:
I tried reading it like this:
static void branjeIzDatoteke(String location){
byte[] allBytes = new byte[10000];
try {
InputStream input = new FileInputStream(location);
int byteRead;
int j=0;
while ((byteRead = input.read())!=-1){
allBytes[j] = (byte)input.read();
}
String str = new String(allBytes,"UTF-8");
for (int i=0;i<=str.length()-8;i+=8){
//int charCode = Integer.parseInt(str.substring(i,i+8),2);
//System.out.println((char)charCode);
int drek = (int)str.charAt(i);
System.out.println(Integer.toBinaryString(drek));
}
} catch (IOException ex) {
Logger.getLogger(Slika.class.getName()).log(Level.SEVERE, null, ex);
}
}
I tried just printing out the string (when I created String str = new String(allBytes,"UTF-8");), but all I get is a square at the beginning and then 70+ blank lines with no text.
Then I tried the int charCode = Integer.parseInt(str.substring(i,i+8),2); and printing out each individual character, but then I got a NumberFormatException.
I even tried just converting
Finally I tried the Integer.toBinaryString I have at the end but in this case I get 1s and 0s. That's not what I want, I need to read the actual text but no method seems to work.
I've actually read a binary file before using the method I already tried:
int charCode = Integer.parseInt(str.substring(i,i+8),2);
System.out.println((char)charCode);
but like I said, I get a NumberFormatException.
I don't understand why these methods won't work.
If you want to read all the bytes you can use the java.nio.file.Files utility class:
Path path = Paths.get("test.p2b");
byte[] allBytes = Files.readAllBytes(path);
String str = new String(allBytes, "UTF-8");
System.out.print(str);
Your iteration over the content of str might not work. Certain Unicode characters are expressed as surrogate pairs, that is, a single code point that spans more than one char (as explained here). Since you are decoding UTF-8 you should use the String#codePoints() method to iterate over the code points instead of the chars.
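For instance, a minimal sketch of iterating by code point could look like this. The class and method names are made up, and the sample bytes are an assumption standing in for the content of test.p2b, which I don't have.

```java
import java.nio.charset.StandardCharsets;

class CodePointDemo {
    // Decodes UTF-8 bytes and returns the Unicode code points of the text.
    static int[] codePoints(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which occupies two chars in a Java String.
        byte[] data = "a\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        for (int cp : codePoints(data)) {
            System.out.println(Integer.toHexString(cp));
        }
        // prints 61, then 1f600
    }
}
```

Here the String has length 3 but only two code points, which is why stepping through it char by char (or in fixed steps of 8) can split a character in half.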
I am a beginner at Java, trying to figure out how to convert characters from a text file into integers. In the process, I wrote a program which generates a text file showing what characters are generated by what integers.
package numberchars;
import java.io.FileWriter;
import java.io.IOException;
import java.io.FileReader;
import java.lang.Character;
public class Numberchars {
public static void main(String[] args) throws IOException {
FileWriter outputStream = new FileWriter("NumberChars.txt");
//Write to the output file the char corresponding to the decimal
// from 1 to 255
int counter = 1;
while (counter <256)
{
outputStream.write(counter);
outputStream.flush();
counter++;
}
outputStream.close();
This generated NumberChars.txt, which had all the numbers, all the letters both upper and lower case, surrounded at each end by other symbols and glyphs.
Then I tried to read this file and convert its characters back into integers:
FileReader inputStream = new FileReader("NumberChars.txt");
FileWriter outputStream2 = new FileWriter ("CharNumbers.txt");
int c;
while ((c = inputStream.read()) != -1)
{
outputStream2.write(Character.getNumericValue(c));
outputStream2.flush();
}
}
}
The resulting file, CharNumbers.txt, began with the same glyphs as NumberChars.txt but then was blank. Opening the files in MS Word, I found NumberChars had 248 characters (including 5 spaces) and CharNumbers had 173 (including 8 spaces).
So why didn't the Character.getNumericValue(c) result in an integer written to CharNumbers.txt? And given that it didn't, why at least didn't it write an exact copy of NumberChars.txt? Any help much appreciated.
Character.getNumericValue doesn't do what you think it does. If you read the Javadoc:
Returns the int value that the specified character (Unicode code point) represents. For example, the character '\u216C' (the Roman numeral fifty) will return an int with a value of 50.
When a character has no numeric value it returns -1 (which is 0xFF_FF_FF_FF in 2s complement).
Most characters don't have such a "numeric value." Digits map to 0 through 9 and the Latin letters to 10 through 35, so for those you write low control characters; for everything else you write -1, and since write(int) keeps only the low 16 bits, each of those becomes the char 0xFFFF, courtesy of a misplaced Character.getNumericValue. How those chars end up as bytes depends on the encoding (more on that later). I'm not sure what MS Word is doing, but it is probably confused about the encoding of your file and miscounting those characters. (Use a text editor better suited to this kind of thing, like Notepad++, btw.) To see what was really written, check the file's size on disk in bytes or open it in a hex viewer rather than counting characters in Word.
I'm not sure what you meant to do here, so I'll note that readers and writers (FileReader and FileWriter are an InputStreamReader and an OutputStreamWriter underneath) work on a (Unicode) character basis, with an encoding that defaults to the system one. That's why each int you pass to write() is cut down to one 16-bit char and then encoded, rather than written as a raw byte. If you wanted pure byte IO, use FileInputStream/FileOutputStream. If you wanted to read and write the ints as Strings, you need to use FileWriter/FileReader, but not like you did.
import java.io.*;

public class ByteAndStringIO {
    public static void main(String[] args) throws IOException {
        // Just bytes
        // This is a try-with-resources. It executes the code with the decls in it
        // but is also like an implicit finally block that calls `close()` on each resource.
        try (FileOutputStream fos = new FileOutputStream("bytes.bin")) {
            for (int b = 0; b < 256; b++) { // Bytes are signed so we use int.
                // This takes an int and writes only its lowest byte
                fos.write(b);
                // Can also fill a byte[] and dump it all at once with overloaded write.
            }
        }
        byte[] bytes = new byte[256];
        try (FileInputStream fis = new FileInputStream("bytes.bin")) {
            // Reads up to bytes.length bytes into bytes (it may read fewer)
            fis.read(bytes);
        }
        // Foreach loop. If you don't know what this does, I think you can figure out from the name.
        for (byte b : bytes) {
            System.out.println(b & 0xFF); // mask, or 128..255 print as negative numbers
        }

        // As Strings
        try (FileWriter fw = new FileWriter("strings.txt")) {
            for (int i = 0; i < 256; i++) {
                // You need a delimiter lest you not be able to tell 12 from 1,2 when you read
                // Uses system default encoding
                fw.write(Integer.toString(i) + "\n");
            }
        }
        byte[] parsed = new byte[256];
        try (
            FileReader fr = new FileReader("strings.txt");
            // FileReaders can't do stuff like "read one line to String" so we wrap it
            BufferedReader br = new BufferedReader(fr)
        ) {
            for (int i = 0; i < 256; i++) {
                // Byte.valueOf would throw for 128..255, so parse as int and narrow
                parsed[i] = (byte) Integer.parseInt(br.readLine());
            }
        }
        for (byte b : parsed) {
            System.out.println(b & 0xFF);
        }
    }
}
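To see what Character.getNumericValue returns next to a plain conversion of the char to its code, a quick check like the following may help. The class and helper names are invented for the example.

```java
class NumericValueDemo {
    // The character's code: what a cast to int gives you.
    static int charCode(char c) {
        return c;
    }

    // The "numeric value" as defined by Character.getNumericValue.
    static int numericValue(char c) {
        return Character.getNumericValue(c);
    }

    public static void main(String[] args) {
        char[] samples = {'7', 'a', '\u00A9'}; // digit, letter, copyright sign
        for (char c : samples) {
            System.out.println(charCode(c) + " " + numericValue(c));
        }
        // prints:
        // 55 7
        // 97 10
        // 169 -1
    }
}
```

So if the goal is the integer behind each character, the value returned by read() already is that integer; writing Integer.toString(c) plus a delimiter gives it as readable text.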
public class MyClass {
    public static void main(String[] args) {
        char x = 'b';
        // Just by writing a plus sign before the variable you get its ASCII value:
        // unary plus promotes the char to int, so this prints 98.
        System.out.println(+x);
    }
}
For a class, I was given a file of base64 encoded salted sha-256 hashed passwords.
the file is in the form:
username:base64 encoded sha256 password:salt
My original thought was to base64 decode the hash so I would be left with:
username:salted hashed password:salt
then run it through JTR or hashcat to crack the passwords.
My problem is in the base64 decoding process.
my code looks like:
public static byte[] decode(String string) {
try {
return new BASE64Decoder().decodeBuffer(string);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public static void splitLine(String strLine)
throws Exception {
StringTokenizer st = new StringTokenizer(strLine, ":");
if (st.hasMoreTokens())
userName = st.nextToken();
if (st.hasMoreTokens())
password = st.nextToken();
if (st.hasMoreTokens())
salt = st.nextToken();
}
public static void main(String[] argv) {
String line = null;
String pwdFile = null;
int count = 0;
try {
pwdFile = argv[0];
BufferedReader br = new BufferedReader(new FileReader(pwdFile));
line = br.readLine();
while (line != null) {
splitLine(line);
/* alternative #1: generates a lot of non-printable characters for the hash */
System.out.println(userName+":"+new String(decode(password))+":"+salt);
/* alternative #2: gives a list of the decimal values for each byte of the hash */
System.out.println(userName+":"+Arrays.toString(decode(password))+":"+salt);
count++;
line = br.readLine();
}
br.close();
System.err.println("total lines read: " + count);
} catch (Exception e) {
e.printStackTrace();
System.exit(-1);
}
}
With alternative #1, I end up with 50,000 more lines in my output file than were in the input file, so I assume some of the decoded strings contain newline characters, which I need to fix as well.
How do I get back to and print the original hash value for the password in a format that either hashcat or JTR will recognize as salted sha256?
Problem: You are trying to work with Base64 encoded password hashes, and when they are decoded there are unprintable characters.
Background: When a value is hashed, the output is an arbitrary sequence of bytes determined by the hashing algorithm, and many of those bytes fall outside the range of printable characters. Base64 encoding is simply an alphabet that maps ALL bytes onto printable characters.
Solution: Work with the bytes that the Base64 decode returns instead of trying to make them into a String. Convert those raw bytes to a hex representation (Base16) before you print them or give them to hashcat or JTR. In short, you need to do something like the following (it happens to use the Guava library):
String hex = BaseEncoding.base16().encode(bytesFromEncodedString);
This is condensed from a longer answer I posted
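If you would rather not pull in Guava, roughly the same thing can be sketched with the JDK alone. This assumes Java 8 or later for java.util.Base64; the class and method names are made up, and the sample input is just four arbitrary bytes, not a real hash from the file.

```java
import java.util.Base64;

class Base64ToHex {
    // Decodes a Base64 string and renders the raw bytes as lowercase hex.
    static String toHex(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        StringBuilder hex = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            // Mask to 0..255 so each byte always yields exactly two hex digits.
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // "3q2+7w==" is the Base64 form of the bytes DE AD BE EF.
        System.out.println(toHex("3q2+7w==")); // deadbeef
    }
}
```

In splitLine's terms you would print userName, toHex(password) and salt; the exact field order and separator the cracking tool expects for salted SHA-256 is something to check in its documentation.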