Read a file faster & convert it to hex - Java

I need to read a file that is in ASCII and convert it into hex before applying some functions (searching for a specific character).
To do this, I read the file, convert it to hex and write it into a new file. Then I open my new hex file and apply my functions.
My issue is that it takes way too much time to read and convert it (approx. 8 seconds for a 9 MB file).
My reading method is:
public static void convertToHex2(PrintStream out, File file) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
    int value = 0;
    StringBuilder sbHex = new StringBuilder();
    StringBuilder sbResult = new StringBuilder();

    while ((value = bis.read()) != -1) {
        sbHex.append(String.format("%02X ", value));
    }
    sbResult.append(sbHex);
    out.print(sbResult);
    bis.close();
}
Do you have any suggestions to make it faster?

Did you measure what your actual bottleneck is? You read only a single byte at a time in your loop and process each one individually. You might as well read larger chunks of data and process those, e.g. using a DataInputStream or a plain byte[] buffer. That way you benefit more from the optimized reads of your OS, file system, their caches, etc.
Additionally, you fill sbHex and append it to sbResult, only to print that somewhere. That looks like an unnecessary copy, because sbResult is always empty in your case and sbHex already gives you a StringBuilder to hand to your PrintStream.
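A minimal sketch of that chunked approach, keeping the same hex output format; the buffer size is illustrative:
public static void convertToHexChunked(PrintStream out, File file) throws IOException {
    StringBuilder sbHex = new StringBuilder();
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
    try {
        byte[] buffer = new byte[8192];            // read 8 KB at a time instead of one byte
        int read;
        while ((read = bis.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                sbHex.append(String.format("%02X ", buffer[i] & 0xFF)); // mask to 0..255
            }
        }
    } finally {
        bis.close();
    }
    out.print(sbHex);
}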

Try this:
static String[] xx = new String[256];
static {
    for (int i = 0; i < 256; ++i) {
        xx[i] = String.format("%02X ", i);
    }
}
and use it:
sbHex.append(xx[value]);
Formatting is a heavy operation: it does not only do the conversion, it also has to parse the format string every time.
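Combined with the chunked read sketched under the previous answer, the inner loop then becomes a plain table lookup:
for (int i = 0; i < read; i++) {
    sbHex.append(xx[buffer[i] & 0xFF]); // mask to index 0..255; no format string parsed per byte
}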

Related

Java - My Huffman decompression doesn't work in rare cases. Can't figure out why

I just finished coding a Huffman compression/decompression program. The compression part of it seems to work fine, but I am having a bit of a problem with the decompression. I am quite new to programming and this is my first time doing any sort of byte manipulation/file handling, so I am aware that my solution is probably awful :D.
For the most part my decompression method works as intended, but sometimes it drops data after decompression (i.e. my decompressed file is smaller than my original file).
Also, whenever I try to decompress a file that isn't a plain text file (for example a .jpg), the decompression returns a completely empty file (0 bytes); the compression handles these other types of files just fine, though.
Decompression method:
public static void decompress(File file) {
    try {
        BitFileReader bfr = new BitFileReader(file);
        int[] charFreqs = new int[256];
        TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
        File nF = new File(file.getName() + "_decomp");
        nF.createNewFile();
        BitFileWriter bfw = new BitFileWriter(nF);
        DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
        int uniqueBytes;
        int counter = 0;
        int byteCount = 0;
        uniqueBytes = data.readUnsignedByte();

        // Read frequency table
        while (counter < uniqueBytes) {
            int index = data.readUnsignedByte();
            int freq = data.readInt();
            charFreqs[index] = freq;
            counter++;
        }

        // build tree
        Tree tree = buildTree(charFreqs);

        // build TreeMap
        fillDecodeMap(tree, new StringBuffer(), decodeMap);

        // Skip BitFileReader position to actual compressed code
        bfr.skip(uniqueBytes * 5);

        // Get total number of compressed bytes
        for (int i = 0; i < charFreqs.length; i++) {
            if (charFreqs[i] > 0) {
                byteCount += charFreqs[i];
            }
        }

        // Decompress data and write
        counter = 0;
        StringBuffer code = new StringBuffer();
        while (bfr.hasNextBit() && counter < byteCount) {
            code.append("" + bfr.nextBit());
            if (decodeMap.containsKey(code.toString())) {
                bfw.writeByte(decodeMap.get(code.toString()));
                code.setLength(0);
                counter++;
            }
        }

        bfw.close();
        bfr.close();
        data.close();
        System.out.println("Decompression successful!");
    }
    catch (IOException e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    File f = new File("test");
    compress(f);
    f = new File("test_comp");
    decompress(f);
}
}
When I compress the file I save the "character" (byte) values and the frequencies of each unique "character", plus the compressed bytes, all in the same file (all in binary form). I then use this saved info to fill my charFreqs array in my decompress() method and then use that array to build my tree. The saved structure is formatted like this:
<n><value 1><frequency>...<value n><frequency>[the compressed bytes]
(without the <> of course) where n is the number of unique bytes/characters in my original text (i.e. my leaf values).
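For reference, a minimal sketch of how the compression side could write a header in this layout so that it matches the readUnsignedByte()/readInt() calls (5 bytes per table entry) in decompress() above; the stream and variable names are illustrative, not the actual compress() code:
DataOutputStream header = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(outFile)));
int uniqueBytes = 0;
for (int freq : charFreqs) {
    if (freq > 0) uniqueBytes++;
}
header.writeByte(uniqueBytes);              // <n> as a single byte
for (int value = 0; value < charFreqs.length; value++) {
    if (charFreqs[value] > 0) {
        header.writeByte(value);            // <value i> (1 byte)
        header.writeInt(charFreqs[value]);  // <frequency> (4 bytes)
    }
}
// ... followed by [the compressed bytes]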
I have tested my code a bit and the bytes seem to get dropped somewhere in the while() loop at the bottom of my decompress method (charFreqs[] and the tree seem to retain all the original byte values).
EDIT: Upon request I have now shortened my post a bit in an attempt to make it less cluttered and more "straight to the point".
EDIT 2: I fixed it (but not fully)! The fault was in my BitFileWriter and not in my decompress method. My decompression still does not function properly, though. Whenever I try to decompress something that isn't a plain text file (for example a .jpg), it returns an empty "decompressed" file (0 bytes in size). I have no idea what is causing this...

Out of memory: Java heap space

I am trying to send chunks of files from a server to more than one client. When I try to send a file of 700 MB, it shows an "OutOfMemoryError: Java heap space" error. I am using NetBeans 7.1.2.
I also tried the VM options in the project properties, but the same error still happens. I think there is some problem with reading the entire file. The code below works for files up to 300 MB. Please give me some suggestions.
Thanks in advance.
public class SplitFile {
    static int fileid = 0;

    public static DataUnit[] getUpdatableDataCode(File fileName) throws FileNotFoundException, IOException {
        int i = 0;
        DataUnit[] chunks = new DataUnit[UAProtocolServer.singletonServer.cloudhosts.length];
        FileInputStream fis;
        long Chunk_Size = (fileName.length()) / chunks.length;
        int cursor = 0;
        long fileSize = (long) fileName.length();
        int nChunks = 0, read = 0;
        long readLength = Chunk_Size;
        byte[] byteChunk;
        try {
            fis = new FileInputStream(fileName);
            //StupidTest.size = (int)fileName.length();
            while (fileSize > 0) {
                System.out.println("loop" + i);
                if (fileSize <= Chunk_Size) {
                    readLength = (int) fileSize;
                }
                byteChunk = new byte[(int) readLength];
                read = fis.read(byteChunk, 0, (int) readLength);
                fileSize -= read;
                // cursor += read;
                assert(read == byteChunk.length);
                long aid = fileid;
                aid = aid << 32 | nChunks;
                chunks[i] = new DataUnit(byteChunk, aid);
                // Lister.add(chunks[i]);
                nChunks++;
                ++i;
            }
            fis.close();
            fis = null;
        } catch (Exception e) {
            System.out.println("File splitting exception");
            e.printStackTrace();
        }
        return chunks;
    }
Reading in the whole file will definitely trigger an OutOfMemoryError as the file size grows. Tuning -Xmx1024M may be fine as a temporary fix, but it's definitely not the right/scalable solution. Also, it doesn't matter how you move your variables around (like creating the buffer outside of the loop instead of inside it), you will get an OutOfMemoryError sooner or later. The only way to avoid an OutOfMemoryError here is not to read the complete file into memory.
If you have to work purely in memory, then one approach is to send the chunks off to the client as you go, so you don't have to keep all the chunks in memory:
instead of:
chunks[i] = new DataUnit(byteChunk,aid);
do:
sendChunkToClient(new DataUnit(byteChunk, aid));
But the above solution (sketched more fully at the end of this answer) has the drawback that if some error happens in between sending chunks, you may have a hard time trying to resume/recover from the error point.
Saving the chunks to temporary files like Ross Drew suggested is probably better and more reliable.
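A minimal sketch of that streaming variant, reusing the variables from the question's method; sendChunkToClient is hypothetical and stands for however the chunks actually get transmitted:
long fileSize = fileName.length();
long chunkSize = fileSize / UAProtocolServer.singletonServer.cloudhosts.length;
int nChunks = 0;
FileInputStream fis = new FileInputStream(fileName);
try {
    while (fileSize > 0) {
        int readLength = (int) Math.min(chunkSize, fileSize);
        byte[] byteChunk = new byte[readLength];
        int read = fis.read(byteChunk, 0, readLength);
        if (read == -1) break;
        fileSize -= read;
        long aid = ((long) fileid << 32) | nChunks;
        // hand each chunk off immediately instead of collecting it in chunks[]
        sendChunkToClient(new DataUnit(byteChunk, aid)); // sendChunkToClient is hypothetical
        nChunks++;
    }
} finally {
    fis.close();
}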
How about creating the
byteChunk = new byte[(int)readLength];
outside of the loop and just reusing it, instead of creating a new byte array over and over, if it's always the same size?
Alternatively
You could write the incoming data to a temporary file as it comes in, instead of maintaining that huge array, and then process it once it has all arrived (a sketch of this is at the end of this answer).
Also
If you are using it multiple times as an int, you should probably just cast readLength to an int outside the loop as well:
int len = (int)readLength;
And Chunk_Size is a variable, right? It should begin with a lower-case letter.
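And a minimal sketch of the temporary-file alternative on the receiving side, assuming the data arrives on some InputStream in; the buffer size and file names are illustrative:
// stream incoming data straight to a temp file instead of holding it all in one array
File temp = File.createTempFile("incoming", ".chunk");
OutputStream out = new BufferedOutputStream(new FileOutputStream(temp));
try {
    byte[] buffer = new byte[8192];   // one small, reusable buffer
    int read;
    while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
    }
} finally {
    out.close();
}
// process 'temp' once everything has arrived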

How can I read a specific number of bytes from a FileInputStream object using buffers

I have a series of objects stored within a file concatenated as below:
sizeOfFile1 || file1 || sizeOfFile2 || file2 ...
The sizes of the files are serialized long objects and the files are just the raw bytes of the files.
I am trying to extract the files from the input file. Below is my code:
FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
    long size = (long) objectInputStream.readObject();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    int chunkSize = 256;
    final byte[] temp = new byte[chunkSize];
    int finalChunkSize = (int) (size % chunkSize);
    final byte[] finalTemp = new byte[finalChunkSize];
    while (fileInputStream.available() > 0 && size > 0)
    {
        if (fileInputStream.available() > finalChunkSize)
        {
            int i = fileInputStream.read(temp);
            bufferedOutputStream.write(temp, 0, i);
            size = size - i;
        }
        else
        {
            int i = fileInputStream.read(finalTemp);
            bufferedOutputStream.write(finalTemp, 0, i);
            size = 0;
        }
    }
    bufferedOutputStream.close();
}
fileOutputStream.close();
My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.
Can anyone see the issue here?
Regards.
Wrap it in a DataInputStream and use readFully(byte[]).
But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.
NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
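A minimal sketch of that readFully() suggestion, assuming each size is written with DataOutputStream.writeLong() rather than with object serialization; unlike the streaming loop in the answer further down, this holds each embedded file in memory at once:
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream("C:\\Test.tst")));
try {
    while (true) {
        long size;
        try {
            size = in.readLong();          // length prefix for the next file
        } catch (EOFException e) {
            break;                         // clean end of the container
        }
        byte[] content = new byte[(int) size];
        in.readFully(content);             // blocks until exactly 'size' bytes have been read
        FileOutputStream out = new FileOutputStream("C:\\" + size + ".tst");
        try {
            out.write(content);
        } finally {
            out.close();
        }
    }
} finally {
    in.close();
}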
You could try NIO instead...
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
This reads only SIZE bytes from the file.
This is using DataInput to read the longs. In this particular case I am not using readFully() as a segment might be too long to keep in memory:
DataInputStream in = new DataInputStream(new FileInputStream(file));
byte[] buf = new byte[64 * 1024];
while (true) {
    OutputStream out = ...;
    long size;
    try {
        size = in.readLong();
    } catch (EOFException e) {
        break;
    }
    while (size > 0) {
        int len = (size > buf.length) ? buf.length : (int) size;
        len = in.read(buf, 0, len);
        out.write(buf, 0, len);
        size -= len;
    }
    out.close();
}
Save yourself a lot of trouble by doing one of these things:
Switch to using Avro; trust me, you would be crazy not to. It's easy to learn, and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema your old files are garbage.
or use Thrift
or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn, and takes a lot of configuration)
If you really refuse to switch to Avro, I recommend reading up on Apache Commons IO's IOUtils class. It has a method to copy from one input stream to another, saving you a lot of headaches. Unfortunately what you want to do is a little more complicated: you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.
There are also GZIPOutputStream and ZipOutputStream (both in java.util.zip in the standard library) if you need compression.
I'm not going to write an example because I honestly think you should just learn avro or thrift and use that.
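If you do stay with the size-prefixed layout, a minimal sketch using Commons IO's IOUtils.copyLarge to copy exactly one embedded file at a time, again assuming the sizes are written with writeLong(); the output paths are illustrative:
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream("C:\\Test.tst")));
int index = 0;
try {
    while (true) {
        long size;
        try {
            size = in.readLong();                    // length prefix
        } catch (EOFException e) {
            break;
        }
        OutputStream out = new BufferedOutputStream(new FileOutputStream("C:\\part" + index++ + ".tst"));
        try {
            // copies exactly 'size' bytes from 'in' to 'out' using its own internal buffer
            org.apache.commons.io.IOUtils.copyLarge(in, out, 0, size);
        } finally {
            out.close();
        }
    }
} finally {
    in.close();
}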

Parsing PDF that has been downloaded from internet

I have searched questions about this topic on Stack Overflow. They really helped me, but I am stuck again.
My problem is that I need to write a method that downloads a PDF from a site like www.example.com/abc.pdf and then reads the output. I don't want to save the file, just print it to System.out; I don't need to put the bytes into a FileOutputStream. I tried casting the bytes to char to get characters (it may be the dumbest solution), but I got unknown characters. Any ideas, or have I understood this the wrong way?
Here is the code and its output:
String textlink = "http://www.selab.isti.cnr.it/ws-mate/example.pdf"; // it comes from main class

public String HtmlTest(String textLink) throws IOException {
    StringBuilder sd = new StringBuilder();
    URL link = new URL(textLink);
    URLConnection urlConn = link.openConnection();
    BufferedInputStream in = null;
    try
    {
        in = new BufferedInputStream(urlConn.getInputStream());
        byte data[] = new byte[1024];
        in.read(data, 0, 1024);
        for (int j = 0; j < data.length; j++) {
            if (j % 100 == 0) {
                sd.append((char) data[j] + "\n"); // i used this for making readable text
            }
            else {
                sd.append((char) data[j]);
            }
        }
    }
    finally
    {
        if (in != null)
            in.close();
    }
    return sd.toString();
}
Output
run:
%
PDF-1.3
%ᅦ↓マᄁ
7 0 obj
<</Length 8 0 R/Filter /FlateDecode>>
stream
xワᆳY[モᅴᄊ○ᄈ&?BoNf,,q%¢ᄐ4￞x&゙6ᄅロlᅮ
ラᄐ￐폐Zeム￲f→チ
You're not going to get very far trying to read a .pdf file as though it were basically a text file. For starters, the "text" is in a compressed binary format; there are other issues you'll probably also have to deal with.
STRONG SUGGESTION:
Use a Java .pdf library like Apache PDFBox
IMHO.
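A minimal sketch of downloading the PDF and printing its text with PDFBox, assuming PDFBox 2.x is on the classpath; the URL is the one from the question and nothing is written to disk:
import java.io.InputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfTextDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.selab.isti.cnr.it/ws-mate/example.pdf");
        try (InputStream in = url.openStream();
             PDDocument document = PDDocument.load(in)) {       // parse the downloaded bytes
            String text = new PDFTextStripper().getText(document);
            System.out.println(text);                           // readable text, not raw PDF bytes
        }
    }
}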

Load text file to memory in Java

I have a wiki.txt file and its size is 50 MB.
I need to do several things with the file, so I thought that the best way in terms of performance is to load the file into memory. Is that correct?
This is the code that I have written:
File file = new File("wiki.txt");
FileInputStream fileInputStream = new FileInputStream(file);
FileChannel fileChannel = fileInputStream.getChannel();
MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
System.out.println((char)mapByteBuffer.get());
I get an error on this call: mapByteBuffer.get().
I tried the get() function with a few variations, but all of them throw an error, and I don't even get a message from e.getMessage(), just null.
Another important thing to note: my text file contains English words, and what I need to do is search whether an expression exists in this text file.
Thank you.
I would suggest using a memory-mapped file, to read the file directly from the disk instead of loading it into memory.
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// the channel was opened read-only ("r"), so the mapping must be READ_ONLY as well;
// 1024*50 maps only the first 50 KB - use channel.size() to map the whole file
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, 1024 * 50);
And then you can read the buffer as usual.
My answers for point (1):
It depends on what you want to do with the file. If your processing doesn't involve any rewinding (looking back at what was read before), it's best to just read it as a stream and process it in one go (instead of loading it all into memory).
Even if you need random access across the file, you may also be interested in doing block file operations, because your solution may not scale well when the file grows.
Use RandomAccessFile if you are on Java 1.4 or above.
For random access, the operating system usually handles the file buffer caching quite well, so you don't have to handle it yourself.
It is important to read the whole error, not just the message. Often the real information is in the exception's name not the text associated with it.
You will get an error if the file is empty as there is no first byte.
Note: the approach you are using assumes ASCII 7-bit characters. If you want to assume ISO-8859-1 characters you can use (char) (byteBuffer.get() & 0xFF)
However, if you have plain text you may find that using Strings is simpler and not much slower, e.g. you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if that were far too long.
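A minimal sketch of the plain-String approach, assuming Java 7+ and a single-byte charset (ISO-8859-1, matching the assumption above); the search expression is illustrative:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// read the whole 50 MB file into one String, then search it directly
String text = new String(Files.readAllBytes(Paths.get("wiki.txt")), StandardCharsets.ISO_8859_1);
boolean found = text.contains("some expression");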
I would suggest using a BufferedReader. It is much faster and requires relatively few resources.
First count the number of lines:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
byte[] chars = new byte[1024];
int numberOfChars = 0;
int count = 0;
while ((numberOfChars = is.read(chars)) != -1)
{
    for (int i = 0; i < numberOfChars; ++i)
    {
        if (chars[i] == '\n' && numberOfChars - i != 1)
        {
            ++count;
        }
    }
}
count++;
is.close();
return count; // number of lines
Then read the lines:
BufferedReader in = new BufferedReader(new FileReader(fileName));
for (int i = 0; i < endLine; i++)
{
    String oneLine = in.readLine();
}
In these strings you can even search for whatever you need.
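For example, a minimal sketch that searches while reading line by line; the expression to look for is illustrative:
BufferedReader in = new BufferedReader(new FileReader("wiki.txt"));
String oneLine;
boolean found = false;
while ((oneLine = in.readLine()) != null)
{
    if (oneLine.contains("some expression"))
    {
        found = true;
        break;
    }
}
in.close();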
