I need to hash an InputStream during upload to verify the integrity of the file. How can I split the InputStream into two copies when the file is larger than 10 GB? Both the hashing and the copying need to be fast.
You have multiple options.
Open the file with Files.newInputStream, hand that stream to your hashing algorithm, obtain the hash, then open the file again to send it. This is the best option if it is highly useful to have the hash available before or during the upload. It requires reading the bytes off disk twice, of course.
Use an existing implementation of a stream that hashes on the fly, such as Guava's HashingInputStream, or write something like this on your own (it's not particularly difficult to do so).
You can't easily have 2 inputstreams that can both be fully streamed through whilst only causing the file to be read once, because the 'user' of an inputstream decides how 'fast' you go through, and you can't have 2 separate lines of code both be in charge.
Hence, one of the two processes needs to not be an inputstream and instead have its control reversed: Instead of allowing the code to ask the inputstream for more data (by calling one of its read methods), you'd have some code that is invoked by the inputstream with: Hey, I just read this data because the 'primary' driver asked for it, before I hand it off to the primary driver, anything you need to do here?
The hashing code should be this secondary driver, because it can trivially deal with 'here are X bytes but we are not done yet please process it'.
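Here is an example of what that would look like. Note that FilterInputStream by default forwards calls to the stream you wrap, and its read(byte[]) simply calls read(byte[], int, int). Overriding the single-byte and the three-argument read methods is therefore enough; overriding read(byte[]) as well would hash the same bytes twice.
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashingInputStream extends FilterInputStream {
    private final MessageDigest hash;

    public HashingInputStream(InputStream base) {
        super(base);
        try {
            hash = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // getInstance throws a checked exception; every Java platform must support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    @Override
    public int read() throws IOException {
        int v = super.read();
        if (v == -1) return v;
        hash.update((byte) v);
        return v;
    }

    // read(byte[]) is not overridden: FilterInputStream routes it through this method.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int r = super.read(b, off, len);
        if (r == -1) return r;
        hash.update(b, off, r);
        return r;
    }

    public byte[] digest() {
        return hash.digest();
    }
}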
Your upload code then wraps the input stream, e.g.:
try (var in = Files.newInputStream(pathToYourFile)) {
var hashing = new HashingInputStream(in);
hashing.transferTo(yourOutputStream);
var hash = hashing.digest();
}
And you'll get your hash at the end; the file is only read once.
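For comparison, the JDK ships its own stream that hashes on the fly, java.security.DigestInputStream. The following is a minimal sketch of the same single-pass idea using it, assuming SHA-256 and Java 9+ (for transferTo). The names HashWhileCopying and copyAndHash are made up for this illustration, and the in-memory streams in main stand in for the real file and upload streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashWhileCopying {
    /** Copies everything from in to out and returns the SHA-256 of the copied bytes. */
    static byte[] copyAndHash(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        // DigestInputStream feeds every byte that is read through it into the digest.
        try (DigestInputStream hashing = new DigestInputStream(in, sha256)) {
            hashing.transferTo(out); // single pass over the data
        }
        return sha256.digest();
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the real file and upload streams (assumption for the demo).
        byte[] data = "example payload".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] hash = copyAndHash(new ByteArrayInputStream(data), sink);
        System.out.println(hash.length + "-byte hash, " + sink.size() + " bytes copied");
    }
}
```

As with the hand-written class, the hash only covers bytes that were actually read, so it is complete only once the stream has been consumed to the end.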
I am not very familiar with all of the implications of bytes, let alone charsets, simply because I have not used them often. However, I am working on a project in which I need to convert every Java primitive type (and Strings) to and from bytes. I want them all to use the UTF-8 charset, but I'm not sure whether I am converting them properly.
Although I am pretty sure that all of the number to/from byte conversions are correct, I need to be 100% sure. If someone has solid experience with bytes, numbers, and charsets, could you look over the class below and point out any issues?
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
public class ByteUtil
{
//TO BYTES FROM PRIMITIVES & STRINGS
public static byte[] getBytes(short i)
{
return ByteBuffer.allocate(2).putShort(i).array();
}
public static byte[] getBytes(int i)
{
return ByteBuffer.allocate(4).putInt(i).array();
}
public static byte[] getBytes(long i)
{
return ByteBuffer.allocate(8).putLong(i).array();
}
public static byte getBytes(boolean i)
{
return (byte) (i ? 1 : 0);
}
public static byte[] getBytes(char i)
{
return getBytes(String.valueOf(i).trim());
}
public static byte[] getBytes(String i)
{
return i.getBytes(StandardCharsets.UTF_8);
}
public static byte[] getBytes(float i)
{
return getBytes(Float.floatToIntBits(i));
}
public static byte[] getBytes(double i)
{
return getBytes(Double.doubleToLongBits(i));
}
//TO PRIMITIVES & STRINGS FROM BYTES
public static short getShort(byte[] b)
{
ByteBuffer wrapped = ByteBuffer.wrap(b);
return wrapped.getShort();
}
public static int getInt(byte[] b)
{
ByteBuffer wrapped = ByteBuffer.wrap(b);
return wrapped.getInt();
}
public static long getLong(byte[] b)
{
ByteBuffer wrapped = ByteBuffer.wrap(b);
return wrapped.getLong();
}
public static boolean getBoolean(byte b)
{
return(b == 1 ? true : false);
}
public static char getChar(byte[] b)
{
return getString(b).trim().toCharArray()[0];
}
public static String getString(byte[] b)
{
return new String(b, StandardCharsets.UTF_8);
}
public static float getFloat(byte[] b)
{
return Float.intBitsToFloat(getInt(b));
}
public static double getDouble(byte[] b)
{
return Double.longBitsToDouble(getLong(b));
}
}
Additionally, all the data put in and returned is only read internally by my own code. For example, the boolean conversion may or may not be the correct way to do this, but in the boolean case it won't matter, since I know what I am checking for.
You don't even need to do this. You can use a DataOutputStream to write your primitive types and Strings to a ByteArrayOutputStream. You can then use toByteArray() to get a byte[] that you put into a ByteArrayInputStream. You can wrap that InputStream in a DataInputStream to get back your primitives.
If you're doing a school assignment where you need to implement this yourself (which sounds like a dumb assignment), you can look up the implementations of ByteArrayOutputStream and ByteArrayInputStream on GrepCode. Copy/pasting is a bad idea, but it might give you some hints about considerations to take into account.
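The DataOutputStream / DataInputStream round trip described above can be sketched roughly as follows. The class name DataStreamRoundTrip and the sample values are made up for illustration. Note that writeUTF uses a length-prefixed "modified UTF-8" encoding, which is not identical to String.getBytes(StandardCharsets.UTF_8).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class DataStreamRoundTrip {
    /** Serializes a few sample values into one byte array (big-endian). */
    static byte[] encode(short s, int i, double d, boolean b, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(s);   // 2 bytes
            out.writeInt(i);     // 4 bytes
            out.writeDouble(d);  // 8 bytes
            out.writeBoolean(b); // 1 byte
            out.writeUTF(text);  // 2-byte length prefix + modified UTF-8
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encode((short) -7, 42, 3.5, true, "hello");
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            // Values must be read back in exactly the order they were written.
            System.out.println(in.readShort());
            System.out.println(in.readInt());
            System.out.println(in.readDouble());
            System.out.println(in.readBoolean());
            System.out.println(in.readUTF());
        }
    }
}
```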
I have an Integer[64] of numbers 0 - 6 which say what type of chess piece is there. I have a Boolean[64] of what color each place is. I need to be able to save them as (Strings?) and save them for later use, but I need a fast and efficient way. As of now I am looping through both arrays and creating a 64char String, but I make a few million of them because my chess AI looks deep into the game. Thoughts?
First of all you should redefine your data structure.
Instead of two arrays of Integers and Booleans, you can define one array:
byte[] field = new byte[64];
Then add two methods that retrieve the information about the type and the color:
public int getType(int fieldNo) {
    // the lowest three bits hold the piece type (int 0-6)
    return field[fieldNo] & 0x07;
}
public boolean getColor(int fieldNo) {
    // the fourth bit holds the color
    return (field[fieldNo] & 0x08) > 0;
}
You can now save the complete chess field just by writing/reading the fields array:
public byte[] readField(String file) throws IOException {
    byte[] field = new byte[64];
    try (DataInputStream stream = new DataInputStream(new FileInputStream(file))) {
        stream.readFully(field, 0, 64);
    }
    return field;
}
public void writeField(String file, byte[] field) throws IOException {
    try (DataOutputStream stream = new DataOutputStream(new FileOutputStream(file))) {
        stream.write(field, 0, 64);
    }
}
This saves a complete field in 64 bytes.
More improvements:
Compress the 64-byte fields when saving more than one field to one file. Compression should be effective because most of your bytes have the value 0.
Instead of using byte[64] you can use byte[32] only and map the information to the first / last 4 bits of one byte.
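The byte[32] idea can be sketched like this: each byte holds two squares, one per 4-bit half, using the same bit layout as above (three bits for the type, one for the color). The class name PackedBoard and its method names are hypothetical, and which color the set bit stands for is left to the caller.

```java
final class PackedBoard {
    // Two squares per byte: even squares in the low 4 bits, odd squares in the high 4 bits.
    private final byte[] cells = new byte[32];

    /** Stores a piece type (0-6) and a color bit for square 0-63. */
    void set(int square, int type, boolean colorBit) {
        int nibble = (type & 0x07) | (colorBit ? 0x08 : 0);
        int index = square >> 1; // two squares share one byte
        if ((square & 1) == 0) {
            cells[index] = (byte) ((cells[index] & 0xF0) | nibble);
        } else {
            cells[index] = (byte) ((cells[index] & 0x0F) | (nibble << 4));
        }
    }

    private int nibble(int square) {
        int b = cells[square >> 1] & 0xFF; // treat the byte as unsigned
        return (square & 1) == 0 ? (b & 0x0F) : (b >>> 4);
    }

    int getType(int square) {
        return nibble(square) & 0x07;
    }

    boolean getColor(int square) {
        return (nibble(square) & 0x08) != 0;
    }

    /** Returns a copy of the 32-byte representation, ready to be written to a stream. */
    byte[] toBytes() {
        return cells.clone();
    }

    public static void main(String[] args) {
        PackedBoard board = new PackedBoard();
        board.set(4, 6, true);
        System.out.println(board.getType(4) + " " + board.getColor(4));
    }
}
```

The extra shifting and masking costs a little time on every access, so whether halving the size is worth it for a search that creates millions of positions is something to measure rather than assume.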
This was the question asked:
Develop a class Decrypt that derives from FileInputStream and overrides the read() method of FileInputStream such that overriding read method returns a decrypted integer. Use this class to decrypt the file information that is contained out.txt.
I wrote code for the encryption and it works, but the decryption doesn't. For decryption, I have to XOR each value with 128.
The problem is that after running the program, it doesn't write anything on the output file.
Here's the link for the sample input:
https://www.dropbox.com/s/jb361cxmjc9yd8n/in.txt
This is what the sample output looks like:
How high he holds his haughty head.
The code is below:
//Decrypt class
import java.io.*;
public class Decrypt extends FileInputStream {
public Decrypt(String name) throws FileNotFoundException {
super(name);
}
public int read() throws IOException {
int value = super.read();
int xor = value^128; // exclusive OR to decrypt
return xor;
}
}
//main
import java.io.*;
public class LA4ex3b {
public static void main(String[] args) throws IOException {
Decrypt de=null;
FileOutputStream output=null;
try
{
de = new Decrypt("C:/Users/user/workspace/LA4ex3a/in.txt");
output = new FileOutputStream("C:/Users/user/workspace/LA4ex3a /out.txt");
int a;
while ((a = de.read()) != -1)
{
output.write(a);
}
}
finally
{
if (de!=null)
de.close();
if (output!=null)
output.close();
}
}
}
int value = super.read();
int xor = value^128; // exclusive OR to decrypt
return xor;
In the above you do not check for the special value of -1 returned from super.read(), which you must pass through unchanged. Without that you'll never receive -1 in your while loop and the program will not terminate normally. The code below should fix that issue:
int value = super.read();
return value == -1? value : value^128;
Well, I think your overridden method should also check whether super.read() returned -1. If super.read() returns -1 and you XOR it with 128, the result is no longer -1, so Decrypt.read() will never return -1.
Edit: Ok, I wasn't fast enough!
Two quick corrections:
The output filename should not have a space in it.
You listed no restrictions on the data in your input file, so I suggest you use the "read data into a byte array" method, just in case a "-1" byte value is legitimate data in the input. Your particular text input is probably OK, but think about problems like these and solve the most inclusive problem you can while keeping the solution simple.
byte[] dataBuffer = new byte[1000];
int bytesRead;
bytesRead = de.read(dataBuffer);
while (bytesRead != -1) {
// decrypt each byte
// write the decrypted bytes
bytesRead = de.read(dataBuffer);
}
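Filled in, the buffered loop above might look like the sketch below. It works on any InputStream/OutputStream pair, and the XOR is applied to the buffer explicitly, because the array-based read of a FileInputStream subclass does not go through an overridden read(). The class name XorCopy and the method name decrypt are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class XorCopy {
    /** Reads in to the end, XORs every byte with 128, and writes the result to out. */
    static void decrypt(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1000];
        int bytesRead;
        // Here -1 is the count returned by read(byte[]), so it can only mean end of stream.
        while ((bytesRead = in.read(buffer)) != -1) {
            for (int i = 0; i < bytesRead; i++) {
                buffer[i] = (byte) (buffer[i] ^ 0x80); // flip the top bit
            }
            out.write(buffer, 0, bytesRead);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "How high he holds his haughty head.".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream encrypted = new ByteArrayOutputStream();
        decrypt(new ByteArrayInputStream(plain), encrypted); // XOR is its own inverse
        ByteArrayOutputStream decrypted = new ByteArrayOutputStream();
        decrypt(new ByteArrayInputStream(encrypted.toByteArray()), decrypted);
        System.out.println(new String(decrypted.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```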
class Overload
{
public int Add(int a, int b)
{
Console.WriteLine("Int method with Two params executed");
return a + b;
}
public int Add(int a, int b, int c)
{
Console.WriteLine("Int method with three params executed");
return a + b + c;
}
public double Add(double a, double b)
{
Console.WriteLine("double method with Two params executed");
return a + b;
}
}
//class Derived : Overload //over riding//
//{
// public int Add(int a, int b)
// {
// return a + b;
// }
//}
}
I need to store value pairs (a word and a number) in a Map.
I am trying to use TObjectIntHashMap from the Trove library with char[] as the key, because I need to minimize memory usage. But with this approach, I cannot get the value back when I call get().
I guess I cannot use a primitive char array as a Map key because of hashCode issues.
I tried to use TCharArrayList, but that also takes a lot of memory.
I read another Stack Overflow question with a similar purpose, which suggested using TLongIntHashMap and storing each String word encoded as a long. In my case the words may contain Latin characters or various other characters that appear in Wikipedia collections, and I do not know whether a long is enough to encode them.
I have tried using a trie data structure to store them, but I also need to consider performance, and choose the best option for both memory usage and performance.
Do you have any idea or suggestion for this issue?
It sounds like the most compact way to store the data is to use a byte[] encoded in UTF-8 or similar. You can wrap this in your own class or write your own HashMap which allows byte[] as a key.
I would reconsider how much time it is worth spending to save some memory. If you are talking about a PC or server, at minimum wage you need to save 1 GB for an hour's work, so if you are only looking to save 100 MB, that's worth about 6 minutes, including testing.
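The "wrap the byte[] in your own class" suggestion could look roughly like the sketch below. Arrays inherit identity-based equals() and hashCode() from Object, which is why a lookup with a freshly built char[] or byte[] key finds nothing; the wrapper compares by content instead. The class name Utf8Key is made up, and java.util.HashMap is used here only to keep the example self-contained. Whether the wrapper actually saves memory compared with plain String keys depends on the JVM and should be measured.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class Utf8Key {
    private final byte[] bytes; // UTF-8 encoded word
    private final int hash;     // cached, since keys are immutable

    Utf8Key(String word) {
        this.bytes = word.getBytes(StandardCharsets.UTF_8);
        this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        // Compare by content, not by array identity.
        return o instanceof Utf8Key && Arrays.equals(bytes, ((Utf8Key) o).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<char[], Integer> byArray = new HashMap<>();
        byArray.put("wiki".toCharArray(), 3);
        System.out.println(byArray.get("wiki".toCharArray())); // a different array instance

        Map<Utf8Key, Integer> byKey = new HashMap<>();
        byKey.put(new Utf8Key("wiki"), 3);
        System.out.println(byKey.get(new Utf8Key("wiki")));    // an equal key by content
    }
}
```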
Write your own class that implements CharSequence, and write your own implementation of equals() and hashCode(). The implementation would also pre-allocate large shared char[] storage, and use bits of it at a time. (You can definitely incorporate @Peter Lawrey's excellent suggestion into this, too, and use byte[] storage.)
There's also an opportunity to do a 'soft intern()' using an LRU cache. I've noted where the cache would go.
Here's a simple demonstration of what I mean. Note that if you need heavily concurrent writes, you can try to improve the locking scheme below...
public final class CompactString implements CharSequence {
private final char[] _data;
private final int _offset;
private final int _length;
private final int _hashCode;
private static final Object _lock = new Object();
private static char[] _storage;
private static int _nextIndex;
private static final int LENGTH_THRESHOLD = 128;
private CompactString(char[] data, int offset, int length, int hashCode) {
_data = data; _offset = offset; _length = length; _hashCode = hashCode;
}
private static final CompactString EMPTY = new CompactString(new char[0], 0, 0, "".hashCode());
private static void allocateStorage() {
synchronized (_lock) {
_storage = new char[1024];
_nextIndex = 0;
}
}
private static CompactString storeInShared(String value) {
synchronized (_lock) {
if (_nextIndex + value.length() > _storage.length) {
allocateStorage();
}
int start = _nextIndex;
// You would need to change this loop and length to do UTF encoding.
for (int i = 0; i < value.length(); ++i) {
_storage[_nextIndex++] = value.charAt(i);
}
return new CompactString(_storage, start, value.length(), value.hashCode());
}
}
static {
allocateStorage();
}
public static CompactString valueOf(String value) {
// You can implement a soft .intern-like solution here.
if (value == null) {
return null;
} else if (value.length() == 0) {
return EMPTY;
} else if (value.length() > LENGTH_THRESHOLD) {
// You would need to change .toCharArray() and length to do UTF encoding.
return new CompactString(value.toCharArray(), 0, value.length(), value.hashCode());
} else {
return storeInShared(value);
}
}
// left to reader: implement equals etc.
}
java.nio.ByteBuffer#duplicate() returns a new byte buffer that shares the old buffer's content. Changes to the old buffer's content will be visible in the new buffer, and vice versa. What if I want a deep copy of the byte buffer?
I think the deep copy need not involve byte[]. Try the following:
public static ByteBuffer clone(ByteBuffer original) {
ByteBuffer clone = ByteBuffer.allocate(original.capacity());
original.rewind();//copy from the beginning
clone.put(original);
original.rewind();
clone.flip();
return clone;
}
As this question still comes up as one of the first hits to copying a ByteBuffer, I will offer my solution. This solution does not touch the original buffer, including any mark set, and will return a deep copy with the same capacity as the original.
public static ByteBuffer cloneByteBuffer(final ByteBuffer original) {
// Create clone with same capacity as original.
final ByteBuffer clone = (original.isDirect()) ?
ByteBuffer.allocateDirect(original.capacity()) :
ByteBuffer.allocate(original.capacity());
// Create a read-only copy of the original.
// This allows reading from the original without modifying it.
final ByteBuffer readOnlyCopy = original.asReadOnlyBuffer();
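// Reset the copy's position and limit to cover the whole buffer, then read from it.
// (flip() would only cover the bytes before the current position.)
readOnlyCopy.clear();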
clone.put(readOnlyCopy);
return clone;
}
If one cares for the position, limit, or order to be set the same as the original, then that's an easy addition to the above:
clone.position(original.position());
clone.limit(original.limit());
clone.order(original.order());
return clone;
Based off of mingfai's solution:
This will give you an almost true deep copy. The only thing lost will be the mark. If orig is a HeapBuffer and the offset is not zero or the capacity is less than the backing array, then the outlying data is not copied.
public static ByteBuffer deepCopy( ByteBuffer orig )
{
int pos = orig.position(), lim = orig.limit();
try
{
orig.position(0).limit(orig.capacity()); // set range to entire buffer
ByteBuffer toReturn = deepCopyVisible(orig); // deep copy range
toReturn.position(pos).limit(lim); // set range to original
return toReturn;
}
finally // do in finally in case something goes wrong we don't bork the orig
{
orig.position(pos).limit(lim); // restore original
}
}
public static ByteBuffer deepCopyVisible( ByteBuffer orig )
{
int pos = orig.position();
try
{
ByteBuffer toReturn;
// try to maintain implementation to keep performance
if( orig.isDirect() )
toReturn = ByteBuffer.allocateDirect(orig.remaining());
else
toReturn = ByteBuffer.allocate(orig.remaining());
toReturn.put(orig);
toReturn.order(orig.order());
return (ByteBuffer) toReturn.position(0);
}
finally
{
orig.position(pos);
}
}
One more simple solution
public ByteBuffer deepCopy(ByteBuffer source, ByteBuffer target) {
int sourceP = source.position();
int sourceL = source.limit();
if (null == target) {
target = ByteBuffer.allocate(source.remaining());
}
target.put(source);
target.flip();
source.position(sourceP);
source.limit(sourceL);
return target;
}
You'll need to iterate the entire buffer and copy by value into the new buffer.
I believe this should supply a full deep copy, including the mark, "out-of-bounds" data, etc...just in case you need the most complete sandbox-safe carbon copy of a ByteBuffer.
The only thing it doesn't copy is the read-only trait, which you can easily get by just calling this method and tagging on a ".asReadOnlyBuffer()"
public static ByteBuffer cloneByteBuffer(ByteBuffer original)
{
//Get position, limit, and mark
int pos = original.position();
int limit = original.limit();
int mark = -1;
try
{
original.reset();
mark = original.position();
}
catch (InvalidMarkException e)
{
//This happens when the original's mark is -1, so leave mark at default value of -1
}
//Create clone with matching capacity and byte order
ByteBuffer clone = (original.isDirect()) ? ByteBuffer.allocateDirect(original.capacity()) : ByteBuffer.allocate(original.capacity());
clone.order(original.order());
//Copy FULL buffer contents, including the "out-of-bounds" part
original.limit(original.capacity());
original.position(0);
clone.put(original);
//Set mark of both buffers to what it was originally
if (mark != -1)
{
original.position(mark);
original.mark();
clone.position(mark);
clone.mark();
}
//Set position and limit of both buffers to what they were originally
original.position(pos);
original.limit(limit);
clone.position(pos);
clone.limit(limit);
return clone;
}
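A variation on the approaches above, as a rough sketch: reading through duplicate() avoids touching the original's position, limit, or mark at all, because the duplicate shares the content but keeps its own indices. The class name ByteBuffers is made up for this illustration, and unlike the version above the mark is not carried over to the copy.

```java
import java.nio.ByteBuffer;

final class ByteBuffers {
    /** Copies the full content (0..capacity) plus position, limit, and byte order. */
    static ByteBuffer deepCopy(ByteBuffer original) {
        // The duplicate shares the bytes but has independent position/limit/mark.
        ByteBuffer view = original.duplicate();
        view.clear(); // position = 0, limit = capacity; affects only the view

        ByteBuffer copy = original.isDirect()
                ? ByteBuffer.allocateDirect(original.capacity())
                : ByteBuffer.allocate(original.capacity());
        copy.put(view); // bulk copy of the whole buffer
        copy.order(original.order());
        copy.limit(original.limit());
        copy.position(original.position());
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.allocate(8);
        original.put((byte) 1).put((byte) 2).put((byte) 3);
        original.flip();
        ByteBuffer copy = deepCopy(original);
        original.put(0, (byte) 9); // does not affect the copy
        System.out.println(copy.get(0) + " " + copy.position() + " " + copy.limit());
    }
}
```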