Find file with certain extension and calculate its hash in Java

I want to calculate the MD5 hash of a file that ends with a certain extension in Java. I have two classes for this:
FileSearch.java
public class FileSearch
{
public static File findfile(File file) throws IOException
{
String drive = (new DetectDrive()).USBDetect();
Path start = FileSystems.getDefault().getPath(drive);
Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
{
if (file.toString().endsWith(".raw"))
{
System.out.println(file);
}
return FileVisitResult.CONTINUE;
}
});
return file;
}
public static void main(String[] args) throws IOException
{
Hash hasher = new Hash();
try
{
if (file.toString().endsWith("raw"))
{
hasher.hash(file);
}
} catch (IOException e)
{
e.printStackTrace();
}
}
}
Hash.java
public class Hash
{
public void hash(File file) throws Exception
{
MessageDigest md = MessageDigest.getInstance("MD5");
FileInputStream fis = new FileInputStream(file);
byte[] dataBytes = new byte[1024];
int nread = 0;
while ((nread = fis.read(dataBytes)) != -1)
{
md.update(dataBytes, 0, nread);
};
byte[] mdbytes = md.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < mdbytes.length; i++)
{
sb.append(Integer.toString((mdbytes[i] & 0xff) + 0x100, 16).substring(1));
}
System.out.println("Digest(in hex format):: " + sb.toString());
}
}
The first class searches for files ending in .raw, while the second class (not completed yet) is supposed to take the raw file and calculate its hash. However, I do not know how to call the first class from the second to get that raw file. I believe I have to pass something to new FileInputStream(...), but it needs to be the raw file that was found rather than a hard-coded string.
Is this possible, given that both classes contain a main method? Or do I need to remove the main method from FileSearch.java, give it a "public String search()" method instead, and call that from the second class? I would appreciate it if you could show me the right way to do it.

So the logic consists of these steps:
for each file with the .raw extension
    hash the file
You should thus have a method void hash(File file), and call it from your first class.
So, in Hash.java, rename your main method to
public void hash(File file)
And open the file using
FileInputStream fis = new FileInputStream(file);
Then call this hash() method from your first class:
public static void main(String[] args) throws IOException {
    Hash hasher = new Hash();
    ...
    if (file.toString().endsWith(".raw")) {
        hasher.hash(file);
    }
    ...
}
You'll also have to make sure that every FileInputStream you create is properly closed, otherwise you'll quickly run out of file descriptors. The best way to do that is to use the try-with-resources construct: http://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
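Putting the pieces of this answer together, a minimal sketch might look like the following. The class name RawFileHasher and the helpers md5Hex and hashRawFiles are hypothetical, and the start directory is taken from the command line here instead of the asker's DetectDrive class, which is not shown in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name; combines the search and the hashing in one place.
class RawFileHasher {

    // Reads the file in chunks and returns its MD5 digest as lowercase hex.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // try-with-resources closes the stream even if read() throws.
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Walks the tree under start and prints the hash of every .raw file.
    static void hashRawFiles(Path start) throws IOException {
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                if (file.toString().endsWith(".raw")) {
                    try {
                        System.out.println(file + " " + md5Hex(file));
                    } catch (NoSuchAlgorithmException e) {
                        throw new IOException(e);
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory to scan is given on the command line.
        hashRawFiles(Paths.get(args.length > 0 ? args[0] : "."));
    }
}
```

The hashing happens inside visitFile, so only one main method is needed and no file reference has to be passed back out of the walk.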

Related

Issue in executing the api methods in Solibri w.r.t openModel and saveModel not working in the same file

The code below lives in a public class whose methods are meant to be called from another class that implements the Views interface in Solibri. When the code runs, the last two methods are not executed. If anybody knows what the problem is, could you please explain how to resolve it?
All the API methods belong to Solibri, and I have not been able to debug the error myself.
public static void openInSolibri() throws IOException {
String name = "Solibri Building.smc";
FileInputStream fis = new FileInputStream(
"C:\\Users\\Public\\Solibri\\SOLIBRI\\Samples\\models\\Solibri Building.smc");
BufferedInputStream bis = new BufferedInputStream(fis);
SMC.openModel(name, bis);
}
// method to update IFC models
public static void updateModels() throws FileNotFoundException, IOException {
String filename = "C:\\Users\\Public\\Solibri\\SOLIBRI\\Samples\\models\\Solibri Building.smc";
try (InputStream inputStream = new FileInputStream(filename);) {
UUID uuid = SMC.getModel().getUUID();
SMC.getModels().updateIFCModel(uuid, inputStream);
}
}
// method to run checks
public static void runChecks() {
SMC.getChecking().runChecking(false);
}
//method to export bcfxml content
public static void exportBcf() throws FileNotFoundException {
FileOutputStream fos = new FileOutputStream(new File("C:\\Solibri Model\\myFile.bcf"));
SMC.getBcfXml().exportBcfXml(BcfVersion.V2_1, BcfScope.ALL, fos);
}
//method to save the model
public static void saveSMC() {
Path path = Paths.get("C:\\Solibri Model\\Solibri Building.smc");
SMC.saveModel(path);
}

How do I create a report based on String in Java?

Let's assume that I have a Java program that creates a report by having multiple threads write to a file:
public File report = new File("C:\\somewhere\\file");
public FileWriter fileWriter = new FileWriter("C:\\somewhere\\file");
//Some thread executed the following statement
fileWriter.write("creating report for this thread");
Instead of using a file, I want to use some kind of string buffer to create the report so that I can return it in a REST response. What can I use that gives the same result as writing to a File?
Update: I want to completely omit the file implementation as I can't store it in cloud.

You can use the output stream of the HttpServletResponse to send the file as a stream. Don't forget to set appropriate headers. You can write a method that sends the file as the response:
public static void writeFileToOutputStream(HttpServletResponse response, File file) {
    String type = "application/octet-stream";
    response.setContentType(type);
    // Pass the file name as a format argument so that a '%' in the name cannot break String.format.
    response.setHeader("Content-Disposition", String.format("inline; filename=\"%s\"", file.getName()));
    response.setContentLength((int) file.length());
    InputStream inputStream = null;
    try {
        inputStream = new BufferedInputStream(new FileInputStream(file));
        FileCopyUtils.copy(inputStream, response.getOutputStream());
    } catch (IOException e) {
        log.info("------couldn't write file------");
    }
}

When several threads write to the same file, one obvious solution is java.util.logging: write to a log file. The content of a log file can also easily be returned as a REST response.
As for a string buffer: StringBuilder is faster, but not thread-safe. The older StringBuffer is thread-safe for each individual call, but a chain of two appends is not atomic, so another thread can append in between, as in:
sb.append("The size is ").append(size); // Not thread-safe.
You could do:
private final StringBuilder sb = new StringBuilder(4096);
public void printf(String messageFormat, Object... args) {
String s = new MessageFormat(....);
synchronized(sb) {
sb.append(s);
}
}
public String extract() {
String s;
synchronized(sb) {
s = sb.toString();
sb.setLength(0);
}
return s;
}
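A runnable version of this idea could look like the sketch below. The class name ReportBuffer is hypothetical, and MessageFormat.format is an assumption standing in for the formatting call that is elided in the snippet above.

```java
import java.text.MessageFormat;

// Hypothetical class name; a sketch of a thread-safe in-memory report.
class ReportBuffer {
    private final StringBuilder sb = new StringBuilder(4096);

    // Formats outside the lock, then appends the finished line in one step.
    public void printf(String messageFormat, Object... args) {
        // Assumption: MessageFormat.format replaces the elided formatting call.
        String s = MessageFormat.format(messageFormat, args);
        synchronized (sb) {
            sb.append(s).append('\n');
        }
    }

    // Returns everything collected so far and empties the buffer.
    public String extract() {
        synchronized (sb) {
            String s = sb.toString();
            sb.setLength(0);
            return s;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReportBuffer report = new ReportBuffer();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(
                    () -> report.printf("creating report for thread {0}", id));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // The order of the lines depends on thread scheduling.
        System.out.print(report.extract());
    }
}
```

Because each message is fully formatted before the lock is taken, lines from different threads cannot interleave, and the lock is held only for the append itself.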

If you want to stay implementation agnostic then you should design to an interface. I'd suggest just plain old Writer. You could have something like:
public abstract class AbstractReportWriter {
    protected Writer writer;
    public AbstractReportWriter(Writer w) {
        writer = w;
    }
    public void write(String text) throws IOException {
        writer.write(text);
    }
}
public class FileReportWriter extends AbstractReportWriter {
    public FileReportWriter(String path) throws IOException {
        super(new FileWriter(path));
    }
}
public class StringReportWriter extends AbstractReportWriter {
    public StringReportWriter() {
        super(new StringWriter());
    }
    public String getValue() {
        return ((StringWriter) writer).toString();
    }
}
public class CloudReportWriter extends AbstractReportWriter {
    public CloudReportWriter() {
        super(new YourCloudWriterClass());
    }
}
Then you can pick and choose your writer by just swapping the implementation.
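As a rough illustration of the swap, the sketch below writes a report through a plain Writer and collects it in a StringWriter. The class ReportDemo and the method writeReport are hypothetical names; only the java.io classes are real.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical demo class; the report code depends only on Writer.
class ReportDemo {

    // Works with a FileWriter, a StringWriter, or any other Writer.
    static void writeReport(Writer out, String threadName) throws IOException {
        out.write("creating report for " + threadName + "\n");
    }

    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        writeReport(buffer, "thread-1");
        writeReport(buffer, "thread-2");
        // buffer.toString() could be returned as the body of a REST response.
        System.out.print(buffer);
    }
}
```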

How to get the intermediate compressed size using GZipOutputStream? [duplicate]

I have a BufferedWriter as shown below:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new GZIPOutputStream( hdfs.create(filepath, true ))));
String line = "text";
writer.write(line);
I want to find out how many bytes have been written to the file without querying the file like this:
hdfs = FileSystem.get( new URI( "hdfs://localhost:8020" ), configuration );
filepath = new Path("path");
hdfs.getFileStatus(filepath).getLen();
as that adds overhead, which I don't want.
I also can't do this:
line.getBytes().length;
as it gives the size before compression.

You can use the CountingOutputStream from Apache commons IO library.
Place it between the GZIPOutputStream and the file OutputStream (hdfs.create(..)).
After writing the content to the file you can read the number of written bytes from the CountingOutputStream instance.
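The placement can be sketched with JDK classes only. ByteCounter below is a hypothetical stand-in for the Commons IO CountingOutputStream, and a ByteArrayOutputStream is assumed in place of hdfs.create(filepath, true) so that the example runs without Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical JDK-only stand-in for Commons IO's CountingOutputStream.
class ByteCounter extends FilterOutputStream {
    private long count;

    ByteCounter(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole chunk instead of the default byte-by-byte loop.
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

class CompressedSizeDemo {

    // Compresses text into sink and returns the number of compressed bytes.
    static long writeCompressed(String text, OutputStream sink) throws IOException {
        // The counter sits below the GZIP stream, so it sees compressed bytes.
        ByteCounter counter = new ByteCounter(sink);
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(counter), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return counter.getCount();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this sink replaces hdfs.create(filepath, true).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(writeCompressed("text", sink) + " compressed bytes");
    }
}
```

Note that the writer and the deflater both buffer data, so a count read in the middle of writing can lag behind what has been handed to the writer; it is exact once the stream has been finished or closed.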

If this isn't too late, you are using Java 1.7+, and you don't want to pull in an entire library like Guava or Commons IO, you can just extend GZIPOutputStream and obtain the data from the associated Deflater like so:
public class MyGZIPOutputStream extends GZIPOutputStream {
public MyGZIPOutputStream(OutputStream out) throws IOException {
super(out);
}
public long getBytesRead() {
return def.getBytesRead();
}
public long getBytesWritten() {
return def.getBytesWritten();
}
public void setLevel(int level) {
def.setLevel(level);
}
}

You can write your own subclass of OutputStream that counts the bytes passed to its write methods.

This is similar to the response by Olaseni, but I moved the counting into the BufferedOutputStream rather than the GZIPOutputStream, and this is more robust, since def.getBytesRead() in Olaseni's answer is not available after the stream has been closed.
With the implementation below, you can supply your own AtomicLong to the constructor so that you can assign the CountingBufferedOutputStream in a try-with-resources block, but still retrieve the count after the block has exited (i.e. after the file is closed).
public static class CountingBufferedOutputStream extends BufferedOutputStream {
    private final AtomicLong bytesWritten;
    public CountingBufferedOutputStream(OutputStream out) throws IOException {
        super(out);
        this.bytesWritten = new AtomicLong();
    }
    public CountingBufferedOutputStream(OutputStream out, int bufSize) throws IOException {
        super(out, bufSize);
        this.bytesWritten = new AtomicLong();
    }
    public CountingBufferedOutputStream(OutputStream out, int bufSize, AtomicLong bytesWritten)
            throws IOException {
        super(out, bufSize);
        this.bytesWritten = bytesWritten;
    }
    // write(byte[] b) is deliberately not overridden: the inherited version
    // delegates to write(b, 0, b.length), so counting there as well would
    // count those bytes twice.
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        bytesWritten.addAndGet(len);
    }
    @Override
    public synchronized void write(int b) throws IOException {
        super.write(b);
        bytesWritten.incrementAndGet();
    }
    public long getBytesWritten() {
        return bytesWritten.get();
    }
}

Jar differs but they should not

I have one method to create a jar.
public class Test {
public static void main(String[] args) throws Exception {
aha();
aha();
aha();
aha();
Thread.sleep(5000);
aha();
}
private static void aha() throws IOException, NoSuchAlgorithmException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
JarOutputStream jos = new JarOutputStream(baos);
jos.putNextEntry(new ZipEntry("sd"));
jos.write("sdf".getBytes());
jos.close();
MessageDigest md = MessageDigest.getInstance("sha1");
byte[] digest = md.digest(baos.toByteArray());
for (byte b : digest) {
System.out.print("," + b);
}
System.out.println();
}
}
The output is:
,-57,-44,59,113,-126,-15,71,62,-90,-120,27,36,-3,69,26,-55,63,107,-93,102
,-57,-44,59,113,-126,-15,71,62,-90,-120,27,36,-3,69,26,-55,63,107,-93,102
,-57,-44,59,113,-126,-15,71,62,-90,-120,27,36,-3,69,26,-55,63,107,-93,102
,-57,-44,59,113,-126,-15,71,62,-90,-120,27,36,-3,69,26,-55,63,107,-93,102
,-124,-26,-79,-28,-34,77,-72,83,92,53,30,-13,95,21,-92,55,70,24,-72,39
I need identical digests, but the last one differs. How can I get reproducible hashes?

Although it is almost invisible, if you write a ZipEntry to a JarOutputStream without setting its time, the underlying ZipOutputStream will initialize the last modification time for you:
if (e.xdostime == -1) {
// by default, do NOT use extended timestamps in extra
// data, for now.
e.setTime(System.currentTimeMillis());
}
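You have to set the time yourself with setTime to get a constant result. Because the ZIP format stores this time with two-second resolution, the first four calls produced the same bytes and only the call after the sleep differed.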
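A minimal sketch of that fix, reusing the entry name and content from the question. The class name ReproducibleJar and the constant FIXED_TIME are hypothetical, and the chosen timestamp is an arbitrary assumption; any fixed value should do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

// Hypothetical class name; builds the same one-entry jar as in the question.
class ReproducibleJar {
    // Assumption: an arbitrary fixed timestamp (2000-01-01T00:00:00Z).
    static final long FIXED_TIME = 946684800000L;

    static byte[] buildJar() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(baos)) {
            ZipEntry entry = new ZipEntry("sd");
            // Set the time explicitly so the stream does not use the current time.
            entry.setTime(FIXED_TIME);
            jos.putNextEntry(entry);
            jos.write("sdf".getBytes());
            jos.closeEntry();
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] first = buildJar();
        Thread.sleep(5000);
        byte[] second = buildJar();
        // Compares the raw bytes; equal bytes give equal digests.
        System.out.println(Arrays.equals(first, second));
    }
}
```

With the timestamp pinned, the two byte arrays are expected to be identical within one JVM run, regardless of the pause between the calls.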

How to Cache InputStream for Multiple Use

I have an InputStream of a file and I use Apache POI components to read from it like this:
POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);
The problem is that I need to use the same stream multiple times and the POIFSFileSystem closes the stream after use.
What is the best way to cache the data from the input stream and then serve more input streams to different POIFSFileSystem instances?
EDIT 1:
By cache I meant store for later use, not as a way to speed up the application. Also, is it better to just read the input stream into an array or string and then create input streams for each use?
EDIT 2:
Sorry to reopen the question, but the conditions are somewhat different when working inside a desktop and a web application.
First of all, the InputStream I get from the org.apache.commons.fileupload.FileItem in my Tomcat web app doesn't support marking and thus cannot be reset.
Second, I'd like to be able to keep the file in memory for faster access and fewer I/O problems when dealing with files.

You can decorate the InputStream passed to POIFSFileSystem with a version that responds to close() by calling reset():
class ResetOnCloseInputStream extends InputStream {
    private final InputStream decorated;
    public ResetOnCloseInputStream(InputStream anInputStream) {
        if (!anInputStream.markSupported()) {
            throw new IllegalArgumentException("marking not supported");
        }
        anInputStream.mark(1 << 24); // magic constant: BEWARE
        decorated = anInputStream;
    }
    @Override
    public void close() throws IOException {
        decorated.reset();
    }
    @Override
    public int read() throws IOException {
        return decorated.read();
    }
}
Test case:
static void closeAfterInputStreamIsConsumed(InputStream is)
throws IOException {
int r;
while ((r = is.read()) != -1) {
System.out.println(r);
}
is.close();
System.out.println("=========");
}
public static void main(String[] args) throws IOException {
InputStream is = new ByteArrayInputStream("sample".getBytes());
ResetOnCloseInputStream decoratedIs = new ResetOnCloseInputStream(is);
closeAfterInputStreamIsConsumed(decoratedIs);
closeAfterInputStreamIsConsumed(decoratedIs);
closeAfterInputStreamIsConsumed(is);
}
EDIT 2
You can read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.

Try BufferedInputStream, which adds mark and reset functionality to another input stream, and just override its close method:
public class UnclosableBufferedInputStream extends BufferedInputStream {
    public UnclosableBufferedInputStream(InputStream in) {
        super(in);
        super.mark(Integer.MAX_VALUE);
    }
    @Override
    public void close() throws IOException {
        super.reset();
    }
}
So:
UnclosableBufferedInputStream bis = new UnclosableBufferedInputStream (inputStream);
and use bis wherever inputStream was used before.

This works correctly:
byte[] bytes = getBytes(inputStream);
POIFSFileSystem fileSystem = new POIFSFileSystem(new ByteArrayInputStream(bytes));
where getBytes is like this:
private static byte[] getBytes(InputStream is) throws IOException {
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
int n;
baos.reset();
while ((n = is.read(buffer, 0, buffer.length)) != -1) {
baos.write(buffer, 0, n);
}
return baos.toByteArray();
}
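To serve several consumers from the same bytes, the array can be kept and wrapped in a new ByteArrayInputStream each time. The sketch below shows one way to package that; StreamCache and newStream are hypothetical names, and POI is left out so the example needs only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name; keeps the bytes of a stream and hands out fresh streams.
class StreamCache {
    private final byte[] data;

    // Drains the source once; the caller remains responsible for closing it.
    StreamCache(InputStream source) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = source.read(buffer)) != -1) {
            baos.write(buffer, 0, n);
        }
        data = baos.toByteArray();
    }

    // Every call returns an independent stream positioned at the start.
    InputStream newStream() {
        return new ByteArrayInputStream(data);
    }

    public static void main(String[] args) throws IOException {
        StreamCache cache =
                new StreamCache(new ByteArrayInputStream("sample".getBytes()));
        for (int i = 0; i < 2; i++) {
            // Closing one of these streams does not affect the next one.
            try (InputStream in = cache.newStream()) {
                System.out.println(in.read());
            }
        }
    }
}
```

Each consumer, such as a POIFSFileSystem, would get its own stream from newStream(), so it does not matter that the consumer closes it.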

Use the implementation below if you need to control how many times the stream can be reused:
public class ReusableBufferedInputStream extends BufferedInputStream
{
    private int totalUse;
    private int used;
    public ReusableBufferedInputStream(InputStream in, Integer totalUse)
    {
        super(in);
        if (totalUse > 1)
        {
            super.mark(Integer.MAX_VALUE);
            this.totalUse = totalUse;
            this.used = 1;
        }
        else
        {
            this.totalUse = 1;
            this.used = 1;
        }
    }
    @Override
    public void close() throws IOException
    {
        if (used < totalUse)
        {
            super.reset();
            ++used;
        }
        else
        {
            super.close();
        }
    }
}

What exactly do you mean by "cache"? Do you want the different POIFSFileSystem instances to start at the beginning of the stream? If so, there's absolutely no point in caching anything in your Java code; the OS will do it, so just open a new stream.
Or do you want to continue reading at the point where the first POIFSFileSystem stopped? That's not caching, and it's very difficult to do. The only way I can think of, if you can't avoid the stream getting closed, would be to write a thin wrapper that counts how many bytes have been read and then open a new stream and skip that many bytes. But that could fail when POIFSFileSystem internally uses something like a BufferedInputStream.

If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.
If the file is big, then you shouldn't care, since the OS will do the caching for you as best as it can.
[EDIT] Use Apache commons-io to read the File into a byte array in an efficient way. Do not use int read(), since it reads the file byte by byte, which is very slow!
If you want to do it yourself, use a File object to get the length, create the array, and then write a loop that reads bytes from the file. You must loop because read(byte[], int offset, int len) can read fewer than len bytes (and usually does).
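The do-it-yourself loop described here might look like this sketch. The class name FileBytes and the method readFully are hypothetical; the size check is an added assumption to guard against files that do not fit into a single array.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name; reads a whole file into a byte array.
class FileBytes {

    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for a single array: " + file);
        }
        byte[] data = new byte[(int) length];
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            // read() may return fewer bytes than requested, so keep going
            // until the array is full.
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n == -1) {
                    throw new EOFException("File shrank while reading: " + file);
                }
                offset += n;
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file to read is given on the command line.
        byte[] data = readFully(new File(args[0]));
        System.out.println(data.length + " bytes read");
    }
}
```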

This is how I would implement it, so that it can be used safely with any InputStream:
write your own InputStream wrapper that creates a temporary file to mirror the original stream content
dump everything read from the original input stream into this temporary file
when the stream has been completely read, you will have all the data mirrored in the temporary file
use InputStream.reset to switch (initialize) the internal stream to a FileInputStream(mirrored_content_file)
from now on you will lose the reference to the original stream (it can be garbage collected)
add a new method release() which removes the temporary file and releases any open stream
you can even call release() from finalize to be sure the temporary file is released in case you forget to call release() (most of the time you should avoid using finalize; always call a method to release object resources). See "Why would you ever implement finalize()?"
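A simplified sketch of the temporary-file idea follows. It is not the wrapper described in the steps above: it copies the source eagerly instead of mirroring it while it is being read, and it uses close() where the steps use release(). The class name TempFileStreamCache and the method newStream are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical class name; mirrors a stream into a temporary file.
class TempFileStreamCache implements Closeable {
    private final Path mirror;

    // Copies the whole source into a temporary file up front.
    TempFileStreamCache(InputStream source) throws IOException {
        mirror = Files.createTempFile("stream-cache", ".tmp");
        try {
            // createTempFile already created the file, so allow overwriting.
            Files.copy(source, mirror, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(mirror);
            throw e;
        }
    }

    // Every call opens a new stream over the mirrored content.
    InputStream newStream() throws IOException {
        return Files.newInputStream(mirror);
    }

    // Plays the role of release(): removes the temporary file.
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(mirror);
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("Foobar".getBytes());
        try (TempFileStreamCache cache = new TempFileStreamCache(source)) {
            for (int i = 0; i < 2; i++) {
                try (InputStream in = cache.newStream()) {
                    System.out.println((char) in.read());
                }
            }
        }
    }
}
```

Implementing Closeable lets try-with-resources handle the cleanup, which avoids relying on finalize.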

public static void main(String[] args) throws IOException {
BufferedInputStream inputStream = new BufferedInputStream(IOUtils.toInputStream("Foobar"));
inputStream.mark(Integer.MAX_VALUE);
System.out.println(IOUtils.toString(inputStream));
inputStream.reset();
System.out.println(IOUtils.toString(inputStream));
}
This works. IOUtils is part of commons IO.

This answer iterates on previous ones based on BufferedInputStream. The main changes are that it allows unlimited reuse and that it takes care of closing the original source input stream to free up system resources. Your OS defines a limit on those, and you don't want the program to run out of file handles (that's also why you should always 'consume' responses, e.g. with the Apache EntityUtils.consumeQuietly()). EDIT: Updated the code to handle greedy consumers that use read(buffer, offset, length); in that case BufferedInputStream may try to read from the source again after it has been closed, and this code protects against that.
public class CachingInputStream extends BufferedInputStream {
    public CachingInputStream(InputStream source) {
        super(new PostCloseProtection(source));
        super.mark(Integer.MAX_VALUE);
    }
    @Override
    public synchronized void close() throws IOException {
        if (!((PostCloseProtection) in).decoratedClosed) {
            in.close();
        }
        super.reset();
    }
    private static class PostCloseProtection extends InputStream {
        private volatile boolean decoratedClosed = false;
        private final InputStream source;
        public PostCloseProtection(InputStream source) {
            this.source = source;
        }
        @Override
        public int read() throws IOException {
            return decoratedClosed ? -1 : source.read();
        }
        @Override
        public int read(byte[] b) throws IOException {
            return decoratedClosed ? -1 : source.read(b);
        }
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return decoratedClosed ? -1 : source.read(b, off, len);
        }
        @Override
        public long skip(long n) throws IOException {
            return decoratedClosed ? 0 : source.skip(n);
        }
        @Override
        public int available() throws IOException {
            return source.available();
        }
        @Override
        public void close() throws IOException {
            decoratedClosed = true;
            source.close();
        }
        @Override
        public void mark(int readLimit) {
            source.mark(readLimit);
        }
        @Override
        public void reset() throws IOException {
            source.reset();
        }
        @Override
        public boolean markSupported() {
            return source.markSupported();
        }
    }
}
To reuse it just close it first if it wasn't.
One limitation though is that if the stream is closed before the whole content of the original stream has been read, then this decorator will have incomplete data, so make sure the whole stream is read before closing.

I just add my solution here, as this works for me. It basically is a combination of the top two answers :)
private String convertStreamToString(InputStream is) {
Writer w = new StringWriter();
char[] buf = new char[1024];
Reader r;
is.mark(1 << 24);
try {
r = new BufferedReader(new InputStreamReader(is, "UTF-8"));
int n;
while ((n=r.read(buf)) != -1) {
w.write(buf, 0, n);
}
is.reset();
} catch(UnsupportedEncodingException e) {
Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
} catch(IOException e) {
Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
}
return w.toString();
}
