How to get the intermediate compressed size using GZipOutputStream? [duplicate]

I have a BufferedWriter as shown below:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new GZIPOutputStream( hdfs.create(filepath, true ))));
String line = "text";
writer.write(line);
I want to find out how many bytes have been written to the file without querying the file system like this:
hdfs = FileSystem.get( new URI( "hdfs://localhost:8020" ), configuration );
filepath = new Path("path");
hdfs.getFileStatus(filepath).getLen();
as that adds overhead, which I want to avoid.
I also can't do this:
line.getBytes().length;
because that gives the size before compression.

You can use the CountingOutputStream from the Apache Commons IO library.
Place it between the GZIPOutputStream and the file OutputStream (hdfs.create(..)).
After writing the content to the file you can read the number of written bytes from the CountingOutputStream instance.

If this isn't too late, you are using Java 1.7+, and you don't want to pull in an entire library like Guava or Commons IO, you can extend GZIPOutputStream and read the counts from the associated Deflater, like so:
public class MyGZIPOutputStream extends GZIPOutputStream {
public MyGZIPOutputStream(OutputStream out) throws IOException {
super(out);
}
public long getBytesRead() {
return def.getBytesRead();
}
public long getBytesWritten() {
return def.getBytesWritten();
}
public void setLevel(int level) {
def.setLevel(level);
}
}
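
As a rough sketch of how those Deflater counters could be read mid-stream: the class and method names below (DeflaterCountSketch, CountingGzip, demo) are made up for illustration, and a ByteArrayOutputStream stands in for hdfs.create(...). It assumes the two-argument GZIPOutputStream constructor with syncFlush=true, so that flush() actually pushes pending compressed data out. Note that the Deflater only counts the deflate payload; the GZIP header (10 bytes) and trailer (8 bytes) are written around it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class DeflaterCountSketch {

    /** Exposes the counters of the protected Deflater field 'def'. */
    static class CountingGzip extends GZIPOutputStream {
        CountingGzip(OutputStream out) throws IOException {
            super(out, true); // syncFlush = true: flush() emits pending compressed data
        }
        long uncompressedBytes() { return def.getBytesRead(); }
        long compressedBytes()   { return def.getBytesWritten(); }
    }

    /** Returns {uncompressed bytes, compressed after flush, compressed after finish, sink size}. */
    static long[] demo() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the HDFS stream
            CountingGzip gzip = new CountingGzip(sink);
            gzip.write("text".getBytes(StandardCharsets.UTF_8));
            gzip.flush();                              // intermediate point
            long in = gzip.uncompressedBytes();
            long afterFlush = gzip.compressedBytes();
            gzip.finish();                             // completes the GZIP stream, Deflater still usable
            long afterFinish = gzip.compressedBytes();
            long total = sink.size();
            gzip.close();                              // query the counters before this call
            return new long[] { in, afterFlush, afterFinish, total };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("in=" + r[0] + " afterFlush=" + r[1]
                + " afterFinish=" + r[2] + " inSink=" + r[3]);
    }
}
```

The counters are read before close() on purpose, because the Deflater is released when the stream is closed.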

You can write your own subclass of OutputStream that counts the bytes passed to its write methods.
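
A minimal sketch of such a counting stream, placed between the GZIPOutputStream and the sink as suggested for CountingOutputStream above. ByteCountSketch, ByteCountingOutputStream and demo are hypothetical names, and a ByteArrayOutputStream stands in for hdfs.create(...). The GZIPOutputStream is created with syncFlush=true so that flush() forces the compressed bytes through to the counter; without a flush, part of the data may still sit inside the compressor.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class ByteCountSketch {

    /** Counts every byte that is passed on to the wrapped stream. */
    static class ByteCountingOutputStream extends FilterOutputStream {
        private long count;

        ByteCountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // write(byte[]) also ends up here, so it is counted once
            count += len;
        }

        long getCount() {
            return count;
        }
    }

    /** Returns {count after flush(), count after close(), actual size of the sink}. */
    static long[] demo() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the HDFS stream
            ByteCountingOutputStream counter = new ByteCountingOutputStream(sink);
            long afterFlush;
            try (GZIPOutputStream gzip = new GZIPOutputStream(counter, true)) {
                gzip.write("text".getBytes(StandardCharsets.UTF_8));
                gzip.flush();                    // push compressed data through to the counter
                afterFlush = counter.getCount(); // intermediate compressed size
            }
            return new long[] { afterFlush, counter.getCount(), sink.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the counter sits below the compressor, it sees the GZIP header and trailer as well, so its final value matches the file size.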

This is similar to the response by Olaseni, but I moved the counting into the BufferedOutputStream rather than the GZIPOutputStream, and this is more robust, since def.getBytesRead() in Olaseni's answer is not available after the stream has been closed.
With the implementation below, you can supply your own AtomicLong to the constructor so that you can assign the CountingBufferedOutputStream in a try-with-resources block, but still retrieve the count after the block has exited (i.e. after the file is closed).
public static class CountingBufferedOutputStream extends BufferedOutputStream {
private final AtomicLong bytesWritten;
public CountingBufferedOutputStream(OutputStream out) throws IOException {
super(out);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize) throws IOException {
super(out, bufSize);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize, AtomicLong bytesWritten)
throws IOException {
super(out, bufSize);
this.bytesWritten = bytesWritten;
}
// write(byte[]) is not overridden: it delegates to write(byte[], int, int),
// so counting it here as well would count those bytes twice.
@Override
public void write(byte[] b, int off, int len) throws IOException {
super.write(b, off, len);
bytesWritten.addAndGet(len);
}
@Override
public synchronized void write(int b) throws IOException {
super.write(b);
bytesWritten.incrementAndGet();
}
public long getBytesWritten() {
return bytesWritten.get();
}
}

Related

Issue in executing the api methods in Solibri w.r.t openModel and saveModel not working in the same file

The code below lives in a public class whose methods are meant to be called from another class that implements the Views interface in Solibri. When the code runs, the last two methods are never executed. Does anyone know what the problem is and how to resolve it?
All of the API methods belong to Solibri, and I have not been able to debug the error myself.
public static void openInSolibri() throws IOException {
String name = "Solibri Building.smc";
FileInputStream fis = new FileInputStream(
"C:\\Users\\Public\\Solibri\\SOLIBRI\\Samples\\models\\Solibri Building.smc");
BufferedInputStream bis = new BufferedInputStream(fis);
SMC.openModel(name, bis);
}
// method to update IFC models
public static void updateModels() throws FileNotFoundException, IOException {
String filename = "C:\\Users\\Public\\Solibri\\SOLIBRI\\Samples\\models\\Solibri Building.smc";
try (InputStream inputStream = new FileInputStream(filename);) {
UUID uuid = SMC.getModel().getUUID();
SMC.getModels().updateIFCModel(uuid, inputStream);
}
}
// method to run checks
public static void runChecks() {
SMC.getChecking().runChecking(false);
}
//method to export bcfxml content
public static void exportBcf() throws FileNotFoundException {
FileOutputStream fos = new FileOutputStream(new File("C:\\Solibri Model\\myFile.bcf"));
SMC.getBcfXml().exportBcfXml(BcfVersion.V2_1, BcfScope.ALL, fos);
}
//method to save the model
public static void saveSMC() {
Path path = Paths.get("C:\\Solibri Model\\Solibri Building.smc");
SMC.saveModel(path);
}

How do I create a report based on String in Java?

Let's assume that I have a Java program that creates a report by having multiple threads write to a file:
public File report = new File("C:\\somewhere\\file");
public FileWriter fileWriter = new FileWriter("C:\\somewhere\\file");
//Some thread executed the following statement
fileWriter.write("creating report for this thread");
Instead of using a file, I want to use some kind of string buffer to create the report so I can return it in a REST response. What can I use that gives the same result as writing to a File?
Update: I want to drop the file implementation completely, as I can't store files in the cloud.

You can use the output stream of the HttpServletResponse to send the file as a stream. Don't forget to set the relevant headers. You can write a method that writes the file to the response:
public static void writeFileToOutputStream(HttpServletResponse response, File file) {
String type = "application/octet-stream";
response.setContentType(type);
response.setHeader("Content-Disposition", String.format("inline;filename=\"%s\"", file.getName()));
response.setContentLength((int) file.length());
InputStream inputStream = null;
try {
inputStream = new BufferedInputStream(new FileInputStream(file));
FileCopyUtils.copy(inputStream, response.getOutputStream());
} catch (IOException e) {
log.info("------couldn't write file------");
}
}

When several threads write to the same report, one obvious solution is java.util.logging: write to a log file, whose content can then easily be returned as a REST response.
If you use a string buffer instead: StringBuilder is faster, but not thread-safe. The older StringBuffer is thread-safe for each individual call, but a chain of two appends is not atomic, so another thread can append in between:
sb.append("The size is ").append(size); // Not thread-safe as a unit.
You could do:
private final StringBuilder sb = new StringBuilder(4096);
public void printf(String messageFormat, Object... args) {
String s = MessageFormat.format(messageFormat, args);
synchronized(sb) {
sb.append(s);
}
}
public String extract() {
String s;
synchronized(sb) {
s = sb.toString();
sb.setLength(0);
}
return s;
}

If you want to stay implementation agnostic then you should design to an interface. I'd suggest just plain old Writer. You could have something like:
public abstract class AbstractReportWriter {
protected Writer writer;
public AbstractReportWriter(Writer w) {
writer = w;
}
public void write(String text) throws IOException {
writer.write(text);
}
}
public class FileReportWriter extends AbstractReportWriter {
public FileReportWriter(String path) throws IOException {
super(new FileWriter(path));
}
}
public class StringReportWriter extends AbstractReportWriter {
public StringReportWriter() {
super(new StringWriter());
}
public String getValue() {
return ((StringWriter) writer).toString();
}
}
public class CloudReportWriter extends AbstractReportWriter {
public CloudReportWriter() {
super(new YourCloudWriterClass());
}
}
Then you can pick and choose your writer by just swapping the implementation.
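
To illustrate the in-memory variant with several threads, here is a small sketch (InMemoryReportSketch and buildReport are hypothetical names, not part of the question's code). It relies on StringWriter being backed by a StringBuffer, so each individual write() call is atomic; every thread therefore writes one complete line per call.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class InMemoryReportSketch {

    /** Starts the given number of threads, each writing whole lines, and returns the report. */
    static String buildReport(int threads, int linesPerThread) {
        StringWriter report = new StringWriter(); // in-memory replacement for the FileWriter
        Writer out = report;                      // the workers only depend on Writer
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < linesPerThread; i++) {
                    try {
                        // One write() call per complete line keeps lines from interleaving.
                        out.write("thread " + id + " line " + i + "\n");
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return report.toString(); // this string can be used as the REST response body
    }

    public static void main(String[] args) {
        System.out.print(buildReport(2, 3));
    }
}
```

The order of lines from different threads is still nondeterministic; only the integrity of each line is preserved.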

Timeout-based BufferedWriter flush

I'm using BufferedWriter with the default size of 8192 characters to write lines to a local file. The lines are read from socket inputstream using BufferedReader readLine method, blocking I/O.
Average line length is 50 characters. It all works well and fast enough (over 1 million lines per second); however, if the client stops writing, lines that are currently stored in the BufferedWriter buffer won't be flushed to disk. In fact, the buffered characters won't be flushed to disk until the client resumes writing or the connection is closed. This translates into a delay between the time a line is transmitted by the client and the time it is committed to the file, so long-tail latency goes up.
Is there a way to flush incomplete BufferedWriter buffer on timeout, e.g. within 100 milliseconds?

What about something like this? It's not a real BufferedWriter, but it is a Writer. It works by periodically checking the time of the last write to the underlying (hopefully unbuffered) writer, and flushing the BufferedWriter if that was longer ago than the timeout.
public class PeriodicFlushingBufferedWriter extends Writer {
protected final MonitoredWriter monitoredWriter;
protected final BufferedWriter writer;
protected final long timeout;
protected final Thread thread;
public PeriodicFlushingBufferedWriter(Writer out, long timeout) {
this(out, 8192, timeout);
}
public PeriodicFlushingBufferedWriter(Writer out, int sz, final long timeout) {
monitoredWriter = new MonitoredWriter(out);
writer = new BufferedWriter(monitoredWriter, sz);
this.timeout = timeout;
thread = new Thread(new Runnable() {
@Override
public void run() {
long deadline = System.currentTimeMillis() + timeout;
while (!Thread.interrupted()) {
try {
Thread.sleep(Math.max(deadline - System.currentTimeMillis(), 0));
} catch (InterruptedException e) {
return;
}
synchronized (PeriodicFlushingBufferedWriter.this) {
if (Thread.interrupted()) {
return;
}
long lastWrite = monitoredWriter.getLastWrite();
if (System.currentTimeMillis() - lastWrite >= timeout) {
try {
writer.flush();
} catch (IOException e) {
}
}
deadline = lastWrite + timeout;
}
}
}
});
thread.start();
}
@Override
public synchronized void write(char[] cbuf, int off, int len) throws IOException {
this.writer.write(cbuf, off, len);
}
@Override
public synchronized void flush() throws IOException {
this.writer.flush();
}
@Override
public synchronized void close() throws IOException {
try {
thread.interrupt();
} finally {
this.writer.close();
}
}
private static class MonitoredWriter extends FilterWriter {
protected final AtomicLong lastWrite = new AtomicLong();
protected MonitoredWriter(Writer out) {
super(out);
}
@Override
public void write(int c) throws IOException {
lastWrite.set(System.currentTimeMillis());
super.write(c);
}
@Override
public void write(char[] cbuf, int off, int len) throws IOException {
lastWrite.set(System.currentTimeMillis());
super.write(cbuf, off, len);
}
@Override
public void write(String str, int off, int len) throws IOException {
lastWrite.set(System.currentTimeMillis());
super.write(str, off, len);
}
@Override
public void flush() throws IOException {
lastWrite.set(System.currentTimeMillis());
super.flush();
}
public long getLastWrite() {
return this.lastWrite.get();
}
}
}

@copeg is right: flush after every line. Flushing on a timer is easy, but what is the point of writing out half a record that cannot be processed yet?

You might apply Observer, Manager, and Factory patterns here and have a central BufferedWriterManager produce your BufferedWriters and maintain a list of active instances. An internal thread might wake periodically and flush the active instances. This might also be an opportunity for Weak references so there is no requirement for your consumers to explicitly free the object. Instead, the GC will do the work and your Manager simply needs to handle the case when its internal reference becomes null (i.e. when all strong references are dropped).
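
A much smaller version of the "internal thread that wakes periodically and flushes" idea could look like the sketch below. It handles a single writer only, without the manager, factory or weak references, and all names (ScheduledFlushSketch, AutoFlushingWriter, writeLine) are made up for illustration. Lines are written under the same lock the flusher uses, so a timed flush never emits half a line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ScheduledFlushSketch {

    /** A BufferedWriter wrapper that is flushed by a background timer. */
    static class AutoFlushingWriter implements AutoCloseable {
        private final BufferedWriter writer;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "writer-flusher");
                    t.setDaemon(true); // do not keep the JVM alive just for flushing
                    return t;
                });

        AutoFlushingWriter(Writer out, long periodMillis) {
            this.writer = new BufferedWriter(out);
            scheduler.scheduleWithFixedDelay(this::flushQuietly,
                    periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        }

        /** Writes one complete line while holding the lock shared with the flusher. */
        synchronized void writeLine(String line) throws IOException {
            writer.write(line);
            writer.newLine();
        }

        private synchronized void flushQuietly() {
            try {
                writer.flush();
            } catch (IOException e) {
                // Sketch only: real code should report the failure somewhere.
            }
        }

        @Override
        public synchronized void close() throws IOException {
            scheduler.shutdownNow();
            writer.close();
        }
    }

    /** Writes one line without an explicit flush and reports whether the timer flushed it. */
    static boolean demo() {
        StringWriter sink = new StringWriter(); // stand-in for the file writer
        try (AutoFlushingWriter w = new AutoFlushingWriter(sink, 20)) {
            w.writeLine("hello");
            long deadline = System.currentTimeMillis() + 2000;
            while (sink.getBuffer().length() == 0 && System.currentTimeMillis() < deadline) {
                Thread.sleep(5);
            }
            return sink.getBuffer().length() > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The flush period bounds the extra latency, at the cost of one lock acquisition per line on the write path.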

Don't try this complex scheme, it's too hard. Just reduce the size of the buffer, by specifying it when constructing the BufferedWriter. Reduce it till you find the balance between performance and latency that you need.

How to redirect javax.mail.Session setDebugOut to log4j logger?

How to redirect javax.mail.Session setDebugOut to log4j logger?
Is it possible to redirect only mailSession debug out to logger?
There are solutions that redirect all standard output to log4j, for example:
System.setOut(new Log4jStream())
but I only want to redirect the mail session's debug output.

The Apache Commons Exec library contains a useful class, LogOutputStream, which you can use for this exact purpose:
LogOutputStream losStdOut = new LogOutputStream() {
@Override
protected void processLine(String line, int level) {
cat.debug(line);
}
};
Session session = Session.getDefaultInstance(new Properties(), null);
session.setDebugOut(new PrintStream(losStdOut));
Here cat is a log4j Category (i.e. a logger) instance.

I created my own FilterOutputStream (you could also use a log4j Logger instead of SLF4J):
public class LogStream extends FilterOutputStream
{
private static org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(LogStream.class);
private static final ByteArrayOutputStream bos = new ByteArrayOutputStream();
public LogStream(OutputStream out)
{
// initialize parent with my bytearray (which was never used)
super(bos);
}
@Override
public void flush() throws IOException
{
// this was never called in my test
bos.flush();
if (bos.size() > 0) LOG.info(bos.toString());
bos.reset();
}
@Override
public void write(byte[] b) throws IOException
{
LOG.info(new String(b));
}
@Override
public void write(byte[] b, int off, int len) throws IOException
{
LOG.info(new String(b, off, len));
}
@Override
public void write(int b) throws IOException
{
write(new byte[] { (byte) b });
}
}
Then I redirected the JavaMail debug output to my stream:
// redirect the output to our logstream
javax.mail.Session def = javax.mail.Session.getDefaultInstance(new Properties());
def.setDebugOut(new PrintStream(new LogStream(null)));
def.setDebug(true);
That did the trick :)

Write your own OutputStream class and pass it to the session:
mailSession.setDebugOut(new PrintStream(yourCustomOutputStream));
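
One possible shape for such a custom OutputStream is sketched below: it collects bytes until a newline and hands each complete line to a callback. The names (LineForwardingSketch, LineForwardingOutputStream) are hypothetical, and a Consumer<String> stands in for the logger call, so that in real code the callback might be something like line -> logger.debug(line). The resulting PrintStream is what would be passed to setDebugOut.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LineForwardingSketch {

    /** Buffers bytes until '\n' and then passes the finished line to a callback. */
    static class LineForwardingOutputStream extends OutputStream {
        private final ByteArrayOutputStream line = new ByteArrayOutputStream();
        private final Consumer<String> sink;

        LineForwardingOutputStream(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public synchronized void write(int b) {
            if (b == '\n') {
                emit();
            } else if (b != '\r') { // drop the '\r' of Windows line endings
                line.write(b);
            }
        }

        @Override
        public synchronized void flush() {
            // Partial lines stay buffered until their newline arrives.
        }

        @Override
        public synchronized void close() {
            if (line.size() > 0) {
                emit(); // forward a trailing line that had no newline
            }
        }

        private void emit() {
            sink.accept(line.toString()); // decoded with the platform default charset
            line.reset();
        }
    }

    /** Prints a few lines through a PrintStream and returns what the callback received. */
    static List<String> demo() {
        List<String> logged = new ArrayList<>(); // stand-in for a logger
        PrintStream debugOut = new PrintStream(new LineForwardingOutputStream(logged::add), true);
        debugOut.println("first");
        debugOut.print("second\n");
        debugOut.print("partial");
        debugOut.close();
        return logged;
    }
}
```

Only write(int) needs to be implemented, because the default write(byte[], int, int) of OutputStream calls it for every byte.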

How to Cache InputStream for Multiple Use

I have an InputStream of a file and I use Apache POI components to read from it like this:
POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);
The problem is that I need to use the same stream multiple times, and POIFSFileSystem closes the stream after use.
What is the best way to cache the data from the input stream and then serve more input streams to different POIFSFileSystem instances?
EDIT 1:
By cache I meant store for later use, not as a way to speed up the application. Also, is it better to just read the input stream into an array or string and then create input streams for each use?
EDIT 2:
Sorry to reopen the question, but the conditions are somewhat different when working inside a desktop application versus a web application.
First of all, the InputStream I get from the org.apache.commons.fileupload.FileItem in my Tomcat web app doesn't support marking and thus cannot be reset.
Second, I'd like to be able to keep the file in memory for faster access and fewer I/O problems when dealing with files.

You can decorate the InputStream passed to POIFSFileSystem with a version that responds to close() by calling reset():
class ResetOnCloseInputStream extends InputStream {
private final InputStream decorated;
public ResetOnCloseInputStream(InputStream anInputStream) {
if (!anInputStream.markSupported()) {
throw new IllegalArgumentException("marking not supported");
}
anInputStream.mark( 1 << 24); // magic constant: BEWARE
decorated = anInputStream;
}
@Override
public void close() throws IOException {
decorated.reset();
}
@Override
public int read() throws IOException {
return decorated.read();
}
}
Test case:
static void closeAfterInputStreamIsConsumed(InputStream is)
throws IOException {
int r;
while ((r = is.read()) != -1) {
System.out.println(r);
}
is.close();
System.out.println("=========");
}
public static void main(String[] args) throws IOException {
InputStream is = new ByteArrayInputStream("sample".getBytes());
ResetOnCloseInputStream decoratedIs = new ResetOnCloseInputStream(is);
closeAfterInputStreamIsConsumed(decoratedIs);
closeAfterInputStreamIsConsumed(decoratedIs);
closeAfterInputStreamIsConsumed(is);
}
EDIT 2
You can read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.

Try BufferedInputStream, which adds mark and reset functionality to another input stream, and just override its close method:
public class UnclosableBufferedInputStream extends BufferedInputStream {
public UnclosableBufferedInputStream(InputStream in) {
super(in);
super.mark(Integer.MAX_VALUE);
}
@Override
public void close() throws IOException {
super.reset();
}
}
So:
UnclosableBufferedInputStream bis = new UnclosableBufferedInputStream (inputStream);
and use bis wherever inputStream was used before.

This works correctly:
byte[] bytes = getBytes(inputStream);
POIFSFileSystem fileSystem = new POIFSFileSystem(new ByteArrayInputStream(bytes));
where getBytes is like this:
private static byte[] getBytes(InputStream is) throws IOException {
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
int n;
baos.reset();
while ((n = is.read(buffer, 0, buffer.length)) != -1) {
baos.write(buffer, 0, n);
}
return baos.toByteArray();
}

Use the implementation below if you want to limit how many times the stream can be reused:
public class ReusableBufferedInputStream extends BufferedInputStream
{
private int totalUse;
private int used;
public ReusableBufferedInputStream(InputStream in, Integer totalUse)
{
super(in);
if (totalUse > 1)
{
super.mark(Integer.MAX_VALUE);
this.totalUse = totalUse;
this.used = 1;
}
else
{
this.totalUse = 1;
this.used = 1;
}
}
@Override
public void close() throws IOException
{
if (used < totalUse)
{
super.reset();
++used;
}
else
{
super.close();
}
}
}

What exactly do you mean by "cache"? Do you want the different POIFSFileSystem instances to start at the beginning of the stream? If so, there's absolutely no point caching anything in your Java code; the OS will do it, so just open a new stream.
Or do you want to continue reading at the point where the first POIFSFileSystem stopped? That's not caching, and it's very difficult to do. If you can't avoid the stream getting closed, the only way I can think of would be to write a thin wrapper that counts how many bytes have been read, then open a new stream and skip that many bytes. But that could fail when POIFSFileSystem internally uses something like a BufferedInputStream.

If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.
If the file is big, then you shouldn't care, since the OS will do the caching for you as best as it can.
[EDIT] Use Apache commons-io to read the File into a byte array in an efficient way. Do not use int read() since it reads the file byte by byte which is very slow!
If you want to do it yourself, use a File object to get the length, create the array, and then use a loop that reads bytes from the file. You must loop, since read(byte[], int offset, int len) can read fewer than len bytes (and usually does).
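
The byte[] approach can be sketched roughly as follows. CachedContent and newStream are hypothetical names, and the sketch buffers through a ByteArrayOutputStream instead of sizing the array from File.length(), so it also works for streams whose length is unknown (such as an upload). Each newStream() result could be handed to its own POIFSFileSystem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InMemoryStreamCacheSketch {

    /** Reads a stream once into memory and serves fresh streams over the same bytes. */
    static class CachedContent {
        private final byte[] data;

        CachedContent(InputStream source) throws IOException {
            data = readFully(source);
        }

        InputStream newStream() {
            return new ByteArrayInputStream(data); // closing it does not affect the cache
        }
    }

    /** Loops until end of stream, because read() may return fewer bytes than requested. */
    static byte[] readFully(InputStream source) throws IOException {
        try (InputStream in = source) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    /** Reads the cached content twice and returns both results. */
    static String[] demo() {
        try {
            InputStream original =
                    new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8));
            CachedContent cache = new CachedContent(original);
            String first = new String(readFully(cache.newStream()), StandardCharsets.UTF_8);
            String second = new String(readFully(cache.newStream()), StandardCharsets.UTF_8);
            return new String[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This keeps the whole file in memory, so it only suits files that comfortably fit in the heap.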

This is how I would implement it so that it can be used safely with any InputStream:
write your own InputStream wrapper that creates a temporary file to mirror the original stream's content
dump everything read from the original input stream into this temporary file
once the stream has been read completely, all of the data is mirrored in the temporary file
use InputStream.reset to switch (re-initialize) the internal stream to a FileInputStream(mirrored_content_file)
from then on you lose the reference to the original stream (so it can be garbage collected)
add a new method release() which removes the temporary file and releases any open stream
you can even call release() from finalize to make sure the temporary file is released in case you forget to call release() (most of the time you should avoid finalize and always call a method to release an object's resources); see Why would you ever implement finalize()?
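
A simplified sketch of the temporary-file idea is shown below. It is not the wrapper described in the steps above: instead of mirroring the data while it is being read, it copies the whole source into the temporary file up front, and close() plays the role of release(). TempFileCacheSketch, TempFileCache and newStream are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempFileCacheSketch {

    /** Copies a stream into a temporary file and serves new streams from that file. */
    static class TempFileCache implements AutoCloseable {
        private final Path file;

        TempFileCache(InputStream source) throws IOException {
            file = Files.createTempFile("stream-cache-", ".tmp");
            try (InputStream in = source) {
                // createTempFile already created the file, hence REPLACE_EXISTING
                Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
            }
        }

        InputStream newStream() throws IOException {
            return Files.newInputStream(file);
        }

        boolean exists() {
            return Files.exists(file);
        }

        /** Plays the role of release(): removes the temporary file. */
        @Override
        public void close() throws IOException {
            Files.deleteIfExists(file);
        }
    }

    private static String readAll(InputStream source) throws IOException {
        try (InputStream in = source) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Returns {first read, second read, whether the file still exists after close()}. */
    static String[] demo() {
        try {
            TempFileCache cache = new TempFileCache(
                    new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8)));
            String first = readAll(cache.newStream());
            String second = readAll(cache.newStream());
            cache.close();
            return new String[] { first, second, String.valueOf(cache.exists()) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the class implements AutoCloseable, it can be used in a try-with-resources block rather than relying on finalize.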

public static void main(String[] args) throws IOException {
BufferedInputStream inputStream = new BufferedInputStream(IOUtils.toInputStream("Foobar"));
inputStream.mark(Integer.MAX_VALUE);
System.out.println(IOUtils.toString(inputStream));
inputStream.reset();
System.out.println(IOUtils.toString(inputStream));
}
This works. IOUtils is part of commons IO.

This answer iterates on the previous ones based on BufferedInputStream. The main changes are that it allows unlimited reuse and that it takes care of closing the original source input stream to free up system resources. Your OS limits the number of open file handles, and you don't want the program to run out of them (that is also why you should always 'consume' responses, e.g. with Apache's EntityUtils.consumeQuietly()). EDIT: Updated the code to handle greedy consumers that use read(buffer, offset, length); in that case BufferedInputStream may try hard to read from the source after it has been closed, and this code protects against that.
public class CachingInputStream extends BufferedInputStream {
public CachingInputStream(InputStream source) {
super(new PostCloseProtection(source));
super.mark(Integer.MAX_VALUE);
}
@Override
public synchronized void close() throws IOException {
if (!((PostCloseProtection) in).decoratedClosed) {
in.close();
}
super.reset();
}
private static class PostCloseProtection extends InputStream {
private volatile boolean decoratedClosed = false;
private final InputStream source;
public PostCloseProtection(InputStream source) {
this.source = source;
}
@Override
public int read() throws IOException {
return decoratedClosed ? -1 : source.read();
}
@Override
public int read(byte[] b) throws IOException {
return decoratedClosed ? -1 : source.read(b);
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
return decoratedClosed ? -1 : source.read(b, off, len);
}
@Override
public long skip(long n) throws IOException {
return decoratedClosed ? 0 : source.skip(n);
}
@Override
public int available() throws IOException {
return source.available();
}
@Override
public void close() throws IOException {
decoratedClosed = true;
source.close();
}
@Override
public void mark(int readLimit) {
source.mark(readLimit);
}
@Override
public void reset() throws IOException {
source.reset();
}
@Override
public boolean markSupported() {
return source.markSupported();
}
}
}
To reuse it, just close it first if it has not been closed already.
One limitation though is that if the stream is closed before the whole content of the original stream has been read, then this decorator will have incomplete data, so make sure the whole stream is read before closing.

I just add my solution here, as this works for me. It basically is a combination of the top two answers :)
private String convertStreamToString(InputStream is) {
Writer w = new StringWriter();
char[] buf = new char[1024];
Reader r;
is.mark(1 << 24);
try {
r = new BufferedReader(new InputStreamReader(is, "UTF-8"));
int n;
while ((n=r.read(buf)) != -1) {
w.write(buf, 0, n);
}
is.reset();
} catch(UnsupportedEncodingException e) {
Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
} catch(IOException e) {
Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
}
return w.toString();
}
