Parse info from torrent file - java

I want to extract information from *.torrent files. I've read that I can get the list of files in the torrent, the announce URLs, the size of each piece, and the piece count.
Can you recommend a library or class for PHP for a server-side program? (Alternatively, I could write a C++/Java desktop utility and upload its output to the server, so that would work too.) I'd like something that parses this information easily.
Can I also get the offset and piece count for each file? For example, if I want to select just one file out of several, can I get the data needed to request only that file from peers?
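The metadata in a .torrent file is bencoded, so a small parser is enough to read the announce URL, piece length, and file list. Below is a minimal sketch of a bencode parser in Java; the class name is made up for illustration, and it treats byte strings as Java Strings, which is lossy for the binary "pieces" value in a real torrent.

```java
import java.util.*;

// Minimal bencode parser sketch: handles the four bencode types
// (integers, byte strings, lists, dictionaries) found in .torrent files.
// Note: byte strings are decoded as Strings here, which is fine for keys
// and URLs but lossy for the binary "pieces" value of a real torrent.
public class BencodeParser {
    private final byte[] data;
    private int pos = 0;

    public BencodeParser(byte[] data) { this.data = data; }

    public Object parse() {
        char c = (char) data[pos];
        if (c == 'i') return parseInt();
        if (c == 'l') return parseList();
        if (c == 'd') return parseDict();
        return parseString();
    }

    private long parseInt() {
        pos++; // skip 'i'
        int end = indexOf('e');
        long v = Long.parseLong(new String(data, pos, end - pos));
        pos = end + 1;
        return v;
    }

    private String parseString() {
        int colon = indexOf(':');
        int len = Integer.parseInt(new String(data, pos, colon - pos));
        String s = new String(data, colon + 1, len);
        pos = colon + 1 + len;
        return s;
    }

    private List<Object> parseList() {
        pos++; // skip 'l'
        List<Object> list = new ArrayList<>();
        while (data[pos] != 'e') list.add(parse());
        pos++; // skip 'e'
        return list;
    }

    private Map<String, Object> parseDict() {
        pos++; // skip 'd'
        Map<String, Object> map = new LinkedHashMap<>();
        while (data[pos] != 'e') {
            String key = parseString();
            map.put(key, parse());
        }
        pos++; // skip 'e'
        return map;
    }

    private int indexOf(char c) {
        int i = pos;
        while (data[i] != c) i++;
        return i;
    }
}
```

On the per-file offsets: in a multi-file torrent, the files are concatenated in the order of the info dictionary's file list, so a file's byte offset is the sum of the lengths of the files before it, and the pieces covering it run from offset / pieceLength through (offset + length - 1) / pieceLength.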

Related

Optimize Apache POI .xls file append

Can someone please let me know if there is a memory-efficient way to append to .xls files? (The client is very insistent on .xls files for the report, and all my research has been in vain.) All I could find is that to append to an existing .xls, we first have to load the entire file into memory, append the data, and then write it back. Is that the only way? I can afford to give up time to optimize memory consumption.
I am afraid that is not possible using Apache POI, and I doubt it is possible with other libraries either. Even the Microsoft applications themselves always need to open the whole file to be able to work with it.
All of the Microsoft Office file formats have a complex internal structure similar to a file system, and the parts of that internal system may have relations to each other. So one cannot simply stream data into those files and append data as is possible with plain text files, CSV files, or single XML files, for example. One always needs to consider the validity of the complete file system and its relations. So the complete file system always needs to be known. And where should it be known if not in memory?
The modern Microsoft Office file formats are Office Open XML. These are ZIP archives containing an internal file system with a directory structure of XML files and other files. So one can reduce the memory footprint by reading data parts from that ZIP file system directly instead of unzipping everything into memory. This is what Apache POI attempts with XSSF and SAX (the Event API). But this is for reading only.
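Because an .xlsx or .docx really is just a ZIP container, the internal parts can be enumerated with the standard java.util.zip classes without loading any sheet data at all. A sketch, assuming the path points at any OOXML file:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: enumerate the internal parts of an OOXML file (.xlsx/.docx)
// without unzipping it. Only entry names are read, not the entry data,
// so the memory footprint stays small regardless of file size.
public class OoxmlParts {
    public static List<String> listParts(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

This only shows the container structure; interpreting the parts (xl/workbook.xml, xl/worksheets/sheet1.xml, the relationship files) and keeping them mutually consistent is exactly the hard part the answer describes.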
For writing, one could have parts of the data (single XML files) written to temporary files to keep them out of memory, then assemble the complete ZIP file system from those temporary files once all writing is complete. This is what SXSSF (the Streaming Usermodel API) does. But this is for writing only.
When it comes to appending data to an existing Microsoft Office file, none of the above is usable, because, as said already, one always needs to consider the validity of the complete file system and its relations. The whole file system always needs to be accessible in order to append data parts to it and update the relationships. One could think about keeping all data parts (single XML files) and relationship parts in temporary files, out of memory. But I don't know of any library (maybe a closed-source one such as Aspose) that does this, and I doubt it would be possible in a performant way. So you would pay time for a lower memory footprint.
The older Microsoft Office file formats are binary file systems that also consist of a complex internal structure. The individual parts are streams of binary records which may also have relations to each other. So the main problem is the same as with Office Open XML.
There is an Event API (HSSF only) which reads single record streams, similar to the event API for Office Open XML. But, of course, this is for reading only.
There is no streaming approach for writing HSSF up to now. The reason is that the old binary Excel worksheets only provide 65,536 rows and 256 columns, so the amount of data in one sheet cannot be that big, and a GB-sized *.xls file should not occur at all. You should not use Excel as a data-exchange format for database data; that is not what a spreadsheet application is made for.
But even if one programmed a streaming approach for writing HSSF, it would not solve your problem, because there is still nothing for appending data to an existing *.xls file. And the problems there are the same as with the Office Open XML file formats.

How to read tcpdump file in java of DARPA dataset?

I am working with the DARPA dataset for a network intrusion detection system. The DARPA dataset contains tcpdump files for training and testing purposes. When I open a file in a text editor like WordPad or Notepad++, I can't read it.
How can we read a tcpdump file so that I can save the records in a database?
Well, one way to read it is with, well, tcpdump; that's why they're called tcpdump files, after all.
Another possibility would be to use it with the TShark program that comes with Wireshark; it can be told to write the values of particular protocol fields to the standard output, and you could have a program that reads those values and puts them in a database.
If you want to do this in a Java program, some possibilities are:
jpcap;
jNetPcap;
the jNetWORKS SDK, if I understand what their page for it is saying - that's a commercial product;
possibly other packet-parsing Java libraries.
One thing that is most definitely NOT a possibility is trying to process the files as text - for example, trying to read them in a text editor - because they're not text files! They're binary files, and the packet data for each packet is also binary, and you'd need code that understands the protocols in order to parse that binary data and extract whatever fields you want to put into the database.
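To see for yourself that these are binary files, you can check the pcap global header, which starts with a well-known 4-byte magic number (0xa1b2c3d4, or 0xd4c3b2a1 if the capturing machine had the opposite byte order). A minimal sketch using only the standard library:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: confirm a stream is a binary pcap capture by checking the
// 4-byte magic number at the start of the pcap global header.
public class PcapMagic {
    static final int MAGIC_BE = 0xa1b2c3d4; // written in big-endian order
    static final int MAGIC_LE = 0xd4c3b2a1; // same magic, byte-swapped

    public static boolean isPcap(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int magic = din.readInt(); // readInt() consumes 4 bytes, big-endian
        return magic == MAGIC_BE || magic == MAGIC_LE;
    }
}
```

Parsing the actual packet records (timestamps, captured lengths, then the link-layer/IP/TCP headers inside each record) is what the libraries listed above do for you.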

TrueZip Random Access Functionality

I'm trying to understand how to randomly traverse the files in a .tar.gz using TrueZIP in a Java 6 environment (using the Files classes). I found instances where it uses Java 7's Path; however, I can't come up with an example of how to randomly read an archive on Java 6.
Additionally, does "random" reading mean it first uncompresses the entire archive, or does it read sections within the compressed file? The purpose is that I want to retrieve some basic information from the file (e.g., a username) without having to uncompress the entire thing just to read it.
The method that gzip uses to compress a file (especially .tar.gz files) usually implies that the output file is not random-accessible - you need the symbol table and other context from the entire file up to the current block to even be able to uncompress that block to see what's in it. This is one of the ways it achieves (somewhat) better compression over ZIP/pkzip, which compress each file individually before adding them to a container archive, resulting in the ability to seek to a specific file and uncompress just that file.
So, in order to pick a .tar.gz apart, you will need to uncompress the whole thing, either to a temporary file or in memory (if it's not too large), then you can jump to specific entries in the underlying .tar file, although that has to be done sequentially by skipping from header to header, as tar does not include a central index/directory of files.
I am not aware of TrueZIP in particular, but at least for Zip, RAR, and Tar you can access single files, retrieve details about them, and even extract them without touching the rest of the package.
Additionally, does "random" reading mean that it first uncompresses the entire archive?
If TrueZIP follows the Zip/RAR/Tar formats, then it does not uncompress the entire archive.
The purpose is that I want to retrieve some basic information from the file without having to uncompress the entire thing just to read it (ie username).
As above, that should be fine. I don't know the TrueZIP API in particular, but these container formats allow you to inspect file metadata without reading a single bit of the data, and optionally extract/read one file's contents without touching any other file in the container.
The source code comments of zran describe how such tools work:
http://svn.ghostscript.com/ghostscript/tags/zlib-1.2.3/examples/zran.c
In conclusion, the complete file has to be processed once to generate the necessary index. That is still much faster than actually decompressing everything. The index splits the file into blocks that can be decompressed without having to decompress the blocks before them, which is how random access is emulated.
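The point that plain gzip has no random access can be demonstrated with the standard java.util.zip classes: skip() on a GZIPInputStream still inflates every byte before the target offset, so "seeking" is really sequential decompression. A sketch, assuming the goal is to read a few bytes at a given uncompressed offset:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: "seeking" in a gzip stream. skip() on a GZIPInputStream still
// decompresses every byte before the target position, because gzip has
// no built-in index; random access must be emulated sequentially.
public class GzipSeek {
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Read `len` bytes starting at uncompressed offset `off`.
    public static byte[] readAt(byte[] gzipped, long off, int len) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            long toSkip = off;
            while (toSkip > 0) {
                long skipped = in.skip(toSkip); // inflates under the hood
                if (skipped <= 0) break;
                toSkip -= skipped;
            }
            byte[] out = new byte[len];
            int read = 0;
            while (read < len) {
                int n = in.read(out, read, len - read);
                if (n < 0) break;
                read += n;
            }
            return out;
        }
    }
}
```

A zran-style index trades one full pass (recording inflate state at block boundaries) for the ability to start decompressing at the nearest recorded point instead of at the beginning.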

How to know file type without extension

While trying to come up with a servlet-based application that reads files and manipulates them (image type conversion), a question came up:
Is it possible to inspect a file's content and determine its file type?
Is there a standard that requires each file to provide some kind of marker in its content, so that applications do not have to rely on the file extension?
Consider an application scenario:
I am creating an application that will be able to convert different file formats to a set of output formats. Say a user uploads a PDF; my application can suggest that the possible conversion formats are Microsoft Word, TIFF, JPEG, etc.
As my application gradually supports more file formats (over a period of time), I want it to inspect the input file instead of having the user specify the format, and then suggest the possible output formats to the user.
I understand this is an open ended, broad question. Please let me know if it needs to be modified.
Thanks,
Ayusman
Yes, you can figure out the type without an extension using the magic number.
The way the file command figures it out is actually a 3-step check:
Check filesystem properties to identify empty files, folders, etc.
Check the magic number.
For text files, check the language used in them.
Here's a library that'll help you with Magic Numbers: jmimemagic
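If you only need a handful of formats, the magic-number check itself is a few lines of standard Java. A sketch with three well-known signatures (the constants and MIME strings here are illustrative; a real application would use a library like jmimemagic for broad coverage):

```java
import java.util.Arrays;

// Sketch: identify a file type from its leading "magic number" bytes.
// Only a few well-known signatures are included for illustration.
public class MagicSniffer {
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] PDF = {'%', 'P', 'D', 'F'};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04}; // also .docx/.xlsx/.jar

    public static String sniff(byte[] head) {
        if (startsWith(head, PNG)) return "image/png";
        if (startsWith(head, PDF)) return "application/pdf";
        if (startsWith(head, ZIP)) return "application/zip";
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Note the ZIP case: OOXML documents (.docx/.xlsx) share the ZIP signature, so distinguishing them requires looking inside the container, which is exactly why dedicated libraries exist.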

Resaving A Blob File In Java

I have a web application in GWT and a complementary desktop client also written in Java (so the same solution basically applies to both). In my program users can attach files, then download them later or do whatever. These files are stored as blobs and can be in just about any format. Many of the users that use Excel and Word want to be able to open the file, make changes, then have those changes stored back in the attached file. In other words, need an inline editing of attachments.
Any ideas on how to make this happen? Should I have an 'edit' mode that keeps a file handle while the file is open, and then stores that file handle? Some way of tracking whether the file is changing or not?
Sorry about the late response. Amol >> I have that working already. I want to save directly back to a blob as if it were a file handle. I thought that was clear in my question.
I have decided that this is almost impossible with a web application without writing some kind of client interface for each and every potential file type - word, excel, pdf, graphics, etc...
