I am working with the DARPA dataset for a network intrusion detection system. The dataset contains tcpdump files for training and testing. When I open one of these files in a text editor such as WordPad or Notepad++, I can't read its contents.
How can I read a tcpdump file so that I can save the records in a database?
Well, one way to read it is with, well, tcpdump; that's why they're called tcpdump files, after all.
Another possibility would be to use it with the TShark program that comes with Wireshark; it can be told to write the values of particular protocol fields to the standard output, and you could have a program that reads those values and puts them in a database.
If you want to do this in a Java program, some possibilities are:
jpcap;
jNetPcap;
the jNetWORKS SDK, if I understand what their page for it is saying - that's a commercial product;
possibly other packet-parsing Java libraries.
One thing that is most definitely NOT a possibility is trying to process the files as text - for example, trying to read them in a text editor - because they're not text files! They're binary files, and the packet data for each packet is also binary, and you'd need code that understands the protocols in order to parse that binary data and extract whatever fields you want to put into the database.
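To illustrate why these files have to be treated as binary, here is a minimal sketch that reads only the fixed-size headers, assuming the files use the classic libpcap layout (a 24-byte global header, then a 16-byte record header in front of each packet). The class and method names are made up for this example, it does not handle the nanosecond-resolution or pcapng variants, and it does not decode any protocol; for that you would still want tcpdump, TShark, or one of the libraries listed above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of any library): reads the fixed-size
// headers of a classic pcap file.
class PcapHeaderSketch {

    /** The parts of the 24-byte global header that this sketch keeps. */
    static final class GlobalHeader {
        final ByteOrder order;   // byte order the capturing machine wrote in
        final int versionMajor;
        final int versionMinor;
        final long snapLen;      // maximum number of bytes saved per packet
        final long linkType;     // link-layer type, e.g. 1 for Ethernet

        GlobalHeader(ByteOrder order, int versionMajor, int versionMinor,
                     long snapLen, long linkType) {
            this.order = order;
            this.versionMajor = versionMajor;
            this.versionMinor = versionMinor;
            this.snapLen = snapLen;
            this.linkType = linkType;
        }
    }

    /** Reads the global header and works out the byte order from the magic number. */
    static GlobalHeader readGlobalHeader(InputStream in) throws IOException {
        ByteBuffer buf = readBlock(in, 24, ByteOrder.BIG_ENDIAN);
        int magic = buf.getInt(0);
        ByteOrder order;
        if (magic == 0xA1B2C3D4) {
            order = ByteOrder.BIG_ENDIAN;
        } else if (magic == 0xD4C3B2A1) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else {
            throw new IOException("Not a classic pcap file (unexpected magic number)");
        }
        buf.order(order);
        return new GlobalHeader(order,
                buf.getShort(4) & 0xFFFF,        // major version
                buf.getShort(6) & 0xFFFF,        // minor version
                buf.getInt(16) & 0xFFFFFFFFL,    // snapshot length
                buf.getInt(20) & 0xFFFFFFFFL);   // link-layer type
    }

    /** Reads one 16-byte record header and returns how many packet bytes follow it. */
    static long readCapturedLength(InputStream in, ByteOrder order) throws IOException {
        ByteBuffer buf = readBlock(in, 16, order);
        return buf.getInt(8) & 0xFFFFFFFFL;      // captured length, after the two timestamp fields
    }

    private static ByteBuffer readBlock(InputStream in, int size, ByteOrder order)
            throws IOException {
        byte[] raw = new byte[size];
        new DataInputStream(in).readFully(raw);  // throws EOFException if the file is too short
        return ByteBuffer.wrap(raw).order(order);
    }
}
```

A caller would read the global header once, then loop: read a record header, read that many packet bytes, and hand those bytes to whatever code understands the protocols.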
Can someone please let me know if there is a memory-efficient way to append to .xls files? (The client is very insistent on .xls files for the report, and I did all the research I could, but in vain.) All I could find is that to append to an existing .xls file, we first have to load the entire file into memory, append the data, and then write it back. Is that the only way? I can afford to give up time to reduce memory consumption.
I am afraid that is not possible using apache poi, and I doubt that it is possible with other libraries. Even the Microsoft applications themselves always need to open the whole file to be able to work with it.
All of the Microsoft Office file formats have a complex internal structure similar to a file system, and the parts of that internal file system may have relations to each other. So one cannot simply stream data into those files and append data, as is possible with plain text files, CSV files, or single XML files, for example. One always needs to consider the validity of the complete file system and its relations. So the complete file system always needs to be known. And where should it be known, if not in memory?
The modern Microsoft Office file formats are Office Open XML. These are ZIP archives containing an internal file system with a directory structure that holds XML files and other files. So one can reduce the memory footprint by reading data parts from that ZIP file system directly instead of unzipping all of it into memory. This is what apache poi tries with XSSF and SAX (Event API). But this is for reading only.
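The "ZIP archive with an internal file system" point can be seen with nothing but the JDK. The following is only a sketch with made-up class and method names: it walks the archive entry by entry, so only the part that is asked for is held in memory. It applies to Office Open XML files such as *.xlsx, not to the old binary *.xls format, and it is not how apache poi itself is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not part of apache poi): inspects the parts of an
// Office Open XML package using only java.util.zip.
class OoxmlPartLister {

    /** Returns the names of all parts in the package, in archive order. */
    static List<String> listParts(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());   // e.g. "xl/workbook.xml"
            }
        }
        return names;
    }

    /** Reads a single part into memory; returns null if the part is not present. */
    static byte[] readPart(InputStream in, String partName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {   // -1 marks the end of this entry
                        out.write(chunk, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

Reading a part this way is cheap, but it says nothing about how that part relates to the others, which is exactly why appending is the hard case.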
For the writing approach one could have parts of the data (single XML files) written to temporary files to keep them away from the memory. Then put the complete ZIP file system together from those temporary files when all writing is complete. This is what SXSSF (Streaming Usermodel API) tries to do. But this is for writing only.
When it comes to appending data to an existing Microsoft Office file, none of the above is usable. As said already, one always needs to consider the validity of the complete file system and its relations, so the whole file system always needs to be accessible in order to append data parts to it and update the relationships. One could think about keeping all data parts (single XML files) and relationship parts in temporary files to keep them out of memory. But I don't know of any library that does this (maybe closed-source ones like Aspose do), and I doubt it would be possible in a performant way. So you would pay with time for a lower memory footprint.
The older Microsoft Office file formats are binary file systems, but they also consist of a complex internal structure. The single parts are streams of binary records which also may have relations to each other. So the main problem is the same as with Office Open XML.
There is the Event API (HSSF only), which tries to read single record streams, similar to the event API for Office Open XML. But, of course, this is for reading only.
There is no streaming approach for writing HSSF up to now. The reason is that the old binary Excel worksheets only provide 65,536 rows and 256 columns, so the amount of data in one sheet cannot be that big, and a GB-sized *.xls file should not occur at all. You should not use Excel as a data exchange format for database data. This is not what a spreadsheet application is made for.
But even if someone programmed a streaming approach for writing HSSF, this would not solve your problem, because there is still nothing for appending data to an existing *.xls file. And the problems with that are the same as with the Office Open XML file formats.
Within a Java program I've got a bunch of text files which the program reads from and writes to (I know this is a really bad way to implement an app), but I need some way to ensure the integrity of the text files every time the program loads.
If a text file is deleted, the program should be able to re-create it as it last was. Is there any way of doing something like this, where I can store data between program executions? The important thing is that I'm able to change the data stored.
(Usually I would use a database, but it's not an option at the moment.)
edit: (Clarify what I'm looking for)
There exists a text file full of data.
User deletes the text file.
Program detects wrong or missing file and re-creates it from a backup which the user can't get his hands on.
This is the kind of process I'm trying to implement.
You can't save data locally in a safe way. Everything that is stored on the user's machine is under the user's control. You can make them jump through hoops, for example by using encryption or by storing files in obscure formats in strange places, but you will just make it less convenient to change the files, not impossible for a determined user.
The only way to get around this is to store the data online.
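For completeness, the detect-and-recreate step described in the question is easy to sketch on its own, with the caveat from above left fully intact: the backup copy also lives on the user's machine, so this only guards against accidental deletion or edits, not against a determined user. The class name, method names, and the idea of a second local copy are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical helper: keeps a second local copy of a data file and restores
// the data file from it when it is missing or differs from the copy.
class FileRestoreSketch {

    /** Writes new content to the data file and refreshes the backup copy. */
    static void save(Path data, Path backup, byte[] content) throws IOException {
        Files.write(data, content);
        Files.write(backup, content);
    }

    /** Restores the data file from the backup if needed; returns true if it did. */
    static boolean restoreIfDamaged(Path data, Path backup) throws IOException {
        boolean intact = Files.exists(data)
                && Arrays.equals(Files.readAllBytes(data), Files.readAllBytes(backup));
        if (intact) {
            return false;
        }
        Files.copy(backup, data, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

The program would call restoreIfDamaged once at startup and save whenever it changes the data, so the backup always reflects the last state the program itself wrote.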
I want to get information from *.torrent files. I've read that I can get the list of all files in a torrent, the announce URLs, the size of each piece, and the number of pieces.
Can you recommend a library or class for PHP for a server-side program that makes parsing this info easy? (Alternatively, I could write a C++/Java desktop utility and upload its output to the server, so that would be suitable too.)
Can I get the offset and number of pieces for each file? For example, I want to choose only one file out of all the files in a torrent. Can I get the data needed to request only this file from peers?
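On the last point, the piece range of a single file can be worked out with plain arithmetic once the file lengths and the piece length have been parsed from the metainfo. In a multi-file torrent the files are treated as one continuous byte stream, in the order they are listed, and that stream is cut into fixed-size pieces. The sketch below assumes the parsing has already been done elsewhere; the class and its field names are made up for this example.

```java
// Hypothetical helper: computes which pieces cover one file of a multi-file
// torrent, given the file lengths (in listed order) and the piece length.
class TorrentPieceRange {
    final long firstPiece;          // index of the first piece that contains bytes of the file
    final long lastPiece;           // index of the last piece that contains bytes of the file
    final long offsetInFirstPiece;  // where the file starts inside the first piece

    private TorrentPieceRange(long firstPiece, long lastPiece, long offsetInFirstPiece) {
        this.firstPiece = firstPiece;
        this.lastPiece = lastPiece;
        this.offsetInFirstPiece = offsetInFirstPiece;
    }

    static TorrentPieceRange forFile(long[] fileLengths, int fileIndex, long pieceLength) {
        if (pieceLength <= 0 || fileLengths[fileIndex] <= 0) {
            throw new IllegalArgumentException("piece length and file length must be positive");
        }
        // The file starts where all the files listed before it end.
        long start = 0;
        for (int i = 0; i < fileIndex; i++) {
            start += fileLengths[i];
        }
        long end = start + fileLengths[fileIndex] - 1;   // last byte of the file, inclusive
        return new TorrentPieceRange(start / pieceLength, end / pieceLength, start % pieceLength);
    }

    long pieceCount() {
        return lastPiece - firstPiece + 1;
    }
}
```

For files of 100, 250, and 50 bytes with 100-byte pieces, the third file starts at byte 350, so it lies entirely in piece 3, beginning 50 bytes into that piece. Note that the first and last pieces of a file are usually shared with its neighbours, so requesting one file means downloading those shared pieces in full.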
I have a web application in GWT and a complementary desktop client also written in Java (so the same solution basically applies to both). In my program users can attach files, then download them later or do whatever. These files are stored as blobs and can be in just about any format. Many of the users who use Excel and Word want to be able to open the file, make changes, and then have those changes stored back in the attached file. In other words, I need inline editing of attachments.
Any ideas on how to make this happen? Should I have an 'edit' mode that keeps a file handle while the file is open, and then store that file handle? Is there some way of keeping track of whether the file is changing or not?
Sorry about the late response. Amol >> I have that going. I want to save directly back to a blob as if it were a filehandle. Thought that was clear in my question.
I have decided that this is almost impossible with a web application without writing some kind of client interface for each and every potential file type - word, excel, pdf, graphics, etc...
I'm developing a desktop application using NetBeans (Java Desktop Application), and I need to implement my own file format which is specific to that application only. I'm quite uncertain how I should go about it. What code should I use so that my Java application reads that file and opens it the way I want it to?
If it's character data, use Reader/Writer. If it's binary data, use InputStream/OutputStream. That's it. They are available in several flavors, like BufferedReader, which eases reading a text file line by line, and so on.
They're part of the Java IO API. Start learning it here: Java IO tutorial.
By the way, Java by itself really doesn't care about the file extension or format. It's the code logic you write that has to handle each character or byte of the file according to some file format specification (which you in turn have to write up first if you'd like to invent one yourself).
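As a small illustration of "write the specification, then write the code that follows it", here is a sketch of an invented binary format: a 4-byte magic number, a 2-byte version, a title string, and an integer. Every name and field here is made up for the example; the only real API used is DataOutputStream/DataInputStream from java.io.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical file format, invented for this example:
//   int    magic   (0x4E4F5445, the ASCII bytes "NOTE")
//   short  version (currently 1)
//   UTF    title   (as written by DataOutputStream.writeUTF)
//   int    count
class NoteFileFormat {
    static final int MAGIC = 0x4E4F5445;
    static final int VERSION = 1;

    /** The in-memory form of one file. */
    static final class Note {
        final String title;
        final int count;

        Note(String title, int count) {
            this.title = title;
            this.count = count;
        }
    }

    static void write(OutputStream out, Note note) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(MAGIC);
        data.writeShort(VERSION);
        data.writeUTF(note.title);
        data.writeInt(note.count);
        data.flush();
    }

    static Note read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != MAGIC) {
            throw new IOException("Not a note file (wrong magic number)");
        }
        int version = data.readUnsignedShort();
        if (version != VERSION) {
            throw new IOException("Unsupported note file version: " + version);
        }
        String title = data.readUTF();
        int count = data.readInt();
        return new Note(title, count);
    }
}
```

The magic number and version are not required by Java; they are just a cheap way for the reading code to reject files that were never written in this format, whatever extension they carry.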
I am not sure this directly addresses your question, but since you mentioned a custom file format, it is worth noting that applications launched using Java Web Start can declare a file association. If the user double clicks one of those file types, the file name will be passed to the main(String[]) of the app.
This ability is used in the File Service demo. of the JNLP API - available at my site.
As to the exact format of the file & the best ways to load and save it, there are a large number of possibilities that can be narrowed down with more details of the information it contains.
Choosing a new or existing file extension does not affect your application (or anyone else's, for that matter). It is up to the programmer which files the app should read.
For example, you may think you can't read a pdf or doc directly as a text file, but that is not because they are written or stored differently; it is because they have headers or characters which your app does not understand. So we might use a plugin or extension which understands those added headers (or rather the grammar of the pdf/doc file), removes them, and lets our app know what text (or anything else) the file contains.
So if you wish to introduce your own extension, and specifically want no other application to be able to read it, just write the text in a way that only your program is able to understand. Writing a file in binary pretty much ensures that it cannot be read directly just by the user opening it, but it is still possible to read from it if it is merely a collection of raw characters.
If you are asking for code to hide data, I'd say there are plenty of algorithms you might use, which usually get tagged as encryption because you are basically trying to lock or hide your stuff. So if you do not really care for the big hullabaloo, are simply trying to keep a file from being read directly, and a successful attempt to read the file would not cause any harm to your application, write it in binary.