I'm trying to make a list of all the files and folders on a mounted NTFS Volume, and I made 2 ways to do it so far, all yielding different results (unfortunately).
(NOTE: I couldn't include additional sources here because link limit)
There are a few things I would like cleared up:
(1) How come certain files/folders have weird unrecognizable characters in the middle of the name? How do I print them to a wstringstream, and how would I then properly write them to a wofstream?
Example file path: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https∺∯∯wscont.apps.microsoft.com∯winstore∯6.3.0.1∯100∯US∯en-us∯MS∯482∯features1908650c-22a4-485e-8e88-b12d01c84f2f.json.dat
How it appears if you were to use dir in cmd: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https???wscont.apps.microsoft.com?winstore?6.3.0.1?100?US?en-us?MS?482?features1908650c-22a4-485e-8e88-b12d01c84f2f.json.dat
How it appears if you were to use wprintf in C++: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https
The file name shows properly in windows explorer, but has trouble being printed in cmd. It appears as a box in notepad++, but if you right-click, it shows it properly, so notepad++ can also display the characters properly (sort-of, encoding change maybe?).
I'm currently using the following (ss is the stringstream, initialized as wstringstream ss("");):
wstringstream ss("");
(my program methods here)
wofstream out("...", wofstream::out);
out << ss.rdbuf();
out.close();
I'm assuming that the encoding has at least something to do with it, but at the same time, I'm not sure which flags to use.
(2) Are all files listed in the MFT?
Every link on NTFS says that all file information and attributes are stored in the MFT, but according to the open source NTFSLib (have a link limit, can be found by googling An-NTFS-Parser-Lib), there are 131840 file records.
When I run my own program, I end up with this 50MB file (includes permissions and the such). My program uses FSCTL_MFT_ENUM_USN_DATA and CreateFile for handles and GetFileInformationByHandle for getting extended information.
CreateFile takes in the WCHAR* normally, and doesn't have the weird null termination issues (I think, maybe, not even sure anymore, this might be where the missing files are).
It shows that there are 129454 files that it could read, I'm assuming that the other 131840-129454=2386 files are files that were deleted but are still in the USN journal.
(3) How come my Java version of the code outputs more file records than the MFT even contains?
The output of my Java code is a 150MB file (includes permissions, enumerates with names instead of symbols because I don't know how to not do that, so it's way bigger).
As you can see here, there are 161430 file records in this one. That's more than what NTFSLib said there are. Yes, it is the case that probably many of those 131840 file records are 'additional names', but I explicitly avoided symlinks in my Java version. Is it the case that those extra 30000 files are generated from hardlinks or somehow having more names is independent from being symlinks?
Solution to (1):
You must write your own code to emit UTF-16, because writing will sometimes run into cases where the characters are misaligned and the writer thinks there is a null, for example:
0xD00A may run into the 0x00 character during a misalign and thus will terminate.
I used the following two files to write the output as Unicode. They handle wchar_t, wchar_t*, char, char*, unsigned long, and unsigned long long:
UTF16.h,
UTF16.c
(2,3):
Yes, they're all there. You can find the number of links for each file with GetFileInformationByHandle (the nNumberOfLinks field of BY_HANDLE_FILE_INFORMATION), and adding those up gives the number of files that the Java version reports.
Still looking for: How do you list the names of all the links to the file record in the MFT?
When I call endsWith(".pdf"), would this open malware.pdf.exe or just malware.pdf?
String sFileName = request.getParameter("fName");
if (sFileName.toLowerCase().endsWith(".pdf"))
// open file
else
// don’t open the file
String.endsWith works as documented. However, there are a couple of obvious problems here.
A NUL character \0 will typically terminate the string as far as the OS file API is concerned (because it'll be using C strings).
If the file is served up, its content type may not follow the extension; it may be sniffed ("magicked") to a different type.
It's generally dangerous to run PDFs downloaded from the internet from the local filesystem. (Chrome warns of this and see Billy Rios on Content Smuggling).
.endsWith("string") will perform as you intend. However, that doesn't mean that the file is actually a pdf. Check out this SO question or others for more information on how to check the header.
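To illustrate the header check mentioned above, here is a minimal sketch. It assumes that looking for the "%PDF-" signature at the very start of the file is enough for your purposes (it does not prove the rest of the file is a valid or safe PDF). The class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfHeaderCheck {
    // A PDF file begins with the ASCII bytes "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given bytes start with the PDF signature. */
    public static boolean startsWithPdfMagic(byte[] head) {
        if (head.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Reads the first few bytes of the file and compares them to the signature. */
    public static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            return total == head.length && startsWithPdfMagic(head);
        }
    }

    public static void main(String[] args) throws IOException {
        // A file named like a PDF whose content is something else entirely.
        Path tmp = Files.createTempFile("sample", ".pdf");
        Files.write(tmp, "MZ not a pdf".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikePdf(tmp)); // prints "false"
        Files.delete(tmp);
    }
}
```

The name check and the content check answer different questions, so you would typically want both to pass before opening the file.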
I have Java code doing the following:
1. Create a temporary empty file with ZIP extension using File.createTempFile()
2. Delete it with File.delete() (we only really wanted it to generate a temp file name)
3. Copy a "template" ZIP file to the same path with com.google.common.io.ByteStreams.copy() using a new OutputSupplier given the same filename
4. Modify the ZIP archive (remove a directory) using TrueZIP 7.4.3
On a specific system, step 4 fails consistently with FsReadOnlyArchiveFileSystemException - "This is a read-only archive file system!" (see http://java.net/projects/truezip/lists/users/archive/2011-05/message/9)
Debugging the TrueZIP code, I noticed the following:
There is no open file handle on this file between any of the steps above, and specifically not before step 4
Checking the same file with File.canWrite() rather than NIO, at the exact same point in time (using a debugger), shows that it is writable
Here is what you see in the debugger expressions list:
fn => "C:/myworkdir/temp/myfile4088293380313057223tmp.zip"
java.nio.file.Files.isWritable(java.nio.file.Paths.get(fn)) => false
new java.io.File(fn).canWrite() => true
Using JDK 1.7.04
Any ideas?
There is a bug in java.nio.file.Files.isWritable under windows:
it won't take implicit permissions into consideration.
java bug #7190897
The end result isn't too surprising:
java.nio.file.Files.isWritable(java.nio.file.Paths.get(fn)) => false
new java.io.File(fn).canWrite() => true
File.canWrite doesn't pay attention to ACLs at all and only checks the MS-DOS read-only attribute.
Files.isWritable pays attention to ACLs, but for whatever reason (to keep broken programs broken?) File.canWrite was left un-fixed. This turns out to be lucky, because in some situations Files.isWritable seems to return false even when you can open the file with no problems.
Really, I would summarise the methods like this:
File.canWrite sometimes returns true when you can't actually write to the file.
Files.isWritable sometimes returns false when you can actually write to the file.
I'm not sure what the point of either method is right now. Since everyone who uses these ultimately has to write a non-broken equivalent which actually tries to open the file, one wonders why they didn't just open the file to do the check themselves.
I would avoid using both APIs and instead rely on the exceptions thrown by e.g. new FileOutputStream(). They at least are real, and of real concern. Using the APIs you mention is entirely pointless, and it introduces timing windows and repeated code. You have to catch the IOException anyway: why write all that code twice?
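A rough sketch of that approach, assuming you want to append to the file rather than truncate it: skip the writability check, attempt the write, and treat the IOException as the answer. WriteOrReport and tryAppend are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteOrReport {
    /**
     * Appends data to the file, creating it if needed.
     * Returns false (and logs the cause) if the file could not be opened or written.
     */
    public static boolean tryAppend(Path file, byte[] data) {
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(data);
            return true;
        } catch (IOException e) {
            // This is the real signal: the open or the write actually failed.
            System.err.println("Cannot write " + file + ": " + e);
            return false;
        }
    }
}
```

There is no gap between a check and the write here, so the result cannot go stale the way a separate canWrite() or isWritable() call can.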
I need to analyze a log file at runtime with Java.
What I need is, to be able to take a big text file, and search for a certain string or regex within a certain range of lines.
The range itself is deduced by another search.
For example, I want to search the string "operation ended with failure" in the file, but not the whole file, only starting with the line which says "starting operation".
Of course I can do this with plain InputStream and file reading, but is there a library or a tool that will help do it more conveniently?
If the file is really huge, then a well-written Java solution and any *nix tool will be almost equally slow, since both are bound by I/O. You cannot avoid reading the file line by line, and a few lines of Java code will do the job. But rather than a one-off search, I'd think about splitting the file at generation time, which might be much more efficient. You could redirect the log output to another program or script (awk or Python would be perfect for it) and split the file as it is generated rather than after the fact.
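For the "few lines of Java" part, something along these lines might do. It is only a sketch: it assumes the range starts at the first line matching the start pattern and runs to the end of the file, and LogRangeSearch and findAfter are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

public class LogRangeSearch {
    /**
     * Returns the 1-based numbers of the lines that match target and come
     * after the first line that matches start. Lines are consumed one at a
     * time, so the whole file never has to be held in memory.
     */
    public static List<Integer> findAfter(Iterator<String> lines, Pattern start, Pattern target) {
        List<Integer> hits = new ArrayList<>();
        boolean inRange = false;
        int lineNo = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            lineNo++;
            if (!inRange) {
                // Still looking for the line that opens the range.
                inRange = start.matcher(line).find();
            } else if (target.matcher(line).find()) {
                hits.add(lineNo);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "operation ended with failure",   // before the range, ignored
                "starting operation",
                "step ok",
                "operation ended with failure");
        System.out.println(findAfter(sample.iterator(),
                Pattern.compile("starting operation"),
                Pattern.compile("operation ended with failure"))); // prints "[4]"
    }
}
```

For a real log file you could pass the iterator of a java.nio.file.Files.lines(path) stream (opened in a try-with-resources block) instead of the in-memory sample.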
Check this one out - http://johannburkard.de/software/stringsearch/
Hope that helps ;)
I am currently writing a program which takes user input and creates rows of a comma delimited .csv file. I am in need of a way to save this data in a way in which users are not able to easily edit this data. It does not need to be super secure, just enough so that it couldn't accidentally be edited. I also need another file (or the same file?) created to then be easily accessible (in the file system) by the user so that they may then email this file to a system admin who can then open the .csv file. I could provide this second person with a conversion program if necessary.
The file I save data in and the file to be sent can be two different files if there are any advantages to this. I was currently considering just using a file with a weird file extension, but saving it as a text file so that the user will only be able to open it if they know to try that. The other option being some sort of encryption, but I'm not sure if this is necessary and even if it was where I would start.
Thanks for the help :)
Edit: This file is meant to store the actual data being entered. Currently the data is being gathered on paper forms which are then sent to the admin to manually enter all of the data. This little app is meant to have someone else enter the data from the paper form and then tell them if they've entered it all correctly. After they've entered it all they then need to send the data to the admin. It would be preferable if the sending was handled automatically, but this app needs to be very simple and low budget and I don't want an internet connection to be a requirement.
You could store your data in a serializable object and save that. It would resist casual editing and be very simple to read and write from your app. This page should get you started: http://java.sun.com/developer/technicalArticles/Programming/serialization/
From your question, I am guessing that the uneditable file's purpose is to store some kind of system config and you don't want it to get messed up easily. From your own suggestions, it seems that even knowing that the file has been edited would help you, since you can then avoid using it. If that is the case, then you can use simple checks, such as save the total number of characters in the line as the first or last comma delimited value. Then, before you use the file, you just run a small validation code on it to verify that the file is indeed unaltered.
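A small sketch of the character-count idea, assuming the count is stored as the last comma-delimited value of each line. The names seal and isIntact are hypothetical.

```java
public class LineLengthCheck {
    /** Appends the character count of the row as a final comma-delimited value. */
    public static String seal(String csvRow) {
        return csvRow + "," + csvRow.length();
    }

    /** Returns true if the last value equals the length of everything before it. */
    public static boolean isIntact(String sealedRow) {
        int comma = sealedRow.lastIndexOf(',');
        if (comma < 0) {
            return false; // no count field at all
        }
        String body = sealedRow.substring(0, comma);
        try {
            return Integer.parseInt(sealedRow.substring(comma + 1)) == body.length();
        } catch (NumberFormatException e) {
            return false; // the count field itself was damaged
        }
    }

    public static void main(String[] args) {
        String sealed = seal("Smith,John,42");
        System.out.println(sealed);                            // prints "Smith,John,42,13"
        System.out.println(isIntact(sealed));                  // prints "true"
        System.out.println(isIntact("Smith,Johnny,42,13"));    // prints "false"
    }
}
```

Note that a plain length only catches edits that change the number of characters; replacing one character with another goes unnoticed, so a real checksum would be the stronger variant of the same idea.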
Another approach may just be to use a ZIP (file) of a "plain text format" (CSV, XML, other serialization method, etc) and, optionally, utilize a well-known (to you) password.
This approach could be used with other stream/package types: the idea behind using a ZIP (as opposed to an object serializer directly) is so that one can open/inspect/modify said data/file(s) easily without special program support. This may or may not be a benefit and using a password may or may not even be required, see below.
Some advantages of using a ZIP (or CAB):
The ability for multiple resources (aids in extensibility)
The ability to save the actual data in a "text format" (XML, perhaps)
Maintain competitive file-sizes for "common data"
Re-use existing tooling support (also get checksum validation for free!)
Additionally, using a non-ZIP file extension will prevent most users from casually associating the file (a similar approach to what is presented in the original post, but subtly different because the ZIP format itself is not "plain text") with the ZIP format and being able to open it. A number of modern Microsoft formats utilize the fact that the file-extension plays an important role and use CAB (and sometimes ZIP) formats as the container format for the document. That is, an ".XSN" or ".WSP" or ".gadget" file can be opened with a tool like 7-zip, but are generally only done so by developers who are "in the know". Also, just consider ".WAR" and ".JAR" files as other examples of this approach, since this is Java we're in.
Traditional ZIP passwords are not secure, and a static password embedded in the program is even less so. However, if this is just a deterrent (i.e. not for "security") then those issues are not important. Coupled with an "un-associated" file type/extension, I believe this offers the protection asked for in the question while remaining flexible. It may be possible to drop the password entirely and still prevent "accidental modifications" just by using a ZIP (or other) container format, depending upon requirements/desires.
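As a sketch of the password-less variant, the following packs CSV text into a ZIP container using only java.util.zip (which has no password support, so that part is left out). CsvContainer, pack, and unpackFirst are made-up names, and the entry name and ".dat" extension are arbitrary choices for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CsvContainer {
    /** Wraps the CSV text in a ZIP archive holding a single entry. */
    public static byte[] pack(String entryName, String csvText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(csvText.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the archive is complete.
        return buffer.toByteArray();
    }

    /** Reads back the text of the first entry in the container. */
    public static String unpackFirst(byte[] container) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(container))) {
            if (zip.getNextEntry() == null) {
                throw new IllegalArgumentException("container has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing the result with something like Files.write(Paths.get("entries.dat"), CsvContainer.pack("entries.csv", csv)) gives a file that a text editor shows as binary, while the admin can still open it with any ZIP tool or with unpackFirst.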
Happy coding.
Can you set file permissions to make it read-only?
Other than writing a binary output file, NTFS, the file system that Windows runs on (I know for sure it works from XP through x64 Windows 7), has a little trick, known as alternate data streams, that you can use to hide data from anyone simply perusing your files:
Append your output and input files with a colon and then an arbitrary value, eg if your filename is "data.csv", make it instead "data.csv:42". Any existing or non-existing file can be appended to to access a whole hidden area (and every file for every value after the colon is distinct, so "data.csv:42" != "data.csv:carrots" != "second.csv:carrots").
If this file doesn't exist, it will be created and initialized to have 0 bytes of data with it. If you open up the file in Notepad you will indeed see that it holds exactly the data it held before writing to the :42 file, no more, no less, but in reality subsequent data read from this "data.csv:42" file will persist. This makes it a perfect place to hide data from any annoying user!
Caveats: If you delete "data.csv", all associated hidden data will be deleted too. Also, there are indeed programs that will find these files, but if your user goes through all that trouble to manually edit some csv file, I say let them.
I also have no idea if this will work on other platforms, I've never thought to try it.
I want to save a video file in C:\ by incrementing the file name, e.g. video001.avi, video002.avi, video003.avi, etc. I want to do this in Java. The program is on
Problem in java programming on windows7 (working well in windows xp)
How do I increment the file name so that it saves without replacing the older file?
With File.createNewFile() you can atomically create a file and find out whether the current thread actually created it: the method returns a boolean telling you whether a new file was created. Simply checking whether a file exists before you create it helps, but does not guarantee that the current thread is the one that created the file you then write to.
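A minimal sketch of how that could look for the video001.avi naming scheme, assuming three-digit numbers starting at 001. NextVideoFile and createNext are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

public class NextVideoFile {
    /**
     * Tries prefix001suffix, prefix002suffix, ... and returns the first file
     * that this call managed to create. createNewFile() is atomic, so two
     * callers can never both "win" the same name.
     */
    public static File createNext(File dir, String prefix, String suffix) {
        for (int i = 1; i <= 999; i++) {
            String name = String.format(Locale.ROOT, "%s%03d%s", prefix, i, suffix);
            File candidate = new File(dir, name);
            try {
                if (candidate.createNewFile()) {
                    return candidate; // we created it, so it is ours to write to
                }
                // false means the file already existed: try the next number
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalStateException("No free name left in " + dir);
    }
}
```

A call such as createNext(new File("C:\\"), "video", ".avi") returns an empty file that you can then open for writing; existing files are never replaced because a taken name is simply skipped.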
You have two options:
Just increment a counter, and rely on the fact that you're the only running process writing these files (and that none exist already), so you don't need to check for clashes. This is (obviously) prone to error.
Use the File object (or Apache Commons FileUtils) to get the list of files, then increment a counter and determine if the corresponding file exists. If not, then write to it and exit. This is a brute force approach, but unless you're writing thousands of files, is quite acceptable performance-wise.