With Java 11 on Windows, I can get info about my files using:
import javax.swing.filechooser.FileSystemView;
var type = FileSystemView.getFileSystemView().getSystemTypeDescription(file);
var icon = FileSystemView.getFileSystemView().getSystemIcon(file);
On Ubuntu (20.04) however, things are different. By now, I've figured out that the icon has a ToolkitImage inside instead of a BufferedImage, which is annoying because it's internal API, but I can render that now.
The remaining problem is the file type, which still returns null on Ubuntu when using the FileSystemView, or returns "Generic File" for every file when using new JFileChooser().getTypeDescription(file) instead.
How can I get a proper file type description on Ubuntu?
getFileSystemView is broken
A bold claim: Whatever you wanted this to do, it won't work. That's based on looking at the source. You can skip this section if you're willing to accept it's a dead end, but I'd best back up such a claim, so read on if you'd like to be convinced:
The sources I have here for both JDK11 and JDK17 take the following relatively simplistic approach for FileSystemView.getFileSystemView():
If the value of File.separator is \, return the windows implementation.
If the value is /, return the unix implementation (that'd therefore be just about everything else, notably including macs).
Otherwise return the generic implementation. Let's forget about this one: which OS has neither / nor \ at this point? Pre-MacOSX mac os is long dead, and that's the only one I can think of.
The unix implementation of getSystemTypeDescription is:
return null;
Oof. That's not going to get us very far. The windows implementation goes with ShellFolder, which is general code; I do not understand why the unix implementation just disregards it.
Perhaps this explanation makes the most sense: .getSystemTypeDescription is intended to return the opinion of the OS itself as to how one would describe the type of file this is. The reason the unix implementation just does return null; is simply that, as a concept, this isn't how unix works. The OS itself doesn't have some sort of registry that maps file extensions to names (such as windows' HKEY_CLASSES_ROOT\.txt and friends), nor does it have a concept where each file has its own metadata that contains additional info, such as 'which app created me' / 'which app should be used to open me when double clicked', the way MacOSX does. (Of course, if you do run this on a mac, you still get null, which really isn't excusable).
Of course, we now get into a more tricky debate: What is your OS, really. One could say 'well, its linux, and KDE, or GnomeDesktop or whatnot, well, that's just this app, you know'. But one could also say that you run the java app on the OS 'KDE/Linux'. In other words, what does System mean when we talk about FileSystemView. Evidently, the JDK impl source I'm looking at (which is OpenJDKs) chooses to define it as 'just linux', which has no such thing as the 'system's opinion on what type of file this is', making return null; a correct, but mostly useless, answer.
The getSystemIcon implementation of the abstract supertype itself is weird: It is a near carbon copy of the windows-specific implementation of getSystemTypeDescription - namely: Get the ShellFolder object, then ask it. I have no idea why on unix, 'just ask the shellfolder' is the implementation of getSystemIcon, whereas 'just return null' is the implementation of getSystemTypeDescription - why not also ask the shellfolder?
At any rate, even if you did, not much use there: The default shell folder implementation always returns null. This is sun.awt code so it is considerably more likely that the implementation of AWT for that specific platform overrides it, but this isn't in the openjdk sources as far as I looked, at any rate.
The default impl of getSystemIcon will return either a generic file icon or a generic folder icon (by invoking UIManager.getIcon("FileView.directoryIcon"), for example) if the ShellFolder returns null as an icon.
So let's give up on this implementation: Conclude it cannot help you.
Define 'type description'
What does that really mean? I can only foresee 3 useful takes on what this is supposed to mean:
Something that human eyeballs and brains will likely understand.
A mime type, which is a universal standard for describing file types.
"Whatever the window manager that the user is using would see in the local equivalent of a file explorer app - explorer.exe, on windows, Finder.app on mac, etc".
Presumably the getSystemTypeDescription is the method that is supposed to answer the 3rd option (the local window manager's description). But, given that OpenJDK doesn't actually implement this (well, it does, in a useless way, by just returning null), the only way you're getting that is if you put in the considerable effort to figure out how each and every popular window manager used worldwide does it and port it all over to java code. I assume you're not interested in doing that kind of work.
But the other 2 - there are ways to get that.
Let's start with mime types.
Plan A is to ask java:
import java.nio.file.*;

class Test {
    public static void main(String[] args) throws Exception {
        var p = Paths.get("test.otf");
        Files.createFile(p);
        System.out.println(Files.probeContentType(p));
        Files.delete(p);
    }
}
Save that to a file and run it: java Test.java (yay JDK11+ where you can just pass java files to the JVM executable), and see if it works. That is, it should print application/font-sfnt for you. It does, for me at any rate, with Corretto JDK17 (java -version: openjdk version "17.0.3" 2022-04-19 LTS) on Ubuntu 20.04.1.
Running it with Temurin 17 (JDK from the Adoptium project) on mac: font/otf. Oh well, that's embarrassing, perhaps. But it's not necessarily a bad answer. Unfortunately, the Mac's own Finder app has a 'type description' column and that's "OpenType® font", not "font/otf". Presumably macs have a mimetype to human readable description database someplace that as far as I know you can't access with generic java code. Still, "font/otf" is better than "an .otf file", presumably.
If the probe method isn't working for you, you can always choose to check if /etc/mime.types exists, which should exist on linuxen. For each line, .split("\\s+"); v[0] is the mime type, and the remaining elements are each an extension without the dot, e.g. my ubuntu install would list application/font-sfnt as being the mime type for types otf and ttf.
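As a rough sketch of that /etc/mime.types fallback (the class and method names here are made up for illustration, and skipping blank lines and lines starting with # is my assumption about the file format, not something stated above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds an extension -> mime type lookup from mime.types-style lines.
class MimeTypesFile {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> byExtension = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Assumption: blank lines and '#' comment lines carry no mappings.
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] v = line.split("\\s+");
            // v[0] is the mime type; every remaining element is an extension without the dot.
            for (int i = 1; i < v.length; i++) {
                byExtension.putIfAbsent(v[i].toLowerCase(), v[0]);
            }
        }
        return byExtension;
    }

    public static void main(String[] args) throws IOException {
        Path mimeTypes = Paths.get("/etc/mime.types");
        if (!Files.isReadable(mimeTypes)) {
            System.out.println("No readable /etc/mime.types on this system");
            return;
        }
        Map<String, String> map = parse(Files.readAllLines(mimeTypes));
        System.out.println(map.get("otf"));
    }
}
```

If an extension shows up under more than one mime type, this keeps the first one it sees; that is an arbitrary choice.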
Yet another alternative is to ship a known list of extension-to-mimetype mappings. For example, The Eclipse Jetty has a MimeTypes class that is pre-filled with this sizable list of known extensions.
Steve Jobs / flash the MIME gang-sign
If you're like Steve or the MIME consortium, this whole business of treating 'the stuff after the last dot in the file name' as somehow indicative of what kind of file it is leaves a bad taste in your mouth, and you'd like to avoid it. You can, sort of. On unixen anyway. Most unix installs have /usr/bin/file - both my mac and the ubuntu install I'm looking at have this. You can run it via ProcessBuilder (new ProcessBuilder(...).start()). This tool does not look at the file name at all, solely at the actual content. It might be slow (it reads the whole thing if it needs to), but, if I run it on an OTF file, it spits out:
actual-valid-font-file.tof: OpenType font data
which is certainly a string I could show to a user that's "prettier" than font/otf, though it isn't quite what a native mac app would show (which shows OpenType® font, as mentioned before).
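A minimal sketch of calling /usr/bin/file from Java, assuming the tool lives at that path. The class and method names are invented for this example. --brief drops the "filename:" prefix from the output, and --mime-type asks for a mime type instead of the prose description:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical wrapper around the file(1) command line tool.
class FileCommand {
    // Builds the command line; "--" stops option parsing so odd file names are safe.
    static List<String> command(Path file, boolean mimeType) {
        return mimeType
                ? List.of("/usr/bin/file", "--brief", "--mime-type", "--", file.toString())
                : List.of("/usr/bin/file", "--brief", "--", file.toString());
    }

    // Returns the tool's output, or null if it exited with a non-zero status.
    static String describe(Path file, boolean mimeType) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(file, mimeType))
                .redirectErrorStream(true)
                .start();
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
        return p.waitFor() == 0 ? output : null;
    }

    public static void main(String[] args) throws Exception {
        Path target = Paths.get(args[0]);
        System.out.println(describe(target, false)); // human-readable description
        System.out.println(describe(target, true));  // mime type
    }
}
```

Starting a process per file is not cheap, so for a large directory listing it may be worth caching results.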
On windows, where file (the filetype guesser application) isn't usually available, well, it sounds like FileSystemView.getFileSystemView().getSystemTypeDescription(file) actually works. I bet the number of systems where /usr/bin/file doesn't exist and getSystemTypeDescription returns nothing useful is infinitesimal.
Icons
Presumably you want the same thing here: Give me that icon which would be familiar to the user, which runs into the same issue, especially on linux: Each and every 'file explorer' app has its own icon set, and there are a lot of file explorer apps - just about every window manager ships their own version of it and there are a lot of linux window managers. I'm not sure any JVM impl out there has code to fetch the right icon out of all of those different window manager implementations, and I don't think there's a standardized way to accomplish this using just plain jane disk access, either.
But, we've established you can pick up the mime type (if using /usr/bin/file, there's the --mime-type option). My /usr/bin/file gives me application/vnd.ms-opentype, so that's 3 different mimetypes for the same thing already; boy, that whole XKCD comic of 'there are 14 different standards' comes up a lot, doesn't it?
Given a mime type, there are loads of icon sets out there, free and open source.
The Oxygen icons project is a FOSS iconset hosted on github with an icon (in various sizes) for a boatload of mimetypes. You could use .getSystemIcon first, and if that doesn't return a suitable answer (bit tricky; sometimes you get a generic 'its a file' icon which you might not want), then use an icon set. You won't be matching the Look-n-Feel of the platform, but then again if this question is really just "I want to write an app in swing that looks indistinguishable from the host OS, be it windows, mac, KDE, Gnome, Xfce, Cinnamon, Budgie, or Enlightenment", the only pragmatic answer pretty much has to be: "Just give up on that pipe dream".
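A sketch of the "use an icon set" half of that fallback. It assumes the icons are bundled on the classpath under a made-up /icons/mimetypes/ folder and are named after the mime type with the slash replaced by a dash (the freedesktop-style convention, which I believe Oxygen follows; check the actual file names in whichever set you ship). All names below are hypothetical:

```java
import java.net.URL;
import javax.swing.Icon;
import javax.swing.ImageIcon;

// Hypothetical lookup of a bundled icon for a given mime type.
class MimeIcons {
    // Assumed naming convention: "application/pdf" -> "application-pdf".
    static String iconName(String mimeType) {
        return mimeType.toLowerCase().replace('/', '-');
    }

    // Assumed generic fallback per media type: "text/plain" -> "text-x-generic".
    static String genericIconName(String mimeType) {
        int slash = mimeType.indexOf('/');
        String media = slash < 0 ? mimeType : mimeType.substring(0, slash);
        return media.toLowerCase() + "-x-generic";
    }

    // Returns the bundled icon, or null if neither the exact nor the generic icon exists.
    static Icon load(String mimeType) {
        for (String name : new String[] { iconName(mimeType), genericIconName(mimeType) }) {
            // "/icons/mimetypes/" is a placeholder resource path, not part of any library.
            URL url = MimeIcons.class.getResource("/icons/mimetypes/" + name + ".png");
            if (url != null) {
                return new ImageIcon(url);
            }
        }
        return null;
    }
}
```

The caller would try getSystemIcon first and only fall back to MimeIcons.load(...) when that result isn't good enough.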
NB: Hoi :)
The code for getSystemTypeDescription is
public String getSystemTypeDescription(File f) {
    return null;
}
which is overridden in WindowsFileSystemView, but not in UnixFileSystemView. Maybe JFileChooser suits your needs:
JFileChooser chooser = new JFileChooser();
String type = chooser.getTypeDescription(file);
I get a FileNotFoundException during the execution of getSystemIcon. Following the code, in the method getShellFolder there is this snippet
if (!Files.exists(Paths.get(file.getPath()), LinkOption.NOFOLLOW_LINKS)) {
    throw new FileNotFoundException();
}
so symbolic links are not followed, and maybe that's the issue. But again, JFileChooser works:
Icon icon = chooser.getIcon(file);
I'm trying to make a list of all the files and folders on a mounted NTFS Volume, and I made 2 ways to do it so far, all yielding different results (unfortunately).
(NOTE: I couldn't include additional sources here because link limit)
There are a few things I would like cleared up:
(1) How come certain files/folders have weird unrecognizable characters in the middle of the name? And how do I print them to a wstringstream, and then how would I properly write them to a wofstream?
Example file path: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https∺∯∯wscont.apps.microsoft.com∯winstore∯6.3.0.1∯100∯US∯en-us∯MS∯482∯features1908650c-22a4-485e-8e88-b12d01c84f2f.json.dat
How it appears if you were to use dir in cmd: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https???wscont.apps.microsoft.com?winstore?6.3.0.1?100?US?en-us?MS?482?features1908650c-22a4-485e-8e88-b12d01c84f2f.json.dat
How it appears if you were to use wprintf in C++: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https
The file name shows properly in windows explorer, but has trouble being printed in cmd. It appears as a box in notepad++, but if you right-click, it shows it properly, so notepad++ can also display the characters properly (sort-of, encoding change maybe?).
I'm currently using the following (ss is the stringstream, initialized as wstringstream ss("");)
wstringstream ss("");
(my program methods here)
wofstream out("...", wofstream::out);
out << ss.rdbuf();
out.close();
I'm assuming that the encoding has at least something to do with it, but at the same time, I'm not sure which flags to use.
(2) Are all files listed in the MFT?
Every link on NTFS says that all file information and attributes are stored in the MFT, but according to the open source NTFSLib (have a link limit, can be found by googling An-NTFS-Parser-Lib), there are 131840 file records.
When I run my own program, I end up with this 50MB file (includes permissions and the such). My program uses FSCTL_MFT_ENUM_USN_DATA and CreateFile for handles and GetFileInformationByHandle for getting extended information.
CreateFile takes in the WCHAR* normally, and doesn't have the weird null termination issues (I think, maybe, not even sure anymore, this might be where the missing files are).
It shows that there are 129454 files that it could read, I'm assuming that the other 131840-129454=2386 files are files that were deleted but are still in the USN journal.
(3) How come my Java version of the code outputs more file records than the MFT even contains?
The output of my Java code is a 150MB file (includes permissions, enumerates with names instead of symbols because I don't know how to not do that, so it's way bigger).
As you can see here, there are 161430 file records in this one. That's more than what NTFSLib said there are. Yes, it is the case that probably many of those 131840 file records are 'additional names', but I explicitly avoided symlinks in my Java version. Is it the case that those extra 30000 files are generated from hardlinks or somehow having more names is independent from being symlinks?
Solution to (1):
You must write your own library that can write UTF-16, since writing will sometimes run into cases where the characters are misaligned and the writer thinks there is a null. For example:
0xD00A may run into the 0x00 character during a misalignment and thus terminate the output.
I used the following two files to write out as unicode. Handles wchar_t, wchar_t*, char, char*, unsigned long, and unsigned long long:
UTF16.h,
UTF16.c
(2,3):
Yes, they're all there. You can find the number of links via the GetFileInformationByHandle function (the nNumberOfLinks field of the BY_HANDLE_FILE_INFORMATION it fills in), and counting those adds up to the number of files that the Java one contains.
Still looking for: How do you list the names of all the links to the file record in the MFT?
I have a main template that captures a string:
@(captured: String)
.... other templating stuff
I have a sub template that wants to utilize @captured:
.... somewhere in this templating stuff we have:
@subTemplate(@captured) <- wants to use @captured
I try this and I get nothing but errors. I'm sure this MUST be possible, so what am I doing wrong? I'm sorry if this question is simple, I just don't know how to succinctly phrase it for Google.
You need to remove the leading @ symbol on captured when it is being passed in as a variable.
e.g.
@subTemplate(@captured) --> @subTemplate(captured)
The reason is that @ is a special symbol that tells Play that the template engine is about to do some computation, rather than just outputting HTML. In the case above, by calling the sub template, you have already started a computation (i.e. used the @ symbol), so you do not use it again inside the parentheses, because the compiler is already in computation mode.
This was exactly the same in the Play 1.x template engine.
Remove the leading 'at' in @captured. For some odd reason, Play didn't wanna pick up on this and make it work until now. Seeing if I can reproduce the problem.
When printing with Java, one can select the media tray (within PrintRequestAttributeSet). Then one can pass this setting to a print job and have one's document printed to the given tray.
My question is now: can I somehow specify that the first page is printed to one tray and the second one to another tray within one print job?
I'm reluctant to creating two separate print jobs, because my usage scenario is a mass-print, of say 1000 documents. Each document has some pages going to tray 1 and some pages going to tray 2.
If I have to create a new print job on each switch of trays, I would end up with several thousands of separate print jobs and I'm afraid of all sorts of print-spooler overruns and system crashes. Thus my preference to somehow sneak those "switches of tray" into one print job.
I'm pretty sure that it can be done somehow, but I haven't succeeded so far.
I thought about creating those thousands of PrinterJobs, but having them print to a StreamPrintService (instead of an actual print service), thus capturing the switches of tray along with the actual printing data. Then I was planning to concatenate the results of those single "virtual" prints and send it all to a real printer in one real print job.
However, with java 1.6 there seems to be only one StreamPrintService, which can only output postscript.
So: is there a way to capture the raw, native output from a native printer driver (using java)? Does it seem practical to you to concatenate that output and send it to the printer, in order to solve my problem?
I would also be glad about comments regarding only parts of the problem.
By adding a "Destination" attribute to one's print attribute set, the print can be redirected to a local file. That file contains the printjob in whatever language the actual printer's driver uses to talk in.
In my case, I ended up with postscript files.
I created two postscript files, each printing from a different tray, and then sent their concatenation to the printer. And it worked :-) ! I switched off the printer to verify that there is only one print job, and I wrote some numbers on the paper in those trays. So I guess I can be sure that it's not only wishful thinking ;-).
However, I think I won't pursue this topic in depth, because
I'm not a printing guru and have my doubts that this approach works in every case.
For our customer, the whole thing is a nice-to-have at this time, so there won't be a budget to research further the circumstances under which this little idea works.
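For anyone who wants to try the same experiment, here is a rough sketch of the two pieces described above: an attribute set that selects a tray and redirects the job to a file via Destination, and a plain byte-wise concatenation of the resulting files. Class and method names are invented for the example. Whether a given driver honours Destination, and whether concatenated output is valid for your printer, are open questions; how the combined file is finally sent to the printer is not covered here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.Destination;
import javax.print.attribute.standard.MediaTray;

// Hypothetical helper for the "print each part to a file, then concatenate" idea.
class TraySpool {
    // Attributes for one partial job: feed from the given tray, write output to spoolFile.
    static PrintRequestAttributeSet toFile(Path spoolFile, MediaTray tray) {
        PrintRequestAttributeSet attrs = new HashPrintRequestAttributeSet();
        attrs.add(tray);
        attrs.add(new Destination(spoolFile.toUri()));
        return attrs;
    }

    // Appends the spool files, in order, into one combined file.
    static void concatenate(List<Path> parts, Path combined) throws IOException {
        try (OutputStream out = Files.newOutputStream(combined)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

The attribute set would then be passed to the print call, for example PrinterJob.print(attrs), once per partial job with a different spool file and tray each time.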
I need to analyze a log file at runtime with Java.
What I need is, to be able to take a big text file, and search for a certain string or regex within a certain range of lines.
The range itself is deduced by another search.
For example, I want to search the string "operation ended with failure" in the file, but not the whole file, only starting with the line which says "starting operation".
Of course I can do this with plain InputStream and file reading, but is there a library or a tool that will help do it more conveniently?
If the file is really huge, then in your case either well-written Java or any *nix tool solution will be almost equally slow (it will be bound by IO). In such a case you can't avoid reading the whole file line by line, and a few lines of Java code would do the job. But rather than a one-off search, I'd think about splitting the file at generation time, which might be much more efficient. You could redirect the log file to another program/script (either awk or python would be perfect for it) and split the file on-line, as it is generated, rather than after the fact.
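Those few lines of Java might look roughly like this. It is only a sketch: the class and method names are made up, and it assumes the range starts at the first line matching the start pattern and runs to the end of the file. Lines are streamed, so the whole file is never held in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: finds target matches that occur after the first start match.
class LogRangeSearch {
    // Returns the 1-based line numbers of lines matching target, counted only
    // after the first line that matches start (that line itself is not tested).
    static List<Integer> findAfter(Stream<String> lines, Pattern start, Pattern target) {
        List<Integer> hits = new ArrayList<>();
        boolean inRange = false;
        int lineNo = 0;
        for (Iterator<String> it = lines.iterator(); it.hasNext(); ) {
            String line = it.next();
            lineNo++;
            if (!inRange) {
                inRange = start.matcher(line).find();
                continue;
            }
            if (target.matcher(line).find()) {
                hits.add(lineNo);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Files.lines reads lazily; close the stream when done.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            System.out.println(findAfter(lines,
                    Pattern.compile("starting operation"),
                    Pattern.compile("operation ended with failure")));
        }
    }
}
```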
Check this one out - http://johannburkard.de/software/stringsearch/
Hope that helps ;)