The File Filesystem (2021) (mgree.github.io)
346 points by wegwerff 21 days ago | 101 comments



Oh this is cool! I recently wrapped libfuse in Nim, and after porting the 'hello' filesystem example I made one that is more or less exactly this. However, with my version you pipe data in and have to provide a mountpoint; then, when it's done, it writes the result to stdout. That means you can inline it in a pipe chain, but also that you have to make sure to grab the output.

At the moment I'm exploring other stuff that could be made into filesystems. I've got a statusbar thing for the Nimdow window manager that lets you write contents to individual files, and it outputs a bar with one block per file. It makes it super easy to swap out what is on your bar, which is pretty neat.
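A minimal sketch of the files-to-bar idea, assuming a hypothetical layout where every regular file in one directory is a block, blocks are ordered by file name, and only the first line of each file is shown. This is not how the statusbar project is actually structured; it only illustrates the mapping.

```python
from pathlib import Path


def render_bar(block_dir: str, separator: str = " | ") -> str:
    """Join the first line of every file in block_dir, ordered by file name.

    Hypothetical layout: one file per block; the name only controls ordering,
    empty files are skipped and subdirectories are ignored.
    """
    parts = []
    for entry in sorted(Path(block_dir).iterdir()):
        if not entry.is_file():
            continue
        text = entry.read_text()
        first_line = text.splitlines()[0] if text else ""
        if first_line:
            parts.append(first_line)
    return separator.join(parts)
```

Updating a block is then just `echo 87% > blocks/05-battery` from any script.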

Another tool I've made is a music player. It uses libvlc, and when given a folder it reads all the media with ID3 tags and sets up folders like 'by-artist', 'by-album', etc. Each file is named '<track number> - <song title>' and contains the full path to the actual file. To play a song you cat one of these files into 'control/current' and write the word play to 'control/command'. There's a bit more to it, like a playlist feature and some more commands, but that's the basic idea. The goal is to have a super-scriptable music player.
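The "cat a track file into control/current, then write play to control/command" workflow needs no player-specific client. A sketch, assuming the player is mounted at some hypothetical mount directory and that the control files accept plain writes as described above:

```python
from pathlib import Path


def play_track(mount: str, track_file: str) -> None:
    """Same as:  cat "$MNT/<track_file>" > "$MNT/control/current"
                 echo play > "$MNT/control/command"
    """
    root = Path(mount)
    # Each track file's content is the full path to the real media file.
    target = (root / track_file).read_text()
    (root / "control" / "current").write_text(target)
    (root / "control" / "command").write_text("play\n")
```

For example, `play_track("/mnt/music", "by-album/Some Album/01 - Intro")`, where the mount point and names are placeholders.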


Here's an idea: recursively mount code files/projects. Use something like tree-sitter to extract class and function definitions and make each into a "file" within the directory representing the actual file. Need to get an idea for how a codebase is structured? Just `tree` it :)

Getting deeper into the rabbit hole, maybe imports could be resolved into symlinks and such. Plenty of interesting possibilities!
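Tree-sitter would be the multi-language route. As a smaller stand-in, this sketch swaps in Python's built-in `ast` module (Python source only) to produce the slash-separated paths such a filesystem might expose for one file; `module.py` is a placeholder name. Only definitions at module level or nested inside other definitions are picked up, not ones under `if` blocks and the like.

```python
import ast


def outline(source: str, filename: str = "module.py") -> list:
    """List class/function definitions as 'file/Class/method' style paths."""
    paths = []

    def walk(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                path = f"{prefix}/{child.name}"
                paths.append(path)
                walk(child, path)  # nested definitions become subdirectories

    walk(ast.parse(source), filename)
    return paths
```

A FUSE layer could then serve each path as a directory and the definition's source text as a file inside it.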


Would you mind sharing the Nim code? I've been interested in working with FUSE for a while, and use Nim for a few projects.

No worries if not, I'm just curious!


Not PMunch, but bindings¹ and statusbar².

Nimble has a couple of fuse projects and wrappers registered too.

¹ https://github.com/PMunch/libfuse-nim

² https://github.com/PMunch/statusbar


The audio player is unfortunately not on GitHub yet, I've still got a few kinks to work out before it's in a shareable state. The statusbar project was also shared mostly so the other Nimdow users could play around with it, so the code quality is quite sub-par.


JNRowe beat me to it! Feel free to browse the rest of my GitHub; it's mostly Nim code. And if you want to talk Nim or FUSE you can join the Nim Discord server (or the Matrix or IRC bridge) or post on the Nim forum.


This makes me think, it would be nice if there was an easy built-in way to expose information about a process using the filesystem. Something like "cat /proc/$pid/fs/current_track" to get a name of a current song from a music player, or "ls /proc/$pid/fs/tabs" to list open tabs in my browser (and maybe use this to grab the html or embedded images).

I mean right now it's possible to do this using FUSE, but that's convoluted and nobody does it.


Can you actually write to the /proc directory? Working with libfuse I've had much of the same ideas, basically allow programs to expose information as files. The beauty of the fuse system is also that you only have to respond to requests when they happen. So you don't have to actually create all these files until someone asks for them. Another idea I've had is to expose the configuration of a program through a filesystem. Instead of having a config file and a refresh command or a complicated IPC you could simply write to files to change the config.


Useful enough that it should be an OS-level standard feature, imho.

Unix-like OSes allow mounting disk images to explore their contents. But there are many more file formats where exploring files-inside-files is useful. Compressed archives, for one. Some file managers support those, but (imho) the application level is not the optimal layer to put this functionality.

Could be implemented with a kind of driver-per-filetype.


Really what you'd like to see is a way to write the mount command for each file type (do one thing well) and another command to detect the file type and dispatch accordingly (probably similar to the `file` command), all in user space.

The only thing standing in the way of this today is that MacOS doesn't expose a user space file system API. You can do this on Linux, Windows, and BSDs today.

(No, file provider extensions don't cut it, Apple devs who read this, please give us a FUSE equivalent, we know it exists).
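The "detect the type, then dispatch to a per-format mounter" split from this comment can be sketched in user space by sniffing magic bytes, the way `file` does. The tool names are ones mentioned elsewhere in this thread; the format-to-tool mapping and the `tool IMAGE MOUNTPOINT` argument order are assumptions to check against each tool's documentation, so the sketch only builds the command line rather than running it.

```python
# (offset, magic bytes, mounter); the mapping is an illustrative assumption.
SIGNATURES = [
    (0, b"PK\x03\x04", "fuse-zip"),          # zip local file header
    (0, b"\x28\xb5\x2f\xfd", "ratarmount"),  # zstd frame magic
    (0, b"\x1f\x8b", "archivemount"),        # gzip
    (257, b"ustar", "archivemount"),         # tar header magic
]


def mount_command(image: str, mountpoint: str) -> list:
    """Pick a mounter by magic bytes and return its argv (not executed)."""
    with open(image, "rb") as f:
        head = f.read(512)
    for offset, magic, tool in SIGNATURES:
        if head[offset:offset + len(magic)] == magic:
            return [tool, image, mountpoint]
    raise ValueError(f"no mounter registered for {image}")
```

Adding a format is one more table entry, which keeps each mounter doing one thing well.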


I want stuff to work like ZipMagic did in the early 90s/ early 2000s.

You could cd into zip files, they would act as directories and files at the same time.

I seem to remember Linus saying a file could act like a directory in Linux a long time ago too.

Though I don't think Linux has filesystem filters like Windows does, so implementation might be trickier.


Does https://osxfuse.github.io/ cover this? Or is there some fundamental issue? (beyond "it's not built in")


Recent macOS versions do have a general purpose built-in API for user mode filesystems. That API is incompatible with FUSE. The big problem is it is undocumented and you need an entitlement from Apple to use it, and Apple won’t give you that.

Apple do have a publicly available API for cloud file systems (Dropbox-style products), but it makes a lot of assumptions which makes it effectively unusable for other use cases.

Then there are third party solutions like osxfuse. These have the problem that they rely on kernel extensions and Apple keeps on making those harder and harder, and is aiming to get rid of them; plus, they are all now proprietary licensed, albeit often with a free license for open source use.

One approach that does work without any kernel extensions or private APIs is to make your user filesystem an NFS server and then mount that. One competitor to osxfuse does that, but it is also proprietary.


According to the MacFUSE author, their specific approach is not actually undocumented:

> Apple has put it in an umbrella called "unsupported" (in the kernel interfaces section) ... either Apple will not take this interface away, and if they do, it will be to provide a better interface

http://preserve.mactech.com/articles/mactech/Vol.23/23.03/Ma...


Yes, that’s not what I’m talking about though

MacFUSE uses the kernel mode VFS API

I’m talking about the undocumented user mode filesystem LiveFS/UserFS/com.apple.filesystems.lifs API which was added in Monterey (macOS 12), and since Ventura (macOS 13) is used to implement the OOTB FAT and exFAT filesystem support. Using that requires private entitlements (e.g. com.apple.private.LiveFS.connection) which Apple (thus far) won’t give to anyone else


What about the File Provider API?


It is designed for the cloud storage use case (Dropbox, Google Drive, etc.): it creates local copies of files and syncs them with remote ones. Not what you want to do in the general case.


I take it from the downvotes people think what I said is factually wrong?

If that's the case, I wish someone would point out where I've got it wrong instead of just silently downvoting


> it makes a lot of assumptions which makes it effectively unusable for other use cases

What use-cases are foreclosed and why?


Well that requires a kext so it's a nonstarter, and fuse-t uses NFS which is extremely janky and unreliable on MacOS.

The fundamental issue is that macOS doesn't provide an API for this natively.


> fuse-t uses NFS which is extremely janky and unreliable on MacOS

I was wondering what issues you were talking about, and then I found this - https://github.com/macos-fuse-t/fuse-t/issues/45 - data corruption

> The fundamental issue is that macOS doesn't provide an API for this natively.

The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it. I don’t understand why Apple won’t.

Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?


> The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it.

If no one can call it, it's not an API; it's an implementation detail. And I don't even think it's exposed by headers, just alluded to by people who claim APFS is implemented in user space.

> I was wondering what issues you were talking about, and then I found this

Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request. That's unacceptable for a user space file system (although FUSE is only kinda better, in that it can force processes that read from the FS into uninterruptible sleep that prevents them from being killed).

> Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?

Because Apple doesn't give a fuck about developers. Every developer will eventually learn this, but for those that haven't: Apple doesn't want you writing software for their platform, unless you're an Apple employee and on an Apple team paid to do it. It's why their docs suck, it's why to learn anything you need to watch ADC videos instead of reading manpages, and it's why all the cool stuff is behind protected entitlements that you can't get or will be limited in using.


No, it's almost certainly not because they don't give a fuck about developers. They definitely do.

It's much more likely that they want to:

a. Dogfood the API using internal use cases first when they can still make changes to the API without breaking anything. Note that the latest MacOS releases moved some filesystems into userspace using this new API. They probably learned some stuff by doing that.

b. Work out how to protect system stability from crappy userland filesystems. As you point out, bugs in FUSE providers can hang apps.

c. Work out how such an API interacts with their sandboxing system and how to avoid FUSE-style filesystems being used to subvert the sandbox. This is a common source of exploits in FUSE-style systems and is one of the key learnings from GNU/Hurd: UNIX software is written on the assumption that filing systems aren't malicious and invalidating that assumption creates new bug classes.

d. Work out what the most important use cases are and try to ensure those use cases will have a good or at least uniform UX first.

Providing a FUSE-like API is presumably also just not a high priority. By far the most common use case in terms of number of users is the Dropbox use case. FUSE is mostly used for toys and experiments beyond that (like filefs). Those matter and I'm sure there are friendly geeks on the Darwin team who'd like to enable those, but Linux also works for exploration. Certainly Apple management would not be happy about an engineer who decided to enable nerd experimentation but undermined the security system whilst doing so.

And it's worth remembering that you can have root on macOS. It means disabling SIP and adding a kernel boot arg, but that only takes a few minutes and then you can grant apps any entitlements you like:

https://github.com/osy/AMFIExemption

That's no good for people who aren't developers, but most FUSE filesystems are designed for developers anyway.


> Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request.

I wonder if their SMB/CIFS client implementation has these kinds of issues? It probably gets used more heavily

> And I don't even think its exposed by headers

Apple (accidentally?) released some of the private headers for this feature in one of their open source releases: https://github.com/apple-oss-distributions/msdosfs/blob/rel/...


Maybe? It's kind of hard to tell. It's not exactly easy to write any of these servers from scratch to find out. But I wouldn't be surprised - they want app developers to be using the file provider extension API, which is unsuitable for everyone who isn't making a Dropbox clone.

That link is very interesting. It doesn't smell like any other Apple API as they're exposing a vtable with good documentation comments. It would be interesting to hack with this with SIP disabled to see how it works. I'm especially curious about how mount/unmount work and how the plugin registers itself with the OS, or what application is the client/host.


> why don't they

It would make macOS more of a general-purpose OS, would increase the amount of functionality from which third parties would benefit, but Apple themselves would likely not. That would increase the number and variety of tech support requests, ever so slightly but still, and would introduce a few new attack surfaces.

Instead, Apple's strategy is to tighten the macOS more and more, and turn it into a specialist OS completely controlled by Apple, with a few companies like Adobe and Ableton licensing access to its internals.


I've been using OSX since 2003, and developing on it for more than ten years. At no point have I seen anything that it's reasonable to call "tightening macOS", let alone the absurd claim of complete control except for an inner circle of elite companies.

The closest thing would be adding the attestation system, so that unsigned binaries have to be explicitly given permission to run... once. That's a security feature which trades a bit of convenience for a lot of protection, especially for the average user. I have no problem with that sort of thing.

I see this sort of sentiment very frequently from non-users of the operating system, but never from those of us who actually use it. Go figure.


Apple used to be a much more developer-friendly company. It is part of what got them where they are now: the fact that so many developers use Macs, which in turn encourages business software vendors to support Macs.

Stuff like this is of little interest to ordinary users (at least not directly), but appeals to developers

By de-emphasising the developer experience, they are undermining one of the factors that got them to where they are today.


This already exists - avfs[0] does this as a FUSE filesystem. It's not the most intuitive to use, but it works, and is extensible.

[0] https://avf.sourceforge.net/


This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it, probably not helped by the whole murdering-his-wife thing.


> This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it

IIRC, because it contained these strange beasts which functioned as both files and directories - i.e. cat would return data, but then you could cd into them and run ls. Linus (among others) didn’t want to permit those violations of the file-directory dichotomy into the Linux kernel.


Oh, that's funny. I remember a much, much earlier Linus mentioning how this would be possible in Linux; I didn't know anyone actually did it.

I think you really should be able to "cd" into any kind of structured data.


Honest question: How is this useful? I don’t see any use-case where this would come in handy.


It allows you to use any tool available for regular files, on the files-in-files as well.

As opposed to extracting the contents and then working on them (requiring extra steps + disk space), or being limited to what specialized utilities support.


We used this in Gitlab CI. Unfortunately, the only way they deal with artifacts is by putting them in Zip files. Cache between builds would thus be stored as a Zip file. However, fully extracting it before each build would sometimes take as much, if not more time than to just build fresh. Mounting a Zip file as a filesystem allows extracting entries on-demand, at the time a file access would've been made. This was a notable speedup in our compilation process.
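The on-demand behaviour described here can be illustrated in-process with Python's `zipfile`: listing names only reads the archive's central directory, and a member is decompressed the first time it is read. A FUSE mounter can apply the same idea per `open()`/`read()`; the `LazyZip` class is a hypothetical illustration, not how any particular mounter is implemented.

```python
import zipfile


class LazyZip:
    """Decompress zip members on first access instead of extracting everything."""

    def __init__(self, path_or_file):
        self._zip = zipfile.ZipFile(path_or_file)
        self._cache = {}

    def listdir(self) -> list:
        # Only the central directory is read here; nothing is decompressed.
        return self._zip.namelist()

    def read(self, name: str) -> bytes:
        if name not in self._cache:
            self._cache[name] = self._zip.read(name)  # decompress this member only
        return self._cache[name]

    @property
    def extracted(self) -> int:
        """How many members have actually been decompressed so far."""
        return len(self._cache)
```

If a build touches only a few cached objects, only those few are ever inflated, which is where the speedup comes from.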


tar is what you're looking for, no?


It was a while ago, and I haven't used Gitlab in a few years. Maybe they've added TAR as an option since, but Zip was the only option at that time.


It reduces the code required to convert from N producers to M consumers from N × M to N + M, because they're all reading from and writing to a well-understood common form.
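A worked instance of that count, with an illustrative 4 producers and 5 consumers (numbers chosen only for the example):

```latex
\underbrace{N \times M}_{\text{pairwise converters}} = 4 \times 5 = 20
\qquad \text{vs.} \qquad
\underbrace{N + M}_{\text{via a common form}} = 4 + 5 = 9
```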


You could seed compressed archives of massive text files or similar via BitTorrent while making the contents available to your apps in read-only mode.


Exporting data to some format would be easy.


I thought archivemount already did that. Am I missing something?

Anyway, even if that's not what you are looking for, FUSE is a more general mechanism that will allow you to do what you want (well, it seems like, at least) and much more.


It exists :-)

For zip archives, there are fuse-zip and mount-zip, which are FUSE filesystems.

As an intermediate between the OS level and the application level, there is the desktop-environment level: gvfs for GNOME and KIO for KDE, but they are only compatible within their own ecosystems.


Would be nice to have something that integrates with 7z - it supports a lot of weird archive types, including "weird" ones I care about (for example PE files, better known as ".exe files").


Or zstd. I have some dd blobs of partitions, the blobs are zstandard-compressed, would like to mount them.


Ratarmount also works for that. However, it, like every other tool I know of, only works well if the file was compressed with pzstd, because of a limitation of the zstd format: it needs separate zstd frames for fast seeking.


ratarmount for tar files.


> Compressed archives, for one

You can look inside of archives pretty easily with `lsar` (part of the unar package). It works with disk images like ISO 9660 files too

But yes, especially for nested archives, having deeper OS support would be nice.


It's not exactly the same, but nushell provides ways of exploring inside files.



All you need now is a giant pile of rules for which revisions to select and you have the unholy demon that is Rational ClearCase


I always thought Oracle ADE was a cooler demon. Shame the internal talk about productising it never went anywhere.


What!? you didn't add a versioned database layer on a server with code stored in clearcase that stored those ClearCase config specs to manage the configuration of your config specs to manage the configuration of your version control system that had your application configuration in it?! How did you even operate? /s


This is really neat, but when I saw the headline I got excited that it was something I have been looking for / considering writing, and I figured the comments here would be a good place to ask if something like this exists:

Is there a FUSE filesystem that runs in-memory (like tmpfs) while mounted, and then when dismounted it serializes to a single file on disk? The closest I can find are FUSE drivers that mount archive files, but then you don't get things like symlinks.


Closest I found: https://github.com/guardianproject/libsqlfs

> The libsqlfs library implements a POSIX style file system on top of an SQLite database. It allows applications to have access to a full read/write file system in a single file, complete with its own file hierarchy and name space. This is useful for applications which needs structured storage, such as embedding documents within documents, or management of configuration data or preferences. Libsqlfs can be used as an shared library, or it can be built as a FUSE (Linux File System in User Space) module to allow a libsqlfs database to be accessed via OS level file system interfaces by normal applications.


Not purely in-memory, but something like https://github.com/jrwwallis/qcow2fuse maybe? It's clunky compared to OSX's DMGs, but if you squint it achieves similar ends.

Otherwise you could achieve this with a tmpfs wrapped to serialize to a tarball (preserving symlinks) when unmounted.
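A minimal sketch of the "serialize a tmpfs to a tarball on unmount" idea using Python's `tarfile`, which stores symlinks as links by default. `save_tree` and `load_tree` are hypothetical names, and hooking them to real mount/unmount events (e.g. from a wrapper script) is left out.

```python
import tarfile
from pathlib import Path


def save_tree(src_dir: str, archive: str) -> None:
    """Serialize a directory (e.g. a tmpfs mount) into one compressed file."""
    with tarfile.open(archive, "w:gz") as tar:
        for child in sorted(Path(src_dir).iterdir()):
            # add() recurses into directories; dereference defaults to False,
            # so symlinks are stored as links rather than as their targets.
            tar.add(str(child), arcname=child.name)


def load_tree(archive: str, dest_dir: str) -> None:
    """Recreate the tree, e.g. in a fresh tmpfs, before using it again."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and links escaping dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

As noted elsewhere in this thread, anything not yet saved is lost on a crash, so a periodic `save_tree` may be worth adding.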


Oh nice, I didn't even know that existed. I've been using qemu-nbd and parted by hand and it gets cumbersome, so this might help a lot. Thanks!


Why does it have to be in memory?

I’m sure you’re already aware of this, but there are all kinds of very real scenarios that could lead to corrupted data if you’re only flushing the buffer upon unmounting.

Sounds like you’ve got an interesting problem you’re trying to solve though.


Does it have to be FUSE? Can't you mount a disk image with loopback?


Wouldn't a disk image have a fixed size? It could be a pain to resize.


qcow2 is auto-expanding


But not trivially loop-mountable. I guess it's possible, though. https://unix.stackexchange.com/questions/268460/how-to-mount...


But will the disk image be fully stored in memory? No.. not with loopback. Either that, or it won't be mutable in memory with commit on unmount.


Put the disk image inside a ramdisk and it's in memory. Write a script for saving to physical disk when dismounting, and you're done.


I can't think of anything _exactly_ like that, but I think you can get close by just copying some type of image file to /tmp and then moving it to disk when you're done after unmounting.


/tmp isn't stored in memory; it's usually a normal on-disk filesystem that's cleared regularly. You want /dev/shm instead, which is a purely in-memory filesystem on normal Linux systems.


> /tmp isn't stored in memory

It is if your system uses tmpfs for /tmp

https://en.wikipedia.org/wiki/Tmpfs


The point they were trying to make is that it doesn't have to be, and it isn't in several of the Linux systems I've used over the years. Assuming that it is is a bad idea.


/dev/shm always is though
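Since whether /tmp is tmpfs varies by system, it is safer to check than to assume. A Linux-only sketch that reads the mount table from /proc/mounts and picks the longest mount point containing the path; it does not resolve symlinks or decode the octal escapes (such as `\040` for a space) that /proc/mounts uses in paths.

```python
import os
from typing import Optional


def fs_type(path: str, mounts_text: Optional[str] = None) -> str:
    """Filesystem type backing `path` according to the Linux mount table."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    path = os.path.abspath(path)  # note: symlinks are not resolved
    best, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        under = path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/")
        # Longest matching mount point wins; ">=" lets a later mount on the
        # same directory shadow an earlier one.
        if under and len(mountpoint) >= len(best):
            best, best_type = mountpoint, fstype
    return best_type
```

On a given machine, `fs_type("/tmp")` answers the question directly instead of relying on distro defaults.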



Reminds me of Omar Rizwan's TabFS <https://omar.website/tabfs/>


Ha. I did this back in 2003. It's surprisingly fast, and makes it simple to do granular locking.

I used it as a per-user database for a web-templating language for a giant web-site building tool.


When I saw the title I thought it was a meme.

But wow, what a clever idea. Not sure I'd ever need to reach for it personally, as I do most data processing in a higher-level language, but I can imagine people finding use cases.

Nice out of the box thinking


This looks awesome, I need to give it a try asap. I can very well see myself using this to navigate or search inside JSON files


If you're intrigued by this then Solvent-Configr might be of interest: https://aws.amazon.com/marketplace/pp/prodview-i3ym46leenag4

It uses file system mechanics to model objects, meaning you can design object based solutions that support file system style navigation.

Demo: https://youtu.be/XgTgubZQPHw


What happens if your JSON key has a slash?


It's an interesting idea, but I think its usefulness would be greatly enhanced if it could handle JSON arrays; in my experience, most JSON structures I need contain array elements.


per an issue ticket[1], it can:

setfattr -n user.type -v list DIR   # DIR: the directory to mark; use xattr on macOS

[1]: https://github.com/mgree/ffs/issues/66


Hmm, this opens the possibility to also commit these files as directory structures. I wonder how this would affect merges and conflicts.


Neat. Now how about a filesystem that takes a directory of files and exposes it as a single json file? You could call it the Filesystem File, and mount it in the File Filesystem if you wanted...


this is cool but it's fuse.. which is not so cool.

These days I reach for jq. I've recently become interested in DuckDB too.

Using a tool that is specialized for the format is usually more ideal than a generic one treating everything as files.

There is a lot in JSON that can't be represented in a flexible way as files and directories. For instance, what if a key has "/" in it? What happens to lists and their order when you re-serialize? How are hashes represented? How can you tell if a parent is an object or a list? Inserting an item into a list is a ton of error-prone renames... the list goes on.

(edited for formatting)
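To make those questions concrete, here is one possible JSON-to-directory encoding. It is a hypothetical scheme, not the one ffs uses: keys are percent-encoded so "/" becomes `%2F`, list elements get zero-padded index names so `ls` keeps their order, and a marker file named `@type` records whether a directory was an object or a list (the ffs issue linked above suggests an extended attribute for this instead). Object key order is not preserved, and empty, "." and ".." keys are simply rejected.

```python
import json
from pathlib import Path
from urllib.parse import quote, unquote

TYPE_FILE = "@type"  # "@" is always percent-encoded in keys, so no clash


def encode_key(key: str) -> str:
    if key in ("", ".", ".."):
        raise ValueError(f"key {key!r} has no safe filename in this sketch")
    return quote(key, safe="")  # "/" -> "%2F", "%" -> "%25", "@" -> "%40"


def dump(value, path: Path) -> None:
    if isinstance(value, dict):
        path.mkdir()
        (path / TYPE_FILE).write_text("object")
        for key, child in value.items():
            dump(child, path / encode_key(key))
    elif isinstance(value, list):
        path.mkdir()
        (path / TYPE_FILE).write_text("list")
        width = len(str(max(len(value) - 1, 0)))
        for i, child in enumerate(value):
            dump(child, path / str(i).zfill(width))  # zero-pad so `ls` keeps order
    else:
        path.write_text(json.dumps(value))  # scalars keep their JSON type


def load(path: Path):
    if path.is_file():
        return json.loads(path.read_text())
    kind = (path / TYPE_FILE).read_text()
    children = [p for p in path.iterdir() if p.name != TYPE_FILE]
    if kind == "list":
        return [load(p) for p in sorted(children, key=lambda p: int(p.name))]
    return {unquote(p.name): load(p) for p in sorted(children)}
```

Even here, inserting into the middle of a list means renaming every later element, which is exactly the error-prone part the comment points out.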


This reminds me of [pry](https://pry.github.io/) for Ruby that allows you to cd into other scopes


For fuck's sake! Not everything needs to be a file!

(This is a joke. I love the idea and execution.)


Daniel Stenberg, of cURL fame, co-wrote an Amiga editor called FrexxEd where the open buffers were exposed as files in the filesystem.

Meaning you could write any shell script to manipulate an open buffer (not that important, as it also exposed all editor functionality both via IPC through ARexx and via FPL, a C-like scripting language), and that you could e.g. compile without saving. That was very helpful on a system where a lot of people might only have a single floppy drive: being able to keep the compiler in the drive and compile straight from the in-memory version in RAM meant you didn't have to keep swapping floppies (just remember to save before actually trying to run the program; no memory protection...).


Classic Mac OS in the 80s had MPW (Macintosh Programmer's Workshop), which treated open text windows as files and selections within the windows as files, so it wasn't uncommon for an otherwise ordinary documentation file to contain a "click here and hit enter" portion, which would use the selected text as stdin for some semi-ported Unix tool. (No memory protection or multitasking, so true pipelines with backpressure didn't work.)


BBEdit has something similar: you can select text and run a Unix command on the selection via temp files. Very useful.
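A sketch of that selection-through-a-temp-file workflow, assuming the command takes the temp-file path as its last argument (an assumption; some tools would rather read stdin). It is not BBEdit's actual implementation.

```python
import os
import subprocess
import tempfile


def run_on_selection(selection: str, argv: list) -> str:
    """Write the selected text to a temp file, run argv on it, return stdout."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(selection)
        # The temp file path is appended as the command's last argument;
        # a non-zero exit status raises CalledProcessError.
        result = subprocess.run(argv + [path], capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(path)  # never leave the selection lying around
```

For example, `run_on_selection(selected_text, ["sort"])` would return the selected lines sorted.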


I want a web browser that does that and lets me shell-cd into each tab as a directory.


I see what you did there.


Reminds me of djb's envdir.
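For readers who haven't used it: envdir turns a directory into environment variables, one file per variable. A sketch of the rules as documented for daemontools' envdir (the first line of the file is the value, trailing spaces and tabs are stripped, NULs become newlines, and an empty file only unsets the variable). Unlike the real tool, which execs a child program, this returns a new mapping.

```python
from pathlib import Path


def apply_envdir(directory: str, env: dict) -> dict:
    """Return a copy of env updated the way envdir's documentation describes."""
    result = dict(env)
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue
        name = entry.name
        if "=" in name:
            raise ValueError(f"invalid variable name {name!r}")
        data = entry.read_bytes()
        result.pop(name, None)  # any existing value is removed first
        if not data:
            continue  # empty (0-byte) file: unset only
        first_line = data.split(b"\n", 1)[0]
        value = first_line.rstrip(b" \t").replace(b"\0", b"\n")
        result[name] = value.decode()
    return result
```

Usage would look like `subprocess.run(cmd, env=apply_envdir("env", os.environ))`; changing configuration is then just writing a file.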


All I see is a generic tree-walk mechanism, here implemented as folders/files; in Plan 9, would that be the plumber?


No XML, Excel, PDF or CSV support yet.


I guess support for XML would be tricky, because XML is just way more complex format than the ones already supported. It is still essentially a tree, but with additional structure.

Representing elements and their contents is easy enough. But attributes, comments, processing instructions, entities... And remember, an XML document can include a DTD (it does not have to be in a separate file).

To present it as a file system in a useful, non-convoluted way? I will be very, very interested if it's possible, but not holding my breath.


On the one hand, I can’t help but point out you forgot to mention the other big inherent complexity that would make XML-as-FS a uniquely complex beast: namespaces.

On the other hand, I can’t help but point out that a related technology comes very close to demonstrating how you might map XML to a file system: XPath. Probably the biggest issue would be syntax, and again largely due to namespaces.
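One way to see both the XPath analogy and the namespace problem: the sketch below maps elements to XPath-like paths, with 1-based indexes only for repeated siblings, and attributes to `@name` entries. Text, comments and processing instructions are simply dropped, and ElementTree's `{uri}local` tag form puts namespace URIs, slashes and all, straight into the path. This is an illustrative mapping, not a proposed standard.

```python
import xml.etree.ElementTree as ET


def xml_paths(xml_text: str) -> list:
    """List XPath-like paths for elements and '@attr' entries for attributes."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, path):
        paths.append(path)
        for name in sorted(elem.attrib):
            paths.append(f"{path}/@{name}")
        # Count each tag first so only repeated siblings get an [n] index.
        totals = {}
        for child in elem:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag if totals[child.tag] == 1 else f"{child.tag}[{seen[child.tag]}]"
            walk(child, f"{path}/{step}")

    walk(root, "/" + root.tag)
    return paths
```

With a namespace such as `{http://example.com/ns}item`, the URI's own slashes would collide with the path separator, which is the naming problem in a nutshell.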


Could this be used in Windows by exposing a Samba file share from WSL or Docker?


I would gently suggest naming it filefs or something; ffs already means https://man.freebsd.org/cgi/man.cgi?ffs(7)

That said - good idea/approach; seems like an excellent way to cleanly extend the unix approach to structured file formats:)



There's another expansion to that acronym that I can think of. I think the joke is implied.


OP's ffs does not target FreeBSD, and it seems like the referenced system is FreeBSD-only. Claiming naming rights is a stretch here.


That's ahistorical. FFS is the Berkeley Fast File System, from 4.2BSD, in 1983.


FFS predates FreeBSD and is in some capacity supported by all 3 major BSDs. I'm fairly confident that Linux actually supports it through the ufs driver ( https://github.com/torvalds/linux/tree/master/fs/ufs ); whether the use of different names in different places makes it better or worse is an exercise for the reader.


Yeah please rename it to JFS (JSON File System). Oh wait...


There is a reason I suggested "filefs"; 3 chars isn't really enough to easily be unambiguous.



