zlacker

1. mike_h+(OP) 2022-09-10 16:17:05
What exactly are you thinking of? Git manages files after all.

If you mean the underlying data structures, that's basically what modern filesystems are. XFS, APFS, Btrfs, etc. all support copy-on-write and use Git-like structures underneath. In the same way that Git branches are "instant" because no data is actually copied, these filesystems can clone files or whole trees instantly without copying any real data. You can easily "branch" a directory tree, rsync the results to a remote machine, etc.

replies(2): >>carapa+l7 >>comex+Ko
2. carapa+l7 2022-09-10 17:00:08
>>mike_h+(OP)
The thought is somewhat inchoate still. I'm working with a purely functional language (Joy, https://joypy.osdn.io/), and when it came time to add filesystem support I balked. Instead, I'm trying out immutable 3-tuples of (hash, offset, length) to identify sequences of bytes (for now the "backing store" is just a Git repo). Like I said, it's early days, but so far it's very interesting and useful.
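
To make the (hash, offset, length) idea concrete, here is a minimal sketch. It is not the poster's implementation: the in-memory `BlobStore`, the `ByteRef` and `slice_ref` names, and the choice of SHA-256 are all assumptions made for illustration, with a plain dict standing in for the Git repo.

```python
import hashlib
from typing import Dict, NamedTuple


class ByteRef(NamedTuple):
    """Immutable reference to a run of bytes inside a stored blob."""
    digest: str   # hex SHA-256 of the whole blob (hash choice is an assumption)
    offset: int   # start of the run within the blob
    length: int   # number of bytes in the run


class BlobStore:
    """Hypothetical in-memory content-addressed store (stand-in for a git repo)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> ByteRef:
        # Blobs are keyed by their own hash, so stored content never changes.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return ByteRef(digest, 0, len(data))

    def read(self, ref: ByteRef) -> bytes:
        blob = self._blobs[ref.digest]
        if ref.offset < 0 or ref.length < 0 or ref.offset + ref.length > len(blob):
            raise ValueError("reference falls outside the blob")
        return blob[ref.offset:ref.offset + ref.length]


def slice_ref(ref: ByteRef, start: int, length: int) -> ByteRef:
    """Derive a narrower reference without touching the stored bytes."""
    if start < 0 or length < 0 or start + length > ref.length:
        raise ValueError("slice falls outside the parent reference")
    return ByteRef(ref.digest, ref.offset + start, length)
```

Slicing only produces a new tuple, so "substring" operations never copy or mutate data, which is what makes the scheme fit a purely functional setting.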

I get what you're saying about modern filesystems, and I agree. I guess from that POV I'm saying we could stand to remove some of the layers of abstraction?

replies(1): >>mike_h+ab
3. mike_h+ab 2022-09-10 17:23:53
>>carapa+l7
Well, Git still uses mutable state stored in files. You can't avoid it - the world is mutable. The question is how to expose and manage the mutations.

At any rate you might be interested in a few different projects:

1. BlueStore: https://ceph.io/en/news/blog/2017/new-luminous-bluestore/

2. The DAT or IPFS protocols, which are based on the idea of immutable logs storing file data, identified by hashes, with public keys and signatures to handle mutability.

4. comex+Ko 2022-09-10 18:52:05
>>mike_h+(OP)
For one thing, it would be nice if every directory had a hash that covered all its contents (recursively), like Git tree objects. That way, all sorts of tools that need to check which files in a directory tree have changed – including `git diff`, `make`, file sync tools, and indexed search tools – could immediately skip directories with no changes, without needing a separate fsmonitor tool. The cost would be higher overhead recalculating hashes when files are constantly being updated.
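
A rough sketch of what such a recursive directory hash could look like, computed in user space. The `tree_hash` name and the entry encoding are assumptions for illustration and do not match Git's actual tree object format. A filesystem offering this feature would cache each directory's digest and only recompute along the path to a changed file, whereas this version rereads everything on every call.

```python
import hashlib
import os


def tree_hash(path: str) -> str:
    """Return a hex digest covering every name and file body under path."""
    h = hashlib.sha256()
    # Sort entries so the digest does not depend on directory listing order.
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.islink(child):
            # Hash the link target itself rather than following it.
            target = os.fsencode(os.readlink(child))
            kind, digest = b"link", hashlib.sha256(target).hexdigest()
        elif os.path.isdir(child):
            kind, digest = b"tree", tree_hash(child)
        else:
            with open(child, "rb") as f:
                kind, digest = b"blob", hashlib.sha256(f.read()).hexdigest()
        h.update(kind + b" " + os.fsencode(name) + b"\0" + digest.encode() + b"\n")
    return h.hexdigest()
```

With cached digests, a tool comparing two versions of a tree could stop descending as soon as two directory hashes match, which is the "immediately skip directories with no changes" behavior described above.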

It would also be nice to support filesystem transactions, somewhat analogous to Git commits. POSIX file APIs make it difficult to avoid race conditions even in the best circumstances, and extremely difficult if there’s a security boundary involved. You can never check something about a path (e.g. “is this a symlink?”) and then rely on that thing being true, because it could have been concurrently modified between the check and whatever you do next. So you have to rely on the limited atomic semantics available - for example, instead of explicitly checking for a symlink, you can use O_NOFOLLOW - but those are not always sufficient, depending on what you’re trying to do. It shouldn’t be this way. I should be able to take a frozen snapshot of the filesystem, inspect it to my heart’s content, make the changes I want, and finally atomically commit or abort.
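
As an illustration of the O_NOFOLLOW point, here is a small sketch for POSIX systems. The `open_no_symlink` helper name is hypothetical; `os.open` and `os.O_NOFOLLOW` are standard. It also shows why the flag is limited: only the final path component is protected.

```python
import errno
import os


def open_no_symlink(path: str) -> int:
    """Open path read-only, failing if its final component is a symlink.

    The kernel performs the symlink check and the open as a single step,
    so there is no window between "is this a symlink?" and "open it".
    Symlinks in parent directories are still followed.
    """
    try:
        return os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError as exc:
        # Linux and macOS report ELOOP here; some BSDs report EMLINK.
        if exc.errno in (errno.ELOOP, errno.EMLINK):
            raise ValueError(f"{path!r} is a symlink") from exc
        raise
```

The racy alternative would be calling `os.path.islink(path)` and then `open(path)`, where another process can swap the file for a symlink between the two calls.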

Regarding copy-on-write clones, can any of those filesystems actually clone arbitrary directories? In APFS’s case, you can make copy-on-write clones of files, and you can make copy-on-write snapshots of entire volumes, but you can’t just take some random directory and make a copy-on-write clone of it (without individually cloning all the files under the directory). I believe the same limitation exists for some or all of the modern Linux filesystems.
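
The per-file workaround mentioned above might look roughly like this on Linux. This is a sketch under assumptions: `clone_file` and `clone_tree` are hypothetical helper names, the fallback constant is the Linux FICLONE ioctl request number (newer Python versions expose it as `fcntl.FICLONE`), and symlinks, permissions, and other metadata are ignored. When the filesystem does not support reflinks, it falls back to an ordinary byte copy.

```python
import fcntl
import os
import shutil

# Linux ioctl request for a whole-file reflink; hard-coded fallback for
# Python versions that do not define fcntl.FICLONE.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def clone_file(src: str, dst: str) -> bool:
    """Try to reflink src to dst; fall back to a byte copy.

    Returns True if the copy-on-write clone succeeded.
    """
    try:
        with open(src, "rb") as s, open(dst, "wb") as d:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        return True
    except OSError:
        # Filesystem (or platform) does not support reflinks here.
        shutil.copyfile(src, dst)
        return False


def clone_tree(src: str, dst: str) -> None:
    """Recreate the directory structure of src under dst, cloning each file."""
    for dirpath, _dirnames, filenames in os.walk(src):
        target = os.path.join(dst, os.path.relpath(dirpath, src))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            clone_file(os.path.join(dirpath, name), os.path.join(target, name))
```

Even when every file is reflinked, the walk still costs time proportional to the number of files and directories, which is the limitation being pointed out: the file data is shared, but the tree itself is not cloned in one step.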

replies(2): >>mike_h+fE >>jra_sa+CH
5. mike_h+fE 2022-09-10 20:44:12
>>comex+Ko
Filesystems do have timestamps, but they aren't propagated to the root for performance reasons. Normally you want file IO to be fast more than you want to be able to quickly test whether a large directory tree has changed, and propagation would make all writes contend on a single lock (for the root dir's hash or timestamp).

Agreed about fs transactions. Not sure about cloning directory trees. I thought btrfs could do it but I never used that feature. You might well be right.

6. jra_sa+CH 2022-09-10 21:18:41
>>comex+Ko
I'm giving a (slightly updated) version of my talk at the Storage Networking Industry Association Storage Developer Conference (2022) in Fremont, CA next Thursday:

https://storagedeveloper.org/events/sdc-2022/agenda/2022-09-...

"Symbolic links Considered Harmful"

Might be relevant to readers :-).
