FUSE, yt-dlp and various animals' cunts

yt-dlp is an incredibly useful program for DOWNLOADING videos off youtube (and also from pretty well anywhere else, it seems, though youtube is by far the most common target I use it on). The reasons this is a useful capability include the following:

You get a permanent copy of the video on your own hard drive, so you can watch it again WITHOUT having to go back to bloody youtube, and if it gets deleted off youtube it doesn't matter because you've still got a copy of it.
Indeed, you don't have to visit youtube itself at all, so you are spared the pain of interacting with such a fucking horrendously bad website. You can find videos by searching site:youtube.com with Bing (not with Google; it makes Google conk out and give you that stupid bastarding "are you a robot" page after a couple of pages' worth of results) and then download them simply by passing the link to yt-dlp.
You get to watch it using a decent player, of your own choice, not whatever piece of shite is forced on you to watch it within the fucking browser.
There are NO FUCKING ADVERTS.
The quality of the video you get does not depend on the nature of your internet connection. You don't have to suffer youtube limiting you to its own choice of shitty quality or codec because it thinks that's appropriate; you can select any combination of quality and codec from the complete list of what's available for that video. You also don't have to suffer the thing periodically hanging while it waits for more data, and if something happens to break the download you can resume it from the same byte it broke at.

Unfortunately, it is on that last point that yt-dlp is not as good as it could be, because it fails to deal with what I at least find is the most common cause of the download breaking: timeout. The URL it digs out of youtube to actually download the video from is one of these cunt-arsed temporary things that only works for a few hours; indeed one of the parameters embedded in it is the time when it will stop working. But yt-dlp does not take any notice of this parameter. It does not, as it could and should, detect that you are coming up to this time with the download still incomplete and obtain a fresh URL with a renewed timeout to make sure it can carry on downloading; instead it just sticks with the one it got at the beginning, and carries on using that until eventually youtube returns a 403, whereupon yt-dlp decides that the situation is not "retryable" and just stops.
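To make the "it could and should" a bit more concrete, here is a minimal sketch in C of what checking that embedded expiry time might look like. It assumes the parameter is a query-string field called expire holding a Unix timestamp; that name is an assumption and should be checked against a real URL. The function names url_expiry() and url_nearly_expired() are made up for illustration and have nothing to do with yt-dlp's own code.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Return the value of the "expire" query parameter as a Unix time,
 * or 0 if the URL carries no such parameter.
 * ASSUMPTION: the parameter is called "expire"; check a real URL. */
static time_t url_expiry(const char *url) {
    const char *key = "expire=";
    size_t klen = strlen(key);
    const char *p = url;

    while ((p = strstr(p, key)) != NULL) {
        /* Only accept a match that starts a query parameter,
         * so that e.g. "noexpire=" is skipped. */
        if (p != url && (p[-1] == '?' || p[-1] == '&'))
            return (time_t)strtoll(p + klen, NULL, 10);
        p += klen;
    }
    return 0;
}

/* Nonzero if the URL stops working within `margin` seconds of `now`. */
static int url_nearly_expired(const char *url, time_t now, time_t margin) {
    time_t when = url_expiry(url);
    return when != 0 && now + margin >= when;
}
```

A downloader could call url_nearly_expired(url, time(NULL), 300) between chunks and fetch a fresh URL when it returns nonzero, instead of waiting for the 403.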

This is shit, because the situation is "retryable". If you simply reissue the same yt-dlp command over again when it stops, it will obtain a fresh URL, and then resume the download from the point where it broke (using the size of the partly-downloaded file as its means of knowing what point to restart at). It could do this automatically, but it won't: it has various different retry-on-error options, but it doesn't have any that will make it retry on this particular error (or if it does have one it's fucking well hidden).

For the case of downloading to a file, this isn't a big deal, because you can just wrap the yt-dlp command in a shell one-liner to reissue the command if the exit status is non-zero. But for the case of downloading to a pipe, it is a dog's cunt. With no file to read the size of when it retries, all it can do is start over again from the beginning, which of course is no bloody good at all.

Why am I downloading to a pipe? Because of youtube itself being a cunt. There is a moronic fashion these days for publishing videos at double the normal frame rate (ie. 50fps instead of 25, or 60fps instead of 30). All this achieves is to waste bandwidth/disk space on a file which is twice the size it should be, for no discernible improvement in quality. (And I say that as one who is particularly aware of judder and jerkiness in videos which other people don't notice, and who insists on a refresh rate of at minimum 85Hz, preferably more, on a CRT monitor to avoid flicker: the "rules" are not the same for a continuous, regular stream of video frames as they are for either an irregular stream or for a static image, and 25fps is perfectly adequate.) The usual manifestation of this fashion as displayed by youtube is to provide versions of the video either at normal frame rate in shitty tiny resolutions, or at double frame rate for every resolution from quite small to ridiculously fucking huge, so either I can get a sensible frame rate but at too small a resolution, or a sensible resolution but with a waste of 100% extra disk space to store all the unnecessary extra frames.

So when I encounter one of these, I download the absolutely-most-of-everything format, with the doubled frame rate and at the highest resolution available - often they come at a completely fucking stupid maximum resolution, like 3840x2160, for fuck's sake, what the steaming fuck is the point of that many pixels for a fucking video, some buggers need to have a look at what the angular resolution of the human eye is actually like. It's only actually useful to palliate the inevitable loss of quality when re-encoding a video - and that is what I am doing with it; I pipe the output from yt-dlp into ffmpeg, and use ffmpeg to convert the video to normal frame rate at a sensible resolution. ffmpeg isn't clever enough to halve the frame rate without completely decoding and re-encoding the video, with the complete new iteration of lossy compression that that entails, so the excessive input resolution is useful to compensate for the associated loss of quality.

But the absolutely-most-of-everything format is of course a sodding enormous bloody file, and the consequence is that doing this with a video that runs for about 2.5 hours means requesting a video file of about 30GB which takes something over 24 hours to download and convert on the fly. But youtube's bastard fucking expiring URLs don't last that long, so the download conks out four times or thereabouts part way through the process and needs to be restarted. Which since it is writing to stdout, yt-dlp can't do. And this is a dog's cunt.

So what to do about this? I suppose I could hack yt-dlp, but the sodding thing is written in sodding Python, which is horrible shit and a pain in the fucking arse and it is worth going to quite a bit of alternative effort in order to avoid having to deal with the cunt. I fucking hate Python. Fucking weird syntax in too many different and pointless variations, and any language which uses whitespace as a syntactic element is not fit for purpose in any case: who the cunting fuck ever thought that was a good idea anyway? It's a fucking shite and moronic idea and whoever it was that had it deserves a fucking smack. Python is a load of old arse. Fuck Python.

And in any case yt-dlp tends to need updating every couple of months in order to get the latest version that has been rewritten to cope with whatever youtube's latest method of being a wanker is, which means a hacked version would need every updated version patched and rebuilt before I could install it, and no doubt sooner or later I'd find that the patch could no longer be applied for some arse biscuit reason and have to fuck around figuring out why and how to fix it when I can no longer remember how it's supposed to work or what this that and the other peculiar bit of code in it is for... so bollocks to that.

No, what it needs is some kind of "fake file" to make yt-dlp think it is writing to a file - and can read the size of the file to find out how much it's already written when it's restarted - but nevertheless the output appears on stdout so I can pipe it to ffmpeg. This is not something that Linux provides natively, so I have to implement it myself; since I really don't want the pain of trying to implement it as a kernel feature, the alternative is, basically, FUSE.

Which is all very well but the trouble is that FUSE itself is some species of canine vagina, probably a dingo's cunt. Apart from the initial and far too common difficulty of the thing being hosted on "Github - A site where nothing fucking works" (WHY do we have to put up with this fucking abomination for so much software? Why the fuck can't people either put the software on their own website, or at least find some other site to host it on that isn't such a totally fucking useless piece of dysfunctional shite?), the main problem with FUSE is that there is NO FUCKING DOCUMENTATION and you keep having to figure stuff out by trial and error instead, with whatever assistance you can manage to dig up from inadequate and incomplete answers on stackoverflow to someone who asked a slightly different but possibly related question. The most useful stuff I found was the code examples on this FUSE tutorial page, and the other examples that come as part of the package in /usr/share/doc, which give just about enough vaguely-usable code for very minimal functions to fuck about with and see what bleeding bizarre results can be obtained.

(Is there documentation on "Github - A site where nothing fucking works"? I don't bleeding know. Because it's "Github - A site where nothing fucking works", and nothing fucking works. But if there is, nobody ever quotes it, or links to it, so I suspect that there really isn't any, and therefore I still wouldn't be able to find it even if it wasn't "Github - A site where nothing fucking works".)

Compiling FUSE stuff requires the compiler flags -I/usr/include/fuse -D_FILE_OFFSET_BITS=64 -lfuse -pthread. This is most easily done by including $(pkg-config fuse --cflags --libs) in the compiler command line, assuming that pkg-config is working.

It is also necessary to put #define FUSE_USE_VERSION 26 (other version numbers exist but I don't know what the differences are) BEFORE you #include <fuse.h>. If you leave that out, it fucks up something rotten (I can't remember exactly how because I only did that once, but it isn't useful).

Example code on the web, such as the tutorial page I linked to above, fires up FUSE by calling fuse_main() with the directory for the FUSE filesystem to be mounted on as a parameter. It takes parameters in the same int argc, char *argv[] format as main() does. But nowhere does there seem to be any actual fucking LIST of what other parameters it understands. I managed to find information about only three:

-f makes it NOT fork and daemonise itself. Until I discovered this, I was having to piss about passing dup()s of stdout and stderr in the user data block that fs_init() receives, so that fs_init() could reopen stdout and stderr after FUSE had closed them, which worked, but it was something I could have done without.
-s makes it run single-threaded instead of multi-threaded. At least, that's what it seems to be supposed to do. A couple of pages appeared to be saying that it works the other way round (ie. turns single-threadedness off instead of on), and another page suggested that it doesn't actually do anything at the moment. This is confusing and shit, but fortunately I think it's wrong. I used it, anyway, because I don't want it to run multi-threaded, and nothing exploded as a result of me not bothering about making my code thread-safe; but then nothing exploded without it either, so I still don't really know.
-d makes it print "debugging output", which mainly consists of messages indicating what function it has just tried to call. Only it's fucking stupid because it refers to functions that don't exist. I had it moaning at me that I hadn't implemented SETATTR. Only there is NO FUCKING FUNCTION called SETATTR, so a bleeding fat lot of fucking help that is. What is the point of providing such a cunty and useless message? Stupid bastards.
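Putting those three options together, a small sketch of building the argument vector that gets handed to fuse_main(). The helper name build_fuse_args() is invented for this example; only the -s, -f and -d switches and the mountpoint-as-last-argument convention come from the text above.

```c
#include <stddef.h>

/* Fill `out` (capacity `cap` entries) with a fuse_main()-style argv:
 * program name, -s, -f, optionally -d, then the mountpoint, then NULL.
 * Returns the argument count, or -1 if `out` is too small.
 * build_fuse_args() is a hypothetical helper, not part of FUSE. */
static int build_fuse_args(char *out[], size_t cap,
                           char *prog, char *mountpoint, int debug) {
    size_t need = debug ? 6 : 5;    /* includes the NULL terminator */
    size_t n = 0;

    if (cap < need) return -1;
    out[n++] = prog;
    out[n++] = "-s";                /* single-threaded */
    out[n++] = "-f";                /* stay in the foreground */
    if (debug) out[n++] = "-d";     /* debugging output */
    out[n++] = mountpoint;
    out[n] = NULL;
    return (int)n;
}
```

The returned count and the filled array would then go straight into fuse_main(count, out, &ops, user_data).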

Unhelpful wankers on stackoverflow respond to various people asking "how the fuck do I do [simple thing that ought to be basic and obvious]" by disparaging fuse_main() as a "function for lazy programmers" and saying "you can't do it that way, you'll have to use the more complicated method". They do NOT say what this "more complicated method" is, or how you use it, or even where you can find something written by someone who isn't an unhelpful wanker that does tell you about it. One suspects that they don't actually know themselves, and are just trying to act all superior about it, which makes them double wankers, and cunts. Of course they can't provide any links to a description of the "more complicated method" because there FUCKING AREN'T ANY, but they leave you to find that out for yourself by googling fruitlessly until you give up in disgust.

I found lots of replies from such wankers when I was trying to find out how to get FUSE to provide some indication that it has completed its initialisation and is ready for use. fuse_main() doesn't return for as long as the filesystem is mounted, so you have to fork it as a child process and carry on doing stuff in the parent process. But you can't carry on doing stuff straight away, because it takes a while for it to get ready. So I was finding that my main process had already started to create files in the mountpoint directory before FUSE had looked at it, and when it did look at it it complained that the directory wasn't empty and refused to start. Therefore I was trying to find out if there was any way to tell it to send a signal or something once it had completed initialisation, which I could wait for before trying to carry on and use it. And could I find anything? Could I fuck.

There is, of course, the function fs_init() which is called "during initialisation" so your own FUSE code can pick that up and do any of its own initialisation that it needs to do. But nothing tells you WHEN "during initialisation" it gets called or what state the rest of FUSE is in when it does it. It might be after FUSE has finished doing everything it needs to do itself, but it equally might be before it has done any of it, or at some random point half way through: you just don't fucking know, because nothing fucking tells you.

I found someone on stackoverflow who was having exactly the same problem. They had bodged around it for the time being by having their main process sleep for 1 second after forking FUSE, but this is obviously unsatisfactory and they wanted to know how to actually solve the problem instead of bodging it. The response they got was one of these stupid twats saying "you'll have to use the more complicated method" and then not saying what that is. So they took a different angle, and tried asking whether fs_init() is only called after FUSE has done all its own initialisation, or not. But nobody gave any useful answer to that either: there are some fucking useless bloody arseholes on stackoverflow, there really are. So they tried experimenting, and found that on Linux they could have their main process wait for fs_init() to send a signal and expect FUSE to be ready for use when it was received, but on Mac OS that wasn't good enough and they still needed the delay as well - and since it was Mac OS they were writing for, they were basically no better off. And at this point the discussion terminated because no cunt would bother to provide an answer, although there surely must be plenty of people who do know on that site for fuck's sake.

Since I am on Linux, having my code wait to receive a signal sent from fs_init() is the method I adopted, and it does appear to work; but I still don't know if I can actually rely on that, or if I've just been lucky with it so far.
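Stripped of everything to do with FUSE, the signal handshake looks roughly like the sketch below. The child process merely stands in for the FUSE process, and the function name wait_for_child_ready() is made up for the example. The one detail that matters is that SIGUSR1 is blocked before the fork: the sigwaitinfo() man page says the signals being waited for should be blocked beforehand, and blocking early also means a signal that arrives before the parent starts waiting stays pending instead of killing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stands in for the FUSE process, and block until it
 * reports readiness with SIGUSR1. Returns the integer payload the child
 * queued with the signal, or -1 on error. (Hypothetical helper.) */
static int wait_for_child_ready(int payload) {
    sigset_t set, old;
    siginfo_t info;
    pid_t parent = getpid();
    pid_t child;
    int got;

    /* Block SIGUSR1 BEFORE forking so an early signal stays pending. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &set, &old) == -1) return -1;

    child = fork();
    if (child == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (child == 0) {
        /* In the real program this call sits in the init callback. */
        union sigval v;
        v.sival_int = payload;
        sigqueue(parent, SIGUSR1, v);
        _exit(0);
    }

    got = sigwaitinfo(&set, &info);     /* returns the signal number */
    sigprocmask(SIG_SETMASK, &old, NULL);
    waitpid(child, NULL, 0);
    return got == SIGUSR1 ? info.si_value.sival_int : -1;
}
```

This only shows that the signalling mechanics are sound; it says nothing about whether FUSE is really ready when it calls the init callback, which is still the unanswered question.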

For fuck's sake.

Now, about that -d parameter, and this stupid bleeding SETATTR shite which doesn't exist, and related matters...

FUSE has an absolute fuck load of functions which you can define and no doubt if you're using it to implement a full-on complete filesystem you will be using most or all of them as a matter of course, but for the kind of extremely minimal demonstration that the abovementioned tutorial page provides - or the examples in /usr/share/doc - you only need a tiny handful of them, and the rest can be left undefined. However, there is a difficulty regarding which functions those are and how you find this out. Nothing fucking tells you, of course, and it isn't always obvious; you can be thinking "well, to do X, it will surely have to use Y", but then you find it doesn't go anywhere near Y but uses Z in a weird and unanticipated way instead. And what is worse, there isn't one single answer - instead it varies as you go along as some obscure and perverse function of what things you have and haven't done yourself at any given point, so you find the ground shifting underneath you and end up having to swear a lot.

I found some page on the web that said "It may be tempting to leave all the functions you're not using undefined. Don't. Otherwise FUSE will try to use them at some point, and fail because they're not there, and you won't know what it is that it can't find that's throwing it" (not quite a verbatim quote but close enough from memory). Well, yeah. People are too polite, and the page fails to mention that the reason you won't know what it is that it can't find is that the -d output is a load of fucking useless shite that talks about functions like SETATTR that don't bleeding exist anyway. I think it should, because if things are shite and nobody ever says so then they never get put right and just carry on being shite. The more comments there are like WHAT THE FUCK IS "SETATTR"? on random web pages about FUSE, the more chance there is of someone connected with FUSE noticing that there is a problem and doing something about it. Even if it's only a faint hope, it still needs to be said.

The trouble with that advice not to leave anything unimplemented is that it ignores practicality: of course it's "tempting", because otherwise you'll be there for months writing fucking reams of code that will probably never get called and has fuck all to do with whatever minimal thing you're trying to do. There's a whole lot of difference between writing a fully-featured filesystem and merely using a tiddly bit of the functionality available via FUSE to solve some highly constrained and specific problem, and to give advice of that nature is to ignore/dismiss the possibility of such a constrained case ever existing, which isn't realistic and isn't useful.

But you also can't simply get away with pointing all the unimplemented functions at dummy functions which do nothing but report that they have been called (to overcome the SETATTR problem) and then return. This is because FUSE does not have a single set of functions that it requires to perform some given operation; instead it chooses a different set of functions for the same operation depending on which ones you have defined or not. So if you provide dummy functions in place of those you haven't implemented, it will try and use them to actually do things, when it's perfectly happy to do those same things without using those functions if you have just left them as NULL. So by following this approach blindly, you again end up writing code that you don't need to bother with because you don't realise that you don't need to bother with it.

An example of this kind of thing arises with read() and read_buf(). If you only define read() and leave read_buf() as a NULL, then FUSE will happily perform read operations using read() and not care that read_buf() doesn't exist. But if as well as defining read() you also provide a dummy function for read_buf(), then it will start trying to use the dummy read_buf() to perform the same bloody operations that it would otherwise just use read() for. So if you don't realise it's being an inconsistent wanker, you can end up thinking you do have to provide a proper implementation of read_buf() when in fact all you have to do is not define it as anything at all.
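The behaviour just described can be modelled with a toy dispatch function. This is only an illustration of the observed effect, not FUSE's actual code: struct toy_read_ops, toy_dispatch_read() and the two handlers are all invented for the example, and the real callbacks take more parameters than this.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the two read-related slots of struct fuse_operations. */
struct toy_read_ops {
    int (*read)(char *buf, size_t size);
    int (*read_buf)(char *buf, size_t size);
};

/* Pick a handler the way the text describes: read_buf is used whenever
 * it is non-NULL, and read is only the fallback. */
static int toy_dispatch_read(const struct toy_read_ops *ops,
                             char *buf, size_t size) {
    if (ops->read_buf) return ops->read_buf(buf, size);
    if (ops->read) return ops->read(buf, size);
    return -ENOSYS;                 /* nothing implemented at all */
}

/* A "real" read handler: copies up to 4 bytes of fixed data. */
static int real_read(char *buf, size_t size) {
    static const char data[] = "data";
    size_t n = sizeof data - 1;
    if (n > size) n = size;
    memcpy(buf, data, n);
    return (int)n;
}

/* A do-nothing dummy, of the kind that causes the trouble. */
static int dummy_read_buf(char *buf, size_t size) {
    (void)buf;
    (void)size;
    return 0;
}
```

With only real_read installed, a read returns 4 bytes; install dummy_read_buf as well and the same read quietly returns 0 bytes, because the dummy has taken over.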

How do you know what FUSE's decision process is, and work out how to anticipate it so you can avoid writing code you don't need to bother with? Well you fucking don't know, of course. Because nothing fucking well tells you. It's all down to trial and error again, pissing around repeatedly changing something and recompiling and running it and seeing what it does this time over and over again instead of simply being able to look the answers up.

So to ease the mechanics of the trial-and-error process I wrote a header file fuses.h which does provide a dummy function for every function FUSE allows you to define, but also lets you turn off the ones you don't want. The standard way you inform FUSE of the functions you have defined is to define a struct fuse_operations with the appropriate entries set to the pointers to your own functions, the other entries remaining as NULL. With fuses.h, you can also define a second struct fuse_operations in which the entries for the functions that you want to remain UNdefined are set to some non-NULL value (doesn't matter what; 1 cast to a void* will do). You then pass both structs to a function fizzmerge() which returns a struct fuse_operations filled in with pointers to all the functions you have defined, NULL for all the functions you have specified must remain UNdefined, and dummy functions for all the rest. Therefore all you have to do to see what the deal is for any given function is to edit the declaration for the second struct fuse_operations to add or remove that function, recompile, and test. It's still a clunky and shit trial-and-error process which is not necessarily entirely dependable, but at least the basic mechanics of the fucking about you have to do for each trial is reduced to a minimum.
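The merging idea can be sketched on a cut-down table. This is not the contents of fuses.h: the three-slot struct toy_ops, the MERGE_SLOT macro, toy_merge() and the marker function leave_null() are all hypothetical, and every slot is given the same signature to keep it short (the real struct fuse_operations has a different signature per entry).

```c
#include <stddef.h>

typedef int (*op_fn)(const char *path);

/* Cut-down stand-in for struct fuse_operations. */
struct toy_ops {
    op_fn getattr;
    op_fn open;
    op_fn flush;
};

static int dummy_calls;             /* how often a dummy got called */

/* Dummy handler: records that it was called, does nothing else. */
static int dummy_op(const char *path) {
    (void)path;
    dummy_calls++;
    return 0;
}

/* Marker for the "unwanted" table; any non-NULL value would do. */
static int leave_null(const char *path) { (void)path; return 0; }

/* Example of a handler the user has actually implemented. */
static int my_getattr(const char *path) { (void)path; return 0; }

/* A slot gets the dummy only if it is neither implemented nor unwanted. */
#define MERGE_SLOT(mine, unwanted, field)                            \
    do {                                                             \
        if ((mine)->field == NULL && (unwanted)->field == NULL)      \
            (mine)->field = dummy_op;                                \
    } while (0)

/* Implemented slots are kept, unwanted slots stay NULL, and everything
 * else is pointed at the dummy. Modifies `mine` in place. */
static void toy_merge(struct toy_ops *mine, const struct toy_ops *unwanted) {
    MERGE_SLOT(mine, unwanted, getattr);
    MERGE_SLOT(mine, unwanted, open);
    MERGE_SLOT(mine, unwanted, flush);
}
```

Starting from { .getattr = my_getattr } with { .flush = leave_null } as the unwanted table, the merge leaves getattr alone, keeps flush as NULL and points open at the dummy; moving a function in or out of the unwanted table is then a one-line edit per trial.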

Of course if FUSE was properly documented there would be no need to fuck around doing this sort of shit, because you would be able to read the documentation and get the answers out of that. But you can't, because FUSE is written by wankers who seem to think that trial and error is perfectly fine as a method of working out how to use something so inherently complex and sensitive as a library for implementing filesystems with.

So anyway, accompanied by much moaning and swearing about the fucking useless nature of the information I was finding on the internet, I used this kind of method to kludge together a very minimal FUSE hack that does just enough to let me test it and then build something to solve the yt-dlp timeout problem that uses it.

Most of the FUSE functions in this lot were written by taking bits of the example code in /usr/share/doc and hacking it about. It also contains the odd bit of code which is a relic of previous test iterations, which is still there partly in case I decide I want to use it again and partly because I can't be arsed. No, it isn't elegant or beautiful code; it was just thrown together to solve a specific problem, and it does that. A lot of the reason for providing it here is simply to add to the variety of more or less crappy minimal examples of stuff done with FUSE, in case anyone finds it more useful than not having it.

Download the code: phaic.tar.gz (contains: phaic.c, fuses.h, Makefile)

/* phaic.c */
/* gcc phaic.c $(pkg-config fuse --cflags --libs) -O2 -o phaic */

#define FUSE_USE_VERSION 26

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <fuse.h>
#include <string.h>
#include <alloca.h>

#define LHOGGHUE 0
#include "fuses.h"

#define MAXPH 16
#define MAXPATH 512

struct filedat {
    size_t size;
    char fn[MAXPATH];
};

union voidstr {
    void *p;
    char s[8];
};

struct sose {
    int so;
    int se;
    int pid;
    union voidstr sv;
};

static struct filedat phyles[MAXPH];

/* Look for a file name in the list of files; if not found and c != 0, create new entry */
static struct filedat *fnfile(const char *fn, int c) {
    static int nfiles = 0;
    int i;

    if (nfiles >= MAXPH) return NULL;
    for (i = 0; i < nfiles; i++) {
        if (!strcmp(fn, phyles[i].fn)) return &(phyles[i]);
    }
    if (!c) return NULL;
    memset(&(phyles[nfiles]), 0, sizeof(struct filedat));
    strcpy(phyles[nfiles].fn, fn);
    return &(phyles[nfiles++]);
}

static void *fs_init(struct fuse_conn_info *c) {
    struct fuse_context *fc;
    struct sose *s;

    fc = fuse_get_context();
    s = (struct sose *)(fc->private_data);

    /* It APPEARS that when it gets to this point, fuse itself is ready for action,
     * so we can now send the main process a signal to tell it so (and tell it what
     * the complete temp dir name is). At least on Linux; found some suggestion that
     * on Mac OS this won't work, but this _is_ Linux, so fuck Apple.
     */
    sigqueue(s->pid, SIGUSR1, (union sigval){ .sival_ptr = s->sv.p });

    if (s->so || s->se) { /* was used for experimentation */
        dup2(s->so, 1);
        dup2(s->se, 2);
    }
    if (LHOGGHUE) fprintf(stderr, "fs_init(c=%p)\n", c);
    return NULL;
}

/* Makes ls work, basically */
static int fs_readdir(const char *path, void *data, fuse_fill_dir_t filler, off_t off, struct fuse_file_info *ffi) {
    int i;
    char *s;

    if (LHOGGHUE) fprintf(stderr, "fs_readdir(path=%s, data=%p, filler=%p, off=%lld, ffi=%p)\n", path, data, filler, (long long)off, ffi);
    if (strcmp(path, "/") != 0) return -ENOENT;
    filler(data, ".", NULL, 0);
    filler(data, "..", NULL, 0);
    for (i = 0; i < MAXPH && phyles[i].fn[0]; i++) {
        s = phyles[i].fn;
        if (*s == '/') s++;
        filler(data, s, NULL, 0);
    }
    return 0;
}

/* Does not really read the "file", just reads its size. (only used in testing) */
static int fs_read(const char *path, char *buf, size_t size, off_t off, struct fuse_file_info *ffi) {
    size_t len;
    struct filedat *b;
    char file_contents[256];

    if (LHOGGHUE) fprintf(stderr, "fs_read(path=%s, buf=%p, size=%zu, off=%lld, ffi=%p)\n", path, buf, size, (long long)off, ffi);
    if (!(b = fnfile(path, 0))) return -ENOENT;
    sprintf(file_contents, "%zu\n", b->size);
    len = strlen(file_contents);
    if (off < len) {
        if (off + size > len) size = len - off;
        memcpy(buf, file_contents + off, size);
    } else {
        size = 0;
    }
    return size;
}

/* Does not really write to the "file"; writes to stdout, but records the cumulative byte count */
static int fs_write(const char *path, const char *buf, size_t size, off_t off, struct fuse_file_info *ffi) {
    struct filedat *b;

    if (LHOGGHUE) fprintf(stderr, "fs_write(path=%s, buf=%p, size=%zu, off=%lld, ffi=%p)\n", path, buf, size, (long long)off, ffi);
    if (!(b = fnfile(path, 0))) return -ENOENT;
    b->size += write(1, buf, size); /* to stdout */
    return size;
}

/* All this really does is test whether the file exists */
static int fs_open(const char *path, struct fuse_file_info *ffi) {
    struct filedat *b;

    if (LHOGGHUE) fprintf(stderr, "fs_open(path=%s, ffi=%p)\n", path, ffi);
    if (!(b = fnfile(path, 0))) return -ENOENT;
    return 0;
}

/* As above, but creates it if it doesn't */
static int fs_mknod(const char *path, mode_t mode, dev_t dev) {
    struct filedat *b;

    if (LHOGGHUE) fprintf(stderr, "fs_mknod(path=%s, mode=%6.6o, dev=%lu)\n", path, mode, (unsigned long)dev);
    if (!(b = fnfile(path, 1))) return -ENOENT;
    return 0;
}

/* Gives sensible results for a top-level dir or a "file" in it, but not for anything else */
static int fs_getattr(const char *path, struct stat *st) {
    struct filedat *b;

    if (LHOGGHUE) fprintf(stderr, "fs_getattr(path=%s, st=%p)\n", path, st);
    if (strcmp(path, "/") == 0) {
        st->st_blksize = 512;
        st->st_mode = 040755;
        st->st_nlink = 2;
    } else if ((b = fnfile(path, 0))) {
        st->st_mode = 0100600;
        st->st_nlink = 1;
        st->st_uid = 1000;
        st->st_gid = 1000;
        st->st_rdev = 0;
        st->st_size = b->size;
        st->st_blksize = 4096;
        st->st_blocks = (st->st_size / st->st_blksize) + 1;
    } else {
        return -ENOENT;
    }
    return 0;
}

/* They've made a machine, a new-fangled device
 * They're lighting the fuse
 * There's no need to worry, your world will be all right...
 */
int light_fuse(char *argo, char *tmpdirname, int db) {
    char *tmpdir;
    char *randbit;
    char *aagz[8];
    int narg = 0;
    int rv = 0;
    int pid, cpid;
    union voidstr uv;
    sigset_t sset, oset;
    siginfo_t sit;
    struct sose s;

    /* Lists of fuse funcs to pass to fizzmerge() */
    /* Ones we've implemented */
    struct fuse_operations fsops = {
        .init = fs_init,
        .readdir = fs_readdir,
        .read = fs_read,
        .write = fs_write,
        .open = fs_open,
        .getattr = fs_getattr,
        .mknod = fs_mknod,
    };
    /* Ones we want NOT implemented: some (eg. getxattr()) just cause log spam,
     * but some (eg. read_buf()) make it want to use them instead of using the
     * ones we have implemented (for this ex., read()) and fuck it up.
     */
    struct fuse_operations cough = {
        .getxattr = r1,
        .fgetattr = r1,
        .create = r1,
        .flush = r1,
        .lock = r1,
        .release = r1,
        .opendir = r1,
        .releasedir = r1,
        .read_buf = r1,
        .write_buf = r1,
    };
    /* After fizzmerge(), all other fuse funcs end up as dummies that just return 0. */

    /* Stuff for initialising my own fuse stuff with */
    s.so = s.se = 0;
    /* s.so = dup(1); s.se = dup(2);*/
    s.sv.p = NULL;
    s.pid = pid = getpid();

    /* SIGUSR1 used to tell main process that fuse is ready */
    memset(&sit, 0, sizeof(siginfo_t));
    sigemptyset(&sset);
    sigaddset(&sset, SIGUSR1);
    /* Do we actually need this for use with sigwaitinfo()? Docs say we do wrt. other
     * sig handling stuff, but don't mention sigwait()/sigwaitinfo()/sigtimedwait() */
    sigprocmask(SIG_BLOCK, &sset, &oset);

    switch (cpid = fork()) {
    case -1:
        fprintf(stderr, "fork() failed: %s\n", strerror(errno));
        exit(errno);
        break;
    case 0 :
        sprintf(tmpdirname, "%s-tmpdir-%d.XXXXXX", argo, getpid());
        /* make tmp dir for fuse to sit on */
        if ((tmpdir = mkdtemp(tmpdirname)) == NULL) {
            fprintf(stderr, "Cannot create temp dir %s: %s\n", tmpdirname, strerror(errno));
            rv = errno;
        } else {
            randbit = tmpdir + strlen(tmpdir) - 6;
            strcpy(s.sv.s, randbit);
            memset(&(phyles[0]), 0, sizeof(struct filedat) * MAXPH);
            /* Fill in unused/unwanted entries and dummy entries in fsops */
            fizzmerge(&fsops, &cough);
            narg = 0;
            aagz[narg++] = argo;
            aagz[narg++] = "-s";
            aagz[narg++] = "-f";
            if (db) aagz[narg++] = "-d";
            aagz[narg++] = tmpdirname;
            aagz[narg] = NULL;
            if ((rv = fuse_main(narg, aagz, &fsops, &s))) {
                /* I THINK: 0 = was all OK; not 0 = didn't start, or etc. */
                fprintf(stderr, "Could not remove temp dir %s: %s\n", tmpdirname, strerror(errno));
            }
        }
        if (rv) {
            /* Make main process receive a tmp dir name ending in . to indicate error */
            sigqueue(pid, SIGUSR1, (union sigval){ .sival_ptr = NULL });
        }
        close(1); /* close stdout formally, now we're done with it */
        exit(rv);
        break;
    default:
        sigwaitinfo(&sset, &sit);
        sigprocmask(SIG_SETMASK, &oset, NULL);
        uv.p = sit.si_ptr; /* man sigaction says si_ptr, man sigqueue says si_value; si_ptr works */
        sprintf(tmpdirname, "%s-tmpdir-%d.%s", argo, cpid, uv.p ? uv.s : "");
        close(1); /* don't want/need main process's stdout */
        break;
    }
    return cpid;
}

/* Execute something in a loop until its exit status is zero */
int lupin(char *tmpdir, char *outswitch, int argc, char *argv[]) {
    char phyle[256];
    char **aagz;
    int narg;
    int cpid;
    int i, rv;

    /* Construct argv to tell the thing to write to a (fake) fuse file */
    sprintf(phyle, "%s/file-%d", tmpdir, getpid());
    narg = argc + 2;
    aagz = alloca((narg + 1) * sizeof(char *));
    for (i = 0; i < argc - 1; i++) aagz[i] = argv[i + 1];
    aagz[i++] = outswitch;
    aagz[i++] = phyle;
    aagz[i] = NULL;

    rv = 1;
    while (rv) {
        switch (cpid = fork()) {
        case -1:
            fprintf(stderr, "fork() failed: %s\n", strerror(errno));
            exit(errno);
            break;
        case 0 :
            close(1); /* don't want the thing's own stdout */
            execvp(aagz[0], aagz);
            rv = errno; /* if it gets to here the exec failed */
            fprintf(stderr, "execvp(%s", aagz[0]);
            for (i = 0; i < narg; i++) fprintf(stderr, ", %s", aagz[i] ? aagz[i] : "NULL");
            fprintf(stderr, ") failed: %s\n", strerror(rv));
            raise(SIGTERM); /* cause "abnormal" exit so the while loop will terminate */
            pause(); /* stop it, because I don't know if raise() will really return or not */
            break; /* So it never gets here (at least it bloody well shouldn't) */
        default:
            waitpid(cpid, &rv, 0);
            if (WIFEXITED(rv)) {
                rv = WEXITSTATUS(rv);
            } else {
                rv = 0; /* break the loop if it didn't exit "normally" (via exit()) */
            }
            break;
        }
    }
    return rv;
}

void usage(char *argo) {
    fprintf(stderr,
        "Usage: %s file_output_switch wrapped_program [param [param...]]\n"
        " file_output_switch: whatever you pass to wrapped_program to tell it\n"
        " to write to a given file (as in \"-o file\" etc.)\n"
        " wrapped_program: thing to execute in a loop until it returns a\n"
        " zero exit status\n"
        " [param [param...]] any other parameters to pass to wrapped_program;\n"
        " will have \"file_output_switch random_filename\"\n"
        " stuck on the end after this lot.\n"
        "Example: %s -o yt-dlp -f 315 http://www.youtube.com/watch?v=gb4I6EtGIXs\n",
        argo, argo);
    exit(0);
}

int main(int argc, char *argv[]) {
    char tdn[256];
    char *pn;
    char *outswitch;
    int cpid, es;
    int i;

    /* Crude argument parsage */
    if ((pn = strrchr(argv[0], '/'))) pn++;
    else pn = argv[0];
    if (argc < 3) usage(pn);
    for (i = 1; i < argc; i++) {
        if (!strcmp(argv[i], "-h") || !strcmp(argv[i], "-?") || !strcmp(argv[i], "--help")) usage(pn);
    }
    outswitch = argv[1];
    for (i = 2; i <= argc; i++) argv[i - 1] = argv[i];
    argc--;

    cpid = light_fuse(argv[0], tdn, 0);
    if (tdn[strlen(tdn) - 1] == '.') {
        waitpid(cpid, NULL, 0);
        fprintf(stderr, "Fuse is soggy and will not light.\n");
        exit(1);
    }
    lupin(tdn, outswitch, argc, argv);
    kill(cpid, SIGTERM);
    waitpid(cpid, &es, 0);
    rmdir(tdn);
}

Right, so what's going on here then?

In the middle of it all is the loop function lupin(), which executes yt-dlp (or whatever else you've told it to execute: the same sort of problem might arise when using wget to download something that times out to a pipe, for instance) over and over with the same parameters each time, until it returns a zero exit status. Very crude. It doesn't - can't - distinguish between a non-zero exit status caused by a youtube timeout and one caused by a genuine error, so if you are getting a genuine error every time it will keep looping until you forcibly interrupt it.
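If the prospect of it looping for ever on a genuine error bothers you, one obvious bodge is to put a cap on the number of attempts. The following is only a sketch of that idea and is not what phaic.c does: the function name run_until_ok(), the max_tries parameter and the use of exit status 127 to mean "couldn't exec the thing at all" are all my own assumptions for the sake of illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical variant of the retry loop: run argv[] until it exits with
   status 0, but give up after max_tries attempts. Returns the last exit
   status, or -1 if the child could not be forked or died abnormally. */
int run_until_ok(char *const argv[], int max_tries)
{
    int status = -1;
    int attempt;

    assert(argv != NULL && argv[0] != NULL && max_tries > 0);
    for (attempt = 0; attempt < max_tries; attempt++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            execvp(argv[0], argv);
            perror("execvp");       /* only reached if the exec failed */
            _exit(127);             /* assumed "could not exec" status */
        }
        if (waitpid(pid, &status, 0) == -1) {
            perror("waitpid");
            return -1;
        }
        if (!WIFEXITED(status))
            return -1;              /* killed by a signal: stop retrying */
        status = WEXITSTATUS(status);
        if (status == 0 || status == 127)
            break;                  /* success, or nothing to execute */
    }
    return status;
}
```

It still can't tell a timeout from a genuine error, of course; all it does is guarantee that it stops eventually. For a download that legitimately needs dozens of restarts you would have to set max_tries generously.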

The FUSE stuff provides yt-dlp (or whatever) with a fake file to write to: it has a filename, it looks like a file, and it has a size which corresponds to the number of bytes that have been written to it and which yt-dlp can read in the usual way to determine what point to restart an interrupted download at. However, the data is never actually written to any real file; it just gets copied straight to stdout.

LHOGGHUE is defined to 0 or 1 to tell fuses.h to turn off or on the reporting of what dummy function has been called; it's also used in this file to turn off or on the reporting of the "real" fs_ functions being called.

A struct filedat holds the data describing one of these fake files - such as it is; it's just the filename and the byte count. Nothing else is implemented: stuff like file permissions is totally ignored, and dummy data is provided to anything that asks for such stuff.

As it stands it maintains a table containing this data for up to 16 files, although it only actually needs one file for this particular application; the ability to handle up to 16 is a relic of testing. Again, it is very crude; there is no mechanism to delete files, and once you get up to 16 files it just stops working. It also doesn't handle subdirectories: one top-level directory of up to 16 files is all you get. But it's still more than is needed for this very simple thing.

Indeed, the support for multiple files is arguably of negative usefulness; since all the data written to any file simply appears straight on stdout, and there's only one stdout, if you open and write to another file while it's in the middle of downloading, the data you write to that file will appear in the middle of the data stream that's being downloaded and bugger it up. So don't do that.

(If it really bothers you, you could always change #define MAXPH 16 to #define MAXPH 1 so it isn't possible to create any such additional files; but on the other hand if you're that bothered about security you really ought to be completely rewriting and extending this code to deal correctly with permissions and all the rest of it, since I have completely ignored all that stuff because I can't be arsed, and the "security" is not even MS-DOS level.)

union voidstr abuses a void* to turn it into a string, so the ability to return a pointer as auxiliary data when sending a signal can be abused to send a short character string instead.

struct sose holds odds and ends of data which are/were used in fs_init(): copies of stdout and stderr (which were used before I discovered the -f parameter, see above, but aren't any more), the PID of the parent process which also isn't used any more but I haven't been arsed to take it out, and a union voidstr containing a short character string to be sent along with the "ready" signal, see later.

fnfile() looks up a filename in the table of struct filedats, and returns the appropriate struct if it finds it or NULL otherwise. If it doesn't find the filename, the parameter c is non-zero, and there are fewer than 16 entries in the table already, it creates a new struct filedat and returns that instead.
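For anyone who would rather see the shape of the thing than read a description of it, here is a cut-down reconstruction of the lookup-or-create logic. It is not the code from phaic.c: the field names, the NAMELEN limit and the trick of treating an empty name as "slot unused" are assumptions of mine, and the second parameter is called create here rather than c.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define MAXPH 16            /* table size, as described in the text */
#define NAMELEN 256         /* assumed maximum path length */

struct filedat {
    char  name[NAMELEN];    /* path as handed over by FUSE, e.g. "/file-123.part" */
    off_t count;            /* bytes written so far */
};

static struct filedat table[MAXPH];    /* zero-initialised: all slots free */

/* Return the entry for path, creating it if create is non-zero and there
   is a free slot; NULL if not found, or if the table is full. */
struct filedat *fnfile(const char *path, int create)
{
    int i;

    assert(path != NULL);
    if (path[0] == '\0' || strlen(path) >= NAMELEN)
        return NULL;
    /* Used slots are contiguous from 0 because nothing is ever deleted */
    for (i = 0; i < MAXPH && table[i].name[0] != '\0'; i++)
        if (strcmp(table[i].name, path) == 0)
            return &table[i];
    if (!create || i == MAXPH)
        return NULL;
    strcpy(table[i].name, path);
    table[i].count = 0;
    return &table[i];
}
```

A linear search over 16 entries is about as unsophisticated as it gets, but since there is only ever one file in real use it hardly matters.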

fs_init() is the function which is called "during initialisation" of FUSE. It retrieves the "user data" passed to the FUSE initialisation, and sends a SIGUSR1 to tell the parent process that FUSE is now ready for action (probably), at the same time passing the already-mentioned short character string to the parent process as an abused void*. It also used to reopen stdout and stderr before I discovered -f, and still could if it was given them, but it isn't any more.

fs_readdir(), as the comment says, "makes ls work, basically". It never seems to get called for any other reason, only when you run ls. Of course, it isn't technically necessary for ls to work for this specific application anyway, but it was handy for testing stuff, again. Since the only directory that exists is the single top-level directory, the only value accepted for path is /.

The parameter data is something FUSE passes in to be passed in turn to filler(), a function which is also passed in by FUSE. It takes four parameters, the first of which is data. The second parameter is the filename being listed: . and .. are "synthetic"; the other filenames are retrieved from the table of struct filedats, with the leading slash removed. The third parameter can be a pointer to a struct containing file attribute data, but FUSE seems to be perfectly happy if you pass NULL for this; it just calls fs_getattr() to get the data instead.

The fourth parameter is something called "offset" and nobody seems to really know what it does. Some stuff on the web suggests that it should never be zero, but instead should be a different value for every file. On the other hand the examples in /usr/share/doc and the tutorial page mentioned above do just set it to zero all the time, and it seems to work. So fuck knows. Maybe it is somehow related to the off parameter to fs_readdir(); I don't know what that does either because none of the examples I looked at used it.

The ffi parameter to fs_readdir() is some more detailed information about the file, maintained by FUSE itself I think, which is also unused by this application and by any of the examples I looked at. All the fs_ functions except fs_init() get passed one of these, and none of them in this minimal application use it.

fs_read() is a clone-and-hack of the corresponding function out of one of the examples. Normally it would read the file (duh), but in this case there is nothing to read because all the data has been written to stdout and not to an actual file, so it just pretends that the count of bytes written is the "file content" and gives you that instead. So you can get the count of bytes by reading the file as well as by statting it. Again this was useful for testing, but it never actually gets used by this application.
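The only fiddly bit of pretending that a number is the content of a file is honouring the offset and size that the reader asks for. A minimal sketch of that part on its own, with FUSE left out of it entirely, might look like this; read_count() is a made-up name and is not a function in phaic.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: present a byte count as the "contents" of a file.
   Copies up to size bytes of the decimal count plus a newline, starting
   at offset, into buf; returns the number of bytes copied (0 at or past
   the end of the "file"). */
int read_count(long long count, char *buf, size_t size, off_t offset)
{
    char text[32];      /* plenty for a 64-bit number and a newline */
    size_t len;

    assert(buf != NULL && offset >= 0);
    len = (size_t)snprintf(text, sizeof text, "%lld\n", count);
    if ((size_t)offset >= len)
        return 0;
    if (size > len - (size_t)offset)
        size = len - (size_t)offset;
    memcpy(buf, text + offset, size);
    return (int)size;
}
```

Returning 0 once the offset reaches the end is what tells the reader it has hit end of file, so something like cat stops instead of asking for more for ever.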

fs_write() is the equivalent function for writing to the file (duh, again): as already described, it actually writes the data to stdout, and adds the number of bytes written to the count recorded for the "file".
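Stripped of the FUSE wrapping, the write side boils down to "shove the bytes at a file descriptor and add up how many went". The sketch below shows that, with the usual precautions against short writes and EINTR. The name passthrough_write() and its parameter list are invented for the example; returning a negative errno on failure follows the convention FUSE callbacks use.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: copy size bytes from buf to fd, retrying on short
   writes and EINTR, and add whatever was written to *count. Returns the
   number of bytes written, or -errno if nothing could be written. */
ssize_t passthrough_write(int fd, const char *buf, size_t size, off_t *count)
{
    size_t done = 0;

    assert(buf != NULL && count != NULL);
    while (done < size) {
        ssize_t n = write(fd, buf + done, size - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted: just try again */
            if (done == 0)
                return -errno;      /* total failure */
            break;                  /* partial success: report what went */
        }
        done += (size_t)n;
    }
    *count += (off_t)done;          /* only count bytes that really went out */
    return (ssize_t)done;
}
```

Counting only the bytes that actually got written matters here, because the count is what the restarted download uses to decide where to pick up from.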

fs_open() is extremely simple since for this application it turns out that there's next to nothing for it to usefully do; all it does do is call fnfile() to report whether the file exists or not.

fs_mknod() is equally simple; indeed it's exactly the same except that if the file doesn't exist it tries to create it, and only fails if there are already 16 files defined.

fs_getattr() fills in a struct stat with "reasonable fake values" when given a pathname referring to either the top-level directory or a file in the top-level directory. Note that the value st_mode is not just the mode, but the type and mode, as per man 2 stat. If you just treat it as the mode and don't fill in the type part, FUSE goes nuts.
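To make the point about st_mode concrete, here is roughly what "type and mode" means in practice. This is a sketch only: fake_getattr() and its is_file parameter are hypothetical, and the 0755 and 0600 permission values are simply ones I have picked as reasonable-looking fakes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper in the spirit of fs_getattr(): fill in fake but
   plausible attributes. is_file says whether path names a known fake
   file; size is its byte count. Returns 0, or -ENOENT for anything else. */
int fake_getattr(const char *path, int is_file, off_t size, struct stat *st)
{
    assert(path != NULL && st != NULL);
    memset(st, 0, sizeof *st);
    st->st_uid = 1000;                  /* hard-coded fake owner and group */
    st->st_gid = 1000;
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;   /* type bits AND permission bits */
        st->st_nlink = 2;
        return 0;
    }
    if (is_file) {
        st->st_mode = S_IFREG | 0600;   /* a bare 0600 here has no type */
        st->st_nlink = 1;
        st->st_size = size;
        return 0;
    }
    return -ENOENT;
}
```

The S_IFDIR and S_IFREG parts are the "type" that man 2 stat goes on about; leave them out and the thing being described is neither a file nor a directory nor anything else.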

light_fuse() is the metrognomic function that forks off FUSE as a child process. The first thing it does is define the two struct fuse_operations structures to be passed to fizzmerge() as described above in the bit about fuses.h. It also sets up the "user data" for FUSE initialisation; remaining from before I discovered -f, but commented out, are the lines to save stdout and stderr so that fs_init() can reopen them. Then it pisses about with signal stuff to prepare for the later use of SIGUSR1, and then it forks.

The first thing the child part of the fork does is to create a temporary directory for FUSE to sit on. This could of course have been done before forking, but I want the name of the temporary directory to include the PID of the FUSE process so it's obvious from the name what process is using it, for which reason it has to be done here. This in turn means that the child process has to report to the main process the random string that mkdtemp replaces "XXXXXX" with, so that the main process can also know what the name of the temporary directory is. It is this that all the pissing about with union voidstr is for, abusing the pointer that you can send along with a signal to use it as a short character string instead and write the replacement for "XXXXXX" into it.
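The directory-creating step on its own can be sketched like this. The helper name make_tmpdir() and its parameters are made up for the example, and it assumes the name follows the "program-tmpdir-PID.XXXXXX" pattern visible in the listings elsewhere on this page.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create "<prog>-tmpdir-<pid>.XXXXXX" in the current
   directory with mkdtemp(), leaving the full name in dirname and the six
   characters that replaced "XXXXXX" in suffix. Returns 0 or -1. */
int make_tmpdir(const char *prog, char *dirname, size_t len, char suffix[7])
{
    int n;

    assert(prog != NULL && dirname != NULL && suffix != NULL);
    n = snprintf(dirname, len, "%s-tmpdir-%d.XXXXXX", prog, (int)getpid());
    if (n < 0 || (size_t)n >= len)
        return -1;                      /* name would not fit */
    if (mkdtemp(dirname) == NULL)       /* rewrites the XXXXXX in place */
        return -1;
    memcpy(suffix, dirname + n - 6, 7); /* six characters plus the NUL */
    return 0;
}
```

Six characters plus a terminating NUL is seven bytes, which is why the string will squeeze into a pointer on a 64-bit system but would not on a 32-bit one.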

Having done that, it initialises (zeroes) the table of struct filedats, calls fizzmerge() to set up the fuses.hified list of FUSE function definitions, sets up the pseudo-command-line-ish argument list for FUSE, and then calls fuse_main() with that argument list, the list of function definitions, and the "user data" pointer as parameters.

Of course the fucking useless (non-existent) "documentation" does not tell you anything about what you can expect to happen next. What SEEMS to happen is that if everything is OK, fuse_main() never returns; you have to send it a SIGTERM to make it exit, and it then returns 0. If everything is not OK, it returns non-zero straight away, before it gets as far as calling fs_init().

So in the case of everything being OK, fuse_main() calls fs_init() and that sends a SIGUSR1 to the main process carrying the string which replaced "XXXXXX" as auxiliary data; the child process continues to run until terminated by a signal, and then it returns 0. But if fuse_main() fails, or if the creation of the temporary directory didn't work, the following code sends the main process a SIGUSR1 with an empty string for the "XXXXXX" replacement, from which the main process knows that it failed. If it was fuse_main() that failed after the creation of the temporary directory had succeeded, it deletes the temporary directory again, so the main process knows that no matter what the reason was for it not getting an "XXXXXX" replacement, if it didn't get one then there is no temporary directory either so it doesn't matter that it can't work out what the complete directory name would be.

The parent part of the fork, accordingly, uses sigwaitinfo() to pause itself until it receives the SIGUSR1; when it gets it, it extracts the "XXXXXX" replacement string, works out what the name of the temporary directory is, and returns that to the caller in the tmpdirname parameter, which is a buffer supplied by the caller for the purpose.
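Here is the whole signal handshake boiled down to a standalone sketch, with FUSE taken out: block SIGUSR1, fork, have the child sigqueue() a short string disguised as a pointer, and have the parent pick it up with sigwaitinfo(). The function handshake() is invented for the demonstration, and the union layout is my assumption about what union voidstr looks like, not a copy of it. It reads the payload through si_value.sival_ptr, which is the spelling POSIX documents.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Pointer-sized storage viewed either as a pointer or as a short string
   (assumed layout; room for 7 characters plus NUL on a 64-bit system) */
union voidstr {
    void *p;
    char  s[sizeof(void *)];
};

/* Hypothetical demo: fork a child which sends msg back to the parent in
   the pointer payload of a queued SIGUSR1. Copies the received string to
   out (at least sizeof(void *) bytes). Returns 0 on success, -1 on error. */
int handshake(const char *msg, char *out)
{
    sigset_t sset, oset;
    siginfo_t info;
    union voidstr uv;
    pid_t cpid;

    assert(msg != NULL && out != NULL);
    if (strlen(msg) >= sizeof uv.s)
        return -1;                      /* will not fit in a pointer */

    /* Block SIGUSR1 BEFORE forking, so it stays pending until collected
       instead of killing the parent if the child gets in first */
    sigemptyset(&sset);
    sigaddset(&sset, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &sset, &oset) == -1)
        return -1;

    cpid = fork();
    if (cpid == -1) {
        sigprocmask(SIG_SETMASK, &oset, NULL);
        return -1;
    }
    if (cpid == 0) {                    /* child: send the string and go */
        union sigval sv;
        memset(&uv, 0, sizeof uv);
        strcpy(uv.s, msg);
        sv.sival_ptr = uv.p;
        sigqueue(getppid(), SIGUSR1, sv);
        _exit(0);
    }

    /* parent: sleep until the signal turns up, then unpack the payload */
    while (sigwaitinfo(&sset, &info) == -1) {
        if (errno != EINTR) {
            sigprocmask(SIG_SETMASK, &oset, NULL);
            waitpid(cpid, NULL, 0);
            return -1;
        }
    }
    uv.p = info.si_value.sival_ptr;
    sigprocmask(SIG_SETMASK, &oset, NULL);
    waitpid(cpid, NULL, 0);
    memcpy(out, uv.s, sizeof uv.s);
    return 0;
}
```

The "pointer" is never dereferenced on either side, which is the only reason this gets away with it: the two processes don't share an address space, so a real pointer would be meaningless to the receiver anyway.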

(Note that once it has forked it then has two copies of the tmpdirname buffer, so the child process can use the second copy for its own construction of the temporary directory name without affecting the operation of the main process.)

lupin() is the function that repeatedly forks a(nother) child process to execute yt-dlp (or whatever else you've told it to execute), and keeps on doing so until the child process returns a zero exit status (unless it can't execute yt-dlp at all, in which case it stops trying straight away). It is a bog-standard fork-and-exec jobbie with quite a lot of comments in it, so I hope it is basically self-explanatory.

usage() does what it obviously does, and also saves me bothering to add a paragraph here to explain the argument format. Its parameter argo is the name of the program, derived from argv[0], so you can give the compiled binary whatever name you like without having to also edit a name hard-coded into the source.

main() also ought to be pretty self-explanatory: it parses the arguments as described by usage(), fires up FUSE by calling light_fuse(), runs the yt-dlp (or whatever) loop by calling lupin(), shuts FUSE down and cleans up.

To illustrate how all the various forks and piped things hang together, here is a sample of a process tree with it running and downloading/converting a video:

20505 ?        Ss     0:11  \_ xterm -class UXTerm -title uxterm -u8
20508 pts/4    Ss     0:02      \_ bash
23115 pts/4    S+     0:00          \_ phaic -o yt-dlp -f 303 3QfkLC4pY-o
23117 pts/4    S+     0:00          |   \_ phaic -o yt-dlp -f 303 3QfkLC4pY-o
23119 pts/4    S+     0:06          |   \_ python3 /usr/bin/yt-dlp -f 303 3QfkLC4pY-o -o phaic-tmpdir-23117.AJ9fBI/file-23115
23116 pts/4    Rl+  215:03          \_ ffmpeg -i - -vcodec vp9 -crf 24 -vb 0 -tile-columns 4 -threads 4 -r 29.97 3QfkLC4pY-o.out.webm

Command line was:
phaic -o yt-dlp -f 303 3QfkLC4pY-o | ffmpeg -i - -vcodec vp9 -crf 24 -vb 0 -tile-columns 4 -threads 4 -r 29.97 3QfkLC4pY-o.out.webm

23115 is the top-level process, sitting there waiting for the sub-processes to exit. 23117 is the sub-process maintaining the FUSE fake file frig, and 23119 is the sub-process in which yt-dlp gets the video and is automatically restarted if it conks out. (3QfkLC4pY-o is the youtube ID of the video; I don't bother copying and pasting the entire URL or indeed even seeing it, since just the ID on its own is enough.) You can see that yt-dlp thinks it is writing its output to a file named phaic-tmpdir-23117.AJ9fBI/file-23115, to which it adds .part in the usual way.

$ ls -al phaic-tmpdir-23117.AJ9fBI
total 32455
drwxr-xr-x 2 root   root           0 Jan  1  1970 .
drwxr-xr-x 3 pigeon pigeon      4096 Apr  2 21:28 ..
-rw------- 1 pigeon pigeon 265837168 Jan  1  1970 file-23115.part
$

phaic-tmpdir-23117.AJ9fBI is the FUSE mount point; it is a real directory, but phaic creates it on entry and deletes it again on exit. The owner and group are actually pigeon, and it might even have files in it if it wasn't just a temporary directory newly created for this one purpose.

Once FUSE has grabbed it, you can no longer see the real directory, so you don't see the real owner/group and you wouldn't see any files in it if there were any. Note that the functions fs_readdir (titled /* Makes ls work, basically */) and fs_getattr (/* Gives sensible results for a top-level dir or a "file" in it, but not for anything else */) are only returning a hard-coded 1000 for the uid and gid of anything and everything; the directory shows up as root because that's how FUSE does it and there seems no point trying to make it not do it like that.

The file file-23115.part is the FUSE fake file and it does not really exist. There is nothing taking up 265837168 bytes on disk, in memory or anywhere else; if you try and read it you won't get 265837168 bytes of data, you'll get the string "265837168\n". It is nothing but a "stat()table and writable semi-persistent entity" or something similarly pretentious: if yt-dlp were to conk out at this point, on being restarted it would stat() the file (or whatever you call it in Python), discover that it had already downloaded 265837168 bytes, tell youtube to resume at the 265837169th byte, and carry on writing new data to the file, thinking it is being appended to the file when it is actually appearing on stdout and being piped to the stdin of ffmpeg. Meanwhile ffmpeg would have been sitting there scratching its balls waiting for the next byte on stdin.
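The off-by-one between "bytes already got" and "byte to resume at" is worth spelling out. Assuming the restart is done with an ordinary HTTP Range request (which is the normal way of resuming a download, though I have not checked exactly what yt-dlp sends), the size reported by stat() goes straight into the header unchanged, because byte offsets count from zero. The helper range_header() below is purely illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build the Range header for resuming a download of
   which have bytes already exist. Offsets are zero-based, so the first
   missing byte is at offset have, i.e. it is the (have + 1)th byte.
   Returns the length of the header, or -1 if it does not fit in out. */
int range_header(long long have, char *out, size_t len)
{
    int n;

    assert(out != NULL && have >= 0);
    n = snprintf(out, len, "Range: bytes=%lld-", have);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

So with 265837168 bytes in hand the request is for "bytes=265837168-", meaning everything from the 265837169th byte onwards.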

If you try and write to the file then that does appear to work, in that you don't get a write error and the file size appears to increase by however many bytes you wrote. However where that data has actually gone is out of stdout and down the pipe to ffmpeg, and most likely buggered it up. Also the increased count now means that if yt-dlp conks out and needs restarting, it will pick the download back up that number of bytes too far along and leave a gap. So don't do that: it is, after all, a pretty easy thing to not do.

Anyway, it all seems to work, at least so far - in that I have successfully used it to download a 30GB-odd video from youtube and pipe it to ffmpeg to be converted on the fly, which took something over 24 hours and involved several timeouts and restarts (and also a couple of instances of my crappy internet connection conking out entirely and bringing everything to a halt until the automatic fucked-connection detector switched the power to the stupid bleeding piece of shite "router" box off and on again, which takes about 5 minutes to complete). It all came out fine and the converted video file written by ffmpeg plays as expected without any glitches.

(Aside moan: WHY the FUCK can't you simply get a pure ADSL interface on a fucking PCI CARD - ie. just the hardware to interface between the ADSL line on one side and the PCI bus on the other, with all the necessary software running on the host processor, so I can actually get at the bloody thing and monitor the shitey connection with native Linux tools/facilities and restart it when it conks out in a reasonably rapid manner? Why the cunting fuck is the only way to get an ADSL interface to get what is in effect a whole separate fucking computer, which does its best to be as opaque and inaccessible as possible, and although its OS is Linux the software it actually runs is just a fucking huge opaque binary blob that you can't do anything with? I don't want this fucking incubus sitting in the way of my connection, I want the same Linux PC that I have set up myself to be a firewall and router to also be handling the ADSL connection natively so I can do something decent to compensate for the shittiness of the service. Cunt. And yeah, I know you can get things which call themselves "ADSL cards", but they are nothing but a rip-off pile of fucking shite: they aren't plain interfaces, they're one of these whole separate computer cunting things just the same as the separate boxes are, only mounted on a card instead of in a separate box and costing several times as much as the same fucking thing when it is in a separate box: what the dog shite is the fucking point?)

Download the code: phaic.tar.gz (contains: phaic.c, fuses.h, Makefile)

...Oh yeah, and after all that, what the fuck was SETATTR? Well, I'm still not completely sure, because it stopped moaning about it at some point without me having explicitly done anything to deal with it. But what seems to be going on is that once I'd written fuses.h, at the point where it was originally moaning about SETATTR it instead called the dummy version of fs_chmod() and was quite happy with that even though it's a dummy that doesn't do anything. This happens when it wants to create a new file; why the juddering fuck it wants to call fs_chmod() instead of just using the mode parameter of fs_mknod() I have no idea, but it does, because it's a cunt. So it looks like the answer is that SETATTR is fs_chmod(), at least in this case; but why the bastarding fucking shitewank can't they cunting well CALL it fs_chmod() instead of making up this stupid fucking name that doesn't exist?

Be kind to pigeons