CD-Text and cdrecord "cdtext.dat" file

This page sets out to provide some useful information relating to the question: How the pissing fuck do you write a CD with CD-Text using cdrecord?

(Note: this page is not a finished work; it is subject to updates as I find out more about this shit. Though they are not very likely to be frequent updates as finding out about this shit is fucking hard. It's supposed to be in the MMC3 standard but in fact most of it is not and what is there is in such bad English it's piss awkward to understand, and there's supposed to be more in IEC 61866 but can I find a copy of that? Can I arse. Hours on google and all I find is ONE file sharing link that 404s, ONE torrent link that 404s, and a huge number of fucking shitheads who think it is acceptable to try and make me pay up to two fucking hundred fucking quid for some fucking data which costs fucking zilch to copy so am I fuck going to pay a fucking penny for it let alone two hundred fucking pounds. Fucking cunts. You do not charge money for fucking data because copying data does not fucking cost anything so fucking go and fucking fuck yourselves you fucking pieces of shit. Plus it's supposed to be a fucking standard for everyone to use, not just for rich cunts with more money than sense. So fuck off.)

The facilities cdrecord provides for writing CD-Text are OK if all you are doing is making a clone of an existing CD that already has CD-Text on it. But for anything else they are piss useless and the situation is made worse by the ~~lack of~~ refusal to provide documentation. This is why:

cdrecord can obtain the CD-Text information either from the .inf files created when reading a CD, or from a binary file for which the usual example filename given is "cdtext.dat", which may be created from an existing CD by running cdrecord -vv -toc. Fine. But what do you do when you don't have a CD to read in the first place? Say for example you are making your own personal The Very Best of the Arse Brothers CD by assembling your favourite tracks off your collection of Arse Brothers albums. Or you are making a CD of the very first Arse Brothers album Farting In The Lift which came out in 1973 and is only available as a second-hand LP. Or you have the original CD of Bum Cheek Cheek which does not have CD-Text on it but your player supports CD-Text and you want to make a version of it with suitable text added. Or anything else like that. How the festering dog puddle do you create suitable files to tell cdrecord what CD-Text it is supposed to write?

The .inf files look promising at first because they are in plain ASCII and can therefore easily be read, understood and edited. But there is a lurking cunt. They contain other information as well, the most awkward of which is start and end sector numbers for the track. Working out what this ought to be is a pain in the arse. I suppose you could burn a dummy copy on a CD-RW and then read it back to see what sector numbers you get, or just try to work it out by hand from the file sizes and hope you get it right, or something, but I for one don't fancy doing that and it seems that nobody else does either; can't say I blame them.

The cdtext.dat file now begins to look like the better approach because it does not contain any of this extraneous shit. But there is a cunt here as well and it is a great big gaping one with a speculum in it. WHAT IS THE FORMAT OF THIS FILE? No fucker says. The cdrecord documentation certainly doesn't say. Googling found several instances of people asking about this on forums and mailing lists and getting no answer - or at least no helpful answer; most of them did get a response from Jörg Schilling, the author of cdrecord, but all he ever says is "run cdrecord -vv -toc" with an air of ineffable smugness while completely ignoring the utter uselessness of such an answer. It's no fucking use at all when you don't have a CD with CD-Text to run it on, and even if you have, all it gives you is the raw file; it doesn't tell you fuck about the format and you still have to work it out for yourself, which is exactly the problem the people asking the question were trying to avoid.

Having spent a lot of time on googling and got nothing from it bar smug cunty unhelpfulness, I was pissed off. Sufficiently pissed off that I decided to do something about it myself. So the next thing I did was try to google for a sample cdtext.dat file to have a butchers at and see if I could work out what it was at.

Fuck. First problem: can't fucking find one. That's right, according to Google there is not one single fucking example of a cdtext.dat file to be found anywhere on the entire fucking internet. How amazingly fucking shit is that?

So the next step was the mindless tedium of feeding CDs into my CD drive one by one and running cdrecord -vv -toc on each one of them to try and get a sample file that way. Problem: no fucker uses CD-Text. It seems that there is only ONE CD in my entire fucking collection that does have CD-Text on it: that is my CD of Revolution Days by Barclay James Harvest featuring Les Holroyd. Yay for Les (again).

Of course, having only one example to inspect is decidedly suboptimal as I cannot avoid the possibility that what I derive might be derived wrongly, and while it might accurately represent that particular file I can't be sure it represents an arbitrary file with equal accuracy. But with the lack of samples and the cunty attitude of Herr Schilling to people who have asked him to provide one it's the best I can do. And to start things off I can at the very least make sure there is one example cdtext.dat file on the internet instead of fucking zero. Here it is:

Sample cdtext.dat CD-Text file - read from "Revolution Days" by Barclay James Harvest featuring Les Holroyd

Here is a hex dump of the file:

    00000000 01 d6 00 00 80 00 00 00 52 45 56 4f 4c 55 54 49 |........REVOLUTI|
    00000010 4f 4e 20 44 8d 1f 80 00 01 0c 41 59 53 00 49 54 |ON D......AYS.IT|
    00000020 27 53 20 4d 59 20 1a 97 80 01 02 08 4c 49 46 45 |'S MY ......LIFE|
    00000030 00 4d 49 53 53 49 4e 47 ed bb 80 02 03 07 20 59 |.MISSING...... Y|
    00000040 4f 55 00 54 48 41 54 20 57 41 a6 c8 80 03 04 07 |OU.THAT WA......|
    00000050 53 20 54 48 45 4e 2e 2e 2e 20 54 48 60 b4 80 03 |S THEN... TH`...|
    00000060 05 0f 49 53 20 49 53 20 4e 4f 57 00 50 52 f8 61 |..IS IS NOW.PR.a|
    00000070 80 04 06 02 45 4c 55 44 45 00 4a 41 4e 55 41 52 |....ELUDE.JANUAR|
    00000080 ba 5d 80 05 07 06 59 20 4d 4f 52 4e 49 4e 47 00 |.]....Y MORNING.|
    00000090 4c 4f c1 19 80 06 08 02 56 45 20 4f 4e 20 54 48 |LO......VE ON TH|
    000000a0 45 20 4c 49 34 29 80 06 09 0e 4e 45 00 51 55 49 |E LI4)....NE.QUI|
    000000b0 45 52 4f 20 45 4c 98 48 80 07 0a 09 20 53 4f 4c |ERO EL.H.... SOL|
    000000c0 00 54 4f 54 41 4c 4c 59 fe b6 80 08 0b 07 20 43 |.TOTALLY...... C|
    000000d0 4f 4f 4c 00 4c 49 46 45 20 49 92 71 80 09 0c 06 |OOL.LIFE I.q....|
    000000e0 53 20 46 4f 52 20 4c 49 56 49 4e 47 51 e2 80 09 |S FOR LIVINGQ...|
    000000f0 0d 0f 00 53 4c 45 45 50 59 20 53 55 4e 44 1f c8 |...SLEEPY SUND..|
    00000100 80 0a 0e 0b 41 59 00 52 45 56 4f 4c 55 54 49 4f |....AY.REVOLUTIO|
    00000110 43 38 80 0b 0f 09 4e 20 44 41 59 00 4d 41 52 4c |C8....N DAY.MARL|
    00000120 45 4e 4c 68 80 0c 10 06 45 20 28 66 72 6f 6d 20 |ENLh....E (from |
    00000130 74 68 65 20 df d0 80 0c 11 0f 42 45 52 4c 49 4e |the ......BERLIN|
    00000140 20 53 55 49 54 45 ed 0e 80 0c 12 0f 29 00 00 00 | SUITE......)...|
    00000150 00 00 00 00 00 00 00 00 cc 63 81 00 13 00 42 41 |.........c....BA|
    00000160 52 43 4c 41 59 20 4a 41 4d 45 e4 20 81 00 14 0c |RCLAY JAME. ....|
    00000170 53 20 48 41 52 56 45 53 54 20 46 45 f8 d7 81 00 |S HARVEST FE....|
    00000180 15 0f 41 54 55 52 49 4e 47 20 4c 45 53 20 ec c8 |..ATURING LES ..|
    00000190 81 00 16 0f 48 4f 4c 52 4f 59 44 00 00 00 00 00 |....HOLROYD.....|
    000001a0 87 ba 8f 00 17 00 00 01 0c 00 13 04 00 00 00 00 |................|
    000001b0 00 00 33 55 8f 01 18 00 00 00 00 00 00 00 00 03 |..3U............|
    000001c0 19 00 00 00 0b f6 8f 02 19 00 00 00 00 00 09 00 |................|
    000001d0 00 00 00 00 00 00 cc b9                         |........|

Rearranged for readability:

    01 d6 00 00
    80 00 00 00 REVOLUTION D 8d 1f
    80 00 01 0c AYS.IT'S MY  1a 97
    80 01 02 08 LIFE.MISSING ed bb
    80 02 03 07  YOU.THAT WA a6 c8
    80 03 04 07 S THEN... TH 60 b4
    80 03 05 0f IS IS NOW.PR f8 61
    80 04 06 02 ELUDE.JANUAR ba 5d
    80 05 07 06 Y MORNING.LO c1 19
    80 06 08 02 VE ON THE LI 34 29
    80 06 09 0e NE.QUIERO EL 98 48
    80 07 0a 09  SOL.TOTALLY fe b6
    80 08 0b 07  COOL.LIFE I 92 71
    80 09 0c 06 S FOR LIVING 51 e2
    80 09 0d 0f .SLEEPY SUND 1f c8
    80 0a 0e 0b AY.REVOLUTIO 43 38
    80 0b 0f 09 N DAY.MARLEN 4c 68
    80 0c 10 06 E (from the  df d0
    80 0c 11 0f BERLIN SUITE ed 0e
    80 0c 12 0f )........... cc 63
    81 00 13 00 BARCLAY JAME e4 20
    81 00 14 0c S HARVEST FE f8 d7
    81 00 15 0f ATURING LES  ec c8
    81 00 16 0f HOLROYD..... 87 ba
    8f 00 17 00 ............ 33 55
    8f 01 18 00 ............ 0b f6
    8f 02 19 00 ............ cc b9

The format of it seems to be as follows:

File size - 2 (2 bytes, big-endian)
00 00 (2 bytes)
Data, split into 18-byte lines formatted as:
- Tag (1 byte)
- Content sequence (1 byte)
- Line sequence (1 byte)
- "Carryover" (1 byte)
- Text (12 bytes)
- 16-bit CRC (2 bytes, big-endian)
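To make that layout a bit more concrete, here is a minimal sketch in C of the header check and the 18-byte line as a struct. It only encodes what is listed above; the names `cdtext_line`, `cdtext_count_lines` and `cdtext_get_line` are made up for illustration and do not come from cdrecord.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CDTEXT_HEADER = 4, CDTEXT_LINE = 18 };

/* One 18-byte line, fields in file order (all single bytes, so no padding). */
struct cdtext_line {
    uint8_t tag;       /* 0x80, 0x81, 0x8e, 0x8f, ... */
    uint8_t content;   /* content sequence */
    uint8_t seq;       /* line sequence */
    uint8_t carry;     /* "carryover" */
    uint8_t text[12];  /* text payload */
    uint8_t crc[2];    /* 16-bit CRC, big-endian */
};

/* Returns the number of lines in the file image, or -1 if the 4-byte
 * header does not fit the layout described above. */
long cdtext_count_lines(const uint8_t *file, size_t size)
{
    if (size < CDTEXT_HEADER)
        return -1;
    size_t declared = ((size_t)file[0] << 8) | file[1];
    if (declared + 2 != size)              /* header holds file size - 2 */
        return -1;
    if (file[2] != 0 || file[3] != 0)      /* the two 00 bytes */
        return -1;
    if ((size - CDTEXT_HEADER) % CDTEXT_LINE != 0)
        return -1;
    return (long)((size - CDTEXT_HEADER) / CDTEXT_LINE);
}

/* Copies line number n (counting from 0) out of the file image. */
void cdtext_get_line(const uint8_t *file, size_t n, struct cdtext_line *out)
{
    memcpy(out, file + CDTEXT_HEADER + n * CDTEXT_LINE, CDTEXT_LINE);
}
```

Fed the Revolution Days file (472 bytes, header 01 d6), `cdtext_count_lines` should come back with 26, since (472 - 4) / 18 = 26.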

...So what does all this shit mean? The first two entries are I think pretty bleeding obvious; fuck knows why you have to subtract 2 from the file size, but that is the least of this format's problems, as will be seen...

The lines are grouped into sections ordered by the initial tag byte; lines within each section are in numerical sequence.

The tag byte can be:

0x80 - Album/track title data
0x81 - Performer data
0x8e - ISRC/MCN data
0x8f - Metadata describing the preceding lines

...of course it may well be able to be other things as well but I don't know what.

The content sequence byte indicates which entry in the current tag group the first byte of the "Text" belongs to. It starts over again from 0 every time we get into a new tag group. Look at the "rearranged for readability" version of the hex dump. The first line (ignoring the four bytes at the start of the file) is the first line for the 0th title, REVOLUTION DAYS (which is the title of the whole album) so its content sequence byte is 0. The second line begins with a continuation of the 0th title so its content sequence byte is also 0. The third line begins with a continuation of the 1st title, IT'S MY LIFE (the first track) so its content sequence byte is 1. And so on.

Note that as a "sequence" it is a bit shit because it can have gaps in it. Let us take the cdtext.dat for Bum Cheek Cheek as an example:

    01 e8 00 00
    80 00 00 00 BUM CHEEK CH e5 c3
    80 00 01 0c EEK.KURT VON ae cb
    80 01 02 08 NEGUT'S ASTE 71 3b
    80 01 03 0f RISK.WINDYPO 17 0d
    80 02 04 07 PS.TAGNUTS.S e5 8f
    80 04 05 01 WEET SMELL O 09 07
    80 04 06 0d F FARTING.I  da b2
    80 05 07 02 SHAT THE SHE 15 c3
    80 05 08 0e RIFF.SITTING dd 43
    80 06 09 07  IN A TREE.. fb f7
    80 06 0a 0f ..SHIT FOR E 5b 7b
    80 07 0b 0a NGLAND.THE W 30 44
    80 08 0c 05 INNIT SONG.B de 75
    80 09 0d 01 IG TURD.GRAS 69 a0
    80 0a 0e 04 SY ARSE.THE  f3 bf
    80 0b 0f 04 HAIRS ON MY  8e cd
    80 0b 10 0f BUM GROW DOW 8d 50
    80 0b 11 0f N, DOWN, DOW 55 5a
    80 0b 12 0f N.POP POP PO cb a0
    80 0c 13 0a P.REQUIEM FO bb 65
    80 0d 14 0a R A SKIDMARK 60 d3
    80 0d 15 0f ............ 64 e4
    81 00 16 00 THE ARSE BRO a6 7a
    81 00 17 0c THERS....... 9f 08
    8f 00 18 00 ............ 2e e6
    8f 01 19 00 ............ eb 4b
    8f 02 1a 00 ............ 41 1a

Look at the fifth line. It begins with the tail end of the 2nd title, WINDYPOPS, so its content sequence byte is 2. Then follows the 3rd title, TAGNUTS, which is quite short so the whole thing fits within this line with space left over. The end of the line then holds the start of the 4th title, SWEET SMELL OF FARTING, which carries over into the sixth line, and so the content sequence byte for the sixth line is 4. As you can see the content sequence jumps straight from 2 to 4, and there is no 3 referring to TAGNUTS anywhere. That's just the way it is.

The line sequence byte is straightforward: it starts at 0 and goes up by 1 for every line. Nothing difficult there.

The "carryover" byte is another one which is kind of shit. It represents the number of bytes of the title that the first byte of the line belongs to that we have already had in preceding lines. This is rather pointless to begin with since we already know that, and it is even more pointless since it is limited to a maximum of 15 even if the number of preceding bytes for the title is more than that. Fuck knows what this is all about but, again, that's the way it is.

For greater clarity, look at the Bum Cheek Cheek cdtext.dat again. The first line obviously can have no carryover from preceding lines because there aren't any, so its carryover byte is 0. The second line has a carryover byte of 0x0c because it begins with a continuation of the 0th title of which we have already had 12 bytes in the preceding line. After the 0th title on the second line come the first 8 bytes of the 1st title, which is continued on the third line so the third line has a carryover byte of 8. Since this is quite a long title it carries on into the fourth line as well; we have had 20 characters of it before the fourth line, but since 20 is greater than 15 the carryover byte for the fourth line is 15, the maximum value it is allowed to have. And so it goes.
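The content sequence and carryover rules can be sketched together in C. This is only my reading of the two examples, not anything official: it walks the strings of one tag group 12 text bytes at a time and records, for the start of each line, which entry the first byte belongs to and how many bytes of that entry have already gone by (capped at 15). `line_head` and `plan_lines` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The two bytes worked out here: content sequence and carryover. */
struct line_head {
    uint8_t content;
    uint8_t carry;
};

/*
 * Walks the entries of one tag group and fills in heads[] with the content
 * sequence and carryover byte of every line needed to hold them.
 * The terminating 0 of each entry counts as part of the entry.
 * Returns the number of lines used (never more than max).
 */
size_t plan_lines(const char *const *items, size_t n_items,
                  struct line_head *heads, size_t max)
{
    size_t lines = 0;
    size_t used = 12;               /* 12 means "current line is full" */

    for (size_t i = 0; i < n_items; i++) {
        size_t len = strlen(items[i]) + 1;      /* include the 0 */
        for (size_t done = 0; done < len; done++) {
            if (used == 12) {                   /* this byte opens a new line */
                if (lines == max)
                    return lines;
                heads[lines].content = (uint8_t)i;
                heads[lines].carry = (uint8_t)(done > 15 ? 15 : done);
                lines++;
                used = 0;
            }
            used++;
        }
    }
    return lines;
}
```

Run over the first five Bum Cheek Cheek titles this gives seven lines, with the fourth line at carryover 15, the fifth at content sequence 2 and carryover 7, and the sixth jumping to content sequence 4 with carryover 1, which matches the dump above.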

The text is just that - text, as a sequence of 0-terminated ASCII strings. The terminating 0 is considered to be part of the string. I don't think it has to be all upper case, but I don't actually know; the example I read from Revolution Days was like that, so I have stuck to that for my own generated examples, but I can't say whether you actually have to do that or whether you can use lower case if you want to.
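Because the terminating 0 is part of each string, getting the entries back out of a tag group should just be a matter of gluing the 12-byte text fields of that group together and then stepping from one 0 to the next. A rough sketch in C, assuming the lines have already been read into memory without the 4-byte file header; `gather_text` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Concatenates the 12 text bytes of every line carrying the given tag.
 * lines points at the first 18-byte line (the 4-byte file header is
 * assumed to have been skipped already). Returns the number of bytes
 * written to out; the result is a run of 0-terminated strings, possibly
 * followed by 0 padding from the last line.
 */
size_t gather_text(const uint8_t *lines, size_t n_lines, uint8_t tag,
                   char *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < n_lines; i++) {
        const uint8_t *l = lines + i * 18;
        if (l[0] != tag)
            continue;
        if (n + 12 > cap)       /* stop rather than overflow out */
            break;
        memcpy(out + n, l + 4, 12);
        n += 12;
    }
    return n;
}
```

The first string starts at `out`, the next one right after its 0, and so on; the first is the album entry and the rest are the tracks.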

The CRC is a 2-byte big-endian value representing the CRC calculated for the preceding 16 bytes according to the following algorithm:

    /* l is assumed to point to a buffer of >= 18 bytes */
    /* 0x1021 is a magic number; don't worry about it! */
    void addcrc(unsigned char *l) {
      int i, j, r;

      r = 0;
      for (i = 16; i; --i) {
        r ^= (*l++ << 8);
        for (j = 8; j; --j) if ((r <<= 1) & 0x10000) r ^= 0x1021;
      }
      r ^= 0xffff;
      *l++ = (r >> 8) & 0xff;
      *l = r & 0xff;
    }
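If you want to check lines rather than generate them, the same calculation can be written so that it returns the value instead of storing it. This is a sketch of the same algorithm as the routine above (polynomial 0x1021, starting from 0, result inverted), just using a 16-bit unsigned variable so nothing piles up in the high bits; `line_crc` and `line_crc_ok` are my own names.

```c
#include <stdint.h>

/* CRC over the first 16 bytes of a line: polynomial 0x1021, start value 0,
 * all bits of the result inverted. */
uint16_t line_crc(const uint8_t *line)
{
    uint16_t r = 0;

    for (int i = 0; i < 16; i++) {
        r ^= (uint16_t)(line[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (r & 0x8000)
                r = (uint16_t)((r << 1) ^ 0x1021);
            else
                r = (uint16_t)(r << 1);
        }
    }
    return (uint16_t)~r;
}

/* Returns 1 if the big-endian CRC stored in bytes 16 and 17 matches. */
int line_crc_ok(const uint8_t *line)
{
    uint16_t crc = line_crc(line);

    return line[16] == (crc >> 8) && line[17] == (crc & 0xff);
}
```

For the first line of the Revolution Days file (80 00 00 00 followed by "REVOLUTION D") this works out to 0x8d1f, which is what the dump shows.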

Right, that's how the lines work, now let us look at what sort of lines there are. As mentioned above, this is what the tag byte tells us, and it can have the following values (that I know about):

0x80 - Album/track title data
0x81 - Performer data
0x8e - ISRC/MCN data
0x8f - Metadata describing the preceding lines

Lines occur in numerical order of tag. Again I don't know if they have to or if you can put them in any order and it'll sort things out from the tag values, but my only pukka example is in numerical tag order, so numerical tag order let it be.

The 0x80 tag indicates a line of title data. The first title listed is the title of the whole album. The following titles are the titles of the individual tracks. There isn't anything to distinguish the album title from the track titles apart from it being the first one.

The 0x81 tag indicates a line of performer data. In the example I have there is only one performer entry - BARCLAY JAMES HARVEST FEATURING LES HOLROYD - referring to the whole album and every track on it. I am not entirely clear what is supposed to happen in the case of a compilation album containing tracks by a bunch of different bands, but I am pretty sure that what you would do in that instance is have several performer entries, the first one (referring to the whole album) being VARIOUS or something like that, followed by one performer entry for each track, naming the band concerned.

The 0x8e tag indicates a line of ISRC or MCN data. I am not sure quite how this is supposed to work because although Revolution Days does have ISRC data on it, cdrecord does not put it into the cdtext.dat file; fuck knows why not. It does get written to the .inf files, though, one ISRC tag per track, so I guess that if you want to include it you put one ISRC entry per track in the cdtext.dat file. Unlike the title (and performer, probably), there does not seem to be an initial entry referring to the whole album; there are only the entries for each track.

I am even less sure what the deal is with MCN data, since Revolution Days does not have any on it, but I think what happens is that you get one entry only, referring to the whole album, and it comes before the ISRC entries.

The 0x8f tag indicates a line of metadata. There are three such lines and they come right at the end. The tag, content sequence, line sequence, carryover and CRC all work the same as for "ordinary" lines (the carryover is always 0 because each metadata line is a separate entity and does not carry over into the next one). The metadata itself, of course, goes in the 12 text bytes. They are all set to 0 apart from the ones with stuff in them. The format seems to be like this:

    1st line : 00 01 TT 00 LT LP 00 00 00 00 00 00
    2nd line : 00 00 00 00 00 00 LI 03 LC 00 00 00
    3rd line : 00 00 00 00 09 00 00 00 00 00 00 00

The variables TT, LT, LP, LI and LC mean:

TT: Total number of Tracks
LT: Line count of Title lines
LP: Line count of Performer lines
LI: Line count of ISRC lines
LC: Line Count

TT is found by counting the number of title entries and subtracting 1 (because the first title is the album title, not a track title). LC, the Line Count, is the same value as the line sequence byte for the 3rd 0x8f line, which is the last line of the file. Since line sequence numbers start at 0, that makes it the total number of lines in the file minus 1 (0x19 for the 26 lines of Revolution Days). LT, LP and LI refer to the number of lines used for title, performer and ISRC entries, respectively, not the number of entries.
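As a sketch of how those three lines get filled in, here is a bit of C that builds just the 12 text bytes of each. It assumes the file contains only title, performer and ISRC lines before the metadata, and it copies the constants 01, 03 and 09 blindly from the one sample; `fill_metadata` is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fills the 12 text bytes of each of the three 0x8f lines.
 * tracks = TT, title_lines = LT, perf_lines = LP, isrc_lines = LI.
 * LC is taken to be the line sequence byte of the third 0x8f line.
 * The constants 0x01, 0x03 and 0x09 are copied from the sample file,
 * not derived from any specification.
 */
void fill_metadata(uint8_t text[3][12], uint8_t tracks, uint8_t title_lines,
                   uint8_t perf_lines, uint8_t isrc_lines)
{
    /* The three 0x8f lines follow the data lines, so the last one has
     * sequence number (data lines) + 2. */
    uint8_t lc = (uint8_t)(title_lines + perf_lines + isrc_lines + 2);

    memset(text, 0, 3 * 12);

    text[0][1] = 0x01;
    text[0][2] = tracks;
    text[0][4] = title_lines;
    text[0][5] = perf_lines;

    text[1][6] = isrc_lines;
    text[1][7] = 0x03;
    text[1][8] = lc;

    text[2][4] = 0x09;
}
```

With the Revolution Days numbers (12 tracks, 19 title lines, 4 performer lines, no ISRC lines) this reproduces the text bytes of the three 0x8f lines in the hex dump, with LC coming out as 0x19.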

The "magic constants" 01, 03 and 09 are a bit unfortunate. They may not really be constants; it is quite possible that they are variables that don't vary very much. 09 is probably a language code meaning English. Fuck only knows what 01 and 03 are.

So, having ploughed through all this crap, where do we end up? It is still going to be a pain in the arse to put together a file in this format by hand. But fret ye not, I have written a program to do it for you. It takes as input a text file in a very simple format and generates a cdtext.dat-format file from it.

The text file contains the title, performer and ISRC entries, one per line. The first character of each line must be either T, P or I according to which kind of entry it is. Any whitespace between the T/P/I and the rest of the line is ignored. The entries must be in the order they are to appear in the cdtext.dat file, i.e. the first one refers to the whole album and the subsequent ones refer to individual tracks, as described above. There must be the right number of them also as described above and it is up to you to make sure this is the case.

Here is a sample file which will produce a cdtext.dat for Revolution Days identical to that created by cdrecord from the original CD:

    T REVOLUTION DAYS
    T IT'S MY LIFE
    T MISSING YOU
    T THAT WAS THEN... THIS IS NOW
    T PRELUDE
    T JANUARY MORNING
    T LOVE ON THE LINE
    T QUIERO EL SOL
    T TOTALLY COOL
    T LIFE IS FOR LIVING
    T SLEEPY SUNDAY
    T REVOLUTION DAY
    T MARLENE (from the BERLIN SUITE)
    P BARCLAY JAMES HARVEST FEATURING LES HOLROYD

And here is the source code. It does not do any validity checking so it is up to you to make sure your input is OK. I can't guarantee its output to be valid either; as I have said the only pukka example I have to go on is the Revolution Days data, and all I can say is that my program does successfully produce a cdtext.dat for that album identical to that read off the original CD by cdrecord. In particular the implementation of ISRC entries is pretty much a guess, although I think it is quite a good one.

Download source code: make-cdtext-dat.c.gz

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>

    #define SCRATCH 1024
    #define LINE 18

    unsigned char ctt[][2]={
      { 'T', 0x80 },
      { 'P', 0x81 },
      { 'I', 0x8e },
      { 0, 0 }
    };

    unsigned char ctot(unsigned char c) {
      int x;

      for (x = 0; ctt[x][0]; x++) if (ctt[x][0] == c) return(ctt[x][1]);
      return(0);
    }

    unsigned char ttoc(unsigned char t) {
      int x;

      for (x = 0; ctt[x][1]; x++) if (ctt[x][1] == t) return(ctt[x][0]);
      return(0);
    }

    void addcrc(unsigned char *l) {
      int i, j, r;

      r = 0;
      for (i = 16; i; --i) {
        r ^= (*l++ << 8);
        for (j = 8; j; --j) if ((r <<= 1) & 0x10000) r ^= 0x1021;
      }
      r ^= 0xffff;
      *l++ = (r >> 8) & 0xff;
      *l = r & 0xff;
    }

    void dl(unsigned char *l) {
      int x;

      printf("%02x %02x %02x %02x ", l[0], l[1], l[2], l[3]);
      for (x = 4; x < 16; x++) printf("%c", isprint(l[x]) ? l[x] : '.');
      printf(" %02x %02x\n", l[16], l[17]);
    }

    void send(int action, unsigned char *l) {
      static unsigned char *buf=NULL;
      static int len=0;
      FILE *ofp;

      switch (action) {
        case -1 :
          if (!(ofp = fopen(l, "w"))) {
            fprintf(stderr, "%s: cannot open for writing\n", l);
            exit(1);
          }
          fprintf(ofp, "%c%c", (unsigned char)((len >> 8) & 0xff), (unsigned char)(len & 0xff));
          fwrite(buf, 1, len, ofp);
          fclose(ofp);
          free(buf);
          buf = NULL;
          len = 0;
          break;
        case 0 :
          addcrc(l);
          dl(l);
          memcpy(&buf[len], l, LINE);
          len += LINE;
          break;
        default :
          if (!(buf = malloc(action += 64))) {
            fprintf(stderr, "Out of memory: cannot allocate %d bytes\n", action);
            exit(1);
          }
          buf[0] = buf[1] = 0;
          len = 2;
          break;
      }
    }

    int recv(char action, unsigned char *l) {
      static FILE *ifp;
      int ret=0;
      unsigned char buf[SCRATCH];
      unsigned char *s;

      switch (action) {
        case -1 :
          fclose(ifp);
          break;
        case 0 :
          if (!(ifp = fopen(l, "r"))) {
            fprintf(stderr, "%s: cannot open for reading\n", l);
            exit(1);
          }
          fseek(ifp, 0, SEEK_END);
          ret = ftell(ifp);
          rewind(ifp);
          ret += (ret >> 1);
          send(ret, NULL);
          break;
        default :
          if (feof(ifp)) rewind(ifp);
          while (fgets(buf, SCRATCH, ifp)) {
            s = buf;
            if (*s++ == action) {
              if (l) {
                while (isspace(*s)) s++;
                while (*l++ = *s++) ret++;
              } else {
                ret = 1;
              }
              break;
            }
          }
          break;
      }
      return(ret);
    }

    int dolines(unsigned char code, unsigned char ln) {
      unsigned char buf[SCRATCH];
      unsigned char ol[LINE];
      unsigned char *s, *o, *t;
      unsigned char pref, seq, cont, bc;

      pref = ctot(code);
      seq = 0;
      cont = 0;
      bc = 12;
      while (recv(code, buf)) {
        t = buf;
        while (*t) if (!isprint(*++t)) *t = 0;
        for (s = buf; s <= t; s++) {
          if (bc == 12) {
            o = ol;
            *o++ = pref;
            *o++ = seq;
            *o++ = ln++;
            *o++ = (cont > 15 ? 15 : cont);
          }
          *o++ = *s;
          cont++;
          if (!--bc) {
            send(0, ol);
            bc = 12;
          }
        }
        cont = 0;
        seq++;
      }
      if (bc < 12) {
        while (bc--) *o++ = 0;
        send(0, ol);
      }
      return(ln);
    }

    void dofile(char *infile, char *outfile) {
      unsigned char trak, perf, isrc, tot, trax;
      char buf[LINE];

      recv(0, infile);
      trak = dolines('T', 0);
      perf = dolines('P', trak);
      isrc = dolines('I', perf);
      tot = isrc;
      isrc -= perf;
      perf -= trak;
      for (trax = 0; recv('T', NULL); trax++);
      trax--;
      recv(-1, NULL);
      memset(buf, 0, LINE);
      sprintf(buf, "\x8f%c%c%c%c\x01%c%c%c%c", 0, tot++, 0, 0, trax, 0, trak, perf);
      send(0, buf);
      memset(buf, 0, LINE);
      sprintf(buf, "\x8f\x01%c", tot++);
      sprintf(&buf[10], "%c\x03%c", isrc, tot);
      send(0, buf);
      memset(buf, 0, LINE);
      sprintf(buf, "\x8f\x02%c", tot++);
      buf[8] = 9;
      send(0, buf);
      send(-1, outfile);
    }

    int main(int argc, char *argv[]) {
      char *p;

      if (p = strrchr(argv[0], '/')) {
        p++;
      } else {
        p = argv[0];
      }
      if (argc != 3) {
        fprintf(stderr, "Usage: %s infile outfile\n", p);
        exit(1);
      }
      dofile(argv[1], argv[2]);
    }

...Of course, having read all this, you may well be thinking: why go to all this trouble in order to use cdrecord; why not just use something else? To which the answer is: all the other programs capable of writing CD-Text are even more of a pain in the fucking arse to get the CD-Text data into them, so this is actually the easiest method...
