< akira4> Hi! I'm new here. I wanted to apply for OPW under ffmpeg. Could someone guide me about how to start?
<@llogan> akira4: have you seen this link? https://trac.ffmpeg.org/wiki/SponsoringPrograms/OPW/2014-12
< akira4> llogan, I did check the page but couldn't figure out where exactly to start
<@llogan> do any of the listed projects interest you? do you have a project idea you would like to work on?
< akira4> llogan, I am interested in the subtitles project.
<@llogan> i think ubitux can help you with that.
<@ubitux> oh?
<@llogan> you're listed as a mentor
<@ubitux> yeah right
<@ubitux> but i wasn't briefed on the opw process
<@ubitux> so yeah sure i can answer subtitles questions, but not so much about opw
< akira4> I see. Thanks
< akira4> Also how do I start with the qualification task?
<@ubitux> mmh let me see what i put there
<@ubitux> "write one subtitles demuxer and decoder (for example support for Spruce subtitles format). This is in order to make sure the subtitles chain is understood." mmh alright
<@ubitux> let me see if i can find something that looks like a specification
<@ubitux> http://documentation.apple.com/en/dvdstudiopro/usermanual/index.html#chapter=19%26section=13%26tasks=true
<@ubitux> alright, this looks like a good starting point
<@ubitux> not sure if we have samples around
< akira4> cool. Thanks
<@ubitux> http://www.eso.org/~lchriste/trans/eyes/subtitles/soundtrack.stl
<@ubitux> alright, here is one
<@ubitux> akira4: are you familiar with libavformat/libavcodec and the demux/decode process or not at all?
* rcombs would love to see subtitles in lavfi, and would probably help with it
< akira4> ubitux, No I'm not actually.
<@ubitux> rcombs: yes that's the last step, but there are a few things before, to give me enough time to redesign the api so it's possible :P
<@ubitux> akira4: alright, so...
<@ubitux> akira4: basically, see this first: http://ffmpeg.org/ffmpeg.html#Detailed-description
<@ubitux> (just the ascii graph and the explanation below)
<@ubitux> poke me when you're done, i'll explain how subtitles fit into this
< akira4> I see.
< akira4> cool
< akira4> I'll do that
<@ubitux> akira4: i realize that description doesn't actually explain much; can you start building ffmpeg and get maybe a .srt file somewhere?
< akira4> ubitux, Alright I'll try doing that.
< akira4> ubitux, I'm done with building the source code. Should I read the documentation that you provided?
<@ubitux> akira4: it will just take you 2 min
<@ubitux> akira4: do you have a .srt file at hand?
< akira4> ubitux, yep I have many .srt files with me
<@ubitux> ok; so do you have a ffprobe tool built in the source directory?
< akira4> ubitux, yes I do.
<@ubitux> try running ./ffprobe -show_packets -show_data foo.srt | less
<@ubitux> this will show you the demuxing process
< akira4> I see.
<@ubitux> basically, the srt demuxer (in libavformat/srtdec.c) will fill "packets"
<@ubitux> a packet is a simple structure, which has a few fields, notably pts, data and size
<@ubitux> and duration
<@ubitux> (and a few other things you can see here)
<@ubitux> the data is basically supposed to be kind of opaque
<@ubitux> in the case of srt, you can see that it contains basically the text for each event
< akira4> yes
<@ubitux> but it can have markup, right
<@ubitux> typically, it's the event copied verbatim
<@ubitux> so with tags like <i> and stuff like that
<@ubitux> other subtitle demuxers will output similar packets, with their markup as well
< akira4> I see.
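The packet structure ubitux describes can be sketched as a simplified C struct. This is a standalone illustration: the field names mirror FFmpeg's AVPacket, but the real definition lives in libavcodec and has many more fields.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for FFmpeg's AVPacket: a timed, opaque byte blob.
 * Field names mirror the real structure, but this is a toy. */
typedef struct DemoPacket {
    int64_t pts;      /* presentation timestamp of the event */
    int64_t duration; /* how long the event stays on screen */
    uint8_t *data;    /* opaque payload: the subtitle text, markup included */
    int size;         /* payload size in bytes */
} DemoPacket;

/* What the srt demuxer conceptually fills in for one event. */
static DemoPacket make_text_packet(int64_t pts, int64_t duration, const char *text)
{
    DemoPacket pkt;
    pkt.pts      = pts;
    pkt.duration = duration;
    pkt.data     = (uint8_t *)text; /* real code would copy/refcount this */
    pkt.size     = (int)strlen(text);
    return pkt;
}
```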
<@ubitux> for microdvd typically, you'll get stuff like {c:$...}
<@ubitux> and same for every other format
<@ubitux> anyway
<@ubitux> these packets, you can send them directly to a muxer
<@ubitux> for instance with ffmpeg, you can do ffmpeg -i in.srt -c copy out.srt, and only the demuxer and muxer will be in the chain
<@ubitux> the demuxer will output timed packets, and the muxer will re-create a file by printing timestamps and the payload (data)
<@ubitux> similarly, you can do ffmpeg -i in.srt -c copy out.mkv
< akira4> oh
< akira4> wait so
< akira4> if i'm getting this right
< akira4> we're basically taking a .srt file with markup
< akira4> and creating a file with timed events corresponding to the text?
<@ubitux> you should get the exact same file at the end
<@ubitux> can you open libavformat/srtenc.c ?
<@ubitux> srt_write_packet() is the main callback of the muxer
<@ubitux> it takes a packet with a pts and duration, and prints the string "00:01:02:03 --> 04:05:..."
<@ubitux> and the payload
< akira4> okay.
<@ubitux> the idea is that some containers (or formats) accept packets of different known tags
<@ubitux> so your srt demuxer is outputting packets, with the codec "subrip" (that's the name of the markup)
<@ubitux> several muxers can take these packets
<@ubitux> the srt muxer is obviously one, but the matroska (mkv) muxer also accepts them
<@ubitux> it means the muxer knows how to store these packets
< akira4> I see.
<@ubitux> in the case of the srt muxer, it will create a new .srt file with just the timestamps printed as is, and the text
<@ubitux> and matroska has its own way of storing the timestamps
<@ubitux> so you'll have something like 32 bits for a timestamp, stored as binary
<@ubitux> (not sure if that's exactly that, but you get the point)
< akira4> yeah
<@ubitux> so, is that fine with you so far?
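The timestamp printing srt_write_packet() does can be sketched as a toy formatter. It assumes a pts in milliseconds and produces SRT's HH:MM:SS,mmm notation; the real muxer in libavformat/srtenc.c also handles event counters and the payload.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Toy version of the timestamp formatting an SRT muxer performs:
 * turn a millisecond pts into the "HH:MM:SS,mmm" notation of .srt files.
 * Not the real srt_write_packet() from libavformat/srtenc.c. */
static void format_srt_time(int64_t ms, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%02d:%02d:%02d,%03d",
             (int)(ms / 3600000),        /* hours */
             (int)(ms / 60000 % 60),     /* minutes */
             (int)(ms / 1000 % 60),      /* seconds */
             (int)(ms % 1000));          /* milliseconds */
}
```

A packet with pts=3723004 and duration=1000 would thus come out as the line "01:02:03,004 --> 01:02:04,004" followed by the payload.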
< akira4> one thing
<@ubitux> i'm going to go on to the decoding process now
<@ubitux> ok
< akira4> so the srt file that was created, out.srt
< akira4> is in some ways different from in.srt ?
<@ubitux> ideally, it should be the same
<@ubitux> in practice you might have slight differences
<@ubitux> let me think of an example..
<@ubitux> right, imagine you have a lot of empty lines at the beginning of in.srt
<@ubitux> the srt demuxer will ignore them
<@ubitux> and just output packets, losing that "information"
< akira4> okay
<@ubitux> similarly, you could imagine an in.srt with timestamps written in a weird manner
<@ubitux> like, i don't know, if the timestamps are written like 00001:02:03.04 --> ...
<@ubitux> when reading the timestamp and storing it in AVPacket.pts, the demuxer will lose that weird "0" padding
<@ubitux> only the timestamp value itself is kept
< akira4> and that would lead to losing data if the .srt file is opened by a container that isn't compatible with it?
<@ubitux> the muxer will probably print 01:02:03.04 --> ...
<@ubitux> you can't actually send the packet to a muxer that doesn't support it
<@ubitux> if the muxer doesn't accept subrip packets, you'll have to convert them, that's the next step
<@ubitux> (convert them from one markup to another)
<@ubitux> this is basically the same as audio and video
<@ubitux> if you have a mkv with h264 in it, and you want to put that h264 into ogg, you can't
<@ubitux> because ogg doesn't accept h264 packets, and you'll have to convert these packets
<@ubitux> OTOH, you can demux h264 packets from a mkv file, and just remux them into a mp4 file
<@ubitux> because both mkv and mp4 accept h264 packets
< akira4> hold on. Let me read the whole thing. it happened too fast.
<@ubitux> right, sorry
< akira4> so the whole idea is that if there is a packet that a muxer doesn't support, we convert it?
<@ubitux> yes
< akira4> and the packets can be different if they have different tags?
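The padding-loss point above can be demonstrated with a hypothetical timestamp parser (assuming H:MM:SS.CC centisecond timestamps, as in the chat's example): both spellings collapse to the same integer pts, which is all the muxer ever gets back, so the odd "0" padding cannot survive a remux.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy parser showing why odd zero-padding disappears on a remux:
 * the demuxer keeps only the numeric value in AVPacket.pts, so
 * "00001:02:03.04" and "1:02:03.04" parse to the same integer,
 * and the muxer reprints it in its own canonical form.
 * Format assumed: H:MM:SS.CC (centiseconds). Returns -1 on error. */
static int64_t parse_time_cs(const char *s)
{
    int h, m, sec, cs;
    if (sscanf(s, "%d:%d:%d.%d", &h, &m, &sec, &cs) != 4)
        return -1;
    return ((int64_t)h * 3600 + m * 60 + sec) * 100 + cs;
}
```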
<@ubitux> that's the decoding/encoding process i was going to explain
< akira4> okay
<@ubitux> what do you mean by different packets and different tags?
<@ubitux> a subrip packet will be different from a microdvd packet, yes
< akira4> I'm not sure what tags mean
< akira4> I was gonna ask that
<@ubitux> i don't remember talking about tags, but you did :p
< akira4> sorry. I think I got confused
<@ubitux> i'm going to give you more examples before i continue
< akira4> cool. thanks :)
<@ubitux> a .sub file (microdvd) contains lines like this: "{1400}{1500}hello world"
<@ubitux> 1400 is the starting frame, and 1500 is the ending one; you can consider them as timestamps for now
< akira4> ok
<@ubitux> now microdvd also has a markup system
<@ubitux> it looks like this typically: "{1400}{1500}{c:$ff0000}hello world"
<@ubitux> this will make the text red
<@ubitux> anyway, the demuxer has no knowledge of the markup
<@ubitux> and in this case, it will output a packet that looks like this:
<@ubitux> AVPacket { pts=1400, duration=100, data="{c:$ff0000}hello world" }
<@ubitux> (it's a C struct, right?)
< akira4> yep
<@ubitux> if you have a .srt file, it's different
<@ubitux> you will probably get something like this:
< akira4> I see.
<@ubitux> AVPacket { pts=56, duration=100, data="hello world" }
<@ubitux> and so, as you can guess, the srt muxer can not accept the microdvd packets
<@ubitux> and in the same way the microdvd muxer can not accept the srt packets
<@ubitux> otherwise, you would end up with files like this:
<@ubitux> {1400}{1500}hello world
<@ubitux> and this is an invalid file
< akira4> hmm.
<@ubitux> that's why the microdvd muxer only accepts microdvd packets
<@ubitux> and the srt muxer only accepts srt packets
<@ubitux> for example, matroska accepts the srt packets, but not the microdvd ones
<@ubitux> because that's how it was designed, it only accepts the srt markup
<@ubitux> did i lose you again?
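The MicroDVD AVPacket example above can be reproduced with a toy line parser, illustrative only and not the real libavformat/microdvddec.c: the two frame numbers become pts and duration, and the payload, markup included, is left untouched in data.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy packet, mirroring the AVPacket fields discussed in the chat. */
typedef struct {
    int64_t pts;
    int64_t duration;
    const char *data; /* points into the input line, markup included */
} ToyPacket;

/* Parse one MicroDVD line "{start}{end}payload" into a toy packet.
 * The demuxer does not interpret the payload at all. Returns -1 on error. */
static int parse_microdvd_line(const char *line, ToyPacket *pkt)
{
    long start, end;
    int consumed;
    if (sscanf(line, "{%ld}{%ld}%n", &start, &end, &consumed) != 2)
        return -1;
    pkt->pts      = start;
    pkt->duration = end - start;
    pkt->data     = line + consumed;
    return 0;
}
```

Feeding it "{1400}{1500}{c:$ff0000}hello world" yields exactly the packet shown above: pts=1400, duration=100, data="{c:$ff0000}hello world".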
< akira4> nope
< akira4> I got everything :)
<@ubitux> cool
<@ubitux> should i move on then?
< akira4> yes
<@ubitux> alright, now it's a bit tricky
<@ubitux> i'm going to make a comparison with how audio and video are handled
< akira4> okay
<+wm4> oh, someone is going to help out with subtitles stuff... that's nice
<@ubitux> if you have h264 packets, the data is the compressed data that only the decoder can translate to images
<@ubitux> every video decoder (they are in libavcodec/) only understands one kind of packet
<@ubitux> they output "raw" frames
< akira4> I see.
<@ubitux> there are various different forms of "raw" but all of them are generic
< akira4> what exactly are frames?
<@ubitux> typically, some decoders will output RGB
<@ubitux> some others will output YUV, but that's all pure raw data that can be piped to image processing code
<+wm4> please note that this separation is sometimes a bit non-sensical with subtitles
<@ubitux> wm4: wait wait ;)
<+wm4> because subtitles are complicated
<+wm4> ok ok
<@ubitux> wm4: i'm trying to make things simple first :P
<@ubitux> akira4: when talking about frames, i'm talking about the AVFrame structure
<@ubitux> it contains a lot of information about the decoded "frame" or image
<@ubitux> (or slice of sound in the case of audio)
< akira4> Oh. I see.
<@ubitux> it's basically the exploitable data
<@ubitux> a decoder's task is to transform an AVPacket into an AVFrame (in the case of audio/video)
<@ubitux> the AVPacket is the opaque form that only demuxers and muxers understand
<@ubitux> and the AVFrame is the decoded form, which is usable by everyone
< akira4> ok
<@ubitux> so if you want to display a frame in your video player, you have to decode it, to get the picture itself, and blend it onto the screen or whatever
<+wm4> it's worth noting that a packet (AVPacket) is basically always a byte blob
<@ubitux> yes, the data is supposedly opaque, it just carries timing information and says what the data blob type is
<+wm4> so it's basically a byte array plus timestamps (and, very rarely, some "side" data)
< akira4> ok.
<@ubitux> so now we reach the point where you can get a clue about what the ascii graph @ http://ffmpeg.org/ffmpeg.html#Detailed-description means
< akira4> yeah.
<@ubitux> the decoded frames box is where you plug your filters typically (to alter the image itself)
<@ubitux> ok, so this is how audio and video work
<@ubitux> subtitles are a PITA so it's a bit different
< akira4> the filters can be used on both audio and video frames right?
<@ubitux> yes
<@ubitux> nowadays both audio and video are stored in the AVFrame structure
<+wm4> ubitux: we should explain how audio/video is displayed
<+wm4> since video players are, you know, a very common use case of ffmpeg
<@ubitux> i don't think that's necessary now
<@ubitux> i'm trying to make sure the demuxer/muxer and decoder/encoder process is well understood
<@ubitux> so akira4 can write a demuxer/decoder for a simple subtitle format
<@ubitux> and actually understand what's going on
<@ubitux> anyway
<@ubitux> akira4: so, are you ok so far?
< akira4> Yep.
<@ubitux> ok so, as we said, the AVFrame contains, for audio and video, universally recognized forms
<@ubitux> for subtitles, the equivalent structure is AVSubtitle
< akira4> I see.
<@ubitux> and we decided that the universal representation of subtitles would be... ASS
<+wm4> and we all hate AVSubtitle!!11
<@ubitux> wm4: haha
<@ubitux> akira4: are you familiar with .ass/.ssa, libass or vsfilter?
<@ubitux> anime subtitles and stuff
< akira4> No I'm not.
<@ubitux> ok
< akira4> oh
<@ubitux> so basically, do you see the karaoke openings in animes?
< akira4> Yeah
<@ubitux> these are done with ASS/SSA markup
<@ubitux> so far, it's the most widely used markup for subtitles
< akira4> hmm
<@ubitux> it has a dark history and i won't go into the details
<@ubitux> but basically that's the only subtitles markup that has correct rendering engines
<+wm4> actually, srt is the most common format
<@ubitux> and those are vsfilter (the original implementation, windows) and libass (linux etc)
< akira4> rendering engines?
<@ubitux> wm4: there is no "srt rendering engine"
<@ubitux> akira4: yes, translating the markup to the picture itself, the rasterization process
<@ubitux> basically translating a "foo" into a bitmap with "foo" in red
< akira4> cool
<@ubitux> every unix now uses libass for rendering subtitles
<@ubitux> and so basically the idea is that the common representation for all subtitles formats is... ass/ssa markup
< akira4> okay
<@ubitux> because it's an extremely advanced markup, and it has an engine to actually do the rendering
<@ubitux> actually, almost every player does this
<@ubitux> basically, they will convert to the ASS equivalent
<@ubitux> and send it to libass to get a bitmap
<@ubitux> and blend it onto the video
< akira4> I see.
<@ubitux> in FFmpeg we try to achieve something similar
<@ubitux> so in the AVSubtitle structure, we store the "decoded" markup in ASS
<@ubitux> there are a lot of very nasty details about this, but i will not talk about them today
<@ubitux> so anyway
<@ubitux> in libavcodec you'll find "markup converters" for subtitles
<@ubitux> if you look at libavcodec/microdvddec.c
<@ubitux> it will take packets with microdvd markup (so "{c:$ff0000}foo")
<@ubitux> and convert that string to ASS markup and store it in AVSubtitle
< akira4> just a sec
< akira4> is microdvd_decode_frame the part where the conversion is done?
<@ubitux> (if you want to visualize ASS markup, it looks like this: http://docs.aegisub.org/3.2/ASS_Tags/)
<@ubitux> akira4: yes
<@ubitux> akira4: especially the while () block
<@ubitux> it starts by opening the tags
<@ubitux> see microdvd_open_tags()
< akira4> yep
<@ubitux> it translates the {c:...} tags into the ass markup {\c&H...}
<@ubitux> you can have a look at the subrip decoder (badly named srtdec) in libavcodec/srtdec.c
<@ubitux> but i'm currently making this one sane
<@ubitux> so you'll see various historical horrors
<@ubitux> i'll explain what the problem was another time
< akira4> haha. Okay
<@ubitux> but to make things simple, the timing information was for a long time stored in the data payload
<@ubitux> and well.....
<@ubitux> so.
<@ubitux> these are the main concepts
< akira4> usually what is stored in the data payload?
<@ubitux> the opaque data that only the decoder understands
<@ubitux> in the case of subtitles, it's the text of the event
<@ubitux> with the markup
<@ubitux> the timing information has no place in the data
<@ubitux> it belongs in the pts and duration fields
< akira4> Oh.
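The {c:...} to {\c&H...&} translation described above can be sketched with toy code. It is not the real microdvd_open_tags(); it handles only a single leading color tag and assumes the hex value can be carried over unchanged into the ASS tag.

```c
#include <stdio.h>
#include <string.h>

/* Toy sketch of a markup conversion in the spirit of
 * libavcodec/microdvddec.c: rewrite a leading MicroDVD color tag
 * "{c:$hhhhhh}" into the ASS equivalent "{\c&Hhhhhhh&}", and pass
 * everything else through verbatim. Assumes the hex value itself
 * needs no reordering; real code must follow the actual formats. */
static void microdvd_color_to_ass(const char *in, char *out, size_t outlen)
{
    unsigned color;
    int consumed;
    if (sscanf(in, "{c:$%6x}%n", &color, &consumed) == 1)
        snprintf(out, outlen, "{\\c&H%06X&}%s", color, in + consumed);
    else
        snprintf(out, outlen, "%s", in); /* no color tag: copy verbatim */
}
```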
< akira4> I see
<@ubitux> it's important because in the ffmpeg chain
<@ubitux> you can actually play with the timestamps a lot
<@ubitux> typically you could imagine that in a remux you shift the timestamps
<@ubitux> like, for retiming a .srt
<@ubitux> you could do that
<@ubitux> you don't need to understand the markup data for that
< akira4> hmm.
<@ubitux> you just need to demux, alter the AVPacket->pts, and remux it
<@ubitux> no need for decoding
<@ubitux> ah, and i forgot the encoding part: for subtitles basically we have a system that parses back the ass markup, with a callback system, so the encoder can do the inverse operation (ass markup to whatever subtitle markup)
< akira4> that's because..decoding would be the part where we take the packet and convert it into a frame and that wouldn't be necessary right?
<@ubitux> exactly
<@ubitux> into an AVSubtitle for subtitles, but yes that's the idea
< akira4> cool.
<@ubitux> you don't need to decode the srt to ass (decoder), and convert it back to srt (encoder) before resending the packet to the srt muxer
<@ubitux> you can just copy opaquely from srt demuxer to srt muxer
< akira4> makes sense
<@ubitux> you can't change the content of the text (because it's not "understandable" except by the decoder), but the AVPacket has the timing information that you can just change
<@ubitux> so, is everything clear enough so far?
< akira4> yeah.
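The demux, shift pts, remux idea above can be sketched with a simplified packet struct (not the real FFmpeg API): the payload is never inspected, only the timing changes.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified packet, echoing the AVPacket fields used in this chat. */
typedef struct {
    int64_t pts;
    int64_t duration;
    const char *data; /* opaque payload: never touched here */
} ShiftPacket;

/* Retiming without decoding: shift every packet's pts by a fixed
 * offset. Duration and data stay exactly as the demuxer produced them,
 * so no decoder or encoder is involved. */
static void shift_timestamps(ShiftPacket *pkts, size_t n, int64_t offset_ms)
{
    for (size_t i = 0; i < n; i++)
        pkts[i].pts += offset_ms;
}
```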
<@ubitux> ok so
<@ubitux> i don't have much time left for today
<@ubitux> but i'll give you directions for the qualification task
<@ubitux> basically what you want to do for now is simply add a demuxer
<@ubitux> so, you'll need to update libavformat/allformats.c, libavformat/Makefile, and create a file similar to other demuxers
<@ubitux> libavformat/webvttdec.c might be a good example
<@ubitux> and since you don't want to support markup for now
<@ubitux> you'll make that demuxer output "text" packets
< akira4> okay
<@ubitux> we already have a "text" decoder
< akira4> I see.
<@ubitux> it will "convert" the raw text to ass markup
<@ubitux> and it actually already understands the '|' as line separator typically
< akira4> and about the muxer?
<@ubitux> don't worry about the muxer
<@ubitux> you don't really need to write one, no one will use it
< akira4> cool.
<@ubitux> the main users of this are the players
<@ubitux> and they just want to be able to support all kinds of subtitle formats
<@ubitux> no one is interested in creating these old broken subtitles formats anymore
< akira4> hmm
<@ubitux> the codec id you are interested in is AV_CODEC_ID_TEXT
<@ubitux> that's what your demuxer should output
<@ubitux> if you do that, you should be able to do ffmpeg -i in.stl out.srt
<@ubitux> (since you have a decoder, and a subrip encoder already exists, it will work just fine and create a proper srt file)
<@ubitux> if you want examples of other formats outputting AV_CODEC_ID_TEXT packets, you can try: git grep AV_CODEC_ID_TEXT in the libavformat directory
< akira4> hmm
<@ubitux> actually, just look at libavformat/aqtitledec.c
<@ubitux> anyway
<@ubitux> after modifying libavformat/allformats.c and libavformat/Makefile
<@ubitux> don't forget to re-run configure
<@ubitux> (--cc='ccache cc' is your friend)
< akira4> I won't :)
<@ubitux> i think you have enough information for now to start experimenting
< akira4> thank you so much for the help ubitux :)
<@ubitux> i'll be
available in a few hours if you have more questions
<@ubitux> feel free to ask even if i'm afk, i'll backlog
<@ubitux> good luck
< akira4> cool. thanks :)
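For the qualification task discussed above, the core of a Spruce STL demuxer is parsing event lines. A hypothetical sketch follows: the "HH:MM:SS:FF , HH:MM:SS:FF , text" shape matches the linked sample file, and treating the FF field as centiseconds is a simplifying assumption; a real demuxer must check the actual specification.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy parser for one Spruce STL event line:
 *   "HH:MM:SS:FF , HH:MM:SS:FF , text"
 * Fills start/end (here naively in centiseconds, assuming FF is 1/100s)
 * and points *text at the payload. A real demuxer would map these onto
 * AVPacket.pts and duration. Returns -1 if the line doesn't match. */
static int parse_stl_line(const char *line, int64_t *start_cs, int64_t *end_cs,
                          const char **text)
{
    int h1, m1, s1, f1, h2, m2, s2, f2, consumed;
    if (sscanf(line, "%d:%d:%d:%d , %d:%d:%d:%d , %n",
               &h1, &m1, &s1, &f1, &h2, &m2, &s2, &f2, &consumed) != 8)
        return -1;
    *start_cs = ((int64_t)h1 * 3600 + m1 * 60 + s1) * 100 + f1;
    *end_cs   = ((int64_t)h2 * 3600 + m2 * 60 + s2) * 100 + f2;
    *text     = line + consumed;
    return 0;
}
```

In a real demuxer this parsing would live in the read_packet callback, with the difference end minus start stored as the packet duration and the text emitted as an AV_CODEC_ID_TEXT payload.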