Fix broken MKV samples created with MKVmerge

Re-creating a truncated MKV sample that was created with the MKVmerge writing library.

Intro

I had a good MKV sample that was fine (it passed srs). I split it into 2 pieces: the 1st part, 20M, served as a test
truncated MKV file, and the 2nd part was ignored. With this, I could test re-creation and compare the final result
to the good one.

Find tools

First, you have to use the exact same version of MKVmerge that was used to create the sample. MKVinfo from any
release can tell you which version you need (in this case it was 2.4.0). MediaInfo also shows it as the Writing
application.

All the versions are available here:
http://www.bunkus.org/videotools/mkvtoolnix/win32/

Some of the win32 versions are Windows Installer packages (yuck). The installer also adds a permanent PATH entry
pointing to it for all cmd prompts (yuck again). Install it, copy the installed folder somewhere else, then uninstall
it via Control Panel / Add or Remove Programs. All the DLLs that are needed are in the release folder, so no
conflict occurs between different versions. I made a folder on a work drive with a subfolder for each version
(2.4.0, …, 4.2.2, etc.) and put them there. Each version is 5-7 MB.

  • Use 7-Zip Open Inside or Extract files… on the Nullsoft Install System installer to view and extract everything without running the setup.

Find location

The next thing to discuss is how MKVmerge handles splitting a file. Unlike its AVI counterparts (Nandub,
VirtualDub and VirtualDubMod), MKVmerge works on keyframe-only segments, so any time you enter is rounded
UP to the next keyframe (which makes it easy to recreate a sample). MEncoder behaves the same way. This means
that you only need to find the approximate place in the movie where the sample was cut from: MKVmerge will
use the NEXT keyframe as the starting location.
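The rounding rule is easy to model. A simplified Python sketch, not MKVmerge's actual code: given a sorted list of keyframe timestamps in seconds (the list below is made up), it returns the first keyframe at or after the requested time. Treating a keyframe that falls exactly on the requested time as a match is an assumption.

```python
from bisect import bisect_left
from typing import List, Optional


def next_keyframe(requested: float, keyframes: List[float]) -> Optional[float]:
    """First keyframe time (seconds) at or after `requested`.

    `keyframes` must be sorted ascending. None means no keyframe follows,
    so the cut would run to the end of the file.
    """
    i = bisect_left(keyframes, requested)
    return keyframes[i] if i < len(keyframes) else None


# Made-up keyframe times (seconds) around the 8:12 mark.
keyframes = [488.0, 490.5, 493.2, 496.0, 499.4]
print(next_keyframe(8 * 60 + 12, keyframes))  # 8:12 = 492 s -> 493.2
```

This is why an approximate start time is enough: every requested time between two keyframes lands on the same cut point.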

So, load your truncated sample into your player, play it, and note what is going on. The more of the sample
that isn't truncated, the easier it is to find in the full movie. I use BSPlayer, which displays the total time
the sample should have, even though it is truncated (VLC and MPC do too).

Now load the movie and try to find the segment you just viewed. In BSPlayer the left/right arrow keys jump
to the previous/next keyframe, which makes finding the sample spot in the full movie a lot easier. Once you
have found it, note the time in minutes/seconds where it is (a good idea is to write these down in a txt file
or similar).

My test case was:
Dragon.Ball.Z.Broly.The.Legendary.Super.Saiyan.1993.720p.BluRay.x264-CiNEFiLE
The truncated sample (20M) was named bad.mkv, and the full movie was extracted from the RARs and named inp.mkv
(short names used to speed up typing).

It's important to know that srs.exe is not only a really good test for AVI and MKV files (it can detect
errors that a lot of apps miss, including GSpot), it can also report the byte count the sample MKV should have.
In this test case, running srs.exe bad.mkv produced:

Warning: File size does not appear to be correct!
         Expected: 53,110,586
         Found   : 20,000,000

So you know that your sample's size should be the first figure shown (for this test, it's 53,110,586 bytes).
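If you script the recreation, the target size can be pulled out of the srs warning automatically. A minimal Python sketch, assuming srs prints the warning in the format shown above; the parsing and function name are illustrative and not part of srs. Both "," and "." thousands separators are accepted, since both appear on this page.

```python
import re
from typing import Optional

# Matches the "Expected:" figure in srs's "File size does not appear to be
# correct!" warning; the separator depends on the locale.
_EXPECTED = re.compile(r"Expected:\s*([\d.,]+)")


def expected_size(srs_output: str) -> Optional[int]:
    """Return the expected sample size in bytes, or None if there is no warning."""
    match = _EXPECTED.search(srs_output)
    if match is None:
        return None
    digits = match.group(1).replace(",", "").replace(".", "")
    return int(digits)


report = """Warning: File size does not appear to be correct!
         Expected: 53,110,586
         Found   : 20,000,000"""
print(expected_size(report))  # 53110586
```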

Split mkv

I determined that the sample was 1 minute 45 seconds in length (BSPlayer, VLC and MPC will show you the full
time it is SUPPOSED to be, even when it's truncated), and was to be found in the full movie at approximately
8 minutes 14 seconds. The following was chosen as the cmdline to run:

mkvmerge -o out.mkv --split timecodes:00:08:12,00:09:57 inp.mkv

Since it will auto seek to the next keyframe, I made the start time 08:12 and the end time 09:57 (8:12+01:45).
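The end time is plain time arithmetic: start + sample length. A small Python helper (hypothetical, not part of MKVToolNix) that builds the `--split` value from the start time and the duration your player shows. Fractions of a second are dropped when formatting.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse 'HH:MM:SS', 'MM:SS' or 'SS' (seconds may have decimals)."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return timedelta(seconds=seconds)


def fmt_hms(t: timedelta) -> str:
    """Format as HH:MM:SS, dropping any fraction of a second."""
    h, rem = divmod(int(t.total_seconds()), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def split_arg(start: str, length: str) -> str:
    """Value for mkvmerge's --split option: start and start + length."""
    begin = parse_hms(start)
    return f"timecodes:{fmt_hms(begin)},{fmt_hms(begin + parse_hms(length))}"


print(split_arg("00:08:12", "01:45"))  # timecodes:00:08:12,00:09:57
```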

The MKVmerge parameter "--split timecodes" split the file into 3 pieces, named:

  • out-001.mkv: up to the start time (plus the seek to the next keyframe); ignore this file
  • out-002.mkv: the sample cut you want
  • out-003.mkv: the rest of the file until the end; ignore this file

(NOTE: whatever name is specified for the -o parameter has -001, -002 and -003 appended).

When I ran it, and did a dir list, this was the result:

51,768,122 out-002.mkv

which is too small compared with what srs reported earlier. This is because the start was rounded up to the NEXT
keyframe, while the end was still 1 minute 45 seconds after the original time specified (08:12), which makes the
result LESS than 1 minute 45 seconds.

I then ran it again with this cmd line:

mkvmerge -o out.mkv --split timecodes:00:08:12,00:09:59 inp.mkv

This produced:

53,110,586 out-002.mkv

This is the correct size, as reported by srs.exe on the truncated sample.
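The trial-and-error loop above can be automated. A minimal sketch, assuming `mkvmerge` is on the PATH (or you pass the full path of the matching version) and that the middle part is named `out-002.mkv` as described. It keeps the start fixed and pushes the end out one second at a time until the middle part has the size srs reported. The helper names are hypothetical; only exit codes of 2 and above are treated as failures because mkvmerge returns 1 for warnings.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def hms(seconds: int) -> str:
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def run_mkvmerge(src: str, start_s: int, end_s: int, out: str = "out.mkv") -> int:
    """Cut `src` with mkvmerge and return the size of the middle part (-002)."""
    cmd = ["mkvmerge", "-o", out, "--split",
           f"timecodes:{hms(start_s)},{hms(end_s)}", src]
    result = subprocess.run(cmd)
    if result.returncode >= 2:  # 0 = ok, 1 = warnings, 2 = error
        raise RuntimeError("mkvmerge failed: " + " ".join(cmd))
    p = Path(out)
    return (p.parent / f"{p.stem}-002{p.suffix}").stat().st_size


def find_end(first_end_s: int, expected: int,
             make: Callable[[int], int], max_extra: int = 10) -> Optional[int]:
    """Extend the end time one second at a time until the size matches."""
    for extra in range(max_extra + 1):
        end_s = first_end_s + extra
        size = make(end_s)
        if size == expected:
            return end_s
        if size > expected:
            return None  # overshot: the start time is probably wrong
    return None
```

For this test case the call would be `find_end(597, 53_110_586, lambda end: run_mkvmerge("inp.mkv", 492, end))`, where 492 s is 08:12 and 597 s is 09:57. Because the end also snaps to keyframes, several consecutive seconds can give the same size.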

Fix differences

A sequence of bytes near the beginning always differs between the newly created MKV and the truncated sample.
This is because the muxing date/time is written into the MKV (mkvinfo shows it as below), along with a Segment UID
that also differs between runs:

ORIG TRUNCATED ..... | + Date: Sat Oct 25 13:29:04 2008 UTC
NEW RECREATED ...... | + Date: Thu Sep 08 21:03:02 2011 UTC
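That date is stored in the Segment Information as the Matroska DateUTC element: a signed 64-bit big-endian integer counting nanoseconds since 2001-01-01 00:00:00 UTC. A short Python sketch for converting the 8 raw bytes you see in a hex editor to and from a readable date (the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Matroska's DateUTC counts nanoseconds from this instant.
MATROSKA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def decode_date_utc(raw: bytes) -> datetime:
    """8 raw payload bytes -> aware UTC datetime (microsecond precision)."""
    if len(raw) != 8:
        raise ValueError("DateUTC payload must be 8 bytes")
    ns = int.from_bytes(raw, "big", signed=True)
    return MATROSKA_EPOCH + timedelta(microseconds=ns // 1000)


def encode_date_utc(when: datetime) -> bytes:
    """Aware datetime -> 8 raw payload bytes."""
    delta = when - MATROSKA_EPOCH
    ns = (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000
    return ns.to_bytes(8, "big", signed=True)
```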

When I did a binary comparison of the truncated sample and the same number of bytes from the newly created sample,
these were the differences (23 bytes total; output of the Windows fc command-line tool):

Comparing files bad.avi and OUT-002(20M).MKV
      000010B1: 03 04
      000010B2: 6C AD
      000010B3: 38 FF
      000010B4: 72 0D
      000010B5: 7E D5
      000010B6: 9A 3F
      000010B7: 40 2C
      000010BC: BD 9B
      000010BD: 64 B3
      000010BE: 3F AF
      000010BF: B6 5B
      000010C0: 10 2A
      000010C1: B4 64
      000010C2: 76 F7
      000010C3: AE E9
      000010C4: 80 A2
      000010C5: 62 68
      000010C6: AD AC
      000010C7: 49 32
      000010C8: CD CC
      000010C9: 45 1E
      000010CA: 75 A7
      000010CB: 84 AC
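On systems without `fc`, the same comparison takes a few lines of Python. A sketch that compares only the overlapping length, as in the comparison above, and prints the differences in the same `offset: byte byte` layout. Read both files with `Path(...).read_bytes()` and pass them in; a pure-Python loop over 20 MB can take a few seconds.

```python
def diff_bytes(a: bytes, b: bytes) -> list:
    """(offset, byte_in_a, byte_in_b) for every position where a and b differ.

    Only the common length is compared, i.e. the part of the recreated
    sample that overlaps the truncated one.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def fc_style(a: bytes, b: bytes) -> str:
    """Format the differences like `fc /b`: hex offset, then the two bytes."""
    return "\n".join(f"{i:08X}: {x:02X} {y:02X}" for i, x, y in diff_bytes(a, b))
```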

Using a technique I have long used for other projects, I split the truncated sample "bad.mkv" into 2 parts:
the first 5M and the rest of the bytes beyond 5M. The first 5M slice was named "orig" (the other file was ignored).
Then I did the same for the newly created file "out-002.mkv", except this time the first 5M was ignored, and
the rest of that file's data was named "newslice". Then, using a command-line binary copy:

copy /b orig+newslice try.mkv

This created a file named try.mkv which was:

  • the first 5M of the original truncated sample and
  • the rest of the data > 5M of the newly created sample.

NOTE: You will need some kind of file splitter to do this, or a hex editor. Google it!
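The split-and-join trick can also be done without a file splitter. A minimal Python sketch (the function names are illustrative): it keeps the first `cut` bytes of the truncated sample and appends everything from the same offset onward from the recreated file. "5M" is taken as 5,000,000 bytes here, matching how srs counts the 20M sample; that choice is an assumption, and any cut point should work as long as it lies past the differing header bytes (the last one at 0x10CB) and inside the truncated sample.

```python
from pathlib import Path

CUT = 5_000_000  # assumed meaning of "5M"


def splice(orig: bytes, new: bytes, cut: int = CUT) -> bytes:
    """First `cut` bytes of the truncated original + the recreation from `cut` on."""
    if len(orig) < cut:
        raise ValueError("the truncated sample is shorter than the cut point")
    return orig[:cut] + new[cut:]


def splice_files(orig_path: str, new_path: str, out_path: str, cut: int = CUT) -> None:
    """File-level equivalent of `copy /b orig+newslice try.mkv`."""
    data = splice(Path(orig_path).read_bytes(), Path(new_path).read_bytes(), cut)
    Path(out_path).write_bytes(data)
```

For the files above this would be `splice_files("bad.mkv", "out-002.mkv", "try.mkv")`.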

This time, when I did a binary comparison of the truncated sample and the same number of bytes from the
"joined" MKV (try.mkv), there were no differences.

Test

As a final test, since I did have the actual original sample, I compared it to my final result:

Comparing files dragon.ball.z.broly.1993.720p.bluray.x264.sample-cinefile.mkv and try.mkv
        FC: no differences encountered

Epilogue

The final word: I believe it is possible to fully recreate a truncated mkv sample from the original source
mkv if it was created by mkvmerge, as long as you use the same mkvmerge version, and the same time and size
parameters. This also applies to an mkv sample that is corrupted (bad bytes), as long as the corruption
does not occur very early in the mkv. —YopoM

Libav

Libav is an abandoned free software project, forked from FFmpeg in 2011. Old versions can be found on:

libav-win64-20140213
Writing application                      : Lavf55.12.0
Writing library                          : Lavf55.12.0
Conformance errors                       : 2
 0x8538067                               : Yes
  General compliance                     : Element size 9091763 is more than maximal permitted size 8564548 (offset 0x3B)
 Matroska                                : Yes
  General compliance                     : File size 8564607 is less than expected size at least 9091822 (offset 0x3B)

The name of the command-line tool and its parameters have changed over the years. The example below cuts the first minute without re-encoding.

ffmpeg -i sample.mkv -t 00:01:00 -vcodec copy -acodec copy -dcodec copy out.mkv
avconv -i sample.mkv -t 00:01:00 -vcodec copy -acodec copy -dcodec copy out.mkv

A 16-byte Segment UID needs to be copied over from the broken sample to the newly created sample. All other bytes must match the broken sample.

View the MKV file structure with MKVToolNix GUI.
Use KDiff3 to compare two samples with each other.
Use a hex editor like HxD to copy over a Segment UID or time stamp.

Notes and examples

It's possible to have a new sample that is an exact match of the broken sample except for the date. But the total file size can still be off by tens of bytes, even though the sample has the correct duration.

mkvmerge -o y.mkv --no-chapters --track-name 0:"Inheritance.2020.MULTi.1080p.BluRay.x264-THREESOME.mkv" --split timecodes:00:05:00.000,00:06:15.000 Inheritance.2020.MULTi.1080p.BluRay.x264-THREESOME.mkv

MKVmerge end tags fixing

Always experiment with another release from the same group first! This helps to get the settings right, but more importantly it tells you whether any changes are needed in the part of the data you can't verify against the broken sample. This example shows tags at the end of the sample that need fixing.

Gather information and download tool

Broken sample info with srs and MediaInfo:

pysrs the.great.north.s03e20.polish.1080p.web.h264-chopin-sample.mkv

Warning: File size does not appear to be correct!
         Expected: 22.956.938
         Found   : 18.620.416

Corruption detected: Invalid element length at 0x011A7AC1

General
Unique ID                                : 187419578719878253581850768036684347865 (0x8CFFB2A866D348C1DDAF4A4DC4E7B9D9)
Complete name                            : The.Great.North.S03E20.POLiSH.1080p.WEB.H264-CHOPiN\Sample\the.great.north.s03e20.polish.1080p.web.h264-chopin-sample.mkv
Format                                   : Matroska
Format version                           : Version 4
File size                                : 17.8 MiB
Duration                                 : 1 min 4 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 2 318 kb/s
Frame rate                               : 23.976 FPS
Encoded date                             : 2024-08-26 07:47:43 UTC
Writing application                      : mkvmerge v63.0.0 ('Everything') 64-bit
Writing library                          : libebml v1.4.2 + libmatroska v1.6.4
Conformance errors                       : 2
 0x8538067                               : Yes
  General compliance                     : Element size 22956886 is more than maximal permitted size 18620364 (offset 0x34)
 Matroska                                : Yes
  General compliance                     : File size 18620416 is less than expected size at least 22956938 (offset 0x34)
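MediaInfo's two conformance lines are tied together: the expected file size is the offset where the Segment's data begins plus the declared Segment size. Here 22,956,886 + 0x34 (52) = 22,956,938, and in the Libav example 9,091,763 + 0x3B (59) = 9,091,822. A Python sketch that reads this from the file header, assuming the Segment directly follows the EBML header and has a known (not "unknown-length") size. It is presumably also how srs arrives at its Expected figure, but that is an assumption.

```python
EBML_ID = 0x1A45DFA3
SEGMENT_ID = 0x18538067


def read_vint(buf: bytes, pos: int, keep_marker: bool) -> tuple:
    """Read an EBML variable-length integer at `pos`; return (value, next_pos).

    Element IDs keep the length-marker bit (keep_marker=True);
    element sizes drop it (keep_marker=False).
    """
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not first & mask:
        length += 1
        mask >>= 1
    if length > 8 or pos + length > len(buf):
        raise ValueError(f"bad or truncated vint at offset {pos:#x}")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def expected_file_size(head: bytes) -> int:
    """Segment data offset + declared Segment size = full file size."""
    elem_id, pos = read_vint(head, 0, keep_marker=True)
    if elem_id != EBML_ID:
        raise ValueError("not an EBML/Matroska file")
    header_size, pos = read_vint(head, pos, keep_marker=False)
    pos += header_size  # skip the EBML header body
    elem_id, pos = read_vint(head, pos, keep_marker=True)
    if elem_id != SEGMENT_ID:
        raise ValueError("expected the Segment right after the EBML header")
    segment_size, data_start = read_vint(head, pos, keep_marker=False)
    return data_start + segment_size
```

Passing the first 4096 bytes of the sample is an arbitrary choice that comfortably covers both headers.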

Download a matching MKVToolNix version to use:
https://mkvtoolnix.download/windows/releases/63.0.0/mkvtoolnix-64-bit-63.0.0.7z

Create sample

In this step you need to get a visual match with the right size through trial and error. Press Ctrl-C to cancel when the third part starts.

mkvmerge.exe -o out.mkv the.great.north.s03e20.polish.1080p.web.h264-chopin.mkv --split timecodes:00:05:00.000,00:06:00.000

expected: 22956938 bytes and got: 22.956.938 out-002.mkv
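The Ctrl-C step can be scripted as well. A sketch (the helper name is hypothetical) that runs the command and stops mkvmerge as soon as `out-003.mkv` appears. Note that `terminate()` is a hard stop rather than a real Ctrl-C; like the manual step, it assumes `out-002.mkv` is already complete by the time the third part is created.

```python
import subprocess
import time
from pathlib import Path


def run_until_third_part(cmd: list, out: str = "out.mkv", poll: float = 0.2) -> int:
    """Run `cmd` and stop it once <out>-003 exists; return the exit code."""
    p = Path(out)
    third = p.parent / f"{p.stem}-003{p.suffix}"
    third.unlink(missing_ok=True)  # a leftover from an earlier run would stop us at once
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        if third.exists():
            proc.terminate()
            break
        time.sleep(poll)
    return proc.wait()
```

For this example: `run_until_third_part(["mkvmerge.exe", "-o", "out.mkv", "the.great.north.s03e20.polish.1080p.web.h264-chopin.mkv", "--split", "timecodes:00:05:00.000,00:06:00.000"])`.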

Compare samples

Use KDiff3 to get a visual idea of how close you are. There should be almost no differences.

Within the byte range covered by the broken sample, only the Date and Segment UID differ. With MKVToolNix GUI you can see which fields differ and what their readable values are.

In the missing byte range of my control example, only 4 date/time stamps in the Tags section differ. Verify that the date exactly matches the Date in the Segment information. Since this is the case, and no other metadata needs changing, we can recreate this sample!

In the example of the.great.north.s03e20 there are 5 _STATISTICS_WRITING_DATE_UTC tags in total that need to have the timestamp fixed.

Fix fields

Copy over the Date and Segment UID fields from the broken sample with a hex editor. You can verify with a diff.
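Copying the two fields can also be scripted. A naive Python sketch, assuming the header bytes mkvmerge normally writes: DateUTC is ID 0x4461 with a one-byte size 0x88 (8 bytes), and SegmentUID is ID 0x73A4 with size 0x90 (16 bytes). In the Dragon Ball Z comparison earlier, the differing bytes at 0x10B1-0x10B7 and 0x10BC-0x10CB fit that layout. The function only searches for those byte patterns near the start of the file, so confirm the offsets in MKVToolNix GUI or with a diff before trusting it.

```python
# Element headers as mkvmerge usually writes them (assumption): the element
# ID followed by a one-byte size. 0x88 = 8-byte payload, 0x90 = 16-byte payload.
FIELDS = {
    "DateUTC": (bytes.fromhex("446188"), 8),
    "SegmentUID": (bytes.fromhex("73A490"), 16),
}


def copy_fields(broken: bytes, new: bytearray, search_limit: int = 65536) -> None:
    """Copy the DateUTC and SegmentUID payloads from `broken` into `new` in place.

    Naive: takes the first match of each header pattern within the first
    `search_limit` bytes and refuses to write if the offsets disagree.
    """
    for name, (header, length) in FIELDS.items():
        src = broken.find(header, 0, search_limit)
        dst = new.find(header, 0, search_limit)
        if src < 0 or dst < 0:
            raise ValueError(f"{name} header not found")
        if src != dst:
            raise ValueError(f"{name} offsets differ: {src:#x} vs {dst:#x}")
        start = src + len(header)
        new[start:start + length] = broken[start:start + length]
```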

Fix the dates in the tags at the end with a hex editor. Make sure to edit all tags.
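With the date confirmed, the tag fix is a same-length string replacement. A minimal sketch, assuming MKVToolNix stores `_STATISTICS_WRITING_DATE_UTC` as plain ASCII text in the form `YYYY-MM-DD HH:MM:SS` (check one tag in the hex editor first). `wrong` is the date of your own mkvmerge run and `right` is the broken sample's Segment Date; pass `start` as the broken sample's length to limit the edit to the missing tail.

```python
def fix_tag_dates(data: bytearray, wrong: str, right: str, start: int = 0) -> int:
    """Replace every `wrong` date string with `right` from offset `start` on.

    Both strings must have the same length so no element sizes change.
    Returns the number of replacements made.
    """
    old, new = wrong.encode("ascii"), right.encode("ascii")
    if len(old) != len(new):
        raise ValueError("replacement must keep the same length")
    count = 0
    pos = data.find(old, start)
    while pos != -1:
        data[pos:pos + len(old)] = new
        count += 1
        pos = data.find(old, pos + len(old))
    return count
```

For the.great.north.s03e20 the correct value would be `2024-08-26 07:47:43` (the Encoded date MediaInfo reports), and a return value of 5 would match the number of tags mentioned above.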

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License