Backfill samples

Tutorial for backfilling samples from DC++

Get list of releases with missing samples

Ask one of the admins.
(Idea: create a download link for users, a list without music releases, cached for 1 hour.)

https://www.srrdb.com/open

Based on auto searching (slow)

This approach assumes you have a list of release names whose samples you are looking for.

  1. Exit AirDC++.
  2. Run the PHP script below with an input file containing one release name per line (releases whose sample is missing).
  3. The script generates an AutoSearch.xml that is compatible with AirDC++ 2.70, and probably with a few older versions as well.
  4. Overwrite the existing AutoSearch.xml in the AirDC++ Settings folder.
  5. Start AirDC++. It will now search, and whenever it finds a result, it adds it to the download queue. Nothing is downloaded until you set the queued items to download.
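<?php
// Usage: php -d short_open_tag=Off create-autosearch.php input.txt > AutoSearch.xml
// (short_open_tag must be Off, otherwise PHP treats the XML declaration below as code)
if ($argc < 2 || !file_exists($argv[1])) {
    // write to STDERR so the message does not end up in the redirected XML file
    fwrite(STDERR, "Input file does not exist\n");
    exit(1);
}

$releases = explode("\n", file_get_contents($argv[1]));

// escape release names for use in XML attribute values (&, ", <, >)
function xml_attr($s) {
    return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
}
?>
<?xml version="1.0" encoding="utf-8" standalone="yes"?>

<Autosearch LastPosition="0">
    <Autosearch>
<?php
$i = 0;
foreach ($releases as $value) {
    $value = trim($value); // also removes \r from Windows line endings
    // skip empty lines and lines starting with #
    if ($value !== "" && $value[0] !== "#") {
        $name = xml_attr($value);
?>
        <Autosearch Enabled="1" SearchString="<?php echo $name; ?> sample" FileType="7" Action="1" Remove="1" Target="C:\AutoSearch\<?php echo $name; ?>" TargetType="0" MatcherType="0" MatcherString="<?php echo $name; ?> sample" UserMatch="" ExpireTime="0" CheckAlreadyQueued="0" CheckAlreadyShared="0" SearchDays="1111111" StartTime="0000" EndTime="2359" LastSearchTime="0" MatchFullPath="1" ExcludedWords="" Token="<?php echo $i; ?>">
            <Params Enabled="0" CurNumber="1" MaxNumber="0" MinNumberLen="2" LastIncFinish="0"/>
        </Autosearch>
<?php
    }
    $i++;
}
?>
    </Autosearch>
</Autosearch>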

Target="C:\AutoSearch\" is the path used when a result is added to the queue, i.e. the download folder (one subfolder per release).
FileType="7" means folder, because we want to download the Sample folder with all of its contents.

Example run on Windows:

c:\sample-backfill>php-8.2.10-nts-Win32-vs16-x64\php.exe -d short_open_tag=Off create-autosearch.php input.txt > AutoSearch.xml
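A release name containing & or " would make the generated file invalid, so it can be worth checking AutoSearch.xml before overwriting the real one in the Settings folder. A minimal Python sketch, assuming the element layout produced by the script above (it only relies on the SearchString attribute); the function name autosearch_names is made up for this example:

```python
import xml.etree.ElementTree as ET


def autosearch_names(path):
    """Return the SearchString of every search item in an AutoSearch.xml file.

    Raises xml.etree.ElementTree.ParseError when the file is not well-formed,
    for example because a release name contained an unescaped '&'.
    """
    root = ET.parse(path).getroot()
    # the search items are the Autosearch elements carrying a SearchString;
    # the two outer Autosearch container elements have none
    return [el.get('SearchString') for el in root.iter('Autosearch')
            if el.get('SearchString') is not None]
```

len(autosearch_names('AutoSearch.xml')) should equal the number of release names in input.txt (minus empty and # lines). If it raises a parse error, do not copy the file to AirDC++ yet.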

Based on file lists (fast, but limited)

  1. Download a file list from a user you know has many samples you like. This is built into AirDC++: right-click a user and choose 'Get file list'.
  2. The Python script below removes all files and folders you don't want from the file list.
    1. Edit the code to keep only subtitles, or only proofs and covers.
    2. When a text file listing release names with missing or broken samples exists, only the Sample folders of those releases are kept. You can create such a file with the list search on srrDB or from the text dumps on http://www.srrdb.com/open.
  3. Use File -> Open file list… to open the reduced file list in the DC client again.
  4. Download the files and use one of the other scripts to create VobSub SRR files or sample SRS files.
import bz2
import os.path
import xml.etree.ElementTree as ET

# optional: release names with missing or broken samples, one per line
release_file = 'missing-broken-sample.txt'
releasenames = set()
if os.path.exists(release_file):
    with open(release_file) as f:
        releasenames = {line.strip() for line in f if line.strip()}

# the downloaded file list; a compressed .xml.bz2 list is read directly
filelist = 'filelist.xml'
if filelist.endswith('.bz2'):
    with bz2.open(filelist) as f:
        tree = ET.parse(f)
else:
    tree = ET.parse(filelist)
root = tree.getroot()

cid = root.attrib['CID']

def clean_node(topnode, name_topnode):
    """Remove unwanted files and folders below topnode.

    Returns True when nothing worth keeping is left, so the caller
    can remove topnode itself."""
    remove_top = True
    for node in list(topnode):  # iterate over a copy: children get removed
        if node.tag == 'Directory':
            name = node.attrib['Name']
            print(name)

            # keep small files in certain subfolders
            if releasenames:
                # only the Sample folder of the listed releases
                keep = name.lower() == 'sample' and name_topnode in releasenames
            else:
                # for subtitles: keep = name.lower() in ('subs', 'sub')
                keep = name.lower() in ('proof', 'proofs', 'cover', 'covers')

            if keep or not clean_node(node, name):
                remove_top = False
            else:
                topnode.remove(node)
        elif node.tag == 'File':
            topnode.remove(node)
    return remove_top

clean_node(root, root.attrib['Base'])

output_xml = 'samples.%s.xml' % cid
tree.write(output_xml, encoding='utf-8', xml_declaration=True)

# compression not needed for import by AirDC++
with open(output_xml, 'rb') as data:
    compressed = bz2.compress(data.read())
with open(output_xml + '.bz2', 'wb') as out:
    out.write(compressed)

# create .srr files for vobsubs
# for /f %f in ('dir /s /b *.rar') do pyrescene --vobsubs %f -o %~dpf
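To get a quick list of the releases whose Sample folder survived the cleanup (for example to compare with missing-broken-sample.txt), a small sketch that walks the reduced file list. It assumes the Directory elements and Name attributes used by the script above; releases_with_sample is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def releases_with_sample(root):
    """Yield the names of Directory nodes that directly contain a Sample folder."""
    for parent in root.iter('Directory'):
        for child in parent.findall('Directory'):
            # folder names are compared case-insensitively, as in clean_node
            if child.get('Name', '').lower() == 'sample':
                yield parent.get('Name')
                break
```

Usage: pass ET.parse('samples.<CID>.xml').getroot() (the CID of the user whose list you cleaned) and print or save the yielded names.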

Generating SRS files

Use the following .bat file, run from the folder that contains the samples:

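@ECHO OFF
REM Python interpreter and pyReScene scripts (adjust the paths)
SET srr="C:\Python27\python.exe" "D:\Downloads\SRR\pyReScene\bin\srr"
SET srs="C:\Python27\python.exe" "D:\Downloads\SRR\pyReScene\bin\srs"
SET output="C:\srr_output\"

goto start

:srs
REM %1 is the quoted full path of a sample file
SET gx=%1
REM strip the quotes and a 3-letter extension (.mkv, .avi, .mp4, .wmv)
SET filename=%gx:~1,-5%
SET extension=%gx:~-4,-1%
REM do not call this variable "path": that overwrites the PATH environment variable
FOR %%T in (%gx%) DO SET srs_dir=%%~dpT
REM errors go to name.ext.txt; the log is deleted when the .srs was created
%srs% %gx% -y -o %srs_dir% 2> "%filename%.%extension%.txt"
IF EXIST "%filename%.srs" del "%filename%.%extension%.txt"
echo Done %gx%
goto eof

:start
REM all video files in this folder and its subfolders
FOR /F "usebackq tokens=*" %%G IN (`dir /A:-D /B /S *.avi *.mkv *.mp4 *.wmv`) DO CALL :srs "%%G"

:eof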

Alternatively, use the newer Python script, which works on every platform:
https://bitbucket.org/Gfy/pyrescene/src/default/bin/srs_batch.py
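The loop in the .bat file can also be sketched in Python. This is not srs_batch.py, only a rough equivalent of the batch logic: it calls the srs script with the same arguments (-y -o <folder>) and keeps the stderr log <name>.<ext>.txt only when no .srs was created. PYTHON and SRS_SCRIPT are assumptions: point them at the interpreter your pyReScene version needs and at your checkout.

```python
import subprocess
import sys
from pathlib import Path

# Assumptions: adjust to your setup (the .bat file used C:\Python27\python.exe).
PYTHON = sys.executable
SRS_SCRIPT = Path('pyReScene/bin/srs')
VIDEO_EXTENSIONS = {'.avi', '.mkv', '.mp4', '.wmv'}


def find_samples(top):
    """Video files below top, like dir /A:-D /B /S *.avi *.mkv *.mp4 *.wmv."""
    return sorted(p for p in Path(top).rglob('*')
                  if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS)


def srs_command(sample):
    """Same arguments as the .bat file: write the .srs next to the sample."""
    return [PYTHON, str(SRS_SCRIPT), str(sample), '-y', '-o', str(sample.parent)]


def create_srs(sample):
    """Run srs; keep the stderr log <name>.<ext>.txt only if no .srs was made."""
    log = sample.with_name(sample.name + '.txt')
    with open(log, 'w') as err:
        subprocess.run(srs_command(sample), stderr=err)
    if sample.with_suffix('.srs').exists():
        log.unlink()


def main(top='.'):
    for sample in find_samples(top):
        create_srs(sample)
        print('Done', sample)
```

Call main() from the folder that contains the samples, just like the batch file.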

Uploading SRS files

  • Check for duplicates and subpacks.

List all dirs that have more than one file:

find . -type d -exec sh -c 'set -- "$0"/*.*; [ $# -gt 1 ]' {} \; -print

or with the pyReScene script:

pyReScene/scripts/more_files.py . -mneuc -o _issues
Usage: more_files.py [arguments] [directory]'
This tool will list all directories with multiple files or none.

Options:
  --version       show program's version number and exit
  -h, --help      show this help message and exit
  -m, --more      more than one
  -n, --none      no files in dir
  -e, --empty     empty files in dir
  -u, --usenet    usenet-space-cowbys.info
  -c, --capitals  .MKV.txt, .AVI.txt, MP4.txt or WMV.txt
  -o DIRECTORY    move the found directories to this location
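A Python sketch of the same check as the find one-liner: list the directories that directly contain more than one file. It is not a reimplementation of more_files.py, and it counts every non-directory entry (hidden ones too), whereas the shell glob "*.*" only sees non-hidden names that contain a dot, so results can differ slightly. The function name is hypothetical.

```python
import os


def dirs_with_more_files(top, minimum=2):
    """Directories below top that directly contain at least `minimum` files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        if len(filenames) >= minimum:
            found.append(dirpath)
    return sorted(found)
```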
  • Filter out bad ones.

These commands are for samples that come from Usenet, where the NZB downloader capitalizes the file name when a download failed.

find . -name "*.MKV.txt"
find . -name "*.AVI.txt"
find . -type f \( -name "*.MKV.txt" -o -name "*.AVI.txt" \)
find . -type f \( -name "*.MKV.txt" -o -name "*.AVI.txt" -o -name "*.MP4.txt" -o -name "*.WMV.txt" \) -exec dirname {} \;
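The same capital-letter check in Python, returning the containing directories like the last find command (find ... -exec dirname). The match is case-sensitive, like find -name; dirs_with_capital_txt is a hypothetical name:

```python
from pathlib import Path

# case-sensitive endings, as in the find commands above
CAPITAL_ENDINGS = ('.MKV.txt', '.AVI.txt', '.MP4.txt', '.WMV.txt')


def dirs_with_capital_txt(top):
    """Directories below top holding a *.MKV.txt, *.AVI.txt, *.MP4.txt or *.WMV.txt file."""
    return sorted({str(p.parent) for p in Path(top).rglob('*')
                   if p.is_file() and p.name.endswith(CAPITAL_ENDINGS)})
```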
  • Check for empty txt files. In Windows Explorer, search for
size:empty
(on a Dutch Windows: grootte:leeg)
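Outside Windows Explorer, zero-byte text files can be listed with a few lines of Python (a sketch; empty_txt_files is a hypothetical name):

```python
from pathlib import Path


def empty_txt_files(top):
    """All .txt files below top that are 0 bytes long."""
    return sorted(p for p in Path(top).rglob('*')
                  if p.is_file() and p.suffix.lower() == '.txt'
                  and p.stat().st_size == 0)
```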
  • Check against releases already on the site. (Better now: do not upload with an admin account; new files will then end up in the Adds queue for checking.)

http://www.srrdb.com/tools/listsearch

As an admin, download the .csv list with the release names and use that huge text file instead: this is a lot faster.

releasename;nfo;sample(yes,no,broken);srrhash;imdb;foreign(yes/no);group;uploaddate;modifieddate;confirmed(yes/no);userid;username
((NAVTILVS))-Polaris_Vltimat_I-EP-2010-FiH;yes;yes;da39a3ee5e6b4b0d3255bfef95601890afd80709;;no;FiH;2014-01-18 14:27:54;2014-01-18 14:27:54;yes;1;xxx
...
Keep only the releases whose sample column is no or broken:

grep -P '^[^;]+;(yes|no);(no|broken);' list.csv > no_srs.txt

Then strip everything after the release name: open no_srs.txt in vim and type

:%s/^\(.\{-}\);.*/\1/g
:wq
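The grep and vim steps can also be done in one pass with Python's csv module. A sketch assuming the semicolon-separated column order shown above (release name, nfo, sample, ...) and that release names contain no semicolons; the UTF-8 encoding of the dump is also an assumption:

```python
import csv


def releases_without_srs(lines):
    """Release names whose sample column says 'no' or 'broken'.

    Expects lines of the .csv list: releasename;nfo;sample(yes,no,broken);...
    The header line is skipped because its second column is not yes/no.
    """
    names = []
    for row in csv.reader(lines, delimiter=';', quoting=csv.QUOTE_NONE):
        if len(row) >= 3 and row[1] in ('yes', 'no') and row[2] in ('no', 'broken'):
            names.append(row[0])
    return names


def write_no_srs(csv_path='list.csv', out_path='no_srs.txt'):
    # encoding is an assumption; adjust if the dump uses another one
    with open(csv_path, newline='', encoding='utf-8') as f:
        names = releases_without_srs(f)
    with open(out_path, 'w', encoding='utf-8') as out:
        out.write('\n'.join(names) + '\n')
```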

To compare the lists, use pyReScene\scripts\list_compare.py or the Unix command comm.

  • Check which dirs have samples that are already on the site and move them to another directory.
list_compare.py no_srs.txt (dir /b resultlist) -u > already_sample_or_rel_not_in_db.txt
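The comparison boils down to set arithmetic on release names. A sketch (compare_lists is a hypothetical name; its output is not necessarily identical to list_compare.py or comm, and it compares case-sensitively):

```python
def compare_lists(a_lines, b_lines):
    """Compare two lists of release names, like comm.

    Returns (only in a, only in b, in both), each sorted; blank lines
    and surrounding whitespace are ignored.
    """
    a = {line.strip() for line in a_lines if line.strip()}
    b = {line.strip() for line in b_lines if line.strip()}
    return sorted(a - b), sorted(b - a), sorted(a & b)
```

Pass the folder names of the upload directory (for example os.listdir() of it) as the first list and the lines of no_srs.txt as the second. The first result then holds the folders whose release either already has a sample on the site or is not in the database.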
  • Separate the samples that do not have an SRR on the site.
  • Upload only the SRS files first:
-e.srs
  • Then do the txt files.

Use the -t flag!

  • Log with tee!

That way, if something went wrong, you can go back and look for the error.

srrdb-uploader -e.srs directory | tee srs_upload.txt
  • Upload tool:

https://bitbucket.org/Gfy/srrdb-uploader/downloads

Each SRS file must be in a folder named after the release. A Sample folder is added automatically on the site. When there is no Sample folder yet, or the samples are already on the site, the output looks like this:

Storing file 'sample.srs' in 'Sample' with release
             'Release.Name-GROUP'.
'Sample/sample.srs' successfully uploaded.

If a sample folder already exists, that one is used:

Storing file 'sample.srs' in 'SAMPLE' with release
             'Release.HDTV.x264-W4F'.

Check how much the SRS percentage grows

I found some old numbers. 56% is quite low, so that one might be from Usenet. It wasn't noted in the how-to file, so this is just a guess.

56.3%
56.54%
62.95%
63.64%

62.44%
63.72%
72.05%

2014-01-18: 74% (upload with many music SRRs started the day before)
2014-01-24: 75.3%
2014-06-15: 79.4%
2014-07-05: 80.6% (before backfilling samples from uploader: 5576 results)
2014-07-06: 81% (after)
2014-11-22: 86.4% -> 86.5%. (2303 files processed.)
2018-11-02: 86.8% (Total SRR: 4719120, Total NFO: 4709485, Total SRS: 4096136)
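The percentage appears to be the number of SRS files relative to the total number of SRRs: with the 2018-11-02 totals, 4096136 / 4719120 gives 86.8%. A tiny helper to recompute the figure from the totals shown on the site (the function name is hypothetical):

```python
# totals from the 2018-11-02 entry above
TOTAL_SRR = 4719120
TOTAL_NFO = 4709485
TOTAL_SRS = 4096136


def percentage(part, total):
    """part as a percentage of total, rounded to one decimal."""
    return round(100.0 * part / total, 1)
```

percentage(TOTAL_SRS, TOTAL_SRR) returns 86.8 and percentage(TOTAL_NFO, TOTAL_SRR) returns 99.8.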

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License