Backfill samples

Tutorial for backfilling samples from DC++

Get list of releases with missing samples

Ask one of the admins.
Idea: offer users a download link for this list, without music releases and cached for one hour.

Based on auto searching (slow)

This approach assumes that you already have a list of release names whose samples you want to find.

  1. Exit AirDC++.
  2. Run the PHP script below with an input file that lists the releases whose sample is missing, one release per line.
  3. The script generates an AutoSearch.xml that is compatible with AirDC++ 2.70 and probably a few older versions as well.
  4. Overwrite the existing AutoSearch.xml in the AirDC/Settings folder.
  5. Run AirDC++. It will now search, and whenever it finds a result, it adds it to the download queue. It waits for you to set the item to download before it begins.
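<?php
// The first command line argument is the input file with one release name per line.
if(!isset($argv[1]) || !file_exists($argv[1])) {
    // Report on stderr so that the message does not end up in the generated XML.
    fwrite(STDERR, "Input file does not exist\n");
    exit(1);
}
$filename = $argv[1];
$file = file_get_contents($filename);

$releases = explode("\n", $file);
?>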
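<?php
// Echo the XML declaration: with short_open_tag enabled, a literal declaration would be parsed as PHP code.
echo '<?xml version="1.0" encoding="utf-8" standalone="yes"?>' . "\n";
?>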
 
<Autosearch LastPosition="0">
    <Autosearch>
<?php
$i = 0;
foreach($releases as $value) {
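    $value = trim($value);
    if($value !== "" && $value[0] !== "#") {
        // Escape &, <, > and quotes so that the release name is safe inside the XML attributes below.
        $value = htmlspecialchars($value, ENT_QUOTES);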
?>
        <Autosearch Enabled="1" SearchString="<?php echo $value; ?> sample" FileType="7" Action="1" Remove="1" Target="C:\AutoSearch\<?php echo $value; ?>" TargetType="0" MatcherType="0" MatcherString="<?php echo $value; ?> sample" UserMatch="" ExpireTime="0" CheckAlreadyQueued="0" CheckAlreadyShared="0" SearchDays="1111111" StartTime="0000" EndTime="2359" LastSearchTime="0" MatchFullPath="1" ExcludedWords="" Token="<?php echo $i; ?>">
            <Params Enabled="0" CurNumber="1" MaxNumber="0" MinNumberLen="2" LastIncFinish="0"/>
        </Autosearch>
<?php
    }
$i++;
}
?>
    </Autosearch>
</Autosearch>

Target="C:\AutoSearch\" is the path used when a result is added to the queue; this will be the download folder.
FileType="7" means directory: the whole sample folder is downloaded with all of its content.
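
For readers without PHP, the same file can be generated with a few lines of Python. This is only a sketch under assumptions: the attribute values are copied from the PHP template above, the function name build_autosearch is made up for this page, and the result has not been checked against versions other than the one named above. The release names are escaped with quoteattr so that characters such as & do not break the XML.

```python
from xml.sax.saxutils import quoteattr

# Attribute values copied from the PHP template above; only the release name varies.
ENTRY = (
    '        <Autosearch Enabled="1" SearchString={search} FileType="7" Action="1" '
    'Remove="1" Target={target} TargetType="0" MatcherType="0" MatcherString={search} '
    'UserMatch="" ExpireTime="0" CheckAlreadyQueued="0" CheckAlreadyShared="0" '
    'SearchDays="1111111" StartTime="0000" EndTime="2359" LastSearchTime="0" '
    'MatchFullPath="1" ExcludedWords="" Token="{token}">\n'
    '            <Params Enabled="0" CurNumber="1" MaxNumber="0" MinNumberLen="2" '
    'LastIncFinish="0"/>\n'
    '        </Autosearch>\n'
)


def build_autosearch(lines, target_root="C:\\AutoSearch\\"):
    """Return the AutoSearch.xml text for an iterable of release names (hypothetical helper)."""
    out = [
        '<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n',
        '<Autosearch LastPosition="0">\n',
        '    <Autosearch>\n',
    ]
    for token, line in enumerate(lines):
        name = line.strip()
        if not name or name.startswith("#"):
            continue  # skip blank lines and comment lines, like the PHP script
        out.append(ENTRY.format(
            search=quoteattr(name + " sample"),    # quoteattr adds the quotes and escapes
            target=quoteattr(target_root + name),
            token=token,
        ))
    out += ['    </Autosearch>\n', '</Autosearch>\n']
    return "".join(out)
```

Write the returned text to AutoSearch.xml as UTF-8, for example with open("AutoSearch.xml", "w", encoding="utf-8").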

Based on file lists (fast, but limited)

  1. Download a file list from a user you know has many samples you like.
  2. TODO: figure out how to do it
  3. Parse the XML, get the list of release names with missing samples, and create an XML file for the download queue? Alternatively, create another, shorter input list without the samples that were found?
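
The parsing part of step 3 could look roughly like the sketch below. It assumes the usual DC++ style file list: an XML document, normally bz2-compressed, with nested Directory elements and File elements that each carry a Name attribute. The function names are made up for this page, and the step that turns the matches into a download queue is still open.

```python
import bz2
import xml.etree.ElementTree as ET


def load_filelist(path):
    """Read a file list from disk; .bz2 files are decompressed first."""
    if path.endswith(".bz2"):
        with bz2.open(path, "rb") as handle:
            return handle.read()
    with open(path, "rb") as handle:
        return handle.read()


def releases_with_samples(filelist_xml, wanted):
    """Return the wanted release names that have a non-empty 'Sample' subdirectory.

    Assumption: a release is a Directory whose Name equals the release name,
    compared case-insensitively.
    """
    wanted_lower = {name.lower(): name for name in wanted}
    found = set()
    root = ET.fromstring(filelist_xml)
    for directory in root.iter("Directory"):
        release = wanted_lower.get(directory.get("Name", "").lower())
        if release is None:
            continue
        for child in directory.findall("Directory"):
            is_sample = child.get("Name", "").lower() == "sample"
            if is_sample and child.find("File") is not None:
                found.add(release)
    return found
```

The releases that are not in the returned set can go into a new, shorter input list for the auto search approach.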

Generating SRS files

Use the following .bat file:

@ECHO OFF
SET srr="C:\Python27\python.exe" "D:\Downloads\SRR\pyReScene\bin\srr"
SET srs="C:\Python27\python.exe" "D:\Downloads\SRR\pyReScene\bin\srs"
SET output="C:\srr_output\"

goto start

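:srs
REM The first argument is the quoted full path of a sample.
SET gx=%1
REM Strip the leading quote and the last five characters (extension plus closing quote).
SET filename=%gx:~1,-5%
REM The three-letter extension without the closing quote.
SET extension=%gx:~-4,-1%
REM Do not call this variable "path": that would overwrite the system PATH.
FOR %%T in (%gx%) DO SET outdir=%%~pT
%srs% %gx% -y -o %outdir% 2> "%filename%.%extension%.txt"
REM Keep the error log only when no .srs file was created.
IF EXIST "%filename%.srs" del "%filename%.%extension%.txt"
echo Done %gx%
goto eof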

:start
FOR /F "usebackq tokens=*" %%G IN (`dir /A:-D /B /S *.avi *.mkv *.mp4 *.wmv`) DO CALL :srs "%%G"

:eof

Or use the newer Python script, which works on every platform:
https://bitbucket.org/Gfy/pyrescene/src/default/bin/srs_batch.py
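
To show what the batch file does without the cmd.exe string slicing, here is a rough Python equivalent. It is not the linked srs_batch.py. The function names are made up, the -y and -o options are the ones used in the batch file, and the path to the srs script is something you have to pass in yourself. It also assumes a Python 3.7 or newer interpreter for running the sketch itself.

```python
import os
import subprocess
import sys

VIDEO_EXTENSIONS = (".avi", ".mkv", ".mp4", ".wmv")


def find_samples(root):
    """Yield the video files below root, like the dir /S call in the batch file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(VIDEO_EXTENSIONS):
                yield os.path.join(dirpath, name)


def srs_command(srs_script, sample, python=sys.executable):
    """Build the same call as the batch file: srs <sample> -y -o <sample directory>."""
    return [python, srs_script, sample, "-y", "-o", os.path.dirname(sample)]


def create_srs(srs_script, root):
    """Run srs for every sample and keep an error log when no .srs file appears."""
    for sample in find_samples(root):
        result = subprocess.run(srs_command(srs_script, sample),
                                capture_output=True, text=True)
        srs_file = os.path.splitext(sample)[0] + ".srs"
        if not os.path.exists(srs_file):
            # Same naming as the batch file: sample.avi.txt next to the sample.
            with open(sample + ".txt", "w") as log:
                log.write(result.stderr)
        print("Done", sample)
```

Call create_srs with the path to the srs script and the directory that holds the samples.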

Uploading SRS files

  • check for duplicates and subpacks

List all dirs that have more than one file:

find . -type d -exec sh -c 'set -- "$0"/*.*; [ $# -gt 1 ]' {} \; -print
pyReScene/scripts/more_files.py . -mneuc -o _issues
Usage: more_files.py [arguments] [directory]
This tool will list all directories with multiple files or none.

Options:
  --version       show program's version number and exit
  -h, --help      show this help message and exit
  -m, --more      more than one
  -n, --none      no files in dir
  -e, --empty     empty files in dir
  -u, --usenet    usenet-space-cowbys.info
  -c, --capitals  .MKV.txt, .AVI.txt, MP4.txt or WMV.txt
  -o DIRECTORY    move the found directories to this location
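
If neither find nor more_files.py is at hand, the check for directories with more than one file or with no files can be sketched in Python. This is not a reimplementation of more_files.py. It only counts the files that sit directly in each directory, and the function name is made up.

```python
import os


def dirs_by_file_count(root):
    """Return two sorted lists: directories with more than one file, and with none."""
    more, none = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # only look at the directories below root
        if len(filenames) > 1:
            more.append(dirpath)
        elif not filenames:
            none.append(dirpath)
    return sorted(more), sorted(none)
```

For example, more, none = dirs_by_file_count(".") lists the candidates to inspect by hand.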
  • filter bad ones

These searches are for samples that come from Usenet, where the NZB downloader capitalizes the file name when a download fails. The error logs left behind for such files end in .MKV.txt, .AVI.txt, .MP4.txt or .WMV.txt.

find . -name "*.MKV.txt"
find . -name "*.AVI.txt"
find . -type f \( -name "*.MKV.txt" -o -name "*.AVI.txt" \)
find . -type f \( -name "*.MKV.txt" -o -name "*.AVI.txt" -o -name "*.MP4.txt" -o -name "*.WMV.txt" \) -exec dirname {} \;
  • check for empty txt files
Search filter for Windows Explorer (on a Dutch-language Windows this is grootte:leeg):
size:empty
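
Both checks, the capitalized log names and the empty txt files, can also be done in one pass with Python, which avoids depending on find or on the Explorer search box. A minimal sketch with a made-up function name:

```python
import os

CAPITAL_SUFFIXES = (".MKV.txt", ".AVI.txt", ".MP4.txt", ".WMV.txt")


def suspicious_logs(root):
    """Return two sorted lists of .txt paths below root: capitalized names and empty files."""
    capitalized, empty = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(CAPITAL_SUFFIXES):  # case-sensitive, like find -name
                capitalized.append(path)
            if name.endswith(".txt") and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(capitalized), sorted(empty)
```

Taking os.path.dirname of each result gives the directories, like the -exec dirname variant of the find command.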
  • check against the releases that are already on the site (better nowadays: do not upload with an admin account, so that new files end up in the Adds queue for checking)

http://www.srrdb.com/tools/listsearch

As admin: download the .csv list with all release names and work with that huge text file instead, which is a lot faster.

releasename;nfo;sample(yes,no,broken);srrhash;imdb;foreign(yes/no);group;uploaddate;modifieddate;confirmed(yes/no);userid;username
((NAVTILVS))-Polaris_Vltimat_I-EP-2010-FiH;yes;yes;da39a3ee5e6b4b0d3255bfef95601890afd80709;;no;FiH;2014-01-18 14:27:54;2014-01-18 14:27:54;yes;1;xxx
...
grep -P '^[^;]+;(yes|no);(no|broken);' list.csv > no_srs.txt

Open no_srs.txt in vim and type the following to keep only the release name on each line:

:%s/^\(.\{-}\);.*/\1/g
:wq
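
The grep and vim steps can be combined in one small Python function. The sketch assumes the column order shown in the header line above (release name first, sample status third), a semicolon as separator and UTF-8 as encoding. The function name is made up.

```python
import csv


def releases_without_sample(csv_path):
    """Return the release names whose sample column is 'no' or 'broken'."""
    names = []
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle, delimiter=";", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 3:
                continue  # skip blank or truncated lines
            if row[2] in ("no", "broken"):
                names.append(row[0])
    return names
```

Writing the result to no_srs.txt, one name per line, gives the same kind of file as the steps above.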

Compare the lists with pyReScene\scripts\list_compare.py or with the Unix command comm.
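
What such a comparison boils down to is a set lookup. The sketch below is not list_compare.py and does not know its options. It assumes that every sample sits in a directory named after its release and that names can be compared case-insensitively. Both function names are made up.

```python
import os


def read_names(path):
    """Read a text file with one release name per line into a set."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {line.strip() for line in handle if line.strip()}


def split_by_need(no_srs_names, sample_root):
    """Split the release directories in sample_root into (needed, not_needed).

    'needed' are releases that miss a sample on the site; 'not_needed' either
    have a sample already or are not in the database at all.
    """
    wanted = {name.lower() for name in no_srs_names}
    needed, not_needed = [], []
    for entry in sorted(os.listdir(sample_root)):
        if not os.path.isdir(os.path.join(sample_root, entry)):
            continue  # only directories count as releases
        if entry.lower() in wanted:
            needed.append(entry)
        else:
            not_needed.append(entry)
    return needed, not_needed
```

For example: needed, rest = split_by_need(read_names("no_srs.txt"), "resultlist").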

  • Move/check dirs with samples that are already on the site to another directory.
list_compare.py no_srs.txt (dir /b resultlist) -u > already_sample_or_rel_not_in_db.txt
  • separate the samples that do not have an SRR on the site
  • upload only the SRS files
-e.srs
  • then do the txt files

USE -t FLAG!

  • LOG WITH tee !!!!

If something went wrong, you can go back to look for the error.

srrdb-uploader -e.srs directory | tee srs_upload.txt
  • Upload tool:

https://bitbucket.org/Gfy/srrdb-uploader/downloads

Each SRS must be in a folder named after the release. A Sample folder is added automatically on the site. The output looks as follows when there is no Sample folder yet or when the samples are already on the site.

Storing file 'sample.srs' in 'Sample' with release
             'Release.Name-GROUP'.
'Sample/sample.srs' successfully uploaded.

If there already is a sample folder, that one will be used:

Storing file 'sample.srs' in 'SAMPLE' with release
             'Release.HDTV.x264-W4F'.
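
Since the upload is logged with tee, the log can be checked afterwards. The sketch below only knows the three line shapes shown on this page (the "Storing file" line, the indented release name and the "successfully uploaded" line). Any other message of the uploader is ignored, and "unconfirmed" only means that no success line followed, not that the upload failed. The function name is made up.

```python
import re

STORING = re.compile(r"^Storing file '.+' in '.+' with release\s*$")
RELEASE = re.compile(r"^\s+'(?P<release>.+)'\.\s*$")
SUCCESS = re.compile(r"^'.+' successfully uploaded\.\s*$")


def summarize_upload_log(lines):
    """Return (uploaded, unconfirmed) lists of release names found in the log."""
    uploaded, unconfirmed = [], []
    current = None          # release announced by the last "Storing file" block
    expect_release = False  # True right after a "Storing file" line
    for line in lines:
        if STORING.match(line):
            if current is not None:
                unconfirmed.append(current)  # previous block had no success line
            current = None
            expect_release = True
            continue
        release = RELEASE.match(line)
        if expect_release and release:
            current = release.group("release")
            expect_release = False
        elif SUCCESS.match(line) and current is not None:
            uploaded.append(current)
            current = None
    if current is not None:
        unconfirmed.append(current)
    return uploaded, unconfirmed
```

Pass it the open log file, for example summarize_upload_log(open("srs_upload.txt")).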

Check how much the SRS percentage grows

I found some old numbers. 56% is quite low, so that one might be from Usenet. It wasn't in the how-to file, so it's just a guess.

56.3%
56.54%
62.95%
63.64%

62.44%
63.72%
72.05%

2014-01-18: 74% (upload with many music SRRs started the day before)
2014-01-24: 75.3%
2014-06-15: 79.4%
2014-07-05: 80.6% (before backfilling samples from uploader: 5576 results)
2014-07-06: 81% (after)
2014-11-22: 86.4% -> 86.5%. (2303 files processed.)
