LinuxQuestions.org
Old 11-26-2017, 04:58 PM   #1
zpimp
Member
 
Registered: Oct 2014
Posts: 73

Rep: Reputation: Disabled
download and convert multiple youtube links


I wanted to get music from YouTube, so here are two scripts.

This is the simple version. It gets the files one by one and only supports video URLs (no playlists).
You need to have youtube-dl installed (latest version).
I got it from here: https://packages.debian.org/sid/all/youtube-dl/download
Put your YouTube URLs in a file called "list0.txt", one per line.
It downloads m4a (audio only, -f 140) and then converts to mp3.


Code:
#!/bin/bash
START=$(date +%s)
 
echo "get links and download files one by one"
youtube-dl -a list0.txt --no-warnings --no-check-certificate -f 140
 
echo "convert files one by one"
for i in *.m4a; do ffmpeg -i "$i" -acodec libmp3lame -aq 2 "${i%.*}.mp3"; done
 
END=$(date +%s)
DIFF=$(( $END - $START ))
echo time-is  $DIFF
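For anyone wondering what "${i%.*}.mp3" does in the loop above: %.* removes the shortest trailing match of ".*", i.e. the last extension, and then .mp3 is appended. A minimal sketch; the mp3_name helper is only an illustration and is not part of the script:

```shell
#!/bin/bash
# mp3_name is a hypothetical helper: print the mp3 file name for an input file.
mp3_name() {
    local src=$1
    # ${src%.*} drops the shortest suffix matching ".*" (the last extension only)
    printf '%s\n' "${src%.*}.mp3"
}

mp3_name "My Song.m4a"        # prints: My Song.mp3
mp3_name "live.set.2017.m4a"  # prints: live.set.2017.mp3
```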

This is the second version. It does the work in parallel.
For this you also need to install aria2 and GNU parallel.

Put your YouTube URLs (videos/playlists/channels) in "list0.txt", one per line.


Code:
#!/bin/bash
 
echo log started - $(date) > g.log

START=$(date +%s)

youtube-dl -a list0.txt -j --flat-playlist --no-check-certificate  > list1.txt
cat list1.txt |sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'|grep -Po '_id__(.{11})_\|'|sort|uniq | sed 's/_id__//g;s/_|//g' \
 |sed -e 's/^/https:\/\/www.youtube.com\/watch?v=/'>list2.txt

END=$(date +%s)
DIFF=$(( $END - $START ))
echo get video ids from links - serial youtube-dl - $DIFF >> g.log

#=====================================

START=$(date +%s)

cat list2.txt|parallel -j8 youtube-dl "{}" --no-warnings --no-check-certificate --skip-download -q -f 140 --get-filename --get-url -o "\|\|%\(id\)s_%\(title\)s.m4a" \> \$\{RANDOM\}.ttt
cat *.ttt|sed 's/||/  out=/g'>d2.txt
rm *.ttt


END=$(date +%s)
DIFF=$(( $END - $START ))
echo get download links - parallel youtube-dl - $DIFF >> g.log

#=====================================

START=$(date +%s)

echo "download - parallel"
echo time = $(date)
aria2c --check-certificate=false -i d2.txt -j 4

END=$(date +%s)
DIFF=$(( $END - $START ))
echo download video/audio - aria2 - $DIFF >> g.log

#=====================================

START=$(date +%s)

echo "convert - parallel"
echo time = $(date)
find . -name "*.m4a" |parallel -j2  ffmpeg -i "{}" -acodec libmp3lame -aq 2 "{}.mp3"
 
END=$(date +%s)
DIFF=$(( $END - $START ))
echo convert audio/video - parallel ffmpeg - $DIFF >> g.log
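The START/END/DIFF lines repeat for every stage of the script above, so they could probably be folded into one helper. A rough sketch, assuming you are happy to run each stage as a single command; the name timed and the LOG variable are made up for this example:

```shell
#!/bin/bash
# timed LABEL CMD [ARGS...]: run CMD, then append "LABEL - SECONDS" to the log.
# LOG is an assumed variable name; it falls back to g.log as in the script above.
timed() {
    local label=$1 start end status
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label - $(( end - start ))" >> "${LOG:-g.log}"
    return "$status"
}

# Example, using the aria2c call from the script:
# timed "download video/audio - aria2" aria2c --check-certificate=false -i d2.txt -j 4
```

Stages that are pipelines would need to be wrapped in a function first, since timed only runs one command.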
The m4a/AAC format is better quality and smaller in size, and most devices nowadays support it. The mp3 conversion takes some time and is a second lossy encode, but mp3 is supported by practically any device.

tell me what you think

Last edited by zpimp; 11-26-2017 at 05:01 PM.
 
Old 08-20-2019, 01:48 PM   #2
zpimp
Member
 
Registered: Oct 2014
Posts: 73

Original Poster
Rep: Reputation: Disabled
new version 20190820

Downloads the audio (m4a) of videos with more than 1,000,000 views from multiple channels, then converts it to mp3.

Code:
#!/bin/bash


#list0.txt contains list of channels, playlists or videos, one per line
#list1.txt contains youtube output of playlists,channels
#list2.txt contains all youtube ids found in the channels,playlists you posted in list0.txt


#stop the script with ctrl+c (ctrl+z only suspends it)
#if you don't delete list2.txt, the script will go through the same ids again, skipping those which already exist

#if list2.txt doesn't exist, fetch all channels in list0.txt and create list2.txt again
FILE=list2.txt
if [[ -f "$FILE" ]]; then
    echo "$FILE exist"
else
youtube-dl -a list0.txt -j --flat-playlist --no-check-certificate  > list1.txt
cat list1.txt |sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'|grep -Po '_id__(.{11})_'|sort|uniq | sed 's/_id__//g;s/_$//g' |grep -v \| >list2.txt
fi



#get ids of all downloaded files in the current directory
ex=$(ls -l|grep m4a|grep -Po '.{15}$'|grep -Po '^.{11}')



#check each id in list2.txt
while IFS= read -r p; do

#if file is already downloaded
if [[ $ex == *"$p"* ]]; then
echo "----- $p exists ----- "
#if file doesn't exist, download it if the video has at least min-views
else
 echo "$p not found downloading "
 youtube-dl "https://www.youtube.com/watch?v=$p" --min-views 1000000 -f 140
fi

done <list2.txt



#convert every m4a file to mp3; -j4 runs 4 ffmpeg jobs at once, e.g. for a 4 core cpu
find . -name "*.m4a" |parallel -j4  ffmpeg -i "{}" -acodec libmp3lame -aq 2 "{}.mp3"

#mkdir mp & mv *mp3 mp/ & mkdir m4 & mv *m4a m4/
 
Old 08-25-2019, 11:08 AM   #3
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 634

Rep: Reputation: 316
Quote:
tell me what you think
Code:
cat list1.txt |sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'|grep -Po '_id__(.{11})_'|sort|uniq | sed 's/_id__//g;s/_$//g' |grep -v \| >list2.txt
This is a lot of piping through sed. There's probably a way of shrinking that down to 1-2 commands. I don't know sed well enough, but it might be worth a post to ask about

Code:
youtube-dl -a list0.txt -j --flat-playlist --no-check-certificate  > list1.txt
cat list1.txt |sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'|grep -Po '_id__(.{11})_'|sort|uniq | sed 's/_id__//g;s/_$//g' |grep -v \| >list2.txt
...
done <list2.txt
Have you considered process substitution? It can help get rid of files floating around

Instead of
Code:
find . > file1
while read i;
do
  echo "$i"
done < file1
you can write
Code:
while read i;
do
  echo "$i"
done < <(find .)
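A side benefit over piping into the loop: with `done < <(...)` the loop body runs in the current shell, so variables set inside it are still there afterwards. A small sketch with made-up input (with `printf ... | while read`, bash's default settings would run the loop in a subshell and lose the counter):

```shell
#!/bin/bash
count=0
while IFS= read -r line; do
    count=$(( count + 1 ))
done < <(printf '%s\n' one two three)

# The loop ran in this shell, so the counter survived.
echo "$count"   # prints: 3
```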
Code:
find . -name "*.m4a" |parallel -j4  ffmpeg -i "{}" -acodec libmp3lame -aq 2 "{}.mp3"
Both find and parallel support using null separators. find with -print0 and parallel with --null or -0. This can reduce the chance for errors if a file shows up with a newline in it.
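To show the idea without needing parallel installed, here is a sketch that feeds the same null-separated stream to xargs -0 instead; parallel -0 would read it the same way. The helper name for_each_m4a is made up for this example:

```shell
#!/bin/bash
# for_each_m4a DIR CMD [ARGS...]: run CMD once per .m4a file under DIR.
# -print0 and -0 separate names with NUL bytes, so a newline inside a
# file name cannot split it into two arguments.
for_each_m4a() {
    local dir=$1
    shift
    find "$dir" -name '*.m4a' -print0 | xargs -0 -n 1 "$@"
}

# Example: print one "file: ..." line per name
# for_each_m4a . printf 'file: %s\n'
```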

Code:
"{}.mp3"
Since you're using parallel, remove the extension with {.}
Code:
parallel echo {} {.} {.}.mp3 ::: test.mp4
test.mp4 test test.mp3
Code:
#mkdir mp & mv *mp3 mp/ & mkdir m4 & mv *m4a m4/
Just do the moving of files with ffmpeg / parallel
Code:
parallel ffmpeg -i {} mp3_dir/{.}.mp3 ::: *mp4
Anyways good stuff. youtube download scripts are fun

I threw this together
Code:
ytaudio() { parallel "youtube-dl -qx --audio-format 'mp3' -o '%(title)s.%(ext)s' --restrict-filenames {} && echo Processed {}" ::: "$@"; }
Usage:
Code:
ytaudio link1 [linkn] [playlistlinkn]
ytaudio https://www.youtube.com/watch?v=jItnCGRsMjw
Code:
# ytaudio() {               # We're defining a bash function here                                                            
#   parallel "                                                                  
#     youtube-dl                                                                
#     -q                    # Quiet mode                                        
#     -x                    # Extract as audio file                             
#     --audio-format 'mp3'                                                      
#     -o '%(title)s.%(ext)s'# Output Template == my_music.mp3                    
#     --restrict-filenames  # Remove spaces and special characters              
#     {}                    # Check out other replacement strings for this       
#   &&                      # If previous command succeeds, do this              
#   echo Processed {}"                                                          
#   :::                     # After this, specify files                        
#     $@                    # Takes command line input. ./cmd a b c             
# ; }

Last edited by Sefyir; 08-25-2019 at 11:12 AM.
 
Old 08-25-2019, 01:57 PM   #4
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
can you provide sample lists?

anyway, after a very quick look I just wanted to clean some things up

no need for cat
http://porkmail.org/era/unix/award.html

Code:
cat list1.txt |sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'|grep -Po '_id__(.{11})_'|sort|uniq | sed 's/_id__//g;s/_$//g' |grep -v \| >list2.txt
Code:
<list1.txt sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'
anyway, it looks like you want the 11 chars between _id__ and _
many ways to do this, here is one with sed

Code:
<list1.txt sed -n 's/.*_id__\([A-Za-z0-9_-]\+\)_.*/\1/p'
what did I do?
well.
I substituted the whole line with what was found in the ()
you can save multiple patterns and reorder
e.g.

Code:
# edit didn't reorder 
#<list1.txt sed -n 's/.*\(_id__\)\([A-Za-z0-9_-]\+\)_.*/\1\2/p'
<list1.txt sed -n 's/.*\(_id__\)\([A-Za-z0-9_-]\+\)_.*/\2\1/p'
so

some junk before _id__gu7d54djkl_ more junk at end
outputs
gu7d54djkl_id__

looking back at your pipe to pipe to pipe again...
it looks like you created the _id__ placeholders

are you sedding json data?
if so you may find jq useful

it has a steep learning curve, but well worth it

as an example, some output from api.tmdb.org
Code:
{"page":1,"total_results":2,"total_pages":1,"results":[{"vote_count":13,"id":45049,"video":false,"vote_average":7.6,"title":"The Code","popularity":0.731,"poster_path":"\/fvIEpbgUS45JLZg6OZpq6ke9wOI.jpg","original_language":"en","original_title":"The Code","genre_ids":[28,53,99],"backdrop_path":"\/kwG1vm97uUFTiGhTgiJYr9aB0AM.jpg","adult":false,"overview":"The Code is a Finnish-made documentary about Linux, featuring some of the most influential people of the free software movement.","release_date":"2001-09-26"},{"vote_count":0,"id":243915,"video":false,"vote_average":0,"title":"LINUX die Reise des Pinguins","popularity":0.6,"poster_path":null,"original_language":"de","original_title":"LINUX die Reise des Pinguins","genre_ids":[99],"backdrop_path":null,"adult":false,"overview":"","release_date":"2009-03-14"}]}
ugly json
but with jq

Code:
<tmdb_api_output.json jq -C "."
{
  "page": 1,
  "total_results": 2,
  "total_pages": 1,
  "results": [
    {
      "vote_count": 13,
      "id": 45049,
      "video": false,
      "vote_average": 7.6,
      "title": "The Code",
      "popularity": 0.731,
      "poster_path": "/fvIEpbgUS45JLZg6OZpq6ke9wOI.jpg",
      "original_language": "en",
      "original_title": "The Code",
      "genre_ids": [
        28,
        53,
        99
      ],
      "backdrop_path": "/kwG1vm97uUFTiGhTgiJYr9aB0AM.jpg",
      "adult": false,
      "overview": "The Code is a Finnish-made documentary about Linux, featuring some of the most influential people of the free software movement.",
      "release_date": "2001-09-26"
    },
    {
      "vote_count": 0,
      "id": 243915,
      "video": false,
      "vote_average": 0,
      "title": "LINUX die Reise des Pinguins",
      "popularity": 0.6,
      "poster_path": null,
      "original_language": "de",
      "original_title": "LINUX die Reise des Pinguins",
      "genre_ids": [
        99
      ],
      "backdrop_path": null,
      "adult": false,
      "overview": "",
      "release_date": "2009-03-14"
    }
  ]
}
and this
Code:
jq -r ".results[0]|.id,.title,.release_date,.overview,.vote_count,.vote_average"
outputs this
Code:
45049
The Code
2001-09-26
The Code is a Finnish-made documentary about Linux, featuring some of the most influential people of the free software movement.
13
7.6
I use jq more and more,
I even wrote a very nasty sed script to convert pulseaudio's `pacmd list-sink-inputs` output to json because it was so much nicer to use jq to automate stuff


tl;dr use jq instead of sed
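Applied to this thread, the whole sed/grep/sort/uniq chain might shrink to something like the sketch below. It assumes youtube-dl -j --flat-playlist prints one JSON object per line and that each object carries an "id" field; the function name extract_ids is made up:

```shell
#!/bin/bash
# extract_ids: read one JSON object per line on stdin and print each
# distinct "id" value once. Assumes every object has an "id" field.
extract_ids() {
    jq -r '.id' | sort -u
}

# Intended use, with the same youtube-dl call as in the scripts above:
# youtube-dl -a list0.txt -j --flat-playlist --no-check-certificate | extract_ids > list2.txt
```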

Last edited by Firerat; 08-25-2019 at 02:01 PM.
 
Old 08-25-2019, 02:48 PM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
oops, forgot when I got carried away with jq
Code:
....|sort|uniq...
sort can filter unique
Code:
sort -u
also have a very quick look at ytdl

play around with this
Code:
while read yt_id;do
    echo ${yt_id:2:11}
    # ${var:pos:len}
    # https://www.tldp.org/LDP/abs/html/parameter-substitution.html
    # note, bash usually starts counting at 0, so 2 is the third char.
done < <(
    youtube-dl \
    -a list0.txt \
    -j \
    --flat-playlist \
    --no-check-certificate \
    | jq -r "._filename"
)
you may notice that I did away with the need to write to storage
something else you can try

Code:
raw_ytdl_json=$(
    youtube-dl \
    -a list0.txt \
    -j \
    --flat-playlist \
    --no-check-certificate )
now that you have the json data in memory
Code:
<<<"$raw_ytdl_json" jq -r "._filename[2:13]"
/!\ note the 2:13 is not 2:11
https://stedolan.github.io/jq/manual/
Quote:
Array/String Slice: .[10:15]

The .[10:15] syntax can be used to return a subarray of an array or substring of a string. The array returned by .[10:15] will be of length 5, containing the elements from index 10 (inclusive) to index 15 (exclusive). Either index may be negative (in which case it counts backwards from the end of the array), or omitted (in which case it refers to the start or end of the array).
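To make the 2:11 versus 2:13 difference concrete: bash takes a start and a length, jq takes a start and an exclusive end, so the jq end index is start + length. A small sketch; the sample name is made up and only mimics the "||<id>_<title>.m4a" shape from the -o template earlier in the thread:

```shell
#!/bin/bash
# Made-up sample name, not real youtube-dl output.
name='||abcdefghijk_Some Title.m4a'

pos=2    # skip the leading "||"
len=11   # number of characters in the id

# bash: ${var:pos:len} takes a start and a LENGTH
bash_id=${name:pos:len}

# jq: .[start:end] takes a start and an exclusive END, so end = pos + len
jq_slice="[$pos:$(( pos + len ))]"

echo "$bash_id"    # prints: abcdefghijk
echo "$jq_slice"   # prints: [2:13]
```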
Edit: tbh it's probably not the best idea to stick all that json data into the var, it could be huge!
the first example is better
I would probably stick them into a bash array and work with them later
something like
Code:
......
my_array+=("$yt_id") # instead of that echo
......
#later on 
get_it(){
    id="$1"
    prefix="https://blahblah"
    suffix="some_end_bit"

    some_prog_to_dl_it "${prefix}${id}${suffix}"
}
for ((i=0;i<${#my_array[@]};i++));do
     [[ -e /some/dir/${my_array[i]}.mp4 ]] \
        || get_it "${my_array[i]}" 
done
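A runnable variation on the array idea, in case the placeholders above are hard to follow. It only collects the ids that still need downloading; the DIR/<id>.m4a naming and the name collect_missing are assumptions made for this sketch:

```shell
#!/bin/bash
# collect_missing DIR: read ids on stdin and store the ones that have no
# DIR/<id>.m4a yet in the global array "missing".
collect_missing() {
    local dir=$1 id
    missing=()
    while IFS= read -r id; do
        [[ -n $id ]] || continue                 # skip blank lines
        [[ -e $dir/$id.m4a ]] || missing+=("$id")
    done
}

# Example:
# collect_missing . < list2.txt
# for id in "${missing[@]}"; do
#     youtube-dl -f 140 -o '%(id)s.%(ext)s' "https://www.youtube.com/watch?v=$id"
# done
```

Feed it with a redirect rather than a pipe, otherwise the array is filled in a subshell and is empty afterwards.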

Last edited by Firerat; 08-25-2019 at 03:04 PM.
 
Old 08-25-2019, 09:17 PM   #6
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
ok, so I got bored

This is quite dumb, and it's not secure to go piping stuff directly into ffmpeg, but for fun I came up with this

Code:
#!/bin/bash

# uncomment for optional proxy
#Proxy="--proxy a_squid_proxy:3128" 

UserAgent="$(youtube-dl --dump-user-agent)"
# probably no need to use the same UA, but what the heck

InputList="$1"

Get_ids(){
    youtube-dl ${Proxy} \
        -a "$InputList" \
        -j --flat-playlist --no-check-certificate \
        | jq -j ".id,\" \",.title,\"\n\""
}
Get_url(){
    youtube-dl $Proxy \
    "https://www.youtube.com/watch?v=${id_title%% *}" \
    --no-warnings --no-check-certificate --skip-download -q -f 140 --get-url
}

Get_mp3(){
while IFS= read -r id_title;do
# jq spat out "dehfuefhhf some song title"
# ${id_title#* } // that deletes the id part
    # skip if we already have 
    [[ -e "${id_title#* }.mp3" ]] && continue

    curl -s  -A "${UserAgent}" ${Proxy} \
        "$(Get_url)" \
        | ffmpeg -hide_banner \
        -i - \
        -c:a libmp3lame -aq 2 \
        "${id_title#* }.mp3"
    # exit after the first one
    exit
done< <(Get_ids)
}

[[ -e "$InputList" ]] \
    && [[ $(file -b --mime-type "$InputList" ) == "text/plain" ]] \
    && Get_mp3
exit
No real checks, just blindly does stuff
and having ffmpeg use stdin from some random internet page is asking for trouble


It would be *much* safer to dl the m4a and then use ffmpeg after testing that the m4a is actually aac data ( which is what you have already been doing )
I just thought it would be fun to skip that step
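One possible shape for that "is it actually aac" test, using ffprobe to print the codec name of the first audio stream. This is a sketch, not a hardened check; the function name is_aac is made up:

```shell
#!/bin/bash
# is_aac FILE: succeed only if ffprobe reports the first audio stream as AAC.
is_aac() {
    local codec
    codec=$(ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
    [ "$codec" = "aac" ]
}

# Example: convert only files that pass the check
# is_aac song.m4a && ffmpeg -i song.m4a -c:a libmp3lame -aq 2 song.mp3
```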

see if you can come up with some checks on the ids and urls, that they are in the expected length/format.
maybe use parallel to start a chain of
Get_m4a && check_m4a && ffmpeg_to_mp3
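As a starting point for the id check, something like this could sit at the front of that chain. It assumes ids are always exactly 11 characters from A-Z, a-z, 0-9, "_" and "-", which matches the ids seen in this thread but is not something I can point to a spec for; valid_id is a made-up name:

```shell
#!/bin/bash
# valid_id ID: succeed if ID has the assumed shape of a YouTube video id
# (exactly 11 characters from A-Z, a-z, 0-9, "_" and "-").
valid_id() {
    [[ $1 =~ ^[A-Za-z0-9_-]{11}$ ]]
}

# Example:
# valid_id "$id" && Get_m4a && check_m4a && ffmpeg_to_mp3
```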

since jq is a shiny new toy
Code:
mediainfo --Output=JSON
ffprobe -hide_banner \
        -print_format json \
        -show_format \
        -show_streams \
        -show_chapters
one day I might hack away at pulseaudio to have it output json
 