LinuxQuestions.org Member Success Stories
this is the simple version: it gets files one by one, and it only supports video urls (no playlists)
you need to have youtube-dl installed (latest version)
i got it from here: https://packages.debian.org/sid/all/youtube-dl/download
just put your youtube urls in a file called "list.txt", one on each line
it downloads m4a (audio only) and then converts to mp3
Code:
#!/bin/bash
START=$(date +%s)
echo "get links and download files one by one"
youtube-dl -a list.txt --no-warnings --no-check-certificate -f 140
echo "convert files one by one"
for i in *.m4a; do ffmpeg -i "$i" -acodec libmp3lame -aq 2 "${i%.*}.mp3"; done
END=$(date +%s)
DIFF=$(( $END - $START ))
echo time-is $DIFF
this is the second version; it does stuff in parallel
for this you also need to install aria2 and gnu parallel
just put your youtube urls (videos/playlists/channels) in "list0.txt", one on each line
the m4a/aac format is better quality, most devices nowadays support it, and it is smaller in size; the mp3 conversion takes some time, but it's supported by any device
download the audio of videos with more than 1,000,000 views from multiple channels as m4a, and convert to mp3
Code:
#!/bin/bash
#list0.txt contains a list of channels, playlists or videos, one per line
#list1.txt contains youtube-dl's output for the playlists/channels
#list2.txt contains all youtube ids found in the channels/playlists you put in list0.txt
#stop the script with ctrl+c
#if you don't delete list2.txt the script will go through the same ids, skipping those which already exist
#if list2.txt doesn't exist, fetch everything in list0.txt and create list2.txt again
FILE=list2.txt
if [[ -f "$FILE" ]]; then
echo "$FILE exists"
else
youtube-dl -a list0.txt -j --flat-playlist --no-check-certificate > list1.txt
cat list1.txt |sed 's/": "/__/g ; s/"/_/g; s/,/|/g ;'|grep -Po '_id__(.{11})_'|sort|uniq | sed 's/_id__//g;s/_$//g' |grep -v \| >list2.txt
fi
#get ids of all downloaded files in the current directory
ex=$(ls -l|grep m4a|grep -Po '.{15}$'|grep -Po '^.{11}')
#check each file in list2.txt
while read p; do
#if file is already downloaded
if [[ $ex == *$p* ]]; then
echo "----- $p exists ----- "
#if the file doesn't exist, download it (if the video has more than min-views)
elif [[ $ex != *$p* ]];then
echo "$p not found, downloading"
youtube-dl https://www.youtube.com/watch?v=$p --min-views 1_000_000 -f 140
fi
done <list2.txt
#convert every m4a file to mp3; -j4 means use 4 jobs, for a 4-core cpu
find . -name "*.m4a" |parallel -j4 ffmpeg -i "{}" -acodec libmp3lame -aq 2 "{}.mp3"
#mkdir mp & mv *mp3 mp/ & mkdir m4 & mv *m4a m4/
This is a lot of piping through sed. There's probably a way of shrinking that down to 1-2 commands. I don't know sed well enough, but it might be worth a post to ask about.
Both find and parallel support using null separators: find with -print0 and parallel with --null or -0. This can reduce the chance of errors if a file shows up with a newline in its name.
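For example (a sketch, assuming GNU parallel is installed; the demo directory and filenames are made up):

```shell
# a filename with an embedded newline survives a NUL-separated pipeline
mkdir -p demo && touch demo/ok.m4a "demo/bad
name.m4a"
find demo -name '*.m4a' -print0 | parallel --null -j4 echo "got: {}"
```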
Code:
"{}.mp3"
Since you're using parallel, remove the extension with {.} (otherwise you end up with names like file.m4a.mp3)
Code:
parallel echo {} {.} {.}.mp3 ::: test.mp4
test.mp4 test test.mp3
Code:
#mkdir mp & mv *mp3 mp/ & mkdir m4 & mv *m4a m4/
Just do the moving of files with ffmpeg / parallel
Code:
parallel ffmpeg -i {} mp3_dir/{.}.mp3 ::: *.mp4
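One gotcha: ffmpeg won't create the output directory for you, so make it first (mp3_dir is just an example name):

```shell
# create the output directory up front, then convert in parallel
mkdir -p mp3_dir
parallel -j4 ffmpeg -hide_banner -i {} mp3_dir/{.}.mp3 ::: *.mp4
```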
Anyway, good stuff. youtube download scripts are fun.
Code:
# ytaudio: bash function to download each url's audio as mp3, in parallel
ytaudio() {
    parallel "
        youtube-dl -q -x --audio-format mp3 \
            -o '%(title)s.%(ext)s' --restrict-filenames {} \
        && echo Processed {}" ::: "$@"
}
# -q                      quiet mode
# -x                      extract as audio file
# --audio-format mp3      convert the audio to mp3
# -o '%(title)s.%(ext)s'  output template == my_music.mp3
# --restrict-filenames    remove spaces and special characters
# {}                      parallel's replacement string (check out the others)
# "$@"                    the command-line input: ytaudio url1 url2 url3
anyway, it looks like you want the 11 chars between _id__ and _
many ways to do this; here is one with sed
Code:
<list1.txt sed -n 's/.*_id__\([a-zA-Z0-9_-]\+\)_.*/\1/p'
what did I do?
well.
I substituted the whole line with what was found in the \( \) group
you can save multiple patterns and reorder them
e.g.
Code:
# edit: the first one didn't reorder
#<list1.txt sed -n 's/.*\(_id__\)\([a-zA-Z0-9_-]\+\)_.*/\1\2/p'
<list1.txt sed -n 's/.*\(_id__\)\([a-zA-Z0-9_-]\+\)_.*/\2\1/p'
so
some junk before _id__gu7d54djkl_ more junk at end
outputs gu7d54djkl_id__
looking back at your pipe to pipe to pipe again...
it looks like you created the _id__ placeholders yourself
are you sedding json data?
if so, you may find jq useful
it has a steep learning curve, but well worth it
as an example, some output from api.tmdb.org
Code:
{"page":1,"total_results":2,"total_pages":1,"results":[{"vote_count":13,"id":45049,"video":false,"vote_average":7.6,"title":"The Code","popularity":0.731,"poster_path":"\/fvIEpbgUS45JLZg6OZpq6ke9wOI.jpg","original_language":"en","original_title":"The Code","genre_ids":[28,53,99],"backdrop_path":"\/kwG1vm97uUFTiGhTgiJYr9aB0AM.jpg","adult":false,"overview":"The Code is a Finnish-made documentary about Linux, featuring some of the most influential people of the free software movement.","release_date":"2001-09-26"},{"vote_count":0,"id":243915,"video":false,"vote_average":0,"title":"LINUX die Reise des Pinguins","popularity":0.6,"poster_path":null,"original_language":"de","original_title":"LINUX die Reise des Pinguins","genre_ids":[99],"backdrop_path":null,"adult":false,"overview":"","release_date":"2009-03-14"}]}
ugly json
but with jq
Code:
<tmdb_api_output.json jq -C "."
{
"page": 1,
"total_results": 2,
"total_pages": 1,
"results": [
{
"vote_count": 13,
"id": 45049,
"video": false,
"vote_average": 7.6,
"title": "The Code",
"popularity": 0.731,
"poster_path": "/fvIEpbgUS45JLZg6OZpq6ke9wOI.jpg",
"original_language": "en",
"original_title": "The Code",
"genre_ids": [
28,
53,
99
],
"backdrop_path": "/kwG1vm97uUFTiGhTgiJYr9aB0AM.jpg",
"adult": false,
"overview": "The Code is a Finnish-made documentary about Linux, featuring some of the most influential people of the free software movement.",
"release_date": "2001-09-26"
},
{
"vote_count": 0,
"id": 243915,
"video": false,
"vote_average": 0,
"title": "LINUX die Reise des Pinguins",
"popularity": 0.6,
"poster_path": null,
"original_language": "de",
"original_title": "LINUX die Reise des Pinguins",
"genre_ids": [
99
],
"backdrop_path": null,
"adult": false,
"overview": "",
"release_date": "2009-03-14"
}
]
}
Code:
<tmdb_api_output.json jq -r '.results[0] | .id, .title, .release_date, .overview, .vote_count, .vote_average'
45049
The Code
2001-09-26
The Code is a Finnish-made documentary about Linux, featuring some of the most influential people of the free software movement.
13
7.6
I use jq more and more.
I even wrote a very nasty sed script to convert pulseaudio's `pacmd list-sink-inputs` output to json, because it was so much nicer to use jq to automate stuff.
The .[10:15] syntax can be used to return a subarray of an array or substring of a string. The array returned by .[10:15] will be of length 5, containing the elements from index 10 (inclusive) to index 15 (exclusive). Either index may be negative (in which case it counts backwards from the end of the array), or omitted (in which case it refers to the start or end of the array).
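A couple of quick slice examples straight from the shell (assuming jq is installed):

```shell
echo '[0,1,2,3,4,5]' | jq -c '.[2:4]'    # subarray from index 2 (incl) to 4 (excl): [2,3]
echo '[0,1,2,3]'     | jq -c '.[-2:]'    # negative index counts from the end: [2,3]
echo '"hello world"' | jq -r '.[0:5]'    # substring: hello
```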
Edit: tbh it's probably not the best idea to stick all that json data into the var, it could be huge!
the first example is better
I would probably stick them into a bash array and work with them later
something like
Code:
......
my_array+=($yt_id) # instead of that echo
......
#later on
get_it(){
id="$1"
prefix="https://blahblah"
suffix="some_end_bit"
some_prog_to_dl_it "${prefix}${id}${suffix}"
}
for ((i=0;i<${#my_array[@]};i++));do
[[ -e /some/dir/${my_array[i]}.mp4 ]] \
|| get_it "${my_array[i]}"
done
This is quite dumb, and it's not secure to pipe stuff from the internet directly into ffmpeg, but for fun I came up with this
Code:
#!/bin/bash
# uncomment for optional proxy
#Proxy="--proxy a_squid_proxy:3128"
UserAgent="$(youtube-dl --dump-user-agent)"
# probably no need to use the same UA, but what the heck
InputList="$1"
Get_ids(){
youtube-dl ${Proxy} \
-a "$InputList" \
-j --flat-playlist --no-check-certificate \
| jq -j ".id,\" \",.title,\"\n\""
}
Get_url(){
youtube-dl $Proxy \
"https://www.youtube.com/watch?v=${id_title%% *}" \
--no-warnings --no-check-certificate --skip-download -q -f 140 --get-url
}
Get_mp3(){
while read id_title;do
# jq spat out "dehfuefhhf some song title"
# ${id_title#* } // that deletes the id part
# skip if we already have
[[ -e "${id_title#* }.mp3" ]] && continue
curl -s -A "${UserAgent}" ${Proxy} \
"$(Get_url)" \
| ffmpeg -hide_banner \
-i - \
-c:a libmp3lame -aq 2 \
"${id_title#* }.mp3"
# exit after the first one
exit
done< <(Get_ids)
}
[[ -e "$InputList" ]] \
&& [[ $(file -b --mime-type "$InputList" ) == "text/plain" ]] \
&& Get_mp3
exit
No real checks, it just blindly does stuff,
and having ffmpeg read stdin from some random internet page is asking for trouble.
It would be *much* safer to download the m4a and then use ffmpeg after testing that the m4a is actually AAC data (which is what you have already been doing).
I just thought it would be fun to skip that step.
See if you can come up with some checks on the ids and urls, that they are the expected length/format.
maybe use parallel to start a chain of
Get_m4a && check_m4a && ffmpeg_to_mp3
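For the id check, a sketch (assuming the ids are always 11 characters of letters, digits, - and _; `is_yt_id` is a made-up helper name):

```shell
#!/bin/bash
# reject anything that isn't shaped like an 11-char youtube id
is_yt_id() {
    printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]{11}$'
}
is_yt_id "dQw4w9WgXcQ" && echo "looks ok"     # 11 valid chars
is_yt_id "not an id" || echo "rejected"       # spaces, wrong length
```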