Update TEDSubs to reflect the changes in TED website
joedicastro committed Mar 31, 2014
1 parent 4df546c commit 2d53536
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions src/TEDSubs.py
@@ -179,9 +179,9 @@ def main():
 (opts, args) = options().parse_args()

 # regex expressions to search into the webpage
-regex_intro = re.compile('introDuration%22%3A(\d+\.?\d+)%2C')
-regex_id = re.compile('talkId%22%3A(\d+)%2C')
-regex_url = re.compile('id="no-flash-video-download" href="(.+)"')
+regex_intro = re.compile('"introDuration":(\d+\.?\d+),')
+regex_id = re.compile('"id":(\d+),')
+regex_url = re.compile('"nativeDownloads":.*"high":"(.+)\?.+},"sub')
 regex_vid = re.compile('http://.+\/(.*\.mp4)')

 if not args:
@@ -204,7 +204,6 @@ def main():
 + 1) * 1000)
 ttalk_id = int(regex_id.findall(ttalk_webpage)[0])
 ttalk_url = regex_url.findall(ttalk_webpage)[0]
-ttalk_url = ttalk_url.replace('.mp4', '-480p.mp4')
 ttalk_vid = regex_vid.findall(ttalk_url)[0]
 except IndexError:
 print('Maybe this video is not available for download.')
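
As a rough illustration of what the updated patterns expect, the sketch below runs them against a made-up fragment of embedded JSON. The sample page, the example.com URL, the apikey parameter and the extract_talk_info helper are all assumptions for illustration; only the four patterns come from the commit (written here as raw strings), and the real TED page layout may differ.

```python
import re

# Patterns taken from the commit, written as raw strings.
REGEX_INTRO = re.compile(r'"introDuration":(\d+\.?\d+),')
REGEX_ID = re.compile(r'"id":(\d+),')
REGEX_URL = re.compile(r'"nativeDownloads":.*"high":"(.+)\?.+},"sub')
REGEX_VID = re.compile(r'http://.+\/(.*\.mp4)')

# Hypothetical page fragment; the real markup is not shown in the commit.
SAMPLE_PAGE = (
    '{"id":1234,"introDuration":11.82,'
    '"nativeDownloads":{'
    '"low":"http://download.example.com/talks/Speaker_2014-light.mp4?apikey=XXXX",'
    '"high":"http://download.example.com/talks/Speaker_2014-480p.mp4?apikey=XXXX"},'
    '"subtitledDownloads":{}}'
)


def extract_talk_info(page):
    """Return the extracted fields, or None if any pattern fails to match."""
    try:
        intro = float(REGEX_INTRO.findall(page)[0])
        talk_id = int(REGEX_ID.findall(page)[0])
        # The "high" URL is captured up to, but not including, its query string.
        url = REGEX_URL.findall(page)[0]
        video = REGEX_VID.findall(url)[0]
    except IndexError:
        # findall() returned an empty list for one of the patterns.
        return None
    return {'intro': intro, 'id': talk_id, 'url': url, 'video': video}


info = extract_talk_info(SAMPLE_PAGE)
print(info)
```

In this sample the "high" link already carries the -480p suffix, which would be consistent with the commit dropping the replace('.mp4', '-480p.mp4') step. Note also that \d+\.?\d+ needs at least two digits, so a one-digit intro duration such as 5 would not match.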

4 comments on commit 2d53536

@heysweet

Some of the newer URLs aren't parsed properly (e.g. https://www.ted.com/talks/hugh_herr_the_new_bionics_that_let_us_run_climb_and_dance). Any suggestions for how I could reformat the URL to make it work with your script?

@joedicastro
Owner Author


I'll take a look at that in a few hours and get back to you. Thanks for reporting this!

@joedicastro
Owner Author


@andrewsweet I tested that URL and it works fine for me. What error did you get? Can you show me the traceback?

This is my output:

joedicastro@itaca:~/code/ted-talks-download/src$ ./TEDSubs.py https://www.ted.com/talks/hugh_herr_the_new_bionics_that_let_us_run_climb_and_dance
Subtitle 'HughHerr_2014-480p.eng.srt' downloaded.
Subtitle 'HughHerr_2014-480p.spa.srt' not found.
Donwloading video...
Video HughHerr_2014-480p.mp4 downloaded

@lboullu

Great work!
Two corrections and one remark on TEDTalks.py that new users might need (I did):

  • In main, at line 457, my version refused to work because the number 86400 is too low. Increasing it to 986400 works just fine.
  • I might be the first Windows user to try this script, but in the original the variable FOUND is not defined when the OS is Windows (it is only defined for other systems, at line 504). Hence, I added FOUND = 0 just before the 'if not' on line 503.
  • The 'join' function on line 423 will fail if your folder name contains an accented (non-ASCII) character.

Also, I did not manage to make the function "get_subs" work, but that doesn't matter. Windows users must use

    (path where you installed python)\ setup.py install

and

    TEDTalks.py [path]

i.e. without python or python.exe.

Thanks for your work,

Loïs
