I needed to convert some webpages to a human-readable format for studying and review at a later time. After some thought, I came up with the following:
- download pages via curl/wget or good old “save page as” (see the fetch sketch just after this list)
- slugify filenames for easier shell manipulation
- convert to markdown using Aaron Swartz’s html2text
- rename file extensions to reflect new format
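
For the download step, here is a minimal sketch; the example URL, urls.txt, and the exact flags are my own placeholders rather than part of the original workflow, and a browser’s “save page as” works just as well:

# fetch a single page, following redirects
$ curl -L -o article.html https://example.com/article
# or fetch every URL listed in urls.txt (one per line), appending .html where needed
$ wget --adjust-extension --input-file=urls.txt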
# delete leftover directories (e.g. asset folders from “save page as”); -print0/xargs -0 handle names with spaces
$ find . -mindepth 1 -type d -print0 | xargs -0 rm -rf
# slugify filenames, convert each page to markdown, then delete the html originals
# (assumes a slugify CLI, e.g. python-slugify, that prints the slug of its argument)
$ for file in *.html; do mv "$file" "$(slugify "${file%.html}").html"; done && \
for file in *.html; do html2text "$file" > "${file%.html}.md"; done && \
rm *.html
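
The same steps can also be folded into a single pass, so each .html file is removed only after its markdown replacement has been written. This is just a sketch using the same tools as above (a slugify CLI and html2text on the PATH):

# one-pass variant: slugify the name, convert, then remove the original page
$ for file in *.html; do
    slug="$(slugify "${file%.html}")";
    html2text "$file" > "$slug.md";
    rm "$file";
  done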