/dev/random

Dark mode!!!

I’ve finally added support for dark mode 😎 This has been on my TODO list for a while, especially since adding dark mode support to both cadizm.com and dev.cadizm.com.

Both of those sites were previously migrated to Next.js which made the job much easier.

This blog is still using Jekyll and tbh it’s working just fine. My needs are quite simple with my main requirements being:

  1. Uncluttered repo that is easy to understand
  2. Simple and fast static site generation
  3. Support for writing posts/code in Markdown
  4. Ability to typeset mathematics using LaTeX
  5. Support for dark mode!

After adding MathJax, the only thing that was missing was dark mode. But no more! Now that most modern browsers have prefers-color-scheme media query support, adding dark mode to this site was much easier than expected!

Because my frontend skills can be described (nicely) as lacking, I was googling (and asking ChatGPT) all sorts of questions about how to migrate Jekyll sites and very old Bootstrap sites to use dark mode. Most of the answers involved upgrading to the latest Bootstrap which was something I was not particularly interested in doing.

After initially trying those routes and being unhappy with the results, I went for the manual approach, which ended up being much nicer. The PR only needed the addition of a few media queries for the body, anchor, and code elements, as well as more appropriate syntax highlighting scheme css classes. I learned how easy this is to do using rouge:

$ rougify style molokai | pbcopy

Wordle Suggestions

I’ve recently finished adding Wordle to my short list of “word game helper” apps. After playing a couple of times unaided by a computer, it seemed fun to put together a short algorithmic solution to help ensure a continued 100% win streak! You can find the source here. Continue reading for an overview of the approach and an analysis of Big \(O\) time complexity.

First a short snippet of the high-level logic:

def suggest(wordle, excluded, included, misplaced):
  """
  Client entrypoint for suggesting Wordle solutions.
  """
  return score(discard(misplaced, search(wordle, exclude(excluded, include(included, corpus)))))

The intent is to:

  1. Use a list of all known 5 letter English words (“corpus”)
  2. Include from the corpus, words that “include” a given set of letters
  3. Exclude from the corpus, words that “exclude” a given set of letters
  4. Search this subset of included/excluded words for words matching a “wordle” guess (using “.” for positions whose letter we do not know (shaded gray in Wordle) and the actual letter itself for those correctly positioned (Wordle shaded green))
  5. Subtract from the search, words that have letters that we know are “misplaced” (in Wordle these are shaded yellow)
  6. Finally, score the words remaining after performing the above and return them in descending score order (a sketch of these helpers follows)
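
To make these steps concrete, here’s a rough sketch of what exclude, search, and discard might look like (include is shown below); the (letter, position) representation of misplaced is my assumption, not necessarily how the actual source models it:

import re

def exclude(letters, corpus):
    # keep words containing none of the excluded letters
    return set(word for word in corpus if not set(word) & set(letters))

def search(wordle, corpus):
    # `wordle` is a 5-char pattern like "s.o.e", with "." for unknown positions
    pattern = re.compile('^%s$' % wordle)
    return set(word for word in corpus if pattern.match(word))

def discard(misplaced, corpus):
    # `misplaced` assumed to be (letter, position) pairs known to be wrong
    return set(word for word in corpus
               if not any(word[i] == c for c, i in misplaced))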

Quick definition for variables used in analysis. \(n\) will be bound to the size of the corpus, and \(k\) to the size of the input for each of the functions above. For example, in the call:

exclude(excluded, include(included, corpus))

\(n\) is the size of the corpus, and \(k\) is the variable used to describe the sizes of both included and excluded, with the understanding that the size \(k\) passed in the invocation of include is not necessarily the same \(k\) passed to exclude, but is of course required to be <= the initial \(k\) passed to include.

Time complexity for include:

def include(letters, corpus):
  """
  Return words in corpus that include all letters in `letters`.
  """
  return set(filter(lambda word: set(word) & set(letters) == set(letters), corpus))

A single pass over each word in corpus is done, so this already requires \(O(n)\) time. At each iteration we perform a set intersection, which takes \(O(min(m, n))\) time for sets of sizes \(m\) and \(n\). In our case, using \(k\) for letters, we know that the letters we want to include have an upper bound of 5, since all Wordle words are 5 letters. And each word in our corpus is 5 letters by definition. This implies that although the theoretical upper bound is \(O(k \cdot n) = O(5 \cdot n)\), we can extract and drop our constant \(k\), leaving \(5 \cdot O(n) \rightarrow O(n)\) linear time for the include operation.

A similar analysis can be used for exclude, although our constant size will be slightly larger: there are 26 letters in the English alphabet, so we know that \(k\) in this case is at most 26. We can get a tighter bound by realizing that our corpus is restricted to 5-letter words and that 2 seems to be the minimum number of letters possible for valid words. So even if we are extremely unlucky in letter guessing, our upper bound is constrained to constant size \(26 - 2 = 24\).

The search operation follows analogously, as we are again doing a single pass with a constant time check to apply our regular expression, which simplifies to a single pass over our regex and strings, both of which are fixed at 5.

Only when we apply discard do we get a seemingly quadratic upper bound, since we apply the \(O(n)\) search operation for each member of misplaced. But if we again think about the maximum size of misplaced, we can convince ourselves that it is again upper bounded at 5! (In the unlikely case that we have correctly identified all 5 letters, all of which are in the incorrect position.)

I hope to have shown that Wordle’s restriction to valid 5 letter words greatly reduces the size of inputs to our functions, which can be leveraged to arrive at a tighter bounded linear running time (excluding scoring, which has been omitted and will come in a future post since this one is already a bit lengthy).

MetaMask Local Network

Have been checking out Ethereum during the break, mostly going through the docs as well as various guides that I’ve come across. The impetus was this article which gives a great intro to the emerging Web3 movement.

Along with setting up and interacting with a local geth node, it’s helpful to interact with accounts (as opposed to “wallets”) using the MetaMask wallet. MetaMask supports importing accounts using private keys or JSON in the Web3 Keystore format.

One thing that wasn’t obvious (and also the motivation for this post) was how to set up MetaMask and a geth node running locally for communication. By default, CORS is disabled and requires a runtime configuration flag to be passed:

geth --goerli --syncmode "light" --signer=/path/to/signer/clef.ipc --http \
  --http.corsdomain "moz-extension://67b02c56-bb74-4303-bfed-1aa4014483f1"

Make sure to set MetaMask ChainID to 5 when using the --goerli flag and/or explicitly set chain ID using the --networkid flag (and of course the correct chrome or moz extension UUID).

Alleviating Meeting Fatigue

Things I try to keep in mind regarding meetings:


Pairwise Comparison, Triangular Numbers, and Gauss

When performing pairwise comparison over the elements of a sequence, the canonical code (in Java) looks something like:

public static <T> List<Pair<T>> pairwise(List<T> list) {
    if (list == null) {
        return null;
    }

    int n = list.size();
    List<Pair<T>> res = new ArrayList<>();

    for (int i = 0; i < n - 1; ++i) {
        for (int j = i + 1; j < n; ++j) {
            res.add(new Pair<T>(list.get(i), list.get(j)));
        }
    }

    return res;
}

Enumerating small inputs makes the pattern clear:

var input = List.of(a);
var output = [];

var input = List.of(a, b);
var output = [(a, b)];

var input = List.of(a, b, c);
var output = [(a, b), (a, c), (b, c)];

var input = List.of(a, b, c, d);
var output = [(a, b), (a, c), (a, d), (b, c), (b, d), (c, d)];

var input = List.of(a, b, c, d, e);
var output = [(a, b), (a, c), (a, d), (a, e), (b, c), (b, d), (b, e), (c, d), (c, e), (d, e)];

When considering the algorithm’s complexity it’s easy to skim the code, see 2 for loops, declare it quadratic, and call it a day. But if we want a more rigorous proof, we can dissect its components.

The outer loop clearly runs in linear time (\(O(n)\)). But what about the inner loops? Each begins just past the \(i\)th element and ends at the last, so its upper bound must be \(n\). But does that mean the overall pairwise comparison is sub-quadratic?

Again, it becomes easier to see when we start unrolling the loop iterations:

\[n + (n - 1) + (n - 2) + \cdots + 1\]

Rewriting from right-to-left in Sigma notation, the formula becomes:

\[\sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n\]

Which is more easily recognizable as the sequence summing the first \(n\) integers:

\[\sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2}\]

The solution to this sequence is most famously attributed to Gauss when he was in grade school. And in this notation it’s much easier to see that the upper bound is indeed \(O(n^2)\).

A visual interpretation can be done using triangular numbers, where the triangle can be pictured as a “half-square” of objects that, added together with a rotated copy of itself, forms a rectangle with dimensions \(n \times (n + 1)\).

Lastly, thinking of this combinatorially, the formula can be written as a binomial coefficient:

\[{n + 1 \choose 2} = \frac{n(n + 1)}{2} = \sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n\]
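
As a quick sanity check, here’s a small Python sketch mirroring the Java above, verifying that the pair count for a list of length \(n\) is \(\frac{n(n - 1)}{2} = {n \choose 2}\):

from itertools import combinations

def pairwise(L):
    # same nested-loop enumeration as the Java version
    return [(L[i], L[j]) for i in range(len(L) - 1) for j in range(i + 1, len(L))]

for n in range(10):
    L = list(range(n))
    # (n - 1) + (n - 2) + ... + 1 == n(n - 1)/2
    assert len(pairwise(L)) == n * (n - 1) // 2
    assert pairwise(L) == list(combinations(L, 2))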

Using the Logistic Function to Compute Elo

When designing ordered rankings (sports, gaming, even dating), the Elo Rating System in chess has proven very useful.

FIDE has tweaked the original rating scheme over the years, but in its current form it is:

\[E_A = \frac{1}{1 + 10^\frac{R_B - R_A}{400}}\]

\(E_A\) represents the probability (expected outcome) of Player A beating Player B. The base of 10 and scale factor of 400 have been chosen such that when players have a 400 point difference, there is roughly a 91% chance (10:1 odds) that the stronger player will win:

\[\frac{1}{1 + 10^\frac{-400}{400}} = \frac{1}{1 + 10^{-1}} = \frac{1}{\frac{11}{10}} = \frac{10}{11} \approx 0.91\]

And a 50/50 chance for either player winning when they have equal ratings:

\[\frac{1}{1 + 10^\frac{0}{400}} = \frac{1}{1 + 1} = \frac{1}{2} = 0.50\]

After a given outcome has occurred, the formula below is used to update each player’s rating:

\[R' = R + K(S - E)\]

This updates a player’s rating given: current rating \(R\), expected probability of winning \(E\), and outcome \(S\).

Some new variables that require definitions: \(K\) is the K-factor constant controlling the sensitivity of the update (larger for more sensitive, smaller for less), and \(S\) is the score, which takes value 1 for a win, 0 for a loss, and 0.5 for a tie.
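
Putting the two formulas together in a small Python sketch (the K-factor of 32 is just an illustrative choice here; rating bodies use different values):

def expected(r_a, r_b):
    # probability of player A beating player B
    return 1.0 / (1 + 10 ** ((r_b - r_a) / 400.0))

def update(rating, e, score, k=32):
    # R' = R + K(S - E)
    return rating + k * (score - e)

e = expected(1613, 1213)   # 400-point favorite, e ~= 0.91
print(update(1613, e, 0))  # favorite loses: rating drops ~29 points
print(update(1613, e, 1))  # favorite wins: rating rises ~3 points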

_init_completion: command not found

Update Sat Nov 11 12:42:24 PST 2023

As of Nov 2023, the easiest way to fix this is to just install the bash-completion@2 Homebrew package.


If you encounter the bash error message: _init_completion: command not found when trying to use bash-completion (and you are on macOS), the error is most likely due to having an older version of bash installed such as 3.2 (which is the default on macOS).

The easiest way to solve this is of course to upgrade using brew. At the time of this writing this will install 5.1.8(1)-release, but all we really need is v4.1+ in order to make use of bash-completion v2 which includes _init_completion.

Here’s a great article by the k8s folks that explains the process step-by-step.

But if for some reason we wanted to keep using the default bash shell on macOS along with bash-completion v1, we could steal a hack found in the kubectl completion script:

# Homebrew on Macs have version 1.3 of bash-completion which doesn't include
# _init_completion. This is a very minimal version of that function.
__kubectl_init_completion()
{
    COMPREPLY=()
    _get_comp_words_by_ref "$@" cur prev words cword
}

# now use this function if we don't have _init_completion installed
if declare -F _init_completion >/dev/null 2>&1; then
    _init_completion -s || return
else
    __kubectl_init_completion -n "=" || return
fi

Running DynamoDB Locally

I’ve found it useful to run Dynamo locally during development. It’s helpful during new feature development and experimentation (it doesn’t matter if you break things and need to start again from scratch). Other benefits include being a nice person in a shared environment, and saving money is always good :)

I’ve added some helper scripts on GitHub for easily initializing new tables and dumping their contents. For Java development, you’ll also want something like the following:

// import java.net.URI;
// import org.springframework.web.util.UriComponentsBuilder;
// import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;

URI uri = UriComponentsBuilder.newInstance()
    .scheme("http")
    .host("localhost")
    .port("8000")
    .build().toUri();

DynamoDbAsyncClient client = DynamoDbAsyncClient.builder()
    .endpointOverride(uri)
    .build();

Git Rebase Onto

When working on a complicated feature, it is often helpful to break up functionality into logical and cohesive “working sets” or “change sets”. This helps not only the author, but also other engineers during peer code review.

In day-to-day development and when using version control, change sets will almost always mean feature branches in Git.

Because complicated features may build upon functionality introduced as part of the new feature that has not yet been merged back to the main branch, a situation depicted below can often occur:

o---o---o master
     \
      o---o---o  feature-a-part-1
               \
                o---o---o  feature-a-part-2
                         \
                          o---o---o  feature-a-part-3

In a perfect world, all 3 feature branches would be bug-free, have the necessary code review approvals, and be ready to be merged back to the main branch at the same time in sequential order:

(feature-a-part-2) $ git merge feature-a-part-3
(feature-a-part-2) $ git co feature-a-part-1
(feature-a-part-1) $ git merge feature-a-part-2
(feature-a-part-1) $ git co master
(master) $ git merge feature-a-part-1

Unfortunately things are much more likely to be a bit messier in real life. Some common scenarios:

  • What happens when parts 1 or 2 have some awkwardness that needs to be reworked?
  • Parts 1 and/or 2 are approved and ready to be merged to master, but 3 is not?
  • The master branch (or another feature branch) has some changes that make our branch out of date?

The situation can be further complicated when a repository prefers the “squash-merge” strategy since history is effectively “rolled up” during the squash process, requiring work on the part of the engineer to make sure branches forked from other feature branches are merged correctly back to the main branch.

Enter Git rebase --onto. Taken directly from the documentation:

Here is how you would transplant a topic branch based on one branch to another, to pretend that you forked the topic branch from the latter branch, using rebase --onto. First let’s assume your topic is based on branch next. For example, a feature developed in topic depends on some functionality which is found in next.

o---o---o---o---o  master
     \
      o---o---o---o---o  next
                       \
                        o---o---o  topic

We want to make topic forked from branch master; for example, because the functionality on which topic depends was merged into the more stable master branch. We want our tree to look like this:

o---o---o---o---o  master
    |            \
    |             o'--o'--o'  topic
     \
      o---o---o---o---o  next

We can get this using the following command:

$ git rebase --onto master next topic

Using our previous 3-part example above, assume that part 2 has been merged into part 1 and part 1 has been merged into master. We can fix part 3’s PR like so:

(master) $ git pull
(master) $ git co feature-a-part-3
(feature-a-part-3) $ git rebase -i HEAD~2 --onto master

We are replaying the 2 commits made on part 3 after it branched off part 2 onto the new tip of master.

OS X Mission Control & Dock Things

Two things that will drive one bonkers when switching to a new laptop (when you don’t recall that they are actual settings)

Dock keyboard shortcut bonus:

# move context to Dock
Ctrl + F3

ImageMagick stitch images vertically

For those times you need to “stitch/sew/concatenate/tile” screenshots spanning more than a screenful.

$ convert Screen\ Shot\ 2020-04-01\ at\ 12.36.* -append screen.png

iTerm2 Keys

It can be challenging acclimating oneself to a new computer, especially when muscle memory key bindings that weren’t backed up need to be remapped :(

Hints to my future self:

+--------------------------------------------+
| Key Combination | Action                   |
| --------------- | ------------------------ |
| ⌘b              | Send ^[b                 |
| ⌘d              | Send ^[d                 |
| ⌘f              | Send ^[f                 |
| ⌘←Delete        | Send Hex Codes: 0x1b 0x8 |
+--------------------------------------------+

Bonus: Wheat color hex code: #f5deb3

Python itertools.product and nested loops

n = 7
m = 3

# m = 3, so 3 nested loops
foo = []
for i in range(n):
    for j in range(n):
        for k in range(n):
            foo.append((i, j, k))


import itertools
bar = list(itertools.product(*[range(n) for _ in range(m)]))
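# equivalently, using the `repeat` keyword argument:
# bar = list(itertools.product(range(n), repeat=m))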

assert foo == bar

ISO 31-11

TIL ISO 31-11 exists. Came across the wiki page when googling the notations for open and closed intervals (which I can never seem to remember).

As a further reminder:

[a, b]

denotes the closed interval (closed to adding endpoints) which includes endpoints a and b.

(a, b)

denotes the open interval (open to adding endpoints) which excludes endpoints a and b.

Cons cells (python edition)

Came across a puzzle asking about implementing Lisp’s car and cdr functions in Python. Turned out to be pretty fun and concise:

# cons.py

# `cons`struct cons cell that holds pairs of values/pointers to values
def cons(a, b):
    def pair(f):
        return f(a, b)
    return pair

# return the first half (`c`ontents of `a`ddress `r`egister) of cons cell
def car(f):
    return f(lambda a, b: a)

# return the second half (`c`ontents of `d`ecrement `r`egister) of cons cell
def cdr(f):
    return f(lambda a, b: b)

# return cons cell as a `dotted pair` string
def dotted(cons_cell, s=None):
    s = [] if s is None else s

    for v in ['(', car(cons_cell), ' . ', cdr(cons_cell), ')']:
        if callable(v):
            dotted(v, s)
        else:
            s.append('nil' if v is None else str(v))

    return ''.join(s)

if __name__ == '__main__':
    print(dotted(cons(1, 2)))
    print(dotted(cons(1, cons(2, cons(3, None)))))
    print(dotted(cons(cons(1, 2), cons(3, 4))))
$ python cons.py
(1 . 2)
(1 . (2 . (3 . nil)))
((1 . 2) . (3 . 4))

Ordered Symbol Table ADT in C++ map

C++’s ordered std::map (usually implemented with red-black trees) satisfies the requirements for the Ordered Symbol Table abstract data type.

Some examples of how the ST interface might map onto a wrapped std::map, taking advantage of an ordered symbol table’s relative key ordering:

  // Largest key less than or equal to key
  const K* floor(K key) {
    if (isEmpty())
      return nullptr;

    // lower_bound returns iterator to first element not less than key
    auto candidate = map.lower_bound(key);
    if (candidate != map.end() && candidate->first == key)
      return &candidate->first;  // key itself is present

    // otherwise the floor is the element just before the lower bound
    if (candidate == map.begin())
      return nullptr;  // every key in the map is greater than key

    return &(--candidate)->first;
  }

  // Smallest key greater than or equal to key
  const K* ceiling(K key) {
    if (isEmpty())
      return nullptr;

    // lower_bound (not upper_bound) so that an exact match is returned
    auto candidate = map.lower_bound(key);
    return candidate == map.end() ? nullptr : &candidate->first;
  }

  // Number of keys less than key
  // i == rank(select(i)) for all 0 <= i <= size() - 1
  int rank(K key) {
    return std::distance(map.begin(), map.lower_bound(key));
  }

  // Key of rank k
  // key == select(rank(key)) for all keys in map
  const K* select(int k) {
    int low = 0;
    int high = size() - 1;

    // binary search
    while (low <= high) {
      auto it = map.begin();
      int mid = low + ((high - low) / 2);

      std::advance(it, mid);
      int rank = std::distance(map.begin(), it);
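      // note: rank == mid by construction (`it` was advanced mid steps
      // from begin()), so the search converges on index k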

      if (rank < k) {
        low = mid + 1;  // look to the right
      } else if (rank > k) {
        high = mid - 1;  // look to the left
      } else {
        return &it->first;
      }
    }

    return nullptr;
  }

  // Number of keys in [low, high]
  int size(K low, K high) {
    auto begin = map.lower_bound(low);
    auto end = map.upper_bound(high);

    return std::distance(begin, end);
  }

  // Keys in [low, high], in sorted order
  std::vector<K> keys(K low, K high) {
    auto begin = map.lower_bound(low);
    auto end = map.upper_bound(high);

    std::vector<K> res;
    for (auto it = begin; it != end; ++it) {
      res.push_back(it->first);
    }

    return res;
  }

Docker: No space left on device

TL;DR

$ du -hs ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
 63G
$ docker ps -q -f 'status=exited' | xargs docker rm
$ docker images -q -f 'dangling=true' | xargs docker rmi
$ du -hs ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
 16G

Couldn’t bring up a docker container since the db connection failed. Taking a closer look at the db container showed the following:

...
db_1 | FATAL: could not write lock file "/var/run/postgresql/.s.PGSQL.5432.lock": No space left on device
...

A Google search turned up the following blog post as the second result: “No space left on device” on OSX Docker, which somewhat explains the 64G cap on total storage for Docker on OS X.

Vim Window Resizing

Quick reminders for self:

Documentation: Windows

Highlights

" Make only window
Ctrl + w, o

" Open window in new tab
Ctrl + w, T

" Increase current window by 10
10 Ctrl + w, >

brew macvim woes

When you get the lovely library not found errors after installing macos security updates:

$ gvim
$ dyld: Library not loaded: /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/libruby.2.0.0.dylib

After much heartache and googling, here’s how you force brew to install macvim using the ruby version on your PATH:

l00k:~$ brew install --env=std macvim --HEAD
Updating Homebrew...
==> Cloning https://github.com/macvim-dev/macvim.git
Cloning into '/Users/mcadiz/Library/Caches/Homebrew/macvim--git'...
==> Checking out branch master
Already on 'master'
Your branch is up to date with 'origin/master'.
==> ./configure --with-features=huge --enable-multibyte --with-macarchs=x86_64 --enable-perlinterp --enable-rubyinterp --enable-tclinterp --enable-terminal --with-tlib=ncurses --w
==> make
🍺  /usr/local/Cellar/macvim/HEAD-2c43cd6: 2,185 files, 37.9MB, built in 1 minute 12 seconds

Iterated logarithm

The iterated logarithm (written “log*” and read “log star”) is: “the number of times the logarithm function must be iteratively applied before the result is <= 1”.

Among other things, it’s useful in analyzing the running time of the Find and Union operations in the union find datastructure that uses path compression.

Here’s a ruby snippet that computes iterated logarithm values for numbers <= 2^65536 (i.e. 2^(2^16)).

def log_star(x)
  return 0 if x <= 0

  count = 0
  while x > 1 do
    x = Math.log2(x)
    count += 1
  end

  count
end

[0, 1, 2, 3, 4, 15, 16, 65535, 65536, 2**65535, 2**65536].map do |i|
  if i <= 65536
    "log* #{i}: #{log_star(i)}"
  else
    "log* 2^#{Math.log2(i).to_i}: #{log_star(i)}"
  end
end
.tap do |res|
  width = res.map(&:size).max
  puts res.map {|x| x.rjust(width)}
end
$ ruby iterated_logarithm.rb
      log* 0: 0
      log* 1: 0
      log* 2: 1
      log* 3: 2
      log* 4: 2
     log* 15: 3
     log* 16: 3
  log* 65535: 4
  log* 65536: 4
log* 2^65535: 5
log* 2^65536: 5

Bash 'Command' Builtin

According to the manual,

command -v  <command>

…causes a single word indicating the command or file name used to invoke command to be displayed.

Very useful when which foo doesn’t work (for example with nvm, which is a sourced function defined during shell creation).

discrete math

Starting a new GitHub repo with code for theorems, corollaries, and exercises from Discrete Mathematics: Elementary and Beyond.

One thing to note is that all of the code will be written in C++ (trying to get up to speed with C++11!!) and is intended to be run in a docker container. You can find the image here.

The image has all of the necessary dev tools installed (compilers, test libraries, linters, etc) so that you can compile and run the tests as long as you have docker installed.

See the README for more info.

First exercise is straightforward decimal to binary conversion:

std::string dec2bin(int64_t n) {
  if (n < 0) {
    throw std::out_of_range("`n' must be nonnegative");
  } else if (n == 0) {
    return "0";
  }

  std::string s;

  while (n) {
    auto t = n & 1 ? "1" : "0";
    s.insert(0, t);
    n >>= 1;
  }

  return s;
}

killall Dock

Often, when unplugging from thunderbolt, the OS X user interface will get borked and be unable to switch Desktops, launch Launchpad, use Spotlight, etc…

To “fix” the issue I was previously just rebooting my machine. But the other day I learned a much less painful workaround:

$ killall Dock

Ruby defaultdict

For those coming from Python, it may be surprising that the new(obj) -> new_hash method doesn’t behave like Python’s defaultdict.

Ruby

irb(main):001:0> foo = Hash.new([])
=> {}
irb(main):002:0> foo[:bar].push(1)
=> [1]
irb(main):003:0> foo[:baz].push(2)
=> [1, 2]
irb(main):004:0> foo
=> {}

Python

In [1]: from collections import defaultdict
In [2]: foo = defaultdict(list)
In [3]: foo['bar'].append(1)
In [4]: foo['baz'].append(2)
In [5]: foo
Out[5]: defaultdict(list, {'bar': [1], 'baz': [2]})

Instead what you want is the block form new {|hash, key| block } -> new_hash:

irb(main):001:0> foo = Hash.new {|h, k| h[k] = []}
=> {}
irb(main):002:0> foo[:bar].push(1)
=> [1]
irb(main):003:0> foo[:baz].push(2)
=> [2]
irb(main):004:0> foo
=> {:bar=>[1], :baz=>[2]}

Docker detach

Goal: attach to a running Docker container and then detach without killing it.

Two possible solutions:

  1. Always remember to spin up (exec/run) containers using the -i -t flags (interactive and pseudo-tty respectively) so that you can use the detach key sequence (ctrl-p, ctrl-q by default) to actually detach from the container.

  2. Pass --sig-proxy=false when you attach so that signals (e.g. SIGINT from ctrl-c) are not proxied to the container and instead stay in the current shell.

Additive Inverse

Obtaining the additive inverse of a two’s complement integer using bitwise negation and addition (since ~x == -x - 1 in two’s complement, it follows that ~x + 1 == -x):

In [1]: x = 2
In [2]: ~x + 1
Out[2]: -2
In [3]: x = -77
In [4]: ~x + 1
Out[4]: 77

Remove from Chrome dropdown

The magic keyboard sequence to remove a saved entry (such as email) from Chrome’s dropdown:

1. Select entry from dropdown
2. <fn> + <shift> + <delete>

Database N+1 Queries

Was recently reminded to always be on the lookout for potential n+1 query pitfalls. Take a look at the code before:

index = (string.size-1).downto(0).find {|i| Foo::Model.find_by(bar: string[0..i])}
index ? string[0..index] : nil

Versus the after:

candidates = string.length.times.map {|i| string[0..i]}
Foo::Model.where(bar: candidates).order('length(bar) desc').take&.bar

Much nicer doing 1 trip to the db with the query selecting and sorting instead of searching candidates one by one, up to string.length times.

Book: The Power of Habit (Duhigg)

Just finished The Power of Habit by Charles Duhigg. A great read that I can’t recommend enough. As an aside, making a conscious effort to make reading a “habit” =P and will be keeping myself honest here on this blog.

Associative entity

Since I always forget the correct ER name for a “join table”, here it is, associative entity.

AWS S3

Copy a file to a new subfolder within an existing s3 bucket (named incoming).

$ aws s3 --profile=admin cp bad-data.log s3://incoming/foo/bar/
$ aws s3 --profile=admin ls s3://incoming --recursive

Recursive Cartesian Product

A recursive approach to Python’s iterative method for the Cartesian product:

def product(*args):
    res = []
    product_helper([], args, 0, res)
    return res


def product_helper(prefix, pools, offset, res):
    if offset >= len(pools):
        res.append(tuple(prefix))
        return

    pool = pools[offset]

    for i in range(len(pool)):
        product_helper(prefix + [pool[i]], pools, offset + 1, res)


if __name__ == '__main__':
    import itertools
    L = ['abc', 'def']
    assert product(*L) == list(itertools.product(*L))

Products and trees

# helpers used by the __main__ block below
from functools import reduce


class Node(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def inorder(root, res):
    """Append node data to res in inorder (left, root, right)."""
    if root:
        inorder(root.left, res)
        res.append(root.data)
        inorder(root.right, res)


def postorder(root, res):
    """Append node data to res in postorder (left, right, root)."""
    if root:
        postorder(root.left, res)
        postorder(root.right, res)
        res.append(root.data)


def product_others(L):
    """
    Return an array M where the value at each index i is the product
    of all values in L except for L[i]

    Constraint: only use multiplication operator
    """
    product = lambda x, y: x * y

    # value at index i is the product of all prev in L
    before = [
        reduce(product, L[:i], 1)
        for i in range(len(L))
    ]

    # value at index i is the product of all next in L
    after = [
        reduce(product, L[i+1:], 1)
        for i in range(len(L))
    ]

    return [x * y for (x, y) in zip(before, after)]


def reconstruct(inorder, postorder):
    """
    Reconstruct and return the root of the tree that yields the
    passed inorder and postorder traversals
    """
    assert len(inorder) == len(postorder)

    if not inorder:
        return

    # root is always last entry
    root = Node(postorder[-1])
    # find index of root in inorder
    i = inorder.index(root.data)

    # left subtree of tree rooted at root
    inorder_left = inorder[:i]
    # right subtree of tree rooted at root
    inorder_right = inorder[i+1:]

    postorder_left = postorder[:i]
    postorder_right = postorder[len(postorder_left):-1]

    root.left = reconstruct(inorder_left, postorder_left)
    root.right = reconstruct(inorder_right, postorder_right)

    return root


if __name__ == '__main__':
    L = [2, 4, 1, 3]
    assert [12, 6, 24, 8] == product_others(L)

    #
    #           1
    #          / \
    #         2   3
    #        / \ / \
    #       4  5 6  7
    #        \
    #         8
    #
    t1 = Node(1)
    t1.left = Node(2)
    t1.right = Node(3)
    t1.left.left = Node(4)
    t1.left.right = Node(5)
    t1.right.left = Node(6)
    t1.right.right = Node(7)
    t1.left.left.right = Node(8)

    A, B = [], []
    inorder(t1, A)
    postorder(t1, B)

    t2 = reconstruct(A, B)

    C, D = [], []
    inorder(t2, C)
    postorder(t2, D)

    assert A == C
    assert B == D

Git: cleanup merged branches

Delete local branches already merged to develop.

$ git br --merged develop | grep -Ev 'develop|master' | xargs git br -d

Combinatorics refresher

Power set, permutations, and combinations.

Note that for permutations, we can augment to get the k-length permutations by returning slices of length k ((v + perm)[:k]); see the sketch after the code.

def powerset(S):
    """
    Return the set of all (2^N) subsets of S (including empty set and S itself)
    """
    res = []

    for i in range(2**len(S)):
        T = []
        for j in range(len(S)):
            if (1 << j) & i:
                T.append(S[j])
        res.append(T)

    return res


def permutations(L):
    """
    Return all n! (n-factorial) permutations of elements in L
    """
    if len(L) <= 0:
        return []
    elif len(L) == 1:
        return L

    res = []
    for i, v in enumerate(L):
        prefix = L[:i]
        suffix = L[i+1:]
        for perm in permutations(prefix + suffix):
            res.append(v + perm)

    return res


def combinations(k, L):
    """
    Return all (binomial coefficient) k-length combinations of elements in L
    """
    if k == 0:
        return [[]]

    res = []
    for i, v in enumerate(L):
        for suffix in combinations(k-1, L[i+1:]):
            res.append([v] + suffix)

    return res
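
And a rough sketch of the k-length permutation augmentation mentioned above (each k-slice repeats \((n - k)!\) times, hence the dedupe):

def k_permutations(k, L):
    # k-length permutations via slices of the full permutations
    return sorted(set(perm[:k] for perm in permutations(L)))

assert k_permutations(2, 'abc') == ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']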

Run-length encoding

Given an arbitrary string of fixed length, implement a function that compresses the string by replacing each run of repeated characters with a count of how many are upcoming, followed by the character (single occurrences are left as-is).

def rle(S):
    res, count = [], 1

    for i, s in enumerate(S):
        try:
            lookahead = S[i+1]
        except IndexError:
            lookahead = None

        if s == lookahead:
            count += 1
            continue

        if count > 1:
            res.append(str(count))
            count = 1
        res.append(s)

    return ''.join(res)

assert rle('WB') == 'WB'
assert rle('WWB') == '2WB'
assert rle('WWBB') == '2W2B'
assert rle('WWBW') == '2WBW'
assert rle('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW') == '12WB12W3B24WB14W'

Last git branch

Note to self: to avoid mass proliferation of already merged local branches:

l00k:~/workspace/src/foo-proj (feature-foo) $ git co develop
l00k:~/workspace/src/foo-proj (develop) $ git br -d @{-1}
Deleted branch feature-foo (was 8169880).

YubiKey All The Things

Mostly notes to self, extracted from the very thorough guide here.

l0st:~$ gpg --keyserver keyserver.ubuntu.com --recv 0x210583A0ECB0AFA4
gpg: key 0x210583A0ECB0AFA4: public key "Michael Cadiz <mike@cadizm.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

l0st:~$ gpg --card-status
Reader ...........: Yubico Yubikey 4 OTP U2F CCID
Application ID ...: D2760001240102010006064714440000
Version ..........: 2.1
Manufacturer .....: Yubico
Serial number ....: 06471444
Name of cardholder: Michael Cadiz
Language prefs ...: en
Sex ..............: unspecified
URL of public key : [not set]
Login data .......: mike@cadizm.com
Signature PIN ....: not forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 0 3
Signature counter : 0
Signature key ....: 937F C389 F454 379B 0AFA  91B7 FFFC 7132 9711 5C2D
      created ....: 2017-09-30 00:56:00
Encryption key....: AE92 FBF9 2649 FA40 6B8E  7E0F 601C 0549 9658 64FF
      created ....: 2017-09-30 00:56:43
Authentication key: 8CDF 1B3D 174B 9516 AFFC  127A 6E4E 403F 1B48 9938
      created ....: 2017-09-30 00:57:04
General key info..: sub  rsa2048/0xFFFC713297115C2D 2017-09-30 Michael Cadiz <mike@cadizm.com>
sec#  rsa2048/0x210583A0ECB0AFA4  created: 2017-09-30  expires: never
ssb>  rsa2048/0xFFFC713297115C2D  created: 2017-09-30  expires: never
                                  card-no: 0006 06471444
ssb>  rsa2048/0x601C0549965864FF  created: 2017-09-30  expires: never
                                  card-no: 0006 06471444
ssb>  rsa2048/0x6E4E403F1B489938  created: 2017-09-30  expires: never
                                  card-no: 0006 06471444

l0st:~$ gpg --list-keys
/Users/mcadiz/.gnupg/pubring.kbx
--------------------------------
pub   rsa2048/0x210583A0ECB0AFA4 2017-09-30 [SC]
      Key fingerprint = 204E 949F 17A5 B8EF E8DB  0571 2105 83A0 ECB0 AFA4
uid                   [ unknown] Michael Cadiz <mike@cadizm.com>
sub   rsa2048/0xFFFC713297115C2D 2017-09-30 [S]
sub   rsa2048/0x601C0549965864FF 2017-09-30 [E]
sub   rsa2048/0x6E4E403F1B489938 2017-09-30 [A]

l0st:~$ gpg --edit-key 0x210583A0ECB0AFA4
gpg (GnuPG) 2.2.1; Copyright (C) 2017 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.

pub  rsa2048/0x210583A0ECB0AFA4
     created: 2017-09-30  expires: never       usage: SC
     trust: unknown       validity: unknown
ssb  rsa2048/0xFFFC713297115C2D
     created: 2017-09-30  expires: never       usage: S
     card-no: 0006 06471444
ssb  rsa2048/0x601C0549965864FF
     created: 2017-09-30  expires: never       usage: E
     card-no: 0006 06471444
ssb  rsa2048/0x6E4E403F1B489938
     created: 2017-09-30  expires: never       usage: A
     card-no: 0006 06471444
[ unknown] (1). Michael Cadiz <mike@cadizm.com>

gpg> trust
pub  rsa2048/0x210583A0ECB0AFA4
     created: 2017-09-30  expires: never       usage: SC
     trust: unknown       validity: unknown
ssb  rsa2048/0xFFFC713297115C2D
     created: 2017-09-30  expires: never       usage: S
     card-no: 0006 06471444
ssb  rsa2048/0x601C0549965864FF
     created: 2017-09-30  expires: never       usage: E
     card-no: 0006 06471444
ssb  rsa2048/0x6E4E403F1B489938
     created: 2017-09-30  expires: never       usage: A
     card-no: 0006 06471444
[ unknown] (1). Michael Cadiz <mike@cadizm.com>

Please decide how far you trust this user to correctly verify other users' keys
(by looking at passports, checking fingerprints from different sources, etc.)

  1 = I don't know or won't say
  2 = I do NOT trust
  3 = I trust marginally
  4 = I trust fully
  5 = I trust ultimately
  m = back to the main menu

Your decision? 5
Do you really want to set this key to ultimate trust? (y/N) y

pub  rsa2048/0x210583A0ECB0AFA4
     created: 2017-09-30  expires: never       usage: SC
     trust: ultimate      validity: unknown
ssb  rsa2048/0xFFFC713297115C2D
     created: 2017-09-30  expires: never       usage: S
     card-no: 0006 06471444
ssb  rsa2048/0x601C0549965864FF
     created: 2017-09-30  expires: never       usage: E
     card-no: 0006 06471444
ssb  rsa2048/0x6E4E403F1B489938
     created: 2017-09-30  expires: never       usage: A
     card-no: 0006 06471444
[ unknown] (1). Michael Cadiz <mike@cadizm.com>
Please note that the shown key validity is not necessarily correct
unless you restart the program.

gpg> quit

l0st:~$ gpg --list-keys
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
/Users/mcadiz/.gnupg/pubring.kbx
--------------------------------
pub   rsa2048/0x210583A0ECB0AFA4 2017-09-30 [SC]
      Key fingerprint = 204E 949F 17A5 B8EF E8DB  0571 2105 83A0 ECB0 AFA4
uid                   [ultimate] Michael Cadiz <mike@cadizm.com>
sub   rsa2048/0xFFFC713297115C2D 2017-09-30 [S]
sub   rsa2048/0x601C0549965864FF 2017-09-30 [E]
sub   rsa2048/0x6E4E403F1B489938 2017-09-30 [A]

l0st:~$ ssh-add -L
The agent has no identities.
l0st:~$ pkill ssh-agent && pkill gpg-agent && eval $(gpg-agent --daemon --enable-ssh-support  --log-file ~/.gnupg/gpg-agent.log)
l0st:~$ ssh-add -L
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5xxrkhC2MnXO2HJzJL7kFPhE71pejZMNZ8SwYDMlQoRCnuvNZ9eDy+ped+nCO+SEyNcMlike6clG6/iepfXNteqpaM4mmH9UyKmbdVLr3Vks1dMUI9TL9DHzIe7KvogmHdV+Fg30m13pKbknFES7HGBjAV6U8EXteT4v17dSZ/P2B3l4EdmTAtgWfsQnSEecd5SKVWpdOovL/h67W4zrgTKcQZB5h3sW7TyPSSgEF30Bdt8gv8/nZJtv2husSZaIRzm7V4Y3ikP1Lh3QdSQenTAjcWRRh/X+9ueXyc5zqGQu/i+PPriZIdIqN2T+ql5FXTdni5W4Se0Fhm7SvHVwd cardno:000606471444

In the event your YubiKey (and backup YubiKey) are lost, to re-import:

$ cd $(mktemp -d)
$ chmod 700 .
$ export GNUPGHOME=`pwd`
$ export GPG_TTY=$(tty)
$ wget https://cadizm.com/files/cadizm-pgp-public.key.txt
$ wget https://raw.githubusercontent.com/cadizm/dotfiles/master/gnupg/gpg.conf
$ wget https://raw.githubusercontent.com/cadizm/dotfiles/master/gnupg/gpg-agent.conf
$ pkill gpg-agent
$ gpg --list-keys
$ gpg --import < cadizm-pgp-public.key.txt
# save master and sub-keys to master-sub.key
$ gpg --import < master-sub.key
$ gpg --list-keys

You should then revoke and regenerate your keys.

Send More Money

How can we get MONEY?

  SEND
+ MORE
------
 MONEY

When in doubt, use brute force:

from itertools import permutations

corpus = list(set('SENDMOREMONEY'))

def f(D):
    def g(word):
        return int(''.join(([str(D[c]) for c in word])))
    return g('SEND') + g('MORE') == g('MONEY')


def pretty_print(D):
    print """
  SEND    {S}{E}{N}{D}
+ MORE  + {M}{O}{R}{E}
------  ------
 MONEY   {M}{O}{N}{E}{Y}
""".format(**D)


for perm in permutations(xrange(0, 10), len(corpus)):
    if f(dict(zip(corpus, perm))):
        pretty_print(dict(zip(corpus, perm)))

Map lookup time

Very surprised with the benchmark results of the 2 following parity implementations:

func Uint64(n uint64) int {
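	// lookup (defined in the source linked below) returns the parity
	// of one 16-bit word via a precomputed map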
	parity := 0
	for i := uint16(0); i < 4; i++ {
		word := uint16(n >> (16 * i))
		parity ^= lookup(word)
	}

	return parity
}

func parity64(n uint64) int {
	var parity uint64 = 0
	for n > 0 {
		parity ^= 1
		n &= (n - 1) // clear lowest set bit trick
	}
	return int(parity)
}

I would have expected Uint64 to outperform parity64 since it’s “just” a whole bunch of map lookups and xors. But after initial benching and digging in to see where time was spent, apparently map lookups are not really constant time in practice.

$ go test -bench .
BenchmarkParity64-8             2000000000               0.04 ns/op
BenchmarkUint64NoSetup-8        2000000000               0.17 ns/op
BenchmarkUint64WithSetup-8      2000000000               0.17 ns/op
PASS
ok      github.com/cadizm/epi/5.1-compute-parity        10.746s

For reference, here’s a link to the source.

jq to the rescue

If you’ve ever pulled your hair out attempting to parse JSON using awk/grep/sed, do your hairline a favor and install jq.

The docs are very good, but to give a non-trivial concrete example, here’s processing JSON output from a Jenkins API:

function poll_jenkins {
    url="$JENKINS/job/foo/api/json"
    count=0
    while [ $count -lt $WAIT_COUNT ]; do
        let count=count+1
        echo -n "."
        sleep 1
        response=$(curl -q -S -X GET $url 2>/dev/null)
        if [ $(echo $response | jq ".inQueue") = "true" ]; then
            continue
        fi
        last_successful_build=$(echo $response | jq ".lastSuccessfulBuild.number")
        last_build=$(echo $response | jq ".lastBuild.number")
        if [ "$last_successful_build" = "$last_build" ]; then
            LAST_BUILD=$last_build
            return
        fi
    done
}

Retrieving the offer_id from a list of offers:

$ cat offers-2017-07-31.json | jq '[.offers[].offer_id]' | less

Circular Right Rotate

def rotate_right(L, k):
    """
    Circular right rotation of L, k times
    """
    M = L[:]
    for i, _ in enumerate(L):
        M[(i + k) % len(L)] = L[i]
    return M
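
A couple of example invocations:

assert rotate_right([1, 2, 3, 4], 1) == [4, 1, 2, 3]
assert rotate_right([1, 2, 3, 4], 2) == [3, 4, 1, 2]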

RSpec gotcha (err me)

Spent a good half hour trying to figure out why the following spec was failing:

describe '#create' do
  it 'receives handle_event callback' do
    params[:foo] = 'bar'
    post :create, params: params
    expect(Foo::Bar).to receive(:new)
  end
end

After coming back to it the next day with a clearer head (and more patient googling), I realized that the expectation is that the message will be received in the future. Doh!

So this should be:

describe '#create' do
  it 'receives handle_event callback' do
    expect(Foo::Bar).to receive(:new)
    params[:foo] = 'bar'
    post :create, params: params
  end
end

mount bigger /dev/shm

While working on a project that fires up a whole bunch of headless Google Chrome tabs using Selenium Python, I noticed that my program would crash all the time due to a “tab crash” error.

At first I thought it was just a difference between Ubuntu and OS X since it seemed to work fine on my laptop. But after a little googling I came across a docker thread which seemed to be pretty similar to what I was experiencing.

Apparently the default mount size for /dev/shm may not be large enough if Chrome has a lot of tabs open (you can check how many using $ lsof /dev/shm). To increase the size:

$ sudo umount /dev/shm
$ sudo mount -t tmpfs -o rw,nosuid,nodev,noexec,relatime,size=512M tmpfs /dev/shm

Symmetric Key Encryption with XOR Reciprocal Cipher

For very basic (and weak) encryption (of non-sensitive data) using a shared secret, one might want to consider using an XOR Cipher due to ease of implementation.

Here are 3 implementations I needed for distribution to different code bases:

# cipher.py

def xor(s, t):
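    # note: zip truncates to the shorter of s and t, so the key must be
    # at least as long as the padded text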
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(s, t))

def pad(s, char=' ', padlen=16):
    n = len(s)
    if n < padlen:
        s += char * (padlen - n)
    return s

def encrypt(key, text):
    from binascii import hexlify
    return hexlify(xor(key, pad(text)))

def decrypt(key, cipher_text):
    from binascii import unhexlify
    return xor(key, unhexlify(cipher_text)).strip()

if __name__ == '__main__':
    key = 'I|tF&T=RMYmtU|80~N`"16v~&V>D"J|['
    msg = '101237'

    cipher_text = encrypt(key, msg)
    decrypted_text = decrypt(key, cipher_text)

    assert decrypted_text == msg
    print cipher_text, decrypted_text
# cipher.rb

def xor(s, t)
  s, t = t, s if s.size > t.size
  s.chars.zip(t.chars).map { |a, b| (a.ord ^ b.ord).chr }.join
end

def pad(s, char=' ', padlen=16)
  n = s.size
  s = n < padlen ? s + char * (padlen - n) : s
end

def encrypt(key, text)
  xor(key, pad(text)).unpack("H*").join
end

def decrypt(key, cipher_text)
  xor(key, [cipher_text].pack("H*")).strip
end

if $0 == __FILE__
  key = 'I|tF&T=RMYmtU|80~N`"16v~&V>D"J|['
  msg = '101237'

  cipher_text = encrypt(key, msg)
  decrypted_text = decrypt(key, cipher_text)

  raise if decrypted_text != msg
  puts "#{cipher_text} #{decrypted_text}"
end
// Cipher.java

class Cipher {
    private String key;

    public Cipher(String key) {
        this.key = key;
    }

    public String encrypt(String text) {
        return hexlify(xor(this.key, pad(text, ' ', 16)));
    }

    public String decrypt(String cipherText) {
        return xor(this.key, unhexlify(cipherText)).trim();
    }

    private static String xor(String s, String t) {
        int n = s.length() < t.length() ? s.length() : t.length();

        StringBuffer sb = new StringBuffer();

        for (int i = 0; i < n; ++i) {
            int a = (int)s.charAt(i);
            int b = (int)t.charAt(i);

            sb.append((char)(a ^ b));
        }

        return sb.toString();
    }

    private static String pad(String s, char c, int padlen) {
        int n = s.length();

        if (n < padlen) {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < padlen - n; ++i) {
                sb.append(c);
            }
            s += sb.toString();
        }

        return s;
    }

    private static String hexlify(String s) {
        String hex = "";
        for (byte b : s.getBytes()) {
            hex += String.format("%02x", b);
        }
        return hex;
    }

    private static String unhexlify(String s) {
        StringBuffer sb = new StringBuffer();

        for (int i = 0; i < s.length(); i += 2) {
            String hex = s.substring(i, i + 2);
            sb.append((char)Integer.parseInt(hex, 16));
        }

        return sb.toString();
    }

    public static void main(String[] args)
        throws Exception
    {
        Cipher cipher = new Cipher("I|tF&T=RMYmtU|80~N`\"16v~&V>D\"J|[");
        String msg = "101237";

        String cipherText = cipher.encrypt(msg);
        String decryptedText = cipher.decrypt(cipherText);

        if (!decryptedText.equals(msg)) {
            throw new Exception();
        }

        System.out.printf("%s %s\n", cipherText, decryptedText);
    }
}

Euclid's algorithm

Since I always forget

// euclid.go

func gcd(a, b int) int {
	if b == 0 {
		return a
	}
	return gcd(b, a%b)
}

func lcm(a, b int) int {
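	// note: a/gcd(a, b)*b would avoid overflow in the intermediate product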
	return (a * b) / gcd(a, b)
}

getting ssl certs

I’ve known about

$ openssl s_client -connect github.com:443 -showcerts > github.com.crt

for a while, but was always annoyed about having to wait for the connection to time out and close. Found a cool workaround here:

$ echo | openssl s_client -connect github.com:443 2>&1 | sed --quiet '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > github.com.crt

Note: the long form --quiet option implies GNU sed; if you’re on OS X, you can

$ brew install --with-default-names gnu-sed

sqlite dump to csv

Since I always forget…

$ sqlite3 postal_codes.db
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .mode csv
sqlite> .output foo.csv
sqlite> select * from postal_codes;
sqlite> ^D

Hash comprehension?

The closest I’ve found to a ruby equivalent for python’s dict comprehension.

# python

from pprint import pprint
import string

pprint({i: string.lowercase[i] for i in range(len(string.lowercase))})
# ruby

require 'pp'

lowercase = ('a'..'z').map { |v| v }
pp Hash[lowercase.each_with_index.map { |v, i|  [i, v] }]

custom pry commands

In a previous post I described how I added some aliases to my .pdbrc to dump local vars to a file.

Now that I’m diving deeper into the ecosystem, I wanted to find out how to do the equivalent in ruby and rails. After some quick googling, I came up with the following roughly equivalent snippet (which analogously can go into .pryrc):

Pry::Commands.block_command "dl", "Dump locals to log" do
  vars = target.local_variables.select { |v| !v.to_s.starts_with?('_') }
  foo = Hash[vars.map { |k| [k, target.local_variable_get(k)] }].to_json
  File.open('/tmp/foo.json', 'w') { |f| f.write(foo) }
  output.puts "Dumped locals to \"/tmp/foo.json\""
end

For reference: git commit

ImageMagick identify

To get the “format and characteristics” of one or more images, ImageMagick to the rescue:

$ find . -type f -iname '*.jpg' | xargs identify

date manip

A couple of commands that might be useful for date manipulation:

Change filename’s modification date to today at 6 am

$ touch -d "`date "+%D"` 06:00:00" filename

If you install faketime,

Run script as if it were 10 am now

$ faketime "`date "+%D"` 10:00:00" script

Docker snippets

Notes on using Docker Toolbox (not Docker for Mac, since I want to provision Docker hosts on remote machines).

# create a new machine
$ docker-machine create --driver virtualbox shiny-new-machine
$ docker-machine ls
$ eval $(docker-machine env shiny-new-machine)
$ docker-machine ip shiny-new-machine

# run a container
$ docker run -p  8001:80 nginx
$ docker ps
$ curl $(docker-machine ip shiny-new-machine)

# start up a bunch of containers
$ docker-compose -f dev.yml up

# run command in existing container
$ docker exec foo_container python manage.py migrate

elasticsearch bulk api

Snippets for creating an elasticsearch index using its bulk api (and python wrapper).

def bulk_index(infile_csv, index_name):
    from elasticsearch.helpers import bulk

    _create_index(index_name)

    print 'Bulk indexing %s...' % infile_csv,
    sys.stdout.flush()
    bulk(client=_es, actions=_actions(infile_csv, index_name))
    print 'Done'
    pprint(_es.count(index=index_name))


def _create_index(name):
    global _es

    body = {
        'mappings': {
            'teaser': {
                'properties': {
                    'foo': { 'type': 'integer' },
                    'bar': { 'type': 'float' },
                    # defaults to string if not specified
                    'baz': { 'type': 'string' },
                }
            }
        }
    }

    if not _es.indices.exists(name):
        print "Creating index: %s" % name
        _es.indices.create(name, body=body)
    else:
        print "Index: %s already exists" % name


def _actions(infile_csv, index_name):
    """
    Return generator of actions for elasticsearch bulk indexing
    """
    with open(infile_csv) as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            row.update(
                _index=index_name,
                _type='teaser',
                _op_type='index',
                )
            mapping = {
                'foo': int,
                'bar': float,
            }
            yield {k: mapping[k](v) if k in mapping else v for k,v in row.items()}

Fibonacci Fun Time

I signed up for Coursera’s Algorithmic Toolbox for fun and to dust off the cobwebs on some CS FUNdamentals.

The first programming assignment involved a number of different variations on calculating Fibonacci numbers. And although they provided a more efficient solution than the naive recursive implementation, I wanted to do some googling and see if there were any other, faster implementations.

I came upon Nayuki’s fast doubling implementation and wanted to do a quick and dirty benchmark against more common algorithms. Here it is!

## Decorator functions

from functools import wraps

def memoize(func):
    table = {}

    @wraps(func)
    def wrapper(*args):
        if args not in table:
            table[args] = func(*args)

        return table[args]

    return wrapper


def callrange(func):
    @wraps(func)
    def wrapper(n):
        for i in range(n):
            func(i)

        return func(n)

    return wrapper

## Fibonacci functions

def recursive_fibonacci(n):
    if n in (0, 1):
        return n

    return recursive_fibonacci(n - 1) + recursive_fibonacci(n - 2)


@callrange
@memoize
def memoized_fibonacci(n):
    if n in (0, 1):
        return n

    return memoized_fibonacci(n - 1) + memoized_fibonacci(n - 2)


def list_fibonacci(n):
    L = [i for i in range(n + 1)]

    for i in range(2, len(L)):
        L[i] = L[i - 1] + L[i - 2]

    return L[n]


def sum_fibonacci(n):
    if n in (0, 1):
        return n

    i = 2
    j, k = 0, 1

    while i < n + 1:
        j, k = k, j + k
        i += 1

    return k


def fast_doubling_fibonacci(n):
    """
    Fast doubling Fibonacci algorithm

    Copyright (c) 2015 Project Nayuki
    All rights reserved. Contact Nayuki for licensing.
    https://www.nayuki.io/page/fast-fibonacci-algorithms
    """
    def _fib(n):
        if n == 0:
            return (0, 1)
        else:
            a, b = _fib(n // 2)
            c = a * (b * 2 - a)
            d = a * a + b * b
            if n % 2 == 0:
                return (c, d)
            else:
                return (d, c + d)

    return _fib(n)[0]


def test(f):
    solutions = [int(n) for n in open('fibonacci.txt')]

    for i in range(len(solutions)):
        assert f(i) == solutions[i]


def time(f, n):
    import time

    start = time.time()
    f(n)
    end = time.time()

    print "%24s: %12f" % (f.func_name, end - start)  # 1 sec lsb


if __name__ == '__main__':
    F = [
        memoized_fibonacci,
        list_fibonacci,
        sum_fibonacci,
        fast_doubling_fibonacci,
    ]

    for f in F:
        test(f)
        time(f, 10000)
$ python fibonacci.py
      memoized_fibonacci:    22.862408
          list_fibonacci:     0.005365
           sum_fibonacci:     0.002401
 fast_doubling_fibonacci:     0.000047

require 'pry'; binding.pry

Coming from the python world, I’m used to stepping into code by throwing import ipdb; ipdb.set_trace() in my source when I want to debug something.

Note to self: the pry gem seems to be similar in ruby where you can do

require 'pry'; binding.pry

ffmpeg time lapse

Note to self: after some googling and playing around with options, use the following to create time lapses with ffmpeg:

$ ffmpeg -framerate 30 -i '%*.JPG' -q:v 2 -r 30 -pix_fmt yuv420p timelapse.mp4

rlwrap ftw

rlwrap is super nifty. I recently found a need for it when playing around in the ocaml repl:

l00k:~$ ocaml
        OCaml version 4.02.3

# 1 + 2 * 3^A^E^CInterrupted.
# ^D
l00k:~$ rlwrap ocaml
        OCaml version 4.02.3

# 1 + 2 * 3
  ;;
- : int = 7
# ^D
l00k:~$

NGINX proxy_pass to GitHub Pages

I have a digital ocean droplet that I use for my personal website and also as a playground for side projects.

It’s been working out well, but after recently migrating this blog to netlify, I decided to host the landing page for my site on GitHub Pages since its content is extremely simple (and static sites are the new hotness).

While GitHub Pages already provides a subdomain (mine is cadizm.github.io) I wanted to preserve my sweet domain name.

I’m running NGINX and decided to proxy to GitHub Pages since it seems to do pretty much what I want to do. One small caveat is that assets need to be qualified using a full url instead of being specified relatively.

Just in case anyone is trying to do the same, here’s a snippet of the relevant NGINX config:

    location / {
        proxy_pass https://cadizm.github.io;
        proxy_set_header Host cadizm.github.io;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

git gui blame

Though I’ve been using gitk for years, I’ve never really found a use for git gui.

One cool git gui command I’ve recently found useful is git gui blame. With it, when you right-click a line you get a few options, two of which are particularly handy: “Show History Context” and “Blame Parent Commit”.

Blame Parent Commit is especially practical in that you can recursively blame each successive revision.

xargs with files with spaces

Oldie but goodie whose manpage I had to review because I forgot the flag:

$ find . -type f -iname '*.foo' -print0 | xargs -0 rm -f

screen in screen in screen...

Recently ran into an issue where I was running a screen session inside an existing screen session.

My usual workflow is to start a new screen session each day using a custom screen startup script (since I have some env vars I’d like to renew each day).

I had to then ssh into a remote machine and run a daemonized job in screen that wrote some output I needed to revisit.

The problem arose in that when I tried to detach from the inner screen, I ended up detaching from the original screen I had started on my local machine. After some quick googling, I found this.

Since the inner screen was started on a remote machine that didn’t have my .screenrc, the global escape option was still the default ctrl-a, and I didn’t need to go through the usual hoops when detaching from the inner screen.

Reference for myself:

local$ screen-start  # start screen session
local$ screen -x  # attach to screen
local$ ssh remote  # in orig screen
remote$ screen -dmS 'foo.sh' foo.sh  # start/attach new daemonized screen
remote$ ^ad  # detach from inner screen
remote$ ^d  # logout from remote machine
local$  # back to local machine in orig screen

Can also do:

$ screen -d -D

Debugging Octave/Matlab

Small note to self: favor keyboard over dbstop in Octave (not sure about Matlab).

During cursory exploration, locals don’t seem to be in scope using dbstop.

pdb aliases are your friend

Pretty much every time I drop into ipdb, I end up inspecting locals to see what’s going on. Very often I assign what I’m inspecting to a local var foo and then either manipulate it and/or write its contents to disk.

It gets annoying to keep writing from pprint import..., so I’ve found these .pdbrc aliases really helpful.

from pprint import pformat, pprint
_L = locals()
alias pl pprint(_L);
alias dl with open('/tmp/locals.txt', 'w') as f: f.write(pformat(_L))
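
Aliases take positional parameters too (%1, %2, and so on), so the dump can be parameterized; an untested sketch in the same spirit, where dv is a made-up name:

alias dv with open('/tmp/%1.txt', 'w') as f: f.write(pformat(%2))

Then dv foo some_expr should write pformat(some_expr) to /tmp/foo.txt.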

moreutils::sponge

Just discovered moreutils after searching for “cat file sort and write to same file”.

StackExchange told me that sort actually has an option to pseudo-edit files “in-place” via its -o, --output options. But one of the more interesting (pun intended) google results described sponge, found in moreutils.

Where you can do nifty things like the following (the naive sort file > file would truncate the file before sort ever reads it, which is exactly what sponge prevents):

$ sort conquest_eligible.txt | sponge conquest_eligible.txt

nvm error: Checksums do not match

If you get a Checksums do not match error while trying to do an nvm install on OS X, do a brew install md5sha1sum.

nvm.sh has the following conditionals in nvm_checksum, and sha1 on OS X doesn’t accept the -q option.

nvm_checksum() {
  local NVM_CHECKSUM
  if nvm_has "sha1sum" && ! nvm_is_alias "sha1sum"; then
    NVM_CHECKSUM="$(command sha1sum "$1" | command awk '{print $1}')"
  elif nvm_has "sha1" && ! nvm_is_alias "sha1"; then
    NVM_CHECKSUM="$(command sha1 -q "$1")"
  elif nvm_has "shasum" && ! nvm_is_alias "shasum"; then
    NVM_CHECKSUM="$(shasum "$1" | command awk '{print $1}')"
  else
    echo "Unaliased sha1sum, sha1, or shasum not found." >&2
    return 2
  fi

  if [ "_$NVM_CHECKSUM" = "_$2" ]; then
    return
  elif [ -z "$2" ]; then
    echo 'Checksums empty' #missing in raspberry pi binary
    return
  else
    echo 'Checksums do not match.' >&2
    return 1
  fi
}

GPG snippets

Notes to self re: GPG

If brew installs GPG to gpg2, consider symlinking to gpg.

Generate a key

$ gpg --gen-key

Sign and encrypt

$ gpg --armor --sign --encrypt --recipient='foo@bar.com' --output=baz.gpg baz

Decrypt

$ gpg --decrypt baz.gpg

List/export keys

$ gpg --list-keys
$ gpg --armor --output my-public.key --export 'foo@bar.com'
$ gpg --armor --output my-private.key --export-secret-key 'foo@bar.com'

No more sslwrap

If you find that your version of Python no longer has sslwrap and a 3rd party library requires it (e.g. gevent), here’s a post describing a workaround which worked for me.

It basically boils down to defining and setting the following:

import inspect
__ssl__ = __import__('ssl')

try:
    _ssl = __ssl__._ssl
except AttributeError:
    _ssl = __ssl__._ssl2


def new_sslwrap(sock, server_side=False, keyfile=None, certfile=None, cert_reqs=__ssl__.CERT_NONE, ssl_version=__ssl__.PROTOCOL_SSLv23, ca_certs=None, ciphers=None):
    context = __ssl__.SSLContext(ssl_version)
    context.verify_mode = cert_reqs or __ssl__.CERT_NONE
    if ca_certs:
        context.load_verify_locations(ca_certs)
    if certfile:
        context.load_cert_chain(certfile, keyfile)
    if ciphers:
        context.set_ciphers(ciphers)

    caller_self = inspect.currentframe().f_back.f_locals['self']
    return context._wrap_socket(sock, server_side=server_side, ssl_sock=caller_self)

if not hasattr(_ssl, 'sslwrap'):
    _ssl.sslwrap = new_sslwrap

Git flow bullets/checklist

We use Git flow at work. While the branching strategy makes a lot of sense and is easy to follow, it can be easy to forget some steps when executing branch tasks.

Here’s sort of a bulleted checklist I use to make sure for the most part I’m adhering to the spirit of the model. It’s derived mostly from an Atlassian post describing the workflow.

Git flow branches

master

  • master stores official release history
  • tag each commit on master with version/release number
  • hotfix branches cut from master

develop

  • develop used to integrate feature branches
  • feature branches branch from develop
  • feature branches merged back into develop

release branches

  • release/xxx branches branch from develop
  • only release-related tasks (bug fixes, documentation) allowed into release branch after it is cut
  • once release branch is ready, it is merged into master (which is then tagged with version/release number)
  • release branch also merged back into develop

hotfix branches

  • only hotfix/xxx branches may branch from master
  • once hotfix complete, merge branch into both master and develop
  • master once again tagged (since each commit to master is tagged)
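
For instance, cutting and finishing a release with plain git might look like the following (a sketch; the version number is made up, and hotfixes are analogous but cut from master):

$ git checkout -b release/1.2.0 develop
# ... only release-related fixes land here ...
$ git checkout master
$ git merge release/1.2.0
$ git tag 1.2.0
$ git checkout develop
$ git merge release/1.2.0
$ git branch -d release/1.2.0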

Markdownify'ing webpages

I needed to convert some webpages to a human-readable format for studying/review at a later time. After some thought I came up with the following:

  1. download pages via curl/wget or good old “save page as”
  2. slugify filenames for easier shell manipulation
  3. convert to markdown using Aaron Swartz’s html2text
  4. rename file extensions to reflect new format
# delete directories (handling spaces in filenames)
$ find . -type d -print0 | xargs -0 rm -rf

# slugify, rename, then delete html files
# (assumes slugify renames the files passed to it)
$ find . -type f -print0 | xargs -0 slugify && \
for file in $(ls); do
    html2text "$file" > "${file%.html}.md";
done && \
rm *.html
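
html2text also works as a library, so steps 3 and 4 can be collapsed into a few lines of Python (a sketch, leaving the slugify step aside):

import glob

import html2text  # Aaron Swartz's html2text, imported rather than shelled out to

for path in glob.glob('*.html'):
    with open(path) as f:
        markdown = html2text.html2text(f.read())
    with open(path[:-len('.html')] + '.md', 'w') as f:
        f.write(markdown)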

brute force arithmetic

Came across this puzzle today. Instead of trying to do some fancy multivariate analysis, here’s a brute force solution.

"""
Solve the `nam puzzle' using brute force.

See http://goo.gl/8b0ri6 for full problem description.
"""


def main():
    from itertools import permutations
    for p in permutations(range(1, 10)):
        a, b, c, d, e, f, g, h, i = p
        if a + ((13 * b) / c) + d + (12 * e) \
                - f - 11 + ((g * h) / i) - 10 == 66:
            print ':) => {}'.format(p)
            break
    else:
        print ':('


if __name__ == '__main__':
    import sys
    sys.exit(main())
l00k:~/ws/20150522$ python nam_puzzle.py
:) => (1, 2, 3, 4, 6, 5, 7, 9, 8)
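
One caveat worth flagging: on Python 2, / between ints is floor division, so the check above only needs to hold under truncating division (the tuple printed doesn’t quite satisfy the equation with exact arithmetic). Requiring exact rational arithmetic is a small change using fractions.Fraction; a sketch:

from fractions import Fraction
from itertools import permutations

# Same brute force search, but with exact rational arithmetic
for p in permutations(range(1, 10)):
    a, b, c, d, e, f, g, h, i = map(Fraction, p)
    if a + (13 * b) / c + d + (12 * e) - f - 11 + ((g * h) / i) - 10 == 66:
        print(':) => {}'.format(p))
        break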

host dns lookup

Why have I not been using this instead of dig?

l00k:~/workspace/src/blog.mcadiz.com (master) $ host google.com
google.com has address 216.58.216.14
google.com has IPv6 address 2607:f8b0:4007:809::200e
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 50 alt4.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.

l00k:~/workspace/src/blog.mcadiz.com (master) $ host mcadiz.com
mcadiz.com has address 192.241.207.230
mcadiz.com mail is handled by 20 mx2.zohomail.com.
mcadiz.com mail is handled by 10 mx.zohomail.com.

Though with the verbose flag turned on, it’s pretty much dig:

l00k:~/workspace/src/blog.mcadiz.com (master) $ host -d mcadiz.com
Trying "mcadiz.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25273
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mcadiz.com.                    IN      A

;; ANSWER SECTION:
mcadiz.com.             1550    IN      A       192.241.207.230

Received 44 bytes from 10.18.6.11#53 in 24 ms
Trying "mcadiz.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64794
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;mcadiz.com.                    IN      AAAA

;; AUTHORITY SECTION:
mcadiz.com.             1543    IN      SOA     ns1.digitalocean.com. hostmaster.mcadiz.com. 1427872864 10800 3600 604800 1800

Received 92 bytes from 10.18.6.11#53 in 1 ms
Trying "mcadiz.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24704
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mcadiz.com.                    IN      MX

;; ANSWER SECTION:
mcadiz.com.             1551    IN      MX      20 mx2.zohomail.com.
mcadiz.com.             1551    IN      MX      10 mx.zohomail.com.

Received 76 bytes from 10.18.6.11#53 in 1 ms

E141: No file name for buffer

If you encounter the vim error message E141: No file name for buffer and you’re not aware of the different buftype’s, you can learn more here.

But if you don’t care and just want this error to go away, SO to the rescue:

:setlocal buftype=nofile
:setl bt=nofile

netlify'ing :)

I came across Netlify today on Hacker News and am trying it out. This is the first trial post set up to use their continuous deployment feature. I’ll do a more thorough follow-up post once I have some time to read through their docs a bit more.

Instagram python api Enforce Signed Header

If you have selected Enforce signed header for your Instagram client, you need to make sure to set the X-Insta-Forwarded-For header as described in the developer docs.

I’m using the python-instagram library provided by Instagram itself and couldn’t figure out how to set this header. If you know how to do this, please let me know!

The diff below augments python-instagram’s OAuth2Request class, computing and setting the X-Insta-Forwarded-For header on every request. Note that the value is the client IP address concatenated (via a pipe) with an HMAC of that IP address, calculated using the SHA256 hash function and the client secret as the key.

l0st:~/workspace/src/python-instagram.git (master) $ git df
diff --git a/instagram/oauth2.py b/instagram/oauth2.py
index 56dbd23..7ab0cbd 100644
--- a/instagram/oauth2.py
+++ b/instagram/oauth2.py
@@ -1,3 +1,6 @@
+import hashlib, hmac
+import socket
+
 from .json_import import simplejson
 from six.moves.urllib.parse import urlencode
 from httplib2 import Http
@@ -209,6 +212,14 @@ class OAuth2Request(object):
         headers = headers or {}
         if not 'User-Agent' in headers:
             headers.update({"User-Agent": "%s Python Client" % self.api.api_name})
+        if not 'X-Insta-Forwarded-For' in headers:
+            key = 'put-client-secret-key-here'
+            h = hmac.new(key, digestmod=hashlib.sha256)
+            ipaddr = socket.gethostbyname(socket.gethostname())
+            h.update(ipaddr)
+            v = '{}|{}'.format(ipaddr, h.hexdigest())
+            print v
+            headers.update({"X-Insta-Forwarded-For": v})
         # https://github.com/jcgregorio/httplib2/issues/173
         # bug in httplib2 w/ Python 3 and disable_ssl_certificate_validation=True
         http_obj = Http() if six.PY3 else Http(disable_ssl_certificate_validation=True)
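
Standalone, the header computation amounts to the following (a sketch mirroring the diff; the secret is of course a placeholder):

import hashlib
import hmac
import socket

def insta_forwarded_for(client_secret, ipaddr):
    # Pipe-join the client IP with an HMAC-SHA256 of it, keyed by the client secret
    digest = hmac.new(client_secret.encode(), ipaddr.encode(), hashlib.sha256).hexdigest()
    return '{}|{}'.format(ipaddr, digest)

print(insta_forwarded_for('put-client-secret-key-here', socket.gethostbyname(socket.gethostname())))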

CPython vs C Performance

I’ve been doing some HackerRank challenges for fun lately. Their platform has support for pretty much every language under the sun, so users are free to use what they’re most comfortable with.

Python’s been my language of choice for the problem sets since they’re not too complicated (at least the warmups) and the solutions typically require only coding up a quick function or two.

Submissions have time/memory limitations based on language, which obviously makes sense since performance will vary depending upon language. But I never really gave language much weight and concentrated on making sure my algorithm had good to decent Big \(O\) run time complexity (where good to decent means polynomial time of \(O(n^2)\) or \(O(n^3)\) or less).

But after spending a bit of time on a particular problem whose straightforward solution had a cubic running time and consistently went over HackerRank’s 10 second time limit (for python programs), I decided to poke around the discussion boards and see if I could find out what I was doing wrong and how others were approaching the problem.

To my surprise one of the moderators commented that another poster’s C++ solution was `correct’ and simply required 3 nested loops over the input; pretty much exactly what I was doing, except in a different language.

I was a bit surprised, and so I translated my python solution to vanilla C and lo and behold run times went from ~16 sec to under 1 sec. I’ll now concede that language choice does in fact matter for `hot’ pieces of code or systems requiring realtime or near-realtime performance. In such cases it probably makes sense to optimize and/or rewrite critical pieces in a more close-to-the-metal language like assembly, C, or C++.

But obviously correctness trumps all and gathering data/metrics via profiling and instrumentation should be done first to see which parts may benefit most from an optimized rewrite.
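
In that spirit, Python’s built-in cProfile makes the measuring step cheap, either as python -m cProfile -s cumtime script.py or in-process (sketch below; main() is a stand-in for whatever entry point you want profiled):

import cProfile

# Sort by cumulative time so the hot nested loops surface first
cProfile.run('main()', sort='cumtime')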

For reference, the programs and their associated running times are below. And here is the problem description and one of the sample inputs.

$ time cat input/acm_icpc_team.txt | python acm_icpc_team-v1.py
467
1

real    0m16.264s
user    0m16.176s
sys     0m0.045s

$ time cat input/acm_icpc_team.txt | python acm_icpc_team-v2.py
467
1

real    0m16.954s
user    0m16.874s
sys     0m0.045s

$ time cat input/acm_icpc_team.txt | ./acm_icpc_team
467
1

real    0m0.465s
user    0m0.461s
sys     0m0.005s
# acm_icpc_team-v1.py

import sys

if __name__ == '__main__':
    line = sys.stdin.readline()
    N, M = [int(n) for n in line.split()]
    buf = [line.strip() for line in sys.stdin]

    max_ = 0
    count = 0

    for i in range(N-1):
        for j in range(i+1, N):
            bits = 0
            for k in range(M):
                if buf[i][k] == '1' or buf[j][k] == '1':
                    bits += 1
            if bits == max_:
                count += 1
            elif bits > max_:
                max_ = bits
                count = 1

    print "{0}\n{1}".format(max_, count)
# acm_icpc_team-v2.py

import sys

if __name__ == '__main__':
    line = sys.stdin.readline()
    N, M = [int(n) for n in line.split()]
    buf = [line.strip() for line in sys.stdin]

    max_ = 0
    count = 0

    for i in range(N-1):
        for j in range(i+1, N):
            v = int(buf[i], base=2) | int(buf[j], base=2)
            bits = 0
            while v:
                v &= (v-1)  # clear the least significant bit set
                bits += 1
            if bits == max_:
                count += 1
            elif bits > max_:
                max_ = bits
                count = 1

    print "{0}\n{1}".format(max_, count)
/* acm_icpc_team.c */

#include <stdio.h>

int main(int argc, char** argv)
{
    int N, M;
    scanf("%d %d\n", &N, &M);

    char buf[500][500];
    for (int i = 0; i < N; ++i) {
        scanf("%499s\n", buf[i]);  /* %499s leaves room for the NUL terminator */
    }

    int max = 0;
    int count = 0;

    for (int i = 0; i < N-1; ++i) {
        for (int j = i+1; j < N; ++j) {
            int bits = 0;
            for (int k = 0; k < M; ++k) {
                if (buf[i][k] == '1' || buf[j][k] == '1') {
                    bits += 1;
                }
            }
            if (bits == max) {
                count += 1;
            }
            else if (bits > max) {
                max = bits;
                count = 1;
            }
        }
    }

    printf("%d\n%d\n", max, count);

    return 0;
}

Nibble to binary string

I needed to demonstrate converting decimal nibbles to binary and used this:

for i in range(0, 16):
    binary = '{:04d}'.format(int(bin(i)[2:]))

In retrospect, I probably should have taken the time to write a function:

def binary(nibble):
    "Return 0-padded binary string of nibble"
    if nibble < 0 or nibble > 15:
        raise Exception('Invalid input: {0}'.format(nibble))
    bits = []
    for _ in range(4):
        bits.insert(0, str(nibble & 0x1))
        nibble >>= 1
    return ''.join(bits)
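
That said, the format spec can do the padding and the base conversion in one step, which makes both versions above overkill:

for i in range(16):
    binary = '{:04b}'.format(i)  # e.g. 5 -> '0101'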

Python Package to RPM

Sometimes you need to convert your Python package into an RPM (perhaps for political reasons). If you are using distutils/setuptools, then this is very easy after a bit of setup.

The example that follows assumes a CentOS 6 box, but should work on any Fedora-like distro.

Enable EPEL

$ wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
$ sudo rpm -ivh epel-release-6-8.noarch.rpm

Install dependencies

$ sudo yum groupinstall -y 'Development Tools'
$ sudo yum install -y fedora-packager git python-pip

Set up RPM build environment

$ sudo usermod -a -G mock <foo user>
$ rpmdev-setuptree

Create spec, build source tarball

$ python setup.py bdist_rpm --requires='mysql-connector-python,python-pip'

Modify spec as needed

As mentioned in the docs, “there are many options in .spec files that don’t have corresponding options in the setup script” so you may need to modify parts by hand. In my case, there’s currently no RPM for sqlsoup, so I need to install it during the post step.

# foo.spec
%prep
%setup -n %{name}-%{unmangled_version} -n %{name}-%{unmangled_version}

%build
python setup.py build

%post
pip install sqlsoup
Build RPM

$ rpmbuild -ba ./foo.spec

Install RPM

Hopefully the rpmbuild command completed successfully and all that remains is to install the RPM.

$ sudo rpm -ivh foo-0.1.0-1.noarch.rpm

ssh force IdentityFile

If you’re like me and you work with a lot of machines or frequently spin up and tear down vagrant boxes, then your ssh config file (e.g. $HOME/.ssh/config) is littered with entries like:

Host 192.168.1.*
  IdentitiesOnly yes
  IdentityFile ~/.ssh/id_rsa-foo

Host 10.10.10.*
  User baz
  IdentitiesOnly yes
  IdentityFile ~/.ssh/id_rsa.bar

When the machine you’d like to ssh into falls within one of the config ip ranges, but you’d like to override the rsa key used:

$ ssh -vvv -o PubkeyAuthentication=no -o IdentitiesOnly=yes -o IdentityFile=~/.ssh/id_rsa.private foo@127.0.0.1

Memoization tests (py edition)

Script execution and abbreviated source below. Full source can be found here.

Note the huge gain after memoization, and that hand-rolled versions still offer small gains over a decorated version.

$ python factorial.py

Running Factorial
Testing correctness...
Success
Timing...
0.240013837814
Success

Running RecursiveFactorial
Testing correctness...
RuntimeError('maximum recursion depth exceeded in cmp',) at iteration 998
Timing...
RuntimeError('maximum recursion depth exceeded in cmp',) iteration 994
RuntimeError('maximum recursion depth exceeded in cmp',) iteration 994
0.569229125977
RuntimeError('maximum recursion depth exceeded in cmp',) at iteration 998

Running MemoizedRecursiveFactorial
Testing correctness...
Success
Timing...
0.000745058059692
Success

Running IterativeFactorial
Testing correctness...
Success
Timing...
0.307121992111
Success

Running MemoizedIterativeFactorial
Testing correctness...
Success
Timing...
0.000705003738403
Success

Timing <function time_test_decorated_recursive_factorial at 0x1004a4c08>
0.00292491912842
#
# Abridged source
#

class memoize(object):
    """Memoization decorator (@memoize).
    """

    def __init__(self, func):
        self.func = func
        self.memo = dict()

    def __call__(self, *args):
        try:
            return self.memo[args]
        except KeyError:
            self.memo[args] = self.func(*args)
            return self.memo[args]


@memoize
def decorated_recursive_factorial(n):
    if n < 0:
        return None
    if n in [0, 1]:
        return 1
    return n * decorated_recursive_factorial(n - 1)


class MemoizedRecursiveFactorial(Factorial):
    """Memoized recursive factorial implementation.
    """

    def __str__(self):
        return "MemoizedRecursiveFactorial"

    def factorial(self, n):
        if n < 0:
            return None
        if n in self.D:
            return self.D[n]
        if n in [0, 1]:
            return 1
        self.D[n] = n * self.factorial(n - 1)
        return self.D[n]


class MemoizedIterativeFactorial(Factorial):
    """Memoized iterative factorial implementation.
    """

    def __str__(self):
        return "MemoizedIterativeFactorial"

    def factorial(self, n):
        if n < 0:
            return None
        if n in self.D:
            return self.D[n]
        m, res = n, 1
        while m > 1:
            if (m - 1) in self.D:
                self.D[m] = self.D[m - 1] * m
                return self.D[m]
            else:
                res *= m
                self.D[m] = res
                m -= 1
        return res


if __name__ == '__main__':
    f = Factorial()
    r = RecursiveFactorial()
    mr = MemoizedRecursiveFactorial()
    i = IterativeFactorial()
    mi = MemoizedIterativeFactorial()

    for instance in [f, r, mr, i, mi]:
        print '\nRunning {0}'.format(instance)

        print 'Testing correctness...'
        if instance.test_correctness():
            print 'Success'

        print 'Timing...'
        print instance.time_test_factorial()

    print '\nTiming {0}'.format(repr(time_test_decorated_recursive_factorial))
    print time_test_decorated_recursive_factorial()
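
(Aside: on Python 3, the hand-rolled memoize decorator above can be swapped for functools.lru_cache:)

from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the memo dict above
def decorated_recursive_factorial(n):
    if n < 0:
        return None
    if n in (0, 1):
        return 1
    return n * decorated_recursive_factorial(n - 1)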

VirtualBox Bridged Network Adapter

To ssh into a VirtualBox VM, set up bridged networking:

  1. Add a bridged adapter similar to this

  2. Spin up your vm and look for the new interface. In my case the interface was not configured automatically:

     $ ifconfig -a
     eth2      Link encap:Ethernet  HWaddr 08:00:27:2D:73:70  
               BROADCAST MULTICAST  MTU:1500  Metric:1
               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
               TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
               collisions:0 txqueuelen:1000 
               RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
               Interrupt:16 Base address:0xd240 
    
  3. To set up via dhcp, kill any dhclient processes that may be attached to the interface and then restart:

     $ kill $(lsof -tc dhclient)
     $ dhclient eth2
    
  4. Assuming dhclient succeeded, you can now find out the ip address assigned:

     $ ifconfig -a
     eth2      Link encap:Ethernet  HWaddr 08:00:27:2D:73:70  
               inet addr:10.10.11.163  Bcast:10.10.15.255  Mask:255.255.248.0
               inet6 addr: fe80::a00:27ff:fe2d:7370/64 Scope:Link
               UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
               RX packets:34244 errors:15 dropped:0 overruns:0 frame:0
               TX packets:477 errors:0 dropped:0 overruns:0 carrier:0
               collisions:0 txqueuelen:1000 
               RX bytes:2268686 (2.1 MiB)  TX bytes:60728 (59.3 KiB)
               Interrupt:16 Base address:0xd240 
    
  5. To have the interface configured at boot, in the vm:

    $ cd /etc/sysconfig/network-scripts
    $ cp ifcfg-eth0  ifcfg-eth2
    $ sed -i 's/eth0/eth2/g' ifcfg-eth2
    
  6. Back on the host OS, you can now:

     $ ssh 10.10.11.163
    

Mac OS X NFS Client

Show what’s exported by 192.168.1.100 and mount it on /nfs/foo:

$ showmount -e 192.168.1.100
Exports list on 192.168.1.100:
/export/nfs                         192.168.1.101
$ sudo mount -t nfs 192.168.1.100:/export/nfs /nfs/foo

GTK Emacs Keybindings

Works on Ubuntu 12.04 LTS using Unity/xmonad.

Install gconf-editor and launch. This should create $HOME/.gtkrc-2.0 which has something like:

include "/home/me/.gtkrc.mine"

in which you can add:

gtk-key-theme-name = "Emacs"

Linux Caps Lock to Control

Download: http://goo.gl/pxhZdS

#!/bin/bash

if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root" 1>&2
    exit 1
fi

if [[ -z `which dumpkeys` ]]; then
    echo "dumpkeys not in PATH" 1>&2
    exit 1
fi

if [[ -z `which loadkeys ` ]]; then
    echo "loadkeys not in PATH" 1>&2
    exit 1
fi

dumpkeys | head -1 > /tmp/foo_keys
echo "keycode 58 = Control" >> /tmp/foo_keys
loadkeys /tmp/foo_keys

SD card dd unbuffered block size write speed

l0st:~/workspace/src/minibian/image$ sudo pv -pterb minibian-2013-10-13-fcs.img.copy | sudo dd of=/dev/rdisk1
75.12MiB 0:05:28 [237kiB/s]

l0st:~/workspace/src/minibian/image$ sudo pv -pterb minibian-2013-10-13-fcs.img.copy | sudo dd of=/dev/rdisk1 bs=8m
1.916GiB 0:03:48 [10.7MiB/s]

-__-

(dd defaults to a 512-byte block size, so the first run issues a separate write syscall for every 512 bytes against the unbuffered raw device; bs=8m amortizes that overhead.)

CentOS(ish) Static IP

# /etc/resolv.conf
domain foo.bar.com
nameserver 10.10.10.12
nameserver 10.10.10.11


# /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=pi
GATEWAY=10.10.10.1


# /etc/sysconfig/network-scripts/ifcfg-eth0 
DEVICE=eth0
#BOOTPROTO=dhcp
BOOTPROTO=static
HWADDR="aa:bb:cc:11:22:33"
IPADDR=10.10.11.215
NETMASK=255.255.248.0
ONBOOT=yes
GATEWAY=10.10.10.1


$ rm /etc/udev/rules.d/70-persistent-net.rules && reboot

Imaged OS mac address caching

udev caches network interface definitions by mac address. If you are using a cloned image with this (most likely) defined incorrectly, it’s probably safe to just delete /etc/udev/rules.d/70-persistent-net.rules.

ssh disable public key authentication

If you have an ssh-agent running and perhaps alias ssh to ssh -A, you can bypass public key authentication and force password prompting with:

$ ssh -o 'PubkeyAuthentication=no' foo@127.0.0.1

or

$ SSH_AUTH_SOCK='' ssh foo@127.0.0.1

as described here.

Modify Raspberry Pi Image

Download your Raspberry Pi image of choice. In my case I needed the redsleeve distro.

$ fdisk -l raspi-v2-redsleeve-cli-0.3.img

Disk raspi-v2-redsleeve-cli-0.3.img: 1939 MB, 1939865600 bytes
255 heads, 63 sectors/track, 235 cylinders, total 3788800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00014d34

                         Device Boot      Start         End      Blocks   Id  System
raspi-v2-redsleeve-cli-0.3.img1            8192      122879       57344    c  W95 FAT32 (LBA)
raspi-v2-redsleeve-cli-0.3.img2          122880     3788799     1832960   83  Linux

We want to be able to mount/modify the second partition on /mnt. Run:

$ sudo kpartx -a raspi-v2-redsleeve-cli-0.3.img

which should add a loop device in /dev/mapper:

$ ls /dev/mapper
control  loop0p1@  loop0p2@  root@  swap1@

The second partition is now mapped to /dev/mapper/loop0p2. Mount:

$ sudo mount -o rw /dev/mapper/loop0p2 /mnt
$ cd /mnt
$ ls
bin/  boot/  dev/  etc/  home/  lib/  lost+found/  media/  mnt/  opt/  proc/  root/  sbin/  selinux/  srv/  sys/  tmp/  usr/  var/

We can now modify the disk image as we please. Remember to umount and kpartx -d when done.

$ sudo umount /mnt
$ sudo kpartx -d raspi-v2-redsleeve-cli-0.3.img
loop deleted : /dev/loop0

Refresh postfix

From Postfix README

Execute the command:

$ postmap /etc/postfix/virtual

after changing the virtual file, and execute the command:

$ postfix reload

after changing the main.cf file.

Install pip for the current user

$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py --user

Link: get-pip.py

Apache mod_rewrite `serve backup' file

When a particular static file can’t be found in its “usual” location, serve its alternate.

RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{REQUEST_URI}  ^/foo/bar/baz(.*\.png)$
RewriteRule ^/foo/bar/baz/(.*)$  /nfs/foo/bar/baz/$1

Manage dotfiles with dotfiles-python

$ workon dotfiles
$ cd ~/workspace/src/dotfiles/
$ dotfiles -R . --list
$ dotfiles -R . --sync --dry-run

git grep with blame

Sometimes we want both grep and blame.

Save the following as git-blame somewhere on your PATH:

#!/usr/bin/env python

import sys
from subprocess import call

if __name__ == '__main__':
    # Expect pairs of "file:lineno:" and matched text from `git grep`
    # (xargs' quote handling keeps each quoted match a single argument)
    args = sys.argv[1:]
    assert len(args) % 2 == 0
    for (file_line, text) in zip(*[iter(args)] * 2):
        # "foo.pm:1957:" -> ("foo.pm", "1957")
        f, n = file_line.strip(':').split(':')
        call(['git', 'blame', '-L', '{0},{1}'.format(n, n), '--', f])

then (with git grep configured to emit line numbers, e.g. git config grep.lineNumber true):

$ git grep -i 'error loading order details' | xargs git-blame
76d02b9z (Foo User 2014-03-14 21:05:53 1957)    $self->alert("Error loading order details");
76d02b9z (Foo User 2014-03-14 21:05:53 2053)    $self->alert("Error loading order details");

PostgreSQL Ident Authentication

Assumes a Debian/Ubuntu-ish Linux flavor.

install oidentd

$ sudo apt-get install oidentd

add to /etc/oidentd.conf

 user foo_user {
     default {
         deny spoof
         deny spoof_all
         deny spoof_privport
         allow random
         allow random_numeric
         allow numeric
         deny hide
     }
 }

add to /etc/postgresql/9.1/main/pg_hba.conf

 # TYPE   DATABASE   USER   ADDRESS   METHOD
 local    all        all              ident map=my_map

add to /etc/postgresql/9.1/main/pg_ident.conf

 # MAPNAME   SYSTEM-USERNAME   PG-USERNAME
 my_map      foo_user          pg_user

restart services

$ sudo service oidentd restart
$ sudo service postgresql restart

references