EDA with Org Tables

I’ve enjoyed occasionally wrangling tabular data in Org tables before. After all, you still have everything your text editor is capable of at hand, and most of the time exploratory data analysis (EDA) starts out simple, by looking at tables. Only recently did I learn about orgtbl-ascii-draw, which draws an ASCII bar for the value in a given column. Passing some Unicode block elements as the characters argument produces this decent plot:

| Key | Value |            |
|-----+-------+------------|
| c   |  1.05 | ▎          |
| a   |   1.1 | ▍          |
| a   |   1.2 | ▋          |
| b   |     2 | ███▏       |
| c   |   4.3 | █████████▉ |
| d   |   3.1 | ██████▍    |
#+TBLFM: $3='(orgtbl-ascii-draw $2 1 5 12 (apply 'string (number-sequence 9615 9608 -1)))
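
The characters argument in the formula is built from code points: 9615 is U+258F (left one-eighth block) and 9608 is U+2588 (full block), so counting down yields the eight block elements from thinnest to full. A minimal sketch to inspect that string on its own; the helper name my/block-ramp is made up for illustration:

#+begin_src elisp
(defun my/block-ramp ()
  "Return the eight Unicode block elements, thinnest first."
  ;; number-sequence counts down from 9615 (U+258F) to 9608 (U+2588)
  (apply #'string (number-sequence 9615 9608 -1)))

(my/block-ramp)
;; => "▏▎▍▌▋▊▉█"
#+end_src

Judging from the table above, the last character of the string fills whole cells and the earlier ones draw the fractional tail of each bar.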

For larger collections a stemplot may be more appropriate. There is a stem function in R, but that should be doable in Elisp as well. I’ll use Org’s “Library of Babel” facilities to later call this implementation from other code blocks:

#+name: stemplot
#+begin_src elisp :results table :lexical t :var data='() stemlen='()
(let (dat acc slen fac)
  (setq dat (sort (mapcar 'car (copy-sequence data)) '<))
  (setq slen (string-width (number-to-string (truncate (apply 'max dat)))))
  (setq stemlen (or stemlen 1))
  (setq dat (mapcar
             (lambda (x) (floor x (expt 10 (- slen stemlen 1))))
             dat))
  (setq slen (or stemlen slen))
  (setq fac (expt 10 (- slen (1- stemlen))))
  ;; one row per stem, from 0 up to the stem of the largest value
  (dotimes (i (1+ (floor (car (last dat)) fac)))
    (push (list i (mapconcat
                   (lambda (x)
                     (if (and (>= x (* fac i))
                              (< x (* fac (1+ i))))
                         ;; now that x are integers get the remainder
                         (format "%s" (% x fac))))
                   dat ""))
          acc))
  (nreverse acc))
#+end_src
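
To make the stem and leaf split concrete, here is a small sketch for values with one decimal place, which is the case the block above handles when the largest value has two integer digits and stemlen is 2. The function name my/stem-and-leaf is hypothetical, and the sketch scales with round instead of dividing by a fraction, which is my own choice to avoid floating-point surprises, not what the block above does:

#+begin_src elisp
(defun my/stem-and-leaf (x)
  "Split X, a number with one decimal place, into (STEM . LEAF)."
  (let ((scaled (round (* x 10))))   ; 22.5 becomes the integer 225
    (cons (floor scaled 10)          ; stem: all digits but the last, 22
          (% scaled 10))))           ; leaf: the last digit, 5

(mapcar #'my/stem-and-leaf '(22.5 22.0 8.5))
;; => ((22 . 5) (22 . 0) (8 . 5))
#+end_src

Values that share a stem end up in the same row, and their leaves are concatenated in ascending order, so 22.0 and 22.5 would show up as stem 22 with leaves 05.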

Now that the stemplot function is defined in a named source code block, we need to add it to the library:

(org-babel-lob-ingest (buffer-file-name))
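
Once ingested, the block can be invoked by name from any Org file, for example with a #+call: line. A sketch, assuming a hypothetical single-column table named sample; each row arrives in the block as a one-element list, which is why the implementation takes the car of every row:

#+name: sample
| 12.5 |
| 13.0 |
| 21.4 |

#+call: stemplot(data=sample, stemlen=2)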

To celebrate the NBA Playoffs fever that absolutely hit me (again), let’s have a glance at the field goal attempts per game that have been a source of controversy during the first round:

#+begin_src shell :results table :cache yes :post stemplot(data=*this*, stemlen=2)
curl -X GET "http://www.basketball-reference.com/playoffs/NBA_2017_per_game.html" | \
    sed -nr '1,$ s/.*data-stat="fga_per_g" >([0-9]+\.[0-9]+)<\/td>.*/\1/ p'
#+end_src
Table 1: FGA/G NBA Playoffs 2017
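| Stem | Leaves                       |
|------+------------------------------|
|   30 | 3                            |
|   29 |                              |
|   28 |                              |
|   27 |                              |
|   26 |                              |
|   25 |                              |
|   24 |                              |
|   23 |                              |
|   22 | 05                           |
|   21 | 13                           |
|   20 | 066                          |
|   19 | 36                           |
|   18 | 00136                        |
|   17 | 0113                         |
|   16 | 56                           |
|   15 | 5                            |
|   14 | 33                           |
|   13 | 0036688                      |
|   12 | 013459                       |
|   11 | 00359                        |
|   10 | 1134588                      |
|    9 | 011138                       |
|    8 | 00133355666668               |
|    7 | 0000235788                   |
|    6 | 0000000002222257789          |
|    5 | 000022255557                 |
|    4 | 02222455555577777            |
|    3 | 00000000000022246677777799   |
|    2 | 0000000013333555556          |
|    1 | 0000000011333355555667778888 |
|    0 | 0000022222469                |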

The outlier is, who would have thought, Russell Westbrook, the black hole from OKC.
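
The extraction step that sed performs in the shell block above could also be done in Elisp once the page is available as a string. A rough sketch, assuming the HTML has already been fetched; the function name my/extract-fga and the sample markup in the usage line are made up for illustration. Unlike the sed call, which works line by line, this collects every match in the string:

#+begin_src elisp
(defun my/extract-fga (html)
  "Return the FGA/G values found in HTML as a list of one-element rows."
  (let ((start 0) rows)
    ;; same pattern as the sed expression, in Emacs regexp syntax
    (while (string-match
            "data-stat=\"fga_per_g\" >\\([0-9]+\\.[0-9]+\\)</td>"
            html start)
      (push (list (string-to-number (match-string 1 html))) rows)
      (setq start (match-end 0)))
    (nreverse rows)))

(my/extract-fga
 "<td data-stat=\"fga_per_g\" >22.5</td>\n<td data-stat=\"fga_per_g\" >8.0</td>")
;; => ((22.5) (8.0))
#+end_src

The result has the same shape as a single-column Org table, so it could be passed to stemplot as data.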
