EDA with Org Tables

I’ve enjoyed the occasional wrangling with tabular data in Org tables before[1]. After all, you still come heavy with whatever your text editor is capable of, and most of the time exploratory data analysis (EDA) starts out simple, by looking at tables. Only recently did I learn about orgtbl-ascii-draw, which draws an ASCII bar plot of the values in a given column. Passing a few Unicode block elements as its characters argument even produces this decent plot:

| Key | Value |             |
|-----+-------+-------------|
| c   |  0.05 | ▎           |
| a   |   1.1 | ██▋         |
| a   |   1.2 | ██▉         |
| b   |     2 | ████▉       |
| c   |   4.3 | ██████████▍ |
| d   |   3.1 | ███████▌    |
#+tblfm: $3='(orgtbl-ascii-draw $2 0 5 12 (apply 'string (number-sequence 9615 9608 -1)))
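The same function can also be called directly from Elisp, which is handy for checking how a value maps to a bar before committing to a #+tblfm line. A small sketch, assuming a recent Org where orgtbl-ascii-draw is defined in org-table.el; my/eighth-blocks is just a hypothetical name for the character string used in the formula above:

#+begin_src elisp
(require 'org-table)

;; Hypothetical name for the characters argument used in the #+tblfm line:
;; U+258F (left one-eighth block) down to U+2588 (full block), "▏▎▍▌▋▊▉█".
(defconst my/eighth-blocks (apply #'string (number-sequence 9615 9608 -1)))

;; Same arguments as in the table formula: MIN 0, MAX 5, WIDTH 12.
(mapcar (lambda (value) (orgtbl-ascii-draw value 0 5 12 my/eighth-blocks))
        '(0.05 1.1 1.2 2 4.3 3.1))
;; ⇒ ("▎" "██▋" "██▉" "████▉" "██████████▍" "███████▌")
#+end_src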

For larger collections, a stem-and-leaf plot (stemplot) may be more appropriate. R has a stem function, but this should be doable in Elisp as well. I’ll use Org’s “Library of Babel” so that this implementation can later be called from other code blocks:

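#+name: stemplot
#+begin_src elisp :results table :lexical t :var data='() stemlen='()
;; DATA: a one-column table, e.g. ((30.3) (22.0) ...).  STEMLEN: number of
;; leading digits in the stem (default 1); the leaf is the single next digit.
(let* ((dat (sort (mapcar #'car data) #'<))
       ;; number of digits in the integer part of the maximum
       (slen (string-width (number-to-string (truncate (apply #'max dat)))))
       (stemlen (or stemlen 1))
       ;; rescale so that stem and leaf are the trailing digits of an integer,
       ;; e.g. 30.3 with stemlen 2 becomes 303 (stem 30, leaf 3)
       (dat (mapcar (lambda (x) (floor x (expt 10 (- slen stemlen 1)))) dat))
       (fac 10)                         ; one leaf digit per value
       acc)
  ;; one row per stem, from 0 up to the largest stem (the upper limit)
  (dotimes (i (1+ (floor (car (last dat)) fac)))
    (push (list i (mapconcat
                   (lambda (x)
                     ;; x is an integer now, so its last digit is the leaf
                     (when (and (>= x (* fac i))
                                (< x (* fac (1+ i))))
                       (format "%d" (% x fac))))
                   dat ""))
          acc))
  ;; ACC was built by pushing, so the largest stem comes first
  acc)
#+end_src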
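The Babel block is a bit awkward to poke at in isolation. As a sketch, assuming non-negative values, the same logic can be wrapped in a plain function (my/stemplot is a hypothetical name) and tried in *scratch* or under emacs --batch on a few integers:

#+begin_src elisp
;; -*- lexical-binding: t; -*-
(defun my/stemplot (data &optional stemlen)
  "Return stem-and-leaf rows (STEM LEAVES) for DATA, largest stem first.
DATA is a list of one-element rows such as ((12) (15) (23) (7)), the shape
Babel passes for a one-column table.  STEMLEN is the number of leading
digits in the stem (default 1).  Assumes non-negative values."
  (let* ((dat (sort (mapcar #'car data) #'<))
         (slen (length (number-to-string (truncate (apply #'max dat)))))
         (stemlen (or stemlen 1))
         ;; shift each value so that its last integer digit is the leaf
         (dat (mapcar (lambda (x) (floor x (expt 10 (- slen stemlen 1)))) dat))
         acc)
    (dotimes (i (1+ (floor (car (last dat)) 10)))
      (push (list i (mapconcat (lambda (x)
                                 (when (= (floor x 10) i)
                                   (number-to-string (% x 10))))
                               dat ""))
            acc))
    acc))

(my/stemplot '((12) (15) (23) (7)))
;; ⇒ ((2 "3") (1 "25") (0 "7"))
#+end_src

With stemlen left at 1, the stem is the leading digit and the leaf the one after it, so 23 lands in row 2 with leaf 3. Anything beyond the leaf digit is truncated rather than rounded: with values up to 105, the value 98 shows up as stem 0, leaf 9.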

With stemplot defined in a named source block, it still has to be added to the Library of Babel, for instance by evaluating:

(org-babel-lob-ingest (buffer-file-name))

To celebrate the NBA Playoffs fever that has absolutely hit me (again), let’s glance at the field goal attempts per game (FGA/G), which were a source of controversy during the first round:

#+begin_src shell :results table :cache yes :post stemplot(data=*this*, stemlen=2)
# -s silences curl's progress meter; sed prints only the captured FGA/G values
curl -s "https://www.basketball-reference.com/playoffs/NBA_2017_per_game.html" | \
    sed -nE 's/.*data-stat="fga_per_g" >([0-9]+\.[0-9]+)<\/td>.*/\1/p'
#+end_src
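Table 1: FGA/G NBA Playoffs 2017
| Stem | Leaves                          |
|------+---------------------------------|
|   30 | 3                               |
|   29 |                                 |
|   28 |                                 |
|   27 |                                 |
|   26 |                                 |
|   25 |                                 |
|   24 |                                 |
|   23 |                                 |
|   22 | 05                              |
|   21 | 36                              |
|   20 | 06                              |
|   19 | 388                             |
|   18 | 0336                            |
|   17 | 011358                          |
|   16 | 58                              |
|   15 | 01                              |
|   14 | 3                               |
|   13 | 00036889                        |
|   12 | 00013                           |
|   11 | 0001359                         |
|   10 | 56                              |
|    9 | 001135589                       |
|    8 | 033445566666888                 |
|    7 | 0000002345                      |
|    6 | 00000022257899                  |
|    5 | 000002224555555557              |
|    4 | 00222244555577799               |
|    3 | 00000000000225555566677779      |
|    2 | 00000011255556677               |
|    1 | 0000001333333333555557778888888 |
|    0 | 00022255566                     |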

The outlier is, who would have thought, Russell Westbrook, the black hole from OKC.
