This is the blog of Sam Newman, a consultant working for ThoughtWorks in the UK.
I can be contacted at sam.newman@gmail.com.
My Twitter screen name is @samnewman.
links for 2010-09-01
- A Linux tool for monitoring resource trends
- How to move several thousand devs from ClearCase to SVN in 7 days. ish.
- An open-source syslog storage and viewing system using MongoDB and Rails
links for 2010-08-25
- Interesting collection of design patterns for Web UIs.
- So basically, it's a level of redirection over just checking in the dependencies in the first place.
links for 2010-08-23
- A Clojure wrapper over JSch
- Several kinds of awesome
links for 2010-08-20
- A command-line interface for the cache-header-checking redbot. Very cool.
- The Resource Expert Droid is an online app which validates HTTP cache headers and gives advice on them. Mundo coolio.
links for 2010-08-19
- Video lectures from the SICP authors
links for 2010-08-18
- In a similar vein to my recent Clojure & Incanter work, albeit with slightly more important data
links for 2010-08-17
- Fantastic pictorial representation of a PhD
Graphing Unique Users With Incanter
In a previous post, I showed how we could use Clojure, and specifically Incanter, to process access logs to graph hits on our site. Now, we’re going to adapt our solution to show the number of unique users over time.
We’re going to change the previous solution to pull out the core dataset representing the raw data we’re interested in from the access log – records-from-access-log remains unchanged from before:
(defn access-log-to-dataset [filename]
  (col-names (to-dataset (records-from-access-log filename))
             ["Date" "User"]))
The raw dataset retrieved from this call looks like this:
| Date | User |
|---|---|
| 11/Aug/2010:00:00:30 +0100 | Bob |
| 11/Aug/2010:00:00:31 +0100 | Frank |
| 11/Aug/2010:00:00:34 +0100 | Frank |
Now, we need to work out the number of unique users in a given time period. Like before, we’re going to use $rollup to group multiple records by minute, but we need to work out how to summarise the user column. To do this, we create a custom summarise function which calculates the number of unique users:
(defn num-unique-items [items]
  (count (set items)))
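As a quick sanity check, here's roughly what that function gives you for the three sample rows in the table above. This is just a REPL-style sketch; the `sample-users` vector is made up to mirror the table, and the definition is repeated so the snippet runs on its own.

```clojure
;; Same idea as the summarise function above, repeated so this runs standalone.
(defn num-unique-items [items]
  (count (set items)))

;; Users from the three sample rows: Frank appears twice but is only counted once.
(def sample-users ["Bob" "Frank" "Frank"])

(println (num-unique-items sample-users)) ; prints 2
```

Turning the column into a set throws away duplicates, so the count is the number of distinct users rather than the number of hits.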
Then use that to modify the raw dataset and graph the resulting dataset:
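(defn access-log-to-unique-user-dataset
  [access-log-dataset]
  ;; Round each timestamp down to the minute, then count distinct users per minute.
  ;; The columns stay named "Date" and "User" so $rollup and the plot can find them.
  ($rollup num-unique-items "User" "Date"
    (col-names (conj-cols ($map #(round-ms-down-to-nearest-min (as-millis %)) "Date" access-log-dataset)
                          ($ "User" access-log-dataset))
               ["Date" "User"])))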
(defn concurrent-users-graph
  [dataset]
  (time-series-plot :Date :User
                    :x-label "Date"
                    :y-label "User"
                    :title "Users Per Min"
                    :data (access-log-to-unique-user-dataset dataset)))
(def access-log-dataset
  (access-log-to-dataset "/path/to/access.log"))
(save (concurrent-users-graph access-log-dataset) "unique-users.png")
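If the rollup step feels a bit opaque, here's a rough plain-Clojure sketch of the same idea, with no Incanter involved: group the records by minute, then count the distinct users in each group. The `as-millis` and `round-ms-down-to-nearest-min` definitions below are my guesses at stand-ins for the helpers from the previous post (they may well differ from the real ones), `unique-users-per-minute` is a name invented for this sketch, and the fourth sample record is made up so there's a second minute to look at.

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util Locale))

;; Assumed stand-in for as-millis: parses an access-log timestamp
;; such as "11/Aug/2010:00:00:30 +0100" into epoch milliseconds.
(defn as-millis [date-str]
  (.getTime (.parse (SimpleDateFormat. "dd/MMM/yyyy:HH:mm:ss Z" Locale/ENGLISH)
                    date-str)))

;; Assumed stand-in: truncates a millisecond timestamp to the start of its minute.
(defn round-ms-down-to-nearest-min [ms]
  (* 60000 (quot ms 60000)))

(defn num-unique-items [items]
  (count (set items)))

;; Hypothetical helper: takes [date user] pairs and returns a sorted map of
;; minute (in epoch millis) to the number of distinct users seen in that minute.
(defn unique-users-per-minute [records]
  (into (sorted-map)
        (for [[minute rows] (group-by #(round-ms-down-to-nearest-min (as-millis (first %)))
                                      records)]
          [minute (num-unique-items (map second rows))])))

(def sample-records
  [["11/Aug/2010:00:00:30 +0100" "Bob"]
   ["11/Aug/2010:00:00:31 +0100" "Frank"]
   ["11/Aug/2010:00:00:34 +0100" "Frank"]
   ["11/Aug/2010:00:01:02 +0100" "Frank"]]) ; made-up extra row

(println (vals (unique-users-per-minute sample-records))) ; prints (2 1)
```

The first minute has three hits but only two users, which is exactly the difference between this graph and the hits graph from the earlier post.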
You can see the full source code listing here.
