Will Critchlow - Using and abusing data




Slide 30

grep www.distilled.net-access_log -e 'Googlebot' | grep -e '404'
grep www.distilled.net-access_log -e ' 404 ' | grep -e 'store'

Slide 1

Using and abusing data: setting us apart from other marketers. Will Critchlow, 2011

Slide 2

This presentation is about skills and things to try… Understanding data is all that separates us from the savages

Slide 3

Some skills: I firmly believe technical skills make you a better SEO

Slide 4

Gathering data: there are loads of places it could come from, but you should be adept at accessing it. Check out http://www.infochimps.com/

Slide 5

There are loads of great APIs: http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools

Slide 6

Sometimes you need to go get your own: with varying levels of difficulty. Check out http://www.mozenda.com/, http://scraperwiki.com/

Slide 7

Rapid prototyping FTW: I’m talking *really* Heath Robinson

Slide 8

On statistics: the modern world runs on math. Learn just a little bit

Slide 9

We are good at spotting (and extrapolating) trends: go, go, go

Slide 10

But randomness often looks like a trend: uh oh

Slide 11

And sometimes, we are just wrong PANIC!!1

Slide 12

A 1% mis-diagnosis rate can result in more errors than correct diagnoses
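The arithmetic behind this claim can be sketched in a few lines. The numbers below are assumptions chosen for illustration, not figures from the talk: a condition affecting 1 in 1,000 people, a population of 100,000, and a test that gives the wrong answer 1% of the time.

```python
def diagnosis_counts(population, prevalence, error_rate):
    """Return (true_positives, false_positives) for a simple screening test.

    All three inputs are illustrative assumptions, not data from the slides.
    """
    sick = round(population * prevalence)
    healthy = population - sick
    true_positives = sick * (1 - error_rate)   # sick people correctly flagged
    false_positives = healthy * error_rate     # healthy people wrongly flagged
    return true_positives, false_positives


tp, fp = diagnosis_counts(population=100_000, prevalence=0.001, error_rate=0.01)
print(f"correct positive diagnoses: {tp:.0f}")
print(f"false positive diagnoses:   {fp:.0f}")
```

Under these assumed numbers the test correctly flags about 99 people and wrongly flags about 999, so most positive results are errors even though the test is "99% accurate". The effect comes from the condition being rare, so it shrinks as the assumed prevalence rises.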

Slide 13

Pie charts are often hard to read and can easily be made misleading http://dis.tl/m6sZLO

Slide 14

Watch out for charts that don’t go to zero on the y-axis http://dis.tl/iH1qgs

Slide 15

The same data presented in a less misleading way

Slide 16

Action: read “How to Lie with Statistics” (old, but great): http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728

Slide 17

Common statistical misconceptions / mistakes: Don’t blindly mine data for patterns (you’ll definitely find something). Have a hypothesis. Test on different data to that used to form the hypothesis. Don’t forget that a 95% confidence interval is wrong 1 in 20 times.
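To see why blind pattern-mining "definitely finds something", here is a minimal sketch of how the 1-in-20 error rate compounds. It assumes every test is independent and that there is no real effect anywhere, which is a simplification; the function name is made up for this example.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    looks 'significant' purely by chance at significance level alpha."""
    return 1 - (1 - alpha) ** num_tests


for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: {chance_of_false_alarm(n):.1%} chance of a spurious finding")
```

With 20 independent looks at pure noise, the chance of at least one spurious "pattern" is roughly 64%. Forming a hypothesis first and testing it on fresh data keeps the number of looks at one.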

Slide 18

Visual = powerful

Slide 19

Geckoboard: build a culture of data - http://www.geckoboard.com

Slide 20

Visible data and clear targets work: we had a modest internal target to hit 2,000 followers of @distilled by our #linklove conference. (Chart annotations: target set, target ended. Source: twittercounter.com)

Slide 21

One of our clients built a tool to visualize website changes over time: http://www.reevoo.com / https://github.com/georgebrock/timelapse

Slide 22

Public = powerful

Slide 23

Coming soon: public post analytics on SEOmoz

Slide 24

Back to getting the data

Slide 25

=importxml("http://www.google.com/search?gl=us&q="&A2, "//h3[@class='r']/a/@href")
We’ve written about the power of Gdocs for rapid prototyping a bunch of times – see for example: http://dis.tl/kZsv9y and http://dis.tl/jqk05J
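For readers who want to see what the XPath in that formula selects, here is a rough Python sketch using only the standard library. The sample markup and the example.com URLs are invented for illustration. Note that `xml.etree.ElementTree` supports only a subset of XPath (it cannot select `@href` directly, so the attribute is read in Python) and it needs well-formed markup, so real-world HTML would usually require a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; not real search-engine output.
SAMPLE_PAGE = """
<html><body>
  <h3 class="r"><a href="http://example.com/first">First result</a></h3>
  <h3 class="other"><a href="http://example.com/ignored">Not a result</a></h3>
  <h3 class="r"><a href="http://example.com/second">Second result</a></h3>
</body></html>
"""


def result_links(markup):
    """Return the href of every <a> directly inside an <h3 class="r">."""
    root = ET.fromstring(markup.strip())
    return [a.get("href") for a in root.findall(".//h3[@class='r']/a")]


links = result_links(SAMPLE_PAGE)
print(links)
```

The path `.//h3[@class='r']/a` mirrors the `//h3[@class='r']/a` part of the spreadsheet formula; the heading with a different class is skipped.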

Slide 26

Another xpath resource: http://developer.yahoo.com/yql/console/

Slide 27

Command line

Slide 28

Linux: native command lines make a lot of this easier. The latest Ubuntu and VirtualBox are a piece of cake to install and get using

Slide 29

VirtualBox: take snapshots really frequently

Slide 30

Unix principle: lots of small tools, piped together – great for ad-hoc or to create specifications
grep * -r -e 'UA-XXXXXX' | sed -e 's/UA-\([0-9]*\)/UA-XXXXXX/g'
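The same "small filters chained together" idea carries over to a scripting language once a one-liner outgrows the shell. The sketch below mimics the access-log commands from the slide 30 notes (Googlebot requests that returned 404, and 404s under the store). The log lines are invented sample data in a common access-log layout, and `grep` here is just a hypothetical helper name for a plain substring filter.

```python
# Hypothetical sample lines; a real script would read the access log file instead.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/May/2011:10:00:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/May/2011:10:00:05 +0000] "GET /blog/ HTTP/1.1" 200 9000 "-" "Googlebot/2.1"',
    '10.0.0.7 - - [10/May/2011:10:01:00 +0000] "GET /store/missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]


def grep(lines, needle):
    """Return a lazy iterator over the lines that contain needle."""
    return (line for line in lines if needle in line)


# Each call consumes the output of the previous one, like a shell pipe.
googlebot_404s = list(grep(grep(SAMPLE_LOG, "Googlebot"), " 404 "))
store_404s = list(grep(grep(SAMPLE_LOG, " 404 "), "store"))

print(len(googlebot_404s), "Googlebot 404s")
print(len(store_404s), "store 404s")
```

Because each stage is a generator, lines flow through one at a time, so the same pattern works on logs too large to hold in memory when the input is a file object rather than a list.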

Slide 31

Inspiration: via @tomcritchlow - http://www.neilkodner.com/

Slide 32

Real programming

Slide 33

apt-get install python-virtualenv
virtualenv --no-site-packages <path>
source <path>/bin/activate
pip install gdata
http://code.google.com/p/gdata-python-client/
http://code.google.com/apis/analytics/docs/gdata/gdataExplorer.html

Slide 34

https://gist.github.com/967503 (~/Downloads/googleanalytics.py)

Slide 35

Training: I really like the peepcode videos for learning technical stuff. Recommended: command line, git, vim. http://peepcode.com/

Slide 36


Slide 37

If you start catching the bug, go learn a framework. I like django (https://docs.djangoproject.com/en/1.3/intro/tutorial01/)

Slide 38

Stack Overflow is your friend: anything you can’t get an answer to there is guaranteed(*) to be a copy-and-paste error

Slide 39

Time series

Slide 40

You come across data that looks like this all the time:

#   Day        Date        Visits
1   Monday     05/04/2010  877
2   Tuesday    06/04/2010  1087
3   Wednesday  07/04/2010  1018
4   Thursday   08/04/2010  1039
5   Friday     09/04/2010  917
6   Saturday   10/04/2010  670
7   Sunday     11/04/2010  746
8   Monday     12/04/2010  1165
9   Tuesday    13/04/2010  1192
10  Wednesday  14/04/2010  1053
11  Thursday   15/04/2010  1022
12  Friday     16/04/2010  947

Slide 41

Your first action: eyeball the data

Slide 42

Your action: Fire up R - http://www.r-project.org/
> pg = read.csv("<path_to_file.csv>")
> tspg = ts(pg[,3], start=1, freq=7)
> plot(stl(tspg, s.window="periodic"))

Slide 43

Your action: decompose time series into its constituent parts

Slide 44

The “trend” is useful: this removes seasonality and outliers
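As a rough feel for what a trend line does, the sketch below applies a centered 7-day moving average to the twelve visit counts shown on slide 40. This is a deliberately simpler stand-in, not the loess-based method that R's `stl` uses: averaging over a full week cancels the weekday/weekend cycle, but unlike `stl` it does nothing special about outliers.

```python
# Daily visits from slide 40 (Monday 05/04/2010 to Friday 16/04/2010).
VISITS = [877, 1087, 1018, 1039, 917, 670, 746, 1165, 1192, 1053, 1022, 947]


def centered_moving_average(values, window=7):
    """Average each point with its neighbours over a full window.

    Points too close to either end to have a complete window are dropped.
    """
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


trend = centered_moving_average(VISITS)
for day, value in enumerate(trend, start=4):  # first full window is centred on day 4
    print(f"day {day}: {value:.1f}")
```

With only twelve days of data there are just six smoothed points, but they already show the underlying level rising while the raw numbers swing by several hundred visits between weekdays and the weekend.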

Slide 45

The “remainder” shows the outliers, which might correspond to specific activity

Slide 46

Videos are now available of our #linklove conference in London and New Orleans: www.distilled.net/store

Summary: As online marketers, understanding data is one of the things that sets us apart. In order to use data as effectively as possible, we need to understand what tools are available and how we can use those tools. In this webinar, Will is going to share his tips for dealing with data quickly and efficiently.

Tags: data