grep www.distilled.net-access_log -e 'Googlebot' | grep -e '404'
grep www.distilled.net-access_log -e ' 404 ' | grep -e 'store'
Using and abusing data: setting us apart from other marketers. Will Critchlow, 2011
This presentation is about skills and things to try… Understanding data is all that separates us from the savages
Some skills: I firmly believe technical skills make you a better SEO
Gathering data: there are loads of places it could come from, but you should be adept at accessing it. Check out http://www.infochimps.com/
There are loads of great APIs: http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools
Sometimes you need to go get your own: with varying levels of difficulty. Check out http://www.mozenda.com/, http://scraperwiki.com/
Rapid prototyping FTW: I’m talking *really* Heath Robinson
On statistics: the modern world runs on math. Learn just a little bit
We are good at spotting (and extrapolating) trends: go, go, go
But randomness often looks like a trend: uh oh
And sometimes, we are just wrong: PANIC!!1
A 1% mis-diagnosis rate can result in more errors than correct diagnoses
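To see how that can happen, here is a minimal sketch of the arithmetic. The population size and the 1-in-1,000 prevalence are illustrative assumptions, not figures from the talk; the only number taken from the slide is the 1% error rate, which is assumed here to apply equally to sick and healthy people.

```python
def diagnosis_counts(population, prevalence, error_rate):
    """Return (true_positives, false_positives) for a test that is
    wrong `error_rate` of the time on sick and healthy people alike."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * (1 - error_rate)   # sick and correctly flagged
    false_positives = healthy * error_rate     # healthy but flagged anyway
    return true_positives, false_positives


# Hypothetical inputs: 100,000 people, 1 in 1,000 has the condition.
tp, fp = diagnosis_counts(population=100_000, prevalence=0.001, error_rate=0.01)
share_correct = tp / (tp + fp)

print(f"correct positive diagnoses: {tp:.0f}")
print(f"false alarms:               {fp:.0f}")
print(f"share of positives that are real: {share_correct:.1%}")
```

Under these assumptions there are roughly 99 correct positive diagnoses against roughly 999 false alarms, so most people who are told they have the condition do not. The rarer the condition, the worse this gets.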
Pie charts are often hard to read and can easily be made misleading http://dis.tl/m6sZLO
Watch out for charts that don’t go to zero on the y-axis http://dis.tl/iH1qgs
The same data presented in a less misleading way
Action: read “How to lie with statistics” http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728 Old, but great
Common statistical misconceptions / mistakes: Don’t blindly mine data for patterns (you’ll definitely find something). Have a hypothesis. Test on different data to that used to form the hypothesis. Don’t forget that a 95% confidence interval is wrong 1 in 20 times.
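A rough sketch of why blind mining "definitely finds something": if each check has a 1-in-20 chance of a false alarm, the chance of at least one false alarm grows quickly with the number of checks. This assumes the checks are independent, which real data-mining rarely guarantees, so treat the numbers as indicative only.

```python
def chance_of_false_alarm(num_checks, false_alarm_rate=0.05):
    """Probability that at least one of `num_checks` independent checks
    raises a false alarm, given a per-check false alarm rate."""
    return 1 - (1 - false_alarm_rate) ** num_checks


for n in (1, 5, 20, 100):
    print(f"{n:>3} checks: {chance_of_false_alarm(n):.0%} chance of a spurious 'pattern'")
```

With 20 checks the chance of at least one spurious result is already well over half, which is why the hypothesis should come first and be tested on fresh data.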
Visual = powerful
Geckoboard: build a culture of data - http://www.geckoboard.com
Visible data and clear targets work: we had a modest internal target to hit 2,000 followers of @distilled by our #linklove conference. [Chart: follower count over time, annotated "Target set" and "Target ended". Source: twittercounter.com]
One of our clients built a tool to visualize website changes over time: http://www.reevoo.com / https://github.com/georgebrock/timelapse
Public = powerful
Coming soon: public post analytics on SEOmoz
Back to getting the data
=importxml("http://www.google.com/search?gl=us&q="&A2, "//h3[@class='r']/a/@href")
We’ve written about the power of Gdocs for rapid prototyping a bunch of times. See, for example: http://dis.tl/kZsv9y and http://dis.tl/jqk05J
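For anyone who wants to try the same XPath idea outside a spreadsheet, here is a small sketch using Python's standard library. The markup is a made-up sample rather than a real results page, and it has to be well-formed XML for this parser; the `h3` elements with class `r` simply mirror the expression in the formula above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not fetched from any site).
SAMPLE = """
<html><body>
  <h3 class="r"><a href="http://example.com/one">One</a></h3>
  <h3 class="other"><a href="http://example.com/skip">Skip</a></h3>
  <h3 class="r"><a href="http://example.com/two">Two</a></h3>
</body></html>
"""


def extract_links(markup):
    """Return the href of every <a> directly inside an <h3 class="r">."""
    root = ET.fromstring(markup.strip())
    # ElementTree supports a subset of XPath, including [@attr='value'].
    anchors = root.findall(".//h3[@class='r']/a")
    return [a.get("href") for a in anchors]


links = extract_links(SAMPLE)
print(links)
```

ElementTree cannot select an attribute directly with `/@href`, so the sketch selects the `a` elements and reads the attribute in Python instead.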
Another xpath resource: http://developer.yahoo.com/yql/console/
Command line
Linux: native command lines make a lot of this easier. The latest Ubuntu and VirtualBox are a piece of cake to install and start using
VirtualBox: take snapshots really frequently
Unix principle: lots of small tools, piped together. Great for ad-hoc work or to create specifications
grep * -r -e 'UA-XXXXXX' | sed -e 's/UA-\([0-9]*\)/UA-XXXXXX/g'
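The same "small tools, piped together" idea can be sketched in Python with generators, which is handy once a one-liner outgrows the shell. The function names and the sample lines are made up for illustration; the pattern just looks for `UA-` followed by digits, like the sed expression above.

```python
import re

UA_ID = re.compile(r"UA-[0-9]+")


def grep(lines, pattern):
    """Yield only the lines that contain a match, like grep -e."""
    return (line for line in lines if pattern.search(line))


def mask(lines, pattern, replacement):
    """Yield each line with matches replaced, like sed 's/.../.../g'."""
    return (pattern.sub(replacement, line) for line in lines)


# Hypothetical input lines standing in for files on disk.
lines = [
    "_gaq.push(['_setAccount', 'UA-1234567-1']);",
    "<p>No tracking code here</p>",
    "var id = 'UA-7654321-2';",
]

masked = list(mask(grep(lines, UA_ID), UA_ID, "UA-XXXXXX"))
for line in masked:
    print(line)
```

Each step takes lines in and gives lines out, so steps can be reordered or swapped just like commands in a pipe.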
Inspiration: via @tomcritchlow - http://www.neilkodner.com/
Real programming
apt-get install python-virtualenv
virtualenv --no-site-packages <path>
source <path>/bin/activate
pip install gdata
http://code.google.com/p/gdata-python-client/
http://code.google.com/apis/analytics/docs/gdata/gdataExplorer.html
https://gist.github.com/967503 (~/Downloads/googleanalytics.py)
Training: I really like the peepcode videos for learning technical stuff. Recommended: command line, git, vim. http://peepcode.com/
STOP
If you start catching the bug, go learn a framework. I like Django (https://docs.djangoproject.com/en/1.3/intro/tutorial01/)
Stack Overflow is your friend: anything you can’t find an answer to there is guaranteed(*) to be a copy and paste error
Time series
You come across data that looks like this all the time:

     Day        Date        Visits
 1   Monday     05/04/2010     877
 2   Tuesday    06/04/2010    1087
 3   Wednesday  07/04/2010    1018
 4   Thursday   08/04/2010    1039
 5   Friday     09/04/2010     917
 6   Saturday   10/04/2010     670
 7   Sunday     11/04/2010     746
 8   Monday     12/04/2010    1165
 9   Tuesday    13/04/2010    1192
10   Wednesday  14/04/2010    1053
11   Thursday   15/04/2010    1022
12   Friday     16/04/2010     947
Your first action: eyeball the data
Your action: Fire up R - http://www.r-project.org/
> pg = read.csv("<path_to_file.csv>")
> tspg = ts(pg[,3], start=1, freq=7)
> plot(stl(tspg, s.window="periodic"))
Your action: decompose time series into its constituent parts
The “trend” is useful: this removes seasonality and outliers
The “remainder” is the outliers: might correspond to activity
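To get a feel for what the trend line is doing, here is a much simpler stand-in written in plain Python: a centred 7-day moving average over the twelve days of visits shown earlier. This is not the STL method that R's `stl` uses, only a sketch of the idea that averaging over a full week cancels out the weekly pattern. With under two weeks of data there is not enough to separate seasonality from the remainder, so the sketch stops at the trend and the gap between visits and trend.

```python
# Daily visits copied from the table earlier in the deck (05/04/2010 to 16/04/2010).
visits = [877, 1087, 1018, 1039, 917, 670, 746, 1165, 1192, 1053, 1022, 947]
PERIOD = 7  # weekly cycle, matching freq=7 in the R snippet


def centered_moving_average(values, window):
    """Average every full window; result i is centred on values[i + window // 2]."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


trend = centered_moving_average(visits, PERIOD)
# What is left after removing the trend (weekly pattern plus anything unusual).
detrended = [v - t for v, t in zip(visits[PERIOD // 2:], trend)]

for day, (t, d) in enumerate(zip(trend, detrended), start=PERIOD // 2 + 1):
    print(f"day {day}: trend {t:.1f}, visits minus trend {d:+.1f}")
```

Because each window covers exactly one of every weekday, the weekend dip drops out of the trend and the slow rise in traffic is easier to see.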
Videos are now available of our #linklove conference in London and New Orleans: www.distilled.net/store
by willcritchlow | Modified: 1 year ago
Language: English | Topic: Internet
Summary: As online marketers, understanding data is one of the things that sets us apart. In order to use data as effectively as possible, we need to understand what tools are available and how we can use those tools. In this webinar, Will is going to share his tips for dealing with data quickly and efficiently.