Yes, SQL!




Slide 1

Yes, SQL! Uri Cohen

Slide 2

> SELECT * FROM qcon2010.speakers WHERE name='Uri Cohen'
+-----------+------------+-----------------+----------+
| Name      | Company    | Role            | Twitter  |
+-----------+------------+-----------------+----------+
| Uri Cohen | GigaSpaces | Product Manager | @uri1803 |
+-----------+------------+-----------------+----------+

> db.speakers.find({name:"Uri Cohen"})
{
  "name": "Uri Cohen",
  "company": {
    name: "GigaSpaces",
    products: ["XAP", "IMDG"],
    domain: "In memory data grids"
  },
  "role": "product manager",
  "twitter": "@uri1803"
}

Slide 3

Agenda
  SQL
    What it is and isn't good for
  NoSQL
    Motivation & main concepts of modern distributed data stores
    Common interaction models: Key/Value, Column, Document
    NOT consistency and distribution algorithms
  One Data Store, Multiple APIs
    Brief intro to GigaSpaces
    Key/Value challenges
    SQL challenges: ad hoc querying, relationships (JPA)

Slide 4

A Few (More) Words About SQL

Slide 5

SQL
  (Usually) centralized
  Transactional, consistent
  Hard to scale

Slide 6

SQL
  Static, normalized data schema
  Don't duplicate, use FKs

Slide 7

SQL
  Ad hoc query support
  Model first, query later

Slide 8

SQL
  Standard
  Well known
  Rich ecosystem

Slide 9

(Brief) NoSQL Recap

Slide 10

NoSQL (or a Naive Attempt to Define It)
  A loosely coupled collection of non-relational data stores

Slide 11

NoSQL (or a Naive Attempt to Define It)
  (Mostly) d i s t r i b u t e d

Slide 12

NoSQL (or a Naive Attempt to Define It)
  Scalable (up & out)

Slide 13

NoSQL (or a Naive Attempt to Define It)
  Not (always) ACID
  BASE, anyone?

Slide 14

Why Now?
  Timing is everything…
  Exponential increase in data & throughput
  Non-structured or semi-structured data that changes frequently

Slide 15

A Universe of Data Models
  Key / Value
  Column
  Document
[Figure: the three models side by side; the document model is illustrated with a sample such as { "name":"uri", "ssn":"213445", "hobbies":["…","…"], "…": { "…":"…", "…":"…" } }]

Slide 16

Key/Value
  Have the key? Get the value
    That's about it when it comes to querying
    Map/Reduce (sometimes)
  Good for
    Cache aside (e.g. Hibernate 2nd level cache)
    Simple, id-based interactions (e.g. user profiles)
  In most cases, values are opaque
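The cache-aside use mentioned above can be sketched in a few lines. This is a minimal illustration, not any product's API: the `CacheAside` class, the `load_from_db` callback, and the `profiles` dictionary standing in for a database are all hypothetical names.

```python
class CacheAside:
    """Check the key/value store first; on a miss, load from the system
    of record and remember the result."""

    def __init__(self, load_from_db):
        self._cache = {}                    # stand-in for a key/value store
        self._load_from_db = load_from_db   # hypothetical loader callback
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]         # hit: one lookup by key
        self.misses += 1
        value = self._load_from_db(key)     # miss: go to the database
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Called by the application after it updates the database.
        self._cache.pop(key, None)


profiles = {"u1": {"name": "uri"}}          # hypothetical backing database
cache = CacheAside(profiles.get)
first = cache.get("u1")                     # miss, loaded from profiles
second = cache.get("u1")                    # hit, served from the cache
```

Note that the store never looks inside the value: it is fetched, stored, and returned as an opaque object, which is why querying stops at "have the key, get the value".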

Slide 17

Key/Value
  Scaling out is relatively easy (just hash the keys)
  Some will do that automatically for you
  Fixed vs. consistent hashing
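To make the "fixed vs. consistent hashing" distinction concrete, here is a rough sketch that counts how many keys change owner when a fifth node joins a four-node cluster. The node names, the key set, the use of MD5 as a stable hash, and the choice of 100 virtual nodes per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def fixed_partition(key: str, num_nodes: int) -> int:
    # Fixed hashing: the owner is simply hash modulo the node count.
    return stable_hash(key) % num_nodes


class ConsistentRing:
    """Consistent hashing: nodes own arcs of a hash ring."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, stable_hash(key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user-{i}" for i in range(1000)]

# Fixed hashing: growing from 4 to 5 nodes remaps most keys.
moved_fixed = sum(fixed_partition(k, 4) != fixed_partition(k, 5) for k in keys)

# Consistent hashing: only keys that now belong to the new node move.
before = ConsistentRing(["n0", "n1", "n2", "n3"])
after = ConsistentRing(["n0", "n1", "n2", "n3", "n4"])
moved_consistent = sum(before.node_for(k) != after.node_for(k) for k in keys)
```

With fixed hashing roughly four keys in five change owner in this scenario, while with the ring only the keys claimed by the new node move. The price is a more complex lookup and less predictable placement.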

Slide 18

Key/Value
  Implementations: Memcached, Redis, Riak
  In-memory data grids (mostly Java-based) started this way
    GigaSpaces, Oracle Coherence, WebSphere XS, JBoss Infinispan, etc.

Slide 19

Column Based

Slide 20

Column Based
  Mostly derived from Google's BigTable / Amazon Dynamo papers
  One giant table of rows and columns
    Column == pair (name and a value, sometimes a timestamp)
    Each row can have a different number of columns
    Table is sparse: (#rows) × (#columns) ≥ (#values)
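One way to picture this model is a map of row keys to per-row maps of columns, where each column carries a value and a timestamp. The sketch below is an in-memory toy, not the storage format of any particular column store, and the row keys and values are made-up sample data.

```python
import time

# row key -> {column name -> (value, timestamp)}
table = {}


def put(row_key, column, value, timestamp=None):
    # Each column is a name/value pair, stamped with a write time.
    ts = time.time() if timestamp is None else timestamp
    table.setdefault(row_key, {})[column] = (value, ts)


# Rows are free to carry different columns.
put("user:1", "name", "uri", timestamp=1)
put("user:1", "twitter", "@uri1803", timestamp=1)
put("user:2", "name", "gaga", timestamp=2)
put("user:2", "hobby", "singing", timestamp=2)
put("user:2", "album", "The Fame", timestamp=3)

num_rows = len(table)
all_columns = {column for columns in table.values() for column in columns}
num_values = sum(len(columns) for columns in table.values())

# Sparse table: most (row, column) cells are simply absent.
is_sparse = num_rows * len(all_columns) >= num_values
```

Here two rows and four distinct column names give eight possible cells, of which only five hold a value, which is the sparsity inequality from the slide.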

Slide 21

Column Based
  Query on row key
    Or column value (aka secondary index)
  Good for a constantly changing (albeit flat) domain model
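The two query paths can be sketched as a primary lookup by row key plus a separately maintained secondary index keyed by column value. The function names and the sample data (including the company name "Acme") are hypothetical, and real stores differ in how and when they update such indexes.

```python
rows = {}    # row key -> {column name -> value}
index = {}   # (column name, value) -> set of row keys


def put(row_key, column, value):
    row = rows.setdefault(row_key, {})
    if column in row:
        # Drop the stale index entry before overwriting the column.
        index[(column, row[column])].discard(row_key)
    row[column] = value
    index.setdefault((column, value), set()).add(row_key)


def get_by_key(row_key):
    # Primary access path: direct lookup on the row key.
    return rows.get(row_key)


def find_by_column(column, value):
    # Secondary access path: lookup through the column-value index.
    return sorted(index.get((column, value), set()))


put("u1", "company", "GigaSpaces")
put("u2", "company", "GigaSpaces")
put("u3", "company", "Acme")
put("u2", "company", "Acme")       # overwrite moves u2 in the index

at_gigaspaces = find_by_column("company", "GigaSpaces")
at_acme = find_by_column("company", "Acme")
```

Every write now touches the index as well as the row, which is the usual cost of querying by anything other than the key.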

Slide 22

Document
  Think JSON (or BSON, or XML)

{
  "name": "Lady Gaga",
  "ssn": "213445",
  "hobbies": ["Dressing up", "Singing"],
  "albums": [
    {"name": "The Fame", "release_year": "2008"},
    {"name": "Born This Way", "release_year": "2011"}
  ]
}
{ ... }
{ ... }

Slide 23

Document
  Model is not flat, and the data store is aware of it
    Arrays, nested documents
  Better support for ad hoc queries
    MongoDB excels at this
  Very intuitive model
  Flexible schema

Slide 24

What if you didn't have to choose?
[Diagram labels: JPA, JDBC]

Slide 25

A Brief Intro to GigaSpaces
  In-memory data grid
  With optional write-behind to a secondary storage
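Write-behind in general means that writes are acknowledged from memory and pushed to the secondary store later. The following is a generic sketch of that idea and not GigaSpaces code: the `WriteBehindGrid` class is hypothetical, a plain dictionary stands in for the database, and the flush is triggered explicitly here where a real grid would run it asynchronously in the background.

```python
from collections import deque


class WriteBehindGrid:
    def __init__(self, store):
        self._memory = {}          # the in-memory copy serves all reads
        self._pending = deque()    # writes not yet persisted, in order
        self._store = store        # hypothetical secondary storage

    def write(self, key, value):
        # The write completes as soon as memory is updated.
        self._memory[key] = value
        self._pending.append((key, value))

    def read(self, key):
        return self._memory.get(key)

    def flush(self):
        # Drain queued writes to the secondary storage, oldest first.
        flushed = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._store[key] = value
            flushed += 1
        return flushed


database = {}
grid = WriteBehindGrid(database)
grid.write("speaker", "Uri Cohen")
persisted_before_flush = "speaker" in database   # not yet in the database
flushed_count = grid.flush()
```

The trade-off is latency against durability: the caller never waits for the database, but anything still in the queue depends on the grid's own replication to survive a failure.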

Slide 26

A Brief Intro to GigaSpaces
  Tuple based
    Aware of nested tuples (and soon collections)
    Document-like
  Rich querying and map/reduce semantics

Slide 27

A Brief Intro to GigaSpaces
  Transparent partitioning & HA
    Fixed hashing based on a chosen property

Slide 28

A Brief Intro to GigaSpaces
  Transactional (like, ACID)
    Local (single partition)
    Distributed (multiple partitions)

Slide 29

Use the Right API for the Job
  Even for the same data…
  POJO & JPA for Java apps with a complex domain model
  Document for a more dynamic view
  Memcached for simple, language-neutral data access
  JDBC for:
    Interaction with legacy apps
    Flexible ad hoc querying (e.g. projections)

Slide 30

Memcached (the Daemon is in the Details)
[Diagram label: Memcached Client]

Slide 31

Memcached (the Daemon is in the Details)
[Diagram label: Memcached Client]

Slide 32

SQL/JDBC – Query Them All
  Query may involve Map/Reduce
  Reduce phase includes merging and sorting
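A rough sketch of how an ordered query might be executed over partitions: each partition filters and sorts its own rows (the map phase), and the client merges the sorted partial results (the reduce phase). The partition contents, names, and ages are made-up sample data, and the query shown in the comment is only an assumed example.

```python
import heapq
from itertools import islice

# Roughly: SELECT name FROM people WHERE age >= 25 ORDER BY age LIMIT 3
partitions = [
    [{"name": "dana", "age": 41}, {"name": "avi", "age": 29}],
    [{"name": "uri", "age": 35}],
    [{"name": "noa", "age": 23}, {"name": "gil", "age": 52}],
]


def map_phase(partition, min_age):
    # Runs on each partition: filter locally, return a sorted result.
    matches = [row for row in partition if row["age"] >= min_age]
    return sorted(matches, key=lambda row: row["age"])


def reduce_phase(partial_results, limit=None):
    # Runs on the client: k-way merge of already sorted partial results.
    merged = heapq.merge(*partial_results, key=lambda row: row["age"])
    return list(islice(merged, limit))


partials = [map_phase(partition, min_age=25) for partition in partitions]
result = reduce_phase(partials, limit=3)
names = [row["name"] for row in result]   # ["avi", "uri", "dana"]
```

Because each partial result arrives sorted, the client only merges instead of re-sorting everything, but it still has to receive rows from every partition before it can answer.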

Slide 33

SQL/JDBC – Things to Consider
  Unique and FK constraints are not practically enforceable
  Sorting and aggregation may be expensive
  Distributed transactions are evil
    Stay local…

Slide 34

JPA
  It's all about relationships…

Slide 35

JPA Relationships
  To embed or not to embed, that is the question…
  Embed:
    Easy to partition and scale
    Easy to query: user.accounts[*].type = 'checking'
    Owned relationships only
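The embedded case can be sketched with plain dictionaries: because accounts live inside their user, they are routed with it, and the path query from the slide is answered by each partition on its own. The sample users, the two-partition layout, and the helper names are assumptions for illustration only.

```python
NUM_PARTITIONS = 2

users = [
    {"id": 1, "name": "uri",
     "accounts": [{"type": "checking", "balance": 100},
                  {"type": "savings", "balance": 900}]},
    {"id": 2, "name": "dana",
     "accounts": [{"type": "savings", "balance": 50}]},
]


def partition_of(user):
    # Fixed hashing on the user id; embedded accounts travel with the user.
    return user["id"] % NUM_PARTITIONS


def has_account_of_type(user, account_type):
    # Equivalent of: user.accounts[*].type = account_type
    return any(acc["type"] == account_type for acc in user["accounts"])


partitions = {p: [] for p in range(NUM_PARTITIONS)}
for user in users:
    partitions[partition_of(user)].append(user)

# Each partition evaluates the predicate locally; no join is needed.
checking_users = sorted(
    user["name"]
    for members in partitions.values()
    for user in members
    if has_account_of_type(user, "checking")
)
```

The limitation is ownership: an account embedded in one user cannot also be shared by another without duplicating it.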

Slide 36

JPA Relationships
  To embed or not to embed, that is the question…
  Don't embed:
    Any type of relationship
    Partitioning is hard
    Querying involves joining
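When accounts are separate entities that reference their owner, the same kind of question turns into a join, and the two sides may sit on different partitions. The sketch below counts those cross-partition hops; the entities, the id-based routing, and the function name are hypothetical.

```python
NUM_PARTITIONS = 2

users = [{"id": 1, "name": "uri"}, {"id": 2, "name": "dana"}]
accounts = [
    {"id": 10, "owner_id": 1, "type": "checking"},
    {"id": 11, "owner_id": 2, "type": "savings"},
    {"id": 12, "owner_id": 2, "type": "checking"},
]


def partition_of(entity):
    # Each entity is routed by its own id, so owner and account may split.
    return entity["id"] % NUM_PARTITIONS


user_partitions = {p: {} for p in range(NUM_PARTITIONS)}
for user in users:
    user_partitions[partition_of(user)][user["id"]] = user

account_partitions = {p: [] for p in range(NUM_PARTITIONS)}
for account in accounts:
    account_partitions[partition_of(account)].append(account)


def users_with_account_type(account_type):
    names = set()
    cross_partition_lookups = 0
    for partition, local_accounts in account_partitions.items():
        for account in local_accounts:
            if account["type"] != account_type:
                continue
            owner_partition = account["owner_id"] % NUM_PARTITIONS
            if owner_partition != partition:
                cross_partition_lookups += 1   # the join leaves the partition
            owner = user_partitions[owner_partition][account["owner_id"]]
            names.add(owner["name"])
    return sorted(names), cross_partition_lookups


checking_owners, remote_hops = users_with_account_type("checking")
```

Routing accounts by `owner_id` instead of their own id would co-locate them with their users, but that only works when each account has a single owner, which is exactly the restriction the non-embedded model is meant to lift.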

Slide 37

Summary
  One API doesn't fit all
    Use the right API for the job
  Know the tradeoffs
    Always ask what you're giving up, not just what you're gaining

Slide 38

Thank YOU!
@uri1803

Summary: From QCon SF 2010. This presentation focuses on how classic querying models such as plain SQL and JPA map to distributed data stores. It first reviews the current distributed data store landscape and its querying models, and then discusses the wide range of APIs for extracting data from these stores. It closes with the main challenges of mapping various APIs to a distributed data model and the trade-offs to be aware of.

Tags: nosql sql gigaspaces