NoSQL and NewSQL

Slide 3

Source: http://faculty.up.edu/lulay/failure/vasacasestudy.pdf (prepared by R. Fairley)
Lessons Learned
The lessons to be learned from the sinking of the Vasa are as relevant today as in 1628. Those lessons are summarized as follows:
1. Excessive schedule pressure: The Vasa was completed under strong time constraints to meet a pressing need.
2. Changing needs: Many changes to operational characteristics were made during construction of the ship.
3. Lack of technical specifications: The (non-existent) specifications were not revised as the operational requirements changed.
4. Lack of a documented project plan: During a year-long transition in leadership it was difficult for the assistant to manage the project. This resulted in poor supervision of the various groups working on the ship (i.e., the shipwright, the ship builder, and the numerous subcontractors). There is no evidence that the new project manager (the former assistant) prepared any plans after the original shipwright died.
5. Excessive innovation: No one in Sweden, including the shipwright, had ever built a ship having two gun decks.
6. Secondary innovations: Many secondary innovations were added during construction of the Vasa to accommodate the increased length, the additional gun deck, and other changes.
7. Requirements creep: It seems that no one was aware of the degree to which the Vasa had evolved during the 2 ½ years of construction.
8. Lack of scientific methods: There were no known methods for calculating center of gravity, stiffness, and the resulting stability relationships of the Vasa.
9. Ignoring the obvious: The Vasa was launched after failing a stability test.
10. Possible mendacity: Results of the stability test were known to some but were not communicated to others.

Slide 1

The NoSQL and NewSQL

Slide 2

> SELECT * FROM 3d.speakers WHERE name='Nati Shalom'
+-------------------------------------------------------+
| Name        | Company    | Role          | Twitter    |
+-------------------------------------------------------+
| Nati Shalom | GigaSpaces | CTO & Founder | @natishalom|
+-------------------------------------------------------+

> 3d.speakers.find({name:"Nati Shalom"})
{
  name: "Nati Shalom",
  company: {
    name: "GigaSpaces",
    products: ["IMDG", "XAP", "CEAP"],
    domain: "Scaling platform"
  },
  role: "CTO",
  twitter: "@natishalom"
}

Slide 3

Before we jump on new technology..
Why the Vasa Sank?
Excessive innovation
Lack of scientific methods
Ignoring the obvious
The Vasa was designed to be one of the premier warships of the 17th century. Unfortunately, the ship sank two hours after its initial launch in an 8-knot wind.

Slide 4

Agenda
SQL
  What it is and isn’t good for
NoSQL
  Motivation & Main Concepts of Modern Distributed Data Stores
  Common interaction models: Key/Value, Column, Document
  NOT consistency and distribution algorithms
One Data Store, Multiple APIs
  Brief intro to GigaSpaces
  Key/Value challenges
  SQL challenges: Ad hoc querying, Relationships (JPA)

Slide 5

A Few (more) Words About SQL

Slide 6

SQL
(Usually) Centralized
Transactional, consistent
Hard to Scale

Slide 7

SQL
Static, normalized data schema
Don’t duplicate, use FKs

Slide 8

SQL
Ad hoc query support
Model first, query later

Slide 9

SQL
Standard
Well known
Rich ecosystem

Slide 10

(Brief) NoSQL Recap

Slide 11

NoSQL (or a Naive Attempt to Define It)
A loosely coupled collection of non-relational data stores

Slide 12

NoSQL (or a Naive Attempt to Define It)
(Mostly) distributed

Slide 13

NoSQL (or a Naive Attempt to Define It)
Scalable (Up & Out)

Slide 14

NoSQL (or a Naive Attempt to Define It)
Not (always) ACID
BASE anyone? Basically Available, Soft-state, Eventually consistent

Slide 15

Why Now?
Timing is everything…
Exponential increase in data & throughput
Unstructured or semi-structured data that changes frequently

Slide 16

A Universe of Data Models
Key / Value
Column
Document:
{ "name": "uri", "ssn": "213445", "hobbies": ["…", "…"], "…": { "…": "…", "…": "…" } }

Slide 17

Key/Value
Have the key? Get the value
That’s about it when it comes to querying
Map/Reduce (sometimes)
Good for:
  Cache aside (e.g. Hibernate 2nd level cache)
  Simple, id based interactions (e.g. user profiles)
In most cases, values are opaque
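
The cache-aside use mentioned above can be sketched roughly as follows. This is a minimal illustration, not how Hibernate or any particular cache implements it: the `CacheAside` class, its `misses` counter, and the plain dict standing in for a key/value store are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside: look in the cache first, fall back to the system of record."""

    def __init__(self, load_from_db: Callable[[str], Optional[bytes]]):
        # A plain dict stands in for a key/value store (hypothetical stand-in).
        self._cache: Dict[str, bytes] = {}
        self._load_from_db = load_from_db
        self.misses = 0  # counts how often the database had to be consulted

    def get(self, key: str) -> Optional[bytes]:
        value = self._cache.get(key)
        if value is None:
            # Cache miss: read from the database and remember the result.
            self.misses += 1
            value = self._load_from_db(key)
            if value is not None:
                self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Call after updating the database so the next read reloads the value.
        self._cache.pop(key, None)


# The value is an opaque blob: the store never looks inside it.
database = {"user:1": b'{"name": "uri"}'}
profiles = CacheAside(database.get)
first = profiles.get("user:1")   # miss, loaded from the database
second = profiles.get("user:1")  # hit, served from the cache
```

Note that the only query is "get by key"; anything richer has to be done by the application after it decodes the value.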

Slide 18

Key/Value
Scaling out is relatively easy (just hash the keys)
Some will do that automatically for you
Fixed vs. consistent hashing
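
To make the fixed vs. consistent hashing distinction concrete, here is a small sketch that compares how many keys change owner when a fifth node joins a four-node cluster. The function and class names, the node names, and the choice of MD5 with 100 virtual nodes per server are assumptions for illustration, not taken from any of the products listed in this deck.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # MD5 is used only because it is deterministic across runs, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def fixed_partition(key: str, num_nodes: int) -> int:
    """Fixed hashing: owner is hash modulo the node count."""
    return stable_hash(key) % num_nodes


class ConsistentHashRing:
    """Consistent hashing: each node owns the arcs that end at its ring points."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._ring[idx][1]


def moved_fraction(before: dict, after: dict) -> float:
    return sum(1 for k in before if before[k] != after[k]) / len(before)


keys = [f"user:{i}" for i in range(10_000)]

fixed_moved = moved_fraction(
    {k: fixed_partition(k, 4) for k in keys},
    {k: fixed_partition(k, 5) for k in keys},
)

ring4 = ConsistentHashRing(["n0", "n1", "n2", "n3"])
ring5 = ConsistentHashRing(["n0", "n1", "n2", "n3", "n4"])
ring_before = {k: ring4.node_for(k) for k in keys}
ring_after = {k: ring5.node_for(k) for k in keys}
ring_moved = moved_fraction(ring_before, ring_after)
```

With modulo hashing a key keeps its owner only when `hash % 4 == hash % 5`, so most keys are expected to move. On the ring, the only keys that move are the ones the new node takes over, which is on the order of one fifth of them.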

Slide 19

Key/Value
Implementations: Memcached, Redis, Riak
In memory data grids (mostly Java-based) started this way
GigaSpaces, Oracle Coherence, WebSphere XS, JBoss Infinispan, etc.

Slide 20

Column Based

Slide 21

Column Based
Mostly derived from Google’s BigTable / Amazon Dynamo papers
One giant table of rows and columns
Column == pair (name and a value, sometimes a timestamp)
Each row can have a different number of columns
Table is sparse: (#rows) × (#columns) ≥ (#values)

Slide 22

Column Based
Query on row key
Or column value (aka secondary index)
Good for a constantly changing (albeit flat) domain model
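
The column model described on the last two slides can be approximated in a few lines. The sketch below is a toy in-memory structure under stated assumptions (the `SparseColumnTable` name and its methods are made up for illustration, and values must be hashable); it is not the storage layout of BigTable or any real column store.

```python
import time
from collections import defaultdict


class SparseColumnTable:
    """Toy sparse table: each row holds only the columns it actually has."""

    def __init__(self):
        # row key -> {column name: (value, timestamp)}
        self._rows = defaultdict(dict)
        # secondary index: column name -> value -> set of row keys
        self._index = defaultdict(lambda: defaultdict(set))

    def put(self, row_key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        old = self._rows[row_key].get(column)
        if old is not None:
            # Keep the secondary index in step with the overwritten value.
            self._index[column][old[0]].discard(row_key)
        self._rows[row_key][column] = (value, ts)
        self._index[column][value].add(row_key)

    def get_row(self, row_key):
        """Query on row key."""
        return {col: val for col, (val, _) in self._rows.get(row_key, {}).items()}

    def find_by_column(self, column, value):
        """Query on column value through the secondary index."""
        return sorted(self._index.get(column, {}).get(value, set()))


table = SparseColumnTable()
table.put("u1", "name", "uri")
table.put("u1", "city", "Tel Aviv")
table.put("u2", "name", "nati")
table.put("u2", "city", "Tel Aviv")
table.put("u2", "twitter", "@natishalom")  # only this row has a twitter column
```

Adding a new column is just another `put`, which is why this shape suits a model that changes often but stays flat.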

Slide 23

Document
Think JSON (or BSON, or XML)
{
  "name": "Lady Gaga",
  "ssn": "213445",
  "hobbies": ["Dressing up", "Singing"],
  "albums": [
    {"name": "The fame", "release_year": "2008"},
    {"name": "Born this way", "release_year": "2011"}
  ]
}

Slide 24

Document
Model is not flat, data store is aware of it
  Arrays, nested documents
Better support for ad hoc queries
  MongoDB excels at this
Very intuitive model
Flexible schema
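
Because the store understands nesting, a query can reach into arrays and sub-documents. The following is a simplified sketch of that idea using dotted paths; `values_at` and `find` are hypothetical helpers written for this example and only loosely mimic how a document store matches criteria (in particular, they are not MongoDB's API).

```python
def values_at(doc, path):
    """Collect every value reachable through a dotted path, fanning out over arrays."""
    current = [doc]
    for part in path.split("."):
        reached = []
        for node in current:
            candidates = node if isinstance(node, list) else [node]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    reached.append(candidate[part])
        current = reached
    # Flatten a final array so that "hobbies" can match a single element.
    flat = []
    for value in current:
        if isinstance(value, list):
            flat.extend(value)
        else:
            flat.append(value)
    return flat


def find(collection, criteria):
    """Return documents where every path in criteria reaches the expected value."""
    return [
        doc
        for doc in collection
        if all(expected in values_at(doc, path) for path, expected in criteria.items())
    ]


artists = [
    {
        "name": "Lady Gaga",
        "hobbies": ["Dressing up", "Singing"],
        "albums": [
            {"name": "The fame", "release_year": "2008"},
            {"name": "Born this way", "release_year": "2011"},
        ],
    },
    {"name": "uri", "hobbies": ["Coding"]},  # no albums at all: flexible schema
]

with_album = find(artists, {"albums.name": "The fame"})
singers_2011 = find(artists, {"hobbies": "Singing", "albums.release_year": "2011"})
```

One simplification to be aware of: two criteria on the same array are not required to match the same array element here.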

Slide 25

What if you didn’t have to choose?

Slide 26

A Brief Intro to GigaSpaces
In Memory Data Grid
With optional write behind to a secondary storage
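
As a rough illustration of write behind in general (not of the GigaSpaces implementation), the sketch below serves reads and writes from memory and pushes changes to secondary storage in batches. `WriteBehindStore`, `persist_batch`, and `batch_size` are hypothetical names, and flushing happens synchronously here only to keep the example deterministic; a real grid would typically flush asynchronously.

```python
from collections import OrderedDict


class WriteBehindStore:
    """In-memory store that persists changes to secondary storage in batches."""

    def __init__(self, persist_batch, batch_size=100):
        self._memory = {}
        self._dirty = OrderedDict()  # key -> latest value not yet persisted
        self._persist_batch = persist_batch  # callable taking a list of (key, value)
        self._batch_size = batch_size

    def put(self, key, value):
        # The write completes in memory; persistence is deferred.
        self._memory[key] = value
        self._dirty[key] = value
        self._dirty.move_to_end(key)
        if len(self._dirty) >= self._batch_size:
            self.flush()

    def get(self, key):
        return self._memory.get(key)

    def flush(self):
        """Persist pending changes; repeated writes to a key collapse into one."""
        if not self._dirty:
            return 0
        batch = list(self._dirty.items())
        self._persist_batch(batch)
        self._dirty.clear()  # cleared only after persistence succeeded
        return len(batch)


disk = {}  # stand-in for the secondary storage
store = WriteBehindStore(disk.update, batch_size=3)
store.put("a", 1)
store.put("a", 2)  # overwrites the pending value, still one dirty key
store.put("b", 3)
pending_before_flush = dict(disk)  # nothing persisted yet
store.put("c", 4)  # third dirty key triggers a flush
```

The tradeoff is the usual one: writes are fast, but anything still in the dirty set is lost if the process dies before the flush, unless it is replicated elsewhere.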

Slide 27

Few things on memory and cost..
Memory can cost 10x to 100x less than a disk-based solution for high-performance apps (Stanford research)
An entire application’s data can fit into a single blade
1TB costs only $1.8k/month
® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Slide 28

A Brief Intro to GigaSpaces
Tuple based
Aware of nested tuples (and soon collections)
  Document like
Rich querying and map/reduce semantics

Slide 29

A Brief Intro to GigaSpaces
Transparent partitioning & HA
Fixed hashing based on a chosen property

Slide 30

A Brief Intro to GigaSpaces
Transactional (Like, ACID)
  Local (single partition)
  Distributed (multiple partitions)

Slide 31

Use the Right API for the Job
Even for the same data…
POJO & JPA for Java apps with complex domain model
Document for a more dynamic view
Memcached for simple, language neutral data access
JDBC for:
  Interaction with legacy apps
  Flexible ad-hoc querying (e.g. projections)

Slide 32

Memcached (the Daemon is in the Details) Memcached Client

Slide 33

JPA
It’s all about relationships…

Slide 34

JPA Relationships
To embed or not to embed, that is the question….
Easy to partition and scale
Easy to query: user.accounts[*].type = ‘checking’
Owned relationships only
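
The embedded (owned) option can be pictured as follows. This is a plain Python sketch of the idea rather than JPA code: the `User` and `Account` classes and the helper function are assumptions for illustration, and the helper corresponds to the `user.accounts[*].type = 'checking'` query shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Account:
    type: str
    balance: float


@dataclass
class User:
    user_id: str
    name: str
    # Owned relationship: accounts live inside the user and travel with it,
    # so a user and all of its accounts always sit in the same partition.
    accounts: List[Account] = field(default_factory=list)


def users_with_account_type(users, account_type):
    """Rough equivalent of: user.accounts[*].type = account_type."""
    return [u for u in users if any(a.type == account_type for a in u.accounts)]


users = [
    User("u1", "uri", [Account("checking", 120.0), Account("savings", 900.0)]),
    User("u2", "nati", [Account("savings", 50.0)]),
    User("u3", "gaga"),
]
checking_users = users_with_account_type(users, "checking")
```

Since each user carries its own accounts, the query can be answered by every partition independently, with no join across partitions. The price is that an account cannot be shared between users.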

Slide 35

JPA: Distributed relationship
Use Map/Reduce through the JPA API to handle distributed relationships
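
When related entries are not embedded, they can end up in different partitions, and the relationship has to be resolved with a scatter/gather step. Below is a minimal sketch of that map/reduce pattern in plain Python. It does not use the JPA API; the record layout, `partition_of`, `map_partition`, and `reduce_counts` are all hypothetical names chosen for the example.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    # Fixed hashing on the chosen routing property (here: the account id).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_partition(records, predicate):
    """Map step: runs next to the data and returns a small partial result."""
    return Counter(r["owner_id"] for r in records if predicate(r))


def reduce_counts(partials):
    """Reduce step: merges the partial results on the caller's side."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


accounts = [
    {"account_id": "a1", "owner_id": "u1", "type": "checking"},
    {"account_id": "a2", "owner_id": "u1", "type": "savings"},
    {"account_id": "a3", "owner_id": "u1", "type": "checking"},
    {"account_id": "a4", "owner_id": "u2", "type": "checking"},
]

# Accounts are routed by their own id, so one user's accounts may be spread out.
partitions = [[] for _ in range(3)]
for account in accounts:
    partitions[partition_of(account["account_id"], 3)].append(account)


def is_checking(record):
    return record["type"] == "checking"


checking_per_user = reduce_counts(map_partition(p, is_checking) for p in partitions)
```

Only the per-partition counters cross the network, not the account records themselves, which is the point of executing the map step where the data lives.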

Slide 36

Schema-Free (Document) API

Slide 37

What’s next.. Complete solution for Big Data
In Memory Data Grid: Real time, Event driven, Execute code with data
File Based NoSQL: Low cost storage, Write/Read scalability, Dynamic scaling
Diagram labels: Event Sources, Analytic Application, Write behind
How many Req/Day
What devices fail at the same time

Slide 38

Summary
One API doesn’t fit all
  Use the right API for the job
Combine In-Memory & File-based solutions for best cost/performance
Know the tradeoffs
  Always ask what you’re giving up, not just what you’re gaining

Slide 39

Thank YOU!
@natishalom
http://blog.gigaspaces.com

Summary: Common patterns in SQL and NoSQL and the emergence of the NewSQL model. The talk walks through common references (Google BigTable, MongoDB, Memcached) and shows how you can combine the best of each of those APIs into one common scalable data service using GigaSpaces. A live demo is also available here: http://www.youtube.com/watch?v=jC57mId3SMg

Tags: gigaspaces nosql newsql mongodb jpa scalability
