diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..a136337
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+*.pdf
diff --git a/README.markdown b/README.markdown
index 445b1d7..8dab1be 100644
--- a/README.markdown
+++ b/README.markdown
@@ -9,13 +9,13 @@ lecture and discussion. Participants will gain an intuitive understanding of
 key distributed systems terms, an overview of the algorithmic landscape, and
 explore production concerns.
 
-## What makes a thing distributed?
+## What makes a thing distributed
 
 Lamport, 1987:
 
-> A distributed system is one in which the failure of a computer
-> you didn't even know existed can render your own computer
-> unusable.
+> A distributed system is one in which the failure of a computer
+> you didn't even know existed can render your own computer
+> unusable.
 
 - First glance: \*nix boxen in our colo, running processes communicating via
   TCP or UDP.
@@ -162,7 +162,7 @@ Lamport, 1987:
   - This causes all kinds of havoc in, say, metrics collection
   - And debugging it is *hard*
   - TCP gives you flow control and repacks logical messages into packets
-  - You'll need to re-build flow-control and backpressure
+  - You'll need to re-build flow-control and back-pressure
   - TLS over UDP is a thing, but tough
 - UDP is really useful where TCP FSM overhead is prohibitive
   - Memory pressure
@@ -187,15 +187,15 @@ Lamport, 1987:
   - Caveat: Hardware can drift
   - Caveat: By *centuries*
     - NTP might not care
-    - http://rachelbythebay.com/w/2017/09/27/2153/
+    - <http://rachelbythebay.com/w/2017/09/27/2153/>
   - Caveat: NTP can still jump the clock backwards (default: delta > 128 ms)
-    - https://www.eecis.udel.edu/~mills/ntp/html/clock.html
+    - <https://www.eecis.udel.edu/~mills/ntp/html/clock.html>
   - Caveat: POSIX time is not monotonic by *definition*
   - Cloudflare 2017: Leap second at midnight UTC meant time flowed backwards
     - At the time, Go didn't offer access to CLOCK_MONOTONIC
     - Computed a negative duration, then fed it to rand.Int63n(), which panicked
     - Caused DNS resolutions to fail: 1% of HTTP requests affected for several hours
-    - https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/
+    - <https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/>
   - Caveat: The timescales you want to measure may not be attainable
   - Caveat: Threads can sleep
   - Caveat: Runtimes can sleep
@@ -252,7 +252,6 @@ Lamport, 1987:
 - I don't know who's doing it yet, but I'd bet datacenters in the future
   will offer dedicated HW interfaces for bounded-accuracy time.
 
-
 ## Review
 
 We've covered the fundamental primitives of distributed systems. Nodes
@@ -261,9 +260,6 @@ various ways. Protocols like TCP and UDP give us primitive channels for
 processes to communicate, and we can order events using clocks. Now, we'll
 discuss some high-level *properties* of distributed systems.
 
-
-
-
 ## Availability
 
 - Availability is basically the fraction of attempted operations which succeed.
@@ -309,7 +305,6 @@
   - "Apdex for the user service just dropped to 0.5; page ops!"
 - Ideally: integral of happiness delivered by your service?
 
-
 ## Consistency
 
 - A consistency model is the set of "safe" histories of events in the system
@@ -465,7 +460,6 @@
   - Monotonic Reads
   - Monotonic Writes
 
-
 ### Harvest and Yield
 
 - Fox & Brewer, 1999: Harvest, Yield, and Scalable Tolerant Systems
@@ -482,8 +476,8 @@
   - e.g. "99% of the time, you can read 90% of your prior writes"
   - Strongly dependent on workload, HW, topology, etc
   - Can tune harvest vs yield on a per-request basis (see the sketch below)
-  - "As much as possible in 10ms, please"
-  - "I need everything, and I understand you might not be able to answer"
+  - "As much as possible in 10ms, please"
+  - "I need everything, and I understand you might not be able to answer"
 
 ### Hybrid systems
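A quick illustration of that per-request harvest/yield knob. This sketch is not part of the notes, and the replica client it assumes (a blocking `get(key)` plus a `shard_id` attribute) is hypothetical; it just shows the shape of "as much as possible in 10ms, please":

```python
import concurrent.futures as cf

def read_with_deadline(replicas, key, deadline_s=0.010):
    """Fan a read out to every replica, but wait at most `deadline_s`.

    Returns (fragments, harvest): whatever shards answered in time,
    plus the fraction of shards reflected in the response. Yield stays
    high because we always return *something* by the deadline."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    # `replica.get` and `replica.shard_id` are hypothetical client APIs.
    futures = {pool.submit(replica.get, key): replica for replica in replicas}
    done, _ = cf.wait(futures, timeout=deadline_s)
    fragments = {futures[f].shard_id: f.result()
                 for f in done if f.exception() is None}
    pool.shutdown(wait=False)  # don't block on stragglers
    return fragments, len(fragments) / len(replicas)
```

A caller who needs everything can pass a generous deadline instead, and reject any response where `harvest < 1.0`.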
"99% of the time, you can read 90% of your prior writes" - Strongly dependent on workload, HW, topology, etc - Can tune harvest vs yield on a per-request basis - - "As much as possible in 10ms, please" - - "I need everything, and I understand you might not be able to answer" + - "As much as possible in 10ms, please" + - "I need everything, and I understand you might not be able to answer" ### Hybrid systems @@ -505,7 +499,6 @@ consistency models generally come at the cost of performance and availability. Next, we'll talk about different ways to build systems, from weak to strong consistency. - ## Avoid Consensus Wherever Possible ### CALM conjecture @@ -534,7 +527,6 @@ consistency. - Unordered programming with flow analysis - Can tell you where coordination *would* be required - ### Gossip - Message broadcast system @@ -552,7 +544,7 @@ consistency. - Hop up to a connector node which relays to other connector nodes - Reduces superfluous messages - Reduces latency - - Plumtree (Leit ̃ao, Pereira, & Rodrigues, 2007: Epidemic Broadcast Trees) + - Plumtree (Leitao, Pereira, & Rodrigues, 2007: Epidemic Broadcast Trees) - Push-Sum et al - Sum inputs from everyone you've received data from - Broadcast that to a random peer @@ -603,7 +595,6 @@ consistency. - Probably best in concert with stronger transactional systems - See also: COPS, Swift, Eiger, Calvin, etc - ## Fine, We Need Consensus, What Now? - The consensus problem: @@ -648,7 +639,6 @@ consistency. majority of nodes. - More during cluster transitions. - ### Paxos - Paxos is the Gold Standard of consensus algorithms @@ -740,8 +730,6 @@ transactions. Serializability and linearizability require *consensus*, which we can obtain through Paxos, ZAB, VR, or Raft. Now, we'll talk about different *scales* of distributed systems. - - ## Characteristic latencies - Latency is *never* zero @@ -791,12 +779,11 @@ can obtain through Paxos, ZAB, VR, or Raft. Now, we'll talk about different - Network is within an order of mag compared to uncached disk seeks - Or faster, in EC2 - EC2 disk latencies can routinely hit 20ms - - 200ms? - - *20,000* ms??? - - Because EBS is actually other computers - - LMAO if you think anything in EC2 is real - - Wait, *real disks do this too*? - - What even are IO schedulers? + - 200ms? *20,000* ms??? + - Because EBS is actually other computers + - LMAO if you think anything in EC2 is real + - Wait, *real disks do this too*? + - What even are IO schedulers? - But network is waaaay slower than memory/computation - If your aim is *throughput*, work units should probably take longer than a millisecond @@ -849,7 +836,6 @@ latencies are short enough for many network hops before users take notice. In geographically replicated systems, high latencies drive eventually consistent and datacenter-pinned solutions. - ## Common distributed systems ### Outsourced heaps @@ -951,7 +937,6 @@ low-latency processing of datasets, and tend to look more like frameworks than databases. Their dual, distributed queues, focus on the *messages* rather than the *transformations*. - ## A Pattern Language - General recommendations for building distributed systems @@ -1093,7 +1078,7 @@ than the *transformations*. 
@@ -1093,7 +1078,7 @@
 - Sharding for scalability
 - Avoiding coordination via CRDTs
 - Flake IDs: *mostly* time-ordered identifiers, zero-coordination
-  - See http://yellerapp.com/posts/2015-02-09-flake-ids.html
+  - See <http://yellerapp.com/posts/2015-02-09-flake-ids.html>
 - Partial availability: users can still use some parts of the system
 - Processing a queue: more consumers reduces the impact of expensive events
@@ -1290,8 +1275,6 @@ scale. As software grows, different components must scale independently, and
 we break out libraries into distinct services. Service structure goes
 hand-in-hand with teams.
 
-
-
 ## Production Concerns
 
 - More than design considerations
@@ -1362,8 +1345,8 @@ hand-in-hand with teams.
 - In relation to its dependencies
   - Which can, in turn, drive new tests
 - In a way, good monitoring is like continuous testing
-  - But not a replacement: these are distinct domains
-  - Both provide assurance that your changes are OK
+  - But not a replacement: these are distinct domains
+  - Both provide assurance that your changes are OK
 - Want high-frequency monitoring
   - Production behaviors can take place on 1ms scales
     - TCP incast
@@ -1381,13 +1364,13 @@
 - Key metrics for most systems
   - Apdex: successful response WITHIN latency SLA
   - Latency profiles: 0, 0.5, 0.95, 0.99, 1
-    - Percentiles, not means
-    - BTW you can't take the mean of percentiles either
+    - ^ Percentiles, not means
+    - ^ BTW you can't take the mean of percentiles either
   - Overall throughput
   - Queue statistics
   - Subjective experience of other systems latency/throughput
-    - The DB might think it's healthy, but clients could see it as slow
-    - Combinatorial explosion--best to use this when drilling into a failure
+    - ^ The DB might think it's healthy, but clients could see it as slow
+    - ^ Combinatorial explosion--best to use this when drilling into a failure
 - You probably have to write this instrumentation yourself
   - Invest in a metrics library
 - Out-of-the-box monitoring usually doesn't measure what really matters: your
@@ -1504,7 +1487,6 @@
 - Ask Jeff Hodges why it's hard: see his RICON West 2013 talk
 - See Zach Tellman - Everything Will Flow
 
-
 ## Review
 
 Running distributed systems requires cooperation between developers, QA, and
 special care.
 
 ### Online
 
-- Mixu has a delightful book on distributed systems with incredible detail. http://book.mixu.net/distsys/
-- Jeff Hodges has some excellent, production-focused advice. https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
-- The Fallacies of Distributed Computing is a classic text on mistaken assumptions we make designing distributed systems. http://www.rgoarchitects.com/Files/fallacies.pdf
-- Christopher Meiklejohn has a list of key papers in distributed systems. http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
+- Mixu has a delightful book on distributed systems with incredible detail.
+  <http://book.mixu.net/distsys/>
+- Jeff Hodges has some excellent, production-focused advice.
+  <https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/>
+- The Fallacies of Distributed Computing is a classic text
+  on mistaken assumptions we make designing distributed systems.
+  <http://www.rgoarchitects.com/Files/fallacies.pdf>
+- Christopher Meiklejohn has a list of key papers in distributed systems.
+  <http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html>
 
 ### Trees
 
diff --git a/pandoc_pdf.sh b/pandoc_pdf.sh
new file mode 100755
index 0000000..3287602
--- /dev/null
+++ b/pandoc_pdf.sh
@@ -0,0 +1,3 @@
+#!/bin/sh
+
+pandoc --verbose --from=markdown_github --output=aphyr-distsys-intro.pdf --variable classoption=twocolumn --standalone README.markdown
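Finally, a worked illustration of "you can't take the mean of percentiles" from the monitoring hunk above. The two hosts and their latency distributions below are invented, but the failure mode is real: averaging per-host p99s misstates the fleet-wide p99, so aggregate raw samples (or a mergeable digest) instead:

```python
import math
import random

def percentile(samples, q):
    """Nearest-rank percentile over raw samples, q in (0, 1]."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]

random.seed(1)
# Two imaginary hosts with very different latency profiles, in ms:
fast = [random.expovariate(1 / 5) for _ in range(10_000)]   # ~5 ms mean
slow = [random.expovariate(1 / 50) for _ in range(10_000)]  # ~50 ms mean

mean_of_p99s = (percentile(fast, 0.99) + percentile(slow, 0.99)) / 2
fleet_p99 = percentile(fast + slow, 0.99)
print(f"mean of per-host p99s: {mean_of_p99s:6.1f} ms")  # ≈ 127 ms
print(f"actual fleet-wide p99: {fleet_p99:6.1f} ms")     # ≈ 196 ms
```

The averaged figure hides the tail: almost all requests slower than 127 ms live on the slow host, and only the pooled samples see them.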