Lecture 04

  • Lecture 03 recap

    • Distributed system design as a graph mapping problem
    • Performance: throughput, responsiveness
    • Quality of Service (QoS)
    • Dependability (Reliability): Correctness, Fault-tolerance, Security
      • correctness: the system correctly implements its specification (which may still fail to meet the customer's needs)
      • fault-tolerance: achieved through the use of redundancy
  • Fundamental models

    • Questions: Main entities and their interaction? Characteristics that change the behavior?
    • Purpose: Specify assumptions, make generalizations
  • Interaction model

    • As the number of components increases, the system's behavior becomes harder to predict. Distributed systems are complex systems with emergent behavior, which makes them hard to analyze. Instead, we often rely on simulation.
    • Important factors: latency, bandwidth, jitter (variation in delay or response time from one message to the next), clocks and timing
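
To make latency and jitter concrete, here is a minimal Python sketch. It assumes jitter is measured as the mean absolute difference between consecutive delay samples, which is only one of several definitions in use. The function name `latency_and_jitter` and the sample values are made up for illustration.

```python
from statistics import mean

def latency_and_jitter(delays_ms):
    """Return (mean latency, jitter) for a list of delay samples in ms.

    Jitter here is the mean absolute difference between consecutive
    samples (an assumed definition; others exist).
    """
    if len(delays_ms) < 2:
        raise ValueError("need at least two samples")
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(delays_ms), mean(diffs)

# Hypothetical response times for four requests, in milliseconds.
samples = [20.0, 24.0, 19.0, 30.0]
avg_latency, jitter = latency_and_jitter(samples)
print(f"latency={avg_latency:.2f} ms, jitter={jitter:.2f} ms")
```

Two channels can have the same mean latency and still differ greatly in jitter, which is why the two are listed as separate factors.
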
  • Event ordering:

    • There is no guarantee on the order in which messages arrive

    • Events change the system state

    • When a message arrives, the receiver may be in any of several states, depending on which events it has already processed

    • Synchronous vs. asynchronous: whether or not the system has a global clock

      • In a synchronous system, a faster process has to wait for the next clock tick.
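
The points about arrival order and receiver state can be illustrated with a small simulation. This is a sketch under stated assumptions: network delay is modeled as a uniform random value between 0 and 5 time units, and the operation names `add10` and `double` are invented for the example.

```python
import random

def deliver(messages, seed):
    """Give each (send_time, op) message a random delay; return ops in arrival order."""
    rng = random.Random(seed)
    arrivals = [(send_time + rng.uniform(0.0, 5.0), op) for send_time, op in messages]
    arrivals.sort(key=lambda pair: pair[0])
    return [op for _, op in arrivals]

def apply_ops(ops, state=1):
    """Apply each event to the receiver state; the result depends on the order."""
    for op in ops:
        if op == "add10":
            state += 10
        elif op == "double":
            state *= 2
    return state

# "add10" is sent at time 0 and "double" at time 1, yet either may arrive first.
messages = [(0.0, "add10"), (1.0, "double")]
outcomes = {apply_ops(deliver(messages, seed)) for seed in range(100)}
print(sorted(outcomes))
```

Processing `add10` then `double` leaves the state at 22, while the reverse order leaves it at 12, so the same two messages can produce different receiver states depending only on delivery timing.
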
  • Failures

    • Types of failure: fail-stop, crash, omission, Byzantine
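    • History: the Byzantine Empire, as a defeated state in history, was given a bad reputation. This is why Byzantine failures, in which a component may behave arbitrarily, are considered the worst kind of failure in distributed systems.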
    • Timing failures: Clock (Clock rate too fast), Performance 1 (Process exceeds clock interval), Performance 2 (Message transmission takes too long)
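
The three timing failures can be expressed as checks against known bounds. The sketch below is illustrative only: the class `TimingBounds`, the function `classify_timing_failures`, and all numeric bounds are hypothetical. It also assumes that a clock running too slow counts as a clock failure, which goes beyond the "too fast" case in the notes.

```python
from dataclasses import dataclass

@dataclass
class TimingBounds:
    max_clock_drift: float  # allowed drift rate, seconds of error per second
    max_step_time: float    # allowed time for one processing step, seconds
    max_msg_delay: float    # allowed message transmission time, seconds

def classify_timing_failures(clock_drift, step_time, msg_delay, bounds):
    """Return the names of the timing bounds that the observation violates."""
    failures = []
    # Assumption: drift in either direction beyond the bound is a clock failure.
    if abs(clock_drift) > bounds.max_clock_drift:
        failures.append("clock")
    if step_time > bounds.max_step_time:
        failures.append("performance (process)")
    if msg_delay > bounds.max_msg_delay:
        failures.append("performance (channel)")
    return failures

# Hypothetical bounds and one hypothetical observation.
bounds = TimingBounds(max_clock_drift=1e-5, max_step_time=0.010, max_msg_delay=0.100)
observed = classify_timing_failures(
    clock_drift=2e-5, step_time=0.004, msg_delay=0.250, bounds=bounds
)
print(observed)
```

Checks like these only make sense when the bounds are known in advance.
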
  • We program individual processes; we do not directly define how the processes interact, and the resulting interaction cannot be precisely predicted. It resembles a large crowd of people moving through a space, or a flock of birds flying in a pattern that no one designed.

  • Bandwidth

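    • Data rate (e.g. 300 kb/s) is not the same thing as bandwidth
    • Bandwidth is a physical property of a channel, measured in Hz (it cannot be changed)
    • Shannon's theorem (the Shannon-Hartley theorem) links a channel's bandwidth and signal-to-noise ratio to its maximum achievable data rate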
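
The link between bandwidth and data rate is given by the Shannon-Hartley formula C = B * log2(1 + S/N), where C is the maximum data rate in bit/s, B the bandwidth in Hz, and S/N the linear signal-to-noise ratio. Below is a minimal sketch of the calculation. The function name `shannon_capacity` and the example channel (3000 Hz at 30 dB) are illustrative choices, not values from the lecture.

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """Maximum error-free data rate in bit/s for a noisy channel (Shannon-Hartley)."""
    snr_linear = 10 ** (snr_db / 10)  # convert dB to a plain power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical channel: 3000 Hz of bandwidth, 30 dB signal-to-noise ratio.
capacity = shannon_capacity(3000, 30)
print(f"{capacity:.0f} bit/s")
```

The same 3000 Hz of bandwidth gives a different maximum data rate when the noise level changes, which is why the two quantities should not be confused.
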