OSDI
2024
OSDI 2024 has the following sessions:
- Memory
- Low-Latency LLM Serving
- Distributed
- Deep Learning
- Operating System
- Cloud Computing
- Formal Verification
- Cloud Security
- Data Management
- Analysis of Correctness (formal verification, reliability, etc.)
- ML Scheduling
A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications
Existing far-memory systems take one of two implementation paths:
- Use the kernel's paging system
- Bypass the kernel and fetch data at object granularity
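The cost difference between the two paths can be sketched as a toy model (sizes and function names are hypothetical, not from the paper): the paging plane transfers whole pages, so small objects suffer read amplification, while the object plane fetches only the bytes requested.

```python
PAGE_SIZE = 4096  # bytes, a typical kernel page size

def paging_fetch(obj_size: int) -> int:
    """Kernel-paging data plane: a page fault pulls in whole pages,
    so even a small object costs at least one full page of transfer."""
    pages = (obj_size + PAGE_SIZE - 1) // PAGE_SIZE
    return pages * PAGE_SIZE

def object_fetch(obj_size: int, header: int = 16) -> int:
    """Kernel-bypass data plane: the runtime fetches exactly the object
    plus a small per-object header, avoiding read amplification."""
    return obj_size + header

# A 64-byte object suffers 64x read amplification under paging:
assert paging_fetch(64) == 4096
assert object_fetch(64) == 80
```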
Sabre: Hardware-Accelerated Snapshot Compression for Serverless MicroVMs
Motivation: cold starts, VM snapshotting.
Backgrounds:
- Firecracker can snapshot the full guest memory or only the dirty pages.
- Working sets of pages can make serverless VM snapshots smaller and faster to fetch.
- Hardware implementations of (de)compression include Intel In-Memory Analytics Accelerator (IAA)
Contribution:
- Characterizes the IAA accelerator on a diverse set of benchmarks, showing its potential for compressing memory pages.
- Builds Sabre and integrates it with the Firecracker virtual machine monitor (VMM) in a serverless environment with snapshotting support.
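The observation behind snapshot compression — guest-memory pages are often highly compressible — can be illustrated in software. Below is a minimal sketch using Python's `zlib` as a software stand-in (Sabre itself offloads Deflate-style compression to IAA hardware); the page contents are made up.

```python
import zlib

PAGE = 4096

zero_page = bytes(PAGE)  # untouched guest memory: all zeros
text_page = (b"GET /index.html HTTP/1.1\r\n" * 160)[:PAGE]  # repetitive data

# Per-page compression, as snapshot compression would apply it.
for name, page in [("zero", zero_page), ("text", text_page)]:
    ratio = len(page) / len(zlib.compress(page))
    print(name, "compression ratio:", round(ratio, 1))

assert len(zlib.compress(zero_page)) < 64    # zero pages collapse to almost nothing
assert len(zlib.compress(text_page)) < PAGE  # repetitive pages shrink substantially
```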
Fairness in Serving Large Language Models
🏫: UCB Skylab
LLM serving scheduling.
The first author (Ying Sheng) will be joining UCLA in Fall 2026.
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
🏫: University of Sydney
ServiceLab: Preventing Tiny Performance Regressions at Hyperscale through Pre-Production Testing
⚠️ Industrial paper from Meta Platforms.
Motivation: detect small performance regressions, sometimes as tiny as 0.01%, on a serverless platform with millions of machines.
Contribution:
- (Heterogeneous cloud machines) Performance variance between two machines is comparable when they share the same instance type, CPU architecture, kernel version, and datacenter region, and have CPU turbo disabled.
- (Detect small regressions)
- (Support diverse services) ServiceLab takes the record-and-replay approach for testing.
Performance variance has three sources: 1. accidents, 2. environment, 3. true regressions. After analysis and filtering, the performance invariants are: kernel version, ServerType, CPU architecture, and datacenter region.
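A minimal sketch of how such invariants might be applied (record fields and values are hypothetical, not ServiceLab's actual schema): only runs sharing every invariant attribute are grouped for comparison, so environmental variance does not masquerade as a regression.

```python
from collections import defaultdict

# Invariant attributes: latency is only comparable between machines
# that agree on all of them.
INVARIANTS = ("kernel", "server_type", "cpu_arch", "region")

runs = [
    {"kernel": "5.12", "server_type": "T1", "cpu_arch": "x86", "region": "us", "latency_ms": 10.0},
    {"kernel": "5.12", "server_type": "T1", "cpu_arch": "x86", "region": "us", "latency_ms": 10.1},
    {"kernel": "6.1",  "server_type": "T1", "cpu_arch": "x86", "region": "us", "latency_ms": 9.2},
]

groups = defaultdict(list)
for r in runs:
    key = tuple(r[k] for k in INVARIANTS)  # compare apples to apples only
    groups[key].append(r["latency_ms"])

# Two distinct kernel versions yield two incomparable groups.
assert len(groups) == 2
```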
Optimizing Resource Allocation in Hyperscale Datacenters: Scalability, Usability, and Experiences
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
🏫: Microsoft Research
Performance Interfaces for Hardware Accelerators
🏫: EPFL, Dependable System Group
ACCL+: an FPGA-Based Collective Engine for Distributed Applications
🏫: ETH Zurich, Systems Group
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
🏫: University of Edinburgh
Serverless inference takes a user's model and its parameters as input, stored in a checkpoint storage system. When a request arrives, the scheduler selects available GPUs to load these checkpoints, and a router directs the request to the selected GPUs. This design generally suffers from high latency and cold starts.
Core idea: leverage the multi-tier storage hierarchy for local checkpoint storage, harnessing its significant aggregate bandwidth for efficient checkpoint loading.
Challenges:
- Bandwidth
- Live migration of inference, with two types: (1) token-only (2) full kv-cache
- Predict the resource consumption
Multi-tier storage hierarchy: 1. memory, 2. NVMe SSD, 3. SATA SSD
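The loading strategy can be sketched as a toy model (tier bandwidths, function names, and the remote fallback speed are assumptions, not figures from the paper): pick the fastest local tier that already holds the checkpoint, and fall back to the remote checkpoint store otherwise.

```python
# Hypothetical per-tier bandwidths (GB/s), fastest first.
TIERS = [("memory", 50.0), ("nvme_ssd", 7.0), ("sata_ssd", 0.5)]

def load_time(model: str, size_gb: float, cache: dict) -> float:
    """Estimated checkpoint-load latency: use the fastest local tier
    holding the checkpoint; otherwise fetch from remote storage."""
    for tier, bw in TIERS:
        if model in cache.get(tier, set()):
            return size_gb / bw
    return size_gb / 0.1  # remote checkpoint store (assumed 0.1 GB/s)

cache = {"nvme_ssd": {"llama-13b"}}
assert load_time("llama-13b", 26, cache) == 26 / 7.0   # local NVMe hit
assert load_time("opt-66b", 132, cache) == 132 / 0.1   # cold: remote fetch
```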
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
🏫: UCSD
Llumnix: Dynamic Scheduling for Large Language Model Serving
⚠️ Industrial paper from Alibaba.
DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency
When will my ML Job finish? Toward providing Completion Time Estimates through Predictability-Centric Scheduling
🏫: Tufts University
A new scheduler to reach a balance between predictability and practicality.
2023
Honeycomb: Secure and Efficient GPU Executions via Static Validation
TAILCHECK: A Lightweight Heap Overflow Detection Mechanism with Page Protection and Tagged Pointers
2022
Automatic Reliability Testing for Cluster Management Controllers
KSplit: Automating Device Driver Isolation
Combines static analysis and kernel isolation.
RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure
Memory leakage, cloud infrastructure
TODO
XRP: In-Kernel Storage Functions with eBPF
TODO
zIO: Accelerating IO-Intensive Applications with Transparent Zero-Copy IO
TODO
Design and Verification of the Arm Confidential Compute Architecture
TODO
CAP-VMs: Capability-Based Isolation and Sharing in the Cloud
TODO
Application-Informed Kernel Synchronization Primitives
TODO
Operating System Support for Safe and Efficient Auxiliary Execution
Auxiliary tasks: tasks for fault detection, performance monitoring, online diagnosis, resource management, etc.
Three protection scenarios:
- application extensibility: protect the main program from untrusted extension code.
- secure partitioning: protect sensitive procedures from a compromised main application.
- maintenance: protect the main application from trusted maintenance code.
BlackBox: A Container Security Monitor for Protecting Containers on Untrusted Operating Systems
Terminology
- TCB: trusted computing base, can be a metric of LOC.
- CSM: container security monitor, serves as the TCB in BlackBox.
BlackBox: fine-grained protection of container data confidentiality and integrity without needing to trust the OS.
2021
NrOS: Effective Replication and Sharing in an Operating System
2020
Do OS abstractions make sense on FPGAs?
TODO
Testing Configuration Changes in Context to Prevent Production Failures
ctest's two targets: (1) misconfigurations, (2) bugs in code exposed by configuration changes.
ctest is parameterized.
ctest uses dynamic analysis, instrumenting the GET and SET APIs of configuration abstractions.
ctest exempts parameters that implicitly assume values.
ctest uses heuristics to automatically generate values for validation.
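The GET/SET instrumentation idea can be sketched as follows (class, method, and parameter names are hypothetical, not ctest's actual API): wrapping the configuration abstraction records which parameters the exercised code reads, and the same test can then be re-run under changed values.

```python
class Conf:
    """Hypothetical configuration abstraction with instrumented GET/SET,
    mirroring how ctest tracks the parameters a test exercises."""
    def __init__(self, values):
        self.values = dict(values)
        self.reads, self.writes = set(), set()

    def get(self, key):
        self.reads.add(key)       # dynamic analysis: record the read
        return self.values[key]

    def set(self, key, value):
        self.writes.add(key)      # record the write
        self.values[key] = value

def flush_if_needed(conf):
    # Code under test: its behavior depends on one parameter.
    return "flush" if conf.get("buffer.size") > 1024 else "buffer"

# Parameterized test: rerun the same logic under a changed value.
conf = Conf({"buffer.size": 512})
assert flush_if_needed(conf) == "buffer"
conf.set("buffer.size", 4096)
assert flush_if_needed(conf) == "flush"
assert conf.reads == {"buffer.size"}  # this test covers this parameter
```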
Toward a Generic Fault Tolerance Technique for Partial Network Partitioning