Author |
Brendan Gregg (US) |
Series |
原味精品书系 (Original-Language Premium Series) |
Publisher |
Publishing House of Electronics Industry |
ISBN |
9787121386947 |
Summary |
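This book is a comprehensive introduction to BPF, covering the technology from its origins to its future directions. It systematically explains the BPF programming model, gives complete coverage of the two main BPF front-end frameworks, BCC and bpftrace, and provides a series of example implementations that show what BPF can do in practice and where it is heading. The book's other focus is performance tuning of Linux systems and applications, including tuning strategies, tools, and case studies. Besides presenting the relevant BPF tools, it emphasizes how these tools work alongside traditional Linux performance tools, so that readers can choose the best approach for each problem. The tools are small and focused, with short, readable source code, and they demonstrate the appeal of BPF: safe, efficient, and fast extension of the system. BPF will be used in more and more scenarios in Linux and will only grow in importance. We hope this book helps readers learn BPF and follow its development. |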
Contents |
Part I: Technologies
1 Introduction
1.1 What Are BPF and eBPF?
1.2 What Are Tracing, Snooping, Sampling, Profiling, and Observability?
1.3 What Are BCC, bpftrace, and IO Visor?
1.4 A First Look at BCC: Quick Wins
1.5 BPF Tracing Visibility
1.6 Dynamic Instrumentation: kprobes and uprobes
1.7 Static Instrumentation: Tracepoints and USDT
1.8 A First Look at bpftrace: Tracing open()
1.9 Back to BCC: Tracing open()
1.10 Summary
2 Technology Background
2.1 BPF Illustrated
2.2 BPF
2.3 Extended BPF (eBPF)
2.3.1 Why Performance Tools Need BPF
2.3.2 BPF Versus Kernel Modules
2.3.3 Writing BPF Programs
2.3.4 Viewing BPF Instructions: bpftool
2.3.5 Viewing BPF Instructions: bpftrace
2.3.6 BPF API
2.3.7 BPF Concurrency Controls
2.3.8 BPF sysfs Interface
2.3.9 BPF Type Format (BTF)
2.3.10 BPF CO-RE
2.3.11 BPF Limitations
2.3.12 BPF Additional Reading
2.4 Stack Trace Walking
2.4.1 Frame Pointer–Based Stacks
2.4.2 debuginfo
2.4.3 Last Branch Record (LBR)
2.4.4 ORC
2.4.5 Symbols
2.4.6 More Reading
2.5 Flame Graphs
2.5.1 Stack Trace
2.5.2 Profiling Stack Traces
2.5.3 Flame Graph
2.5.4 Flame Graph Features
2.5.5 Variations
2.6 Event Sources
2.7 kprobes
2.7.1 How kprobes Work
2.7.2 kprobes Interfaces
2.7.3 BPF and kprobes
2.7.4 kprobes Additional Reading
2.8 uprobes
2.8.1 How uprobes Work
2.8.2 uprobes Interfaces
2.8.3 BPF and uprobes
2.8.4 uprobes Overhead and Future Work
2.8.5 uprobes Additional Reading
2.9 Tracepoints
2.9.1 Adding Tracepoint Instrumentation
2.9.2 How Tracepoints Work
2.9.3 Tracepoint Interfaces
2.9.4 Tracepoints and BPF
2.9.5 BPF Raw Tracepoints
2.9.6 Additional Reading
2.10 USDT
2.10.1 Adding USDT Instrumentation
2.10.2 How USDT Works
2.10.3 BPF and USDT
2.10.4 USDT Additional Reading
2.11 Dynamic USDT
2.12 PMCs
2.12.1 PMC Modes
2.12.2 PEBS
2.12.3 Cloud Computing
2.13 perf_events
2.14 Summary
3 Performance Analysis
3.1 Overview
3.1.1 Goals
3.1.2 Activities
3.1.3 Multiple Performance Issues
3.2 Performance Methodologies
3.2.1 Workload Characterization
3.2.2 Drill-Down Analysis
3.2.3 USE Method
3.2.4 Checklists
3.3 Linux 60-Second Analysis
3.3.1 uptime
3.3.2 dmesg | tail
3.3.3 vmstat 1
3.3.4 mpstat -P ALL 1
3.3.5 pidstat 1
3.3.6 iostat -xz 1
3.3.7 free -m
3.3.8 sar -n DEV 1
3.3.9 sar -n TCP,ETCP 1
3.3.10 top
3.4 BCC Tool Checklist
3.4.1 execsnoop
3.4.2 opensnoop
3.4.3 ext4slower
3.4.4 biolatency
3.4.5 biosnoop
3.4.6 cachestat
3.4.7 tcpconnect
3.4.8 tcpaccept
3.4.9 tcpretrans
3.4.10 runqlat
3.4.11 profile
3.5 Summary
4 BCC
4.1 BCC Components
4.2 BCC Features
4.2.1 Kernel-Level Features
4.2.2 BCC User-Level Features
4.3 BCC Installation
4.3.1 Kernel Requirements
4.3.2 Ubuntu
4.3.3 RHEL
4.3.4 Other Distributions
4.4 BCC Tools
4.4.1 Highlighted Tools
4.4.2 Tool Characteristics
4.4.3 Single-Purpose Tools
4.4.4 Multi-Purpose Tools
4.5 funccount
4.5.1 funccount Examples
4.5.2 funccount Syntax
4.5.3 funccount One-Liners
4.5.4 funccount Usage
4.6 stackcount
4.6.1 stackcount Example
4.6.2 stackcount Flame Graphs
4.6.3 stackcount Broken Stack Traces
4.6.4 stackcount Syntax
4.6.5 stackcount One-Liners
4.6.6 stackcount Usage
4.7 trace
4.7.1 trace Example
4.7.2 trace Syntax
4.7.3 trace One-Liners
4.7.4 trace Structs
4.7.5 trace Debugging File Descriptor Leaks
4.7.6 trace Usage
4.8 argdist
4.8.1 argdist Syntax
4.8.2 argdist One-Liners
4.8.3 argdist Usage
4.9 Tool Documentation
4.9.1 Man Page: opensnoop
4.9.2 Examples File: opensnoop
4.10 Developing BCC Tools
4.11 BCC Internals
4.12 BCC Debugging
4.12.1 printf() Debugging
4.12.2 BCC Debug Output
4.12.3 BCC Debug Flag
4.12.4 bpflist
4.12.5 bpftool
4.12.6 dmesg
4.12.7 Resetting Events
4.13 Summary
5 bpftrace
5.1 bpftrace Components
5.2 bpftrace Features
5.2.1 bpftrace Event Sources
5.2.2 bpftrace Actions
5.2.3 bpftrace General Features
5.2.4 bpftrace Compared to Other Observability Tools
5.3 bpftrace Installation
5.3.1 Kernel Requirements
5.3.2 Ubuntu
5.3.3 Fedora
5.3.4 Post-Build Steps
5.3.5 Other Distributions
5.4 bpftrace Tools
5.4.1 Highlighted Tools
5.4.2 Tool Characteristics
5.4.3 Tool Execution
5.5 bpftrace One-Liners
5.6 bpftrace Documentation
5.7 bpftrace Programming
5.7.1 Usage
5.7.2 Program Structure
5.7.3 Comments
5.7.4 Probe Format
5.7.5 Probe Wildcards
5.7.6 Filters
5.7.7 Actions
5.7.8 Hello, World!
5.7.9 Functions
5.7.10 Variables
5.7.11 Map Functions
5.7.12 Timing vfs_read()
5.8 bpftrace Usage
5.9 bpftrace Probe Types
5.9.1 tracepoint
5.9.2 usdt
5.9.3 kprobe and kretprobe
5.9.4 uprobe and uretprobe
5.9.5 software and hardware
5.9.6 profile and interval
5.10 bpftrace Flow Control
5.10.1 Filter
5.10.2 Ternary Operators
5.10.3 If Statements
5.10.4 Unrolled Loops
5.11 bpftrace Operators
5.12 bpftrace Variables
5.12.1 Built-in Variables
5.12.2 Built-ins: pid, comm, and uid
5.12.3 Built-ins: kstack and ustack
5.12.4 Built-ins: Positional Parameters
5.12.5 Scratch
5.12.6 Maps
5.13 bpftrace Functions
5.13.1 printf()
5.13.2 join()
5.13.3 str()
5.13.4 kstack() and ustack()
5.13.5 ksym() and usym()
5.13.6 kaddr() and uaddr()
5.13.7 system()
5.13.8 exit()
5.14 bpftrace Map Functions
5.14.1 count()
5.14.2 sum(), avg(), min(), and max()
5.14.3 hist()
5.14.4 lhist()
5.14.5 delete()
5.14.6 clear() and zero()
5.14.7 print()
5.15 bpftrace Future Work
5.15.1 Explicit Address Modes
5.15.2 Other Additions
5.15.3 ply
5.16 bpftrace Internals
5.17 bpftrace Debugging
5.17.1 printf() Debugging
5.17.2 Debug Mode
5.17.3 Verbose Mode
5.18 Summary
Part II: Using BPF Tools
6 CPUs
6.1 Background
6.1.1 CPU Fundamentals
6.1.2 BPF Capabilities
6.1.3 Strategy
6.2 Traditional Tools
6.2.1 Kernel Statistics
6.2.2 Hardware Statistics
6.2.3 Hardware Sampling
6.2.4 Timed Sampling
6.2.5 Event Statistics and Tracing
6.3 BPF Tools
6.3.1 execsnoop
6.3.2 exitsnoop
6.3.3 runqlat
6.3.4 runqlen
6.3.5 runqslower
6.3.6 cpudist
6.3.7 cpufreq
6.3.8 profile
6.3.9 offcputime
6.3.10 syscount
6.3.11 argdist and trace
6.3.12 funccount
6.3.13 softirqs
6.3.14 hardirqs
6.3.15 smpcalls
6.3.16 llcstat
6.3.17 Other Tools
6.4 BPF One-Liners
6.4.1 BCC
6.4.2 bpftrace
6.5 Optional Exercises
6.6 Summary
7 Memory
7.1 Background
7.1.1 Memory Fundamentals
7.1.2 BPF Capabilities
7.1.3 Strategy
7.2 Traditional Tools
7.2.1 Kernel Log
7.2.2 Kernel Statistics
7.2.3 Hardware Statistics and Sampling
7.3 BPF Tools
7.3.1 oomkill
7.3.2 memleak
7.3.3 mmapsnoop
7.3.4 brkstack
7.3.5 shmsnoop
7.3.6 faults
7.3.7 ffaults
7.3.8 vmscan
7.3.9 drsnoop
7.3.10 swapin
7.3.11 hfaults
7.3.12 Other Tools
7.4 BPF One-Liners
7.4.1 BCC
7.4.2 bpftrace
7.5 Optional Exercises
7.6 Summary
8 File Systems
8.1 Background
8.1.1 File Systems Fundamentals
8.1.2 BPF Capabilities
8.1.3 Strategy
8.2 Traditional Tools
8.2.1 df
8.2.2 mount
8.2.3 strace
8.2.4 perf
8.2.5 fatrace
8.3 BPF Tools
8.3.1 opensnoop
8.3.2 statsnoop
8.3.3 syncsnoop
8.3.4 mmapfiles
8.3.5 scread
8.3.6 fmapfault
8.3.7 filelife
8.3.8 vfsstat
8.3.9 vfscount
8.3.10 vfssize
8.3.11 fsrwstat
8.3.12 fileslower
8.3.13 filetop
8.3.14 writesync
8.3.15 filetype
8.3.16 cachestat
8.3.17 writeback
8.3.18 dcstat
8.3.19 dcsnoop
8.3.20 mountsnoop
8.3.21 xfsslower
8.3.22 xfsdist
8.3.23 ext4dist
8.3.24 icstat
8.3.25 bufgrow
8.3.26 readahead
8.3.27 Other Tools
8.4 BPF One-Liners
8.4.1 BCC
8.4.2 bpftrace
8.4.3 BPF One-Liners Examples
8.5 Optional Exercises
8.6 Summary
9 Disk I/O
9.1 Background
9.1.1 Disk Fundamentals
9.1.2 BPF Capabilities
9.1.3 Strategy
9.2 Traditional Tools
9.2.1 iostat
9.2.2 perf
9.2.3 blktrace
9.2.4 SCSI Logging
9.3 BPF Tools
9.3.1 biolatency
9.3.2 biosnoop
9.3.3 biotop
9.3.4 bitesize
9.3.5 seeksize
9.3.6 biopattern
9.3.7 biostacks
9.3.8 bioerr
9.3.9 mdflush
9.3.10 iosched
9.3.11 scsilatency
9.3.12 scsiresult
9.3.13 nvmelatency
9.4 BPF One-Liners
9.4.1 BCC
9.4.2 bpftrace
9.4.3 BPF One-Liners Examples
9.5 Optional Exercises
9.6 Summary
10 Networking
10.1 Background
10.1.1 Networking Fundamentals
10.1.2 BPF Capabilities
10.1.3 Strategy
10.1.4 Common Tracing Mistakes
10.2 Traditional Tools
10.2.1 ss
10.2.2 ip
10.2.3 nstat
10.2.4 netstat
10.2.5 sar
10.2.6 nicstat
10.2.7 ethtool
10.2.8 tcpdump
10.2.9 /proc
10.3 BPF Tools
10.3.1 sockstat
10.3.2 sofamily
10.3.3 soprotocol
10.3.4 soconnect
10.3.5 soaccept
10.3.6 socketio
10.3.7 socksize
10.3.8 sormem
10.3.9 soconnlat
10.3.10 so1stbyte
10.3.11 tcpconnect
10.3.12 tcpaccept
10.3.13 tcplife
10.3.14 tcptop
10.3.15 tcpsnoop
10.3.16 tcpretrans
10.3.17 tcpsynbl
10.3.18 tcpwin
10.3.19 tcpnagle
10.3.20 udpconnect
10.3.21 gethostlatency
10.3.22 ipecn
10.3.23 superping
10.3.24 qdisc-fq
10.3.25 qdisc-cbq, qdisc-cbs, qdisc-codel, qdisc-fq_codel, qdisc-red, and qdisc-tbf
10.3.26 netsize
10.3.27 nettxlat
10.3.28 skbdrop
10.3.29 skblife
10.3.30 ieee80211scan
10.3.31 Other Tools
10.4 BPF One-Liners
10.4.1 BCC
10.4.2 bpftrace
10.4.3 BPF One-Liners Examples
10.5 Optional Exercises
10.6 Summary
11 Security
11.1 Background
11.1.1 BPF Capabilities
11.1.2 Unprivileged BPF Users
11.1.3 Configuring BPF Security
11.1.4 Strategy
11.2 BPF Tools
11.2.1 execsnoop
11.2.2 elfsnoop
11.2.3 modsnoop
11.2.4 bashreadline
11.2.5 shellsnoop
11.2.6 ttysnoop
11.2.7 opensnoop
11.2.8 eperm
11.2.9 tcpconnect and tcpaccept
11.2.10 tcpreset
11.2.11 capable
11.2.12 setuids
11.3 BPF One-Liners
11.3.1 BCC
11.3.2 bpftrace
11.3.3 BPF One-Liners Examples
11.4 Summary
12 Languages
12.1 Background
12.1.1 Compiled
12.1.2 JIT Compiled
12.1.3 Interpreted
12.1.4 BPF Capabilities
12.1.5 Strategy
12.1.6 BPF Tools
12.2 C
12.2.1 C Function Symbols
12.2.2 C Stack Traces
12.2.3 C Function Tracing
12.2.4 C Function Offset Tracing
12.2.5 C USDT
12.2.6 C One-Liners
12.3 Java
12.3.1 libjvm Tracing
12.3.2 jnistacks
12.3.3 Java Thread Names
12.3.4 Java Method Symbols
12.3.5 Java Stack Traces
12.3.6 Java USDT Probes
12.3.7 profile
12.3.8 offcputime
12.3.9 stackcount
12.3.10 javastat
12.3.11 javathreads
12.3.12 javacalls
12.3.13 javaflow
12.3.14 javagc
12.3.15 javaobjnew
12.3.16 Java One-Liners
12.4 Bash Shell
12.4.1 Function Counts
12.4.2 Function Argument Tracing (bashfunc.bt)
12.4.3 Function Latency (bashfunclat.bt)
12.4.4 /bin/bash
12.4.5 /bin/bash USDT
12.4.6 bash One-Liners
12.5 Other Languages
12.5.1 JavaScript (Node.js)
12.5.2 C++
12.5.3 Golang
12.6 Summary
13 Applications
13.1 Background
13.1.1 Application Fundamentals
13.1.2 Application Example: MySQL Server
13.1.3 BPF Capabilities
13.1.4 Strategy
13.2 BPF Tools
13.2.1 execsnoop
13.2.2 threadsnoop
13.2.3 profile
13.2.4 threaded
13.2.5 offcputime
13.2.6 offcpuhist
13.2.7 syscount
13.2.8 ioprofile
13.2.9 libc Frame Pointers
13.2.10 mysqld_qslower
13.2.11 mysqld_clat
13.2.12 signals
13.2.13 killsnoop
13.2.14 pmlock and pmheld
13.2.15 naptime
13.2.16 Other Tools
13.3 BPF One-Liners
13.3.1 BCC
13.3.2 bpftrace
13.4 BPF One-Liners Examples
13.4.1 Counting libpthread Conditional Variable Functions for One Second
13.5 Summary
14 Kernel
14.1 Background
14.1.1 Kernel Fundamentals
14.1.2 BPF Capabilities
14.2 Strategy
14.3 Traditional Tools
14.3.1 Ftrace
14.3.2 perf sched
14.3.3 slabtop
14.3.4 Other Tools
14.4 BPF Tools
14.4.1 loads
14.4.2 offcputime
14.4.3 wakeuptime
14.4.4 offwaketime
14.4.5 mlock and mheld
14.4.6 Spin Locks
14.4.7 kmem
14.4.8 kpages
14.4.9 memleak
14.4.10 slabratetop
14.4.11 numamove
14.4.12 workq
14.4.13 Tasklets
14.4.14 Other Tools
14.5 BPF One-Liners
14.5.1 BCC
14.5.2 bpftrace
14.6 BPF One-Liners Examples
14.7 Challenges
14.8 Summary
15 Containers
15.1 Background
15.1.1 BPF Capabilities
15.1.2 Challenges
15.1.3 Strategy
15.2 Traditional Tools
15.2.1 From the Host
15.2.2 From the Container
15.2.3 systemd-cgtop
15.2.4 kubectl top
15.2.5 docker stats
15.2.6 /sys/fs/cgroups
15.2.7 perf
15.3 BPF Tools
15.3.1 runqlat
15.3.2 pidnss
15.3.3 blkthrot
15.3.4 overlayfs
15.4 BPF One-Liners
15.5 Optional Exercises
15.6 Summary
16 Hypervisors
16.1 Background
16.1.1 BPF Capabilities
16.1.2 Suggested Strategies
16.2 Traditional Tools
16.3 Guest BPF Tools
16.3.1 Xen Hypercalls
16.3.2 xenhyper
16.3.3 Xen Callbacks
16.3.4 cpustolen
16.3.5 HVM Exit Tracing
16.4 Host BPF Tools
16.4.1 kvmexits
16.4.2 Future Work
16.5 Summary
Part III: Additional Topics
17 Other BPF Performance Tools
17.1 Vector and Performance Co-Pilot (PCP)
17.1.1 Visualizations
17.1.2 Visualization: Heat Maps
17.1.3 Visualization: Tabular Data
17.1.4 BCC Provided Metrics
17.1.5 Internals
17.1.6 Installing PCP and Vector
17.1.7 Connecting and Viewing Data
17.1.8 Configuring the BCC PMDA
17.1.9 Future Work
17.1.10 Further Reading
17.2 Grafana and Performance Co-Pilot (PCP)
17.2.1 Installation and Configuration
17.2.2 Connecting and Viewing Data
17.2.3 Future Work
17.2.4 Further Reading
17.3 Cloudflare eBPF Prometheus Exporter (with Grafana)
17.3.1 Build and Run the ebpf Exporter
17.3.2 Configure Prometheus to Monitor the ebpf_exporter Instance
17.3.3 Set Up a Query in Grafana
17.3.4 Further Reading
17.4 kubectl-trace
17.4.1 Tracing Nodes
17.4.2 Tracing Pods and Containers
17.4.3 Further Reading
17.5 Other Tools
17.6 Summary
18 Tips, Tricks, and Common Problems
18.1 Typical Event Frequency and Overhead
18.1.1 Frequency
18.1.2 Action Performed
18.1.3 Test Yourself
18.2 Sample at 49 or 99 Hertz
18.3 Yellow Pigs and Gray Rats
18.4 Write Target Software
18.5 Learn Syscalls
18.6 Keep It Simple
18.7 Missing Events
18.8 Missing Stack Traces
18.8.1 How to Fix Broken Stack Traces
18.9 Missing Symbols (Function Names) When Printing
18.9.1 How to Fix Missing Symbols: JIT Runtimes (Java, Node.js, ...)
18.9.2 How to Fix Missing Symbols: ELF Binaries (C, C++, ...)
18.10 Missing Functions When Tracing
18.11 Feedback Loops
18.12 Dropped Events
Part IV: Appendixes
A bpftrace One-Liners
B bpftrace Cheat Sheet
C BCC Tool Development
D C BPF
E BPF Instructions
Glossary
Bibliography |