zoukankan      html  css  js  c++  java
  • [转]Top 10 DTrace scripts for Mac OS X

    org link: http://dtrace.org/blogs/brendan/2011/10/10/top-10-dtrace-scripts-for-mac-os-x/

    Top 10 DTrace scripts for Mac OS X

    Since version 10.5 “Leopard”, Mac OS X has had DTrace, a tool used for performance analysis and troubleshooting. It provides data for Apple’s Instruments tool, as well as a collection of command line tools that are implemented as DTrace scripts. I’m familiar with the latter as I wrote the originals for theDTraceToolkit, which Apple then customized and enhanced for Mac OS X where they are shipped by default (great!). I use them regularly to answer this question:

    why is my MacBook slow?

    I work in an office where everyone has MacBook Pros, and “why is my MacBook slow?” is a common question. Applications can become slow or unresponsive while waiting for CPU work, memory requests or disk I/O to complete.

    For people who try to ignore the slowdown, the question can become:

    why is my MacBook fan so loud?

    Standard performance analysis tools like Activity Monitor and top(1) (and any third-party tools based on the same foundation) can’t tell you some key information about activity on your system, such as how much CPU consumption is caused by short-lived processes, or which processes are causing disk I/O. DTrace, however, can see (just about) everything.

    In this post, I’ll cover the top ten Mac OS X DTrace scripts that I use for figuring out why laptops are slow or why applications are misbehaving. Most of these scripts are already installed, a few are from the new DTrace book.

    How to use DTrace

    If you’ve never run a DTrace script before or even used the command line, here’s a basic walkthrough:

    Open Terminal

    Run the “Terminal” application. You can find it here in Finder:

    You can also type “terminal” in Spotlight (the magnifying glass in the top right corner of your Mac’s screen), which should find it. I usually drag it to my Dock so it’s easy to find later:

    Customize Terminal

    When you first run Terminal, it’ll probably look like this:

    I find the default font small and hard to read. The size of the window in terms of characters is also small (80 columns by 24 rows), presumably to pay homage to original Unix terminals of that size. (Why Unix? Mac OS X is Unix under the hood: the Darwin kernel).

    You can adjust the font to your liking in Terminal->Preferences->Text. I use Monaco 13pt, with “Antialias Text” on. Under the “Window” tab is the default size, 80×24, which you can also increase later by clicking and dragging the bottom right corner of the terminal. For example, here’s my screen as I write this blog post (in a terminal-based text editor).

    Running top

    While DTrace can see everything, there are some things already covered by easy-to-use (and easy-to-type) tools, like top(1). Now that you have Terminal running, type “top -o cpu” and hit enter, which will refresh the screen showing top running processes. Type “q” to exit the top program. The output will look something like this:

    Brendan-2:~ brendan$ top -o cpu
    [...]
    PID    COMMAND      %CPU TIME     #TH  #WQ  #PORT #MREGS RPRVT  RSHRD  RSIZE
    67254- firefox-bin  98.8 04:53:45 62/1 1    743   9935+  374M+  78M    637M+
    86550  top          15.3 00:28.60 1/1  0    24    33     1748K  264K   2324K
    1068   Terminal     13.2 17:38:19 5    1    358   1326   37M    67M    61M+
    61501- Google Chrom 7.9  13:40.89 6    1    93-   556-   45M-   73M    77M-
    [...]
    

    I’ve truncated the header block to just include the columns. Look at the “%CPU” column to see which processes are making the CPUs busy, and the “RSIZE” column to see who is consuming main memory. The busiest (CPU) process will be at the top, since we sorted on cpu (“-o cpu”). The top was “firefox-bin” (Mozilla Firefox) at 98.8% CPU, which is in terms of a single processor (this has two). If firefox stayed that high you could look for the responsible tab and close it down, or restart Firefox.

    After top, I turn to DTrace.

    Running DTrace

    DTrace requires admin privileges, so to use it you’ll usually need to type in a password to authenticate, provided you have administrator access (if you aren’t sure you do, click here to see how to check).

    You can run DTrace by prefixing your DTrace commands with “sudo”, which will prompt for the password the first time around (but not for some time after that). When a DTrace command is running, you usually type Ctrl-C to end it. Here’s an example:

    Brendan-2:~ brendan$ sudo iosnoop
    Password: password
      UID   PID D    BLOCK   SIZE       COMM PATHNAME
      503 67261 W 384070496  73728  TweetDeck ??/Cookies/Cookies.plist_tmp_67261_0.dat
      503 67261 W 384070640   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
      503 67261 W 384070640   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
      503 67261 W 308056800   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
      503 67261 W 308056856   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
    ^C
    brendan@macbook:~>
    

    If sudo says “permission denied”, check your admin status.

    and here’s what happens if you forgot the “sudo”:

    Brendan-2:~ brendan$ iosnoop
    dtrace: failed to initialize dtrace: DTrace requires additional privileges
    

    Scripts

    The scripts follow. A note on style: if the script ends with “.d”, it’s a basic DTrace script. If it doesn’t, then it’s a shell-wrapped script that provides command line options. Use “-h” to list them.

    And if you’d like to learn DTrace, I’d recommend starting by reading the basic ones.

    1. iosnoop

    This “traces” disk I/O execution live. Each time a disk I/O completes, a line of output is printed to summarize it, including process name and filename details:

    Brendan-2:~ brendan$ sudo iosnoop
      UID   PID D    BLOCK   SIZE       COMM PATHNAME
      503 54079 R 286522800   4096 Google Chrome ??/Cache/data_2
      503 54079 R 286522808   4096 Google Chrome ??/Cache/data_2
      503 34852 W 385001216  53248      Adium ??/Default/.dat8824.01b
      503 65002 W 308056800   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
      503 65002 W 308056864   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
      503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
      503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
      503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
      503 54079 W 385001384  12288 Google Chrome ??/Default/Cookies-journal
      503 54079 W 385001384   4096 Google Chrome ??/Default/Cookies-journal
      503 54079 W 134993856   4096 Google Chrome ??/Default/Cookies
      503 54079 W 134994056   4096 Google Chrome ??/Default/Cookies
      503 54079 W 134994176   4096 Google Chrome ??/Default/Cookies
      503 54079 W 134994224   8192 Google Chrome ??/Default/Cookies
    [...]
    

    This lets you instantly find out which applications are using the disk, and what files they are reading or writing to. Disk I/O is typically slow (for non-SSD disks), so an application calling frequent disk I/O (a dozen per second or more) may run slowly as it waits for the disk I/O to complete.

    The output columns show: UID = user ID, PID = process ID (unique identifier for the process), D = direction (R = read, W = write), BLOCK = location on disk, SIZE = I/O size in bytes, COMM = process name, PATHNAME = trailing portion of file pathname.

    In that output I caught Google Chrome reading from a cache file (“data_2″), and writing to cookie files (“Cookies-journal” and “Cookies”). TweetDeck also wrote to database files (“td_26_brendangregg.db” and “td_26_brendangregg.db-journal”). The “??” is where the path information ends for iosnoop (if you are a darwin programmer and want to take a swing at improving that, see the fi_pathname translator in /usr/lib/dtrace/io.d).

    The “-h” option lists options (don’t need “sudo” for this):

    Brendan-2:~ brendan$ iosnoop -h
    USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
                   [-m mount_point] [-n name] [-p PID]
           iosnoop          # default output
                    -a      # print all data (mostly)
                    -A      # dump all data, space delimited
                    -D      # print time delta, us (elapsed)
                    -e      # print device name
                    -g      # print command arguments
                    -i      # print device instance
                    -N      # print major and minor numbers
                    -o      # print disk delta time, us
                    -s      # print start time, us
                    -t      # print completion time, us
                    -v      # print completion time, string
                    -d device       # instance name to snoop
                    -f filename     # snoop this file only
                    -m mount_point  # this FS only
                    -n name         # this process name only
                    -p PID          # this PID only
       eg,
            iosnoop -v      # human readable timestamps
            iosnoop -N      # print major and minor numbers
            iosnoop -m /    # snoop events on filesystem / only
    

    For tricky performance issues I often use “-stoD” to get start and end timestamps for each I/O in microseconds, and a couple of different types of I/O time calculations.

    In hindsight, I should have called it diskiosnoop, since “io” could refer to different locations in the kernel I/O stack.

    2. hfsslower.d

    This script answers an iosnoop FAQ: why dosen’t iosnoop see my application disk I/O?

    The reason is that applications rarely request disk I/O directly, rather, they access a file system that does disk I/O on their behalf. To increase performance, the file system will usually try to cache as much file data as possible in main memory (DRAM). The application may (by some notion) think that it’s doing disk I/O, but it’s actually reading from very fast DRAM, thanks to the file system. Writes can also buffer in DRAM and write to disk later, which also speeds up application performance.

    The hfsslower.d script measures I/O before it is processed by the HFS+ file system (Apple’s current default file system). iosnoop measures I/O after the file system, and only if it reaches disk:

    A couple of points:

    • hfsslower.d will see a lot more I/O than iosnoop, as it includes file system cache hits.
    • hfsslower.d better reflects application performance, as it measures the same latency that the application directly suffered.

    The hfsslower.d script is from the DTrace book, and can be found here. To run it, you’ll need to create a text file containing the script (or pull it from the DTrace book tarball), and make the file executable from Terminal by running “chmod 755 hfsslower.d”.

    This script takes an argument which is the minimum number of milliseconds to show I/O for. Here’s tracing 1ms HFS+ I/O and slower:

    Brendan-2:~ brendan$ sudo ./dtbook_scripts/Chap5/hfsslower.d 1
    TIME                 PROCESS          D   KB     ms FILE
    2011 Sep 27 19:00:15 Google Chrome    R    0     15 data_5
    2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_0
    2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_0
    2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_5
    2011 Sep 27 19:00:21 TweetDeck        W    0      1 td_26_brendangregg.db-journal
    2011 Sep 27 19:00:25 Adium            R    0     18 Smile.png
    2011 Sep 27 19:01:08 firefox-bin      W    0     17 _CACHE_001_
    2011 Sep 27 19:01:36 firefox-bin      W    0      9 _CACHE_001_
    2011 Sep 27 19:01:36 firefox-bin      W    2     14 _CACHE_002_
    2011 Sep 27 19:01:36 firefox-bin      W    0      7 _CACHE_001_
    2011 Sep 27 19:01:37 firefox-bin      W    0      1 _CACHE_001_
    2011 Sep 27 19:01:40 firefox-bin      W    0      6 _CACHE_001_
    2011 Sep 27 19:01:46 firefox-bin      W    0     14 _CACHE_001_
    2011 Sep 27 19:01:49 firefox-bin      R   15      1 _CACHE_003_
    2011 Sep 27 19:01:49 firefox-bin      W   12      7 _CACHE_003_
    2011 Sep 27 19:01:49 firefox-bin      W    0      4 _CACHE_001_
    2011 Sep 27 19:01:51 firefox-bin      W    0      9 _CACHE_001_
    2011 Sep 27 19:01:52 firefox-bin      R   21     27 Times.dfont/..namedfork/rsrc
    2011 Sep 27 19:01:52 fontd            R    0     16 annex_aux
    2011 Sep 27 19:01:52 firefox-bin      W    0      5 _CACHE_001_
    [...]
    

    Columns are: TIME = time of I/O completion, PROCESS = application name, D = direction (R = read, W = write), KB = I/O size in Kbytes, ms = I/O latency in milliseconds, FILE = filename.

    If you use the argument “0″, it will trace everything. If I’m chasing down slow I/O, I’ll often use an argument of “10″ for I/O slower than 10 milliseconds.

    At this point you may think: if you just care about slow I/O, then just use iosnoop. That works to a point, but there can be slow I/O caused by something other than disks (file system lock contention, for example). The other advantage of the hfsslower.d script is that the measured latency matches the application pain suffered, whereas at the disk level you can only assume a correlation.

    3. execsnoop

    This traces the execution of new processes. This is great at identifying short-lived processes that may be caused by misbehaving applications and can slow down your system. These short-lived processes are usually too quick to be picked up by standard monitoring tools like the Activity Monitor or top(1).

    To demonstrate this tool, here’s what happens when you type “man ls”:

    Brendan-2:~ brendan$ sudo execsnoop -v
    STRTIME                UID    PID   PPID ARGS
    2011 Sep 28 20:19:18     0  67234  66312 man
    2011 Sep 28 20:19:18     0  67234  66312 man
    2011 Sep 28 20:19:18     0  67235  67234 sh
    2011 Sep 28 20:19:18     0  67235  67234 gzip
    2011 Sep 28 20:19:19     0  67236  67234 sh
    2011 Sep 28 20:19:18     0  67234  66312 man
    2011 Sep 28 20:19:19     0  67236  67234 gzip
    2011 Sep 28 20:19:19     0  67234  66312 man
    2011 Sep 28 20:19:19     0  67244  67243 less
    2011 Sep 28 20:19:19     0  67242  67239 sh
    2011 Sep 28 20:19:19     0  67242  67239 gzip
    2011 Sep 28 20:19:19     0  67240  67238 tbl
    2011 Sep 28 20:19:19     0  67241  67238 groff
    2011 Sep 28 20:19:19     0  67245  67241 troff
    2011 Sep 28 20:19:19     0  67246  67241 grotty
    

    This prints a line for each new process that is executed, in a rolling output similar to the previous tools. The fields are: STRTIME = (string) timestamp, UID = user ID, PID = process ID, PPID = parent process ID, ARGS = process name (should be process + arguments, but that doesn’t yet work on Mac OS X; if you want to debug, see pr_psargs in /usr/lib/dtrace/proc.d).

    Here’s what happens when I turn AirPort (wifi) off, then on:

    Brendan-2:~ brendan$ sudo execsnoop -v
    STRTIME                UID    PID   PPID ARGS
    2011 Sep 28 20:28:00     0  67204      1 airportd
    2011 Sep 28 20:28:01   503  67205    140 fontworker
    2011 Sep 28 20:28:01     0  67206      1 mDNSResponderHel
    2011 Sep 28 20:28:06     0     49      1 autofsd
    2011 Sep 28 20:28:10     0  67208     37 ManagedClient
    2011 Sep 28 20:28:31   503  67209    140 fontworker
    2011 Sep 28 20:28:43     0  67210      1 airportd
    2011 Sep 28 20:28:49     0  67211      1 mDNSResponderHel
    2011 Sep 28 20:28:49   503  67212    140 fontworker
    2011 Sep 28 20:28:51     0  67214      1 kerberosautoconf
    2011 Sep 28 20:28:51     0  67215      1 kerberosautoconf
    2011 Sep 28 20:28:56     0     49      1 autofsd
    2011 Sep 28 20:29:01     0  67219     37 ManagedClient
    2011 Sep 28 20:29:04     0  67220      1 ocspd
    2011 Sep 28 20:29:16   503  67221  54079 Google Chrome He
    2011 Sep 28 20:29:19   503  67222    140 fontworker
    

    Near the end I opened up a new tab in Mozilla Firefox and another in Google Chrome. You can see a fundamental difference: Chrome creates a new process for that tab, Firefox doesn’t.

    execsnoop has various options, use -h to list them all. I used -v above, to print the time.

    4. opensnoop

    This traces file opens and prints various details, including the time and error code when using “-ve”. I usually use it to look for failed opens, which can be a sign of misconfigured applications. Discovering their config files and resource files can also be useful.

    Brendan-2:~ brendan$ sudo opensnoop -ve
    STRTIME                UID    PID COMM          FD ERR PATH
    2011 Sep 30 01:44:15     0     11 DirectoryServic  -1   2 /var/db/dslocal/nodes/Default/hosts
    2011 Sep 30 01:44:15     0     11 DirectoryServic  20   0 /etc/hosts
    2011 Sep 30 01:44:15     0  85198 dtrace         5   0 /etc/localtime
    2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /Users/brendan/Library/Preferences/com.apple.menuextra.clock.plist.47C2X9o
    2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/TimeMachine.menu/Contents/Resources/TMRotatingArrow.pdf
    2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/Clock.menu/Contents/Resources/Clock.pdf
    2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/TimeMachine.menu/Contents/Resources/TMRotatingArrow.pdf
    2011 Sep 30 01:44:17   503  67261 TweetDeck     17   0 /Users/brendan/Library/Cookies/Cookies.plist_tmp_67261_0.dat
    2011 Sep 30 01:44:17   503  67261 TweetDeck     17   0 /Users/brendan/Library/Cookies/Cookies.plist
    2011 Sep 30 01:44:18   503  54079 Google Chrome  72   0 /var/folders/bU/bU0WYlnwE6KKnMBrvI5QUU+++TQ/-Tmp-/.com.google.Chrome.sQdN3D
    2011 Sep 30 01:44:19     0     36 mds            9   0 .
    ^C
    

    While tracing, I changed the clock in the top bar from digital:

    to analogue:

    The files opened to performed this can be seen by the “SystemUIServer” process, beginning with a Preferences file and then some PDFs. Huh? PDFs? What are those PDFs?

    That’s bizarre. I didn’t know that those tiny icons were implemented as PDFs!

    5. dtruss

    The previous two tools, opensnoop and execsnoop, operate by tracing specific system calls. A system call (or “syscall”) is what an application performs to request the operating system kernel to perform privileged work, including process creation, file operations, and other I/O (eg, disk or network I/O). Syscalls are a great target for analysis with DTrace, since examining them often provides a pretty good picture of what an application is trying to do. They can also provide useful details including byte counts, file and process names, error codes, and latency.

    The dtruss tool traces all types of system calls, which is very useful for general debugging, especially since Mac OS X doesn’t come with a standard syscall tracer (like Linux’s “strace” or Solaris’s “truss”). One advantage of dtruss over those other tools is that dtruss can trace multiple processes at the same time, matching on the process name “-n”. For example, tracing Firefox via its process name “firefox-bin”:

    Brendan-2:~ brendan$ sudo dtruss -n firefox-bin
    	PID/THRD  SYSCALL(args) 		 = return
    67254/0x720286:  write_nocancel(0x7, "8", 0x1)		 = 1 0
    67254/0x720286:  lseek(0x24, 0x75500, 0x0)		 = 480512 0
    67254/0x720286:  read_nocancel(0x24, "", 0x200)		 = 512 0
    67254/0x720286:  write_nocancel(0x7, "8", 0x1)		 = 1 0
    67254/0x720286:  lseek(0x24, 0x73B00, 0x0)		 = 473856 0
    67254/0x720286:  read_nocancel(0x24, "", 0x200)		 = 512 0
    67254/0x746a81:  select_nocancel(0x3D, 0xB04909B8, 0xB0490938)		 = 1 0
    67254/0x746a81:  read_nocancel(0x6, "8", 0x400)		 = 1 0
    67254/0x746a81:  recvfrom_nocancel(0x36, 0xB0490C3F, 0x1)		 = -1 Err#35
    67254/0x746a81:  select_nocancel(0x3D, 0xB0490998, 0xB0490918)		 = 1 0
    67254/0x746a81:  sendto_nocancel(0x36, 0x20A51008, 0x292)		 = 658 0
    67254/0x720286:  write_nocancel(0x7, "8", 0x1)		 = 1 0
    67254/0x720286:  lseek(0x24, 0x74000, 0x0)		 = 475136 0
    [...]
    

    The output will be many pages, as applications commonly make frequent system calls.

    dtruss can also launch and trace a program. Here the humble “ls -l” command was traced:

    Brendan-2:~ brendan$ sudo dtruss ls -l hfsslower.d
    -rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
    SYSCALL(args) 		 = return
    ioctl(0x3, 0x80086804, 0x7FFF5FBFD710)		 = 0 0
    close(0x3)		 = 0 0
    stat64("/usr/lib/libstdc++.6.dylib", 0x7FFF5FBFCB20, 0x7FFF5FBFD160)		 = 0 0
    mmap(0x0, 0xD000, 0x3, 0x1002, 0x1000000, 0x7FFF00000001)		 = 0x20000 0
    [...100 lines truncated...]
    open_nocancel("/etc/sysinfo.conf", 0x0, 0x1B6)		 = -1 Err#2
    lstat64_extended(0x100100768, 0x7FFF5FBFE3F0, 0x100101580)		 = 0 0
    listxattr(0x100100768, 0x0, 0x0)		 = 0 0
    fstat64(0x1, 0x7FFF5FBFD960, 0x7FFF5FBFDA2C)		 = 0 0
    ioctl(0x1, 0x4004667A, 0x7FFF5FBFD9AC)		 = 0 0
    access("/etc/localtime", 0x4, 0x0)		 = 0 0
    open_nocancel("/etc/localtime", 0x0, 0x0)		 = 4 0
    fstat64(0x4, 0x7FFF5FBFB8B0, 0x0)		 = 0 0
    read_nocancel(0x4, "TZif", 0x2A64)		 = 1017 0
    close_nocancel(0x4)		 = 0 0
    write_nocancel(0x1, "-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
    ", 0x3C)		 = 60 0
    fchdir(0x3, 0x100100000, 0xFC080)		 = 0 0
    fchdir(0x3, 0x0, 0x100800000)		 = 0 0
    close_nocancel(0x3)		 = 0 0
    

    100 lines were trimmed to keep that example short.

    dtruss supports various options, including the printing of system call timing for use when analyzing performance.

    Brendan-2:~ brendan$ dtruss -h
    USAGE: dtruss [-acdefholLs] [-t syscall] { -p PID | -n name | command }
    
              -p PID          # examine this PID
              -n name         # examine this process name
              -t syscall      # examine this syscall only
              -a              # print all details
              -c              # print syscall counts
              -d              # print relative times (us)
              -e              # print elapsed times (us)
              -f              # follow children
              -l              # force printing pid/lwpid
              -o              # print on cpu times
              -s              # print stack backtraces
              -L              # don't print pid/lwpid
              -b bufsize      # dynamic variable buf size
       eg,
           dtruss df -h       # run and examine "df -h"
           dtruss -p 1871     # examine PID 1871
           dtruss -n tar      # examine all processes called "tar"
           dtruss -f test.sh  # run test.sh and follow children
    

    I’ve used the “-e” option many times to figure out what’s slowing down an application. It shows the elapsed time for the system call in microseconds:

    Brendan-2:~ brendan$ dtruss -e ls -l hfsslower.d 
    -rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
     ELAPSD SYSCALL(args) 		 = return
        245 mmap(0x10000B000, 0x2000, 0x5, 0x12, 0x3, 0x7FFF00000001)		 = 0xB000 0
         11 mmap(0x10000D000, 0x1000, 0x3, 0x12, 0x3, 0x7FFF00000001)		 = 0xD000 0
         10 mmap(0x10000E000, 0x1F10, 0x1, 0x12, 0x3, 0x7FFF00000001)		 = 0xE000 0
         31 open("/dev/dtracehelper", 0x2, 0x7FFF5FC45370)		 = 3 0
         40 __sysctl(0x7FFF5FBFD5B0, 0x2, 0x7FFF5FBFD5DC)		 = 0 0
    

    I then look down the “ELAPSD” column for the largest times.

    A dtruss FAQ is: why is the output in the wrong order?

    As with other DTrace “tracing” style tools (that print lines of output as they occur) the output can be shuffled slightly due to CPU buffering on multi-CPU systems. The “-d” option for relative times (since program start) is useful for cases where the output order is important. You can then post-sort by that column so that the dtruss output is in the correct order. (For example, by redirecting the output to a file, then using the sort(1) command on that column.)

    Ryan Dahl (creator of node.js) had been using dtruss so frequently recently that he made some enhancements and posted a newer version on github here.

    6. soconnect_mac.d

    This script is from the DTrace book, and traces outbound TCP connections along with details:

    Brendan-2:~ brendan$ sudo ./dtbook/Chap6/soconnect_mac.d
    PID    PROCESS          FAM ADDRESS          PORT   LAT(us) RESULT
    45343  firefox-bin      2   70.32.100.166    443        735 In progress
    65002  TweetDeck        2   199.59.148.201   80          94 In progress
    65002  TweetDeck        2   208.93.137.53    80          89 In progress
    54079  Google Chrome    2   97.74.215.143    80          76 In progress
    45343  firefox-bin      2   97.74.215.143    80         752 In progress
    45343  firefox-bin      2   97.74.215.143    80          88 In progress
    45343  firefox-bin      2   97.74.215.143    80          85 In progress
    45343  firefox-bin      2   97.74.215.143    80          39 In progress
    45343  firefox-bin      2   97.74.215.143    80          22 In progress
    45343  firefox-bin      2   97.74.215.143    80          20 In progress
    27     ntpd             2   17.151.16.20     123         24 Success
    [...]
    

    It’s a quick way to find out what applications are connecting to whom on the Internet. The script is here.

    Columns are: PID = process ID, PROCESS = process name, FAM = protocol family (2 = IPv4), ADDRESS = IP address, PORT = TCP destination port, LAT(us) latency of the connect() system call in microseconds, RESULT = return of the connect().

    Many connections these days are “non-blocking”, so the latency appears low and has the result “In progress”. The actual connection completes sometime later (the timing can be traced using some more DTrace, when desired).

    There is a companion tool, soaccept_mac.d, which shows inbound TCP connections. If you ran both regularly, you may catch something performing networking that shouldn’t be (e.g., spyware, virus, …).

    7. errinfo

    This tool provides a summary of which system calls were failing, showing the process name, error code, and short description of the error:

    Brendan-2:~ brendan$ sudo errinfo -c
    Tracing... Hit Ctrl-C to end.
    ^C
                EXEC          SYSCALL  ERR  COUNT  DESC
           TweetDeck          lstat64    2      1  No such file or directory
           TweetDeck           select    9      1  Bad file descriptor
           TweetDeck           stat64    2      1  No such file or directory
               cupsd           unlink    2      1  No such file or directory
         firefox-bin connect_nocancel   36      1  Operation now in progress
             launchd            mkdir   17      1  File exists
             launchd    open_nocancel    2      1  No such file or directory
    [...]
              Google           Chrome    0     35
              Google           Chrome    0     35
            mdworker      getattrlist    2     24  No such file or directory
            mdworker            mkdir   13     24  Permission denied
           TweetDeck           access    2     33  No such file or directory
           TweetDeck             read   35     35  Resource temporarily unavailable
       mDNSResponder          recvmsg   35     38  Resource temporarily unavailable
            mdworker           stat64    2     47  No such file or directory
           gpg-agent    read_nocancel   35     60  Resource temporarily unavailable
            thnuclnt            ioctl   25     60  Inappropriate ioctl for device
                ntpd __pthread_canceled   22     69  Invalid argument
                ntpd        sigreturn   -2     69
                ntpd       sigsuspend    4     69  Interrupted system call
              Google           Chrome    0      0
     activitymonitor        proc_info    3   1495  No such process
    

    This can be another quick way to track down failing or misconfigured applications. Note that (usually) most system call errors are not a problem: it can be normal for system calls to fail, and the calling application handles that condition correctly. errinfo is particularly useful when the application didn’t handle the failure correctly, and the system calls are silently failing.

    8. bitesize.d

    This is a simple DTrace script that characterizes the disk I/O workload, showing a distribution of the size of the I/O in bytes along with the application name:

    Brendan-2:~ brendan$ sudo bitesize.d
    Tracing... Hit Ctrl-C to end.
    ^C
    
         PID  CMD
       31502  vim
    
               value  ------------- Distribution ------------- count
                2048 |                                         0
                4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         4
                8192 |                                         0
               16384 |@@@@@@@@                                 1
               32768 |                                         0        
    
       54079  Google Chrome
    
               value  ------------- Distribution ------------- count
                2048 |                                         0
                4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        18
                8192 |@@@@@@@                                  4
               16384 |                                         0        
    
       86143  bsdtar
    
               value  ------------- Distribution ------------- count
                2048 |                                         0
                4096 |@@@                                      19
                8192 |@@@@                                     23
               16384 |@@                                       12
               32768 |@@@@@@@@@@@@@@@@@@@@@@@                  151
               65536 |@@@@                                     25
              131072 |@@                                       10
              262144 |@@                                       10
              524288 |@                                        8
             1048576 |                                         3
             2097152 |                                         0
    

    The “value” column shows the minimum size of the I/O in bytes, and the “count” column shows the number of I/O in that range: between the minimum size and the next size shown.

    To follow an example: the “bsdtar” command shown above performed 151 disk I/Os that were between 32768 bytes and 65535 bytes in size (32K to 64K), which was where most of its I/O fell. The text plot in the middle provides a visualization for the distribution of the events, which shows a spike that corresponds to this 32K to 64K range. Google Chrome did a couple of dozen I/O between 4 and 16 Kbytes.

    Disk I/O is an expensive operation, so it’s usually preferable for it to be larger in size to improve the throughput to the device.

    9. iotop

    This presents the same data as iosnoop, but in a summarized way similar to top(1). It’s handy when disk I/O is so frequent that iosnoop is too verbose, and you want a high level summary of which process is rattling the disks. Here I’ve used “-CP” to not clear the screen and provide a rolling output (-C), and to show disk busy percentages (-P):

    Brendan-2:~ brendan$ sudo iotop -CP 1
    Tracing... Please wait.
    2011 Sep 27 19:33:11,  load: 0.74,  disk_r:      0 KB,  disk_w:  57344 KB
    
      UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
        0  66661  66312 cp               ??       14   2 W     98
    
    2011 Sep 27 19:33:12,  load: 0.74,  disk_r:      4 KB,  disk_w:  55752 KB
    
      UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
        0  66661  66312 cp               ??       14   2 W     93
    
    2011 Sep 27 19:33:13,  load: 0.68,  disk_r:     60 KB,  disk_w:  54464 KB
    
      UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
        0  66661  66312 cp               ??       14   2 R     11
        0  66661  66312 cp               ??       14   2 W     88
    
    2011 Sep 27 19:33:14,  load: 0.68,  disk_r:   2560 KB,  disk_w:  32928 KB
    
      UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
      503  65002    140 TweetDeck        ??       14   2 W      0
        0      1      0 launchd          ??       14   2 W      1
        0  66661  66312 cp               ??       14   2 R     11
        0  66661  66312 cp               ??       14   2 W     71
    

    The “??” is for where some details aren’t yet available to DTrace, for the same reason as was described for the iosnoop script (if you are a programmer, you can see what needs to be fixed in the /usr/lib/dtrace/io.d file).

    If you get “dynvardrops”, sorry: back when I wrote this (6 years ago), I never saw a disk workload heavy enough to cause these. A remedy can be to increase the dynvarsize tunable by adding the following line:

    #pragma D option dynvarsize=16m

    near this one:

    #pragma D option quiet

    in /usr/bin/iotop. You may want to make a backup of iotop first (iotop.orig) or copy it somewhere else to run it.

    10. maclife.d

    This script is from the DTrace book, and traces the creation and deletion of files:

    Brendan-2:~ brendan$ sudo ./dtbook/Chap5/maclife.d
    TIME(ms)        UID    PID PROCESS      CALL         DIR/FILE
    3754594958      503  54079 Google Chrom VNOP_CREATE  Chrome/.com.google.Chrome.K7I9jy
    3754597703      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.8MIKKL
    3754597703      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.8MIKKL
    3754597703      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.TLlOje
    3754597703      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.TLlOje
    3754598365      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.yIwTdE
    3754598365      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.yIwTdE
    3754603801      503  65002 TweetDeck    VNOP_CREATE  Cookies/Cookies.plist_tmp_65002_0.dat
    3754605028      503  65002 TweetDeck    VNOP_REMOVE  Local Store/td_26_brendangregg.db-journal
    3754605026      503  65002 TweetDeck    VNOP_CREATE  Local Store/td_26_brendangregg.db-journal
    3754607674      503  65002 TweetDeck    VNOP_CREATE  Local Store/td_26_brendangregg.db-journal
    3754607676      503  65002 TweetDeck    VNOP_REMOVE  Local Store/td_26_brendangregg.db-journal
    3754609536      503  34852 Adium        VNOP_CREATE  Default/.dat8824.9fa
    3754711583      503  17726 thnuclnt     VNOP_CREATE  thnuclnt-17717/thnuclnt.conf-fta17726
    3754711587      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220118e918
    3754711617      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220118ead8
    3754711620      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220119621c
    3754711623      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e82201196d7e
    3754711624      503  17726 thnuclnt     VNOP_REMOVE  thnuclnt-17717/thnuclnt.conf-fta17726
    3754612740      503  31502 vim          VNOP_CREATE  macosx/4913
    3754612740      503  31502 vim          VNOP_REMOVE  macosx/4913
    3754612741      503  31502 vim          VNOP_CREATE  macosx/top10dtrace.html
    3754612741      503  31502 vim          VNOP_REMOVE  macosx/top10dtrace.html~
    [...]
    

    Interesting! While tracing I saved the file I was editing in vim, which is seen in the last four lines. This tells me that vim is creating and removing temporary files as part of the save process.

    Also note that it looks like TweetDeck created the file twice before removing it (two VNOP_CREATEs followed by a VNOP_REMOVE). This isn’t the correct order, which can be seen by examining the TIME(ms) column. A side-effect of DTrace’s negligible performance impact design is that output can be slightly shuffled due to the way it collects data from per-CPU buffers. I often include a TIME column like that one, not just for the usefulness of knowing time, but also as a means to post sort the output.

    Other Scripts

    There are other tools ready-to-go in Mac OS X. If you run the “man -k dtrace” command in a Terminal (or “apropos dtrace”), you’ll see the following output:

    Brendan-2:~ brendan$ man -k dtrace
    [...]
    bitesize.d(1m)           - analyse disk I/O size by process. Uses DTrace
    cpuwalk.d(1m)            - Measure which CPUs a process runs on. Uses DTrace
    creatbyproc.d(1m)        - snoop creat()s by process name. Uses DTrace
    dappprof(1m)             - profile user and lib function usage. Uses DTrace
    dapptrace(1m)            - trace user and library function usage. Uses DTrace
    diskhits(1m)             - disk access by file offset. Uses DTrace
    dispqlen.d(1m)           - dispatcher queue length by CPU. Uses DTrace
    dtrace(1)                - generic front-end to the DTrace facility
    dtruss(1m)               - process syscall details. Uses DTrace
    errinfo(1m)              - print errno for syscall fails. Uses DTrace
    execsnoop(1m)            - snoop new process execution. Uses DTrace
    fddist(1m)               - file descriptor usage distributions. Uses DTrace
    filebyproc.d(1m)         - snoop opens by process name. Uses DTrace
    hotspot.d(1m)            - print disk event by location. Uses DTrace
    httpdstat.d(1m)          - realtime httpd statistics. Uses DTrace
    iofile.d(1m)             - I/O wait time by file and process. Uses DTrace
    iofileb.d(1m)            - I/O bytes by file and process. Uses DTrace
    iopattern(1m)            - print disk I/O pattern. Uses DTrace
    iopending(1m)            - plot number of pending disk events. Uses DTrace
    iosnoop(1m)              - snoop I/O events as they occur. Uses DTrace
    iotop(1m)                - display top disk I/O events by process. Uses DTrace
    kill.d(1m)               - snoop process signals as they occur. Uses DTrace
    lastwords(1m)            - print syscalls before exit. Uses DTrace
    loads.d(1m)              - print load averages. Uses DTrace
    newproc.d(1m)            - snoop new processes. Uses DTrace
    opensnoop(1m)            - snoop file opens as they occur. Uses DTrace
    pathopens.d(1m)          - full pathnames opened ok count. Uses DTrace
    pidpersec.d(1m)          - print new PIDs per sec. Uses DTrace
    plockstat(1)             - front-end to DTrace to print statistics about POSIX mutexes and read/write locks
    priclass.d(1m)           - priority distribution by scheduling class. Uses DTrace
    pridist.d(1m)            - process priority distribution. Uses DTrace
    procsystime(1m)          - analyse system call times. Uses DTrace
    runocc.d(1m)             - run queue occupancy by CPU. Uses DTrace
    rwbypid.d(1m)            - read/write calls by PID. Uses DTrace
    rwbytype.d(1m)           - read/write bytes by vnode type. Uses DTrace
    rwsnoop(1m)              - snoop read/write events. Uses DTrace
    sampleproc(1m)           - sample processes on the CPUs. Uses DTrace
    seeksize.d(1m)           - print disk event seek report. Uses DTrace
    setuids.d(1m)            - snoop setuid calls as they occur. Uses DTrace
    sigdist.d(1m)            - signal distribution by process. Uses DTrace
    syscallbypid.d(1m)       - syscalls by process ID. Uses DTrace
    syscallbyproc.d(1m)      - syscalls by process name. Uses DTrace
    syscallbysysc.d(1m)      - syscalls by syscall. Uses DTrace
    topsyscall(1m)           - top syscalls by syscall name. Uses DTrace
    topsysproc(1m)           - top syscalls by process name. Uses DTrace
    weblatency.d(1m)         - website latency statistics. Uses DTrace
    

    Plus more in the DTrace book.

    And More…

    DTrace isn’t just about running scripts; you can write your own custom scripts, run one-liners, and use higher-level tools that use DTrace behind the scenes, like Apple’s Instruments and Joyent’s Cloud Analytics. And it’s for more than just your MacBook: if you are using servers that have DTrace available, you can use it to diagnose their performance and issues too, including tracing kernel and application code. I regularly use it to see how a MySQL database interacts with a kernel file system, for example.

    For more reading about DTrace, you can see my posts tagged dtrace, other blogs on dtrace.org, the original 410 page DTrace guide, 1100 page DTrace book, and my original DTrace page (which includes the DTraceToolkit). It’s a little old now (and I think prustat needs updating), but there’s also Matty’s Top Ten DTrace Scripts, which includes some of those above; and Greg Miller’s Exploring Leopard with DTrace, which includes Objective-C tracing.

    For another post on Mac OS X performance, see my USE Method-based Mac OS X performance checklist.

  • 相关阅读:
    随机的爱
    在使用ASP.NET时进行页面重定向的3种方法。(转)
    [程序员必看]请不要做浮躁的人
    C# datetime 操作
    心碎的浪漫
    动网论坛 企业版

    内存映射文件(File Mapping)API
    内存映射文件
    已知进程句柄,如何知道此进程的主窗口句柄
  • 原文地址:https://www.cnblogs.com/Proteas/p/3727297.html
Copyright © 2011-2022 走看看