Reverse Engineering Instruments’ File Format

Have you ever wondered how applications store their data? Plenty of file formats like MP3 and JPG are standardized and well documented, but what about custom, proprietary file formats? What do you do when you want to extract data that you know is in a file somewhere, and there are no APIs to extract it?

Over the last few months, I’ve been building a performance visualization tool called speedscope. It can import CPU profile formats from a variety of sources, like Chrome, Firefox, and Brendan Gregg’s stackcollapse format.

At Figma, I work in a C++ codebase that cross-compiles to asm.js and WebAssembly to run in the browser. Occasionally, however, it’s helpful to be able to profile the native build we use for development and debugging. The tool of choice to do that on OS X is Instruments. If we can extract the right information from the files Instruments outputs, then we can construct flamecharts to help us build intuition for what’s happening while our code is executing.

Up until this point, all of the formats I’ve been importing into speedscope have been either plaintext or JSON, which makes them easier to analyze. The trace file format, by contrast, is a complex, multi-encoding format which seems to use several hand-rolled binary formats.

This was my first foray into complex binary file reverse engineering, and I’d like to share my process for doing it, hopefully teaching you about some tools along the way.

- A brief introduction to sampling profilers
- Finding the list of samples with find and du
- Exploring binary file contents with xxd
- Guessing binary formats with Synalyze It!

Disclaimer: I got stuck many times trying to understand the file format. For the sake of brevity, what’s presented here is a much smoother process than I really went through. If you get stuck trying to do something similar, don’t be discouraged!

A brief introduction to sampling profilers

Before we dig into the file format, it will be helpful to understand what kind of data we need to extract. We’re trying to import a CPU time profile, which helps us answer the question “where is all the time going in my program?” There are many different ways to analyze runtime performance of a program, but one of the most common is to use a sampling profiler.

While the program being analyzed is running, a sampling profiler will periodically ask the running program “Hey! What are you doing RIGHT NOW?”. The program will respond with its current call stack (or call stacks, in the case of a multithreaded program), then the profiler will record that call stack along with the current timestamp. A manual way of doing this if you don’t have a profiler is to just repeatedly pause the program in a debugger and look at the call stack.

Instruments’ Time Profiler is a sampling profiler. After you record a time profile in Instruments, you can see a list of samples with their timestamps and associated call stacks. This is exactly the information we want to extract: timestamps and call stacks.

If you’d like to follow along with these steps, you can find my test file here: ace, which is a profile from Instruments 8.3.3. This is a time profile of a simple program I made specifically for analysis without any complex threading or multi-process behaviour: simple.cpp.

A good first step when trying to analyze any file is to use the unix file program. file will try to guess the type of a file by looking at its bytes. Here are some examples:

    $ file favicon-16x16.png
    favicon-16x16.png: PNG image data, 16 x 16, 8-bit colormap, non-interlaced
    $ file favicon.ico
    favicon.ico: MS Windows icon resource - 3 icons, 48x48, 256-colors
    $ file README.md
    README.md: UTF-8 Unicode English text, with very long lines
    $ file /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
    /Applications/Google Chrome.app/Contents/MacOS/Google Chrome: Mach-O 64-bit executable x86_64

So let’s see what file has to say about our test file.
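To make “guessing the type of a file by looking at its bytes” a bit more concrete, here is a minimal Python sketch of the idea. It is not how file is actually implemented: the signature table, the function names guess_type and guess_file_type, and the 4096-byte probe size are all assumptions made for illustration, and the real tool consults a far larger database of rules.

```python
import codecs

# Hypothetical, tiny signature table: (leading bytes, description).
# The real `file` program knows vastly more formats than these three.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\x00\x00\x01\x00", "MS Windows icon resource"),
]


def guess_type(data: bytes) -> str:
    """Guess a file type from the first bytes of its contents."""
    if not data:
        return "empty"

    # 1. Compare the leading bytes against known magic numbers.
    for magic, description in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return description

    # 2. Fall back to a text check. An incremental decoder with
    #    final=False tolerates a multi-byte character that was cut
    #    off at the end of the probe window.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return "data"  # unrecognized binary contents
    return "UTF-8 text"


def guess_file_type(path: str, probe_size: int = 4096) -> str:
    """Read only the start of the file at `path` and guess its type."""
    with open(path, "rb") as f:
        return guess_type(f.read(probe_size))
```

Calling guess_file_type("favicon-16x16.png") on a real PNG should report “PNG image data”, because every PNG starts with the same eight signature bytes. When none of the signatures match and the bytes are not valid text, the sketch just answers “data”, which is the kind of unhelpful answer that usually means the format is custom and needs a closer look with other tools.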