Copy-and-Patch: A Copy-and-Patch Tutorial

    uint8_t cnp_stencil_<op>_code[] = {
        // Copy the bytes from the top of the function until the jmp.
    };

    uint8_t* cnp_copy_<op>(uint8_t* stencil_start) {
        const size_t stencil_size = sizeof(cnp_stencil_<op>_code);
        memcpy(stencil_start, cnp_stencil_<op>_code, stencil_size);
        return stencil_start + stencil_size;
    }

    // If any relocations exist for the stencil, fill in the values.
    // If not, just skip writing this function.
    void cnp_patch_<op>(uint8_t* stencil_start, /* … */ ) {
        memcpy(stencil_start + /* relocation_offset */, &value, /* relocation_size */);
    }

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    //#include "cnp_stencils.h"

    uint8_t* cnp_copy_load_int_reg1(uint8_t* stencil_start);
    void cnp_patch_load_int_reg1(uint8_t* stencil_start, int value);
    uint8_t* cnp_copy_load_int_reg2(uint8_t* stencil_start);
    void cnp_patch_load_int_reg2(uint8_t* stencil_start, int value);
    uint8_t* cnp_copy_add_int1_int2(uint8_t* stencil_start);
    uint8_t* cnp_copy_return_int1(uint8_t* stencil_start);

    typedef int(*jit_func)() __attribute__((preserve_none));

    jit_func create_add_1_2() {
        // Most systems mark memory as non-executable by default,
        // and mprotect() to set memory as executable needs
        // to be run against mmap-allocated memory. We start
        // by allocating it as read/write, and then switch it
        // to read/execute once we're done writing the code.
        uint8_t* codedata = mmap(NULL, 256, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
                                 -1, 0);
        assert(codedata != MAP_FAILED);
        jit_func ret = (jit_func)codedata;

        // Concatenate our program together, while saving the
        // locations that need to be patched.
        uint8_t* load_int_reg1_location = codedata;
        codedata = cnp_copy_load_int_reg1(codedata);
        uint8_t* load_int_reg2_location = codedata;
        codedata = cnp_copy_load_int_reg2(codedata);
        codedata = cnp_copy_add_int1_int2(codedata);
        codedata = cnp_copy_return_int1(codedata);

        // Overwrite the zero value placeholders with our intended
        // specialized values: 1 and 2.
        cnp_patch_load_int_reg1(load_int_reg1_location, 1);
        cnp_patch_load_int_reg2(load_int_reg2_location, 2);

        // Now that we're done writing, remove write access and
        // allow execution from this page instead.
        int rc = mprotect(ret, 256, PROT_READ | PROT_EXEC);
        if (rc) {
            perror("mprotect");
        }
        return ret;
    }

    int main() {
        jit_func add_1_2 = create_add_1_2();
        int result = add_1_2();
        printf("JIT'd 1 + 2 = %d\n", result);
        return 0;
    }


We compile the stencils with clang -O3 -mcmodel=medium -c stencils.c, and examine the generated code via objdump -d -Mintel,x86-64 --disassemble --reloc stencils.o. This yields:

(The NOPs aren’t actually a part of the function; they’re just padding added so that each function starts at a 16-byte-aligned address.)
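The amount of padding can be checked with a little arithmetic. The following is only a sketch: align_up16 and padding_after are hypothetical helper names, and the addresses used in the usage note are read off the disassembly in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round an address up to the next multiple of 16. */
static size_t align_up16(size_t addr) {
    return (addr + 15u) & ~(size_t)15u;
}

/* Hypothetical helper: padding bytes needed after a function whose last
 * byte sits at end - 1, so that the next function starts 16-byte aligned. */
static size_t padding_after(size_t end) {
    assert(align_up16(end) >= end);
    return align_up16(end) - end;
}
```

For example, load_int_reg1 ends at 0xb, so padding_after(0xb) gives the 5 bytes of NOP before load_int_reg2 at 0x10, and add_int1_int2 ends at 0x28, which gives the 8 padding bytes before return_int1 at 0x30.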

For each of these stencils, we fill in a template to form our stencil generation library to use during JITing.
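As an aside, the same information could be carried in a data table instead of one pair of functions per stencil. This is a sketch of that alternative, not what the post's library does: the stencil struct, stencil_copy, and stencil_patch are hypothetical names, and it assumes each stencil has at most one 32-bit hole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: one stencil's bytes plus at most one 32-bit hole. */
typedef struct {
    const uint8_t *code;  /* machine code with the trailing jmp removed */
    size_t size;          /* number of bytes in code */
    int has_hole;         /* nonzero if there is a value to patch */
    size_t hole_offset;   /* offset of the hole from the stencil start */
} stencil;

/* Bytes taken from the disassembly in this post: mov r12d,0x0 */
static const uint8_t load_int_reg1_code[] = { 0x41, 0xbc, 0x00, 0x00, 0x00, 0x00 };

static const stencil load_int_reg1 = {
    load_int_reg1_code, sizeof load_int_reg1_code, 1, 0x2
};

/* Copy the stencil to dst and return the address just past it. */
static uint8_t *stencil_copy(const stencil *s, uint8_t *dst) {
    memcpy(dst, s->code, s->size);
    return dst + s->size;
}

/* Overwrite the 4-byte placeholder with value, in host byte order. */
static void stencil_patch(const stencil *s, uint8_t *stencil_start, int32_t value) {
    assert(s->has_hole && s->hole_offset + sizeof value <= s->size);
    memcpy(stencil_start + s->hole_offset, &value, sizeof value);
}
```

The per-stencil functions in the post are simpler to generate and read; a table like this mainly pays off once there are many stencils with the same shape.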

In a fully automated setup, all of this work will happen as part of the build system. Compiling the stencils and transforming them into a library of copy functions and patch functions happens as part of running make.
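One piece of such automation is turning extracted stencil bytes into the C array definition. The post does not show its generator, so the following is purely an illustrative sketch: emit_stencil_array is a hypothetical function, and a real build step would also need to extract the bytes and relocations from the object file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator step: format n stencil bytes as the definition of
 * cnp_stencil_<name>_code. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if name is empty or out is too small. */
static int emit_stencil_array(char *out, size_t cap, const char *name,
                              const uint8_t *bytes, size_t n) {
    if (strlen(name) == 0) return -1;
    int w = snprintf(out, cap, "uint8_t cnp_stencil_%s_code[] = {", name);
    if (w < 0 || (size_t)w >= cap) return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(out + used, cap - used, " 0x%02x,", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(out + used, cap - used, " };\n");
    if (w < 0 || (size_t)w >= cap - used) return -1;
    used += (size_t)w;
    return (int)used;
}
```

Called with the name add_int1_int2 and the bytes 0x45, 0x01, 0xec, it produces a one-line array definition equivalent to the hand-written one in the stencil library.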

With our stencil library in place, we can use our code generation functions to build our runtime specialized adder:

We’ve successfully built runtime code generation, while letting clang do the hard work of actually writing the assembly code, and our JIT compiler is just a bunch of memcpy calls!

(If you’re interested in the details of why these macros are the way they are, see the next post in the series!)

Copy-and-patch compilation is a fascinating way of constructing a baseline JIT[1]. It permits very fast runtime compilation of code fragments in an easy-to-maintain fashion, requires barely any understanding of assembly code, and produces native code of sufficient quality to be within the same range as traditional, hand-written baseline JITs.

[1]: Baseline JIT, as in a JIT whose goal is primarily to generate code quickly and gain performance by removing interpretation overhead, rather than by generating well-optimized code itself. Baseline JITs can be paired with optimizing JITs, like V8’s Liftoff baseline JIT for WASM allowing tiering up into V8’s TurboFan optimizing JIT.

Copy-and-patch works by writing stencils: minimal C functions that implement the desired individual operations such that they compile to concatenable native code fragments. At JIT compile time, one can copy the pre-compiled fragment for each operation back-to-back, patching them to change embedded constants or addresses as needed.

As an adventure into understanding how copy-and-patch works, our goal will be to create the function

    int add_a_b(int a, int b) { return a + b; }

but specialized at runtime to compute 1 + 2. We’ll be doing this by first breaking it down into some bytecode-sized operations:

    const_int_reg1: a = 1;
    const_int_reg2: b = 2;
    add_int1_int2:  c = a + b;
    return_int1:    return c;

And to define our copy-and-patch JIT, we’ll take each of these and:

1. Implement the operation in C with relocation holes to be later patched to form our stencil.
2. Compile the stencil into native code.
3. Copy-paste the native code back into a C file with functions to emit it to a buffer and patch any relocations.

Then we can write our little JIT compilation engine to concatenate our stencils and execute the generated function. Let’s get started!
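For comparison, here is what running the same four operations through a plain interpreter might look like. This is only a sketch to make the "interpretation overhead" from the footnote concrete: the opcode enum, the instr struct, and interpret are hypothetical names, not part of the stencil code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode for the four operations listed above. */
typedef enum {
    OP_CONST_INT_REG1,
    OP_CONST_INT_REG2,
    OP_ADD_INT1_INT2,
    OP_RETURN_INT1
} opcode;

typedef struct {
    opcode op;
    int operand;  /* only used by the two const operations */
} instr;

/* Run a program. The loop and switch executed for every instruction are
 * the dispatch overhead that a baseline JIT aims to remove. */
static int interpret(const instr *program, size_t n) {
    int reg1 = 0, reg2 = 0;
    for (size_t i = 0; i < n; i++) {
        switch (program[i].op) {
        case OP_CONST_INT_REG1: reg1 = program[i].operand; break;
        case OP_CONST_INT_REG2: reg2 = program[i].operand; break;
        case OP_ADD_INT1_INT2:  reg1 = reg1 + reg2;        break;
        case OP_RETURN_INT1:    return reg1;
        }
    }
    assert(!"program ended without OP_RETURN_INT1");
    return 0;
}
```

Like the add stencil, this sketch leaves the sum in the first register. The JIT version produces the same result but with the four operations laid out back-to-back as native code, so no dispatch happens at run time.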

    #include <stdint.h>

    #define STENCIL_FUNCTION __attribute__((preserve_none))

    // Symbols that are never defined: references to them become the
    // relocations (holes) that get patched at JIT time.
    extern char cnp_value_hole[65536];
    extern void cnp_func_hole(void) STENCIL_FUNCTION;

    #define STENCIL_HOLE(type) \
      (type)((uintptr_t)&cnp_value_hole)
    #define DECLARE_STENCIL_OUTPUT(...) \
      typedef void(*stencil_output_fn)(__VA_ARGS__) STENCIL_FUNCTION; \
      stencil_output_fn stencil_output = (stencil_output_fn)&cnp_func_hole;

    STENCIL_FUNCTION void load_int_reg1() {
      int a = STENCIL_HOLE(int);
      DECLARE_STENCIL_OUTPUT(int);
      stencil_output(a);
    }

    STENCIL_FUNCTION void load_int_reg2(int a) {
      int b = STENCIL_HOLE(int);
      DECLARE_STENCIL_OUTPUT(int, int);
      stencil_output(a, b);
    }

    STENCIL_FUNCTION void add_int1_int2(int a, int b) {
      int c = a + b;
      DECLARE_STENCIL_OUTPUT(int);
      stencil_output(c);
    }

    STENCIL_FUNCTION int return_int1(int a) {
      return a;
    }

    0000000000000000 <load_int_reg1>:
       0:   41 bc 00 00 00 00       mov    r12d,0x0
                            2: R_X86_64_32  cnp_value_hole
       6:   e9 00 00 00 00          jmp    b <load_int_reg1+0xb>
                            7: R_X86_64_PLT32       cnp_func_hole-0x4
       b:   0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

    0000000000000010 <load_int_reg2>:
      10:   41 bd 00 00 00 00       mov    r13d,0x0
                            12: R_X86_64_32 cnp_value_hole
      16:   e9 00 00 00 00          jmp    1b <load_int_reg2+0xb>
                            17: R_X86_64_PLT32      cnp_func_hole-0x4
      1b:   0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

    0000000000000020 <add_int1_int2>:
      20:   45 01 ec                add    r12d,r13d
      23:   e9 00 00 00 00          jmp    28 <add_int1_int2+0x8>
                            24: R_X86_64_PLT32      cnp_func_hole-0x4
      28:   0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
      2f:   00

    0000000000000030 <return_int1>:
      30:   44 89 e0                mov    eax,r12d
      33:   c3                      ret
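Two numbers need to be read off this listing for each stencil: how many bytes to keep (everything before the jmp) and where the hole sits relative to the stencil's start. A small sketch of that arithmetic, with stencil_listing, stencil_size, and hole_offset as hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what is read off the disassembly for one stencil. */
typedef struct {
    size_t func_start;    /* address of the function's first byte */
    size_t jmp_address;   /* address of the trailing jmp to cnp_func_hole */
    size_t reloc_address; /* address shown for the cnp_value_hole relocation */
} stencil_listing;

/* Bytes to keep: from the function start up to, but not including, the jmp. */
static size_t stencil_size(const stencil_listing *l) {
    assert(l->jmp_address >= l->func_start);
    return l->jmp_address - l->func_start;
}

/* Offset of the hole relative to the start of the copied stencil. */
static size_t hole_offset(const stencil_listing *l) {
    assert(l->reloc_address >= l->func_start);
    return l->reloc_address - l->func_start;
}
```

For load_int_reg2 (start 0x10, jmp at 0x16, relocation at 0x12) this gives a 6-byte stencil with its hole at offset 0x2. Dropping the jmp is what lets execution fall through into whichever stencil is copied next.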

    #include <stdint.h>
    #include <string.h>

    uint8_t cnp_stencil_load_int_reg1_code[] = {
      0x41, 0xbc, 0x00, 0x00, 0x00, 0x00, // mov r12d,0x0
    };
    uint8_t* cnp_copy_load_int_reg1(uint8_t* stencil_start) {
      const size_t stencil_size = sizeof(cnp_stencil_load_int_reg1_code);
      memcpy(stencil_start, cnp_stencil_load_int_reg1_code, stencil_size);
      return stencil_start + stencil_size;
    }
    void cnp_patch_load_int_reg1(uint8_t* stencil_start, int value) {
      // 2: R_X86_64_32 cnp_value_hole -> 0x02 offset
      memcpy(stencil_start + 0x2, &value, sizeof(value));
    }

    uint8_t cnp_stencil_load_int_reg2_code[] = {
      0x41, 0xbd, 0x00, 0x00, 0x00, 0x00, // mov r13d,0x0
    };
    uint8_t* cnp_copy_load_int_reg2(uint8_t* stencil_start) {
      const size_t stencil_size = sizeof(cnp_stencil_load_int_reg2_code);
      memcpy(stencil_start, cnp_stencil_load_int_reg2_code, stencil_size);
      return stencil_start + stencil_size;
    }
    void cnp_patch_load_int_reg2(uint8_t* stencil_start, int value) {
      // 12: R_X86_64_32 cnp_value_hole -> 0x12 - 0x10 base = 0x2
      memcpy(stencil_start + 0x2, &value, sizeof(value));
    }

    uint8_t cnp_stencil_add_int1_int2_code[] = {
      0x45, 0x01, 0xec, // add r12d,r13d
    };
    uint8_t* cnp_copy_add_int1_int2(uint8_t* stencil_start) {
      const size_t stencil_size = sizeof(cnp_stencil_add_int1_int2_code);
      memcpy(stencil_start, cnp_stencil_add_int1_int2_code, stencil_size);
      return stencil_start + stencil_size;
    }
    // No patching needed

    uint8_t cnp_stencil_return_int1_code[] = {
      0x44, 0x89, 0xe0, // mov eax,r12d
      0xc3,             // ret
    };
    uint8_t* cnp_copy_return_int1(uint8_t* stencil_start) {
      const size_t stencil_size = sizeof(cnp_stencil_return_int1_code);
      memcpy(stencil_start, cnp_stencil_return_int1_code, stencil_size);
      return stencil_start + stencil_size;
    }
    // No patching needed
