Background
Starting from ARMv8.1, the Virtualization Host Extensions (VHE) allow an unmodified OS to run in EL2. With system register redirection, an instruction such as mrs x0, ESR_EL1 executed in EL2 is redirected to access ESR_EL2.
Host and guest applications run in EL0, the guest kernel runs in EL1, and the host kernel together with KVM lives in EL2. The host therefore bypasses EL1 most of the time.
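To make the redirection concrete, here is a small C model (not kernel code) of which ESR a given access reaches. The enum and function names are made up for illustration, and EL3 and nested virtualization are ignored. The rule it encodes is the architectural one: with HCR_EL2.E2H set, an EL1-named access executed in EL2 reaches the EL2 register, and the _EL12 alias reaches the real EL1 register.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum esr_name { NAME_ESR_EL1, NAME_ESR_EL2, NAME_ESR_EL12 };
enum esr_phys { PHYS_ESR_EL1, PHYS_ESR_EL2, PHYS_UNDEFINED };

/*
 * Resolve which physical ESR an MRS/MSR reaches.
 * current_el: exception level executing the instruction (1 or 2).
 * e2h:        value of HCR_EL2.E2H.
 */
enum esr_phys resolve_esr(int current_el, bool e2h, enum esr_name name)
{
    switch (name) {
    case NAME_ESR_EL1:
        /* Redirection only applies in EL2 with E2H set. */
        return (current_el == 2 && e2h) ? PHYS_ESR_EL2 : PHYS_ESR_EL1;
    case NAME_ESR_EL2:
        /* EL2 registers are not reachable from EL1 in this model. */
        return current_el == 2 ? PHYS_ESR_EL2 : PHYS_UNDEFINED;
    case NAME_ESR_EL12:
        /* The _EL12 alias only exists in EL2 with E2H set. */
        return (current_el == 2 && e2h) ? PHYS_ESR_EL1 : PHYS_UNDEFINED;
    }
    return PHYS_UNDEFINED;
}
```

This is why a VHE host kernel can keep its EL1-named register accesses unchanged, while KVM uses the _EL12 names when it needs to touch the guest's EL1 state.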
Note
In this blog, KVM is built directly into the Linux kernel, so module_init expands to device_initcall and the KVM init function runs during kernel initialization.
The following sections are based on Linux 4.19.
Initialization
KVM initialization starts from arm_init in virt/kvm/arm/arm.c, whose only job is to call kvm_init:
kvm_init
    -> kvm_arch_init
        -> in_hyp_mode = is_kernel_in_hyp_mode(); // return true, HCR_EL2.E2H == 1 --> whole kernel in EL2N
        -> init_common_resources
            -> kvm_set_ipa_limit
        -> init_subsystems
            -> _kvm_arch_hardware_enable // enable hardware to access EL2
            -> kvm_vgic_hyp_init // ??? debugger error
            -> kvm_timer_hyp_init // ??? debugger error
            -> kvm_perf_init // ??? debugger error
            -> kvm_coproc_table_init // ??? debugger error
            -> _kvm_arch_hardware_disable
    -> kvm_irqfd_init
    -> register_reboot_notifier
    -> kmem_cache_create_usercopy
    -> kvm_async_pf_init
        -> KMEM_CACHE
    -> misc_register // register /dev/kvm as a chardev, unlocked_ioctl = kvm_dev_ioctl
    -> kvm_vfio_ops_init
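Once misc_register has run, userspace can reach kvm_dev_ioctl through the character device. The following is a minimal userspace sketch that probes it with KVM_GET_API_VERSION; the helper name kvm_api_version is invented for this example, and the device path is passed in rather than hard-coded.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open the KVM misc device at `path` and ask for its API version.
 * Returns the version reported by the kernel, or -1 on any failure.
 */
int kvm_api_version(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Dispatched through the device's unlocked_ioctl, i.e. kvm_dev_ioctl. */
    int version = ioctl(fd, KVM_GET_API_VERSION, 0);

    close(fd);
    return version;
}
```

On a host with KVM available, kvm_api_version("/dev/kvm") is expected to equal KVM_API_VERSION; a return of -1 usually means the device is missing or not accessible.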
KVM Simple Example
- KVM_CREATE_VM (use kvmfd, return vmfd)
kvm_dev_ioctl_create_vm
    -> kvm_create_vm
        -> kvm_arch_alloc_vm // has_vhe ? vzalloc : kzalloc
        -> kvm_eventfd_init
        -> kvm_arch_init_vm // initialize stage2 mm & vgic
            -> kvm_arm_setup_stage2 // VTCR_EL2 related, 2-level pgtable (not split host PMD huge pages)
            -> kvm_alloc_stage2_pgd
            -> create_hyp_mappings // kernel already in hyp mode, do nothing
            -> kvm_vgic_early_init // initialize static VGIC VCPU data structures
        -> kvm_init_mmu_notifier // maybe for swap???
    -> kvm_coalesced_mmio_init
    -> get_unused_fd_flags // alloc vmfd
    -> fd_install
- KVM_SET_USER_MEMORY_REGION (use vmfd, setup guest code)
copy_from_user
kvm_vm_ioctl_set_memory_region
    -> kvm_set_memory_region // register a memslot: userspace-allocated memory backs the given GPA range
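On the userspace side, the backing memory is owned by the caller and the ioctl only tells KVM which guest-physical range it backs. A minimal sketch of filling the argument structure follows; make_region is a hypothetical helper, while the struct and its fields come from linux/kvm.h.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill the argument for KVM_SET_USER_MEMORY_REGION.
 * host_mem should be page-aligned memory owned by the caller (e.g. from mmap).
 */
struct kvm_userspace_memory_region
make_region(uint32_t slot, uint64_t gpa, uint64_t size, void *host_mem)
{
    struct kvm_userspace_memory_region r;

    memset(&r, 0, sizeof(r));       /* flags = 0: plain read/write slot */
    r.slot = slot;                  /* memslot index chosen by userspace */
    r.guest_phys_addr = gpa;        /* where the guest sees the memory */
    r.memory_size = size;           /* length in bytes */
    r.userspace_addr = (uint64_t)(uintptr_t)host_mem;
    return r;
}
```

The result would then be passed as ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &r), after the guest code has been copied into host_mem.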
- KVM_CREATE_VCPU (use vmfd, return vcpufd)
kvm_vm_ioctl_create_vcpu
    -> kvm_arch_vcpu_create
        -> kmem_cache_zalloc
        -> kvm_vcpu_init
            -> kvm_arch_vcpu_init
                -> kvm_timer_vcpu_init
                -> kvm_vgic_vcpu_init
        -> create_hyp_mappings // kernel already in hyp mode, do nothing
    -> kvm_arch_vcpu_setup // arm64 do nothing
    -> create_vcpu_fd // kvm_vcpu_fops: map (struct kvm_run) to userspace
    -> kvm_arch_vcpu_postcreate // arm64 do nothing
- KVM_ARM_VCPU_INIT (use vcpufd)
copy_from_user
kvm_arch_vcpu_ioctl_vcpu_init
    -> kvm_vcpu_set_target
    -> vcpu_reset_hcr // hcr_el2 |= HCR_E2H, etc.
- KVM_SET_ONE_REG (use vcpufd)
copy_from_user
kvm_arm_set_reg
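KVM_SET_ONE_REG identifies the register by a 64-bit ID. The sketch below builds the ID of a 64-bit arm64 core register; the SKETCH_ constants are copies of the values in the arm64 uapi headers (renamed so the example builds on any host), and arm64_core_reg_id is a hypothetical helper.

```c
#include <stdint.h>

/* Values mirrored from the arm64 uapi headers; names are local to this sketch. */
#define SKETCH_KVM_REG_ARM64     0x6000000000000000ULL
#define SKETCH_KVM_REG_SIZE_U64  0x0030000000000000ULL
#define SKETCH_KVM_REG_ARM_CORE  (0x0010ULL << 16)

/*
 * Build the ID of a 64-bit core register.
 * byte_offset: offset of the register inside struct kvm_regs;
 * the ID stores that offset in units of 32-bit words.
 */
uint64_t arm64_core_reg_id(uint64_t byte_offset)
{
    return SKETCH_KVM_REG_ARM64 | SKETCH_KVM_REG_SIZE_U64 |
           SKETCH_KVM_REG_ARM_CORE | (byte_offset / sizeof(uint32_t));
}
```

Assuming the usual layout of struct kvm_regs (x0..x30, then sp, then pc), the PC sits at byte offset 256, so arm64_core_reg_id(256) gives the ID to put in struct kvm_one_reg when setting the guest entry point.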
- KVM_RUN (use vcpufd)
kvm_arch_vcpu_ioctl_run
    -> kvm_vcpu_first_run_init
    -> vcpu_load
        -> kvm_arch_vcpu_load
            -> kvm_arm_set_running_vcpu // this_cpu_write
            -> kvm_vgic_load
            -> kvm_vcpu_load_sysregs
        -> put_cpu // preempt_enable
    -> while loop (ret > 0)
        -> cond_resched + update_vmid + check_vcpu_requests
        -> preempt_disable + local_irq_disable ...
        -> guest_enter_irqoff + kvm_arm_vhe_guest_enter
        -> kvm_vcpu_run_vhe (return exception_index)
            -> __activate_vm // vtcr_el2, vttbr_el2
            -> __activate_traps // cpacr_el1, vbar_el1
            -> sysreg_restore_guest_state_vhe
            -> while loop (fixup_guest_exit)
                -> __guest_enter // arch/arm64/kvm/hyp/entry.S
            -> sysreg_save_guest_state_vhe
            -> __deactivate_traps
            -> sysreg_restore_host_state_vhe
        -> kvm_arm_vhe_guest_exit
        -> local_irq_enable
        -> guest_exit
        -> handle_exit_early + preempt_enable
        -> handle_exit
            -> handle_trap_exceptions // if exception_index is ARM_EXCEPTION_TRAP
                -> arm_exit_handlers[hsr_ec]
    -> vcpu_put
        -> preempt_disable
        -> kvm_arch_vcpu_put
            -> kvm_vcpu_put_sysregs
            -> kvm_vgic_put
            -> kvm_arm_set_running_vcpu
        -> preempt_enable
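The arm_exit_handlers[hsr_ec] step is a table lookup keyed by the exception class, which occupies ESR bits [31:26]. Below is a simplified userspace model of that dispatch, not the kernel's code: the handler bodies are placeholders, all names are invented for the sketch, and unknown classes are reported as an error here for simplicity. The return convention follows the run loop above, where a positive value means the guest is re-entered and 0 means control returns to userspace.

```c
#include <stddef.h>
#include <stdint.h>

/* The exception class (EC) lives in ESR bits [31:26]. */
#define EC_SHIFT 26
#define EC_MASK  0x3fU

/* A few architectural EC values. */
#define EC_WFX      0x01U  /* trapped WFI/WFE */
#define EC_HVC64    0x16U  /* HVC from AArch64 */
#define EC_DABT_LOW 0x24U  /* data abort from a lower EL */

typedef int (*exit_handler_fn)(uint32_t esr);

/* Placeholder handlers: 1 = re-enter the guest, 0 = go back to userspace. */
static int handle_wfx(uint32_t esr)     { (void)esr; return 1; }
static int handle_hvc(uint32_t esr)     { (void)esr; return 1; }
static int handle_dabt(uint32_t esr)    { (void)esr; return 0; }
static int handle_unknown(uint32_t esr) { (void)esr; return -1; }

/* Sparse table indexed by EC; unset entries stay NULL. */
static const exit_handler_fn exit_handlers[EC_MASK + 1] = {
    [EC_WFX]      = handle_wfx,
    [EC_HVC64]    = handle_hvc,
    [EC_DABT_LOW] = handle_dabt,
};

/* Extract the EC from the syndrome and call the matching handler. */
int dispatch_exit(uint32_t esr)
{
    uint32_t ec = (esr >> EC_SHIFT) & EC_MASK;
    exit_handler_fn fn = exit_handlers[ec];

    return fn ? fn(esr) : handle_unknown(esr);
}
```

In this model, the data abort handler returning 0 stands in for an MMIO access that has to be completed by userspace, which is what ends the while loop (ret > 0) and lets KVM_RUN return.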