Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run anything generated by gcc-4.8.2 -O3.  The
port is under active development.

Current limitations, as of mid-May 2014.

* limited support of vector (SIMD) instructions.  Initial target is
  support for instructions created by gcc-4.8.2 -O3
  (via autovectorisation).  This is complete.

* Integration with the built in GDB server:
   - works ok (breakpoint, attach to a process blocked in a syscall, ...)
   - still to do:
      arm64 xml register description files (allowing shadow registers
      to be looked at).
      cpsr transfer to/from gdb to be looked at (see also arm
      equivalent code)

* limited syscall support

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.

Enough syscalls and instructions are supported for substantial
programs to work.  Firefox 26 is able to start up and quit.  The noise
level from Memcheck is low enough to make it practical to use for real
debugging.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far was however done by cross compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.
To avoid this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?

vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard to understand ways,
so adding it temporarily.

m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff

./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?

m_debuginfo/priv_storage.h: need proper defn of DiCfSI

readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
# define FP_REG  29    //???
# define SP_REG  31    //???
# define RA_REG_DEFAULT 30 //???

vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?

syswrap-linux.c run_a_thread_NORETURN assembly sections
seems like tst->os_state.exitcode has word type
in which case the ppc64_linux use of lwz to read it, is wrong

syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here

dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.

dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?

in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly

putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it alive
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads .. or .. can any other integer
instructions that write to a register cause exceptions?

loads/stores: generate stack alignment checks as necessary

fix barrier insns: ISB, DMB

fix atomic loads/stores

FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding

ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available

Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.

wrapper for __NR3264_fstat -- correct?

PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.

sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.

m_debugger.c:
  uregs.pstate = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?

host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)

preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy

fold_IRExpr_Unop: add a reduction rule for this
   1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)

check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8

isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.

/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync

very stupid:
imm64 x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2

valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations

could possibly bring r29 into use, which would be useful as it is
callee saved

ubfm/sbfm etc: special-case the forms that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.

LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case

DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"

Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."

math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code

chainXDirect_ARM64: use direct jump forms when possible
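
Re the "very stupid: imm64 x23, 0xFFFFFFFFFFFFFFA0" note above: the 16
bytes shown decode as MOVZ x23,#0xFFA0 followed by three MOVKs of
0xFFFF.  Since three of the four 16-bit chunks are all-ones, a single
MOVN x23,#0x5F would produce the same value.  Below is a minimal
sketch of one way to pick the shorter sequence.  It is not the code
in host_arm64_defs.c; the names imm64_to_insns and move_wide are
hypothetical, and it ignores the ORR logical-immediate forms, which
can also load some constants in one instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit "move wide (immediate)" opcode bases.  Field layout:
   hw (16-bit chunk index) in bits 22:21, imm16 in bits 20:5,
   Rd in bits 4:0. */
#define A64_MOVN 0x92800000u
#define A64_MOVZ 0xD2800000u
#define A64_MOVK 0xF2800000u

/* Hypothetical helper: assemble one move-wide instruction word. */
static uint32_t move_wide(uint32_t base, unsigned hw, uint16_t imm16,
                          unsigned rd)
{
    return base | ((uint32_t)hw << 21) | ((uint32_t)imm16 << 5)
                | (rd & 31u);
}

/* Hypothetical helper: write the instructions that load 'imm' into
   Xrd to out[0..3] and return how many were written (1 to 4). */
size_t imm64_to_insns(uint32_t out[4], unsigned rd, uint64_t imm)
{
    unsigned hw, zeros = 0, ones = 0;
    size_t n = 0;

    /* Count the chunks that a MOVZ (zeros) or MOVN (ones) start
       would give us for free. */
    for (hw = 0; hw < 4; hw++) {
        uint16_t chunk = (uint16_t)(imm >> (16 * hw));
        if (chunk == 0x0000) zeros++;
        if (chunk == 0xFFFF) ones++;
    }

    /* Start from all-ones only when that skips more chunks. */
    int      use_movn = ones > zeros;
    uint16_t skip     = use_movn ? 0xFFFF : 0x0000;

    for (hw = 0; hw < 4; hw++) {
        uint16_t chunk = (uint16_t)(imm >> (16 * hw));
        if (chunk == skip)
            continue;                 /* already has the right value */
        if (n == 0)
            /* MOVN writes ~(imm16 << 16*hw), so pass the inverse. */
            out[n++] = use_movn
                ? move_wide(A64_MOVN, hw, (uint16_t)~chunk, rd)
                : move_wide(A64_MOVZ, hw, chunk, rd);
        else
            out[n++] = move_wide(A64_MOVK, hw, chunk, rd);
    }

    /* imm is 0 or ~0: every chunk was skipped, emit one insn. */
    if (n == 0)
        out[n++] = move_wide(use_movn ? A64_MOVN : A64_MOVZ, 0, 0, rd);
    return n;
}
```

Calling imm64_to_insns(out, 23, 0xFFFFFFFFFFFFFFA0) with this sketch
yields a single word, the encoding of MOVN x23,#0x5F, in place of the
four-instruction sequence quoted in the note.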