README.aarch64


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240


Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run anything generated by gcc-4.8.2 -O3.  The
port is under active development.

Current limitations, as of mid-May 2014.

* limited support of vector (SIMD) instructions.  Initial target is
  support for instructions created by gcc-4.8.2 -O3
  (via autovectorisation).  This is complete.

* Integration with the built in GDB server:
   - works ok (breakpoint, attach to a process blocked in a syscall, ...)
   - still to do:
      arm64 xml register description files (allowing shadow registers
                                            to be looked at).
      cpsr transfer to/from gdb to be looked at (see also arm equivalent code)

* limited syscall support

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.

Enough syscalls and instructions are supported for substantial
programs to work.  Firefox 26 is able to start up and quit.  The noise
level from Memcheck is low enough to make it practical to use for real
debugging.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far was however done by cross compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.  To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard to understand ways,
so adding it temporarily.


m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF 
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff


./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG         29    //???
#  define SP_REG         31    //???
#  define RA_REG_DEFAULT 30    //???


vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections
seems like tst->os_state.exitcode has word type
in which case the ppc64_linux use of lwz to read it, is wrong


syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it, alive.
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads .. or .. can anything other
integer instructions, that write to a register, cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding


ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.


m_debugger.c:
uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64INstr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)


preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
vis 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)


check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid:
imm64  x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2 


valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which be useful as it is
callee saved


ubfm/sbfm etc: special case cases that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.


LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"


Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code


chainXDirect_ARM64: use direct jump forms when possible