-
Notifications
You must be signed in to change notification settings - Fork 16
/
TODO
291 lines (257 loc) · 11.1 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
General
DONE - Pass cpu sub opts to cc1 (65c816 registers depend on subtype
for example)
- Rework 8080/5/Z80 to use tree based rewriting when it can and cut
optimizer file down. Need some kind of mini peephole in the backends
for speed
- Allow private peephole rules so we can do generic Z80 rst helpers
DONE - Fix comparisons of longs returning long type!
- Divison by 0 traps in helpers ?
- Why is global const data ending up .data (because .literal is strings so we'd
need to make them .code which might be worth it for unbanked ?)
- Option to put data/constdata into other segs like discard ?
(watch literals though)
- Rework helper() as some kind of target
gen_helper(fnname, typestr/NULL, typestr/NULL);
that does the formatting up and gen_helpcall and tail bits in one
go.
- Helper for backends that works out what can be done entirely 8bit and
marks the subtree
- Sort out make deps for be-* files
- General extra optimiser pass that loads, merges, cleans up and
optimizes trees.
- In the backends support reversible ops (eg >= <= > <) with a rewrite
to put const on the right as we do with the directly switchable ops
- Can we generically rewrite T_PLUS(T_LOCAL, n) to fold in the add -
and do we need to
1802:
- Optimizations
- shift by 8,16,24 l/r unsigned helpers
- multiply and divide fastpaths with shift
- peephole for jumps
- peephole for store/reload of local
- use plusconst in eqops for add/sub (will need direct helpers
- use mul/div optimizations also in mul/div/rem of eqops
- enable registers
EE200:
DONE - Build an ee200 emulator
- Start running basic tests
- Finish mull / div32x32 stuff
- Debug conditions
- Simpler CPU6 optimizations
Nova:
- Pseudo ops for jmp with range
- Experiment with deferred cleanup (would be good on 65C816 esp too)
Keep sp, and spfree. On a func return we adjust spfree, on a push/pop
we do push/pop or generate a load relative to AC3 and adjust spfree if
in the spfree range. Before a branch/ret we correct spfree/sp. So we'd
generate
jsr __cpush,0
.word 55
jsr 1,1
.word __func
lda 1,6,3
sta 0,3
jsr 1,1
.word __func2
popa 0
Z80:
MOSTLY- Sort out size of code
- parse arg code from 65C816 but probably just use it to
deal with reg args as push ix/iy/bc directly (also track some HL
usage so we can do stuff like ld hl,%d push hl push hl - ditto 808x)
- track HL ?
DONE - Option for no IX (or global IX ?)
DONE - Some kind of core code hooks for passing platform ops
so you can specify stuff like -mz80-noiy to target.c and backend.c
- Move work back over to fuzix tree and check size of pass 2
- Some lousy register ops still happen where RREF could be much
smarter.
- Banked mode (needs bintools fixes for subarch stuff and generate
push af/blah/pop af and 2 byte extra offset). Will then need a
bunch of linker changes
[compiler side done]
- Is it worth doing extra saves of bc' de' hl' / bc de hl in the
signal handler etc and using them as 3 extra reg pairs. We can
only really use them faster load/store and some limited
optimizations (inc/dec/ maybe add/sub const) and they would not
be that cheap to get to (exx push exx pop) but are 4 bytes into
reg needed (29 cycles)
- Deferred stack cleanup and stack adjust so we can optimize
initializers
- Ultimately work out what can be done 8bit and do it via A
Z8:
- CCONLY (enables stuff like decw testing for 1 or tests for 0 etc)
- plusplusc is probably not worth it
- Track regpairs for r4/r6/r8/r10 and use them as well when loading
r14/r15 ?
- reg globals somehow
Super8:
- As Z8
- bit ops - optimize | 1<< & ~ 1 << etc like on Z80
- optimize bit tests once we get CCONLY
- Can we do <0 compares etc with clr r3, bcp r3,r2,#15 ?
- MULT/DIV and other op improvements, better use of ldei etc
- Why are some reg derefs not being optimized nicely eg *--p ends up
shuffling stuff in and out of r14/r15
DONE - use CALL IA for obvious targets
- gargr2, pargr2 (worth 700 bytes on Fuzix)
(more if hit some common offsets with own helpers)
- pushln (377) + maybe some common offsets
- cc* - 479
- kill nstore2 on super8 ? def if not -Os)
DONE?- Find and for -Os out of line
or r3, r2
clr r2
jr z, X1
ld r3,#1
X1:
xor r3,#1
find how ldw rr2, is getting 32bit values sometimes without
maskign for -1 etc
Rewrite all the
clr r2
call __cceqconstb (and other cc forms)
to remove the clr r2
load from reg not using ldw ?
ld r14,r6
ld r15,r7
add 217,#2
adc 216,#0
jp __cleanup2
should have used incw incw
(also we need to see if we can make all these irq atomic)
__cpluseq1 __cpluseq2 __cpluseq4 // ditto for char helpers
680x
- Size of code!
- Inline simple 8/16/24 style const shifts
- ++/-- for 1/2 bytes optimize to use inc/dec in place if
not needing result. Allow for offset of X so we can do
locals (usual case) fast.
DONE - Turn on LREF etc for 32bit types (and check float safe)
(done for those with Y not clear worth it otherwise)
DONE - Add floats (showing a test fail ?)
DONE - Track what X points to
- Track Y and D int values when working 32bit
- Optimise Y changes via INY/DEY when it makes sense
DONE?- ++ etc should store via x but remember if d now is the value
- 6809 for conditions don't use sub and re-order to but cmp and flip the
logic test (eg while (i < j) generates poor code)
PART - Don't go via X for ,S style += forms on 6809
- CC flags
MOST - Fix linker for ,PCREL stuff. Add options on -m6809 for lbra/pcrel v jmp etc
- Arg helpers - maybe arg0 in D, but certainly take const args via X
- 6809 - figure out how to turn assignments of T_PLUS(T_LOCAL, n)) into
leax, stx ?
- 6303 - use X/S tracking to optimise cleanup on function end
- lplus2xb peephole (https://github.com/EtchedPixels/Fuzix-Compiler-Kit/issues/144#issuecomment-2448679156)
65C816
8080
- Import Z80 trick for locals via DE
shld %1
shld %2
lhld %1 but volatiles...
mvi l,0
mov a,l
=
xra a
mov l,a
(and look at removing surplus writes to l in those cases)
push h
pop h
push h
=
push h ?
xchg
call __cmpne | __cmpeq
remove the xchg as de/hl is same as hl/de
others invert logic and remove xchg
- Z80 find why asm blew up for (ix) not (ix + 0)
- Optimise compares to 0 and FFFF on 8080/5 with
mov a,h ana l and mov a,h ora l ??
and sbc hl,de on z80 - these all need the flags only bits
- keep a volatile count of in context volatiles and some short arrays of names of volatiles
in scope so we can do basic volatiles as well as casting style
- spot conditionals that are if (simple) and turn them into some kind of FLAGSONLY node so that
we can optimize simple conditions and avoid bool calls (eg ora a jz)
- set a FLAGONLY on the top node of the conditional in an if/while/.., then if we see a
condition with FLAGONLY we can optimize it in the target
for stuff like if (*p) we can spot the boolc with FLAGSONLY and optimise it to
stuff like ld a,h or l and ld a,l or a (and peep can fix some of the other bits)
[Part done CCONLY exists now to use it more]
- Switch optimizer
- Optimizer options so can switch between cheap, full and add on stuff like rst hooks
- register arguments (some way to pass the info and then generate a subtree
EQ REG regvar DEREF ARGUMENT n to initialize it)
- hash array and function type lookup
- hash name lookup
- rewrite x = x + .. and x = x -... etc as += -=
- Turn the frame the other way up ?
- Track register state on backend
- Fix up the canonical and array decay - check decay to array type
- Have O_ op table tight packed and rerite T_ into O_ entries, then have a property table
for them (memread, memwrite, modworking, sideffect, setcc, communtatitve)
(rewrite is easy - directly code the ops with the right bits so T_ = O_ except
for the ones based on the ascii. Pack those tight and use a 256 byte lookup)
- Can we tag the presence of volatile somehow. It is rare so we could just
remember if volatile appeared in a cast anywhere in the expression and assume
badness, and also keep a small table of names that are volatile (eg keep
10 sym ptrs for global and for local ?)
- Other volatile option, is to pass a volatile marker on expression trees if
a volatile cast appears or a local is marked volatile. Globals are a bit
tricker that way ? (in progress)
- 8085 - load byte vars by word loading the right offset and ignoring one.
- Eliminate assignment to self ?
- Pass "no &local used in this function" for optimization hints (means local
writes don't invalidate mem which for many reg processors with tracking is
really important)
Broken
- need to switch how we handle -ve numbers to fix -32768 problem but also
stuff like negative floats. In check for MINUS/PLUS we need to look for
-ve constant and if so invert it and re-insert a fake MINUS token
(Other plan -
unambiguous minus is T_NEGATE
ambiguious generate T_NEGMAYBE then the value. If we are
expecting a uni op then we parse it as T_NEGATE. If we
were expecting a value we throw it and negate the following
constant)
- Fix
{ int n; { long n; } } - need to know if we are creating new sym level
- Fix
void (*p)(void); void x(void); if (p == x) - bad type (&x works)
Longer term
- fix ptrdiff_t matches INT assumption
- backend hooks for building stack frame via initializers
- literal based handling for some types (set by target) - eg double/longlong
so that we pass pointers around including one to a memory "register".
&x will need care as do casts
- do we want to go to some kind of table rule parsing model over the
current code based one ?
- look at some kind of simple register assignment rewriting so that 6803/6809
etc can try and rewrite some subtrees to use an index register, and maybe
also eliminate the push case for some of those
- simple register tracking helper library
- rewrite && and || and maybe ?: so that we don't have
(AND (BOOL (x)) (BOOL(y))) but some kind of
BOOL(AND(CC(x) CC(y))) and work on condition code plus a tidy if needed so
that we can have ccode setting trees that drop the top bool if the CC is
ok. - IP partly done but need to sort branch trees out to get best result
- 6809 - use pshs/puls as a sneaky tight way to mod the stack
- Optimizer pass for tree rearrangement ?
PARTDONE - Whilst we need to be careful for side effects on * 0, we don't for and/or
subtrees with && || so should deal with those
- native compile support for machines where a pointer is a native long type. Cross
is going to be fine usually
- Write tree size before each expr and load and chain multiple trees if room
Does need care handling any extra header stuff in the way.
- Can we walk the subtrees looking for the most repeated left/right expensive
refs (ie locals on 8080 or Z80) and rewrite the tree so the most common is
in say BC, then rewrite subtree with REG and a load/put of the reg
- 8080 pass first argument in HL, so defer final push before func call. Then
can xthl push hl to get stack in order, and do cleanup of all args on
the return path instead of caller - would need vararg help for cleanup ?
- 8080 rewrite SHL(constant 1, by n) into a 1 << n node so we can gen a
fast 1 << n (lookup table ?)
DONE - Walk subtrees of logic ops to try and optimize bools if value not used and subnodes just
need cc
- Turn !! (NOT NOT) into BOOL