========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
  unnamed_field := number ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.  If multiple ``unnamed_fields`` are
present, they are concatenated, with the first one listed occupying the most
significant bits of the result.  In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

One may use ``!function`` with zero ``unnamed_fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``unnamed_fields`` and no ``!function`` is in error.
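
As a minimal sketch of the concatenation rule, the following C fragment
hand-writes the extraction for the field definitions ``%disp 0:s16``,
``%imm9 16:6 10:3`` and ``%disp12 0:s1 1:1 2:10``.  The helpers
``ex_extract`` and ``ex_sextract`` and the ``field_*`` functions are
hypothetical names chosen for illustration, not the generator's actual
output, and a 32-bit insn is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned bitfield of `len` bits starting at bit `pos`. */
static uint32_t ex_extract(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (insn >> pos) & mask;
}

/* Hypothetical helper: the same bitfield, sign-extended from its top bit. */
static int32_t ex_sextract(uint32_t insn, int pos, int len)
{
    int64_t v = ex_extract(insn, pos, len);
    int64_t sign = INT64_C(1) << (len - 1);
    return (int32_t)((v ^ sign) - sign);
}

/* %disp 0:s16 */
static int32_t field_disp(uint32_t insn)
{
    return ex_sextract(insn, 0, 16);
}

/* %imm9 16:6 10:3 : the first subfield ends up in the high bits. */
static int32_t field_imm9(uint32_t insn)
{
    return (int32_t)(ex_extract(insn, 16, 6) << 3 | ex_extract(insn, 10, 3));
}

/* %disp12 0:s1 1:1 2:10 : only the leading subfield is signed.
 * The signed part is scaled by multiplication because left-shifting
 * a negative value is undefined in C. */
static int32_t field_disp12(uint32_t insn)
{
    return ex_sextract(insn, 0, 1) * (1 << 11)
           | (int32_t)(ex_extract(insn, 1, 1) << 10)
           | (int32_t)ex_extract(insn, 2, 10);
}
```

Under these assumptions an insn with only bit 0 set gives a ``%disp12``
value of -2048, since the single signed bit is sign-extended before it is
placed above the other eleven bits.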

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+

Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type.  If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure "arg_$name"
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.
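
The benefit of a shared structure can be sketched in C as follows.  The
layout of ``arg_reg3`` is what one would expect for an argument set
declared as ``&reg3 ra rb rc``; the toy ``DisasContext`` holding a
register file, the ``LogicFn`` type, and the two-argument ``trans_*``
signatures are assumptions made for illustration only.  A real
translator emits code rather than computing results directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed rendering of "&reg3 ra rb rc": untyped arguments are int. */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* Toy stand-in for DisasContext, used here as a plain register file. */
typedef struct {
    uint32_t reg[32];
} DisasContext;

typedef uint32_t LogicFn(uint32_t x, uint32_t y);

static uint32_t do_and(uint32_t x, uint32_t y) { return x & y; }
static uint32_t do_or(uint32_t x, uint32_t y) { return x | y; }

/* One helper serves both patterns because both fill in an arg_reg3. */
static bool gen_logic_insn(DisasContext *ctx, const arg_reg3 *a, LogicFn *fn)
{
    ctx->reg[a->rc] = fn(ctx->reg[a->ra], ctx->reg[a->rb]);
    return true;
}

static bool trans_and(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, do_and);
}

static bool trans_or(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, do_or);
}
```

Had the two patterns used differently named or differently typed
structures, ``gen_logic_insn`` would need one variant per structure.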

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t


Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the cpu and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be matched.
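
The ordering rule for overlap groups can be sketched as a loop over a
table of entries.  This is an illustration only: the table-driven
``decode_overlap_group``, the ``GroupEntry`` type, the toy
``DisasContext`` and the two 16-bit example patterns are all invented
here, and a generated decoder is more likely to use nested tests than a
loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy context: records what the decoder did instead of emitting code. */
typedef struct {
    int tried;      /* translate functions called so far */
    int accepted;   /* index of the entry that returned true, or -1 */
} DisasContext;

typedef bool TransFn(DisasContext *ctx, uint32_t insn);

typedef struct {
    uint32_t fixedmask;
    uint32_t fixedbits;
    TransFn *trans;
} GroupEntry;

/* Walk the entries in source order; a translate function that returns
 * false hands the insn on to the next entry whose fixedbits match. */
static bool decode_overlap_group(DisasContext *ctx, const GroupEntry *g,
                                 size_t n, uint32_t insn)
{
    for (size_t i = 0; i < n; i++) {
        if ((insn & g[i].fixedmask) != g[i].fixedbits) {
            continue;
        }
        ctx->tried++;
        if (g[i].trans(ctx, insn)) {
            ctx->accepted = (int)i;
            return true;
        }
    }
    return false;
}

/* Invented constraint: the specialized form rejects a nonzero low nibble. */
static bool trans_special(DisasContext *ctx, uint32_t insn)
{
    (void)ctx;
    return (insn & 0xf) == 0;
}

static bool trans_general(DisasContext *ctx, uint32_t insn)
{
    (void)ctx;
    (void)insn;
    return true;
}

/* The specialized entry comes first, followed by the general one. */
static const GroupEntry example_group[] = {
    { 0xff00, 0x1200, trans_special },
    { 0xf000, 0x1000, trans_general },
};
```

With this table, an insn of ``0x1231`` matches the fixedbits of the first
entry, is rejected by ``trans_special``, and is then accepted by the
second entry.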

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
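
The constants in the outer ``switch`` can be checked against the fixed
bits of the *or* pattern.  The following sketch derives a mask and value
from a 32-bit pattern string in which each named field has been replaced
by the same number of dots.  ``FixedBits``, ``parse_fixedbits`` and
``pattern_matches`` are hypothetical names and are not part of the
generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t fixedmask;
    uint32_t fixedbits;
} FixedBits;

/* Parse a 32-bit pattern written with '0', '1', '.' and '-', most
 * significant bit first; spaces are ignored.  Returns false on any
 * other character or if the width is not exactly 32 bits. */
static bool parse_fixedbits(const char *s, FixedBits *out)
{
    uint32_t mask = 0, bits = 0;
    int width = 0;

    for (; *s; s++) {
        if (*s == ' ') {
            continue;
        }
        if (*s != '0' && *s != '1' && *s != '.' && *s != '-') {
            return false;
        }
        if (++width > 32) {
            return false;
        }
        /* Only '0' and '1' contribute to the mask; only '1' to the bits. */
        mask = (mask << 1) | (uint32_t)(*s == '0' || *s == '1');
        bits = (bits << 1) | (uint32_t)(*s == '1');
    }
    if (width != 32) {
        return false;
    }
    out->fixedmask = mask;
    out->fixedbits = bits;
    return true;
}

/* The match condition: (insn & fixedmask) == fixedbits. */
static bool pattern_matches(const FixedBits *p, uint32_t insn)
{
    return (insn & p->fixedmask) == p->fixedbits;
}
```

Parsing ``000010 ..... ..... .... 001001 0 .....`` in this way gives a
mask of ``0xfc000fe0`` and a value of ``0x08000240``, the constants used
by the outer ``switch``.  The *nop* and *copy* patterns fix additional
bits, which is why they are tested by the nested ``if`` statements.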