CONTENTS

CHAPTER 1 USER ARCHITECTURE INTRODUCTION
1.1 Introduction to the Tile Processor Architecture ................................................................. 1
1.2 About this Manual ............................................................................................................. 1
1.3 What’s New In This Manual ............................................................................................ 1
1.4 Conventions .................................................................................................................... 2
  1.4.1 Byte and Bit Order ................................................................................................... 2
  1.4.2 Reserved Fields ........................................................................................................ 3
  1.4.3 Numbering ................................................................................................................ 3
1.5 Implementation Dependence .......................................................................................... 4

CHAPTER 2 BASIC ARCHITECTURE
2.1 Architectural Overview .................................................................................................... 5
  2.1.1 Tile Architecture ..................................................................................................... 6
    2.1.1.1 Processor Engine ............................................................................................... 6
    2.1.1.2 Cache Engine .................................................................................................... 7
    2.1.1.3 Switch Engine ................................................................................................. 8
  2.1.2 I/O Devices ............................................................................................................... 9
  2.1.3 iMesh ...................................................................................................................... 10
2.2 Data Types ..................................................................................................................... 11
2.3 Addressing ..................................................................................................................... 11

CHAPTER 3 PROCESSOR ENGINE ARCHITECTURE
3.1 VLIW Nature of the Processor Engine ............................................................................ 13
3.2 Atomicity of Bundles ..................................................................................................... 13
3.3 Register Set .................................................................................................................... 14
3.4 Program Counter ........................................................................................................... 15
3.5 Special Purpose Registers ............................................................................................ 16
3.6 TILE64 and TILEpro Processing Engine Pipeline ......................................................... 16
  3.6.1 Fetch ...................................................................................................................... 16
  3.6.2 RegisterFile (RF) .................................................................................................. 16
  3.6.3 Execute Stages (EX0, EX1) .................................................................................. 17
  3.6.4 WriteBack (WB) ................................................................................................... 17
  3.6.5 Pipeline Latencies ................................................................................................. 17

CHAPTER 4 PROCESSOR ENGINE INSTRUCTION SET

4.1 Overview ........................................................................................................................................................................... 19
4.1 Instruction Set Architecture ................................................................................................................................................. 19
  4.1.1 Instruction Organization and Format .......................................................................................................................... 19
    4.1.1.1 X Instruction Formats ................................................................................................................................................ 20
    4.1.1.2 Y Instruction Formats ................................................................................................................................................ 26
  4.1.2 Definitions and Semantics ............................................................................................................................................... 30
    4.1.2.1 Constants ................................................................................................................................................................. 30
    4.1.2.2 Types ...................................................................................................................................................................... 32
    4.1.2.3 Functions ............................................................................................................................................................... 32
  4.1.3 Master List of Main Processor Instructions ................................................................................................................. 35
  4.1.4 Arithmetic Instructions .................................................................................................................................................... 43
  4.1.5 Bit Manipulation Instructions ........................................................................................................................................ 63
  4.1.6 Compare Instructions ...................................................................................................................................................... 76
  4.1.7 Control Instructions ....................................................................................................................................................... 95
  4.1.8 Logical Instructions ..................................................................................................................................................... 121
  4.1.9 Memory Instructions .................................................................................................................................................... 163
  4.1.10 Memory Maintenance Instructions .......................................................................................................................... 183
  4.1.11 Multiply Instructions ................................................................................................................................................... 190
  4.1.12 NOP Instructions ................................................................................................................................................... 214
  4.1.13 SIMD Instructions ..................................................................................................................................................... 218
  4.1.14 System Instructions .................................................................................................................................................. 347
  4.1.15 Pseudo Instructions .................................................................................................................................................. 359

CHAPTER 5 MEMORY AND CACHE ARCHITECTURE

5.1 Memory Architecture ......................................................................................................................................................... 361
5.2 Cache Architecture ............................................................................................................................................................. 362
  5.2.1 Overview .................................................................................................................................................................. 362
  5.2.2 Cache Microarchitecture ............................................................................................................................................... 363
    5.2.2.1 Dynamic Distributed Cached Shared Memory ......................................................................................................... 364
    5.2.2.2 Coherent and Direct-to-Cache I/O ........................................................................................................................ 366
    5.2.2.3 Striped Memory .................................................................................................................................................... 366
  5.2.3 Direct Memory Access .................................................................................................................................................. 366
5.3 Memory Consistency Model ................................................................................................................................................. 368

CHAPTER 6 ON-CHIP NETWORK ARCHITECTURE

6.1 Overview ............................................................................................................................................................................. 373
6.2 Network Properties ............................................................................................................................................................... 373
  6.2.1 Switches ...................................................................................................................................................................... 373
  6.2.2 Packets .................................................................................................................................................................... 374
  6.2.3 Routing .............................................................................................................................................................................. 374
  6.2.4 Flow Control ...................................................................................................................................................................... 374
  6.2.5 Fairness and Arbitration ................................................................................................................................................. 374
  6.2.6 Timing .............................................................................................................................................................................. 374
  6.2.7 Link Width ...................................................................................................................................................................... 374
6.3 Memory Networks ............................................................................................................................................................... 374
  6.3.1 Packet Sizes .................................................................................................................................................................. 374
  6.3.2 Deadlock ...................................................................................................................................................................... 375
6.4 Messaging Networks .......................................................................................................................................................... 375
  6.4.1 Register Mapping .......................................................................................................................................................... 375
  6.4.2 Packet Format .............................................................................................................................................................. 376
  6.4.3 Demux ......................................................................................................................................................................... 377
  6.4.4 Deadlock ...................................................................................................................................................................... 378
  6.4.5 Hardwall ...................................................................................................................................................................... 378
  6.4.6 Routing ........................................................................................................................................................................ 377

CHAPTER 7 STATIC NETWORK

7.1 Overview .......................................................................................................................................................................... 381
7.2 Static Routing ................................................................................................................................................................. 381
7.3 Data Flow Control .......................................................................................................................................................... 382
7.4 Hardwall Protection ......................................................................................................................................................... 382
7.5 User-Accessible Special Purpose Registers ................................................................................................................. 383

CHAPTER 8 USER-LEVEL SYSTEM CONCERNS

8.1 Overview .......................................................................................................................................................................... 385
8.2 System Calls ...................................................................................................................................................................... 385
8.3 Interrupt Overview .......................................................................................................................................................... 386
  8.3.1 Interrupt List ................................................................................................................................................................ 386
8.4 User-Level Interrupts ......................................................................................................................................................... 389
8.5 Interaction with I/O Devices ............................................................................................................................................ 389
8.6 Cycle Count ...................................................................................................................................................................... 389

APPENDIX A SPECIAL PURPOSE REGISTERS

A.1 Introduction .................................................................................................................................................................... 391
A.2 SPR Register Descriptions ............................................................................................................................................ 396

GLOSSARY .............................................................................................................................................................................. 459

INDEX .................................................................................................................................................................................. 461
1 USER ARCHITECTURE INTRODUCTION

1.1 Introduction to the Tile Processor Architecture

The Tile Processor™ is a new class of multicore processor that delivers unprecedented levels of performance, flexibility, and power efficiency in a highly integrated device. The Tile Processor is programmable in standard ANSI C, and implements Tilera’s iMesh Multicore technology, enabling application scaling across multiple cores (or tiles).

Each tile is a full-featured processor core, and is capable of running an entire operating system. Every tile implements a 32-bit, three-wide integer processor engine with an instruction fetch unit, execution units, a memory management unit including Translation Lookaside Buffers (TLBs), a 64-entry register file, and a two-level cache hierarchy. Hardware maintains cache coherence for processor and I/O memory accesses.

The tiles in the Tile Processor are connected to each other, to the on-chip memory controllers, and to the on-chip I/O controllers by multiple independent mesh networks. Tilera’s iMesh™ multicore technology enables the Tile Processor to provide performance scalability and high bandwidth/low latency communication between all on-chip components.

1.2 About this Manual

This manual is organized as follows:

- **Chapter 1: User Architecture Introduction** (this chapter) provides an overview of this manual.
- **Chapter 2: Basic Architecture** provides a hierarchical overview of the Tilera Tile Processor Architecture.
- **Chapter 3: Processor Engine Architecture** describes the processor engine (PE) in detail.
- **Chapter 4: Processor Engine Instruction Set** describes the instruction set architecture and bundling rules and formats.
- **Chapter 5: Memory and Cache Architecture** describes how memory is structured and accessed.
- **Chapter 6: On-Chip Network Architecture** describes the User Dynamic Network (UDN), which is used by applications to send messages between tiles.
- **Chapter 7: Static Network** describes the structure and functions of the Static Network (STN).
- **Chapter 8: User-level System Concerns** describes the system call architecture used to implement system interactions and interrupts, and communication with I/O devices.
- **Appendix A: Special Purpose Registers** defines the Special Purpose Registers (SPRs), which provide access to different portions of system-level state.
- **Glossary** defines terms used in this document.

1.3 What’s New In This Manual

This manual has been revised as follows:
• Introduction to the TILEPro family of processors
• New cache and memory architecture
• New instructions to support the TILEPro™ family of processors can be found in Chapter 4: Processor Engine Instruction Set. These are:
  • adds: Add Word Saturating
  • dword_align: Double Word Align
  • subs: Subtract Word Saturating
  • lbadd: Load Byte and Add
  • lbadd_u: Load Byte Unsigned and Add
  • lhadd: Load Half Word and Add
  • lhadd_u: Load Half Word Unsigned and Add
  • lw_na: Load Word No Alignment Trap
  • lwadd: Load Word and Add
  • lwadd_na: Load Word No Alignment Trap and Add
  • sbadd: Store Byte and Add
  • shadd: Store Half Word and Add
  • swadd: Store Word and Add
  • wh64: Write Hint 64 Bytes
  • addbs_u: Add Bytes Saturating Unsigned
  • addhs: Add Half Words Saturating
  • packbs_u: Pack Half Words Saturating
  • packhs: Pack Half Words Saturating
  • subbs_u: Subtract Bytes Saturating Unsigned
  • subhs: Subtract Half Words Saturating

1.4 Conventions

The following section describes the notational conventions used in this document.

1.4.1 Byte and Bit Order

The Tile Processor Architecture is little endian. More significant bytes are always numbered with a higher number than less significant bytes (LSBs). When data is stored in memory, bytes that are of greater significance are stored in higher numbered memory addresses than bytes of less significance.

When sets of bits are described or displayed in this document, bits of higher significance are displayed to the left of bits with lower significance. For instance, if 32 bits are to be displayed and are numbered from 0 to 31, bit 31 is displayed to the left of bit 0. Bits numbered with a higher number have greater significance than bits with a lower number.
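
The byte and bit ordering described above can be illustrated with a short sketch. It runs on any host, not on a Tile Processor; the value `0x12345678` and the names `word`, `stored`, and `bit_string` are chosen for illustration only.

```python
import struct

word = 0x12345678

# "<I" packs an unsigned 32-bit integer in little-endian byte order,
# the same order this manual describes for memory.
stored = struct.pack("<I", word)

# The least significant byte lands at the lowest address (offset 0).
for offset, byte in enumerate(stored):
    print(f"base+{offset}: 0x{byte:02X}")

# Display convention: bit 31 is written leftmost, bit 0 rightmost.
bit_string = f"{word:032b}"
print(bit_string)
```

The first character of `bit_string` corresponds to bit 31 and the last character to bit 0.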
1.4.2 Reserved Fields

Unused bits in control or I/O registers are considered reserved (reserved 0). When bits labeled as reserved are read, they are not guaranteed to return 0. Bits denoted as reserved must be written as 0. Bits that are ignored by the hardware are explicitly called out as being write-ignored. Writing a nonzero value to a reserved field causes the processor to enter an undefined state.
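
A minimal sketch of how software might honor these rules follows. It assumes a hypothetical 32-bit register whose bits [31:21] are reserved, matching the layout used as an example in Figure 1-1; `RESERVED_MASK` and `clear_reserved` are illustrative names, not part of any Tilera interface.

```python
# Hypothetical layout: bits [31:21] are reserved (assumption for illustration).
RESERVED_MASK = 0xFFE00000
WORD_MASK = 0xFFFFFFFF


def clear_reserved(value: int) -> int:
    """Force every reserved bit of a 32-bit value to 0.

    Apply this to a value read from the register (reserved bits are not
    guaranteed to read as 0) and again before writing it back (reserved
    bits must be written as 0).
    """
    return value & ~RESERVED_MASK & WORD_MASK
```

Masking on both the read and the write path keeps a read-modify-write sequence from copying an arbitrary reserved-bit value back into the register.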

1.4.3 Numbering

The default numeric base used in this document is base ten, or decimal representation. Any numeric without an explicit base identifier is considered to be a decimal number. Hexadecimal numbering is used widely in this document. When a numeric is to be interpreted as a hexadecimal (base sixteen) number, the prefix “0x” is prepended to the number. For example, the number 74 can also be expressed as 0x4A when written in hexadecimal.

When ranges of bits are numbered as a subset of a larger set of ordered bits, a bracket notation is used. The notation contains one or two numbers separated by a colon. If only one number is specified, it identifies the single bit referenced. For example, if “bus” is a 32-bit bus that is numbered 31 to 0 and the text describes bit 5, bus[5] is the nomenclature used to signify that bit. Bit ranges are specified as two numbers, with the left number being the higher-order bit location and the right number being the lower-order bit location. Bit ranges are inclusive of both bit locations. This nomenclature is consistent with the default manner in which little-endian bit ranges are denoted. For example, if word is a 32-bit word numbered 31 to 0 and the text describes the bits from bit 5 through bit 20, the appropriate manner to denote that is word[20:5].
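
The bracket notation maps directly onto a shift and a mask. The helper below is a sketch for readers who want to check a bit range by hand; `extract_bits` is an illustrative name, not a function defined by this manual.

```python
from typing import Optional


def extract_bits(word: int, high: int, low: Optional[int] = None) -> int:
    """Return word[high:low], inclusive of both ends, or word[high] if low is omitted."""
    if low is None:
        low = high
    if not 0 <= low <= high <= 31:
        raise ValueError("expected 0 <= low <= high <= 31")
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)
```

For example, `extract_bits(value, 20, 5)` corresponds to word[20:5] and yields a 16-bit result, and `extract_bits(value, 5)` corresponds to bus[5].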

Figure 1-1 shows an example of how bitfields are graphically presented in this document. Bits[31:21] are shown as reserved bits.

![Figure 1-1: Bitfield Example](image)

Figure 1-2 shows four bitfields that are logically represented along with a gap. The gap is not reserved, but is instead allocated for another use that is typically specified elsewhere.

![Figure 1-2: Bitfield Example with Fields Allocated by Other Functions](image)
1.5 Implementation Dependence

This document describes the high-level Tile Processor Architecture and the microarchitecture of the TILEPro64™ and TILE64™ implementations.
2 BASIC ARCHITECTURE

2.1 Architectural Overview

This section contains an overview of the Tilera Tile Processor™ Architecture.

The Tile Processor Architecture consists of tiles, input/output devices, and a communication fabric that connects them. Figure 2-3 shows the TILE64™/TILEPro64™ Tile Processor with details of an individual tile’s structure.

![Figure 2-3: Tile Processor Hardware Architecture](image)
2.1.1 Tile Architecture

The *tile* is the basic unit of replication in the Tile Processor Architecture. A key feature of a tile is that it is identical to all other tiles in a system. The fact that tiles are homogeneous eases automated mapping of programs to an array of tiles and allows for the arbitrary placement of programs across the homogeneous array. The tile is the main source of computational power within the Tile Processor Architecture. Each tile consists of a processor engine, a cache engine, and a switch engine. Figure 2-4 shows the internal structure of a tile.

![Figure 2-4: Basic Tile Architecture](image)

2.1.1.1 Processor Engine

The processor engine consists of a fetch unit, instruction decoder, issue logic, general purpose register file, and special purpose registers. Figure 2-5 shows the basic architecture of the processor engine. The processor engine is a 32-bit, three-instruction-wide Very Long Instruction Word (VLIW) processor. Each VLIW bundle of instructions is 64 bits and is capable of encoding two or three instructions. The processor engine contains 56 general purpose registers, seven registers that interface to the on-chip iMesh networks, and one hard-wired zero register. While the stack pointer *sp* is included in the general purpose registers, it is used only as a stack pointer by software convention.

The processor engine contains three instruction execution pipelines. The three pipelines that comprise the main processor are asymmetric, and are designated pipelines 0 through 2. Pipeline 0 is capable of executing any ALU operation, bit manipulation operations, select operations, multiply operations, and fused multiply-add operations. Pipeline 1 is capable of executing any ALU operation, special purpose register reads and writes, and control flow instructions (branches and jumps). Pipeline 2 is capable of executing load and store instructions and cache and memory maintenance instructions.
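
The asymmetry of the three pipelines can be summarized as a small conceptual model. The sketch below only encodes the capabilities listed in the paragraph above; the operation class names and the function `assign_pipelines` are hypothetical, and the real bundle formats described in Chapter 4 impose further encoding restrictions that are not modeled here.

```python
from itertools import permutations

# Operation classes each pipeline can execute, taken from the text above.
# The class names themselves are illustrative labels.
PIPELINE_CAPABILITIES = {
    0: {"alu", "bit_manipulation", "select", "multiply", "multiply_add"},
    1: {"alu", "spr_access", "control_flow"},
    2: {"load_store", "memory_maintenance"},
}


def assign_pipelines(op_classes):
    """Return a {pipeline: op_class} mapping, or None if no assignment exists."""
    if len(op_classes) > len(PIPELINE_CAPABILITIES):
        return None
    # Try every way of giving each operation its own pipeline.
    for pipes in permutations(PIPELINE_CAPABILITIES, len(op_classes)):
        if all(op in PIPELINE_CAPABILITIES[p] for op, p in zip(op_classes, pipes)):
            return dict(zip(pipes, op_classes))
    return None
```

Under this model two ALU operations and a load can share a bundle, while two multiplies cannot, because only pipeline 0 executes multiply operations.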
2.1.1.2 Cache Engine

The tile’s cache engine is responsible for handling caching of instructions and data, providing an interface to the memory system, translating memory addresses from virtual to physical addresses, and providing a coherent view of memory. The organization of the cache subsystem is implementation-dependent and the Tile Processor Architecture does not require a specific size or organization. For example, the cache organization found within the TILE64 processor provides an 8KB level 1 processor engine instruction cache, a two-way set associative 8KB level 1 data cache, and a two-way set associative 64KB unified level 2 cache. The TILEPro64 processor provides a 16KB level 1 instruction cache, a two-way set associative 8KB level 1 data cache and a four-way set associative 64KB unified level 2 cache.

Figure 2-6 provides a conceptual block diagram of the processor/cache interface. When needed data is not found in the cache, the cache engine uses the on-chip networks to check for the data in other caches or in main memory.

The TILE64 and TILEPro processors use two independent, dynamically-routed mesh networks to communicate with multiple memory controllers on the periphery of the chip and with other tiles. The Tile Processor Architecture supports virtual memory to supply protection and relocation of data structures stored in physical memory. The cache engine contains memory management units implemented via translation lookaside buffers (TLBs).

2.1.1.3 Switch Engine

Each tile contains a switch engine. The switch engine connects to neighboring tiles and I/Os (including the on-chip memory controllers) via the inter-tile iMesh. Because the tiles are laid out in a two-dimensional grid, the switch engine connects to its neighbors to the north, south, east, and west. The switch engine connects directly to I/O devices if a tile is adjacent to an I/O device.

The switch engine is composed of multiple dynamic networks and a single static network. The Tile Processor Architecture contains three register-mapped architecturally-defined networks: the user dynamic network (UDN), the input/output dynamic network (IDN), and the static network (STN). In addition to the architecturally defined networks, TILE64 and TILEPro contain hardware managed networks for communication with main memory and for inter-tile memory mapped communication.

*Figure 2-7 shows an example network crosspoint with a fully connected crossbar.*

The UDN is primarily used by *user-level processes* to communicate with fast low-latency explicit messages. The Tilera software suite provides libraries to the developer to facilitate accessing the UDN with differing programming paradigms. The IDN is used by the system software to communicate with I/O devices and for tile-to-tile communication at the system level.

The static network is a scalar operand network designed to transport scalar values efficiently from one tile to another tile across the iMesh. The routing of the static network crossbar is controlled by the processor engine. The static network passes data to and from the main processor, and connects to the tiles to the north, south, east and west.

The static network is configured in a hard-coded routing mode, which specifies how data is to be routed from one port to another. Routing in the static network is atomic—a transfer that routes data stalls until input data is available and all the targeted output ports are free. The routing in the static network encodes the input direction that supplies data for each output direction, thereby allowing multicasting of data.
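
The atomic, multicast-capable routing described above can be modeled conceptually as follows. This is a behavioral sketch only: the port names, the `routes` dictionary, and the function `route_step` are assumptions made for illustration and do not reflect the hardware encoding of a static network route.

```python
from collections import deque


def route_step(routes, input_queues, output_free):
    """Attempt one atomic transfer.

    routes maps each output port to the input port that supplies it.
    Returns {output: value} on success, or None if the transfer stalls.
    """
    sources = set(routes.values())
    # Stall unless every input named by the route has data available.
    if any(not input_queues[src] for src in sources):
        return None
    # Stall unless every targeted output port is free.
    if any(not output_free[out] for out in routes):
        return None
    # All conditions hold: consume each input once and fan it out.
    values = {src: input_queues[src].popleft() for src in sources}
    return {out: values[src] for out, src in routes.items()}


# Multicast example: data arriving from the west goes east and to the processor.
queues = {"west": deque([7])}
free = {"east": True, "processor": True}
delivered = route_step({"east": "west", "processor": "west"}, queues, free)
```

Because each output names its own input, one input word can feed several outputs in the same step, which is how this model expresses multicast.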

2.1.2 I/O Devices

The iMesh interconnect fabric extends from the periphery of the array of tiles to connect to I/O interfaces, which translate messaging packets into operations on the inputs and outputs of the chip. An I/O device can be connected to any iMesh network, but typically I/O devices are connected to the I/O Dynamic Network (IDN) and memory networks. The arrangement of I/O devices and the way in which they are connected to tile-array ports is specific to a particular implementation.

The TILEPro64 and TILE64 processors have the following on-chip interfaces: two 10Gbps XAUI, two PCIe 4x, two 10/100/1000 Mbps Ethernet, four DDR-2 64-bit memory interfaces, 64 general purpose I/Os, two-wire interface (I^2C-compatible), SPI, HPI, and a UART.

The typical makeup of an I/O interface can be seen in Figure 2-8. This example shows an I/O device being connected to a dynamic network port. On the left side of this figure, the I/O device is connected to buffering and a finite state machine for control. The finite state machine is the key portion of the I/O interface. The finite state machine receives dynamic messages from the fabric and parses the messages. In response to messages that it parses, the finite state machine acts accordingly by controlling the I/O device. Likewise, the finite state machine receives data and control requests from the I/O device and constructs messages destined for the tiles, memory, or other I/O interfaces. I/O shims typically contain buffering in order to provide end-to-end flow control.

2.1.3 iMesh

An instantiation of the Tile Processor Architecture consists of a rectangular array of tiles and I/O devices. In order for the tiles to communicate with each other and to I/O devices, the Tile Processor Architecture provides a communication fabric called the iMesh™. The iMesh consists of the array of the switch engines, which are embedded inside each tile of the array, and the two-dimensional network that interconnects the engines and I/O devices.

There are three types of networks: the static network, architecturally-defined dynamic networks, and implementation-specific dynamic networks. The TILE64 processor has the two architecturally-defined dynamic networks (IDN/UDN), the static network (STN), and two implementation-specific networks to interconnect the tile’s cache engines and memory controllers. The TILEPro64 processor adds a third memory network for coherence traffic.

Each of these networks is logically 32 bits (one word) in width. Each switch engine contains multiple independent crossbars that each contain five connections. The five connections are north, south, east, west, and one connecting to the processor engine. Each connection consists of two unidirectional links. For example, a UDN connection from a tile’s switch engine to the switch engine of the neighboring tile to the east is logically 64 bits wide: 32 bits are used for traffic leaving the tile for the tile to the east, and 32 bits are used for traffic entering the tile from the east.

I/O devices directly connect to switch engines of tiles on the periphery of the array via the iMesh. In Tile Processor Architecture implementations, the tiles are typically arranged in a rectangular two-dimensional array surrounded by I/O devices. The I/O devices connect on the periphery of the tile array to a set of networks that extend out of the tile array. I/O devices typically use a single connection to the IDN, but may be connected to any of the on-chip networks and may be connected to multiple tiles’ networks in order to increase I/O to tile array bandwidth. I/O devices can also use the iMesh and tile switch engines to route traffic between one I/O device and another I/O device.

2.2 Data Types

Data types of differing sizes can be used on the Tile Processor Architecture. Data composed of 8 bits is a *byte*, data composed of 16 bits is a *half word*, data composed of 32 bits is a *word*, and data composed of 64 bits is a *double word*. In addition to these basic data types, the processor engine supports two packed data formats for use with the SIMD instructions. These formats pack a number of smaller elements into a single word. The *packed byte* format consists of four bytes packed into a word, and the *packed half word* format consists of two half words packed into a word. See “SIMD Instructions” on page 218 for more details.
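
The packed formats can be pictured with a short C sketch. The function names are hypothetical, and the element order (element 0 in the least significant bits) is an assumption made for illustration; the SIMD instruction reference is the authority on element numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four bytes into one 32-bit word; b0 lands in bits 7:0 (assumed order). */
static uint32_t pack_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Pack two half words into one word; h0 lands in bits 15:0 (assumed order). */
static uint32_t pack_half_words(uint16_t h0, uint16_t h1) {
    return (uint32_t)h0 | ((uint32_t)h1 << 16);
}

/* Extract element `index` (0..3) from a packed-byte word. */
static uint8_t packed_byte_at(uint32_t word, unsigned index) {
    assert(index < 4);
    return (uint8_t)(word >> (8u * index));
}
```

Under these assumptions, `pack_bytes(0x11, 0x22, 0x33, 0x44)` yields the word `0x44332211`.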

2.3 Addressing

The Tile Processor architecture defines a flat, globally shared 64-bit physical address space and a 32-bit virtual address space. The TILE64 and TILEPro family of processors implement a 36-bit physical address space. The globally shared physical address space provides the mechanism by which processes and threads can share instructions and data.
3 PROCESSOR ENGINE ARCHITECTURE

This section describes the processor engine in detail. The processor engine is the primary computational resource inside a tile. The processor engine is an asymmetric very long instruction word (VLIW) processor.

3.1 VLIW Nature of the Processor Engine

The processor engine contains three computational pipelines.

Each instruction bundle is 64-bits wide and can encode either two or three instructions. Some instructions can be encoded in either two-wide or three-wide bundles, and some can be encoded in two-wide bundles only. The most common instructions and those with short immediates can be encoded in a three instruction format. “Processor Engine Instruction Set” on page 19 discusses the encoding format and mix of instructions in greater detail.

3.2 Atomicity of Bundles

The Tile Processor Architecture has a well defined, precise interrupt model with well defined instruction ordering. A bundle of instructions executes atomically. Thus either all of the instructions in the bundle are executed or none of the instructions in a bundle are executed. Inside of a single bundle, the different instructions can be dependent on many resources. In order for a bundle to execute, all of the resources upon which a bundle is dependent must be available and ready. If one instruction in a bundle causes an exception, none of the instructions in that bundle commit state changes. Register access within a bundle is an all-or-nothing endeavor. This distinction is important for register reads as well as register writes, as register reads/writes can both modify network state when accessing network mapped registers. Memory operations are likewise atomic with respect to an instruction bundle completing.

Individual instructions within a bundle must comply with certain register semantics. Read-after-write (RAW) dependencies are enforced between instruction bundles. There is no ordering within a bundle, and the numbering of pipelines or instruction slots within a bundle is only used for convenience and does not imply any ordering. Within an instruction bundle, it is valid to encode an output operand that is the same as an input operand. Because there is explicitly no implied dependency within a bundle, the semantics for this specify that the input operands for all instructions in a bundle are read before any of the output operands are written. Write-after-write (WAW) semantics between two bundles are defined as: the latest write overwrites earlier writes.
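
The read-before-write rule can be illustrated with a small C model. This is a sketch only: the `AddOp` structure and `execute_bundle` function are hypothetical, every slot is treated as a plain add, and network-mapped registers are not modeled.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 64, ZERO_REG = 63, MAX_SLOTS = 3 };

/* One hypothetical "add dst, src_a, src_b" slot; not a real encoding. */
typedef struct { int dst, src_a, src_b; } AddOp;

static uint32_t read_reg(const uint32_t regs[NUM_REGS], int r) {
    return r == ZERO_REG ? 0u : regs[r];
}

/* Execute up to three adds as one bundle: read every input, then write. */
static void execute_bundle(uint32_t regs[NUM_REGS], const AddOp *ops, int n) {
    uint32_t results[MAX_SLOTS];
    assert(n >= 1 && n <= MAX_SLOTS);
    for (int i = 0; i < n; i++)   /* phase 1: all source operands are read */
        results[i] = read_reg(regs, ops[i].src_a) + read_reg(regs, ops[i].src_b);
    for (int i = 0; i < n; i++)   /* phase 2: all destinations are written */
        if (ops[i].dst != ZERO_REG)
            regs[ops[i].dst] = results[i];
}
```

In this model a bundle equivalent to `{add r0, r1, zero; add r1, r0, zero}` exchanges `r0` and `r1`, because both slots see the register values from before the bundle.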

Within a bundle, WAW dependencies are forbidden. If more than one instruction in a bundle writes to the same output operand register, unpredictable results for any destination operand within that bundle can occur. Also, implementations are free to signal this case as an illegal instruction. There is one exception to this rule: multiple instructions within a bundle may legally target the zero register. Lastly, some instructions write registers implicitly, such as those that write the link register. If an instruction implicitly writes to a register that another instruction in the same bundle writes to, unpredictable results can occur for any output register used by that bundle and/or an illegal instruction interrupt can occur.
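
A tool such as an assembler could check the WAW rule with something like the following C sketch. The function name is hypothetical, and the caller is assumed to include implicit destinations (for example register 55, `lr`, for a linking jump) in the list it passes in.

```c
#include <assert.h>
#include <stdbool.h>

enum { ZERO_REGISTER = 63, MAX_BUNDLE_SLOTS = 3 };

/* Return true if two entries name the same destination register.
   Writes to the zero register are exempt, as the text allows. */
static bool bundle_has_waw(const int *dst, int n) {
    assert(n >= 0 && n <= MAX_BUNDLE_SLOTS);
    for (int i = 0; i < n; i++) {
        if (dst[i] == ZERO_REGISTER)
            continue;
        for (int j = i + 1; j < n; j++)
            if (dst[i] == dst[j])
                return true;
    }
    return false;
}
```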

3.3 Register Set

The Tile Processor Architecture contains 64 architected registers. Each register is 32 bits wide. Of the 64 registers, some are general purpose registers and others allow access to the on-chip networks.

Table 3-1 presents the registers available to be used in instructions. The first 55 registers are general purpose registers. The stack pointer `sp` is included in the 55 general purpose registers and is designated a stack pointer only by software convention. Register `lr` can be used as a general purpose register. Control-transfer instructions that link write the value PC+8 into `lr`; thus instructions bundled with jal, jalp, jalr, and jalrp must not write to `lr`. Note that the lnk instruction writes to `lr` only if `lr` is specified as the destination register. Register `sn` allows access to the static network. Registers `idn0` and `idn1` provide access to the two demultiplexed IDN networks. All writes to the IDN should use `idn0`; the result of writing to `idn1` is undefined. Registers `udn0`, `udn1`, `udn2`, and `udn3` allow access to the four demultiplexed ports of the UDN. All writes to the UDN should use `udn0`; the result of writing to `udn1` through `udn3` is undefined. The final register, `zero`, contains no state and always reads 0. Writes to register `zero` have no effect on the register file; however, instructions that target this register might have other results, such as effecting data prefetches or causing exceptions.

Note: Register `r0` and register `zero` are distinct; register `r0` is a general purpose register. Table 3-1 presents the register identifier mapping.

<table>
<thead>
<tr>
<th>Register Numbers</th>
<th>Short Name</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 - 53</td>
<td>r0-r53</td>
<td>General Purpose Registers</td>
</tr>
<tr>
<td>54</td>
<td>sp</td>
<td>Stack Pointer</td>
</tr>
<tr>
<td>55</td>
<td>lr</td>
<td>Link Register</td>
</tr>
<tr>
<td>56</td>
<td>sn</td>
<td>Static Network</td>
</tr>
<tr>
<td>57</td>
<td>idn0</td>
<td>IDN Port 0</td>
</tr>
<tr>
<td>58</td>
<td>idn1</td>
<td>IDN Port 1</td>
</tr>
<tr>
<td>59</td>
<td>udn0</td>
<td>UDN Port 0</td>
</tr>
<tr>
<td>60</td>
<td>udn1</td>
<td>UDN Port 1</td>
</tr>
<tr>
<td>61</td>
<td>udn2</td>
<td>UDN Port 2</td>
</tr>
<tr>
<td>62</td>
<td>udn3</td>
<td>UDN Port 3</td>
</tr>
<tr>
<td>63</td>
<td>zero</td>
<td>Always Returns Zero</td>
</tr>
</tbody>
</table>
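
The mapping in Table 3-1 can be expressed as a small lookup. The following C sketch uses a hypothetical helper name and only the numbers and short names listed in the table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the short name of register `num` (0..63) into `buf`, per Table 3-1. */
static const char *register_short_name(int num, char *buf, size_t len) {
    static const char *const special[] = {
        "sp", "lr", "sn", "idn0", "idn1", "udn0", "udn1", "udn2", "udn3", "zero"
    };
    assert(num >= 0 && num <= 63);
    if (num <= 53)
        snprintf(buf, len, "r%d", num);               /* r0 through r53 */
    else
        snprintf(buf, len, "%s", special[num - 54]);  /* registers 54..63 */
    return buf;
}
```

A disassembler-style caller would pass a small scratch buffer, for example `char buf[8]`, and print the returned string.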

In order to reduce latency for tile-to-tile communications and reduce instruction occupancy, the Tile Processor Architecture provides access to the on-chip networks through register access. Any instruction executed in the processor engine can read or write to the following networks: UDN, IDN, and STN. There are no restrictions on the number of networks that can be written or read in a particular bundle. Each demultiplexing queue counts as an independent network for reads. For network writes, all three networks (UDN, IDN, and STN) can be written to in a given instruction bundle. It is illegal for multiple instructions in a bundle to write to the same network, as this is a violation of WAW ordering for processor registers. The same network register can appear in multiple source fields in one instruction or inside of one bundle. When a single network (or demultiplex queue) is read multiple times in one bundle, only one value is dequeued from the network (demux queue) and every instruction inside of a bundle receives the same value. Network operations are atomic with respect to bundle execution.

Reading and writing networks can cause the processor to stall. If no data are available on a network port when an instruction tries to read from the corresponding network-mapped register, the entire bundle stalls waiting for the input to arrive. Likewise, if a bundle writes to a network and the output network is full, the bundle stalls until there is room in the output queue. Listing 3-1 contains example code for network reads and writes.

Listing 3-1. Network Reads and Writes

```c
// add writes to udn0, sub reads
// idn0 and idn1 and writes to sn
{addi udn0, r5, 10; sub sn, idn0, idn1}
// increment the data coming from
// udn0, add registers, and load
{addi udn0, udn0, 1; add r5, r6, r7; ld r8, r9}
// mask low bit of udn0 into r5 and
// mask second bit into r6.  reads only
// one value from udn.
{andi r5, udn0, 1; andi r6, udn0, 2}
```
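
The last bundle in Listing 3-1 reads `udn0` twice but dequeues only one word. The C sketch below models that behavior with a toy FIFO; the `DemuxQueue` type and function names are hypothetical, and an assertion stands in for the stall that real hardware performs on an empty queue.

```c
#include <assert.h>
#include <stdint.h>

/* Toy FIFO standing in for one demux queue such as udn0 (hypothetical model). */
typedef struct { uint32_t data[8]; int head, count; } DemuxQueue;

static void queue_push(DemuxQueue *q, uint32_t v) {
    assert(q->count < 8);
    q->data[(q->head + q->count) % 8] = v;
    q->count++;
}

/* Dequeue exactly one word for the whole bundle; every reader shares it. */
static uint32_t bundle_read_once(DemuxQueue *q) {
    assert(q->count > 0);   /* real hardware would stall the bundle instead */
    uint32_t v = q->data[q->head];
    q->head = (q->head + 1) % 8;
    q->count--;
    return v;
}

/* Model of {andi r5, udn0, 1; andi r6, udn0, 2}. */
static void mask_bits(DemuxQueue *udn0, uint32_t *r5, uint32_t *r6) {
    uint32_t v = bundle_read_once(udn0);   /* one dequeue, two consumers */
    *r5 = v & 1u;
    *r6 = v & 2u;
}
```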

The Tile Processor Architecture provides two methods of writing to the static network: specifying `sn` as the destination register, and setting the `S` bit within an encoded instruction. To set the `S` bit in an assembly program, add the suffix `.sn` to the mnemonic. The advantage of the `S` bit method is that a GPR or another network-mapped register can be specified as the destination register, allowing both to be written simultaneously. This saves the programmer from having to add an extra move instruction that explicitly writes to `sn`. The result of specifying `sn` as the destination register of an instruction with its `S` bit set is undefined. The `S` bit only appears on a subset of the instructions, most notably arithmetic instructions that execute in two-wide mode. With respect to the above rules regarding bundle atomicity and network flow control, setting the `S` bit is identical to specifying `sn` as the destination register. Listing 3-2 contains example code for an instruction that writes the STN with an `S` bit.

Listing 3-2. Writing an Instruction to the STN with an S-Bit

```c
// Instruction adds r6 and r7 and
// deposits result in r5 and enqueues the
// result in the static network.
{add.sn r5, r6, r7}
```

### 3.4 Program Counter

Each processor engine contains a program counter that denotes the location of the instruction bundle that is being executed. Instruction bundles are 64 bits, thus the program counter must be aligned to 8 bytes. The program counter is modified in the natural course of program execution by branches and jumps. Also, the program counter is modified when an interrupt is signaled or when a return from interrupt instruction `iret` is executed. Instructions that link (`jal`, `jalp`, `jalr`, `jalrp`, and `lnk`) read the contents of the program counter for the current instruction bundle, add 8 (the length of an instruction bundle in bytes), and write the result into a register. For `jal`, `jalp`, `jalr`, and `jalrp`, the register written with the link address is always `lr`; for `lnk`, the destination register is specified explicitly. Jumps that link are useful for subroutine calls, and the `lnk` instruction is useful for position-independent code.

For more information, see “Control Instructions” on page 95.
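
The link-address arithmetic described above is simple enough to state in C. This is a sketch with hypothetical names; it assumes a 32-bit program counter, in line with the 32-bit virtual address space.

```c
#include <assert.h>
#include <stdint.h>

enum { BUNDLE_BYTES = 8 };   /* one 64-bit instruction bundle */

/* A valid program counter is aligned to the bundle size. */
static int pc_is_aligned(uint32_t pc) {
    return (pc % BUNDLE_BYTES) == 0;
}

/* Value a linking instruction writes: the address of the next bundle. */
static uint32_t link_address(uint32_t pc) {
    assert(pc_is_aligned(pc));
    return pc + BUNDLE_BYTES;
}
```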

3.5 Special Purpose Registers

The processor engine contains special purpose registers (SPRs) that are used to control many features of a tile. The processor engine can read an SPR with the mfspr instruction and write to an SPR with the mtspr instruction. Most SPRs are used by system software for tile configuration or for accessing context switching state.

Special purpose registers are a mixture of state and a generalized interface to control structures. Some of the special purpose registers simply hold state and provide a location to store data that is not in the general purpose register file or memory. Other special purpose registers hold no state but serve as a convenient word-oriented interface to control structures within a tile. Some SPRs possess a mixture of machine hardware status state and control functions. The act of reading or writing an SPR can cause side effects. SPRs are also the main access control mechanism for protected state in the Tile Processor Architecture. The SPR space is designed so that different groups of SPRs require different protection levels to access them.

For more information, see “Special Purpose Registers” on page 391.

3.6 TILE64 and TILEPro Processing Engine Pipeline

The Tile Processor Engine has three execution pipelines (P2, P1, P0) of two stages (EX0, EX1) each. Both modes of bundling instructions, X mode and Y mode, can issue instructions into any of the three execution pipelines. Y mode uses all three pipelines simultaneously; in X mode, one of the pipelines remains idle. P0 is capable of executing all arithmetic and logical operations, bit and byte manipulation, select, and all multiply and fused multiply instructions. P1 can execute all of the arithmetic and logical operations, SPR reads and writes, conditional branches, and jumps. P2 can service memory operations only: loads, stores, and test-and-set instructions.

The Processor Engine uses a short, in-order pipeline aimed at low branch latency and low load-to-use latency. The basic pipeline consists of five stages: Fetch, RegisterFile, Execute0, Execute1, and WriteBack.

3.6.1 Fetch

The Fetch pipeline stage runs the complete loop from updating the Program Counter (PC) through fetching an instruction to selecting a new PC. The PC provides an index into several structures in parallel: the icache data and tag arrays, the merged Branch Target Buffer and line prediction array, and the ITLB. The fetch address multiplexor must then predict the next PC based on any of several inputs: the next sequential instruction, line prediction or branch prediction, an incorrectly-predicted branch, or an interrupt.

3.6.2 RegisterFile (RF)

There are three instruction pipelines, one for each of the instructions in a bundle. These pipelines are designated as P0, P1, and P2. Bundles containing two instructions will always result in one instruction being issued in P0. The second instruction will be issued in either P1 or P2, depending on the type of instruction.

The RF stage produces valid source operands for the instructions. This operation involves four steps: decoding the two or three instructions contained in the bundle, as provided by the Fetch stage each cycle; accessing the source operands from the register file and/or network ports; checking instruction dependencies; and bypassing operand data from earlier instructions. A three-instruction bundle can require up to seven source register operands and three destination register operands: three source operands to support the fused MulAdd and conditional transfer operations, and two source operands each for the other two instruction pipelines.

3.6.3 Execute Stages (EX0, EX1)

The EX0 pipeline stage is the instruction commit point of the processor; if no exception occurs, then the architectural state can be modified. The early commit point allows the processor to transmit values computed in one tile to another tile with extremely low, register-like latencies. Single-cycle operations can bypass from the output of EX0 into the subsequent EX0. Two-cycle operations are fully pipelined and can bypass from the output of EX1 into the input of EX0.

3.6.4 WriteBack (WB)

Destination operands from P1 and P0 are written back to the Register File in the WB stage. Load data returning from memory is also written back to the Register File in the WB stage. The Register File is write-through, eliminating a bypass requirement from the output of WB into EX0.

3.6.5 Pipeline Latencies

In a pipelined processor, multiple operations can overlap in time. In the Tile Architecture, instructions that have longer latencies are fully pipelined.

Table 3-2. TILEPro Pipeline Latencies

<table>
<thead>
<tr>
<th>Operation</th>
<th>Latency</th>
</tr>
</thead>
<tbody>
<tr>
<td>Branch Mispredict</td>
<td></td>
</tr>
<tr>
<td>Load to Use - L1 hit</td>
<td></td>
</tr>
<tr>
<td>Load to Use - L1 miss, L2 hit</td>
<td></td>
</tr>
<tr>
<td>Load to Use - L1/L2 Miss, adjacent Dynamic Distributed Cache (DDC™) hit</td>
<td>35 cycles</td>
</tr>
<tr>
<td>Load to Use - L1/L2 Miss, DDR2 page open, typical</td>
<td>69 cycles</td>
</tr>
<tr>
<td>Load to Use - L1/L2 Miss, DDR2 page miss, typical</td>
<td>88 cycles</td>
</tr>
<tr>
<td>MUL*, SAD*, ADIFF instructions</td>
<td>2 cycles</td>
</tr>
<tr>
<td>All other instructions</td>
<td>1 cycle</td>
</tr>
</tbody>
</table>
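
Because longer-latency instructions are fully pipelined, a run of independent operations costs far less than the sum of their latencies. The C sketch below gives the usual back-of-the-envelope estimate; the function name is hypothetical, and it assumes one operation issues per cycle with no stalls or dependencies between them.

```c
#include <assert.h>

/* Cycles until the last result is available when `count` independent
   operations of the given latency issue one per cycle with no stalls. */
static unsigned pipelined_cycles(unsigned count, unsigned latency) {
    assert(count >= 1 && latency >= 1);
    return count + latency - 1;
}
```

For example, four independent 2-cycle multiplies finish in 5 cycles under these assumptions, rather than 8.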
4 PROCESSOR ENGINE INSTRUCTION SET

4.1 Overview

This chapter describes the Instruction Set Architecture (ISA), the formats used to specify instructions, definitions and semantics, constants, and pipeline latencies. For a complete list of instructions, refer to “Master List of Main Processor Instructions” on page 35.

4.1 Instruction Set Architecture

The Tile Processor Architecture instructions can be categorized into 11 major groups:

- Arithmetic Instructions
- Bit Manipulation Instructions
- Compare Instructions
- Control Instructions
- Logical Instructions
- Memory Instructions
- Memory Maintenance Instructions
- Multiply Instructions
- NOP Instructions
- SIMD Instructions
- System Instructions

4.1.1 Instruction Organization and Format

The Tile Processor Architecture utilizes a 64-bit instruction bundle to specify instructions. While the bundle is a large encoding format, this encoding provides a compiler with a relatively orthogonal instruction space that aids in compilation. Likewise, the large register namespace facilitates the allocation of data into registers, but comes at the cost of extra encoding bits in an instruction word.

The Tile Processor Architecture is capable of encoding up to three instructions in a bundle. In order to achieve this level of encoding density, some of the less common or large immediate operand instructions are encoded in a two instruction bundle. The bundle format is determined by the Mode bit, bit 63. When the Mode bit is one (1), the bundle format is a Y bundle and when the Mode bit is zero (0), the bundle is an X bundle.
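
A decoder's first step, testing the Mode bit, can be sketched in C as follows. The function names are hypothetical; only the position and meaning of bit 63 are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 is the Mode bit: 1 selects the Y format, 0 the X format. */
static bool bundle_is_y_mode(uint64_t bundle) {
    return ((bundle >> 63) & 1u) != 0;
}

/* Number of instruction slots implied by the Mode bit (Y: three, X: two). */
static int bundle_slot_count(uint64_t bundle) {
    return bundle_is_y_mode(bundle) ? 3 : 2;
}

/* Return `bundle` with its Mode bit forced to `y_mode` (0 or 1). */
static uint64_t bundle_with_mode(uint64_t bundle, int y_mode) {
    assert(y_mode == 0 || y_mode == 1);
    return (bundle & ~(UINT64_C(1) << 63)) | ((uint64_t)y_mode << 63);
}
```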

Instruction formats are described in the sections that follow.
4.1.1.1 X Instruction Formats

Figure 4-10 and Figure 4-11 show the basic X format instruction encodings.

Bundles that are in the Y format can encode three simultaneous operations where one is a memory operation, one is an arithmetic operation, and the last one is an arithmetic or multiplication operation. The Y bundle format contains only a simple set of instructions with 8-bit immediates, and these instructions are not capable of writing to both the static network and a register in a single instruction (Y mode instructions lack S bits). The X mode bundle is capable of encoding a superset of the instructions that can be encoded in Y mode; however, only two instructions can be encoded in each bundle. X mode bundles are capable of encoding all instructions, including complex instructions such as control transfers and long immediate instructions. Also, many instructions in X mode bundles have S bits that indicate that the instruction writes to the static network in addition to the destination register specified in the instruction. For more information on the S-bit, refer to page 15.

Y mode instructions contain three encoding slots: Y2, Y1, and Y0. Y2 is the pipeline that executes loads and stores, Y1 is capable of executing arithmetic and logical instructions, and Y0 is capable of executing multiply, arithmetic, and logical instructions. Figure 4-30 through Figure 4-38 present the instruction formats and encodings for the Y pipelines. X mode contains two encoding slots, X1 and X0. The X1 pipeline is capable of executing loads, stores, branches, arithmetic, and logical instructions by merging the Y2 and Y1 pipelines. Pipeline X0 is capable of executing multiply, arithmetic, and logical instructions. Figure 4-12 through Figure 4-27 present the instruction formats and encodings for the X pipelines.

Some instruction formats, or specific instructions, contain unused fields. It is strongly recommended that these contain zeros, as future versions of the architecture may decide to assign meanings to nonzero values in these fields. Implementations are permitted, but not required, to take an Illegal Instruction interrupt when detecting a nonzero value in an unused instruction field.
X1 Instruction Formats

The X1 RRR format encodes an operation that requires a destination register and two source operands. For example:

{add r0, r1, r2} // Add r1 and r2 placing result into r0

![Figure 4-12: X1 RRR Format (X1_RRR)]

The X1_imm8 format encodes an operation that requires a destination register, a source register, and an 8-bit signed immediate operand. For example:

{ addi r0, r1, -13} // Add -13 to r1 and place result in r0

![Figure 4-13: X1 Immediate Format (X1_Imm8)]
The X1 Immediate MTSPR format writes an SPR with the value from a source register. For example:

```c
// Move the contents of register 0 into SPR SPR_SNSTATIC
{ mtspr SPR_SNSTATIC, r0 }  
```

![Figure 4-14: X1 Immediate MTSPR Format (X1_MT_Imm15)](image)

The X1 Immediate MFSPR format is used to move the contents of an SPR into a destination register. For example:

```c
{ mfspr r0, SPR_SNSTATIC } // Move the contents of the SPR SPR_SNSTATIC into r0  
```

![Figure 4-15: X1 Immediate MFSPR Format (X1_MF_Imm15)](image)

The X1 Long Immediate Format is used for instructions which require a destination register, a source register and a signed 16-bit immediate operand. For example:

```c
// Add 0x1234 to the contents of register 1 and place the result in register 0
{ addli r0, r1, 0x1234 }  
```

![Figure 4-16: X1 Long Immediate Format (X1_Imm16)](image)
The X1 Unary format is used for instructions which require a destination register, and a single operand register. For example:

{ lw r0, r1 } // Load the contents of the word addressed by r1 into r0

![Figure 4-17: X1 Unary Format (X1_Unary)](image)

The X1 Shift Format is used for instructions that require a destination register, a source register, and a 5-bit shift count. For example:

// Left shift the contents of r1 5-bits and place the result in r0.
{ shli r0, r1, 5 }

![Figure 4-18: X1 Shift Format (X1_Shift)](image)

The X1 Masked Merge format is used for the masked merge instruction. For example:

// Merge bits 5 through 7 of r1 into the contents of r2
// and place the result in r0
{ mm r0, r1, r2, 5, 7 }

![Figure 4-19: X1 Masked Merge Format (X1_MMR)](image)

The X1 branch format is used to encode branches. The branch offset is represented as a signed 16-bit bundle offset. For example:

```
{ bnz r0, br_target}  // Branch to br_target if the contents of r0 is not zero
```

![Figure 4-20: X1 Branch Format (X1_Br)](image)

The X1 Jump format is used to encode forward or backwards jumps. The jump offset is represented as an unsigned 28-bit bundle offset. For example:

```
{j jump_target}  // Jump to jump_target
```

![Figure 4-21: X1 Jump Format (X1_J)](image)

X0 Instruction Formats

The X0 RRR format encodes an operation that requires a destination register and two source operands. For example:

```
{add r0, r1, r2}  // Add r1 and r2 placing result into r0
```

![Figure 4-22: X0 RRR Format (X0_RRR)](image)
The X0_imm8 format encodes an operation that requires a destination register, a source register, and an 8-bit signed immediate operand. For example:

{ addi r0, r1, -13} // Add -13 to r1 and place result in r0

The X0 Long Immediate Format is used for instructions that require a destination register, a source register, and a signed 16-bit immediate operand. For example:

// Add 0x1234 to the contents of register 1 and place the result in register 0
{ addli r0, r1, 0x1234 }

The X0 Unary format is used for instructions that require a destination register and a single operand register. For example:

{ bytex r0, r1 } // Exchange the bytes in r1 and place the result in r0
The X0 Shift Format is used for instructions that require a destination register, a source register, and a 5-bit shift count. For example:

```
// Left shift the contents of r1 5-bits and place the result in r0.
{ shli r0, r1, 5 }
```

![Figure 4-26: X0 Shift Format (X0_Shift)]

The X0 Masked Merge format is used for the masked merge instruction. For example:

```
// Merge bits 5 through 7 of r1 into the contents of r2
// and place the result in r0
{ mm r0, r1, r2, 5, 7 }
```

![Figure 4-27: X0 Masked Merge Format (X0_MM)]

### 4.1.1.2 Y Instruction Formats

![Figure 4-28: Y1 Specific Format]
Y2 Instruction Formats

The Y2 Load Store Format is used to encode load or store instructions. Examples:

{ lw r0, r1 } // Load the contents of the word addressed by r1 into r0
{ sw r0, r1 } // Store the contents of register r1 into the word
// addressed by r0
Y1 Instruction Formats

The Y1 RRR format encodes an operation that requires a destination register and two source operands. For example:

{ add r0, r1, r2 } // Add r1 and r2 placing result into r0

![Figure 4-31: Y1 RRR Format (Y1_RRR)](image)

The Y1_imm8 format encodes an operation that requires a destination register, a source register, and an 8-bit signed immediate operand. For example:

{ addi r0, r1, -13 } // Add -13 to r1 and place result in r0

![Figure 4-32: Y1 Immediate Format (Y1_Imm8)](image)

The Y1 Unary format is used for instructions that require a destination register and a single operand register. For example:

{ lw r0, r1 } // Load the contents of the word addressed by r1 into r0

![Figure 4-33: Y1 Unary Format (Y1_Unary)](image)
The Y1 Shift Format is used for instructions that require a destination register, a source register, and a 5-bit shift count. For example:

// Left shift the contents of r1 5-bits and place the result in r0. 
{ shli r0, r1, 5 }

Y0 Instruction Formats

The Y0 RRR format encodes an operation, which requires a destination register and two source operands. For example:

{ add r0, r1, r2 } // Add r1 and r2 placing result into r0

The Y0_imm8 format encodes an operation that requires a destination register, a source register, and an 8-bit signed immediate operand. For example:

{ addi r0, r1, -13 } // Add -13 to r1 and place result in r0
The Y0 Unary format is used for instructions that require a destination register and a single operand register. For example:

{ bytex r0, r1 } // Exchange the bytes in r1 and place the result in r0

![Diagram showing Y0 Unary Format (Y0_Unary)]

The Y0 Shift Format is used for instructions that require a destination register, a source register, and a 5-bit shift count. For example:

// Left shift the contents of r1 5-bits and place the result in r0.
{ shli r0, r1, 5 }

![Diagram showing Y0 Shift Format (Y0_Shift)]

### 4.1.2 Definitions and Semantics

Throughout the main processor’s instruction reference, several function calls, types, and constants are utilized to define the function of a particular instruction. This section describes the functionality and values of each of these functions, types, and constants. Unless otherwise stated, operators and precedence in the instruction reference follow the same rules as ANSI C.

#### 4.1.2.1 Constants

- **WORD_SIZE 32**
  The size of a machine word in bits. The Tile Processor is a 32-bit machine.

- **WORD_MASK 0xFFFFFFFF**
  A mask to represent all of the bits in a word.

- **WORD_ADDR_MASK 0xFFFFFFFC**
  A mask that selects the portion of an address that forms a word-aligned address.

- **HALF_WORD_SIZE 16**
  The size of half of a machine word in bits. The Tile Processor is a 32-bit machine thus half the word length is 16.
- **HALF_WORD_ADDR_MASK 0xFFFFFFFE**
  A mask that selects the portion of an address that forms a half-word-aligned address.

- **BYTE_SIZE 8**
  The number of bits in a byte.

- **BYTE_SIZE_LOG_2 3**
  The logarithm base 2 of the number of bits in a byte.

- **BYTE_MASK 0xFF**
  A mask to represent all of the bits in a byte.

- **BACKWARD_OFFSET 0x80000000**
  A constant address offset added to the instruction specified offset in backwards jump instructions. For more information, refer to “Control Instructions” on page 95.

- **INSTRUCTION_SIZE 64**
  The length in bits of an instruction (bundle) in the Tile Processor architecture.

- **INSTRUCTION_SIZE_LOG_2 6**
  The logarithm base 2 of the length in bits of an instruction (bundle) in the Tile Processor.

- **ALIGNED_INSTRUCTION_MASK 0xFFFFFFF8**
  A mask that selects the relevant bits for the address of an aligned instruction.

- **BYTE_16_ADDR_MASK 0xFFFFFFF0**
  A mask that selects the portion of an address that forms a 16-byte aligned block.

- **ZERO_REGISTER 63**
  The ZERO_REGISTER always reads as 0, and ignores all writes.

- **NUMBER_OF_REGISTERS 64**
  The number of architecturally visible general purpose registers in the main processor.

- **LINK_REGISTER 55**
  The LINK_REGISTER is used as an implicit destination for some control instructions.

- **EX_CONTEXT_SPRF_OFFSET**
  The starting SPR address of the interrupt context save blocks. The save blocks are indexed by protection level of the interrupt handler being invoked.

- **EX_CONTEXT_SIZE**
  The length of the interrupt context save block.

- **PC_EX_CONTEXT_OFFSET**
  The register offset of the saved PC in the interrupt save context block.

- **PROTECTION_LEVEL_EX_CONTEXT_OFFSET**
  The register offset of the saved protection level in the interrupt save context block.

- **INTERRUPT_MASK_EX_CONTEXT_OFFSET**
  The register offset of the saved interrupt mask in the interrupt save context block.
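
Since the instruction reference follows ANSI C conventions, the numeric constants can be written as C macros. The sketch below covers only constants whose values are given above, writes each address mask as a full 32-bit value (an assumption that matches a 32-bit machine word), and adds hypothetical helper functions to show how the masks are applied.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_SIZE                 32
#define WORD_MASK                 0xFFFFFFFFu
#define WORD_ADDR_MASK            0xFFFFFFFCu
#define HALF_WORD_ADDR_MASK       0xFFFFFFFEu
#define BYTE_SIZE                 8
#define BYTE_SIZE_LOG_2           3
#define INSTRUCTION_SIZE          64
#define INSTRUCTION_SIZE_LOG_2    6
#define ALIGNED_INSTRUCTION_MASK  0xFFFFFFF8u
#define BYTE_16_ADDR_MASK         0xFFFFFFF0u

/* Round an address down to the start of its enclosing word. */
static uint32_t word_align(uint32_t addr) {
    return addr & WORD_ADDR_MASK;
}

/* Round an address down to the start of its enclosing bundle. */
static uint32_t bundle_align(uint32_t addr) {
    return addr & ALIGNED_INSTRUCTION_MASK;
}

/* Bundle length in bytes, derived from the bit-sized constants. */
static unsigned instruction_bytes(void) {
    assert((1u << INSTRUCTION_SIZE_LOG_2) == INSTRUCTION_SIZE);
    return INSTRUCTION_SIZE >> BYTE_SIZE_LOG_2;
}
```
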
4.1.2.2 Types

SignedMachineWord
This is a signed WORD_SIZE type.

UnsignedMachineWord
This is an unsigned WORD_SIZE type.

RegisterFileEntry
This type represents a register file entry. This type can be cast to an UnsignedMachineWord. This type has the assignment operator overloaded for assignments of UnsignedMachineWord.

4.1.2.3 Functions

signExtend17
Sign extends a 17-bit value up to the machine’s word length WORD_SIZE. The type of the returned value of this function is SignedMachineWord.

signExtend16
Sign extends a 16-bit value up to the machine’s word length WORD_SIZE. The type of the returned value of this function is SignedMachineWord.

signExtend8
Sign extends an 8-bit value up to the machine’s word length WORD_SIZE. The type of the returned value of this function is SignedMachineWord.

signExtend1
Sign extends a 1-bit value up to the machine’s word length WORD_SIZE. The type of the returned value of this function is SignedMachineWord.
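
One way to realize the signExtend functions in C is shown below. This is a sketch, not the reference definition: `SignedMachineWord` is modeled as `int32_t`, the generic helper `sign_extend` is hypothetical, and the final conversion relies on two's complement behavior, which common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extend the low `bits` bits of `value` to a 32-bit signed word. */
static int32_t sign_extend(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = (bits < 32) ? ((1u << bits) - 1u) : 0xFFFFFFFFu;
    uint32_t sign = 1u << (bits - 1);
    value &= mask;                          /* discard bits above the field */
    return (int32_t)((value ^ sign) - sign);
}

static int32_t signExtend17(uint32_t v) { return sign_extend(v, 17); }
static int32_t signExtend16(uint32_t v) { return sign_extend(v, 16); }
static int32_t signExtend8(uint32_t v)  { return sign_extend(v, 8); }
static int32_t signExtend1(uint32_t v)  { return sign_extend(v, 1); }
```

With this definition, the 8-bit immediate field `0xF3` extends to -13, the immediate used in the `addi` examples earlier in this chapter.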

memoryReadWord
Returns the value stored in memory of length \( \text{WORD\_SIZE} \) at the address passed to this function. The value is not actually extended since it is already the same as \( \text{WORD\_SIZE}/\text{UnsignedMachineWord} \). The address passed as a parameter to this function is processed depending on the memory mode and contents of the TLB. The Tile Processor is a little endian machine.

memoryReadHalfWord
Returns the value stored in memory of length \( \text{HALF\_WORD\_SIZE} \) at the address passed to this function. This function returns the value 0 extended to a \( \text{UnsignedMachineWord} \). The address passed as a parameter to this function is processed depending on the memory mode and contents of the TLB. The Tile Processor is a little endian machine.

memoryReadByte
Returns the \( \text{BYTE\_SIZE} \)-bit value stored in memory at the address passed to this function. The value is zero extended to an \( \text{UnsignedMachineWord} \). The address passed as a parameter to this function is processed according to the memory mode and the contents of the TLB. The Tile Processor is a little-endian machine.

memoryWriteWord

Writes to memory \textsc{WORD\_SIZE} bits of the second parameter into the address passed to this function as the first parameter. The address passed as the first parameter to this function is processed depending on the memory mode and contents of the TLB. The Tile Processor is a little endian machine.

memoryWriteHalfWord

Writes to memory \textsc{HALF\_WORD\_SIZE} bits of the second parameter into the address passed to this function as the first parameter. The address passed as the first parameter to this function is processed depending on the memory mode and contents of the TLB. The Tile Processor is a little endian machine.

memoryWriteByte

Writes to memory \textsc{BYTE\_SIZE} bits of the second parameter into the address passed to this function as the first parameter. The address passed as the first parameter to this function is processed depending on the memory mode and contents of the TLB. The Tile Processor is a little endian machine.
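
The little-endian behavior of the memory read and write functions can be illustrated with a toy model. This C sketch assumes BYTE_SIZE = 8, HALF_WORD_SIZE = 16, and WORD_SIZE = 32, and it models memory as a small flat byte array. It deliberately ignores the TLB, memory modes, and alignment rules. `store_le`, `load_le`, and `MEM_BYTES` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: BYTE_SIZE = 8, HALF_WORD_SIZE = 16, WORD_SIZE = 32. */
typedef uint32_t UnsignedMachineWord;

enum { MEM_BYTES = 64 };          /* hypothetical toy memory size */
static uint8_t mem[MEM_BYTES];    /* flat memory: no TLB, no memory modes */

/* Stores the low `nbytes` bytes of `value`, least significant byte first. */
static void store_le(UnsignedMachineWord addr, UnsignedMachineWord value,
                     unsigned nbytes)
{
    assert(nbytes <= MEM_BYTES && addr <= MEM_BYTES - nbytes);
    for (unsigned i = 0; i < nbytes; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Loads `nbytes` bytes and zero extends the result to a full word. */
static UnsignedMachineWord load_le(UnsignedMachineWord addr, unsigned nbytes)
{
    assert(nbytes <= MEM_BYTES && addr <= MEM_BYTES - nbytes);
    UnsignedMachineWord value = 0;
    for (unsigned i = 0; i < nbytes; i++)
        value |= (UnsignedMachineWord)mem[addr + i] << (8 * i);
    return value;
}

static void memoryWriteWord(UnsignedMachineWord a, UnsignedMachineWord v)     { store_le(a, v, 4); }
static void memoryWriteHalfWord(UnsignedMachineWord a, UnsignedMachineWord v) { store_le(a, v, 2); }
static void memoryWriteByte(UnsignedMachineWord a, UnsignedMachineWord v)     { store_le(a, v, 1); }

static UnsignedMachineWord memoryReadWord(UnsignedMachineWord a)     { return load_le(a, 4); }
static UnsignedMachineWord memoryReadHalfWord(UnsignedMachineWord a) { return load_le(a, 2); }
static UnsignedMachineWord memoryReadByte(UnsignedMachineWord a)     { return load_le(a, 1); }
```

In this model, writing the word 0x11223344 at address 0 places 0x44 at address 0 and 0x11 at address 3, so a half-word read at address 0 returns 0x3344.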

setNextPC

Set the program counter to this function’s parameter.

currentPC

Return as an \textsc{UnsignedMachineWord} the current program counter.

branchHintsCorrect

Denote that a control flow event has occurred that has been hinted correctly.

branchHintsIncorrect

Denote that a control flow event has occurred that has been hinted incorrectly.

getCurrentProtectionLevel

Returns as an \textsc{UnsignedMachineWord} the current protection level.

setProtectionLevel

Sets the current protection level from the first parameter.

setInterruptCriticalSection

Sets the current interrupt critical section bit from the first parameter.

flushCacheLine

Flushes the cache line from a tile’s local cache which contains the address passed to this function as a parameter.

invalidateCacheLine

Invalidates the cache line from a tile’s local cache which contains the address passed to this function as a parameter.

flushAndInvalidateCacheLine

Flushes and invalidates the cache line from a tile’s local cache which contains the address passed to this function as a parameter.

rf[]

Returns the indexed register file entry with type \textsc{RegisterFileEntry}. The index is an integer in the range of 0 to \textsc{NUMBER\_OF\_REGISTERS} - 1.

sprf[]

Returns the indexed special purpose register file entry. The index is an integer in the range of 0 to \(2^{15} - 1\).
pushReturnStack
Pushes the parameter onto the return prediction stack.

popReturnStack
Returns the top of the return prediction stack and pops the stack.

indirectBranchHintedIncorrect
Denote that an indirect branch has occurred and has been hinted incorrectly.

indirectBranchHintedCorrect
Denote that an indirect branch has occurred and has been hinted correctly.

dtlbProbe
See “dtlbpr: Data TLB Probe” on page 184.

memoryFence
See “mf: Memory Fence” on page 188.

getHighHalfWordUnsigned
Returns the high-order half word of the parameter.

getLowHalfWordUnsigned
Returns the low-order half word of the parameter.
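
A minimal C sketch of the two half-word accessors follows. It assumes WORD_SIZE = 32 and HALF_WORD_SIZE = 16, and it assumes that the result is zero extended, as the "Unsigned" suffix suggests. `joinHalfWords` is a hypothetical inverse added only to show that the two halves together reconstruct the word.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t UnsignedMachineWord;   /* assumes WORD_SIZE = 32 */

/* Bits 31..16 of the word, moved down to bits 15..0. */
static UnsignedMachineWord getHighHalfWordUnsigned(UnsignedMachineWord w)
{
    return w >> 16;
}

/* Bits 15..0 of the word. */
static UnsignedMachineWord getLowHalfWordUnsigned(UnsignedMachineWord w)
{
    return w & 0xFFFFu;
}

/* Hypothetical helper: rebuilds a word from two half words. */
static UnsignedMachineWord joinHalfWords(UnsignedMachineWord high,
                                         UnsignedMachineWord low)
{
    assert(high <= 0xFFFFu && low <= 0xFFFFu);
    return (high << 16) | low;
}
```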

iCoherent
See “icoh: Instruction Stream Coherence” on page 349.

fnop
See “fnop: Filler No Operation” on page 214.

nop
See “nop: Architectural No Operation” on page 216.

drain
See “drain: Drain Instruction” on page 348.

illegalInstruction
Denotes that an illegal instruction has occurred.

nap
See “nap: Nap” on page 354.

softwareInterrupt
Denotes that a software interrupt has occurred. The parameter specifies which software interrupt will be generated.
### 4.1.3 Master List of Main Processor Instructions

Table 4-3 provides a complete list of instructions in alphabetical order. Pseudo instructions are listed on page 359.

<table>
<thead>
<tr>
<th>Register</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>add</td>
<td>Arithmetic</td>
<td>Add Word (Refer to page 44.)</td>
</tr>
<tr>
<td>addb</td>
<td>SIMD</td>
<td>Add Bytes (Refer to page 220.)</td>
</tr>
<tr>
<td>addbs_u</td>
<td>SIMD</td>
<td>Add Bytes Saturating Unsigned (Refer to page 222.)</td>
</tr>
<tr>
<td>addh</td>
<td>SIMD</td>
<td>Add Half Words (Refer to page 224.)</td>
</tr>
<tr>
<td>addhs</td>
<td>SIMD</td>
<td>Add Half Words Saturating (Refer to page 226.)</td>
</tr>
<tr>
<td>addi</td>
<td>Arithmetic</td>
<td>Add Immediate Word (Refer to page 46.)</td>
</tr>
<tr>
<td>addib</td>
<td>SIMD</td>
<td>Add Immediate Bytes (Refer to page 228.)</td>
</tr>
<tr>
<td>addih</td>
<td>SIMD</td>
<td>Add Immediate Half Words (Refer to page 229.)</td>
</tr>
<tr>
<td>addli</td>
<td>Arithmetic</td>
<td>Add Long Immediate Word (Refer to page 48.)</td>
</tr>
<tr>
<td>addlis</td>
<td>Arithmetic</td>
<td>Add Long Immediate Static Write Word (Refer to page 49.)</td>
</tr>
<tr>
<td>adds</td>
<td>Arithmetic</td>
<td>Add Word Saturating (Refer to page 50.)</td>
</tr>
<tr>
<td>adiffb_u</td>
<td>SIMD</td>
<td>Absolute Difference Unsigned Bytes (Refer to page 231.)</td>
</tr>
<tr>
<td>adiffh</td>
<td>SIMD</td>
<td>Absolute Difference Half Words (Refer to page 232.)</td>
</tr>
<tr>
<td>and</td>
<td>Logical</td>
<td>And Word (Refer to page 122.)</td>
</tr>
<tr>
<td>andi</td>
<td>Logical</td>
<td>And Immediate Word (Refer to page 124.)</td>
</tr>
<tr>
<td>auli</td>
<td>Arithmetic</td>
<td>Add Upper Long Immediate Word (Refer to page 52.)</td>
</tr>
<tr>
<td>avgb_u</td>
<td>SIMD</td>
<td>Average Byte Unsigned (Refer to page 233.)</td>
</tr>
<tr>
<td>avgh</td>
<td>SIMD</td>
<td>Average Half Words (Refer to page 234.)</td>
</tr>
<tr>
<td>bbns</td>
<td>Control</td>
<td>Branch Bit Not Set Word (Refer to page 96.)</td>
</tr>
<tr>
<td>bbnst</td>
<td>Control</td>
<td>Branch Bit Not Set Taken Word (Refer to page 97.)</td>
</tr>
<tr>
<td>bbs</td>
<td>Control</td>
<td>Branch Bit Set Word (Refer to page 98.)</td>
</tr>
<tr>
<td>bbst</td>
<td>Control</td>
<td>Branch Bit Set Taken Word (Refer to page 99.)</td>
</tr>
<tr>
<td>bgez</td>
<td>Control</td>
<td>Branch Greater Than or Equal to Zero Word (Refer to page 100.)</td>
</tr>
<tr>
<td>bgezt</td>
<td>Control</td>
<td>Branch Greater Than or Equal to Zero Predict Taken Word (Refer to page 101.)</td>
</tr>
<tr>
<td>bgz</td>
<td>Control</td>
<td>Branch Greater Than Zero Word (Refer to page 102.)</td>
</tr>
<tr>
<td>bgzt</td>
<td>Control</td>
<td>Branch Greater Than Zero Predict Taken Word (Refer to page 103.)</td>
</tr>
<tr>
<td>bitx</td>
<td>Bit Manipulation</td>
<td>Bit Exchange Word (Refer to page 64.)</td>
</tr>
<tr>
<td>blez</td>
<td>Control</td>
<td>Branch Less Than or Equal to Zero Word (Refer to page 104.)</td>
</tr>
<tr>
<td>blezt</td>
<td>Control</td>
<td>Branch Less Than or Equal to Zero Taken Word (Refer to page 105.)</td>
</tr>
<tr>
<td>blz</td>
<td>Control</td>
<td>Branch Less Than Zero Word (Refer to page 106.)</td>
</tr>
<tr>
<td>blzt</td>
<td>Control</td>
<td>Branch Less Than Zero Taken Word (Refer to page 107.)</td>
</tr>
<tr>
<td>bnz</td>
<td>Control</td>
<td>Branch Not Zero Word (Refer to page 108.)</td>
</tr>
<tr>
<td>bnzt</td>
<td>Control</td>
<td>Branch Not Zero Predict Taken Word (Refer to page 109.)</td>
</tr>
<tr>
<td>bytex</td>
<td>Bit Manipulation</td>
<td>Byte Exchange Word (Refer to page 66.)</td>
</tr>
<tr>
<td>bz</td>
<td>Control</td>
<td>Branch Zero Word (Refer to page 110.)</td>
</tr>
<tr>
<td>bzt</td>
<td>Control</td>
<td>Branch Zero Predict Taken Word (Refer to page 111.)</td>
</tr>
<tr>
<td>clz</td>
<td>Bit Manipulation</td>
<td>Count Leading Zeros Word (Refer to page 68.)</td>
</tr>
<tr>
<td>crc32_32</td>
<td>Bit Manipulation</td>
<td>CRC32 32-bit Step (Refer to page 70.)</td>
</tr>
<tr>
<td>crc32_8</td>
<td>Bit Manipulation</td>
<td>CRC32 8-bit Step (Refer to page 71.)</td>
</tr>
<tr>
<td>ctz</td>
<td>Bit Manipulation</td>
<td>Count Trailing Zeros Word (Refer to page 72.)</td>
</tr>
<tr>
<td>dword_align</td>
<td>Bit Manipulation</td>
<td>Double Word Align (Refer to page 74.)</td>
</tr>
<tr>
<td>drain</td>
<td>System</td>
<td>Drain Instruction (Refer to page 348.)</td>
</tr>
<tr>
<td>dtlbpr</td>
<td>Memory Maintenance</td>
<td>Data TLB Probe (Refer to page 184.)</td>
</tr>
<tr>
<td>finv</td>
<td>Memory Maintenance</td>
<td>Flush and Invalidate Cache Line (Refer to page 185.)</td>
</tr>
<tr>
<td>flush</td>
<td>Memory Maintenance</td>
<td>Flush Cache Line (Refer to page 186.)</td>
</tr>
<tr>
<td>fnop</td>
<td>NOP</td>
<td>Filler No Operation (Refer to page 214.)</td>
</tr>
<tr>
<td>icoh</td>
<td>System</td>
<td>Instruction Stream Coherence (Refer to page 349.)</td>
</tr>
<tr>
<td>ill</td>
<td>System</td>
<td>Illegal Instruction (Refer to page 350.)</td>
</tr>
<tr>
<td>inthb</td>
<td>SIMD</td>
<td>Interleave High Byte (Refer to page 235.)</td>
</tr>
<tr>
<td>inthh</td>
<td>SIMD</td>
<td>Interleave High Half Words (Refer to page 237.)</td>
</tr>
<tr>
<td>intlb</td>
<td>SIMD</td>
<td>Interleave Low Byte (Refer to page 239.)</td>
</tr>
<tr>
<td>intlh</td>
<td>SIMD</td>
<td>Interleave Low Half Words (Refer to page 241.)</td>
</tr>
<tr>
<td>inv</td>
<td>Memory Maintenance</td>
<td>Invalidate Cache Line (Refer to page 187.)</td>
</tr>
<tr>
<td>iret</td>
<td>System</td>
<td>Interrupt Return (Refer to page 351.)</td>
</tr>
</tbody>
</table>
### Table 4-3. Master List of Main Processor Instructions (continued)

<table>
<thead>
<tr>
<th>Register</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>jalb</td>
<td>Control</td>
<td>Jump and Link Backward (Refer to page 112.)</td>
</tr>
<tr>
<td>jalf</td>
<td>Control</td>
<td>Jump and Link Forward (Refer to page 113.)</td>
</tr>
<tr>
<td>jalr</td>
<td>Control</td>
<td>Jump and Link Register (Refer to page 114.)</td>
</tr>
<tr>
<td>jalrp</td>
<td>Control</td>
<td>Jump and Link Register Predict (Refer to page 115.)</td>
</tr>
<tr>
<td>jb</td>
<td>Control</td>
<td>Jump Backward (Refer to page 116.)</td>
</tr>
<tr>
<td>jf</td>
<td>Control</td>
<td>Jump Forward (Refer to page 117.)</td>
</tr>
<tr>
<td>jr</td>
<td>Control</td>
<td>Jump Register (Refer to page 118.)</td>
</tr>
<tr>
<td>jrp</td>
<td>Control</td>
<td>Jump Register Predict (Refer to page 119.)</td>
</tr>
<tr>
<td>lb</td>
<td>Memory</td>
<td>Load Byte (Refer to page 164.)</td>
</tr>
<tr>
<td>lb_u</td>
<td>Memory</td>
<td>Load Byte Unsigned (Refer to page 165.)</td>
</tr>
<tr>
<td>lbadd</td>
<td>Memory</td>
<td>Load Byte and Add (Refer to page 166.)</td>
</tr>
<tr>
<td>lbadd_u</td>
<td>Memory</td>
<td>Load Byte Unsigned and Add (Refer to page 167.)</td>
</tr>
<tr>
<td>lh</td>
<td>Memory</td>
<td>Load Half Word (Refer to page 168.)</td>
</tr>
<tr>
<td>lh_u</td>
<td>Memory</td>
<td>Load Half Word Unsigned (Refer to page 169.)</td>
</tr>
<tr>
<td>lhadd</td>
<td>Memory</td>
<td>Load Half Word and Add (Refer to page 170.)</td>
</tr>
<tr>
<td>lhadd_u</td>
<td>Memory</td>
<td>Load Half Word Unsigned and Add (Refer to page 171.)</td>
</tr>
<tr>
<td>lnk</td>
<td>Control</td>
<td>Link (Refer to page 120.)</td>
</tr>
<tr>
<td>lw</td>
<td>Memory</td>
<td>Load Word (Refer to page 172.)</td>
</tr>
<tr>
<td>lw_na</td>
<td>Memory</td>
<td>Load Word No Alignment Trap (Refer to page 173.)</td>
</tr>
<tr>
<td>lwadd</td>
<td>Memory</td>
<td>Load Word and Add (Refer to page 174.)</td>
</tr>
<tr>
<td>lwadd_na</td>
<td>Memory</td>
<td>Load Word No Alignment Trap and Add (Refer to page 175.)</td>
</tr>
<tr>
<td>maxb_u</td>
<td>SIMD</td>
<td>Maximum Byte Unsigned (Refer to page 243.)</td>
</tr>
<tr>
<td>maxh</td>
<td>SIMD</td>
<td>Maximum Half Words (Refer to page 245.)</td>
</tr>
<tr>
<td>maxib_u</td>
<td>SIMD</td>
<td>Maximum Immediate Byte Unsigned (Refer to page 247.)</td>
</tr>
<tr>
<td>maxih</td>
<td>SIMD</td>
<td>Maximum Immediate Half Words (Refer to page 249.)</td>
</tr>
<tr>
<td>mf</td>
<td>Memory Maintenance</td>
<td>Memory Fence (Refer to page 188.)</td>
</tr>
<tr>
<td>mfspr</td>
<td>System</td>
<td>Move from Special Purpose Register Word (Refer to page 352.)</td>
</tr>
<tr>
<td>minb_u</td>
<td>SIMD</td>
<td>Minimum Byte Unsigned (Refer to page 251.)</td>
</tr>
<tr>
<td>minh</td>
<td>SIMD</td>
<td>Minimum Half Words (Refer to page 253.)</td>
</tr>
<tr>
<td>minib_u</td>
<td>SIMD</td>
<td>Minimum Immediate Byte Unsigned (Refer to page 255.)</td>
</tr>
<tr>
<td>minih</td>
<td>SIMD</td>
<td>Minimum Immediate Half Words (Refer to page 257.)</td>
</tr>
<tr>
<td>mm</td>
<td>Logical</td>
<td>Masked Merge Word (Refer to page 126.)</td>
</tr>
<tr>
<td>mnz</td>
<td>Logical</td>
<td>Mask Not Zero Word (Refer to page 128.)</td>
</tr>
<tr>
<td>mnzb</td>
<td>SIMD</td>
<td>Mask Not Zero Byte (Refer to page 259.)</td>
</tr>
<tr>
<td>mnzh</td>
<td>SIMD</td>
<td>Mask Not Zero Half Words (Refer to page 261.)</td>
</tr>
<tr>
<td>mtspr</td>
<td>System</td>
<td>Move to Special Purpose Register Word (Refer to page 353.)</td>
</tr>
<tr>
<td>mulhh_ss</td>
<td>Multiply</td>
<td>Multiply High Signed High Signed Half Word (Refer to page 191.)</td>
</tr>
<tr>
<td>mulhh_su</td>
<td>Multiply</td>
<td>Multiply High Signed High Unsigned Half Word (Refer to page 192.)</td>
</tr>
<tr>
<td>mulhh_us</td>
<td>Multiply</td>
<td>Multiply High Unsigned High Signed Half Word (Refer to page 193.)</td>
</tr>
<tr>
<td>mulhha_ss</td>
<td>Multiply</td>
<td>Multiply Accumulate High Signed High Signed Half Word (Refer to page 194.)</td>
</tr>
<tr>
<td>mulhha_su</td>
<td>Multiply</td>
<td>Multiply Accumulate High Signed High Unsigned Half Word (Refer to page 195.)</td>
</tr>
<tr>
<td>mulhha_us</td>
<td>Multiply</td>
<td>Multiply Accumulate High Unsigned High Signed Half Word (Refer to page 196.)</td>
</tr>
<tr>
<td>mulhha_uu</td>
<td>Multiply</td>
<td>Multiply Accumulate High Unsigned High Unsigned Half Word (Refer to page 197.)</td>
</tr>
<tr>
<td>mulhhsa_uu</td>
<td>Multiply</td>
<td>Multiply Shift Accumulate High Unsigned High Unsigned Half Word (Refer to page 198.)</td>
</tr>
<tr>
<td>mulhl_ss</td>
<td>Multiply</td>
<td>Multiply High Signed Low Signed Half Word (Refer to page 199.)</td>
</tr>
<tr>
<td>mulhl_su</td>
<td>Multiply</td>
<td>Multiply High Signed Low Unsigned Half Word (Refer to page 200.)</td>
</tr>
<tr>
<td>mulhl_us</td>
<td>Multiply</td>
<td>Multiply High Unsigned Low Signed Half Word (Refer to page 201.)</td>
</tr>
<tr>
<td>mulhl_uu</td>
<td>Multiply</td>
<td>Multiply High Unsigned Low Unsigned Half Word (Refer to page 202.)</td>
</tr>
<tr>
<td>mulhla_ss</td>
<td>Multiply</td>
<td>Multiply Accumulate High Signed Low Signed Half Word (Refer to page 203.)</td>
</tr>
<tr>
<td>mulhla_su</td>
<td>Multiply</td>
<td>Multiply Accumulate High Signed Low Unsigned Half Word (Refer to page 204.)</td>
</tr>
<tr>
<td>mulhla_us</td>
<td>Multiply</td>
<td>Multiply Accumulate High Unsigned Low Signed Half Word (Refer to page 205.)</td>
</tr>
<tr>
<td>mulhla_uu</td>
<td>Multiply</td>
<td>Multiply Accumulate High Unsigned Low Unsigned Half Word (Refer to page 206.)</td>
</tr>
<tr>
<td>mulhlsa_uu</td>
<td>Multiply</td>
<td>Multiply Shift Accumulate High Unsigned Low Unsigned Half Word (Refer to page 207.)</td>
</tr>
<tr>
<td>mulll_ss</td>
<td>Multiply</td>
<td>Multiply Low Signed Low Signed Half Word (Refer to page 208.)</td>
</tr>
<tr>
<td>mulll_su</td>
<td>Multiply</td>
<td>Multiply Low Signed Low Unsigned Half Word (Refer to page 209.)</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Register</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>mulll_uu</td>
<td>Multiply</td>
<td>Multiply Low Unsigned Low Unsigned Half Word (Refer to page 209.)</td>
</tr>
<tr>
<td>mullla_ss</td>
<td>Multiply</td>
<td>Multiply Accumulate Low Signed Low Signed Half Word (Refer to page 210.)</td>
</tr>
<tr>
<td>mullla_su</td>
<td>Multiply</td>
<td>Multiply Accumulate Low Signed Low Unsigned Half Word (Refer to page 211.)</td>
</tr>
<tr>
<td>mullla_uu</td>
<td>Multiply</td>
<td>Multiply Accumulate Low Unsigned Low Unsigned Half Word (Refer to page 212.)</td>
</tr>
<tr>
<td>mulllsa_uu</td>
<td>Multiply</td>
<td>Multiply Shift Accumulate Low Unsigned Low Unsigned Half Word (Refer to page 212.)</td>
</tr>
<tr>
<td>mvnz</td>
<td>Logical</td>
<td>Move Not Zero Word (Refer to page 130.)</td>
</tr>
<tr>
<td>mvz</td>
<td>Logical</td>
<td>Move Zero Word (Refer to page 131.)</td>
</tr>
<tr>
<td>mz</td>
<td>Logical</td>
<td>Mask Zero Word (Refer to page 132.)</td>
</tr>
<tr>
<td>mzb</td>
<td>SIMD</td>
<td>Mask Zero Byte (Refer to page 263.)</td>
</tr>
<tr>
<td>mzh</td>
<td>SIMD</td>
<td>Mask Zero Half Words (Refer to page 265.)</td>
</tr>
<tr>
<td>nap</td>
<td>System</td>
<td>Nap (Refer to page 354.)</td>
</tr>
<tr>
<td>nop</td>
<td>NOP</td>
<td>Architectural No Operation (Refer to page 216.)</td>
</tr>
<tr>
<td>nor</td>
<td>Logical</td>
<td>Nor Word (Refer to page 134.)</td>
</tr>
<tr>
<td>or</td>
<td>Logical</td>
<td>Or Word (Refer to page 136.)</td>
</tr>
<tr>
<td>ori</td>
<td>Logical</td>
<td>Or Immediate Word (Refer to page 138.)</td>
</tr>
<tr>
<td>packhb</td>
<td>SIMD</td>
<td>Pack High Byte (Refer to page 269.)</td>
</tr>
<tr>
<td>packhs</td>
<td>SIMD</td>
<td>Pack High Half Words Saturating (Refer to page 271.)</td>
</tr>
<tr>
<td>packlb</td>
<td>SIMD</td>
<td>Pack Low Byte (Refer to page 273.)</td>
</tr>
<tr>
<td>packbs_u</td>
<td>SIMD</td>
<td>Pack Half Words Saturating (Refer to page 267.)</td>
</tr>
<tr>
<td>pcnt</td>
<td>Bit Manipulation</td>
<td>Population Count Word (Refer to page 75.)</td>
</tr>
<tr>
<td>rl</td>
<td>Logical</td>
<td>Rotate Left Word (Refer to page 140.)</td>
</tr>
<tr>
<td>rli</td>
<td>Logical</td>
<td>Rotate Left Immediate Word (Refer to page 142.)</td>
</tr>
<tr>
<td>s1a</td>
<td>Arithmetic</td>
<td>Shift Left One Add Word (Refer to page 53.)</td>
</tr>
<tr>
<td>s2a</td>
<td>Arithmetic</td>
<td>Shift Left Two Add Word (Refer to page 55.)</td>
</tr>
<tr>
<td>s3a</td>
<td>Arithmetic</td>
<td>Shift Left Three Add Word (Refer to page 57.)</td>
</tr>
<tr>
<td>sadab_u</td>
<td>SIMD</td>
<td>Sum of Absolute Difference Accumulate Unsigned Bytes (Refer to page 267.)</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Register</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>sadah</td>
<td>SIMD</td>
<td>Sum of Absolute Difference Accumulate Half Words (Refer to page 276.)</td>
</tr>
<tr>
<td>sadah_u</td>
<td>SIMD</td>
<td>Sum of Absolute Difference Accumulate Unsigned Half Words (Refer to page 277.)</td>
</tr>
<tr>
<td>sadb_u</td>
<td>SIMD</td>
<td>Sum of Absolute Difference Unsigned Bytes (Refer to page 278.)</td>
</tr>
<tr>
<td>sadh</td>
<td>SIMD</td>
<td>Sum of Absolute Difference Half Words (Refer to page 279.)</td>
</tr>
<tr>
<td>sadh_u</td>
<td>SIMD</td>
<td>Sum of Absolute Difference Unsigned Half Words (Refer to page 280.)</td>
</tr>
<tr>
<td>sb</td>
<td>Memory</td>
<td>Store Byte (Refer to page 176.)</td>
</tr>
<tr>
<td>sbadd</td>
<td>Memory</td>
<td>Store Byte and Add (Refer to page 177.)</td>
</tr>
<tr>
<td>seq</td>
<td>Compare</td>
<td>Set Equal Word (Refer to page 77.)</td>
</tr>
<tr>
<td>seqb</td>
<td>SIMD</td>
<td>Set Equal To Byte (Refer to page 281.)</td>
</tr>
<tr>
<td>seqh</td>
<td>SIMD</td>
<td>Set Equal To Half Words (Refer to page 283.)</td>
</tr>
<tr>
<td>seqi</td>
<td>Compare</td>
<td>Set Equal Immediate Word (Refer to page 79.)</td>
</tr>
<tr>
<td>seqib</td>
<td>SIMD</td>
<td>Set Equal To Immediate Byte (Refer to page 285.)</td>
</tr>
<tr>
<td>seqih</td>
<td>SIMD</td>
<td>Set Equal To Immediate Half Words (Refer to page 287.)</td>
</tr>
<tr>
<td>sh</td>
<td>Memory</td>
<td>Store Half Word (Refer to page 178.)</td>
</tr>
<tr>
<td>shadd</td>
<td>Memory</td>
<td>Store Half Word and Add (Refer to page 179.)</td>
</tr>
<tr>
<td>shl</td>
<td>Logical</td>
<td>Logical Shift Left Word (Refer to page 144.)</td>
</tr>
<tr>
<td>shlb</td>
<td>SIMD</td>
<td>Logical Shift Left Bytes (Refer to page 289.)</td>
</tr>
<tr>
<td>shlh</td>
<td>SIMD</td>
<td>Logical Shift Left Half Words (Refer to page 291.)</td>
</tr>
<tr>
<td>shli</td>
<td>Logical</td>
<td>Logical Shift Left Immediate Word (Refer to page 146.)</td>
</tr>
<tr>
<td>shlib</td>
<td>SIMD</td>
<td>Logical Shift Left Immediate Bytes (Refer to page 292.)</td>
</tr>
<tr>
<td>shlih</td>
<td>SIMD</td>
<td>Logical Shift Left Immediate Half Words (Refer to page 294.)</td>
</tr>
<tr>
<td>shr</td>
<td>Logical</td>
<td>Logical Shift Right Word (Refer to page 148.)</td>
</tr>
<tr>
<td>shrb</td>
<td>SIMD</td>
<td>Logical Shift Right Bytes (Refer to page 296.)</td>
</tr>
<tr>
<td>shrh</td>
<td>SIMD</td>
<td>Logical Shift Right Half Words (Refer to page 298.)</td>
</tr>
<tr>
<td>shri</td>
<td>Logical</td>
<td>Logical Shift Right Immediate Word (Refer to page 150.)</td>
</tr>
<tr>
<td>shrib</td>
<td>SIMD</td>
<td>Logical Shift Right Immediate Bytes (Refer to page 300.)</td>
</tr>
<tr>
<td>shrih</td>
<td>SIMD</td>
<td>Logical Shift Right Immediate Half Words (Refer to page 302.)</td>
</tr>
<tr>
<td>slt</td>
<td>Compare</td>
<td>Set Less Than Word (Refer to page 81.)</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Register</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>slt_u</td>
<td>Compare</td>
<td>Set Less Than Unsigned Word (Refer to page 83.)</td>
</tr>
<tr>
<td>sltb</td>
<td>SIMD</td>
<td>Set Less Than Byte (Refer to page 304.)</td>
</tr>
<tr>
<td>sltb_u</td>
<td>SIMD</td>
<td>Set Less Than Unsigned Byte (Refer to page 306.)</td>
</tr>
<tr>
<td>slte</td>
<td>Compare</td>
<td>Set Less Than or Equal Word (Refer to page 85.)</td>
</tr>
<tr>
<td>slte_u</td>
<td>Compare</td>
<td>Set Less Than or Equal Unsigned Word (Refer to page 87.)</td>
</tr>
<tr>
<td>slteb</td>
<td>SIMD</td>
<td>Set Less Than or Equal Byte (Refer to page 308.)</td>
</tr>
<tr>
<td>slteb_u</td>
<td>SIMD</td>
<td>Set Less Than or Equal Unsigned Byte (Refer to page 310.)</td>
</tr>
<tr>
<td>slteh</td>
<td>SIMD</td>
<td>Set Less Than or Equal Half Words (Refer to page 312.)</td>
</tr>
<tr>
<td>slteh_u</td>
<td>SIMD</td>
<td>Set Less Than or Equal Unsigned Half Words (Refer to page 314.)</td>
</tr>
<tr>
<td>slth</td>
<td>SIMD</td>
<td>Set Less Than Half Words (Refer to page 316.)</td>
</tr>
<tr>
<td>slth_u</td>
<td>SIMD</td>
<td>Set Less Than Unsigned Half Words (Refer to page 318.)</td>
</tr>
<tr>
<td>slti</td>
<td>Compare</td>
<td>Set Less Than Immediate Word (Refer to page 89.)</td>
</tr>
<tr>
<td>slti_u</td>
<td>Compare</td>
<td>Set Less Than Immediate Unsigned Word (Refer to page 91.)</td>
</tr>
<tr>
<td>sltib</td>
<td>SIMD</td>
<td>Set Less Than Immediate Byte (Refer to page 320.)</td>
</tr>
<tr>
<td>sltib_u</td>
<td>SIMD</td>
<td>Set Less Than Immediate Unsigned Byte (Refer to page 322.)</td>
</tr>
<tr>
<td>sltih</td>
<td>SIMD</td>
<td>Set Less Than Immediate Half Words (Refer to page 324.)</td>
</tr>
<tr>
<td>sltih_u</td>
<td>SIMD</td>
<td>Set Less Than Immediate Unsigned Half Words (Refer to page 326.)</td>
</tr>
<tr>
<td>sne</td>
<td>Compare</td>
<td>Set Not Equal Word (Refer to page 93.)</td>
</tr>
<tr>
<td>sneb</td>
<td>SIMD</td>
<td>Set Not Equal To Byte (Refer to page 328.)</td>
</tr>
<tr>
<td>sneh</td>
<td>SIMD</td>
<td>Set Not Equal To Half Words (Refer to page 330.)</td>
</tr>
<tr>
<td>sra</td>
<td>Logical</td>
<td>Arithmetic Shift Right Word (Refer to page 152.)</td>
</tr>
<tr>
<td>srab</td>
<td>SIMD</td>
<td>Arithmetic Shift Right Bytes (Refer to page 332.)</td>
</tr>
<tr>
<td>srah</td>
<td>SIMD</td>
<td>Arithmetic Shift Right Half Words (Refer to page 334.)</td>
</tr>
<tr>
<td>srai</td>
<td>Logical</td>
<td>Arithmetic Shift Right Immediate Word (Refer to page 154.)</td>
</tr>
<tr>
<td>sraib</td>
<td>SIMD</td>
<td>Arithmetic Shift Right Immediate Bytes (Refer to page 336.)</td>
</tr>
<tr>
<td>sraih</td>
<td>SIMD</td>
<td>Arithmetic Shift Right Immediate Half Words (Refer to page 338.)</td>
</tr>
<tr>
<td>sub</td>
<td>Arithmetic</td>
<td>Subtract Word (Refer to page 59.)</td>
</tr>
<tr>
<td>subs</td>
<td>Arithmetic</td>
<td>Subtract Word Saturating (Refer to page 61.)</td>
</tr>
<tr>
<td>subb</td>
<td>SIMD</td>
<td>Subtract Bytes (Refer to page 340.)</td>
</tr>
<tr>
<td>subbs_u</td>
<td>SIMD</td>
<td>Subtract Bytes Saturating Unsigned (Refer to page 342.)</td>
</tr>
<tr>
<td>subh</td>
<td>SIMD</td>
<td>Subtract Half Words (Refer to page 344.)</td>
</tr>
<tr>
<td>subhs</td>
<td>SIMD</td>
<td>Subtract Half Words Saturating (Refer to page 345.)</td>
</tr>
<tr>
<td>sw</td>
<td>Memory</td>
<td>Store Word (Refer to page 180.)</td>
</tr>
<tr>
<td>swadd</td>
<td>Memory</td>
<td>Store Word and Add (Refer to page 181.)</td>
</tr>
<tr>
<td>swint0</td>
<td>System</td>
<td>Software Interrupt 0 (Refer to page 355.)</td>
</tr>
<tr>
<td>swint1</td>
<td>System</td>
<td>Software Interrupt 1 (Refer to page 356.)</td>
</tr>
<tr>
<td>swint2</td>
<td>System</td>
<td>Software Interrupt 2 (Refer to page 357.)</td>
</tr>
<tr>
<td>swint3</td>
<td>System</td>
<td>Software Interrupt 3 (Refer to page 358.)</td>
</tr>
<tr>
<td>tblidxb0</td>
<td>Logical</td>
<td>Table Index Byte 0 (Refer to page 156.)</td>
</tr>
<tr>
<td>tblidxb1</td>
<td>Logical</td>
<td>Table Index Byte 1 (Refer to page 157.)</td>
</tr>
<tr>
<td>tblidxb2</td>
<td>Logical</td>
<td>Table Index Byte 2 (Refer to page 158.)</td>
</tr>
<tr>
<td>tblidxb3</td>
<td>Logical</td>
<td>Table Index Byte 3 (Refer to page 159.)</td>
</tr>
<tr>
<td>tns</td>
<td>Memory</td>
<td>Test and Set Word (Refer to page 182.)</td>
</tr>
<tr>
<td>wh64</td>
<td>Memory</td>
<td>Write Hint 64 Bytes (Refer to page 190.)</td>
</tr>
<tr>
<td>xor</td>
<td>Logical</td>
<td>Exclusive Or Word (Refer to page 160.)</td>
</tr>
<tr>
<td>xori</td>
<td>Logical</td>
<td>Exclusive Or Immediate Word (Refer to page 162.)</td>
</tr>
</tbody>
</table>
4.1.4 Arithmetic Instructions

The following sections provide detailed descriptions of arithmetic instructions listed alphabetically:

- add: Add Word
- addi: Add Immediate Word
- addli: Add Long Immediate Word
- addlis: Add Long Immediate Static Write Word
- adds: Add Word Saturating
- auli: Add Upper Long Immediate Word
- s1a: Shift Left One Add Word
- s2a: Shift Left Two Add Word
- s3a: Shift Left Three Add Word
- sub: Subtract Word
- subs: Subtract Word Saturating
add: Add Word

Syntax
add Dest, SrcA, SrcB

Example
add r5, r6, r7

Description
Adds two words together.

Functional Description
rf[Dest] = rf[SrcA] + rf[SrcB];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

n  00000011  s  s  d

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x3
- S_X0 - Sbit
- Opcode_X0 - 0x0

**Figure 4-39: add in X0 Bit Descriptions**

n  00000011  s  s  d

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x3
- S_X1 - Sbit
- Opcode_X1 - 0x1

**Figure 4-40: add in X1 Bit Descriptions**

Figure 4-41: add in Y0 Bit Descriptions

Figure 4-42: add in Y1 Bit Descriptions
addi: Add Immediate Word

Syntax
addi Dest, SrcA, Imm8

Example
addi r5, r6, 5

Description
Adds one word with a sign extended immediate.

Functional Description
rf[Dest] = rf[SrcA] + signExtend8(Imm8);
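
The functional description of addi can be traced with a value-level C sketch. It assumes a 32-bit word and assumes that the addition wraps modulo 2^32, since the saturating behavior is described separately for adds. `addi_value` is a hypothetical name that stands in for the register file update.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extends an 8-bit field to 32 bits (assumes WORD_SIZE = 32). */
static int32_t signExtend8(uint32_t imm)
{
    /* Flip bit 7, then subtract its weight; the result is -128..127. */
    return (int32_t)((imm & 0xFFu) ^ 0x80u) - 0x80;
}

/* Value-level model of: rf[Dest] = rf[SrcA] + signExtend8(Imm8).
 * `imm8` is the raw 8-bit immediate field from the instruction. */
static uint32_t addi_value(uint32_t srcA, uint32_t imm8)
{
    assert(imm8 <= 0xFFu);
    /* Unsigned addition wraps modulo 2^32 (assumed word arithmetic). */
    return srcA + (uint32_t)signExtend8(imm8);
}
```

With these assumptions, an immediate field of 0xFF acts as -1, so `addi_value(10, 0xFF)` gives 9, while `addi_value(10, 5)` gives 15.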

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- Imm8_X0 - Imm8
- ImmOpcodeExtension_X0 - 0x3
- S_X0 - Sbit
- Opcode_X0 - 0x4

Figure 4-43: addi in X0 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm8_X1 - Imm8
- ImmOpcodeExtension_X1 - 0x3
- S_X1 - Sbit
- Opcode_X1 - 0x6

Figure 4-44: addi in X1 Bit Descriptions
Figure 4-45: addi in Y0 Bit Descriptions

Figure 4-46: addi in Y1 Bit Descriptions
addli: Add Long Immediate Word

Syntax
addli Dest, SrcA, Imm16

Example
addli r5, r6, 0x1234

Description
Adds one word with a sign extended long immediate.

Functional Description
rf[Dest] = rf[SrcA] + signExtend16(Imm16);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

010  i  s  d

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- Imm16_X0 - Imm16
- Opcode_X0 - 0x2

Figure 4-47: addli in X0 Bit Descriptions

59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31

011  i  s  d

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm16_X1 - Imm16
- Opcode_X1 - 0x3

Figure 4-48: addli in X1 Bit Descriptions
addlis: Add Long Immediate Static Write Word

Syntax
addlis Dest, SrcA, Imm16

Example
addlis r5, r6, 0x1234

Description
Adds one word with a sign extended long immediate. The result is placed in the destination register and enqueued in the static network output port.

Functional Description
rf[Dest] = rf[SrcA] + signExtend16(Imm16);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- Imm16_X0 - Imm16
- Opcode_X0 - 0x1

Figure 4-49: addlis in X0 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm16_X1 - Imm16
- Opcode_X1 - 0x2

Figure 4-50: addlis in X1 Bit Descriptions
adds: Add Word Saturating

Syntax
adds Dest, SrcA, SrcB

Example
adds r5, r6, r7

Description
Adds two words together saturating the result at the minimum negative value or the maximum positive value.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description
rf[Dest] =
signed_saturate32((SignedDoubleMachineWord) rf[SrcA] +
(SignedDoubleMachineWord) rf[SrcB]);
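
The saturation step can be modeled in portable C. This is a minimal sketch rather than the hardware implementation: `signed_saturate32` follows the name used in the functional description, `adds_model` is a hypothetical helper, and `int64_t` stands in for `SignedDoubleMachineWord` on the assumption that the machine word is 32 bits.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate result to the signed 32-bit range. */
static int32_t signed_saturate32(int64_t value)
{
    if (value > INT32_MAX) return INT32_MAX;
    if (value < INT32_MIN) return INT32_MIN;
    return (int32_t)value;
}

/* Hypothetical model of adds: widen both operands, add, then saturate. */
static int32_t adds_model(int32_t src_a, int32_t src_b)
{
    return signed_saturate32((int64_t)src_a + (int64_t)src_b);
}
```

With this model, `adds_model(INT32_MAX, 1)` stays at `INT32_MAX` instead of wrapping to a negative value.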

Valid Pipelines

```
X0  X1  Y0  Y1  Y2
X   X
```

Encoding

```
000  n  00110000  s  s  d
```

Figure 4-51: adds in X0 Bit Descriptions
Figure 4-52: adds in X1 Bit Descriptions
auli: Add Upper Long Immediate Word

Syntax
auli Dest, SrcA, Imm16

Example
auli r5, r6, 0x1234

Description
Returns the addition of the first source operand and a sign extended long immediate loaded into the 16 most significant bits of a word. This instruction only contains an immediate form.

Functional Description
rf[Dest] = rf[SrcA] + (signExtend16( Imm16 ) << 16);
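
One common use of this pair of instructions is building a full 32-bit constant from two 16-bit immediates. The sketch below models `addli` and `auli` from their functional descriptions and shows why the upper half must be adjusted when the lower half is negative after sign extension. The helper names (`sign_extend16`, `addli_model`, `auli_model`, `load_const32`) are illustrative assumptions, not part of the architecture.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits. */
static uint32_t sign_extend16(uint16_t imm16)
{
    return (uint32_t)(int32_t)(int16_t)imm16;
}

/* Models of the two functional descriptions, using wrapping 32-bit arithmetic. */
static uint32_t addli_model(uint32_t src_a, uint16_t imm16)
{
    return src_a + sign_extend16(imm16);
}

static uint32_t auli_model(uint32_t src_a, uint16_t imm16)
{
    return src_a + (sign_extend16(imm16) << 16);
}

/* Hypothetical helper: form `value` as addli(0, lo) followed by auli(.., hi). */
static uint32_t load_const32(uint32_t value)
{
    uint16_t lo = (uint16_t)(value & 0xFFFFu);
    /* Adding 0x8000 compensates for the sign extension of lo. */
    uint16_t hi = (uint16_t)((value + 0x8000u) >> 16);
    return auli_model(addli_model(0, lo), hi);
}
```

For `0x1234ABCD` the lower half `0xABCD` sign-extends to `0xFFFFABCD`, so the upper immediate becomes `0x1235` rather than `0x1234`.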

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

011  i  s  d

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- Imm16_X0 - Imm16
- Opcode_X0 - 0x3

Figure 4-53: auli in X0 Bit Descriptions

0100  i  s  d

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm16_X1 - Imm16
- Opcode_X1 - 0x4

Figure 4-54: auli in X1 Bit Descriptions
s1a: Shift Left One Add Word

Syntax

s1a Dest, SrcA, SrcB

Example

s1a r5, r6, r7

Description

Shifts the first input operand left by one bit, and then adds the second source operand.

Functional Description

rf[Dest] = (rf[SrcA] << 1) + rf[SrcB];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000  n  00011011  s  s  d

Figure 4-55: s1a in X0 Bit Descriptions

0001  n  00001110  s  s  d

Figure 4-56: s1a in X1 Bit Descriptions
Figure 4-57: s1a in Y0 Bit Descriptions

Figure 4-58: s1a in Y1 Bit Descriptions
s2a: Shift Left Two Add Word

Syntax

s2a Dest, SrcA, SrcB

Example

s2a r5, r6, r7

Description

Shifts the first input operand left by two bits, and then adds the second source operand.

Functional Description

rf[Dest] = (rf[SrcA] << 2) + rf[SrcB];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000 n 000111000 s s d
Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x38
S_X0 - Sbit
Opcode_X0 - 0x0
```

```
0001 n 000011110 s s d
Dest_X1 - Dest
SrcA_X1 - SrcA
SrcB_X1 - SrcB
RRROpcodeExtension_X1 - 0x1E
S_X1 - Sbit
Opcode_X1 - 0x1
```

Figure 4-59: s2a in X0 Bit Descriptions

Figure 4-60: s2a in X1 Bit Descriptions
Figure 4-61: s2a in Y0 Bit Descriptions

Figure 4-62: s2a in Y1 Bit Descriptions
s3a: Shift Left Three Add Word

Syntax
s3a Dest, SrcA, SrcB

Example
s3a r5, r6, r7

Description
Shifts the first input operand left by three bits, and then adds the second source operand.

Functional Description
rf[Dest] = (rf[SrcA] << 3) + rf[SrcB];
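
The shift-and-add family is a natural fit for scaled array indexing. As a small sketch (the function names are hypothetical, and a 32-bit word with wrapping arithmetic is assumed), the following models `s1a`, `s2a`, and `s3a` with one helper and uses the three-bit form to compute the address of an 8-byte element:

```c
#include <stdint.h>

/* Hypothetical model of s1a/s2a/s3a: (src_a << shift) + src_b, modulo 2^32. */
static uint32_t shift_add_model(uint32_t src_a, uint32_t src_b, unsigned shift)
{
    return (src_a << shift) + src_b;
}

/* Address of element `index` in an array of 8-byte elements, as s3a forms it. */
static uint32_t element_address8(uint32_t index, uint32_t base)
{
    return shift_add_model(index, base, 3);
}
```

Here the index plays the role of `SrcA` and the array base plays the role of `SrcB`.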

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000 n 000111001 s s d
Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x39
S_X0 - Sbit
Opcode_X0 - 0x0

0001 n 000011111 s s d
Dest_X1 - Dest
SrcA_X1 - SrcA
SrcB_X1 - SrcB
RRROpcodeExtension_X1 - 0x1F
S_X1 - Sbit
Opcode_X1 - 0x1
```

Figure 4-63: s3a in X0 Bit Descriptions

Figure 4-64: s3a in X1 Bit Descriptions
Figure 4-65: s3a in Y0 Bit Descriptions

Figure 4-66: s3a in Y1 Bit Descriptions
sub: Subtract Word

Syntax
sub Dest, SrcA, SrcB

Example
sub r5, r6, r7

Description
Subtracts one word from another.

Functional Description
rf[Dest] = rf[SrcA] - rf[SrcB];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-67: sub in X0 Bit Descriptions

Figure 4-68: sub in X1 Bit Descriptions
Figure 4-69: sub in Y0 Bit Descriptions

Figure 4-70: sub in Y1 Bit Descriptions
subs: Subtract Word Saturating

Syntax
subs Dest, SrcA, SrcB

Example
subs r5, r6, r7

Description
Subtracts one word from another, saturating the result at the minimum negative value or the maximum positive value.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description
rf[Dest] =
signed_saturate32((SignedDoubleMachineWord) rf[SrcA] -
(SignedDoubleMachineWord) rf[SrcB]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-71: subs in X0 Bit Descriptions
Figure 4-72: subs in X1 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x43
- S_X1 - Sbit
- Opcode_X1 - 0x1
4.1.5 Bit Manipulation Instructions

The following sections provide detailed descriptions of bit manipulation instructions listed alphabetically:

- `bitx`: Bit Exchange Word
- `bytex`: Byte Exchange Word
- `clz`: Count Leading Zeros Word
- `crc32_32`: CRC32 32-bit Step
- `crc32_8`: CRC32 8-bit Step
- `ctz`: Count Trailing Zeros Word
- `dword_align`: Double Word Align
- `pcnt`: Population Count Word
bitx: Bit Exchange Word

Syntax

bitx Dest, SrcA

Example

bitx r5, r6

Description

Reorders a word such that the most significant bit becomes the least significant bit in the output, the second most significant bit becomes the second least significant bit in the output, and the nth most significant bit becomes the nth least significant bit in the output.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE); counter++) {
    output |=
        (((rf[SrcA] >> (counter)) & 0x1) <<
        ((WORD_SIZE - 1) - counter));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
111  n  000001011  00001  s  d

Dest_X0 - Dest
SrcA_X0 - SrcA
UnOpcodeExtension_X0 - 0x1
UnShOpcodeExtension_X0 - 0xB
S_X0 - Sbit
Opcode_X0 - 0x7
```

Figure 4-73: bitx in X0 Bit Descriptions
Figure 4-74: bitx in Y0 Bit Descriptions

- Dest_Y0 - Dest
- SrcA_Y0 - SrcA
- UnOpcodeExtension_Y0 - 0x1
- UnShOpcodeExtension_Y0 - 0x5
- Opcode_Y0 - 0xD
bytex: Byte Exchange Word

Syntax
bytex Dest, SrcA

Example
bytex r5, r6

Description
Reorders a word such that the most significant byte becomes the least significant byte in the output, the second most significant byte becomes the second least significant byte in the output, and the nth most significant byte becomes the nth least significant byte in the output. This instruction converts a word between big-endian and little-endian byte order.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
output |= (((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK) <<
((((WORD_SIZE / BYTE_SIZE) - 1) - counter) * BYTE_SIZE));
}
rf[Dest] = output;
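
The loop above can be exercised directly in C. The following is a sketch under stated assumptions: `WORD_SIZE` is taken as 32, `BYTE_SIZE` as 8, `BYTE_MASK` as `0xFF`, and `bytex_model` is a hypothetical name for a software model of the instruction.

```c
#include <stdint.h>

#define WORD_SIZE 32   /* assumption: 32-bit machine word */
#define BYTE_SIZE 8    /* assumption: 8-bit byte */
#define BYTE_MASK 0xFFu

/* Hypothetical model of bytex: byte i of the input lands in byte (3 - i). */
static uint32_t bytex_model(uint32_t src_a)
{
    uint32_t output = 0;
    for (uint32_t counter = 0; counter < WORD_SIZE / BYTE_SIZE; counter++) {
        uint32_t byte = (src_a >> (counter * BYTE_SIZE)) & BYTE_MASK;
        output |= byte << (((WORD_SIZE / BYTE_SIZE) - 1 - counter) * BYTE_SIZE);
    }
    return output;
}
```

Applying the model twice returns the original word, which is the property relied on when converting data between byte orders.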

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
111 n 000001011 00010 s d
   Dest_X0 - Dest
   SrcA_X0 - SrcA
   UnOpcodeExtension_X0 - 0x2
   UnShOpcodeExtension_X0 - 0xB
   S_X0 - Sbit
   Opcode_X0 - 0x7
```

Figure 4-75: bytex in X0 Bit Descriptions
Figure 4-76: bytex in Y0 Bit Descriptions
### clz: Count Leading Zeros Word

**Syntax**

`clz Dest, SrcA`

**Example**

`clz r5, r6`

**Description**

Returns the number of leading zeros in a word before a bit is set (1). This instruction scans the input word from the most significant bit to the least significant bit. The result of this operation can range from 0 to `WORD_SIZE`.

**Functional Description**

```c
uint32_t counter;
for (counter = 0; counter < WORD_SIZE; counter++) {
    if ((rf[SrcA] >> (WORD_SIZE - 1 - counter)) & 0x1) {
        break;
    }
}
rf[Dest] = counter;
```
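
As a sketch of how the result range works out, the model below transcribes the loop into a standalone C function and derives the index of the highest set bit from it. `WORD_SIZE` is assumed to be 32, and `clz_model` and `highest_set_bit` are hypothetical names.

```c
#include <stdint.h>

#define WORD_SIZE 32 /* assumption: 32-bit machine word */

/* Model of clz: scan from the most significant bit downward. */
static uint32_t clz_model(uint32_t src_a)
{
    uint32_t counter;
    for (counter = 0; counter < WORD_SIZE; counter++) {
        if ((src_a >> (WORD_SIZE - 1 - counter)) & 0x1u) {
            break;
        }
    }
    return counter; /* WORD_SIZE when no bit is set */
}

/* Hypothetical use: index of the highest set bit; meaningful only for value != 0. */
static uint32_t highest_set_bit(uint32_t value)
{
    return (WORD_SIZE - 1) - clz_model(value);
}
```

An all-zero input never hits the `break`, so the loop runs to completion and the model returns `WORD_SIZE`.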

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
111  n  000001011  00011  s  d
```

*Figure 4-77: clz in X0 Bit Descriptions*
Figure 4-78: clz in Y0 Bit Descriptions
**crc32_32: CRC32 32-bit Step**

**Syntax**

crc32_32 Dest, SrcA, SrcB

**Example**

crc32_32 r5, r6, r7

**Description**

Updates a CRC32 value in the first operand with the second operand.

**Functional Description**

```c
uint32_t accum = rf[SrcA];
uint32_t input = rf[SrcB];
for (uint32_t Counter = 0; Counter < 32; Counter++) {
    accum =
        (accum >> 1) ^ (((input & 1) ^ (accum & 1)) ? 0xEDB88320 :
        0x00000000);
    input = input >> 1;
}
rf[Dest] = accum;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

000  n  00000100  s  s  d

*Figure 4-79: crc32_32 in X0 Bit Descriptions*
crc32_8: CRC32 8-bit Step

Syntax

crc32_8 Dest, SrcA, SrcB

Example

crc32_8 r5, r6, r7

Description

Updates a CRC32 value in the first operand with the low-order 8 bits of the second operand.

Functional Description

```c
uint32_t accum = rf[SrcA];
uint32_t input = rf[SrcB];
for (uint32_t Counter = 0; Counter < 8; Counter++) {
    accum =
        (accum >> 1) ^ (((input & 1) ^ (accum & 1)) ? 0xEDB88320 : 0x00000000);
    input = input >> 1;
}
rf[Dest] = accum;
```
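
To illustrate how the 8-bit step composes into a checksum over a buffer, the sketch below wraps the functional description in a C function and drives it one byte at a time. The `0xFFFFFFFF` initial value and the final inversion are the usual CRC-32 conventions and are an assumption here, since the instruction itself does not fix them. `crc32_8_model` and `crc32_buffer` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of one crc32_8 step, transcribed from the functional description. */
static uint32_t crc32_8_model(uint32_t accum, uint32_t input)
{
    for (uint32_t counter = 0; counter < 8; counter++) {
        accum = (accum >> 1) ^
                (((input & 1) ^ (accum & 1)) ? 0xEDB88320u : 0x00000000u);
        input >>= 1;
    }
    return accum;
}

/* Hypothetical driver: CRC-32 of a buffer, one crc32_8 step per byte. */
static uint32_t crc32_buffer(const unsigned char *data, size_t length)
{
    uint32_t accum = 0xFFFFFFFFu; /* assumed initial value */
    for (size_t i = 0; i < length; i++) {
        accum = crc32_8_model(accum, data[i]);
    }
    return ~accum;                /* assumed final inversion */
}
```

Under these conventions the nine ASCII bytes `"123456789"` yield the well-known check value `0xCBF43926`.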

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000  n  000001010  s  s  d
```

Figure 4-80: crc32_8 in X0 Bit Descriptions
ctz: Count Trailing Zeros Word

Syntax
ctz Dest, SrcA

Example
ctz r5, r6

Description
Returns the number of trailing zeros in a word before a bit is set (1). This instruction scans the input word from the least significant bit to the most significant bit. The result of this operation can range from 0 to WORD_SIZE.

Functional Description

```c
uint32_t counter;
for (counter = 0; counter < WORD_SIZE; counter++) {
    if ((rf[SrcA] >> counter) & 0x1) { // stop at the first set bit
        break;
    }
}
rf[Dest] = counter;
```
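
A trailing-zero count gives the largest power of two that divides a value, which makes it handy for alignment checks. The following sketch assumes a 32-bit word; `ctz_model` and `is_aligned` are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define WORD_SIZE 32 /* assumption: 32-bit machine word */

/* Model of ctz: scan from bit 0 upward and stop at the first set bit. */
static uint32_t ctz_model(uint32_t src_a)
{
    uint32_t counter;
    for (counter = 0; counter < WORD_SIZE; counter++) {
        if ((src_a >> counter) & 0x1u) {
            break;
        }
    }
    return counter; /* WORD_SIZE when no bit is set */
}

/* Hypothetical use: is `address` a multiple of 2^log2_alignment? */
static int is_aligned(uint32_t address, uint32_t log2_alignment)
{
    return ctz_model(address) >= log2_alignment;
}
```

Because a zero input returns `WORD_SIZE`, address 0 is reported as aligned to every power of two.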

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

111  n  000001011  00100  s  d

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- UnOpcodeExtension_X0 - 0x4
- UnShOpcodeExtension_X0 - 0xB
- S_X0 - Sbit
- Opcode_X0 - 0x7

Figure 4-81: ctz in X0 Bit Descriptions
Figure 4-82: ctz in Y0 Bit Descriptions
**dword_align: Double Word Align**

**Syntax**

dword_align Dest, SrcA, SrcB

**Example**

dword_align r5, r6, r7

**Description**

Shifts a double word by the number of bytes specified by the bottom two bits of the second source operand. The shift direction is to the right when the processor is in little-endian mode, and to the left when the processor is in big-endian mode. The source double word is constructed from the concatenation of the first source operand and the destination register.

NOTE: This instruction is only supported in the TILEPro family of products.

**Functional Description**

rf[Dest] =
(UnsignedMachineWord) (little_endian() ?
(((((UnsignedDoubleMachineWord) (UnsignedMachineWord) rf[SrcA]) << WORD_SIZE) |
((UnsignedMachineWord) rf[Dest])) >> (BYTE_SIZE * (rf[SrcB] & 3))) :
((((((UnsignedDoubleMachineWord) (UnsignedMachineWord) rf[Dest]) << WORD_SIZE) |
((UnsignedMachineWord) rf[SrcA])) << (BYTE_SIZE * (rf[SrcB] & 3))) >> WORD_SIZE));
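
The behavior described above can be sketched in C to show how two aligned words combine into one unaligned word. This is a model built from the prose description (right shift in little-endian mode, left shift followed by taking the upper word in big-endian mode), not a verified transcription of the hardware. `dword_align_model` and its `little_endian` flag are hypothetical, and a 32-bit word is assumed.

```c
#include <stdint.h>

#define WORD_SIZE 32 /* assumption: 32-bit machine word */
#define BYTE_SIZE 8  /* assumption: 8-bit byte */

/* Hypothetical model of dword_align; `dest` is the old value of rf[Dest]. */
static uint32_t dword_align_model(uint32_t dest, uint32_t src_a, uint32_t src_b,
                                  int little_endian)
{
    unsigned shift = BYTE_SIZE * (src_b & 3);
    if (little_endian) {
        /* SrcA is the upper word, Dest the lower word; keep the low half. */
        uint64_t pair = ((uint64_t)src_a << WORD_SIZE) | dest;
        return (uint32_t)(pair >> shift);
    } else {
        /* Dest is the upper word, SrcA the lower word; keep the high half. */
        uint64_t pair = ((uint64_t)dest << WORD_SIZE) | src_a;
        return (uint32_t)((pair << shift) >> WORD_SIZE);
    }
}
```

In the little-endian case, with `dest` holding the word at the aligned address and `src_a` the word that follows it, a byte offset of 1 in `src_b` yields the four bytes starting one byte past the aligned address.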

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000  n  001011111  s  s  d
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x5F
- S_X0 - Sbit
- Opcode_X0 - 0x0

*Figure 4-83: dword_align in X0 Bit Descriptions*
pcnt: Population Count Word

**Syntax**

```
pcnt Dest, SrcA
```

**Example**

```
pcnt r5, r6
```

**Description**

Returns the number of bits set (1) in the source operand. The result of this operation can range from 0 to WORD_SIZE.

**Functional Description**

```c
uint32_t counter;
int numberOfOnes = 0;
for (counter = 0; counter < WORD_SIZE; counter++) {
    numberOfOnes += (rf[SrcA] >> counter) & 0x1;
}
rf[Dest] = numberOfOnes;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

111  n  000001011  00111  s  d

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- UnOpcodeExtension_X0 - 0x7
- UnShOpcodeExtension_X0 - 0xB
- S_X0 - Sbit
- Opcode_X0 - 0x7

**Figure 4-84: pcnt in X0 Bit Descriptions**

1101  101  00111  s  d

- Dest_Y0 - Dest
- SrcA_Y0 - SrcA
- UnOpcodeExtension_Y0 - 0x7
- UnShOpcodeExtension_Y0 - 0x5
- Opcode_Y0 - 0xD

**Figure 4-85: pcnt in Y0 Bit Descriptions**
4.1.6 Compare Instructions

The following sections provide detailed descriptions of compare instructions listed alphabetically.

- `seq`: Set Equal Word
- `seqi`: Set Equal Immediate Word
- `slt`: Set Less Than Word
- `slt_u`: Set Less Than Unsigned Word
- `slte`: Set Less Than or Equal Word
- `slte_u`: Set Less Than or Equal Unsigned Word
- `slti`: Set Less Than Immediate Word
- `slti_u`: Set Less Than Unsigned Immediate Word
- `sne`: Set Not Equal Word
seq: Set Equal Word

Syntax

seq Dest, SrcA, SrcB

Example

seq r5, r6, r7

Description

Sets each result to 1 if the first source operand is equal to the second source operand. Otherwise the result is set to 0.

Functional Description

rf[Dest] =
((UnsignedMachineWord) rf[SrcA] ==
(UnsignedMachineWord) rf[SrcB]) ? 1 : 0;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-86: seq in X0 Bit Descriptions

Figure 4-87: seq in X1 Bit Descriptions
Figure 4-88: seq in Y0 Bit Descriptions

Dest_Y0 - Dest
SrcA_Y0 - SrcA
SrcB_Y0 - SrcB
RRROpcodeExtension_Y0 - 0x2
Opcode_Y0 - 0x6

Figure 4-89: seq in Y1 Bit Descriptions

Dest_Y1 - Dest
SrcA_Y1 - SrcA
SrcB_Y1 - SrcB
RRROpcodeExtension_Y1 - 0x2
Opcode_Y1 - 0x6
seqi: Set Equal Immediate Word

**Syntax**

```
seqi Dest, SrcA, Imm8
```

**Example**

```
seqi r5, r6, 5
```

**Description**

Sets each result to 1 if the first source operand is equal to a sign extended immediate. Otherwise the result is set to 0.

**Functional Description**

```
rf[Dest] =
  ((UnsignedMachineWord) rf[SrcA] ==
   (UnsignedMachineWord) signExtend8(Imm8)) ? 1 : 0;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
100  n  001011  i  s  d
Dest_X0 - Dest
SrcA_X0 - SrcA
Imm8_X0 - Imm8
ImmOpcodeExtension_X0 - 0xB
S_X0 - Sbit
Opcode_X0 - 0x4
```

Figure 4-90: seqi in X0 Bit Descriptions

```
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31
0110  n  001110  i  s  d
Dest_X1 - Dest
SrcA_X1 - SrcA
Imm8_X1 - Imm8
ImmOpcodeExtension_X1 - 0xE
S_X1 - Sbit
Opcode_X1 - 0x6
```

Figure 4-91: seqi in X1 Bit Descriptions

Figure 4-92: seqi in Y0 Bit Descriptions

Figure 4-93: seqi in Y1 Bit Descriptions
slt: Set Less Than Word

Syntax

```plaintext
slt Dest, SrcA, SrcB
```

Example

```plaintext
slt r5, r6, r7
```

Description

Sets each result to 1 if the first source operand is less than the second source operand. Otherwise the result is set to 0. This instruction treats both source operands as signed values.

Functional Description

```plaintext
rf[Dest] = 
((SignedMachineWord) rf[SrcA] <
(SignedMachineWord) rf[SrcB]) ? 1 : 0;
```

Valid Pipelines

```
X0  X1  Y0  Y1  Y2
X   X   X   X
```

Encoding

```
000  n  001010011  s  s  d
```

```
Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x53
S_X0 - Sbit
Opcode_X0 - 0x0
```

```
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31
0001  n  000110101  s  s  d
```

```
Dest_X1 - Dest
SrcA_X1 - SrcA
SrcB_X1 - SrcB
RRROpcodeExtension_X1 - 0x35
S_X1 - Sbit
Opcode_X1 - 0x1
```

Figure 4-94: slt in X0 Bit Descriptions

Figure 4-95: slt in X1 Bit Descriptions
Figure 4-96: slt in Y0 Bit Descriptions

Figure 4-97: slt in Y1 Bit Descriptions
slt_u: Set Less Than Unsigned Word

Syntax

slt_u Dest, SrcA, SrcB

Example

slt_u r5, r6, r7

Description

Sets each result to 1 if the first source operand is less than the second source operand. Otherwise the result is set to 0. This instruction treats both source operands as unsigned values.

Functional Description

rf[Dest] =
((UnsignedMachineWord) rf[SrcA] <
(UnsignedMachineWord) rf[SrcB]) ? 1 : 0;
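
The difference between the signed and unsigned comparisons only shows up when the top bit of an operand is set. The sketch below models `slt` and `slt_u` side by side and adds a bounds check that relies on the unsigned form. All function names are hypothetical, and a 32-bit word is assumed.

```c
#include <stdint.h>

/* Model of slt: both operands interpreted as signed 32-bit values. */
static uint32_t slt_model(uint32_t src_a, uint32_t src_b)
{
    return ((int32_t)src_a < (int32_t)src_b) ? 1 : 0;
}

/* Model of slt_u: both operands interpreted as unsigned 32-bit values. */
static uint32_t slt_u_model(uint32_t src_a, uint32_t src_b)
{
    return (src_a < src_b) ? 1 : 0;
}

/* Hypothetical use: one unsigned compare rejects both negative and too-large indices. */
static uint32_t index_in_bounds(int32_t index, uint32_t length)
{
    return slt_u_model((uint32_t)index, length);
}
```

With `0xFFFFFFFF` as the first operand and 1 as the second, the signed model returns 1 (-1 is less than 1) while the unsigned model returns 0.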

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000  n  001010100  s  s  d

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x54
- S_X0 - Sbit
- Opcode_X0 - 0x0

Figure 4-98: slt_u in X0 Bit Descriptions

0001  n  000110110  s  s  d

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x36
- S_X1 - Sbit
- Opcode_X1 - 0x1

Figure 4-99: slt_u in X1 Bit Descriptions
Figure 4-100: slt_u in Y0 Bit Descriptions

Figure 4-101: slt_u in Y1 Bit Descriptions
slte: Set Less Than or Equal Word

Syntax
slte Dest, SrcA, SrcB

Example
slte r5, r6, r7

Description
Sets each result to 1 if the first source operand is less than or equal to the second source operand. Otherwise the result is set to 0. This instruction treats both source operands as signed values.

Functional Description
rf[Dest] =
   ((SignedMachineWord) rf[SrcA] <=
   (SignedMachineWord) rf[SrcB]) ? 1 : 0;

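Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>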

Encoding

Figure 4-102: slte in X0 Bit Descriptions

Figure 4-103: slte in X1 Bit Descriptions
Figure 4-104: slte in Y0 Bit Descriptions

Figure 4-105: slte in Y1 Bit Descriptions
slte_u: Set Less Than or Equal Unsigned Word

Syntax
slte_u Dest, SrcA, SrcB

Example
slte_u r5, r6, r7

Description
Sets each result to 1 if the first source operand is less than or equal to the second source operand. Otherwise the result is set to 0. This instruction treats both source operands as unsigned values.

Functional Description
rf[Dest] =
   ((UnsignedMachineWord) rf[SrcA] <=
   (UnsignedMachineWord) rf[SrcB]) ? 1 : 0;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-106: slte_u in X0 Bit Descriptions

Figure 4-107: slte_u in X1 Bit Descriptions
Figure 4-108: slte_u in Y0 Bit Descriptions

Figure 4-109: slte_u in Y1 Bit Descriptions
slti: Set Less Than Immediate Word

Syntax

slti Dest, SrcA, Imm8

Example

slti r5, r6, 5

Description

Sets each result to 1 if the first source operand is less than a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source operands as signed values.

Functional Description

rf[Dest] =
    ((SignedMachineWord) rf[SrcA] <
    ((SignedMachineWord) signExtend8(Imm8))) ? 1 : 0;

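Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>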

Encoding

Figure 4-110: slti in X0 Bit Descriptions

Figure 4-111: slti in X1 Bit Descriptions
Figure 4-112: slti in Y0 Bit Descriptions

Figure 4-113: slti in Y1 Bit Descriptions
slti_u: Set Less Than Unsigned Immediate Word

Syntax
slti_u Dest, SrcA, Imm8

Example
slti_u r5, r6, 5

Description
Sets each result to 1 if the first source operand is less than a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source operands as unsigned values.

Functional Description
rf[Dest] =
    ((UnsignedMachineWord) rf[SrcA] <
    ((UnsignedMachineWord) signExtend8(Imm8))) ? 1 : 0;
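
Because the immediate is sign extended before the unsigned comparison, a negative immediate becomes a very large unsigned value. The sketch below makes that interaction concrete. `slti_u_model` is a hypothetical name, and the immediate is passed as an `int8_t` to stand in for the 8-bit field.

```c
#include <stdint.h>

/* Model of slti_u: sign-extend the 8-bit immediate, then compare as unsigned. */
static uint32_t slti_u_model(uint32_t src_a, int8_t imm8)
{
    uint32_t extended = (uint32_t)(int32_t)imm8; /* signExtend8 */
    return (src_a < extended) ? 1 : 0;
}
```

An immediate of -1 extends to `0xFFFFFFFF`, so the model returns 1 for every source value except `0xFFFFFFFF` itself.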

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-114: slti_u in X0 Bit Descriptions

Figure 4-115: slti_u in X1 Bit Descriptions
Figure 4-116: slti_u in Y0 Bit Descriptions

Figure 4-117: slti_u in Y1 Bit Descriptions
sne: Set Not Equal Word

Syntax
sne Dest, SrcA, SrcB

Example
sne r5, r6, r7

Description
Sets each result to 1 if the first source operand is not equal to the second source operand. Otherwise the result is set to 0.

Functional Description
rf[Dest] =
   ((UnsignedMachineWord) rf[SrcA] !=
    (UnsignedMachineWord) rf[SrcB]) ? 1 : 0;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-118: sne in X0 Bit Descriptions

Figure 4-119: sne in X1 Bit Descriptions
Figure 4-120: sne in Y0 Bit Descriptions

Figure 4-121: sne in Y1 Bit Descriptions
4.1.7 Control Instructions

The following sections provide detailed descriptions of control instructions listed alphabetically.

- `bbns`: Branch Bit Not Set Word
- `bbnst`: Branch Bit Not Set Taken Word
- `bbs`: Branch Bit Set Word
- `bbst`: Branch Bit Set Taken Word
- `bgez`: Branch Greater Than or Equal to Zero Word
- `bgezt`: Branch Greater Than or Equal to Zero Predict Taken Word
- `bgz`: Branch Greater Than Zero Word
- `bgzt`: Branch Greater Than Zero Predict Taken Word
- `blez`: Branch Less Than or Equal to Zero Word
- `blezt`: Branch Less Than or Equal to Zero Taken Word
- `blz`: Branch Less Than Zero Word
- `blzt`: Branch Less Than Zero Taken Word
- `bnz`: Branch Not Zero Word
- `bnzt`: Branch Not Zero Predict Taken Word
- `bz`: Branch Zero Word
- `bzt`: Branch Zero Predict Taken Word
- `jalb`: Jump and Link Backward
- `jalf`: Jump and Link Forward
- `jalr`: Jump and Link Register
- `jalrp`: Jump and Link Register Predict
- `jb`: Jump Backward
- `jf`: Jump Forward
- `jr`: Jump Register
- `jrp`: Jump Register Predict
- `lnk`: Link
**bbns: Branch Bit Not Set Word**

**Syntax**

```
bbns SrcA, BrOff
```

**Example**

```
bbns r5, target
```

**Description**

Branches to the target if the source operand’s bit 0 is not set (0). Otherwise, the program counter advances to the next instruction in program order. Branch bit not set hints to a branch prediction mechanism that the branch is not taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

**Functional Description**

```
if (!(rf[SrcA] & 0x1)) {
    setNextPC(getCurrentPC() +
    (signExtend17(BrOff) <<
    (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
```
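
The target computation shared by the conditional branches can be sketched as follows. The constants are assumptions: `INSTRUCTION_SIZE_LOG_2` is taken as 6 and `BYTE_SIZE_LOG_2` as 3 (sizes in bits for a 64-bit bundle and an 8-bit byte), so the offset is scaled by 8 bytes, and the fall-through address is assumed to be the current PC plus one bundle. `sign_extend17` and `bbns_next_pc` are hypothetical names, and the branch hint bookkeeping is omitted.

```c
#include <stdint.h>

#define INSTRUCTION_SIZE_LOG_2 6 /* assumption: 64-bit bundle, size in bits */
#define BYTE_SIZE_LOG_2 3        /* assumption: 8-bit byte, size in bits */

/* Sign-extend the low 17 bits of br_off. */
static int32_t sign_extend17(uint32_t br_off)
{
    uint32_t low = br_off & 0x1FFFFu;
    return (int32_t)(low ^ 0x10000u) - 0x10000;
}

/* Hypothetical model of next-PC selection for bbns. */
static uint32_t bbns_next_pc(uint32_t current_pc, uint32_t src_a, uint32_t br_off)
{
    int32_t bundle_bytes = 1 << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2);
    if (!(src_a & 0x1u)) {
        /* Bit 0 clear: branch taken, offset counted in bundles. */
        return current_pc + (uint32_t)(sign_extend17(br_off) * bundle_bytes);
    }
    /* Bit 0 set: fall through to the next bundle (assumed PC + 8). */
    return current_pc + (uint32_t)bundle_bytes;
}
```

The multiplication is used in place of a left shift so that negative offsets stay well defined in C.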

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31
0101  n  i  s  i  110
```

**Figure 4-122: bbns in X1 Bit Descriptions**
bbnst: Branch Bit Not Set Taken Word

**Syntax**

```
bbnst SrcA, BrOff
```

**Example**

```
bbnst r5, target
```

**Description**

Branches to the target if the source operand’s bit 0 is not set (0). Otherwise, the program counter advances to the next instruction in program order. Branch bit not set predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register `ZERO_REGISTER`.

**Functional Description**

```
if (!(rf[SrcA] & 0x1)) {
    setNextPC(getCurrentPC() + (signExtend17(BrOff) << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-123: bbnst in X1 Bit Descriptions
bbs: Branch Bit Set Word

Syntax
bbs SrcA, BrOff

Example
bbs r5, target

Description
Branches to the target if the source operand’s bit 0 is set (1). Otherwise, the program counter
advances to the next instruction in program order. Branch bit set hints to a branch prediction
mechanism that the branch is not taken. This branch does an implicit move of the source operand
to register ZERO_REGISTER.

Functional Description
if (rf[SrcA] & 0x1) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31

0101  n  i  s  i  1101

BrType_X1 - 0xD
SrcA_X1 - SrcA
S_X1 - Sbit
Opcode_X1 - 0x5

Figure 4-124: bbs in X1 Bit Descriptions
bbst: Branch Bit Set Taken Word

Syntax

bbst SrcA, BrOff

Example

bbst r5, target

Description

Branches to the target if the source operand’s bit 0 is set (1). Otherwise, the program counter advances to the next instruction in program order. Branch bit set predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

if (rf[SrcA] & 0x1) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
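
The only difference between bbs and bbst is which outcome counts as a correct hint. The following is a minimal sketch of that relationship; the `BranchOutcome` type and the `branch_bit_set` function are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool taken;         /* did the branch condition hold? */
    bool hint_correct;  /* did the static hint match the outcome? */
} BranchOutcome;

/* predict_taken is false for bbs and true for bbst. */
static BranchOutcome branch_bit_set(uint32_t src, bool predict_taken) {
    BranchOutcome out;
    out.taken = (src & 0x1u) != 0;
    out.hint_correct = (out.taken == predict_taken);
    return out;
}
```

With an odd source value the branch is taken, so the bbst form is hinted correctly and the bbs form is not.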

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0101 | a | i | s | i | 1010 |

Figure 4-125: bbst in X1 Bit Descriptions
bgez: Branch Greater Than or Equal to Zero Word

Syntax
bgez SrcA, BrOff

Example
bgez r5, target

Description
Branches to the target if the source operand is greater than or equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch greater than or equal to 0 hints to a branch prediction mechanism that the branch is not taken.

This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description
if (rf[SrcA] >= 0) {
    setNextPC(getCurrentPC() +
    (signExtend17(BrOff) <<
    (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
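
The comparisons against zero in this group of branches only make sense if the source word is interpreted as a signed value. A minimal sketch, assuming a 32-bit two's-complement word; the `ZeroCompare` enumeration and `branch_condition` function are illustrative names, not architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { COND_GEZ, COND_GZ, COND_LEZ, COND_LZ } ZeroCompare;

/* Evaluate a compare-with-zero condition on a 32-bit register value. */
static bool branch_condition(uint32_t word, ZeroCompare cond) {
    bool negative = (word >> 31) != 0;  /* sign bit of the word */
    bool zero = (word == 0);
    switch (cond) {
    case COND_GEZ: return !negative;
    case COND_GZ:  return !negative && !zero;
    case COND_LEZ: return negative || zero;
    case COND_LZ:  return negative;
    }
    return false;
}
```

Testing the sign bit directly avoids relying on implementation-defined unsigned-to-signed conversion.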

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0101 | s | i | 01 |

BrType_X1 - 0x7
SrcA_X1 - SrcA
BrOff_X1[14:0] - BrOff[14:0]
S_X1 - Sbit
Opcode_X1 - 0x5

Figure 4-126: bgez in X1 Bit Descriptions
bgezt: Branch Greater Than or Equal to Zero Predict Taken Word

Syntax

bgezt SrcA, BrOff

Example

bgezt r5, target

Description

Branches to the target if the source operand is greater than or equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch greater than or equal to 0 predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

if (rf[SrcA] >= 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

0101 n i s i 0111

BrType_X1 - 0x7
BrOff_X1[16:15] - BrOff[16:15]
SrcA_X1 - SrcA
BrOff_X1[14:0] - BrOff[14:0]
S_X1 - Sbit
Opcode_X1 - 0x5

Figure 4-127: bgezt in X1 Bit Descriptions
bgz: Branch Greater Than Zero Word

Syntax

bgz SrcA, BrOff

Example

bgz r5, target

Description

Branches to the target if the source operand is greater than 0. Otherwise, the program counter advances to the next instruction in program order. Branch greater than 0 hints to a branch prediction mechanism that the branch is not taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

if (rf[SrcA] > 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0101 | n | i | s | i | 0101 |

BrType_X1 - 0x5
SrcA_X1 - SrcA
BrOff_X1[14:0] - BrOff[14:0]
S_X1 - Sbit
Opcode_X1 - 0x5

Figure 4-128: bgz in X1 Bit Descriptions
bgzt: Branch Greater Than Zero Predict Taken Word

**Syntax**

```
bgzt SrcA, BrOff
```

**Example**

```
bgzt r5, target
```

**Description**

Branches to the target if the source operand is greater than 0. Otherwise, the program counter advances to the next instruction in program order. Branch greater than 0 predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register `ZERO_REGISTER`.

**Functional Description**

```c
if (rf[SrcA] > 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
0101 n i a s i 0101
```

- `BrType_X1` - 0x5
- `SrcA_X1` - SrcA
- `BrOff_X1[14:0]` - BrOff[14:0]
- `S_X1` - Sbit
- `Opcode_X1` - 0x5

*Figure 4-129: bgzt in X1 Bit Descriptions*
blez: Branch Less Than or Equal to Zero Word

Syntax

blez SrcA, BrOff

Example

blez r5, target

Description

Branches to the target if the source operand is less than or equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch less than or equal to 0 hints to a branch prediction mechanism that the branch is not taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

if (rf[SrcA] <= 0) {
    setNextPC(getCurrentPC() +
    (signExtend17(BrOff) <<
    (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>62</th>
<th>61</th>
<th>60</th>
<th>59</th>
<th>58</th>
<th>57</th>
<th>56</th>
<th>55</th>
<th>54</th>
<th>53</th>
<th>52</th>
<th>51</th>
<th>50</th>
<th>49</th>
<th>48</th>
<th>47</th>
<th>46</th>
<th>45</th>
<th>44</th>
<th>43</th>
<th>42</th>
<th>41</th>
<th>40</th>
<th>39</th>
<th>38</th>
<th>37</th>
<th>36</th>
<th>35</th>
<th>34</th>
<th>33</th>
<th>32</th>
<th>31</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101</td>
<td>i</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4-130: blez in X1 Bit Descriptions
**blezt: Branch Less Than or Equal to Zero Taken Word**

**Syntax**

blezt SrcA, BrOff

**Example**

blezt r5, target

**Description**

Branches to the target if the source operand is less than or equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch less than or equal to 0 predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

**Functional Description**

```
if (rf[SrcA] <= 0) {
    setNextPC(getCurrentPC() +
    (signExtend17(BrOff) <<
    (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-131: blezt in X1 Bit Descriptions
blz: Branch Less Than Zero Word

Syntax

blz SrcA, BrOff

Example

blz r5, target

Description

Branches to the target if the source operand is less than 0. Otherwise, the program counter advances to the next instruction in program order. Branch less than 0 hints to a branch prediction mechanism that the branch is not taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

```c
if (rf[SrcA] < 0) {
    setNextPC(getCurrentPC() +
    (signExtend17(BrOff) <<
    (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
```

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
<table>
<thead>
<tr>
<th></th>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>62</td>
<td>61</td>
<td>60</td>
<td>59</td>
<td>58</td>
<td>57</td>
</tr>
<tr>
<td>56</td>
<td>55</td>
<td>54</td>
<td>53</td>
<td>52</td>
<td>51</td>
</tr>
<tr>
<td>50</td>
<td>49</td>
<td>48</td>
<td>47</td>
<td>46</td>
<td>45</td>
</tr>
<tr>
<td>44</td>
<td>43</td>
<td>42</td>
<td>41</td>
<td>40</td>
<td>39</td>
</tr>
<tr>
<td>38</td>
<td>37</td>
<td>36</td>
<td>35</td>
<td>34</td>
<td>33</td>
</tr>
<tr>
<td>32</td>
<td>31</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

Figure 4-132: blz in X1 Bit Descriptions
blzt: Branch Less Than Zero Taken Word

Syntax
blzt SrcA, BrOff

Example
blzt r5, target

Description
Branches to the target if the source operand is less than 0. Otherwise, the program counter advances to the next instruction in program order. Branch less than 0 predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description
if (rf[SrcA] < 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0101 | n  | i  | s  | i  | 1001 |

BrType_X1 - 0x9
SrcA_X1 - SrcA
BrOff_X1[14:0] - BrOff[14:0]
S_X1 - Sbit
Opcode_X1 - 0x5

Figure 4-133: blzt in X1 Bit Descriptions
bnz: Branch Not Zero Word

Syntax
bnz SrcA, BrOff

Example
bnz r5, target

Description
Branches to the target if the source operand is not equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch not 0 hints to a branch prediction mechanism that the branch is not taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description
if (rf[SrcA] != 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0101 | n | i |  | s | i | 0011 |

Figure 4-134: bnz in X1 Bit Descriptions
bnzt: Branch Not Zero Predict Taken Word

Syntax

bnzt SrcA, BrOff

Example

bnzt r5, target

Description

Branches to the target if the source operand is not equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch not 0 predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

if (rf[SrcA] != 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>62</th>
<th>61</th>
<th>60</th>
<th>59</th>
<th>58</th>
<th>57</th>
<th>56</th>
<th>55</th>
<th>54</th>
<th>53</th>
<th>52</th>
<th>51</th>
<th>50</th>
<th>49</th>
<th>48</th>
<th>47</th>
<th>46</th>
<th>45</th>
<th>44</th>
<th>43</th>
<th>42</th>
<th>41</th>
<th>40</th>
<th>39</th>
<th>38</th>
<th>37</th>
<th>36</th>
<th>35</th>
<th>34</th>
<th>33</th>
<th>32</th>
<th>31</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101</td>
<td>n</td>
<td>i</td>
<td>s</td>
<td>i</td>
<td>0011</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4-135: bnzt in X1 Bit Descriptions
bz: Branch Zero Word

Syntax

bz SrcA, BrOff

Example

bz r5, target

Description

Branches to the target if the source operand is equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch 0 hints to a branch prediction mechanism that the branch is not taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

```c
if (rf[SrcA] == 0) {
    setNextPC(getCurrentPC() +
    (signExtend17(BrOff) <<
    (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedIncorrect();
} else {
    branchHintedCorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];
```

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>62</th>
<th>61</th>
<th>60</th>
<th>59</th>
<th>58</th>
<th>57</th>
<th>56</th>
<th>55</th>
<th>54</th>
<th>53</th>
<th>52</th>
<th>51</th>
<th>50</th>
<th>49</th>
<th>48</th>
<th>47</th>
<th>46</th>
<th>45</th>
<th>44</th>
<th>43</th>
<th>42</th>
<th>41</th>
<th>40</th>
<th>39</th>
<th>38</th>
<th>37</th>
<th>36</th>
<th>35</th>
<th>34</th>
<th>33</th>
<th>32</th>
<th>31</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101</td>
<td></td>
<td>i</td>
<td></td>
<td>s</td>
<td>i</td>
<td>0001</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4-136: bz in X1 Bit Descriptions
bzt: Branch Zero Predict Taken Word

Syntax

bzt SrcA, BrOff

Example

bzt r5, target

Description

Branches to the target if the source operand is equal to 0. Otherwise, the program counter advances to the next instruction in program order. Branch 0 predict taken hints to a branch prediction mechanism that the branch is taken. This branch does an implicit move of the source operand to register ZERO_REGISTER.

Functional Description

if (rf[SrcA] == 0) {
    setNextPC(getCurrentPC() +
        (signExtend17(BrOff) <<
        (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
    branchHintedCorrect();
} else {
    branchHintedIncorrect();
}
rf[ZERO_REGISTER] = rf[SrcA];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
0101 n i s i 0001
```

BrType_X1 - 0x1
BrOff_X1[16:15] - BrOff[16:15]
SrcA_X1 - SrcA
BrOff_X1[14:0] - BrOff[14:0]
S_X1 - Sbit
Opcode_X1 - 0x5

Figure 4-137: bzt in X1 Bit Descriptions
jalb: Jump and Link Backward

Syntax
jalb JOff

Example
jalb target

Description
Unconditionally jumps to a backward target and puts the address of the subsequent instruction into register LINK_REGISTER. The jump hints to the prediction mechanism that this jump is taken. Signals to the hardware that it should attempt to push the link address on the return stack if available.

Functional Description
rf[LINK_REGISTER] = getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE);
pushReturnStack(getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE));
setNextPC(getCurrentPC() + BACKWARD_OFFSET +
        (JOff << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
jumped();

Valid Pipelines
<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding
Figure 4-138: jalb in X1 Bit Descriptions
jalf: Jump and Link Forward

**Syntax**

jalf JOff

**Example**

jalf target

**Description**

Unconditionally jumps to a forward target and puts the address of the subsequent instruction into register LINK_REGISTER. The jump hints to the prediction mechanism that this jump is taken. Signals to the hardware that it should attempt to push the link address on the return stack if available.

**Functional Description**

rf[LINK_REGISTER] = getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE);
pushReturnStack(getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE));
setNextPC(getCurrentPC() +
        (JOff << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
jumped();
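
The link and target arithmetic above can be illustrated with concrete numbers. This is a minimal sketch, assuming 64-bit instruction bundles, 8-bit bytes, and a 32-bit program counter; the `JumpResult` type and the C function named `jalf` are illustrative helpers, not architectural definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 64-bit instruction bundles and 8-bit bytes. */
enum {
    INSTRUCTION_SIZE = 64,
    BYTE_SIZE = 8,
    INSTRUCTION_SIZE_LOG_2 = 6,
    BYTE_SIZE_LOG_2 = 3
};

typedef struct {
    uint32_t link;     /* value written to LINK_REGISTER */
    uint32_t next_pc;  /* address of the jump target */
} JumpResult;

/* Model of a forward jump-and-link located at pc with offset j_off. */
static JumpResult jalf(uint32_t pc, uint32_t j_off) {
    JumpResult r;
    r.link = pc + (INSTRUCTION_SIZE / BYTE_SIZE);
    r.next_pc = pc + (j_off << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2));
    return r;
}
```

Under these assumptions a jalf at address 0x1000 with JOff = 4 links to 0x1008 and continues at 0x1020.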

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 1100 | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i | i |

JOff_X1[20:17], JOff_X1[16:15], JOff_X1[26:21], JOff_X1[14:9], JOff_X1[27:27], Opcode_X1 - 0xC

*Figure 4-139: jalf in X1 Bit Descriptions*
jalr: Jump and Link Register

Syntax
jalr SrcA

Example
jalr r5

Description
Unconditionally jumps to an address stored in a register and puts the address of the subsequent instruction into register LINK_REGISTER. Signals to the hardware that it should attempt to push the link address on the return stack if available.

Functional Description
rf[LINK_REGISTER] = getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE);
pushReturnStack(getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE));
setNextPC(rf[SrcA] & ALIGNED_INSTRUCTION_MASK);
indirectBranchHintedIncorrect();

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
0000 | 0 | 0000 | 0000 | 0000 | 0000 |
 62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31
0001 |0| 00001001 | 00000 | 0 | 0000 |
```

Dest_X1 - Reserved 0x0
SrcA_X1 - SrcA
SrcB_X1 - Reserved 0x0
RRROpcodeExtension_X1 - 0x
S_X1 - Reserved 0x0
Opcode_X1 - 0x1

Figure 4-140: jalr in X1 Bit Descriptions
jalrp: Jump and Link Register Predict

Syntax

jalrp SrcA

Example

jalrp r5

Description

Unconditionally jumps to an address stored in a register and puts the address of the subsequent instruction into register LINK_REGISTER. Signals to the hardware that it should attempt to predict the target with an address stack if available.

Functional Description

UnsignedMachineWord predictAddress = popReturnStack();
rf[LINK_REGISTER] = getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE);
pushReturnStack(getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE));
setNextPC(rf[SrcA] & ALIGNED_INSTRUCTION_MASK);
if (predictAddress == (rf[SrcA] & ALIGNED_INSTRUCTION_MASK)) {
    indirectBranchHintedCorrect();
} else {
    indirectBranchHintedIncorrect();
}

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-141: jalrp in X1 Bit Descriptions
jb: Jump Backward

Syntax

jb JOff

Example

jb target

Description

Unconditionally jumps to a backward target. The jump hints to the prediction mechanism that this jump is taken.

Functional Description

```
setNextPC(getCurrentPC() + BACKWARD_OFFSET +
        (JOff << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
jumped();
```

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 1011 | i |   |   | i |   | i |   | i |   | i |   | i |   | i |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |    |
```

Figure 4-142: jb in X1 Bit Descriptions
jf: Jump Forward

Syntax
jf JOff

Example
jf target

Description
Unconditionally jumps to a forward target. The jump hints to the prediction mechanism that this jump is taken.

Functional Description
setNextPC(getCurrentPC() +
   (JOff << (INSTRUCTION_SIZE_LOG_2 - BYTE_SIZE_LOG_2)));
jumped();

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-143: jf in X1 Bit Descriptions
jr: Jump Register

Syntax
jr SrcA

Example
jr r5

Description
Unconditionally jumps to an address stored in a register.

Functional Description

setNextPC(rf[SrcA] & ALIGNED_INSTRUCTION_MASK);
indirectBranchHintedIncorrect();

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-144: jr in X1 Bit Descriptions

Dest_X1 - Reserved 0x0
SrcA_X1 - SrcA
SrcB_X1 - Reserved 0x0
RRROpcodeExtension_X1 - 0xC
S_X1 - Reserved 0x0
Opcode_X1 - 0x1
**jrp: Jump Register Predict**

**Syntax**

jrp SrcA

**Example**

jrp r5

**Description**

Unconditionally jumps to an address stored in a register. Signals to the hardware that it should attempt to predict the target with an address stack if available.

**Functional Description**

```c
setNextPC(rf[SrcA] & ALIGNED_INSTRUCTION_MASK);
if (popReturnStack() == (rf[SrcA] & ALIGNED_INSTRUCTION_MASK)) {
    indirectBranchHintedCorrect();
} else {
    indirectBranchHintedIncorrect();
}
```
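
The interaction between the return stack and the hint check can be modeled in a few lines of C. This is a minimal sketch under stated assumptions: the stack depth of 4, the overflow policy (discard the oldest entry), the behavior on an empty stack (treat as a misprediction), and the 8-byte alignment mask are all hypothetical, and the function names are illustrative rather than architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAS_DEPTH 4                           /* hypothetical depth */
#define ALIGNED_INSTRUCTION_MASK 0xFFFFFFF8u  /* assumes 8-byte bundles */

typedef struct {
    uint32_t entries[RAS_DEPTH];
    int count;
} ReturnStack;

/* Push a link address; when full, the oldest entry is discarded. */
static void push_return_stack(ReturnStack *s, uint32_t addr) {
    if (s->count == RAS_DEPTH) {
        for (int i = 1; i < RAS_DEPTH; i++) {
            s->entries[i - 1] = s->entries[i];
        }
        s->count--;
    }
    s->entries[s->count++] = addr;
}

/* Pop the most recent link address; returns false if the stack is empty. */
static bool pop_return_stack(ReturnStack *s, uint32_t *addr) {
    if (s->count == 0) {
        return false;
    }
    *addr = s->entries[--s->count];
    return true;
}

/* True when the popped prediction matches the aligned jrp target. */
static bool jrp_hint_correct(ReturnStack *s, uint32_t target) {
    uint32_t predicted;
    if (!pop_return_stack(s, &predicted)) {
        return false;
    }
    return predicted == (target & ALIGNED_INSTRUCTION_MASK);
}
```

In this model a call sequence that pushes its link address and later returns through jrp to the same address is counted as a correct hint, while a return with an empty stack is not.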

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

*Figure 4-145: jrp in X1 Bit Descriptions*

*Dest_X1* - Reserved 0x0
*SrcA_X1* - SrcA
*SrcB_X1* - Reserved 0x0
*RRROpcodeExtension_X1* - 0xB
*S_X1* - Reserved 0x0
*Opcode_X1* - 0x1
lnk: Link

Syntax

lnk Dest

Example

lnk r5

Description

Moves the address of the subsequent instruction into the destination operand. Does not affect the address stack if available.

Functional Description

rf[Dest] = getCurrentPC() + (INSTRUCTION_SIZE / BYTE_SIZE);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Dest_X1 - Dest
SrcA_X1 - Reserved 0x0
SrcB_X1 - Reserved 0x0
RRROpcodeExtension_X1 - 0x0
S_X1 - Sbit
Opcode_X1 - 0x1

Figure 4-146: lnk in X1 Bit Descriptions
4.1.8 Logical Instructions

The following sections provide detailed descriptions of logical instructions listed alphabetically.

- and: And Word
- andi: And Immediate Word
- mm: Masked Merge Word
- mnz: Mask Not Zero Word
- mvnz: Move Not Zero Word
- mvz: Move Zero Word
- mz: Mask Zero Word
- nor: Nor Word
- or: Or Word
- ori: Or Immediate Word
- rl: Rotate Left Word
- rli: Rotate Left Immediate Word
- shl: Logical Shift Left Word
- shli: Logical Shift Left Immediate Word
- shr: Logical Shift Right Word
- shri: Logical Shift Right Immediate Word
- sra: Arithmetic Shift Right Word
- srai: Arithmetic Shift Right Immediate Word
- tblidxb0: Table Index Byte 0
- tblidxb1: Table Index Byte 1
- tblidxb2: Table Index Byte 2
- tblidxb3: Table Index Byte 3
- xor: Exclusive Or Word
- xori: Exclusive Or Immediate Word
and: And Word

Syntax
and Dest, SrcA, SrcB

Example
and r5, r6, r7

Description
Compute the boolean AND of two words.

Functional Description
rf[Dest] = rf[SrcA] & rf[SrcB];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000 | n | 00000110 | s | s | d |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x6
- S_X0 - Sbit
- Opcode_X0 - 0x0

Figure 4-147: and in X0 Bit Descriptions

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0001 | n | 00000110 | s | s | d |

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x4
- S_X1 - Sbit
- Opcode_X1 - 0x1

Figure 4-148: and in X1 Bit Descriptions
Figure 4-149: and in Y0 Bit Descriptions

Figure 4-150: and in Y1 Bit Descriptions
**andi: And Immediate Word**

**Syntax**

andi Dest, SrcA, Imm8

**Example**

andi r5, r6, 5

**Description**

Compute the boolean AND of a word and a sign-extended immediate.

**Functional Description**

rf[Dest] = rf[SrcA] & signExtend8(Imm8);
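
Because the immediate is sign-extended before the AND, a negative Imm8 acts as a mask that keeps all of the upper bits. A minimal sketch, assuming a 32-bit word; the C helpers `sign_extend8` and `andi` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of imm8 to a 32-bit word. */
static uint32_t sign_extend8(uint32_t imm8) {
    imm8 &= 0xFFu;
    return (imm8 ^ 0x80u) - 0x80u;  /* wraps modulo 2^32 for negative values */
}

/* Model of andi: AND a word with the sign-extended immediate. */
static uint32_t andi(uint32_t src_a, uint32_t imm8) {
    return src_a & sign_extend8(imm8);
}
```

For example, an immediate of -16 (encoded as 0xF0) clears only the low four bits of the source, while an immediate of 5 keeps only bits 0 and 2.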

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 101| n  | 000001 | i  | s  | d  |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- Imm8_X0 - Imm8
- ImmOpcodeExtension_X0 - 0x1
- S_X0 - Sbit
- Opcode_X0 - 0x5

**Figure 4-151: andi in X0 Bit Descriptions**

<table>
<thead>
<tr>
<th>62</th>
<th>61</th>
<th>60</th>
<th>59</th>
<th>58</th>
<th>57</th>
<th>56</th>
<th>55</th>
<th>54</th>
<th>53</th>
<th>52</th>
<th>51</th>
<th>50</th>
<th>49</th>
<th>48</th>
<th>47</th>
<th>46</th>
<th>45</th>
<th>44</th>
<th>43</th>
<th>42</th>
<th>41</th>
<th>40</th>
<th>39</th>
<th>38</th>
<th>37</th>
<th>36</th>
<th>35</th>
<th>34</th>
<th>33</th>
<th>32</th>
<th>31</th>
</tr>
</thead>
<tbody>
<tr>
<td>0110</td>
<td>n</td>
<td>000010</td>
<td>i</td>
<td>s</td>
<td>d</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm8_X1 - Imm8
- ImmOpcodeExtension_X1 - 0x4
- S_X1 - Sbit
- Opcode_X1 - 0x6

**Figure 4-152: andi in X1 Bit Descriptions**

**Figure 4-153: andi in Y0 Bit Descriptions**

- **Dest_Y0**: Dest
- **SrcA_Y0**: SrcA
- **Imm8_Y0**: Imm8
- **Opcode_Y0**: 0xA

**Figure 4-154: andi in Y1 Bit Descriptions**

- **Dest_Y1**: Dest
- **SrcA_Y1**: SrcA
- **Imm8_Y1**: Imm8
- **Opcode_Y1**: 0x8
**mm: Masked Merge Word**

**Syntax**

mm Dest, SrcA, SrcB, MMStart, MMEnd

**Example**

mm r5, r6, r7, 5, 7

**Description**

Merge two source operands based on a running mask. The mask is specified by the MMStart and MMEnd fields, which contain the mask’s starting and ending bit positions. If the start position is less than or equal to the end position, the mask has bits set (1) from the start bit position up to and including the end bit position. If the start position is greater than the end position, the mask wraps around: bits are set (1) from the start bit position up to the most significant bit (position WORD_SIZE - 1), and from bit position 0 up to and including the end bit position. The mask selects bits from the first source operand, and the inverse of the mask selects bits from the second source operand.

**Functional Description**

```c
UnsignedMachineWord mask = 0;
int start;
int end;
start = MMStart;
end = MMEnd;
mask =
  (start <= end) ? ((WORD_MASK << start) ^ ((WORD_MASK << end) << 1))
    : ((WORD_MASK << start) | (WORD_MASK >> ((WORD_SIZE - 1) - end)));
rf[Dest] = (rf[SrcA] & mask) | (rf[SrcB] & (WORD_MASK ^ mask));
```
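
As an illustration, the functional description above can be modeled in portable C. This is a minimal sketch that assumes a 32-bit machine word; `mm_mask`, `mm_model`, `WORD_SIZE`, and `WORD_MASK` are hypothetical names chosen for this example, not identifiers defined by the architecture.

```c
#include <stdint.h>

#define WORD_SIZE 32u          /* assumption: 32-bit machine word */
#define WORD_MASK 0xFFFFFFFFu  /* all ones in a 32-bit word */

/* Build the running mask for bit positions start..end (each in 0..31). */
static uint32_t mm_mask(unsigned start, unsigned end)
{
    if (start <= end) {
        /* Contiguous run: bits start..end inclusive. */
        return (WORD_MASK << start) ^ ((WORD_MASK << end) << 1);
    }
    /* Wrapped run: bits start..31 together with bits 0..end. */
    return (WORD_MASK << start) | (WORD_MASK >> ((WORD_SIZE - 1u) - end));
}

/* Take masked bits from src_a and all other bits from src_b. */
static uint32_t mm_model(uint32_t src_a, uint32_t src_b,
                         unsigned start, unsigned end)
{
    uint32_t mask = mm_mask(start, end);
    return (src_a & mask) | (src_b & (WORD_MASK ^ mask));
}
```

With the operands of the example instruction (`MMStart` = 5, `MMEnd` = 7), the mask covers bits 5 through 7, so only those three bits of the result come from the first source operand.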

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| i10| i  | i  | s  | s  | d  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |

**Figure 4-155: mm in X0 Bit Descriptions**
Figure 4-156: mm in X1 Bit Descriptions
mnz: Mask Not Zero Word

Syntax

mnz Dest, SrcA, SrcB

Example

mnz r5, r6, r7

Description

If the first operand is not 0, then compute the boolean AND of the second operand and a value of all ones (1’s), otherwise return zero (0).

Functional Description

rf[Dest] = signExtend1((rf[SrcA] != 0) ? 1 : 0) & rf[SrcB];
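
The role of `signExtend1` may be easier to see in a small C model. The sketch below assumes a 32-bit word; `sign_extend1` and `mnz_model` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Replicate a 1-bit value across the whole word: 0 -> 0, 1 -> all ones. */
static uint32_t sign_extend1(uint32_t bit)
{
    return (uint32_t)0 - (bit & 1u);
}

/* Pass src_b through when src_a is nonzero, otherwise produce zero. */
static uint32_t mnz_model(uint32_t src_a, uint32_t src_b)
{
    return sign_extend1(src_a != 0) & src_b;
}
```

The comparison yields 0 or 1, and sign-extending that single bit turns it into an all-zeros or all-ones mask for the AND.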

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000 n 000010101 s s d
Figure 4-159: mnz in Y0 Bit Descriptions

Figure 4-160: mnz in Y1 Bit Descriptions
mvnz: Move Not Zero Word

Syntax

mvnz Dest, SrcA, SrcB

Example

mvnz r5, r6, r7

Description

If the first source operand is not 0, move the second operand to the destination. Otherwise, the destination keeps its current value (the instruction writes the destination's own contents back to it). This instruction unconditionally reads the first input operand, the second input operand, and the destination operand.

Functional Description

UnsignedMachineWord localSrcB = rf[SrcB];
UnsignedMachineWord localDest = rf[Dest];
rf[Dest] = (rf[SrcA] != 0) ? (localSrcB) : (localDest);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-161: mvnz in X0 Bit Descriptions

Figure 4-162: mvnz in Y0 Bit Descriptions
**mvz: Move Zero Word**

**Syntax**

mvz Dest, SrcA, SrcB

**Example**

mvz r5, r6, r7

**Description**

If the first source operand is 0, move the second operand to the destination. Otherwise, the destination keeps its current value (the instruction writes the destination's own contents back to it). This instruction unconditionally reads the first input operand, the second input operand, and the destination operand.

**Functional Description**

```c
UnsignedMachineWord localSrcB = rf[SrcB];
UnsignedMachineWord localDest = rf[Dest];
rf[Dest] = (rf[SrcA] == 0) ? (localSrcB) : (localDest);
```
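
The conditional-move behavior can be sketched as pure C functions. This is only an illustrative model under the assumption of a 32-bit word; `mvz_model` and `mvnz_model` are hypothetical names, and the old destination value is passed in explicitly because the instruction always reads it.

```c
#include <stdint.h>

/* mvz: result is src_b when src_a is zero, else the old destination. */
static uint32_t mvz_model(uint32_t dest_old, uint32_t src_a, uint32_t src_b)
{
    return (src_a == 0) ? src_b : dest_old;
}

/* mvnz: result is src_b when src_a is nonzero, else the old destination. */
static uint32_t mvnz_model(uint32_t dest_old, uint32_t src_a, uint32_t src_b)
{
    return (src_a != 0) ? src_b : dest_old;
}
```

Modeling the destination as an input makes the data dependency visible: the result depends on three register values, not two.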

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

- **X0 Bit Descriptions**
  - Dest\_X0 - Dest
  - SrcA\_X0 - SrcA
  - SrcB\_X0 - SrcB
  - RRROpcodeExtension\_X0 - 0x2E
  - S\_X0 - Sbit
  - Opcode\_X0 - 0x0

- **Y0 Bit Descriptions**
  - Dest\_Y0 - Dest
  - SrcA\_Y0 - SrcA
  - SrcB\_Y0 - SrcB
  - RRROpcodeExtension\_Y0 - 0x2
  - Opcode\_Y0 - 0x2
mz: Mask Zero Word

Syntax
mz Dest, SrcA, SrcB

Example
mz r5, r6, r7

Description
If the first operand is 0, then compute the boolean AND of the second operand and a value of all ones (1's), otherwise return zero (0).

Functional Description
rf[Dest] = signExtend1((rf[SrcA] == 0) ? 1 : 0) & rf[SrcB];
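
One possible use of the two masking instructions together is a branch-free select. The sketch below is an assumption-laden illustration in C (32-bit word; `mz_model`, `mnz_model`, and `select_word` are hypothetical names), not a sequence prescribed by this manual.

```c
#include <stdint.h>

/* mz: pass src_b through when src_a is zero, otherwise produce zero. */
static uint32_t mz_model(uint32_t src_a, uint32_t src_b)
{
    return (src_a == 0) ? src_b : 0u;
}

/* mnz: pass src_b through when src_a is nonzero, otherwise produce zero. */
static uint32_t mnz_model(uint32_t src_a, uint32_t src_b)
{
    return (src_a != 0) ? src_b : 0u;
}

/* Exactly one of the two masked values is nonzero-capable, so OR merges them. */
static uint32_t select_word(uint32_t cond, uint32_t if_true, uint32_t if_false)
{
    return mnz_model(cond, if_true) | mz_model(cond, if_false);
}
```

Because the two masks are complementary, the OR never mixes bits from both inputs.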

Valid Pipelines

Encoding

Figure 4-165: mz in X0 Bit Descriptions

Figure 4-166: mz in X1 Bit Descriptions
Figure 4-167: mz in Y0 Bit Descriptions

Figure 4-168: mz in Y1 Bit Descriptions
nor: Nor Word

Syntax

nor Dest, SrcA, SrcB

Example

nor r5, r6, r7

Description

Compute the boolean NOR of two words.

Functional Description

rf[Dest] = ~(rf[SrcA] | rf[SrcB]);
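
A short C model of the operation is given below as a sketch, assuming a 32-bit word. `nor_model` and `not_model` are hypothetical names; the second function only illustrates the identity that NOR with zero yields the bitwise complement.

```c
#include <stdint.h>

/* Bitwise NOR of two words. */
static uint32_t nor_model(uint32_t src_a, uint32_t src_b)
{
    return ~(src_a | src_b);
}

/* Bitwise complement expressed as NOR with zero. */
static uint32_t not_model(uint32_t src_a)
{
    return nor_model(src_a, 0u);
}
```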

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | RRROpcodeExtension_X0 | SrcB_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|-----------------------|---------|---------|---------|
| 000 | n | 000110010 | s | s | d |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x32
- S_X0 - Sbit
- Opcode_X0 - 0x0

**Figure 4-169: nor in X0 Bit Descriptions**

Bits 62:31, most significant field first:

| Opcode_X1 | S_X1 | RRROpcodeExtension_X1 | SrcB_X1 | SrcA_X1 | Dest_X1 |
|-----------|------|-----------------------|---------|---------|---------|
| 0001 | n | 000011000 | s | s | d |

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x18
- S_X1 - Sbit
- Opcode_X1 - 0x1

**Figure 4-170: nor in X1 Bit Descriptions**
Figure 4-171: nor in Y0 Bit Descriptions

Figure 4-172: nor in Y1 Bit Descriptions
**or: Or Word**

**Syntax**

```
or Dest, SrcA, SrcB
```

**Example**

```
or r5, r6, r7
```

**Description**

Compute the boolean OR of two words.

**Functional Description**

```
rf[Dest] = rf[SrcA] | rf[SrcB];
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-173: or in X0 Bit Descriptions

Figure 4-174: or in X1 Bit Descriptions
Figure 4-175: or in Y0 Bit Descriptions

Figure 4-176: or in Y1 Bit Descriptions
ori: Or Immediate Word

Syntax
ori Dest, SrcA, Imm8

Example
ori r5, r6, 5

Description
Compute the boolean OR of a word and a sign extended immediate.

Functional Description
rf[Dest] = rf[SrcA] | signExtend8(Imm8);
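
Because the immediate is sign extended before the OR, an 8-bit immediate with its top bit set affects every high-order bit of the result. The following C sketch illustrates this under the assumption of a 32-bit word; `sign_extend8` and `ori_model` are hypothetical names for this example.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of imm8 to a full 32-bit word. */
static uint32_t sign_extend8(uint32_t imm8)
{
    uint32_t v = imm8 & 0xFFu;
    /* Flip the sign bit, then subtract it back out with wraparound. */
    return (v ^ 0x80u) - 0x80u;
}

/* OR a word with a sign-extended 8-bit immediate. */
static uint32_t ori_model(uint32_t src_a, uint32_t imm8)
{
    return src_a | sign_extend8(imm8);
}
```

For the example instruction, an immediate of 5 only sets low-order bits, whereas an immediate whose encoding is 0x80 through 0xFF also sets bits 8 and above.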

Valid Pipelines

Encoding

Figure 4-177: ori in X0 Bit Descriptions

Figure 4-178: ori in X1 Bit Descriptions

Figure 4-179: ori in Y0 Bit Descriptions

Figure 4-180: ori in Y1 Bit Descriptions
rl: Rotate Left Word

Syntax
rl Dest, SrcA, SrcB

Example
rl r5, r6, r7

Description
Rotate the first source operand to the left by the second source operand. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. The main processor ISA does not contain a rotate right.

Functional Description

rf[Dest] = (rf[SrcA] << (rf[SrcB] % WORD_SIZE)) | (((UnsignedMachineWord) rf[SrcA]) >> ((WORD_SIZE - (rf[SrcB] % WORD_SIZE)) % WORD_SIZE));
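
Since no rotate right is provided, a right rotation can be expressed through a left rotation by the complementary amount. The C sketch below illustrates the idea, assuming a 32-bit word; `rl_model` and `rr_model` are hypothetical names, not instructions or library functions.

```c
#include <stdint.h>

#define WORD_SIZE 32u  /* assumption: 32-bit machine word */

/* Rotate left by src_b modulo the word size. */
static uint32_t rl_model(uint32_t src_a, uint32_t src_b)
{
    uint32_t n = src_b % WORD_SIZE;
    /* The second modulo keeps the right shift at 0 when n is 0. */
    return (src_a << n) | (src_a >> ((WORD_SIZE - n) % WORD_SIZE));
}

/* Rotate right by n, expressed as a rotate left by WORD_SIZE - n. */
static uint32_t rr_model(uint32_t src_a, uint32_t n)
{
    return rl_model(src_a, WORD_SIZE - (n % WORD_SIZE));
}
```

The inner modulo matters in C: shifting a 32-bit value by 32 is undefined, so the zero-rotation case has to map to a shift of 0.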

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | RRROpcodeExtension_X0 | SrcB_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|-----------------------|---------|---------|---------|
| 000 | n | 000110110 | s | s | d |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x36
- S_X0 - Sbit
- Opcode_X0 - 0x0

Figure 4-181: rl in X0 Bit Descriptions

Bits 62:31, most significant field first:

| Opcode_X1 | S_X1 | RRROpcodeExtension_X1 | SrcB_X1 | SrcA_X1 | Dest_X1 |
|-----------|------|-----------------------|---------|---------|---------|
| 0001 | n | 000011100 | s | s | d |

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x1C
- S_X1 - Sbit
- Opcode_X1 - 0x1

Figure 4-182: rl in X1 Bit Descriptions

Figure 4-183: rl in Y0 Bit Descriptions

Figure 4-184: rl in Y1 Bit Descriptions
rli: Rotate Left Immediate Word

Syntax

rli Dest, SrcA, ShAmt

Example

rli r5, r6, 5

Description

Rotate the first source operand to the left by an immediate. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. The main processor ISA does not contain a rotate right.

Functional Description

rf[Dest] = (rf[SrcA] << (((UnsignedMachineWord) ShAmt) % WORD_SIZE)) | (((UnsignedMachineWord) rf[SrcA]) >> ((WORD_SIZE - (((UnsignedMachineWord) ShAmt) % WORD_SIZE)) % WORD_SIZE));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-185: rli in X0 Bit Descriptions

Figure 4-186: rli in X1 Bit Descriptions
Figure 4-187: rli in Y0 Bit Descriptions

Figure 4-188: rli in Y1 Bit Descriptions
**shl: Logical Shift Left Word**

**Syntax**

```shl Dest, SrcA, SrcB```

**Example**

```shl r5, r6, r7```

**Description**

Logically shift the first source operand to the left by the second source operand. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. Left shifts shift zeros into the low ordered bits in a word and are suitable to be used as unsigned multiplication by powers of 2.

**Functional Description**

```rf[Dest] = rf[SrcA] << (rf[SrcB] % WORD_SIZE);```
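
The modulo applied to the shift amount means that a shift by the word size behaves like a shift by zero. A minimal C sketch, assuming a 32-bit word and using the hypothetical name `shl_model`, shows this together with the multiplication interpretation:

```c
#include <stdint.h>

#define WORD_SIZE 32u  /* assumption: 32-bit machine word */

/* Logical shift left; the shift amount is reduced modulo the word size. */
static uint32_t shl_model(uint32_t src_a, uint32_t src_b)
{
    return src_a << (src_b % WORD_SIZE);
}
```

Shifting 3 left by 4 gives 48, the same as multiplying by 16, while a shift amount of 33 has the same effect as a shift amount of 1.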

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-189: shl in X0 Bit Descriptions

Figure 4-190: shl in X1 Bit Descriptions

Figure 4-191: shl in Y0 Bit Descriptions

Figure 4-192: shl in Y1 Bit Descriptions
shli: Logical Shift Left Immediate Word

Syntax
shli Dest, SrcA, ShAmt

Example
shli r5, r6, 5

Description
Logically shift the first source operand to the left by an immediate. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. Left shifts shift zeros into the low ordered bits in a word and are suitable to be used as unsigned multiplication by powers of 2.

Functional Description
rf[Dest] = rf[SrcA] << (((UnsignedMachineWord) ShAmt) % WORD_SIZE);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-193: shli in X0 Bit Descriptions

Figure 4-194: shli in X1 Bit Descriptions
Figure 4-195: shli in Y0 Bit Descriptions

Figure 4-196: shli in Y1 Bit Descriptions
shr: Logical Shift Right Word

Syntax
shr Dest, SrcA, SrcB

Example
shr r5, r6, r7

Description
Logically shift the first source operand to the right by the second source operand. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. Logical right shifts shift zeros into the high ordered bits in a word and are suitable to be used as unsigned integer division by powers of 2.

Functional Description
rf[Dest] = ((UnsignedMachineWord) rf[SrcA]) >> (rf[SrcB] % WORD_SIZE);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | RRROpcodeExtension_X0 | SrcB_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|-----------------------|---------|---------|---------|
| 000 | n | 001001000 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x48
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-197: shr in X0 Bit Descriptions

Bits 62:31, most significant field first:

| Opcode_X1 | S_X1 | RRROpcodeExtension_X1 | SrcB_X1 | SrcA_X1 | Dest_X1 |
|-----------|------|-----------------------|---------|---------|---------|
| 0001 | n | 000101001 | s | s | d |

Dest_X1 - Dest
SrcA_X1 - SrcA
SrcB_X1 - SrcB
RRROpcodeExtension_X1 - 0x29
S_X1 - Sbit
Opcode_X1 - 0x1

Figure 4-198: shr in X1 Bit Descriptions

Figure 4-199: shr in Y0 Bit Descriptions

- Dest_Y0 - Dest
- SrcA_Y0 - SrcA
- SrcB_Y0 - SrcB
- RRROpcodeExtension_Y0 - 0x2
- Opcode_Y0 - 0x4

Figure 4-200: shr in Y1 Bit Descriptions

- Dest_Y1 - Dest
- SrcA_Y1 - SrcA
- SrcB_Y1 - SrcB
- RRROpcodeExtension_Y1 - 0x2
- Opcode_Y1 - 0x4
shri: Logical Shift Right Immediate Word

Syntax
shri Dest, SrcA, ShAmt

Example
shri r5, r6, 5

Description
Logically shift the first source operand to the right by an immediate. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. Logical right shifts shift zeros into the high ordered bits in a word and are suitable to be used as unsigned integer division by powers of 2.

Functional Description
rf[Dest] = ((UnsignedMachineWord) rf[SrcA]) >> (((UnsignedMachineWord) ShAmt) % WORD_SIZE);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | UnShOpcodeExtension_X0 | ShAmt_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|------------------------|----------|---------|---------|
| 111 | n | 0000000111 | i | s | d |

**Figure 4-201: shri in X0 Bit Descriptions**

Bits 62:31, most significant field first:

| Opcode_X1 | S_X1 | UnShOpcodeExtension_X1 | ShAmt_X1 | SrcA_X1 | Dest_X1 |
|-----------|------|------------------------|----------|---------|---------|
| 1000 | n | 0000000111 | i | s | d |

**Figure 4-202: shri in X1 Bit Descriptions**
Figure 4-203: shri in Y0 Bit Descriptions

Figure 4-204: shri in Y1 Bit Descriptions
sra: Arithmetic Shift Right Word

Syntax
sra Dest, SrcA, SrcB

Example
sra r5, r6, r7

Description
Arithmetically shift the first source operand to the right by the second source operand. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. Arithmetic right shifts copy the high-order (sign) bit into the vacated high-order bits of the word.

Functional Description
rf[Dest] = ((SignedMachineWord) rf[SrcA]) >> (rf[SrcB] % WORD_SIZE);
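
In C, right-shifting a negative signed value is implementation-defined, so a portable model has to fill the high-order bits explicitly. The sketch below assumes a 32-bit word; `sra_model` and `shr_model` are hypothetical names used to contrast the arithmetic and logical forms.

```c
#include <stdint.h>

#define WORD_SIZE 32u  /* assumption: 32-bit machine word */

/* Logical shift right: zeros enter from the top. */
static uint32_t shr_model(uint32_t src_a, uint32_t src_b)
{
    return src_a >> (src_b % WORD_SIZE);
}

/* Arithmetic shift right: copies of the sign bit enter from the top. */
static uint32_t sra_model(uint32_t src_a, uint32_t src_b)
{
    uint32_t n = src_b % WORD_SIZE;
    uint32_t shifted = src_a >> n;
    if ((src_a & 0x80000000u) != 0u && n != 0u) {
        /* Set the n vacated high-order bits. */
        shifted |= ~(0xFFFFFFFFu >> n);
    }
    return shifted;
}
```

For the bit pattern 0xFFFFFFF0 (the two's complement encoding of -16), an arithmetic shift by 4 gives 0xFFFFFFFF (-1), whereas the logical shift gives 0x0FFFFFFF.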

Valid Pipelines

Encoding

Figure 4-205: sra in X0 Bit Descriptions

Figure 4-206: sra in X1 Bit Descriptions
Figure 4-207: sra in Y0 Bit Descriptions

Figure 4-208: sra in Y1 Bit Descriptions
srai: Arithmetic Shift Right Immediate Word

Syntax
srai Dest, SrcA, ShAmt

Example
srai r5, r6, 5

Description
Arithmetically shift the first source operand to the right by an immediate. If the shift amount is larger than the number of bits in a word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a word. Arithmetic right shifts copy the high-order (sign) bit into the vacated high-order bits of the word.

Functional Description
rf[Dest] = ((SignedMachineWord) rf[SrcA]) >> (((UnsignedMachineWord) ShAmt) % WORD_SIZE);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- ShAmt_X0 - ShAmt
- UnShOpcodeExtension_X0
- S_X0 - Sbit
- Opcode_X0

Figure 4-209: srai in X0 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- ShAmt_X1 - ShAmt
- UnShOpcodeExtension_X1
- S_X1 - Sbit
- Opcode_X1

Figure 4-210: srai in X1 Bit Descriptions

Figure 4-211: srai in Y0 Bit Descriptions

Figure 4-212: srai in Y1 Bit Descriptions
**tblidxb0: Table Index Byte 0**

**Syntax**

tblidxb0 Dest, SrcA

**Example**

tblidxb0 r5, r6

**Description**

Modify the table pointer stored in the destination operand to point to the word indexed by the contents of byte 0 of the source operand. The table is assumed to be aligned to a 1024 byte boundary, and bits 9:2 of the destination are replaced by the contents of bits 7:0 of the source operand.

**Functional Description**

rf[Dest] = ((rf[Dest] & ~(BYTE_MASK << 2)) | (((rf[SrcA] >> 0) & BYTE_MASK) << 2));
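
The pointer update can be modeled in C as follows. This is a sketch assuming a 32-bit word and 4-byte table entries; `tblidxb_model` is a hypothetical helper that takes the byte number as a parameter so that the same code covers bytes 0 through 3.

```c
#include <stdint.h>

#define BYTE_MASK 0xFFu

/*
 * Replace bits 9:2 of the table pointer with the selected byte of src_a.
 * byte_index selects byte 0 (bits 7:0) through byte 3 (bits 31:24).
 */
static uint32_t tblidxb_model(uint32_t dest, uint32_t src_a,
                              unsigned byte_index)
{
    uint32_t byte = (src_a >> (8u * byte_index)) & BYTE_MASK;
    return (dest & ~(BYTE_MASK << 2)) | (byte << 2);
}
```

Shifting the byte left by 2 scales the index by the 4-byte entry size, and the 1024-byte alignment guarantees that the 256 entries fit entirely within bits 9:2 of the pointer.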

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | UnShOpcodeExtension_X0 | UnOpcodeExtension_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|------------------------|----------------------|---------|---------|
| 111 | n | 0000001011 | 01000 | s | da |

**Figure 4-213: tblidxb0 in X0 Bit Descriptions**

Bits 30:27 and 19:0, most significant field first:

| Opcode_Y0 | UnShOpcodeExtension_Y0 | UnOpcodeExtension_Y0 | SrcA_Y0 | Dest_Y0 |
|-----------|------------------------|----------------------|---------|---------|
| 1101 | 101 | 01000 | s | da |

**Figure 4-214: tblidxb0 in Y0 Bit Descriptions**
### tblidxb1: Table Index Byte 1

**Syntax**

tblidxb1 Dest, SrcA

**Example**

tblidxb1 r5, r6

**Description**

Modify the table pointer stored in the destination operand to point to the word indexed by the contents of byte 1 of the source operand. The table is assumed to be aligned to a 1024 byte boundary, and bits 9:2 of the destination are replaced by the contents of bits 15:8 of the source operand.

**Functional Description**

rf[Dest] = ((rf[Dest] & ~(BYTE_MASK << 2)) | (((rf[SrcA] >> 8) & BYTE_MASK) << 2));

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

**Figure 4-215: tblidxb1 in X0 Bit Descriptions**

**Figure 4-216: tblidxb1 in Y0 Bit Descriptions**
**tblidxb2: Table Index Byte 2**

**Syntax**

```
tblidxb2 Dest, SrcA
```

**Example**

```
tblidxb2 r5, r6
```

**Description**

Modify the table pointer stored in the destination operand to point to the word indexed by the contents of byte 2 of the source operand. The table is assumed to be aligned to a 1024-byte boundary, and bits 9:2 of the destination are replaced by the contents of bits 23:16 of the source operand.

**Functional Description**

```
rf[Dest] =
   ((rf[Dest] & ~(BYTE_MASK << 2)) | (((rf[SrcA] >> 16) & BYTE_MASK) << 2));
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | UnShOpcodeExtension_X0 | UnOpcodeExtension_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|------------------------|----------------------|---------|---------|
| 111 | n | 0000001011 | 01010 | s | db |

- `Dest_X0`: Dest
- `SrcA_X0`: SrcA
- `UnOpcodeExtension_X0`: 0xA
- `UnShOpcodeExtension_X0`: 0xB
- `S_X0`: Sbit
- `Opcode_X0`: 0x7

**Figure 4-217: tblidxb2 in X0 Bit Descriptions**

Bits 30:27 and 19:0, most significant field first:

| Opcode_Y0 | UnShOpcodeExtension_Y0 | UnOpcodeExtension_Y0 | SrcA_Y0 | Dest_Y0 |
|-----------|------------------------|----------------------|---------|---------|
| 1101 | 101 | 01010 | s | db |

- `Dest_Y0`: Dest
- `SrcA_Y0`: SrcA
- `UnOpcodeExtension_Y0`: 0xA
- `UnShOpcodeExtension_Y0`: 0x5
- `Opcode_Y0`: 0xD

**Figure 4-218: tblidxb2 in Y0 Bit Descriptions**
**tblidxb3: Table Index Byte 3**

**Syntax**

```
tblidxb3 Dest, SrcA
```

**Example**

```
tblidxb3 r5, r6
```

**Description**

Modify the table pointer stored in the destination operand to point to the word indexed by the contents of byte 3 of the source operand. The table is assumed to be aligned to a 1024 byte boundary, and bits 9:2 of the destination are replaced by the contents of bits 31:24 of the source operand.

**Functional Description**

```
rf[Dest] =
   ((rf[Dest] & ~(BYTE_MASK << 2)) | (((rf[SrcA] >> 24) & BYTE_MASK) << 2));
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Bits 30:0, most significant field first:

| Opcode_X0 | S_X0 | UnShOpcodeExtension_X0 | UnOpcodeExtension_X0 | SrcA_X0 | Dest_X0 |
|-----------|------|------------------------|----------------------|---------|---------|
| 111 | n | 0000001011 | 01011 | s | ds |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- UnOpcodeExtension_X0 - 0xB
- UnShOpcodeExtension_X0 - 0xB
- S_X0 - Sbit
- Opcode_X0 - 0x7

**Figure 4-219: tblidxb3 in X0 Bit Descriptions**

Bits 30:27 and 19:0, most significant field first:

| Opcode_Y0 | UnShOpcodeExtension_Y0 | UnOpcodeExtension_Y0 | SrcA_Y0 | Dest_Y0 |
|-----------|------------------------|----------------------|---------|---------|
| 1101 | 101 | 01011 | s | ds |

- Dest_Y0 - Dest
- SrcA_Y0 - SrcA
- UnOpcodeExtension_Y0 - 0xB
- UnShOpcodeExtension_Y0 - 0x5
- Opcode_Y0 - 0xD

**Figure 4-220: tblidxb3 in Y0 Bit Descriptions**
xor: Exclusive Or Word

Syntax
xor Dest, SrcA, SrcB

Example
xor r5, r6, r7

Description
Compute the boolean XOR of two words.

Functional Description
rf[Dest] = rf[SrcA] ^ rf[SrcB];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Bits 30:0:

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x5E
- S_X0 - Sbit
- Opcode_X0 - 0x0

Figure 4-221: xor in X0 Bit Descriptions

Bits 62:31:

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x41
- S_X1 - Sbit
- Opcode_X1 - 0x1

Figure 4-222: xor in X1 Bit Descriptions
Figure 4-223: xor in Y0 Bit Descriptions

Figure 4-224: xor in Y1 Bit Descriptions
**xori: Exclusive Or Immediate Word**

**Syntax**

\[ \text{xori Dest, SrcA, Imm8} \]

**Example**

\[ \text{xori r5, r6, 5} \]

**Description**

Compute the boolean XOR of a word and a sign-extended immediate.

**Functional Description**

\[ \text{rf}[\text{Dest}] = \text{rf}[\text{SrcA}] \ xor \ \text{signExtend8(Imm8)}; \]
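The functional description above can be modelled in a few lines. This is a minimal sketch, assuming a 32-bit word and a 64-entry register file; the Python names (`sign_extend8`, `xori`, `rf`) are illustrative stand-ins, not part of the architecture definition.

```python
MASK32 = 0xFFFFFFFF  # assumption: a word is 32 bits wide


def sign_extend8(imm8: int) -> int:
    """Sign-extend the low 8 bits of imm8 to a 32-bit word."""
    imm8 &= 0xFF
    return (imm8 | 0xFFFFFF00) if imm8 & 0x80 else imm8


def xori(rf: list, dest: int, src_a: int, imm8: int) -> None:
    """Model of rf[Dest] = rf[SrcA] xor signExtend8(Imm8)."""
    rf[dest] = (rf[src_a] ^ sign_extend8(imm8)) & MASK32


rf = [0] * 64            # assumption: 64 general-purpose registers
rf[6] = 0x000000F0
xori(rf, 5, 6, 5)        # xori r5, r6, 5
positive = rf[5]
xori(rf, 5, 6, -1)       # a negative immediate flips every bit of r6
negative = rf[5]
```

Because the immediate is sign-extended before the XOR, an immediate of -1 acts as a bitwise NOT of the source word.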

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
0101 n 0000010 i s d
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- Imm8_X0 - Imm8
- ImmOpcodeExtension_X0 - 0x2
- S_X0 - Sbit
- Opcode_X0 - 0x5

*Figure 4-225: xori in X0 Bit Descriptions*

```
0110 n 0010101 i s d
```

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm8_X1 - Imm8
- ImmOpcodeExtension_X1 - 0x15
- S_X1 - Sbit
- Opcode_X1 - 0x6

*Figure 4-226: xori in X1 Bit Descriptions*
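As an illustration of how the X1 bit pattern for `xori` fits together, the sketch below packs the fields most-significant first. The field widths (4, 1, 7, 8, 6, 6) are an assumption inferred from the bit pattern and the 32-bit span of bits 62..31; the function name `encode_xori_x1` is hypothetical.

```python
def encode_xori_x1(dest: int, src_a: int, imm8: int, s_bit: int = 0) -> int:
    """Pack the xori X1 fields MSB-first into a 32-bit slot (bits 62..31).

    Field widths are assumed, not taken from the text:
    Opcode 4, S 1, ImmOpcodeExtension 7, Imm8 8, SrcA 6, Dest 6.
    """
    if not (0 <= dest < 64 and 0 <= src_a < 64):
        raise ValueError("register index out of range")
    if not -128 <= imm8 <= 127:
        raise ValueError("Imm8 out of range")
    fields = [
        (0x6, 4),           # Opcode_X1
        (s_bit & 1, 1),     # S_X1
        (0x15, 7),          # ImmOpcodeExtension_X1
        (imm8 & 0xFF, 8),   # Imm8_X1
        (src_a, 6),         # SrcA_X1
        (dest, 6),          # Dest_X1
    ]
    word = 0
    for value, width in fields:
        word = (word << width) | value
    return word


slot = encode_xori_x1(dest=5, src_a=6, imm8=5)   # xori r5, r6, 5
bits = format(slot, "032b")
```

Under these assumptions the leading twelve bits of `bits` reproduce the `0110 n 0010101` prefix shown in the figure, with `n` equal to 0.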
4.1.9 Memory Instructions

The following sections provide detailed descriptions of memory instructions listed alphabetically.

- `lb`: Load Byte
- `lb_u`: Load Byte Unsigned
- `lbadd`: Load Byte and Add
- `lbadd_u`: Load Byte Unsigned and Add
- `lh`: Load Half Word
- `lh_u`: Load Half Word Unsigned
- `lhadd`: Load Half Word and Add
- `lhadd_u`: Load Half Word Unsigned and Add
- `lw`: Load Word
- `lw_na`: Load Word No Alignment Trap
- `lwadd`: Load Word and Add
- `lwadd_na`: Load Word No Alignment Trap and Add
- `sb`: Store Byte
- `sbadd`: Store Byte and Add
- `sh`: Store Half Word
- `shadd`: Store Half Word and Add
- `sw`: Store Word
- `swadd`: Store Word and Add
- `tns`: Test and Set Word
**lb: Load Byte**

**Syntax**

\[
\text{lb Dest, Src}
\]

**Example**

\[
\text{lb r5, r6}
\]

**Description**

Load a byte from memory into the destination register. The address of the value to be loaded is read from the source operand. The value read from memory is sign-extended to a complete word.

**Functional Description**

\[
\text{rf[Dest]} = \text{signExtend8(memoryReadByte(rf[Src]))};
\]

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

![Figure 4-227: lb in X1 Bit Descriptions](image)

![Figure 4-228: lb in Y2 Bit Descriptions](image)
**lb_u: Load Byte Unsigned**

**Syntax**

lb_u Dest, Src

**Example**

lb_u r5, r6

**Description**

Load a byte from memory into the destination register. The address of the value to be loaded is read from the source operand. The value read from memory is zero-extended to a complete word.

**Functional Description**

\[ rf[\text{Dest}] = \text{memoryReadByte}(rf[\text{Src}]); \]
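The only difference between `lb` and `lb_u` is how the loaded byte is widened. The following sketch contrasts the two, assuming a 32-bit word and modelling memory as a Python `bytearray`; the helper names are illustrative only.

```python
def sign_extend8(value: int) -> int:
    """Sign-extend the low 8 bits of value to a 32-bit word."""
    value &= 0xFF
    return (value | 0xFFFFFF00) if value & 0x80 else value


def lb(rf, memory, dest, src):
    """Model of rf[Dest] = signExtend8(memoryReadByte(rf[Src]))."""
    rf[dest] = sign_extend8(memory[rf[src]])


def lb_u(rf, memory, dest, src):
    """Model of rf[Dest] = memoryReadByte(rf[Src]) (zero-extended)."""
    rf[dest] = memory[rf[src]] & 0xFF


memory = bytearray(16)
memory[4] = 0x80             # a byte with its top bit set
rf = [0] * 64
rf[6] = 4                    # r6 holds the address

lb(rf, memory, 5, 6)         # lb r5, r6
signed_result = rf[5]
lb_u(rf, memory, 5, 6)       # lb_u r5, r6
unsigned_result = rf[5]
```

For bytes below 0x80 the two instructions produce the same register value; they differ only when bit 7 of the loaded byte is set.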

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

![Figure 4-229: lb_u in X1 Bit Descriptions](image)

![Figure 4-230: lb_u in Y2 Bit Descriptions](image)
lbadd: Load Byte and Add

Syntax
lbadd Dest, SrcA, Imm8

Example
lbadd r5, r6, 5

Description
Load a byte from memory into the destination register. The address of the value to be loaded is read from the source operand. The value read from memory is sign-extended to a complete word. Add the signed immediate argument to the address register.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description
\[
\begin{align*}
rf[Dest] &= \text{signExtend8(memoryReadByte(rf[SrcA]))}; \\
rf[SrcA] &= rf[SrcA] + \text{signExtend8(Imm8)};
\end{align*}
\]
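The post-increment behaviour described above is convenient for walking a byte array. A minimal sketch, assuming a 32-bit word and a `bytearray` standing in for memory (all Python names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def sign_extend8(value: int) -> int:
    """Sign-extend the low 8 bits of value to a 32-bit word."""
    value &= 0xFF
    return (value | 0xFFFFFF00) if value & 0x80 else value


def to_signed32(word: int) -> int:
    """Reinterpret a 32-bit register value as a signed Python int."""
    return word - (1 << 32) if word & 0x80000000 else word


def lbadd(rf, memory, dest, src_a, imm8):
    """Load the byte at rf[SrcA], then add the immediate to rf[SrcA]."""
    rf[dest] = sign_extend8(memory[rf[src_a]])
    rf[src_a] = (rf[src_a] + sign_extend8(imm8)) & MASK32


memory = bytearray([1, 2, 0xFF, 4])   # 0xFF reads back as -1
rf = [0] * 64
rf[6] = 0                             # r6 points at the first byte
total = 0
for _ in range(len(memory)):
    lbadd(rf, memory, 5, 6, 1)        # lbadd r5, r6, 1
    total += to_signed32(rf[5])
```

Each call both delivers a byte and advances the pointer, so the loop needs no separate address increment.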

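Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding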

Figure 4-231: lbadd in X1 Bit Descriptions
**lbadd_u: Load Byte Unsigned and Add**

**Syntax**

lbadd_u Dest, SrcA, Imm8

**Example**

lbadd_u r5, r6, 5

**Description**

Load a byte from memory into the destination register. The address of the value to be loaded is read from the source operand. The value read from memory is 0-extended to a complete word. Add the signed immediate argument to the address register.

NOTE: This instruction is only supported in the TILEPro family of products.

**Functional Description**

rf[Dest] = memoryReadByte(rf[SrcA]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31
  0110  n  0010111  i  ds  d

Dest_X1 - Dest
SrcA_X1 - SrcA
Imm8_X1 - Imm8
ImmOpcodeExtension_X1 - 0x17
S_X1 - Sbit
Opcode_X1 - 0x6
```

*Figure 4-232: lbadd_u in X1 Bit Descriptions*
**lh: Load Half Word**

**Syntax**

lh Dest, Src

**Example**

lh r5, r6

**Description**

Load a half word from memory into the destination register. The address of the value to be loaded is read from the source operand. This load only operates for half word aligned loads. Unaligned memory access causes an Unaligned Data Reference interrupt. The value read from memory is sign-extended to a complete word.

**Functional Description**

\[
rf[Dest] = \text{signExtend16}(\text{memoryReadHalfWord}(rf[Src]));
\]
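The alignment rule for half word loads can be illustrated with a small model. This sketch assumes little-endian byte order and a 32-bit word, and uses a Python exception as a stand-in for the Unaligned Data Reference interrupt; none of these names come from the manual.

```python
class UnalignedDataReference(Exception):
    """Stand-in for the Unaligned Data Reference interrupt."""


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value to a 32-bit word."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def lh(rf, memory, dest, src):
    """Model of rf[Dest] = signExtend16(memoryReadHalfWord(rf[Src]))."""
    addr = rf[src]
    if addr % 2:
        raise UnalignedDataReference(hex(addr))
    half = int.from_bytes(memory[addr:addr + 2], "little")  # assumed byte order
    rf[dest] = sign_extend16(half)


memory = bytearray([0x34, 0x12, 0xFE, 0xFF])
rf = [0] * 64
rf[6] = 2
lh(rf, memory, 5, 6)          # aligned: loads 0xFFFE and sign-extends it
aligned_result = rf[5]

rf[6] = 1                     # odd address
try:
    lh(rf, memory, 5, 6)
    trapped = False
except UnalignedDataReference:
    trapped = True
```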

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-233: lh in X1 Bit Descriptions

Figure 4-234: lh in Y2 Bit Descriptions
**lh_u: Load Half Word Unsigned**

**Syntax**

lh_u Dest, Src

**Example**

lh_u r5, r6

**Description**

Load a half word from memory into the destination register. The address of the value to be loaded is read from the source operand. This load only operates for half word aligned loads. Unaligned memory access causes an Unaligned Data Reference interrupt. The value read from memory is zero-extended to a complete word.

**Functional Description**

rf[Dest] = memoryReadHalfWord(rf[Src]);

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

![Figure 4-235: lh_u in X1 Bit Descriptions](image)

![Figure 4-236: lh_u in Y2 Bit Descriptions](image)
lhadd: Load Half Word and Add

Syntax
lhadd Dest, SrcA, Imm8

Example
lhadd r5, r6, 5

Description
Load a half word from memory into the destination register. The address of the value to be loaded is read from the source operand. This load only operates for half word aligned loads. Unaligned memory access causes an Unaligned Data Reference interrupt. The value read from memory is sign-extended to a complete word. Add the signed immediate argument to the address register.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description
rf[Dest] = signExtend16(memoryReadHalfWord(rf[SrcA]));
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-237: lhadd in X1 Bit Descriptions
**lhadd_u: Load Half Word Unsigned and Add**

**Syntax**

lhadd_u Dest, SrcA, Imm8

**Example**

lhadd_u r5, r6, 5

**Description**

Load a half word from memory into the destination register. The address of the value to be loaded is read from the source operand. This load only operates for half word aligned loads. Unaligned memory access causes an Unaligned Data Reference interrupt. The value read from memory is zero-extended to a complete word. Add the signed immediate argument to the address register.

*NOTE*: This instruction is only supported in the TILEPro family of products.

**Functional Description**

rf[Dest] = memoryReadHalfWord(rf[SrcA]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

*Figure 4-238: lhadd_u in X1 Bit Descriptions*
lw: Load Word

Syntax
lw Dest, Src

Example
lw r5, r6

Description
Load a word from memory into the destination register. The address of the value to be loaded is read from the source operand. This load only operates for word aligned loads. Unaligned memory access causes an Unaligned Data Reference interrupt.

Functional Description
rf[Dest] = memoryReadWord(rf[Src]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 1000 | n | 0000001011 | 01110 | s | d |

Figure 4-239: lw in X1 Bit Descriptions

<table>
<thead>
<tr>
<th>58</th>
<th>57</th>
<th>56</th>
<th>55</th>
<th>54</th>
<th>53</th>
<th>52</th>
<th>51</th>
</tr>
</thead>
<tbody>
<tr>
<td>100</td>
<td>s</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4-240: lw in Y2 Bit Descriptions
lw_na: Load Word No Alignment Trap

Syntax

```
lw_na Dest, Src
```

Example

```
lw_na r5, r6
```

Description

Load a word from memory into the destination register. The address of the value to be loaded is read from the source operand and the bottom two bits are set to 0. No Unaligned Data Reference interrupts are caused by this instruction.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description

```
rf[Dest] = memoryReadWordNA(rf[Src]);
```
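The effect of clearing the bottom two address bits can be sketched as follows. The model assumes little-endian byte order and a 32-bit word; `lw_na` here is an illustrative Python function, not the hardware definition of `memoryReadWordNA`.

```python
def lw_na(rf, memory, dest, src):
    """Load the aligned word that contains the byte addressed by rf[Src]."""
    addr = rf[src] & 0xFFFFFFFC          # bottom two bits forced to 0
    rf[dest] = int.from_bytes(memory[addr:addr + 4], "little")  # assumed byte order


memory = bytearray(range(16))            # byte i holds the value i
rf = [0] * 64

rf[6] = 7                                # unaligned address inside word 4..7
lw_na(rf, memory, 5, 6)                  # lw_na r5, r6
from_unaligned = rf[5]

rf[6] = 4                                # the aligned address of the same word
lw_na(rf, memory, 5, 6)
from_aligned = rf[5]
```

Any of the four byte addresses inside a word therefore returns the same aligned word, and no interrupt is raised.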

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

![Figure 4-241: lw_na in X1 Bit Descriptions]
**lwadd: Load Word and Add**

**Syntax**

 lwadd Dest, SrcA, Imm8

**Example**

 lwadd r5, r6, 5

**Description**

Load a word from memory into the destination register. The address of the value to be loaded is read from the source operand. This load only operates for word aligned loads. Unaligned memory access causes an Unaligned Data Reference interrupt. Add the signed immediate argument to the address register.

NOTE: This instruction is only supported in the TILEPro family of products.

**Functional Description**

rf[Dest] = memoryReadWord(rf[SrcA]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

|  62 |  61 |  60 |  59 |  58 |  57 |  56 |  55 |  54 |  53 |  52 |  51 |  50 |  49 |  48 |  47 |  46 |  45 |  44 |  43 |  42 |  41 |  40 |  39 |  38 |  37 |  36 |  35 |  34 |  33 |  32 |  31 |
|-----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0110 | n | 0011010 | i | ds | d |

**Figure 4-242: lwadd in X1 Bit Descriptions**
**lwadd_na: Load Word No Alignment Trap and Add**

**Syntax**

```
lwadd_na Dest, SrcA, Imm8
```

**Example**

```
lwadd_na r5, r6, 5
```

**Description**

Load a word from memory into the destination register. The address of the value to be loaded is read from the source operand and the bottom two bits are set to 0. No Unaligned Data Reference interrupts are caused by this instruction. Add the signed immediate argument to the address register.

**NOTE:** This instruction is only supported in the TILEPro family of products.

**Functional Description**

```
rf[Dest] = memoryReadWordNA(rf[SrcA]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31
  0110  n  0011011  i  ds  d

Dest_X1 - Dest
SrcA_X1 - SrcA
Imm8_X1 - Imm8
ImmOpcodeExtension_X1 - 0x1B
S_X1 - Sbit
Opcode_X1 - 0x6
```

*Figure 4-243: lwadd_na in X1 Bit Descriptions*
sb: Store Byte

**Syntax**

`sb SrcA, SrcB`

**Example**

`sb r5, r6`

**Description**

Store a byte from the second source register into memory at the address held in the first source register.

**Functional Description**

```c
memoryWriteByte(rf[SrcA], rf[SrcB]);
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

```
0001 0 00100000  s  s  000000
```

- `Dest_X1`: Reserved 0x0
- `SrcA_X1`: SrcA
- `SrcB_X1`: SrcB
- `RRROpcodeExtension_X1`: 0x20
- `S_X1`: Reserved 0x0
- `Opcode_X1`: 0x1

*Figure 4-244: sb in X1 Bit Descriptions*

```
101 s
```

- `SrcBDest_Y2`: SrcB
- `SrcA_Y2[0:0]`: SrcA[0:0]
- `Opcode_Y2`: 0x5

*Figure 4-245: sb in Y2 Bit Descriptions*
sbadd: Store Byte and Add

Syntax

sbadd SrcA, SrcB, Imm8

Example

sbadd r5, r6, 5

Description

Store a byte from the second source register into memory at the address held in the first source register. Add the signed immediate argument to the address register.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description

memoryWriteByte(rf[SrcA], rf[SrcB]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

0110 0 0011100 i s ds i

- Dest_Imm8_X1[5:0] - Imm8[5:0]
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- Dest_Imm8_X1[7:6] - Imm8[7:6]
- ImmOpcodeExtension_X1 - 0x1C
- S_X1 - Reserved 0x0
- Opcode_X1 - 0x6

Figure 4-246: sbadd in X1 Bit Descriptions
sh: Store Half Word

Syntax
sh SrcA, SrcB

Example
sh r5, r6

Description
Store a half word from the second source register into memory at the address held in the first source register. This store only operates for half word aligned stores. Unaligned memory access causes an Unaligned Data Reference interrupt.

Functional Description
memoryWriteHalfWord(rf[SrcA], rf[SrcB]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

- Dest_X1 - Reserved 0x0
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x2A
- S_X1 - Reserved 0x0
- Opcode_X1 - 0x1

Figure 4-247: sh in X1 Bit Descriptions

- SrcBDest_Y2 - SrcB
- SrcA_Y2[0:0] - SrcA[0:0]
- Opcode_Y2 - 0x6

Figure 4-248: sh in Y2 Bit Descriptions
**shadd: Store Half Word and Add**

**Syntax**

```
shadd SrcA, SrcB, Imm8
```

**Example**

```
shadd r5, r6, 5
```

**Description**

Store a half word from the second source register into memory at the address held in the first source register. This store only operates for half word aligned stores. Unaligned memory access causes an Unaligned Data Reference interrupt. Add the signed immediate argument to the address register.

**NOTE:** This instruction is only supported in the TILEPro family of products.

**Functional Description**

```c
memoryWriteHalfWord(rf[SrcA], rf[SrcB]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
0110 0 0011101 i s ds i
```

- **Dest_Imm8_X1[5:0]** - Imm8[5:0]
- **SrcA_X1** - SrcA
- **SrcB_X1** - SrcB
- **Dest_Imm8_X1[7:6]** - Imm8[7:6]
- **ImmOpcodeExtension_X1** - 0x1D
- **S_X1** - Reserved 0x0
- **Opcode_X1** - 0x6

*Figure 4-249: shadd in X1 Bit Descriptions*
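In the store-and-add encodings the 8-bit immediate is carried in two pieces, `Dest_Imm8_X1[5:0]` and `Dest_Imm8_X1[7:6]`. The sketch below shows one way to split and rejoin a signed immediate along that boundary; the function names are hypothetical and the code does not model where the pieces sit in the instruction word.

```python
def split_imm8(imm8: int):
    """Return (Imm8[5:0], Imm8[7:6]) for a signed 8-bit immediate."""
    if not -128 <= imm8 <= 127:
        raise ValueError("Imm8 out of range")
    raw = imm8 & 0xFF                  # two's-complement byte
    return raw & 0x3F, raw >> 6


def join_imm8(low6: int, high2: int) -> int:
    """Rebuild the signed immediate from its two fields."""
    raw = ((high2 & 0x3) << 6) | (low6 & 0x3F)
    return raw - 0x100 if raw & 0x80 else raw


small = split_imm8(5)                  # fits entirely in the low field
negative = split_imm8(-2)              # sign bits land in the high field
round_trip_ok = all(
    join_imm8(*split_imm8(value)) == value for value in range(-128, 128)
)
```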
sw: Store Word

Syntax
sw SrcA, SrcB

Example
sw r5, r6

Description
Store a word from the second source register into memory at the address held in the first source register. This store only operates for word aligned stores. Unaligned memory access causes an Unaligned Data Reference interrupt.

Functional Description
memoryWriteWord(rf[SrcA], rf[SrcB]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

![Figure 4-250: sw in X1 Bit Descriptions](image)

![Figure 4-251: sw in Y2 Bit Descriptions](image)
swadd: Store Word and Add

Syntax

swadd SrcA, SrcB, Imm8

Example

swadd r5, r6, 5

Description

Store a word from the second source register into memory at the address held in the first source register. This store only operates for word aligned stores. Unaligned memory access causes an Unaligned Data Reference interrupt. Add the signed immediate argument to the address register.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description

memoryWriteWord(rf[SrcA], rf[SrcB]);
rf[SrcA] = rf[SrcA] + signExtend8(Imm8);
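Paired with `lwadd`, the post-incrementing store makes a compact word-copy loop. The following is a sketch under stated assumptions: 32-bit words, little-endian byte order, and a Python exception standing in for the Unaligned Data Reference interrupt. All function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


class UnalignedDataReference(Exception):
    """Stand-in for the Unaligned Data Reference interrupt."""


def sign_extend8(value: int) -> int:
    value &= 0xFF
    return (value | 0xFFFFFF00) if value & 0x80 else value


def check_word_aligned(addr: int) -> None:
    if addr % 4:
        raise UnalignedDataReference(hex(addr))


def lwadd(rf, memory, dest, src_a, imm8):
    """Load the word at rf[SrcA], then add the immediate to rf[SrcA]."""
    addr = rf[src_a]
    check_word_aligned(addr)
    rf[dest] = int.from_bytes(memory[addr:addr + 4], "little")
    rf[src_a] = (addr + sign_extend8(imm8)) & MASK32


def swadd(rf, memory, src_a, src_b, imm8):
    """Store rf[SrcB] at rf[SrcA], then add the immediate to rf[SrcA]."""
    addr = rf[src_a]
    check_word_aligned(addr)
    memory[addr:addr + 4] = rf[src_b].to_bytes(4, "little")
    rf[src_a] = (addr + sign_extend8(imm8)) & MASK32


memory = bytearray(range(32))
rf = [0] * 64
rf[6], rf[7] = 0, 16                 # r6 = source pointer, r7 = destination
for _ in range(3):
    lwadd(rf, memory, 5, 6, 4)       # lwadd r5, r6, 4
    swadd(rf, memory, 7, 5, 4)       # swadd r7, r5, 4
```

An immediate of 4 keeps both pointers word aligned from one iteration to the next, so the alignment check never fires in this loop.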

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

0110 0 0011110 i s ds i

- Dest_Imm8_X1[5:0] - Imm8[5:0]
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- Dest_Imm8_X1[7:6] - Imm8[7:6]
- ImmOpcodeExtension_X1 - 0x1E
- S_X1 - Reserved 0x0
- Opcode_X1 - 0x6

Figure 4-252: swadd in X1 Bit Descriptions
**tns: Test and Set Word**

**Syntax**

```plaintext
tns Dest, Src
```

**Example**

```plaintext
tns r5, r6
```

**Description**

Load a word from memory into the destination register and atomically write the value one (1) into the addressed memory location. The address of the value to be loaded then written to is read from the source operand. This instruction only operates for word aligned addresses. Unaligned memory access causes an Unaligned Data Reference interrupt.

**Functional Description**

```plaintext
rf[Dest] = memoryReadWord(rf[Src]);
memoryWriteWord(rf[Src] & WORD_ADDR_MASK, 0x00000001);
```
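A common use of a test-and-set primitive is a simple lock. The sketch below is single-threaded, so it shows only the read-old-value, write-one behaviour and not the hardware atomicity. The convention that 0 means "free", the `WordMemory` class, and the `try_lock`/`unlock` helpers are all assumptions made for illustration.

```python
class WordMemory:
    """Word-addressed memory; every location reads as 0 until written."""

    def __init__(self):
        self._words = {}

    def tns(self, addr: int) -> int:
        """Return the old word and leave 1 behind (atomic on real hardware)."""
        if addr % 4:
            raise ValueError("Unaligned Data Reference")
        old = self._words.get(addr, 0)
        self._words[addr] = 1
        return old

    def sw(self, addr: int, value: int) -> None:
        """Plain word store, used here to release the lock."""
        if addr % 4:
            raise ValueError("Unaligned Data Reference")
        self._words[addr] = value & 0xFFFFFFFF


def try_lock(mem: WordMemory, lock_addr: int) -> bool:
    """The caller owns the lock only if the old value was 0."""
    return mem.tns(lock_addr) == 0


def unlock(mem: WordMemory, lock_addr: int) -> None:
    mem.sw(lock_addr, 0)


mem = WordMemory()
LOCK = 0x100
first = try_lock(mem, LOCK)     # lock was free, now held
second = try_lock(mem, LOCK)    # already held, attempt fails
unlock(mem, LOCK)
third = try_lock(mem, LOCK)     # free again after the release
```

A failed attempt is harmless because it rewrites the value 1 that was already present.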

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

*Figure 4-253: tns in X1 Bit Descriptions*
4.1.10 Memory Maintenance Instructions

The following sections provide detailed descriptions of memory maintenance instructions listed alphabetically:

- **dtlbpr**: Data TLB Probe
- **finv**: Flush and Invalidate Cache Line
- **flush**: Flush Cache Line
- **inv**: Invalidate Cache Line
- **mf**: Memory Fence
- **wh64**: Write Hint 64 Bytes
dtlbpr: Data TLB Probe

Syntax

dtlbpr SrcA

Example

dtlbpr r5

Description

Probe the Data TLB and write the result into SPR DTLB_MATCH_0, unary-encoded with one bit for each matching entry. This probe uses the data CPL and ignores the D_ASID.

Functional Description

dtlbProbe(rf[SrcA]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>00000</td>
<td></td>
<td>o00001011</td>
<td>00010</td>
<td>s</td>
</tr>
</tbody>
</table>

Figure 4-254: dtlbpr in X1 Bit Descriptions
finv: Flush and Invalidate Cache Line

Syntax
finv SrcA

Example
finv r5

Description
Flushes and invalidates the cache line in the data cache that contains the address stored in the source operand. If a cache line that contains the address is not in the cache, this instruction has no effect on the cache contents. The line size that is flushed and invalidated is at minimum 16B. An implementation is free to flush and invalidate a larger region.

Functional Description
flushAndInvalidateCacheLine(rf[SrcA]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-255: finv in X1 Bit Descriptions
flush: Flush Cache Line

Syntax
flush SrcA

Example
flush r5

Description
Flushes the cache line in the data cache that contains the address stored in the source operand. If a cache line that contains the address is not in the cache, this instruction has no effect. If a cache line that contains the address is not dirty in the cache, this instruction has no effect. The line size that is flushed is at minimum 16B. An implementation is free to flush a larger region.

Functional Description
flushCacheLine(rf[SrcA]);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-256: flush in X1 Bit Descriptions
**inv: Invalidate Cache Line**

**Syntax**

```plaintext
inv SrcA
```

**Example**

```plaintext
inv r5
```

**Description**

Invalidates the cache line in the data cache that contains the address stored in the source operand. If a cache line that contains the address is not in the cache, this instruction has no effect on the cache contents. This instruction causes an access violation if the current privilege level is not allowed to write to the specified cache line. The line size that is invalidated is at minimum 16B. An implementation is free to invalidate a larger region.

**Functional Description**

```plaintext
invalidateCacheLine(rf[SrcA] & BYTE_16_ADDR_MASK);
```
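The address masking in the functional description can be sketched as follows. The numeric value of `BYTE_16_ADDR_MASK` is an assumption (a 32-bit address with the low four bits cleared), and `lines_touching` is a hypothetical helper that lists the minimum-size 16B lines a byte range overlaps.

```python
LINE_BYTES = 16                                      # minimum line size from the text
BYTE_16_ADDR_MASK = 0xFFFFFFFF & ~(LINE_BYTES - 1)   # assumed value: 0xFFFFFFF0


def line_base(addr: int) -> int:
    """Base address of the 16-byte line containing addr."""
    return addr & BYTE_16_ADDR_MASK


def lines_touching(start: int, length: int) -> list:
    """Base addresses of every 16-byte line overlapped by [start, start+length)."""
    if length <= 0:
        return []
    first = line_base(start)
    last = line_base(start + length - 1)
    return list(range(first, last + 1, LINE_BYTES))


base = line_base(0x1234)
lines = lines_touching(0x1008, 0x20)   # bytes 0x1008 through 0x1027
```

Since an implementation may invalidate a larger region than 16 bytes, a loop over these base addresses is a conservative upper bound on the number of instructions needed, not an exact count.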

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

![Figure 4-257: inv in X1 Bit Descriptions]

mf: Memory Fence

Syntax

```
mf
```

Example

```
mf
```

Description

The memory fence instruction is used to establish ordering between prior memory operations and subsequent instructions. The exact order that is established depends on the page attributes of the pages that the memory operations are targeting. For more information refer to Memory and Cache Architecture.

Functional Description

```
memoryFence();
```

Valid Pipelines

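<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>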

Encoding

```
62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31
1000 0 0000001011 01111 000000 000000

Dest_X1 - Reserved 0x0
SrcA_X1 - Reserved 0x0
UnOpcodeExtension_X1 - 0xF
UnShOpcodeExtension_X1 - 0xB
S_X1 - Reserved 0x0
Opcode_X1 - 0x8
```

Figure 4-258: mf in X1 Bit Descriptions
wh64: Write Hint 64 Bytes

Syntax

wh64 SrcA

Example

wh64 r5

Description

Hint that software intends to write every byte of the specified 64B cache line before reading it. The processor may use this hint to allocate the 64B line into the cache without fetching the current contents from main memory. The processor may set the contents of the block to any value that does not introduce a security hole.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description

writeHint64Cache(rf[SrcA] & BYTE_64_ADDR_MASK);
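Because the hint promises that every byte of the 64B line will be written before it is read, only lines that lie entirely inside the buffer being filled are safe candidates. A minimal sketch of selecting those lines, with a hypothetical helper name and plain integer addresses:

```python
LINE_BYTES = 64


def fully_written_lines(start: int, length: int) -> list:
    """Base addresses of 64-byte lines lying entirely inside [start, start+length)."""
    first = (start + LINE_BYTES - 1) & ~(LINE_BYTES - 1)   # round start up
    end = (start + length) & ~(LINE_BYTES - 1)             # round end down
    return list(range(first, end, LINE_BYTES))


aligned = fully_written_lines(0x1000, 256)    # buffer covers four whole lines
ragged = fully_written_lines(0x1010, 0x80)    # partial lines at both ends
too_small = fully_written_lines(0x1010, 0x20) # no whole line is covered
```

Lines that are only partly covered are skipped in this sketch, since their remaining bytes would not be rewritten and the hint permits the processor to set the block contents to arbitrary values.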

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62 | 61 | 60 | 59 | 58 | 57 | 56 | 55 | 54 | 53 | 52 | 51 | 50 | 49 | 48 | 47 | 46 | 45 | 44 | 43 | 42 | 41 | 40 | 39 | 38 | 37 | 36 | 35 | 34 | 33 | 32 | 31 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 1000 | 0 | 0000001011 | 10111 | s | 000000 |

Dest_X1 - Reserved 0x0
SrcA_X1 - SrcA
UnOpcodeExtension_X1 - 0x17
UnShOpcodeExtension_X1 - 0xB
S_X1 - Reserved 0x0
Opcode_X1 - 0x8

Figure 4-259: wh64 in X1 Bit Descriptions
4.1.11 Multiply Instructions

The following sections provide detailed descriptions of multiply instructions listed alphabetically.

- `mulhh_ss`: Multiply High Signed High Signed Half Word
- `mulhh_su`: Multiply High Signed High Unsigned Half Word
- `mulhh_uu`: Multiply High Unsigned High Unsigned Half Word
- `mulhha_ss`: Multiply Accumulate High Signed High Signed Half Word
- `mulhha_su`: Multiply Accumulate High Signed High Unsigned Half Word
- `mulhha_uu`: Multiply Accumulate High Unsigned High Unsigned Half Word
- `mulhhsa_uu`: Multiply Shift Accumulate High Unsigned High Unsigned Half Word
- `mulhl_ss`: Multiply High Signed Low Signed Half Word
- `mulhl_su`: Multiply High Signed Low Unsigned Half Word
- `mulhl_us`: Multiply High Unsigned Low Signed Half Word
- `mulhl_uu`: Multiply High Unsigned Low Unsigned Half Word
- `mulhla_ss`: Multiply Accumulate High Signed Low Signed Half Word
- `mulhla_su`: Multiply Accumulate High Signed Low Unsigned Half Word
- `mulhla_us`: Multiply Accumulate High Unsigned Low Signed Half Word
- `mulhla_uu`: Multiply Accumulate High Unsigned Low Unsigned Half Word
- `mulhlsa_uu`: Multiply Shift Accumulate High Unsigned Low Unsigned Half Word
- `mulll_ss`: Multiply Low Signed Low Signed Half Word
- `mulll_su`: Multiply Low Signed Low Unsigned Half Word
- `mulll_uu`: Multiply Low Unsigned Low Unsigned Half Word
- `mullla_ss`: Multiply Accumulate Low Signed Low Signed Half Word
- `mullla_su`: Multiply Accumulate Low Signed Low Unsigned Half Word
- `mullla_uu`: Multiply Accumulate Low Unsigned Low Unsigned Half Word
- `mulllsa_uu`: Multiply Shift Accumulate Low Unsigned Low Unsigned Half Word
mulhh_ss: Multiply High Signed High Signed Half Word

**Syntax**

mulhh_ss Dest, SrcA, SrcB

**Example**

mulhh_ss r5, r6, r7

**Description**

Multiply the high half word of the first operand by the high half word of the second operand. The result returned is a full word in length. The input operands are interpreted as signed half words.

**Functional Description**

rf[Dest] =
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) *
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcB])));
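
As an illustration only, the functional description above can be modeled in C. The sketch assumes a 32-bit machine word; `sign_extend16` and `mulhh_ss_model` are hypothetical helper names, not part of the architecture.

```c
#include <stdint.h>

/* Sign-extend a 16-bit value held in h (0..0xFFFF) without relying on
 * implementation-defined narrowing conversions. */
static int32_t sign_extend16(uint32_t h)
{
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* Model of mulhh_ss: signed product of the two high half words. */
static uint32_t mulhh_ss_model(uint32_t src_a, uint32_t src_b)
{
    int32_t a = sign_extend16(src_a >> 16);
    int32_t b = sign_extend16(src_b >> 16);
    return (uint32_t)(a * b);   /* |a * b| <= 2^30, so the product fits */
}
```

For example, a high half word of 0xFFFF is read as -1, so multiplying it by a high half word of 2 gives -2, which is 0xFFFFFFFE as a 32-bit word.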

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000 | n | 00001010 | s | s | d |
```

**Figure 4-260: mulhh_ss in X0 Bit Descriptions**

```
<table>
<thead>
<tr>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
</tr>
</thead>
<tbody>
<tr>
<td>0111</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

**Figure 4-261: mulhh_ss in Y0 Bit Descriptions**
mulhh_su: Multiply High Signed High Unsigned Half Word

Syntax
mulhh_su Dest, SrcA, SrcB

Example
mulhh_su r5, r6, r7

Description
Multiply the high half word of the first operand by the high half word of the second operand. The result returned is a full word in length. The first input operand is interpreted as a signed half word and the second input operand is interpreted as an unsigned half word.

Functional Description
rf[Dest] =
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA])))
* ((UnsignedMachineWord) getHighHalfWord(rf[SrcB]));
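
The effect of the mixed signedness can be seen in a small C model. This is a sketch under the assumption of a 32-bit machine word; `sign_extend16` and `mulhh_su_model` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Sign-extend a 16-bit value held in h (0..0xFFFF). */
static int32_t sign_extend16(uint32_t h)
{
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* Model of mulhh_su: high half of src_a is signed, high half of src_b
 * is unsigned (zero-extended). */
static uint32_t mulhh_su_model(uint32_t src_a, uint32_t src_b)
{
    int32_t a = sign_extend16(src_a >> 16);
    int32_t b = (int32_t)(src_b >> 16);     /* 0..65535 */
    return (uint32_t)(a * b);               /* -32768 * 65535 still fits */
}
```

With both high half words equal to 0xFFFF, the first operand is read as -1 and the second as 65535, so the model returns -65535 (0xFFFF0001) rather than the 1 that a signed-by-signed multiply would give.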

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-262: mulhh_su in X0 Bit Descriptions

Figure 4-263: mulhh_su in Y0 Bit Descriptions
mulhh_uu: Multiply High Unsigned High Unsigned Half Word

Syntax

mulhh_uu Dest, SrcA, SrcB

Example

mulhh_uu r5, r6, r7

Description

Multiply the high half word of the first operand by the high half word of the second operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

Functional Description

rf[Dest] =
((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getHighHalfWord(rf[SrcB]));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000| n  | 000011100| s  | s  | d  |
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x1C
- S_X0 - Sbit
- Opcode_X0 - 0x0
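
The X0 field list above can be turned into a bit-packing sketch. The field positions used here are an assumption inferred from the field widths in the bit diagram (3 + 1 + 9 + 6 + 6 + 6 = 31 bits), with SrcB taken to occupy the field above SrcA; `encode_x0_rrr` is a hypothetical helper, and the result covers only the X0 slot, not a complete instruction bundle.

```c
#include <stdint.h>

/* ASSUMED X0 layout: Opcode [30:28], S [27], RRROpcodeExtension [26:18],
 * SrcB [17:12], SrcA [11:6], Dest [5:0]. */
static uint32_t encode_x0_rrr(uint32_t opcode, uint32_t s_bit, uint32_t ext,
                              uint32_t src_b, uint32_t src_a, uint32_t dest)
{
    return ((opcode & 0x7u)   << 28) |
           ((s_bit  & 0x1u)   << 27) |
           ((ext    & 0x1FFu) << 18) |
           ((src_b  & 0x3Fu)  << 12) |
           ((src_a  & 0x3Fu)  << 6)  |
            (dest   & 0x3Fu);
}
```

Under these assumptions, `mulhh_uu r5, r6, r7` with the S bit clear would be packed as `encode_x0_rrr(0x0, 0, 0x1C, 7, 6, 5)`.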

```
<table>
<thead>
<tr>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
</tr>
</thead>
<tbody>
<tr>
<td>0111</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
```

- Dest_Y0 - Dest
- SrcA_Y0 - SrcA
- SrcB_Y0 - SrcB
- RRROpcodeExtension_Y0 - 0x1
- Opcode_Y0 - 0x7

Figure 4-264: mulhh_uu in X0 Bit Descriptions

Figure 4-265: mulhh_uu in Y0 Bit Descriptions
mulhha_ss: Multiply Accumulate High Signed High Signed Half Word

Syntax
mulhha_ss Dest, SrcA, SrcB

Example
mulhha_ss r5, r6, r7

Description
Multiply the high half word of the first operand by the high half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as signed half words.

Functional Description

rf[Dest] =
rf[Dest] +
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) *
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcB])));
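
A minimal C model of the accumulate form follows. It assumes a 32-bit machine word and that the accumulation wraps modulo 2^32, which this section does not state explicitly; `sign_extend16` and `mulhha_ss_model` are hypothetical names.

```c
#include <stdint.h>

/* Sign-extend a 16-bit value held in h (0..0xFFFF). */
static int32_t sign_extend16(uint32_t h)
{
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* Model of mulhha_ss: dest + signed(high(src_a)) * signed(high(src_b)). */
static uint32_t mulhha_ss_model(uint32_t dest, uint32_t src_a, uint32_t src_b)
{
    int32_t product = sign_extend16(src_a >> 16) * sign_extend16(src_b >> 16);
    return dest + (uint32_t)product;   /* unsigned add wraps modulo 2^32 */
}
```

Because the destination register is also a source, a sequence of these operations on the same Dest accumulates a running sum of products.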

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-266: mulhha_ss in X0 Bit Descriptions

Figure 4-267: mulhha_ss in Y0 Bit Descriptions
**mulhha_su: Multiply Accumulate High Signed High Unsigned Half Word**

**Syntax**

```
mulhha_su Dest, SrcA, SrcB
```

**Example**

```
mulhha_su r5, r6, r7
```

**Description**

Multiply the high half word of the first operand by the high half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The first input operand is interpreted as a signed half word and the second input operand is interpreted as an unsigned half word.

**Functional Description**

```
rf[Dest] = 
    rf[Dest] + 
    ((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) * 
    ((UnsignedMachineWord) getHighHalfWord(rf[SrcB]));
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-268: mulhha_su in X0 Bit Descriptions
mulhha_uu: Multiply Accumulate High Unsigned High Unsigned Half Word

Syntax
mulhha_uu Dest, SrcA, SrcB

Example
mulhha_uu r5, r6, r7

Description
Multiply the high half word of the first operand by the high half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

Functional Description

rf[Dest] =
rf[Dest] +
((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getHighHalfWord(rf[SrcB]));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30-28</th>
<th>27</th>
<th>26-18</th>
<th>17-12</th>
<th>11-6</th>
<th>5-0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>000011000</td>
<td>s</td>
<td>s</td>
<td>ds</td>
</tr>
</tbody>
</table>

Figure 4-269: mulhha_uu in X0 Bit Descriptions

<table>
<thead>
<tr>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
</tr>
</thead>
<tbody>
<tr>
<td>1000</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

| 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 01 | s  | s  | ds |

Figure 4-270: mulhha_uu in Y0 Bit Descriptions
mulhhsa_uu: Multiply Shift Accumulate High Unsigned High Unsigned Half Word

Syntax
mulhhsa_uu Dest, SrcA, SrcB

Example
mulhhsa_uu r5, r6, r7

Description
Multiply the high half word of the first operand by the high half word of the second operand, shift the product left by 16 bits, and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

Functional Description
rf[Dest] =
rf[Dest] +
(((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getHighHalfWord(rf[SrcB])) << 16);
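
The shift-accumulate behavior can be sketched in C as follows. The sketch reads the shift as applying to the product only, as the Description states, and assumes a 32-bit machine word with wraparound; `mulhhsa_uu_model` is a hypothetical name.

```c
#include <stdint.h>

/* Model of mulhhsa_uu: dest + ((high(src_a) * high(src_b)) << 16). */
static uint32_t mulhhsa_uu_model(uint32_t dest, uint32_t src_a, uint32_t src_b)
{
    uint32_t product = (src_a >> 16) * (src_b >> 16);   /* fits in 32 bits */
    return dest + (product << 16);   /* upper 16 bits of product are lost */
}
```

Only the low 16 bits of the product survive the shift, so the operation contributes to the upper half of the accumulated word.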

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-271: mulhhsa_uu in X0 Bit Descriptions
mulhl_ss: Multiply High Signed Low Signed Half Word

Syntax
mulhl_ss Dest, SrcA, SrcB

Example
mulhl_ss r5, r6, r7

Description
Multiply the high half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The input operands are interpreted as signed half words.

Functional Description
rf[Dest] = ((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) * ((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcB])));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-272: mulhl_ss in X0 Bit Descriptions
mulhl_su: Multiply High Signed Low Unsigned Half Word

Syntax
mulhl_su Dest, SrcA, SrcB

Example
mulhl_su r5, r6, r7

Description
Multiply the high half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The first input operand is interpreted as a signed half word and the second input operand is interpreted as an unsigned half word.

Functional Description
rf[Dest] =
    ((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) *
    ((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000| n  | 000100011 | s  | s  | d  |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x23
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-273: mulhl_su in X0 Bit Descriptions
mulhl_us: Multiply High Unsigned Low Signed Half Word

Syntax
mulhl_us Dest, SrcA, SrcB

Example
mulhl_us r5, r6, r7

Description
Multiply the high half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The first input operand is interpreted as an unsigned half word and the second input operand is interpreted as a signed half word.

Functional Description
rf[Dest] =
((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcB])));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000  n  000100100  s  s  d
```

Figure 4-274: mulhl_us in X0 Bit Descriptions
mulhl_uu: Multiply High Unsigned Low Unsigned Half Word

Syntax

```c
mulhl_uu Dest, SrcA, SrcB
```

Example

```c
mulhl_uu r5, r6, r7
```

Description

Multiply the high half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

Functional Description

```c
rf[Dest] =
((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));
```

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-275: mulhl_uu in X0 Bit Descriptions
mulhla_ss: Multiply Accumulate High Signed Low Signed Half Word

Syntax
mulhla_ss Dest, SrcA, SrcB

Example
mulhla_ss r5, r6, r7

Description
Multiply the high half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as signed half words.

Functional Description
rf[Dest] =
rf[Dest] +
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) *
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcB])));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000 | n | 000011101 | s | s | ds |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x1D
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-276: mulhla_ss in X0 Bit Descriptions
mulhla_su: Multiply Accumulate High Signed Low Unsigned Half Word

Syntax

mulhla_su Dest, SrcA, SrcB

Example

mulhla_su r5, r6, r7

Description

Multiply the high half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The first input operand is interpreted as a signed half word and the second input operand is interpreted as an unsigned half word.

Functional Description

rf[Dest] =
rf[Dest] +
((SignedMachineWord) signExtend16(getHighHalfWord(rf[SrcA]))) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000 | s | 000011110 | s | s | ds |

Figure 4-277: mulhla_su in X0 Bit Descriptions
mulhla_us: Multiply Accumulate High Unsigned Low Signed Half Word

Syntax
mulhla_us Dest, SrcA, SrcB

Example
mulhla_us r5, r6, r7

Description
Multiply the high half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The first input operand is interpreted as an unsigned half word and the second input operand is interpreted as a signed half word.

Functional Description
rf[Dest] =
rf[Dest] +
((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcB])));

Valid Pipelines

Encoding

Figure 4-278: mulhla_us in X0 Bit Descriptions
**mulhla_uu: Multiply Accumulate High Unsigned Low Unsigned Half Word**

**Syntax**

```plaintext
mulhla_uu Dest, SrcA, SrcB
```

**Example**

```plaintext
mulhla_uu r5, r6, r7
```

**Description**

Multiply the high half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

**Functional Description**

```plaintext
rf[Dest] =
    rf[Dest] +
    ((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
    ((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  000100000  s  s  ds
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x20
- S_X0 - Sbit
- Opcode_X0 - 0x0

*Figure 4-279: mulhla_uu in X0 Bit Descriptions*
mulhlsa_uu: Multiply Shift Accumulate High Unsigned Low Unsigned Half Word

Syntax
mulhlsa_uu Dest, SrcA, SrcB

Example
mulhlsa_uu r5, r6, r7

Description
Multiply the high half word of the first operand by the low half word of the second operand, shift the product left by 16 bits, and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

Functional Description
rf[Dest] =
rf[Dest] +
(((UnsignedMachineWord) getHighHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB])) << 16);
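
One plausible use of this operation, offered here as an illustration rather than something this section documents, is building the low 32 bits of a 32-bit by 32-bit product from half-word multiplies. The sketch assumes a 32-bit machine word with wraparound; `mulll_uu_model`, `mulhlsa_uu_model`, and `mul32_low` are hypothetical names.

```c
#include <stdint.h>

/* Model of mulll_uu: low(src_a) * low(src_b), both unsigned. */
static uint32_t mulll_uu_model(uint32_t src_a, uint32_t src_b)
{
    return (src_a & 0xFFFFu) * (src_b & 0xFFFFu);
}

/* Model of mulhlsa_uu: dest + ((high(src_a) * low(src_b)) << 16). */
static uint32_t mulhlsa_uu_model(uint32_t dest, uint32_t src_a, uint32_t src_b)
{
    return dest + (((src_a >> 16) * (src_b & 0xFFFFu)) << 16);
}

/* Low 32 bits of a * b, composed from three half-word multiplies. */
static uint32_t mul32_low(uint32_t a, uint32_t b)
{
    uint32_t r = mulll_uu_model(a, b);   /* aL * bL                    */
    r = mulhlsa_uu_model(r, a, b);       /* + (aH * bL) << 16          */
    r = mulhlsa_uu_model(r, b, a);       /* + (bH * aL) << 16          */
    return r;   /* the aH * bH term lies entirely above bit 31 */
}
```

The sequence corresponds to one `mulll_uu` followed by two `mulhlsa_uu` operations with the source operands swapped on the second.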

Valid Pipelines

Encoding

Figure 4-280: mulhlsa_uu in X0 Bit Descriptions

Figure 4-281: mulhlsa_uu in Y0 Bit Descriptions
mulll_ss: Multiply Low Signed Low Signed Half Word

Syntax

mulll_ss Dest, SrcA, SrcB

Example

mulll_ss r5, r6, r7

Description

Multiply the low half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The input operands are interpreted as signed half words.

Functional Description

rf[Dest] =
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcA]))) *
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcB])));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-282: mulll_ss in X0 Bit Descriptions

Figure 4-283: mulll_ss in Y0 Bit Descriptions
mulll_su: Multiply Low Signed Low Unsigned Half Word

Syntax

mulll_su Dest, SrcA, SrcB

Example

mulll_su r5, r6, r7

Description

Multiply the low half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The first input operand is interpreted as a signed half word and the second input operand is interpreted as an unsigned half word.

Functional Description

rf[Dest] =
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcA]))) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000  n  000101011  s  s  d

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x2B
S_X0 - Sbit
Opcode_X0 - 0x0
```

Figure 4-284: mulll_su in X0 Bit Descriptions
mulll_uu: Multiply Low Unsigned Low Unsigned Half Word

Syntax
mulll_uu Dest, SrcA, SrcB

Example
mulll_uu r5, r6, r7

Description
Multiply the low half word of the first operand by the low half word of the second operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

Functional Description
rf[Dest] =
((UnsignedMachineWord) getLowHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000  n  000101100  s  s  d

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x2C
S_X0 - Sbit
Opcode_X0 - 0x0
```

Figure 4-285: mulll_uu in X0 Bit Descriptions

```
0111  11  s  s  d

Dest_Y0 - Dest
SrcA_Y0 - SrcA
SrcB_Y0 - SrcB
RRROpcodeExtension_Y0 - 0x3
Opcode_Y0 - 0x7
```

Figure 4-286: mulll_uu in Y0 Bit Descriptions
mullla_ss: Multiply Accumulate Low Signed Low Signed Half Word

Syntax

mullla_ss Dest, SrcA, SrcB

Example

mullla_ss r5, r6, r7

Description

Multiply the low half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as signed half words.

Functional Description

rf[Dest] =
rf[Dest] +
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcA]))) *
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcB])));
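
As a hedged illustration of how the accumulate forms might be combined, the sketch below computes a two-element dot product of signed 16-bit values packed two per word, using a model of mullla_ss for the low half words and a model of mulhha_ss for the high half words. It assumes a 32-bit machine word with wraparound; all function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 16-bit value held in h (0..0xFFFF). */
static int32_t sign_extend16(uint32_t h)
{
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* Model of mullla_ss: dest + signed(low(a)) * signed(low(b)). */
static uint32_t mullla_ss_model(uint32_t dest, uint32_t a, uint32_t b)
{
    return dest + (uint32_t)(sign_extend16(a & 0xFFFFu) *
                             sign_extend16(b & 0xFFFFu));
}

/* Model of mulhha_ss: dest + signed(high(a)) * signed(high(b)). */
static uint32_t mulhha_ss_model(uint32_t dest, uint32_t a, uint32_t b)
{
    return dest + (uint32_t)(sign_extend16(a >> 16) *
                             sign_extend16(b >> 16));
}

/* acc + x.low * y.low + x.high * y.high */
static uint32_t dot2_packed(uint32_t acc, uint32_t x, uint32_t y)
{
    acc = mullla_ss_model(acc, x, y);
    acc = mulhha_ss_model(acc, x, y);
    return acc;
}
```

For instance, with x holding (3, -2) and y holding (4, 5) as (high, low) pairs, the accumulated result is 3 * 4 + (-2) * 5.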

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-287: mullla_ss in X0 Bit Descriptions

Figure 4-288: mullla_ss in Y0 Bit Descriptions
**mullla_su: Multiply Accumulate Low Signed Low Unsigned Half Word**

**Syntax**

mullla_su Dest, SrcA, SrcB

**Example**

mullla_su r5, r6, r7

**Description**

Multiply the low half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The first input operand is interpreted as a signed half word and the second input operand is interpreted as an unsigned half word.

**Functional Description**

rf[Dest] =
rf[Dest] +
((SignedMachineWord) signExtend16(getLowHalfWord(rf[SrcA]))) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```

000 n 000100111 s s ds

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x27
S_X0 - Sbit
Opcode_X0 - 0x0
```

---

*Figure 4-289: mullla_su in X0 Bit Descriptions*
**mullla_uu: Multiply Accumulate Low Unsigned Low Unsigned Half Word**

**Syntax**

mullla_uu Dest, SrcA, SrcB

**Example**

mullla_uu r5, r6, r7

**Description**

Multiply the low half word of the first operand by the low half word of the second operand and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

**Functional Description**

rf[Dest] =
rf[Dest] +
((UnsignedMachineWord) getLowHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB]));

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  000101000  s  s  ds
    Dest_X0  - Dest
    SrcA_X0  - SrcA
    SrcB_X0  - SrcB
    RRROpcodeExtension_X0  - 0x28
    S_X0  - Sbit
    Opcode_X0  - 0x0
```

*Figure 4-290: mullla_uu in X0 Bit Descriptions*

```
1000  11  s  s  ds
      Dest_Y0  - Dest
      SrcA_Y0  - SrcA
      SrcB_Y0  - SrcB
      RRROpcodeExtension_Y0  - 0x3
      Opcode_Y0  - 0x8
```

*Figure 4-291: mullla_uu in Y0 Bit Descriptions*
**mulllsa_uu: Multiply Shift Accumulate Low Unsigned Low Unsigned Half Word**

**Syntax**

mulllsa_uu Dest, SrcA, SrcB

**Example**

mulllsa_uu r5, r6, r7

**Description**

Multiply the low half word of the first operand by the low half word of the second operand, shift the product left by 16 bits, and accumulate the result into the destination operand. The result returned is a full word in length. The input operands are interpreted as unsigned half words.

**Functional Description**

rf[Dest] =
rf[Dest] +
(((UnsignedMachineWord) getLowHalfWord(rf[SrcA])) *
((UnsignedMachineWord) getLowHalfWord(rf[SrcB])) << 16);

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

*Figure 4-292: mulllsa_uu in X0 Bit Descriptions*
### 4.1.12 NOP Instructions

The following sections provide detailed descriptions of NOP instructions listed alphabetically.

- fnop: Filler No Operation
- nop: Architectural No Operation

#### fnop: Filler No Operation

**Syntax**

fnop

**Example**

fnop

**Description**

Indicate that the programmer, compiler, or tool was not able to fill this operation slot with a suitable operation. This operation has no outcome. Use fnop to signal that the no-operation was inserted because nothing else could be packed into the instruction bundle, not because an architectural nop is needed for correct operation or for timing delay. Typically, fnops can be removed at any point in the tool flow.

**Functional Description**

fnop();

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-293: fnop in X0 Bit Descriptions
nop: Architectural No Operation

Syntax

nop

Example

nop

Description

Indicate to the hardware architecture that the machine should not issue an instruction with a side effect in this slot.

Functional Description

nop();

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>Bit</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0-5</td>
<td>Dest_X0 - Reserved 0x0</td>
</tr>
<tr>
<td>6-11</td>
<td>SrcA_X0 - Reserved 0x0</td>
</tr>
<tr>
<td>12-16</td>
<td>UnOpcodeExtension_X0 - 0x6</td>
</tr>
<tr>
<td>17-26</td>
<td>UnShOpcodeExtension_X0 - 0xB</td>
</tr>
<tr>
<td>27</td>
<td>S_X0 - Reserved 0x0</td>
</tr>
<tr>
<td>28-30</td>
<td>Opcode_X0 - 0x7</td>
</tr>
</tbody>
</table>

Figure 4-297: nop in X0 Bit Descriptions

<table>
<thead>
<tr>
<th>Bit</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31-36</td>
<td>Dest_X1 - Reserved 0x0</td>
</tr>
<tr>
<td>37-42</td>
<td>SrcA_X1 - Reserved 0x0</td>
</tr>
<tr>
<td>43-47</td>
<td>UnOpcodeExtension_X1 - 0x11</td>
</tr>
<tr>
<td>48-57</td>
<td>UnShOpcodeExtension_X1 - 0xB</td>
</tr>
<tr>
<td>58</td>
<td>S_X1 - Reserved 0x0</td>
</tr>
<tr>
<td>59-62</td>
<td>Opcode_X1 - 0x8</td>
</tr>
</tbody>
</table>

Figure 4-298: nop in X1 Bit Descriptions
Figure 4-299: nop in Y0 Bit Descriptions

Figure 4-300: nop in Y1 Bit Descriptions
4.1.13 SIMD Instructions

The following sections provide detailed descriptions of SIMD instructions listed alphabetically.

- addb: Add Bytes
- addbs_u: Add Bytes Saturating Unsigned
- addh: Add Half Words
- addhs: Add Half Words Saturating
- addib: Add Immediate Bytes
- addih: Add Immediate Half Words
- adiffb_u: Absolute Difference Unsigned Bytes
- adiffh: Absolute Difference Half Words
- avgb_u: Average Byte Unsigned
- avgh: Average Half Words
- inthb: Interleave High Byte
- inthh: Interleave High Half Words
- intlb: Interleave Low Byte
- intlh: Interleave Low Half Words
- maxb_u: Maximum Byte Unsigned
- maxh: Maximum Half Words
- maxib_u: Maximum Immediate Byte Unsigned
- maxih: Maximum Immediate Half Words
- minb_u: Minimum Byte Unsigned
- minh: Minimum Half Words
- minib_u: Minimum Immediate Byte Unsigned
- minih: Minimum Immediate Half Words
- mnzb: Mask Not Zero Byte
- mnzh: Mask Not Zero Half Words
- mzb: Mask Zero Byte
- mzh: Mask Zero Half Words
- packbs_u: Pack Half Words Saturating
- packhb: Pack High Byte
- packhs: Pack Half Words Saturating
- packlb: Pack Low Byte
- sadab_u: Sum of Absolute Difference Accumulate Unsigned Bytes
- sadah: Sum of Absolute Difference Accumulate Half Words
- sadah_u: Sum of Absolute Difference Accumulate Unsigned Half Words
- sadb_u: Sum of Absolute Difference Unsigned Bytes
- sadh: Sum of Absolute Difference Half Words
- sadh_u: Sum of Absolute Difference Unsigned Half Words
- seqb: Set Equal To Byte
- seqh: Set Equal To Half Words
- seqib: Set Equal To Immediate Byte
- seqih: Set Equal To Immediate Half Words
- shlb: Logical Shift Left Bytes
- shlh: Logical Shift Left Half Words
- shlib: Logical Shift Left Immediate Bytes
- shlih: Logical Shift Left Immediate Half Words
- shrb: Logical Shift Right Bytes
- shrh: Logical Shift Right Half Words
- shrib: Logical Shift Right Immediate Bytes
- shrih: Logical Shift Right Immediate Half Words
- sltb: Set Less Than Byte
- sltb_u: Set Less Than Unsigned Byte
- slteb: Set Less Than or Equal Byte
- slteb_u: Set Less Than or Equal Unsigned Byte
- slteh: Set Less Than or Equal Half Words
- slteh_u: Set Less Than or Equal Unsigned Half Words
- slth: Set Less Than Half Words
- slth_u: Set Less Than Unsigned Half Words
- sltib: Set Less Than Immediate Byte
- sltib_u: Set Less Than Unsigned Immediate Byte
- sltih: Set Less Than Immediate Half Words
- sltih_u: Set Less Than Unsigned Immediate Half Words
- sneb: Set Not Equal To Byte
- sneh: Set Not Equal To Half Words
- srab: Arithmetic Shift Right Bytes
- srah: Arithmetic Shift Right Half Words
- sraib: Arithmetic Shift Right Immediate Bytes
- sraih: Arithmetic Shift Right Immediate Half Words
- subb: Subtract Bytes
- subbs_u: Subtract Bytes Saturating Unsigned
- subh: Subtract Half Words
- subhs: Subtract Half Words Saturating
addb: Add Bytes

Syntax
addb Dest, SrcA, SrcB

Example
addb r5, r6, r7

Description
Add the four bytes in the first source operand to the four bytes in the second source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
    setByte(output, counter,
        (getByte(rf[SrcA], counter) +
        getByte(rf[SrcB], counter)));
}
rf[Dest] = output;
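
The per-byte behavior can be checked with a small C model. This is a sketch that assumes a 32-bit word, 8-bit bytes, and that `setByte` keeps only the low eight bits of each sum (so a carry never crosses into the neighboring byte); `addb_model` is a hypothetical name.

```c
#include <stdint.h>

/* Model of addb: four independent 8-bit additions. */
static uint32_t addb_model(uint32_t src_a, uint32_t src_b)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t sum = ((src_a >> (8 * i)) & 0xFFu) +
                       ((src_b >> (8 * i)) & 0xFFu);
        out |= (sum & 0xFFu) << (8 * i);   /* carry out of the byte is dropped */
    }
    return out;
}
```

For example, adding 0x01FF7F80 and 0x01010180 lane by lane gives 0x02008000 under these assumptions: the 0xFF + 0x01 and 0x80 + 0x80 lanes both wrap to 0x00.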

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-301: addb in X0 Bit Descriptions
Figure 4-302: addb in X1 Bit Descriptions
**addbs_u: Add Bytes Saturating Unsigned**

**Syntax**

```
addbs_u Dest, SrcA, SrcB
```

**Example**

```
addbs_u r5, r6, r7
```

**Description**

Add the four bytes in the first source operand to the four bytes in the second source operand and saturate each result to 0 or the maximum positive value.

**NOTE:** This instruction is only supported in the TILEPro family of products.

**Functional Description**

```
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            unsigned_saturate8(getByte(rf[SrcA], counter) +
            getByte(rf[SrcB], counter)));
} 
rf[Dest] = output;
```
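
The helper `unsigned_saturate8` is used above without a definition in this section. A minimal sketch of the clamping it presumably performs, assuming a 32-bit word and four byte lanes, is shown below. The name `addbs_u_model` is hypothetical.

```c
#include <stdint.h>

/* Assumed helper: clamp a value to the unsigned 8-bit range 0..255. */
static uint32_t unsigned_saturate8(int32_t value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return (uint32_t)value;
}

/* Hypothetical model of addbs_u on plain 32-bit values. */
static uint32_t addbs_u_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 4; lane++) {
        int32_t x = (int32_t)((a >> (lane * 8)) & 0xFFu);
        int32_t y = (int32_t)((b >> (lane * 8)) & 0xFFu);
        /* The sum is at most 510, so it is clamped rather than wrapped. */
        output |= unsigned_saturate8(x + y) << (lane * 8);
    }
    return output;
}
```

In contrast to addb, a lane whose sum exceeds 255 holds 0xFF instead of wrapping around.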

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
```

```
000 | n | 001100010 | s | s | d
```

- **Dest_X0** - Dest
- **SrcA_X0** - SrcA
- **SrcB_X0** - SrcB
- **RRROpcodeExtension_X0** - 0x62
- **S_X0** - Sbit
- **Opcode_X0** - 0x0

*Figure 4-303: addbs_u in X0 Bit Descriptions*
Figure 4-304: addbs_u in X1 Bit Descriptions
**addh: Add Half Words**

**Syntax**
```
addh Dest, SrcA, SrcB
```

**Example**
```
addh r5, r6, r7
```

**Description**
Add the pair of half words in the first source operand to the pair of half words in the second source operand.

**Functional Description**
```
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) +
            getHalfWord(rf[SrcB], counter)));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000| n | 00000010 | s | s | d |
```

**Figure 4-305: addh in X0 Bit Descriptions**
Figure 4-306: addh in X1 Bit Descriptions
addhs: Add Half Words Saturating

Syntax

addhs Dest, SrcA, SrcB

Example

addhs r5, r6, r7

Description

Add the pair of half words in the first source operand to the pair of half words in the second source operand and saturate each result to the minimum negative value or maximum positive value.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            signed_saturate16(signExtend16(getHalfWord(rf[SrcA], counter)) +
                signExtend16(getHalfWord(rf[SrcB], counter))));
}
rf[Dest] = output;
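
The helpers `signExtend16` and `signed_saturate16` are not defined in this section. The sketch below shows one plausible reading, assuming a 32-bit word holding two half-word lanes and two's-complement half words. The name `addhs_model` is hypothetical.

```c
#include <stdint.h>

/* Assumed helper: interpret the low 16 bits as a two's-complement value. */
static int32_t signExtend16(uint32_t value)
{
    int32_t low = (int32_t)(value & 0xFFFFu);
    return (low >= 0x8000) ? low - 0x10000 : low;
}

/* Assumed helper: clamp to the signed 16-bit range -32768..32767. */
static int32_t signed_saturate16(int32_t value)
{
    if (value > 32767) {
        return 32767;
    }
    if (value < -32768) {
        return -32768;
    }
    return value;
}

/* Hypothetical model of addhs on plain 32-bit values. */
static uint32_t addhs_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 2; lane++) {
        int32_t sum = signExtend16(a >> (lane * 16)) +
                      signExtend16(b >> (lane * 16));
        output |= ((uint32_t)signed_saturate16(sum) & 0xFFFFu) << (lane * 16);
    }
    return output;
}
```

With these definitions, 0x7FFF + 0x0001 in a lane stays at 0x7FFF, and 0x8000 + 0xFFFF stays at 0x8000.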

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000| n  | 001100011 | s  | s  | d  |

Figure 4-307: addhs in X0 Bit Descriptions
Figure 4-308: addhs in X1 Bit Descriptions
addib: Add Immediate Bytes

Syntax
addib Dest, SrcA, Imm8

Example
addib r5, r6, 5

Description
Add an immediate to all four of the bytes in the source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter, (getByte(rf[SrcA], counter) + Imm8));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-309: addib in X0 Bit Descriptions

Figure 4-310: addib in X1 Bit Descriptions
addih: Add Immediate Half Words

Syntax
addih Dest, SrcA, Imm8

Example
addih r5, r6, 5

Description
Add a sign extended immediate to both of the half words in the source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) +
            signExtend8(Imm8)));
}
rf[Dest] = output;
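
The effect of `signExtend8` on the immediate is easiest to see with a negative value. The sketch below assumes a 32-bit word, an 8-bit two's-complement immediate field, and truncation of each lane result to 16 bits. The name `addih_model` is hypothetical and the helper body is an assumption.

```c
#include <stdint.h>

/* Assumed helper: sign-extend an 8-bit immediate field. */
static int32_t signExtend8(uint32_t imm8)
{
    int32_t low = (int32_t)(imm8 & 0xFFu);
    return (low >= 0x80) ? low - 0x100 : low;
}

/* Hypothetical model of addih: the same immediate is added to both lanes. */
static uint32_t addih_model(uint32_t a, uint32_t imm8)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 2; lane++) {
        uint32_t half = (a >> (lane * 16)) & 0xFFFFu;
        /* Unsigned arithmetic wraps, so adding a negative immediate subtracts. */
        uint32_t sum = half + (uint32_t)signExtend8(imm8);
        output |= (sum & 0xFFFFu) << (lane * 16);
    }
    return output;
}
```

An immediate field of 0xFF therefore acts as -1 on each half word, not as +255.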

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 100| n  | 0000010 | i  | s  | d  |

Figure 4-311: addih in X0 Bit Descriptions
Figure 4-312: addih in X1 Bit Descriptions
adiffb_u: Absolute Difference Unsigned Bytes

Syntax
adiffb_u Dest, SrcA, SrcB

Example
adiffb_u r5, r6, r7

Description
Compute the absolute differences between the four bytes in the first source operand and the four bytes in the second source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output |=
    abs(((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK) -
    ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK)) <<
    (counter * BYTE_SIZE);
} 
rf[Dest] = output;
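
The description above applies `abs` to a per-byte difference. In portable C, subtracting two unsigned values wraps instead of going negative, so the sketch below compares the lanes first. It assumes a 32-bit word with four byte lanes. The name `adiffb_u_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of adiffb_u on plain 32-bit values. */
static uint32_t adiffb_u_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (lane * 8)) & 0xFFu;
        uint32_t y = (b >> (lane * 8)) & 0xFFu;
        /* Subtract the smaller byte from the larger one: |x - y| fits in 8 bits. */
        uint32_t diff = (x > y) ? (x - y) : (y - x);
        output |= diff << (lane * 8);
    }
    return output;
}
```

Because both inputs are unsigned bytes, each result lane is in the range 0 to 255 and no saturation is needed.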

Valid Pipelines

Encoding

Figure 4-313: adiffb_u in X0 Bit Descriptions
adiffh: Absolute Difference Half Words

Syntax
adiffh Dest, SrcA, SrcB

Example
adiffh r5, r6, r7

Description
Compute the absolute differences between the pair of half words in the first source operand and the pair of half words in the second source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output |=
        abs(signExtend16((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK) -
        signExtend16((rf[SrcB] >> (counter * HALF_WORD_SIZE)) &
        HALF_WORD_MASK)) << (counter * HALF_WORD_SIZE);
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>25</th>
<th>24</th>
<th>23</th>
<th>22</th>
<th>21</th>
<th>20</th>
<th>19</th>
<th>18</th>
<th>17</th>
<th>16</th>
<th>15</th>
<th>14</th>
<th>13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>000000101</td>
<td>s</td>
<td>s</td>
<td>d</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4-314: adiffh in X0 Bit Descriptions
**avgb_u: Average Byte Unsigned**

**Syntax**

```
avgb_u Dest, SrcA, SrcB
```

**Example**

```
avgb_u r5, r6, r7
```

**Description**

Compute the average of the four bytes in the first source operand and the four bytes in the second source operand, rounding upwards.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    UnsignedMachineWord srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    UnsignedMachineWord srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca + srcb + 1) >> 1) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```
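
The "rounding upwards" behavior comes from adding 1 before the shift. A minimal sketch, assuming a 32-bit word with four byte lanes, is given below. The name `avgb_u_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of avgb_u on plain 32-bit values. */
static uint32_t avgb_u_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (lane * 8)) & 0xFFu;
        uint32_t y = (b >> (lane * 8)) & 0xFFu;
        /* x + y + 1 needs at most 9 bits, so a 32-bit temporary cannot overflow. */
        output |= (((x + y + 1u) >> 1) & 0xFFu) << (lane * 8);
    }
    return output;
}
```

For example, the lanes 3 and 3 average to 3, while 7 and 4 average to 6 (5.5 rounded up).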

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

![Figure 4-315: avgb_u in X0 Bit Descriptions](image)

---

*Tile Processor User Architecture Manual*  
Tilera Confidential — Subject to Change Without Notice
avgh: Average Half Words

Syntax

`avgh Dest, SrcA, SrcB`

Example

`avgh r5, r6, r7`

Description

Compute the average between the pair of half words in the first source operand and the pair of half words in the second source operand, rounding upwards.

Functional Description

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    SignedMachineWord srca = 
        signExtend16((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    SignedMachineWord srcb = 
        signExtend16((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= 
        ((((srca + srcb + 1) >> 1) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;
```
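
The functional description shifts a signed sum right by one. In ISO C the result of right-shifting a negative value is implementation-defined, so the sketch below expresses the same floor-of-half operation with division. It assumes a 32-bit word, two's-complement half words, and that the intended shift is arithmetic. The names `avgh_model` and `signExtend16` are illustrative.

```c
#include <stdint.h>

/* Assumed helper: interpret the low 16 bits as a two's-complement value. */
static int32_t signExtend16(uint32_t value)
{
    int32_t low = (int32_t)(value & 0xFFFFu);
    return (low >= 0x8000) ? low - 0x10000 : low;
}

/* Hypothetical model of avgh on plain 32-bit values. */
static uint32_t avgh_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 2; lane++) {
        int32_t sum = signExtend16(a >> (lane * 16)) +
                      signExtend16(b >> (lane * 16)) + 1;
        /* Floor division by 2, written without shifting a negative value. */
        int32_t avg = (sum >= 0) ? sum / 2 : -((-sum + 1) / 2);
        output |= ((uint32_t)avg & 0xFFFFu) << (lane * 16);
    }
    return output;
}
```

Under these assumptions the lanes -4 and -1 average to -2, which is -2.5 rounded upwards.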

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000 n 000001000 s s d
Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x8
S_X0 - Sbit
Opcode_X0 - 0x0
```

Figure 4-316: avgh in X0 Bit Descriptions
inthb: Interleave High Byte

**Syntax**

```
inthb Dest, SrcA, SrcB
```

**Example**

```
inthb r5, r6, r7
```

**Description**

Interleave the two high-order bytes of the first operand with the two high-order bytes of the second operand. The high-order byte of the result will be the high-order byte of the first operand. For example if the first operand contains the packed bytes \{A3,A2,A1,A0\} and the second operand contains the packed bytes \{B3,B2,B1,B0\} then the result will be \{A3,B3,A2,B2\}.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    bool asel = ((counter & 1) == 1);
    int in_sel = 2 + counter / 2;
    int16_t srca = ((rf[SrcA] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
    int16_t srcb = ((rf[SrcB] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
    output |=
        (((asel ? srca : srcb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```
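
The lane selection in the loop can be checked against the \{A3,B3,A2,B2\} example from the description. The sketch below assumes a 32-bit word with byte lane 0 as the low-order byte. The name `inthb_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of inthb on plain 32-bit values. */
static uint32_t inthb_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t counter = 0; counter < 4; counter++) {
        /* Odd result lanes come from the first operand, even ones from the second. */
        int take_a = (counter & 1u) == 1u;
        /* Only source lanes 2 and 3 (the high-order bytes) are read. */
        uint32_t in_sel = 2u + counter / 2u;
        uint32_t src = take_a ? a : b;
        output |= ((src >> (in_sel * 8)) & 0xFFu) << (counter * 8);
    }
    return output;
}
```

Calling `inthb_model(0xA3A2A1A0, 0xB3B2B1B0)` gives 0xA3B3A2B2, matching the example in the description.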

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  000001011  s  s  d
```

```
Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0xB
S_X0 - Sbit
Opcode_X0 - 0x0
```

*Figure 4-317: inthb in X0 Bit Descriptions*
Figure 4-318: inthb in X1 Bit Descriptions
**inthh: Interleave High Half Words**

**Syntax**

`inthh Dest, SrcA, SrcB`

**Example**

`inthh r5, r6, r7`

**Description**

Interleave the high-order half word of the first operand with the high-order half word of the second operand. The high-order half word of the result will be the high-order half word of the first operand. For example if the first operand contains the packed half words \{A1,A0\} and the second operand contains the packed half words \{B1,B0\} then the result will be \{A1,B1\}.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    bool asel = ((counter & 1) == 1);
    int in_sel = 1 + counter / 2;
    int16_t srca = ((rf[SrcA] >> (in_sel * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb = ((rf[SrcB] >> (in_sel * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= ((asel ? srca : srcb) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE);
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  000001100  s  s  d
```

**Figure 4-319: inthh in X0 Bit Descriptions**
Figure 4-320: inthh in X1 Bit Descriptions
intlb: Interleave Low Byte

**Syntax**

```
intlb Dest, SrcA, SrcB
```

**Example**

```
intlb r5, r6, r7
```

**Description**

Interleave the two low-order bytes of the first operand with the two low-order bytes of the second operand. The low-order byte of the result will be the low-order byte of the second operand. For example if the first operand contains the packed bytes \{A3,A2,A1,A0\} and the second operand contains the packed bytes \{B3,B2,B1,B0\} then the result will be \{A1,B1,A0,B0\}.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    bool asel = ((counter & 1) == 1);
    int in_sel = 0 + counter / 2;
    int16_t srca = ((rf[SrcA] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
    int16_t srcb = ((rf[SrcB] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
    output |=
        (((asel ? srca : srcb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

![Figure 4-321: intlb in X0 Bit Descriptions](image)
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31

0001 | n | 000000111 | s | s | d

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x7
- S_X1 - Sbit
- Opcode_X1 - 0x1

Figure 4-322: intlb in X1 Bit Descriptions
intlh: Interleave Low Half Words

Syntax

intlh Dest, SrcA, SrcB

Example

intlh r5, r6, r7

Description

Interleave the low-order half word of the first operand with the low-order half word of the second operand. The low-order half word of the result will be the low-order half word of the second operand. For example if the first operand contains the packed half words \{A1,A0\} and the second operand contains the packed half words \{B1,B0\} then the result will be \{A0,B0\}.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    bool asel = ((counter & 1) == 1);
    int in_sel = 0 + counter / 2;
    int16_t srca =
        ((rf[SrcA] >> (in_sel * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb =
        ((rf[SrcB] >> (in_sel * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        (((asel ? srca : srcb) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000 | n | 000001110 | s | s | d

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0xE
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-323: intlh in X0 Bit Descriptions
Figure 4-324: intlh in X1 Bit Descriptions
maxb_u: Maximum Byte Unsigned

Syntax

maxb_u Dest, SrcA, SrcB

Example

maxb_u r5, r6, r7

Description

Set each of the bytes in the destination to the maximum of the corresponding byte in the first source operand and the corresponding byte in the second source operand.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    uint8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca > srcb) ? srca : srcb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000| n  | 00001111 | s  | s  | d  |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0xF
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-325: maxb_u in X0 Bit Descriptions
Figure 4-326: maxb_u in X1 Bit Descriptions
maxh: Maximum Half Words

Syntax
maxh Dest, SrcA, SrcB

Example
maxh r5, r6, r7

Description
Set each of the half words in the destination to the maximum of the corresponding half word in the first source operand and the corresponding half word in the second source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca = ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb = ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= (((srca > srcb) ? srca : srcb) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE);
}
rf[Dest] = output;
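
The comparison above is performed on `int16_t` values, so the half words are compared as signed quantities. The sketch below illustrates the consequence, assuming a 32-bit word and two's-complement half words. The names `maxh_model` and `signExtend16` are illustrative.

```c
#include <stdint.h>

/* Assumed helper: interpret the low 16 bits as a two's-complement value. */
static int32_t signExtend16(uint32_t value)
{
    int32_t low = (int32_t)(value & 0xFFFFu);
    return (low >= 0x8000) ? low - 0x10000 : low;
}

/* Hypothetical model of maxh: signed maximum in each half-word lane. */
static uint32_t maxh_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 2; lane++) {
        int32_t x = signExtend16(a >> (lane * 16));
        int32_t y = signExtend16(b >> (lane * 16));
        int32_t m = (x > y) ? x : y;
        output |= ((uint32_t)m & 0xFFFFu) << (lane * 16);
    }
    return output;
}
```

With a signed comparison, 0xFFFF (that is, -1) is smaller than 0x0001, so the maximum of those two lanes is 0x0001. An unsigned comparison would have selected 0xFFFF.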

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-327: maxh in X0 Bit Descriptions
Figure 4-328: maxh in X1 Bit Descriptions
**maxib_u: Maximum Immediate Byte Unsigned**

**Syntax**

```
maxib_u Dest, SrcA, Imm8
```

**Example**

```
maxib_u r5, r6, 5
```

**Description**

Set each of the bytes in the destination to the maximum of the corresponding byte in the first source operand and the sign extended immediate.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
uint8_t immb = Imm8;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca > immb) ? srca : immb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 100 | n | 0000100 | i | s | d |

**Figure 4-329: maxib_u in X0 Bit Descriptions**
Figure 4-330: maxib_u in X1 Bit Descriptions
maxih: Maximum Immediate Half Words

Syntax
maxih Dest, SrcA, Imm8

Example
maxih r5, r6, 5

Description
Set each of the half words in the destination to the maximum of the corresponding half word in the first source operand and the sign extended immediate.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca > signExtend8(Imm8)) ? srca : signExtend8(Imm8)) &
            HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 100 | n | 0000101 | i | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
Imm8_X0 - Imm8
ImmOpcodeExtension_X0 - 0x5
S_X0 - Sbit
Opcode_X0 - 0x4

Figure 4-331: maxih in X0 Bit Descriptions
Figure 4-332: maxih in X1 Bit Descriptions
**minb_u: Minimum Byte Unsigned**

**Syntax**

minb_u Dest, SrcA, SrcB

**Example**

minb_u r5, r6, r7

**Description**

Set each of the bytes in the destination to the minimum of the corresponding byte in the first source operand and the corresponding byte in the second source operand.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    uint8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca < srcb) ? srca : srcb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
  30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
000 n 000010001 s s d
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x11
- S_X0 - Sbit
- Opcode_X0 - 0x0

*Figure 4-333: minb_u in X0 Bit Descriptions*
Figure 4-334: minb_u in X1 Bit Descriptions
**minh: Minimum Half Words**

**Syntax**

```
minh Dest, SrcA, SrcB
```

**Example**

```
minh r5, r6, r7
```

**Description**

Set each of the half words in the destination to the minimum of the corresponding half word in the first source operand and the corresponding half word in the second source operand.

**Functional Description**

```
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca = ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb = ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= (((srca < srcb) ? srca : srcb) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE);
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  000010010  s  s  d
```

*Figure 4-335: minh in X0 Bit Descriptions*
Figure 4-336: minh in X1 Bit Descriptions
**minib_u: Minimum Immediate Byte Unsigned**

**Syntax**

minib_u Dest, SrcA, Imm8

**Example**

minib_u r5, r6, 5

**Description**

Set each of the bytes in the destination to the minimum of the corresponding byte in the first source operand and the sign extended immediate.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
uint8_t immb = Imm8;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca < immb) ? srca : immb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 100| n  | 0000110| i  | s  | d  |
```

**Figure 4-337: minib_u in X0 Bit Descriptions**
Figure 4-338: minib_u in X1 Bit Descriptions
minih: Minimum Immediate Half Words

Syntax

minih Dest, SrcA, Imm8

Example

minih r5, r6, 5

Description

Set each of the half words in the destination to the minimum of the corresponding half word in the first source operand and the sign extended immediate.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca < signExtend8(Imm8)) ? srca : signExtend8(Imm8)) &
            HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-339: minih in X0 Bit Descriptions
Figure 4-340: minih in X1 Bit Descriptions
mnzb: Mask Not Zero Byte

Syntax

mnzb Dest, SrcA, SrcB

Example

mnzb r5, r6, r7

Description

Set each byte in the destination to the corresponding byte of the second operand if the corresponding byte of the first operand is not 0, otherwise set it to zero (0).

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca != 0) ? srcb : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
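
The masking rule can be illustrated on concrete values: the first operand acts as a per-byte selector and the second operand supplies the data. The sketch below assumes a 32-bit word with four byte lanes. The name `mnzb_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of mnzb on plain 32-bit values. */
static uint32_t mnzb_model(uint32_t a, uint32_t b)
{
    uint32_t output = 0;
    for (uint32_t lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (lane * 8)) & 0xFFu;
        uint32_t y = (b >> (lane * 8)) & 0xFFu;
        /* Keep the data byte only where the selector byte is nonzero. */
        output |= ((x != 0u) ? y : 0u) << (lane * 8);
    }
    return output;
}
```

Any nonzero selector byte passes the data byte through unchanged; the selector value itself never appears in the result.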

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>25</th>
<th>24</th>
<th>23</th>
<th>22</th>
<th>21</th>
<th>20</th>
<th>19</th>
<th>18</th>
<th>17</th>
<th>16</th>
<th>15</th>
<th>14</th>
<th>13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>000010011</td>
<td>s</td>
<td>s</td>
<td>d</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x13
- S_X0 - Sbit
- Opcode_X0 - 0x0

Figure 4-341: mnzb in X0 Bit Descriptions
Figure 4-342: mnzb in X1 Bit Descriptions
mnzh: Mask Not Zero Half Words

Syntax

mnzh Dest, SrcA, SrcB

Example

mnzh r5, r6, r7

Description

Set each half word in the destination to the corresponding half word of the second operand if the corresponding half word of the first operand is not 0, otherwise set it to zero (0).

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb =
        ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca != 0) ? srcb : 0) & HALF_WORD_MASK) << (counter *
            HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000  n  000010100  s  s  d
```

Figure 4-343: mnzh in X0 Bit Descriptions
Figure 4-344: mnzh in X1 Bit Descriptions
mzb: Mask Zero Byte

Syntax

mzb Dest, SrcA, SrcB

Example

mzb r5, r6, r7

Description

Set each byte in the destination to the corresponding byte of the second operand if the corresponding byte of the first operand is 0, otherwise set it to zero (0).

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca == 0) ? srcb : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000 | n | 000101111 | s | s | d

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x2F
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-345: mzb in X0 Bit Descriptions
Figure 4-346: mzb in X1 Bit Descriptions
mzh: Mask Zero Half Words

Syntax

mzh Dest, SrcA, SrcB

Example

mzh r5, r6, r7

Description

Set each half word in the destination to the corresponding half word of the second operand if the corresponding half word of the first operand is 0, otherwise set it to zero (0).

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca = ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb = ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= ((((srca == 0) ? srcb : 0) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
000 | n | 00011000 | s | s | d
```

Figure 4-347: mzh in X0 Bit Descriptions
Figure 4-348: mzh in X1 Bit Descriptions
packbs_u: Pack Half Words Saturating

**Syntax**

packbs_u Dest, SrcA, SrcB

**Example**

packbs_u r5, r6, r7

**Description**

Saturate each half word of the two source registers to the unsigned byte range (negative values become 0, and values above the maximum positive byte value become that maximum), and then pack the results into the destination register. The high-order byte of the destination will be the saturated high-order half word of the first operand and the low-order byte of the destination will be the saturated low-order half word of the second operand. For example, if the first operand contains the packed half words \{A1,A0\} and the second operand contains the packed half words \{B1,B0\}, then the result will be \{sat(A1),sat(A0),sat(B1),sat(B0)\}.

**NOTE:** This instruction is only supported in the TILEPro family of products.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    bool asel = ((counter / 2) == 1);
    int in_sel = counter & 1;
    int16_t srca = signExtend16(getHalfWord(rf[SrcA], in_sel));
    int16_t srcb = signExtend16(getHalfWord(rf[SrcB], in_sel));
    output =
        setByte(output, counter,
            unsigned_saturate8(asel ? srca : srcb));
}
rf[Dest] = output;
```
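
The saturate-and-pack behavior can be sketched as standalone C. This assumes a 32-bit word holding two signed 16-bit half words; `packbs_u_sat8`, `packbs_u_half`, and `packbs_u_model` are hypothetical helper names introduced only for illustration.

```c
#include <stdint.h>

/* Clamp a signed value to the unsigned byte range 0..255. */
static uint32_t packbs_u_sat8(int32_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint32_t)v;
}

/* Extract half word `index` (0 = low, 1 = high) and sign-extend it. */
static int32_t packbs_u_half(uint32_t word, unsigned index)
{
    uint32_t h = (word >> (index * 16)) & 0xFFFFu;
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Result layout, high byte to low byte: sat(A1), sat(A0), sat(B1), sat(B0). */
static uint32_t packbs_u_model(uint32_t srca, uint32_t srcb)
{
    return (packbs_u_sat8(packbs_u_half(srca, 1)) << 24) |
           (packbs_u_sat8(packbs_u_half(srca, 0)) << 16) |
           (packbs_u_sat8(packbs_u_half(srcb, 1)) << 8)  |
            packbs_u_sat8(packbs_u_half(srcb, 0));
}
```

With these definitions, `packbs_u_model(0x01230045, 0xFFFF0080)` returns `0xFF450080`: 0x0123 clamps to 0xFF, 0xFFFF (which is -1) clamps to 0, and the in-range values 0x45 and 0x80 pass through unchanged.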

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
| 000 | n | 001100111 | s | s | d |
```

Figure 4-349: packbs_u in X0 Bit Descriptions
Figure 4-350: `packbs_u` in X1 Bit Descriptions
packhb: Pack High Byte

**Syntax**

packhb Dest, SrcA, SrcB

**Example**

packhb r5, r6, r7

**Description**

Pack the high-order byte of each of the packed half words of the two source registers into the destination register. The high-order byte of the destination will be the high-order byte of the first operand. For example, if the first operand contains the packed bytes \{A1_1,A1_0,A0_1,A0_0\} and the second operand contains the packed bytes \{B1_1,B1_0,B0_1,B0_0\}, then the result will be \{A1_1,A0_1,B1_1,B0_1\}.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
  bool asel = ((counter / 2) == 1);
  int in_sel = 1 + (counter & 1) * 2;
  int8_t srca = ((rf[SrcA] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
  int8_t srcb = ((rf[SrcB] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
  output |=
    (((asel ? srca : srcb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```
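
Because the byte selection is fixed, the loop above can also be written as four mask-and-shift terms. The following is a sketch under the assumption of a 32-bit word; `packhb_model` is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative model of packhb for a 32-bit word (hypothetical helper).
 * Result bytes, high to low: A byte 3, A byte 1, B byte 3, B byte 1. */
static uint32_t packhb_model(uint32_t srca, uint32_t srcb)
{
    return  (srca & 0xFF000000u)         |  /* A1_1 stays in byte 3   */
           ((srca & 0x0000FF00u) << 8)   |  /* A0_1 moves to byte 2   */
           ((srcb & 0xFF000000u) >> 16)  |  /* B1_1 moves to byte 1   */
           ((srcb & 0x0000FF00u) >> 8);     /* B0_1 moves to byte 0   */
}
```

For example, `packhb_model(0x11223344, 0xAABBCCDD)` produces `0x1133AACC`.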

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30:28 | 27 | 26:18 | 17:12 | 11:6 | 5:0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 000110100 | s | s | d |

- Dest_X0 : Dest
- SrcA_X0 : SrcA
- SrcB_X0 : SrcB
- RRROpcodeExtension_X0 : 0x34
- S_X0 : Sbit
- Opcode_X0 : 0x0

*Figure 4-351: packhb in X0 Bit Descriptions*

- Dest_X1 : Dest
- SrcA_X1 : SrcA
- SrcB_X1 : SrcB
- RRROpcodeExtension_X1 : 0x1A
- S_X1 : Sbit
- Opcode_X1 : 0x1

*Figure 4-352: packhb in X1 Bit Descriptions*

packhs: Pack Half Words Saturating

**Syntax**

packhs Dest, SrcA, SrcB

**Example**

packhs r5, r6, r7

**Description**

Saturate each of the two source registers to the maximum positive or minimum negative half word value, and then pack the results into the destination register. The high-order half word of the destination will be the saturated first operand and the low-order half word of the destination will be the saturated second operand.

NOTE: This instruction is only supported in the TILEPro family of products.

**Functional Description**

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    bool asel = counter & 1;
    int16_t srca = signed_saturate16(rf[SrcA]);
    int16_t srcb = signed_saturate16(rf[SrcB]);
    output = setHalfWord(output, counter, (asel ? srca : srcb));
}
rf[Dest] = output;
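
As a rough illustration of the signed saturation above, the following C sketch treats each source register as a signed 32-bit value and clamps it to the half word range before packing. The 32-bit register width and the names `packhs_sat16` and `packhs_model` are assumptions made for this example.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to -32768..32767 and return its 16-bit pattern. */
static uint32_t packhs_sat16(int32_t v)
{
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (uint32_t)v & 0xFFFFu;
}

/* High half word: saturated first operand; low half word: saturated second. */
static uint32_t packhs_model(int32_t srca, int32_t srcb)
{
    return (packhs_sat16(srca) << 16) | packhs_sat16(srcb);
}
```

Here `packhs_model(100000, -5)` gives `0x7FFFFFFB` (the first operand clamps to 0x7FFF, the second fits as 0xFFFB), and `packhs_model(-100000, 1234)` gives `0x800004D2`.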

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30:28 | 27 | 26:18 | 17:12 | 11:6 | 5:0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001100110 | s | s | d |

**Figure 4-353: packhs in X0 Bit Descriptions**
Figure 4-354: packhs in X1 Bit Descriptions
packlb: Pack Low Byte

Syntax

packlb Dest, SrcA, SrcB

Example

packlb r5, r6, r7

Description

Pack the low-order byte of each of the packed half words of the two source registers into the destination register. The low-order byte of the destination will be the low-order byte of the second operand. For example if the first operand contains the packed bytes \{A1_1,A1_0,A0_1,A0_0\} and the second operand contains the packed bytes \{B1_1,B1_0,B0_1,B0_0\} then the result will be \{A1_0,A0_0,B1_0,B0_0\}.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    bool asel = ((counter / 2) == 1);
    int in_sel = 0 + (counter & 1) * 2;
    int8_t srca = ((rf[SrcA] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (in_sel * BYTE_SIZE)) & BYTE_MASK);
    output |=
        (((asel ? srca : srcb) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30:28 | 27 | 26:18 | 17:12 | 11:6 | 5:0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 000110101 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x35
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-355: packlb in X0 Bit Descriptions
Figure 4-356: packlb in X1 Bit Descriptions
sadab_u: Sum of Absolute Difference Accumulate Unsigned Bytes

Syntax
sadab_u Dest, SrcA, SrcB

Example
sadab_u r5, r6, r7

Description
Sum the absolute differences between the four bytes in the first source operand and the four bytes in the second source operand and accumulate the sum into the destination register.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output +=
        abs(((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK) -
            ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK));
}
rf[Dest] = rf[Dest] + output;
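
The accumulate form differs from the plain sum of absolute differences only in the final addition to the destination. A minimal C sketch, assuming a 32-bit word with four unsigned byte lanes, makes this explicit; `sadb_u_model` and `sadab_u_model` are hypothetical names.

```c
#include <stdint.h>

/* Sum of absolute differences over four unsigned byte lanes. */
static uint32_t sadb_u_model(uint32_t srca, uint32_t srcb)
{
    uint32_t sum = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint32_t a = (srca >> (lane * 8)) & 0xFFu;
        uint32_t b = (srcb >> (lane * 8)) & 0xFFu;
        sum += (a > b) ? (a - b) : (b - a);
    }
    return sum;
}

/* Accumulating variant: the old destination value is an extra input. */
static uint32_t sadab_u_model(uint32_t dest, uint32_t srca, uint32_t srcb)
{
    return dest + sadb_u_model(srca, srcb);
}
```

For `srca = 0x10FF0005` and `srcb = 0x20000105` the lane differences are 16, 255, 1, and 0, so `sadb_u_model` returns 272 and `sadab_u_model(1000, srca, srcb)` returns 1272. Chaining the accumulating form across several words avoids a separate add per word, which is a plausible use in block-matching loops.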

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-357: sadab_u in X0 Bit Descriptions
sadah: Sum of Absolute Difference Accumulate Half Words

Syntax

sadah Dest, SrcA, SrcB

Example

sadah r5, r6, r7

Description

Sum the absolute differences between the pair of half words in the first source operand and the
pair of half words in the second source operand and accumulate the sum into the destination
register.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output +=
        abs(signExtend16(getHalfWord(rf[SrcA], counter)) -
        signExtend16(getHalfWord(rf[SrcB], counter)));
}
rf[Dest] = rf[Dest] + output;
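
The signed and unsigned half word forms can give very different sums for the same bit patterns, because the signed form sign-extends each half word before subtracting. The sketch below assumes a 32-bit word; all function names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Extract half word `index` (0 = low, 1 = high) as a signed value. */
static int32_t sadh_half_signed(uint32_t word, unsigned index)
{
    uint32_t h = (word >> (index * 16)) & 0xFFFFu;
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Signed sum of absolute differences over two half words. */
static uint32_t sadh_model(uint32_t srca, uint32_t srcb)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < 2; i++) {
        int32_t d = sadh_half_signed(srca, i) - sadh_half_signed(srcb, i);
        sum += (uint32_t)(d < 0 ? -d : d);
    }
    return sum;
}

/* Unsigned variant: half words are compared as values 0..65535. */
static uint32_t sadh_u_model(uint32_t srca, uint32_t srcb)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < 2; i++) {
        uint32_t a = (srca >> (i * 16)) & 0xFFFFu;
        uint32_t b = (srcb >> (i * 16)) & 0xFFFFu;
        sum += (a > b) ? (a - b) : (b - a);
    }
    return sum;
}

/* Accumulating signed form, as in sadah. */
static uint32_t sadah_model(uint32_t dest, uint32_t srca, uint32_t srcb)
{
    return dest + sadh_model(srca, srcb);
}
```

With `srca = 0x80000001` and `srcb = 0x7FFF0003`, the signed sum is 65537 (the high half words are -32768 and 32767), while the unsigned sum is only 3.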

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
| 30:28     | 27   | 26:18                 | 17:12   | 11:6    | 5:0     |
| 000       | n    | 0001101H              | s       | s       | d       |
| Opcode_X0 | S_X0 | RRROpcodeExtension_X0 | SrcB_X0 | SrcA_X0 | Dest_X0 |
```

Figure 4-358: sadah in X0 Bit Descriptions
sadah_u: Sum of Absolute Difference Accumulate Unsigned Half Words

Syntax
sadah_u Dest, SrcA, SrcB

Example
sadah_u r5, r6, r7

Description
Sum the absolute differences between the pair of half words in the first source operand and the pair of half words in the second source operand and accumulate the sum into the destination register.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output +=
        abs(((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK) -
            ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) &
             HALF_WORD_MASK));
}
rf[Dest] = rf[Dest] + output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-359: sadah_u in X0 Bit Descriptions
sadb_u: Sum of Absolute Difference Unsigned Bytes

**Syntax**

sadb_u Dest, SrcA, SrcB

**Example**

sadb_u r5, r6, r7

**Description**

Sum the absolute differences between the four bytes in the first source operand and the four bytes in the second source operand.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output +=
        abs(((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK) -
            ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK));
}
rf[Dest] = output;
```

**Valid Pipelines**

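<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>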

**Encoding**

```
000 | n | 000111101 | s | s | d

RRROpcodeExtension_X0 - 0x3D
Opcode_X0 - 0x0
```

*Figure 4-360: sadb_u in X0 Bit Descriptions*
sadh: Sum of Absolute Difference Half Words

Syntax

sadh Dest, SrcA, SrcB

Example

sadh r5, r6, r7

Description

Sum the absolute differences between the pair of half words in the first source operand and the pair of half words in the second source operand.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output +=
        abs(signExtend16(getHalfWord(rf[SrcA], counter)) -
            signExtend16(getHalfWord(rf[SrcB], counter)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
| 30:28 | 27 | 26:18     | 17:12 | 11:6 | 5:0 |
| 000   | n  | 000111110 | s     | s    | d   |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x3E
S_X0 - Sbit
Opcode_X0 - 0x0
```

Figure 4-361: sadh in X0 Bit Descriptions
**sadh_u: Sum of Absolute Difference Unsigned Half Words**

**Syntax**

sadh_u Dest, SrcA, SrcB

**Example**

sadh_u r5, r6, r7

**Description**

Sum the absolute differences between the pair of half words in the first source operand and the pair of half words in the second source operand.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output +=
        abs(((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK) -
        ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) &
            HALF_WORD_MASK));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
30  29  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10  9  8  7  6  5  4  3  2  1  0
000  n  00011111  s  s  d
```

*Figure 4-362: sadh_u in X0 Bit Descriptions*
seqb: Set Equal to Byte

Syntax
seqb Dest, SrcA, SrcB

Example
seqb r5, r6, r7

Description
Sets each result byte to 1 if the corresponding byte of the first source operand is equal to the byte of the second source operand. Otherwise the result is set to 0. This instruction treats both source bytes as signed values.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |=
        ((((srca == srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
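
Note that each result byte holds the value 1 (0x01), not an all-ones mask. The following C sketch shows this under the assumption of a 32-bit word; `seqb_model` is a hypothetical helper name.

```c
#include <stdint.h>

/* Illustrative model of seqb: each result byte is 0x01 where the
 * corresponding source bytes match, otherwise 0x00. */
static uint32_t seqb_model(uint32_t srca, uint32_t srcb)
{
    uint32_t diff = srca ^ srcb;   /* a zero byte lane means equal bytes */
    uint32_t output = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        if (((diff >> (lane * 8)) & 0xFFu) == 0)
            output |= 1u << (lane * 8);
    }
    return output;
}
```

For instance, `seqb_model(0x11223344, 0x11AA33BB)` returns `0x01000100`, since only the 0x11 and 0x33 lanes match. Because equality does not depend on signedness, comparing raw bit patterns gives the same result as comparing signed bytes.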

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000 | n | 00100000 | s | s | d

Figure 4-363: seqb in X0 Bit Descriptions
Figure 4-364: seqb in X1 Bit Descriptions
**seqh: Set Equal To Half Words**

**Syntax**

```
seqh Dest, SrcA, SrcB
```

**Example**

```
seqh r5, r6, r7
```

**Description**

Sets each result half word to 1 if the corresponding half word of the first source operand is equal to the half word of the second source operand. Otherwise the result is set to 0. This instruction treats both source half words as signed values.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb =
        ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca == srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter *
            HALF_WORD_SIZE));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30:28 | 27 | 26:18 | 17:12 | 11:6 | 5:0 |
|-------|----|----------|-------|------|-----|
| 000 | n | 00100001 | s | s | d |

**Figure 4-365: seqh in X0 Bit Descriptions**

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x22
- S_X1 - Sbit
- Opcode_X1 - 0x1

**Figure 4-366: seqh in X1 Bit Descriptions**

seqib: Set Equal To Immediate Byte

Syntax
seqib Dest, SrcA, Imm8

Example
seqib r5, r6, 5

Description
Sets each result byte to 1 if the corresponding byte of the first source operand is equal to a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source bytes as signed values.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = signExtend8(Imm8) & BYTE_MASK;
    output |= ((((srca == srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
| 30:28 | 27 | 26:20   | 19:12 | 11:6 | 5:0 |
| 100   | n  | 0001001 | i     | s    | d   |
```

Figure 4-367: seqib in X0 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- Imm8_X1 - Imm8
- ImmOpcodeExtension_X1 - 0xC
- S_X1 - Sbit
- Opcode_X1 - 0x6

Figure 4-368: seqib in X1 Bit Descriptions

seqih: Set Equal To Immediate Half Words

Syntax
seqih Dest, SrcA, Imm8

Example
seqih r5, r6, 5

Description
Sets each result half word to 1 if the corresponding half word of the first source operand is equal to a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source half words as signed values.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
  int16_t srca =
    ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
  int16_t srcb = signExtend8(Imm8) & HALF_WORD_MASK;
  output |=
    ((((srca == srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;
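
The sign extension of the 8-bit immediate matters here: an immediate of -1 is compared as 0xFFFF, so it can only match a half word whose upper byte is also all ones. The C sketch below illustrates this, assuming a 32-bit word; `seqih_model` is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative model of seqih: compare each 16-bit half word against
 * the 8-bit immediate sign-extended to 16 bits. */
static uint32_t seqih_model(uint32_t srca, int8_t imm8)
{
    /* Conversion to int32_t sign-extends; keep the low 16 bits. */
    uint32_t imm16 = (uint32_t)(int32_t)imm8 & 0xFFFFu;
    uint32_t output = 0;
    for (unsigned lane = 0; lane < 2; lane++) {
        uint32_t h = (srca >> (lane * 16)) & 0xFFFFu;
        if (h == imm16)
            output |= 1u << (lane * 16);
    }
    return output;
}
```

As a usage example, `seqih_model(0xFFFF0005, -1)` returns `0x00010000` and `seqih_model(0xFFFF0005, 5)` returns `0x00000001`, while `seqih_model(0x00FF00FF, -1)` returns 0 because 0x00FF is not equal to the sign-extended 0xFFFF.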

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30:28 | 27 | 26:20 | 19:12 | 11:6 | 5:0 |
|-------|----|---------|-------|------|-----|
| 100 | n | 0001010 | i | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
Imm8_X0 - Imm8
ImmOpcodeExtension_X0 - 0xA
S_X0 - Sbit
Opcode_X0 - 0x4

Figure 4-369: seqih in X0 Bit Descriptions
Figure 4-370: seqih in X1 Bit Descriptions
shlb: Logical Shift Left Bytes

**Syntax**

shlb Dest, SrcA, SrcB

**Example**

shlb r5, r6, r7

**Description**

Logically shift the four bytes in the first source operand to the left by the second source operand. If the shift amount is larger than the number of bits in a byte, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a byte. Logical left shift shifts zeros into the low ordered bits in a byte.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (getByte(rf[SrcA], counter) <<
                (((UnsignedMachineWord) rf[SrcB]) % BYTE_SIZE)));
}
rf[Dest] = output;
```
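
A short C sketch can make the modulo rule and the per-byte isolation concrete. It assumes a 32-bit word and 8-bit bytes, and `shlb_model` is a hypothetical helper name.

```c
#include <stdint.h>

/* Illustrative model of shlb: shift each byte lane left independently.
 * Bits shifted out of a lane are discarded, not carried into the next. */
static uint32_t shlb_model(uint32_t srca, uint32_t srcb)
{
    unsigned shift = srcb % 8;     /* effective shift amount, 0..7 */
    uint32_t output = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint32_t byte = (srca >> (lane * 8)) & 0xFFu;
        uint32_t shifted = (byte << shift) & 0xFFu;
        output |= shifted << (lane * 8);
    }
    return output;
}
```

With this model, `shlb_model(0x81FF0102, 1)` and `shlb_model(0x81FF0102, 9)` both return `0x02FE0204`, and a shift amount of 8 leaves the operand unchanged because 8 modulo 8 is 0.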

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  001000011  s  s  d
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x43
- S_X0 - Sbit
- Opcode_X0 - 0x0

*Figure 4-371: shlb in X0 Bit Descriptions*
Figure 4-372: shlb in X1 Bit Descriptions
shlh: Logical Shift Left Half Words

Syntax
shlh Dest, SrcA, SrcB

Example
shlh r5, r6, r7

Description
Logically shift the pair of half words in the first source operand to the left by the second source operand. If the shift amount is larger than the number of bits in a half word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a half word. Logical left shift shifts zeros into the low ordered bits in a half word.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) <<
                (((UnsignedMachineWord) rf[SrcB]) % HALF_WORD_SIZE)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
</tr>
</tbody>
</table>

Encoding

Figure 4-373: shlh in X0 Bit Descriptions
shlib: Logical Shift Left Immediate Bytes

Syntax
shlib Dest, SrcA, ShAmt

Example
shlib r5, r6, 5

Description
Logically shift the four bytes in the first source operand to the left by an immediate. If the shift amount is larger than the number of bits in a byte, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a byte. Left shifts shift zeros into the low ordered bits in a byte and are suitable to be used as unsigned multiplication by powers of 2.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (getByte(rf[SrcA], counter) <<
            (((UnsignedMachineWord) ShAmt) % BYTE_SIZE)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30:28 | 27 | 26:17 | 16:12 | 11:6 | 5:0 |
|-------|----|------------|-------|------|-----|
| 111 | n | 0000000010 | i | s | d |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- ShAmt_X0 - ShAmt
- UnShOpcodeExtension_X0 - 0x2
- S_X0 - Sbit
- Opcode_X0 - 0x7

Figure 4-374: shlib in X0 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- ShAmt_X1 - ShAmt
- UnShOpcodeExtension_X1 - 0x2
- S_X1 - Sbit
- Opcode_X1 - 0x8

Figure 4-375: shlib in X1 Bit Descriptions

shlih: Logical Shift Left Immediate Half Words

Syntax

shlih Dest, SrcA, ShAmt

Example

shlih r5, r6, 5

Description

Logically shift the pair of half words in the first source operand to the left by an immediate. If the shift amount is larger than the number of bits in a half word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a half word. Left shifts shift zeros into the low ordered bits in a half word and are suitable to be used as unsigned multiplication by powers of 2.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) <<
                (((UnsignedMachineWord) ShAmt) %
                    HALF_WORD_SIZE)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

\[\begin{array}{|c|c|c|c|c|c|}
\hline
30{:}28 & 27 & 26{:}17 & 16{:}12 & 11{:}6 & 5{:}0 \\
\hline
111 & n & 0000000011 & i & s & d \\
\hline
\end{array}\]

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- ShAmt_X0 - ShAmt
- UnShOpcodeExtension_X0 - 0x3
- S_X0 - Sbit
- Opcode_X0 - 0x7

Figure 4-376: shlih in X0 Bit Descriptions
Figure 4-377: shlih in X1 Bit Descriptions
shrb: Logical Shift Right Bytes

Syntax

shrb Dest, SrcA, SrcB

Example

shrb r5, r6, r7

Description

Logically shift the four bytes in the first source operand to the right by the second source operand. If the shift amount is larger than the number of bits in a byte, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a byte. Logical right shift shifts zeros into the high ordered bits in a byte.

Functional Description

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (getByte(rf[SrcA], counter) >>
                (((UnsignedMachineWord) rf[SrcB]) % BYTE_SIZE)));
}
rf[Dest] = output;
```

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30-28</th>
<th>27</th>
<th>26-18</th>
<th>17-12</th>
<th>11-6</th>
<th>5-0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>001000110</td>
<td>s</td>
<td>s</td>
<td>d</td>
</tr>
</tbody>
</table>

- `Dest_X0`: Dest
- `SrcA_X0`: SrcA
- `SrcB_X0`: SrcB
- `RRROpcodeExtension_X0`: 0x46
- `S_X0`: Sbit
- `Opcode_X0`: 0x0

*Figure 4-378: shrb in X0 Bit Descriptions*
Figure 4-379: shrb in X1 Bit Descriptions
shrh: Logical Shift Right Half Words

Syntax

shrh Dest, SrcA, SrcB

Example

shrh r5, r6, r7

Description

Logically shift the pair of half words in the first source operand to the right by the second source operand. If the shift amount is larger than the number of bits in a half word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a half word. Logical right shift shifts zeros into the high ordered bits in a half word.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) >>
                (((UnsignedMachineWord) rf[SrcB]) %
                    HALF_WORD_SIZE)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-18 | 17-12 | 11-6 | 5-0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001000111 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x47
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-380: shrh in X0 Bit Descriptions
Figure 4-381: shrh in X1 Bit Descriptions
shrib: Logical Shift Right Immediate Bytes

Syntax

shrib Dest, SrcA, ShAmt

Example

shrib r5, r6, 5

Description

Logically shift the four bytes in the first source operand to the right by an immediate. If the shift amount is larger than the number of bits in a byte, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a byte. Logical right shifts shift zeros into the high ordered bits in a byte and are suitable to be used as unsigned integer division by powers of 2.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (getByte(rf[SrcA], counter) >>
            (((UnsignedMachineWord) ShAmt) % BYTE_SIZE)));
}
rf[Dest] = output;
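
The remark about unsigned division by powers of 2 can be checked lane by lane. The sketch below assumes a 32-bit word holding four bytes; `shrib_model` and `div_bytes_pow2` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Illustrative model of shrib for a 32-bit word: each byte is shifted
 * right logically by sh_amt modulo 8. */
uint32_t shrib_model(uint32_t src_a, uint32_t sh_amt) {
    uint32_t sh = sh_amt % 8u;
    uint32_t output = 0;
    for (uint32_t i = 0; i < 4u; i++) {
        uint32_t byte = (src_a >> (i * 8u)) & 0xFFu;
        output |= (byte >> sh) << (i * 8u);
    }
    return output;
}

/* Reference computation: divide each byte by 2^k as an unsigned
 * integer, for 0 <= k < 8. */
uint32_t div_bytes_pow2(uint32_t src_a, uint32_t k) {
    uint32_t output = 0;
    for (uint32_t i = 0; i < 4u; i++) {
        uint32_t byte = (src_a >> (i * 8u)) & 0xFFu;
        output |= (byte / (1u << k)) << (i * 8u);
    }
    return output;
}
```

For shift amounts below 8 the two functions agree on every input, because each byte lane is treated as an independent unsigned value.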

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
00000000101 i s d
```

Figure 4-382: shrib in X0 Bit Descriptions
Figure 4-383: shrib in X1 Bit Descriptions
shrih: Logical Shift Right Immediate Half Words

Syntax

shrih Dest, SrcA, ShAmt

Example

shrih r5, r6, 5

Description

Logically shift the pair of half words in the first source operand to the right by an immediate. If the shift amount is larger than the number of bits in a half word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a half word. Logical right shifts shift zeros into the high ordered bits in a half word and are suitable to be used as unsigned integer division by powers of 2.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) >>
            (((UnsignedMachineWord) ShAmt) %
            HALF_WORD_SIZE)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-17 | 16-12 | 11-6 | 5-0 |
|-------|----|------------|-------|------|-----|
| 111 | n | 0000000110 | i | s | d |

Figure 4-384: shrih in X0 Bit Descriptions
Figure 4-385: shrih in X1 Bit Descriptions
sltb: Set Less Than Byte

Syntax

sltb Dest, SrcA, SrcB

Example

sltb r5, r6, r7

Description

Sets each result byte to 1 if the corresponding byte of the first source operand is less than the byte of the second source operand. Otherwise the result is set to 0. This instruction treats both source bytes as signed values.

Functional Description

\begin{verbatim}
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca < srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
\end{verbatim}

Valid Pipelines

\begin{tabular}{|c|c|c|c|c|}
\hline
X0 & X1 & Y0 & Y1 & Y2 \\
\hline
X & X & & & \\
\hline
\end{tabular}

Encoding

Figure 4-386: sltb in X0 Bit Descriptions
Figure 4-387: sltb in X1 Bit Descriptions
**sltb_u: Set Less Than Unsigned Byte**

**Syntax**

sltb_u Dest, SrcA, SrcB

**Example**

sltb_u r5, r6, r7

**Description**

Sets each result byte to 1 if the corresponding byte of the first source operand is less than the byte of the second source operand. Otherwise the result is set to 0. This instruction treats both source bytes as unsigned values.

**Functional Description**

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    uint8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca < srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
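
The only difference between `sltb` and `sltb_u` is how each byte lane is interpreted before the comparison. The following sketch, assuming a 32-bit word, contrasts the two; `as_signed_byte`, `sltb_model`, and `sltb_u_model` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Interpret the low 8 bits of b as a two's-complement value without
 * relying on implementation-defined narrowing conversions. */
static int32_t as_signed_byte(uint32_t b) {
    return (int32_t)((b & 0xFFu) ^ 0x80u) - 0x80;
}

/* Illustrative model of sltb: signed per-byte "less than". */
uint32_t sltb_model(uint32_t src_a, uint32_t src_b) {
    uint32_t output = 0;
    for (uint32_t i = 0; i < 4u; i++) {
        int32_t a = as_signed_byte(src_a >> (i * 8u));
        int32_t b = as_signed_byte(src_b >> (i * 8u));
        output |= (uint32_t)(a < b) << (i * 8u);
    }
    return output;
}

/* Illustrative model of sltb_u: unsigned per-byte "less than". */
uint32_t sltb_u_model(uint32_t src_a, uint32_t src_b) {
    uint32_t output = 0;
    for (uint32_t i = 0; i < 4u; i++) {
        uint32_t a = (src_a >> (i * 8u)) & 0xFFu;
        uint32_t b = (src_b >> (i * 8u)) & 0xFFu;
        output |= (uint32_t)(a < b) << (i * 8u);
    }
    return output;
}
```

With the same operands the two models can disagree in every lane whose bytes have the high bit set: 0xFF is -1 when signed but 255 when unsigned.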

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
000  n  001001010  s  s  d
```

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x4A
- S_X0 - Sbit
- Opcode_X0 - 0x0

*Figure 4-388: sltb_u in X0 Bit Descriptions*
Figure 4-389: sltb_u in X1 Bit Descriptions

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x2C
- S_X1 - Sbit
- Opcode_X1 - 0x1
slteb: Set Less Than or Equal Byte

Syntax

slteb Dest, SrcA, SrcB

Example

slteb r5, r6, r7

Description

Sets each result byte to 1 if the corresponding byte of the first source operand is less than or equal to the byte of the second source operand. Otherwise the result is set to 0. This instruction treats both source bytes as signed values.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca <= srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-18 | 17-12 | 11-6 | 5-0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001001011 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x4B
S_X0 - Sbit
Opcode_X0 - 0x0

---

Figure 4-390: slteb in X0 Bit Descriptions
Figure 4-391: slteb in X1 Bit Descriptions

slteb_u: Set Less Than or Equal Unsigned Byte

Syntax

```
slteb_u Dest, SrcA, SrcB
```

Example

```
slteb_u r5, r6, r7
```

Description

Sets each result byte to 1 if the corresponding byte of the first source operand is less than or equal to the byte of the second source operand. Otherwise the result is set to 0. This instruction treats both source bytes as unsigned values.

Functional Description

```
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    uint8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca <= srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-18 | 17-12 | 11-6 | 5-0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001001100 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x4C
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-392: slteb_u in X0 Bit Descriptions
Figure 4-393: slteb_u in X1 Bit Descriptions

Dest_X1 - Dest
SrcA_X1 - SrcA
SrcB_X1 - SrcB
RRROpcodeExtension_X1 - 0x2E
S_X1 - Sbit
Opcode_X1 - 0x1
slteh: Set Less Than or Equal Half Words

Syntax
slteh Dest, SrcA, SrcB

Example
slteh r5, r6, r7

Description
Sets each result half word to 1 if the corresponding half word of the first source operand is less than or equal to the half word of the second source operand. Otherwise the result is set to 0. This instruction treats both source half words as signed values.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb =
        ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca <=
            srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter *
            HALF_WORD_SIZE));
}
rf[Dest] = output;
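
Note that each result lane holds the integer 1 or 0, not an all-ones mask. The sketch below illustrates this for `slteh`, assuming a 32-bit word with two 16-bit half words; `as_signed_half` and `slteh_model` are hypothetical names.

```c
#include <stdint.h>

/* Interpret the low 16 bits of h as a two's-complement value. */
static int32_t as_signed_half(uint32_t h) {
    return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Illustrative model of slteh: signed per-half-word "less than or
 * equal". Each result half word is 0x0001 or 0x0000. */
uint32_t slteh_model(uint32_t src_a, uint32_t src_b) {
    uint32_t output = 0;
    for (uint32_t i = 0; i < 2u; i++) {
        int32_t a = as_signed_half(src_a >> (i * 16u));
        int32_t b = as_signed_half(src_b >> (i * 16u));
        output |= (uint32_t)(a <= b) << (i * 16u);
    }
    return output;
}
```

A caller that needs a full lane mask would have to derive it from the 0/1 result in a later step; that step is outside what this instruction is described as doing.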

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-18 | 17-12 | 11-6 | 5-0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001001101 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x4D
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-394: slteh in X0 Bit Descriptions
Figure 4-395: slteh in X1 Bit Descriptions

<table>
<thead>
<tr>
<th>Opcode_X1</th>
<th>S_X1</th>
<th>RRROpcodeExtension_X1</th>
<th>SrcB_X1</th>
<th>SrcA_X1</th>
<th>Dest_X1</th>
</tr>
</thead>
<tbody>
<tr>
<td>0001</td>
<td>n</td>
<td>000101111</td>
<td>s</td>
<td>s</td>
<td>d</td>
</tr>
</tbody>
</table>

- `Dest_X1` - Dest
- `SrcA_X1` - SrcA
- `SrcB_X1` - SrcB
- `RRROpcodeExtension_X1` - 0x2F
- `S_X1` - Sbit
- `Opcode_X1` - 0x1
slteh_u: Set Less Than or Equal Unsigned Half Words

Syntax
slteh_u Dest, SrcA, SrcB

Example
slteh_u r5, r6, r7

Description
Sets each result half word to 1 if the corresponding half word of the first source operand is less than or equal to the half word of the second source operand. Otherwise the result is set to 0. This instruction treats both source half words as unsigned values.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    uint16_t srca = ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    uint16_t srcb = ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= ((((srca <= srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-18 | 17-12 | 11-6 | 5-0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001001110 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x4E
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-396: slteh_u in X0 Bit Descriptions
Figure 4-397: slteh_u in X1 Bit Descriptions
slth: Set Less Than Half Words

Syntax

slth Dest, SrcA, SrcB

Example

slth r5, r6, r7

Description

Sets each result half word to 1 if the corresponding half word of the first source operand is less than the half word of the second source operand. Otherwise the result is set to 0. This instruction treats both source half words as signed values.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca = ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb = ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |= (((srca < srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE);
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-18 | 17-12 | 11-6 | 5-0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001010001 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x51
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-398: slth in X0 Bit Descriptions
Figure 4-399: slth in X1 Bit Descriptions
slth_u: Set Less Than Unsigned Half Words

Syntax

slth_u Dest, SrcA, SrcB

Example

slth_u r5, r6, r7

Description

Sets each result half word to 1 if the corresponding half word of the first source operand is less than the half word of the second source operand. Otherwise the result is set to 0. This instruction treats both source half words as unsigned values.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    uint16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    uint16_t srcb =
        ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca <
            srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter *
            HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>25</th>
<th>24</th>
<th>23</th>
<th>22</th>
<th>21</th>
<th>20</th>
<th>19</th>
<th>18</th>
<th>17</th>
<th>16</th>
<th>15</th>
<th>14</th>
<th>13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>00101010</td>
<td>s</td>
<td>s</td>
<td>d</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Figure 4-400: slth_u in X0 Bit Descriptions
Figure 4-401: slth_u in X1 Bit Descriptions
sltib: Set Less Than Immediate Byte

Syntax

sltib Dest, SrcA, Imm8

Example

sltib r5, r6, 5

Description

Sets each result byte to 1 if the corresponding byte of the first source operand is less than a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source bytes as signed values.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = signExtend8(Imm8) & BYTE_MASK;
    output |=
        ((((srca < srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30-28 | 27 | 26-20 | 19-12 | 11-6 | 5-0 |
|-------|----|---------|-------|------|-----|
| 100 | n | 0001100 | i | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
Imm8_X0 - Imm8
ImmOpcodeExtension_X0 - 0xC
S_X0 - Sbit
Opcode_X0 - 0x4

Figure 4-402: sltib in X0 Bit Descriptions
Figure 4-403: sltib in X1 Bit Descriptions
### sltib_u: Set Less Than Unsigned Immediate Byte

**Syntax**

sltib_u Dest, SrcA, Imm8

**Example**

sltib_u r5, r6, 5

**Description**

Sets each result byte to 1 if the corresponding byte of the first source operand is less than a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source bytes as unsigned values.

**Functional Description**

```
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    uint8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    uint8_t srcb = signExtend8(Imm8) & BYTE_MASK;
    output |= ((((srca < srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
0001101
```

*Figure 4-404: sltib\_u in X0 Bit Descriptions*
Figure 4-405: sltib_u in X1 Bit Descriptions
### sltih: Set Less Than Immediate Half Words

**Syntax**

sltih Dest, SrcA, Imm8

**Example**

sltih r5, r6, 5

**Description**

Sets each result half word to 1 if the corresponding half word of the first source operand is less than a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source half words as signed values.

**Functional Description**

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb = signExtend8(Imm8) & HALF_WORD_MASK;
    output |=
        ((((srca < srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter *
            HALF_WORD_SIZE));
}
rf[Dest] = output;

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30-28 | 27 | 26-20 | 19-12 | 11-6 | 5-0 |
|-------|----|---------|-------|------|-----|
| 100 | n | 0001110 | i | s | d |

- **Dest_X0:** Dest
- **SrcA_X0:** SrcA
- **Imm8_X0:** Imm8
- **ImmOpcodeExtension_X0:** 0xE
- **S_X0:** Sbit
- **Opcode_X0:** 0x4

*Figure 4-406: sltih in X0 Bit Descriptions*
Figure 4-407: sltih in X1 Bit Descriptions
**sltih_u: Set Less Than Unsigned Immediate Half Words**

**Syntax**

sltih_u Dest, SrcA, Imm8

**Example**

sltih_u r5, r6, 5

**Description**

Sets each result half word to 1 if the corresponding half word of the first source operand is less than a sign extended immediate. Otherwise the result is set to 0. This instruction treats both source half words as unsigned values.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    uint16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    uint16_t srcb = signExtend8(Imm8) & HALF_WORD_MASK;
    output |=
        ((((srca < srcb) ? 1 : 0) & HALF_WORD_MASK) <<
            (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;
```
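
Because the 8-bit immediate is sign extended before the unsigned comparison, a negative immediate becomes a large 16-bit threshold (for example, -128 becomes 0xFF80). The sketch below illustrates this under the assumption of a 32-bit word; `sltih_u_model` is a hypothetical name, and the `int8_t` parameter stands in for the encoded Imm8 field.

```c
#include <stdint.h>

/* Illustrative model of sltih_u: compare each 16-bit half word,
 * as an unsigned value, against the sign-extended 8-bit immediate
 * truncated to 16 bits. */
uint32_t sltih_u_model(uint32_t src_a, int8_t imm8) {
    /* Sign extend to 32 bits, then keep the low 16 bits. */
    uint32_t src_b = (uint32_t)(int32_t)imm8 & 0xFFFFu;
    uint32_t output = 0;
    for (uint32_t i = 0; i < 2u; i++) {
        uint32_t a = (src_a >> (i * 16u)) & 0xFFFFu;
        output |= (uint32_t)(a < src_b) << (i * 16u);
    }
    return output;
}
```

Under this reading, the immediates reachable by the instruction are the unsigned values 0 to 127 and 0xFF80 to 0xFFFF.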

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-408: sltih_u in X0 Bit Descriptions
Figure 4-409: sltih_u in X1 Bit Descriptions
sneb: Set Not Equal To Byte

Syntax
sneb Dest, SrcA, SrcB

Example
sneb r5, r6, r7

Description
Sets each result byte to 1 if the corresponding byte of the first source operand is not equal to the byte of the second source operand. Otherwise the result is set to 0. This instruction treats both source bytes as signed values.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    int8_t srca = ((rf[SrcA] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    int8_t srcb = ((rf[SrcB] >> (counter * BYTE_SIZE)) & BYTE_MASK);
    output |= ((((srca != srcb) ? 1 : 0) & BYTE_MASK) << (counter * BYTE_SIZE));
}
rf[Dest] = output;
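
Although the description speaks of signed values, an inequality test gives the same answer whether the bytes are read as signed or unsigned, since both readings map distinct bit patterns to distinct values. A minimal sketch, assuming a 32-bit word and using the hypothetical name `sneb_model`:

```c
#include <stdint.h>

/* Illustrative model of sneb: each result byte is 1 when the
 * corresponding source bytes differ and 0 when they match. */
uint32_t sneb_model(uint32_t src_a, uint32_t src_b) {
    uint32_t output = 0;
    for (uint32_t i = 0; i < 4u; i++) {
        uint32_t a = (src_a >> (i * 8u)) & 0xFFu;
        uint32_t b = (src_b >> (i * 8u)) & 0xFFu;
        output |= (uint32_t)(a != b) << (i * 8u);
    }
    return output;
}
```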

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

000 n 001010101 s s d

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x55
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-410: sneb in X0 Bit Descriptions
Figure 4-411: sneb in X1 Bit Descriptions
sneh: Set Not Equal To Half Words

Syntax

sneh Dest, SrcA, SrcB

Example

sneh r5, r6, r7

Description

Sets each result half word to 1 if the corresponding half word of the first source operand is not equal to the half word of the second source operand. Otherwise the result is set to 0. This instruction treats both source half words as signed values.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    int16_t srca =
        ((rf[SrcA] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    int16_t srcb =
        ((rf[SrcB] >> (counter * HALF_WORD_SIZE)) & HALF_WORD_MASK);
    output |=
        ((((srca != srcb) ? 1 : 0) & HALF_WORD_MASK) << (counter * HALF_WORD_SIZE));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30-28</th>
<th>27</th>
<th>26-18</th>
<th>17-12</th>
<th>11-6</th>
<th>5-0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>001010110</td>
<td>s</td>
<td>s</td>
<td>d</td>
</tr>
</tbody>
</table>

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x56
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-412: sneh in X0 Bit Descriptions
Figure 4-413: sneh in X1 Bit Descriptions

Dest_X1 - Dest
SrcA_X1 - SrcA
SrcB_X1 - SrcB
RRROpcodeExtension_X1 - 0x38
S_X1 - Sbit
Opcode_X1 - 0x1
srab: Arithmetic Shift Right Bytes

Syntax
srab Dest, SrcA, SrcB

Example
srab r5, r6, r7

Description
Arithmetically shift the four bytes in the first source operand to the right by the second source operand. If the shift amount is larger than the number of bits in a byte, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a byte. Arithmetic right shift shifts the high ordered bit into the high ordered bits in a byte.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (signExtend8(getByte(rf[SrcA], counter)) >>
            (((UnsignedMachineWord) rf[SrcB]) % BYTE_SIZE)));
}
rf[Dest] = output;
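
In host C code, right-shifting a negative signed value is implementation-defined, so a portable model of the arithmetic shift fills the vacated bits explicitly. The following is a sketch under the assumption of a 32-bit word; `srab_model` is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative model of srab: shift each byte right by src_b modulo 8,
 * replicating the byte's original high-order bit into the vacated
 * positions. */
uint32_t srab_model(uint32_t src_a, uint32_t src_b) {
    uint32_t sh = src_b % 8u;
    uint32_t output = 0;
    for (uint32_t i = 0; i < 4u; i++) {
        uint32_t byte = (src_a >> (i * 8u)) & 0xFFu;
        uint32_t shifted = byte >> sh;
        if (byte & 0x80u) {
            /* Set the top `sh` bits of the lane; no bits when sh == 0. */
            shifted |= (0xFFu << (8u - sh)) & 0xFFu;
        }
        output |= shifted << (i * 8u);
    }
    return output;
}
```

For instance, the byte 0x80 (-128) shifted by 4 yields 0xF8 (-8) in this model, while 0x7F shifted by 4 yields 0x07.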

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-414: srab in X0 Bit Descriptions
Figure 4-415: srab in X1 Bit Descriptions
srah: Arithmetic Shift Right Half Words

Syntax
srah Dest, SrcA, SrcB

Example
srah r5, r6, r7

Description
Arithmetically shift the pair of half words in the first source operand to the right by the second
source operand. If the shift amount is larger than the number of bits in a half word, the effective
shift amount is computed to be the specified shift amount modulo the number of bits in a half
word. Arithmetic right shift shifts the high ordered bit into the high ordered bits in a half word.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
    setHalfWord(output, counter,
               (signExtend16(getHalfWord(rf[SrcA], counter)) >>
               (((UnsignedMachineWord) rf[SrcB]) %
                HALF_WORD_SIZE)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>30-28</th>
<th>27</th>
<th>26-18</th>
<th>17-12</th>
<th>11-6</th>
<th>5-0</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>n</td>
<td>001011001</td>
<td>s</td>
<td>s</td>
<td>d</td>
</tr>
</tbody>
</table>

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x59
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-416: srah in X0 Bit Descriptions
Figure 4-417: srah in X1 Bit Descriptions

<table>
<thead>
<tr>
<th>Opcode_X1</th>
<th>S_X1</th>
<th>RRROpcodeExtension_X1</th>
<th>SrcB_X1</th>
<th>SrcA_X1</th>
<th>Dest_X1</th>
</tr>
</thead>
<tbody>
<tr>
<td>0001</td>
<td>n</td>
<td>000111011</td>
<td>s</td>
<td>s</td>
<td>d</td>
</tr>
</tbody>
</table>

- Dest_X1 - Dest
- SrcA_X1 - SrcA
- SrcB_X1 - SrcB
- RRROpcodeExtension_X1 - 0x3B
- S_X1 - Sbit
- Opcode_X1 - 0x1
sraib: Arithmetic Shift Right Immediate Bytes

Syntax

sraib Dest, SrcA, ShAmt

Example

sraib r5, r6, 5

Description

Arithmetically shift the four bytes in the first source operand to the right by an immediate. If the shift amount is larger than the number of bits in a byte, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a byte. Arithmetic right shifts shift the high ordered bit into the high ordered bits in a byte.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (signExtend8(getByte(rf[SrcA], counter)) >>
             ((UnsignedMachineWord) ShAmt) % BYTE_SIZE));
}
rf[Dest] = output;
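
The per-byte behavior described above can be checked with a small host-side model. This is a minimal sketch, assuming a 32-bit machine word and 8-bit bytes; `sraib_model` is a hypothetical name, not part of the ISA or any Tilera library, and the sign fill is built from unsigned operations so the result does not depend on how the host compiler shifts negative values.

```c
#include <stdint.h>

/* Hypothetical host-side model of sraib: each of the four bytes in a
 * 32-bit word is shifted right arithmetically by (sh_amt mod 8). */
static uint32_t sraib_model(uint32_t src, uint32_t sh_amt)
{
    uint32_t out = 0;
    uint32_t shift = sh_amt % 8u;   /* effective shift, modulo bits per byte */

    for (unsigned lane = 0; lane < 4; lane++) {
        uint32_t b = (src >> (8u * lane)) & 0xFFu;
        /* If the byte's top bit is set, replicate it into the vacated bits. */
        uint32_t sign_fill = (b & 0x80u) ? ((0xFFu << (8u - shift)) & 0xFFu) : 0u;
        uint32_t shifted = (b >> shift) | sign_fill;
        out |= shifted << (8u * lane);
    }
    return out;
}
```

For example, `sraib_model(0x80FF7F01, 1)` shifts each byte independently, so the sign bits of bytes 3 and 2 are preserved while bytes 1 and 0 are filled with zeros.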

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30:28 | 27 | 26:17 | 16:12 | 11:6 | 5:0 |
|-------|----|------------|-------|------|-----|
| 111 | n | 0000011000 | i | s | d |

Figure 4-418: sraib in X0 Bit Descriptions
Figure 4-419: sraib in X1 Bit Descriptions
sraih: Arithmetic Shift Right Immediate Half Words

**Syntax**

sraih Dest, SrcA, ShAmt

**Example**

sraih r5, r6, 5

**Description**

Arithmetically shift the pair of half words in the first source operand to the right by an immediate. If the shift amount is larger than the number of bits in a half word, the effective shift amount is computed to be the specified shift amount modulo the number of bits in a half word. Arithmetic right shifts shift the high ordered bit into the high ordered bits in a half word.

**Functional Description**

```c
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (signExtend16(getHalfWord(rf[SrcA], counter)) >>
                (((UnsignedMachineWord) ShAmt) %
                HALF_WORD_SIZE)));
}
rf[Dest] = output;
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 30:28 | 27 | 26:17 | 16:12 | 11:6 | 5:0 |
|-------|----|------------|-------|------|-----|
| 111 | n | 0000001001 | i | s | d |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- ShAmt_X0 - ShAmt
- UnShOpcodeExtension_X0 - 0x9
- S_X0 - Sbit
- Opcode_X0 - 0x7

*Figure 4-420: sraih in X0 Bit Descriptions*
Figure 4-421: sraih in X1 Bit Descriptions
subb: Subtract Bytes

Syntax
subb Dest, SrcA, SrcB

Example
subb r5, r6, r7

Description
Subtract the four bytes in the second source operand from the four bytes in the first source operand.

Functional Description
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            (getByte(rf[SrcA], counter) -
            getByte(rf[SrcB], counter)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30:28 | 27 | 26:18 | 17:12 | 11:6 | 5:0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001011011 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x5B
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-422: subb in X0 Bit Descriptions
Figure 4-423: subb in X1 Bit Descriptions
subbs_u: Subtract Bytes Saturating Unsigned

Syntax

subbs_u Dest, SrcA, SrcB

Example

subbs_u r5, r6, r7

Description

Subtract the four bytes in the second source operand from the four bytes in the first source operand and saturate each result to 0 or the maximum positive value.

NOTE: This instruction is only supported in the TILEPro family of products.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / BYTE_SIZE); counter++) {
    output =
        setByte(output, counter,
            unsigned_saturate8(getByte(rf[SrcA], counter) -
                getByte(rf[SrcB], counter)));
}
rf[Dest] = output;
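
The effect of unsigned saturation on each byte lane can be illustrated with a host-side model. This is a minimal sketch, assuming a 32-bit word and that each byte is treated as an unsigned value in the range 0 to 255; `subbs_u_model` is a hypothetical name introduced only for illustration.

```c
#include <stdint.h>

/* Hypothetical host-side model of subbs_u: per-byte subtraction with
 * the result clamped to the unsigned byte range. */
static uint32_t subbs_u_model(uint32_t a, uint32_t b)
{
    uint32_t out = 0;

    for (unsigned lane = 0; lane < 4; lane++) {
        int32_t x = (int32_t)((a >> (8u * lane)) & 0xFFu);
        int32_t y = (int32_t)((b >> (8u * lane)) & 0xFFu);
        int32_t diff = x - y;
        if (diff < 0) {
            diff = 0;   /* a negative difference saturates to 0 */
        }
        /* The difference of two unsigned bytes never exceeds 255, so the
         * upper clamp is never reached for subtraction. */
        out |= (uint32_t)diff << (8u * lane);
    }
    return out;
}
```

Unlike subb, a lane whose subtrahend is larger than its minuend yields 0 rather than wrapping around.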

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 000 | n | 001100100 | s | s | d |

- Dest_X0 - Dest
- SrcA_X0 - SrcA
- SrcB_X0 - SrcB
- RRROpcodeExtension_X0 - 0x64
- S_X0 - Sbit
- Opcode_X0 - 0x0

Figure 4-424: subbs_u in X0 Bit Descriptions
Figure 4-425: subbs_u in X1 Bit Descriptions
subh: Subtract Half Words

Syntax

subh Dest, SrcA, SrcB

Example

subh r5, r6, r7

Description

Subtract the pair of half words in the second source operand from the pair of half words in the first source operand.

Functional Description

UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            (getHalfWord(rf[SrcA], counter) -
            getHalfWord(rf[SrcB], counter)));
}
rf[Dest] = output;

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 30:28 | 27 | 26:18 | 17:12 | 11:6 | 5:0 |
|-------|----|-----------|-------|------|-----|
| 000 | n | 001011100 | s | s | d |

Dest_X0 - Dest
SrcA_X0 - SrcA
SrcB_X0 - SrcB
RRROpcodeExtension_X0 - 0x5C
S_X0 - Sbit
Opcode_X0 - 0x0

Figure 4-426: subh in X0 Bit Descriptions
### subhs: Subtract Half Words Saturating

#### Syntax

```
subhs Dest, SrcA, SrcB
```

#### Example

```
subhs r5, r6, r7
```

#### Description

Subtract the pair of half words in the second source operand from the pair of half words in the first source operand and saturate each result to the minimum negative value or maximum positive value.

**NOTE:** This instruction is only supported in the TILEPro family of products.

#### Functional Description

```
UnsignedMachineWord output = 0;
uint32_t counter;
for (counter = 0; counter < (WORD_SIZE / HALF_WORD_SIZE); counter++) {
    output =
        setHalfWord(output, counter,
            signed_saturate16(signExtend16
                (getHalfWord(rf[SrcA], counter)) -
                signExtend16(getHalfWord(rf[SrcB],
                    counter))));
}
rf[Dest] = output;
```
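
The signed saturation step can be illustrated with a host-side model. This is a minimal sketch, assuming a 32-bit word holding two 16-bit two's-complement lanes; `sat16` and `subhs_model` are hypothetical names, and the sign extension is written arithmetically so it does not rely on implementation-defined integer conversions.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int32_t sat16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return v;
}

/* Hypothetical host-side model of subhs: per-half-word signed
 * subtraction with saturation. */
static uint32_t subhs_model(uint32_t a, uint32_t b)
{
    uint32_t out = 0;

    for (unsigned lane = 0; lane < 2; lane++) {
        uint32_t ha = (a >> (16u * lane)) & 0xFFFFu;
        uint32_t hb = (b >> (16u * lane)) & 0xFFFFu;
        /* Sign-extend 16 bits to 32: flip the sign bit, then subtract 2^15. */
        int32_t x = (int32_t)(ha ^ 0x8000u) - 0x8000;
        int32_t y = (int32_t)(hb ^ 0x8000u) - 0x8000;
        out |= ((uint32_t)sat16(x - y) & 0xFFFFu) << (16u * lane);
    }
    return out;
}
```

With inputs `0x80007FFF` and `0x0001FFFF`, both lanes overflow the 16-bit range in opposite directions and are clamped to the maximum positive and minimum negative values respectively.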

#### Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Encoding

```
000  n  001100101  s  s  d
```

- **Dest_X0**: Dest
- **SrcA_X0**: SrcA
- **SrcB_X0**: SrcB
- **RRROpcodeExtension_X0**: 0x65
- **S_X0**: Sbit
- **Opcode_X0**: 0x0

*Figure 4-427: subhs in X0 Bit Descriptions*
Figure 4-428: subhs in X1 Bit Descriptions
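
The X0 register-register-register encodings shown for the preceding instructions share one field layout, which can be sketched as a packing routine. This is an illustration only: the bit positions (Opcode at 30:28, S at 27, RRROpcodeExtension at 26:18, SrcB at 17:12, SrcA at 11:6, Dest at 5:0) are assumptions inferred from the 31-bit rulers and field widths in the encoding rows, and `encode_x0_rrr` is a hypothetical helper, not an assembler API.

```c
#include <stdint.h>

/* Hypothetical packer for the X0 RRR format. Field positions are
 * assumptions inferred from the encoding rows, not a normative layout. */
static uint32_t encode_x0_rrr(uint32_t opcode, uint32_t s_bit, uint32_t ext,
                              uint32_t src_b, uint32_t src_a, uint32_t dest)
{
    return ((opcode & 0x7u)   << 28) |   /* Opcode_X0, 3 bits             */
           ((s_bit  & 0x1u)   << 27) |   /* S_X0, 1 bit                   */
           ((ext    & 0x1FFu) << 18) |   /* RRROpcodeExtension_X0, 9 bits */
           ((src_b  & 0x3Fu)  << 12) |   /* SrcB_X0, 6 bits               */
           ((src_a  & 0x3Fu)  <<  6) |   /* SrcA_X0, 6 bits               */
            (dest   & 0x3Fu);            /* Dest_X0, 6 bits               */
}
```

Under these assumptions, `subhs r5, r6, r7` with the S bit clear would pack as `encode_x0_rrr(0x0, 0, 0x65, 7, 6, 5)`.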
4.1.14 System Instructions

The following sections provide detailed descriptions of system instructions listed alphabetically.

- drain: Drain Instruction
- icoh: Instruction Stream Coherence
- ill: Illegal Instruction
- iret: Interrupt Return
- mfspr: Move from Special Purpose Register Word
- mtspr: Move to Special Purpose Register Word
- nap: Nap
- swint0: Software Interrupt 0
- swint1: Software Interrupt 1
- swint2: Software Interrupt 2
- swint3: Software Interrupt 3
drain: Drain Instruction

Syntax

drain

Example

drain

Description

Acts as a barrier that requires all previous instructions to complete before any subsequent instructions are executed. A Drain Instruction is dependent on all instructions that precede it in program order, and all instructions that follow it in program order are dependent on the Drain Instruction. Instructions in the same bundle as the Drain Instruction will produce unspecified results. The Drain Instruction also traverses the full length of any processor pipelining before subsequent instructions are executed. By traversing the length of any processor pipelining, the Drain Instruction can be used to make state modifications to portions of the processor pipeline earlier than where the state modification takes place. The Drain Instruction does not post memory operations or serve as a Memory Fence. In order to guarantee memory ordering, an mf instruction is required.

Functional Description

drain();

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

| 62:59 | 58 | 57:48 | 47:43 | 42:37 | 36:31 |
|-------|----|------------|-------|--------|--------|
| 1000 | 0 | 0000001011 | 00001 | 000000 | 000000 |

Dest_X1 - Reserved 0x0
SrcA_X1 - Reserved 0x0
UnOpcodeExtension_X1 - 0x1
UnShOpcodeExtension_X1 - 0xB
S_X1 - Reserved 0x0
Opcode_X1 - 0x8

Figure 4-429: drain in X1 Bit Descriptions
**icoh: Instruction Stream Coherence**

**Syntax**

icoh SrcA

**Example**

icoh r5

**Description**

Make the instruction stream coherent with the data stream for a particular cache index. Removes possible stale instructions from the instruction stream caching system. The source operand names a particular indexed set in the instruction cache. All of the blocks associated with the indexed set are removed from the icache. The icoh instruction minimally flushes words, but may operate on cache lines depending on the instruction cache implementation. One icoh instruction is guaranteed to flush at least one aligned word of data from the instruction cache. The instruction cache is indexed as if the operand were interpreted as a 64-bit zero-extended physical address. If icoh is issued in a loop that steps an address by words over a range as large as the implementation-specific instruction cache size, the entire instruction cache is cleared, with the exception of the flushing loop itself.

The Instruction Stream Coherence instruction needs to be used when data stores are made to a memory location which is to be executed later. Examples of this include self-modifying code and physical page invalidates.
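
The word-stepping loop described above can be sketched on a host machine. This is a minimal sketch, assuming a 4-byte word; `icoh_range`, `icoh_fn`, and `icoh_stub` are hypothetical names, and the stub merely records its argument in place of the real instruction (on target hardware the callback would wrap whatever mechanism the toolchain provides for issuing icoh). The range is capped at the instruction cache size on the assumption that a longer walk only revisits the same cache indexes.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u   /* assumption: 32-bit machine word */

typedef void (*icoh_fn)(uintptr_t addr);

/* Stand-in for the real instruction: records the last address "flushed". */
static uintptr_t stub_last_addr;
static void icoh_stub(uintptr_t addr)
{
    stub_last_addr = addr;
}

/* Issue one icoh per aligned word covering [start, start + len).
 * Returns the number of icoh operations issued. */
static size_t icoh_range(uintptr_t start, size_t len, size_t icache_bytes,
                         icoh_fn icoh)
{
    size_t issued = 0;
    uintptr_t addr, end;

    if (len == 0) {
        return 0;
    }
    addr = start & ~(uintptr_t)(WORD_BYTES - 1u);  /* align down to a word */
    end = start + len;
    if (end - addr > icache_bytes) {
        end = addr + icache_bytes;   /* one full pass over the cache suffices */
    }
    for (; addr < end; addr += WORD_BYTES) {
        icoh(addr);
        issued++;
    }
    return issued;
}
```

A caller that has just written 10 bytes of code at address 0x1002 would use `icoh_range(0x1002, 10, icache_bytes, icoh_stub)`, which touches the three aligned words starting at 0x1000.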

**Functional Description**

iCoherent(rf[SrcA]);

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-430: icoh in X1 Bit Descriptions
**ill: Illegal Instruction**

**Syntax**

```plaintext
ill
```

**Example**

```plaintext
ill
```

**Description**

Causes an illegal instruction interrupt to occur. The Illegal Instruction is guaranteed to always cause an illegal instruction interrupt for all current and future derivations of the architecture.

**Functional Description**

```plaintext
illegalInstruction();
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

Figure 4-431: ill in X1 Bit Descriptions

- Dest_X1 - Reserved 0x0
- SrcA_X1 - Reserved 0x0
- UnOpcodeExtension_X1 - 0x7
- UnShOpcodeExtension_X1 - 0xB
- S_X1 - Reserved 0x0
- Opcode_X1 - 0xB

Figure 4-432: ill in Y1 Bit Descriptions

- Dest_Y1 - Reserved 0x0
- SrcA_Y1 - Reserved 0x0
- UnOpcodeExtension_Y1 - 0x2
- UnShOpcodeExtension_Y1 - 0x5
- Opcode_Y1 - 0xB
iret: Interrupt Return

Syntax

iret

Example

iret

Description

Returns from an interrupt. Transfers control flow to the program counter location and protection level contained in the current PL’s EX_CONTEXT registers, and restores the interrupt critical section bit to the value contained in those registers.

Functional Description

setNextPC(sprf
  [EX_CONTEXT_SPRF_OFFSET +
  (getCurrentProtectionLevel() * EX_CONTEXT_SIZE) +
  PC_EX_CONTEXT_OFFSET]);
branchPredictedIncorrect();
setProtectionLevel(sprf
  [EX_CONTEXT_SPRF_OFFSET +
  (getCurrentProtectionLevel() * EX_CONTEXT_SIZE) +
  PROTECTION_LEVEL_EX_CONTEXT_OFFSET]);
setInterruptCriticalSection(sprf[EX_CONTEXT_SPRF_OFFSET +
  (getCurrentProtectionLevel() * EX_CONTEXT_SIZE) +
  INTERRUPT_CRITICAL_SECTION_EX_CONTEXT_OFFSET]);
/* besides the PC we need to set our new protection level, and set the interrupt critical section bit atomically inside of this instruction */

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-433: iret in X1 Bit Descriptions
mfspr: Move from Special Purpose Register Word

Syntax
mfspr Dest, Imm15

Example
mfspr r6, 0x5

Description
Moves a word from a special purpose register. The special purpose register number is contained as an immediate and allows for the addressing of $2^{15}$ possible special purpose registers.

Functional Description
rf[Dest] = sprf[Imm15];

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

```
0000111  i  i  d
```

Figure 4-434: mfspr in X1 Bit Descriptions
**mtspr: Move to Special Purpose Register Word**

**Syntax**

```
mtspr Imm15, SrcA
```

**Example**

```
mtspr 0x5, r6
```

**Description**

Moves a word to a special purpose register. The special purpose register number is contained as an immediate and allows for the addressing of $2^{15}$ possible special purpose registers.

**Functional Description**

```
sprf[Imm15] = rf[SrcA];
```

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

```
110  0010101
```

*Figure 4-435: mtspr in X1 Bit Descriptions*
**nap: Nap**

**Syntax**

`nap`

**Example**

`nap`

**Description**

Enters a lower power state. This instruction may or may not complete. To guarantee continued napping on all implementations, this instruction should be used in a loop. Instructions in the same bundle as the `nap` instruction will produce unspecified results. If this instruction completes, this operation does not modify architectural state.

**Functional Description**

`nap();`
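
Because the instruction may complete at any time, the recommended loop re-checks a wake condition after every nap. This is a minimal sketch of that pattern; `nap_until`, `wake_pred`, `nap_fn`, `countdown_done`, and `nap_stub` are hypothetical names, and the stubs stand in for the real instruction and a real wake condition so the control flow can be exercised on a host machine.

```c
#include <stdbool.h>

typedef bool (*wake_pred)(void *ctx);
typedef void (*nap_fn)(void);

/* Stand-in for the nap instruction; a real version would issue nap. */
static void nap_stub(void)
{
}

/* Example wake condition: reports true once the counter reaches zero. */
static bool countdown_done(void *ctx)
{
    int *remaining = (int *)ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}

/* Keep napping until should_wake() reports true. Returns how many times
 * nap was issued, since a single nap is not guaranteed to stay asleep. */
static unsigned nap_until(wake_pred should_wake, void *ctx, nap_fn nap)
{
    unsigned naps = 0;

    while (!should_wake(ctx)) {
        nap();
        naps++;
    }
    return naps;
}
```

The wake condition is checked before each nap, so a tile that is already due to wake never enters the low power state.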

**Valid Pipelines**

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Encoding**

| 62:59 | 58 | 57:48 | 47:43 | 42:37 | 36:31 |
|-------|----|------------|-------|--------|--------|
| 1000 | 0 | 0000001011 | 10000 | 000000 | 000000 |

- Dest_X1 - Reserved 0x0
- SrcA_X1 - Reserved 0x0
- UnOpcodeExtension_X1 - 0x10
- UnShOpcodeExtension_X1 - 0xB
- S_X1 - Reserved 0x0
- Opcode_X1 - 0x8

*Figure 4-436: nap in X1 Bit Descriptions*
swint0: Software Interrupt 0

Syntax
swint0

Example
swint0

Description
Signals that a precise software interrupt should occur on this instruction to the Software Interrupt 0 interrupt handler. Instructions in the same bundle as the Software Interrupt 0 instruction will produce unspecified results.

Functional Description
softwareInterrupt(0);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-437: swint0 in X1 Bit Descriptions
swint1: Software Interrupt 1

Syntax

swint1

Example

swint1

Description

Signals that a precise software interrupt should occur on this instruction to the Software Interrupt 1 interrupt handler. Instructions in the same bundle as the Software Interrupt 1 instruction will produce unspecified results.

Functional Description

softwareInterrupt(1);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-438: swint1 in X1 Bit Descriptions
swint2: Software Interrupt 2

Syntax
swint2

Example
swint2

Description
Signals that a precise software interrupt should occur on this instruction to the Software Interrupt 2 interrupt handler. Instructions in the same bundle as the Software Interrupt 2 instruction will produce unspecified results.

Functional Description
softwareInterrupt(2);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

Figure 4-439: swint2 in X1 Bit Descriptions
swint3: Software Interrupt 3

Syntax
swint3

Example
swint3

Description
Signals that a precise software interrupt should occur on this instruction to the Software Interrupt 3 interrupt handler. Instructions in the same bundle as the Software Interrupt 3 instruction will produce unspecified results.

Functional Description
softwareInterrupt(3);

Valid Pipelines

<table>
<thead>
<tr>
<th>X0</th>
<th>X1</th>
<th>Y0</th>
<th>Y1</th>
<th>Y2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Encoding

<table>
<thead>
<tr>
<th>Bit</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>100</td>
<td>0x0</td>
</tr>
<tr>
<td>101</td>
<td>0x0</td>
</tr>
<tr>
<td>110</td>
<td>0x0</td>
</tr>
<tr>
<td>111</td>
<td>0x0</td>
</tr>
</tbody>
</table>

Figure 4-440: swint3 in X1 Bit Descriptions
### 4.1.15 Pseudo Instructions

Tilera’s assembler supports several pseudo-instructions for the convenience of the programmer. Each of these instructions shares an encoding with a standard ISA instruction.

<table>
<thead>
<tr>
<th>Pseudo Instruction</th>
<th>Canonical Form</th>
</tr>
</thead>
<tbody>
<tr>
<td>move dst, src</td>
<td>or dst, src, zero</td>
</tr>
<tr>
<td>movei dst, simm8</td>
<td>ori dst, zero, simm8</td>
</tr>
<tr>
<td>moveli dst, simm16</td>
<td>addli dst, zero, simm16</td>
</tr>
<tr>
<td>movelis dst, simm16</td>
<td>addlis dst, zero, simm16</td>
</tr>
<tr>
<td>j target</td>
<td>jf target or jb target</td>
</tr>
<tr>
<td>jal target</td>
<td>jalf target or jalb target</td>
</tr>
<tr>
<td>prefetch src</td>
<td>lb_u zero, src</td>
</tr>
<tr>
<td>prefetch_L1 src</td>
<td>lb_u src, src</td>
</tr>
<tr>
<td>bpt</td>
<td>ill</td>
</tr>
<tr>
<td>info simm8</td>
<td>andi zero, zero, simm8</td>
</tr>
<tr>
<td>infol simm16</td>
<td>auli zero, zero, simm16</td>
</tr>
</tbody>
</table>

1 Because of limitations in the instruction encoding space, forward-going direct jumps (jf, jalf) and backward-going direct jumps (jb, jalb) have different opcodes. If the programmer uses the pseudo-instruction j or jal, the assembler will generate the appropriate ISA instruction depending upon the target of the jump.

2 For performance reasons, loads to the zero register do not result in the register file being written. Such instructions are killed entirely if they would cause DTLB_MISS or DTLB_ACCESS interrupts. The TILE architecture does not guarantee that every prefetch instruction will cause the caches to be loaded. Thus prefetch (indeed, any load to the zero register) should be considered merely a hint to the hardware.

3 The TILE architecture does not provide an explicit breakpoint instruction. Instead, bpt is encoded as an illegal instruction with non-zero values in the implicit immediate fields. Thus bpt does not have exactly the same hardware encoding as the ill instruction.

**INFO** operations are generated by the compiler and are used to convey information about the state of the stack frame at various points in the code of a function. The backtrace library interprets these operations when performing stack unwinding.

In order to perform stack unwinding, the backtrace library requires that code conform to the stack frame conventions specified in the ABI. In the presence of compiler optimizations, however, the code may deviate from these conventions. In this case, the compiler automatically inserts **INFO** operations in the code to compensate.

Intrinsics, including the **INFO** operation, are a set of functions whose names have the format __insn_xxxx(), where xxxx is an instruction in the ISA.
5 MEMORY AND CACHE ARCHITECTURE

5.1 Memory Architecture

The Tile Processor™ architecture defines a flat, globally shared 64-bit physical address space and a 32-bit virtual address space. The TILE64™ and TILEPro™ families of processors implement a 36-bit physical address space. The globally shared physical address space provides the mechanism by which processes and threads can share instructions and data. Data memory is byte, half-word, and word addressable.

By default, hardware provides a cache-coherent view of data memory to applications. That is, a read by a thread or process to a physical address P will return the value of the most recent write to address P. Instruction memory that is written by the process itself (self-modifying code) or by other processes is not kept coherent by hardware. Special software sequences using the icoh instruction must be used to enforce coherence between data and instruction memory. In the TILE64 implementation, IO writes are not kept coherent with on-chip caches. The TILEPro implementation provides hardware cache coherence for IO accesses.

Non-coherent and non-cacheable memory modes are also supported, as shown in Table 5-1. In addition to the memory modes, the architecture provides several memory attributes for controlling the allocation and distribution of cache lines. These are shown in Table 5-2.

The Tile Processor architecture memory attributes and modes are managed and configured through system software programming of page tables and enforced through TLB entries. Chapter 4 of the Multicore Development Environment Optimization Guide (UG105) provides the Application Programmer Interface (API) and details about memory allocation.

<table>
<thead>
<tr>
<th>Memory Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Coherent Memory</td>
<td>Hardware cache coherent memory.</td>
</tr>
<tr>
<td>Non-Coherent Memory</td>
<td>Hardware does not maintain coherence.</td>
</tr>
<tr>
<td>Non-Cacheable Memory</td>
<td>Data cache blocks are not cached in any on-chip caches. Instruction cache blocks are not cached in the unified L2. Instruction cache blocks are always cached in the L1 instruction cache.</td>
</tr>
</tbody>
</table>
5.2 Cache Architecture

5.2.1 Overview

Due to the large difference between DRAM and processor speeds, the cache subsystem is critical for delivering high performance. The cache subsystem’s primary role is to prevent the processor cores from stalling due to long memory latencies. To this end, the cache subsystem implements a high performance, non-blocking, two-level cache hierarchy. The two-level design isolates the timing-critical L1 caches from complexity, allowing the L1 data and instruction cache design to be simple, fast, and low power.

The execution engine does not stall on load or store cache misses. Rather, execution of subsequent instructions continues until the data requested by the cache miss is actually needed by another instruction. The cache subsystem is non-blocking and supports multiple concurrent outstanding memory operations. The cache subsystem supports hit under miss and miss under miss, allowing loads and stores to different addresses to be re-ordered to achieve high bandwidth and overlap miss latencies, while still ensuring that true memory dependencies are enforced.

The cache subsystem provides cache-coherent shared memory, atomic instructions (test-and-set), and memory fences (MF). The TILEPro cache system maintains coherence with I/O DMA accesses to memory, and allows I/O to read and write the on-chip caches directly.

Finally, the cache subsystem implements a software-programmable hardware direct memory access engine (DMA) and supports using portions of the L2 cache as a scratchpad memory.

<table>
<thead>
<tr>
<th>Attributes</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>No L1d allocation</td>
<td>Lines are not allocated in the L1d cache (TILEPro only).</td>
</tr>
<tr>
<td>No L2 allocation</td>
<td>Remotely homed lines are not allocated in the L2 cache</td>
</tr>
<tr>
<td>Pinned memory</td>
<td>Hardware will lock the requested memory page in the L2 cache.</td>
</tr>
<tr>
<td>Hashed</td>
<td>Lines on page are distributed across cores according to a hardware hash function (TILEPro only).</td>
</tr>
</tbody>
</table>
5.2.2 Cache Microarchitecture

Table 5-3 lists the most important characteristics of the TILE64 and TILEPro cache subsystems.

Table 5-3. Cache Subsystems

<table>
<thead>
<tr>
<th></th>
<th>TILE64</th>
<th>TILEPro</th>
</tr>
</thead>
<tbody>
<tr>
<td>L1 instruction (L1I)</td>
<td>8 KB, direct-mapped</td>
<td>16 KB, direct-mapped</td>
</tr>
<tr>
<td>L1 instruction translation lookaside buffer</td>
<td>8 entries, fully associative</td>
<td>16 entries, fully associative</td>
</tr>
<tr>
<td>L1 data (L1D)</td>
<td colspan="2">8 KB, two-way associative</td>
</tr>
<tr>
<td>L1 data translation lookaside buffer</td>
<td colspan="2">16 entries, fully associative</td>
</tr>
<tr>
<td>L2 unified cache</td>
<td>64 KB, two-way associative</td>
<td>64 KB, four-way associative</td>
</tr>
<tr>
<td>Latency (load to use)</td>
<td colspan="2">2 cycles L1D hit,<br>8 cycles local L2 hit,<br>30-60 cycles remote L2 hit,<br>80 cycles L2 miss to memory</td>
</tr>
<tr>
<td>Architecture</td>
<td colspan="2">Non-blocking, out-of-order, stall-on-use</td>
</tr>
<tr>
<td>DDC® technology<sup>a</sup></td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Line Size</td>
<td colspan="2">L1I: 64B<br>L1D: 16B<br>L2: 64B</td>
</tr>
<tr>
<td>Allocate Policy</td>
<td colspan="2">L1I: Allocate on read miss<br>L1D: Allocate on load miss only<br>L2: Allocate on load or store miss</td>
</tr>
<tr>
<td>Write Policy</td>
<td colspan="2">L1I: N/A<br>L1D: Write through, store update on hit<br>L2: Writeback</td>
</tr>
<tr>
<td>Error Protection</td>
<td colspan="2">L1I: 64-bit parity<br>L1D: 8-bit parity<br>L2: 8-bit parity</td>
</tr>
</tbody>
</table>

a. Dynamic Distributed Cache
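
The capacities, associativities, and line sizes in Table 5-3 determine how many sets each cache has. The sketch below works through that arithmetic under the conventional relation sets = capacity / (ways x line size); `cache_sets` is a hypothetical helper, and the hardware's actual index function is not specified here.

```c
#include <stddef.h>

/* Number of sets in a set-associative cache.
 * A direct-mapped cache is the special case ways == 1. */
static size_t cache_sets(size_t capacity_bytes, size_t ways, size_t line_bytes)
{
    return capacity_bytes / (ways * line_bytes);
}
```

For example, the 64 KB four-way L2 with 64-byte lines works out to `cache_sets(64 * 1024, 4, 64)`, or 256 sets, and the 8 KB two-way L1D with 16-byte lines also works out to 256 sets.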

Figure 5-441 shows the top-level block diagram for the Tile cache subsystem. The processor engine can issue one load or one store per cycle. The L1D cache is checked for the requested data. If the L1D does not have the requested data, the request is delivered to the L2 cache. Stores update the L1D if the targeted cache block is present, and always write through to the L2 cache. The L1I cache is supported by a hardware prefetching engine that predicts and fetches the most likely next instruction cache line. Misses in the L2 cache on a given tile are satisfied by caches in other tiles or from external memory. If the other caches do not have the requested cache line, then they in turn fetch it from external memory and deliver it to the requesting core.

The cache subsystem supports out-of-order retirement, meaning instructions subsequent to a load or store miss can write the destination register before the load or store completes. Architectural state is kept consistent by the issue logic, which blocks subsequent instructions from using stale data. The L2 cache subsystem supports multiple outstanding memory operations and cache misses. The L2 cache subsystem maintains an outstanding miss file to track transactions launched from this tile to memory or to other tiles. Each tile can have up to eight outstanding load misses to external memory as well as four (two for TILE64) outstanding L2 writebacks.

5.2.2.1 Dynamic Distributed Cached Shared Memory

The TILEPro uses the Dynamic Distributed Cache (DDC) to provide a hardware-managed, cache-coherent approach to shared memory. Applications normally access distributed coherent cached shared memory using loads and stores. DDC allows a page of shared memory to be homed on a specific tile (or distributed across many tiles), then cached remotely by other tiles. This mechanism allows a tile to view the collection of on-chip caches of all tiles as a large shared, distributed coherent cache. It promotes on-chip access and avoids the bottleneck of off-chip global memory. This form of shared memory access is particularly useful when processes read and write shared data in a fine-grained, interleaved manner — such as with locks and other synchronization objects.

Figure 5-442 shows a read from tile A (the remote requesting tile) to cacheline X, where cacheline X is homed at tile B (the home tile):

1. Tile A first checks its local caches for the cacheline X, and on a miss, sends a request for cacheline X to tile B.
2. Tile B receives the request for cacheline X and retrieves cacheline X from its L2.
3. Tile B then sends the full cacheline X back to tile A. Tile A installs cacheline X in its local L1 and L2 caches.
Figure 5-442. Request to Home Tile/Fill L2/L1 with Cacheline X

Figure 5-443 shows a write from tile A to a word (X[0]) in cacheline X, where cacheline X is again homed at tile B.

1. Tile A sends the write address and data to tile B.
2. Tile B receives the write address and data and checks the directory information for cacheline X. The directory indicates that tile C (the sharing tile) has a copy of cacheline X. Tile B updates cacheline X with the new value for word X[0].
3. Tile B sends an invalidate message to tile C.
4. Tile C receives the invalidation and invalidates cacheline X from its caches.
5. Tile C then sends an invalidation acknowledgement back to tile B.
6. Tile B receives the invalidation acknowledgement and sends a write acknowledgement back to tile A.
7. Tile A receives the write acknowledgement message and thus knows that the write to word X[0] has completed.
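
The read and write sequences above can be sketched as a small directory model. This is an illustrative Python sketch under stated assumptions, not the hardware protocol: the names `HomeTile`, `RemoteTile`, `load`, and `write` are hypothetical, network messages are collapsed into direct method calls, and the writer's own cached copy is not modeled.

```python
class HomeTile:
    """Toy model of a home tile's L2 plus the directory for lines it homes."""

    def __init__(self):
        self.lines = {}      # line address -> list of word values
        self.sharers = {}    # line address -> set of tile names holding a copy

    def read(self, requester, line):
        # Read steps 2-3: fetch the line from L2, record the requester as a
        # sharer, and return the full line.
        self.sharers.setdefault(line, set()).add(requester.name)
        return list(self.lines[line])

    def write(self, line, word, value, tiles):
        # Write steps 2-6: update the home copy, invalidate each sharer,
        # count the acknowledgements, then acknowledge the writer.
        self.lines[line][word] = value
        acks = 0
        for name in sorted(self.sharers.get(line, set())):
            tiles[name].invalidate(line)   # steps 3-4
            acks += 1                      # step 5
        self.sharers[line] = set()
        return acks                        # step 6 happens after all acks


class RemoteTile:
    def __init__(self, name):
        self.name = name
        self.cache = {}      # line address -> local copy of the line

    def load(self, home, line, word):
        if line not in self.cache:         # local miss: ask the home tile
            self.cache[line] = home.read(self, line)
        return self.cache[line][word]

    def invalidate(self, line):
        self.cache.pop(line, None)


home = HomeTile()
home.lines["X"] = [0, 0, 0, 0]
tile_c = RemoteTile("C")
tiles = {"C": tile_c}
tile_c.load(home, "X", 0)                  # tile C becomes a sharer of X
acks = home.write("X", 0, 7, tiles)        # tile A writes X[0] through home
```

After the write, tile C no longer holds the line, so its next load misses locally and refetches the updated value from the home tile.
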
5.2.2.2 Coherent and Direct-to-Cache I/O

TILEPro provides hardware cache coherence for I/O DMA accesses. On a write to memory from an I/O DMA engine, the hardware invalidates any cached copies of the line, and updates the cache with the newly written data.

Similarly, on a read to memory from an I/O DMA engine, the hardware checks the on-chip caches for the line and supplies it from there if found. The System Architecture Manual (UG103) describes these mechanisms in detail.

5.2.2.3 Striped Memory

TILEPro provides a boot time option to enable a “striped main memory” mode of operation. Striped main memory mode overrides the default mapping of physical memory pages to the four main memory controllers. In striped main memory mode, a physical page of memory is “striped” across the four controllers at an 8KB granularity. That is, a 64KB page would have the first quarter of the page located at memory controller 0, the second quarter at memory controller 1, the third quarter at memory controller 2, and the last quarter at memory controller 3. The striped main memory mode of operation uniformly spreads all physical memory pages across the controllers, thus balancing the load among the four controllers.

5.2.3 Direct Memory Access

The Tile Processor architecture provides a direct memory access (DMA) engine in each tile. This engine can be configured by the application programmer to move data to and from main memory and the L2 cache, and between cores.

The DMA engine operates autonomously from the processor core, issuing DMA load and DMA store operations during cycles in which the cache pipeline is not being used by the processor engine. The DMA source and destination addresses need not be word or cacheline-aligned. The application programmer can specify different source and destination strides, with which the DMA can perform complex memory transformations such as “shape changes”, in addition to simple copy operations. Each read or write operation performed by the DMA engine executes through the data Translation Lookaside Buffers (TLBs); therefore DMA operations are fully protected and inherit memory attributes for the memory page being accessed. As a result, the DMA engine can be used to move data, for example, from an uncacheable buffer in main memory to a pinned, cacheable buffer. The DMA engine can move data from one tile’s L2 cache to another tile’s L2 cache in the background. Completion of a DMA transfer can be signaled via an interrupt (DMA_NOTIFY) or by polling a special-purpose register (SPR).

The application programmer configures the DMA engine by writing to several SPRs. To perform a DMA request, the DMA transfer description registers (DMA_BYTE, DMA_CHUNK_SIZE, DMA_DST_ADDR, DMA_DST_CHUNK_ADDR, DMA_SRC_ADDR, DMA_SRC_CHUNK_ADDR, and DMA_STRIDE) are set appropriately, and then the REQUEST bit in the DMA_CTR register is set. Figure 5-444 illustrates how a 2D-to-1D DMA transfer is handled.

![Figure 5-444: 2D-to-1D DMA Transfer](image)

<table>
<thead>
<tr>
<th>DMA Registers</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>DMA_BYTE</td>
<td>DMA Byte Register. This register serves two functions. It contains the size (in bytes) to be transferred in the first chunk and the number of chunks to be transferred. For a detailed description of this register, see page 448.</td>
</tr>
<tr>
<td>DMA_CHUNK_SIZE</td>
<td>DMA Chunk Size Register. For a detailed description of this register, see page 449.</td>
</tr>
<tr>
<td>DMA_CTR</td>
<td>DMA Control Register. This register controls the DMA engine. For a detailed description of this register, see page 450.</td>
</tr>
<tr>
<td>DMA_DST_ADDR</td>
<td>DMA Destination Address Register. This register holds the address of the first byte to be written when the next DMA operation is started. For a detailed description of this register, see page 451.</td>
</tr>
</tbody>
</table>
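
The address pattern of a 2D-to-1D transfer can be illustrated in software. The following Python sketch only models which bytes are gathered; the function name and its parameters (`src_start`, `chunk_size`, `src_stride`, `num_chunks`) are hypothetical and do not correspond one-to-one to the DMA register fields.

```python
def dma_2d_to_1d(src, src_start, chunk_size, src_stride, num_chunks):
    """Gather num_chunks chunks of chunk_size bytes, spaced src_stride bytes
    apart in src, into one contiguous destination buffer."""
    dst = bytearray()
    for chunk in range(num_chunks):
        begin = src_start + chunk * src_stride
        dst += src[begin:begin + chunk_size]
    return bytes(dst)


# An 8-byte-wide "image" of 4 rows; copy a 3-byte-wide column block from 3 rows.
image = bytes(range(32))
packed = dma_2d_to_1d(image, src_start=2, chunk_size=3, src_stride=8, num_chunks=3)
```

Here each chunk is one row of the sub-block, and the stride is the row pitch of the source buffer.
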
5.3 Memory Consistency Model

The Tile Processor architecture’s memory consistency model specifies the order in which memory operations from a processor become visible to other processors in the coherence domain.

There are two main properties, P1 and P2, defined by the memory consistency model: instruction reordering rules and store atomicity. The Tile Processor architecture defines a relaxed memory consistency model in which:

P1: Instruction Reordering

Non-overlapping memory accesses from a given processor that reference shared pages can be reordered and can become visible to other processors sharing that page in an order different from the original program order, with the following restrictions:

- Data dependencies through memory accesses from a single processor are enforced (RAW, WAW, and WAR)
- Data dependencies through registers or memory determine local visibility order
- Local ordering established by memory data dependencies or register dependencies does not determine global visibility order
- Data writes (including test-and-set and flushes) must observe control dependencies

P2: Store Atomicity

Stores performed by a processor appear to become visible simultaneously to all remote processors, but can become visible to the issuing processor before becoming globally visible (for example, by bypassing to a subsequent load through a write buffer). Test-and-set operations are atomic to all processors: bypassing to or from test-and-set operations is not allowed.

The Tile Processor architecture provides the memory fence (MF) instruction to establish ordering among otherwise unordered instructions when such ordering is needed for correctness. Data memory operations in the program prior to the memory fence instruction are made globally visible before ANY operation after the memory fence.

The Tile Processor architecture provides a test-and-set (TNS) instruction to read and write a memory location atomically.

The following code sequences illustrate the properties of the tile memory consistency model. In the examples that follow, memory addresses are denoted by x and y, are word aligned, and are assumed to contain the value 0 initially. All loads and stores are word-sized. The notation A \( \rightarrow \) B indicates that operation A becomes visible to all processors in the coherence domain before operation B becomes visible. Listings 5-1 through 5-5 below illustrate property P1 (instruction reordering). Listings 5-6 through 5-8 illustrate property P2 (store atomicity and write bypassing).

**Listing 5-1. Property P1—Instruction Reordering.** Stores can reorder with stores to different locations and loads can reorder with loads to different locations.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>lw r1 = [y] // M3</td>
</tr>
<tr>
<td>sw [y] = 1 // M2</td>
<td>lw r2 = [x] // M4</td>
</tr>
</tbody>
</table>

All outcomes for r1 and r2 are possible.

The stores can be made visible in any order. Implementations are free to reorder data memory operations to different locations. Program order does not imply visibility order.

**Listing 5-2. Property P1—Instruction Reordering.** Ordering is enforced through the memory fence instruction.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>lw r1 = [y] // M4</td>
</tr>
<tr>
<td>MF // M2</td>
<td>MF // M5</td>
</tr>
<tr>
<td>sw [y] = 1 // M3</td>
<td>lw r2 = [x] // M6</td>
</tr>
</tbody>
</table>

The only illegal outcome is r1 == 1 and r2 == 0.

Notice that this example is the same as Listing 5-1, except that here an MF instruction is inserted between the pair of stores on Tile 0 and between the pair of loads on Tile 1. The use of the MF instruction ensures that M1 \( \rightarrow \) M3 and M4 \( \rightarrow \) M6. Therefore, if M3 is visible to M4, then M1 is visible to M6.

**Listing 5-3. Property P1—Instruction Reordering.** Loads can reorder with stores to different locations.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>sw [y] = 1 // M3</td>
</tr>
<tr>
<td>lw r1 = [y] // M2</td>
<td>lw r2 = [x] // M4</td>
</tr>
</tbody>
</table>

This example is similar to Listing 5-1, in that the loads and stores on each tile have no dependence and can be freely reordered. All outcomes are legal.

**Listing 5-4. Property P1—Instruction Reordering.** Preventing loads from passing stores to different locations.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>sw [y] = 1 // M3</td>
</tr>
<tr>
<td>MF</td>
<td>MF</td>
</tr>
<tr>
<td>lw r1 = [y] // M2</td>
<td>lw r2 = [x] // M4</td>
</tr>
</tbody>
</table>

The only illegal outcome is r1 == r2 == 0.

This example is similar to the one shown in Listing 5-3, except we now have MF instructions between the memory operations. The MF on Tile 0 causes M1→M2, and the MF on Tile 1 causes M3→M4. Therefore:

If \( r_1 = 0 \), we have M2→M3, so we have M1→M2→M3→M4, so \( r_2 = 1 \).

If \( r_2 = 0 \), we have M4→M1, so we have M3→M4→M1→M2, so \( r_1 = 1 \).

If \( r_1 = 1 \), we have M3→M2, but M4 is not ordered with M1, so \( r_2 = 0 \) OR \( r_2 = 1 \).

If \( r_2 = 1 \), we have M1→M4, but M2 is not ordered with M3, so \( r_1 = 0 \) OR \( r_1 = 1 \).
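
The case analysis above can be checked by brute force. The sketch below is a deliberately simplified model, not a full description of the architecture's consistency rules: it treats a global visibility order as any permutation of the four operations and, when fences are present, keeps only the orders in which each tile's store precedes its own load. The function name `outcomes` is an assumption made for illustration.

```python
from itertools import permutations


def outcomes(fenced):
    """Return the set of reachable (r1, r2) pairs for the two-tile example."""
    ops = [
        (0, "sw", "x", "M1"), (0, "lw", "y", "M2"),
        (1, "sw", "y", "M3"), (1, "lw", "x", "M4"),
    ]
    results = set()
    for order in permutations(ops):
        labels = [op[3] for op in order]
        if fenced and not (labels.index("M1") < labels.index("M2")
                           and labels.index("M3") < labels.index("M4")):
            continue                      # the MF forbids this visibility order
        mem = {"x": 0, "y": 0}
        regs = {}
        for _tile, kind, addr, label in order:
            if kind == "sw":
                mem[addr] = 1             # both stores write the value 1
            else:
                regs[label] = mem[addr]   # a load sees the current value
        results.add((regs["M2"], regs["M4"]))   # (r1, r2)
    return results


with_fences = outcomes(fenced=True)
without_fences = outcomes(fenced=False)
```

With fences, the pair (0, 0) never appears; without them, all four combinations are reachable, matching the unfenced version of this example.
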

**Listing 5-5. Property P1—Instruction Reordering.**

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>lw r2 = [y] // M4</td>
</tr>
<tr>
<td>MF // M2</td>
<td>bbs r5, foo</td>
</tr>
<tr>
<td>sw [y] = 1 // M3</td>
<td>lw r3 = [x] // M6</td>
</tr>
</tbody>
</table>

Here, \( r_2 = 1, r_3 = 0 \) is a legal outcome. M6 is dependent on the branch, but the branch is not dependent on M4. Therefore, there is no dependency between M4 and M6, and they can be reordered. Specifically, M4 may miss in the cache. While the miss is outstanding, the branch and M6 both execute, and M6 hits in the cache, writing \( r_3 = 0 \). Then the stores on Tile 0 execute, and M4 gets the new value of \( y \) (1).

**Listing 5-6. Property P2—Store Atomicity and Write Bypassing.** Local data dependencies do not establish global visibility ordering: processors can see their own writes early.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>lw r2 = [y] // M4</td>
</tr>
<tr>
<td>lw r1 = [x] // M2</td>
<td>MF // M5</td>
</tr>
<tr>
<td>sw [y] = r1 // M3</td>
<td>lw r3 = [x] // M6</td>
</tr>
</tbody>
</table>

The following is a legal outcome: \( r_1 = r_2 = 1, r_3 = 0 \).

In this case, true data dependencies on Tile 0 cause M1, M2, and M3 to EXECUTE on Tile 0 in order. However, this does not imply that they become globally visible to Tile 1 in this order.

The above outcome could occur if Tile 0 bypassed the value stored to \( x \) by M1 directly to the load M2 through a write buffer or local cache. Operation M3 then writes memory, and operation M4 observes the write M3, but operation M6 gets to memory before operation M1 has become globally visible. To avoid the local bypass, Tile 0 should issue an MF instruction between M1 and M2. This forces M1 to become globally visible before M3.

**Listing 5-7. Property P2—Store Atomicity and Write Bypassing.** Local data dependencies establish local ordering.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw [x] = 1 // M1</td>
<td>lw r1 = [y] // M4</td>
</tr>
<tr>
<td>MF // M2</td>
<td>lw r2 = [r1] // M5</td>
</tr>
<tr>
<td>sw [y] = x // M3</td>
<td></td>
</tr>
</tbody>
</table>

\( r_1 = x \) and \( r_2 = 0 \) is an illegal outcome.

M5 is data dependent on M4 and thus executes (and becomes locally visible) after M4.

**Listing 5-8. Property P2—Store Atomicity and Write Bypassing.** Stores have a single order as observed by remote processors.

<table>
<thead>
<tr>
<th>Tile 0</th>
<th>Tile 1</th>
<th>Tile 2</th>
<th>Tile 3</th>
</tr>
</thead>
<tbody>
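<tr>
<td>sw [x] = 1 //M1</td>
<td>lw r1 = [x] //M2</td>
<td>sw [y] = 1 //M4</td>
<td>lw r3 = [y] //M5</td>
</tr>
<tr>
<td></td>
<td>MF</td>
<td></td>
<td>MF</td>
</tr>
<tr>
<td></td>
<td>lw r2 = [y] //M3</td>
<td></td>
<td>lw r4 = [x] //M6</td>
</tr>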
</tbody>
</table>

r1 == 1, r3 == 1, r2 == 0, r4 == 0 is an illegal outcome.

If the above outcome were legal, it would imply that Tile 3 observes M4 occurring before M1 while Tile 1 observes M1 occurring before M4. More formally, Tile 1 observes M1 → M2 → M3 → M4, while Tile 3 observes M4 → M5 → M6 → M1. By property P2 of the consistency model, a store from a given processor occurs atomically as observed by remote processors, so the two observers cannot disagree on the order of M1 and M4, and the above outcome is illegal.
6 ON-CHIP NETWORK ARCHITECTURE

6.1 Overview

The TILEPro™ and TILE64™ family of chips uses multiple two-dimensional mesh networks for communication between tiles and I/O devices. Memory system traffic, cache system traffic, I/O traffic, and software-based messaging all travel over the Tilera mesh networks. Each switch point in a given network contains a dedicated link to and from the Tile Processor™, as well as four bidirectional links in the cardinal directions (north, south, east, and west) to neighboring switch points. The networks run at the same frequency as the Tile Processor core, providing a single-cycle latency for the head of a message to “hop” from one network switch point to a neighboring switch point. The networks can be classified into two groups: the Memory Networks, which handle all memory traffic such as cache misses and DDR2 requests, and the Messaging Networks, which allow software to control the network and send messages between tiles and I/O devices. The Memory Networks consist of the Memory Dynamic Network (MDN), the Tile Dynamic Network (TDN), and the Coherence Dynamic Network (CDN, TILEPro only). The Messaging Networks consist of the User Dynamic Network (UDN) and the I/O Dynamic Network (IDN).

6.2 Network Properties

6.2.1 Switches

Each switch point in the Tilera® networks is implemented as a full crossbar, shown in Figure 6-445. Any input port can arbitrate for any output port, excluding itself (north cannot route north).
6.2.2 Packets
Data is transmitted over the Tilera networks via “packets”. Each packet is divided into multiple $N$-bit “flits”, where $N$ is the width of the network. Each packet contains a header flit, which designates the destination and size of the packet, followed by a payload of data flits.

6.2.3 Routing
The Tilera networks are “wormhole” networks. In a wormhole network, the header flit arbitrates for a given output port at a switch, and, once granted, locks down that output port until the final flit in the packet has successfully traversed the switch. For large packets, this type of routing may result in the reservation of multiple output ports simultaneously for the same packet. The Tilera networks use a dimension-ordered routing policy, in which packets always travel in the $X$ direction first, then the $Y$ direction. The TILEPro family of processors allows each network to be configured to route either $X$ first or $Y$ first.
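
Dimension-ordered routing is easy to express in software. The following Python sketch computes the sequence of switch points a packet visits; the function name `route`, the `(x, y)` tuple representation, and the `x_first` flag are assumptions made for illustration and are not part of the hardware interface.

```python
def route(src, dst, x_first=True):
    """Return the list of (x, y) switch points visited from src to dst."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]

    def walk_x():
        nonlocal x
        while x != dx:
            x += 1 if dx > x else -1
            path.append((x, y))

    def walk_y():
        nonlocal y
        while y != dy:
            y += 1 if dy > y else -1
            path.append((x, y))

    # Finish one dimension completely before starting the other.
    for step in ((walk_x, walk_y) if x_first else (walk_y, walk_x)):
        step()
    return path


x_then_y = route((0, 0), (2, 1))
y_then_x = route((0, 0), (2, 1), x_first=False)
hops = len(x_then_y) - 1
```

Both orders take the same number of hops; only the intermediate switch points differ.
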

6.2.4 Flow Control
Flow control between neighboring switch points is implemented via a credit scheme. Each switch point has an input buffer that can hold three flits. Each output port maintains a credit count equal to the number of free entries in the neighboring input buffer. When a flit is routed through an output port, the credit count is decremented. If the credit count is zero, the flit is blocked and cannot proceed. When an input port consumes a flit, a credit is returned to the corresponding output port.
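
The credit scheme can be modeled in a few lines. This Python sketch is illustrative only: the class name `Link` and its methods are hypothetical, and credit return is modeled as immediate rather than taking a cycle.

```python
from collections import deque


class Link:
    """One output port feeding a neighboring three-entry input buffer."""

    def __init__(self, depth=3):
        self.credits = depth          # free entries in the neighbor's buffer
        self.buffer = deque()         # the neighbor's input buffer

    def send(self, flit):
        if self.credits == 0:
            return False              # blocked: no space downstream
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def consume(self):
        flit = self.buffer.popleft()
        self.credits += 1             # credit returned to the output port
        return flit


link = Link()
sent = [link.send(flit) for flit in ("h", "d0", "d1", "d2")]
first = link.consume()
retry = link.send("d2")
```

The fourth flit is blocked until the receiver drains one entry, after which the retry succeeds.
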

6.2.5 Fairness and Arbitration
The switch points implement round-robin output port arbitration, providing equivalent fairness for all input ports.

6.2.6 Timing
The Tilera networks operate at the same frequency as the processor cores. The latency for a flit to be read from an input buffer, traverse the crossbar, and reach the storage at the input of a neighboring switch is a single cycle.

6.2.7 Link Width
All of the on-chip network links are 32 bits wide.

6.3 Memory Networks
The Memory Networks carry all requests and responses belonging to the cache system and the memory system. The TDN is responsible for carrying tile-to-tile requests, such as read/write requests. The MDN is responsible for carrying requests from a Tile to/from the DDR2s, as well as carrying all acknowledgments and responses to TDN requests. The CDN (only present in TILEPro) carries invalidate messages needed for the cache coherency protocol in the TILEPro series of processors.

6.3.1 Packet Sizes
The following tables contain a breakdown of the type of requests on the different networks and the size of each request in terms of network flits.
6.3.2 Deadlock

The Memory Networks are completely managed by hardware and are deadlock free by design.

6.4 Messaging Networks

6.4.1 Register Mapping

Inside the tile, the UDN has a direct connection to the ALU. This allows tiles to communicate with very low latency. The UDN is register-mapped such that any operation can directly write or read the network. For example:

```
add udn0, r5, r6  // Add r5 to r6 and send the result to the UDN
add r5, r6, udn0  // Read a word from the UDN, add to r6 and put the result in r5
```

The UDN can be initialized and accessed via the TMC library. See the Applications Libraries Reference Guide (UG227) for details.

Access to the UDN is fully interlocked. This allows an application to read the network port and sleep until data arrives, providing a low-power wait state with zero-latency wake up.

Similarly, on network send, if the network is not able to consume the packet word immediately, the processor automatically waits until buffer space is available, saving considerable power and latency over a polling or interrupt-driven scheme.

Special Purpose Registers (SPRs) and interrupts are available to monitor the status of the incoming and outgoing network ports in order to provide alternate usage models.

### 6.4.2 Packet Format

A packet consists of a route header, a tag, and a variable-length payload. The route header is created by the sender and contains the X,Y coordinates of the target. It is examined at each switch point to route the packet through the tile fabric. The second packet word is a tag, which is also created by the sender and is used to differentiate between flows at the receiver.

**Word[0]: Route Header**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>6:0</td>
<td>Length</td>
<td>Length of packet in 32-bit words (7 bits). The length includes the tag word, which all UDN packets must have, but does not include the route header. So, a value of 2 for length indicates a packet with only a route header, a tag word, and one word of payload. A value of zero indicates a 128-word packet.</td>
</tr>
<tr>
<td>17:7</td>
<td>Dest_Y</td>
<td>Destination tile’s Y location. This field is 11 bits.</td>
</tr>
<tr>
<td>28:18</td>
<td>Dest_X</td>
<td>Destination tile’s X location. This field is 11 bits.</td>
</tr>
<tr>
<td>31:29</td>
<td>Reserved</td>
<td>Reserved. Unused bits reserved for future use. Must be zero. This field is 3 bits.</td>
</tr>
</tbody>
</table>

**Word[1]: Tag**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>Tag</td>
<td>Thirty-two-bit value indicating this packet’s flow. The tag is used by the demux logic to sort packets. The remaining words are user payload and are not interpreted by hardware.</td>
</tr>
</tbody>
</table>

**Packet Payload (0-127 words)**
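
The route header layout can be exercised with a small encoder and decoder. This Python sketch assumes the Length field occupies the low 7 bits of the word, directly below Dest_Y at bit 7; the function names are hypothetical and are not part of any Tilera library.

```python
COORD_BITS = 11


def pack_route_header(dest_x, dest_y, length_words):
    """Build word 0 of a UDN packet; length counts the tag plus payload words."""
    if not 1 <= length_words <= 128:
        raise ValueError("length must be 1..128 words")
    limit = 1 << COORD_BITS
    if not (0 <= dest_x < limit and 0 <= dest_y < limit):
        raise ValueError("coordinate out of range")
    length_field = length_words % 128      # a 128-word packet is encoded as 0
    return (dest_x << 18) | (dest_y << 7) | length_field


def unpack_route_header(word):
    length_field = word & 0x7F
    return {
        "dest_x": (word >> 18) & 0x7FF,
        "dest_y": (word >> 7) & 0x7FF,
        "length_words": length_field or 128,
    }


# Tag word plus one payload word, sent to the tile at (3, 2).
header = pack_route_header(dest_x=3, dest_y=2, length_words=2)
fields = unpack_route_header(header)
```

The reserved bits 31:29 are left as zero by construction.
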
### 6.4.3 Demux

The UDN receive logic includes demultiplexing (demux) hardware in order to provide high performance flow detection and independent buffering. Based on the tag, an incoming packet is placed in one of four demux queues, as shown in Figure 6-447.

The tag of the incoming packet is compared against four SPRs (UDN_TAG_0, UDN_TAG_1, UDN_TAG_2, and UDN_TAG_3). If it matches one of the resident tags, the route header and tag words are removed and the payload words are placed in the corresponding demux queue. These queues are accessible individually through register-mapped access via udn0, udn1, udn2, and udn3. This allows differently tagged flows to be serviced out of order with respect to each other.

If the incoming packet does not match any of the programmed tags, it is placed in the catch-all queue with the length field and tag left intact.

The catch-all queue is mapped to the following SPRs:

1. **UDN_CA_TAG** – the tag of the packet at the head of the catch-all queue
2. **UDN_CA_REM** – the number of words remaining in the current packet at the head of the catch-all queue
3. **UDN_CA_DATA** – the SPR that returns the payload data (one word per SPR read)

Note that **UDN_CA_TAG** and **UDN_CA_REM** are always valid if the catch-all queue is not empty, even when the packet at its head has been partially read.

When data is available on one of the queues, it is indicated by the SPR `UDN_DATA_AVAIL`. Bits 0-3 of this register correspond to the four tagged demux queues, and bit 4 indicates if any data is available in the catch-all queue. An application can poll this register if it needs to wait until data is available on a specific queue.

Interrupts may be enabled to signal when data is available on a queue. The interrupt `UDN_CA` is signaled when the catch-all queue has data available. In order for a tagged queue to signal an interrupt, it must also be enabled in the `UDN_AVAIL_EN` SPR (in addition to the system level interrupt enable). The four tagged queues share a data available interrupt. The Interrupt Service Routine (ISR) can check the `UDN_DATA_AVAIL` register to determine which of the four channels caused the interrupt.

In addition to the data available bits, SPRs are also provided that give the number of words available in each queue. `UDN_DEMUX_COUNT_0`, `UDN_DEMUX_COUNT_1`, `UDN_DEMUX_COUNT_2`, and `UDN_DEMUX_COUNT_3` provide the counts for the four tagged demux queues, and `UDN_DEMUX_CA_COUNT` gives the count of payload words in the catch-all queue. For information about these SPRs, refer to the System Architecture Manual (UG103).
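
The demux behavior described in this section can be summarized with a toy software model. The Python sketch below is an assumption-laden illustration, not the hardware design: the class `UdnDemux` and its methods are hypothetical, and the model ignores buffer capacity and the shared RAM.

```python
from collections import deque


class UdnDemux:
    """Toy model of the receive demux: four tagged queues plus a catch-all."""

    def __init__(self, tags):
        assert len(tags) == 4
        self.tags = list(tags)                      # stands in for UDN_TAG_0..3
        self.queues = [deque() for _ in range(4)]   # read via udn0..udn3
        self.catch_all = deque()                    # (tag, payload) entries

    def receive(self, tag, payload):
        if tag in self.tags:
            # Route header and tag are stripped; only payload words are queued.
            self.queues[self.tags.index(tag)].extend(payload)
        else:
            # Unmatched packets keep their tag so software can inspect it.
            self.catch_all.append((tag, deque(payload)))

    def data_avail(self):
        # Bits 0-3: tagged queues; bit 4: catch-all queue.
        bits = 0
        for index, queue in enumerate(self.queues):
            if queue:
                bits |= 1 << index
        if self.catch_all:
            bits |= 1 << 4
        return bits

    def count(self, index):
        return len(self.queues[index])


demux = UdnDemux([10, 11, 12, 13])
demux.receive(11, [1, 2, 3])     # matches the second tag register
demux.receive(99, [7])           # no match: goes to the catch-all queue
avail = demux.data_avail()
```

In this run, bits 1 and 4 of the availability mask are set, and the second tagged queue holds three payload words.
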

The physical buffering for all these queues is implemented as small dedicated FIFOs backed by a larger shared RAM. Space for each queue in the shared RAM is allocated and de-allocated dynamically as needed. This shared buffering provides great flexibility to the message passing system.

The large RAM is also shared with the Input/Output Dynamic Network (IDN), which is only used by system software. The buffering allocated to the IDN and the UDN in the shared buffer is hard partitioned by system software and cannot be modified by the user. There is no interaction between the UDN and the IDN and the UDN will neither block nor corrupt the IDN.

### 6.4.4 Deadlock

If you are not using iLib Standard Channels, care must be taken to avoid deadlock by software buffering for received packet flows and management of dependences between the outgoing and incoming packet flows.

### 6.4.5 Hardwall

The UDN hardwall mechanism is used to prevent unwanted communication between user applications running on adjacent tiles. The hardwall mechanism consists of an SPR-programmable protection bit on each output port of the UDN switch point and an interrupt triggered by any attempted violation of a hardwall.

When an output port is protected, no data can be sent out of the associated port. Attempting to send a packet word to a protected port will trigger an interrupt on the Tile Processor. Software can then inspect the packet and take any appropriate action.

This hardwall also provides a powerful virtualization tool. For example, the hardwall could be used to emulate the behavior of a much larger fabric by detecting messages that cross a hardwall boundary and tunneling them to another group of Tiles or another process running on the same group of tiles.
7 STATIC NETWORK

7.1 Overview

The purpose of the static network is to allow applications to transport scalar operands between tiles efficiently. Instead of using a header to specify the destination, the static network uses routing specifications at each intermediate tile to determine the direction the data should take.

The static network is composed of a crossbar switch that connects to its nearest neighbors in a two-dimensional mesh network, as well as to that tile’s processor engine. Each connection is 32 bits wide, full duplex, and flow controlled. The time required for a word to travel in the network is one cycle for each hop (that is, each intermediate tile), plus one more cycle at the destination tile to get from the network to the main processor.

The static network crossbar switch can route from five different directions: north, south, east, west, and the processor engine. The crossbar is fully connected—every output can be routed from any input (except back to itself) in each cycle, including broadcast and multicast operations.

Data movement is controlled by a static route that is set up with a special purpose register (SPR) write from the main processor. Static routes remain in force until changed by another SPR write.

7.2 Static Routing

The desired routing is specified statically by writing the SPR SNSTATIC. This SPR has five fields, corresponding to the five possible output ports, as listed in Table 7-9.

<table>
<thead>
<tr>
<th>Bits</th>
<th>Output Port</th>
</tr>
</thead>
<tbody>
<tr>
<td>14:12</td>
<td>Main Processor</td>
</tr>
<tr>
<td>11:9</td>
<td>West</td>
</tr>
<tr>
<td>8:6</td>
<td>South</td>
</tr>
<tr>
<td>5:3</td>
<td>East</td>
</tr>
<tr>
<td>2:0</td>
<td>North</td>
</tr>
</tbody>
</table>

As shown in Table 7-10, each field contains a number that specifies which input port will route to that output port:
Table 7-10. Port Designations

<table>
<thead>
<tr>
<th>Numbers</th>
<th>Input Port</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>None</td>
</tr>
<tr>
<td>1</td>
<td>North</td>
</tr>
<tr>
<td>2</td>
<td>East</td>
</tr>
<tr>
<td>3</td>
<td>South</td>
</tr>
<tr>
<td>4</td>
<td>West</td>
</tr>
<tr>
<td>5</td>
<td>Main Processor</td>
</tr>
</tbody>
</table>

For example, if 03214 (in octal) is written to SNSTATIC, the following routes remain in effect until SNSTATIC is written to again:

- south to west
- east to south
- north to east
- west to north

Multicast routing is supported. For example, writing 00033 (octal) to SNSTATIC will cause any word from the south to be routed to both the east and the north.

NOTE: Specifying that an input port routes back to the same output port (for example 00001, which specifies north routed to north) is illegal and results in undefined behavior.

Each input port in a static route is considered individually, and as soon as the input port has a word available and all output ports have room, the word is moved.
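
The field layout and port codes from the two tables above can be checked with a small decoder. This Python sketch is for illustration only; the function name `decode_snstatic` is hypothetical, and rejecting undefined codes with an exception is a choice made here (the hardware behavior for such values is not described in this section).

```python
# Output-port fields in ascending bit order: 2:0, 5:3, 8:6, 11:9, 14:12.
OUTPUT_PORTS = ["north", "east", "south", "west", "main processor"]
INPUT_PORTS = {1: "north", 2: "east", 3: "south", 4: "west", 5: "main processor"}


def decode_snstatic(value):
    """Return {output port: input port} for every routed output."""
    routes = {}
    for index, output in enumerate(OUTPUT_PORTS):
        field = (value >> (3 * index)) & 0b111
        if field == 0:
            continue                      # 0 means no input is routed here
        if field not in INPUT_PORTS:
            raise ValueError(f"undefined input code {field}")
        if INPUT_PORTS[field] == output:
            raise ValueError(f"{output} routed back to itself")
        routes[output] = INPUT_PORTS[field]
    return routes


ring = decode_snstatic(0o03214)
multicast = decode_snstatic(0o00033)
```

For the multicast value, the south input appears as the source of two different outputs.
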

7.3 Data Flow Control

Every port in the static network is flow-controlled, which allows it to tolerate delays introduced by unpredictable events. The TILE64 implements flow control using a credit-based scheme. Each link buffers at least three words of storage, and the sender therefore begins with three credits. A sender decrements its credit count when it sends a word, and increments the credit count when it receives an acknowledgement from the receiver. A sender can only send when its count is nonzero.

7.4 Hardwall Protection

The STN hardwall mechanism is used to prevent unwanted communication between user applications running on adjacent tiles. The hardwall mechanism consists of an SPR-programmable protection bit on each output port of the STN switch point and an interrupt triggered by any attempted violation of a hardwall.

When an output port is protected, no data can be sent out of the associated port. Attempting to send a word to a protected port will trigger an interrupt on the Tile Processor. Software can then inspect the word and take any appropriate action.

When a static route specifies a multicast route, and just one of the many output directions causes a protection violation, the word will not be routed to any of the output ports.

### 7.5 User-Accessible Special Purpose Registers

The list of all user-accessible SPRs follows. Please see the appendix for more details.

- **Static Network Control register** (*SNCTL*)
  
  Contains bits to freeze the crossbar switch.

- **Static Network FIFO Data register** (*SNFIFO_DATA*)
  
  Used to save or restore static network state, or to extract words blocked by a routing violation.

- **Static Network FIFO Select register** (*SNFIFO_SEL*)
  
  Controls which FIFO is read/written when accessing *SNFIFO_DATA*.
  
  - 0 – North Input FIFO
  - 1 – East Input FIFO
  - 3 – South Input FIFO
  - 4 – West Input FIFO
  - 5 – Main Processor Input FIFO
  - 6 – Main Processor Output FIFO

- **Static Network Input State register** (*SNISTATE*)
  
  Used to save or restore static network state. Indicates how many words are in each port’s input buffer.

- **Static Network Output State register** (*SNOSTATE*)
  
  Used to save or restore static network state. Indicates how many credits each output port has for sending. Also contains how many words are in the Main Processor Output FIFO.

- **Static Network Static Route register** (*SNSTATIC*)
  
  Used to set up a static route (see “Static Routing” on page 381).

- **Static Network Data Available register** (*SN_DATA_AVAIL*)
  
  Indicates if data is available to be read from the static network by the processor engine.
8 USER-LEVEL SYSTEM CONCERNS

8.1 Overview

User-level programs need to interact with the greater system where they are executed. In order to interact with the system, user-level programs need to be able to execute system calls, interact with I/O, and control in-tile devices of a system nature. This section describes system interactions from a user-level viewpoint.

8.2 System Calls

A system call is a mechanism whereby a user-level program voluntarily passes control flow to a more privileged piece of software. A system call typically involves passing some information along with the program control flow. The system software may elect to pass return data to the user-level program after the system call completes. System calls are typically executed in response to a user-level program requiring some functionality that is provided by system software. Access to system calls is typically provided through library code rather than implemented directly by end users.

The Tile Processor Architecture supports the ability for user-level programs to call system software via the *swint0*, *swint1*, *swint2*, and *swint3* instructions\(^1\). The architecture includes four “swint” interrupt handlers with each one corresponding to one of the swint instructions. When a swint instruction is executed, an interrupt is signaled to the respective swint interrupt handler. There are four swint interrupt levels because there are four protection levels in the Tile Processor Architecture. Therefore it is possible to choose the level of system software in which a program wants to request services. The control of the protection level to which a swint instruction vectors is not hard coded, but as a software convention, the swint number matches the protection level, where 0 is user-level, 1 is supervisor (OS), 2 is hypervisor, and 3 is for a virtual machine monitor.

Typically there are many different calls that a user-level program may want to make to system-level software. As there is only one swint interrupt per protection level, the actual call that is needed must be signaled to system software in some manner. By software convention, a system call number is deposited into a known General Purpose Register (GPR) and then the swint is signaled. The system call number allows the system to determine which service a user-level program requires from system software. Parameters can also be passed through other processor registers and through memory. After the completion of the system call, the system software returns control to the user-level program via the *iret* instruction. The system software may elect to return a value or set of values to the user process through processor registers or through memory.
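
The dispatch convention described above can be sketched in C as a model of what a swint handler might do between interrupt entry and *iret*. This is only an illustration under stated assumptions: the register holding the system call number (r10 here), the argument and result registers (r0 and r1), the `gpr_file` type, the two example services, and the `SYSCALL_BAD_NUMBER` value are all hypothetical. The actual assignments are set by system software, not by the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convention for this sketch only: system call number in r10,
 * arguments in r0 and r1, result returned in r0. */
enum { SYSCALL_NUMBER_REG = 10, NUM_GPRS = 64 };

#define SYSCALL_BAD_NUMBER 0xFFFFFFFFu

/* Software model of the interrupted program's general purpose registers. */
typedef struct { uint32_t r[NUM_GPRS]; } gpr_file;

typedef uint32_t (*syscall_fn)(uint32_t arg0, uint32_t arg1);

/* Two made-up services standing in for real system functionality. */
uint32_t sys_add(uint32_t a, uint32_t b) { return a + b; }
uint32_t sys_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

static const syscall_fn syscall_table[] = { sys_add, sys_max };

/* Select the requested service by number and leave the result in r0. */
void swint_dispatch(gpr_file *regs)
{
    assert(regs != NULL);

    uint32_t number = regs->r[SYSCALL_NUMBER_REG];
    size_t count = sizeof syscall_table / sizeof syscall_table[0];

    if (number >= count) {
        regs->r[0] = SYSCALL_BAD_NUMBER;  /* unknown service requested */
        return;
    }
    regs->r[0] = syscall_table[number](regs->r[0], regs->r[1]);
}
```

A single interrupt per protection level is enough because the table index, not the interrupt number, selects the service.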

---

1. “swint” is an abbreviation of software interrupt.
8.3 Interrupt Overview

Exceptional occurrences happen in any computer system. The Tile Processor Architecture unifies all exceptional occurrences in a class of events called interrupts.

Four protection levels are provided by hardware to isolate protection concerns. These protection levels affect how interrupts occur on the Tile Processor Architecture. The protection levels are numbered 0 through 3, with 0 being the least privileged and 3 being the most privileged. This document focuses on programs executed at protection level 0 (user-level).

Table 8-11 presents the list of interrupts that are available on the Tile Processor Architecture. The System Architecture Manual provides more detail of interrupt processing and the various attributes of interrupts. If multiple interrupts are signaled at the same time, the interrupt with the lowest interrupt number will be signaled first.

8.3.1 Interrupt List

Table 8-11 lists all interrupts that can be seen by the user.

<table>
<thead>
<tr>
<th>Interrupt Number</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ITLB_MISS</td>
<td>ITLB Miss.</td>
</tr>
<tr>
<td>1</td>
<td>MEM_ERROR</td>
<td>Memory Error</td>
</tr>
<tr>
<td>2</td>
<td>ILL</td>
<td>Illegal Instruction</td>
</tr>
<tr>
<td>3</td>
<td>GPV</td>
<td>General Protection Violation</td>
</tr>
<tr>
<td>4</td>
<td>SN_ACCESS</td>
<td>Static Networks Access</td>
</tr>
<tr>
<td>5</td>
<td>IDN_ACCESS</td>
<td>IO Dynamic Network (IDN) Access</td>
</tr>
<tr>
<td>6</td>
<td>UDN_ACCESS</td>
<td>User Dynamic Network (UDN) Access</td>
</tr>
<tr>
<td>7</td>
<td>IDN_REFILL</td>
<td>IDN Refill</td>
</tr>
<tr>
<td>8</td>
<td>UDN_REFILL</td>
<td>UDN Refill</td>
</tr>
<tr>
<td>9</td>
<td>IDN_COMPLETE</td>
<td>IDN Complete</td>
</tr>
<tr>
<td>10</td>
<td>UDN_COMPLETE</td>
<td>UDN Complete</td>
</tr>
<tr>
<td>11</td>
<td>SWINT_3</td>
<td>Software Interrupt 3</td>
</tr>
<tr>
<td>12</td>
<td>SWINT_2</td>
<td>Software Interrupt 2</td>
</tr>
<tr>
<td>13</td>
<td>SWINT_1</td>
<td>Software Interrupt 1</td>
</tr>
<tr>
<td>14</td>
<td>SWINT_0</td>
<td>Software Interrupt 0</td>
</tr>
<tr>
<td>15</td>
<td>UNALIGN_DATA</td>
<td>Unaligned Data</td>
</tr>
</tbody>
</table>
### Table 8-11. Master Interrupt Table (continued)

<table>
<thead>
<tr>
<th>Interrupt Number</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>16</td>
<td>DTLB_MISS</td>
<td>Data Translation Lookaside Buffer (DTLB) Miss</td>
</tr>
<tr>
<td>17</td>
<td>DTLB_ACCESS</td>
<td>DTLB Access Error</td>
</tr>
<tr>
<td>18</td>
<td>DMATLB_MISS</td>
<td>Direct Memory Access (DMA) Translation Lookaside Buffer Miss</td>
</tr>
<tr>
<td>19</td>
<td>DMATLB_ACCESS</td>
<td>DMA Translation Lookaside Buffer Access Error</td>
</tr>
<tr>
<td>20</td>
<td>Reserved</td>
<td>Reserved</td>
</tr>
<tr>
<td>21</td>
<td>Reserved</td>
<td>Reserved</td>
</tr>
<tr>
<td>22</td>
<td>SN_FIREWALL</td>
<td>SN Firewall Violation</td>
</tr>
<tr>
<td>23</td>
<td>IDN_FIREWALL</td>
<td>IDN Firewall Violation</td>
</tr>
<tr>
<td>24</td>
<td>UDN_FIREWALL</td>
<td>UDN Firewall Violation</td>
</tr>
<tr>
<td>25</td>
<td>TILE_TIMER</td>
<td>Tile Timer</td>
</tr>
<tr>
<td>26</td>
<td>IDN_TIMER</td>
<td>IDN Timer</td>
</tr>
<tr>
<td>27</td>
<td>UDN_TIMER</td>
<td>UDN Timer</td>
</tr>
<tr>
<td>28</td>
<td>DMA_NOTIFY</td>
<td>DMA Notification</td>
</tr>
<tr>
<td>29</td>
<td>IDN_CA</td>
<td>IDN Catch-All Available</td>
</tr>
<tr>
<td>30</td>
<td>UDN_CA</td>
<td>UDN Catch-All Available</td>
</tr>
<tr>
<td>31</td>
<td>IDN_AVAIL</td>
<td>IDN Available</td>
</tr>
<tr>
<td>32</td>
<td>UDN_AVAIL</td>
<td>UDN Available</td>
</tr>
<tr>
<td>33</td>
<td>PERF_COUNT</td>
<td>Performance Counters</td>
</tr>
<tr>
<td>34</td>
<td>INTCTRL_3</td>
<td>Interrupt Control 3</td>
</tr>
<tr>
<td>35</td>
<td>INTCTRL_2</td>
<td>Interrupt Control 2</td>
</tr>
<tr>
<td>36</td>
<td>INTCTRL_1</td>
<td>Interrupt Control 1</td>
</tr>
<tr>
<td>37</td>
<td>INTCTRL_0</td>
<td>Interrupt Control 0</td>
</tr>
<tr>
<td>38</td>
<td>BOOT_ACCESS</td>
<td>Boot Access</td>
</tr>
<tr>
<td>39</td>
<td>WORLD_ACCESS</td>
<td>World Access</td>
</tr>
<tr>
<td>40</td>
<td>I_ASID</td>
<td>Instruction Address Space Identifier (ASID)</td>
</tr>
<tr>
<td>41</td>
<td>D_ASID</td>
<td>Data ASID</td>
</tr>
</tbody>
</table>
The Tile Processor Architecture uses a vectored approach to interrupts; there are four sets of interrupt vectors, one for each protection level. On an interrupt, the architecture changes the program counter to a value derived from the interrupt number and the protection level at which the interrupt executes. The vector address is Interrupt_Base_Address (0xFC000000), plus the protection level multiplied by 16 MB (0x01000000), plus the interrupt number multiplied by 256. Because each VLIW instruction occupies 8 bytes, 32 VLIW instructions fit in each 256-byte interrupt vector. This layout also allows all of a protection level’s interrupt vectors and up to 16 MB of accompanying code to be mapped into virtual address space using a single large-page ITLB entry. If more than 32 instructions are needed to handle an interrupt, the interrupt vector code can jump to the rest of the interrupt handler located in that same large page, or anywhere else in the address space.

When an interrupt occurs, the program counter of the processor is vectored to a fixed interrupt location. The fixed interrupt location is the virtual address:

\[
(\text{interrupt\_number} \ll (\text{Interrupt\_Vector\_Number\_of\_Instructions\_Log}_2 + \text{Instruction\_Size\_Log}_2)) + (\text{destination\_protection\_level} \ll \text{Interrupt\_Vector\_PL\_Offset\_Log}_2) + \text{Interrupt\_Base\_Address}
\]

- \(\text{Interrupt\_Vector\_Number\_of\_Instructions\_Log}_2\) is 5.
- \(\text{Instruction\_Size\_Log}_2\) is 3.
- \(\text{Interrupt\_Vector\_PL\_Offset\_Log}_2\) is 24.
- \(\text{Interrupt\_Base\_Address}\) is 0xFC000000.
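
The vector address calculation can be written as a small C function using the four constants listed above. This is a minimal sketch: the function name and the range checks are assumptions added for illustration, and 46 is used as the upper bound only because it is the highest interrupt number in Table 8-11.

```c
#include <assert.h>
#include <stdint.h>

#define INTERRUPT_VECTOR_NUMBER_OF_INSTRUCTIONS_LOG2 5u
#define INSTRUCTION_SIZE_LOG2                        3u
#define INTERRUPT_VECTOR_PL_OFFSET_LOG2              24u
#define INTERRUPT_BASE_ADDRESS                       0xFC000000u

/* Virtual address of the vector for `interrupt_number` when the handler
 * executes at `destination_protection_level` (0 through 3). */
uint32_t interrupt_vector_address(uint32_t interrupt_number,
                                  uint32_t destination_protection_level)
{
    assert(interrupt_number <= 46u);            /* highest number in Table 8-11 */
    assert(destination_protection_level <= 3u);

    return INTERRUPT_BASE_ADDRESS
         + (destination_protection_level << INTERRUPT_VECTOR_PL_OFFSET_LOG2)
         + (interrupt_number << (INTERRUPT_VECTOR_NUMBER_OF_INSTRUCTIONS_LOG2
                                 + INSTRUCTION_SIZE_LOG2));
}
```

For example, `interrupt_vector_address(32, 0)` yields 0xFC002000 for the UDN Available interrupt at protection level 0, and `interrupt_vector_address(0, 1)` yields 0xFD000000, the first vector of protection level 1.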

When an interrupt occurs, the INTERRUPT_CRITICAL_SECTION bit is atomically set so that a subsequent interrupt cannot interrupt the pending one. This SPR affects the masking of further interrupts and is discussed along with interrupt masking in more detail in the System Architecture Manual.

The program counter that was interrupted and the associated protection level on interrupt are placed in the SPRs EX_CONTEXT_X_0 and EX_CONTEXT_X_1. There are four sets of these SPRs, one for each protection level that can be interrupted into. The “X” in EX_CONTEXT_X_0 and EX_CONTEXT_X_1 denotes a value between 0 and 3 for each protection level. EX_CONTEXT_X_0 contains the exceptional program counter, and EX_CONTEXT_X_1 contains the protection level that was interrupted along with the interrupted state of the INTERRUPT_CRITICAL_SECTION SPR.

<table>
<thead>
<tr>
<th>Interrupt Number</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>42</td>
<td>DMA_ASID</td>
<td>DMA ASID</td>
</tr>
<tr>
<td>43</td>
<td>RESERVED</td>
<td>RESERVED</td>
</tr>
<tr>
<td>44</td>
<td>DMA_CPL</td>
<td>DMA Current Protection Level</td>
</tr>
<tr>
<td>45</td>
<td>RESERVED</td>
<td>RESERVED</td>
</tr>
<tr>
<td>46</td>
<td>DOUBLE_FAULT</td>
<td>Double Fault</td>
</tr>
</tbody>
</table>

Table 8-11. Master Interrupt Table (continued)
8.4 User-Level Interrupts

Unlike most computer architectures, the Tile Processor Architecture supports user-level interrupts. User-level interrupts interrupt from protection level 0 destined for protection level 0. An example of an interrupt that would interrupt from protection level 0 to protection level 0 is the UDN Available interrupt. The UDN is a user-level network and the availability of a network message can trigger an interrupt to occur. In order for the interrupt to be delivered, place an appropriate interrupt handler in the correct interrupt vector location and unmask the interrupt.

Each user-level interrupt vector holds up to 32 instruction bundles, and the vectors are laid out starting at address 0xFC000000. Thus, to install a protection level 0 interrupt handler for the UDN Available interrupt (number 32), the handler would be installed at address 0xFC002000 (0xFC000000 + 32 × 256). If more than 32 instruction bundles are required, the last instruction in the interrupt vector should be used to jump to the remaining handler code. If other interrupts need to be enabled inside of an interrupt handler, the INTERRUPT_CRITICAL_SECTION SPR may be cleared. The System Architecture Manual describes the masking of interrupts in more detail.

When an asynchronous interrupt is signaled, all of the Tile Processor Architecture’s general purpose registers can potentially contain state that must not be modified if the handler is to return transparently to the interrupted process. This leaves the interrupt handler with a dilemma: it needs to save state from the general purpose register file to memory before it can use the general purpose registers, but executing a store instruction requires at least one general purpose register to hold an address. To address this problem, the Tile Processor Architecture provides system save registers. Four 32-bit system save registers are provided for each protection level. The system save registers can be read and written by the corresponding protection level and by more privileged protection levels. If a lower protection level attempts to access the system save registers of a higher protection level, a General Protection Violation occurs. The system save registers are mapped into the SPR space. The corresponding SPRs for a given protection level are SYSTEM_SAVE_X_0, SYSTEM_SAVE_X_1, SYSTEM_SAVE_X_2, and SYSTEM_SAVE_X_3, where X denotes a protection level 0 through 3.

After all of the interrupt processing is complete, the iret instruction should be executed. The iret instruction sets the program counter to the value stored in EX_CONTEXT_0_0. Likewise, it restores the INTERRUPT_CRITICAL_SECTION SPR and the protection level from EX_CONTEXT_0_1. These updates are done atomically.
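
The save and restore behavior for a protection level 0 interrupt can be modeled in software as follows. This is a sketch, not a description of the hardware: the `tile_state` structure and both function names are hypothetical, and the packing of EX_CONTEXT_0_1 (protection level in bits 1:0, critical section state in bit 2) is an assumption made for this model only. The System Architecture Manual defines the real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the state touched by interrupt entry and iret. */
typedef struct {
    uint32_t pc;
    uint32_t protection_level;            /* 0 through 3 */
    uint32_t interrupt_critical_section;  /* 0 or 1 */
    uint32_t ex_context_0_0;              /* interrupted program counter */
    uint32_t ex_context_0_1;              /* interrupted PL and critical section state */
} tile_state;

/* Assumed packing of EX_CONTEXT_0_1, for this model only. */
#define EX1_PL_MASK   0x3u
#define EX1_ICS_SHIFT 2u

/* Interrupt entry into protection level 0: save context, block nested
 * interrupts, and vector to the handler. */
void take_pl0_interrupt(tile_state *t, uint32_t vector_address)
{
    assert(t != NULL);
    t->ex_context_0_0 = t->pc;
    t->ex_context_0_1 = (t->protection_level & EX1_PL_MASK)
                      | (t->interrupt_critical_section << EX1_ICS_SHIFT);
    t->interrupt_critical_section = 1u;
    t->protection_level = 0u;
    t->pc = vector_address;
}

/* iret from protection level 0: restore all three values together. */
void execute_iret(tile_state *t)
{
    assert(t != NULL);
    t->pc = t->ex_context_0_0;
    t->protection_level = t->ex_context_0_1 & EX1_PL_MASK;
    t->interrupt_critical_section = (t->ex_context_0_1 >> EX1_ICS_SHIFT) & 1u;
}
```

In the model, the three assignments in `execute_iret` stand in for the single atomic update that the hardware performs.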

8.5 Interaction with I/O Devices

User-level code typically interacts with I/O devices by utilizing the features provided by system software. Typically there is a driver for an I/O device which resides in the system software. The prototypical I/O device on Tile Processor Architecture is connected to the IDN and to the memory system via the iMesh. In order for a user-level program to access I/O, a system call is made to system software which then may message an I/O device for the user-level software. In response to the IDN message, the I/O device may respond back with data over the IDN, or may deposit data into memory. The I/O device will typically have sophisticated DMA engines which orchestrate the movement of bulk data from the I/O device to memory or from memory to the I/O device. More details of I/O device specifics are described in the System Architecture Manual and in the Tile Processor I/O Device Guide (UG104) for a particular implementation.

8.6 Cycle Count

Each tile contains a 64-bit cycle counter. The counter increases monotonically and can be read through the CYCLE_LOW and CYCLE_HIGH SPRs. The cycle counter increases for each major cycle of a specific implementation. The relationship between cycle count and instructions executed is implementation specific. A suggested implementation increments the cycle count for each cycle that a bundle could issue. CYCLE_LOW returns the lower 32 bits of the cycle counter, while CYCLE_HIGH returns the upper 32 bits. The cycle counter is reset to 0 when the machine is reset. The CYCLE_LOW and CYCLE_HIGH registers are read-only. System software can modify the cycle counter for virtualization purposes via the CYCLE_LOW_MODIFY and CYCLE_HIGH_MODIFY special purpose registers.
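
Because the 64-bit count is exposed as two 32-bit SPRs, software that reads both halves should allow for the low half wrapping between the two reads. The sketch below shows one common way to do this: read the high half, then the low half, then the high half again, and retry if the high half changed. The `spr_reader` type and the simulated counter are hypothetical stand-ins for real SPR accesses; this manual does not prescribe a particular read sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor type: returns the current value of one 32-bit SPR. */
typedef uint32_t (*spr_reader)(void);

/* Assemble a consistent 64-bit cycle count from the two 32-bit halves. */
uint64_t read_cycle_count(spr_reader read_cycle_high, spr_reader read_cycle_low)
{
    assert(read_cycle_high != 0 && read_cycle_low != 0);

    uint32_t high, low, high_again;
    do {
        high = read_cycle_high();
        low = read_cycle_low();
        high_again = read_cycle_high();
    } while (high != high_again);  /* low half wrapped mid-read: try again */

    return ((uint64_t)high << 32) | low;
}

/* Simulated counter for demonstration only: advances by one on every read,
 * starting just below the point where the low half wraps. */
uint64_t sim_cycles = 0x00000000FFFFFFFEull;

uint32_t sim_read_cycle_high(void)
{
    sim_cycles += 1;
    return (uint32_t)(sim_cycles >> 32);
}

uint32_t sim_read_cycle_low(void)
{
    sim_cycles += 1;
    return (uint32_t)sim_cycles;
}
```

With the simulated counter, the first attempt sees the high half change from 0 to 1 and is discarded. A single high-then-low read would have combined a stale high half with a wrapped low half.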
A.1 Introduction

In addition to having the processor state be accessible by the standard Instruction Set Architecture (ISA), every modern processor contains some state that software needs to access, but only infrequently. Consider a DMA operation, for example. A program initiates a DMA transfer by specifying the size of the data block to be transferred, along with the source and destination addresses. The program also polls a status bit to determine when the transfer has completed.

The Tile Architecture™ provides access to all software-readable and software-writable state through a 15-bit addressed, word-oriented register file. This register file is called the Special Purpose Register File (SPRF), and each register in this register set is called an SPR. Not every bit within every SPR is physically implemented to hold state information/data. Some bits merely provide an interface to another state within the tile. Further, the SPRF is sparsely populated—not every address within the SPRF refers to an actual SPR.

Two instructions in the Tile Architecture provide access to the special purpose registers: the Move To Special Purpose Register Word (mtspr) and Move From Special Purpose Register Word (mfspr). User programs can access SPRs to control and monitor the Static Network, the User Dynamic Network, the Tile Timer, and the DMA Engine.

Table A-12 provides the list of SPRs organized by function. Information for TILE64 users is shown in yellow shading and information for TILEPro users is shown in red shading. Note that the SPRs listed below and described in the sections that follow represent a portion of the complete Special Purpose Register listing. For more information, refer to Chapter 8: Special Purpose Registers in the System Architecture Manual (UG103).

<table>
<thead>
<tr>
<th>Register/Details</th>
<th>Address</th>
<th>Access MPL</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Static Network Registers</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td><em>Static Network Control Register (SNCTL)</em> on page 396</td>
<td>0x805</td>
<td>SN_ACCESS</td>
</tr>
<tr>
<td><em>Static Network Fifo Data (SNFIFO_DATA)</em> on page 397</td>
<td>0x806</td>
<td></td>
</tr>
<tr>
<td><em>Static Network FIFO Select Register (SNFIFO_SEL)</em> on page 398</td>
<td>0x807</td>
<td></td>
</tr>
<tr>
<td><em>Static Network Input State Register (SNISTATE)</em> on page 399</td>
<td>0x809</td>
<td></td>
</tr>
<tr>
<td><em>Static Network Output State Register (SNOSTATE)</em> on page 400</td>
<td>0x80a</td>
<td></td>
</tr>
<tr>
<td><em>Static Network Static Route (SNSTATIC)</em> on page 401</td>
<td>0x80c</td>
<td></td>
</tr>
<tr>
<td><strong>Static Network Registers (continued)</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network Data Available (SN_DATA_AVAIL)” on page 402</td>
<td>0x900</td>
<td>SN_ACCESS</td>
</tr>
<tr>
<td><strong>Static Network Static Registers — Used for TILEPro Processors ONLY</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network Control (SN_STATIC_CTL)” on page 403</td>
<td></td>
<td>SN_STATIC_ACCESS</td>
</tr>
<tr>
<td>“Static Network FIFO Data (SN_STATIC_FIFO_DATA)” on page 404</td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network FIFO Select (SN_STATIC_FIFO_SEL)” on page 405</td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network Input State (SN_STATIC_ISTATE)” on page 406</td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network Output State (SN_STATIC_OSTATE)” on page 407</td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network Static Route (SN_STATIC_STATIC)” on page 408</td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Static Network Data Available (SN_STATIC_DATA_AVAIL)” on page 409</td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>User Dynamic Network Registers</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Catch-all Demultiplexor Count Register (UDN_DEMUX_CA_COUNT)” on page 410</td>
<td>0xc05</td>
<td>UDN_ACCESS</td>
</tr>
<tr>
<td>“User Dynamic Network Demultiplexor Count 0 Register (UDN_DEMUX_COUNT_0)” on page 411</td>
<td>0xc06</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demultiplexor Count 1 Register (UDN_DEMUX_COUNT_1)” on page 412</td>
<td>0xc07</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demultiplexor Count 2 Register (UDN_DEMUX_COUNT_2)” on page 413</td>
<td>0xc08</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demultiplexor Count 3 Register (UDN_DEMUX_COUNT_3)” on page 414</td>
<td>0xc09</td>
<td></td>
</tr>
<tr>
<td>“UDN Demux Control Register (UDN_DEMUX_CTL)” on page 415</td>
<td>0xc0a</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demux Current Tag (UDN_DEMUX_CURR_TAG)” on page 415</td>
<td>0xc0b</td>
<td></td>
</tr>
<tr>
<td>“UDN Demux Queue Select Register (UDN_DEMUX_QUEUE_SEL)” on page 415</td>
<td>0xc0c</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demux State (UDN_DEMUX_STATUS)” on page 416</td>
<td>0xc0d</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demux FIFO (UDN_DEMUX_WRITE_FIFO)” on page 416</td>
<td>0xc0e</td>
<td></td>
</tr>
<tr>
<td>“User Dynamic Network Demux Write Queue (UDN_DEMUX_WRITE_QUEUE)” on page 417</td>
<td>0xc0f</td>
<td></td>
</tr>
</tbody>
</table>
### Table A-12. Special Purpose Registers (continued)

<table>
<thead>
<tr>
<th>Register/Details</th>
<th>Address</th>
<th>Access MPL</th>
</tr>
</thead>
<tbody>
<tr>
<td>&quot;User Dynamic Network Words Pending (UDN_PENDING)&quot; on page 417</td>
<td>0xc10</td>
<td>UDN_ACCESS</td>
</tr>
<tr>
<td>&quot;User Dynamic Network FIFO Data (UDN_SP_FIFO_DATA)&quot; on page 418</td>
<td>0xc11</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network FIFO Data (UDN_SP_FIFO_DATA)&quot; on page 418</td>
<td>0xc12</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Freeze (UDN_SP_FREEZE)&quot; on page 419</td>
<td>0xc13</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Port State (UDN_SP_STATE)&quot; on page 420</td>
<td>0xc14</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Tag 0 (UDN_TAG_0)&quot; on page 421</td>
<td>0xc15</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Tag 1 (UDN_TAG_1)&quot; on page 421</td>
<td>0xc16</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Tag 2 (UDN_TAG_2)&quot; on page 421</td>
<td>0xc17</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Tag 3 (UDN_TAG_3)&quot; on page 422</td>
<td>0xc18</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Tag Valid (UDN_TAG_VALID)&quot; on page 422</td>
<td>0xc19</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Tile Coordinates (UDN_TILE_COORD)&quot; on page 423</td>
<td>0xc1a</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Catch-All Data (UDN_CA_DATA)&quot; on page 424</td>
<td>0xd00</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Catch-all Remaining Words (UDN_CA_REM)&quot; on page 424</td>
<td>0xd01</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Catch-All Data (UDN_CA_DATA)&quot; on page 424</td>
<td>0xd02</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Data Available (UDN_DATA_AVAIL)&quot; on page 425</td>
<td>0xd03</td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Refill Available Enable (UDN_REFILL_EN)&quot; on page 426</td>
<td>0x1005</td>
<td>UDN_REFILL</td>
</tr>
<tr>
<td>&quot;User Dynamic Network Remaining (UDN_REMAINING)&quot; on page 427</td>
<td>0x1405</td>
<td>UDN_COMPLETE</td>
</tr>
<tr>
<td>&quot;User Dynamic Network Available Enables (UDN_AVAIL_EN)&quot; on page 428</td>
<td>0x4005</td>
<td>UDN_AVAIL</td>
</tr>
<tr>
<td>&quot;User Dynamic Network Registers (continued)&quot;</td>
<td></td>
<td></td>
</tr>
<tr>
<td>&quot;User Dynamic Network Deadlock Counter (UDN_DEADLOCK_COUNT)&quot; on page 429</td>
<td>0x3605</td>
<td>UDN_TIMER</td>
</tr>
<tr>
<td>&quot;User Dynamic Network Deadlock Timeout (UDN_DEADLOCK_TIMEOUT)&quot; on page 430</td>
<td>0x3606</td>
<td></td>
</tr>
</tbody>
</table>
## Table A-12. Special Purpose Registers (continued)

<table>
<thead>
<tr>
<th>Register/Details</th>
<th>Address</th>
<th>Access MPL</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>World-Accessible Registers</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Cycle Counter High (CYCLE_HIGH)” on page 431</td>
<td>0x4e06</td>
<td>WORLD_ACCESS</td>
</tr>
<tr>
<td>“Cycle Counter Low (CYCLE_LOW)” on page 431</td>
<td>0x4e07</td>
<td></td>
</tr>
<tr>
<td>“Done Magic Register (DONE)” on page 432</td>
<td>0x4e08</td>
<td></td>
</tr>
<tr>
<td>“Fail Magic Register (FAIL)” on page 432</td>
<td>0x4e09</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Critical Section (INTERRUPT_CRITICAL_SECTION)” on page 433</td>
<td>0x4e0a</td>
<td></td>
</tr>
<tr>
<td>“Pass Magic Register (PASS)” on page 433</td>
<td>0x4e0b</td>
<td></td>
</tr>
<tr>
<td><strong>Interrupt Control 0 Registers</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Exceptional Context Protection Level 0 Entry 0 (EX_CONTEXT_0_0)” on page 434</td>
<td>0x4a05</td>
<td>INTCTRL_0</td>
</tr>
<tr>
<td>“Exceptional Context Protection Level 0 Entry 1 (EX_CONTEXT_0_1)” on page 435</td>
<td>0x4a06</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Control 0 Status (INTCTRL_N_STATUS)” on page 436</td>
<td>0x4a07</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Mask Protection Level 0 Entry 0 (INTERRUPT_MASK_0_0)” on page 437</td>
<td>0x4a08</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Mask Protection Level 0 Entry 1 (INTERRUPT_MASK_0_1)” on page 439</td>
<td>0x4a09</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Mask Protection Level 0 Entry 0 (INTERRUPT_MASK_RESET_0_0)” on page 440</td>
<td>0x4a0a</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Mask Protection Level 0 Entry 1 (INTERRUPT_MASK_RESET_0_1)” on page 442</td>
<td>0x4a0b</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Mask Protection Level 0 Entry 0 (INTERRUPT_MASK_SET_0_0)” on page 443</td>
<td>0x4a0c</td>
<td></td>
</tr>
<tr>
<td>“Interrupt Mask Protection Level 0 Entry 1 (INTERRUPT_MASK_SET_0_1)” on page 445</td>
<td>0x4a0d</td>
<td></td>
</tr>
<tr>
<td>“System Save Register Level 0 Entry 0 (SYSTEM_SAVE_0_0)” on page 446</td>
<td>0x4b00</td>
<td></td>
</tr>
<tr>
<td>“System Save Register Level 0 Entry 1 (SYSTEM_SAVE_0_1)” on page 446</td>
<td>0x4b01</td>
<td></td>
</tr>
<tr>
<td>“System Save Register Level 0 Entry 2 (SYSTEM_SAVE_0_2)” on page 446</td>
<td>0x4b02</td>
<td></td>
</tr>
<tr>
<td>“System Save Register Level 0 Entry 3 (SYSTEM_SAVE_0_3)” on page 446</td>
<td>0x4b03</td>
<td></td>
</tr>
<tr>
<td><strong>Tile Timer Register</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>“Minimum Protection Level for Tile Timer (MPL_TILE_TIMER)” on page 447</td>
<td>0x3205</td>
<td>TILE_TIMER</td>
</tr>
</tbody>
</table>
Table A-12. Special Purpose Registers (continued)

<table>
<thead>
<tr>
<th>Register/Details</th>
<th>Address</th>
<th>Access MPL</th>
</tr>
</thead>
<tbody>
<tr>
<td>“DMA Byte (DMA_BYTE) Register” on page 448</td>
<td>0x3900</td>
<td>DMA_NOTIFY</td>
</tr>
<tr>
<td>“DMA Chunk Size (DMA_CHUNK_SIZE) Register” on page 449</td>
<td>0x3901</td>
<td></td>
</tr>
<tr>
<td>“DMA Control (DMA_CTR) Register” on page 450</td>
<td>0x3902</td>
<td></td>
</tr>
<tr>
<td>“DMA Destination Address (DMA_DST_ADDR) Register” on page 451</td>
<td>0x3903</td>
<td></td>
</tr>
<tr>
<td>“DMA Destination Chunk Address (DMA_DST_CHUNK_ADDR) Register” on page 452</td>
<td>0x3904</td>
<td></td>
</tr>
<tr>
<td>“DMA Source Address (DMA_SRC_ADDR) Register” on page 453</td>
<td>0x3905</td>
<td></td>
</tr>
<tr>
<td>“DMA Source Chunk Address (DMA_SRC_CHUNK_ADDR) Register” on page 454</td>
<td>0x3906</td>
<td></td>
</tr>
<tr>
<td>“DMA Source And Destination Strides (DMA_STRIDE) Register” on page 455</td>
<td>0x3907</td>
<td></td>
</tr>
<tr>
<td>“DMA User Status (DMA_USER_STATUS) Register” on page 456</td>
<td>0x3908</td>
<td></td>
</tr>
</tbody>
</table>
A.2 SPR Register Descriptions

Registers are described in ascending address order.

**Static Network Control Register (SNCTL)**

This register controls execution of the static network processor and fabric.

**Speed**

Slow

**Minimum Protection Level**

SN_ACCESS

---

**Figure 7. SNCTL Register Diagram**

---

**Table A-13. SNCTL Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:2</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>FRZPROC</td>
<td>1</td>
<td>For TILE64, this bit freezes the static network processor.</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>For TILEPro, this is reserved.</td>
</tr>
<tr>
<td>0</td>
<td>FRZFABRIC</td>
<td>1</td>
<td>Freeze the static network fabric.</td>
</tr>
</tbody>
</table>

---
Static Network Fifo Data (SNFIFO_DATA)

Accesses the data FIFO specified by SNFIFO_SEL. When read, returns the top entry on the specified FIFO and removes it from the FIFO. When written, it writes the specified data into the FIFO.

**Speed**

Slow

**Minimum Protection Level**

SN_ACCESS

---

Figure 8. SNFIFO_DATA Register Diagram
Appendix A Special Purpose Registers

Static Network FIFO Select Register (SNFIFO_SEL)

This register specifies which FIFO will be read and written by the SNFIFO_DATA register.

Speed

Slow

Minimum Protection Level

SN_ACCESS

Figure 9. SNFIFO_SEL Register Diagram

Table A-14. SNFIFO_SEL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:3</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>2:0</td>
<td>SNFIFO_SEL</td>
<td>0</td>
<td>This bitfield specifies which FIFO will be read and written by the SNFIFO_DATA register. FIFOs are as follows:</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>0: North Input FIFO</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>1: East Input FIFO</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>2: South Input FIFO</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>3: West Input FIFO</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>4: Processor Input FIFO</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>5: Processor Output FIFO</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>6 and 7: Undefined</td>
</tr>
</tbody>
</table>
Static Network Input State Register (SNISTATE)

This register specifies the number of entries in the static network’s input FIFOs.

**Speed**

Slow

**Minimum Protection Level**

\textit{SN\_ACCESS}

![Figure 10. SNISTATE Register Diagram](image)

**Table A-15. SNISTATE Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>19:16</td>
<td>M</td>
<td>0</td>
<td>Main Processor Input FIFO entry count. TILE64 implements the bitfield 18:16; writes to bit 19 are ignored, and this bit is read as 0.</td>
</tr>
<tr>
<td>15:12</td>
<td>W</td>
<td>0</td>
<td>West Input FIFO entry count. TILE64 implements the bitfield 13:12; writes to bits 15:14 are ignored, and these bits are read as 0.</td>
</tr>
<tr>
<td>11:8</td>
<td>S</td>
<td>0</td>
<td>South Input FIFO entry count. TILE64 implements the bitfield 9:8; writes to bits 11:10 are ignored, and these bits are read as 0.</td>
</tr>
<tr>
<td>7:4</td>
<td>E</td>
<td>0</td>
<td>East Input FIFO entry count. TILE64 implements the bitfield 5:4; writes to bits 7:6 are ignored, and these bits are read as 0.</td>
</tr>
<tr>
<td>3:0</td>
<td>N</td>
<td>0</td>
<td>North Input FIFO entry count. TILE64 implements the bitfield 1:0; writes to bits 3:2 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
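
The SNISTATE field layout lends itself to a simple extraction helper, because every port's count sits in its own 4-bit field. The sketch below assumes the register value has already been read into a 32-bit variable; the `sn_port` enumeration and the function name are hypothetical and are chosen only so that each port's value multiplied by 4 gives the low bit of its field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port identifiers, ordered to match the field order in SNISTATE:
 * N at bits 3:0, E at 7:4, S at 11:8, W at 15:12, M at 19:16. */
typedef enum {
    SN_PORT_NORTH = 0,
    SN_PORT_EAST  = 1,
    SN_PORT_SOUTH = 2,
    SN_PORT_WEST  = 3,
    SN_PORT_MAIN  = 4
} sn_port;

/* Return the input FIFO entry count for one port from an SNISTATE value. */
uint32_t snistate_entry_count(uint32_t snistate, sn_port port)
{
    assert((unsigned)port <= (unsigned)SN_PORT_MAIN);
    return (snistate >> (4u * (uint32_t)port)) & 0xFu;
}
```

On TILE64 the unimplemented upper bits of each field read as 0, so the same 4-bit mask works for both TILE64 and TILEPro.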
Static Network Output State Register (SNOSTATE)

This register specifies the number of credits available to the static network’s output FIFOs on the compass points as well as the number of entries present in the output FIFO going from the static network to the processor.

**Speed**

Slow

**Minimum Protection Level**

SN_ACCESS

![Figure 11. SNOSTATE Register Diagram](image)

**Table A-16. SNOSTATE Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>Reserved</td>
<td>0x0</td>
<td>Reserved</td>
</tr>
<tr>
<td>19:16</td>
<td>M</td>
<td>0</td>
<td>Main Processor Output FIFO entry count. TILE64 implements the bitfield 18:16; writes to bit 19 are ignored, and this bit is read as 0.</td>
</tr>
<tr>
<td>15:12</td>
<td>W</td>
<td>0</td>
<td>West Output FIFO credit count. TILE64 implements the bitfield 13:12; writes to bits 15:14 are ignored, and these bits are read as 0.</td>
</tr>
<tr>
<td>11:8</td>
<td>S</td>
<td>0</td>
<td>South Output FIFO credit count. TILE64 implements the bitfield 9:8; writes to bits 11:10 are ignored, and these bits are read as 0.</td>
</tr>
<tr>
<td>7:4</td>
<td>E</td>
<td>0</td>
<td>East Output FIFO credit count. TILE64 implements the bitfield 5:4; writes to bits 7:6 are ignored, and these bits are read as 0.</td>
</tr>
<tr>
<td>3:0</td>
<td>N</td>
<td>0</td>
<td>North Output FIFO credit count. TILE64 implements the bitfield 1:0; writes to bits 3:2 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
Static Network Static Route (SNSTATIC)

This register specifies the static input route to a given output port.

**Speed**

Slow

**Minimum Protection Level**

SN_ACCESS

![SNSTATIC Register Diagram](image)

**Figure A-1: SNSTATIC Register Diagram**

Table A-17. SNSTATIC Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:15</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>14:12</td>
<td>M</td>
<td>0</td>
<td>Main Processor static input route.</td>
</tr>
<tr>
<td>11:9</td>
<td>W</td>
<td>0</td>
<td>West static input route.</td>
</tr>
<tr>
<td>8:6</td>
<td>S</td>
<td>0</td>
<td>South static input route.</td>
</tr>
<tr>
<td>5:3</td>
<td>E</td>
<td>0</td>
<td>East static input route.</td>
</tr>
<tr>
<td>2:0</td>
<td>N</td>
<td>0</td>
<td>North static input route.</td>
</tr>
</tbody>
</table>

As shown in Table A-18, each field contains a number that specifies which input port will route to that output port:

**Table A-18. Port Designations**

<table>
<thead>
<tr>
<th>Number</th>
<th>Input Port</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>None</td>
</tr>
<tr>
<td>1</td>
<td>North</td>
</tr>
<tr>
<td>2</td>
<td>East</td>
</tr>
<tr>
<td>3</td>
<td>South</td>
</tr>
<tr>
<td>4</td>
<td>West</td>
</tr>
<tr>
<td>5</td>
<td>Main Processor</td>
</tr>
</tbody>
</table>
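
Combining the SNSTATIC bit fields with the port designations in Table A-18, a route word can be built one output port at a time. The following is a minimal sketch: the enumeration names and the `snstatic_set_route` helper are hypothetical, and writing the resulting value to the SPR is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical output port identifiers, ordered to match the SNSTATIC fields:
 * N at bits 2:0, E at 5:3, S at 8:6, W at 11:9, M at 14:12. */
typedef enum {
    SN_OUT_NORTH = 0,
    SN_OUT_EAST  = 1,
    SN_OUT_SOUTH = 2,
    SN_OUT_WEST  = 3,
    SN_OUT_MAIN  = 4
} sn_output;

/* Input port designations from Table A-18. */
typedef enum {
    SN_IN_NONE  = 0,
    SN_IN_NORTH = 1,
    SN_IN_EAST  = 2,
    SN_IN_SOUTH = 3,
    SN_IN_WEST  = 4,
    SN_IN_MAIN  = 5
} sn_input;

/* Return `snstatic` with the route for output port `out` replaced by `in`,
 * leaving the other output ports' routes unchanged. */
uint32_t snstatic_set_route(uint32_t snstatic, sn_output out, sn_input in)
{
    assert((unsigned)out <= (unsigned)SN_OUT_MAIN);
    assert((unsigned)in <= (unsigned)SN_IN_MAIN);

    uint32_t shift = 3u * (uint32_t)out;
    return (snstatic & ~(0x7u << shift)) | ((uint32_t)in << shift);
}
```

For example, routing the West input to the East output starting from an all-zero register gives 0x20, because designation 4 is placed in bits 5:3.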

Static Network Data Available (SN_DATA_AVAIL)

This register contains a bit field that indicates that data is available on the static network.

**Speed**

Fast

**Minimum Protection Level**

SN_ACCESS

![Figure A-2: SN_DATA_AVAIL Register Diagram](image)

Table A-19. SN_DATA_AVAIL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:1</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>0</td>
<td>AVAIL</td>
<td>0</td>
<td>Data is available to be read on the static network.</td>
</tr>
</tbody>
</table>
Static Network Control (SN_STATIC_CTL)

This register controls execution of the static network processor and fabric.
NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed

Slow

Minimum Protection Level

SN_STATIC_ACCESS

![Figure 2. SN_STATIC_CTL Register Diagram]

Table A-20. SN_STATIC_CTL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:1</td>
<td>Reserved</td>
<td>0x0</td>
<td>Reserved</td>
</tr>
<tr>
<td>0</td>
<td>FRZFABRIC</td>
<td>1</td>
<td>Added in TILEPro: Freeze the static network fabric.</td>
</tr>
</tbody>
</table>
Static Network FIFO Data (SN_STATIC_FIFO_DATA)

Accesses the data FIFO specified by SN_STATIC_FIFO_SEL. When read, returns the top entry on the specified FIFO and removes it from the FIFO. When written, it writes the specified data into the FIFO.

NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed
Slow

Minimum Protection Level
SN_STATIC_ACCESS

![Figure 3. SN_STATIC_FIFO_DATA Register Diagram]

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>SN_STATIC_FIFO_DATA</td>
<td>0</td>
<td>Added in TILEPro: Accesses the data FIFO specified by SN_STATIC_FIFO_SEL. When read, returns the top entry on the specified FIFO and removes it from the FIFO. When written, it writes the specified data into the FIFO.</td>
</tr>
</tbody>
</table>
Static Network FIFO Select (SN_STATIC_FIFO_SEL)

This SPR specifies which FIFO will be read and written by the SN_STATIC_FIFO_DATA register.

NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed
Slow

Minimum Protection Level

SN_STATIC_ACCESS

Table A-22. SN_STATIC_FIFO_SEL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:3</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2:0</td>
<td>SN_STATIC_FIFO_SEL</td>
<td>0</td>
<td>Added in TILEPro: This bitfield specifies which FIFO will be read and written by the SN_STATIC_FIFO_DATA register.</td>
</tr>
</tbody>
</table>
Static Network Input State (SN_STATIC_ISTATE)

This register specifies the number of entries in the static network’s Input FIFOs.

NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed

Slow

Minimum Protection Level

SN_STATIC_ACCESS

![Figure 5. SN_STATIC_ISTATE Register Diagram](image_url)

Table A-23. SN_STATIC_ISTATE Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>19:16</td>
<td>M</td>
<td>0</td>
<td>Added in TILEPro: Main Processor Input FIFO entry count.</td>
</tr>
<tr>
<td>15:12</td>
<td>W</td>
<td>0</td>
<td>Added in TILEPro: West Input FIFO entry count.</td>
</tr>
<tr>
<td>11:8</td>
<td>S</td>
<td>0</td>
<td>Added in TILEPro: South Input FIFO entry count.</td>
</tr>
<tr>
<td>7:4</td>
<td>E</td>
<td>0</td>
<td>Added in TILEPro: East Input FIFO entry count.</td>
</tr>
<tr>
<td>3:0</td>
<td>N</td>
<td>0</td>
<td>Added in TILEPro: North Input FIFO entry count.</td>
</tr>
</tbody>
</table>
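
Because every count in Table A-23 is a 4-bit field at a multiple-of-four offset, one shift-and-mask expression decodes all five. The sketch below is illustrative C with hypothetical names; it operates on a value that has already been read from the SPR.

```c
#include <stdint.h>

/* Index of each 4-bit count field in SN_STATIC_ISTATE (Table A-23). */
enum sn_fifo {
    SN_FIFO_N = 0, SN_FIFO_E = 1, SN_FIFO_S = 2, SN_FIFO_W = 3, SN_FIFO_M = 4
};

/* Entry count of one input FIFO: field f occupies bits 4*f+3 : 4*f. */
static inline unsigned sn_istate_count(uint32_t istate, enum sn_fifo f)
{
    return (istate >> (4u * (unsigned)f)) & 0xFu;
}

/* Sum of the entry counts over all five input FIFOs. */
static inline unsigned sn_istate_total(uint32_t istate)
{
    unsigned total = 0;
    for (unsigned f = 0; f < 5; f++)
        total += (istate >> (4u * f)) & 0xFu;
    return total;
}
```

A raw value of 0x00021403 decodes to N = 3, E = 0, S = 4, W = 1, and M = 2, for a total of 10 entries.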
Static Network Output State (SN_STATIC_OSTATE)

This register specifies the number of credits available to the static network’s output FIFOs on the compass points as well as the number of entries present in the output FIFO going from the static network to the processor.

NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed

Slow

Minimum Protection Level

SN_STATIC_ACCESS

![SN_STATIC_OSTATE Register Diagram](image)

Figure 6. SN_STATIC_OSTATE Register Diagram

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>19:16</td>
<td>M</td>
<td>0</td>
<td>Added in TILEPro: Main Processor Output FIFO entry count.</td>
</tr>
<tr>
<td>15:12</td>
<td>W</td>
<td>0</td>
<td>Added in TILEPro: West Output FIFO credit count.</td>
</tr>
<tr>
<td>11:8</td>
<td>S</td>
<td>0</td>
<td>Added in TILEPro: South Output FIFO credit count.</td>
</tr>
<tr>
<td>7:4</td>
<td>E</td>
<td>0</td>
<td>Added in TILEPro: East Output FIFO credit count.</td>
</tr>
<tr>
<td>3:0</td>
<td>N</td>
<td>0</td>
<td>Added in TILEPro: North Output FIFO credit count.</td>
</tr>
</tbody>
</table>
Static Network Static Route (SN_STATIC_STATIC)

This register specifies the static input route to a given output port.

NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed
Slow

Minimum Protection Level
SN_STATIC_ACCESS

![Figure 7. SN_STATIC_STATIC Register Diagram]

Table A-25. SN_STATIC_STATIC Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:15</td>
<td>Reserved</td>
<td>0x0000000000000000</td>
<td>Reserved</td>
</tr>
<tr>
<td>14:12</td>
<td>M</td>
<td>0</td>
<td>Added in TILEPro: Main Processor static input route.</td>
</tr>
<tr>
<td>11:9</td>
<td>W</td>
<td>0</td>
<td>Added in TILEPro: West static input route.</td>
</tr>
<tr>
<td>8:6</td>
<td>S</td>
<td>0</td>
<td>Added in TILEPro: South static input route.</td>
</tr>
<tr>
<td>5:3</td>
<td>E</td>
<td>0</td>
<td>Added in TILEPro: East static input route.</td>
</tr>
<tr>
<td>2:0</td>
<td>N</td>
<td>0</td>
<td>Added in TILEPro: North static input route.</td>
</tr>
</tbody>
</table>
Static Network Data Available (SN_STATIC_DATA_AVAIL)

This register contains a bit field that indicates that data is available on the static network.

NOTE: This SPR is reserved for TILE64 and is not reserved for TILEPro.

Speed

Fast

Minimum Protection Level

SN_STATIC_ACCESS

Table A-26. SN_STATIC_DATA_AVAIL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:1</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>0</td>
<td>AVAIL</td>
<td>0</td>
<td>Added in TILEPro: This bit indicates data is available to be read on the static network.</td>
</tr>
</tbody>
</table>
User Dynamic Network Catch-all Demultiplexor Count Register (UDN_DEMUX_CA_COUNT)

This register contains the number of words that have been received for the Catch-all Queue of the User Dynamic Network.

**Speed**
Slow

**Minimum Protection Level**
UDN_ACCESS

```
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
UDN_DEMUX_CA_COUNT
```

*Figure 9. UDN_DEMUX_CA_COUNT Register Diagram*

**Table A-27. UDN_DEMUX_CA_COUNT Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>UDN_DEMUX_CA_COUNT</td>
<td>0</td>
<td>Number of two-word slices the UDN is allowed to consume in the demux buffer. If the sum of IDN and UDN thresholds exceeds 56, the IDN and UDN networks can compete for buffer entries and the refill/context swap flows must account for concurrent activity on the other network. TILE64 implements the bitfield 6:0; writes to bits 31:7 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
User Dynamic Network Demultiplexor Count 0 Register (UDN_DEMUX_COUNT_0)

This register contains the number of words that have been received for channel 0 of the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

![UDN_DEMUX_COUNT_0 Register Diagram](image)

**Table A-28. UDN_DEMUX_COUNT_0 Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>UDN_DEMUX_COUNT_0</td>
<td>0</td>
<td>Count. Implements the bitfield 6:0; writes to bits 31:7 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
User Dynamic Network Demultiplexor Count 1 Register (UDN_DEMUX_COUNT_1)

This register contains the number of words that have been received for channel 1 of the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

![UDN_DEMUX_COUNT_1 Register Diagram](image)

**Figure 11. UDN_DEMUX_COUNT_1 Register Diagram**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>UDN_DEMUX_COUNT_1</td>
<td>0</td>
<td>Count. Implements the bitfield 6:0; writes to bits 31:7 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
User Dynamic Network Demultiplexor Count 2 Register (UDN_DEMUX_COUNT_2)

This register contains the number of words that have been received for channel 2 of the User Dynamic Network.

Speed
Slow

Minimum Protection Level
UDN_ACCESS

Table A-30. UDN_DEMUX_COUNT_2 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>UDN_DEMUX_COUNT_2</td>
<td>0</td>
<td>Count. Implements the bitfield 6:0; writes to bits 31:7 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
User Dynamic Network Demultiplexor Count 3 Register (UDN_DEMUX_COUNT_3)

This register contains the number of words that have been received for channel 3 of the User Dynamic Network.

Speed
Slow

Minimum Protection Level
UDN_ACCESS

Table A-31. UDN_DEMUX_COUNT_3 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>UDN_DEMUX_COUNT_3</td>
<td>0</td>
<td>Count. Implements the bitfield 6:0; writes to bits 31:7 are ignored, and these bits are read as 0.</td>
</tr>
</tbody>
</table>
UDN Demux Control Register (UDN_DEMUX_CTL)

When written, demux state is cleared. Used after state extraction and during state restore.

Speed
Slow

Minimum Protection Level
UDN_ACCESS

User Dynamic Network Demux Current Tag (UDN_DEMUX_CURR_TAG)

This register contains the tag of the current packet being dequeued. This register is valid only when the CURR_REM field is not 0 in the UDN_DEMUX_STATUS register.

Speed
Slow

Minimum Protection Level
UDN_ACCESS

UDN Demux Queue Select Register (UDN_DEMUX_QUEUE_SEL)

Selects demux queue to be written on UDN_DEMUX_WRITE_QUEUE.

Speed
Slow

Minimum Protection Level
UDN_ACCESS

Figure 14. UDN_DEMUX_QUEUE_SEL Register Diagram

Table A-32. UDN_DEMUX_QUEUE_SEL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:2</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>1:0</td>
<td>UDN_DEMUX_QUEUE_SEL</td>
<td>0</td>
<td>Selects demux queue to be written on UDN_DEMUX_WRITE_QUEUE.</td>
</tr>
</tbody>
</table>
User Dynamic Network Demux FIFO (UDN_DEMUX_WRITE_FIFO)

When this register is written to, one word of data is pushed into the demux FIFO. When this register is read, one word is read from the FIFO.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

User Dynamic Network Demux State (UDN_DEMUX_STATUS)

This register enables access to the demux logic state for context swapping, deadlock recovery information, and tag changes.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

![Figure A-1: UDN_DEMUX_STATUS Register Diagram](image)

**Table A-33. UDN_DEMUX_STATUS Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:12</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>11:10</td>
<td>RCV_FIFO_CNT</td>
<td></td>
<td>Number of entries in the receive FIFO.</td>
</tr>
<tr>
<td>9</td>
<td>SPACE_AVAIL</td>
<td></td>
<td>Space is available in the demux framing logic for at least one word.</td>
</tr>
<tr>
<td>8</td>
<td>WAIT_TAG</td>
<td></td>
<td>Currently waiting for tag word. State save/restore should ignore the current tag.</td>
</tr>
<tr>
<td>7:0</td>
<td>CURR_REM</td>
<td></td>
<td>Number of words remaining in the packet currently being dequeued. When 0, no packet is in flight.</td>
</tr>
</tbody>
</table>
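
The status fields in Table A-33 can be unpacked with a few shifts and masks. The following C sketch is an illustration only: the struct and function names are hypothetical, and the raw value is assumed to have been read from UDN_DEMUX_STATUS by some platform-specific means not shown here. The second helper encodes the rule that UDN_DEMUX_CURR_TAG is valid only when CURR_REM is not 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of UDN_DEMUX_STATUS (Table A-33). Illustrative struct. */
struct udn_demux_status {
    unsigned rcv_fifo_cnt; /* bits 11:10 */
    bool     space_avail;  /* bit 9 */
    bool     wait_tag;     /* bit 8 */
    unsigned curr_rem;     /* bits 7:0 */
};

static inline struct udn_demux_status udn_demux_status_decode(uint32_t raw)
{
    struct udn_demux_status s;
    s.rcv_fifo_cnt = (raw >> 10) & 0x3u;
    s.space_avail  = ((raw >> 9) & 1u) != 0;
    s.wait_tag     = ((raw >> 8) & 1u) != 0;
    s.curr_rem     = raw & 0xFFu;
    return s;
}

/* UDN_DEMUX_CURR_TAG is meaningful only while CURR_REM is nonzero. */
static inline bool udn_demux_curr_tag_valid(uint32_t raw)
{
    return (raw & 0xFFu) != 0;
}
```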
User Dynamic Network Demux Write Queue (UDN_DEMUX_WRITE_QUEUE)

When this register is written to, one word of data is pushed into the demux queue selected by UDN_DEMUX_QUEUE_SEL. This is used to push data into queues that are in refill mode.

**Speed**
Slow

**Minimum Protection Level**
UDN_ACCESS

User Dynamic Network Words Pending (UDN_PENDING)

This register contains the number of words remaining in the packet being sent into the network from the main processor.

**Speed**
Slow

**Minimum Protection Level**
UDN_ACCESS

![Figure A-2: UDN_PENDING Register Diagram](image)

**Table A-34. UDN_PENDING Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:8</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>7:0</td>
<td>UDN_PENDING</td>
<td></td>
<td>The number of words remaining in the packet being sent into the network from the main processor.</td>
</tr>
</tbody>
</table>
User Dynamic Network FIFO Data (UDN_SP_FIFO_DATA)

This register provides access to the data FIFO specified by UDN_SP_FIFO_SEL. When this register is read, it returns the top entry on the specified FIFO and removes the entry from the FIFO. When this register is written to, it writes the specified data into the FIFO.

**Speed**
Slow

**Minimum Protection Level**
UDN_ACCESS

User Dynamic Network FIFO Select (UDN_SP_FIFO_SEL)

This register specifies which port’s data and state will be read and written by the UDN_SP_FIFO_DATA and UDN_SP_STATE registers. When set to 4(d), main processor FIFO may be restored by writing to the network register. Data may be lost if left set to 4(d) and switch point is frozen and too much data is written to egress FIFO.

**Speed**
Slow

**Minimum Protection Level**
UDN_ACCESS

![Figure A-3: UDN_SP_FIFO_SEL Register Diagram](image)

Table A-35. UDN_SP_FIFO_SEL Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:3</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2:0</td>
<td>UDN_SP_FIFO_SEL</td>
<td>0</td>
<td>Specifies which port’s data and state will be read and written by the UDN_SP_FIFO_DATA and UDN_SP_STATE registers. When set to 4(d), processor FIFO may be restored by writing to the network register. Data may be lost if left set to 4(d) and switch point is frozen and too much data is written to egress FIFO. The encodings are: 0 North, 1 South, 2 East, 3 West, 4 Core.</td>
</tr>
</tbody>
</table>
User Dynamic Network Freeze (UDN_SP_FREEZE)

This register freezes the network in preparation for context swap.

**Speed**
Slow

**Minimum Protection Level**
UDN_ACCESS

![UDN_SP_FREEZE Register Diagram](image)

**Figure A-4: UDN_SP_FREEZE Register Diagram**

**Table A-36. UDN_SP_FREEZE Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:3</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>NON_DEST_EXT</td>
<td>0</td>
<td>When asserted, the tile will return credit to neighbors when data is extracted from FIFOs via SPR reads. This is used for extracting data in the protection-violation case.</td>
</tr>
<tr>
<td>1</td>
<td>DEMUX.FRZ</td>
<td>0</td>
<td>Freeze demux.</td>
</tr>
<tr>
<td>0</td>
<td>SP.FRZ</td>
<td>0</td>
<td>Freeze Switchpoint.</td>
</tr>
</tbody>
</table>
User Dynamic Network Port State (UDN_SP_STATE)

This register accesses the switch point state for the port specified by UDN_SP_FIFO_SEL. When read, returns the associated port state. When written, it writes the specified data into the FIFO.

Speed
Slow

Minimum Protection Level
UDN_ACCESS

Table A-37. UDN_SP_STATE Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>Reserved</td>
<td>0x0</td>
<td></td>
</tr>
<tr>
<td>19:18</td>
<td>OP_CREDIT</td>
<td></td>
<td>Number of credits at the output port available to send packet words to neighbor.</td>
</tr>
<tr>
<td>17</td>
<td>OP_LOCKED</td>
<td></td>
<td>Output port is currently locked on a given input port (mid-packet).</td>
</tr>
<tr>
<td>16:13</td>
<td>OP_MUX_SEL</td>
<td></td>
<td>Input port being selected for output port. Bit[0] is the default route (South input port for North output port, for example). The remaining bits walk around the compass clockwise starting from the default route, skipping the output port. The core port is between South and West for this algorithm. For the North output port, the bits are: 3 East, 2 West, 1 Core, 0 South.</td>
</tr>
<tr>
<td>12</td>
<td>IP_SOP</td>
<td></td>
<td>The next word to be dequeued is the route header for a new packet.</td>
</tr>
<tr>
<td>11</td>
<td>IP_EOP</td>
<td></td>
<td>Next word to be dequeued is the last word in a packet.</td>
</tr>
<tr>
<td>10:4</td>
<td>IP_WORDS_REM</td>
<td></td>
<td>The number of words remaining in packet being dequeued from input port. When 0 and IP_SOP = 1, no packet is being dequeued. When 0 and IP_SOP is 0, there are 128 words remaining in the packet.</td>
</tr>
<tr>
<td>3:0</td>
<td>FCNT</td>
<td></td>
<td>The number of valid entries in the associated FIFO.</td>
</tr>
</tbody>
</table>
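
The IP_WORDS_REM field is only 7 bits wide, so the value 0 is overloaded and must be interpreted together with IP_SOP. The C sketch below illustrates that rule along with plain extraction of the other fields; all function names are hypothetical, and the argument is a value already read from UDN_SP_STATE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for UDN_SP_STATE (Table A-37). */
static inline unsigned udn_sp_op_credit(uint32_t r)  { return (r >> 18) & 0x3u; }
static inline bool     udn_sp_op_locked(uint32_t r)  { return ((r >> 17) & 1u) != 0; }
static inline unsigned udn_sp_op_mux_sel(uint32_t r) { return (r >> 13) & 0xFu; }
static inline bool     udn_sp_ip_sop(uint32_t r)     { return ((r >> 12) & 1u) != 0; }
static inline bool     udn_sp_ip_eop(uint32_t r)     { return ((r >> 11) & 1u) != 0; }
static inline unsigned udn_sp_fcnt(uint32_t r)       { return r & 0xFu; }

/*
 * Words remaining in the packet being dequeued from the input port.
 *   IP_WORDS_REM == 0 and IP_SOP == 1: no packet is being dequeued (0)
 *   IP_WORDS_REM == 0 and IP_SOP == 0: 128 words remain
 *   otherwise: the field value itself
 */
static inline unsigned udn_sp_words_remaining(uint32_t r)
{
    unsigned rem = (r >> 4) & 0x7Fu;
    if (rem == 0)
        return udn_sp_ip_sop(r) ? 0u : 128u;
    return rem;
}
```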

### User Dynamic Network Tag 0 (UDN_TAG_0)

This register contains the tag for channel 0 of the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

### User Dynamic Network Tag 1 (UDN_TAG_1)

This register contains the tag for channel 1 of the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

### User Dynamic Network Tag 2 (UDN_TAG_2)

This register contains the tag for channel 2 of the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS
User Dynamic Network Tag 3 (UDN_TAG_3)

This register contains the tag for channel 3 of the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

User Dynamic Network Tag Valid (UDN_TAG_VALID)

This register specifies which tags are valid for the User Dynamic Network.

**Speed**

Slow

**Minimum Protection Level**

UDN_ACCESS

![Figure A-6: UDN_TAG_VALID Register Diagram](image)

Table A-38. UDN_TAG_VALID Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:12</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>11:8</td>
<td>RF</td>
<td>0</td>
<td>Refill Mode</td>
</tr>
<tr>
<td>7:4</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>3:0</td>
<td>VLD</td>
<td>0</td>
<td>Tag Valid</td>
</tr>
</tbody>
</table>
User Dynamic Network Tile Coordinates (UDN_TILE_COORD)

This register contains the tile coordinates for the User Dynamic Network.

Speed

Slow

Minimum Protection Level

UDN_ACCESS

Table A-39. UDN_TILE_COORD Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:30</td>
<td>EDGE</td>
<td>0</td>
<td>Edge.</td>
</tr>
<tr>
<td>29</td>
<td>Reserved</td>
<td></td>
<td>Reserved.</td>
</tr>
<tr>
<td>28:18</td>
<td>XLOC</td>
<td>1</td>
<td>X location.</td>
</tr>
<tr>
<td>17:7</td>
<td>YLOC</td>
<td>1</td>
<td>Y location.</td>
</tr>
<tr>
<td>6:1</td>
<td>Reserved</td>
<td></td>
<td>Reserved.</td>
</tr>
<tr>
<td>0</td>
<td>ROUTE_ORDER</td>
<td>0</td>
<td>For TILEPro: When 0, packets are routed in the X dimension first, followed by the Y dimension. When 1, the Y dimension is routed first.</td>
</tr>
</tbody>
</table>

Figure A-7: UDN_TILE_COORD Register Diagram
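
The coordinate fields in Table A-39 sit at unaligned offsets, so packing and unpacking them is easy to get wrong by hand. The following is a minimal C sketch with hypothetical macro and function names; it only manipulates a 32-bit value and does not access the SPR.

```c
#include <stdint.h>

/* Field positions from Table A-39. Names are illustrative only. */
#define UDN_COORD_EDGE_SHIFT 30
#define UDN_COORD_XLOC_SHIFT 18
#define UDN_COORD_YLOC_SHIFT 7
#define UDN_COORD_LOC_MASK   0x7FFu /* XLOC and YLOC are 11 bits each */

static inline uint32_t udn_tile_coord_pack(unsigned edge, unsigned x,
                                           unsigned y, unsigned y_first)
{
    return ((uint32_t)(edge & 0x3u) << UDN_COORD_EDGE_SHIFT) |
           ((uint32_t)(x & UDN_COORD_LOC_MASK) << UDN_COORD_XLOC_SHIFT) |
           ((uint32_t)(y & UDN_COORD_LOC_MASK) << UDN_COORD_YLOC_SHIFT) |
           (uint32_t)(y_first & 1u); /* ROUTE_ORDER, bit 0 */
}

static inline unsigned udn_tile_coord_edge(uint32_t r)
{
    return (r >> UDN_COORD_EDGE_SHIFT) & 0x3u;
}

static inline unsigned udn_tile_coord_x(uint32_t r)
{
    return (r >> UDN_COORD_XLOC_SHIFT) & UDN_COORD_LOC_MASK;
}

static inline unsigned udn_tile_coord_y(uint32_t r)
{
    return (r >> UDN_COORD_YLOC_SHIFT) & UDN_COORD_LOC_MASK;
}

static inline unsigned udn_tile_coord_route_order(uint32_t r)
{
    return r & 1u;
}
```

With the reset values listed in the table (XLOC = 1, YLOC = 1, all other fields 0), the packed register value is 0x00040080.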
User Dynamic Network Catch-All Data (UDN_CA_DATA)

This register contains the next word to be read from the message at the head of the Catch-all Queue. Reading this register dequeues from the Catch-all Queue.

**Speed**

Fast

**Minimum Protection Level**

UDN_ACCESS

User Dynamic Network Catch-all Remaining Words (UDN_CA_REM)

This register contains the number of words remaining to be read from the message at the head of the Catch-all Queue.

**Speed**

Fast

**Minimum Protection Level**

UDN_ACCESS

![Figure A-8: UDN_CA_REM Register Diagram](image)

**Table A-40. UDN_CA_REM Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:7</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6:0</td>
<td>UDN_CA_REM</td>
<td>0</td>
<td>This register contains the number of words remaining to be read from the message at the head of the Catch-all Queue. When no message is in the Catch-all Queue, this field is 0.</td>
</tr>
</tbody>
</table>
User Dynamic Network Catch-all Tag (UDN_CA_TAG)

This register contains the tag of the message at the head of the Catch-all Queue.

**Speed**
Fast

**Minimum Protection Level**
UDN_ACCESS

User Dynamic Network Data Available (UDN_DATA_AVAIL)

This register contains bit fields that indicate that data is available on particular User Dynamic Network demultiplexor ports.

**Speed**
Fast

**Minimum Protection Level**
UDN_ACCESS

![UDN_DATA_AVAIL Register Diagram](image)

**Figure A-9: UDN_DATA_AVAIL Register Diagram**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:5</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>AVAIL_CA</td>
<td>0</td>
<td>Data is available to be read on UDN catch-all queue.</td>
</tr>
<tr>
<td>3</td>
<td>AVAIL_3</td>
<td>0</td>
<td>Data is available to be read on UDN demultiplexor port 3.</td>
</tr>
<tr>
<td>2</td>
<td>AVAIL_2</td>
<td>0</td>
<td>Data is available to be read on UDN demultiplexor port 2.</td>
</tr>
<tr>
<td>1</td>
<td>AVAIL_1</td>
<td>0</td>
<td>Data is available to be read on UDN demultiplexor port 1.</td>
</tr>
<tr>
<td>0</td>
<td>AVAIL_0</td>
<td>0</td>
<td>Data is available to be read on UDN demultiplexor port 0.</td>
</tr>
</tbody>
</table>
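
A typical use of these availability bits is to pick a queue to service. The C sketch below shows one possible policy (lowest-numbered demultiplexor port first); the policy and the function names are assumptions for illustration, not something this manual prescribes.

```c
#include <stdint.h>

#define UDN_AVAIL_CA (UINT32_C(1) << 4) /* AVAIL_CA: catch-all queue */

/* Lowest-numbered demux port (0..3) with data available, or -1 if none. */
static inline int udn_first_ready_port(uint32_t data_avail)
{
    for (int port = 0; port < 4; port++) {
        if (data_avail & (UINT32_C(1) << port))
            return port;
    }
    return -1;
}

/* Nonzero if the catch-all queue has data available. */
static inline int udn_catch_all_ready(uint32_t data_avail)
{
    return (data_avail & UDN_AVAIL_CA) != 0;
}
```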
User Dynamic Network Refill Available Enable (UDN_REFILL_EN)

This register controls whether or not a particular UDN input port signals the UDN refill interrupt when data is available.

**Speed**

Slow

**Minimum Protection Level**

UDN_REFILL

![Figure A-10: UDN_REFILL_EN Register Diagram](image)

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:4</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>EN_3</td>
<td>0</td>
<td>Enable UDN 3 Refill Interrupt</td>
</tr>
<tr>
<td>2</td>
<td>EN_2</td>
<td>0</td>
<td>Enable UDN 2 Refill Interrupt</td>
</tr>
<tr>
<td>1</td>
<td>EN_1</td>
<td>0</td>
<td>Enable UDN 1 Refill Interrupt</td>
</tr>
<tr>
<td>0</td>
<td>EN_0</td>
<td>0</td>
<td>Enable UDN 0 Refill Interrupt</td>
</tr>
</tbody>
</table>
User Dynamic Network Remaining (UDN_REMAINING)

This register controls how many words remain to be written until the UDN complete interrupt is signaled.

Speed
Slow

Minimum Protection Level
UDN_COMPLETE

Table A-43. UDN_REMAINING Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:8</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>7:0</td>
<td>WORDS</td>
<td>0</td>
<td>Number of words left to be written</td>
</tr>
</tbody>
</table>
User Dynamic Network Available Enables (UDN_AVAIL_EN)

This register controls whether or not a particular UDN input port signals the UDN available interrupt when data is available.

Speed
Slow

Minimum Protection Level
UDN_AVAIL

![UDN_AVAIL_EN Register Diagram](image)

**Figure A-12: UDN_AVAIL_EN Register Diagram**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:4</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>EN_3</td>
<td></td>
<td>Enable UDN 3 Available Interrupt</td>
</tr>
<tr>
<td>2</td>
<td>EN_2</td>
<td></td>
<td>Enable UDN 2 Available Interrupt</td>
</tr>
<tr>
<td>1</td>
<td>EN_1</td>
<td></td>
<td>Enable UDN 1 Available Interrupt</td>
</tr>
<tr>
<td>0</td>
<td>EN_0</td>
<td></td>
<td>Enable UDN 0 Available Interrupt</td>
</tr>
</tbody>
</table>
User Dynamic Network Deadlock Counter (UDN_DEADLOCK_COUNT)

This register is used to save/restore current state of deadlock down-counter.

**Speed**

Slow

**Minimum Protection Level**

UDN_TIMER

![Figure A-13: UDN_DEADLOCK_COUNT Register Diagram](image_url)

Table A-45. UDN_DEADLOCK_COUNT Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:16</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>15:0</td>
<td>UDN_DEADLOCK_COUNT</td>
<td>0</td>
<td>UDN deadlock count</td>
</tr>
</tbody>
</table>
User Dynamic Network Deadlock Timeout (UDN_DEADLOCK_TIMEOUT)

This register provides the number of 16-cycle intervals to wait before asserting the deadlock interrupt when data is stalled in the demux logic’s dequeueing buffer.

**Speed**
- Slow

**Minimum Protection Level**
- UDN_TIMER

![Figure A-14](udn_deadlock_timeout_register_diagram)

**Table A-46. UDN_DEADLOCK_TIMEOUT Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:16</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>15:0</td>
<td>UDN_DEADLOCK_TIMEOUT</td>
<td>0</td>
<td>UDN Deadlock Timeout</td>
</tr>
</tbody>
</table>
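
Since the timeout is expressed in 16-cycle intervals and the field is 16 bits wide, the longest representable wait is 65535 × 16 = 1,048,560 cycles. The C sketch below converts between cycles and field values; the function names are hypothetical, and rounding up and saturating are design choices made for this example rather than documented hardware behavior.

```c
#include <stdint.h>

/* Convert a cycle count to 16-cycle intervals, rounding up and
 * saturating at the 16-bit field maximum. */
static inline uint16_t udn_deadlock_timeout_from_cycles(uint64_t cycles)
{
    uint64_t intervals = cycles / 16u + (cycles % 16u != 0 ? 1u : 0u);
    return intervals > 0xFFFFu ? (uint16_t)0xFFFFu : (uint16_t)intervals;
}

/* Convert a field value back to the number of cycles it represents. */
static inline uint32_t udn_deadlock_timeout_to_cycles(uint16_t field)
{
    return (uint32_t)field * 16u;
}
```

For instance, a requested wait of 1000 cycles rounds up to 63 intervals, which corresponds to 1008 cycles.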
Cycle Counter High (CYCLE_HIGH)

This register contains the top 32 bits of the 64-bit cycle counter. The cycle counter is incremented every machine cycle.

Speed
Slow

MPL
WORLD_ACCESS

Figure A-15: CYCLE_HIGH Register Diagram

Cycle Counter Low (CYCLE_LOW)

This register contains the bottom 32 bits of the 64-bit cycle counter. The cycle counter is incremented every machine cycle.

Speed
Slow

MPL
WORLD_ACCESS

Figure A-16: CYCLE_LOW Register Diagram
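
Because the 64-bit counter is exposed as two 32-bit registers, software that reads both halves may observe a carry from the low half into the high half between the two reads. A common, general-purpose way to guard against this is to read high, low, then high again and retry if the high half changed. The C sketch below illustrates the idea; the read hooks and the simulated counter are hypothetical stand-ins, since the instruction used to read an SPR is not shown here.

```c
#include <stdint.h>

/* Caller-supplied hook that returns the current value of one SPR. */
typedef uint32_t (*spr_read_fn)(void);

/* Read a consistent 64-bit cycle count from the two 32-bit halves. */
static inline uint64_t read_cycle_count(spr_read_fn read_high,
                                        spr_read_fn read_low)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_high();
        lo  = read_low();
        hi2 = read_high();
    } while (hi != hi2); /* low half wrapped between reads: retry */
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter for demonstration: the low half is about to wrap,
 * and the counter advances by one on every low-half read. */
static uint64_t sim_counter = UINT64_C(0x00000001FFFFFFFF);

static uint32_t sim_read_high(void)
{
    return (uint32_t)(sim_counter >> 32);
}

static uint32_t sim_read_low(void)
{
    uint32_t lo = (uint32_t)sim_counter;
    sim_counter += 1;
    return lo;
}
```

With the simulated counter, the first attempt sees the high half change from 1 to 2 and is discarded; the retry returns 0x0000000200000000.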
Appendix A Special Purpose Registers

Done Magic Register (DONE)

A magic register that is used to signal completion information to the test system. PASS/FAIL/DONE share a 32-bit storage element.

- **Speed**: Slow
- **MPL**: WORLD_ACCESS

```
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
```

*Figure A-17: DONE Register Diagram*

Fail Magic Register (FAIL)

A magic register that is used to signal failure information to the test system. PASS/FAIL/DONE share a 32-bit storage element.

- **Speed**: Slow
- **MPL**: WORLD_ACCESS

```
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
```

*Figure A-18: FAIL Register Diagram*
Interrupt Critical Section (INTERRUPT_CRITICAL_SECTION)

This register specifies whether or not the main processor is in an interrupt critical section. This register is used by interrupts and iret instructions.

**Speed**
Slow

**MPL**
WORLD_ACCESS

![Interrupt Critical Section Register Diagram](image1)

Pass Magic Register (PASS)

A magic register that is used to pass information to the test system. PASS/FAIL/DONE share a 32-bit storage element.

**Speed**
Slow

**MPL**
WORLD_ACCESS

![Pass Magic Register Diagram](image2)

Exceptional Context Protection Level 0 Entry 0 (EX_CONTEXT_0_0)

This register specifies the first part of the exceptional context for protection level 0. This register is used by interrupts and iret instructions.

**Speed**

Slow

**Minimum Protection Level**

INTCTRL_0

![Figure A-21: EX_CONTEXT_0_0 Register Diagram]

Table A-47. EX_CONTEXT_0_0 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>PC</td>
<td>0</td>
<td>The program counter for the context that was interrupted.</td>
</tr>
</tbody>
</table>
Exceptional Context Protection Level 0 Entry 1 (EX_CONTEXT_0_1)

This register specifies the second part of the exceptional context for protection level 0. This register is used by interrupts and iret instructions.

**Speed**
Slow

**Minimum Protection Level**
INTCTRL_0

![Figure A-22: EX_CONTEXT_0_1 Register Diagram](image)

**Table A-48. EX_CONTEXT_0_1 Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:3</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>2</td>
<td>ICS</td>
<td>0</td>
<td>Interrupt Critical Section. This bit indicates if the interrupted context is in an interrupt critical section.</td>
</tr>
<tr>
<td>1:0</td>
<td>PL</td>
<td>0</td>
<td>Protection Level. This field provides the protection level for the context that was interrupted.</td>
</tr>
</tbody>
</table>
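
The saved context word in Table A-48 holds just two fields. The C sketch below shows how they might be extracted or rebuilt; the function names are hypothetical, and the value is assumed to have been read from EX_CONTEXT_0_1 elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* PL: protection level of the interrupted context, bits 1:0. */
static inline unsigned ex_context_pl(uint32_t ex1)
{
    return ex1 & 0x3u;
}

/* ICS: interrupted context was in an interrupt critical section, bit 2. */
static inline bool ex_context_ics(uint32_t ex1)
{
    return ((ex1 >> 2) & 1u) != 0;
}

/* Build an EX_CONTEXT_0_1 value; reserved bits 31:3 are left at 0. */
static inline uint32_t ex_context_1_pack(unsigned pl, bool ics)
{
    return (uint32_t)(pl & 0x3u) | ((uint32_t)(ics ? 1u : 0u) << 2);
}
```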
Interrupt Control 0 Status (INTCTRL_0_STATUS)

This register is used to specify the interrupt control 0 interrupt.

Speed

- Slow

Minimum Protection Level

- INTCTRL_0

![Diagram of INTCTRL_0_STATUS Register]

**Figure A-23: INTCTRL_0_STATUS Register Diagram**

**Table A-49. INTCTRL_0_STATUS Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:1</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>0</td>
<td>INTCTRL_0_STATUS</td>
<td>0</td>
<td>This field specifies the interrupt control 0 interrupt.</td>
</tr>
</tbody>
</table>
Interrupt Mask Protection Level 0 Entry 0 (INTERRUPT_MASK_0_0)

This register is used to mask (disable) interrupts. A value of 1 in a bit position masks the interrupt and a value of 0 enables the interrupt. This register specifies the interrupt mask for interrupts 0 through 31 (see Table 8-11 on page 386 for the mapping of interrupt numbers).

Speed

Slow

Minimum Protection Level

INTCTRL_0

![Register Diagram](image)

Figure A-24: INTERRUPT_MASK_0_0 Register Diagram

Table A-50. INTERRUPT_MASK_0_0 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>MASK_31</td>
<td>1</td>
<td>A value of 1 disables the IDN_AVAIL interrupt.</td>
</tr>
<tr>
<td>30</td>
<td>MASK_30</td>
<td>1</td>
<td>A value of 1 disables the UDN_CA interrupt.</td>
</tr>
<tr>
<td>29</td>
<td>MASK_29</td>
<td>1</td>
<td>A value of 1 disables the IDN_CA interrupt.</td>
</tr>
<tr>
<td>28</td>
<td>MASK_28</td>
<td>1</td>
<td>A value of 1 disables the DMA_NOTIFY interrupt.</td>
</tr>
<tr>
<td>27</td>
<td>MASK_27</td>
<td>1</td>
<td>A value of 1 disables the UDN TIMER interrupt.</td>
</tr>
<tr>
<td>26</td>
<td>MASK_26</td>
<td>1</td>
<td>A value of 1 disables the IDN TIMER interrupt.</td>
</tr>
<tr>
<td>25</td>
<td>MASK_25</td>
<td>1</td>
<td>A value of 1 disables the TILE_TIMER interrupt.</td>
</tr>
<tr>
<td>24</td>
<td>MASK_24</td>
<td>1</td>
<td>A value of 1 disables the UDN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>23</td>
<td>MASK_23</td>
<td>1</td>
<td>A value of 1 disables the IDN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>22</td>
<td>MASK_22</td>
<td>1</td>
<td>A value of 1 disables the SN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>21</td>
<td>MASK_21</td>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>20</td>
<td>MASK_20</td>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>19</td>
<td>MASK_19</td>
<td>1</td>
<td>A value of 1 disables the DMATLB_ACCESS interrupt.</td>
</tr>
<tr>
<td>18</td>
<td>MASK_18</td>
<td>1</td>
<td>A value of 1 disables the DMATLB_MISS interrupt.</td>
</tr>
<tr>
<td>17:11</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>10</td>
<td>MASK_10</td>
<td>1</td>
<td>A value of 1 disables the UDN_COMPLETE interrupt.</td>
</tr>
<tr>
<td>9</td>
<td>MASK_9</td>
<td>1</td>
<td>A value of 1 disables the IDN_COMPLETE interrupt.</td>
</tr>
<tr>
<td>8</td>
<td>MASK_8</td>
<td>1</td>
<td>A value of 1 disables the UDN_REFILL interrupt.</td>
</tr>
<tr>
<td>7</td>
<td>MASK_7</td>
<td>1</td>
<td>A value of 1 disables the IDN_REFILL interrupt.</td>
</tr>
<tr>
<td>6:2</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>MASK_1</td>
<td>1</td>
<td>A value of 1 disables the MEM_ERROR interrupt.</td>
</tr>
<tr>
<td>0</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
</tbody>
</table>
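
As a minimal sketch of the bit semantics above (1 masks an interrupt, 0 enables it), the following C fragment tests and clears individual bits in a 32-bit value laid out like INTERRUPT_MASK_0_0. The bit positions are taken from Table A-50; the enum and function names are illustrative assumptions, and the code operates on a plain integer rather than on the hardware register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from Table A-50; the names are illustrative only. */
enum {
    INT_BIT_MEM_ERROR  = 1,
    INT_BIT_TILE_TIMER = 25,
    INT_BIT_DMA_NOTIFY = 28,
    INT_BIT_IDN_AVAIL  = 31
};

/* Returns true when the interrupt at position `bit` is masked (disabled). */
static bool interrupt_is_masked(uint32_t mask_value, unsigned bit)
{
    assert(bit < 32);
    return ((mask_value >> bit) & 1u) != 0;
}

/* Returns a copy of `mask_value` with the interrupt at `bit` enabled (bit = 0). */
static uint32_t with_interrupt_enabled(uint32_t mask_value, unsigned bit)
{
    assert(bit < 32);
    return mask_value & ~(UINT32_C(1) << bit);
}
```

For example, starting from an all-ones value and enabling `INT_BIT_TILE_TIMER` yields `0xFDFFFFFF`.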
Interrupt Mask Protection Level 0 Entry 1 (INTERRUPT_MASK_0_1)

This register is used to mask (disable) interrupts. A value of 1 in a bit position masks the interrupt and a value of 0 enables the interrupt. This register specifies the interrupt mask for interrupts 32 through 37 (see Table 8-11 on page 386 for the mapping of interrupt numbers).

**Speed**

Slow

**Minimum Protection Level**

INTCTRL_0

![Register Diagram](image)

Table A-51. INTERRUPT_MASK_0_1 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:6</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>31:17</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>16</td>
<td>MASK_48</td>
<td>1</td>
<td>Added in TILEPro: A value of 1 disables the AUX_PERF_COUNT interrupt.</td>
</tr>
<tr>
<td>15</td>
<td>MASK_47</td>
<td>1</td>
<td>Added in TILEPro: A value of 1 disables the SN_STATIC_ACCESS interrupt.</td>
</tr>
<tr>
<td>15:6</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>5</td>
<td>MASK_37</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_0 interrupt.</td>
</tr>
<tr>
<td>4</td>
<td>MASK_36</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_1 interrupt.</td>
</tr>
<tr>
<td>3</td>
<td>MASK_35</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_2 interrupt.</td>
</tr>
<tr>
<td>2</td>
<td>MASK_34</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_3 interrupt.</td>
</tr>
<tr>
<td>1</td>
<td>MASK_33</td>
<td>1</td>
<td>A value of 1 disables the PERF_COUNT interrupt.</td>
</tr>
<tr>
<td>0</td>
<td>MASK_32</td>
<td>1</td>
<td>A value of 1 disables the UDN_AVAIL interrupt.</td>
</tr>
</tbody>
</table>
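
The two mask entries split the interrupt numbers between them: interrupts 0 through 31 occupy INTERRUPT_MASK_0_0, and interrupt 32 onward occupy INTERRUPT_MASK_0_1 starting at bit 0 (MASK_32 is bit 0, MASK_37 is bit 5). A minimal sketch of that mapping, assuming a hypothetical `mask_location` type and helper name:

```c
#include <assert.h>

/* Hypothetical result type: which mask entry (0 or 1) and which bit within it. */
struct mask_location {
    unsigned entry;
    unsigned bit;
};

/* Maps an interrupt number to its mask entry and bit position. */
static struct mask_location locate_interrupt(unsigned interrupt_number)
{
    struct mask_location loc;

    assert(interrupt_number < 64);       /* two 32-bit entries */
    loc.entry = interrupt_number / 32;   /* 0 -> ..._0_0, 1 -> ..._0_1 */
    loc.bit   = interrupt_number % 32;   /* bit within that entry */
    return loc;
}
```

The sketch only computes positions; whether a given position is implemented or reserved is determined by the bit description tables.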
Interrupt Mask Protection Level 0 Entry 0 (INTERRUPT_MASK_RESET_0_0)

This register is used to clear bits in the interrupt mask. Writing a value of 1 to a bit position resets the interrupt mask for that position. Writing a value of 0 to a bit position has no effect. This register clears the interrupt mask for interrupts 0 through 31 (see Table 8-11 on page 386 for the mapping of interrupt numbers).

Speed

Slow

Minimum Protection Level

INTCTRL_0

Figure A-26: INTERRUPT_MASK_RESET_0_0 Register Diagram

Table A-52. INTERRUPT_MASK_RESET_0_0 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>MASK_31</td>
<td>1</td>
<td>A value of 1 enables the IDN_AVAIL interrupt.</td>
</tr>
<tr>
<td>30</td>
<td>MASK_30</td>
<td>1</td>
<td>A value of 1 enables the UDN_CA interrupt.</td>
</tr>
<tr>
<td>29</td>
<td>MASK_29</td>
<td>1</td>
<td>A value of 1 enables the IDN_CA interrupt.</td>
</tr>
<tr>
<td>28</td>
<td>MASK_28</td>
<td>1</td>
<td>A value of 1 enables the DMA_NOTIFY interrupt.</td>
</tr>
<tr>
<td>27</td>
<td>MASK_27</td>
<td>1</td>
<td>A value of 1 enables the UDN_TIMER interrupt.</td>
</tr>
<tr>
<td>26</td>
<td>MASK_26</td>
<td>1</td>
<td>A value of 1 enables the IDN_TIMER interrupt.</td>
</tr>
<tr>
<td>25</td>
<td>MASK_25</td>
<td>1</td>
<td>A value of 1 enables the TILE_TIMER interrupt.</td>
</tr>
<tr>
<td>24</td>
<td>MASK_24</td>
<td>1</td>
<td>A value of 1 enables the UDN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>23</td>
<td>MASK_23</td>
<td>1</td>
<td>A value of 1 enables the IDN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>22</td>
<td>MASK_22</td>
<td>1</td>
<td>A value of 1 enables the SN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>21</td>
<td>MASK_21</td>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>20</td>
<td>MASK_20</td>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>19</td>
<td>MASK_19</td>
<td>1</td>
<td>A value of 1 enables the DMATLB_ACCESS interrupt.</td>
</tr>
<tr>
<td>18</td>
<td>MASK_18</td>
<td>1</td>
<td>A value of 1 enables the DMATLB_MISS interrupt.</td>
</tr>
<tr>
<td>17:11</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>10</td>
<td>MASK_10</td>
<td>1</td>
<td>A value of 1 enables the UDN_COMPLETE interrupt.</td>
</tr>
<tr>
<td>9</td>
<td>MASK_9</td>
<td>1</td>
<td>A value of 1 enables the IDN_COMPLETE interrupt.</td>
</tr>
<tr>
<td>8</td>
<td>MASK_8</td>
<td>1</td>
<td>A value of 1 enables the UDN_REFILL interrupt.</td>
</tr>
<tr>
<td>7</td>
<td>MASK_7</td>
<td>1</td>
<td>A value of 1 enables the IDN_REFILL interrupt.</td>
</tr>
<tr>
<td>6:2</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>MASK_1</td>
<td>1</td>
<td>A value of 1 enables the MEM_ERROR interrupt.</td>
</tr>
<tr>
<td>0</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
</tbody>
</table>
Interrupt Mask Protection Level 0 Entry 1 (INTERRUPT_MASK_RESET_0_1)

This register is used to clear bits in the interrupt mask. Writing a value of 1 to a bit position resets the interrupt mask for that position. Writing a value of 0 to a bit position has no effect. This register clears the interrupt mask for interrupts 32 through 37 (see Table 8-11 on page 386 for the mapping of interrupt numbers).

**Speed**

Slow

**Minimum Protection Level**

INTCTRL_0

Table A-53. INTERRUPT_MASK_RESET_0_1 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:17</td>
<td>Reserved</td>
<td>0x0</td>
<td>Reserved</td>
</tr>
<tr>
<td>16</td>
<td>MASK_48</td>
<td>1</td>
<td>Added in TILEPro: A value of 1 enables the AUX_PERF_COUNT interrupt.</td>
</tr>
<tr>
<td>15</td>
<td>MASK_47</td>
<td>1</td>
<td>Added in TILEPro: A value of 1 enables the SN_STATIC_ACCESS interrupt.</td>
</tr>
<tr>
<td>15:6</td>
<td>Reserved</td>
<td>0x0</td>
<td>Reserved</td>
</tr>
<tr>
<td>5</td>
<td>MASK_37</td>
<td>1</td>
<td>A value of 1 enables the INTCTRL_0 interrupt.</td>
</tr>
<tr>
<td>4</td>
<td>MASK_36</td>
<td>1</td>
<td>A value of 1 enables the INTCTRL_1 interrupt.</td>
</tr>
<tr>
<td>3</td>
<td>MASK_35</td>
<td>1</td>
<td>A value of 1 enables the INTCTRL_2 interrupt.</td>
</tr>
<tr>
<td>2</td>
<td>MASK_34</td>
<td>1</td>
<td>A value of 1 enables the INTCTRL_3 interrupt.</td>
</tr>
<tr>
<td>1</td>
<td>MASK_33</td>
<td>1</td>
<td>A value of 1 enables the PERF_COUNT interrupt.</td>
</tr>
<tr>
<td>0</td>
<td>MASK_32</td>
<td>1</td>
<td>A value of 1 enables the UDN_AVAIL interrupt.</td>
</tr>
</tbody>
</table>
Interrupt Mask Protection Level 0 Entry 0 (INTERRUPT_MASK_SET_0_0)

This register is used to set bits in the interrupt mask. Writing a value of 1 to a bit position sets the interrupt mask for that position. Writing a value of 0 to a bit position has no effect. This register sets the interrupt mask for interrupts 0 through 31 (see Table 8-11 on page 386 for the mapping of interrupt numbers).

**Speed**

Slow

**Minimum Protection Level**

INTCTRL_0

![REGISTER DIAGRAM]

Figure A-28: INTERRUPT_MASK_SET_0_0 Register Diagram

Table A-54. INTERRUPT_MASK_SET_0_0 Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>MASK_31</td>
<td>1</td>
<td>A value of 1 disables the IDN_AVAIL interrupt.</td>
</tr>
<tr>
<td>30</td>
<td>MASK_30</td>
<td>1</td>
<td>A value of 1 disables the UDN_CA interrupt.</td>
</tr>
<tr>
<td>29</td>
<td>MASK_29</td>
<td>1</td>
<td>A value of 1 disables the IDN_CA interrupt.</td>
</tr>
<tr>
<td>28</td>
<td>MASK_28</td>
<td>1</td>
<td>A value of 1 disables the DMA_NOTIFY interrupt.</td>
</tr>
<tr>
<td>27</td>
<td>MASK_27</td>
<td>1</td>
<td>A value of 1 disables the UDN_TIMER interrupt.</td>
</tr>
<tr>
<td>26</td>
<td>MASK_26</td>
<td>1</td>
<td>A value of 1 disables the IDN_TIMER interrupt.</td>
</tr>
<tr>
<td>25</td>
<td>MASK_25</td>
<td>1</td>
<td>A value of 1 disables the TILE_TIMER interrupt.</td>
</tr>
<tr>
<td>24</td>
<td>MASK_24</td>
<td>1</td>
<td>A value of 1 disables the UDN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>23</td>
<td>MASK_23</td>
<td>1</td>
<td>A value of 1 disables the IDN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>22</td>
<td>MASK_22</td>
<td>1</td>
<td>A value of 1 disables the SN_FIREWALL interrupt.</td>
</tr>
<tr>
<td>21</td>
<td>MASK_21</td>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>20</td>
<td>MASK_20</td>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>19</td>
<td>MASK_19</td>
<td>1</td>
<td>A value of 1 disables the DMATLB_ACCESS interrupt.</td>
</tr>
<tr>
<td>18</td>
<td>MASK_18</td>
<td>1</td>
<td>A value of 1 disables the DMATLB_MISS interrupt.</td>
</tr>
<tr>
<td>17:11</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>10</td>
<td>MASK_10</td>
<td>1</td>
<td>A value of 1 disables the UDN_COMPLETE interrupt.</td>
</tr>
<tr>
<td>9</td>
<td>MASK_9</td>
<td>1</td>
<td>A value of 1 disables the IDN_COMPLETE interrupt.</td>
</tr>
<tr>
<td>8</td>
<td>MASK_8</td>
<td>1</td>
<td>A value of 1 disables the UDN_REFILL interrupt.</td>
</tr>
<tr>
<td>7</td>
<td>MASK_7</td>
<td>1</td>
<td>A value of 1 disables the IDN_REFILL interrupt.</td>
</tr>
<tr>
<td>6:2</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>MASK_1</td>
<td>1</td>
<td>A value of 1 disables the MEM_ERROR interrupt.</td>
</tr>
<tr>
<td>0</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
</tbody>
</table>
Interrupt Mask Protection Level 0 Entry 1 (INTERRUPT_MASK_SET_0_1)

This register is used to set bits in the interrupt mask. Writing a value of 1 to a bit position sets the interrupt mask for that position. Writing a value of 0 to a bit position has no effect. This register sets the interrupt mask for interrupts 32 through 37 (see Table 8-11 on page 386 for the mapping of interrupt numbers).

Speed
Slow

Minimum Protection Level
INTCTRL_0

![Register Diagram]

Figure A-29: INTERRUPT_MASK_SET_0_1 Register Diagram

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:17</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>16</td>
<td>MASK_48</td>
<td>1</td>
<td>Added in TILEPro: A value of 1 disables the AUX_PERF_COUNT interrupt.</td>
</tr>
<tr>
<td>15</td>
<td>MASK_47</td>
<td>1</td>
<td>Added in TILEPro: A value of 1 disables the SN_STATIC_ACCESS interrupt.</td>
</tr>
<tr>
<td>15:6</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>5</td>
<td>MASK_37</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_0 interrupt.</td>
</tr>
<tr>
<td>4</td>
<td>MASK_36</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_1 interrupt.</td>
</tr>
<tr>
<td>3</td>
<td>MASK_35</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_2 interrupt.</td>
</tr>
<tr>
<td>2</td>
<td>MASK_34</td>
<td>1</td>
<td>A value of 1 disables the INTCTRL_3 interrupt.</td>
</tr>
<tr>
<td>1</td>
<td>MASK_33</td>
<td>1</td>
<td>A value of 1 disables the PERF_COUNT interrupt.</td>
</tr>
<tr>
<td>0</td>
<td>MASK_32</td>
<td>1</td>
<td>A value of 1 disables the UDN_AVAIL interrupt.</td>
</tr>
</tbody>
</table>

Table A-55. INTERRUPT_MASK_SET_0_1 Register Bit Descriptions
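
The SET and RESET registers give write-1-to-set and write-1-to-clear access to the interrupt mask, so software can change one bit without a read-modify-write of the mask itself. The following is a minimal software model of that behavior, not hardware access code; the struct and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of one 32-bit interrupt mask word (illustrative only). */
struct interrupt_mask_model {
    uint32_t mask;
};

/* Models a write to an INTERRUPT_MASK_SET register:
 * each 1 bit sets (masks) that position; 0 bits have no effect. */
static void model_write_mask_set(struct interrupt_mask_model *m, uint32_t value)
{
    assert(m);
    m->mask |= value;
}

/* Models a write to an INTERRUPT_MASK_RESET register:
 * each 1 bit clears (unmasks) that position; 0 bits have no effect. */
static void model_write_mask_reset(struct interrupt_mask_model *m, uint32_t value)
{
    assert(m);
    m->mask &= ~value;
}
```

Because writing 0 to a bit position has no effect in either register, a single write can enable or disable one interrupt while leaving every other mask bit untouched.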
Appendix A Special Purpose Registers

System Save Register Level 0 Entry 0 (SYSTEM_SAVE_0_0)
This register is used to save system state during interrupt critical sections.

Speed
Fast

Minimum Protection Level
INTCTRL_0

System Save Register Level 0 Entry 1 (SYSTEM_SAVE_0_1)
This register is used to save system state during interrupt critical sections.

Speed
Fast

Minimum Protection Level
INTCTRL_0

System Save Register Level 0 Entry 2 (SYSTEM_SAVE_0_2)
This register is used to save system state during interrupt critical sections.

Speed
Fast

Minimum Protection Level
INTCTRL_0

System Save Register Level 0 Entry 3 (SYSTEM_SAVE_0_3)
This register is used to save system state during interrupt critical sections.

Speed
Fast

Minimum Protection Level
INTCTRL_0
Minimum Protection Level for Tile Timer (MPL_TILE_TIMER)

This register specifies the minimum protection level needed to access the tile timer administratively. It also specifies the protection level at which tile timer interrupts are handled.

Speed
Slow

Minimum Protection Level
TILE_TIMER

Figure 2. MPL_TILE_TIMER Register Diagram

Table A-56. MPL_TILE_TIMER Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:2</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>1:0</td>
<td>MPL</td>
<td>0</td>
<td>Minimum Protection Level</td>
</tr>
</tbody>
</table>
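
As a rough sketch of how the MPL field might be used, the fragment below extracts bits 1:0 from a value laid out like MPL_TILE_TIMER and compares it with a current protection level. It assumes that a numerically higher protection level is more privileged and that access is permitted when the current level is at least the MPL; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extracts the 2-bit MPL field (bits 1:0); bits 31:2 are reserved. */
static unsigned mpl_field(uint32_t mpl_reg_value)
{
    return (unsigned)(mpl_reg_value & 0x3u);
}

/* Assumption: access is allowed when the current protection level
 * is greater than or equal to the minimum protection level. */
static bool access_allowed(unsigned current_pl, uint32_t mpl_reg_value)
{
    assert(current_pl <= 3);
    return current_pl >= mpl_field(mpl_reg_value);
}
```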

DMA Byte (DMA_BYTE) Register

This register specifies the number of chunks that a DMA operation will transfer, as well as the number of bytes that will be transferred in the first chunk.

Speed

Fast

Minimum Protection Level

DMA_NOTIFY

![Figure A-1: DMA_BYTE Register Diagram](image)

Table A-57. DMA_BYTE Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>CHUNK_NUMBER</td>
<td>0</td>
<td>Number of chunks to be transferred. The first chunk will contain the number of bytes specified in the SIZE field of this register; the remaining chunks, if this field is greater than 1, will contain the number of bytes specified by the CHUNK_SIZE register.</td>
</tr>
<tr>
<td>19:0</td>
<td>SIZE</td>
<td>0</td>
<td>Number of bytes to be transferred in the first chunk. For multi-chunk transfers, this should be less than or equal to the value in the CHUNK_SIZE register.</td>
</tr>
</tbody>
</table>
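
The field layout in Table A-57 can be illustrated with a short packing sketch. The helpers below are hypothetical names, work on plain integers rather than the register itself, and assume at least one chunk when computing the total transfer length (first chunk of SIZE bytes, every later chunk of the DMA_CHUNK_SIZE value).

```c
#include <assert.h>
#include <stdint.h>

/* Packs CHUNK_NUMBER (bits 31:20) and SIZE (bits 19:0) into one value. */
static uint32_t dma_byte_pack(uint32_t chunk_number, uint32_t first_chunk_size)
{
    assert(chunk_number <= 0xFFFu);        /* 12-bit field */
    assert(first_chunk_size <= 0xFFFFFu);  /* 20-bit field */
    return (chunk_number << 20) | first_chunk_size;
}

static uint32_t dma_byte_chunk_number(uint32_t value) { return value >> 20; }
static uint32_t dma_byte_size(uint32_t value)         { return value & 0xFFFFFu; }

/* Total bytes moved: SIZE for the first chunk, chunk_size for each later one. */
static uint64_t dma_total_bytes(uint32_t dma_byte_value, uint32_t chunk_size)
{
    uint32_t chunks = dma_byte_chunk_number(dma_byte_value);

    assert(chunks >= 1);  /* assumption: a transfer has at least one chunk */
    return (uint64_t)dma_byte_size(dma_byte_value)
         + (uint64_t)(chunks - 1) * chunk_size;
}
```

For instance, three chunks with a 100-byte first chunk and a 256-byte chunk size pack to `0x00300064` and move 612 bytes in total.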
DMA Chunk Size (DMA_CHUNK_SIZE) Register

This register specifies the DMA chunk size in bytes. It need not be set for an operation if only one chunk is to be transferred.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

![DMA_CHUNK_SIZE Register Diagram](image)

**Table A-58. DMA_CHUNK_SIZE Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:20</td>
<td>Reserved</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>19:0</td>
<td>DMA_CHUNK_SIZE</td>
<td>0</td>
<td>This register specifies the DMA chunk size in bytes. It need not be set for an operation if only one chunk is to be transferred.</td>
</tr>
</tbody>
</table>
DMA Control (DMA_CTR) Register

This register controls the DMA engine. To perform a DMA request, the DMA transfer description registers (DMA_BYTE, DMA_CHUNK_SIZE, DMA_DST_ADDR, DMA_DST_CHUNK_ADDR, DMA_SRC_ADDR, DMA_SRC_CHUNK_ADDR, and DMA_STRIDE) are set appropriately, and then the REQUEST bit in this register is set. To context-switch the DMA engine, the SUSPEND bit in this register is set; then, once the BUSY bit in the DMA_USER_STATUS register has cleared, the transfer description registers are read and their contents saved. At a later time, those values may be reloaded into the corresponding registers and the DMA engine restarted by writing the REQUEST bit; the transfer will then continue from the point at which it was suspended.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

---

### Table A-59. DMA_CTR Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:2</td>
<td>Reserved</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>SUSPEND</td>
<td>0</td>
<td>When set to 1, suspends the currently active DMA operation; this has no effect if no DMA operation is currently in progress. The DMA operation is not suspended until the BUSY bit in the DMA_USER_STATUS register has cleared.</td>
</tr>
<tr>
<td>0</td>
<td>REQUEST</td>
<td>0</td>
<td>When set to 1, starts a new DMA operation; this has no effect if a DMA operation is currently in progress.</td>
</tr>
</tbody>
</table>
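
The request and context-switch sequences described for DMA_CTR can be sketched against an in-memory stand-in for the register file. Everything here is a model under stated assumptions: the struct layout and function names are hypothetical, the REQUEST and SUSPEND positions come from Table A-59, and the BUSY position (bit 1) comes from the DMA_USER_STATUS bit descriptions later in this appendix. Real code would access the special purpose registers rather than a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_CTR_REQUEST      (UINT32_C(1) << 0)
#define DMA_CTR_SUSPEND      (UINT32_C(1) << 1)
#define DMA_USER_STATUS_BUSY (UINT32_C(1) << 1)

/* The seven transfer description registers named in the text. */
struct dma_descriptor {
    uint32_t byte, chunk_size, dst_addr, dst_chunk_addr;
    uint32_t src_addr, src_chunk_addr, stride;
};

/* Hypothetical in-memory stand-in for the DMA registers. */
struct dma_model {
    struct dma_descriptor desc;
    uint32_t ctr;
    uint32_t user_status;
};

/* Request: load the description registers first, then set REQUEST.
 * Restarting a saved transfer uses the same call with the saved values. */
static void dma_start(struct dma_model *dma, const struct dma_descriptor *d)
{
    assert(dma && d);
    dma->desc = *d;
    dma->ctr = DMA_CTR_REQUEST;
}

/* Context switch, step 1: ask the engine to suspend. */
static void dma_request_suspend(struct dma_model *dma)
{
    assert(dma);
    dma->ctr = DMA_CTR_SUSPEND;
}

/* Context switch, step 2: save the description registers, but only once
 * BUSY has cleared. Returns false (and saves nothing) while still busy. */
static bool dma_try_save(const struct dma_model *dma, struct dma_descriptor *out)
{
    assert(dma && out);
    if (dma->user_status & DMA_USER_STATUS_BUSY)
        return false;
    *out = dma->desc;
    return true;
}
```

Splitting the save into a non-blocking `dma_try_save` keeps the sketch free of busy-wait loops; a caller would retry until it returns true.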

---

![DMA_CTR Register Diagram](image_url)
DMA Destination Address (DMA_DST_ADDR) Register

This register holds the address of the first byte to be written when the next DMA operation is started; this will normally be identical to the DST_CHUNK_ADDR register unless the DMA engine is being restarted after partially transferring a chunk.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

![DMA_DST_ADDR Register Diagram](image)

*Figure A-4: DMA_DST_ADDR Register Diagram*

Table A-60. DMA_DST_ADDR Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>DMA_DST_ADDR</td>
<td>0</td>
<td>Address of the first byte to be written when the next DMA operation is started.</td>
</tr>
</tbody>
</table>

DMA Destination Chunk Address (DMA_DST_CHUNK_ADDR) Register

This register holds the address of the first byte in the first destination chunk for the next DMA operation. This may not be the first byte to be written, depending on the contents of the DST_ADDR register.

Speed
Fast

Minimum Protection Level
DMA_NOTIFY

![Figure A-5: DMA_DST_CHUNK_ADDR Register Diagram](image)

Table A-61. DMA_DST_CHUNK_ADDR Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>DMA_DST_CHUNK_ADDR</td>
<td>0</td>
<td>Address of the first byte in the first destination chunk for the next DMA operation.</td>
</tr>
</tbody>
</table>
DMA Source Address (DMA_SRC_ADDR) Register

This register holds the address of the first byte to be read when the next DMA operation is started; this will normally be identical to the SRC_CHUNK_ADDR register unless the DMA engine is being restarted after partially transferring a chunk.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

Table A-62. DMA_SRC_ADDR Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>DMA_SRC_ADDR</td>
<td>0</td>
<td>Address of the first byte to be read when the next DMA operation is started.</td>
</tr>
</tbody>
</table>

Figure A-6: DMA_SRC_ADDR Register Diagram

**DMA Source Chunk Address (DMA_SRC_CHUNK_ADDR) Register**

This register holds the address of the first byte in the first source chunk for the next DMA operation. This may not be the first byte to be read, depending on the contents of the SRC_ADDR register.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

![Figure A-7: DMA_SRC_CHUNK_ADDR Register Diagram](image)

**Table A-63. DMA_SRC_CHUNK_ADDR Register Bit Descriptions**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:0</td>
<td>DMA_SRC_CHUNK_ADDR</td>
<td>0</td>
<td>Address of the first byte in the first source chunk for the next DMA operation.</td>
</tr>
</tbody>
</table>
DMA Source And Destination Strides (DMA_STRIDE) Register

This register specifies the DMA source and destination strides. A stride is the distance between the first byte of successive chunks within one DMA operation; if only one chunk is transferred, the stride is irrelevant.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

![Figure A-8: DMA_STRIDE Register Diagram](image)

Table A-64. DMA_STRIDE Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:16</td>
<td>STORE</td>
<td>0</td>
<td>Store (destination) stride in bytes.</td>
</tr>
<tr>
<td>15:0</td>
<td>LOAD</td>
<td>0</td>
<td>Load (source) stride in bytes.</td>
</tr>
</tbody>
</table>
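
Since a stride is the distance between the first bytes of successive chunks, the start of chunk i is the first chunk address plus i times the stride. A minimal sketch of the field packing and that address calculation follows; the helper names are hypothetical, and the strides are treated as unsigned 16-bit byte counts, which is an assumption the table does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Packs STORE (bits 31:16, destination) and LOAD (bits 15:0, source) strides. */
static uint32_t dma_stride_pack(uint32_t store_stride, uint32_t load_stride)
{
    assert(store_stride <= 0xFFFFu);
    assert(load_stride <= 0xFFFFu);
    return (store_stride << 16) | load_stride;
}

static uint32_t dma_stride_store(uint32_t value) { return value >> 16; }
static uint32_t dma_stride_load(uint32_t value)  { return value & 0xFFFFu; }

/* Address of the first byte of chunk `index` (0-based). */
static uint32_t dma_chunk_start(uint32_t first_chunk_addr, uint32_t stride,
                                uint32_t index)
{
    return first_chunk_addr + index * stride;
}
```

With a source chunk address of `0x1000` and a load stride of `0x40`, chunk 3 starts at `0x10C0`.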
**DMA User Status (DMA_USER_STATUS) Register**

This register can be accessed by programs running at the DMA_NOTIFY PL; this is expected to be lower than the DMATLB_MISS PL.

**Speed**

Fast

**Minimum Protection Level**

DMA_NOTIFY

---

Table A-65. DMA_USER_STATUS Register Bit Descriptions

<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Reset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:7</td>
<td>Reserved</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>6</td>
<td>ERROR</td>
<td>0</td>
<td>Status only. This bit is set when the DMA engine encounters an internal error. It is cleared when a write to DMA_CTR starts a new transfer.</td>
</tr>
<tr>
<td>5</td>
<td>Reserved</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>4</td>
<td>RUNNING</td>
<td>0</td>
<td>Status only. If this bit is set, the last transfer started on the DMA engine has not been suspended via the SUSPEND bit in DMA_CTR. This bit is set when a write to DMA_CTR starts a new transfer; it is cleared when a write to DMA_CTR suspends an active transfer; it is not cleared in the event of a TLB miss, access violation, error, or normal DMA completion. This bit is used to determine whether the DMA engine should be restarted when exiting the DMATLB miss handler; it is suggested that the engine only be restarted if this bit is set.</td>
</tr>
<tr>
<td>3:2</td>
<td>Reserved</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>BUSY</td>
<td>0</td>
<td>Busy bit. Status only. If this bit is set, the DMA engine is active, and the contents of the DMA transfer description registers are undefined. If this bit is clear, and the engine has been paused due to the SUSPEND bit being set in the DMA_CTR register, or due to a TLB miss or access violation, then the DONE bit will be set, and the DMA transfer description registers may be inspected to determine the state of the engine at the time of the suspension. If this bit is clear, and the engine completed the last DMA request, the DONE bit will be set, and the contents of the DMA transfer description registers are undefined.</td>
</tr>
<tr>
<td>0</td>
<td>DONE</td>
<td>0</td>
<td>Done bit. This bit is set when a DMA transfer completes. It is cleared when a write to DMA_CTR starts a new transfer; it may also be cleared by writing a 1 to it whenever the BUSY bit is 0. While this bit is set, the DMA_NOTIFY interrupt is asserted.</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Term</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPLD</td>
<td>Complex PLD. A programmable logic device (PLD) that is made up of several simple PLDs (SPLDs) with a programmable switching matrix in between the logic blocks. CPLDs typically use EEPROM, flash memory, or SRAM to hold the logic design interconnections.</td>
</tr>
<tr>
<td>DDC™</td>
<td>Dynamic Distributed Cache. A system for accelerating multicore coherent cache subsystem performance. It is based on the concept of a distributed L3 cache, a portion of which exists on each tile and is accessible to other tiles through the iMesh. A TLB directory structure on each tile maps the locations of pages among the other tiles, eliminating the bottlenecks of centralized coherency management.</td>
</tr>
<tr>
<td>Dynamic Network</td>
<td>A network where the path of each message is determined at each switch point. The path of each message may be different, based on the contents of the message. This is in contrast to the static network, which has a statically specified route at each switch point, so that all data follows an identical route.</td>
</tr>
<tr>
<td>ECC</td>
<td>Error-Correcting Code. A type of memory that corrects errors on the fly.</td>
</tr>
<tr>
<td>host port interfaces (HPIs)</td>
<td>A 16-bit-wide parallel port through which a host processor can directly access the CPU's memory space. The host device functions as a master to the interface, which increases ease of access. The host and CPU can exchange information via internal or external memory. The host also has direct access to memory-mapped peripherals. Connectivity to the CPU's memory space is provided through the DMA controller.</td>
</tr>
<tr>
<td>Hypervisor services</td>
<td>Provided to support two basic operations: install a new page table (performed on context switch), and flush the TLB (performed after invalidating or changing a page table entry). On a page fault, the client receives an interrupt, and is responsible for taking appropriate action (such as making the necessary data available via appropriate changes to the page table, or terminating a user program which has used an invalid address).</td>
</tr>
<tr>
<td>Little-endian byte ordering</td>
<td>More significant bytes are numbered with a higher byte address or byte number than less significant bytes (LSBs).</td>
</tr>
<tr>
<td>MPI</td>
<td>Message Passing Interface. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and users.</td>
</tr>
<tr>
<td>MPL</td>
<td>Minimum Protection Level. Each interrupt has a minimum protection level at which it may be processed. Interrupts that are signalled at a protection level lower than the MPL are processed at the MPL. Interrupts that are signalled at a protection level higher than the MPL are processed at the higher protection level. The MPL for a given interrupt is typically determined by system software.</td>
</tr>
<tr>
<td>Multicore Development Environment™ (MDE™)</td>
<td>Multicore programming environment.</td>
</tr>
<tr>
<td>RAW Dependence</td>
<td>Read-after-Write dependence, or true dependence. RAW dependencies arise when a read operation on a location follows in program order a write operation to the same location. The read operation must receive the value from the most recent write operation, and must wait for the write operation to complete if the processor executes the operations simultaneously or out of order.</td>
</tr>
<tr>
<td>SIMD</td>
<td>Single Instruction Multiple Data. An architecture that allows a single instruction to apply to multiple sets of data. In the Tile Processor™, SIMD instructions allow a single instruction to operate on registers containing four bytes or two halfwords.</td>
</tr>
<tr>
<td>SPI-SROM</td>
<td>Serial Flash with serial peripheral interface.</td>
</tr>
<tr>
<td>Static Network</td>
<td>A network where the routing for a given input port is specified statically. All data arriving on an input port is sent to the same output port. This is in contrast to a dynamic network, where each message on an input port may be routed to a different output port.</td>
</tr>
<tr>
<td>UART</td>
<td>Universal Asynchronous Receiver Transmitter. The electronic circuit that makes up the serial port. Also known as “universal serial asynchronous receiver transmitter” (USART), it converts parallel bytes from the CPU into serial bits for transmission, and vice versa. It generates and strips the start and stop bits appended to each character.</td>
</tr>
<tr>
<td>VLIW architecture</td>
<td>VLIW (Very Long Instruction Word). A microprocessor design technology. A chip with VLIW technology is capable of executing many operations within one clock cycle. Essentially, a compiler reduces program instructions into basic operations that the processor can perform simultaneously. The operations are put into a very long instruction word, which the processor then takes apart, passing each operation to the appropriate device.</td>
</tr>
<tr>
<td>WAR Dependence</td>
<td>Write-after-Read dependence, or anti-dependence. WAR dependencies arise when a write operation on a location follows in program order a read operation to the same location. The read operation must not receive the value from the following write operation, so the write operation must wait for all previous read operations to complete if the processor executes the operations simultaneously or out of order.</td>
</tr>
<tr>
<td>WAW Dependence</td>
<td>Write-after-Write dependence, or output dependence. WAW dependencies arise when a write operation on a location follows in program order a write operation to the same location. The value written by the final write operation must be the value in the location after both operations are completed, so the second write operation must wait for all previous write operations to complete, or the earlier write operations must be ignored if the processor executes the operations simultaneously or out of order.</td>
</tr>
<tr>
<td>wormhole routing</td>
<td>A network where the routing is determined by the header of a packet, and where once the header of a packet has traversed a switch point, the routing will not be changed until the last packet word has traversed the switch.</td>
</tr>
</tbody>
</table>
INDEX

A
about this manual 1
absolute difference half words 232
absolute difference unsigned bytes 231
ack frame conventions 359
add 44
  bytes saturating unsigned 222
  half words 224
    saturating 226
  immediate
    bytes 228
    half words 229
    word 46
  in X0 bit descriptions 44
  long immediate
    static write word 49
    word 48
  upper long immediate word 52
  word 44
    saturating 50
addbs_u 222
addh 224
addhs 226
addi 46
addih 229
addli 48, 359
addlis 49, 359
adds 50
adiffb_u 231
adiffh 232
ALIGNED_INSTRUCTION_MASK 31
and 122
  immediate word 124
  word 122
andi 124, 359
API 361
Application Programmer Interface
  See API
architectural no operation 216
arithmetic instructions 43
arithmetic shift
  right half words 334
arithmetic shift right
  bytes 332
  immediate bytes 336
  immediate half words 338
  immediate word 154
  word 152
atomic instructions 362
auli 52, 359
average
  byte unsigned 233
  half words 234
avgb_u 233
avgh 234
B
backtrace library 359
BACKWARD_OFFSET 31
bbns 96
bbnst 97
bbs 98
bbst 99
bgez 100
bgezt 101
bgz 102
bgzt 103
bit exchange word 64
bit manipulation instructions 63
bitx 64
blez 104
blezt 105
blz 106
blzt 107
bnz 108
bnzt 109
bpt 359
branch
  greater than
    zero predict taken word 103
  greater than or equal to
    zero predict taken word 101
    zero word 100
  greater than zero word 102
  less than
    zero taken word 107
  less than or equal to
    zero taken word 105
    zero word 104
  less than zero word 106
  not zero
    predict taken word 109
    word 108
  zero predict taken word 111
  zero word 110
branch bit
  not taken word 97
  not set word 96
  set taken word 99
  set word 98
Branch Target Buffer 16
branchHintedCorrect 33
branchHintedIncorrect 33
BUSY bit 457
byte
  defined 11
byte and bit order 2
byte exchange word 66
BYTE_16_ADDR_MASK 31
BYTE_MASK 0xFF 31
BYTE_SIZE_8 31
BYTE_SIZE_LOG_2 31
byteex 66
bz 110
bzt 111

C
cache architecture 362
cache engine 6, 7
cache microarchitecture 363
cache misses 362, 364
cache subsystems 363
cache-coherent shared memory 362
clz 68
Coherence Dynamic Network (CDN) 373
coherent I/O 366
compare instructions 76
conditional transfer operations 17
constants 30
control instructions 95
conventions 2
count
  leading zeros word 68
  trailing zeros word 72
CRC32 32-bit step 70
crc32_8 71
CRC32 8-bit step 71
crc32_32 70
ctz 72
cycle counter high (CYCLE_HIGH) 431
cycle counter low (CYCLE_LOW) 431
CYCLE_HIGH register 431
CYCLE_LOW register 431
CYCLEHIGH SPRs 389
CYCLEHIGHMODIFY SPRs 390
CYCLELOW SPRs 389
CYCLELOWMODIFY SPRs 390

D
data flow control 382
data TLB probe 184
data writes
  flushes 368
  test-and-set 368
DDC 459
DDR-2 10
deadlock 378
deadlocks 375
definitions and semantics 30
demultiplex queue 15
demultiplexing (demux) hardware 377
demux 377
demux queue 15
destination register 364
destination register operands 17
Direct Memory Access
  See DMA
direct-to-cache I/O 366
distributed coherent cached shared memory 364
DMA 366
  registers 367
DMA Chunk Size register, See DMA_CHUNK_SIZE
DMA Control register, See DMA_CTR
DMA Destination Address register, See DMA_DST_ADDR
DMA Destination Chunk Address register, See DMA_DST_CHUNK_ADDR
DMA Source Address register, See DMA_SRC_ADDR
DMA Source And Destination Strides register, See DMA_STRIDE
DMA Source Chunk Address register, See DMA_SRC_CHUNK_ADDR
DMA User Status register, See DMA_USER_STATUS
DMA_CHUNK_SIZE 449
DMA_CTR 450
DMA_DST_ADDR 451
DMA_DST_CHUNK_ADDR 452
DMA_SRC_ADDR 453
DMA_SRC_CHUNK_ADDR 454
DMA_STRIDE 455
DMA_USER_STATUS 456
DONE bit 457
done magic register (DONE) 432
DONE register 432
double word
  defined 11
double word align 74
drain 34, 348
drain instruction 348
dtlbpr 184
dtlbProbe 34
dword_align 74
Dynamic Distributed Cache 459
**E**
- end-to-end flow control 10
- EX_CONTEXT_SIZE 31
- EX_CONTEXT_SPRF_OFFSET 31
- EX0 17
- EX1 17
- exclusive or immediate word 162
- exclusive or word 160
- execute stages 17
- Execute0 16
- Execute1 16
- execution pipelines 16

**F**
- FAIL register 432
- Fetch 16
- filler no operation 214
- finv 185
- flits 374
- flow control 374
- flush 186
- flush and invalidate cache line 185
- flush cache line 186
- flushAndInvalidateCacheLine 33
- flushCacheLine 33
- flushes 368
- fnop 34, 214
- functions 32

**G**
- general purpose register (GPR) 385
- general purpose registers 14, 389
- getCurrentPC 33
- getCurrentProtectionLevel 33
- getHighHalfWordUnsigned 34
- getLowHalfWordUnsigned 34
- GPR 385

**H**
- half word
  - defined 11
- HALF_WORD_ADDR_MASK 31
- HALF_WORD_SIZE_16 30
- hardwall 378
  - protection 382
- host port interfaces
  - see HPIs
- HPI 459
  - interface 10
- HPIs
  - defined 459

**I**
- I/O devices
  - interaction with 389
- I/O Dynamic Network (IDN) 373
- I/O interface 10
  - illustrated 10
- icoh 349
- iCoherent 34
- IDN 378
- idn0 register 14
- idn1 register 14
- ill 350, 359
- illegal instruction 350
- illegalInstruction 34
- iMesh
  - described 10
- implementation dependence 4
- indirectBranchHintedCorrect 34
- indirectBranchHintedIncorrect 34
- info 359
- INFO operations 359
- infol 359
- Input/Output Dynamic Network (IDN) 8, 378
- instruction formats
  - X 20
  - X0 24
  - X1 21
  - Y 26
  - Y0 29
  - Y1 28
  - Y2 27
- instruction organization and format 19
- instruction set architecture 19
- Instruction Set Architecture, See ISA
- instruction stream coherence 349
- INSTRUCTION_SIZE_64 31
- INSTRUCTION_SIZE_LOG_2 6 31
- instructions 347
  - arithmetic 43
  - bit manipulation 63
  - compare 76
  - control 95
  - logical 121
  - master list of main processor instructions 35
  - memory maintenance 183
  - multiply 190
  - NOP 214
  - SIMD 218
- INTCTRL_0 interrupt 445
- INTCTRL_1 interrupt 445
- INTCTRL_2 interrupt 445
- INTCTRL_3 interrupt 445
- interaction with I/O devices 389
- interleave
  - high byte 235
  - high half words 237
  - low byte 239
  - low half words 241
- interrupt
  - return 351
  - signaling DMA transfer complete 367
- interrupt service routine
  - See ISR
interrupts
  list 386
  overview 386
  user-level 389
inter-tile memory mapped communication 8
intrinsics 359
invalidCacheLine 33
IO Dynamic Network (IDN) 9
ISA 19, 391
ISR 378

J
  j 359
  jal 359
  jalb 112
  jalf 113
  jalr 114
  jalrp 115
  jb 116
  jf 117
  jrp 118, 119
  jump
    and link
      backward 112
      forward 113
      register 114
      register predict 115
    backward 116
    forward 117
    register predict 118, 119

L
  L1 instruction and data caches 362
  L2 cache 362
  L2 cache subsystem 364
  L2 writebacks 364
  lb 164
  lb_u 165, 359
  lbadd 166
  lbadd_u 167
  less significant bytes (LSBs) 2
  lh 168
  lh_u 169
  lhadd 170
  lhadd_u 171
  link 120
  link width 374
  load
    byte 164
      and add 166
      unsigned 165
      unsigned and add 167
    half word 168
      and add 170
      unsigned 169
      unsigned and add 171
    word 172
      and add 174
      no alignment trap 173
      no alignment trap and add 175
  loads and stores 364
  logical instructions 121
  logical shift
    left
      immediate bytes 292
      immediate word 146
      word 144
    right
      bytes 296
      immediate half words 302
      immediate word 150
      word 148
  lr register 14
  lw 172
  lw_na 173
  lwadd 174
  lwadd_na 175

M
  mask
    not zero
      byte 259
      half words 261
      word 128
    zero
      byte 263
      half words 265
      word 132
  masked merge word 126
  maxb_u 243
  maxh 245
  maxib_u 247
  maxih 249
  maximum
    byte unsigned 243
    half words 245
    immediate byte unsigned 247
    immediate half words 249
  memory
    distributed coherent cached shared memory 364
    fence (MF) 369
    instructions 163
    maintenance instructions 183
memory consistency model 368
Memory Dynamic Network (MDN) 373
memory fence 188
memory fences (MF) 362
Memory Networks 373
memory networks 374
memoryFence 34
memoryReadByte 32
memoryReadHalfWord 32
memoryReadWord 32
memoryWriteByte 33
memoryWriteHalfWord 33
memoryWriteWord 33
Messaging Networks 373
messaging networks 375
mf 188
mf instruction 348
mfspr 352
minb_u 251
minh 253
minib_u 255
minih 257
minimum
  byte unsigned 251
  half words 253
  immediate byte unsigned 255
  immediate half words 257
Minimum Protection Level for Tile Timer, See MPL_TILE_TIMER
mm 126
mnz 128
mnzb 259
mnzh 261
move 359
  from special purpose register word 352
  not zero word 130
  to special purpose register word 353
  zero word 131
Move From Special Purpose Register (MFSPR) 391
Move To Special Purpose Register Word (MTSPR) 391
movei 359
moveli 359
movelis 359
MPI
  defined 459
MPL
  defined 459
MPL_TILE_TIMER 447
mtspr 353
MulAdd operations 17
mulhh_ss 191
mulhh_su 192
mulhh_uu 193
mulhha_ss 194
mulhha_su 195
mulhha_uu 196
mulhhsa_uu 197
mulhl_ss 198
mulhl_su 199
mulhl_us 200
mulhl_uu 201
mulhla_ss 202
mulhla_su 203
mulhla_us 204
mulhla_uu 205
mulhlsa_uu 206
mulll_ss 207
mulll_su 208
mulll_uu 209
mullla_ss 210
mullla_su 211
mullla_uu 212
mulllsa_uu 213
multiply
  accumulate
    high signed high signed half word 194
    high signed high unsigned half word 195
    high signed low signed half word 202
    high signed low unsigned half word 203
    high unsigned high signed half word 196
    high unsigned low signed half word 204
    high unsigned low unsigned half word 205
    low signed low signed half word 210
    low signed low unsigned half word 211
    low unsigned low unsigned half word 212
  high signed
    high signed half word 191
    high unsigned half word 192
    low signed half word 198
    low unsigned half word 199
  high unsigned
    high unsigned half word 193
    low signed half word 200
    low unsigned half word 201
  low signed
    low signed half word 207
    low unsigned half word 208
  low unsigned
    low unsigned half word 209
  shift accumulate
    high unsigned high unsigned half word 197
    high unsigned low unsigned half word 206
    low unsigned low unsigned half word 213
multiply instructions 190
mvnz 130
mvz 131
mz 132
mzb 263
mzh 265
N
nap 34, 354
network
  properties 373
nop 34, 216
NOP instructions 214
nor 134
nor word 134
NUMBER_OF_REGISTERS_64 31
numbering 3

O
opcode 359
or 136, 359
  immediate word 138
  word 136
ori 138, 359

P
P0 16
P1 16
P2 16
pack
  half words saturating 267, 271
  high byte 269
  low byte 273
packbs_u 267
packed byte format 11
packed half word format 11
packet format 376
packet sizes 374
packets 374
packhb 269
packhs 271
packlb 273
PC_EX_CONTEXT_OFFSET 31
pcnt 75
pipeline 16
  latencies 17
pipelines 6
popReturnStack 34
population count word 75
port designations 401
prefetch 359
prefetch_L1 359
processing engine
  pipeline 16
processor engine 6
Program Counter (PC) 16
protection
  hardwall 382
PROTECTION_LEVEL_EX_CONTEXT_OFFSET 31
pseudo instructions 359
pushReturnStack 34

R
r0-r53 register 14
RAW dependence
  defined 460
read-after-write (RAW) dependencies 13
refill mode 417
register mapping 375
RegisterFile 16
RegisterFile (RF) 16
RegisterFileEntry 32
REQUEST bit 450
reserved fields 3
rl 140
rli 142
rotate
  left immediate word 142
  left word 140
round-robin output port arbitration 374
route header 376
routing 374
routing the packet 376

S
s1a 53
s2a 55
s3a 57
sadab_u 275
sadah 276
sadah_u 277
sadb_u 278
sadh 279
sadh_u 280
sb 176
sbadd 177
scratchpad memory 362
seq 77
seqb 281
seqh 283
seqi 79
set
  equal
    immediate word 79
    to byte 281
    word 77
  less than
    immediate word 89
    or equal
      unsigned word 87
      word 85
    unsigned immediate word 91
    unsigned word 83
    word 81
  not equal
    word 93
Set Equal To Half Words 283
Set Less Than Unsigned Byte 306
setInterruptCriticalSection 33
setNextPC 33
setProtectionLevel 33
sh 178
shadd 179
store
  half word and add 179
shift
  left
    one add word 53
    three add word 57
    two add word 55
shl 144
shli 146
shlib 292
shr 148
shrb 296
shri 150
shrih 302
SignedMachineWord 32
signExtend1 32
signExtend16 32
signExtend17 32
signExtend8 32
SIMD instructions 11, 218
slt 81
slt_u 83
sltb_u 306
slte 85
slte_u 87
slti 89
slti_u 91
sn register 14
SN_DATA_AVAIL 383, 402
SN_STATIC_CTL 403
SN_STATIC_DATA_AVAIL 409
SN_STATIC_FIFO_DATA 404
SN_STATIC_FIFO_SEL 405
SN_STATIC_ISTATE 406
SN_STATIC_OSTATE 407
SNCTL 383
sne 93
SNFIFO 383
SNFIFO_DATA 383, 397
SNFIFO_SEL 383, 398
SNISTATE 383, 399
SNOSTATE 383, 400
SNSTATIC 383, 401
software interrupt 0 355
software interrupt 1 356
software interrupt 2 357
software interrupt 3 358
softwareInterrupt 34
sp register 14
Special Purpose Register File
  See SPRF
special purpose registers
  See SPR
Special Purpose Registers, See SPRs
specifying
  input port to which to route output 381, 401
SPI 10
SPI-SROM 460
SPR 367, 381
  fields 381
SPRF 391
SPRs 376, 383
  address information 391
  listed by access MPL 391
  register descriptions 396
  use of 16
  user-accessible 383
sra 152
srab 332
srah 334
srai 154
sraib 336
sraih 338
state machine 10
static network 381
  processor program counter 383
static network (STN) 8
Static Network Control register
  See SN_STATIC_CTL
Static Network Data Available register
  See SN_STATIC_DATA_AVAIL
Static Network Data Available register, See
  SN_DATA_AVAIL
Static Network FIFO Data register
  See SN_STATIC_FIFO_DATA
Static Network FIFO Data register, See SNFIFO_DATA
Static Network FIFO Select register
  See SN_STATIC_FIFO_SEL
Static Network FIFO Select register, See SNFIFO_SEL
Static Network Input State register
  See SN_STATIC_ISTATE
Static Network Input State register, See SNISTATE
Static Network Output State register
  See SN_STATIC_OSTATE
Static Network Output State register, See SNOSTATE
Static Network Static Route register, See SNSTATIC
static routing 381
store
  byte 176
  byte and add 177
  half word 178
  word 180
  word and add 181
striped memory 366
sub 59
subb 340
subbs_u 342
subh 344
subhs 345
subs 61
subtract
  bytes 340
    saturating unsigned 342
  half words 344
<table>
<tbody>
<tr>
<td>Supported Memory Modes</td>
<td>361, 362</td>
</tr>
<tr>
<td>SUSPEND Bit</td>
<td>450</td>
</tr>
<tr>
<td>sw</td>
<td>180</td>
</tr>
<tr>
<td>swadd</td>
<td>181</td>
</tr>
<tr>
<td>swint0</td>
<td>355</td>
</tr>
<tr>
<td>swint2</td>
<td>357</td>
</tr>
<tr>
<td>swint3</td>
<td>358</td>
</tr>
<tr>
<td>Switch Engine</td>
<td>6, 8</td>
</tr>
<tr>
<td>Switches</td>
<td>373</td>
</tr>
<tr>
<td>Switchpoint</td>
<td>376</td>
</tr>
<tr>
<td>System</td>
<td>347</td>
</tr>
<tr>
<td>System Calls</td>
<td>385</td>
</tr>
<tr>
<td>System Instructions</td>
<td>347</td>
</tr>
<tr>
<td>T</td>
<td></td>
</tr>
<tr>
<td>Table Index Byte 0</td>
<td>156</td>
</tr>
<tr>
<td>Table Index Byte 1</td>
<td>157</td>
</tr>
<tr>
<td>Table Index Byte 2</td>
<td>158</td>
</tr>
<tr>
<td>Table Index Byte 3</td>
<td>159</td>
</tr>
<tr>
<td>Tag</td>
<td>376</td>
</tr>
<tr>
<td>Tag Word</td>
<td>376</td>
</tr>
<tr>
<td>Target of a Jump</td>
<td>359</td>
</tr>
<tr>
<td>tblidxb0</td>
<td>156</td>
</tr>
<tr>
<td>tblidxb1</td>
<td>157</td>
</tr>
<tr>
<td>tblidxb2</td>
<td>158</td>
</tr>
<tr>
<td>tblidxb3</td>
<td>159</td>
</tr>
<tr>
<td>Test and Set Word</td>
<td>182</td>
</tr>
<tr>
<td>Test-and-Set (TNS)</td>
<td>369</td>
</tr>
<tr>
<td>Test-and-Set Data Writes</td>
<td>368</td>
</tr>
<tr>
<td>Tile</td>
<td></td>
</tr>
<tr>
<td>Defined</td>
<td>6</td>
</tr>
<tr>
<td>Tile Dynamic Network (TDN)</td>
<td>373</td>
</tr>
<tr>
<td>Tile Fabric</td>
<td>376</td>
</tr>
<tr>
<td>Timing</td>
<td>374</td>
</tr>
<tr>
<td>TLB</td>
<td>1</td>
</tr>
<tr>
<td>TNS</td>
<td>182</td>
</tr>
<tr>
<td>Translation Lookaside Buffers</td>
<td>See TLB</td>
</tr>
<tr>
<td>Two-Wire Interface</td>
<td>10</td>
</tr>
<tr>
<td>Types</td>
<td>32</td>
</tr>
<tr>
<td>U</td>
<td></td>
</tr>
<tr>
<td>UART</td>
<td>10, 460</td>
</tr>
<tr>
<td>UDN</td>
<td></td>
</tr>
<tr>
<td>Hardwall Mechanism</td>
<td>379</td>
</tr>
<tr>
<td>Interlocked</td>
<td>375</td>
</tr>
<tr>
<td>Packet Format, Illustrated</td>
<td>376</td>
</tr>
<tr>
<td>UDN Available Enables Register, See UDN_AVAIL_EN</td>
<td></td>
</tr>
<tr>
<td>UDN Catch-All Data Register, See UDN_CA_DATA</td>
<td></td>
</tr>
<tr>
<td>UDN Catch-all Remaining Words Register, See UDN_CA_REM</td>
<td></td>
</tr>
<tr>
<td>UDN Catch-all Tag Register, See UDN_CA_TAG</td>
<td></td>
</tr>
<tr>
<td>UDN Data Available Register, See UDN_DATA_AVAIL</td>
<td></td>
</tr>
<tr>
<td>UDN Deadlock Counter, See UDN_DEADLOCK_COUNT</td>
<td></td>
</tr>
<tr>
<td>UDN Deadlock Timeout Register, See UDN_DEADLOCK_TIMEOUT</td>
<td></td>
</tr>
<tr>
<td>UDN Demultiplexor Count 1 Register, See UDN_DEMUX_COUNT_1</td>
<td></td>
</tr>
<tr>
<td>UDN Demultiplexor Count 2 Register, See UDN_DEMUX_COUNT_2</td>
<td></td>
</tr>
<tr>
<td>UDN Demux Control Register, See UDN_DEMUX_CTL</td>
<td></td>
</tr>
<tr>
<td>UDN Demux Current Tag Register, See UDN_DEMUX_CURR_TAG</td>
<td></td>
</tr>
<tr>
<td>UDN Demux FIFO Register, See UDN_DEMUX_WRITE_FIFO</td>
<td></td>
</tr>
<tr>
<td>UDN Demux Queue Select Register, See UDN_DEMUX_QUEUE_SEL</td>
<td></td>
</tr>
<tr>
<td>UDN Demux Write Queue Register, See UDN_DEMUX_WRITE_QUEUE</td>
<td></td>
</tr>
<tr>
<td>UDN FIFO Data Register, See UDN_SP_FIFO_DATA</td>
<td></td>
</tr>
<tr>
<td>UDN FIFO Select Register, See UDN_SP_FIFO_SEL</td>
<td></td>
</tr>
<tr>
<td>UDN Freeze Register, See UDN_SP_FREEZE</td>
<td></td>
</tr>
<tr>
<td>UDN Packet Description</td>
<td>376</td>
</tr>
<tr>
<td>UDN Port State Register, See UDN_SP_STATE</td>
<td></td>
</tr>
<tr>
<td>UDN Refill Available Enables, See UDN_REFILL</td>
<td></td>
</tr>
<tr>
<td>UDN Remaining Register, See UDN_REMAINING</td>
<td></td>
</tr>
<tr>
<td>UDN Switch Point</td>
<td>378</td>
</tr>
<tr>
<td>UDN Tag 0 Register</td>
<td></td>
</tr>
<tr>
<td>UDN Tag 1 Register</td>
<td></td>
</tr>
<tr>
<td>UDN Tag 2 Register</td>
<td></td>
</tr>
<tr>
<td>UDN Tag 3 Register</td>
<td></td>
</tr>
<tr>
<td>UDN Tile Coordinates Register, See UDN_TILE_COORD</td>
<td></td>
</tr>
<tr>
<td>UDN Words Pending Register, See UDN_PENDING</td>
<td></td>
</tr>
<tr>
<td>UDN_AVAIL Interrupt</td>
<td>445</td>
</tr>
<tr>
<td>UDN_AVAIL_EN</td>
<td>378, 428</td>
</tr>
<tr>
<td>UDN_CA</td>
<td>378</td>
</tr>
<tr>
<td>UDN_CA_DATA</td>
<td>424</td>
</tr>
<tr>
<td>UDN_CA_REM</td>
<td>377, 424</td>
</tr>
<tr>
<td>UDN_CA_TAG</td>
<td>425</td>
</tr>
<tr>
<td>UDN_DATA_AVAIL</td>
<td>378, 425, 426</td>
</tr>
<tr>
<td>UDN_DEADLOCK_COUNT</td>
<td>429</td>
</tr>
<tr>
<td>UDN_DEADLOCK_TIMEOUT</td>
<td>430</td>
</tr>
<tr>
<td>UDN_DEADLOCK_TIMEOUT_Register</td>
<td>430</td>
</tr>
<tr>
<td>UDN_DEMUX_COUNT_0</td>
<td>411</td>
</tr>
<tr>
<td>UDN_DEMUX_COUNT_1</td>
<td>412</td>
</tr>
<tr>
<td>UDN_DEMUX_COUNT_2</td>
<td>413</td>
</tr>
<tr>
<td>UDN_DEMUX_COUNT_3</td>
<td>414</td>
</tr>
<tr>
<td>UDN_DEMUX_COUNT_n</td>
<td>378</td>
</tr>
<tr>
<td>UDN_DEMUX_CTL</td>
<td>415</td>
</tr>
<tr>
<td>UDN_DEMUX_CURR_TAG</td>
<td>415</td>
</tr>
<tr>
<td>UDN_DEMUX_QUEUE_SEL</td>
<td>415</td>
</tr>
<tr>
<td>UDN_DEMUX_WRITE_FIFO</td>
<td>416</td>
</tr>
<tr>
<td>UDN_DEMUX_WRITE_QUEUE</td>
<td>417</td>
</tr>
<tr>
<td>UDN_PENDING</td>
<td>417</td>
</tr>
<tr>
<td>UDN_REFILL</td>
<td>426</td>
</tr>
<tr>
<td>UDN_REMAINING</td>
<td>427</td>
</tr>
<tr>
<td>UDN_SP_FIFO_DATA</td>
<td>418</td>
</tr>
<tr>
<td>UDN_SP_FIFO_SEL</td>
<td>418</td>
</tr>
<tr>
<td>UDN_SP_FREEZE</td>
<td>419</td>
</tr>
<tr>
<td>UDN_SP_STATE</td>
<td>420</td>
</tr>
<tr>
<td>UDN_TAG_0</td>
<td>421</td>
</tr>
<tr>
<td>UDN_TAG_1</td>
<td>421</td>
</tr>
<tr>
<td>UDN_TAG_2</td>
<td>421</td>
</tr>
<tr>
<td>UDN_TAG_3</td>
<td>422</td>
</tr>
<tr>
<td>UDN_TAG_n</td>
<td>377</td>
</tr>
<tr>
<td>UDN_TILE_COORD</td>
<td>423</td>
</tr>
<tr>
<td>udn0 register</td>
<td>14</td>
</tr>
<tr>
<td>udn1 register</td>
<td>14</td>
</tr>
<tr>
<td>udn2 register</td>
<td>14</td>
</tr>
<tr>
<td>udn3 register</td>
<td>14</td>
</tr>
<tr>
<td>UnsignedMachineWord</td>
<td>32</td>
</tr>
<tr>
<td>User Dynamic Network (UDN)</td>
<td>8, 373</td>
</tr>
<tr>
<td>User Dynamic Network Demultiplexor Count 0 register, See UDN_DEMUX_COUNT_0</td>
<td></td>
</tr>
<tr>
<td>User Dynamic Network Demultiplexor Count 3 register, See UDN_DEMUX_COUNT_3</td>
<td></td>
</tr>
<tr>
<td>user-accessible special purpose registers</td>
<td>383</td>
</tr>
<tr>
<td>user-accessible SPRs</td>
<td>383</td>
</tr>
<tr>
<td>user-level interrupts</td>
<td>389</td>
</tr>
<tr>
<td>user-level processes</td>
<td>9</td>
</tr>
<tr>
<td>W</td>
<td></td>
</tr>
<tr>
<td>WAR dependence</td>
<td>defined 460</td>
</tr>
<tr>
<td>WAW dependence</td>
<td>defined 460</td>
</tr>
<tr>
<td>WB</td>
<td>16, 17</td>
</tr>
<tr>
<td>wh64</td>
<td>189</td>
</tr>
<tr>
<td>what’s new in this manual</td>
<td>1</td>
</tr>
<tr>
<td>word</td>
<td>defined 11</td>
</tr>
<tr>
<td>WORD_ADDR_MASK 0xFFFFFFFC</td>
<td>30</td>
</tr>
<tr>
<td>WORD_MASK 0xFFFFFFFF</td>
<td>30</td>
</tr>
<tr>
<td>WORD_SIZE 32</td>
<td>30</td>
</tr>
<tr>
<td>wormhole</td>
<td></td>
</tr>
<tr>
<td>routing, defined</td>
<td>460</td>
</tr>
<tr>
<td>write</td>
<td></td>
</tr>
<tr>
<td>hint 64 bytes</td>
<td>189</td>
</tr>
<tr>
<td>write-after-write (WAW) semantics</td>
<td>13</td>
</tr>
<tr>
<td>WriteBack</td>
<td></td>
</tr>
<tr>
<td>see WB</td>
<td></td>
</tr>
<tr>
<td>X</td>
<td></td>
</tr>
<tr>
<td>X instruction formats</td>
<td>20</td>
</tr>
<tr>
<td>X,Y coordinates</td>
<td></td>
</tr>
<tr>
<td>of the target</td>
<td>376</td>
</tr>
<tr>
<td>X0 instruction formats</td>
<td>24</td>
</tr>
<tr>
<td>X1 instruction formats</td>
<td>21</td>
</tr>
<tr>
<td>xor</td>
<td>160</td>
</tr>
<tr>
<td>xori</td>
<td>162</td>
</tr>
<tr>
<td>Y</td>
<td></td>
</tr>
<tr>
<td>Y instruction formats</td>
<td>26</td>
</tr>
<tr>
<td>Y0 instruction formats</td>
<td>29</td>
</tr>
<tr>
<td>Y1 instruction formats</td>
<td>28</td>
</tr>
<tr>
<td>Y2 instruction formats</td>
<td>27</td>
</tr>
<tr>
<td>Z</td>
<td></td>
</tr>
<tr>
<td>zero register</td>
<td>14</td>
</tr>
<tr>
<td>ZERO_REGISTER 63</td>
<td>31</td>
</tr>
</tbody>
</table>