Introduction to the Y86-64 Processor Architecture

ISA (Instruction Set Architecture)

An instruction set architecture (ISA), also called an instruction set or instruction set system, is the part of computer architecture that relates to programming. It covers the basic data types, the instruction set, registers, addressing modes, the memory system, interrupts, exception handling, and external I/O. An ISA includes a set of opcodes (machine language) and the basic commands executed by a particular processor.
Different processor "families", for example Intel IA-32 and x86-64, IBM/Freescale Power, and ARM, have different instruction set architectures. [1]
An ISA differs from a microarchitecture (a set of microprocessor design techniques used to implement the instruction set). Computers with different microarchitectures can share one instruction set. For example, the Intel Pentium and the AMD Athlon implement nearly identical versions of the x86 instruction set, but their internal designs are fundamentally different.
(Source: Wikipedia)

Designing an ISA

Designing processor states visible to programmers
Designing a set of instructions
Encoding the instructions

All of the above will be illustrated with the Y86-64 ISA, a much simpler variant of the x86-64 ISA that is still sufficient to demonstrate the key concepts

Circuits

Digital circuits

Logic gates

Logic gates: the basic computing elements of electronic circuits

Logic gates are always active: if some input to a gate changes, then within some small amount of time the output changes accordingly
Can be represented by hardware control language (HCL)
- For example: out = a && b; out = a || b …

Combinational circuits

Example 1

If a and b are equal, output 1; otherwise, output 0. Built with and, or, and not gates.
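The bit-equality circuit can be sketched in Python, using only the three gate operations. The function name `bit_eq` is an illustrative choice, not something defined in the notes.

```python
def bit_eq(a: int, b: int) -> int:
    """Return 1 when the single-bit inputs a and b are equal, else 0."""
    both_one = a & b                 # AND gate: a && b
    both_zero = (1 - a) & (1 - b)    # two NOT gates feeding an AND gate: !a && !b
    return both_one | both_zero      # OR gate combines the two cases


# Print the full truth table of the circuit.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, bit_eq(a, b))
```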

Example 2

selecting a or b according to s
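A minimal sketch of this one-bit multiplexer follows. It assumes that s = 1 selects a and s = 0 selects b; the notes do not say which way the select signal works, and `bit_mux` is a hypothetical name.

```python
def bit_mux(s: int, a: int, b: int) -> int:
    """One-bit multiplexer: output a when s is 1, otherwise output b (assumed polarity)."""
    pick_a = s & a          # s && a
    pick_b = (1 - s) & b    # !s && b
    return pick_a | pick_b


print(bit_mux(1, 1, 0))  # s = 1, so the output follows a
print(bit_mux(0, 1, 0))  # s = 0, so the output follows b
```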

Example 3

From a single bit to multiple bits (word)
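One way to picture the step from bits to words is to apply the single-bit equality circuit to every bit position and AND the results together. The sketch below assumes a 64-bit word; `word_eq` is an illustrative name.

```python
def word_eq(a: int, b: int, width: int = 64) -> int:
    """Return 1 when the two width-bit words are equal, else 0."""
    result = 1
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        # Same single-bit equality circuit as before: (a && b) || (!a && !b)
        bit_equal = (bit_a & bit_b) | ((1 - bit_a) & (1 - bit_b))
        result &= bit_equal  # every bit position must match
    return result


print(word_eq(0x1234, 0x1234))
print(word_eq(0x1234, 0x1235))
```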

Arithmetic Logic Unit (ALU)

Using and, or, not gates to implement arithmetic logic
Compute the result, and set the condition codes
Inputs and outputs are multi-bit words
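The behavior of such an ALU can be modeled as follows. This is a behavioral sketch, not a gate-level design: it assumes 64-bit signed words and four operations (add, sub, and, xor), and the operation names and the `alu` function are illustrative.

```python
MASK = (1 << 64) - 1


def to_signed(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement number."""
    value &= MASK
    return value - (1 << 64) if value >> 63 else value


def alu(op: str, x: int, y: int):
    """Return (result, ZF, SF, OF) for the 64-bit signed operation x op y."""
    if op == "add":
        full = x + y
    elif op == "sub":
        full = x - y
    elif op == "and":
        full = x & y
    elif op == "xor":
        full = x ^ y
    else:
        raise ValueError(f"unknown operation: {op}")
    result = to_signed(full)
    zf = int(result == 0)   # zero flag
    sf = int(result < 0)    # sign flag
    # Overflow: the exact arithmetic result does not fit in 64 signed bits.
    of = int(op in ("add", "sub") and result != full)
    return result, zf, sf, of


print(alu("sub", 5, 5))
print(alu("add", 2**63 - 1, 1))
```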

Storage elements

Storage elements are special electronic circuits that can retain data values

Storage elements can be read or written
Storage elements can be addressed
Storage elements rely on a clock signal to retain data values: a new value is loaded only when the clock ticks
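The role of the clock can be illustrated with a toy model of a clocked register. The class and method names are hypothetical, and the model assumes the value is loaded on the rising edge of the clock.

```python
class ClockedRegister:
    """Toy clocked register: the output changes only on a rising clock edge."""

    def __init__(self, initial: int = 0):
        self.input = initial    # value currently presented at the input
        self.output = initial   # value currently held and visible at the output

    def rising_edge(self) -> None:
        """Load the input value into the register."""
        self.output = self.input


reg = ClockedRegister()
reg.input = 7
print(reg.output)   # still the old value: no clock edge has arrived yet
reg.rising_edge()
print(reg.output)   # now holds the new value
```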

Y86-64 processor

state

15 64-bit general purpose registers
Condition codes
- ZF: zero;
- SF: negative;
- OF: overflow
Program counter (PC): indicates the address of the next instruction
Memory
- Byte-addressable storage array
- Words stored in little-endian byte order

little-endian

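Big endian means that the lowest address holds the most significant byte (MSB), while little endian means that the lowest address holds the least significant byte (LSB).

A description in words can be abstract, so figures help. For example, the number 0x12345678 is stored as follows on CPUs with the two byte orders:

[Figure: memory layout of 0x12345678 in big-endian and in little-endian byte order]

As the two layouts show, big-endian storage matches the order in which people normally read numbers.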
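The two byte orders can be checked directly with Python's built-in `int.to_bytes`. The sketch uses a 4-byte value to match the 0x12345678 example; Y86-64 words are 8 bytes, but the ordering rule is the same.

```python
value = 0x12345678

# Byte at the lowest address comes first in each printed string.
big = value.to_bytes(4, "big")
little = value.to_bytes(4, "little")

print(big.hex())     # 12345678: most significant byte at the lowest address
print(little.hex())  # 78563412: least significant byte at the lowest address

# Reading the little-endian bytes back recovers the original value.
print(hex(int.from_bytes(little, "little")))
```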

Instruction set

Encoding registers

Each register is uniquely specified by a 4-bit ID

ID 15 (0xF) indicates “no register”
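A small decoder makes the 4-bit encoding concrete. The notes only state the 4-bit ID and the meaning of 0xF; the register order below and the packing of two IDs into one byte (first register in the high nibble) follow the usual textbook Y86-64 convention and should be treated as assumptions here.

```python
# Assumed ID-to-name mapping (IDs 0 through 14).
REGISTER_NAMES = [
    "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
    "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14",
]
RNONE = 0xF  # ID 15 means "no register"


def register_name(reg_id: int):
    """Map a 4-bit ID to a register name, or None for 'no register'."""
    return None if reg_id == RNONE else REGISTER_NAMES[reg_id]


def decode_register_byte(byte: int):
    """Split a register-specifier byte into its two 4-bit IDs (assumed layout)."""
    return register_name(byte >> 4), register_name(byte & 0xF)


print(decode_register_byte(0x20))
print(decode_register_byte(0x2F))
```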

Instruction examples

Uniqueness (a requirement when designing an ISA)

The encodings must have a unique interpretation
Given a sequence of bytes (machine code), it can be interpreted as only one valid sequence of instructions
Starting from the first instruction, it must always be possible to find the start byte of the next instruction
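The last requirement can be sketched as a loop that derives each instruction's length from its first byte. The length table is not given in these notes; it follows the standard textbook Y86-64 encoding (instruction code in the high nibble of the first byte) and is an assumption here, as is the function name.

```python
# Assumed instruction lengths in bytes, indexed by instruction code.
INSTRUCTION_LENGTH = {
    0x0: 1,   # halt
    0x1: 1,   # nop
    0x2: 2,   # rrmovq / cmovXX
    0x3: 10,  # irmovq
    0x4: 10,  # rmmovq
    0x5: 10,  # mrmovq
    0x6: 2,   # OPq
    0x7: 9,   # jXX
    0x8: 9,   # call
    0x9: 1,   # ret
    0xA: 2,   # pushq
    0xB: 2,   # popq
}


def split_instructions(code: bytes):
    """Split machine code into instructions, starting from the first byte."""
    instructions = []
    pc = 0
    while pc < len(code):
        icode = code[pc] >> 4                 # high nibble of the first byte
        length = INSTRUCTION_LENGTH[icode]    # the first byte fixes the length
        instructions.append(code[pc:pc + length])
        pc += length                          # start byte of the next instruction
    return instructions


# irmovq $10, %rdx ; addq %rdx, %rax ; halt
program = bytes.fromhex("30f2" + "0a" + "00" * 7 + "6020" + "00")
for instruction in split_instructions(program):
    print(instruction.hex())
```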

Standard stages to execute one instruction

We have:
- Hardware building blocks that can do arithmetic computation
- Hardware storage elements to store data
- Machine instructions defined
We want to put all these things together to build a CPU
- That can read and understand a program in machine instructions
- That can perform the functions specified by the machine instructions
  - By operating the computation and storage elements of the CPU

Because there are so many instructions, it would be unwise to design a dedicated hardware circuit for each one
The execution of instructions is standardized, i.e., all instructions follow the same steps, and each step uses the same hardware for every instruction

Stages/Steps	Functions
Fetch	Read an instruction from the memory
Decode	Read operands from registers
Execute	Compute value or address
Memory access	Read or write data from/to memory
Write back	Write results to registers
PC update	Update PC, get ready for the next instruction
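The six stages can be traced for one concrete instruction. The sketch below walks `addq rA, rB` through them, assuming the textbook Y86-64 encoding (byte 0x60 followed by a register byte) and the conventional signal names valA, valB, valE, and valP; none of these details are spelled out in the notes, and condition-code setting is left out for brevity.

```python
MASK = (1 << 64) - 1


def step_addq(memory: bytes, regs: list, pc: int) -> int:
    """Execute one addq instruction at address pc and return the new PC."""
    # Fetch: read the instruction bytes and compute the next sequential address.
    icode, ifun = memory[pc] >> 4, memory[pc] & 0xF
    assert (icode, ifun) == (0x6, 0x0), "this sketch only handles addq"
    ra, rb = memory[pc + 1] >> 4, memory[pc + 1] & 0xF
    val_p = pc + 2

    # Decode: read both operands from the register file.
    val_a, val_b = regs[ra], regs[rb]

    # Execute: the ALU computes the sum, wrapped to 64 bits.
    val_e = (val_b + val_a) & MASK

    # Memory access: addq does not touch memory.

    # Write back: store the result in the destination register rB.
    regs[rb] = val_e

    # PC update: continue with the next instruction.
    return val_p


regs = [0] * 15
regs[0], regs[2] = 5, 10                           # IDs 0 and 2 (assumed %rax, %rdx)
new_pc = step_addq(bytes([0x60, 0x20]), regs, 0)   # addq %rdx, %rax
print(regs[0], new_pc)
```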

Computed values

Stored in CPU on hardware lines/pins

Run the machine codes

Use an example program to show how the CPU runs a program in machine-code form

Pipeline

The whole production process is composed of multiple stages
The worker at each stage does only ONE thing
Products line up on the pipeline, and each one goes through all stages

Rethinking the sequential machine

Every instruction goes through six stages

In the sequential implementation, when the instruction is in one stage, e.g., execute, all the hardware components of the other stages are idle

This under-utilizes the processor hardware

Understanding the performance of pipeline

Executing an instruction takes 300 ps (1 ps = $10^{-12}$ s)

How many instructions can we execute in 1 s? (throughput, IPS)
$1 / (300 \times 10^{-12}) \approx 3.33 \times 10^{9}$ instructions per second (when no instructions execute concurrently)

Decompose the execution of each instruction into 3 stages, each of which takes 100 ps to execute

How many instructions can we execute in 1 s? (throughput, IPS)

$1 / (100 \times 10^{-12}) = 10^{10}$ instructions per second
3 times the throughput of the unpipelined execution above
Adding registers between two consecutive pipeline stages
Each time a clock signal arrives, the result of stage-x is written to the register between stage-x and stage-(x+1)
Once the result of stage-x is written, the next stage can start executing with that result as its input
Accessing registers introduces extra time delay, so the end-to-end latency of finishing a single instruction increases
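The numbers above, and the effect of the pipeline registers, can be reproduced with a short calculation. The 20 ps register delay is an assumed figure chosen for illustration; the notes do not give one, and `pipeline_metrics` is a hypothetical helper.

```python
def pipeline_metrics(stage_delays_ps, register_delay_ps=0.0):
    """Return (throughput in instructions per second, latency in ps)."""
    # The clock must be slow enough for the slowest stage plus one register access.
    clock_ps = max(stage_delays_ps) + register_delay_ps
    throughput = 1e12 / clock_ps                  # one instruction completes per cycle
    latency_ps = clock_ps * len(stage_delays_ps)  # one instruction passes every stage
    return throughput, latency_ps


print(pipeline_metrics([300]))                # unpipelined, register delay ignored
print(pipeline_metrics([100, 100, 100]))      # 3 stages, register delay ignored
print(pipeline_metrics([100, 100, 100], 20))  # 3 stages, assumed 20 ps per register
```

With the assumed register delay, the three-stage design still has a much higher throughput than the unpipelined one, but a single instruction now takes 360 ps instead of 300 ps from start to finish.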

Redesign the CPU with pipeline

Bad pipeline design

Nonuniform partitioning
The clock period is set by the longest stage, so the longest stage determines the throughput and the overall latency

Make the pipeline stages uniform

More stages: deep pipeline
More stage registers -> more time overhead
Sometimes, a stage cannot be decomposed any further
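The trade-offs between these designs can be compared numerically. All stage delays and the 20 ps register delay below are assumed example values, not figures from the notes.

```python
REGISTER_DELAY_PS = 20  # assumed cost of one pipeline register


def clock_period_ps(stage_delays_ps, register_delay_ps):
    """The slowest stage plus one register access sets the clock period."""
    return max(stage_delays_ps) + register_delay_ps


# Each design splits the same 300 ps of logic differently.
designs = {
    "uniform 3-stage": [100, 100, 100],
    "nonuniform 3-stage": [50, 150, 100],
    "uniform 6-stage": [50] * 6,
}

report = {}
for name, delays in designs.items():
    period = clock_period_ps(delays, REGISTER_DELAY_PS)
    report[name] = {
        "period_ps": period,
        "giga_ips": 1000 / period,            # 1e12 / period, expressed in 1e9 IPS
        "latency_ps": period * len(delays),
    }
    print(name, report[name])
```

In this model the nonuniform design is held back by its 150 ps stage, while the six-stage design gains throughput but spends a larger share of every cycle on register overhead.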

Data hazard in pipelines

Data dependencies: the results computed by one instruction are used as the data for a following instruction
Data hazard: data dependencies have the potential to cause an erroneous computation by the pipeline

Solution: Stalling
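Stalling holds a dependent instruction back until the value it needs has been written. The simplified model below assumes that a consumer must start at least `min_gap` cycles after the instruction that produces its operand; the default of 4 is an assumption typical of a five-stage textbook pipeline, and the function and tuple format are hypothetical.

```python
def schedule_with_stalls(program, min_gap=4):
    """Return the start cycle of each instruction after inserting stalls.

    program is a list of (destination register or None, list of source registers).
    """
    start_cycles = []
    last_writer = {}   # register name -> start cycle of its most recent producer
    cycle = 0
    for dest, sources in program:
        for src in sources:
            if src in last_writer:
                # Wait until the producer's result is available (data hazard).
                cycle = max(cycle, last_writer[src] + min_gap)
        start_cycles.append(cycle)
        if dest is not None:
            last_writer[dest] = cycle
        cycle += 1     # at most one new instruction starts per cycle
    return start_cycles


program = [
    ("%rdx", []),                # irmovq $10, %rdx
    ("%rax", []),                # irmovq $3, %rax
    ("%rax", ["%rdx", "%rax"]),  # addq %rdx, %rax depends on both results
]
print(schedule_with_stalls(program))
```

In this run the addq would have started in cycle 2 without hazards; it starts in cycle 5 instead, which corresponds to three stall cycles.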

Supplementary material

Y86-64 instruction set

Stages of some additional instructions

mrmovq
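As a rough guide, the stages of `mrmovq D(rB), rA` can be traced in code. The sketch assumes the textbook Y86-64 encoding (byte 0x50, a register byte, then an 8-byte little-endian displacement) and the conventional signal names valB, valC, valE, valM, and valP; these details are not given in the notes.

```python
def step_mrmovq(memory: bytes, regs: list, pc: int) -> int:
    """Execute one mrmovq instruction at address pc and return the new PC."""
    # Fetch: opcode byte, register byte, 8-byte displacement.
    icode = memory[pc] >> 4
    assert icode == 0x5, "this sketch only handles mrmovq"
    ra, rb = memory[pc + 1] >> 4, memory[pc + 1] & 0xF
    val_c = int.from_bytes(memory[pc + 2:pc + 10], "little")
    val_p = pc + 10

    # Decode: read the base register.
    val_b = regs[rb]

    # Execute: compute the effective address.
    val_e = val_b + val_c

    # Memory access: read an 8-byte little-endian word.
    val_m = int.from_bytes(memory[val_e:val_e + 8], "little")

    # Write back: store the loaded value in rA.
    regs[ra] = val_m

    # PC update.
    return val_p


# mrmovq 8(rB), rA with rA = ID 0 and rB = ID 2, followed by padding and data.
code = bytes([0x50, 0x02]) + (8).to_bytes(8, "little")
memory = code + bytes(14) + (0x1234).to_bytes(8, "little")  # data word at address 24
regs = [0] * 15
regs[2] = 16                     # base address, so the effective address is 24
new_pc = step_mrmovq(memory, regs, 0)
print(hex(regs[0]), new_pc)
```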

jg

cmovle
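Both jg and cmovle depend on a condition evaluated from the condition codes during the execute stage: jg uses it to choose the next PC, and cmovle uses it to decide whether the register write happens. The sketch below assumes the usual function codes (1 for "le", 6 for "g") and the standard signed-comparison formulas; the names are illustrative.

```python
C_LE = 1  # assumed function code for "less than or equal" (cmovle, jle)
C_G = 6   # assumed function code for "greater than" (jg, cmovg)


def condition_holds(ifun: int, zf: int, sf: int, of: int) -> bool:
    """Evaluate a branch or move condition from the condition codes."""
    if ifun == C_LE:
        return bool((sf ^ of) | zf)            # negative result or zero
    if ifun == C_G:
        return not (sf ^ of) and not zf        # neither negative nor zero
    raise ValueError("this sketch only handles le and g")


# Flags after a subtraction whose result is positive: ZF=0, SF=0, OF=0.
print(condition_holds(C_G, 0, 0, 0), condition_holds(C_LE, 0, 0, 0))
# Flags after a subtraction whose result is zero: ZF=1.
print(condition_holds(C_G, 1, 0, 0), condition_holds(C_LE, 1, 0, 0))
```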

x86/x64 instruction length

Section 1.1, Instruction Byte Order, of the AMD manual Vol3 states explicitly: An instruction can be between one and 15 bytes in length.

Exercise notes

Every instruction goes through the Decode, Write back, and Memory stages. The le xxx instruction has only these three stages.

References

https://zh.wikipedia.org/wiki/指令集架構
https://blog.csdn.net/sunshine1314/article/details/2309655
AMD manual Vol3
COMP1411 @ PolyU's PowerPoint
