Architecture

Farzad Fatollahi-Fard edited this page Jun 22, 2015 · 11 revisions

OpenSoC Fabric consists of a hierarchical collection of modules, as shown previously in the High-Level Overview section. Instantiating the top-level module instantiates the specified topology and all other required modules (such as routers and allocators) with the specified parameters. Some lower-level modules expect parameters that the top-level module does not. In those cases, the instantiating module is responsible for deriving the desired values from the user-specified parameters. For example, VC allocators in routers calculate the number of output VCs to allocate based on the number of output ports and VCs. The input and output ports between modules are well defined to allow easy replacement of existing modules with new ones.

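// Abstract base class: defines the ports shared by every allocator implementation.
abstract class Allocator(parms: Parameters) extends Module(parms) {
	val numReqs = parms.get[Int]("numReqs")	// number of requestors
	val numRes = parms.get[Int]("numRes")	// number of resources
	val arbCtor = parms.get[Parameters=>Arbiter]("arbCtor")	// constructor used to build arbiters
	val io = new Bundle {
		// Per resource, one request port per requestor (direction flipped)
		val requests = Vec.fill(numRes) { Vec.fill(numReqs){ new RequestIO }.flip }
		// One port per resource
		val resources = Vec.fill(numRes) { new ResourceIO }
		// Per resource, the index of the chosen requestor
		val chosens = Vec.fill(numRes) { UInt(OUTPUT, Chisel.log2Up(numReqs)) }
	}
}

// Concrete allocators inherit the ports above and add the allocation logic.
class SwitchAllocator(parms: Parameters) extends Allocator(parms) {
	// Implementation
}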

In addition, for each module there is an abstract implementation that includes the inputs, outputs, and functionality common to all child modules of that type. For example, the allocator abstract module defines the necessary input and output ports, as shown in the code segment above. Specific allocator implementations, such as separable allocators, are defined as child modules that inherit that set of inputs and outputs and implement the necessary functionality.

All lower-level modules also include unit testers to aid development of new modules. Those unit testers should be used to verify functionality before integrating with the network. Unit testers may need to be extended or modified depending on what exactly needs to be tested in each module.

This section goes into more detail for each specific module:

Arbiter

The arbiter module takes a variable number of requests and grants one of them each cycle to a specific resource. Only a single grant is generated because it is assumed that there is a single resource for which to arbitrate. Arbiters are used as building blocks for some types of allocators. Requests also have a separate lock input that, when asserted, maintains the grant to that requestor.

Child modules: RRarbiter: This implements a round-robin arbiter. Once a requestor releases its grant, the next requestor in order gets priority.
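The round-robin and lock behavior described above can be sketched as a small software model. This is plain Scala rather than Chisel, and it is not the RRarbiter source: the class name `RoundRobinArbiterModel` and the `step` method are hypothetical, and the choice to advance the priority pointer at grant time is an assumption made for the sketch.

```scala
// Behavioral (software) model of a round-robin arbiter with a lock input.
// Hypothetical names; this only illustrates the policy, not the hardware.
final class RoundRobinArbiterModel(numReqs: Int) {
  require(numReqs > 0, "need at least one requestor")
  private var nextPriority = 0            // requestor that is checked first
  private var holder: Option[Int] = None  // requestor granted in the previous cycle

  /** One arbitration cycle. Returns the granted requestor index, if any. */
  def step(requests: Seq[Boolean], locks: Seq[Boolean]): Option[Int] = {
    require(requests.length == numReqs && locks.length == numReqs)
    val grant = holder match {
      // An asserted lock keeps the grant with the current holder.
      case Some(h) if locks(h) => Some(h)
      case _ =>
        // Scan all requestors once, starting at the priority pointer.
        val winner = (0 until numReqs)
          .map(i => (nextPriority + i) % numReqs)
          .find(i => requests(i))
        // Assumption: the requestor after the winner gets priority next.
        winner.foreach(w => nextPriority = (w + 1) % numReqs)
        winner
    }
    holder = grant
    grant
  }
}
```

Calling `step` once per cycle with the current request and lock vectors returns at most one grant, which matches the single-resource assumption of the arbiter.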

Allocator

The allocator class grants a variable number of resources to a variable number of requestors. Resources can be VCs to implement a VC allocator, or output ports for a switch allocator. Each resource also has an input to specify whether it is eligible (ready) to be allocated. Finally, requestors that previously received a grant may lock the resource such that the allocator does not grant it to another requestor. This is used to ensure that the VC allocator does not grant an output VC to another packet before the previous one is completely transmitted to the output.

Child modules: Allocator: Currently, the allocator just takes a constructor for an Arbiter and instantiates arbiters accordingly.
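As a rough illustration of the eligibility and lock rules above, the following plain-Scala sketch picks one requestor per resource. The names `AllocatorModel` and `allocate` are hypothetical, a fixed-priority choice stands in for whatever arbiter `arbCtor` would build, and the sketch does not try to stop one requestor from winning several resources in the same cycle.

```scala
// Software sketch of per-resource allocation with ready and lock inputs.
// Hypothetical names; fixed priority is an assumption standing in for an arbiter.
object AllocatorModel {
  /** requests(res)(req) is true when requestor `req` wants resource `res`.
    * ready(res) marks the resource as eligible to be allocated.
    * lockedBy(res) names the requestor still holding the resource, if any.
    * Returns, per resource, the chosen requestor index, if any. */
  def allocate(
      requests: Seq[Seq[Boolean]],
      ready: Seq[Boolean],
      lockedBy: Seq[Option[Int]]
  ): Seq[Option[Int]] =
    requests.indices.map { res =>
      lockedBy(res) match {
        // A locked resource is never handed to another requestor.
        case Some(owner) => Some(owner)
        // A resource that is not ready is skipped this cycle.
        case None if !ready(res) => None
        case None =>
          val first = requests(res).indexWhere(r => r)
          if (first >= 0) Some(first) else None
      }
    }
}
```

For a VC allocator, each entry of the result would correspond to one output VC; for a switch allocator, to one output port.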

InputNetworkInterface

This class implements synthesizable hardware to divide packets into the appropriate number of flits. It is used with test harnesses that inject packets into the network. It uses an instance of the class "PacketToFlit".

Child modules: InputPacketInterface

InputToFlit

This is an abstract class that takes a data structure as input and generates a collection of flits.

Child modules: PacketToFlit: This module takes a packet as a parameter and divides it into flits of the appropriate format. It consists of an FSM and a packet queue. It also contains two functions: one to create a head flit and one to create a body flit.
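The packet-to-flit split can be sketched in software as follows. The flit and packet fields here (`packetId`, `dest`, `payload`, `isTail`) are assumptions chosen for illustration and do not reflect the actual OpenSoC Fabric flit format; the two helper functions only mirror the head/body split described above.

```scala
// Software sketch of dividing a packet into one head flit and several body flits.
// All field names are hypothetical; the real flit format is not shown in this page.
object PacketToFlitModel {
  sealed trait Flit
  final case class HeadFlit(packetId: Int, dest: Int, isTail: Boolean) extends Flit
  final case class BodyFlit(packetId: Int, payload: Int, isTail: Boolean) extends Flit
  final case class Packet(id: Int, dest: Int, payload: Seq[Int])

  // The head flit carries the destination; it is also the tail if there is no payload.
  def createHeadFlit(p: Packet): HeadFlit =
    HeadFlit(p.id, p.dest, isTail = p.payload.isEmpty)

  // Each body flit carries one payload word; the last one is marked as the tail.
  def createBodyFlit(p: Packet, index: Int): BodyFlit =
    BodyFlit(p.id, p.payload(index), isTail = index == p.payload.length - 1)

  def toFlits(p: Packet): Seq[Flit] = {
    val body: Seq[Flit] = p.payload.indices.map(i => createBodyFlit(p, i))
    createHeadFlit(p) +: body
  }
}
```

In hardware the same split is sequenced by the FSM, one flit per step, rather than produced as a complete list.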

InjectionChannelQ

This module implements an injection queue that receives flits from the network endpoints (traffic sources) and injects them into the attached router while respecting credits. The queue has a single producer and a single consumer. Flits are sent to the consumer when credits are available.
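A minimal software sketch of a credit-gated FIFO in this spirit is shown below. `InjectionQueueModel`, `trySend`, and `creditReturned` are hypothetical names, and the single shared credit counter is a simplifying assumption.

```scala
import scala.collection.mutable

// Software sketch of a single-producer, single-consumer queue gated by credits.
// Hypothetical names; a single credit counter is assumed for simplicity.
final class InjectionQueueModel[A](initialCredits: Int) {
  require(initialCredits >= 0, "credit count cannot be negative")
  private val queue = mutable.Queue.empty[A]
  private var credits = initialCredits

  /** Producer side: the endpoint hands over a flit. */
  def enqueue(flit: A): Unit = queue.enqueue(flit)

  /** The consumer returned a credit, so one more flit may be sent. */
  def creditReturned(): Unit = credits += 1

  /** Consumer side: sends the oldest flit only if a credit is available. */
  def trySend(): Option[A] =
    if (queue.nonEmpty && credits > 0) {
      credits -= 1
      Some(queue.dequeue())
    } else None
}
```

Flits leave in arrival order, and a flit waits in the queue for as long as the credit count is zero.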

RouterRegFile

This implements a register file used by the router to store head flit information to perform allocation and routing while the head flit itself resides in the input queue.

Switch

Implements a mux-based crossbar with a variable number of inputs and outputs.
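Functionally, a mux-based crossbar gives every output its own multiplexer over all inputs. The sketch below models that in plain Scala; `CrossbarModel` and `route` are hypothetical names, and using `None` for an idle output is an assumption of the sketch.

```scala
// Software sketch of a mux-based crossbar: one select value per output.
// Hypothetical names; None models an output that is idle in this cycle.
object CrossbarModel {
  /** select(out) is the index of the input driving output `out`, if any. */
  def route[A](inputs: Seq[A], select: Seq[Option[Int]]): Seq[Option[A]] =
    select.map(sel => sel.map(i => inputs(i)))
}
```

The number of inputs is `inputs.length` and the number of outputs is `select.length`, so the two can differ.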

Router/VCRouter

The router abstract class only defines the set of input and output flit channels that all routers must take.

Child modules: WormholeRouter, VCRouter. A diagram of the VC router is shown below. The red injection and ejection queues are at the network boundaries and thus not inside the router. Incoming flits are stored in input queues, which are divided per VC. Each input VC queue stores state for the flit at its head: the pipeline stage it is currently in, the chosen output, and the range of eligible VCs provided by the routing function (there is one routing function instance per input VC).

VC Router Diagram

The router includes one VC allocator that performs allocation for "input ports" x "VCs" requestors and "output ports" x "VCs" resources. Therefore, each output VC is a separate resource and can be allocated independently. VCs are assigned (locked) to packets until the entire packet has left the input VC. This prevents interleaving of packets inside the same VC. The switch allocator performs an allocation for the same "input ports" x "VCs" requestors but only "output ports" resources. That is because the switch is a shared physical resource and only one flit may depart an input in any given cycle, even if there are multiple eligible flits in the same input (but in different VCs).
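The requestor and resource counts above reduce to simple arithmetic, sketched here in plain Scala. `RouterAllocatorSizing` and `Sizes` are hypothetical names; the two fields play the roles of the `numReqs` and `numRes` allocator parameters.

```scala
// Sketch of how allocator sizes follow from the router's port and VC counts.
// Hypothetical helper; it only restates the products described in the text.
object RouterAllocatorSizing {
  final case class Sizes(requestors: Int, resources: Int)

  // VC allocator: every input VC requests, every output VC is a resource.
  def vcAllocator(inPorts: Int, outPorts: Int, vcs: Int): Sizes =
    Sizes(inPorts * vcs, outPorts * vcs)

  // Switch allocator: same requestors, but only one resource per output port.
  def switchAllocator(inPorts: Int, outPorts: Int, vcs: Int): Sizes =
    Sizes(inPorts * vcs, outPorts)
}
```

For example, a router with 5 input ports, 5 output ports, and 2 VCs would have 10 requestors for both allocators, 10 VC-allocator resources, and 5 switch-allocator resources.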

At the output side of the router there is a set of pipeline registers. Routers use credit-based flow control. Flits departing an input VC generate a credit to the previous router to signify that a slot in the specified VC was just made available. Arriving credits increase the credit count for a given output VC (output VCs correspond to input VCs of the next router). Output VCs with a zero credit count are ineligible to receive flits to prevent buffer overflow.
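The credit bookkeeping for output VCs can be sketched as one counter per VC. This is a software model with hypothetical names (`OutputCreditModel`, `sendFlit`, `creditArrived`), and it assumes each counter starts at the depth of the downstream input VC buffer.

```scala
// Software sketch of per-output-VC credit counters.
// Assumption: each counter starts at the downstream buffer depth.
final class OutputCreditModel(numVCs: Int, bufferDepth: Int) {
  private val credits = Array.fill(numVCs)(bufferDepth)

  /** An output VC with zero credits may not receive flits. */
  def eligible(vc: Int): Boolean = credits(vc) > 0

  /** Sending a flit consumes one downstream buffer slot. */
  def sendFlit(vc: Int): Unit = {
    require(eligible(vc), s"output VC $vc has no credits")
    credits(vc) -= 1
  }

  /** A returning credit means the next router freed one slot in that VC. */
  def creditArrived(vc: Int): Unit = {
    require(credits(vc) < bufferDepth, s"output VC $vc already has all its credits")
    credits(vc) += 1
  }

  def count(vc: Int): Int = credits(vc)
}
```

Because a flit is only sent when `eligible` is true, the downstream buffer can never receive more flits than it has free slots.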

Currently, once packets reach the head of their input VCs they spend one cycle to route, one cycle to get allocated a VC (assuming no contention), one cycle to allocate the switch, and then one cycle to traverse the switch and get stored into the output registers so that they can traverse the channel.

The wormhole router is similar to VCRouter but lacks VCs. As such, it doesn't have the VC allocation pipeline stage.
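The pipeline stages described above can be listed explicitly to compare the two routers. The stage names and the `RouterPipelineModel` object are hypothetical, and the cycle counts assume one cycle per stage with no contention, as stated for the VC router; applying the same assumption to the wormhole router is an inference from the text.

```scala
// Sketch of the router pipeline stages, one cycle each without contention.
// Hypothetical names; the wormhole stage list is inferred from the text.
object RouterPipelineModel {
  sealed trait Stage
  case object Route extends Stage
  case object VcAllocation extends Stage
  case object SwitchAllocation extends Stage
  case object SwitchTraversal extends Stage

  val vcRouterStages: List[Stage] =
    List(Route, VcAllocation, SwitchAllocation, SwitchTraversal)

  // The wormhole router has no VCs, so it skips VC allocation.
  val wormholeStages: List[Stage] =
    vcRouterStages.filterNot(_ == VcAllocation)

  def zeroContentionCycles(stages: List[Stage]): Int = stages.length
}
```

Under these assumptions the VC router takes four cycles per hop inside the router and the wormhole router takes three.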

RoutingFunction

A routing function receives the necessary information from a head flit (because that contains the destination information) and returns a range of eligible output ports and output VCs for the packet the head flit belongs to. Routing is performed only for the head flit of a packet since body flits follow the same route.

Child modules: CMeshDOR, CFlatBflyDOR. Each child class is specific to a topology. CMeshDOR implements dimension-order routing for the concentrated mesh. CFlatBFlyDOR implements dimension-order routing for the flattened butterfly. Dimension-order routing exhausts all hops in one dimension before switching to the next. In a 2D mesh, this means a packet will perform all of its hops in the X dimension before taking hops in Y. Because dimension-order routing prevents cyclic dependencies, it is inherently deadlock free, so packets are eligible for all VCs regardless of type.
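Dimension-order routing on a 2D mesh can be sketched in a few lines. This is a software illustration, not the CMeshDOR source: the `Direction` names and the `(x, y)` coordinate tuples are assumptions, and concentration is ignored.

```scala
// Software sketch of X-then-Y dimension-order routing on a 2D mesh.
// Hypothetical names; coordinates are (x, y) and concentration is ignored.
object DimensionOrderRoutingModel {
  sealed trait Direction
  case object XPlus extends Direction
  case object XMinus extends Direction
  case object YPlus extends Direction
  case object YMinus extends Direction
  case object Eject extends Direction

  /** Next hop from `cur` toward `dest`: finish X before touching Y. */
  def nextHop(cur: (Int, Int), dest: (Int, Int)): Direction =
    if (dest._1 > cur._1) XPlus
    else if (dest._1 < cur._1) XMinus
    else if (dest._2 > cur._2) YPlus
    else if (dest._2 < cur._2) YMinus
    else Eject

  /** Full sequence of hops, ending with ejection at the destination. */
  def path(src: (Int, Int), dest: (Int, Int)): List[Direction] =
    nextHop(src, dest) match {
      case Eject  => List(Eject)
      case XPlus  => XPlus :: path((src._1 + 1, src._2), dest)
      case XMinus => XMinus :: path((src._1 - 1, src._2), dest)
      case YPlus  => YPlus :: path((src._1, src._2 + 1), dest)
      case YMinus => YMinus :: path((src._1, src._2 - 1), dest)
    }
}
```

A packet never turns from Y back to X under this rule, which is why no cyclic channel dependency can form.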

Topology

The topology class contains all routers and channels, and is responsible for connecting them correctly. Every topology follows a port-numbering convention that the routing function for that topology must be aware of, because output channels are stored in a vector inside each router. For example, if the channel with the lowest index connects to the local endpoint (ejects from the network), the routing function must know that. Our current topologies connect injection/ejection channels to the lowest-index input and output ports. The remaining identifiers are assigned in dimension order, starting from positive directions. For example, the output channel that connects to a router with a greater X coordinate (e.g., from router 0,0 to 1,0) has an index of 2 in the vector, the output channel that connects to a router with a lower X has an index of 3, etc.
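One way to express this port-numbering convention for a mesh is the small formula below. It is a sketch under assumptions: `PortIndexModel` and `outputPort` are hypothetical, the local (injection/ejection) ports are assumed to occupy indices 0 until `concentration`, and a concentration of 2 is assumed only because it reproduces the example indices 2 and 3 given above.

```scala
// Sketch of a mesh port-index convention: local ports first, then
// two ports per dimension, positive direction before negative.
// Assumption: local ports occupy indices 0 until `concentration`.
object PortIndexModel {
  /** dim: 0 for X, 1 for Y, and so on.
    * positive: true for the neighbor with the greater coordinate. */
  def outputPort(concentration: Int, dim: Int, positive: Boolean): Int =
    concentration + 2 * dim + (if (positive) 0 else 1)
}
```

With a concentration of 2, the X ports get indices 2 and 3 as in the example, and the same rule would place the Y ports at 4 and 5.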

The topology class also acts as the interface between the network and traffic generation. The topology exposes injection and ejection channels that the top-level module uses to inject and eject traffic. Finally, the topology class may contain some statistics counters.

Child modules: CMesh, CFlatFly: The CMesh topology class instantiates a concentrated mesh of any dimension, while CFlatFly implements a concentrated flattened butterfly of any dimension. As specified in the User Parameters section, the number of routers per dimension in each of the two topologies need not be the same. In other words, the topologies do not have to be square. The CMesh connects every router to its adjacent neighbors, as shown in the figure below. CFlatFly connects every router to every other router that shares at least one dimension. In the case of a 2D flattened butterfly, which is shown below, every router connects to every other router in the same row and in the same column.

Figure: Example 2D 4x4 Mesh Topology
Figure: Example 2D 4x4 Flattened Butterfly Topology
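The difference in connectivity between the two 2D topologies can be sketched by computing each router's neighbors. This is a software illustration with hypothetical names (`TopologyNeighborsModel`, `meshNeighbors`, `flatFlyNeighbors`); it ignores concentration and assumes a mesh without wraparound links.

```scala
// Software sketch of router connectivity in a 2D mesh and a 2D flattened butterfly.
// Hypothetical names; concentration is ignored and the mesh has no wraparound.
object TopologyNeighborsModel {
  type Coord = (Int, Int)

  /** Mesh: only the adjacent routers in X and Y that lie inside the grid. */
  def meshNeighbors(r: Coord, xSize: Int, ySize: Int): Set[Coord] = {
    val (x, y) = r
    Set((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)).filter {
      case (a, b) => a >= 0 && a < xSize && b >= 0 && b < ySize
    }
  }

  /** Flattened butterfly: every other router in the same row or column. */
  def flatFlyNeighbors(r: Coord, xSize: Int, ySize: Int): Set[Coord] = {
    val (x, y) = r
    val sameRow = (0 until xSize).map(a => (a, y))
    val sameCol = (0 until ySize).map(b => (x, b))
    val all: Set[Coord] = (sameRow ++ sameCol).toSet
    all - r
  }
}
```

In a 4x4 network, an interior mesh router has 4 neighbors while every flattened-butterfly router has 6; passing different `xSize` and `ySize` values covers the non-square case.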