Memory Clustering in the Enumerator #73

Draft: wants to merge 41 commits into base: master
41 commits
215e6f9
Added notes on how to possibly start.
Mar 7, 2020
449456b
Idea on how to implement checking if an instruction is part of a clus…
Mar 10, 2020
a6376ab
Added LLVM's method to check if we should cluster MemOps
Mar 11, 2020
186a1f3
Idea for implementation (WIP)
Mar 13, 2020
c4b0973
Fixed some compilation issues
Mar 13, 2020
75b02f4
Fixed some compiler bugs, and added experimental cost.
Mar 13, 2020
00501ae
Cleaned up debug statements. NFC
Mar 13, 2020
3603da9
Added clustering cost to ChkCostFsblty, and added TODOs.
Mar 13, 2020
035272b
Fix typo for variable and disabled terminating enumerator when we fin…
Mar 13, 2020
a2cd231
Debugging statements and reset mem clustering info in InitForSchduling
Mar 17, 2020
760c38d
Added setting or memory clustering in settings. Fixed clustering for …
Mar 17, 2020
8b5e2cc
Fix missing var.
Mar 17, 2020
93f01e3
Fix memory segmentation
Mar 17, 2020
111d5eb
Use an integer instead of a vector for cluster groups.
Mar 17, 2020
298fb0f
Fix error with static variable.
Mar 18, 2020
30c7d9c
Added MEM heuristic priority. Not yet implemented.
Mar 19, 2020
d712460
ALso save state for cluster of size 1.
Mar 19, 2020
91967ba
First implementation of MEM heuristic.
Mar 19, 2020
ed248f0
Print out ready list and changes to linked list (Vlad)
Mar 20, 2020
3664057
Extract more information about each cluster to be later used in lower…
Mar 20, 2020
b8e4ac5
Error fixes
Mar 20, 2020
b519e25
First implementation of cost function
Mar 20, 2020
26a89c3
Some code cleanup. No functional changes.
Mar 20, 2020
ec8e0bd
Missed variable to clean up
Mar 20, 2020
f467f83
Fix issues with enumerator not updating priorities
Mar 27, 2020
7fcb9a4
Added store clustering and debugging statements
Apr 4, 2020
cccccc3
Fix segmentation fault due to copying ready list when a dynamic heuri…
Apr 9, 2020
b4f55af
Updated comments for easier review.
Apr 23, 2020
9e3c5cc
Merge branch 'master' into memory-clustering-project
Apr 30, 2020
46b9542
Fix not accounting for multiple clusters within the same store-chain.
Apr 30, 2020
19184f5
Working implementation of clustering using B&B. No history domination.
Jun 4, 2020
4bfbc61
Copy in dag mutation fix.
Jun 10, 2020
0d80260
Copy verify schedule bugfix patch for dag mutation fix.
Jun 10, 2020
58978df
Missed a file to copy over.
Jun 10, 2020
ee1d32f
Ignore artificial edges for potential clustering and display clusters…
Jun 12, 2020
decb49f
Add option to print cluster information after scheduling and revert c…
Jul 18, 2020
913f83d
Added 2nd ILP pass with lower target occupancy
Aug 16, 2020
b01eeff
Add two conditions for re-scheduling ILP pass; Minimum occupancy and …
Aug 17, 2020
9bbb91d
Fix ILP Improvement calculation bugs
Aug 19, 2020
6b28d0d
Disable heuristic scheduler and B&B enumerator in 3rd ILP pass.
Aug 21, 2020
527d08f
Fix incorrect statement
Aug 24, 2020
29 changes: 28 additions & 1 deletion example/optsched-cfg/sched.ini
@@ -8,12 +8,38 @@ USE_OPT_SCHED YES
# Same options as use optimal scheduling.
PRINT_SPILL_COUNTS YES

# Print clustering information
# YES
# NO
PRINT_CLUSTER YES

# Use two pass scheduling approach.
# First pass minimizes RP and second pass tries to balance RP and ILP.
# YES
# NO
USE_TWO_PASS NO

# Sets a limit for occupancy in the second ILP pass. We will not go below this
# occupancy when attempting rescheduling.
# Valid values: 1-10 (whole integers)
MIN_OCCUPANCY_FOR_RESCHEDULE 3

# Sets the required schedule length improvement percentage for the second ILP
# pass. If we do not meet this minimum improvement then we do not keep the
# lower occupancy schedules.
# Valid values: 1-100 (whole integers)
MIN_ILP_IMPROVEMENT 10

# Allow enumerator to try to cluster memory operations together in the second
# pass.
# YES
# NO
CLUSTER_MEMORY_OPS NO

# The weight for clustering. This factor determines the importance of
# trying to find clusters when enumerating.
CLUSTER_WEIGHT 1000

# These 3 flags control which schedulers will be used.
# Each one can be individually toggled. The heuristic
# list scheduler or ACO must be run before the
@@ -85,7 +111,8 @@ HEURISTIC LUC_CP_NID
ENUM_HEURISTIC LUC_CP_NID

# The heuristic used for the enumerator in the second pass in the two-pass scheduling approach.
-# Same valid values as HEURISTIC.
+# Same valid values as HEURISTIC with an additional heuristic:
+# Cluster: Favor instructions that are part of an active memory clustering group.
SECOND_PASS_ENUM_HEURISTIC LUC_CP_NID

# The spill cost function to be used. Valid values are:
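The comments on MIN_OCCUPANCY_FOR_RESCHEDULE and MIN_ILP_IMPROVEMENT describe an acceptance rule for the second ILP pass. A minimal sketch of how such a rule could be checked is below. The struct, both function names, and the definition of "improvement" as the percentage reduction in schedule length are assumptions for illustration; this diff does not show how the PR computes them.

```cpp
#include <cassert>

// Hypothetical mirror of two sched.ini settings (names are illustrative).
struct RescheduleSettings {
  int MinOccupancy = 3;          // MIN_OCCUPANCY_FOR_RESCHEDULE
  int MinIlpImprovementPct = 10; // MIN_ILP_IMPROVEMENT
};

// Assumed definition: percentage by which the schedule length shrank.
// Negative when the new schedule is longer than the old one.
inline int ilpImprovementPct(int OldLength, int NewLength) {
  if (OldLength <= 0)
    return 0;
  return (OldLength - NewLength) * 100 / OldLength;
}

// Keep the lower-occupancy schedule only if it stays at or above the
// occupancy floor and improves schedule length by the required percentage.
inline bool keepLowerOccupancySchedule(const RescheduleSettings &S,
                                       int NewOccupancy, int OldLength,
                                       int NewLength) {
  if (NewOccupancy < S.MinOccupancy)
    return false;
  return ilpImprovementPct(OldLength, NewLength) >= S.MinIlpImprovementPct;
}
```

With the defaults above, a schedule at occupancy 3 that shortens the region from 100 to 90 cycles would be kept, while one that only reaches 95 cycles would be discarded.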
5 changes: 4 additions & 1 deletion include/opt-sched/Scheduler/OptSchedDDGWrapperBase.h
@@ -14,9 +14,12 @@ class OptSchedDDGWrapperBase {
public:
virtual ~OptSchedDDGWrapperBase() = default;

-virtual void convertSUnits() = 0;
+virtual void convertSUnits(bool IgnoreRealEdges,
+bool IgnoreArtificialEdges) = 0;

virtual void convertRegFiles() = 0;

virtual int findPossibleClusters(bool IsLoad) = 0;
};

} // namespace opt_sched
91 changes: 90 additions & 1 deletion include/opt-sched/Scheduler/bb_spill.h
@@ -12,8 +12,11 @@ Last Update: Apr. 2011
#include "opt-sched/Scheduler/OptSchedTarget.h"
#include "opt-sched/Scheduler/defines.h"
#include "opt-sched/Scheduler/sched_region.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include <map>
#include <memory>
#include <set>
#include <vector>

@@ -32,6 +35,91 @@ class BBWithSpill : public SchedRegion {

InstCount crntSpillCost_;
InstCount optmlSpillCost_;
int CurrentClusterCost;

/// Used to calculate the dynamic lower bound for clustering.
llvm::SmallVector<int, 32> ClusterCount;
llvm::SmallVector<int, 32> ClusterInstrRemainderCount;
int ClusterGroupCount;

void computeAndPrintClustering(InstSchedule *Sched) override;

/// Print the current clusters found so far in the schedule.
void printCurrentClustering() override;

void initForClustering();

/// Calculate the lower bound cost for memory operations clustering and
/// return the lower bound cost. Does not take into account the clustering
/// weight.
int calculateClusterStaticLB();

/// Helper function for clustering to save the state of the current cluster.
void saveCluster(SchedInstruction *inst);

/// Helper function for clustering to start a new clustering.
void initCluster(SchedInstruction *inst);

/// Reset the active cluster to 0 (none).
void resetActiveCluster(SchedInstruction *inst);

/// Helper function to restore the previous cluster.
void restorePreviousCluster(SchedInstruction *inst);

bool isClusterFinished();

int calculateClusterDLB();

/// Current cluster size
unsigned int CurrentClusterSize;

/// The minimum number of cluster blocks possible.
int MinClusterBlocks;

/// The minimum number of cluster blocks + the optimistic expected cluster
/// blocks remaining.
int DynamicClusterLowerBound;

/// Current active cluster group.
int ClusterActiveGroup;

int StartCycle;

/// Data struct to contain information about the previous clusters
struct PastClusters {
/// The cluster group
int ClusterGroup;
/// Size of the cluster when it was ended by an instruction not in the
/// cluster
int ClusterSize;

/// Instruction number that ended this cluster. Used to check if we should
/// restore the cluster state when backtracking.
int InstNum;

int Start;

/// Contains the actual names of the instructions in the cluster. Only used
/// for printing and debugging purposes.
std::unique_ptr<llvm::SmallVector<llvm::StringRef, 4>> InstrList;

/// Constructor for this struct
PastClusters(int Cluster, int Size, int Instructions, int CycleStart)
: ClusterGroup(Cluster), ClusterSize(Size), InstNum(Instructions),
Start(CycleStart) {}
};

/// Vector containing the (n-1) past clusters
llvm::SmallVector<std::unique_ptr<PastClusters>, 4> PastClustersList;

/// Contains the actual names of the instructions in the current cluster.
/// Only used for printing and debugging purposes.
std::unique_ptr<llvm::SmallVector<llvm::StringRef, 4>> InstrList;

/// Pointer to the last cluster. This is kept out of the vector to avoid
/// having to fetch it every time we compare the current instruction
/// number to the one that ended the cluster.
std::unique_ptr<PastClusters> LastCluster;

// The target machine
const OptSchedTarget *OST;
@@ -103,7 +191,8 @@ class BBWithSpill : public SchedRegion {
void InitForCostCmputtn_();
InstCount CmputDynmcCost_();

-void UpdateSpillInfoForSchdul_(SchedInstruction *inst, bool trackCnflcts);
+void UpdateSpillInfoForSchdul_(SchedInstruction *inst, bool trackCnflcts,
+int Start);
void UpdateSpillInfoForUnSchdul_(SchedInstruction *inst);
void SetupPhysRegs_();
void CmputCrntSpillCost_();
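The comments on PastClusters say the instruction number that ended a cluster is stored so the cluster can be restored when backtracking. The sketch below illustrates that save/restore idea in isolation. ClusterTracker and its methods are hypothetical names, not the PR's implementation. It assumes instructions are unscheduled in strict reverse order, that group 0 means "not part of any cluster", and it keeps every saved cluster in one std::vector instead of the separate LastCluster pointer used in the header.

```cpp
#include <cassert>
#include <vector>

// Snapshot of a cluster that was ended by a later instruction.
struct PastCluster {
  int Group;   // cluster group id
  int Size;    // size at the moment the cluster ended
  int EndedBy; // number of the instruction that ended it
};

// Hypothetical tracker for the active cluster during enumeration.
class ClusterTracker {
  int ActiveGroup = 0; // 0 means no active cluster
  int ActiveSize = 0;
  std::vector<PastCluster> Past;

public:
  // Schedule instruction InstNum, which belongs to Group (0 if none).
  void schedule(int InstNum, int Group) {
    if (Group != 0 && Group == ActiveGroup) {
      ++ActiveSize; // the instruction extends the open cluster
      return;
    }
    // The instruction ends the open cluster, so save it (even at size 1).
    if (ActiveGroup != 0)
      Past.push_back({ActiveGroup, ActiveSize, InstNum});
    ActiveGroup = Group;
    ActiveSize = Group != 0 ? 1 : 0;
  }

  // Undo the most recent schedule() call for InstNum.
  void unschedule(int InstNum) {
    if (!Past.empty() && Past.back().EndedBy == InstNum) {
      // This instruction ended a cluster: restore the saved state.
      ActiveGroup = Past.back().Group;
      ActiveSize = Past.back().Size;
      Past.pop_back();
      return;
    }
    if (ActiveGroup != 0 && --ActiveSize == 0)
      ActiveGroup = 0;
  }

  int activeGroup() const { return ActiveGroup; }
  int activeSize() const { return ActiveSize; }
};
```

Scheduling instructions 1 and 2 from group 1 and then instruction 3 from group 2 saves a group-1 cluster of size 2; unscheduling instruction 3 brings that cluster back as the active one.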
26 changes: 25 additions & 1 deletion include/opt-sched/Scheduler/data_dep.h
@@ -13,6 +13,7 @@ Last Update: Mar. 2011
#include "opt-sched/Scheduler/buffers.h"
#include "opt-sched/Scheduler/defines.h"
#include "opt-sched/Scheduler/sched_basic_data.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallVector.h"
#include <memory>

@@ -291,7 +292,24 @@ class DataDepGraph : public llvm::opt_sched::OptSchedDDGWrapperBase,

RegisterFile *getRegFiles() { return RegFiles.get(); }

// Memory clustering helper functions
int getMinClusterCount() { return MinClusterCount; }
void setMinClusterCount(int Max) { MinClusterCount = Max; }
int getTotalInstructionsInAllClusters() {
return TotalInstructionsInAllClusters;
}
void setTotalInstructionsInAllClusters(int Max) {
TotalInstructionsInAllClusters = Max;
}
int getTotalInstructionsInCluster(int Cluster);

protected:
int MinClusterCount;
int TotalInstructionsInAllClusters;
/// Map the cluster block to the total number of instructions found in the
/// block
MapVector<int, int> MaxInstructionsInEachClusters;

// TODO(max): Get rid of this.
// Number of basic blocks
int32_t bscBlkCnt_;
@@ -391,7 +409,7 @@ class DataDepGraph : public llvm::opt_sched::OptSchedDDGWrapperBase,
InstCount fileUB, int blkNum);
FUNC_RESULT FinishNode_(InstCount nodeNum, InstCount edgeCnt = -1);
void CreateEdge_(InstCount frmInstNum, InstCount toInstNum, int ltncy,
-DependenceType depType);
+DependenceType depType, bool IsArtificial = false);

FUNC_RESULT Finish_();

@@ -629,6 +647,9 @@ class InstSchedule {
// The schedule's spill cost according to the cost function used
InstCount spillCost_;

// The number of clusters
int ClusterSize;

// An array of peak reg pressures for all reg types in the schedule
InstCount *peakRegPressures_;

@@ -676,6 +697,8 @@ class InstSchedule {
InstCount GetExecCost() const;
void SetSpillCost(InstCount cost);
InstCount GetSpillCost() const;
void setClusterSize(int size);
int getClusterSize() const;

void ResetInstIter();
InstCount GetFrstInst(InstCount &cycleNum, InstCount &slotNum);
@@ -699,6 +722,7 @@ class InstSchedule {
void Print(std::ostream &out, char const *const title);
void PrintInstList(FILE *file, DataDepGraph *dataDepGraph,
const char *title) const;
void Print(std::ostream &out, char const *const title, DataDepGraph *ddg);
void PrintRegPressures() const;
bool Verify(MachineModel *machMdl, DataDepGraph *dataDepGraph);
void PrintClassData();
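The DataDepGraph additions declare getTotalInstructionsInCluster(int Cluster) and a map from cluster block to instruction count, but this header does not show the definition. One plausible shape for that bookkeeping is sketched below. ClusterTotals, addInstruction, and getClusterGroupCount are hypothetical, and std::map stands in for llvm::MapVector so the example has no LLVM dependency.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the per-cluster counters kept by DataDepGraph.
class ClusterTotals {
  // Maps a cluster group to the number of instructions found in it.
  std::map<int, int> InstructionsInEachCluster;
  int TotalInstructionsInAllClusters = 0;

public:
  // Record one more memory instruction belonging to the given cluster.
  void addInstruction(int Cluster) {
    ++InstructionsInEachCluster[Cluster];
    ++TotalInstructionsInAllClusters;
  }

  // Returns 0 for a cluster that was never recorded, without inserting it.
  int getTotalInstructionsInCluster(int Cluster) const {
    auto It = InstructionsInEachCluster.find(Cluster);
    return It == InstructionsInEachCluster.end() ? 0 : It->second;
  }

  int getTotalInstructionsInAllClusters() const {
    return TotalInstructionsInAllClusters;
  }

  // Number of distinct cluster groups seen so far.
  int getClusterGroupCount() const {
    return static_cast<int>(InstructionsInEachCluster.size());
  }
};
```

Using find() rather than operator[] in the getter keeps the lookup const and avoids creating empty entries for clusters that do not exist.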
67 changes: 65 additions & 2 deletions include/opt-sched/Scheduler/enumerator.h
@@ -153,6 +153,12 @@ class EnumTreeNode {
InstCount peakSpillCost_;
InstCount spillCostSum_;
InstCount totalCost_ = -1;
int ClusterCost;
int ClusterActiveGroup;
int ClusterAbsorbCount;
int ClusterDLB;
int ClusterTotalCost = -1;
int ClusterBestCost;
bool totalCostIsActualCost_ = false;
ReserveSlot *rsrvSlots_;

@@ -276,6 +282,18 @@ class EnumTreeNode {
inline void SetSpillCostSum(InstCount cost);
inline InstCount GetSpillCostSum();

inline void setClusteringCost(int Cost);
inline int getClusteringCost();
inline void setCurClusteringGroup(int Group);
inline int getCurClusteringGroup();
inline void setClusterAbsorbCount(int Absorb);
inline int getClusterAbsorbCount();
inline void setClusterLwrBound(int ClusterDynamicLowerBound);
inline int getClusterLwrBound();
inline void setTotalClusterCost(int Cost);
inline int getTotalClusterCost();
inline bool isClustering();

bool ChkInstRdndncy(SchedInstruction *inst, int brnchNum);
bool IsNxtSlotStall();

@@ -317,6 +335,9 @@ class Enumerator : public ConstrainedScheduler {
friend class HistEnumTreeNode;
friend class CostHistEnumTreeNode;

// Should we cluster memory operations
bool Clustering;

// TODO(max): Document.
bool isCnstrctd_;

@@ -508,7 +529,7 @@ class Enumerator : public ConstrainedScheduler {
InstCount schedUprBound, int16_t sigHashSize,
SchedPriorities prirts, Pruning PruningStrategy,
bool SchedForRPOnly, bool enblStallEnum, Milliseconds timeout,
-InstCount preFxdInstCnt = 0,
+bool ClusteringEnabled, InstCount preFxdInstCnt = 0,
SchedInstruction *preFxdInsts[] = NULL);
virtual ~Enumerator();
virtual void Reset();
@@ -525,6 +546,8 @@ class Enumerator : public ConstrainedScheduler {
// (Chris)
inline bool IsSchedForRPOnly() const { return SchedForRPOnly_; }

inline bool isClustering() const { return Clustering; }

// Calculates the schedule and returns it in the passed argument.
FUNC_RESULT FindSchedule(InstSchedule *sched, SchedRegion *rgn) {
return RES_ERROR;
@@ -586,6 +609,7 @@ class LengthCostEnumerator : public Enumerator {
bool WasObjctvMet_();
bool BackTrack_();
InstCount GetBestCost_();
int GetBestClusterCost_();
void CreateRootNode_();

// Check if branching from the current node by scheduling this instruction
@@ -603,7 +627,7 @@ class LengthCostEnumerator : public Enumerator {
SchedPriorities prirts, Pruning PruningStrategy,
bool SchedForRPOnly, bool enblStallEnum,
Milliseconds timeout, SPILL_COST_FUNCTION spillCostFunc,
-InstCount preFxdInstCnt = 0,
+bool ClusteringEnabled, InstCount preFxdInstCnt = 0,
SchedInstruction *preFxdInsts[] = NULL);
virtual ~LengthCostEnumerator();
void Reset();
@@ -616,6 +640,7 @@ class LengthCostEnumerator : public Enumerator {
bool IsCostEnum();
SPILL_COST_FUNCTION GetSpillCostFunc() { return spillCostFunc_; }
inline InstCount GetBestCost() { return GetBestCost_(); }
int getBestClusterCost() { return GetBestClusterCost_(); }
};
/*****************************************************************************/

@@ -851,6 +876,44 @@ void EnumTreeNode::SetSpillCostSum(InstCount cost) {
InstCount EnumTreeNode::GetSpillCostSum() { return spillCostSum_; }
/*****************************************************************************/

void EnumTreeNode::setClusteringCost(int Cost) {
assert(Cost >= 0);
ClusterCost = Cost;
}

int EnumTreeNode::getClusteringCost() { return ClusterCost; }

void EnumTreeNode::setCurClusteringGroup(int Group) {
assert(Group >= 0);
ClusterActiveGroup = Group;
}

int EnumTreeNode::getCurClusteringGroup() { return ClusterActiveGroup; }

void EnumTreeNode::setClusterAbsorbCount(int Absorb) {
assert(Absorb >= 0);
ClusterAbsorbCount = Absorb;
}

int EnumTreeNode::getClusterAbsorbCount() { return ClusterAbsorbCount; }

void EnumTreeNode::setClusterLwrBound(int ClusterDynamicLowerBound) {
assert(ClusterDynamicLowerBound >= 0);
ClusterDLB = ClusterDynamicLowerBound;
}

int EnumTreeNode::getClusterLwrBound() { return ClusterDLB; }

void EnumTreeNode::setTotalClusterCost(int Cost) {
assert(Cost >= 0);
ClusterTotalCost = Cost;
}

int EnumTreeNode::getTotalClusterCost() { return ClusterTotalCost; }

bool EnumTreeNode::isClustering() { return enumrtr_->isClustering(); }
/*****************************************************************************/

bool EnumTreeNode::IsNxtCycleNew_() {
if (enumrtr_->issuRate_ == 1) {
return true;
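EnumTreeNode now carries a cluster lower bound (ClusterDLB) and the enumerator exposes a best cluster cost, which suggests the usual branch-and-bound pruning test. The sketch below shows one optimistic way such a bound could be formed and compared. Both function names are hypothetical, and the bound assumes there is no cap on cluster size, so every group with unscheduled instructions needs at most one more block. The PR's actual calculateClusterDLB() is not shown in this diff and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Optimistic lower bound on the final number of cluster blocks.
// BlocksSoFar: blocks already started in the partial schedule, including
//   the currently open one.
// RemainingPerGroup[i]: unscheduled instructions left in group i + 1.
// ActiveGroup: 1-based id of the open cluster group, or 0 if none.
inline int clusterDynamicLowerBound(int BlocksSoFar,
                                    const std::vector<int> &RemainingPerGroup,
                                    int ActiveGroup) {
  int Bound = BlocksSoFar;
  for (std::size_t I = 0; I < RemainingPerGroup.size(); ++I) {
    bool IsActive = static_cast<int>(I) + 1 == ActiveGroup;
    // Remaining instructions of the active group may still join the open
    // block, so only the other groups are assumed to need a new block.
    if (RemainingPerGroup[I] > 0 && !IsActive)
      ++Bound;
  }
  return Bound;
}

// Standard branch-and-bound check for a cost that is minimized.
// A negative BestCost means no complete schedule has been found yet.
inline bool canPruneOnClusterCost(int NodeLowerBound, int BestCost) {
  return BestCost >= 0 && NodeLowerBound >= BestCost;
}
```

A node whose bound already matches or exceeds the best known cost cannot lead to a better clustering, so it can be skipped; the negative sentinel mirrors the ClusterTotalCost = -1 default in the header.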