
Local Shortcut Variables: When to Use a Value Copy and When to Use a Reference


EnergyPlus has a large and deep state data structure, and it is often convenient to use local variables to create shortcuts into parts of that data structure. There seems to be confusion about when those shortcuts should be values vs. references (or pointers; a reference and a pointer are the exact same thing, just with different syntax). Right now it appears that most of the shortcut variables created during the transition to the state structure were created as references, and the rationale behind this is given as "Why create a local copy if you don't have to? The compiler will create a local copy if it needs one." This explanation is true in a high-level sense: you certainly don't want to copy data unnecessarily, and yes, the compiler can create local copies of data and optimize around them. However, it misses four important nuances:

  • The fact that when you create a reference/pointer, you are also copying something locally; you are just copying a different thing, the address rather than the value.
  • The fact that reading/using a value through a reference/pointer is more expensive than reading/using a value copy if the compiler has not optimized the access.
  • The way references/pointers affect the compiler's ability to optimize.
  • The difference between scalar and non-scalar variables.

As always, the nuances are often more important than the general high-level rule, and in this case they certainly are.

As with most questions about why programming idiom X is better than idiom Y, it helps to understand something about how X and Y translate to machine code and how the processor will execute that machine code.

Registers and Memory

The most important thing to understand here is the difference between the two types of storage the processor deals with: registers and memory. [Ed: Before you say "what about disk/network/random-IO-device?", you should know that the processor has no idea that these things exist--they are BIOS/operating system constructs that, to the processor, look like memory.]

Registers are the fastest kind of storage. In modern processors--and by modern I mean pretty much every processor built since the early 1980s--the computation path is laid out in such a way that reading and writing registers essentially has a cost of zero. Part and parcel of this zero cost is that registers are also the only type of storage on which the computation path can operate directly. The processor can add two register values and store the result in a third register. It can read the value of a register and decide whether to branch or not. The processor cannot directly add a memory value to a register value and store the result in a register. To achieve that effect, it has to perform two steps: i) read the memory value into a register (incidentally, to do this the address of the value already has to be in a register), ii) do a register-register add. Depending on the processor, the cost of reading something from memory into a register is between 1 and 4 cycles--if the memory location happens to be in the on-chip cache, which it will be the majority of the time--and of course there is also the cost of executing the additional instruction.
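To make those two steps concrete, here is a minimal sketch in the same pseudo-assembly notation used later on this page; the register assignments are arbitrary, with R2 assumed to hold the address of the memory value and R1 the register value:

   LOAD R2, 0 -> R3      // Step i: read the memory value at the address in R2 into register R3 (1-4 cycles if it hits the cache)
   ADD R1, R3 -> R1      // Step ii: register-register add, which the computation path executes directly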

Of course, the number of registers is limited. The x86_64 architecture has 16 64-bit general-purpose registers [Ed: We are going to ignore SSE registers for now], meaning that at any point in time the compiler has only 16 values on which it can tell the processor to operate directly. If it wants more values, it has to shuttle values back and forth between the registers and memory. Meanwhile, the amount of memory available to the compiler is essentially unlimited, 2^64 bytes. [Ed: Of course, the computer doesn't actually have this much memory, but the operating system implements what is called "virtual memory", which makes it look like it does.] [Ed2: Incidentally, this is the meaning of "64-bit architecture", i.e., memory addresses are 64 bits wide, meaning that the compiler thinks there are 2^64 bytes worth of memory, and registers are 64 bits wide so that they can hold those addresses.]
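This shuttling is what compiler writers call register spilling. As a minimal sketch in the same pseudo-assembly notation (the stack-pointer register SP and the offset 16 are hypothetical), a value that does not fit in the register file gets parked in the function's stack frame and read back later:

   STORE R5 -> SP, 16    // Spill: write R5 to a slot in the stack frame so R5 can be reused for something else
   LOAD SP, 16 -> R5     // Reload: read the spilled value back into a register when it is needed again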

Value/Copy and Reference/Pointer

Now that we know this about registers and memory, we can think about what value and reference variables look like to the processor. Let's look at this code:

Real64 TimeStepSys = state.dataHVACGlobal->TimeStepSys;
auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);

thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;

state and baseboardNum are parameters to the function and so they will be in registers R1 and R2 when the function body is invoked. Here is what this code will translate into (pseudo-assembly, not x86_64, but close enough):

   LOAD R1, 200 -> R3    // Load state.dataHVACGlobal into R3, member dataHVACGlobal is at offset 200 in struct state
   LOAD R3, 8 -> R4      // Load R3->TimeStepSys into R4, member TimeStepSys is at offset 8 in struct HVACGlobal

   LOAD R1, 208 -> R3    // Load state.dataElectBaseboardRad into R3. Reuse R3 since we don't need to access dataHVACGlobal again
   LOAD R3, 0 -> R3      // Load R3->ElecBaseboard into R3. Reuse R3 again
   MULT R2, 128 -> R5    // Multiply baseboardNum by the size of an ElecBaseboard object (128 bytes here) to get the offset into the array
   ADD R3, R5 -> R3      // By adding the offset (R5) to the starting address of the array (R3), we get the address/reference to thisBaseboard into R3

   LOAD R3, 80 -> R5     // Load R3.Power into R5
   MULT R5, R4 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5   // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
   STORE R5 -> R3, 88    // Store R5 into R3.Energy

   LOAD R3, 96 -> R5     // Load R3.ConvPower into R5
   MULT R5, R4 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5   // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
   STORE R5 -> R3, 104   // Store R5 into R3.ConvEnergy

   LOAD R3, 112 -> R5    // Load R3.RadPower into R5
   MULT R5, R4 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5   // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
   STORE R5 -> R3, 120   // Store R5 into R3.RadEnergy