From bf54914b964933d60b0958b3e5d4518190af8ba8 Mon Sep 17 00:00:00 2001
From: Brian Beggs
Date: Tue, 4 Mar 2025 10:52:03 -0800
Subject: [PATCH] [skip ci] Update llms.md (#18464)

### Ticket
N/A

### Problem description
Order of sections was incorrect.

### What's changed
Section order corrected.

### Checklist
- [ ] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) CI passes
- [ ] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml) CI passes (if applicable)
- [ ] [Model regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) CI passes (if applicable)
- [ ] [Device performance regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) CI passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models tests](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) CI passes (if applicable)
- [ ] New/Existing tests provide coverage for changes
---
 tech_reports/LLMs/llms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tech_reports/LLMs/llms.md b/tech_reports/LLMs/llms.md
index 12abeb815b7..0342e432399 100644
--- a/tech_reports/LLMs/llms.md
+++ b/tech_reports/LLMs/llms.md
@@ -209,7 +209,7 @@ ttnn_gamma_rm = ttnn.as_tensor(
 The distributed implementation is designed for cases where activations are **sharded along the embedding dimension** across multiple devices. It ensures the correct computation of mean and variance across shards by leveraging cross-device communication. Both interleaved and width-sharded inputs are supported.
 
-##### 2.3.2.2.1 Steps to Perform Distributed Normalization on TT-Devices
+##### 2.3.1.2.1 Steps to Perform Distributed Normalization on TT-Devices
 
 1. **Compute Local Statistics**
    - Each device computes the required statistics (e.g., \(E[x]\), \(E[x^2]\)) locally on its shard of the input tensor.
    - For **RMSNorm**, only \(E[x^2]\) is required.