Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zone config troubleshooting guide #19283

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rmloveland
Copy link
Contributor

@rmloveland rmloveland commented Jan 6, 2025

Fixes DOC-9210

Summary of changes:

  • Add a new page, 'Troubleshoot Replication Zones', to The
    ZoneConfigonomicon (tm)

  • Update the 'Replication controls' page with more detailed info re:
    zone config inheritance hierarchy and behavior

  • Update the made-up examples in the 'Critical nodes endpoint' docs to
    point back to 'Troubleshoot Replication Zones'

  • Fix incorrect statements on the ALTER RANGE page since they're
    needed to map from range IDs returned by the critical nodes
    endpoint (mentioned in 'Troubleshoot Replication Zones') to actual
    schema objects

  • Add moar links (tm) from various zone config-related pages to the new
    troubleshooting guide and amongst themselves

  • Add a note to various zone config-related docs saying "most users
    should not do manual zone config changes, see Multi-region SQL and
    Zone Config Extensions instead"

For reviewers, the main changes are:

  • new Troubleshoot replication zones page
  • new section of 'Replication Controls' page called How zone config inheritance works
  • small changes to ALTER RANGE docs to update the SHOW RANGES queries to get them to actually work (they were missing the ... WITH DETAILS clause, and/or using crdb_internal.ranges - this was a bit of a driveby change but seemed good to do since I was touching ALTER RANGE ... CONFIGURE ZONE to point to the troubleshooting guide
  • most of the other changes are "infrastructure", adding links, etc.

@rmloveland rmloveland marked this pull request as draft January 6, 2025 22:39
@rmloveland rmloveland force-pushed the 20241209-DOC-9210-zone-config-troubleshooting-guide branch 7 times, most recently from e8df32a to 674915d Compare January 15, 2025 19:52
@rmloveland rmloveland force-pushed the 20241209-DOC-9210-zone-config-troubleshooting-guide branch 6 times, most recently from e11a980 to d460824 Compare January 27, 2025 21:57
@rmloveland rmloveland force-pushed the 20241209-DOC-9210-zone-config-troubleshooting-guide branch 3 times, most recently from fc2c697 to ab3faec Compare February 3, 2025 22:57
@rmloveland rmloveland force-pushed the 20241209-DOC-9210-zone-config-troubleshooting-guide branch from ab3faec to fb31a32 Compare February 4, 2025 17:03
@cockroachdb cockroachdb deleted a comment from netlify bot Feb 4, 2025
@cockroachdb cockroachdb deleted a comment from netlify bot Feb 4, 2025
@cockroachdb cockroachdb deleted a comment from netlify bot Feb 4, 2025
@rmloveland rmloveland marked this pull request as ready for review February 4, 2025 19:08
@cockroachdb cockroachdb deleted a comment from github-actions bot Feb 4, 2025
@rmloveland rmloveland force-pushed the 20241209-DOC-9210-zone-config-troubleshooting-guide branch 2 times, most recently from 0f0bfde to a2f820e Compare February 4, 2025 19:13
@cockroachdb cockroachdb deleted a comment from github-actions bot Feb 4, 2025
Copy link

github-actions bot commented Feb 4, 2025

Files changed:

Fixes DOC-9210

Summary of changes:

- Add a new page, 'Troubleshoot Replication Zones', to _The
  ZoneConfigonomicon (tm)_

- Update the 'Replication controls' page with more detailed info re:
  zone config inheritance hierarchy and behavior

- Update the made-up examples in the 'Critical nodes endpoint' docs to
  point back to 'Troubleshoot Replication Zones'

- Fix incorrect statements on the `ALTER RANGE` page since they're
  needed to map from range IDs returned by the critical nodes
  endpoint (mentioned in 'Troubleshoot Replication Zones') to actual
  schema objects

- Add moar links (tm) from various zone config-related pages to the new
  troubleshooting guide and amongst themselves

- Add a note to various zone config-related docs saying "most users
  should not do manual zone config changes, see Multi-region SQL and
  Zone Config Extensions instead"
@rmloveland rmloveland force-pushed the 20241209-DOC-9210-zone-config-troubleshooting-guide branch from a2f820e to a91c384 Compare February 4, 2025 20:13
@rmloveland rmloveland requested a review from fqazi February 4, 2025 21:58
@rmloveland rmloveland requested a review from annrpom February 4, 2025 21:58

The following example query uses the [`SHOW RANGES`]({% link {{ page.version.version }}/show-ranges.md %}) statement to show, for each range ID, which tables and indexes use that range for their underlying storage. The query assumes the [`movr` schema]({% link {{ page.version.version }}/movr.md %}#the-movr-database) that is loaded by [`cockroach demo`]({% link {{ page.version.version }}/cockroach-demo.md %}), so you'll need to modify it to work with your schema.

XXX: GET HELP WITH THIS QUERY, OUTPUT SEEMS NOT QUITE RIGHT??? (or is it ok?)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fqazi and @annrpom , i think this SQL query to map from range IDs to schema object names could be improved, it "kinda works" (tm) but please feel free to suggest improvements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as you can see the output is maybe not the best)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks fine to me! maybe faizan has other thoughts...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can introduce array_aggregation here to make it more readable:

 WITH movr_tables AS (SHOW RANGES FROM DATABASE movr WITH TABLES),
                                       movr_indexes AS (SHOW RANGES FROM DATABASE movr WITH INDEXES)
                                   SELECT array_agg(movr_indexes.range_id) as ranges,
                                          movr_tables.table_name,
                                          movr_indexes.index_name
                                     FROM movr_tables, movr_indexes
                                    WHERE movr_tables.range_id = movr_indexes.range_id
                                 GROUP BY  movr_tables.table_name,  movr_indexes.index_name ORDER BY table_name, index_name;

WDYT? @annrpom @rmloveland

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's great @fqazi , agree that it's easier to read, have updated to use this query - thank you! i have also updated the surrounding text to match the new table output

Copy link
Contributor

@annrpom annrpom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woah this is great!!! i left some comments, but feel free to ask me for any clarification

1. If there's no applicable table replication zone, CockroachDB uses the database replication zone.
1. If there's no applicable database replication zone, CockroachDB uses the `default` cluster-wide replication zone.
1. If there's a replication zone [for the row](#create-a-replication-zone-for-a-partition) (a.k.a. [partition]({% link {{ page.version.version }}/partitioning.md %})), CockroachDB uses it.
1. If there's no applicable row replication zone and the row is from a secondary index, CockroachDB uses the [secondary index replication zone](#create-a-replication-zone-for-a-secondary-index).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder why the docs before also just specified secondary index -- can we generalize "secondary indexes" to just "indexes" (or was this specification intentional)?

primary indexes can be configured as such:

demo@127.0.0.1:26257/demoapp/movr> CREATE TABLE foo(i INT PRIMARY 
                                -> KEY, j INT) PARTITION BY LIST  
                                -> (i) (                          
                                ->   PARTITION p VALUES IN (1, 5) 
                                -> );                             

demo@127.0.0.1:26257/demoapp/movr> ALTER INDEX foo@foo_pkey       
                                -> CONFIGURE ZONE USING           
                                -> num_replicas = 11; 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this, idk why it says "secondary index" and is so specific about it, i don't think there is a serious intent behind that, i think it's just how the person writing the docs did it at one time. looking at git it's a change from v1.2 so i think it's just historical accident

went through here and a few other places in this doc that said "secondary index" and changed to just say "index"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have a similar comment here as the outline-style diagram, but also:

i would expect a direct arrow from index (on the left) to the partitions -- right now, it looks like the inheritance has table as the parent and index, partition, and subpartitions as the direct children

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent was to show table / index level partitions? Probably naming would help the individual examples..I did a quick thing with PlantUML to split it off, but we need to call both of them Partition

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would expect a direct arrow from index (on the left) to the partitions -- right now, it looks like the inheritance has table as the parent and index, partition, and subpartitions as the direct children

ah yes, oops, i added a direct arrow from the index to the first partition

this leaves table as the parent of the index (and the subpartitions per cockroachdb/cockroach#75862)

i hope it's a bit better, please let me know what you think!

I think the intent was to show table / index level partitions? Probably naming would help the individual examples

yes, sorry, the intent was to show the table/index level partitions and how the subpartitions are really inheriting from tables per the linked issue above

i also changed all of the names here to match the names added to the other diagram, i hope it helps but please let me know

- sub-partition 1.1 (NB. Sub-partitions inherit from tables, *not* partitions - see cockroachdb/cockroach#75862)
- sub-partition 1.1.1
- sub-partition 1.2
- table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i believe adding 2 tables to this diagram is trying to illustrate how inheritance comes in when multiple of the same type of object have their zone configured (which is great), but would we be able to make this distinction more clear?

right now, i think i'm reading "table" as "all tables" if that makes why i was confused a bit more clear

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 tables here does make it a bit confusing, maybe naming the objects will help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, ya i can see how that is confusing

took a shot at adding some names to the DBs/table/indexes , PTAL - i don't know if these are good names but they attempt to show the hierarchy a little. def open to feedback tho

- default
  - database A
    - table A.B
      - index A.B.1
        - partition (row) 1
          - sub-partition 1.1 (NB. Sub-partitions inherit from tables, *not* partitions - see cockroachdb/cockroach#75862)
            - sub-partition 1.1.1
          - sub-partition 1.2
    - table A.C
      - index A.C.1
  - database D
    - ...

Copy link

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good a few minor comments


As [discussed previously](#level-priorities), CockroachDB always uses the most granular replication zone available for each schema object (database, table, etc.). More-specific settings applied to schema objects at lower levels of the inheritance tree will always override the settings of objects above them in the tree. All configurations will therefore be modified versions of the [`default` range]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone), which acts as the root of the tree. This means that in practice, **any changes to a specific schema object's zone configurations are by definition user-initiated**, either via [Multi-region SQL abstractions]({% link {{ page.version.version }}/multiregion-overview.md %}) or [manual changes](#why-manual-zone-config-management-is-not-recommended).

Because each zone config inherits all of its initial values from its parent object, **it only stores the values that differ from its parent**. Any fields that are unset will be looked up in the parent object’s zone configuration. This continues recursively up the inheritance hierarchy all the way to the `default` zone config. In practice, most values are cached for performance.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, most values are cached for performance.

I guess the intention here is to say there is no overhead from the inheritance right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but i'm also happy to remove it, seems like a level of detail that doesn't matter for conceptual understanding. wdyt?

- sub-partition 1.1 (NB. Sub-partitions inherit from tables, *not* partitions - see cockroachdb/cockroach#75862)
- sub-partition 1.1.1
- sub-partition 1.2
- table
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 tables here does make it a bit confusing, maybe naming the objects will help?


The following example query uses the [`SHOW RANGES`]({% link {{ page.version.version }}/show-ranges.md %}) statement to show, for each range ID, which tables and indexes use that range for their underlying storage. The query assumes the [`movr` schema]({% link {{ page.version.version }}/movr.md %}#the-movr-database) that is loaded by [`cockroach demo`]({% link {{ page.version.version }}/cockroach-demo.md %}), so you'll need to modify it to work with your schema.

XXX: GET HELP WITH THIS QUERY, OUTPUT SEEMS NOT QUITE RIGHT??? (or is it ok?)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can introduce array_aggregation here to make it more readable:

 WITH movr_tables AS (SHOW RANGES FROM DATABASE movr WITH TABLES),
                                       movr_indexes AS (SHOW RANGES FROM DATABASE movr WITH INDEXES)
                                   SELECT array_agg(movr_indexes.range_id) as ranges,
                                          movr_tables.table_name,
                                          movr_indexes.index_name
                                     FROM movr_tables, movr_indexes
                                    WHERE movr_tables.range_id = movr_indexes.range_id
                                 GROUP BY  movr_tables.table_name,  movr_indexes.index_name ORDER BY table_name, index_name;

WDYT? @annrpom @rmloveland

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent was to show table / index level partitions? Probably naming would help the individual examples..I did a quick thing with PlantUML to split it off, but we need to call both of them Partition

image

@rmloveland
Copy link
Contributor Author

@fqazi and @annrpom thanks for looking at this! i've updated it based on your feedback so far, PTAL and let me know what you think. happy to update as needed

@rmloveland rmloveland requested review from fqazi and annrpom February 13, 2025 21:30
Copy link
Contributor

@annrpom annrpom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! 🚀

Copy link

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@rmloveland
Copy link
Contributor Author

woooohooooo thanks for the reviews @annrpom and @fqazi !!!!!!!!!!

now on to docs team review

@rmloveland rmloveland requested a review from taroface February 18, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants