When a CLUSTER BY clause is added to a Snowflake table, it specifies one or more columns to organize the data within the table’s micro-partitions. This clustering aims to colocate data with similar values in the same or adjacent micro-partitions. By doing so, it enhances the efficiency of query pruning, where the Snowflake query optimizer can skip over irrelevant micro-partitions that do not contain the data relevant to the query, thereby improving performance.
[References:, Snowflake Documentation on Clustering Keys & Clustered Tables1., Community discussions on how source data’s ordering affects a table with a cluster key, ]
Submit