Why You Should Avoid SELECT * Queries in Snowflake ?

Upon data ingestion into Snowflake, the platform undergoes a transformation process, reconfiguring the data into its internal, finely-tuned, and space-efficient columnar format. This optimized data is then securely stored within cloud storage infrastructure.

Snowflake takes full responsibility for the intricacies of data storage management. This encompasses organizing data, determining file sizes, defining structures, applying compression techniques, managing metadata, collecting statistics, and overseeing various other facets of data storage. Importantly, it’s worth noting that the data objects stored by Snowflake remain hidden from direct customer visibility and accessibility. Instead, customers interact with this data solely through SQL query operations performed within the Snowflake environment.

The diagram provided below visually represents the distinction between Row Storage, designed primarily for OLTP systems, and Column Storage (shown on the right), which incorporates notable compression enhancements and enables faster I/O operations.

For example, if the following query were executed on a table with 50 columns, it would retrieve just 8% of the columns from disk storage.select region, state, city, sum(sales)
from orders
group by region, state, city;

A study from Stanford University shows that columnar databases are significantly faster at analytic queries than row-oriented ones because they only read the column values needed to satisfy the query.

In production systems, avoiding this practice is essential as it brings two significant advantages. First, it reduces the volume of data transferred from remote storage into memory. Second, it ensures the efficient utilization of Virtual Warehouse Cache storage. Both of these benefits play a pivotal role in enhancing the overall query performance of the system.