karasms.com

Exploring the Latest BigQuery Features: Materialized Views Update

Written on

BigQuery has introduced exciting updates this April, particularly with the advent of Google Next. To keep you informed, I will share the latest developments separately, beginning with Materialized Views. This topic follows our previous discussion on views in the "Hidden Gems of BigQuery" series.

Materialized Views Now Generally Available

The options for allow_non_incremental_definition and max_staleness for Materialized Views have reached general availability (GA).

Given that this subject can be quite intricate—something I struggled with for a while—I believe a brief overview of Materialized Views is beneficial.

Regular views are generally understood as a "syntax sugar," "abstraction layer," or "security layer," depending on the use case. When a query is executed against a regular view, its definition is replaced with the corresponding SQL statement.

Materialized Views, however, are distinct as they store data physically.

Why opt for them? Primarily, Materialized Views can act like regular views, but they also enhance query performance in OLTP systems and help reduce costs in OLAP systems. This efficiency arises because the query can utilize the pre-calculated and stored data of the Materialized View instead of processing the entire underlying dataset.

While the benefits of improved performance and reduced costs are appealing, implementing Materialized Views requires careful management of when to compute and refresh the data. Different systems adopt various strategies for this; let's explore how BigQuery handles these challenges.

Data Freshness

The first consideration is whether to ensure that Materialized Views always reflect the latest data.

We have two choices:

  1. Guarantee the latest data (data freshness), necessitating constant data refresh.
  2. Allow for outdated data (data staleness), simplifying implementation (for example, refreshing every 5 minutes), but potentially limiting use cases.

BigQuery supports both methods, but by default, it prioritizes data freshness, so we'll focus on that approach first.

Data Refreshing

Having agreed on the need for data freshness, several strategies arise:

  1. Avoid storing view data, always querying the base tables. This guarantees freshness but operates like a regular view.
  2. Store view data on request and refresh it when base data changes, functioning similarly to a cache, which BigQuery already offers.
  3. Calculate and refresh view data on a schedule. This allows for pre-calculated data availability when a query arrives, provided no base data changes occurred post-refresh. This is indeed how BigQuery operates.

Upon declaring a Materialized View, BigQuery will:

  • Initiate the initial data calculation (which may take time depending on data volume).
  • Refresh the Materialized View data within the next 5 minutes if there are changes to the base tables (but not more frequently than every 30 minutes).

Instead of adhering to a fixed schedule (like every 5 minutes), which may not be effective if data updates infrequently, BigQuery opts to refresh data upon changes to base tables.

But how do we ensure the promised data freshness? If a query for the Materialized View arrives shortly after the base data changes, don't worry. BigQuery will directly query the base data if the Materialized View data is outdated (similar to a regular view).

In summary, BigQuery's Materialized View strategy includes:

  • Calculating view data upon creation.
  • Updating view data within 5 minutes following any changes to base data, with a maximum cap of 30 minutes.

When querying Materialized View data:

  • If the data is current, it is utilized (reducing the data size queried).
  • If not current, the query will fetch data from the base tables.

This leads to several follow-up questions:

Can I modify the 5 and 30-minute settings? You can enable or disable automatic refresh.

ALTER MATERIALIZED VIEW PROJECT.DATASET.MATERIALIZED_VIEW

SET OPTIONS (enable_refresh = true);

However, the 5-minute interval is not adjustable, as it is a best-effort approach without guarantees on refresh timing. The 30-minute frequency cap can be modified.

ALTER MATERIALIZED VIEW PROJECT.DATASET.MATERIALIZED_VIEW

SET OPTIONS (refresh_interval_minutes = 60);

Is manual data refresh possible? Yes, you can manually refresh the data regardless of the automatic refresh settings.

CALL BQ.REFRESH_MATERIALIZED_VIEW('PROJECT.DATASET.MATERIALIZED_VIEW');

What if my data changes frequently? If your base data updates often (e.g., every minute) and queries are constant, the Materialized View may essentially function as a regular view. This is something to consider.

What if my data is extensive? Is constant refreshing costly? Yes, it can be expensive. BigQuery addresses this by employing a method of incremental updates rather than full recalculations, only modifying data that has changed.

Can incremental updates apply to any SQL statements in the view? Unfortunately, no. Materialized Views do come with restrictions on allowable SQL statements.

What if I need to use SQL features outside these restrictions? This is where the allow_non_incremental_definition option comes into play, which is now GA!

What if I don’t require strict data freshness? Since BigQuery guarantees freshness, it queries base data whenever the Materialized View data is outdated. If you can work with data that is 15, 30, or more minutes old, you can set a max_staleness interval.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table

OPTIONS (enable_refresh = true, refresh_interval_minutes = 60,

max_staleness = INTERVAL "4:0:0" HOUR TO SECOND)

AS

...

This allows you to bypass querying the base dataset when changes are relatively recent, balancing freshness and cost.

Combining allow_non_incremental_definition and max_staleness Returning to the more complex queries within views, BigQuery aims to avoid full recalculations through incremental updates. However, if you need to bypass these restrictions, you can use the allow_non_incremental_definition statement. You must specify a max_staleness interval to understand the implications of this choice.

CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table

OPTIONS (

enable_refresh = true, refresh_interval_minutes = 60,

max_staleness = INTERVAL "4" HOUR,

allow_non_incremental_definition = true)

AS

...

Is that everything? One more point: Materialized Views with allow_non_incremental_definition enabled are not compatible with smart-tuning.

What is smart-tuning? Smart-tuning is a significant topic. In essence, BigQuery automatically rewrites queries to utilize Materialized Views wherever feasible. This improves query performance and cost without altering results. For a query to be rewritten, the Materialized View must meet certain conditions:

  • It must belong to the same dataset as one of its base tables.
  • It must utilize the same set of base tables as the query.
  • It must include all columns being read.
  • It must cover all rows being read.

If you're interested in learning more, check out the smart-tuning documentation.

I hope this overview of the newly available features has been informative!

For more insights on Hidden Gems of BigQuery, check out the following articles:

  • Auto, Transactions and Indexes — Hidden Gems of BigQuery
  • Change History and Defaults — Hidden Gems of BigQuery — Part 2
  • Data Lineage and Routines — Hidden Gems of BigQuery — Part 3
  • Views, Caching, Scheduled Queries, Materialized Views — Hidden Gems of BigQuery — Part 4 — Five types of “Views”
  • ML in BigQuery, including new Generative AI — BigQuery Breaking News — Generative AI
  • Time-travel window, fail-safe, snapshots, and clones — Hidden Gems of BigQuery — P6 — Time-traveling and clones
  • TRY…CATCH, RAISE, ERROR, SAFE_ and SAFE. — Hidden Gems of BigQuery — P7 — Safety first

Share the page:

Twitter Facebook Reddit LinkIn

-----------------------

Recent Post:

Exploring the Ideal Keyboard for Writers: My Journey

Discover my quest for the perfect keyboard and setup to enhance my writing experience.

Exploring the Emotional Connection with Sex Dolls: A Deeper Dive

Investigating the emotional bonds some individuals form with sex dolls and the implications on relationships and sexuality.

Understanding the Profit Equation: Key to Business Success

Discover the essential profit formula that every business must understand to thrive in a competitive market.

Innovative Approaches to Gravity Batteries for Energy Storage

Explore how gravity batteries harness renewable energy and the innovative methods for energy storage.

Mastering Technical Writing: The Essential ABCs Explained

Discover the significance of Accuracy, Brevity, and Clarity in technical writing for effective communication.

Navigating Idealism and Reality: Reflections from the Ocean

A personal reflection on idealism, reality, and transformation inspired by a new life in Australia.

Practical Strategies for Cultivating Resilience in Life

Explore actionable techniques to strengthen your resilience and thrive through adversity with positivity and determination.

Understanding Dopamine: The Misconception Behind Happiness

Explore the truth about dopamine, its relation to happiness, and how to cultivate genuine joy in your life.