Smartsheet to BigQuery: A Practical Integration Guide

13 min read

How to get Smartsheet data into BigQuery for real analysis — covering the API, ETL options, schema design, incremental loading, and how to deliver charts downstream to Slack automatically.

Smartsheet is where a lot of operational work actually lives. Project plans, resource allocation grids, budget trackers, intake forms, programme portfolios. Teams love it because it looks like a spreadsheet but behaves more like a database — row hierarchy, conditional logic, cross-sheet formulas, automated workflows.

The problem is that Smartsheet's analytics are limited to what Smartsheet can see. If you want to know how your project delivery rate correlates with your sales pipeline, or how resource utilisation has trended over the last two years, or how your intake queue backlog compares across quarters — you need a warehouse.

This guide covers the full journey: getting Smartsheet data into BigQuery correctly, handling the data model quirks that trip most people up, and then using that data to power automated chart delivery to Slack.


Why Smartsheet Data Belongs in BigQuery

Before getting into the mechanics, it's worth being precise about which Smartsheet data is valuable in a warehouse — because not all of it is.

High-value data to bring into BigQuery:

  • Project portfolio status — current phase, RAG status, owner, target dates across all projects. In Smartsheet this is usually a multi-level hierarchy sheet. In BQ you can flatten it, aggregate by department or portfolio, and join against delivery outcomes.
  • Resource allocation / capacity — who is assigned to what, at what percentage, across what dates. Smartsheet is commonly used as a capacity planner. In BQ you can calculate utilisation rates, spot overallocation, and trend it over time.
  • Budget vs actuals — finance teams often maintain spend tracking in Smartsheet sheets with monthly actuals filled in. In BQ this can be joined against GL data or ARR data for a complete picture.
  • Intake / request queues — volume, source, category, resolution time, assignee. Smartsheet forms feed into sheets; that operational data is rich for SLA analysis.
  • Cross-sheet summaries — Smartsheet reports can pull rows from multiple sheets, but they can't do proper SQL aggregations. A BQ view can.

Lower-value data to think carefully about:

  • Attachment metadata and discussion threads (useful for audit trails, but rarely for analytics)
  • Auto-number columns and Smartsheet system fields (mostly internal plumbing)
  • Sheets that are really just working documents rather than structured data records

Option 1: Managed ETL (Fastest Path)

If you want Smartsheet in BigQuery with minimal engineering investment, managed connectors are the right call:

Fivetran

Fivetran has a Smartsheet connector that runs incremental syncs on a schedule. It normalises sheets into BQ tables, handles pagination and API rate limits, and alerts on schema drift. The catch is cost: Fivetran's usage-based pricing adds up fast for large orgs with many sheets.

Best for: Enterprises where engineering time is more expensive than tooling cost.

Airbyte

Airbyte is the open-source alternative. The Smartsheet source connector is community-maintained. You can self-host on a VM or Cloud Run, or use Airbyte Cloud. Less polish than Fivetran, but free to run and highly configurable.

Best for: Engineering teams comfortable with self-hosted infrastructure who want to avoid per-connector costs.

Stitch (by Talend)

Stitch sits between Fivetran and Airbyte in cost and complexity. It offers a Smartsheet integration and supports BigQuery as a destination. Simpler setup than Airbyte, cheaper than Fivetran for smaller volumes.

Best for: Smaller data teams who want a managed service without enterprise pricing.


Option 2: Custom Python Pipeline (Most Control)

If your Smartsheet schema is complex, if you need to transform data as it lands, or if you want to run this inside your existing GCP infrastructure, a custom pipeline gives you full control.

The Smartsheet Python SDK

Smartsheet publishes an official Python SDK. Install it alongside the BigQuery client:

pip install smartsheet-python-sdk google-cloud-bigquery pandas pyarrow

Pulling a sheet

The core unit in Smartsheet is a Sheet. Every sheet has:

  • A columns list — each column has an id, title, and type
  • A rows list — each row has cells, with each cell referencing a column_id
import smartsheet

ss = smartsheet.Smartsheet(access_token='your_token')
sheet = ss.Sheets.get_sheet(
    sheet_id=1234567890,
    include=['objectValue'],  # Needed for CONTACT_LIST and multi-value types
)

# Build a column ID → title map
columns = {col.id: col.title for col in sheet.columns}
col_types = {col.id: col.type for col in sheet.columns}

Handling Smartsheet column types

This is where most DIY pipelines go wrong. Smartsheet's column type system is richer than it first appears:

Smartsheet type     | What it contains         | How to extract
--------------------|--------------------------|-------------------------------
TEXT_NUMBER         | String or number         | cell.value
DATE                | ISO date string          | cell.value
DATETIME            | ISO datetime             | cell.value
CONTACT_LIST        | Object with name, email  | cell.object_value.email
MULTI_CONTACT_LIST  | List of contact objects  | Join emails with comma
CHECKBOX            | Boolean                  | cell.value (True/False)
PICKLIST            | String                   | cell.value
MULTI_PICKLIST      | List of strings          | cell.object_value.values
DURATION            | String like "4W 2D"      | cell.value (parse separately)
PREDECESSOR         | Dependency string        | cell.value
AUTO_NUMBER         | Auto-incremented string  | cell.value

def extract_cell_value(cell, col_type):
    if cell.value is None:
        return None

    if col_type == 'CONTACT_LIST':
        obj = cell.object_value
        if obj and hasattr(obj, 'email'):
            return obj.email
        return cell.display_value

    elif col_type == 'MULTI_CONTACT_LIST':
        obj = cell.object_value
        if obj and hasattr(obj, 'values'):
            return ','.join(
                v.email for v in obj.values if hasattr(v, 'email')
            )
        return cell.display_value

    elif col_type == 'MULTI_PICKLIST':
        obj = cell.object_value
        if obj and hasattr(obj, 'values'):
            return ','.join(obj.values)
        return cell.display_value

    else:
        # TEXT_NUMBER, DATE, DATETIME, CHECKBOX, PICKLIST: cell.value holds the
        # raw typed value (numbers, ISO strings, booleans); display_value is the
        # formatted string, so prefer value for clean typing in BQ
        return cell.value

Handling row hierarchy

Smartsheet rows can be nested — a project has sub-tasks, sub-tasks have checklist items. Each row carries parent_id and indent fields. Whether you need to preserve this depends on your use case:

def parse_rows(sheet, columns, col_types):
    rows = []
    for row in sheet.rows:
        record = {
            '_row_id': row.id,
            '_row_number': row.row_number,
            '_parent_id': row.parent_id,       # None if top-level
            '_indent': row.indent,             # May not be populated on reads; derive depth from _parent_id if needed
            '_created_at': row.created_at.isoformat() if row.created_at else None,
            '_modified_at': row.modified_at.isoformat() if row.modified_at else None,
        }
        for cell in row.cells:
            col_title = columns.get(cell.column_id)
            col_type = col_types.get(cell.column_id)
            if col_title:
                # Sanitise column name for BQ
                bq_col = col_title.lower().replace(' ', '_').replace('-', '_')
                record[bq_col] = extract_cell_value(cell, col_type)
        rows.append(record)
    return rows

If you're building project hierarchy analysis, keep _parent_id and _indent in your BQ table and use recursive CTEs or ARRAY_AGG to reconstruct the tree.
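
If you prefer to resolve the hierarchy before loading rather than in SQL, a small helper can derive each row's depth from the _parent_id links. This is a minimal sketch that operates on the output of parse_rows above; the _depth column name is just an illustration:

def add_depth(rows):
    """Derive a _depth value (1 = top level) for each row from _parent_id links."""
    by_id = {r['_row_id']: r for r in rows}

    def depth_of(row):
        level = 1
        parent = row.get('_parent_id')
        # Walk up the parent chain until we reach a top-level row
        while parent is not None and parent in by_id:
            level += 1
            parent = by_id[parent].get('_parent_id')
        return level

    for r in rows:
        r['_depth'] = depth_of(r)
    return rows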

Writing to BigQuery

import pandas as pd
from google.cloud import bigquery

def load_to_bigquery(rows, project, dataset, table, write_mode='WRITE_TRUNCATE'):
    df = pd.DataFrame(rows)

    bq = bigquery.Client(project=project)
    table_ref = f'{project}.{dataset}.{table}'

    job_config = bigquery.LoadJobConfig(
        write_disposition=write_mode,  # WRITE_TRUNCATE for full refresh
        autodetect=True,
        source_format=bigquery.SourceFormat.PARQUET,
    )

    job = bq.load_table_from_dataframe(df, table_ref, job_config=job_config)
    job.result()
    print(f'Loaded {len(df)} rows into {table_ref}')
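
With those helpers defined, a full-refresh run for a single sheet can be wired together as in the sketch below. The sheet ID, project, dataset, and table names are placeholders:

# Full refresh of one sheet into one BQ table; IDs and names are placeholders
SHEET_ID = 1234567890

sheet = ss.Sheets.get_sheet(sheet_id=SHEET_ID, include=['objectValue'])
columns = {col.id: col.title for col in sheet.columns}
col_types = {col.id: col.type for col in sheet.columns}

rows = parse_rows(sheet, columns, col_types)
load_to_bigquery(rows, project='your-project', dataset='smartsheet', table='projects')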

Option 3: CSV Export → GCS → BigQuery (Simple but Manual)

For teams that don't need real-time freshness and prefer simplicity:

  1. In Smartsheet, export a sheet as CSV (File → Export → Export Sheet to CSV)
  2. Upload to a GCS bucket (gsutil cp sheet.csv gs://your-bucket/smartsheet/)
  3. Load into BigQuery via bq load or the console

This is fine for monthly snapshots or ad hoc analysis, but breaks down for anything you want to automate or keep fresh. The CSV export also loses some metadata (row IDs, hierarchy) that the API preserves.
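
If you want to script step 3 rather than click through the console, a minimal sketch using the BigQuery Python client looks like this (the bucket path and table name are placeholders):

from google.cloud import bigquery

bq = bigquery.Client(project='your-project')

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,        # Skip the header row from the Smartsheet export
    autodetect=True,            # Let BQ infer column types from the CSV
    write_disposition='WRITE_TRUNCATE',
)

job = bq.load_table_from_uri(
    'gs://your-bucket/smartsheet/sheet.csv',
    'your-project.smartsheet.projects_snapshot',
    job_config=job_config,
)
job.result()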


Incremental Loading: Only Sync What Changed

Full sheet refreshes are expensive at scale and burn through your Smartsheet API quota (300 requests/minute on most plans). Once your initial load is in place, switch to incremental loading.

The Smartsheet API supports a rowsModifiedSince filter:

from datetime import datetime, timezone

def get_modified_rows(ss, sheet_id, since: datetime):
    """Pull only rows modified after `since`."""
    sheet = ss.Sheets.get_sheet(
        sheet_id=sheet_id,
        rows_modified_since=since.strftime('%Y-%m-%dT%H:%M:%SZ'),
        include=['objectValue'],
    )
    return sheet

# Load last sync time from a metadata table in BQ
def get_last_sync_time(bq, project, dataset, sheet_id) -> datetime:
    query = f"""
        SELECT MAX(sync_at) AS last_sync
        FROM `{project}.{dataset}._sync_metadata`
        WHERE sheet_id = @sheet_id
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter('sheet_id', 'STRING', str(sheet_id)),
        ]
    )
    row = list(bq.query(query, job_config=job_config).result())[0]
    # Fall back to the epoch if this sheet has never been synced
    return row.last_sync or datetime(1970, 1, 1, tzinfo=timezone.utc)

# After a successful load, record the sync time
def record_sync(bq, project, dataset, sheet_id):
    now = datetime.now(timezone.utc)
    rows = [{'sheet_id': str(sheet_id), 'sync_at': now.isoformat()}]
    bq.insert_rows_json(f'{project}.{dataset}._sync_metadata', rows)

For the BQ write, don't WRITE_TRUNCATE the main table. Instead, load the changed rows into a staging table and run a MERGE statement to upsert them into the existing table:

-- Merge incremental rows into the main table
MERGE `your-project.smartsheet.projects` AS target
USING `your-project.smartsheet.projects_staging` AS source
  ON target._row_id = source._row_id
WHEN MATCHED THEN
  UPDATE SET
    status = source.status,
    owner = source.owner,
    target_date = source.target_date,
    _modified_at = source._modified_at
WHEN NOT MATCHED THEN
  INSERT ROW

This is the right pattern for any Smartsheet sheet that updates frequently — project status grids, resource plans, budget trackers.
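
Wired together in Python, one incremental run looks roughly like the sketch below. It assumes MERGE_SQL holds the MERGE statement above as a string, and that the staging and main table names match it:

def incremental_sync(ss, bq, project, dataset, sheet_id):
    """One incremental run: pull changed rows, stage them, merge, record the sync."""
    since = get_last_sync_time(bq, project, dataset, sheet_id)
    sheet = get_modified_rows(ss, sheet_id, since)

    columns = {col.id: col.title for col in sheet.columns}
    col_types = {col.id: col.type for col in sheet.columns}
    rows = parse_rows(sheet, columns, col_types)

    if rows:
        # Replace the staging table with just the changed rows...
        load_to_bigquery(rows, project, dataset, 'projects_staging',
                         write_mode='WRITE_TRUNCATE')
        # ...then upsert them into the main table
        bq.query(MERGE_SQL).result()   # MERGE_SQL = the statement above, as a string

    record_sync(bq, project, dataset, sheet_id)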


Schema Design Recommendations

A few patterns that work well for Smartsheet data in BigQuery:

One table per sheet (simple, recommended to start)

Each Smartsheet sheet becomes a BQ table. Column names are sanitised sheet column titles. This mirrors the source structure and is easy to reason about.

smartsheet.projects           -- Main project portfolio sheet
smartsheet.resources          -- Resource allocation sheet
smartsheet.budget_tracker     -- Budget vs actuals sheet
smartsheet.intake_queue       -- Incoming request form responses

Add a _sheet_id column for multi-sheet merges

If you're ingesting multiple sheets with the same structure (e.g., one sheet per region or one sheet per team), add _sheet_id and _sheet_name columns so you can UNION ALL them cleanly:

SELECT *, '1234567890' AS _sheet_id, 'EMEA Projects' AS _sheet_name
FROM `smartsheet.emea_projects`
UNION ALL
SELECT *, '9876543210' AS _sheet_id, 'APAC Projects' AS _sheet_name
FROM `smartsheet.apac_projects`

Flatten hierarchy into a view

Keep the raw _parent_id and _indent in your base table, then expose a flattened view for analysts:

-- View: top-level projects with summary child counts
CREATE OR REPLACE VIEW `smartsheet.vw_projects_summary` AS
SELECT
  p.project_name,
  p.status,
  p.owner,
  p.target_date,
  COUNT(c._row_id) AS task_count,
  COUNTIF(c.status = 'Complete') AS tasks_complete,
  SAFE_DIVIDE(COUNTIF(c.status = 'Complete'), COUNT(c._row_id)) AS completion_rate
FROM `smartsheet.projects` p
LEFT JOIN `smartsheet.projects` c ON c._parent_id = p._row_id
WHERE p._parent_id IS NULL  -- Top-level rows only
GROUP BY 1, 2, 3, 4

Querying Smartsheet Data in BigQuery

Once your pipeline is running, BQ unlocks analysis that Smartsheet simply cannot do:

-- Portfolio health: how many projects are at risk this quarter?
SELECT
  owner,
  status,
  COUNT(*) AS project_count,
  COUNTIF(target_date < CURRENT_DATE() AND status != 'Complete') AS overdue
FROM `smartsheet.vw_projects_summary`
WHERE DATE_TRUNC(target_date, QUARTER) = DATE_TRUNC(CURRENT_DATE(), QUARTER)
GROUP BY 1, 2
ORDER BY overdue DESC;
-- Resource utilisation: who is over 100% allocated next month?
SELECT
  resource_name,
  SUM(allocation_pct) AS total_allocation,
  COUNT(DISTINCT project_name) AS project_count
FROM `smartsheet.resources`
WHERE start_date <= DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH)
  AND end_date   >= CURRENT_DATE()
GROUP BY 1
HAVING total_allocation > 100
ORDER BY total_allocation DESC;
-- Intake queue SLA: average resolution time by category over last 90 days
SELECT
  category,
  COUNT(*) AS total_requests,
  COUNTIF(resolution_date IS NOT NULL) AS resolved,
  AVG(DATE_DIFF(resolution_date, submitted_date, DAY)) AS avg_days_to_resolve
FROM `smartsheet.intake_queue`
WHERE submitted_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY 1
ORDER BY avg_days_to_resolve DESC;

This kind of analysis is what makes the pipeline worth building. Smartsheet shows you a snapshot; BigQuery lets you ask questions across time.


Downstream: Delivering Charts to Slack with Chartcastr

Getting data into BigQuery is the hard part. The last mile — making sure the people who need to act on that data actually see it — is where most teams drop the ball.

Stakeholders don't query BigQuery. They're in Slack. If your Smartsheet project health data lives in a BQ table with nobody looking at it, you've moved the "data sitting unread" problem from Smartsheet to a fancier location.

The pattern that works

Step 1: Surface BQ results in Google Sheets (Connected Sheets)

Use Connected Sheets to pull your BQ query results into a Google Sheet on a refresh schedule:

  1. Open a Google Sheet → Data → Data connectors → Connect to BigQuery
  2. Connect to your project and run your query (e.g. the portfolio health query above)
  3. Set Connected Sheets to refresh daily

The Sheet now stays current with your BQ data automatically.

Step 2: Build your chart in Sheets

Build a chart on the Connected Sheets range — bar chart of projects by status, line chart of intake queue volume over time, whatever is most useful for your audience.

Step 3: Connect to Chartcastr → deliver to Slack

Connect the Google Sheet to Chartcastr, pick the chart or data range, select a Slack channel, and set a schedule.

Every Monday morning, Chartcastr:

  1. Reads the latest data from your Connected Sheet (which has pulled from BQ)
  2. Renders a high-resolution chart
  3. Posts it to your Slack channel with an AI-written summary

[Image: a data chart automatically posted to Slack with AI analysis]

The AI summary explains what changed week-over-week — not just "here is the chart" but "project completion rate dropped 8 points; 3 projects moved from On Track to At Risk, all owned by the same team."

What to send, and to whom

BQ query                          | Slack channel        | Cadence
----------------------------------|----------------------|----------------------
Portfolio health by owner         | #project-management  | Weekly Monday
Overdue projects                  | #leadership          | Weekly Monday
Resource overallocation warnings  | #people-ops          | Weekly Friday
Intake queue volume + SLA         | #ops-team            | Daily
Budget burn rate                  | #finance             | Monthly first Monday

The pattern is: one BQ query → one Connected Sheet chart → one Chartcastr connection → one Slack channel on a schedule. You can have as many of these as you need.

The alternative: Chartcastr direct BigQuery connection

If you'd rather skip the Connected Sheets step, Chartcastr supports BigQuery as a direct source. Connect your GCP project, write the SQL, and Chartcastr handles the chart rendering and Slack delivery end-to-end — no intermediate Sheet required.


Putting It Together

The full Smartsheet → BigQuery → Slack pipeline looks like this:

Smartsheet (operational data)
    ↓  [Python pipeline / Fivetran / Airbyte]
BigQuery (analytical layer)
    ↓  [Connected Sheets or Chartcastr direct]
Google Sheets (optional rendering layer)
    ↓  [Chartcastr]
Slack (where the team actually sees it)

Each layer has a clear job:

  • Smartsheet — where the work happens and data is entered
  • BigQuery — where the data is analysed, joined, and historised
  • Connected Sheets / Chartcastr — where the data becomes a chart
  • Slack — where the insight reaches the team without them having to go look for it

The pipeline from Smartsheet to Slack isn't trivial to set up — but once it's running, the reporting that used to require someone manually pulling numbers and pasting screenshots into channels happens automatically every week.

Set up your first BigQuery → Slack chart delivery with Chartcastr →

