Use SQL with Table Sources

Use SQL expressions to manage your source data.

Limited Availability

Access requirements

Profiles: Data Engineer

Custom profile with these capabilities: Data (Write, Detailed)

Data required: Access to the sources you're managing

Reach out to your administrator for access.

Overview

Sources are the building blocks for data in Visier. They are the raw materials to map your data to Visier's analytic model.

Visier supports:

  • File sources: File-based containers that use regex and override behavior to manage data. For more information about acceptable files, see Data File Guidelines.
  • Table sources: Row-based containers that use SQL and APIs to manage data. For more information, see Use SQL with Table Sources.

Table sources use SQL to bring data into Visier. You can create table sources:

  • With Visier APIs. We recommend using APIs for table sources. For more information, see Table Source API.
  • In Studio. The interface is best suited for troubleshooting data issues and working with table sources from data connectors, such as Snowflake. For more information, see Manually create a table source.

Difference from file sources

You can use Spark SQL to query, upsert, replace, commit, rollback, and process data. SQL expressions tell Visier how to map and replace data. File sources use Visier Extraction Language (VEL) and Override behavior.

A table source temporarily stores data while you commit the data to Visier, then discards the files. Table sources are better for real-time data processing because they do not track individual files indefinitely and can efficiently handle updates and deletions. As a result, the volume of data to process is much lower.

About transactions

To bring data into a table source, start a transaction. A transaction is a temporary container that stores data and SQL actions, such as inserting or replacing data in the table source. An open transaction is a write transaction or a read transaction.

  • Write transaction: The transaction makes changes to a table source. One write transaction can be open at a time. Close a transaction by committing the transaction and processing the data. In an API call, specify tablesToModify to lock tables for writing during the transaction.
  • Read transaction: The transaction can view data within a table source. While a SQL job processes changes to a table source, a read transaction will only see the data available prior to the job. In an API call, omit tablesToModify to make the transaction read-only.

About input files

A data file that you upload to a table source transaction is an input file. An input file can contain data to insert into your sources, IDs to match against to change or delete rows, or any other input for your SQL query.

SQL expressions reference input files in the transaction by the id field (set in the job's inputs array), not the filename.

Example: An input file is declared as "id": "new_employees" in the inputs field. SQL expressions reference this file as new_employees, regardless of the input file's filename.

If the id contains special characters such as dots or hyphens, wrap the ID in backticks; for example, `my-input.csv`.

Add data to a table source

Prerequisites: Connect your source data to objects in Visier. For more information, see Add a Mapping. If mapping by formula, use SQL expressions instead of VEL.

Use APIs to add data (recommended)

  1. Start a transaction: POST /v1alpha/data/table-sources/transactions. For more information and code samples, see Start a transaction.
  2. Upload a file to the transaction: PUT /v1alpha/data/table-sources/transactions/{transactionId}/inputs/{filename}. For more information and code samples, see Upload an input file to a transaction.
  3. Apply SQL expressions and process the data: POST /v1alpha/data/table-sources/transactions/{transactionId}/jobs. For more information and code samples, see Run a SQL job within a transaction.

Use Studio to add data

  1. On the global navigation bar, click Sources.
  2. Select the table source and click Records.
  3. In Temporary Data, click Add Files.
  4. Select the files to upload.
  5. In SQL Commands, type the SQL expressions to apply to the source. For an example of potential SQL expressions, see SQL examples.
  6. Click Execute SQL. The expression moves to Current Transaction.
  7. Optional: To remove SQL expressions, in Current Transaction, click Rollback.
  8. When finished, click Commit.
  9. To process the data, run a job. For more information, see Run a Job.

Define column settings

Assign each column a data type and optionally designate one column as the primary timestamp.

Use APIs to define columns (recommended)

Define the columns array in any of the following API calls:

  1. Create a table source: POST /v1alpha/data/table-sources/tables. For more information and code samples, see Create a table source.
  2. Update a table source: PUT /v1alpha/data/table-sources/tables/{tableId}. For more information and code samples, see Update a table source.

Use Studio to define columns

After creating a table source in Studio, you can create columns, assign column data types, and choose the primary timestamp. For more information, see Manually create a table source.

SQL examples

This section provides sample SQL expressions to use in table source queries and mapping formulas. For more information about SQL functions, see Apache Spark SQL Functions.

Table source mapping formulas have protective limitations. They cannot use functions that change row counts or depend on table order. For a list of SQL functions that are not supported, see Unsupported SQL functions.

Samples in this section use the following table.

Copy
-- employee_custom_data
-- Columns: employee_id (STRING), badge_number (STRING), office_floor (INTEGER),
--          is_remote (BOOLEAN), effective_date (DATE)

SELECT

Copy
Example: Basic select
SELECT employee_id, badge_number, office_floor, effective_date
FROM employee_custom_data
Copy
Example: Filter rows
SELECT employee_id, badge_number, office_floor, effective_date
FROM employee_custom_data
WHERE effective_date >= '2026-01-01'
  AND is_remote = false
Copy
Example: Aggregate by office floor
SELECT office_floor,
       COUNT(*)                                         AS headcount,
       SUM(CASE WHEN is_remote THEN 1 ELSE 0 END)      AS remote_count
FROM employee_custom_data
WHERE effective_date = '2026-01-01'
GROUP BY office_floor
ORDER BY office_floor
Copy
Example: Join against an input file with ID new_employees
SELECT e.employee_id,
       e.badge_number,
       n.office_floor,
       n.is_remote,
       n.effective_date
FROM new_employees n
LEFT JOIN employee_custom_data e
    ON e.employee_id = n.employee_id
   AND e.effective_date = n.effective_date
Copy
Example: Remove duplicate rows and keep the most recent record per employee
SELECT employee_id, badge_number, office_floor, is_remote, effective_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY effective_date DESC) AS rn
    FROM employee_custom_data
) ranked
WHERE rn = 1

INSERT

Copy
Example: Insert rows from an input view with ID new_employees
INSERT INTO employee_custom_data
SELECT employee_id, badge_number, office_floor, is_remote, effective_date
FROM new_employees
Copy
Example: Insert a single row
INSERT INTO employee_custom_data (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES ('E001', 'B-1042', 3, false, '2026-03-01')
Copy
Example: Insert only rows that don't already exist in the target file
INSERT INTO employee_custom_data
SELECT n.employee_id, n.badge_number, n.office_floor, n.is_remote, n.effective_date
FROM new_employees n
WHERE NOT EXISTS (
    SELECT 1
    FROM employee_custom_data e
    WHERE e.employee_id   = n.employee_id
      AND e.effective_date = n.effective_date
)

UPDATE

Copy
Example: Update a single column for all matching rows
UPDATE employee_custom_data
SET is_remote = true
WHERE office_floor = 0
Copy
Example: Update multiple columns using values from an input view
MERGE INTO employee_custom_data AS target
USING new_employees AS source
    ON target.employee_id    = source.employee_id
   AND target.effective_date = source.effective_date
WHEN MATCHED THEN
    UPDATE SET
        office_floor = source.office_floor,
        is_remote    = source.is_remote

Note: UPDATE only supports a WHERE clause. UPDATE does not support join-based updates. Use MERGE with a WHEN MATCHED clause to update existing rows from a source view without inserting new rows.

DELETE

Copy
Example: Delete rows for a specific date
DELETE FROM employee_custom_data
WHERE effective_date = '2025-12-31'
Copy
Example: Delete rows from the table that have no matching record in the input file. This acts as a hard delete for removed employees.
DELETE FROM employee_custom_data
WHERE employee_id NOT IN (
    SELECT employee_id FROM new_employees
)
Copy
Example: Change data capture (CDC) to delete rows flagged for deletion in the source
DELETE FROM employee_custom_data
WHERE (employee_id, effective_date) IN (
    SELECT employee_id, effective_date
    FROM new_employees
    WHERE is_deleted = true
)

The above example processes a CDC feed where the input file includes an is_deleted column. This removes target file rows that have a matching key in the source file and are flagged for deletion.

MERGE

Copy
Example: Upsert (insert new rows, update existing rows)
MERGE INTO employee_custom_data AS target
USING new_employees AS source
    ON target.employee_id   = source.employee_id
   AND target.effective_date = source.effective_date
WHEN MATCHED THEN
    UPDATE SET
        badge_number = source.badge_number,
        office_floor = source.office_floor,
        is_remote    = source.is_remote
WHEN NOT MATCHED THEN
    INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
    VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)

A single MERGE statement can handle updates and inserts but cannot delete rows that are absent from the source. To implement a full sync, run a MERGE for the upsert followed by a separate DELETE for rows no longer present in the source. These can be submitted as two separate query actions in the same job.

Copy
Example: Full sync (insert new rows, update existing rows, delete removed rows)
// Step 1: Upsert from the source file
MERGE INTO employee_custom_data AS target
USING new_employees AS source
    ON target.employee_id   = source.employee_id
   AND target.effective_date = source.effective_date
WHEN MATCHED THEN
    UPDATE SET
        badge_number = source.badge_number,
        office_floor = source.office_floor,
        is_remote    = source.is_remote
WHEN NOT MATCHED THEN
    INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
    VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)

// Step 2: Delete rows that are no longer in the source file
DELETE FROM employee_custom_data
WHERE employee_id NOT IN (SELECT employee_id FROM new_employees)

The following example processes a CDC feed where the input includes an op column indicating the operation type:

  • 'I' for insert.
  • 'U' for update.
  • 'D' for delete.

This removes rows matched by key that are flagged for deletion, updates other matched rows are updated, and inserts unmatched rows that are not deletions.

Copy
Example: Change data capture (CDC) to apply a change feed with inserts, updates, and deletes
MERGE INTO employee_custom_data AS target
USING new_employees AS source
    ON target.employee_id   = source.employee_id
   AND target.effective_date = source.effective_date
WHEN MATCHED AND source.op = 'D' THEN
    DELETE
WHEN MATCHED AND source.op != 'D' THEN
    UPDATE SET
        badge_number = source.badge_number,
        office_floor = source.office_floor,
        is_remote    = source.is_remote
WHEN NOT MATCHED AND source.op != 'D' THEN
    INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
    VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
Copy
Example: Only update rows where specific columns are different
MERGE INTO employee_custom_data AS target
USING new_employees AS source
    ON target.employee_id   = source.employee_id
   AND target.effective_date = source.effective_date
WHEN MATCHED AND (
    target.office_floor <> source.office_floor OR
    target.is_remote    <> source.is_remote
) THEN
    UPDATE SET
        office_floor = source.office_floor,
        is_remote    = source.is_remote
WHEN NOT MATCHED THEN
    INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
    VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)

Spark SQL for column mappings

Use these expressions in the SELECT list of a query to transform or derive column values before writing them to a table source.

String functions

Copy
Example: Normalize to uppercase
UPPER(employee_id)                                  AS employee_id
Copy
Example: Strip leading and trailing whitespace
TRIM(badge_number)                                  AS badge_number
Copy
Example: Concatenate columns
CONCAT(first_name, ' ', last_name)                  AS full_name
Copy
Example: Concatenate columns, taking the separator once
CONCAT_WS(' ', first_name, last_name)
Copy
Example: Extract a substring (1-indexed) for characters 1 to 3
SUBSTR(employee_id, 1, 3)                           AS id_prefix
Copy
Example: Remove non-alphanumeric characters
REGEXP_REPLACE(badge_number, '[^A-Za-z0-9]', '')    AS badge_clean
Copy
Example: Split a delimited string and take the first element
SPLIT(location_code, '-')[0]                        AS region_code
Copy
Example: Replace nulls with a default string
COALESCE(department_name, 'Unknown')                AS department_name
Copy
Example: Replace nulls with a default string
COALESCE(department_name, 'Unknown')                AS department_name

Date and time functions

Copy
Example: Parse a string into a date
TO_DATE(hire_date_str, 'yyyy-MM-dd')                AS hire_date
Copy
Example: Format a date as a string
DATE_FORMAT(effective_date, 'yyyy-MM')              AS year_month
Copy
Example: Add 90 days to a date
DATE_ADD(effective_date, 90)                        AS review_date
Copy
Example: Number of days between two dates
DATEDIFF(end_date, start_date)                      AS tenure_days
Copy
Example: Extract parts of a date
YEAR(effective_date)                                AS year
MONTH(effective_date)                               AS month
DAYOFWEEK(effective_date)                           AS day_of_week
Copy
Example: Current date at query time
CURRENT_DATE()                                      AS snapshot_date
Copy
Example: Truncate to the start of the month
DATE_TRUNC('month', effective_date)                 AS month_start

Conditional expressions

Copy
Example: Simple if/else
IF(is_remote, 'Remote', 'On-site')                  AS work_location
Copy
Example: Multi-branch conditional
CASE
    WHEN office_floor >= 10 THEN 'Executive'
    WHEN office_floor >= 5  THEN 'Senior'
    ELSE                         'Standard'
END                                                 AS floor_tier
Copy
Example: Return null if two values are equal (useful for clearing sentinel values)
NULLIF(badge_number, 'N/A')                         AS badge_number
Copy
Example: First non-null value in a list
COALESCE(preferred_name, legal_name, employee_id)   AS display_name

Numeric functions

Copy
Example: Round to 2 decimal places
ROUND(salary / 12.0, 2)                             AS monthly_salary
Copy
Example: Absolute value
ABS(variance)                                       AS abs_variance
Copy
Example: Integer division remainder
MOD(employee_seq, 10)                               AS shard_key
Copy
Example: Cast string to integer
CAST(headcount_str AS INTEGER)                      AS headcount
Copy
Example: Safe division (avoid divide-by-zero)
CASE WHEN total > 0 THEN ROUND(part / total * 100, 1) ELSE 0.0 END AS pct

Type casting

Copy
Example: String to date
CAST('2026-01-01' AS DATE)                          AS start_date
Copy
Example: Integer to string
CAST(office_floor AS STRING)                        AS floor_label
Copy
Example: String to Boolean
CAST(is_remote_flag AS BOOLEAN)                     AS is_remote
Copy
Example: Timestamp to date (drops the time component)
CAST(updated_at AS DATE)                            AS updated_date

Hashing and anonymization

Copy
Example: SHA-256 hash of a column (for pseudonymization)
SHA2(employee_id, 256)                              AS employee_id_hash
Copy
Example: MD5 hash
MD5(CONCAT(employee_id, effective_date))            AS row_fingerprint

Unsupported SQL functions

Table source mapping formulas have protective limitations. They cannot use functions that change row counts or depend on table order.

Unsupported in mapping formulas and source queries

Table sources do not support the following functions in any way.

  • Data definition language (DDL)CREATE, DROP, ALTER
  • ProceduresCALL
  • Table propertiesTBLPROPERTIES
  • Insecure functions: These functions expose sensitive information or allow arbitrary code execution, and are not supported. 
    • current_catalog
    • current_database
    • current_schema
    • current_user
    • java_method
    • raise_method
    • raise_error
    • reflect
    • try_reflect
    • user
    • version

Unsupported in mapping formulas

You can use the following functions in source queries but not in mapping formulas.

Aggregating functions

  • array_agg
  • bitmap_or_agg
  • bitmap_construct_agg
  • count
  • count_if
  • count_min_sketch
  • cume_dist
  • histogram_numeric
  • hll_sketch_agg
  • hll_union_agg
  • kurtosis
  • grouping
  • grouping_id
  • sum
  • avg
  • min
  • max
  • collect_list
  • collect_set
  • approx_count_distinct
  • first
  • last

Set generating functions

  • explode
  • explode_outer
  • inline
  • inline_outer
  • json_tuple
  • posexplode
  • posexplode_outer
  • stack

Side effect functions

  • curdate
  • current_date
  • current_timestamp
  • localtimestamp
  • monotonically_increasing_id
  • now
  • rand
  • randn
  • random
  • session_window
  • shuffle

Window functions

  • row_number
  • rank
  • dense_rank
  • ntile
  • lag
  • lead
  • cume_dist
  • percent_rank