Use SQL with Table Sources
Use SQL expressions to manage your source data.
Access requirements
Profiles: Data Engineer
Custom profile with these capabilities: Data (Write, Detailed)
Data required: Access to the sources you're managing
Reach out to your administrator for access.
Overview
Sources are the building blocks for data in Visier. They are the raw materials to map your data to Visier's analytic model.
Visier supports:
- File sources: File-based containers that use regex and override behavior to manage data. For more information about acceptable files, see Data File Guidelines.
- Table sources: Row-based containers that use SQL and APIs to manage data. For more information, see Use SQL with Table Sources.
Table sources use SQL to bring data into Visier. You can create table sources:
- With Visier APIs. We recommend using APIs for table sources. For more information, see Table Source API.
- In Studio. The interface is best suited for troubleshooting data issues and working with table sources from data connectors, such as Snowflake. For more information, see Manually create a table source.
Difference from file sources
You can use Spark SQL to query, upsert, replace, commit, rollback, and process data. SQL expressions tell Visier how to map and replace data. File sources use Visier Extraction Language (VEL) and Override behavior.
A table source temporarily stores data while you commit the data to Visier, then discards the files. Table sources are better for real-time data processing because they do not track individual files indefinitely and can efficiently handle updates and deletions. As a result, the volume of data to process is much lower.
About transactions
To bring data into a table source, start a transaction. A transaction is a temporary container that stores data and SQL actions, such as inserting or replacing data in the table source. An open transaction is a write transaction or a read transaction.
- Write transaction: The transaction makes changes to a table source. One write transaction can be open at a time. Close a transaction by committing the transaction and processing the data. In an API call, specify tablesToModify to lock tables for writing during the transaction.
- Read transaction: The transaction can view data within a table source. While a SQL job processes changes to a table source, a read transaction will only see the data available prior to the job. In an API call, omit tablesToModify to make the transaction read-only.
About input files
A data file that you upload to a table source transaction is an input file. An input file can contain data to insert into your sources, IDs to match against to change or delete rows, or any other input for your SQL query.
SQL expressions reference input files in the transaction by the id field (set in the job's inputs array), not the filename.
Example: An input file is declared as "id": "new_employees" in the inputs field. SQL expressions reference this file as new_employees, regardless of the input file's filename.
If the id contains special characters such as dots or hyphens, wrap the ID in backticks; for example, `my-input.csv`.
Add data to a table source
Prerequisites: Connect your source data to objects in Visier. For more information, see Add a Mapping. If mapping by formula, use SQL expressions instead of VEL.
Use APIs to add data (recommended)
- Start a transaction: POST /v1alpha/data/table-sources/transactions. For more information and code samples, see Start a transaction.
- Upload a file to the transaction: PUT /v1alpha/data/table-sources/transactions/{transactionId}/inputs/{filename}. For more information and code samples, see Upload an input file to a transaction.
- Apply SQL expressions and process the data: POST /v1alpha/data/table-sources/transactions/{transactionId}/jobs. For more information and code samples, see Run a SQL job within a transaction.
Use Studio to add data
- On the global navigation bar, click Sources.
- Select the table source and click Records.
- In Temporary Data, click Add Files.
- Select the files to upload.
- In SQL Commands, type the SQL expressions to apply to the source. For an example of potential SQL expressions, see SQL examples.
- Click Execute SQL. The expression moves to Current Transaction.
- Optional: To remove SQL expressions, in Current Transaction, click Rollback.
- When finished, click Commit.
- To process the data, run a job. For more information, see Run a Job.
Define column settings
Assign each column a data type and optionally designate one column as the primary timestamp.
Use APIs to define columns (recommended)
Define the columns array in any of the following API calls:
- Create a table source: POST /v1alpha/data/table-sources/tables. For more information and code samples, see Create a table source.
- Update a table source: PUT /v1alpha/data/table-sources/tables/{tableId}. For more information and code samples, see Update a table source.
Use Studio to define columns
After creating a table source in Studio, you can create columns, assign column data types, and choose the primary timestamp. For more information, see Manually create a table source.
SQL examples
This section provides sample SQL expressions to use in table source queries and mapping formulas. For more information about SQL functions, see Apache Spark SQL Functions.
Table source mapping formulas have protective limitations. They cannot use functions that change row counts or depend on table order. For a list of SQL functions that are not supported, see Unsupported SQL functions.
Samples in this section use the following table.
-- employee_custom_data
-- Columns: employee_id (STRING), badge_number (STRING), office_floor (INTEGER),
-- is_remote (BOOLEAN), effective_date (DATE)
SELECT
SELECT employee_id, badge_number, office_floor, effective_date
FROM employee_custom_data
SELECT employee_id, badge_number, office_floor, effective_date
FROM employee_custom_data
WHERE effective_date >= '2026-01-01'
AND is_remote = false
SELECT office_floor,
COUNT(*) AS headcount,
SUM(CASE WHEN is_remote THEN 1 ELSE 0 END) AS remote_count
FROM employee_custom_data
WHERE effective_date = '2026-01-01'
GROUP BY office_floor
ORDER BY office_floor
SELECT e.employee_id,
e.badge_number,
n.office_floor,
n.is_remote,
n.effective_date
FROM new_employees n
LEFT JOIN employee_custom_data e
ON e.employee_id = n.employee_id
AND e.effective_date = n.effective_date
SELECT employee_id, badge_number, office_floor, is_remote, effective_date
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY effective_date DESC) AS rn
FROM employee_custom_data
) ranked
WHERE rn = 1
INSERT
INSERT INTO employee_custom_data
SELECT employee_id, badge_number, office_floor, is_remote, effective_date
FROM new_employees
INSERT INTO employee_custom_data (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES ('E001', 'B-1042', 3, false, '2026-03-01')
INSERT INTO employee_custom_data
SELECT n.employee_id, n.badge_number, n.office_floor, n.is_remote, n.effective_date
FROM new_employees n
WHERE NOT EXISTS (
SELECT 1
FROM employee_custom_data e
WHERE e.employee_id = n.employee_id
AND e.effective_date = n.effective_date
)
UPDATE
UPDATE employee_custom_data
SET is_remote = true
WHERE office_floor = 0
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED THEN
UPDATE SET
office_floor = source.office_floor,
is_remote = source.is_remote
Note: UPDATE only supports a WHERE clause. UPDATE does not support join-based updates. Use MERGE with a WHEN MATCHED clause to update existing rows from a source view without inserting new rows.
DELETE
DELETE FROM employee_custom_data
WHERE effective_date = '2025-12-31'
DELETE FROM employee_custom_data
WHERE employee_id NOT IN (
SELECT employee_id FROM new_employees
)
DELETE FROM employee_custom_data
WHERE (employee_id, effective_date) IN (
SELECT employee_id, effective_date
FROM new_employees
WHERE is_deleted = true
)
The above example processes a CDC feed where the input file includes an is_deleted column. This removes target file rows that have a matching key in the source file and are flagged for deletion.
MERGE
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED THEN
UPDATE SET
badge_number = source.badge_number,
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
A single MERGE statement can handle updates and inserts but cannot delete rows that are absent from the source. To implement a full sync, run a MERGE for the upsert followed by a separate DELETE for rows no longer present in the source. These can be submitted as two separate query actions in the same job.
// Step 1: Upsert from the source file
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED THEN
UPDATE SET
badge_number = source.badge_number,
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
// Step 2: Delete rows that are no longer in the source file
DELETE FROM employee_custom_data
WHERE employee_id NOT IN (SELECT employee_id FROM new_employees)
The following example processes a CDC feed where the input includes an op column indicating the operation type:
- 'I' for insert.
- 'U' for update.
- 'D' for delete.
This removes rows matched by key that are flagged for deletion, updates other matched rows are updated, and inserts unmatched rows that are not deletions.
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED AND source.op = 'D' THEN
DELETE
WHEN MATCHED AND source.op != 'D' THEN
UPDATE SET
badge_number = source.badge_number,
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED AND source.op != 'D' THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED AND (
target.office_floor <> source.office_floor OR
target.is_remote <> source.is_remote
) THEN
UPDATE SET
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
Spark SQL for column mappings
Use these expressions in the SELECT list of a query to transform or derive column values before writing them to a table source.
String functions
SUBSTR(employee_id, 1, 3) AS id_prefix
REGEXP_REPLACE(badge_number, '[^A-Za-z0-9]', '') AS badge_clean
SPLIT(location_code, '-')[0] AS region_code
COALESCE(department_name, 'Unknown') AS department_name
COALESCE(department_name, 'Unknown') AS department_name
Date and time functions
YEAR(effective_date) AS year
MONTH(effective_date) AS month
DAYOFWEEK(effective_date) AS day_of_week
Conditional expressions
CASE
WHEN office_floor >= 10 THEN 'Executive'
WHEN office_floor >= 5 THEN 'Senior'
ELSE 'Standard'
END AS floor_tier
NULLIF(badge_number, 'N/A') AS badge_number
COALESCE(preferred_name, legal_name, employee_id) AS display_name
Numeric functions
CASE WHEN total > 0 THEN ROUND(part / total * 100, 1) ELSE 0.0 END AS pct
Type casting
Hashing and anonymization
SHA2(employee_id, 256) AS employee_id_hash
Unsupported SQL functions
Table source mapping formulas have protective limitations. They cannot use functions that change row counts or depend on table order.
Unsupported in mapping formulas and source queries
Table sources do not support the following functions in any way.
- Data definition language (DDL): CREATE, DROP, ALTER
- Procedures: CALL
- Table properties: TBLPROPERTIES
- Insecure functions: These functions expose sensitive information or allow arbitrary code execution, and are not supported.
- current_catalog
- current_database
- current_schema
- current_user
- java_method
- raise_method
- raise_error
- reflect
- try_reflect
- user
- version
Unsupported in mapping formulas
You can use the following functions in source queries but not in mapping formulas.
Aggregating functions
- array_agg
- bitmap_or_agg
- bitmap_construct_agg
- count
- count_if
- count_min_sketch
- cume_dist
- histogram_numeric
- hll_sketch_agg
- hll_union_agg
- kurtosis
- grouping
- grouping_id
- sum
- avg
- min
- max
- collect_list
- collect_set
- approx_count_distinct
- first
- last
Set generating functions
- explode
- explode_outer
- inline
- inline_outer
- json_tuple
- posexplode
- posexplode_outer
- stack
Side effect functions
- curdate
- current_date
- current_timestamp
- localtimestamp
- monotonically_increasing_id
- now
- rand
- randn
- random
- session_window
- shuffle
Window functions
- row_number
- rank
- dense_rank
- ntile
- lag
- lead
- cume_dist
- percent_rank
