Use SQL with Table Sources
Use SQL expressions to manage your source data.
Access requirements
Profiles: Data Engineer
Custom profile with these capabilities: Data (Write, Detailed)
Data required: Access to the sources you're managing
Reach out to your administrator for access.
Overview
Sources are the building blocks for data in Visier. They are the raw materials to map your data to Visier's analytic model.
Visier supports the following source types:
- File sources: File-based containers that use regex and override behavior to manage data. For more information about acceptable files, see Data File Guidelines.
- Table sources: Row-based containers that support incremental updates and SQL transformations during sync. For more information, see Use SQL with Table Sources.
The following table describes the differences between the behavior of each source type.
|
File sources |
Table sources |
|---|---|
|
Each data transfer or extraction outputs to a unique file. |
Each data transfer or extraction directly modifies the same table. |
|
Use Visier Extraction Language to query records in a source's Records tab or SQL to query sources in the Data > Query room. For more information, see Query Source Data. |
Use Spark SQL to interact with the table data. For more information, see Use SQL with Table Sources. |
|
To delete an individual's data, such as for GDPR compliance, open a Visier Technical Support ticket to remove the information one-by-one from every extracted file. |
To delete an individual's data, such as for GDPR compliance, use a DELETE SQL query on the table without contacting Visier Technical Support. Note: The snapshot retention period specifies how long to keep previous snapshots of tables before permanently deleting them. You can set the retention period between 1 and 180 days. The default is 3 days. |
|
The data is available in the Data Transfers room. For more information, see Navigate Studio. |
The data is available in the associated table source in the Sources room. For more information, see Sources. |
You can create table sources:
- Using data connectors. For more information, see Data In Connectors.
- With Visier APIs. For more information, see Table Source API.
- Manually. For more information, see Manually create a table source.
Difference from file sources
You can use Spark SQL to query, upsert, replace, commit, rollback, and process data. SQL expressions tell Visier how to map and replace data. File sources use Visier Extraction Language (VEL) and Override behavior.
A table source temporarily stores data while you commit the data to Visier, then discards the files. Table sources are better for near real-time data processing because they do not track individual files indefinitely and can efficiently handle updates and deletions. As a result, the volume of data to process is much lower.
About transactions
To bring data into a table source, start a transaction. A transaction is a temporary container that stores data and SQL actions, such as inserting or replacing data in the table source. An open transaction is a write transaction or a read transaction.
- Write transaction: The transaction makes changes to a table source. One write transaction can be open at a time. Close a transaction by committing the transaction and processing the data. In an API call, specify tablesToModify to lock tables for writing during the transaction.
- Read transaction: The transaction can view data within a table source. While a SQL job processes changes to a table source, a read transaction will only see the data available prior to the job. In an API call, omit tablesToModify to make the transaction read-only.
About input files
A data file that you upload to a table source transaction is an input file. An input file can contain data to insert into your sources, IDs to match against to change or delete rows, or any other input for your SQL query.
SQL expressions reference input files in the transaction by the id field (set in the job's inputs array), not the filename.
Example: An input file is declared as "id": "new_employees" in the inputs field. SQL expressions reference this file as new_employees, regardless of the input file's filename.
If the id contains special characters such as dots or hyphens, wrap the ID in backticks; for example, `my-input.csv`.
Add data to a table source
Prerequisites: Connect your source data to objects in Visier. For more information, see Add a Mapping. If mapping by formula, use SQL expressions instead of VEL.
Use connectors to add data
If your table source is associated with a data connector, the table source receives new data after a data connector extraction job runs. For more information, see Run a Data Connector Extraction Job.
After a successful extraction job, the data is available in the associated table source in the Sources room. For more information, see Sources.
Use APIs to add data
- Start a transaction: POST /v1alpha/data/table-sources/transactions. For more information and code samples, see Start a transaction.
- Upload a file to the transaction: PUT /v1alpha/data/table-sources/transactions/{transactionId}/inputs/{filename}. For more information and code samples, see Upload an input file to a transaction.
- Apply SQL expressions and process the data: POST /v1alpha/data/table-sources/transactions/{transactionId}/jobs. For more information and code samples, see Run a SQL job within a transaction.
Manually add data
- On the global navigation bar, click Sources.
- Select the table source and click Records.
- In Temporary Data, click Add Files.
- Select the files to upload.
- In SQL Commands, type the SQL expressions to apply to the source. For an example of potential SQL expressions, see SQL examples.
- Click Execute SQL. The expression moves to Current Transaction.
- Optional: To remove SQL expressions, in Current Transaction, click Rollback.
- When finished, click Commit.
- To process the data, run a job. For more information, see Run a Job.
Define column settings
Assign each column a data type and optionally designate one column as the primary timestamp.
Use APIs to define columns
Define the columns array in any of the following API calls:
- Create a table source: POST /v1alpha/data/table-sources/tables. For more information and code samples, see Create a table source.
- Update a table source: PUT /v1alpha/data/table-sources/tables/{tableId}. For more information and code samples, see Update a table source.
Manually define columns
When you access a table source in the Sources room, you can create columns, assign column data types, and choose the primary timestamp. For more information, see Manually create a table source.
Edit key-value pairs
View and change the user-defined key-value pairs for a table source. In addition to key-value pairs, a table source's Table Properties tab displays the source's number of records, last updated date, and data size.
Use APIs to edit key-value pairs
- Retrieve a table source's existing key-value pairs: GET /v1alpha/data/table-sources/tables/{tableId}/metadata. In the response, look for metadata. For more information and code samples, see Retrieve a table source's metadata.
- Run a SQL job with the setMetadata action. POST /v1alpha/data/table-sources/transactions/{transactionId}/jobs. For more information and code samples, see setMetadata.
Manually edit key-value pairs
- On the global navigation bar, click Sources.
- Select the table source and click Table Properties.
- In Key/Value Metadata, do any of the following:
- To add a new key-value pair, click Add Pair. Type a key and value.
- To change an existing key-value pair, type a new key or value.
- To delete a key-value pair, click the Delete button
.
- To revert your changes without saving, click Reset.
- To save your changes, click Commit. This executes the changes to the table source.
SQL examples
This section provides sample SQL expressions to use in table source queries and mapping formulas. For more information about SQL functions, see Apache Spark SQL Functions.
Table source mapping formulas have protective limitations. They cannot use functions that change row counts or depend on table order. For a list of SQL functions that are not supported, see Unsupported SQL functions.
Samples in this section use the following table.
-- employee_custom_data
-- Columns: employee_id (STRING), badge_number (STRING), office_floor (INTEGER),
-- is_remote (BOOLEAN), effective_date (DATE)
SELECT
SELECT employee_id, badge_number, office_floor, effective_date
FROM employee_custom_data
SELECT employee_id, badge_number, office_floor, effective_date
FROM employee_custom_data
WHERE effective_date >= '2026-01-01'
AND is_remote = false
SELECT office_floor,
COUNT(*) AS headcount,
SUM(CASE WHEN is_remote THEN 1 ELSE 0 END) AS remote_count
FROM employee_custom_data
WHERE effective_date = '2026-01-01'
GROUP BY office_floor
ORDER BY office_floor
SELECT e.employee_id,
e.badge_number,
n.office_floor,
n.is_remote,
n.effective_date
FROM new_employees n
LEFT JOIN employee_custom_data e
ON e.employee_id = n.employee_id
AND e.effective_date = n.effective_date
SELECT employee_id, badge_number, office_floor, is_remote, effective_date
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY effective_date DESC) AS rn
FROM employee_custom_data
) ranked
WHERE rn = 1
INSERT
INSERT INTO employee_custom_data
SELECT employee_id, badge_number, office_floor, is_remote, effective_date
FROM new_employees
INSERT INTO employee_custom_data (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES ('E001', 'B-1042', 3, false, '2026-03-01')
INSERT INTO employee_custom_data
SELECT n.employee_id, n.badge_number, n.office_floor, n.is_remote, n.effective_date
FROM new_employees n
WHERE NOT EXISTS (
SELECT 1
FROM employee_custom_data e
WHERE e.employee_id = n.employee_id
AND e.effective_date = n.effective_date
)
UPDATE
UPDATE employee_custom_data
SET is_remote = true
WHERE office_floor = 0
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED THEN
UPDATE SET
office_floor = source.office_floor,
is_remote = source.is_remote
Note: UPDATE only supports a WHERE clause. UPDATE does not support join-based updates. Use MERGE with a WHEN MATCHED clause to update existing rows from a source view without inserting new rows.
DELETE
DELETE FROM employee_custom_data
WHERE effective_date = '2025-12-31'
DELETE FROM employee_custom_data
WHERE employee_id NOT IN (
SELECT employee_id FROM new_employees
)
DELETE FROM employee_custom_data
WHERE (employee_id, effective_date) IN (
SELECT employee_id, effective_date
FROM new_employees
WHERE is_deleted = true
)
The above example processes a CDC feed where the input file includes an is_deleted column. This removes target file rows that have a matching key in the source file and are flagged for deletion.
MERGE
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED THEN
UPDATE SET
badge_number = source.badge_number,
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
A single MERGE statement can handle updates and inserts but cannot delete rows that are absent from the source. To implement a full sync, run a MERGE for the upsert followed by a separate DELETE for rows no longer present in the source. These can be submitted as two separate query actions in the same job.
// Step 1: Upsert from the source file
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED THEN
UPDATE SET
badge_number = source.badge_number,
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
// Step 2: Delete rows that are no longer in the source file
DELETE FROM employee_custom_data
WHERE employee_id NOT IN (SELECT employee_id FROM new_employees)
The following example processes a CDC feed where the input includes an op column indicating the operation type:
- 'I' for insert.
- 'U' for update.
- 'D' for delete.
This removes rows matched by key that are flagged for deletion, updates other matched rows are updated, and inserts unmatched rows that are not deletions.
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED AND source.op = 'D' THEN
DELETE
WHEN MATCHED AND source.op != 'D' THEN
UPDATE SET
badge_number = source.badge_number,
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED AND source.op != 'D' THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
MERGE INTO employee_custom_data AS target
USING new_employees AS source
ON target.employee_id = source.employee_id
AND target.effective_date = source.effective_date
WHEN MATCHED AND (
target.office_floor <> source.office_floor OR
target.is_remote <> source.is_remote
) THEN
UPDATE SET
office_floor = source.office_floor,
is_remote = source.is_remote
WHEN NOT MATCHED THEN
INSERT (employee_id, badge_number, office_floor, is_remote, effective_date)
VALUES (source.employee_id, source.badge_number, source.office_floor, source.is_remote, source.effective_date)
Spark SQL for column mappings
Use these expressions in the SELECT list of a query to transform or derive column values before writing them to a table source.
String functions
SUBSTR(employee_id, 1, 3) AS id_prefix
REGEXP_REPLACE(badge_number, '[^A-Za-z0-9]', '') AS badge_clean
SPLIT(location_code, '-')[0] AS region_code
COALESCE(department_name, 'Unknown') AS department_name
COALESCE(department_name, 'Unknown') AS department_name
Date and time functions
YEAR(effective_date) AS year
MONTH(effective_date) AS month
DAYOFWEEK(effective_date) AS day_of_week
Conditional expressions
CASE
WHEN office_floor >= 10 THEN 'Executive'
WHEN office_floor >= 5 THEN 'Senior'
ELSE 'Standard'
END AS floor_tier
NULLIF(badge_number, 'N/A') AS badge_number
COALESCE(preferred_name, legal_name, employee_id) AS display_name
Numeric functions
CASE WHEN total > 0 THEN ROUND(part / total * 100, 1) ELSE 0.0 END AS pct
Type casting
Hashing and anonymization
SHA2(employee_id, 256) AS employee_id_hash
Unsupported SQL functions
Table source mapping formulas have protective limitations. They cannot use functions that change row counts or depend on table order.
Unsupported in mapping formulas and source queries
Table sources do not support the following functions in any way.
- Data definition language (DDL): CREATE, DROP, ALTER
- Procedures: CALL
- Table properties: TBLPROPERTIES
- Insecure functions: These functions expose sensitive information or allow arbitrary code execution, and are not supported.
- current_catalog
- current_database
- current_schema
- current_user
- java_method
- raise_method
- raise_error
- reflect
- try_reflect
- user
- version
Unsupported in mapping formulas
You can use the following functions in source queries but not in mapping formulas.
Aggregating functions
- array_agg
- bitmap_or_agg
- bitmap_construct_agg
- count
- count_if
- count_min_sketch
- cume_dist
- histogram_numeric
- hll_sketch_agg
- hll_union_agg
- kurtosis
- grouping
- grouping_id
- sum
- avg
- min
- max
- collect_list
- collect_set
- approx_count_distinct
- first
- last
Set generating functions
- explode
- explode_outer
- inline
- inline_outer
- json_tuple
- posexplode
- posexplode_outer
- stack
Side effect functions
- curdate
- current_date
- current_timestamp
- localtimestamp
- monotonically_increasing_id
- now
- rand
- randn
- random
- session_window
- shuffle
Window functions
- row_number
- rank
- dense_rank
- ntile
- lag
- lead
- cume_dist
- percent_rank
