Query Source Data

Learn how to query records from your data sources.

Who can use this feature?

Users with this profile:

  • Data Engineer or Data: Read (access level), Detailed (view level)

Additionally, users must have full data access to query sources, such as Super Admin Permission. For more information, see Super Admin Permission.

Not sure if you have this feature or capability? Reach out to your administrator.

Overview

You can investigate your source data in Visier with the source query tool. Sources in Visier store data before processing; a source contains the raw data from any data files that have the same content and structure. You can write SQL-like queries to retrieve specific records from your sources, such as finding duplicate records or finding which files contain a specific record. To access the source query room, do one of the following:

  • In the solution experience, in a visualization, click the Info button > Metric lineage > Metric Validation button > Launch Source Query.
  • In Studio, on the global navigation bar, click Data > Query.

The following screenshot shows the Query room in the global workspace.

  1. Search: Find a source or source column in the source directory.
  2. Source directory: A list of available sources to query. Expand a source to see its columns and column data types. Hover over a source or column and then click the Add button to insert it into the query.
  3. Filter: Filter the source directory to only sources that have records from included files.
  4. Sort: Sort the source directory alphabetically in ascending or descending order.
  5. Include records from excluded files: Option to include records in the query from data files that are excluded from the source.
  6. Query box: The query statement.
  7. Time picker: Select the time range against which to run the query, such as Last 30 Days or a custom date range. The time picker selects files based on their snapshot time.
  8. Run Query: Execute the query.
  9. Results actions: Options for the results table. You can also search for records in the results table.
    • Information: More information about the source query tool, such as query result limits and how to interact with the results.
    • Enter full screen: View the results in full screen.
    • Download: Download the results in CSV format.
  10. Results: The query results. Each query retrieves up to 1000 records.

Prerequisites

To retrieve records using source query, your sources must contain data. For more information, see Sources.

Write a query

In the Query tab, you can write a query to retrieve records from your sources in Visier. You can use SQL syntax to write queries in Visier. For more information about supported functions and keywords, see Query functions.

Query format

In a query, insert source data using one of the following formats:

  • Source_Name.Column_Name

    For example, Starts_Events.Position_Code identifies the Starts_Events source and the Position_Code column within that source.

  • `Source Name`.`Column Name`

    For example, `Exit Events`.`Fiscal Period` identifies the Exit Events source and the Fiscal Period column within that source. If your source or column name contains spaces, you must use this format.

To use features like auto-complete and syntax highlighting, we recommend always using `Source_Name`.`Column_Name` format. Type ` in the query box to see a list of all sources that you can insert into the query with auto-complete.

Sources and columns may have more than one name in Visier, such as an object name and a display name. When writing a query, be sure to use the right name type:

  • Source: Use the display name. This can be found in the Basic Information tab of a source in the Display Name field.
  • Column: Use the original name. This can be found in the Columns tab of a source in the Original Name column.

Tip: To simplify your queries, use aliases for the field names and table names. This is useful when using joins. For example, consider the following query.

SELECT Employee.EmployeeID, Employee.Title, Employee.Email_Address, `Employee Exit`.`Fiscal Period` FROM Employee INNER JOIN `Employee Exit` ON Employee.EmployeeID = `Employee Exit`.EmployeeID

You can simplify the above query by adding an alias to the tables, as shown next.

SELECT Emp.EmployeeID, Emp.Title, Emp.Email_Address, Exit.`Fiscal Period` FROM Employee Emp INNER JOIN `Employee Exit` Exit ON Emp.EmployeeID = `Exit`.EmployeeID

Source directory

Use the source directory in the Query room to view all available sources. In the source directory, you can: 

  • Expand a source to see its column and column data types.
  • Hover over the Source icon to see information about source records and files.
  • Insert a source or column into your query.
  • Filter the list of sources to only show sources that has records (loaded data).
  • Sort the list of sources alphabetically in ascending or descending order.

The following screenshot shows the source directory in the Query room.

Query functions

Note: Source query supports read-only functions. You cannot modify source tables using insert, transform, or other functions.

In addition to SQL functions, Visier supports the following functions in the source query tool. For examples of query statement, see Sample queries.

getFileName

Returns the file name of the file being read.

getJsonWithPath

Returns the string representation of the JSON object at the given JSON path.

getUploadTime

Returns the upload time of the file being read.

Sample queries

The following section offers a selection of sample queries that cover different source query use cases.

Query multiple sources to create a single table

Let's say you want to select columns from the Employee and Employee Exit sources to identify employees that exited the organization. Using the source explorer, you can query both sources to create a single table that contains the information that you want to review.

To retrieve that data, you can write the following query: SELECT Employee.EmployeeID, Employee.Title, Employee.Email_Address, `Employee Exit`.`Fiscal Period` FROM Employee INNER JOIN `Employee Exit` ON Employee.EmployeeID = `Employee Exit`.EmployeeID

The above statement tells Visier to retrieve three columns from the Employee source: Employee ID, Title, and Email Address, and one column from the Employee Exit source: Fiscal Period. The statement then tells the platform to join the Employee and Employee Exit sources on the column EmployeeID, which appears in both sources.

The table that's returned includes data from both sources and combines the EmployeeID column so that it isn’t duplicated, as shown in the following screenshot.

Query using snapshot dates

The following query uses SQL to retrieve records that have snapshot dates.

select `Snapshot Date`, `Employee ID`, `Employment Status` from `Employee Historical` where `Employee ID` = "M8L1GQPWT" and to_date(`Snapshot Date`, "MM/dd/yyyy") < to_date("12/31/2021", "MM/dd/yyyy")

order by to_date(`Snapshot Date`, "MM/dd/yyyy")

Query starts, exits, and employee profile in one statement

The following query gets the first employee record and then gets the start and exit information. This query runs even if there are no starts or exits for the employee ID. In this query, UPI is equal to EmployeeID.

select first(employment_Profile.RECORD_DATE), employment_Profile.UPI, employment_Profile.STATUS, employment_start.HIRE_DATE, employment_start.Employment Start Type, employee_exit.TERMINATION_DATE, employee_exit.TERMINATION_REASON from employment_Profile Left outer Join employment_start on employment_Profile.UPI = employment_start.UPI left outer join employee_exit on employment_Profile.UPI = employee_exit.UPI where employment_Profile.UPI in ("15103121", "15085148", "15087173", "15092691", "151003877") group by employment_Profile.UPI, employment_Profile.STATUS, employment_start.HIRE_DATE, employment_start.Employment Start Type, employee_exit.TERMINATION_DATE, employee_exit.TERMINATION_REASON

Identify uploaded files that contain a specific record

The following query identifies which uploaded files contain a specific record.

select distinct(right(getFileName(), getUploadTime())) from `Employee Performance` where `Employee ID` = "111487"

In this example, the query returns the following string.

20230118_1952251888_8550ced3-29e7-4315-8d8-6e3361405e74_upload/Filename.ext.parquet

  • 20230118: The date the file was uploaded.
  • 8550ced3-29e7-4315-8d8-6e3361405e74: The file's upload ID.
  • Filename.ext: The uploaded file's display name.

Identify duplicate records

The following query returns Employee_IDs with duplicate records.

select Count (Employee_id), Employee_id from SourceFile group by Employee_id having count(Employee_id) > 1

Identify the number of duplicate records

The following query returns the total number of duplicate records. If the numbers are the same, there are no duplicates. If the numbers are different, duplicates is count - Distinct count. The previous Identify duplicate records query returns which values are duplicated.

select count(DISTINCT EmployeeID), count(EmployeeID), SnapshotDate

from Employment_Start

group by SnapshotDate

Validate new files to ensure loads are correct

The following query gets the number of records for a value.

select count(EmployeeID), EmploymentStatus f1 from EmployeeProfile group by f1

You can use this query for any field by replacing the dimension in the query. The following query is the same as the above query, but replaces EmploymentStatus with ContractType.

select count(EmployeeID), ContractType f1 from EmployeeProfile group by f1

Count unique records by year, month

The following query counts unique records by year and month. This is useful for starts and exits.

select count(distinct EmployeeID), date_part('year', `EventDate`) Year, date_part('Month', `EventDate`) Month from Employment_Start_UKG group by Year, Month, order by Year, Month

Count ratings by rating for active records only

The following query uses an anti join to exclude all records from the left side of the join when there is a record from the right side. If a record appears in the terminations or exits files, it isn't included in the list.

select Count(distinct `Employee Performance`.`Employee ID`) Records, `Performance Cycle`, `Performance Rating Code` Code, `Performance Rating Display Name` Rating from `Employee Performance` anti join `Employee_Exit_DS` on `Employee Performance`.`Employee ID` = Employee_Exit_DS.`Employee ID` where `Performance Year` = "2022" group by `Performance Cycle`,`Performance Rating Code`, `Performance Rating Display Name` order by `Performance Cycle`,`Performance Rating Code`, `Performance Rating Display Name`

Check event dates

The following query checks dates.

select * from employee_paychanges where date_part('year', `Visier Event Date`) ="2023" and cast(`Visier Event Date` as date) > cast('2023-03-22' as date)

Check for negative values

The following query checks for negative values.

select count(`Employee ID`), Date_part_('year', `Visier Event Date`) Year from employee_paychanges where date_part('year', `Visier Event Date`) = "2023" and cast(`Change Amount` as decimal) < 0 group by year

Check if the direct manager in employee profile is terminated

The following query checks if the direct manager in an employee profile is terminated. This query uses aliases to join employee profile to employee profile.

select employment_Profile.UPI, employment_Profile.MANAGER_UPI, MAN.STATUS from employment_Profile inner join employment_Profile MAN on employment_Profile.MANAGER_UPI=MAN.UPI where employment_Profile.STATUS!= "Terminated" and MAN.STATUS = "Terminated" order by employment_Profile.MANAGER_UPI

Limit the number of records returned

The following query limits the numbers of records returned when testing queries. In the following example, the query returns 5 records.

select * from Worker_Stock_WD_TestV2 limit 5

Subtract dates

The following query subtracts dates using Spark SQL.

select Count(Distinct `Employee ID`), `Performance Year` from `Employee Performance` where cast(months_between(to_date(`Evaluation Period End Date`), to_date(`Evaluation Period Start Date`)) as INT) > 24 group by `Performance Year`

Summarize totals as an integer

The following query summarizes totals as an integer using cast and sorts the results in descending order.

select sum(cast (Amount as decimal)), Company Code from revenue_sample group by Company Code order by sum(cast (Amount as decimal)) desc