Source Settings

Learn about the settings you can change for a file source.

Overview

Each file source has a set of configurable settings, such as the file regex and cleansing options. Source settings are changed in the global workspace, and once saved, changes cannot be rolled back. Most settings are not applicable to table sources.

Note: A source's file type is locked once generated. If you change your data delivery format, for example move from Excel to CSV, you must convert the source's file type. Please contact Visier Technical Support for assistance.

To find the source's settings:

  1. On the global navigation bar, click Data > Sources.
  2. Select a source.
  3. Navigate to Settings.

File regex

Records supplied through a file require a regular expression (regex) to select the file in a folder.

Recommended: Write as specific an expression as possible to avoid mistakenly accepting the wrong file. Keep in mind that regular expressions are not shell wildcards and the dot (.) character is a wildcard.

Narrow the expression by including the file extension. For example, use \.txt for text and \.xlsx for Microsoft Excel. Include the item the source represents with strings like .*Employee.* or .*Exit.*.

You may broaden the expression with the wildcard (.) and quantifiers: ?, +, *, or {n}. In addition, prepending (?i) makes the expression case insensitive.

Example: Best practice

The following expression filters for a text file, starting with the tenant code X0X, followed by a descriptive name, and then eight digits for a date in YYYYMMDD format.

X0X_Exit_Events_\d{8}\.txt

The following expression accepts a file starting with the tenant code, followed by a descriptive name, and date in YYYYMM format. The expression is case insensitive and accepts either XLS or XLSX files.

(?i)X0X_Employee_Profile_*\d{6}\.xlsx?
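
As a minimal sketch of why the file extension should use an escaped dot, the following Python snippet compares an unescaped and an escaped version of the exit events expression. It assumes the whole file name must match the expression (re.fullmatch); Visier's regex engine and matching mode may differ, and the file names are hypothetical.

```python
import re

# Hypothetical file names; only the first one is a real .txt file.
names = ["X0X_Exit_Events_20200801.txt", "X0X_Exit_Events_20200801_txt"]

unescaped = re.compile(r"X0X_Exit_Events_\d{8}.txt")  # "." matches any character
escaped = re.compile(r"X0X_Exit_Events_\d{8}\.txt")   # "\." matches a literal dot

# Assumption: the whole file name must match the expression.
unescaped_hits = [n for n in names if unescaped.fullmatch(n)]
escaped_hits = [n for n in names if escaped.fullmatch(n)]

print(unescaped_hits)  # both names are accepted
print(escaped_hits)    # only the .txt file is accepted
```

The unescaped dot accepts the underscore before txt, so a file without a real extension slips through.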

Example: Overly broad regex

The following expression filters for a text file for the employee profiles. It includes the word Employee and six digits for a date in YYYYMM format.

.*Employee.*\d{6}.txt

However, this expression accepts:

  • Employee_Profile_YYYYMM.txt: A file listing every active employee.
  • Supervisory_Employee_YYYYMM.txt: A file with five columns including only supervisors and who they report to.

A broad regex may load the wrong file and result in errors.
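
The difference can be checked with a short Python sketch. The file names (with 202008 standing in for YYYYMM) and the narrower expression are illustrative assumptions, and re.fullmatch is used on the assumption that the whole file name must match.

```python
import re

# Hypothetical file names with a YYYYMM date of 202008.
names = ["Employee_Profile_202008.txt", "Supervisory_Employee_202008.txt"]

broad = re.compile(r".*Employee.*\d{6}.txt")
narrow = re.compile(r"Employee_Profile_\d{6}\.txt")  # illustrative narrower expression

broad_hits = [n for n in names if broad.fullmatch(n)]
narrow_hits = [n for n in names if narrow.fullmatch(n)]

print(broad_hits)   # both files
print(narrow_hits)  # only the employee profile file
```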

Sheet name

If you upload an XLS or XLSX file containing multiple sheets, each sheet creates a unique data transfer. Once the source is generated, the file regex setting includes a Sheet name field that is automatically populated with the exact name of the sheet from which the source was generated. This allows the source to load records in future data transfers if there is an exact sheet name match in the transfer.

If the sheet name is blank but the schema of the sheet matches the source, the source still loads records from the data transfer. This allows the source to recognize the correct sheet without requiring an exact name match. However, the source attempts to load every sheet that matches the original regex, which may result in loading too many sheets.

Snapshot time

The snapshot time represents when a record was true in the source system, such as an HRIS, and is typically the moment of export. By default, Visier overwrites the original snapshot time with the Upload time (UTC) when the file is uploaded to the platform.

Example:  

  • Suppose you exported a set of records from your HRIS on 2020 August 1 at 09:00:13.214Z. The time of the export is the snapshot time. If you upload these records to Visier on 2020 August 5 at 3:00 PM, that is the upload time (UTC). Because the upload time overwrites the original snapshot time, the recorded snapshot time is about four days and six hours (102 hours) later than the export. You can adjust the snapshot time by minus 102 hours to represent the original snapshot time.
  • If you uploaded a file three months ago and want to replace a record in that file by uploading a new file today, you can change the snapshot time of today's file to three months ago to correct the previously uploaded records.

Use the following optional parameters to adjust the snapshot time so it accurately reflects when the data was exported:

  • Offset (hours): An integer value used to shift the time. The platform calculates the time as: Snapshot time = Upload time + Offset
  • Snap to midnight: When enabled, the snapshot time moves to the start of the day. This zeros the time portion of the timestamp, for example, 20200801 09:00:13.214Z becomes 20200801 00:00:00.000Z.
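
The arithmetic above can be sketched in Python. The function name and parameters are hypothetical, and applying Snap to midnight after the offset is an assumption; the source does not state the order in which the platform applies the two settings.

```python
from datetime import datetime, timedelta, timezone

def snapshot_time(upload_time, offset_hours=0, snap_to_midnight=False):
    """Snapshot time = upload time + offset, optionally moved to the start of the day."""
    result = upload_time + timedelta(hours=offset_hours)
    if snap_to_midnight:
        # Assumption: snapping is applied after the offset.
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    return result

# Upload on 2020 August 5 at 3:00 PM UTC, with an offset of -102 hours.
upload = datetime(2020, 8, 5, 15, 0, tzinfo=timezone.utc)
adjusted = snapshot_time(upload, offset_hours=-102)
snapped = snapshot_time(upload, offset_hours=-102, snap_to_midnight=True)

print(adjusted.isoformat())  # 2020-08-01T09:00:00+00:00
print(snapped.isoformat())   # 2020-08-01T00:00:00+00:00
```

Because the offset is an integer number of hours, it recovers the export time only to the hour; the seconds and milliseconds of the original export are not reproduced.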

For more information on how Visier defines time for records, see How Time Is Represented in a Source.

Delimiter

A delimiter is a character used to separate individual fields in a record. You can either select or type a delimiter.

Caution: In Excel sources, the default delimiter is TAB. Do not change the default delimiter.

Escape character

An escape character signals that characters should be treated literally rather than as special symbols, allowing quoting characters, delimiters, or other reserved characters to appear within a string without breaking its syntax. If the delimiter appears within a value, enclosing the value in quoting characters prevents Visier from splitting it into multiple parts. For example, if quoting characters (") are used to enclose values and a comma (,) is the delimiter, the platform reads the following input string as a single value: "Director, PMO".

Doubled quoting characters can be used to prevent Visier from omitting quotation marks that are part of the data. For example, in the records below, two quoting characters (") on each side of the string octarine cause the platform to load the value with a single set of quotation marks: "octarine".

Example: Table of values

Imagine you wish to load the following records.

Event date | Name | Job title | Favorite color
2019-01-10 | Kasimir Chavez | Sr. Business Analyst | "octarine"
2019-01-17 | Byron Charles | Director, PMO | white
2019-02-23 | Kimberly Powers | HR Business Partner | red

Note:  

  • Mr. Charles's job title includes a comma—the delimiter.
  • Mr. Chavez's favorite color is a coined name from a book and as such is to be recorded in quotation marks.

These first records are compliant with the CSV standard—RFC 4180:

  • "Event date", "Name", "Job title", "Favorite color"
  • "2019-01-10", "Kasimir Chavez", "Sr. Business Analyst", ""octarine""
  • "2019-01-17", "Byron Charles", "Director, PMO", "white"
  • "2019-02-23", "Kimberly Powers", "HR Business Partner", "red"

These next records are not compliant and the platform parses row three as five columns while all other rows have four columns:

  • Event date, Name, Job title, Favorite color
  • 2019-01-10, Kasimir Chavez, Sr. Business Analyst, "octarine"
  • 2019-01-17, Byron Charles, Director, PMO, white
  • 2019-02-23, Kimberly Powers, HR Business Partner, red

The preceding records require an escape character because the delimiter is used as part of a string in the data. Without it, the platform parses Director, PMO as separate columns rather than one value.
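
The column-count problem can be reproduced with Python's csv module. This is a sketch only: it uses a generic CSV parser, not Visier's loader, and the spaces after the delimiters are dropped to keep the example simple.

```python
import csv

# The same record with and without quoting characters around the job title.
unquoted = '2019-01-17,Byron Charles,Director, PMO,white'
quoted = '2019-01-17,Byron Charles,"Director, PMO",white'

unquoted_fields = next(csv.reader([unquoted]))
quoted_fields = next(csv.reader([quoted]))

print(len(unquoted_fields))  # 5
print(len(quoted_fields))    # 4
print(quoted_fields[2])      # Director, PMO
```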

Encoding

The encoding scheme for the characters in a source must be set.

Visier supports many different encodings in families such as IBM, ISO, JIS, Windows, x-IBM, x-Mac, and x-Windows. A complete list is shown in the solution. Common encodings are ISO-8859-1, UTF-8, UTF-16LE, and Windows-1252. Our preferred encoding is UTF-8.

There are some practical aspects when working with encodings:

  • Visier converts all accepted encodings to UTF-8.
  • You can validate any selected encoding against the source.
  • Incorrect encoding may lead to an error message that the source has no input data.
  • The encoding must stay consistent over time from one load to another. Inconsistent file encodings result in load failures.

    Note: If you change the encoding and that change is consistent going forward, you can adjust the source setting for encoding with no impact to the previously received and loaded files in the source.
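
To illustrate why the selected encoding must match the file, the following Python sketch decodes the same bytes with the wrong and the right encoding and then converts the text to UTF-8. The sample name is made up, and the snippet shows general encoding behavior rather than Visier's internal conversion.

```python
# Bytes as they might arrive in a Windows-1252 file (hypothetical sample value).
raw = "Zoë Müller".encode("cp1252")

# Reading the bytes with the wrong encoding fails.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Reading with the correct encoding succeeds, and the text can be stored as UTF-8.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")

print(decoded_as_utf8)            # False
print(text)                       # Zoë Müller
print(len(raw), len(utf8_bytes))  # 10 12
```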

Skip lines

You can configure the source settings to skip a specific number of lines at the beginning or end of a source file. Skip line settings are applied to every file added to the source; ensure that any uploaded files are consistent and require the configured skip line settings.

  • Skip beginning lines: The first line of a source file should include a header listing the column names. Some source files include a preamble above the header such as a notice of confidentiality or provenance. You can skip a specified number of lines at the beginning of the source file, such as the header and any preamble lines, by setting the Skip beginning lines value.
  • Skip end lines: Some source files contain trailing lines that aren’t needed in Visier, such as lines for metadata records. You can skip unnecessary lines at the end of your source file by setting the Skip end lines value.
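
As a rough sketch of how the two values act on a file, the following Python function drops a fixed number of lines from each end. The function name and sample file are hypothetical; the sample assumes a two-line preamble that is skipped while the header is kept, and a one-line trailer.

```python
def skip_lines(lines, skip_beginning=0, skip_end=0):
    """Return the lines that remain after skipping lines at both ends."""
    end = max(len(lines) - skip_end, 0)
    return lines[skip_beginning:end]

# Hypothetical file: two preamble lines, a header, two records, one trailer line.
file_lines = [
    "CONFIDENTIAL",
    "Exported from HRIS",
    "EmployeeID,Name",
    "1,Ana",
    "2,Ben",
    "Record count: 2",
]

kept = skip_lines(file_lines, skip_beginning=2, skip_end=1)
print(kept)  # ['EmployeeID,Name', '1,Ana', '2,Ben']
```

Because the same counts apply to every file, a file with a different number of preamble lines would lose its header or keep a stray line.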

Cleansing

In cleansing a source, Visier (the loader) moves the records, checks that they are in the proper format, and changes the encoding. The loader removes blank lines and applies other cleansing tasks specified for the source.

Use these settings to specify how to remove unneeded elements from the source. For example, the loader may strip new lines within a cell, but new lines at the end of a record are honored. If your source includes long cell values, you can set the loader to ignore or truncate cell content in excess of 4096 characters.

The following list describes the elements involved in cleansing a source.

  • Strip line breaks: Sometimes data has fields that contain line-breaks to create text wrapping. These line breaks, which are wrapped in escape characters and placed in a value, break our loader. If enabled, the loader removes line breaks.

    Note: The loader always strips blank lines from files including a full row with all empty cells, and blank lines with no delimiters.

  • Truncate large cells: The loader has an internal limit for how many characters may be in a cell. If enabled, the loader shortens any cells in excess of Visier's limit.
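
The two cleansing options can be pictured with a small Python sketch. The function is hypothetical and is not the loader's implementation. It assumes that line breaks are removed outright rather than replaced with a space, and it uses the 4096-character figure mentioned earlier as the truncation length.

```python
MAX_CELL_LENGTH = 4096  # assumption: the 4096-character limit is the truncation length

def cleanse_cell(value, strip_line_breaks=True, truncate_large_cells=True):
    """Apply the two cleansing options to a single cell value."""
    if strip_line_breaks:
        # Assumption: line breaks are removed, not replaced with a space.
        value = value.replace("\r", "").replace("\n", "")
    if truncate_large_cells:
        value = value[:MAX_CELL_LENGTH]
    return value

wrapped = cleanse_cell("Line one\nLine two")
long_cell = cleanse_cell("x" * 5000)

print(wrapped)         # Line oneLine two
print(len(long_cell))  # 4096
```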

Automatic file exclusion

Determines whether files are automatically excluded from a source after a specified time period. This is useful for automatically cleaning up your source containers if, for example, restatement data is loaded frequently.

If Automatic File Exclusion is enabled, you may optionally enable file deletion for auto-excluded files. For example, files that are set to automatically exclude after 14 days will be excluded and then deleted from the source after those 14 days have passed.

Deletion of excluded files

Choose whether to keep or permanently delete excluded files. If enabled, specify the number of days to retain these files before they are permanently deleted. The default is 30 days.

Note:  

  • For files excluded prior to enabling this setting, the system will start counting days the moment the toggle is turned on.
  • This setting only applies to manually excluded files. To manage automatically excluded files, use the Enable automatic file exclusion setting.

Stop loading new data

When enabled, new data files will no longer be loaded into this source. Files currently loaded into this source are not affected.

Exclude from auto-processing

When enabled, Visier ignores this source when checking which data categories require a processing job after receiving data.

Character remover

When enabled, specify individual characters to strip from uploaded files. For example, entering 'abc' will remove every instance of 'a', 'b', and 'c' found in your data. The system treats the input as a literal set, where each character is handled individually, and does not require escape or code sequences.
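
The literal-set behavior described above resembles Python's str.translate with a deletion table. The following is a sketch under that assumption, with a hypothetical function name; it is not Visier's implementation.

```python
def remove_characters(text, characters):
    """Remove every occurrence of each individual character in `characters`."""
    # str.maketrans with a third argument builds a table that deletes those characters.
    return text.translate(str.maketrans("", "", characters))

print(remove_characters("abacus", "abc"))    # us
print(remove_characters("$1,200.50", "$,"))  # 1200.50
```

Each character is handled on its own, so entering 'abc' does not target the sequence abc; it removes a, b, and c wherever they occur.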

Auto-extraction

Note: This setting is only applicable to data connector sources.

Determines whether data is extracted automatically each day. This setting is enabled by default. For more information, see Set Up Data Connectors.
