Source Settings

Learn about the settings you can change for a source.

Who can use this feature?

Users with this profile:

  • Data Engineer

Not sure if you have this feature or capability? Reach out to your administrator.

Overview

Each source has a set of configurable settings, such as the file regex and cleansing options. Source settings are changed in the global workspace and once saved, changes cannot be rolled back.

To find the source's settings:

  1. On the global navigation bar, click Data > Sources.
  2. Select a source.
  3. Navigate to Settings.

File regex

Records supplied through a file require a regular expression (regex) to select the file in a folder.

Recommended: Write as specific an expression as possible to avoid mistakenly accepting the wrong file. Keep in mind that regular expressions are not shell wildcards and the dot (.) character is a wildcard.

Narrow the expression by including the file extension. For example, use \.txt for text and \.xlsx for Microsoft Excel®. Include the item the source represents with strings like .*Employee.* or .*Exit.*.

You may broaden the expression with the wildcard (.) and quantifications: ?, +, *, or {n}. As well, prepending (?i) makes the expression case insensitive.

Example: Best practice

The following expression filters for a text file, starting with the tenant code X0X, followed by a descriptive name, and then eight digits for a date in YYYYMMDD format.

  • X0X_Exit_Events_\d{8}.txt

The following expression accepts a file starting with the tenant code, followed by a descriptive name, and date in YYYYMM format. The expression is case insensitive and accepts either XLS or XLSX files.

  • (?i) X0X_Employee_Profile_*\d{6}.xlsx

Example: Overly broad regex

The following expression filters for a text file for the employee profiles. It includes the word Employee and six digits for a date in YYYYMM format.

  • .*Employee.*\d{6}.txt

However, this expression accepts:

  • Employee_Profile_YYYYMM.txt: A file listing every active employee.
  • Supervisory_Employee_YYYYMM.txt: A file with five columns including only supervisors and who they report to.

A broad regex may load the wrong file and result in errors.

Sheet name

If the source originates from an XLS or XLSX extension containing multiple sheets, the file regex includes the Sheet name setting. The sheet name is an exact name match to the name of the sheet from which the source was generated. This allows the source to load records in future data transfers if there is an exact sheet name match in the transfer.

If the sheet name is blank but the schema of the sheet matches the source, the source will still load records from the data transfer. This allows the source to recognize the correct source sheet without requiring an exact name match, however, it will attempt to load all sheets to a source that match the original regex and may load too many sheets.

Delimiter

A delimiter is a character used to separate individual fields in a record. You can either select or type a delimiter.

Caution: In Excel sources, the default delimiter is TAB. Do not change the default delimiter.

Escape character

If the delimiter is part of a string and is to appear in the data, use an escape character to stop Visier from splitting the value in two. For example, if double quote (") is the escape character and comma (,) the delimiter, the platform renders the following input spring as a unitary value: "Director, PMO".

Example: Table of values

Imagine you wish to load the following records.

Event date Name Job title Favorite color
2019-01-10 Kasimir Chavez Sr. Business Analyst "octarine"
2019-01-17 Byron Charles Director, PMO white
2019-02-23 Kimberly Powers HR Business Partner red

Note:  

  • Mr. Charles's job title includes a comma—the delimiter.

  • Mr. Chavez's favorite color is a coined name from a book and as such is to be recorded in quotation marks.

These first records are compliant with the CSV standard—RFC 4180:

  • "Event date", "Name", "Job title", "Favorite color"

  • "2019-01-10", "Kasimir Chavez", "Sr. Business Analyst", ""octarine""

  • "2019-01-17", "Byron Charles", "Director, PMO", "white"

  • "2019-02-23", "Kimberly Powers", "HR Business Partner", "red"

These next records are not compliant and the platform parses row three as five columns while all other rows have four columns:

  • Event date, Name, Job title, Favorite color

  • 2019-01-10, Kasimir Chavez, Sr. Business Analyst, "octarine"

  • 2019-01-17, Byron Charles, Director, PMO, white

  • 2019-02-23, Kimberly Powers, HR Business Partner, red

The preceding records require an escape character because the delimiter is used as part of a string in the data. Without it, the platform parses Director, PMO as separate columns rather than one value.

Encoding

The encoding schema for the characters in a source must be set.

Visier supports many different encodings in families such as IBM, ISO, JIS, Windows, x-IBM, x-Mac and x-Windows. A complete list is shown in the solution. Common encodings are ISO-8859-1, UTF-8, UTF-16LE, and Windows-1252. Our preferred encoding is UTF-8.

There are some practical aspects when working with encodings:

  • Visier converts all accepted encodings to UTF-8.
  • You can validate any selected encoding against the source.
  • Incorrect encoding may lead to an error message that the source has no input data.
  • The encoding must stay consistent over time from one load to another. Inconsistent file encodings result in load failures.

    Note: If you change the encoding and that change is consistent going forward, you can adjust the source setting for encoding with no impact to the previously received and loaded files in the source.

Skip lines

You can configure the source settings to skip a specific number of lines at the beginning or end of a source file. Skip line settings are applied to every file added to the source; ensure that any uploaded files are consistent and require the configured skip line settings.

  • Skip beginning lines: The first line of a source file should include a header listing the column names. Some source files include a preamble above the header such as a notice of confidentiality or provenance. You can skip a specified number of lines at the beginning of the source file, such as the header and any preamble lines, by setting the Skip beginning lines value.
  • Skip end lines: Some source files contain trailing lines that aren’t needed in Visier, such as lines for metadata records. You can skip unnecessary lines at the end of your source file by setting the Skip end lines value.

Cleansing options

In cleansing a source, Visier (the loader) moves the records, checks they are in proper format, and changes the encoding. The loader strips out blank lines and applies other cleansing tasks specified for the source.

Use these settings to specify how to remove unneeded elements from the source. For example, the loader may strip new lines within a cell, but new lines at the end of a record are honored. If your source includes long cell values, you can set the loader to ignore or truncate cell content in excess of 4096 characters.

The following table describes the elements involved in cleansing a source.

Element Description
Run scripts

You may have a cleansing script to address errors in your record formats. If enabled, the loader searches for a script:

  • /esldata/scripts/<Tenant_Name>_pre_process.sh

For example, a preprocessing script for X0X tenant is: WFF_X0X_pre_process.sh

Strip line breaks

Sometimes data has fields that contain line-breaks to create text wrapping. These line breaks, which are wrapped in escape characters and placed in a value, break our loader. If enabled, the loader removes line breaks.

Note: The loader always strips blank lines from files including a full row with all empty cells, and blank lines with no delimiters.

Truncate large cells The loader has an internal limit for how many characters may be in a cell. If enabled, the loader shortens any cells in excess of Visier's limit.

Automatic file retention

Whether or not files are automatically excluded from a source after a specified time period. This is useful for automatically cleaning up your source containers if, for example, restatement data is loaded frequently.

If Automatic File Retention is enabled, you may optionally enable file deletion for auto-excluded files. For example, files that are set to automatically exclude after 14 days will be excluded and then deleted from the source after those 14 days have passed.

Auto-extraction

Note: This setting is only applicable to data connector sources.

Whether or not data is extracted automatically each day. This setting is enabled by default and only applicable to sources created via data connector. For more information, see Set Up Data Connectors.

In this section