Data Connectors Best Practices
Follow these best practices to efficiently extract accurate data with data connectors.
As a partner administrator, prioritize the implementation and ongoing maintenance of data connectors to ensure accurate, complete, and efficient data extraction. These best practices help you minimize risk, optimize performance, and support production use cases effectively.
Implementation and validation
Best practices to help you implement and validate data connectors. Without sufficient validation, incomplete or inaccurate data can undermine customer trust and product usability.
Start with the administrating tenant
Once the connector is configured, confirm the accuracy of extracted data by publishing the project with the connector configuration in the administrating tenant. Validate key aspects such as:
- Authentication access to the source system.
- Correctness of data formats and extracted fields.
Scale incrementally
After validating the administrating tenant, extend the configuration to 10 analytic tenants for further validation. This incremental approach to validating child tenants minimizes risk and ensures that configurations are correct before scaling to full production.
Validate data integrity
Once the connector definition is created and the credentials are added, start transferring data by manually running a data extraction for the administrating tenant.
Use the source query tool to confirm that all of the expected records have been transferred to the tenant. Run a select-all query for each table and compare the record count in the Visier source container against the source system, then troubleshoot any discrepancies.
- If the counts match, run a data processing job to generate a data version and validate key metrics.
- If the counts do not match, investigate why specific records are not transferring; for example, verify that the column is defined on the source container.
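The count comparison above can be sketched as a small helper, assuming you have already collected per-table counts from both systems. The table names and counts below are illustrative, not from a real tenant:

```python
def validate_counts(source_counts, container_counts):
    """Return tables whose record counts differ between the source
    system and the Visier source container, as (source, container)."""
    return {
        table: (source_counts[table], container_counts.get(table, 0))
        for table in source_counts
        if source_counts[table] != container_counts.get(table, 0)
    }

# Illustrative counts gathered with select-all queries on each system.
source = {"employee": 52_000, "compensation": 310_000}
container = {"employee": 52_000, "compensation": 309_400}

# Any table returned here needs investigation before processing data.
mismatches = validate_counts(source, container)
```

A clean run (an empty result) is the signal to proceed with a data processing job; any mismatch points at a specific table to troubleshoot.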
After completing this initial step for the administrating tenant, proceed with a small group of analytic tenants. Consider creating an early adopters program to understand how the solution behaves as your customer count increases and identify discrepancies between demo data and what is being pulled from the analytic tenants. This will help determine if Visier needs to assist with any infrastructure considerations.
Data extraction
Best practices to help optimize data extraction.
Use APIs for production workflows
As you scale, transition from the user interface to the Data and Job Handling APIs for extraction and processing jobs. Testing the APIs during implementation builds familiarity and ensures a smooth transition to production use.
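Triggering a job over HTTP might look like the sketch below. The base URL, endpoint path, request body, and bearer-token auth scheme are illustrative assumptions, not the actual Data and Job Handling API surface; consult the API reference for the real contract:

```python
import json
import urllib.request

def build_extraction_request(base_url: str, connector_id: str, api_key: str):
    """Build (but do not send) an HTTP request that starts an
    extraction job. Endpoint and headers are assumed, not real."""
    body = json.dumps({"connectorId": connector_id}).encode()
    return urllib.request.Request(
        url=f"{base_url}/jobs/extractions",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

req = build_extraction_request("https://api.example.com", "conn-123", "token")
```

Scripting jobs this way during implementation lets you reuse the same calls unchanged when you move to scheduled production runs.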
Optimize SQL batch size
Set appropriate SQL batch size parameters to optimize extraction speed and avoid timeout errors; poorly tuned batch sizes slow extractions and limit scalability. We recommend that you start with:
- 1,000,000 records for tables with 20 columns or less and decrease by 100,000 if the query times out.
- 500,000 records for tables with more than 20 columns and increase by 100,000 until the query times out.
Because data width and query performance vary, batch sizes may require iterative tuning.
Manage the number of concurrent extractions
To avoid bottlenecks and ensure performance, don't schedule or execute too many data extractions at the same time.
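One simple way to cap concurrency is a fixed-size worker pool. In this sketch, `run_extraction` is a hypothetical stand-in for whatever triggers a single tenant's extraction job, and the cap of 3 is illustrative rather than a recommended limit:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_EXTRACTIONS = 3  # illustrative cap, tune for your setup

def run_extraction(tenant_id: str) -> str:
    """Placeholder for triggering one tenant's extraction job."""
    return f"extracted:{tenant_id}"

def run_all(tenant_ids):
    """Run extractions with at most MAX_CONCURRENT_EXTRACTIONS
    in flight at any one time; results keep the input order."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_EXTRACTIONS) as pool:
        return list(pool.map(run_extraction, tenant_ids))

results = run_all(["tenant-a", "tenant-b", "tenant-c", "tenant-d"])
```

Queuing jobs through a bounded pool like this keeps throughput steady as tenant counts grow, instead of letting every scheduled extraction start at once.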
Leverage additional parameters
Explore optional parameters for Java Database Connectivity (JDBC) connectors to further customize data extractions. For more information about the additional parameters that can be used with data extraction jobs, see Schedule a Job.
Maintenance and scalability
Best practices to help you maintain and grow your use of data connectors.
Validate new data sources before you scale
Before broadly applying changes to configurations that include new datasets or columns, validate the configuration updates with a small group of analytic tenants.
Continuously monitor and optimize
As customer tenant counts increase, review performance metrics to ensure extraction processes remain efficient. Address any infrastructure considerations as needed.