Datasource

In today’s data-driven world, enterprise data is scattered across diverse landscapes — files, databases, object stores, cloud warehouses, and APIs embedded within various applications. At xVector, transforming this fragmented information into actionable insights begins with Data Sources.

Data Sources are the gateway, allowing users to connect to, import, and synchronize data from multiple origins. Whether the data resides in structured files, dynamic APIs, or sophisticated cloud storage systems, users can configure and execute a connector to bring it into xVector as a data source. A rich catalog of connectors, periodically updated by xVector, ensures compatibility with an ever-expanding array of systems. Missing a connector? Reach out to connectors@xvectorlabs.com, and a new one can be developed quickly.

Once connected, the process doesn’t stop at simply importing data. Updates from source systems are seamlessly upserted, reflecting real-time changes while preserving the historical timeline of values. Bulk data import is supported with the OVERWRITE option. This meticulous synchronization ensures traceability, enabling businesses to trust the integrity and provenance of their data.
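The upsert-with-history behavior described above can be contrasted with an OVERWRITE bulk import in a short sketch. This is illustrative Python only, not xVector's actual storage layer; the `Datasource` class and its methods are hypothetical names chosen to mirror the semantics in the text.

```python
from datetime import datetime, timezone

class Datasource:
    """Sketch of upsert-with-history vs. OVERWRITE bulk import semantics.
    Hypothetical; not xVector's implementation."""

    def __init__(self):
        # record_key -> list of (timestamp, value) versions, oldest first
        self.history = {}

    def upsert(self, records):
        """Insert new keys and update existing ones, preserving the
        historical timeline of values."""
        now = datetime.now(timezone.utc)
        for key, value in records.items():
            self.history.setdefault(key, []).append((now, value))

    def overwrite(self, records):
        """Bulk import with OVERWRITE: replace all existing data."""
        now = datetime.now(timezone.utc)
        self.history = {k: [(now, v)] for k, v in records.items()}

    def current(self, key):
        """Latest value for a record key."""
        return self.history[key][-1][1]
```

Upserting the same record key twice keeps both versions in the timeline, while an overwrite discards everything that came before.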

xVector simplifies the journey from raw data to actionable insights, offering users the tools to acquire, refine, and analyze data confidently. With regular updates to its connectors catalog and robust metadata management, the platform ensures that businesses can harness the full potential of their data ecosystem — turning scattered information into cohesive narratives that drive impactful decisions.

Available connectors by type:

  • Files — CSV, JSON, GZIP
  • Databases — MySQL, SQL Server, PostgreSQL, MongoDB
  • Object Stores — S3, MinIO
  • Cloud Data Warehouses — Amazon Redshift, Google BigQuery
  • APIs — Salesforce, Mailchimp, Zoho

xVector automatically infers metadata by sampling the data while creating a datasource. Review the inferred metadata carefully and make any corrections required; setting the metadata correctly is a crucial step, since it drives visualization and modeling downstream.

  • Column Name — Same name as present in the source datasource
  • Column New Name — New name given to a column in the dataset when copied from the datasource
  • Description — Description for a column; available to other users and copied into datasets
  • Data Type — Automatically inferred (int, float, string, date); review for potential errors
  • Format — Applicable for datetime and currency (e.g., ‘YYYY-MM-DD’, ‘DD/MM/YYYY HH:MM:SS’); visualization only
  • Semantic Type — Appropriate semantic type for the column (e.g., SSN or zip code for int columns); used in visualization
  • Statistical Type — Used for visualization and modeling purposes
  • Skip Histogram — Default is false (a histogram is generated). Set to true for very high-cardinality columns
  • Nullable — Default is true. Set to false if the column value cannot be null; xVector warns if nulls are found
  • Dimension — Set the column as a dimension; used in visualization and modeling
  • Measure — Set the column as a measure; used in visualization and modeling
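The sampling-based type inference mentioned above can be sketched in a few lines. The function below is a simplified illustration, not xVector's actual inference logic: it tries int, then float, then one assumed date format against a sample of string values.

```python
from datetime import datetime

def infer_data_type(samples):
    """Guess a column's data type (int, float, date, string) from a sample
    of its string values. A loose sketch of sampling-based inference; the
    single 'YYYY-MM-DD' date format is an assumption for illustration."""
    def all_match(parse):
        try:
            for s in samples:
                parse(s)
            return True
        except (ValueError, TypeError):
            return False

    if all_match(int):
        return "int"
    if all_match(float):
        return "float"
    if all_match(lambda s: datetime.strptime(s, "%Y-%m-%d")):
        return "date"
    return "string"
```

Because inference only sees a sample, a column can be mis-typed when the sample is unrepresentative, which is exactly why the review step above matters.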

Data profiling involves examining data to assess its structure, content, and quality. This process calculates various statistical values, including minimum and maximum values, the presence of missing records, and the frequency of distinct values in categorical columns. It can also identify correlations between different attributes. Profiling can be automated by setting the profile option to true or performed manually.
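The statistics listed above can be sketched with plain Python. This is a minimal illustration of what a per-column profile and a pairwise correlation compute, not xVector's profiling engine; the function names are hypothetical.

```python
from collections import Counter

def profile_column(values):
    """Basic per-column profile: min/max, missing-record count, and the
    frequency of distinct values. None stands in for a missing record."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "missing": len(values) - len(present),
        "distinct": dict(Counter(present)),
    }

def pearson(xs, ys):
    """Pearson correlation between two numeric columns of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Running the profile over every column (and `pearson` over numeric column pairs) is what the automated profile option does on your behalf.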

A CSV-type datasource is used when you need to bring CSV data from a local system. From a workspace, click on Add and choose the Datasources option.

Workspace Add menu showing the Datasources option

Choose the CSV data source type from the available list.

Datasource type selection dialog showing available connectors
  • Choose File — Browse your local system to select the required CSV file
  • Name — Provide a meaningful name
  • Delimiter — Choose a delimiter according to the file
  • Header — Set to true if a header row is present, otherwise false
CSV datasource configure step with file selection and basic settings
  • Workspace — Select a workspace for the datasource
  • Encoding — Select the file encoding
  • Quote — Single character used for quoting values
  • Escape — Single character for escaping quotes inside an already quoted value
  • Comment — Single character for skipping lines beginning with this character (disabled by default)
  • Null value — String representation of a null value
  • NaN value — String representation of a non-number value
  • Positive/Negative Inf — String representations of infinity values
  • Multiline — Parse one record, which may span multiple lines, per file
  • Mode — Permissive or drop malformed
  • Quote Escape Character — Single character for escaping the escape for the quote character
  • Empty Value — String representation of an empty value
  • Write Option — Upsert, Bulk Insert, Insert, Delete
  • Run Profile — Set to true to run profiling
  • Run Correlation — Set to true to run correlation analysis
  • Machine Specifications — Use the default, or disable it to provide custom machine specs
  • Expression — Expressions to be executed while reading the data
CSV datasource advanced settings
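The Delimiter, Quote, and Escape settings above behave like the equivalent parameters in common CSV parsers. As a rough illustration (using Python's standard `csv` module, not xVector's parser), here is a file whose values contain the delimiter and embedded quotes:

```python
import csv
import io

# A pipe-delimited file where one value contains a comma and another
# contains doubled quotes ("" escaping a literal quote inside a
# quoted value), exercising the Delimiter and Quote settings.
raw = 'name|note\n"Smith, J."|"said ""hi"""\n'

reader = csv.reader(io.StringIO(raw), delimiter="|", quotechar='"')
rows = list(reader)
```

Getting these settings wrong typically shows up immediately in the preview step as split or merged columns, which is the time to adjust them.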

The preview step displays a sample of data from the selected file.

CSV datasource preview data step showing sample rows

View the automatically inferred metadata and modify it if needed.

CSV datasource column metadata — left columns
CSV datasource column metadata — right columns showing additional settings

Configure record key, partition key, and other table-level settings.

CSV datasource table options for record key and partition key
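The record key and partition key from the table options can be illustrated with a small sketch: the record key deduplicates rows (last write wins), and the partition key groups them. This is a hypothetical illustration of the concepts, not xVector's storage layer.

```python
from collections import defaultdict

def partition_records(records, record_key, partition_key):
    """Group rows by partition key and deduplicate within each partition
    by record key, keeping the last row seen for each key."""
    partitions = defaultdict(dict)
    for row in records:
        partitions[row[partition_key]][row[record_key]] = row
    return {part: list(rows.values()) for part, rows in partitions.items()}
```

Choosing a stable, unique record key is what makes later upserts from the same source land on the right rows.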

Click Save to begin the import. A status screen will appear to track progress. On completion, it redirects to the workspace page.

An S3 datasource is used to bring data from an AWS S3 bucket. You need an AWS access key and secret key.

From a workspace, click on Add and choose the Datasources option, then select the S3 data source type.

  • Datasource name — Provide a meaningful name for the data
  • Saved accounts — Select from already created accounts
  • New account name — Provide an account name when creating a new account
  • AWS access key ID — Provide the AWS access key ID
  • AWS secret key — Enter the AWS secret key
  • Allowed IPs — Shows the allowed IPs list
  • File type — Choose the file type (CSV, JSON)
  • Test Connection — Click to test the connection with the provided credentials
  • Extension — Choose the file extension (CSV, JSON, GZIP)
S3 datasource configure step with AWS credentials
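The configure step above can be summarized as a settings payload. The shape below is purely illustrative; the field names mirror the form fields, not an actual xVector API, and the credential values are placeholders.

```python
# Hypothetical shape of the S3 connector settings from the configure step.
# Never hard-code real AWS credentials; these are placeholders.
s3_datasource = {
    "datasource_name": "sales-raw",
    "account": {
        "name": "prod-aws",
        "aws_access_key_id": "<ACCESS_KEY_ID>",
        "aws_secret_access_key": "<SECRET_KEY>",
    },
    "file_type": "CSV",
    "extension": "GZIP",
}
```

Saved accounts exist so that the access key and secret key are entered once and reused across datasources instead of being repeated in every configuration.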
  • Workspace — Choose from the available options
  • Bucket — Select the source S3 bucket
  • Prefix for files — Provide a prefix for the files
  • Choose folder — Choose files from the S3 bucket
  • Header — Set to true if a header row is present, otherwise false
  • Delimiter — Choose a delimiter according to the file
  • Encoding — Select the file encoding
  • Quote / Escape / Comment — Character settings for parsing
  • Null / NaN / Inf values — String representations for special values
  • Multiline — Parse one record, which may span multiple lines, per file
  • Quote Escape Character — Single character for escaping the escape for the quote character
  • Empty Value — String representation of an empty value
  • Write Option — Choose from Upsert, Bulk Insert, Insert, Delete
  • Expression — Expressions to be executed while reading the data
S3 datasource advanced settings
S3 datasource preview data showing sample rows

View the automatically inferred metadata and modify it if needed.

S3 datasource metadata — left columns
S3 datasource metadata — right columns showing additional settings
S3 datasource table options for record key and partition key

Click Save to begin the import. A status screen will track progress and redirect to the workspace on completion.