Datasource

In today’s data-driven world, enterprise data is scattered across diverse landscapes — files, databases, object stores, cloud warehouses, and APIs embedded within various applications. At xVector, transforming this fragmented information into actionable insights begins with Data Sources.

Data Sources are the gateway, allowing users to connect to, import, and synchronize data from multiple origins. Whether the data resides in structured files, dynamic APIs, or sophisticated cloud storage systems, users can configure and execute a connector to bring it into xVector as a data source. A rich catalog of connectors, periodically updated by xVector, ensures compatibility with an ever-expanding array of systems. Missing a connector? Reach out to connectors@xvectorlabs.com, and a new one can be developed quickly.

Once connected, the process doesn’t stop at simply importing data. Updates from source systems are seamlessly upserted, reflecting real-time changes while preserving the historical timeline of values. Bulk data import is supported with the OVERWRITE option. This meticulous synchronization ensures traceability, enabling businesses to trust the integrity and provenance of their data.
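The upsert-with-history behavior described above can be contrasted with an OVERWRITE bulk import in a short sketch. This is illustrative Python only, not xVector's actual storage layer; the `Datasource` class and its methods are hypothetical names chosen to mirror the semantics in the text.

```python
from datetime import datetime, timezone

class Datasource:
    """Sketch of upsert-with-history vs. OVERWRITE bulk import semantics.
    Hypothetical; not xVector's implementation."""

    def __init__(self):
        # record_key -> list of (timestamp, value) versions, oldest first
        self.history = {}

    def upsert(self, records):
        """Insert new keys and update existing ones, preserving the
        historical timeline of values."""
        now = datetime.now(timezone.utc)
        for key, value in records.items():
            self.history.setdefault(key, []).append((now, value))

    def overwrite(self, records):
        """Bulk import with OVERWRITE: replace all existing data."""
        now = datetime.now(timezone.utc)
        self.history = {k: [(now, v)] for k, v in records.items()}

    def current(self, key):
        """Latest value for a record key."""
        return self.history[key][-1][1]
```

Upserting the same record key twice keeps both versions in the timeline, while an overwrite discards everything that came before.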

xVector simplifies the journey from raw data to actionable insights, offering users the tools to acquire, refine, and analyze data confidently. With regular updates to its connectors catalog and robust metadata management, the platform ensures that businesses can harness the full potential of their data ecosystem — turning scattered information into cohesive narratives that drive impactful decisions.

Available connectors by type:

  • Files — CSV, JSON, GZIP
  • Databases — MySQL, SQL Server, PostgreSQL, MongoDB
  • Object Stores — S3, MinIO
  • Cloud Data Warehouses — Amazon Redshift, Google BigQuery
  • APIs — Salesforce, Mailchimp, Zoho

xVector automatically infers metadata by sampling the data while creating a datasource. Review the inferred metadata carefully and make any corrections required; setting the metadata correctly is a crucial step, since it drives visualization and modeling downstream.

  • Column Name — Same name as present in the source datasource
  • Column New Name — New name given to a column in the dataset when copied from the datasource
  • Description — Description for a column; available to other users and copied into datasets
  • Data Type — Automatically inferred (int, float, string, date); review for potential errors
  • Format — Applicable for datetime and currency (e.g., ‘YYYY-MM-DD’, ‘DD/MM/YYYY HH:MM:SS’); visualization only
  • Semantic Type — Appropriate semantic type for the column (e.g., SSN or zip code for int columns); used in visualization
  • Statistical Type — Used for visualization and modeling purposes
  • Skip Histogram — Default is false (a histogram is generated). Set to true for very high-cardinality columns
  • Nullable — Default is true. Set to false if the column value cannot be null; xVector warns if nulls are found
  • Dimension — Set the column as a dimension; used in visualization and modeling
  • Measure — Set the column as a measure; used in visualization and modeling
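The sampling-based type inference mentioned above can be sketched in a few lines. The function below is a simplified illustration, not xVector's actual inference logic: it tries int, then float, then one assumed date format against a sample of string values.

```python
from datetime import datetime

def infer_data_type(samples):
    """Guess a column's data type (int, float, date, string) from a sample
    of its string values. A loose sketch of sampling-based inference; the
    single 'YYYY-MM-DD' date format is an assumption for illustration."""
    def all_match(parse):
        try:
            for s in samples:
                parse(s)
            return True
        except (ValueError, TypeError):
            return False

    if all_match(int):
        return "int"
    if all_match(float):
        return "float"
    if all_match(lambda s: datetime.strptime(s, "%Y-%m-%d")):
        return "date"
    return "string"
```

Because inference only sees a sample, a column can be mis-typed when the sample is unrepresentative, which is exactly why the review step above matters.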

Data profiling involves examining data to assess its structure, content, and quality. This process calculates various statistical values, including minimum and maximum values, the presence of missing records, and the frequency of distinct values in categorical columns. It can also identify correlations between different attributes. Profiling can be automated by setting the profile option to true or performed manually.
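The statistics listed above can be sketched with plain Python. This is a minimal illustration of what a per-column profile and a pairwise correlation compute, not xVector's profiling engine; the function names are hypothetical.

```python
from collections import Counter

def profile_column(values):
    """Basic per-column profile: min/max, missing-record count, and the
    frequency of distinct values. None stands in for a missing record."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "missing": len(values) - len(present),
        "distinct": dict(Counter(present)),
    }

def pearson(xs, ys):
    """Pearson correlation between two numeric columns of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Running the profile over every column (and `pearson` over numeric column pairs) is what the automated profile option does on your behalf.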

A CSV-type datasource is used when you need to bring CSV data from a local system. From a workspace, click on Add and choose the Datasources option.

Workspace Add menu showing the Datasources option

Choose the CSV data source type from the available list.

Datasource type selection dialog showing available connectors
  • Choose File — Browse your local system to select the required CSV file
  • Name — Provide a meaningful name
  • Delimiter — Choose a delimiter according to the file
  • Header — Set to true if a header row is present, otherwise false
CSV datasource configure step with file selection and basic settings
  • Workspace — Select a workspace for the datasource
  • Encoding — Select the file encoding
  • Quote — Single character used for quoting values
  • Escape — Single character for escaping quotes inside an already quoted value
  • Comment — Single character for skipping lines beginning with this character (disabled by default)
  • Null value — String representation of a null value
  • NaN value — String representation of a non-number value
  • Positive/Negative Inf — String representations of infinity values
  • Multiline — Parse one record, which may span multiple lines, per file
  • Mode — Permissive or drop malformed
  • Quote Escape Character — Single character for escaping the escape for the quote character
  • Empty Value — String representation of an empty value
  • Write Option — Upsert, Bulk Insert, Insert, Delete
  • Run Profile — Set to true to run profiling
  • Run Correlation — Set to true to run correlation analysis
  • Machine Specifications — Use the default, or disable it to provide custom machine specs
  • Expression — Expressions to be executed while reading the data
CSV datasource advanced settings
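The Delimiter, Quote, and Escape settings above behave like the equivalent parameters in common CSV parsers. As a rough illustration (using Python's standard `csv` module, not xVector's parser), here is a file whose values contain the delimiter and embedded quotes:

```python
import csv
import io

# A pipe-delimited file where one value contains a comma and another
# contains doubled quotes ("" escaping a literal quote inside a
# quoted value), exercising the Delimiter and Quote settings.
raw = 'name|note\n"Smith, J."|"said ""hi"""\n'

reader = csv.reader(io.StringIO(raw), delimiter="|", quotechar='"')
rows = list(reader)
```

Getting these settings wrong typically shows up immediately in the preview step as split or merged columns, which is the time to adjust them.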

The preview step displays a sample of data from the selected file.

CSV datasource preview data step showing sample rows

View the automatically inferred metadata and modify it if needed.

CSV datasource column metadata — left columns
CSV datasource column metadata — right columns showing additional settings

Configure record key, partition key, and other table-level settings.

CSV datasource table options for record key and partition key
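The record key and partition key from the table options can be illustrated with a small sketch: the record key deduplicates rows (last write wins), and the partition key groups them. This is a hypothetical illustration of the concepts, not xVector's storage layer.

```python
from collections import defaultdict

def partition_records(records, record_key, partition_key):
    """Group rows by partition key and deduplicate within each partition
    by record key, keeping the last row seen for each key."""
    partitions = defaultdict(dict)
    for row in records:
        partitions[row[partition_key]][row[record_key]] = row
    return {part: list(rows.values()) for part, rows in partitions.items()}
```

Choosing a stable, unique record key is what makes later upserts from the same source land on the right rows.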

Click Save to begin the import. A status screen will appear to track progress. On completion, it redirects to the workspace page.

An S3 datasource is used to bring data from an AWS S3 bucket. You need an AWS access key and secret key.

From a workspace, click on Add and choose the Datasources option, then select the S3 data source type.

  • Datasource name — Provide a meaningful name for the data
  • Saved accounts — Select from already created accounts
  • New account name — Provide an account name when creating a new account
  • AWS access key ID — Provide the AWS access key ID
  • AWS secret key — Enter the AWS secret key
  • Allowed IPs — Shows the allowed IPs list
  • File type — Choose the file type (CSV, JSON)
  • Test Connection — Click to test the connection with the provided credentials
  • Extension — Choose the file extension (CSV, JSON, GZIP)
S3 datasource configure step with AWS credentials
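The configure step above can be summarized as a settings payload. The shape below is purely illustrative; the field names mirror the form fields, not an actual xVector API, and the credential values are placeholders.

```python
# Hypothetical shape of the S3 connector settings from the configure step.
# Never hard-code real AWS credentials; these are placeholders.
s3_datasource = {
    "datasource_name": "sales-raw",
    "account": {
        "name": "prod-aws",
        "aws_access_key_id": "<ACCESS_KEY_ID>",
        "aws_secret_access_key": "<SECRET_KEY>",
    },
    "file_type": "CSV",
    "extension": "GZIP",
}
```

Saved accounts exist so that the access key and secret key are entered once and reused across datasources instead of being repeated in every configuration.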
  • Workspace — Choose from the available options
  • Bucket — Select the source S3 bucket
  • Prefix for files — Provide a prefix for the files
  • Choose folder — Choose files from the S3 bucket
  • Header — Set to true if a header row is present, otherwise false
  • Delimiter — Choose a delimiter according to the file
  • Encoding — Select the file encoding
  • Quote / Escape / Comment — Character settings for parsing
  • Null / NaN / Inf values — String representations for special values
  • Multiline — Parse one record, which may span multiple lines, per file
  • Quote Escape Character — Single character for escaping the escape for the quote character
  • Empty Value — String representation of an empty value
  • Write Option — Choose from Upsert, Bulk Insert, Insert, Delete
  • Expression — Expressions to be executed while reading the data
S3 datasource advanced settings
S3 datasource preview data showing sample rows

View the automatically inferred metadata and modify it if needed.

S3 datasource metadata — left columns
S3 datasource metadata — right columns showing additional settings
S3 datasource table options for record key and partition key

Click Save to begin the import. A status screen will track progress and redirect to the workspace on completion.