    Preparing and Ingesting Data into Vector Databases


    Populating a vector database typically involves sourcing content, transforming it into a vector representation, and storing it in a searchable format. 

    Data Cleaning and Preprocessing

    To store data in a vector database, you must first convert it into a numerical vector (an embedding). Cyclr handles embedding by calling external models through its Connectors. Typical preparation steps include:

    • Use an embedding service to convert your input text into a high-dimensional vector.
      • For example, with the ChatGPT connector, you can call the “Create Embedding” method and pass it a text string. The output is a vector whose dimensionality is set by the model (e.g. OpenAI’s text-embedding-3-small returns 1536-dimensional vectors by default).
    • Use OCR (Optical Character Recognition) to extract text from PDFs.
      • For example, we built a custom connector for MistralAI that extracts text through OCR and passes it to external models.
    • Convert extracted text to Markdown or plain text for consistency.

    For example, text might be extracted from a PDF file using OCR, converted into Markdown, and then embedded via OpenAI. The resulting vector is then passed to the vector database.
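The text-cleaning step between OCR and embedding can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a Cyclr feature; the function name `normalize_text` and the specific cleanup rules are assumptions chosen for the example.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Normalize OCR output into consistent plain text (illustrative rules only)."""
    # Unify Unicode forms, e.g. the ligature "ﬁ" becomes the two letters "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim whitespace at the start and end of every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Reduce three or more consecutive newlines to one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Applying the same normalization to every document before embedding keeps formatting noise from producing different vectors for otherwise identical content.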

    Always refer to your embedding provider’s documentation to verify the expected output format. Use the same embedding model for data ingestion and querying, because vectors produced by different models are not comparable. Other providers and custom models can also be used, as long as they return compatible vector formats.
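One way to guard against a model mismatch is to check every vector's length before it is stored or used in a query. The sketch below assumes the 1536-dimension figure quoted above for text-embedding-3-small; `validate_embedding` and `EXPECTED_DIMENSIONS` are hypothetical names, not part of Cyclr or any provider's API.

```python
import math
from typing import Sequence

# Assumption: the index was created for 1536-dimensional vectors.
EXPECTED_DIMENSIONS = 1536


def validate_embedding(
    vector: Sequence[float], expected: int = EXPECTED_DIMENSIONS
) -> list[float]:
    """Return the vector as a list of floats, or raise if it cannot be stored."""
    if len(vector) != expected:
        raise ValueError(
            f"expected {expected} dimensions, got {len(vector)}; "
            "check that ingestion and querying use the same model"
        )
    values = [float(v) for v in vector]
    # Reject NaN and infinite components, which make similarity scores meaningless.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains non-finite values")
    return values
```

Running the same check on query vectors catches the case where the query side was switched to a different model than the one used for ingestion.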

    Vector Upsertion

    Vector upsertion is the process of adding or updating vector records in a database. These records typically include:

    • A unique identifier
    • The embedding vector itself (a high-dimensional array)
    • Optional metadata for filtering or context
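The three parts of a record can be modelled as a small data structure. The sketch below is illustrative only: the field names `id`, `values`, and `metadata` are assumptions that mirror a common record shape, and `make_record` is a hypothetical helper. It derives the identifier from the source item so that re-ingesting the same item updates the existing record instead of creating a duplicate.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from typing import Any, Optional, Sequence


@dataclass
class VectorRecord:
    id: str                       # unique identifier
    values: list[float]           # the embedding vector
    metadata: dict[str, Any] = field(default_factory=dict)  # optional context


def make_record(
    source_id: str,
    chunk_index: int,
    values: Sequence[float],
    metadata: Optional[dict[str, Any]] = None,
) -> VectorRecord:
    """Build a record whose ID is stable for a given source item and chunk."""
    key = f"{source_id}:{chunk_index}".encode("utf-8")
    record_id = hashlib.sha256(key).hexdigest()[:16]
    return VectorRecord(id=record_id, values=list(values), metadata=dict(metadata or {}))


# asdict() turns the record into a plain dictionary ready for field mapping.
example = asdict(make_record("doc-1", 0, [0.1, 0.2], {"source": "Google Drive"}))
```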

    In Cyclr, upsertion is handled via connector methods that map incoming data from source systems to the required format for your vector database. For example, the Cyclr Pinecone connector includes methods like:

    • Upsert Vectors: Store one or more vectors in a specified namespace*
    • Upsert Text: Embed and store text using integrated models (if supported) into a namespace*
    • Delete Vectors, Update Vector, List Vector IDs: Manage vector records

    *A namespace in this context refers to a logical partition within a vector database index. It is used to isolate groups of vectors under a shared identifier, allowing for targeted queries, scoped data management, and organization of content. When you upsert vectors or perform searches, specifying a namespace ensures that operations are confined to that partition.
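To make the upsert and namespace semantics concrete, here is a toy in-memory index. It is a conceptual model only, not how Pinecone or the Cyclr connector is implemented; the class `InMemoryIndex` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


class InMemoryIndex:
    """Toy index: each namespace is an isolated mapping of record ID to vector."""

    def __init__(self) -> None:
        self._namespaces: dict[str, dict[str, list[float]]] = defaultdict(dict)

    def upsert(self, namespace: str, record_id: str, values: Sequence[float]) -> str:
        """Insert the record if its ID is new in this namespace, otherwise overwrite it."""
        existed = record_id in self._namespaces[namespace]
        self._namespaces[namespace][record_id] = list(values)
        return "updated" if existed else "inserted"

    def list_ids(self, namespace: str) -> list[str]:
        """Return the IDs stored in one namespace; other namespaces are not visible."""
        return sorted(self._namespaces.get(namespace, {}))
```

Upserting the same ID twice into one namespace leaves a single record holding the latest values, while the same ID in a different namespace is a separate record.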

    Watch: How to Ingest and Store Vectors from Sheets Using Pinecone (Video Walkthrough – Episode 2)

    Workflow Orchestration

    The ingestion process can be orchestrated as a Cyclr workflow. A workflow might, for example:

    • Retrieve data, e.g. a document from Google Drive or rows from Google Sheets
    • Call an embedding service for each content item
    • Map the output into a vector record
    • Upsert the result into the database

    These workflows can be scheduled, triggered by events such as new uploads, or run manually. 
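The four steps above can be sketched as a single loop. In Cyclr these would be workflow steps backed by connector methods; the Python below is only a conceptual stand-in. The names `run_ingestion` and `fake_embed` are hypothetical, and the `embed` and `upsert` callables represent whichever embedding service and vector database you use.

```python
from typing import Callable, Iterable


def fake_embed(text: str) -> list[float]:
    """Stand-in for an embedding service; returns a toy 2-dimensional vector."""
    return [float(len(text)), float(text.count(" "))]


def run_ingestion(
    items: Iterable[tuple[str, str]],
    embed: Callable[[str], list[float]],
    upsert: Callable[[dict], None],
) -> int:
    """Embed each (item_id, text) pair, map it to a record, and upsert it."""
    ingested = 0
    for item_id, text in items:          # 1. retrieved data, one content item at a time
        if not text.strip():
            continue                     # skip empty rows rather than embedding nothing
        vector = embed(text)             # 2. call the embedding service
        record = {                       # 3. map the output into a vector record
            "id": item_id,
            "values": vector,
            "metadata": {"text": text},
        }
        upsert(record)                   # 4. upsert the result into the database
        ingested += 1
    return ingested
```

For a quick local check, pass `fake_embed` as `embed` and the `append` method of a list as `upsert`, then inspect the list to see the records that would have been stored.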

    © 2025 Cyclr. All rights reserved.