CSV and pipelines

CSV files

CSV files are a popular format and may already be part of your routine. They are human-readable text files containing tabular data with comma-separated values. The first line may list the column titles.

Example:

login,firstname,lastname
hbarb,Hector,Barbarossa

CSV files need special care before they can be processed with Cavaliba; a sample file after the list below illustrates why:

  • mapping of column names to Cavaliba fields
  • character encoding
  • column separator
  • target Cavaliba schema
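
For instance, a file produced by another system might use a different delimiter and column names that do not match Cavaliba field names (the layout below is purely illustrative):

matricule;NOM;PRENOM;MAIL
4711;BARBAROSSA;Hector;Hector.Barbarossa@EXAMPLE.ORG

Before such a file can be imported, the delimiter, the character encoding, the mapping of its columns to Cavaliba fields, and the target schema all have to be made explicit.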

Pipelines

Pipelines are Cavaliba objects that assist in the processing of CSV files. They are in fact more general and can perform many other data transformations, not limited to CSV.

The idea behind associating a pipeline with a CSV file is that the producer of the file is not required to modify its content to fit a Cavaliba-specific schema. The pipeline performs the transformation on the fly at import (or export) time.
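
As an illustration (the values and column layout are made up for this page), a pipe-delimited input such as:

login|firstname|lastname|email
hbarb|Hector|Barbarossa|Hector.Barbarossa@DEMO.CAVALIBA

could be turned by the pipeline defined below into an object of the _user schema, roughly:

- classname: _user
  keyname: hbarb
  login: hbarb
  firstname: Hector
  lastname: Barbarossa
  email: hector.barbarossa@demo.cavaliba

The producer of the CSV file never had to know about _user, keyname, or the lowercase convention; the pipeline applies them at import time.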

Create a Pipeline from the UI

Before importing a CSV file, we need to define a pipeline. In Cavaliba, pipelines are objects from the builtin _pipeline schema.

As always, we can create a Cavaliba object:

  • from the CLI, by loading the object definition from a file
  • from the Web UI, with the Import tool, by loading a file or pasting YAML code containing the object definition
  • from the Web UI, by manually filling in the Web form for this object

Let’s create a pipeline from the Web UI form.

From the Sidebar, click on Pipeline, and from the Action menu, select New.

Fill in the form as follows:

Pipeline USER

Click on Save.

The YAML code equivalent is this:

- classname: _pipeline
  keyname: user_import_csv
  displayname: "User CSV Import Pipeline"
  description: "Pipeline to import users in CSV format with pipe delimiter"
  content: |
    csv_delimiter: '|'
    encoding: 'utf-8'
    classname: _user
    keyfield: login
    tasks:
      - field_lower: ['', email, login]    

You can also paste that code into the Import tool, available from the UI sidebar.

Pipeline YAML

This pipeline configuration:

  • Uses pipe (|) as the CSV delimiter
  • Specifies UTF-8 encoding ('ISO-8859-1' is another valid encoding)
  • Maps CSV rows to the _user schema
  • Uses login as the primary (unique) key field
  • Includes a task to convert the email and login fields to lowercase; many other task operators are available (see the sketch after this list)
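
As a sketch of how several transformations could be chained, the tasks list accepts multiple entries. The first one below is taken from the example above; the second operator name (field_strip) is hypothetical and shown only to illustrate the list syntax, so check the reference documentation for the operators that actually exist:

tasks:
  - field_lower: ['', email, login]
  # 'field_strip' is a hypothetical name, for illustration only
  - field_strip: ['', firstname, lastname]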

User Import Pipeline UI

From the CLI

Alternatively, load the pipeline definition into Cavaliba with the provided demo file:

cavaliba load builtin/demo/010_user_pipeline.yaml

Import CSV with Pipeline

Now you can import a CSV file using the newly created pipeline:

# Basic import
cavaliba load builtin/demo/010_user.csv --pipeline user_import_csv

# Import with progress indicator
cavaliba load builtin/demo/010_user.csv --pipeline user_import_csv --progress

# Preview last 5 records before importing
cavaliba load builtin/demo/010_user.csv --pipeline user_import_csv --last 5

# Verbose output with details
cavaliba load builtin/demo/010_user.csv --pipeline user_import_csv --last 5 --verbose

Pipeline Output Example

When you run the import from the CLI with --verbose, you’ll see the processed data:

(...)

Found: 5 objects
[
  {
    "classname": "_user",
    "login": "adela81",
    "firstname": "Marijn",
    "lastname": "Torrecilla",
    "displayname": "Marijn Torrecilla",
    "email": "torrecilla@demo.cavaliba",
    "mobile": "+56-16832089",
    "external_id": "BA-948192",
    "is_enabled": true,
    "description": "User Marijn Torrecilla for cavaliba demo",
    "want_notifications": true,
    "want_24": false,
    "want_email": true,
    "want_sms": false,
    "secondary_email": "",
    "secondary_mobile": "",
    "keyname": "adela81"
  },
  ...
]

The pipeline processes the CSV data and loads the user objects into the database.