DATA - Pipeline

Pipelines

Pipelines are small transformations applied to incoming data before Instances are written to the Database.

Pipelines are managed as a special built-in _pipeline DataClass.

Pipelines contain tasks to perform atomic operations on each field of each incoming data instance.

Pipeline Example

# A small pipeline to adapt user.csv for bulk-loading users from external systems

- classname: _pipeline
  keyname: user_import_pipeline
  displayname: user_import_csv
  is_enabled: true
  description: Use this pipeline to import users as a csv file from system X/Y/Z
  content: |
      csv_delimiter: ';'
      classname: user
      keyfield: login
      tasks: 
          - field_lower: email
          - field_lower: login
          - field_upper: external_id
          - field_uuid: uuid_auto
          - field_datetime_now: last_sync      
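
For illustration, here is a hypothetical user.csv that this pipeline could process (the exact columns are an assumption; only login, email and external_id are referenced by the tasks above):

login;email;external_id
JDoe;John.Doe@example.org;ab12cd
ASmith;Anna.Smith@example.org;ef34gh

Each row is loaded as a user instance keyed on login; the tasks then lowercase email and login, uppercase external_id, fill uuid_auto with a generated UUID, and set last_sync to the current datetime.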

Pipeline use case

$ docker exec -it cavaliba_app python manage.py \
         cavaliba_load /files/user.csv --pipeline user_import_pipeline

Pipeline tasks reference


        - field_add: new_field                          # add a new field 'new_field'
        - field_copy: ['firstname', 'firstname2']       # copy the value of 'firstname' to 'firstname2'
        - field_rename: ['lastname', 'LastName']        # rename field 'lastname' to 'LastName'
        - field_delete: 'external_id'                   # remove the field
        - field_lower: 'displayname'                    # lowercase the field value
        - field_upper: 'displayname'                    # uppercase the field value
        - field_date_now: fieldname                     # set to current date (YYYY-MM-DD)
        - field_time_now: fieldname                     # set to current time (HH:MM:SS)
        - field_datetime_now: fieldname                 # set to current datetime (YYYY-MM-DD HH:MM:SS)
        - field_keep: ["field1", "field2", ...]         # keep only these fields (and classname/keyname)
        - field_regexp_sub: ["fieldname", "pattern", "replace"]   # regexp substitution on the field value
        - field_uuid: "fieldname"                       # fill the field with a generated UUID
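
As a sketch, several tasks can be chained inside a pipeline's content to normalize a field step by step; the field names, pattern and replacement below are illustrative, and assume field_regexp_sub applies the pattern to the field value:

tasks:
    - field_rename: ['mail', 'email']
    - field_regexp_sub: ['email', '@corp\.example\.com$', '@example.com']
    - field_lower: 'email'

This renames mail to email, rewrites the domain with a regexp substitution, then lowercases the resulting value.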