DATA - Pipeline
Pipelines
Pipelines are small transformations applied to incoming data before Instances are written to the Database.
Pipelines are managed as a special built-in _pipeline DataClass.
A pipeline contains tasks that perform atomic operations on each field of each incoming data instance.
Pipeline Example
# A small pipeline to adapt user.csv to bulk load users from external systems
- classname: _pipeline
  keyname: user_import_pipeline
  displayname: user_import_csv
  is_enabled: true
  description: Use this pipeline to import users as a csv file from system X/Y/Z
  content: |
    csv_delimiter: ';'
    classname: user
    keyfield: login
    tasks:
    - field_lower: email
    - field_lower: login
    - field_upper: external_id
    - field_uuid: uuid_auto
    - field_datetime_now: last_sync
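To make the effect of these tasks concrete, here is a minimal Python sketch of what the pipeline above does to a single CSV row. The apply_pipeline function and the sample row are illustrative assumptions, not Cavaliba code.

import uuid
from datetime import datetime, timezone

def apply_pipeline(row):
    # field_lower: email / login -- lowercase the field values
    row["email"] = row["email"].lower()
    row["login"] = row["login"].lower()
    # field_upper: external_id -- uppercase the field value
    row["external_id"] = row["external_id"].upper()
    # field_uuid: uuid_auto -- set the field to a freshly generated UUID
    row["uuid_auto"] = str(uuid.uuid4())
    # field_datetime_now: last_sync -- set the field to the current datetime
    row["last_sync"] = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return row

row = {"login": "JDoe", "email": "John.Doe@Example.COM", "external_id": "ab-123"}
print(apply_pipeline(row))
# {'login': 'jdoe', 'email': 'john.doe@example.com', 'external_id': 'AB-123', ...}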
Pipeline use case
$ docker exec -it cavaliba_app python manage.py \
cavaliba_load /files/user.csv --pipeline user_import_pipeline
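For reference, a matching user.csv could look like the sample below; the exact columns are an assumption derived from the fields referenced by the pipeline tasks, with login as the keyfield and ';' as the configured delimiter.

login;email;external_id
JDoe;John.Doe@Example.COM;ab-123
MSmith;Mary.Smith@Example.COM;cd-456

Each row becomes a user instance keyed by its login value, with the pipeline tasks applied before the write.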
Pipeline tasks reference
- field_add: new_field                                    # add a new field named new_field
- field_copy: ['firstname', 'firstname2']                 # copy firstname into firstname2
- field_rename: ['lastname', 'LastName']                  # rename lastname to LastName
- field_delete: 'external_id'                             # remove the field from the instance
- field_lower: 'displayname'                              # lowercase the field value
- field_upper: 'displayname'                              # uppercase the field value
- field_date_now: fieldname                               # set fieldname to the current date (YYYY-MM-DD)
- field_time_now: fieldname                               # set fieldname to the current time (HH:MM:SS)
- field_datetime_now: fieldname                           # set fieldname to the current datetime (YYYY-MM-DD HH:MM:SS)
- field_keep: ["field1", "field2", ...]                   # keep only these fields (and classname/keyname)
- field_regexp_sub: ["fieldname", "pattern", "replace"]   # replace pattern matches in fieldname with replace
- field_uuid: "fieldname"                                 # set fieldname to a newly generated UUID
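As a final illustration of the task semantics above, here is a small Python dispatcher sketch. The behavior shown (especially for field_keep and field_regexp_sub) is an assumption based on the reference list, not Cavaliba's implementation.

import re

def run_task(task, arg, row):
    if task == "field_lower":
        # lowercase a single field value
        row[arg] = str(row.get(arg, "")).lower()
    elif task == "field_rename":
        # rename a field: ['old_name', 'new_name']
        old, new = arg
        row[new] = row.pop(old, None)
    elif task == "field_keep":
        # keep only the listed fields, plus classname/keyname
        keep = set(arg) | {"classname", "keyname"}
        row = {k: v for k, v in row.items() if k in keep}
    elif task == "field_regexp_sub":
        # regexp substitution: ['fieldname', 'pattern', 'replace']
        field, pattern, replace = arg
        row[field] = re.sub(pattern, replace, str(row.get(field, "")))
    return row

row = {"classname": "user", "keyname": "jdoe", "phone": "01 23 45 67 89"}
row = run_task("field_regexp_sub", ["phone", r"\s+", ""], row)
print(row["phone"])  # 0123456789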