Transform data from CSV files with Pandas in Python containers (in parallel)

Source

id: python-csv-each-parallel
namespace: company.team

tasks:
  - id: csv
    type: io.kestra.plugin.core.flow.ForEach
    concurrencyLimit: 0  # 0 = no concurrency cap: all values are processed in parallel
    values:
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/products.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv
    tasks:
      - id: pandas
        type: io.kestra.plugin.scripts.python.Script
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
        script: |
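          import pandas as pd
          # taskrun.value resolves to the current URL from the ForEach values list
          df = pd.read_csv("{{ taskrun.value }}")
          # Print column names, dtypes, and non-null counts to stdout
          df.info()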

About this blueprint

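This flow iterates over a list of CSV file URLs and processes them in parallel. For each URL, a Python script runs in its own Docker container and loads the file into a Pandas DataFrame. Because concurrencyLimit is set to 0, there is no cap on concurrency, so all three files are processed at the same time.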
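
To try the per-file logic outside Kestra, the same fan-out pattern can be approximated locally. The sketch below is only an illustration under stated assumptions: it uses a thread pool in place of Docker containers, and small in-memory CSV strings (the hypothetical SAMPLE_CSVS) in place of the URLs above, so that it runs without network access. The helper names summarize and summarize_all are also made up for this example.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV URLs in the flow,
# so the sketch runs without network access.
SAMPLE_CSVS = {
    "orders": "order_id,total\n1,9.99\n2,24.50\n",
    "products": "product_id,name\n10,widget\n11,gadget\n12,gizmo\n",
}


def summarize(csv_text: str) -> dict:
    """Load one CSV and capture the text that df.info() would print."""
    df = pd.read_csv(io.StringIO(csv_text))
    buffer = io.StringIO()
    df.info(buf=buffer)  # write the summary to a buffer instead of stdout
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "info": buffer.getvalue(),
    }


def summarize_all(csvs: dict) -> dict:
    """Process every CSV concurrently, one worker thread per input."""
    with ThreadPoolExecutor(max_workers=len(csvs)) as pool:
        futures = {name: pool.submit(summarize, text) for name, text in csvs.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for name, summary in summarize_all(SAMPLE_CSVS).items():
        print(name, summary["rows"], summary["columns"])
```

Threads share one process, so this does not reproduce the isolation that separate containers give each script in the flow; it only mirrors the "one unit of work per file, all at once" structure.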
