Daft

To follow these examples, make sure cuallee is installed with Daft support.

Install

pip install cuallee
pip install "cuallee[daft]"
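
One way to confirm that both packages are importable before running the examples is sketched below. It uses only the standard library's importlib.util.find_spec; the helper name `installed` is illustrative and is not part of cuallee or Daft.

```python
from importlib.util import find_spec


def installed(*modules: str) -> dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    # `installed` is a hypothetical helper written for this page.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Both values should be True once the pip commands have succeeded.
    print(installed("cuallee", "daft"))
```

A False value for either name suggests the corresponding install step has not completed in the active environment.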

is_complete

Validates the completeness of a data set by confirming that a column contains no null values.


In this example, we validate that the column id does not have any missing values.

import daft
import numpy as np
from cuallee import Check, CheckLevel

df = daft.from_pydict({"id": np.arange(10)})
check = Check(CheckLevel.WARNING, "CompletePredicate")
check.is_complete("id")

# Validate
check.validate(df)

output:

╭───────────────────┬────────┬───────┬─────────┬───────────┬────────────────┬───────┬─────────────┬────────┬─────────────────────┬───────┬────────────╮
│ check             ┆ column ┆ id    ┆ level   ┆ pass_rate ┆ pass_threshold ┆ rows  ┆ rule        ┆ status ┆ timestamp           ┆ value ┆ violations │
│ ---               ┆ ---    ┆ ---   ┆ ---     ┆ ---       ┆ ---            ┆ ---   ┆ ---         ┆ ---    ┆ ---                 ┆ ---   ┆ ---        │
│ Utf8              ┆ Utf8   ┆ Int64 ┆ Utf8    ┆ Float64   ┆ Float64        ┆ Int64 ┆ Utf8        ┆ Utf8   ┆ Utf8                ┆ Utf8  ┆ Int64      │
╞═══════════════════╪════════╪═══════╪═════════╪═══════════╪════════════════╪═══════╪═════════════╪════════╪═════════════════════╪═══════╪════════════╡
│ CompletePredicate ┆ id     ┆ 1     ┆ WARNING ┆ 1         ┆ 1              ┆ 10    ┆ is_complete ┆ PASS   ┆ 2024-03-26 18:55:43 ┆ N/A   ┆ 0          │
╰───────────────────┴────────┴───────┴─────────┴───────────┴────────────────┴───────┴─────────────┴────────┴─────────────────────┴───────┴────────────╯

In this example, we intentionally place 2 null values in the data frame. Only 3 of the 5 rows are non-null, so the pass_rate of 0.6 falls below the pass_threshold of 1 and the check reports FAIL.

import daft
from cuallee import Check, CheckLevel

df = daft.from_pydict({"id": [1, 2, 3, None, None]})
check = Check(CheckLevel.WARNING, "CompletePredicate")
check.is_complete("id")

# Validate
check.validate(df)

output:

╭───────────────────┬────────┬───────┬─────────┬───────────┬────────────────┬───────┬─────────────┬────────┬─────────────────────┬───────┬────────────╮
│ check             ┆ column ┆ id    ┆ level   ┆ pass_rate ┆ pass_threshold ┆ rows  ┆ rule        ┆ status ┆ timestamp           ┆ value ┆ violations │
│ ---               ┆ ---    ┆ ---   ┆ ---     ┆ ---       ┆ ---            ┆ ---   ┆ ---         ┆ ---    ┆ ---                 ┆ ---   ┆ ---        │
│ Utf8              ┆ Utf8   ┆ Int64 ┆ Utf8    ┆ Float64   ┆ Float64        ┆ Int64 ┆ Utf8        ┆ Utf8   ┆ Utf8                ┆ Utf8  ┆ Int64      │
╞═══════════════════╪════════╪═══════╪═════════╪═══════════╪════════════════╪═══════╪═════════════╪════════╪═════════════════════╪═══════╪════════════╡
│ CompletePredicate ┆ id     ┆ 1     ┆ WARNING ┆ 0.6       ┆ 1              ┆ 5     ┆ is_complete ┆ FAIL   ┆ 2024-05-18 21:24:15 ┆ N/A   ┆ 2          │
╰───────────────────┴────────┴───────┴─────────┴───────────┴────────────────┴───────┴─────────────┴────────┴─────────────────────┴───────┴────────────╯
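
The rows, violations, and pass_rate values in this output can be reproduced by hand. The plain-Python sketch below assumes, based only on the output shown, that violations counts the null values and that pass_rate is the share of non-null rows. `completeness` is a hypothetical helper, not a cuallee function, and it does not describe how cuallee computes the result internally.

```python
def completeness(values: list) -> tuple[int, int, float]:
    """Return (rows, violations, pass_rate) for a list that may contain None."""
    rows = len(values)
    # Assumption: every None counts as one violation.
    violations = sum(1 for v in values if v is None)
    # Assumption: an empty column is treated as fully complete.
    pass_rate = (rows - violations) / rows if rows else 1.0
    return rows, violations, pass_rate


rows, violations, pass_rate = completeness([1, 2, 3, None, None])
print(rows, violations, pass_rate)  # 5 2 0.6
```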

In this example, we reuse the data frame with null values from the previous example, but set the tolerance to 0.6 through the pct parameter of the is_complete rule. The check now produces a PASS result despite the 2 null values.

import daft
from cuallee import Check, CheckLevel

df = daft.from_pydict({"id": [1, 2, 3, None, None]})
check = Check(CheckLevel.WARNING, "CompletePredicate")
check.is_complete("id", pct=0.6)

# Validate
check.validate(df)

output:

╭───────────────────┬────────┬───────┬─────────┬───────────┬────────────────┬───────┬─────────────┬────────┬─────────────────────┬───────┬────────────╮
│ check             ┆ column ┆ id    ┆ level   ┆ pass_rate ┆ pass_threshold ┆ rows  ┆ rule        ┆ status ┆ timestamp           ┆ value ┆ violations │
│ ---               ┆ ---    ┆ ---   ┆ ---     ┆ ---       ┆ ---            ┆ ---   ┆ ---         ┆ ---    ┆ ---                 ┆ ---   ┆ ---        │
│ Utf8              ┆ Utf8   ┆ Int64 ┆ Utf8    ┆ Float64   ┆ Float64        ┆ Int64 ┆ Utf8        ┆ Utf8   ┆ Utf8                ┆ Utf8  ┆ Int64      │
╞═══════════════════╪════════╪═══════╪═════════╪═══════════╪════════════════╪═══════╪═════════════╪════════╪═════════════════════╪═══════╪════════════╡
│ CompletePredicate ┆ id     ┆ 1     ┆ WARNING ┆ 0.6       ┆ 0.6            ┆ 5     ┆ is_complete ┆ PASS   ┆ 2024-05-18 21:24:15 ┆ N/A   ┆ 2          │
╰───────────────────┴────────┴───────┴─────────┴───────────┴────────────────┴───────┴─────────────┴────────┴─────────────────────┴───────┴────────────╯
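
The same data fails with a pass_threshold of 1 and passes with 0.6, so the only thing that changes is the comparison against the threshold. A minimal sketch of that decision follows, assuming the status is PASS whenever pass_rate is at least pass_threshold, which is consistent with the two outputs above. `status` is a hypothetical helper, not part of cuallee's API.

```python
def status(pass_rate: float, pass_threshold: float = 1.0) -> str:
    """Return "PASS" when the observed rate meets the threshold, else "FAIL"."""
    # Assumption: the comparison is inclusive (>=), as the 0.6 vs 0.6 row suggests.
    return "PASS" if pass_rate >= pass_threshold else "FAIL"


pass_rate = 3 / 5  # 3 non-null rows out of 5

print(status(pass_rate))       # FAIL: 0.6 is below the threshold of 1
print(status(pass_rate, 0.6))  # PASS: 0.6 meets the threshold of 0.6
```

Lowering pct trades strictness for tolerance: the violations are still reported in the output, they just no longer fail the check.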