Skip to content

Polars

In order to follow this examples, make sure your installation is all set for polars

Install

pip install cuallee
pip install cuallee[polars]

is_complete

It validates the completeness attribute of a data set. It confirms that a column does not contain null values .

is_complete

In this example, we validate that the column id does not have any missing values.

import polars as pl
from cuallee import Check

df = pl.DataFrame({"id" : [1,2,3,4,5]})
check = Check()
check.is_complete("id")

# Validate
check.validate(df)

output:

shape: (1, 12)
┌─────┬─────────────────────┬───────────────┬─────────┬────────┬─────────────┬───────┬──────┬────────────┬───────────┬────────────────┬────────┐
│ id  ┆ timestamp           ┆ check         ┆ level   ┆ column ┆ rule        ┆ value ┆ rows ┆ violations ┆ pass_rate ┆ pass_threshold ┆ status │
│ --- ┆ ---                 ┆ ---           ┆ ---     ┆ ---    ┆ ---         ┆ ---   ┆ ---  ┆ ---        ┆ ---       ┆ ---            ┆ ---    │
│ i64 ┆ str                 ┆ str           ┆ str     ┆ str    ┆ str         ┆ str   ┆ i64  ┆ i64        ┆ f64       ┆ f64            ┆ str    │
╞═════╪═════════════════════╪═══════════════╪═════════╪════════╪═════════════╪═══════╪══════╪════════════╪═══════════╪════════════════╪════════╡
│ 1   ┆ 2024-05-18 16:53:56 ┆ cuallee.check ┆ WARNING ┆ id     ┆ is_complete ┆ N/A   ┆ 5    ┆ 0          ┆ 1.0       ┆ 1.0            ┆ PASS   │
└─────┴─────────────────────┴───────────────┴─────────┴────────┴─────────────┴───────┴──────┴────────────┴───────────┴────────────────┴────────┘

In this example, we intentionally place 2 null values in the dataframe and that produces a FAIL check as result.

import polars as pl
from cuallee import Check

df = pl.DataFrame({"id" : [1,2,3,None, None]})
check = Check()
check.is_complete("id")

# Validate
check.validate(df)

output:

shape: (1, 12)
┌─────┬─────────────────────┬───────────────┬─────────┬────────┬─────────────┬───────┬──────┬────────────┬───────────┬────────────────┬────────┐
│ id  ┆ timestamp           ┆ check         ┆ level   ┆ column ┆ rule        ┆ value ┆ rows ┆ violations ┆ pass_rate ┆ pass_threshold ┆ status │
│ --- ┆ ---                 ┆ ---           ┆ ---     ┆ ---    ┆ ---         ┆ ---   ┆ ---  ┆ ---        ┆ ---       ┆ ---            ┆ ---    │
│ i64 ┆ str                 ┆ str           ┆ str     ┆ str    ┆ str         ┆ str   ┆ i64  ┆ i64        ┆ f64       ┆ f64            ┆ str    │
╞═════╪═════════════════════╪═══════════════╪═════════╪════════╪═════════════╪═══════╪══════╪════════════╪═══════════╪════════════════╪════════╡
│ 1   ┆ 2024-05-18 16:53:56 ┆ cuallee.check ┆ WARNING ┆ id     ┆ is_complete ┆ N/A   ┆ 5    ┆ 2          ┆ 0.6       ┆ 1.0            ┆ FAIL   │
└─────┴─────────────────────┴───────────────┴─────────┴────────┴─────────────┴───────┴──────┴────────────┴───────────┴────────────────┴────────┘

In this example, we validate reuse the data frame with empty values from the previous example, however we set our tolerance via the pct parameter on the rule is_complete to 0.6. Producing now a PASS result on the check, regardless of the 2 present null values.

import polars as pl
from cuallee import Check

df = pl.DataFrame({"id" : [1,2,3,None, None]})
check = Check()
check.is_complete("id", pct=0.6)

# Validate
check.validate(df)

output:

shape: (1, 12)
┌─────┬─────────────────────┬───────────────┬─────────┬────────┬─────────────┬───────┬──────┬────────────┬───────────┬────────────────┬────────┐
│ id  ┆ timestamp           ┆ check         ┆ level   ┆ column ┆ rule        ┆ value ┆ rows ┆ violations ┆ pass_rate ┆ pass_threshold ┆ status │
│ --- ┆ ---                 ┆ ---           ┆ ---     ┆ ---    ┆ ---         ┆ ---   ┆ ---  ┆ ---        ┆ ---       ┆ ---            ┆ ---    │
│ i64 ┆ str                 ┆ str           ┆ str     ┆ str    ┆ str         ┆ str   ┆ i64  ┆ i64        ┆ f64       ┆ f64            ┆ str    │
╞═════╪═════════════════════╪═══════════════╪═════════╪════════╪═════════════╪═══════╪══════╪════════════╪═══════════╪════════════════╪════════╡
│ 1   ┆ 2024-05-18 16:53:56 ┆ cuallee.check ┆ WARNING ┆ id     ┆ is_complete ┆ N/A   ┆ 5    ┆ 2          ┆ 0.6       ┆ 0.6            ┆ PASS   │
└─────┴─────────────────────┴───────────────┴─────────┴────────┴─────────────┴───────┴──────┴────────────┴───────────┴────────────────┴────────┘