Pandas
In order to follow this examples, make sure your installation is all set for pandas
Install
pip install cuallee
pip install cuallee[pandas]
is_complete
It validates the completeness attribute of a data set. It confirms that a column does not contain null
values.
is_complete
In this example, we validate that the column id
does not have any missing values.
import pandas as pd
from cuallee import Check
df = pd.DataFrame({"id" : [1,2,3,4,5]})
check = Check()
check.is_complete("id")
# Validate
check.validate(df)
output:
id timestamp check level column rule value rows violations pass_rate pass_threshold status
1 2024-05-18 16:22:53 cuallee.check WARNING id is_complete N/A 5 0 1.0 1.0 PASS
In this example, we intentionally place 2 null
values in the dataframe and that produces a FAIL
check as result.
import pandas as pd
from cuallee import Check
df = pd.DataFrame({"id" : [1,2,3,None, None]})
check = Check()
check.is_complete("id")
# Validate
check.validate(df)
output:
id timestamp check level column rule value rows violations pass_rate pass_threshold status
1 2024-05-18 16:33:55 cuallee.check WARNING id is_complete N/A 5 2 0.6 1.0 FAIL
In this example, we validate reuse the data frame with empty values from the previous example, however we set our tolerance via the pct
parameter on the rule is_complete
to 0.6
. Producing now a PASS
result on the check, regardless of the 2
present null
values.
import pandas as pd
from cuallee import Check
df = pd.DataFrame({"id" : [1,2,3,None, None]})
check = Check()
check.is_complete("id", pct=0.6)
# Validate
check.validate(df)
output:
id timestamp check level column rule value rows violations pass_rate pass_threshold status
1 2024-05-18 16:33:55 cuallee.check WARNING id is_complete N/A 5 2 0.6 0.6 PASS