Snowpark
To follow these examples, make sure your installation is set up for Snowpark.
Install
pip install cuallee
pip install cuallee[snowpark]
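To confirm that both packages can be found after installing, a quick check along these lines may help. This is a minimal sketch: the helper name `installed` is hypothetical, and the module names are the ones imported in the examples on this page.

```python
from importlib.util import find_spec


def installed(module: str) -> bool:
    """Return True if `module` can be located in the current environment."""
    try:
        return find_spec(module) is not None
    except ImportError:
        # Raised when a parent package (for example `snowflake`) is missing.
        return False


# Module names as used by the imports in the examples on this page.
for name in ("cuallee", "snowflake.snowpark"):
    print(f"{name}: {'ok' if installed(name) else 'missing'}")
```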
Pre-Requisites
You will need a Snowflake account to proceed with the examples below.
Once you have an account, you can find your account details, normally located in the bottom left corner of your Snowflake environment.
The following environment variables are required and are used by cuallee at runtime to connect to your instance:
SF_ACCOUNT=hp00000.us-east4.gcp
SF_USER=user.name@cuallee.com
SF_PASSWORD=MySecretPa$$word?
SF_ROLE=ACCOUNTADMIN
SF_WAREHOUSE=COMPUTE_WH
SF_DATABASE=SNOWFLAKE_SAMPLE_DATA
SF_SCHEMA=TPCH_SF10
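Before opening a session, it can be useful to verify that all of these variables are set. The following is a minimal sketch; the helper `missing_variables` is hypothetical and not part of cuallee.

```python
import os

# Variable names taken from the list above.
REQUIRED = (
    "SF_ACCOUNT",
    "SF_USER",
    "SF_PASSWORD",
    "SF_ROLE",
    "SF_WAREHOUSE",
    "SF_DATABASE",
    "SF_SCHEMA",
)


def missing_variables(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_variables()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If you export these values from a POSIX shell, single-quote any value containing `$`, since an unquoted or double-quoted `$$` expands to the shell's process ID.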
Cost Associated
Be aware that running cuallee checks in Snowpark incurs cloud costs.
In Snowflake
The following how-to guide explains the steps to configure cuallee directly in Snowflake:
is_complete
It validates the completeness attribute of a data set. It confirms that a column does not contain null values.
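As a rough illustration of what the rule measures (not cuallee's internal implementation), completeness can be read as the share of non-null values in a column. The function name `completeness` is hypothetical, and treating an empty column as fully complete is an assumption made for this sketch.

```python
def completeness(values):
    """Return (rows, violations, pass_rate) for a list of column values."""
    rows = len(values)
    # A violation is any null (None) value in the column.
    violations = sum(1 for value in values if value is None)
    # Assumption: an empty column counts as fully complete.
    pass_rate = (rows - violations) / rows if rows else 1.0
    return rows, violations, pass_rate


print(completeness(list(range(10))))                 # (10, 0, 1.0)
print(completeness(list(range(8)) + [None, None]))   # (10, 2, 0.8)
```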
In this example, we validate that the column ID does not have any missing values.
import os
from snowflake.snowpark import Session
from cuallee import Check
settings = {
"account": os.getenv("SF_ACCOUNT"),
"user": os.getenv("SF_USER"),
"password": os.getenv("SF_PASSWORD"),
"role": os.getenv("SF_ROLE"),
"warehouse": os.getenv("SF_WAREHOUSE"),
"database": os.getenv("SF_DATABASE"),
"schema": os.getenv("SF_SCHEMA"),
}
snowpark = Session.builder.configs(settings).create()
df = snowpark.range(10)
check = Check()
check.is_complete("ID")
# Validate
check.validate(df).show()
output:
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|"ID" |"TIMESTAMP" |"CHECK" |"LEVEL" |"COLUMN" |"RULE" |"VALUE" |"ROWS" |"VIOLATIONS" |"PASS_RATE" |"PASS_THRESHOLD" |"STATUS" |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|1 |2024-05-18 20:47:28 |cuallee.check |WARNING |ID |is_complete |N/A |10 |0.0 |1.0 |1.0 |PASS |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
In this example, we intentionally place 2 null values in the DataFrame, which produces a FAIL result for the check.
import os
from snowflake.snowpark import Session
from cuallee import Check
settings = {
"account": os.getenv("SF_ACCOUNT"),
"user": os.getenv("SF_USER"),
"password": os.getenv("SF_PASSWORD"),
"role": os.getenv("SF_ROLE"),
"warehouse": os.getenv("SF_WAREHOUSE"),
"database": os.getenv("SF_DATABASE"),
"schema": os.getenv("SF_SCHEMA"),
}
snowpark = Session.builder.configs(settings).create()
df = snowpark.range(8).union_all(snowpark.create_dataframe([[None], [None]], ["ID"]))
check = Check()
check.is_complete("ID")
# Validate
check.validate(df).show()
output:
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|"ID" |"TIMESTAMP" |"CHECK" |"LEVEL" |"COLUMN" |"RULE" |"VALUE" |"ROWS" |"VIOLATIONS" |"PASS_RATE" |"PASS_THRESHOLD" |"STATUS" |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|1 |2024-05-18 20:47:28 |cuallee.check |WARNING |ID |is_complete |N/A |10 |2.0 |0.8 |1.0 |FAIL |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
In this example, we reuse the DataFrame with null values from the previous example, but set our tolerance to 0.8 via the pct parameter of the is_complete rule. The check now produces a PASS result despite the 2 null values.
import os
from snowflake.snowpark import Session
from cuallee import Check
settings = {
"account": os.getenv("SF_ACCOUNT"),
"user": os.getenv("SF_USER"),
"password": os.getenv("SF_PASSWORD"),
"role": os.getenv("SF_ROLE"),
"warehouse": os.getenv("SF_WAREHOUSE"),
"database": os.getenv("SF_DATABASE"),
"schema": os.getenv("SF_SCHEMA"),
}
snowpark = Session.builder.configs(settings).create()
df = snowpark.range(8).union_all(snowpark.create_dataframe([[None], [None]], ["ID"]))
check = Check()
check.is_complete("ID", pct=0.8)
# Validate
check.validate(df).show()
output:
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|"ID" |"TIMESTAMP" |"CHECK" |"LEVEL" |"COLUMN" |"RULE" |"VALUE" |"ROWS" |"VIOLATIONS" |"PASS_RATE" |"PASS_THRESHOLD" |"STATUS" |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|1 |2024-05-18 20:47:28 |cuallee.check |WARNING |ID |is_complete |N/A |10 |2.0 |0.8 |0.8 |PASS |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
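The STATUS values in the outputs above are consistent with comparing PASS_RATE against PASS_THRESHOLD. A minimal sketch of that comparison follows, assuming a rule passes when its pass rate is at least the threshold; the `status` helper is hypothetical and not part of cuallee.

```python
def status(pass_rate: float, pass_threshold: float = 1.0) -> str:
    """Assumed decision rule: PASS when the pass rate meets the threshold."""
    return "PASS" if pass_rate >= pass_threshold else "FAIL"


rows, violations = 10, 2
pass_rate = (rows - violations) / rows  # 0.8

print(status(pass_rate))        # FAIL with the default threshold of 1.0
print(status(pass_rate, 0.8))   # PASS once the threshold is lowered to 0.8
```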