A company is developing machine learning (ML) models. A data engineer needs to apply data quality rules to the training data. The company stores the training data in an Amazon S3 bucket. Which solution will meet this requirement with the LEAST operational effort?
A. Create an AWS Lambda function to check data quality and to raise exceptions in the code.
B. Create an AWS Glue DataBrew project for the data in the S3 bucket. Create a ruleset for the data quality rules. Create a profile job to run the data quality rules. Use Amazon EventBridge to run the profile job when data is added to the S3 bucket.
C. Create an Amazon EMR provisioned cluster. Add a Python data quality package.
D. Create AWS Lambda functions to evaluate data quality rules and orchestrate them with AWS Step Functions.
AWS Glue DataBrew (option B) provides a no-code way to define data quality rulesets and run them against data stored in Amazon S3. A profile job evaluates the ruleset, and an Amazon EventBridge rule can start that job automatically when new data is added to the bucket, so the checks run with the least coding and operational effort.
“Use AWS Glue DataBrew to define and run data quality rules on S3 datasets with minimal coding effort. Automate validation by triggering jobs through EventBridge.”
– Ace the AWS Certified Data Engineer - Associate Certification - version 2 - apple.pdf
Chosen Answer: B
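For anyone who wants to see option B in API form, here is a minimal boto3 sketch, not an official solution: it registers the S3 training data as a DataBrew dataset, attaches a data quality ruleset, and creates a profile job that validates the ruleset. The bucket, dataset, job, and role names, the column, and the rule expression are all placeholder assumptions; the EventBridge trigger is only noted in a closing comment.

import boto3

databrew = boto3.client("databrew")

# Placeholder identifiers -- replace with real values.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
BUCKET = "example-ml-training-data"  # hypothetical training-data bucket
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleDataBrewRole"  # hypothetical role

# 1) Register the training data in the S3 bucket as a DataBrew dataset.
databrew.create_dataset(
    Name="training-data",
    Format="CSV",
    Input={"S3InputDefinition": {"Bucket": BUCKET, "Key": "training/"}},
)

# 2) Attach a data quality ruleset to that dataset. The check expression is
#    illustrative; see the DataBrew ruleset docs for the full condition grammar.
dataset_arn = f"arn:aws:databrew:{REGION}:{ACCOUNT_ID}:dataset/training-data"
databrew.create_ruleset(
    Name="training-data-quality",
    TargetArn=dataset_arn,
    Rules=[
        {
            "Name": "feature-value-in-range",
            "CheckExpression": ":col1 between :val1 and :val2",
            "SubstitutionMap": {":col1": "`feature_a`", ":val1": "0", ":val2": "100"},
            "Threshold": {"Value": 100.0, "Type": "GREATER_THAN_OR_EQUAL", "Unit": "PERCENTAGE"},
        }
    ],
)

# 3) Create a profile job that validates the ruleset each time it runs.
databrew.create_profile_job(
    Name="training-data-profile",
    DatasetName="training-data",
    RoleArn=ROLE_ARN,
    OutputLocation={"Bucket": BUCKET, "Key": "dq-profile-output/"},
    ValidationConfigurations=[
        {
            "RulesetArn": f"arn:aws:databrew:{REGION}:{ACCOUNT_ID}:ruleset/training-data-quality",
            "ValidationMode": "CHECK_ALL",
        }
    ],
)

# 4) An EventBridge rule matching the bucket's "Object Created" events can then
#    start this job on every upload, e.g. via a target that calls:
#    databrew.start_job_run(Name="training-data-profile")

The profile output, including the rule validation results, is written under the job's OutputLocation prefix, so the data engineer can review pass/fail status before the data is used for model training.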