High-signal DEA-C01 reference: ingestion patterns (batch/stream/CDC), ETL and orchestration choices, S3 data lakes + Lake Formation governance, Glue Catalog + partitions, Redshift/Athena analytics trade-offs, monitoring/data quality, and security/privacy controls.
Keep this page open while drilling questions. DEA‑C01 rewards “production data platform realism”: correct service selection, replayability/backfills, partitioning/file formats, monitoring and data quality, and governance-by-default.
Quick facts (DEA-C01)
| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 34% • D2 26% • D3 22% • D4 18% |
Fast strategy (what the exam expects)
- If the requirement is replayable + backfillable, design for idempotency, checkpoints, and reprocessing (S3 as a durable landing zone is common).
- If you see "best cost/performance for queries on S3", think Parquet + partitioning + Athena/Redshift Spectrum, not raw CSV scans.
- If you see "govern access to S3 data across services", think Lake Formation + Glue Data Catalog, not just IAM.
- If you see "batch vs streaming", focus on latency, ordering, retention, and operational complexity.
- If you see "audit" or "governance", include CloudTrail, central log storage, and controlled access to logs.
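The replayable-ingest pattern above can be sketched in a few lines: checkpoint each completed unit of work so a re-run (or backfill) skips what's done and safely redoes the rest. This is an illustrative sketch, not an AWS API; the checkpoint file, batch IDs, and `process()` body are all placeholders.

```python
# Sketch: idempotent, checkpointed batch processing so re-runs and
# backfills are safe. All names here are illustrative placeholders.
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # stand-in for durable state (e.g. DynamoDB)

def load_checkpoint() -> set:
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def save_checkpoint(done: set) -> None:
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def process(batch_id: str) -> None:
    # Replace with the real transform; it must be safe to re-run,
    # e.g. write to a deterministic output key and overwrite.
    pass

def run(batch_ids: list) -> list:
    done = load_checkpoint()
    processed = []
    for batch_id in batch_ids:
        if batch_id in done:       # duplicate delivery: skip, no side effects
            continue
        process(batch_id)
        done.add(batch_id)
        save_checkpoint(done)      # checkpoint after each unit of work
        processed.append(batch_id)
    return processed
```

Because `process` is idempotent and progress is checkpointed, replaying the same batch list is harmless, which is exactly what at-least-once delivery demands.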
Final 20-minute recall (exam day)
Cue -> best answer (pattern map)
| If the question says… | Usually best answer |
| --- | --- |
| Replayable ingest and backfills | S3 raw zone + idempotent processing + checkpoints |
| Database replication / CDC | AWS DMS |
| Low-latency event stream analytics | Kinesis Data Streams or MSK (+ Flink when stateful processing is needed) |
| Cheapest ad-hoc SQL on S3 | Athena + Parquet + partition pruning |
| Warehouse-style analytics and mixed-workload SQL | Redshift (plus Spectrum for external S3 data) |
| Cross-engine data permissions on lake data | Lake Formation + Glue Data Catalog |
| Production orchestration with dependencies/retries | MWAA or Step Functions |
| PII discovery in S3 | Amazon Macie |
| Schema discovery and metadata | Glue crawlers + explicit table design where needed |
| Data quality guardrails | In-pipeline checks + quarantine + alerting |
Must-memorize DEA defaults
| Topic | Fast recall |
| --- | --- |
| File format for analytics | Parquet/ORC beats CSV/JSON for scan cost and speed |
| S3 table performance | Partition on query predicates; avoid tiny files |
| Delivery semantics | Most streaming/integration paths are at-least-once |
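Partition pruning works because engines like Athena match query predicates against Hive-style key=value path segments. A minimal sketch of building such a path (bucket and table names are placeholders):

```python
# Sketch: build a Hive-style partition prefix (dt=YYYY-MM-DD) so engines
# can prune partitions on the dt predicate. Names are placeholders.
from datetime import date

def partition_prefix(bucket: str, table: str, dt: date) -> str:
    return f"s3://{bucket}/{table}/dt={dt.isoformat()}/"
```

A query filtering on `dt = '2024-05-01'` then scans only `s3://my-lake/events/dt=2024-05-01/` instead of the whole table.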
A typical scheduled pipeline looks like this:

```mermaid
flowchart LR
  E["EventBridge schedule"] --> W["Workflow start"]
  W --> I["Ingest"]
  I --> V{"Valid?"}
  V -->|yes| T["Transform"]
  V -->|no| Q["Quarantine + alert"]
  T --> C["Catalog/partitions update"]
  C --> P["Publish dataset"]
  P --> N["Notify (SNS)"]
```
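The validate-then-branch step of that flow can be sketched as plain code; the `validate` rule and record shape are illustrative assumptions, and the real branching would live in Step Functions or an Airflow DAG:

```python
# Sketch of the validate/branch portion of the pipeline above.
# The validation rule and record fields are illustrative.
def validate(record: dict) -> bool:
    return "id" in record and "ts" in record

def run_pipeline(records: list) -> dict:
    published, quarantined = [], []
    for rec in records:
        if validate(rec):
            rec = {**rec, "transformed": True}  # Transform step
            published.append(rec)               # Catalog + publish steps
        else:
            quarantined.append(rec)             # Quarantine + alert step
    return {"published": published, "quarantined": quarantined}
```

Bad records are diverted, never dropped silently, so they remain available for inspection and replay.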
High-yield reliability rules:
- Design for retries + duplicates (at-least-once is normal).
- Make steps idempotent (safe re-runs).
- Track freshness/latency SLIs (what matters to users).
What to monitor:
- Cost: scan volume (Athena), cluster usage (EMR/Redshift), data transfer
- Security/audit: access logs, permission changes
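A freshness SLI is simply the age of the latest successful publish compared against an SLO; a minimal, library-free sketch (the SLO value is an assumption):

```python
# Sketch: freshness SLI = now - last successful publish; breach when it
# exceeds the SLO. In production the alarm would be a CloudWatch metric.
from datetime import datetime, timedelta, timezone

def freshness_breached(last_publish: datetime, slo: timedelta,
                       now: datetime) -> bool:
    return (now - last_publish) > slo
```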
Common AWS tooling:
- CloudWatch (metrics/logs/alarms, Logs Insights)
- CloudTrail (API calls; audit)
- Macie (PII discovery; policy violations)
9) Data quality (Domain 3)
Data quality dimensions (memorize)
| Dimension | Example check |
| --- | --- |
| Completeness | Required fields not null |
| Consistency | Same customer_id format across sources |
| Accuracy | Values within expected ranges |
| Integrity | Valid foreign keys / referential relationships |
High-yield pattern: run checks in-pipeline, quarantine bad records, and alert.
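One check per dimension can be expressed as a small rule function; the field names, `C-` prefix convention, and amount range below are illustrative assumptions, not a real schema:

```python
# Sketch: one in-pipeline check per data quality dimension. Field names,
# the "C-" prefix, and the amount range are illustrative assumptions.
def check_record(rec: dict, known_customers: set) -> list:
    failures = []
    if rec.get("email") is None:                          # Completeness
        failures.append("completeness")
    if not str(rec.get("customer_id", "")).startswith("C-"):  # Consistency
        failures.append("consistency")
    if not (0 <= rec.get("amount", -1) <= 10_000):        # Accuracy
        failures.append("accuracy")
    if rec.get("customer_id") not in known_customers:     # Integrity
        failures.append("integrity")
    return failures
```

Records with a non-empty failure list go to quarantine with an alert, matching the pattern above.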
10) Security and governance (Domain 4)
Lake Formation (why it’s a big deal)
Lake Formation helps you manage fine-grained permissions for data in S3 across engines like Athena/EMR/Redshift Spectrum, using a consistent governance model.
Encryption and key points
- Prefer SSE-KMS for S3 and service-level encryption for analytics services.
- Use TLS for encryption in transit.
- Don't log secrets or raw PII; keep logs access-controlled.
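The "don't log raw PII" rule usually means scrubbing messages before they reach log sinks. A minimal sketch that redacts email addresses (the regex is a simple illustration, not a complete PII detector):

```python
# Sketch: redact email addresses from a message before logging it.
# A real pipeline would combine this with Macie findings and broader rules.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(msg: str) -> str:
    return EMAIL.sub("[REDACTED]", msg)
```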
Audit readiness checklist
- CloudTrail enabled and centralized (optionally CloudTrail Lake for queries)
- CloudWatch Logs retention + encryption set
- Access to logs is restricted (separation of duties)
- Data sharing has explicit approvals and is traceable
Next steps
- Use Resources to stay anchored to the official exam guide and core analytics docs.
- Use the FAQ to confirm expected depth, candidate profile, and service coverage.
- Turn your weak rows into replayable scenario prompts and drill them under time.