How Drizzle Cube Works
This technical deep-dive explains how Drizzle Cube transforms semantic queries into optimized SQL and executes them safely at scale. Understanding this architecture will help you build better cubes, optimize performance, and debug complex queries.
Overview: The Challenge
Section titled “Overview: The Challenge”Analytics queries are fundamentally different from transactional queries. They often involve:
- Complex aggregations across multiple tables
- Time-based grouping with various granularities
- Cross-cube relationships that can cause data fan-out
- Dynamic filtering that must be applied at the right query stage
- Security constraints that must be enforced consistently
Drizzle Cube solves these challenges through a sophisticated query planning and execution engine built on top of Drizzle ORM’s type-safe SQL generation.
Query Processing Pipeline
Section titled “Query Processing Pipeline”Let’s examine each phase in detail.
Phase 1: Query Planning
Section titled “Phase 1: Query Planning”The QueryPlanner
analyzes the incoming semantic query to understand which cubes are involved and how they should be joined.
Cube Usage Analysis
Section titled “Cube Usage Analysis”// Example semantic queryconst query = { measures: ["Employees.count", "Productivity.totalLinesOfCode"], dimensions: ["Departments.name"], timeDimensions: [{ dimension: "Productivity.date", granularity: "month" }], filters: [ { member: "Employees.isActive", operator: "equals", values: [true] } ]}
The planner extracts cube names from each query component:
- Measures:
Employees
,Productivity
- Dimensions:
Departments
- Time Dimensions:
Productivity
- Filters:
Employees
Cubes involved: Employees
, Departments
, Productivity
Primary Cube Selection
Section titled “Primary Cube Selection”The planner uses a sophisticated strategy to choose the primary cube:
- Dimension-based selection - Prefer cubes with dimensions (they represent the “grain” of analysis)
- Connectivity check - Ensure the primary cube can reach all other cubes via joins
- Consistency guarantee - Use deterministic ordering to avoid query plan variations
For our example, Employees
would be chosen as primary since it has connectivity to both Departments
(belongsTo) and Productivity
(hasMany).
Join Path Detection
Section titled “Join Path Detection”The planner uses breadth-first search (BFS) to find the shortest join path between cubes:
Each cube defines its relationships in the joins
configuration:
// From examples/hono/cubes.tsemployeesCube = defineCube('Employees', { joins: { Departments: { targetCube: () => departmentsCube, relationship: 'belongsTo', on: [{ source: employees.departmentId, target: departments.id }] }, Productivity: { targetCube: () => productivityCube, relationship: 'hasMany', on: [{ source: employees.id, target: productivity.employeeId }] } }})
Phase 2: Fan-out Prevention
Section titled “Phase 2: Fan-out Prevention”The most critical challenge in multi-cube analytics is the fan-out problem.
The Fan-out Problem
Section titled “The Fan-out Problem”When joining a parent table to a child table in a hasMany
relationship, each parent row gets duplicated for every child row. This causes incorrect aggregation results.
Example: Getting employee count and total lines of code:
-- Naive approach (WRONG)SELECT COUNT(employees.id) as employee_count, SUM(productivity.lines_of_code) as total_linesFROM employeesJOIN productivity ON employees.id = productivity.employee_id
Problem: If Employee #1 has 30 productivity records, they get counted 30 times instead of once!
Employee 1 → Productivity Record 1, 2, 3, ..., 30Employee 2 → Productivity Record 31, 32, 33, ..., 45Employee 3 → Productivity Record 46, 47, 48, ..., 60
Result: employee_count = 90
(should be 3), total_lines = correct
The CTE Solution
Section titled “The CTE Solution”Drizzle Cube solves this using Common Table Expressions (CTEs) for pre-aggregation:
Generated SQL with CTE:
WITH productivity_agg AS ( SELECT employee_id, DATE_TRUNC('month', date) as month, -- Time dimension computed in CTE SUM(lines_of_code) as totalLinesOfCode FROM productivity WHERE organisation_id = $1 AND date >= $2 AND date <= $3 -- Time filters applied in CTE GROUP BY employee_id, DATE_TRUNC('month', date) -- Group by pre-computed time dimension)SELECT departments.name, productivity_agg.month, -- Use pre-computed time dimension from CTE COUNT(employees.id) as employee_count, -- Correct: each employee counted once SUM(productivity_agg.totalLinesOfCode) as total_lines_of_codeFROM employeesLEFT JOIN departments ON employees.department_id = departments.idLEFT JOIN productivity_agg ON employees.id = productivity_agg.employee_idWHERE employees.organisation_id = $1 AND employees.is_active = $4GROUP BY departments.name, productivity_agg.month -- Group by pre-computed valuesORDER BY productivity_agg.month ASC
CTE Detection Logic
Section titled “CTE Detection Logic”The planner automatically detects when CTEs are needed:
// From query-planner.ts:428private planPreAggregationCTEs( cubes: Map<string, Cube<TSchema>>, primaryCube: Cube<TSchema>, joinCubes: QueryPlan<TSchema>['joinCubes'], query: SemanticQuery): QueryPlan<TSchema>['preAggregationCTEs'] {
// Check each join cube for hasMany relationships for (const joinCube of joinCubes) { const hasManyJoinDef = this.findHasManyJoinDef(primaryCube, joinCube.cube.name) if (!hasManyJoinDef) continue // Not hasMany = no fan-out risk
// Check if we have measures from this hasMany cube const measuresFromThisCube = query.measures?.filter(m => m.startsWith(joinCube.cube.name + '.') ) || []
if (measuresFromThisCube.length > 0) { // Create CTE to pre-aggregate this cube's data preAggCTEs.push({ cube: joinCube.cube, cteAlias: `${joinCube.cube.name.toLowerCase()}_agg`, joinKeys: extractedJoinKeys, measures: measuresFromThisCube }) } }}
Phase 3: SQL Generation with Drizzle
Section titled “Phase 3: SQL Generation with Drizzle”The QueryBuilder
generates type-safe SQL using Drizzle ORM’s query builder. Every piece of SQL goes through Drizzle - no string concatenation or manual SQL construction.
Selection Building
Section titled “Selection Building”// From query-builder.ts:57buildSelections(cubes, query, context): Record<string, SQL | AnyColumn> { const selections: Record<string, SQL | AnyColumn> = {}
// Add dimensions with explicit aliases if (query.dimensions) { for (const dimensionName of query.dimensions) { const [cubeName, fieldName] = dimensionName.split('.') const cube = cubeMap.get(cubeName) if (cube && cube.dimensions[fieldName]) { const dimension = cube.dimensions[fieldName] const sqlExpr = resolveSqlExpression(dimension.sql, context) selections[dimensionName] = sql`${sqlExpr}`.as(dimensionName) } } }
// Add measures with aggregation if (query.measures) { for (const measureName of query.measures) { const [cubeName, fieldName] = measureName.split('.') const measure = cube.measures[fieldName] const aggregatedExpr = this.buildMeasureExpression(measure, context) selections[measureName] = sql`${aggregatedExpr}`.as(measureName) } }}
Measure Expression Building
Section titled “Measure Expression Building”Measures can include complex filtering using CASE WHEN
expressions:
// From query-builder.ts:168buildMeasureExpression(measure, context): SQL { let baseExpr = resolveSqlExpression(measure.sql, context)
// Apply measure filters if they exist if (measure.filters && measure.filters.length > 0) { const filterConditions = measure.filters.map(filter => filter(context)) const andCondition = and(...filterConditions)
// Use CASE WHEN for conditional aggregation const caseExpr = this.databaseAdapter.buildCaseWhen([ { when: andCondition, then: baseExpr } ]) baseExpr = caseExpr }
// Apply aggregation function switch (measure.type) { case 'count': return count(baseExpr) case 'sum': return sum(baseExpr) case 'avg': return this.databaseAdapter.buildAvg(baseExpr) // ... other aggregation types }}
Example generated SQL for a filtered measure:
-- For measure: activeCount with filter on employees.active = trueCOUNT(CASE WHEN employees.active = $1 THEN employees.id END) as activeCount
WHERE vs HAVING Clause Separation
Section titled “WHERE vs HAVING Clause Separation”Drizzle Cube correctly separates dimension filters (WHERE) from measure filters (HAVING):
// Dimension filters → WHERE clause (before aggregation)const whereConditions = this.queryBuilder.buildWhereConditions(cubes, query, context)if (whereConditions.length > 0) { drizzleQuery = drizzleQuery.where(and(...whereConditions))}
// Measure filters → HAVING clause (after aggregation)const havingConditions = this.queryBuilder.buildHavingConditions(cubes, query, context)if (havingConditions.length > 0) { drizzleQuery = drizzleQuery.having(and(...havingConditions))}
This ensures filters are applied at the correct stage of query processing.
Phase 4: Query Execution
Section titled “Phase 4: Query Execution”The QueryExecutor
orchestrates the entire process:
// From executor.ts:58async execute(cubes, query, securityContext): Promise<QueryResult> { // 1. Validate query const validation = validateQueryAgainstCubes(cubes, query) if (!validation.isValid) throw new Error(validation.errors.join(', '))
// 2. Create query context with security const context: QueryContext<TSchema> = { db: this.dbExecutor.db, schema: this.dbExecutor.schema, securityContext // Injected into all cube SQL functions }
// 3. Create unified query plan const queryPlan = this.queryPlanner.createQueryPlan(cubes, query, context)
// 4. Build Drizzle query const builtQuery = this.buildUnifiedQuery(queryPlan, query, context)
// 5. Execute with type conversion const numericFields = this.queryBuilder.collectNumericFields(cubes, query) const data = await this.dbExecutor.execute(builtQuery, numericFields)
return { data, annotation: this.generateAnnotations(queryPlan, query) }}
CTE Building
Section titled “CTE Building”When CTEs are needed, they’re built first and referenced in the main query:
// From executor.ts:142private buildPreAggregationCTE(cteInfo, query, context, queryPlan) { const cube = cteInfo.cube const cubeBase = cube.sql(context) // Gets security filtering automatically
// Build CTE selections: join keys + measures + requested dimensions const cteSelections = {}
// Add join key columns for (const joinKey of cteInfo.joinKeys) { cteSelections[joinKey.targetColumn] = joinKey.targetColumnObj }
// Add measures with aggregation for (const measureName of cteInfo.measures) { const [, fieldName] = measureName.split('.') const measure = cube.measures[fieldName] const aggregatedExpr = this.queryBuilder.buildMeasureExpression(measure, context) cteSelections[fieldName] = sql`${aggregatedExpr}`.as(fieldName) }
// Build CTE query with GROUP BY on join keys let cteQuery = context.db .select(cteSelections) .from(cubeBase.from) .where(cubeBase.where) // Security context applied .groupBy(...joinKeyColumns)
return context.db.$with(cteInfo.cteAlias).as(cteQuery)}
Security and Type Safety
Section titled “Security and Type Safety”Security Context Injection
Section titled “Security Context Injection”Every cube’s sql
function receives a security context:
// From examples/hono/cubes.tsemployeesCube = defineCube('Employees', { sql: (ctx: QueryContext<Schema>): BaseQueryDefinition => ({ from: employees, // Security filtering applied automatically to every query where: eq(employees.organisationId, ctx.securityContext.organisationId) })})
This ensures every generated query includes security constraints, preventing data leaks.
SQL Injection Prevention
Section titled “SQL Injection Prevention”All values go through Drizzle’s parameterized query system:
// Filter values are always parameterizedbuildFilterCondition(fieldExpr, operator, values): SQL { const filteredValues = values.map(this.databaseAdapter.convertFilterValue)
switch (operator) { case 'equals': return eq(fieldExpr, filteredValues[0]) // → WHERE field = $1 case 'inDateRange': return and( gte(fieldExpr, normalizedStartDate), // → WHERE field >= $2 lte(fieldExpr, normalizedEndDate) // → AND field <= $3 ) }}
No string concatenation ever occurs - all SQL is built through Drizzle’s type-safe methods.
Performance Optimizations
Section titled “Performance Optimizations”Intelligent Join Ordering
Section titled “Intelligent Join Ordering”The query planner orders joins to minimize intermediate result sets:
- Primary cube provides the base row set
- belongsTo relationships joined first (typically reduces rows)
- hasMany relationships use CTEs to avoid fan-out
Database-Specific Adaptations
Section titled “Database-Specific Adaptations”Different databases have different SQL dialects and performance characteristics:
// PostgreSQL time dimension truncationbuildTimeDimension(granularity, baseExpr): SQL { switch (granularity) { case 'day': return sql`DATE_TRUNC('day', ${baseExpr})` case 'month': return sql`DATE_TRUNC('month', ${baseExpr})` case 'year': return sql`DATE_TRUNC('year', ${baseExpr})` }}
// MySQL time dimension truncationbuildTimeDimension(granularity, baseExpr): SQL { switch (granularity) { case 'day': return sql`DATE(${baseExpr})` case 'month': return sql`DATE_FORMAT(${baseExpr}, '%Y-%m-01')` case 'year': return sql`DATE_FORMAT(${baseExpr}, '%Y-01-01')` }}
Selective Type Conversion
Section titled “Selective Type Conversion”Only numeric fields are converted from strings to numbers in the result processing:
const numericFields = this.queryBuilder.collectNumericFields(cubes, query)const data = await this.dbExecutor.execute(builtQuery, numericFields)
This avoids unnecessary type conversions on string dimensions and time values.
Real-World Example
Section titled “Real-World Example”Let’s trace through a complex query using our example cubes:
Query: “Show monthly productivity by department for active employees”
const query = { measures: ["Employees.count", "Productivity.totalLinesOfCode"], dimensions: ["Departments.name"], timeDimensions: [{ dimension: "Productivity.date", granularity: "month", dateRange: ["2024-01-01", "2024-12-31"] }], filters: [ { member: "Employees.isActive", operator: "equals", values: [true] } ]}
Query Plan:
- Primary Cube:
Employees
(has dimensions + connectivity) - Join Cubes:
Departments
(belongsTo),Productivity
(hasMany) - CTE Required: Yes (hasMany relationship with measures)
Generated SQL:
WITH productivity_agg AS ( -- Pre-aggregate productivity data to prevent fan-out SELECT employee_id, DATE_TRUNC('month', date) as month, SUM(lines_of_code) as totalLinesOfCode FROM productivity WHERE organisation_id = $1 AND date >= $2 AND date <= $3 GROUP BY employee_id, DATE_TRUNC('month', date))SELECT departments.name as "Departments.name", productivity_agg.month as "Productivity.date", COUNT(employees.id) as "Employees.count", SUM(productivity_agg.totalLinesOfCode) as "Productivity.totalLinesOfCode"FROM employeesLEFT JOIN departments ON employees.department_id = departments.idLEFT JOIN productivity_agg ON employees.id = productivity_agg.employee_id AND DATE_TRUNC('month', productivity_agg.month) = productivity_agg.monthWHERE employees.organisation_id = $1 AND employees.is_active = $4GROUP BY departments.name, productivity_agg.monthORDER BY productivity_agg.month ASC
Key points:
- ✅ Security context applied to all tables (
organisation_id = $1
) - ✅ Time filter applied in CTE to reduce data before join
- ✅ No fan-out - each employee counted once per month
- ✅ All values parameterized (
$1
,$2
,$3
,$4
) - ✅ Proper aggregation levels maintained
Conclusion
Section titled “Conclusion”Drizzle Cube’s architecture solves the fundamental challenges of semantic layer analytics:
- Fan-out prevention through intelligent CTE pre-aggregation
- Type safety through Drizzle ORM integration
- Security through consistent context injection
- Performance through optimized query planning
- Flexibility through dynamic SQL generation
This sophisticated system allows developers to focus on defining business logic in cubes while the framework handles the complex query optimization and execution details automatically.
The result is correct, performant, and secure analytics queries that scale from simple dashboards to complex multi-cube analysis scenarios.