Delta Vacuum Configuration - SQL Server Management Guide
Overview
Delta table vacuum configuration is now stored in SQL Server table
[dbo].[DeltaVacuumConfiguration]. This allows dynamic configuration without
code changes.
Current Configuration: 8 managed tables + 17 external tables = 25 total Delta tables per environment
Table Structure
[dbo].[DeltaVacuumConfiguration]
├── ConfigurationId (PK, Identity)
├── TableName (Required)
├── TableType (Required: 'MANAGED' or 'EXTERNAL')
├── Container (Required for EXTERNAL, NULL for MANAGED)
├── Path (Required for EXTERNAL, NULL for MANAGED)
├── RetentionHours (NULL = use environment default)
├── ZOrderColumns (Comma-separated list for OPTIMIZE ZORDER BY)
├── IsActive (1 = active, 0 = disabled)
├── Notes (Optional description)
├── CreatedDate (Auto-generated)
└── ModifiedDate (Auto-updated on changes)
Note: Each environment (PRD/STG/TST/DEV) has its own database containing this table.
How It Works
- Database Per Environment: Each environment has its own database (S0PMOBAI_PRD, S0PMOBAI_STG, etc.)
- SQL Query: Loads all tables where
IsActive = 1from current environment's database - Processing: Tables are processed with their configured or default retention
Common Operations
View Active Configuration
SELECT *
FROM [dbo].[DeltaVacuumConfiguration]
WHERE IsActive = 1
ORDER BY TableType, TableName;
Add a New Managed Table
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_table_name', 'MANAGED', NULL, NULL, NULL, 'ItemCode', 1, 'Description of table');
Add a New External Table
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('TableName', 'EXTERNAL', 'container-name', '/path/to/table', NULL, NULL, 1, 'Description');
-- Note: Add to each environment's database separately
Disable a Table (Don't Delete - Keep History)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET IsActive = 0
WHERE TableName = 'table_to_disable';
Re-enable a Disabled Table
UPDATE [dbo].[DeltaVacuumConfiguration]
SET IsActive = 1
WHERE TableName = 'table_to_enable';
Set Custom Retention for Compliance (30 days)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET RetentionHours = 720,
Notes = 'Compliance requirement - 30 day retention'
WHERE TableName = 'CouponDetails';
-- Note: Apply to PRD database only
Remove Custom Retention (Use Environment Default)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET RetentionHours = NULL
WHERE TableName = 'SomeTable';
View Tables by Type
-- Count by type
SELECT
TableType,
COUNT(*) AS TableCount,
SUM(CASE WHEN IsActive = 1 THEN 1 ELSE 0 END) AS ActiveCount
FROM [dbo].[DeltaVacuumConfiguration]
GROUP BY TableType
ORDER BY TableType;
Find Tables with Custom Retention
SELECT
TableName,
TableType,
RetentionHours,
RetentionHours / 24.0 AS RetentionDays,
Notes
FROM [dbo].[DeltaVacuumConfiguration]
WHERE RetentionHours IS NOT NULL
ORDER BY RetentionHours DESC;
Environment Default Retention
- PRD: 336 hours (14 days)
- STG: 168 hours (7 days)
- TST: 96 hours (4 days)
- DEV: 48 hours (2 days)
Best Practices
1. Never Delete Rows
- Use
IsActive = 0to disable instead of DELETE - Maintains audit history
2. One Database Per Environment
- Each environment has its own database (S0PMOBAI_PRD, S0PMOBAI_STG, etc.)
- Configuration changes apply only to that environment
- Managed tables in hive_metastore are accessible from all environments
3. Replicate Configuration Across Environments
- Use SQL scripts to maintain consistent configuration
- Test changes in DEV database first
- Deploy same scripts to STG, TST, PRD databases
4. Test in DEV First
-- Add to DEV database first
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_table', 'EXTERNAL', 'container', '/path', NULL, NULL, 1, 'Testing');
-- After validation, run same script in STG, TST, PRD databases
5. Document with Notes
- Always populate
Notesfield - Explain why table needs vacuum
- Document any special retention requirements
6. Monitor Modified Dates
-- Recently modified configurations
SELECT TOP 10
TableName,
TableType,
IsActive,
ZOrderColumns,
ModifiedDate,
Notes
FROM [dbo].[DeltaVacuumConfiguration]
ORDER BY ModifiedDate DESC;
Troubleshooting
Table Not Being Vacuumed?
- Check if
IsActive = 1 - Verify you're connected to the correct environment's database
- Check notebook logs for configuration load errors
Wrong Environment Processing Table?
- Verify the JDBC connection string points to correct database
- Check ENVIRONMENT variable in Databricks job configuration
- Ensure database name matches environment (S0PMOBAI_PRD, S0PMOBAI_STG, etc.)
Custom Retention Not Working?
- Ensure
RetentionHoursis an integer (not NULL, not string) - Check notebook logs for "custom: XXXh" indicator
Migration Notes
Old Approach: Hard-coded Python lists in notebook New Approach: SQL Server configuration table
Benefits: ✅ No code changes needed to add/remove tables ✅ Centralized configuration management ✅ Audit trail (ModifiedDate, Notes) ✅ Easy disable/enable without deleting ✅ Environment-specific control ✅ Can manage via SQL queries or admin UI
Security
- Table is in
dboschema - Requires database write permissions to modify
- Read permissions needed for vacuum notebook
- Changes are automatically timestamped
Examples
Seasonal Table (Only Vacuum During Non-Peak)
-- Disable during holiday season (run in PRD database)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET IsActive = 0, Notes = 'Disabled during Black Friday - Re-enable after Nov 30'
WHERE TableName = 'holiday_deals';
-- Re-enable after season (run in PRD database)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET IsActive = 1, Notes = 'Re-enabled after holiday season'
WHERE TableName = 'holiday_deals';
Add Same Table to All Environments
-- Run this same script in each environment's database (DEV, TST, STG, PRD)
-- Managed table
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_managed_table', 'MANAGED', NULL, NULL, NULL, 'ItemCode', 1, 'New table');
-- External table
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_external', 'EXTERNAL', 'container', '/path', NULL, NULL, 1, 'External table');
-- Note: Run in S0PMOBAI_DEV, then S0PMOBAI_TST, S0PMOBAI_STG, S0PMOBAI_PRD
Support
For questions or issues:
- Check notebook Application Insights logs
- Query configuration table for unexpected entries
- Contact Data Engineering team