Skip to main content

Delta Vacuum Configuration - SQL Server Management Guide

Overview

Delta table vacuum configuration is now stored in SQL Server table [dbo].[DeltaVacuumConfiguration]. This allows dynamic configuration without code changes.

Current Configuration: 8 managed tables + 17 external tables = 25 total Delta tables per environment

Table Structure

[dbo].[DeltaVacuumConfiguration]
├── ConfigurationId (PK, Identity)
├── TableName (Required)
├── TableType (Required: 'MANAGED' or 'EXTERNAL')
├── Container (Required for EXTERNAL, NULL for MANAGED)
├── Path (Required for EXTERNAL, NULL for MANAGED)
├── RetentionHours (NULL = use environment default)
├── ZOrderColumns (Comma-separated list for OPTIMIZE ZORDER BY)
├── IsActive (1 = active, 0 = disabled)
├── Notes (Optional description)
├── CreatedDate (Auto-generated)
└── ModifiedDate (Auto-updated on changes)

Note: Each environment (PRD/STG/TST/DEV) has its own database containing this table.

How It Works

  1. Database Per Environment: Each environment has its own database (S0PMOBAI_PRD, S0PMOBAI_STG, etc.)
  2. SQL Query: Loads all tables where IsActive = 1 from current environment's database
  3. Processing: Tables are processed with their configured or default retention

Common Operations

View Active Configuration

SELECT * 
FROM [dbo].[DeltaVacuumConfiguration]
WHERE IsActive = 1
ORDER BY TableType, TableName;

Add a New Managed Table

INSERT INTO [dbo].[DeltaVacuumConfiguration] 
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_table_name', 'MANAGED', NULL, NULL, NULL, 'ItemCode', 1, 'Description of table');

Add a New External Table

INSERT INTO [dbo].[DeltaVacuumConfiguration] 
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('TableName', 'EXTERNAL', 'container-name', '/path/to/table', NULL, NULL, 1, 'Description');

-- Note: Add to each environment's database separately

Disable a Table (Don't Delete - Keep History)

UPDATE [dbo].[DeltaVacuumConfiguration] 
SET IsActive = 0
WHERE TableName = 'table_to_disable';

Re-enable a Disabled Table

UPDATE [dbo].[DeltaVacuumConfiguration] 
SET IsActive = 1
WHERE TableName = 'table_to_enable';

Set Custom Retention for Compliance (30 days)

UPDATE [dbo].[DeltaVacuumConfiguration] 
SET RetentionHours = 720,
Notes = 'Compliance requirement - 30 day retention'
WHERE TableName = 'CouponDetails';

-- Note: Apply to PRD database only

Remove Custom Retention (Use Environment Default)

UPDATE [dbo].[DeltaVacuumConfiguration] 
SET RetentionHours = NULL
WHERE TableName = 'SomeTable';

View Tables by Type

-- Count by type
SELECT
TableType,
COUNT(*) AS TableCount,
SUM(CASE WHEN IsActive = 1 THEN 1 ELSE 0 END) AS ActiveCount
FROM [dbo].[DeltaVacuumConfiguration]
GROUP BY TableType
ORDER BY TableType;

Find Tables with Custom Retention

SELECT 
TableName,
TableType,
RetentionHours,
RetentionHours / 24.0 AS RetentionDays,
Notes
FROM [dbo].[DeltaVacuumConfiguration]
WHERE RetentionHours IS NOT NULL
ORDER BY RetentionHours DESC;

Environment Default Retention

  • PRD: 336 hours (14 days)
  • STG: 168 hours (7 days)
  • TST: 96 hours (4 days)
  • DEV: 48 hours (2 days)

Best Practices

1. Never Delete Rows

  • Use IsActive = 0 to disable instead of DELETE
  • Maintains audit history

2. One Database Per Environment

  • Each environment has its own database (S0PMOBAI_PRD, S0PMOBAI_STG, etc.)
  • Configuration changes apply only to that environment
  • Managed tables in hive_metastore are accessible from all environments

3. Replicate Configuration Across Environments

  • Use SQL scripts to maintain consistent configuration
  • Test changes in DEV database first
  • Deploy same scripts to STG, TST, PRD databases

4. Test in DEV First

-- Add to DEV database first
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_table', 'EXTERNAL', 'container', '/path', NULL, NULL, 1, 'Testing');

-- After validation, run same script in STG, TST, PRD databases

5. Document with Notes

  • Always populate Notes field
  • Explain why table needs vacuum
  • Document any special retention requirements

6. Monitor Modified Dates

-- Recently modified configurations
SELECT TOP 10
TableName,
TableType,
IsActive,
ZOrderColumns,
ModifiedDate,
Notes
FROM [dbo].[DeltaVacuumConfiguration]
ORDER BY ModifiedDate DESC;

Troubleshooting

Table Not Being Vacuumed?

  1. Check if IsActive = 1
  2. Verify you're connected to the correct environment's database
  3. Check notebook logs for configuration load errors

Wrong Environment Processing Table?

  • Verify the JDBC connection string points to correct database
  • Check ENVIRONMENT variable in Databricks job configuration
  • Ensure database name matches environment (S0PMOBAI_PRD, S0PMOBAI_STG, etc.)

Custom Retention Not Working?

  • Ensure RetentionHours is an integer (not NULL, not string)
  • Check notebook logs for "custom: XXXh" indicator

Migration Notes

Old Approach: Hard-coded Python lists in notebook New Approach: SQL Server configuration table

Benefits: ✅ No code changes needed to add/remove tables ✅ Centralized configuration management ✅ Audit trail (ModifiedDate, Notes) ✅ Easy disable/enable without deleting ✅ Environment-specific control ✅ Can manage via SQL queries or admin UI

Security

  • Table is in dbo schema
  • Requires database write permissions to modify
  • Read permissions needed for vacuum notebook
  • Changes are automatically timestamped

Examples

Seasonal Table (Only Vacuum During Non-Peak)

-- Disable during holiday season (run in PRD database)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET IsActive = 0, Notes = 'Disabled during Black Friday - Re-enable after Nov 30'
WHERE TableName = 'holiday_deals';

-- Re-enable after season (run in PRD database)
UPDATE [dbo].[DeltaVacuumConfiguration]
SET IsActive = 1, Notes = 'Re-enabled after holiday season'
WHERE TableName = 'holiday_deals';

Add Same Table to All Environments

-- Run this same script in each environment's database (DEV, TST, STG, PRD)

-- Managed table
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_managed_table', 'MANAGED', NULL, NULL, NULL, 'ItemCode', 1, 'New table');

-- External table
INSERT INTO [dbo].[DeltaVacuumConfiguration]
(TableName, TableType, Container, Path, RetentionHours, ZOrderColumns, IsActive, Notes)
VALUES
('new_external', 'EXTERNAL', 'container', '/path', NULL, NULL, 1, 'External table');

-- Note: Run in S0PMOBAI_DEV, then S0PMOBAI_TST, S0PMOBAI_STG, S0PMOBAI_PRD

Support

For questions or issues:

  1. Check notebook Application Insights logs
  2. Query configuration table for unexpected entries
  3. Contact Data Engineering team