Databricks Community

Prasanna_N · ‎03-17-2025

I have data from March 1 to March 14 in the final inference table, and I have set a 1-week granularity. After the profile and drift tables were generated, I see a window start time like this object:

start:
"2025-02-24T00:00:00.000Z"
end:
"2025-03-03T00:00:00.000Z"

Question: I don't have any February data at all, so why does the window start in February and not on March 1, where my data begins? Please help me understand this, especially how granularity works and how I can make good use of it. What is the best way to get insights from the dashboard?

mark_ott · ‎10-24-2025

The reason your profile or drift table shows a window starting earlier than your actual data date (February 24 instead of March 1) is due to how granularity and time-window alignment work in your monitoring setup.

Why the window starts at February 24

When you set a weekly granularity, the system automatically groups data by calendar week boundaries, not by your dataset’s start date. The timestamps in your output are consistent with the ISO week convention (Monday through Sunday), which many monitoring tools use for weekly windows: the window starts on a Monday at 00:00 UTC.

So when your data begins on March 1, 2025 (Saturday):

The week that includes March 1 actually starts on Monday, February 24, 2025, and ends on Sunday, March 2, 2025.
Therefore, your first monitoring window is labeled as starting on 2025-02-24T00:00:00Z and ending on 2025-03-03T00:00:00Z (the midnight that closes Sunday, March 2), even though no data exists for February 24–28.

This ensures consistent week-to-week comparisons across your system, helping drift detection tools aggregate data uniformly within predefined temporal buckets.
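As a rough illustration of the alignment described above, here is a minimal Python sketch that snaps a timestamp to the Monday-aligned UTC week containing it. The function name `weekly_window` is made up for this example, and the Monday/UTC alignment is an assumption inferred from the timestamps in the question, not a statement of how the monitoring service is implemented.

```python
from datetime import datetime, timedelta, timezone

def weekly_window(ts):
    """Return the (start, end) of the Monday-aligned UTC week containing ts.

    Assumption: weeks start on Monday at 00:00 UTC, as the timestamps
    in the question suggest.
    """
    day_start = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # weekday() returns 0 for Monday, so this steps back to the Monday.
    start = day_start - timedelta(days=day_start.weekday())
    return start, start + timedelta(days=7)

first_record = datetime(2025, 3, 1, tzinfo=timezone.utc)  # a Saturday
start, end = weekly_window(first_record)
print(start.isoformat(), end.isoformat())
# 2025-02-24T00:00:00+00:00 2025-03-03T00:00:00+00:00
```

The printed pair matches the start and end values shown in the question.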

How granularity affects your analysis

Granularity determines how the data is grouped and summarized for drift analysis or profiling:

Daily granularity evaluates drift per day — detailed but can be noisy.
Weekly granularity smooths short-term variance, giving cleaner insight into slow changes or long-term drift.
Monthly granularity aggregates heavily — useful for stable, long-term tracking.

A coarser granularity (such as weekly) enlarges each time window. Because windows are anchored to natural calendar intervals, the first window can start before your earliest record.
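To see how the anchor moves with the granularity, the sketch below computes the window start for the same date under three settings. The helper `window_start` and the granularity labels are used here for illustration only, and the alignment rules (midnight, Monday, first of the month) are assumptions based on the behavior described in this thread.

```python
from datetime import date, timedelta

def window_start(d, granularity):
    """Return the first day of the window containing d (assumed alignment)."""
    if granularity == "1 day":
        return d
    if granularity == "1 week":
        return d - timedelta(days=d.weekday())  # back to Monday
    if granularity == "1 month":
        return d.replace(day=1)
    raise ValueError(f"unsupported granularity: {granularity}")

first_record = date(2025, 3, 1)
for g in ("1 day", "1 week", "1 month"):
    print(g, "->", window_start(first_record, g))
# 1 day -> 2025-03-01
# 1 week -> 2025-02-24
# 1 month -> 2025-03-01
```

Only the weekly setting produces a start before March 1, because March 1, 2025 happens to be both a month boundary and a Saturday.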

Best practices for interpreting dashboard insights

To make the best use of your drift/profiling dashboard:

Confirm window alignment: Understand that windows cover fixed periods (weeks, months) defined by system rules, not your dataset boundaries.
Filter empty windows: Treat windows with few or no records with caution; their drift metrics may be missing or too unstable to interpret.
Use weekly or daily granularity depending on data volume:
- For continuous, high-frequency model input — daily granularity gives precision.
- For moderate datasets (like yours) — weekly granularity balances stability and detectability.
Leverage feature-level drift metrics: Focus on which input features are showing the largest change in distribution between windows; these usually explain model degradation or instability.

In short, the February 24 start time appears because your chosen one-week granularity anchors to full calendar weeks rather than your actual data start date. This is normal and ensures consistent comparison across all future monitoring windows.

AbhayPSingh · ‎10-24-2025

More or less repeating what Mark said and adding some additional thoughts.

Why the Window Starts from February 24

You're seeing a window starting from February 24 (even though your data starts March 1) because monitoring systems align time windows to standard calendar boundaries rather than your data's actual start date.

When you set a 1-week granularity:

- The system creates windows aligned to calendar weeks (typically Monday-Sunday or Sunday-Saturday)

- March 1, 2025 is a Saturday

- The calendar week containing March 1 actually starts on February 24 (Monday) and ends on March 2 (Sunday)

- This is why your first window shows: February 24 00:00:00 to March 3 00:00:00

How Granularity Works

Granularity in monitoring systems determines:

1. Window Size: The time period for aggregating metrics

- 1 week = 7-day windows

- Windows are fixed to calendar boundaries

2. Alignment: Windows snap to standard intervals

- Weekly: Aligns to start of week (Monday or Sunday)

- Daily: Aligns to midnight

- Monthly: Aligns to first of month

3. Coverage: Each window includes all data points within that time range

- Your March 1-2 data falls into the Feb 24 - Mar 2 window

- March 3-9 data goes into the next window

- March 10-14 data goes into a third window
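The coverage described above can be checked with a short Python sketch that assigns each day from March 1 to March 14 to its week and counts the days per window. It assumes Monday-aligned weeks, and `week_start` is a helper written for this example.

```python
from collections import Counter
from datetime import date, timedelta

def week_start(d):
    """Monday of the week containing d (assumed alignment)."""
    return d - timedelta(days=d.weekday())

# One entry per calendar day from March 1 through March 14, 2025.
days = [date(2025, 3, 1) + timedelta(days=i) for i in range(14)]
days_per_window = Counter(week_start(d) for d in days)

for start in sorted(days_per_window):
    end = start + timedelta(days=6)
    print(f"{start} to {end}: {days_per_window[start]} day(s) with data")
# 2025-02-24 to 2025-03-02: 2 day(s) with data
# 2025-03-03 to 2025-03-09: 7 day(s) with data
# 2025-03-10 to 2025-03-16: 5 day(s) with data
```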

Best Practices for Granularity Selection

Choose granularity based on:

1. Data Volume

- High volume (1000s/day): Use daily or weekly

- Medium volume (100s/day): Use weekly or bi-weekly

- Low volume (<100/day): Use monthly

2. Change Detection Needs

- Rapid drift detection: Use daily

- Stable patterns: Use weekly/monthly

- Seasonal patterns: Match the seasonality period

3. Business Requirements

- Real-time monitoring: Daily or shorter

- Trend analysis: Weekly/monthly

- Reporting cycles: Align with business reporting

Getting Insights from the Dashboard

To maximize dashboard value:

1. Focus on Drift Metrics

- Look for sudden spikes in drift scores

- Compare consecutive windows for trends

- Set alerts for significant drift thresholds

2. Analyze Feature Statistics

- Monitor mean/median shifts

- Check distribution changes (histograms)

- Track null rates and data quality

3. Time-based Patterns

- Compare weekday vs weekend patterns

- Identify seasonal trends

- Look for gradual vs sudden changes

4. Actionable Insights

- Prioritize features with highest drift

- Correlate drift with model performance

- Document when/why drift occurs
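As one way to prioritize features by drift, the sketch below ranks per-feature drift scores for a single window and keeps those above a threshold. The feature names, scores, and threshold are invented for illustration and do not come from any real drift table; `flag_drift` is a hypothetical helper.

```python
def flag_drift(scores, threshold):
    """Return (feature, score) pairs above threshold, highest score first."""
    flagged = [(name, s) for name, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical drift scores for one window; replace with your own metrics.
scores = {"age": 0.04, "income": 0.31, "region": 0.12}
print(flag_drift(scores, threshold=0.1))
# [('income', 0.31), ('region', 0.12)]
```

A suitable threshold depends on the drift metric and the data, so it is worth calibrating against windows where the model is known to behave well.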

Recommendations for Your Setup

Given your March 1-14 data with weekly granularity:

1. Expect 3 windows:

- Feb 24 - Mar 2 (contains Mar 1-2 data)

- Mar 3 - Mar 9 (full week of data)

- Mar 10 - Mar 16 (contains Mar 10-14 data)

2. Consider adjusting granularity if:

- You need faster drift detection → Use daily

- You have limited data → Use bi-weekly

- You want smoother trends → Use monthly

3. Handle partial windows:

- First/last windows may have less data

- Consider minimum data thresholds for reliable metrics

- Document expected vs actual data coverage
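For the partial-window point, a minimal sketch: compute what fraction of each weekly window overlaps the actual data range and flag windows below a minimum. The `coverage` helper and the 0.5 cutoff are assumptions chosen for this example, not values recommended by any tool.

```python
from datetime import date, timedelta

def coverage(window_start, data_first, data_last, window_days=7):
    """Fraction of the window's days that fall inside [data_first, data_last]."""
    window_end = window_start + timedelta(days=window_days - 1)
    overlap_start = max(window_start, data_first)
    overlap_end = min(window_end, data_last)
    overlap_days = max(0, (overlap_end - overlap_start).days + 1)
    return overlap_days / window_days

MIN_COVERAGE = 0.5  # hypothetical cutoff; tune for your own data
data_first, data_last = date(2025, 3, 1), date(2025, 3, 14)

for ws in (date(2025, 2, 24), date(2025, 3, 3), date(2025, 3, 10)):
    frac = coverage(ws, data_first, data_last)
    status = "ok" if frac >= MIN_COVERAGE else "partial, interpret with care"
    print(f"window starting {ws}: {frac:.0%} covered ({status})")
```

With the dates from the question, only the first window (2 of 7 days) falls below the cutoff.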

The February start date in your window is completely normal behavior - it's the system ensuring consistent, comparable time windows aligned to calendar boundaries rather than your data boundaries.