# Implementation Notes: Metadata Features
## Overview
This document captures lessons learned from implementing auto-generated metadata features for Redis documentation pages, including:
- Table of Contents (TOC) metadata
- Per-language identifiers and code examples
- Per-codetabs metadata with language/client mappings
- Metadata deduplication with location tracking
These insights should help guide future metadata feature implementations.
## Key Lessons
### 1. Start with Hugo's Built-in Functions
**Lesson**: Always check what Hugo provides before building custom solutions.
**Context**: Initial attempts tried to manually extract headers from page content using custom partials. This was complex, error-prone, and required parsing HTML/Markdown.
**Solution**: Hugo's `.TableOfContents` method already generates HTML TOC from page headings. Using this as the source was much simpler and more reliable.
**Takeaway**: For future metadata features, audit Hugo's built-in methods first. They often solve 80% of the problem with minimal code.
### 2. Regex Substitution for Format Conversion
**Lesson**: Simple regex transformations can convert between formats more reliably than complex parsing.
**Context**: Converting HTML to JSON seemed like it would require a full HTML parser or complex state machine.
**Solution**: Breaking the conversion into small, sequential regex steps:
1. Remove wrapper elements (``)
2. Replace structural tags (`
` → `[`, `
` → `]`)
3. Replace content tags (`
TITLE` → `{"id":"ID","title":"TITLE"`)
4. Add structural elements (commas, nested arrays)
**Takeaway**: For format conversions, think in terms of sequential substitution patterns rather than parsing. This is often simpler and more maintainable.
### 3. Hugo Template Whitespace Matters
**Lesson**: Hugo template whitespace and comments generate output that affects final formatting.
**Context**: Generated JSON had many blank lines, making it less readable.
**Solution**: Use Hugo's whitespace trimming markers (`{{-` and `-}}`) to prevent unwanted newlines.
**Takeaway**: When generating structured output (JSON, YAML), always consider whitespace. Test the final output, not just the template logic.
### 4. Markdown Templates Have Different Processing Rules
**Lesson**: Hugo's markdown template processor (`.md` files) behaves differently from HTML templates.
**Context**: Initial attempts to include metadata in markdown output failed because the template processor treated code blocks as boundaries.
**Solution**: Place metadata generation in the template itself, not in content blocks. Use `safeHTML` filter to prevent HTML entity escaping.
**Takeaway**: When targeting multiple output formats, test each format separately. Markdown templates have unique constraints that HTML templates don't have.
### 5. Validate Against Schema Early
**Lesson**: Create the schema before or immediately after implementation, not after.
**Context**: Schema was created last, after implementation was complete.
**Better approach**: Define the schema first, then implement to match it. This:
- Clarifies the target structure
- Enables validation during development
- Provides documentation for implementers
- Helps catch structural issues early
**Takeaway**: For future metadata features, write the schema first as a specification.
### 6. Centralize Configuration in `config.toml`
**Lesson**: Language and client identifiers should be centralized in configuration, not hardcoded in templates.
**Context**: When implementing per-language metadata, we initially considered hardcoding language/client mappings in templates. This would have been error-prone and difficult to maintain.
**Solution**: Created a centralized `clientsConfig` in `config.toml` with:
```toml
[params.clientsConfig.Python]
langId = "python"
clientId = "redis-py"
clientName = "redis-py"
```
Then referenced this in templates via `index $.Site.Params.clientsConfig $tabTitle`.
**Takeaway**: For metadata that maps display names to stable identifiers, use `config.toml` as the single source of truth. This enables:
- Easy updates without template changes
- Consistency across all pages
- Clear documentation of all supported languages/clients
- Reusability across multiple templates
### 7. Use Data Attributes for Per-Element Metadata
**Lesson**: For metadata that applies to individual DOM elements (not page-level), use data attributes instead of separate metadata blocks.
**Context**: When implementing per-codetabs metadata, we could have created separate metadata blocks for each codetabs container. Instead, we used a `data-codetabs-meta` attribute on the container itself.
**Solution**: Store JSON metadata directly in data attributes:
```html
```
**Benefits**:
- Single source of truth per element
- No duplication across panels
- Easy runtime access via `element.getAttribute()`
- Scales well with multiple instances on same page
- Reduces overall page size vs. separate metadata blocks
**Takeaway**: For element-level metadata, prefer data attributes over separate metadata blocks. This is more efficient and easier to access at runtime.
### 8. Clarify Duplicate Metadata with Location Fields
**Lesson**: When metadata is duplicated in multiple locations, explicitly mark which is primary and which is fallback.
**Context**: We embed page metadata in both `` (script tag) and `` (hidden div) for redundancy. Without clear marking, downstream tools couldn't determine which to use.
**Solution**: Added two fields to every metadata instance:
- `location`: "head" or "body" - indicates where this copy is located
- `duplicateOf`: "head:data-ai-metadata" - references the primary copy (only in duplicates)
**Benefits**:
- Eliminates confusion for downstream tooling
- Enables smart caching (use head, skip body)
- Supports fallback logic (if head unavailable, use body)
- Documents precedence clearly
- Minimal overhead (just 2 small fields)
**Takeaway**: When duplicating metadata for redundancy, always include location markers. This enables intelligent handling by tools and AI agents.
### 9. Document Metadata Precedence Explicitly
**Lesson**: When multiple metadata sources exist, document which takes precedence and why.
**Context**: With head and body metadata, per-codetabs metadata, and per-panel attributes, tools need to know which to use.
**Solution**: Added a "Metadata Precedence" section to documentation that clearly states:
1. Prefer head metadata (primary, efficient)
2. Use body as fallback (if head unavailable)
3. Check `duplicateOf` field (indicates duplicate)
**Takeaway**: Always document metadata precedence explicitly. This prevents tools from making incorrect assumptions and enables consistent behavior across different implementations.
### 10. Test Multiple Page Types
**Lesson**: Metadata features must work across different page types with different content.
**Context**: Implementation was tested on data types pages and command pages, which have different metadata fields.
**Takeaway**: Always test on at least 2-3 different page types to ensure the feature is robust and handles optional fields correctly.
### 11. Document Optional Metadata Fields Thoroughly
**Lesson**: Optional metadata fields require clear documentation about when to use them and what they mean.
**Context**: The `buildsUpon` field was added to code examples to indicate learning progression, but without clear guidance, content authors didn't know when to use it.
**Solution**: Created comprehensive documentation including:
- Field definition in PAGE_METADATA_FORMAT.md
- Usage guidance in tcedocs/README.md with patterns and best practices
- AI agent guide explaining how to consume the metadata
- Validation rules specifying constraints and error handling
**Takeaway**: For optional metadata fields, provide:
1. **What it is**: Clear definition and purpose
2. **When to use it**: Specific guidance on when the field should be present
3. **How to use it**: Examples and patterns
4. **How to consume it**: Guidance for downstream tools and AI agents
5. **Validation rules**: Constraints and error handling
### 12. Provide Multiple Documentation Layers
**Lesson**: Different audiences need different documentation.
**Context**: The `buildsUpon` feature needed documentation for:
- Content authors (when/how to use it)
- AI agents (how to consume and use the metadata)
- Build system (validation rules and constraints)
**Solution**: Created separate documentation files:
- `PAGE_METADATA_FORMAT.md` - Metadata structure and examples
- `tcedocs/README.md` - Content author guide with patterns
- `BUILDSUPON_AI_AGENT_GUIDE.md` - AI agent consumption patterns
- `BUILDSUPON_VALIDATION_RULES.md` - Validation rules and constraints
**Takeaway**: For complex features, create multiple documentation files targeting different audiences:
- **Specification docs** for implementers
- **User guides** for content authors
- **Integration guides** for downstream tools
- **Validation docs** for build systems
## Implementation Checklist for Future Metadata Features
When implementing new metadata features, follow this order:
### Phase 1: Planning & Configuration
1. **Identify the metadata scope**
- Is this page-level or element-level metadata?
- Will it be duplicated across multiple locations?
- Does it need to map display names to stable identifiers?
2. **Centralize configuration** (if needed)
- Add mappings to `config.toml` under `params`
- Use consistent naming conventions
- Document all supported values
3. **Define the schema** (`static/schemas/feature-name.json`)
- Specify required and optional fields
- Use JSON Schema Draft 7
- Include examples
- If duplicating metadata, include `location` and `duplicateOf` fields
### Phase 2: Documentation
4. **Create documentation** (`for-ais-only/metadata_docs/FEATURE_NAME_FORMAT.md`)
- Explain the purpose and structure
- Show examples for different scenarios
- Document embedding locations (HTML, Markdown, data attributes)
- If multiple metadata sources exist, document precedence clearly
- Include usage examples for downstream tools
### Phase 3: Implementation
5. **Implement the feature**
- For page-level metadata: Create/modify Hugo partials
- For element-level metadata: Use data attributes on the element
- Test on multiple page types
- Verify output in both HTML and Markdown formats
6. **Handle optional fields gracefully**
- Use Hugo's `if` statements to only include fields when present
- Test on pages with and without optional metadata
### Phase 4: Validation & Documentation
7. **Validate the output**
- Write validation scripts
- Test against the schema
- Check whitespace and formatting
- Verify on multiple page types
8. **Document implementation notes**
- Capture lessons learned
- Note any workarounds or gotchas
- Provide guidance for future similar features
- Update this file with new insights
## Common Gotchas
### Template & Output Issues
- **HTML entity escaping**: Use `safeHTML` filter when outputting HTML/JSON in markdown templates
- **Whitespace in templates**: Use `{{-` and `-}}` to trim whitespace
- **Nested structures**: Test deeply nested content to ensure regex patterns handle all cases
- **Optional fields**: Remember that not all pages have all metadata fields
- **Markdown vs HTML**: Always test both output formats
### Metadata Design Issues
- **Hardcoded identifiers**: Don't hardcode language/client mappings in templates - use `config.toml`
- **Duplicate metadata confusion**: Always include `location` and `duplicateOf` fields when duplicating metadata
- **Missing precedence documentation**: Tools won't know which metadata to use without explicit precedence guidance
- **Element-level metadata in separate blocks**: Use data attributes instead of separate metadata blocks for element-level metadata
- **Inconsistent naming**: Use stable identifiers (langId, clientId) separate from display names (id, clientName)
### Testing Issues
- **Single page type testing**: Test on at least 2-3 different page types (command pages, guide pages, etc.)
- **Missing optional fields**: Test pages that don't have all optional metadata fields
- **Large nested structures**: Test with deeply nested content (e.g., multi-level TOC, many code examples)
- **Multiple instances**: Test pages with multiple instances of the same metadata type
## Complete Metadata Architecture
The Redis documentation now has a comprehensive, multi-layered metadata system:
### Layer 1: Page-Level Metadata (Primary)
- **Location**: `