feat(docs): Adding models + APIs for context base V1 #15191
Introducing Documents in DataHub (Context)
This PR introduces a new Document entity to DataHub, enabling users to create, manage, and organize first-party knowledge base content directly within the platform. Documents can be hierarchically organized, linked to data assets, and managed through a complete lifecycle including draft/publish workflows.
Core Data Models
Introduces comprehensive metadata models for the Document entity in DataHub:
Entity Definition
document entity with key aspect documentKey and search capabilities
Core Aspects (PDL Models)
DocumentKey - Unique identifier for documents
DocumentInfo - Primary aspect containing:
  draftOf field
DocumentContents - Text content storage
DocumentStatus & DocumentState - Publication state management
DocumentSource - Tracking external sources for third-party integrations
ParentDocument, RelatedAsset, RelatedDocument - Relationship models
DraftOf - Draft-to-published document linking
GraphQL APIs
Comprehensive GraphQL API surface in knowledge.graphql:
Mutations
createDocument - Create new documents with content, relationships, and hierarchy
updateDocumentContents - Update document text and title
updateDocumentRelatedEntities - Manage relationships to assets and other documents
moveDocument - Relocate documents within the hierarchy
deleteDocument - Remove documents and their references
updateDocumentStatus - Toggle between PUBLISHED/UNPUBLISHED states
mergeDraft - Merge draft content into published document with optional draft deletion
Queries
document(urn) - Fetch document by URN with full metadata
searchDocuments - Hybrid semantic search with rich filtering
Special Features
drafts field - Lists all draft versions of a published document
changeHistory field - Chronological audit log of document modifications, with support for content changes, parent changes (moves), relationship changes, state changes, etc.
Authorization & Privileges
New Platform Privilege
MANAGE_DOCUMENTS - Platform-level privilege for managing all documents
Entity-Level Privileges
Documents support standard DataHub entity privileges:
VIEW_ENTITY_PAGE / GET_ENTITY - View document
EDIT_ENTITY_DOCS / EDIT_ENTITY - Edit document content
CREATE_ENTITY - Create documents
EDIT_ENTITY_OWNERS - Manage ownership
EDIT_ENTITY_DOMAINS - Assign domains
SHARE_ENTITY - Share documents
EDIT_ENTITY_PROPERTIES - Edit structured properties
Authorization Logic
canCreateDocument() - Requires CREATE_ENTITY for documents or MANAGE_DOCUMENTS
canEditDocument() - Requires EDIT_ENTITY_DOCS, EDIT_ENTITY, or MANAGE_DOCUMENTS
canGetDocument() - Requires VIEW_ENTITY_PAGE or MANAGE_DOCUMENTS
canDeleteDocument() - Requires delete authorization or MANAGE_DOCUMENTS
Backend Services
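To make the rules listed under Authorization Logic concrete, here is a minimal Python sketch of the four checks. It is an illustration only, not the code in this PR: the real checks live in the Java backend, the snake_case function names and the `_has_any` helper are hypothetical, and `DELETE_ENTITY` is an assumed stand-in for the unnamed "delete authorization".

```python
from typing import Set

# Privilege names below come from the PR description, except DELETE_ENTITY,
# which is an assumed placeholder for the unnamed "delete authorization".
MANAGE_DOCUMENTS = "MANAGE_DOCUMENTS"


def _has_any(granted: Set[str], accepted: Set[str]) -> bool:
    """Return True if the actor holds at least one accepted privilege."""
    return bool(granted & accepted)


def can_create_document(granted: Set[str]) -> bool:
    # CREATE_ENTITY (for documents) or the platform-level override.
    return _has_any(granted, {"CREATE_ENTITY", MANAGE_DOCUMENTS})


def can_edit_document(granted: Set[str]) -> bool:
    return _has_any(granted, {"EDIT_ENTITY_DOCS", "EDIT_ENTITY", MANAGE_DOCUMENTS})


def can_get_document(granted: Set[str]) -> bool:
    return _has_any(granted, {"VIEW_ENTITY_PAGE", MANAGE_DOCUMENTS})


def can_delete_document(granted: Set[str]) -> bool:
    # Hypothetical privilege name; the description only says "delete authorization".
    return _has_any(granted, {"DELETE_ENTITY", MANAGE_DOCUMENTS})
```

In this sketch MANAGE_DOCUMENTS acts as an override in every check, which mirrors how each rule in the list ends with "or MANAGE_DOCUMENTS".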
DocumentService
Complete service layer implementation in metadata-service/services
Timeline Support
DocumentInfoChangeEventGenerator - Generates change events for audit history
Factory Beans
DocumentServiceFactory - Spring factory for service instantiation
Test Coverage
Smoke Tests
document_test.py (410 lines) - End-to-end document lifecycle tests
document_draft_test.py (326 lines) - Draft creation, merging, and workflows
document_change_history_test.py (281 lines) - Timeline and change tracking
Unit Tests
DocumentServiceTest.java (486 lines) - Service layer business logic
DocumentMapperTest.java - Type mapping validation
DocumentInfoChangeEventGeneratorTest.java - Timeline event generation
Key Features & Use Cases
This PR lays the foundation for DataHub to become a central knowledge hub, combining first-party documentation with data asset management in a unified platform.
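As a rough illustration of the draft/publish workflow (the draftOf link, the drafts field, and mergeDraft with optional draft deletion), the following Python sketch models it with an in-memory store. Everything here is hypothetical: the `Document` dataclass, the `drafts_of` and `merge_draft` helpers, the dictionary store, and the example URNs are assumptions for illustration, and copying the title alongside the contents is a guess about what "draft content" covers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Document:
    """Hypothetical in-memory stand-in for a Document entity."""
    urn: str
    title: str
    contents: str
    state: str = "UNPUBLISHED"      # PUBLISHED or UNPUBLISHED
    draft_of: Optional[str] = None  # URN of the published document, if a draft


def drafts_of(store: Dict[str, Document], urn: str) -> List[Document]:
    """List every draft that points at the given published document."""
    return [doc for doc in store.values() if doc.draft_of == urn]


def merge_draft(store: Dict[str, Document], draft_urn: str,
                delete_draft: bool = True) -> Document:
    """Copy a draft's content into its published document."""
    draft = store[draft_urn]
    if draft.draft_of is None:
        raise ValueError(f"{draft_urn} is not a draft")
    published = store[draft.draft_of]
    # Assumption: both title and text are carried over by the merge.
    published.title = draft.title
    published.contents = draft.contents
    if delete_draft:
        del store[draft_urn]  # optional draft deletion
    return published
```

A short usage example under the same assumptions:

```python
pub = Document("urn:li:document:guide", "Guide", "v1", state="PUBLISHED")
draft = Document("urn:li:document:guide-draft", "Guide", "v2", draft_of=pub.urn)
store = {pub.urn: pub, draft.urn: draft}
merged = merge_draft(store, draft.urn)  # published document now holds "v2"
```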
Coming in a follow-up PR:
Status
Ready for review.