Context
Migrating content into Content Hub is a multi-step process that involves moving vast amounts of data, assets, metadata, and associated workflows from legacy systems.
This migration must be handled meticulously to avoid data loss, ensure data integrity, and maintain the intended structure and functionality of digital assets. Inadequate planning or testing can lead to significant challenges, including data corruption, metadata mismatches, broken links, and user access issues.
Execution
This recipe is intended to provide a structured approach to ensure a seamless migration process, minimize risk, and align the migrated data and assets with the new Content Hub environment. The objective is to ensure that all assets are successfully transferred, retain their original structure and metadata, and function as expected within Sitecore Content Hub, facilitating a smooth transition for users and supporting organizational goals.
Key phases include pre-migration preparation, data mapping, migration execution, and post-migration testing, all designed to streamline tasks, minimize errors, and maintain content quality and organization.
It also incorporates strategies for addressing unforeseen challenges during migration and establishes procedures to ensure ongoing data integrity and optimization in the Content Hub environment.
1. Pre-Migration Assessment and Planning
- Conduct an inventory of assets, metadata, and associated workflows in the legacy system. Document existing data sources, file types, the number of files and versions, and the total data volume.
- Determine which data will be brought to which environment. For example, many customers have DEV and QA environments that hold a smaller subset of data than their production environment.
- Define migration scope, including data to be migrated and any non-essential items to be archived.
- Assess data quality and cleanliness in the legacy system to avoid migrating unnecessary or corrupted data.
- Identify the source file location, including a direct link for each individual file. All files to be migrated must be accessible via a direct link, for example in Azure Blob Storage.
- Clean up assets that don’t need to be migrated.
- Collect all file metadata. This should be collected in a spreadsheet that includes all filenames, direct links and any other metadata that should be migrated to Content Hub.
- Establish a migration timeline, including key milestones and roles for the migration team.
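The inventory questions above (how many files, which file types, how much data in total, and whether every file has a direct link) can be answered from the metadata spreadsheet itself. The following is a minimal sketch, assuming the spreadsheet has been saved as CSV with hypothetical column names `filename`, `direct_link`, and `size_bytes`; adjust these to match your own inventory.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical sample inventory; in practice, open the exported CSV file instead.
SAMPLE_CSV = (
    "filename,direct_link,size_bytes\n"
    "hero.JPG,https://example.invalid/assets/hero.JPG,2048\n"
    "brochure.pdf,https://example.invalid/assets/brochure.pdf,4096\n"
    "logo.jpg,,1024\n"
)


def summarize_inventory(rows):
    """Return file count, total size, count per extension, and files lacking a link."""
    by_extension = Counter()
    total_bytes = 0
    missing_links = []
    for row in rows:
        # Normalize the extension so that .JPG and .jpg are counted together.
        extension = PurePosixPath(row["filename"]).suffix.lower() or "(none)"
        by_extension[extension] += 1
        total_bytes += int(row["size_bytes"])
        if not row["direct_link"].strip():
            missing_links.append(row["filename"])
    return {
        "files": len(rows),
        "total_bytes": total_bytes,
        "by_extension": dict(by_extension),
        "missing_links": missing_links,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize_inventory(rows)
```

Files listed under `missing_links` cannot be fetched during import, so they need a direct link or should be removed from scope before migration.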
2. Data Mapping and Metadata Structure
- Develop a data mapping plan to align legacy metadata fields with Sitecore Content Hub metadata fields.
- Define data transformation requirements (e.g., format conversions, taxonomy updates).
- Validate that metadata and tags are mapped accurately to support search functionality and asset discoverability.
- Define the separator for multi-valued attributes, e.g. EU | USA | APAC.
- Map each legacy list value to the correct Content Hub value.
- Record hierarchies with their full path.
- We recommend creating the following properties to assist during migration:
- A BatchID property that numbers your import batches to assist with troubleshooting.
- An additional property that stores the asset's ID from the previous DAM.
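The mapping rules above can be captured in a small transformation step that runs before the import file is built. This is only a sketch under stated assumptions: the legacy field names, the target names `Region` and `LegacyDamId`, the value lookup table, and the `;` legacy separator are all hypothetical placeholders for your own mapping plan.

```python
# Hypothetical mapping from legacy field names to Content Hub member names.
FIELD_MAP = {
    "title": "Title",
    "desc": "Description",
    "regions": "Region",
    "legacy_id": "LegacyDamId",
}

# Hypothetical mapping of legacy list values to the values expected in Content Hub.
VALUE_MAP = {
    "Region": {"Europe": "EU", "United States": "USA", "Asia-Pacific": "APAC"},
}

# Target fields that can hold more than one value.
MULTI_VALUE_FIELDS = {"Region"}


def map_row(legacy_row, batch_id, legacy_separator=";"):
    """Translate one legacy metadata row into Content Hub column names and values."""
    mapped = {"BatchID": batch_id}
    for legacy_name, value in legacy_row.items():
        target = FIELD_MAP.get(legacy_name)
        if target is None:
            continue  # Field is out of migration scope.
        if target in MULTI_VALUE_FIELDS:
            parts = [p.strip() for p in value.split(legacy_separator) if p.strip()]
            lookup = VALUE_MAP.get(target, {})
            # Join the translated values with the pipe separator.
            value = "|".join(lookup.get(p, p) for p in parts)
        mapped[target] = value
    return mapped


example = map_row(
    {
        "title": "Hero",
        "regions": "Europe; United States;Asia-Pacific",
        "legacy_id": "123",
        "internal_note": "not migrated",
    },
    batch_id=7,
)
```

Keeping the mapping in data structures rather than in code makes it easier to review with the business owners of the metadata.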
3. Content Migration Execution
- Select the appropriate migration tools or scripts, considering the complexity and volume of content. Many teams use an Excel file as the source of legacy data and leverage the Content Hub import/export functionality. More detail can be found in Insights below.
- Set up a migration environment to test data transfer in a controlled setting.
- Ensure that permissions and role-based access control (RBAC) settings are properly configured in Content Hub.
- Conduct batch migration tests to identify and address potential issues before the full migration.
- Record migration timings in QA so that you can estimate the duration of the final migration on production.
- Monitor graph during migration.
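The timings recorded in QA can be turned into a rough estimate for production. The sketch below assumes that throughput scales linearly with the number of assets, which may not hold in practice (file sizes, processing load, and environment sizing all differ), so treat the result as a planning figure only. The function name and input shape are illustrative.

```python
def estimate_duration_hours(qa_batches, production_assets):
    """Estimate production migration time from QA batch timings.

    qa_batches is a list of (asset_count, elapsed_seconds) tuples measured in QA.
    """
    total_assets = sum(count for count, _ in qa_batches)
    total_seconds = sum(seconds for _, seconds in qa_batches)
    if total_assets == 0:
        raise ValueError("At least one migrated asset is needed to estimate duration.")
    seconds_per_asset = total_seconds / total_assets
    return production_assets * seconds_per_asset / 3600


# Two QA batches of 500 assets each, taking 900 and 1100 seconds.
hours = estimate_duration_hours([(500, 900), (500, 1100)], production_assets=90_000)
```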
4. Migration Testing and Quality Assurance
- Implement quality assurance checks to validate data accuracy, completeness, and integrity post-migration.
- Test content accessibility, metadata accuracy, and file functionality (e.g., preview, download).
- Verify user roles and permissions to ensure only authorized personnel have access to relevant content.
More information is available in the Testing and Quality Assurance recipe.
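Part of the completeness and accuracy check can be automated by comparing the source spreadsheet with an export taken from Content Hub after migration. The following sketch assumes both have been loaded as lists of dictionaries and that a property holding the previous DAM's ID (here hypothetically named `LegacyDamId`) is present in both; the compared field names are also illustrative.

```python
def reconcile(source_rows, migrated_rows, key="LegacyDamId", fields=("Title",)):
    """Return legacy IDs missing after migration and (ID, field) pairs that differ."""
    migrated_by_key = {row[key]: row for row in migrated_rows}
    missing = [row[key] for row in source_rows if row[key] not in migrated_by_key]
    mismatched = []
    for row in source_rows:
        target = migrated_by_key.get(row[key])
        if target is None:
            continue  # Already reported as missing.
        for field in fields:
            if row.get(field, "") != target.get(field, ""):
                mismatched.append((row[key], field))
    return missing, mismatched


source = [
    {"LegacyDamId": "1", "Title": "Hero"},
    {"LegacyDamId": "2", "Title": "Brochure"},
    {"LegacyDamId": "3", "Title": "Logo"},
]
migrated = [
    {"LegacyDamId": "1", "Title": "Hero"},
    {"LegacyDamId": "2", "Title": "Brochure (draft)"},
]
missing, mismatched = reconcile(source, migrated)
```

A check like this covers metadata only; previews, downloads, and permissions still need to be tested in the Content Hub interface.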
5. Workflow Validation and Asset Usability
- Test workflows and automation in Sitecore Content Hub to ensure they function correctly with migrated content.
- Validate that asset relationships (e.g., dependencies, collections) are maintained post-migration.
- Ensure that all assets and their associated workflows meet project goals and usability expectations.
6. Post-Migration Monitoring and Optimization
- Set up post-migration monitoring to track system performance, identify potential issues, and resolve them promptly.
- Implement a feedback loop to gather user input on usability and functionality of migrated content.
- Develop an optimization plan to address any ongoing content organization, tagging, or workflow issues that emerge after migration.
Insights
Importing and exporting in Content Hub enables asset and asset metadata migration from a legacy system. A Microsoft Excel file is used as the source of legacy data.
In order to successfully migrate data, specific rules must be followed to avoid technical errors that would stop the import. We will now examine those rules and how they affect the structure of the Excel file used to import legacy data into your Content Hub.
Create an Excel Import File
There is a set of ground rules to follow when importing the Excel document into Sitecore Content Hub.
- One entity definition per worksheet: Each entity definition being imported into a Content Hub requires its own worksheet. A worksheet must contain data for only one entity definition.
- The worksheet name is the name of the entity definition being imported: If M.Asset entities are being imported, then the worksheet will be named M.Asset. The import process uses the name of the worksheet to determine which types of entities to create and populate with data.
- Identifiers to ID entities: Use identifiers to reference existing entities in the system.
- Worksheets are imported in the order they appear in the Excel file: This is important when importing relations, as the referenced related item should already be imported and extant in the Content Hub before attempting to import and relate a new entity to it.
- Use the Metadata member’s name, not its label, as the Excel column header: The metadata field names, not the labels, are used for the Excel column headers. This avoids any doubt about which property or relation is being used, as the name is unique per entity definition. When working with a self-relation, you can append :Parent or :Child.
- Multiple values are separated by “|”: When multiple values are possible, as with option lists, taxonomies, or relations, you can separate the values by using the pipe character.
- Multilingual = multiple columns with propertyname#culture: Multilingual properties have one column per enabled portal language. The property name is extended with the culture.
- M.Asset definition requires the “File” column to import your media files: The File column must be present as the first column in the worksheet for the M.Asset definition. It must contain the URL of either a public or authorized link. The import process will use the URL to run a fetch job that will copy the media file into the Content Hub cloud-based blob storage. When this fetch job is complete, a corresponding processing job will begin; its purpose is to assign the metadata stored in the Excel row to the newly created asset and persist it.
- Assign a meaningful, user-determined identifier: By default, if no identifier value is set, the system will create a GUID value and assign it as the default identifier.
- In Column D, the multilingual description is given in English.
- The AssetTypeToAsset relation is set by giving an asset type identifier: the asset types are loaded in the previous worksheet to make sure they are present in Sitecore Content Hub and can be linked to the new entity being created.
- The File column will be used to run a fetch job with the given URL, which is either a public or an authorized link.
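Several of the ground rules above can be checked before the workbook is uploaded. The sketch below works on a plain description of the workbook (an ordered list of worksheet names with their column headers) rather than on the Excel file itself, so it needs no third-party library. The `must_precede` pairs, the M.AssetType worksheet, and the sample headers are assumptions for illustration, and the checks cover only the File column position, duplicate headers, and worksheet order.

```python
def check_workbook(sheets, must_precede=()):
    """Check an import workbook layout.

    sheets is an ordered list of (worksheet_name, header_list) tuples.
    must_precede is a sequence of (earlier_sheet, later_sheet) pairs.
    Returns a list of human-readable problems; an empty list means no issue found.
    """
    problems = []
    names = [name for name, _ in sheets]
    for name, headers in sheets:
        # The File column must be the first column of the M.Asset worksheet.
        if name == "M.Asset" and (not headers or headers[0] != "File"):
            problems.append("M.Asset: 'File' must be the first column")
        duplicates = sorted({h for h in headers if headers.count(h) > 1})
        for header in duplicates:
            problems.append(f"{name}: duplicate column '{header}'")
    # Worksheets are imported in order, so related entities must come first.
    for earlier, later in must_precede:
        if earlier in names and later in names and names.index(earlier) > names.index(later):
            problems.append(f"'{earlier}' must appear before '{later}'")
    return problems


good = [
    ("M.AssetType", ["Identifier", "Label#en-US"]),
    ("M.Asset", ["File", "Identifier", "Title", "Description#en-US", "AssetTypeToAsset"]),
]
bad = [
    ("M.Asset", ["Identifier", "File", "Title", "Title"]),
    ("M.AssetType", ["Identifier"]),
]
order_rules = [("M.AssetType", "M.Asset")]
good_problems = check_workbook(good, order_rules)
bad_problems = check_workbook(bad, order_rules)
```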
Setup Data Export in Content Hub
You will create different export profiles to retrieve the information you are interested in, then enable the export for users directly on the search. When exporting data out of Content Hub with the Excel export feature, you can make use of the export profiles to define which metadata you want to export (e.g., Name, Definition, IsDefault, and Setting).
For the properties, only the name needs to be included. The property type will be taken into account and exported in the appropriate format. The “relations” are a combination of the relation names and properties for export-related entities and profiles.
For assets, you can add an additional setting to control the export of public links as a URL. You can define whether you want to receive all public links from the asset or from the master file.
"publicLinks": { "asset": true, "masterfile": false }
When exporting, you have some additional options to select. The Filename is the name of the file that you will be able to download. The User-friendly column headers switch determines whether a field’s label (rather than its property or relation name) is used as the column header in the exported file. The User-friendly cell values switch determines whether a value’s label is displayed without its full identifier.
Example:
{
  "properties": [ "Title", "Description", "FileName" ],
  "relations": {
    "AssetTypeToAsset": { "exportRelatedEntities": false },
    "FinalLifeCycleStatusToAsset": { "exportRelatedEntities": false },
    "ContentRepositoryToAsset": { "exportRelatedEntities": false }
  },
  "includeSystemProperties": true,
  "publicLinks": { "asset": true, "masterfile": false },
  "version": "1.1"
}
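If you maintain several export profiles, generating them from a list of property and relation names can help keep them consistent. This is a minimal sketch that reproduces the structure of the example above; the function name and its parameters are illustrative and not part of Content Hub, and the output still has to be pasted into the export profile yourself.

```python
import json


def build_export_profile(properties, relations, asset_links=True, masterfile_links=False):
    """Build an export profile dictionary with the same keys as the example above."""
    return {
        "properties": list(properties),
        # Relations are exported without their related entities, as in the example.
        "relations": {name: {"exportRelatedEntities": False} for name in relations},
        "includeSystemProperties": True,
        "publicLinks": {"asset": asset_links, "masterfile": masterfile_links},
        "version": "1.1",
    }


profile = build_export_profile(
    ["Title", "Description", "FileName"],
    ["AssetTypeToAsset", "FinalLifeCycleStatusToAsset", "ContentRepositoryToAsset"],
)
profile_json = json.dumps(profile, indent=2)
```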
For some clients it may make sense to temporarily move active scripts to background during migration, as the metadata is already accurate, while in other instances your scripts may be needed. It is worth reviewing each script to decide whether temporarily disabling it is worthwhile. Remember to re-enable the scripts after the migration is complete.