Ben Ramey's Blog
Scripture, programming problems, solutions and stories.

Iterating All EPiServer Catalog MetaObjects

If you're maintaining an e-commerce-enabled site in EPiServer and you're using the built-in catalog, then you have probably found the need to incorporate external data into the catalog, perhaps from various sources.

Such is the case for me on an EPiServer site I've been working on over the last (almost) four years. The primary data for the catalog comes into EPiServer from an external PIM system. But then there are all sorts of ancillary data that need to be calculated from existing fields or pulled from external sources. These external data sources don't really provide a way to incrementally apply deltas for the data they provide. The only option for making updates is to walk through the entire catalog and see if anything needs updating when matched against the external data I've pulled.

This external data is stored in the catalog as custom EPiServer MetaFields. So, a need arose to easily walk through the entire catalog, pick up every MetaObject (to update the custom MetaFields) in every available catalog culture and make the appropriate updates.

Below are the iterators I wrote to accomplish this. They let you walk the entire catalog tree (in a breadth-first manner) starting from the top as if you were iterating a simple C# array or list.

High-Level Usage

Because all we're doing is writing iterators, the high-level usage looks just like iterating through any IEnumerable in C#, such as an array or a List<T>.

The iterators are implemented as extension methods on the ICatalogSystem interface. So, iterating through all of the catalog entry MetaObjects looks like this.

using Mediachase.Commerce.Catalog;
using Mediachase.MetaDataPlus;

ICatalogSystem catalog = CatalogContext.Current;
MetaDataContext mdc = MetaDataContext.Instance;

foreach ((MetaObject metaObject, MetaDataContext metaDataContext) entryMO in catalog.AllCatalogSystemEntryMetaObjects(mdc))
{
    entryMO.metaObject["SomeCustomMetaFieldName"] = "external data source value here";
    entryMO.metaObject.AcceptChanges(entryMO.metaDataContext);
}

I implemented separate iterators for the NodeContent MetaObjects and the EntryContent MetaObjects. You can easily use them in combination to iterate through the MetaObjects of the entire catalog tree.

Note a couple of things from the example code above:

  1. You have to give the iterator the MetaDataContext. As we'll see in the iterator code, it will copy this MDC for each catalog language so that you get a MetaObject for every culture present in your catalog.
  2. Because of #1 above, the current MetaDataContext is passed to you during iteration so that you know which catalog culture this MetaObject is for.

Extension Method Code

The extension method code for AllCatalogSystemEntryMetaObjects is pretty simple. It uses four iterators in nested foreach loops to loop through:

  1. All catalogs
  2. All nodes
  3. All entries
  4. All meta objects

public static IEnumerable<(MetaObject metaObject, MetaDataContext metaDataContext)> AllCatalogSystemEntryMetaObjects(
    this ICatalogSystem catalogSystem,
    MetaDataContext metaDataContext)
{
    foreach (var catalog in new CatalogSystemCatalogs(catalogSystem))
    {
        foreach (var node in new CatalogSystemNodes(catalogSystem, catalog.CatalogId))
        {
            foreach (var entry in new CatalogSystemEntries(catalogSystem, catalog.CatalogId, node.CatalogNodeId))
            {
                foreach (var tuple in new CatalogSystemEntryMetaObjects(metaDataContext, catalog, entry))
                {
                    yield return tuple;
                }
            }
        }
    }
}

I broke the iteration down into separate iterators in this fashion so that I could easily iterate through all the entries (for example) in a single node, if I knew from the beginning which node I wanted to start at. If that were the case, I could easily write another extension method that takes a node ID (or node code) as a parameter, uses the ICatalogSystem to find that node and then passes the node ID to the CatalogSystemEntries iterator to iterate just the entries under that node.
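A minimal sketch of such a node-scoped extension method, built only from the iterator classes defined later in this post. The class and method names (CatalogSystemNodeScopeExtensions, EntryMetaObjectsFromNode) are hypothetical. To avoid guessing at lookup calls, it assumes you already have the CatalogDto.CatalogRow and the node ID in hand rather than resolving a node code.

```csharp
using System.Collections.Generic;
using Mediachase.Commerce.Catalog;
using Mediachase.Commerce.Catalog.Dto;
using Mediachase.MetaDataPlus;

public static class CatalogSystemNodeScopeExtensions
{
    // Hypothetical helper: not part of EPiServer or of the original code.
    public static IEnumerable<(MetaObject metaObject, MetaDataContext metaDataContext)> EntryMetaObjectsFromNode(
        this ICatalogSystem catalogSystem,
        MetaDataContext metaDataContext,
        CatalogDto.CatalogRow catalog,
        int startingNodeId)
    {
        // With a non-zero starting node ID, CatalogSystemNodes returns that
        // node first and then walks its descendants breadth-first.
        foreach (var node in new CatalogSystemNodes(catalogSystem, catalog.CatalogId, startingNodeId))
        {
            foreach (var entry in new CatalogSystemEntries(catalogSystem, catalog.CatalogId, node.CatalogNodeId))
            {
                foreach (var tuple in new CatalogSystemEntryMetaObjects(metaDataContext, catalog, entry))
                {
                    yield return tuple;
                }
            }
        }
    }
}
```

To visit only the entries directly under one node, drop the outer loop and pass startingNodeId straight to CatalogSystemEntries.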

Iterators

We have four iterators in use:

  1. CatalogSystemCatalogs - iterates through all catalogs
  2. CatalogSystemNodes - iterates through all nodes, in a breadth-first fashion, in a catalog
  3. CatalogSystemEntries - iterates through all entries in a node
  4. CatalogSystemEntryMetaObjects - iterates through all MetaObjects for an entry

CatalogSystemCatalogs Iterator

The code is pretty simple. We implement an IEnumerable which uses our implementation of an IEnumerator.

The IEnumerator simply gets all the catalogs from the ICatalogSystem and, as you call MoveNext() (usually done for you by the foreach loop), increments an index value to give you the current CatalogDto.CatalogRow to work with.

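// usings the iterator classes in this post rely on
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using LanguageExt;
using LanguageExt.UnsafeValueAccess;
using Mediachase.Commerce.Catalog;
using Mediachase.Commerce.Catalog.Dto;
using Mediachase.MetaDataPlus;

public class CatalogSystemCatalogs : IEnumerable<CatalogDto.CatalogRow>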
{
    private readonly CatalogSystemCatalogEnumerator _enumerator;

    public CatalogSystemCatalogs(ICatalogSystem catalogSystem)
    {
        _enumerator = new CatalogSystemCatalogEnumerator(catalogSystem);
    }

    public IEnumerator<CatalogDto.CatalogRow> GetEnumerator() => _enumerator;

    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

public class CatalogSystemCatalogEnumerator : IEnumerator<CatalogDto.CatalogRow>
{
    private int _currentIndex = 0;
    private readonly ICatalogSystem _catalogSystem;
    private Lazy<Lst<CatalogDto.CatalogRow>> _catalogs;

    public CatalogSystemCatalogEnumerator(ICatalogSystem catalogSystem)
    {
        _catalogSystem = catalogSystem;
        _catalogs = new Lazy<Lst<CatalogDto.CatalogRow>>(GetCatalogs);
    }

    public CatalogDto.CatalogRow Current
    {
        get
        {
            return _catalogs.Value[_currentIndex];
        }
    }

    object IEnumerator.Current => Current;

    public void Dispose()
    {
        Reset();
    }

    public bool MoveNext()
    {
        if (_catalogs.IsValueCreated)
        {
            _currentIndex++;
        }

        if (_currentIndex >= _catalogs.Value.Count)
        {
            return false;
        }

        return true;
    }

    public void Reset()
    {
        _catalogs = new Lazy<Lst<CatalogDto.CatalogRow>>(GetCatalogs);
        _currentIndex = 0;
    }

    private Lst<CatalogDto.CatalogRow> GetCatalogs()
        => _catalogSystem.GetCatalogDto().Catalog.Freeze();
}
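
For comparison, the same one-at-a-time, load-on-demand behaviour can also be had from a C# iterator method, where the compiler generates the IEnumerator for you. This is only a sketch of that alternative, not the code used above. LazySequence and its loadAll delegate are hypothetical names, with loadAll standing in for a call like GetCatalogDto() so that the example compiles without EPiServer.

```csharp
using System;
using System.Collections.Generic;

public static class LazySequence
{
    // loadAll is a hypothetical stand-in for the call that fetches the rows
    // (for example, reading the catalogs from the ICatalogSystem).
    public static IEnumerable<T> From<T>(Func<IReadOnlyList<T>> loadAll)
    {
        // Nothing in this body runs until the first MoveNext() call,
        // so the load is deferred just as it is with Lazy<T>.
        IReadOnlyList<T> items = loadAll();

        foreach (T item in items)
        {
            yield return item;
        }
    }
}
```

Each foreach over the returned sequence starts from fresh state and calls loadAll again, so there is no index or Reset() logic to maintain by hand.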

CatalogSystemNodes Iterator

For the nodes iterator, things get a little more complicated. Nodes can be nested, so we have a tree structure to navigate. We use a queue to walk through the nodes in a breadth-first fashion.

I like to use as many functional programming techniques as I can in C# these days. I use the great LanguageExt library for this. This library is where the Que<T>, Option<T> and Lst<T> data structures come from. I don't want to go in depth about what they are, but to understand the code below, you should know this: Que<T> is an immutable Queue<T>, Lst<T> is an immutable List<T> and Option<T> is basically the functional way to do the null-object pattern (instead of dealing with nulls, use an object to represent "I don't have a value").
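
A small sketch of what "immutable" means in practice for these types. It sticks to the LanguageExt calls used in the iterators in this post, plus Prelude.Some, Prelude.List and Lst<T>.Add, and assumes the LanguageExt.Core NuGet package is referenced. The LanguageExtBasics class name is made up for illustration.

```csharp
using System.Linq;
using LanguageExt;

public static class LanguageExtBasics
{
    // Dequeue() does not modify the queue it is called on; it returns a new one.
    public static (int head, int[] original, int[] rest) QueueIsImmutable()
    {
        Que<int> original = Prelude.Queue(1, 2, 3);
        int head = original.Peek();          // reads the front item, changes nothing
        Que<int> rest = original.Dequeue();  // new queue without the front item
        return (head, original.ToArray(), rest.ToArray());
    }

    // Add() likewise returns a new list and leaves the first one untouched.
    public static (int before, int after) ListIsImmutable()
    {
        Lst<int> first = Prelude.List(1, 2);
        Lst<int> second = first.Add(3);
        return (first.Count, second.Count);
    }

    // Option<T> carries either a value (Some) or the absence of one (None).
    public static string Describe(Option<string> maybeName)
        => maybeName.IsSome ? "has a value" : "no value";
}
```

This is why the nodes iterator reassigns its field on every change (_nodeQue = _nodeQue.Dequeue()): the call on its own would leave _nodeQue exactly as it was.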

If you're not familiar with breadth-first tree traversals, it's not too complicated. What we do first is get all the nodes at the top level (in our case, all the nodes that are direct children of the catalog we pass in) and put them in the queue. Then, we iterate through each of the nodes we put in the queue and add their child nodes to the queue. We do this until we run out of nodes to process in the queue which indicates we've gone through all nodes in the tree.

public class CatalogSystemNodes : IEnumerable<CatalogNodeDto.CatalogNodeRow>
{
    private readonly CatalogSystemNodeEnumerator _enumerator;

    public CatalogSystemNodes(ICatalogSystem catalogSystem, int catalogId, int startingParentNodeId = 0)
    {
        _enumerator = new CatalogSystemNodeEnumerator(catalogSystem, catalogId, startingParentNodeId);
    }

    public IEnumerator<CatalogNodeDto.CatalogNodeRow> GetEnumerator() => _enumerator;

    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

public class CatalogSystemNodeEnumerator : IEnumerator<CatalogNodeDto.CatalogNodeRow>
{
    private Que<int> _nodeQue;
    private readonly ICatalogSystem _catalogSystem;
    private readonly int _catalogId;
    private readonly int _startingParentNodeId;
    private Option<CatalogNodeDto.CatalogNodeRow> _current;

    // We have a parameter for the starting node ID (startingParentNodeId) which defaults to 0.  If you
    // pass in a node ID, iteration starts at that node (it is returned first) and continues through its
    // descendants.  If you don't pass a value (keep the default of 0), then we'll iterate through all
    // nodes in the catalog.
    public CatalogSystemNodeEnumerator(ICatalogSystem catalogSystem, int catalogId, int startingParentNodeId = 0)
    {
        _catalogSystem = catalogSystem;
        _catalogId = catalogId;
        _startingParentNodeId = startingParentNodeId;
        _nodeQue = Prelude.Queue(_startingParentNodeId);
        _current = Option<CatalogNodeDto.CatalogNodeRow>.None;
    }

    // ValueUnsafe() lets us get the value of _current (which is an Option<T>)
    // or null if the Option<T> has no value.
    public CatalogNodeDto.CatalogNodeRow Current => _current.ValueUnsafe();

    object IEnumerator.Current => Current;

    public void Dispose()
    {
        Reset();
    }

    public bool MoveNext()
    {
        // end-case scenario--we've run out of nodes in the queue, so we're
        // done traversing the node tree
        if (!_nodeQue.Any())
        {
            return false;
        }

        int nextNodeId = GetNextNodeIdAndUpdateQue();

        // special starting condition when we want to do the entire tree
        // and no other starting node was passed in
        if (nextNodeId == 0)
        {
            // special case of there being a catalog node with no category nodes beneath it
            if (!_nodeQue.Any())
            {
                return false;
            }
            nextNodeId = GetNextNodeIdAndUpdateQue();
        }

        _current = GetNode(nextNodeId);

        return _current.IsSome;
    }

    private int GetNextNodeIdAndUpdateQue()
    {
        // get the next node ID from the queue
        int nextNodeId = _nodeQue.Peek();
        // remove the retrieved node from the queue
        _nodeQue = _nodeQue.Dequeue();
        // get all of this node's children and add their IDs to the queue
        _nodeQue = GetChildren(nextNodeId)
            .Fold(_nodeQue, (q, nodeRow) => q.Enqueue(nodeRow.CatalogNodeId));

        return nextNodeId;
    }

    private Option<CatalogNodeDto.CatalogNodeRow> GetNode(int nodeId)
        => _catalogSystem.GetCatalogNodeDto(nodeId).CatalogNode.HeadOrNone();

    private Lst<CatalogNodeDto.CatalogNodeRow> GetChildren(int nodeId)
        => _catalogSystem.GetCatalogNodesDto(_catalogId, nodeId).CatalogNode.Freeze();

    public void Reset()
    {
        _nodeQue = Prelude.Queue(_startingParentNodeId);
        _current = Option<CatalogNodeDto.CatalogNodeRow>.None;
    }
}
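
Stripped of the EPiServer and LanguageExt specifics, the breadth-first walk described above can be sketched with the BCL Queue<T> and an iterator method. This is an illustration under assumptions, not a replacement for the enumerator above: BreadthFirst and childrenOf are hypothetical names, with childrenOf standing in for the GetCatalogNodesDto lookup of a node's direct children.

```csharp
using System;
using System.Collections.Generic;

public static class BreadthFirst
{
    // Returns every descendant of root, level by level. The root itself is
    // not returned, mirroring how node ID 0 is skipped in the catalog walk.
    public static IEnumerable<T> Descendants<T>(T root, Func<T, IEnumerable<T>> childrenOf)
    {
        // Seed the queue with the top-level nodes.
        var queue = new Queue<T>(childrenOf(root));

        while (queue.Count > 0)
        {
            // Take the next node, hand it to the caller...
            T current = queue.Dequeue();
            yield return current;

            // ...then line up its children behind everything already waiting.
            foreach (T child in childrenOf(current))
            {
                queue.Enqueue(child);
            }
        }
    }
}
```

Because children always join the back of the queue, every node on one level is returned before any node on the next level.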

CatalogSystemEntries Iterator

Since entries cannot be arranged in a nested fashion, this iterator is a little less complicated than the nodes iterator. All we do is take in a catalogId and a catalogNodeId and retrieve all the entries below the given node ID, passing them to you one at a time as you call MoveNext().

The only mildly complex thing here is that we use a Lazy<T> so that we're not retrieving the entries immediately when you instantiate the IEnumerator. The entries are loaded only on the first MoveNext() call, which indicates you really do want some entries.

public class CatalogSystemEntries : IEnumerable<CatalogEntryDto.CatalogEntryRow>
{
    private readonly CatalogSystemEntryEnumerator _enumerator;

    public CatalogSystemEntries(ICatalogSystem catalogSystem, int catalogId, int catalogNodeId)
    {
        _enumerator = new CatalogSystemEntryEnumerator(catalogSystem, catalogId, catalogNodeId);
    }

    public IEnumerator<CatalogEntryDto.CatalogEntryRow> GetEnumerator() => _enumerator;

    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

public class CatalogSystemEntryEnumerator : IEnumerator<CatalogEntryDto.CatalogEntryRow>
{
    private int _currentIndex = 0;
    private readonly ICatalogSystem _catalogSystem;
    private Lazy<Lst<CatalogEntryDto.CatalogEntryRow>> _entries;
    private readonly int _catalogNodeId;
    private readonly int _catalogId;

    // we take in the catalogNodeId to start at so that you could use this IEnumerator
    // to collect entries at any starting node in the catalog.
    public CatalogSystemEntryEnumerator(ICatalogSystem catalogSystem, int catalogId, int catalogNodeId)
    {
        _catalogSystem = catalogSystem;
        // we use a Lazy<T> object here so that we don't do a database call
        // as soon as you instantiate this class.  We only do it on the first
        // MoveNext() call, indicating you really want to iterate the entries now
        _entries = new Lazy<Lst<CatalogEntryDto.CatalogEntryRow>>(GetEntries);
        _catalogNodeId = catalogNodeId;
        _catalogId = catalogId;
    }

    public CatalogEntryDto.CatalogEntryRow Current
    {
        get
        {
            return _entries.Value[_currentIndex];
        }
    }

    object IEnumerator.Current => Current;

    public void Dispose()
    {
        Reset();
    }

    public bool MoveNext()
    {
        // on the first MoveNext() call, _entries doesn't have a value yet
        // (it gets loaded by the Count check below).  So, this leaves
        // the _currentIndex at 0 the first time MoveNext() is called.
        if (_entries.IsValueCreated)
        {
            _currentIndex++;
        }

        // we just incremented _currentIndex to the count of the _entries list,
        // so we return false to indicate there are no more values.  We really could
        // use == here instead of >=.  I guess >= feels safer for some reason.
        if (_currentIndex >= _entries.Value.Count)
        {
            return false;
        }

        return true;
    }

    public void Reset()
    {
        _entries = new Lazy<Lst<CatalogEntryDto.CatalogEntryRow>>(GetEntries);
        _currentIndex = 0;
    }

    private Lst<CatalogEntryDto.CatalogEntryRow> GetEntries()
        => _catalogSystem.GetCatalogEntriesDto(_catalogId, _catalogNodeId).CatalogEntry.Freeze();
}
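
The timing of Lazy<T> is easy to check in isolation. The sketch below uses a plain array in place of the entries list (LazyTimingDemo is a made-up name) and shows that the factory runs on the first read of .Value and never again. In the enumerator above, that first read is the _entries.Value.Count check inside MoveNext().

```csharp
using System;

public static class LazyTimingDemo
{
    public static (bool createdBefore, bool createdAfter, int loadCount) Run()
    {
        int loadCount = 0;

        // The factory stands in for the database call in GetEntries().
        var entries = new Lazy<int[]>(() =>
        {
            loadCount++;
            return new[] { 10, 20, 30 };
        });

        bool createdBefore = entries.IsValueCreated;  // false: nothing loaded yet
        int length = entries.Value.Length;            // first read runs the factory
        bool createdAfter = entries.IsValueCreated;   // true from here on
        length += entries.Value.Length;               // second read uses the cached array

        return (createdBefore, createdAfter, loadCount);
    }
}
```

Run() returns (false, true, 1): one load, however many times Value is read afterwards.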

CatalogSystemEntryMetaObjects Iterator

The MetaObjects iterator is a little different from the other iterators in that it doesn't really iterate through a list or tree of MetaObjects. Instead, it iterates through all of the MetaObject instances for a particular entry--one instance for each culture available for the catalog.

So, as you'll see in the code below, one of the first things we do is grab all of the languages for the catalog so that we know what we need to iterate through.

public class CatalogSystemEntryMetaObjects : IEnumerable<(MetaObject metaObject, MetaDataContext metaDataContext)>
{
    private readonly CatalogSystemEntryMetaObjectsEnumerator _enumerator;

    public CatalogSystemEntryMetaObjects(MetaDataContext metaDataContext, CatalogDto.CatalogRow catalog, CatalogEntryDto.CatalogEntryRow entry)
    {
        _enumerator = new CatalogSystemEntryMetaObjectsEnumerator(metaDataContext, catalog, entry);
    }

    public IEnumerator<(MetaObject metaObject, MetaDataContext metaDataContext)> GetEnumerator() => _enumerator;

    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

public class CatalogSystemEntryMetaObjectsEnumerator : IEnumerator<(MetaObject metaObject, MetaDataContext metaDataContext)>
{
    private int _currentIndex = 0;
    private readonly CatalogEntryDto.CatalogEntryRow _entry;
    private readonly MetaDataContext _metaDataContext;
    private Lazy<Lst<CultureInfo>> _catalogLanguages;
    private readonly CatalogDto.CatalogRow _catalog;

    public CatalogSystemEntryMetaObjectsEnumerator(MetaDataContext metaDataContext,
        CatalogDto.CatalogRow catalog,
        CatalogEntryDto.CatalogEntryRow entry)
    {
        _catalog = catalog;
        _entry = entry;
        _metaDataContext = metaDataContext;
        _catalogLanguages = new Lazy<Lst<CultureInfo>>(GetCatalogLanguages);
    }

    public (MetaObject metaObject, MetaDataContext metaDataContext) Current
    {
        get
        {
            // what we're really iterating through is the catalog languages.
            // as we iterate through them, we grab the associated MetaObject for the current
            // language for the given entry
            CultureInfo currentCultureInfo = _catalogLanguages.Value[_currentIndex];

            var mdc = _metaDataContext.Clone();
            mdc.UseCurrentThreadCulture = false;
            mdc.Language = currentCultureInfo.Name;

            return (MetaObject.Load(mdc, _entry.CatalogEntryId, _entry.MetaClassId), mdc);
        }
    }

    object IEnumerator.Current => Current;

    public void Dispose()
    {
        Reset();
    }

    public bool MoveNext()
    {
        // As with the Lazy<T> for iterating through the entries, the first time MoveNext() is called
        // the _catalogLanguages Lazy<T> will not have a value.  It will once the Count check below runs.
        // So, for the first MoveNext(), we want to keep the _currentIndex at 0, which is what this if-statement
        // accomplishes.
        if (_catalogLanguages.IsValueCreated)
        {
            _currentIndex++;
        }

        if (_currentIndex >= _catalogLanguages.Value.Count)
        {
            return false;
        }

        return true;
    }

    public void Reset()
    {
        _catalogLanguages = new Lazy<Lst<CultureInfo>>(GetCatalogLanguages);
        _currentIndex = 0;
    }

    private Lst<CultureInfo> GetCatalogLanguages()
        => _catalog.GetCatalogLanguageRows()
            .Map(lr => new CultureInfo(lr.LanguageCode))
            .Freeze();
}
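
The reason for cloning the context for each language, rather than changing Language on the one you passed in, can be sketched without EPiServer. FakeContext below is a hypothetical stand-in for MetaDataContext with just enough state to show the idea; it is not a real Mediachase type.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for MetaDataContext.
public sealed class FakeContext
{
    public string Language { get; set; }

    public FakeContext Clone() => new FakeContext { Language = Language };
}

public static class PerCultureContexts
{
    public static IEnumerable<FakeContext> ForEachLanguage(
        FakeContext shared,
        IEnumerable<string> languageCodes)
    {
        foreach (string code in languageCodes)
        {
            // Work on a copy so the caller's context keeps its own language
            // and every yielded context keeps the language it was made for.
            FakeContext copy = shared.Clone();
            copy.Language = code;
            yield return copy;
        }
    }
}
```

With a single shared instance, the caller's context would end up on whichever language came last, and any context kept from an earlier step would silently change language with it.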

Warning

Even though the above code will give you the convenience of iterating through your entire EPiServer catalog(s), be warned! You could easily forget what the IEnumerators really do above and do something like this:

// REALLY BAD CODE, DON'T DO
List<(MetaObject metaObject, MetaDataContext mdc)> allTheMetaObjects = _catalogSystem.AllCatalogSystemEntryMetaObjects(mdc).ToList();

I hope you see the danger of this. If you have a large catalog, you're loading the ENTIRE set of MetaObjects into memory in your allTheMetaObjects list. This probably won't be good for your application. If you have a small catalog, this might not be a big deal. But, keep this in mind!
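
The difference is easy to see with a stand-in sequence that logs when each item is loaded. This is a self-contained sketch (StreamingVersusToList and its methods are made-up names, and the integers stand in for MetaObjects), not a measurement of a real catalog.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StreamingVersusToList
{
    // Stand-in for the catalog iterator: logs each item as it is produced.
    private static IEnumerable<int> Produce(int total, List<string> log)
    {
        for (int i = 0; i < total; i++)
        {
            log.Add($"load {i}");
            yield return i;
        }
    }

    // foreach directly over the iterator: each item is handled as soon as it
    // is loaded, and nothing here keeps a reference to it afterwards.
    public static List<string> Streamed(int total)
    {
        var log = new List<string>();
        foreach (int item in Produce(total, log))
        {
            log.Add($"update {item}");
        }
        return log;
    }

    // ToList() first: every item is loaded and held before any is handled.
    public static List<string> Materialized(int total)
    {
        var log = new List<string>();
        List<int> all = Produce(total, log).ToList();
        foreach (int item in all)
        {
            log.Add($"update {item}");
        }
        return log;
    }
}
```

Streamed(2) logs load 0, update 0, load 1, update 1, while Materialized(2) logs load 0, load 1, update 0, update 1. With a whole catalog in place of two integers, that second ordering is the one that fills up memory.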
