Discovering the Requirements – How much data will I need?

In the last posting we looked at understanding your analysis and reporting needs. We will now discuss the data required to underpin your Customer Marketing Solution, reviewing how this is identified, understood and used to help build the solution. This posting is the sixth instalment of the Discovering the Requirements series. If you are joining the discussion now, you may want to start by reading the first posting.

When looking at the underlying data sources feeding into your customer marketing solution, the actual content, format and management of those sources need to be reviewed and qualified. This posting will look at both of these areas, beginning with the content & format, and then moving on to provide a cursory view of the management of the data to create a single customer view.

In the third posting of this series, “How do I define what the solution should deliver?”, a list of the required data sources based on the customer’s touch points was covered. Before looking in detail at the information to be reviewed for each of the data sources and data feeds, it is worth clarifying the difference between these two terms, which are often used interchangeably:

Data Source – A collection of data on multiple topics from a named system or data repository. For example, an insurance policy administration system.
Data Feed – A single file of data, representing one type of data from a data source. For example, a policy table contained within the policy administration data source for an insurance business.

The distinction is crucial: it helps ensure that every data area is covered, rather than being summarised as a single data feed that in fact spans multiple data areas, each requiring different management to incorporate into the single customer view.

For each data source and data feed a detailed analysis is then required to understand and document the following key items:

Data Feed format – identifying how the data will be presented. Will the feed be provided as a flat file, a database view, a database table, etc.? For flat files, consideration needs to be given to how the fields will be delimited and whether any header or footer rows will be provided.
Data Feed source details – defining where the data will be provided from and the name of the source file, table, etc. This should also cover any filter criteria, which remove test data or agreed historic information.
Data Feed volume – there are three key elements to understand, which will help with the solution sizing and projected growth:
- Current Volume – How many records are currently held in the data feed?
- Projected Annual Growth – How many additional new records will be added to the data feed in the next year? This can often be a difficult question to answer, but an estimate can be based on the Business Goals identified in the earlier posting “How do I know if I am delivering the right solution?”. For example, a key goal such as “To grow the number of active customers by 5% year on year” provides an indicative number for the projected annual growth.
- Projected Update Volume – An often overlooked element is the amount of data provided with each update, which may include both updated and new records. For highly volatile data this can result in a large volume of information arriving with each data supply, which can impact refresh performance unless it is designed into the solution.
Data Feed relationships – For each data source, and sometimes across data sources, the relationships between data feeds need to be understood to ensure, for example, that transactions are linked to the correct individual.
Data supply details – What type of data supply will be provided for each data feed, and how often (hourly, daily, weekly, monthly, etc.) will it be provided? There are traditionally three key data supply types:
- Refresh – Complete resupply of all data with each supply of data (This is sometimes called Rebuild or Holistic).
- Update – New and updated records are provided with each supply of data (This is often called Incremental or Delta).
- Append – Only new records are provided with each supply of data.
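The practical difference between the three supply types is easiest to see in how each one is loaded. The following is a minimal sketch only, assuming each data feed can be held as a dictionary of records keyed by a unique identifier; the names `SupplyType` and `apply_supply` and the sample policy records are illustrative and not part of any particular product.

```python
from enum import Enum


class SupplyType(Enum):
    REFRESH = "refresh"  # complete resupply of all data
    UPDATE = "update"    # new and updated records
    APPEND = "append"    # new records only


def apply_supply(current, supplied, supply_type):
    """Return the feed contents after loading one supply.

    Both `current` and `supplied` map a record key to a record (a dict).
    """
    if supply_type is SupplyType.REFRESH:
        # The supply replaces everything held so far.
        return dict(supplied)
    if supply_type is SupplyType.UPDATE:
        # Supplied records overwrite matching keys and add new ones.
        return {**current, **supplied}
    if supply_type is SupplyType.APPEND:
        # Existing keys are unexpected in an append supply, so flag them.
        clashes = sorted(set(current) & set(supplied))
        if clashes:
            raise ValueError(f"append supply contains existing keys: {clashes}")
        return {**current, **supplied}
    raise ValueError(f"unknown supply type: {supply_type}")


# Hypothetical policy feed: two records held, two records supplied.
held = {"P1": {"status": "live"}, "P2": {"status": "live"}}
supply = {"P2": {"status": "lapsed"}, "P3": {"status": "live"}}

after_refresh = apply_supply(held, supply, SupplyType.REFRESH)
after_update = apply_supply(held, supply, SupplyType.UPDATE)
```

With the same supply, a Refresh drops policy P1 because it was not resupplied, while an Update keeps it. This is why the supply type needs to be agreed for each data feed before the load is designed.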

Other types of supply can include:

One Off – a supply of historic data only, loaded at the start of the solution.
Partial Refresh – all records for a particular set are re-provided. For example, if one order item changes or is removed, all order items are resupplied for the given order.
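A Partial Refresh is the least intuitive of these, so a small sketch may help. It assumes each order item is a dictionary carrying an `order_id` field that identifies the set it belongs to; the function name `apply_partial_refresh` and the field names are hypothetical.

```python
def apply_partial_refresh(current_items, supplied_items, group_key="order_id"):
    """Replace every held item belonging to a resupplied group.

    Each item is a dict; `group_key` names the field identifying the set
    (here a hypothetical order_id).
    """
    resupplied_groups = {item[group_key] for item in supplied_items}
    # Keep items from untouched groups, drop all items from resupplied groups.
    kept = [item for item in current_items if item[group_key] not in resupplied_groups]
    return kept + list(supplied_items)


held = [
    {"order_id": 1, "line": 1, "sku": "A"},
    {"order_id": 1, "line": 2, "sku": "B"},
    {"order_id": 2, "line": 1, "sku": "C"},
]
# Order 1 is resupplied with line 2 removed.
supply = [{"order_id": 1, "line": 1, "sku": "A"}]

result = apply_partial_refresh(held, supply)
```

Because the whole set is replaced, the removed order item disappears without the source having to send an explicit delete, and order 2 is left untouched.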

Data Structure – For each field in the data feed the following details need to be understood:
- Field Name.
- Field Description – including possible code values and their meanings for discrete sets of values.
- Source Field Name – in many systems data can be held with unclear field names, but the physical name still needs to be understood.
- Data Type – Type of data held within the field (e.g. integer, decimal, text, date, etc.), including details on the field size and any precision required for decimal data.
- Default Value – Is there a default value for the given field?
- Nullable – Can the field hold null values?
- In Scope – Is the data required within the customer marketing solution?
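One way to make the field-level details above usable is to record them in a structured form that can also be checked against supplied records. The sketch below is an assumption about how that might look rather than a prescribed design: `FieldSpec`, `check_record` and the policy field names are all illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str                         # business-friendly field name
    description: str                  # meaning, including code values
    source_name: str                  # physical name in the source system
    data_type: type                   # e.g. int, str, date
    nullable: bool = True
    default: Optional[Any] = None
    in_scope: bool = True
    max_length: Optional[int] = None  # size limit for text fields


def check_record(record, specs):
    """Return a list of problems found in one source record."""
    problems = []
    for spec in specs:
        if not spec.in_scope:
            continue
        # A field missing from the record falls back to the default value.
        value = record.get(spec.source_name, spec.default)
        if value is None:
            if not spec.nullable:
                problems.append(f"{spec.name}: null not allowed")
            continue
        if not isinstance(value, spec.data_type):
            problems.append(f"{spec.name}: expected {spec.data_type.__name__}")
        elif spec.max_length is not None and len(value) > spec.max_length:
            problems.append(f"{spec.name}: longer than {spec.max_length}")
    return problems


# Hypothetical definitions for a policy data feed.
policy_specs = [
    FieldSpec("Policy Number", "Unique policy identifier", "POL_NO", str,
              nullable=False, max_length=10),
    FieldSpec("Start Date", "Date cover begins", "STRT_DT", date, nullable=False),
    FieldSpec("Status", "L = live, X = lapsed", "STAT_CD", str,
              default="L", max_length=1),
]

problems = check_record({"POL_NO": "P000000000123", "STRT_DT": None}, policy_specs)
```

Here `problems` reports the over-long policy number and the missing start date, while the absent status field quietly takes its default.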

This information will be critical in the design and development of the single customer view and should be based on actual data, not only on existing documentation. This requires the physical data to be analysed to check for incorrect primary/foreign keys, spurious values, undocumented changes and unused fields, amongst other things. At this point a close eye should be kept on the analysis, reporting and campaigning requirements, to ensure all requested data elements are provided and to challenge the need for unnecessary information.
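As an illustration of analysing the physical data, the following sketch checks two of the issues mentioned: duplicate primary keys and foreign keys that point at nothing. It assumes the feeds have already been read into lists of dictionaries; `profile_keys` and the sample customer and transaction rows are hypothetical.

```python
from collections import Counter


def profile_keys(parent_rows, child_rows, primary_key, foreign_key):
    """Report duplicate primary keys and child rows with no parent."""
    key_counts = Counter(row[primary_key] for row in parent_rows)
    duplicates = sorted(key for key, count in key_counts.items() if count > 1)
    orphans = [row for row in child_rows if row[foreign_key] not in key_counts]
    return {"duplicate_primary_keys": duplicates, "orphan_rows": orphans}


customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
transactions = [
    {"txn_id": 10, "customer_id": 1},
    {"txn_id": 11, "customer_id": 9},  # no such customer
]

report = profile_keys(customers, transactions, "customer_id", "customer_id")
```

A real profiling exercise would cover far more (value distributions, unused fields, format checks), but even a simple check like this often surfaces differences between the documentation and the data.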

This will provide a detailed understanding of the required data sources and data feeds, but will not cover how the data should be managed to create and maintain a single customer view. To complete this, the following topics must be covered to provide details around the business requirements:

Understanding who the customer is – in an earlier posting “How do I define what I should deliver?” we looked at the various customer touch points and this can now be expanded on to understand the communication and interaction channels that can be used to define an individual.
Data Quality needs – to ensure accurate delivery and correct presentation of a customer’s contact details, whether via email, post, telephone, etc., the provided customer data will need to be managed to provide:
- Accuracy – correction of misspelt information (e.g. misspelt street name).
- Validity – identification of valid details (e.g. email address format).
- Completion – ensure all necessary information is provided to give a complete contact channel (e.g. missing town details).
- Timeliness – validation of data, to ensure information provided is current and up to date (e.g. out of date postcode information).
- Uniqueness – identification of a single customer across multiple data sources and multiple contact channels.
Data Survivorship – within any solution, multiple sources of the same or similar information will be provided from a variety of trusted sources (e.g. a policy record from an insurance business) and less trusted sources (e.g. a survey record captured by a third party). This overlap of information requires rules to be defined on how the data will be merged and which source/s should take priority. In a future posting I will discuss this topic further and consider how multiple brands can be managed within a solution.
Reject Management – for a given data source and its data feeds, rules will be required to ensure poor/corrupt data is not loaded into the single customer view (e.g. incomplete customer details, such as no name information for web captured details). Any such restriction needs to ensure that the rejected data and related information (e.g. transactional details) can be re-presented at a later date if the data quality improves. A consideration at this point is the need for feedback loops to the source systems to highlight potential customer data issues.
Permission Management – as highlighted under data survivorship, multiple sources can cover the same piece of information. This is especially true of permission details, for example permission to contact via email or via post. The rules governing this, and the need to keep a history of permission changes, will need to be reviewed and agreed in line with best practice, legal requirements and company privacy rules.
Deduplication – building on my first point, concerning “Understanding who the customer is”, a review of potential matching options and the levels of match required is needed. Key questions at this point include the use of single or multi-channel matching, and the level of trust in provided personal information (such as date of birth) and in customer numbers provided from the source systems.
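To show how deduplication and data survivorship work together, here is a deliberately simple sketch. It assumes single-channel matching on a normalised email address and a fixed source priority; the source names, the ranking in `SOURCE_PRIORITY` and the function names are all hypothetical, and real matching rules would normally be far richer.

```python
from collections import defaultdict

# Hypothetical trust ranking: lower number = more trusted source.
SOURCE_PRIORITY = {"policy_admin": 1, "web_capture": 2, "third_party_survey": 3}


def match_key(record):
    """Single-channel match key: the email address, trimmed and lower-cased."""
    return record["email"].strip().lower()


def build_customer_view(records):
    """Group records by match key, then fill each field from the most trusted source."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    view = {}
    for key, matched in groups.items():
        matched.sort(key=lambda r: SOURCE_PRIORITY[r["source"]])
        merged = {}
        for record in matched:
            for field, value in record.items():
                # Keep the first non-empty value, i.e. the most trusted one.
                if field != "source" and value and field not in merged:
                    merged[field] = value
        merged["email"] = key
        view[key] = merged
    return view


records = [
    {"source": "third_party_survey", "email": "Jo.Smith@Example.com ",
     "name": "J Smith", "phone": "01234 567890"},
    {"source": "policy_admin", "email": "jo.smith@example.com",
     "name": "Joanne Smith", "phone": ""},
]

view = build_customer_view(records)
```

In this example the two records collapse into one customer. The name comes from the trusted policy record, while the phone number survives from the survey because the policy record did not hold one.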

After completing these exercises you will have a complete view of the data sources & data feeds at a granular level, as well as an understanding of the data management requirements needed to support the delivery of a single customer view.

Please join me next week when I will discuss non-functional elements of the customer marketing solution and how these can impact when and how the solution will be delivered. (Week 7: When and How will the solution be delivered?)

If you have any further questions or would like support / guidance in discovering or defining your Database Marketing solution, please contact me through the BlacklerRoberts Ltd “Contact Us” page and I will be happy to discuss your needs. You can also follow @BlacklerRoberts on Twitter for further insights.
