When I took the first survey of my undertaking, I found our speech copious without order, and energetick without rules: wherever I turned my view, there was perplexity to be disentangled, and confusion to be regulated; choice was to be made out of boundless variety, without any established principle of selection; adulterations were to be detected, without a settled test of purity; and modes of expression to be rejected or received, without the suffrages of any writers of classical reputation of acknowledged authority.
‐ Preface to Dictionary of the English Language, Samuel Johnson, 1755 [9]
A data dictionary is defined in the IBM Dictionary of Computing as a centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format
.[1] However, Business Analysts should be aware that the term data dictionary can have several meanings depending on the person one is speaking with. A data dictionary can be:
As you can see from the list above, the term data dictionary may be interpreted differently by the person you are speaking to if they are a Oracle database administrator (#1 above), database architect (#4 above), system analyst (#2 above), or business analyst (#3 above). And if you are talking a non-technical business person, they may conflate a data dictionary with a glossary of business terms.
Depending on the context, a data dictionary can be a component part of database management system (DBMS) or a separate data store in the form of a database, spreadsheet, or even an index card file (in the old days).
Generally, there are two data dictionary types that Business Analysts deal with. These are logical data dictionaries and physical data dictionaries. And then some data dictionaries combine the metadata of both a logical and physical dictionary.
A logical data dictionary typically describes information in business terms and focuses on the meaning of terms and their relationship with other terms. For example, from the business perspective there may be a data entity called Client
, which includes Client Name
, Client ID
, and Order History
attributes. Once defined, a logical data dictionary rarely needs to change.
In contrast, a physical data dictionary describes the physical attributes of a data element. What is the data type (string, boolean, integer, etc)? What is the maximum length? What is the format? What is the data encoding standard (ANSI, UTF-8, etc)? Where is the data physically stored (database name, column, row, etc)?
While a the attributes of a logical data dictionary are unlikely to change unless there are significant business changes, the physical attributes can change much more frequently. And because the data element could be stored in multiple databases with different physical structures, there may be different physical data dictionaries that include the same logical data entity.
Most business analysts will be creating the logical data dictionary as part of their requirements work, which would then be converted to physical data dictionary as part of the solution definition and design phases of an effort.
According to the International Standards Organization (ISO): The increased use of data processing and electronic data interchange heavily relies on accurate, reliable, controllable, and verifiable data recorded in databases. One of the prerequisites for a correct and proper use and interpretation of data is that both users and owners of data have a common understanding of the meaning and descriptive characteristics (e.g., representation) of that data. To guarantee this shared view, a number of basic attributes has to be defined.
[9]
This is done through data dictionaries. Or put another way:
Data dictionaries assist with gathering requirements, standardize data element definitions, reduce redundancy, and enhance consistency between systems. What may seem like administrative busy work when you're launching a project can become a time and money-saving resource by revealing complexities that were overlooked in initial specifications and spotting costly inconsistencies before they are deeply embedded in the organization's systems.[8]
Among the many benefits of a data dictionary are:
The data dictionary can be one of the most important requirement artifacts that a BA creates when dealing with software solutions. It ensures that all stakeholders are in agreement on the definition and structure of data that the solution will use. The saying Garbage In, Garbage Out
is never more true than when dealing with system data, so spend time ensuring that you have carefully defined the data needs of your solution.
A data dictionary (and glossary) is often one of the first artifacts that gets created when the requirements elicitation process starts. One of the key needs in any project is to make sure everyone is speaking the same language. A glossary helps with this. But unless you dive into the level needed to define terms that goes into a data dictionary, you are unlikely to determine if there are slightly different terms that are used within the business units you are working with.
An example of a situation where a data dictionary can help identify the usage of a seemingly common term with different meanings comes from the financial services field, where a seemingly standard term such as "Cash" can have different meanings depending on who you talk to. For one person Cash
might mean literally cash in hand or immediately accessible money such as that in a bank account. But from various investment perspectives Cash
might include short-term Treasury Bills, Certificates of Deposit, certain types of promissory notes, and other types relatively liquid investments. Now imagine a situation where the formula you need to calculate a figure for a regulatory report requires one definition of Cash
but your internal systems are all built around a different definition of Cash
. Unless you have used a data dictionary to very precisely define what you mean by Cash
, your stakeholders and developers may not even realize there is a difference.
Choose your data dictionary location and structure. Are you going to use Excel? Does your requirements management tool specifically support a data dictionary?
Decide what attributes you are going to capture in data dictionary. Logical and Physical data dictionaries frequently have different attributes that should be captures. However, some data dictionaries are set up to capture both types of information. In those cases the business analyst might start the data dictionary by working with the business to define the logical data attributes while the technical team would then expand the data with the physical attributes for the specified data elements.
For each data element, the following logical attributes may be used in a data dictionary:
Attribute Name | Description |
---|---|
ID Code | Every data element in the dictionary should have a unique identification outside of its name. This might be a number, or an alphanumeric combination. |
Name | Every data element in the dictionary should have a unique name. |
Aliases | If the data element is potentially referred to by more than one name, the additional names can be recorded as aliases (or synonyms). For example, an element named Pricemay also be referred to as Costor Value. Thus Costand Valuewould be recorded as Aliases of Price |
Definition | Every attribute should include a definition that precisely defines the element associated with the Name, and which distinguishes it from other elements that may be similar. |
Formula | This specifies if the value being defined should be calculated using a specific formula. |
Unit | If the data element should be specified in a specific unit of measurement (such a miles, centimeters, or a currency) |
Update Frequency | This attribute how often the business expects the element value to be updated. This may be daily, weekly, monthly, point in time (if updates occur on a non-scheduled basis such as when a client changes their mailing address), or other values. |
Definition Source | It is often a good idea to keep track of who provided a specific element definition. |
Range of Values | This is the possible range of values the element needs to support. This could be a range of numbers, a set of letters, or one of a specific set of values (such as State abbreviations if part of an address) |
Time Dimension | Frequently, a business can want the same value but over a different period of time. For example, a business want to know the total amount of sales to a customer over a 1-month, 6-month, 1-year, and lifetime period. This attribute can also interact with the Update Frequency attribute to provide different results, so there could be a 1-year sales total that is updated on once a year on January 1st to provide a calendar year value, and another 1-year sales total that is updated daily to provide a rolling value. |
History Required | For some elements, the business may want to maintain not just a current value, but past values as well. For example, if the business was tracking a daily sales total, they may want to maintain specific daily values for the prior 5 years. |
Notes | This attribute field can be used to provide specific examples, background, or other points of information that the BA feels appropriate. |
Components | If the element is a compound element, this attribute can be used to indicate the component elements that are incorporated into this element. This can also be used to indicate the relationships with the component elements (mandatory, optional, 1 to 1, 1 to many, etc) |
Owner | This is the ownerof the data value that is being defined. This is usually the business unit or person who is responsible for the data in question. This might mean that they are responsible for the system that generates the data, or are the subject-matter expert for the ensuring the quality and accuracy of the data in question. |
Domain | This is the context in which the data element exists. For example, data elements within a clientdomain may include client ID, first name, last name, address, city, state, zip code, order history, preferred payment method, and other data elements that are related to the client. Sometimes a data element can be part of more than one domain. |
Default Value | This specifies if the data element has a default value, and if so what that value is. |
Security Classification | If the data element has a particular security classification that should control its use or access to the data, it might be noted here. An example might be sensitive client data (such as Social Security Numbers in the U.S.) that a firm might have to store for regulatory purposes, but for which access should be restricted and which should not be made public. |
* This is not intended to be a complete list of possible logical attributes. Please add any new attributes you may use or be aware of.
For each data element, the following attributes may be used in a data dictionary:
Attribute Name | Description |
---|---|
Size | If the element has a specific size (25 characters), or a maximum size (100kb), it should be documented here |
Database Name | The name of the database the element is stored in |
Database location | The specific location (column, row, record, etc) in a database that the element is stored in. |
Data Type | The type of data the elements is stored as. This can include a Boolean value, integer, number, record, or a number of other specific data types. |
* This is not intended to be a complete list of possible physical attributes. Please add any new attributes you may use or be aware of.
Decide what your version control system and change process for the data dictionary is. The data dictionary most frequently the document that is worked on by multiple parties simultaneously (business analysts, systems analysts, data analysts, data architects, etc.) and is thus one of the documents most in need of a solid version control and update process.
Start defining your data elements.
Make sure that each data dictionary entry defines only one data element. When working to define a new element, make sure to figure out if the element under discussion is a primitive or composite data element. Primitive data elements are those that cannot be deconstructed any further. Composite data elements are those that are made up of or incorporate primitive data elements.
For example, a name element is generally a composite data element that is made up of the following primitive data elements: first name, middle name, family name, and name suffix. There may also be a need to specify a name prefix element that would support titles such as Mr.
, Dr.
, Ms.
, and other similar titles.
When defining a composite element you should specify the primitive elements first and then the composite element, making sure that the definition of the composite element explicitly references the primitive elements that it is made up of.
The attributes for each data element in the dictionary should be verified with all stakeholder groups in order to reach agreement on naming, definitions, attributes, formulas, and other critical attributes of the data.
The end result of a data dictionary effort should be a single artifact that:
A Professional's Guide to Systems Analysis, Second Edition