A basic component of a privacy program is understanding what data you collect, where that data resides, and how it flows through your data processing systems. When combined with other characteristics of the data, this knowledge allows a privacy professional to understand which laws and regulations apply to your privacy initiatives and to determine how well your organization complies with them.
Other parts of an organization have an interest in this same information about the personal information the organization maintains. Information Technology, including Database Administration, clearly needs to know what information is collected, where the data resides, how the information is protected, what applications access that data, how the applications interrelate, and many other operational aspects of the data. All of these characteristics are also of interest to the privacy team.
The need for IT professionals to know this information is nothing new. I can remember working with data dictionaries in the 1980s that were used to collect information similar to that described above. Keeping a data dictionary current was an insurmountable challenge for many organizations, so these repositories quickly became out of date and unused.
Now in 2020 we are seeing data inventories become an integral part of many privacy programs. We use technology to scan our infrastructures to find personal information. We use Privacy Impact Assessments and other processes to keep the inventories current. In short, the maturing of privacy practices is allowing data inventories to succeed where many data dictionaries failed.
I frequently get asked “what needs to be in my data inventory?” Capturing all of the details about all the personal information data elements may be very expensive from financial, time, and labor perspectives. Capturing too little information may make the data inventory useless from a privacy perspective. So, the answer is to collect enough information for the privacy team to do their job.
Who owns a data inventory?
Recently we worked with a privacy team to improve their Privacy Impact Assessment (PIA) process. During the process, the privacy attorney wanted to collect all of the details about personal information and how it was maintained: the name of the database, the table and column in which the data was stored, what applications had access, how the data element was protected in transit and at rest, and so on. The intent was primarily to perform the PIA, but also to capture all of this information in the privacy office data inventory. I suggest the attorney was going a bit too far.
Naturally, IT and the DBAs had this information in their data inventories as well. Operationally, these teams need detailed information on personal information to keep the business running. As new systems are proposed and implemented, as new protection standards are reviewed and established, as new processing of personal information is undertaken, these teams need the most up-to-date data inventory information available.
Both the Privacy and IT teams need the most current information when performing a PIA or addressing a data incident. If each team has its own data inventory, there may be conflicts between the repositories. Which is the “source of the truth”? Incorrect information in a data inventory may lead to bad decisions. So, I suggest that there should be one data inventory for the organization. But who owns that data inventory?
Since IT has the most current information about personal information maintained by the organization, I suggest that IT should own the data inventory, with the privacy office being a consumer of the information. Similar to how Security is responsible for establishing controls for protecting personal information based on business requirements (including those from the privacy team), the data inventory should not only address IT’s needs; the subset of information that the privacy office and security office need should be made available as well.
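One way to picture this arrangement is a single IT-owned inventory from which the privacy office reads a filtered view instead of keeping its own copy. The sketch below is only an illustration under stated assumptions: the entry, the field names, and the `PRIVACY_FIELDS` selection are all hypothetical, not a prescribed schema.

```python
from typing import Any

# Hypothetical detailed entry as IT might keep it. Every field name and value
# here is an illustrative assumption.
IT_INVENTORY: list[dict[str, Any]] = [
    {
        "process": "Payroll",
        "data_element": "bank account number",
        "database": "hr_prod",
        "table": "employee_pay",
        "column": "acct_no",
        "applications": ["PayrollApp", "FinanceReports"],
        "encrypted_at_rest": True,
        "encrypted_in_transit": True,
    },
]

# Assumed subset of fields the privacy office consults day to day.
PRIVACY_FIELDS = (
    "process",
    "data_element",
    "encrypted_at_rest",
    "encrypted_in_transit",
)


def privacy_view(inventory: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Project each IT-owned entry down to the privacy-facing fields."""
    return [{name: entry[name] for name in PRIVACY_FIELDS} for entry in inventory]


view = privacy_view(IT_INVENTORY)
print(view)
```

Because the view is computed from the IT-owned entries each time, there is no second repository to drift out of date, and the database, table, and column details stay with the team that maintains them.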
What does the privacy office need?
The needs of the privacy office for a data inventory are much simpler than those of IT. To make most decisions regarding the protection of personal information, you just need high-level information. There are exceptions, of course, such as PIAs, Data Protection Impact Assessments (DPIAs), and incident response, but these are times when the IT team will be involved, so they can bring the relevant detailed data with them.
I look to GDPR’s Article 30 for the requirements for a privacy office data inventory. The article sets out what a controller must record, and that is where I get these requirements. For each process you need:
• The purposes of the processing,
• A description of the categories of data subjects,
• A description of the personal data,
• The categories of recipients of the data when sent outside the organization,
• The countries to which the data is exported,
• The retention periods, and
• The security measures undertaken to protect the data at rest and in transit.
Often, I replace the description of the categories of recipients of the data with the names of the recipients to make the data inventory a bit more complete.
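The list above can be sketched as one record per process. This is a minimal illustration rather than a definitive schema: the class name `ProcessingRecord`, its field names, the `MAY_BE_EMPTY` set, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field, fields

# Assumption: a process may legitimately have no outside recipients and no
# exports, so these two items are not flagged when left empty.
MAY_BE_EMPTY = {"recipients", "export_countries"}


@dataclass
class ProcessingRecord:
    """One privacy-office inventory entry for a single process."""

    process: str
    purposes: list[str] = field(default_factory=list)
    data_subject_categories: list[str] = field(default_factory=list)
    personal_data_description: str = ""
    recipients: list[str] = field(default_factory=list)  # names, not just categories
    export_countries: list[str] = field(default_factory=list)
    retention_period: str = ""
    security_measures: list[str] = field(default_factory=list)

    def missing_items(self) -> list[str]:
        """Return the names of required items that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name not in MAY_BE_EMPTY and not getattr(self, f.name)
        ]


# Hypothetical sample entry with the retention period not yet recorded.
record = ProcessingRecord(
    process="Payroll",
    purposes=["pay employees"],
    data_subject_categories=["employees"],
    personal_data_description="name, bank account number, salary",
    recipients=["Example Payroll Provider"],
    security_measures=["encrypted at rest", "TLS in transit"],
)
print(record.missing_items())  # ['retention_period']
```

A check like `missing_items` is one simple way to spot entries that are too thin to support a privacy decision without asking for the database-level detail that IT holds.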
If, as a privacy officer, I have access to the above information, I can make most of my decisions and recommendations for data protection, such as defining new processes or policies or providing oversight of compliance for privacy activities. If I need more detailed information for PIAs, DPIAs, or addressing incidents, I can go to the single source for that detail: the data inventory owned by IT.