In Google Analytics we’re not allowed to track personally identifiable information (PII) such as usernames that would be identifiable by a 3rd party such as Google. This is because of section 7 of the terms of service:
You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.
While sending Google information that is personally identifiable is simply not permitted, you can instead send an identifier which is known only to you.
This is “confirmed” by “RockinFewl” in the Google Product Forum:
Earlier this year, I had an interesting talk with a Google representative on this very matter.
He confirmed that you are actually allowed to track individuals, but that you cannot store personally identifiable information on Google’s servers or a GA cookie. More specifically: you cannot store names or ip addresses in a custom var, but you can store ids that need your backend to resolve into a person identification. He said that whatever you’re doing in your backend is beyond the responsibility of Google.
It is further confirmed by Justin Cutroni, Analytics Evangelist at Google:
To add Google Analytics data to a data warehouse you need to add some type of primary key to Google Analytics. In most of the work that I’ve done this key is a visitor ID. This anonymous identifier usually comes from some other system like a CRM. [...] I know what you’re thinking, “You can’t store personally identifiable information in Google Analytics!” But this isn’t personally identifiable information.
This also means that if you’re trying to identify users within the Google Analytics UI, you’ll have to search a separate system to look up who each user is. That is probably not very usable in practice, but if you use the Google Analytics API to build an integration with your backend system, you can perform the lookup there and display the correct user details in your reports.
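As a rough illustration of that backend lookup, the sketch below joins report rows to user records by their opaque identifier. The row shape, the `visitorId` field, the `usersById` table and the `resolveUsers` function are all hypothetical; how the rows are actually fetched from Google Analytics is not shown here.

```javascript
// Hypothetical rows exported from a report, keyed by the opaque ID
// that was sent to Google Analytics (field names are assumptions).
const reportRows = [
  { visitorId: 'a1f3', pageviews: 12 },
  { visitorId: 'c9d2', pageviews: 4 },
  { visitorId: 'zzzz', pageviews: 1 }, // no matching backend record
];

// Hypothetical backend lookup table; in practice this would be a
// query against your own user store, which Google never sees.
const usersById = new Map([
  ['a1f3', { displayName: 'Alice Example' }],
  ['c9d2', { displayName: 'Bob Example' }],
]);

// Attach a display name to each row; unknown IDs resolve to null.
function resolveUsers(rows, lookup) {
  return rows.map((row) => {
    const user = lookup.get(row.visitorId);
    return { ...row, displayName: user ? user.displayName : null };
  });
}

const resolved = resolveUsers(reportRows, usersById);
console.log(resolved);
```

The point of the design is that the join happens entirely on your side: the reports only ever contain the opaque identifier.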
Real world examples
So what’s allowed and what isn’t allowed?
In my day job, I’m normally using systems such as Atlassian Confluence or IBM Connections to create new product features or new product integrations. How can we track users and uniquely identify them in these systems?
The simplest way to track a user is to add a custom variable that records, say, the username, email address or another identifier.
Example: Atlassian Confluence
In Atlassian Confluence, you could easily use the following:
_gaq.push(['_setCustomVar', 2, 'username', AJS.params.remoteUser, 1 ]); // ***** DO NOT USE THIS *****
This would be a violation of section 7 of the terms of service because it sends personally identifiable information to Google. Admittedly, a username cannot always be linked to a user’s actual identity, but in most cases it can.
Example: IBM Connections
So, we know that usernames and email addresses are not allowed, but what else can we use?
In IBM Connections, each user is internally identifiable by a universal user identity (UUID). If this is not externally available, this would be perfect to send to Google as the identifier.
We could then use:
_gaq.push(['_setCustomVar', 2, 'uuid', uuid, 1 ]);
However, most IBM Connections systems have the ability to query a user based on this UUID using the Profiles REST API e.g.
If this REST API is protected by authentication, then we are good. It would comply with section 7 of the terms of service.
If this REST API is not protected by authentication, then this would be a violation of section 7 of the terms of service because it sends information to Google…
which can be reasonably linked to such information by Google
To get around this problem, you should create what I’m going to term a “Google Analytics identifier” (GAID) which is mapped to the username or UUID and is only used to send tracking data to Google Analytics. You’ll likely need to store this against the user object/user table in your backend system.
That way, you can use this:
_gaq.push(['_setCustomVar', 2, 'gaid', gaid, 1 ]);
Provided this GAID is not publicly accessible, we are good. It would comply with section 7 of the terms of service.
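One way such a GAID might be generated and stored on the backend is sketched below. It assumes Node.js (14.17 or later, for `crypto.randomUUID`), and uses two in-memory maps as stand-ins for a column on your user table; `getOrCreateGaid` and `resolveGaid` are hypothetical names, not part of any Google or Atlassian API.

```javascript
const { randomUUID } = require('crypto'); // built into Node.js 14.17+

// In-memory stand-ins for a GAID column on the user table (assumption).
const gaidByUsername = new Map();
const usernameByGaid = new Map();

// Return the user's existing GAID, or mint and store a new random one.
function getOrCreateGaid(username) {
  let gaid = gaidByUsername.get(username);
  if (gaid === undefined) {
    gaid = randomUUID(); // random, so it cannot be derived from the username
    gaidByUsername.set(username, gaid);
    usernameByGaid.set(gaid, username);
  }
  return gaid;
}

// Reverse lookup, for use only inside your own backend reports.
function resolveGaid(gaid) {
  return usernameByGaid.get(gaid) ?? null;
}
```

A random value is used here rather than a hash of the username, because anyone who knows or can guess a username could recompute an unsalted hash and link it back to the person. The page template would then emit the stored value as the `gaid` variable used in the snippet above.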
You will be able to happily track users, but you now need to generate some reports in your backend system that decode these GAIDs into useful data. Hold tight, that’s another story.
Update (April 2014):
- You can now use the PII for Google Analytics Chrome extension to decode these GAIDs into useful data.
- Google released Universal Analytics out of beta this month. Here’s a tutorial on How to send user IDs to Google Analytics for Universal Analytics with updated code examples.
- Generate UUID in Java
- Merging Google Analytics with your Data Warehouse
- Why You Could Lose ALL Your Google Analytics Data
- Are ‘usernames’ Privately-Identifiable Information (PII)?
- Understanding Cross Device Measurement and the User-ID