Identifying your users in Google Analytics while complying with section 7 of the terms of service

Google Analytics

In Google Analytics, we’re not allowed to track personally identifiable information (PII), such as usernames, that a third party like Google could use to identify a person. This is because of section 7 of the terms of service:

You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.

While sending Google information that is personally identifiable is simply not permitted, you can instead send an identifier which is known only to you.

This is “confirmed” by “RockinFewl” in the Google Product Forum:

Earlier this year, I had an interesting talk with a Google representative on this very matter.

He confirmed that you are actually allowed to track individuals, but that you cannot store personally identifiable information on Google’s servers or a GA cookie. More specifically: you cannot store names or ip addresses in a custom var, but you can store ids that need your backend to resolve into a person identification. He said that whatever you’re doing in your backend is beyond the responsibility of Google.

Google Analytics Forum | Best way to track individual users

It is further confirmed by Justin Cutroni, Analytics Evangelist at Google:

To add Google Analytics data to a data warehouse you need to add some type of primary key to Google Analytics. In most of the work that I’ve done this key is a visitor ID. This anonymous identifier usually comes from some other system like a CRM. […] I know what you’re thinking, “You can’t store personally identifiable information in Google Analytics!” But this isn’t personally identifiable information.

Merging Google Analytics with your Data Warehouse

This also means that if you want to identify users within the Google Analytics UI, you’ll have to search a separate system to look up who each user is. That is probably not very practical, but if you use the Google Analytics API to build an integration with your backend system, you can perform the lookup automatically and display the correct user details in your reports.

Update: The PII Viewer for Google Analytics Chrome extension helps display PII within the Google Analytics UI without breaking the terms of service.

Real world examples

So what’s allowed and what isn’t allowed?

In my day job, I normally work with systems such as Atlassian Confluence or IBM Connections, building new product features or new product integrations. How can we track users and uniquely identify them in these systems?

The simplest way to track a user is to add a custom variable that records, say, the username, email address or some other identifier.

Example: Atlassian Confluence

In Atlassian Confluence, you could easily use the following:

 _gaq.push(['_setCustomVar', 2, 'username', AJS.params.remoteUser, 1 ]); // ***** DO NOT USE THIS *****

This would be a violation of section 7 of the terms of service because it sends personally identifiable information to Google. OK, a username may not always be easy to link to a user’s actual identity, but it’s likely that it could be.

Example: IBM Connections

So, we know that usernames and email addresses are not allowed, but what else can we use?

In IBM Connections, each user is identified internally by a universally unique identifier (UUID). If this UUID is not exposed externally, it would be the perfect identifier to send to Google.

We could then use:

_gaq.push(['_setCustomVar', 2, 'uuid', uuid, 1 ]);

However, most IBM Connections systems let you look up a user by this UUID through the Profiles REST API, for example:

https://connections.example.ibm.com/profiles/atom/profile.do?userid={GUID}

If this REST API is protected by authentication, then we are good. It would comply with section 7 of the terms of service.

If this REST API is not protected by authentication, then this would be a violation of section 7 of the terms of service, because it sends information to Google…

which can be reasonably linked to such information by Google
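One rough way to tell which case applies is to request a profile URL with no credentials and see whether it answers. The sketch below assumes Node.js 18+ for the built-in `fetch`; the helper names `isProfilePubliclyReadable` and `profileUrlFor` are hypothetical, and treating only a 2xx response as "public" is an assumption about how your server rejects anonymous callers (for example with a 401, a 403 or a redirect to a login page). The fetch function is a parameter so the logic can be exercised without a live server.

```javascript
// Hypothetical check: does a profile URL resolve for an anonymous caller?
// Assumes Node.js 18+ (global fetch); no cookies or credentials are sent.
async function isProfilePubliclyReadable(profileUrl, fetchImpl = globalThis.fetch) {
  // redirect: "manual" stops fetch from following a redirect to a login page,
  // so that redirect is not mistaken for a successful response.
  const response = await fetchImpl(profileUrl, { redirect: "manual" });
  // Only a 2xx response means anyone could resolve the UUID to a person.
  // 401, 403 or a redirect to a login form are treated as "protected".
  return response.ok;
}

// Builds a URL following the Profiles API pattern shown above.
// The host is the article's example host; substitute your own server.
function profileUrlFor(uuid) {
  return "https://connections.example.ibm.com/profiles/atom/profile.do?userid=" +
    encodeURIComponent(uuid);
}

// Usage against a real server (needs network access):
// isProfilePubliclyReadable(profileUrlFor(uuid)).then((isPublic) => {
//   if (isPublic) console.warn("This UUID resolves anonymously; use a GAID instead.");
// });
```

A 2xx is only a coarse signal: a server that answers anonymous requests with a 200 login page would be reported as public, so it is worth checking the response body by hand if in doubt.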

Workaround

To get around this problem, you should create what I’m going to call a “Google Analytics identifier” (GAID), which is mapped to the username or UUID and is used only to send tracking data to Google Analytics. You’ll probably need to store it against the user object or user table in your backend system.

That way, you can use this:

 _gaq.push(['_setCustomVar', 2, 'gaid', gaid, 1 ]);

Provided this GAID is not publicly accessible, we are good. It would comply with section 7 of the terms of service.
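A GAID can be as simple as a random value generated once per user and stored next to the username or UUID. The sketch below is one way to do that, assuming Node.js 14.17+ for `crypto.randomUUID()`; the two in-memory `Map`s stand in for a column on your user table, and `getOrCreateGaid` is a hypothetical name. Generating the value randomly, rather than deriving it from the username with a hash, matters: anyone who knows a username and the hash function could recompute a hashed ID.

```javascript
// Sketch of a GAID store. Assumes Node.js 14.17+ for crypto.randomUUID().
// The two Maps stand in for a "gaid" column on your user table.
const { randomUUID } = require("crypto");

const gaidByUser = new Map(); // username or UUID -> GAID (private to your backend)
const userByGaid = new Map(); // GAID -> username or UUID (used later to decode reports)

// Returns the user's existing GAID, or mints and stores a new random one.
function getOrCreateGaid(userKey) {
  let gaid = gaidByUser.get(userKey);
  if (gaid === undefined) {
    // Random rather than derived from the username, so a third party
    // who knows the username cannot recompute the GAID.
    gaid = randomUUID();
    gaidByUser.set(userKey, gaid);
    userByGaid.set(gaid, userKey);
  }
  return gaid;
}

// The server then renders the value into the page's tracking call, e.g.
// _gaq.push(['_setCustomVar', 2, 'gaid', getOrCreateGaid(currentUser), 1 ]);
```

Only the GAID ever appears in the page; the username-to-GAID mapping stays in your backend.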

You will then be able to track users happily; you just need to generate some reports in your backend system that decode these GAIDs into useful data. Hold tight, that’s another story.
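At its core, that decoding step is a join between report rows and your own lookup table. The sketch below assumes the rows have already been exported from Google Analytics (for example through the API) into plain objects; the field names `gaid` and `pageviews`, the helper `decodeReportRows` and the sample values are all hypothetical, and no Google API is called here.

```javascript
// Replaces each row's GAID with the backend user it maps to.
// Unknown GAIDs are flagged rather than guessed; input rows are not modified.
function decodeReportRows(rows, userByGaid) {
  return rows.map((row) => ({
    ...row,
    user: userByGaid.has(row.gaid) ? userByGaid.get(row.gaid) : "(unknown GAID)",
  }));
}

// Hypothetical lookup loaded from your user table (never sent to Google).
const lookup = new Map([
  ["9b2e4c1a", "jsmith"],
  ["d41f07be", "adoe"],
]);

// Hypothetical rows with the GAID custom variable as a dimension.
const report = [
  { gaid: "9b2e4c1a", pageviews: 42 },
  { gaid: "ffff0000", pageviews: 3 },
];

console.log(decodeReportRows(report, lookup));
```

Because the join happens on your side, the readable report never leaves your backend.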

Update (April 2014):


David is a senior developer and solutions architect at AppFusions based in Nottingham, England.


27 comments on “Identifying your users in Google Analytics while complying with section 7 of the terms of service”

  1. Shehzad says:

    Hi David,

    Thank you for this fantastic post to clear things up. We are using Moodle as our VLE and have already installed GA, but would very much like to be able to track individual user activity in Moodle. Each student at our college has a username that they log in with, and it’s usually a 6-digit number. Could we use this as our custom variable? I’m not sure what you mean when you say if the username is available publicly. Any help is very much appreciated.
    Thank you
    Shehzad

    1. David says:

      Hi Shehzad,

      If the username is something that a third party could understand and then use to get the user’s details, then that is no good. Perhaps Moodle has a UUID or similar that you could use.

  2. Pingback: Quora
  3. @dvdsmpsn says:

    RT @christian_r: The article that made my evening: you can track individuals in GA while respecting Google’s TOS #joy http://t.c…

  4. @jmesam says:

    Identifying your users in Google Analytics complying with section 7 of the terms of service http://t.co/aISnsvogLV via @analyticsdennis

  5. @SemperBanU says:

    BTW – if you ever wanted to set data with Google Analytics that ties a specific user to a data point, here’s how http://t.co/CXkCl5ftEZ

  6. Laura says:

    Hi David,

    Great article, I’ve been researching this a lot and yours is the first post that has made things a lot clearer!
    I want to be able to send an email to my database and then track individual level behaviour from the email to the website (i.e. what did those who clicked through from the email get up to on the website).
    Is this possible using your method if I used a GAID and linked it back to the email address (UUID)?
    Is there an “off the shelf” solution or do you have to be a web developer genius and create an integration API from scratch?

    Thanks,

    Laura

    1. Laura

      You should be able to add individual user tracking on all your emails if you use something like MailChimp to send to your mailing list. This would give stats on who opened what and when.

      You could add the same tracking ID as the “GUID” for Google Analytics integration on the links in your email. This should be achievable with a little JavaScript. Creating the lookup to convert from GAID back to email address is the tricky bit.

  7. Jarrod says:

    Hi David,

    Interesting article, it has cleared up a few misconceptions that I had. I am hoping you might be able to answer this question: what about storing other bits of information? We are looking at an application that does a postcode lookup and returns search results based on the postcode. Is it okay to store the postcode in GA (because it is all client-side, with no database or server facility) or, extending that concept, a geo-location? My feeling is ‘no’, but I am not so sure now.

    Thanks,
    Jarrod

    1. I’d say that storing postcodes or geo-location is a bit shady. This would not uniquely identify a person, but it does at least narrow them down to a small cohort from which a third party may be able to guess who the user is.

      A safer approach would be for you to create a “GAID” from which you could lookup the postcode or geo-location.

      1. Jarrod says:

        Thanks for the prompt reply. I will go back to the powers-that-be with that info.

        Don’t think we can use the gaid approach as there is no capacity to store the postcodes when a search is made.

        Thanks again.

  8. @pufn1ca says:

    Identifying your users in Google Analytics while complying with section 7 of the terms of service http://t.co/gvLxSJcwi0

  9. Cedric says:

    Hi, tx for the article. Is it possible to retrieve the User ID in realtime from GA for current visitor? And can you query the GA API for the segment for that user then?

    Tx!

    Cedric

  10. wq says:

    Hi David

    Seems like the below two videos are broken.

    here are the videos:

    PII for Google Analytics Chrome extension
    How to send user IDs to Google Analytics

    Thanks

    1. Hi wq,

      The videos seem to be there and working for me. However, you could try watching them on YouTube instead:

  11. Sergey says:

    Hi, David!
    Thanks for the article,
    Could you tell me, please, can I write clientId and timestamp (date, hour, second) to custom dimension or event label?
    It helps me to identify user, but only with information from CRM.

    1. Hi Sergey,

      It all depends on what clientId contains. If it’s an ID that only you know, then use it. If someone else could guess what it means, then likely it’s personally identifiable information, so shouldn’t be used.

  12. Glen says:

    It’s unclear whether or not it’s acceptable to use a user ID that is visible in a URL, for example: mysite.com/users/123-username (user ID: 123)

    The people that you quote seem to say one thing, while you state another (in the IBM Connections example).

    “but you can store ids that need your backend to resolve into a person identification” — what about a front-end?

    Does Google not want to store the actual PII (names, e-mail addresses, etc.) or does it not want to store data (such as an ID) that could be used by a user (or an automated system) to link to the PII? If it’s the latter, who would actually have access to the ID in order to do something like this?

    Thanks.

    1. If the user ID is 123 and your site sends tracking URLs such as mysite.com/users/123-username to Google Analytics, then I’d say that you may be breaking their terms of service. The reason is that you are sending 123 and the username together. This means that you’ve sent personally identifiable information to Google that can also be linked to your user ID.

      If, however, mysite.com/users/123-username is a private URL that is only ever displayed to the specific user, you could simply stop tracking all URLs starting with mysite.com/users/. Google would then never receive the data, so it could not link the user ID to the username, and you would be acting within the terms of service.

      In my example with IBM Connections, the user’s UUID was not accessible unless you were logged into the system, and it was not displayed anywhere alongside a username or any other PII.

      1. Glen says:

        Hi David,

        I’ve been thinking more about this, and I’ve realised something rather important: It is impossible for Google to know what an integer user ID represents.

        As an example, if we *didn’t* use the actual user ID of our website, and used a private UUID instead:

        #, Our User ID, Our UUID/GA User ID
        1, 123, 111
        2, 456, 123
        3, 789, 345

        If we sent the UUID 123, Google cannot assume that it represents the public user ID on our website. If for some strange reason they mapped it to the profile URLs on our website, they’d find that 123 does represent a user profile, but not for the correct user (#1). In fact, such an integer may be used in many URLs, e.g. /articles/123-my-article and /videos/123-my-video.

        Similarly, if we sent our public user IDs, they cannot assume that these map to profiles on our website, as they could just as easily represent an internal UUID that we are using.

        In other words, sending an integer is meaningless to Google, unless they know what it represents, and this information would never be available to them, and therefore you cannot personally identify someone using the GA user ID, since you don’t know what that refers to on our website (you can guess, but you could easily be referring to the wrong profile, so no assumptions can be made).

        For this reason, I think that using the “public” user IDs on our website is acceptable and should be in compliance with the terms of service, simply because the number on its own has no meaning to Google.

        Do you agree?

        1. Using a UUID/GAID is the way forward. That way, if Google’s lawyers get in touch assuming that you’re using the public user ID you can happily prove them wrong and keep your data. You can quickly marry up the data afterwards using my PII Viewer for Google Analytics Chrome extension.

  13. This is really helpful — and I can see now the importance of using anonymized userIds (which I was not). I can send these readily enough to Google through my backend. However — I don’t see how to get the information out of Google about what userIDs are accessing what parts of my site. I see that the chrome extension you made works with custom variables (because you started it before GA had rolled out userId support). Now that we do have GA userIds — should we still use your extension? does it work with userIds? or does Google have some other way of achieving the same?

    1. Google doesn’t allow you to view the user ID in any of their reports. Rather, they just use it to consolidate cross-device usage in their reporting.

      To view user IDs, add a custom dimension. To convert user IDs into something human-readable, use my PII Viewer Chrome extension, or roll your own integration.

  14. Ninh says:

    Hi David,

    I have been using your method for a while, and I have just one more question:

    Is there any way we can associate user information from Google, such as age and gender, with our email?

    1. If you have that information, then possibly. But adding too much information into the tracking may be risky from a PII perspective. An anonymised ID is ok, but adding (say) age, gender, location, ethnicity etc to your tracking is narrowing the odds that someone else could figure out who the person is, so it’s probably not cool.
