The Hide & Seek bug that led me to WebKit
Hide & Seek is a Safari content blocker I made a couple of years back. The goal was to enable someone to use Google search as an anonymous user, while being able to use other Google web services like Gmail as a logged in user (and similar stuff for Bing and Microsoft web services). To do this, the content blocker specifies rules that asks Safari to block cookies only for certain URLs used in web search, thereby preventing Google (or Bing) from identifying the user performing the search.
While developing the Hide & Seek iOS app, I noticed a problem: When I tapped on a Google search result, Google ended up knowing that I followed that particular link. My Google search history showed my activity of following that search result, even though it didn’t show the search activity.
To debug this, I used Charles to log all requests made by iOS Safari when I tap on a link in iOS Simulator. I could see that Safari was notifying a Google URL whenever I follow a search result. So I tried to block delivery of cookies to that URL by adding an appropriate rule to my content blocker. Amazingly, that had no effect. My content blocker was telling Safari to block cookies on a particular URL, but Safari was ignoring that rule and sending the cookies along anyway to that URL. Even when I modified the content blocker rule to block requests to that URL completely, that rule too was being ignored.
I noticed that the Google URL to which the errant request was being
sent appeared as the value of a
ping attribute in the search result
link’s anchor (
a) HTML element. That led me to dig up what
I learnt that the
ping attribute exists as a faster way to track clicking of
links. Normally, Google tracks clicking of search results with a redirect. In
desktop browsers, a Google search result link looks something like this
<a href="https://www.google.com/url?url=http://example.com/search-result/"> Example search result </a>
So when the user clicks on the search result link, the user is initially sent to a Google-hosted URL, with the actual search result link passed on as a query parameter. Google logs this activity, then redirects to the actual search result.
For iOS Safari, Google switched to using the
for performing this tracking, so the HTML code for Google’s search
result link for iOS looks something like this now:
<a href="http://example.com/search-result/" ping="https://www.google.com/url?url=http://example.com/search-result/"> Example search result </a>
When the user taps on this link, the browser goes directly to the search
result page, and in parallel, sends a POST request to the URL
specified in the
ping attribute. The “ping” sent as the POST request
enables Google log the activity as before. There’s now no need for
Google to redirect to the search result.
This is obviously faster than having to follow a Google-hosted URL first, and then be redirected to the actual search result. So, it’s nice of iOS Safari to support it and of Google to make use of it.
The only problem was that my content blocking rules were being ignored for these pings. There wasn’t anything I could do to fix this in my content blocker code, so I filed a radar with Apple, and then shipped my app anyway with this limitation, asking users of my app to use the Open in New Tab option in iOS Safari rather than following search results directly.
The content blocking capability in Safari comes from WebKit, which is open source. That meant I could browse the code and see why content blocking wasn’t doing what I thought it should.
I had observed that all the
<a ping>-related POST requests had their
content type set as
text/ping, and a search in the WebKit code for
that content type led me to
The code for the content blocking machinery was at
(the content blocking machinery is always referred to as “content
extensions” in the WebKit codebase).
I could see that when loading a
page and its resources,
were consulting the content blocking machinery before proceeding.
PingLoader::sendPing wasn’t doing that, and there was nothing
about content blocking in the commit history for that file. It looked
like an oversight that
PingLoader::sendPing wasn’t participating in
content blocking, so I prepared a testcase as a macOS Safari
extension and filed a
bug on WebKit.
I tried to modify
PingLoader::sendPing to make it ask the content
blocking machinery before sending the ping and tested it against the
WebKit MiniBrowser (a simple single-tab browser that’s part of the
WebKit repository). I could see that with my edits, cookies were
correctly blocked for
<a ping> pings. I added tests (which took much
longer than I thought) and submitted that as a patch to fix the issue.
I also tried to see if that would indeed fix the original problem I started with – of Google getting to know who followed its search result – by trying to use my fix from the iOS Simulator, but I couldn’t get that working. I wrote to Benjamin Poulain, the author of the WebKit blog post on content blockers, on my problem and he assured me that testing it on the MiniBrowser would suffice.
Benjamin and Alex Christensen, who are behind a lot of the content blocking code in WebKit, helped me correct my patch and also expand its scope in two ways:
- To handle the css-display-none action type in content blocking rules
- To enforce content blocking for two other fire-and-forget requests that were being handled in PingLoader.cpp: CSP violation reports, and image loads triggered by the page getting unloaded (typically used for tracking when the user is leaving a page)
My fix was accepted in a couple of weeks, but it took until iOS 10 for that to be shipped as part of iOS.
With iOS 10, Hide & Seek can now be made to handle Google search results correctly. It took me a while to update Hide & Seek for that (I took about a year off to care for my baby daughter), and that’s now shipped as well.
And that’s the story of how a problem in my app led to a fix in WebKit.