BA 49/52: Driving Large Scale Backend Projects
A TPM’s reflection on driving zero-to-one projects and lessons learned from leading Apple’s Advanced Data Protection for iCloud Project.
Apple just announced a set of new privacy features across the ecosystem. The feature I am most excited about is Advanced Data Protection for iCloud. Why? ADP for iCloud was the largest zero-to-one project I led as a TPM in my career.
Even though Apple just announced ADP this week, this project was kicked off back in 2014. It involved more than two dozen teams, from iOS, macOS, and iCloud engineering to design, QA, product marketing, legal, privacy, and government affairs.
The project spanned multiple OS and iCloud releases as well as significant rewrites of core functionality, in particular the iCloud Keychain. Alas, I never got a chance to finish it and left Apple by 2017 to join Google on the Android Things team.
This single project taught me a lot about driving immensely large, backend-heavy engineering programs. This experience became the foundation of how I approached future projects that I led at Google and Nike.
When driving large scale complex projects, there are several important lenses through which a TPM must constantly look to assess the trajectory of progress for the cross-functional team:
Performance - as you can imagine, timing is everything. To a human, 20ms or 200ms is inconsequential. In systems architecture, however, a few hundred milliseconds here and there quickly add up to human-discernible seconds. It was very important for us to measure the total round trip time for requests coming from the iOS client side to the iCloud and server backends. Round trip times should matter to you as well, and measuring them has to start from Day 0. Nothing is harder than trying to address performance issues when you are 50% done coding your architecture. Build for performance from the start.
Resilience - a good system is a resilient system. Being able to recover from a disaster or other unexpected issues is critical to success. This can range from something as ubiquitous as multi-region support to redundancy in your architecture. For every solution or design you build, ask the question: how resilient is it?
Capacity - one could argue that capacity is one facet of performance. Capacity is not about speed but rather about how many simultaneous round trips the system can handle at any given time. When we sketched out the original solution for ADP, the iCloud engineering team spent a lot of time assessing the capacity the iCloud servers needed to handle so that we didn’t find ourselves constrained. Our approach was simple: what capacity would our data centers need if all the iCloud users (billions) decided to log in at the same time during our busiest time of the year traffic-wise (Christmas)? Based on that requests-per-second estimate, we determined the number of HSM servers we needed to order to support ADP.
Resources - human and financial resources are the most critical piece of the equation for any large scale project. Financial resources matter especially when working with cloud providers like Azure, GCP, and AWS. This is particularly true when building solutions or products at startups, where cost control is everything. Make sure the project plan includes a cost estimate of what it would take to support the product or service you are building. For a startup it may be cloud provider spending; for ADP for iCloud it was the cost of buying and installing new HSM servers at Apple data centers.
Security - every system is only as strong as its weakest link. No architecture or solution should ever be signed off without evaluating it for vulnerabilities and for mitigations against common hacks, intrusions, and attacks. However, particularly with consumer-facing systems like iOS and iCloud, security must be balanced against ease of use. If you make a solution super secure yet difficult to comprehend and use, your adoption will be low. This is one of the primary reasons why Apple’s initial two-step account protection had miserably low adoption rates. Creating app-specific passwords for everything, although more robust, can be too cumbersome for users, unlike, say, a device passcode that is known only to the user and not to Apple, and that the user enters on a regular basis. Build for security, but ensure usability is not sacrificed.
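The performance, capacity, and resources lenses lend themselves to quick back-of-the-envelope math that a TPM can run long before any code is written. Below is a minimal Python sketch of that arithmetic. Every name and number in it is hypothetical and chosen only for illustration; none of these figures come from the ADP project, and the target utilization and spare-server parameters are my own assumptions about how one might pad an estimate.

```python
import math


def round_trip_ms(hops_ms):
    """Sum the time spent at each hop to get the total round trip time."""
    return sum(hops_ms.values())


def servers_needed(peak_rps, rps_per_server, target_utilization=0.8, spares=0):
    """Estimate how many servers are needed to absorb a peak request rate.

    peak_rps: worst-case requests per second (e.g. everyone logging in at once).
    rps_per_server: requests per second one server can sustain at 100% load.
    target_utilization: fraction of a server we are willing to use at peak.
    spares: extra servers held back for redundancy (assumed, not from the source).
    """
    if rps_per_server <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("rps_per_server and target_utilization must be positive")
    usable_rps = rps_per_server * target_utilization
    return math.ceil(peak_rps / usable_rps) + spares


def hardware_cost(server_count, unit_cost):
    """Purchase cost for the estimated fleet (installation is not modeled)."""
    return server_count * unit_cost


# Hypothetical per-hop latencies in milliseconds.
hops = {"client": 40, "network": 120, "backend": 90, "hsm": 150}
print(round_trip_ms(hops))  # 400

# Hypothetical peak load and per-server throughput.
count = servers_needed(peak_rps=50_000, rps_per_server=1_000,
                       target_utilization=0.8, spares=2)
print(count)  # 65
print(hardware_cost(count, unit_cost=20_000))  # 1300000
```

The point is not the precision of the numbers but having the calculation written down early, so that the latency budget, the server order, and the cost estimate can all be revisited as the design changes.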
Next time you are tasked with driving large, complex, backend-heavy projects, review the progress, architecture, and plans through the above lenses. You are looking for red flags that stick out, stories that are incomplete, questions that need to be addressed, and calculations that you forgot to do.
When you find these gaps, you as the TPM should take the lead and work with the right teams and people to fill those gaps in.
Until next time 👋!
-Aadil