Building Pinterest

This is a guide for designing a photo-sharing platform.

Requirements and Goals of the System

Requirements

  1. Users should be able to upload, download, and view images.
  2. Users can search for images based on titles or descriptions.
  3. Users can follow others to personalize their content feed.
  4. The system should generate and display a personalized homepage with top images from followed users.
  5. The platform must ensure high availability.
  6. The home page should load with low latency.
  7. Consistency can be relaxed slightly to ensure high availability; brief delays in displaying new images are acceptable.
  8. Uploaded images should be stored reliably, ensuring no data loss.

Things to Keep in Mind

The system is expected to handle a significant number of read requests, so it must prioritize fast retrieval of images.

  1. Users can upload large volumes of images, so storage management should be efficient.
  2. Fast image retrieval is critical to meet user expectations.
  3. The system must guarantee 100% reliability for uploaded images.

Capacity Estimation and Constraints

  • Assume a user base of 250M total users, with 1.5M daily active users.
  • Around 3M new images uploaded daily, equating to 35 new uploads per second.
  • Average image size: 300KB.

Storage requirements:

  • Daily storage: 3M * 300KB = ~900GB.
  • Storage for 5 years: 900GB * 365 days * 5 years = ~1.64PB.
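
These figures can be sanity-checked with a short script using only the assumptions listed above:

```python
# Back-of-the-envelope check of the capacity estimates above.
SECONDS_PER_DAY = 24 * 60 * 60

new_images_per_day = 3_000_000
avg_image_size_kb = 300

uploads_per_second = new_images_per_day / SECONDS_PER_DAY           # ~35
daily_storage_gb = new_images_per_day * avg_image_size_kb / 1e6     # ~900 GB
five_year_storage_pb = daily_storage_gb * 365 * 5 / 1e6             # ~1.64 PB

print(f"Uploads/sec:      {uploads_per_second:.0f}")
print(f"Daily storage:    {daily_storage_gb:.0f} GB")
print(f"5-year storage:   {five_year_storage_pb:.2f} PB")
```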

High-Level System Design

At a high level, the platform will need to support two primary workflows: uploading images and viewing or searching for images. This requires blob storage servers for storing images and database servers for metadata.

Database Schema

The system must store data about users, their uploaded images, and their social connections. The image table will store metadata, and an index on (ImageID, CreationDate) will facilitate fetching the latest images.

Metadata can be stored in a SQL database, which handles relationships well. However, scaling relational databases comes with its own challenges.

Image files can be stored in distributed blob systems like S3. Metadata can be managed in a key-value store, with the key being ImageID and the value being an object with fields like ImageURL, UserID, and Description.

| **Users Table** |
| --- |
| UserID (Primary Key) |
| Name |
| Email |
| CreationDate |

| **Images Table** |
| --- |
| ImageID (Primary Key) |
| UserID |
| ImageURL |
| CreationDate |
| Description |

| **Follows Table** |
| --- |
| FollowerID |
| FollowingID |
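
If metadata lives in a key-value store as described above, each Images row becomes a value keyed by ImageID. A minimal sketch, where the field names mirror the table and the in-memory dict stands in for the actual store:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ImageMetadata:
    # Value stored against the ImageID key in the key-value store.
    image_id: int
    user_id: int
    image_url: str        # points at the blob object (e.g. an S3 key)
    creation_date: str
    description: str

record = ImageMetadata(
    image_id=1024,
    user_id=42,
    image_url="s3://images-bucket/1024.jpg",
    creation_date=datetime.now(timezone.utc).isoformat(),
    description="Sunset over the bay",
)

# Key-value layout: key = ImageID, value = the serialized metadata record.
kv_store = {record.image_id: asdict(record)}
```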

Reads vs Writes

Write operations, such as persisting an upload to disk, are relatively slow, while reads can often be served quickly from a cache.

Users uploading files may take over all available connections since uploading is generally a slower process. This could block the system from serving read requests if it's overwhelmed by write operations. To address this issue, we can segregate read and write requests into distinct services.

Most web servers cap the number of concurrent connections, which makes this a real constraint: uploads hold connections open for the full, synchronous transfer, while reads can be handled asynchronously. This emphasizes the need for separate, dedicated servers for reading and writing, ensuring uploads don't dominate the system.

By isolating read and write requests, we also enable independent scaling and optimization of these operations, enhancing the system's overall efficiency and responsiveness.
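
As a rough illustration of the split, the entry point can route uploads and reads to separate backend pools. The pool names and routing rule below are assumptions made for the sketch, not a prescribed API:

```python
# Route uploads and reads to separate backend pools so slow, synchronous
# uploads cannot exhaust the connections needed to serve reads.
UPLOAD_POOL = ["upload-1.internal", "upload-2.internal"]
READ_POOL = ["read-1.internal", "read-2.internal", "read-3.internal"]

def pick_backend(method: str, path: str, key: int) -> str:
    is_upload = method == "POST" and path.startswith("/images")
    pool = UPLOAD_POOL if is_upload else READ_POOL
    return pool[key % len(pool)]   # simple placement by key

print(pick_backend("POST", "/images", 7))    # lands on an upload server
print(pick_backend("GET", "/images/7", 7))   # lands on a read server
```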

Reliability

Ensuring file integrity is critical for our service, so we’ll maintain multiple copies of each file. This way, if a storage server becomes unavailable, we can recover the data from a backup stored on another server.

This approach extends to all components of the system. To achieve high availability, we need to deploy multiple instances of services across the system. If some instances fail, others can continue handling requests, keeping the system operational. Redundancy eliminates single points of failure, ensuring the system remains robust and dependable.

Building redundancy into a system ensures it remains resilient and operational during failures. For example, if two service instances are running, and one stops functioning, the system can seamlessly switch to the other without disrupting user experience. Whether through automated or manual failover, this redundancy ensures the system is prepared to handle unexpected issues effectively.

Data Partitioning

Sharding Based on UserID: One approach is to partition data based on the UserID, ensuring all photos uploaded by a single user are stored on the same shard. For example, if the total storage requirement is 1.64 PB (about 1,640 TB) and each database shard can store 8 TB, we would need 1,640 / 8 = 205 shards. To accommodate future scalability, we might round this up and allocate 250 shards.

The shard for a user can be determined using:

ShardID = UserID % 250

To ensure system-wide uniqueness for each photo, we can append the ShardID to the PhotoID.

How can we generate PhotoIDs?

Each database shard can maintain its own auto-incrementing sequence for generating PhotoIDs. Appending the ShardID to the PhotoID makes the identifier unique across the system.
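
A minimal sketch of this scheme, assuming 250 shards and a decimal encoding that reserves the last three digits of the PhotoID for the ShardID (the encoding choice is illustrative):

```python
NUM_SHARDS = 250

def shard_for_user(user_id: int) -> int:
    # All of a user's photos live on the shard derived from their UserID.
    return user_id % NUM_SHARDS

def compose_photo_id(local_id: int, shard_id: int) -> int:
    # Append the ShardID to the shard-local auto-increment value so the
    # resulting PhotoID is unique across the whole system.
    return local_id * 1000 + shard_id   # 1000 > NUM_SHARDS, so no collisions

def decompose_photo_id(photo_id: int) -> tuple[int, int]:
    return photo_id // 1000, photo_id % 1000   # (local_id, shard_id)

shard = shard_for_user(user_id=123456)                     # -> 206
photo_id = compose_photo_id(local_id=987, shard_id=shard)  # -> 987206
print(photo_id, decompose_photo_id(photo_id))
```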

Challenges with this partitioning approach:

  1. Hot Users: Popular users with many followers could overwhelm their shard with high traffic when their photos are accessed or downloaded frequently.
  2. Uneven Storage: Users with significantly larger photo libraries could create imbalances in storage distribution across shards.
  3. Single-Shard Limitations: If one user's data exceeds the capacity of a single shard, splitting their data across multiple shards could result in increased latency.
  4. Availability Risks: Storing all of a user's data on one shard means their data is entirely unavailable if that shard goes offline or becomes overloaded.

Sharding Based on PhotoID: An alternative approach is to generate globally unique PhotoIDs first and determine the shard using:

ShardID = PhotoID % 250

This method avoids the need to append ShardID to PhotoIDs, as the IDs would already be unique across the system.

How can we generate PhotoIDs? We can use a generator that issues globally unique, time-sortable IDs (for example, ULIDs or Snowflake-style IDs), as sketched below.
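
A rough sketch of such a generator, loosely modeled on Snowflake-style IDs. The bit layout and worker-ID handling here are assumptions, and edge cases such as clock skew or sequence exhaustion are ignored:

```python
import threading
import time

class SortableIdGenerator:
    """Time-prefixed IDs: millisecond timestamp | worker id | sequence."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id & 0x3FF   # 10 bits for the worker
        self.sequence = 0                    # 12 bits per millisecond
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
            else:
                self.sequence = 0
                self.last_ms = now_ms
            # Time in the high bits keeps IDs sortable by creation time.
            return (now_ms << 22) | (self.worker_id << 12) | self.sequence

gen = SortableIdGenerator(worker_id=7)
photo_id = gen.next_id()
shard_id = photo_id % 250   # ShardID = PhotoID % 250
```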

Ranking and Timeline Generation

To build a user's timeline, the system needs to gather the most recent, popular, and relevant photos from the people they follow.

Suppose we need to retrieve the top N photos for a user’s timeline. The application server first queries the list of users they follow and fetches metadata for the latest N photos from each of them. These photos are then ranked using criteria like recency and engagement (likes, shares, etc.) to determine the top N results for the timeline. However, this approach can result in high latency due to the need to query multiple tables and perform computationally expensive sorting, merging, and ranking operations.
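
A simplified sketch of this on-demand approach; fetch_followees and fetch_latest_photos are hypothetical metadata lookups, and the scoring function is only a stand-in for a real ranking model:

```python
import heapq

def build_timeline(user_id: int, n: int,
                   fetch_followees, fetch_latest_photos) -> list[dict]:
    # Fan in: gather the latest N photos from every followed user.
    candidates = []
    for followee in fetch_followees(user_id):
        candidates.extend(fetch_latest_photos(followee, limit=n))

    # Rank by a simple blend of recency and engagement; created_at is
    # assumed to be an epoch timestamp in seconds.
    def score(photo: dict) -> float:
        return photo["created_at"] + 60 * photo.get("likes", 0)

    return heapq.nlargest(n, candidates, key=score)
```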

Pre-generating the timeline:

To reduce latency, we can precompute timelines and store them in a cache for fast retrieval. Specialized servers can continuously generate these timelines in the background. When a user requests their timeline, the system simply queries this precomputed data for a quick response.

Approaches to delivering timeline data to users:

  1. Pull Model: Users periodically request (pull) timeline updates from the server.
    • Challenges:
      • Updates are delayed until the user requests them.
      • Many pull requests may return no new data, wasting resources.
  2. Push Model: The server pushes new timeline updates to users as soon as they’re available. This requires users to maintain a long-polling connection or similar mechanism.
    • Challenges:
      • High traffic for popular users (e.g., celebrities with millions of followers) can overwhelm the system.
      • Frequent updates to large follower groups can strain server resources.
  3. Hybrid Model: A combination of pull and push strategies can balance efficiency (see the sketch after this list):
    • Push updates to users with moderate followings.
    • Users with extensive followings or high activity levels receive updates less frequently or via pull requests.
    • Limit push updates to a reasonable frequency to manage server load.
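
A minimal sketch of the hybrid decision, assuming a hypothetical follower-count threshold and placeholder fan-out callbacks:

```python
PUSH_FANOUT_LIMIT = 10_000   # above this, followers pull instead (assumed threshold)

def on_new_photo(author_id: int, photo_id: int, follower_count: int,
                 push_to_followers, mark_for_pull) -> None:
    if follower_count <= PUSH_FANOUT_LIMIT:
        # Cheap enough to push the update into each follower's timeline now.
        push_to_followers(author_id, photo_id)
    else:
        # Celebrity case: record the photo so followers merge it in at read time.
        mark_for_pull(author_id, photo_id)
```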

Querying Photos for Timeline Generation

Since timeline generation needs the latest photos, we can improve query performance by range-partitioning the metadata on a timestamp such as CreationDate. This limits each timeline query to the most recent partitions instead of scanning the full dataset.
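
For illustration, if metadata were partitioned into one partition per day on CreationDate, a timeline query would only need to scan the newest few partitions; the partition naming below is purely an assumption:

```python
from datetime import datetime, timedelta, timezone

def partitions_for_window(days_back: int) -> list[str]:
    # Daily partitions named by date; only the newest ones are scanned
    # when building a timeline, instead of the full Images table.
    today = datetime.now(timezone.utc).date()
    return [f"images_{(today - timedelta(days=d)).isoformat()}"
            for d in range(days_back)]

print(partitions_for_window(days_back=3))
```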

Cache and Load Balancing

To support a global user base, the service needs an efficient, large-scale photo delivery system. This can be achieved by deploying a geographically distributed network of photo cache servers and using Content Delivery Networks (CDNs) to bring content closer to users.

Caching Metadata:

For frequently accessed database rows, caching can significantly reduce load and improve performance. We can use Memcached or a similar system to store "hot" data. Application servers should check the cache before querying the database. The Least Recently Used (LRU) eviction policy can be effective, as it removes the least accessed items first.
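
A cache-aside lookup might look roughly like this; the cache and db client interfaces are assumptions standing in for Memcached and the metadata store:

```python
CACHE_TTL_SECONDS = 300

def get_image_metadata(image_id: int, cache, db) -> dict | None:
    key = f"image:{image_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                      # cache hit: no database round trip
    row = db.fetch_image(image_id)         # cache miss: fall through to the DB
    if row is not None:
        cache.set(key, row, ttl=CACHE_TTL_SECONDS)
    return row
```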

Load Balancing:

CDNs and distributed cache servers help balance the load across multiple geographic regions. Load balancers can distribute traffic among servers based on availability, proximity, and current load, ensuring users experience minimal latency. This strategy ensures scalability and reliability for photo delivery on a global scale.
