Building a Text Sharing Platform
We’ll create a text-sharing web application, enabling users to store snippets of plain text and access them via unique, random URLs.
What is a Text Sharing Platform?
Such platforms allow users to upload text and generate unique links to access the content. They are frequently used for quickly sharing snippets like code, logs, or notes with others through a simple URL.
System Requirements and Objectives
Our text-sharing service must fulfill the following:
Core Functionalities:
- Users can upload text and receive a unique link for access.
- Only plain text uploads are allowed.
- Uploaded content should expire after a predefined time, with users optionally setting the expiration duration.
- Users may specify a personalized alias for their links.
- Ensure data reliability—uploaded content should not be lost.
- Maintain high availability so users can consistently access their text.
- Minimize latency for real-time text retrieval.
- Generated links should be secure and hard to predict.
Key Design Considerations
While this service shares similarities with a URL shortening system, some unique requirements demand attention:
- Text size limits: To avoid misuse, impose a cap of 8MB per paste.
- Custom URL restrictions: Users may define custom links, but these should have reasonable length restrictions for uniformity.
Estimating Capacity and Constraints
This system is expected to handle far more reads than writes.
Traffic projections:
Assuming 800,000 new pastes daily and a 6:1 read-to-write ratio, the service would see roughly 4.8 million reads per day.
New pastes per second:
800,000 / (24 × 3600) ≈ 9 pastes/second
Reads per second:
4,800,000 / (24 × 3600) ≈ 56 reads/second
Storage estimates:
The average paste size is estimated at 8KB, since snippets such as logs and code are typically small.
Daily storage requirement:
800,000 × 8KB = 6.4GB/day
Over a decade, the total storage required would be approximately 23TB, assuming no deletions. To keep utilization at no more than 70%, we provision roughly 33TB (23TB / 0.7).
Unique key storage:
With roughly 2.9 billion pastes over 10 years (800,000/day × 365 × 10), a base62 encoding system (A-Z, a-z, 0-9) generating 6-character strings would suffice:
62⁶ ≈ 56 billion unique keys
Storing all keys (assuming 1 byte per character):
2.9 billion × 6 bytes ≈ 17GB, which is negligible relative to 33TB.
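To make the keyspace concrete, here is a minimal base62 encoding sketch in Python; the function name and alphabet ordering are choices made for this example, not a standard.

```python
import string

# 62-character alphabet: A-Z, a-z, 0-9 (the ordering is an arbitrary choice here)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE = len(ALPHABET)  # 62

def base62_encode(n: int, width: int = 6) -> str:
    """Encode a non-negative integer as a fixed-width base62 string."""
    chars = []
    for _ in range(width):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

print(BASE ** 6)             # 56800235584 -> ~56 billion unique 6-char keys
print(base62_encode(12345))  # 'AAADNH' with this alphabet ordering
```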
Bandwidth estimates:
For uploads: 9 writes/second × 8KB ≈ 72KB/s
For reads: 56 reads/second × 8KB ≈ 0.44MB/s
Caching requirements:
Caching the top 20% of accessed pastes requires:
0.2 × 4.8M reads/day × 8KB ≈ 7.7GB.
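These figures are easy to sanity-check in code. The sketch below reproduces the back-of-envelope numbers from this section using only the stated assumptions (800K daily pastes, a 6:1 read-to-write ratio, and an 8KB average paste).

```python
# Back-of-envelope capacity estimates for the text-sharing service.
SECONDS_PER_DAY = 24 * 3600

daily_pastes = 800_000
read_write_ratio = 6
avg_paste_kb = 8

daily_reads = daily_pastes * read_write_ratio            # 4.8M reads/day
writes_per_sec = daily_pastes / SECONDS_PER_DAY          # ~9
reads_per_sec = daily_reads / SECONDS_PER_DAY            # ~56

daily_storage_gb = daily_pastes * avg_paste_kb / 1e6     # ~6.4 GB/day
decade_storage_tb = daily_storage_gb * 365 * 10 / 1e3    # ~23 TB over 10 years
provisioned_tb = decade_storage_tb / 0.7                 # ~33 TB at 70% utilization

write_bw_kbs = writes_per_sec * avg_paste_kb             # ~74 KB/s (~72 with the rounded 9 writes/s)
read_bw_mbs = reads_per_sec * avg_paste_kb / 1e3         # ~0.44 MB/s

cache_gb = 0.2 * daily_reads * avg_paste_kb / 1e6        # ~7.7 GB for the hot 20%

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s")
print(f"{daily_storage_gb:.1f} GB/day, {decade_storage_tb:.0f} TB/decade, plan {provisioned_tb:.0f} TB")
print(f"{write_bw_kbs:.0f} KB/s up, {read_bw_mbs:.2f} MB/s down, cache {cache_gb:.1f} GB")
```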
API
The platform’s core functionalities can be accessed through RESTful APIs. Below are examples:
Create Paste API:
createPaste(api_key, content, custom_alias=None, username=None, title=None, expiration=None)
- Parameters:
  - api_key (string): A unique key for API quota management.
  - content (string): The text to be stored.
  - custom_alias (string): Optional user-defined alias.
  - username (string): Optional user identifier.
  - title (string): Optional title for the text.
  - expiration (string): Optional expiration time.
- Response: Returns a unique URL or an error code.
Retrieve Paste API:
getPaste(api_key, paste_id)
- Parameters:
  - paste_id: A unique identifier for the paste.
- Response: Returns the content or an error message.
Delete Paste API:
deletePaste(api_key, paste_id)
- Response: Returns true on success or false on failure.
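To make the API concrete, here is a minimal sketch of the three endpoints using Flask, with an in-memory dict standing in for the datastore. The route paths, status codes, and random_key helper are illustrative assumptions; API-key quota checks and expiration handling are omitted for brevity.

```python
import secrets
import string
from flask import Flask, request, jsonify

app = Flask(__name__)
pastes = {}  # in-memory stand-in for the real datastore

ALPHABET = string.ascii_letters + string.digits  # base62 characters

def random_key(length: int = 6) -> str:
    """Generate a hard-to-predict base62 key (the security requirement above)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

@app.post("/paste")
def create_paste():
    body = request.get_json()
    key = body.get("custom_alias") or random_key()
    if key in pastes:
        return jsonify(error="alias already taken"), 409
    pastes[key] = body["content"]
    return jsonify(url=f"https://paste.example.com/{key}"), 201

@app.get("/paste/<paste_id>")
def get_paste(paste_id):
    if paste_id not in pastes:
        return jsonify(error="not found"), 404
    return jsonify(content=pastes[paste_id])

@app.delete("/paste/<paste_id>")
def delete_paste(paste_id):
    # Returns true on success, false if the paste did not exist.
    return jsonify(success=pastes.pop(paste_id, None) is not None)
```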
Database Schema
Observations for database design:
- Must store billions of records.
- Metadata (e.g., timestamps, IDs) is small (~50 bytes per record).
- Text content varies from a few KB to several MB.
- Minimal relational dependencies.
- Optimized for heavy read operations.
We would need two tables: one for storing information about the pastes and the other for user data.
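As an illustration, here is one possible schema expressed as SQLite DDL via Python's sqlite3 module; the table and column names (url_key, content_path, etc.) are assumptions for this sketch, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect("pastebin.db")

# Paste table: a small metadata row per paste (~50 bytes). The content itself
# lives in blob storage and is referenced by content_path (an assumed column).
conn.execute("""
CREATE TABLE IF NOT EXISTS pastes (
    url_key       TEXT PRIMARY KEY,   -- 6-char base62 key
    content_path  TEXT NOT NULL,      -- pointer into blob/object storage
    user_id       INTEGER,            -- NULL for anonymous pastes
    title         TEXT,
    created_at    TIMESTAMP NOT NULL,
    expires_at    TIMESTAMP           -- NULL means no expiration
)""")

# User table: minimal account data.
conn.execute("""
CREATE TABLE IF NOT EXISTS users (
    user_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    email         TEXT UNIQUE,
    created_at    TIMESTAMP NOT NULL
)""")
conn.commit()
```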
Architecture
At a high level, we need an application layer that serves all read and write requests. The application layer talks to a storage layer to store and retrieve data. We can segregate the storage layer so that one database stores metadata about each paste, user, etc., while the other stores the paste contents in some form of blob storage or a database. This division of data allows us to scale each part independently.
Application layer
Our application layer will process all incoming and outgoing requests. The application servers will communicate with the backend datastore components to serve these requests.
How to handle a write request? Upon receiving a write request, our application server inserts the data (text content, metadata, etc.) into the respective data stores. After a successful insertion, the server can return the URL to the user, with the key being a UUID or Snowflake ID that serves as the primary key of the corresponding row in the Paste table. Both paths are sketched after the read flow below.
How to handle a paste read request? Upon receiving a read request, the application layer contacts the datastore and searches for the key. If the key is found, the paste’s contents are returned; otherwise, an error code is returned.
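Below is a minimal sketch of both paths under the assumptions above: in-memory dicts stand in for the metadata database and the blob store, and a random base62 key stands in for the UUID/Snowflake-derived key. All names (handle_write, handle_read, paste.example.com) are illustrative.

```python
import secrets
import string
import time

metadata_db = {}  # stand-in for the metadata database (MySQL/Cassandra)
blob_store = {}   # stand-in for blob storage holding paste contents

ALPHABET = string.ascii_letters + string.digits

def handle_write(content: str, expiration_secs: int | None = None) -> str:
    """Write path: store content in the blob store, metadata in the DB,
    and return the paste URL built from a hard-to-predict key."""
    while True:
        key = "".join(secrets.choice(ALPHABET) for _ in range(6))
        if key not in metadata_db:  # retry on the (rare) key collision
            break
    blob_store[key] = content
    metadata_db[key] = {
        "created_at": time.time(),
        "expires_at": time.time() + expiration_secs if expiration_secs else None,
    }
    return f"https://paste.example.com/{key}"

def handle_read(key: str) -> str:
    """Read path: look up the key; fail if it is missing or expired."""
    meta = metadata_db.get(key)
    if meta is None:
        raise KeyError("404: paste not found")
    if meta["expires_at"] is not None and time.time() > meta["expires_at"]:
        raise KeyError("410: paste expired")
    return blob_store[key]

url = handle_write("hello world", expiration_secs=3600)
print(url, "->", handle_read(url.rsplit("/", 1)[-1]))
```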
Datastore layer
We can divide our datastore layer into two:
- Metadata database: We can use a relational database like MySQL or a distributed key-value store like Dynamo or Cassandra.
- Blob storage: We can store the contents in blob storage, which could be a distributed file store or an SQL-like database. Whenever we approach full capacity on content storage, we can easily increase it by adding more servers.