The System Design Template I Use

System Design Primer: https://github.com/donnemartin/system-design-primer

System / architecture design is an important part of any software engineering project. Right after gathering feature requirements and before diving into development, every project lead has to come up with a system design document illustrating what the overall system will look like and how it will interact with external services. This process is followed at almost all big tech companies, including FAANG (or MAMAA now?) and others.

Today I am going to present a system design document outline that I personally use before working on any large-scale project. It is inspired by many senior engineers’ design documents and templates, and hopefully it will help you with your next software design.

A design usually has two parts, High-Level Design and Low-Level Design, but I will present a template that combines both.

Without further ado:


Overview (Optional)

Describe the purpose of the document and who it is for. Should readers know anything before diving in? Is there a target audience (technical vs. non-technical, etc.)?

Problem

This is an important part of the document.

Explain the problem at hand that needs a solution. Don’t discuss the background here; put that information in the Appendix. Share relevant information about the problem and link the wiki documents the reader can visit to understand more about the scenario. Avoid abbreviations and assume the reader has no prior knowledge of the problem.

Tenets (Optional)

A tenet is a principle or belief that helps align teams and bring everyone into an agreement around critical questions. At Amazon, we add this section to go back to first principles while making a design decision. This is optional for small designs though.

Requirements

Describe all the requirements that the problem imposes. What is in scope for the proposed solution? Preferably write from the end user’s perspective. What are they expecting from this solution? List all the requirements that you can think about for the project / product and get them verified by the stakeholders.

Sometimes it helps to add use cases:

  • As a retail website user, I want to be able to add a product to my cart.
  • As a financial analyst, I want the monthly report to be generated in under 5 minutes.
  • As an on-call engineer, I need a dashboard representing the health of the system by region.

Out of Scope

Requirements describe what is in scope for the project. In some cases, however, it is worth explaining what is out of scope, to help the reader better understand the decision-making framework and to avoid unnecessary questions.

Success Criteria

Imagine the solution is in production already. How will you evaluate the success of the product? What data can you use?
For example:

  • Time-to-Market reduced by 90%.
  • The solution can scale to 10,000 users with 100ms P90 latency.
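A latency criterion like the one above is only useful if you agree on how the percentile is computed. The snippet below is a minimal sketch, assuming the nearest-rank definition of a percentile and a made-up list of request latencies; the function names and the sample data are illustrative, not part of any particular monitoring tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples_ms, pct=90, target_ms=100):
    """True if the given percentile of the samples is within the target."""
    return percentile(samples_ms, pct) <= target_ms


# Hypothetical latencies (in milliseconds) collected from a load test.
latencies_ms = [20, 35, 40, 42, 55, 60, 75, 80, 95, 250]

print("P90:", percentile(latencies_ms, 90))
print("Meets 100 ms P90 target:", meets_latency_target(latencies_ms))
```

Note how a single slow request (250 ms) does not break a P90 target but would break a P99 one, which is why the success criterion should name the percentile explicitly.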

Architecture

Describe the architecture of the solution with explanatory text, bullet points and diagrams. Avoid relying on too many lists, or on diagrams with no text. A good design balances a substantial amount of explanation with a few key diagrams and lists. Also explain why this design is the recommended solution for the problem at hand.

High-Level Overview (HLD)

Start with the overview of high-level design. Make a list of system components. Focus on logical components of the solution rather than particular technology.
Good example:

  • Data Ingestion service
  • Data storage
  • Web UI
  • Notification service
  • Export functionality

Bad example:

  • DynamoDB
  • ReactJS

Creating a diagram for a high-level overview is always a good idea.

API Design

List all the APIs through which users / services will interact with this product / service. Mention the payloads, verbs and versions for each API. How can this API model evolve in the future? How will customers interact with these APIs?

This section can evolve during implementation, so it is fine to leave some aspects for later.
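One lightweight way to capture verbs, versions and payloads in the document is a small table or a structured listing. Below is a minimal sketch in Python, assuming a hypothetical shopping-cart service; the `Endpoint` class, the paths and the field names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One row of the API table: verb, versioned path and payload fields."""
    verb: str
    path: str
    version: str
    request_fields: tuple = ()
    response_fields: tuple = ()

    def route(self):
        # The version is part of the URL so that v2 can coexist with v1.
        return f"{self.verb} /{self.version}{self.path}"


# Hypothetical API surface for a cart service.
API = [
    Endpoint("POST", "/carts/{cart_id}/items", "v1",
             request_fields=("product_id", "quantity"),
             response_fields=("cart_id", "items")),
    Endpoint("GET", "/carts/{cart_id}", "v1",
             response_fields=("cart_id", "items")),
]

for endpoint in API:
    print(endpoint.route(), "->", ", ".join(endpoint.response_fields))
```

Putting the version in the route is only one of several versioning strategies; whichever you choose, writing it down here makes the evolution story explicit.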

Data Storage and Model

What data model will you be using? Which database is suitable for this? Evaluate how much data the system will be processing. Make a future growth forecast. Prove that the solution will scale to the needs of the business in 3-5 years.

Also think about data pipelines, data ingestion and pre-processing, storage layer etc.
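A growth forecast does not need to be elaborate; a compound-growth estimate is often enough to start the conversation. The snippet below is a rough sketch, assuming a hypothetical starting size of 500 GB and 50% growth per year. Both numbers are placeholders that you would replace with your own measurements.

```python
def forecast_storage_gb(current_gb, yearly_growth_rate, years):
    """Compound-growth forecast: size after year y is current * (1 + rate) ** y.
    Returns one value per year, starting with year 0 (today)."""
    return [current_gb * (1 + yearly_growth_rate) ** year
            for year in range(years + 1)]


# Hypothetical inputs: 500 GB today, growing 50% per year, 5-year horizon.
forecast = forecast_storage_gb(current_gb=500, yearly_growth_rate=0.5, years=5)

for year, size_gb in enumerate(forecast):
    print(f"Year {year}: {size_gb:,.0f} GB")
```

Even this simple model shows that 50% yearly growth means more than seven times the data after five years, which may change the choice of database or partitioning scheme.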

Application / Component Level Design (LLD)

Dive deeper into the design of each individual component in the following sections. Add component, data flow and control flow diagrams, whichever are applicable.

Dependencies

Be explicit about the other systems you’re interacting with. Are they internal or external? Dependencies carry a large number of risks because you have no control over those systems, so they require thorough analysis and risk mitigation.
Share your assumptions about dependencies. For instance: we assume Service A will be able to handle 50,000 TPS.

Design Alternatives Considered

Talk about all the design alternatives you considered (combinations of different infrastructure platforms, databases, service frameworks, logical approaches, etc.), then name the one you recommend for this project and explain why.

A table of pros and cons, with columns like cost, scalability, ease of use, latency, maintainability and community support, is a good way to judge the best services and platforms for this design.
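If the pros-and-cons table gets large, a weighted decision matrix can turn it into a ranking. The snippet below is a minimal sketch: the criteria weights, the option names and the 1-5 scores are all invented for illustration, and the result is only as good as the scores the team agrees on.

```python
# Hypothetical weights in percent; they should sum to 100.
WEIGHTS = {"cost": 30, "scalability": 30, "latency": 20, "maintainability": 20}

# Hypothetical scores from 1 (worst) to 5 (best) for each alternative.
OPTIONS = {
    "Option A": {"cost": 4, "scalability": 3, "latency": 5, "maintainability": 4},
    "Option B": {"cost": 2, "scalability": 5, "latency": 4, "maintainability": 3},
}


def weighted_score(scores, weights):
    """Sum of weight * score over all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_options(options, weights):
    """Option names sorted from highest to lowest weighted score."""
    return sorted(options,
                  key=lambda name: weighted_score(options[name], weights),
                  reverse=True)


for name in rank_options(OPTIONS, WEIGHTS):
    print(name, weighted_score(OPTIONS[name], WEIGHTS))
```

The score should support the written reasoning, not replace it; reviewers will still want to know why a criterion received the weight it did.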

Cost Analysis

Analyze how much the solution will cost in infrastructure. This section can be optional for smaller problems; as a general rule, however, it is worth spending some time calculating the impact in order to choose the right solution.

Plan for future growth.

Failure Modes

Contemplate what can go wrong in the system: dependency failure, traffic overflow, performance degradation, bugs in the business logic, etc. It is extremely important to know how the system can fail.

Non Functional Requirements

Think about the non-functional aspects of your software project: scalability, availability, maintainability, reliability, latency, security, etc. All of these matter for the project in the long run.

Scalability

How many users are you expecting? How many transactions / queries? How much data?

Latency and Availability

What are the P99, P50, etc. latency numbers? What availability do you need? Create an SLA (Service Level Agreement) with your end customers around these numbers.
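When discussing availability targets, it helps to translate the percentage into an allowed-downtime budget. The snippet below is a small sketch of that arithmetic, assuming a 30-day month; the function name is illustrative. For example, a 99.9% target over 30 days leaves roughly 43 minutes of downtime.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Downtime budget for a period: total minutes * (100 - availability) / 100."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% availability -> {budget:.1f} minutes of downtime per 30 days")
```

Each additional "nine" cuts the budget by a factor of ten, which is worth stating plainly before committing to an SLA.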

Maintainability

How much maintenance will be required for each service / component? Who will manage the services in the long run?

Security

State the level of security this system requires and how the system will achieve it. The required level depends on the kinds of data the system stores or interacts with: user information, financial information, user connection details, etc.

Testing and Observability

Talk about the testing, monitoring and alerting strategies that will be used before and after the product goes into production.

Testing (Optional)

This section describes how testing will be done (unit tests, integration tests, A/B tests, stress tests, etc.) and which tools will be used. Its content depends on team and company-wide practices.

Metrics and Alarms

Talk about the key metrics that will be tracked and how success will be measured. Have clear dashboards tracking the most important metrics. Mention the alarms that track the SLAs and where the team will see them.
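When you describe an alarm, spell out its rule: which metric, which threshold, and for how long it must be breached. The snippet below is a minimal sketch of one common rule, "fire after N consecutive datapoints above the threshold"; the function, the threshold and the sample data are hypothetical and not tied to any specific monitoring product.

```python
def alarm_fires(datapoints, threshold, consecutive=3):
    """True if `consecutive` datapoints in a row exceed the threshold."""
    streak = 0
    for value in datapoints:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False


# Hypothetical P90 latency per minute, in milliseconds, against a 100 ms SLA.
steady_breach = [90, 120, 130, 110]
brief_spikes = [90, 120, 80, 130, 110]

print(alarm_fires(steady_breach, threshold=100))  # three breaches in a row
print(alarm_fires(brief_spikes, threshold=100))   # streak is interrupted
```

Requiring several consecutive breaches trades a little detection delay for fewer false alarms from short spikes.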

Concerns / Risks

Discuss the main risks and concerns that can affect the project, for example external dependencies or a resource shortage.

Future Improvements (Optional)

This section talks about the features which the users can expect in future releases. Talk about different features you plan to roll out and what that timeline can look like.

FAQs

This section is a good place for questions that people ask regularly and that you don’t want to spend time walking through during a review. If the design is new and you don’t have a history of questions, try to anticipate what people might ask.

Appendix A: Subtitle (Optional)

Can contain deep technical analysis, data, detailed investigation write-ups and so on: material that some people might ask for but that not everyone needs to read during a review.

Glossary

  • MTTR – Mean Time To Resolution of a system outage.

References

  • Service 1: [link]

[1] Glossary – an alphabetical list of terms or words found in or relating to a specific subject, text, or dialect, with explanations; a brief dictionary.



This is the extensive system design outline that will help you think about all the aspects of your software project. Let me know if I missed something.



Why Everyone Should Learn To Write

Do you hate writing? Me too.

I am not talking about tweets or IG posts. I’m talking about 500-word articles and essays. Most of us don’t like writing them. They take too much time and effort.

But is time really the problem? I don’t think so. We spend a lot of time on social media every day, right? Yeah, time is not the problem.

So is effort the problem? Hmm, let’s see. What happens when you open a blank page to write? Does anxiety kick in? For me it does.


Who should you be: Technology Generalist or Specialist?

Everyone in the tech industry knows that every few years (months?) a new technology or framework enters the market.

Angular, Ember and jQuery were good enough until React came along. Not that people don’t use Angular anymore, but everyone wants to learn React now.

The same is true for every other computer science field: Deep Learning and Reinforcement Learning became extremely popular in the Machine Learning (ML) field once neural networks started improving computer vision applications.

It goes for trending tech as well: social media apps made web and mobile development very popular. Then ML and AI entered the market, along with Blockchain and IoT.

This is how the general trend goes.

A novel technology arrives –> everyone starts using it –> It becomes the Industry favorite –> A novel technology arrives

Now the question is, in this forever evolving world of technology, should you specialize in one field or try many?


Comparative Case Study of ML Systems: Tensorflow vs PyTorch

In this post, I’ll perform a small comparative study of the background architecture described in two papers: “TensorFlow: A System for Large-Scale Machine Learning” and “PyTorch: An Imperative Style, High-Performance Deep Learning Library”.

The information below is extracted from these two papers.


Top Resources for getting a Software Engineer job at Big N Companies


This short post is written for recent graduates, current students and code newbies looking for a job as a Software Engineer in the Big N companies like Google, Facebook, Amazon, Netflix etc.

I have personally tried most of the resources mentioned here (free and paid) during my job search. These have helped me to land offers from companies like Samsung, Myntra, Walmart Labs and most recently Amazon.

Hopefully, this list helps you prepare for your dream job too!

Note: This list is by no means comprehensive and is only meant to provide a starting point for your job prep. I will keep adding resources here. Please contact me if you have some to suggest.


My Summer Internship Experience at Walmart Labs

Now that I’m back at Arizona State University for my fall semester, this seems like a good time to share my Summer 2019 Internship experience with everyone.

I decided to write this blog for anyone interested in applying at Walmart Labs for an Internship. The whole experience was full of learning and fun and I’ll always be grateful for this opportunity.


A brief Introduction to Support Vector Machine

Support Vector Machine (SVM) is one of the most popular Machine Learning classifiers. It falls under the category of supervised learning algorithms and uses the concept of a margin to separate classes. It often gives better accuracy than KNN, Decision Trees and the Naive Bayes classifier, and hence is quite useful.


Life @ Arizona State University

This fall I joined the Masters in Computer Science Graduate program at the ever amazing and diverse Arizona State University.

Coming to the US was an intimidating task, given that I had never lived alone, and the baggage loss at the airport added to the troubles, but more on that later. So after arriving in Phoenix, Arizona and witnessing the blistering heat of this otherwise amazing city, I finally managed to go to the Sun Devil county, my home for the next 2 years, Arizona State University!


How AI will affect different fields of Technology!

Artificial intelligence is a term that inspires wonder in the minds of some people but instigates terror in the hearts of others. The truth is AI is changing our lives at a phenomenal speed. From driverless cars to special purpose robots, AI is meant to enhance the quality of human lives.


Want to ace technical interviews? Get started with Competitive Programming!

If you are preparing for Software Developer / Engineer jobs, you have to be prepared to go through rigorous technical interviews. All these interviews require good programming skills. Apart from impressive side projects and relevant experience, knowledge of Data Structures (DS) and Algorithm Design & Analysis (ADA), together with good problem-solving skills, is the most important thing you’ll need to ace the interview.
