In this post, I’ll perform a small comparative study of the background architectures described in “TensorFlow: A System for Large-Scale Machine Learning” and “PyTorch: An Imperative Style, High-Performance Deep Learning Library”.
The information below is drawn from these two papers.
I chose TensorFlow and PyTorch for this comparative study because I have used both systems, and understanding their underlying design principles fascinated me. I wanted to know how these systems process models under the hood in a way that makes them the industry standard. The speed and architectural sophistication needed to solve industry-scale machine learning problems are difficult to achieve, and both systems have managed to do so.
This post aims to provide a look at the architectural differences between the two systems. For a deeper understanding of how each works under the hood, please read their academic papers! They’re really informative and simple to read.
Data Flow Graphs: TensorFlow came out as a successor to DistBelief, Google’s previous ML platform, which used a parameter-server model. TensorFlow combines the high-level programming model of data flow graphs with the low-level efficiency of parameter servers, making it much more powerful than its predecessor. It unifies computation and state management, which were handled by separate worker threads in parameter-server architectures. It also supports TPUs, which can outperform CPUs and GPUs on deep learning workloads. All of this has made developing complex (and novel) deep learning models comparatively easy for researchers.
A data flow graph represents a neural network program along with placeholders for input data and rules for updating state. Because the whole graph is available before execution, it can be used to determine the ordering of operations, estimate memory consumption, and so on.
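To make the idea concrete, here is a minimal sketch of a deferred-execution data flow graph in plain Python. This is illustrative only: the `Node`, `placeholder`, `add`, and `mul` names are my own, not the TensorFlow API; the point is that graph construction and graph execution are separate phases.

```python
# Minimal sketch of a deferred-execution data flow graph (illustrative only,
# not the real TensorFlow API). Nodes record operations; nothing is computed
# until run() is called with values for the placeholders.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # function to apply, or None for a placeholder
        self.inputs = inputs  # upstream Node objects

    def run(self, feed):
        if self.op is None:                        # placeholder: look up value
            return feed[self]
        args = [n.run(feed) for n in self.inputs]  # evaluate dependencies first
        return self.op(*args)

def placeholder():
    return Node(op=None)

def add(a, b):
    return Node(op=lambda x, y: x + y, inputs=(a, b))

def mul(a, b):
    return Node(op=lambda x, y: x * y, inputs=(a, b))

# Build the graph first (no computation happens here)...
x, w, b = placeholder(), placeholder(), placeholder()
y = add(mul(x, w), b)   # y = x * w + b

# ...then execute it, TensorFlow-1.x style, by feeding the placeholders.
print(y.run({x: 3.0, w: 2.0, b: 1.0}))  # 7.0
```

Because the full graph exists before `run` is called, a real system can reorder, fuse, or distribute the operations before executing anything, which is exactly what the deferred model buys TensorFlow.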
High GPU Utilization: TensorFlow provides a single-language platform for developing new ML architectures, is really fast, and uses a single data flow graph to represent all computation and state in an algorithm. By deferring execution until the whole program is available, it improves overall execution performance, i.e., high GPU utilization. The main power of TensorFlow lies in concurrent and distributed execution of overlapping subgraphs of the overall graph.
Community Support: TensorFlow also has huge community support and works quite well in production. Many large companies and labs, such as Google, Twitter, Airbnb, and OpenAI, therefore use TensorFlow backends for their ML projects.
Mobile Interfaces: It also provides distributed execution across clusters and dynamic workflows, which allows complex ML programs to run on CPUs, GPUs, and TPUs while exposing an interface on a mobile device. TensorFlow Lite was launched with mobile devices in mind.
Large-Scale ML: For intensive large-scale tasks such as image classification and language modeling, TensorFlow provides a fault-tolerant, distributed, and optimized architecture that can be used to train very large ML models, as is visible in Google’s ML-focused applications.
Steep Learning Curve: Although TensorFlow aims to make it easier for researchers to develop ML models, it adds the overhead of understanding its architecture, which gives it a steep learning curve. It also doesn’t provide fault tolerance for individual operations the way Spark’s RDDs do.
Debugging and Dynamic Execution: Ease of debugging ML programs is not something the TensorFlow design focuses on. It also did not originally support dynamic computation graphs: a model could only be run once its computation graph had been fully defined. Support for this has since been added in recent versions.
Though its architecture is very scalable and efficient, more effort could go into improving the user experience, as Python users take some time getting used to it.
Ease of Use: PyTorch is a Python-focused ML framework developed with the user in mind. It focuses on maintaining performance while keeping ease of use high for the end user. The “everything is a program” approach of PyTorch makes it a very user-friendly platform.
C++ Core: As most deep learning researchers are familiar with Python, PyTorch is exposed as a Python library, though its core is written in C++ for speed and performance. Unlike TensorFlow, it does not use a static data flow approach; to work around Python’s global interpreter lock (which ensures only one thread executes Python bytecode at a time), its core ‘libtorch’ library (written in C++) implements the tensor data structure, automatic differentiation, and related features in a multithreaded environment.
Python Library Support: PyTorch puts simplicity above raw performance and accepts that trade-off. All the usual Python facilities, such as print statements, debuggers, NumPy, and Matplotlib, work effortlessly with PyTorch.
From the paper itself:
Trading 10% of speed for a significantly simpler to use model is acceptable; 100% is not.
CPU-GPU Sync: PyTorch is highly interoperable and extensible and works well with other GPU-accelerated libraries. It uses CUDA to execute operators asynchronously on GPUs, which helps attain high performance even from a language like Python. The control flow of the model runs in Python on the CPU, while tensor operations are queued on the GPU; the CPU can run ahead of the GPU and synchronizes with it only when a result is actually needed.
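A rough sketch of this dispatch model, with a worker thread standing in for a CUDA stream (illustrative only; the `Stream` class and its methods are my own invention, not PyTorch’s API):

```python
# Sketch of asynchronous operator dispatch (illustrative; a worker thread
# stands in for a CUDA stream). The "CPU" enqueues work in FIFO order and
# keeps running ahead; it blocks only when it asks for a result.
import queue
import threading

class Stream:
    def __init__(self):
        self.ops = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            fn, done = self.ops.get()   # ops execute strictly in FIFO order
            fn()
            done.set()

    def launch(self, fn):
        done = threading.Event()
        self.ops.put((fn, done))        # returns immediately: async dispatch
        return done

    def synchronize(self, done):
        done.wait()                     # CPU blocks here until the op finishes

stream = Stream()
result = {}
evt = stream.launch(lambda: result.update(y=2 * 21))  # queued on the "GPU"
# ...the CPU is free to run more Python control flow here...
stream.synchronize(evt)                # wait only when the value is needed
print(result["y"])  # 42
```

The FIFO ordering of the queue mirrors the FIFO semantics of a CUDA stream: operators finish in the order they were launched, so the CPU only needs to synchronize at the points where it actually reads a value back.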
Multiprocessing: PyTorch provides its own multiprocessing module (torch.multiprocessing), which allows concurrent worker processes to share tensor memory and speed up programs. It also manages memory carefully, using reference counting (tracking the number of users of each tensor) to free tensors as soon as they are no longer used.
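The reference-counting idea can be sketched in a few lines of plain Python (illustrative only; this is not PyTorch’s actual caching allocator): storage is freed eagerly the moment its count drops to zero, rather than waiting for a tracing garbage collector.

```python
# Toy reference-counted storage (illustrative sketch, not PyTorch's actual
# allocator): each buffer is freed as soon as its user count drops to zero,
# so memory is reclaimed eagerly and deterministically.

class RefCounted:
    def __init__(self, name):
        self.name, self.count = name, 1
        print(f"alloc {name}")

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:          # last user gone: free immediately
            print(f"free {self.name}")

buf = RefCounted("activations")
view = buf.retain()   # a second tensor shares the same storage
buf.release()         # first user done; storage still alive (count = 1)
view.release()        # count hits 0, prints "free activations"
```

Eager reclamation matters on GPUs, where memory is scarce and holding a dead tensor until the next GC pass could easily exhaust device memory.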
Dynamic Execution: PyTorch also supports dynamic computation graphs, which lets the user build and run the model on the fly. This is highly useful for models like RNNs, which consume variable-length inputs at run time.
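In the define-by-run style, the “graph” is just ordinary Python control flow, so a recurrent loop naturally adapts to the input length. A toy sketch (plain floats stand in for tensors; the cell and weights are made up for illustration):

```python
# Sketch of dynamic ("define-by-run") execution: the computation is ordinary
# Python control flow, so the number of recurrent steps simply follows the
# input length. Plain floats stand in for tensors (illustrative only).

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    return w_h * h + w_x * x   # toy recurrent cell

def run_rnn(sequence):
    h = 0.0
    for x in sequence:         # the loop unrolls per input, at run time
        h = rnn_step(h, x)
    return h

# Different sequence lengths need no padding or graph recompilation.
print(run_rnn([1.0, 2.0]))        # 0.5 * (0.5*0 + 1) + 2 = 2.5
print(run_rnn([1.0, 2.0, 3.0]))   # 0.5 * 2.5 + 3 = 4.25
```

In a static-graph system, the same flexibility requires padding, bucketing, or special control-flow operators, because the graph must be fixed before any data is seen.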
Performance: As PyTorch is built around Python, it has to make some performance trade-offs. It doesn’t use the static data flow graphs popular in other ML systems such as TensorFlow and Theano, which are known for enabling whole-program optimization.
Dependence: It also depends on a number of independent mechanisms, such as CUDA streams, to overcome the limitations of Python. As CUDA streams execute in FIFO order, PyTorch must keep CPU and GPU work in sync, and it follows a “one pool per stream” design for memory allocation. This can lead to fragmentation, synchronization overheads, and some odd corner cases, but PyTorch works to ensure that users rarely encounter them.
Common Features
- Both systems use an efficient C++ core to achieve high performance. As computing gradients of a loss function (e.g., for SGD) is required by virtually all ML programs, both PyTorch and TensorFlow provide efficient automatic differentiation algorithms.
- Both make use of distributed execution and multiprocessing among worker threads / sub-computations to improve performance. Both systems are open source and really popular in the ML research community. Both use asynchronous parameter updates during algorithm execution.
As PyTorch came later than TensorFlow, it covered a lot of TensorFlow’s weak spots.
- PyTorch provides data parallelism as well as easy debugging, both of which are pain points in TensorFlow. PyTorch is also easier for researchers to learn than TensorFlow.
- PyTorch maintains a separation between its control flow and data flow, whereas TensorFlow combines them into a single data flow graph.
- Both perform reverse-mode automatic differentiation; the difference lies in the graph-level optimizations TensorFlow can apply to remove overheads.
- TensorFlow defers execution until the entire program is available, which is not possible with PyTorch’s imperative model; PyTorch therefore uses other means of improving efficiency, such as a custom caching tensor allocator and reference counting.
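Reverse-mode differentiation, which both systems implement, can be sketched in miniature (illustrative only; real autodiff engines are far more general, and the `Var` class here is my own). The forward pass records each operation and its local derivatives; the backward pass walks that record in reverse, applying the chain rule:

```python
# Minimal reverse-mode automatic differentiation sketch (illustrative; not
# how either framework is actually implemented). Each Var remembers its
# parents and the local derivative with respect to each parent.

class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed                         # accumulate along each path
        for parent, local_grad in self.parents:   # chain rule, in reverse
            parent.backward(seed * local_grad)

x = Var(3.0)
w = Var(4.0)
y = x * w + x          # y = x*w + x, so dy/dx = w + 1, dy/dw = x
y.backward()
print(x.grad, w.grad)  # 5.0 3.0
```

TensorFlow runs this process over a static graph, which lets it optimize the backward pass ahead of time; PyTorch records the tape dynamically during each forward run, which is what makes its differentiation work seamlessly with arbitrary Python control flow.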
Overall, PyTorch performs better than TensorFlow in a number of areas, including ease of use, without compromising much on performance.
Going by the benchmarks in the PyTorch paper, it performs better than TensorFlow on implementations of major models such as AlexNet and VGG-19. For large-scale production systems, though, TensorFlow remains the main choice due to its community support and strong architecture.
There’s no one way to design large-scale systems. Each ML platform designs its features with some core aspects in mind: for TensorFlow it was performance, whereas for PyTorch it was user experience.