Simplify Your App by Using the PubSub Pattern, Part 1 Intro to PubSub Using Python and Redis


note 1 Due to the complexity of the topic of PubSub and its implementation, this tutorial will be a multi-part series of articles.
note 2 We’ll use technologies like Python and Redis, but everything discussed here is actually technology agnostic.
In this article, we will

  • learn about the publ-sub architecture
  • discuss why and when it is useful
  • solve a common challenge using pub-sub

The Problem

Let’s say we want to make a web application with the following requirements:

  • new user signs up –> send welcome email
  • new user does something for the first time –> notify sales people
  • user is inactive for X days –> notify user
  • user deletes his/her account with us –> notify customer support

An event(left side of the arrow) triggers some action(right side of the arrow) and sometimes an action triggers more events.
In addition,

  • Some actions need to be handled asynchronously. For example, we don’t want to make a user wait while we send an email.
  • Some actions are to be handled immediately and some are scheduled to happen in future.
  • Some aspects of the behavior of the system should be configurable by non-developers.
  • Business wants to experiment with new events and actions without investing much time and money.
  • Scaling the application should not require a total rewrite or redesign of the application.
  • We – the devs – want to write high quality code. We want to have automated tests and readable code that just works!

Intro to PubSub

Pubsub is an architecture used for designing complex systems characterized by requirements similar to the ones above. If you adhere to the design guidelines of PubSub, you’d end up with software that is much easier to reason about, maintain, and scale.
The PubSub architecture is made up of 3 components:

 

  • Publishers send events to the hub. They don’t care what happens to those events.
  • The Hub (also called a broker) routes events; it tells events where to go.
  • Subscribers act on events; they take actions. They don’t care where events come from. A subscriber tells the hub that it’s interested in one or more topics. When an event with that topic occurs, the broker notifies the subscriber.

Note: a topic is like a tag for an event. An event can have one or more topics. Also, a subscriber can subscribe to one or more topics.
The above 3 components effectively divide our application into 3 self-contained modules or components which allows for multiple implementations of the PubSub architecture.
First, we’ll go through one implementation and discuss its strengths and weaknesses. In the next article, we’ll go through another implementation that addresses the weaknesses of the first implementation.

All-Components-Within-One-Application Implementation

We can code our web framework, publisher, hub, and subscriber within the same application. Those who are familiar with Django signals have already used this implementation of PubSub. For example, you can subscribe to signals – Django signals are similar to PubSub topics- like pre_save or post_save if you want to trigger some action each time a row in your database is modified.
The picture below shows how these 3 compoenents interact with each other:

Let’s start with an event publisher. The NewUserCreated object is an event publisher that hides the details of how we send event details to the hub.

# publish.py
from . import hub # implemention comes later

class NewUserCreated: # event class
    topics = {'new_user_created'} # event can have multiple topics

    @classmethod
    def publish(cls, user_id): 
        hub(cls.topics, user_id)

# how events are triggered in another module
from .publish import NewUserCreated

def signup_handler(username, email, password):
    user = make_new_user(username, email, password)
    NewUserCreated.publish(user.id)
    do_some_other_things()

When a new user signs up, we send the new_user_created topic and event details(user_id) to the hub.
The hub is an instance of the PubSubHub class:

# hub.py
from collections import defaultdict

class PubSubHub:

    def __init__(self):
        self.topic_subscriber_mapper = defaultdict(set) #1

    def subscribe_to(self, *topics): # a decorator that takes topics as parameters
        def real_subscribe(subscriber_function): #2
            for topic in topics:
                self.topic_subscriber_mapper[topic].add(subscriber_function) #3
            return subscriber_function

        return real_subscribe


    def __call__(self, topics, **kwargs): #4
        for topic in topics: #5
            for subscriber in self.topic_subscriber_mapper[topic]:
                subscriber(**kwargs) #6

Explanation

#1 The hub has a mapping of topics to subscribers. A subscriber can register interest in one or more topics using the subscribe_to method of the PubSubHubclass.
#2 The subscribe_to method is a decorator that takes one or more topics as its argument. In Python, a decorator takes a function, does something to it, and returns it. In our case, the subscribe_to method takes one or more topics and the function(subscriber) that is interested in those topics and populates the topic to subscriber maps in the hub. You will see how this decorator is used later on.
#3 This line creates a mapping from a topic to an interested subscriber.
#4 This special “dunder” method makes an instance of the PubSubHub class callabe. That’s why we can send events to the hub using hub(cls.topics, auction_id)defined as part of the NewUserCreated class implemented above.
#5 We get the topics of an event from the event publisher. An event can have more than one topic. This loop ensures that we call subscribers interested in each topic.
#6 We call each subscriber with parameters we received from the event publisher.
Now, let’s look at a subscriber:

# subscribe.py
import emailer

from . import hub

@router.subscribe_to('new_user_created') #1
def email_new_user(user_id):
    user = get_user(user_id)
    context = {
        'subject': 'Welcome!',
        'body': 'Hello{}! Thanks for signing up.'.format(user.first_name),
    }
    emailer.send(user.email, context)

#1 We’re using the subscribe_to decorator on top of our subscriber function. subscribe_to creates a mapping from the new_user_created topic to the email_new_user function.
Note: The code above doesn’t run as is because we haven’t created a router object yet. Also, we need to trigger the decorators somehow. To see a full impolementation, go to [this repo].

Analysis

This implementation is simple and many frameworks are implemented this way. However, our requirements mention that some tasks need to ben run asyncronously and that scalability should not require a rewrite of the whole application.

Async

There are two reasons why we want asynchrony:

1

Some tasks like sending an email take a long time to finish and users don’t like waiting. We could use an async web framework to enable async in our application, but that would only work if all of our long-running tasks are IO-bound. Plus, the most popular Python web frameworks like Django and Flask don’t have async built-in. In short, we don’t want to learn and use a totally new framework which doesn’t even guarantee to address all the async issues.

2

Even if we were okay with waiting for long-running tasks to finish, there’s still a big reason why we want to have asynchrony.
Did you notice that if anything in the above implementation breaks every component of the PubSub architecture is affected? Let’s say we have 3 subscribers for the new_user_created topic. The hub tries to notify the first subscriber, but an exception is thrown. This means that none of the 3 subscribers get notified.
One way of fixing this would be to catch the exceptions. Since we have 3 major components and a wide range of subscribers, the number of exceptions we’d have to hanlde will be big. This can affect performance and reduce code quality.

Scalability

To scale this implementation, we’d have to scale the whole application(add more servers running the same application). As a result, we’d get more publishers, hubs, and subscribers in addition to the main web application. This is a waste of resources because it’s rarely the case that we need to scale those 4 components at once. And, sometimes scaling everything can be dangerous. For example, if our subscribers have too many events to process, then adding more publishers would make matters worse.

How to Fix These Issues

We can fix the weaknesses of the first implementation by running each PubSub component in a separate application or process. However, this would require the introduction of task queues and buffers to our architecture. In the next part of this series, we’ll use tools like Redis, RQ, rqworker, and rqscheduler to do just that.

Article by Mikaeil Orfanian

Written on June 18, 2017
Share :