Building a Multi-Modal Twitch Bot in Python using GPT-4o

Introduction

Twitch chat bots offer a fun way to enhance your stream's interactivity. They can engage with viewers, answer questions, and keep the chat lively. Historically, though, their capabilities have been limited: unless a bot was built directly into the game, it could only send messages to chat, with no way to react to what the streamer was actually doing. With Conduit, bots can now interact with the streamer through the stream's audio and video with just a few lines of code. In this blog post, we'll walk through building a multi-modal Twitch bot in Python using Conduit and GPT-4o. The bot will listen to and watch any Twitch livestream, responding to what it sees and hears as if it were a real viewer.

If you're interested in the code for this project, you can find it on our examples repo.

Setup

Getting an API Key

Before we begin writing any code, we need to generate an API key for the Conduit service.

To do this, navigate to your Conduit Settings, sign in, and click on the "+ API Key" button in the top right.

API keys cannot be recovered from the dashboard, so make sure to save yours securely in a password manager once you've generated it!

Installing the Python Packages

Before we write any code, we need to make sure twitchio and python-socketio are installed. We will use TwitchIO to handle the interaction with Twitch and SocketIO to connect to the Conduit service. To install these packages, we can use pip:

pip install twitchio python-socketio

Be sure to install python-socketio and not socketio. The socketio package is not the same as python-socketio and will cause errors when running the bot.
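
If you're not sure which of the two ended up in your environment, pip can tell you (it prints a warning for any package it can't find):

pip show socketio python-socketio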

Coding the bot

Using the OpenAI Vision API

The first step in creating a Twitch bot that can see and hear a stream is writing the code that processes images and transcriptions and responds to them in a way that makes sense. To do this, we will use GPT-4o via the OpenAI vision API, passing it a base64-encoded image along with the transcription from the stream.

First, we should create an OpenAIClient class that sets up the OpenAI client. We'll create this new class in the file open_ai_client.py.

from openai import OpenAI

class OpenAIClient:
    def __init__(self):
        self.client = OpenAI(api_key="sk-...")
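
Hard-coding the key is fine for a quick test, but you may prefer to pull it from an environment variable instead. Here's one way to do that, assuming you've exported the key as OPENAI_API_KEY (the variable name is our choice, not something OpenAI or Conduit requires):

import os

from openai import OpenAI

class OpenAIClient:
    def __init__(self):
        # Read the key from the environment instead of hard-coding it in source.
        self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])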

Next, we can create a method to generate a response to the streamer. This method will use what the streamer said and the image associated with it to prompt OpenAI for a response that sounds like it came from a Twitch chatter.

def generate_response(self, transcription: str, image: str):
    system_prompt = """
    You are an enthusiastic viewer in a popular streamer's Twitch chat.

    You will be given a transcript of a stream interaction and an image from the stream.

    As an engaged and reactive Twitch chatter, provide a short typed response of 1-2 sentences that you would send to the chat in this moment.
    Your response should be enthusiastic, use some typical Twitch lingo, and relate to the specific game moment/dialogue or imagery being shown.

    You should never use emojis.
    """

    completion = self.client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": transcription},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image}"}},
                ],
            },
        ],
        max_tokens=30,
    )

    return completion.choices[0].message.content

Here we cap the maximum number of tokens in the response since Twitch chat messages are usually fairly short. Calling generate_response should now produce a short, chat-style response to what the streamer said. You can and should play with max_tokens and the system_prompt to customize the "feel" of the bot.
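
If you want to sanity-check generate_response before wiring up the rest of the bot, you can call it by hand with any screenshot you have lying around. This is just a quick sketch; the file name and transcription below are made up:

import base64

from open_ai_client import OpenAIClient

# Encode a local screenshot so we can pass it to generate_response.
with open("screenshot.jpg", "rb") as image_file:
    b64_frame = base64.b64encode(image_file.read()).decode("utf-8")

client = OpenAIClient()
print(client.generate_response("Chat, there is no way we survive this fight.", b64_frame))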

Connecting to Twitch

Now that the bot has a way to respond to its incoming data, we will write the code to connect our bot to Twitch so that it can send messages in any given Twitch chat. To do this, we first need to create a new Python file called twitch_bot.py and add the following constructor, which also creates the OpenAIClient we built earlier:

from twitchio.ext import commands

from open_ai_client import OpenAIClient

class TwitchBot(commands.Bot):
    def __init__(self):
        super().__init__(
            token="<your oauth token>",
            prefix='!',
            initial_channels=['<your_stream_name>']
        )
        # The OpenAI client we'll use later to generate chat responses.
        self.open_ai_client = OpenAIClient()

This will spin up a new Twitch bot that will process commands prefixed with the '!' character (e.g. !hello). It will also join your Twitch chat automatically so that we don't need to manually join it later.
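
We won't need chat commands for this bot, but if you ever want viewers to trigger it directly, TwitchIO makes that straightforward. As a quick sketch, a hypothetical !hello command added to the TwitchBot class could look like this:

@commands.command(name="hello")
async def hello(self, ctx: commands.Context):
    # Reply in the channel the command was sent from.
    await ctx.send(f"Hello, {ctx.author.name}!")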

To get your Twitch OAuth token, you can head to https://twitchapps.com/tmi/.

To help with debugging, we can add an event handler to log our username when we are done connecting to Twitch. To do that, we'll create the event_ready method:

async def event_ready(self):
    print(f'Logged in as | {self.nick}')

Finally, we can write the method to actually send a message to the Twitch chat. We'll call this method respond_to_streamer since it will take the transcription and image data from the stream and use OpenAI to come up with a believable response.

async def respond_to_streamer(self, stream_url: str, transcript: str, image: str):
    channel = stream_url.split("/")[-1]
    print(f"Sending response to {channel}")
    response = self.open_ai_client.generate_response(transcript, image)
    print(f"Responding to {channel}: {response}")
    await self.get_channel(channel).send(response)

The powerful part of this code is await self.get_channel(channel).send(response). This will try to get the channel from our cache and send a message in its chat. If we aren't connected to that channel, get_channel will return None and the call will fail.
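
If you'd rather fail gracefully in that case, you could add a small guard before sending. This isn't required for the bot to work; it's just a defensive sketch:

channel_obj = self.get_channel(channel)
if channel_obj is None:
    # We haven't joined this channel, so there's nowhere to send the message.
    print(f"Not connected to {channel}, dropping response")
    return
await channel_obj.send(response)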

Believe it or not, that's all the code we actually need to have a functioning Twitch bot. Now that we have the ability to interact in chat, let's move on to using Conduit to pull in transcriptions and images.

Getting Real-Time Transcriptions / Images from Twitch

Conduit makes it easy to get both transcriptions and images from Twitch livestreams in real-time. We can start by creating the ConduitClient class in a new Python file called conduit_client.py. Since Conduit uses SocketIO to interact with clients and send events, we will spin up a new SocketIO AsyncClient in our constructor. We can then register the event handlers that process incoming messages from Conduit, and create a new TwitchBot so that we can respond to the transcriptions that we receive.

import socketio

from twitch_bot import TwitchBot

class ConduitClient:
    def __init__(self):
        self.sio = socketio.AsyncClient()
        self.__register_event_handlers()
        self.twitch_bot = TwitchBot()

Event Handlers

Conduit emits multiple event types that we should listen to in our client. We'll first create a private method __register_event_handlers that will allow us to listen / process these events in our code, then we'll jump into the implementations of each event handler.

You can see the full list of events Conduit will emit in our websocket documentation.

def __register_event_handlers(self):
    self.sio.on("livestream_data_event", self.__on_livestream_data_received)
    self.sio.on("error_event", self.__on_error_received)
    self.sio.on("subscribe_event", self.__on_subscribe_received)
    self.sio.on("unsubscribe_event", self.__on_unsubscribe_received)

The event emitted from Conduit that we are most interested in is the livestream_data_event. This event contains the stream URL the transcription comes from, the transcription message, and an image from the stream. We can pull all three of these data points out of the event and pass them directly to our TwitchBot, which will use OpenAI to generate a response and send a message in the Twitch chat.

async def __on_livestream_data_received(self, data):
    for transcription in data:
        stream = transcription["stream_url"]
        message = transcription["transcription"]
        b64_image = transcription["frame"]
        
        print(f"Received transcription: {message}")
        await self.twitch_bot.respond_to_streamer(stream, message, b64_image)

The next event we need to listen for is the error_event. While not quite as fun as the livestream_data_event, this event is equally important, as it will notify you when something is wrong with your request or when there was an issue on Conduit's side. Each error_event contains the error_code and the error_message for the error. In this example, we'll simply print out any error we see.

async def __on_error_received(self, data):
    for error in data:
        code = error["error_code"]
        message = error["error_message"]
        print(f"Error code: {code} Error Message: {message}.")

You can see all possible errors Conduit can emit on our websocket documentation.

Finally, there are the subscribe_event and unsubscribe_event. These events will fire any time you join or leave a livestream and contain a list of all streams you're currently subscribed to. To keep things simple, in these event handlers we'll just print the entire list out.

async def __on_subscribe_received(self, subscriptions):
    print(subscriptions)
 
async def __on_unsubscribe_received(self, subscriptions):
    print(subscriptions)

Connecting / Disconnecting from Conduit

Now that we have a way to process all the data we get from Conduit, we can work on connecting to and disconnecting from Conduit. Since Conduit uses SocketIO, connecting is fairly straightforward. All we need to do is call connect on our SocketIO client, passing the Conduit data URL and the API key we generated earlier. After connecting to Conduit, we can go ahead and connect to Twitch as well.

async def connect(self):
    await self.sio.connect(
        "https://data.tryconduit.io", 
        auth={"Authorization": "sb_..."}, 
        wait_timeout=10
    )
    await self.twitch_bot.connect()

Disconnecting from these clients is even easier. All we need to do is call disconnect / close.

async def disconnect(self):
    await self.sio.disconnect()
    await self.twitch_bot.close()

Joining / Leaving Twitch Streams

With the rest of the Conduit client wired up, we can finally write the code to join Twitch livestreams and begin listening for the transcriptions / images. To do this, we can emit a subscribe event with the stream_url we want to join and the content_type we're interested in.

At the time of writing this, Conduit only supports subscribing to "all" content. We are currently working to allow subscribing to separate data streams. Please check our websocket documentation for the most up-to-date information.

async def join_stream(self, stream_url: str):
    subscribe_request = {
        "stream_url": stream_url,
        "content_type": ["all"]
    }
    await self.sio.emit("subscribe", subscribe_request)

Similarly, we can unsubscribe from streams by emitting an unsubscribe event with the same payload.

async def leave_stream(self, stream_url: str):
    unsubscribe_request = {
        "stream_url": stream_url,
        "content_type": ["all"]
    }
    await self.sio.emit("unsubscribe", unsubscribe_request)

Running the Bot

With the ConduitClient finished, we can now write the main driver code for the bot in main.py. This code runs an asyncio coroutine that connects to Conduit and joins our livestream. Once we join the stream, we should start receiving transcriptions and images, which will be forwarded to our TwitchBot. Pressing Ctrl + C will stop the bot and disconnect us from Conduit and Twitch.

import asyncio

from conduit_client import ConduitClient

async def main():
    conduit = ConduitClient()

    await conduit.connect()
    await conduit.join_stream("https://www.twitch.tv/thelazydeveloper")

    try:
        while True:
            await asyncio.sleep(1)
    finally:
        # When the loop is cancelled (e.g. by Ctrl + C), make sure we disconnect cleanly.
        await conduit.disconnect()

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        pass
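
With everything in place, you can start the bot from your terminal:

python main.py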

Conclusion

And that's all, folks! In this blog post, we saw just how easy it is to leverage real-time transcriptions and images from Twitch livestreams to build an AI-powered Twitch chat bot. Using this technology, you can build highly intelligent chat bots that interact directly with the streamer and build engagement with your Twitch audience.