Complere Infosystem

How to stream OpenAI chat completions

JUNE 04, 2024 | BLOGS

Upon requesting a completion from OpenAI, the entire completion is generated and returned in a single response by default.

It may take several seconds to get a response if you’re generating long completions.

To start receiving output sooner, you can stream the completion as it is being generated. This lets you print or process the beginning of the completion before the rest has finished.

This is particularly useful for applications needing immediate feedback from the model, such as interactive chatbots, live coding assistants, or real-time content generation tools. 

What is Streaming?

Streaming is a method of sending data as a continuous flow so that it can be processed while it is still arriving. Unlike the traditional download-and-execute model, where the entire payload must be received before any processing can start, streaming lets processing begin as soon as enough data has arrived.
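
To make the contrast concrete, here is a minimal JavaScript sketch. The function names (produceChunks, consumeBuffered, consumeStreamed) are hypothetical and exist only for illustration; an async generator stands in for a real network source.

```javascript
// Hypothetical data source: an async generator standing in for a network
// connection that delivers a message piece by piece.
async function* produceChunks(pieces) {
  for (const piece of pieces) {
    yield piece;
  }
}

// Download-and-execute style: collect everything, then process once.
async function consumeBuffered(source, onComplete) {
  let whole = "";
  for await (const piece of source) {
    whole += piece;
  }
  onComplete(whole);
}

// Streaming style: process each piece as soon as it arrives.
async function consumeStreamed(source, onPiece) {
  for await (const piece of source) {
    onPiece(piece);
  }
}
```

With the same source, consumeBuffered invokes its callback once with the full text, while consumeStreamed invokes its callback once per piece, so the caller can start displaying output earlier.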

Example of Standard Chat Completion Response

In a standard request, the full completion is computed first and only then returned in a single response.

[Image: completion response]
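
As a rough sketch of such a standard request, the function below calls the Chat Completions REST endpoint with fetch and returns the finished text. The function name, the model name, and the injectable fetchImpl parameter are assumptions made for illustration, not part of the original example.

```javascript
// Sketch of a standard (non-streaming) chat completion request.
// Assumptions: the model name and the injected `fetchImpl` parameter are
// illustrative; `apiKey` is your own OpenAI API key.
async function createChatCompletion(prompt, apiKey, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // assumed model name; substitute your own
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!response.ok) {
    throw new Error(`OpenAI request failed with status ${response.status}`);
  }
  // The whole completion arrives at once, after generation has finished.
  const completion = await response.json();
  return completion.choices[0].message.content;
}
```

Nothing is available to the caller until the returned promise resolves, which is why long completions feel slow in this mode.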

How to Stream a Chat Completion

To enable streaming with OpenAI’s API, set the stream key to true in your API request. Here’s an example using JavaScript: 

[Image: chat completion]
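
A minimal sketch of a streaming request might look like the following. It assumes that openai is a client created with the official openai npm package (for example, new OpenAI({ apiKey: process.env.OPENAI_API_KEY })); the function name and the model name are illustrative.

```javascript
// Sketch: request a streamed chat completion.
// `openai` is assumed to be a client from the official `openai` npm package.
async function startCompletionStream(openai, prompt) {
  return openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model name; substitute your own
    messages: [{ role: "user", content: prompt }],
    stream: true, // ask the API to send the completion in chunks
  });
}
```

With stream set to true, the call resolves to an async iterable of chunks rather than to a single finished completion.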

You can now process the incoming data incrementally: 

[Image: OpenAI model]

This loop iterates over the data chunks sent by the OpenAI model. For each chunk, it checks whether the model has finished generating content and writes the received text to the response. This way, the frontend can begin processing data without waiting for the entire completion to be generated.
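
The loop described above can be sketched roughly as follows. Here stream is assumed to be the async iterable returned by a request made with stream set to true, and res is assumed to be a Node.js http.ServerResponse or an Express response; the function name forwardStream is hypothetical.

```javascript
// Sketch: forward streamed chunks to an HTTP response.
// `stream` is assumed to be an async iterable of chat completion chunks;
// `res` is assumed to expose write() and end() like a Node.js response.
async function forwardStream(stream, res) {
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    // New text arrives in `delta.content`; some chunks carry no text.
    const text = choice?.delta?.content;
    if (text) {
      res.write(text);
    }
    // A non-null finish_reason marks the last chunk for this choice.
    if (choice?.finish_reason) {
      break;
    }
  }
  res.end();
}
```

Each piece of text is written as soon as it is received, and the response is closed once the model reports a finish reason or the stream ends.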

What are Server-Sent Events (SSE)?

Server-Sent Events (SSE) are a standard that allows servers to push information to web clients over a long-lived HTTP connection. Unlike WebSockets, which are bidirectional, SSE is designed specifically for one-way communication from the server to the client. This makes SSE ideal for applications such as live updates from social feeds, news broadcasting, or, as in this case, streaming AI-generated text.

SSE works over standard HTTP and is straightforward to implement in modern web applications. Events streamed from the server are text-based and encoded in UTF-8, making them highly compatible across different platforms and easy to handle in client-side JavaScript. 
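
To illustrate the text-based format, the sketch below builds a single SSE event by hand. Each event is a block of "field: value" lines followed by a blank line. The names formatSseEvent and SSE_HEADERS are hypothetical helpers, not part of any library.

```javascript
// Sketch of the SSE wire format: each event is a block of "field: value"
// lines terminated by a blank line.
function formatSseEvent(data, eventName) {
  const lines = [];
  if (eventName) {
    lines.push(`event: ${eventName}`);
  }
  // Multi-line payloads need one "data:" line per line of text.
  for (const line of String(data).split("\n")) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}

// Headers a server typically sends before the first event.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
};
```

In a Node.js handler, this could be used as res.writeHead(200, SSE_HEADERS) followed by res.write(formatSseEvent(text)) for every piece of text to send.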

Consuming Streamed Data on the Frontend

To handle streamed data on the frontend, you can use the Fetch API to make a request to the server endpoint that initiates the stream. Here is an example:

[Image: Data on the Frontend]

The response is expected to be a stream (indicated by the text/event-stream content type), which is then read incrementally. The TextDecoderStream is used to ensure that the streamed text is properly decoded from UTF-8 as it is received.
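
The reading pattern described above might be sketched as follows. The endpoint "/api/chat", the request body shape, and the onText and fetchImpl parameters are assumptions made for illustration; adapt them to your own server.

```javascript
// Sketch: read a streamed response incrementally in the browser.
// "/api/chat" is a hypothetical endpoint on your own server; `onText` and
// `fetchImpl` are illustrative parameters.
async function readChatStream(prompt, onText, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl("/api/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed with status ${response.status}`);
  }
  // Decode UTF-8 bytes to text as they arrive, then read piece by piece.
  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    onText(value);
  }
}
```

A caller could append each piece to the page as it arrives, for example readChatStream("Hello", (text) => { output.textContent += text; }), where output is an element of your own. If the server frames its output as SSE events, the "data:" prefixes still need to be stripped before display.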

Conclusion

By using SSE, developers can create more engaging user experiences, with AI responses delivered in real time. Whether you are building a chatbot, a live commentary tool, or any other application that benefits from immediate textual output, streaming AI completions is a powerful feature to include in your development toolkit.

Curious to know more about AI tools and technologies? The next blog in this series covers integrating large language models with external tools.
