Upon requesting a completion from OpenAI, the entire completion is generated and returned in a single response by default.
It may take several seconds to get a response if you’re generating long completions.
You can instead choose to stream the completion as it is being generated and receive the answer sooner. This lets you print or process partial output before the entire completion has finished generating.
This is particularly useful for applications needing immediate feedback from the model, such as interactive chatbots, live coding assistants, or real-time content generation tools.
What is Streaming?
Streaming is a method used in computing where data is sent in a continuous flow, allowing it to be processed as a steady stream. Unlike the traditional download-and-execute model, where the entire package of data must be fully received before any processing can start, streaming lets processing begin as soon as enough data has been received to start operating on it.
Example of Standard Chat Completion Response
In a standard request, the completion is computed in full and only then returned as a single response.
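Here is a minimal sketch of such a request, assuming Node 18+ (for the built-in fetch) and an API key in the OPENAI_API_KEY environment variable; the model name is a placeholder:

```javascript
// A standard (non-streaming) chat completion request: one POST, one JSON
// response containing the whole answer.
const API_URL = "https://api.openai.com/v1/chat/completions";

async function getCompletion(prompt) {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return extractAnswer(data);
}

// The full answer arrives in one piece, under the first choice.
function extractAnswer(response) {
  return response.choices[0].message.content;
}
```

Nothing is available to your application until the final JSON body arrives, which is exactly the delay streaming avoids.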
How to Stream a Chat Completion
To enable streaming with OpenAI’s API, set the stream key to true in your API request. Here’s an example using JavaScript:
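A minimal sketch, again assuming Node 18+ fetch, an OPENAI_API_KEY environment variable, and a placeholder model name. Setting stream to true makes the API respond with a Server-Sent Events stream instead of a single JSON body:

```javascript
// Build the request body with streaming enabled.
function buildStreamingBody(prompt) {
  return {
    model: "gpt-4o-mini", // placeholder model name
    stream: true, // ask the API to stream tokens as they are generated
    messages: [{ role: "user", content: prompt }],
  };
}

// Fire the request; the returned response carries a streamed body.
async function requestStream(prompt) {
  return fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(buildStreamingBody(prompt)),
  });
}
```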
You can now process the incoming data incrementally:
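One possible shape for that processing loop, assuming stream is the async iterable of chunks produced by a streaming request and res is a Node HTTP/Express-style response object (both names are illustrative):

```javascript
// Forward streamed model output to the client chunk by chunk.
async function forwardStream(stream, res) {
  res.setHeader("Content-Type", "text/event-stream");
  for await (const chunk of stream) {
    const done = chunk.choices[0].finish_reason !== null; // model finished?
    const text = extractDelta(chunk);
    if (text) res.write(text); // push each fragment out immediately
    if (done) break;
  }
  res.end();
}

// Streaming chunks carry new tokens under `delta.content`.
function extractDelta(chunk) {
  return chunk.choices[0].delta?.content ?? "";
}
```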
This loop listens for data chunks sent by the OpenAI model. It checks if the model has finished generating content and then writes each received chunk to the response. This method ensures that the frontend can begin processing data without waiting for the entire content to be generated.
What are Server-Sent Events (SSE)?
Server-Sent Events (SSE) is a web standard that allows servers to push information to web clients. Unlike WebSocket, SSE is designed specifically for one-way communication from the server to the client. This makes SSE ideal for applications like live updates from social feeds, news broadcasting, or, as in this case, streaming AI-generated text.
SSE works over standard HTTP and is straightforward to implement in modern web applications. Events streamed from the server are text-based and encoded in UTF-8, making them highly compatible across different platforms and easy to handle in client-side JavaScript.
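As an illustration, each event in the OpenAI stream arrives as a line of the form `data: {json}`, with a final `data: [DONE]` sentinel marking the end. A small parsing helper might look like this (a sketch for single complete lines, not an exhaustive SSE parser):

```javascript
// Parse one SSE line from the stream into a chunk object,
// or null for comments, keep-alives, and the [DONE] sentinel.
function parseSseLine(line) {
  if (!line.startsWith("data: ")) return null; // not an event payload
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null; // end-of-stream sentinel
  return JSON.parse(payload);
}
```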
Consuming Streamed Data on the Frontend
To handle streamed data on the frontend, you can use the Fetch API to call the server endpoint that initiates the stream. Here is an example:
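A sketch, assuming a hypothetical /api/stream endpoint on your server that relays the model's output, and a browser or Node 18+ runtime with the Streams API:

```javascript
// Read a streamed body incrementally, decoding UTF-8 bytes to text.
async function readTextStream(body, onText) {
  const reader = body
    .pipeThrough(new TextDecoderStream()) // bytes -> UTF-8 strings
    .getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    onText(value); // handle each fragment as soon as it arrives
  }
}

// Hypothetical endpoint name; the server is assumed to relay the stream.
async function consumeStream(onText) {
  const response = await fetch("/api/stream", { method: "POST" });
  await readTextStream(response.body, onText);
}
```

In a real page, onText would typically append each fragment to a DOM element so the answer visibly grows as it is generated.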
The response is expected to be a stream (indicated by the text/event-stream content type), which is then read incrementally. The TextDecoderStream is used to ensure that the streamed text is properly decoded from UTF-8 as it is received.
Conclusion
By using SSE, developers can create more engaging user experiences, with AI responses delivered in real-time. Whether you are building a chatbot, a live commentary tool or any other application that benefits from immediate textual output, streaming AI completions is a powerful feature to include in your development toolkit. Are you curious to know more about AI tools and technologies? Our next blog from this series will provide you with interesting and useful information on integrating large language models with external tools. Click here to improve your tech knowledge.
Ready to improve your business operations by turning data into conversations? Click here to see how Data Outlook can help you automate your processes.
About Author
I’m Isha Taneja, and I love working with data to help businesses make smart decisions. Based in India, I use the latest technology to turn complex data into simple and useful insights. My job is to make sure companies can use their data in the best way possible.
When I’m not working on data projects, I enjoy writing blog posts to share what I know. I aim to make tricky topics easy to understand for everyone. Join me on this journey to explore how data can change the way we do business!
I also serve as the Editor-in-Chief at “The Executive Outlook,” where I interview industry leaders to share their personal opinions and add valuable insights to the industry.