As a developer exploring the capabilities of Elixir and Phoenix, I recently conducted a performance experiment using a community-driven OpenAI client. The goal was to evaluate how well Elixir, with its powerful concurrency model, handles high-volume API requests. The results were impressive, demonstrating Elixir's efficiency and robustness in managing concurrent tasks across multiple CPU cores.
Background
At CourseMojo, we are scaling an AI assistant teacher for public schools that provides real-time, intelligent responses to students' answers to open-ended questions. During load testing, we found that heavy CPU utilization was primarily due to calls to the OpenAI API. This prompted me to explore Elixir and Phoenix to see if they could offer a more efficient solution.
Project Setup
To set up the project, we used the following steps:
Create a New Elixir Project: Run the following command to create a new Elixir project and navigate into the directory:
mix local.hex
mix archive.install hex phx_new
mix phx.new openai_experiment --no-ecto
cd openai_experiment
Add Dependencies: Add the ex_openai library to your mix.exs file:
defp deps do
  [
    {:ex_openai, "~> 1.6"}
  ]
end
Run mix deps.get to fetch the dependencies.
Configure the Client: Set up your configuration in config/config.exs:
import Config

config :ex_openai,
  api_key: System.get_env("OPENAI_API_KEY")

# the ex_openai http client seems to use this.
config :hackney,
  pool_size: 1800,
  max_connections: 1800
Increase File Descriptor Limit: To handle more than 1000 concurrent connections, increase the file descriptor limit:
ulimit -n 4096
Implement the Concurrency Logic: We used Task.async_stream/3 to manage concurrency, allowing us to process tasks efficiently across multiple CPU cores.
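To make the effect of max_concurrency concrete, here is a minimal sketch that swaps the OpenAI call for a simulated 50 ms delay. The module name ConcurrencySketch and the fake_request/1 helper are hypothetical and exist only for illustration; the Task.async_stream/3 options are the same kind used in the script in the next section.

```elixir
defmodule ConcurrencySketch do
  # Hypothetical stand-in for an API call: sleeps briefly, then returns.
  def fake_request(index) do
    Process.sleep(50)
    {:ok, index}
  end

  # Runs `count` fake requests with at most `limit` in flight at once.
  # Returns {elapsed_ms, results}; results keep the input order.
  def run(count, limit) do
    {micros, results} =
      :timer.tc(fn ->
        1..count
        |> Task.async_stream(&fake_request/1, max_concurrency: limit, timeout: 5_000)
        |> Enum.to_list()
      end)

    {div(micros, 1000), results}
  end
end

{elapsed_ms, results} = ConcurrencySketch.run(20, 5)
IO.puts("20 tasks, 5 at a time: #{elapsed_ms} ms, #{length(results)} results")
```

With 20 tasks and a limit of 5, the work runs in roughly four waves, so the elapsed time should land near 200 ms rather than the 1000 ms a sequential loop would need. Each element of the result list is wrapped by the stream, for example {:ok, {:ok, 1}}.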
Elixir Script
Here is the Elixir script lib/openai_experiment.ex used for the experiment:
defmodule OpenaiExperiment do
  alias ExOpenAI.Chat
  alias ExOpenAI.Components.ChatCompletionRequestUserMessage

  @max_retries 4
  @retry_delay 2000 # milliseconds
  @concurrency_limit 500

  def generate_chat_completion(index, attempt \\ 1) do
    msgs = [
      %ChatCompletionRequestUserMessage{role: :system, content: "You are a helpful assistant."},
      %ChatCompletionRequestUserMessage{role: :user, content: "What is the number #{index}?"}
    ]

    case Chat.create_chat_completion(msgs, "gpt-4o-mini") do
      {:ok, response} ->
        IO.inspect(response, label: "Response for number #{index}")
        {:ok, index}

      {:error, %HTTPoison.Error{reason: reason}} when attempt <= @max_retries ->
        IO.puts("HTTP error for number #{index}: #{inspect(reason)}, attempt #{attempt}")
        :timer.sleep(@retry_delay)
        generate_chat_completion(index, attempt + 1)

      {:error, reason} when attempt <= @max_retries ->
        IO.puts("Retrying number #{index} due to #{inspect(reason)}, attempt #{attempt}")
        :timer.sleep(@retry_delay)
        generate_chat_completion(index, attempt + 1)

      {:error, reason} ->
        IO.inspect(reason, label: "Final error for number #{index}")
        {:error, index, reason}
    end
  end

  def execute_chats_concurrently do
    {time, results} =
      :timer.tc(fn ->
        1..1800
        |> Task.async_stream(&generate_chat_completion/1,
          max_concurrency: @concurrency_limit,
          timeout: 30000
        )
        |> Enum.to_list()
      end)

    summarize_results(results)
    IO.puts("Total execution time: #{time / 1_000_000} seconds")
  end

  defp summarize_results(results) do
    successes =
      Enum.count(results, fn
        {:ok, {:ok, _index}} -> true
        _ -> false
      end)

    failures =
      Enum.count(results, fn
        {:ok, {:error, _index, _reason}} -> true
        _ -> false
      end)

    IO.puts("Summary:")
    IO.puts("Successful tasks: #{successes}")
    IO.puts("Failed tasks: #{failures}")
  end
end

# Run the test
#OpenaiExperiment.execute_chats_concurrently()
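One detail worth knowing about the timeout: 30000 option above: with the default on_timeout: :exit, a task that runs past the timeout causes the calling process to exit, so the summary would never print. The sketch below shows one possible alternative, on_timeout: :kill_task, which emits {:exit, :timeout} for the slow task and lets the stream continue. This is not what the experiment used; TimeoutSketch and its work/1 function are hypothetical stand-ins with deliberately short timings.

```elixir
defmodule TimeoutSketch do
  # Hypothetical worker: even inputs return at once, odd inputs sleep
  # far longer than the stream timeout below.
  def work(n) when rem(n, 2) == 0, do: {:ok, n}

  def work(_n) do
    Process.sleep(1_000)
    {:ok, :too_late}
  end

  # Counts completed and timed-out tasks without crashing the caller.
  def run(range) do
    range
    |> Task.async_stream(&work/1, max_concurrency: 4, timeout: 100, on_timeout: :kill_task)
    |> Enum.reduce(%{ok: 0, timed_out: 0}, fn
      {:ok, {:ok, _value}}, acc -> Map.update!(acc, :ok, &(&1 + 1))
      {:exit, :timeout}, acc -> Map.update!(acc, :timed_out, &(&1 + 1))
    end)
  end
end

IO.inspect(TimeoutSketch.run(1..6))
```

If the real script adopted this option, its summary function would also need a clause that counts {:exit, :timeout} entries as failures.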
Compiling and Running the Example
Compile the Project: Ensure your project is compiled by running:
mix compile
Start the Elixir Interactive Shell: Launch the interactive Elixir shell with your project loaded:
iex -S mix
Run the Example: Once inside the interactive shell, execute the function to run the experiment:
OpenaiExperiment.execute_chats_concurrently()
Results
The results of the experiment were as follows:
- Successful tasks: 1800
- Failed tasks: 0
- Total execution time: 23.436011 seconds
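These three numbers are enough for a rough back-of-the-envelope reading. The sketch below derives average throughput and, using Little's law, an upper bound on the mean time a task spent in flight. The bound assumes all 500 concurrency slots were busy for the whole run, which is an assumption rather than something the experiment measured.

```elixir
total_requests = 1800
elapsed_seconds = 23.436011
concurrency_limit = 500

# Average throughput over the whole run.
throughput = total_requests / elapsed_seconds

# Little's law: in_flight = throughput * mean_time_in_flight.
# in_flight can never exceed the concurrency limit, so this is an upper bound
# on the mean time per task (including any retries and retry delays).
max_mean_seconds = concurrency_limit / throughput

IO.puts("Throughput: #{Float.round(throughput, 1)} requests/second")
IO.puts("Mean time per task: at most #{Float.round(max_mean_seconds, 2)} seconds")
```

This works out to about 76.8 requests per second and a mean time per task of at most about 6.51 seconds.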
Analysis
The wall-clock (elapsed) time of about 23.4 seconds is the headline number for this run. Here are some key takeaways from the results:
- Concurrency Handling: Elixir handled 1800 tasks efficiently, leveraging its lightweight process model with a concurrency limit of 500.
- Multi-Core Utilization: Elixir utilized all 4 CPU cores, demonstrating its ability to efficiently distribute tasks across available resources.
- Built-in Retries: The script included a retry mechanism to handle transient errors, ensuring robustness in task execution.
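The script retries with a fixed 2-second delay. A common variation is exponential backoff, where the delay doubles after each failure. The sketch below is an alternative under that assumption, not the approach used in the experiment; RetrySketch, with_backoff/3, and the flaky function are hypothetical names for illustration.

```elixir
defmodule RetrySketch do
  # Calls `fun` until it returns {:ok, value} or the attempts run out.
  # The delay doubles after every failed attempt.
  def with_backoff(fun, attempts_left, delay_ms) do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      {:error, _reason} when attempts_left > 1 ->
        Process.sleep(delay_ms)
        with_backoff(fun, attempts_left - 1, delay_ms * 2)

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Hypothetical flaky call: fails twice, then succeeds. An Agent counts the calls.
{:ok, counter} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  calls = Agent.get_and_update(counter, fn n -> {n + 1, n + 1} end)
  if calls < 3, do: {:error, :transient}, else: {:ok, calls}
end

IO.inspect(RetrySketch.with_backoff(flaky, 5, 10))
```

Here the third call succeeds after delays of 10 ms and 20 ms. In a real client, the base delay would be closer to the 2000 ms used in the script, and adding random jitter is a common refinement.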
This experiment demonstrated that Elixir, with its concurrency model, is a strong contender for handling high-volume, concurrent API requests using a community-driven OpenAI client. Elixir's efficient concurrency handling makes it a strong choice for scenarios where performance and speed are critical.
If you’re working on a project that involves extensive use of the OpenAI API or any other high-volume API interactions, consider leveraging Elixir to maximize performance and efficiency. The results of this experiment highlight the potential gains in speed and responsiveness that can be achieved with the right choice of technology.
It's worth noting that the OpenAI client used in this experiment is community-driven, not official, which may also affect performance. Additionally, keep an eye on developments in Elixir and Phoenix, as they continue to evolve and improve.
This post was written with the help of GPT-4 using the Elixir ex_openai library.