When using the OpenAI ChatGPT API for conversation, you can cap the length of the model's reply with the `max_tokens` parameter, which limits the number of tokens the model may generate in its response.
This is an example of how to use it:
```python
import openai

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
    max_tokens=60,  # limits the response to at most 60 tokens
)
```
Note that you are billed for the tokens the model actually generates, not the full `max_tokens` budget, so a reply shorter than the limit costs less. However, setting `max_tokens` too low may produce responses that are cut off mid-sentence; you will need to experiment to find a value that fits your use case. Also keep the model's context window in mind: for `gpt-3.5-turbo` it is 4,096 tokens, and the API raises an error if the tokens in the conversation history plus `max_tokens` exceed that limit.
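One practical way to tell whether a reply was cut off by the `max_tokens` cap is to inspect the `finish_reason` field of the response: `"length"` means the cap was hit, while `"stop"` means the model finished naturally. Below is a minimal sketch of that check; the response dict is hand-built to mimic the Chat Completions response shape (no live API call is made, and the values are illustrative):

```python
def is_truncated(response):
    """Return True when the reply stopped because it hit the max_tokens cap."""
    return response["choices"][0]["finish_reason"] == "length"


# Hand-built example mimicking the Chat Completions response shape.
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "The Los Angeles"},
            "finish_reason": "length",  # "stop" would mean a natural ending
        }
    ],
    "usage": {"prompt_tokens": 27, "completion_tokens": 60, "total_tokens": 87},
}

if is_truncated(sample):
    print("Response was cut off; consider raising max_tokens or retrying.")
```

If you detect truncation, you can retry with a larger `max_tokens`, or ask the model to continue from where it left off.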