# Introduction
Large language models (LLMs) often lean toward “flowery” and excessively long-winded phrasing in their answers. Pose a straightforward question, and you will likely be inundated with dense, enthusiastic, and overly elaborate paragraphs. This tendency stems from their training, as these models are designed to be highly helpful and conversational.
Unfortunately, verbosity is worth keeping an eye on because it correlates with a more serious problem: hallucinations. The longer a response grows, the more likely it is to drift away from verified facts and into fabricated territory.
Put simply, sturdy guardrails are needed to address this two-fold challenge, starting with checks on verbosity. This article walks you through using the Textstat Python library to assess readability and catch overly complex responses before they reach the end user, prompting the model to tighten its output.
# Setting a Complexity Budget with Textstat
The Textstat Python library lets you calculate metrics such as the automated readability index (ARI), which estimates the educational grade level required to comprehend a passage, such as a model’s reply. If this score exceeds a set limit (for instance, 10.0, roughly a 10th-grade reading level), an automatic re-prompting step can be triggered to demand a simpler, more direct answer. This approach not only trims flowery language but can also help lower hallucination risk, because the model stays closer to core facts.
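As a quick, self-contained illustration of the idea (a minimal sketch, separate from the pipeline built later in this article; the sample sentence and the 10.0 budget are just examples), the check boils down to computing an ARI score and comparing it against the budget:

```python
import textstat

COMPLEXITY_BUDGET = 10.0  # maximum acceptable ARI, roughly a 10th-grade reading level

response = (
    "The multifaceted ramifications of this paradigm necessitate a comprehensive "
    "re-evaluation of the underlying epistemological assumptions."
)

# ARI estimates the grade level needed to understand the text
ari = textstat.automated_readability_index(response)
print(f"ARI score: {ari:.2f}")

if ari > COMPLEXITY_BUDGET:
    print("Budget exceeded: re-prompt the model for a simpler, more direct answer.")
else:
    print("Response is within the complexity budget.")
```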
# Implementing the LangChain Pipeline
Below is a practical implementation of the above blueprint within a LangChain pipeline that you can conveniently run inside a Google Colab notebook. You’ll need a Hugging Face API token, which you can create for free on the Hugging Face website. In Colab, open the sidebar, click the “Secrets” icon (shaped like a key), and add a new secret named HF_TOKEN. Paste your API token into the “Value” box, and you’re good to go!
To get started, install the needed libraries:
```python
!pip install textstat langchain_huggingface langchain_community
```

The snippet below is tailored specifically for Google Colab, so you may need to tweak it if you’re in a different environment. Its main job is retrieving the stored API token:
```python
from google.colab import userdata

# Retrieve the Hugging Face API token stored in your Colab session's Secrets
HF_TOKEN = userdata.get('HF_TOKEN')

# Confirm the token was retrieved correctly
if not HF_TOKEN:
    print("WARNING: The token 'HF_TOKEN' could not be located. Errors may follow.")
else:
    print("Hugging Face Token loaded successfully.")
```

The next block of code handles several tasks at once. It first sets up components for local text generation using a pre-trained Hugging Face model, namely distilgpt2. After that, the model is wired into a LangChain pipeline.
```python
import textstat
import torch
from langchain_core.prompts import PromptTemplate

# Classes needed for local Hugging Face pipelines
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

# Initializing a free-tier, locally compatible LLM for text generation
model_id = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Building a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    device=0 if torch.cuda.is_available() else -1  # GPU if present, otherwise CPU
)

# Wrapping the pipeline in HuggingFacePipeline so LangChain can use it
llm = HuggingFacePipeline(pipeline=pipe)
```

The heart of our verbosity-control strategy comes next. This function produces a summary of the text given to it (assumed to be an LLM’s response) and works to keep that summary within a defined complexity threshold. It’s worth noting that, when guided by a suitable prompt template, models like distilgpt2 can handle text summarization, though the output quality may not rival larger, purpose-built summarizer models. This particular model was chosen for its dependability when running locally under resource constraints.
```python
def safe_summarize(text_input, complexity_budget=10.0):
    print("\n--- Beginning Summary Process ---")
    print(f"Input text length: {len(text_input)} characters")
    print(f"Target complexity limit (ARI score): {complexity_budget}")

    # Step 1: Initial summary generation
    print("Producing the first comprehensive summary...")
    base_prompt = PromptTemplate.from_template(
        "Provide a comprehensive summary of the following: {text}"
    )
    chain = base_prompt | llm
    summary = chain.invoke({"text": text_input})

    print("Initial Summary produced:")
    print("-------------------------")
    print(summary)
    print("-------------------------")

    # Step 2: Measure readability
    ari_score = textstat.automated_readability_index(summary)
    print(f"Initial ARI Score: {ari_score:.2f}")

    # Step 3: Enforce the complexity budget
    if ari_score > complexity_budget:
        print("Budget exceeded! The initial summary is too complex.")
        print("Activating simplification guardrail...")
        simplification_prompt = PromptTemplate.from_template(
            "The following text is too verbose. Rewrite it concisely "
            "using simple vocabulary, stripping away flowery language:\n\n{text}"
        )
        simplify_chain = simplification_prompt | llm
        simplified_summary = simplify_chain.invoke({"text": summary})

        new_ari = textstat.automated_readability_index(simplified_summary)
        print("Simplified Summary produced:")
        print("-------------------------")
        print(simplified_summary)
        print("-------------------------")
        print(f"Updated ARI Score: {new_ari:.2f}")

        summary = simplified_summary
    else:
        print("Initial summary falls within the complexity budget. No simplification required.")

    print("--- Summary Process Complete ---")
    return summary
```

Also observe in the code above that ARI scores are computed to estimate how complex the text is.
The final segment of the code example tests the guardrail end to end on a sample of deliberately complex, wordy text:
```python
# 1. Supplying a sample of highly complex and wordy text
sample_text = """
The deeply interconnected arrangements of cognitive computing systems within the domain of Large Language Models frequently trigger a series of unnecessarily convoluted word choices. This tendency toward roundabout phrasing, although it may appear to signal deep expertise, often hides the core meaning, making the resulting output much harder for the average reader to grasp.
"""

# 2. Invoking the function
print("Executing the summarizer process...\n")
final_output = safe_summarize(sample_text, complexity_budget=10.0)

# 3. Displaying the final output
print("\n--- Final Guardrailed Summary ---")
print(final_output)
```
The output messages can be fairly long, but you should notice a slight drop in the ARI score after the simplification step. Don’t expect dramatic improvements, though: the chosen model is efficient but a limited summarizer, so the reduction in the ARI score will be modest. You could experiment with alternative models such as google/flan-t5-small to evaluate their summarization performance (a rough sketch of that swap appears below), but keep in mind that these alternatives tend to be more resource-intensive to run.
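If you want to try that swap, here is a minimal, untested sketch. The key assumption to flag: Flan-T5 is a sequence-to-sequence model, so it needs AutoModelForSeq2SeqLM and the "text2text-generation" pipeline task instead of the decoder-only setup used above; everything else mirrors the earlier code.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

# Flan-T5 is an encoder-decoder (seq2seq) model, unlike the decoder-only distilgpt2
alt_model_id = "google/flan-t5-small"
alt_tokenizer = AutoTokenizer.from_pretrained(alt_model_id)
alt_model = AutoModelForSeq2SeqLM.from_pretrained(alt_model_id)

alt_pipe = pipeline(
    "text2text-generation",   # seq2seq models use this task, not "text-generation"
    model=alt_model,
    tokenizer=alt_tokenizer,
    max_new_tokens=100,
    device=0 if torch.cuda.is_available() else -1,
)

# Reassigning `llm` is enough: safe_summarize() reads the module-level variable,
# so both the summarization and simplification prompts will go through Flan-T5.
llm = HuggingFacePipeline(pipeline=alt_pipe)
```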
# Conclusion
This guide demonstrates how to build a system for evaluating and managing excessively verbose responses from LLMs by utilizing an auxiliary model to condense them prior to assessing their complexity levels. In numerous cases, hallucinations arise as a consequence of excessive verbosity. While the approach outlined here primarily targets verbosity evaluation, there are additional techniques available for detecting hallucinations — including semantic consistency verification, natural language inference (NLI) cross-encoders, and LLM-as-a-judge frameworks.
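As a quick taste of the first of those techniques, here is a small, hypothetical sketch of a semantic consistency check built on sentence embeddings (it assumes the sentence-transformers package is installed; the model name and the 0.5 threshold are illustrative choices, not part of the pipeline above): if the summary’s embedding strays too far from the source text, it is flagged for review.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical helper: flags a summary that drifts semantically from its source.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def is_semantically_consistent(source: str, summary: str, threshold: float = 0.5) -> bool:
    # Embed both texts and compare them with cosine similarity
    source_emb = embedder.encode(source, convert_to_tensor=True)
    summary_emb = embedder.encode(summary, convert_to_tensor=True)
    similarity = util.cos_sim(source_emb, summary_emb).item()
    print(f"Cosine similarity between source and summary: {similarity:.2f}")
    # A low score suggests the summary may have drifted or hallucinated
    return similarity >= threshold

# Example usage with the variables from the pipeline above:
# is_semantically_consistent(sample_text, final_output)
```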
Iván Palomares Carrascosa serves as a leader, author, speaker, and consultant specializing in AI, machine learning, deep learning, and LLMs. He educates and mentors others in applying AI effectively in practical, real-world scenarios.



