AI hallucinations are a well-known problem, and in compliance work these convincing but inaccurate outputs can cause real damage: flawed risk assessments, incorrect policy guidance, even inaccurate incident reports.
Cybersecurity leaders say the real trouble begins when AI moves past writing summaries and starts making judgment calls. That’s when it’s asked to decide things such as whether security controls are doing their job, whether a company is meeting compliance standards, or whether an incident was handled the right way.
Here are nine ways CISOs can tackle the problem of AI hallucinations.
Keep humans in the loop for high-stakes decisions
Fred Kwong, VP and CISO at DeVry University, says his team is carefully testing AI in governance, risk, and compliance work, especially in third-party risk assessments. He notes that while AI helps review vendor questionnaires and the supporting evidence that establishes those vendors’ security posture, it doesn’t replace people.
“What we’re seeing is the interpretation is not as good as I would want it to be, or it’s different than how we’re interpreting it as humans,” Kwong says.
He explains that AI often reads control requirements differently than experienced security professionals do. Because of that, his team still reviews the results manually. For now, AI isn’t saving much time because the trust in the technology just isn’t there yet, he says.
Mignona Coté, senior VP and CISO at Infor, agrees that human oversight is essential, especially in risk scoring, control assessments, and incident triage. “Keep the human in the loop, full stop,” says Coté, who sees AI as a productivity tool, not something that should make final decisions on its own.
Treat AI outputs as drafts, not finished products
One of the biggest risks is over-trusting AI, according to security experts. Coté says her organization changed its policy so AI-generated content can’t go straight into compliance documentation without a human review.
“The moment your team starts treating an AI-generated answer as a finished work product, you have a problem,” she says. “Treat every output as a first draft as opposed to a final one. There will come a point where repetitive questions will have repetitive answers. By labeling those answers and time stamping them at origination time, they can be addressed at scale.”
Srikumar Ramanathan, chief solutions officer at Mphasis, says this over-trust often stems from what he calls “automation bias.” People naturally assume that something written clearly and confidently must be correct.
To counter that, he says companies need to build a culture of “active skepticism.” “[That means] looking upon AI outputs as unverified drafts that require a signature of human accountability before they are actionable,” he explains.
Demand proof, not polished prose, from vendors
When vendors say their AI can “assess compliance” or “validate controls,” security leaders say buyers need to ask the tough questions.
Kwong says he pushes vendors to provide traceability for the answers the AI gives so his team can see how it reached its conclusions. “Without that traceability, it makes it even that much harder for us to identify,” he says.
Ramanathan says buyers should ask whether the system can point to the specific evidence behind its answer, such as a time-stamped log entry or a particular configuration file. If it can’t, the tool may just be producing text that sounds right.
Puneet Bhatnagar, a cybersecurity and identity leader, says the key question is whether the AI is actually analyzing live operational data or just summarizing documents. “If a vendor cannot show a deterministic evidence path behind its conclusion, it’s likely generating narrative – not performing an assessment,” says Bhatnagar, who most recently served as SVP and head of identity management at Blackstone. “Compliance isn’t about language. It’s about proof.”
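One way to operationalize that demand is to require every AI conclusion to arrive as a machine-readable record that links back to concrete artifacts. A minimal sketch in Python, with invented field names, of what such an evidence-linked finding might look like:

```python
from dataclasses import dataclass

@dataclass
class EvidenceRef:
    """Pointer to a concrete artifact, e.g. a time-stamped log entry."""
    artifact_id: str   # where the proof lives (log store, config repo, ...)
    timestamp: str     # when the evidence was captured
    excerpt: str       # the exact content the conclusion relies on

@dataclass
class AssessmentFinding:
    conclusion: str
    evidence: list[EvidenceRef]

def is_traceable(finding: AssessmentFinding) -> bool:
    """Reject any conclusion that cannot point to concrete evidence."""
    return bool(finding.evidence) and all(
        ref.artifact_id and ref.timestamp for ref in finding.evidence
    )
```

A tool that can only return prose, not records like these, is summarizing rather than assessing.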
Stress-test models before extending trust
Kwong recommends testing AI tools to see how consistent they are. For example, send the same data through twice and compare the results.
“If you send the same data again, is it spitting back the same result?” he asks.
If the answers change significantly, that’s a red flag. He also suggests removing critical evidence to see how the model reacts. If it confidently gives an answer anyway, that could signal a hallucination.
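Both checks are easy to automate. Below is a minimal sketch, assuming a hypothetical assess() callable that wraps whatever interface the AI tool exposes and returns a verdict string:

```python
from typing import Callable

def stress_test(assess: Callable[[dict], str], evidence: dict) -> None:
    """Run the two consistency checks Kwong describes."""
    # Check 1: identical input should produce an identical verdict.
    first, second = assess(evidence), assess(evidence)
    if first != second:
        print(f"Red flag: same data, different results: {first!r} vs {second!r}")

    # Check 2: remove critical evidence one piece at a time. A trustworthy
    # tool should flag the gap rather than answer confidently anyway.
    # (The "insufficient evidence" phrasing is an assumed convention.)
    for key in evidence:
        ablated = {k: v for k, v in evidence.items() if k != key}
        verdict = assess(ablated)
        if "insufficient evidence" not in verdict.lower():
            print(f"Possible hallucination: confident verdict {verdict!r} "
                  f"despite missing {key!r}")
```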
Coté says her team checks AI outputs against other tools, including scanning systems and external penetration testing results. “And we don’t extend trust to any AI tool until it has proven itself against known outcomes repeatedly,” she says.
Measure hallucination rates and track drift
Security leaders say organizations need to track how accurate AI is over time. Kwong says teams should regularly compare AI-generated assessments with human evaluations and examine the differences. That process should happen at least quarterly.
Ramanathan suggests tracking metrics such as “drift rate,” which measures how often AI conclusions differ from human evaluations. “A model that was 92% accurate six months ago and is 85% accurate today is more dangerous than one that’s been consistently at 80% because your team’s trust was calibrated to the higher number,” he notes.
He also recommends measuring how often cited evidence really supports the AI’s claims. If hallucination rates climb too high, organizations should reduce how much authority the AI has, for example by downgrading it to a less autonomous role in their governance models.
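As a rough illustration (field names are invented for the sketch), both of those metrics reduce to simple ratios over a quarterly review sample:

```python
def drift_rate(ai_verdicts: list[str], human_verdicts: list[str]) -> float:
    """How often AI conclusions differ from the human evaluations."""
    disagreements = sum(a != h for a, h in zip(ai_verdicts, human_verdicts))
    return disagreements / len(ai_verdicts)

def evidence_support_rate(findings: list[dict]) -> float:
    """How often cited evidence, once checked, actually supports the claim."""
    supported = sum(f["evidence_supports_claim"] for f in findings)
    return supported / len(findings)

# Example: one disagreement in four reviewed assessments.
print(drift_rate(["pass", "pass", "fail", "pass"],
                 ["pass", "fail", "fail", "pass"]))  # 0.25
```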
Look ahead to contextual blind spots in compliance mapping
Bhatnagar says the most dangerous hallucinations happen when AI is asked to make judgment calls about control effectiveness, regulatory gaps, or incident impact.
AI can produce what he calls “plausible compliance,” or answers that sound convincing but are wrong because they lack real-world context. Compliance often depends on technical details, compensating controls, and operational realities that documentation alone doesn’t reveal.
Ramanathan adds that AI often struggles with the nuance of permissive language (“may,” “can”) versus restrictive language (“must,” “is required to”).
“For example, AI often misinterprets permissive language like ’employees may access the system after completing training’ as a strict, enforceable rule, treating optional permissions as mandatory controls,” Ramanathan explains. “This causes AI to overestimate the authority of permissive or vague language, resulting in incorrect assumptions about whether policies are properly enforced or security measures are effective.”
Push back on generic or identical assessments
Some vendors overstate what their AI tools actually do. Bhatnagar says many tools summarize documents or generate gap reports, but vendors market these features as if they’re performing full, automated compliance checks.
The risk increases when multiple customers receive nearly identical assessments. Organizations may believe their controls have been thoroughly evaluated when the AI only performed a surface-level document review.
Ramanathan says this creates false confidence and broader industry risk. If one popular model has a flaw, that blind spot can spread widely.
Bhatnagar adds that he has seen vendors market AI tools as assessing whether organizations are compliant, even when multiple customers receive structurally similar or nearly identical assessments.
In those situations, the tool may not actually be analyzing company-specific policies or evidence but instead producing text that appears customized without being grounded in reality, he says. “We are still in the early stages of separating AI narrative generation from AI-based verification,” he says. “That distinction will define the next phase of governance tooling.”
Reinforce accountability in audits and legal reviews
From a regulatory standpoint, AI doesn’t remove responsibility, according to experts. Ramanathan says regulators are clear that the duty of care remains with corporate officers.
“If an AI-generated assessment misses a material weakness, the organization is liable for ‘failure to supervise,’” he says. “We are already in an era wherein relying on unverified AI outputs could be seen as gross negligence. If your audit findings are wrong because of an AI error, you haven’t just failed an audit, you are held responsible for filing a misleading regulatory statement. ‘AI told me so’ is not a defense.”
Coté says being able to show that a human reviewed and approved every consequential decision is essential during audits. “The key is proving a human was at every consequential decision point, with a timestamp and an audit trail to back it up,” she notes.
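In practice, that proof can be as simple as an append-only log of decision records. A minimal sketch, with hypothetical field names, of what each entry might capture:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionRecord:
    """One consequential decision, tied to an accountable human."""
    ai_output_id: str   # which AI draft was reviewed
    decision: str       # approved, rejected, or amended, and why
    approver: str       # the named human, never an AI agent
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# An append-only list stands in for a tamper-evident store.
audit_trail: list[DecisionRecord] = []
audit_trail.append(DecisionRecord("draft-042", "approved control mapping", "j.smith"))
```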
Be cautious with automated regulatory mapping
Ramanathan says one of the biggest compliance risks appears when companies rely on AI to automatically map internal controls to regulatory frameworks such as GDPR or SOC 2.
“The greatest compliance risk by far is in automated regulatory mapping,” he notes. “The AI might confidently claim a control exists or satisfies a requirement based on a linguistic pattern rather than a functional or operational reality.”
For example, an AI tool might see an encryption setting listed in a database configuration and assume encryption is active, even if that feature is turned off in the system.
Ramanathan says this can create “a massive security gap where a company believes they are audit-ready, only to discover during a breach that their AI-verified defenses were nonexistent or misconfigured.”
To reduce that risk, he says organizations need to structure their policies and regulations more clearly and connect them to enforceable technical rules, rather than relying solely on AI to interpret documents.
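As a sketch of that idea, the encryption control from the example above can map to a direct runtime check instead of an AI reading of the configuration document. This assumes a PostgreSQL database and the psycopg2 client purely for illustration:

```python
import psycopg2  # assumption: the system of record is PostgreSQL

def encryption_control_passes(dsn: str) -> bool:
    """Ask the running database, not the documentation,
    whether TLS is actually enabled on the server."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SHOW ssl;")  # live server setting: 'on' or 'off'
            (ssl_setting,) = cur.fetchone()
    return ssl_setting == "on"
```

A check like this returns a verifiable answer about the system as it runs, which is exactly the kind of enforceable technical rule Ramanathan describes.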



