Research on artificial intelligence (AI) and mental health has focused largely on harms at deployment, including chatbot safety, sycophancy, and AI-associated delusions. Less attention has been paid to a prior question: whether the human-generated text and preference judgments that shape large language models are themselves clinically reliable, particularly when self-report may be distorted. This Viewpoint aims to develop the clinical psychiatric construct of collusion—the uncritical acceptance of an unreliable account—as an analytic lens for AI training and deployment, and to argue that the clinical reliability of training and preference data should be treated as an explicit trustworthy-AI criterion in mental-health–relevant systems. A conceptual synthesis of psychiatry, clinical psychology, and AI safety literature was undertaken. The analysis distinguishes three pipeline layers: pretraining corpora, preference data and posttraining methods, and deployment-time interaction. It maps the clinical construct of collusion against adjacent technical concepts, including sycophancy, reward overoptimization, grounding, refusal training, red-teaming, and live monitoring. The synthesis suggests that collusion-like dynamics are least applicable at the pretraining layer and most applicable at the preference-data and deployment layers, where unassessed user or labeler input can be reinforced without corroboration. Existing mitigations, including data curation, Constitutional AI, reward-model evaluation, grounded generation, refusal training, red-teaming, and postdeployment monitoring, address parts of this problem. However, these approaches are not yet organized around a clinically informed account of when self-report is unreliable. The central novelty is therefore not a generic claim about bias, but the proposal that clinical self-report reliability should be assessed as a distinct data-quality and governance dimension. 
Trustworthy-AI frameworks for mental-health–relevant applications should incorporate clinical expertise in self-report reliability into preference-data design, red-teaming, and postmarket surveillance. Adding the clinical reliability of training and preference data as an explicit criterion could complement existing technical safeguards while leaving empirical evaluation of clinician involvement as an open research agenda.