Validating the Jane Competency System AI Critical Thinking Assessments

April 1, 2021

This blog post is the first of two that excerpt a HealthStream article, “The Validity of the Jane® Competency System AI Critical Thinking Assessments,” by Randy L. Carden, Statistical Consultant, HealthStream.

The development of critical thinking and judgment skills in nurses is of paramount importance in the healthcare industry today. The following factors have converged to make the development of advanced critical judgment skills a top priority:

  • A growing senior population requiring nursing care
  • A high percentage of seasoned nurses taking retirement
  • Nursing shortages in many areas of the country
  • Increased patient acuity in many settings
  • The need to bring new nurses up to speed as quickly as possible

It has long been thought that these types of skills could only be developed through years of on-the-job training and experience. Now, however, we are finding that artificial intelligence (AI) can play a major role in providing efficient, comprehensive tools for enhancing critical judgment.

The paper excerpted by this blog post details psychometric studies conducted by HealthStream to evaluate whether computers can perform as well as human evaluators in assessing critical thinking skills in nurses.


Human evaluators have long been relied upon to judge performance and to score tests and assessments. The advent of artificial intelligence and machine learning raises a question: how well can a computer analyze the responses of nurses who are asked to evaluate a clinical situation or dilemma? The following study sought to answer this question by comparing computer scoring with human scoring of nurse responses to clinical dilemmas. This study assesses the validity of the Jane™ competency system AI critical thinking assessments as an evaluative tool in scoring responses of RNs in situations where critical judgment is required.

Purpose of the Study

The purpose of this study was to test the validity of Jane™ (leveraging IBM Watson with HealthStream’s proprietary scoring algorithm). Validity is the degree to which a test, instrument, or assessment measures what it purports to measure. This study evaluated a particular type of validity: construct validity. Construct validity is the degree to which an instrument measures a particular dimension, concept, or construct. Here, it is the degree to which Jane™ measures the critical thinking and judgment of a sample of nurses as they indicate how they would respond to various nursing dilemmas and situations. If Jane™ scores correlate with a known measure of the construct in question, that correlation is taken as evidence of construct validity. In this study, Jane™ scores were compared to the scores assigned by a trained human RN using “model answers” established by PBDS.
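
To make the idea of "scores that correlate" concrete, here is a minimal sketch of one common way to quantify agreement between two sets of ratings: the Pearson correlation coefficient. The excerpt does not say which statistic the study used, so this is an illustration only. The function name `pearson_r` and all score values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same eight responses (NOT study data):
machine_scores = [72, 85, 60, 91, 78, 66, 88, 55]  # assumed AI-assigned scores
human_scores = [70, 88, 58, 93, 75, 69, 85, 52]    # assumed nurse-rater scores

r = pearson_r(machine_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 would mean the two raters rank and space the responses similarly. A value near 0 would mean the automated scores carry little information about the human scores.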

How the Study Was Conducted

In order to assess the critical thinking of participants, nurses viewed and then reacted to a series of videos that were approximately 2-3 minutes in duration. Specific videos were assigned to participants based on the nurse’s specialty area. After viewing a video segment, nurse participants were asked to do the following:

  1. Identify the primary emerging issue or problem
  2. Describe the clinical observations that supported the perceived emerging issue/problem
  3. Identify action strategies that they would take
  4. Identify the rationale or reasoning supporting the action they planned to take

Nurse responses to the critical judgment videos were compared to “model answers” that have been developed through 30+ years of response data and evidence-based practice. The nurse responses to the videos were evaluated by a team of nurses who have deep experience using the “model answers.”

Jane™ was “trained” by leveraging artificial intelligence, powered by IBM Watson, and the PBDS database, which contains more than 15 million data points of completed assessment responses. This training included identification of problems, observations, actions, and rationale based on “model answers” established for PBDS.

Proprietary grading/scoring algorithms were developed by using the “model answers” with consultation and interpretative guidance by specially trained nurse raters. The next step included sending selected evaluations to an experienced lead nurse rater. The nurse rater evaluated 28 sets, which included 8 conversations per set, yielding a total of 224 conversations.


In order to evaluate the construct validity of using Jane™ and HealthStream’s proprietary scoring algorithm, nurses across three nationally recognized healthcare systems were recruited to take the critical thinking assessments. As a result, over 326 completions were obtained. Twenty-eight complete evaluation sets were selected across all score ranges for final comparison between Jane™ and human ratings.

The subsequent post in this series will include the following findings about Jane™:

  • Summary/Conclusions
  • Validity and Reliability of PBDS
  • Content Validity
  • Construct Validity
  • Predictive Validity
  • Reliability

HealthStream Focuses on Clinical Development

At HealthStream we spend a lot of time focused on developing the clinical workforce. HealthStream’s Jane™ is The World’s First Digital Mentor for Nurses. Jane harnesses the power of artificial intelligence (AI) to create a system that personalizes competency development at scale, quickly identifies risk and opportunity, and improves quality outcomes by focusing on critical thinking. Leveraging decades of research and with over 4 million assessments completed, Jane was designed to power lifelong, professional growth of clinical professionals. Jane™ is an important component of HealthStream’s suite of clinical development solutions.

Download the full article, “The Validity of the Jane™ Competency System AI Critical Thinking Assessments,” in which we investigate the assessments on which Jane™ is built.