Measuring nursing competencies in the operating theatre: Instrument development and psychometric analysis using Item Response Theory
Introduction
The focus of the study was on developing and validating a Performance Based Scoring Rubric designed to assess the competence of the instrument nurse in the operating theatre.
The aim of preparing nurses for practice is to ensure that they are competent to manage and provide quality care for patients (Dolan, 2003, Pirie and Gray, 2007). Evidence surrounding medical errors has made patient safety and quality of care major concerns in healthcare services. The competence of healthcare professionals to deliver safe, effective, patient-centred care is an important factor contributing to patient safety outcomes (Lunney et al., 2007), with the delivery of quality care directly related to the level of competence (Meretoja and Leino-Kilpi, 2001, Winslade et al., 2007). While the purpose of education is relatively easy to define, measuring the outcome of education and ensuring that competent nurses are being prepared is a far greater challenge (Edwards et al., 2001). Nurses are informed of the standard of practice expected of them through the development and acceptance of nursing competency standards; however, an assessment system to support such competencies is required (Evans, 2008).
A major focus in nursing education is the judgement of clinical performance which, although it may appear straightforward, is a complex process (Dolan, 2003). The challenge of assessing and measuring clinical competence is inherent in the nursing profession because of the diverse nature of nursing roles (Chambers, 1998, Dolan, 2003) and the complexity of nursing practice and human interaction (Gonczi et al., 1993), all of which are difficult to measure (Cowan et al., 2008, McGrath et al., 2006). With the transfer of nursing education to the tertiary sector, the assessment procedures that dominated nursing education in Australia, such as multiple choice tests and clinical skills tests, have been reviewed, with a move towards competency-based assessment. The difficulty of finding an effective measure is reported in the literature as the search for a valid and reliable method of assessing competence continues (Norman et al., 2002, While, 1994). To determine a nurse's competence in delivering the expected level of care, the nursing profession needs to develop procedures that enhance agreement about what constitutes a satisfactory performance (Dunn et al., 2000, Hager and Gillis, 1995).
Difficulties in developing valid and reliable assessment measures in nursing have resulted in instruments that closely mirror the clinical experience: that is, instruments with increased face validity but less focus on reliability, given the different variables present in the clinical setting that may influence the outcome of the assessment (McGrath et al., 2006). Although rarely addressed, the reliability and validity of assessment measures reported in the literature have not been specific enough (Watson et al., 2002), and theoretical frameworks are rarely reported (Meretoja and Leino-Kilpi, 2001). Literature reviews suggest that there is no ‘gold standard’ for measuring clinical competence (Dolan, 2003, Redfern et al., 2002, Watson et al., 2002), and assessing nurses continues to pose a challenge for nursing education (Evans, 2008).
Competence cannot be directly observed (Gonczi et al., 1993, Heywood et al., 1992, Wolf, 1989); therefore the evidence needs to be of sufficient quality and quantity to make a sound judgement about the individual's level of competence (Gonczi, 1994). With the movement in nursing from an apprenticeship model to one that focuses on the development of cognitive and caring skills (Benner, 1984), the ability to measure nursing skills has become increasingly important, since such measurement is central to the notion of professionalism and essential to nursing practice (Fotheringham, 2010, Norman et al., 1992). Similarly, evaluation of competence should assess competence in an integrated manner (Wolf, 1989). The method selected should be relevant to what is being assessed, to increase the validity of the evidence and ensure generalisability of the performance to other tasks (Bailey, 1993).
Reporting of results has been a contentious issue since the introduction of competency-based training and assessment, with much of the debate focused on reporting competence beyond the competent/not yet competent dichotomy (Gillis and Griffin, 2004). Several scoring approaches are used for evaluating performance, with two main approaches, analytical and holistic scoring, identified by Anthanasou (1997). Analytical scoring requires an examination of specific aspects of the performance against a set of criteria, as opposed to judging the overall impression of the candidate's performance (Anthanasou, 1997, Perkins, 1983). Assessors may judge the entire performance as a whole, but each aspect of the performance is awarded a separate score in addition to an overall score (Gillis, 2003, Goulden, 1989). An advantage of the analytical method is that it provides more detailed information about the candidate's performance: strengths and weaknesses are identified, allowing feedback to be tailored to the needs of the student.
When the overall performance is judged holistically, the evaluation considers the overall quality of the performance, including its component parts, but does not mark them separately (Goulden, 1989); a single classification of competence level is made. A limitation of holistic scoring is that a single score, while useful for ranking, provides very little diagnostic information about the various aspects of the performance. If an analytical scoring process is adopted, evidence of competence needs to be evaluated against scoring criteria, usually referred to as a ‘rubric’. A rubric, at its most basic, is a “scoring tool that lays out the specific expectations for an assignment” (Stevens and Levi, 2005, p. 3), implemented when an evaluation of the quality of performance is required. ‘Rubric’ is the technical term for the scheme used to allocate scores and is considered the most important aspect of performance assessment (Gillis and Griffin, 2004, Griffin, 2000).
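The contrast between the two scoring approaches can be sketched in code. The criteria, rating scale, and cut-scores below are hypothetical illustrations only; they are not the rubric developed in this study:

```python
# Illustrative sketch: analytical vs holistic scoring of one observed performance.
# Criterion names and thresholds are hypothetical, not taken from the study's rubric.

def analytical_score(ratings: dict, cut_score: int = 2) -> dict:
    """Analytical scoring: a separate score per criterion plus an overall score,
    so diagnostic detail is retained for feedback."""
    return {
        "per_criterion": ratings,
        "overall": sum(ratings.values()),
        "weaknesses": [c for c, r in ratings.items() if r < cut_score],
    }

def holistic_score(ratings: dict, pass_mark: int = 8) -> str:
    """Holistic scoring: one classification only, no per-criterion detail."""
    return "competent" if sum(ratings.values()) >= pass_mark else "not yet competent"

ratings = {"asepsis": 3, "instrument handling": 2, "count procedure": 1, "communication": 3}
print(analytical_score(ratings)["weaknesses"])  # → ['count procedure']
print(holistic_score(ratings))                  # → competent
```

The analytical result pinpoints where remediation is needed, while the holistic result only ranks the candidate, which is the trade-off described above.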
McGrath et al. (2006) provided a range of views on the use of generic domains of clinical competence that do not take into consideration the specific skills required to practise in a specialist environment. The use of generic domains raises concerns about standardising nursing practice, a common criticism of competency-based assessment (Evans, 2008). With the development of specific competencies, for example competencies for specialised areas of nursing, the assumption is that generic competencies will not be sufficient because the roles in different contexts of nursing vary considerably; hence the need to develop practice-specific competencies has been identified (Sutton and Arbon, 1994). A lack of existing instruments for measuring perioperative competencies was identified in the literature (Nicholson, 2005) and supported by a report published with the revised Australian College of Operating Room Nurses Competency Standards (2006).
The aim of the study was to examine the validity and reliability of a Performance Based Scoring Rubric designed to assess the competence of the instrument nurse in the operating theatre using the Rasch model (1960). The investigation into developing a measurement instrument for determining clinical competence in nursing was driven by two main rationales. The first related to a lack of a ‘gold standard’ for judging clinical performance in nursing practice (Dolan, 2003, Redfern et al., 2002, Watson et al., 2002). The second was linked to exploring the use of Rasch model theory in validating a competency-based assessment for the operating theatre to attain empirical support for construct and content validity and identify measurement problems (Beck and Gable, 2001, Fox, 1999, Smith, 2004).
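For reference, the dichotomous Rasch model (Rasch, 1960) invoked here expresses the probability that person $n$ succeeds on item $i$ as a function of the person's ability $\theta_n$ and the item's difficulty $\delta_i$:

```latex
P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
```

Because person and item parameters are placed on a common logit scale, the model supports the separation and fit statistics used in the analysis; for polytomous rubric ratings, the partial credit model cited in the references extends this formulation to multiple score categories.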
Section snippets
Development of the Performance Based Scoring Rubric
The instrument for recording the field observations of the nurse educators and preceptors included analytical scoring rubrics as well as holistic scoring of both levels of proficiency and competence.
The internal consistency of the 18-item Analytical Observation Form
To examine the internal consistency of the Analytical Observation Form, the responses of the 32 nurse educators and preceptors observing 95 nurses in the operating theatre were analysed using Cronbach's alpha and the Person Separation Reliability Index. The latter is a person index derived from Rasch modelling, indicating how clearly the test is able to distinguish between the persons and the items (Wright and Masters, 1982).
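Cronbach's alpha can be computed directly from a persons-by-items score matrix. The sketch below uses a small hypothetical ratings matrix (6 nurses by 4 rubric items on a 0–3 scale) for illustration; it is not the study's 95-nurse data set:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a persons x items score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)        # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of persons' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: rows = observed nurses, columns = rubric items (0-3 scale)
scores = np.array([
    [3, 3, 2, 3],
    [2, 2, 2, 2],
    [1, 1, 0, 1],
    [3, 2, 3, 3],
    [0, 1, 1, 0],
    [2, 3, 2, 2],
])
print(round(cronbach_alpha(scores), 3))  # → 0.937
```

High alpha here simply reflects that the illustrative items covary strongly; the Person Separation Reliability Index reported in the study is the Rasch-based analogue, computed from person ability estimates rather than raw scores.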
The internal reliability of the
Conclusions
The implications for assessing and inferring competence in nursing practice have been identified in the literature. While a number of measurement instruments have been developed in nursing, it is acknowledged that inadequate psychometric testing has been reported, resulting in a lack of confidence in supporting implementation of the measurement instrument. Evidence is required to reduce the subjectivity of direct observation and of the checklist approach, which has been relied on heavily in
References (60)
- et al. Increasing the accuracy of observer ratings by enhancing cognitive processing skills. American Journal of Pharmaceutical Education (1995)
- et al. Measuring nursing competence: development of a self-assessment tool for general nurses across Europe. International Journal of Nursing Studies (2008)
- Triangulation for the assessment of clinical nursing skills: a review of theory, use and methodology. International Journal of Nursing Studies (2010)
- et al. Using the Rasch model in nursing research: an introduction and illustrative example. International Journal of Nursing Studies (2009)
- et al. The validity and reliability of methods to assess the competence to practice of pre-registration nursing and midwifery students. International Journal of Nursing Studies (2002)
- et al. Exploring the assessors' and nurses' experience of formal assessment of clinical competency in the administration of blood components. Nurse Education in Practice (2007)
- et al. Australian nursing — moving forward? Competencies and the nursing profession. Nurse Education Today (1994)
- Introduction to Educational Testing (1997)
- Australian College of Operating Room Nurses Competency Standards (2006)
- ACORN Standards for Perioperative Nursing (2008)
- Theoretical Models and the Assessment of Competency
- Item Response Theory in affective instrument development: an illustration. Journal of Nursing Measurement
- From Novice to Expert. Excellence and Power in Clinical Nursing Practice
- Applying the Rasch Model. Fundamental Measurement in the Human Sciences
- Criterion-referenced definitions for rating scales in clinical evaluation. The Journal of Nursing Education
- Advanced Design in Nursing Research
- Some issues in the assessment of clinical practice: a review of the literature. Journal of Clinical Nursing
- Maximising confidence in assessment decision-making: a springboard to quality in assessment
- Assessing student nurse clinical competency: will we ever get it right? Journal of Clinical Nursing
- Formal models vs human situational understanding: inherent limitations on the modeling of business expertise. Office: Technology and People
- The development of competency standards for specialist critical nurses. Journal of Advanced Nursing
- Evaluating student learning: an Australian case study. Nursing & Health Sciences
- Competency assessment in nursing. A summary of literature published since 2000
- An introduction to the partial credit model for developing nursing assessment. The Journal of Nursing Education
- Using rubrics to recognise varying levels of performance. Training Agenda: A Journal of Vocational Education and Training
- Competency based assessment in the professions in Australia. Assessment in Education: Principles, Policy & Practice
- The Development of Competency-Based Assessment Strategies for the Professions
- Theoretical and empirical comparisons of holistic and analytical scoring of written and spoken discourse
Cited by (14)
- The use and quality of reporting of Rasch analysis in nursing research: A methodological scoping review. International Journal of Nursing Studies (2022)
- Emergency department registered nurses’ disaster medicine competencies. An exploratory study utilizing a modified Delphi technique. International Emergency Nursing (2019). Citation excerpt: “Although recognized as important, a review of literature revealed that specific disaster medicine competencies for ED registered nurses, as well as a method for assessing disaster preparedness of emergency department registered nurses may be vague or missing [23,24]. The ability to assess disaster preparedness, as well as training and educating staff as required may be difficult [25,26]. Training and education in disaster response should be planned in relation to defined learning objectives based on what the nurses need to master.”
- Students’ approaches to learning in a clinical practicum: A psychometric evaluation based on item response theory. Nurse Education Today (2018). Citation excerpt: “This item-level analysis can yield valuable information for the utility of a learning approach instrument, because some items may possess higher discriminating power in differentiating students at varied levels of a learning approach, and some items may reflect a more difficult learning strategy than others. In response to this research gap, item response theory (IRT; van der Linden and Hambleton, 2013), as a modern measurement theory, offers a promising solution, and it has been increasingly used in nurse education in recent years (Nicholson et al., 2013). Differing from the conventional classical test theory, IRT models the response of an individual to an item as a function of item measures (item discrimination and item difficulty) and person measures (also referred to as latent trait, in a continuum ranging from low to high levels).”
- Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory. American Journal of Surgery (2017). Citation excerpt: “In CTT, the total score is calculated regardless of how much each item correlates with the operative skill that is intended to be measured. However, using global rating scales such as GOALS, scores on items are ordinal rather than interval and the psychological distance between items differs from one to the other. ‘Bimanual dexterity’, ‘efficiency’ and ‘autonomy’ in the GOALS scale are more difficult items.”
- Assessment of bachelor's theses in a nursing degree with a rubrics system: Development and validation study. Nurse Education Today (2016). Citation excerpt: “Rubrics are assessment tools that establish criteria and achievement levels by setting a rating scale (Shipman et al., 2012). Rubrics are frequently used in nursing education in order to assess clinical judgement (Shin et al., 2015), clinical laboratories (Wu et al., 2015), or the skills acquisition in clinical settings (Nicholson et al., 2013). However, no studies have been found on their applications in the BT assessment.”
- Using Rasch analysis to identify midwifery students' learning about providing breastfeeding support. Women and Birth (2015). Citation excerpt: “Another unique aspect of Rasch analysis is its capacity to explore individuals’ response patterns to a survey, to identify if participants tended to answer the items using the same rating category throughout the survey, or if they tended to avoid selecting the categories located at either ends of the scale. Rasch analysis has many applications; it has been used to measure self-efficacy in different occupational roles in nursing; to measure competence of operating theatre nurses’ and renal nursing; to determine the performance of a depression screening scale and a psychological distress scale; and for assessing English language skill development. Rasch analysis was chosen as the method of choice for this study for two specific reasons.”