
Significant Results

Nursing as a discipline relies heavily on testing to ensure that students have reached a basic level of understanding.  This protects the public we serve.  Testing occurs throughout the nursing program, and once students graduate, they must pass a national licensing exam in order to practice nursing.

At the beginning of the semester, Pat Woodbery and I sat down to evaluate the current exams in NUR 1141C.  We reviewed prior test questions, revised them, and wrote additional questions.  We entered all of the questions into ParTest, a robust test bank database.  Using ParTest together with its companion program, ParScore, we have built a large bank of usable questions for NUR 1141C.  We continually update the questions based on current information and on data gathered through the testing process.  During Fall 2003, I entered all questions for Mental Health Concepts in Nursing into ParTest.  My course leader and I evaluated these questions, and we now have a respectable database from which to draw questions for Mental Health as well as Pharmacology.

Test construction is an ongoing process; each evaluation (analysis of test data) shows that constant revision is required.  This semester, I used some questions that had performed very well in the two prior terms.  Based on this term's test data, these questions had to be nullified.  Was this a function of teaching, of learning, or a combination of the two?  Only item analysis with a large number of students responding to the question can clarify this.

Faculty at Valencia and at other nursing schools highly value a well-written test question that evaluates the learner's understanding of nursing concepts and critical thinking ability.  One can reason that if students can think critically, they are likely to make sounder clinical judgments.  It is important when testing to use questions at the application level or higher in Bloom's Taxonomy.  Questions that evaluate knowledge are important, but they do not evaluate the level of thinking that critical thinking requires.  One important aspect of testing is the level of discrimination: questions that ask the student to choose the best, most important, or first response require the student to think critically.  In the example question, students are asked to discriminate among several answers and indicate the best one.  More than one answer may be plausible, but only one is correct.  In this example, you can see that every distracter was selected, with C being the weakest.  On analysis, this is probably because the other distracters are plausible responses, while C is not.  This distracter, rather than the question, needs revision (Morrison, Smith & Britt, 1996).
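To make the distracter analysis concrete, here is a minimal Python sketch that tallies how often each option was chosen and flags weak distracters. The response data, the four-option A-D format, the function name `distractor_report`, and the 5 percent cutoff (`min_share`) are all hypothetical assumptions for illustration; they are not the department's or ParScore's actual rules.

```python
from collections import Counter


def distractor_report(responses, key, options="ABCD", min_share=0.05):
    """Return {option: (share_selected, is_weak_distracter)}.

    min_share is a hypothetical cutoff: a distracter chosen by fewer
    than this fraction of students is flagged as implausible.
    """
    counts = Counter(responses)
    n = len(responses)
    report = {}
    for opt in options:
        share = counts[opt] / n  # Counter returns 0 for options nobody chose
        weak = opt != key and share < min_share  # the correct answer is never "weak"
        report[opt] = (round(share, 2), weak)
    return report


# Hypothetical item: 50 students, correct answer B.
responses = ["B"] * 30 + ["A"] * 10 + ["D"] * 8 + ["C"] * 2
for option, (share, weak) in distractor_report(responses, key="B").items():
    print(option, share, "revise" if weak else "")
```

In this made-up item every distracter was chosen at least once, but C drew only 4 percent of responses, so it is the one flagged for revision while the question itself stands.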

When developing a test blueprint, certain aspects of test development are important to the success of the test.  Validity is a critical part of test construction.  Validity refers to a test's ability to measure what it is designed to measure; in other words, is it testing the knowledge taught or some other factor?  The test should reflect the content studied in proportion to class time and learning objectives.  ParTest allows the instructor to store questions by type, difficulty, cognitive level, and level of discrimination, among other attributes.  This is quite valuable to instructors looking to put together a reliable instrument with a clear blueprint.
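As a rough illustration of weighting a blueprint by class time, the sketch below divides a fixed number of items among topics in proportion to class hours, using the largest-remainder method so the counts add up exactly. The function name `allocate_items`, the topic names, and the hours are hypothetical, and a real blueprint would also weigh learning objectives, which this sketch ignores.

```python
import math


def allocate_items(hours, total_items):
    """Split total_items across topics in proportion to class hours.

    Uses the largest-remainder method so the counts sum to total_items.
    """
    total_hours = sum(hours.values())
    exact = {t: total_items * h / total_hours for t, h in hours.items()}
    alloc = {t: math.floor(x) for t, x in exact.items()}
    leftover = total_items - sum(alloc.values())
    # Hand out the remaining items to topics with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - alloc[t], reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc


# Hypothetical unit with 15 class hours and a 50-item exam.
hours = {"Assessment": 6, "Medication safety": 4, "Communication": 3, "Infection control": 2}
print(allocate_items(hours, 50))
```

With 15 hours and 50 items, the 2-hour topic's exact share is about 6.67 items, the largest fractional part, so it receives the one leftover item.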

A P-value describes the difficulty of a question: it is the proportion of students who answered the item correctly, so a lower P-value means a harder item.  The nursing department has parameters for nullifying test questions: if an item's P-value falls below 0.45, the question is considered too difficult and is nullified.  Items that are too difficult or too easy alter the reliability of the whole test, because a very high or very low P-value does not discriminate between high and low achievers (McDonald, 2002).  In addition, Mrs. Woodbery has taught me that tests must be evaluated as a whole: if the P-values range from 0.50 to 0.90 but more than 30 percent of the test questions have P-values below 0.70, then that test is very difficult and needs to be reviewed.
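The P-value arithmetic and the two rules above can be sketched in Python as follows. This is a minimal sketch assuming each item is scored 1 (correct) or 0 (incorrect); the function names and the sample data are hypothetical, and the thresholds are the 0.45 nullification cutoff and the 0.70/30 percent guideline described in this section.

```python
def p_values(scores):
    """Item difficulty: proportion of students answering each item correctly.

    scores[s][i] is 1 if student s answered item i correctly, else 0.
    """
    n_students = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_students for i in range(n_items)]


def items_to_nullify(pvals, cutoff=0.45):
    """Indices of items whose P-value falls below the nullification cutoff."""
    return [i for i, p in enumerate(pvals) if p < cutoff]


def is_very_difficult(pvals, hard_p=0.70, max_share=0.30):
    """True if more than max_share of the items have P-values below hard_p."""
    share_hard = sum(p < hard_p for p in pvals) / len(pvals)
    return share_hard > max_share


# Hypothetical results: 5 students x 4 items.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]
pvals = p_values(scores)
print([round(p, 2) for p in pvals])
print("nullify:", items_to_nullify(pvals))
print("very difficult test:", is_very_difficult(pvals))
```

Here the first item was answered correctly by everyone (P = 1.0), so it cannot separate high from low achievers; the third item (P = 0.2) falls below 0.45 and would be nullified; and two of the four items fall below 0.70, so the whole test would be flagged as very difficult.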

Test-writing techniques involve targeting specific cognitive levels.  The most basic levels, knowledge and comprehension, are rarely used because they only ask the student to recall learned material.  Most test items in use are at the application level or higher.  Application items require the student to apply knowledge in a new way.  Analysis items require the student to take the information apart and make inferences.  The synthesis level asks the student to combine information in new ways (http://www.kcmetro.cc.mo.us/longview/ctac/blooms.htm).  ParTest enables instructors to track the cognitive level of individual test items and the results of all their prior administrations.

In order to develop reliable instruments, certain statistical data are necessary.  The median describes the midpoint of all scores on a particular instrument.  Reliability refers to the consistency of the scores.  The Kuder-Richardson value is a measure of a test's reliability; it reflects how consistently students respond across the items of a single administration, which indicates whether a student would be expected to perform similarly on an equivalent test.  An acceptable reliability coefficient for a teacher-made test is 0.70 or higher, with values closer to 1.0 indicating greater reliability.  The accompanying item analysis sheet is an example from Spring 2003; the prior administration of this test, in 2002, had a Kuder-Richardson value of 0.75.  Kuder-Richardson values for NUR 1141C ranged from 0.69 to 0.79 during Spring 2003.
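For readers curious how the Kuder-Richardson value is computed, the sketch below implements the KR-20 formula, KR-20 = (k / (k - 1)) × (1 - Σpq / σ²), where k is the number of items, p and q are each item's proportions correct and incorrect, and σ² is the variance of students' total scores. It assumes 0/1-scored items and uses the population variance; the function name and sample data are hypothetical, and programs such as ParScore may differ in details.

```python
from statistics import median, pvariance


def kr20(scores):
    """Kuder-Richardson Formula 20 for 0/1-scored items.

    scores[s][i] is 1 if student s answered item i correctly, else 0.
    """
    n = len(scores)
    k = len(scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)  # population variance of total scores
    if var_total == 0:
        raise ValueError("KR-20 is undefined when every total score is equal")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in scores) / n  # proportion correct on item i
        sum_pq += p * (1 - p)                  # p * q for item i
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical results: 5 students x 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in scores]
reliability = kr20(scores)
print("median total score:", median(totals))
print("KR-20:", round(reliability, 2), "acceptable" if reliability >= 0.70 else "below 0.70")
```

In this small, perfectly ordered data set the totals are 4, 3, 2, 1, and 0, so the median is 2, the total-score variance is 2.0, Σpq is 0.8, and KR-20 comes to (4/3)(1 - 0.4) = 0.80, which would meet the 0.70 guideline.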

In addition to the resources I used in writing this page, I completed the course Assessment Strategies for Nursing Educators: Test Development and Item Writing.  Its content was very helpful and highlighted the blueprint for the NCLEX.

 

Index of Artifacts Used on This Page