In an assessment context, trustworthiness means honest, valid and reliable assessments that really do assess what they claim to assess, and assessment procedures that produce consistent results when administered in similar circumstances, at different times and by different raters. Judgement is thorough and well grounded in evidence, as well as being motivated by honourable intentions and guided by integrity. This is in contrast to assessments that are one-off or based on thin evidence, and where the basis of judgements is not clearly explained and documented.
Self or peer reflection
- Think of an assessment task you have used that worked well and provided helpful information.
– What characteristics contributed to your view that it was a successful task?
– In what ways was the information it provided helpful to you and your students?
- Think of an assessment task that did not work well or that did not provide helpful information.
– What characteristics contributed to your view that it was not a successful task?
– In what ways could the task have been improved to provide more helpful information?
Teacher-based assessment of EAL students is inherently more valid and reliable than externally set and assessed tests because it assesses authentic language use in low-stress conditions across multiple assessment tasks. However, teacher-based assessment cannot be treated like an external exam in which tasks are totally standardised and all contextual variables controlled; to attempt to do so would negate the very rationale for teacher-based assessment. To be effective, schools and teachers must develop skills in selecting or designing and implementing assessment tasks, as well as building trust in their own judgements and respect for those of their colleagues.
Some key things teachers (and schools) can do to enhance the trustworthiness of assessment include ensuring students have:
• information about how and when they will be assessed
• clear understanding of the assessment criteria
• an opportunity to use language that is appropriate, natural and authentic
• an opportunity to demonstrate their best
• confidence that the scoring is balanced and fair.
In addition, teachers need to ensure that:
• they use benchmarking processes that are collaborative, interactive and respectful
• sufficient records are kept for accountability purposes.
Students should understand how and when they will be assessed
Teachers need to plan teaching and learning activities that will improve students’ confidence and skills for the assessment. The task types used for TEAL may be new to students and need to be practised. When students practise using similarly structured activities based on different material, they receive feedback about whether the language choices they are making are appropriate for the task type and text type. Practice opportunities also allow students to become comfortable with the demands of the task. A certain level of comfort is necessary for students to produce high-quality, authentic language that is not forced or memorised, as may be the case when students are unfamiliar with this type of activity.
Before they begin a planned assessment task, students must know exactly what kind of task it will be. The specific task itself can be withheld until a few days, or even one day, before the assessment, depending on how much preparation it involves. Delaying information about the specific assessment task prevents over-rehearsal and memorisation of key words, speeches or scripts. Teachers should also ensure that students have read and understood the assessment criteria, and have had experience using them for self- and peer assessment in informal situations, before conducting a planned assessment activity. EAL students need to understand the language as well as the concepts expressed in the criteria. Ideally, teachers will also have used these criteria for informal assessment and teaching purposes before conducting any formal assessments, so that students are very familiar with the criteria describing different levels of success. Schools should also make sure parents understand that the assessment criteria are a valuable source of feedback, not simply a score.
Student language use should be appropriate, natural and authentic
EAL students who are developing competency in social interactions may still struggle to express themselves appropriately in academic contexts. They are constantly making choices about grammar and vocabulary at the same time as they are trying to communicate ideas about content, and they may use gesture or even draw a picture to help them. If these resources are suddenly withdrawn, the resulting stress may reduce the amount and quality of language the student is able to produce. Alternatively, students may try to memorise chunks of text that they may not understand, that may not represent their true level of competency, or that they may not use appropriately to support meaning. To support appropriate, natural and authentic language use, students need to feel comfortable about the task requirements, the level of difficulty and the audience for their text, product or conversation. They need to feel comfortable taking risks to express their understanding and ideas and asking questions (where appropriate), and to be aware that mistakes can be opportunities for further learning. If an assessment task will be undertaken as part of a group, students should already have practised a similar task with the same people, so that any interpersonal issues that may inhibit language use can be identified and resolved beforehand.
Students have the opportunity to demonstrate their best
If students are to demonstrate their best, they must be familiar with what they are striving to accomplish. Giving students some level of choice within the assessment task will also enhance their confidence. This is particularly important for students who are very shy or lack self-confidence, or whose English skills are at a very early stage of development.
During the assessment task the teacher/assessor may need to:
- offer personal encouragement to participate or extend ideas
- scaffold or prompt language when students become stuck
- encourage students to use their own words if the language appears rehearsed or memorised.
Teacher comments or questions should be used flexibly to ensure that students have the opportunity to show what they know and can do, so that the teacher can reach the most valid, “true” judgement of their ability. That is, questions and task prompts should not be standardised to the point where students who do not understand, or who misunderstand, have no opportunity to demonstrate their language skills. When making assessment judgements, the teacher/assessor needs to take into account the amount and nature of teacher support given and any variations made to the assessment task. This information needs to be clearly recorded and communicated to the student and to colleagues.
The assessment is credible, balanced and fair
Teachers cannot measure ability or progress without continually placing students in situations that allow them to demonstrate understanding. To be credible, the assessment needs to relate to core material and skills that have been taught. In addition, if the content of the assessment is too hard or the language used is too complex, students may not be able to apply what they have learnt. The form of the assessment also needs to elicit the information the teacher needs. A cloze passage, for example, may assess literal and inferential comprehension as well as vocabulary and syntax, but not oral fluency or pronunciation. In measurement terms, credibility of this kind is known as assessment validity.
It is easy for teachers to develop scoring biases: for example, giving much more emphasis to accuracy than to fluency, or rewarding confidence, quantity and fluency more than the quality of the ideas. Teachers may also allow their overall impression of a student to influence their marking. Handwriting, body language and level of cooperation are personal attributes that may distract a teacher from providing balanced feedback. Moderation using a sample of responses is one way to ensure consistent scoring. Consistency in the scores awarded by the same marker (for example, when marking comparable work on different occasions) is known as intra-rater reliability, whereas consistency in the scores awarded by different markers is known as inter-rater reliability.
Fair assessments are not biased in favour of any group of students, whether defined by gender, first language, country of birth or socio-economic status. The assessment should discriminate only in terms of student ability, not in terms of religion, family background or race. As it may be hard to predict how a student will react to the demands of a particular assessment, a range of assessments should be used over time. Again, giving students some level of choice within the assessment task will help them avoid a poor performance caused solely by a lack of prior knowledge or by unexpected stress evoked by the content.
Benchmarking is collaborative, interactive and respectful
Benchmarking involves a group of teachers meeting together to share understanding of what it means to achieve each level in the success criteria. They try to reach agreement on how their own students’ work compares to the wider group of EAL students. They refine their understanding by:
- identifying any areas where they may not be looking at tasks or performances in quite the same way
- talking through issues respectfully until they can resolve any misunderstandings
- applying their revised understanding to more sample responses
- satisfying themselves that they all have a similar perception of what the criteria look like in their own students’ work.
It is not necessary to have complete consensus: teachers do not all need to give identical scores, as some variation between adjacent scores is to be expected. Trustworthiness comes more from the process of seeking agreement and justifying judgements than from absolute agreement. Benchmarking should be a collaborative and interactive process. It helps teachers reach a shared interpretation of the assessment criteria, so that scores given by different teachers show a high level of consistency. Benchmarking allows the students’ teacher to stand back while colleagues challenge or confirm their judgements. Assessment for learning does not assume that the class teacher is totally objective or has no preconceived ideas or assumptions about a student’s level of English. On the contrary, it seeks to make such assumptions explicit and open to discussion with fellow teachers.
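To make the idea of "variation between adjacent scores" concrete, a benchmarking group could tally how often two teachers awarded exactly the same score and how often their scores differed by no more than one level. The sketch below is a minimal illustration only, assuming scores are recorded as whole-number levels; the function name `agreement_rates` and the sample scores are hypothetical and are not TEAL data.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact, adjacent) agreement between two teachers as proportions.

    exact: both teachers awarded the identical level.
    adjacent: levels differ by at most one (this includes exact matches).
    Assumes both lists score the same samples in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical levels awarded by two teachers to the same six samples.
teacher_1 = [3, 4, 2, 5, 3, 4]
teacher_2 = [3, 3, 2, 4, 3, 5]
exact, adjacent = agreement_rates(teacher_1, teacher_2)
print(f"exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
# prints "exact agreement: 50%, adjacent agreement: 100%"
```

In this invented example the two teachers match exactly on half the samples but never differ by more than one level, which reflects the expected variation between adjacent scores rather than a disagreement about what the criteria mean.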
Benchmarking starts by teachers sharing ideas about all aspects of the assessment process, including:
- discussing which tasks are suitable for their own context
- talking about problems and finding ways to solve them
- looking at / listening to selected responses of students in other classes and discussing their judgements
- helping each other review scores
- discussing problematic cases and areas of concern
- evaluating appropriate feedback for the students.
All of these professional activities help teachers feel confident in making professional judgements and in their teaching.
Listen to Dr Penny McKay, author of the EAL Band Scales, talking about the benefits for teachers of discussing student assessments with colleagues. Then listen to Prof. Dylan Wiliam talking about the importance of sharing assessment judgements with colleagues.
The more benchmarking becomes part of the climate of the school, the less teachers need to talk through what each score means because they will gradually internalise the way the rubrics work. This within-the-school understanding and consensus-building grows over time as the principles and standards of each assessment become familiar to everyone, and as all teachers learn to trust themselves and their colleagues to conduct the system carefully and honestly. This is not “another meeting” intruding into teachers’ time: it is a rich form of professional development from which many teachers gain tremendous personal benefit.
Schools are accountable for student performance and progress. Reports to parents and for administrative purposes need to be based on evidence which can be produced if necessary. Keeping records of student responses and any scores assigned allows progress to be tracked over time. Different kinds of records of the assessment process should be kept, including:
- a brief results sheet for each student for each assessment activity
- class records when a number of students have undertaken the same assessment activity
- paper or digital (audio or video) copies of a range of sample assessments.
Think of a recent assessment you have conducted. Discuss its trustworthiness and any changes you would make to improve its trustworthiness in the future. Consider the following questions:
To what extent did it measure the learning you intended it to measure?
- Would experts endorse the quality of the questions and success criteria?
- Did the task or questions focus on core content?
- Does student performance reflect their level of understanding of core content?
To what extent did the results accurately reflect student ability?
- Did factors such as timing of the test, tiredness, personal distraction, illness or anxiety affect the student’s performance?
- What processes were followed to ensure accurate and consistent scoring?
To what extent were the results influenced by gender, first language or background knowledge?
Activity 2: The role of criteria in producing trustworthy assessment decisions
This activity involves participants in exploration of issues related to the use of multiple criteria in assessment rubrics. The following samples are used for these tasks.
- TEAL Writing Task 13: Sample A
- TEAL Writing Task 13: Sample B
- TEAL Writing Task 13: Sample C
- TEAL Writing Task 13: Sample D
- TEAL Writing Task 13: Sample E
Allocate all participants to three sub-groups. Allocate each sub-group one of the following criteria to use in assessing some samples of TEAL Writing Task 13, Text reconstruction: Making a Pizza. Use these broad criteria:
- Clear communication about the topic: The extent to which the text looks like a recipe (that is, it conforms to the expectations and conventions of a recipe) and provides a clear set of instructions for a reader to follow.
- Accurate and correct use of language: The extent to which appropriate language (grammatical structures including imperatives, prepositions and adverbs) and vocabulary are used accurately and correctly.
- Writing skills in English: The extent to which the writing conventions (i.e. spelling, punctuation and formatting) are correct and appropriate.
- Give each sub-group a set of TEAL Writing Task 13, Making a Pizza: Text reconstruction Samples A, B, C, D and E.
- Ask each sub-group to rank the samples from 1 (weakest) to 5 (strongest) using their allocated criterion.
- Complete the task looking only at the single criterion allocated to the sub-group. At this stage do not refer to the TEAL website and the annotations or criteria sheet for TEAL writing task 13. Work only with the criterion provided for this task.
- Compare the rankings given by each sub-group.
Discuss these questions:
- Did the use of three different criteria result in similar or different rankings of the samples?
- What does this result tell you about the use of multiple criteria in language assessment of EAL learners?
- When might it be appropriate to give the greatest weighting to each of the criteria used in this task?
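If the sub-groups want to put a number on how similar their rankings are, one common option (not part of the TEAL materials) is Spearman's rank correlation. For untied rankings of n samples it is ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks a sample received. The sketch below assumes untied ranks 1 to 5; the function name and the rankings shown are hypothetical, invented for illustration, and are not the actual results for TEAL Writing Task 13.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for two untied rankings of the same items."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of the same items (n >= 2)")
    expected = list(range(1, n + 1))
    if sorted(rank_x) != expected or sorted(rank_y) != expected:
        raise ValueError("rankings must be untied ranks 1..n")
    # Sum of squared rank differences for each sample.
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks (1 = weakest, 5 = strongest) for Samples A to E.
rankings = {
    "Clear communication": [2, 5, 1, 4, 3],
    "Accurate language": [1, 4, 2, 5, 3],
    "Writing skills": [4, 2, 1, 5, 3],
}
names = list(rankings)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        rho = spearman_rho(rankings[first], rankings[second])
        print(f"{first} vs {second}: rho = {rho:.2f}")
```

A value near 1 means two criteria ordered the samples in much the same way; a value near 0, or below it, suggests the criteria are capturing different aspects of the writing, which is the issue the discussion questions invite you to explore.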
Use the following table to match each assessment purpose to the most relevant criterion. Use these abbreviations:
CC: Clear communication
AL: Accurate language
WS: Writing skills
|Assessment purpose|Weighted criterion|
|---|---|
|Diagnosis of an EAL student’s learning needs in relation to basic prepositions (words like on, with)| |
|Identification of the EAL student’s overall ability to write recipes or procedural texts| |
|Assessment of the extent of EAL student learning towards the end of a unit of work that taught a range of verbs within the theme of cooking| |
|Checking the progress of an EAL student in developing accurate spelling| |
|Gauging how capable the student is in giving instructions| |
|Checking the progress of an EAL student in using punctuation conventions of English| |
Click here for answers to this task. To what extent were your answers the same as those given on the key?
4. When multiple teachers are involved in assessing EAL students, what procedures can be followed to ensure there is clear understanding of the weighting each gives to different criteria? When should teachers use the same weightings of the criteria? When is it appropriate for different teachers to use different criteria or weightings?
|Click here to view post-discussion reflections intended to identify and explore points you may have covered in your discussion. Professional Learning leaders can choose the best way to share these with colleagues, whether by leading a discussion covering these points or by sharing this sheet with participants after the discussion.|
Activity 3: The influence of development and prior learning on EAL student performance of assessment tasks
This activity involves participants in exploration of the ways in which EAL student performance of assessment tasks can be influenced by their age and prior experience.
The following samples are used for these tasks.
A. The text is a strong response to the task for a student at this year level. The text is comprehensible and recognisable as an attempt at a recipe. Imperative verbs are used appropriately, and some steps are well-formed sentences. While the sequence of some steps is marked with adverbs such as ‘then’, this is more reminiscent of the style of spoken instructions than of formal written texts. The writing is comprehensible, and some content vocabulary is correctly spelt. Misspelt words are recognisable, and indicate that the student is either aware of the shape of the intended word or is using phonetic spelling.
B. This text shows that the student is aware of what is expected in a recipe, but has not yet developed the language necessary to produce a text expected at this year level. While the student is aware of imperative verb forms and uses them to begin instructions, the student currently has limited knowledge of how to use simple prepositions to add details to the instructions. The student knows some of the vocabulary for this topic, and can spell some of these words correctly, but there are noticeable spelling errors in some of the topic-specific vocabulary. Some misspelt words can be identified, but others are very difficult to recognise. Letters are generally well formed, and capital letters are used to begin sentences, but full stops are not usually used.
C. While the student struggled to complete the task, the text illustrates a developing awareness of what is expected in written texts, and is a good attempt at the task at this stage of the student’s learning. While not all ingredients and steps are included in the text, an attempt is made to write some steps as complete sentences. A few words are spelt correctly, and there are some attempts at spelling that reflect a developing awareness of the sound-symbol relationships of English. While many letters are recognisable, some are easily confused with other letters. The student does not appear to be aware of the need for punctuation, nor of how it should be used in sentences.
This Year 5 EAL student did not attend school in their country of origin, and began to develop literacy skills after starting school in Australia nearly two years ago. The student has good spoken language skills, but while very good progress has been made with writing skills, some parts of the student’s writing are difficult to recognise. This needs to be addressed if the student is to succeed in different areas of the curriculum.
This Year 2 student has made good progress in learning English and developing literacy skills since beginning school in Australia 18 months ago. The student is making very good progress in communicating in written English, but makes significant errors in spelling, and has been slow to develop an awareness of formal writing conventions such as punctuation.
This Year 1 EAL student commenced school two terms ago, and had not attended pre-school or school prior to commencing school in Australia. The student is making good progress in developing spoken English and in developing basic literacy skills.
Form group(s) of 3 to 5 participants and work through the following tasks and texts together:
Step 1. Rank the three samples according to the level of English language development evident in the samples.
Step 2. The teacher wrote evaluations of each sample. Read the evaluations, and identify the sample each evaluation refers to.
Step 3. The teacher provided the following information about the students who wrote these samples. Identify which student wrote each sample.
Step 4. Rank the samples according to how much support their writers (EAL students A, B and C) will need in order to meet the expectations of mainstream classes at their year level. In doing this, consider tasks involving written procedure texts the students will be expected to complete at the relevant year level. Base your judgement on the samples produced here and on what you now know about the students’ learning trajectories.
(Note: While it is not good practice to make broad judgements about student level or needs based on performance of a single assessment task, it is used here for professional learning purposes only. It is not expected that you would use such a procedure with your students.)
Give reasons for your ranking.
Step 5. Sample D was ranked 1 (lowest) of six Year 5/6 samples the teacher collected, while Sample G was ranked 3 and Sample F was ranked 8 out of twelve Year 1–2 samples collected (where 1 is the lowest level of performance of the task and 12 is the highest).
Yet when these three samples are assessed by language criteria, Sample D is the strongest of the three.
How would you explain that the strongest of these samples was the weakest of the Year 5/6 samples?
Now place your reasons in the most appropriate column in the following table:
|Differences as a result of different stages of cognitive development|Differences as a result of prior learning and ‘knowledge of the world’ (school, cultural and social knowledge)|Differences in language knowledge expected at different year levels|
|---|---|---|
|e.g. a capacity to see different points of view|e.g. knowledge that planets move around the sun|e.g. how a Year 1 child may be expected to express an opinion, such as ‘I like…’, compared to how a Year 6 child may be expected to describe opinions, such as ‘Some people think that…’|
Reflection: How may differences in age and prior learning influence EAL learners’ performance of an assessment task? How should these be taken into account when applying criteria to tasks completed by EAL learners?
Can you identify a recent experience in which the prior experiences of an EAL learner influenced their performance of a task and produced a result that surprised you in some way?
In this video, Dr Douglas Reeves discusses the challenges of moderation for teachers. Despite the time it may take for teachers to reach agreement, moderation is critical if teachers are to resolve ambiguities in the scoring process. Teacher agreement about how to interpret scores is also essential if students are to see the assessment process as fair.
At this site you can view 20 videos which together form a Canadian webcast on Teacher Moderation: Collaborative Assessment of Student Work. (The video listed above is number 18 in the list.) Three teachers use moderation to develop a shared understanding of a student’s achievement by analysing the tasks, rubrics, student work samples and curriculum documents. Once they agree on what the student can already do, they collaboratively determine the next steps required to move the individual (or the class) forward in their learning.
This video shows a discussion between a class teacher and a senior colleague. Together they examine a piece of work and the success criteria to validate the decision to award standard A.
AITSL Illustrations of practice: The Australian Institute for Teaching and School Leadership provides videos of teachers grading using rubrics, and using success criteria. Scroll to the standards pertaining to assessment.
This web page gives information about error of measurement in relation to each domain of NAPLAN and each school year tested. School leaders are reminded that all tests are subject to a certain amount of measurement error.
The authors of this article on linking classroom assessment with student learning (published by ETS) emphasise that classroom assessment does more than measure learning. They provide useful guidelines to good assessment, including advice about what to do after the test is over.
This article contains the reflections of Pamela Moss (2003) about the shortcomings of conventional validity theory in her classroom practice. She advocates aggregating judgements using multiple sources of evidence to inform not just what students know and can do but who they are becoming as learners. Moss notes that disagreement among readers or interpreters of student work can enhance validity “because each reader is provoked to see and possibly reconsider criteria…taken for granted” (p. 19).
In this article (published in Practical Assessment, Research & Evaluation, 2006), John Ross reviews research evidence on the reliability, validity and utility of student self-assessment. He suggests that self-assessment is a reliable assessment technique, and that teachers can strengthen its reliability by involving students in rubric construction. He concludes that validity, which can be more complex, can be enhanced through specific student training in self-assessment procedures.
- How could you improve your own participation in benchmarking activities? List factors you find difficult (time, participants, personal attributes etc).
- Who do you already work with?
- What could help you achieve better agreement when you work with someone you don’t find it easy to disagree with?