My apologies for the delay getting this newsletter out. For some reason my day job has been keeping me terribly busy lately. Apparently people still need test prep.
I’ll get right to it…
Details on the new TOEFL score scale
ETS has now published a comprehensive guide to the upcoming TOEFL score scale revision, focused on institutions.
The guide contains charts to convert between the old (1-120) scale and the new (1-6) scale, a chart to convert both to the CEFR, and charts to convert 1-30 TOEFL section scores to 1-6 scores. There is also a chart to convert IELTS scores to 1-6 TOEFL scores. So many charts.
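In practice, charts like these are band lookups, and institutions updating their admissions systems will likely implement something along these lines. To be clear, the cutoffs below are purely hypothetical placeholders of my own invention, not ETS's published concordance:

```python
# Hypothetical band-lookup conversion from the old 1-120 TOEFL scale
# to the new 1-6 scale. The cutoffs are illustrative placeholders
# only -- a real system must use the official ETS concordance charts.
HYPOTHETICAL_BANDS = [
    (115, 6.0),
    (95, 5.0),
    (70, 4.0),
    (45, 3.0),
    (20, 2.0),
    (1, 1.0),
]

def old_to_new(old_score: int) -> float:
    """Map an old-scale total score to a new-scale band (illustrative)."""
    if not 1 <= old_score <= 120:
        raise ValueError("old-scale TOEFL scores run from 1 to 120")
    for cutoff, band in HYPOTHETICAL_BANDS:  # bands sorted high to low
        if old_score >= cutoff:
            return band
    return 1.0
```

The point of a band table, of course, is that nearby old-scale scores collapse into the same new-scale band.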
A few other things are mentioned in the guide:
It notes that “[w]e have not conducted a score concordance study with the Duolingo English Test (DET). To determine the best TOEFL score, we recommend you select the TOEFL score based on your desired CEFR level, rather than a direct comparison to DET.”
There is a long section addressing a hypothetical score user who currently enjoys the perceived precision of the 1-120 scale and is worried about the new scale's relative lack of precision. It is worth reading.
It says that “digital guidebooks” for test takers will be provided in July.
Again, it is confirmed that starting in January 2028, score reports will only contain the 1-6 score scale.
It notes that starting January 2026, paper score reports will be retired. The guide specifies that institutions will no longer receive paper score reports. I assume that test takers will not receive them either, but that is not stated.
To be honest, I think that getting 13,000 institutions to update their score requirements will be a Herculean task. I look forward to watching it happen.
TOEFL Office Hours
Speaking of the revised TOEFL, my second “Office Hours” chat about the upcoming changes was a big success. About 75 interested test-prep folks showed up, and most stuck around for the whole hour. I’ll try my best to host a third one once ETS has published some practice tests. Probably around July 16.
Here are a few notes which I think express community sentiment:
Many attendees expressed a desire for more clarity on how adaptive testing will work on the TOEFL. I sense that people want assurances that this change will be fair to students and comprehensible to people preparing them for the test. ETS should spend some time on this issue as the launch date approaches.
Many are still wondering if the revised TOEFL will contain integrated speaking and writing tasks. In some ways, integrated tasks are the TOEFL’s bread-and-butter, setting it apart from competing products. But on the other hand, eliminating them (or at least their current incarnations) could create a faster, cheaper and more streamlined test form.
My friend Pamela Sharpe, who has been preparing students for the TOEFL since 1970, was in attendance. I asked her what she thinks the biggest revision to the TOEFL has been to date. Her response? Maybe this one.
Most people are pretty enthusiastic about the new score scale. Few believe there is a meaningful difference between a kid with a score of 102/120 and a kid with a score of 104/120.
Many people are wondering if the free practice tests from ETS will be adaptive. While the TOEFL Essentials test is adaptive, the practice tests provided by ETS are not.
Some concerns regarding equity in test prep were raised. Right now, no-budget test takers can wander down to their local library and get copies of the official books to prepare for the test. That will no longer be the case in January, when test takers will be more dependent on paid prep products, which can be costly.
Likewise, there were a few concerns about the accuracy of the practice tests set to be released next month. Older teachers remember how flawed some material was when the TOEFL iBT launched in 2005. This could impact our plans to prepare for the revisions in a timely manner.
A few attendees expressed hope that the test will include global English accents.
We talked about whether the TOEFL might become a regularly adjusted test, like the DET. Some attendees figure this would be great for students. Others were not so enthusiastic.
We talked a lot about TOEFL Essentials and TOEIC, and how early reports suggest that the revised TOEFL will share item types with those tests. Like it or not, people are starting to believe that this is a glammed-up TOEFL Essentials test, mostly because of the way it was presented to test prep firms in China a few weeks ago. If ETS views this sentiment as problematic, they ought to nip it in the bud ASAP.
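For readers unfamiliar with the mechanics behind the adaptivity question raised above: one common approach is a multistage adaptive (MST) design, where performance on a first module routes the test taker to a harder or easier second module. ETS has not published how the revised TOEFL's adaptivity will actually work, so everything in this sketch, including the thresholds and bands, is illustrative only:

```python
# Minimal sketch of a two-stage multistage-adaptive (MST) design.
# All thresholds and bands are illustrative placeholders; ETS has not
# published the revised TOEFL's actual adaptive mechanics.

def route_second_stage(stage1_correct: int, stage1_total: int) -> str:
    """Pick a second-stage module based on first-stage performance."""
    pct = stage1_correct / stage1_total
    if pct >= 0.70:
        return "hard"    # strong first-stage evidence -> harder items
    if pct >= 0.40:
        return "medium"
    return "easy"        # weak first-stage evidence -> easier items

# In an MST design, the module you are routed to caps the score range
# you can reach -- which is exactly why test takers and teachers want
# the routing rules to be transparent. Placeholder bands:
SCORE_RANGE = {"easy": (1, 3), "medium": (2, 5), "hard": (4, 6)}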
Weird TOEFL Ad Campaign
ETS launched and quickly reconsidered a social media campaign touting the “low score cancellation rate” of the newly enhanced TOEFL Home Edition. It seems that it disappeared a few hours after I wrote about it on LinkedIn. That’s probably for the best since the campaign was clunkily implemented. Here’s an image from the campaign:
In any case, score cancellation is an issue that ETS will have to keep in mind as they move forward with enhancements to the TOEFL. It’s no secret that the TOEFL Home Edition has a bit of a social media problem. When potential test takers raise the possibility of taking the Home Edition, others are quick to chime in and warn them about this very issue. I’ve written many times about test takers sharing stories about curious score and test cancellations – cancellations for having a jagged score profile, cancellations for utilizing too much RAM during the test, cancellations for unauthorized software being detected immediately after successfully passing a system scan… and other stuff.
Five years in, this remains a pretty big issue that I believe has a measurable impact on the number of people who opt to take the TOEFL Home Edition. Until yesterday I wasn’t sure that anyone in a leadership position over at ETS truly realized how big that impact was.
Here’s a link to ETS’s Trustpilot page, where many of the comments touch on this issue (across the whole range of ETS products). With a rating of 1.1 out of 5 they’ve somehow managed to be more disliked than the folks at Wells Fargo and even American Airlines.
I’ve written again and again about how legacy test makers seem increasingly disconnected from the people who consume their tests. Clearly, this issue could have been tackled head-on before now. This is the fourth major revision to the test in six years, and I don’t think there will be time and money for a fifth revision if this one flops. Success or failure will come down to communicating with customers, understanding customers and, as we say, “meeting them where they are.”
So kudos to whoever came up with the campaign. Just… maybe work on the wording a bit.
Cambridge Research on AI-Powered SDS for IELTS
Check out this wonderful new article in TESOL Quarterly by Yasin Karatay and Jing Xu. It explores the possibility of using an AI-powered spoken dialog system to simulate an IELTS examiner. Specifically, the researchers used a self-developed SDS powered by GPT-4o to simulate the third part of the IELTS speaking test.
It is important to note that in this research the AI served only as the interlocutor. Ratings were carried out by trained humans who reviewed recordings of the interactions.
The authors concluded that “the SDS consistently elicited some key interactional competence features, as seen in face-to-face oral proficiency interviews, and that such features were useful in distinguishing between higher- and lower-proficiency test takers.” They also highlighted areas for improvement around non-verbal clues and some unnatural interactions.
There is a lot more to it, of course. So take a moment to read the article.
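The architecture described in the study is worth sketching, since it explains the division of labor: the AI plays examiner, the human answers, and the complete transcript goes to trained human raters. The skeleton below is my own illustration, not the researchers' code; `llm_reply` is a stub standing in for the GPT-4o call they used, and speech recognition and synthesis are omitted:

```python
# Illustrative skeleton of an SDS interviewer loop in the spirit of the
# Karatay & Xu study: the system plays IELTS Part 3 examiner, the
# candidate answers, and the transcript is saved for human raters.
# llm_reply is a placeholder for a real model call (the study used
# GPT-4o); speech-to-text and text-to-speech layers are omitted.

def llm_reply(transcript: list[dict]) -> str:
    """Stub for a model call returning the next examiner turn."""
    return "Interesting. Why do you think that is?"

def run_interview(get_candidate_turn, n_questions: int = 4) -> list[dict]:
    transcript = [{"role": "examiner",
                   "text": "Let's talk about technology in education."}]
    for _ in range(n_questions):
        transcript.append({"role": "candidate",
                           "text": get_candidate_turn()})
        transcript.append({"role": "examiner",
                           "text": llm_reply(transcript)})
    return transcript  # handed off to trained human raters, as in the study
```

The key design point is that scoring never touches the model: the SDS only elicits the performance, which is what lets the researchers evaluate interactional competence features against human ratings.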
Though the IELTS partnership takes a “cautiously curious” approach to the use of AI in testing, this may be an area worth exploring in more depth. Some observers have raised concerns about the ability of the partnership to maintain consistent standards across four million annual speaking tests, each carried out by an individual examiner. It is possible that at least some of these administrations are impacted by things like bias, fatigue and differing ability levels across the examiner population. It’s no secret that IELTS test takers frequently travel to test centers which they feel are more conducive to a higher speaking score. The IELTS partnership is quick to disabuse test takers of this notion, but it isn’t inconceivable that one examiner might be better at their job than another. Or that one might possess certain biases which another does not and that such differences could impact how the speaking test unfolds.
On top of that, this sort of change could lead to immense cost savings for the organizations that administer the IELTS (even if human raters are retained). Some of those savings could be passed on to test takers, if only to make the IELTS more competitive in an increasingly crowded market of tests. That’s a touchy subject so I’ll leave it for another day, but I think most readers can imagine the possibilities.
Finally, it is worth noting that this research was done by researchers at Cambridge University Press and Assessment. So maybe this could lead to something.
You may also appreciate this similar research funded by ETS back in 2021. Based on that work, I’m still optimistic that an AI-powered SDS system will find its way into some future version of the TOEFL.
Could Duolingo be Winning?
It risks generating controversy, but I would be remiss if I didn’t link to Nicholas Cuthbert’s video and wrap-up from DETcon London, since it generated quite a lot of spirited discussion. Notably, Nicholas describes Duolingo as “ahead of the game.”
And also says: “Duolingo are winning.”
Hyperbole? Perhaps. Duolingo is certainly on a winning trajectory in terms of acceptance in key receiving markets, market share, brand awareness, test taker engagement, use of technology, and baffling social media campaigns. But it is important to note that IELTS still does more test administrations than everyone else combined. There are still people paying $530 to take an IELTS test. IELTS will be the market leader for many years to come. Accordingly, there is still plenty of time for the IELTS partnership to develop a “next-gen” IELTS that eats Duolingo’s lunch.
Heck, Pearson is hoping to do just that in about four months.
Recent messaging from the British Council and Cambridge University Press & Assessment suggests that the IELTS partners plan to double down on their more traditional approach to assessment. It seems like they don’t plan to change the way they assess students, but will instead encourage score users to more carefully consider which tests they choose to accept.
Regardless, I’m convinced that Cambridge has got top people working on… something (see above).
In a private conversation, a well-informed industry watcher recently expressed some incredulity that LLMs haven’t totally disrupted the high-stakes language testing sector. He was shocked that people still pay hundreds of dollars to take a test. My response was that this stuff takes time. Everyone knows that university governance is a slow process. Immigration regulations are even slower. But things might finally be coming to a head – coincidentally both ETS and Pearson announced the existence of their “next-gen” (my term) tests at NAFSA a few weeks ago. Things are moving a tiny bit faster now.
By the way, you must get to one of these DETcon events if you get the chance. They are a charming combination of research presentations, community building and Duolingo’s trademark irreverence. I understand that at the most recent Pittsburgh-based event, Duolingo CEO Luis von Ahn was subjected to an unannounced Yinzer Test. Not sure what that is, but I suspect it is similar to a Voight-Kampff Test. In any case, the results have not been shared publicly.
IELTS Continues to Push Back on AI
The Higher Education Policy Institute has published an article written by a managing director at Cambridge University Press & Assessment about what is described as “the shift to remote language testing that removes substantial human supervision from the process.”
Notes the author:
“While some may be excited by the prospect of an “AI-first” model of testing, we should pursue the best of both worlds – human oversight prioritised and empowered by AI. This means, for instance, human-proctored tests delivered in test centres that use tried and proven tech tools.”
And:
“Cambridge has been using and experimenting with AI for decades. We know in some circumstances that AI can be transformative in improving users’ experience. For the highest stakes assessments, innovation alone is no alternative to real human teaching, learning and understanding. And the higher the stakes, the more important human oversight becomes.”
Cambridge has been pushing back against newer tests with a bit more forcefulness in recent months (see also the “The impact of English language test choices for UK HE” report).
To my eye, the at-home vs. on-site testing debate is over, with supporters of at-home testing scoring a decisive victory. Indeed, Cambridge’s own at-home IELTS is widely accepted at schools across key receiving markets. But more importantly, it seems that most test takers really like the idea of at-home testing. Many of those who forgo it in favor of an on-site test do so out of fears that the maker of their chosen test stinks at delivering a seamless at-home product – not because of some love of the test center experience. As test makers get better at doing at-home testing, more test takers will pile into that option.
There might still be room for a robust debate about the merits of synchronous online proctoring (that is, a proctor watches as you take the test) vs asynchronous online proctoring (a proctor watches a video of your test after the fact). But maybe that debate will soon reach a conclusion as well. Note that Pearson seems to be going the async route in their new PEE Test, and that ETS will offer an async option in the new TOEIC Link Test (which is being pitched to higher-ed as an admissions test). These developments suggest that the writing is on the wall for live proctors. Indeed, I was a little surprised to learn that ETS will maintain them as part of the revised TOEFL set to launch in early 2026.
TOEFL Essentials Price Increase
The cost of taking the TOEFL Essentials Test was recently increased to $199 USD. That’s about a 100% increase. In a few markets, the TOEFL Essentials is now more expensive than the TOEFL iBT.
It’s a curious move. I can’t really figure it out. Perhaps someone with a bigger brain than I can explain it.
The TOEFL Essentials test was launched in May of 2021 as a cheaper, shorter and wholly at-home alternative to the TOEFL iBT. At the time, most people assumed its development was a response to the growing prominence of the Duolingo English Test. It never really took off, though. That’s partly because the number of accepting institutions was low and also because it was still twice as expensive (and twice as long) as the DET.
One of the new items developed for the test (the “writing for an academic discussion” task) was folded into the TOEFL iBT in 2023. It appears that a few more of its items (some also shared with the TOEIC) will be added to the iBT in January.
IELTS UKVI in Bangladesh Goes Computer-Only
I read that IELTS UKVI will go fully computer-based in Bangladesh starting July 27. After that date, the paper-based version will no longer be offered.
According to this article, “[t]he decision to shift the IELTS for UKVI entirely to computer-based format has been taken internationally.”
Some other countries still allow test takers to book paper-based tests beyond that date, but perhaps they are on different schedules.
I’m curious if the upcoming HOELT test will include a paper-based version. Paper-based testing makes tests more accessible and equitable… but some have raised concerns related to test security.
Paper IELTS goes Pen-Only?
It appears that the paper IELTS will now be completed in pen. Here’s an image I spotted on Facebook:
On Washback
Dan Isbell has written a guest post about washback in English test preparation for the Duolingo English Test blog. It discusses preparation for the DET in particular, and for English tests in general. Isbell divides test prep into three types: activities that improve your English in general, activities that help you perform better on a particular test (test familiarization), and activities that help you game a particular test (templates and guessing strategies, for instance).
I understand that a full report on this topic is forthcoming. I will add a link when it is available.
It’s an interesting thing to explore. Test preparation always includes at least some good washback. No matter what test they are preparing for, most test takers complete at least a few practice tests. As a result, they will spend time consuming stuff in English and producing stuff in English. This is good. But does it have a really meaningful impact on their fluency in the language? I don’t know.
The TOEFL iBT contains two 800-word articles which are excerpted from actual textbooks. Students here in Korea take their preparation pretty seriously and might complete 20 or 30 practice tests before test day (or between several test days). That means they spend a lot of time reading some pretty dense material in English. Does that improve their fluency? Of course. Does it improve their fluency a lot? I don’t know.
My test prep niche is writing. It gives me great joy to know that my students walk away from their lessons with a noticeably stronger command of English grammar and language use conventions. But, needless to say, there are faster and more economical ways to learn about sentence fragments and collocations.
Does all of this test prep mean that students spend less time on more useful and effective language acquisition approaches? Maybe.
Is it the job of a test maker to give a darn? Or is their only job to accurately measure language fluency? I don’t know.
A few stray thoughts come to mind:
I’m interested to know how the age of a test impacts the way that students prepare for it. As a test ages, people working in test prep become more and more familiar with the design of that test and can use that knowledge to develop better and extremely granular type 2 strategies. Elderly readers might recall that in the early years of the TOEFL iBT we had just one official book (badly written) and a handful of books from third party publishers (even worse) to go by. We didn’t know very much about the design specifications of test items, nor about how speaking and writing items were scored. Things are obviously much different now. Now we know almost everything there is to know. We know so much nowadays that it might be malpractice to not spend quite a lot of time on test familiarization strategies. Should tests be meaningfully refreshed on a regular basis to mitigate the impact of this factor?
I love reading about the early history of the Princeton Review. That firm emerged in the early 1980s when the SAT was long in the tooth and probably at its peak terribleness. Princeton Review taught students how to eliminate answer choices without actually reading questions. They also taught students how to recognize unscored sections so they could enjoy a refreshing nap part way through the test.
In 2019 Malcolm Gladwell and his assistant both took the LSAT for an episode of his “Revisionist History” podcast. They got coaching from the one and only John Katzman beforehand. The point of the episode is that time management (type 2) is the most important thing when it comes to getting a good score on this test. It made LSAT tutors really cranky.
CELPIP/IELTS Concordance
Prometric and the IELTS partners have just published a concordance study comparing the CELPIP and IELTS-General tests.
It is a very nice study. I just want to mention that of the 1,089 participants, seemingly not a single one earned an IELTS writing score of 9.0. Two participants earned a score of 8.5. This is the fourth IELTS concordance study in a row (earlier studies linked IELTS scores with the TOEFL, MET and LanguageCert) in which not a single person reported a perfect writing score. I don’t know if that’s meaningful, but it amuses me.
ITEP Interview
Here’s a great interview with iTEP International CEO Todd Maurer conducted by Cathoven AI. It explores the history of the company, its English tests (high stakes and otherwise) and some of the innovations they’ve brought to market. Without really being a household name, over the past few decades iTEP has carved out a niche for itself and formed some meaningful partnerships with academic institutions.
It is worth remembering that iTEP was the first to introduce certain features that were later adopted by better-known testing firms.
I chuckled when Todd said that “the TOEFL test, for a long time, was three hours.” That’s funny because the TOEFL was actually four and a half hours long.
One of my goals for 2025 is to learn more about the iTEP products. I will take the iTEP Academic this month, I hope.
Supporting this Newsletter
I’ve received some very generous pledges of financial support from readers already. However, due to onerous banking regulations here in Korea, it is unlikely that I’ll ever be able to turn on Substack’s monetization feature. Anyone who really wants to make a donation can do so via a Ko-Fi page I’ve set up. You can sign up to make a one-time or monthly donation.
Image of the Week
This week’s image is a copy of Playbill magazine featuring “English,” the Pulitzer Prize-winning play about preparing for the TOEFL. I’ve also included a copy of the insert that Duolingo placed inside after sponsoring the Broadway debut of the play a few months ago. If you didn’t make it to New York, note that you can buy the play in paperback at your favorite bookstore (or Amazon).