Converting content into smart content is crucial for publishers to survive in the digital and artificial intelligence age. Educational publishers hold a wealth of content that needs to be stored accurately and labeled with the right language or reading level. Following the CEFR* scale is therefore key for publishers who want to match content to the right readability level. In this blog post, we'll examine which solutions on the market offer CEFR classification.

Artificial intelligence can classify content automatically, sidestepping the need for manual tagging by experts. Algorithms assign each piece of content an appropriate language level based on an objective standard.

Several companies have technology that can tag content, but most do so using standards other than the CEFR scale.

1. Wizenoze (Netherlands) Wizenoze uses machine learning to estimate the readability of educational texts. It classifies text according to its own index, the Wizenoze Readability Index. This index uses a score of 1 to 5, with each level roughly comparable to UK and US educational stages. The tool has several additional functions, such as finding simpler alternatives for difficult terms. However, Wizenoze doesn’t provide any CEFR classification.

2. Aylien (Ireland) Aylien uses Natural Language Processing to extract insights from textual content. Through its APIs, Aylien analyzes texts for sentiment, categorizes them according to IAB-QAG and IPTC News Codes, and extracts metadata from them. Aylien also has several helpful features, such as hashtag suggestions and tools to remove clutter from web pages. The tool can also detect which language a text is written in and assign it a confidence score, based on its own criteria. However, Aylien doesn’t assess the reading difficulty of a text, and it doesn’t offer CEFR classification.

3. UNSILO (Denmark) UNSILO brings publishing together with Artificial Intelligence. Primarily focused on evaluating manuscripts and building content packages for new business opportunities, UNSILO is not specifically designed for educational publishers. However, it can provide custom metadata tagging solutions, using APIs focused on key concepts and related content. UNSILO does not assign reading difficulty levels to content.

4. IBM Watson (United States) IBM’s Watson is the grandfather of Natural Language Processing for content. It can extract entities, relationships, keywords and semantic roles from unstructured data. It currently operates in 13 different languages, although it doesn’t provide an assessment of language difficulty. It isn’t designed specifically for educational publishers, or even for publishers in general, but no list of metadata-tagging tools would be complete without it.

5. EDIA (Netherlands) EDIA uses Artificial Intelligence and machine learning to automatically tag and classify content for educational publishers. EDIA’s primary product, 360AI, classifies texts according to language level using the Common European Framework of Reference for Languages (CEFR). This standard is used broadly across Europe and assigns content to one of six levels of complexity. 360AI can operate in English, Dutch, French, German, Spanish and Italian, creating a cross-linguistic standard for content. 360AI can also combine the CEFR standard with a publisher’s in-house classification style, making content even more adaptive and helpful.
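
To make the idea of combining CEFR with an in-house classification concrete, here is a minimal Python sketch. It is not EDIA's API or method: the names `CEFRLevel`, `IN_HOUSE_LABELS`, `TaggedContent` and `suitable_for`, the "Starter/Intermediate/Advanced" labels and the sample titles are all hypothetical, chosen only for illustration. The sketch assumes each piece of content already carries a CEFR tag and shows how that tag could be mapped to a publisher's own labels and used to select content for a reader.

```python
from dataclasses import dataclass
from enum import IntEnum


class CEFRLevel(IntEnum):
    """The six CEFR levels, ordered from easiest to hardest."""
    A1 = 1
    A2 = 2
    B1 = 3
    B2 = 4
    C1 = 5
    C2 = 6


# Hypothetical in-house labels; a real publisher would supply its own mapping.
IN_HOUSE_LABELS = {
    CEFRLevel.A1: "Starter",
    CEFRLevel.A2: "Starter",
    CEFRLevel.B1: "Intermediate",
    CEFRLevel.B2: "Intermediate",
    CEFRLevel.C1: "Advanced",
    CEFRLevel.C2: "Advanced",
}


@dataclass(frozen=True)
class TaggedContent:
    """A piece of content that has already been assigned a CEFR level."""
    title: str
    cefr: CEFRLevel

    @property
    def in_house_label(self) -> str:
        # Look up the publisher's own label for this CEFR level.
        return IN_HOUSE_LABELS[self.cefr]


def suitable_for(items, reader_level):
    """Return the items at or below the reader's CEFR level."""
    return [item for item in items if item.cefr <= reader_level]


# Illustrative sample catalogue (made-up titles and tags).
catalogue = [
    TaggedContent("My first day at school", CEFRLevel.A1),
    TaggedContent("How volcanoes erupt", CEFRLevel.B1),
    TaggedContent("An essay on monetary policy", CEFRLevel.C1),
]

for item in suitable_for(catalogue, CEFRLevel.B1):
    print(item.title, item.cefr.name, item.in_house_label)
```

Because the levels are stored as an ordered enum, selecting content for a B1 reader is a simple comparison, and the in-house labels can change without touching the CEFR tags themselves.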

It’s clear that publishers have many different ways to automatically tag content with metadata for different language levels. Different classification standards are used across the industry, which can make it difficult for publishers to maintain a standardized system. However, the options are manifold, meaning educational publishers can choose the right tool for their environment and market.

*The Common European Framework of Reference for Languages (CEFR) is an international language learning standard set by experts from the Council of Europe. The CEFR scale has six levels ranging from A1 (beginner) to C2 (mastery).

Topics: Publishers, AI in Education

Written by Walter Montenarie

Walter Montenarie is EDIA's Chief Commercial Officer (CCO) as of March 2019.