Pedagogy and Big data

Cortexio takes a step into the future and begins working with machine learning. The willingness has always existed, but not until now have we found the right partners. Together with researchers from Linnaeus University, we will start throwing around terms such as Big data, Machine learning, Data mining and Learning analytics. But what do all these terms mean?

The more learning activities move into cyberspace, the more data is created. Every click, the interval between clicks, scrolling and so on can be used to draw conclusions about users. In educational contexts, this data can be used to better understand and improve the learning process. Data collection and analysis are also increasingly used by companies to generate business intelligence and improve decision making. In higher education, data from digital tools has been processed for some time to understand educational systems and adapt them to students' abilities, for example to improve long-term retention of learned knowledge.
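To make the raw material concrete, here is a minimal sketch of what such data might look like: a stream of click events per user, from which simple features such as the interval between clicks can be derived. All field names here are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import pairwise  # Python 3.10+

@dataclass
class ClickEvent:
    user_id: str      # who clicked (hypothetical field names)
    element: str      # what was clicked, e.g. "next-page" or "replay-video"
    timestamp: float  # seconds since session start

def click_intervals(events: list[ClickEvent]) -> list[float]:
    """Time between consecutive clicks -- a crude proxy for reading speed."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [b.timestamp - a.timestamp for a, b in pairwise(ordered)]

session = [
    ClickEvent("user-42", "open-chapter", 0.0),
    ClickEvent("user-42", "scroll", 12.5),
    ClickEvent("user-42", "next-page", 47.0),
]
print(click_intervals(session))  # [12.5, 34.5]
```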

If you compare educational software with entertainment software such as Spotify or Netflix, there is still some way to go in applying data analysis to predict which kind of information source would best suit a specific student learning a specific piece of new knowledge. There may be several reasons why educational software has not come as far in terms of user satisfaction with recommendations. One is the size of the smallest unit (e.g. a song on Spotify, or a movie or an episode of a series on Netflix). Another is users' subjective ability to assess the value of a unit of knowledge. It is arguably easier to judge whether a song is good or bad, since taste in music is obviously subjective. Judging the educational value of a video or a piece of text, on the other hand, can be perceived as more difficult, since it might require a higher level of education in the specific field to assess both the quality of the content (is it really factual?) and how it is presented (is there an easier way to explain this without missing essential facts?).

As big data, data collection and machine learning become popular concepts, there is a point in distinguishing what you can actually do today from what still belongs to science fiction. Big data is perhaps the most popular of all these concepts, and may be used so diligently precisely because it more or less lacks a definition. Big data describes amounts of data that exceed what conventional database software can normally handle. This means the exact size was vaguely defined from the start and also shifts constantly over time (as ordinary databases handle ever larger amounts of information). Another increasingly common term is machine learning, which, somewhat simplified, is about putting collected data to work. After collecting large amounts of data, one can draw conclusions about the population that gave rise to it. By setting up models that evaluate and sort new data points against this "older" reference data, software can draw conclusions about a user from their behavior. The models can then become more and more accurate as the new data is in turn added to the reference library. This feedback loop is what is called machine learning.
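In code, that loop might look something like this minimal sketch, using scikit-learn's k-nearest-neighbors classifier. The features and labels are invented for illustration, and Cortexio's actual models may look nothing like this:

```python
# Sketch: classify a new user against "older" reference data, then grow
# the reference library. Each user is described by two invented features:
# [average click interval (s), sessions per week]; the label is whether
# they passed the final quiz.
from sklearn.neighbors import KNeighborsClassifier

X = [[12.0, 5], [30.0, 1], [15.0, 4], [45.0, 1]]  # reference data
y = [1, 0, 1, 0]                                   # 1 = passed, 0 = did not

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

new_user = [18.0, 3]
prediction = model.predict([new_user])[0]  # conclusion drawn from behavior

# Grow the reference library and refit, so the model can become more
# accurate over time (in practice: wait for the user's actual outcome,
# rather than reusing the prediction as a label).
X.append(new_user)
y.append(int(prediction))
model.fit(X, y)
```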

Two other concepts used are Educational data mining and Learning analytics. They can be distinguished in that data mining is used to try to develop new programs and apps based on collected data (which user behaviors exist, and how could one build functionality that teaches something in relation to those behaviors?), while learning analytics focuses on applying known predictive algorithms to increase learning, still from a quantitative perspective. These concepts remain relatively fuzzy, however, and overlap in many ways.

Part of data mining is user modeling. User modeling is about building a picture of specific users based on their click habits and behavior in a digital learning medium. One can then draw conclusions about what a user knows and does not know, what motivates the user, and how satisfied the user is with the functionality of the software. The data is collected live, and at best feedback can also be given to the user in real time, based on that behavior.
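As an illustration (a toy sketch, not Cortexio's actual model), a user model could keep a running mastery estimate per topic, updated live as answers arrive, and hand back feedback when the estimate drops:

```python
class UserModel:
    """Toy user model: one mastery estimate per topic, updated live.

    The update is an exponentially weighted moving average, so recent
    answers count more than old ones. Names, rates and thresholds are
    hypothetical, for illustration only.
    """

    def __init__(self, learning_rate: float = 0.3):
        self.mastery: dict[str, float] = {}  # topic -> estimate in [0, 1]
        self.learning_rate = learning_rate

    def observe(self, topic: str, correct: bool) -> str | None:
        old = self.mastery.get(topic, 0.5)   # start undecided
        new = old + self.learning_rate * (float(correct) - old)
        self.mastery[topic] = new
        # Live feedback: suggest a repetition when mastery looks low.
        if new < 0.5:
            return f"Consider reviewing '{topic}' again."
        return None

model = UserModel()
model.observe("fractions", correct=True)        # mastery rises to 0.65
hint = model.observe("fractions", correct=False)  # drops to ~0.46
print(model.mastery, hint)
```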

User behavior can also be used to segment users based on specific characteristics. The groups that users are divided into can then receive learning material adapted to their expected needs. Making this type of content adaptation often requires relatively complicated algorithms that take into account both the quality and quantity of the teaching material, the form of presentation, and so on. A more common and simpler use of the data, which does not require it to be applied automatically, is to display statistics on user behavior to a teacher, who can then draw conclusions about their students' progress and special needs and intervene where needed.
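Segmentation like this is often done with clustering. Here is a sketch using k-means from scikit-learn on invented per-user features; the post does not say which algorithm or features will actually be used:

```python
# Sketch: group users by behavior with k-means clustering.
# Features are invented: [reading speed (words/min), quiz accuracy (%)].
from sklearn.cluster import KMeans

users = [
    [120, 55], [130, 60],   # slower readers, middling accuracy
    [250, 90], [240, 85],   # fast, accurate readers
    [200, 40], [210, 45],   # fast but inaccurate -- maybe skimming?
]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(users)
print(kmeans.labels_)  # segment index per user, e.g. [0 0 1 1 2 2]

# Each segment can then be mapped to adapted material, e.g. shorter
# texts for skimmers, denser material for fast accurate readers.
```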

The goal of Cortexio is to use all the data we can obtain to maximize learning per unit of time invested for each individual user. Depending on the user's specific characteristics, such as reading speed and memory capacity, the presentation of the material must be adapted. This, to introduce one last term in this post, is what is called adaptive learning.
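As a closing illustration, and again only a sketch under assumed parameters rather than a description of Cortexio's implementation, adaptive learning can be as simple as letting a measured characteristic steer the presentation. Here, a review interval grows or shrinks with a per-user memory estimate, loosely in the spirit of spaced-repetition schedulers such as SM-2:

```python
def next_review_interval(days: float, remembered: bool,
                         memory_factor: float = 2.0) -> float:
    """Toy spaced-repetition scheduling.

    memory_factor is a per-user estimate of memory capacity: users who
    retain well get faster-growing intervals. Values are illustrative.
    """
    if remembered:
        return days * memory_factor   # stretch the interval
    return 1.0                        # forgotten: start over tomorrow

interval = 1.0
for outcome in [True, True, False, True]:
    interval = next_review_interval(interval, outcome)
    print(f"next review in {interval:g} days")
# -> 2, 4, 1, 2 days: the schedule adapts to the user's performance
```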

Over and out

Elias