Reinforcement Learning Approach for Adaptive e-Learning Based on Multiple Learner Characteristics

We introduce a novel three-step model of adaptive e-learning that uses multiple learner characteristics. We design a learner attribute model that records the study domain, summary details of the student and the requirements of the student. We apply theories of learning style in the model to categorize and identify specific individuals so as to improve their experience on the online learning platform. An affective state extraction model extracts learner emotions from text inputs during platform interactions. Finally, we pass the extracted information to the adaptivity domain, which uses the off-policy, model-free Q-learning algorithm (Jang et al., 2019) to structure the learning path into tutorials, lectures and workshops depending on predefined learning constraints. Simulated results show better adaptivity in cases of multiple learner characteristics as opposed to a single learner characteristic. Further research should include more than the three characteristics considered in this research.


Introduction
An increase in learner enrolment has forced higher education institutions to look for effective ways of reaching many learners. One of the strategies deployed by higher education institutions is e-learning. According to Steinbacher and Hoffmann (2015), this involves including digital tools for learning delivery and employing technology to enable learning to take place without limitation of time and location. The study by Hadullo et al. (2018) also mentioned inadequate academic staff to facilitate online learning, and poorly designed, non-interactive course materials, as challenges faced by learners in online learning. There is therefore a need to improve the quality of e-learning. One advantage that face-to-face learning has over online learning is that learners can get immediate feedback and clarification on areas in which they are facing difficulties (Linecar & Marchbank, 2020). Research effort has gone into personalizing learning by making learning management systems adaptive. In the effort to improve the quality of e-learning and cater for learners' needs, researchers have developed adaptive learning systems.
Research in adaptive learning goes back to the 1990s. During that time researchers were looking at two major areas: hypertext and user modelling (Ennouamani & Mahani, 2017). Research in adaptive learning has grown ever since. One of the major current areas of research on adaptivity in e-learning is learner modelling (Premlatha & Geetha, 2015). Chrysafiadi and Virvou (2013), and Raj and Renumol (2021) listed the different approaches for modelling learner characteristics as follows: overlay; stereotyping; perturbation; machine learning techniques; cognitive theories; constraint-based models; fuzzy learner modelling; Bayesian networks; and ontology-based modelling. According to Chrysafiadi and Virvou (2013), learner modelling is the foundation for adaptive learning.
Several researchers have used different techniques to develop learner models. These techniques have been classified into two groups: static and dynamic methods. Static methods involve collecting information about learners by having them fill out a questionnaire. According to El Aissaoui et al. (2018), this method does not lead to accurate detection of learners' learning styles, as learners are usually unaware of their learning styles. With dynamic methods, learner characteristics are collected while learners interact with the system (Ennouamani & Mahani, 2017). To create learner models, many researchers are now focusing on dynamic techniques. The needs to be filled in adaptive learning systems have been identified as: ways of identifying and confirming learning styles; improvement of the automatic learning style identification process; improvement of the agents guiding the learner during learning; the ability to track learning behavior; and basing adaptive LMS on learner assessment.

Literature review
This chapter introduces adaptive learning and e-learning, and reviews learning theories and their relation to adaptive learning. It then reviews research on learner characteristics in an e-learning environment and e-learning models based on various learner characteristics. The chapter also gives an overview of the Artificial Intelligence (AI) techniques used in adaptive e-learning systems.

Theories of learning in adaptive e-learning
Significant advancements in technology have brought with them tools, environments and procedures for aiding learning, along with a number of changes in learning environments. The way people learn keeps changing, or will be made to change, in conformity with emerging trends and technological issues. However, Havard et al. (2016) advocate that the implementation of technology in learning should not happen in isolation but should be driven by the way people learn. In this section, we review some of the learning theories that have been put forward by various researchers to enhance adaptive learning.
Quite a number of learning theories have been put forward by various researchers in order to address online learning. According to Hadullo et al. (2017), the following learning theories have been examined and proposed as theories that can make e-learning effective: social constructivism, the theory of network, cognitive load theory and connectivism.
Constructivism, behaviorism and cognitivism are the main learning theories that have been the building blocks of the learning and instruction process. Some researchers go deeper and suggest more specificity; Hammad et al. (2018) advocate that adaptive systems be based on constructivist, behaviorist and cognitivist principles at a high level.
According to Dalgarno (2001), a constructivist envisions learning as a process of knowledge construction, in which understanding is built on past experiences and inputs. This shifts the focus from teaching to guiding learners so that the learners themselves construct knowledge. In behaviorism, learning is viewed as a response to external stimuli: state-action reinforcement from the environment drives the learner toward a specific set objective.
Cognitivism relates learning to a computer process, as it defines learning as a process of acquiring, storing and retrieving information. As shown in Table 1, most if not all of the adaptive e-learning systems failed to incorporate the three learning theories. The most common theory among the studies is cognitivism, and all the studies are mono-theoretical as far as learning theories are concerned.
The studies that based adaptivity in e-learning systems on theories of learning utilized just one aspect of the learning theories. The aspect of how learners process information was the most utilized aspect of cognitivism. There is therefore a need for adaptive e-learning systems to be based on the whole set of principles of a learning theory, so that we can tell whether an outcome resulted from basing the system on that particular learning theory.
Since most of these concepts of how learning occurs are built on the weaknesses of the preceding concepts, there is a need to combine the principles of the learning theories when building adaptive e-learning systems in order to explain learning properly.

Learner characteristics and adaptive learning systems
Most adaptive systems have succeeded in cases where their abilities are based on accuracy in assessing general and specific learner characteristics (Colchester et al., 2017). This is what informs learner modelling, which brings out the adaptation based on learner characteristics. Deciding which learner characteristics should be part of the learner model is usually a challenge (Nurjanah, 2008). Learner characteristics can be static or dynamic, a classification that is applicable during modelling. Static learner characteristics include items such as name, age and email, which do not change during actual or simulated learning. Such data are collected through customized questionnaires backed by both a front end and a back end. Adaptivity in e-learning systems may likewise be classified dichotomously as static and dynamic. Dynamic characteristics of the learner include features that are acquired as a result of interactions with the environment, which complicates their modelling specificity rather than modelling their applicability (Chrysafiadi & Virvou, 2013).

Authors | Studied learner characteristics | Theory
(Almohammadi & Hagras, 2013b) | Learner knowledge | Cognitivism
(Deeb et al., 2014) | Learning style | Cognitivism
(Fenza et al., 2017) | Learner knowledge | Cognitivism
(Kolekar et al., 2010) | Learning style | Cognitivism
(Rajendran et al., 2018) | The learner's affective states | Cognitivism
(Malpani, 2011) | The learner's prior knowledge and current knowledge level, measured by the learner's ability to answer quizzes correctly | Cognitivism
(Sabourin et al., 2011) | Learner affect | Behaviorism
(C. H. Wu et al., 2017) | Learner knowledge | Cognitivism
(Alshammari et al., 2015) | Learning style | Cognitivism
(Whitehill & Movellan, 2018) | Learner knowledge | Cognitivism
(Hwang et al., 2013) | Learning style | Cognitivism
(Yang et al., 2016) | Both the learner's learning style and cognitive styles |

Authors | Title | Learner characteristic
(Wu et al., 2015) | A fuzzy tree matching-based personalized E-learning recommender system | Preferences
(Tadlaoui et al., 2018) | A learner model based on multi-entity Bayesian networks and artificial intelligence in adaptive hypermedia educational systems | Learner knowledge
(Alshammari et al., 2014) | Adaptivity in E-Learning Systems | Learning style

Learning style, learner knowledge and learner preferences were found to be the most used learner characteristics in the learner model (Kanimozhi, n.d.).

E-Learning models
Learner models form the foundation of adaptation in e-learning (Ding et al., 2018). Various models have been explored, developed and aligned to help model varied learner characteristics. Rabat (2016) considered andragogy and self-directed learning, both adult learning theories, to come up with an adaptive e-learning model. The model encompassed prior knowledge, affective states, personality traits, cognitive characteristics, personal characteristics and knowledge. Mejia et al. (2017) considered people with disabilities in their model, so their setup consisted of demographic data, competencies, reading difficulties and cognitive traits; they did not consider any learning theory. Huang et al. (2017) placed the learner in a contextual environment and modelled the learner's context with regard to social context, cognitive levels, basic information, learning style, learner preferences and related interests. Ding et al. (2018) also considered fundamental initial information, the learner's style of learning, the learner's cognitive abilities and the learner's prior knowledge state in their model. Neither Huang et al. (2017) nor Ding et al. (2018) took any learning theory into account in their models.

AI techniques applied in adaptivity in e-learning systems
AI tools are seen as appropriate for modelling learners because they can replicate the human decision-making process. Some of the AI techniques that have been used for constructing learner models include fuzzy logic, neural networks, Bayesian networks and hidden Markov models (Almohammadi & Hagras, 2013a). AI techniques have been used in two ways: first, for classifying learners into groups so as to provide adaptation to those particular groups; second, for diagnosing learner characteristics as learners learn so as to adjust the instruction method.
Fuzzy logic is seen as an extension of set theory and is usually used to assess the learning and knowledge of the learner. It has been used in several studies to make adaptation based on the learner's knowledge. Almohammadi and Hagras (2013a) used fuzzy logic to extract rules from learner data so that they could tell the knowledge needs of learners. Aajli and Afdel (2017) used fuzzy logic to automatically generate the domain model of adaptive e-learning systems.
Bayesian networks are directed acyclic graphs that are usually used for modelling probabilistic dependencies among variables (Liu et al., 2006). Bayesian networks have been used in adaptive systems in order to provide adaptive instruction. For instance, Liu et al. (2006) used Bayesian networks to assess the learner's knowledge and provide instruction according to that knowledge; Firte et al. (2009) used a Bayesian network to classify users based on their navigation habits and then suggest content based on the classification; Guan et al. (2013) used a Bayesian network to provide learning path adaptability by first constructing the domain module using a Bayesian network; Ueno and Okamoto (2007) used a Bayesian network to provide motivational messages based on the learner logs.
Hidden Markov models have also been used in adaptive e-learning systems. For instance, Deeb et al. (2014) used the K-means algorithm together with hidden Markov models to cluster learners into different learning styles and adapt content to suit each learner's learning style; Rani et al. (2017) used fuzzy Petri nets and a hidden Markov model to adapt learning content to each learner in accordance with the learner's learning path.
This research adopted an iterative incremental methodology. This is a time-based, stepwise software development process in which each step defines a definitive block that keeps expanding the model. It begins with initializing the specification to create a basic model. From the initial complete model, a user testing process is carried out; the resulting user feedback informs the need for specification adjustments and incremental expansion of the model. The process is repeated until the model becomes a functionally complete and acceptable application meeting all requirements put forth by the project. See Figure 2.

Figure 2. Iterative and incremental methodology
The project feedback is received after each iteration is completed.

Model architecture
The Reinforcement Learning Approach for Adaptive E-learning Using Multiple Learner Characteristics (RELUMECEL) model framework gathers three learner characteristics (learning style, affective state and prior knowledge) and then gives recommendations on the instructional design of content best suited to the individual learner based on those three characteristics. It also gives the best learning path for a learner revisiting the e-learning environment, and gives content developers the updates required to achieve adaptability for the various learners represented in the given learning environment. RELUMECEL has a module for collecting the learner's profile information. As the learner interacts with the e-learning environment, the following information is collected: user id, username, full names, email address, date of birth and the course being taken. The learner profile information is used for tracking the learner, delivering the required information and modelling the adaptivity based on the learner profile. This domain will be updated further with information from the feature extraction domain.

Extraction of the learners learning style
The application of learning styles in e-learning environments reinforces and enhances the learner's experience by making the content retainable in the most effective and realistic manner and form. Implementing learning style as a learner characteristic in an adaptive e-learning environment allows the learner to acquire skills, knowledge and attitudes through study or experience that follows the learner's learning style preference.
We use the latest version of the VARK questionnaire for the setup. VARK was developed by Fleming. We use it to determine the learning style of the learner. We create a module that incorporates the questions of the VARK questionnaire and the analysis of its responses as developed by VARK. The VARK questionnaire is incorporated in RELUMECEL as an application module. The module analyzes the responses of the learner and determines the learner's learning style.
The figure below shows the application module with the VARK questionnaire. At the initial interaction with the e-learning platform, the learner is taken through the learning style module and answers the 16 questions of the questionnaire as given by Fleming, which ask the learner to reveal the way he or she likes to learn. From these answers the model analyzes the learner and gives his or her learning style using the VARK database developed by Fleming. The scores are used in the RELUMECEL engine for further analysis together with the other learner characteristics.
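The tallying step of the learning style module can be illustrated with a minimal sketch. It assumes that each questionnaire answer has already been mapped to one or more of the four VARK modes (V, A, R, K) using Fleming's key, which is not reproduced here, and it picks the highest-scoring mode or modes. The function names `score_vark` and `dominant_styles` are hypothetical, and the simple maximum rule is an assumption; the published VARK scoring procedure may differ.

```python
from collections import Counter

VARK_MODES = ("V", "A", "R", "K")  # Visual, Aural, Read/write, Kinesthetic


def score_vark(responses):
    """Tally VARK modes over all answered questions.

    `responses` holds one entry per question; each entry is the collection
    of mode letters the learner selected (several selections, or none, are
    allowed per question).
    """
    counts = Counter({mode: 0 for mode in VARK_MODES})
    for selected in responses:
        for mode in selected:
            if mode not in VARK_MODES:
                raise ValueError(f"unknown VARK mode: {mode!r}")
            counts[mode] += 1
    return dict(counts)


def dominant_styles(scores):
    """Return every mode that ties for the highest score (assumed rule)."""
    best = max(scores.values())
    return [mode for mode in VARK_MODES if scores[mode] == best]


# Hypothetical answers to four questions (a full run would have 16 entries).
scores = score_vark([["V"], ["K", "A"], ["K"], []])
print(scores, dominant_styles(scores))
```

Returning all tied modes rather than a single one keeps multimodal learners visible to the later adaptation step.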

Affective states extraction
RELUMECEL focuses on extracting and modelling the affective state. Modelling of the affective state is contextualized to the e-learning environment and is measured in relation to the various learning styles modelled. The extraction process consists of developing a model from existing natural language processing libraries, identifying the dataset to be used, preparing the dataset, dividing the dataset into training and test sets, identifying the best classification algorithms, and finally experimenting with the best training algorithms for the best possible results. Once the model is built and tested, it is incorporated in the RELUMECEL environment so that it can be used to extract the learner's affective state during his or her interaction with the e-learning environment. We used the ISEAR dataset, an authentic dataset labelled with seven emotional attributes: joy, fear, anger, sadness, disgust, shame and guilt.
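The pipeline described above (prepare a labelled dataset, split it into training and test sets, train a classifier, then predict the emotion of new text) can be sketched as follows. This is a toy illustration only: the classifier is a small multinomial Naive Bayes written in plain Python, chosen here as an assumption because the text does not name the algorithm finally used, and the six sentences are invented examples, not ISEAR records. The names `NaiveBayesEmotion`, `tokenize` and `train_test_split` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train_test_split(samples, test_ratio=0.25, seed=0):
    """Shuffle (text, label) pairs and split them into train and test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


class NaiveBayesEmotion:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, samples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences standing in for a labelled emotion dataset.
toy = [
    ("i am scared of failing this exam", "fear"),
    ("the deadline terrifies me", "fear"),
    ("i am so happy i solved the exercise", "joy"),
    ("great lesson i enjoyed it", "joy"),
    ("this broken quiz makes me furious", "anger"),
    ("i am angry the submission was lost", "anger"),
]
train_set, test_set = train_test_split(toy)
model = NaiveBayesEmotion().fit(toy)  # toy set is tiny, so fit on all of it
print(model.predict("i enjoyed the exercise"))
```

With a real dataset the model would be fitted on `train_set` only and its accuracy measured on `test_set` before being plugged into the platform.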

The prior knowledge extraction
Measuring the level of a learner's knowledge in a particular field of study is crucial in assisted adaptability for the learning path to be taken (van Riesen et al., 2018). Once the learner logs into the system and selects the subject and topic he or she wants to take, the learner is taken through test questions on the subject and then guided to the next course of action; the outcome is measured and a reward is given based on the nature of the outcome.
The information resulting from the prior knowledge extraction process is kept as a log and fed into the adaptation module to be used later for adaptability processing. The extraction of prior knowledge is extended further in the adaptation module. It forms the basis for determining the state a learner is in and the type of action he or she should take next to gain maximum reward. It also determines whether to explore more of the environment or to exploit what has been learned by greedily picking the next action that maximizes the reward.
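The choice between exploring and greedily exploiting is commonly implemented with an epsilon-greedy rule. The text does not state which rule RELUMECEL uses, so the sketch below is an assumption offered for illustration; the function name `epsilon_greedy` and the action names in the usage line are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a {action: Q-value} mapping.

    With probability `epsilon` explore (uniform random action); otherwise
    exploit by returning an action with the highest Q-value.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)


# With epsilon = 0 the choice is purely greedy.
print(epsilon_greedy({"study": 1.0, "perform_test": 3.0}, epsilon=0.0))
```

A larger `epsilon` makes the agent try more alternative learning actions; a smaller one makes it stick to the action currently believed to be best.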
The questions for prior knowledge extraction are aligned with the course being taken, the learning objectives and other instructional design requirements that are in tandem with both the learning theories and the learner characteristics being studied for adaptation.

The adaptivity domain
Once the RELUMECEL model has extracted the learner's affective state, learning style and prior knowledge, this information is used as input to the reinforcement learning model, which is the core of the adaptivity domain.
In reinforcement learning, learning is a natural phenomenon that results from the interaction of an agent with its environment (Sutton, 2018). The environment domain consists of states and actions. The agent interacts with the environment in a specific and strategic way so as to maximize the rewards apportioned during the learning process. As in other forms of learning, situations are mapped to actions. In reinforcement learning, the agent (here, the learner) discovers the best action to take in any given situation within the parameters of the environment. The agent must proactively sense the environment and, among the actions available in a given state, choose the one that maximizes the reward function. Once the best action is taken, the agent's state is updated and it acquires a new state.
Figure 6 visualizes a general reinforcement learning architecture. A reinforcement learning environment is defined by the following features: a state $S$, a time $t$ and the state at a given time, $S_t$. A given state has a value that depends on the immediate reward $R$ at time $t$, written $R_t$.

Figure 6. A reinforcement learning architecture
To implement our reinforcement learning, we will explore the Q-learning algorithm.

Q-learning algorithm
According to Balasubramanian Velusamy (2013), the Q-learning algorithm (Watkins, 1992) is a model-free reinforcement learning method that finds the optimal policy of a given Markov Decision Process (MDP). A Markov decision process is a 5-tuple $(S, A, P_a, R_a, \gamma)$ where

$S$ is a finite set of states,
$A$ is a finite set of actions (or $A_s$ is the set of actions available from state $s$),
$p(s', r \mid s, a) = \Pr\{S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\}$ is the probability that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t + 1$; in the deterministic case we have $\delta(s_t, a_t) = s_{t+1}$,
$R_a(s, s')$ is the expected immediate reward received after transitioning from state $s$ to state $s'$ due to action $a$,
$\gamma \in [0, 1)$ is the discount factor, which represents the difference in importance between future rewards and present rewards.
In a given problem domain, the agent strives to maximize the total reward as it transitions from one state to another. The Q-function, on which Q-learning is built, evaluates every combination of state and action so that the combination maximizing the reward can be found. The Q-function returns a fixed value at the start of processing; as the agent goes through transitions and is rewarded, new values are computed and the Q-table is updated with them.
The Q-function is updated by

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where
$t$ indexes the present or current state $s_t$,
$t + 1$ indexes the next state $s_{t+1}$,
$Q(s_t, a_t)$ is the Q-value for the current state and action,
$r_{t+1} = R(s_t, a_t)$ is the reward received after performing action $a_t$ in state $s_t$,
$\alpha$ is the learning rate ($0 \le \alpha \le 1$),
$\gamma$ is the discount factor deciding the significance of future rewards ($0 \le \gamma < 1$).
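As a worked illustration of this update rule, the sketch below applies a single Q-learning step to a small Q-table stored as nested dictionaries. The state and action names and the numeric values are invented for the example, and `q_update` is a hypothetical helper, not part of RELUMECEL.

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update in place and return the new value.

    `q` maps each state to a dict of {action: Q-value}.
    """
    best_next = max(q[next_state].values(), default=0.0)
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Invented example values.
q = {
    "lesson": {"study": 0.0, "solve_exercise": 2.0},
    "exercise": {"submit": 4.0, "give_up": 0.0},
}
# target = 5 + 0.75 * 4 = 8; new value = 2 + 0.5 * (8 - 2) = 5.0
print(q_update(q, "lesson", "solve_exercise", reward=5,
               next_state="exercise", alpha=0.5, gamma=0.75))
```

The bracketed term in the update formula corresponds to `td_target - q[state][action]`: the gap between the newly observed estimate and the stored one, scaled by the learning rate.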

Implementation and discussion
The reinforcement learning architecture begins with the learner's characteristics having been extracted by the various modules of the model. These are stored as part of the learner logs, which are used for the computations that inform reinforcement and hence the learning path.
The lessons are designed and generated using a specific instruction model, which aligns each lesson with the learning theories that address the specific characteristics and so brings out the adaptability. Learners who go through the guided learning process attend the online lessons, do the assignments and tests, and submit where necessary. The model detects the learner's interactions and chooses the best paths for the learner through actions at given times within specific states. This is done so that the learner can get the maximum possible reward based on the state-action space. This is repeated continually until a best path is determined based on the combined measurements of the learner's characteristics.
A learner visiting the system for a second or subsequent time will have their information retrieved and the adaptability provided. This also applies to new learners with similar learner characteristics. Table 4 below shows a lesson plan indicating a module for the topic "Object oriented programming using C++."

Table 4. The learning modules for a given lesson

UNIT-One | Overview of C++
Description | Object Oriented paradigms, Data abstraction/control abstraction, OOPS principles, Origin of C++, Sample C++ program, dynamic initialization of variables, new and delete operators, C++ keywords, General form of C++ program, Type casting, Introducing C++ classes, Difference between class and structure.
Expected from the student |
In this reinforcement learning model for adaptive e-learning, following the prescribed instruction model, the states $s \in S$ to be considered include taking lessons, reading extra material, solving exercises, going through questions and answers, waiting for an answer, waiting for results, assignments, assessments, discussion, and understanding and explanation. The actions $a \in A$ to be considered include read/study, read more, study extra material, solve exercises, submit exercise, ask where in doubt, perform tests, discuss, give up, questions and answers for more understanding, assignment submission, and finally completing the learning by exiting the system or logging out.
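One way to encode this state-action space is as two lists of labels and a Q-table with one entry per pair. The sketch below is a possible reading of the lists in the paragraph above: the identifier spellings are hypothetical, the grouping of items such as "understanding and explanation" into a single state is an assumption, and the zero initial value is likewise assumed.

```python
# Hypothetical identifiers for the states named in the text.
STATES = [
    "taking_lesson", "reading_extra_material", "solving_exercises",
    "questions_and_answers", "waiting_for_answer", "waiting_for_results",
    "assignments", "assessments", "discussion",
    "understanding_and_explanation",
]

# Hypothetical identifiers for the actions named in the text.
ACTIONS = [
    "read_study", "read_more", "study_extra_material", "solve_exercises",
    "submit_exercise", "ask_where_in_doubt", "perform_tests", "discuss",
    "give_up", "questions_and_answers", "submit_assignment", "log_out",
]


def init_q_table(states, actions, initial_value=0.0):
    """Build a Q-table with the same fixed starting value for every pair."""
    return {s: {a: initial_value for a in actions} for s in states}


q_table = init_q_table(STATES, ACTIONS)
print(len(q_table), "states x", len(q_table["assignments"]), "actions")
```

In practice not every action is meaningful in every state, so a real table would likely restrict each state to its admissible actions.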
We assign reward values between 0 and 10 and apportion them as in the algorithm below.
The content for e-learning is presented based on a given designer whose variables are the title, the subtitles, the activities at each stage, the timeline of each stage, the intended outcomes and the program structure. The planner assessment is based on ACM/IEEE curriculum recommendations. We built an instructional environment and resources that are dynamic and encompass adaptability, otherwise known as an adaptable instruction design based on environment and resources.

Begin
As put forward in Schott (2015), instructional design follows theoretical and practical research in the fields of cognition, educational psychology and problem-solving techniques. The strategies used in instructional design support the creation of guidelines for best practice in all aspects of the instruction process, which include planning and management of the e-learning instruction method, delivery techniques, learner assessment, and evaluation and feedback methods. The fundamental aim of the theory is to produce measurable changes in learners' cognitive skills and attitudes. This calls for constructing lessons to achieve the intended objectives, which then inform the creation of course plans.
There are a number of instruction models, including the analysis, design, development, implementation and evaluation (ADDIE) model, the ASSURE model, the Dick and Carey model and the 4C/ID model (Khalil & Elkhider, 2020). The models are formed to implement all or at least one of the learning theories. Learning theories (Cognitive & Processing, 2018) are defined as organized sets of principles explaining how individuals acquire, retain and recall knowledge; they include behaviorism, cognitive information processing (cognitivism) and constructivism.
In this research the designer section uses the ASSURE instruction model to design the dynamic courses and is keen on bringing out all the elements of the behaviorism learning theory.

The ASSURE model can be seen below.

Figure 8. The ASSURE instructional model
As shown in Figure 8, ASSURE is an acronym for the steps followed in the model: Analyze learner characteristics, State objectives, Select/modify/design materials, Utilize materials, Require learner response, and Evaluate. In this research we extract multiple learner characteristics to model the e-learning environment and provide adaptation to learners. The ASSURE model is therefore ideal for our design purposes and, as indicated by Sundayana et al. (2017), is well suited for problem-based and discovery learning.
Our environment states consist of lessons, exercises, assignments, assessments and exams, and the actions consist of study, study extra materials, do assignments, perform test, submit assignments and others depending on the course composition. We therefore have a wide state-action space to consider. We assume that the agent in this environment is also influenced by the different learning characteristics of the learner/agent.
With the optimal policy calculated from the learner characteristics and the given instructional design, and with the logs of the learner profile, the model then provides adaptability per learner based on the learner's learning characteristics.

Evaluations and experimental results
We use a simulation of the Q-learning algorithm to help vary the agent parameters, especially the learning characteristics. Figure 5.1 shows the initial graph. We then defined the reward system, with the maximum reward set at 100: if learning takes place smoothly throughout the iterations, the maximum reward is 100, as shown in the matrix generated in Figure 10. This matrix is also the initial body of the initialized probabilities.
The initial matrix: with the reinforcement value set at 0.75 and 1000 simulation iterations, we are able to generate the matrix of the best learning path and determine the necessary reinforcement, as shown in Figure 10.
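A simulation of this kind can be sketched in a few lines. The example below is not the authors' simulation: the five-state learning-path graph is invented, the "reinforcement value" of 0.75 is read here as the discount factor $\gamma$ (an assumption, since the text does not say which parameter it is), and a learning rate of 1 is used because the toy environment is deterministic. It keeps the maximum reward of 100 and the 1000 iterations mentioned in the text.

```python
import random

GAMMA = 0.75       # assumption: the "reinforcement value" is the discount factor
GOAL_REWARD = 100  # maximum reward reported in the text
EPISODES = 1000    # number of simulated iterations reported in the text

# Hypothetical learning-path graph: state -> states reachable in one step.
TRANSITIONS = {
    "lesson": ["extra_material", "exercise"],
    "extra_material": ["lesson", "exercise"],
    "exercise": ["lesson", "assignment"],
    "assignment": ["exercise", "assessment"],
    "assessment": ["assessment"],  # goal state loops onto itself
}
GOAL = "assessment"


def reward(next_state):
    """Only a step that reaches the goal is rewarded."""
    return GOAL_REWARD if next_state == GOAL else 0


def train(episodes=EPISODES, gamma=GAMMA, seed=0):
    """Fill a Q-table by random exploration of the graph."""
    rng = random.Random(seed)
    q = {s: {n: 0.0 for n in nxt} for s, nxt in TRANSITIONS.items()}
    states = list(TRANSITIONS)
    for _ in range(episodes):
        state = rng.choice(states)
        while True:
            nxt = rng.choice(TRANSITIONS[state])  # pure exploration
            # Learning rate 1: overwrite with reward plus discounted best value.
            q[state][nxt] = reward(nxt) + gamma * max(q[nxt].values())
            if nxt == GOAL:
                break
            state = nxt
    return q


def best_path(q, start, max_steps=10):
    """Follow the highest Q-value from `start` until the goal is reached."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path


q = train()
print(best_path(q, "lesson"))
```

Reading the greedy path off the trained table mirrors how a best learning path would be extracted from the final matrix; in this toy graph the shortest route to the assessment carries the highest values.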

Conclusion and future work
This work presents an enhanced approach to infusing adaptability into learning management systems by looking into three learning characteristics: learning style, prior knowledge and affective state. In this research we created adaptability based on learner characteristics using reinforcement learning, and we studied various processes that can be used to extract these characteristics.
We implemented reinforcement learning using the Q-learning algorithm to bring out adaptability. We have not exhausted all learner characteristics, and we therefore propose that future work extend the research to further learner characteristics.