In an era where personalised education is increasingly pivotal, adaptive learning technologies have emerged as a transformative force in e-learning. Traditional educational approaches often fail to cater to the diverse needs of individual learners, resulting in a one-size-fits-all model that leaves many underserved. Recognising these limitations, this work develops a Q-learning-based model that tailors course content to each student's unique learning style, preferences, and pace. By leveraging reinforcement learning, the model dynamically adjusts the sequence of instructional material, enhancing engagement and optimising knowledge retention. The system autonomously adapts to user behaviour, delivering tailored content aimed at fulfilling learning objectives on the basis of the feedback received, whether positive or negative. Functioning as an intelligent agent, it scrutinises user interactions and selects the most suitable responses to enhance the overall learning experience. The primary goal of this research is to create a dynamically adaptive e-learning system using reinforcement learning methodologies. Reinforcement learning algorithms make targeted decisions that yield varying rewards, with each knowledge component assigned a specific reward based on its relevance. These algorithms are grounded in the theory of Markov decision processes, which comprise a set of states, a set of actions, and the probabilities of transitioning between states; within this framework, both a reward function and a transition function are defined.
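As a minimal sketch of the mechanism described above, the following toy example applies the standard Q-learning update to a hypothetical Markov decision process whose states stand in for knowledge components. All states, actions, rewards, and hyperparameters here are illustrative assumptions, not values from the proposed system:

```python
import random

# Hypothetical toy MDP: states 0..3 are knowledge components, state 3 is the
# goal. Actions: 0 = review the current component, 1 = advance to the next.
# Rewards encode component relevance. All numbers are illustrative only.
N_STATES = 4
ACTIONS = [0, 1]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration

def step(state, action):
    """Transition function of the toy MDP, returning (next_state, reward)."""
    if action == 1 and state < N_STATES - 1:
        nxt = state + 1
        reward = 10.0 if nxt == N_STATES - 1 else 1.0   # goal pays more
    else:
        nxt, reward = state, -1.0                       # staying put is penalised
    return nxt, reward

Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
random.seed(0)
for episode in range(200):                # cf. the 100- vs 200-iteration comparison
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy policy per non-terminal state; the agent should learn to advance.
greedy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)]
print(greedy)
```

The reward and transition functions of the toy MDP are the two components the text names; swapping in learner-specific rewards is what would make the sequencing adaptive.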
The core function of the proposed system is to recommend learning pathways by concurrently considering sequential behaviour, learning styles, activities, materials, difficulty levels, feedback, preferences, competencies, and knowledge levels, using the Q-learning algorithm. For the course used in the implementation, the optimal path for the active learner traverses states 0 → 1 → 6 → 9. The proposed system thereby identifies the study trajectory favoured by learners for a particular course. The results show that the Q-learning algorithm performs better after 200 iterations than after 100: the success rate is 60.86% for 100 iterations and 70.82% for 200 iterations, while the training time to find the optimal course-selection path is 10 and 8 respectively.
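Once a Q-table has been trained, the recommended pathway is read off by following the highest-valued action from the start state. The sketch below is a hypothetical illustration of that read-out over ten states, with hand-set Q-values chosen so the greedy walk reproduces the shape of the reported 0 → 1 → 6 → 9 path; the values are not the system's trained ones:

```python
# Hypothetical read-out of a learning pathway from a trained Q-table.
# States 0..9 stand in for knowledge components; action a in state s
# means "move to component a". Q-values below are illustrative only.
N = 10
GOAL = 9

def greedy_path(Q, start=0, goal=GOAL):
    """Follow argmax_a Q[s][a] from start until the goal state is reached."""
    path, s = [start], start
    while s != goal:
        s = max(range(N), key=lambda a: Q[s][a])   # best next component
        path.append(s)
    return path

# Toy Q-table rewarding the transitions 0->1, 1->6, and 6->9.
Q = [[0.0] * N for _ in range(N)]
Q[0][1], Q[1][6], Q[6][9] = 0.5, 0.7, 0.9

print(greedy_path(Q))   # the greedy walk visits 0, 1, 6, 9
```

In the real system the Q-values would come from training against learner feedback, so the extracted path reflects the trajectory favoured by learners rather than hand-set numbers.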