Reinforcement learning for inventory management

Inventory control is one of the most significant problems in a firm's supply chain management. Reducing stock costs helps the firm gain performance and competitiveness.


Inventory management includes overseeing both purchases from suppliers and orders from customers, maintaining the storage of stock, controlling the amount of product available for sale, and fulfilling orders. A decision maker (the learning agent) takes as input the stochastic demand and local inventory information, such as inventory levels, and decides the next order quantities as its actions. Since on-hand inventory (the stock currently available), unmet demand (backorders), and placing orders all incur costs, the optimization problem is designed to minimize the overall cumulative cost.
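The decision loop described above can be sketched as a small simulator: the agent observes the net inventory level, chooses an order quantity, random demand is served, and the period's cost is returned. This is a minimal sketch, not the method behind this article. It assumes a single item, zero lead time, uniform integer demand, and illustrative cost values; the class `InventoryEnv` and all of its parameters are hypothetical.

```python
import random


class InventoryEnv:
    """Hypothetical single-item inventory environment (illustrative parameters)."""

    def __init__(self, capacity=20, max_demand=5, holding_cost=1.0,
                 backorder_cost=4.0, fixed_order_cost=3.0, seed=0):
        self.capacity = capacity          # storage limit (assumed)
        self.max_demand = max_demand      # demand ~ uniform on {0..max_demand} (assumed)
        self.h = holding_cost             # per unit held at the end of a period
        self.p = backorder_cost           # per unit of unmet (backordered) demand
        self.K = fixed_order_cost         # charged whenever an order is placed
        self.rng = random.Random(seed)
        self.level = 0                    # net inventory; negative values are backorders

    def observe(self):
        """The agent's input: here just the current net inventory level."""
        return self.level

    def step(self, order_qty, demand=None):
        """Place an order, then serve demand; return (new_level, period_cost)."""
        if demand is None:
            demand = self.rng.randint(0, self.max_demand)
        # Zero lead time (assumption): the order arrives before demand is served,
        # and it is truncated so that stock never exceeds capacity.
        order_qty = max(0, min(order_qty, self.capacity - self.level))
        self.level += order_qty - demand
        cost = (self.h * max(self.level, 0)            # holding cost
                + self.p * max(-self.level, 0)         # backorder penalty
                + (self.K if order_qty > 0 else 0.0))  # fixed ordering cost
        return self.level, cost


# Usage: a short episode driven by a naive "order 3 units every period" rule.
env = InventoryEnv()
for t in range(5):
    level, cost = env.step(order_qty=3)
    print(f"t={t} level={level} cost={cost}")
```

Passing `demand` explicitly makes individual transitions reproducible; leaving it out samples demand from the assumed distribution.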

As a result, the objective is to minimize the long-run cumulative cost (equivalently, to maximize the cumulative reward, defined as the negative cost), whose components are linear holding costs, linear backorder penalties, and fixed ordering costs. Most inventory management policies pursue this objective with basic heuristics that cannot always account for the complexity of the system and the stochasticity of demand.
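The three cost components can be written per period as follows. The notation is introduced here for illustration: $I_t$ is the net inventory at the end of period $t$ (negative values are backorders), $a_t$ the order quantity, $d_t$ the demand, $h$ the unit holding cost, $p$ the unit backorder penalty, and $K$ the fixed ordering cost. The sketch assumes zero lead time and a discounted long-run criterion with factor $\gamma$; an average-cost criterion is an equally common choice.

```latex
\begin{aligned}
I_t &= I_{t-1} + a_t - d_t, \\
c_t &= h\,\max(I_t, 0) + p\,\max(-I_t, 0) + K\,\mathbf{1}[a_t > 0], \\
J(\pi) &= \mathbb{E}_{\pi}\Big[\sum_{t=1}^{\infty} \gamma^{t-1} c_t\Big], \qquad 0 < \gamma < 1.
\end{aligned}
```

The agent seeks a policy $\pi$ that minimizes $J(\pi)$, which is the same as maximizing the expected discounted sum of rewards $r_t = -c_t$.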

This leads to two failure modes: ordering too much, which incurs unnecessary holding costs, or ordering too little, which leaves demand unsatisfied.
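Both failure modes can be seen with a common heuristic of the kind mentioned above, the (s, S) reorder rule: whenever net inventory falls to s or below, order up to level S. The sketch below is hypothetical. It assumes zero lead time and illustrative costs (h = 1, p = 4, K = 3), and `simulate_s_S` is a name introduced here. It scores a policy by its average per-period cost on a fixed demand trace.

```python
def simulate_s_S(s, S, demands, h=1.0, p=4.0, K=3.0, start=0):
    """Average per-period cost of an (s, S) policy on a fixed demand trace.

    Rule: whenever net inventory is at or below s, order up to S.
    Zero lead time and the cost values h, p, K are illustrative assumptions.
    """
    level, total = start, 0.0
    for d in demands:
        order = S - level if level <= s else 0
        level += order - d
        total += (h * max(level, 0)               # holding cost on leftover stock
                  + p * max(-level, 0)            # penalty on backorders
                  + (K if order > 0 else 0.0))    # fixed ordering cost
    return total / len(demands)


# Usage: same reorder point, three order-up-to levels (too low, moderate, too high).
demands = [1] * 10
for S in (0, 5, 100):
    print(S, simulate_s_S(s=0, S=S, demands=demands))
# prints 0 6.7, then 5 2.6, then 100 94.8 (one pair per line)
```

On this trace, S = 0 keeps the system permanently backordered and pays the fixed cost every period, while S = 100 is dominated by holding cost. A well-tuned S sits in between, but the right value depends on the demand distribution, which is exactly what a fixed heuristic struggles to adapt to.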

To minimize inventory management costs, a promising route is to use reinforcement learning. Stock management can be modeled as a sequential decision-making process under uncertainty, commonly formulated as a Markov decision process (MDP), and reinforcement learning is well suited to this kind of task. Recent progress in machine learning, and in RL in particular, with deep models in other complex domains (e.g. deep Q-learning) suggests that high-quality results may be achievable even in such highly complex environments. Such results might even be transferred from one inventory problem to another via transfer learning, reducing overhead.
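As a minimal sketch of the MDP view, the code below applies tabular Q-learning, a simpler relative of the deep Q-learning mentioned above, to a small discretized inventory problem. It is not the approach described in this article. The state is the net inventory level, the action is the order quantity, and the reward is the negative per-period cost. Capacity, the backlog cap, the uniform demand model, the cost values, the hyperparameters, and all function names are hypothetical.

```python
import random

# Illustrative problem parameters (assumptions, not from the original text).
CAPACITY, MAX_BACKLOG, MAX_ORDER, MAX_DEMAND = 10, 5, 6, 4
H, P, K = 1.0, 4.0, 3.0  # holding, backorder and fixed ordering costs

STATES = list(range(-MAX_BACKLOG, CAPACITY + 1))  # net inventory levels
ACTIONS = list(range(MAX_ORDER + 1))              # order quantities


def step(level, order, rng):
    """One MDP transition: receive the order (zero lead time), then serve demand."""
    demand = rng.randint(0, MAX_DEMAND)            # assumed uniform demand
    nxt = min(level + order, CAPACITY) - demand    # stock above capacity is discarded
    nxt = max(nxt, -MAX_BACKLOG)                   # backlog capped to keep states finite
    cost = H * max(nxt, 0) + P * max(-nxt, 0) + (K if order > 0 else 0.0)
    return nxt, -cost                              # reward = negative cost


def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        level = rng.choice(STATES)
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(level, a)])  # exploit
            nxt, reward = step(level, action, rng)
            # Move Q(s, a) toward the one-step bootstrapped target.
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(level, action)] += alpha * (target - q[(level, action)])
            level = nxt
    return q


def greedy_policy(q):
    """Order quantity with the highest learned Q-value in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


policy = greedy_policy(q_learning())
for level, order in policy.items():
    print(f"net inventory {level:>3}: order {order}")
```

The table has one entry per (inventory level, order quantity) pair, which only works while the state space is small enough to enumerate. Deep Q-learning replaces the table with a neural network so that richer observations (multiple items, demand history, lead-time pipelines) can be handled.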

A quantum-enhanced method to generate optimal inventory management strategies would bring two main benefits:

  • Improved inventory management translates directly into reduced expenses;
  • Better performance ensures more timely delivery of items to customers and avoids delays.