Abstract
Problem definition: We consider the joint optimization of ordering and upgrading decisions in a dynamic multiproduct system over a finite horizon of T periods. In each period, multiple types of demand arrive stochastically and can be satisfied either with supply of the same type or by upgrading to a higher-quality product. The goal is to find an optimal joint replenishment and allocation policy that maximizes total expected profit, both when the firm knows the demand distributions a priori and when the firm must learn them over time. Methodology/results: We first characterize the structure of the clairvoyant optimal joint ordering and allocation policy. Building on this structure, we propose a new online learning algorithm, termed stochastic subgradient descent with perturbed subgradient (SGD-PG for short), and show that it achieves cumulative regret growing on the order of the square root of T, which matches the known lower bound for any online learning method. We further show that SGD-PG can be extended to a nested censored demand setting. In the course of the algorithmic design, we propose a linear programming (LP)-based approach to compute the subgradient and prove that it produces the same output as the perturbed subgradient method. The LP-based method also allows us to extend the results to general upgrading structures. We demonstrate the efficacy of the proposed algorithms in numerical experiments. Managerial implications: This work provides practitioners with the optimal policy for inventory replenishment and allocation in a multiproduct system with upgrading. When the demand distribution is unknown, we propose an easy-to-implement and provably good algorithm for demand learning. In addition, our numerical results quantify the value of optimal upgrading and identify the conditions under which upgrading is most beneficial.
Funding: This research was partially supported by an Amazon research award.
Supplemental Material: The e-companion is available at https://doi.org/10.1287/msom.2024.0974 .