Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management
