Don’t Follow Reinforcement Learning Blindly: Lower Sample Complexity of Learning Optimal Inventory Control Policies with Fixed Ordering Cost