Also known as multi-armed bandit problem, K-armed bandit, N-armed bandit, bandit problem
reinforcement learning problem exemplifying the exploration–exploitation tradeoff
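A minimal sketch of the exploration–exploitation tradeoff in a K-armed bandit. It assumes Bernoulli-reward arms and uses the epsilon-greedy strategy, which is one common heuristic among several (UCB and Thompson sampling are others) and not the definitive solution. With probability epsilon the agent explores a random arm; otherwise it exploits the arm with the best estimated mean so far. The function name, the arm probabilities, epsilon, and the step count are hypothetical illustration values.

```python
import random


def epsilon_greedy(probs, steps=10_000, epsilon=0.1, seed=0):
    """Play a hypothetical K-armed Bernoulli bandit with epsilon-greedy selection.

    probs: true success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    k = len(probs)
    counts = [0] * k    # how many times each arm has been pulled
    values = [0.0] * k  # running estimate of each arm's mean reward
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick any arm uniformly at random
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit: best estimate so far
        reward = 1 if rng.random() < probs[arm] else 0  # Bernoulli payout
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, values, total


counts, values, total = epsilon_greedy([0.2, 0.5, 0.75])
print(counts)
print([round(v, 2) for v in values])
```

Raising epsilon spends more pulls on exploration (better estimates of every arm, lower reward from the best one); lowering it exploits sooner but risks settling on a suboptimal arm whose early estimate happened to look good.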