I write about technology at theluddite.org

  • 1 Post
  • 3 Comments
Joined 2 years ago
Cake day: June 7th, 2023

  • No need to apologize for length with me basically ever!

    I was thinking of how you did it in the second paragraph, but even more stripped down. The algorithm has N content buckets to choose from, and once it chooses, success is measured by how much of the video the user watched. For simplicity, users can only keep watching or log off. For small N, I think that @[email protected] is right that it’s the multi-armed bandit problem if we assume that user preferences are static. If we introduce the complexity that users prefer familiar things, which I think is pretty fair, so that users are more likely to keep watching from a bucket if it’s a familiar one, I assume that exploration gets heavily disincentivized and exhibits some pretty weird behavior, while exploitation becomes much more favorable. What I like about this is that, with only a small deviation from a classic problem, it would help explain what you also explain, which is getting stuck in corners.

    Once you allow user choice beyond consume/log off, I think your way of thinking about it, as a turn-based game, is exactly right, and your point about bin refinement is great; I hadn’t thought of that.


  • Thanks!

    I feel enlightened now that you called out the self-reinforcing nature of the algorithms. It makes sense that an RL agent solving the bandit problem would create its own bubbles out of laziness.

    You’re totally right that it’s like a multi-armed bandit problem, but maybe with so many possibilities that searching is prohibitively expensive, since the space of options to search is far larger than what humans can consume at any realistic rate. It also differs from the classic problem, though, because the agent’s reward depends on its past choices (people watch more of what they’re recommended). It would be really interesting to know if anyone has modeled a multi-armed bandit problem with this kind of self-dependency. I bet that, in that case, the exploration behavior is pretty chaotic. @[email protected] this seems like something you might just know off the top of your head!

    Maybe we can take advantage of that laziness to incept critical thinking back into social media, or at least have it eat itself.

    If you have any ideas for how to turn social media against itself, I’d love to hear them. I worked on this post for an unusually long time for a lot of reasons, but one of them was trying to think of a counter strategy. I came up with nothing though!
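
For anyone who wants to poke at the stripped-down version discussed in this thread (N content buckets, reward equal to the fraction of the video watched, and an optional preference for familiar buckets), here is a minimal sketch. Everything in it is an assumption made for illustration and not something anyone in the thread specified: the name `simulate`, the epsilon-greedy strategy standing in for "the algorithm", the Gaussian noise on watch fraction, the random base preferences, and the `familiarity_bonus` parameter that ties a bucket's appeal to how often it has already been recommended.

```python
import random


def simulate(n_buckets=5, steps=2000, epsilon=0.1, familiarity_bonus=0.0, seed=0):
    """Toy epsilon-greedy bandit where reward can depend on past recommendations.

    All parameters and modeling choices here are hypothetical.
    Returns (base, shown): the static base preferences and the number of
    times each bucket was recommended.
    """
    rng = random.Random(seed)
    # Assumed static preferences: expected watch fraction per bucket.
    base = [rng.uniform(0.2, 0.6) for _ in range(n_buckets)]
    shown = [0] * n_buckets        # recommendations per bucket so far
    estimate = [0.0] * n_buckets   # running mean of observed watch fraction

    for t in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a bucket uniformly at random.
            arm = rng.randrange(n_buckets)
        else:
            # Exploit: pick the bucket with the best estimate so far.
            arm = max(range(n_buckets), key=lambda i: estimate[i])

        # Familiarity: share of past recommendations that came from this bucket.
        familiarity = shown[arm] / (t + 1)
        mean = min(1.0, base[arm] + familiarity_bonus * familiarity)
        # Observed watch fraction, clamped to [0, 1].
        watched = min(1.0, max(0.0, rng.gauss(mean, 0.1)))

        shown[arm] += 1
        # Incremental update of the running mean for this bucket.
        estimate[arm] += (watched - estimate[arm]) / shown[arm]

    return base, shown


if __name__ == "__main__":
    for bonus in (0.0, 0.5):
        base, shown = simulate(familiarity_bonus=bonus)
        print(f"familiarity_bonus={bonus}")
        print("  base preferences:", [round(b, 2) for b in base])
        print("  times shown:     ", shown)
```

With `familiarity_bonus=0.0` this is just the classic static-preference bandit. Raising it makes the reward depend on the agent's own history, which is the self-dependency raised above; comparing the `shown` counts across a few seeds and bonus values is a cheap way to see whether the agent locks onto a bucket that was not the best one to begin with.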