Jaccard similarity

Unlike the other similarity metrics in this article, Jaccard similarity takes sets or binary vectors as input, so it is not applicable to vectors that contain rankings or ratings. As a movie recommendation example, suppose we have three movies, each described by its top three tags.

  • Movie A tags = (adventure, romantic, action)
  • Movie B tags = (adventure, space, action)
  • Movie C tags = (romantic, comedy, friendship)

Based on these tags, movie A is more similar to movie B than to movie C: A and B share two tags (adventure, action), while A and C share only one (romantic).
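
To put numbers on this comparison, here is a minimal Python sketch that uses the standard definition of Jaccard similarity: the size of the intersection divided by the size of the union. The function name `jaccard_similarity` and the variable names are illustrative choices, and returning 0.0 for two empty sets is an assumption made here, not something the text specifies.

```python
def jaccard_similarity(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are given similarity 0.0
        # to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


movie_a = {"adventure", "romantic", "action"}
movie_b = {"adventure", "space", "action"}
movie_c = {"romantic", "comedy", "friendship"}

# A and B: 2 shared tags out of 4 distinct tags
sim_ab = jaccard_similarity(movie_a, movie_b)
# A and C: 1 shared tag out of 5 distinct tags
sim_ac = jaccard_similarity(movie_a, movie_c)

print(f"A vs B: {sim_ab:.2f}")
print(f"A vs C: {sim_ac:.2f}")
```

Under this definition, A and B score 2/4 = 0.5 and A and C score 1/5 = 0.2, which matches the intuition that A is closer to B.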