It took a lot of tinkering, but the data and code for this recommender are finally in the proper format for the frontend and backend engineers. I started with lists of scores for each post id, then that evolved into dictionaries of scores, and then we ended up with dictionaries of dictionaries (so many keys! everything is retrievable!). For each post id, we have the top X most similar post ids, their scores, the entities from the original post, and the entities each similar post has in common with the original post, all in a super cool dictionary.
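To make the shape concrete, here is a minimal sketch of what a dictionary of dictionaries like this could look like. The key names (`entities`, `similar_posts`, `score`, `shared_entities`), the function `build_recommendations`, and the sample ids and scores are all made up for illustration; they are not the actual names or data from this project.

```python
def build_recommendations(similarity, entities, top_n=3):
    """Build one nested dictionary per post id.

    similarity: {post_id: {other_post_id: score}}  (hypothetical input shape)
    entities:   {post_id: set of entity strings}   (hypothetical input shape)
    """
    recs = {}
    for post_id, scores in similarity.items():
        # Rank the other posts by score, highest first, and keep the top N.
        ranked = sorted(
            ((other, score) for other, score in scores.items() if other != post_id),
            key=lambda pair: pair[1],
            reverse=True,
        )[:top_n]
        own_entities = entities.get(post_id, set())
        recs[post_id] = {
            "entities": sorted(own_entities),
            "similar_posts": {
                other: {
                    "score": score,
                    # Entities the similar post shares with the original post.
                    "shared_entities": sorted(own_entities & entities.get(other, set())),
                }
                for other, score in ranked
            },
        }
    return recs


# Made-up sample data, just to show the shape.
similarity = {
    "post_1": {"post_2": 0.82, "post_3": 0.41},
    "post_2": {"post_1": 0.82, "post_3": 0.10},
    "post_3": {"post_1": 0.41, "post_2": 0.10},
}
entities = {
    "post_1": {"director_a", "genre_x"},
    "post_2": {"director_a", "genre_y"},
    "post_3": {"genre_x"},
}

recs = build_recommendations(similarity, entities, top_n=1)
```

With a layout like this, everything about a post is one lookup away, for example `recs["post_1"]["similar_posts"]["post_2"]["shared_entities"]`.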
Where To Store, How To Serve
I stored the data for each post in Elasticsearch documents hosted in an S3 bucket. Fandor uses Elasticsearch for its search feature, so it was a natural fit for storing the data that gets served to the backend and frontend engineers. Everything is stored in JSON format in Elasticsearch, which fit well with the data in the form of a dictionary of dictionaries.
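As a rough sketch of why the fit is so natural: Elasticsearch's bulk API accepts newline-delimited JSON, with an action line followed by the document itself, so a nested dictionary can be serialized almost as-is. The index name `post_recommendations`, the helper `to_bulk_payload`, and the sample record below are assumptions for illustration, not the real index or data.

```python
import json


def to_bulk_payload(recs, index_name="post_recommendations"):
    """Serialize {post_id: document} into a bulk-API request body (NDJSON)."""
    lines = []
    for post_id, doc in recs.items():
        # Action line: index this document under the post id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": post_id}}))
        # Source line: the nested dictionary for this post, as JSON.
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Made-up record in the dictionary-of-dictionaries shape.
sample_recs = {
    "post_1": {
        "entities": ["director_a", "genre_x"],
        "similar_posts": {
            "post_2": {"score": 0.82, "shared_entities": ["director_a"]},
        },
    },
}

payload = to_bulk_payload(sample_recs)
```

The resulting string could then be sent to the cluster's `_bulk` endpoint with whatever HTTP or Elasticsearch client is already in use.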
So that the data team can query the data too, I created a PostgreSQL table in Postico (SO CONVENIENT!) and learned the nifty trick that one of the possible datatypes in PostgreSQL is JSON (hallelujah!), which made my job so easy. I just saved all the data for each post as JSON and put it straight into the SQL table.
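Here is a hedged sketch of what that could look like. The table name `post_recommendations`, its column names, and the helper `to_rows` are hypothetical, and the `%s` placeholders assume a driver such as psycopg2; the actual table in this project may be laid out differently.

```python
import json

# Hypothetical table: one row per post, with the nested dictionary in a JSON column.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS post_recommendations (
    post_id TEXT PRIMARY KEY,
    recommendations JSON NOT NULL
);
"""

# Insert a row, or overwrite it if the post id is already there.
UPSERT_SQL = """
INSERT INTO post_recommendations (post_id, recommendations)
VALUES (%s, %s)
ON CONFLICT (post_id) DO UPDATE SET recommendations = EXCLUDED.recommendations;
"""


def to_rows(recs):
    """Turn {post_id: document} into (post_id, json_string) parameter tuples."""
    return [(post_id, json.dumps(doc)) for post_id, doc in recs.items()]


# Made-up record in the dictionary-of-dictionaries shape.
sample_recs = {
    "post_1": {
        "entities": ["director_a", "genre_x"],
        "similar_posts": {
            "post_2": {"score": 0.82, "shared_entities": ["director_a"]},
        },
    },
}

rows = to_rows(sample_recs)
# With an open psycopg2 cursor (an assumption), the rows could be written with:
#   cursor.execute(CREATE_TABLE_SQL)
#   cursor.executemany(UPSERT_SQL, rows)
```

Once the rows are in, the data team can reach into the JSON with PostgreSQL's `->` operator, for example `SELECT recommendations -> 'similar_posts' FROM post_recommendations WHERE post_id = 'post_1';`.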
The code I wrote will run each night, so the recommendations will update offline and be stored in Elasticsearch. Recommender systems of this nature are especially difficult to evaluate, so I'm glad I set my KPI and made a dashboard before starting this project. Hopefully this recommender will be put into production by the end of my time at Fandor, because I'd really like to know the quantified effect it will have on articles read and videos watched.