Recommendation systems fall into two broad categories: personalized and non-personalized recommenders. My previous post on association rules mining is an example of a non-personalized recommender, as the recommendations generated are not tailored to a specific user. By contrast, a personalized recommender system takes user preferences into account when making recommendations for that user. There are various personalized recommender algorithms, and in this post I will be implementing user-user collaborative filtering, which finds users similar to the target user and uses their preferences to generate recommendations. And to add to the fun, I will be implementing the algorithm using a graph database rather than the more traditional matrix-based approach.
This personalized recommender algorithm assumes that past agreements predict future agreements. It uses the concept of similarity in order to identify users that are "like" the target user in terms of their preferences. Users identified as most similar to the target user become part of the target user's neighbourhood. Preferences of these neighbours are then used to generate recommendations for the target user.
Concretely, here are the steps we will be implementing to generate recommendations for an online grocer during the user check-out process:
1. For each pair of users, compute the Jaccard index as a measure of similarity.
2. For each target user, select the top k most similar users as that user's neighbourhood.
3. Identify products purchased by the neighbours that the target user has not yet purchased.
4. Rank these candidate products by the number of neighbours who purchased them.
5. Return the top n products as recommendations for the target user.
In this particular demonstration, we are building a recommender system based on the preferences of 100 users. As such, we would like the neighbourhood size, k, to be large enough to capture clear agreement between users, but not so large that we end up including users who are not very similar to the target user; hence we choose k=10. Secondly, in the context of a user check-out application for an online grocer, the goal is to increase basket size by surfacing products that are as relevant as possible without overwhelming the user. We therefore limit the number of recommendations to n=10 products per user.
The Jaccard index measures similarity between two sets, with values ranging from 0 to 1. A value of 0 indicates that the two sets have no elements in common, while a value of 1 implies that the two sets are identical. Given two sets A and B, the Jaccard index is computed as follows:
$$J(A,B) = \frac{| A \cap B |} {| A \cup B |}$$

The numerator is the number of elements that A and B have in common, while the denominator is the total number of distinct elements across the two sets.
In this implementation of user-user collaborative filtering, we will be using the Jaccard index to measure similarity between two users. This is primarily due to the sparse nature of the data, where there is a large number of products and each user purchases only a small fraction of those products. If we were to model our user preferences using binary attributes (ie: 1 if user purchased product X and 0 if user did not purchase product X), we would have a lot of 0s and very few 1s. The Jaccard index is effective in this case, as it eliminates matching attributes that have a value of 0 for both users. In other words, when computing similarity between two users, we only consider products that have been purchased, either by both users, or at least one of the users. Another great thing about the Jaccard index is that it accounts for cases where one user purchases significantly more products than the user it's being compared to. This can result in higher overlap of products purchased between the two, but this does not necessarily mean that the two users are similar. With the equation above, we see that the denominator will be large, thereby resulting in a smaller Jaccard index value.
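For example, here is a small Python sketch of the calculation (the product IDs are made up); note how user B's much larger purchase history inflates the union and pulls the index down:
# Jaccard index between two users' sets of purchased products (hypothetical product IDs)
def jaccard_index(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

user_a = {101, 102, 103, 104}                    # user A purchased 4 products
user_b = {103, 104, 105, 106, 107, 108, 109}     # user B purchased 7 products, 2 in common with A
print(jaccard_index(user_a, user_b))             # 2 shared / 9 distinct overall ≈ 0.22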
A graph database is a way of representing and storing data. Unlike a relational database, which represents data as rows and columns, data in a graph database is represented using nodes, edges and properties. This representation makes graph databases conducive to storing data that is inherently connected. For our implementation of user-user collaborative filtering, we are interested in the relationships that exist between users based on their preferences. In particular, we have a system of users who place orders, and these orders contain products that belong to a department and an aisle. These relationships are easily depicted using a property graph data model:
The nodes represent entities in our system: User, Order, Product, Department and Aisle. The edges represent relationships: ORDERED, CONTAINS, IN_DEPARTMENT and IN_AISLE. The attributes in the nodes represent properties (eg: a User has property "user_id"). Mapping to natural language, we can generally think of nodes as nouns, edges as verbs and properties as adjectives.
In this particular graph, each node type contains the same set of properties (eg: all Orders have an order_id, order_number, order_day_of_week and order_hour_of_day), but one of the interesting properties of graph databases is that they are schema-free. This means that a node can have an arbitrary set of properties. For example, we can have two user nodes, u1 and u2, with u1 having properties such as name, address and phone number, and u2 having properties such as name and email. The concept of a schema-free model also applies to the relationships that exist in the graph.
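As an illustration of this (the property names below are hypothetical, and the snippet is not part of the pipeline we build later), a schema-free graph will happily accept two User nodes carrying different sets of properties:
# Hypothetical Cypher, shown only to illustrate the schema-free model -- not executed in our pipeline
schema_free_example = """
CREATE (u1:User {name: 'u1', address: '123 Main St', phone: '555-0100'}),
       (u2:User {name: 'u2', email: 'u2@example.com'});
"""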
We will be implementing the property graph model above using Neo4j, arguably the most popular graph database today. The actual creation and querying of the database will be done using the Cypher query language, often thought of as SQL for graphs.
Similar to my post on association rules mining, we will once again be using data from Instacart. The datasets can be downloaded from the link below, along with the data dictionary:
“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on Oct 10, 2017.
import pandas as pd
import numpy as np
import sys
# Utility functions
# Returns the size of an object in MB
def size(obj):
    return "{0:.2f} MB".format(sys.getsizeof(obj) / (1000 * 1000))
# Displays dataframe dimensions, size and top 5 records
def inspect_df(df_name, df):
    print('{0} --- dimensions: {1}; size: {2}'.format(df_name, df.shape, size(df)))
    display(df.head())
# Exports dataframe to CSV, the format for loading data into Neo4j
def export_to_csv(df, out):
    df.to_csv(out, sep='|', columns=df.columns, index=False)
Data from Instacart contains 3-99 orders per user. Inspecting the distribution of the number of orders per user, we see that users place 16 orders on average, with 75% of the user base placing at least 5 orders. For demonstration purposes, we will be running collaborative filtering on a random sample of 100 users who placed at least 5 orders.
min_orders = 5 # minimum order count per user
sample_count = 100 # number of users to select randomly
# Load data from evaluation set "prior" (please see data dictionary for definition of 'eval_set')
order_user = pd.read_csv('orders.csv')
order_user = order_user[order_user['eval_set'] == 'prior']
# Get distribution of number of orders per user
user_order_count = order_user.groupby('user_id').agg({'order_id':'count'}).rename(columns={'order_id':'num_orders'}).reset_index()
print('Distribution of number of orders per user:')
display(user_order_count['num_orders'].describe())
# Select users who purchased at least 'min_orders'
user_order_atleast_x = user_order_count[user_order_count['num_orders'] >= min_orders]
# For reproducibility, set seed before taking a random sample
np.random.seed(1111)
user_sample = np.random.choice(user_order_atleast_x['user_id'], sample_count, replace=False)
# Subset 'order_user' to include records associated with the 100 randomly selected users
order_user = order_user[order_user['user_id'].isin(user_sample)]
order_user = order_user[['order_id','user_id','order_number','order_dow','order_hour_of_day']]
inspect_df('order_user', order_user)
# Load orders associated with our 100 selected users, along with the products contained in those orders
order_product = pd.read_csv('order_products__prior.csv')
order_product = order_product[order_product['order_id'].isin(order_user.order_id.unique())][['order_id','product_id']]
inspect_df('order_product', order_product)
# Load products purchased by our 100 selected users
products = pd.read_csv('products.csv')
products = products[products['product_id'].isin(order_product.product_id.unique())]
inspect_df('products', products)
# Load entire aisle data as it contains the names related to the aisle IDs from the 'products' data
aisles = pd.read_csv('aisles.csv')
inspect_df('aisles', aisles)
# Load entire department data as it contains the names related to the department IDs from the 'products' data
departments = pd.read_csv('departments.csv')
inspect_df('departments', departments)
export_to_csv(order_user, '~/neo4j_instacart/import/neo4j_order_user.csv')
export_to_csv(order_product, '~/neo4j_instacart/import/neo4j_order_product.csv')
export_to_csv(products, '~/neo4j_instacart/import/neo4j_products.csv')
export_to_csv(aisles, '~/neo4j_instacart/import/neo4j_aisles.csv')
export_to_csv(departments, '~/neo4j_instacart/import/neo4j_departments.csv')
# py2neo allows us to work with Neo4j from within Python
from py2neo import authenticate, Graph
# Set up authentication parameters
authenticate("localhost:7474", "neo4j", "xxxxxxxx")
# Connect to authenticated graph database
g = Graph("http://localhost:7474/db/data/")
# Each time this notebook is run, we start with an empty graph database
g.run("MATCH (n) DETACH DELETE n;")
# We drop and recreate our node constraints
g.run("DROP CONSTRAINT ON (order:Order) ASSERT order.order_id IS UNIQUE;")
g.run("DROP CONSTRAINT ON (user:User) ASSERT user.user_id IS UNIQUE;")
g.run("DROP CONSTRAINT ON (product:Product) ASSERT product.product_id IS UNIQUE;")
g.run("DROP CONSTRAINT ON (aisle:Aisle) ASSERT aisle.aisle_id IS UNIQUE;")
g.run("DROP CONSTRAINT ON (department:Department) ASSERT department.department_id IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (order:Order) ASSERT order.order_id IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (user:User) ASSERT user.user_id IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (product:Product) ASSERT product.product_id IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (aisle:Aisle) ASSERT aisle.aisle_id IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (department:Department) ASSERT department.department_id IS UNIQUE;")
query = """
// Load and commit every 500 records
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///neo4j_products.csv' AS line FIELDTERMINATOR '|'
WITH line
// Create Product, Aisle and Department nodes
CREATE (product:Product {product_id: toInteger(line.product_id), product_name: line.product_name})
MERGE (aisle:Aisle {aisle_id: toInteger(line.aisle_id)})
MERGE (department:Department {department_id: toInteger(line.department_id)})
// Create relationships between products and aisles & products and departments
CREATE (product)-[:IN_AISLE]->(aisle)
CREATE (product)-[:IN_DEPARTMENT]->(department);
"""
g.run(query)
query = """
// Aisle data is very small, so there is no need to do periodic commits
LOAD CSV WITH HEADERS FROM 'file:///neo4j_aisles.csv' AS line FIELDTERMINATOR '|'
WITH line
// For each Aisle node, set property 'aisle_name'
MATCH (aisle:Aisle {aisle_id: toInteger(line.aisle_id)})
SET aisle.aisle_name = line.aisle;
"""
g.run(query)
query = """
// Department data is very small, so there is no need to do periodic commits
LOAD CSV WITH HEADERS FROM 'file:///neo4j_departments.csv' AS line FIELDTERMINATOR '|'
WITH line
// For each Department node, set property 'department_name'
MATCH (department:Department {department_id: toInteger(line.department_id)})
SET department.department_name = line.department;
"""
g.run(query)
query = """
// Load and commit every 500 records
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///neo4j_order_product.csv' AS line FIELDTERMINATOR '|'
WITH line
// Create Order nodes and then create relationships between orders and products
MERGE (order:Order {order_id: toInteger(line.order_id)})
MERGE (product:Product {product_id: toInteger(line.product_id)})
CREATE (order)-[:CONTAINS]->(product);
"""
g.run(query)
query = """
// Load and commit every 500 records
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///neo4j_order_user.csv' AS line FIELDTERMINATOR '|'
WITH line
// Create User nodes and then create relationships between users and orders
MERGE (order:Order {order_id: toInteger(line.order_id)})
MERGE (user:User {user_id: toInteger(line.user_id)})
// Create relationships between users and orders, then set Order properties
CREATE(user)-[o:ORDERED]->(order)
SET order.order_number = toInteger(line.order_number),
order.order_day_of_week = toInteger(line.order_dow),
order.order_hour_of_day = toInteger(line.order_hour_of_day);
"""
g.run(query)
This is what the nodes and relationships we have created look like in Neo4j for a small subset of the data. Please use the legend in the top left corner only to determine the colours associated with the different node types (ie: ignore the numbers).
# Implements user-user collaborative filtering using the following steps:
# 1. For each user pair, compute Jaccard index
# 2. For each target user, select top k neighbours based on Jaccard index
# 3. Identify products purchased by the top k neighbours that have not been purchased by the target user
# 4. Rank these products by the number of purchasing neighbours
# 5. Return the top n recommendations for each user
def collaborative_filtering(graph, neighbourhood_size, num_recos):
    query = """
// Get user pairs and count of distinct products that they have both purchased
MATCH (u1:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)<-[:CONTAINS]-(:Order)<-[:ORDERED]-(u2:User)
WHERE u1 <> u2
WITH u1, u2, COUNT(DISTINCT p) as intersection_count
// Get count of all the distinct products that they have purchased between them
MATCH (u:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)
WHERE u in [u1, u2]
WITH u1, u2, intersection_count, COUNT(DISTINCT p) as union_count
// Compute Jaccard index
WITH u1, u2, intersection_count, union_count, (intersection_count*1.0/union_count) as jaccard_index
// Get top k neighbours based on Jaccard index
ORDER BY jaccard_index DESC, u2.user_id
WITH u1, COLLECT(u2)[0..{k}] as neighbours
WHERE LENGTH(neighbours) = {k} // only want users with enough neighbours
UNWIND neighbours as neighbour
WITH u1, neighbour
// Get top n recommendations from the selected neighbours
MATCH (neighbour)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product) // get all products bought by neighbour
WHERE not (u1)-[:ORDERED]->(:Order)-[:CONTAINS]->(p) // which target user has not already bought
WITH u1, p, COUNT(DISTINCT neighbour) as cnt // count neighbours who purchased product
ORDER BY u1.user_id, cnt DESC // sort by count desc
RETURN u1.user_id as user, COLLECT(p.product_name)[0..{n}] as recos // return top n products
"""
    recos = {}
    for row in graph.run(query, k=neighbourhood_size, n=num_recos):
        recos[row[0]] = row[1]
    return recos
Our collaborative filtering function expects 3 parameters: a graph database, the neighbourhood size and the number of products to recommend to each user. A reminder that our graph database, g, contains nodes and relationships pertaining to user orders. And as previously discussed, we have chosen k=10 as the neighbourhood size and n=10 as the number of products to recommend to each of our users. We now invoke our collaborative filtering function using these parameters.
%%time
recommendations = collaborative_filtering(g,10,10)
display(recommendations)
As can be seen above, our collaborative filtering function returns a dictionary of users and their top 10 recommended products. To see how we arrived at this output, let's break down our function using a specific example: user 4789. For reference, below are the products that this user has been recommended:
recommendations[4789]
The first main component of our collaborative filtering function identifies the top 10 neighbours for user 4789. It does so by creating user pairs, where u1 is always user 4789 and u2 is any other user who has purchased at least one product that u1 has also purchased. It then computes the Jaccard index for each user pair by taking the number of distinct products that u1 and u2 have purchased in common (intersection_count) and dividing it by the total number of distinct products that either of them has purchased (union_count). The 10 users with the highest Jaccard index are selected as user 4789's neighbourhood.
query = """
// Find users who purchased products in common with user 4789 and count the distinct shared products
MATCH (u1:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)<-[:CONTAINS]-(:Order)<-[:ORDERED]-(u2:User)
WHERE u1 <> u2
AND u1.user_id = {uid}
WITH u1, u2, COUNT(DISTINCT p) as intersection_count
// Get count of all the distinct products purchased by either user (the union)
MATCH (u:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)
WHERE u in [u1, u2]
WITH u1, u2, intersection_count, COUNT(DISTINCT p) as union_count
// Compute Jaccard index
WITH u1, u2, intersection_count, union_count, (intersection_count*1.0/union_count) as jaccard_index
// Get top k neighbours based on Jaccard index
ORDER BY jaccard_index DESC, u2.user_id
WITH u1, COLLECT([u2.user_id, jaccard_index, intersection_count, union_count])[0..{k}] as neighbours
WHERE LENGTH(neighbours) = {k} // only want to return users with enough neighbours
RETURN u1.user_id as user, neighbours
"""
neighbours = {}
for row in g.run(query, uid=4789, k=10):
    neighbours[row[0]] = row[1]
print("Labels for user 4789's neighbour list: user_id, jaccard_index, intersection_count, union count")
display(neighbours)
The second main component of our collaborative filtering function generates recommendations for user 4789 using the neighbours identified above. It does so by considering products that the neighbours have purchased which user 4789 has not already purchased. The function then counts the number of neighbours who have purchased each of the candidate products. The 10 products with the highest neighbour count are selected as recommendations for user 4789.
%%time
query = """
// Get top n recommendations for user 4789 from the selected neighbours
MATCH (u1:User),
(neighbour:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product) // get all products bought by neighbour
WHERE u1.user_id = {uid}
AND neighbour.user_id in {neighbours}
AND not (u1)-[:ORDERED]->(:Order)-[:CONTAINS]->(p) // which u1 has not already bought
WITH u1, p, COUNT(DISTINCT neighbour) as cnt // count times purchased by neighbours
ORDER BY u1.user_id, cnt DESC // and sort by count desc
RETURN u1.user_id as user, COLLECT([p.product_name,cnt])[0..{n}] as recos
"""
recos = {}
for row in g.run(query, uid=4789, neighbours=[42145,138203,87350,49441,187754,180461,120660,107931,73477,154852], n=10):
    recos[row[0]] = row[1]
print("Labels for user 4789's recommendations list: product, number of purchasing neighbours")
display(recos)
If we were to actually integrate our recommender system into a production environment, we would need a way to measure its performance. As mentioned, in the context of a user check-out application for an online grocer, the goal is to increase basket size by surfacing a short list of products that are as relevant as possible to the user. For this particular application, we could choose precision as the metric for evaluating our recommender's performance. Precision is the proportion of recommended products that the user actually went on to purchase. To determine overall recommender performance, we can average the precision values across all users in the system.
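A minimal sketch of that evaluation, assuming we have logged both the recommendations served and the products each user subsequently purchased (the users and product names below are hypothetical):
# Precision per user: fraction of recommended products the user actually purchased afterwards
def precision(recommended, purchased):
    return len(set(recommended) & set(purchased)) / len(recommended) if recommended else 0.0

served    = {4789: ['Banana', 'Organic Avocado'], 1234: ['Whole Milk', 'Strawberries']}
purchased = {4789: ['Banana'],                    1234: ['Whole Milk', 'Strawberries']}

avg_precision = sum(precision(served[u], purchased[u]) for u in served) / len(served)
print(avg_precision)   # (0.5 + 1.0) / 2 = 0.75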
We have demonstrated how to build a user-based recommender system leveraging the principles of user-user collaborative filtering. We've discussed the key concepts underlying this algorithm, from identifying neighbourhoods using a similarity metric, to generating recommendations for a user based on their neighbours' preferences. In addition, we have shown how easy and intuitive modeling connected data can be with a graph database. One final point worth noting: in real-world applications, we may want to implement non-personalized recommendation strategies for users who are new to the system and for those who have not yet made sufficient purchases. Strategies may include recommending top-selling products to new users, and, for the latter group, products identified as having high affinity with products that the user has already purchased. This can be done through association rules mining, also known as market basket analysis.
I was looking to run association analysis in Python using the apriori algorithm to derive rules of the form {A} -> {B}. However, I quickly discovered that it's not part of the standard Python machine learning libraries. Although there are some implementations that exist, I could not find one capable of handling large datasets. "Large" in my case was an orders dataset with 32 million records, containing 3.2 million unique orders and about 50K unique items (file size just over 1 GB). So, I decided to write my own implementation, leveraging the apriori algorithm to generate simple {A} -> {B} association rules. Since I only care about understanding relationships between any given pair of items, using apriori to get to item sets of size 2 is sufficient. I went through various iterations, splitting the data into multiple subsets just so I could get functions like crosstab and combinations to run on my machine with 8 GB of memory. :) But even with this approach, I could only process about 1800 items before my kernel would crash... And that's when I learned about the wonderful world of Python generators.
In a nutshell, a generator is a special type of function that returns an iterable sequence of items. However, unlike regular functions, which return all of their values at once (eg: returning all the elements of a list), a generator yields one value at a time. To get the next value in the sequence, we must ask for it, either by explicitly calling the built-in next() function on the generator, or implicitly via a for loop. This is a great property of generators because it means we don't have to store all of the values in memory at once. We can load and process one value at a time, discard it when finished, and move on to the next value. This feature makes generators perfect for creating item pairs and counting their frequency of co-occurrence. Here's a concrete example of what we're trying to accomplish:
Get all possible item pairs for a given order
eg: order 1: apple, egg, milk --> item pairs: {apple, egg}, {apple, milk}, {egg, milk}
order 2: egg, milk --> item pairs: {egg, milk}
Count the number of times each item pair appears
eg: {apple, egg}: 1
{apple, milk}: 1
{egg, milk}: 2
Here's the generator that implements the above tasks:
import numpy as np
from itertools import combinations, groupby
from collections import Counter
# Sample data
orders = np.array([[1,'apple'], [1,'egg'], [1,'milk'], [2,'egg'], [2,'milk']], dtype=object)
# Generator that yields item pairs, one at a time
def get_item_pairs(order_item):
    # For each order, generate a list of items in that order
    for order_id, order_object in groupby(order_item, lambda x: x[0]):
        item_list = [item[1] for item in order_object]
        # For each item list, generate item pairs, one at a time
        for item_pair in combinations(item_list, 2):
            yield item_pair
# Counter iterates through the item pairs returned by our generator and keeps a tally of their occurrence
Counter(get_item_pairs(orders))
get_item_pairs() generates a list of items for each order and produces item pairs for that order, one pair at a time. The first item pair is passed to Counter which keeps track of the number of times an item pair occurs. The next item pair is taken, and again, passed to Counter. This process continues until there are no more item pairs left. With this approach, we end up not using much memory as item pairs are discarded after the count is updated.
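To see this lazy evaluation in action, here is a short walk-through that pulls the first couple of pairs from the generator by hand, using the sample 'orders' array defined above:
# Pull item pairs from the generator one at a time
pair_gen = get_item_pairs(orders)
print(next(pair_gen))   # ('apple', 'egg')   -- first pair from order 1
print(next(pair_gen))   # ('apple', 'milk')  -- next pair; nothing else has been computed yet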
Apriori is an algorithm used to identify frequent item sets (in our case, item pairs). It does so using a "bottom-up" approach, first identifying individual items that satisfy a minimum occurrence threshold. It then extends the item sets, adding one item at a time and checking whether the resulting item set still satisfies the specified threshold. The algorithm stops when there are no more items to add that meet the minimum occurrence requirement. Here's an example of apriori in action, assuming a minimum occurrence threshold of 3:
order 1: apple, egg, milk
order 2: carrot, milk
order 3: apple, egg, carrot
order 4: apple, egg
order 5: apple, carrot
Iteration 1: Count the number of times each item occurs
item set         occurrence count
{apple}          4
{egg}            3
{carrot}         3
{milk}           2
{milk} is eliminated because it does not meet the minimum occurrence threshold.
Iteration 2: Build item sets of size 2 using the remaining items from Iteration 1 (ie: apple, egg, carrot)
item set         occurrence count
{apple, egg}     3
{apple, carrot}  2
{egg, carrot}    1
Only {apple, egg} meets the minimum occurrence threshold, and since no larger item sets can be built from a single qualifying pair, the algorithm stops.
If we had more orders and items, we could continue iterating, building item sets consisting of more than two items. For the problem we are trying to solve (ie: finding relationships between pairs of items), implementing apriori up to item sets of size 2 is sufficient.
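As a minimal sketch (using the five toy orders above, not the Instacart data), the two iterations can be expressed with Counter and combinations like so:
from collections import Counter
from itertools import combinations

# The five toy orders from the worked example above
toy_orders = {1: ['apple', 'egg', 'milk'],
              2: ['carrot', 'milk'],
              3: ['apple', 'egg', 'carrot'],
              4: ['apple', 'egg'],
              5: ['apple', 'carrot']}
min_count = 3

# Iteration 1: count individual items and keep those meeting the minimum occurrence threshold
item_counts = Counter(item for items in toy_orders.values() for item in items)
frequent_items = {item for item, count in item_counts.items() if count >= min_count}

# Iteration 2: count item sets of size 2 built only from the surviving items
pair_counts = Counter(pair
                      for items in toy_orders.values()
                      for pair in combinations(sorted(set(items) & frequent_items), 2))
print({pair: count for pair, count in pair_counts.items() if count >= min_count})
# {('apple', 'egg'): 3}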
Once the item sets have been generated using apriori, we can start mining association rules. Given that we are only looking at item sets of size 2, the association rules we will generate will be of the form {A} -> {B}. One common application of these rules is in the domain of recommender systems, where customers who purchased item A are recommended item B.
Here are 3 key metrics to consider when evaluating association rules:
support
This is the percentage of orders that contains the item set. In the example above, there are
5 orders in total and {apple,egg} occurs in 3 of them, so:
support{apple,egg} = 3/5 or 60%
The minimum support threshold required by apriori can be set based on knowledge of your domain. In this grocery dataset for example, since there could be thousands of distinct items and an order can contain only a small fraction of these items, setting the support threshold to 0.01% may be reasonable.
confidence
Given two items, A and B, confidence measures the percentage of times that item B is purchased,
given that item A was purchased. This is expressed as:
confidence{A->B} = support{A,B} / support{A}
Confidence values range from 0 to 1, where 0 indicates that B is never purchased when A is purchased, and 1 indicates that B is always purchased whenever A is purchased. Note that the confidence measure is directional. This means that we can also compute the percentage of times that item A is purchased, given that item B was purchased:
confidence{B->A} = support{A,B} / support{B}
In our example, the percentage of times that egg is purchased, given that apple was purchased is:
confidence{apple->egg} = support{apple,egg} / support{apple}
= (3/5) / (4/5)
= 0.75 or 75%
A confidence value of 0.75 implies that out of all orders that contain apple, 75% of them also contain egg. Now, we look at the confidence measure in the opposite direction (ie: egg->apple):
confidence{egg->apple} = support{apple,egg} / support{egg}
= (3/5) / (3/5)
= 1 or 100%
Here we see that all of the orders that contain egg also contain apple. But, does this mean that there is a relationship between these two items, or are they occurring together in the same orders simply by chance? To answer this question, we look at another measure which takes into account the popularity of both items.
lift
Given two items, A and B, lift indicates whether there is a relationship between A and B, or whether
the two items are occurring together in the same orders simply by chance (ie: at random). Unlike the confidence
metric whose value may vary depending on direction (eg: confidence{A->B} may be different from confidence{B->A}),
lift has no direction. This means that the lift{A,B} is always equal to the lift{B,A}:
lift{A,B} = lift{B,A} = support{A,B} / (support{A} * support{B})
In our example, we compute lift as follows:
lift{apple,egg} = lift{egg,apple} = support{apple,egg} / (support{apple} * support{egg})
= (3/5) / (4/5 * 3/5)
= 1.25
One way to understand lift is to think of the denominator as the likelihood that A and B will appear in the same order if there was no relationship between them. In the example above, if apple occurred in 80% of the orders and egg occurred in 60% of the orders, then if there was no relationship between them, we would expect both of them to show up together in the same order 48% of the time (ie: 80% * 60%). The numerator, on the other hand, represents how often apple and egg actually appear together in the same order. In this example, that is 60% of the time. Taking the numerator and dividing it by the denominator, we get to how many more times apple and egg actually appear in the same order, compared to if there was no relationship between them (ie: that they are occurring together simply at random).
In summary, lift can take on the following values:
* lift = 1 implies no relationship between A and B.
(ie: A and B occur together only by chance)
* lift > 1 implies that there is a positive relationship between A and B.
(ie: A and B occur together more often than random)
* lift < 1 implies that there is a negative relationship between A and B.
(ie: A and B occur together less often than random)
In our example, apple and egg occur together 1.25 times more often than we would expect if they were co-occurring purely by chance, so we conclude that there is a positive relationship between them.
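To tie the three metrics together, here is a quick sanity check of the toy numbers in plain Python:
# Toy example: 5 orders; apple appears in 4, egg in 3, and {apple, egg} in 3
support_apple     = 4 / 5                                            # 0.80
support_egg       = 3 / 5                                            # 0.60
support_apple_egg = 3 / 5                                            # 0.60

confidence_apple_to_egg = support_apple_egg / support_apple          # 0.75
confidence_egg_to_apple = support_apple_egg / support_egg            # 1.00
lift_apple_egg = support_apple_egg / (support_apple * support_egg)   # 1.25

print(confidence_apple_to_egg, confidence_egg_to_apple, lift_apple_egg)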
Armed with knowledge of apriori and association rules mining, let's dive into the data and code to see what relationships we unravel!
Instacart, an online grocer, has graciously made some of their datasets accessible to the public. The order and product datasets that we will be using can be downloaded from the link below, along with the data dictionary:
“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on September 1, 2017.
import pandas as pd
import numpy as np
import sys
from itertools import combinations, groupby
from collections import Counter
# Function that returns the size of an object in MB
def size(obj):
    return "{0:.2f} MB".format(sys.getsizeof(obj) / (1000 * 1000))
orders = pd.read_csv('order_products__prior.csv')
print('orders -- dimensions: {0}; size: {1}'.format(orders.shape, size(orders)))
display(orders.head())
# Convert from DataFrame to a Series, with order_id as index and item_id as value
orders = orders.set_index('order_id')['product_id'].rename('item_id')
display(orders.head(10))
type(orders)
print('orders -- dimensions: {0}; size: {1}; unique_orders: {2}; unique_items: {3}'
.format(orders.shape, size(orders), len(orders.index.unique()), len(orders.value_counts())))
# Returns frequency counts for items and item pairs
def freq(iterable):
    if type(iterable) == pd.core.series.Series:
        return iterable.value_counts().rename("freq")
    else:
        return pd.Series(Counter(iterable)).rename("freq")
# Returns number of unique orders
def order_count(order_item):
    return len(set(order_item.index))
# Returns generator that yields item pairs, one at a time
def get_item_pairs(order_item):
    order_item = order_item.reset_index().values
    for order_id, order_object in groupby(order_item, lambda x: x[0]):
        item_list = [item[1] for item in order_object]
        for item_pair in combinations(item_list, 2):
            yield item_pair
# Returns frequency and support associated with item
def merge_item_stats(item_pairs, item_stats):
    return (item_pairs
            .merge(item_stats.rename(columns={'freq': 'freqA', 'support': 'supportA'}), left_on='item_A', right_index=True)
            .merge(item_stats.rename(columns={'freq': 'freqB', 'support': 'supportB'}), left_on='item_B', right_index=True))
# Returns name associated with item
def merge_item_name(rules, item_name):
    columns = ['itemA','itemB','freqAB','supportAB','freqA','supportA','freqB','supportB',
               'confidenceAtoB','confidenceBtoA','lift']
    rules = (rules
             .merge(item_name.rename(columns={'item_name': 'itemA'}), left_on='item_A', right_on='item_id')
             .merge(item_name.rename(columns={'item_name': 'itemB'}), left_on='item_B', right_on='item_id'))
    return rules[columns]
def association_rules(order_item, min_support):
    print("Starting order_item: {:22d}".format(len(order_item)))

    # Calculate item frequency and support
    item_stats = freq(order_item).to_frame("freq")
    item_stats['support'] = item_stats['freq'] / order_count(order_item) * 100

    # Filter from order_item items below min support
    qualifying_items = item_stats[item_stats['support'] >= min_support].index
    order_item = order_item[order_item.isin(qualifying_items)]
    print("Items with support >= {}: {:15d}".format(min_support, len(qualifying_items)))
    print("Remaining order_item: {:21d}".format(len(order_item)))

    # Filter from order_item orders with less than 2 items
    order_size = freq(order_item.index)
    qualifying_orders = order_size[order_size >= 2].index
    order_item = order_item[order_item.index.isin(qualifying_orders)]
    print("Remaining orders with 2+ items: {:11d}".format(len(qualifying_orders)))
    print("Remaining order_item: {:21d}".format(len(order_item)))

    # Recalculate item frequency and support
    item_stats = freq(order_item).to_frame("freq")
    item_stats['support'] = item_stats['freq'] / order_count(order_item) * 100

    # Get item pairs generator
    item_pair_gen = get_item_pairs(order_item)

    # Calculate item pair frequency and support
    item_pairs = freq(item_pair_gen).to_frame("freqAB")
    item_pairs['supportAB'] = item_pairs['freqAB'] / len(qualifying_orders) * 100
    print("Item pairs: {:31d}".format(len(item_pairs)))

    # Filter from item_pairs those below min support
    item_pairs = item_pairs[item_pairs['supportAB'] >= min_support]
    print("Item pairs with support >= {}: {:10d}\n".format(min_support, len(item_pairs)))

    # Create table of association rules and compute relevant metrics
    item_pairs = item_pairs.reset_index().rename(columns={'level_0': 'item_A', 'level_1': 'item_B'})
    item_pairs = merge_item_stats(item_pairs, item_stats)

    item_pairs['confidenceAtoB'] = item_pairs['supportAB'] / item_pairs['supportA']
    item_pairs['confidenceBtoA'] = item_pairs['supportAB'] / item_pairs['supportB']
    item_pairs['lift'] = item_pairs['supportAB'] / (item_pairs['supportA'] * item_pairs['supportB'])

    # Return association rules sorted by lift in descending order
    return item_pairs.sort_values('lift', ascending=False)
%%time
rules = association_rules(orders, 0.01)
# Replace item ID with item name and display association rules
item_name = pd.read_csv('products.csv')
item_name = item_name.rename(columns={'product_id':'item_id', 'product_name':'item_name'})
rules_final = merge_item_name(rules, item_name).sort_values('lift', ascending=False)
display(rules_final)
From the output above, we see that the top associations are not surprising, with one flavor of an item being purchased alongside another flavor from the same item family (eg: Strawberry Chia Cottage Cheese with Blueberry Acai Cottage Cheese, Chicken Cat Food with Turkey Cat Food, etc). As mentioned, one common application of association rules mining is in the domain of recommender systems. Once item pairs have been identified as having a positive relationship, recommendations can be made to customers in order to increase sales. And hopefully, along the way, also introduce customers to items they never would have tried before or even imagined existed! If you wish to see the Python notebook corresponding to the code above, please click here.