Motivation ¶

Recommendation systems fall under two categories: personalized and non-personalized recommenders. My previous post on association rules mining is an example of a non-personalized recommender, as the recommendations generated are not tailored to a specific user. By contrast, a personalized recommender system takes into account user preferences in order to make recommendations for that user. There are various personalized recommender algorithms, and in this post, I will be implementing user-user collaborative filtering. This algorithm finds users similar to the target user in order to generate recommendations for the target user. And to add to the fun, I will be implementing the algorithm using a graph database rather than the more traditional approach of matrix factorization.

User-User Collaborative Filtering ¶

This personalized recommender algorithm assumes that past agreements predict future agreements. It uses the concept of similarity in order to identify users that are "like" the target user in terms of their preferences. Users identified as most similar to the target user become part of the target user's neighbourhood. Preferences of these neighbours are then used to generate recommendations for the target user.

Concretely, here are the steps we will be implementing to generate recommendations for an online grocer during the user check-out process:

1. Select a similarity metric to quantify likeness between users in the system.
2. For each user pair, compute the similarity metric.
3. For each target user, select the top k neighbours based on the similarity metric.
4. Identify products purchased by the top k neighbours that have not been purchased by the target user.
5. Rank these products by the number of purchasing neighbours.
6. Recommend the top n products to the target user.

In this particular demonstration, we are building a recommender system based on the preferences of 100 users. We would like the neighbourhood size, k, to be large enough to identify clear agreement between users, but not so large that we end up including users who are not very similar to the target user. Hence we choose k=10. Secondly, in the context of a check-out application for an online grocer, the goal is to increase basket size by surfacing products that are as relevant as possible, without overwhelming the user. Therefore we limit the number of recommendations to n=10 products per user.
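Before moving to the graph implementation, the six steps can be sketched in plain Python on a toy dataset. This is only an illustration under stated assumptions: the function name `recommend`, the toy `purchases` dictionary and the tie-breaking rules are hypothetical, the similarity used is the Jaccard index introduced in the next section, and neighbours with zero overlap are skipped. It is not the Neo4j implementation that follows.

```python
from itertools import combinations
from collections import Counter

def recommend(purchases, k, n):
    """purchases: dict of user -> set of products. Returns dict of user -> list of up to n products."""
    # Steps 1-2: compute the Jaccard index for every user pair
    sim = {u: {} for u in purchases}
    for u, v in combinations(purchases, 2):
        union = purchases[u] | purchases[v]
        score = len(purchases[u] & purchases[v]) / len(union) if union else 0.0
        sim[u][v] = sim[v][u] = score

    recos = {}
    for u, scores in sim.items():
        # Step 3: top k neighbours by similarity (assumption: skip zero overlap, break ties by user id)
        candidates = [v for v in scores if scores[v] > 0]
        neighbours = sorted(candidates, key=lambda v: (-scores[v], v))[:k]
        # Steps 4-5: for products the target has not bought, count the purchasing neighbours
        counts = Counter(p for v in neighbours for p in purchases[v] - purchases[u])
        # Step 6: keep the top n products (assumption: ties broken alphabetically)
        recos[u] = sorted(counts, key=lambda p: (-counts[p], p))[:n]
    return recos

# Hypothetical toy data: four users and their purchased products
purchases = {
    1: {"banana", "limes", "spinach"},
    2: {"banana", "limes", "garlic"},
    3: {"banana", "spinach", "garlic", "hummus"},
    4: {"tofu"},
}
print(recommend(purchases, k=2, n=2))
```

In this toy run, user 1's neighbours are users 2 and 3; both bought garlic and only user 3 bought hummus, so garlic ranks ahead of hummus for user 1. User 4 shares nothing with anyone and receives no recommendations.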

Jaccard Index ¶

The Jaccard index measures similarity between two sets, with values ranging from 0 to 1. A value of 0 indicates that the two sets have no elements in common, while a value of 1 implies that the two sets are identical. Given two sets A and B, the Jaccard index is computed as follows:

$$J(A,B) = \frac{| A \cap B |} {| A \cup B |}$$

The numerator is the number of elements that A and B have in common, while the denominator is the number of distinct elements that appear in at least one of the two sets.
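As a quick worked example, the formula maps directly onto Python's set operations. The function name `jaccard_index` and the two baskets below are hypothetical, and returning 0.0 for two empty sets is a convention chosen for this sketch (the index is undefined in that case).

```python
def jaccard_index(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for this sketch: two empty sets score 0
    return len(a & b) / len(union)

# Hypothetical baskets for two users
basket_a = {"banana", "limes", "spinach", "garlic"}
basket_b = {"banana", "limes", "hummus"}

# 2 shared products / 5 distinct products overall
print(jaccard_index(basket_a, basket_b))  # 0.4
```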

In this implementation of user-user collaborative filtering, we will be using the Jaccard index to measure similarity between two users. This is primarily due to the sparse nature of the data, where there is a large number of products and each user purchases only a small fraction of those products. If we were to model our user preferences using binary attributes (i.e., 1 if the user purchased product X and 0 if the user did not), we would have a lot of 0s and very few 1s. The Jaccard index is effective in this case, as it ignores attributes that have a value of 0 for both users. In other words, when computing similarity between two users, we only consider products that have been purchased by at least one of the two users. Another great thing about the Jaccard index is that it accounts for cases where one user purchases significantly more products than the user they are being compared to. A large basket can result in a higher overlap of products purchased between the two, but this does not necessarily mean that the two users are similar. In the equation above, the denominator (the union) will be large in such a case, resulting in a smaller Jaccard index value.
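To make the sparsity argument concrete, the sketch below compares the Jaccard index with a simple matching score that also counts 0-0 agreements. The simple matching score is brought in here purely for contrast and is not used elsewhere in this post; the catalogue size and purchase patterns are hypothetical.

```python
def simple_matching(x, y):
    """Fraction of positions where two equal-length binary vectors agree (0-0 or 1-1)."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def jaccard_binary(x, y):
    """Jaccard index on binary vectors: 1-1 matches over positions where either value is 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0

# Hypothetical catalogue of 1,000 products; each user bought 3, none in common
u1 = [1 if i in (0, 1, 2) else 0 for i in range(1000)]
u2 = [1 if i in (3, 4, 5) else 0 for i in range(1000)]

print(simple_matching(u1, u2))  # 0.994, inflated by the shared 0s
print(jaccard_binary(u1, u2))   # 0.0, no purchased product in common

# A heavy buyer (300 products) fully covers a light buyer (3 products),
# yet the large union keeps the Jaccard index small
light = [1 if i < 3 else 0 for i in range(1000)]
heavy = [1 if i < 300 else 0 for i in range(1000)]
print(jaccard_binary(light, heavy))  # 0.01
```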

Graph Database ¶

A graph database is a way of representing and storing data. Unlike a relational database, which represents data as rows and columns, a graph database represents data using nodes, edges and properties. This representation makes graph databases conducive to storing data that is inherently connected. For our implementation of user-user collaborative filtering, we are interested in the relationships that exist between users based on their preferences. In particular, we have a system composed of users who place orders, and these orders contain products that are in a department and in an aisle. These relationships are easily depicted using a property graph data model:

Property Graph Model

The nodes represent entities in our system: User, Order, Product, Department and Aisle. The edges represent relationships: ORDERED, CONTAINS, IN_DEPARTMENT and IN_AISLE. The attributes in the nodes represent properties (e.g., a User has the property "user_id"). Mapping to natural language, we can generally think of nodes as nouns, edges as verbs and properties as adjectives.

In this particular graph, each node type contains the same set of properties (e.g., all Orders have an order_id, order_number, order_day_of_week and order_hour_of_day), but one of the interesting properties of graph databases is that they are schema-free. This means that a node can have an arbitrary set of properties. For example, we can have two user nodes, u1 and u2, with u1 having properties such as name, address and phone number, and u2 having properties such as name and email. The concept of a schema-free model also applies to the relationships that exist in the graph.
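As a rough mental model only, a property graph can be mimicked with plain Python dictionaries and tuples. This is not how Neo4j stores data, and every key and value below (the names, the email address, the helper `products_ordered_by`) is hypothetical. The point is simply that u1 and u2 share a label but carry different property sets, and that following edges answers questions such as "which products did this user order?".

```python
# Nodes: a label plus an arbitrary dictionary of properties (no fixed schema)
nodes = {
    "u1": {"label": "User", "props": {"name": "Ann", "address": "1 Main St", "phone": "555-0100"}},
    "u2": {"label": "User", "props": {"name": "Bob", "email": "bob@example.com"}},
    "o1": {"label": "Order", "props": {"order_id": 1}},
    "p1": {"label": "Product", "props": {"product_id": 10, "product_name": "Banana"}},
}

# Edges: (source node, relationship type, target node)
edges = [
    ("u1", "ORDERED", "o1"),
    ("o1", "CONTAINS", "p1"),
]

def products_ordered_by(user_key):
    """Follow ORDERED then CONTAINS edges from a user node and return product names."""
    orders = {target for source, rel, target in edges if source == user_key and rel == "ORDERED"}
    return [
        nodes[target]["props"]["product_name"]
        for source, rel, target in edges
        if source in orders and rel == "CONTAINS"
    ]

print(products_ordered_by("u1"))
print(sorted(nodes["u1"]["props"]), sorted(nodes["u2"]["props"]))
```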

We will be implementing the property graph model above using Neo4j, arguably the most popular graph database today. The actual creation and querying of the database will be done using the Cypher query language, often thought of as SQL for graphs.

Input Dataset ¶

Similar to my post on association rules mining, we will once again be using data from Instacart. The datasets can be downloaded from the link below, along with the data dictionary:

“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on Oct 10, 2017.

Part 1: Data Preparation ¶

In [1]:

import pandas as pd
import numpy as np
import sys

In [2]:

# Utility functions 

# Returns the size of an object in MB
def size(obj):
    return "{0:.2f} MB".format(sys.getsizeof(obj) / (1000 * 1000))

# Displays dataframe dimensions, size and top 5 records
def inspect_df(df_name, df):
    print('{0} --- dimensions: {1};  size: {2}'.format(df_name, df.shape, size(df)))  
    display(df.head())
    
# Exports dataframe to CSV, the format for loading data into Neo4j 
def export_to_csv(df, out):
    df.to_csv(out, sep='|', columns=df.columns, index=False)

A. Load user order data ¶

Data from Instacart contains 3 to 99 orders per user. Inspecting the distribution of the number of orders per user, we see that users place 16 orders on average, with 75% of the user base placing at least 5 orders. For demonstration purposes, we will be running collaborative filtering on a random sample of 100 users who placed at least 5 orders.

In [9]:

min_orders   = 5     # minimum order count per user
sample_count = 100   # number of users to select randomly

# Load data from evaluation set "prior" (please see data dictionary for definition of 'eval_set') 
order_user           = pd.read_csv('orders.csv')
order_user           = order_user[order_user['eval_set'] == 'prior']

# Get distribution of number of orders per user
user_order_count     = order_user.groupby('user_id').agg({'order_id':'count'}).rename(columns={'order_id':'num_orders'}).reset_index()
print('Distribution of number of orders per user:')
display(user_order_count['num_orders'].describe())

# Select users who purchased at least 'min_orders'
user_order_atleast_x = user_order_count[user_order_count['num_orders'] >= min_orders]

# For reproducibility, set seed before taking a random sample
np.random.seed(1111)
user_sample          = np.random.choice(user_order_atleast_x['user_id'], sample_count, replace=False)

# Subset 'order_user' to include records associated with the 100 randomly selected users
order_user           = order_user[order_user['user_id'].isin(user_sample)]
order_user           = order_user[['order_id','user_id','order_number','order_dow','order_hour_of_day']]
inspect_df('order_user', order_user)

Distribution of number of orders per user:

count    206209.000000
mean         15.590367
std          16.654774
min           3.000000
25%           5.000000
50%           9.000000
75%          19.000000
max          99.000000
Name: num_orders, dtype: float64


order_user --- dimensions: (1901, 5);  size: 0.09 MB

	order_id	user_id	order_number	order_dow	order_hour_of_day
11334	2808127	701	1	2	14
11335	2677145	701	2	3	11
11336	740361	701	3	1	13
11337	2866491	701	4	3	12
11338	1676999	701	5	4	11

B. Load order details data ¶

In [10]:

# Load orders associated with our 100 selected users, along with the products contained in those orders
order_product = pd.read_csv('order_products__prior.csv')
order_product = order_product[order_product['order_id'].isin(order_user.order_id.unique())][['order_id','product_id']]
inspect_df('order_product', order_product)

order_product --- dimensions: (19840, 2);  size: 0.48 MB

	order_id	product_id
1855	209	39409
1856	209	20842
1857	209	16965
1858	209	8021
1859	209	23001

C. Load product data ¶

In [11]:

# Load products purchased by our 100 selected users
products = pd.read_csv('products.csv')
products = products[products['product_id'].isin(order_product.product_id.unique())]
inspect_df('products', products)

products --- dimensions: (3959, 4);  size: 0.46 MB

	product_id	product_name	aisle_id	department_id
0	1	Chocolate Sandwich Cookies	61	19
33	34	Peanut Butter Cereal	121	14
44	45	European Cucumber	83	4
98	99	Local Living Butter Lettuce	83	4
115	116	English Muffins	93	3

D. Load aisle data ¶

In [12]:

# Load entire aisle data as it contains the names related to the aisle IDs from the 'products' data
aisles = pd.read_csv('aisles.csv')
inspect_df('aisles', aisles)

aisles --- dimensions: (134, 2);  size: 0.01 MB

	aisle_id	aisle
0	1	prepared soups salads
1	2	specialty cheeses
2	3	energy granola bars
3	4	instant foods
4	5	marinades meat preparation

E. Load department data ¶

In [13]:

# Load entire department data as it contains the names related to the department IDs from the 'products' data
departments = pd.read_csv('departments.csv')
inspect_df('departments', departments)

departments --- dimensions: (21, 2);  size: 0.00 MB

	department_id	department
0	1	frozen
1	2	other
2	3	bakery
3	4	produce
4	5	alcohol

F. Export dataframes to CSV, which in turn will be loaded into Neo4j ¶

In [14]:

export_to_csv(order_user,    '~/neo4j_instacart/import/neo4j_order_user.csv')
export_to_csv(order_product, '~/neo4j_instacart/import/neo4j_order_product.csv')    
export_to_csv(products,      '~/neo4j_instacart/import/neo4j_products.csv')
export_to_csv(aisles,        '~/neo4j_instacart/import/neo4j_aisles.csv')
export_to_csv(departments,   '~/neo4j_instacart/import/neo4j_departments.csv')

Part 2: Create Neo4j Graph Database ¶

A. Set up authentication and connection to Neo4j ¶

In [15]:

# py2neo allows us to work with Neo4j from within Python
from py2neo import authenticate, Graph

# Set up authentication parameters
authenticate("localhost:7474", "neo4j", "xxxxxxxx") 

# Connect to authenticated graph database
g = Graph("http://localhost:7474/db/data/")

B. Start with an empty database, then create constraints to ensure uniqueness of nodes ¶

In [16]:

# Each time this notebook is run, we start with an empty graph database
g.run("MATCH (n) DETACH DELETE n;")    

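# We drop and recreate our node constraints
# (note: DROP CONSTRAINT raises an error if the constraint does not exist yet, e.g. on a first run)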
g.run("DROP CONSTRAINT ON (order:Order)             ASSERT order.order_id            IS UNIQUE;")
g.run("DROP CONSTRAINT ON (user:User)               ASSERT user.user_id              IS UNIQUE;")
g.run("DROP CONSTRAINT ON (product:Product)         ASSERT product.product_id        IS UNIQUE;")
g.run("DROP CONSTRAINT ON (aisle:Aisle)             ASSERT aisle.aisle_id            IS UNIQUE;")
g.run("DROP CONSTRAINT ON (department:Department)   ASSERT department.department_id  IS UNIQUE;")

g.run("CREATE CONSTRAINT ON (order:Order)           ASSERT order.order_id            IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (user:User)             ASSERT user.user_id              IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (product:Product)       ASSERT product.product_id        IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (aisle:Aisle)           ASSERT aisle.aisle_id            IS UNIQUE;")
g.run("CREATE CONSTRAINT ON (department:Department) ASSERT department.department_id  IS UNIQUE;")

Out[16]:

<py2neo.database.Cursor at 0x109ea3e10>

C. Load product data into Neo4j ¶

In [19]:

query = """
        // Load and commit every 500 records
        USING PERIODIC COMMIT 500 
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_products.csv' AS line FIELDTERMINATOR '|' 
        WITH line 
        
        // Create Product, Aisle and Department nodes
        CREATE (product:Product {product_id: toInteger(line.product_id), product_name: line.product_name}) 
        MERGE  (aisle:Aisle {aisle_id: toInteger(line.aisle_id)}) 
        MERGE  (department:Department {department_id: toInteger(line.department_id)}) 

        // Create relationships between products and aisles & products and departments 
        CREATE (product)-[:IN_AISLE]->(aisle) 
        CREATE (product)-[:IN_DEPARTMENT]->(department);
        """

g.run(query)

Out[19]:

<py2neo.database.Cursor at 0x10a9e4208>

D. Load aisle data into Neo4j ¶

In [137]:

query = """
        // Aisle data is very small, so there is no need to do periodic commits
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_aisles.csv' AS line FIELDTERMINATOR '|' 
        WITH line 
        
        // For each Aisle node, set property 'aisle_name' 
        MATCH (aisle:Aisle {aisle_id: toInteger(line.aisle_id)}) 
        SET aisle.aisle_name = line.aisle;
        """

g.run(query)

Out[137]:

<py2neo.database.Cursor at 0x1101457b8>

E. Load department data into Neo4j ¶

In [138]:

query = """
        // Department data is very small, so there is no need to do periodic commits
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_departments.csv' AS line FIELDTERMINATOR '|' 
        WITH line
        
        // For each Department node, set property 'department_name' 
        MATCH (department:Department {department_id: toInteger(line.department_id)}) 
        SET department.department_name = line.department;
        """

g.run(query)

Out[138]:

<py2neo.database.Cursor at 0x11751acf8>

F. Load order details data into Neo4j ¶

In [139]:

query = """
        // Load and commit every 500 records        
        USING PERIODIC COMMIT 500
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_order_product.csv' AS line FIELDTERMINATOR '|'
        WITH line
        
        // Create Order nodes and then create relationships between orders and products
        MERGE (order:Order {order_id: toInteger(line.order_id)})
        MERGE (product:Product {product_id: toInteger(line.product_id)})
        CREATE (order)-[:CONTAINS]->(product);
        """

g.run(query)

Out[139]:

<py2neo.database.Cursor at 0x110145160>

G. Load user order data into Neo4j ¶

In [140]:

query = """
        // Load and commit every 500 records 
        USING PERIODIC COMMIT 500
        LOAD CSV WITH HEADERS FROM 'file:///neo4j_order_user.csv' AS line FIELDTERMINATOR '|'
        WITH line
        
        // Match the Order nodes created in the previous step and create User nodes
        MERGE (order:Order {order_id: toInteger(line.order_id)})
        MERGE (user:User   {user_id:  toInteger(line.user_id)})

        // Create relationships between users and orders, then set Order properties
        CREATE(user)-[o:ORDERED]->(order)              
        SET order.order_number = toInteger(line.order_number),
            order.order_day_of_week = toInteger(line.order_dow), 
            order.order_hour_of_day = toInteger(line.order_hour_of_day);
        """

g.run(query)

Out[140]:

<py2neo.database.Cursor at 0x1149251d0>

H. What our graph looks like ¶

This is what the nodes and relationships we have created look like in Neo4j for a small subset of the data. Please use the legend in the top-left corner only to determine the colours associated with the different nodes (i.e., ignore the numbers).

Instacart Graph

Part 3: Implement User-User Collaborative Filtering Algorithm ¶

In [221]:

# Implements user-user collaborative filtering using the following steps:
#   1. For each user pair, compute Jaccard index
#   2. For each target user, select top k neighbours based on Jaccard index
#   3. Identify products purchased by the top k neighbours that have not been purchased by the target user
#   4. Rank these products by the number of purchasing neighbours
#   5. Return the top n recommendations for each user

def collaborative_filtering(graph, neighbourhood_size, num_recos):

    query = """
           // Get user pairs and count of distinct products that they have both purchased
           MATCH (u1:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)<-[:CONTAINS]-(:Order)<-[:ORDERED]-(u2:User)
           WHERE u1 <> u2
           WITH u1, u2, COUNT(DISTINCT p) as intersection_count

           // Get count of all the distinct products that they have purchased between them
           MATCH (u:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)
           WHERE u in [u1, u2]
           WITH u1, u2, intersection_count, COUNT(DISTINCT p) as union_count

           // Compute Jaccard index
           WITH u1, u2, intersection_count, union_count, (intersection_count*1.0/union_count) as jaccard_index

           // Get top k neighbours based on Jaccard index
           ORDER BY jaccard_index DESC, u2.user_id
           WITH u1, COLLECT(u2)[0..{k}] as neighbours
           WHERE SIZE(neighbours) = {k}                                                // only want users with enough neighbours
           UNWIND neighbours as neighbour
           WITH u1, neighbour

           // Get top n recommendations from the selected neighbours
           MATCH (neighbour)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)             // get all products bought by neighbour
           WHERE not (u1)-[:ORDERED]->(:Order)-[:CONTAINS]->(p)                        // which target user has not already bought
           WITH u1, p, COUNT(DISTINCT neighbour) as cnt                                // count neighbours who purchased product
           ORDER BY u1.user_id, cnt DESC                                               // sort by count desc
           RETURN u1.user_id as user, COLLECT(p.product_name)[0..{n}] as recos         // return top n products
           """

    recos = {}
    for row in graph.run(query, k=neighbourhood_size, n=num_recos):
        recos[row[0]] = row[1]

    return recos

Part 4: Execute User-User Collaborative Filtering ¶

Our collaborative filtering function expects 3 parameters: a graph database, the neighbourhood size and the number of products to recommend to each user. Recall that our graph database, g, contains nodes and relationships pertaining to user orders. As previously discussed, we have chosen k=10 as the neighbourhood size and n=10 as the number of products to recommend to each of our users. We now invoke our collaborative filtering function using these parameters.

In [277]:

%%time
recommendations = collaborative_filtering(g,10,10)
display(recommendations)

{701: ['Strawberries',
  'Organic Zucchini',
  'Organic Strawberries',
  'Limes',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Baby Carrots',
  'Organic Black Beans',
  'Organic Fuji Apple',
  'Red Vine Tomato'],
 1562: ['Organic Whole String Cheese',
  'Honeycrisp Apple',
  'Organic Strawberries',
  'Sparkling Water Grapefruit',
  'Organic Zucchini',
  'Organic Yellow Onion',
  'Organic Ginger Root',
  'Asparagus',
  'Salted Butter',
  'Whole Almonds'],
 4789: ['Organic Granny Smith Apple',
  'Limes',
  'Organic Green Cabbage',
  'Organic Cilantro',
  'Creamy Almond Butter',
  'Corn Tortillas',
  'Organic Grape Tomatoes',
  'Unsweetened Almondmilk',
  'Organic Blackberries',
  'Organic Lacinato (Dinosaur) Kale'],
 5225: ['Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Strawberries',
  'Organic Blueberries',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Honeycrisp Apple',
  'Sweet Kale Salad Mix',
  'Banana',
  'Green Onions'],
 5939: ['Organic Lemon',
  'Organic Kiwi',
  'Fresh Cauliflower',
  'Organic Red Onion',
  'Organic Small Bunch Celery',
  'Organic Raspberries',
  'Banana',
  'Organic Garlic',
  'Organic Large Extra Fancy Fuji Apple',
  'Frozen Organic Wild Blueberries'],
 6043: ['Organic Garlic',
  'Organic Zucchini',
  'Bag of Organic Bananas',
  'Organic Yellow Onion',
  'Large Lemon',
  'Small Hass Avocado',
  'Organic Cilantro',
  'Organic Tomato Paste',
  'Organic Blueberries',
  'Organic Grape Tomatoes'],
 6389: ['Organic Zucchini',
  'Half & Half',
  'Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Organic Yellow Onion',
  'Banana',
  'Organic Garlic',
  'Penne Rigate',
  'Organic Garnet Sweet Potato (Yam)',
  'Organic Raspberries'],
 7968: ['Strawberries',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Organic Lemon',
  '100% Recycled Paper Towels',
  'Organic Blueberries',
  'Organic Garlic',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Grape Tomatoes'],
 12906: ['Organic Garlic',
  'Bag of Organic Bananas',
  'Organic Raspberries',
  'Organic Black Beans',
  'Free & Clear Natural Laundry Detergent For Sensitive Skin',
  'Organic Blueberries',
  'Original Hummus',
  'Organic Hass Avocado',
  'Organic Red Onion',
  'Organic Zucchini'],
 24670: ['Organic Blueberries',
  'Organic Baby Carrots',
  'Organic Raspberries',
  'Organic Hass Avocado',
  'Organic Garlic',
  'Orange Bell Pepper',
  'Organic Italian Parsley Bunch',
  'Organic Cilantro',
  'Honeycrisp Apple',
  'Organic Baby Spinach'],
 25442: ['Organic Blueberries',
  'Organic Red Onion',
  'Organic Peeled Whole Baby Carrots',
  'Organic Garlic',
  'Organic Baby Arugula',
  'Jalapeno Peppers',
  'Large Lemon',
  'Limes',
  'Bunched Cilantro',
  'Organic Avocado'],
 25490: ['Bag of Organic Bananas',
  'Organic Garlic',
  'Organic Strawberries',
  'Banana',
  'Extra Virgin Olive Oil',
  'Organic Avocado',
  'Organic Navel Orange',
  'Carrots',
  'Small Hass Avocado',
  'Organic Hass Avocado'],
 26277: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Cereal',
  'Pure Vanilla Extract',
  'Hass Avocado Variety',
  'Peaches',
  'Original Beef Jerky',
  'Unsweetened Vanilla Almond Breeze',
  'Organic Lemonade'],
 32976: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Apple Honeycrisp Organic',
  'Peanut Butter Creamy With Salt',
  'Organic Baby Spinach',
  'Organic Grape Tomatoes',
  'Organic Zucchini',
  'Seedless Red Grapes',
  'Organic Yellow Onion'],
 37120: ['Organic Strawberries',
  'Organic Baby Spinach',
  'Strawberries',
  'Sour Cream',
  'Organic Avocado',
  'Banana',
  'Carrots',
  'Organic Baby Carrots',
  'Limes',
  'Russet Potato'],
 40286: ['Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Banana',
  'Organic Yellow Onion',
  'Sparkling Water Grapefruit',
  'Organic Strawberries',
  'Large Lemon',
  'Limes',
  'Organic Hass Avocado',
  'Asparagus'],
 42145: ['Organic Baby Spinach',
  'Large Lemon',
  'Organic Baby Carrots',
  'Fresh Cauliflower',
  'Organic Whole Milk',
  'Organic Large Extra Fancy Fuji Apple',
  'Bunched Cilantro',
  'Organic Kiwi',
  'Organic Ginger Root',
  'Grated Parmesan'],
 43902: ['Large Lemon',
  'Banana',
  'Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Organic Strawberries',
  'Yellow Onions',
  'Organic Yellow Onion',
  'Organic Granny Smith Apple',
  'Limes',
  'Organic Raspberries'],
 45067: ['Organic Yellow Onion',
  'Limes',
  'Organic Grape Tomatoes',
  'Organic Black Beans',
  'Cucumber Kirby',
  'Bunched Cilantro',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Organic Avocado',
  'Asparagus'],
 47838: ['Organic Ginger Root',
  'Organic Lacinato (Dinosaur) Kale',
  'Organic Baby Carrots',
  'Organic Italian Parsley Bunch',
  'Banana',
  'Organic Cucumber',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Avocado',
  'Limes',
  'Organic Zucchini'],
 49441: ['Organic Yellow Onion',
  'Organic Hass Avocado',
  'Honeycrisp Apple',
  'White Corn',
  'Organic Baby Carrots',
  'Organic Whole String Cheese',
  'Red Vine Tomato',
  'Extra Virgin Olive Oil',
  'Pineapple Chunks',
  'Organic Peeled Whole Baby Carrots'],
 50241: ['Organic Raspberries',
  'Yellow Onions',
  'Organic Baby Spinach',
  'Red Vine Tomato',
  'Organic Granny Smith Apple',
  'Bunched Cilantro',
  'Organic Black Beans',
  'Organic Yellow Onion',
  'Basil Pesto',
  'Limes'],
 51076: ['Strawberries',
  'Red Onion',
  'Organic Hass Avocado',
  'Blueberries',
  'Organic Peeled Whole Baby Carrots',
  'Original Hummus',
  'Organic Chicken & Apple Sausage',
  'Organic Grape Tomatoes',
  'Organic Fuji Apple',
  'Green Bell Pepper'],
 52784: ['Bag of Organic Bananas',
  'Organic Hass Avocado',
  'Large Lemon',
  'Red Vine Tomato',
  'Bunched Cilantro',
  'Organic Strawberries',
  'Honeycrisp Apple',
  'Green Bell Pepper',
  'Jalapeno Peppers',
  'Banana'],
 53304: ['Organic Baby Carrots',
  'Sweet Kale Salad Mix',
  'Blackberries',
  'Celery Sticks',
  'Brussels Sprouts',
  'Apple Honeycrisp Organic',
  'Brioche Hamburger Buns',
  'Apricots',
  'Packaged Grape Tomatoes',
  'Cereal'],
 53968: ['Strawberries',
  'Organic Baby Spinach',
  'Large Lemon',
  'Organic Garlic',
  'Organic Extra Firm Tofu',
  'Organic Zucchini',
  'Organic Avocado',
  'Organic Basil',
  'Red Onion',
  'Organic Raspberries'],
 55720: ['Organic Garlic',
  'Organic Zucchini',
  'Asparagus',
  'Fresh Cauliflower',
  'Organic Raspberries',
  'Organic Tomato Cluster',
  'Red Vine Tomato',
  'Organic Red Onion',
  'Organic Avocado',
  'Organic Yellow Onion'],
 56266: ['Banana',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Strawberries',
  'Organic Garlic',
  'Roasted Red Pepper Hummus',
  'Organic Blueberries',
  'Organic Hass Avocado',
  'Honeycrisp Apple',
  'Ground Black Pepper'],
 58959: ['Organic Large Extra Fancy Fuji Apple',
  'Chicken Base, Organic',
  'Organic Baby Spinach',
  'Banana',
  'Organic Baby Carrots',
  'Organic Unsweetened Almond Milk',
  'Organic Garlic',
  'Large Lemon',
  'Grape White/Green Seedless',
  'Sparkling Lemon Water'],
 59889: ['Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Organic Raspberries',
  'Organic Hass Avocado',
  'Large Lemon',
  'Organic Peeled Whole Baby Carrots',
  'Organic Baby Carrots',
  'Fresh Cauliflower',
  'Organic Black Beans'],
 61065: ['Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Banana',
  'Organic Small Bunch Celery',
  'Organic Yellow Onion',
  'Strawberries',
  'Organic Ginger Root',
  'Yellow Onions',
  'Organic Raspberries'],
 63472: ['Carrots',
  'Bag of Organic Bananas',
  'Organic Avocado',
  'Organic Strawberries',
  'Banana',
  'Organic Grape Tomatoes',
  'Organic Hass Avocado',
  'Philadelphia Original Cream Cheese',
  'Half & Half',
  'Gluten Free Chocolate Dipped Donuts'],
 66265: ['Blueberries',
  'Organic Strawberries',
  'Bag of Organic Bananas',
  'Large Lemon',
  'Tilapia Filet',
  'Bartlett Pears',
  'Half & Half',
  'Banana',
  'Unsweetened Almondmilk',
  'Organic Peeled Whole Baby Carrots'],
 67941: ['Large Lemon',
  'Organic Red Onion',
  'Organic Baby Spinach',
  'Bag of Organic Bananas',
  'Strawberries',
  'Sparkling Water Grapefruit',
  'Fresh Cauliflower',
  'Organic Garlic',
  'Banana',
  'Organic Small Bunch Celery'],
 69178: ['Organic Blueberries',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Strawberries',
  'Raspberries',
  'Banana',
  'Creamy Almond Butter',
  'Organic Garlic',
  'Brussels Sprouts'],
 72791: ['Orange Bell Pepper',
  'Bag of Organic Bananas',
  'Organic Peeled Whole Baby Carrots',
  'Organic Yellow Onion',
  'Organic Baby Spinach',
  'Organic Zucchini',
  'Bunched Cilantro',
  'Organic Avocado',
  'Organic Raspberries',
  'Sugar Snap Peas'],
 73171: ['Organic Blueberries',
  'Organic Zucchini',
  'Bag of Organic Bananas',
  'Large Lemon',
  'Extra Virgin Olive Oil',
  'Strawberries',
  'Organic Yellow Onion',
  'Organic Baby Spinach',
  'Basil Pesto',
  'Limes'],
 73477: ['Organic Baby Spinach',
  'Large Lemon',
  'Organic Kiwi',
  'Strawberries',
  'Organic Baby Carrots',
  'Organic Tomato Paste',
  'Organic Italian Parsley Bunch',
  'Organic Raspberries',
  'Organic Fuji Apple',
  'Limes'],
 75993: ['Bag of Organic Bananas',
  'Large Lemon',
  'Organic Garlic',
  'Organic Strawberries',
  'Organic Avocado',
  'Banana',
  'Organic Blueberries',
  'Organic Ground Korintje Cinnamon',
  'Organic Hass Avocado',
  'Organic Zucchini'],
 85028: ['Organic Avocado',
  'Organic Strawberries',
  'Organic Baby Spinach',
  'Organic Garlic',
  'Bag of Organic Bananas',
  'Yellow Onions',
  'Organic Hass Avocado',
  'Bunched Cilantro',
  'Organic Raspberries',
  'Organic Baby Carrots'],
 85238: ['Organic Avocado',
  'Organic Raspberries',
  'Organic Zucchini',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Organic Yellow Onion',
  'Organic Small Bunch Celery',
  'Limes',
  'Organic Ginger Root',
  'Organic Lemon'],
 87350: ['Organic Avocado',
  'Apple Honeycrisp Organic',
  'Original Hummus',
  'Organic Cucumber',
  'Carrots',
  'Yellow Onions',
  'Small Hass Avocado',
  'Extra Virgin Olive Oil',
  'Organic Zucchini',
  'Organic Blueberries'],
 89776: ['Organic Zucchini',
  'Large Lemon',
  'Red Onion',
  'Orange Bell Pepper',
  'Organic Fuji Apple',
  'Organic Avocado',
  'Carrots',
  'Red Vine Tomato',
  'Organic Yellow Onion',
  'Organic Extra Firm Tofu'],
 93241: ['Organic Strawberries',
  'Large Lemon',
  'Organic Garlic',
  'Organic Raspberries',
  'Organic Creamy Peanut Butter',
  'Organic Baby Carrots',
  'Creamy Almond Butter',
  'Lime',
  'Bag of Organic Bananas',
  'Organic Red Bell Pepper'],
 95686: ['Banana',
  'Organic Avocado',
  'Organic Hass Avocado',
  'Organic Strawberries',
  'Large Lemon',
  'Organic Zucchini',
  'Organic Baby Spinach',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Baby Carrots',
  'Asparagus'],
 96466: ['Organic Zucchini',
  'Organic Small Bunch Celery',
  'Organic Fuji Apple',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Baby Carrots',
  'Fresh Cauliflower',
  'Organic Baby Spinach',
  'Michigan Organic Kale',
  'Organic Whole Strawberries',
  'Large Lemon'],
 98570: ['Strawberries',
  'Organic Hass Avocado',
  'Organic Whole String Cheese',
  'Organic Baby Spinach',
  'Original Hummus',
  'Organic Tomato Paste',
  'Organic Yellow Onion',
  'Extra Virgin Olive Oil',
  'Organic Peeled Whole Baby Carrots',
  'Organic Black Beans'],
 99282: ['Organic Raspberries',
  'Organic Small Bunch Celery',
  'Organic Cucumber',
  'Original Hummus',
  'Organic Yellow Onion',
  'Organic Avocado',
  'Organic Zucchini',
  'Strawberries',
  'Organic Lemon',
  'Organic Baby Carrots'],
 100253: ['Organic Strawberries',
  'Organic Hass Avocado',
  'Organic Avocado',
  'Bing Cherries',
  'Organic Yellow Onion',
  'Organic Grape Tomatoes',
  'Organic Lemon',
  'Organic Tomato Cluster',
  'Extra Virgin Olive Oil',
  'Organic Garlic'],
 102099: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Organic Zucchini',
  'Red Vine Tomato',
  'Organic Hass Avocado',
  'Organic Blueberries',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Avocado',
  'Organic Small Bunch Celery',
  'Organic Raspberries'],
 104175: ['Banana',
  'Organic Blueberries',
  'Organic Strawberries',
  'Large Lemon',
  'Bag of Organic Bananas',
  'Extra Virgin Olive Oil',
  'Organic Baby Spinach',
  'Cucumber Kirby',
  'Organic Yellow Onion',
  'Organic Avocado'],
 107051: ['Organic Zucchini',
  'Bag of Organic Bananas',
  'Organic Baby Spinach',
  'Limes',
  'Organic Avocado',
  'Organic Extra Firm Tofu',
  'Fresh Cauliflower',
  'Asparagus',
  'Chocolate Chip Cookie Dough Ice Cream',
  'Organic Blueberries'],
 107931: ['Large Lemon',
  'Organic Granny Smith Apple',
  'Pineapple Chunks',
  'Organic Baby Carrots',
  'Organic Raspberries',
  'Limes',
  'Organic Italian Parsley Bunch',
  'White Corn',
  'Bag of Organic Bananas',
  'Yellow Onions'],
 111387: ['Bag of Organic Bananas',
  'Garlic',
  'Banana',
  'Organic Raspberries',
  'Large Lemon',
  'Organic Lemon',
  'Organic Cilantro',
  'Clementines, Bag',
  'Organic Red Bell Pepper',
  '100% Whole Wheat Bread'],
 114336: ['Organic Strawberries',
  'Strawberries',
  'Organic Blueberries',
  'Sweet Kale Salad Mix',
  'Blackberries',
  'Organic Baby Carrots',
  'Cereal',
  'Hass Avocados',
  'Organic Baby Spinach',
  'Meyer Lemons'],
 114764: ['Bag of Organic Bananas',
  'Organic Strawberries',
  'Banana',
  'Organic Ginger Root',
  'Organic Small Bunch Celery',
  'Organic Baby Spinach',
  'Organic Kiwi',
  'Organic Cucumber',
  'Unsweetened Almondmilk',
  'Frozen Organic Wild Blueberries'],
 118102: ['Bag of Organic Bananas',
  'Banana',
  'Organic Baby Spinach',
  'Organic Zucchini',
  'Large Lemon',
  'Organic Blueberries',
  'Lime Sparkling Water',
  'Organic Lacinato (Dinosaur) Kale',
  'Organic Ginger Root',
  'Organic Whole Strawberries'],
 118981: ['Large Lemon',
  'Organic Blueberries',
  'Organic Avocado',
  'Organic Fuji Apple',
  'Strawberries',
  'Basil Pesto',
  'Organic Baby Spinach',
  'Organic Garlic',
  'Organic Raspberries',
  'Organic Baby Carrots'],
 120138: ['Banana',
  'Organic Grape Tomatoes',
  'Strawberries',
  'Red Onion',
  'Large Lemon',
  'Blueberries',
  'Organic Fuji Apple',
  'Organic Avocado',
  'Organic Cilantro',
  'Organic Peeled Whole Baby Carrots'],
 120660: ['Bag of Organic Bananas',
  'Organic Yellow Onion',
  'Organic Lemon',
  'Limes',
  'Organic Baby Spinach',
  'Organic Strawberries',
  'Banana',
  'Organic Blueberries',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Hass Avocado'],
 125120: ['Organic Blueberries',
  'Strawberries',
  'Organic Strawberries',
  'Bag of Organic Bananas',
  'Organic Baby Carrots',
  'Blackberries',
  'Raspberries',
  'Fresh Asparagus',
  'Honeycrisp Apples',
  'Sweet Kale Salad Mix'],
 129124: ['Organic Baby Spinach',
  'Organic Strawberries',
  'Banana',
  'Organic Avocado',
  '100% Recycled Paper Towels',
  'Bag of Organic Bananas',
  'Extra Virgin Olive Oil',
  'Organic Grape Tomatoes',
  'Feta Cheese Crumbles',
  'Large Lemon'],
 131280: ['Organic Strawberries',
  'Limes',
  'Orange Bell Pepper',
  'Bag of Organic Bananas',
  'Organic Blueberries',
  'Strawberries',
  'Organic Baby Spinach',
  'Organic Grape Tomatoes',
  'Organic Tomato Cluster',
  'Blueberries'],
 132038: ['Organic Strawberries',
  'Large Lemon',
  'Organic Blueberries',
  'Grape White/Green Seedless',
  'Organic Zucchini',
  'Organic Avocado',
  'Limes',
  'Orange Bell Pepper',
  'Organic Baby Spinach',
  'Organic Whole Milk'],
 132551: ['Bag of Organic Bananas',
  'Strawberries',
  'Organic Blueberries',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Blackberries',
  'Fresh Asparagus',
  'Organic Lemon',
  'Soda',
  'Real Mayonnaise'],
 133738: ['Organic Avocado',
  'Organic Garlic',
  'Cucumber Kirby',
  'Organic Baby Carrots',
  'Organic Whole Milk',
  'Bag of Organic Bananas',
  'Organic Blueberries',
  'Organic Small Bunch Celery',
  'Organic Hass Avocado',
  'Organic Strawberries'],
 133964: ['Large Lemon',
  'Strawberries',
  'Organic Blueberries',
  'Organic Zucchini',
  'Organic Strawberries',
  'Basil Pesto',
  'Organic Baby Spinach',
  'Original Fresh Stack Crackers',
  'Bag of Organic Bananas',
  'No Salt Added Black Beans'],
 138067: ['Organic Whole Milk',
  'Kale & Spinach Superfood Puffs',
  'Shredded Mild Cheddar Cheese',
  'Organic Grape Tomatoes',
  'Granny Smith Apples',
  'Large Lemon',
  'Pure & Natural Sour Cream',
  'Cherubs Heavenly Salad Tomatoes',
  'Lime',
  'Limes'],
 138203: ['Organic Avocado',
  'Organic Garlic',
  'Limes',
  'Organic Peeled Whole Baby Carrots',
  'Yellow Onions',
  'Organic Small Bunch Celery',
  'Organic Cilantro',
  'Organic Tomato Paste',
  'Organic Garnet Sweet Potato (Yam)',
  'Organic Hass Avocado'],
 139656: ['Organic Raspberries',
  'Organic Strawberries',
  'Organic Blueberries',
  'Organic Garlic',
  'Organic Avocado',
  'Organic Granny Smith Apple',
  'Organic Cilantro',
  'Organic Blackberries',
  'Large Lemon',
  'Banana'],
 141719: ['Organic Raspberries',
  'Strawberries',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Organic Hass Avocado',
  'Organic Ginger Root',
  'Organic Cilantro',
  'Red Onion',
  'Organic Large Extra Fancy Fuji Apple',
  'Organic Peeled Whole Baby Carrots'],
 147179: ['Organic Avocado',
  'Organic Zucchini',
  'Organic Strawberries',
  'Bag of Organic Bananas',
  'Basil Pesto',
  'Strawberries',
  'Organic Whole Milk',
  'Organic Raspberries',
  'Seedless Red Grapes',
  'Organic Baby Arugula'],
 151119: ['Large Lemon',
  'Banana',
  'Pure & Natural Sour Cream',
  'Red Onion',
  'Organic Grape Tomatoes',
  'Blueberries',
  'Limes',
  'Eggo Homestyle Waffles',
  'Sauvignon Blanc',
  'Elbow Macaroni Pasta'],
 151410: ['Organic Strawberries',
  'Banana',
  'Bag of Organic Bananas',
  'Red Vine Tomato',
  'Large Lemon',
  'Organic Avocado',
  'Organic Cucumber',
  'Organic Zucchini',
  'Limes',
  'Organic Mint'],
 151564: ['Organic Strawberries',
  'Organic Baby Spinach',
  'Banana',
  'Large Lemon',
  'Original Hummus',
  'Organic Raspberries',
  'Organic Ginger Root',
  'Organic Baby Carrots',
  'Limes',
  'Organic Cilantro'],
 154852: ['Organic Strawberries',
  'Bag of Organic Bananas',
  'Organic Garlic',
  'Organic Yellow Onion',
  'Organic Avocado',
  'Organic Cilantro',
  'Organic Granny Smith Apple',
  'Organic Baby Spinach',
  'Organic Blueberries',
  'Extra Virgin Olive Oil'],
 156537: ['Strawberries',
  'Large Lemon',
  'Organic Strawberries',
  'Organic Baby Carrots',
  'Red Onion',
  'Blackberries',
  'Beef Loin New York Strip Steak',
  'Bunched Cilantro',
  'Organic Baby Arugula',
  'Cherrios Honey Nut'],
 157497: ['Organic Baby Spinach',
  'Strawberries',
  'Banana',
  'Orange Bell Pepper',
  'Organic Zucchini',
  'Organic Blueberries',
  'Organic Garlic',
  'Organic Strawberries',
  'Organic Garnet Sweet Potato (Yam)',
  '100% Recycled Paper Towels'],
 157798: ['Seedless Small Watermelon',
  'Organic Baby Spinach',
  'Hass Avocados',
  'Seedless Red Grapes',
  'Heavy Duty Aluminum Foil',
  'Creamy Almond Butter',
  'Organic Whole String Cheese',
  'Meyer Lemons',
  'Brussels Sprouts',
  'Brioche Hamburger Buns'],
 158373: ['Banana',
  'Organic Avocado',
  'Organic Quick Oats',
  'Seedless Red Grapes',
  'Organic Blueberries',
  'Organic Grape Tomatoes',
  'Limes',
  'Organic Broccoli Florets',
  'Organic Lemon',
  'Organic Baby Carrots'],
 159308: ['Red Onion',
  'Blueberries',
  'Large Lemon',
  'Plain Whole Milk Yogurt',
  'Orange Bell Pepper',
  'Organic Reduced Fat Milk',
  'Provolone',
  'Seedless Red Grapes',
  'Organic Beans & Rice Cheddar Cheese Burrito',
  'Supergreens!'],
 161574: ['Bag of Organic Bananas',
  'Sweet Kale Salad Mix',
  'Soda',
  'Raspberries',
  'Green Bell Pepper',
  'Apricots',
  'Real Mayonnaise',
  'Organic Navel Orange',
  'Fat Free Skim Milk',
  'Brussels Sprouts'],
 166707: ['Organic Baby Carrots',
  'Organic Avocado',
  'Strawberries',
  'Organic Small Bunch Celery',
  'Yellow Onions',
  'Banana',
  'Organic Ginger Root',
  'Organic Baby Spinach',
  'Cucumber Kirby',
  'Red Vine Tomato'],
 169583: ['Organic Hass Avocado',
  'Organic Zucchini',
  'Extra Virgin Olive Oil',
  'Organic Blueberries',
  'Honeycrisp Apple',
  'Organic Yellow Onion',
  'Organic Raspberries',
  'Basil Pesto',
  'Bing Cherries',
  'Organic Italian Parsley Bunch'],
 177453: ['Banana',
  'Large Lemon',
  'Organic Baby Arugula',
  'Strawberries',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Rainbow Carrots',
  'Chocolate Peanut Butter Ice Cream',
  'Instant Coffee',
  'Organic Medjool Dates'],
 179429: ['Banana',
  'Large Lemon',
  'Organic Grape Tomatoes',
  'Organic Garlic',
  'Orange Bell Pepper',
  'Organic Cilantro',
  'Sustainably Soft Bath Tissue',
  'Organic Diced Tomatoes',
  'Organic Fuji Apple',
  'Strawberries'],
 180305: ['Bag of Organic Bananas',
  'Organic Yellow Onion',
  'Organic Baby Spinach',
  'Banana',
  'Organic Whole Milk',
  'Asparagus',
  'Organic Peeled Whole Baby Carrots',
  'Apple Honeycrisp Organic',
  'Organic Grape Tomatoes',
  'Large Lemon'],
 180461: ['Organic Black Beans',
  'Original Hummus',
  'Strawberries',
  'Free & Clear Natural Laundry Detergent For Sensitive Skin',
  'Organic Blackberries',
  'Organic Tomato Paste',
  'Basil Pesto',
  'Asparagus',
  '100% Recycled Paper Towels',
  'Extra Virgin Olive Oil'],
 182863: ['Bag of Organic Bananas',
  'Organic Hass Avocado',
  'Organic Strawberries',
  'Organic Red Bell Pepper',
  'Organic Baby Spinach',
  'Organic Black Beans',
  'Banana',
  'Organic Small Bunch Celery',
  'Organic Garnet Sweet Potato (Yam)',
  "Organic D'Anjou Pears"],
 185153: ['Organic Peeled Whole Baby Carrots',
  'Organic Baby Spinach',
  'Large Lemon',
  'Cucumber Kirby',
  'Blueberries',
  'Organic Avocado',
  'Feta Cheese Crumbles',
  'Organic Garlic',
  'Organic Strawberries',
  'Tomato Sauce'],
 187019: ['Organic Blueberries',
  'Organic Baby Carrots',
  'Blackberries',
  'Bag of Organic Bananas',
  'Limes',
  'Organic Lemon',
  'Sweet Kale Salad Mix',
  'Organic Strawberries',
  'Brussels Sprouts',
  'Organic Avocado'],
 187754: ['Organic Yellow Onion',
  'Organic Cilantro',
  'Organic Lemon',
  'Banana',
  'Organic Garlic',
  'Organic Granny Smith Apple',
  'Large Lemon',
  'Organic Peeled Whole Baby Carrots',
  'Frozen Organic Wild Blueberries',
  'Organic Black Beans'],
 192587: ['Banana',
  'Organic Grape Tomatoes',
  'Organic Grade A Free Range Large Brown Eggs',
  'Organic Peeled Whole Baby Carrots',
  'Strawberries',
  'Cucumber Kirby',
  'Organic Blueberries',
  'Organic Baby Arugula',
  'Organic Baby Spinach',
  'Fresh Cauliflower'],
 197989: ['Organic Strawberries',
  'Organic Cucumber',
  'Large Lemon',
  'Organic Avocado',
  'Organic Yellow Onion',
  'Asparagus',
  'Extra Virgin Olive Oil',
  'Granny Smith Apples',
  'Organic Gala Apples',
  'Organic Blueberries'],
 199124: ['Red Onion',
  'Banana',
  'Strawberries',
  'Classic Hummus',
  'Seedless Red Grapes',
  'Orange Bell Pepper',
  'Organic Strawberries',
  'Organic Avocado',
  'Blueberries',
  'Uncured Genoa Salami'],
 200078: ['Large Lemon',
  'Jalapeno Peppers',
  'Bag of Organic Bananas',
  'Carrots',
  'Red Onion',
  'Orange Bell Pepper',
  'Yellow Onions',
  'Organic Hass Avocado',
  'Organic Peeled Whole Baby Carrots',
  'Limes'],
 201135: ['Organic Avocado',
  'Large Lemon',
  'Organic Yellow Onion',
  'Organic Garlic',
  'Organic Baby Carrots',
  'Original Hummus',
  'Organic Whole String Cheese',
  'Pineapple Chunks',
  'Organic Cucumber',
  'Organic Red Onion'],
 201870: ['Large Lemon',
  'Organic Baby Spinach',
  'Strawberries',
  'Organic Baby Carrots',
  'Organic Half & Half',
  'Organic Ginger Root',
  'Organic Zucchini',
  'Organic Fuji Apple',
  'Bag of Organic Bananas',
  'Organic Avocado'],
 203111: ['Strawberries',
  'Packaged Grape Tomatoes',
  'Unsweetened Vanilla Almond Breeze',
  'Organic Sage',
  'Organic Tortilla Chips',
  'Soda',
  'Vanilla Milk Chocolate Almond Ice Cream Bars Multi-Pack',
  'Organic Zucchini',
  'Coconut Water',
  'Teriyaki & Pineapple Chicken Meatballs']}

CPU times: user 46.5 ms, sys: 2.75 ms, total: 49.2 ms
Wall time: 1min

As shown above, our collaborative filtering function returns a dictionary mapping each user to their top 10 recommended products. To see how we arrived at this output, let's break down the function using a specific example, user 4789. For reference, below are the products recommended to this user:

In [278]:

recommendations[4789]

Out[278]:

['Organic Granny Smith Apple',
 'Limes',
 'Organic Green Cabbage',
 'Organic Cilantro',
 'Creamy Almond Butter',
 'Corn Tortillas',
 'Organic Grape Tomatoes',
 'Unsweetened Almondmilk',
 'Organic Blackberries',
 'Organic Lacinato (Dinosaur) Kale']

The first main component of our collaborative filtering function identifies the top 10 neighbours for user 4789. It does so by creating user pairs, where u1 is always user 4789 and u2 is any other user who has purchased at least one product that u1 has purchased. It then computes the Jaccard index for each user pair, by taking the number of distinct products that u1 and u2 have purchased in common (intersection_count) and dividing it by the number of distinct products purchased by either user (union_count). The 10 users with the highest Jaccard index are selected as user 4789's neighbourhood, with ties broken by user_id.

In [285]:

query = """
        // Find other users who have purchased products that user 4789 has purchased, and count the distinct products each pair has in common
        MATCH (u1:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)<-[:CONTAINS]-(:Order)<-[:ORDERED]-(u2:User)
        WHERE u1 <> u2
          AND u1.user_id = {uid}
        WITH u1, u2, COUNT(DISTINCT p) as intersection_count
        
        // Get count of all the distinct products purchased by either user in the pair (the union)
        MATCH (u:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)
        WHERE u in [u1, u2]
        WITH u1, u2, intersection_count, COUNT(DISTINCT p) as union_count
       
        // Compute Jaccard index
        WITH u1, u2, intersection_count, union_count, (intersection_count*1.0/union_count) as jaccard_index
        
        // Get top k neighbours based on Jaccard index
        ORDER BY jaccard_index DESC, u2.user_id
        WITH u1, COLLECT([u2.user_id, jaccard_index, intersection_count, union_count])[0..{k}] as neighbours
     
        WHERE LENGTH(neighbours) = {k}                // only want to return users with enough neighbours
        RETURN u1.user_id as user, neighbours
        """

neighbours = {}
for row in g.run(query, uid=4789, k=10):
    neighbours[row[0]] = row[1]

print("Labels for user 4789's neighbour list: user_id, jaccard_index, intersection_count, union count")
display(neighbours)

Labels for user 4789's neighbour list: user_id, jaccard_index, intersection_count, union count

  
{4789: [[42145, 0.12794612794612795, 38, 297],
  [138203, 0.10497237569060773, 38, 362],
  [87350, 0.09390862944162437, 37, 394],
  [49441, 0.0912280701754386, 26, 285],
  [187754, 0.0912280701754386, 26, 285],
  [180461, 0.09115281501340483, 34, 373],
  [120660, 0.08641975308641975, 21, 243],
  [107931, 0.08360128617363344, 26, 311],
  [73477, 0.07855626326963906, 37, 471],
  [154852, 0.0735930735930736, 17, 231]]}
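
To make the similarity computation concrete, here is a minimal plain-Python sketch of the Jaccard index on sets. It is not part of the Cypher pipeline above: the function name `jaccard_index` and the toy baskets are made up for illustration. The last line simply re-derives the index reported for neighbour 42145 from its intersection and union counts.

```python
def jaccard_index(products_a, products_b):
    """Jaccard index of two product collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(products_a), set(products_b)
    union = a | b
    if not union:
        # Two empty baskets: define similarity as 0 to avoid dividing by zero
        return 0.0
    return len(a & b) / len(union)


# Toy baskets (hypothetical data, not taken from the dataset)
u1_products = {"Banana", "Limes", "Organic Avocado", "Large Lemon"}
u2_products = {"Banana", "Limes", "Organic Garlic"}

# 2 products in common, 5 distinct products overall
toy_similarity = jaccard_index(u1_products, u2_products)
print(toy_similarity)

# Re-deriving the first neighbour's index from the counts shown above
neighbour_42145 = 38 / 297
print(neighbour_42145)
```

Because the denominator is the union, a neighbour with a very large purchase history is penalised even when the overlap is high, which is why user 73477 (37 products in common, union of 471) ranks below user 120660 (21 in common, union of 243).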

The second main component of our collaborative filtering function generates recommendations for user 4789 using the neighbours identified above. It does so by considering products that the neighbours have purchased which user 4789 has not already purchased. The function then counts the number of neighbours who have purchased each of the candidate products. The 10 products with the highest neighbour count are selected as recommendations for user 4789.

In [287]:

%%time
query = """
        // Get top n recommendations for user 4789 from the selected neighbours
        MATCH (u1:User),
              (neighbour:User)-[:ORDERED]->(:Order)-[:CONTAINS]->(p:Product)        // get all products bought by neighbour
        WHERE u1.user_id = {uid}
          AND neighbour.user_id in {neighbours}
          AND not (u1)-[:ORDERED]->(:Order)-[:CONTAINS]->(p)                        // which u1 has not already bought
        
        WITH u1, p, COUNT(DISTINCT neighbour) as cnt                                // count distinct neighbours who purchased p
        ORDER BY u1.user_id, cnt DESC                                               // and sort by count desc
        RETURN u1.user_id as user, COLLECT([p.product_name,cnt])[0..{n}] as recos  
        """

recos = {}
for row in g.run(query, uid=4789, neighbours=[42145,138203,87350,49441,187754,180461,120660,107931,73477,154852], n=10):
    recos[row[0]] = row[1]
    
print("Labels for user 4789's recommendations list: product, number of purchasing neighbours")
display(recos)

Labels for user 4789's recommendations list: product, number of purchasing neighbours


{4789: [['Organic Granny Smith Apple', 6],
  ['Limes', 5],
  ['Organic Green Cabbage', 5],
  ['Organic Cilantro', 5],
  ['Creamy Almond Butter', 5],
  ['Corn Tortillas', 5],
  ['Organic Grape Tomatoes', 5],
  ['Unsweetened Almondmilk', 4],
  ['Organic Blackberries', 4],
  ['Organic Lacinato (Dinosaur) Kale', 4]]}
  

CPU times: user 6.85 ms, sys: 2.08 ms, total: 8.93 ms
Wall time: 716 ms
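
The same ranking logic can be sketched in plain Python, which may help when sanity-checking the Cypher query on a small sample. The function `recommend` and the toy data are hypothetical. One difference to note: this sketch breaks ties alphabetically by product name so that the result is deterministic, whereas the query above orders by count only, so the order among products with equal counts is not guaranteed there.

```python
from collections import Counter


def recommend(target_products, neighbour_products, n=10):
    """Rank products the target has not bought by number of purchasing neighbours."""
    already_bought = set(target_products)
    counts = Counter()
    for products in neighbour_products.values():
        # set() ensures each neighbour is counted once per product,
        # mirroring COUNT(DISTINCT neighbour) in the query
        counts.update(set(products) - already_bought)
    # Sort by count descending, then product name (tie-break is an assumption)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Toy data (hypothetical): one target user and three neighbours
target = {"Banana", "Large Lemon"}
neighbours = {
    1: ["Banana", "Limes", "Organic Cilantro"],
    2: ["Limes", "Limes", "Corn Tortillas"],
    3: ["Limes", "Organic Cilantro", "Large Lemon"],
}

top_two = recommend(target, neighbours, n=2)
print(top_two)  # [('Limes', 3), ('Organic Cilantro', 2)]
```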

Part 5: Evaluating Recommender Performance ¶

If we were to actually integrate our recommender system into a production environment, we would need a way to measure its performance. As mentioned, in the context of a user check-out application for an online grocer, the goal is to increase basket size by surfacing a short list of products that are as relevant as possible to the user. For this particular application, we could choose precision as our metric for evaluating our recommender's performance. Precision is computed as the proportion of recommended products that the user actually purchased, out of all the products that the user has been recommended. To determine overall recommender performance, we can average the per-user precision values across all the users in the system.
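
As a rough sketch of how this could be computed, the snippet below defines per-user precision and its mean across users. Everything here is illustrative: the function names and the toy data are made up, and it assumes that we have, for each user, a set of purchases made after the recommendations were generated (for example, a held-out most recent order), which this post does not construct.

```python
def precision(recommended, purchased):
    """Fraction of recommended products that the user went on to purchase."""
    recommended = list(recommended)
    if not recommended:
        return 0.0
    purchased = set(purchased)
    hits = sum(1 for product in recommended if product in purchased)
    return hits / len(recommended)


def mean_precision(recos_by_user, purchases_by_user):
    """Average of per-user precision over all users with recommendations."""
    scores = [
        precision(recos, purchases_by_user.get(user, ()))
        for user, recos in recos_by_user.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): recommendations and later purchases for two users
toy_recos = {
    1: ["Limes", "Banana", "Soda", "Carrots"],
    2: ["Limes", "Soda"],
}
toy_later_purchases = {
    1: ["Limes", "Carrots"],   # 2 of 4 recommendations purchased
    2: ["Banana"],             # 0 of 2 recommendations purchased
}

overall = mean_precision(toy_recos, toy_later_purchases)
print(overall)  # (0.5 + 0.0) / 2 = 0.25
```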

Conclusion ¶

We have demonstrated how to build a user-based recommender system leveraging the principles of user-user collaborative filtering. We've discussed the key concepts underlying this algorithm, from identifying neighbourhoods using a similarity metric, to generating recommendations for a user based on their neighbours' preferences. In addition, we have shown how easy and intuitive modeling connected data can be with a graph database. One final point worth noting: in real-world applications, we may want to implement non-personalized recommendation strategies for users who are new to the system and for those who have not yet made sufficient purchases. Strategies may include recommending top-selling products for new users and, for the latter group, products identified as having high affinity with other products that the user has already purchased. This can be done through association rules mining, also known as market basket analysis.

datathèque

User-User Collaborative Filtering Using Neo4j Graph Database