Lab 7: Faster Gift Delivery
If you are a current student, please Log In for full access to the web site.
Note that this link will take you to an external site (https://oidc.mit.edu) to authenticate, and then you will be redirected back to this page.
Table of Contents
- 1) Preparation
- 2) Introduction
- 3) A Faster Query
- 4) More Optimizations
- 5) Code Submission
- 6) Checkoff
This lab assumes you have Python 3.5 or later installed on your machine.
The following file contains code and other resources as a starting point for this lab: lab7.zip
Most of your changes should be made to
lab.py, which you will submit at the
end of this lab. Importantly, you should not add any imports to the file.
This lab is worth a total of 4 points. Your score for the lab is based on:
- correctly answering the questions on this page (0.5 points)
- passing the test cases from
test.pyunder the time limit (1.5 points), and
- a brief "checkoff" conversation with a staff member to discuss your code (2 points).
For this lab, you will only receive credit for a test case if it runs to completion under the time limit on the server.
Santa has made great use of the implementations of
Graph from the previous lab.
However, it turns out that Santa forgot to deliver gifts in a small country in
the Balkans and time is running out! You need to create a new implementation of
Graph that can respond to queries much faster, so that Santa can find
buildings and allocate delivery teams in time for Christmas.
In this lab, you will write a new implementation of
Graph with a much faster
query method by first improving the general algorithm for finding pattern
matches, then making use of redundancy and specialized algorithms to tackle
To begin, copy your implementation of
GraphFactory from Lab 6 into
since our tests will use that to create new instances. You will need to
implement the required functions in the
FastGraph class provided in
but read section 3) before implementing
query. We have copied over all methods
Graph interface (in
graph.py) for convenience. We leave the base
internal representation of
FastGraph up to you; the additions described in
section 4) should work with pretty much anything.
3) A Faster Query
For Lab 6, we asked you to implement a "brute-force" approach to
this week you will write a smarter algorithm that avoids unnecessary work. The
idea is that instead of generating permutations then checking for correctness,
you can "follow" the pattern, and try to see if there is a matching part of the
graph. A more detailed description of this algorithm follows.
pattern is a list of tuples of the form
(label, neighbors) where
each list index refers to a distinct node.
- Pick an index in the pattern that hasn't been assigned.
- For every node in the graph:
- If the label of the node matches the label of the index, assign the node to the index, else try next one.
- For every neighbor of the current index in the pattern:
- Repeat step 2 with the neighbor as the starting index and considering only the neighbors of the chosen node in the graph.
- Repeat steps 1 and 2 until all nodes in the pattern have been assigned.
Keep in mind that you are looking for all possible assignments of nodes to the indices in the pattern, not just a single one. This means that you will need to figure out a way to keep track of these possible assignments as you are trying to cover the entire pattern with nodes from the graph.
Hint: You might find it useful to write a helper function to perform step 2. of the algorithm, since it is called recursively.
4) More Optimizations
Your new algorithm is much faster than the old one, but there is still room for
improvement. Most of the queries Santa makes for the gift-delivery mission
involve patterns that represent stars and cliques. An
n-legged star is a
structure that contains one central node that has edges towards
n other nodes.
k-cliques are subgraphs with
k nodes in which any two nodes are connected by an
undirected edge. Remember from Lab 6 that an undirected edge is represented as two
directed edges in opposite directions.
Below you can find two examples of how such subgraphs would look: a 5-legged star and a 5-clique.
Santa is asking you to build on your current
FastGraph implementation and add
algorithms that would allow quicker queries for star and clique patterns. In
order to support these optimizations, you will augment
additional data structures that would make information relevant for the special
star and clique queries quicker to access. In Lab 6, you created a
implementation that was more space-efficient at the cost of additional runtime,
while now we will trade memory for time.
The improved data structure should be able to support general pattern queries
the same way as before. The difference is that now patterns that resemble
cliques and stars will be detected in advance, and you will apply a different,
optimized version of
query to them.
In order to optimize the graph search for star-like structures, you should think about how to store information about node degrees. The degree of a node is the number of total edges that are outgoing from that node to other nodes in the graph.1 For example, the center node of the star above has degree 5. In order to better understand how this might be helpful in the context of our optimization problem, answer the questions below:
How do the node degrees change as one adds or removes nodes and edges from a graph? How can you use node degrees in order to make a query run faster on certain graphs?
After implementing support for faster star queries inside
FastGraph, your code
should be able to pass all test cases in the
Test_3_QueryStars class from
query tries to match each node in the graph for at least one
position in the pattern, then follows each edge to make sure they exist in the
graph as well. However, if the pattern is a sizable
k-clique, it's likely that only a
small fraction of the nodes in the graph will have the necessary connectivity to
satisfy it (i.e. will be contained in cliques of size at least
k). You could
save a lot of work by storing sets of nodes that form cliques in the internal
FastGraph. This would allow the
query method to only
check a subset of the nodes when a clique pattern gets passed as argument. Note
that nodes within a clique might still have different labels that need to be
We suggest keeping track of either all the maximal cliques or just all cliques in which each node is contained while constructing the graph. A maximal clique is a clique whose size cannot be increased by including more nodes in the graph to it. In other words, the nodes in a maximal clique are not contained within any other, larger, clique. Note that a maximal clique need not be the maximum clique (the clique containing the largest number of nodes). To check your understanding, answer the concept questions below about the following graph, where node names are integers.
Which approach would be more space efficient: maintaining all maximal cliques
or maintaining all possible cliques that each node is contained in?
Since our focus in this lab is on time efficiency and because these two strategies
would make the
query method similarly fast, it is up to you which one to use.
However, note that keeping track of maximal cliques might introduce additional complications
that you need to account for. For example, some maximal cliques might become non maximal
after adding edges to the graph and some non maximal cliques might become maximal after
removing edges/nodes from the graph.
Regardless of which strategy you use, the number and dimensions of the cliques
can still change significantly in the process of building a graph through the
methods defined on the
Graph interface. It is up to you how to store these cliques
as well as the relationship between a node and the cliques that contain it, but you
should make sure that the methods which mutate the graph by adding or removing
nodes/edges properly update the internal representation of
FastGraph. If you need help,
you can take a look at the following hints on
how certain graph operations can affect the clique dynamics.
You might notice that the
remove_node methods actually become slower than before implementing the
clique optimization, but this is nothing to worry about! Santa rarely changes
the maps and usually only uses your data structure for multiple repeated queries
on cliques and stars. This means that if the
query method is fast enough on
these two kinds of inputs, the time saved when performing queries offsets the
time taken to build the graph.
After implementing this feature, your code should pass all
test cases in
5) Code Submission
Once you are finished with the code, please come to a tutorial, lab session, or office hour and add yourself to the queue asking for a checkoff. You must be ready to discuss your code in detail before asking for a checkoff.
You should be prepared to demonstrate your code (which should be well-commented, should avoid repetition, and should make good use of helper functions). In particular, be prepared to discuss:
- description of your optimization for stars
- description of your optimization for cliques