Collaborating with Jupyter Notebook using Git
Jupyter Notebook is an amazing solution for using Python for data science. Git is what practically everyone uses for version control, so you will want to use that too. Now, how do we make them work together?
This is a solution that works for me, and I will probably come back to update this as I improve upon it. Here’s my situation:
- This solution should work for a workspace (directory) that is part of a larger repository
- The Python environments may be different for different directories in the repository
- Notebook output should not be saved in the repository (these are to be generated locally)
- Some parts of the metadata should be in the repository (e.g. kernel info) but other parts should be kept out of it (e.g. cell execution)
.gitattributes file
*.ipynb filter=filter-notebook
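To check that the rule above is picked up, git check-attr can report which filter applies to a given path. The following is a minimal sketch in a throwaway repository; the notebook name analysis.ipynb is made up for illustration.

```shell
#!/bin/bash
# Sketch: confirm that the .gitattributes rule applies to a notebook path.
# Runs in a throwaway repository; "analysis.ipynb" is a hypothetical file name.
demo_repo="$(mktemp -d)"
git init --quiet "$demo_repo"
cd "$demo_repo" || exit 1

printf '*.ipynb filter=filter-notebook\n' > .gitattributes

# check-attr works on path names, so the notebook does not have to exist yet
attr_report="$(git check-attr filter -- analysis.ipynb)"
echo "$attr_report"
```

In a real repository, only the git check-attr line is needed, run against one of your own notebooks.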
scripts/initialize.sh
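Sets up the filters. The settings are written to the local repository configuration, which is not versioned, so each collaborator has to run this once per clone.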
#!/bin/bash
git config filter.filter-notebook.clean "./scripts/filter-notebook/clean.sh"
git config filter.filter-notebook.smudge "./scripts/filter-notebook/smudge.sh"
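A quick way to confirm that the filter commands were registered is to read them back from the configuration. This is a sketch that repeats the two git config calls in a throwaway repository so it can be run anywhere.

```shell
#!/bin/bash
# Sketch: register the filter in a throwaway repository and read it back.
demo_repo="$(mktemp -d)"
git init --quiet "$demo_repo"
cd "$demo_repo" || exit 1

git config filter.filter-notebook.clean "./scripts/filter-notebook/clean.sh"
git config filter.filter-notebook.smudge "./scripts/filter-notebook/smudge.sh"

# List every setting stored under the filter-notebook driver
filter_config="$(git config --get-regexp '^filter\.filter-notebook\.')"
echo "$filter_config"
```

Because the filter commands are invoked as paths, the two scripts also need the executable bit, for example via chmod +x scripts/filter-notebook/clean.sh scripts/filter-notebook/smudge.sh (this assumes they were not already committed as executable).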
scripts/filter-notebook/clean.sh
#!/bin/bash
# This script...
# 1. Sets all cells.outputs to an empty array
# 2. Sets all cells.execution_count to null
# 3. Removes the "collapsed" key from all cells.metadata
# 4. Removes the "autoscroll" key from all cells.metadata
# 5. Keeps only the "name" and "version" keys of metadata.language_info (a missing key ends up as null)
# 6. Removes all but the "language_info" and "kernelspec" keys from metadata
# 7. Writes the kernel name and language info to stderr (as far as I know, debug with an argument needs jq 1.7 or newer)
jq --indent 1 \
'(.cells[] | select(has("outputs")) | .outputs) = [] '\
'| (.cells[] | select(has("execution_count")) | .execution_count) = null '\
'| .cells[].metadata |= del(.collapsed) '\
'| .cells[].metadata |= del(.autoscroll) '\
'| .metadata.language_info |= {name, version}'\
'| .metadata |= {language_info, kernelspec}'\
'| debug({"Kernel Name": .metadata.kernelspec.name, "Language Info": .metadata.language_info})'
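To see what the clean filter does without touching a real notebook, the same jq steps can be run on a small hand-written one. This is only a sketch: the notebook content is made up, and the debug step is left out so that it also runs on older jq versions.

```shell
#!/bin/bash
# Sketch: run the core steps of the clean filter on a made-up notebook.
# The notebook below is hypothetical; the debug() step is omitted on purpose.
sample_notebook='{
 "cells": [
  {"cell_type": "code", "execution_count": 3,
   "metadata": {"collapsed": true},
   "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
   "source": ["print(\"hi\")"]},
  {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]}
 ],
 "metadata": {
  "kernelspec": {"display_name": "Demo", "language": "python", "name": "demo"},
  "language_info": {"name": "python", "version": "3.11.4", "mimetype": "text/x-python"},
  "widgets": {}
 },
 "nbformat": 4, "nbformat_minor": 5
}'

cleaned_notebook="$(printf '%s' "$sample_notebook" | jq --indent 1 '
  (.cells[] | select(has("outputs")) | .outputs) = []
  | (.cells[] | select(has("execution_count")) | .execution_count) = null
  | .cells[].metadata |= del(.collapsed)
  | .cells[].metadata |= del(.autoscroll)
  | .metadata.language_info |= {name, version}
  | .metadata |= {language_info, kernelspec}
')"

# Summarize what is left of the first (code) cell and of the notebook metadata
first_cell_summary="$(printf '%s' "$cleaned_notebook" | jq -c '.cells[0] | {outputs, execution_count, metadata}')"
metadata_keys="$(printf '%s' "$cleaned_notebook" | jq -c '.metadata | keys')"
echo "$first_cell_summary"
echo "$metadata_keys"
```

The code cell comes out with empty outputs, a null execution count, and no "collapsed" flag, while the markdown cell is left alone because it has neither "outputs" nor "execution_count".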
scripts/filter-notebook/smudge.sh
#!/bin/bash
# Pass the notebook through unchanged on checkout
cat
<workspace path>/scripts/initialize.sh
#!/bin/bash
# Initialize the virtual environment
bash scripts/reinstall-venv.sh
# Install the kernel for the environment
.venv/bin/python -m ipykernel install --user --name <kernel name> --display-name "<kernel display name>"
<workspace path>/scripts/reinstall-venv.sh
#!/bin/bash
# Remove any existing environment (-f: no error if .venv does not exist yet)
rm -rf .venv
<python version you want> -m venv .venv
bash scripts/update.sh
<workspace path>/scripts/update.sh
#!/bin/bash
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
.venv/bin/python -m ipykernel install --user --name <kernel name> --display-name "<kernel display name>"
<workspace path>/scripts/update-requirements.sh
#!/bin/bash
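# Rebuild the environment from requirements-dev.txt, then freeze the installed versions into requirements.txt
rm -rf .venv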
<python version you want> -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements-dev.txt
pip freeze > requirements.txt
bash scripts/update.sh
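The workspace scripts above use relative paths (.venv, scripts/update.sh), so they only work when run from the workspace directory. One way to make that independent of the caller's location is to have each script change to the workspace root first. The sketch below demonstrates the idea with a hypothetical helper, scripts/where.sh, in a throwaway directory; it is not part of the setup described above.

```shell
#!/bin/bash
# Sketch: a script that finds its own workspace root, whatever the caller's cwd.
# "where.sh" and the temporary workspace are hypothetical, for illustration only.
demo_workspace="$(mktemp -d)"
mkdir -p "$demo_workspace/scripts"

cat > "$demo_workspace/scripts/where.sh" <<'EOF'
#!/bin/bash
# The workspace root is the parent of the directory this script lives in
cd "$(dirname "${BASH_SOURCE[0]}")/.." || exit 1
pwd -P
EOF

# Call the helper from an unrelated directory
reported_root="$(cd / && bash "$demo_workspace/scripts/where.sh")"
expected_root="$(cd "$demo_workspace" && pwd -P)"
echo "$reported_root"
```

Putting the same cd line at the top of initialize.sh, reinstall-venv.sh, update.sh, and update-requirements.sh would let them be called from anywhere in the repository.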