PhysioNet/CinC Challenge 2021: Cloud Submission Instructions

Introduction

Similarly to last year’s Challenge, teams must submit both the code for their models and the code for training their models. To help, we have implemented example entries in both MATLAB and Python, and we encourage teams to use these example entries as templates for their entries.

Preparation and submission instructions

  1. Create a private GitHub or GitLab repository for your code. We recommend cloning our example code and replacing it with your code. Add physionetchallengeshelper as a collaborator to your repository.
  2. Add your classification code to your repository. Like the example code, your code must be in the root directory of the master branch.
  3. Do not include extra files that are not required to create and run your classification code, such as the training data.
  4. Follow the instructions for the programming language of your submission.
  5. Submit your entry through this form. We will clone your repository using the HTTPS URL that ends in .git. On GitHub, you can get this URL by clicking on “Clone or download” and copying and pasting the URL, e.g., https://github.com/physionetchallenges/python-classifier-2021.git. Please see here for an example.
  6. We will put the scores for successful entries on the leaderboard. The leaderboard will publicly show your team name, run time, and score.

MATLAB-specific instructions

  1. Confirm that your MATLAB code compiles and runs in MATLAB R2020b or R2021a (when available).
  2. Using our sample MATLAB classification code (link) as a template, format your code in the following way. Consider downloading this repository, replacing our code with your code, and adding the updated files to your repository.
  3. AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate, and list your authors. Our submission system cannot read your README file, so we cannot use it to change how we run your code.
  4. train_model.m: Do not edit this script. It calls your team_training_code.m script. We will not use the train_model.m script from your repository, so any change made to this code will not be included.
  5. team_training_code.m: Update this script to create and save your model. It loads the header file with the data and demographic information for each recording, extracts features from the data using the get_features.m function, and saves your model (weights and any other needed parameters). You can edit this script and the get_features.m function as much as you need.
  6. test_model.m: Do not change this script. It loads your models by calling the load_ECG_*leads_model functions (* = 2, 3, 6, or 12 for the four lead sets: 2-lead, 3-lead, 6-lead, and 12-lead models). It then calls your team_testing_code function for each recording and performs all file input and output. We will not use the test_model.m script from your repository, so any changes to it will not be included.
  7. team_testing_code.m: Update this script to load and run your model (weights and any parameters) from the files in your submission. It takes the test data, header files, and loaded models (the outputs of your training code) as input and returns a probability or confidence score and a binary classification for each class as output.
  8. get_features.m: Update this script to extract your choice of features from the ECG recordings.
  9. get_leads.m: Do not edit this script. It extracts the four lead sets (2-lead, 3-lead, 6-lead, and 12-lead) from the ECG recordings.
  10. extract_data_from_header.m: Do not edit this script. It extracts the data information from the header files.
  11. Add your code to the root/base directory of the master branch of your repository.
  12. We will download your code, compile it with the MATLAB Compiler (mcc -m train_model.m -a . for your training code and mcc -m test_model.m -a . for your trained classifier), and run the resulting executables on Google Cloud.
  13. Here is a sample repository that you can use as a template: MATLAB classifier.

Python-specific instructions

  1. Using our sample Python classification code (link) as a template, format your code in the following way. Consider downloading this repository, replacing our code with your code, and adding the updated files to your repository.
  2. Dockerfile: Update to specify the version of Python that you are using on your machine. Add any additional packages that you need. Do not change the name or location of this file. The structure of this file is important, especially the three lines marked "DO NOT EDIT".
  3. requirements.txt: Add Python packages to be installed with pip. Specify the versions of these packages that you are using on your machine. Remove unnecessary packages, such as Matplotlib, that your classification code does not need.
  4. AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate, and list your authors. Our submission system cannot read your README file, so we cannot use it to change how we run your code.
  5. team_code.py: Update this script to load and run your trained model.
  6. train_model.py: Do not change this script. It calls functions from the team_code script to run your training code on the training data.
  7. helper_code.py: Do not change this script. It provides helper variables and functions used by our code. You are welcome to use them in your code.
  8. test_model.py: Do not change this script. It calls your trained models to run on the test data. We will not use the test_model.py script from your repository, so any change made to this code will not be included.
  9. Add your code to the root/base directory of the master branch of your repository.
  10. We will download your code, build a Docker image from your Dockerfile, and run it on Google Cloud.
  11. Here is a sample repository that you can use as a template: Python classifier.
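To see how these pieces fit together, here is a minimal sketch of the team_code.py structure. The function names, the SNOMED CT class codes, and the pickle-based model format below are illustrative only; copy the exact required function names and signatures from the example repository, because our driver scripts call them by name.

```python
import os
import pickle

# Hypothetical skeleton of team_code.py; the real required functions
# are defined by the example repository and called by train_model.py
# and test_model.py.

def training_code(data_directory, model_directory):
    # Train on the recordings in data_directory; here we just save a
    # placeholder "model" (a dict of class codes and a decision threshold).
    os.makedirs(model_directory, exist_ok=True)
    model = {'classes': ['270492004', '164889003'], 'threshold': 0.5}
    with open(os.path.join(model_directory, 'model.pkl'), 'wb') as f:
        pickle.dump(model, f)

def load_model(model_directory):
    # Load the saved model weights and parameters.
    with open(os.path.join(model_directory, 'model.pkl'), 'rb') as f:
        return pickle.load(f)

def run_model(model, header, recording):
    # Return class codes, binary labels, and probabilities for one recording.
    classes = model['classes']
    probabilities = [0.5 for _ in classes]  # placeholder scores
    labels = [int(p >= model['threshold']) for p in probabilities]
    return classes, labels, probabilities
```

Because train_model.py and test_model.py handle all file input and output, your edits should stay inside functions like these.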

Docker-specific FAQs

Why containers?

Containers allow you to define the environment that you think is best suited for your algorithm. For example, if you think your algorithm needs a specific version of a Linux distribution or a certain version of a library or framework, then you can use the containers to specify the environment. Here are two links with data science-centric introductions to Docker: https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 https://link.medium.com/G87RxYuQIV

Quickly, how can I test my submission locally?

Install Docker. Clone your repository. Build an image. Run it on a single recording.

Less quickly, how can I test my submission locally? Please give me commands that I can copy and paste.

To guarantee that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a subset of the training data.

If you have trouble running your code, then please try the following steps to run the example code, which is known to work.

  1. Create a folder example in your home directory with several subfolders.

     user@computer:~$ cd ~/
     user@computer:~$ mkdir example
     user@computer:~$ cd example
     user@computer:~/example$ mkdir training_data test_data model test_outputs
    
  2. Download the training data from the Challenge website. Put some of the training data in training_data and test_data. You can use some of the training data to check your code (and should perform cross-validation on the training data to evaluate your algorithm).

  3. Download or clone this repository in your terminal.

     user@computer:~/example$ git clone https://github.com/physionetchallenges/python-classifier-2021.git
    
  4. Build a Docker image and run the example code in your terminal.

     user@computer:~/example$ ls
     model  python-classifier-2021  test_data  test_outputs  training_data
    
     user@computer:~/example$ ls training_data/
     A0001.hea  A0001.mat  A0002.hea  A0002.mat  A0003.hea  ...
    
     user@computer:~/example$ cd python-classifier-2021/
    
     user@computer:~/example/python-classifier-2021$ docker build -t image .
    
     Sending build context to Docker daemon  30.21kB
     [...]
     Successfully tagged image:latest
    
     user@computer:~/example/python-classifier-2021$ docker run -it -v ~/example/model:/physionet/model -v ~/example/test_data:/physionet/test_data -v ~/example/test_outputs:/physionet/test_outputs -v ~/example/training_data:/physionet/training_data image bash
    
     root@[...]:/physionet# ls
         Dockerfile             model             test_data      train_model.py
         extract_leads_wfdb.py  README.md         test_model.py
         helper_code.py         requirements.txt  test_outputs
         LICENSE                team_code.py      training_data
    
     root@[...]:/physionet# python train_model.py training_data model
    
     root@[...]:/physionet# python test_model.py model test_data test_outputs
    
     root@[...]:/physionet# exit
     exit
    
     user@computer:~/example/python-classifier-2021$ cd ..
    
     user@computer:~/example$ ls test_outputs/
     A0006.csv  A0007.csv  A0008.csv  A0009.csv  A0010.csv  ...
    
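Before submitting, you can sanity-check the files in test_outputs. The sketch below assumes the output format described on the Challenge webpage (a comment line with the record name, a line of class codes, a line of 0/1 labels, and a line of scalar probabilities); verify the exact format against the official description.

```python
def check_output_file(path):
    # Return True if the file looks like a Challenge output file:
    # a '#<record>' header, then classes, binary labels, and probabilities,
    # all with the same number of comma-separated entries.
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) != 4 or not lines[0].startswith('#'):
        return False
    classes = lines[1].split(',')
    labels = lines[2].split(',')
    probabilities = lines[3].split(',')
    if not (len(classes) == len(labels) == len(probabilities)):
        return False
    try:
        return (all(l.strip() in ('0', '1') for l in labels)
                and all(0.0 <= float(p) <= 1.0 for p in probabilities))
    except ValueError:
        return False
```

Running this check over every .csv file in test_outputs catches malformed outputs before they reach the scoring code.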

FAQ

What computational resources will my entry have?

We will run your training code on Google Cloud using 10 vCPUs, 65 GB RAM, 100 GB disk space, and an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. Your training code has a 72 hour time limit without a GPU and a 48 hour time limit with a GPU.

We will run your trained model on Google Cloud using 6 vCPUs, 39 GB RAM, 100 GB disk space, and an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. Your trained model has a 24 hour time limit on each of the validation and test sets.

We are using an N1 custom machine type to run submissions on GCP. If you would like to use a predefined machine type, then the n1-highmem-8 is the closest predefined machine type, but with 2 fewer vCPUs and 13 GB less RAM. For GPU submissions, we use the 418.40.04 driver version.

How do I install Docker?

Go to https://docs.docker.com/install/ and install the Docker Community Edition. For troubleshooting, see https://docs.docker.com/config/daemon/

Do I have to use your Dockerfile?

No. The only part of the Dockerfile that we care about is the three lines marked "DO NOT EDIT". These three lines ensure that, during the build process of the container, your code is copied into a folder called physionet so that our cloud-based pipelines can find your code and run it. Please do not change those three lines. You are free to change your base image, and at times you should (see the next question).

What’s the base image in Docker?

Think of Docker as a series of images, or snapshots of a virtual machine, that are layered on top of each other. For example, your image may be built on top of a very lightweight Ubuntu operating system with Python 3.8.6 that we get from the official Docker Hub registry (think of it as a GitHub for Docker images). We can then install our requirements (NumPy and SciPy) on top of it. If you need the latest version of TensorFlow, then search for it on hub.docker.com and edit the first line of your Dockerfile to read FROM tensorflow/tensorflow. For a specific version, say 1.11, look up the tags and change it to FROM tensorflow/tensorflow:1.11.0. We recommend pinning specific versions for reproducibility.
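Concretely, a pinned base image looks like the sketch below. This is illustrative, not a copy of our Dockerfile: start from the example repository's Dockerfile, keep its "DO NOT EDIT" lines intact, and adjust the base image and packages to match your machine.

```dockerfile
# Illustrative Dockerfile sketch; keep the "DO NOT EDIT" lines from the
# example repository's Dockerfile so our pipeline can find your code.
FROM python:3.8.6-slim

# Copy your code into /physionet, where our pipeline expects it.
RUN mkdir /physionet
COPY ./ /physionet
WORKDIR /physionet

# Install your pinned Python dependencies.
RUN pip install -r requirements.txt
```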

sklearn or scikit-learn?

For Python, if your entry uses scikit-learn, then you need to install it via pip using the package name scikit-learn instead of sklearn in your requirements.txt file: See here.
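For example, a requirements.txt that installs scikit-learn under its PyPI name might read as follows (the version numbers here are illustrative; pin the versions that you actually use on your machine):

```
numpy==1.20.2
scipy==1.6.3
scikit-learn==0.24.2
```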

xgboost?

For Python, try python:3.8.9-buster in the first line of your Dockerfile. This image includes additional packages, such as GCC, that xgboost needs. Also include xgboost in your requirements.txt file, pinned to the version that you are using. For R, add RUN R -e 'install.packages("xgboost")' to your Dockerfile.

Pandas?

For Python, try python:3.8.9-buster in the first line of your Dockerfile if you experience errors.

GPUs?

We provide an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. We use the NVIDIA 418.40.04 driver for the GPU. The latest supported version of CUDA is 10.1, and the latest supported version of PyTorch is therefore 1.7.1.
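If you use PyTorch with the GPU, pin a wheel that matches CUDA 10.1. As a hedged example (verify the exact wheel names against PyTorch's installation instructions for version 1.7.1), a requirements.txt might read:

```
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.7.1+cu101
torchvision==0.8.2+cu101
```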

Why can’t I install a common Python or R package using Python or R’s package manager?

Some packages have dependencies, such as GCC, that need to be installed first. Try python:3.8.9-buster, which includes more packages by default, or install the dependencies yourself. If the first line of your Dockerfile is FROM python:3.8.6-slim, then you are building a Docker image on the Debian Linux distribution, so you can install GCC and the other libraries that many Python and R packages build against by adding the line RUN apt-get update && apt-get install -y build-essential to your Dockerfile before installing those packages.
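For example, on a Debian-based image the build tools can be installed before pip runs (an illustrative sketch, not our Dockerfile):

```dockerfile
FROM python:3.8.6-slim

# Install GCC and related build tools that packages with C extensions need.
RUN apt-get update && apt-get install -y build-essential

# Dependencies with C extensions can now compile during installation.
RUN pip install -r requirements.txt
```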

How do I build my image?

git clone <<your repository URL that ends in .git>>
cd <<your repository name>>
ls

You should see a Dockerfile and other relevant files here.

docker build -t <<some image name that must be in lowercase letters>> .
docker images
docker run -it <<image name from above>> bash

This will take you into your container and you should see your code.

Please see Docker-specific FAQs for more information and description.

What can I do to make sure that my submission is successful?

You can avoid most submission errors with the following steps:

Why is my entry unsuccessful on your submission system? It works on my computer.

There are several common reasons for unexpected errors:

How do I learn more?

Please see the PhysioNet/CinC Challenge 2021 webpage for more details. Please post questions and concerns on the Challenge discussion forum.


Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.

© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.
