This lesson is still being designed and assembled (Pre-Alpha version)

CSIRO Data School - Introduction to Machine Learning: Lesson Design

Contributors

[Names]

We are actively seeking contributors to help with the design of exercises, feedback on lessons, proofreading, etc. Please submit a pull request to make your contribution.

Process

We are loosely following the reverse instructional design process described in the Software Carpentry lesson template: reverse design (http://carpentries.github.io/lesson-example/01-design/index.html)

Assumptions

Audience:
Number of students: 12
Number of educators: 2 + 1 helper
Contact time:
Practical time:
Skill level: Beginner through to lower intermediate.
Background: CSIRO employees, including research scientists, technicians, and support staff.
Prior learning:

1. Final task (Practical exercise)

Task

What should learners be able to do, at the end of this unit? Please describe this in one to two sentences, in terms of a practical, real-world task.

Your task(s) should be achievable in the practical time allotted. Individual tasks should take 1/2 to 2 days.

Task: Process a dataset with “issues”

Task: Identify the issues present in the dataset

Task: Fix or work around those issues

Data

What kind of data do learners need to bring to the workshop to complete this task? What features should the data have (shape, size, complexity, licence, etc.)?

Can you design this exercise around a predesigned dataset if necessary? If so, what should this dataset look like, and where can we find it? Do we need different data for different learners?

Ideally, the data will contain a range of “issues” that can be used to illustrate particular ML concepts. Dataset issues might include:

2. Concept map

What are all the ideas, connections, and assumptions a learner must master to achieve the task(s) described above? Please take photos of your concept maps and upload to the figures directory, with a link below.

High-level concept map
Data concept map
Pipelining concept map
Model concept map
Assumptions concept map

3. Episodes

Break your concept map up into smaller ‘chunks’. Each new map should contain only five or six ideas. These form the individual teaching ‘episodes’ that make up the larger topic unit.

Give each ‘chunk’ a title and link to the smaller concept map figure below. Estimate the teaching time.

4. Ordering

We all know how interconnected every concept is; however, teaching happens in linear time (let’s debate this over dinner!). So we now need to turn our concept networks into ordered lists.

Start by ordering your ‘chunks’ or episodes. Then write each idea or concept within a chunk on a sticky note, and order the sticky notes. Make a poster of your sticky-note episodes! Do your episodes fit into your teaching time? Bring the poster to lunch for feedback.

Once you are happy with your design, please transcribe it below.

5. Exercises (formative assessments)

Each sticky note needs an exercise! Start putting your episodes into GitHub, and formulate an exercise to test or teach each concept. Keep in mind the data you described back in step one. Try to keep the exercises relevant and engaging.

Keep a note here of exercises and tasks that still need work. Raise them as GitHub issues to be completed after the workshop.

NOTES & FEEDBACK

1 - What is ML?

Ep 1: Motivating Example and Counterexample

Exercises:

Ep 2: ML vs. Programming

Exercises:

Ep 3: Types of ML (no code)

Ep 4: Why Use ML?

Ep 5: When to Use ML?

Ep 6: How to Use ML

2 - Data Pipelines and ETL (Extraction, Transformation and Loading)

Ep 1: Data

Ep 2: Getting and Viewing Data

Exercises:

Ep 3: Data Munging

Exercises:

Ep 4: Tidying Data

Exercises:

Ep 5: Feature Engineering

Exercises:

Ep 6: Data Augmentation

Exercises:

Ep 7: Data Pipeline (optional?)

Ep 8: Recipe (not necessarily code)

Exercises:

3 - ML Models

Ep 1: Types of ML Algorithms (with code)

Ep 2: Types of ML Problems (with code)

Ep 3: Looking inside the Black Box

Exercises:

Ep 4: Training / Learning

Ep 5: Recipe

4 - Testing and Verification

Ep 1: Introduction

Ep 2: Metrics for Performance

Ep 3: Overfitting

Ep 4: Validation and Hyperparameter Tuning

Ep 5: Recipe