Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create, and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a relational database to explore data, using SQL - Structured Query Language.
Prerequisites
- This lesson requires the Unix shell, plus SQLite3 or DB Browser for SQLite.
- Please download the database we will use: survey.db
Setup | Download files required for the lesson | |
00:00 | 1. Why use Relational Databases |
What are Relational Databases?
Why use Relational Databases? |
00:10 | 2. Selecting Data | How can I get data from a database? |
00:25 | 3. Sorting and Removing Duplicates |
How can I sort a query’s results?
How can I remove duplicate values from a query’s results? |
00:45 | 4. Filtering | How can I select subsets of data? |
01:05 | 5. Calculating New Values | How can I calculate new values on the fly? |
01:15 | 6. Missing Data |
How do databases represent missing information?
What special handling does missing information require? |
01:45 | 7. Aggregation | How can I calculate sums, averages, and other summary values? |
02:05 | 8. Combining Data | How can I combine data from multiple tables? |
02:45 | 9. LLMs for SQL | How can I use LLMs to help with databases and SQL? |
03:05 | 10. Interfacing Programming Languages and Databases - Python | How can I access databases from programs written in Python? |
03:40 | 11. Geospatial Data Science with Python and Databases |
How can I load geospatial data in a database?
What processing can be done with geospatial database? |
04:25 | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.