Why use Relational Databases
|
Relational Databases are an efficient way to store and query data…
making use of relationships between multiple tables of information.
The most common syntax for interacting with such databases is SQL- Structured Query Language.
Implementations include MySQL, SQL Server, Oracle and Postgres.
|
Selecting Data
|
|
Sorting and Removing Duplicates
|
The records in a database table are not intrinsically ordered: if we want to display them in some order, we must specify that explicitly with ORDER BY.
The values in a database are not guaranteed to be unique: if we want to eliminate duplicates, we must specify that explicitly as well using DISTINCT.
|
Filtering
|
Use WHERE to specify conditions that records must meet in order to be included in a query’s results.
Use AND, OR, and NOT to combine tests.
Filtering is done on whole records, so conditions can use fields that are not actually displayed.
Write queries incrementally.
|
Calculating New Values
|
|
Missing Data
|
Databases use a special value called NULL to represent missing information.
Almost all operations on NULL produce NULL.
Queries can test for NULLs using IS NULL and IS NOT NULL.
|
Aggregation
|
Use aggregation functions to combine multiple values.
Aggregation functions ignore null values.
Aggregation happens after filtering.
Use GROUP BY to combine subsets separately.
If no aggregation function is specified for a field, the query may return an arbitrary value for that field.
|
Combining Data
|
Use JOIN to combine data from two tables.
Use table.field notation to refer to fields when doing joins.
Every fact should be represented in a database exactly once.
A join produces all combinations of records from one table with records from another.
A primary key is a field (or set of fields) whose values uniquely identify the records in a table.
A foreign key is a field (or set of fields) in one table whose values are a primary key in another table.
We can eliminate meaningless combinations of records by matching primary keys and foreign keys between tables.
The most common join condition is matching keys.
|
LLMs for SQL
|
LLMs can be used for tasks including tutoring, debugging, writing code, fixing errors, and search
LLMs can generate incorrect or unexpected responses
Testing is essential and this too can be in collaboration with an LLM
|
Interfacing Programming Languages and Databases - Python
|
|
Geospatial Data Science with Python and Databases
|
Shapefiles are a common geospatial format that can be used in DuckDB
Geospatial data can be processed in some specialized databases
Visualising Geospatial Data from DuckDB is possible using GeoPandas and Folium
|