1 change: 1 addition & 0 deletions docs/source/index.md
@@ -61,6 +61,7 @@ Find out more on our [mission and scope](target-mission) statement and our [road
:hidden:

user_guide/index
user_guide/datasets
examples/index
community/index
api_index
111 changes: 111 additions & 0 deletions docs/source/user_guide/datasets.md
@@ -0,0 +1,111 @@
# Working with Public Datasets

In addition to sample data for testing and examples, `movement` provides access to publicly available datasets of animal poses and trajectories. These datasets can be useful for research, method development, benchmarking, and learning.

## Available Datasets

You can list the available public datasets using:

```python
from movement import list_public_datasets

datasets = list_public_datasets()
print(datasets)
```

To get more information about a specific dataset:

```python
from movement import get_dataset_info

info = get_dataset_info("calms21")
print(info["description"])
print(info["url"])
print(info["paper"])
print(info["license"])
```

## CalMS21 Dataset

The [CalMS21 dataset](https://data.caltech.edu/records/g6fjs-ceqwp) contains multi-animal pose tracking data for various animal types and behavioral tasks.

```python
from movement import public_data

# Fetch mouse data from the open field task
mouse_data = public_data.fetch_calms21(
    subset="train",
    animal_type="mouse",
    task="open_field",
)

# Fetch fly data from the courtship task
fly_data = public_data.fetch_calms21(
    subset="train",
    animal_type="fly",
    task="courtship",
)
```

The available parameters are:

- `subset`: "train", "val", or "test"
- `animal_type`: "mouse", "fly", or "ciona"
- `task`: Depends on the animal type
  - For mouse: "open_field", "social_interaction", "resident_intruder"
  - For fly: "courtship", "egg_laying", "aggression"
  - For ciona: "social_investigation"
- `frame_rate`: Optional, to override the original frame rate
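
Because the valid `task` values depend on `animal_type`, it can help to check a combination before requesting it. The sketch below simply mirrors the parameter list above in a lookup table; `CALMS21_TASKS`, `CALMS21_SUBSETS`, and `check_calms21_args` are hypothetical names used for illustration and are not part of the `movement` API.

```python
# Hypothetical lookup tables copied from the parameter list above;
# they are NOT provided by movement.
CALMS21_SUBSETS = ("train", "val", "test")
CALMS21_TASKS = {
    "mouse": ("open_field", "social_interaction", "resident_intruder"),
    "fly": ("courtship", "egg_laying", "aggression"),
    "ciona": ("social_investigation",),
}


def check_calms21_args(subset, animal_type, task):
    """Raise ValueError if the combination is not in the list above."""
    if subset not in CALMS21_SUBSETS:
        raise ValueError(
            f"Unknown subset {subset!r}; expected one of {CALMS21_SUBSETS}"
        )
    if animal_type not in CALMS21_TASKS:
        raise ValueError(
            f"Unknown animal_type {animal_type!r}; "
            f"expected one of {tuple(CALMS21_TASKS)}"
        )
    if task not in CALMS21_TASKS[animal_type]:
        raise ValueError(
            f"Task {task!r} is not listed for {animal_type!r}; "
            f"expected one of {CALMS21_TASKS[animal_type]}"
        )


# A listed combination passes silently
check_calms21_args("train", "fly", "courtship")
```

Calling the helper with a mismatched pair, such as `animal_type="fly"` and `task="open_field"`, raises a `ValueError` that names the tasks listed for that animal type.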

## Rat7M Dataset

The [Rat7M dataset](https://data.caltech.edu/records/bpkf7-jae29) contains tracking data for multiple rats in complex environments.

```python
from movement import public_data

# Fetch data from the open field task
rat_data = public_data.fetch_rat7m(subset="open_field")
```

The available parameters are:

- `subset`: "open_field", "shelter", or "maze"
- `frame_rate`: Optional, to override the original frame rate

## Data Caching

Downloaded datasets are cached locally in the `~/.movement/public_data` directory. After the first download, later requests for the same dataset read the local copy instead of downloading it again.
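
To see what is currently cached, you can inspect that directory with the standard library. This is a minimal sketch: the cache path is the one stated above, while `list_cached_files` is a hypothetical helper rather than a `movement` function, and it makes no assumption about how files are laid out inside the cache.

```python
from pathlib import Path

# Cache location as stated above; the helper below is hypothetical.
CACHE_DIR = Path.home() / ".movement" / "public_data"


def list_cached_files(cache_dir=CACHE_DIR):
    """Return sorted (relative path, size in bytes) pairs for cached files."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing has been downloaded yet
        return []
    return sorted(
        (p.relative_to(cache_dir).as_posix(), p.stat().st_size)
        for p in cache_dir.rglob("*")
        if p.is_file()
    )


for name, size in list_cached_files():
    print(f"{name}: {size / 1e6:.1f} MB")
```

Deleting a file from this directory should force a fresh download on the next request, assuming the cache is keyed by file presence.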

## Working with the Data

Once loaded, these datasets are returned as standard `movement` xarray Datasets, allowing you to apply all the analysis and visualization tools available in the package:

```python
import matplotlib.pyplot as plt

from movement import kinematics, public_data
from movement.plots import plot_centroid_trajectory

# Fetch data
ds = public_data.fetch_calms21(animal_type="mouse", task="open_field")

# Access position data
position = ds.position

# Compute kinematics
velocity = kinematics.compute_velocity(position)
speed = kinematics.compute_speed(position)

# Visualize the centroid trajectory
fig, ax = plot_centroid_trajectory(position)
plt.show()
```
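
For intuition about the kinematics step, speed is the magnitude of the velocity vector at each time point. The standard-library sketch below illustrates the idea with simple forward differences on a 2D track; `speeds_from_positions` is a hypothetical helper, and the `kinematics` functions in `movement` operate on xarray objects and may use a different differencing scheme.

```python
import math


def speeds_from_positions(xy, fps):
    """Forward-difference speed between consecutive (x, y) samples."""
    dt = 1.0 / fps  # time between frames, in seconds
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(xy, xy[1:])
    ]


# Three frames at 2 frames per second: one 5-unit step, then no movement
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
print(speeds_from_positions(track, fps=2))  # [10.0, 0.0]
```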

## Citation

When using these public datasets in your research, please cite the original papers:

- CalMS21: Sun, J. J., et al. (2021). "The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions". NeurIPS 2021 Datasets and Benchmarks Track. https://arxiv.org/abs/2104.02710

- Rat7M: Dunn, T. W., et al. (2021). "Geometric deep learning enables 3D kinematic profiling across species and environments". Nature Methods, 18(5), 564-573. https://doi.org/10.1038/s41592-021-01106-6
98 changes: 98 additions & 0 deletions examples/public_datasets.py
@@ -0,0 +1,98 @@
"""Working with public datasets
============================

This example demonstrates how to access and work with publicly available
datasets of animal poses and trajectories.
"""

# %%
# Imports
# -------

from movement import public_data

# %%
# Listing available datasets
# --------------------------
# First, let's see what public datasets are available:

datasets = public_data.list_public_datasets()
print("Available public datasets:")
for dataset in datasets:
    info = public_data.get_dataset_info(dataset)
    print(f"\n{dataset}:")
    print(f" Description: {info['description']}")
    print(f" URL: {info['url']}")
    print(f" Paper: {info['paper']}")
    print(f" License: {info['license']}")

# %%
# CalMS21 Dataset
# ---------------
# The CalMS21 dataset contains multi-animal pose tracking data for various
# animal types and behavioral tasks.

# %%
# Let's fetch a subset of the CalMS21 dataset with mice in an open field:

mouse_data = public_data.fetch_calms21(
    subset="train",
    animal_type="mouse",
    task="open_field",
)

# NOTE: This is currently a placeholder implementation.
# In the full implementation, this would download and load actual data.

print("\nDataset attributes:")
for key, value in mouse_data.attrs.items():
    print(f" {key}: {value}")

# %%
# We can also fetch data for different animal types and tasks:

fly_data = public_data.fetch_calms21(
    subset="train",
    animal_type="fly",
    task="courtship",
)

# NOTE: This is currently a placeholder implementation.
# In the full implementation, this would download and load actual data.

print("\nDataset attributes:")
for key, value in fly_data.attrs.items():
    print(f" {key}: {value}")

# %%
# Rat7M Dataset
# -------------
# The Rat7M dataset contains tracking data for multiple rats in complex
# environments.

# %%
# Let's fetch a subset of the Rat7M dataset:

rat_data = public_data.fetch_rat7m(subset="open_field")

# NOTE: This is currently a placeholder implementation.
# In the full implementation, this would download and load actual data.

print("\nDataset attributes:")
for key, value in rat_data.attrs.items():
    print(f" {key}: {value}")

# %%
# Using the data
# --------------
# Once the data is loaded, you can use all the movement functionality
# for analysis and visualization.
#
# NOTE: Since we're currently using placeholder data, we can't demonstrate
# actual analysis here. When the full implementation is complete, this
# example will include code for:
#
# - Visualizing trajectories
# - Computing kinematic measures
# - Analyzing behavioral patterns
# - Comparing across datasets
7 changes: 7 additions & 0 deletions movement/__init__.py
@@ -13,5 +13,12 @@

xr.set_options(keep_attrs=True, display_expand_data=False)


# Expose the public dataset helpers at package level
from movement.public_data import get_dataset_info, list_public_datasets

# Configure logging to stderr and a file
logger.configure()