Tutorials
Example datasets
Generating 3D Data
To generate example data, we will use the built-in pancreas dataset from scvelo. The process of obtaining the velocity vector components is detailed in the scvelo tutorials. We will focus on one key difference that enables generating three-dimensional data.
import scanpy as sc
import scvelo as scv
adata = scv.datasets.pancreas()
adata
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
var: 'highly_variable_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'distances', 'connectivities'
The dataset already contains a UMAP embedding, but it is two-dimensional.
adata.obsm["X_umap"].shape
(3696, 2)
Using scanpy.tl.umap, we will create a three-dimensional UMAP embedding instead. This places the cells in 3D space, and the velocity vectors computed later take their dimensionality from the embedding they are projected onto.
# Recompute the UMAP with three components; this overwrites adata.obsm["X_umap"]
sc.tl.umap(adata, n_components=3)
adata.obsm["X_umap"].shape
(3696, 3)
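# Preprocessing: filter genes, normalize per cell, keep the 2000 most variable genes, log-transform
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)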
adata
AnnData object with n_obs × n_vars = 3696 × 2000
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'umap', 'log1p'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'distances', 'connectivities'
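# Estimate the per-gene velocities first
scv.tl.velocity(adata)
# Build the velocity graph from those velocities
scv.tl.velocity_graph(adata)
# Project the velocities onto the 3D UMAP embedding
scv.tl.velocity_embedding(adata, basis="umap")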
adata
AnnData object with n_obs × n_vars = 3696 × 2000
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'umap', 'log1p', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
obsm: 'X_pca', 'X_umap', 'velocity_umap'
layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
obsp: 'distances', 'connectivities'
The velocity vectors have been computed and are stored in adata.obsm under the key velocity_umap.
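Before exporting, it can help to confirm that the coordinates and the velocity components agree in shape, since the figure recipes later in this tutorial select three coordinate columns (X, Y, Z) and three velocity columns (U, V, W). The helper below is a hypothetical convenience function for illustration, not part of scvelo or Cell Journey; it only inspects plain shape tuples.

```python
def check_3d_shapes(coords_shape, velocity_shape):
    """Raise ValueError unless both shapes are (n_cells, 3) with equal n_cells.

    Hypothetical helper; it works on shape tuples such as
    adata.obsm["X_umap"].shape.
    """
    for name, shape in (("coordinates", coords_shape), ("velocities", velocity_shape)):
        if len(shape) != 2 or shape[1] != 3:
            raise ValueError(f"{name} must have shape (n_cells, 3), got {shape}")
    if coords_shape[0] != velocity_shape[0]:
        raise ValueError(
            f"cell counts differ: {coords_shape[0]} vs {velocity_shape[0]}"
        )
    return coords_shape[0]


# With the objects from this tutorial one would call:
# check_3d_shapes(adata.obsm["X_umap"].shape, adata.obsm["velocity_umap"].shape)
n_cells = check_3d_shapes((3696, 3), (3696, 3))
print(n_cells)
```

Running the same check on the original two-dimensional embedding, shape (3696, 2), would raise a ValueError.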
Reducing the file size
Dash, the framework Cell Journey is built with, has its limitations: loading a very large file can be interrupted automatically. Files for Cell Journey analysis should therefore be stripped of unnecessary data, especially large dense matrices. For the pancreas dataset, it is sufficient to keep what is contained in var, obs, obsm, and the sparse X matrix.
import os
import scanpy as sc
# Keep only X, obs, var, and obsm; layers, uns, and obsp are left out
adata_slim = sc.AnnData(X=adata.X, obs=adata.obs, var=adata.var, obsm=adata.obsm)
adata_slim.write("pancreas_slim.h5ad")
For comparison, we can also save the entire adata dataset.
adata.write("pancreas_full.h5ad")
full_dataset = os.stat("pancreas_full.h5ad")
full_dataset_size = full_dataset.st_size / (1024 ** 2)
slim_dataset = os.stat("pancreas_slim.h5ad")
slim_dataset_size = slim_dataset.st_size / (1024 ** 2)
print(f"Full dataset: {full_dataset_size:.2f} MB, slim dataset: {slim_dataset_size:.2f} MB")
Full dataset: 1756.84 MB, slim dataset: 14.64 MB
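As a rough guide to why dense matrices are worth dropping, the in-memory footprint of a single dense layer can be estimated from its shape. The sketch below assumes 4-byte floats, which is an assumption about the stored dtype. It estimates memory only, so the on-disk size of an .h5ad file may differ, and it is not meant to account for the measured totals above.

```python
def dense_layer_mib(n_obs, n_vars, bytes_per_value=4):
    """Estimate the in-memory size in MiB of a dense n_obs x n_vars matrix."""
    return n_obs * n_vars * bytes_per_value / (1024 ** 2)


# Shape of the filtered pancreas dataset shown above: 3696 cells x 2000 genes.
per_layer = dense_layer_mib(3696, 2000)
print(f"One dense layer: {per_layer:.2f} MiB")
```

Multiplying the result by the number of dense layers gives a lower bound on what those layers add to the object in memory.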
Recreating article figures
Pancreatic endocrinogenesis
- Upload data and select coordinates: Load pancreas.h5ad provided in the datasets directory.
- Upload data and select coordinates: Select X_umap(1), X_umap(2), and X_umap(3) as X, Y, and Z coordinates.
- Upload data and select coordinates: Select velocity_umap(1), velocity_umap(2), and velocity_umap(3) as U, V, and W coordinates.
- Upload data and select coordinates: Click Submit selected coordinates.
- Upload data and select coordinates: Set Target sum to 10000 and click Lognormalize.
- Global plot configuration: change the Axes switch to Hide.
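The Target sum and Lognormalize controls presumably correspond to the common library-size normalization followed by a log transform. That is an assumption, since this tutorial does not spell out the formula Cell Journey applies. Under that assumption, the computation for a single cell can be sketched as follows; lognormalize is a hypothetical name used only for illustration.

```python
import math


def lognormalize(counts, target_sum=10000.0):
    """Scale one cell's counts to sum to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in counts]
    scale = target_sum / total
    return [math.log1p(value * scale) for value in counts]


cell = [0, 3, 1, 6]  # toy raw counts for a single cell
normalized = lognormalize(cell)
print([round(v, 3) for v in normalized])
```

Undoing the log with math.expm1 and summing the values returns the target sum, which is a quick way to check the scaling.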
Figure C (SCATTER PLOT)
- Scatter plot: select clusters_coarse from the Select feature dropdown menu.
- Global plot configuration: change Legend: horizontal position and Legend: vertical position to obtain an optimal position, e.g. 0.50 and 0.30, respectively.
Figure C (CONE PLOT)
- Cone plot: select rainbow from the Color scale dropdown menu.
- Cone plot: set Cone size to 12.00.
Figure C (STREAMLINES)
- Streamline plot: set Grid size to 20, Number of steps to 500, Step size to 2.00, and Difference threshold to 0.001.
- Streamline plot: click Generate trajectories (streamlines and streamlets).
- Streamline plot: uncheck the Combine trajectories with the scatter plot switch.
- Streamline plot: change Line width to 4.0.
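To build intuition for how Number of steps, Step size, and Difference threshold might interact, here is a minimal streamline tracer using fixed-size Euler steps on a toy vector field. It is an illustrative sketch only: the tutorial does not describe Cell Journey's integrator, so the stopping rule, the function names, and the toy field are all assumptions, and Grid size (presumably controlling where trajectories are seeded) is not modelled.

```python
import math


def trace_streamline(start, field, n_steps=500, step_size=2.0, diff_threshold=0.001):
    """Follow a 3D vector field from start using fixed-size Euler steps.

    Tracing stops after n_steps steps or as soon as a step would move the
    point by less than diff_threshold.
    """
    path = [tuple(start)]
    for _ in range(n_steps):
        x, y, z = path[-1]
        u, v, w = field(x, y, z)
        step = (step_size * u, step_size * v, step_size * w)
        if math.sqrt(sum(c * c for c in step)) < diff_threshold:
            break  # the trajectory has effectively stalled
        path.append((x + step[0], y + step[1], z + step[2]))
    return path


def toy_field(x, y, z):
    """Toy velocity field that pulls every point toward the origin."""
    return (-0.1 * x, -0.1 * y, -0.1 * z)


streamline = trace_streamline((1.0, 1.0, 1.0), toy_field)
print(len(streamline), streamline[-1])
```

In this toy field the tracer stops well before the step limit, because the steps shrink as the point approaches the origin. A larger threshold would end the trajectory earlier.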
Figure C (STREAMLETS)
- Repeat the steps for Figure C (STREAMLINES).
- Streamline plot: change Show streamlines to Show streamlets.
- Streamline plot: set Streamlets length to 10.
- Streamline plot: click Update streamlets.
- Streamline plot: change Color scale to Reds.
Figure C (SCATTER + VOLUME PLOT)
- Scatter plot: input Serping1 in the Modality feature field.
- Scatter plot: select Turbo from the Built-in continuous color scale dropdown menu.
- Scatter plot: change Add volume plot to continuous feature and Single color scatter when volume is plotted to ON.
- Scatter plot: select the second color from the left in the second row of the suggested colors (light grey box).
- Scatter plot: select linear from the Radial basis function dropdown menu.
- Scatter plot: change Point size to 1.00, Volume plot transparency cut-off quantile to 50, Volume plot opacity to 0.06, Gaussian filter standard deviation multiplier to 2.00, and Radius scaler to 1.300.
Figure C (SCATTER + STREAMLINES)
- Repeat the steps for Figure C (STREAMLINES).
- Streamline plot: change Combine trajectories with the scatter plot to ON.
- Streamline plot: set Subset current number of trajectories to 70 and click Confirm.
- Scatter plot: change Built-in continuous color scale to Balance.
Figure B
- Scatter plot: select clusters from the Select feature dropdown menu.
- Scatter plot: change Use custom color palette to ON, and paste #1DACD6 #FFAACC #66FF66 #0066FF #FF7A00 #FC2847 #FDFF00 #000000 into the Space-separated list of color hex values (max 20 colors) field.
- Streamline plot: change Color scale to Greys, and Line width to 10.0.
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Number of clusters to 8, Number of automatically selected features to 200, Tube segments to 25, Features activities shown in heatmap to Relative to first segment, and Highlight selected cells to Don't highlight.
- Click on a random cell from the Ngn3 low EP cluster. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
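When preparing a custom palette for the space-separated field used above, it can be convenient to check the string before pasting it. The sketch below is a hypothetical helper, not part of Cell Journey. It assumes that only six-digit #RRGGBB values are wanted, and it takes the limit of 20 colors from the field label.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_palette(text, max_colors=20):
    """Split a space-separated palette string and validate each #RRGGBB entry."""
    colors = text.split()
    if len(colors) > max_colors:
        raise ValueError(f"expected at most {max_colors} colors, got {len(colors)}")
    for color in colors:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a #RRGGBB hex color: {color!r}")
    return colors


palette = parse_palette(
    "#1DACD6 #FFAACC #66FF66 #0066FF #FF7A00 #FC2847 #FDFF00 #000000"
)
print(len(palette))  # 8
```

A malformed entry such as a five-digit value raises a ValueError that names the offending color.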
Bone marrow mononuclear progenitors
- Upload data and select coordinates: Load bone_marrow.h5ad provided in the datasets directory.
- Upload data and select coordinates: Select RNA: X_umap(1), RNA: X_umap(2), and RNA: X_umap(3) as X, Y, and Z coordinates.
- Upload data and select coordinates: Select RNA: velocity_umap(1), RNA: velocity_umap(2), and RNA: velocity_umap(3) as U, V, and W coordinates.
- Upload data and select coordinates: Click Submit selected coordinates.
- Upload data and select coordinates: Select RNA modality, set Target sum to 10000, and click Lognormalize. Select ADT modality, set Target sum to 10000, and click Lognormalize.
- Global plot configuration: change the Axes switch to Hide.
Figure D (SCATTER PLOT + TRAJECTORY + TUBE CELLS)
- Global plot configuration: change the Legend switch to Hide.
- Scatter plot: change Point size to 1.00.
- Scatter plot: select the second color from the left in the second row of the suggested colors (light grey box).
- Streamline plot: set Grid size to 25, Number of steps to 500, and Difference threshold to 0.001, then click Generate trajectories (streamlines and streamlets).
- Streamline plot: change Show streamlines to Show streamlets, set Streamlets length to 10, and click Update streamlets.
- Streamline plot: change Color scale to Jet.
- Scatter plot: select ADT from the Modality dropdown menu and input CD34 in the field below.
- Scatter plot: select Reds from the Built-in continuous color scale field.
- Scatter plot: change Add volume plot to continuous feature and Single color scatter when volume is plotted to ON.
- Scatter plot: set Volume plot transparency cut-off quantile to 55, Volume plot opacity to 0.04, Gaussian filter standard deviation multiplier to 3.00, and Radius scaler to 1.300.
Figure D (SCATTER + STREAMLETS + VOLUME PLOT)
- Streamline plot: change Line width to 5.0.
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Tube segments to 5 and Highlight selected cells to Each segment separately.
- Global plot configuration: change Legend: horizontal position and Legend: vertical position to obtain an optimal position, e.g. 0.20 in both cases.
Figure D (RNA MODALITY HEATMAP)
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Step size to 2.00, Tube segments to 20, Number of clusters to 8, Number of automatically selected features to 50, and Heatmap color scale to Inferno.
- Scatter plot: select RNA from the Modality dropdown menu.
- Click on a random cell from the center of the point cloud. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
- Cell Journey (trajectory): select Box plot from the Plot type dropdown menu, and set Trendline to Median-based cubic spline.
- Find the HBB gene by hovering over the heatmap. Click on any segment to obtain Figure D (RNA: HBB ALONG TRAJECTORY).
Figure D (ADT MODALITY HEATMAP)
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Step size to 2.00, Tube segments to 20, Number of clusters to 3, Number of automatically selected features to 10, and Heatmap color scale to Inferno.
- Scatter plot: select ADT from the Modality dropdown menu.
- Click on a random cell from the center of the point cloud. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
- Cell Journey (trajectory): select Box plot from the Plot type dropdown menu, and set Trendline to Median-based cubic spline.
- Find CD34 by hovering over the heatmap. Click on any segment to obtain Figure D (ADT: CD34 ALONG TRAJECTORY).