Tutorials
Generating 3D Data
To generate example data, we will use the built-in pancreas dataset from scvelo. The process of obtaining the velocity vector components is detailed in the scvelo tutorials. We will focus on one key difference that enables generating three-dimensional data.
import scanpy as sc
import scvelo as scv

adata = scv.datasets.pancreas()
adata
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
var: 'highly_variable_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'distances', 'connectivities'
The dataset already contains a UMAP embedding, but it is two-dimensional.
adata.obsm["X_umap"].shape
(3696, 2)
Using scanpy.tl.umap, we will create a three-dimensional UMAP embedding instead. The cells can then be placed in 3D space, and the velocity vectors will be computed with the same dimensionality as the embedding they are projected onto.
sc.tl.umap(adata, n_components=3)
adata.obsm["X_umap"].shape
(3696, 3)
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)
adata
AnnData object with n_obs × n_vars = 3696 × 2000
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'umap', 'log1p'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'distances', 'connectivities'
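# Estimate velocities first; the velocity graph is built from them.
scv.tl.velocity(adata)
scv.tl.velocity_graph(adata)
# Project the velocities onto the 3D UMAP embedding.
scv.tl.velocity_embedding(adata, basis='umap')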
adata
AnnData object with n_obs × n_vars = 3696 × 2000
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'umap', 'log1p', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
obsm: 'X_pca', 'X_umap', 'velocity_umap'
layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
obsp: 'distances', 'connectivities'
The velocity vectors have been successfully determined and are located in obsm as velocity_umap.
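Before moving on, it can be worth confirming that the coordinates and the velocity vectors really have the same shape. The sketch below is one way to do that; `embedding_dims` is a hypothetical helper (not part of scanpy or scvelo), and the stand-in dictionary only mimics the shapes reported above so that the example runs without either library. In practice, `adata.obsm` could be passed in its place.

```python
from types import SimpleNamespace


def embedding_dims(obsm, basis="umap"):
    """Return (n_cells, n_dims) after checking coordinates and velocities agree.

    `obsm` is any mapping whose values expose a `.shape` tuple, such as
    `adata.obsm`. This helper is hypothetical, not a scanpy or scvelo call.
    """
    coords = obsm[f"X_{basis}"].shape
    velocity = obsm[f"velocity_{basis}"].shape
    if coords != velocity:
        raise ValueError(
            f"shape mismatch: X_{basis} {coords} vs velocity_{basis} {velocity}"
        )
    return coords


# Stand-in for adata.obsm, using the shapes shown in the tutorial output.
fake_obsm = {
    "X_umap": SimpleNamespace(shape=(3696, 3)),
    "velocity_umap": SimpleNamespace(shape=(3696, 3)),
}
n_cells, n_dims = embedding_dims(fake_obsm)
```

A mismatch usually means the velocity embedding was computed before the 3D UMAP replaced the original 2D one.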
Reducing the file size
Cell Journey is built with Dash, which has its limitations: loading very large files can be automatically interrupted. Therefore, files for the Cell Journey analysis should be stripped of unnecessary data, especially large dense matrices. For the pancreas dataset it is sufficient to limit the data to what is contained in var, obs, obsm, and the sparse X matrix.
adata_slim = sc.AnnData(X=adata.X, obs=adata.obs, var=adata.var, obsm=adata.obsm)
adata_slim.write("pancreas_slim.h5ad")
For comparison, we can also save the entire adata dataset.
import os

adata.write("pancreas_full.h5ad")
full_dataset = os.stat("pancreas_full.h5ad")
full_dataset_size = full_dataset.st_size / (1024 ** 2)
slim_dataset = os.stat("pancreas_slim.h5ad")
slim_dataset_size = slim_dataset.st_size / (1024 ** 2)
print(f"Full dataset: {full_dataset_size:.2f} MB, slim dataset: {slim_dataset_size:.2f} MB")
Full dataset: 1756.84 MB, slim dataset: 14.64 MB
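The size comparison above can also be wrapped in a small reusable helper. This is a minimal sketch: `file_size_mb` and `size_report` are hypothetical names, and the demonstration writes a throwaway 2 MB file instead of touching the real .h5ad files.

```python
import os
import tempfile


def file_size_mb(path):
    """Return the size of `path` in MB, using 1 MB = 1024 ** 2 bytes as above."""
    return os.stat(path).st_size / (1024 ** 2)


def size_report(paths):
    """Format one 'name: size MB' entry per file."""
    return ", ".join(
        f"{os.path.basename(p)}: {file_size_mb(p):.2f} MB" for p in paths
    )


# Demonstration on a temporary 2 MB file rather than the real datasets.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.h5ad")
    with open(demo_path, "wb") as handle:
        handle.write(b"\0" * (2 * 1024 ** 2))
    demo_report = size_report([demo_path])
```

With the files from this tutorial, `size_report(["pancreas_full.h5ad", "pancreas_slim.h5ad"])` would produce a comparable summary.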
Recreating article figures
Pancreatic endocrinogenesis
- Upload data and select coordinates: Load pancreas.h5ad provided in the datasets directory.
- Upload data and select coordinates: Select X_umap(1), X_umap(2), and X_umap(3) as X, Y, and Z coordinates.
- Upload data and select coordinates: Select velocity_umap(1), velocity_umap(2), and velocity_umap(3) as U, V, and W coordinates.
- Upload data and select coordinates: Click Submit selected coordinates.
- Upload data and select coordinates: Set Target sum to 10000 and click Lognormalize.
- Global plot configuration: change the Axes switch to Hide.
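For readers unfamiliar with the Target sum setting, the sketch below shows what log-normalization commonly means: each cell's counts are rescaled so that they add up to the target sum, and then log(1 + x) is applied. It assumes that Lognormalize follows this usual convention, which the steps above do not spell out; `lognormalize` is a hypothetical function written for illustration.

```python
import math


def lognormalize(counts, target_sum=10000):
    """Scale one cell's counts to `target_sum`, then apply log(1 + x).

    Assumed behavior, modeled on the common total-count normalization.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


cell = [0, 5, 15, 80]            # raw counts for four genes in one cell
normalized = lognormalize(cell)  # counts sum to 10000 before the log step
```

Zero counts stay at zero, so the sparsity of the matrix is preserved.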
Figure C (SCATTER PLOT)
- Scatter plot: select clusters_coarse from the Select feature dropdown menu.
- Global plot configuration: change Legend: horizontal position and Legend: vertical position to obtain an optimal position, e.g. 0.50 and 0.30, respectively.
Figure C (CONE PLOT)
- Cone plot: select rainbow from the Color scale dropdown menu.
- Cone plot: set Cone size to 12.00.
Figure C (STREAMLINES)
- Streamline plot: set Grid size to 20, Number of steps to 500, Step size to 2.00, and Difference threshold to 0.001.
- Streamline plot: click Generate trajectories (streamlines and streamlets).
- Streamline plot: uncheck the Combine trajectories with the scatter plot switch.
- Streamline plot: change Line width to 4.0.
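To give an intuition for how Number of steps, Step size, and Difference threshold might interact, here is a minimal sketch of streamline tracing with fixed-size Euler steps through a toy velocity field. It is an illustration under stated assumptions, not Cell Journey's implementation: the parameter semantics are guessed from their names, `trace_streamline` and `toy_field` are hypothetical, and Grid size (presumably the density of starting points) is not modeled.

```python
import math


def trace_streamline(start, field, n_steps=500, step_size=2.0, diff_threshold=0.001):
    """Follow `field` from `start` with fixed-size Euler steps.

    Stops after `n_steps`, or earlier once a step would move the point by
    less than `diff_threshold`. Parameter meanings are assumptions.
    """
    point = tuple(start)
    path = [point]
    for _ in range(n_steps):
        u, v, w = field(*point)
        nxt = (
            point[0] + step_size * u,
            point[1] + step_size * v,
            point[2] + step_size * w,
        )
        if math.dist(point, nxt) < diff_threshold:
            break
        path.append(nxt)
        point = nxt
    return path


def toy_field(x, y, z):
    """Velocity pointing toward the origin, weaker the closer the point is."""
    return (-0.1 * x, -0.1 * y, -0.1 * z)


line = trace_streamline((10.0, 0.0, 0.0), toy_field)
```

In this toy field the trajectory stalls near the origin, so the threshold ends the trace long before the 500-step limit is reached.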
Figure C (STREAMLETS)
- Repeat the steps for Figure C (STREAMLINES).
- Streamline plot: change Show streamlines to Show streamlets.
- Streamline plot: set Streamlets length to 10.
- Streamline plot: click Update streamlets.
- Streamline plot: change Color scale to Reds.
Figure C (SCATTER + VOLUME PLOT)
- Scatter plot: input Serping1 in the Modality feature field.
- Scatter plot: select Turbo from the Built-in continuous color scale dropdown menu.
- Scatter plot: change Add volume plot to continuous feature and Single color scater when volume is plotted to ON.
- Scatter plot: select the second color from the left in the second row of the suggested colors (light grey box).
- Scatter plot: select linear from the Radial basis function dropdown menu.
- Scatter plot: change Point size to 1.00, Volume plot transparency cut-off quantile to 50, Volume plot opacity to 0.06, Gaussian filter standard deviation multiplier to 2.00, and Radius scaler to 1.300.
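One plausible reading of the Volume plot transparency cut-off quantile is that values below the chosen quantile are rendered fully transparent, while the rest receive the configured opacity. The sketch below illustrates that idea only; how Cell Journey actually applies the setting is an assumption here, and `opacity_mask` is a hypothetical helper.

```python
from statistics import quantiles


def opacity_mask(values, cutoff_quantile=50, opacity=0.06):
    """Assign zero opacity below the cut-off quantile and `opacity` elsewhere.

    `cutoff_quantile` is a percentile between 1 and 99. Assumed semantics.
    """
    percentiles = quantiles(values, n=100, method="inclusive")
    threshold = percentiles[cutoff_quantile - 1]
    return [opacity if v >= threshold else 0.0 for v in values]


# Made-up feature values; the 50th percentile lies between 0.4 and 0.9.
expression = [0.0, 0.1, 0.4, 0.9, 1.6, 2.5]
mask = opacity_mask(expression)
```

Under this reading, raising the cut-off hides more of the low-expression volume and leaves only the strongest signal visible.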
Figure C (SCATTER + STREAMLINES)
- Repeat the steps for Figure C (STREAMLINES).
- Streamline plot: change Combine trajectories with the scatter plot to ON.
- Streamline plot: set Subset current number of trajectories to 70 and click Confirm.
- Scatter plot: change Built-in continuous color scale to Balance.
Figure B
- Scatter plot: select clusters from the Select feature dropdown menu.
- Scatter plot: change Use custom color palette to ON, and paste #1DACD6 #FFAACC #66FF66 #0066FF #FF7A00 #FC2847 #FDFF00 #000000 into the Space-separated list of color hex values (max 20 colors) field.
- Streamline plot: change Color scale to Greys, and Line width to 10.0.
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Number of clusters to 8, Number of automatically selected features to 200, Tube segments to 25, Features activities shown in heatmap to Relative to first segment, and Highlight selected cells to Don't highlight.
- Click on a random cell from the Ngn3 low EP cluster. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
Bone marrow mononuclear progenitors
- Upload data and select coordinates: Load bone_marrow.h5ad provided in the datasets directory.
- Upload data and select coordinates: Select RNA: X_umap(1), RNA: X_umap(2), and RNA: X_umap(3) as X, Y, and Z coordinates.
- Upload data and select coordinates: Select RNA: velocity_umap(1), RNA: velocity_umap(2), and RNA: velocity_umap(3) as U, V, and W coordinates.
- Upload data and select coordinates: Click Submit selected coordinates.
- Upload data and select coordinates: Select RNA modality, set Target sum to 10000, and click Lognormalize. Select ADT modality, set Target sum to 10000, and click Lognormalize.
- Global plot configuration: change the Axes switch to Hide.
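The coordinate names in the steps above follow a visible pattern: the obsm key, a 1-based column index in parentheses, and, for multimodal data, a modality prefix. The sketch below reproduces that pattern so the expected dropdown entries can be listed for any embedding. The pattern is inferred from the names shown in this tutorial rather than taken from Cell Journey's code, and `coordinate_labels` is a hypothetical helper.

```python
def coordinate_labels(key, n_dims=3, modality=None):
    """Build labels such as 'X_umap(1)' or 'RNA: X_umap(1)' for an obsm key."""
    prefix = f"{modality}: " if modality else ""
    return [f"{prefix}{key}({i})" for i in range(1, n_dims + 1)]


pancreas_xyz = coordinate_labels("X_umap")
bone_marrow_uvw = coordinate_labels("velocity_umap", modality="RNA")
```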
Figure D (SCATTER PLOT + TRAJECTORY + TUBE CELLS)
- Global plot configuration: change the Legend switch to Hide.
- Scatter plot: change Point size to 1.00.
- Scatter plot: select the second color from the left in the second row of the suggested colors (light grey box).
- Streamline plot: set Grid size to 25, Number of steps to 500, and Difference threshold to 0.001, then click Generate trajectories (streamlines and streamlets).
- Streamline plot: change Show streamlines to Show streamlets, set Streamlets length to 10, and click Update streamlets.
- Streamline plot: change Color scale to Jet.
- Scatter plot: select ADT from the Modality dropdown menu and input CD34 in the field below.
- Scatter plot: select Reds from the Built-in continuous color scale field.
- Scatter plot: change Add volume plot to continuous feature and Single color scater when volume is plotted to ON.
- Scatter plot: set Volume plot transparency cut-off quantile to 55, Volume plot opacity to 0.04, Gaussian filter standard deviation multiplier to 3.00, and Radius scaler to 1.300.
Figure D (SCATTER + STREAMLETS + VOLUME PLOT)
- Streamline plot: change Line width to 5.0.
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Tube segments to 5 and Highlight selected cells to Each segment separately.
- Global plot configuration: change Legend: horizontal position and Legend: vertical position to obtain an optimal position, e.g. 0.20 in both cases.
Figure D (RNA MODALITY HEATMAP)
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Step size to 2.00, Tube segments to 20, Number of clusters to 8, Number of automatically selected features to 50, and Heatmap color scale to Inferno.
- Scatter plot: select RNA from the Modality dropdown menu.
- Click on a random cell from the center of the point cloud. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
- Cell Journey (trajectory): select Box plot from the Plot type dropdown menu, and set Trendline to Median-based cubic spline.
- Find the HBB gene by hovering over the heatmap. Click on any segment to obtain Figure D (RNA: HBB ALONG TRAJECTORY).
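The Median-based cubic spline trendline presumably starts from one median per tube segment. The sketch below computes only those per-segment medians from made-up values; the spline fit itself is omitted, and both the reading of the option and the `segment_medians` helper are assumptions made for illustration.

```python
from statistics import median


def segment_medians(segments, values):
    """Median of `values` within each tube segment, ordered by segment index."""
    grouped = {}
    for seg, val in zip(segments, values):
        grouped.setdefault(seg, []).append(val)
    return [(seg, median(grouped[seg])) for seg in sorted(grouped)]


# Made-up expression values for six cells spread over three segments.
segments = [0, 0, 1, 1, 2, 2]
expression = [0.5, 1.5, 1.0, 3.0, 2.5, 3.5]
trend_points = segment_medians(segments, expression)
```

Medians are less sensitive than means to the few highly expressing cells that often appear within a segment, which is one reason a median-based trendline can be preferable for box plots.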
Figure D (ADT MODALITY HEATMAP)
- Cell Journey (trajectory): click Generate grid.
- Cell Journey (trajectory): set Step size to 2.00, Tube segments to 20, Number of clusters to 3, Number of automatically selected features to 10, and Heatmap color scale to Inferno.
- Scatter plot: select ADT from the Modality dropdown menu.
- Click on a random cell from the center of the point cloud. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
- Cell Journey (trajectory): select Box plot from the Plot type dropdown menu, and set Trendline to Median-based cubic spline.
- Find CD34 by hovering over the heatmap. Click on any segment to obtain Figure D (ADT: CD34 ALONG TRAJECTORY).