Tutorials

Generating 3D Data

To generate example data, we will use the built-in pancreas dataset from scvelo. The process of obtaining the velocity vector components is detailed in the scvelo tutorials. We will focus on one key difference that enables generating three-dimensional data.

import scanpy as sc
import scvelo as scv

adata = scv.datasets.pancreas()
adata
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'

The dataset already contains a UMAP embedding, but it is two-dimensional.

adata.obsm["X_umap"].shape
(3696, 2)

Using scanpy.tl.umap, we will create a three-dimensional UMAP embedding instead. This lets us place the cells in 3D space, and the velocity vectors computed later will have the same dimensionality as the specified embedding.

sc.tl.umap(adata, n_components=3)
adata.obsm["X_umap"].shape
(3696, 3)

scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)
adata
AnnData object with n_obs × n_vars = 3696 × 2000
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'umap', 'log1p'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'

# Estimate velocities first; the velocity graph is built from them.
scv.tl.velocity(adata)
scv.tl.velocity_graph(adata)
scv.tl.velocity_embedding(adata, basis='umap')
adata
AnnData object with n_obs × n_vars = 3696 × 2000
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'umap', 'log1p', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
    obsm: 'X_pca', 'X_umap', 'velocity_umap'
    layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
    obsp: 'distances', 'connectivities'

The velocity vectors have been computed and are stored in obsm under the key velocity_umap.
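Before exporting, it can help to confirm that the coordinates and the velocity vectors are both three-dimensional and have one row per cell. The sketch below is a minimal check and not part of scvelo or Cell Journey: the helper name check_3d_basis is hypothetical, and a plain dictionary of random arrays stands in for adata.obsm so the example runs on its own.

```python
import numpy as np


def check_3d_basis(obsm, basis="umap"):
    """Return (coordinates, velocities) after verifying both are (n_cells, 3).

    `obsm` is any mapping with "X_<basis>" and "velocity_<basis>" entries,
    for example adata.obsm. The helper itself is hypothetical.
    """
    coords = np.asarray(obsm[f"X_{basis}"])
    vel = np.asarray(obsm[f"velocity_{basis}"])
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError(f"X_{basis} has shape {coords.shape}, expected (n_cells, 3)")
    if vel.shape != coords.shape:
        raise ValueError(f"velocity_{basis} has shape {vel.shape}, expected {coords.shape}")
    return coords, vel


# Synthetic stand-in for adata.obsm (assumption: 5 cells, 3 dimensions).
rng = np.random.default_rng(0)
fake_obsm = {
    "X_umap": rng.normal(size=(5, 3)),
    "velocity_umap": rng.normal(size=(5, 3)),
}
coords, vel = check_3d_basis(fake_obsm)
print(coords.shape, vel.shape)
```

With a real dataset, the call would be check_3d_basis(adata.obsm), which raises an error if the embedding is still two-dimensional.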

Reducing the file size

Dash, the framework used to build Cell Journey, has its limitations: the upload of a very large file may be interrupted automatically. Files intended for Cell Journey analysis should therefore be stripped of unnecessary data, especially large dense matrices. For the pancreas dataset, it is sufficient to keep only var, obs, obsm, and the sparse X matrix.

adata_slim = sc.AnnData(X=adata.X, obs=adata.obs, var=adata.var, obsm=adata.obsm)
adata_slim.write("pancreas_slim.h5ad")

For comparison, we can also save the entire adata dataset.

import os

adata.write("pancreas_full.h5ad")
full_dataset = os.stat("pancreas_full.h5ad")
full_dataset_size = full_dataset.st_size / (1024 ** 2)
slim_dataset = os.stat("pancreas_slim.h5ad")
slim_dataset_size = slim_dataset.st_size / (1024 ** 2)
print(f"Full dataset: {full_dataset_size:.2f} MB, slim dataset: {slim_dataset_size:.2f} MB")
Full dataset: 1756.84 MB, slim dataset: 14.64 MB
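The size comparison above can be wrapped in a small reusable helper. This is only a sketch: the function names size_in_mb and compare_sizes are hypothetical, and two temporary files of known size stand in for the .h5ad files so the example runs without the dataset.

```python
import tempfile
from pathlib import Path


def size_in_mb(path):
    """Return the size of a file in mebibytes (1 MB = 1024 ** 2 bytes here)."""
    return Path(path).stat().st_size / (1024 ** 2)


def compare_sizes(full_path, slim_path):
    """Return (full size, slim size, full/slim ratio) for two files."""
    full_mb = size_in_mb(full_path)
    slim_mb = size_in_mb(slim_path)
    return full_mb, slim_mb, full_mb / slim_mb


# Demo with placeholder files; replace the paths with the real .h5ad files.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "full.bin"
    slim = Path(tmp) / "slim.bin"
    full.write_bytes(b"\0" * (4 * 1024 * 1024))
    slim.write_bytes(b"\0" * (1024 * 1024))
    full_mb, slim_mb, ratio = compare_sizes(full, slim)

print(f"Full: {full_mb:.2f} MB, slim: {slim_mb:.2f} MB, ratio: {ratio:.1f}x")
# prints "Full: 4.00 MB, slim: 1.00 MB, ratio: 4.0x"
```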

Recreating article figures

Pancreatic endocrinogenesis

  1. Upload data and select coordinates: Load pancreas.h5ad provided in the datasets directory.
  2. Upload data and select coordinates: Select X_umap(1), X_umap(2), and X_umap(3) as X, Y, and Z coordinates.
  3. Upload data and select coordinates: Select velocity_umap(1), velocity_umap(2), and velocity_umap(3) as U, V, and W coordinates.
  4. Upload data and select coordinates: Click Submit selected coordinates.
  5. Upload data and select coordinates: Set Target sum to 10000 and click Lognormalize.
  6. Global plot configuration: change Axes switch to Hide.
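Step 5 above applies a log-normalization with a target sum of 10000. As an illustration of what such a transformation usually involves, the sketch below scales each cell's counts to the target sum and then applies log(1 + x). This is the common convention and an assumption here; the exact computation performed by the Lognormalize button is not described in this tutorial, and the function name lognormalize is hypothetical.

```python
import numpy as np


def lognormalize(counts, target_sum=10_000):
    """Scale every row (cell) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells at zero instead of dividing by zero
    return np.log1p(counts / totals * target_sum)


# Toy count matrix: 2 cells x 3 features.
counts = np.array([[1, 3, 0],
                   [10, 0, 10]])
normalized = lognormalize(counts)

# Undoing the log shows that every cell now sums to the target.
row_sums = np.expm1(normalized).sum(axis=1)
print(row_sums)
```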

Figure C (SCATTER PLOT)

  1. Scatter plot: select clusters_coarse from the Select feature dropdown menu.
  2. Global plot configuration: change Legend: horizontal position and Legend: vertical position to obtain an optimal position, e.g. 0.50 and 0.30, respectively.

Figure C (CONE PLOT)

  1. Cone plot: select rainbow from the Color scale dropdown menu.
  2. Cone plot: set Cone size to 12.00.

Figure C (STREAMLINES)

  1. Streamline plot: set Grid size to 20, Number of steps to 500, Step size to 2.00, and Difference threshold to 0.001.
  2. Streamline plot: click Generate trajectories (streamlines and streamlets).
  3. Streamline plot: uncheck Combine trajectories with the scatter plot switch.
  4. Streamline plot: change Line width to 4.0.
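The parameters in step 1 resemble those of a standard numerical streamline integration. As a rough illustration only (Cell Journey's actual implementation may differ), the sketch below traces a single streamline with forward Euler steps. It assumes that Number of steps caps the iterations, Step size multiplies the local velocity, and Difference threshold stops the tracing once a step barely moves the point. Grid size, which presumably controls the starting points, is not modeled. The function integrate_streamline and the analytic sink field are hypothetical.

```python
import numpy as np


def integrate_streamline(field, start, n_steps=500, step_size=2.0, diff_threshold=0.001):
    """Trace one streamline with forward Euler steps.

    field: callable mapping a 3D point to a 3D velocity vector.
    Stops after `n_steps` or once a step moves the point by less than
    `diff_threshold`.
    """
    points = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        current = points[-1]
        nxt = current + step_size * np.asarray(field(current), dtype=float)
        if np.linalg.norm(nxt - current) < diff_threshold:
            break
        points.append(nxt)
    return np.array(points)


def sink(point):
    """Toy velocity field pointing towards the origin (assumption for the demo)."""
    return -0.1 * point


line = integrate_streamline(sink, start=[1.0, 2.0, 3.0])
print(line.shape, np.linalg.norm(line[-1]))
```

In this toy field, every step shrinks the distance to the origin by a constant factor, so the tracing ends through the threshold long before the step limit is reached.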

Figure C (STREAMLETS)

  1. Repeat the steps for Figure C (STREAMLINES).
  2. Streamline plot: change Show streamlines to Show streamlets.
  3. Streamline plot: set Streamlets length to 10.
  4. Streamline plot: click Update streamlets.
  5. Streamline plot: change Color scale to Reds.

Figure C (SCATTER + VOLUME PLOT)

  1. Scatter plot: input Serping1 in the Modality feature field.
  2. Scatter plot: select Turbo from the Built-in continuous color scale dropdown menu.
  3. Scatter plot: change Add volume plot to continuous feature and Single color scatter when volume is plotted to ON.
  4. Scatter plot: select the second color from the left in the second row of the suggested colors (light grey box).
  5. Scatter plot: select linear from the Radial basis function dropdown menu.
  6. Scatter plot: change Point size to 1.00, Volume plot transparency cut-off quantile to 50, Volume plot opacity to 0.06, Gaussian filter standard deviation multiplier to 2.00, and Radius scaler to 1.300.

Figure C (SCATTER + STREAMLINES)

  1. Repeat the steps for Figure C (STREAMLINES).
  2. Streamline plot: change Combine trajectories with the scatter plot to ON.
  3. Streamline plot: set Subset current number of trajectories to 70 and click Confirm.
  4. Scatter plot: change Built-in continuous color scale to Balance.

Figure B

  1. Scatter plot: select clusters from the Select feature dropdown menu.
  2. Scatter plot: change Use custom color palette to ON, and paste #1DACD6 #FFAACC #66FF66 #0066FF #FF7A00 #FC2847 #FDFF00 #000000 into the Space-separated list of color hex values (max 20 colors) field.
  3. Streamline plot: change Color scale to Greys, and Line width to 10.0.
  4. Cell Journey (trajectory): click Generate grid.
  5. Cell Journey (trajectory): set Number of clusters to 8, Number of automatically selected features to 200, Tube segments to 25, Features activities shown in heatmap to Relative to first segment, and Highlight selected cells to Don't highlight.
  6. Click on a random cell from the Ngn3 low EP cluster. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
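The custom palette in step 2 is entered as a space-separated list of hex values with a stated maximum of 20 colors. A list can be checked before pasting it with a few lines of Python. This is a minimal sketch: the helper parse_palette is hypothetical, and it assumes six-digit #RRGGBB values as used in the list above.

```python
import re

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def parse_palette(text, max_colors=20):
    """Split a space-separated palette and validate every entry."""
    colors = text.split()
    invalid = [color for color in colors if not HEX_COLOR.match(color)]
    if invalid:
        raise ValueError(f"Not valid #RRGGBB values: {invalid}")
    if len(colors) > max_colors:
        raise ValueError(f"{len(colors)} colors given, at most {max_colors} allowed")
    return colors


palette = parse_palette(
    "#1DACD6 #FFAACC #66FF66 #0066FF #FF7A00 #FC2847 #FDFF00 #000000"
)
print(len(palette), palette[0], palette[-1])
# prints "8 #1DACD6 #000000"
```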

Bone marrow mononuclear progenitors

  1. Upload data and select coordinates: Load bone_marrow.h5ad provided in the datasets directory.
  2. Upload data and select coordinates: Select RNA: X_umap(1), RNA: X_umap(2), and RNA: X_umap(3) as X, Y, and Z coordinates.
  3. Upload data and select coordinates: Select RNA: velocity_umap(1), RNA: velocity_umap(2), and RNA: velocity_umap(3) as U, V, and W coordinates.
  4. Upload data and select coordinates: Click Submit selected coordinates.
  5. Upload data and select coordinates: Select RNA modality, set Target sum to 10000, and click Lognormalize. Select ADT modality, set Target sum to 10000, and click Lognormalize.
  6. Global plot configuration: change Axes switch to Hide.
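The coordinate labels in steps 2 and 3 combine an optional modality, an obsm key, and a component number. The sketch below splits such a label into its parts and converts the component number to a zero-based column index, which is how the column would be addressed in a NumPy array. It assumes the label format shown in this tutorial and that the number in parentheses is a one-based column position; parse_label is a hypothetical helper, not a Cell Journey function.

```python
import re

LABEL_PATTERN = re.compile(
    r"^(?:(?P<modality>[^:]+):\s*)?(?P<key>\w+)\s*\((?P<index>\d+)\)$"
)


def parse_label(label):
    """Split e.g. 'RNA: X_umap(1)' into ('RNA', 'X_umap', 0)."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized coordinate label: {label!r}")
    # The label counts components from 1; array columns count from 0.
    column = int(match.group("index")) - 1
    return match.group("modality"), match.group("key"), column


print(parse_label("RNA: X_umap(1)"))
# prints "('RNA', 'X_umap', 0)"
print(parse_label("velocity_umap(3)"))
# prints "(None, 'velocity_umap', 2)"
```

Under these assumptions, RNA: velocity_umap(2) would correspond to column 1 of the velocity_umap array of the RNA modality.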

Figure D (SCATTER + STREAMLETS + VOLUME PLOT)

  1. Global plot configuration: change Legend switch to Hide.
  2. Scatter plot: change Point size to 1.00.
  3. Scatter plot: select the second color from the left in the second row of the suggested colors (light grey box).
  4. Streamline plot: set Grid size to 25, Number of steps to 500, and Difference threshold to 0.001, then click Generate trajectories (streamlines and streamlets).
  5. Streamline plot: change Show streamlines to Show streamlets, set Streamlets length to 10, and click Update streamlets.
  6. Streamline plot: change Color scale to Jet.
  7. Scatter plot: select ADT from the Modality dropdown menu and input CD34 in the field below.
  8. Scatter plot: select Reds from the Built-in continuous color scale field.
  9. Scatter plot: change Add volume plot to continuous feature and Single color scatter when volume is plotted to ON.
  10. Scatter plot: set Volume plot transparency cut-off quantile to 55, Volume plot opacity to 0.04, Gaussian filter standard deviation multiplier to 3.00, and Radius scaler to 1.300.

Figure D (SCATTER PLOT + TRAJECTORY + TUBE CELLS)

  1. Streamline plot: change Line width to 5.0.
  2. Cell Journey (trajectory): click Generate grid.
  3. Cell Journey (trajectory): set Tube segments to 5 and Highlight selected cells to Each segment separately.
  4. Global plot configuration: change Legend: horizontal position and Legend: vertical position to obtain an optimal position, e.g. 0.20 in both cases.

Figure D (RNA MODALITY HEATMAP)

  1. Cell Journey (trajectory): click Generate grid.
  2. Cell Journey (trajectory): set Step size to 2.00, Tube segments to 20, Number of clusters to 8, Number of automatically selected features to 50, and Heatmap color scale to Inferno.
  3. Scatter plot: select RNA from the Modality dropdown menu.
  4. Click on a random cell from the center of the point cloud. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
  5. Cell Journey (trajectory): select Box plot from the Plot type dropdown menu, set Trendline to Median-based cubic spline.
  6. Find the HBB gene by hovering over the heatmap. Click on any segment to obtain Figure D (RNA: HBB ALONG TRAJECTORY).

Figure D (ADT MODALITY HEATMAP)

  1. Cell Journey (trajectory): click Generate grid.
  2. Cell Journey (trajectory): set Step size to 2.00, Tube segments to 20, Number of clusters to 3, Number of automatically selected features to 10, and Heatmap color scale to Inferno.
  3. Scatter plot: select ADT from the Modality dropdown menu.
  4. Click on a random cell from the center of the point cloud. Try a few cells within the suggested area if the first one didn't result in an appropriate trajectory.
  5. Cell Journey (trajectory): select Box plot from the Plot type dropdown menu, set Trendline to Median-based cubic spline.
  6. Find CD34 by hovering over the heatmap. Click on any segment to obtain Figure D (ADT: CD34 ALONG TRAJECTORY).