RELION

RELION developers

Aug 27, 2024

CONTENTS

1 What’s new? 1

1.1 Release 5.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Release 4.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Release 3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 The team 5

2.1 Current members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Past members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Installation 9

3.1 Download RELION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.2 Setup a conda environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.3 Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.4 General conﬁguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.5 Conﬁguration with CPU acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.6 Conﬁguration with HIP/ROCm acceleration for AMD GPUs . . . . . . . . . . . . . . . . . . . . . . 15

3.7 Conﬁguration with SYCL acceleration (Intel GPUs) . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.8 Set-up queue job submission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.9 Edit the environment set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4 Single particle tutorial 19

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.3 Particle picking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.4 Reference-free 2D class averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.5 De novo 3D model generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.6 Unsupervised 3D classiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.7 High-resolution 3D reﬁnement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.8 Mask creation & Postprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.9 CTF and aberration reﬁnement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.10 Bayesian polishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.11 Local-resolution estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.12 Checking the handedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.13 ModelAngelo: atomic modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.14 DynaMight: exploring motions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.15 Wrapping up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5 Subtomogram tutorial 69

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.2 Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.3 Motion correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.4 CTF estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.5 Exclude tilt-images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.6 Align tilt-series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.7 Reconstruct tomograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.8 Denoise tomograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.9 Particle picking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.10 Extract subtomos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5.11 Import coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.12 De novo 3D model generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5.13 Reconstruct particle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.14 Initial 3D reﬁnement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.15 Duplicate particles removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.16 3D classiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.17 High-resolution 3D reﬁnement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.18 Tomo reﬁnement cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.19 Tomo reﬁnement 1: CTF reﬁnement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.20 Tomo reﬁnement 2: Bayesian polishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.21 Model building with ModelAngelo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

6 On-the-ﬂy processing 105

6.1 Computation settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6.2 Preprocessing settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

6.3 Particle settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

6.4 Processing settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

6.5 Intervening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6.6 Control more options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6.7 Site-speciﬁc setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

7 Reference pages 111

7.1 Movie Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7.2 Pixel size issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

7.3 Automation: Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

7.4 Helical reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

7.5 Subtomogram Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

7.6 Using RELION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

7.7 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

7.8 Developer Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

7.9 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

Bibliography 163

CHAPTER

ONE

WHAT’S NEW?

1.1 Release 5.0

Blush regularisation

Dari Kimanius has developed a new method to incorporate more prior knowledge into the cryo-EM reﬁnement process

than the one typically used (which merely assumes smoothness in real-space, or limited power in Fourier-space). This

method is called Blush regularisation and it uses a denoising convolutional neural network inside the iterative reﬁnement

algorithm of Class3D, Reﬁne3D or MultiBody jobs. The eﬀects of this are largest when the signal is weak and standard

reﬁnement in RELION would overﬁt (as for example visible from streaky artefacts in the solvent region). Using Blush

reglarisation, Dari successfully reﬁned a data set of a 40 kDa protein:RNA complex to 2.5A. The same data set was

intractable in standard RELION or CryoSPARC.

DynaMight for modelling continuous structural heterogeneity

Johannes Schwab developed a method called DynaMight that ‘explores protein Dyna-mics, and Might improve your

map’. It is based on a variational auto-encoder that predicts 3D deformations of a Gaussian model for the consensus

map, and a deformed backprojection algorithm that attempts to “un-do” these deformations to reconstruct an improved

consensus map.

ModelAngelo for automated atomic model building

Kiarash Jamali developed a machine-learning approach for automated atomic model building and identiﬁcation of

unknown proteins in cryo-EM maps. ModelAngelo will build most of your automatically, provided the resolution

extends beyond 3.5-4.0 Angstroms. Goodbye to months in the dark graphics room!

Select subsets of ﬁlaments using dendrograms

David Li developed a useful utility to select subsets of ﬁlament particles that belong to the same structural class. It has

been implemented on the Helix tab of the Subset selection job type.

Support for AMD and Intel GPUs (HIP/ROCm and SYCL)

Suyash Tandon from AMD and Jason Do from Intel, together with their colleagues, have contributed code for GPU

acceleration of relion-5 in HIP/ROCm and SYCL, respectively. This means that the relion_refine program can

now also be run eﬃciently on AMD and Intel GPUs. (The previously existing CUDA implementation and vectorised

CPU-acceleration still work too.)

A complete subtomo-gram averaging pipeline

Alister Burt, Euan Pyle, Sjors Scheres and others have developed a new pipeline for sub-tomogram averaging that starts

with serialEM mdoc ﬁles and raw movies, and potentially ends with automated model building by ModelAngelo. You

can access it by launching relion --tomo from the command line. However, please do note that this part of the code

is not yet well tested and we have not yet been able to write an explanatory tutorial for this, so please be patient. Until

we have ﬁnished the documentation and testing, you can play with the code already, but we cannot yet provide any

feedback...

RELION

1.2 Release 4.0

Watch Sjors giving a Structural Studies Colloquium at MRC-LMB about the new features in release 4.0 on YouTube.

Note that since then, Schedules have been renamed to Schemes to prevent confusion with the existing functionality to

schedule jobs in the GUI.

A new approach to subtomogram averaging

Jasenko Zivanov and Joaquin (Kino) Oton have implemented a new approach to averaging in cryo-electron tomography,

which replaces standard sub-tomograms with the concept of pseudo-sub-tomograms. The new approach leads to better

weighting of the individual 2D images that make up a tilt series in relion_refine and the single-particle concepts of

Bayesian polishing and CTF reﬁnement have now also been implemented for tomography data. A preprint/publication

about this work is pending.

The VDAM reﬁnement algorithm

Dari Kimanius has implemented a new, gradient-driven algorithm with implicit regularisation, called Variable-metric

Gradient Descent with Adaptive Moments (VDAM). The VDAM algorithm replaces the previously implemented SAGD

algorithm for initial model generation, and makes 2D and 3D classiﬁcation faster, especially for large data sets. A

preprint/publication about this work, together with the automated class selection and the execution of workﬂow, is

pending.

Automated 2D class selection

Liyi Dong developed a new algorithm for automatic selection of 2D class average images that combines features that

are extracted from RELION’s 2D classiﬁcation metadata with a convolutional neural network that acts on the 2D class

averages themselves. The corresponding program, relion_class_ranker can be called through the Subset selection

job type.

Automatic execution of workﬂows

Sjors developed a framework for the automated execution of predeﬁned workﬂows, which is explained in more detail

in the section on On-the-ﬂy processing.

Tighter integration of the pipeliner with CCP-EM software

The CCP-EM team, mainly Matt Iadanza, Colin Palmer and Tom Burnley, have implemented a python-based pipeliner

in the CCP-EM software that mimics the relion pipeliner, but will be extended to include other CCP-EM softwares

too. The python interface is convenient for scripting, and can also be called from relion’s main GUI, by adding the

additional argument relion --ccpem &.

1.3 Release 3.1

Aberration corrections and optics groups

One of the major new features in relion-3.1 is a correction for higher-order aberrations in the data, i.e. besides the

beamtilt correction already present in relion-3.0, the current version can also estimate and correct for trefoil and tetrafoil,

as well as deviations from the nominal spherical aberration (Cs). The corresponding paper can be found on bioRxiv

[ZNS20]. The signal to estimate these aberrations is calculated by averaging over particles from multiple micrographs.

To allow for multiple subsets of a data set having diﬀerent Zernike coeﬃcients, relion-3.1 implements the new concept

of optics groups. Optics groups are deﬁned in a separate table called data_optics at the top of a STAR ﬁle, which

will also contain a table called data_movies, data_micrographs or data_particles, depending on what type of

images it refers to. The second table is similar to the content of STAR ﬁles in previous releases, but contains a new

column called rlnOpticsGroup, which is also present in the data_optics table. Common CTF-parameters, like

rlnVoltage and _`rlnSphericalAberration, but also the new rlnOddZernike and rlnEvenZernike, can be

stored once for each optics group in the data_optics table, without the need to store them for each particle/micrograph

in the second table.

2 Chapter 1. What’s new?

RELION

The same program that handles higher-order aberrations can also be used to reﬁne diﬀerences in (anisotropic) mag-

niﬁcation between the reference and (groups of) the particles. Besides correcting for anisotropic magniﬁcation in the

data, this is also useful when combining data from diﬀerent scopes. As of release 3.1, the program that does 2D/3D

classiﬁcation and 3D reﬁnement (relion_refine) can combine particles with diﬀerent box sizes and pixel sizes in a

single reﬁnement, and the magniﬁcation reﬁnement can be used to correct small errors in the (calibrated) pixel sizes.

The box and pixel size of the input reference (or the ﬁrst optics group in 2D classiﬁcation) will be used for the re-

constructions/class averages. You may want to check they are on the desired scale before running classiﬁcations or

reﬁnements!

Upon reading STAR ﬁles that were generated in older releases of relion, relion-3.1 will attempt to convert these auto-

matically into the relion-3.1-style STAR ﬁles. Therefore, moving a project from an older release to relion-3.1 should

be easy.

The External job-type

relion-3.1 allows execution of third-party software within the relion pipeline through the new External

job-type. See

this section for details on how to use this.

*Schedules* for on-the-ﬂy processing

The python script relion_it.py in relion-3.0 has been replaced by a new framework of Schedules, which implement

decision-based scheduling and execution of relion jobs. This comes with its own GUI interface. See Schedules for

details on how to use this.

General tweaks

Several tweaks have been made to enhance user experience:

• The pipeliner no longer looks for output ﬁles to see whether a job has ﬁnished. Instead, upon successful exit, all

programs that are launched from within the relion pipeline will write out a ﬁle called RELION_EXIT_SUCCESS

in the job directory. This avoids problems with subsequent execution of scheduled jobs with slow disc I/O.

• Likewise, when encountering an error, all programs will write out a ﬁle called RELION_EXIT_FAILURE. The

GUI will recognise these jobs and use a red font in the Finished jobs list. Note that incorrectly labeled jobs can

be changed using the ‘Mask as ﬁnished’ or ‘Mark as failed’ options from the Job actions pull-down menu.

• There is an ‘Abort running’ option on the Job actions pull-down menu, which will trigger the currently selected

job to abort. This works because all jobs that are executed from within the relion pipeline will be on the lookout

for a ﬁle called RELION_JOB_ABORT_NOW in their output directory. When this ﬁle is detected, the job will exit

prematurely and write out a RELION_EXIT_ABORTED ﬁle in the job directory. Thereby, users no longer need

to kill undesired processes through the queuing or operating system. The GUI will display aborted jobs with a

strike-through red font in the Finished jobs list.

• When a job execution has given an error, in previous releases the user would need to ﬁx the error through the

input parameters, and then launch a new job. They would then typically delete the old job. relion-3.1 allows to

directly overwrite the old job. This is accessible on Linux systems through ALT+o or through the Overwrite

continue option from the ‘File menu’. Note that the run.out and run.err ﬁles will be deleted upon a job

overwrite.

** Tweaks to helical processing **

Several new functionalities were implemented for helical processing:

• The relion_helix_inimodel2d program can be used to generate initial 3D reference maps for helices, in

particular for amyloids, from 2D classes that span an entire cross-over (see this section).

• The translational oﬀsets along the direction of the helical axis can now be restricted to a single rise in 2D-

classiﬁcation.

1.3. Release 3.1 3

RELION

• The 3D reﬁnement and 3D classiﬁcation now can use a prior on the ﬁrst Euler angle, (rlnAngleRotPrior),

which was implemented by Kent Thurber from the Tycko lab at the NIH.

4 Chapter 1. What’s new?

CHAPTER

TWO

THE TEAM

Below are the past and present members of the relion team in reverse chronological order of joining.

2.1 Current members

2.1.1 Bogdan Toader

Google Scholar

Bogdan studied Computer Science and Mathematics at the University of Manchester and did a PhD in Mathematics at

the University of Oxford. He joined the LMB in January 2024 as a postdoc in both our and Tanmay Bharat’s groups to

further improve and develop new tools and algoriths for cryo-ET data processing.

2.1.2 Kiarash Jamali

Google Scholar

Kiarash studied Mathematics and Statistics at the University of Toronto, St George. He joined our group as a PhD

student in October 2021, but had been working with us a volunteering undergrad student since the COVID19 lockdown

in April 2020. Kiarash works on various machine-learning methods for solving cryo-EM structures and beyond.

2.1.3 Johannes Schwab

Google Scholar

Johannes studied Mathematics at the University of Innsbruckm Austria, where he also obtained his PhD. Johannes

works on machine-learning approaches to deal with molecular ﬂexibility in cryo-EM reconstruction.

2.1.4 Dari Kimanius

Google Scholar

Dari studied Theoretical Physics at the Royal Institute of Technology (KTH, Stockholm), and did a PhD in Biophysics

and Bioinformatics at Stockholm University. Dari joined our group as a post-doc in May 2019. Dari implemented

GPU-acceleration during his PhD. For his postdoc at LMB, he works on the incorporation of more informative priors

through deep convolutional neural networks, as well as on the design of more eﬃcient optimisation algorithms.

RELION

2.1.5 Euan Pyle

Google Scholar

Euan studied Chemistry and Biology at Durham University before going on to complete his PhD at Imperial College

London. As a post-doc position in Giulia Zanetti’s lab at Birkbeck College/The Francis Crick Institute, he contributed

code to the RELION 5.0 tomography pipeline, in close collaboration with Alister Burt.

2.1.6 Takanori Nakane

Google Scholar

Takanori studied as an M.D. at Kyoto University, where he also obtained his PhD in Medicine. In 2014 he also obtained

an M.Phil in Computational Biology from the University of Cambridge. Takanori joined our team in October 2017. He

works on many facets of single-particle analysis and kindly dedicates a lot of his time to provide support to RELION

users worldwide.

2.1.7 Sjors Scheres

Google Scholar

Sjors studied Chemistry at Utrecht University, The Netherlands, where he also obtained his PhD in protein crystallog-

raphy. He was a post-doc in the group of Jose-Maria Carazo in Madrid, before he started his group at the LMB in 2010.

Sjors designed the regularised likelihood algorithm and is the original architect of relion.

2.2 Past members

2.2.1 Alister Burt

Google Scholar

Alister studied Chemistry at the University at York, UK and did his PhD at the IBS in Grenoble, France. He was a

post-doc in the group of David Barford, where he contributed code to the new sub-tomogram avaraging pipeline in

relion-5;

2.2.2 David Li

Google Scholar

David studied Electrical Engineering and Computer Science at MIT. He worked in the lab of Prof Feng Zhang on

CRISPR systems. David joined our group as an MPhil student in October 2022 to work on in vitro assembly of tau

ﬁlaments. During his stay with us David also developed new classiﬁcation software for ﬁlaments. David went on to do

a PhD at Stanford.

6 Chapter 2. The team

RELION

2.2.3 Liyi Dong

Liyi studied Biology at China Agricultural University and did a BSc at Purdue University with Michael Rossmann.

Liyi joined our group as a PhD student in November 2017. She worked on automated selection of 2D class averages.

2.2.4 Shaoda He

Shaoda studied Biology at Peking University, where he also participated in a programme organized by the Electronic

Engineering and Computer Science Department. Shaoda joined our group in October 2014. He developed helical

reconstruction and local symmetry averaging.

2.2.5 Joaquín Otón

Google Scholar

Joaquin (Kino) studied Physics at Universitat de Vàlencia (Spain) and did a PhD in Optical Engineering at Universitat

Politècnica de Catalunya. He joined the neighbouring Briggs group in 2018. Together with Jasenko, he co-developed

a new approach for sub-tomogram averaging.

2.2.6 Jasenko Zivanov

Jasenko studied Computer Science at the University of Basel, Switzerland, where he also obtained his PhD. Jasenko

was a postdoc with us for more than three years (during 2017-2020), and continues to work with us from Switzerland.

Jasenko developed Bayesian polishing and optical aberration correction for single-particle analsyis. Together with

Kino, he also developed a new approach for sub-tomogram averaging.

ò Note

relion is distributed under a GPLv2 license, i.e. it is completely free, open-source software for both academia and

industry.

2.2. Past members 7

RELION

8 Chapter 2. The team

CHAPTER

THREE

INSTALLATION

The sections below explain how to download and install relion on your computer.

Note that relion depends on and uses several external programs and libraries.

C++ compiler:

RELION 5.0 requires a C++ compiler that fully supports the C++14 standard. For GCC, this means version 5.0

or later. Note that GCC 4.8, which comes with RedHat Enterprise Linux / Cent OS 7.x, is too old. You can

obtain newer GCC via devtoolset or use free Intel compiler that comes with oneAPI toolkit (see below).

MPI:

Your system will need MPI runtime (most ﬂavours will do). If you don’t have an MPI installation already on

your system, we recommend installing OpenMPI.

CUDA, HIP/ROCm, SYCL or oneAPI intel compilers:

If you have GPUs from nvidia, AMD or Intel, you can accelerate many jobs considerably.

By default, relion will build with GPU-acceleration support, for which you’ll need cuda. Download it from

NVIDIA website. Note that CUDA toolkits support only a limited range of C compilers. Also note that a newer

CUDA toolkit requires a newer GPU driver. Carefully read the release note and make sure you have a compatible

set of GPU driver, C compiler and CUDA toolkit.

If you want to compile with HIP/ROCm, you will need

• AMD ROCm

If you want to compile with SYCL, you will need

• Intel oneAPI Base Toolkit and HPC Toolkit (All components recommended; this is also recommended

if you want to build the CPU acceleration path, see below)

• Intel software for general purpose GPU capabilities

• Intel CPU Runtime for OpenCL(TM) Applications (optional)

• Codeplay oneAPI for NVIDIA GPU (optional)

• Codeplay oneAPI for AMD GPU (optional)

CTFFIND-4.1:

CTF estimation is not part of relion. Instead, relion provides a wrapper to Alexis Rohou and Niko Grigorieﬀ’s

ctﬃnd 4 [RG15]. Please obtain CTFFIND 4.1.x from their Web site. Note that CTFFIND 5.x is not supported.

Ghostscript:

RELION uses Ghostscript to generate PDF ﬁles.

FLTK (only for GUI):

RELION uses FLTK as a GUI tool kit. This will be installed automatically (see below).

RELION

X Window system libraries (only for GUI):

RELION needs basic X11 libraries together with Xft for the GUI. Most Linux distributions have packages called

libxft-dev or libXft-devel and libX11-devel. Note that you need developer packages if you build your

own FLTK.

FFT libraries:

RELION needs an FFT library. The default is FFTW. This will be installed automatically (see below). Depending

on your CPU, Intel MKL FFT or AMD optimised FFTW might run faster. See below how to use them.

libtiﬀ:

RELION needs libtiﬀ version >= 4.0. Most Linux distributions have packages called libtiff-dev or

libtiff-devel. Note that you need a developer package.

libpng:

RELION needs libpng. Most Linux distributions have packages called libpng-dev or libpng-devel. Note

that you need a developer package.

pbzip2, xz, zstd:

RELION needs these commands in the PATH to read MRC movies compressed by bzip2, xz or ZStandard, re-

spectively. Note that RELION uses pbzip2, not bzip2. Most Linux distributions provide packages for these

utilities.

UCSF MotionCor2 (optional):

relion implements its own (CPU-only) implementation of the UCSF motioncor2 algorithm for whole-frame mi-

crograph movie-alignment [ZPA+17]. If you want, you can still use the (GPU-accelerated) UCSF program. You

can download it from David Agard’s page and follow his installation instructions. Note that using the UCSF

program does not make full advantage of the opportunities provided in Bayesian polishing.

ResMap (optional):

Local-resolution estimation may be performed inside relion’s own postprocessing program. Alternatively, one

can also use Alp Kucukelbir’s resmap [KST14]. Download it from Alp’s ResMap website and follow his instal-

lation instructions.

In practice, most of these dependencies can be installed by system’s package manager if you have the root priviledge.

In Debian or Ubuntu:

sudo apt install cmake git build-essential mpi-default-bin mpi-default-dev libfftw3-dev␣

˓→libtiff-dev libpng-dev ghostscript libxft-dev

In RHEL, Cent OS, Scientiﬁc Linux:

sudo yum install cmake git gcc gcc-c++ openmpi-devel fftw-devel libtiff-devel libpng-

˓→devel ghostscript libXft-devel libX11-devel

3.1 Download RELION

We store the public release versions of relion on GitHub, a site that provides code-development with version control

and issue tracking through the use of git. We will not describe the use of git in general, as you will not need more

than very basic features. Below we outline the few commands needed on a UNIX-system, please refer to general git

descriptions and tutorials to suit your system. To get the code, you clone or download the repository. We recommend

cloning, because it allows you very easily update the code when new versions are released. To do so, use the shell

command-line:

git clone https://github.com/3dem/relion.git

10 Chapter 3. Installation

RELION

This will create a local Git repository. All subsequent git-commands should be run inside this directory.

The master branch (default) contains the stable release of relion-4.0. By performing:

git checkout ver5.0

you can access the latest (developmental) updates for RELION 5.0x.

The code will be intermittently updated to amend issues. To incorporate these changes, use the command-line:

git pull

inside you local repository (the source-code directory downloaded). If you have changed the code in some way, this

will force you to commit a local merge. You are free to do so, but we will assume you have not changed the code.

Refer to external instructions regarding git and merging so-called conﬂicts if you have changed the code an need to

keep those changes.

3.2 Setup a conda environment

To add support for Python modules (e.g. Blush, ModelAngelo and DynaMight) you will have to setup a Python envi-

ronment with dependencies. We recommend installing via Miniforge to avoid inadvertently installing packages from

the restrictively licensed “default” conda repository.

Once you have conda setup, you can install all the RELION Python dependencies into a new environment by running:

conda env create -f environment.yml

Also code in this environment will be updated intermittently. You can incorporate the latest changes by running:

conda env update -f environment.yml

. Warning

You should NOT activate this relion-5.0 conda environment when compiling and using RELION; RELION

activates it automatically only when necessary. Otherwise, system-wide installation of compilers/libraries/MPI

runtime might get mixed up with those provided by conda, leading to compilation failures or runtime errors. The

same applies to other software packages that provide their own libraries/MPI runtime, such as CCPEM, CCP4,

EMAN2, DIALS, PHENIX.

The cmake command should automatically detect the relion-5.0 conda environment created above. If it does not,

you can specify -DPYTHON_EXE_PATH=path/to/your/conda/python. Additionally, if you intend to make use of

automatically downloaded pretrained model weights (used in e.g. Blush, ModelAngelo and class_ranker), it is recom-

mended to explicitly set the destination directory (TORCH_HOME) by including the ﬂag -DTORCH_HOME_PATH=path/

to/torch/home. You have to create this directory before running cmake. Otherwise, it will be downloaded to the

default location (usually ~/.cache/torch).

At the moment, the model weights for Blush are stored on MRC-LMB’s FTP server. If your network blocks FTP, please

follow instructions here.

3.2. Setup a conda environment 11

RELION

3.3 Compilation

relion has an installation procedure which relies on cmake. You will need to have this program installed, but most

UNIX-systems have this by default. You will need to make a build-directory in which the code will be compiled. This

can be placed inside the repository:

cd relion

mkdir build

cd build

You then invoke cmake inside the build-directoy, but point to the source-directoy to conﬁgure the installation. This

will not install relion, just conﬁgure the build:

cmake ..

The output will notify you of what was detected and what type of build will be installed. Because relion is rich in terms

of the possible conﬁgurations, it is important to check this output. For instance:

• The path to the MPI library.

• GPU-capability will only be included if a CUDA SDK is detected. If not, the program will install, but without

support for GPUs.

• The path to the Python interpreter.

• If FFTW is not detected, instructions are included to download and install it in a local directory known to the

relion installation.

• As above, regarding FLTK (required for GUI). If a GUI is not desired, this can be escaped as explained in the

following section.

The MPI library must be the one you intend to use relion with. Compiling relion with one version of MPI and running the

resulting binary with mpirun from another version can cause crash. Note that some software packages (e.g. CCPEM,

crYOLO, EMAN2) come with their own MPI runtime. Sourcing/activating their environment might update PATH and

LD_LIBRARY_PATH environmental variables and put their MPI runtime into the highest priority.

The MPI C++ compiler (mpicxx) and CUDA compiler (nvcc) internally calls a C++ compiler. This must match the

compiler cmake picked up. Otherwise, the compilation might fail at the linking step.

Following the completion of cmake-conﬁguration without errors, make is used to install the program:

make -j N

, where N is the number of processes to use during installation. Using a higher number simply means that it will compile

faster.

Take note of any warnings or errors reported. relion will be installed in the build directory’s sub-directory called bin.

To make the installation system-wide, see below.

Wherever you install relion, make sure your PATH environmental variable points to the directory containing relion

binaries. Launching relion with a path like /path/to/relion is not the right way; this starts the right GUI, but the

GUI might invoke other versions of relion in the PATH.

12 Chapter 3. Installation

RELION

3.4 General conﬁguration

CMake allows conﬁguration of many aspects of the installation, some of which are outlined here. Note that by default,

relion is conﬁgured to build with CUA acceleration on NVidia GPUs. Instructions for building with CPU, HIP/Rocm

(AMD) SYCL (Intel et al) acceleration are given in the next section below.

Most options can be set by adding options to the cmake conﬁguration. Under the below subheadings, some example

replacement commands are given to substitute the original conﬁguration command. It is also recommended to clean

or purge your build-directory between builds, since CMake caches some of previous conﬁgurations:

cd build

rm -fr *

And of course, any of the below options can be combined.

Omitting the GUI:

cmake -DGUI=OFF .. (default is ON)

With this option, GUI programs (e.g. relion, relion_manualpick, relion_display) are not be built and

FLTK becomes unnecessary.

Using single-precision on the CPU:

cmake -DDoublePrec_CPU=OFF .. (default is ON)

This will reduce (CPU but not GPU) memory consumption to about half. This is useful when memory hungry

tasks such as motion correction and Polishing run out of memory. This is safe in most cases but please use the

default double precision build if CtfReﬁne produces NaNs.

Using double-precision on the GPU:

cmake -DDoublePrec_GPU=ON .. (default is OFF)

This will slow down GPU-execution considerably, while this does NOT improve the resolution. Thus, this option

is not recommended.

Compiling NVIDIA GPU codes for your architecture:

cmake -DCUDA_ARCH=52 .. (default is 35, meaning compute capability 3.5, which is the lowest supported by

relion)

CUDA-capable NVIDIA devices have a so-called compute capability, which code can be compiled against for

optimal performance. The compute capability of your card can be looked up at the table in NVIDIA website.

WARNING: If you use a wrong number, compilation might succeed but the resulting binary can fail at the

runtime.

Forcing build and use of local FFTW:

cmake -DFORCE_OWN_FFTW=ON ..

This will download, verify and install FFTW during the installation process.

Forcing build and use of AMD optimized FFTW:

cmake -DFORCE_OWN_FFTW=ON -DAMDFFTW=ON ..

This will download, verify and install AMD optimized version of FFTW during the installation process. This is

recommended for AMD CPUs (e.g. Ryzen, Threadripper, EPYC).

Forcing build and use of Intel MKL FFT:

cmake -DMKLFFT=ON ..

This will use FFT library from Intel MKL. In contrast to the FFTW options above, this will not download MKL

automatically. You have to install MKL and set relevants paths (usually by sourcing the mkl_vars.sh script).

3.4. General conﬁguration 13

RELION

Forcing build and use of local FLTK:

cmake -DFORCE_OWN_FLTK=ON ..

This will download, verify and install FLTK during the installation process. If any of these are not detected

during conﬁguration, this will happen automatically anyway, and you should not have to specify the below options

manually.

Specify location of libtiﬀ:

cmake -DTIFF_INCLUDE_DIR=/path/to/include -DTIFF_LIBRARY=/path/to/libtiff.so.5

This option is to use libtiﬀ installed in non-standard location.

Specifying an installation location:

To allow relion a system-wide installation use:

cmake -DCMAKE_INSTALL_PREFIX=/path/to/install/dir/ ..

make -j N

make install

. Warning

Do not specify the build directory itself as CMAKE_INSTALL_PREFIX. This does not work! If you are happy with

binaries in the build directory, leave CMAKE_INSTALL_PREFIX as default and omit the make install step.

3.5 Conﬁguration with CPU acceleration

Enable accelerated CPU code path:

cmake -DALTCPU=ON

Note that this is mutually exclusive with GPU acceleration (-DCUDA=ON). Intel Classic compilers are recom-

mended for this option (see below).

Use Intel compilers:

There are two Intel compilers: Intel Classic compiler and Intel oneAPI DPC++/C++ compiler. They often gen-

erate faster binaries for Intel CPUs, especially when combined with the accelerated CPU code path above. As of

2024 April, the classic compiler generate faster binaries than the DPC++/C++ compiler.

Both compilers used to be available free of chage as part of Intel oneAPI HPC toolkit. Unfortunately, the classic

compiler was removed in the 2024 release and only the DPC++/C++ compiler is currently distributed. (Ap-

parently older versions seem to be available via YUM/APT, but we do not know how long they remain.) We

recommend you to use the classic compiler if you have older oneAPI toolkit installers at hand.

To use Intel Classic compiler, run below after sourcing the initialization script (setvars.sh):

mkdir build-cpu

cd build-cpu

cmake .. -DMKLFFT=ON \

-DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DMPI_C_COMPILER=mpiicc -DMPI_CXX_

˓→COMPILER=mpiicpc \

-DCMAKE_C_FLAGS="-O3 -ip -g -xCOMMON-AVX512 -restrict " \

-DCMAKE_CXX_FLAGS="-O3 -ip -g -xCOMMON-AVX512 -restrict "

make -j 24

This generates binaries optimized with AVX512 instructions. If your CPU supports only up to AVX256, use

-xCORE-AVX2 instead of -xCOMMON-AVX512.

14 Chapter 3. Installation

RELION

If you do not have Intel Classic compiler, use the DPC++/C++ compiler from the latest oneAPI release. Its

performance is being improved. The cmake line should be:

cmake .. -DMKLFFT=ON \

-DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DMPI_C_COMPILER=mpiicx -DMPI_CXX_

˓→COMPILER=mpiicpx \

-DCMAKE_C_FLAGS="-O3 -qopenmp-simd -xCORE-AVX512 -qopt-zmm-usage=high -qoverride-

˓→limits " \

-DCMAKE_CXX_FLAGS="-O3 -qopenmp-simd -xCORE-AVX512 -qopt-zmm-usage=high -qoverride-

˓→limits "

If your CPU supports only up to AVX256, use -xCORE-AVX2 instead of -xCORE-AVX512.

If you don’t want to use Intel MPI, change DMPI_C_COMPILER and DMPI_CXX_COMPILER variables accordingly.

For example, to use OpenMPI with Intel Classic compiler, specify mpicc and mpicxx after setting environmen-

tal variables OMPI_CC=icc and OMPI_CXX=icpc. If cmake still picks up Intel MPI, specify MPI_HOME. See

OpenMPI FAQ and FindMPI manual for details.

3.6 Conﬁguration with HIP/ROCm acceleration for AMD GPUs

Enable the accelerated HIP/ROCm code path with:

cmake -DHIP=ON

Note that this is mutually exclusive with other accelerated code paths (e.g. CUDA, ALTCPU and SYCL). On our

system, we build with HIP/ROCm acceleration to use AMD GPUs with the following commands:

export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH

export PATH=/opt/rocm/:$PATH

export ROCM_PATH=/opt/rocm/

mkdir build-amd

cd build-amd

cmake -DCMAKE_BUILD_TYPE=Release -DHIP=ON -DHIP_ARCH="gfx90a,gfx908" -DFORCE_OWN_FFTW=ON␣

˓→ -DAMDFFTW=on ..

make -j 24

If you get problems ﬁnding omp.h, make sure you have openmp-extras-devel installed on your system too.

3.7 Conﬁguration with SYCL acceleration (Intel GPUs)

Enable accelerated the SYCL code path with:

cmake -DSYCL=ON

Note that this is mutually exclusive with other accelerated code paths (e.g. CUDA, ALTCPU and HIP/ROCm). Tech-

nically speaking, you can build SYCL for AMD and NVIDIA GPUs to make a single binary that runs on NVIDIA,

AMD and Intel GPUs, but this is highly experimental and not tested well.

For now, this way of building RELION is explained here:.

3.6. Conﬁguration with HIP/ROCm acceleration for AMD GPUs 15

RELION

3.8 Set-up queue job submission

The GUI allows the user to submit jobs to a job queueing system with a single click. For this to work, a template job

submission script needs to be provided for the queueing system at hand (e.g. TORQUE, PBS, SGE). In this script a set

of strings (variables) in the template script is replaced by the values given in the GUI. The following table contains all

deﬁned variables:

String Variable Meaning

XXXoutfileXXX string The standard output log ﬁle RELION GUI displays.

XXXerrfileXXX string The standard error log ﬁle RELION GUI displays.

XXXcommandXXX string relion command + arguments

XXXqueueXXX string Name of the queue to submit job to

XXXmpinodesXXX integer The number of MPI processes to use

XXXthreadsXXX integer The number of threads to use on each MPI process

XXXcoresXXX integer The number of MPI processes times the number of threads

XXXdedicatedXXX integer The minimum number of cores on each node (use this to ﬁll entire

nodes)

XXXnodesXXX integer The total number of nodes to be requested

XXXextra1XXX string Installation-speciﬁc, see below

XXXextra2XXX string Installation-speciﬁc, see below

The XXXcommandXXX variable needs a special care. For non-MPI commands (e.g. relion_refine) not only the

variable but the whole line is replaced. Thus, mpirun XXXcommandXXX will be mpirun relion_refine_mpi for an

MPI job but relion_refine for a non-MPI job. Also note that some jobs consist of multiple lines of commands. See

CCPEM threads (1 and 2) for typical pitfalls.

There are two environment variables that control the use of the entry of the ‘Minimum number of dedicated cores

per node’ on the Running tabs of the GUI: RELION_MINIMUM_DEDICATED sets its default value (1 if not set);

RELION_ALLOW_CHANGE_MINIMUM_DEDICATED sets whether the user will be able to change this entry. At LMB,

we set the default to 24 and do not allow users to change it. In this way, we enforce that our hyper-threaded 12-core

nodes get ﬁlled up entirely by individual relion jobs.

By default, the XXXextra1XXX, XXXextra2XXX, ... variables are not used. They provide additional ﬂexibility for

queueing systems that require additional variables. They may be activated by ﬁrst setting RELION_QSUB_EXTRA_COUNT

to the number of ﬁelds you need (e.g. 2) and then setting the RELION_QSUB_EXTRA1, RELION_QSUB_EXTRA2, ...

environment variables, respectively. This will result in extra input ﬁelds in the GUI, with the label text being equal to the

value of the environment variable. Likewise, their default values (upon starting the GUI) can be set through environment

variables RELION_QSUB_EXTRA1_DEFAULT, RELION_QSUB_EXTRA2_DEFAULT, etc and their help messages can be set

through environmental variables RELION_QSUB_EXTRA1_HELP, RELION_QSUB_EXTRA2_HELP and so on. But note

that (unlike all other entries in the GUI) the extra values are not remembered from one run to the other.

The template job submission script may be saved in any location. By default, the one used at the LMB is present as

gui/qsub.csh in the relion tar-ball. Upon installation this ﬁle is copied to the bin directory. It is convenient for the

user if he does not have to select this ﬁle each time he opens the relion GUI in a new directory. Therefore, one may set

the environment variable RELION_QSUB_TEMPLATE to point to the location of the script for the system at hand. This

value will be pre-set as default in the GUI. (Note the user still has the liberty to deﬁne and use his own template!)

ò Note

If somehow the job queue submission cannot be set up, relion may still be run in parallel and on a job queueing

system. The GUI comprises a Print command button that prints the entire relion command, including all arguments,

to the screen. Pasting of this command to a job queue submission script, and manual submission of this script may

16 Chapter 3. Installation

RELION

then be used to submit the parallel job to a queueing system.

3.9 Edit the environment set-up

For relion, we source the following C-shell setup in our .cshrc ﬁle. You’ll need to change all the paths for your own

system, and translate the script in case you use a bash shell (which uses export instead of setenv etc).

#!/bin/csh -f

# Setup openMPI if not already done so

if ("" == "`echo $path | grep /public/EM/OpenMPI/openmpi/bin`") then

set path=(/public/EM/OpenMPI/openmpi/bin $path)

endif

if ("1" == "$?LD_LIBRARY_PATH") then

if ("$LD_LIBRARY_PATH" !~ */public/EM/OpenMPI/openmpi/lib*) then

setenv LD_LIBRARY_PATH /public/EM/OpenMPI/openmpi/lib:$LD_LIBRARY_PATH

endif

else

setenv LD_LIBRARY_PATH /public/EM/OpenMPI/openmpi/lib

endif

# Setup |RELION| if not already done so

if ("" == "`echo $path | grep /public/EM/RELION/relion/bin`") then

set path=(/public/EM/RELION/relion/bin $path)

endif

if ("1" == "$?LD_LIBRARY_PATH") then

if ("$LD_LIBRARY_PATH" !~ */public/EM/RELION/relion/lib*) then

setenv LD_LIBRARY_PATH /public/EM/RELION/relion/lib:$LD_LIBRARY_PATH

endif

else

setenv LD_LIBRARY_PATH /public/EM/RELION/relion/lib

endif

# CUDA for RELION

setenv PATH /public/EM/CUDA/Cuda11.4/bin:$PATH

setenv LD_LIBRARY_PATH /public/EM/CUDA/Cuda11.4/lib64:$LD_LIBRARY_PATH

setenv CUDA_HOME /public/EM/CUDA/Cuda11.4

# Where is qsub template script stored

setenv RELION_QSUB_TEMPLATE /public/EM/RELION/relion-devel/bin/qsub.csh

# Default PDF viewer

setenv RELION_PDFVIEWER_EXECUTABLE evince

# Default MOTIONCOR2 executable

setenv RELION_MOTIONCOR2_EXECUTABLE /public/EM/MOTIONCOR2/bin/MotionCor2_1.0.4

# Default CTFFIND-4.1+ executable

setenv RELION_CTFFIND_EXECUTABLE /public/EM/ctffind/ctffind.exe

(continues on next page)

3.9. Edit the environment set-up 17

RELION

(continued from previous page)

# Default ResMap executable

setenv RELION_RESMAP_EXECUTABLE /public/EM/ResMap/ResMap-1.1.4-linux64

# Enforce cluster jobs to occupy entire nodes with 24 hyperthreads

setenv RELION_MINIMUM_DEDICATED 24

# Do not allow the user to change the enforcement of entire nodes

setenv RELION_ALLOW_CHANGE_MINIMUM_DEDICATED 0

# Ask for confirmation if users try to submit local jobs with more than 12 MPI nodes

setenv RELION_WARNING_LOCAL_MPI 12

# Other useful variables

# RELION_MPI_RUN: The mpi runtime ('mpirun' by default)

# RELION_QSUB_NRMPI: The default for 'Number of MPI procs'

# RELION_MPI_MAX: The maximum number of MPI processes available from the GUI

# RELION_QSUB_NRTHREADS: The default for 'Number of threads'

# RELION_THREAD_MAX: The maximum number of threads per MPI process available from the GUI

# RELION_QUEUE_USE: The default for 'Submit to queue?'. "yes" or "no".

# RELION_QUEUE_NAME: The default for 'Queue Name"

# RELION_QSUB_COMMAND: The default for 'Queue submit command'

# RELION_MINIMUM_DEDICATED: The default for 'Minimum dedicated cores per node'

# RELION_ALLOW_CHANGE_MINIMUM_DEDICATED: Whether to allow a user to change the 'Minimum␣

˓→dedicated cores per node' field in the GUI

# RELION_SHELL: A shell used to launch CTFFIND in CtfFind jobs ('csh' by default; only␣

˓→available from 3.1)

# RELION_SCRATCH_DIR: The default scratch directory in the GUI

# RELION_STACK_BUFFER: The buffer size used for MRC(S) file I/O, potentially useful on␣

˓→GPFS or Lustre file system. See https://github.com/3dem/relion/pull/783 for details.

18 Chapter 3. Installation

CHAPTER

FOUR

SINGLE PARTICLE TUTORIAL

4.1 Introduction

This tutorial provides an introduction to the use of relion-5.0 for cryo-EM structure determination. This tutorial covers

the entire single-particle analysis workﬂow in relion-5.0: beam-induced motion-correction, CTF estimation; auto-

mated particle picking; particle extraction; 2D class averaging; automated 2D class selection; VDAM-based initial

model generation; 3D classiﬁcation; high-resolution 3D reﬁnement (including Blush regularisation); CTF reﬁnement

and higher-order aberration correction; Bayesian polishing to correct for beam-induced motions in the movies; map

sharpening and local-resolution estimation; automated model building with ModelAngelo; and ﬂexibility analysis with

DynaMight. Carefully going through this tutorial should take less than a day (if you have a suitable GPU or if you

follow our precalculated results). After that, you should be able to run relion on your own data.

This tutorial uses a test data set on beta-galactosidase that was kindly given to us by Takayuki Kato from the Namba

group at Osaka university, Japan. It was collected on a JEOL CRYO ARM 200 microscope. The data and our pre-

calculated results may be downloaded and unpacked using the commands below. The full data set is also available at

EMPIAR-10204.

wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion30_tutorial_data.tar

wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion50_tutorial_precalculated_results.tar.

˓→gz

tar -xf relion30_tutorial_data.tar

tar -zxf relion50_tutorial_precalculated_results.tar.gz

If you have any questions about relion, ﬁrst read this entire document, check the FAQ on the relion Wiki and the archives

of the CCPEM mailing list. If that doesn’t help, subscribe to the CCPEM email list and use the email address above

for asking your question.

³ Caution

Please, please, please, do not send us direct emails, as we can no longer respond to all of those.

RELION

4.2 Preprocessing

4.2.1 Getting organised

We recommend to create a single directory per project, i.e. per structure you want to determine. We’ll call this the

project directory. It is important to always launch the RELION graphical user-interface (GUI) from the project

directory. Inside the project directory you should make a separate directory to store all your raw micrographs or

micrograph movies in MRC, TIFF or EER format. We like to call this directory Movies/ if all movies are in one

directory, or for example Movies/15jan16/ and Movies/23jan16/ if they are in diﬀerent directories (e.g. because

they were collected on diﬀerent dates). If for some reason you do not want to place your movies inside the relion project

directory, then inside the project directory you can also make a symbolic link to the directory where your movies are

stored.

. Warning

Symbolic links must be made by an absolute path (e.g. /storage/data/15jan16). Use of relative paths (e.g. ..

/../storage/data/15jan16) can cause problems in later steps. Our precalculated example contains a symbolic

link from Movies to ../Tutorial4.0/Movies/ but please do not follow this practice. Because we don’t know where

you decompress the archive, we cannot include a link by an absolute path.

Single-image micrographs should have a .mrc extension, while movies can have a .mrc, .mrcs, .tif, .tiff or .eer

extension. For EER movies, see this for the details. RELION 5.0 can also read MRC movies compressed by bzip2,

xz, zstd or gzip. When you unpacked the tutorial test data, the (Movies/) directory was created. It should contain

24 movies in compressed TIFF format, a gain-reference ﬁle (gain.mrc) and a NOTES ﬁle with information about the

experiment.

We will start by launching the relion GUI. As said before, this GUI always needs to be launched from the project

directory. To prevent errors with this, the GUI will ask for conﬁrmation the ﬁrst time you launch it in a new directory.

Make sure you are inside the project directory, and launch the GUI by typing:

relion &

and answer Yes when prompted to set up a new relion project here.

The ﬁrst thing to do is to import the set of recorded micrograph movies into the pipeline. Select Import from the

job-type browser on the left, and ﬁll in the following parameters on the Movies/mics tab:

Import raw movies/micrographs?

Yes

Raw input ﬁles:

Movies/*.tiﬀ

Are these multi-frame movies?

Yes

(Set this to No if these are single-frame micrographs)

Optics group name:

opticsGroup1

(This ﬁeld can be used to divide the data set into multiple optics groups: separately import each

optics group with its own name, and then use the Join star files jobtype to combine the groups.

MTF of the detector:

mtf_k2_200kV.star

20 Chapter 4. Single particle tutorial

RELION

Pixel size (Angstrom):

0.885

Voltage (kV):

200

Spherical aberration (mm):

1.4

Amplitude contrast:

0.1

Beamtilt in X (mrad):

Beamtilt in Y (mrad):

The MTF ﬁle can be obtained from the Gatan Web site. If you are working oﬄine and cannot obtain the ﬁle, you can

ignore it. The MTF correction does not change the ﬁnal resolution (but changes the B factor). You can also apply it in

PostProcessing.

On the Others tab, make sure the following is set:

Import other node types?

You may provide a meaningful alias (for example: movies) for this job in the white ﬁeld named Current job:

Give_alias_here. Clicking the Run! button will launch the job. A directory called Import/job001/ will be cre-

ated, together with a symbolic link to this directory that is called Import/movies. Inside the newly created directory

a star ﬁle with all the movies is created. Have a look at it using:

less Import/job001/movies.star

If you had extracted your particles in a diﬀerent software package, then instead of going through the Preprocessing

steps below, you would use the same Import job-type to import particles star ﬁle, 3D references, 3D masks, etc. Note

that this is NOT the recommended way to run relion, and that the user is responsible for generating correct star ﬁles.

4.2.2 Beam-induced motion correction

The Motion correction job-type implements relion’s own (CPU-based) implementation of the UCSF motioncor2 pro-

gram for convenient whole-frame movie alignment, as well as a wrapper to the (GPU-based) motioncor2 program

itself [ZPA+17]. Besides executing the calculations on the CPU/GPU, there are three other diﬀerences between the two

implementations:

• Bayesian polishing (for per-particle motion-correction; see this section) can only read local motion tracks from

our own implementation;

• The motioncor2 program performs outlier-pixel detection on-the-ﬂy, and this information is not conveyed to

Bayesian polishing , which may result in unexpectedly bad particles after polishing;

• Our own implementation can write out the sum of power spectra over several movie frames, which can be passed

directly into ctﬃnd 4.1 for faster CTF-estimation.

For these three reasons, we now favour running our own implementation.

On the I/O tab set:

4.2. Preprocessing 21

RELION

Input movies STAR ﬁle:

Import/job001/movies.star

(Note that the Browse button will only list movie star ﬁles.)

First frame for corrected sum:

Last frame for corrected sum:

-1

(This will result in using all movie frames.)

Dose per frame (e/A2)

1.277

Pre-exposure (e/A2)

EER fractionation

(This option will be ignored for TIFF ﬁles.)

Write output in ﬂoat16?

Yes

(This will save a factor of 2 in disk space compared to the default of writing in ﬂoat32. Note that RE-

LION and CCPEM will read ﬂoat16 images, but other programs may not (yet) do so. For example,

Gctf will not work with ﬂoat16 images. Also note that this option does not work with UCSF Mo-

tionCor2. For CTF estimation, use CTFFIND-4.1 with pre-calculated power spectra, by activating

the ‘Save sum of power spectra’ option below.)

Do dose-weighting?

Yes

Save non-dose-weighted as well?

(In some cases non-dose-weighted micrographs give better CTF estimates. To save disk space, we’re

not using this option here as the data are very good anyway.)

Save sum of power spectra?

Yes

Sum of power spectra every e/A2:

(This seems to be a good value according to measurements by Greg McMullan and Richard Hender-

son.)

Fill in the Motion tab as follows:

Bfactor:

150

(use larger values for super-resolution movies)

Number of patches X,Y

5 5

22 Chapter 4. Single particle tutorial

RELION

Group frames:

Binning factor:

(we often use 2 for super-resolution movies)

Gain-reference image:

Movies/gain.mrc

(This can be used to provide a gain-reference ﬁle for on-the-ﬂy gain-reference correction. This is

necessary in this case, as these movies are not yet gain-corrected.)

Gain rotation:

No rotation (0)

Gain ﬂip:

No ﬂipping (0)

Defect ﬁle:

(This can be used to mask away broken pixels on the detector. Formats supported in our own im-

plementation and in UCSF motioncor2 are either a text ﬁle in UCSF motioncor2 format (each line

contains four numbers: x, y, width and height of a defect region); or a defect map (an image in MRC

or TIFF format, where 0=good and 1=bad pixels). The coordinate system is the same as the input

movie before application of binning, rotation and/or ﬂipping. Note that defect text ﬁles produced

by SerialEM are NOT supported! However, one can convert a SerialEM-style defect ﬁle into a

defect map using imod.)

Use RELION’s own implementation?

Yes

(this reduces the requirement to install the UCSF implementation. If you have the UCSF program

installed anyway, you could also use that one. In that case, you also need to ﬁll in the options below.)

Fill in the Running tab as follows:

Number of MPI procs:

(Assuming you’re running this tutorial on a local computer)

Number of threads:

(As these movies are 24 frames, each thread will do two movie frames)

Submit to queue?

(Again, assuming you’re running this tutorial on a local computer)

Executing this program takes approximately 5 minutes when using 12 threads on a reasonably modern machine. Note

that our own implementation of the motioncor2 algorithm does not use a GPU. This program is multi-threaded. As

each thread will work independently on a movie frame, it is optimal to use a number of threads such that the number of

movie frames divided by the number threads is an integer number. As these movies have 24 frames, using 12 threads

will result in 2 frames being processed by each thread. You can look at the estimated beam-induced shifts, and their

statistics over the entire data set, by selecting the out: logfile.pdf from the Display: button below the run buttons,

or you can look at the summed micrographs by selecting out: corrected_micrographs.star. Depending on the size of

4.2. Preprocessing 23

RELION

your screen, you should probably downscale the micrographs (Scale: 0.3) and use Sigma contrast: 3 and few

columns (something like Number of columns: 3) for convenient visualisation. Note that you cannot select any

micrographs from this display. If you want to exclude micrographs at this point (which we will not do, because they

are all ﬁne), you could use the Subset selection job-type.

4.2.3 CTF estimation

Next, we will estimate the CTF parameters for each corrected micrograph. You can use the CTF estimation job-type

as a wrapper to Alexis Rohou and Niko Grigorieﬀ’s ctﬃnd 4.1 to execute eﬃciently on the CPU. We now prefer ctﬃnd

4.1, as it is the only open-source option, and because it allows reading in the movie-averaged power spectra calculation

by relion’s own implementation of the motioncor2 algorithm. Support for GCTF was dropped in RELION 5.0. Fill in

the settings as follows:

On the I/O :

Input micrographs STAR ﬁle:

Motioncorr/job002/corrected_micrographs.star

(You can again use the Browse button to select the corrected_micrographs.star ﬁle of the

Motion correction job.)

Use micrograph without dose-weighting?

(These may have better Thon rings than the dose-weighted ones, but we decided in the previous step

not to write these out)

Estimate phase shifts?

(This is only useful for phase-plate data)

Amount of astigmatism (A):

100

(Assuming your scope was reasonably well aligned, this value will be suitable for many data sets.)

On the CTFFIND-4.1 tab, set:

Use CTFFIND-4.1?

Yes

CTFFIND-4.1 executable:

/wherever/it/is/ctﬃnd.exe

Use power spectra from MotionCorr job?

Yes

(We can use these, as we told relion’s own implementation of the motioncor2 algorithm to write these

out in the previous section.)

Use exhaustive search?

(In diﬃcult cases, the slower exhaustive searches may yield better results. For these data, this is not

necessary.)

Estimate CTF on window size (pix)

-1

24 Chapter 4. Single particle tutorial

RELION

(If a positive value is given, a squared window of this size at the center of the micrograph will be

used to estimate the CTF. This may be useful to exclude parts of the micrograph that are unsuitable

for CTF estimation, e.g. the labels at the edge of photographic ﬁlm. )

FFT box size (pix):

512

Minimum resolution (A):

Maximum resolution (A):

Minimum defocus cvalue (A):

5000

Maximum defocus cvalue (A):

50000

Defocus step size (A):

500

On the Running tab, use six MPI processes to process the 24 micrographs in parallel. This took less than 10 seconds on

our machine. Once the job ﬁnishes there are additional ﬁles for each micrograph inside the output CtfFind/job003/

Movies directory: the .ctf ﬁle contains an image in MRC format with the computed power spectrum and the ﬁtted

CTF model; the .log ﬁle contains the output from ctﬃnd; the .com ﬁle contains the script that was used to launch

ctﬃnd.

You can visualise all the Thon-ring images using the Display button, selecting out: micrographs_ctf.star. The

zeros between the Thon rings in the experimental images should coincide with the ones in the model. Note that you can

sort the display in order of defocus, maximum resolution, ﬁgure-of-merit, etc. The logfile.pdf ﬁle contains plots of

useful parameters, such as defocus, astigmatism, estimated resolution, etc for all micrographs, and histograms of these

values over the entire data set. Analysing these plots may be useful to spot problems in your data acquisition.

If you see CTF models that are not a satisfactory ﬁt to the experimental Thon rings, you can delete the .log ﬁles

for those micrographs, select the CtfFind/job003 entry from the Finished jobs list, alter the parameters in the

parameter-panel, and then re-run the job by clicking the Continue! button. Only those micrographs for which a .log

ﬁle does not exist will be re-processed. You can do this until all CTF models are satisfactory. If this is not possible,

or if you decide to discard micrographs because they have unsatisfactory Thon rins, you can use the Subset selection

job-type to do this.

4.2.4 Manual particle picking

The next job-type Manual picking may be used to manually select particle coordinates in the (averaged) micrographs.

We like to manually select at least several micrographs in order to get familiar with our data. Often, the manually

selected particles to calculate reference-free 2D class averages, which will then be used as templates for automated

particle picking of the entire data set. However, as of release 3.0, relion also contains a reference-free auto-picking

procedure based on a Laplacian-of-Gaussian (LoG) ﬁlter. In many cases, this procedure provides reasonable starting

coordinates, so that the Manual picking step may be skipped. The pre-shipped Schemes for on-the-ﬂy processing in the

relion_it.py script make use of this functionality to perform fully automated on-the-ﬂy processing. In this tutorial,

we will just launch a Manual picking job for illustrative purposes, and then proceed with LoG-based Auto-picking to

generate the ﬁrst set of particles.

Picking particles manually is a personal experience! If you don’t like to pick particles in relion, we also support

coordinate ﬁle formats for Jude Short’s ximdisp [Smi99] (with any extension); for xmipp-2.4 [SNRS+08] (with any

extension); and for Steven Ludtke’s e2boxer.py [TPB+07] (with a .box extension). If you use any of these, make

4.2. Preprocessing 25

RELION

sure to save the coordinate ﬁles as a text ﬁle in the same directory as from where you imported the micrographs (or

movies), and with the same micrograph rootname, but a diﬀerent (suﬃx+) extension as the micrograph, e.g. Movies/

006.box or Movies/006_pick.star for micrograph Movies/006.mrc. You should then use the Import job-type

and set Node type: to 2D/3D particle coordinates. Make sure that the Input Files: ﬁeld contains a linux

wildcard, followed by the coordinate-ﬁle suﬃx, e.g. for the examples above you have to give Movies/*.box or

Movies/*_pick.star, respectively.

On the I/O tab of the Manual picking job-type, use the Browse button to select the micrographs_ctf.star ﬁle that

was created in CtfFind/job003, ignore the Colors tab, and ﬁll in the Display tab as follows:

Particle diameter (A):

200

(This merely controls the diameter of the circle that is displayed on the micrograph.)

Scale for micrographs:

0.25

(But this depends on your screen size)

Sigma contrast:

(Micrographs are often best display with sigma-contrast, i.e. black will be 3 standard deviation

below the mean and white will be 3 standard deviations above the mean. The grey-scale is always

linear from black to white. See the DisplayImages entry on the RELION wiki for more details)

White value:

(Use this to manually set which value will be white. For this to work, Sigma contrast should be

set to 0)

Black value:

(Use this to manually set which value will be black. For this to work, Sigma contrast should be

set to 0)

Lowpass ﬁlter (A):

-1

(Playing with this may help you to see particles better in very noisy micrographs)

Highpass ﬁlter (A):

-1

(This is sometimes useful to remove dark->light gradients over the entire micrograph)

Pixel size:

0.885

(This is needed to calculate the particle diameter, and the low- and high-pass ﬁlters)

OR use Topaz denoising?:

Yes

(This has been a feature since relion-4.0 and will make a system call to topaz)

26 Chapter 4. Single particle tutorial

RELION

ò Note

As of relion-5.0, Topaz comes pre-installed in the relion-5 conda environment, which should be picked up auto-

matically by the GUI.

Run the job by clicking the Run! button and click on a few particles if you want to. However, as we will use the

LoG-based autopicking in the next section, you do not need to pick any if you don’t want to. If you were going to use

manually picked particles for an initial 2D classiﬁcation job, then you would need approximately 500-1,000 particles

in order to calculate reasonable class averages. Left-mouse click for picking, middle-mouse click for deleting a picked

particle, right-mouse click for a pop-up menu in which you will need to save the coordinates!. Note that you can

always come back to pick more from where you left it (provided you saved the star ﬁles with the coordinates throught

the pop-up menu), by selecting ManualPick/job004 from the Finished jobs and clicking the Continue! button.

4.3 Particle picking

4.3.1 Select a subset of the micrographs

We will now use a template-free auto-picking procedure based on a Laplacian-of-Gaussian (LoG) ﬁlter to select an

initial set of particles. These particles will then be used in a 2D classiﬁcation job to generate 2D class averages. The

tutorial up until relion-3.1 would suggest to use the resulting class averages as templates for a second, reference-based

Auto-picking job. Since relion-4.0, there is also an integrated topaz wrapper in the Auto-picking job, which will be

used instead. In addition, we will use a new automated 2D class average selection procedure to select particles that

contribute to good classes without any user interaction. The selected particles will then be used to train the neural

network in topaz to speciﬁcally pick particles for this data set. Alternatively, one could run topaz picking with their

default neural network straight away. In that case, one could skip the jobs of LoG-picking, 2D classiﬁcation, automated

2D class selection and re-training of the topaz network below, and proceed straight to the last Auto-picking job on this

page.

One typically trains the topaz neural network on a relatively small subset of the micrographs. In order to select a subset

of the micrographs, go to the Subset selection job, and on the I/O tab leave everything empty, except:

OR select from micrographs.star:

CtfFind/job003/micrographs_ctf.star

Then, on the Subsets tab, set:

OR split into subsets?

Yes

Randomise order before making subsets?

Subset size:

OR number of subsets:

-1

Then press Run! , which will create star ﬁles with subsets of 10 micrographs in the output directory. We will only use

the ﬁrst one Select/job005/micrographs_split1.star.

Note that if one would have preferred a more user-interactive way of selecting micrographs for training, one could

have also selected certain micrographs in the GUI of the previous Manual picking job, to then save a ﬁle called

micrographs_selected.star inside that output directory.

4.3. Particle picking 27

RELION

4.3.2 LoG-based auto-picking

Now, proceed to the Auto-picking job, and on the I/O tab set:

Input micrographs for autopick:

Select/job005/micrographs_split1.star

Pixel size in micrographs (A)

-1

(The pixel size will be set automatically from the information in the input STAR ﬁle.)

Use reference-based template-matching?

OR: use Laplacian-of-Gaussian?

Yes

OR: use Topaz?

On the Laplacian tab, set:

Min. diameter for loG ﬁlter (A)

150

Max. diameter for loG ﬁlter (A)

180

(This should correspond to the smallest and largest size of your particless projections in Ångstroms.)

Are the particles white?

(They are black.)

Maximum resolution to consider

(Just leave the default value here.)

Adjust default threshold

(Positive values, i.e. high thresholds, will pick fewer particles, negative values will pick more parti-

cles. Useful values are probably in the range [-1,1], but in many cases the default value of zero will

do a decent job. The threshold is moved this many standard deviations away from the average.)

Upper threshold

(Use this to discard picks with LoG values that are this many standard deviations above the average,

e.g. to avoid high contrast contamination like ice and ethane droplets. Good values depend on the

contrast of micrographs and may need to be interactively explored; for low contrast micrographs,

values of ~ 1.5 may be reasonable, but this value is too low for the high-contrast micrographs in this

tutorial.)

Ignore the Topaz , References , autopicking and Helix tabs, and run using a single MPI processor on the Running tab .

Perhaps an alias like LoG would be meaningful? Using a single processor, these calculations take about 15 seconds on

our computer.

28 Chapter 4. Single particle tutorial

RELION

You can check the results by clicking the autopick.star option from the Display: button. One could manually

add/delete particles in the pop-up window that appears at this stage. In addition, one could choose to pick more or

fewer particle by running a new job while adjusting the default threshold on the Laplacian tab, and/or the parameters

for the stddev and avg of the noise on the autopicking tab. However, at this stage we are merely after a more-or-less OK

initial set of particles for the generation of templates for a second auto-picking job, so in many cases this is probably

not necessary.

4.3.3 Particle extraction

Once you have a coordinate ﬁle for every micrograph that you want to pick particles from, you can extract the cor-

responding particles and gather all required metadata through the Particle extraction job-type. On the corresponding

I/O tab, set:

micrograph STAR ﬁle:

CtfFind/job003/micrographs_ctf.star

(Use the Browse button to select this ﬁle. You could also chose the selected micrographs ﬁle from

the ManualPick directory. It doesn’t matter as there are only coordinate ﬁles for the three selected

micrographs anyway. Warning that coordinates ﬁles are missing for the rest of the micrographs will

appear in red in the bottom window of the GUI.)

Input coordinates:

AutoPick/job006/autopick.star

(Use the Browse button to select this ﬁle)

OR re-extract reﬁned particles?

(This option allows you to use a _data.star ﬁle from a 2D cassiﬁcation , 3D classiﬁcation or

3D auto-reﬁne job for re-extraction of only those particles in the star ﬁle. This may for example

be useful if you had previously down-scaled your particles upon extraction, and after initial classiﬁ-

cations you now want to perform reﬁnements with the original-scaled particles. As of relion-3.0, this

functionality has been extended with an option to ‘re-center reﬁned coordinates’ on a user-speciﬁed

X,Y,Z-coordinate in the 3D reference used for a 3D classiﬁcation or 3D auto-reﬁne job. This will

adjust the X and Y origin coordinates of all particles, such that a reconstruction of the newly extracted

particles will be centered on that X,Y,Z position. This is useful for focused reﬁnements.)

Write output in ﬂoat16?

Yes

(If set to Yes, this program will write output images in ﬂoat16 MRC format. This will save a factor

of two in disk space compared to the default of writing in ﬂoat32. Note that RELION and CCPEM

will read ﬂoat16 images, but other programs may not (yet) do so.)

On the extract tab you set the parameters for the actual particle extraction:

Particle box size (pix):

256

(This should always be an even number!)

Invert contrast?

Yes

(This makes white instead of black particles.)

4.3. Particle picking 29

RELION

Normalize particles?

Yes

(We always normalize.)

Diameter background circle (pix):

200

(Particles will be normalized to a mean value of zero and a standard-deviation of one for all pixels

in the background area.The background area is deﬁned as all pixels outside a circle with this given

diameter in pixels (before rescaling). When specifying a negative value, a default value of 75% of

the Particle box size will be used.)

Stddev for white dust removal:

-1

Stddev for black dust removal:

-1

(We only remove very white or black outlier pixels if we actually see them in the data. In such cases

we would use stddev values of 5 or so. In this data set there are no outlier pixels, so we don’t correct

for them, and leave the default values at -1 (i.e. don’t do anything).

Rescale particles?

Yes

(Down-scaling particles will speed up computations. Therefore, we often down-scale particles in

the initial stages of processing, in order to speed up the initial classiﬁcations of suitable particles.

Once our reconstructions get close to the Nyquist frequency, we then re-extract the particles without

down-scaling.)

Re-scaled sized (pixels)?

Use autopick FOM threshold?

(This option allows to only extract those particles with the highest ﬁgure-of-merits from the autopick-

ing procedure. We will use this later on to extract particles picked by topaz.)

As we will later on also use the same job-type to extract all template-based auto-picked particles, it may be a good idea

to give this job an alias like LoG. Ignore the Helix tab, and run using a single MPI processor.

Your particles will be extracted into MRC stacks (which always have an .mrcs extension in relion) in a new directory

called Extract/job007/Movies/. It’s always a good idea to quickly check that all has gone OK by visualising your

extracted particles selecting out: particles.star from the Display: button. Right-mouse clicking in the display

window may be used for example to select all particles (Invert selection) and calculating the average of all unaligned

particles (Show average of selection).

30 Chapter 4. Single particle tutorial

RELION

4.3.4 2D class averaging to select good particles

To calculate templates for the subsequent auto-picking of all micrographs, we will use the 2D classiﬁcation job-type.

On the I/O tab, set:

Input images STAR ﬁle

Extract/job007/particles.star

Continue from here

(Note that any 2D classiﬁcation , 3D initial model , 3D classiﬁcation , or 3D auto-reﬁne jobs may be

continued in case it stalls, by providing the _optimiser.star ﬁle from the last completed iteration.)

On the CTF tab set:

Do CTF-correction?

Yes

(We will perform full phase+amplitude correction inside the Bayesian framework)

Ignore CTFs until ﬁrst peak?

(This option is occasionally useful, when amplitude correction gives spuriously strong low-resolution

components, and all particles get classiﬁed together in very few, fuzzy classes.)

On the Optimisation tab, set:

Number of classes:

(For cryo-EM data we like to use on average at least approximately 100 particles per class. For

negative stain one may use fewer, e.g. 20-50 particles per class. However, with this small number

of particles, we have observed a better separation into diﬀerent classes by relaxing these numbers.

Possibly, always having a minimum of 50 classes is not a bad idea.)

Regularisation parameter T:

(For the exact deﬁnition of T, please refer to [Sch12a]. For cryo-EM 2D classiﬁcation we typically use

values of T=2-3, and for 3D classiﬁcation values of 3-4. For negative stain sometimes slightly lower

values are better. In general, if your class averages appear very noisy, then lower T; if your class

averages remain too-low resolution, then increase T. The main thing is to be aware of overﬁtting

high-resolution noise.)

Use EM algorithm?

Yes

(This is the standard Expectation Maximisation algorithm in relion.)

Number of iterations:

(For the default EM-algorithm, one normally doesn’t change the default of 25 iterations)

Use VDAM algorithm?

(This is gradient-descent-like algorithm that was introduced in relion-4.0. It runs much faster than

the standard EM-algorithm for large data sets, and has been observed to yield better class average

4.3. Particle picking 31

RELION

images in many cases. It is however slower for data sets with only a few thousand particles, which is

the main reason we are not using it here.)

Mask diameter (A):

200

(This mask will be applied to all 2D class averages. It will also be used to remove solvent noise

and neighbouring particles in the corner of the particle images. On one hand, you want to keep the

diameter small, as too much noisy solvent and neighbouring particles may interfere with alignment.

On the other hand, you want to make sure the diameter is larger than the longest dimension of your

particles, as you do not want to clip oﬀ any signal from the class averages.)

Mask individual particles with zeros?

Yes

Limit resolution E-step to (A):

-1

(If a positive value is given, then no frequencies beyond this value will be included in the alignment.

This can also be useful to prevent overﬁtting. Here we don’t really need it, but it could have been set

to 10-15A anyway. Diﬃcult classiﬁcations, i.e. with very noisy data, often beneﬁt from limiting the

resolution.)

Center class averages?

Yes

(This will re-center all class average images every iteration based on their center of mass. This is

useful for their subsequent use in template-based auto-picking, but also for the automated 2D class

average image selection in the next section.)

On the Sampling tab we hardly ever change the defaults. Six degrees angular sampling is enough for most projects,

although some large icosahedral viruses or some ﬁlamentous structures may beneﬁt from ﬁner angular samplings.

Ignore the Helix tab, and on the Compute tab, set:

Use parallel disc I/O?

Yes

(This way, all MPI slaves will read their own particles from disc. Use this option if you have a fast

(parallel?) ﬁle system. Note that non-parallel ﬁle systems may not be able to handle parallel access

from multiple MPI nodes. In such cases one could set this option to No. In that case, only the master

MPI node will read in the particles and send them through the network to the MPI slaves.)

Number of pooled particles:

(Particles are processed in individual batches by MPI slaves. During each batch, a stack of particle

images is only opened and closed once to improve disk access times. All particle images of a single

batch are read into memory together. The size of these batches is at least one particle per thread

used. The nr_pooled_particles parameter controls how many particles are read together for

each thread. If it is set to 30 and one uses 8 threads, batches of 30x8=240 particles will be read

together. This may improve performance on systems where disk access, and particularly metadata

handling of disk access, is a problem. Typically, when using GPUs we use values of 10-30; when

using only CPUs we use much smaller values, like 3. This option has a modest cost of increased

RAM usage.)

Pre-read all particles into RAM?

Yes

32 Chapter 4. Single particle tutorial

RELION

(If set to Yes, all particle images will be read into computer memory, which will greatly speed up

calculations on systems with slow disk access. However, one should of course be careful with the

amount of RAM available. Because particles are read in double-precision, it will take ( N × box_size

× box_size × 4 / (1024 × 1024 × 1024) ) Giga-bytes to read N particles into RAM. If parallel disc I/O

is set to Yes, then all MPI slaves will read in all particles. If parallel disc I/O is set to No, then only

the master reads all particles into RAM and sends those particles through the network to the MPI

slaves during the reﬁnement iterations.)

Copy particles to scratch directory?

(This is useful if you don’t have enough RAM to pre-read all particles, but you do have a fast (SSD?)

scratch disk on your computer. In that case, specify the name of the scratch disk where you can make

a temporary directory, e.g. /ssd)

Combine iterations through disc?

(This way all MPI nodes combine their data at the end of each iteration through the network. If the

network is your main bottle-neck or somehow causing problems, you can set this option to No. In

that case, all MPI nodes will write/read their data to disc.)

Use GPU acceleration?

Yes

(If you have a suitable GPU, this job will go much faster.)

Which GPUs to use:

0:1

(This will depend on the available GPUs on your system! If you leave this empty, the program will

try to ﬁgure out which GPUs to use, but you can explicitly tell it which GPU IDs , e.g. 0 or 1, to use.

If you use multiple MPI-processors, you can run each MPI process on a speciﬁed GPU. Our machine

has 2 GPUs, and we will use on MPI process on each GPU in this example. GPU IDs for diﬀerent

MPI processes are separated by colons, e.g. 0:1:0:1 will run MPI process 0 and 2 on GPU 0, and

MPI process 1 and 3 will run on GPU 1. GPU IDs for diﬀerent threads are separated by commas, so

when using a single MPI process one could still use multiple GPUs, e.g. 0,1,2,3. Combinations of

colons and commas are also possible.)

On the Running tab, specify:

Number of MPI procs

(Note that when using the EM-algorithm, 2D classiﬁcation , 3D classiﬁcation , 3D initial model and

3D auto-reﬁne use one MPI process as a master, which does not do any calculations itself, but sends

jobs to the other MPI processors. Therefore, we often run the EM-algorithm using a single worker

MPI process on each of the available GPUs, so we specify 3 here to include the master and one

workers on each of the two GPUs.)

Number of threads

(Threads oﬀer the advantage of more eﬃcient RAM usage, whereas MPI parallelization may scale

better than threads for iterations with many particles. Often, you may want to adjust the number of

threads to make full use of all the CPU cores on your computer. The total number of requested CPUs,

or cores, will be the product of the number of MPI processors and the number of threads.)

Because we will run more 2D classiﬁcation jobs, it may again be a good idea to use a meaningful alias, for example

LoG. You can look at the resulting class averages using the Display: button to select out: run_it025_optimiser.star

4.3. Particle picking 33

RELION

from. On the pop-up window, you may want to choose to look at the class averages in a speciﬁc order, e.g. based on

rlnClassDistribution (in reverse order, i.e. from high-to-low instead of the default low-to-high) or on rlnAccuracyRo-

tations.

4.3.5 Selecting good 2D classes for Topaz training

Selection of suitable class average images is done in the Subset selection job-type. Up until relion-3.1, this step was

always done interactively by the user, who would select good class averages by clicking on them in the GUI. As of

relion-4.0, there is also an automated procedure, based on a neural network that was trained on thousands of 2D class

averages. This option will be used below.

On the I/O tab, remove the micrographs.star ﬁle entry from before, and set:

Select classes from job:

Class2D/job008/run_it025_optimiser.star

On the Class options tab, give:

Automatically select 2D classes?

Yes

Minimum threshold for auto-selection

0.21

(The score ranges from 0 for absolute rubbish class average images to 1 for gorgeous ones. We are

using a relatively low value here, because we only have a few particles, so the 2D class averages will

probably not look very good. On your own data sets, you will probably want to run the program

once, sort your class averages on their predicted score and decide what a good value is for those class

averages; also see below.)

Select at least this many particles

-1

(If this is value is positive, then even if they have scores below the minimum threshold, select at least

this many particles with the best scores.)

OR: select at least this many classes

-1

(If this is value is positive, then even if they have scores below the minimum threshold, select at least

this many classes with the best scores.)

Re-center the class averages?

(This option allows automated centering of the 2D class averages, but we already did that during 2D

class averaging. In particular when using class average images for auto-picking it is important that

the are centered, as otherwise all your particle coordinates will become systematically oﬀ-centered.)

Regroup the particles?

(This option is useful when there are very few (selected) particles on individual micrographs, in which

case the estimation of noise power spectra and scale factors become unstable. By default, the latter

are calculated independently per micrograph. This option allows to grouping particles from multiple

micrographs together in these calcutaions. relion will warn you (in classiﬁcation or auto-reﬁne runs)

when your groups become too small.)

34 Chapter 4. Single particle tutorial

RELION

On the Subsets tab, make sure you switch to No again the following option:

:OR: split into subsets? No

Ignore the other tabs, and run the job. You can visualise the results of the automated class selection by selecting

rank_optimiser.star from the Display: button, and sort the images on rlnClassScore, in reverse order. Do you

want to adjust the threshold for auto-selection?

4.3.6 Re-training the TOPAZ neural network

In older versions of the relion tutorial, one would now use the selected 2D class averages as templates for reference-

based auto-picking. Instead, the new wrapper to topaz will be used to ﬁrst re-train the neural network in topaz and then

to pick the entire data set using the retrained network. Note that for this data set one could also have foregone re-training

of topaz and just use the pretrained network it comes with. This tutorial is merely showing you the re-training option

as it may be relevant for your own data.

On the I/O tab of the Auto-picking job-type, set:

Input micrographs for autopick:

Select/job005/micrographs_split1.star

Pixel size in micrographs (A)

-1

Use reference-based template-matching?

OR: use Laplacian-of-Gaussian?

OR: use Topaz?

Yes

On the Topaz tab, set:

Particle diameter (A)

180

Perform topaz picking?

Perform topaz training?

Yes

Nr of particles per micrograph

300

Input picked coordinates for training

(This option can be used to train on manually selected particles from a Manual picking job. We will use the automatically selected particles from the previous step instead.)

OR train on a set of particles?

Yes

Particles STAR ﬁle for training

Select/job009/particles.star

Additional topaz arguments

4.3. Particle picking 35

RELION

On the autopicking tab, you can ignore everything except the below:

Use GPU acceleration?

Yes

(Topaz picking and training require one GPU)

Which GPUs to use:

Ignore the other tabs, and run using a single MPI processor on the Running tab . On our computer, with a Titan V GPU,

this step took 10 minutes. Perhaps a good time for a quick cup of coﬀee?

4.3.7 Pick all micrographs with the re-trained TOPAZ neural network

On the I/O tab of a new Auto-picking job, set:

Input micrographs for autopick:

CtfFind/job003/micrographs_ctf.star

Pixel size in micrographs (A)

-1

Use reference-based template-matching?

OR: use Laplacian-of-Gaussian?

OR: use Topaz?

Yes

On the Topaz tab, set:

Particle diameter (A)

180

Perform topaz picking?

Yes

Trained topaz model

AutoPick/job010/model_epoch10.sav

Perform topaz training?

Nr of particles per micrograph

300

Additional topaz arguments

On the autopicking tab, you can ignore everything except the below:

Use GPU acceleration?

Yes

(Topaz picking and training require one GPU)

36 Chapter 4. Single particle tutorial

RELION

Which GPUs to use:

On our computer, running with a single process, this step takes approximately 4 minutes. Note that re-training of topaz

is not parallelised and should always be performed with a single MPI process. However, picking with topaz has been

parallelised and can be run using multiple MPI processes.

The number of particles from default topaz picking will be relatively high, because no threshold to its ﬁgure-of-merit

will be applied. The ﬁgure-of-merits for all picks are stored in the rlnAutopickFigureOfMerit column in the output

STAR ﬁles. A minimum threshold of -3 is probably reasonable in many cases. One can visualise the ﬁgure of merits

by colouring the picks in the micrographs. For that, change the colouring parameters in the Manual picking job-type.

On the following on the Colors tab, set:

Blue<>red color particles?

Yes

MetaDataLabel for color:

rlnAutopickFigureOfMerit

STAR ﬁle with color label:

Blue value:

Red value:

-3

and save the settings to the project directory with the Save job.star menu item from the top left Jobs menu.

Then, select autopick.star from the Display: button of the Autopick/job011 job to launch the GUI. From the

File menu at the top left of its main window, one can use Set FOM threshold to display only picks with a FOM

above the threshold A similar option is also available in the per-micrograph viewer, using the right-mouse button pop-up

menu. Picks with a high threshold will be blue; picks with a low threshold will be red.

4.3.8 Particle extraction

Finally, one needs to re-extract the ﬁnal set of picked coordinates by again using the Particle extraction job-type.

On the corresponding I/O tab, set:

micrograph STAR ﬁle:

CtfFind/job003/micrographs_ctf.star

Input coordinates:

AutoPick/job011/autopick.star

OR re-extract reﬁned particles?

Leave all the other options as they were before, except for the extract tab, where one sets:

Use autopick FOM threshold?

Yes

Minimum autopick FOM

-3

4.3. Particle picking 37

RELION

Write output in ﬂoat16?

Yes

Running this job will generate the initial particle set for further processing. Using four MPI processors, this job takes

a few seconds.

4.4 Reference-free 2D class averaging

We almost always use reference-free 2D class averaging to throw away bad particles. Because bad particles do not

average well together, they often go to relatively small classes that yield ugly 2D class averages. Throwing those away

then becomes an eﬃcient way of cleaning up your data.

4.4.1 Running the job

Most options will remain the same as explained when we were generating templates for the auto-picking in the previous

section, but on the I/O tab of the 2D classiﬁcation job-type, set:

Input images STAR ﬁle:

Extract/job012/particles.star

and on the Optimisation tab, we used:

Number of classes:

100

(because we now have more particles, we can allow more classes than before.)

Regularisation parameter T:

(For the exact deﬁnition of T, please refer to [Sch12a]. For cryo-EM 2D classiﬁcation we typically use

values of T=2-3, and for 3D classiﬁcation values of 3-4. For negative stain sometimes slightly lower

values are better. In general, if your class averages appear very noisy, then lower T; if your class

averages remain too-low resolution, then increase T. The main thing is to be aware of overﬁtting

high-resolution noise.)

Use EM algorithm?

(This is the standard Expectation Maximisation algorithm in relion.)

Use VDAM algorithm?

Yes

(This is gradient-descent-like algorithm that was introduced in relion-4.0. It runs much faster than

the standard EM-algorithm for large data sets, and has been observed to yield better class average

images in many cases.)

Number of VDAM mini-batches

100

(Number of mini-batches to be processed using the VDAM algorithm. Using 200 has given good

results for many data sets. Using 100 will run faster, but sometimes at the expense of some quality

loss.)

38 Chapter 4. Single particle tutorial

RELION

Mask diameter (A):

200

(This mask will be applied to all 2D class averages. It will also be used to remove solvent noise

and neighbouring particles in the corner of the particle images. On one hand, you want to keep the

diameter small, as too much noisy solvent and neighbouring particles may interfere with alignment.

On the other hand, you want to make sure the diameter is larger than the longest dimension of your

particles, as you do not want to clip oﬀ any signal from the class averages.)

Mask individual particles with zeros?

Yes

Limit resolution E-step to (A):

-1

(If a positive value is given, then no frequencies beyond this value will be included in the alignment.

This can also be useful to prevent overﬁtting. Here we don’t really need it, but it could have been set

to 10-15A anyway. Diﬃcult classiﬁcations, i.e. with very noisy data, often beneﬁt from limiting the

resolution.)

Center class averages?

Yes

(This will re-center all class average images every iteration based on their center of mass. This is

useful for their subsequent use in template-based auto-picking, but also for the automated 2D class

average image selection in the next section.)

The VDAM algorithm cannot be run with MPI parallelisation. Therefore, we ran with the following options on the

Compute tab:

Use GPU acceleration?

Yes

Which GPUs to use:

0,1,2,3

(Our machine has 4 GPUs, so we’re using them all.)

and on the Running tab, we used:

Number of MPI procs:

Number of threads:

The job then took just over 3 minutes.

4.4.2 Selecting good particles for further processing

After the 2D classiﬁcation job has ﬁnished, we can launch another Subset selection job (Select/job014).

On the I/O tab, set:

Select classes from job:

Class2D/job013/run_it100_optimiser.star

On the Class options tab, set:

4.4. Reference-free 2D class averaging 39

RELION

Automatically select 2D classes?

Yes

Minimum threshold for auto-selection

0.1

(You may need to run the program twice, perhaps by overwriting the previous run (using the Over-

write option from the Jobactions: button, to select the appropriate threshold. Alternatively, you can

also selected the classes interactively, while sorting on rlnClassDistribution.)

We got 4598 particles from 34 selected classes.

Note that this procedure of 2D classiﬁcation and Subset selection may be repeated several times. But if you do so, be

careful not to throw away your minority views!

4.4.3 Analysing the Class2D results in more detail

ò Note

If you are in a hurry to get through this tutorial, you can skip this sub-section. It contains more detailed information

for the interested reader.

For every iteration of 2D or 3D classiﬁcation relion performs, it writes out a set of ﬁles. For the last iteration of our 2D

class averaging calculation these are:

• Class2D/job013/run_it100_classes.mrcs is the MRC stack with the resulting class averages. These are

the images that will be displayed in the relion GUI when you select the _optimiser.star ﬁle from the Display:

button on the main GUI. Note that relion performs full CTF correction (if selected on the GUI), so your class

averages are probably white on a black background. If the data is good, often they are very much like projections

of a low-pass ﬁltered atomic model. The quality of your 2D class averages are a very good indication of how good

your 3D map will become. We like to see internal structure within projections of protein domains, and the solvent

area around you particles should ideally be ﬂat. Radially extending streaks in the solvent region are a typical

sign of overﬁtting. If this happens, you could try to limit the resolution in the E-step of the 2D classiﬁcation

algorithm.

• Class2D/job013/run_it100_model.star contains the model parameters that are reﬁned besides the actual

class averages (i.e. the distribution of the images over the classes, the spherical average of the signal-to-noise

ratios in the reconstructed structures, the noise spectra of all groups, etc. Have a look at this ﬁle using the less

command. In particular, check the distribution of particles over each class in the table data_model_classes.

If you compare this with the class averages themselves, you will see that particles with few classes are low-

resolution, while classes with many particles are high-resolution. This is an important feature of the Bayesian

approach, as averaging over fewer particles will naturally lead to lower signal-to-noise ratios in the average. The

estimated spectral signal-to-noise ratios for each class are stored in the data_model_class_N tables, where

N is the number of each class. The table data_model_groups stores a reﬁned intensity scale-factor for each

group: groups with values higher than one have a stronger signal than the average, relatively low-signal groups

have values lower than one. These values are often correlated with the defocus, but also depend on accumulated

contamination and ice thickness. For each diﬀerent optics group, the estimated noise spectra are stored in tables

called data_model_optics_group_N.

• Class2D/job013/run_it100_data.star contains all metadata related to the individual particles. Besides

the information in the input particles.star ﬁle, there is now additional information about the optimal orien-

tations, the optimal class assignment, the contribution to the log-likelihood, etc. Note that this ﬁle can be used

again as input for a new reﬁnement, as the star ﬁle format remains the same.

40 Chapter 4. Single particle tutorial

RELION

• Class2D/job013/run_it100_optimiser.star contains some general information about the reﬁnement pro-

cess that is necessary for restarting an unﬁnished run. For example, if you think the process did not converge yet

after 25 iterations (you could compare the class averages from iterations 24 and 25 to assess that), you could select

this job in the Finished jobs panel, and on the I/O tab select this ﬁle for Continue from here, and then set

Number of iterations: 40 on the Optimisation tab. The job will then restart at iteration 26 and run until

iteration 40. You might also choose to use a ﬁner angular or translational sampling rate on the Sampling tab.

Another useful feature of the optimiser.star ﬁles is that it’s ﬁrst line contains a comment with the exact command

line argument that was given to this run. As of release-4.0, relion also uses the optimiser.star ﬁles as input nodes

for diﬀerent types of subsequent jobs. For example, it replaces the model.star input nodes for Subset selection

jobs.

• Class2D/job013/run_it100_sampling.star contains information about the employed sampling rates. This

ﬁle is also necessary for restarting.

4.4.4 Making groups

ò Note

If you are in a hurry to get through this tutorial, you can skip this sub-section. It contains more detailed information

for the interested reader.

relion groups particles together for the estimation of a single-number intensity scale factor that describes diﬀerences

in overall signal-to-noise ratios between diﬀerent parts of the data, e.g. due to ice thickness, defocus or contamination.

The default behaviour is to treat all particles from each micrograph as a separate group. This behaviour is ﬁne if you

have many particles per micrograph, but when you are using a high magniﬁcation, your sample is very diluted, or your

ﬁnal selection contains only a few particles per micrograph, then the estimation of the intensity scale factor (and the

noise spectra) may become unstable. We generally recommend to have at least 10-20 particles in each group, but do

note that initial numbers of particles per group may become much smaller after 2D and 3D classiﬁcation.

In cases with few particles per micrograph we recommend to group particles from multiple micrographs together.

For this purpose, the GUI implements a convenient functionality in the Subset selection job-type: when selecting a

_optimiser.star ﬁle on the I/O tab, one can use Regroup particles? Yes and Approximate nr of groups:

5 on the Class options tab to re-group all particles into 5 groups. (The actual number may vary somewhat from the input

value, hence the Approximate on the input ﬁeld.) This way, complicated grouping procedures in previous releases of

relion may be avoided. As the micrographs in this tutorial do contain suﬃcient particles, we will not use this procedure

now.

Please note that the groups in relion are very diﬀerent from defocus groups that are sometimes used in other programs.

relion will always use per-particle (anisotropic) CTF correction, irrespective of the groups used.

4.5 De novo 3D model generation

relion-4.0 uses a gradient-driven algorithm to generate a de novo 3D initial reference

from the 2D particles. As of

release 4.0, this algorithm is diﬀerent from the SGD algorithm in the CryoSPARC program [PRFB17]. Provided you

have a reasonable distribution of viewing directions, and your data were good enough to yield detailed class averages

in 2D classiﬁcation , this algorithm is likely to yield a suitable, low-resolution model that can subsequently be used for

3D classiﬁcation or 3D auto-reﬁne .

4.5. De novo 3D model generation 41

RELION

4.5.1 Running the job

Select the Select/job014/particles.star ﬁle on the I/O tab of the 3D initial reference jobtype. Everything is

aready in order on the CTF . Fill in the Optimisation tab as follows (leave the defaults for the angular and oﬀset

sampling):

Number of VDAM mini-batches

100

(The VDAM algorithm will loop over mini-batches, which contain only hundreds to thousands of

particles each.)

Regularisation parameter T

(The default is 4 for 3D runs.)

Number of classes

(Sometimes, using more than one class may help in providing a ‘sink’ for sub-optimal particles that

may still exist in the data set. In this case, which is quite homogeneous, a single class should work

just ﬁne.)

Mask diameter (A)

200

(The same as before).

Flatten and enforce non-negative solvent

Yes

Symmetry

Run in C1 and apply symmetry later?

Yes

(The actual reﬁnement will be run in C1, which has been observed to converge better than running

VDAM in higher symmetry groups. After the reﬁnement, the relion_align_symmetry program

is run automatically to detect the symmetry axes and the symmetry will be applied.)

On the Compute tab, set:

Use parallel disc I/O?

Yes

Number of pooled particles

Pre-read all particles into RAM?

Yes

(Again, this is only possible here because the data set is small. For your own data, you would like

write the particles to a scratch disk instead, see below.)

Copy particles to scratch directory

Combine iterations through disc?

42 Chapter 4. Single particle tutorial

RELION

Use GPU acceleration?

Yes

Which GPUs to use

0,1,2,3

On the Running tab, set:

Number of MPI procs

(Remember that the gradient-driven algorithm cannot be run with multiple MPI processes.)

Number of threads

Using the settings above, this job took 2 minutes on our system.

4.5.2 Analysing the results

You can look at the output map in 2D slices through the 3D map by selecting InitialModel/job015/

initial_model.mrc from the Display: button. We like looking at 3D maps in 2D slices, as it is a good way to

assess artifacts, for example streaks in the solvent region. You may also want to look at your map in 3D, with a 3D

viewer like UCSF chimera.

4.6 Unsupervised 3D classiﬁcation

All data sets are heterogeneous! The question is how much you are willing to tolerate. relion’s 3D multi-reference

reﬁnement procedure provides a powerful unsupervised 3D classiﬁcation approach.

4.6.1 Running the job

Unsupervised 3D classifcation may be run from the 3D classiﬁcation job-type. On the I/O tab set:

Input images STAR ﬁle:

Select/job014/particles.star

Reference map:

InitialModel/job015/initial_model.mrc

Reference mask (optional):

(Leave this empty. This is the place where we for example provided large/small-subunit masks for

our focussed ribosome reﬁnements. If left empty, a spherical mask with the particle diameter given

on the Optimisation tab will be used. This introduces the least bias into the classiﬁcation.)

On the Reference tab set:

Ref. map is on absolute greyscale:

Yes

(Given that this map was reconstructed from this data set, it is already on the correct greyscale. Any

map that is not reconstructed from the same data in relion should probably be considered as not being

on the correct greyscale.)

4.6. Unsupervised 3D classiﬁcation 43

RELION

Initial low-pass ﬁlter (A):

(One should NOT use high-resolution starting models as they may introduce bias into the reﬁnement

process. As also explained in [Sch10], one should ﬁlter the initial map as much as one can. For

ribosome we often use 70 Å, for smaller particles we typically use values of 40-60 Å.)

Symmetry:

(Although we know that this sample has D2 symmetry, it is often a good idea to perform an initial

classiﬁcation without any symmetry, so bad particles, which are not symmetric, can get separated

from proper ones, and the symmetry can be veriﬁed in the reconstructed maps.)

On the CTF tab set:

Do CTF correction?

Yes

Ignore CTFs until ﬁrst peak?

(Only use this option if you also did so in the 2D classiﬁcation job that you used to create the refer-

ences.)

On the Optimisation tab set:

Number of classes:

(Using more classes will divide the data set into more subsets, potentially describing more variability.

The computational costs scales linearly with the number of classes, both in terms of CPU time and

required computer memory.)

Regularisation parameter T:

For the exact deﬁnition of T, please refer to [Sch12a]. For cryo-EM 2D classiﬁcation we typically

use values of T=1-2, and for 3D classiﬁcation values of 2-4. For negative stain sometimes slightly

lower values are better. In general, if your class averages appear noisy, then lower T; if your class

averages remain too low resolution, then increase T. The main thing is to be aware of overﬁtting

high-resolution noise.

Number of iterations:

(We typically do not change this.)

Use fast subsets (for large data sets)?:

(This option will signiﬁcantly speed up calculations for data sets of hundreds of thousands pf parti-

cles. However, sometimes performance is aﬀected too. For small data sets like this one, we do not

recommend using this option.)

Mask diameter (A):

200

(Just use the same value as we did before in the 2D classiﬁcation job-type.)

Mask individual particles with zeros?

Yes

44 Chapter 4. Single particle tutorial

RELION

Limit resolution E-step to (A):

-1

(If a positive value is given, then no frequencies beyond this value will be included in the alignment.

This can also be useful to prevent overﬁtting. Here we don’t really need it, but it could have been set

to 10-15A anyway.)

Use Blush regularisation?

Yes

(Blush regularisation is a new feature in relion-5.0. It switches from the default smoothness prior

in Fourier space to a new prior in the form of a denoising convolutional neural network that was

trained on many EMDB half-maps. It is particularly powerful when the signal-to-noise ratios in the

images is low, e.g. because of small complexes, and standard 3D classiﬁcation, 3D auto-reﬁnement

(see next section of this tutorial), or multi-body reﬁnement suﬀers from overﬁtting. This tutorial data

set has relatively high signal-to-noise ratios, so using Blush regularisation is probably not necessary.

We used it here anyway to make you aware of the option. To run this option, the relion-5 conda

environment should be installed and you need CUDA-enabled GPUs to run pytorch on. If you

cannot run this, don’t worry and just set the option to No.)

On the Sampling tab one usually does not need to change anything (only for large and highly symmetric particles, like

icosahedral viruses, does one typically use a 3.7 degree angular sampling at this point). Ignore the Helix tab, and on

the Compute tab set:

Use parallel disc I/O?

Yes

Number of pooled particles:

Skip padding?

Pre-read all particles into RAM?

Yes

(Again, this is only possible here because the data set is small. For your own data, you would like

write the particles to a scratch disk instead, see below.)

Copy particles to scratch directory

Combine iterations through disc?

Use GPU acceleration?

Yes

Which GPUs to use

0:1:2:3

On the Running tab, set:

Number of MPI procs

Number of threads

4.6. Unsupervised 3D classiﬁcation 45

RELION

3D classiﬁcation takes more memory than 2D classiﬁcation, so often more threads are used. However, in this case the

images are rather small and RAM-shortage may not be such a big issue. On our computer with 4 GPUs, we used 5

MPIs and 6 threads, and this calculation took approximately 6 minutes without Blush. Switching Blush on changed

this to 13 minutes. The relatively diﬀerence will be much smaller for larger data sets that spend most of their time

during their E-step, whereas Blush regularisation happens only in the M-step.

When analysing the resulting class reconstructions, it is useful to also look at them in 2D slices, not only as a thresholded

map in for example UCSF chimera. In the slices view you will get a much better impression of unresolved heterogeneity,

which will show up as fuzzy or streaked regions in the slices. Slices also give a good impression of the ﬂatness of the

solvent region. Use the Display: button and select any of the reconstructions from the last iteration to open a slices-view

in relion; you can also see a central slice through all classes simultaneously by selecting the run_it025_optimiser.

star option from the Display: button. We then often reverse sort them on rlnClassDistribution.

When looking at your rendered maps in 3D, e.g. using UCSF chimera, it is often a good idea to ﬁt them all into the

best one, as maps may rotate slightly during reﬁnement. In chimera, we use the [Tools]-[Volume Data]-[Fit

in Map] tool for that. For looking at multiple maps alongside each other, we also like the [Tools]-[Structure

Comparison]-[Tile Structures] tool, combined with the independent center-of-rotation method on the

Viewing window.

4.6.2 Selecting good particles for further processing

After the 3D classiﬁcation job has ﬁnished, we can launch another Subset selection job.

On the I/O tab, set:

Select classes from job:

Class3D/job016/run_it25_optimiser.star

Because automated class selection has not been implemented for 3D classiﬁcation, on the Class options tab, we now

need to set:

Automatically select 2D classes?

We selected a single good class, discarding particles from 3 suboptimal classes. We retained over 4400 particles. This

well-behaved tutorial data set is relatively homogeneous.. .

Note that this procedure of 3D classiﬁcation and Subset selection may be repeated several times.

4.6.3 Analysing the results in more detail

ò Note

Again, if you are in a hurry to get through this tutorial, you can skip this sub-section.

It contains more detailed information for the interested reader.

The output ﬁles are basically the same as for the 2D classiﬁcation run (we’re actually using the same code for 2D

and 3D reﬁnements). The only diﬀerence is that the map for each class is saved as a separate MRC map, e.g.

run_it025_class00?.mrc, as opposed to the single MRC stack with 2D class averages that was output before.

As before, smaller classes will be low-pass ﬁltered more strongly than large classes, and the spectral signal-to-noise

ratios are stored in the data_model_class_N tables (with 𝑁 = 1, . . . , 𝐾) of the _model.star ﬁles. Perhaps now is

a good time to introduce two handy scripts that are useful to extract any type of data from star ﬁles. Try typing:

46 Chapter 4. Single particle tutorial

RELION

relion_star_printtable Class3D/job016/run_it025_model.star

data_model_class_1 rlnResolution rlnSsnrMap

It will print the two columns with the resolution (rlnResolution) and the spectral signal-to-noise ratio (rlnSsnrMap)

from table data_model_class_1 to the screen. You could redirect this to a ﬁle for subsequent plotting in your favourite

program. Alternatively, if gnuplot is installed on your system, you may type:

relion_star_plottable Class3D/job016/run_it025_model.star

data_model_class_1 rlnResolution rlnSsnrMap

To check whether your run had converged, (as mentioned above) you could also monitor:

grep _rlnChangesOptimalClasses Class3D/job016/run_it???_optimiser.star

As you may appreciate by now: the star ﬁles are a very convenient way of handling many diﬀerent types of input and out-

put data. Linux shell commands like grep and awk, possibly combined into scripts like relion_star_printtable,

provide you with a ﬂexible and powerful way to analyze your results.

4.7 High-resolution 3D reﬁnement

Once a subset of suﬃcient homogeneity has been selected, one may use the 3D auto-reﬁne procedure in relion to

reﬁne this subset to high resolution in a fully automated manner. This procedure employs the so-called gold-standard

way to calculate Fourier Shell Correlation (FSC) from independently reﬁned half-reconstructions in order to estimate

resolution, so that self-enhancing overﬁtting may be avoided [SC12]. Combined with a procedure to estimate the

accuracy of the angular assignments [Sch12b], it automatically determines when a reﬁnement has converged. Thereby,

this procedure requires very little user input, i.e. it remains objective, and has been observed to yield excellent maps

for many data sets. Another advantage is that one typically only needs to run it once, as there are hardly any parameters

to optimize.

However, before we start our high-resolution reﬁnement, we should ﬁrst re-extract our current set of selected particles

with less down-scaling, so that we can potentially go to higher resolution. To do this, go to the Particle extraction

jobtype on the GUI, and on the I/O tab give:

micrograph STAR ﬁle:

CtfFind/job003/micrographs_ctf.star

(This should still be there.)

Coordinate-ﬁle suﬃx:

(Leave this empty now.)

OR re-extract reﬁned particles?

Yes

Reﬁned particles STAR ﬁle:

Select/job017/particles.star

(Now, we will use only the reﬁned subset of selected particles.)

Reset the reﬁned oﬀsets to zero?

(This would discard the translational oﬀsets from the previous classiﬁcation runs.)

4.7. High-resolution 3D reﬁnement 47

RELION

OR: re-center reﬁned coordinates?

Yes

(This will re-center all the particles according to the aligned oﬀsets from the 3D classiﬁcation job

above.)

Recenter on - X, Y, Z (pix)

0 0 0

(We want to keep the centre of the molecule in the middle of the box.)

Write output in ﬂoat16?

Yes

And on the extract tab, we keep everything as it was, except:

Particle box size (pix)

360

(we will use a larger box, so that de-localised CTF signals can be better modeled. This is important

for the CTF reﬁnement later on.)

Rescale particles?

Yes

(to prevent working with very large images, let’s down-sample to a pixel size of 360*0.885/256=1.244

Å. This will limit our maximum achievable resolution to 2.5 Å, which is probably enough for such a

small data set.)

Re-scaled size (pixels):

256

In addition, we will need to rescale the best map obtained thus far to the 256-pixel box size. This is done from the

command-line, but make sure you select the correct class (check the ﬁle Class3D/job016/run_it025_model.star

to see which class is the largest)!:

relion_image_handler --i Class3D/job016/run_it025_class002.mrc \

--angpix 3.54 --rescale_angpix 1.244 --new_box 256 \

--o Class3D/job016/run_it025_class002_box256.mrc

4.7.1 Running the auto-reﬁne job

On the I/O tab of the 3D auto-reﬁne job-type set:

Input images STAR ﬁle:

Extract/job018/particles.star

Reference map:

Class3D/job016/run_it025_class002_box256.mrc

(Note this one is not directly available through the Browse button, as it was not part the relion pipeline

yet.)

Reference mask (optional):

(leave this empty for now)

On the Reference tab, set:

48 Chapter 4. Single particle tutorial

RELION

Ref. map is on absolute greyscale?

(because of the diﬀerent normalisation of down-scaled images, the rescaled map is no longer on the

correct absolute grey scale. Setting this option to No is therefore important, and will correct the

greyscale in the ﬁrst iteration of the reﬁnement.)

Initial low-pass ﬁlter (A)

(We typically start auto-reﬁnements from low-pass ﬁltered maps to prevent bias towards high-

frequency components in the map, and to maintain the gold-standard of completely independent

reﬁnements at resolutions higher than the initial one.)

Symmetry

(We now aim for high-resolution reﬁnement, so imposing symmetry will eﬀectively quadruple the

number of particles.)

Parameters on the CTF , Optimisation tabs remain the same as they were in the 3D classiﬁcation job, but let’s skip

Blush regularisation for this data set with its relatively high signal-to-noise particles.

On the Auto-sampling tab, one can usually keep the defaults. Note that the orientational sampling rates on the Sampling

tab will only be used in the ﬁrst few iterations, from there on the algorithm will automatically increase the angular

sampling rates until convergence. Therefore, for all reﬁnements with less than octahedral or icosahedral symmetry, we

typically use the default angular sampling of 7.5 degrees, and local searches from a sampling of 1.8 degrees. Only for

higher symmetry reﬁnements, we use 3.7 degrees sampling and perform local searches from 0.9 degrees. The only

thing we will change here is to set:

Use ﬁner angular sampling faster?

Yes

(This will be more aggresive in proceeding with iterations of ﬁner angular sampling faster. This will

therefore speed up the calculations. You might want to check that you’re not loosing resolution for

this in the later stages of your own processing, but during the initial stages it often does not matter

much.)

On the Compute tab, we also replicate the settings of the 3D classiﬁcation :

Use parallel disc I/O?

Yes

Number of pooled particles:

Skip padding?

Yes

(By default, relion will pad the 3D maps with zeros, to allow better interpolations in Fourier space.

This requires 8x more computer memory (RAM). By switching this oﬀ for this early reﬁnement, we

will save RAM and increase speed. One risk is that aliasing artefacts fold in at the edges of the 3D

reconstruction, so be careful with this option if your box is very tight.)

Pre-read all particles into RAM?

Yes

(Again, this is only possible here because the data set is small. For your own data, you would like

write the particles to a scratch disk instead, see below.)

4.7. High-resolution 3D reﬁnement 49

RELION

Copy particles to scratch directory

Combine iterations through disc?

Use GPU acceleration?

Yes

Which GPUs to use

0:1:2:3

As the MPI nodes are divided between one master (who does nothing else than bossing the others around) and two sets

of slaves who do all the work on the two half-sets, it is most eﬃcient to use an odd number of MPI processors, and the

minimum number of MPI processes for 3D auto-reﬁne jobs is 3. Memory requirements may increase signiﬁcantly at

the ﬁnal iteration, as all frequencies until Nyquist will be taken into account, so for larger sized boxes than the ones in

this test data set you may want to run with as many threads as you have cores on your cluster nodes.

On our computer with 4 GPUs, we used 5 MPIs and 6 threads, and this calculation took approximately 7 minutes.

4.7.2 Analysing the results

Also the output ﬁles are largely the same as for the 3D classiﬁcation job. However, at every iteration the program writes

out two run_it0??_half?_model.star and two run_it0??_half?_class001.mrc ﬁles: one for each indepen-

dently reﬁned half of the data. Only upon convergence a single run_model.star and run_class001.mrc ﬁle will be

written out (without _it0?? in their names). Because in the last iteration the two independent half-reconstructions are

joined together, the resolution will typically improve signiﬁcantly in the last iteration. Because the program will use

all data out to Nyquist frequency, this iteration also requires more memory and CPU.

Note that the automated increase in angular sampling is an important aspect of the auto-reﬁne procedure. It is based on

signal-to-noise considerations that are explained in [Sch12b], to estimate the accuracy of the angular and translational

assignments. The program will not use ﬁner angular and translational sampling rates than it deems necessary (because it

would not improve the results). The estimated accuracies and employed sampling rates, together with current resolution

estimates are all stored in the _optimiser.star and _model.star ﬁles, but may also be extracted from the stdout ﬁle.

For example, try:

grep Auto Refine3D/job019/run.out

4.8 Mask creation & Postprocessing

After performing a 3D auto-reﬁnement, the map needs to be sharpened. Also, the gold-standard FSC curves inside

the auto-reﬁne procedures only use unmasked maps (unless you’ve used the option Use solvent-ﬂattened FSCs). This

means that the actual resolution is under-estimated during the actual reﬁnement, because noise in the solvent region

will lower the FSC curve. relion’s procedure for B-factor sharpening and calculating masked FSC curves [CMF+13] is

called post-processing. First however, we’ll need to make a mask to deﬁne where the protein ends and the solvent

region starts. This is done using the Mask Creation job-type.

50 Chapter 4. Single particle tutorial

RELION

4.8.1 Making a mask

On the I/O tab, select the output map from the ﬁnished 3D auto-reﬁne job: Refine3D/job019/run_class001.mrc.

On the Mask tab set:

Lowpass ﬁlter map (A):

(A 15 Å low-pass ﬁlter seems to be a good choice for smooth solvent masks for many proteins.)

Pixel size (A):

-1

(This value will be taken automatically from the header of the input map.)

Initial binarisation threshold:

0.01

(This should be a threshold at which rendering of the low-pass ﬁltered map in for example chimera

shows no noisy spots outside the protein area. Move the threshold up and down to ﬁnd a suitable

spot. Remember that one can use the command-line program called relion_image_handler with

the options --lowpass 15 --angpix 1.244 to get a low-pass ﬁltered version of an input map.

Often good values for the initial threshold are around 0.002-0.02.)

Extend binary map this many pixels:

(The threshold above is used to generate a black-and-white mask. The white volume in this map will

be grown this many pixels in all directions. Use this to make your initial binary mask less tight.)

Add a soft-edge of this many pixels:

(This will put a cosine-shaped soft edge on your masks. This is important, as the correction procedure

that measures the eﬀect of the mask on the FSC curve may be quite sensitive to too sharp masks.

As the mask generation is relatively quick, we often play with the mask parameters to get the best

resolution estimate.)

Ignore the Helix tab.

To speed things up, on the Running tab, make sure to use multiple threads, e.g.:

Number of threads

You can look at slices through the resulting mask using the Display: button, or you can load the mask into UCSF

chimera. The latter may be a good idea, together with the map from the auto-reﬁne procedure, to conﬁrm that the

masks encapsulates the entire structure, but does not leave a lot of solvent inside the mask. You can continue the same

job with new settings for the mask generation multiple times, until you have found a mask you like.

4.8. Mask creation & Postprocessing 51

RELION

4.8.2 Post-processing

Now select the Post-processing job-type, and on the I/O tab, set:

One of the 2 unﬁltered half-maps:

Reﬁne3D/job019/run_half1_class001_unﬁl.mrc

Solvent mask:

MaskCreate/job020/mask.mrc

Calibrated pixel size (A):

1.244

(Sometimes you ﬁnd out when you start building a model that what you thought was the

correct pixel size, in fact was oﬀ by several percent.

Inside relion, everything up until this point was still consistent (at least up to ~2A resolution), so

you do not need to re-reﬁne your map and/or re-classify your data. All you need to do is provide

the correct pixel size here for your correct map and ﬁnal resolution estimation.)

On the Sharpen tab, set:

Estimate B-factor automatically:

Yes

(This procedure is based on the classic Rosenthal and Henderson paper [RH03], and will need the

ﬁnal resolution to extend signiﬁcantly beyond 10 Å. If your map does not reach that resolution, you

may want to use your own ad-hoc B-factor instead.)

Lowest resolution for auto-B ﬁt (A):

(This is usually not changed.)

Use your own B-factor?

Skip FSC-weighting?

(This option is sometimes useful to analyse regions of the map in which the resolution extends beyond

the overall resolution of the map. This is not the case now.)

MTF of the detector (STAR ﬁle)

mtf_k2_200kV.star

Original detector pixel size

0.885

(This is the original pixel size (in Angstroms) in the raw (non-super-resolution!) micrographs.)

Run the job (no need for a cluster, as this job will run very quickly). Using the Display button, you can display slizes

through the postprocessed map and a PDF with the FSC curves and the Guinier plots for this structure. You can also

open the PostProcess/job021/postprocess.mrc map in chimera, where you will see that it is much easier to see

where all the alpha-helices are then in the converged map of the 3D auto-reﬁne procedure. The resolution estimate

is based on the phase-randomization procedure as published previously [CMF+13]. Make sure that the FSC of the

phase-randomized maps (the red curve) is more-or-less zero at the estimated resolution of the postprocessed map. If

it is not, then your mask is too sharp or has too many details. In that case use a stronger low-pass ﬁlter and/or a wider

and more softer mask in the Mask creation step above, and repeat the postprocessing.

52 Chapter 4. Single particle tutorial

RELION

4.9 CTF and aberration reﬁnement

Next, we’ll use the CTF reﬁnement job-type to estimate the asymmetrical and symmetrical aberrations in the dataset;

whether there is any anisotropic magniﬁcation; and we’ll re-estimate per-particle defocus values for the entire data set.

Running this job-type can lead to further improvements in resolution at a relatively minor computational cost, but it

all depends on how ﬂat your ice was (for per-particle defocus estimates), and how well you had aligned your scope

(for the aberrations). It runs from a previous 3D auto-reﬁne job as well as a corresponding Post-processing job. Let’s

start with the higher-order aberrations, to see whether this data suﬀered from beamtilt or trefoil (which are asymmetric

aberrations), or from tetrafoil or an error in spherical aberration (which are symmetric aberrations).

4.9.1 Higher-order aberrations

On the I/O tab of CTF reﬁnement job-type on the GUI the set:

Particles (from Reﬁne3D)

Reﬁne3D/job019/run_data.star

Postprocess STAR ﬁle:

PostProcess/job021/postprocess.star

On the Fit tab set:

Estimate (anisotropic magniﬁcation

(We will do this later, see below.)

Perform CTF parameter ﬁtting?

(We will do this later, see below.)

Estimate beamtilt?

Yes

(Despite the observation that most microscopists perform coma-free alignment schemes prior to data

acquisition, there are still many data sets with a signiﬁcant amount of beamtilt. That’s why in this

example we’re ﬁrst looking for beamtilt, and do the anisotropic magniﬁcation and the per-particle

defocus later. In general, one would try to ﬁrst estimate the source of the largest errors.)

Also estimate trefoil?

Yes

(This will allow more higher-order Zernike polynomials in the ﬁtting of the asymmetric aberrations.)

estimate 4th order aberrations?

Yes

(This is done mostly for illustrative purposes here. One would not expect a big improvement at the

current resolution of 3 Å.)

Minimum resolution for ﬁts (A):

(Just leave the default.)

This program is only implemented on the CPU. Using 1 MPI and 12 threads, on our computer, this job ﬁnished in

approximately one minute.

4.9. CTF and aberration reﬁnement 53

RELION

You can analyse the accumulated averages for the asymmetrical and symmetrical aberrations, as well as their models,

by selecting the logfile.pdf ﬁle from the Display: button on the GUI. You’ll see that this data actually suﬀered from

some beamtilt: one side of the asymmetrical aberration images is blue, whereas the other side is red. You can ﬁnd the

values (approximately -0.2 mrad of beamtilt in the Y-direction) in the optics table of the output STAR ﬁle:

less CtfRefine/job022/particles_ctf_refine.star

There was also a small error in the spherical aberration, as the symmetrical aberration image shows a signiﬁcant,

circularly symmetric diﬀerence (the image is blue at higher spatial frequencies, i.e. away from the center of the image).

Importantly, for both the asymmetric and the symmetric aberaations, the model seems to capture the aberrations well.

If the data had suﬀered from trefoil, then the asymmetric aberration plot would have shown 3-fold symmetric blue/red

deviations. If the data had suﬀered from tetrafoil, then the symmetric aberration plot would have shown 4-fold sym-

metric blue/red deviations. Examples of those are shown in the supplement of our 2019 publication on Tau ﬁlaments

from the brain of individuals with chronic traumatic encephalopathy (CTE).

4.9.2 Anisotropic magniﬁcation

Next, let’s see whether these data suﬀer from anisotropic magniﬁcation. On the I/O tab of CTF reﬁnement job-type

on the GUI, use the output from the previous CTF reﬁnement job as input to this one:

Particles (from Reﬁne3D)

CtfReﬁne/job022/particles_ctf_reﬁne.star

Postprocess STAR ﬁle:

PostProcess/job021/postprocess.star

And this time, on the Fit tab set:

Estimate (anisotropic magniﬁcation

Yes

(This will deactivate most of the other options, as simultaneous magniﬁcation and aberration reﬁne-

ment is unstable.)

Minimum resolution for ﬁts (A):

(Just leave the default.)

Using 1 MPI and 12 threads, on our computer, this job ﬁnished in approximately one minute.

Again, the relevant images to analyse are in the logfile.pdf. There seem to be some blue-red trends, but the actual

anisotropy is very small, as assessed from the _rlnMagMat?? elements of the (2x2) transformation matrix in the optics

table of the output STAR ﬁle, which are close to the 0 and 1 values of the identity matrix:

less CtfRefine/job023/particles_ctf_refine.star

54 Chapter 4. Single particle tutorial

RELION

4.9.3 Per-particle defocus values

Lastly, let’s re-estimate the defocus values for each particle. Again, use the output from the previous job as input for

this one (although we could have just as well kept using the output from the aberration correction, as the magniﬁcation

anisotropy was very small):

Particles (from Reﬁne3D)

CtfReﬁne/job023/particles_ctf_reﬁne.star

Postprocess STAR ﬁle:

PostProcess/job021/postprocess.star

And this time, on the Fit tab set:

Estimate (anisotropic magniﬁcation

Perform CTF parameter ﬁtting?

Yes

Fit defocus?

Per-particle

(Provided the resolution of the reference extends well beyond 4 Å, per-particle defocus estimation

seems to be relatively stable. It will account for non-horizontal ice layers, and particles at the top or

bottom of the ice layer.)

Fit astigmatism?

Per-micrograph

(Provided the resolution of the reference extends well beyond 4 Å, and there are enough particles

on each micrograph, estimating astigmatism on a per-micrograph basis seems to be relatively stable.

Doing this on a pre-particle basis would require particles with very strong signal.)

Fit B-factor?

Fit phase-shift?

(This is useful for phase-plate data.)

Estimate beamtilt?

estimate 4th order aberrations?

Minimum resolution for ﬁts (A):

(Just leave the default.)

Using 1 MPI and 12 threads, on our computer, this job ﬁnished in four minutes on our computer.

Per-particle defocus values are plotted by colour for each micrograph in the logfile.pdf. Can you spot micrographs

with a tilted ice layer?

It is probably a good idea to re-run 3D auto-reﬁne and Post-processing at this stage, so we can conﬁrm that the new

particle STAR ﬁle actually gives better results.

4.9. CTF and aberration reﬁnement 55

RELION

For the 3D auto-reﬁne , we left all options as before, except on the I/O tab:

Input images STAR ﬁle

CtfReﬁne/job024/particles_ctf_reﬁne.star

Reference map

Reﬁne3D/job019/run_class001.mrc

and on the Reference tab:

Ref. map is on absolute greyscale?

Yes

After another Post-processing job, the resolution improved to 3.0 Å.

4.10 Bayesian polishing

relion implements a Bayesian approach to per-particle, reference-based beam-induced motion correction. This ap-

proachs aims to optimise a regularised likelihood, which allows us to associate with each hypothetical set of particle

trajectories a prior likelihood that favors spatially coherent and temporally smooth motion without imposing any hard

constraints. The smoothness prior term requires three parameters that describe the statistics of the observed motion. To

estimate the prior that yields the best motion tracks for this particular data set, we can ﬁrst run the program in ‘training

mode’. Once the estimates have been obtained, one can then run the program again to ﬁt tracks for the motion of all

particles in the data set and to produce adequately weighted averages of the aligned movie frames.

4.10.1 Running in training mode

Using 24 threads in parallel, this job took almost 2 hours on our computer.. . If you do not want to wait for this, you

can just proceed to the next section and use the sigma-values from our precalculated results, which are already given

in that section. For many data sets the default parameters on the GUI (𝜎

vel

= 0.2; 𝜎

div

= 5000; 𝜎

acc

= 2) will also

perform well, so people often skip training for Bayesian polishing

If you do want to run the training job yourself, on the I/O tab of the Bayesian polishing job-type set:

Micrographs (from MotionCorr):

MotionCorr/job002/corrected_micrographs.star

Particles (from Reﬁne3D or CtfReﬁne):

Reﬁne3D/job025/run_data.star

(These particles will be polished)

Postprocess STAR ﬁle

PostProcess/job026/postprocess.star

(the mask and FSC curve from this job will be used in the polishing procedure.)

First movie frame:

Last movie frame:

-1

(Some people throw away the ﬁrst or last frames from their movies. Note that this is not recom-

mended when performing Bayesian polishing in relion. The B-factor weighting of the movie frames

56 Chapter 4. Single particle tutorial

RELION

will automatically optimise the signal-to-noise ratio in the shiny particles, so it is best to include all

movie frames.)

Extraction size (pix in unbinned movie)

-1

(This option can be used to extract polished particles in a box with a diﬀerent size than the ones from

the input reﬁnement above. It is irrelevant for a training job.)

Re-scaled size (pixels)

-1

(This option can be used to re-scale the polished particles to a diﬀerent size than the ones from the

input reﬁnement above. It is irrelevant for a training job.)

Write output in ﬂoat16?

Yes

On the Train tab set:

Train optimal parameters?

Yes

Fraction of Fourier pixels for testing:

0.5

(Just leave the default here)

Use this many particles:

3000

(That’s almost all we have anyway. Note that the more particles, the more RAM this program will

take. If you run out of memory, try training with fewer particles. Using much fewer than 4000

particles is not recommended.)

On the Polish tab make sure you set:

Perform particle polishing?

Note that the training step of this program has not been MPI-parallelised. Therefore, make sure you use only a single

MPI process. We ran the program with 24 threads to speed it up. Still, the calculation took more than 1 hour.

4.10.2 Running in polishing mode

Once the training step is ﬁnished, the program will write out a text ﬁle called Polish/job027/opt_params.txt. To

use these parameters to polish your particles, click on the job-type menu on the left to select a new Bayesian polishing

job. Keep the parameters on the I/O tab the same as before, and on the Train tab, make sure you switch the training

oﬀ. Then, on the

Polish tab set:

Perform particle polishing?

Yes

Optimised parameter ﬁle:

Polish/job027/opt_params_all_groups.txt

OR use your own parameters?

4.10. Bayesian polishing 57

RELION

Minimum resolution for B-factor ﬁt (A):

Maximum resolution for B-factor ﬁt (A):

-1

(just leave the defaults for these last two parameters)

Alternatively, if you decided to skip the training set, then you can ﬁll in the Polish tab with the sigma-parameters that

we obtained in our run:

Perform particle polishing?

Yes

Optimised parameter ﬁle:

(leave this empty to use the optimal parameters we got as per below.)

OR use your own parameters?

Yes

Sigma for velocity (A/dose)

0.45

Sigma for divergence (A)

1290

Sigma for acceleration (A/dose)

2.66

Minimum resolution for B-factor ﬁt (A):

Maximum resolution for B-factor ﬁt (A):

-1

(just leave the defaults for these last two parameters)

This part of the program is MPI-parallelised. Using 3 MPI processes, with 8 threads each, our run ﬁnished in less than

six minutes. We could have used multiple MPI processes to speed this up, although disk access may become limited.

The Bayesian polishing job outputs a STAR ﬁle with the polished particles called shiny.star and a PDF logﬁle. The

latter contains plots of the scale and B-factors used for the radiation-damage weighting, plus plots of the reﬁned particle

tracks for all included particles on all micrographs. Looking at the plots for this data set, it appeared that the stage was

a bit drifty: almost all particles move from the top right to the bottom left during the movies. The B-factors and scale

factors look like expected: a slight dip for the ﬁrst frames, which comes from a small amount of initial fast beam-induced

movement, and then going down linearly as a result of radiation damage.

4.10.3 Re-running reﬁnement and post-processing

After polishing, the signal-to-noise ratio in the particles has improved, and one should submit a new 3D auto-reﬁne

job and a corrsponding Post-processing job. We chose to run the 3D auto-reﬁne job with the shiny particles using the

following option on the I/O tab:

Input images STAR ﬁle:

Polish/job028/shiny.star

58 Chapter 4. Single particle tutorial

RELION

Reference Map:

Reﬁne3D/job025/run_half1_class001_unﬁl.mrc

(By giving one of the half-maps as reference, both half-maps will actually be read in for gold-standard

reﬁnement. This prevents overﬁtting by reading in a joint reconstruction, and therefore one can start

the reﬁnement from higher initial resolutions.)

Reference mask (optional):

MaskCreate/job020/mask.mrc

(this is the mask we made for the ﬁrst Post-processing job. Using this option, the solvent will be set

to zero for all pixels outside the mask. This reduces noise in the reference, and thus lead to better

orientation assignments and thus reconstructions.)

Because we are providing a half-map, overﬁtting shouldn’t be a problem and on the Reference tab we can set:

Initial low-pass ﬁlter (A)

On the Optimisation tab, we set:

Use solvent-ﬂattened FSCs?

Yes

(Using this option, the reﬁnement will use a solvent-correction on the gold-standard FSC curve at

every iteration, very much like the one used in the Post-processing job-type. This option is partic-

ularly useful when the protein occupies a relatively small volume inside the particle box, e.g. with

very elongated molecules, or when one focusses reﬁnement on a small part using a mask. The default

way of calculating FSCs in the 3D auto-reﬁnement is without masking the (gold-standard) half-maps,

which systematically under-estimates the resolution during reﬁnement. This is remediated by calcu-

lating phase-randomised solvent-corrected FSC curves at every iteration, and this generally leads to

a noticeable improvement in resolution.)

Also, on the Auto-sampling tab, we now set:

Use ﬁner angular sampling faster?

(This was OK in the earlier stages to speed up calculations, but at this stage we want to get the highest

resolution we can, so we will opt for the slower, but safer options.)

And on the Compute tab, we now set:

Skip padding?

(For the same reason as above: skipping padding speeds up, but some aliasing artefacts can fold back

in. As we do have the required computer memory for the padded reconstruction, let’s go safe for this

last reﬁnement.)

As you can see in the pre-calculated results, after a ﬁnal Post-processing job, we obtained an overall resolution of 2.9

Å. Not bad for 3GB of data, right?

4.10. Bayesian polishing 59

RELION

4.10.4 When and how to run CTF reﬁnement and Bayesian polishing

Both Bayesian polishing and CTF reﬁnement , which comprises per-particle defocus, magniﬁcation and higher-order

aberration estimation, may improve the resolution of the reconstruction. This raises a question of which one to apply

ﬁrst. In this example, we ﬁrst reﬁned the aberrations, the magniﬁcation, and then the per-particle defocus values. We

then followed up with polishing, but we could have also performed the polishing before any of the CTF reﬁnements.

Both approaches beneﬁt from higher resolution models, so an iterative procedure may be beneﬁcial. For example,

one could repeat the CTF reﬁnement after the Bayesian polishing. In general, it is probably best to tackle the biggest

problem ﬁrst, and some trial and error may be necessary.

Moreover, we have seen for some cases that the training prodcedure of Bayesian polishing yields inconsistent results:

i.e. multiple runs yield very diﬀerent sigma values. However, we have also observed that often the actual sigma values

used for the polishing do not matter much for the resolution of the map after re-reﬁning the shiny particles. Therefore,

and also because the training is computationally expensive, it may be just as well to run the polishing directly with the

default parameters (𝜎

vel

= 0.2; 𝜎

div

= 5000; 𝜎

acc

= 2), i.e. without training for your speciﬁc data set.

4.11 Local-resolution estimation

The estimated resolution from the post-processing program is a global estimate. However, a single number cannot

describe the variations in resolution that are often observed in reconstructions of macromolecular complexes. Alp Ku-

cukelbir and Hemant Tagare wrote a nifty program to estimate the variation in resolution throughout the map [KST14].

relion implements a wrapper to this program through the Local resolution job-type. Alternatively, one can choose to

run a post-processing-like procedure with a soft spherical mask that is moved around the entire map. In the example

below, we use the latter.

4.11.1 Running the job

On the I/O tab set:

One of the two unﬁltered half-maps:

Reﬁne3D/job029/run_half1_class001_unﬁl.mrc

User-provided solvent mask:

MaskCreate/job020/mask.mrc

Calibrated pixel size:

1.244

(Sometimes you ﬁnd out when you start building a model that what you thought was the correct pixel

size, in fact was oﬀ by several percent. Inside relion, everything up until this point was still consistent

(provided you aren’t at resolutions beyond 2 Angstrom). so you do not need to re-reﬁne your map

and/or re-classify your data. All you need to do is provide the correct pixel size here for your correct

map and ﬁnal resolution estimation.)

On the ResMap tab set:

Use ResMap?

On the Relion tab set:

Use Relion?

Yes

60 Chapter 4. Single particle tutorial

RELION

User-provided B-factor:

-30

(This value will be used to also calculate a locally-ﬁltered and sharpened map. Probably you want to

use a value close to the one determined automatically during the Post-processing job.)

MTF of the detector (STAR ﬁle):

mtf_k2_200kV.star

(The same as for the Post-processing job.)

4.11.2 Analysing the results

Running with 8 MPI processes, this job took approximately 7 minutes. The output is a ﬁle called LocalRes/job031/

relion_locres.mrc that may be used in UCSF chimera to color the Postprocess/job030/postprocess.mrc map ac-

cording to local resolution. This is done using [Tools]-[Volume data]-[Surface color], and then select by

volume data value and browse to the resmap ﬁle.

Unique to the relion option is the additional output of a locally-ﬁltered (and sharpened map), which may be use-

ful to describe the overall variations in map quality in a single map. This map is saved as LocalRes/job031/

relion_locres_filtered.mrc and can be visualised directly in UCSF chimera (and optionally also coloured by

local resolution as before).

4.12 Checking the handedness

Careful inspection of the map may indicate that the handedness is incorrect, e.g. because the 𝛼-helices turn the wrong

way around. Remember that it is impossible to determine absolute handedness from a data set without tilting the

microscopy stage. The SGD algorithm in the 3D initial model jobtype therefore has a 50 % chance of being in the

opposite hand. In our precalculated results, this was not the case. One may ﬂip the handedness of the postprocessed

map from the command line as follows:

relion_image_handler --i PostProcess/job030/postprocess.mrc \

--o PostProcess/job030/postprocess_invert.mrc --invert_hand

The same command could also be run on any of the other maps. If one realises earlier on in the image processing

procedure that the hand is wrong, one could of course also switch to the other hand earlier on. For relion itself it

doesn’t matter, as both hands cannot be distinguished, but it may be more convenient to ﬂip the hand as soon as you

notice it.

Once in the correct hand, you might want to load the map into UCSF chimera and superimpose it with an atomic model

for 𝛽-galactosidase. You could try fetching one straight from the PDB using PDB-ID 5a1a.

4.13 ModelAngelo: atomic modelling

As of release 5.0, relion comes with a machine-learning approach for automated atomic model building called

ModelAngelo [JKZ+24]. ModelAngelo is a graph neural network that combines information from the cryo-EM map

with sequence information of the proteins that are present in the map to build an atomic model.

Because we know that the sample in this data set is beta-galactosidase from E.coli, we can provide the protein sequence

as a FASTA ﬁle. You can download it from uniprot through this link. Save it as betagal.fasta in the project

directory.

4.12. Checking the handedness 61

RELION

4.13.1 Running the job

Using the ModelAngelo building job type, we set on the I/O tab:

B-factor sharpened map:

PostProcess/job030/postprocess_masked.mrc

FASTA sequence for proteins:

betagal.fasta

FASTA sequence for DNA:

(There is no DNA in this complex.)

FASTA sequence for RNA:

(There is no RNA in this complex.)

Which GPUs to use:

0,1,2,3

On the Hmmer tab make sure you this is set:

Perform HMMer search?

4.13.2 Analysing the results

Running with 4 GPUs (relatively old 1080s), this job took approximately 18 minutes. The output is a coordinate

ﬁle called ModelAngelo/job032/job032.cif that may be used in UCSF chimera together wit the Postprocess/

job030/postprocess.mrc map. Note that although ModelAngelo did a very good job on this relatively easy test

case, you should always check its results carefully in a program like coot [ELSC10] . You will also need to perform a

stereochemical reﬁnement of the coordinates. For this, we like servalcat [YPBM21].

4.13.3 Discovering unknown proteins

One does not always know the identity of all proteins that are present in a cryo-EM reconstruction. One can also run

ModelAngelo without providing a FASTA ﬁle with the known protein sequence. It will still try to model protein chains

in the map, but will perform less well without the knowledge of the sequence. But, one can then perform a Hidden

Markov Model search using the full probabilities for all 20 amino acids for every residue, for example against a FASTA

ﬁle that contains the entire genome for that organism. In that way, one can identify unknown proteins in the map.

Download the E.coli genome (K-12 substrain) from this link to NCBI. Use the Send to dropdown menu on the top

right to select Coding Sequences, select the FASTA protein format, and click Create File. Save the resulting

ﬁle as Ecoli.fasta in your project directory.

Using the same ModelAngelo building job type, this time we set on the I/O tab:

B-factor sharpened map:

PostProcess/job030/postprocess_masked.mrc

FASTA sequence for proteins:

62 Chapter 4. Single particle tutorial

RELION

FASTA sequence for DNA:

(There is no DNA in this complex.)

FASTA sequence for RNA:

(There is no RNA in this complex.)

Which GPUs to use:

0,1,2,3

On the Hmmer tab, we now set:

Perform HMMer search?

Yes

Library with sequences for HMMer search:

Ecoli.fasta

Alphabet for HMMer search:

amino

And we leave the rest of the HMMSearch parameters at their defaults.

4.13.4 Which one is my protein?

Running with 4 GPUs (again our relatively old 1080s), this job took approximately 12 minutes: running without the

sequence is faster than running with the sequence. However, without knowledge of the sequence, ModelAngelo has trou-

ble building a single chain, as you can see by visualising ModelAngelo/job033/job033.cif in UCSF chimera. The

subsequent HMM search easily identiﬁes beta-galactosidase, as you can see in ModelAngelo/job033/best_hits.

csv. Cool, huh?

4.14 DynaMight: exploring motions

As of release 5.0, relion comes with a machine-learning approach for the analysis of molecular motions and ﬂexibility

called DynaMight [SKB+23]. DynaMight will ﬁt molecular motions for each experimental particle image as a 3D

deformation ﬁeld, which is learnt using a variational auto-encoder. It also implements functionality to calculate a

pseudo-inverse 3D deformation ﬁeld that can then be used in a deformed weighted backprojection algorithm to obtain

an improved 3D reconstruction of the consensus structure.

Beta-galactosidase is not a ﬂoppy protein, therefore running DynaMight on this data set will not be exciting! The below

just walks you through the steps, so you learn how to run DynaMight jobs for your own data set.

4.14.1 Estimating the motions

We will use the DynaMight ﬂexibility job type. On the I/O tab set:

Input images STAR ﬁle:

Reﬁne3D/job029/run_data.star

Consensus map:

Reﬁne3D/job029/run_class001.mrc

4.14. DynaMight: exploring motions 63

RELION

Number of Gaussian:

8000

(Describing ﬂexibility at higher resolutions will require more Gaussians. As a rule of thumb, use

1-2 Gaussians per amino acid or nucleotide in your complex. The beta-galactosidase tetramer has

just over 4x1042 amino acids, so we will use 8000 Gaussians here. The more Gaussians you use, the

longer this will take.. . so you may want to run smaller tests too. Note that running with more than

30,000 Gaussians will be diﬃcult in terms of memory unless you have a very large GPU.)

Initial map threshold (optional):

0.015

(This is a threshold at which ﬂexible regions still show in 3D visualisation tools like UCSF chimera,

but you should avoid the appearance of noisy density islands in the solvent region at this threshold.)

Regularization factor:

Which (single) GPU to use:

(For now, DynaMight only uses one GPU at a time.)

Preload images in RAM?

Yes

(As we only have a few thousand, we can read all particles in RAM and save time.)

For nowm ignore the Tasks tab; on the Running tab set:

Number of threads:

4.14.2 Visualising the deformations

On our, relatively old, 1080 GPU, estimating the deformations took less than onehour and a half. To visualise the

ﬂexibility in the beta-galactosidase complex, we use a continuation of the DynaMight ﬂexibility job, where on the

Tasks tab we set:

Checkpoint ﬁle:

DynaMight/job034/forward_deformations/checkpoints/checkpoint_ﬁnal.pth

Do visualization?

Yes

Half-set to visualize:

(You can visualize either halfset 1 or halfset 2. There is another option where you visualize halfset

0. In that case, you can visualize error estimates on the 3D deformations for a validation set of 10%

of your particles)

Do inverse-deformation estimation?

Do deformed backprojection?

64 Chapter 4. Single particle tutorial

RELION

Running this by pressing the Continue! button will launch a viewer that was written in napari. It is quite a heavy-

handed tool, so you will need to run this viewer locally, while also needing a GPU (speciﬁed on the I/O tab, to perform

some of the calculations.. .

By clicking on the colourful representation of latent space on the right, the 3D viewer will display the corresponding

deformed state of the consensus structure. On the bottom right, by selecting trajectory under the action menu, one

can draw a line through latent space using the middle-mouse button. This will create a movie of the states in the 3D

viewer, which you can play with the triangle button underneath the 3D viewer. As said before, this looks underwhelming

for beta-galactosidase, which basically has very little ﬂexibility. You can use the right-mouse to zoom in on the tips,

where perhaps you can see some imnute movements?

You can save maps or movies using the button on the bottom right. The individual states of such a movie would be

saved as maps in MRC format here: DynaMight/job138/movies/movie01_half1/. You can then load all these

maps into a 3D visualisation program, like UCSF |chimera|, and recreate the movie there.

4.14.3 Estimating inverse deformations

The 3D deformations are deﬁned from the consensus positions of the Gaussians to their new position. For use in de-

formed weighted backprojection, we ﬁrst need estimates for the inverse deformations at every point in the 3D Cartesian

space. For this, DynaMight also uses a neural network. We can train it by using another continuation of the same

DynaMight ﬂexibility job, where on the Tasks tab we now set:

Checkpoint ﬁle:

DynaMight/job034/forward_deformations/checkpoints/checkpoint_ﬁnal.pth

Do visualization?

Do inverse-deformation estimation?

Yes

Number of epochs to perform

200

Store deformations in RAM?

(If set to Yes, dynamight will store all deformations in the GPU memory, which will speed up the

calculations, but you need to have enough GPU memory to do this. We could not do this on our small

1080s.)

Do deformed backprojection?

Running this by pressing the Continue! button took a bit over an hour on one of our 1080s.

4.14. DynaMight: exploring motions 65

RELION

4.14.4 Deformed backprojection

Finally, the resulting inverse deformations can then be used for a deformed backprojection that attempts to reconstruct an

improved version of the consensus map, where densities for each particle are warped into non-straight lines in the back-

projection in an attempt to “un-do” their deformations. Again, we use a continuation of the same DynaMight ﬂexibility

job, where on the Tasks tab we set:

Checkpoint ﬁle:

DynaMight/job034/forward_deformations/checkpoints/checkpoint_ﬁnal.pth

Do visualization?

Do inverse-deformation estimation?

Do deformed backprojection?

Yes

Backprojection batchsize:

(Batches are processes in parallel on the GPU. Again, because our 1080s don’t have a lot of memory,

we could only run with a batchsize of 2. If you have a larger GPU, you could try and use more, e.g.

5 or 10.)

This took approximately 45 minutes on our system. The resulting half-maps can be used for a standard Post-processing

job. In this case, nothing really interesting happens as there are hardly any deformations. The resolution actually

decreases by about 0.5 Angstrom, probably because the deformed backprojection algorithm is not as good as good

as the Fourier inversion algorithm for structurally homogeneous data sets. More interesting examples of deformed

backprojection reconstructions are in the DynaMight paper [SKB+23].

4.15 Wrapping up

4.15.1 Cleaning up your directories

In order to save disk space, relion has an option to clean up job directories. There are two modes of cleaning: ‘gentle’

cleaning will only delete intermediate ﬁles from the job directory being cleaned; ‘harsh’ cleaning also deletes ﬁles that

may be necessary to launch a new job that needs input from the job being cleaned. For example, harsh cleaning will

remove averaged micrographs from a MotionCorr job, or extracted particles stacks from a Particle extraction job, while

gentle cleaning will remove all ﬁles from itermediate iterations of 2D classiﬁcation , 3D classiﬁcation or 3D auto-reﬁne

jobs. You can clean individual jobs from the Job actions button; or you can clean all jobs from the ‘Jobs’ pull-down

menu at the top of the GUI. We used the ‘Gently clean all jobs’ option from that menu before making a tarball of the

project directory that we distributed as our precalculated results. You might want to gently clean your project directory

before you put your data in long-term storage.

66 Chapter 4. Single particle tutorial

RELION

4.15.2 Asking questions and citing us

That’s it! Hopefully you enjoyed this tutorial and found it useful. If you have any questions about relion, please ﬁrst

check the FAQ on the relion Wiki and the CCPEM mailing list. If that doesn’t help, use the CCPEM list for asking

your question.

If relion turns out to be useful in your research, please do cite our papers and tell your colleagues about it.

4.15.3 Further reading

The theory behind the reﬁnement procedure in relion is described in detail in:

• S.H.W. Scheres (2012) “relion: Implementation of a Bayesian approach to cryo-EM structure determination” J.

Struc. Biol., 180, 519-530.

• S.H.W. Scheres (2012) “A Bayesian view on cryo-EM structure determination” J. Mol. Biol., 415, 406-418.

A comprehensive overview of how to use relion for all types of classiﬁcations is described in:

• S.H.W. Scheres (2016) “Processing of structurally heterogeneous cryo-EM data in relion” Meth. Enzym., 579,

125-157.

This tutorial does not cover multi-body reﬁnement, which is useful to describe continuous motions in relatively large

complexes. You can ﬁnd a manuscript with speciﬁc instructions on how to perform multi-body reﬁnement on the

RELION Wiki

4.15. Wrapping up 67

RELION

68 Chapter 4. Single particle tutorial

CHAPTER

FIVE

SUBTOMOGRAM TUTORIAL

5.1 Introduction

This tutorial will walk you through the tomography and sub-tomogram averaging pipeline of relion-5.0 . This semi-

automated pipeline is designed to carry out all of the steps in a typical subtomogram averaging project from your raw

tilt series movies to your ﬁnal subtomogram average. We will use a test data set composed of 5 tomograms of immature

HIV-1 dMACANC VLPs, which is available at EMPIAR-10164.

5.1.1 Requirements

1. Your raw tilt series data as individual movies, or as individual motion-corrected images, but not as combined tilt

series stacks.

2. The mdoc ﬁles containing the metadata for each tilt series.

5.1.2 Getting Organised

We recommend creating a single directory per project. We’ll call this the project directory.

To start processing, make your own project directory and download the raw data (frames and mdoc ﬁles) corresponding

to the ﬁve tomograms (TS_01, TS_03, TS_43, TS_45, TS_54) from the EMPIAR link above. A RELION project

directory containing precalculated results of all the processing steps of this dataset can be downloaded from:

We will start by launching the relion GUI. It is important to always launch the RELION graphical user-interface

(GUI) from the project directory. To prevent errors with this, the GUI will ask for conﬁrmation the ﬁrst time you

launch it in a new directory. Make sure you are inside the project directory, and launch the GUI by typing:

relion --tomo&

and answer Yes when prompted to set up a new relion project here.

RELION

5.2 Import

We will now import the raw data into RELION. In the GUI, select Import from the jobt-type browser on the top left

and ﬁll in the following parameters on the General tab:

Tilt image ﬁles

frames/*.mrc

Movies already motion corrected?

(These are 8-frame movies that have not yet been motion-corrected.)

mdoc ﬁles

mdoc/*.mdoc

Optics group name

optics1

(All imported tomograms will belong to the same optics group with this name. When left empty,

each tomogram will be its own optics group, with the tomogram name as the optics group name.)

Preﬁx

Pixel size (Angstrom):

0.675

Voltage (kV):

300

Spherical aberration (mm):

2.7

Amplitude contrast:

0.1

On the Tilt series tab:

Dose rate per tile-image

Is dose rate per movie frame?

Tilt axis angle (deg)

(This is the nominal value for the tilt-axis orientation wrt to the Y-axis (positive is CCW from Y))

MTF ﬁle

Invert defocus handedness?

Yes

(Specify Yes to ﬂip the handedness of the defocus geometry; the default, Yes, leads to a value of -1

in the STAR ﬁle, which is the correct one for the tutorial dataset.)

We skip the Coordinates tab for now (we will use it in the Import coordinates section to import particle coordinates

obtained from other programs), and on the Running tab set:

70 Chapter 5. Subtomogram tutorial

RELION

Submit to queue?

You may provide a meaningful alias (for example: tilt_series) for this job in the white ﬁeld named Current job:

Give_alias_here. Clicking the Run! button will launch the job.

A directory called ImportTomo/job001/ will be created, together with a symbolic link to this directory that is called

ImportTomo/tilt_series. Inside the newly created directory a tilt_series.star ﬁle is created. It contains a

table with an entry for each tilt series. For each tilt series, a separate starﬁle contains relevant metadata that has been

extracted from the input images and mdoc ﬁles. Have a look at these by typing:

less Import/job001/tilt_series.star

less Import/job001/tilt_series/TS_01.star

5.3 Motion correction

As the raw data in EMPIAR-10164 are movies, we need to carry out motion correction. If your own raw data are

already motion-corrected, you can skip this section provided you have selected ‘Yes’ under ‘Movies already motion

corrected?’ in the Import

To start, click the Motion correction job type in the job-type browser and ﬁll in the following parameters on the I/O

tab:

Input tilt series

ImportTomo/job001/tilt_series.star

(RELION will use this star ﬁle to locate the movies for each tilt series.)

EER fractionation

(This dataset was not saved in the EER format. Therefore, we can ignore the EER fractionation ﬁeld;

but don’t leave it empty. If your own data are collected in the EER format, the fractionation should

be adjusted as described here.)

Write output in ﬂoat16?

Yes

(We recommend writing images in ﬂoat16 format to save disk space. If you select ‘Yes’ to writing

in ﬂoat16, you must also select ‘Yes’ to ‘Save sum of power spectra?’, because CtfFind4 will not be

able to read ﬂoat16 images.)

Save images for denoising?

Yes

(The tutorial dataset is very high quality with good contrast and, consequently, the tomograms pro-

duced probably do not need to be denoised. However, to demonstrate RELION’s denoising protocol

later, we will select ‘Yes’ in the ‘Save images for denoising?’ ﬁeld. If, for your own data, you have

limited hard drive space and you are conﬁdent your tomograms will be high contrast and easily inter-

preted, you can select ‘No’. However, for most datasets we recommend saving images for denoising,

just in case it may be needed at a later stage. )

Save sum of power spectra?

Yes

5.3. Motion correction 71

RELION

(Sums of non-dose weighted power spectra provide better signal for CTF estimation. The power

spectra can be used by CTFFIND4. This option is not available for UCSF MotionCor2. You must

use this option when writing in ﬂoat16, as CtfFind4 cannot read images in ﬂoat16 format.)

Sum power spectra every n frames

(McMullan et al. Ultramicroscopy, 2015 suggest that summing power spectra every 4.0 e/A2 gives

optimal Thon rings. One frame will be 3 e/A2.)

On the Motion tab:

Bfactor

150

(For your own data, you may need to increase this value, if the SNR is particularly low. For super

resolution movies, increasing the B factor may also help.)

Number of patches

1 1

(As there is so little dose in each movie frame, it is better not to use patch-based motion correction.)

Group frames

Binning factor

(The raw images were collected in super-resolution mode and have not yet been binned. This will

bin the images to 1.35 Å.)

Gain-reference image

“”

(The tutorial data has already been gain corrected so does not need a ‘Gain reference image’ and so

this ﬁeld should be left blank.)

Gain rotation

No rotation (0)

(This will be ignored as we are not doing gain correction.)

Gain ﬂip

No ﬂip (0)

(This will be ignored as we are not doing gain correction.)

Defect ﬁle

“”

(We don’t have a ﬁle that describes the camera defects for this data set. For your own data, the

defect ﬁle can be used to mask away broken pixels on the detector. Formats supported in our own

implementation and in UCSF motioncor2 are either a text ﬁle in UCSF motioncor2 format (each line

contains four numbers: x, y, width and height of a defect region); or a defect map (an image in MRC

or TIFF format, where 0=good and 1=bad pixels. The coordinate system is the same as the input

movie before application of binning, rotation and/or ﬂipping. Note that defect text ﬁles produced by

SerialEM are NOT supported! However, one can convert a SerialEM-style defect ﬁle into a defect

map using imod.).

Use RELION’s own implementation

Yes

72 Chapter 5. Subtomogram tutorial

RELION

If you prefer to use UCSF MotionCor2, select No, provide the path to the executable, the GPU ID(s) of the GPUs you

wish to use, and any other MotionCor2 arguments in their respective ﬁelds. Note that MotionCor2 cannot save images

in ﬂoat16 yet, nor does it write out summed power spectra of movie frames for subsequent CTF estimation.

On the Running tab:

Number of MPI procs:

(We used 32 parallel processes on our computer.)

Number of threads:

Submit to queue?

(We used a local machine, but this will depend on your setup.)

Clicking the Run! button will launch the job. Your motion corrected particles will be output into the MotionCorr/

job002/ directory. The output star ﬁle containing all necessary metadata for input into other jobs is saved as

MotionCorr/job002/corrected_tilt_series.star. You can again have a look at the star ﬁles it refers to, to

see accumulated metadata about the motion correction by typing:

less MotionCorr/job002/tilt_series/TS_01.star

5.4 CTF estimation

Next, we will estimate the CTF parameters for each motion-corrected image of the ﬁve tilt series. We will use Alexis

Rohou and Niko Grigorieﬀ’s CTFFIND-4.1 because it gives excellent results and allows reading in the movie-averaged

power spectra calculation by RELION’s own implementation of the MotionCor2 algorithm.

Select the CTF estimation job type in the job-type browser and ﬁll in the following parameters on the I/O tab:

Input tilt series

MotionCorr/job002/corrected_tilt_series.star

Estimate phase shifts?

(Estimating phase shifts option is only useful for phase plate data)

Amount of astigmatism (A)

100

(This number is suitable for most data sets on reasonably well aligned microscopes.)

On the CTFFIND-4.1 tab:

CTFFIND-4.1 executable

path/to/ctﬃnd

(You will need to have CTFFIND-4.1 installed somewhere. Copy the ﬁlename of the executable

here.)

Use power spectra from MotionCorr job?

Yes

(This estimates the CTF parameters using the power spectra that were saved during motion correc-

tion.)

5.4. CTF estimation 73

RELION

Use exhaustive search

Estimate CTF on window size(pix)

-1

FFT box size (pix)

512

Minimum resolution (A)

(This is the lowest resolution that is used for CTF estimation. For you own data, you may need to

change these values.)

Maximum resolution (A)

(This is the highest resolution that is used for CTF estimation, in the image with the least dose. For

you own data, you may need to change these values.)

Minimum defocus value (A)

5000

(Note that this value will be ignored as long as we use the ‘Nominal defocus search range’ ﬁeld

below.)

Maximum defocus value (A)

50000

(Note that this value will be ignored as long as we use the ‘Nominal defocus search range’ ﬁeld

below.)

Defocus step size (A)

500

Nominal defocus search range (A)

10000

(If a positive value is given, the defocus search range will be set to +/- this value (in A) around the

nominal defocus value from the input STAR ﬁle. The nominal defocus would have been extracted

from the mdoc ﬁles. When using this option make sure that the correct values are present in the input

star ﬁles for each tilt series. If a zero or negative value is given for this ﬁeld, then the overall min-max

defocus search ranges above will be used instead.)

Dose-dependent Thon ring fading (e/A2)

100

(If a positive value is given, then the maximum resolution for CTF estimation is lowerered by

exp(dose/this_factor) times the original maximum resolution speciﬁed above. Remember that

exp(1)~=2.7, so a value of 100 e/A^2 for this factor will yield a 2.7x higher maxres number (i.e.

2.7x lower resolution) for an accumulated dose of 100 e/A^2; Smaller values will lead to faster decay

of the maxres. If zero or a negative value is given, the maximum value speciﬁed above will be used

for all images.)

On the Running tab:

Make sure you use a suitable number of parallel MPI processes for your computational setup, possibly submit to a

queue, and click the Run! button to launch the job.

74 Chapter 5. Subtomogram tutorial

RELION

The output will be saved in the CtfFind/job003/ directory. The output star ﬁle containing all necessary metadata

for input into other jobs is saved as CtfFind/job003/tilt_series_ctf.star. You can again have a look at added

metadata for each tilt series image by looking into their star ﬁles:

less CtfFind/job003/tilt_series/TS_01.star

Check the defocus values for a few tilt series by examining their star ﬁles in the tilt_series directory of the CtfFind job.

The rlnDefocusU and rlnDefocusV columns specify the estimated defocus values.

In addition, the logfile.pdf ﬁle contains plots of useful parameters, such as defocus, astigmatism, estimated reso-

lution, etc for all micrographs, and histograms of these values over the entire data set. Analysing these plots may be

useful to spot problems in your data acquisition.

Lastly, you can also see the power spectrum of the estimated CTF of each tilt image in a tilt series using the

relion_dislay command:

relion_display --gui --i CtfFind/job003/tilt_series/TS_01.star

Compared to single-particle data, tilt series images tend to have low SNR, particularly at higher tilts. Consequently, the

defocus values may vary at higher tilts. Later in this tutorial, we will perform reference-based CTF reﬁnement, during

which we can determine better defoci, so as long as the estimated values aren’t too far oﬀ, we can hope to improve them

later on and we need not worry too much at this stage.

5.5 Exclude tilt-images

Often, tilt series contain some images that are not suitable for tomogram reconstruction or further image processing,

for example because they are partially or all black because of grid bars blocking the ﬁeld of view. We can exclude these

images from further processing using the Exclude tilt-images job.

On the I/O tab, set:

Input tilt series

CtfFind/job003/tilt_series_ctf.star

Number of cached tilt series

(This will store 5 tilt series in memory and minimise loading times between tilt series. If your com-

puter struggles with loading so many, you can change this to 1.)

Clicking the Run! button will launch the Napari. Note that Napari is notoriously slow when displaying over a

network, so make sure you are running this job from the same computer as you are sitting behind!

Before doing anything, wait for the tilt series to load. The loading of the tilt series is indicated in the bottom left

corner. Once loading is ﬁnished, we can examine the ﬁrst tilt series. In the Images panel on the far right of the

GUI, click the top image and scroll through the tilt series using the arrow key on your keyboard. For TS_01, the

ﬁrst image (TS_01_039_-60_0.mrc) is blank but the rest of the tilt series looks good. Therefore, that ﬁrst im-

age should be excluded from further calculations. Do this by unticking the box next to its name and then click-

ing Save tilt-series STAR ﬁle in the bottom right corner. Then, move onto the next tilt series (TS_03) by clicking

Session_1_TS_03 in the tilt series panel in the bottom right. Repeat the same for all tilt series. We recommend clicking

Save tilt-series STAR ﬁle after each tilt series, just in case Napari crashes. If Napari does crash, you can start a new

Exclude tilt-images job, where the input tilt series ﬁle on the I/O tab is the selected_tilt_series.star from the

ExcludeTiltImages/ directory of the crashed job.

When ﬁnished, click Save tilt-series STAR ﬁle and close Napari.

5.5. Exclude tilt-images 75

RELION

5.6 Align tilt-series

Before tomogram reconstruction, each tilt series must be aligned. To do this, RELION 5 implements a wrapper to

IMOD or AreTomo. For the tutorial dataset, we will use IMOD’s ﬁducial-based alignment as the raw data contains (10

nm) gold beads as ﬁducial markers. For your own data, you may want to use various tilt series alignment methods and

then compare the quality of the tomograms that each method produces (see next step).

Start by selecting the Align tilt-series job, and in the I/O tab set:

Input tilt series

ExcludeTiltImages/job004/selected_tilt_series.star

On the IMOD tab, set:

Use IMOD’s ﬁducial based alignment?

Yes

Fiducal diameter (nm)

Use IMOD’s patch-tracking for alignment?

(For your own data, select Yes if you want to use IMOD’s patch-tracking instead of ﬁducial-based

alignment.)

On the AreTomo tab, make sure this is still set:

Use AreTomo

Then, run the job, which has not been parallelised (yet), by clicking the Run! button.

You can again view the accumulated metadata for each tilt series by typing:

less AlignTiltSeries/job005/tilt_series/TS_01.star

The tutorial data is very good and automated alignment is not a problem. This may not be the case for your own data.

You can however go into each of the individual IMOD directories (e.g. AlignTiltSeries/job005/external/

TS_01) and re-run IMOD for individual tilt series by tweaking the parameter ﬁles. This will obviously require some

skills in IMOD, which may be useful to acquire anyway. You can ﬁnd the IMOD documentation here. First delete the

corresponding AlignTiltSeries/job005/tilt_series/TS_XX.star ﬁle (if it exists in the ﬁrst place) and then

continue the Align tilt-series job from the GUI. The job will then read in the new results and generate an updated star

ﬁle.

5.7 Reconstruct tomograms

We can now generate our tomograms. Click on the Reconstruct tomograms job from the job types, and on the I/O tab

set:

Input tilt series

AlignTiltSeries/job005/aligned_tilt_series.star

On the Reconstruct tab:

Unbinned tomogram width (Xdim)

4000

76 Chapter 5. Subtomogram tutorial

RELION

(This is the X-dimension of the reconstructed tomogram, in voxels. We’re using a slightly larger

tomogram volume than the actual size of the images (which is 3710 x 3838) so that if they rotate,

all pixels will still be in the tomogram. Because we use a large binned pixel below, the costs on disk

space are not too high, but you could tweak this to have a bit smaller tomograms.)

Unbinned tomogram height (Ydim)

4000

(As above for the Y-dimension.)

Unbinned tomogram thickness (Zdim)

2000

(This is the Z-dimension of the tomogram, in voxels. For the tutorial data, 2000 voxels encapsulates

the signal for all ﬁve tomograms. For your own data, you may want to test a few values here to

make sure the tomogram thickness is not too small to contain your entire sample. If you intend to

denoise your tomograms later, it is better not to pick a tomogram thickness that is much greater than

the thickness of your sample, because the denoising protocol randomly extracts subtomograms from

your tomograms and you don’t want too many without signal.)

Binned pixel size (A)

(A binned pixel size of 10 Angstrom will suﬃce for particle picking and denoising. Typically, the

larger the pixel size, the faster the tomogram reconstruction and the less space the tomograms occupy

on disk.)

Generate tomograms for denoising?

Yes

(The quality of the tutorial tomograms is very good and they don’t need to be denoised; however, we

set this to Yes to demonstrate denoising at the next step. This setting requires setting Save images

for denoising? to Yes in the Motion correction job.)

Tilt angle oﬀset (deg)

(This option allows you to reconstruct all your tomograms according to a pre-set oﬀset for the tilt

angle. This may be useful, for example when you have FIB-milled all you lamellae under a know tilt

angle.)

Reconstruct only this tomogram

(This option allows you to reconstruct only one tomogram from the input series, for example because

you have tweaked its alignment parameters or because you want to run a quick test. Just specify the

rlnTomoName, e.g. TS_01)

On the

Running tab choose the approriate number of MPIs and Threads and then click on the Run! button. With 5

MPI and 12 threads this step took less than 3 minutes on our computer. But denoising tomograms will take more time.

The output tomograms are called Tomograms/job006/tomograms/rec_TS_01.mrc etc. if Generate tomograms

for denoising? was set to No or Tomograms/job006/tomograms/rec_TS_01_half<1/2>.mrc otherwise. You

can visualise them in your favourite viewer, including IMOD’s 3dmod or Napari. The main objective of these tomo-

grams is to assess the quality of your sample and to allow particle picking. They do not need to contain high-resolution

information at this point.

5.7. Reconstruct tomograms 77

RELION

5.8 Denoise tomograms

For denoising tomorgams, the Denoise tomograms is a wrapper of Cryo-CARE [BJPJ19].

The ﬁrst step is to train a neural network on patches from the odd and even frame tomograms using the NOISE2NOISE

approach. Therefore, denoising only works if both Save images for denoising? in the Motion correction job and

Generate tomograms for denoising? in the Reconstruct tomograms were both set to Yes. The second step is to

apply the trained neural network to patches in the noisy tomograms to obtained the denoised versions. Each of the two

steps will be carried out as a separate Denoise tomograms job, as described below.

5.8.1 Train

For the training step, select the Denoise tomograms job type, and on the I/O tab, set:

Input tomograms.star:

Tomograms/job006/tomograms.star

Directory with cryoCARE execuables:

“... ”

(The path to the cryoCARE directory on your local machine)

Which GPU to use:

On the CryoCARE: Train tab, set:

Train denoising model:

Yes

Tomograms for model training:

TS_01:TS_03:TS_43:TS_45:TS_54

Number of sub-volumes per tomogram:

1200

Sub-volume dimensions (px):

On the CryoCARE: Predict tab, set:

Generate denoised tomograms:

and press the Run button to run the job.

On our machine, this job took under 1.5 hours to run.

78 Chapter 5. Subtomogram tutorial

RELION

5.8.2 Predict

For the actual denoising step, select a new Denoise tomograms job, and set the ﬁelds on the I/O tab to the same values

as in the previous Denoise tomograms job. On the CryoCARE: Train tab, set:

Train denoising model:

and on the CryoCARE: Predict tab, set:

Generate denoised tomograms:

Yes

Path to denoising model:

Denoise/job007/denoising_model.tar.gz

(The model trained in the previous Denoise job.)

and leave the other ﬁelds set to their default values. Running this job is much faster than the training job and in our

workspace it took around one minute.

5.8.3 Analysing the results

The tomogram set resulting from the Predict step (in our case the ﬁle Denoise/job008/tomograms.star) contains

the additional column rlnTomoReconstructedTomogramDenoised pointing to the denoised tomograms, which can

be visualised, for example, using IMOD or Napari. As mentioned in the Reconstruct tomograms section, the denoised

tomograms, just like the initial reconstructed odd/even frame tomograms, are used for visualisation and particle picking

only, and will not be used for further processing in the sub-tomogram averaging pipeline once the particles have been

annotated.

5.9 Particle picking

Particle picking is probably the most diﬃcult part of the subtomogram averaging pipeline. It is important that you

pick particles as accurately as possible, so RELION is not trying to align and average particles without the appropriate

signal. Alister Burt has written a Napari plug-in to pick particles directly in your tomograms. His picker can be used

to pick individual particles, or particles that are randomly sampled from geometric shapes (curved lines for ﬁlaments,

spheres for capsids or vesicles, or surfaces for membranes or other larger objects. (Please do not use the surface picker

– it is not functional yet).

To launch the Napari picker, again make sure you are working on the computer you are actually sitting behind, as

Napari performs poorly over remote connections, and select the Pick tomograms job type. On the I/O tab, set:

Input tomograms.star

Denoise/job008/tomograms.star

Picking mode

spheres

(The Napari picker has 4 modes of picking: particles, spheres, ﬁlaments and surfaces. For the tuto-

rial data we use spheres, because HIV-VLPs are more-or-less that shape. Besides randomly picking

particles with the distance speciﬁed below, this mode of picking also provides prior angles on the par-

ticles that will mean that with a tilt angle of 90 degrees, they will be oriented with their Z-axis along

the normal to the sphere surface. In what follows, this prior information will be used in reﬁnements

and classiﬁcation of the subtomograms. Filaments are picked as multiple points forming a (curved)

5.9. Particle picking 79

RELION

line, with priors that result in tilt=90 angles orienting particles with their Z-axis along the helical

axis. Individual particle picking does not provide any priors. Surface picking is not functional yet.)

Particle spacing (A)

(This is the inter-particle distance with which particle positions will be randomly sampled from on

the sphere. HIV capsid hexamers are approximately 75Å apart. By over-sampling the sphere we

reduce the number of missed particles.)

Input particles.star (optional)

“”

(We do not use this ﬁeld at this point in the pipeline. It can be used to load the particles in a given

star ﬁle as annotations on the tomogram by providing such a star ﬁle in this ﬁeld together with setting

Picking mode to particles.)

On the Running tab set:

Submit to queue

And click on the Run! button to launch the Napari GUI.

Napari will automatically open the ﬁrst tomogram on the GUI. Keep down (drag) the left mouse button (lef-mouse

+ drag) to rotate the scene; use the scroll wheel to zoom in/out; use Shift + lef-mouse + drag on the tomogram

plane to move the visualised plane along the Z axis (normal to plane). The contrast limits slide bar in the top left

side of the screen can be used to adjust contrast in the visualised tomogram plane.

Sphere annotation:

The ﬁrst step is to locate the centre of the sphere-like object in the tomogram. In the tutorial data, HIV-VLPs

are more-or-less distorted spheres. Move along the Z-dim of a chosen capsid to ﬁnd the maximum diameter

and locate the middle plane of that capsid. Then use Alt + left-click or (Ctrl + Alt + left-click on

some Linux ﬂavours) at that centre to place a small sphere. To resize the sphere press letter r and the sphere will

be resized to point where the mouse is pointing. You’ll need to play a bit to get the right radius of the sphere.

The right sphere size is when the surface of the sphere cuts through the middle of the capsid membrane. Then

go to a new capsid and repeat the above process. To move on to a new capsid press the letter n. To delete or

move an existing sphere, select the n3d spheres layer on the left side of the screen, then left-click on the centre

of the sphere. The sphere can then be moved by drag-and-drop of its centre, or deleted by pressing the x button

on the top-left of the screen (under layer controls). Once you ﬁnished annotating all the spheres in one

tomogram, press the save spheres button. Next you can choose the second tomogram and so on. Once you

ﬁnished annotating all the tomograms, close the Napari window.

Filaments annotation:

If you’re picking ﬁlaments, ﬁnd the start of one and use (Ctrl +) Alt + left-click to add a point to its

(curved) line; keep adding as many points as you need to describe the curvature. To move on to a new ﬁla-

ment press the letter n. Once you ﬁnished annotating all the ﬁlaments in one tomogram press the button save

filaments and move to the next tomogram.

Surface annotation:

This is not functional yet. Please stay tuned.

Individual particle annotation:

To pick a new indiviual particle, use (Ctrl +) Alt + left-click. Napari will show this location with a

small blue sphere. To delete or move any particle picked, ﬁrst select the n3d points layer on the Napari GUI

(the layer list is on the left of the screen), then select the particle by left-clicking on it. Pressing the x button on the

top left side of the screen (under layer controls) deletes the particle, while pressing the third button (Select

points) allows you to select a new particle, which can be deleted in the same way, or moved by drag-and-drop.

80 Chapter 5. Subtomogram tutorial

RELION

Once you ﬁnished picking all the particles in one tomogram, click on the save particles button and move to

the next tomogram. picking all the tomograms just close the Napari GUI.

After the Napari picker is closed, a python script will save the selected annotations to individual annotation

ﬁles *_spheres.star, *_filaments.star or *_particles.star, one per tomogram, to the Picks/job009/

annotations directory, and then write out a ﬁle called Picks/job009/particles.star with all particles

from all tomograms. This star ﬁle contains the coordinates of your particles in 3 dimensions under the headers

rlnCoordinate<X/Y/Z>, which are in (unbinned) pixels starting from (1,1,1) in the top corner of the tomogram.

We are currently working to write out these coordinates as rlnCenteredCoordinate<X/Y/Z>Angst, which wil be

in Angstroms from the centre of the tomogram. Please bear with us as this work is being ﬁnished during beta-code

development.

For your own data, you may also want to try other particle pickers such as TomoTwin, DeePiCt, DeepFinder, CrYOLO,

or others. We strongly recommend only picking in tomograms generated in ReconstructTomograms (or Denoise)

jobs, unless you can verify that the coordinates that you picked in tomograms generated outside of RELION match

the coordinates of the RELION tomograms perfectly. Future developments in the ccp-em tomography pipeline will

hopefully make using third-party pickers easier.

5.10 Extract subtomos

Now that we have 3D particle coordinates, the Extract job extracts the relevant cropped areas of the tilt series images

for each individual particle and saves them as CTF-premultiplied extracted 2D image stacks (or as 3D volumes) on the

disk for further processing using the relion_reﬁne program. As these are not really boxed out of a 3D tomogram, we

call these particles pseudo-subtomograms.

The pseudo-subtomograms can be either 2D stacks or 3D volumes (where Fourier slices of the 2D images are combined

in 3D). While both options are available in Extract , we strongly recommend working with 2D stacks, as they take less

disk space and, in the case of reconstruction and reﬁnement, this saves an interpolation step and therefore avoids artifacts

related to pseudo-subtomogram construction.

Pseudo-subtomogram particles are described by a particle set star ﬁle, as well as a tomogram set star ﬁle. In addi-

tion, if Bayesian polishing has been performed, the pseudo-subtomograms are also described by a trajectory set star

ﬁle. For convenience, a single optimisation set star ﬁle provides links to the corresponding three star ﬁles. After a

Bayesian polishing and/or CTF reﬁnement job, a new set of updated pseudo-subtomos particles should be extracted

prior to further reﬁnement or classiﬁcation by relion_reﬁne.

We will start by extracting pseudo-subtomograms in relatively large boxes with a large (binned) pixel size to speed up

the initial calculations for obtaining a de novo initial model. Select the Extract subtomos jobtype, and on the IO tab

set:

Input optimisation set

Picks/job009/optimisation_set.star

(You can see how this ﬁle refers to the inidividual particle set and tomogram set star ﬁles using cat

Picks/job008/optimisation_set.star.)

OR: use direct entries?

Input particle set

“”

Input tomogram set

“”

Input trajectory set

“”

5.10. Extract subtomos 81

RELION

On the Reconstruct tab, we set:

Binning factor

(This will result in a pixel size of 8.1 Angstroms... )

Box size (binned pix)

192

(This is the box size for the original cropping of the particle from the tilt series images. This box is

often set larger than the cropped box size below, because a larger box will allow more of the high-

frequency signal to be captured by pre-multiplication of the CTF.)

Cropped box size (binned pix)

(This is the ﬁnal box size of the particles. If this value is smaller than the original box size above,

then a second cropping operation is performed after the CTF pre-multiplication.)

Maximum dose (e/A^2)

(Tilt series frames with a dose higher than this maximum dose (in electrons per squared Angstroms)

will not be included in the 3D pseudo-subtomogram, or in the 2D stack. For 2D stacks, this will

reduce disc I/O operations and increase speed.)

Minimum nr. frames

(Some particles are outside the ﬁeld of view (i.e. invisible) in high-tilt images. When set to a positive

value, each particle needs to be visible in at least this number of tilt series frames with doses below

the maximum dose to be written out as a pseudo-subtomogram.)

Write output as 2D stacks?

Yes

(This is new as of relion-5 and the preferred way of generating pseudo-subtomograms. If set to

No, then relion-4 3D pseudo-subtomograms will be written out. Either can be used in subsequent

reﬁnements and classiﬁcations, but 2D stacks take up less disk space and give slightly better results

in 3D reﬁnements. In some cases, we have noticed that 3D pseudo-subtomograms behave better in

the VDAM algorithm of 3D initial reference jobs, which is why one can still write in this format too.)

Write output in ﬂoat16?

Yes

(This will save a factor of 2 in disk space.)

On the Running tab, we aim to saturate the processes on our 112-core CPU node by setting:

Number of MPI procs

Number of threads

Note that the MPI versions of this program (and those of Reconstruct particle , CTF reﬁnement and Bayesian polishing

are parallelized at the level of individual tomograms. Therefore, the Number of MPI processes should not exceed

the number of tomograms.

Using the settings above, this job took around 17 minutes on our system.

82 Chapter 5. Subtomogram tutorial

RELION

Your pseudo-subtomogram 2D stacks will be stored into MRC ﬁles in a new directory called Extract/job010/

Subtomograms/TS_01/1_stack2d.mrcs etc. The program will also write out an updated particle set as Extract/

job010/particles.star and a new optimisation set as Extract/job010/optimisation_set.star.

5.11 Import coordinates

While we will not use this functionality in this tutorial, here we describe brieﬂy how one can import coordinates of

particles picked outside RELION, for example by using picking software such as crYOLO [WMS+19].

We assume, for illustration purpose, that the coordinate ﬁles are one text ﬁle per tomogram, containing particle co-

ordinates that have been picked in the tomograms obtained from a Reconstruct tomograms or Denoise tomgorams

job, with the ﬁle names containing the tomogram name (e.g. rec_TS_01.coords) and are located in a directory

PARTICLE_COORDS.

Select Import from the jobt-type browser on the left and ﬁll in the following parameters on the Coordinates tab:

Or Import coordinates instead?

Yes

Input coordinates?

PARTICLE_COORDS/rec_TS_*

(Alternatively, if the particle ﬁles are in RELION STAR ﬁle format, then the input coordinate ﬁle

should be a 2-column STAR ﬁle with columns rlnTomoName and rlnTomoImportParticleFile

for the tomogram names and their corrsesponding particle coordinate ﬁles.)

Remove substring from ﬁlenames:

rec_

(This ﬁeld and the one below allow the import program to ﬁnd the tomogram name in the ﬁle name.)

Second substring to remove

.coords

Text ﬁles contain centered coordinates?

Multiply coords in text ﬁles with:

7.407407

(If the coordinates are with respect to the binning/voxel size in the tomograms in Denoise/job008/

tomograms/, this is the number that they will be multiplied with so that the resulting coordinates

correspond to the pixel size in the motion-corrected tilt series. In this case, this is found in the

rlnTomoTomogramBinning column of the Denoise/job008/tomograms.star ﬁle.)

Add this to coords in text ﬁles:

Inside the newly created directory, you will ﬁnd the imported particle set particles.star ﬁle. This should then

be used as input to a new Extract subtomos job together with the tomogram set tomograms.star ﬁle containing the

details of the tomograms used for picking outside RELION:

Input optimisation set:

“”

OR: use direct entries?

Yes

5.11. Import coordinates 83

RELION

Input particle set:

Import/jobXX/particles.star

Input tomogram set:

Denoise/job008/tomograms.star

This will generate new 2D stacks or 3D pseudo-subtomograms that can be used in the next reﬁnement steps. The

resulting particles can be visualised using the Napari plugin in a new Pick tomograms job with Picking mode set to

particles, as explained in the Visualising the remaining particles section.

5.12 De novo 3D model generation

relion-5.0 uses a gradient-driven algorithm to generate a de novo 3D initial reference from the pseudo-subtomograms.

This algorithm is diﬀerent from the SGD algorithm in the CryoSPARC program [PRFB17]. Provided you have a

reasonable distribution of angular directions, this algorithm is likely to yield a suitable low-resolution model that can

subsequently be used for 3D classiﬁcation or 3D auto-reﬁne .

The sample and the sphere picking procedure used in this tutorial make it possible to obtain initial orientations nor-

mal to the spheroidal surface during the picking process, which can be used to obtain an initial reference using the

Reconstruct particle job, as explained in the Reconstruct particle section. The map obtained in this way will be used as

a reference for the ﬁrst 3D auto reﬁne job at bin 6, which will be explained in the Initial 3D reﬁnement section.

However, here we will show how to obtain a de novo model without any prior knowledge, using the VDAM algorithm.

We have noticed that in some cases VDAM generates better initial models using 3D pseudo-sobtomograms rather than

2D stacks, so you may want to try both approaches for your own dataset. To generate 3D stacks, a new Extract subtomos

job should be run with the Write output as 2D stacks? option set to No (see the Extract subtomos section).

5.12.1 Running the job

Select the Extract/job010/optimisation_set.star ﬁle on the I/O tab of the 3D initial reference jobtype. Ev-

erything is already in order on the CTF tab. Fill in the Optimisation tab as follows:

Number of VDAM mini-batches

100

(The number of iterations of VDAM. The algorithm will loop over mini-batches, which contain only

hundreds to thousands of particles.)

Regularisation parameter T

(Values greater than 1 for this regularisation parameter (T in the JMB2011 paper) put more weight

on the experimental data. Values around 2-4 have been observed to be useful for 3D initial model

calculations.)

Number of classes

(Sometimes, using more than one class may help in providing a ‘sink’ for sub-optimal particles that

may still exist in the data set. In this case, which is quite homogeneous, a single class should work

just ﬁne.)

Mask diameter (A)

500

84 Chapter 5. Subtomogram tutorial

RELION

Flatten and enforce non-negative solvent

Yes

Symmetry

Run in C1 and apply symmetry later

Yes

(If set to yes, the actual reﬁnement will be run in C1, which has been observed to converge better

than performing it in higher symmetry groups. After the reﬁnement, the relion_align_symmetry

program is run to automatically detect the symmetry axes and the symmetry will be applied.)

Prior width on tilt angle (deg)

(Since the picking gives tilt angles so that the particles are normal to surface of the pseudo-spheres,

we enforce this prior knowledge here.)

On the Compute tab, set:

Use parallel disc I/O?

Yes

Number of pooled particles:

Pre-read all particles into RAM?

Copy particles to scratch directory

“”

Combine iterations through disc?

Use GPU acceleration?

Yes

Which GPUs to use

On the Running tab, set:

Number of MPI procs

(Remember that the gradient-driven algorithm does not scale well with MPI.)

Number of threads

Using the settings above, this job took 90 minutes on our system. If you didn’t get that coﬀee before, perhaps now is a

good time too. ..

5.12. De novo 3D model generation 85

RELION

5.12.2 Analysing the results

You could look at the output map from the gradient-driven algorithm (InitialModel/job011/

run_it100_class001.mrc) with a 3D viewer like UCSF chimera. If Run in C1 and apply symmetry later

was set to yes, you should probably conﬁrm that the symmetry point group was correct and that the symmetry axes

were identiﬁed correctly. If so, the symmetrised output map (InitialModel/job011/initial_model.mrc) should

look similar to the output map from the gradient-driven algorithm.

5.13 Reconstruct particle

The Reconstruct particle job generates a reference map from a particle set by averaging the particles using positions

and orientations given in the input particles ﬁle. This reference map will then be used for further processing in

3D auto-refne , 3D classiﬁcation , CTF reﬁnement or Bayesian polishing .

Continuing the tutorial, we will now generate a ﬁrst 3D reference map using the particles obtained by the previous

Extract subtomos job at bin 6.

5.13.1 Running the job

Select the IO tab from the Reconstruct particle jobtype.

Input optimisation set:

Extract/job010/optimisation_set.star

OR: use direct entries:

Input particle set:

Input tomogram set:

Input trajectory set:

On the Average tab, make sure the following is set to reconstruct particles with a binning factor of 6:

Binning factor:

Box size (binned pix):

192

Cropped box size (binned pix):

Wiener SNR constant:

Symmetry:

On the Running tab, set:

86 Chapter 5. Subtomogram tutorial

RELION

Number of MPI procs:

Number of threads:

Note that reconstructing from 2D tilt series relion_tomo_reconstruct_particle program has 3 thread arguments. The

number of threads provided here sets both --j and --j_out arguments so, to avoid exceeding the available memory

resources, the --mem argument should also be set using around 80-90% to keep a safety margin. In our system, we ran

1 MPI process per cluster node (64Gb) so we added:

Additional arguments:

--mem 50

Using the settings above, this job took less than 5 minutes on our system.

5.13.2 Analysing the results

You can look at the output map Reconstruct/job011/merged.mrc with a 3D viewer like UCSF chimera. When

random halfsets are speciﬁed in the input particles ﬁle via the rlnRandomSubset ﬁeld, the two halfmaps half1.mrc

and half2.mrc are also generated.

5.14 Initial 3D reﬁnement

Once we have an initial reference map, one may use the 3D auto-reﬁne procedure in relion to reﬁne the dataset to

high resolution in a fully automated manner. This procedure employs the so-called gold-standard way to calculate

Fourier Shell Correlation (FSC) from independently reﬁned half-reconstructions in order to estimate resolution, so

that self-enhancing overﬁtting may be avoided [SC12]. Combined with a procedure to estimate the accuracy of the

angular assignments [Sch12b], it automatically determines when a reﬁnement has converged. Thereby, this procedure

requires very little user input, i.e. it remains objective, and has been observed to yield excellent maps for many data

sets. Another advantage is that one typically only needs to run it once, as there are hardly any parameters to optimize.

However, as the pseudo-subtomogram ﬁles require more memory resources compared to SPA, we suggest running this

procedure in several steps, from high binning factors to 1, to improve processing time. Since the ﬁrst set of pseudo-

subtomograms have been extracted at binning factor 6, we will start the 3D reﬁnement using those same particles.

5.14.1 Running the auto-reﬁne job at bin 6

On the I/O tab of the 3D auto-reﬁne job-type set:

Input optimisation set:

Extract/job010/optimisation_set.star

(If an optimisation set ﬁle is provided, the input particles and tomograms STAR ﬁles are set based

on its content.)

OR: use direct entries?

(Since we provide an optimisation set ﬁle in the ﬁeld above, we will not be providing a particle set,

tomogram set or trajectory set.)

Input particle set:

“”

5.14. Initial 3D reﬁnement 87

RELION

(Note this is blank as it is extracted from the optimisation set ﬁle.)

Input tomogram set:

“”

(This is also blank as it is given in the optimisation set ﬁle.)

Input trajectory set:

“”

(This is blank as we have not run a Bayesian polishing job yet.)

Reference map:

Reconstruct/job011/merged.mrc

Reference mask (optional):

“”

(We’re not using a mask at this point, so leave this empty for now.)

On the Reference tab, set:

Ref. map is on absolute greyscale?

Yes

Resize reference if needed?

Yes

Initial low-pass ﬁlter (A)

(We typically start auto-reﬁnements from low-pass ﬁltered maps to prevent bias towards high-

frequency components in the map, and to maintain the gold-standard of completely independent

reﬁnements at resolutions higher than the initial one.)

Symmetry

On the CTF tab set:

Do CTF correction?

Yes

Ignore CTFs until ﬁrst peak?

On the Optimisation tab set:

Mask diameter (A):

500

and keep the defaults for the remaining options.

Note that the box size at bin 6 is 96 x 8.1Å = 777.6Å, so setting a large mask diameter of 500Å (remember the HIV

capsid hexamers are 75Å apart) in the ﬁrst 3D auto-reﬁne job at bin 6 allows us to use more information in the low-

resolution images to obtain a ﬁrst round of particle alignments and a map that will then be further reﬁned with a smaller

mask of diameter 230Å and a smaller binning factor (i.e. higher resolution).

On the Auto-sampling tab, one can usually keep the defaults. Note that the orientational sampling rates on the

Auto-sampling tab will only be used in the ﬁrst few iterations, from there on the algorithm will automatically increase

the angular sampling rates until convergence. Therefore, for all reﬁnements with less than octahedral or icosahedral

symmetry, we typically use the default angular sampling of 7.5 degrees, and local searches from a sampling of 1.8

88 Chapter 5. Subtomogram tutorial

RELION

degrees. Only for higher symmetry reﬁnements we use 3.7 degrees sampling and perform local searches from 0.9

degrees.

The last two ﬁelds on the Auto-sampling tab are set as follows:

Use ﬁner angular sampling faster?

(If set to yes, the reﬁnement is more aggresive in proceeding with iterations of ﬁner angular sampling.

This will speed up the calculations at the potential cost of suboptimal convergence. Therefore, if using

this option, you might want to check that you are not obtaining suboptimal alignments in the early

reﬁne jobs and not losing resolution in the later stages of your own processing.)

Prior width on tilt angle (deg)

(This ﬁeld has the same purpose as in the 3D initial reference job: enforcing priors on the tilt angle

of the particles. Since we know from the sphere picking procedure that the particles are normal to

the surface of the spheres, we can use this knowledge to speed-up convergence.)

Ignore the Helix tab, and on the Compute tab set:

Use parallel disc I/O?

Yes

Number of pooled particles:

Skip padding?

Pre-read all particles into RAM?

Copy particles to scratch directory

“”

Combine iterations through disc?

Use GPU acceleration?

Yes

Which GPUs to use

(Set the id sequence of the GPU cards separated by colon (0:1:2) or leave blank to automatically

use all conﬁgured cards)

On the Running tab, set:

Number of MPI procs

Number of threads

As the MPI nodes are divided between one leader (who does nothing else than bossing the others around) and two sets

of followers who do all the work on the two half-sets, it is most eﬃcient to use an odd number of MPI processors, and

the minimum number of MPI processes for 3D auto-reﬁne jobs is 3. Memory requirements may increase signiﬁcantly

5.14. Initial 3D reﬁnement 89

RELION

at the ﬁnal iteration, as all frequencies until Nyquist will be taken into account, so for larger sized boxes than the ones

in this test data set you may want to run with as many threads as you have cores on your cluster nodes.

Before pressing the Run! button, we give this job the alias bin6 so we can refer to it easily later.

On our computer with 2 GPUs, this calculation took approximately 3.5 hours.

5.14.2 Analysing the results

At every iteration the program writes out two run_it0??_half?_model.star and two run_it0??_half?

_class001.mrc ﬁles: one for each independently reﬁned half of the data. Only upon convergence a single

run_model.star and run_class001.mrc ﬁle will be written out (without _it0?? in their names). Because the

two independent half-reconstructions are joined together in the last iteration, the resolution will typically improve sig-

niﬁcantly. This iteration also requires more memory and CPU, as the program will use all the data up to Nyquist

frequency.

Note that the automated increase in angular sampling is an important aspect of the auto-reﬁne procedure. It is based on

signal-to-noise considerations that are explained in [Sch12b], to estimate the accuracy of the angular and translational

assignments. The program will not use ﬁner angular and translational sampling rates than it deems necessary (because it

would not improve the results). The estimated accuracies and employed sampling rates, together with current resolution

estimates, are stored in the _optimiser.star and _model.star ﬁles, but may also be extracted from the stdout ﬁle.

For more information, check the SPA tutorial high-resolution 3D reﬁnement step.

The program also writes an optimisation set run_optimisation_set.star ﬁle, updated with run_data.star

(i.e. the particles ﬁle) and the tomograms and trajectories ﬁles (given as input to the 3D auto-reﬁne job). This

run_optimisation_set.star ﬁle should not be confused with the _optimiser.star ﬁles used regularly by re-

lion_reﬁne.

This job will have likely reached Nyquist frequency so, to go to higher resolution, we will need a new set of pseudo-

subtomo particles at a smaller binning factor, 2 or directly 1.

5.14.3 Pseudo-subtomograms at bin 2

We will now perform 3D reﬁnement at binning factor 2, which will lead to a higher resolution features than the previous

binning factor. To do this, we ﬁrst need to extract a new set of pseudo-subtomograms at binning factor 2. Go to the

Extract subtomos jobtype on the GUI, and on the I/O set:

Input optimisation set:

Reﬁne3D/job012/run_optimisation_set.star

On the Reconstruct tab, make sure the following is set to extract particles with a binning factor of 2:

Binning factor:

Box size (binned pix):

256

Cropped box size (binned pix):

128

The other parameters are the same as in the previous Extract subtomos job:

Maximum dose (2/A^2)

Minimum nr. frames

90 Chapter 5. Subtomogram tutorial

RELION

Write output as 2D stacks?

Yes

Write output in ﬂoat16?

Yes

5.14.4 Obtaining a 3D reference at bin 2

Having extracted a new set of particles at binning factor 2, we will now obtain a 3D reference map at the same binnig

factor. Select the Reconstrcut particle jobtype on the GUI, and set in the I/O tab:

Input optimisation set:

Extract/job013/optimisation_set.star

OR: use direct entries?

and on the Average , set:

Binning factor:

Box size (binnied pix):

256

Cropped box size (binned pix):

128

Symmetry:

Then run the job with the same settings on the Running tab as in the previously run Reconstruct particle job. With the

newly extracted bin 2 particles and 3D reference, we will now proceed to the bin 2 3D auto-reﬁne job.

5.14.5 Running the auto-reﬁne job at bin 2

On the I/O tab of the 3D auto-reﬁne job-type set:

Input optimisation set:

Extract/job013/optimisation_set.star

OR: use direct entries?

(Note that the input particle set, input tomogram set and input trajectory set are empty as this infor-

mation is extracted from the optimisation set ﬁle.)

Reference map:

Reconstruct/job014/half1.mrc

(Here we use the resulting map from the bin 6 3D auto-reﬁne job.)

On the Reference tab, set:

Ref. map is on absolute greyscale?

Yes

5.14. Initial 3D reﬁnement 91

RELION

Resize reference if needed?

Yes

Initial low-pass ﬁlter (A)

(We set the low-pass ﬁlter slightly below the reached resolution in the previous step. In this case, it

is the Nyquist resolution at binning factor 6.)

Symmetry

On the CTF tab set:

Do CTF correction?

Yes

Ignore CTFs until ﬁrst peak?

On the Optimisation tab set:

Mask diameter (A):

230

On the Auto-sampling tab, to resume the reﬁnement from the current resolution, we could adjust the angular sampling

below the angular resolution given the initial low-pass ﬁlter argument and mask diameter. A coarse estimation can be

obtained by arctan(

𝑟𝑒𝑠𝑜𝑙𝑢𝑡𝑖𝑜𝑛*2

𝑑𝑖𝑎𝑚𝑒𝑡𝑒𝑟

). In our case:

Initial angular sampling:

7.5 degrees

Use ﬁner angular sampling faster?

On our computer with 2 GPUs, we used 5 MPIs and 6 threads, and this calculation took approximately 7.5 hours.

Again, the 3D reﬁnement will have reached Nyquist resolution.

Before doing further reﬁnement at binning factor 1, we need to eliminate the duplicate particles that would lead to

an overestimated resolution, as well as the bad particles that do not have suﬃcient information for high-resolution

reﬁnement. We will do this in the next two sections.

5.15 Duplicate particles removal

Before continuing with higher-resolution reﬁnement of the structure we have so far, now is a good time to discard any

unwanted particles, which are either duplicates or low quality. In this section, we address the problem of duplicate

particles.

As the initial particle picking consists of random sampling of positions on the annotated spheres, these positions are

gradually reﬁned into positions of particles in the tilt series that resemble projections of the structure being subtomo-

gram averaged. Consequently, it is likely that multiple particle positions converge to the same, or very close, positions

in the tilt series (and equivalently in the tomogram). This is problematic because, in addition to unnecessarily reﬁning

the same particle multiple times in each reﬁnement cycle (and therefore wasting time and computational resources), the

same particle can appear in both random halfsets during reﬁnemt, which leads to overestimated resolution estimates,

since the gold-standard FSC criterion is no longer satisﬁed. Therefore, duplicate particles must be removed from the

particle set.

92 Chapter 5. Subtomogram tutorial

RELION

To do this, go to the Subset selection job-type, and on the I/O tab, write the path of the previous job’s output particles

ﬁle in the appropriate ﬁeld:

Select classes from job:

“”

OR select from micrographs.star:

“”

OR select from particles.star:

Reﬁne3D/job015/run_data.star

Leave all the other ﬁelds at their default values, except on the Duplicates tab:

OR: remove duplicates?

Yes

Minimum inter-particle distance(A)

(Since the distance between two adjacent hexamers is approximately 75Å, we can reasonably assume

that any particles closer than 30Å will eventually converge to the same particle, and therefore should

be considered duplicates.)

We give this job the alias remove-dups and press the Run! button. The duplicated particles are then removed and the

resulting duplicate-free particle set is in Select/remove-dups/particles.star.

In our workspace, this job removed 8937 duplicated particles from the initial 30596 particles.

5.16 3D classiﬁcation

After removing the duplicate particles from the particle set, we will now perform 3D classiﬁcation in order to ﬁlter

out bad particles that will prevent us from obtaining a high-resolution structure in the subsequent reﬁnement steps at

binning factor 1.

As the particles were initially picked at random on the spherical annotations, while the HIV capsids are not strictly

spherical, it is likely that some of the particles we have reﬁned up to this point either do not correspond to actual

hexamers at all (in the case when the spherical annotation and the capsid do not coincide in the tomogram) or were

sampled from regions where the data itself does not contain high-resolution information or has artifacts.

Therefore, assuming we have obtained a good low-resolution reference map, we will now classify the particles based

on how well they align with the reference and discard those that align poorly, which will show as blurry, or completely

incorrect, 3D classes.

5.16.1 Running the 3D classiﬁcation job

Select the 3D classiﬁcation job-type, and on the I/O tab, set the appropriate ﬁles:

Input optimisation set:

“”

OR: use direct entries?

Yes

(Since we ran the Subset selection job to remove the duplicate particles in the previous step, we will

now use its output particles ﬁle as the particle set instead of the previously used optimisation set. The

tomogram set will be the same as the one in the previous optimisation set ﬁle.)

5.16. 3D classiﬁcation 93

RELION

Input particle set:

Select/job016/particles.star

Input tomogram set:

Denoise/job008/tomograms.star

Reference map:

Reﬁne3D/job015/run_class001.mrc

On the Reference tab, ensure that the following two ﬁelds are set as follows:

Initial low-pass ﬁlter (A):

(We low-pass ﬁlter the reference map at a low enough resolution in order not to bias the reﬁned

classes towards high-resolution features.)

Symmetry:

(At this point, the aim is to ﬁnd the bad particles in the particle set rather than to obtain the best

reconstruction, so we perform the classiﬁcation without enforcing the symmetry, since the bad par-

ticles are not necessarily symmetric. The good (and symmetric) particles will end up assigned to a

class that hopefully results in a symmetric map.)

Leave the ﬁelds on the CTF tab with their default values, then in the Optimisation tab leave the default values for all

ﬁelds except:

Number of classes:

(There is no right number of classes, and in your own run of the pipeline it may be useful to experiment

with diﬀerent numbers to ensure that the largest number of good particles is selected. Note that the

computational cost of 3D classiﬁcation scales linearly with the number of classes, both in terms of

CPU time and memory.)

Mask diameter (A):

230

On the Sampling tab, change the ﬁelds:

Perform image alignment?

Yes

Angular sampling interval:

1.8 degrees

Perform local angular searches?

Yes

(As the purpose of this classiﬁcation step for this particular dataset is to see which particles align

well with the reference structure we have so far, rather than to discover heterogeneity in the data, we

only perform local alignment of the individual particles.)

Prior width on tilt angle (deg)

Ignore the Helix tab and set the ﬁelds on the Compute and Running tabs to the same values as in the 3D auto-reﬁne

jobs in the Initial 3D reﬁnement section:

94 Chapter 5. Subtomogram tutorial

RELION

Number of pooled particles:

Use GPU acceleration?

Yes

Number of MPI procs:

Number of threads:

On our computer with 2 GPUs, this job took around 2 hours for 25 iterations and 21659 particles.

5.16.2 Analysing the results of the 3D classiﬁcation

For an overview of the the resulting classes in the ﬁnal iteration, use the Display: button on the 3D classiﬁcation job

that has just completed and select out:run_it025_optimiser.star, which will open a Relion Display GUI. This

allows you to see central slices through all classes side-by-side. Before pressing the Display! button on the Relion

Display GUI, it is useful to set a few ﬁelds as follows.

The Min and Max ﬁelds set the intensity range of the displayed images, and setting them to non-zero values allows you

to see all class images in the same intensity range. This is useful to see which classes have low intensity (i.e. little mass

inside the volume), which otherwise would be diﬃcult to assess on a normalised scale for each image. The speciﬁc

values to set depend on the resulting intensities after each run; in our data for this particular 3D classiﬁcation run, the

following values were useful:

Min

-0.01

Max

0.04

Another useful ﬁeld to tick is

Sort images on:

rlnClassDistribution

as well as the Display label? ﬁeld. Finally, press the Display! button and the class images will be shown in the

order of the fraction of the total number of particles assigned to each class, with the fraction shown as the label of each

image. Right-clicking on individual images and selecting Show metadata this class will open a new window

with more information about the class, such as the number of particles in it.

It is also possible to see 2D slices through each class map by going back to the main Relion GUI and clicking on

Display: and out:run_it025_class00X.mrc for any class index X. This will open the Relion Display GUI and

pressing Display! will open a new window with the individual slices through the map. Alternatively, the individual

map ﬁles are located in Class3D/job017/run_it025_class001.mrc and can be opened with a 3D volume viewer

such as UCSF chimera.

5.16. 3D classiﬁcation 95

RELION

5.16.3 Discarding the bad particles

Next, we want to only keep in our dataset the particles belonging to the best classes obtained in the 3D classiﬁcation

and discard the rest. To do this, go to the Subset selection job-type and in the I/O tab input the _optimiser.star

ﬁle from the previous 3D auto-reﬁne job into the appropriate ﬁeld:

Select classes from job:

Class3D/job017/run_it025_optimiser.star

Leave all the other ﬁelds in all tabs set to their default values (set the remove duplicates? ﬁeld in the Duplicates

tab to No if it was set otherwise from the previous job) and press the Run! button. This will open the Relion Display

GUI, and after setting the ﬁelds to the same values as explained above, press the Display! button, which will open the

window showing all classes side-by-side.

Select the good classes by clicking on the individual class images – the selected classes will have a red border, then

right-click on one of the selected images and select Save selected classes. The output of the job in the main

Relion GUI will show a message indicating that the particles.star ﬁle has been saved and the number of selected

particles it contains. You can now safely close the Relion Display GUI windows, and the new particles ﬁle containing

only the particles in the selected classes is Select/job018/particles.star.

In our workspace, the resulting particles.star ﬁle now contains 9442 particles.

5.16.4 Visualising the remaining particles

To visualise the particles we have left as 3D annotations on the tomograms, we can launch the Napari picker with

picking mode: particles (also see Particle picking), with the additional particles ﬁle given as the optional input.

First, make sure that the Relion GUI is running locally, as the Napari plugin is slow over the network. Then, select the

Pick tomograms job type and set the following ﬁelds on the I/O tab:

Input tomograms.star:

Denoise/job008/tomograms.star

Picking mode:

particles

Particle spacing(A):

“”

(Since we selected the particles picking mode above, this ﬁeld will be ignored.)

Input particles.star (optional):

Select/job018/particles.star

(The particle set ﬁle that we want to visualise.)

Running the job will generate the particle annotation ﬁles Picks/job019/annotations/TS_XX_particles.star

from the input particles.star ﬁle and start the Napari plugin, where you can visualise the particle annotations and

manipulate them as explained in the Particle picking section. After clicking the save particles button and closing

the Napari window, new particles.star and optimisation_set.star ﬁles will be created, which can be used in

subsequent steps.

The procedure to perform 3D classiﬁcation using pseudo-subtomograms is the same as in the single particle analysis

pipeline, so you may ﬁnd it useful to also read the description in the SPA tutorial.

96 Chapter 5. Subtomogram tutorial

RELION

5.17 High-resolution 3D reﬁnement

At this point in the tutorial, we have obtained a reﬁned map at binning factor 2, we have removed any duplicate particles,

as well as any bad particles that do not align well with our map. We are now ready to perform 3D reﬁnement at binning

factor 1.

5.17.1 Pseudo-subtomograms and reference map at bin 1

First, we need to generate new pseudo-subtomograms and a reference map at binning factor 1. The order of running

the Extract subtomos and Reconstruct particle jobtypes is irrelevant as long as they are run consecutively to keep track

of the optimisation set ﬁle. In addition, since the previous job we ran was a Subset selection job, we must use its output

particles ﬁle as the input particle set to the next job (either Extract subtomos or Reconstruct particle ):

Input optimisation set:

“”

OR: use direct entries?

Yes

Input particle set

Select/job018/particles.star

Input tomgoram set

Denoise/job008/tomograms.star

Input trajectory set

“”

For 3D reﬁnement at binning factor 1, make sure the following options are properly set on Extract subtomos Reconstruct

tab and Reconstruct particle Average tab:

Binning factor:

Box size (binned pix):

512

Cropped box size (binned pix):

192

5.17.2 Masks for high-resolution reﬁnement and FSC estimation

Next, we need to generate appropriate masks to help the reﬁnement process and FSC assessment. In the particular case

of the HIV capsid, we provide the following masks:

1. mask_align.mrc, which we will use in the Reference mask ﬁeld of the 3D auto-reﬁne job: Since the reﬁned

region in our structure so far contains both the HIV capsid and the matrix, we need to make sure that we only

focus on the capsid. Moreover, this mask is nearly as wide as the 230Å diameter mask applied to the images

during reﬁnement, ensuring that we use as much of the information available in the volume as possible to align

the particles.

2. msk_fsc.mrc, which we will use in the Solvent mask ﬁeld of the Post-processing job (see Post-processing

below). Since we use a wide mask to improve the alignment of particles during reﬁnement, the volume we reﬁne

5.17. High-resolution 3D reﬁnement 97

RELION

will consist of more than one hexamer. In order to asssess the reconstruction quality of the central hexamer only,

we use a tighter mask around it for FSC calculations.

The above masks can be created externally using tools like UCSF chimera or dynamo_mask_GUI from the dynamo

package. For general mask creation, refer to SPA tutorial mask description.

In order to use the provided masks, we suggest recentering the reference map to ensure that the masks and the reference

are aligned (so that the masks focus on the capsid). You could look at the output reference map from the previous

step (Reconstruct/job025/merged.mrc) and the mask (masks/mask_align.mrc) with a 3D viewer like IMOD

3dmod to estimate the Z oﬀset between both maps, in pixels. In our case, it is 2.75 pixels but this could be diﬀerent as

it depends on the initial de novo model. Thus, recentering the particles can be done from the command-line:

relion_star_handler --i Extract/job020/particles.star \

--o Extract/job024/particles2.75.star --center --center_Z 2.75

To check that the capsid within the reference map is aligned with the mask, we can reconstruct it using the

Reconstruct particle job-type, described in Reconstruct particle.

5.17.3 Running the auto-reﬁne job at bin 1

On the I/O tab of the 3D auto-reﬁne job-type set:

Input optimisation set:

Extract/job020/optimisation_set.star

OR: use direct entries:

(If a new particles ﬁle has been generated in the previous step during recentering, this option should

be set to Yes and the correct input particle set and tomogram set ﬁles should be used.)

Reference map:

Reconstruct/job021/half1.mrc

(Once we reach a high enough resolution in the reﬁnement process, it is important to keep the two

halfsets entirely separate in order to obtain a gold-standard reconstruction. Halfmap ﬁles should be

used as reference maps for each halfset processed by relion_reﬁne, keeping the 3D reﬁnement of

both halfsets independent along the whole workﬂow. To that end, when the reference map ﬁle name

contains either *half?*.mrc template, each halfmap is automatically assigned to its halfset.)

Reference mask (optional):

mask_align.mrc

On the Reference tab, set:

Ref. map is on absolute greyscale?

Yes

Resize reference if needed?

Yes

Initial low-pass ﬁlter (A)

5.5

(We set the low-pass ﬁlter slightly below the reached resolution in the previous step. In this case, it’s

Nyquist resolution at binning factor 2.)

98 Chapter 5. Subtomogram tutorial

RELION

Symmetry

On the CTF tab set:

Do CTF correction?

Yes

Ignore CTFs until ﬁrst peak?

On the Optimisation tab set:

Mask diameter (A):

230

Mask individual particles with zeros?

Yes

Use solvent-ﬂattened FSCs?

Yes

(This option enables the computation of the resolution during reﬁnement using masked halfmaps

using the provided mask_align.mrc mask ﬁle.)

Use Blush regularisation?

On the Auto-sampling tab set:

Initial angular sampling:

1.8 degrees

On our system, using 2 GPU cards, this job took around 1.5 hours to run.

We now remove duplicates again by running Subset selection with a minimum inter-particle distance of 50Å to ensure

that no other particles converged to the same positions, and then generate a new reference map with Reconstruct particle .

5.17.4 Post-processing

Finally, the procedure to sharpen a 3D reference map and estimate the gold-standard FSC curves for subtomogram

averaging is the same as described in the SPA tutorial.

Select the Post-processing jobtype, and on the I/O tab, set:

One of the 2 unﬁltered half-maps:

Reconstruct/job025/half1.mrc

Solvent mask:

mask_fsc.mrc

(This is the tight mask that only includes the central hexamer.)

We leave the other ﬁelds as they are and run the job. The resulting ﬁnal resolution we see in our workspace is 4Å.

At this point, this is the best alignment we could reach without applying any speciﬁc tomo reﬁnement, as shown in the

Tomo reﬁnement cycle section.

5.17. High-resolution 3D reﬁnement 99

RELION

5.18 Tomo reﬁnement cycle

We will now describe the speciﬁc tomo reﬁnement steps that will reﬁne the particle-speciﬁc CTF parameters, tilt angles

and beam-induced deformations, which will lead to higher resolution in our subtomogram-averaged structure.

The tomo reﬁnement jobs are CTF reﬁnement and Bayesian polishing , and one tomo reﬁnement cycle would include

both. Moreover, for better improvements, we recommend running multiple tomo reﬁnement cycles.

In the current section, we give an overview of how such a tomo reﬁnement cycle would look.

5.18.1 Reference map and FSC data

Before running each of the tomo reﬁnement jobs ( CTF reﬁnement and Bayesian polishing ), the following are required:

1. Reference halfmaps obtained by running Reconstruct particle . The implementation of the Reconstruct particle

job is consistent with the the implementation of the tomo reﬁnement jobs (e.g. range of the intensities), so it is

important to use this reference map rather than maps obtained from 3D auto-reﬁne .

2. Post-processed FSC data for the reference halfmaps obtained by running Post-processing . This is required to

estimate the SNR. If not provided, these programs internally calculate it without phase randomization, so SNR

will be slightly optimistic.

3. Alignment mask ﬁle at binning facor 1

4. optimisation set

Note that, independently of the binning factor of the data in the previous steps, CTF reﬁnement and Bayesian polishing

protocols process data in the original pixel size (binning 1). Therefore, the reference map should always be recon-

structed at this binning level and proper reference and FSC masks should be used.

5.18.2 Running the jobs

The order of CTF reﬁnement and Bayesian polishing is not important, as long as we keep track of the optimisation set

ﬁle. In our workspace, in each cycle we ran the two jobs as follows:

1. CTF reﬁnement , described in detail in the Tomo reﬁnement 1: CTF reﬁnement section, and

2. Bayesian polishing , described in detail in the Tomo reﬁnement 2: Bayesian polishing section.

5.18.3 3D reﬁnement

After running both tomo speciﬁc reﬁnement steps, it is still recommended to run a new 3D auto-reﬁne job to take

advantage of the improved tomograms and particles. To this end, we need to construct a new set of pseudo-subtomos

and reference maps as described in the Pseudo-subtomograms and reference map at bin 1 subsection. For the new

3D auto-reﬁne job, the same parameters as in the Running the auto-reﬁne job at bin 1 section apply, except for:

On the Reference tab, set:

Initial low-pass ﬁlter (A):

On the Auto-sampling tab set:

Initial angular sampling:

0.9 degrees

100 Chapter 5. Subtomogram tutorial

RELION

This new 3D reﬁnement step took just under 2 hours on our system (2 GPU cards) and, after Reconstruct particle and

Post-processing with the tight mask, we reached a resolution of 3.6Å, completing the ﬁrst tomo reﬁnement cycle. After

another four full tomo reﬁne cycles, we reached 3.3Å, and depending on the quality of the picked particles, it is also

possible to obtain 3.2Å.

5.19 Tomo reﬁnement 1: CTF reﬁnement

In this step we will show how to apply the reﬁnement of the diﬀerent parameters related to the CTF estimation such as tilt

series projection defocus and contrast scale. For a full description of the arguments, check the relion_tomo_reﬁne_ctf

program.

If you are running this job without following the tutorial, please note the requirements for the reference map and FSC

data as described in Reference map and FSC data.

5.19.1 Running the job

Since we ran both Reconstruct particle and Post-processing after the bin 1 reﬁnement step at the end of the

High-resolution 3D reﬁnement section, we already have the required reference halfmaps (Reconstruct/job025/

half<12>.mrc) and post-processing FSC data (PostProcess/job026/postprocess.star).

In addition, we can use the optimisation set ﬁle from a previous job or, in this case, the explicit particle set from the

duplicate removal step and the separate tomogram set (which would otherwise be included in the optimisation set).

Lastly, the reference mask ﬁle used in the previous 3D auto-reﬁne at binning factor 1 is also required by the tomo

reﬁnement jobs, which in our case is mask_align.mrc.

Select the CTF reﬁnement job-type and set on the I/O tab:

Input optimisation set:

“”

OR: use direct entries?

Yes

(Because the previous job run was Subset selection to remove duplicate particles, we will use the

resulting particles ﬁle as the input particle set. Otherwise, we could have used the optimisation set

ﬁle directly from the previous 3D auto-reﬁne job.)

Input particle set:

Select/job024/particles.star

Input tomogram set:

Denoise/job008/tomograms.star

Input trajectory set

“”

(This is empty in the ﬁrst tomo reﬁnement cycle, unless Bayesian polishing is run ﬁrst, in which case

we would include the generated motion.star ﬁle, unless it already is included in the optimisation

set ﬁle.)

One of the 2 reference half-maps:

Reconstruct/job025/half1.mrc

5.19. Tomo reﬁnement 1: CTF reﬁnement 101

RELION

Reference mask:

mask_align.mrc

Input postprocess STAR:

PostProcess/job026/postprocess.star

On the Defocus tab, set:

Box size for estimation (pix)

512

Reﬁne defocus?

Yes

Defocus search range (Å)

3000

Do defocus regularisation?

Yes

Defocus regularsation lambda

0.1

Reﬁne constrast scale?

Yes

Reﬁne scale per frame?

Yes

Reﬁne scale per tomogram?

On the Running tab, set:

Number of MPI procs:

Number of threads:

With these parameters, the job should take around 10 minutes to run.

5.19.2 Analysing the results

The output folder CtfRefine/job027 contains a new tomograms.star ﬁle with the reﬁned parameters. To as-

sess the result, run new Reconstruct particle and Post-processing jobs using the generated CtfRefine/job027/

optimisation_set.star ﬁle. In our workspace, we see a slight improvement in the resolution to 3.87Å.

These reference map and postprocess ﬁles will also be used as inputs for the next Bayesian polishing run.

102 Chapter 5. Subtomogram tutorial

RELION

5.20 Tomo reﬁnement 2: Bayesian polishing

relion has also implemented the analogous to Bayesian polishing for tomography. This procedure reﬁnes the projections

that map 3D space onto the images of the tilt series. Optionally, the beam-induced motion trajectories of the particles

and deformations can also be estimated. For a complete description of the arguments, check the relion_tomo_align

program.

If you are running this job without following the tutorial, please note the requirements for the reference map and FSC

data as described in Reference map and FSC data.

5.20.1 Running the job

Select the Bayesian polishing job-type and set on the I/O tab:

Input optimisation set:

CtfReﬁne/job027/optimisation_set.star

OR: use direct entries?

One of the 2 reference half-maps:

Reconstruct/job028/half1.mrc

Reference mask:

mask_align.mrc

Input postprocess STAR

PostProcess/job029/postprocess.star

On the Polish tab, set:

Box size for estimation (pix)

512

Max position error (pix)

Align by shift only?

Alignment model

(Does not apply)

On the Motion tab, set:

Fit per-particle motion?

Yes

Sigma for velocity (Å/dose)

0.2

Sigma for divergence (Å)

5000

Use Gaussian decay

5.20. Tomo reﬁnement 2: Bayesian polishing 103

RELION

On the Running tab, set:

Number of MPI procs:

Number of threads:

Note that the per-particle motion estimation increases the processing time signiﬁcantly. On our system it took around

2 hours.

5.20.2 Analysing the results

In the output folder Polish/job030 you will ﬁnd new tomograms.star and particles.star ﬁles including the

corrected tilt series alignment and particle positions and a trajectory set ﬁle motion.star with particle trajectories.

To assess the result, we generate new particles with the Extract subtomos job using the resulting optimisation_set.

star ﬁle, followed by Reconstruct particle and Post-processing . Compared to the previous FSC estimation, we observe

a clear improvement and a resolution of 3.65Å.

5.21 Model building with ModelAngelo

Building an atomic model using ModelAngelo [JKZ+24] is done in the same way as in the SPA tutorial. First, download

the protein sequence as a FASTA ﬁle from this link (Download Files -> FASTA Sequence).

5.21.1 Running the job

Select the ModelAngelo building job type and, on the I/O tab, set:

B-factor sharpened map:

PostProcess/job079/postprocess_masked.mrc

FASTA sequence for proteins:

rcsb_pdb_5L93.fasta

After inputing the path to the ModelAngelo executable and the GPUs to use and making sure that Perform HMMer

search? is set to No on the Hmmer tab, we run the job.

5.21.2 Analysing the results

On our machine with 2 GPUs, this job took 3 minutes to run. The output is a coordinate ﬁle called ModelAngelo/

job080/job080.cif that may be used in UCSF chimera together with the Postprocess/job079/postprocess.

mrc map. Note that although ModelAngelo did a very good job on this relatively easy test case, you should always

check its results carefully in a program like coot [ELSC10] . You will also need to perform a stereochemical reﬁnement

of the coordinates. For this, we like servalcat [YPBM21].

104 Chapter 5. Subtomogram tutorial

CHAPTER

SIX

ON-THE-FLY PROCESSING

As of relion-4.0, on-the-ﬂy processing is based on the Schemes functionality. This has been implemented together

with a small Tkinter GUI that was written by Colin Palmer from CCPEM for the relion_it.py python script that

was distributed with relion-3.0 and 3.1. The new program still carries the same name, and can be launched from your

Project directory from the command line:

relion_it.py &

This will launch the GUI, which contains several sections that need to be ﬁlled in by the user.

ò Note

This script depends on pre-conﬁgured Schemes/prep and Schemes/proc Schemes that are distributed inside the

scripts directory on the relion source code. To ﬁnd these, the environment variable RELION_SCRIPT_DIRECTORY

needs to be set to point to this directory. You can also use this variable to point towards your own modiﬁed version

of the Schemes.

6.1 Computation settings

This section speciﬁes what calculations will be performed.

Do MotionCorr & CTF?

(If selected, a Scheme called Schemes/prep will be lanuched that will loop over all movies (as

deﬁned in the section Preprocessing settings to run Import , Motion correction and CTF estimation .

By default this will be done for a maximum of 50 movies at a time.)

micrographs_ctf.star:

Schemes/prep/ctﬃnd/micrographs_ctf.star

(This option is only used if Do MotionCorr & CTF? is not selected. In that case, the user can provide

the output STAR ﬁle from a previous CTF estimation job to perform the rest of the processing on.

Do Autopick & Class2D?

(If selected, a second Scheme called Schemes/proc will be lanuched that will automatically pick

particles, as speciﬁed on the Particle settings and Processing settings sections, and then perform

2D classiﬁcation and automated class selection through the Subset selection job. This is repeated on

a loop, while the number of particles extracted is still increasing.)

105

RELION

Do Reﬁne3D?

(If selected, the iteration of the second Scheme will also comprise a 3D auto-reﬁne job.)

3D reference:

None

(If set to None, or anything else that does not exist as a ﬁle, then an 3D initial model job will be

launched before the auto-reﬁne job. Note that in the case of symmetry, as deﬁned in the Particle

settings section, the initial model calculation is still performed in C1, but this calculation is followed

by an alignment of the symmetry axes and the application of the symmetry to the model.)

GPUs (comma-separated):

0,1

(This option is used to specify the IDs of the GPU devices you want to run on. Since the motion

correction and CTF estimation are CPU-only, this option is only used by the second Scheme.)

6.2 Preprocessing settings

This section speciﬁes information about where the movies are and how they were recorded.

Pattern for movies:

Movies/*.tiﬀ

(This speciﬁes where the movies are. If you have recorded movies in the .eer format, it is recom-

mended that you convert them to .tiff format during the copying process from the microscope

computer to your processing computer.)

Gain reference (optional):

Movies/gain.mrc

(Only use this option if your movies have not been gain-corrected yet, otherwise leave empty.)

Super-resolution?

(Click this if your movies are in super-resolution. Note that we do not recommend recording movies

in super-resolution, and that they will be binned during the Motion correction job.)

Voltage (kV):

300

Cs (mm):

2.7

Phase plate?

(Click this if you have collected your images with a phase plate. In that case, the CTF estimation job

will also estimate the phase shift.)

(Super-res) pixel size (A):

(Provide the pixel size in the movies. If they are in super-resolution, then provide the (smaller) super-

resolution pixel size.)

106 Chapter 6. On-the-ﬂy processing

RELION

Exposure rate (e-/A2/frame):

1.2

(This is the accumulated dose in a single movie frame.)

6.3 Particle settings

Symmetry:

Longest diameter (A):

180

(The longest diameter will be used to automatically determine the box size below, as well as for LoG

and topaz picking.)

Shortest diameter (A):

150

(This will only be used for LoG picking. This value should be smaller or equal than the longest

diameter above, and is useful to pick elongated particles.)

Mask diameter (A):

198.0

(This is used for Auto-picking jobs, as well as 2D classiﬁcation , 3D initial model and 3D auto-reﬁne

Box size (px):

246

(The box size in the original micrograph.)

Down-sample to (px)

(To speed up all calculations in the proc Scheme, all particles will be downsampled to this box size.)

Calculate for me:

(This will generate automated suggestions for the mask diameter, the box size and the down-sampled

box size. We often use these.)

6.4 Processing settings

Min resolution micrographs?

(Only micrographs with an estimated CTF resolution beyond this value will be selected. Set to 999

not to throw away any micrographs.)

Retrain topaz network?

(If this is selected, then the proc Scheme will ﬁrst use the below speciﬁed number of particles for an

initial 2D classiﬁcation and automated class selection in Subset selection . The selected particles are

6.3. Particle settings 107

RELION

then used to re-train the neural network in topaz for this data set. Once the re-training is ﬁnished, the

entire data set will be picked using topaz.

Nr particles for Log picking:

10000

(The number of particles used for LoG picking.)

LoG picking threshold:

(The threshold to LoG pick particles.

LoG class2d score:

0.5

(The threshold to automatically select 2D class averages from the LoG picked particles. A value of

0 means rubbish classes; a value of 1 means gorgeous classes.)

Topaz model:

Schemes/proc/train_topaz/model_epoch10.sav

(If one does not retrain the topaz network, then this option can be used to provide a pre-trained

network. If this option is left empty, then the default general network inside topaz is used.)

Nr particles per micrograph:

300

(The expected number of particles per micrograph, which is used both for topaz training and picking.)

Topaz picking threshold:

(The topaz threshold to select particle. Using negative values, e.g. -3, will pick more particles.)

Topaz class2d score:

0.5

(The threshold to automatically select 2D class averages from the LoG picked particles. A value of

0 means rubbish classes; a value of 1 means gorgeous classes.)

Finally, the GUI has two action buttons:

The Save options button will save the currently selected options to a ﬁle called relion_it_options.py. This (to-

gether with any other options ﬁles) can be read in when launching the GUI a next time from the command line:

relion_it.py relion_it_options.py [extra_options2.py ....] &

The Save &run button will also save the options, and it will actually launch the Schemes and open the normal relion

GUI, from which the progress can be monitored, as explained on the Schemes reference page.

108 Chapter 6. On-the-ﬂy processing

RELION

6.5 Intervening

Once the Schemes are running, you will see new jobs popping up in the normal relion GUI. As soon as you start seeing

some results, you may ﬁnd that you want to change some of the parameters. To make stopping and restarting a Scheme

easier, there is another GUI: relion_schemegui.py. It needs to be launched for each running Scheme separately.

The Save &run button above, will have launched one for both the prec and proc Scheme, but you can also launch it

from the command line:

relion_schemegui.py proc &

This GUI will look for the hidden directory (.relion_lock_scheme_proc) that locks this Scheme to see whether it

is running or not, and it will update the Current entry to indicate at what job or operator the Scheme currently is.

To stop a running Scheme, press the Abort button and wait for the underlying jobs and the schemer to receive the abort

signal. Depending on what the Scheme is executing, this may take a bit of time. Once it has been aborted, you can then

change options to speciﬁc jobs through the Set Job option section, or change variables in the Scheme through the

Set Scheme variable section. (The GUI still needs some work here to make this easier and more error-resistant).

After changing variables to any job, it’s status will be reverted to has not started, meaning that a new relion

job will be launched next time the Scheme comes across it. For any continue type of job (like Motion correction ,

CTF estimation , Auto-picking or Particle extraction ), a new job will only be launched if that job’s options were

changed, or if the options were changed for any job that came before that job. Otherwise, the job will just continue,

and thereby already performed calculations will not be repeated.

To start the Scheme again, press the Restart button. The Scheme will be executed from the job or operator speciﬁed

on the Current entry. If you want, you can change this from the point where it was aborted. If you want to restart the

Scheme all the way from the beginning, then press the Reset button, before pressing Restart .

Sometimes, a Scheme dies because of an error, not because of it ﬁnishing or being aborted. In that case, the lock

directory (.relion_lock_scheme_proc) needs to be deleted, before the Scheme can be used again. Press the Unlock

button to print instructions on how to do that. (TODO: implement this through a popup window from the GUI...)

6.6 Control more options

Not all options of all relion jobs, or all of the parameters of the Schemes themselves can be controlled from the

relion_it.py GUI. You can still control all of these through manually editing the relion_it_options.py ﬁle.

For this, use double underscores to separate SCHEMENAME__JOBNAME__JOBOPTION for any option. Some options are

already in the default ﬁle, but any other options can be added.

E.g. to change the number of 2D classes (nr_classes) in the class2d_ini job of the the proc Scheme, you can add

the following line to the relion_it_options.py ﬁle:

'proc__class2d_ini__nr_classes', '200',

Likewise, use SCHEMENAME__VARIABLENAME for variables in the Schemes themselves, e.g. to set de do_at_most

variable, which determines the maximum number of micrographs that are processed in one cycle of the prep Scheme,

edit this line:

'prep__do_at_most', '100',

You can also save options for the relevant settings for your local setup in a second options ﬁle, e.g.

relion_it_options_LMB-Krios1.py, and then call relion_it.py with those, e.g.:

relion_it.py relion_it_options_LMB-Krios1.py relion_it_options.py &

6.5. Intervening 109

RELION

If the same option is speciﬁed in multiple options ﬁles, the value in the last ﬁle on the command line will be used.

One could even make a speciﬁc command for each microscopy setup by using an alias like:

alias relion_it_krios1.py 'relion_it.py relion_it_options_LMB-Krios1.py'

6.7 Site-speciﬁc setup

At the very least, you will need to change the position of the executables for ctﬃnd 4.1 or Gctf and topaz, but you may

also want to tweak the default settings for number of threads or MPI processors for the diﬀerent jobs. So, you local

setup options ﬁle will likely include options like:

{

'prep__ctffind__fn_ctffind_exe' : '/wherever/ctffind/ctffind.exe',

'prep__ctffind__fn_gctf_exe' : '/wherever/Gctf/bin/Gctf',

'proc__inipicker__fn_topaz_exe' : '/wherever/topaz/topaz',

'proc__restpicker__fn_topaz_exe' : '/wherever/topaz/topaz',

'proc__train_topaz__fn_topaz_exe' : '/wherever/topaz/topaz',

'prep_motioncorr__nr_threads' : '16',

'proc_restpicker__nr_mpi' : '4',

'proc_extract_ini__nr_mpi' : '4',

'proc_extract_rest__nr_mpi' : '4',

'proc_class2d_ini__nr_threads' : '12',

'proc_class2d_rest__nr_threads' : '12',

'proc_inimodel3d__nr_threads' : '12',

'proc_refine3d__nr_threads' : '8',

'proc_refine3d__nr_mpi' : '3'

}

Remember it is also possible to edit the job.star and scheme.star ﬁles inside your own copy of the Schemes/prep

and Schemes/proc directories, and use the environment variable $RELION_SCRIPT_DIRECTORY to point towards the

modiﬁed scripts. That provides an alternative that would no longer rely on specifying an extra options ﬁle, and allows

maximum ﬂexibility in adopting the schemes to your speciﬁc needs.

110 Chapter 6. On-the-ﬂy processing

CHAPTER

SEVEN

REFERENCE PAGES

7.1 Movie Compression

This page describes how to compress CryoEM movies to save disk spaces and reduce I/O loads.

7.1.1 Falcon3 / Falcon4

Falcon3 and Falcon4 output images in 16-bit integers. Gain normalization is always applied and cannot be disabled.

In the counting mode, an electron is not rendered as a single 1 but seems to be placed as a blob of 3x3 pixels that sum

to 100. This is probably to reduce noise aliasing from frequencies higher than the Nyquist frequency.

Below is an example from Falcon3. You can see three electrons. Two electrons have overlapping tails:

0 0 0 0 0 0 0 0 0 0 0

0 0 3 7 1 0 0 0 0 0 0

0 0 15 42 11 16 4 0 0 0 0

0 0 6 15 14 44 11 3 7 1 0

0 0 0 0 2 7 2 15 42 7 0

0 0 0 0 0 0 0 6 15 3 0

0 0 0 0 0 0 0 0 0 0 0

Because of this feature, Falcon3 and Falcon4 movies do not compress well. Typically, one can reduce the size to about

50 to 60 % of the original movie by TIFF with the deﬂate ﬁlter (also called zip ﬁlter). The LZW ﬁlter is faster but

gives larger ﬁles.

By converting to TIFF from EER, you can avoid this blob convolution. Keep reading below.

7.1.2 Falcon4 EER

EER (Electron Event Representation) is a new movie format for Falcon4. It records electron events at the detector’s

physical frame rate (248 Hz). The ﬁle size is typically smaller than MRC, but can be larger than fractionated TIFF from

Falcon4.

The location of events is stored in a 4x super-resolution grid (i.e. 16K x 16K pixels). RELION renders an electron as

a single dot in a 4K or 8K grid. By rendering in a 8K grid and then Fourier cropping to a 4K grid, you can remove

noise beyond the (physical) Nyquist frequency. If you directly render in a 4K grid, the noise aliases back and slightly

lowers the DQE. However, this eﬀect is very tiny in practice and the resolution rarely changes. One possibility is to

start processing at 4K and coarse slicing and then switch to 8K and ﬁner slicing in Polish to save processing time and

memory (see below).

111

RELION

ò Note

As of RELION 3.1.1, 8K rendering can create artifacts around defect lines. Thus, we do not recommend 8K

rendering.

To process an EER dataset, proceed as follows. This feature needs RELION >= 3.1.1.

1. Decide how many (internal) frames to group into a fraction.

For example, if you have 1000 internal frames and group them by 30, you will get 33 fractions. The remaining

10 (= 1000 - 30 * 33) frames will be ignored. Too ﬁne slicing leads to very slow processing and out of memory

errors.

2. Calculate how many e/A2 each fraction has.

Fractionate such that each fraction has about 0.5 to 1.25 e/A2. You can change this later.

3. Import EER ﬁles as usual.

Use the physical pixel size for 4K rendering. For 8K rendering, the pixel size should be half of the physical

size.

4. Run motion correction.

• Specify the value decided in the step 1 to EER fractionation.

• Specify the dose rate calculated in step 2.

• Specify the gain reference.

• Group frames in the GUI must be 1 regardless of what you choose in step 1.

• Binning factor should be 2 to bring a 8K super resolution grid into a 4K physical grid by Fourier

cropping.

Otherwise, set it to 1.

• Add --eer_upsampling 2 if you work in a 8K grid. The default is 4K, that is, --eer_upsampling 1.

If you want to change the rendering mode (4K or 8K) and/or the fractionation before Polish, you have to modify the

trajectory STAR ﬁles produced by a MotionCorr job. Use scripts/eer_trajectory_handler.py:

python3 /path/to/eer_trajectory_handler.py --i MotionCorr/job002/corrected_micrographs.

˓→star --o up2_gr8 --resample 2 --regroup 8

will change the sampling to 8K and the fractionation to 8 frames. The output will be MotionCorr/job002/

corrected_micrographs_up2_gr8.star, which should be speciﬁed to a Polish job. Note that the extraction box

size for Polish is in pixels of the rendered movie.

The gain reference for EER has confusing convention due to historical reasons. When the movie is in the EER format

and the gain reference is in the MRC format, RELION will divide raw pixel values with the provided gain. In contrast,

if the gain reference is given in the TIFF format with the .gain extension (as written by newer versions of the Falcon

software), RELION will multiply raw pixel values with the provided gain (as is done for K2/K3). In any case, when the

gain is zero, the pixel is considered as defective. The gain reference can be 8K x 8K or 4K x 4K. If the size of the gain

reference and the size requested by --eer_upsampling do not match, the gain reference is upsampled / downsampled.

If memory usage is a concern, consider building RELION in CPU single precision (cmake -DDoublePrec_CPU=OFF).

Another useful tool is relion_convert_to_tiff, which renders an EER movie into a compressed integer TIFF.

You can render in 4K or 8K (--eer_upsampling) with speciﬁed fractionation (--eer_grouping). This can further

reduce ﬁle sizes by sacriﬁcing spatial and temporal resolutions to a reasonable value. Due to the diﬀerent meanings

of the gain reference for EER and TIFF, you have to take the inverse of the EER gain reference before processing the

resulting TIFF ﬁles. The tool performs this conversion when the EER gain reference is speciﬁed in the --gain option.

112 Chapter 7. Reference pages

RELION

ò Note

Old versions of EPU display the one-seventh of the real number of frames in EER, as reported in CCPEM. This

issue has been ﬁxed in the latest EPU release. In case of doubt, check the real number of frames by running

relion_convert_to_tiff on one of the movies. The command will print “Found X raw frames.”

7.1.3 Gatan K2 / K3 in counting or super-resolution mode

Gain non-normalised movies

Always use the gain non-normalised mode. Then the images represent electron counts and consist of mostly 0s, a few

1s, fewer 2s, even fewer 3s and so on. Below is an example:

1 0 0 0 0 0 0 0

0 1 0 3 1 1 0 2

1 0 0 1 0 1 0 0

2 1 0 0 0 0 0 1

0 2 0 1 0 0 0 0

Such images compress extremely well by TIFF with the LZW ﬁlter. The deﬂate ﬁlter is much slower and gives larger

ﬁles.

SerialEM can directly write compressed TIFF movies. EPU cannot write movies in TIFF but in integer (8 bit for

counting mode, 4 bit for super-resolution mode) MRC, which can be converted to TIFF later.

Update: According to Grigory Sharov at MRC-LMB, TIFF output is available in EPU >=2.4 for K3 and EPU >=2.9

with TEM server >=7.6 (requires Windows 10) for K2.

Gain normalised movies

The gain reference looks like below:

1.149962 1.083618 1.198896 1.140650 1.159426 1.063172 1.204020 1.145287

1.075346 1.122473 1.173919 1.149962 1.159426 1.051271 1.178831 1.071257

1.043484 1.009823 1.109215 1.100549 1.183784 1.100549 1.126962 0.978266

1.193816 1.047363 1.246640 1.214399 1.193816 1.067199 1.051271 0.999080

1.131488 1.159426 1.304355 1.051271 1.035811 1.131488 0.974881 0.988563

If you save movies in the gain normalised mode, the electron counts are multiplied by this to yield:

1.149962 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

0.000000 1.122473 0.000000 3.449886 1.159426 1.051271 0.000000 2.142514

1.043484 0.000000 0.000000 1.100549 0.000000 1.100549 0.000000 0.000000

2.387632 1.047363 0.000000 0.000000 0.000000 0.000000 0.000000 0.999080

0.000000 2.318852 0.000000 1.051271 0.000000 0.000000 0.000000 0.000000

Now, values are 32-bit ﬂoating points, which hardly compress.

If we knew the gain reference, we could divide pixel values by the gain to bring them back to integers. However, the

gain reference is not written in the gain normalised mode. Fortunately, there is a way to reliably estimate the gain

reference.

Consider a particular pixel and look at the values over many frames. The values should be integer multiples of its gain,

for example:

7.1. Movie Compression 113

RELION

0.000000 2.21843 0.0000000 1.109215 0.000000 0.000000 1.109215 3.327645 0.000000␣

˓→...

Thus, we can estimate the gain by ﬁnding the greatest common divisor among these values. Since the dose rate of

counting mode movies is very low (otherwise, you will have coincidence losses), it is highly probable that the list

contains an observation corresponding to one electron. Thus, one simply needs to ﬁnd the smallest positive value and

use it as the gain for this pixel. In this case, it is 1.109215.

There is one complication. Digital Micrograph applies defect correction when working in the gain normalised mode.

Values of such pixels are no longer integer multiples of their gain and the above trick does not work. For such pixels,

one can keep the original values and set the gain to 1.000000. Then gain multiplication does not modify such pixels.

The output remains 32bit ﬂoating point numbers, not integers, but since most values are 0.000000, 1.000000, 2.000000,

3.000000, etc except for defect pixels, the entropy is smaller than the input and compression is more eﬃcient.

7.1.4 relion_convert_to_tiﬀ

The command relion_convert_to_tiff implements above compression schemes.

For Falcon3, Falcon4, gain non-normalised K2/K3 images, the usage is very simple:

relion_convert_to_tiff --i movies.star --o Converted/

The STAR ﬁle needs only the rlnMicrographMovieName column. You can also specify a list ﬁle .lst that contains

movie names without any STAR headers.

When the input is from Falcon detectors, judged by the width being 4096 pixels, it applies deﬂate compression

at level 6. Otherwise, LZW compression is performed. This default can be overridden by --compression and

--deflate_level arguments. By default, relion_convert_to_tiff treats all rows in a frame as one TIFF strip to

improve the compression ratio. This can be disabled by --line_by_line option.

--only_do_unfinished allows conversion of only new ﬁles. The program writes to a temporary ﬁle and renames it

to .tif only after all frames have been written. Thus, killing a program in the middle is safe.

In contrast to mrc2tif command from the IMOD suite, relion_convert_to_tiff does not support thread paralleliza-

tion to compress one movie with many cores. However, one can use MPI parallelization as:

mpirun -np 24 relion_convert_to_tiff_mpi --i movies.star --o Converted/ # 24 processes

to process many movies simultaneously.

ò Note

In MRC-LMB computer cluster, you should run the above command after booking a full CPU node by qlogin -l

dedicated=24. Note that our cluster nodes cannot access /teraraid*. If your movies are there, you have to run

conversion on max, hex or hal (be considerate to others by reducing the number of processes!).

114 Chapter 7. Reference pages

RELION

Gain estimation

To compress gain normalised K2 movies, one has to ﬁrst estimate the gain used during data collection. Note that this

gain is diﬀerent from what relion_estimate_gain estimates.

relion_convert_to_tiff --i movies.star --o Converted/ --estimate_gain

This prints a row per frame:

Processing Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 000

˓→#Changed 7673083 #Mismatch 0, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 001

˓→#Changed 4549992 #Mismatch 80676, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 002

˓→#Changed 2743457 #Mismatch 89580, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 003

˓→#Changed 1670936 #Mismatch 77997, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 004

˓→#Changed 1028044 #Mismatch 59783, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 005

˓→#Changed 638637 #Mismatch 44309, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 006

˓→#Changed 399216 #Mismatch 30629, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 007

˓→#Changed 251807 #Mismatch 21201, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 008

˓→#Changed 159379 #Mismatch 15021, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 009

˓→#Changed 101211 #Mismatch 10315, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 010

˓→#Changed 64619 #Mismatch 7191, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 011

˓→#Changed 41322 #Mismatch 5089, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 012

˓→#Changed 26191 #Mismatch 3789, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 013

˓→#Changed 16901 #Mismatch 2901, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 014

˓→#Changed 10994 #Mismatch 2284, #Negative 0, #Unreliable 14238980␣

(continues on next page)

7.1. Movie Compression 115

RELION

(continued from previous page)

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 015

˓→#Changed 7170 #Mismatch 1885, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 016

˓→#Changed 4538 #Mismatch 1613, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 017

˓→#Changed 2980 #Mismatch 1446, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 018

˓→#Changed 1913 #Mismatch 1349, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 019

˓→#Changed 1273 #Mismatch 1293, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 020

˓→#Changed 859 #Mismatch 1256, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 021

˓→#Changed 554 #Mismatch 1206, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 022

˓→#Changed 344 #Mismatch 1232, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 023

˓→#Changed 243 #Mismatch 1188, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 024

˓→#Changed 169 #Mismatch 1189, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 025

˓→#Changed 107 #Mismatch 1195, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 026

˓→#Changed 79 #Mismatch 1182, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 027

˓→#Changed 60 #Mismatch 1206, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 028

˓→#Changed 53 #Mismatch 1187, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 029

˓→#Changed 39 #Mismatch 1177, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 030

˓→#Changed 37 #Mismatch 1165, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 031

˓→#Changed 30 #Mismatch 1199, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

(continues on next page)

116 Chapter 7. Reference pages

RELION

(continued from previous page)

Original/FoilHole_7230495_Data_7226082_7226083_20181209_0240-40759.mrc Frame 032

˓→#Changed 23 #Mismatch 1185, #Negative 0, #Unreliable 14238980␣

˓→/ 14238980

As explained above, the program ﬁnds smallest positive numbers for each pixel over many frames. #Changed is the

number of pixels whose minimum value is updated. #Mismatch is the number of pixels whose value in the frame is not

an integer multiple of the current gain estimate. This happens when (1) the pixel is defective and Digital Micrograph

applied correction or (2) the estimated gain is not correct (for example, the current minimum corresponds to two

electrons and the frame contains three electrons).

#Unreliable is the number of pixels whose gain estimate is still unreliable. A pixel is considered to be reliable when

values which are integer multiples of the current gain estimate were observed at least --thresh times (default 50)

without being interrupted by mismatch.

After processing several hundreds frames, the values should become stable. The number of mismatches ﬂuctuates. The

number of unreliable pixels is usually 1000 to 5000 in most K2 detectors.

Processing Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 000

˓→#Changed 0 #Mismatch 1216, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 001

˓→#Changed 0 #Mismatch 1203, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 002

˓→#Changed 0 #Mismatch 1199, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 003

˓→#Changed 0 #Mismatch 1199, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 004

˓→#Changed 0 #Mismatch 1192, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 005

˓→#Changed 0 #Mismatch 1210, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 006

˓→#Changed 0 #Mismatch 1186, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 007

˓→#Changed 0 #Mismatch 1219, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 008

˓→#Changed 0 #Mismatch 1224, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 009

˓→#Changed 0 #Mismatch 1197, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 010

˓→#Changed 0 #Mismatch 1202, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 011

˓→#Changed 0 #Mismatch 1176, #Negative 0, #Unreliable 1346␣

(continues on next page)

7.1. Movie Compression 117

RELION

(continued from previous page)

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 012

˓→#Changed 0 #Mismatch 1197, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 013

˓→#Changed 0 #Mismatch 1196, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Original/FoilHole_7232574_Data_7226091_7226092_20181209_2221-42294.mrc Frame 014

˓→#Changed 0 #Mismatch 1204, #Negative 0, #Unreliable 1346␣

˓→/ 14238980

Now you can stop the program by pressing Ctrl-C. The program updates gain_estimate.bin and

gain_estimate_reliability.bin every movie.

To perform actual compression, specify gain_estimate.bin as --gain option:

relion_convert_to_tiff --i movies.star --o Converted/ --gain Converted/gain_estimate.bin

The program writes not only TIFF movies but also gain-reference.mrc, which should be used for subsequent data

processing.

Practical considerations

If you updated the gain reference in Digital Micrograph during data collection, you have to divide your dataset into two

and estimate gain separately.

Some pixels are cold pixels and emit 0 most of the time. Thus, it is very rare to observe values corresponding to one

electron. If you terminate gain estimation too early, such pixels are ﬂagged as unreliable. This is safe, because values

of unreliable pixels are always written as they are with the gain value of 1.0000.

If the program never observes an event corresponding to one electron but only events corresponding to two or four

electrons during gain estimation, the program mistakenly considers the value for two electrons as the gain and still ﬂags

the pixel as reliable. If the program encounters an event corresponding to one or three electrons during compression,

which is not multiple of the estimated gain, the program emits an error and terminates. In this case, you have to

re-run gain estimation from more frames and repeat compression from the beginning. Fortunately, such situation

is highly unlikely; because the pixel values are Poisson distributed and the dose rate is low, you observes an event

corresponding to one electron frequently. When the dose rate is high, the probability for one-electron events is lower,

but the distribution becomes also wider. This means that you observe neighbouring values (e.g. two-, three- and

four-electron events) with similar frequencies. In other words, it is unlikely to observe many two- and four-electron

events without observing any three-electron events. Because three is not divisible by two, this pixel remains ﬂagged as

unreliable. The --ignore_error option forces the program to continue by rounding non-conforming values but this

leads to change of pixel values.

Defective pixels do not carry much information. If we round them to nearest integers, the output can be saved as

integers, not ﬂoating point numbers, and the compression ratio will improve. Since the number of defects are very

small (1000 to 5000 out of 14 million pixels in K2) and their values are not very accurate anyway, such a slightly-lossy

compression scheme probably do not hurt the resolution. Implementation and veriﬁcation of such a strategy is on our

TODO list.

118 Chapter 7. Reference pages

RELION

7.1.5 Compressed MRC ﬁles

Some movies, especially Falcon 3 or Falcon 4 movies (non-EER), can be compressed signiﬁcantly better by bzip2

than deﬂate TIFF. RELION 4.0.1 and newer support MRC movies compressed by bzip2, xz or ZStandard in

RELION’s own motion correction and Bayesian Polish for SPA. Other RELION features, including tomography,

relion_movie_reconstruct and relion_image_handler, do NOT support compressed MRC ﬁles (yet).

To read compressed MRC ﬁles, RELION needs pbzip2 (not bzip2), xz and zstd command in your PATH for bzip2,

xz and ZStandard, respectively.

These formats do not allow random access; in other words, RELION has to decompress all the N - 1 preceeding frames

only to read the N-th frame. This is not a big issue for tools that read all frames anyway (e.g. motion correction and

Polish), but poses signiﬁcant ineﬃciency for others. Fortunately, LZW-TIFF achieves similar (or better) compression

for K2/K3 movies and Falcon 3/4 movies converted from EER. Thus, we hope the compressed MRC format is necessary

only for archiving old Falcon 3/4 MRC movies.

7.1.6 Examples

Compression rates depend on dose. Fewer electrons typically lead to better compression.

Falcon 3 counting

FoilHole_24156969_Data_24154827_24154828_20170425_0847_Fractions.tif from EMPIAR-10309 (A2a receptor).

The deposited ﬁle is already in TIFF, but decompressed to 16 bit integer MRC for testing. 4096 x 4096 pixels, 75 frames,

16 bit integer, mean = 36.644 (i.e. 0.36 e/px/frame)

• 16 bit integer MRC: 2,516,583,424

• IMOD mrc2tif, lzw: 1,583,972,550 (62.9 %)

• IMOD mrc2tif, zip level 6: 1,432,846,496 (56.9 %)

• IMOD mrc2tif, zip level 9: 1,432,820,192 (56.9 %)

• relion_convert_to_tiﬀ, auto = zip level 6: 1,337,873,325 (53.2 %)

• bzip2: 1,067,277,634 (42.4 %)

Note that bzip2 gives a smaller ﬁle.

K2 counting, gain normalised from EPU

FoilHole_12404830_Data_12400523_12400524_20181213_1058-251321.mrc from EMPIAR-10317 (ABC trans-

porter). 3838 x 3710 pixels, 40 frames, 32 bit ﬂoating point from EPU, mean = 1.45

• 32 bit ﬂoating point MRC: 2,278,237,824

• IMOD mrc2tif, lzw: 1,554,985,144 (68.3 %)

• IMOD mrc2tif, zip level 6: 1,274,226,678 (55.9 %)

• bzip2: 739,204,755 (32.4 %)

• relion_convert_to_tiﬀ after gain estimation: 270,856,741 (11.9 %)

1502 pixels were marked as unreliable.

Also note that it would have been 569560224 bytes (25 %) in gain non-normalised 8-bit integer MRC even before

compression.

7.1. Movie Compression 119

RELION

K2 counting, gain non-normalised from EPU

FoilHole_2491648_Data_2484494_2484495_20190505_2224-167458.mrc from EMPIAR-10340 (tau ﬁlaments).

3838 x 3710 pixels, 48 frames, 8 bit integers from EPU, mean = 0.96

• 8 bit integer MRC: 683,472,064

• IMOD mrc2tif, zip level 6: 205,103,218 (30.0 %)

• IMOD mrc2tif, lzw: 193,826,068 (28.4 %)

• relion_convert_to_tiﬀ, auto = lzw: 190,200,465 (27.8 %)

• bzip2: 189,773,107 (27.8 %)

By saving in the gain non-normalised mode, the integer MRC ﬁle is one forth the size of the ﬂoating point MRC ﬁle

(32 / 8 = 4). LZW compression further reduces the size.

Falcon 4 EER

FoilHole_13722039_Data_13716084_13716086_20200315_0111_Fractions.mrc.eer from EMPIAR-10500 (GABAA

receptor). The deposited ﬁle is in the EER format. The ﬁle is converted to LZW-TIFF at diﬀerent temporal resolutions

(frame grouping) and spatial resolutions (4K physical grid or 8K super-resolution grid). 4096 x 4096 pixels, 0.725

Å/px, 1113 detector frames, mean = 20.275 (i.e. 0.0182 e/px/frame)

• Original EER: 498,516,028

• LZW-TIFF, 4K, group by 24 (0.44 e/px/fraction, 0.83 e/Å/fraction): 146,363,549 (29.4 %)

• LZW-TIFF, 4K, group by 16 (0.29 e/px/fraction, 0.55 e/Å/fraction): 175,643,049 (35.2 %)

• LZW-TIFF, 4K, group by 12 (0.22 e/px/fraction, 0.41 e/Å/fraction): 198,052,039 (39.7 %)

• LZW-TIFF, 4K, group by 8 (0.15 e/px/fraction, 0.28 e/Å/fraction): 233,619,295 (46.9 %)

• LZW-TIFF, 8K, group by 24 (0.83 e/Å/fraction): 257,857,873 (51.7 %)

• LZW-TIFF, 8K, group by 16 (0.55 e/Å/fraction): 296,274,173 (59.4 %)

• LZW-TIFF, 8K, group by 12 (0.41 e/Å/fraction): 325,821,955 (65.4 %)

• LZW-TIFF, 8K, group by 8 (0.28 e/Å/fraction): 373,938,153 (75.0 %)

Although EER ﬁles are smaller than Falcon MRC ﬁles written by EPU, they can be made even smaller by fractionation

to a reasonable temporal resolution. By converting to LZW-TIFF from EER, you can avoid 3x3 blob convolution and

get smaller ﬁles than converting from MRC. Since electrons are rendered as dots in EER (as in K2/K3), LZW ﬁlter is

faster and produces smaller ﬁles than the deﬂate ﬁlter.

7.2 Pixel size issues

7.2.1 What should I do if the pixel size turned out to be wrong?

If the error is small (say 1-2 %) and the resolution is not very high (3 Å), you can specify the correct pixel size in

the PostProcess job. This scales the resolution in the FSC curve. When the error is large, the presence of spherical

aberration invalidates this approach. Continue reading.

120 Chapter 7. Reference pages

RELION

. Warning

You should not edit your STAR ﬁles because your current defocus values are ﬁtted against the initial, slightly

wrong pixel size. Also, you should not use “Manually set pixel size” in the Extraction job. It will make metadata

inconsistent and break Bayesian Polishing. Thus, this option was removed in RELION 3.1.

7.2.2 Cs and the error in the pixel size

Recall that the phase shift due to defocus is proportional to the square of the wave number (i.e. inverse resolution),

while that due to spherical aberration is proportional to the forth power of the wave number. At lower resolutions, the

defocus term dominates and errors in the pixel size (i.e. errors in the wave number) can be absorbed into the defocus

value ﬁtted at the nominal pixel size. At higher resolution, however, the Cs term becomes signiﬁcant. Since Cs is not

ﬁtted but given at the correct pixel size, the error persists. As the two terms have the opposite sign, the errors sometimes

cancel out at certain resolution shells, leading to a strange bump in the FSC curve. See an example below contributed

from a user. Here the pixel size was oﬀ by about 6 % (truth: 0.51 Å/px, nominal: 0.54 Å/px).

Below is a theoretical consideration. Let’s consider a CTF at defocus 5000 Å, Cs 2.7 mm at 0.51 Å/px. This is shown

in orange. If one thought the pixel size is 0.54 Å/px, the calculated CTF (blue) became quite oﬀ even at 5 Å (0.2).

7.2. Pixel size issues 121

RELION

However, the defocus is ﬁtted (by CtfFind, followed by CtfReﬁne) at 0.54 Å/px, the nominal pixel size. The defocus

became 5526 Å, absorbing the error in the pixel size. This is shown in blue. The ﬁt is almost perfect up to about 3.3

Å. In this region, you can update the pixel size in PostProcess. Beyond this point, the error from the Cs term starts to

appear and the two curves go out of phase. This is why the FSC drops to zero. However, the two curves came into

phase again at about 1.8 Å (0.55)! This is why the FSC goes up again.

122 Chapter 7. Reference pages

RELION

If you reﬁne Cs and defocus simultaneously, the error in the pixel size is completely absorbed and the ﬁt becomes

perfect. Notice that the reﬁned Cs is 3.39, which is 2.7 * (0.54 / 0.51)^4. Also note that the reﬁned defocus 5606 Å is

5000 * (0.54 / 0.51)^2.

7.2. Pixel size issues 123

RELION

In practice, one just needs to run CtfRefine twice: ﬁrst with Estimate 4th order aberrations to reﬁne Cs,

followed by another run for defocus. Note that rlnSphericalAberration remains the same. The error in Cs is

expressed in rlnEvenZernike. You should never edit the pixel size in the STAR ﬁle!

Now everything is consistent at the nominal pixel size of 0.54 Å/px. In PostProcess, one should specify 0.51 Å/px to

re-scale the resolution and the header of the output map.

7.2.3 How can I merge datasets with diﬀerent pixel sizes?

First of all: it is very common that one of your datasets is signiﬁcantly better (i.e. thinner ice) than the others and merging

many datasets does not improve resolution. First process datasets individually and then merge the most promising two.

If it improves the resolution, merge the third dataset. Combining millions of bad particles simply because you have

them is a very bad idea and waste of storage and computational time!

For RELION 3.0, please see an excellent explanation posted to CCPEM by Max Wilkinson.

From RELION 3.1, you can reﬁne particles with diﬀerent pixel sizes and/or box sizes. Suppose you want to bring

polished (shiny) particles from the dataset2 project into the dataset1 project.

First, go to the Polish directory within the dataset1 project. Make a symbolic link to dataset2’s Polish/jobXXX by a

full path. A link by a relative path sometimes causes unexpected problems in RELION when it tries to expand paths.

If you are unlucky and the job number jobXXX is already present in the Polish directory of the dataset1 project, you

have to make a link by another name. In this case, you have to replace the ﬁle names in dataset2’s shiny.star.

Next, make sure they have diﬀerent rlnOpticsGroupName. For example:

dataset1’s shiny.star:

124 Chapter 7. Reference pages

RELION

data_optics

loop_

_rlnOpticsGroup #1

_rlnOpticsGroupName #2

_rlnAmplitudeContrast #3

_rlnSphericalAberration #4

_rlnVoltage #5

_rlnImagePixelSize #6

_rlnMicrographOriginalPixelSize #7

_rlnImageSize #8

_rlnImageDimensionality #9

1 dataset1 0.100000 2.700000 300.000000 1.000000 1.000000␣

˓→ 140 2

dataset2’s shiny.star:

data_optics

loop_

_rlnOpticsGroup #1

_rlnOpticsGroupName #2

_rlnAmplitudeContrast #3

_rlnSphericalAberration #4

_rlnVoltage #5

_rlnImagePixelSize #6

_rlnMicrographOriginalPixelSize #7

_rlnImageSize #8

_rlnImageDimensionality #9

1 dataset2 0.100000 2.700000 300.000000 1.100000 1.100000␣

˓→ 128 2

Then use JoinStar. You don’t have to import; just go above the .Nodes directory to see the ﬁle from dataset2.

The result should look like:

data_optics

loop_

_rlnOpticsGroup #1

_rlnOpticsGroupName #2

_rlnAmplitudeContrast #3

_rlnSphericalAberration #4

_rlnVoltage #5

_rlnImagePixelSize #6

_rlnMicrographOriginalPixelSize #7

_rlnImageSize #8

_rlnImageDimensionality #9

1 dataset1 0.100000 2.700000 300.000000 1.000000 1.000000␣

˓→ 140 2

2 dataset2 0.100000 2.700000 300.000000 1.100000 1.100000␣

˓→ 128 2

Note that the dataset2’s rlnOpticsGroup has been re-numbered to 2.

7.2. Pixel size issues 125

RELION

If two datasets came from diﬀerent detectors and/or had very diﬀerent pixel sizes, you might want to apply MTF

correction during reﬁnement. To do this, add two more columns: rlnMtfFileName to specify the MTF STAR ﬁle (the

path is relative to the project directory) and rlnMicrographOriginalPixelSize to specify the detector pixel size

(i.e. before down-sampling during extraction).

Reﬁne this combined dataset. For a reference and mask, you must use the pixel size and box size of the ﬁrst optics group

(or use --trust_ref_size option). The output pixel size and the box size will be the same as the input reference

map. After reﬁnement, run CtfRefine with Estimate anisotropic magnification: Yes. This will reﬁne the

relative pixel size diﬀerence between two datasets. In the above example, the nominal diﬀerence is 10 %, but it might

be actually 9.4 %, for example. Then run Reﬁne3D again. The absolute pixel size of the output can drift a bit. It needs

to be calibrated against atomic models.

You can do similar things for other job types as long as you make all relevant paths consistent. For example, to run

Polish in a new project directory, you have to make sure the paths to raw movies, the gain reference and the motion

correction job are valid as well as paths to the particles and the references. For Polishing, do NOT merge MotionCorr

STAR ﬁles. First, run Polishing on one of the MotionCorr STAR ﬁles with run_data.star that contains all particles.

This will process and write only particles from micrographs present in the given MotionCorr STAR ﬁle. Repeat this for

the other MotionCorr STAR ﬁles. Finally, join two shiny.star ﬁles from the two jobs. The --only_group option

is not the right way to do Polishing on combined datasets.

7.2.4 How can I estimate the absolute pixel size of a map?

Experimentally and ideally, one can use diﬀraction from calibration standards.

Computationally, one can compare the map and a reﬁned atomic model. However, if the model has been reﬁned against

the same map, the model might have been biased towards the map. Also note that bond and angle RMSDs are not always

reliable when restraints are too strong.

If you are sure that 𝐶𝑠

true

given by the microscope manufacturer is accurate, which is often the case, AND the acceler-

ation voltage is also accurate, you can optimize the pixel size such that the apparent Cs ﬁtted at the pixel size becomes

𝐶𝑠

true

First, ﬁnd out the 𝑍

coeﬃcient. This is the 7th number in rlnEvenZernike. The contribution of the spherical

aberration to the argument of the CTF’s sine function is 1/2𝜋𝜆

𝑘

𝐶𝑠, where k is the wave-number and 𝜆 is the

relativistic wavelength of the electron (0.0196 Å for 300 kV and 0.0251 Å for 200 kV). 𝑍

= 6𝑘

− 6𝑘

+ 1. By

comparing the 𝑘

term, we ﬁnd the correction to the Cs is 12𝑍

/(𝜋𝜆

). Note that −6𝑘

+ 1 terms are cancelled by

the lower-order Zernike coeﬃcients and can be ignored. Thus,

𝐶𝑠

apparent

= 𝐶𝑠

nominal

+ 12𝑍

* 10

−7

/(𝜋𝜆

−7

is to convert Cs from Å to mm. 𝐶𝑠

nominal

is what you used in CTFFIND (e.g. 2.7 mm for Titan and Talos).

The real pixel size can be calculated as NominalPixelSize(𝐶𝑠

true

/𝐶𝑠

apparent

)

1/4

7.2.5 Examples

In RELION 3.1, relion_refine (Reﬁne3D, Class3D, MultiBody) stretches, shrinks, pads and/or crops input particles

so that the output has the same nominal pixel size and the box size as the input reference. Let’s study what happens in

various cases.

126 Chapter 7. Reference pages

RELION

One optics group

Suppose your optics group says 1.00 Å/px. If the header of your reference also says 1.00 Å/px, particles are used

without any stretching or shrinking. The output remains 1.00 Å/px.

Let’s assume that the nominal pixel size of 1.00 Å/px turns out slightly oﬀ and the real pixel size is 0.98 Å/px. As

discussed above, you should never change the pixel size in the optics group table, since all CTF parameters, coor-

dinates and trajectories are consistent with the nominal pixel size of 1.00 Å/px. When the header of your reference is

1.00 Å/px, the output map header says 1.00 Å/px, but it is actually 0.98 Å/px. Thus, you should specify 0.98 Å/px in

PostProcess. The post-processed map will have 0.98 Å/px in the header and the X-axis of the FSC curve will be also

consistent with 0.98 Å/px.

You must not use this 0.98 Å/px map from PostProcess in future Reﬁne3D/MultiBody/Class3D jobs. Otherwise,

relion_refine stretches nominal 1.00 Å/px particles into 0.98 Å/px. Thus, the output map will have the nominal

pixel size of 0.98 Å/px but the actual pixel size will be 0.98 * 0.98 / 1.00 = 0.96 Å/px!

Multiple optics groups

Suppose your have two optics groups, one at 1.00 Å/px and the other at 0.95 Å/px. If the header of your reference is

1.00 Å/px, the particles in the ﬁrst group are used without any stretching or shrinking, while those from the second

group are shrunk to match 1.00 Å/px. The output map is at 1.00 Å/px.

Let’s assume that the real pixel size of the ﬁrst group is exactly at 1.00 Å/px, while that of the second group is actually

0.90 Å/px. After stretching by the ratio between the nominal pixel size and the pixel size in the reference header, the

true pixel size for the second group corresponds to 0.90 / 0.95 * 1.00 = 0.947 Å/px. Thus, the reconstruction is a

mixture of particles at 1.00 Å/px and 0.947 Å/px. Depending on the number and signal-to-noise ratio of particles in

each group, the real pixel size of the output can be any value between 0.947 and 1.00 Å/px, while the output header

says 1.00 Å/px.

Anisotropic magniﬁcation reﬁnement in CtfReﬁne ﬁnds the relative pixel size diﬀerences between particles and the

reference. Suppose the actual pixel size of the output from Reﬁne3D is 0.99 Å/px and there is no anistropic magni-

ﬁcation. CtfReﬁne ﬁnds the particles in the ﬁrst group look smaller in real space than expected and puts 0.9900 0

0 0.9900 in rlnMatrix00, 01, 10, 11 (0.9900 = 0.99 / (1.00 * 1.00 / 1.00)). The particles in the second group

look larger in real space than expected and rlnMatrix00 to rlnMatrix11 will be 1.045 0 0 1.045 (= 0.99 / (0.90

/ 0.95 * 1.00)).

Notice that these numbers do not give you the absolute pixel size (e.g. 0.95 / 1.045 = 0.9091 ̸= 0.900). However,

you can work out the pixel size of the reconstruction and other optics groups if you are conﬁdent with the pixel size

of one optics group. You can also work out the pixel size of all optics groups if you calibrate the pixel size of the

reconstruction with atomic models. Carefully follows two streching/shrinking steps done in relion_refine: one to

match the nominal pixel size of an optics group to the nominal pixel size of the reference and the further correction by

rlnMatrix.

7.3 Automation: Schemes

Schemes were introduced to relion-3.1, where they were called Schedules but to prevent confusion with scheduled jobs

in the main GUI, they were renamed to Schemes in relion-4.0 and their functionality was further improved. Schemes aim

to provide a generalised methodology for automatic submission of relion jobs. This is useful for creating standardised

workﬂows, for example to be used in on-the-ﬂy processing. The relion_it.py script that was introduced in relion-3.1

has been re-written to work with the Schemes.

The Schemes framework is built around the following key concepts: a directed graph that represents the logic of a series

of subsequent relion job-types is encoded in Nodes and Edges. Nodes can be either a relion job or a so-called Operator;

Edges form the connections between Nodes. In addition, Schemes have their own Variables.

7.3. Automation: Schemes 127

RELION

All information for each Scheme is stored in its own subdirectory of the Schemes/ directory in a relion project, e.g.

Schemes/prep. Within each Scheme’s directory, the scheme.star ﬁle contains information about all the Variables,

Edges, Operators and Jobs.

7.3.1 Variables

Three diﬀerent types of Variables exist: ﬂoatVariables are numbers; booleanVariables are either True or False; and

stringVariables are text. Each Variable has a VariableName; a so-called VariableResetValue, at which the value is

initialised; and a VariableValue, which may change during execution of the Scheme through the actions of Operators,

as outlined below.

One special stringVariable is called email. When this is set, upon completion or upon encountering an error, the

Scheme will send an email (through the Linux mail command) to the value of the email stringVariable.

7.3.2 Jobs

Jobs are the ﬁrst type of Node. They can be of any of the jobtypes deﬁned in the relion pipeliner, i.e. Import

Motion correction , etc, including the new External . Any Variable deﬁned in the Scheme can be set as a parameter

in a Job, by using two dollar signs on the GUI or in the job.star ﬁle. For example, one could deﬁne a ﬂoatVariable

voltage and use $$voltage on the corresponding input line of an Import job. Upon execution of the job inside the

Scheme, the $$voltage will be replaced with the current value of the voltage ﬂoatVariable.

Jobs within a Scheme each have a JobName and a JobNameOriginal. The latter is deﬁned upon creation of the job (see

next section); the former depends on the execution status of the Scheme, and will be set to the executed relion job’s

name, e.g. CtfFind/job004. In addition, each job has a JobMod and a jobHasStarted status. There are two types of

JobMode:

new

regardless of jobHasStarted, a new job will be created, with its own new JobName, every time the Schemer

passes through this Node.

continue

if jobHasStarted is False, a new job, with its own new JobName, will be created. If jobHasStarted is True, the

job will be executed as a continue job inside the existing JobName directory.

When a Scheme executes a Job, it always sets jobHasStarted to True. When a Scheme is reset, the jobHasStarted status

for all jobs is set to False.

7.3.3 Operators

Operators are the second type of Node. Each operator within a Scheme has a unique name and a type. Operators can

also have an output Variable: output, on which they act, and up to two input Variables: input1 and input2. Most, but

not all operators change the value of their output Variable.

The following types of operators act on an output that is a ﬂoatVariable:

float=set

output = ﬂoatVariable input1

float=plus

output = ﬂoatVariable input1 + ﬂoatVariable input2

float=minus

output = ﬂoatVariable input1 - ﬂoatVariable input2

128 Chapter 7. Reference pages

RELION

float=mult

output = ﬂoatVariable input1 × ﬂoatVariable input2

float=divide

output = ﬂoatVariable input1 / ﬂoatVariable input2

float=round

output = ROUND(ﬂoatVariable input1)

float=count_images

sets output to the number of images in the STAR ﬁle with the ﬁlename in stringVariable input1. stringVariable

input2 can be particles, micrographs or movies, depending on what type of images need to be counted.

float=count_words

sets output to the number of words in stringVariable input1, where individual words need to be separated with a

, (comma) sign.

float=read_star

sets output to the value of a double or integer that is read from a STAR ﬁle. stringVariable input1 deﬁnes

which variable to read as: starﬁlename,tablename,metadatalabel. If tablename is a table instead of a list, then

ﬂoatVariable input2 deﬁnes the line number, with the default of zero being the ﬁrst line.

float=star_table_max

sets output to the maximum value of a column in a starﬁle table, where stringVariable input1 speciﬁes the column

as starﬁlename,tablename,metadatalabel.

float=star_table_min

sets output to the minimum value of a column in a starﬁle table, where stringVariable input1 speciﬁes the column

as starﬁlename,tablename,metadatalabel.

float=star_table_avg

sets output to the average value of a column in a starﬁle table, where stringVariable input1 speciﬁes the column

as starﬁlename,tablename,metadatalabel.

float=star_table_sort_idx

a sorting will be performed on the values of a column in a starﬁle table, where stringVariable input1 speciﬁes the

column as starﬁlename,tablename,metadatalabel. stringVariable input2 speciﬁes the index in the ordered array:

the lowest number is 1, the second lowest is 2, the highest is -1 and the one-but-highest is -2. Then, output is set

to the corresponding index in the original table.

The following types of operators act on an output that is a booleanVariable:

bool=set

output = booleanVariable input1

bool=and

output = booleanVariable input1 AND booleanVariable input2

bool=or

output = booleanVariable input1 OR booleanVariable input2

bool=not

output = NOT booleanVariable input1

bool=gt

output = ﬂoatVariable input1 > ﬂoatVariable input2

bool=lt

output = ﬂoatVariable input1 < ﬂoatVariable input2

bool=ge

output = ﬂoatVariable input1 >= ﬂoatVariable input2

7.3. Automation: Schemes 129

RELION

bool=le

output = ﬂoatVariable input1 <= ﬂoatVariable input2

bool=eq

output = ﬂoatVariable input1 == ﬂoatVariable input2

bool=file_exists

output = True if a ﬁle with the ﬁlename stored in stringVariable input1 exists on the ﬁle system; False otherwise

bool=read_star

reads output from a boolean that is stored inside a STAR ﬁle. stringVariable input1 deﬁnes which variable to

read as: starﬁlename,tablename,metadatalabel. If tablename is a table instead of a list, then ﬂoatVariable input2

deﬁnes the line number, with the default of zero being the ﬁrst line.

The following types of operators act on an output that is a stringVariable:

string=set

output = stringVariable input1

string=join

output = concatenate stringVariable input1 and stringVariable input2

string=before_first

sets output to the substring of stringVariable input1 that occurs before the ﬁrst instance of substring stringVari-

able input2.

string=after_first

sets output to the substring of stringVariable input1 that occurs after the ﬁrst instance of substring stringVariable

input2.

string=before_last

sets output to the substring of stringVariable input1 that occurs before the last instance of substring stringVariable

input2.

string=after_last

sets output to the substring of stringVariable input1 that occurs after the last instance of substring stringVariable

input2.

string=read_star

reads output from a string that is stored inside a STAR ﬁle. stringVariable input1 deﬁnes which variable to read

as: starﬁlename,tablename,metadatalabel. If tablename is a table instead of a list, then ﬂoatVariable input2

deﬁnes the line number, with the default of zero being the ﬁrst line.

string=glob

output = GLOB(stringVariable input1), where input1 contains a Linux wildcard and GLOB is the Linux function

that returns all the ﬁles that exist for that wildcard. Each existing ﬁle will be separated by a comma in the output

string.

string=nth_word

output = the Nth substring in stringVariable input1, where N=*ﬂoatVariable* input2, and substrings are separated

by commas. Counting starts at one, and negative values for input2 mean counting from the end, e.g. input2=-2

means the second-last word.

The following types of operators do not act on any variable:

touch_file

performs touch input1 on the ﬁle system

copy_file

performs cp input1 input2 on the ﬁle system. stringVariable input1 may contain a linux wildcard. If string-

Variable input2 contains a directory structure that does not exist yet, it will be created.

130 Chapter 7. Reference pages

RELION

move_file

performs mv input1 input2 on the ﬁle system. stringVariable input1 may contain a linux wildcard. If string-

Variable input2 contains a directory structure that does not exist yet, it will be created.

delete_file

performs rm -f input1 on the ﬁle system. stringVariable input1 may contain a linux wildcard.

sends an email, provided a stringVariable with the name email exists and the Linux command mail is functional.

The content of the email has the current value of stringVariable input1, and optionally also stringVariable input2.

wait

waits ﬂoatVariable input1 seconds since the last time this operator was executed. The ﬁrst time it is executed,

this operator only starts the counter and does not wait. Optionally, if output is deﬁned as a ﬂoatVariable, then

the elapsed number of seconds since last time is stored in output.

exit_maxtime

terminates the execution of the Scheme after the number of hours have passed since its start as stored in ﬂoat-

Variable input1.

exit

terminates the execution of the Scheme.

7.3.4 Edges

Two types of Edges exist. The ﬁrst type is a normal Edge, which connects an inputNode to an ouputNode, thereby

deﬁning their consecutive execution.

The second type is called a Fork. A Fork has one inputNode, an outputNode, an outputNodeIfTrue, and an associated

booleanVariable. Whether one or the other output Node is executed depends on the current value of the booleanVariable

that is associated with the Fork. The fork with lead from the inputNode, an outputNode if the booleanVariable is False.

The fork will lead from the inputNode, an outputNodeIfTrue if the booleanVariable is True. Thereby, Forks are the

main instrument of making decisions in Schemes.

7.3.5 Create a Scheme

The combination of the Variables, Nodes and Edges allows one to create complicated sequences of jobs. It is probably

a good idea to draw out a logical ﬂow-chart of your sequence before creating a Scheme. Then, use your favourite

text editor to manually edit the ﬁles Schemes/SCHEMENAME/scheme.star and all the ﬁles Schemes/SCHEMENAME/

JOBNAMES/job.star for all the jobs in that Scheme. Following the prep and proc examples in the scripts directory

of your relion installation is probably the easiest way to get started.

In the Schemes/SCHEMENAME/scheme.star ﬁle, ﬁrst add all the diﬀerent variables and operators that you will need.

Note that any variable names that contain a JobNameOriginal of any of the Jobs inside any Scheme that is present

in the ProjectDirectory, will be replaced by the current JobName upon execution of an operator. For example, a

stringVariable with the value Schemes/prep/ctffind/micrographs_ctf.star will be replaced to something like

CtfFind/job003/micrographs_ctf.star upon execution of the job that uses it, assuming that the current JobName

of that job is CtfFind/job003/ in the Schemes/prep/scheme.star ﬁle.

Then, add your jobs. You can use the normal relion GUI to ﬁll in all parameters of each job that you need and then use the

Save job.star options from the Jobs menu to save the job.star ﬁle in the corresponding Schemes/SCHEMENAME/

JOBNAMES/ directory. Jobs use the same mechanism as described for the Variables above. So, if an Auto-picking job

depends on its micrographs STAR ﬁle input on a CTF estimation job called ctﬃnd, and this CTF estimation job is part of

a Scheme called prep, then the micrographs STAR ﬁle input for the Auto-picking job should be set to Schemes/prep/

ctffind/micrographs_ctf.star, and this will be converted to CtfFind/job003/micrographs_ctf.star upon

7.3. Automation: Schemes 131

RELION

execution of the job. In addition, a corresponding edge will be added to the default_pipeliner.star upon execution

of the Scheme. Also note that parameters in job.star ﬁles may be updated with the current values of Variables from

the Scheme by using the $$ preﬁx, followed by the name of the corresponding Variable, as also mentioned above.

In addition, the JobMode needs to be chosen from options: new or continue. Typically, in on-the-ﬂy-processing

procedures that iterate over ever more movies, jobs like Import , Motion correction , CTF estimation , Auto-picking

and Particle extraction are set as continue, whereas most other jobs are set as new.

Finally, once all the Variables, Operators and Jobs are in place, one should deﬁne all the Edges between them.

The Scheme will be initialised (and reset) to the left-hand Node of the ﬁrst deﬁned Edge. If the Scheme is not an inﬁnite

loop, it is recommended to add the exit Operator as the last Node.

Once a Scheme has been created, it may be useful for more than one relion project. Therefore, you may want to store

it in a tar-ball:

tar -zcvf preprocess_scheme.tar.gz Schemes/preprocess

That tar-ball can then be extracted in any new relion project directory:

tar -zxvf preprocess_scheme.tar.gz

Executing a Scheme

Once you have created the Scheme/name/ (sub)directories (with “name” being the name of your scheme), you can

launch a separate GUI using:

relion_schemegui.py name

You can start, stop, change parameters, and restart the scheme from there. You can also look into this python script to

see the actual calls it makes to relion_schemer, which is the command line program that executes the scheme. While it

runs, you can then follow the generation of new jobs in the normal relion GUI.

7.4 Helical reconstruction

Shaoda He, a PhD-student in the Scheres group, implemented a workﬂow for the processing of helical assem-

blies. This involves additional tabs to the parameter-panels of the Auto-picking , Particle extraction , 2D classiﬁcation ,

3D classiﬁcation , 3D auto-reﬁne , Particle polishing and Mask create job-types. We do not have a separate tutorial for

processing helical assemblies. The general principles remain the same as for single-particle analysis, which is covered

in this tutorial. Therefore, users intending to use relion for helical processing are still encouraged to do this tutorial

ﬁrst. For a detailed description of the helical options, the user is referred to the corresponding pages on the RELION

wiki, or to Shaoda’s paper[HS17]. We are aware that a tutorial on helical processing is probably overdue, but due to

time constraints we haven’t got to doing that yet. Sorry. ..

132 Chapter 7. Reference pages

RELION

7.4.1 Initial model generation for amyloids

The relion_helix_inimodel2d program is a new feature in relion-3.1. It allows generation of an initial 3D reference

for helical reconstruction, in particular for amyloids. It is run from the command line, and takes a selection of suitable

2D class averages as input. It will try to align these class averages with respect to each other to form a continuously

changing density that spans an entire cross-over. At the heart of the iterative reﬁnement process lies a tomographic 2D

reconstruction with all 1D pixel columns from the cross-over. Details of this program, together with a more elaborate

documentation of its functionality remain to be published. Possible usage is:

relion_helix_inimodel2d --i Select/job056/class_averages.star \

--crossover_distance 800 --angpix 1.15 --maxres 9 --search_shift 3 \

--mask_diameter 250 --j 6 --iter 5 --o IniModel/run1

relion_helix_inimodel2d --i IniModel/run1_it005.star \

--iniref 1@IniModel/run1_it005_reconstructed.mrcs --angpix 1.15 \

--maxres 6 --search_angle 2 --step_angle 0.5 --mask_diameter 250 \

--j 6 --iter 5 --o IniModel/run2

We like xmipp-2.4 for live-updated display of the following images during the optimisation:

xmipp_show -img rec.spi after_reproject.spi before_reproject.spi -poll &

7.5 Subtomogram Averaging

7.5.1 Data Types

The custom data types used in tomography programs in relion are:

Tomogram set

The tomogram set describes all the tomograms (i.e. tilt series) deﬁned in a project, and is usually named tomograms.

star. It contains a single table, named global, which deﬁnes the properties pertaining to each tomogram. Each row

corresponds to one tomogram, and it lists the following properties:

• The tomogram name (rlnTomoName): a unique identiﬁer used to refer to this tomogram within the entire project

(i.e. within the tomogram set, the particle set and the trajectory set)

• The tilt series ﬁle name (rlnTomoTiltSeriesStarFile): the path to the star ﬁle corresponding to each to-

mogram/image stack

• The tomogram size (rlnTomoSize<X/Y/Z>): the extent of the tomogram region in 3D space

• The handedness (rlnTomoHand): this value is either +1 or -1, and it describes whether the focus increases or

decreases as a function of depth (projected Z coordinate). It cannot be known a priori and has to be determined

experimentally. It is typically identical across the entire data set.

• The pixel size of the raw micrographs, given in Å (rlnMicrographOriginalPixelSize)

• The pixel size of the tilt series, given in Å (rlnTomoTiltSeriesPixelSize), corresponding to the dimensions

given by rlnTomoSize<X/Y/Z>, after the binning applied in the Motion correction job

• The tomogram binning (rlnTomoTomogramBinning): the binning applied to the tilt series in

rlnTomoTiltSeriesStarFile to reconstruct the tomogram in rlnTomoReconstructedTomogram

7.5. Subtomogram Averaging 133

RELION

• The reconstructed tomogram ﬁle (rlnTomoReconstructedTomogram): the path to the reconstructed tomo-

gram mrc ﬁle, at the binning in rlnTomoTomogramBinning

• The Etomo ﬁle (rlnEtomoDirectiveFile)

• The optics group name (rlnOpticsGroupName)

• The voltage, given in kV (rlnVoltage)

• The spherical aberration, given in mm^-1 (rlnSphericalAberration)

• The amplitude contrast (rlnAmplitudeContrast)

The star ﬁle for each tomogram, to which rlnTomoTiltSeriesStarFile points, contains a single table with the

same name as the tomogram. In it, each row corresponds to one tilt image and includes the following properties,

among others:

• The tilt image rotation angles and shifts (rlnTomo<X/Y>Tilt, rlnTomoZRot and rlnTomo<X/

Y>ShiftAngst)

• The astigmatic defocus (rlnDefocus<U/V> and rlnDefocusAngle)

• The intensity scale factor related to eﬀective ice thickness at each tilt angle (rlnCtfScalefactor)

• The cumulative radiation dose (rlnMicrographPreExposure)

• The frame count (rlnTomoTiltMovieFrameCount)

Particle set

The particle set (typically named particles.star) lists all the particles in a data set. It is roughly equivalent to

the corresponding particles ﬁle used by the SPA programs in relion, and it serves as the binding element between the

tomography programs and the ones used for the general reﬁnement.

Like an SPA particle set, the tomography particle set consists of an optics table and a particles table. Unlike in

an SPA particle set, the coordinates of the particles are given in 3 dimensions (rlnCenteredCoordinate<X/Y/

Z>Angst and rlnOrigin<X/Y/Z>Angst), and each particle also requires a tomogram name (rlnTomoName) that

identiﬁes the particle’s tomogram of origin, as deﬁned in the tomogram set. Note that in Relion 5, the pixel coordinates

rlnCoordinate<X/Y/Z> have been replaced by the absolute and centered coordinates rlnCenteredCoordinate<X/

Y/Z>Angst, which are in Angstroms from the centre of the tomogram.

Also unlike in SPA, the tomography particles should not be thought of as pixel data stored on disk, but instead as

abstract entities deﬁned in this ﬁle. They are represented on the disk either as 2D stacks of CTF-premultiplied ex-

tracted images (the preferred way in Relion 5) or as 3D volumes where Fourier slices of the 2D images are combined

in 3D. The particles that are used for reﬁnement (i.e. the pseudo-subtomograms) can be created using the program

relion_tomo_subtomo, but they should be regarded as transient data that become invalid once any of the other prop-

erties of their tomograms have been updated. Therefore, during the processing of a tomography data set, the pseudo-

subtomograms are typically created multiple times.

Another diﬀerence to a SPA 2D particle set is that tomography particles can optionally possess two diﬀerent ori-

entations. In tomography, the approximate orientation of a particle is often dictated by the geometric context (e.g.

the orientation of a membrane in which they are embedded). Therefore, the subtomogram alignment can be greatly

accelerated by deﬁning the initial orientation of each subtomogram in relation to that geometry (using the ﬁelds

rlnTomoSubtomogram<Rot/Tilt/Psi>) and then applying strong priors during reﬁnement to constrain the angles

that describe the orientation of the particle itself with respect to its subtomogram (rlnAngle<Rot/Tilt/Psi>). Since

only the latter angles are considered by relion_refine, the priors will only aﬀect those angles. Typically, this means

that the rlnAngleTilt angle will be constrained, while the other two are usually unknown, even in tomography.

134 Chapter 7. Reference pages

RELION

Trajectory set

The trajectory set (typically named motion.star) contains the motion trajectories of all particles in a corresponding

particle set. The trajectories can be estimated using the program relion_tomo_align, and they will be considered by

any relion tomography program that requires particle positions.

The trajectory ﬁle consists of P+1 tables, where P is the number of particles in the corresponding particle set. The ﬁrst

table only contains the number of particles, while every successive table, named using the particle name (the same as

the value of rlnTomoParticleName in the particle set ﬁle) lists the trajectory of each particle. The trajectories are

given in Å, and they are relative to the position of the particle in the chronologically ﬁrst tilt image (i.e. the image with

the lowest cumulative dose).

Optimisation set

The optimisation set is the central data type in a relion tomography project, and it only contains paths to other ﬁles. It

is usually named optimisation_set.star.

Speciﬁcally, it can point to the following data ﬁles:

• A tomogram set listing all tomograms.

• A particle set listing all particles.

• A trajectory set describing the motion of all particles.

All of those entries are optional, and diﬀerent programs require diﬀerent inputs. All programs take an optimisation

set as input, and most of them also write one out. If a data ﬁle is created or updated by a program, it is added to the

optimisation set that the program writes out. This frees the user from having to track which programs update which

data ﬁles. For example, relion_tomo_reﬁne_ctf will only update the tomogram set, because the defoci are only deﬁned

once for each tilt image, while relion_tomo_align will update the tomogram set, the particle set and the trajectory set.

When running a program, the individual data ﬁles can also be speciﬁed separately, in which case the individual ﬁles will

override the ones listed in the optimisation set. This allows the user to easily perform the same procedure with speciﬁc

data ﬁles exchanged. If all data ﬁles required by a program have been speciﬁed individually, then an optimisation set

is not required at all.

There are 3 ways to create an initial optimisation set:

• Use the utility program relion_tomo_make_optimisation_set.

• Run any program and specify the individual data ﬁles. The program will then output a new optimisation set.

• Write one manually.

7.5.2 Programs

These are the most relevant tomography-related programs in relion:

7.5. Subtomogram Averaging 135

RELION

relion_tomo_reconstruct_particle

This program constructs high-resolution 3D reference maps from a given optimisation set. It is similar to

relion_reconstruct, except that it does not require pre-extracted particle images. Instead, it extracts square win-

dows of arbitrary size (--b argument) directly from the tilt series and it uses those for the reconstruction (using Fourier

inversion). It considers the defoci of the individual particles, as well as particle motion and higher-order aberrations.

The output consists of a pair of maps (one for each half set) as well as a merged map. In addition, the numerator (data

term) and denominator (weight term) (see Eq. 3 in the |RELION| paper) of all 3 maps are also written out to facilitate

further processing.

Relevant program arguments:

• --i and/or --p, --t and --mot: input optimisation-set and/or its components (see optimisation set).

• --b: box size of the reconstruction. Note that this is independent of the box size used to reﬁne the particle.

This allows the user to construct a 3D map of arbitrary size to gain an overview of the structure surrounding the

particle. A suﬃciently large box size also allows more of the high-frequency signal to be captured that has been

delocalised by the CTF.

• --crop: cropped box size. If set, the program will output an additional set of maps that have been cropped to this

size. This is useful if a map is desired that is smaller than the box size required to retrieve the CTF-delocalised

signal.

• --bin: downsampling (binning) factor. Note that this does not alter the box size. The reconstructed region

instead becomes larger.

• --SNR: apply a Wiener ﬁlter with this signal-to-noise ratio. If omitted, the reconstruction will use a heuristic to

prevent divisions by excessively small numbers. Please note that using a low (even though realistic) SNR might

wash out the higher frequencies, which could make the map unsuitable to be used for further reﬁnement.

• --sym: name of the symmetry class (e.g. C6 for six-fold point symmetry).

• --mem: (approximate) maximum amount of memory to be used for the reconstruction (see below).

• --j: number of threads used for the non-reconstruction parts of the program (e.g. symmetry application or

gridding correction). This should be set to the number of CPU cores available.

• --j_out: number of threads that compute partial reconstructions in parallel. This is faster, but it requires

additional memory for each thread. When used together with the --mem argument, this number will be reduced

to (approximately) maintain the imposed memory limitation.

• --j_in: number of threads to be used for each partial reconstruction. This is a slower way to parallelise the

procedure, but it does not require additional memory. Unless memory is limited, the --j_out option should be

preferred. The product of --j_out and --j_in should not exceed the number of CPU cores available.

• --o: name of the output directory (that will be created).

136 Chapter 7. Reference pages

RELION

Program output:

After running the program, the output directory (--o argument) will contain the following items:

• <half1/half2/merged>.mrc: the reconstructed cropped maps.

• <half1/half2/merged>_full.mrc: the reconstructed full-size maps, in case the full size (--b argument) diﬀers

from the cropped size (--crop argument).

• data_<half1/half2/merged>.mrc: the reconstructed data term at full size.

• weight_<half1/half2/merged>.mrc: the reconstructed weight term at full size.

• note.txt: a text ﬁle containing the command issued to run the program.

relion_tomo_subtomo

This program creates pseudo subtomograms that can be fed into relion_refine to perform subtomogram alignment.

Those pseudo subtomograms are not meant to represent the actual density distributions of the particles, but instead

abstract 3D image terms that allow relion_refine to approximate the cost function that would arise if the alignment

were to be performed on the stack of corresponding 2D images instead. Speciﬁcally, each pseudo subtomogram consists

of a sum of CTF-weighted observations (data term) as well as the corresponding sum of CTF

terms (weight term). In

addition, the output weight term also contains - concatenated along the Z axis - a 3D multiplicity-image (number of

Fourier slices contributing to each Fourier voxel) that allows relion_refine to estimate a spherically symmetrical

noise distribution.

The reconstructed pseudo subtomograms consider the motion of the particles as well as higher-order aberrations, if

either have been estimated. The program will output a new particle set, as well as a new optimisation set in which the

initial particle set has been replaced by the newly generated one. That particle set can be fed into relion_refine.

If a subtomogram orientation has been deﬁned (see particle set), then the subtomogram will be constructed in that

coordinate system. If the approximate orientation of the particle is known a-priori from the surrounding geometry, this

allows angle priors to be used in relion_refine to avoid searching through implausible orientations. In particular, for

particles sampled from a manifold (using relion_tomo_sample_manifold), the Z axis of the subtomogram coordinate-

system is always perpendicular to the local manifold. A strong tilt-angle prior can then be applied to only search

through orientations that do not tilt the particle too far away from that orientation, while leaving the remaining two

angles unconstrained (i.e. the direction of tilt and the rotation around the Z axis).

Relevant program arguments:

• --i and/or --p, --t and --mot: input optimisation-set and/or its components (see optimisation set).

• --b: initial box size of the reconstruction. A suﬃciently large box size allows more of the high-frequency signal

to be captured that has been delocalised by the CTF.

• --crop: cropped output box size. After construction, the resulting pseudo subtomograms are cropped to this

size. A smaller box size allows the (generally expensive) reﬁnement using relion_refine to proceed more

rapidly.

• --bin: downsampling (binning) factor. Note that this does not alter the initial or the cropped box size. The

reconstructed region instead becomes larger.

• --cone_weight: downweight a cone in Fourier space along the Z axis (as deﬁned by the coordinate system of

the particle). This is useful for particles embedded in a membrane, as it can prevent the alignment from being

driven by the membrane signal (the signal of a planar membrane is localised within one line in 3D Fourier space).

Note that the coordinate system of a particle is given by both the subtomogram orientation (if deﬁned) and the

7.5. Subtomogram Averaging 137

RELION

particle orientation (see particle set). This allows the user to ﬁrst obtain a membrane-driven alignment, and to

then speciﬁcally suppress the signal in that direction.

• --cone_angle: the (full) opening angle of the cone to be suppressed, given in degrees. This angle should

include both the uncertainty about the membrane orientation and its variation across the region represented in

the subtomogram.

• --j: number of threads used to reconstruct pseudo subtomograms in parallel. Each thread will require additional

memory, so this should be set to the number of CPU cores available, unless the memory requirements become

prohibitive.

• --o: name of the output directory (that will be created).

Program output:

After running the program, the output directory (--o argument) will contain the following items:

• particles.star: a new particle set containing the ﬁle names of the reconstructed subtomograms. This ﬁle can be

understood by relion_refine

• optimisation_set.star: a new optimisation set pointing to the new particle set.

• Subtomograms/<**`particle_name`>_data.mrc**: 3D images representing the data terms of all subtomo-

grams.

• Subtomograms/< particle_name >_weights.mrc: the weight terms of all subtomograms. Each 3D image

contains, concatenated along Z, an image describing the sums over all CTF

and an image describing the multi-

plicities.

• note.txt: a text ﬁle containing the command issued to run the program.

relion_tomo_align

This program reﬁnes the projections that map 3D space onto the images of the tilt series, as well as (optionally) also

the beam-induced motion trajectories of the particles. Each projection is optimised using the full 5 degrees of freedom:

the assumption of a common tilt axis is abandoned. Even if no beam-induced motion is estimated, the (in that case

static) 3D particle-positions are also optimised by the program. This is because those 3D positions cannot be assumed

to be known in advance, since they have only been inferred from the observed 2D particle-positions in the individual

tilt images. Therefore, this program always looks for the optimal set of 3D positions and projections.

If the particle motion is also being estimated (using the --motion argument), then the 3D positions of the particles are

allowed to diﬀer between tilt images, albeit only in a limited and spatially smooth manner. This is done analogously to

Bayesian polishing in 2D (implemented in relion_motion_refine), except that the trajectories are estimated in 3D,

and that no acceleration constraint is imposed. Because part of the particle motion is rigid, the motion estimation also

always includes a rigid alignment of the frames.

Please note that estimating beam-induced motion signiﬁcantly increases the runtime of the program, so it should only

be done in the ﬁnal rounds of optimisation.

The estimated particle trajectories are written into the output trajectory set, the projection matrices (containing the rigid

part of the motion as well as the alignment of the tilt series - these cannot be distinguished) into the output tomogram set

and the estimated (initial) particle positions into the output particle set which replace the given trajectory, tomogram

and particle sets in the output optimisation set.

138 Chapter 7. Reference pages

RELION

General program arguments:

• --i and/or --p, --t, --mot, --ref<1/2>, --mask and --fsc: input optimisation-set and/or its components

(see optimisation set).

• --b: box size to be used for the estimation. Note that this can be larger than the box size of the reference map.

A suﬃciently large box size allows more of the high-frequency signal to be captured that has been delocalised

by the CTF.

• --r: maximal assumed error in the initial 2D particle-positions (distances between the projected 3D positions

and their true positions in the images), given in pixels.

• --j: number of threads to be used. This should be set to the number of CPU cores available.

• --o: name of the output directory (that will be created).

Motion-related arguments:

• --motion: perform motion estimation simultaneously with the tilt-image alignment.

• --s_vel: the expected amount of motion (i.e. the std. deviation of particle positions in Å after 1 electron per

of radiation).

• --s_div: the expected spatial smoothness of the particle trajectories (a greater value means spatially smoother

motion).

• --sq_exp_ker: assume that the correlation of the velocities of two particles decays as a Gaussian over their

distance, instead of as an exponential. This will produce spatially smoother motion and result in a shorter program

runtime.

Deformation-related arguments:

• --deformation: perform 2D deformation estimation for each tilt-image.

• --def_w: Number of horizontal sampling points for the deformation grid.

• --def_h: Number of vertical sampling points for the deformation grid.

• --def_model: Type of model to estimate deformations (linear, spline or Fourier).

• --def_reg (0.0): Value of the deformation regulariser

• --per_frame_deformation: Model separate 2D deformations for each tilt-image instead of the whole tilt

series.

Program output:

After running the program, the output directory (--o argument) will contain the following items:

• motion.star: a new trajectory set

• tomograms.star: a new tomogram set containing the newly estimated projection matrices (rlnTomoProj<X/

Y/Z/W>)

• particles.star: a new particle set containing the newly estimated particle positions in 3D (rlnCoordinate<X/

Y/Z>)

• optimisation_set.star: a optimisation set pointing to the new trajectory, tomogram and particle sets.

7.5. Subtomogram Averaging 139

RELION

• Trajectories/< tomogram_name >_x<1/8>.ply: meshes of 3D trajectories in ply format, scaled by a fac-

tor of 1 or 8, respectively. Note that only the trajectories themselves are scaled - their positions are those

of the particles. They can be viewed in relation to (low-magniﬁcation) tomograms as produced by re-

lion_tomo_reconstruct_tomogram in a viewer that supports both meshes and 3D images (such as e.g. Paraview).

• note.txt: a text ﬁle containing the command issued to run the program.

relion_tomo_reﬁne_ctf

This program estimates the astigmatic defoci of the individual tilt images, the ice thickness (which determines the

overall signal intensity) and higher-order optical aberrations.

Defocus: In tomography, the relative depth distances between particles are known from the 3D positions of the particles.

Therefore, only one defocus value is estimated for all the particles in each tilt image. Because of the often large number

of particles in each tomogram, this value can typically be estimated to greater precision than in single-particle analysis,

where the defocus of each particle has to be estimated independently.

Ice-thickness: A thicker sample permits fewer electrons to pass through, which reduces the scale of the signal. In

addition to the actual variation in sample thickness, the eﬀective thickness of the ice also increases as the sample is

tilted. This program allows the user to estimate the signal intensity either independently for each tilt image, or by ﬁtting

the base thickness, the initial beam luminance and the surface normal of the sample assuming Beer-Lambert’s Law.

In the latter case, only one surface normal and base thickness are estimated for an entire tilt series, which allows for a

more stable ﬁt from tomograms with fewer particles.

Higher-order optical aberrations: This algorithm works analogously to relion_ctf_refine for single-particle

analysis. As in single-particle analysis, the aberrations are estimated per optics group. This allows the user to group

particles that are expected to share the same aberrations, either by image region or by subset of tilt series, or both. Both

symmetrical (even) and antisymmetrical (odd) aberrations are supported. A detailed description of the aberrations

estimation algorithm can be found in the relion aberrations paper.

The estimated defocus and ice-thickness values are written into the output tomogram set and the aberrations into the

particle set, both of which replace the given sets in the output optimisation set.

General program arguments:

• --i and/or --p, --t, --mot, --ref<1/2>, --mask and --fsc: input optimisation-set and/or its components

(see optimisation set).

• --b: box size to be used for the estimation. Note that this can be larger than the box size of the reference map.

A suﬃciently large box size allows more of the high-frequency signal to be captured that has been delocalised

by the CTF.

• --j: number of threads to be used. This should be set to the number of CPU cores available.

• --o: name of the output directory (that will be created).

140 Chapter 7. Reference pages

RELION

Defocus-estimation arguments:

• --do_defocus: instructs the program to estimate the defoci.

• --do_reg_defocus: applies defocus regularisation. High-tilt images do not oﬀer enough signal to recover the

defocus value precisely. The regularisation forces the estimated defoci to assume similar values within a given

tilt series, which prevents those high-tilt images from overﬁtting.

• --lambda: defocus regularisation strength.

• --d0 and --d1: minimal and maximal defocus oﬀset (from the initial value) to scan in the initial part of the

defocus estimation. This scan allows the algorithm to escape a local minimum in case it has been intialised in

one. Afterwards, a local astigmatic-defocus reﬁnement is performed. Both values are assumed to be given in Å.

• --ds: number of steps between --d0 and --d1.

• --kmin: Lowest spatial frequency to consider, in Å.

• --reset_to_common: Reset the CTFs of all tilts to a common one prior to local reﬁnement.

Ice-thickness estimation arguments:

• --do_scale: instructs the program to estimate the signal scale or ice thickness.

• --per_frame_scale: estimate the signal-scale parameter independently for each tilt. If not speciﬁed, the ice

thickness, beam luminance and surface normal are estimated instead. Those three parameters then imply the

signal intensity for each frame. Due to the smaller number of parameters, the ice thickness model is more robust

to noise. By default, the ice thickness and surface normal will be estimated per tilt-series, and the beam luminance

globally.

• --per_tomogram_scale: estimate the beam luminance separately for each tilt series. This is not recommended.

Aberrations-estimation arguments:

• --do_even_aberrations: instructs the program to estimate the even (i.e. symmetrical) aberrations. These

deform the shape of the CTF.

• --do_odd_aberrations: instructs the program to estimate the odd (i.e. anti-symmetrical) aberrations. These

rotate the phases of the observed images.

• --ne: number of Zernike bands used to ﬁt the even aberrations. A greater number can quickly lead to overﬁtting.

The user is advised to keep this value at 4 - this will allow for a correction to the spherical aberration term, as

well as four-fold and ﬁrst- and second-order two-fold astigmatism.

• --no: number of odd Zernike bands. The user is advised to keep this value at 3 - this will allow for beam tilt

and three-fold astigmatism.

7.5. Subtomogram Averaging 141

RELION

Program output:

After running the program, the output directory (--o argument) will contain the following items:

• tomograms.star: a new tomogram set containing the newly estimated defoci (rlnDefocus<U/V> and

rlnDefocusAngle) and ice thickness values. Note that only the signal scale (rlnCtfScalefactor) implied

by the ice thickness is used by other programs.

• particles.star: a new particle set containing the estimated aberrations.

• optimisation_set.star: a new optimisation set pointing to the new tomogram and particle sets.

• note.txt: a text ﬁle containing the command issued to run the program.

• group_< optics_group >_<even/odd>_phase_per-pixel.mrc: visualisations showing the optimal phase-shift

for each pixel. Inspecting this allows the user to ensure that the estimated pattern is not mere noise.

• group_< optics_group >_<even/odd>_phase_nonlinear-ﬁt.mrc: visualisations of the ﬁts using Zernike

polynomials, which will be used by the other programs.

relion_tomo_local_particle_reﬁne

This program optimises the viewing angles and 3D positions of all particles. This is done by following the local gradient

of the L2 error function downhill, so it is signiﬁcantly faster than a full reﬁnement using relion_refine. Currently,

the results are slightly worse than those from relion_refine, though.

The estimated particle angles and positions are written into the output particle set which replaces the given particle set

in the output optimisation set.

Relevant program arguments:

• --i and/or --p, --t, --mot, --ref<1/2>, --mask and --fsc: input optimisation-set and/or its components

(see optimisation set).

• --b: box size to be used for the estimation. Note that this can be larger than the box size of the reference map.

A suﬃciently large box size allows more of the high-frequency signal to be captured that has been delocalised

by the CTF.

• --j: number of threads to be used. This should be set to the number of CPU cores available.

• --o: name of the output directory (that will be created).

Program output:

After running the program, the output directory (--o argument) will contain the following items:

• particles.star: a new particle set containing the newly estimated angles and positions.

• optimisation_set.star: a new tomogram set pointing to the new particle set.

• note.txt: a text ﬁle containing the command issued to run the program.

142 Chapter 7. Reference pages

RELION

relion_tomo_ﬁt_blobs_3d

This program reﬁnes spherical manifolds to better ﬁt the double membranes of vesicles, producing spheroidal mani-

folds. The resulting manifold will run along the outer interface of the outer membrane, where the signal is the strongest.

These spheroidal manifolds can then be used to sample random particles at a speciﬁc depth and with an appropriate

initial orientation using the program relion_tomo_sample_manifold.

The initial spherical manifolds need to be part of the input manifold set that can be created using the program re-

lion_tomo_add_spheres. The estimated spheroids are written into the output manifold set which replaces the given

one in the output optimisation set. Note: the user is recommended to ﬁrst detect all ﬁducials in the input tomo-

grams, since their strong signal can disturb the membrane estimation. This can be done using the program re-

lion_tomo_ﬁnd_ﬁducials.

Relevant program arguments:

• --i and/or --t and --man: input optimisation-set and/or its components (see optimisation set).

• --ms: approximate membrane separation in Å.

• --rr: the expected variation of the vesicle radius as a multiple of the initial sphere radius.

• --frad: approximate radius of the ﬁducial markers (required for their deletion).

• --bin: binning level at which to perform the membrane estimation. This needs to allow multiple pixels of

separation between the two membranes.

• --n: number of Spherical Harmonics bands used to represent the spheroids. A greater number will allow for a

more precise membrane ﬁt, but it will also make overﬁtting more likey. A value between 5 and 9 is recommended.

• --j: number of threads to be used. This should be set to the number of CPU cores available.

• --o: name of the output directory (that will be created).

Program output:

After running the program, the output directory (--o argument) will contain the following items:

• manifolds.star: a new manifold set containing the spheroidal membranes. It will not contain the initial spheres,

nor any other initial manifolds.

• optimisation_set.star: a new optimisation set pointing to the new manifold set.

• Meshes/< tomogram_name >.ply: visualisations showing the ﬁtted spheroids for each tomogram in

ply format. They can be viewed in relation to (low-magniﬁcation) tomograms as produced by re-

lion_tomo_reconstruct_tomogram in a viewer that supports both meshes and 3D images (such as e.g. Paraview).

• note.txt: a text ﬁle containing the command issued to run the program.

7.5. Subtomogram Averaging 143

RELION

relion_tomo_reconstruct_tomogram

This program creates a tomogram (i.e. a 3D image approximating the scattering potential within a cryo-EM sample)

from a tilt series deﬁned in a given tomogram set. Please note that these tomograms are not intended for further process-

ing. Instead, their purpose is only to allow for the inspection of the sample and the detection of speciﬁc structures, such

as manifolds or individual particles. In order to perform subtomogram averaging, the detected particles should be writ-

ten into a|particle_set|, from which pseudo subtomograms should be constructed using the program program_subtomo.

These pseudo subtomograms can then be reﬁned using relion_refine.

This program is also not intended for the reconstruction of full-size tomograms, since the resulting 3D image needs to

ﬁt into memory.

Relevant program arguments:

• --i or --t: input optimisation-set or tomogram set (see optimisation set).

• --tn: tomogram_name of the tomogram to reconstruct (see tomogram set).

• --no_weight: reconstruct an unweighted tomogram, as opposed to applying a weighting in 3D using a Wiener

ﬁlter. If the purpose of the tomogram is for a human to slice through it along the viewing direction, then an

unweighted tomogram oﬀers the most contrast. It will introduce streaking artifacts along the viewing direction,

however.

• --pre_weight: apply a pre-weighting in 2D instead of a 3D weighting. This is signiﬁcantly more eﬃcient for

large tomograms.

• --SNR: the signal-to-noise ratio assumed by the Wiener ﬁlter. Note: a realistic SNR might lead to an excessive

dampening of the high frequencies. If the purpose of the tomogram is only for humans to inspect it, then this

dampening might make the image clearer.

• --noctf: do not modulate the input images by a (spatially constant) CTF.

• --keep_mean: do not subtract the mean from (i.e. do not zero the DC component of) every 2D image prior to

reconstruction.

• --td: taper distance. Do not backproject image regions closer than this to the edge of a 2D image.

• --tf: taper falloﬀ. Fade out the contribution of each 2D image along this distance to the edge of the eﬀective

image. If a value has been speciﬁed for the --td argument, then this falloﬀ begins at that edge. These two options

are useful if the images show artifacts near the edge, and the resulting tomogram is to be used for automated

detection.

• --<x0/y0/z0>: the origin of the 3D region to be reconstructed. Note: to maintain consistency with other

processing software, the default origin (corresponding to the one used by IMOD) is equal to 1. Therefore, 1 is

also the default value for these arguments.

• --<w/h/d>: the size of the 3D region to be reconstructed (in bin-1 pixels). If not speciﬁed, the size indicated in

the tomogram set will be used, which should correspond to the region deﬁned in IMOD.

• --bin: the binning level of the reconstruction. Note: the default value for this is 8, in order to prevent accidentally

reconstructing a bin-1 tomogram.

• --j: number of threads to be used. This should be set to the number of CPU cores available.

• --o: name of the output 3D image. This is the only ﬁle created by this program. If the ﬁle name contains

subdirectories, then those will also be created.

144 Chapter 7. Reference pages

RELION

relion_tomo_import_tomograms

This program imports a set of tilt series to form a tomogram set. It is typically the ﬁrst program to be used when

creating a relion tomography data set. It imports an existing alignment from IMOD and the initial CTF parameters

from either CTFFind or CtfPlotter. Additional parameters, such as the fractional electron dose and the chronological

sequence of frames have to be supplied by the user.

The tilt series alignment eﬀectively deﬁnes an arbitrary 3D coordinate system, as well as a set of projections that map

those 3D coordinates to the individual 2D images of the tilt series. The aim of this program is to translate that alignment

into a format readable by relion. If used successfully, this will produce an equivalent 3D coordinate system inside relion

that will allow particle positions already deﬁned in an IMOD tomogram to be used unchanged.

Optimally, the tilt series would be imported into relion right after alignment, and the particles would be picked in a

low-magniﬁcation tomogram generated by relion_tomo_reconstruct_tomogram. That way, the particle positions would

be guaranteed to be correct.

The primary input to relion_tomo_import_tomograms is a .star ﬁle containing one row for each tilt series and at least

the following columns:

• rlnTomoImportImodDir: path to the IMOD directory. That directory has to contain the command ﬁles used to

run IMOD’s tilt and newst programs. Typically, those are named tilt.com and newst.com. If those are not

their names, then the actual names can be speciﬁed using the --tc and --nc arguments, respectively. Additional

IMOD ﬁles that are referenced inside those .com ﬁles also need to be present inside the IMOD directory.

• rlnTomoImportCtfFindFile or rlnTomoImportCtfPlotterFile: path to the initial CTF estimate from either

CTFFind or CtfPlotter, respectively.

• rlnTomoTiltSeriesName: path to the actual tilt series. Note: if the tilt series ends with .st, this needs to be

speciﬁed with an .st:mrc ending to tell relion to interpret it as an mrc ﬁle.

The following additional columns may be present. If they are not, then the corresponding command line arguments

will be used. In that case, they will be identical for all tilt series.

• rlnTomoName: the tomogram name. The user is advised to specify a short and meaningful name here, since

it will make the data more readable and simplify the resulting directory structure. Names like e.g. TS_001 are

recommended. If not speciﬁed, the entire path to the tilt series (i.e. the value of rlnTomoTiltSeriesName)

will be used instead.

• rlnTomoImportFractionalDose: the electron dose corresponding to one tilt image. If omitted, the value of the

--fd argument will be used.

• rlnTomoImportOrderList: path to a two-column text ﬁle specifying the chronological order in which the im-

ages were acquired. That ﬁle has to contain two numbers per line separated by a comma (and no spaces), where

the ﬁrst number counts up from 1, while the second describes the sequence of tilt angles. To determine the se-

quence of tilt images, the program will look for the closest tilt angle for each image (as stored in the .tlt ﬁle).

This approach can deal with frames missing from the tilt series. If not speciﬁed in the .star ﬁle, the --ol input

argument will be used.

• rlnOpticsGroupName: an arbitrary name for an optics group. This allows the set of tilt series to be separated

into subsets that share the same optical aberrations. This is useful if the data have been collected in multiple

sessions that might exhibit diﬀerent aberrations. If omitted, all tilt series will be assigned to the same default

optics group.

• rlnTomoImportOﬀset<X/Y/Z>: an arbitrary oﬀset to the 3D coordinate system. This is useful if particles have

already been picked in tomograms that have been cropped after reconstruction by IMOD. If the IMOD-internal

SHIFT command has been used to apply oﬀsets, then this will be handled internally and does not need to be

speciﬁed here. If omitted, then the values of the --off<x/y/z> command line arguments will be used instead

(which default to 0).

7.5. Subtomogram Averaging 145

RELION

• rlnTomoImportCulledFile: output ﬁle name for a new tilt series with the excluded frames missing. This is only

needed if tilt images have been excluded using IMOD’s EXCLUDE, EXCLUDELIST or EXCLUDELIST2 commands.

In that case, this becomes a mandatory parameter.

General program arguments:

• --i: path to the input .star ﬁle.

• --t: path to an optional input tomogram set . If speciﬁed, the imported tomograms will be added to that set.

• --hand: the handedness of the tilt geometry. The value of this parameter is either +1 or -1, and it describes

whether the focus increases or decreases as a function of Z distance. It has to be determined experimentally. In

our experiments, it has always been -1.

Optics arguments:

• --angpix: the pixel size in Å.

• --voltage: the voltage in kV.

• --Cs: the spherical aberration in mm.

• --Q0: the amplitude constrast.

Electron dose arguments:

• --ol: the ﬁle name of a frame-order list ﬁle. Corresponds to the rlnTomoImportOrderList column in the

input .star ﬁle.

• --fd: The fractional (i.e. per tilt-image) electron dose. Corresponds to the rlnTomoImportFractionalDose

column in the input .star ﬁle.

IMOD import arguments:

• --flipYZ: if the IMOD tomogram has been ﬂipped along Y and Z (i.e. using clip ﬂipyz) after an IMOD re-

construction and before the particles have been picked, this will apply the same transformation to the relion

coordinate system. In case a rotation around X was performed (i.e. using clip rotx) this parameter should be

used together with --flipZ. This will allow relion to use particle positions deﬁned in the X-rotated tomogram

unchanged.

• --flipZ: same as above, in case the Z axis has been ﬂipped. This can be used together with the --flipYZ

option .

• --flipAng: indicates that all angles in the IMOD .tlt ﬁle have also been ﬂipped. This is rare.

• --off<x/y/z>: applies an oﬀset to the 3D coordinate system. Equivalent to the rlnTomoImportOffse<X/Y/

Z> column in the input .star ﬁle.

Final note: not all IMOD options are supported by this program. For example, the X-axis tilt option is not sup-

ported. In case it has been used, we recommend ignoring it and reﬁning the resulting alignment using the program

relion_tomo_align.

146 Chapter 7. Reference pages

RELION

relion_tomo_import_particles

This program imports a set of 3D coordinates and (optionally) Euler angles to construct a new particle set.

The 3D coordinates can be provided either in one big .star ﬁle, or as a set of shorter ﬁles, one for each tomogram.

If all the coordinates are provided as one .star ﬁle, it has to contain at least the following columns:

• rlnTomoName: name of the tomogram to which a particle belongs.

• rlnCoordinate<X/Y/Z>: the 3D coordinates within that tomogram.

If the input ﬁle also contains the particle angle columns rlnAngle<Rot/Tilt/Psi>, or any other column, they are also

imported.

In case the other format is chosen, then the input ﬁle only needs to contain the following columns: rlnTomoName

and rlnTomoImportParticleFile. The latter column points to a set of ﬁles (one for each tomogram) that contain the

coordinates (rlnCoordinate<X/Y/Z>) and (optionally) angles (rlnAngle<Rot/Tilt/Psi>).

Program arguments:

• --i: Input .star ﬁle containing a set of particle coordinates or a set of names of ﬁles containing those.

• --t: Input tomogram set which particles belong.

• --o: Output directory.

relion_tomo_make_optimisation_set

This program creates or, updates, an optimisation set ﬁle.

Program arguments:

• --i and/or --p, --t, --mot, --ref<1/2>, --mask and --fsc: input optimisation-set and/or its components

(see optimisation set).

• --o: name of the output optimisation set ﬁle.

relion_tomo_make_reference

This program runs relion_postprocess on the pair of reference half maps set in the input optimisation set or gen-

erated by relion_tomo_reconstruct_particle so they can be used by the diﬀerent reﬁnement programs more easily.

This program creates an output optimisation set ﬁle and a Postprocess folder within the output directory.

Optimisation set arguments:

• --i and/or --p, --t, --mot and --man: input optimisation-set and/or its components (see optimisation set).

• --opt_mask: Mask to be used for subsequent optimisation procedures. This argument updates the output opti-

misation set ﬁle.

7.5. Subtomogram Averaging 147

RELION

General program arguments:

• --rec: Reconstruction path of a relion_tomo_reconstruct_particle job (The --o argument). If this argument is

set, half maps in input optimisation set ﬁle are ignored.

• --mask: Mask to be used for FSC computation. This argument does not updates the output optimisation set ﬁle.

• --o: name of the output directory (that will be created).

relion_tomo_ﬁnd_ﬁducials

To be completed.

relion_tomo_add_spheres

To be completed.

relion_tomo_sample_manifold

To be completed.

Relion 1.4 Subtomogram averaging

For sub-tomogram averaging, which was implemented with help from Tanmay Bharat, a former postdoc in the Lowe

group at MRC-LMB, the same holds as for helical processing. Many general concepts remain the same as for single-

particle analysis, and users intending to perform sub-tomogram averaging in relion are encouraged to go through this

tutorial ﬁrst. For a detailed description of the sub-tomogram averaging procedures, the user is referred to the cor-

responding pages on the :textsc:`relion Wiki`, or to Tanmay’s paper[BRL+15]. Please note that we are still actively

working on making the sub-tomogram averaging pipeline more convenient to use and better. This work is done in close

collaboration with the group of John Briggs, also at MRC-LMB. Meanwhile, please be advised that the sub-tomogram

averaging pipeline is considerably less stream-lined than the single-particle one, and users should be prepared to do

some scripting outside the relion pipeline for many cases.

Core programs

• relion_tomo_reconstruct_particle : creates high-resolution 3D maps from a given data set.

• relion_tomo_subtomo : creates pseudo subtomograms that can be fed into relion_refine.

Reﬁnement programs

These programs use an existing particle set and a pair of high-resolution 3D reference maps to estimate or improve

diﬀerent properties:

• relion_tomo_align : aligns the images of a tilt series and estimates beam-induced motion to better ﬁt a set of

particles.

• relion_tomo_reﬁne_ctf : reﬁnes the astigmatic defocus, signal intensity (i.e. ice thickness) and higher-order

aberrations.

• relion_tomo_local_particle_reﬁne : oﬀers a rapid way to reﬁne the angles and positions of individual particles.

• relion_tomo_ﬁt_blobs_3d : ﬁts a set of spherical manifolds representing vesicles to the observed membranes.

148 Chapter 7. Reference pages

RELION

Utility programs

• relion_tomo_import_tomograms : imports new tilt series to create or extend a tomogram set.

• relion_tomo_import_particles : imports new particles.

• relion_tomo_reconstruct_tomogram : creates a (typically low-magniﬁcation) tomogram.

• relion_tomo_make_optimisation_set : ties a set of disparate data elements into an optimisation set.

• relion_tomo_make_reference : runs relion_postprocess on a pair of reference half maps.

• relion_tomo_add_spheres : creates a manifold set from a set of spheres given in .cmm format.

• relion_tomo_ﬁnd_ﬁducials : detects the ﬁducial markers in a given data set.

• relion_tomo_sample_manifold : samples a set of particle positions and orientations from a set of manifolds.

7.5.3 Previous pipeline

• RELION 1.4 subtomogram averaging pipeline.

7.6 Using RELION

7.6.1 The GUI

A pipeline approach

The GUI serves a central role in it’s pipelined approach, details of which have been published in the 2016 Proceedings of

the CCP-EM Spring Symposium [FLS17]. We recommend to create a single directory per project, i.e. per structure you

want to determine. We call this the project directory. It is important to always launch the relion graphical user-interface

(GUI), by typing the command relion, from the project directory.

The GUI keeps track of all jobs and how output from one job is used as input for another, thereby forming a workﬂow

or pipeline. Each type of job has its own output directory, e.g. Class2D/, and inside these job-type directories, new

jobs get consecutive numbers, e.g. Class2D/job010. Inside these individual job directories, output names are ﬁxed,

e.g. Class2D/job010/run. To provide a mechanism to have more meaningful names for jobs, a system of job aliases

is used, which are implemented as symbolic links to the individual job directories on the ﬁlesystem. All info about

the pipeline is stored in a ﬁle called default_pipeline.star, but in normal circumstances the user does not need

to look into this ﬁle. In case this ﬁle gets corrupted, one can copy back a backup of this ﬁle from the last executed job

directory.

The upper half: jobtype-browser and parameter-panel

On the left of the upper half of the GUI is the jobtype-browser: a vertical list of jobtypes, e.g. 2D classiﬁcation . On

the right is a panel with multiple tabs, where parameters to the diﬀerent types of jobs may be input. On the top left

of the GUI are three diﬀerent menu’s, providing a range of functionalities. The Scheme and Run! buttons can be

used to scheme jobs for future execution, or to execute them now. The former is particularly useful in preparing fully

automated pipelines that can be run iteratively, for example in real-time as data is being collected. See Schemes for

more details. By clicking in the jobtype-browser on the left-hand side of the GUI, a new job (with a Run! button) will

be loaded in the parameter-panel on the right.

7.6. Using RELION 149

RELION

The lower half: job-lists and stdout/stderr windows

The lower half of the GUI contains lists of jobs that are still running ( Running jobs ), have already ﬁnished

( Finished jobs ), or are schemed for later execution ( Schemed jobs ). By clicking jobs in these lists, the parameters of

that job will be loaded in the parameter-panel, and the Run! button will change color and turn into continue now! . Upon

clicking the latter, no new output job-directory will be made, but the job will be continued according to the parameters

given in the parameter-panel. 2D classiﬁcation , 3D classiﬁcations and 3D auto-reﬁne jobs will need a _optimiser.star

ﬁle to continue from, and will have ﬁlenames with the iteration from which they were continued, e.g. run_ct23. Other

types of jobs may continue from the point until they were executed before, e.g. Motion correction , CTF estimation ,

Auto-picking and Particle Extraction will continue by running only on those micrographs that weren’t done before. The

Input to this job and Output from this job lists link jobs together and can be used to browse backwards or forwards

in the project history.

At the bottom of the lower half of the GUI, the standard output (stdout) and standard error (stderr) of the selected

(ﬁnished or running) job will show in black and red text, respectively. The stderr should ideally be empty, any text

here is usually worth inspection. These text displays get updated every time you click on a job in the job-lists. Double-

clicking on the stdout or stderr displays will open a pop-up window with the entire text for more convenient scrolling.

The Display button

The Display: button below the run and scheme buttons serves to visualise the most important input and output ﬁles for

each job. When a job from the job-lists in the lower half of the GUI is selected, clicking this button will pop-up a menu

with all the input and output of this job that can be displayed (for example, particles, micrographs, coordinates, PDF

ﬁles, etc). A more general functionality to display any (e.g. intermediate) ﬁle can be accessed through the Display

option of the File menu on the top left of the GUI.

The Job actions button

The Job actions button opens up a little menu with options for the selected (running, ﬁnished or schemed) job. Here,

you can access a ﬁle called note.txt (that is saved in every individual job directory and which may be used to store user

comments); you can change the alias of the job; you can mark a job as ﬁnished (in case it somehow got stuck); you can

make a ﬂowchart of the history of that job (provided LaTeX and the TikZ package are installed on your system, also

see this section); or you can delete or clean a job to save disk space (see below).

Clean-up to save disk space

Deletion of jobs moves the entire job directory from the project directory into a directory called Trash/. You can

empty the Trash folder from ‘File’ menu on the top left of the GUI to really free up the space. Until you do so, you can

still undelete jobs using the corresponding option from the Jobs menu on the top left.

To save disk space, you can also clean jobs, which will move intermediate ﬁles to the Trash folder, e.g. the ﬁles written

out for all intermediate iterations of reﬁne jobs. There are two cleaning options: gentle clean will leave all ﬁles

intact that could be used as input into another job, while harsh clean may also remove those. Evidently, harsh

cleaning can free up more space, in particular directories with particle stacks or micrographs may become large, e.g.

from Motion correction , Particle extraction , Movie reﬁnement and Particle polishing job types. One can also clean all

directories in the project with a single click using the corresponding options from the Jobs menu on the top left of the

GUI. You can protect speciﬁc directories from harsh cleaning by placing a ﬁle called NO_HARSH_CLEAN inside them,

e.g. you may want to protect your ﬁnal set of polished particles from deletion by executing:

150 Chapter 7. Reference pages

RELION

touch Polish/job098/NO_HARSH_CLEAN

7.6.2 Optimise computations for your setup

GPU-acceleration

Dari Kimanius and Bjoern Forsberg from the group of Erik Lindahl (Stockholm) have ported the most computationally

expensive parts of relion for the use of GPUs. Because they used the cuda-libraries from nvidia to do this, GPU-

acceleration in relion only works with nvidia cards. These need to be of compute capability 3.5 or higher. Both single

and double precision cards will work, so one is not restricted to the expensive double-precision cards, but can use the

cheaper gaming cards as well. Details of their implementation can be found in their eLife paper[KFSL16].

Two diﬀerent relion programs have been GPU-accelerated: relion_autopick (for Auto-picking ) and relion_reﬁne (for

2D classiﬁcation , 3D classiﬁcation and 3D auto-reﬁne jobs). Both the sequential and the MPI-versions of these pro-

grams have been accelerated.

Disk access

With the much improved speed of image processing provided by the GPU-acceleration, access to the hard disk increas-

ingly becomes a bottle neck. Several options are available on the relion GUI to optimise disk access for your data set

and computer setup. For 2D classiﬁcation , 3D initial model , 3D classiﬁcation and 3D auto-reﬁne one can choose to

use parallel disc I/O. When set to Yes, all MPI processes will read particles simultaneously from the hard disk. Oth-

erwise, only the master will read images and send them through the network to the slaves. Parallel ﬁle systems like

gluster of fhgfs are good at parallel disc I/O. NFS may break with many slaves reading in parallel.

One can also set the number of pooled particles. Particles are processed in individual batches by MPI slaves. During

each batch, a stack of particle images is only opened and closed once to improve disk access times. All particle images

of a single batch are read into memory together. The size of these batches is at least one particle per thread used. This

parameter controls how many particles are read together in a batch by each thread. If it is set to 3 and one uses 8 threads,

batches of 3x8=24 particles will be read together. This may improve performance on systems where disk access, and

particularly metadata handling of disk access, is a problem. It has a modest cost of increased RAM usage.

If one has a relatively small data set (and/or a computer with a lot of RAM), then one can pre-read all particles into

RAM at the beginning of a calculation. This will greatly speed up calculations on systems with relatively slow disk

access. However, one should of course be careful with the amount of RAM available. Because particles are read in

ﬂoat-precision, it will take

𝑁×𝑏𝑜𝑥𝑠𝑖𝑧𝑒×𝑏𝑜𝑥𝑠𝑖𝑧𝑒×4

1024×1024×1024

gigabytes to read N particles into RAM. For 100,000 particles with a

200-pixel boxsize that becomes 15GB, or 60 GB for the same number of particles in a 400-pixel boxsize.

If the data set is too large to pre-read into RAM, but each computing node has a local, fast disk (e.g. a solid-state drive)

mounted with the same name, then one can let each MPI slave copy all particles onto the local disk prior to starting the

calculations. This is done using the Copy particles to scratch directory. If multiple slaves will be executed

on the same node, only the ﬁrst slave will copy the particles. If the local disk is too small to hold the entire data set,

those particles that no loner ﬁt on the scratch disk will be read from their original position. A sub-directory called

relion_volatile will be created inside the speciﬁed directory name. For example, if one speciﬁes /ssd, then a

directory called /ssd/relion_volatile will be created. If the /ssd/relion_volatile directory already exists,

it will be wiped before copying the particles. Then, the program will copy all input particles into a single large stack

inside this directory. If the job ﬁnishes correctly, the /ssd/relion_volatile directory will be deleted again. If the

job crashes before ﬁnishing, you may want to remove it yourself. The program will create the /ssd/relion_volatile

directory with writing permissions for everyone. Thereby, one can choose to use /ssd, i.e. without a username, as a

scratch directory. That way, provided always only a single job is executed by a single user on each computing node,

the local disks do not run the risk of ﬁlling up with junk when jobs crash and users forget to clean the scratch disk

themselves.

7.6. Using RELION 151

RELION

Finally, there is an option to combine iterations through disc. If set to Yes, at the end of every iteration all

MPI slaves will write out a large ﬁle with their accumulated results. The MPI master will read in all these ﬁles,

combine them all, and write out a new ﬁle with the combined results. All MPI salves will then read in the combined

results. This reduces heavy load on the network, but increases load on the disc I/O. This will aﬀect the time it takes

between the progress-bar in the expectation step reaching its end (the mouse gets to the cheese) and the start of the

ensuing maximisation step. It will depend on your system setup which is most eﬃcient. This option was originally

implemented to circumvent bugs on the network cards on our old cluster at LMB. Nowadays, we prefer not to use this

option, as it tends to be very slow when reﬁnements reached high resolutions.

7.6.3 Interaction with other programs

Although, in principle, relion can use particles that have been extracted by a diﬀerent program, this is NOT the recom-

mended procedure. Many programs change the particles themselves, e.g. through phase ﬂipping, band-pass or Wiener

ﬁltering, masking etc. All these are sub-optimal for subsequent use in relion. Moreover, gathering all the required

metadata into a correctly formatted relion-type star ﬁle may be prone to errors. Because re-extracting your particles

in relion is straightforward and very fast, the procedure outlined below is often a much easier (and better) route into

relion.

Also, several implementations of wrappers around relion have now been reported (e.g. in eman2, scipion and appion).

Although we try to be helpful when others write these wrappers, we have absolutely no control over them and do not

know whether their ﬁnal product uses relion in the best way. Therefore, in case of any doubt regarding results obtained

with these wrappers, we would recommend following the procedures outlined in this tutorial. The recommended way

of executing external programs from within the relion pipeline itself is outlined in the next section.

7.6.4 The External job-type

User interaction through the GUI

The External job-type on the relion-3.1 GUI provides a way to execute any third-party program from within the relion

pipeline. The interaction with the user is as follows:

On the Input tab set:

External executable:

myscript.py

(This is the ﬁlename of an executable script, which will call the external program.)

Input movies:

Input micrographs:

Input particles:

Input coordinates:

Input 3D reference:

Input 3D mask:

152 Chapter 7. Reference pages

RELION

The user provides at least one of the input entries to tell relion from which other jobs the input nodes come from, and

what type of input this is. This will therefore allow to maintain an intact directional graph inside the pipeliner. On

the Params tab, the user can then provide up to ten (optional) free parameters that will be passed onto the executable

script. Finally, the Running tab allows multi-threaded execution, queue submission, and any other arguments to be

passed through the Additional arguments entry.

The GUI will then create and execute the following command, either locally or through a queueing system:

myscript.py --o External/jobXXX/ --in_YYY ZZZ --LABELN VALUEN --j J

where

• XXX is the current jobnumber in the pipeline.

• YYY is the type of the input node: movies, mics, parts, coords, 3dref, or mask,

• ZZZ is the name of the corresponding input node. Note that more than one input node may be given, each with

its own --in_YYY argument.

• LABELN is the label of a free parameter, as deﬁned on the Params tab of the GUI. Note that up to ten diﬀerent

labels may be used.

• VALUEN is the corresponding value of the free parameter. This is optional: not every label needs a value.

• J is the number of threads deﬁned on the running tab.

Functionality of the executable script

It is the responsibility of the executable script (myscript.py) to handle the command line parsing of the generated

command. In addition, there are a few rules the script needs to adhere to:

• All output needs to be written out in the output directory, as speciﬁed by the --o External/jobXXX/ option.

In addition, for jobs that emulate relion job-types like Motion correction or Auto-picking , the output should be

organised in the same directory structure as the corresponding relion job-type would make.

• When completed, the script should create an empty ﬁle called RELION_JOB_EXIT_SUCCESS in the output direc-

tory. This will tell the pipeliner that the task has ﬁnished successfully. Aborted or failed runs may optionally be

communicated by creating ﬁles called RELION_JOB_EXIT_ABORTED and RELION_JOB_EXIT_FAILURE.

• The job should output ﬁle called RELION_OUTPUT_NODES.star, which should have a table called

data_output_nodes. This table should have two columns with the names and types of all the output nodes

of the job, using the _rlnPipeLineNodeName and _rlnPipeLineNodeType metadata labels. See the

data_pipeline_nodes table in the default_pipeline.star ﬁle of any relion project directory for exam-

ples. Output nodes deﬁned here will lead to the creation of edges between jobs in the directional graph of the

pipeliner; and output nodes will be available for convenient displaying by the user using the Display: button on

the GUI.

• The following may not be necessary or relevant, but the GUI has a Job actions button, which allows users

to abort running jobs. This button will create a ﬁle called RELION_JOB_ABORT_NOW in the output direc-

tory. If this functionality is to be used, the script should abort the job when this ﬁle is present, create the

RELION_JOB_EXIT_ABORTED ﬁle, remove the RELION_JOB_ABORT_NOW ﬁle, and then exit.

7.6. Using RELION 153

RELION

Example: a particle-picker

If your external program is a particle picker, e.g. Topaz [BMR+19], then you would give on the Input tab:

External executable:

run_topaz.py

Input micrographs:

CtfFind/job005/micrographs_ctf.star

(This ﬁle would be visible through the Browse button next to the input entry, which would only show

star ﬁles of micrographs that exist in the current project.)

On the Params tab, one would provide any necessary arguments to be picked up by the script, for example:

Param1 label, value:

threshold 0.1

Param2 label, value:

denoise_ﬁrst

Upon pressing the Run! button, this would execute the following command:

run_topaz.py --o External/job006/ \

--in_mics CtfFind/job005/micrographs_ctf.star \

--threshold 0.1 --denoise_first --j 1

The executable run_topaz.py is then responsible for correctly passing the command line arguments to topaz, and to

make sure the rules in the previous section are adhered to. For picking jobs, the directory structure of the input movies

(or micrographs) should be maintained inside the output directory, and each micrograph would have a STAR ﬁle with

the picked coordinates that has the same rootname as the original micrograph, but with a _PICKNAME.star suﬃx. The

PICKNAME is a free string. One could use the name of the particle-picking program, for example topaz. Therefore,

if a movie was originally imported as Movies/mic001.tif, its corresponding STAR ﬁle with the picked coordinate

would be placed in External/job006/Movies/mic001_topaz.star. In addition, in the output directory, the script

should create a text ﬁle called coords_suffix_PICKNAME.star (i.e. coords_suffix_topaz.star). This ﬁle

should contain at least one line of text, which is the name of the input micrographs STAR ﬁle given on the Input tab,

i.e. CtfFind/job005/micrographs_ctf.star. The output node (the coords_suffix_topaz.star ﬁle) should

also be listed in the RELION_OUTPUT_NODES.star ﬁle. This ﬁle would therefore look like this:

data_output_nodes

loop_

_rlnPipeLineNodeName #1

_rlnPipeLineNodeType #2

External/job006/coords_suffix_topaz.star 2

7.7 Conventions

7.7.1 Image formats

relion reads the following image ﬁle formats:

• MRC images (with extension .mrc)

• MRC stacks (with extension .mrcs)

154 Chapter 7. Reference pages

RELION

• SPIDER images (with extension .spi)

• SPIDER stacks (with extension .spi)

• TIFF images or movies (with extensions .tif and .tiff)

• EER movies (with extension .eer)

For compatibility with other EM programs (e.g. UCSF MotionCor2, IMOD), TIFF images are ﬂipped along the slow

axis when being read into memory or written to a ﬁle. This happens regardless of the TIFFTAG_ORIENTATION value

in the header.

relion writes individual images and image stacks in the MRC format . Because there is no distinction between 3D

maps or stacks of 2D images in MRC format, 3D maps are indicated with a .mrc extension, and stacks with a .mrcs

extension. Individual images in stacks are indicated by an integer number (ranging from 1 to the number of images

in the stack) followed by an “@” sign and the ﬁlename of the stack. Thereby, the ﬁrst three images in a stack called

my_images.mrcs are called:

1@my_images.mrcs

2@my_images.mrcs

3@my_images.mrcs

RELION supports the following MRC modes.

signed 8-bit integer

signed 16-bit integer

32-bit ﬂoating point

unsigned 16-bit integer

12:

16-bit ﬂoating point (RELION extension)

101:

packed 4-bit integer (IMOD extension)

The mode 12 is RELION’s extension proposed in 2021 to save disk space. It uses IEEE754, binary16 encoding. Motion

Correction (with RELION’s own implementation, not UCSF MotionCor2), Extract, Polish and Subtract jobs can write

them with the --float16 option. All RELION programs can read mode 12 MRC ﬁles. External programs that use

recent versions of mrcﬁle Python package can read them but other programs might not be able to read them.

As of 2021 March,

• GCTF does not support ﬂoat16.

• CTFFIND does not support ﬂoat16 but RELION’s motion correction job always writes power spectra in ﬂoat32,

which can be used in CTFFIND.

• Topaz does not support ﬂoat16 but RELION’s wrapper performs preprocessing from ﬂoat16 to ﬂoat32. Thus, as

long as you use Topaz via RELION’s wrapper, training and picking work ﬁne.

7.7. Conventions 155

RELION

7.7.2 STAR format

relion uses the STAR (Self-deﬁning Text Archiving and Retrieval) format (Hall, Allen and Brown, 1991) for the storage

of label-value pairs for all kinds of input and output metadata. The STAR format is an alternative to XML, but it is

more readable and occupies less space. The STAR format has been adopted by the crystallographic community in the

form of CIF (Crystallographic Information Framework), and Bernard Heymann’s BSOFT package was the ﬁrst to use

STAR in the ﬁeld of 3D-EM.

relion’s implementation of the STAR format has the following rules (partially copied from BSOFT’s manual):

• The ﬁle name must end in a “.star” extension.

• Each ﬁle must have one or more data blocks. The start of a data block is deﬁned by the keyword data_ followed

by an optional string for identiﬁcation (e.g., “data_images”).

• Multiple values associated with one or more labels in a data block can be arranged in a table using the keyword

loop_ followed by the list of labels and columns of values. The values are delimited by whitespace (i.e., blanks,

tabs, end-of-lines and carriage returns). The loop must be followed by an empty line to indicate its end.

• Label names always starts with an underscore (“_”). Each label may only be used once within each data block.

• Data items or values can be numeric or strings of characters. A string is interpreted as a single item when it

doesn’t contain spaces

• Comments are strings which can occur in three places: * File comments: All text before the ﬁrst data_ keyword

* Data block comments: Strings on their own lines starting with “#” or with “;” as the ﬁrst character in the line.

* Item comments: Strings on the same line as and following tag-value items, also indicated by a leading “#”.

• String values that contain spaces can be quoted by ". An empty string becomes "".

Metadata label deﬁnitions

relion has its own set of deﬁned metadata labels. The following command will print a list of the deﬁnitions of all of

them:

relion_refine --print_metadata_labels

7.7.3 Optics Groups

relion 3.1 introduced optics groups. See [ZNS20] for an introduction.

Movies, micrographs or particles are in the movies, micrographs and particles tables, respectively, and refer to

an optics group entry in the optics table via the rlnOpticsGroup column. rlnOpticsGroupName strings are used

when merging two STAR ﬁles. Optics groups with diﬀerent names are considered diﬀerent and rlnOpticsGroup IDs

will be re-numbered.

relion 3.1+ automatically upgrades old-style STAR ﬁles from relion 3.0 and earlier. You can also manually convert

them by relion_convert_star. Downgrading to the relion 3.0 format is not oﬃcially supported but you might ﬁnd

this ccpem mailing list post useful.

156 Chapter 7. Reference pages

RELION

7.7.4 Orientations

Orientations (rlnAngleRot, rlnAngleTilt, rlnAnglePsi) in a STAR ﬁle rotate the reference into obser-

vations (i.e. particle image), while translations (rlnOriginXAngstrom and rlnOriginYAngstrom) shifts

observations into the reference projection. For developers, a good starting point for code reading is

ObservationModel::predictObservation() in the src/jaz/obs_model.cpp.

In compliance with the Heymann, Chagoyen and Belnap (2005) standard relion uses a right-handed coordinate system

with orthogonal axes X, Y and Z, where right-handed rotations are called positive, and Euler angles are deﬁned as:

• The ﬁrst rotation is called rlnAngleRot and is around the Z-axis.

• The second rotation is called rlnAngleTilt and is around the new Y-axis.

• The third rotation is called rlnAnglePsi and is around the new Z axis

As such, relion uses the same Euler angles as XMIPP, SPIDER and FREALIGN.

The center of rotation of a 2D image of dimensions xdim x ydim is deﬁned by ((int)xdim/2, (int)(ydim/2))

(with the ﬁrst pixel in the upper left being (0,0). Note that for both xdim=ydim=65 and for xdim=ydim=64, the center

will be at (32,32). This is the same convention as used in SPIDER and XMIPP. Origin oﬀsets reported for individual

images translate the image to its center and are applied BEFORE rotations.

Particle translations used to be in pixels (rlnOriginX and rlnOriginY) but this changed to Angstroms

(rlnOriginXAngstrom and rlnOriginYAngstrom) in relion 3.1.

The unit of particle coordinates in a micrograph (rlnCoordinateX and rlnCoordinateY) is pixel in the aligned and

summed micrograph (possibly binned from super-resolution movies). The origin is the ﬁrst element in the 2D array of

an MRC ﬁle. The origin is displayed at the upper-left corner in relion (other programs might display in other ways).

7.7.5 Contrast Transfer Function

CTF parameters are deﬁned as in ctﬃnd 4.1, also see [MG03].

Higher order aberrations

rlnOddZernike contains coeﬃcients for asymmetric (antisymmetric) Zernike polynomials

𝑍

−1

, 𝑍

−3

, 𝑍

−1, 𝑍

, 𝑍

, · · · in this order. rlnEvenZernike contains coeﬃcients for symmetric Zernike

polynomials 𝑍

, 𝑍

−2

, 𝑍

−4

, 𝑍

−2

, 𝑍

· · · in this order. Thus, the 7-th item in the rlnEvenZernike,

, is related to an error in the spherical aberration coeﬃcient.

Look at the table in Wikipedia but ignore square root terms, as the coeﬃcients are not normalised in relion. For

example, 𝑍

−1

= (3𝑟

− 2𝑟) sin 𝜃 = 3(𝑘

𝑥

+ 𝑘

𝑦

)𝑘

𝑦

− 2𝑘

𝑦

, where 𝑘

𝑥

and 𝑘

𝑦

are wave-numbers in the reciprocal space

(1 / Å).

Anisotropic magniﬁcation corrections

Transformation by anisotropic magniﬁcation brings the reference into observations (i.e.particle images) in real space.

Note that stretching in real space is shrinking in reciprocal space and vice versa.

rlnMagMatrix_00 to rlnMagMatrix_11 represent the matrix M in the section 2.4 of [ZNS20]. The values become

larger when the observed particle in the real space looks larger than the reference projection at the nominal pixel size.

This also means that the true pixel size is actually smaller than the nominal pixel size.

7.7. Conventions 157

RELION

7.7.6 Symmetry

Symmetry libraries have been copied from XMIPP. As such, with the exception of tetrahedral symmetry, they comply

with [JBH05]:

Group Nota-

tion

Origin Orientation

Asymmetric C1 User-deﬁned User-deﬁned

Cyclic C<n> On symm axis, Z user-deﬁned Symm axis on Z

Dihedral D<n> Intersection of symm axes principle symm axis on Z, 2-fold on X

Tetrahedral T Intersection of symm axes 3-fold axis on Z (deviating from Heymann et

al!)

Octahedral O Intersection of symm axes 4-fold axes on X, Y, Z

Icosahedral I<n> Intersection of symm axes **

** Multiple settings of the icosahedral symmetry group have been implemented:

I1:

No-crowther 222 setting (=standard in Heymann et al): 2-fold axes on X,Y,Z. With the positive Z pointing at the

viewer, the front-most 5-fold vertices are in YZ plane, and the front-most 3-fold axes are in the XZ plane.

I2:

Crowther 222 setting: 2-fold axes on X,Y,Z. With the positive Z pointing at the viewer, the front-most 5-fold

vertices are in XZ plane, and the front-most 3-fold axes are in the YZ plane.

I3:

52-setting (as in SPIDER?): 5-fold axis on Z and 2-fold on Y. With the positive Z pointing at the viewer and

without taken into account the 5-fold vertex in Z, there is one of the front-most 5-fold vertices in -XZ plane

I4:

Alternative 52 setting: with the positive Z pointing at the viewer and without taken into account the 5-fold vertices

in Z, there is one of the front-most 5-fold vertices in +XZ plane.

In case of doubt, a list of all employed symmetry operators may be printed to screen using the command (for example

for the D7 group):

relion_refine --sym D7 --print_symmetry_ops

7.8 Developer Guide

ò Note

This page describes information for RELION developers and code contributors. General users do not need infor-

mation here.

7.8.1 Coding styles in C++

Indentation should be done with tabs, while alignment should be done with spaces.

New lines should be as follows. Note that due to limitation of rST syntax, I used spaces instead of tabs.

158 Chapter 7. Reference pages

RELION

// Use this

void test(int a, int b)

{

if (!flag)

{

A();

B();

}

else

{

C();

D();

}

// Not this

void test(int a, int b) {

if (!flag) {

A();

B();

} else {

A();

B();

}

7.8.2 How to compile this documentation

To compile this documentation into HTML, type:

make html

For PDF, perform:

make latex

cd build/latex

pdflatex relion.tex # you might have to install several style files

If your system has latexmk, this can be done in one line:

make latexpdf

If above commands complain about missing Python packages, try:

python3 -m pip install --upgrade sphinx sphinxcontrib-bibtex

7.8. Developer Guide 159

RELION

7.8.3 Coding styles in reStructuredText (rST)

For Å, 𝛼, 𝛽 in the main text, just type as is using Unicode. Don’t use TeX commands.

To make git diff easier to understand, each sentense should use a new line.

A sentence.

Next sentence.

A new paragraph.

``Keyword``, *emphasis by italic* and **emphasis by bold** are marked like this.

You can cite a paper :cite:`Zivanov_estimation_2020`.

A link to `rST documentation <https://www.sphinx-doc.org/en/master/usage/

˓→restructuredtext/basics.html>`_.

Note the underline.

You can use LaTeX math :math:`\sum x_i^2`.

We have custom tags for :button:`buttons`, :joblist:`joblists`, :jobtype:`jobtypes` and␣

˓→:guitab:`guitabs`.

|RELION| prints RELION in smallscaps.

We also have |MotionCor2| and |CTFFIND4.1| defined.

This paragram is a bad example. Two sentences are in one line.

A block quote for commands is an indented block following two

colons and an empty line. Please use FOUR SPACES, not tabs for indentation.

- When using a list, be careful about indentation.

The second sentence must be aligned.

- The second item in the list.

:Field list: suggested value

(Explanation sentence 1.

Note the indentation of this line!

Also note the following blank line.)

:Another option without a value: \

(Please put `\` at the end of the line to the field that should be left empty.)

The above example is rendered as:

A sentence. Next sentence.

A new paragraph. Keyword, emphasis by italic and emphasis by bold are marked like this. You can cite a paper

[ZNS20]. A link to rST documentation. Note the underline. You can use LaTeX math



𝑥

𝑖

We have custom tags for buttons , joblists , jobtypes and guitabs . relion prints RELION in smallscaps. We also

have motioncor2 and ctﬃnd 4.1 deﬁned.

This paragram is a bad example. Two sentences are in one line.

160 Chapter 7. Reference pages

RELION

A block quote for commands is an indented block following two

colons and an empty line. Please use FOUR SPACES, not tabs for indentation.

• When using a list, be careful about indentation. The second sentence must be aligned.

• The second item in the list.

Field list

suggested value

(Explanation sentence 1. Note the indentation of this line! Also note the blank lines.)

Another option without a value

(Please put \ at the end of the line to the ﬁeld that should be left empty.)

7.9 Bibliography

7.9. Bibliography 161

RELION

162 Chapter 7. Reference pages

BIBLIOGRAPHY

[BMR+19] Tristan Bepler, Andrew Morin, Micah Rapp, Julia Brasch, Lawrence Shapiro, Alex J. Noble, and Bon-

nie Berger. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micro-

graphs. Nature Methods, 2019. doi:10.1038/s41592-019-0575-8.

[BRL+15] Tanmay A. M. Bharat, Christopher J. Russo, Jan Löwe, Lori A. Passmore, and Sjors H. W.

Scheres. Advances in Single-Particle Electron Cryomicroscopy Structure Determination applied to

Sub-tomogram Averaging. Structure (London, England: 1993), 23(9):1743–1753, September 2015.

doi:10.1016/j.str.2015.06.026.

[BJPJ19] Tim-Oliver Buchholz, Mareike Jordan, Gaia Pigino, and Florian Jug. Cryo-CARE: Content-Aware Image

Restoration for Cryo-Transmission Electron Microscopy Data. 2019 IEEE 16th International Symposium

on Biomedical Imaging (ISBI 2019), pages 502–506, April 2019. doi:10.1109/ISBI.2019.8759519.

[CMF+13] Shaoxia Chen, Greg McMullan, Abdul R. Faruqi, Garib N. Murshudov, Judith M. Short, Sjors H. W.

Scheres, and Richard Henderson. High-resolution noise substitution to measure overﬁtting and validate

resolution in 3d structure determination by single particle electron cryomicroscopy. Ultramicroscopy,

135:24–35, December 2013. doi:10.1016/j.ultramic.2013.06.004.

[ELSC10] Paul Emsley, Bernhard Lohkamp, William G. Scott, and Kevin Cowtan. Features and develop-

ment of coot. Acta Crystallographica Section D - Biological Crystallography, 66:486–501, 2010.

doi:10.1107/S0907444910007493.

[FLS17] Rafael Fernandez-Leiro and Sjors H. W. Scheres. A pipeline approach to single-particle processing

in RELION. Acta Crystallographica. Section D, Structural Biology, 73(Pt 6):496–502, June 2017.

doi:10.1107/S2059798316019276.

[HS17] Shaoda He and Sjors H. W. Scheres. Helical reconstruction in RELION. Journal of Structural Biology,

2017. doi:10.1016/j.jsb.2017.02.003.

[JBH05] David M Belnap J Bernard Heymann, Mónica Chagoyen. Common conventions for interchange and

archiving of three-dimensional electron microscopy information in structural biology. Journal of Struc-

tural Biology, 151(2):196–207, 2005.

[JKZ+24] Kiarash Jamali, Lukas Käll, Rui Zhang, Alan Brown, Dari Kimanius, and Sjors H. W. Scheres. Automated

model building and protein identiﬁcation in cryo-EM maps. Nature, 628(8007):450–457, April 2024.

doi:10.1038/s41586-024-07215-4.

[KFSL16] Dari Kimanius, Björn O Forsberg, Sjors HW Scheres, and Erik Lindahl. Accelerated cryo-EM

structure determination with parallelisation using GPUs in RELION-2. eLife, November 2016.

doi:10.7554/eLife.18722.

[KST14] Alp Kucukelbir, Fred J Sigworth, and Hemant D Tagare. Quantifying the local resolution of cryo-EM

density maps. Nature methods, 11(1):63–65, January 2014. doi:10.1038/nmeth.2727.

163

RELION

[MG03] Joseph A Mindell and Nikolaus Grigorieﬀ. Accurate determination of local defocus and specimen tilt in

electron microscopy. Journal of Structural Biology, 142(3):334–347, 2003.

[PRFB17] Ali Punjani, John L. Rubinstein, David J. Fleet, and Marcus A. Brubaker. cryoSPARC: algorithms for

rapid unsupervised cryo-EM structure determination. Nature Methods, 14(3):290–296, March 2017.

doi:10.1038/nmeth.4169.

[RG15] Alexis Rohou and Nikolaus Grigorieﬀ. CTFFIND4: Fast and accurate defocus estimation

from electron micrographs. Journal of Structural Biology, 192(2):216–221, November 2015.

doi:10.1016/j.jsb.2015.08.008.

[RH03] Peter B Rosenthal and Richard Henderson. Optimal determination of particle orientation, absolute

hand, and contrast loss in single-particle electron cryomicroscopy. Journal of Molecular Biology,

333(4):721–745, October 2003.

[Sch10] Sjors H W Scheres. Classiﬁcation of Structural Heterogeneity by Maximum-Likelihood Methods. In Cryo-

EM, Part B: 3-D Reconstruction, volume 482 of Methods in Enzymology, pages 295–320. Academic

Press, 2010.

[Sch12a] Sjors H W Scheres. A Bayesian view on cryo-EM structure determination. Journal of Molecular Biology,

415(2):406–418, January 2012. doi:10.1016/j.jmb.2011.11.010.

[Sch12b] Sjors H W Scheres. RELION: Implementation of a Bayesian approach to cryo-EM structure determination.

Journal of Structural Biology, 180(3):519–530, December 2012. doi:10.1016/j.jsb.2012.09.006.

[SC12] Sjors H W Scheres and Shaoxia Chen. Prevention of overﬁtting in cryo-EM structure determination. Na-

ture methods, 9(9):853–854, September 2012. doi:10.1038/nmeth.2115.

[SNRS+08] Sjors H W Scheres, R. Nunez-Ramirez, C. O. S Sorzano, J. M Carazo, and R. Marabini. Image processing

for electron microscopy single-particle analysis using XMIPP. Nature Protocols, 3(6):977–90, 2008.

[SKB+23] Johannes Schwab, Dari Kimanius, Alister Burt, Tom Dendooven, and Sjors Scheres. Dynamight:

estimating molecular motions with improved reconstruction from cryo-em images. bioRxiv, 2023.

doi:10.1101/2023.10.18.562877.

[Smi99] J M Smith. Ximdisp–A visualization tool to aid structure determination from electron microscope images.

Journal of structural biology, 125(2-3):223–228, May 1999. doi:10.1006/jsbi.1998.4073.

[TPB+07] Guang Tang, Liwei Peng, Philip R Baldwin, Deepinder S Mann, Wen Jiang, Ian Rees, and Steven J Ludtke.

EMAN2: an extensible image processing suite for electron microscopy. Journal of Structural Biology,

157(1):38–46, January 2007. doi:10.1016/j.jsb.2006.05.009.

[WMS+19] Thorsten Wagner, Felipe Merino, Markus Stabrin, Toshio Moriya, Claudia Antoni, Amir Apelbaum, Phi-

line Hagel, Oleg Sitsel, Tobias Raisch, Daniel Prumbaum, Dennis Quentin, Daniel Roderer, Sebastian

Tacke, Birte Siebolds, Evelyn Schubert, Tanvir R. Shaikh, Pascal Lill, Christos Gatsogiannis, and Stefan

Raunser. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Communi-

cations Biology, 2(1):1–13, June 2019. doi:10.1038/s42003-019-0437-z.

[YPBM21] Keitaro Yamashita, Colin M. Palmer, Tom Burnley, and Garib N. Murshudov. Cryo-EM single-

particle structure reﬁnement and map calculation using \it Servalcat. Acta Crystallographica Section D,

77(10):1282–1291, Oct 2021. doi:10.1107/S2059798321009475.

[ZPA+17] Shawn Q. Zheng, Eugene Palovcak, Jean-Paul Armache, Kliment A. Verba, Yifan Cheng, and David A.

Agard. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron mi-

croscopy. Nature Methods, 14(4):331–332, April 2017. doi:10.1038/nmeth.4193.

[ZNS20] Jasenko Zivanov, Takanori Nakane, and Sjors Scheres. Estimation of high-order aberra-

tions and anisotropic magniﬁcation from cryo-em datasets in relion-3.1. IUCrJ, 2020.

doi:10.1107/S2052252520000081.

164 Bibliography