ThreeB: documentation.h Source File

ThreeB 1.1
00001 ///@file documentation.h Doxygen documentation bits
00002 
00003 /// @defgroup gDebug Useful debugging functions.
00004 
00005 /// @defgroup gUtility General utility functions.
00006 
00007 /// @defgroup gHMM Generic hidden Markov model solver.
00008 ///
00009 /// The notation follows `A tutorial on hidden Markov models and selected applications in speech recognition', Rabiner, 1989
00010 ///
00011 
00012 /// @defgroup gPlugin Classes related to the ImageJ Plugin
00013 
00014 /// @defgroup gStorm Storm classes
00015 
00016 // @defgroup gMultiSpotDrift Storm classes specific to multispot processing with drift
00017 
00018 /// @defgroup gStormImages Storm imagery classes (basic image processing)
00019 
00020 // @defgroup gGraphics Graphical display
00021 
00022 // @defgroup gUserInterface General user interface classes
00023 
00024 /// @file multispot5.cc Fit spots to the data
00025 
00026 /// @file multispot5_headless.cc FitSpots driver for entierly headless (batch) operation
00027 
00028 /// @file multispot5_gui.cc FitSpots driver for interactive (GUI) operation and debugging
00029 
00030 ///@file mt19937.h Mersenne twister interface code
00031 
00032 ///@file randomc.h 
00033 
00034 ///@file mersenne.cpp Agner Fogg's Mersenne Twister implementation.
00035 
00036 /// @file debug.h  Debugging bits
00037 
00038 /// @file debug.cc Debugging bits
00039 
00040 /// @file utility.h Utility bits.
00041 
00042 /// @file utility.cc Utility bits.
00043 
00044 /// @file storm_imagery.cc Code dealing with storm imagery (low level).
00045 
00046 /// @file storm_imagery.h Code dealing with storm imagery (low level).
00047 
00048 /// @file storm.h Code dealing with storm imagery (high level).
00049 
00050 /// @file forward_algorithm.h Contains an implementation fo the forward algorithm.
00051 
00052 
00053 /// @defgroup gMultiSpot Storm classes specific to multispot processing
00054 // 
00055 // Having completely smooth drift is too memory intensive.
00056 // 
00057 // The drift model is linear, approximated as piecewise constant.  Generally there
00058 // are <i>F</i> frames and these are divided up into <i>S</i> steps. The equation
00059 // for finding the step \e s from the frame \e i is:
00060 // \f[
00061 // i    s = \lfloor \frac{f S}{F} \rfloor.
00062 // \f]
00063 // The inverse is defined to be:
00064 // \f[
00065 //  f \approx \frac{s + \frac{1}{2}}{S}F
00066 // \f]
00067 //
00068 
00069 
00070 /** \mainpage 3B Microscopy Analysis
00071 
00072 \section sIntro Introduction
00073 
00074 This project contains the reference implementation of the 3B microscopy analysis method,
00075 and an ImageJ plugin.
00076 Please refer to <a href="http://www.coxphysics.com/3b">the project website</a> for more information
00077 on the method.
00078 
00079 To get started with analysing data, the <a href="http://rsbweb.nih.gov/ij/">ImageJ</a> plugin is
00080 the most suitable piece of software. This can be obtained from the  <a href="http://www.coxphysics.com/3b">the project website</a>.
00081 
00082 For more advanced analysis (such as running on a cluster), the 
00083 the program \c multispot5_headless should be used.
00084 This program needs to be run from the commandline.
00085 
00086 This project contains the source code for the commandline program and the ImageJ plugin.
00087 
00088 \section sStartHere Getting started
00089 
00090 \subsection sdata Dataset types and experimental parameters
00091 
00092 Bayesian analysis of blinking and bleaching allows data to be extracted from
00093 datasets in which multiple fluorophores are overlapping in each frame.  It can
00094 of course also be used to analyse standard localisation (PALM/STORM) data, but
00095 it must be borne in mind that the low density and high frame number of such
00096 datasets can lead to long runtimes.  Here we briefly discuss the different types
00097 of dataset, related to different applications, that you may wish to analyse
00098 using 3B.
00099 
00100 \subsubsection slow Low density PALM/STORM datasets 
00101 
00102 These datasets have few fluorophores overlapping
00103 in each image and are at least 10,000 frames long.  As discussed above, they can
00104 be analysed with 3B but their run time makes this a large time investment.  If
00105 you wish to use this approach for performance verification, we would suggest
00106 selecting a small spatial area (around 1.5-3\f$\mu\f$m square).  The algorithm can
00107 also be parallelised by running different sets of a few hundred frames
00108 for the same area
00109 separately.  Even with parallelisation, the large number of
00110 frames will make it time consuming to run.  If you simply want to know what the
00111 structure is like, we would suggest using a method such as QuickPALM, which are
00112 very fast in comparison.
00113 
00114 \subsubsection shigh High density fixed cell datasets 
00115 
00116 These datasets are of fixed cells but have
00117 multiple fluorophores overlapping in each frame.  You may acquire this type of
00118 dataset if the system you use has a fluorophore which cannot be photoswitched,
00119 or the blinking properties of which cannot be tuned over a wide enough range
00120 using the embedding medium, or if your light source is not powerful enough to
00121 drive most of the fluorophores into the non-emitting state.  In fixed samples
00122 labelled with fluorescent proteins there will in our experience be almost no
00123 blinking present, and 3B will therefore pick up bleaching.  While this can
00124 produce satisfactory results, the localisation is on single event on a high
00125 background and therefore the performance will be significantly degraded.
00126 
00127 Some users choose to acquire this type of dense data if they have severe
00128 problems with drift, particularly in the z-direction, as it drastically cuts the
00129 drift over the acquisition time.  However, in the long term it is worth trying
00130 to improve the stability of such systems, since z-drift will impact the accuracy
00131 of all types of localisation measurement.  If you are unsure how badly your
00132 system is affected by z-drift, it is useful to carry out a calibration of the
00133 drift of the microscope using a bead sample before starting acquisition of data
00134 for superresolution.  
00135 
00136 \subsubsection slive Live cell datasets 
00137 
00138 These datasets are of live cells, generally labelled with
00139 standard fluorescent proteins such as mCherry.  The mounting medium should be
00140 phenol red free, to avoid unnecessary background.  The intensity of the light
00141 source should be selected such that it is high enough to produce blinking but
00142 not strong enough to completely bleach the sample over the time of the
00143 acquisition.  For example, for a standard Xe arc lamp illumination with the full
00144 power of the lamp is generally suitable, but if you are using a powerful laser
00145 source it is recommended to take multiple datasets with different power levels to
00146 determine which is suitable.
00147 
00148 The time which is needed to acquire the data necessary for a single
00149 superresolution frame is dependent on the illumination intensity, the speed of
00150 the camera, and the properties of the flurophore.  Many older EMCCD cameras have
00151 a maximum acquisition speed of around 50 frames per second for a small region of
00152 interest, which then gives a limit of 4&nbsp;s to acquire 200 frames.  High
00153 specification EMCCD cameras and SCMOS cameras have much higher acquisition
00154 speeds of up to thousands of frames per second for restricted areas.  However,
00155 in order to maintain the number of photons from each flurophore per frame the
00156 illumination intensity would have to be increased, which is likely to bleach the
00157 sample rapidly, and change the blinking properties of the fluorophore to some
00158 extent.
00159 
00160 The acquisition of live cell datasets allows dynamics to be observed.  It should
00161 be noted, however, that if the structures move over the timescale of the
00162 acquisition then that movement will cause blurring in the reconstructed image.
00163 
00164 
00165 \subsubsection slive Selection of appropriate cell structures for observation 
00166 
00167 The 3B algorithm must
00168 be able to pick up the changes in intensity which occur when a fluorophore
00169 switches between an emitting and a non-emitting state.  The accuracy with which
00170 localisation can occur depends, as for other localisation techniques, on the
00171 number of photons from the fluorophore and the background level.  Since 3B is
00172 generally used with a widefield setup, if the sample has a lot of fluorescent
00173 structure out of the plane of focus, the background will be higher and it will
00174 be more difficult to localise.  For thick samples, the background can be reduced
00175 by using TIRF or high angle illumination.
00176 
00177 
00178 
00179 \subsection sanalysis Iterations and run time
00180 
00181 
00182 In general, determining the number of iterations required for MCMC algorithms
00183 is an unsolved problem. A good general rule is that once the reconstruction
00184 stops changing significantly with increasing iterations, then it is likely that
00185 the reconstruction has converged to a reasonable point.
00186 
00187 If you are unsure, then rerun exactly the same area with exactly the same
00188 parameters, but with a different random seed. Note that the ImageJ plugin will
00189 automatically select a different seed each time in the standard interface, but
00190 with the advanced interface or commandline program, the seed must be specified
00191 in the configuration file (see \ref sconfig). If the two results appear
00192 essentially the same, then it is very likely that a sufficient number of
00193 iterations has been reached.
00194 
00195 A good general rule is that 200 iterations is sufficient for convergence under
00196 almost all circumstances. For the example usage given in this manual, the
00197 required number of iterations for good convergence requires about 6 hours on a
00198 standard PC (Core i7 at 3GHz).
00199 
00200 Convergence may be achieved before 200 iterations, but
00201 terminating the run before convergence can lead to artefacts.  As with any
00202 microscopy method, the experiment and analysis should always be carried out
00203 appropriately to minimise the risk of artefacts.
00204 
00205 There are a number of issues which can lead to artefacts:
00206 - Early termination of the algorithm
00207   - The algorithm builds up a reconstrutcion using a number of random samples.
00208     If the algorithm is terminated too early, then the reconstruction will be
00209     dominated by the randomness, and there will not be enough samples for random
00210     fluctuations to average out.
00211   - MCMC algorithms may exhibit a property called <a
00212     href="http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm">burn
00213     in</a>. As the 3B algorithm runs, it maintains an estimate of the number of
00214     spots present in the image. Since it is an MCMC algorithm, this estimate
00215     will fluctuate around some mean value. However, 3B can add or
00216     remove spots at a rate of at most 5 per iteration (usually much slower). If
00217     the algorithm is started with a very bad estimage of the number of spots
00218     then it may take many iterations for it to approach the mean. During this
00219     time, the samples drawn will not be representative.
00220     \n
00221     3B must therefore run for enough iterations that the later, representative
00222     iterations will dominate and swamp the earlier ones. If 3B is started using
00223     a very bad estimate, then this can require substantially more than 200 
00224     iterations.
00225 
00226 - Bright regions close to the boundary
00227   - The 3B algorithm will not examine any pixels outside the boundary of the
00228     marked region. If there is a bright region of the image on or near the
00229     boundary, then 3B will naturally attempt to places fluorophores there. However
00230     since the point spread function of the microscope is typically several pixels across, the images of fluorophores near
00231     the boundary will extend across it and so will be missing information, which
00232     could lead to artefacts.
00233     \n
00234     If possible, placing the boundary near a bright region should be avoided. If
00235     this is not possible, then the reconstruction close to the boundary should
00236     be ignored. Anything further than the PSF diameter from the boundary is
00237     unlikely to be affected. Typically this is around 3 pixels.
00238 
00239 - Insufficient background regions
00240   - The 3B algorithm needs to be able to estimate the babkground noise level in
00241     order to model the image correctly. If there is an insufficient amount of
00242     background (for instances if the images are too small), then the estimation
00243     of the noise level may be poor which could lead to artefacts. 
00244     \n
00245     The sample data provided is an example of data for which the background can
00246     be estimated effectively. 
00247 
00248 - Image drift or motion
00249   - The 3B algorithm does not model image motion.
00250     \n
00251     If there is significant motion for example due to drift or a live cell
00252     moving then artefacts may result. The artefacts take the form of streaking
00253     or smearing in the direction of drift, or structure bunching randomly at one
00254     end of the drift or the other.
00255     \n
00256     Ideally the experiment should be run to minimize drift, but if this is not
00257     possible, then drift correction software (for example based on tracking
00258     beads) should be used to correct the drift. For live cell analysis, a tradeoff can be made between spatial and temporal resolution.  If fewer frames are analysed, the temporal resolution is higher but the spatial resolution will be degraded.  Varying the number of frames analysed can also be used to investigate the impact of sample movement in live cells, if this is a concern.
00259 
00260 - Incorrect parameters
00261   - If the parameters (especially the FWHM of the point spread function of the microscope) are set incorrectly then
00262     3B will not be able to model the image correctly, which may lead to poorer
00263     resolution or artefacts. The FWHM can be readily determined by taking a
00264     diffraction limited image of beads. 
00265 
00266 - Very high background levels
00267   - See \ref slive .
00268 
00269 
00270 \section sUsingPlugin Using the ImageJ Plugin
00271 
00272 
00273 A tutorial for using the ImageJ plugin is provided under
00274 <code>Plugins>3B>Help</code>
00275 
00276 The plugin can operate in two modes, standard and advanced. The standard mode
00277 allows the user to set the microscope PSF and spot size, the starting number of
00278 spots for the analysis and the range of frames.
00279 
00280 The plugin also offers an advanced mode of operation which allows much greater
00281 control over 3B. See \ref sconfig for further details.
00282 
00283 \section sCMD Using the commandline program
00284 
00285 We have provided a set of test data on the website. Download and unpack the zip
00286 file. It will create a new directory called test data with the collowing
00287 contents:
00288 @code 
00289 test_data/AVG_test_data.bmp
00290 test_data/markup1.bmp
00291 img_000000000.fits
00292 img_000000001.fits
00293 img_000000002.fits
00294 ...
00295 img_000000299.fits
00296 @endcode
00297 
00298 Then run the following command:
00299 
00300 @code
00301 ./multispot5_headless --save_spots test_data/results.txt --log_ratios test_data/markup1.bmp test_data/img_000000*
00302 @endcode
00303 
00304 The program will save the results in the file \c test_data/results.txt . The
00305 program will run indefinitely in the default setup, but you may view the
00306 results at any stage.  There is no well defined stopping point for this type of
00307 algorithm, so it is advisable continuously monitor the resultant image, and
00308 stop the algorithm when the output image is no longer changing with time. After
00309 30 minutes on a fast PC (e.g. Core i7 975), the ring structure which is not
00310 resolved in the widefield image should be clearly visible.
00311 After about 75 mins, the finer details of the structure begin to approach those seen in Fig 2e
00312 in the associated paper.
00313 
00314 The ImageJ plugin can load a results file and perform a reconstruction.
00315 
00316 
00317 Alternatively, you can process the results file further in order to view the results.
00318 Run the following command:
00319 @code
00320 awk '/PASS/{for(i=2; i <=NF; i+=4)print $(i+2), $(i+3)}' test_data/results.txt > test_data/coordinates.txt
00321 @endcode
00322 
00323 The file <tt>test_data/coordinates.txt</tt> contains a long list of \f$(x, y)\f$
00324 coordinates, representing possible spots positions. In order to view the
00325 results, load the data into a graph plotting program and create a scatter plot.
00326 NOTE: the axes are in pixel coordinates, so you will have to multiple any
00327 distances by the number of nm/pixel in order to get distances in nm.
00328 
00329 
00330 \subsection sExplaneExample Example usage explained
00331 
00332 
00333 \subsubsection ssTestdata Test Data
00334 
00335 The 300 TIFF files in the test directory correspond to the data used for Fig. 2
00336 in the paper. Please refer to the paper for details on how the data was
00337 obtained.
00338 
00339 The file <tt>AVG_test_data.bmp</tt> is a Z projection made using 
00340 <a href="http://rsbweb.nih.gov/ij/">ImageJ</a>.
00341 
00342 The file <tt>markup1.bmp</tt> is a mask indicating which area of the image to
00343 analyse. All perfectly black pixels are ignored, ecerything else is analysed. If
00344 you overlay <tt>markup1.bmp</tt> and <tt>AVG_test_data.bmp</tt> you can see
00345 which area the markup corresponds to. The markup file was created using 
00346 <a href="http://www.gimp.org">the GIMP</a>.
00347 
00348 \subsubsection ssRunning Running the program
00349 
00350 The general form for running the program is:
00351 
00352 @code
00353 ./multispot5_headless [ --variable1 value1 [ --variable2 value 2 [ ... ] ] ] image1 image2 ... 
00354 @endcode
00355 
00356 so the example sets ths variable \c save_spots to \c test_data/results.txt and
00357 the variable \c log_ratios to \c test_data/markup1.bmp.  The remaining
00358 arguments is the list of files to be analysed.
00359 
00360 The program gets the markup in the filename given in the \c log_ratios variable
00361 (yes, the choice of name is very strange, and corresponds to a very old phase of
00362 development). The more sanely named variable \c save_spots is the filename in
00363 which the output is to be saved.
00364 
00365 The program actually has a large number of variables which must be set. Most of
00366 them you probably don't want to change, but some of them you will want to
00367 change. The default values for these variables are stored in \c multispot5.cfg
00368 The format of this file should be mostly self explanatory. Everything after a \c
00369 // is a comment and is ignored. See \ref sconfig for further details.
00370 
00371 You will probably want to change:
00372 <ul>
00373 <li> \c blur.mu
00374 
00375 This is the prior over spot size, which is how the pixel size and microscope
00376 FWHM are represented. Some example values for a FWHM of 300nm/pix at 160 and
00377 100 nm/pix and for a FWHM of 270nm at 79nm per pixel. 
00378 
00379 If you have significantly largre or smaller pixels, the performance may be
00380 degraded.
00381 
00382 <li> \c placement.uniform.num_spots
00383 
00384 This is the initial number of spots to be placed down. Eventually, the algorithm
00385 will converge to a reasonable number of spots, even if this value is far off.
00386 The default value (15) is appropriate given the small area and dimness of the
00387 sample data. You will want to increase this number for larger areas of markup
00388 and relatively brighter regions.
00389 
00390 If this number is more than 1000, then the algorithm will run very slowly and
00391 may take several days.
00392 
00393 </ul>
00394 
00395 Note that variables specified on the commandline override all variables in the
00396 configuration file.
00397 
00398 The program can read FITS, BMP, PPM and PGM images. Depending on how it
00399 has been compiled, it can also read TIFF, PNG and JPEG images.
00400 The program cannot work on multi-image TIFF files. ImageJ can be used to split a
00401 multi-image TIFF into a collection of single image files. All the images loaded
00402 must be the same size.
00403 
00404 \subsubsection ssExtract Extracting and visualising the data
00405 
00406 The output file (in this case  \c test_data/results.txt ) containing the results
00407 is in a format unsuitable for plotting directly, and must be extracted. The
00408 reason for this is that the output file contains enough information to
00409 seamlessly continue long runs which have been interrupted. The provided AWK
00410 program extracts the coordinates of the spots over all iterations and puts them
00411 in \c coordinates.txt.
00412 
00413 Alternatively, the data can ve visualised using the plugin under the menu
00414 <code>Plugins>3B>Open 3B run</code>
00415 
00416 \section sconfig The configuration file and advanced settings
00417 
00418 The 3B system has a large number of parameters which control its behaviour.
00419 These are controled via the configuration file for the commandlie program or
00420 via the ``Advanced'' option for the plugin. The ``Advanced'' option essentially
00421 allows you to supply a fully custom configuration file.
00422 
00423 The sample configuration file is given below along with explanations of all
00424 parameters. In the program, most of these parameters are used by the 
00425 FitSpots class.
00426 
00427 \include jar/multispot5.cfg
00428 
00429 
00430 
00431 \page sComp Compiling the programs
00432 
00433 In order to compile the project, you will need to download and install the
00434 following libraries:
00435 <ul>
00436 <li> TooN http://www.edwardrosten.com/cvd/toon.html
00437 <li> libcvd http://www.edwardrosten.com/cvd/
00438 <li> gvars3 http://www.edwardrosten.com/cvd/gvars3.html
00439 </ul>
00440 The program is portable and is well tested under Linux and OSX. It will also
00441 compile under Windows using cygwin or MinGW.
00442 
00443 The program can be built using the usual method for compiling under Linux:
00444 @code
00445 ./configure && make 
00446 @endcode
00447 
00448 \section sPlugin Compiling the ImageJ plugin
00449 
00450 The plugin is provided pre-compiled from the project website.
00451 
00452 There are two ways of building the plugin, manual and automatic. If you want to
00453 make changes to the plugin, then use manual building. If you want to
00454 automatically build the plugin for several platforms, then use automatic
00455 building.
00456 
00457 \subsection sManual Manual
00458 
00459 The basic build instructions are the same as for the commandline program.
00460 You will also need the JDK (Java Development Kit) and  ImageJ installed.
00461 
00462 You will have to locate where your system has installed the JDK. If it is 
00463 not in /usr/lib/jvm/java-6-openjdk/include, you will have to specify the path:
00464 
00465 First configure the system:
00466 @code
00467 ./configure --with-imagej=/path/to/ImageJ/ij.jar --with-jni=/path/to/jdk/include
00468 @endcode
00469 
00470 You will also need to make sure that the JDK programs (javac, havah, etc) are in
00471 your path. The configure script will attempt to detect the location of the JNI headers.
00472 If it fails, you will need to specify \c --with-jni=/path/to/jdk/include
00473 
00474 
00475 To build the JAVA part:
00476 @code
00477 make three_B.jar
00478 @endcode
00479 
00480 
00481 To build the plugin (on Linux):
00482 @code
00483 make libthreeB_jni.so DYNAMIC_PLUGIN=1
00484 @endcode
00485 Note that if you do not specify \c DYNAMIC_PLUGIN, then the makefile will try to
00486 build a plugin with some dependencies statically linked in which will almost
00487 certainly fail unless you have set the system up to support  such an operation.
00488 
00489 On MinGW:
00490 @code
00491 make threeB_jni.dll
00492 @endcode
00493 
00494 Now copy three_B.jar and libthreeB_jni.so into your ImageJ plugins directory.
00495 
00496 
00497 \subsection sAutoBuild Automatic Build
00498 
00499 The automatic build method is very slow and is designed to be able to repeatably 
00500 build plugins for 32 and 64 bit Linux and Windows. It is also designed to build
00501 the plugin with as many static dependencies as possible so that only a single
00502 DLL/so needs to be shipped per system.
00503 
00504 The script operates by building a temporary install of Ubuntu 10.04 LTS, and
00505 using that to compile all variants of the plugin. 
00506 
00507 You will need a Debian based system (or a system on which the command \c
00508 debootstrap works) and root access.
00509 
00510 Tha automatic build system makes use of cLAPACK, rather than LAPACK as the
00511 LAPACK part is not speed critical and it is easier to build CLAPACK without
00512 additional external dependencies.
00513 
00514 To build, run the following commands:
00515 @code
00516 #First make a tar.gz of the source code
00517 bash make_dist.sh
00518 
00519 #Now execute the automatic build process
00520 bash build_plugin.sh
00521 @endcode
00522 
00523 The build takes a long time, and you should probably edit \c build_plugin.sh to
00524 point the installer at an Ubuntu mirror somewhere near to where you are.
00525 
00526 At the end of the build, the script will print out a directory name like:
00527 @code
00528 dist-123908
00529 @endcode
00530 
00531 A fresh copy of the plugin DLL and shared object will be present in that
00532 directory named.
00533 
00534 
00535 */