PROFASI
Version 1.5

A histogram that can adjust its own range according to the data.
#include <AdaptiveHis.hh>
Public Member Functions  
AdaptiveHis ()  
Default constructor.  
AdaptiveHis (const AdaptiveHis &)  
Copy constructor.  
AdaptiveHis &  operator= (const AdaptiveHis &) 
Assignment operator.  
void  init () 
Initializes using info about range etc.  
int  adjust () 
Adjust range to accommodate data, keeping bin size fixed.  
int  put (double x, int i=0) 
Put a value into the histogram.  
void  Export (const char *filename, int normmode=2, int lyout=2) 
Save histogram and out of range points to files.  
void  disable_adjust () 
Public Member Functions inherited from prf_utils::His1D  
His1D ()  
Default constructor.  
His1D (int nbl)  
Create His1D with nbl blocks (create nbl histograms)  
His1D (double xmn, double xmx, int npnts, int numblocks=1)  
Construct with xmin, xmax, number of bins and number of blocks.  
His1D (double xmn, double xmx, double bnsz, int numblocks=1)  
Construct with xmin, xmax, bin size and number of blocks.  
His1D (const His1D &)  
Copy constructor (copies data, so do not initialize after this!)  
His1D &  operator= (const His1D &) 
Assignment operator (copies data, do not initialize after this!)  
void  Name (std::string nm) 
Give it a name.  
void  init () 
Initializes using info about range etc.  
void  reset () 
Reset all data (init calls this)  
void  NBlocks (int n) 
Make it a histogram of n blocks.  
int  NBlocks () const 
Return the number of blocks.  
long  n_entries (int i) const 
number of entries in block i  
long  n_entries_in_range (int i) const 
number of entries in block i in range  
void  Range (double x0, double x1) 
Set range.  
void  Nbins (int v) 
Set number of bins.  
int  Nbins () const 
Get number of bins.  
void  set_bin_size (double sz) 
Set bin size.  
double  Xmin () const 
Get xmin.  
double  Xmax () const 
Get xmax.  
double  Xbin () const 
Get bin size.  
double  xval (int i) 
x value for the middle of i'th bin  
double  yval (int iblk, int i) 
y value for the middle of i'th bin  
int  put (double x, int iblk=0) 
put value x into the iblk block  
int  nput (double howmanytimes, double x, int iblk) 
put n identical values at once  
double  normalize () 
normalize histogram so that each block sums to 1  
double  unnormalize () 
unnormalize histogram so that each block sums to its occupancy  
int  Import (const char *filename) 
Import histogram data written in the format of Export function.  
virtual His1D &  operator+= (His1D &) 
Add information from another given histogram.  
This is a minor modification of the His1D class that frees the user from estimating a good range for a histogram before starting to fill it. When one starts a Monte Carlo run with a new protein, or in any other application where a histogram is required, one often has no idea where the values of a measurement might lie. Before version 1.1 of PROFASI, one first had to make a trial run to get a feeling for the true range of the data, and then start new production runs with the correct histogram ranges specified. To a large extent, this is now unnecessary.
This histogram follows the data, wherever it is. You just have to declare an AdaptiveHis, fill it with data, ask it to adjust() once in a while, and Export() it to a file. If you then plot the file, it will have a reasonable range, with few empty bins on the left and right and few missed data points.
This does not mean that one need not think at all about the size of the values put into the histogram. The adjust() function does not change the size of the bins; that is central to how it works. Points that fall outside the current range are remembered by the class. When statistics are collected, we pretend that there is an infinite number of bins of the size set at initialization, and merely choose to do the bookkeeping on a finite range of those bins. If there are points outside the currently tracked bins, we can add a few bins to the left or right to accommodate them without affecting the data already collected. If we were to change the bin size during an adjustment, that would interfere with the data collected before adjust() was called.
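The fixed-grid idea can be illustrated with a small stand-alone sketch. It is not the PROFASI implementation: the type GridHis and all of its members are hypothetical names chosen for illustration, and the only property it is meant to show is that tracking more bins of the same width leaves previously collected counts untouched.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Hypothetical toy model, not the PROFASI implementation.
// Bins live on an unbounded grid: grid bin k covers
// [origin + k*width, origin + (k+1)*width).
struct GridHis {
    double origin;            // left edge of grid bin 0
    double width;             // bin size, fixed for the histogram's lifetime
    long first;               // grid index of the first tracked bin
    std::deque<long> counts;  // counts for grid bins first, first+1, ...

    GridHis(double xmin, double xmax, int nbins)
        : origin(xmin), width((xmax - xmin) / nbins), first(0),
          counts(static_cast<std::size_t>(nbins), 0L) {
        assert(nbins > 0 && xmax > xmin);
    }

    // Grid index of x; it may lie outside the tracked window.
    long grid_index(double x) const {
        return static_cast<long>(std::floor((x - origin) / width));
    }

    // Returns true if x fell into a tracked bin.
    bool put(double x) {
        long k = grid_index(x) - first;
        if (k < 0 || k >= static_cast<long>(counts.size())) return false;
        ++counts[static_cast<std::size_t>(k)];
        return true;
    }

    // Track more bins of the same width on either side.
    // Existing counts keep their grid positions and values.
    void extend(long nleft, long nright) {
        assert(nleft >= 0 && nright >= 0);
        counts.insert(counts.begin(), static_cast<std::size_t>(nleft), 0L);
        counts.insert(counts.end(), static_cast<std::size_t>(nright), 0L);
        first -= nleft;
    }

    // Count stored for the grid bin containing x (0 if that bin is untracked).
    long count_at(double x) const {
        long k = grid_index(x) - first;
        if (k < 0 || k >= static_cast<long>(counts.size())) return 0;
        return counts[static_cast<std::size_t>(k)];
    }
};
```

In this sketch, a GridHis built as GridHis(2000.0, 4000.0, 50) has a bin width of 40. A call to extend(10, 150) changes which grid bins are tracked, but the width and the count already recorded for, say, 2500 stay the same.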
So an initial guess for the size of the data is useful. More precisely, a good initial estimate of the bin size is useful. If the minimum or maximum of the range is wrong, this class will take care of it. If your data values range from 3000 to 10000 and you initialize your AdaptiveHis to a range 0 to 1 with 100 bins, the adjust function will produce a huge histogram: it will have a range 3000 to 10000, with 700000 bins of size 0.01. But if you initialize it to 2000 to 4000 with 50 bins (bin size 40), you will be fine. It will once again find the correct range, but will have fewer than 200 bins.
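The bin counts quoted above follow from simple arithmetic: the bin size is fixed by the initial range and number of bins, and the final number of bins is the data range divided by that bin size. The helper below is only an illustrative calculation with a hypothetical name (bins_needed). It assumes that the final range spans the data exactly and that the data limits fall on bin edges, as they do in both examples.

```cpp
#include <cassert>
#include <cmath>

// Illustrative helper, not part of PROFASI.
// Number of fixed-size bins needed to span [data_lo, data_hi] when the bin
// size is set by the initial range and the initial number of bins.
long bins_needed(double init_min, double init_max, int init_bins,
                 double data_lo, double data_hi) {
    assert(init_bins > 0 && init_max > init_min && data_hi > data_lo);
    const double bin_size = (init_max - init_min) / init_bins;
    // The small tolerance guards against floating-point round-off in the ratio.
    return static_cast<long>(std::ceil((data_hi - data_lo) / bin_size - 1e-9));
}
```

With these assumptions, bins_needed(0.0, 1.0, 100, 3000.0, 10000.0) gives 700000, while bins_needed(2000.0, 4000.0, 50, 3000.0, 10000.0) gives 175.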
This class was introduced in version 1.1. It is possible that in the future, its functionality will be absorbed in His1D.
int prf_utils::AdaptiveHis::adjust  (  ) 
An appropriate range for the data is found by examining the out-of-range points and the occupancy of the currently used bins. We add bins only if we can fill them. The fundamental reason for the existence of any kind of histogram is that one does not wish to save each and every data point. "Similar" data points are grouped, or binned, together. Now if there is one data point outside the current range, such that we would need to add 100 bins to the right to reach it, doing so would defeat the purpose of the histogram. It is more economical to save that data point than to create 100 more bins and remember the frequency of each. Therefore, even after repeated calls to adjust, there might be a remaining out-of-range point that this class simply refuses to cover. When the histogram is saved, such points, if any, are saved in a separate file, and you can deal with them if you like.
Return value is nonzero if the range really changes.
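The exact criterion used by adjust() is not spelled out here, so the following is only a sketch of one rule in the same spirit. It assumes a hypothetical threshold, max_empty: the right edge is extended to reach an out-of-range point only if that creates at most max_empty consecutive empty bins. The function name and parameters are illustrative, and the left edge would be handled symmetrically.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical rule, NOT PROFASI's actual criterion.
// Returns the number of bins to append to the right of xmax.
long bins_to_add_right(std::vector<double> missed, double xmax,
                       double bin_size, long max_empty) {
    assert(bin_size > 0.0 && max_empty >= 0);
    std::sort(missed.begin(), missed.end());
    long added = 0;  // bins appended so far
    for (double x : missed) {
        if (x < xmax) continue;  // not to the right of the current range
        // 0-based index, counted from xmax, of the new bin that would hold x.
        long k = static_cast<long>(std::floor((x - xmax) / bin_size));
        if (k < added) continue;           // already covered by appended bins
        if (k - added > max_empty) break;  // too many empty bins: leave it out
        added = k + 1;
    }
    return added;
}
```

Under this assumed rule, with xmax = 4000, a bin size of 40 and max_empty = 5, the missed points 4010, 4050 and 4100 lead to three new bins. A lone point at 9000 is left uncovered and would remain in the out-of-range list.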

void prf_utils::AdaptiveHis::disable_adjust  (  ) 
inline 
Disable/enable range tracking features. One can temporarily disable the "adaptive" qualities of the histogram. This is intended for cases where the incoming data for a certain stage of the program is known to contain nonsensical values that should have no bearing on the range. When "adjustability" is disabled, the histogram forgets new out-of-range values until it is re-enabled.
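The effect of switching off range tracking can be pictured with a toy model. It is a sketch under assumptions, not PROFASI code: the struct TrackingToggle and its plain boolean flag are hypothetical, and the model assumes that in-range values are still counted while tracking is off.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of the behaviour described above; the names are
// illustrative and not taken from PROFASI.
struct TrackingToggle {
    double xmin, xmax;
    bool tracking = true;        // "adjustability" flag
    long in_range = 0;           // entries that fell inside [xmin, xmax)
    std::vector<double> missed;  // remembered out-of-range values

    TrackingToggle(double lo, double hi) : xmin(lo), xmax(hi) {
        assert(hi > lo);
    }

    void put(double x) {
        if (x >= xmin && x < xmax) {
            ++in_range;  // assumed: in-range values are always counted
            return;
        }
        // Out-of-range values are remembered only while tracking is on.
        // Otherwise they are dropped and cannot influence a later adjustment.
        if (tracking) missed.push_back(x);
    }
};
```

In this model, a nonsensical value such as 1e30 that arrives while tracking is false never enters the missed list, so it cannot pull the range towards it once tracking is switched back on.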

void prf_utils::AdaptiveHis::Export  (  const char *  filename, int  normmode = 2, int  lyout = 2  ) 
virtual 
The histogram data is saved as in class His1D. The parameters normmode and lyout are simply passed down to the base class function. But this class also saves the out-of-range points not covered by the final range (those the function adjust() refuses to include) in a second file with the same name as the histogram file, but with the extension ".out_of_range" appended.
Reimplemented from prf_utils::His1D.
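The naming of the second file can be sketched as follows. The helper names are hypothetical. The one-value-per-line layout in points_as_text is an assumption made for illustration only, since the actual layout of the ".out_of_range" file is not described here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Name of the companion file: the histogram file name with
// ".out_of_range" appended, as described above.
std::string out_of_range_name(const std::string& histogram_file) {
    assert(!histogram_file.empty());
    return histogram_file + ".out_of_range";
}

// ASSUMED layout, for illustration only: one uncovered value per line.
std::string points_as_text(const std::vector<double>& missed) {
    std::ostringstream out;
    for (double x : missed) out << x << '\n';
    return out.str();
}
```

For a histogram exported to a file called "rg.his" (an arbitrary example name), the uncovered points would then go to "rg.his.out_of_range".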
AdaptiveHis & prf_utils::AdaptiveHis::operator=  (  const AdaptiveHis &  hs  ) 
It copies data, so do not initialize after this!
int prf_utils::AdaptiveHis::put  (  double  x, int  i = 0  ) 
The value x is put into the histogram if it fits in the range. If not, it is stored in the out-of-range list. The adjust function deals with these out-of-range points and may put them into bins when the range is appropriately extended.