PROFASI  Version 1.5
Creating PROFASI style histograms with prf_his1d

Suppose you have a data file with hundreds of thousands of lines and tens of columns, and you want to look at the distribution of the data in the 13th column. prf_his1d is a convenient tool for making such a histogram. Use a UNIX command such as "awk" or "cut" to extract the relevant column, and pipe it to prf_his1d like this:

awk '{print $13;}' the_datafile | prf_his1d -r -30 55 -nb 100 -o output.his

The available options are :

The program simply takes the data from standard input and puts them in a histogram with the specified range (-r -30 55) and number of bins (-nb 100). By default, if data fall outside the specified range, the histogram adjusts its range to fit the data, without losing information. This is a property of PROFASI's self-adjusting histograms. The option "--no_adjust" or "-na" turns this feature off. Even if you want to use the self-adjust feature, try to provide a sensible range with the range option: it is used to set the bin size, which is not changed when the histogram range is adjusted later.
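
Choosing a sensible range requires knowing roughly where the data lie. The following is a minimal sketch, assuming a whitespace-separated data file and assuming that the bin width is simply the range divided by the number of bins. The helper name col_range is hypothetical and is not part of PROFASI. It scans one column with awk and prints the minimum, the maximum, and the bin width that the given number of bins would produce over that span.

```shell
# Hypothetical helper, not part of PROFASI.
# Usage: col_range FILE COLUMN NBINS
# Prints: <min> <max> <bin width = (max - min) / NBINS>
col_range() {
    awk -v col="$2" -v nb="$3" '
        NF >= col {
            v = $col + 0                      # force numeric interpretation
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            n++
        }
        END {
            if (n == 0) exit 1                # no usable lines found
            printf "%g %g %g\n", min, max, (max - min) / nb
        }' "$1"
}

# Example (commented out; the_datafile is the file from the text above):
# col_range the_datafile 13 100
```

For the range and bin count used in the example above, this arithmetic gives (55 - (-30)) / 100 = 0.85 as the bin width.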

Suppose you have stored some sort of integer identifier in the second column that labels each line of data in some way. In PROFASI run-time history files, for instance, the temperature index is written in the second column. You might be interested in the histogram of energy for temperature index 4. You can get it exactly as before by using a conditional in "awk".

awk '{if ($2==4) print $3;}' n*/rt | prf_his1d -r 10 60 -nb 100 -o etot.his

But then you might also want the histograms for temperature indices 0, 1, 2, and so on. A new file and a new command for every index is unnecessary. PROFASI histograms can store, in one file, information about the same kind of data categorized by some integer index. The right way to make a single output file containing histograms of column 3 partitioned by column 2 is:

awk '{print $2,$3;}' n*/rt | prf_his1d -nk 16 -o etot.his -r 10 60 -nb 100

The option "-nk" used in the above example changes how the incoming data stream is interpreted. The data is now assumed to be in two columns: the first column is an integer index, and the second column is the data to be put in the histograms. The value "16" passed to the "-nk" option instructs the program to create 16 blocks for the histogram, i.e., the integer index is expected to vary from 0 to 15. Data lines in which the integer index is 3 are stored in block 3 of the histogram, those with index 4 in block 4, and so on. The resulting histogram file contains many columns. The first column holds the x bins. The subsequent columns hold the histogram values (frequencies, probabilities ...) for the different integer indices. You can change the layout of the histogram to the PROFASI 1.1 histogram layout by using the option "--layout 1" or "-l 1".
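
Since the index runs from 0 to one less than the value given to "-nk", that value can be estimated from the data as the largest index plus one. The following is a minimal sketch under that assumption. The helper name nk_for is hypothetical and is not part of PROFASI. Lines whose index column is not a non-negative integer are skipped.

```shell
# Hypothetical helper, not part of PROFASI.
# Usage: nk_for COLUMN FILE...
# Prints the largest integer index found in COLUMN, plus one.
nk_for() {
    col="$1"
    shift
    awk -v col="$col" '
        NF >= col && $col ~ /^[0-9]+$/ {
            if ($col + 0 > max) max = $col + 0
            n++
        }
        END {
            if (n == 0) exit 1                # no integer index found
            print max + 1
        }' "$@"
}

# Example (commented out; n*/rt are the run-time history files above):
# nk_for 2 n*/rt
```

The printed number is only a lower bound inferred from the files that were scanned. If some indices never appear in the data, the number of blocks actually wanted may be larger.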

Although the examples above use the total energy, if that is the histogram you are interested in, it is much better to use the histograms generated during the runs, such as n0/his_Etot. This program is intended for situations where such a histogram file is not available.


PROFASI: Protein Folding and Aggregation Simulator, Version 1.5
© (2005-2016) Anders Irbäck and Sandipan Mohanty
Documentation generated on Mon Jul 18 2016 using Doxygen version 1.8.2