The BNF software uses setuptools, a standard library for packaging Python software. After downloading the archive containing the current version of BNF, extract it to a directory of your choice. On Unix-like systems you can do this by typing
tar -xzf bnf-0.1.tgz
Once you have the sources extracted, the installation is performed by a single command
python setup.py install
in the source directory (it may require administrator privileges).
This installs the BNfinder library in an appropriate location for your Python interpreter, along with a bnf script that can be run from the command line.
BNfinder can be executed by typing
> bnf <options>
The following options are available:
The learning data must be passed to BNfinder in a text file split into 3 parts: preamble, experiment specification and experiment data. The preamble allows the user to specify some features of the data and/or network, while the next two parts contain the learning data, essentially formatted as a table with space- or tab-separated values.
The preamble allows specifying experiment perturbations, structural constraints, vertex value types, vertex CPD types and edge weights. Each line in the preamble has the following form:
#<command> <arguments>
Experiments with perturbed values of some vertices carry no information regarding the regulatory mechanism of those vertices. Thus, including data from these experiments when learning the parents of the perturbed vertices biases the result (see [3] for a detailed treatment). The following command handles perturbations:
One possible way of specifying structural constraints with BNfinder is to list the potential parents of particular vertices. An easier method is available for constraints of the cascade form, where the vertex set is split into a sequence of groups and each parent of a vertex must belong to one of the previous groups (a simple but extremely useful example is a cascade with 2 groups: regulators and regulatees). There are 2 commands specifying structural constraints:
Note that structural constraints forcing the network's acyclicity are necessary for learning a static Bayesian network with BNfinder.
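The cascade form described above can be illustrated with a short Python sketch. It is not part of BNfinder; the function name and the vertex names are hypothetical and serve only to show which vertices remain as potential parents once the groups are fixed.

```python
def cascade_potential_parents(groups):
    """Map each vertex to the vertices allowed as its parents.

    groups: list of lists of vertex names, ordered from the first
    group of the cascade to the last one.
    """
    allowed = {}
    earlier = []  # vertices from all groups processed so far
    for group in groups:
        for vertex in group:
            # Only vertices from strictly earlier groups may be parents.
            allowed[vertex] = list(earlier)
        earlier.extend(group)
    return allowed


# Hypothetical two-group cascade: regulators first, regulatees second.
parents = cascade_potential_parents([["R1", "R2"], ["G1", "G2", "G3"]])
for vertex in sorted(parents):
    print(vertex, parents[vertex])
```

Because every parent comes from an earlier group, no directed cycle can arise, which is why a cascade is sufficient to satisfy the acyclicity requirement for static networks.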
Vertex value types may be specified with the following commands:
Values in <value list> may be integers or words (strings without whitespace). When some vertices are left unspecified, BNfinder tries to recognize their possible value sets. However, it may get them wrong, in particular when some float numbers are written in integer format or when some possible values are not represented in the dataset (note that the size of the set of possible values affects the score).
The space of possible CPDs of some vertices given their parents may be restricted to noisy-and or noisy-or distributions. In this case, the sets of possible values of these vertices and their potential parents must be either {0,1} or float numbers. Moreover, BNfinder should be executed with the MDL scoring criterion. The following commands specify vertices with noisy CPDs:
The following commands set prior weights on network edges:
Weights must be positive float numbers. Edges with greater weights are penalized harder. The default weight is 1.
The experiment specification has the following form:
<name> <experiment list>
where <name> is a word starting with a symbol other than #. The form of experiment names depends on the data type and, consequently, on the type of learned network:
Each line of the experiment data part has the following form:
<vertex> <value list>
where <vertex> is a word and values are listed in the order corresponding to <experiment list>.
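The three-part layout described above can be read with a few lines of Python. The following is a minimal sketch, not BNfinder's own parser: it assumes that blank lines are insignificant and that the first line not starting with # is the experiment specification. The command name and experiment names in the sample are placeholders invented for illustration.

```python
def parse_learning_data(text):
    """Split BNfinder-style input text into its three parts."""
    preamble = []       # (command, [arguments]) pairs
    experiments = None  # experiment names from the specification line
    data = {}           # vertex name -> list of raw value strings
    for line in text.splitlines():
        fields = line.split()  # splits on spaces and tabs alike
        if not fields:
            continue  # assumption: blank lines carry no meaning
        if fields[0].startswith("#"):
            # Preamble line: #<command> <arguments>
            preamble.append((fields[0][1:], fields[1:]))
        elif experiments is None:
            # First non-preamble line: <name> <experiment list>
            experiments = fields[1:]
        else:
            # Data line: <vertex> <value list>
            vertex, values = fields[0], fields[1:]
            if len(values) != len(experiments):
                raise ValueError("wrong number of values for %s" % vertex)
            data[vertex] = values
    return preamble, experiments, data


# Placeholder content: "hypothetical_command" is not a real BNfinder command.
sample = """#hypothetical_command A B
conditions exp1 exp2 exp3
A 0 1 1
B 1 0 1
"""
preamble, experiments, data = parse_learning_data(sample)
print(preamble, experiments, data)
```

Values are kept as strings here, since whether a vertex is discrete or continuous depends on the preamble and on BNfinder's own value recognition.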
The SIF (Simple Interaction File) format, usually stored in files with the .sif extension, is the simplest of the supported formats and carries only information on the topology of the network. In this format, each line represents a single interaction. In our case, such an interaction represents the fact that one variable depends on some other variable. Each line contains three values:
To show it by example, the file:
A + B
B - C
describes a network of the following shape:
A →(+) B →(−) C
The main advantage of this format is that it can be read by the Cytoscape (http://cytoscape.org) software allowing for quick visualization. It is also trivial to use such data in one’s own software.
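As an illustration of reading such data in one's own software, here is a minimal Python sketch. It assumes, consistently with the example above, that the three values on each line are the source vertex, the interaction label and the target vertex; the function name is hypothetical.

```python
def parse_sif(text):
    """Return a list of (source, label, target) tuples from SIF text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError("expected three values, got: %r" % line)
        source, label, target = fields
        edges.append((source, label, target))
    return edges


edges = parse_sif("A + B\nB - C\n")
for source, label, target in edges:
    print("%s depends on %s (%s)" % (target, source, label))
```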
Suboptimal parent sets are written to a file in a simple text format split into sections, one per vertex, each representing the parent sets of that vertex. Each section contains a leading line with the vertex name, followed by lines representing its consecutive suboptimal parent sets. Each of these lines has the form:
<relative probability> <vertex list>
where <relative probability> is the ratio of the set's posterior probability to the posterior probability of the empty parent set and <vertex list> contains the elements of the set. Lines are sorted in decreasing order of <relative probability>.
To show it by example, the section:
C
2.333333 B
1.000000
0.592593 B A
reports the 3 most probable parent sets of the vertex C: {B}, ∅, {B,A}. Moreover, it states that {B} is 2.333333 times more probable than the empty set and that the corresponding ratio for {B,A} equals 0.592593.
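The section format above is easy to process in Python. The sketch below is an illustration under one stated assumption: a line whose first token parses as a number is treated as a parent set line, and any other line as a vertex name, so it would misread vertices whose names look like numbers. The function names are hypothetical.

```python
def parse_parent_sets(text):
    """Map each vertex to a list of (relative probability, parent set)."""
    sections = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            ratio = float(fields[0])
        except ValueError:
            # Not a number: this line opens a new vertex section.
            current = fields[0]
            sections[current] = []
            continue
        if current is None:
            raise ValueError("parent set line before any vertex name")
        sections[current].append((ratio, frozenset(fields[1:])))
    return sections


def normalize(entries):
    """Rescale the ratios so that the listed sets sum to 1."""
    total = sum(ratio for ratio, _ in entries)
    return [(ratio / total, parents) for ratio, parents in entries]


sections = parse_parent_sets("C\n2.333333 B\n1.000000\n0.592593 B A\n")
for share, parents in normalize(sections["C"]):
    print(round(share, 3), sorted(parents))
```

The normalized values are shares among the listed sets only. They equal posterior probabilities only if the unlisted parent sets have negligible probability.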
Bayesian Interchange Format (BIF) is a simple text format dedicated to Bayesian networks. It is supported by some BN applications (e.g. JavaBayes, Bayes Networks Editor) and may be easily converted with available tools to other popular formats (including XML formats and the BNT format of K. Murphy's Bayes Net Toolbox). BNfinder writes learned networks in BIF version 0.15.
A network saved in <file> as a dictionary may be loaded to your Python environment by
eval(open(<file>).read())
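Since eval executes arbitrary code, a more cautious alternative for files from untrusted sources is ast.literal_eval from the Python standard library. The sketch below assumes that the saved dictionary consists only of Python literals (strings, numbers, lists, dictionaries and so on); if that assumption does not hold for a given file, literal_eval raises an exception and eval is needed instead. The file content and the 'pars' key used here are placeholders for illustration, not BNfinder output.

```python
import ast
import os
import tempfile


def load_network(path):
    """Load a dictionary saved as Python literal text, without executing code."""
    with open(path) as handle:
        return ast.literal_eval(handle.read())


# Placeholder content standing in for a saved network file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("{'A': {'pars': []}, 'B': {'pars': ['A']}}")

network = load_network(tmp.name)
os.remove(tmp.name)  # clean up the temporary file

for vertex in sorted(network):
    print(vertex, network[vertex])
```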
The dictionary consists of items corresponding to all vertices of the network. Each item has the following form:
<vertex name> : <vertex dictionary>
Vertex dictionaries have the following items:
The form of the vertex CPD dictionary depends on the vertex type. In the case of noisy CPD, the dictionary items have the following form:
In the case of general CPD, the dictionary has items of the following form:
When BNfinder is executed from the command line with the option -v, it prints messages describing its current action: loading data, learning regulators of consecutive vertices and writing output files. Moreover, after finishing computations for a vertex, it reports the predicted best parent sets of that vertex and their scores, and after finishing computations for all vertices, it reports the score and structure of the optimal network.