By analyzing the scaffold content of the CAS Registry, we attempt to characterize in a comprehensive way the structural diversity of organic chemistry. The scaffold of a molecule is taken to be its framework, defined as all its ring systems and all the linkers that connect them. Framework data from more than 24 million organic compounds is analyzed. The distribution of frameworks among compounds is found to be top-heavy, i.e., a small percentage of frameworks occur in a large percentage of compounds. When frameworks are analyzed at the graph level, an even more top-heavy distribution is found: half of the compounds can be described by only 143 framework shapes. The most significant finding is that the framework distribution conforms almost exactly to a power law. This suggests that the more often a framework has been used as the basis for a compound, the more likely it is to be used in another compound. This may be explained by the cost of synthesis: making a new derivative of a framework is probably less costly if many other derivatives are known. We believe this power law is evidence that the minimization of synthetic cost has been a key factor in shaping the known universe of organic chemistry.

One type of large structural feature that is often associated with a specific chemical family is the molecular framework. This concept was proposed by Bemis and Murcko (12) as a way to help understand the common features in drug molecules. By their definition, the framework of a structure consists of all the ring systems and all the linkers, which are acyclic fragments that connect the ring systems. The framework is obtained by pruning all side-chain atoms, i.e., nonring atoms not on a direct path between two ring systems. By this definition, only cyclic structures have a framework. Typically, the framework describes only molecular topology, i.e., contains no three-dimensional or stereochemical information. Part of the reason this concept is useful in medicinal chemistry is that it describes the arrangement of rings in a structure, and rings are key building blocks in the design of drugs. The framework can be viewed as one possible definition of the molecular scaffold, a term which is widely used in medicinal chemistry but is not precisely defined.

The slope of the distribution in Figure 9 can be used to calculate the power-law exponent α. To estimate the slope, this distribution was fitted to a line using least-squares linear regression. Since this is a cumulative distribution, the fitted line must take on the value log(total number of hetero frameworks) at a frequency of 1. In order to incorporate this constraint, the distribution was translated to put this point at the origin, and the technique of regression through the origin (31) was used. An estimate of −1.07 was obtained for the slope. It is apparent from eq 3 that this slope should equal −(α − 1), and so the calculated value for the exponent is α = 2.07. A list of 12 power-law distributions found in the physical and social sciences (28) shows that only three of them have an exponent less than 2.07. Since the lower the value of α, the more top-heavy the distribution, it appears that the distribution of frameworks is unusually top-heavy.

A new method for organizing chem. It uses three simple descriptors that characterize sep. These descriptors are integers and can thus be interpreted as the coordinates of discrete cells in a three-dimensional space.
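
The pruning rule in the framework definition (remove every nonring atom that is not on a direct path between two ring systems) can be illustrated with a small graph sketch. This is not the authors' implementation. It assumes the molecule is given as a plain list of heavy-atom bonds with hydrogens left implicit, it ignores bond orders and stereochemistry, and the names `adjacency_from_bonds`, `framework_atoms`, and `ring` are hypothetical helpers introduced only for this example. Repeatedly stripping atoms with at most one remaining neighbor leaves exactly the ring systems and the linkers between them.

```python
def adjacency_from_bonds(bonds):
    """Build an undirected adjacency map {atom: set(neighbors)} from bond pairs."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def framework_atoms(adjacency):
    """Return the atoms left after pruning all side chains.

    Atoms with at most one remaining neighbor are removed repeatedly,
    so only ring atoms and atoms on paths between rings survive.
    """
    adj = {atom: set(nbrs) for atom, nbrs in adjacency.items()}  # work on a copy
    leaves = [atom for atom, nbrs in adj.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in adj:  # already pruned via an earlier queue entry
            continue
        for nbr in adj.pop(atom):
            adj[nbr].discard(atom)
            if len(adj[nbr]) <= 1:
                leaves.append(nbr)
    return set(adj)


def ring(start, size=6):
    """Bonds of a simple ring on atoms start .. start + size - 1."""
    return [(start + i, start + (i + 1) % size) for i in range(size)]


# Illustrative topology (made up for this example): two six-membered rings
# (atoms 0-5 and 7-12) joined by linker atom 6, plus two side chains.
bonds = ring(0) + ring(7) + [(0, 6), (6, 7), (6, 13), (3, 14), (14, 15)]
framework = framework_atoms(adjacency_from_bonds(bonds))
print(sorted(framework))
```

In this sketch an acyclic structure prunes down to the empty set, which matches the statement that only cyclic structures have a framework.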
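
The slope estimate described in the text (translate the log-log cumulative distribution so that the constraint point at a frequency of 1 sits at the origin, then fit a line through the origin) can be sketched as follows. The function name `exponent_from_cumulative` and the data are assumptions made for illustration: the counts are synthetic, generated from an exact power law, and are not the paper's data. The conversion α = 1 − slope follows from the stated relation slope = −(α − 1).

```python
import math


def exponent_from_cumulative(frequencies, cumulative_counts, total):
    """Estimate the power-law exponent by regression through the origin.

    frequencies[i] is a frequency x, cumulative_counts[i] is the number of
    frameworks occurring at least x times, and total is the count at x = 1.
    """
    # Translate so the constraint point (log 1, log total) becomes (0, 0).
    xs = [math.log10(f) for f in frequencies]
    ys = [math.log10(c) - math.log10(total) for c in cumulative_counts]

    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least one frequency other than 1")

    # Least-squares slope for a line forced through the origin.
    slope = sum(x * y for x, y in zip(xs, ys)) / sxx
    alpha = 1.0 - slope  # from slope = -(alpha - 1)
    return slope, alpha


# Synthetic cumulative counts following N(x) = total * x**(-1.07).
total = 1_000_000
frequencies = [1, 10, 100, 1000, 10000]
counts = [total * f ** -1.07 for f in frequencies]

slope, alpha = exponent_from_cumulative(frequencies, counts, total)
print(round(slope, 2), round(alpha, 2))
```

Because the synthetic counts lie exactly on a power law, the fit should recover the slope used to generate them. Real framework counts would scatter around the line.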