A metabolic system consists of a number of reactions transforming molecules of one kind into another to provide the energy that living cells need. Based on the biochemical reaction principles, dynamic metabolic systems can be modeled by a group of coupled differential equations which consists of parameters, states (concentration of molecules involved), and reaction rates. Reaction rates are typically either polynomials or rational functions in states and constant parameters. As a result, dynamic metabolic systems are a group of differential equations nonlinear and coupled in both parameters and states. Therefore, it is challenging to estimate parameters in complex dynamic metabolic systems. In this paper, we propose a method to analyze the complexity of dynamic metabolic systems for parameter estimation. As a result, the estimation of parameters in dynamic metabolic systems is reduced to the estimation of parameters in a group of decoupled rational functions plus polynomials (which we call improper rational functions) or in polynomials. Furthermore, by taking its special structure of improper rational functions, we develop an efficient algorithm to estimate parameters in improper rational functions. The proposed method is applied to the estimation of parameters in a dynamic metabolic system. The simulation results show the superior performance of the proposed method.
1. Introduction
Living cells require energy and material for maintaining their essential biological processes through metabolism, which is a highly organized process. Metabolic systems are defined by the enzymes dynamically converting molecules of one type into molecules of another type in a reversible or irreversible manner. Modeling and parameter estimation in dynamic metabolic systems provide new approaches towards the analysis of experimental data and properties of the systems, ultimately leading to a great understanding of the language of living cells and organisms. Moreover, these approaches can also provide systematic strategies for key issues in medicine, pharmaceutical, and biotechnological industries [1]. The formulation and identification of metabolic systems generally includes the building of the mathematical model of biological process and the estimating of system parameters. Because the components of a pathway interact not only with each other in the same pathway but also with those in different pathways, most (if not all) of mathematical models of metabolic systems are highly complex and nonlinear. The widely used approaches for modeling inter- and intracellular dynamic processes are based on mass action law [1–4]. By mass action law, the reaction rates are generally polynomials in concentrations of metabolites with reaction constants or rational functions which are a fraction and whose denominator and numerators are polynomials in concentrations of metabolites with reaction constants [1–4]. As a result, the mathematical model is nonlinear not only in the states but also in the parameters. Estimation of these parameters is crucial to construct a whole metabolic system [5–7].
In general, all algorithms for nonlinear parameter estimation can be used to estimate parameters in metabolic systems, for example, Gauss-Newton iteration method, and its variants such as Box-Kanemasu interpolation method, Levenberg damped least squares methods and Marquardt’s method [8, 9]. However, these iteration methods are initial-sensitive. Another main shortcoming is that these methods may converge to the local minimum of the least squares cost function and thus cannot find the real values of parameters. Furthermore, because of their highly complexity and nonlinearity, Gauss-Newton iteration method and its variants cannot efficiently and accurately estimate the parameters in metabolic systems [5–7, 10, 11].
In this paper, we propose a systematic method for estimating parameters in dynamic metabolic systems. Typically mathematical model of dynamic metabolic systems consists of a group of nonlinear differential equations, some of which contains several rational functions in which parameters are nonlinear. In Section 2, we propose a method for model complexity analysis via the stoichiometric matrix. As a result, we obtain a group of equations, each of which contains only one-rational function plus polynomial functions which we called an improper rational function. Then, based on the observation that in the improper rational functions both the denominator and numerator are linear in parameters while polynomials are also linear in parameters, we develop an iterative linear least squares method for estimating parameters in dynamic metabolic systems in Section 3. The basic idea is to transfer optimizing a nonlinear least squares objective function into iteratively solving a sequence of linear least squares problems. In Section 4, we apply our developed method to estimate parameters in a metabolism system. Finally we give conclusions and some directions of future work along with this study in Section 5.
2. Model Complexity Analysis for Parameter Estimation
A dynamic metabolic system consists of k substances (molecules), and m reactions can be described by a system of differential equations as follows:
(1)dxidt=∑j=1mcijrj,fori=1,…,k,
where xi represents the concentrations of molecule i, rj represents the reaction rate j, and cij represents the stoichiometric coefficient of molecule i in reaction j. The mass action law in biochemical kinetics [2–4, 12] states that the reaction rate is proportional to the probability of a collision of the reactants. This probability is in turn proportional to the concentration of reactants. Therefore, reaction rate rj is a function of the concentrations of molecules involved in reaction j and proportion constants.
The stoichiometric coefficient cij assigned to molecule i and reaction j can be put into a so-called stoichiometric matrix C=[cij]k×m. Let X=[x1,x2,…,xk]T and r=[r1,r2,…,rm]T, and let β=[β1,β2,…,βp]T represent the vector consisting of all independent proportion constants, and then (1) can be rewritten in the following vector-matrix format:
(2)dXdt=Cr(X,β).
In principle, the stoichiometric coefficient cij in matrix C is a constant integer and can be decided according to how molecule i is involved in reaction j. According to mass action law, the expression of reaction rates can be determined to be polynomials or rational functions with reaction constants [2–4, 12]. The challenge to build up the mathematic model of dynamic metabolic system (2) is to estimate the parameter vector β, especially when some reaction rates are in the form of rational functions in which parameters are nonlinear.
If each differential equation in (2) contains one-rational function without or with polynomial functions, the parameters in model (2) can be estimated by algorithms in [13, 14] or a new algorithm proposed in the next section of this paper. Unfortunately, each differential equation contains a linear combination of several rational functions, which makes the parameter estimation in those coupled differential equations more difficult. The stoichiometric matrix contains very important information about the structure of the metabolic systems and is widely used to analyze the steady state and flux balance of metabolic systems [2–4]. In this paper, via the stoichiometric matrix, we propose a systematic method to transfer a system of differential equations (2) into another system of differential equations, in which each differential equation contains at most one-rational function.
Running Example. To illustrate the proposed method, we use the upper part of glycolysis system as a running example, showing how the method is applied to this system step after step. The schematic representation of this system is shown in Figure 1. The model for this metabolic system is described by the system of differential equations (2) as follows:
(3)ddtGluc6P=r1-r2-r3,ddtFruc6P=r3-r4,ddtFruc1,6P2=r4-r5,ddtATP=-r1-r2-r4+r6-r7-r8,ddtADP=r1+r2+r4-r6+r7+2r8,ddtAMP=-r8.
Schematic representation of the upper part of glycolysis [4].
Based on the mass action law, the individual reaction rates can be expressed as
(4)r1=Vmax,2ATP(t)KATP,1+ATP(t),r2=k2ATP(t)·Gluc6P(t),r3=(Vmax,3fKGluc6P,3Gluc6P(t)-Vmax,3rKFruc6P,3Fruc6P(t)Vmax,3fKGluc6P,3)×(1+(Gluc6P(t)KGluc6P,3)+Fruc6P(t)KFruc6P,3)-1,r4=Vmax,4(Fruc6P(t))2KFruc6P,4(1+κ(ATP(t)/AMP(t))2)+(Fruc6P(t))2,r5=k5Fruc1,6P2(t),r6=k6ADP(t),r7=k7ATP(t),r8=k8fATP(t)·AMP(t)-k8r(ADP(t))2.
Model (3) has six ordinary differential equations (ODEs) and 15 parameters contained in eight reaction rates, three out of which are rational functions. Some ODEs contain more than one rational reaction rates, which makes the parameter more difficult.
Comparing (3) to (2) we have the state vector: X = [Gluc6P; Fruc6P; Fruc1,6P_{2}; ATP, ADP, AMP] and stoichiometric matrix:
(5)C=[1-1-100000001-100000001-1000-1-10-101-1-111010-1120000000-1].
In the following, we describe our proposed method to analyze the complexity of model (2) through the running example.
Step 1.
Collect the columns in the stoichiometric matrix corresponding to the rational reaction rates in model (2) to construct a submatrix Cr and collect other columns (corresponding to polynomial reaction rates) to construct a submatrix Cp. Therefore, we have
(6)dXdt=Cr(X,β)=Crrr(X,β)+Cprp(X,β),
where rr is the subvector of r and consists of all rational reaction rates while rp is another subvector of r and consists of all polynomial reaction rates. In this step, we should make sure that the rank of matrix Cr equals the number of rational reaction rates. If the rank of matrix Cr does not equal the number of rational reaction rates, it means that some rational reaction rates are not independent. Then we combine dependent rational reaction rates together to create a new reaction rate such that all resulted rational reaction rates should be linearly independent [14]. As a result, the rank of matrix Cr will equal the number of rational reaction rates.
For the running example, we have
(7)Cr=[c1,c3,c4]=[1-1001-1001-10-1101000],Cp=[c2,c5,c6,c7,c8]=[-10000000000-1000-101-1-110-1120000-1],
and rr=[r1,r3,r4] and rp=[r2,r5,r6,r7,r8]. The rank of matrix Cr equals 3, which is the number of rational reaction rates.
Step 2.
Calculate the left inverse matrix of Cr. That is, calculate Cr- such that
(8)Cr-Cr=I.
As matrix Cr has the column full rank, matrix Cr- satisfying (8) exists although it is typically not unique. For a given matrix Cr, Cr- can be easily found by solving (8) which is a linear algebraic system. If it is not unique, any matrix satisfying (8) works for our proposed method.
For the running example, we can have
(9)Cr-=[111000011000001000].
Step 3.
Multiply (6) by matrix Cr- from the left to obtain
(10)Cr-dXdt=Cr-Crrr(X,β)+Cr-Cprp(X,β)=rr(X,β)+Cr-Cprp(X,β)
or
(11)rr(X,β)+Cr-Cprp(X,β)=Cr-dXdt.
From its expression, each differential equation in the system (11) contains only one-rational reaction rates plus a linear combination of polynomial reaction rates.
For the running example, we have
(12)r1-r2-r5=ddt(Gluc6P+Fruc6P+Fruc1,6P2),r3-r5=ddt(Fruc6P+Fruc1,6P2),r4-r5=ddtFruc1,6P2.
Step 4.
Calculate matrix Cr⊥ such that
(13)Cr⊥Cr=0,
where Cr⊥ has the full row rank and rank(Cr⊥)+rank(Cr-) = the number of rows in Cr. Note that Cr⊥ can be easily found by solving (13), which is a homogenous linear algebraic system. Again if it is not unique, any matrix satisfying (13) works for our proposed method.
Then multiply (6) by matrix Cr⊥ from the left to obtain
(14)Cr⊥dXdt=Cr⊥Crrr(X,β)+Cr⊥Cprp(X,β)=Cr⊥Cprp(X,β)
or
(15)Cr⊥Cprp(X,β)=Cr⊥dXdt.
For the running example, we can have
(16)Cr⊥=[112100000110000001],Cr⊥Cp=[-2-21-1-1000010000-1].
Step 5.
Let D=Cr⊥Cp. If rank(D) ≥ the number of columns, then solving (15) yields
(17)rp(X,β)=(DTD)-1DTCr⊥dXdt.
If rank(D) < the number of columns, it means that some polynomial reaction rates in (15) are linearly dependent. Then combine the linearly dependent rates and construct a new reaction rate vector r-p(X,β) and full column rank matrix D- such that
(18)D-r-p(X,β)=Drp(X,β)=Cr⊥Cprp(X,β)=Cr⊥dXdt,
and then solving (18) yields
(19)r-p(X,β)=(D-TD-)D-TCr⊥dXdt.
For the running example, we have rank(D) < the number of columns. As the first four columns are linearly dependent, we can have a new reaction rates -2r2-2r5+r6-r7. Therefore, we have
(20)D-=[1-1010-1],D-TCr⊥=[112100-1-1-201-1],
and furthermore, noting that (d/dt)(ATP+ADP+AMP)=0, from (19) we have
(21)r6-r7-2r2-2r5=ddt(AMPP2Gluc6P+Fruc6P+2Fruc1,6P2+ATP-AMP),r8=-ddtAMP.
After these five steps, dynamic metabolic system (2) is transferred into a system of differential equations, in which each differential equation contains one-rational function plus polynomial functions ((11) or (12)) or only polynomial function ((19) or (21)). Parameters in (19) can be analytically estimated by well-known least squares methods. In the next section, we describe an algorithm to estimate parameters in (11).
3. Parameter Estimation Algorithm
After its complexity analysis, estimating parameters in dynamic metabolic system is reduced to mainly estimating parameters in a rational function plus polynomial, which we call the improper rational function. These functions are nonlinear in both parameters and state variables. Therefore, estimation of parameters in these models is a nonlinear estimation problem. In general, all algorithms for nonlinear parameter estimation can be used to estimate parameters in the improper rational functions, for example, Gauss-Newton iteration method and its variants such as Box-Kanemasu interpolation method, Levenberg damped least squares methods, Marquardt’s method [9–12, 15], and more sophisticated methods [16]. However, these iteration methods are initial sensitive. Another main shortcoming is that most of these methods may converge to the local minimum of the least squares cost function and thus cannot find the real values of parameters. In the following, we describe an iterative linear least squares method to estimate parameters in the improper rational functions. The basic idea is to transfer optimizing a nonlinear least squares objective function into iteratively solving a sequence of linear least squares problems.
Consider the general form of the following improper rational functions:
(22)η(X,β)=N0(X)+∑i=1pNNi(X)βNiD0(X)+∑j=1pDDj(X)βDj+∑k=1pPPk(X)βPk,
where the vector X consists of the state variables and the p-dimensional vector β consists of all parameters in the improper rational function (22), which can naturally be divided into three groups: those in the numerator of the rational functions βNi(i=1,…,pN), those in the denominator of the rational function βDj(j=1,…,pD), and those in the polynomial βPk(k=1,…,pP), where we have that pD+pN+pP=p. Ni(X)(i=0,1,…,pN), Dj(X)(j=0,1,…,pD), and Pk(X)(k=1,…,pP) are the known functions nonlinear in the state variable X and do not contain any unknown parameters. Either N0(X) or D0(X) must be nonzero, and otherwise from sensitivity analysis [9, 16] the parameters in model (22) cannot be uniquely identified.
If there is no polynomial part, model (22) is reduced to a rational function. Recently, several methods have been proposed for estimating parameters in rational functions [5, 6, 13, 14]. The authors in [5, 6] have employed general nonlinear parameter estimation methods to estimate parameters in rational functions. As shown in their results, the estimation error is fairly large. We have observed that in rational functions both the denominator and numerator are linear in the parameters. Based on this observation, we have developed iterative linear least squares methods in [13, 14] for estimating parameters in rational functions. Mathematically, improper rational function (22) can be rewritten as the following rational function:
(23)η(X,β)=(N0(X)+∑i=1pNNi(X)βNi+(∑k=1pPPk(X)βPk)×(D0(X)+∑j=1pDDj(X)βDj))×(D0(X)+∑j=1pDDj(X)βDj)-1.
However, in the numerator of the model above, there are pDpP+pN+pP coefficients while there are pD+pN+pP unknown parameters. When pP=1, the number of parameters is equal to the numbers of coefficients, and the methods developed in [13, 14] can be applied. However, when pP>1, those methods are not applicable as the number of parameters is less than the number of coefficients in the numerator.
In order to describe an algorithm to estimate parameters in the improper rational function (22) for n given groups of observation data yt and Xt(t=1,2,…,n), we introduce the following notation:
(24)βN=[βN1,βN2,…,βNpN]T∈RpN,βD=[βD1,βD2,…,βDpD]T∈RpD,βP=[βP1,βP2,…,βPpD]T∈RpP,β=[βPTβNTβDT]T,φN(Xt)=[N1(Xt),N2(Xt),…,NpN(Xt)]∈RpN,φD(Xt)=[D1(Xt),D2(Xt),…,DpD(Xt)]∈RpD,φP(Xt)=[P1(Xt),P2(Xt),…,PpP(Xt)]∈RpP,Y=[y(1),y(2),…,y(n)]T∈Rn,ΦN0=[N0(X1),N0(X2),…,N0(Xn)]T∈Rn,ΦD0=[D0(X1),D0(X2),…,D0(Xn)]T∈Rn,ΦN=[φN(X1)φN(X2)⋮φN(Xn)]∈Rn×pN,ΦD=[φD(X1)φD(X2)⋮φD(Xn)]∈Rn×pD,ΦP=[φP(X1)φP(X2)⋮φP(Xn)]∈Rn×pP,Ψ(βD)=diag[D0(X1)+φD(X1)βDD0(X2)+φD(X2)βD⋮D0(Xn)+φD(Xn)βD]∈Rn×n.
To estimate parameters in the improper rational function (22), as in [11], we form a sum of the weighted squared errors (the cost function) with the notions above as follows:
(25)J(β)=J(βP,βN,βD)=∑(D0(Xt)+φD(Xt)βD)2×(N0(Xt)+φN(Xt)βND0(Xt)+φD(Xt)βD+ΦPβP-yt)2.
Minimizing J(β) with respect to β=[βPT,βNT,βDT]T can give the nonlinear least squares estimation of parameters βP, βN, and βD. We rewrite the objective function (22) as follows:
(26)J(β)=∑[(D0(Xt)+φD(Xt)βD)ΦPβP+φN(Xt)βN-φD(Xt)ytβD-D0(Xt)yt+N0(Xt)]2.
In the objective function (26), for a given parameters β-D in the first term, we have
(27)J(β)=J(βP,βN,βD,β-D)=[A(β-D)β-b]T[A(β-D)β-b],
where
(28)A(β-D)=[Ψ(β-D)ΦPTΦNT-diag(Y)ΦDT]∈Rn×p,(29)b=(ΦD0diag(Y)-ΦN0)∈Rn.
Then for given parameters β-D, we can estimate the parameters β=[βPT,βNT,βDT]T by linear least squares method as follows:
(30)β=[AT(β-D)A(β-D)]-1AT(β-D)b.
Based on the above discussion, we propose the following iterative linear least squares method.
Step 1.
Choose the initial guess for βD0.
Step 2.
Iteratively construct matrix A(βDs) and vector b by (28) and (29), respectively, and then solve the linear least squares problem:
(31)J(βs+1)=[A(βDs)βs+1-b]T[A(βDs)βs+1-b],
which gives the solution
(32)βs+1=[AT(βDs)A(βDs)]-1AT(βDs)buntil the stopping criterion is met, where βs=[βPsT,βNsT,βDsT]T is the estimation of parameters β at step s.
From (31), if the estimation sequence β1,β2,… is converged to β*, the objective function (26) reaches its minimum value at β*. That is, β*is the estimation of parameters in model (22).
There are several ways to set up a stopping criterion. In this paper the stopping criteria are chosen as
(33)∥βk-βk-1∥∥βk-1∥+1≤ε,
where ∥·∥ is the Euclidean norm of the vector and ε is a preset small positive number, for example, 10-5.
4. Application
To investigate the method developed in previous sections, this study generates artificial data from the dynamic metabolic system in the running example with the biochemically plausible parameter values [4] listed in column 2 of Table 1 and initial values: Gluc6P(0) = 1 mM, Fruc6P(0) = 0 mM, Fruc1,6P_{2}(0) = 0 mM, ATP(0) = 2.1 mM, ADP(0) = 1.4 mM, and AMP (0) = 0.1 mM. The trajectory of this system is depicted in Figure 2. From Figure 2, the concentrations of all molecules except for Frucose-1,6-biphosphate reach their its steady states after about 0.1 minutes while Frucose-1,6-biphosphate after 0.5 minutes. Therefore, we do not use the data simulated after 0.5 minutes.
The true value (from [4]), estimated value, and relative estimation errors.
Parameter name
True value
Estimated value
REE (%)
Vmax,2 (mM·min^{−1})
50.2747
50.2447
0.0001
KATP,1 (mM)
0.10
0.10000
0.0399
k2 (mM^{−1}·min^{−1})
2.26
2.2599
0.0049
Vmax,3f (mM·min^{−1})
140.282
139.4917
0.5633
Vmax,3r (mM·min^{−1})
140.282
141.3623
0.7701
KGluc6P,3 (mM)
0.80
0.7999
1.3884
KFruc6P,3 (mM)
0.15
0.1499
0.0930
Vmax,4 (mM·min^{−1})
44.7287
44.6664
0.1372
KFruc6P,4 (mM^{2})
0.021
0.0206
1.8457
k
0.15
0.1526
1.7447
k5 (min^{−1})
6.04662
6.0466
0.0007
k6 (min^{−1})
68.48
68.4837
0.0054
k7 (min^{−1})
3.21
3.20797
0.0078
k8f (min^{−1})
432.9
432.8408
0.0137
k8r (min^{−1})
133.33
133.314
0.0120
Trajectory of system (3).
Although no noise is added to the artificial data in the simulation, noises are introduced in numerically calculating the derivatives by finite difference formulas. In general, the higher the sampling frequency and more data points are used, the more accurate the numerical derivatives are. On the other hand, we may not obtain data with the high frequency because of experimental limitations in practice. In this study, the sampling frequency is 100 data points per minute. In numerically calculating the concentration change rate at each time point from concentration x, we adopt the five-point central finite difference formula as follows:
(34)x˙(tn)=112Δt[x(tn-2)-8x(tn-1)+8x(tn+1)-x(tn+2)].
The estimation accuracy of the proposed method is investigated in terms of relative estimation error which is defined as
(35)REE=∥estimate_value-true_value∥∥true_value∥.
As all parameters to be estimated are nonnegative, initial values are chosen as 0 or 1 in this study. The experimental results are listed in columns 3 and 4 in Table 1. From column 3 in Table 1, the estimated parameter values are very close to the corresponding true values. Actually the relative estimation errors calculated from (29) for all estimated parameters except for two are less than 1%. This indicates that the proposed method can accurately estimate the parameters in this system.
5. Conclusions and Future Work
In this study, we have first described a method to analyze the complexity of metabolic systems for parameter estimation, based on the stoichiometric matrix of the metabolic systems. As a result, the estimation of parameters in the metabolic systems has been reduced to the estimation of parameters in the improper rational functions or polynomial functions. Then we have developed an iterative linear least squares method for estimating parameters in the improper rational models. The results from its application to a metabolism system have shown that the proposed method can accurately estimate the parameters in metabolic systems.
We do not consider the noises in the data except those introduced by numerical derivatives in this study. One direction of future work is to investigate the influence of noises in the data to the estimation accuracy. In addition, low sampling frequency is expected, particularly for molecular biological systems as in practice measurements from them may be very expensive or it is impossible to sample measurements with high frequencies. Another direction of future work is to improve the estimation accuracy of the proposed method with low sampling frequencies.
Acknowledgments
This work was supported by the Special Fund of Ministry of Education of Beijing for Distinguishing Professors and Science and Technology Funds of Beijing Ministry of Education (SQKM201210037001) to Li-Ping Tian, by National Natural Science Foundation of China (NSFC 61134004) to Zhong-Ke Shi, and by Natural Sciences and Engineering Research Council of Canada (NSERC) to Fang-Xiang Wu.
FusseneggerM.BaileyJ. E.VarnerJ.A mathematical model of caspase function in apoptosisNielsenJ.VilladsenJ.LidenG.StephanopoulosG. N.AritidouA. A.NielsenJ.KlippE.HerwigR.KowaldA.WierlingC.LehrachH.GadkarK. G.VarnerJ.DoyleF. J.IIIModel identification of signal transduction networks from data using a state regulator problemGadkarK. G.GunawanR.DoyleF. J.IIIIterative approach to model identification of biological networksChouI.-C.VoitE. O.Recent developments in parameter estimation and structure identification of biochemical and genomic systemsBeckJ. V.ArnoldK. J.van den BosA.MendesP.KellD. B.Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimationMolesC. G.MendesP.BangaJ. R.Parameter estimation in biochemical pathways: a comparison of global optimization methodsKlippE.LiebermeisterW.WierlingC.KowaldA.LehracjH.HerwingR.WuF. X.MuL.ShiZ. K.Estimation of parameters in rational reaction rates of molecular biological systems via weighted least squaresWuF. X.ShiZ. K.MuL.Estimating parameters in the caspase activated apoptosis systemMarucciL.SantiniS.di BernardoM.di BernardoD.Derivation, identification and validation of a computational model of a novel synthetic regulatory network in yeastChengL.HouZ. G.LinY.TanM.ZhangW. C.WuF.-X.Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks