Bayesian Spatial Models with Application to Child Malnutrition, Mortality and Tuberculosis

PhD Research Proposal

Kasahun Takele

PhD Student by Research in Data Science (Biostatistics)

African Center of Excellence in Data Science (ACE-DS)

College of Business and Economics, University of Rwanda

March 2018

Table of Contents

1. Introduction
1.1. Background of the Study
1.2. Statement of the Problem
1.3. Research Objective
1.4. Research Rationale
2. Literature Review
2.1. Reviews on Malnourishment of Children
2.2. Empirical Review on Child Mortality
2.3. Review on Tuberculosis
2.4. Review of Models
3. Methodology
3.1. Data Description
3.2. Bayesian Geo-Additive Regression Models
3.3. Spatial Multivariate GLMM
3.6. Spatial Kernel Density Estimation
3.7. Geoadditive Bayesian Discrete-Time Survival Model
3.8. Indicators of Spatial Autocorrelation
3.9. Spatiotemporal Analysis
4. References

Introduction

Background of the Study

Child malnutrition and mortality are among the most serious socio-economic and demographic problems in sub-Saharan African countries, and they have a great impact on future development (Adebayo and Fahrmeir, 2002, Gayawan et al., 2016). Previous studies suggest that complex social, demographic and geographic processes operate in under-five mortality (Gayawan et al., 2016, Adebayo and Fahrmeir, 2005, Kandala and Ghilagaber, 2006). According to Gayawan et al. (2016), under-five mortality is spatially structured, with adjusted mortality risks displaying similar patterns among neighboring regions; this might be attributed to common childhood disease prevalence, general healthcare practices, similar poverty levels and acute malnutrition caused by food insecurity. Likewise, Niragire et al. (2017) and Ayele et al. (2015) indicated that there is substantial district-level spatial variation in child mortality. Childhood mortality is an increasing public health issue in Ethiopia, with spatial variation across regions (Ayele et al., 2015). Besides, decision makers are often interested in the distribution of health outcomes by geographical region and socioeconomic factors, and in how these factors exert influence at different locations (Gayawan et al., 2016). However, previous works on childhood mortality in Ethiopia have generally failed to incorporate spatial aspects (Takele, 2014, Gemechis and Kumar, 2010, Getiye, 2011, Ayele et al., 2016).

Furthermore, undernourished children are at higher risk of illness, death, grade repetition and school dropout, and of being less productive. These consequences are costly to the health system, the education system and families, and child deaths related to undernutrition reduce the workforce, which signifies lost economic productivity (COHA, 2012). According to COHA (2012), ECSA (2011) and EHICES (2011), 67% of the adult population in the country suffered from stunting as children, and eliminating malnutrition in Ethiopia is a necessary step for growth and transformation.

In addition, various studies in the literature have applied Bayesian models to the nutritional status of children under five (Takele, 2013, Adebayo, 2002, Kandala, 2001, Lang et al., 2002, Fahrmeir and Lang, 2001, Raach, 2005, Khaled, 2007). However, all these studies used univariate methods, whereas multivariate approaches are difficult to implement when the number of biomarkers is large, since most of them require integrating over the vector of all random effects to evaluate the joint likelihood of the multivariate data. A very recent exception is Habyarimana et al. (2016), who studied a joint modelling approach for child malnutrition data. Apart from these limited previous works, geographical determinants of child malnutrition and mortality have been neglected in Ethiopia. Unaccounted spatial effects, nonlinear effects and random effects, and the simultaneous estimation of child anthropometric measurements, remain open to research. Thus, to bridge the gap, this study proposes an approach for jointly modelling multivariate child nutritional status and mortality that readily incorporates linear, nonlinear, random and spatial effects simultaneously.

The other persistent problem in Ethiopia is tuberculosis (TB). The African region, which comprises 54 countries, contributes 26% of the global burden of TB, ranking second to Asia, which contributes 59% of the global case load (WHO, 2012). Further, sub-Saharan Africa continues to lead in the prevalence and incidence of major infectious disease killers, including HIV/AIDS, TB and malaria (Mboowa and Gerald, 2014). In particular, Ethiopia ranks 3rd in Africa and 8th among the 22 highest TB-burden countries in the world (WHO, 2009). In addition, studies have pointed to problems with treating an entire region with one control strategy rather than targeting high-risk areas with more effective control measures (Assuncao et al., 1998; Anselin, 1995; Bailey and Gatrell, 1995). However, very little work has been done on spatiotemporal models applied to tuberculosis cases (Tabatabee et al., 2015, Venkatesan and Srinivasan, 2008, Kipruto et al., 2015, Bailey and Gatrell, 1995). In addition, few studies have modelled tuberculosis cases by incorporating temporal components and covariate effects. Furthermore, flexible random-effect models that can capture skewness have not been employed.

Therefore, modelling child malnutrition, child mortality and the dynamics of TB, together with the associated demographic, socio-economic and health disparities, is important in its own right for designing appropriate policies to combat them. The purpose of this study is therefore to model child malnutrition, child mortality and the dynamics of TB within a spatial framework.

Statement of the Problem

Childhood malnutrition is an issue of great public health importance throughout developing countries because of its contribution to child mortality and disability-adjusted life years. Malnutrition makes a child more vulnerable to infection, and infection in turn contributes to malnutrition. Besides, various studies have verified that childhood malnutrition and mortality are mutually reinforcing problems (Fahrmeir & Khatab, 2008, Khatab, 2010, Takele, 2013, Ayele et al., 2016, Kandala, 2011). Indeed, child mortality rates are not only influenced by socioeconomic, demographic and health variables; they also vary considerably across regions and districts in developing countries (Adebayo and Fahrmeir, 2002). Alkema et al. (2014) developed the Bayesian B-spline bias-reduction method to estimate the under-five mortality rate (U5MR) at the national level. However, subnational disparity is of great interest and has been highlighted as such in the Sustainable Development Goals (UN, 2015). Consequently, space-time modelling and explanation of the spatial clustering and risk factors of child mortality, by joining socio-economic, demographic and location data at the regional level, will be a novel attempt.

Because of methodological constraints, it is difficult to detect nonlinear covariate effects adequately, and impossible to recover district-specific spatial effects, with common Bayesian linear regression analysis or other classical models. Recent investigations have therefore applied geoadditive regression models (Fahrmeir and Lang, 2001; Kneib and Lang, 2004, Fahrmeir and Khatab, 2008, Ayele et al, 2017; Niragire et al., 2017; Adebayo et al, 2005; Kandala et al, 2006). However, most previous studies in the country ignored spatial effects and were quite limited (Takele and Taye, 2014; Gemechis and Kumar, 2010, Takele and Deressa, 2014, Tizazu, 2014; Wubet, 2013; Tadiwos et al, 2013; Mandefro et al., 2015). Failing to consider spatial dependency implies that nonlinear relationships between covariates and responses are modelled erroneously as linear, and that the statistical model fails to account for an important covariate that is itself spatially structured and thus causes spatial structuring in the responses (Legendre et al., 2002). In addition, statistical outcomes produced by classical correlation and regression are aggregates and can be misleading if generalized to all local areas.

In addition to the neglect of spatial and nonlinear covariate effects, child anthropometric measurements were not estimated simultaneously (Takele and Taye, 2014; Gemechis and Kumar, 2010, Takele and Deressa, 2014, Tizazu, 2014; Wubet, 2013; Tadiwos et al, 2013), so multivariate regression analysis is needed. Neglecting existing association between response variables can lead to biased and inefficient estimation of covariate effects (Adebayo et al., 2002). Some approaches do exist for spatial generalized linear mixed modelling (GLMM) of public health problems (Diggle et al, 1998, Zhang, 2002, Manda et al., 2011; Kazembe and Namangale, 2007; Manda et al., 2012, Ngesa et al., 2013, Habyarimana et al., 2016). The multivariate conditional autoregressive model of Carlin and Banerjee (2003) will be considered. The advantages of the joint model over separate models include better control of type I error rates in multiple tests, possible gains in efficiency of the parameter estimates and the ability to answer intrinsically multivariate questions (Gueorguieva, 2001; Kandala et al., 2011, Habyarimana et al., 2016). The work of Habyarimana et al. (2016) will be extended to a structured additive model that includes semiparametric and spatial components to identify risk factors. To the best of our knowledge, no study in the literature has used a multivariate joint model to account for possible correlation among anthropometric indices in the study area.

The purpose of this project is to apply a spatial multivariate GLMM to child malnutrition, to compare its results with those from separate modelling of the malnutrition indicators, and to apply a geoadditive Bayesian discrete-time survival model to identify the risk factors and spatial effects of child mortality. The models will be applied to under-five child data extracted from the Ethiopia DHS, collected by the CSA of Ethiopia in 2016.

Likewise, TB is not distributed uniformly in Ethiopia, with certain regions recording higher notification rates than others. The Ethiopian national TB control program does not target services at the areas with the greatest notifications but follows a uniform strategy. TB is a disease whose distribution varies across regions depending on socio-economic status, HIV burden and the efficiency of the health system (Kipruto et al., 2015). Another study, by Randremanana et al. (2010), reveals that high-TB-risk areas were clustered and that the distribution was associated with the number of patients lost to follow-up and the number of households with more than one case.

In tuberculosis modelling, classical models are unable to handle spatial correlation and the incorporation of covariates (Lawson et al., 2003). Furthermore, few models have been developed to deal with space- and time-varying data on the dynamics of tuberculosis, especially in the developing world (Vieira, 2008 and Verver, 2004). By contrast, Bayesian spatial modelling techniques allow flexibility in random variation and in modelling covariates while borrowing information from neighboring regions, and they will be used in this study.

This thesis will consider the problems of nonlinear effects, spatial effects and estimation in spatial GLMMs, and will develop a model, together with its estimation procedure, that accounts for spatial correlation and random effects.

Research Objective

The main objective of this study is to develop a method for modelling malnutrition and mortality of children under five years and tuberculosis, together with their determinants, from DHS and hospital data.

Specific objectives are:

To deal with nonlinear effects of continuous covariates by fitting a Bayesian geoadditive model to the data.

To develop a model for child anthropometric indices using a multivariate joint model under the Bayesian GLMM approach.

To explore the correlation between the anthropometric indices in space using a spatial multivariate joint model under the SGLMM.

To model under-five child mortality and its determinants by fitting a geoadditive Bayesian discrete-time survival model (GBDTSM).

To model the dynamics of TB and identify the risk factors contributing to the distribution of TB over time and space.

Research questions

How are child anthropometric indices related to each other in space?

Are continuous covariates associated with child growth retardation?

Is TB distribution uniform over time and space in the study area?

What are the contributing risk factors for TB distribution over time in the study area?

Is the Bayesian geoadditive discrete-time survival model robust for modelling and identifying risk factors for under-five child mortality?

Research Rationale

The study of child malnutrition and mortality in a holistic way has become feasible in most developing countries, and in Ethiopia in particular, owing to the availability of representative datasets on various maternal and child health indicators collected by the DHS program using robust sampling techniques.

However, no detailed study has been conducted using spatial multivariate models. It is also very important to examine the determinants of child nutrition, mortality and tuberculosis at all levels: individual, household, neighborhood and community. This can be explored using hierarchical spatial models. Understanding the relative contributions of individual-, household- and area-level factors in predicting childhood mortality, malnutrition and TB distribution is a tool for government planners, decision makers, academics, researchers and practitioners to formulate sounder and more targeted strategies for the prevention, intervention and control of child malnutrition, mortality and TB (CMMTB), particularly in the study area.

Literature Review

Reviews on Malnourishment of Children

Good nutrition is the cornerstone of survival, health and development for current and succeeding generations. Well-nourished children perform better in school, grow into healthy adults and in turn give their children a better start in life. Malnutrition is the reason behind more than half of all child deaths worldwide (UNICEF, 2009). The plight of these children is largely invisible: three quarters of the children who die from causes related to malnutrition were only mildly or moderately undernourished, showing no outward sign of their vulnerability (WHO, 1995). The scourge of childhood malnutrition, especially in Asia and Africa, often amounting to hunger and starvation, remains a public health scandal and outrage. Fulfilling children’s rights to good nutrition, including adequate food, health and care, deserves much more attention and resources than it currently receives (Latham, 2010).

Kandala et al (2008), Khatab (2010), Kandala et al (2011) and Habyarimana et al (2016) found that geoadditive models are able to identify more refined socioeconomic and spatial influences on undernutrition than linear models with regional dummy variables. The spatial analysis shows distinct patterns that point to the influence of omitted variables with strong spatial structure, or possibly to epidemiological processes that account for this spatial structure. Research conducted in developing countries to examine spatial variation in childhood nutritional outcomes has established the importance of geographic location for an improved understanding of these outcomes (Adekanmbi et al., 2013; Kandala et al., 2011; Khatab, 2010; Wand et al., 2012).

The few studies of childhood malnutrition in Ethiopia have mainly reported the overall prevalence of malnutrition and the effects of demographic, socioeconomic and health factors linked to under-five child malnutrition (Takele and Taye, 2014; Tizazu, 2014; Wubet, 2013; Tadiwos et al, 2013; Mandefro et al., 2015), but spatial variation in under-five children’s nutritional outcomes across the country remains unstudied. To the best of our knowledge, no study in the literature has used a spatial multivariate GLMM to simultaneously identify the key determinants of underweight, stunting and wasting of children under age five in the study area.

Empirical Review on Child Mortality

The under-five child mortality rate (U5CMR) is one of the most sensitive indicators of the socio-economic and health status of a community. This is because, more than for any other age group of a population, children’s survival depends on the socioeconomic conditions of their environment (Madise, 2003, Takele and Taye, 2014). A high U5CMR is an indication of poor socio-economic development along with weak government commitment to improving the health status of the nation.

There is an established consensus that improvement in child survival requires progress on multiple fronts beyond biomedical interventions (Ayele et al, 2017; Niragire et al., 2017; Adebayo et al, 2005; Kandala et al, 2006). It is also recognized that the determinants of child mortality vary significantly across communities and countries, such that results for one country cannot be reliably generalized to another (Black et al., 2003). Such determinants exist at the individual, household and community levels (Niragire et al., 2017). Assessment of spatial effects on the hazard or survivor function is not only of interest in its own right but can also be quite useful for detecting unobserved covariates that carry spatial information (Hennerfeind et al, 2003).

Previous works on childhood mortality have been limited to examining socio-economic, demographic and health-related determinants in specific contexts and have generally failed to incorporate spatial aspects; Takele and Taye (2014), Gemechis et al (2010), Madise et al (2003) and Berger et al. (2002) are a few examples. For this reason, an important issue is the development of numerically efficient solutions for evaluating the likelihood in the presence of time-varying covariate effects and spatial effects, which will be the central aim of this work.

Review on Tuberculosis

One third of the world’s population is infected by Mycobacterium tuberculosis, the bacterium that causes TB. TB remains a major cause of morbidity and mortality in many countries and a significant public health problem worldwide. The global incidence of TB was estimated to be 139 cases per 100,000 in 2006. Developing countries account for 95% of these cases and 98% of TB deaths, which mostly affect persons in the economically productive age group (15–50 years) (Dara et al., 2009).

Shawel (2009) and Zerdo et al. (2014) employed logistic regression models and identified being an urban resident, having more than 3 visits to clinics for TB symptoms, coughing for more than 4 weeks, smoking, a previous history of treatment for pulmonary TB, poor ventilation of the cell and sharing a cell with a TB patient as risk factors for pulmonary TB. Boru and Eshete (2015) employed conventional binary logistic regression and found that a cough of more than 2 weeks and a history of TB in the group were significantly associated with pulmonary TB.

Other previous studies have shown that there is a strong correlation between the measures of income, education and social vulnerability (Roza et al., 2012 and Corbett et al., 2003). Kipruto et al. (2013) showed the inequality of TB disease distribution in Kenya and identified HIV-positive status, gender and age as significant covariates. Megersa (2013) indicated that educational status, waste disposal system, monthly income, contact history with a patient with active TB or the presence of a family member with active TB, drug adherence, knowledge of TB prevention and a history of exposure to substances were factors independently associated with the occurrence of active TB among HIV/AIDS patients taking ART.

To the best of our knowledge, no study has previously been conducted on the space-time dynamics of TB in any part of the study area. Consequently, there is a lack of scientific and geographic explanation as to where and when the disease tends to concentrate.

Review of Models

Spatial generalized linear mixed models (GLMMs) for spatially dependent non-Gaussian variables observed in a continuous region, together with minimum mean-squared error (MMSE) prediction under the Bayesian framework, were introduced by Diggle et al. (1998). Zhang (2002) developed a Monte Carlo version of the EM gradient algorithm for maximum likelihood estimation of model parameters and MMSE prediction of random effects in a GLMM, which can be implemented through the Metropolis-Hastings algorithm.

Kim et al. (2001) proposed the twofold CAR model, in which they allowed for sharing of information between neighboring regions with respect to the same disease and also between the two diseases within the same region. The multivariate conditional autoregressive (MCAR) model for modelling multiple diseases under separability assumptions was developed by Carlin and Banerjee (2003). The separability assumption dictates that the association structure decomposes into spatial and non-spatial components. The joint consideration comes in the sense that the spatial random effects were assumed to follow a joint distribution that allows for correlation of the components. They also assumed that a single parameter controlled spatial dependency in all the diseases.

Habyarimana et al. (2016) utilized the multivariate joint model under the GLMM framework to simultaneously identify the key determinants of malnutrition of children under five years based on three anthropometric indicators, to include random effects and to assess the possible correlation among these indicators. They also extended the GLMM to a spatial GLMM to include spatial variability. Ngesa et al. (2013) developed a model for the joint variation of the human immunodeficiency virus (HIV) and herpes simplex virus type 2 (HSV-2) using a Bayesian approach and the WinBUGS software. They indicated that the joint spatial modelling strategy helps to stabilize parameter estimates by borrowing strength between different diseases and between neighboring regions.

According to Dabo-Niang et al. (2014), variogram analysis and kriging are useful tools for measuring spatial dependence and achieving spatial prediction, respectively, in parametric geostatistical methods. However, the literature on nonparametric spatial modelling is much less extensive than that on parametric modelling. Tran (1990), Menezes (2012), Wang et al. (2012) and Dabo-Niang et al. (2011) have studied nonparametric variogram, density or regression problems for spatial data.

Methodology

Data Description

In this study, both primary and secondary data will be utilized. The data on malnutrition and mortality will be taken from EDHS 2016. The data on reported TB cases will be gathered from all health centers and district hospitals of East Hararge zone. They are based on yearly records of reported cases, aggregated to a single figure for each of the 19 districts. Specifically, data for the years 2011-2017 are proposed for the spatiotemporal analysis of TB prevalence.

Bayesian Geo-Additive Regression Models

Statistical spatial models have been used in diverse applications, such as engineering, geology, ecology and public health, for analyzing geographically referenced data. Advances in computing power, geographic information systems (GIS) and computational techniques such as Markov chain Monte Carlo (MCMC) allow sophisticated spatial models to be developed. In biostatistics, spatial models have been increasingly employed to analyze disease rates and develop disease maps.

For instance, spatial analyses of undernutrition are often confined to using region-specific dummy variables to capture the spatial dimension. Here, the study will go a step further by exploring district-level patterns of childhood undernutrition and possibly nonlinear effects of other factors within a simultaneous, coherent regression framework using a geo-additive semiparametric mixed model. Because the predictor contains the usual linear terms, nonlinear effects of metrical covariates and geographic effects in additive form, such models are also called geo-additive models.

The geo-additive regression model is:

$$\eta_i = f_1(x_{i1}) + \dots + f_p(x_{ip}) + f_{spat}(s_i) + w_i'\gamma \tag{1}$$

Here, $f_1, \dots, f_p$ are nonlinear smooth effects of the metrical covariates, $f_{spat}$ is the effect of the spatial covariate $s_i \in \{1, \dots, S\}$ labelling the regions in the country, and $w_i'\gamma$ collects the usual linear effects of the remaining covariates $w_i$. Regression models with predictors as in (1) are sometimes referred to as geo-additive models. The observation model (1) may be extended by including an interaction $f(x)w$ between a continuous covariate x and a binary component of w, say, leading to so-called varying coefficient models. We propose the extension of the univariate Bayesian spatial model to a multivariate setup, with the study of posterior propriety as an important step.

Spatial Multivariate GLMM

The present study will use a spatial multivariate GLMM to model non-Gaussian data. Assume that there are n distinct sites on a spatial domain where observations on p variables are collected. Let the multivariate data consist of the p-dimensional random vector $y_j = (y_{1j}, y_{2j}, \dots, y_{pj})'$ for the jth site, for $j = 1, 2, \dots, n$. Corresponding to the response $y_{ij}$, let $x_{ij} = (x_{ij1}, x_{ij2}, \dots, x_{ijq_i})'$ denote the $q_i \times 1$ vector of explanatory variables. The following two-stage hierarchical model is considered for the distribution of the np×1 vector of all observables $y = (y_1', y_2', \dots, y_n')'$. In the first stage of the hierarchy, the variables $y_{ij}$ are independent with density

$$f_{ij}(y_{ij} \mid \eta_{ij}) = C(y_{ij}) \exp\{\eta_{ij} y_{ij} - h_i(\eta_{ij})\} \tag{2}$$

where $f_{ij}$ belongs to an exponential family of densities with canonical parameter $\eta_{ij}$ and $h_i(\eta_{ij})$ is the normalizing constant satisfying

$$\exp\{h_i(\eta)\} = \int C(y) \exp\{\eta y\}\, dy \tag{3}$$

with the integral replaced by a sum for discrete responses. It can be shown (McCullagh and Nelder, 1989) that $h_i(\cdot)$ is a differentiable function whose derivative $h_i'(\cdot)$ has an inverse. In the second stage, the canonical parameter $\eta_{ij}$ is related to $x_{ij}$ via a link function in the usual linear model setup,

$$\eta_{ij} = x_{ij}'\beta_i + \epsilon_{ij} \tag{4}$$

where $\beta_i$ is a $q_i \times 1$ vector of regression coefficients and the $\epsilon_{ij}$ are error random variables. The hierarchical specification is completed by specifying the distribution of the error components $\epsilon_{ij}$, namely $\epsilon \sim N_{np}(0, D)$, where $\epsilon_j = (\epsilon_{1j}, \epsilon_{2j}, \dots, \epsilon_{pj})'$ is the p×1 error vector at the jth spatial site, $\epsilon = (\epsilon_1', \epsilon_2', \dots, \epsilon_n')'$ is the np×1 vector of all the error variables and D is the np×np covariance matrix of $\epsilon$. Such models based on an unstructured D are called random-effect GLMs or generalized linear mixed models (GLMMs).
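As a concrete instance of the first-stage density, consider a binary response such as an indicator of stunting. This choice of response is made here purely for illustration, and the symbols $\eta$ for the canonical parameter, $C$ for the base term and $h$ for the normalizing constant are assumed notation. Writing the Bernoulli probability mass function with success probability $\pi$ in canonical form gives

$$f(y \mid \eta) = \pi^{y}(1 - \pi)^{1 - y} = \exp\{\eta y - \log(1 + e^{\eta})\}, \qquad y \in \{0, 1\}, \qquad \eta = \log\frac{\pi}{1 - \pi},$$

so that $C(y) = 1$ and $h(\eta) = \log(1 + e^{\eta})$. Differentiating gives $h'(\eta) = e^{\eta}/(1 + e^{\eta}) = \pi$, which recovers the mean of the response and shows that the canonical link in this case is the logit.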

Multivariate Conditional Autoregressive (MCAR)

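In the spatial framework, the distribution of the error vector, and hence D, can be given a more concrete structure based on neighboring dependencies. Following Mardia (1988), we define conditions under which the conditional multivariate distributions uniquely determine the corresponding multivariate joint probability density function. Let $\phi = (\phi_1', \phi_2', \dots, \phi_p')'$, where each $\phi_i$ is an n×1 vector. Then $\phi$ is an np×1 vector. Also let $\phi$ have a multivariate Gaussian distribution with mean 0 and dispersion matrix D, written as

$$\phi \sim N_{np}(0, D) \tag{5}$$

where D is an np×np symmetric and positive definite matrix. It is informative to view D as a p×p block matrix with n×n blocks $D_{ij}$. Then, from Mardia (1988), the full conditional distributions are Gaussian with moments

$$E(\phi_i \mid \phi_j, j \neq i) = \sum_{j \neq i} C_{ij}\phi_j, \qquad \mathrm{Var}(\phi_i \mid \phi_j, j \neq i) = \Gamma_i \tag{6}$$

This implies that the full conditional probability density functions are

$$p(\phi_i \mid \phi_j, j \neq i) \propto \exp\Big\{-\tfrac{1}{2}\Big(\phi_i - \sum_{j \neq i} C_{ij}\phi_j\Big)'\,\Gamma_i^{-1}\,\Big(\phi_i - \sum_{j \neq i} C_{ij}\phi_j\Big)\Big\} \tag{7}$$

where $\Gamma_i$ and $C_{ij}$ are n×n matrices, and $\Gamma_i$ is also symmetric and positive definite. We now write $\Gamma_i$ and $C_{ij}$ in terms of $B = D^{-1}$, the precision matrix of the joint distribution, partitioned into n×n blocks $B_{ij}$, as $\Gamma_i = B_{ii}^{-1}$ and $C_{ij} = -B_{ii}^{-1}B_{ij}$. If we set $\Gamma$ to be a block diagonal matrix with blocks $\Gamma_i$ and C to be a partitioned matrix with blocks $C_{ij}$ and $C_{ii} = 0_{n \times n}$, then $B = \Gamma^{-1}(I - C)$ and $\phi \sim N_{np}(0, (I - C)^{-1}\Gamma)$, which is said to be distributed as MCAR. During implementation, give appropriate prior, most often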
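To make the idea of a neighbourhood-based structure for the joint precision matrix more tangible, the sketch below assembles one common separable form of an MCAR precision matrix, in the spirit of the separability assumption of Carlin and Banerjee (2003): a proper CAR precision matrix for the sites, combined through a Kronecker product with a between-variable precision matrix. The four-site adjacency matrix, the value of `rho`, the matrix `between_precision` and the helper names are all hypothetical and chosen only for illustration; this is not the model that will finally be fitted.

```python
def car_precision(adjacency, rho):
    """Proper CAR precision matrix D_w - rho * W.

    adjacency is a symmetric 0/1 matrix W (list of rows) and D_w is the
    diagonal matrix holding the number of neighbours of each site.
    """
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - rho * adjacency[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def kronecker(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ik * b_jl for a_ik in a_row for b_jl in b_row]
        for a_row in a
        for b_row in b
    ]


# Hypothetical example: 4 districts on a line, 2 response variables.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rho = 0.9                                        # assumed spatial dependence
between_precision = [[1.0, -0.5], [-0.5, 1.0]]   # assumed p x p precision

# Variable-major ordering: p x p blocks, each of size n x n.
joint_precision = kronecker(between_precision, car_precision(adjacency, rho))
print(len(joint_precision), len(joint_precision[0]))
```

With the variables stacked block by block, each n×n block of the result is the site-level CAR precision scaled by one entry of the between-variable matrix, which is what a single shared spatial dependence parameter amounts to.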

Variogram: The variogram is a tool in the analysis of spatial data that describes the nature of spatial covariance in a convenient manner. Cressie (1993) gives a detailed treatment of the variogram. A variogram represents the average variance between observations separated by the distance h; half of this quantity, $\gamma(h)$, is the semivariogram. It is given by (Journel, 1978, Habyarimana et al., 2016) as

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \big[z(s_i) - z(s_i + h)\big]^2 \tag{8}$$

where $z(s_i)$ is the measurement at location $s_i$ and N(h) is the number of pairs of sampled points separated by distance h.
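The empirical semivariogram can be sketched in a few lines. The example below assumes the classical estimator (half the mean squared difference between pairs of measurements, grouped by separation distance) and uses made-up coordinates and values; the function name, the binning rule and the lag width are illustrative assumptions rather than choices fixed by this proposal.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Half the mean squared difference of value pairs, per distance bin.

    Bin k collects pairs whose separation lies in
    [k * lag_width, (k + 1) * lag_width). Empty bins return None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.dist(coords[i], coords[j])
            k = int(dist // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical data: four sites on a line with a rising measurement.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariogram(coords, values, lag_width=1.0, n_lags=4)
print(gamma)
```

A semivariogram that increases with distance, as in this toy example, indicates that nearby sites are more alike than distant ones.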

Non-Gaussian spatial problems can be analyzed in the context of generalized linear mixed models, where the specification of the likelihood of the random variable is required. The spatial process can be incorporated as $Y(s_i) \mid S$, which is assumed to be conditionally independent across locations $s_i$ with conditional mean $\mu(s_i) = E[Y(s_i) \mid S]$, where $S$ is the spatially correlated random effect. The spatially correlated random effect is therefore incorporated into the linear predictor as

$$g(\mu) = X\beta + Wb + S \tag{9}$$

where $g$ is the link function, X and W are the design matrices, and the random effects satisfy $S \sim N(0, \Sigma_S(\theta))$ and $b \sim N(0, \sigma_b^2 I)$, with the spatial correlation parameterized by $\theta$ in $\Sigma_S(\theta)$ (Schabenberger and Gotway, 2005).

Spatial dependence: this arises when the response variable is spatially structured because it depends on covariates that are themselves spatially structured by their own generating processes (Legendre et al., 2002). The mathematical equation is:

$$y_j = \mu_y + f(\text{covariates}_j) + \varepsilon_j$$

where the value taken by a dependent variable y at site j is the overall regional mean $\mu_y$ of the variable, modulated by adding the local effect of the explanatory variables at site j, plus a random error component $\varepsilon_j$.

Estimation of Spatial MGLMM

Let $\{b(s), s \in R^2\}$ be Gaussian with $E\{b(s)\} = 0$ for all s. If, conditionally on $\{b(s), s \in R^2\}$, $\{Y(s), s \in R^2\}$ is an independent process and, for each s, the distribution of Y(s) depends on b(s) only, then for any $s, s_1, s_2, \dots, s_n$ (Shiryayev, 1984)

$$E\{b(s) \mid Y\} = \sum_{i=1}^{n} c_i\, E\{b(s_i) \mid Y\} \tag{10}$$

where the coefficients $c_i$ are such that $E\{b(s) \mid b(s_i), i = 1, 2, \dots, n\} = \sum_{i=1}^{n} c_i b(s_i)$ and $Y = \{Y(s_i), i = 1, 2, \dots, n\}$. For any sampling site $s_i$, $E\{b(s_i) \mid Y\}$ cannot be given in closed form when Y is not Gaussian, but it can be approximated by Monte Carlo samples. Other estimation procedures such as ML, REML, MMSE, INLA and MCEMGA will also be employed, and robust estimators are suggested.

Spatial Kernel Density Estimation

Suppose that there exists a collection of density functions $\{f_i, i \in \mathbb{N}^N\}$ of $\{X_i, i \in \mathbb{N}^N\}$ and that we want to estimate $f_{i_0}$ at a fixed site $i_0$. Since $f_i$ is close to $f_{i_0}$ when i is close to $i_0$, if there are enough such sites $i_1, \dots, i_M$ close to $i_0$, then the variables $x_{i_1}, \dots, x_{i_M}$ can be used to estimate $f_{i_0}$. Classically, the spatial kernel density estimator of f is given by Tran (1990) as

$$f_n^{(0)}(x) = \big(\hat{n}\, b_n^d\big)^{-1} \sum_{i \in I_n} K\Big(\frac{x - X_i}{b_n}\Big), \quad x \in R^d \tag{11}$$

where K is a kernel, $b_n$ is a sequence of bandwidths that tends to zero as n goes to infinity and $\hat{n} = n_1 n_2 \cdots n_N$ is the sample size. Under strict stationarity, $f_n^{(0)}(x_k) = f_n^{(0)}(x_j)$ for all $x_k = x_j$, even if $k \neq j$, so this estimator does not take spatial dependence into account. As a solution, Dabo-Niang et al. (2014) developed a modified version of the kernel estimator (11) that accounts for spatial dependence, and it will be used in this work. Let $\{X_i, i \in \mathbb{N}^N\}$ be a spatial process observed at any $i \in I_n$, with $X_i \in R^d$. Then the new kernel density estimator of f is defined by Dabo-Niang et al. (2014) as

$$f_n(x_j) = \frac{1}{\hat{n}\, b_n^d\, \rho_n^N} \sum_{i \in I_n} K_1\Big(\frac{x_j - x_i}{b_n}\Big) K_{2,\rho_n}(\lVert i - j \rVert)$$

for each fixed $x_j \in R^d$ located at $j \in I_n$, where $K_{2,\rho_n}(\lVert i - j \rVert) = C_{nj} K_2(\lVert t_i - t_j \rVert / \rho_n)$, $t_i = i/n = (i_1/n_1, \dots, i_N/n_N)$, $C_{nj} > 0$ is a normalizing constant eventually equal to one, and $K_1$ and $K_2$ are kernels defined on $R^d$ and $R$, respectively. $x_i$ and $x_j$ are nearly independent when $\lVert i - j \rVert$ is large.
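As a small illustration of the classical estimator (11), the sketch below evaluates a kernel density estimate in one dimension (d = 1) with a Gaussian kernel. The sample of z-scores, the bandwidth and the function names are hypothetical, and the spatially weighted version of Dabo-Niang et al. (2014) is deliberately not implemented here.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, sample, bandwidth):
    """Classical estimator (n * b)^(-1) * sum_i K((x - X_i) / b) for d = 1."""
    n = len(sample)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)


# Hypothetical height-for-age z-scores observed at five sites.
sample = [-2.1, -1.4, -1.0, -0.3, 0.5]
bandwidth = 0.5  # assumed value; in practice it would be chosen from the data

for x in (-2.0, -1.0, 0.0):
    print(x, kernel_density(x, sample, bandwidth))
```

A smaller bandwidth follows the individual observations more closely, while a larger one gives a smoother estimate; the estimate integrates to one for any positive bandwidth.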

Geoadditive Bayesian Discrete-Time Survival Model

Classical parametric regression models for analyzing child mortality or survival have severe problems with estimating small area effects and simultaneously adjusting for other covariates, in particular when some of the covariates are nonlinear or time varying (Kandala et al, 2006, Adebayo et al, 2005, Adebayo et al, 2002). Consequently, flexible semiparametric approaches are needed which allow one to incorporate small-area spatial effects, nonlinear or time-varying effects of covariates and usual linear effects in a joint model.

In discrete-time survival analysis, we consider a sample of $n$ live births and the set of observations $\{(t_i, x_i, y_i, \mathrm{reg}_{s_i}),\ i = 1, \dots, n\}$, $s = 1, \dots, 11$, where $t_i \in \{1, \dots, 1825\}$ records the number of days child $i$ survived (until death or the end of the observation period), $x_i = (x_{i1}, \dots, x_{ip})'$ is a vector of observations on $p$ metrical covariates, $\mathrm{reg}_{s_i}$ is the region $s$ in which child $i$ was living at the time of the survey, and $y_i = (y_{i1}, \dots, y_{iq})'$ is a vector of $q$ observations on categorical covariates. Following Adebayo and Fahrmeir (2005), Hennerfeind et al. (2006) and Niragire et al. (2017), the geoadditive hazard regression model for child $i$ residing in region $s$ is defined as

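$$h_i(t) = \exp\left(f_0(t) + f_1(x_{i1}) + \dots + f_p(x_{ip}) + f_{spat}(\mathrm{reg}_{s_i}) + y_i'\gamma\right) \qquad (12)$$

Equation (12) can be written as $h_i(t) = \exp(\eta_i(t))$, where $\eta_i(t)$ is the geoadditive predictor:

$$\eta_i(t) = f_0(t) + f_1(x_{i1}) + \dots + f_p(x_{ip}) + f_{spat}(\mathrm{reg}_{s_i}) + y_i'\gamma \qquad (13)$$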

Indicators of Spatial Autocorrelation

Spatial autocorrelation identifies patterns of spatial dependency by calculating the correlation of a variable with itself across geographic space (Cliff and Ord, 1981). In this study, the global Moran’s I and Geary’s ratio C statistics will be applied to investigate the spatial autocorrelation and distribution pattern of TB in the study area. The value of Moran’s I is calculated from the deviations from the mean of pairs of neighboring values (Moran, 1950). The formula is as follows.

$$I = \frac{n}{W} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^2} \qquad (14)$$

where $n$ is the sample size, $\bar{x}$ is the mean of the variable, $x_i$ is the value of the variable at a particular location $i$, $x_j$ is the value at location $j$, $w_{ij}$ is a spatial weight indexing the location of $i$ relative to $j$, and $W = \sum_{i}\sum_{j} w_{ij}$ is the sum of all spatial weights. The value of Moran’s I typically ranges from -1 for negative spatial autocorrelation to 1 for positive spatial autocorrelation. Its significance is evaluated using a z score and the corresponding p value. The null hypothesis states that there is no spatial autocorrelation for the variable within the geographic area.

Similar to Moran’s I method of measuring spatial autocorrelation, Geary’s ratio C also adopts a cross-product term (Getis, 1991). Geary’s ratio is formally defined as

$$C = \frac{(n-1)\sum_{i}\sum_{j} w_{ij}\,(x_i - x_j)^2}{2W\sum_{i}(x_i - \bar{x})^2} \qquad (15)$$

Geary’s ratio C ranges from 0 to 2: 0 indicates perfect positive spatial autocorrelation (i.e., all neighboring values are the same), 1 indicates no spatial autocorrelation, and 2 indicates negative spatial autocorrelation. Furthermore, the local versions of Moran’s I and Geary’s ratio C, together with the Moran scatter plot, will be used to examine the level of spatial autocorrelation at the local scale.
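As a small worked illustration of the two global statistics, the following Python sketch computes Moran’s I and Geary’s ratio C from their standard definitions, with $W$ taken as the sum of all spatial weights. The four districts arranged on a line, the binary adjacency weights and the data values are hypothetical and serve only to show how the statistics behave.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a square spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    total_weight = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


def gearys_c(x, w):
    """Global Geary's ratio C for values x and a square spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    total_weight = sum(sum(row) for row in w)
    sq_diff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * sq_diff / (2.0 * total_weight * sum((xi - mean) ** 2 for xi in x))


# Hypothetical example: four districts on a line, neighbours share a border.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1.0, 2.0, 3.0, 4.0]       # similar values in neighbouring districts
alternating = [1.0, 4.0, 1.0, 4.0]  # dissimilar values in neighbouring districts

print(morans_i(smooth, weights), gearys_c(smooth, weights))
print(morans_i(alternating, weights), gearys_c(alternating, weights))
```

For the smoothly increasing values, Moran’s I is positive and Geary’s ratio C is below 1, while for the alternating values Moran’s I is negative and Geary’s ratio C is above 1, which matches the interpretation given above.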

Spatiotemporal Analysis

In this study, a generalized linear model assuming a Poisson distribution with spatial and temporal random effects will be used to characterize the relationships between TB cases and covariates. For count data, let $y_{ij}$ denote the count of TB cases for the $i$th district at the $j$th time period, with $m$ districts and $t$ periods. The number of TB cases in each district at each time period is then assumed to be Poisson-distributed, that is,

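$$y_{ij} \sim \text{Poisson}(\lambda) \qquad (16)$$

Given location-dependent covariates $x_1, x_2, \dots, x_m$ and temporally dependent covariates $z_1, z_2, \dots, z_t$:

$$P(Y = k \mid x_1, x_2, \dots, x_m) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k = 0, 1, 2, \dots \qquad (17)$$

where $\lambda$ is the Poisson mean. The maximum likelihood estimation procedure will be used to estimate the model parameters.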
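To make the Poisson probability in (17) and the idea of maximum likelihood estimation concrete, the following Python sketch evaluates the probability mass function and the log-likelihood for a set of counts. It is a deliberately simplified illustration: the counts are hypothetical, and a single common mean without covariates or random effects is assumed, which is not the full model proposed here. Under that assumption the maximum likelihood estimate of the mean is the sample mean.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) = exp(-lam) * lam**k / k! for a Poisson mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_loglik(counts, lam):
    """Log-likelihood of independent Poisson counts with a common mean lam."""
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in counts)


def poisson_mle(counts):
    """With a common mean, the log-likelihood is maximised at the sample mean."""
    return sum(counts) / len(counts)


counts = [3, 0, 2, 5, 1, 4]  # hypothetical TB counts for six district-periods
lam_hat = poisson_mle(counts)
print(lam_hat, poisson_loglik(counts, lam_hat))
```

When covariates enter the model, the mean is no longer constant and the estimates have to be obtained numerically, but the likelihood being maximised has the same form.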

Furthermore, this study will consider Bayesian Hierarchical Generalized Linear Mixed Models (BHGLMMs), which are used in small area estimation because of their ability to incorporate multiple levels of model dependencies (Fong et al., 2009 and Cnaan et al., 1997). Integrated nested Laplace approximation (INLA), developed by Rue et al. (2009), is a relatively new approach for Bayesian inference on latent Gaussian models that is based on nested Laplace approximations. It offers good accuracy at reduced computational time (Grilli et al., 2014, Taylor and Diggle, 2013, Cameletti et al., 2013 and Rue et al., 2009).

Three candidate models will be employed for selection:

M1: All variables fixed

M2: All variables fixed + spatial effect

M3: All categorical variables fixed + nonlinear effects of continuous variables + spatial effect

References

Adebayo SB, Fahrmeir L, 2005. Analyzing child mortality in Nigeria with geoadditive discrete-time survival models. Stat Med.

Adebayo SB, Fahrmeir L, 2002. Analyzing Child Mortality in Nigeria with Geoadditive Survival Models.

Alkema, L., J. R. New, J. Pedersen, D. You, et al. 2014. Child mortality estimation 2013: an overview of updates in estimation methods by the United Nations Inter-Agency Group for Child Mortality Estimation. PLoS ONE.

Anselin L. 1995. Local Indicators of Spatial Association - LISA. Geographical Analysis.

Assunção RM, Barreto SM, Guerra HL, Sakurai E. 1998. Mapas de taxas epidemiológicas: uma abordagem Bayesiana. Cad Saúde Pública.

Ayele D.G., Temesgen T. Z. and Hemry M. 2017. Survival analysis of under-five mortality using Cox and frailty models in Ethiopia. Journal of Health, Population and Nutrition.

Ayele DG, Zewotir T, Mwambi H. 2016. Indirect child mortality estimation technique to identify trends of under-five mortality in Ethiopia. Afr Health Sci.

Bailey TC, Gatrell AC. 1995. Interactive Spatial Data Analysis. Longman Scientific and Technical.

B.P. Carlin and S. Banerjee. 2003. Hierarchical multivariate CAR models for spatiotemporally correlated survival data. Bayesian Statistics.

Cameletti, M., Lindgren, F., Simpson, D., & Rue, H. 2013. Spatio-temporal modeling of particulate matter concentration through the SPDE approach. AStA Advances in Statistical Analysis.

Cliff, A., J. K. Ord. 1981. Spatial Processes Models and Applications. Pion, London.

Cnaan, A., Laird, N.M., & Slasor, P. 1997. Using the General Linear Mixed Model to analyze unbalanced repeated measures and longitudinal data. Statistics in Medicine.

COHA, 2012. The Cost of Hunger in Africa: The Social and Economic Impact of Child Undernutrition in Ethiopia.

Corbett EL, Watt CJ, Walker N, Maher D, Williams BG, Raviglione MC. 2003. The growing burden of tuberculosis: global trends and interactions with the HIV epidemic. Archives of Internal Medicine.

Dara, M. Grzemska, M. Kimerling, E. Reyes, H. Zagorskiy, A. 2009. Guidelines for control of tuberculosis in prison. ICRC.

Dabo-Niang S, Rachdi M, Yao AF. 2011. Kernel regression estimation for spatial functional random variables. Far East J Theor Stat.

F. Niragire, Th. N.O. Achia, Alexandre L., J. Ntaganira, 2017. Child mortality inequalities across Rwanda districts: a geoadditive continuous-time survival analysis. Geospatial Health.

Fong, Y., Rue, H. & Wakefield, J. 2009. Bayesian inference for generalized linear mixed models. Biostatistics.

E.Gayawan, M. Adarabioyo, D.M. Okewole, Stephen G. Fashoto and Joel C. Ukaegbu. 2016. Geographical variations in infant and child mortality in West Africa: a geo-additive discrete-time survival modelling.

Gemechis F. and Kumar, P. 2010. Infant and Child Mortality in Ethiopia: A Statistical Analysis Approach. Ethiopian Journal of Science and Education.

Getis, A. 1991. Spatial Interaction and Spatial Autocorrelation: A Cross-product Approach. Environment and Planning.

Grilli, L., Metelli, S., & Rampichini, C. 2014. Bayesian estimation with integrated nested Laplace approximation for binary logit mixed models. Journal of Statistical Computation and Simulation.

Habyarimana F., Zewotir T., Ramroop S. and Ayele D. G. 2016. Spatial Distribution of Determinants of Malnutrition of Children under Five Years in Rwanda: Simultaneous Measurement of Three Anthropometric Indices. Journal of Human Ecology.

Hennerfeind A., Brezger A., Fahrmeir L. 2003. Geoadditive survival models. Working paper.

Kandala N.B, Fahrmeir L., Klasen S., Priebe J., 2008. Geo-additive models of childhood undernutrition in three sub-Saharan African countries.

Kandala N-B and Ghilagaber G, 2006. A geo-additive Bayesian discrete- time survival model and its application to spatial analysis of childhood mortality in Malawi.

Kandala, NB.; Fahrmeir, L and Klasen, S. 2002. Geo-additive Models of Childhood Undernutrition in Three Sub-Saharan African Countries. Sonderforschungsbereich.

Kandala, N. B., T. P. Madungu, J. B. Emina, K. P. Nzita, and F. P. Cappuccio. 2011. Malnutrition among children under the age of five in the Democratic Republic of Congo (DRC): does geographic location matter? BMC Public Health.

Khaled, K. 2010. Child Malnutrition in Egypt Using Geoadditive Gaussian and Latent Variable Models.

Kipruto H, Mung’atu J, Ogila K, Adem A, Mwalili S, Kibuchi E, Ong’ang’o JR, Sang G. 2015. Spatial Temporal Modelling of Tuberculosis in Kenya Using Small Area Estimation. International Journal of Science and Research.

K.V. Mardia. 1988. Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. Journal of Multivariate Analysis.

Latham, MC; Jonsson, U, and Sterken, E, Kent, G. 2010. RUTF Stuff: Can the Children be Saved with Fortified Peanut Paste? World Nutrition. Journal of the World Public Health Nutrition Association.

Legendre, P., Dale, M. R. T., Fortin, M.-J., Gurevitch, J., Hohn, M. and Myers, D. 2002. The consequences of spatial structure for the design and analysis of ecological field surveys. Ecography.

Mandefro A., Mekitie W., Mohammed T. and Lamessa D. 2015. Prevalence of under nutrition and associated factors among children aged between six to fifty-nine months in Bule Hora district, South Ethiopia.

McLeod, K.S. 2000. Our Sense of Snow: The Myth of John Snow in Medical Geography. Social Science and Medicine.

McCullagh, P., and Nelder, J.A. 1989. Generalized Linear Models (2nd ed.), London: Chapman and Hall.

Ministry of Health of Ethiopia, 2008. Health and Health Related Indicators. Addis Ababa.

N.A.C. Cressie. 1993. Statistics for Spatial Data. Wiley-Interscience.

Moran, PA. 1950. Notes on Continuous Stochastic Phenomena. Biometrika.

O. Ngesa, T. Achia, and H. Mwambi. 2013. Spatial Joint Disease Modeling and Mapping with Application to HIV and HSV-2. In Proceedings of the 55th Annual Conference of the South African Statistical Association.

Raach, A.W. 2005. A Bayesian semiparametric latent variable model for binary, ordinal and continuous response. Dissertation, Department of Statistics, University of Munich.

Roza, DL, Caccia B, Martinez EZ. 2012. Spatio-temporal patterns of tuberculosis incidence in Ribeirão Preto, State of São Paulo, southeast Brazil, and their relationship with social vulnerability: a Bayesian analysis. Rev. Soc. Bras. Med. Trop.

Rue, H., Martino, S., & Chopin, N. 2009. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society.

Tadiwos Z., Dagnet A. 2013. Determinants of Child Malnutrition: Empirical Evidence from Kombolcha District of Eastern Hararghe Zone, Ethiopia. Quarterly Journal of International Agriculture.

Takele K. and Taye A. 2014. Bayesian modelling of growth retardation among children under five years old in Ethiopia. Far East Journal of Theoretical Statistics.

Takele K. 2013. Semi-parametric analysis of children nutritional status in Ethiopia. International Journal of Statistics and Applications.

Taylor, B. M., & Diggle, P. J. 2013. INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. Journal of Statistical Computation and Simulation.

Tizazu B. 2014. Stunting Status of Under-Five Children in Rural Ethiopia: Multilevel Logistic Regression Analysis.

Tran LT. 1990. Kernel density estimation on random fields. J Multivariate Anal.

Venkatesan P and Srinivasan R. 2008. Applied Bayesian statistical Analysis. Proceeding of NSABSA.

Verver S, Warren RM, Munch Z, et al. 2004. Transmission of Tuberculosis in a High Incidence Urban Community in South Africa. Int J. Epidemiology.

Vieira RC, Prado TN, Siqueira MG, Dietze R, Maciel EL. 2008. Spatial distribution of new tuberculosis cases in Vitória, state of Espírito Santo, between 2000 and 2005. Rev Soc Bras Med Trop.

Wand, H., N. Lote, I. Semos, and P. Siba. 2012. Investigating the spatial variations of high prevalences of severe malnutrition among children in Papua New Guinea: results from geoadditive models. BMC Research Notes.

Wang H, Wang J, Huang B. 2012. Prediction for spatiotemporal models with auto regression in errors. J Nonparametric Stat.

WHO. 1995. Physical Status: The Use and Interpretation of Anthropometry. WHO Technical Report Series No. 854. Geneva.

Wubet K., 2013. Determinants of child malnutrition in Ethiopia. Ethiopian Economic Association.