Letter to the Editor | Volume 135, Issue 7, P1903-1905, July 2015

Geographical and Temporal Correlations in the Incidence of Lyme Disease, RMSF, Ehrlichiosis, and Coccidioidomycosis with Search Data

      Abbreviations

      CDC, Centers for Disease Control and Prevention
      GT, Google Trends
      TO THE EDITOR
      Public health initiatives depend on timely data collection and dissemination of information. Recently, digital surveillance systems using “big data,” such as internet search metrics or online news stories, have detected disease outbreaks ahead of official reports: they flagged severe acute respiratory syndrome 2 months before publication by the World Health Organization and reported on a strange fever in Guinea 9 days before the official information release on the current Ebola epidemic in West Africa (Anema et al., 2014; Milinovich et al., 2015). Surveillance systems using search metric analyses such as Google Trends (GT) have shown promise in tracking influenza in real time, faster than traditional data collection on influenza, which typically lags 12–14 days behind (Ginsberg et al., 2009).
      Epidemiological studies using search metrics assume that people who fall ill with a particular disease will search for it online, so that the volume and geographical location of such searches can be interpreted as a proxy for disease incidence and location. Initial flaws in methodology resulted in an overestimation of influenza incidence because search queries were overly influenced by media publicity rather than disease activity (Lazer et al., 2014). Newer algorithms that take better account of such confounding factors are now being tested (Santillana et al., 2014), and GT can now show major news stories on the same time line. Indeed, some emergency departments have demonstrated that such data may successfully be used to predict staffing and vaccine stocking needs (Araz et al., 2014; Thompson et al., 2014).
      Although increasingly used in other fields of medicine, “big data” has so far seen little use in dermatology. In this study, we use GT to identify the geographical and seasonal trends in three tickborne diseases (Lyme disease, ehrlichiosis, and Rocky Mountain spotted fever (RMSF)) and one fungal disease (coccidioidomycosis). Such diseases are highly relevant to dermatologists, who may be the first to diagnose them via their cutaneous manifestations (Supplementary Table S1 online). We then compare these search data with traditional Centers for Disease Control and Prevention (CDC) data on actual disease events, which we hypothesized would correlate with search data and thereby demonstrate the utility of this resource for tracking and predicting these dermatologically relevant infectious diseases.
      Tickborne diseases are most prevalent in the summer months (Figure 1) because of the life cycle of the tick vector and the increase in human outdoor activities (Dana, 2009; Shapiro, 2014). We demonstrated a correlation between monthly Google search frequency and the actual seasonal incidence of the tickborne diseases (Lyme r=0.69, P<0.0001; ehrlichiosis r=0.59, P<0.0001; RMSF r=0.46, P<0.0001; Table 1 and Supplementary Materials and Methods online). Unlike the tickborne diseases, coccidioidomycosis does not have a seasonal incidence peak according to the CDC data. Fittingly, our analysis showed only a weak seasonal correlation between GT and CDC data (r=0.42, P=0.0009; Table 1). That this weak correlation still reaches statistical significance is likely due to the much larger data set we analyzed, which allows even subtle correlations to be detected. If we reduce our data to only 1 year, all of the tickborne seasonal correlations remain significant (P<0.05 for 2012 alone), but the coccidioidomycosis correlation no longer reaches statistical significance (e.g., P=0.14 for 2012 analyzed alone).
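      The seasonal statistics above are Pearson correlations with 95% confidence intervals. The following Python sketch shows one standard way to compute such values, using the Fisher z-transformation for the interval. The function names are illustrative, and the sample size of 72 paired monthly values (12 months over 6 years) is an assumption, as the letter does not state it; this is not the authors' code.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Assumption: n = 72 monthly pairs (2007 to 2012) for Lyme disease.
low, high = fisher_ci(0.6912, 72)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

      Under that assumed sample size, the interval for the Lyme disease coefficient comes out close to the 0.5471–0.7955 range reported in Table 1a.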
      Figure 1. Temporal correlation between Lyme disease search queries and Centers for Disease Control and Prevention (CDC) Morbidity and Mortality Weekly Report (MMWR) data. Open box plot shows averages and standard deviations of CDC-reported Lyme disease cases for each month from 2007 to 2012. Solid circle plot shows Google search query average frequencies and standard deviations from 2007 to 2012 for the search topic Lyme disease. GT Search Frequency % denotes the format of GT data, which normalizes search frequency for each search term from 0 to 100%. GT, Google Trends.
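      Figure 1 summarizes each calendar month as an average and standard deviation across 2007 to 2012, with GT values on a 0 to 100 scale. A minimal Python sketch of that kind of summary follows. The function names and the peak-based rescaling are illustrative assumptions; they are neither the authors' code nor a description of how Google normalizes its data internally.

```python
from statistics import mean, stdev


def scale_to_percent(values):
    """Rescale a series so that its maximum equals 100."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("series needs a positive maximum")
    return [100.0 * v / peak for v in values]


def monthly_profile(by_year):
    """Mean and sample SD for each calendar month across years.

    by_year maps a year to a list of 12 monthly values (January first).
    At least two years are needed for the sample SD to be defined.
    """
    profile = []
    for month in range(12):
        column = [values[month] for values in by_year.values()]
        profile.append((mean(column), stdev(column)))
    return profile


# Made-up numbers for two years, purely for illustration.
example = {
    2011: [2, 2, 3, 5, 9, 14, 16, 12, 7, 4, 3, 2],
    2012: [2, 3, 4, 6, 11, 15, 18, 13, 8, 5, 3, 2],
}
for month, (avg, sd) in enumerate(monthly_profile(example), start=1):
    print(month, round(avg, 1), round(sd, 1))
```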
      Table 1. Correlation between GT and CDC geographic and temporal data

      a.                        Lyme disease    Ehrlichiosis    RMSF            Coccidioidomycosis
      Pearson’s r               0.6912          0.5926          0.4572          0.4169
      95% confidence interval   0.5471–0.7955   0.4184–0.7248   0.2521–0.6229   0.1822–0.6066
      P-value (two-tailed)      <0.0001         <0.0001         <0.0001         0.0009

      b.                        2012      2011      2010      2009      2008      2007
      Lyme disease              0.7444    0.7505    0.6104    0.6855    0.6095    0.7194
      P-value (two-tailed)      <0.0001   <0.0001   <0.0001   <0.0001   <0.0001   <0.0001
      Ehrlichiosis              0.3231
      P-value (two-tailed)      0.0346
      RMSF                      0.6386    0.5938    0.3865    0.3184    0.2904    0.06475
      P-value (two-tailed)      <0.0001   <0.0001   0.0061    0.0258    0.043     0.6654
      Coccidioidomycosis        0.4813    0.4907
      P-value (two-tailed)      0.0173    0.0174

      Abbreviations: CDC, Centers for Disease Control and Prevention; GT, Google Trends; MMWR, Morbidity and Mortality Weekly Report; RMSF, Rocky Mountain spotted fever.
      a. Pearson’s correlation coefficients and P-values derived from the comparison of cumulative GT search data and CDC MMWR monthly reports for the listed diseases between 2007 and 2012. b. Spearman’s rank correlation coefficients and P-values derived from the comparison of state-based GT search data in the mainland United States with the CDC MMWR monthly reports by state for each individual year listed. Search frequency was inadequate for state-based subanalysis of ehrlichiosis from 2007 to 2011 and of coccidioidomycosis from 2007 to 2010.
      Tickborne diseases are restricted to the habitat of the tick vector. Lyme disease cases are most prevalent in the northeastern and upper Midwestern states, corresponding to the habitat of the Lyme vector Ixodes scapularis. Coccidioidomycosis, caused by the soil-dwelling fungus Coccidioides, is prevalent in the southwestern United States (Welsh et al., 2012). Accordingly, we demonstrated a geographical correlation between the states with the most searches for a specific infectious disease and the states with the most reported new infections (for 2012, in order of decreasing correlation: Lyme r=0.74, P<0.0001; RMSF r=0.64, P<0.0001; coccidioidomycosis r=0.48, P=0.0173; ehrlichiosis r=0.32, P=0.03; Table 1 and Supplementary Materials and Methods online).
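      According to the Table 1 legend, the state-level comparison uses Spearman’s rank correlation, which compares the ordering of states by search volume with their ordering by reported cases. A minimal Python sketch of that statistic, with tied values given their average rank, is shown here. The function names and the example values are made up for illustration and are not taken from the study data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-state values: GT search index and reported case counts.
gt_index = [100, 85, 40, 22, 15, 9]
cdc_cases = [3200, 4100, 900, 150, 300, 40]
print(round(spearman_rho(gt_index, cdc_cases), 3))
```

      Because it depends only on ranks, this measure is less sensitive than Pearson’s r to a few states with very large case counts.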
      CDC infectious disease data have a typical 1–2 week reporting lag (Ginsberg et al., 2009; Lazer et al., 2014). GT has the potential to predict disease outbreaks closer to real time. In fact, when GT was dynamically recalibrated by combining it with CDC forward-projected data (based on a 2-week lag), it was more predictive of influenza incidence than CDC or GT data alone (Lazer et al., 2014).
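      One simple way to picture combining search data with lagged surveillance data is an ordinary least-squares fit of current incidence on the current GT value and the CDC value from 2 weeks earlier. The Python sketch below illustrates only that general idea. The model form, function names, and weekly indexing are assumptions made for illustration; it is not the model used by Lazer et al. or by the authors.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_lagged_model(gt, cdc, lag=2):
    """Least-squares fit of cdc[t] ~ a + b * gt[t] + c * cdc[t - lag]."""
    if len(gt) != len(cdc) or len(gt) <= lag + 3:
        raise ValueError("need equal-length series longer than lag + 3")
    rows = [(1.0, gt[t], cdc[t - lag]) for t in range(lag, len(cdc))]
    targets = [cdc[t] for t in range(lag, len(cdc))]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict(beta, gt_now, cdc_lagged):
    """Estimate current incidence from today's GT value and lagged CDC value."""
    a, b, c = beta
    return a + b * gt_now + c * cdc_lagged
```

      In use, `fit_lagged_model` would be refitted as each new CDC report arrives, and `predict` would then give an estimate for the weeks that CDC data do not yet cover.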
      As climate change alters the distribution of the Lyme disease vector, the black-legged tick (Feria-Arroyo et al., 2014; Ogden et al., 2014), or of the tick’s host, the white-footed mouse, Peromyscus leucopus (Roy-Dufresne et al., 2013), cases of Lyme disease are spreading to new locales (Robinson et al., 2014; Wang et al., 2014). In areas not normally affected by Lyme disease, “big data” may serve as a warning system that alerts physicians that the disease may be extending into their area. Such clinical tips may allow earlier diagnosis and treatment and therefore lower morbidity in such diseases.
      The methodology presented here has been subject to significant criticism (Lazer et al., 2014). For one, correlations do not indicate causality, and the clinical relevance of weak correlations (such as some presented here) is open to question. Confounding factors include search term selection and updates to the search algorithm made by Google in accordance with its business model. Media publicity may explain the stronger correlations found with Lyme disease.
      Correlations using search terms for uncommon conditions, such as the other diseases in this analysis, have not previously been reported in search metric analyses and may be a better representation of the true correlation rate. In fact, our findings may suggest a role for public health campaigns on less common conditions to facilitate following and tracking epidemics.
      The correlations in these historical data suggest that big data mining using GT may be a useful resource in understanding the links between climate and infectious disease. In addition, it may prove useful in predicting disease outbreaks to help with emergency preparedness and resource distribution. In the future, we hope for more options in daily data extraction and more precise location information. We propose that a more ideal big data platform would be a research tool that is not tied to a company’s core business model and that allows for integration of traditional data sources such as CDC data.

      Supplementary Material

      Supplementary material is linked to the online version of the paper at http://www.nature.com/jid

      REFERENCES

        Anema A., Kluberg S., Wilson K., et al. Digital surveillance for enhanced detection and response to outbreaks. Lancet Infect Dis. 2014; 14: 1035-1037
        Araz O.M., Bentley D., Muelleman R.L. Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska. Am J Emerg Med. 2014; 32: 1016-1023
        Dana A.N. Diagnosis and treatment of tick infestation and tick-borne diseases with cutaneous manifestations. Dermatol Ther. 2009; 22: 293-326
        Feria-Arroyo T.P., Castro-Arellano I., Gordillo-Perez G., et al. Implications of climate change on the distribution of the tick vector Ixodes scapularis and risk for Lyme disease in the Texas-Mexico transboundary region. Parasit Vectors. 2014; 7: 199
        Ginsberg J., Mohebbi M.H., Patel R.S., et al. Detecting influenza epidemics using search engine query data. Nature. 2009; 457: 1012-1014
        Lazer D., Kennedy R., King G., et al. Big data. The parable of Google Flu: traps in big data analysis. Science. 2014; 343: 1203-1205
        Milinovich G.J., Magalhaes R.J., Hu W. Role of big data in the early detection of Ebola and other emerging infectious diseases. Lancet Glob Health. 2015; 3: e20-e21
        Ogden N.H., Radojevic M., Wu X., et al. Estimated effects of projected climate change on the basic reproductive number of the Lyme disease vector Ixodes scapularis. Environ Health Perspect. 2014; 122: 631-638
        Robinson S.J., Neitzel D.F., Moen R.A., et al. Disease risk in a dynamic environment: the spread of tick-borne pathogens in Minnesota, USA. Ecohealth. 2014;
        Roy-Dufresne E., Logan T., Simon J.A., et al. Poleward expansion of the white-footed mouse (Peromyscus leucopus) under climate change: implications for the spread of Lyme disease. PLoS One. 2013; 8: e80724
        Santillana M., Zhang D.W., Althouse B.M., et al. What can digital disease detection learn from (an external revision to) Google Flu Trends? Am J Prev Med. 2014; 47: 341-347
        Shapiro E.D. Clinical practice. Lyme disease. N Engl J Med. 2014; 370: 1724-1731
        Thompson L.H., Malik M.T., Gumel A., et al. Emergency department and “Google flu trends” data as syndromic surveillance indicators for seasonal influenza. Epidemiol Infect. 2014; 142: 2397-2405
        Wang P., Glowacki M.N., Hoet A.E., et al. Emergence of Ixodes scapularis and Borrelia burgdorferi, the Lyme disease vector and agent, in Ohio. Front Cell Infect Microbiol. 2014; 4: 70
        Welsh O., Vera-Cabrera L., Rendon A., et al. Coccidioidomycosis. Clin Dermatol. 2012; 30: 573-591