Continuous Fetal Monitoring (Cardiotocography) Flunks Its Test—Again

by Henci Goer  |   April 7, 2017

A Lancet commentary trying to explain away the findings of a trial of computerized interpretation of fetal monitoring tracings popped up in CBU’s Google alerts, but we get ahead of ourselves. Let’s start with a look at the trial.

The trialists hypothesized that: “a substantial proportion of substandard care results from failure to correctly identify abnormal fetal heart rate patterns, that improved recognition of abnormality would reduce substandard care and poor outcomes, and that improved recognition of normality would decrease unnecessary intervention.” To test that hypothesis, the trial evaluated whether adding color-coded (blue: least severe; yellow: moderate severity; red: most severe) computer interpretations of degrees of abnormality to the display of fetal heart rate (FHR) tracings would improve newborn outcomes and decrease cesarean and instrumental deliveries.

The trial was conducted in 24 centers in the U.K. and Ireland and comprised 46,042 women who were laboring with a singleton or twin pregnancy at 35 weeks’ gestation or more and whose babies had no known congenital anomalies or fetal heart rhythm abnormalities. Participants were randomly allocated to continuous FHR monitoring (cardiotocography) with or without the color-coded decision aid. The trialists state that continuous monitoring isn’t routine in the U.K., but give no information on why participants were being continuously monitored or what percentage of the population they represented. All babies with an adverse outcome potentially related to insufficient oxygen in labor (hypoxia) were evaluated by a panel to determine whether different management might have prevented the adverse outcome. A sample of 6707 children was followed for two years to assess long-term health and development. The trial was big enough to have a 90% probability of detecting a 50% reduction in poor neonatal outcomes, from 3 to 1.5 per 1000.
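For readers curious about where a number like 46,042 comes from, here is a rough back-of-the-envelope check of that power statement. It is a sketch only: the trialists' actual calculation isn't described here, so the two-sided 5% significance level, the equal split between groups, and the textbook normal-approximation formula for comparing two proportions are all assumptions, and the function name `per_arm_sample_size` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate number of participants needed in EACH group to detect
    a drop from p_control to p_treated (normal approximation).

    alpha (two-sided) and equal group sizes are assumptions, not details
    reported in the article.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Poor neonatal outcomes: 3 per 1000 reduced to 1.5 per 1000
n = per_arm_sample_size(3 / 1000, 1.5 / 1000)
print(n, 2 * n)  # participants per group, and in total
```

Under those assumptions the formula calls for roughly 21,000 women per group, or about 42,000 in all, which is in line with the 46,042 the trial enrolled. In other words, the trial was not too small to find a benefit of that size had one existed.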

Of note, the trialists had to refine their original definition for the primary outcome (admission to neonatal intensive care within 48 hr of birth for 48 hr or more with evidence of feeding difficulties, respiratory problems, or abnormal neurologic symptoms [encephalopathy]) because many of the intensive care admissions for these reasons were due to other disorders, which implies that studies using admission to intensive care as a measure of hypoxia in labor are likely to overestimate its prevalence.

As for results, availability of the decision aid had no effect on newborn, maternal, or childhood outcomes. Equal percentages of babies (7 per 1000) experienced the primary outcome (demise during labor or up to 28 days after birth or significant morbidity, including moderate or severe abnormal neurologic symptoms or breathing complications). Rates were virtually identical for a long list of specific adverse outcomes such as Apgar score < 4 at 5 min, low cord-blood pH, need for resuscitation, newborn seizures, or transfer to the neonatal intensive care unit at birth. For women, cesarean (24%) and instrumental vaginal delivery (25%) rates were identical. The same was true for indications for cesarean or instrumental vaginal delivery (18% abnormal FHR; 22% delayed progress; 7-8% combination of both). Equal percentages of births with adverse newborn outcomes were judged to have had suboptimal management (14/35 decision-aid; 13/36 controls). No differences were found between groups in the children followed up for two years.

Availability of the decision aid also had no effect on whether clinicians intervened. Cesarean rates according to urgency (immediate threat to life; some threat of compromise; no threat of compromise; elective) were similar. The median (half delivered before and half after) time from identifying severely abnormal FHR to delivery (58 min) was identical and the range nearly so (13 to 264-279 min). Some cases of severely abnormal fetal heart rate were technical errors; for example, the monitor registered the maternal heart rate, which is much slower than the normal fetal heart rate, as the baby’s. Errors of this kind help to explain the relatively lengthy median times and longer durations in the face of seemingly severe FHR abnormalities.

In other words, the hypothesis was convincingly disconfirmed. The decision aid failed to reduce cesarean or instrumental deliveries, which means it didn’t improve ability to discriminate non-concerning from concerning heart rate patterns. The decision aid also failed to decrease instances of substandard care, which means it didn’t improve ability to recognize severely abnormal FHR patterns. What is more, the aid flunked in a population at higher risk, if the trialists’ assertion that continuous fetal monitoring wasn’t routine is true.

“Truthiness” in Action

The continuous fetal monitoring research fits the definition of insanity. Going back decades to the original studies of continuous monitoring and ending with this latest trial, every twist and tweak to continuous FHR monitoring has sprung from the unshakeable belief, despite the evidence, that continuous FHR monitoring will improve newborn outcomes and reduce unnecessary rescue deliveries compared with intermittent listening. Instead of a call to abandon ship, every failure to show that has led to a “try harder” response:

Internal monitoring will help by improving the quality of the tracings. It doesn’t (Bakker 2012; Harper 2013). Standardizing definitions of abnormal FHR will solve the problem. It doesn’t either (Rhose 2014). Admission test strips will identify babies already of concern so that they can be monitored continuously while healthy babies go on to intermittent listening, which will eliminate unnecessary intervention. Nope (Devane 2012). Fetal scalp-blood sampling for low blood pH or high lactate content will do the trick by enabling doctors to tell which babies are tolerating labor despite a concerning heart rate tracing. Wrong again (Alfirevic 2013; East 2015). Adding electrocardiogram data will allow better discrimination. Makes no difference (Neilson 2015).

This refusal to face facts continues right up to the present day. The authors of the accompanying commentary argue that the decision aid didn’t make a difference because no prescribed actions were attached to its diagnosis. Medical-model thinkers just can’t give it up. Continuous fetal monitoring has to work because it makes so much sense to them that it should.

“The great tragedy of Science: the slaying of a beautiful hypothesis by an ugly fact.”
−Thomas Henry Huxley

This brings us to why continuous FHR monitoring doesn’t work. It’s because the theory behind it, that insufficient oxygen is the main cause of neurologic injury and death, that heart rate changes reliably warn of impending injury, and that improved ability to distinguish between abnormal and normal FHR patterns would improve outcomes and reduce use of rescue delivery, is wrong on all counts. Ample research consistently shows that the link between abnormal FHR and condition at birth, as measured by such things as Apgar scores, cord blood pH, or lactate, is weak; that the link between condition at birth and abnormal newborn neurologic signs, such as altered consciousness, poor muscle tone, feeding problems, or seizure, is weak; and that the link between newborn abnormal neurologic signs and death or cerebral palsy is weak, which means the link between abnormal fetal heart-rate patterns and severe adverse outcomes is pretty much nonexistent (Alfirevic 2013; Althaus 2005; Chauhan 2008; Graham 2008; Hogan 2007; Lie 2010; Low 1990; Milsom 2002; Murphy 1990; Nelson 1996; Pin 2009; Sameshima 2004; Williams 2003; Yeh 2012; Yudkin 1994). More information won’t change that fact no matter how much you tinker with it. At one end of the scale, you have what everyone would agree is normal. At the other, you have severely abnormal patterns that everyone would agree demand action. Anything in between is reading tea leaves.

What’s more, there are many reasons for the disconnect between FHR patterns and newborn outcomes that have nothing to do with hypoxia in labor. For example, the injury may have occurred prior to labor (Badawi 1998; Fahey 2005), in which case rescue delivery can make no difference, or the precipitating event may occur so quickly that rescue delivery isn’t possible, as, for example, with placental detachment (abruption). Other problems such as fever during labor (Lieberman 2000), administering sodium-deficient IV fluids or too much sugar in IV fluids, or drinking excessive amounts of fluids can produce the same symptoms as low oxygen, these being high blood lactate, low blood pH, and newborn seizures. If hypoxia isn’t the problem, delivery won’t improve the outcome (Higgins 1996; Johansson 2002; Moen 2009; Philipson 1987; Stratton 1995; West 2004). The problem may also be, as the panel reviewing poor outcomes in this trial found, a failure of response, not identification. Furthermore, the slowing of the FHR and the switch to anaerobic metabolism that eventually decreases blood pH are healthy adaptive responses to suboptimal conditions that function to protect the brain and vital organs from hypoxic injury (Bennet 2009; Fahey 2005; Low 1999; Ugwumadu 2014). One reason low blood pH correlates so poorly with neurologic injury is that the adaptation usually succeeds (Ruth 1988; Ugwumadu 2014).

The Take-Away

If continuous monitoring were equivalent to intermittent listening, it wouldn’t matter which was used, but it isn’t. Because of its false-positive rate (the monitor says there is a problem when there isn’t), continuous monitoring increases the likelihood of cesarean and instrumental vaginal delivery (Alfirevic 2013). For this reason, both the Society of Obstetricians & Gynaecologists of Canada and the Royal College of Obstetricians & Gynaecologists, the U.K.’s professional organization, recommend intermittent listening as the preferable monitoring method for healthy women and babies, which gives women unimpeachable support for insisting on it.