Formulating a Research Question

February 06, 2023
  • 00:02My name is Maria Ciarleglio
  • 00:04and I'm a faculty member
  • 00:05in the Department of Biostatistics
  • 00:08at the Yale School of Public Health.
  • 00:11In this video series I will introduce the clinical research
  • 00:14process to prepare you to collaborate with a statistician.
  • 00:20In this first video we'll discuss what is often
  • 00:24the first step of the research process,
  • 00:26formulating a research question.
  • 00:31The first step in the research process
  • 00:33is to convert the need for information
  • 00:36into an answerable question or hypothesis.
  • 00:40A well-formulated research question is specific and precise.
  • 00:45The research question guides the study design
  • 00:49and other design-related study characteristics,
  • 00:52the data that are collected, the data analysis
  • 00:56and ultimately determines what you can conclude
  • 00:59at the end of the study.
  • 01:02The PICO criteria can be used to guide you
  • 01:05in framing a comparative research question.
  • 01:09The PICO framework begins by specifying the population
  • 01:12of interest, then the intervention being studied,
  • 01:16the control or comparator group,
  • 01:19and the outcomes of interest.
  • 01:23Begin by specifying the population of interest.
  • 01:26For example,
  • 01:27patients with non-alcoholic fatty liver disease.
  • 01:31The target population is the group of patients
  • 01:34to which you would like to generalize your study findings.
  • 01:38The study population is the group
  • 01:40of patients to which you have access.
  • 01:44The study population may be a subset
  • 01:47of the target population.
  • 01:50For example, your goal may be to generalize
  • 01:53to all adult Americans
  • 01:55with non-alcoholic fatty liver disease.
  • 01:58However, you may be limited to a patient population
  • 02:02from a certain state or medical center.
  • 02:06In our case, we may only have access to patients
  • 02:09with non-alcoholic fatty liver disease
  • 02:12followed in the liver clinic from 2015 to 2020.
  • 02:18In this case, you could either collect data
  • 02:20from all individuals in the available study population
  • 02:24if it's feasible to do that.
  • 02:26Otherwise, if the study population is too large
  • 02:29you could select a random sample
  • 02:32from that available study population.
  • 02:35If you choose a representative random sample
  • 02:39your results are generalizable to that study population.
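As a rough illustration of the sampling step described above, the following Python sketch draws a simple random sample from a list of patient identifiers. The identifiers, the population size, the sample size, and the function name `draw_random_sample` are all hypothetical and do not come from the video; a real study would sample from the actual clinic roster.

```python
import random


def draw_random_sample(patient_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every patient in the available study population has the same
    chance of being selected.
    """
    if sample_size > len(patient_ids):
        raise ValueError("sample_size exceeds the study population size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(patient_ids, sample_size)


# Hypothetical study population: 5,000 clinic patients with made-up IDs.
study_population = [f"PT{i:04d}" for i in range(1, 5001)]
sample = draw_random_sample(study_population, sample_size=200, seed=2023)
print(len(sample), "patients sampled out of", len(study_population))
```

Recording the seed alongside the sample is one simple way to make the selection auditable later.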
  • 02:45Next, specify the main intervention,
  • 02:49which is the exposure, test treatment
  • 02:52or the main prognostic factor
  • 02:54that you are interested in studying.
  • 02:57For example, lifestyle modification to achieve weight loss
  • 03:02or if studying liver cancer,
  • 03:04your intervention of interest could be sorafenib
  • 03:07to prolong survival.
  • 03:12If you're interested in performing a comparison,
  • 03:15the next step is to specify a control
  • 03:18or comparison intervention or exposure.
  • 03:23This can be, for example,
  • 03:24a placebo control or the current standard of care.
  • 03:30Finally, we must specify the clinical outcome
  • 03:33or primary endpoint of your study.
  • 03:37This includes the element of time, if that's appropriate,
  • 03:40and this would apply if you're looking
  • 03:42at a fixed follow-up time period post-intervention.
  • 03:47Say three-month survival following surgery
  • 03:51or NAFLD resolution one year following
  • 03:54a certain percentage reduction in total body weight.
  • 04:01Let's run through an example of the type of study
  • 04:03we often perform using medical record data.
  • 04:07The research question asks
  • 04:10among Hepatitis B infected persons,
  • 04:12what factors or tests best identify individuals
  • 04:17at highest risk of progression,
  • 04:19as well as those at low risk of progression?
  • 04:23The population studied is Hepatitis B infected persons
  • 04:28treated at the Yale Liver Center between 2011 and 2021.
  • 04:35The interventions of interest
  • 04:37are different patient characteristics.
  • 04:39Specifically, the study will look at different permutations
  • 04:43of key baseline exposures or risk factors
  • 04:47identified in previous studies of Hepatitis B prognosis.
  • 04:52Here, the investigators will look at age,
  • 04:55presence of fibrosis, presence of cirrhosis,
  • 04:58elevated ALT, and detectable viral load.
  • 05:05The comparator group for each of these factors
  • 05:08is absence of the baseline factor.
  • 05:12The outcomes of interest are liver-related morbidity,
  • 05:16progression of liver disease
  • 05:17and mortality during up to 10 years of follow-up.
  • 05:22Now, this is more of an exploratory study
  • 05:25looking for signals of association, but even still,
  • 05:29it has a clearly defined population,
  • 05:31intervention or exposures of interest,
  • 05:34control or reference levels of the exposures
  • 05:37and outcomes of interest.
  • 05:40Sitting down and thinking through the PICO criteria
  • 05:43forces you to make decisions
  • 05:45and pre-specify important aspects of your study.
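One way to make the PICO elements explicit before talking to a statistician is to write them down in a small structured record. The sketch below does this in Python for the Hepatitis B example above; the class name `PicoQuestion` and its field names are illustrative choices, not something prescribed in the video.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PicoQuestion:
    """Container for the four PICO elements of a research question."""
    population: str
    interventions: List[str]  # exposures or risk factors of interest
    comparator: str
    outcomes: List[str]

    def summary(self) -> str:
        """Return a plain-text summary, one PICO element per line."""
        return "\n".join([
            f"Population:   {self.population}",
            f"Intervention: {', '.join(self.interventions)}",
            f"Comparator:   {self.comparator}",
            f"Outcomes:     {', '.join(self.outcomes)}",
        ])


# Values taken from the Hepatitis B example in the transcript.
hepatitis_b_study = PicoQuestion(
    population="Hepatitis B infected persons treated at the Yale Liver Center, 2011 to 2021",
    interventions=["age", "fibrosis", "cirrhosis", "elevated ALT", "detectable viral load"],
    comparator="absence of the baseline factor",
    outcomes=["liver-related morbidity", "progression of liver disease", "mortality"],
)
print(hepatitis_b_study.summary())
```

Leaving any field blank while filling in such a record is a quick signal that part of the question has not yet been pre-specified.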
  • 05:51As we saw in the last example,
  • 05:53there are often multiple clinical endpoints of interest.
  • 05:57Endpoints are classified as clinical or nonclinical.
  • 06:03Clinical endpoints describe outcomes
  • 06:05involving how a patient feels, functions or survives.
  • 06:10They may be assessed by a clinician
  • 06:12and involve clinical judgment,
  • 06:14such as the occurrence of stroke or MI.
  • 06:17They may also be measured by a standard performance measure
  • 06:21such as a pulmonary function test
  • 06:23or they can be patient-reported,
  • 06:25such as self-reported symptoms or quality of life.
  • 06:30Nonclinical endpoints include biomarkers
  • 06:33that may not directly relate to how a patient feels,
  • 06:37however they're thought to be important indicators
  • 06:39of the disease process.
  • 06:41These endpoints can include blood tests, imaging
  • 06:45or other physiological measures such as blood pressure.
  • 06:49A good primary outcome should directly align
  • 06:52with the primary aim of the study.
  • 06:55The endpoint should be accurate
  • 06:57and precise, quantifiable, validated, and reproducible.
  • 07:02We generally include a single primary endpoint.
  • 07:06The goal should be to choose a primary endpoint
  • 07:09that will influence decision making in practice.
  • 07:13The most significant and impactful endpoint that addresses
  • 07:17the research question is chosen as the primary endpoint
  • 07:21and additional important endpoints may be designated
  • 07:25as secondary or tertiary.
  • 07:28Secondary endpoints may not be considered sufficient
  • 07:32to influence decision making alone,
  • 07:35but may help support the claim of efficacy.
  • 07:38Tertiary endpoints
  • 07:39are sometimes called exploratory endpoints.
  • 07:43If included, they are generally used
  • 07:45to test exploratory hypotheses.
  • 07:50Again, we generally use a single primary outcome.
  • 07:54Using multiple primary endpoints may lead
  • 07:57to an unfocused research question and can present problems
  • 08:01with interpretation if the treatment effect is observed
  • 08:04to differ across the multiple outcomes.
  • 08:08However, multiple endpoints may be needed
  • 08:11when a clinical benefit depends
  • 08:13on more than one aspect of the disease.
  • 08:16For example, in Alzheimer's, we may require an effect
  • 08:20on both cognition and function,
  • 08:23so there may be situations where multiple endpoints
  • 08:26are necessary for demonstrating efficacy.
  • 08:30The statistical issue with multiple endpoints
  • 08:32is what we call multiplicity.
  • 08:36When we conduct statistical analysis
  • 08:38and perform hypothesis tests,
  • 08:40there's a chance that we conclude
  • 08:42a significant difference exists between the arms tested
  • 08:47when in truth, there is no difference.
  • 08:49This is due to random variation in the data
  • 08:52that we can observe, but this is a mistake, an error,
  • 08:56and we refer to this type of error as a type one error;
  • 09:02its probability is the alpha level of the test.
  • 09:05We like to keep this type of error low,
  • 09:08so we typically set the type one error of our tests at 5%.
  • 09:13So when you're testing a single endpoint,
  • 09:16you can maintain a type one error of 5%.
  • 09:20However, suppose we're testing two primary endpoints
  • 09:23and success on either endpoint would lead
  • 09:26to a conclusion of a treatment difference.
  • 09:30The type one error rate on each endpoint compounds
  • 09:34and there's an inflation of the overall type one error
  • 09:36probability above 5%.
  • 09:40This increases the chance of false conclusions
  • 09:43regarding the efficacy of the intervention.
  • 09:46Special statistical testing procedures
  • 09:49need to be used to control the type one error rate
  • 09:52for the study with multiple endpoints.
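To see how quickly the overall type one error grows, the following Python sketch computes the chance of at least one false positive when success on any endpoint counts as a win. It assumes the endpoint tests are statistically independent, which is a simplification; correlated endpoints inflate the error less. The Bonferroni adjustment shown is one common example of a multiplicity procedure, included here as an illustration rather than as the method the speaker has in mind. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_endpoints):
    """Probability of at least one false positive across n independent
    tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_endpoints


def bonferroni_alpha(alpha, n_endpoints):
    """Per-endpoint significance level under a Bonferroni adjustment."""
    return alpha / n_endpoints


# Unadjusted: every endpoint is tested at the 5% level.
for k in (1, 2, 3, 5):
    rate = familywise_error_rate(0.05, k)
    print(f"{k} endpoint(s): overall type one error = {rate:.4f}")

# Adjusted: two endpoints, each tested at 0.05 / 2 = 0.025.
adjusted = bonferroni_alpha(0.05, 2)
print(f"Bonferroni, 2 endpoints: {familywise_error_rate(adjusted, 2):.4f}")
```

With two independent endpoints each tested at 5%, the overall rate is 1 - 0.95^2 = 0.0975, nearly double the intended level; testing each at 2.5% brings it back just under 5%.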
  • 09:56Multiple primary endpoints occur in three ways.
  • 10:00The first is when there are multiple endpoints
  • 10:03and each endpoint could be sufficient
  • 10:05on its own to establish the efficacy
  • 10:07of the intervention being tested.
  • 10:10These multiple endpoints correspond
  • 10:11to multiple chances of success,
  • 10:14so failure to adjust for multiplicity
  • 10:17can lead to type one error rate inflation
  • 10:20and a false conclusion of effectiveness.
  • 10:23The second option is when the determination of effectiveness
  • 10:27depends on success on all primary endpoints
  • 10:31when there are two or more primary endpoints.
  • 10:34In this setting, there are no multiplicity issues related
  • 10:37to the primary endpoints
  • 10:40as there is only one path that leads
  • 10:42to a successful outcome for the trial and therefore,
  • 10:46no concern with type one error rate inflation.
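A small calculation can make this concrete. When every primary endpoint must be significant, the probability of declaring success can never exceed the rejection probability of any single test. The Python sketch below multiplies the per-endpoint rejection probabilities, which assumes independent tests (an assumption made here for simplicity); the function name and the scenario values are hypothetical.

```python
import math


def prob_all_endpoints_significant(rejection_probs):
    """Probability that every endpoint test rejects its null hypothesis,
    assuming the tests are independent."""
    return math.prod(rejection_probs)


alpha = 0.05

# Scenario 1: no true effect on either endpoint.
both_null = prob_all_endpoints_significant([alpha, alpha])

# Scenario 2 (worst case): the treatment clearly works on one endpoint
# (rejection probability close to 1) but has no effect on the other.
one_null = prob_all_endpoints_significant([1.0, alpha])

print(f"Both endpoints null: {both_null:.4f}")
print(f"One endpoint null:   {one_null:.4f}")
```

In both scenarios the chance of a false claim of success stays at or below 5%, which is why no adjustment is needed in this setting. The trade-off is a lower chance of success when the treatment truly works, since every endpoint has to reach significance.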
  • 10:50The third option combines several aspects
  • 10:52of effectiveness into a single primary composite endpoint.
  • 10:57This avoids multiple endpoint related multiplicity issues.
  • 11:01In many cardiovascular studies
  • 11:04it's common to combine several endpoints.
  • 11:06For example, cardiovascular death, heart attack and stroke
  • 11:11into a single composite primary endpoint.
  • 11:15In this case, death is considered on its own
  • 11:18as a secondary endpoint.
  • 11:19If any one of the elements
  • 11:21of the composite outcome is observed,
  • 11:23then the endpoint has occurred for that patient.
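The rule just described can be sketched in a few lines of Python. The patient records below are invented purely for illustration, and the names `PatientEvents` and `composite_occurred` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatientEvents:
    """Component events observed for one patient during follow-up."""
    patient_id: str
    cv_death: bool
    heart_attack: bool
    stroke: bool


def composite_occurred(patient):
    """The composite endpoint occurs if any one component is observed."""
    return patient.cv_death or patient.heart_attack or patient.stroke


# Made-up records, not real data.
patients = [
    PatientEvents("P1", cv_death=False, heart_attack=False, stroke=False),
    PatientEvents("P2", cv_death=False, heart_attack=True, stroke=False),
    PatientEvents("P3", cv_death=True, heart_attack=False, stroke=True),
    PatientEvents("P4", cv_death=False, heart_attack=False, stroke=False),
]

n_composite = sum(composite_occurred(p) for p in patients)
n_stroke = sum(p.stroke for p in patients)
print(f"Composite events: {n_composite} of {len(patients)} patients")
print(f"Stroke events:    {n_stroke} of {len(patients)} patients")
```

A patient with more than one component event, such as P3 here, still counts only once toward the composite.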
  • 11:27It's important that the endpoints included
  • 11:29in the composite endpoint
  • 11:31are of similar clinical importance.
  • 11:34Using a composite endpoint is helpful
  • 11:37when the components are individually rare,
  • 11:41so choosing a composite endpoint allows you to
  • 11:43observe more events.
  • 11:45A limitation of using a composite endpoint is that
  • 11:49given the sample size of the study,
  • 11:51there may not be adequate statistical power
  • 11:55to test each component of the endpoint separately.
  • 11:59We'll discuss statistical power in a future video
  • 12:02on elements of sample size calculations.
  • 12:05We'll also discuss endpoints and variables in general,
  • 12:09from a data collection perspective in a future video.
  • 12:15In this video, we discussed important things
  • 12:18to consider when formulating your research question.
  • 12:22From the research question will flow
  • 12:24the specific statistical hypotheses to be tested,
  • 12:28the design of the study, including the sample size,
  • 12:31the data necessary to answer the research question,
  • 12:35the statistical analysis that will be performed
  • 12:38and the conclusions that can be made.
  • 12:41The next video, which is the second video in this series,
  • 12:45will give you an overview
  • 12:46of study designs commonly used in clinical research.
  • 12:51In video three, we will discuss the data collection process
  • 12:55and formally define different variable types.
  • 12:58This video will prepare us
  • 13:00for video four on sample size determination.