Richard D. Morey

University of Groningen

- Philosophy: Jan-Willem Romeijn (Rijksuniversiteit Groningen)
- Statistics: Paul Speckman, Jeff Rouder (University of Missouri)
- Psychology: Rink Hoekstra (Rijksuniversiteit Groningen)

- How strong is the evidence for anthropogenic global warming?
- Is someone who rejects the theory of relativity rational?
- Should we believe in subliminal priming, given the literature on it?
- How strongly should we believe that \(\delta>0\), given a particular data set?

These are all scientific questions, concerning evidence, reason, and belief. Only the last sounds odd.

Evidence: that which would justify a change in a person's belief regarding a question of interest (Fox, 2011).

- Requires *justification* or *rationality* (otherwise not useful)
- Is inherently *subjective* (questions and belief are subjective)
- Requires accounting for *belief*
- Additionally: is relative (for our purposes)

"The processes of [statistical inference]...are certainly not any sort of 'reasoning', at least not in the sense in which this word is used in other instances; they are acts of will." (Neyman, 1957)

"We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis." (Neyman & Pearson, 1933)

"...the **feeling** induced by a test of significance has an objective basis in that the probability statement on which it is based is a fact communicable to and verifiable by other rational minds. The level of significance in such cases fulfils the conditions of a measure of the rational grounds for the disbelief it engenders" (my emphasis; Fisher, 1959).

- Beliefs can be represented as "plausibilities": \(0\leq p \leq 1\)
- Plausibilities conform to the laws of probability (Cox, 1946; Ramsey, 1926; de Finetti, 1935; Joyce, 1998)
- Beliefs are updated in response to data \(y\) by Bayesian conditionalization:

\[ \frac{p_y(\theta_0)}{p_y(\theta_1)} = \frac{p(\theta_0\mid y)}{p(\theta_1\mid y)} = \frac{p(y\mid\theta_0)}{p(y\mid\theta_1)}\times\frac{p(\theta_0)}{p(\theta_1)} \]
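The odds form of the update can be illustrated numerically. The following is a minimal Python sketch, not part of the original talk; the function names and the likelihood values are hypothetical and chosen only for illustration.

```python
def posterior_odds(prior_odds, lik_theta0, lik_theta1):
    """Posterior odds of theta_0 over theta_1 after observing data y.

    prior_odds : p(theta_0) / p(theta_1)
    lik_theta0 : p(y | theta_0)
    lik_theta1 : p(y | theta_1)
    """
    # Likelihood ratio times prior odds, as in the equation above.
    return (lik_theta0 / lik_theta1) * prior_odds


def odds_to_probability(odds):
    """Convert odds for theta_0 into a probability, assuming only two hypotheses."""
    return odds / (1.0 + odds)


# Hypothetical numbers: equal prior plausibility, and data that are
# five times more probable under theta_1 than under theta_0.
example_odds = posterior_odds(prior_odds=1.0, lik_theta0=0.02, lik_theta1=0.10)
example_prob = odds_to_probability(example_odds)

print(f"posterior odds (theta_0 : theta_1): {example_odds:.3f}")
print(f"posterior probability of theta_0:   {example_prob:.3f}")
```

With equal prior odds, the posterior odds equal the likelihood ratio, so the data alone determine how far belief moves.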

Schrier et al. (2008): "The interpretation of systematic reviews with meta-analyses: an objective or subjective process?"

- 8 medical researchers assess 23 studies, grouped as 1 + 5 meta-analyses; total N: 69,505
- Researchers given typical inferential statistics for meta-analyses
- "I believe magnesium has now been shown to be beneficial for patients during the post-MI period."
- "I recommend that magnesium therapy be used in patients during the post-MI period."

- The same data moved some researchers in one direction, others in the other
- Data increased the disagreement among the researchers, in spite of large \(N\)
- What hope is there for research if so much data can't induce agreement?

- Experimental data evaluation (Do I need more participants?)
- Theory building (which phenomena do I need to account for?)
- Theory evaluation (is the evidence for my theory strong or weak?)
- Evaluation of clinical trials (how much evidence for efficacy is there?)
- Basically... **everywhere in science**!

- Scenario 1: Statistical falsification
- Scenario 2: Power, Type I error rate, significance
- Scenario 3: \(p\) value and sample size
- Scenario 4: \(p\) value, sample size, and power (extended Q3)
- Scenario 5: Confidence interval

Two researchers disagree about the sex of an adult antelope skull found. It is known that for adults of this antelope species, all males have antlers between 7cm and 12cm long. All females have antlers between 3cm and 5cm long. There are no exceptions. However, their assistant — who found the skull — has not told them the length of the antlers yet, and neither researcher has seen the antlers.

Based on its location in a particular grave site, the two researchers have their own hypotheses about the sex of the antelope. Dr. Z believes that the skull belonged to a female antelope. Dr. W believes that the skull belonged to a male antelope.

Dr. X, their assistant, returns with the measurements of the antlers.

The exact length of the antlers was 4.5cm.

Suppose that like Dr. X, you were completely neutral and had no preference for either hypothesis. In light of Dr. X's measurement, how does the support for Dr. W's hypothesis relate to the support for Dr. Z's?

e.g., "The evidence for Dr. W's hypothesis is 10 times stronger than the evidence for Dr. Z's hypothesis"

Two researchers disagree about the size of an effect of a genetic mutation on the weight of mice.

Dr. A believes that this genetic mutation decreases the weight of mice by 1 gram. Dr. B believes that this genetic mutation increases the weight of mice by 1 gram.

The two researchers ask a neutral third researcher, Dr. C, to conduct an experiment to test their hypotheses. Because Dr. C has no preference for either hypothesis, she randomly selects, by a fair coin flip, Dr. B's hypothesis as the null hypothesis. She designs and performs an experiment so that the statistical test she performs on the data has a type I error rate of 5% and a power, if Dr. A's hypothesis is correct, of 80%.

Dr. C performs the statistical test. The results are statistically significant, indicating that the null hypothesis (Dr. B's) is to be rejected. For the purpose of this question, consider the assumptions of the statistical procedure met.

Suppose that like Dr. C, you were completely neutral and had no preference for either hypothesis before the experiment. In light of Dr. C's findings, how does the support for Dr. A's hypothesis relate to the support for Dr. B's?

e.g., "The evidence for Dr. A's hypothesis is 10 times stronger than the evidence for Dr. B's hypothesis"
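One way to reason about this scenario, offered as an illustration rather than as the answer the survey intended, is to treat the bare fact of a significant result as the only datum and to compare its probability under the two point hypotheses. The sketch below assumes exactly that; the variable names are hypothetical.

```python
# Probabilities of a significant result, taken from the scenario text.
alpha = 0.05  # P(significant | Dr. B's hypothesis, the null): Type I error rate
power = 0.80  # P(significant | Dr. A's hypothesis): power

# Likelihood ratio for "significant" as the observed event (A versus B).
likelihood_ratio_a_vs_b = power / alpha

# A neutral observer starts with prior odds of 1 (assumption from the scenario).
prior_odds_a_vs_b = 1.0
posterior_odds_a_vs_b = likelihood_ratio_a_vs_b * prior_odds_a_vs_b

# With only two hypotheses in play, odds convert directly to a probability.
posterior_prob_a = posterior_odds_a_vs_b / (1.0 + posterior_odds_a_vs_b)

print(f"likelihood ratio (A : B): {likelihood_ratio_a_vs_b:.1f}")
print(f"posterior probability of A: {posterior_prob_a:.3f}")
```

Under these assumptions the ratio is about 16 to 1 in favor of Dr. A. The calculation uses only the reject/retain outcome; the full data could support a different ratio.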

\(N=55\) participants. 2 participants answered \(\infty\).

Two researchers are studying the density of a new material, previously unknown to science. Based on their particular theoretical leanings, they disagree about what the density of the material will be.

Dr. K believes that the density of the new material is 1.2 g/cm³. Dr. L believes that the density of the new material is 0.8 g/cm³.

The two researchers ask a neutral third researcher, Dr. Z, to conduct measurements to test the density of the new material. Dr. Z is an expert at measuring density, but due to recent cuts in funding for lab equipment, he has to use substandard equipment. To overcome this problem, Dr. Z measures the material 10 times and constructs a 95% confidence interval around the mean density measurement, based on a standard t procedure. For the purpose of this question, consider the assumptions of the statistical procedure met.

Dr. Z reports back to the two scientists the 95% confidence interval shown below.

Four types of intervals could be presented:

\(N=64\). One participant answered "Infinity" in each non-"Equal" condition.

- Researchers disagree substantially about whether evidence can be extracted from classical statistical reports.
- Among those who believe it can, assessments of evidence vary across orders of magnitude.
- However, evidence seems to be somewhat meaningful to researchers: trends make sense.

Idea: Use psychometric techniques to assess evaluations of statistical evidence.

(\(\delta\) is the standardized effect size \((\mu-\mu_0)/\sigma\))
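As a concrete reading of this definition, a sample estimate of \(\delta\) replaces \(\mu\) and \(\sigma\) with the sample mean and sample standard deviation. The Python sketch below is illustrative only; the function name and the data are hypothetical and do not come from the study.

```python
from statistics import mean, stdev


def standardized_effect(sample, mu0):
    """Sample estimate of delta = (mu - mu0) / sigma.

    Uses the sample mean for mu and the sample standard deviation
    (n - 1 denominator) for sigma.
    """
    return (mean(sample) - mu0) / stdev(sample)


# Hypothetical observations and a hypothetical reference value mu0.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
d_hat = standardized_effect(scores, mu0=2.0)

print(f"estimated standardized effect size: {d_hat:.3f}")
```

Because the estimate is expressed in standard deviation units, it does not depend on the measurement scale of the raw observations.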

- Evidence is a critical idea in science and philosophy
- Current dominant statistical techniques don't quantify evidence (and aren't meant to!)
- Evidence can be formalized using Bayesian statistics
- Needed: Training in statistical methods that interface with belief! (See: BayesFactor software - Bayesian linear models in R)
- Scientific evidence is a complicated thing; statistical evidence is only one (formal) piece of the pie.