19  Statistical Philosophy

19.1 Some larger-scale advice before moving into modeling

More than mastery of any single method, R (or Python) package, or complex statistical model, the ultimate goal of this course is for you to develop a personal and operational statistical philosophy. The term is often thrown around, so we define a statistical philosophy as:

A set of principles that consistently guides how you think about data, models, uncertainty, and decisions.

In other words, a good statistical philosophy is not a set of rules; it does not tell you exactly what to do in every situation. Instead, it functions to reduce bias, prevent analytical wandering, promote clarity, and encourage efficiency. Most importantly, a good statistical philosophy evolves with you as you learn more. I will offer a set of guiding principles and aphorisms early in the course. You should modify them, reject them, or replace them entirely. What matters is that you have something guiding your choices. This is how you become consistent, thoughtful, and credible as a scientist rather than someone who simply runs analyses.

Watch and listen to the graduate student peers, postdoctoral researchers, and faculty you interact with, and pay close attention to the different statistical perspectives that exist. It’s a fascinating amount of variation! One good example is how differently practitioners interpret the output of multimodel (AIC, BIC, or DIC) comparisons; see the sketch below. Whenever you hear a different perspective, ask the researcher about the philosophy that guides their inferential decisions.
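To make that example concrete, below is a minimal sketch in base R, using simulated data rather than anything from a real study, of the kind of AIC table whose interpretation varies across practitioners. The two models and the spurious predictor are purely illustrative.

```r
# Simulated data: x2 has no true effect on y.
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

m1 <- lm(y ~ x1)        # the simpler candidate model
m2 <- lm(y ~ x1 + x2)   # adds a (truly spurious) predictor

# Passing multiple fitted models to AIC() returns a comparison table.
AIC(m1, m2)
```

Faced with the same table, one practitioner may call models within two AIC units indistinguishable, another may model-average across them, and a third may default to the simpler model. Each choice reflects a philosophy, not a theorem.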

What follows has two components. First, I describe four statistical aphorisms. They are intentionally broad; again, think of them as guardrails rather than rules. Then, I explain what I mean by an analytical workflow, which supports an operational statistical philosophy.

19.2 Four Statistical Aphorisms

Beware perfection. In data analysis, one can be paralyzed by the moving target of the perfect solution. But what does perfection really mean in practice? Does it mean that an analysis looks complex enough to impress your colleagues or instructors, even if it does not match the scope or intent of your questions (or, worse yet, is blatantly incorrect)? Is it an analysis that works flawlessly the first time, right out of the box? Or is it one that is quickly built and deployed but also hides critical assumptions or produces error-filled predictions?

The danger in seeking statistical (or, more generally, scientific) perfection lies in waiting for an ideal solution that may never arrive, or may arrive too late. In doing so, your and your collaborators’ progress stalls, opportunities are missed, and, perhaps most importantly, learning is sorely delayed. This idea is captured poetically in the words of Robert Watson-Watt, the inventor of radar:

Give them the third best to go on with; the second best comes too late, the best never comes.

Watson-Watt’s advice should remind us that, in dynamic and complex fields such as radar development or our own messy statistical modeling, timely action often outweighs our vision of perfection. A solution that is simply good enough today allows you to:

  • Learn through deployment: Practical application often reveals insights that you cannot anticipate.
  • Iterate and improve: Real-world feedback reveals issues far faster than endless refinement in isolation!
  • Transparently communicate analytical limitations: Collaborators and stakeholders can better understand assumptions, uncertainties, and the scope of your work.

In short (as you know), perfection is often the enemy of good progress. By purposefully embracing imperfection, you will be better able to learn, iterate, and produce work that is timely, practical, and ultimately more impactful.

Embrace complexity. There is a reason why my lab is called the Behavioral Complexity Lab. Ecological and biological systems are wonderfully complex, thrillingly messy, and often only partially observed. Pretending that such complexities do not exist does not make your analysis stronger or more robust; it just makes the analysis less honest and transparent. Actively embracing complexity does not mean building the most complicated model you can imagine. It means acknowledging that the process generating your data is rarely simple and that oversimplification can be very costly.

At the same time, complexity must be earned. Adding complex model structure, more parameters, or more nested hierarchical layers should be motivated by expert knowledge of biology (at any level) or by study design, not by the mere existence of a published paper, an established method, or an online tutorial. ’Tis better to manage complexity than to avoid it (mainly because complexity is all around us). Good analysts learn how to represent complexity in ways that are interpretable, testable, and aligned with the question being asked; the sketch below gives one small illustration.
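Here is a minimal sketch of earned complexity, again with simulated data; it assumes the lme4 package is installed, and the design (observations nested within groups) is hypothetical. The random intercept is justified by that nesting, not by habit.

```r
library(lme4)

# Simulated nested design: 10 groups, 20 observations each,
# with genuine among-group variation in the intercept.
set.seed(1)
group <- factor(rep(1:10, each = 20))
group_effect <- rnorm(10, sd = 1)
x <- rnorm(200)
y <- 2 + 0.7 * x + group_effect[group] + rnorm(200)
d <- data.frame(y, x, group)

m_simple <- lm(y ~ x, data = d)    # ignores the grouping entirely
m_mixed  <- lmer(y ~ x + (1 | group), data = d,
                 REML = FALSE)     # ML fit so the AICs are comparable

# The random intercept is motivated by the study design (nesting),
# not by the fact that a tutorial happened to use lmer().
AIC(m_simple, m_mixed)
```

If the nesting were absent, the extra hierarchical layer would be complexity for its own sake, exactly the kind that has not been earned.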

Beware of traps. (This aphorism is based on a real sign I saw on a hiking trail in southern Brazil, and I thought it was a good metaphorical reminder not to fall into someone else’s statistical trap!)

Consider a statistical approach that a senior colleague or advisor deems the correct or optimal one, without justification. One of the fastest ways to derail an otherwise good analysis is to blindly inherit a statistical framework without understanding why it was chosen in the first place. If you accepted it blindly, what is to say that the previous scientist did not also do the same, ad infinitum? Methods are often copied or carried over from published papers, lab traditions, or online examples without careful thought. Every statistical choice embeds assumptions about scale, error, independence, and causality. When you adopt someone else’s approach wholesale, you also adopt all of their assumptions, whether or not they apply to your data. So, please be cautious.

The assignments in this course will repeatedly (ad nauseam) ask you to justify your modeling decisions clearly. Your analytical decisions should align with your question, your data, and your scientific goals.

Avoid paralysis of analysis. There is a point at which conducting more data exploration, running more (and different) models, and sifting through more diagnostics stop improving understanding and start delaying decisions. This condition is insidious. It often shows up as endless model tweaking, perpetual searching for the perfect model structure, or reluctance to commit to an inferential path (which can lead to never-ending arguments with co-authors).

A good analytical workflow balances scientific rigor with momentum. Certainly, you may end up with data that are near-perfect and easy to work with. You may also find models that are well matched to your data constraints without ever lifting an analytical finger. But this is rare. You should aim for models that are good enough to answer the question at hand, given the data you have, within the constraints you face. This does not mean lowering your standards, especially when faced with looming deadlines. It simply means recognizing diminishing returns and knowing when to move forward. Science advances through iteration and improvement, not through perfection on a first attempt.