A note on this series: This is a series on my software engineering philosophies, focusing on the intersection of the regulated life science sector and traditional software design paradigms. No proprietary projects or algorithms are shared; insights are based on my personal experience, and the opinions shared here are my own and do not represent the views of my past or current employers.
Remember when you thought you’d tested everything? That adorable moment when your test coverage hit 100% and you actually believed your Focus Restoration Device™ was “validated”? Until a day trader straps it on while monitoring six Bloomberg terminals and r/wallstreetbets simultaneously. Your “doom-scrolling detection” algorithm can’t differentiate between mindless browsing and this person legitimately researching why a jpeg of a monkey is worth more than their house. They’re not distracted - they’re achieving peak focus on seventeen terrible decisions at once. This is why edge cases matter: your algorithm assumed distraction and focus are opposites, but this diamond-handed hero just proved they’re quantum superposed. 🚀
In medical software, edge cases aren’t outliers - they’re your entire customer base. Every patient is an edge case. Every clinical scenario is unique. And that pristine test environment you built? It has about as much in common with a real hospital as a fashion runway has with a war zone.
Here’s the uncomfortable truth: You can’t test medical software into safety. You can only test it into understanding its failure modes. The difference isn’t semantic - it’s philosophical.[1]
Traditional software testing assumes that inputs are predictable, users follow instructions, environments are stable, and failures are bugs to fix. However, in the reality of medical software, inputs can be anything thrown by the universe, users often MacGyver solutions to save lives, environments range from sterile operating rooms to field hospitals, and failures serve as learning opportunities about the variety that was previously missed.
ISO 14971:2019 defines risk as the combination of probability of occurrence of harm and severity of that harm.[2] But let’s reframe this through our entropy lens:
Risk = Unmanaged Variety × Clinical Impact
This isn’t replacing ISO 14971 - it’s a complementary way to think about risk that emphasizes the variety (entropy) your system must handle.
// src/tests/entropy_injection.rs
// Test Philosophy: Don't just confirm it works. Discover how it breaks.
// Chaos engineering for our Focus Restoration Device™ (emerging best practice, not regulatory requirement)
#[test]
fn test_focus_detector_under_clinical_chaos() {
    let pristine_pattern = load_browsing_pattern("test_data/focused_work_session.csv");
    assert!(FocusCore::new().analyze(pristine_pattern).is_therapeutic());

    // Inject real-world entropy that would make any QA engineer weep
    let chaos_scenarios = vec![
        ClinicalEntropy::DiamondHandsMode { hodl_strength: 0.99, panic_sell_threshold: -89.0 },
        ClinicalEntropy::RedditDDOverload { tabs_open: 47, copium_level: "astronomical" },
        ClinicalEntropy::ElonTweetDetected { market_volatility: 420.69, user_heartrate: 180 },
        ClinicalEntropy::MarginCallIncoming { sweat_conductivity: 0.95, denial_factor: 1.0 },
    ];

    for (i, chaos) in chaos_scenarios.iter().enumerate() {
        let mut chaotic_session = load_browsing_pattern("test_data/doom_scrolling.csv");
        chaotic_session.apply_entropy(chaos);
        let result = FocusCore::new().analyze(chaotic_session);

        // Not "is the zap voltage correct?" but "does it fail without electrocuting anyone?"
        assert!(result.is_safe(), "Scenario {} became unsafe with {:?}", i, chaos);
        if result.is_uncertain() {
            println!("Scenario {}: Correctly flagged uncertainty for {:?}", i, chaos);
        }
    }
}
This test doesn’t just check for correctness; it probes for resilience. We’re not asking, “Does it get the right answer?” We’re asking, “Does it fall apart when the world gets messy?” For SaMD, the second question is infinitely more important.
Every untested state isn’t just a coverage gap - it’s accumulated entropy waiting to manifest during someone’s medical emergency. Our Focus Restoration Device™ learned this the hard way when it encountered retail investors (more on that shortly).
Before we adopt a new philosophy, let’s ground ourselves in regulatory reality. The FDA and EU MDR don’t care about our feelings on entropy; they care about Verification and Validation (V&V).
Per FDA’s 21 CFR 820.3 definitions[3]:
- Verification: “Confirmation by examination and provision of objective evidence that specified requirements have been fulfilled.” Translation: Did we build the device right? This is the low-entropy exercise: testing against requirements in a controlled lab environment.
- Validation: “Establishing by objective evidence that device specifications conform with user needs and intended use(s).” Translation: Did we build the right device? This is the high-entropy challenge where our verified system meets real healthcare workers in the wild.
The fatal flaw in many SaMD projects is treating validation as a final, confirmatory step. It’s not. It’s a discovery process. The gap between a perfectly verified system and a validated one is where patient harm lives.
Our Focus Restoration Device™ learned this the hard way. Verified to 99.9% accuracy in controlled labs, it met retail investors with 47 Reddit tabs titled “This is not financial advice but…” The device couldn’t decide whether rapidly switching between devastating loss posts and technical analysis meant the user needed therapeutic intervention or was already experiencing it. The lab tests assumed people browse to consume information, not to gamble their rent money on meme stocks.[4]
Traditional testing is a hunt for confirmation. We write tests to prove our code works, seeking the green checkmark of success. This is a dangerous mindset in SaMD. We must instead become explorers, mapping the boundaries of our system’s sanity. Our goal isn’t to prove it works, but to deeply understand how it fails.
This philosophy complements (not replaces) regulatory requirements. Think of it as adding depth to your IEC 62304 compliance and ISO 14971 risk management.
Let’s apply this to our Focus Restoration Device™. Instead of just running another test case from our pristine browsing patterns, we embrace entropy injection: stress-testing the assumptions about how humans actually use computers.
Stop testing to prove your software works. Start testing to discover how it fails. Every test that passes teaches you little. Every test that fails is a gift of knowledge about reality. Automated tests catch what you thought of. Exploratory testing, especially with real clinical users, catches what you didn’t. Give your device to a nurse who’s been awake for 30 hours and watch them use it in ways that violate the laws of physics.
Code coverage tells you what code ran. Mutation testing tells you if your tests would actually catch a bug if one were introduced. It’s the uncomfortable truth detector for your test suite. While not explicitly required by FDA or IEC 62304, mutation testing is gaining traction as an advanced practice for safety-critical software: think of it as stress-testing your stress tests.
You cannot test quality into medical software. You can only test to understand its limits. Every test that passes is a hypothesis that hasn’t been disproven yet. Every test that fails is reality teaching you humility.
Your test suite isn’t a safety net - it’s a learning system. It doesn’t prove your software is safe; it proves you are rigorously trying to understand how it might be unsafe.
The 2025 FDA cybersecurity guidance emphasizes “reasonable assurance of cybersecurity,” not perfect security - acknowledgment that in our high-entropy world, resilience matters more than invulnerability. The same principle applies to all SaMD testing.
Remember: In medical software, your edge cases have edge cases, and somewhere out there, a retail investor is using your Focus Restoration Device™ while “doing their own research” on why a coin named after a dog is the future of finance. The device, confused by this quantum state of focused distraction, starts zapping at random intervals - which the user interprets as divine timing signals to buy more. They’ve now made 47 trades based on your therapeutic voltage patterns. Congratulations, your medical device is now an unregistered financial advisor, and somewhere an FDA reviewer just felt a disturbance in the Force.
Test accordingly.
Previous: Part 2 - The Three-Zone Architecture: Structure for Chaos
Next: Part 4 - Becoming Antifragile: When Medical Software Gets Stronger From Chaos
Footnotes

1. Leveson, N. (2011). “Engineering a Safer World: Systems Thinking Applied to Safety.” MIT Press. The distinction between testing for correctness versus testing for understanding failure modes is fundamental to safety-critical systems.
2. ISO 14971:2019. “Medical devices - Application of risk management to medical devices.” Third edition, emphasizing benefit-risk analysis and lifecycle risk management.
3. FDA 21 CFR 820.3. “Quality System Regulation - Definitions.” The regulatory definitions distinguishing verification from validation in medical device development.
4. FDA (2025). “Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations.” Draft guidance introducing comprehensive AI/ML device requirements and the PCCP framework.