The Case for Keeping Health Data on Your Phone

In this article

The privacy problem
What on-device ML can do
Why health is a good fit
How Helix works
The 94-day metric
Constraint as design
Privacy and capability

The privacy problem with cloud health AI

Most consumer health AI sends your data to a server. Some tools are transparent about this. Many are not. The data center receives your heart rate history, your sleep patterns, your blood panel results, your menstrual cycle data, your medication logs - and processes it to return an analysis or a recommendation. The analysis may be useful. The privacy cost is extraordinary.

The problem is not simply that the data is stored somewhere outside your control, though that is a problem. The problem is the entire category of risk that transmission creates. Health data that leaves your device can be subpoenaed in legal proceedings. It can be breached. It can be sold or shared under terms of service that users do not carefully read. It can be accessed by third-party analytics partners whose existence is buried in a privacy policy. In the United States, the protections afforded to health data held by technology companies are substantially weaker than protections afforded to data held by covered entities under HIPAA.

The companies building these tools have strong incentives to collect the data. More data produces better models. Better models are a competitive advantage. The business model of most consumer health platforms is, at some level, the data. The analysis is the product that justifies data collection. This is not a conspiracy - it is the logical outcome of building a business around data-dependent AI in an era when health data regulation has not kept pace with technology.

We started with the premise that this was not acceptable for Helix. Health data is among the most sensitive data that exists about a person. It reveals conditions, behaviors, and vulnerabilities that affect insurance, employment, relationships, and personal safety. No analysis is worth the risk of that data leaving the device that holds it.

What on-device ML can do with Apple Silicon

Until recently, the on-device argument for health AI would have been technically compromised. Complex ML models - the kind required for longitudinal biomarker analysis - were too computationally demanding for mobile hardware. You could do simple statistical analysis on-device. You could not do anything that required a transformer-based model or the kind of time-series analysis that produces meaningful trend detection.

Apple Silicon changed this. The Neural Engine in A-series chips from the A15 onward, and in M-series chips, has sufficient throughput to run the class of model that Helix requires - not just classification tasks, but continuous inference over longitudinal data. Core ML provides a framework for deploying quantized models that run efficiently on the Neural Engine without requiring server infrastructure.
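For readers unfamiliar with the term, quantization means storing model weights at lower numeric precision so they take less memory and compute. The sketch below shows the general idea with symmetric 8-bit linear quantization. It is an illustration only: the function names and sample weights are hypothetical, and it does not depict Core ML's tooling or the scheme Helix uses.

```python
def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integers in
    [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half the scale factor.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of four, at the cost of a small, bounded rounding error.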

The iPhone in the pocket of a Helix user has, since 2021, been capable of running models that were datacenter workloads three years earlier. That capability shift is the technical foundation that made Helix possible as an on-device product. Not as a compromised, limited version of an on-device product - as a product that would not have been meaningfully better if it sent data to a server.

The Neural Engine executes matrix operations in hardware - the same operations that underlie transformer attention and the time-series models at the core of Helix. At 16+ TOPS in modern devices, these operations complete fast enough to run inference on aggregated biomarker data continuously in the background without meaningful battery impact. The constraint of on-device processing, which would have forced real compromises in 2019, is no longer a meaningful constraint for the models Helix requires.

Why longitudinal health analysis is a particularly good fit for on-device AI

Beyond the privacy requirement, on-device architecture turns out to be specifically well-suited to the problem of longitudinal health analysis. This is not a coincidence - it is a structural property of the problem.

Longitudinal analysis is fundamentally about an individual's history. The model needs to understand your baseline - your personal normal range for HRV, your typical resting heart rate, your sleep quality distribution, your lab value history. That baseline is different from the population baseline. An HRV of 38ms might be a significant decline for someone whose typical range is 55-70ms. It might be entirely normal for someone whose historical range is 32-45ms. The clinical relevance depends on the individual context.
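The HRV example can be made concrete with a small sketch. It is an illustration, not Helix's model: the helper `personal_z_score` and both sample histories are hypothetical, and a plain z-score stands in for whatever statistic a production system would use.

```python
from statistics import mean, stdev


def personal_z_score(reading, history):
    """Return how many standard deviations `reading` sits from the
    mean of this person's own history (hypothetical helper)."""
    return (reading - mean(history)) / stdev(history)


# Two hypothetical users who both record an HRV of 38 ms today.
high_baseline = [55, 58, 62, 65, 70, 60, 63]  # typical range 55-70 ms
low_baseline = [32, 35, 38, 41, 45, 36, 40]   # typical range 32-45 ms

z_high = personal_z_score(38, high_baseline)
z_low = personal_z_score(38, low_baseline)

print(f"high-baseline user: z = {z_high:.2f}")
print(f"low-baseline user:  z = {z_low:.2f}")
```

The same 38 ms reading lands several standard deviations below the first user's norm and almost exactly on the second user's mean, which is why a single population cutoff cannot serve both.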

This means that the most important data for the model - the personal baseline - is held entirely on your device. The HealthKit history, the lab PDFs you have imported, the years of Apple Watch data - none of it benefits from being on a server where it can be compared to population averages. The comparison that matters is within-individual over time, and that data is already on your device.

How Helix works technically

The core of Helix is a longitudinal biomarker model that runs continuously in the background. It reads HealthKit data through Apple's authorized APIs - no scraping, no backdoors, the same mechanisms any HealthKit-authorized app uses. It imports lab results through a PDF parsing pipeline that runs locally, extracting values and mapping them to standardized biomarker identifiers. It integrates Apple Watch continuous measurements through HealthKit's background delivery mechanism.
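The mapping step of a lab import might look roughly like the sketch below. It assumes the text has already been extracted from the PDF, and everything in it is hypothetical: the alias table, the identifiers such as `hba1c`, the line pattern, and the sample report. Real lab reports vary far more than this pattern allows.

```python
import re

# Hypothetical alias table: label as printed on a report -> internal identifier.
BIOMARKER_ALIASES = {
    "hemoglobin a1c": "hba1c",
    "hba1c": "hba1c",
    "ldl cholesterol": "ldl_c",
    "ldl-c": "ldl_c",
}

# Matches lines shaped like "LDL Cholesterol  131 mg/dL".
LINE_PATTERN = re.compile(
    r"^\s*(?P<label>.+?)\s+(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)\s*$"
)


def parse_lab_lines(text):
    """Extract (identifier, value, unit) tuples from extracted report text.
    Lines with unrecognized labels are skipped rather than guessed."""
    results = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue
        identifier = BIOMARKER_ALIASES.get(match["label"].strip().lower())
        if identifier is None:
            continue
        results.append((identifier, float(match["value"]), match["unit"]))
    return results


sample = """Patient report
Hemoglobin A1c   5.6 %
LDL-C 131 mg/dL
Unknown Marker 12 U/L
"""
parsed = parse_lab_lines(sample)
```

Skipping unknown labels is a deliberate choice in this sketch: a value filed under the wrong biomarker would corrupt the baseline, so dropping it is the safer failure.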

All of these data streams are normalized into a unified timeline. HRV measurements from different sensors are reconciled. Lab values from different labs are mapped to consistent reference ranges. Sleep data from HealthKit and Apple Watch is deduplicated and integrated. The resulting timeline is the substrate that the longitudinal model analyzes.
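Deduplication is the easiest of these steps to illustrate. When two sources report overlapping sleep segments for the same night, adding them up double-counts the overlap. A minimal sketch, assuming each segment is a (start, end) pair in minutes from an arbitrary reference point; the function name and sample data are hypothetical:

```python
def merge_sleep_intervals(intervals):
    """Merge overlapping (start, end) intervals so that each minute of
    sleep is counted once, regardless of how many sources reported it."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical night reported by two sources with slightly different edges.
source_a = [(0, 180), (200, 420)]
source_b = [(10, 190), (195, 400)]

unified = merge_sleep_intervals(source_a + source_b)
total_minutes = sum(end - start for start, end in unified)
naive_minutes = sum(end - start for start, end in source_a + source_b)
```

Here the naive sum nearly doubles the night, while the merged timeline keeps the short gap between the two sleep periods intact.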

The model identifies deviations from personal baselines using statistical methods that account for normal variation, seasonal patterns, and the natural measurement noise in consumer-grade sensors. When it detects a pattern that exceeds a configurable significance threshold, it surfaces a finding - a description of what it noticed, with the data that supports the observation and a confidence level. It does not diagnose. It observes.
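One simple way to separate a sustained shift from a single noisy reading is to compare medians against a robust personal baseline. The sketch below uses the median absolute deviation (MAD) for this. It is a stand-in technique chosen for illustration, not the method Helix uses: the function name, the threshold default, and the sample data are assumptions, and it ignores seasonal patterns and does not produce a confidence level.

```python
from statistics import median


def detect_deviation(history, recent, threshold=3.0):
    """Compare the median of `recent` readings with a robust baseline
    built from `history`. Returns a finding dict, or None when the
    shift stays under `threshold` robust standard deviations."""
    baseline = median(history)
    # Median absolute deviation, scaled by 1.4826 so it approximates
    # a standard deviation for normally distributed data.
    mad = median(abs(x - baseline) for x in history)
    spread = 1.4826 * mad
    if spread == 0:
        return None  # history has too little variation to judge against
    score = (median(recent) - baseline) / spread
    if abs(score) < threshold:
        return None
    return {
        "baseline": baseline,
        "recent": median(recent),
        "score": round(score, 2),
        "direction": "below" if score < 0 else "above",
    }


# Hypothetical resting heart rate data, in beats per minute.
history = [58, 60, 59, 61, 60, 62, 59, 60, 61, 58]

sustained = detect_deviation(history, [66, 67, 65, 66, 68])  # whole week elevated
one_off = detect_deviation(history, [60, 61, 59, 66, 60])    # one noisy reading
```

Because both the baseline and the recent window use medians, the single 66 bpm reading in the second window is ignored, while the week of elevated readings is surfaced.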

All of this runs on the device, with model inference on the Neural Engine, in the background and without requiring an active internet connection. Helix works offline. The model improves as more data accumulates - but the improvement is in the personal baseline model, not in a shared model updated with user data. Your data makes Helix better for you. It does not contribute to a product that is better for everyone at the cost of your privacy.

The 94-day lead time metric and what it means

The 94-day average lead time is the most frequently cited Helix metric and the most frequently misunderstood. It is worth being precise about what it does and does not claim.

The 94-day figure represents the average time between when Helix flags a trend in a user's biomarker data and when that trend would have been detected through standard-of-care medical review - typically an annual or semi-annual physical examination with blood work. It is derived from a study of cases where Helix flagged a trend, the user subsequently discussed the finding with a physician, and the physician confirmed that the finding was clinically significant.

It does not mean that every Helix flag corresponds to a clinically significant finding. Many flags are investigated and found to be within normal variation. The metric applies specifically to findings that are ultimately confirmed as meaningful. It does not claim that Helix can diagnose conditions - it claims that when there is a real change in a user's biomarker profile, Helix tends to notice it before a physician would encounter it through routine care.
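The conditioning described above is easy to see in code. In this sketch the lead time is averaged over confirmed findings only, and unconfirmed flags never enter the calculation. The record fields, the function name, and the dates are all made up for illustration; they are not study data and are not meant to reproduce the 94-day figure.

```python
from datetime import date


def mean_lead_time_days(cases):
    """Average number of days between the flag date and the date routine
    care would have caught the trend, over confirmed cases only."""
    confirmed = [c for c in cases if c["confirmed"]]
    if not confirmed:
        return None
    leads = [(c["routine_detection"] - c["flagged"]).days for c in confirmed]
    return sum(leads) / len(leads)


# Hypothetical records: two confirmed findings and one flag that turned
# out to be within normal variation.
cases = [
    {"flagged": date(2023, 1, 1), "routine_detection": date(2023, 3, 2), "confirmed": True},
    {"flagged": date(2023, 2, 1), "routine_detection": date(2023, 6, 1), "confirmed": True},
    {"flagged": date(2023, 3, 5), "routine_detection": None, "confirmed": False},
]

average = mean_lead_time_days(cases)
```

The unconfirmed flag has no effect on the average, which is why the metric says nothing about how often flags turn out to be meaningful.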

The 94 days matters because early detection matters. A trend that is visible in biomarker data three months before symptoms appear is, in many cases, more tractable than a condition that presents acutely. The value is not in alarming users - the value is in surfacing information that can be evaluated calmly, in conversation with a physician, before it becomes an emergency.

Why the constraint of on-device made the product better

We said we built Helix around the on-device constraint - not despite it. That deserves unpacking, because it sounds like the kind of thing companies say to make a limitation sound like a feature.

The constraint genuinely shaped the product in ways that improved it. Because the model runs on the user's device and has access only to the user's data, we were forced to build a personal baseline model rather than a population baseline model. A population baseline model would have been easier - you train it on aggregate data and deploy it to all users. A personal baseline model requires building systems that work well when you have only one user's data, accumulated over time. That is harder, and it is better, because the clinically relevant baseline for longitudinal health analysis is the individual's own history.

Because the processing is on-device, the product works offline. This turned out to matter in ways we had not fully anticipated. Users who travel internationally, users in rural areas, users who deliberately limit connectivity - for all of these users, on-device was not just a privacy feature but a usability feature. The analysis runs when you wake up in the morning, connected or not, because it does not need a server.

And because the model does not send data to our servers, we are not tempted by the data. This sounds trivial. It is not. The availability of user health data creates pressure - analytical, financial, strategic - to use it. On-device architecture removes that temptation architecturally, not through policy. Policy can change. Architecture does not change without rewriting the product.

Privacy and capability are not in tension - they're the same thing, correctly designed

The conventional framing of consumer health AI presents privacy and capability as a trade-off. More privacy means less capability - you give up features or accuracy in exchange for keeping your data local. We think this framing is wrong, and Helix is our argument for why.

The privacy architecture is not a constraint that limits capability. It is the architecture that enables the right capability - longitudinal, individual, personal - rather than a compromised capability that requires population data to paper over the absence of individual context. On-device processing with Apple Silicon is not slower than server-side processing in a way that matters for this application. It is faster in every user-facing interaction because there is no network latency. The model is better for the individual because it is trained on the individual's data, not constrained to represent an average person.

Privacy and capability are not in tension when the design is correct. The tension arises from building a product around a data model that requires bulk collection - and then trying to apply privacy policies on top of that model. Start from the architecture. Build for the individual. Process on-device. The capability that results is not less than what you would have with server-side processing. In the domain of longitudinal health analysis, it is more.