As LLM-driven agents transition from experimental prototypes to real-world deployments, a critical design challenge is growing increasingly difficult to address: the more powerful and useful cloud-based memory becomes, the more it exposes sensitive user information. A collaborative research team from MemTensor (Shanghai), HONOR Device, and Tongji University has unveiled **MemPrivacy**, a framework designed to resolve this conflict without undermining the very utility that makes personalized memory valuable in the first place.
## The Core Problem With Cloud Memory
Every time you engage with an AI agent, your conversation may reveal highly sensitive information — health conditions, email addresses, financial details, passwords, and other personal data. In a standard edge-cloud setup, the user’s device (the edge) handles input processing, while the computationally intensive tasks of memory management and reasoning occur in the cloud. While this architecture is efficient, it means raw, unredacted user data is transmitted to and stored within cloud infrastructure.
The threat is far from hypothetical. Earlier research has demonstrated that multi-turn memory attacks can successfully extract private information at rates as high as 69%, and targeted leakage attacks against memory systems have achieved a 75% success rate. Indirect prompt injection techniques can even trick agents into actively coaxing users to reveal confidential details. Once sensitive content makes its way into cloud logs, vector databases, or external memory stores, it remains potentially accessible through subsequent storage, retrieval, and reuse cycles — long after the original conversation has ended.
Previous attempts to solve this problem have relied on masking — substituting sensitive values with generic tokens like `***`. The fundamental flaw with this approach is that it destroys meaning. For example, if a user asks an agent to compose an email to their doctor and both the blood pressure reading and email address are replaced with `***`, the cloud model cannot meaningfully complete the task. More rigorous methods such as differential privacy and cryptographic protection offer stronger theoretical guarantees, but they are notoriously difficult to weave into interactive memory pipelines without significantly degrading the quality of responses.
## What MemPrivacy Does Differently
Instead of masking private content outright, MemPrivacy substitutes it with **typed placeholders**: structured tokens that indicate the type of the value they stand in for.
This approach is called ***local reversible pseudonymization***, and the complete pipeline operates in three stages:
- **Stage 1 (Uplink Desensitization):** A lightweight on-device model scans the input for privacy-sensitive segments, classifies each by type and sensitivity level, and replaces them with typed placeholders. The mappings between original values and their corresponding placeholders are stored locally and persist across sessions, ensuring that the same value always maps to the same placeholder.
- **Stage 2 (Cloud Processing):** The sanitized input is forwarded to the cloud agent or memory system. The typed placeholders preserve enough semantic context for memory formation and retrieval to work correctly.
- **Stage 3 (Downlink Restoration):** The cloud’s response, which may contain placeholders, is restored on the device through a lightweight database lookup and string substitution process, introducing negligible latency.
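
The three stages can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Pseudonymizer` class, the bracketed `[EMAIL_1]` token format, and the two regular expressions, which stand in for the fine-tuned on-device detection model described in the article. It is not the authors' implementation.

```python
import re

# Hypothetical detector: two regexes stand in for the on-device model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "BLOOD_PRESSURE": re.compile(r"\b\d{2,3}/\d{2,3}\b"),
}


class Pseudonymizer:
    """Keeps a local, reversible mapping between raw values and placeholders."""

    def __init__(self):
        self.value_to_placeholder = {}  # (kind, value) -> token
        self.placeholder_to_value = {}  # token -> value
        self.counters = {}              # kind -> last index used

    def _placeholder_for(self, kind, value):
        key = (kind, value)
        if key not in self.value_to_placeholder:
            # New value: assign the next index for this type.
            self.counters[kind] = self.counters.get(kind, 0) + 1
            token = f"[{kind}_{self.counters[kind]}]"  # assumed token format
            self.value_to_placeholder[key] = token
            self.placeholder_to_value[token] = value
        return self.value_to_placeholder[key]

    def desensitize(self, text):
        """Stage 1: replace detected spans before anything leaves the device."""
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self._placeholder_for(k, m.group(0)), text
            )
        return text

    def restore(self, text):
        """Stage 3: substitute original values back into the cloud response."""
        for token, value in self.placeholder_to_value.items():
            text = text.replace(token, value)
        return text
```

In this sketch, Stage 2 is simply whatever the cloud agent does with the output of `desensitize`; only `restore`, running on the device, can map tokens back to raw values.
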
## A Four-Level Privacy Taxonomy
A central contribution of the research team is a four-level privacy taxonomy (PL1–PL4) that precisely defines what gets protected and at what threshold:
- **PL1** encompasses general preferences, habits, and stylistic choices that do not identify an individual and carry minimal risk. These are not protected by default.
Users can set their own masking threshold — for instance, choosing to protect only PL3 and PL4 data, or applying full protection across PL2 through PL4 — offering fine-grained control over the balance between privacy and data usefulness.
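
A user-configurable threshold of this kind could be modeled as an ordered enum plus a filter. The sketch below is illustrative only: `PrivacyLevel`, `Span`, and `spans_to_protect` are hypothetical names, and the level assigned to each sample span is a guess made for the example rather than a classification taken from the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class PrivacyLevel(IntEnum):
    PL1 = 1  # general preferences and habits
    PL2 = 2
    PL3 = 3
    PL4 = 4  # highly sensitive, immediately exploitable credentials


@dataclass(frozen=True)
class Span:
    text: str
    kind: str
    level: PrivacyLevel


def spans_to_protect(spans, threshold):
    """Return only the spans at or above the user's chosen threshold."""
    return [s for s in spans if s.level >= threshold]


# Illustrative spans; the PL2 assignment for an email address is an assumption.
spans = [
    Span("likes jazz", "PREFERENCE", PrivacyLevel.PL1),
    Span("a@b.example", "EMAIL", PrivacyLevel.PL2),
    Span("hunter2", "PASSWORD", PrivacyLevel.PL4),
]
strict = spans_to_protect(spans, PrivacyLevel.PL2)   # protects email and password
relaxed = spans_to_protect(spans, PrivacyLevel.PL3)  # protects password only
```
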

## MemPrivacy-Bench and Model Training
To develop and assess their method, the researchers built MemPrivacy-Bench, a benchmark comprising 200 synthetic user profiles and more than 155,000 privacy-related instances (125,776 for training, 29,967 for testing) across balanced Chinese and English conversations. The dataset spans 7 broad scenario categories and 23 detailed subcategories. The test set includes 615 question-answer pairs covering six memory task types: basic recall, temporal reasoning, adversarial questioning, dynamic updates, implicit inference, and information aggregation. Annotations were initially produced by a dual-model pipeline using Gemini-3.1-Pro and GPT-5.2, then validated by six human annotators, reaching a final annotation accuracy of 98.08%.
The MemPrivacy extraction models are fine-tuned from Qwen3 base models at 0.6B, 1.7B, and 4B parameter scales. Training involves supervised fine-tuning (SFT) followed by reinforcement learning with Group Relative Policy Optimization (GRPO). GRPO calculates advantages based on relative rewards across multiple sampled outputs per input, using F1 score as the reward signal — eliminating the need for a separate critic model and its associated computational cost. The training split used 160 user profiles, while the test split used 40.
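
The group-relative advantage idea can be shown in a few lines of Python. This sketch follows the commonly published GRPO formulation, in which each sampled output's reward is normalized by the mean and standard deviation of its group. The article does not specify the team's exact variant, and scoring spans by exact `(type, value)` match is an assumption made here to keep the reward simple.

```python
from statistics import mean, pstdev


def span_f1(predicted, gold):
    """F1 over exact-match (type, value) spans; used as the reward."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (no critic model)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


gold = [("EMAIL", "a@b.example"), ("PASSWORD", "hunter2")]
samples = [
    [("EMAIL", "a@b.example"), ("PASSWORD", "hunter2")],  # both spans found
    [("EMAIL", "a@b.example")],                           # one span missed
    [("PHONE", "555-0100")],                              # nothing correct
]
rewards = [span_f1(s, gold) for s in samples]
advantages = group_relative_advantages(rewards)
```

Outputs that beat their group's average receive a positive advantage and the rest a negative one, which is why no separately trained value model is needed.
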
## Experimental Results
On MemPrivacy-Bench, the top-performing model — MemPrivacy-4B-RL — achieves an F1 score of 85.97%, surpassing Gemini-3.1-Pro’s 78.41%, the strongest general-purpose model tested. Even the smallest variant, MemPrivacy-0.6B-SFT, reaches 83.09% F1, outperforming every general-purpose model evaluated. On the out-of-distribution PersonaMem-v2 benchmark, MemPrivacy-4B-RL scores 94.48% F1, compared to 92.18% for DeepSeek-V3.2-Think, the leading general model on that benchmark.
OpenAI recently released Privacy-Filter, an open-source bidirectional token-classification model for PII detection. It achieves 35.50% F1 on MemPrivacy-Bench — a gap of over 50 percentage points behind the best MemPrivacy model — though it runs at significantly lower latency (0.34s versus roughly 2s for MemPrivacy models on the same benchmark).
For downstream memory utility, MemPrivacy was evaluated across three popular memory systems: LangMem, Mem0, and Memobase. When protecting all PL2–PL4 content, accuracy drops relative to the no-protection baselines are limited to 0.73%–1.30% on MemPrivacy-Bench and 0.71%–1.60% on PersonaMem-v2. In contrast, irreversible masking causes accuracy drops of 16.99%–41.87% on MemPrivacy-Bench, while untyped placeholder masking leads to drops of 4.72%–6.67% on MemPrivacy-Bench and 2.67%–8.71% on PersonaMem-v2.
## Key Takeaways
- MemPrivacy substitutes sensitive user data with semantically typed placeholders on the user’s device before sending anything to the cloud, ensuring the cloud-based memory system never sees raw private values.
- The framework defines a four-tier privacy classification system (PL1–PL4), spanning from general preferences to highly sensitive, immediately exploitable credentials, with customizable masking thresholds set by the user.
- MemPrivacy-4B-RL achieves an F1 score of 85.97% on MemPrivacy-Bench and 94.48% on PersonaMem-v2, surpassing GPT-5.2 (68.99%) and Gemini-3.1-Pro (78.41%) in privacy span extraction tasks.
- When applied at the PL2–PL4 level across LangMem, Mem0, and Memobase, MemPrivacy limits memory utility degradation to at most 1.60%, whereas irreversible masking can cause accuracy drops as high as 41.87%.
- With model sizes ranging from 0.6B to 4B parameters and per-message inference completed in under two seconds, the framework is well-suited for on-device deployment without introducing noticeable latency.
## Marktechpost’s Visual Explainer
In the explainer’s example, the cloud model can draft the email correctly without ever seeing the raw values 160/110 or RC-7291.
**Session Persistence:** The original-to-placeholder mapping is kept in a local secure database and remains available across sessions. Each unique value is always assigned the same placeholder, ensuring reliable long-term recall. Multiple spans of the same type are differentiated by incremental indices.
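
A persistent mapping with per-type incremental indices could look roughly like the following, using Python's built-in `sqlite3` module. The table layout, the function names, and the `[EMAIL_1]` token format are assumptions made for illustration. A plain SQLite file is also not a secure store in itself, so a real deployment would presumably add encryption or OS-level key storage.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local mapping store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mapping ("
        " kind TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " placeholder TEXT NOT NULL UNIQUE,"
        " PRIMARY KEY (kind, value))"
    )
    return conn


def get_placeholder(conn, kind, value):
    """Return the stable placeholder for a value, creating one if needed."""
    row = conn.execute(
        "SELECT placeholder FROM mapping WHERE kind = ? AND value = ?",
        (kind, value),
    ).fetchone()
    if row:
        return row[0]  # same value always maps to the same placeholder
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM mapping WHERE kind = ?", (kind,)
    ).fetchone()
    placeholder = f"[{kind}_{count + 1}]"  # next index for this type
    conn.execute(
        "INSERT INTO mapping VALUES (?, ?, ?)", (kind, value, placeholder)
    )
    conn.commit()
    return placeholder
```

Passing a file path instead of `":memory:"` keeps the mapping across process restarts, which is what lets a later session reuse the placeholder that an earlier session assigned.
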



