zkPrompt
In large language models (LLMs), prompts are categorized into System Prompts and User Prompts. Both types of prompts directly impact the user experience and the effectiveness of the model, with system prompts playing a particularly crucial role. A typical LLM interaction workflow is illustrated in Figure 1.
System Prompts are provided by developers to define the context, tone, and boundaries of LLM responses. Acting as a guiding framework, they shape the AI's behavior and style throughout the interaction. Mastering system prompt design has become a core skill for LLM developers, which is why most developers are reluctant to disclose their system prompts. Moreover, many LLMs operate as black boxes from the user's perspective, so users have no way to verify that the system prompt stays consistent.
System Prompt Consistency refers to the guarantee that the model responds according to the same predefined system prompt across different interactions, without any alteration. This ensures that the model's behavior aligns with the framework the developer intended.
To address this, we propose zkPrompt, a method leveraging zero-knowledge proof (ZKP) algorithms to prove the consistency of model behavior (i.e., system prompt consistency) without revealing the system prompt itself. As shown in Figure 2, the process of initializing the model with a system prompt is compiled into a circuit, and a commitment module is introduced within the circuit to commit to the system prompt provided to the model.
The zk circuit enforces that the committed prompt and the prompt used to initialize the model are identical. The commitment value is exposed as a public output of the circuit, so any user can check it against the initial commitment value.
A cryptographic commitment is a protocol that allows one party to commit to a value without revealing its content to another party until the committer chooses to disclose it. Common commitment schemes include hash commitments and Pedersen commitments.
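For illustration, a hash commitment can be sketched in a few lines of Python. The SHA-256 construction, the blinding randomness, and the function names below are assumptions made for this example; they are not prescribed by zkPrompt, which could equally use a Pedersen commitment.

```python
import hashlib
import secrets

def commit(message: str, randomness: bytes | None = None) -> tuple[bytes, bytes]:
    """Hash commitment c = SHA-256(randomness || message).

    The randomness (blinding factor) hides the message; the hash binds
    the committer to it.
    """
    r = randomness if randomness is not None else secrets.token_bytes(32)
    c = hashlib.sha256(r + message.encode("utf-8")).digest()
    return c, r

def verify_opening(c: bytes, message: str, r: bytes) -> bool:
    """Check that (message, r) is a valid opening of commitment c."""
    return hashlib.sha256(r + message.encode("utf-8")).digest() == c

# Example: committing to a (hypothetical) system prompt.
system_prompt = "You are a helpful assistant. Stay within the travel-booking domain."
c, r = commit(system_prompt)
assert verify_opening(c, system_prompt, r)          # correct opening passes
assert not verify_opening(c, "altered prompt", r)   # any change is detected
```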
In zkPrompt, developers can keep the committed system prompt confidential (the commitment is hiding) while ensuring that, once committed, the prompt cannot be altered (the commitment is binding). This guarantees the integrity of the system prompt.
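To make the circuit's job concrete, the relation it enforces can be written out in the clear. The Python function below (reusing verify_opening from the sketch above) states what a zkPrompt proof attests to; in the actual system this check is expressed as circuit constraints and proved in zero knowledge, so neither the prompt nor the randomness is revealed. All names here are illustrative.

```python
def prompt_consistency_statement(
    public_commitment: bytes,   # public input: the on-chain commitment
    system_prompt: str,         # private witness: the secret system prompt
    randomness: bytes,          # private witness: blinding factor of the commitment
    prompt_used_for_init: str,  # private witness: prompt actually fed to the model
) -> bool:
    """The statement proved in zero knowledge: the prompt used to initialize
    the model is exactly the one the developer committed to."""
    return (
        verify_opening(public_commitment, system_prompt, randomness)
        and prompt_used_for_init == system_prompt
    )
```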
The end-to-end workflow is as follows:

1. The developer commits to the system prompt and sends the commitment value on-chain.
2. The circuit is initialized, generating a prover key and a verifier key; the verifier key is sent on-chain.
3. The user sends a User Prompt.
4. The LLM processes the user request and generates a Response along with a zk Proof, which is then sent on-chain for verification.
5. The smart contract uses the initial system prompt commitment value to verify the zk Proof. If verification passes, the system prompt has not been tampered with.
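Putting the steps together, the sketch below mocks the on-chain verifier with a plain Python class, reusing commit(), system_prompt, and the opening (c, r) from the first sketch. The "proof" here is just a recomputed commitment standing in for a real zk proof, and the circuit setup of step 2 is omitted, so this only mirrors the data flow; it is not zkPrompt's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MockVerifierContract:
    """Placeholder for the on-chain contract of steps 1 and 5: it stores the
    initial prompt commitment and checks the proof attached to each response."""
    prompt_commitment: bytes                      # step 1: commitment sent on-chain
    accepted: list = field(default_factory=list)  # (user_prompt, response) pairs that verified

    def verify(self, user_prompt: str, response: str, proof: bytes) -> bool:
        # Step 5: a real contract would run a SNARK verifier with the verifier
        # key and the commitment as public input; here we only compare values.
        ok = proof == self.prompt_commitment
        if ok:
            self.accepted.append((user_prompt, response))
        return ok

# Step 1: the developer commits to the system prompt and publishes the commitment.
contract = MockVerifierContract(prompt_commitment=c)

# Steps 3-4: the user sends a prompt; the LLM service responds and attaches a
# "proof" that it was initialized with the committed system prompt.
def llm_service(user_prompt: str) -> tuple[str, bytes]:
    prompt_used_for_init = system_prompt                     # an honest service
    response = f"(model answer to: {user_prompt!r})"
    proof, _ = commit(prompt_used_for_init, randomness=r)    # placeholder zk proof
    return response, proof

response, proof = llm_service("Plan a weekend trip to Kyoto.")
assert contract.verify("Plan a weekend trip to Kyoto.", response, proof)  # step 5 passes
```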