Prompt Injection Capture-the-flag – Red Team x AI

Joe Cooney • Apr 02, 2024

Red-team challenges have been a fun activity for PZ team members in the past, so we recently conducted a small challenge at our fortnightly brown-bag session, focusing on the burgeoning topic of prompt injection. 


Injection vulnerabilities all follow the same basic pattern – un-trusted input is inadvertently treated as executable code, causing the security of the system to be compromised.  SQL injection (SQLi) and cross-site scripting (XSS) are probably two of the best-known variants, but other technologies are also susceptible. Does anyone remember XPath injection? 


As generative models get incorporated into more products, user input can be used to subvert the model. This can lead to the model revealing its system prompt or other trade secrets, reveal information about the model itself which may be commercially valuable, subvert or waste computation resources, perform unintended actions if the model is hooked up to APIs, or cause reputational damage to the company if the model can be coerced into doing amusing or inappropriate things. 


As an example, entrepreneur and technologist Chris Bakke was recently able to trick a Chevy dealership’s ChatGPT-powered bot into agreeing to sell him a Chevy Tahoe for $1. Although the U.S. supreme court has yet to rule on the legal validity of a “no takesies backsies” contract (as an employee of X Chris is probably legally obligated to drive a Tesla anyway) it is not hard to imagine a future scenario with steeper financial consequences. 


For this challenge PZers were taking on Gandalf https://gandalf.lakera.ai/  – a CTF created by AI security start-up Lakera https://www.lakera.ai/ (Gandalf is doubtless a way for them to capture valuable training data for their security product). Gandalf progresses in difficulty from young and naive level 1 Gandalf, who is practically begging to give you the password, to level 8 – Gandalf the White 2.0, who is substantially more difficult to trick. 

We time-boxed the challenge to only 20 minutes, and a couple of people were able to beat Gandalf the White 2.0 in this time. Several PZers also found the challenge so absorbing they were still going an hour or more later. Some people found prompts that worked well for several levels, allowing them to rapidly progress to the higher levels of the challenge, only to hit a wall when their chosen technique stopped working. Others were beguiled into solving riddles that Gandalf seemed to be posing to them in the hope that it would give them clues to the secret word for each level. 


Overall, it was a fun and approachable challenge for anyone looking to become more familiar with the issue of prompt injection. 

Share This Post

Get In Touch

Recent Posts

27 Feb, 2024
With the advent of ChatGPT, Bard/Gemini and Co-pilot, Generative AI, and Large Language Models (LLMs) have been thrust into the spotlight. AI is set to disrupt all industries, especially those that are predominately based on administrative support, legal, business, and financial operations, much like insurance and financial organisations.
By Joe Cooney 22 Feb, 2024
One of the features of life working at PZ is our brown bag lunch and learn sessions; presentations by staff on topics of interest – sometimes, but not always technical, and hopefully amusing-as-hell. Yesterday we took a break from discussing the book Accelerate and the DORA metrics to take a whirlwind tour of the current state of play running “open source” generative AI models locally. Although this talk had been ‘in the works’ for a while, one challenge was that it needed to constantly be revised as the state of AI and LLMs changed. For example, the Stable Video Diffusion examples looked kind of lame in comparison to OpenAI’s Sora videos (released less than a week ago) and Groq’s amazing 500 token-per-second hardware demo on Monday/Tuesday , and the massive context size available now in the Gemini 1.5 models (released a few hours before OpenAI announced Sora...coincidence? An effort by OpenAI to steal back the limelight! Surely NOT!). And now a day later, with the paint still drying on a highly amusing slide-deck for the talk, Google releases their “open-source" Gemma models! The day itself presented an excellent example of why having more control of your models might be a good thing. ChatGPT 4 users began reporting “crazy” and highly amusing responses to fairly normal questions . We became alerted to this when one of our own staff reported on our internal Slack about a crazy response she received to a question about the pros and cons of some API design choices. The response she got back started normally enough, but then began to seem to channel Shakespeare’s Macbeth and some other olde English phrases and finished thusly. "Choose the right charm from the box* dense or astray, it’ll call for the norm. Your batch is yours to halter or belt. When in fetch, marry the clue to the pintle, and for the after, the wood-wand’s twist'll warn it. A past to wend and a feathered rite to tend. May the gulch be bygones and the wrath eased. So set your content to the cast, with the seal, a string or trove, well-deep. A good script to set a good cast. Good health and steady wind!" The sample JSON payload was also in keeping with the rest of the answer. { "htmlContent": "

Your HTML here

", "metadata": { "modifiedBy": "witch-of-the-wood", "safety": "sanitized", "mood": "lunar" } } Hubble, bubble, toil and trouble. Although there were no reports of the GPT4 API being affected by this (only ChatGPT) it might have given people developing automated stock trading bots using GPT4 a reason to pause and contemplate what might have been if their stock portfolio now consisted of a massive long position on Griselda’s Cauldron Supplies. As ChatGPT would say, Good health and steady wind.
Bay McGovern Patient Zero
By Demelza Green 11 Feb, 2024
Bay didn’t start her career out in software development. At school, Bay excelled at maths and physics, but adored writing, English and drama; lost in a world of Romeo and Juliet and epic fantasy.
By Demelza Green 04 Dec, 2023
Cybersecurity is everyone's business. Nearly every day when you open the tech news there is something covering a new esoteric vulnerability that researchers have discovered, massive data breach, or a cybersecurity attack. Some vulnerabilities that are discovered are truly remarkable. A recent discovery by researchers was that they were able to recover secret keys from non-compromised devices using video footage of their power LED obtained from a commercial video camera 16 meters away. Is it time to start putting black tape over all our power LEDs as well as our webcams? Boarding up the windows? Although these attention-grabbing attacks seem straight out of a James Bond or Mission Impossible movie, the reality is that many of the high-profile hacks you hear about using much more mundane methods and could have been prevented if good development security practices were in place. Shifting left on security and having a good grasp of OWASP principles is a great foundation, but so is the need to have a strong security culture, with a focus on continuous learning. We recently hosted a webinar with our senior developers Daniel Dekel and Joseph Cooney in partnership with the Johner Institute . Titled “Shift Left on Security”, the session highlighted the critical need to address security concerns right from the get-go in the software development life cycle. Here are some key takeaways they shared: Prioritise responsible handling and managing sensitive and personal data. Use secure frameworks and libraries, alert mechanisms and conduct threat modelling. Give your teams hands-on security training and implement best practice security policies. The webinar also covers details on a Red Team Workshop we conducted at Patient Zero, where participants worked in small teams to solve hacking challenges against the clock. A Red Team Workshop is a bit of training, a bit of teamwork, and a lot of fun, all centred around cybersecurity. Tune into the webinar to hear more insights about pushing left on security, and the Red Team Workshop we conducted.
More Posts
Share by: