We take communication for granted because we do it so frequently, but it’s actually a complex process. - Joseph Sommerville
Note: There are a lot of pictures in this post, and if you are reading by email, your email provider might truncate the message. You can click "View entire message" to see the whole thing.
One AI concept that is currently gaining traction is prompt engineering. Prompt engineering is a way of using text to instruct an AI system to generate content (text, images, video, audio, etc.). It takes effort, and it is necessary, because each AI system interprets text differently, so the same prompt can produce results of very different quality from one system to the next. Put simply, you can think of prompt engineering as determining the proper way to talk to an AI system to convey the idea you are trying to produce.
Jim Fan, an AI scientist at Nvidia, has his own explanation of what prompt engineering is and why we need it. You can view the full thread here.

Jim has an additional take on why prompt engineering exists for these systems.


I completely disagree with Jim Fan here. With a generative AI system, we need a way to trigger the output we want. That means we have to turn the concept in our head into written words that the AI interprets well enough to produce a high-quality output. This is fundamentally a communication issue that will never disappear. If I asked twenty people to draw me a “majestic dog” I would get twenty different designs. Even if they are all high quality, they will likely show different breeds. Is that wrong? Or have I not fully conveyed the concept in my head for someone else to draw?
However, what I do agree with Jim on is that there will be no prompt engineering jobs. Prompt engineering will be a skill that everyone will need to have. So let’s walk through what it takes to do prompt engineering well. For our example, we will create our way to the image at the top of this post. So let’s fire up DALLE-2 and see how we get there.
Based on the image at the start of this post, what words would you use to generate it? Take a moment to think. If you want to be hands-on, feel free to follow along by signing up for DALLE-2 and using the initial free credits to try the prompts you think will get to the same image.
Note: As we go through these outputs, I’ll be referring to images by their position. DALLE-2 outputs four images. Image 1 will be the first image on the left continuing up to image 4 which will be the last image on the right.
The basic idea I had in my head before starting was that AI presents gifts to people from a seemingly other dimension and I’d like to convey that as an image. We need to find a way to communicate that idea while giving the system enough leeway to delight us. As we walk through, we’ll discover the things that the system picks up on, where it might need to improve, and how we might need to communicate the idea better.
Let’s take a first pass and use the following prompt:
AI giving a gift to people from another dimension
For me, these leave a lot to be desired. Image 1 seems more like an alien. Perhaps we should try building up the image component by component. I think we need the idea of a portal to convey the other dimension. Let’s start there.
a gift being handed through a portal
Hmm. That’s not quite what I had in mind. I’d like more of a beach and ocean setting. Also, the gift isn’t usually being handed off through the portal in these images. They are more in the foreground. Let’s try to get the setting.
a gift being handed through a portal on the beach by the sea
Images 3 and 4 don’t have a recipient and images 1 and 2 look weird. We are getting the beach and ocean in there so that’s a plus. However, we see more of a portal opening up to a beach rather than a portal magically appearing on the beach.
Let’s try and improve the portal by providing a little more description.
a gift being handed through an energy portal on the beach by the sea
What? Well, that went CGI-like pretty quickly. Image 1 doesn’t have a portal, and image 2 has some weird depth issues. Image 3 is just strange, and image 4 looks like a bad Photoshop job from the early 2000s.
Let’s keep trying to make the portal better. We’ll do this by being more explicit about the gift and the portal.
a glowing gift being handed through a curved energy portal on the beach by the sea
Ugh. Well image 4 probably has the closest setting to what I’m trying to accomplish. I’m not sure what happened to the recipients in the other three images. Perhaps if we are more explicit about the participants we’ll get better results.
a glowing gift being handed through a curved energy portal between a robot and a human on the beach by the sea
What happened to the portals? I think there’s one on the ground in image 3. I will admit images 3 and 4 are a lot closer to what I’m thinking but the context is still a bit off. Let’s rephrase a bit to make sure the action of handing the gift through the portal is captured.
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea
Image 2 has much more of the essence of what I’m trying to capture. The other three images are still missing the requisite portals. With the exception of image 3, it also looks like the humans have disappeared from these images.
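If you want to keep track of this kind of component-by-component build-up, a small helper can make each change explicit. Here is a minimal sketch in Python. The function `build_prompt` and its parameter names are my own hypothetical choices for illustration, not part of DALLE-2, and the code only assembles text; it does not call any image service.

```python
def build_prompt(gift="a gift", portal="a portal", participants="", setting=""):
    """Assemble a prompt from optional components (hypothetical helper)."""
    parts = [f"{gift} being handed through {portal}", participants, setting]
    # Skip any component that was left empty.
    return " ".join(part for part in parts if part)


SETTING = "on the beach by the sea"

# Each call changes one or two components, mirroring the attempts so far.
attempts = [
    build_prompt(),
    build_prompt(setting=SETTING),
    build_prompt(portal="an energy portal", setting=SETTING),
    build_prompt(gift="a glowing gift", portal="a curved energy portal", setting=SETTING),
    build_prompt("a glowing gift", "a curved energy portal",
                 "between a robot and a human", SETTING),
]

for number, prompt in enumerate(attempts, start=1):
    print(f"{number}. {prompt}")
```

Changing one component per attempt makes it easier to tell which word caused which change in the images.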
Ok. We’ve been pretty descriptive and still aren’t getting what we need. Let’s try a different tack and see whether adding a style improves things. We’ll quickly go through a few different styles and then talk about them.
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea, surrealism
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea, photorealistic
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea, baroque
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea, digital art
Different styles, similar concepts. Surrealism doesn’t really capture what’s in my mind. Photorealistic seems off and sometimes too CGI-like for me. The baroque style almost seems like a dream to me. I like the digital art the best. Image 3 in digital art is getting closer!
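Since the four style prompts differ only in their last word or two, they can be generated from one base prompt. The snippet below is a rough Python sketch of that idea. The names `BASE_PROMPT`, `STYLES`, and `with_style` are assumptions I made up for illustration, and the code only builds the prompt text for you to paste into the tool.

```python
BASE_PROMPT = (
    "a robot handing a glowing gift through a curved energy portal "
    "to a human on the beach by the sea"
)

# The four styles tried above.
STYLES = ["surrealism", "photorealistic", "baroque", "digital art"]


def with_style(base, style):
    """Append a style keyword to the base prompt, separated by a comma."""
    return f"{base}, {style}"


style_prompts = [with_style(BASE_PROMPT, style) for style in STYLES]

for prompt in style_prompts:
    print(prompt)
```

Keeping the base prompt fixed while swapping only the style makes the comparison between outputs a fair one.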
So let's stick with the digital art style. Now we’ll add a little bit of strange magic to see what happens. Adding symbols can sometimes get better results. I think this plays with the random noise used by the system, but I’m not quite sure. We’ll add `(((...)))` to the prompt.
a robot appearing through a curved energy portal handing a glowing gift to a human on the beach by the sea, digital art (((...)))
That trick didn’t seem to work and created more of a view from caves. Perhaps if we try being overly specific with explicit directions that will help.
a curved, glowing energy portal opening on the beach by the sea where a robot hands a human a gift through the portal, the human is on one side of the portal and the robot is on the other, slight aerial view, the gift is in the middle of the portal, digital art
Well some of that explicit direction definitely helped. The gift is in the middle, the view changed for images 1 and 3. However, these images almost seem like they are part of a video game and aren’t really conveying the right concept for me. Maybe we aren’t giving the right amount of direction.
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea, digital art, partial sideways view, focus equally on the gift, portal, robot, and human
Nope. Those get further away. Maybe we have to change how it focuses.
a robot handing a glowing gift through a curved energy portal to a human on the beach by the sea, digital art, partial sideways view, focus more on the gift and portal and less on the robot and human
Well, DALLE-2 is definitely not following directions now. Where did the portal go?! What should we do? Let’s tweak a few adjectives to see if we can get a different response. I saw something interesting in one of the past prompts that I can’t explain; I think the AI is on to something. We’ll reuse an earlier prompt, tweaked to add blue.
a robot appearing through a curved blue energy portal handing a glowing gift to a human on the beach by the sea, digital art (((...)))
Wow, that’s a bit different from the first attempt. Adding blue definitely seems to have controlled the color, and it also improved the concept a bit. Image 2 is really close except it looks like robot to robot. Let’s be more descriptive about the human and change it from human to traveler.
a robot appearing through a curved blue energy portal handing a glowing gift to a traveler on the beach by the sea, digital art (((...)))
Bingo! Image 1 is close enough to what was in my head. I’d personally like the portal to be more connected to the ground which could be solved with inpainting or the DALLE-2 editor. Let’s go with that.
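The last few prompts differ by a single word each, which is easy to lose track of when the prompts get long. As a rough sketch, Python's standard `difflib` module can show exactly which words changed between two attempts. The helper name `word_changes` is hypothetical, something I made up for this illustration.

```python
import difflib


def word_changes(old, new):
    """Return (operation, old_words, new_words) for each word-level difference."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


first_try = ("a robot appearing through a curved energy portal handing a glowing gift "
             "to a human on the beach by the sea, digital art (((...)))")
with_blue = ("a robot appearing through a curved blue energy portal handing a glowing gift "
             "to a human on the beach by the sea, digital art (((...)))")
with_traveler = ("a robot appearing through a curved blue energy portal handing a glowing gift "
                 "to a traveler on the beach by the sea, digital art (((...)))")

print(word_changes(first_try, with_blue))
print(word_changes(with_blue, with_traveler))
```

Seeing the change in isolation is a good reminder of how much a single word like blue or traveler can shift the output.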
What can we learn from this process?
Communicating an idea in your head is difficult and takes more than a few words.
What might seem like an obvious prompt to you is not necessarily obvious to the system.
Like any creative process, using generative AI is iterative. As any designer or artist worth their salt will tell you, you are never going to get it right on the first attempt.
You need to understand what the system picks up on in order to create quality prompts.
Adding adjectives, being explicit, and switching words around can help up to a point. However, sometimes it is better to rephrase things.
Different styles provide a lot of control over the output.
It is easier to build toward a specific end idea than toward a vague notion of what you want.
Prompt engineering is a skill that anyone using AI tools will need to pick up.