My name is Chu Lahlou. I am the Director of Generative AI Engineering and Solutions at Booz Allen Hamilton. Generative applications are systems that are powered by generative AI technology and large language models. It's important because it's simply one of the most talked about technologies today. But I think I want to differentiate a little bit from what consumers and end users are used to. Like applications on their phones and desktop like ChatGPT and Claude, because those are general purpose assistant tools. We're talking about systems that are fully integrated with enterprise or agency systems. People are used to these commercial products and think it's a plug-and-play. I often heard people say, “Build a ChatGPT for my enterprise.” However, in order to do so, you really need to consider the complexity of the technologies you're introducing to your enterprise technology systems. How do we scale the application? Connect with your data usage, make sure it's grounded on your enterprise data and fully embedded in your existing processes. I think the main purpose of the paper is to demystify the complexity of engineering a GenAI application. The simplicity, the appearance of these applications makes people think you can plug-and-play and just purchase the technology and put it into your enterprise and have it be fully scaled. But there are actually multiple technical layers. And how you choose in these abundant availability of technology is very critical and is challenging. So you want to lay out a framework for people to follow, to be able to make the right decisions. Deployment of GenAI applications really depends on, again, the use case and where you are deploying your application. For one side, a lot of agencies would look at enterprise usage — support a large workforce, improving their efficiency. So scalability, maybe latency, and throughput is critical. So cloud naturally makes sense. But then for other mission-critical scenarios - battlespace, need to be in an air gap environment, or disconnect environment - edge scenarios or on-prem deployment will be important. But beyond the mission itself, cost and again, latency, are other things to consider. Cost in the cloud you pay as you go, using APIs, versus on-prem, you may require more upfront investment in training, tuning, and deploying the model. But over time, you may be able to get more economic benefits from using those models. Selecting an LLM is more intricate than people would have thought. People tend to go to just the most powerful models. But in fact, it's important to choose the right model for the right task. It could be good for summarization, reasoning, mathematics. And oftentimes you don't have to over-engineer it if your problem is simple enough. You can significantly reduce the cost by using a smaller model. So there are many things that we should consider as part of the tech stack. Generative AI Tech Stack is really just a reference architecture for development teams and engineering and business stakeholders to introduce and use as a guidance when they're building their GenAI system. They're very important because we need a holistic view of an application. How do you choose your data? How do you prepare your data? How do you choose your model? Which platform do you use? There's a lot of trade-off that decision makers have to make based on the cost, latency, and other functional or nonfunctional engineering requirements. LLM Operations is really an extension of the Machine Learning Operations or MLOps. It really focuses more on the unique features of generative AI applications, including monitoring the healthiness of prompts, tool usage for agent, and also the orchestration framework. Where the checkpoints should be introduced. How can we have the right monitoring in place so that these models are not hallucinating or be misused and they're being current with the information provided. Federal agencies can manage the risk with generative AI systems by implementing continuous monitoring across prompt, outputs, and data access through LLM Ops. They can also established GRC controls like audits, human-in-the-loop checkpoints, and red teaming for pre- and post-deployment. Finally, they can have a sandbox environment to enable fast experimentation and risk checks. So they get the response quickly and iterate as needed. I think there will be a lot of breakthrough in the technology area, and that's very exciting. But one critical element that needs to happen for us to fully scale out GenAI applications across federal and private sectors, I think is a standardized compliance framework. The technology is evolving at a much faster speed than our traditional IT pipeline, so we need to catch up on a compliance and security-check perspective to make sure we're ready to get these things into everyday use.