
AI Costs Vs Performance: Strategies For Running LLMs In Finance


Pavan Emani is the SVP, Engineering Leader for Gen AI and Machine Learning at Truist Bank.


The rise of Large Language Models (LLMs) in financial services has unlocked new possibilities, from real-time credit scoring and automated compliance reporting to fraud detection and risk analysis. However, financial institutions are grappling with a major challenge: The cost of running these AI models at scale.

Unlike traditional machine learning models, LLMs require massive computational resources, leading to high infrastructure costs, latency issues and concerns over return on investment (ROI). As financial firms navigate this AI-powered transformation, they must strike a delicate balance between cost and performance to remain competitive.

This article explores key strategies financial institutions can adopt to optimize AI infrastructure and ensure efficient, cost-effective deployment of LLMs.

1. Optimize Compute With Model Size And Fine-Tuning Approaches

LLMs are expensive to run because of their size and compute intensity. Financial firms should carefully evaluate whether they need a foundational model (e.g., GPT-4, Claude or Gemini) or if a smaller, fine-tuned model can achieve the same results.

Key Optimization Strategies:

• Fine-tune Smaller Models Instead of Using Massive LLMs: Instead of deploying a 175B+ parameter model, organizations can fine-tune smaller models (7B–13B parameters) on financial data to achieve comparable performance with significantly lower costs.

• Use Retrieval-Augmented Generation (RAG): Instead of fine-tuning an LLM on financial documents, RAG allows models to retrieve relevant financial insights from vector databases (e.g., Pinecone, Weaviate or FAISS) without constant retraining.

• Leverage Quantization & Distillation: Techniques like LoRA (Low-Rank Adaptation) and model distillation help financial institutions reduce LLM compute costs by compressing models without sacrificing accuracy.
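To make the RAG idea above concrete, here is a minimal sketch of retrieval over a toy in-memory "vector database." The documents, embeddings and query vector are invented for illustration; in production, the index would live in a system like Pinecone, Weaviate or FAISS, and embeddings would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, top_k=2):
    """Return the top_k documents most similar to the query embedding."""
    ranked = sorted(index, key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# Toy "vector database" -- real deployments use Pinecone, Weaviate or FAISS.
index = [
    {"text": "Q3 loan default rates rose 2%.",    "vec": [0.9, 0.1, 0.0]},
    {"text": "New KYC policy effective March.",   "vec": [0.0, 0.2, 0.9]},
    {"text": "Credit card delinquencies stable.", "vec": [0.8, 0.3, 0.1]},
]

query = [0.85, 0.2, 0.05]  # hypothetical embedding of "What is the credit risk trend?"
context = retrieve(query, index)
prompt = "Answer using this context:\n" + "\n".join(context)
```

Because the relevant documents are retrieved at query time, the underlying model never needs retraining when new financial documents arrive; only the index is updated.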

The Rise Of Cost-Effective Models

A growing number of cost-effective, open-source LLMs like DeepSeek-R1 are significantly reducing AI deployment costs, making them a viable alternative for financial services firms looking to optimize their AI spending.

2. Cloud Vs. On-Prem: Choosing The Right AI Infrastructure

Financial institutions must strategically decide whether to run LLMs on-premises, in the cloud, or through a hybrid approach based on workload demands. Some cost-saving considerations to keep in mind are:

• Cloud Providers: AWS, Azure, GCP and Snowflake Cortex offer flexibility and scalability but can become prohibitively expensive for always-on workloads. Reserved instances and spot pricing can help optimize spending.

• On-Prem GPU Clusters: Hardware like NVIDIA DGX or AMD Instinct suits firms with predictable AI workloads that want tight control over data security, but it requires upfront capital investment.

• Hybrid Cloud: Combining on-prem infrastructure for steady, high-frequency workloads with cloud capacity for burst compute needs can deliver the best of both for cost efficiency.
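The cloud-versus-on-prem decision ultimately reduces to a break-even calculation. The sketch below uses entirely hypothetical prices; real GPU hardware, power and cloud rates vary widely and should be substituted before drawing conclusions.

```python
def breakeven_months(onprem_capex, onprem_monthly_opex, cloud_monthly_cost):
    """Months of steady usage after which on-prem becomes cheaper than cloud.

    Returns None if on-prem never catches up (its opex meets or exceeds
    the cloud bill, so the capital outlay is never recovered).
    """
    monthly_savings = cloud_monthly_cost - onprem_monthly_opex
    if monthly_savings <= 0:
        return None
    return onprem_capex / monthly_savings

# Illustrative numbers only -- not real pricing.
months = breakeven_months(
    onprem_capex=400_000,       # e.g., purchasing an 8-GPU cluster
    onprem_monthly_opex=8_000,  # power, cooling, staff
    cloud_monthly_cost=30_000,  # always-on reserved GPU instances
)
# months comes out to roughly 18, i.e., about a year and a half of constant load
```

If a workload is bursty rather than constant, the cloud column shrinks and the break-even point moves further out, which is exactly the case the hybrid approach is designed to exploit.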

3. Efficient API Calls & Smart Prompt Engineering

Running LLMs at scale means paying for each API call, whether it’s a credit risk assessment or a customer service response. Smart API usage and prompt engineering can drastically cut costs. There are a number of optimization strategies institutions may leverage to keep costs under control.

For example, instead of making real-time, one-off API requests, financial firms can batch process multiple queries at once. They can also use shorter prompts: every additional token increases inference cost, so trimming prompt length directly reduces the cost of each API call.
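The effect of prompt length on spend is simple arithmetic. The per-1K-token prices below are hypothetical placeholders, not any vendor's actual rates:

```python
def call_cost(prompt_tokens, completion_tokens, price_in=0.01, price_out=0.03):
    """Cost of one API call, with hypothetical per-1K-token prices."""
    return prompt_tokens / 1000 * price_in + completion_tokens / 1000 * price_out

# The same 10,000 queries with a verbose prompt vs. a trimmed one.
verbose = call_cost(1_200, 300) * 10_000  # boilerplate-heavy 1,200-token prompt
trimmed = call_cost(400, 300) * 10_000    # same question in 400 tokens
savings = verbose - trimmed               # the delta compounds with volume
```

At these illustrative rates the trimmed prompt saves $80 across 10,000 calls, and the saving scales linearly with query volume, which is why prompt budgets matter at production scale.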

Institutions can also utilize tokenization and caching mechanisms. By caching frequently used responses, institutions can avoid redundant queries to LLMs.
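A minimal caching sketch using Python's standard library is shown below; `fake_llm_call` is a hypothetical stand-in for a billable API request, and the counter simply demonstrates that repeated identical prompts hit the cache instead of the API.

```python
from functools import lru_cache

calls = 0  # counts how many billable "API requests" were actually made

def fake_llm_call(prompt):
    """Stand-in for a paid LLM API request."""
    global calls
    calls += 1
    return f"answer to: {prompt}"

@lru_cache(maxsize=1024)
def cached_llm_answer(prompt: str) -> str:
    """Memoize answers so repeated identical prompts skip the paid call."""
    return fake_llm_call(prompt)

for _ in range(5):
    cached_llm_answer("What is our wire transfer limit?")
# Only one billable call was made; the other four were served from the cache.
```

Note that `lru_cache` only helps for exact-match prompts; for near-duplicate questions, institutions typically layer a semantic cache (matching on embedding similarity) on top of this idea.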

4. Monitoring & Optimizing LLM Performance With Cost Analytics

Continuous LLM cost monitoring ensures financial firms don’t overspend on AI workloads. Institutions should implement a cost-tracking dashboard, using tools like AWS Cost Explorer, Azure Monitor or Grafana to monitor LLM usage and identify inefficiencies.

Institutions should also utilize adaptive scaling: dynamically adjusting LLM resources based on demand, such as shifting to smaller models during non-peak hours, cuts costs. Additionally, they should benchmark model performance against cost. Regularly testing whether a smaller, fine-tuned model performs as well as a large-scale LLM ensures optimal cost-performance trade-offs.
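The core of such a dashboard is attributing token spend to use cases. Here is a minimal in-process sketch; the use-case names and the per-1K-token price are invented, and a real deployment would feed these records into Grafana or a cloud cost tool rather than keep them in memory.

```python
from collections import defaultdict

class CostTracker:
    """Aggregate token usage and spend per use case (hypothetical pricing)."""

    def __init__(self, price_per_1k_tokens=0.02):
        self.price = price_per_1k_tokens
        self.tokens = defaultdict(int)

    def record(self, use_case, tokens):
        """Log tokens consumed by one LLM call, attributed to a use case."""
        self.tokens[use_case] += tokens

    def spend(self, use_case):
        """Dollar spend attributed to a use case so far."""
        return self.tokens[use_case] / 1000 * self.price

    def top_spenders(self):
        """Use cases ranked by spend -- candidates for a smaller model."""
        return sorted(self.tokens, key=self.spend, reverse=True)

tracker = CostTracker()
tracker.record("fraud_detection", 500_000)
tracker.record("customer_chat", 2_000_000)
tracker.record("fraud_detection", 250_000)
# customer_chat dominates spend, flagging it for a fine-tuned small model
```

Once spend is attributed per use case, the benchmarking step becomes targeted: the top spenders are tested first against smaller fine-tuned alternatives.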

5. Governance & Compliance: Ensuring AI Cost Efficiency Without Risk

Cost optimization should never come at the expense of compliance and security. Financial institutions must implement robust AI governance frameworks to prevent regulatory penalties and inefficiencies.

Best Practices for Cost-Efficient AI Governance:

• Data Access Control: Restricting AI access to only authorized personnel to prevent unnecessary usage.

• LLM Audit Logs: Tracking every AI decision to ensure accountability and explainability for regulators.

• Cost-Based Access Policies: Limiting high-cost LLM operations to critical use cases only to avoid wasteful AI spending.
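A cost-based access policy can be as simple as a gate in front of the model router. The model tier names and use-case labels below are hypothetical examples, not a reference implementation:

```python
# Hypothetical tiers and use-case labels for illustration.
HIGH_COST_MODELS = {"frontier-175b"}
CRITICAL_USE_CASES = {"fraud_investigation", "regulatory_filing"}

def authorize(model: str, use_case: str) -> bool:
    """Allow expensive model tiers only for designated critical use cases;
    cheaper models remain unrestricted."""
    if model in HIGH_COST_MODELS:
        return use_case in CRITICAL_USE_CASES
    return True

# A routing layer would call authorize() before dispatching each request,
# logging every decision for the audit trail regulators expect.
```

In practice this gate sits alongside the audit log: every allow/deny decision is recorded, so cost control and explainability come from the same checkpoint.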

Final Thoughts: The Future Of Cost-Efficient AI In Financial Services

The rapid adoption of LLMs in financial services has transformed everything from credit underwriting to risk management and fraud detection. However, runaway AI costs can eat into margins if not carefully optimized.

By implementing strategies such as fine-tuning smaller models, leveraging hybrid cloud infrastructure, optimizing API usage and real-time AI cost monitoring, financial institutions can maximize AI ROI without sacrificing performance.

In the coming years, cost-effective models like DeepSeek will redefine AI economics, helping financial institutions cut LLM costs by 50%-70% while maintaining accuracy. The winners in this space will be those who balance innovation with efficiency, ensuring AI delivers tangible business value while keeping infrastructure costs under control.

Are you prepared to optimize your AI costs while staying ahead in the financial services AI race?


Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.

