Etienne Oosthuysen

2024 Generative AI Trends - a year of introspection




In 2024, the landscape of Generative AI is set to undergo significant transformations. Here are what I believe to be three important trends. Firstly, innovation around Graphics Processing Units (GPUs) will play a pivotal role in enhancing the performance of Large Language Models (LLMs) while reducing the associated costs and energy consumption. Secondly, the financial implications of LLMs will compel companies to recalibrate their budgets and adapt to the fluctuating costs of these technologies. Thirdly, the focus will shift towards self-regulation and establishing trust in AI systems, as industries and jurisdictions grapple with the rapid pace of AI advancements and the lag in regulatory frameworks.


You are going to hear a lot more about GPUs


GPUs are the technology that underpins LLMs like GPT-3 and GPT-4. Designed for parallel processing, they were best known for their ability to accelerate the rendering of 3D graphics in gaming, but they have since become far more flexible and programmable. Their extraordinary computational capability delivers massive acceleration in AI workloads that take advantage of their highly parallel nature. This is fundamental for training and running large-scale AI models such as LLMs, which can simply process faster and more effectively as a result.


Herein lies the first problem: we all know that Generative AI is booming, but the costs can be extraordinary. According to some estimates, each training run of OpenAI's GPT-3 language model required at least $5 million worth of GPUs (thousands running in parallel), and these models require many runs. To put this into perspective, when Sam Altman was asked whether the cost of training foundation models was in the order of $100 million, he said that it was “more than that” and is getting more expensive. Newer models are trained on more data, and therefore require more computational power and longer training times. And currently, models and their size are scaling faster than GPU capabilities can keep up. So, this means (A LOT) more GPUs and therefore (A LOT) more cost.
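
The arithmetic behind such estimates is simple to sketch. In the minimal example below, the GPU count, run duration and hourly rate are all hypothetical assumptions, not actual vendor pricing:

```python
# Back-of-envelope training-run cost estimate.
# Every figure here is an illustrative assumption, not vendor pricing.

gpu_count = 10_000          # GPUs running in parallel (assumed)
training_days = 30          # wall-clock duration of one run (assumed)
price_per_gpu_hour = 2.50   # USD per GPU-hour, hypothetical cloud rate

gpu_hours = gpu_count * training_days * 24
run_cost = gpu_hours * price_per_gpu_hour

print(f"GPU-hours per run:     {gpu_hours:,}")     # 7,200,000
print(f"Cost per training run: ${run_cost:,.0f}")  # $18,000,000
```

Even with fairly conservative assumptions, a single run lands in the tens of millions of dollars, and, as noted above, models require many runs.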


The second problem: once trained, many thousands more GPUs are required to handle hundreds of millions of daily user requests. The number of users is also growing exponentially, fast, and, therefore, so are the costs.
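
To illustrate how significant the costs associated with LLM requests can become, here is another back-of-envelope sketch; the request volume, token counts and per-token rate are all assumed purely for illustration:

```python
# Rough serving-cost sketch. All numbers are hypothetical assumptions,
# used only to show how per-request costs compound at scale.

daily_requests = 100_000_000    # user requests per day (assumed)
tokens_per_request = 1_000      # prompt + completion tokens (assumed)
price_per_1k_tokens = 0.002     # USD per 1,000 tokens, hypothetical rate

daily_cost = daily_requests * tokens_per_request / 1_000 * price_per_1k_tokens

print(f"Daily serving cost:  ${daily_cost:,.0f}")        # $200,000
print(f"Annual serving cost: ${daily_cost * 365:,.0f}")  # $73,000,000
```

Tiny per-request amounts compound into very large annual bills, and they scale linearly with the user base.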



The third problem: all these GPUs obviously use an extremely high amount of energy, which exacerbates both the significant cost and the environmental challenges ☹.


All of this means that additional innovation in GPU performance is expected, driven by the huge increase in training and usage workloads expected in 2024 and a demand for lower energy consumption, ultimately to drive the cost down for us as end users. For example, in the first half of December 2023, Nvidia, AMD and Intel all announced the launch of next-generation AI solutions. Nvidia claims that H200 GPUs will deliver 4.2x faster pre-training and supervised fine-tuning performance compared to A100s. AMD claims that its latest MI300X will deliver 40% more compute units, 1.5x more memory capacity and 1.7x more peak memory bandwidth compared to its predecessor, the MI250X. And Intel claims its 5th Gen Xeon AI-accelerated CPU will deliver up to 10x better training performance compared to the previous generation of Xeon CPUs.
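
Whether such generational leaps actually lower end-user cost depends on how the price per GPU-hour moves relative to throughput. In the sketch below, the 4.2x speedup is Nvidia's claim, but the price ratio is purely an assumption:

```python
# A faster GPU lowers cost only if its price rises more slowly than
# its throughput. Hypothetical illustration of that trade-off:

speedup = 4.2        # claimed training speedup of the new generation
price_ratio = 2.0    # assumed: new GPU costs 2x more per hour

cost_ratio = price_ratio / speedup   # new run cost as a fraction of the old
print(f"New run cost vs old: {cost_ratio:.0%}")  # ~48%: cheaper despite pricier GPUs
```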


Here, as an example, are the costs for the H200 (reserved). Multiply this by the many hundreds or thousands of GPUs required, then add more for the full configuration. Eyewatering amounts.



Volatile cost fluctuations and the impact of supply and demand


The cost of and increasing demand for GPUs, which are crucial for AI (discussed in the previous section), together with their fluctuating availability, will impact the cost of LLMs significantly. Some companies already have to contend with cost volatility, for example utility providers that must balance costs against the energy market, but most companies have not had to, and as they start to adopt LLMs into their operations they will have to build such practices into their cost management. They may even have to build in the ability to switch between models and alternate between cloud providers, manage their own GPU clusters, include scale management, and understand peak versus off-peak cycles to minimise their cost, as sketched below.
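
What might such cost-aware switching look like in practice? A minimal sketch follows; the provider names, per-token rates and peak window are all hypothetical placeholders, not real quotes:

```python
# Minimal cost-aware model router. Provider names and per-token rates
# are placeholder assumptions, not actual pricing.

from datetime import datetime

# Hypothetical price book: USD per 1,000 tokens, peak vs off-peak.
PRICE_BOOK = {
    "provider_a/large-model":  {"peak": 0.030, "off_peak": 0.018},
    "provider_b/large-model":  {"peak": 0.025, "off_peak": 0.022},
    "self_hosted/gpu-cluster": {"peak": 0.020, "off_peak": 0.020},
}

def is_peak(now: datetime) -> bool:
    """Treat 08:00-20:00 local time as peak (an assumed demand cycle)."""
    return 8 <= now.hour < 20

def cheapest_route(now: datetime) -> tuple[str, float]:
    """Pick the lowest-cost model/provider for the current window."""
    window = "peak" if is_peak(now) else "off_peak"
    return min(
        ((name, rates[window]) for name, rates in PRICE_BOOK.items()),
        key=lambda pair: pair[1],
    )

route, rate = cheapest_route(datetime.now())
print(f"Route requests to {route} at ${rate}/1K tokens")
```

In a real deployment the price book would be fed by live pricing, and routing would also weigh latency, quality and data residency, but the budgeting principle is the same.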


So, just as hardware and cloud providers have to look at the cost (and energy consumption) associated with LLMs, notably through the lens of GPUs, customers will have to devise strategies to account for the cost volatility associated with the use of LLMs. They will have to become a lot more adaptable and include AI within their budgets, not just roll it into generic IT spend.


Tightening up of regimes associated with the trustworthiness of AI


As mentioned in previous articles, definitive regulations and frameworks to govern AI are lacking in most jurisdictions, and what exists is fragmented and reactive given the sheer speed at which AI is evolving. Industries are trying, and struggling, to keep up.

A study by Deloitte in August 2023 showed that consumer trust and perceptions of reliability nosedive when consumers know that AI is involved in a brand they are dealing with.


Trustworthiness is not new; data scientists have been grappling with issues such as bias and transparency for a long time. But it is now even more important as AI moves rapidly into the mainstream through LLMs that operate as “black boxes”.


So, in 2024, we will see a concerted effort by companies to create strategies, frameworks and governance in a proactive, self-regulatory manner, whilst broader industry, state and national policies try to catch up. Part of these strategies and frameworks must cover trustworthiness: the reliability, safety, privacy and fairness of AI systems, which help build perceptions of AI as dependable and ethical, especially while these systems are considered black boxes. By focussing on these qualities, the paradigm shifts from “black box” to “glass box”.


In the most severe cases, certain proposed regulations, for example those put forward by the EU, are so stringent that numerous AI software tools of the current generation might be effectively barred from that market. Consequently, to navigate these tight regulatory constraints, companies may opt to develop their own models and utilize generative AI cloud services for training and running them, which could significantly boost revenue for cloud providers but create the cost volatility I spoke of previously.


Human & AI collaboration


Costs and ethics aside, we will also see increased collaboration using AI. Casual collaboration tools that are already widely available, such as ChatGPT and Bard, are extensively used in the workforce, and this will only accelerate as companies make professional tools such as Copilot and/or other AI companions available to their workforces. Expert tools are still at various stages of maturity (for example Copilot in the MSFT stack – Fabric Data Engineering, Fabric Data Science, Fabric Semantic Model, Fabric Power BI Reports, Power Automate, Power Apps, GitHub; Google Duet; AWS CodeWhisperer), but these could prove to have the most significant impact on the workforce due to the way in which they could support productivity.



These realities will, again, place the emphasis on internal company frameworks that describe the responsible and safe use of AI, and the ethics that apply where personas, technologies (casual, professional and expert) and use case types intersect. For example: what responsible usage guidelines apply when staff use ChatGPT; who owns code when a data engineer uses recommendations from GitHub Copilot; how data is secured when generative AI is used to analyse it; how bias is avoided within data science models; how trustworthy AI-enabled decision-making engines are; and many more. This will also require answering questions about human accountability for actions and design, and about how key individuals will be empowered to govern related decisions.





Conclusion


The transformation of Generative AI will accelerate in 2024. While much was discussed about AI's potential in 2023, the coming year is expected to deepen our understanding of its challenges, including cost and environmental concerns driven by performance demands. Additionally, the financial impact on companies, the need for adaptability, and an increased focus on AI's trustworthiness and self-regulation will be key. Moreover, the growing collaboration between humans and AI will bring new dynamics. Indeed, 2024 might well be a year marked by introspection and strategic realignment in the AI sector.

