Tuesday 14 November 2023

Worrying times for AI ahead? Major tech companies are running out of data to train LLMs

In the rapidly evolving AI economy, data is the linchpin of progress. It is not merely one component among many; it is the lifeblood of AI models, shaping both how they function and how well they perform.

The correlation is clear: the more abundant and diverse the human-generated data an AI system is exposed to, the more adept it becomes.

However, AI companies face an uncomfortable constraint: natural data is finite. In a warning that has circulated among AI researchers for nearly a year, experts caution that the supply of natural data needed to train AI systems is running dry.

Rita Matulionyte, a professor of information technology law at Macquarie University in Australia, emphasizes this concern in an essay for The Conversation.

A study by the AI forecasting organisation Epoch AI adds a tangible timeline to the foreboding scenario. The study estimates that AI companies could confront a shortage of high-quality textual training data as early as 2026, with low-quality text and image data potentially depleting between 2030 and 2060.

This data scarcity poses a substantial threat to AI firms that rely on a continuous influx of data to improve their models. Progress in AI has so far tracked the growing volume of data fed into those models. If that supply stalls, the consequences could ripple across the industry.

Matulionyte suggests a potential remedy in the form of synthetic data, generated by AI models. However, the viability of this solution is contested, with research indicating a risk of an “inbreeding effect” that distorts the model when trained on AI-generated content. Despite these challenges, some companies are already exploring synthetic training sets.

A pragmatic alternative emerges in the concept of data partnerships. In essence, companies or institutions possessing vast repositories of high-quality data could enter into agreements with AI companies to share this data, often in exchange for financial compensation.

OpenAI, a prominent Silicon Valley AI firm, recently launched a Data Partnership initiative. In a blog post, the company underscores the significance of such collaborations in steering the future of AI and creating models that are more relevant to diverse organizations.

As the race for data intensifies, the practicality of data partnerships becomes a focal point. Many AI datasets are currently built from data scraped from the internet and created by online users, so formal data partnerships are a plausible alternative source. Yet as data grows more valuable, competition for datasets is poised to intensify, raising questions about whether institutions and individuals will be willing to share their data with AI companies.

Even with data partnerships, there remains a lingering uncertainty about the sustainability of the data supply. Despite the seemingly boundless expanse of the internet, the impending challenge of dwindling data reserves forces a reassessment of assumptions about the endless nature of this critical resource.

(With input from agencies)



from Firstpost Tech Latest News https://ift.tt/7x5oklJ
