Photo:
‘The company’s endeavor aims to generate both public and private databases.’
OpenAI, the renowned artificial intelligence (AI) company, has announced a new initiative called OpenAI Data Partnerships. The program aims to collect large-scale datasets from third parties to train its AI models. OpenAI is specifically looking for data that is not easily accessible online to the public. The company is interested in various formats, including text, images, audio, and video.
OpenAI is particularly interested in data on any topic and in any language, as long as it expresses human intention. This human-centric data will help improve OpenAI’s tools, such as its speech recognition technology, which is used for transcribing spoken words. The initiative is also aligned with ChatGPT’s recent expansion to support voice queries, enabling more conversational interactions with users. By exposing its AI models to a wide range of information, OpenAI aims to enhance its ability to hold human-like conversations and improve other related features.
OpenAI has already started collaborating with interested organizations, including the Icelandic government, to improve GPT-4’s ability to comprehend queries made in the Icelandic language. To participate in the program, organizations can submit a form on OpenAI’s website, providing details about the type and size of the data they intend to share. OpenAI offers two pathways for datasets: the Open-Source archive, which makes the data public for anyone to use, and the private dataset pathway, which allows companies to train proprietary AI models while keeping their data confidential.
Privacy is a significant concern, especially given ChatGPT’s massive user base of approximately 100 million weekly active users worldwide. OpenAI clarifies that it does not use data generated by its API to train its models unless users explicitly submit information through an opt-in form. However, as OpenAI collects data through its Data Partnerships program, the company’s handling of private datasets will be closely scrutinized.