The Price of Knowledge: AI’s Role in Advancing Open Access to Information

Discover why a balanced approach to commercial interests is necessary while promoting open access for societal benefit.

April 5, 2024


Julius Černiauskas, CEO of Oxylabs, explores the ethical implications of AI’s access to knowledge and copyright protection and the need for a fair balance between commercial interests and open access.

The recent development of generative AI (Gen AI) tools has added new layers to the old problem of access to knowledge. On the one hand, information is a fundamental right, and keeping it locked behind paywalls is detrimental to individual and collective human advancement. On the other hand, creating and publishing knowledge costs money, and those who do that work deserve compensation.

In current commercial models, however, compensation does not always flow to the right people. This has given new urgency to the push for a fair open-access model for information.

AI Between Knowledge and Profit

Last year, we saw a wave of lawsuits against major developers of AI-based tools such as Google, Meta, and OpenAI, as well as OpenAI’s investor, Microsoft. Some of these lawsuits concern the unlicensed use of intellectual property to train AI models.

American comedian Sarah Silverman, whose memoir was allegedly used to train AI models, is among the authors suing AI companies. According to AP News, her claim that the book was used “without consent, without credit, and without compensation” summarizes the main pain points for authors whose work is used to create another, ultimately commercial, product such as ChatGPT.

Some argue that such practices might be permitted as fair use of copyrighted material, depending, among other things, on whether they sufficiently transform the original work and on how they affect its market value. If these practices are prohibited, AI developers might struggle to find quality data to continue their work.

The Risks of the AI Knowledge Bubble

While AI and data companies wait for the courts to settle these copyright questions, the practical consequences of withholding data from further AI model training should be addressed. AI tools are becoming an increasingly important source of information.

Yet “hallucinate,” in the sense of an AI answering with false information, was the Cambridge Dictionary’s word of the year last year. Such hallucinations will, unfortunately, increase if data is withheld from AI models.

The lack of diverse, up-to-date, human-created information is one of the reasons AI hallucinates. Without access to multifaceted and heterogeneous data sets, developers must use data synthesized by other large language models (LLMs) to train their own. This creates an AI echo chamber, which some researchers call model collapse. Models keep receiving data describing the same probable reality, overestimating the probability of likely outcomes and underestimating improbable ones. Gradually, the model forgets outliers entirely, shrinking its understanding of what is possible.
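The dynamics behind model collapse can be sketched in a few lines of Python: a simple Gaussian stands in for the "model," and each generation is fitted only to samples drawn from the previous generation's fit. The sample size and generation count below are illustrative assumptions, not figures from the article; the point is that repeated resampling tends to discard tail values, so the distribution's spread shrinks over generations.

```python
# A minimal sketch of "model collapse" (assumed toy setup, not a real LLM):
# each generation fits a Gaussian to the previous generation's synthetic
# output, so rare (tail) values gradually vanish and the fit narrows.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 20        # a small sample per generation exaggerates the effect
n_generations = 500

# Generation 0: "human-created" data from the true distribution N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)
initial_std = data.std(ddof=1)

for _ in range(n_generations):
    # Fit a Gaussian to the current data, then resample from the fit:
    # the next generation trains only on the previous model's output.
    mu, sigma = data.mean(), data.std(ddof=1)
    data = rng.normal(loc=mu, scale=sigma, size=n_samples)

final_std = data.std(ddof=1)
print(f"std across generations: {initial_std:.3f} -> {final_std:.3f}")
```

The spread of the final generation is far smaller than that of the original data: the model has "forgotten" the outliers it never saw resampled, mirroring the echo-chamber effect described above.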

The more AI models are built this way, the higher the risks involved. Namely, poorly trained tools will produce low-quality answers, spread misinformation, reaffirm stereotypes, and make parts of human knowledge virtually obsolete. Therefore, it is necessary to discuss ways copyright holders could exercise their rights fairly and without halting further innovation in the AI industry.

Paid Flow of Information

A fair mechanism for compensation that allows copyright holders to profit from commercial AI development would be a perfect solution. Alternatively, AI developers could only use such material for non-commercial purposes to advance knowledge and provide public access to benefit society. Unfortunately, accessing the most important knowledge scientists produce to benefit humankind is difficult.

Digital distribution should have reduced publishing costs and, with them, the cost of accessing knowledge. Yet commercial scholarly publishing powerhouses charge high subscription fees and enjoy profit margins of up to 40%, according to June 2023 Academic Publishers Statistics, while remaining secretive about their internal costs.

These costs do not include payments to authors. Researchers are paid to do research but not to publish it. They publish for academic credit, which advances their careers and makes their work known and useful to society.

They must pay article processing charges (APCs) to be published in the most prestigious journals with the highest impact factors. According to Oxford University’s Publishing for Open Access report, the cost to publish in such journals averages between £2,000 and £3,500 globally but can reach £10,000.

Authors can also publish in journals that do not charge APCs. However, these publications are considered less authoritative and thus have a smaller impact on their fields. As a result, even good research is often overlooked if scientists and their institutions cannot afford to publish it in major journals. Meanwhile, wider audiences who cannot afford publishers’ steep subscription fees have no access to the knowledge locked behind paywalls.

Keeping scholarly knowledge from public access also hinders AI development if developers do not consider low-impact journals authoritative enough and are unwilling to pay the high prices of prestigious ones.

In this rather dystopian scenario, both artificial and human intelligence would be left to develop within the limits of their own echo chambers as costs and paywalls get in the way.

The Goal of Open Access

The issues with the current systems of knowledge-sharing drive calls for a more urgent move to an open-access model. For scientific research, this would mean only publishing in open-access journals. These journals transparently report their article processing costs, which are in the lower hundreds and could be covered by government-funded scientific institutions. The published articles can then be freely accessed by everyone worldwide, including AI developers.

Open access, especially for scholarly sources, would improve the quality of the outputs LLMs can provide. AI tools better at digesting and distributing information would, in turn, make this information more accessible and useful to wider audiences. 

However, to promote (and support) this shift to open access, AI companies must avoid the pitfalls of the current knowledge industries. Unfortunately, it is hard to tell how everything will look in practice before we know what boundaries case law will put in place.

Ideally, the new system would compensate copyright owners whose work was used for algorithm training. If monetary compensation is not possible, they could benefit in other ways, much as researchers benefit from exposure and reputation-building when publishing in top-tier journals. Additionally, it would be fair to let authors opt out of having their work used this way.

To be recognized as reputable commercial disseminators of knowledge, AI firms should steer away from the path of the current academic publishing powerhouses. That means being transparent about their operations, data acquisition methods, and costs. Distributing the value extracted from AI models more fairly, rather than maximizing profits, would help build trust in them.

Finally, whether AI developers can promote universally beneficial open access to knowledge depends on how open AI remains. The more capabilities of AI models remain free, the better their case for being given access to various data sources.


Bridging Knowledge Gaps and Building Trust

The Internet was developed to help researchers share ideas and information easily. Later, it became a tool for disseminating information to everyone worldwide, built on the firm belief that open access to information drives humanity forward.

AI tools can advance open access by making knowledge easily findable and digestible. Establishing a fair system where the free flow of information is balanced with commercial interests should start by building trust between companies, authors, and consumers.

How is your workplace navigating the conversations surrounding copyright protection and the ethical training of AI? Let us know on Facebook, X, and LinkedIn. We’d love to hear from you!



Julius Černiauskas
Julius Černiauskas is Lithuania’s technology industry leader & the CEO of Oxylabs, a top global provider of premium proxies and web scraping solutions, employing over 400 specialists. Since joining the company in 2015, he successfully transformed the basic business idea of Oxylabs into the tech giant that it is today by employing his profound knowledge of big data and information technology trends. He implemented a brand-new company structure which led to the development of the market's most sophisticated public web data gathering service.