November 18, 2024 by Dana McKay and George Buchanan, The Conversation
Collected at: https://techxplore.com/news/2024-11-ai-megaplatforms-hyperlinks-built-web.html
The original idea for the world wide web emerged in a flurry of scientific thought around the end of World War II. It began with a hypothetical machine called the “memex,” proposed by US Office of Scientific Research and Development head Vannevar Bush in an article titled “As We May Think,” published in The Atlantic in 1945.
The memex would help us access all knowledge, instantaneously and from our desks. It had a searchable index, and documents were linked together by the “trails” made by users when they associated one document with another. Bush imagined the memex using microfiche and photography, but conceptually it was almost the modern internet.
The true value in this early idea was the links: if you wanted to explore more, there was an easy, built-in way to do that. Anyone who has spent hours following random links on Wikipedia and learning about things they never knew interested them will recognize this value. (There is of course a Wikipedia page about this phenomenon.)
Links have made the web what it is. But as social media platforms, generative AI tools and even search engines are trying harder to keep users on their site or app, the humble link is starting to look like an endangered species.
The laws of links
Modern search engines were developed in the shadow of the memex, but at first they faced unexpected legal issues. In the early days of the internet, it was not clear whether “crawling” web pages to ingest them into a search engine index was a violation of copyright.
It was also not clear whether, in linking to information that might help someone build a bomb, defraud someone, or carry out some other nefarious activity, search engines or website hosts were “publishers.” Being publishers would make them legally liable for content they hosted or linked to.
The issue of web crawling has been dealt with by a combination of fair use, country-specific exemptions for crawling, and the “safe harbor” provisions of the US Digital Millennium Copyright Act. These permit web crawling as long as search engines link to the original work, do not alter it, use it only for a relatively short term, and don’t profit from the original content.
The issue of problematic content was addressed (at least in the very influential US jurisdiction) via legislation called Section 230. This offers immunity to “providers or users of interactive computer services” who deliver information “provided by another content provider.”
Without this law, the internet as we know it couldn’t exist, because it is impossible to manually check every page linked to or every social media post for illegal content.
This doesn’t mean the internet is a complete Wild West, though. Section 230 has been successfully challenged on the basis of illegal discrimination, when a mandatory questionnaire about housing asked for race. More recently, a case brought against TikTok has suggested platforms are not immune when their algorithms recommend specific videos.
The web’s social contract is failing
All of the laws that have created the internet, though, have relied on links. The social contract is that a search engine can scrape your site, or a social media company can host your words or pictures, as long as they give you, the person who created them, credit (or discredit if you’re giving bad advice). The link isn’t just the thing you follow down a Wikipedia rabbit hole; it’s a way of giving credit and allowing content creators to profit from their content.
Large platforms, including Google, Microsoft and OpenAI, have used these laws, and the social contract they imply, to keep ingesting content at industrial scale.
The provision of links, eyeballs and credit, though, is falling as AI does not link to its sources. To take one example, news snippets provided in search engines and social media have displaced the original articles so much that tech platforms now have to pay for these snippets in Australia and Canada.
Large tech companies value keeping people on their sites as clicks can be monetized by selling personalized ads.
Another problem with AI is that it typically relearns infrequently and holds onto dated content. While the latest AI-powered search tools claim to do better on this front, it is unclear how good they are.
And, as with news snippets, large corporates are reluctant to give credit and views to others. There are good people-centered reasons for social media companies and search engines to want to save you from having to leave. A key value of ChatGPT is providing information in a single, condensed form so you never have to click a link, even if one is available.
Copyright and creativity
Is the sidelining of links a good thing, though? Many experts argue not.
Using content without credit is arguably copyright infringement. Replacing artists and writers with AI reduces creativity in society.
Summarizing information, without linking out to original sources, reduces people’s ability to fact check, is prone to bias, and may reduce the learning, thought and creativity supported by browsing many documents. After all, Wikipedia would be no fun without the rabbit hole, and the internet without links is just an online book written by a robot.
AI backlash looms
So what does the future hold? Ironically, the same AI systems that have made the link problem worse have also increased the likelihood that things will change.
The copyright exemptions that apply to crawling and linking are being challenged by creatives whose work has been incorporated into AI models. Proposed changes to Section 230 may mean that it is safer for digital platforms to link to material than to replicate it.
We have power for change, too: where links exist, click on them. You never know where following a trail might take you.