October 25, 2024
In 2022, Software Freedom Conservancy (SFC) convened a committee in the wake
of Microsoft’s GitHub Copilot announcement, to meet and begin considering the
complex questions that arise from the use of large language models (LLMs) in
generative AI systems that seek to assist software developers. Everyone on our
committee has watched as interest in this issue has grown in the FOSS
community.
While the Committee was initially convened to consider how copyleft related
to these systems, our focus changed as we considered the complex issues. With
the unending influx of models, products, and projects in this area, we began
to see a potential dystopia: no systems available today are reproducible by
the public, and all of them seem to disrespect user rights and freedoms in
some manner. Rather than despair, we turned our minds to what FOSS does
best: imagining the ideal if corporate interests were not the primary force
defining society’s relationship with software.
In the past, the FOSS community has responded to new challenges with a
race-to-the-bottom document that defines the bare minimum of user rights and
freedoms that the community of activists will accept. For-profit companies
hope to legitimately claim whatever they produce is “FOSS enough”. As such,
we have avoided any process that effectively auto-endorses the problematic
practices of companies whose proprietary products are already widely
deployed. No system, particularly a proprietary one, should ever be “too big
to fail”.
While our proposal may seem unrealistic, nearly every proposal in the
history of FOSS has seemed unrealistic — until it happened. We call on
the FOSS community to not lament what is, but to dream and strive for what
can be. The statement follows:
Machine-Learning-Assisted Programming that Respects User Freedom
There has been intense industry ballyhoo about a specific branch of
Artificial Intelligence (AI): generative AI backed by large language models
(LLMs). We have reached an era in computing history where input data sets
for many different types of works are quite large (after decades of Internet
content archiving), and hardware is powerful enough to rebuild LLMs
repeatedly. As FOSS (Free and Open Source Software) activists, we must
turn at least a modicum of attention to the matter, lest its future be
dominated by the same proprietary software companies that have curtailed user
rights for so long.
LLM-backed generative AI impacts the rights of everyone, including
developers, creators, and users. Software freedom, both in theory and
practice, yields substantial public good. Yet, traditional, narrow FOSS
analysis has boundaries and confines; it’s inadequate when applied to these
technologies.
We propose an aspirational vision of a FOSS, LLM-backed generative AI
system for computer-assisted programming that software rights supporters
would be proud to use and improve.
This narrow approach is by design. We are keenly cognizant that LLMs have
been built for myriad works — from visual art, to the spoken human
voice, to music, to literature, to actors’ performances. However, this
document focuses on systems that employ LLM-backed generative AI to assist
programmers because such systems have a critical role in the future of FOSS.
While the impact of AI-based programming assistants on the daily life of
programmers remains unclear (in the long term), it seems likely that AI
assistants have the potential to advance FOSS goals around the
democratization of software development. For example, such systems help
newcomers get started with unfamiliar codebases. We must look hopefully to
these technologies and seek ways to deploy them that help everyone.
Aspirational Target for a Software-Rights-Respecting AI Assisted Programming System
The ideal system for generative-AI-assisted programming should have the
following properties:
- The system is built using only FOSS, and is used only for the creation of FOSS, and never for proprietary software. In this manner, the system would propagate and improve interest in software freedom and rights.
- The system must respect the principle of “FOSS in, FOSS out, and FOSS throughout”. In detail, this means:
  - All software and generally useful technical information (including but not limited to: user interface code and applications for generating new material from the model, data cleaning code, model architecture, hyperparameters, model weights, and the model itself) needed to create the system are freely available to the public under a FOSS license.
  - All training data should be fully identified, and available freely and publicly on the Internet, under a FOSS license.
- The system will aid the user in adding necessary licensing notices and determining any licensing requirements of the output.
As an aspirational document, this is intended to be neither prescriptive nor definitional. We describe the absolute ideal LLM-backed generative AI system for FOSS that we can imagine. Articulating the ideal paves the road to understanding why common consensus remains insufficient. We must be the change we want in the world, and strive for what is right, until the politically unviable becomes viable.