There is another way

The growing backlash against AI
We are not short of information about the rapid development of Large Language Models. It would be quite easy to spend every working day reading Substack newsletters and LinkedIn posts, although that could become somewhat dispiriting. It's worth noting that while the owners and shareholders of the big tech companies boast daily of the progress AI is making in building a better world, the backlash against AI keeps growing. See Alberto Romero’s comprehensive summary in a post entitled How America Turned Against AI According to the Poll Data: A (Very Big) Compilation. Quick summary: “Across every pollster—Gallup, Pew, Change Research, the Washington Post, Marquette, Morning Consult, Heatmap, NBC News, Politico, the University of Maryland—the direction is the same: Americans have soured on AI datacenters, on AI itself, and on the people building it.” And although the backlash has not reached that stage yet in Europe, it is not difficult to imagine that it will.
An open-source LLM that speaks Sardinian
Anyway, while wading through the mountain of AI-inspired commentary (most of it probably written by AI), it is possible to find occasional sources of inspiration and hope. On LinkedIn I stumbled on a post by Luca Ballore, a Software Engineer at DICE (Electronic Arts). Luca says: “Today I'm releasing a hobby project I've had in mind for quite some time, but never had the possibility to work on, for many reasons. It's something I'm deeply connected to and that makes me particularly proud. I'd like to present LLiMba, an open-source LLM that speaks Sardinian. Sardinian is a Romance language with roughly 1 million speakers in Sardinia (Italy), classified as endangered by UNESCO. It's also my real first language, as my first word was actually in Sardinian. I started speaking it before Italian, before anything else. It's the language my parents and grandparents spoke at home, and basically the language I grew up hearing every day.”
It seems that, until now, Sardinian has had practically no support in modern NLP, and no commercial translation service supports it. Although major LLMs can produce "some" Sardinian if you prompt them carefully or hand them a grammar reference, the results are very limited.
Luca continues: “That gap shouldn't exist for a living language, however small the community. So I built LLiMba. Starting from Qwen2.5-3B-Instruct, the model was adapted through continued pretraining on Sardinian text gathered from every digital source I could find: news sites, the Sardinian Wikipedia, decades of literary translations made by community translators. I then ran supervised fine-tuning on instruction pairs, and made the full pipeline run on a single consumer GPU, basically the one in my gaming computer (an NVIDIA RTX 4090 with 24 GB).”
The result, he says, is a model that holds conversations in Sardinian, translates to and from it, and outperforms its base model by orders of magnitude on translation benchmarks. It's not perfect (small models never are) but it's a real first step in the right direction.
In what should become normal for technical descriptions on LinkedIn and in similar publications (but probably won’t), Luca provides open links to the models and datasets, the code, a live demo and a paper.
Small Language Models and Open Source development
Luca’s work shows the promise of what AI technology could bring. And it offers an alternative to those who say Europe should deregulate and follow the example of the large tech companies in the USA. Imagine if the EU started providing serious support for initiatives like LLiMba, including infrastructure and links with universities, communities and SMEs. For some time, I have been arguing for the potential of Small Language Models and Open Source development. It's time to move them forward now.
About the Image
This is an illustration made in Procreate representing facial recognition technology. It includes symbols relating to how AI works in facial recognition software, such as measuring the distance between the eyes and mapping nose shapes. The mathematical symbols, like the distance formula and the ruler, represent the AI software at work. This project started as a graphic requested by an AI literacy speaker at Oregon State University.
