Could an artificial intelligence save us work in our research project? Could it understand the meaning of words and classify them as we want? A group of colleagues from the ESMAS-ES+ project (Majo Domínguez, Iván Arias, and I) set out to answer these questions. We analyzed the reliability of several AI models (ChatGPT, Gemini) in the semantic tagging of nouns and compared it with that of project members doing the same task. Who classified nouns better based on their meaning, people or AIs? At what point in the process did each face more challenges?
For now, humans are winning, but not by much. The results were interesting enough to share with other specialists, so we went to Cavtat, near Dubrovnik, to present a poster at the 21st EURALEX congress. Now that the conference is over and we are back home, it’s time to reflect on our experience.
During the development of our semantic annotation system with the PORTLEX lexical ontology, we built on the experience of previous projects. This time we used a German text from a TED Talks corpus. This source of varied and multilingual texts gave us a solid foundation, although the experiment presented some challenges. One of the biggest challenges was creating an effective prompt for the AI models, which led us to learn about prompt engineering to achieve good results. Additionally, language models are constantly updated, which forced us to adjust our strategies several times.
In this first experiment with AI, we found that language models are quite good at identifying nouns in a text. However, they do not catch all of them: some nouns go unrecognized, so the models' output is incomplete. Still, although the performance of these automatic models does not yet reach the precision of human annotation, we believe the results are a promising starting point, especially if the prompts are adjusted further.
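To make the idea of "missed nouns" concrete, here is a minimal sketch of how a model's noun list could be compared with a human reference list. It is not the evaluation procedure we used in the project. The function name `noun_detection_scores` and the word lists are hypothetical, invented purely for illustration, and the sketch assumes that each noun is counted once per text.

```python
def noun_detection_scores(reference, predicted):
    """Compare model-identified nouns against a human reference list.

    Both arguments are iterables of strings; duplicates are ignored
    (an assumption of this sketch, not a project decision).
    """
    reference, predicted = set(reference), set(predicted)
    found = reference & predicted  # nouns both the human and the model marked
    precision = len(found) / len(predicted) if predicted else 0.0
    recall = len(found) / len(reference) if reference else 0.0
    return {
        "precision": precision,  # share of model nouns that are correct
        "recall": recall,        # share of reference nouns the model found
        "missed": sorted(reference - predicted),  # nouns the model overlooked
    }


# Toy data, invented for illustration only (not taken from our corpus).
human_nouns = ["Idee", "Welt", "Sprache", "Forschung"]
model_nouns = ["Idee", "Welt", "Sprache"]

scores = noun_detection_scores(human_nouns, model_nouns)
print(scores)
```

In this toy case everything the model marks is correct, but one reference noun is overlooked, which is the kind of gap described above: good identification overall, with some nouns slipping through.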
Attending the EURALEX conference gave us the opportunity to showcase our work to the academic and professional community. The feedback from other specialists helped us see the results of this first experiment from different perspectives, identify key areas for improvement, and validate our methodologies. This type of interaction and collaboration is essential for the growth of projects as specific as ESMAS-ES+. Moreover, in Cavtat we saw that we are not the only ones working with AI in lexicography. The exchange of ideas with colleagues who share these interests inspired and motivated us to keep improving the integration of human work and artificial intelligence in this field of applied linguistics.