Exploring challenges of Large Language Models in estimating the distance
- Author(s)
- Mina Karimi, Krzysztof Janowicz
- Abstract
Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), has experienced significant growth and attention in recent years. These advanced models, exemplified by ChatGPT, have demonstrated remarkable capabilities in generating text and image results from natural-language prompts. The ability to generate coherent and contextually relevant responses has led to widespread applications across diverse domains such as education, transportation, healthcare, law, finance, scientific research, and geography (Dash et al., 2023; Zhao et al., 2023).

LLMs, despite only being trained to predict the next token, have shown significant abilities (Bubeck et al., 2023). This has led to questions about what these models have truly learned. One idea is that LLMs gather many correlations between words but lack a coherent model or true understanding of the data they are trained on. Another idea is that, during training, LLMs develop clearer and more understandable models of the data-generating process, known as a "world model" (Gurnee & Tegmark, 2023). In essence, ChatGPT goes beyond the traditional GPT model: it can respond to a wide range of human queries and incorporate facts from external sources to enhance accuracy and reliability, using techniques like retrieval-augmented generation (RAG). However, as ChatGPT becomes more prevalent, there is growing attention to its geographical perceptions and the accuracy of its outputs in this domain. Questions arise about how well LLMs, trained on vast textual datasets, truly understand geographical information and whether they can provide trustworthy responses about geographical concepts. As these models continue to evolve and find applications in various domains, there is a heightened focus on examining their effectiveness and limitations in handling geographical queries (Ji & Gao, 2023).

In this paper, we investigate the potential of LLMs, for instance ChatGPT, in understanding geographical concepts such as estimating the distances between cities. We examined its performance for well-known and major cities as well as smaller or lesser-known cities. When asked about the distance between major cities like Vienna and Salzburg in Austria or Tehran and Isfahan in Iran, ChatGPT can provide a reasonable estimation based on its training data. These cities are commonly featured in datasets used for training such models, allowing ChatGPT to generate accurate distances. However, the story changes when it comes to smaller or less prominent cities. For cities that are not as widely known, such as Zwettl and Bad Goisern in Austria or Kashmar and Kesheh in Iran, ChatGPT struggles to provide accurate distances. Due to the limited representation of such cities in its training data, ChatGPT may resort to generating random or incorrect values. These inaccurate distances persist even when we specify the province of these small cities to the language model (e.g., Kashmar in Khorasan Razavi, and Kesheh in Isfahan). The examples are shown in Table 1.

This limitation is especially noticeable when asking for distances between cities with similar names, such as the various Springfields in the United States and the Feldkirchs in Alsace, France, and Austria. Despite these being distinct cities, ChatGPT may produce erroneous distances or fail to differentiate between them, leading to ambiguous or incorrect responses. This underscores the need for caution when relying on LLMs for geographical information, especially for lesser-known or similarly named places.

Furthermore, although promising in urban science, the utilization of generative AI also encounters ethical issues like misinformation and bias, sometimes lacking accuracy in portraying compositions and locations under specific circumstances (Jang et al., 2023; Kang et al., 2023). For example, there is a significant bias tied to the language used to query distances. When asked in English, ChatGPT may provide more accurate results for well-known cities due to the prevalence of English-language datasets containing information on these cities. However, when asked in Persian, the results vary, as the model's training data may not be as rich or diverse in Persian-language geographical information.

In summary, while ChatGPT excels at estimating distances between major cities, its performance diminishes for smaller or less prominent locations. The language used to query distances also introduces a noticeable bias, and difficulties arise with locations that have similar names. These limitations highlight the challenges of relying solely on LLMs for accurate and reliable geographical information, especially in diverse linguistic contexts and for lesser-known cities.
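The abstract does not state how the reference distances were obtained or how accuracy was scored. As a minimal sketch, one plausible way to check an LLM's answer is to compare it against the great-circle (haversine) distance between the two cities. The function names, the city coordinates (approximate city-centre values), the spherical Earth radius, and the example estimate below are all assumptions introduced for illustration; a road-distance reference would give different numbers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def relative_error(estimate_km, reference_km):
    """Absolute error of an estimate as a fraction of the reference distance."""
    return abs(estimate_km - reference_km) / reference_km


# Approximate city-centre coordinates (assumed for illustration).
VIENNA = (48.2082, 16.3738)
SALZBURG = (47.8095, 13.0550)

reference = haversine_km(*VIENNA, *SALZBURG)

# Hypothetical LLM answer, not a value reported in the paper.
hypothetical_estimate = 300.0

print(f"reference: {reference:.1f} km")
print(f"relative error: {relative_error(hypothetical_estimate, reference):.2%}")
```

Running the same comparison for pairs of lesser-known cities, or for the same pair queried in different languages, would give a simple way to quantify the effects the abstract describes.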
- Organisation(s)
- Institut für Geographie und Regionalforschung
- Journal
- Abstracts of the ICA
- Volume
- 7
- Pages
- 68
- Number of pages
- 1
- ISSN
- 2570-2106
- DOI
- https://doi.org/10.5194/ica-abs-7-68-2024
- Publication date
- 2024
- Peer-reviewed
- Yes
- ÖFOS 2012
- 507003 Geoinformatics, 102001 Artificial Intelligence, 102035 Data Science
- Sustainable Development Goals
- SDG 3 – Good Health and Well-being, SDG 11 – Sustainable Cities and Communities
- Link to portal
- https://ucrisportal.univie.ac.at/de/publications/f9b47087-548a-4bad-aa40-40bf7ed7c20c