Artificial IntelligenceGenAI

SWIFT ISO 20022 Mandate for Structured Address Challenges Mitigated by Generative Artificial Intelligence

By Dr. Georgios Vranopoulos, Research Fellow, University of Plymouth

Generative Artificial Intelligence (Gen-AI) is becoming the driving force in all disciplines including research and industry after crossing the inflated expectations stage into the maturity era where business adoption will be evolving into tactical initiatives as shown in Figure 1 [1]. Signifying the initialization of the actual implementation and realization era for AI adoption, as confirmed by McKinsey [2], see Figure 2.

Figure 1. Artificial Intelligence Hype Cycle
Figure 2. Move beyond the potential to proven applications of Gen-AI

Society for Worldwide Interbank Financial Telecommunication (SWIFT) ISO 20022 mandate is directing all financial organization, using the SWIFT network for cross-border payments, to adopt a structured address within the SWIFT Message XML (MX) message rather than the old approach which was utilizing unstructured addresses, see Table 1 [3]. The adoption timeframe set by SWIFT is November 2026, post the adoption of MX at which time the coexistence with Cross-Border Payments and Reporting Plus (CBPR+) will end [4]. This poses a steep timeline for migration and a challenge to many financial establishments, including banks [5].

Table 1. Address Samples

Manual or rule-based transformation of all addresses from an unstructured format into the ISO 20022 structured format can by a tiresome task, prone to errors [7]. Towards mitigating this challenge Gen-AI is “called into arms” in facilitating a seamless migration. Of course, human review and validation are required just like in any migration, but review is much simpler and less time consuming.

The banking industry is a heavily regulated sector, especially post-2008 financial crisis [9]. Based on the regulatory requirements, the team utilized a private Azure infrastructure, with zero data retention, through the corporate network in safeguarding data confidentiality and Data Loss Prevention (DLP). The same principles were employed in identifying the sample set to be only fifteen (15) addresses. The LLMs used were also selected based on their compliance to zero data footprint and Microsoft’s responsible AI integrated policies and cognitive policies whilst an additional cybersecurity layer was provisioned with the use of Azure Application Programming Interface Management (APIM) in protecting the LLMs end-points from unlawful usage.

Lab experiments confirm Gen-AI’s proficiency in accurately transforming unstructured data into schema-led information by utilizing insightful prompt engineering.

In this approach several Large Language Models (LLMs) have been utilised on a sample set of fifteen (15) addresses. The evaluation was based on the models’: (i) accuracy of results, where the quality of the result is quantified. The evaluation was subdivided into two (2) independent ratings: (a) One coming from a set of cross-functional experts. The team of 12 Subject Matter Experts (SMEs) was composed of Payments experts, Information Technology (IT) experts and Data Scientists, having an aggregated experience of almost three centuries (281 years) with an average of 23 years in the respective space. (b) and an automated one that was validating the structure of the XML and the existence of required elements (eg. country) [8]. And (ii) speed of execution, where the time response time for each LLM generation is measured. In having a sufficient sample data, the executions for each model were repeated at least three times and response times were averaged. Based on the evaluation results a balanced scorecard was created attributing different weights to each Key Performance Indicator (KPI) and represented with the use of a heat-map for ease of visualization. The evaluation team considered high weights for accuracy, attributing the user evaluation 50% and the automated validation another 40%, leaving 10% for the time execution. The rationale behind this decision was that the structural integrity and data quality were of vital importance whilst execution times could be factored in with careful migration planning. In factoring the potential sizing of the addresses to be parsed, a penalty was introduced accounting for 50% of the attributed percentage for execution time. The limit for the penalty was set to sixty (60) days with a threshold of two million addresses. For the metric “without penalty”, ten thousand (10,000) records were extrapolated whilst for the metric “with penalty” two million (2,000,000) were extrapolated.

The lab experiments confirmed that Gen-AI is capable of adequately migrating the data with the use of appropriate prompt engineering. The prompts included:

  • the definitions of the elements coming from the SWIFT MX guidelines [3].
  • directives in preventing hallucination.
  • directives for required elements like the country.
  • directives for typo correction and generation of missing elements.

In the experiment the following LLMs were utilised as follows in alphabetical order:

  • Falcon
  • GPT 3.5 Turbo
  • GPT 4.1
  • GPT 4o
  • GPT o3 Mini
  • Llama 2
  • Llama 3.2
  • Mistral
  • Mistral Nemo

The LLMs are anonymized in the results in preserving impartiality and refraining from any marketing or other kind of promotion. The results presented in Table 2, tabulate the outcome of the experiment using a balanced scorecard heatmap approach where dark green represents the best outcome, amber medium and red indicates inefficiencies. Based on the outcome of the scorecard a ranking is calculated.

ModelRatingExecution Time Contribution 10%Final Score Weighted – 100%Rank
SME Contribution 50%Rule Contribution 40%with Penaltywithout Penaltywith Penaltywithout Penaltywith Penaltywithout Penalty
1st64%96%100%100%80.3%80.3%23
2nd69%98%89%89%82.4%82.4%11
3rd70%100%17%67%76.5%81.5%42
4th33%91%0%44%52.7%57.2%78
5th57%100%0%22%68.7%70.9%55
6th45%73%6%56%52.2%57.2%87
7th45%100%0%33%62.3%65.7%66
8th23%18%0%11%18.4%19.6%99
9th64%100%78%78%79.8%79.8%34
Table 2. Balanced Scorecard Heatmap

The SMEs’ evaluations are presented in Figure 3, where it is observed that models 1, 2, 3 and 9 have the highest performance. The bubbles represent the number of addresses rated in the respective category across all SMEs evaluation per LLM.

Figure 3. Models Performance

The overall performance for the SMEs’ evaluations along with the algorithmic evaluation is showcased in Figure 4. The best performing models have been highlighted with a checkmark and are: (i) for SME’s, model 3 and (ii) for Automated, models 3,5 and 9.

Figure 4. User & Rule Based Performance

Execution times efficiency is depicted in Figure 5 where we can observe, as indicated by the checkmark, that model 1 has the fastest response times irrespective of the number of records processed (representing the 10,000 VS 2,000,000 records).

Figure 5. Execution Time Performance

By combining the data available at hand, the overall performance is calculated and represented in Figure 6. Depending on the expected data set to be processed, closest to 10,000 or 2,000,000 records, the ranking is superimposed for ease of identification:

  • for small datasets (without penalty), model 2 with 82.39% efficiency seems to be the best, followed by models 3 and 1 with efficiencies of 81.5% and 80.28% respectively.
  • for large datasets (with penalty), model 2 with 82.39% efficiency seems to be the best followed by models 1 and 9 with efficiencies of 80.28% and 79.78% respectively.
Figure 6. Overall Weighted Performance

References:
[1]      Gartner, Hype Cycle for Artificial Intelligence. 2024.
[2]      D. De Kroon, E. De Boer, R. Shahani, and S. Alabaster, ‘Lessons from the Lighthouses | McKinsey’. Accessed: Mar. 31, 2025. [Online]. Available: https://www.mckinsey.com/capabilities/operations/our-insights/lessons-from-the-lighthouses?hsid=a7fda29a-004a-44f1-9790-39669fc94e3c
[3]      Swift, ‘Standards MX Message Reference Guide’, 2024.
[4]      Swift, ‘ISO 20022: Implementation | Swift’. Accessed: May 02, 2025. [Online]. Available: https://www.swift.com/standards/iso-20022/iso-20022-faqs/implementation
[5]      P. Pathak and P. Bhosale, ‘The next big ISO 20022 migration: structured addresses’. Accessed: May 02, 2025. [Online]. Available: https://www.redcompasslabs.com/insights/the-next-big-iso-20022-migration-structured-addresses/
[6]      ‘EPC Guidance Document Provision of Addresses under the EPC Payment Schemes EPC153-22 / Version 2.0’, Nov. 2024. [Online]. Available: www.epc-cep.eu
[7]      A. Thakur, ‘AI-Driven Validation Frameworks for SWIFT/ISO 20022 Financial Messaging Systems’, International Journal of Information Technology and Management Information Systems, vol. 16, no. 2, pp. 612–627, Mar. 2025, doi: 10.34218/IJITMIS_16_02_040.
[8]      D. Vogel, ‘iso20022-structured-postal_address-v1.6’, Mar. 2024.
[9]      R. Kemp, ‘Legal aspects of managing Big Data’, Computer Law & Security Review, vol. 30, no. 5, pp. 482–491, 2014, doi: 10.1016/j.clsr.2014.07.006.