Please use this identifier to cite or link to this item: https://dair.nps.edu/handle/123456789/5135
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ryan Bell, Ryan Longshore | -
dc.contributor.author | Raymond Madachy | -
dc.date.accessioned | 2024-06-03T13:53:26Z | -
dc.date.available | 2024-06-03T13:53:26Z | -
dc.date.issued | 2024-05-01 | -
dc.identifier.citation | APA | en_US
dc.identifier.uri | https://dair.nps.edu/handle/123456789/5135 | -
dc.description | SYM Paper | en_US
dc.description.abstract | In the rapidly evolving field of artificial intelligence (AI), Large Language Models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language. However, their proficiency in specialized domains, particularly in the complex and interdisciplinary field of systems engineering, remains less explored. This paper introduces SysEngBench, a novel benchmark specifically designed to evaluate LLMs in the context of systems engineering concepts and applications. SysEngBench will encompass a comprehensive set of tasks derived from core systems engineering processes, including requirements analysis, system architecture design, risk management, and stakeholder communication. By leveraging a diverse array of real-world and synthetically generated scenarios, SysEngBench aims to provide an assessment of LLMs’ ability to interpret complex engineering problems and generate innovative solutions. Our evaluation of leading LLMs using SysEngBench reveals significant insights into their current capabilities and limitations in systems engineering contexts. The findings suggest pathways for future research and development aimed at enhancing LLMs’ utility in the systems engineering discipline. SysEngBench contributes to the understanding of AI’s potential impact on systems engineering. | en_US
dc.description.sponsorship | ARP | en_US
dc.language.iso | en_US | en_US
dc.publisher | Acquisition Research Program | en_US
dc.relation.ispartofseries | Acquisition Management;SYM-AM-24-072 | -
dc.subject | Systems Engineering | en_US
dc.subject | Custom Generative Pre-trained Transformer (GPT) | en_US
dc.subject | Risk Identification | en_US
dc.subject | Risk Analysis | en_US
dc.subject | Risk Management | en_US
dc.subject | Large Language Model (LLM) | en_US
dc.title | Introducing SysEngBench: A Novel Benchmark for Assessing Large Language Models in Systems Engineering | en_US
dc.type | Technical Report | en_US
Appears in Collections: Annual Acquisition Research Symposium Proceedings & Presentations

Files in This Item:
File | Description | Size | Format
SYM-AM-24-072.pdf | | 939.13 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.