Introducing SysEngBench: A Novel Benchmark for Assessing Large Language Models in Systems Engineering

Ryan Bell; Ryan Longshore; Raymond Madachy

Please use this identifier to cite or link to this item: https://dair.nps.edu/handle/123456789/5135

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ryan Bell, Ryan Longshore	-
dc.contributor.author	Raymond Madachy	-
dc.date.accessioned	2024-06-03T13:53:26Z	-
dc.date.available	2024-06-03T13:53:26Z	-
dc.date.issued	2024-05-01	-
dc.identifier.citation	APA	en_US
dc.identifier.uri	https://dair.nps.edu/handle/123456789/5135	-
dc.description	SYM Paper	en_US
dc.description.abstract	In the rapidly evolving field of artificial intelligence (AI), Large Language Models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language. However, their proficiency in specialized domains, particularly in the complex and interdisciplinary field of systems engineering, remains less explored. This paper introduces SysEngBench, a novel benchmark specifically designed to evaluate LLMs in the context of systems engineering concepts and applications. SysEngBench will encompass a comprehensive set of tasks derived from core systems engineering processes, including requirements analysis, system architecture design, risk management, and stakeholder communication. By leveraging a diverse array of real-world and synthetically generated scenarios, SysEngBench aims to provide an assessment of LLMs’ ability to interpret complex engineering problems and generate innovative solutions. Our evaluation of leading LLMs using SysEngBench reveals significant insights into their current capabilities and limitations in systems engineering contexts. The findings suggest pathways for future research and development aimed at enhancing LLMs’ utility in the systems engineering discipline. SysEngBench contributes to the understanding of AI’s potential impact on systems engineering.	en_US
dc.description.sponsorship	ARP	en_US
dc.language.iso	en_US	en_US
dc.publisher	Acquisition Research Program	en_US
dc.relation.ispartofseries	Acquisition Management;SYM-AM-24-072	-
dc.subject	Systems Engineering	en_US
dc.subject	Custom Generative Pre-trained Transformer (GPT)	en_US
dc.subject	Risk Identification	en_US
dc.subject	Risk Analysis	en_US
dc.subject	Risk Management	en_US
dc.subject	Large Language Model (LLM)	en_US
dc.title	Introducing SysEngBench: A Novel Benchmark for Assessing Large Language Models in Systems Engineering	en_US
dc.type	Technical Report	en_US
Appears in Collections:	Annual Acquisition Research Symposium Proceedings & Presentations

Files in This Item:

File	Description	Size	Format
SYM-AM-24-072.pdf		939.13 kB	Adobe PDF
