Please use this identifier to cite or link to this item:
https://dair.nps.edu/handle/123456789/5254
Title: Introducing SysEngBench: A Novel Benchmark for Assessing Large Language Models in Systems Engineering
Authors: Ryan Bell
Keywords: Systems Engineering; Custom Generative Pre-trained Transformer; GPT; Risk Identification
Issue Date: 27-Aug-2024
Publisher: Acquisition Research Program
Series/Report no.: Acquisition Management; SYM-AM-24-160
Abstract: In the rapidly evolving field of artificial intelligence (AI), Large Language Models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language. However, their proficiency in specialized domains, particularly in the complex and interdisciplinary field of systems engineering, remains less explored. This paper introduces SysEngBench, a novel benchmark specifically designed to evaluate LLMs in the context of systems engineering concepts and applications. SysEngBench encompasses a comprehensive set of tasks derived from core systems engineering processes, including requirements analysis, system architecture design, risk management, and stakeholder communication. By leveraging a diverse array of real-world and synthetically generated scenarios, SysEngBench aims to assess LLMs' ability to interpret complex engineering problems and generate innovative solutions.
Description: SYM Presentation
URI: https://dair.nps.edu/handle/123456789/5254
Appears in Collections: Annual Acquisition Research Symposium Proceedings & Presentations
Files in This Item:

File | Description | Size | Format
---|---|---|---
SYM-AM-24-160.pdf | Presentation | 3.02 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.