Optimizing large language models: Techniques for efficiency and scalability in NLP applications
Department of Master of Computer Applications, RV College of Engineering, Bengaluru, India.
Research Article
International Journal of Engineering Research Updates, 2024, 07(02), 046-055.
Article DOI: 10.53430/ijeru.2024.7.2.0040
Publication history:
Received on 16 August 2024; revised on 25 September 2024; accepted on 28 September 2024
Abstract:
Optimizing the efficiency and scalability of Large Language Models (LLMs) is crucial for advancing Natural Language Processing (NLP) applications. This paper explores optimization techniques for LLMs, focusing on strategies that improve computational efficiency and model scalability, and provides a comparative analysis of their impact on key performance metrics such as training time, memory usage, and inference speed. In our experiments, the optimized model achieved a 30% reduction in training time and a 25% decrease in memory consumption while maintaining competitive accuracy. Integrating these optimization techniques into a comprehensive framework improves operational efficiency and resource utilization. The findings underscore the benefits of adopting optimization strategies for LLMs and offer a practical approach to improving the performance and scalability of NLP applications.
Keywords:
LLMs; Training Performance Metrics; Inference Time; GPU; TPU; Hyperparameter; Model Pruning
Copyright information:
Copyright © 2024 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.