Assessing the Efficiency of Gradient Descent Variants in Training Neural Networks
DOI: https://doi.org/10.36676/dira.v12.i3.114

Keywords: Neural Networks, Efficiency of Gradient Descent Variants

Abstract
Neural networks have become an important technique in artificial intelligence (AI) because of their ability to learn relationships in data. These networks, which draw their paradigms from the human brain, have transformed fields such as computer vision, natural language processing, and autonomous systems. However, training neural networks remains computationally expensive even on today's commodity hardware, and the cost only grows as networks become deeper and more complex. This is because training often requires fine-tuning millions or even billions of parameters through iterative learning procedures. An integral part of this learning is the optimization of a cost (or error) function, for which gradient-based methods are most frequently used.
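To make the gradient-based methods concrete, the sketch below compares three common variants of gradient descent (plain gradient descent, momentum, and Adam) on a toy least-squares problem. This is an illustrative example rather than code from the study itself: the toy data, step sizes, and helper names (loss_and_grad, train) are assumptions chosen for clarity.

# A minimal sketch (illustrative, not from the paper) comparing three
# gradient descent variants on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy design matrix
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=100)

def loss_and_grad(w):
    """Mean-squared-error cost and its gradient for linear regression."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2), X.T @ residual / len(y)

def train(update, steps=500):
    """Run an update rule from a zero initialization; return final loss."""
    w, state = np.zeros(5), {}
    for t in range(1, steps + 1):
        _, g = loss_and_grad(w)
        w = update(w, g, state, t)
    return loss_and_grad(w)[0]

def plain_gd(w, g, state, t, lr=0.1):
    return w - lr * g

def momentum(w, g, state, t, lr=0.1, beta=0.9):
    v = state.get("v", np.zeros_like(w))
    state["v"] = beta * v + g                 # accumulate velocity
    return w - lr * state["v"]

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = state.get("m", np.zeros_like(w))
    v = state.get("v", np.zeros_like(w))
    state["m"] = b1 * m + (1 - b1) * g        # first-moment estimate
    state["v"] = b2 * v + (1 - b2) * g ** 2   # second-moment estimate
    m_hat = state["m"] / (1 - b1 ** t)        # bias correction
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, upd in [("GD", plain_gd), ("Momentum", momentum), ("Adam", adam)]:
    print(f"{name:8s} final loss: {train(upd):.6f}")

On this convex toy problem all three variants converge; the point of the comparison is the per-step state each one keeps (none, a velocity vector, and two moment estimates, respectively), which is what distinguishes the variants when training deep networks.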