Assessing the Efficiency of Gradient Descent Variants in Training Neural Networks

Authors

  • Ria Kundra ria_cse_2025@msit.in
  • Ojaswi Ojaswisharma2013.study@gmail.com

DOI:

https://doi.org/10.36676/dira.v12.i3.114

Keywords:

Neural Networks, Efficiency of Gradient Descent Variants

Abstract

Neural networks have become an essential technique in artificial intelligence (AI) because of their ability to learn complex relationships from data. Drawing their paradigms from the human brain, these networks have greatly transformed fields such as computer vision, natural language processing, and autonomous systems. However, training neural networks remains computationally expensive even on today's commodity hardware, as architectures grow ever deeper and more complex. This cost arises because training requires adjusting millions or even billions of parameters through iterative learning procedures. An integral part of this learning is the optimization of a cost (error) function, for which gradient-based methods are most frequently used.
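As a purely illustrative sketch (not taken from the paper), the Python snippet below shows what a gradient-based parameter update looks like in practice, comparing plain gradient descent with a momentum variant on a toy quadratic cost; the cost function, learning rate, and momentum coefficient are arbitrary placeholders chosen only for demonstration.

    import numpy as np

    # Toy cost J(theta) = 0.5 * ||theta||^2, whose gradient is simply theta.
    def grad(theta):
        return theta

    def vanilla_gd(theta, lr=0.1, steps=100):
        # Plain gradient descent: theta <- theta - lr * grad(J)
        for _ in range(steps):
            theta = theta - lr * grad(theta)
        return theta

    def momentum_gd(theta, lr=0.1, beta=0.9, steps=100):
        # Momentum variant: accumulate an exponentially decaying
        # average of past gradients and step along that average.
        v = np.zeros_like(theta)
        for _ in range(steps):
            v = beta * v + grad(theta)
            theta = theta - lr * v
        return theta

    theta0 = np.array([5.0, -3.0])
    print(vanilla_gd(theta0))   # both runs approach the minimum at the origin
    print(momentum_gd(theta0))

Widely used variants such as Nesterov acceleration, AdaGrad, RMSProp, and Adam build on this same basic update with different step-size and momentum schedules.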

Published

2024-09-27

How to Cite

Ria Kundra, & Ojaswi. (2024). Assessing the Efficiency of Gradient Descent Variants in Training Neural Networks. Darpan International Research Analysis, 12(3), 596–604. https://doi.org/10.36676/dira.v12.i3.114