Optimizing Multi-Tenant DAG Execution Systems for High-Throughput Inference
DOI:
https://doi.org/10.36676/dira.v12.i3.139
Keywords:
Multi-tenant DAG execution, high-throughput inference, scheduling optimization, hybrid execution models
Abstract
In large-scale data processing and machine learning systems, Directed Acyclic Graphs (DAGs) serve as the backbone for orchestrating complex workflows that involve multiple dependent stages. Multi-tenant DAG execution systems are increasingly being used to handle concurrent workloads from multiple users and applications. However, these systems face significant challenges when it comes to achieving high-throughput inference, particularly in shared environments where resource contention, scheduling efficiency, and tenant isolation become critical concerns. High-throughput inference is a necessity in use cases such as real-time recommendation engines, large-scale data processing pipelines, and cloud-based AI services, where latency and throughput are vital to maintaining system performance.
This paper addresses the primary challenges of optimizing multi-tenant DAG execution systems for high-throughput inference. We begin by analyzing the limitations of existing frameworks such as Apache Airflow, Luigi, and Prefect in multi-tenant environments, focusing on resource contention, inefficient scheduling, and a lack of dynamic scalability. To tackle these issues, we propose a set of optimization strategies: adaptive resource allocation, tenant-aware scheduling, and hybrid execution models that balance real-time and batch inference.
Our first strategy involves dynamic partitioning of resources to prevent contention and ensure fair allocation among tenants based on workload priority and expected resource utilization. This approach is supplemented by intelligent scheduling techniques that leverage cost-based heuristics and priority queues, reducing overall latency and improving system throughput. Additionally, we introduce a hybrid execution model that supports both real-time and batch processing pipelines, enabling flexible execution of diverse workload types in the same shared environment. This allows the system to dynamically switch between real-time and batch modes based on workload characteristics, thereby optimizing resource utilization.
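As an illustration of how fair allocation among tenants and priority-queue scheduling might fit together, the following is a minimal sketch and not the framework's actual implementation. It assumes each tenant has a share weight and each task carries an estimated cost and a priority; the class name `TenantAwareScheduler`, its methods, and all parameters are hypothetical. Dispatch picks the tenant that has consumed the least of its weighted share, then runs that tenant's highest-priority, cheapest task.

```python
import heapq
import itertools


class TenantAwareScheduler:
    """Weighted-fair dispatch across tenants, cost-ordered within a tenant.

    Hypothetical sketch: weights, costs, and priorities are assumed inputs.
    """

    def __init__(self, tenant_weights):
        # tenant_weights maps tenant name -> share weight, e.g. {"a": 2.0, "b": 1.0}
        self.weights = dict(tenant_weights)
        self.consumed = {t: 0.0 for t in self.weights}
        self.queues = {t: [] for t in self.weights}
        self._seq = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, tenant, task, est_cost, priority=1):
        # Smaller key runs first: higher priority, then lower estimated cost.
        key = (-priority, est_cost, next(self._seq))
        heapq.heappush(self.queues[tenant], (key, task, est_cost))

    def next_task(self):
        ready = [t for t, q in self.queues.items() if q]
        if not ready:
            return None
        # Choose the tenant furthest below its weighted share of consumed cost.
        tenant = min(ready, key=lambda t: (self.consumed[t] / self.weights[t], t))
        _, task, est_cost = heapq.heappop(self.queues[tenant])
        self.consumed[tenant] += est_cost
        return tenant, task
```

With weights of 2.0 and 1.0 and equal-cost tasks, the first tenant receives roughly two dispatches for every one given to the second, which is one simple way to express fair allocation by workload weight.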
To further enhance performance, we propose incorporating memory-aware caching mechanisms that prioritize data locality and reduce redundant data movements between nodes in the DAG. This not only decreases execution time for individual DAG stages but also minimizes I/O overhead, a critical factor in high-throughput systems. These strategies are integrated into a multi-tenant DAG execution framework designed to support various machine learning and data analytics workloads in a cloud-native environment.
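One simple way to picture a memory-aware cache for DAG stage outputs is a least-recently-used cache bounded by total bytes rather than by entry count. The sketch below is an assumption-laden illustration, not the mechanism evaluated in the paper: the class name `StageOutputCache`, the caller-supplied `size` argument, and the LRU eviction policy are all hypothetical, and node-level data locality is not modeled.

```python
from collections import OrderedDict


class StageOutputCache:
    """LRU cache bounded by a byte budget (hypothetical sketch)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> (value, size in bytes)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, size):
        if size > self.max_bytes:
            return False  # larger than the whole budget, so never cached
        if key in self._items:
            # Replacing an entry: release its old size first.
            self.used_bytes -= self._items.pop(key)[1]
        # Evict least recently used entries until the new one fits.
        while self.used_bytes + size > self.max_bytes:
            _, (_, old_size) = self._items.popitem(last=False)
            self.used_bytes -= old_size
        self._items[key] = (value, size)
        self.used_bytes += size
        return True
```

A downstream stage would call `get` with a key derived from the upstream stage and its inputs, and recompute or refetch only on a miss, which is where the reduction in redundant data movement would come from.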
The effectiveness of our optimizations is evaluated through comprehensive experiments using real-world datasets and synthetic benchmarks, comparing our approach against baseline systems. Our results demonstrate significant improvements in throughput, latency, and scalability, validating the proposed techniques for real-world adoption in multi-tenant DAG execution systems. We also present a case study of applying these optimizations to a large-scale AI inference platform, highlighting the practical benefits and potential challenges of deploying such systems in a production environment.
Ultimately, this research provides valuable insights into optimizing DAG execution for high-throughput inference, offering a blueprint for building scalable, efficient, and tenant-aware DAG systems capable of handling diverse and dynamic workloads.
License
Copyright (c) 2024 Darpan International Research Analysis
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.