The talk focuses on using Django as the primary framework to integrate and manage advanced LLMs like Llama 3.1, 3.2, and 3.3.
It explores agent creation with AI model providers such as Groq, whose APIs are called asynchronously from Celery tasks.
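As a concrete sketch of that pattern, the task below wraps a Groq chat-completion call in a retrying Celery task; the model id, task name, and retry policy are illustrative assumptions rather than the speaker's exact code.

```python
# tasks.py -- hedged sketch: a Celery task that calls a Llama model on Groq.
import os

from celery import shared_task
from groq import Groq


@shared_task(bind=True, max_retries=3)
def run_agent_step(self, prompt: str) -> str:
    """Send one prompt to a Groq-hosted Llama model and return the reply."""
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    try:
        response = client.chat.completions.create(
            model="llama-3.1-8b-instant",  # assumed model id; 3.2/3.3 variants also apply
            messages=[{"role": "user", "content": prompt}],
        )
    except Exception as exc:
        # Retry transient provider errors with exponential backoff.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
    return response.choices[0].message.content
```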
Django will be demonstrated as a backend solution for orchestrating inference requests and managing interactions; a real-world implementation of the architecture, with models deployed on AWS, will be shown.
The session includes an introduction to Celery for handling asynchronous tasks such as batch processing and inference job queuing.
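A minimal wiring sketch, assuming a Django project named `myproject`, a local Redis broker, and the `run_agent_step` task above (all assumptions), shows how the Celery app is attached and how a batch of prompts is queued as one group:

```python
# celery_app.py -- hedged sketch: Celery attached to a Django project,
# with Redis as broker and result backend; names and URLs are assumptions.
import os

from celery import Celery, group

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery(
    "myproject",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)
app.autodiscover_tasks()


def queue_batch(prompts):
    """Fan a list of prompts out as one Celery group (batch inference)."""
    from myproject.tasks import run_agent_step  # hypothetical task module
    return group(run_agent_step.s(p) for p in prompts).apply_async()
```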
The session highlights how Django REST Framework (DRF) can expose APIs that seamlessly integrate open-source models into web applications.
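One way such an API could look, sketched with assumed serializer fields, view class, and URL wiring: a POST queues the inference job and returns a task id, and a GET polls for the result.

```python
# views.py -- hedged sketch of a DRF endpoint in front of the Celery task;
# field names, view class, and URL wiring are assumptions.
from celery.result import AsyncResult
from rest_framework import serializers, status
from rest_framework.response import Response
from rest_framework.views import APIView

from myproject.tasks import run_agent_step  # hypothetical task module


class PromptSerializer(serializers.Serializer):
    prompt = serializers.CharField(max_length=4000)


class InferenceView(APIView):
    def post(self, request):
        serializer = PromptSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        task = run_agent_step.delay(serializer.validated_data["prompt"])
        return Response({"task_id": task.id}, status=status.HTTP_202_ACCEPTED)

    def get(self, request, task_id):
        # Poll for the task's state; assumes task_id is captured in the URLconf.
        result = AsyncResult(task_id)
        payload = {"status": result.status}
        if result.successful():
            payload["output"] = result.result
        return Response(payload)
```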
Best practices for scaling inference workflows with Celery and Redis in a simplified single-image AWS deployment will be discussed.
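A few Celery settings tend to dominate that discussion when one image hosts both the web and worker processes; the values below are illustrative starting points, not recommendations from the talk.

```python
# Hedged sketch of worker settings for a single-image deployment; values
# are illustrative starting points.
from celery import Celery

app = Celery("myproject")  # assumed project name, as in the earlier sketch
app.conf.update(
    task_acks_late=True,           # re-deliver a task if a worker dies mid-inference
    worker_prefetch_multiplier=1,  # stop one worker from hoarding long-running jobs
    task_time_limit=120,           # hard cap (seconds) on a single inference call
    result_expires=3600,           # keep results in Redis for one hour
)
```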
Challenges such as optimizing latency, managing concurrent requests, and handling model-specific configurations will be addressed.
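For the last of those, one plain approach is a per-model configuration mapping combined with Celery's built-in rate limiting; the model ids, parameter values, and limit below are assumptions for illustration.

```python
# Hedged sketch: per-model configuration plus a Celery rate limit to bound
# concurrent provider calls; all values are illustrative.
import os

from celery import shared_task
from groq import Groq

MODEL_CONFIGS = {
    "llama-3.1-8b-instant": {"max_tokens": 1024, "temperature": 0.2},
    "llama-3.3-70b-versatile": {"max_tokens": 512, "temperature": 0.7},
}


@shared_task(rate_limit="30/m")  # cap provider calls per worker to smooth latency
def run_inference(prompt: str, model: str = "llama-3.1-8b-instant") -> str:
    cfg = MODEL_CONFIGS[model]
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=cfg["max_tokens"],
        temperature=cfg["temperature"],
    )
    return response.choices[0].message.content
```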
The session concludes with future perspectives on deploying LLMs efficiently for real-time and large-scale language applications using Django, Celery, and model providers.