Hannes Hapke


As one of Digits’s principal machine learning engineers since 2020, Hannes Hapke is immersed day to day in finding innovative ways to use machine learning to boost productivity for accountants and business owners. Before joining Digits, Hannes solved machine learning infrastructure problems in various industries, including healthcare, retail, recruiting, and renewable energy.

Hannes is an active contributor to TensorFlow’s TFX Addons project and has co-authored multiple machine learning publications, including the book “Building Machine Learning Pipelines” by O’Reilly Media. He has also presented state-of-the-art ML work at conferences such as ODSC and O’Reilly’s TensorFlow World.


Deploying Large Language Models with Ease


The power of large language models (LLMs) is undeniable, yet many organizations hesitate to embrace third-party APIs due to privacy concerns and a desire for greater control. Self-hosting LLMs presents its own challenges, from navigating the ever-evolving model and tooling landscape (Mixtral, Llama 3, llama.cpp, etc.) to optimizing inference and managing costs.

This talk delves into the intricacies of deploying LLMs on-premise or within your cloud VPC. We’ll explore the key decisions involved, including:

  • Model selection: Evaluating available models based on your specific needs and use case.
  • Tooling and infrastructure: Navigating the diverse ecosystem of LLM frameworks and optimization tools.
  • Inference optimization: Balancing performance, accuracy, and cost through techniques like quantization, parallelization, and batching.
  • Deployment strategies: Choosing the optimal deployment environment for your LLM, whether it serves real-time streaming requests or batch workloads.
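To make the inference-optimization point concrete, the core idea behind quantization can be shown in a few lines. The sketch below is a toy symmetric int8 scheme written for illustration only; it is not the talk's reference implementation, and production tools such as llama.cpp use more sophisticated block-wise formats:

```python
def quantize_int8(weights):
    """Toy symmetric int8 quantization: map floats to int8 values
    plus a single scale factor (an illustrative assumption, not a
    production scheme)."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

# Each weight is stored in 1 byte instead of 4, at the cost of a
# small reconstruction error bounded by half the scale factor.
weights = [0.12, -0.5, 0.33, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Trading a little accuracy for a 4x reduction in memory footprint like this is what lets larger models fit on commodity GPUs, which is why quantization features so prominently in self-hosting discussions.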


Attendees will gain a comprehensive understanding of the LLM deployment landscape, learn from real-world case studies, and be equipped to make informed decisions for their own projects.