J. Mark Locklear


Mark Locklear is a Project Manager and Web Developer at the Extension Foundation and has 20 years of IT experience, including network administration, quality assurance, and software development. He is an Adjunct Instructor at Asheville-Buncombe Technical Community College and is passionate about education and teaching students software development.


ExtensionBot: A Domain Specific Generative AI Tool for the Cooperative Extension Service


The ExtensionBot project is a groundbreaking initiative of the Extension Foundation that harnesses the power of artificial intelligence to deliver trustworthy, research-based answers to the public, drawing upon the vast knowledge and expertise of the Cooperative Extension Service. By training a large language model exclusively on data provided by land-grant research institutions, ExtensionBot ensures that its responses are grounded in reliable, verifiable information. Rather than being trained on the entire internet, ExtensionBot is trained on specific data sets from Land-Grant Universities; in this case the data are primarily agriculture related (horticulture, farming, etc.). This presentation will provide an overview of the project's goals, use cases, the technology behind ExtensionBot, and the collaborative efforts and challenges of obtaining data from universities. Attendees will learn how ExtensionBot aims to enhance the accessibility and reach of Cooperative Extension resources while maintaining the highest standards of accuracy and credibility.

A big part of this project is not just building the AI tool (ExtensionBot) itself, but also gathering, curating, and "wrangling" the data sets ExtensionBot is trained on. I will also talk about our efforts to work collaboratively with research universities to obtain data, as well as our work crawling university websites when universities were not able to provide data sets for us.

Details:
- Introduction (10 minutes): History of the project, a demo of ExtensionBot, and the technology behind it.
- Data Sources (10 minutes): We believe the strength of the project is the data behind it. I will talk about how we are gathering data from research institutions, both by working directly with university IT staff and by implementing automated scripts to crawl sites.
- Implementation (10 minutes): Who it's for, how and where it's used, and how it can be implemented.
- Future of the tool (5 minutes): Where we see the project going over the next 12-24 months.
- Conclusion and Questions (5 minutes).
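As a rough illustration of the crawling approach mentioned above, the core of a polite same-site crawler is a link extractor that never follows links off the university's domain. This is only a minimal sketch using the Python standard library; the domain name and page structure here are hypothetical, not the actual production crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from every <a href> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's base URL.
                    self.links.append(urljoin(self.base_url, value))

def same_domain_links(html, base_url):
    """Return only links that stay on the same host, so a crawl of a
    university site never wanders off to other domains."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    return [u for u in parser.links if urlparse(u).netloc == base_host]
```

A real crawl would wrap this in a fetch loop with a visited set, rate limiting, and robots.txt checks, but the same-domain filter is the piece that keeps the data set scoped to a single institution.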


What inspired you to delve into AI and pursue a career in this field?

I heard someone say once that there is no such thing as luck, only where preparation meets opportunity! My company had started a small chatbot project in 2018 built on the Rasa framework. The idea was to create a chatbot that could answer questions on bugs and pests; the particular group we were building it for had a large database of bug and pest data. Honestly, the initial version of that product was not very good, but our team learned a lot doing it, so it really set us up for when ChatGPT hit the scene in the fall of 2022. The buzz around ChatGPT caused our leadership to be like "...hey, aren't we already doing something like this?". Even though the project we were building back then was nowhere close to ChatGPT, it got our leadership to perk up and show interest, as well as be willing to throw some funding our way to make it better. That led us down the path of building our LLM and RAG pipeline, and it just continued to evolve from there.

Can you share a real-world example of how your AI solution has made a significant impact in your industry?

Sure, our tool is the only US agriculture-focused chatbot out there at the moment. We are training our LLM (currently Mistral-7B) on research-based data from Land-Grant Institutions across the United States. My organization serves Land-Grant institutions, so we already have those relationships in place. While there's still plenty of work to do to acquire data from each institution, already having the connection and relationship in place is a huge first step in acquiring data sets.
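The RAG pipeline mentioned here can be sketched in miniature: retrieve the extension documents most relevant to a question, then build a prompt that grounds the LLM's answer in only those documents. This is a toy sketch with a word-overlap scorer and made-up example documents; the production pipeline uses vector embeddings and Mistral-7B, not this scoring function.

```python
def retrieve(question, documents, k=2):
    """Rank documents by how many question words they share and
    return the top k. A stand-in for real embedding-based retrieval."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, documents):
    """Assemble a prompt that restricts the model to retrieved context,
    which is what keeps answers grounded in research-based sources."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using ONLY the extension publications below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The key design point is that the model never answers from its general training data alone: every response is conditioned on documents pulled from the curated Land-Grant data sets.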

What do you see as the most significant challenges in deploying AI in production, and how do you suggest overcoming them?

Probably money and expertise. Money to run the GPUs: we are currently spending around $600-$800 a month to run the GPUs and other supporting technology for our LLM and RAG pipeline. Part of me feels like this may get cheaper over time as GPUs (potentially) get less expensive, but we also anticipate our data sets will continue to grow and the number of users will grow as well, so I'm not sure any cost savings from tech shifts will outpace our data and usage growth rates. Expertise is the other factor. I mentioned luck earlier; I feel like we have gotten lucky with the contractor we work with and the high level of engineering that relationship has given us access to. We are a non-profit, and the particular contractor we work with has a passion for the Cooperative Extension Service and its work, so we don't pay anywhere near what we might pay another contractor who is just "in it for the money".

How do you stay updated with the rapid advancements in AI technology, and what resources would you recommend to others in the field?

This is a tough one for me. I'm not a big reader, and in fact I often have to limit my intake of new or cutting-edge technology simply because I don't have the bandwidth to take it all in on a daily, or even weekly, basis. My effort and energy go into trying to make real tools work for real people, and by the time the week is up I don't have much time left for reading about the newest whiz-bang AI model or website. There is so much hype and press around AI right now that you don't have to try hard to find out which technologies are actually making a real impact. All of that said, I do subscribe to https://towardsdatascience.com/ and go to a couple of conferences a year, though not necessarily AI-specific conferences. There are two other conferences I am involved with this year, and while neither is focused on AI, both have a significant number of talks related to it.

What future trends in AI are you most excited about, and how do you believe they will shape the industry?

I'm probably most excited about the tools and services I think we will see for building your own GPTs, what I like to call "domain-specific" GPTs or chatbots, and the off-the-shelf tools that are going to allow non-developers to build their own domain-specific chatbots. I see this as a sort of poor man's RAG pipeline. ChatGPT recently released something like this that allows users to upload documents and have the tool respond only with answers based on those uploaded documents. And I think Microsoft has something similar where you can give it a root domain (URL) on the web and it will limit responses to the information at that URL.

What advice would you give to product teams and developers just starting with AI projects?

Talk to your customers and stakeholders and try to identify specific use cases, then build something that is useful and works for those people. Also, less is more, or KISS (keep it simple, stupid). Build a tool that does one or two things really well instead of trying to build something that does 20 things poorly. I was recently at a conference and saw a production demo of Microsoft Copilot, and I was really blown away by how comprehensive it is, but it also feels a bit bloated. That seems like a very Microsoft approach, doesn't it? And that's not to say it's wrong: Microsoft has a strong enough user base that it can take an "everything but the kitchen sink" approach and see what sticks. But for the rest of us, I think we need to be more focused on specific use cases and implementations.

How do you approach ethical considerations when developing and deploying AI solutions?

My approach is generally to be as transparent as possible about all aspects of a project, from data gathering (source data) to engineering. I think if you can "say it out loud" and look yourself in the mirror, then you can probably look a client or stakeholder in the eye and tell them the same thing.

What role do you believe collaboration plays in advancing AI technology, and how do you foster it within your teams?

I do a lot of this in my position. I work for a non-profit that serves the Cooperative Extension Service and Land-Grant Institutions across the US. Our projects are grant funded, so collaboration is literally a requirement of my job, given the funding we receive from the USDA and other federal grants. Generally, there is already a culture of collaboration built into the organizations I interact with. That said, I approach interactions with a genuine curiosity about other organizations: what they do, how they work, and how I or my organization can help them do their work better. I am all about relationships and making a personal connection with people before diving into the technical details of a project or system. I will almost never remember someone's name, but I will always remember where you are from or where you grew up; we just have to have a substantive conversation before I can find those kinds of details out.

In your experience, what are the key factors that contribute to the successful integration of AI into existing business processes?

This is really difficult; it's the million-dollar question everyone is trying to figure out right now. I think I touched on this earlier, but I like a less-is-more approach: find one or two areas where you think the addition of AI might work and start there. I also like the fail-fast approach. Again, find an area where you think AI might be beneficial, implement it in a small way, evaluate its effectiveness, and then pivot to change, enhance, or expand its use in that area (or others) based on what you see and hear from the people using the tools.