TATA ServiceSage: A Gen AI–Based RCA Chat Assistant
    Video length is 31:07

    Bhakti Kalghatgi, Tata Motors
    Shubham Gupta, Tata Motors

    The AI-powered ServiceSage chatbot for Tata Motors vehicles harnesses advanced natural language processing (NLP) and generative AI technologies to transform vehicle diagnostics and maintenance. Built on a retrieval-augmented generation (RAG) architecture, this intelligent system integrates large language models (LLMs) to efficiently extract and interpret complex information from service manuals. This approach enables ServiceSage to offer dynamic diagnostic capabilities, enabling precise, real-time query resolution.

    ServiceSage retains chat history for contextual continuity and leverages continuous learning algorithms to evolve based on user feedback, ensuring that the system adapts and improves over time. Designed to enhance operational efficiency, the system significantly improves diagnostic accuracy, reduces repair times, and elevates the overall service experience for both technicians and vehicle owners. By offering up-to-date, context-aware, and accurate support, ServiceSage sets a new standard for automotive service operations.

    Published: 22 Dec 2024

    Thanks for that introduction. To start with, I'm Bhakti, from Tata Motors. We will be showcasing one of the applications we have worked on, as a prototype.

    We are working in multiple areas of artificial intelligence, one of them being generative AI: how it can be utilized, and how we can leverage its benefits so that many of the issues or troubles we are facing can be minimized. This is just one of the proofs of concept we have demonstrated.

    So it's ServiceSage. This is the agenda. Before we get to the application, we will first go through the different automobile challenges in diagnostics that we have. Then, what is generative AI?

    As the poll just showed, a few people are aware of AI, and a few have hands-on experience. So to bridge that gap, we will go through generative AI and how it is different from, say, conventional AI.

    Then, what is RAG, or Retrieval-Augmented Generation? We'll give a brief about that and how we have used it in our workflow, then an overall demo of the workflow and the key features we implemented.

    To start with, automotive diagnostics and services. As you all know, diagnostics is troubleshooting the issues faced by customers. It has also evolved, from the previous age where vehicles were mechanically driven with a small number of controllers, to today, where the number of controllers in a vehicle keeps increasing.

    Traditionally, a mechanical assistant used to do the servicing and all of that. But as the systems evolved, there has been an evolution in the trends and in the way servicing or diagnostics needs to be done. Still, there are certain issues being faced while working in the service industry.

    As an OEM, when we sell a vehicle and go through the service network, what we found was that people face troubleshooting as a complex methodology: they struggle when multiple faults are present, and because of the increase in software, the complexity has definitely increased.

    Right now we have a diagnostic manual; it's a document-based root cause analysis, you could say. A technician actually needs to go through multiple pages, which makes the task very cumbersome. Because of this, it becomes a time-consuming task, and a vehicle takes a longer time for servicing.

    It has also been observed that there is a knowledge gap. A talented or experienced service person or technician can resolve an issue at a faster pace than a new technician handling it, and that creates a knowledge gap. These issues combine, and finally they lead to delays; ultimately, the customer is the one who suffers.

    Now, we are in an age where so much evolution is happening across multiple processes, with generative AI, and we are talking about the cloud and so on. So why can't we utilize this to resolve these issues? Is there any solution to this?

    Yes, there are certain options. For example, for complex troubleshooting, we can consolidate the multiple documents, or we can have cross-referencing done. If a manual lookup is needed, can we make it a better, smarter search, which will actually save the technician's time and resolve the delays? Centralized documentation will also help: multiple resources will be available in one place.

    Overall, because of this, even a new technician can go through a properly guided manual or app, independent of the person's experience. It will serve as an overall one-stop platform, working across multiple vehicles. A single application: that is the requirement we found.

    With all this in mind, we thought: can we have something done? The solution we came up with is a desktop-based app built on a generative AI platform. In the subsequent slides, we will tell you what generative AI is, what the workflow of this app is, and how we developed it.

    It's a desktop-based app developed as a prototype. So how can we use generative AI to our advantage? Going through the background, it starts with: what is generative AI? I will give a brief overview of the concept.

    Even if you Google it right now, you will get a brief answer about what generative AI does: it generates text, images, videos, designs, and different patterns based on the learning it has done. And not only does it learn from existing content; it generates something new: new text, new images, new videos. Combined, these capabilities can help you in multiple different areas.

    Then the question comes: how are these generative AIs built, and how do they work? They are actually built on transformers and large language models, what we call LLMs, and they use the context in the data to generate relevant outputs.

    They work on the principle of neural networks: they identify existing patterns and, upon those, generate new patterns. What are the capabilities? As I was explaining, generative AI can help with text generation, images and videos, automating manuals, and predictive analysis.

    So how is this different from traditional AI? Traditional AI, which includes machine learning and basic artificial intelligence, deals with the patterns present in a data set, but it does not have the capability to generate new data. Plus, it requires significant training or retraining to incorporate new information, so we need to update those models as and when required. That gives traditional AI its limitations and gives generative AI an advantage.

    Since we are talking about generative AI, there are multiple use cases in the automotive industry. First is chatbots. What is a chatbot? It's a realistic user interface where we can ask a question and it gives us an answer. A famous example is ChatGPT, one of the best-known chatbots that has been developed.

    Then we also have in-vehicle or in-car virtual assistants. This is useful, say, for voice control: turning on temperature settings, making a call to a person, understanding traffic details, sending messages, and so on.

    The next application can be synthetic data generation, where it can be used to generate different images and videos of traffic signals, and we can utilize them to build new apps. For example, this is useful for autonomous vehicles, where this data is used as a validation data set and for training so that the autonomous vehicle can run.

    Next, we can also utilize generative AI in design and prototyping, where multiple design options are generated and the best can be selected based on your requirements. We also use generative AI in simulation and virtual prototyping; for example, based on a GenAI solution, we can have vehicle performance tested under diverse conditions to make the testing more robust. It has also been observed that it can be used in production use cases, on assembly lines, to optimize many things and increase overall productivity.

    So we have covered what GenAI is, how it is different from traditional AI, and its multiple use cases. I would like to take it ahead with an introduction to retrieval-augmented generation, so I will ask Shubham to take it from here.

    OK. Thank you, Bhakti. Good afternoon, everyone. My name is Shubham. Bhakti highlighted how GenAI is transforming the automotive industry; I will go deeper into how we can use it for industry-specific applications.

    You must be aware of one of the most widely used GenAI applications currently, one you all must be using: ChatGPT. ChatGPT is based on a general-purpose LLM, GPT-3.5 or GPT-4, trained on a very large data set. The challenge with such general-purpose LLMs is that we need huge computational power and frequent retraining of the models.

    So basically, ChatGPT-type general-purpose LLMs are used through prompt engineering: we provide an input, the prompt, which guides the LLM to form the output. And that output contains only generic information.

    GenAI has immense potential, but there are problems with using a general-purpose LLM directly. So the question arises: how can we use generative AI with company-specific data, where the data stays secure and we can still use it for the application?

    There are some limitations and challenges in using a general-purpose LLM: context and relevance, data privacy, and hallucination. The limitation on context and relevance arises from the lack of domain-specific training. If we want to train an LLM on domain-specific knowledge, we basically have to hand our data over to the LLM for training, which raises data security concerns; it means exposing sensitive data. So we can't provide sensitive data to an external LLM for training.

    The third one is hallucination. A general-purpose LLM can generate outputs that are believable but technically incorrect.

    So what's next? We want to bridge the gap between the LLM's general knowledge and the company-specific data. There is a lot of debate over whether fine-tuning the model is the more appropriate path, or whether RAG is. Starting with fine-tuning: that is the path where we retrain the model on our own data.

    That approach is very effective, but it requires a lot of cost and retraining of the model. Hence we go for the RAG approach, where we just require an input; this input is augmented with an external knowledge source and passed to the LLM, and the combination generates the output. That response contains the relevant information, which was not the case with a general-purpose LLM and plain prompt engineering.

    Next, I would like to go through the importance and features of RAG, that is, Retrieval-Augmented Generation. Using RAG has certain advantages. The first is domain-specific knowledge: by providing domain-specific knowledge, we get relevant outputs and relevant answers, and these answers come from the databases where we have provided all the documents and manuals.

    The second is enhanced contextual understanding. By combining the query with the relevant data, we get an optimized prompt, and this optimized prompt goes to the LLM, resulting in the desired output.

    The third is improved privacy. When deployed on secured premises, the data is handled locally, which is a more scalable solution and reduces data leakage. The fourth is improved response reliability: RAG is a better approach for the hallucination problem, so it gives more accurate answers and responses.

    The fifth is consistency of provided [AUDIO OUT]. By giving all users a centralized database, we get information that is uniform across the service for all the people [AUDIO OUT] application. And the last one is cost effectiveness and efficiency. The solution is effective and efficient because we are not training the whole model; we are taking just a small model of about two billion parameters and combining it with the external, domain-specific knowledge source, and it already gives better output compared to traditional AI and plain prompt engineering.

    Looking ahead, this is the architecture of RAG. I will explain it with a simple example. Suppose a user asks a question, a query, like: "Hey, how are you?" This sentence, the query, is converted into embeddings using an embedding model. These vector embeddings are then matched against the vector database, where we have already vectorized and stored the manuals.

    These vector embeddings and the documents in the database are matched using NLP techniques like cosine similarity, keyword matching, and so on. From that we get a context, and this context is combined with the query; that is the augmentation.

    This optimized prompt, containing the query and the context, is given to the LLM. Combined with the LLM's own knowledge, we get a response that is relevant, and we get the desired output. So for the question we asked, "How are you?", it will give us responses grounded in that context; an optimized response comes back from the LLM.
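
    To make that matching step concrete, here is a minimal MATLAB sketch of the retrieval idea. The embed function here is hypothetical, standing in for whichever embedding model is actually used; the point is only the cosine-similarity ranking.

        % Hypothetical embedding function mapping text to row vectors
        queryVec  = embed("How are you?");   % 1-by-d vector (illustrative)
        chunkVecs = embed(manualChunks);     % n-by-d matrix, one row per stored chunk

        % Cosine similarity between the query and every stored chunk
        sims = (chunkVecs * queryVec.') ./ ...
               (vecnorm(chunkVecs, 2, 2) * norm(queryVec));

        % Keep the best-matching chunks as context for the augmented prompt
        [~, idx] = maxk(sims, 3);
        context  = manualChunks(idx);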

    OK. For a better understanding, I would like to invite Kostov to talk more about how we implemented this workflow using the MATLAB tools.

    Thanks, Shubham and Bhakti, for talking about automated diagnostics as well as the GenAI introduction. Before jumping into how we established the workflow using MATLAB, I will walk you through four different ways you can work with LLM models using MATLAB.

    The first three are essentially copilots: you give a query, and as output you get some MATLAB code. These three options are used to improve productivity whenever you are working with MATLAB.

    First is the MATLAB AI Chat Playground, which is available on the MathWorks website, where you can enter a query and get some code. Second, you can leverage the MATLAB extension for GitHub Copilot: if you are already working with GitHub Copilot, it can generate the code for you, and you can focus more on problem-solving.

    Third is the customized MATLAB GPT. If you are already a user of ChatGPT and have questions about MATLAB (how to generate code, how to train a model, how to define different functions), instead of plain ChatGPT you can switch to the MATLAB GPT inside ChatGPT and put your query there. So it's the same technology through different platforms, with different fidelity.

    Now coming to our main, last option: using language models from within MATLAB. This is what we used while building this chatbot. It is available on GitHub; there is code and there are APIs through which you can call local as well as public LLM models and build your own chatbot.

    You can call the model, integrate it into MATLAB, build an app using App Designer, and after that deploy it, maybe as a standalone application or as a web app. We'll be talking more about this last option.

    Now I will recall the architecture Shubham mentioned and try to map how we established that architecture using MATLAB.

    First is data ingestion. Since it is RAG, we are dealing with a lot of PDF documents. So first we mapped those PDF documents into the MATLAB code; that's where document parsing comes into the picture. I have mentioned a couple of functions here, for example fileDatastore as well as readall, which normally help to deal with big data.
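
    As a rough sketch of that ingestion step (the folder path is illustrative; extractFileText is the Text Analytics Toolbox function for reading PDF text):

        % Point a datastore at the folder of service-manual PDFs
        fds = fileDatastore("manuals/*.pdf", "ReadFcn", @extractFileText);

        % Read every manual into memory as a cell array of strings
        manualText = readall(fds);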

    Once we have mapped the documents, the second step is data preprocessing. In the upper part of each bucket on the slide I have mentioned what we did, and in the lower part, which functions were used. Here we first tokenize the documents. Tokens are nothing but the collection of words, which are further used for removing some words, cleaning the data in textual format, et cetera; a small sketch follows.
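
    A minimal sketch of that tokenization and cleaning, assuming the manualText extracted in the ingestion step above (all functions from Text Analytics Toolbox):

        % Tokenize the raw text extracted from the PDFs
        documents = tokenizedDocument(string(manualText));

        % Typical cleaning steps before retrieval
        documents = erasePunctuation(documents);
        documents = removeStopWords(documents);
        documents = normalizeWords(documents, "Style", "lemma");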

    So first we tokenize the data extracted from the PDFs, then we apply some data-cleaning techniques. Third is setting up the LLM.

    On one hand, we have the tokenized data from the PDF documents; here I'm setting up the LLM model. Using the ollamaChat object, I'm calling a locally downloaded Ollama model; here we used Llama 3.1. Once that is established, I go for the query.
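
    A minimal sketch of that setup, assuming the "Large Language Models (LLMs) with MATLAB" add-on from GitHub is installed and Ollama is running locally with the model already pulled:

        % Connect to the locally hosted Llama 3.1 model through Ollama
        chat = ollamaChat("llama3.1");

        % Quick sanity check that the model responds
        reply = generate(chat, "Reply with one short sentence.");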

    So on one hand, I'm taking the inputs from the PDF; on the other hand, I'm taking the input from the user. For example: what is this P code? What is the issue behind, say, the brake jamming, and so on.

    I take that query on the other side, and again I apply tokenization using tokenizedDocument. And at the end comes retrieval. So first we imported the data from the documents, then we took the query from the user.

    Then we try to match, for example, which of the imported paragraphs is most similar to the query. I do ranking based on different similarity algorithms; for that we used BM25, BM25+, as well as the cosine similarity algorithm, to find the correlation between the query and the data imported from the PDF documents.
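
    As a sketch of that ranking step (bm25Similarity and cosineSimilarity are Text Analytics Toolbox functions; userQuery stands for whatever the technician typed):

        % Tokenize the user's query the same way as the documents
        queryTokens = tokenizedDocument(userQuery);

        % Score every manual paragraph against the query with BM25
        scores = bm25Similarity(documents, queryTokens);

        % Keep the top-ranked paragraphs as the retrieval context
        [~, top]    = maxk(scores, 5);
        contextDocs = documents(top);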

    These two inputs, the tokenized query as well as the tokenized documents, we give to the Ollama model, and as output we get the response: the resolution with respect to that query. This is how we established the workflow.
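
    A minimal sketch of that final step, combining the retrieved paragraphs with the query into one augmented prompt for the Ollama model (joinWords converts tokenized documents back to strings):

        % Assemble the augmented prompt: retrieved context plus user query
        contextText = joinWords(contextDocs);
        prompt = "Answer using only the context below." + newline + ...
                 strjoin(contextText, newline) + newline + ...
                 "Question: " + userQuery;

        % Ask the locally hosted model for a grounded answer
        answer = generate(chat, prompt);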

    Let me show you some of it; everything is built using the Text Analytics Toolbox within MATLAB. First, the data: I'm showing how we did it in MATLAB. This is a low-code, no-code app, normally called the Preprocess Text Data task. Without writing any code, you can import the data.

    You do some tokenization, you do some data cleaning, and since everything is interactive, you don't need to write any code. That's why we normally call it a no-code approach.

    Second, I'm also showing some lines of code. By implementing a few lines, you can do the tokenization, do the preprocessing, and apply some similarity algorithms. Since it's very minimal code, we call it low code. In this way, with this low-code, no-code approach, you can establish the RAG workflow in a short duration of time, which matters in the rapidly developing GenAI space. That is the main differentiator here, I would say.

    Now, everything is developed as a MATLAB script. The question is: how can you deploy it? How can you take it to the user so the user can work with it interactively?

    First it is a MATLAB script. Then we developed an app using MATLAB App Designer, integrated that code into the App Designer app, and released it as a MATLAB app. So currently, whoever has a MATLAB license will be able to use it.

    There are also some other options. The app interacts with your local database: once the MATLAB app is released, you can map it to the local folders where all your PDF documents are located. Currently it is a MATLAB app, but alternatively you can also release it as a Docker image.

    You can dockerize all the algorithms together and put them on premises or on the cloud. You can release it as a standalone application so that non-MATLAB users can use it. And you can also launch the app as a web app, which again can be hosted on premises or on a public or private cloud. This is how we took it from the desktop to deployment as a MATLAB web app.
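
    As a rough sketch of those packaging options (this assumes MATLAB Compiler is available, and "ServiceSage.mlapp" is an illustrative file name):

        % Build a standalone desktop application from the App Designer app
        compiler.build.standaloneApplication("ServiceSage.mlapp");

        % Or package it for MATLAB Web App Server as a web app archive
        compiler.build.webAppArchive("ServiceSage.mlapp");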

    Now, maybe show them: why don't you show the actual chatbot and how it functions?

    Your context about the RAG workflow using the MATLAB tools has set the perfect stage for the demo video. This is how the chatbot looks. The chatbot is developed with App Designer from MATLAB, and it can be used on the desktop, on your local laptop.

    This chatbot has some key features. We can dynamically access the manuals for different vehicle variants, and we can instantly switch between different versions of the manuals.

    The second thing is the chat history: on the left side we can see all the previous chats, the questions and answers that have been asked. The third is interactive feedback. There is a feedback system where we can rate the responses with a like or a dislike, and this like/dislike feedback on the responses can be used by the developers.

    The last one is the log files. All these chats and responses can be saved locally on our system, and the developers can use them to check the outputs and responses. Here you can see we are asking a question about a customer symptom, regarding the brake switch and the correlation between accelerator and brake, and based on that we get the responses.

    And you can see, look, the responses: we are getting different options for how it can be resolved. Based on that, we can also select Start a New Conversation to begin a new conversation, and also switch between the manuals.

    The next question asked is: what is the status for the P code 0480? You can see it retrieves the responses and gives the output based on the relevant context. After that, we can provide feedback, a like or a dislike, on it.

    This is how the window looks. All these logs are saved locally on our system in text form, and all the manuals we retrieve as context are also kept on the local systems, so there is no data leakage. The whole chatbot and feedback system we developed locally, on our secure infrastructure.

    Coming next, based on that demo, I would like to highlight the key features of this app, ServiceSage. The first is secure data: by storing and processing the data locally within the organization, we process all the information locally.

    The second is dynamic manual access: we can choose between different vehicle variants and instantly switch between the different manuals in real time. The third is chat history: all the interactions, queries and responses, are saved in the chat history, which retains context across multiple interactions.

    Last but not least, accurate root cause analysis: the interactions provide precise diagnostic insights into how to get to the root cause. And finally, fault-specific recommendations: it provides the different troubleshooting steps and insights for the question asked.

    So what is the potential impact we have achieved by developing and using this app? The chatbot we developed enhances the customer experience by reducing the time spent searching manuals manually, improves turnaround time for customers, and also improves service reliability because it minimizes errors.

    The third is that it also reduces vehicle downtime, since we are reducing the manual diagnostic effort. And it boosts and improves service operations, technician productivity, and competency. Coming to the end: ServiceSage is just a beginning. With further features and improvements, we can improve this diagnostic tool a lot. Thank you; we are open to queries.

    Yeah, thank you. Thank you, everyone, for listening to our session. Are there any questions?

    We have an interesting question: how are you handling hallucination in your responses?

    OK. Normally, hallucination starts with the documents. The documents should be very contextual; there should be very little noise. Whatever information we have should be correlated to the questions we are asking. Second is the LLM model: we should choose the right LLM model, one whose domain matches the documents we are feeding it.

    Third is tuning the LLM model. There are some hyperparameters, for example sampling top-p, sampling top-k, as well as temperature, which you need to tune in order to get the trade-off between creativity and exactness. So these are maybe three or four techniques. And in the worst case, if those techniques do not overcome the hallucination, the last option is fine-tuning or retraining the model with new contextual data sets.
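
    As an illustrative sketch of that tuning step (the argument names follow the llms-with-matlab ollamaChat interface and may differ between versions of the add-on, so treat them as an assumption):

        % Rebuild the chat object with more conservative sampling settings
        chat = ollamaChat("llama3.1", ...
            Temperature = 0.2, ...  % lower temperature: less creative, more exact
            TopP = 0.9, ...         % nucleus sampling cutoff over token probability
            TopK = 40);             % sample only from the 40 most likely tokens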

    Sure, thank you.

    So keep posting your questions in the chat. Sorry, there is a QR code; we'll be taking questions from there. Live Q&A is via scanning the QR code and posting your question, and you can find the QR code on the back side of your ID card.

    And we can also discuss after the session, in case you have any questions.

    Yeah.

    OK. Thank you. Thank you very much.

    Thank you. Thank you.