Generative AI/LLM Test: DITA vs. Unstructured Content
Posted: Thu Jun 22, 2023 1:20 pm
Hi All,
I've been thinking. If you plan to train a Large Language Model (LLM) on your content corpus, it seems intuitive that the results chatbots, generative AI (GAI) tools, and the like return will be better if that corpus is structured. Things like DITA structures, SEO tags, and similar metadata give the LLM more context for understanding the content, so it should return better results. However, in all my reading, I can't find any actual proof of this; everyone just takes it for granted. The trouble is that the people with the purse strings, the ones who fund projects and staffing, don't necessarily find this supposition so intuitive. There's a certain level of magical thinking: "Just point the engines at our content. They'll figure it out."
So I'd like to run a test. To that end, I have two questions:
- Does anyone see any flaws in the methodology of the test (details below)?
- Would anyone like to work on this with me?
Test details:
Goal:
Determine the impact of structured content on LLMs. Does structure improve the accuracy of the results?
Methodology:
1. Take a large, public body of knowledge and format it two ways: structured and unstructured.
2. Post both using different URL structures to enable separate testing on each.
3. Train two separate instances of the same LLM, one on each doc set, keeping the instances completely separate.
4. Generate a set of questions to test the accuracy of the models.
5. Run the exact same set of questions on both models using separate accounts. Again, results must stay separate.
6. Compare the results to determine which model answers more accurately (a rough harness for this is sketched after the list).
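
To make steps 4 through 6 concrete, here is a minimal sketch (in Python) of how the question run could be scripted. Everything in it is an assumption on my part: it presumes a questions.csv file with question and expected_answer columns, query_model is just a placeholder you would wire up to whatever interface your LLM platform actually exposes, and the substring grading is only an illustration, not a real scoring method.

# Minimal sketch for steps 4-6: run one question set against one model instance
# and score the answers. Run it twice, once per model/account, to keep results separate.
# NOTE: query_model below is a placeholder; connect it to your actual LLM interface.

import csv
import sys

def query_model(question: str) -> str:
    """Placeholder: replace with a real call to the model instance under test."""
    raise NotImplementedError("Connect this to the structured- or unstructured-corpus model.")

def grade(answer: str, expected: str) -> bool:
    """Naive check: does the answer contain the expected key fact?
    Real scoring would need human raters or a rubric."""
    return expected.lower() in answer.lower()

def run_eval(questions_file: str) -> None:
    """Ask every question in the CSV (columns: question, expected_answer) and report accuracy."""
    correct = total = 0
    with open(questions_file, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if grade(query_model(row["question"]), row["expected_answer"]):
                correct += 1
    print(f"{correct}/{total} answers contained the expected fact")

if __name__ == "__main__":
    run_eval(sys.argv[1] if len(sys.argv) > 1 else "questions.csv")

You would run the script twice, once against each trained instance from its own account, then compare the two accuracy numbers.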
Thoughts? I'd love to see what people think!
Best,
Claudette