2024/2025 SCGG Season
August 13 2022
New SCGG season kick-off meeting. Member introductions and discussion: What is a knowledge graph (initial discussion to be continued in the next session Recordings here: viewtopic.php?t=18
2022/2023 SCGG Season Archive
June 2022
We have a fantastic session planned for this month’s semantic content graph guild meeting. Heather Hedden has assembled a thorough overview of fundamentals of semantic technologies. Of course, the session will be recorded, but I suggest everyone try to make this one as I’ve not seen a more comprehensive overview from soup-to-nuts (are those ontology concepts? LOL) that will level set all of us going forward. Heather has a skillful knack for making complex concepts simple, clear, and thorough!
Separate zoom invitations have gone out for this session. If you are not a guild member and would like to attend please contact me here or via LinkedIn. Michael Iantosca
August 2022
Microsoft Docs Knowledge Graph - with Dana Bublitz
10AM EST on Friday, August 19
Started in 2015, Microsoft Docs is one of the largest online documentation resources available. With over 3 million pieces of English content alone and over 4000 contributors to our content, we’ve grown to include not only technical documentation for all Microsoft products but are also increasingly adding educational and training content to our site. To support the exponential growth of our content and to help all our audiences find the right information at the right time, the Information Architecture Team at Microsoft Docs has been developing a knowledge graph to power our site experience and content discoverability. Dana will talk about the journey from managing content using only a few disorganized lists to a more structured ontology that will support the Docs knowledge graph.
September 2022
From DITA to Ontologies and onto Molecular Information Snippets
Vishal Palliyathu
11AM EST on Friday, September 16
A documentation library is a mandatory prerequisite for products. However, customers often use a solution that might include multiple products from multiple products and even companies. Customers would need to go and look at multiple documentation sources to understand the various features in a solution topology. They would have to switch between various enterprise products, their variants, and multiple releases, to make sense of what’s happening. This solution proposes an idea where information can be modularised and be made accessible as API calls. Customers can hence, choose to design their own documentation solutions by subscribing to only the features they are using for a product, and they can choose to subscribe for content spanning multiple products. Customers can also choose to subscribe to these APIs, if they choose to link it with their product events. Behind the scenes, a DITA-derived Ontology is at work.
October 2022
Graph Databases
Ontotext - Peio Popov
11AM EST on Thursday, October 20th
We’re fortune to have the team from Ontotext to come and educate us on the role and purpose of knowledge graph databases. A graph database is a semantic database specifically designed to create, manage, and build productive knowledge graph-based applications, at scale. Ontotext’s GraphDB is based on popular open standards including RDF and the SPARQL query language. Whereas platforms such as PoolParty enable us to create and manage taxonomies and ontologies, a graph database can create knowledge graphs at scale and the two can complement one another if based on the same open standards.
January 2023
Big Content and Knowledge Graphs (Live Demo!)
Topic: Semantic Content Graph Guild – January 2023 Meeting
Presenters:
Helmut Nagy, COO, Semantic Web Company
Michael Iantosca, Senior Director of Content Platforms, Avalara
Date: January 20, 2023 at 11:00 AM US Eastern Time (US and Canada)
For our January session of the SCGG, we’re thrilled to have Helmut Nagy (along with your truly) come present Big Content and Knowledge Graphs. Avalara, in collaboration with Semantic Web Company, launched a joint project to make Knowledge Graphs real and tangible for Big Content. Enough talk and blue sky – let’s see the goods! We’ve all been talking about it for months – and there have been scant working models to demonstrate the incredible power that awaits us. In this session, Helmut Nagy, Chief Operating Office of Semantic Web Company (provider of PoolParty), will present an overview of the pilot project that has been underway for several months. The project is based on generating a usable knowledge graph built on a standard DITA content corpus and modeled on a DITA-based ontology along with semantically enhanced content and using a knowledge graph database. Michael will describe the project goals and Helmut will explain the project implementation approach and best of all – A LIVE DEMO of a knowledge graph in action. Helmut presented a version of this session at the Knowledge World conference held in Washington D.C. in November. While still a work in progress, the project has since progressed further.
February 2023
iiRDS Ovverview(Live Demo!)
Topic: iiRDS Semantic Unification Standard
Presenters: Dr. Harald Stadlbauer
Date: February 17th 2023 at 11:00 AM US Eastern Time (US and Canada)
iiRDS – The International Standard for Intelligent Information Request and Delivery. It is especially interesting as a potential taxonomic layer in the Semantic Content Maturity Model that we’ve put forth to enable advanced content-as-a-service models driven by knowledge graphs. iiRDS provides a standard that appears to help bridge the semantic gaps between uniting internal content silos, interchange with partner content sources, and standardizing semantic to enable event-driven content retrieval, such as mining “Big Content” repositories for precision answers based on chatbot queries or dynamic inbound signals from users of software applications. I am sure we’re going to have lots of questions for Dr. Dr. Stadlbauer
March 2023
Practical ChatGPT and other large language models
Topic: A technical how-to discussion with a working PoC (and warnings)
Presenters: Michael Iantosca, Ashwinkumar Sharma, Avalara
Date: March 17th 2023 at 11:00 AM US Eastern Time (US and Canada)
There's a lot of chatter (excuse the pun) about ChatGPT these days. It has become a popular cocktail party topic for prognosticators, pundits, and soothsayers. As usual, the hype is off the charts. But how many have any depth and insights about the technology itself? It's important to understand what the technology can do, but even more important to understand its limitations so that we're better able to understand how to make it work for our needs. In this session, we'll go beyond the surface discussions. We'll show a working model that uses the OpenAPI and cover the issues and raise the questions few have yet to confront. Also, we'll discuss how the work on which this guild is focused can play a huge role in making these LLMs practical, reliable, and trustworthy. We don't have all of the answers yet, but we believe we're on the right track and that our work in semantic technology holds the key to the kingdom.
June 2023
Large Language Model Prompting
Presenter: Boris Horner
Date: June 23rd, 2023 at 11:00 AM US Eastern Time (US and Canada)
Abstract:
AI is rapidly disrupting many job definitions in nearly every industry, and this time it’s not belt workers being replaced by robots, but office jobs that are changing. Most people have ChatGPT in mind when they hear the acronyms „AI“ or „LLM“. But for sensitive data (intellectual property or personal information), training and running your own AI model(s) is inevitable. Training typically works with large sets of tupels of input and expected output. It’s not rocket science, but it’s nothing that can be done just „on the fly“ when a user wants some modified inference behaviour, because it requires high-end hardware for hours or days. On the other hand, instructing the AI in a chat works well (as anyone can try out in ChatGPT), but is limited by restrictions of token length and must be coped with by managing, summarizing and re-injecting the history of the chat.
Some complex transformation tasks (for example, applying terminology, or converting unsemantic HTML to semantic DITA) need substantial training data, others work well with an large model with substantial training on „basic“ linguistics. But often, users want to fine-tune the behaviour. For example, when converting an HTML cooking recipe to DITA, based on an LLM trained with technical documentation, you probably don’t want hazardstatements saying that knives are sharp and stoves are hot.
Most LLMs allow to not only send one request, but a dialog between the user and the LLM prefixing the actual request. Such „briefings“, either used as instructions for a generic, large model with good linguistics or a pre-trained one for a certain class of transformations, are normally short enough to fit into the maximum token length, together with the request. It is, however, too complicated for end users to design and prepend those dialogs manually every time they run a request. To standardize and simplify this step, we propose a solution that manages such prefix dialogs in an easy to use software tool. The briefing can be alternatively written in the source format as required by the LLM, o ras DITA and be automatically converted to the source format. These briefings can then be reused by mouse click and it’s possible to combine them to chains of single operations.
The approach we propose hides all the low-level syntax from the briefing author, hides the briefing from the end user and makes the per-use-briefing reusable.
Presenter Bio:
Dr. Boris Horner is the managing director of texolution GmbH (https://texolution.eu), a South German consulting and development service provider for advanced software solutions in technical information and related fields. He has an education as a physicist and a doctorate in mechanical engineering. After some years of experience as an employee in the fields of flow measurement and later IT, he started to work independently in the late 1990s. Since then, he was involved in a large number of projects in all major branches of industry, like mechanical and plant engineering, aerospace/defense, automotive and pharma/healthcare. Apart from consultancy and custom development, texolution also offers standard products, standalone or based on the open source CCMS Cinnamon (https://cinnamon-cms.com). Apart from that, major fields of interest are LLM and image creation AI, terminology management, ontology databases and DITA.
July 2023
Responsible AI based on LLMs
Presenters:
Andreas Blumauer - Founder and CEO of Semantic Web Company
Michael Iantosca – Senior Director of Content Platforms, Avalara Inc.
Date: July 14th, 2023 at 10:00 AM US Eastern Time (US and Canada)
Abstract:
After ChatGPT was made available to the public in November 2022, it became clear to even the biggest AI skeptics that a new era has now dawned and that AI will eventually enter many areas of life in the coming years.
ChatGPT attracted a lot of attention with its detailed answers covering a wide range of knowledge areas, but at the same time the call for responsible and explainable AI became louder again. Indeed, a notable drawback of generative AI, and large language models (LLMs) in particular, is their tendency to often generate superficial and inaccurate information that, moreover, does not provide any provenance information.
While it has become clear in recent months that generative AI and especially LLMs, on which ChatGPT is also based, are arguably fundamental building blocks of an enterprise-grade AI architecture, this needs to be complemented by other technologies and measures in order to speak of responsible AI. In particular, governance models and legal frameworks have yet to be put in place, as will be mandated, for example, by the EU in its forthcoming AI Act, to provide sufficient assurances to all stakeholders (investors, companies, citizens and consumers, etc.).
In this webinar, Michael Iantosca (Avalara) and Andreas Blumauer (Semantic Web Company) discuss the merging of LLMs and semantic technologies, in particular how knowledge graphs can be used in combination with services like ChatGPT to develop applications that combine the best of both worlds to lead to responsible, explainable generative AI.
Questions identified as critical, particularly for regulated industries, will focus on: What is the role of high-quality, well-structured training data? How should the 'human-in-the-loop (HITL)' design principle be evaluated in the context of LLMs? What AI applications for regulated industries, e.g. around ESG standards, are realistic in the immediate future?
Michael and Andreas have been working on the fusion of different AI, content and knowledge technologies for several decades, and at the end of this webinar still dare to look into the future to discuss and contrast different scenarios with more or less usable, respectively responsible AI.