The story behind Schematise and a use-case demonstration

Legal Tech, AI, Currently working · 16 Apr 2024

Table of Contents:

An overview of Schematise, as it currently stands (16th April, 2024).
Why have laws in XML?
A demonstration of compliance mapping using LegalRuleML

An overview of Schematise, as it currently stands (16th April, 2024).

I started the “Schematise” project at The Fifth Elephant’s Open Source AI Hackathon to generate laws in the XML format programmatically (for reasons explained below).

My intention was to speed up the annotation of laws so that a single user can do it in much less time. I say “single user” because I think making use of machine-readability in the current era of Legal-Tech development will require a single organisation or individual to adapt the code to their own requirements.

For example, before training a model on the generated dataset (using the Local Inference method, without OpenAI), the laws would first have to be expressed in a form that suits the training of that model.

Usage for dataset generation

For instance, when building a text-classification model trained on Indian laws, Schematise could help categorise which statements represent obligations, which represent prohibitions, and which fall under any of the other categories available in the LegalRuleML documentation.
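As a rough illustration of this kind of dataset generation, the sketch below walks a LegalRuleML fragment and emits one labelled row per deontic element. It is not Schematise's own code: the function name `deontic_rows`, the row layout, the sample statement and the namespace URIs are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; match them to the declarations in your own files.
NS = {
    "lrml": "http://docs.oasis-open.org/legalruleml/ns/v1.0/",
    "ruleml": "http://ruleml.org/spec",
}

# Deontic categories harvested as labels (a subset chosen for this sketch).
DEONTIC_LABELS = ("Obligation", "Prohibition", "Permission")

# Hypothetical sample annotation, made up for illustration only.
SAMPLE = """
<lrml:Statements xmlns:lrml="http://docs.oasis-open.org/legalruleml/ns/v1.0/"
                 xmlns:ruleml="http://ruleml.org/spec" key="textblock1">
  <lrml:PrescriptiveStatement key="paragraph1">
    <ruleml:Rule>
      <ruleml:then>
        <lrml:Obligation>
          <ruleml:Atom>
            <ruleml:Rel iri=":maintain"/>
            <ruleml:Ind>records</ruleml:Ind>
          </ruleml:Atom>
        </lrml:Obligation>
        <lrml:Prohibition>
          <ruleml:Atom>
            <ruleml:Rel iri=":permit"/>
            <ruleml:Ind>bias</ruleml:Ind>
          </ruleml:Atom>
        </lrml:Prohibition>
      </ruleml:then>
    </ruleml:Rule>
  </lrml:PrescriptiveStatement>
</lrml:Statements>
"""


def deontic_rows(xml_text):
    """Return one {source, text, label} row per deontic element found."""
    root = ET.fromstring(xml_text)
    rows = []
    for statement in root.iter(f"{{{NS['lrml']}}}PrescriptiveStatement"):
        for label in DEONTIC_LABELS:
            for node in statement.iter(f"{{{NS['lrml']}}}{label}"):
                rel = node.find(".//ruleml:Rel", NS)
                ind = node.find(".//ruleml:Ind", NS)
                # The relation IRI (minus its leading colon) serves as the verb.
                action = rel.get("iri", "").lstrip(":") if rel is not None else ""
                target = (ind.text or "").strip() if ind is not None else ""
                rows.append({
                    "source": statement.get("key"),
                    "text": f"{action} {target}".strip(),
                    "label": label,
                })
    return rows


if __name__ == "__main__":
    for row in deontic_rows(SAMPLE):
        print(row)
```

Rows in this shape can be written out as CSV or JSON Lines for a classifier. In practice the `text` field would more likely hold the original sentence of the law than the terse relation and object pair used here.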

Why have laws in XML?

Machine-readability (to put it simply).

Machine-readable laws solve a great many problems. As the name suggests, they let you express legal information in a way that enables:

Semantic annotation
Interoperability (say, across jurisdictions, or even across contexts)
Temporal management (a great language for point-in-time classifications, as IndianKanoon’s blogpost explains)

How does AkomaNtoso help?

A standard of “legal ontology” that is uniform across contexts.
It is an OASIS Open Standard, and its expansive documentation makes it easier to use.
It has been deliberated over a long period. One of the benefits is the set of examples available in the GitHub repositories [1][2], which can be used with attribution under the terms of their CC license.

Complianalyse (proof of concept)

‘Before there came “Schematise”, there was “Complianalyse”’

I started off the Fifth Elephant + Hasgeek’s Open Source AI Hackathon conceiving of an app that could generate compliance checklists from an XML format.

But since there was not yet an automated method to generate laws in XML, and in the interest of adhering to the UNIX philosophy of “doing one thing and doing it well”, I decided to first build Schematise.

Nevertheless, events took an interesting turn with the news of an advisory issued by the Ministry of Electronics and Information Technology under the Government of India. This presented a use-case for testing whether Schematise was generating XML content that could reasonably serve as the basis for a simple compliance-checklist app.

So I took the time to write a simple Python Flask app to test this.

Here is a brief overview of the development process:

I first generated and edited the XML content from Schematise using the Local Workflow.

For example, a paragraph such as,

All intermediaries or platforms to ensure that their computer resource do not permit any bias or discrimination or threaten the integrity of the electoral process including via the use of Artificial Intelligence model(s)/LLM/Generative AI, software(s) or algorithm(s).

would get represented in XML as,

<lrml:Statements key="textblock2">
    <lrml:PrescriptiveStatement key="paragraph3">
        <ruleml:Rule>
            <ruleml:if>
                <ruleml:And>
                    <ruleml:Atom keyref=":intermediaryORplatform">
                        <ruleml:Var>Intermediary</ruleml:Var>
                        <ruleml:Var>Platform</ruleml:Var>
                    </ruleml:Atom>
                    <ruleml:Atom keyref=":useAIModelLLMGenerativeAISoftwareAlgorithm"/>
                    <ruleml:Atom keyref=":CResource"/>
                </ruleml:And>
            </ruleml:if>
            <ruleml:then>
                <lrml:SuborderList>
                    <lrml:Prohibition>
                        <ruleml:Atom>
                            <ruleml:Rel iri=":permit"/>
                            <ruleml:Ind>bias</ruleml:Ind>
                        </ruleml:Atom>
                    </lrml:Prohibition>
                    <lrml:Prohibition>
                        <ruleml:Atom>
                            <ruleml:Rel iri=":permit"/>
                            <ruleml:Ind>discrimination</ruleml:Ind>
                        </ruleml:Atom>
                    </lrml:Prohibition>
                    <lrml:Prohibition>
                        <ruleml:Atom>
                            <ruleml:Rel iri=":threaten"/>
                            <ruleml:Ind>integrity of electoral process</ruleml:Ind>
                        </ruleml:Atom>
                    </lrml:Prohibition>
                </lrml:SuborderList>
            </ruleml:then>
        </ruleml:Rule>
    </lrml:PrescriptiveStatement>
</lrml:Statements>

Thereafter, I added code to let users select the paragraphs for which they would like to check their compliance obligations. I also added an option, using JavaScript, to view the paragraph of the advisory while hovering over the selection. Like all other coding in this project, this was done with the help of LLMs.
Lastly, I added backend modules in Python to programmatically parse the generated XML and produce compliance items based on the conditions. You can view this code, with comments explaining how it parses the XML in the format in which I annotated the law, at this GitHub link.
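To give a feel for the parsing step, here is a minimal sketch of how a checklist could be derived from an annotation shaped like the one above. It is not the code in the repository: the function `build_checklist`, the `user_facts` argument, the "Do not ..." phrasing and the namespace URIs are assumptions made for this example, and the fragment is a trimmed copy of the XML shown earlier with namespace declarations added so it parses on its own.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the fragment in the post does not declare any.
LRML = "http://docs.oasis-open.org/legalruleml/ns/v1.0/"
RULEML = "http://ruleml.org/spec"
NS = {"lrml": LRML, "ruleml": RULEML}

# Trimmed copy of the annotated paragraph, wrapped with namespace declarations.
FRAGMENT = f"""
<lrml:Statements xmlns:lrml="{LRML}" xmlns:ruleml="{RULEML}" key="textblock2">
  <lrml:PrescriptiveStatement key="paragraph3">
    <ruleml:Rule>
      <ruleml:if>
        <ruleml:And>
          <ruleml:Atom keyref=":intermediaryORplatform"/>
          <ruleml:Atom keyref=":CResource"/>
        </ruleml:And>
      </ruleml:if>
      <ruleml:then>
        <lrml:SuborderList>
          <lrml:Prohibition>
            <ruleml:Atom>
              <ruleml:Rel iri=":permit"/>
              <ruleml:Ind>bias</ruleml:Ind>
            </ruleml:Atom>
          </lrml:Prohibition>
          <lrml:Prohibition>
            <ruleml:Atom>
              <ruleml:Rel iri=":threaten"/>
              <ruleml:Ind>integrity of electoral process</ruleml:Ind>
            </ruleml:Atom>
          </lrml:Prohibition>
        </lrml:SuborderList>
      </ruleml:then>
    </ruleml:Rule>
  </lrml:PrescriptiveStatement>
</lrml:Statements>
"""


def build_checklist(xml_text, selected_keys, user_facts):
    """Return checklist lines for selected paragraphs whose conditions all apply.

    selected_keys: paragraph keys the user ticked (hypothetical UI input).
    user_facts: condition names the user says apply to them (hypothetical).
    """
    root = ET.fromstring(xml_text)
    checklist = []
    for statement in root.iterfind(".//lrml:PrescriptiveStatement", NS):
        if statement.get("key") not in selected_keys:
            continue
        # Every keyref under <ruleml:if> is treated as a required condition.
        conditions = {
            atom.get("keyref").lstrip(":")
            for atom in statement.iterfind(".//ruleml:if//ruleml:Atom", NS)
            if atom.get("keyref")
        }
        if not conditions <= set(user_facts):
            continue
        # Each prohibition under <ruleml:then> becomes one checklist line.
        for prohibition in statement.iterfind(".//ruleml:then//lrml:Prohibition", NS):
            rel = prohibition.find(".//ruleml:Rel", NS)
            ind = prohibition.find(".//ruleml:Ind", NS)
            action = rel.get("iri", "").lstrip(":") if rel is not None else ""
            target = (ind.text or "").strip() if ind is not None else ""
            checklist.append(f"Do not {action} {target}".strip())
    return checklist


if __name__ == "__main__":
    facts = {"intermediaryORplatform", "CResource"}
    for line in build_checklist(FRAGMENT, {"paragraph3"}, facts):
        print("-", line)
```

Treating the `<ruleml:if>` atoms as a set that must be contained in the user's declared facts keeps the applicability check to a single comparison. A fuller version would also need to handle obligations, permissions and nested conditions.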