Malli, Data-Driven Schemas for Clojure/Script

Introduction

After over a year in incubation, we are proud to announce the first alpha-version of Malli, a new data validation and specification library for Clojure/Script. It provides unified tools for building data-driven schema systems covering schema definition, validation, (human) errors, value and schema transformation, value and schema generation, registries and much more.

It's an open system, aiming to be both fun to use and easy to extend. It has first-class APIs for both basic users and advanced/library developers. Using the lessons learned from building reitit, Malli is written from the ground up to be fast.

There is a long list of prior art, but special thanks to Plumatic Schema and clojure.spec for showing the way - and to Clojurists Together for funding to ship the lib out.

The simplest thing that works:

(require '[malli.core :as m])

(m/validate int? 1)
; => true

Specification Language

One of the key lessons learned from building successful multi-tenant, data-oriented and data-driven systems with Clojure is that we should aim to define our data models as literal/serializable data we can easily program with. This gives us big leverage, as we can round-trip our schemas to the database and to web clients for local validation or editing. The specification language should be expressive and easy to understand, also by business and domain people.

There is a great post by Valentin Waeselynck about the benefits of having a common definition language and deriving more or less everything from it.

Example of a recursive Malli Schema, describing an Order:

(def Order
  [:schema
   {:registry {"Country" [:map
                          [:name [:enum :FI :PO]]
                          [:neighbors [:vector [:ref "Country"]]]]
               "Burger" [:map
                         [:name string?]
                         [:description {:optional true} string?]
                         [:origin [:maybe "Country"]]
                         [:price pos-int?]]
               "OrderLine" [:map
                            [:burger "Burger"]
                            [:amount int?]]
               "Order" [:map
                        [:lines [:vector "OrderLine"]]
                        [:delivery [:map
                                    [:delivered boolean?]
                                    [:address [:map
                                               [:street string?]
                                               [:zip int?]
                                               [:country "Country"]]]]]]}}
   "Order"])

You don't have to know much about Malli's internals to be able to read that. And we didn't need any dependencies: it's just data and functions. You can play with the above schema in malli.io (the schema is serialized as a query parameter).

Example of transforming Order into DOT, the Graph Description Language:

(require '[malli.dot :as md])

(md/transform Order)
; => "digraph {...}"

Visualized using graphviz:

Defining Schemas

Malli defaults to a Reagent-style hiccup-syntax for defining schemas:

boolean?                               ;; type
[:tuple double? double?]               ;; [type & children]
[:enum {:title "Color"} "red" "black"] ;; [type properties & children]
[MyType {:my "props"} :my :children]   ;; reagent-style types

Internally, Schemas are represented as protocol instances. Schema syntax can be compiled into Schema protocol instances using the m/schema function, backed by a scoped Registry.

Using the default registry:

(m/schema [:or int? string?])
; => [:or int? string?]

Using an explicit registry:

(m/schema [:or int? string?] {:registry m/default-registry})
; => [:or int? string?]

Malli supports several types of registries: immutable, mutable, dynamic, lazy, local and composite.

Core protocols

For internal elegance, Malli is built using protocols, but unless you are creating library extensions, you don't have to know about those. For the curious, here are some of the most important ones:

(defprotocol IntoSchema
  (-into-schema [this properties children options] "creates a new schema instance"))

(defprotocol Schema
  (-type [this] "returns type of the schema")
  (-type-properties [this] "returns schema type properties")
  (-validator [this] "returns a predicate function that checks if the schema is valid")
  (-explainer [this path] "returns a function of `x in acc -> maybe errors` to explain the errors for invalid values")
  (-transformer [this transformer method options] "returns an interceptor map with :enter and :leave functions to transform the value for the given schema and method")
  (-walk [this walker path options] "walks the schema and its children")
  (-properties [this] "returns original schema properties")
  (-options [this] "returns original options")
  (-children [this] "returns schema children")
  (-form [this] "returns original form of the schema"))

(defprotocol MapSchema
  (-entries [this] "returns sequence of `key -val-schema` MapEntries"))

(defprotocol LensSchema
  (-keep [this] "returns truthy if schema contributes to value path")
  (-get [this key default] "returns schema at key")
  (-set [this key value] "returns a copy with key having new value"))

(defprotocol RefSchema
  (-ref [this] "returns the reference name")
  (-deref [this] "returns the referenced schema"))
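
As a rough illustration of what these protocols expose, the public wrapper functions in malli.core can be used to inspect a compiled Schema instance. This is a minimal sketch; the Color schema is only an example value, not something shipped with the library:

```clojure
(require '[malli.core :as m])

;; compile hiccup syntax into a Schema instance
(def Color (m/schema [:enum {:title "Color"} "red" "black"]))

(m/type Color)       ; => :enum
(m/properties Color) ; => {:title "Color"}
(m/children Color)   ; => ["red" "black"]
(m/form Color)       ; => [:enum {:title "Color"} "red" "black"]
```

Application code normally goes through these wrappers rather than calling the dashed protocol methods directly.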

Validation and Errors

The most common Schema application is runtime validation and for that, there is m/validate. Like most Malli core functions, it supports both Schema instances and raw Schema syntax - calling m/schema behind the scenes:

(m/validate (m/schema [:maybe string?]) "sheep")
; => true

(m/validate [:maybe string?] "sheep")
; => true
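
When the same schema is checked many times, it may be worth compiling the validator once with m/validator, which returns a plain predicate function. A small sketch (the name valid-name? is made up for this example):

```clojure
(require '[malli.core :as m])

;; compile the schema once, reuse the resulting predicate
(def valid-name? (m/validator [:maybe string?]))

(valid-name? "sheep") ; => true
(valid-name? nil)     ; => true
(valid-name? 42)      ; => false
```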

Below is an example of validating a closed map (maps are open by default) with the constraint that the two password fields must match. Malli integrates with sci to add support for serializable anonymous function schemas without needing macros or eval.

(def UserForm
  [:and
   [:map {:closed true}
    [:name string?]
    [:age pos-int?]
    [:password string?]
    [:password2 string?]]
   [:fn {:error/message "passwords must match"
         :error/path [:password2]}
    '(fn [{:keys [password password2]}]
       (= password password2))]])

(m/validate
  UserForm
  {:name "Liisa"
   :age 64
   :password "Liisa4"
   :password2 "Liisa444"})
; => false

To get more programmatic details about errors, there is m/explain, inspired by clojure.spec:

(-> UserForm
    (m/explain
      {:name "Liisa"
       :age "64"
       :password "Liisa4"
       :password2 "Liisa444"}))
;{:schema [:and
;          [:map {:closed true}
;           [:name string?]
;           [:age pos-int?]
;           [:password string?]
;           [:password2 string?]]
;          [:fn {:error/path [:password2]
;                :error/message "passwords must match"}
;           (fn [{:keys [password password2]}]
;             (= password password2))]]
; :value {:name "Liisa"
;         :age "64"
;         :password "Liisa4"
;         :password2 "Liisa444"},
; :errors ({:path [0 :age]
;           :in [:age]
;           :schema pos-int?
;           :value "64"}
;          {:path [1],
;           :in [],
;           :schema [:fn {:error/path [:password2]
;                         :error/message "passwords must match"}
;                    (fn [{:keys [password password2]}] 
;                      (= password password2))]
;           :value {:name "Liisa"
;                   :age "64"
;                   :password "Liisa4"
;                   :password2 "Liisa444"}})}
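
Two details of m/explain are easy to miss in a large result like the one above: valid values produce nil, and each error carries an :in vector pointing into the value. A minimal sketch with a smaller, made-up Age schema:

```clojure
(require '[malli.core :as m])

(def Age [:map [:age pos-int?]])

;; valid values produce no explanation at all
(m/explain Age {:age 64})
; => nil

;; for invalid values, :in points into the offending part of the value
(->> (m/explain Age {:age "64"})
     :errors
     (map :in))
; => ([:age])
```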

Malli also provides humanized errors, supporting localization:

(require '[malli.error :as me])

(-> UserForm
    (m/explain
      {:name "Liisa"
       :age "64"
       :password "Liisa4"
       :password2 "Liisa444"})
    (me/humanize))
;{:age ["should be a positive int"]
; :password2 ["passwords must match"]}

... and ships with a spell checker for :map and :multi keys, thanks to Bruce Hauman's code originally found in spell-spec:

(-> UserForm
    (m/explain
      {:name "Liisa"
       :age "64"
       :passwordz "Liisa4"})
    (me/with-spell-checking)
    (me/humanize))
;{:age ["should be a positive int"]
; :passwordz ["should be spelled :password2 or :password"]}

Value Transformation

Runtime value transformation, a.k.a. coercion, is also important. Schemas should be used to derive the required transformations between different formats like JSON and EDN. Malli has a performant built-in two-way transformation engine, inspired by Plumatic Schema and the lessons learned from developing spec-tools.

(require '[malli.transform :as mt])

(m/decode [:set keyword?] ["kikka" "kukka"] (mt/string-transformer))
; => #{:kukka :kikka}

By default, the decoding and encoding logic is derived from schema types, but schema properties can be used too. Example of a bidirectional transformation:

(def LegacyString
  [:string {:decode/string '(fn [x] (str/lower-case (subs x 4)))
            :encode/string '(fn [x] (str/upper-case (str "HAL_" x)))}])

(m/decode LegacyString "HAL_KIKKA" (mt/string-transformer))
; => "kikka"

(m/encode LegacyString "kikka" (mt/string-transformer))
; => "HAL_KIKKA"

Transformation chains are composable. Below is an example where we decode data from JSON to EDN, stripping away all extra keys and applying default values:

(def Address
  [:map
   [:id string?]
   [:tags [:set keyword?]]
   [:address
    [:map
     [:street string?]
     [:city string?]
     [:zip {:default 33100} int?]
     [:lonlat [:tuple double? double?]]]]])

(m/decode
  Address
  {:id "Lillan",
   :EVIL "LYN"
   :tags ["coffee" "artesan" "garden"],
   :address {:street "Ahlmanintie 29"
             :DARK "ORKO"
             :city "Tampere"
             :lonlat [61.4858322 23.7854658]}}
  (mt/transformer
    (mt/strip-extra-keys-transformer)
    (mt/default-value-transformer)
    (mt/json-transformer)))
;{:id "Lillan",
; :tags #{:coffee :artesan :garden},
; :address {:street "Ahlmanintie 29"
;           :city "Tampere"
;           :zip 33100
;           :lonlat [61.4858322 23.7854658]}}

Thanks to the internal ahead-of-time resolver, value transformations are quite fast. They are currently 1-3 orders of magnitude faster than with clojure.spec + spec-tools, and we aim to make them even faster.
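
To benefit from that ahead-of-time work in application code, a decoder can be built once with m/decoder and then reused as a plain function. The sketch below also shows that decoding is best-effort: a value that cannot be transformed is returned unchanged, so validation is still a separate step. The name decode-tags is made up for the example:

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; resolve the transformation chain once, ahead of time
(def decode-tags
  (m/decoder [:set keyword?] (mt/string-transformer)))

(decode-tags ["kikka" "kukka"])
; => #{:kikka :kukka}

;; values that can be transformed are transformed ...
(m/decode int? "42" (mt/string-transformer))
; => 42

;; ... and values that can't are passed through as-is
(m/decode int? "abc" (mt/string-transformer))
; => "abc"
```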

Schema Transformation

One of the key aspects of Clojure is data-orientation. We build our values using immutable maps, sets, lists and vectors, without needing any predefined classes or schemas:

(def user
  {:name "tommi"
   :age 45
   :address {:street "Hämeenkatu"
             :country "Finland"}})

The Joy of Clojure:

(-> user
    (select-keys [:name :address])
    (assoc-in [:address :zip] 33100))
;{:name "tommi"
; :address {:street "Hämeenkatu"
;           :country "Finland"
;           :zip 33100}}

Malli aims to bring the same power to working with Schemas. Schema instances are immutable values and thanks to generic -walk, -get and -set protocol methods and helpers in malli.util, we can transform Schemas just like normal data.

(def User
  [:map
   [:name string?]
   [:age int?]
   [:address [:map
              [:street string?]
              [:country string?]]]])

(m/validate User user)
; => true

Transforming it like in the example above:

(require '[malli.util :as mu])

(-> User
    (mu/select-keys [:name :address])
    (mu/assoc-in [:address :zip] int?))
;[:map
; [:name string?]
; [:address [:map
;            [:street string?]
;            [:country string?]
;            [:zip int?]]]]

This still holds:

(m/validate
  (-> User
      (mu/select-keys [:name :address])
      (mu/assoc-in [:address :zip] int?))
  (-> user
      (select-keys [:name :address])
      (assoc-in [:address :zip] 33100)))
; => true

There are also utilities for schema merge, union, find-first, and pre- and postwalk. Below is an example of how to transform a Schema from hiccup-syntax to cljfx-style map-syntax:

(m/walk
  User
  (fn [schema _ children _]
    (let [properties (m/properties schema)]
      (cond-> {:type (m/type schema)}
              (seq properties) (assoc :properties properties)
              (seq children) (assoc :children children)))))
;{:type :map,
; :children [[:name nil {:type string?}]
;            [:age nil {:type int?}]
;            [:address nil {:type :map
;                           :children [[:street nil {:type string?}] 
;                                      [:country nil {:type string?}]]}]]}
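
Of the utilities listed above, merge is probably the one reached for most often. A minimal sketch, using two made-up map schemas and checking the merged result by validating against it:

```clojure
(require '[malli.core :as m]
         '[malli.util :as mu])

(def Named [:map [:name string?]])
(def Aged  [:map [:age int?]])

;; merge two map schemas into one that requires both keys
(def Person (mu/merge Named Aged))

(m/validate Person {:name "tommi", :age 45}) ; => true
(m/validate Person {:name "tommi"})          ; => false
```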

Malli to DOT and Malli to JSON Schema transformers are also available. To highlight the power of the transforming utilities, the DOT-transformer is just 69 lines of code.
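
A small sketch of the JSON Schema transformer, assuming the malli.json-schema namespace and a made-up two-key map schema. Only the parts of the result that matter for the point are shown, since the details of individual property schemas may differ between versions:

```clojure
(require '[malli.json-schema :as json-schema])

(def UserJson
  (json-schema/transform
    [:map
     [:name string?]
     [:age {:optional true} int?]]))

;; maps become JSON Schema objects; optional keys are left out of :required
(:type UserJson)     ; => "object"
(:required UserJson) ; => [:name]
```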

Value Generation

Value generation is a process of creating valid example values out of a Schema. Both Plumatic and Spec have had this for years and so does Malli. Thanks to test.check for all the heavy lifting:

(require '[malli.generator :as mg])

(mg/sample string? {:size 5, :seed 42})
; ("" "o" "1" "p" "gQl")

Schema properties can be used to control the value generation. These properties include :gen/fmap, :gen/elements, :gen/min, :gen/max and :gen/gen:

(mg/generate [:string {:gen/fmap '(partial str "kikka_")}])
; "kikka_WT3K0yax2"
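
Two more things that are handy in tests: :gen/elements restricts generation to a fixed set of values, and a fixed :seed makes generation repeatable. A minimal sketch (the Flavor schema is made up for the example):

```clojure
(require '[malli.core :as m]
         '[malli.generator :as mg])

(def Flavor [:string {:gen/elements ["kikka" "kukka"]}])

;; :gen/elements restricts generation to the listed values
(contains? #{"kikka" "kukka"} (mg/generate Flavor))
; => true

;; the same :seed and :size give the same value
(= (mg/generate string? {:seed 42, :size 5})
   (mg/generate string? {:seed 42, :size 5}))
; => true

;; generated values are valid against their schema
(every? (m/validator string?) (mg/sample string? {:seed 42, :size 10}))
; => true
```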

Random Orders coming your way:

(mg/generate Order {:size 5, :seed 42})
;{:lines [{:burger {:name "5n156", :origin nil, :price 1, :description ""}, :amount -3}
;         {:burger {:name "rxH0", :origin {:name :FI, :neighbors []}, :price 7}, :amount 0}
;         {:burger {:name "JU", :origin nil, :price 12, :description "Wa3"}, :amount -4}],
; :delivery {:delivered true, :address {:street "5", :zip -1, :country {:name :PO, :neighbors []}}}}

Schema Generation

Inspired by F# Type Providers, there are two built-in ways to generate Schemas from external data: inferring schemas and lazy/provider registries.

Inferring Schemas

Inferring schemas is the inverse of value generation. Given a set of values, a minimal schema that all values are valid against is returned. Inspired by spec-provider for Clojure Spec.

(require '[malli.provider :as mp])

(mp/provide [user])
;[:map
; [:name string?] 
; [:age int?] 
; [:address [:map 
;            [:street string?] 
;            [:country string?]]]]

The current implementation is naive (69 lines of code) but already quite useful.
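
A quick way to see the "all values are valid against it" property is to infer a schema from samples with differing keys and validate the samples against the result. The sample data is made up, and the exact form of the inferred schema is not shown because it may vary between versions:

```clojure
(require '[malli.core :as m]
         '[malli.provider :as mp])

(def samples
  [{:name "tommi", :age 45}
   {:name "liisa"}])

;; infer one schema covering every sample
(def Inferred (mp/provide samples))

(every? #(m/validate Inferred %) samples)
; => true
```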

Lazy/Provider Registries

Another way to generate schemas is to use a special lazy-registry that allows a callback-function to be used to return a schema given a registry key. There is a nice example that pulls out AWS CloudFormation Schemas on-demand and converts them into Malli Schemas.
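
A minimal sketch of the idea, assuming mr/lazy-registry takes a default registry and a provider function of the looked-up name and the registry. The external-schemas map stands in for a real external source such as CloudFormation, and the "example/Point" name is hypothetical:

```clojure
(require '[malli.core :as m]
         '[malli.registry :as mr])

;; hypothetical external source, keyed by registry name
(def external-schemas
  {"example/Point" [:map [:x int?] [:y int?]]})

(def registry
  (mr/lazy-registry
    (m/default-schemas)
    ;; called only for names the default registry doesn't know
    (fn [type registry]
      (some-> (external-schemas type)
              (m/schema {:registry registry})))))

(m/validate "example/Point" {:x 1, :y 2} {:registry registry})
; => true
```

In a real provider, the lookup would typically fetch and convert the external definition on first use.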

Beyond Runtime Validation

The primary goal of Malli has been to create a tool for communicating the structure of data, to be enforced at runtime. In JavaScript, it would be more like yup than TypeScript. But we don't have to stop there: Malli will have function schemas, it should provide pretty errors for developers, it most likely will integrate with Clojure Spec, definitely with clj-kondo, maybe even with Typed Clojure.

We could build providers for GraphQL, PostgreSQL and TypeScript to simplify interop in the real-world projects we build. Maybe infer schemas with HugSQL in the spirit of pgtyped?

A demo of function schemas with clj-kondo from my talk at ClojureD:

Malli vs Clojure.Spec

But wait, there is already clojure.spec, which is the supposed future of Clojure, why create a library doing about the same? Short answer: the libraries have different goals: Spec is not going to be a runtime transformation engine, period.

Spec builds around a global registry and strong ideologies around it; Malli builds on schemas and registries as values and aims to be pragmatic. The world is not perfect: it's riddled with nulls and optional keys, and we should embrace them.

You could use a single global registry in Malli, but you don't have to. You can build a strict Spec2 Schema/Select-style system with decomplected map keys and values with Malli, but you don't have to. Malli is just a library, putting the user in control of things.

Spec2 example schema with Malli:

(def registry
  {::street string?
   ::city string?
   ::state string?
   ::zip int?
   ::addr [:map ::street ::city ::state ::zip]
   ::id int?
   ::first string?
   ::last string?
   ::user [:map ::id ::first ::last ::addr]})

(def User
  [:schema {:registry registry}
   ::user])

(mg/generate User {:size 21, :seed 21})
;{:user/id -1669,
; :user/first "5qU1TP3",
; :user/last "mbQN36L",
; :user/addr {:user/street "W901"
;             :user/city "cA7AZkvLu0Dl6lXg"
;             :user/state ""
;             :user/zip 61730}}

I believe Spec and Malli can co-exist and can even be friends. Spec is an awesome tool for describing the core language and library APIs, and Malli the world we are living in.

It's Alpha-time!

As of today, the following artifact is available on Clojars:

;; Leiningen/Boot
[metosin/malli "0.1.0"]

;; Clojure CLI/deps.edn
metosin/malli {:mvn/version "0.1.0"}

Big thanks to all contributors, everyone at Metosin and to Clojurists Together for the final nudge to get the alpha version out. We are all excited and it has been a real joy to see the current design emerge from the original drafts.

Despite having been in pre-alpha, Malli has over 96k downloads and 459 stars on GitHub, and many people are already using it in production. All design decisions are openly discussed both in #malli Slack and in GitHub Issues. After this release, all changes will be properly tracked in the CHANGELOG.

What's Next?

There are still many weeks of the Clojurists Together funding left, which enables me to focus on Malli. The roadmap is mostly laid out as GitHub Issues, but here is some ongoing work:

We are using Malli actively in our new projects, and will continue to explore things we can do with it. Busy times, so help is welcome.

Getting Started

The simplest way to get started with Malli is to join #malli, read the README, try things in malli.io or use it via an external library, including reitit or gungnir. Google's version of how to pronounce Malli.

The coordinates:

You can discuss this post in Clojureverse.

Cheers,

Tommi