Karsten Schmidt on Nostr: Dreaming again about software as a graph. Instead of using the semantic units of ...
Dreaming again about software as a graph. Instead of using the semantic units of programming languages, our entire dev infra rests on hierarchies which should have zero relation to the internal structures/architectures of our code and which, like so much else, are the byproduct of legacy decisions. Despite this, 99% of modern PLs are still designed around these seemingly permanent legacy structures.
Early “web search” indices before Google (e.g. AltaVista, DMOZ, Yahoo) were all about putting links into hierarchies. Our entire modern software development architecture is (still) using the same model: hierarchical file systems, languages using files as the basic organizational unit (vs functions/classes/types), namespaces, and packages (aka virtualized folders) as containers, both to group functionality and to distribute it. Git repos are yet another level of hierarchy on top (although, monorepos aside, they usually sit at the same level as packages). It’s folders, not turtles, all the way down (and up)!
Caring about usability & maintenance, for years I’ve been struggling with this overall setup and having an increasingly hard time (and spending too much of it) figuring out _where_ to put (new & old) functionality: Should it be combined with or become part of existing packages, should it go in a new file/package, should it be (always) internal or public, should it exist at all (as a standalone unit)... This struggle, first stemming from 15 years of Java coding, largely motivated my adoption of Literate Programming (LP) during most of the 2010s (via #Clojure & #OrgMode), though it only helped with some aspects, and other people around me discouraged it...
Hierarchies lead to hard-to-undo systemic calcification of structures: structures/institutions which only make/made sense for a time, and maybe were a good pragmatic/useful solution at the time, but which should then be allowed to cease to exist, or at the very least should be softer and more open to change. To me software is NOT about upholding hierarchies, but about malleability, above all other concerns like reliability, reproducibility, security, etc. There’re hundreds of existing things I’d like to migrate/re-organize, but I can’t because it’d break hundreds of downstream projects, and I do care about others who’re using these libraries! For context, being by far the largest thi.ng meta-project, the https://thi.ng/umbrella monorepo contains 200 packages/libraries with a total of 4100+ standalone functions and ~2200 types/interfaces/classes. It’s getting ever harder to fight the existing hierarchies and I know I could drastically reduce these numbers if it weren’t for these enforced structures.
Apart from calcification, hierarchies also lead to other issues like duplication, (lack of) discoverability, competition (of responsibilities), paradox of choice, and increased maintenance efforts... All things I’d like to avoid in my work!
What I want instead (and have already started prototyping several times) is a distributed, graph-based version of:
- Content-addressable standalone semantic units of code (language agnostic). Any change immediately leads to a new version. Requires an alias system to make references & versions human readable.
- Property graph database of all code. Each node has typed links to dependencies (e.g. other functions), documentation (incl. example usage, references, research papers), bidirectional links to previous and next versions, and other metadata (author, license etc.).
- No files, no packages, only tags (for discovery)
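To make the three points above a bit more concrete, here is a minimal Python sketch. It is not any of the actual prototypes, and every name in it (`CodeGraph`, `put`, `linked`, the link type strings) is made up for illustration. As a simplification, a unit's ID is the SHA-256 hash of its source text only; a real system would presumably also fold the IDs of its resolved dependencies into the hash.

```python
import hashlib
from dataclasses import dataclass, field


def content_id(source: str) -> str:
    """Derive a node ID purely from the unit's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class CodeGraph:
    nodes: dict = field(default_factory=dict)    # id -> source text
    links: set = field(default_factory=set)      # (from_id, link_type, to_id)
    aliases: dict = field(default_factory=dict)  # readable name -> latest id
    tags: dict = field(default_factory=dict)     # tag -> set of ids

    def put(self, alias, source, deps=(), tags=()):
        """Store a unit. Changed source means a new ID, i.e. a new version."""
        node_id = content_id(source)
        self.nodes[node_id] = source
        for dep_alias in deps:
            # Dependencies are pinned to the ID the alias points at right now.
            self.links.add((node_id, "depends-on", self.aliases[dep_alias]))
        prev = self.aliases.get(alias)
        if prev is not None and prev != node_id:
            # Stored once, but can be followed in both directions (see below).
            self.links.add((node_id, "previous-version", prev))
        self.aliases[alias] = node_id
        for tag in tags:
            self.tags.setdefault(tag, set()).add(node_id)
        return node_id

    def linked(self, node_id, link_type):
        """Follow typed links forwards."""
        return {t for (f, lt, t) in self.links if f == node_id and lt == link_type}

    def linked_from(self, node_id, link_type):
        """Follow typed links backwards."""
        return {f for (f, lt, t) in self.links if t == node_id and lt == link_type}


g = CodeGraph()
v1 = g.put("clamp", "def clamp(x, lo, hi): return max(lo, min(x, hi))", tags=["math"])
c01 = g.put("clamp01", "def clamp01(x): return clamp(x, 0, 1)", deps=["clamp"], tags=["math"])
v2 = g.put("clamp", "def clamp(x, lo, hi): return min(hi, max(x, lo))", tags=["math"])
```

After the second `put` of `clamp`, the alias points at the new version, while `clamp01` still links to the exact version it was written against. In this sketch, that pinning is what would let units be re-organized without breaking downstream users.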
As an interim adaptation step to keep on using existing languages/infra, a form of LP-style “tangle” tool is required. This tool would linearize/dedupe the referenced subgraph(s) into traditional source files as a pre-build step and implicitly perform dead code elimination (anything not reachable from the referenced units is simply never emitted), which should also lead to much lower compile efforts/times...
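A rough sketch of what such a tangle step could look like, assuming the graph has already been reduced to two plain dicts: unit name to source text and unit name to dependency names. The function name `tangle` and this data layout are assumptions for illustration only. It walks depth-first from the entry points and emits every reachable unit exactly once, dependencies first.

```python
def tangle(units, deps, entry_points):
    """Linearize the subgraph reachable from entry_points into one source text.

    units: name -> source text, deps: name -> list of dependency names.
    Returns (source, order). Units never reached are left out entirely.
    """
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            # Already emitted (dedupe) or currently being visited (cycle).
            return
        seen.add(name)
        for dep in deps.get(name, ()):
            visit(dep)
        order.append(name)  # post-order: dependencies come first

    for name in entry_points:
        visit(name)
    return "\n\n".join(units[n] for n in order), order


units = {
    "clamp": "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))",
    "clamp01": "def clamp01(x):\n    return clamp(x, 0.0, 1.0)",
    "mix": "def mix(a, b, t):\n    return a + (b - a) * clamp01(t)",
    "unused": "def unused():\n    return 42",
}
deps = {"clamp01": ["clamp"], "mix": ["clamp01"]}

source, order = tangle(units, deps, ["mix", "clamp01"])
# order is ["clamp", "clamp01", "mix"]; "unused" never makes it into source
```

Note that the dead code elimination falls out of the traversal for free: there is no separate pruning pass, unreferenced units are just never visited.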
Using a graph approach, we can have much more advanced & useful dev tools, more easily produce visualizations to aid codebase & dependency analysis/maintenance, refactoring, etc.
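Two small examples of the kind of tooling this enables, again as a hedged sketch over a plain name-to-dependencies dict (the helper names `dependents` and `to_dot` are hypothetical): an impact query answering "what would be affected if I change this unit?", and an export to Graphviz DOT text for visualization.

```python
def dependents(deps, target):
    """All units that directly or transitively depend on target."""
    reverse = {}
    for unit, targets in deps.items():
        for t in targets:
            reverse.setdefault(t, set()).add(unit)
    found, stack = set(), [target]
    while stack:
        for user in reverse.get(stack.pop(), ()):
            if user not in found:
                found.add(user)
                stack.append(user)
    return found


def to_dot(deps):
    """Render the dependency edges as Graphviz DOT source."""
    lines = ["digraph deps {"]
    for unit in sorted(deps):
        for t in sorted(deps[unit]):
            lines.append(f'  "{unit}" -> "{t}";')
    lines.append("}")
    return "\n".join(lines)


deps = {"mix": ["clamp01"], "clamp01": ["clamp"], "fit": ["clamp", "mix"]}

affected = dependents(deps, "clamp")  # {"clamp01", "mix", "fit"}
dot = to_dot(deps)
```

The DOT text can be fed to Graphviz (e.g. `dot -Tsvg`) to draw the graph.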
#Software #Graph #OpenSource