Releases: morloc-project/morloc
Sockets and Shared Memory (10,000X speedup)
This release includes a full rewrite of the morloc backend. In prior releases,
every cross-language call would require writing all arguments to files on the
disk and calling a morloc-generated executable for the given language with the
arguments as temporary files. For interpreted languages like R and Python,
starting these executables would require initializing the interpreter at a cost
of ~300ms or ~50ms, respectively. So cross-language calls were very expensive.
This release replaces file-based communication with shared memory and cold calls
to executables with UNIX domain socket messages between daemons. Compiling a
morloc module creates a nexus executable that serves as the command-line
interface to the exported functions. The nexus accepts function arguments as raw
JSON, JSON files or MessagePack binary files. Calling a specific function will
first initialize a daemon for every language the morloc function uses. Each
daemon listens over a UNIX domain socket for commands (either from the nexus or
another pool). When the nexus or a language daemon makes a cross-language call,
arguments are converted to a generic binary form in a shared memory pool. The
relative pointers to these arguments are sent to the downstream daemon via a
message over a UNIX domain socket. The downstream daemon performs a computation,
writes the result back to the shared memory, and returns a message over the
socket telling the caller where to find the result. These messages also encode
error status, allowing error messages and possibly other metadata to propagate
between languages and ultimately back to the user.
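
To make the message flow above concrete, here is a minimal sketch in Python. It is not morloc's implementation: the message layout (`MSG`), the status codes, and the names `daemon` and `call_inc` are all assumptions made up for illustration, and the "daemon" is just a thread in the same process. It shows only the general pattern: the payload is written to shared memory, a small message carrying a status, an offset, and a length goes over a UNIX domain socket, and the reply says where to find the result.

```python
import socket
import struct
import threading
from multiprocessing import shared_memory

# Hypothetical message layout (an assumption, not morloc's wire format):
# 1-byte status, 8-byte offset into shared memory, 8-byte payload length.
MSG = struct.Struct("<BQQ")
OK, FAIL = 0, 1


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed before a full message arrived")
        data += chunk
    return data


def daemon(sock, shm_name):
    """Downstream daemon: increment each int64 found at the given offset."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        _, offset, length = MSG.unpack(recv_exact(sock, MSG.size))
        try:
            count = length // 8
            values = struct.unpack_from(f"<{count}q", shm.buf, offset)
            out_offset = offset + length  # write the result after the input
            struct.pack_into(f"<{count}q", shm.buf, out_offset,
                             *(v + 1 for v in values))
            reply = MSG.pack(OK, out_offset, length)
        except Exception:
            # The status byte lets the failure propagate back to the caller.
            reply = MSG.pack(FAIL, 0, 0)
        sock.sendall(reply)
    finally:
        shm.close()
        sock.close()


def call_inc(xs):
    """Caller side: place xs in shared memory, send its location, read the reply."""
    nbytes = 8 * len(xs)
    shm = shared_memory.SharedMemory(create=True, size=max(1, 2 * nbytes))
    # On Unix, socketpair() returns a connected pair of AF_UNIX sockets.
    caller, callee = socket.socketpair()
    worker = threading.Thread(target=daemon, args=(callee, shm.name))
    worker.start()
    try:
        struct.pack_into(f"<{len(xs)}q", shm.buf, 0, *xs)
        caller.sendall(MSG.pack(OK, 0, nbytes))
        status, offset, length = MSG.unpack(recv_exact(caller, MSG.size))
        if status != OK:
            raise RuntimeError("downstream daemon reported an error")
        return list(struct.unpack_from(f"<{length // 8}q", shm.buf, offset))
    finally:
        caller.close()
        worker.join()
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(call_inc([1, 2, 3]))
```

Only the fixed-size message crosses the socket; the list itself never does, which is why the per-call overhead stays small regardless of how the payload is produced.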
Cross-language communication now has a constant overhead of a few microseconds
needed to send a message over a socket, plus the time required to convert
argument data to and from generic binary forms in shared memory. As a simple test,
the morloc function map inc xs (where map is a C++ loop, inc is a Python
function that increments an integer, and xs is a list of integers) runs
at under 3 microseconds per integer. This is a ~10-20 thousand fold
improvement over the past cost of ~50ms per call to Python.
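
As a quick check of that ratio, the arithmetic can be written out using only the two figures quoted above (~50ms per cold call to Python before, under 3 microseconds per integer now). The variable names are just for illustration.

```python
old_cost_s = 50e-3  # ~50 ms per cold call to Python (figure from the text)
new_cost_s = 3e-6   # upper bound of ~3 microseconds per integer (from the text)

speedup = old_cost_s / new_cost_s
print(f"{speedup:,.0f}x")  # 16,667x
```

Since 3 microseconds is an upper bound on the new per-call cost, the real ratio is at least this large.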
None of these changes to the backend have any effect on the code the morloc
programmer writes. The type annotations that were added in the past are
sufficient for the morloc compiler to convert all types from all languages to
and from generic binary structures.
Better Typing
The main changes since the 0.43.0 release are the addition of typeclasses, basic value checking, and explicit function type parameters.
Typeclasses cannot yet be used as explicit constraints in function signatures, so their value in modeling data is limited. But at least we can now have a single function name, add, for both integers and doubles. Also, packing and unpacking have been re-implemented using a new Packable typeclass. This means we no longer need the special pack and unpack descriptors in signatures.
Value checking is important since morloc allows multiple definitions for one term. For example, it is legal to write:
x = 1
x = 2
This would not redefine x, as is done in many languages, but would rather associate both values with the variable name and attempt to disambiguate them later (which in the past implementation would have arbitrarily picked the last one). Now I have a very rudimentary value checker that will check for contradictions between primitives. It cannot descend past a source function call. In the future, I will need to extend the value checker to compare different sourced functions. This will likely have an LLM solution.
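
The idea of checking for contradictions between primitives can be sketched roughly as follows. This is a toy model in Python, not the morloc checker: the `Sourced` class, the `(name, value)` representation, and `find_contradictions` are all hypothetical. It only captures the behavior described above: literal primitives bound to the same name are compared, and anything behind a sourced function call is skipped because it cannot be inspected.

```python
from collections import defaultdict


class Sourced:
    """Hypothetical stand-in for a call to a sourced (foreign) function."""

    def __init__(self, name):
        self.name = name


PRIMITIVES = (bool, int, float, str)


def find_contradictions(definitions):
    """Return the names bound to more than one distinct primitive literal.

    definitions is a list of (name, value) pairs, where value is either a
    primitive literal or a Sourced call that the checker cannot look into.
    """
    literals = defaultdict(set)
    for name, value in definitions:
        if isinstance(value, PRIMITIVES):
            # Tag with the type name so that 1, 1.0 and True stay distinct.
            literals[name].add((type(value).__name__, value))
    return sorted(name for name, seen in literals.items() if len(seen) > 1)


definitions = [
    ("x", 1),
    ("x", 2),             # contradicts x = 1
    ("y", 3),
    ("y", 3),             # same value twice, no contradiction
    ("z", Sourced("f")),  # opaque, cannot be checked
    ("z", 4),
]
print(find_contradictions(definitions))  # ['x']
```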
Explicit function parameters are now added to function signatures to provide an order for the generic type variables. For example:
snd a b :: (a, b) -> b
This deviates from Haskell syntax, but clarifies the relationship between the morloc type signature and type signatures in other languages, such as the C++ prototype:
template <class A, class B>
B snd(tuple<A,B>);
This also allows us to conveniently refer to functions as parameterized types, e.g.: snd Int Bool. Possibly such type functions could be used in signatures as well. I will explore this later.
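
To illustrate why the parameter order matters, here is a small sketch of applying type arguments positionally, as in snd Int Bool. It is written in Python with a made-up representation (types as nested tuples, with `SND_PARAMS`, `SND_TYPE`, and `instantiate` as hypothetical names) and says nothing about how the morloc compiler actually does this.

```python
# Hypothetical encoding of `snd a b :: (a, b) -> b`:
# a type is either a name (str) or a tuple (constructor, arg1, arg2, ...).
SND_PARAMS = ("a", "b")
SND_TYPE = ("->", ("tuple", "a", "b"), "b")


def substitute(ty, env):
    """Replace type variables in ty using env, leaving constructors alone."""
    if isinstance(ty, str):
        return env.get(ty, ty)
    return (ty[0],) + tuple(substitute(arg, env) for arg in ty[1:])


def instantiate(params, signature, args):
    """Bind type arguments to type parameters by position."""
    if len(params) != len(args):
        raise ValueError("wrong number of type arguments")
    return substitute(signature, dict(zip(params, args)))


print(instantiate(SND_PARAMS, SND_TYPE, ("Int", "Bool")))
# ('->', ('tuple', 'Int', 'Bool'), 'Bool')
```

Without a declared order for a and b, there would be no unambiguous way to decide which of Int and Bool goes where.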
Nearly Useful
The full influenza case study, a re-implementation of OctoFlu, is supported in this release. This proves that morloc can be used to solve non-trivial problems. Here are the main advancements of this release:
- Infer all concrete types directly from the general type
- Allow file inputs rather than only raw JSON -- this allows large data sets to be processed without hitting argument size limits
- Better (though still far from good) debugging options and error messages
- Clean import/export system with wildcards
- Support for eta-reduction
- Many bug fixes and greatly extended test coverage
However, the language is still weak in many areas:
- No type classes
- No effect handling (e.g., exceptions, mutations, non-determinacy)
- Weak record/table/object support
- No pattern matching or sum types
- No binary operator support (I'm getting a little tired of writing add rather than +)
- Limited debugging features
- Limited language support
- Slow compile times (due to one specific issue in the frontend type system)
- Inefficient serialization scheme (uses JSON currently, should convert to some sort of remote procedure call system)
- No formal specification of the type-system -- the conversions from general to concrete to serial, the resolution of ambiguous trees, the propagation of types through segmentation, the threading of arguments -- all this is very involved but not yet mathematically defined. I am not confident that it is all sound.
- No shiny paladin salesperson, only a grumpy morlock who thinks only about problems
Where scoping
This release adds scoped where to morloc and fixes several subtle bugs in the typesystem.
Under the hood, the entire architecture has been refactored. Previously, concrete and abstract typechecking were done in a single step. Now the general types are inferred first, then the trees are disambiguated, and finally the concrete types are inferred. Also, typechecking is now done AFTER the raw expressions have been parsed and desugared into the set of ASTs that will be exported from a module. There are many, many more changes in the implementation. If you are curious, read the commit messages.
Pretty good typechecking and serialization
This release sets the foundation for morloc. Basic typechecking/inference, code generation, interoperability, and serialization are all working well. Finally morloc is sufficiently developed to be useful.
The main future goals break down as follows:
- Richer type system - typeclasses, "shapes", semantic types (probably use a logic engine like z3)
- Effect handling and error/warning propagation
- Optimization - all current optimization steps are basically stubs
- Doxygen-like documentation, caching, manifold hooks and such (see the last release)
- Improved build system
- Support for many more languages and a streamlined language onboarding process
- The MorlocIO package manager and community portal
- MorlocStudio
PyCon 2019
This release marks the version of morloc that was used in the poster presented at PyCon 2019 in Cleveland.
Minimal Haskell Prototype
This release presents a very simple Morloc prototype. It is mostly experimental and will change greatly in the future with no attempt to preserve backwards compatibility.
This prototype includes
- A simple, typed, functional scripting language
- A compiler to translate these scripts into RDF graphs and then executable code
- Simple type checking
- Support for Python and R
- A system for specifying language-specific types and transforming the data as needed
- Syntax for specifying type constraints
Pre-release of Haskell prototype
This prototype is (currently) much less sophisticated than the C prototype. However, the code is far more elegant and will serve as a more flexible foundation for future development.
It can currently run R code in a simple shell interface. For example:
> sum [1,2,3]
6
This passes the Morloc vector [1,2,3] into the R function sum and returns the result.
This pre-release is an experimental foundation for the Morloc language. The syntax and features will change wildly in the future with no attempt at maintaining backwards compatibility.
Final version of the C prototype
This is the final version of the C prototype.
The features are described in the README. Here is an overview:
- integrated R, Python, and Bash through a simple type system
- workflows are pull-based graphs
- explores the "manifold" template idea and multi-dimensional workflows
- compilation exposes all exported functions through the manifold nexus
- allows checks and effects to be added outside of the core workflow
This prototype will not be maintained in the future.