Tfw lit review turns up the cool paper you were planning to write.
I'm reminded of startup life, and that pang of existential dread, every time you learn of some other startup with a similar sounding headline or pitch
And it can floor you. But then you sleep, and realise all of the ways in which the ideas are sufficiently different; or that you might pivot; or adapt some other way...
I have no idea if this applies! But might the ideas:
- generalise?
- be re-contextualised laterally?
- or specialised further?
- etc?
@themanual4am At the least, I hope to learn from it and tell my story about its significance. For me, this was just one of many expressions of the same idea I want to explore from different perspectives, so in a sense this already serves my purposes. I'd probably be better off working on another angle.
if you're happy to share it, i'd be very interested/ curious to read
no immediate pressure; if you're processing, etc
@themanual4am Sure!
I might write a blog post capturing some of my thoughts, but I think I'll try to get a hold of the author first.
fascinating, i see this differently now (i don't know if that's a result of time and the integration of artefacts of this project into my 'core heuristics', or this paper specifically, being very readable)
does your intention intersect with the general approach or specific implementation? finite principals (cgat)? fully dynamic generative? NCA?
perhaps more importantly, would you prefer i didn't share until after you've written your piece?!
@themanual4am I think NCAs are very cool, and that the model of morphogenesis proposed by the original paper (https://distill.pub/2020/growing-ca/) is brilliant.
However, that paper overlooks an important layer of complexity. It directly learns the behavior the cells need in order to generate an image, which means the model must be retrained from scratch for each target image.
The follow-up paper shows that by adding an extra step, the same system can generate a wide range of target images, including ones it was never trained on. It learns a general language for describing target images and a method for translating that language into an image.
That ability of life to generate a useful system of constraints for itself, one that makes further adaptation relatively easy, is what inspires me. This paper is a good demonstration of exactly that phenomenon, and I'd like to say more about the biological significance.
I'd certainly welcome your ideas, too! I'm just trying to understand better.
Ok, some initial notes here (https://fragments.themanual.io/io-nate-nca-paper/), but pressing questions:
1. what is the relation between latent and embedding space for this kind of model?
2. when we 'generate an image, even one the model was not trained on', do we mean: given a new image retrospectively, iteratively interpolate from single pixel, using pre-existing pixel transformations?
@themanual4am It's very interesting to see your analysis, as always. :)
You understand NCA models very well, but let me address some of your uncertainties to make things clearer.
First off, both of these papers are models of morphogenesis where many cells each contribute to a whole by using only locally available information to change their own state (color). I think the purpose of this is mostly to explore the potential of computational problem solving using many locally cooperating agents rather than a global top-down system. It also explores robustness. Each cell can handle a range of noisy or atypical conditions and tries to reestablish a “normal” baseline.
NCAs are an early exploration of such a system. They still feel a bit like a toy. Generating emojis isn’t useful, but there are some practical applications for creative work, such as generating patterns, textures, and spatial structures for art, video games, or movies. There’s no killer use case beyond that yet, but who knows what’s next.
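To make that concrete, here is a minimal sketch (mine, not code from either paper) of what "locally cooperating agents" means computationally: every cell applies the exact same rule, and each application sees only a 3x3 neighborhood.

```python
import numpy as np

def step(grid, rule):
    """One synchronous update: each cell sees only its 3x3 neighborhood."""
    H, W = grid.shape
    new = np.zeros_like(grid)
    for y in range(H):
        for x in range(W):
            patch = grid[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            new[y, x] = rule(patch)  # one shared program, no global blueprint
    return new
```

Any global pattern the grid settles into is emergent: nothing inside `rule` ever sees the whole image.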
@themanual4am This is not a perfect model of morphogenesis. The resemblance lies in the idea that each cell has the same “instructions,” and it follows those instructions based on locally available context / signals to contribute part of a coherent whole, without a top-down “blueprint” to follow. But the virtual cells are not much like real cells, the virtual body isn’t much like a real body, and the process of evolution is replaced with gradient descent. There is no movement, there is no behavior beyond form, and there is no “external environment” except for the loss function.
Still, the image is the emergent product of many cells “behaving” autonomously according to a learned program. When the image produced by this system is “wrong,” the training process modifies the genetic “language” used to describe a target image and / or the way that language gets interpreted into cell behavior, such that the system as a whole is more likely to draw the right image in the future.
@themanual4am In the Manifold paper, the genetic language represents the latent space for generating an image. The language implicitly defines a space of possible behaviors for each cell, and thus a space of possible images those cells can produce collectively. If the system is trained just on smiley emojis, then it would not be able to generate a spider web emoji because that would be illegal / inexpressible in the genetic language.
Embedded in this manifold are behavioral programs that capture different features of the trained images. This is a form of deduplication, where generally useful “building block” behaviors (like drawing an eye) are reused as part of more complex behaviors (like drawing a face). They are given a symbolic “name” in the genetic language, so they are easy to find and reuse when the system is faced with a new target image.
@themanual4am What happens when an NCA sees a novel image? Here we see a big difference from the original paper. In that one, the system must be trained to grow the target image starting from a single pixel. In this new system, you could do the same thing, and it would be much faster: the genetic language provides an optimized search space. As long as the new emoji is expressible in that language, it should be quick to find.
However, the authors of the new paper took it one step further. They added an extra neural network that predicts the genetic encoding of any image. That’s not strictly necessary, but it means you can take a novel target image and immediately produce a genetic representation that approximately reproduces it, without any additional training.
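As a rough sketch of that extra network (my own toy layout, not the paper's actual architecture), it can be as simple as a small CNN mapping an RGBA image to a genotype vector:

```python
import torch
import torch.nn as nn

class GenotypeEncoder(nn.Module):
    """Guess the genetic encoding of an arbitrary target image."""
    def __init__(self, genome_size=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, genome_size),
        )

    def forward(self, image):  # image: (batch, 4, H, W) RGBA
        return self.net(image)

# genotype = encoder(novel_image)   # one forward pass, no retraining
# image = grow_nca(genotype)        # hypothetical NCA rollout on that genotype
```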
I think if there’s any further work to be done here, it probably lies in exploring how this design change affects learnability and evolutionary dynamics. Perhaps I can think of a good way to do that.
this is a fantastic breakdown Nate, thanks
i'll need to digest properly tomorrow, as oddly, while i understand what you're saying clearly (it's excellent), i can't seem to integrate it with the project context – in fact i've been unable to shift today (i suspect due to man-flu, which is rare for me, but interesting when things like this happen)!
an aside: it's not impossible to me that we experience some symptoms of some illness to force extended context downtime, for maintenance
just to extend that situation (because i think it's interesting)
i'd worked through the paper earlier yesterday (by shifting to integrate and simulate in the project context) – (ah, latent and embedding space!), but notes were too scrappy to send
today, i had the notes i could rewrite, but i couldn't shift to make sure the new phrasing was correct – this made me feel uncomfortable (plus i knew i couldn't recall some important details), but i felt it was ok/ enough for questions
hello,
i've added your comments here: https://fragments.themanual.io/io-nate-nca-paper/#nate
and some questions (and a sketch of some thoughts for me to refine if any of the questions are answerable), here: https://fragments.themanual.io/io-nate-nca-paper/#themanual4am
no presh! -- just fyi
@themanual4am Some basic clarification on NCAs:
Where is the branching? In the neural network weights. An NN is a learned function. It isn’t modeled with a tree structure, but with tensor math. You multiply inputs by weights, add a bias, then apply a non-linear operation (like relu) that acts as a soft branch: each neuron either passes its signal on or outputs zero.
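A two-line toy of that "branching" (the numbers are arbitrary): the same weighted sum either passes through or gets cut off at zero, depending on which side of the threshold it lands.

```python
import numpy as np

def neuron(x, w, b):
    pre = np.dot(x, w) + b       # weighted sum of inputs plus bias
    return np.maximum(pre, 0.0)  # relu: pass the signal through, or go silent

x = np.array([0.2, -1.3])
w = np.array([1.5, 0.7])
print(neuron(x, w, b=1.0))   # pre-activation 0.39 -> branch "on"
print(neuron(x, w, b=-2.0))  # pre-activation -2.61 -> branch "off" (0.0)
```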
How does feedback work? Gradient descent. Run the simulation for a while, compare the result to the target image, then use backpropagation to figure out how much each of the NN weights contributed to the error. Slightly nudge those weights in the right direction, then repeat.
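In code, that whole feedback loop looks roughly like this (a hedged PyTorch sketch; `nca_step`, `state`, `target`, and `params` are stand-ins of mine, not the papers' actual API):

```python
import torch

def train_iteration(nca_step, state, target, params, lr=1e-3):
    for _ in range(64):                 # run the simulation for a while
        state = nca_step(state)         # every step is differentiable
    loss = ((state[:, :4] - target) ** 2).mean()  # compare RGBA to target
    loss.backward()                     # assign blame to each weight
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad            # nudge weights, then repeat
            p.grad = None
    return loss.item()
```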
Can we get at the embeddings? Yeah, look at the “genetic engineering” experiments in the manifold paper. They’re trying to find and play with image features embedded in the genotype.
How does the manifold paper make genotypes for unseen images? It watches every time the NCA turns a genotype into an image and it trains a neural network to predict that. So, yeah, it’s just an educated guess based on previous examples.
@themanual4am I wouldn’t say the NCA is pattern matching its peers to choose its own color value. It's a little more complicated than that. Unlike a traditional CA, an NCA has continuous state. The pattern matching is fuzzy, and the state updates are incremental. The context for each cell is a “perception vector,” which represents the local gradients in colors and in “hidden channels” that represent concentrations of signaling molecules used to coordinate between cells. The cell multiplies these continuous numbers by learned weights to determine how to adjust its own state in the next time step.
Concrete example: a cell would probably use the hidden channels to determine its relative position in the image and what semantic role (eye) it ought to play. It would multiply that by the local color gradient then apply a threshold operation to “decide” how to respond. If that math indicates the cell is at the edge of an eye, the cell would shift either more black or more yellow depending on which side of the edge it “thinks” it lies.
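The perception step itself is small. Following the original paper's use of Sobel filters (the code below is my sketch, using the paper's 16-channel state layout):

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
IDENTITY = torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])

def perceive(state):
    """state: (batch, 16, H, W) = 4 RGBA channels + 12 hidden channels."""
    kernels = torch.stack([IDENTITY, SOBEL_X, SOBEL_X.T])  # self + x/y gradients
    kernels = kernels.repeat(16, 1, 1).unsqueeze(1)        # one set per channel
    return F.conv2d(state, kernels, padding=1, groups=16)  # (batch, 48, H, W)
```

The 48-number perception vector at each pixel is all the "context" a cell ever gets; the learned dense layers downstream turn it into a state update.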
@themanual4am About “embeddings” in a biological context: you’re right that the NCA model of morphogenesis is “wrong” in several important regards. For instance, differentiation. Early in the development of a mammal, the roles a cell can play are repeatedly constrained until it gets a final assignment: you will live and die as a kidney cell. Its genes are reconfigured to permanently restrict the cell’s behavior to a narrow, pre-specified range.
The NCA doesn’t have any notion of terminal differentiation or a restricted genotype. Every cell is always doing the same thing for all time. The only latent space they work within is the one defined by the genetic language itself. There is no progressive refinement of that space through the life of the NCA. The NCA produces a stable image, but only because every cell is actively working to maintain homeostasis; disrupt that by erasing part of the image, and every cell tries to find a new identity based on that new context.
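The damage experiments in the original paper make that point vividly; in sketch form (with `nca_step` and `seed` as assumed stand-ins for a trained model's update rule and its starting state, not the paper's API):

```python
import torch

with torch.no_grad():
    state = seed                        # a single active pixel
    for _ in range(200):                # grow to a stable image
        state = nca_step(state)
    state[:, :, 24:40, 24:40] = 0.0     # erase a patch of the "body"
    for _ in range(200):                # same rule, no retraining:
        state = nca_step(state)         # surviving neighbors regrow the patch
```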
@themanual4am The inclusion of a DNA-like encoding in the manifold paper is an interesting choice. My first thought is that it’s very silly: a four-letter code is an accident of history, and not meaningfully different from the same data represented as a tensor of floating point numbers. I think it made the biological analogy a bit clearer, but I don’t think it means there’s anything special about DNA chemically.
It’s a little strange of you to say that this is a move made in order to ignore other molecules. In fact, the hidden channels in the NCA are meant to represent signaling molecules, and the decoder phase of the manifold algorithm is meant to represent transcription factors. So these NCAs are very explicitly trying to model multiple different classes of biomolecules, just in an aggregated, abstract sort of way.
@themanual4am Not all problems can be solved this way? That’s fine. This isn’t meant to be a new universal computing paradigm, just another approach that hasn’t been explored much. That said, this is the fundamental principle that all multicellular organisms, all macroscopic ecosystems, and all of human society are built on. In some sense, it is the most versatile computing paradigm we know of, the one that produced all the others.
That said, I’m not sure exactly what phenomena I’m trying to describe in my scientific career, or how best to do it. NCAs seem like a fruitful path to explore, but they have their limitations. I welcome your input on shortcomings and other paths to consider, but at the moment I don’t think I understand your feedback in the “general approach” section.
So, IRL, self-assembly is not 'any thing can become any other thing'
All those things above, exist only because of structured constraints; scoped finiteness, in the right places
'What can happen' at each step is relatively few, always fundamentally shaped, in a way that isn't representable in present unstructured ML type processes
I suggest that the intersect does not contain the 'goods'
@themanual4am Right. This paper excites me because it illustrates one layer of emergent constraint generation. What I really want to study is how life accumulates layer upon layer of adaptive constraints. But, I figure, if I wanna study the layering process, I probably need to start with a single layer...
Yeah, I hear you on the need to find the right start point. I think the first few layers don't do much, 'dots and relations', with stuff like NCA happening much later.
NCA is too much stuff for a first layer, but with not enough 'what must come before', which ought to propagate through
Previously i've used 'useful but not correct/ideal', or 'the wrong abstraction'
I think the push to make use of present ML blinds us to some missing fundamental principles/ principals
@themanual4am I don't think you need to start at any particular point in the stack of layers. I think the interesting part is the dynamics of how layers compose and augment each other, and that it doesn't matter what particular layers you use or how many you have.
But I'm not sure about that. If the key ingredient to life / intelligence is autopoiesis at the very root, then toying around with the upper layers won't help much. I just fear that trying to start from the ground floor means it would be a very long time before I ever got results that folks would find interesting.
It would also be so much harder, since we know relatively little about cells, proteins, and the origins of life. Also, what we do know isn't particularly well suited to current hardware. NCAs are fun and easy to build, I just hope that doesn't mean I'm looking for my keys under the lamppost, so to speak...
yes, i think differently on that first point though, because establishing what the first few stages are, instructs how subsequent derived layers compose and augment (ideally, and high-dimensionally)
Following that, we can probably skip a bunch to NCA kinda level (albeit a differently defined problem and solution), say, and end up with something 'truer', more broadly applicable
Eg, given the way phenomenal structures and behaviours seem to reappear through levels, expect similar
@themanual4am Well, I guess we will see :) I will continue looking for practical project ideas along these lines. Currently, I'm thinking of rebuilding the manifold NCA and doing experiments to see if it really has the effect on adaptability that I claim it should.
indeed ;)
and you must obviously iterate within the constraints of your course, as much as your intuition ;)
that said – when i woke up, i understood what we'd landed on yesterday (in the context/ course of this project). i'll follow up directly above when i figure out the best way to articulate
Ok, this took a while to work through.
The above insight relates to conceiving of a new way to map this project to present-day-ai
One specific outcome, is a way to describe a solution to the ai-alignment problem, in terms of present-day-ai *(i think)*
though before that, i think the first step is to sketch a solution to the ai-alignment problem, in the simplified terms of this project (with a complementary set of priors to 'zonal ban')
https://fragments.themanual.io/io-nate-the-ai-alignment-problem/
In brief, the solution begins to describe: low-level aspects of a specific architecture for cognition; which I suggest translate to fundamentally alignable ai (albeit a new architecture)
But I'm sure this exists already in some form. Any ideas on name?
EDIT: for reference, the original insights above (and our conversation which led to them) relate to mapping present-day-ai to this space
@themanual4am If we want to model these “layers,” where do we start? I think it depends on what we want to model. I think of this in two ways: modeling life on Earth, and modeling cumulative constraint generation. Life on Earth came about by piling on layers of constraints, such that each new layer locks in arbitrary details in the layer below and opens a new realm of exploration above. To truly understand Earth’s intelligence, you must know that full history, and I see this as a major obstacle to making human-like AI. On the flip side, all those historical precedents are arbitrary. If you ran evolution over again, you’d get something different just by chance. So maybe what really matters is that process of adding layers? Maybe that’s what we should be modeling. I like that perspective, because I don’t think reproducing a human-like mind should be our goal, anyway. Life makes all sorts of bespoke forms that perform narrow tasks way better than a human does. We should be emulating that!
@themanual4am I like the way you wrote the background section. That feels right to me. I’d take it further and say that we’ve also been overly impressed with ourselves, and our big neocortex which, for a while, seemed to be the thing that separates us from the “lower” animals. We hold it up as the pinnacle of intelligence, and build artificial neural networks inspired by it. But, it’s literally the newest, least developed part of the brain, which is the newest, least developed organ of the body. It’s a relatively simple, homogeneous sheet of neural tissue that wraps the much more complicated and heterogeneous structure of the “primitive” brain. I like to think of it as just an extra level of general abstraction and elaboration over the very special-purpose evolved computer underneath. That’s what’s led us to this “all at once” approach, which is amazing, but will never answer the deep questions of what intelligence is. We’ve mistaken the icing for the whole cake and kitchen.
@themanual4am When I hear the phrase “AI Alignment” I usually think “building AI that behaves in accordance with human values.” It sounds like you’re talking more about “AI that’s more competent and like natural intelligence.” Is that right?
Great thanks. Yes
It strikes me, when considering the brain from the perspective of operational logistics, that reduced neuronal density equates to more logistics-space (local resource reserves and distribution, for growth and repair) & heat dissipation
For the neocortex, and without looking, I simply suspect the same type of compute, but higher bandwidth/ throughput
lol, it's our super-special contrive-space
Hi, enjoyed reading this exchange & the paper, thanks.
Q: Is this not just an encoder-decoder net that's vector-quantized via "DNA" + a diffusion model (recurrent pixel processing) + an equivalent to Adaptive Instance Normalization (as in StyleGAN, but with the mapping not factorized via multiplies)?
Is a CA with a 3x3 input not asymptotically approximable by an N-dim 3x3 convolution + nonlinearity?
I do find it impressive that it can be trained end-to-end!
@m8ta @themanual4am Hey, welcome to the party! Glad you found the conversation interesting.
I'm not very familiar with diffusion models, so I'm not sure. But I suppose the general idea of iteratively refining local patterns to create a global image is the same, so there's likely some connection there!
I'm not at all familiar with adaptive instance normalization. But this is a pretty simple NN architecture with some very standard parts in it. I'm sure it resembles many other models.
I think the answer to your second question is a simple "yes," though I'm a little thrown by your wording. The original NCA paper shows that a continuous-state CA can be implemented with a CNN.
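For instance, even the classic binary Game of Life reduces to one 3x3 convolution plus elementwise threshold logic (a toy illustration of mine, not from either paper):

```python
import torch
import torch.nn.functional as F

NEIGHBORS = torch.ones(1, 1, 3, 3)
NEIGHBORS[0, 0, 1, 1] = 0.0            # count the 8 neighbors, not the cell

def life_step(grid):                   # grid: (1, 1, H, W) of 0.0 / 1.0
    n = F.conv2d(grid, NEIGHBORS, padding=1)
    born = (grid == 0) & (n == 3)
    survive = (grid == 1) & ((n == 2) | (n == 3))
    return (born | survive).float()    # the thresholds are the nonlinearity
```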
> The original NCA paper shows that a continuous-state CA can be implemented with a CNN.
Ah, right thanks -- I haven't read the original paper. (So much to read!)
To your question: I find this interesting in that CAs are usually considered (especially by Wolfram) to be computationally irreducible. Is relaxation to an overparameterized continuous space + unrolling all that's needed to make CAs piecewise invertible -> optimizable by backprop?
@m8ta @themanual4am Well, I don't think being "irreducible" necessarily means "non-invertible". To compute the future state of an arbitrary cellular automaton, there is no shortcut: you must run it step by step from the beginning. But if you carry a gradient tape with you, there's no reason you can't retrace your steps and analyze how the update rule influenced the final outcome.
So, yeah, continuous state space, loop unrolling, and a gradient tape are all you need.
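A minimal demo of that claim (toy update rule of mine, not an NCA):

```python
import torch

w = torch.tensor(0.9, requires_grad=True)  # the "update rule" parameter
state = torch.tensor(1.0)
for _ in range(10):                        # no shortcut: step by step
    state = torch.tanh(w * state)          # continuous, differentiable rule
state.backward()                           # the tape retraces every step
print(w.grad)                              # d(final state) / d(rule parameter)
```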
Very good point!
@m8ta @themanual4am It seems like you're familiar with a wide range of interesting neural network algorithms! I'm still just beginning to learn about them, and coming at it from an unusual angle. If you could explain how this resembles those other projects and why you find that interesting, that might make for a good conversation. :)
Yes, apologies, the 'general approach' notes are unfinished, and although the latter (confusing?) part aligned with your frustrations about education vs industry, it was unfinished, so that may not have been clear. I've cut it for now
There is a meaningful analogy in there, but I can't quite articulate it right now
i think this was an out of sequence edit, so incomplete as-is. apologies
i'm pointing to the fact that dna substrate is limited to 4 chemicals, so we can ignore the rest of the periodic table when modelling dna structure (tho ext environment may have any, ofc), but structure still must be modelled as not all are legal
these constraints reduce possibility-space irl, but i'm not sure if this maps to anything useful here
Great, thanks
Ok great: using hidden channels to determine semantic role (implicit embedding?) points to some conditional branching outside of embeddings; plus some inside, for response decision
I'll revisit your posts and the papers (I feel like this has been described somewhere, but I missed the terms at the time)
Superb, thanks
I'm referring to post-training operational evaluation of the model (I may need to dive in to know how to better phrase the question)
on branching
- given an embedding may or may not apply: is each embedding invoked regardless, or evaluated/ filtered based on some external signature
On feedback
- given smileyx: is there any automated internal metric for overall success?
On embedding
I mean literally, but I'm not sure that translates to this model...
well, turns out the man-flu is covid, sigh
@themanual4am Bleh, I'm sorry. Rest up, take care of yourself, and get well soon.
i feel like there's something problematic about this that i can't quite articulate yet, which relates to the difference between innate complexity and contrived complexity
> "...purpose to explore the potential of problem solving by locally cooperating agents rather than a global top-down system"
i think that 1. not all problems can be solved in this way, and 2. a different set-up would be more applicable to reality
i'm preparing some thoughts and questions
https://fragments.themanual.io/io-nate-nca-paper/#themanual4am