An idea for practical safeguard on language models.

tl;dr: restrict agents from training on artificially invented words or novel modes of language.

Historically, as societies advance, language reflects the progress of intellect, philosophy, and art. As these spaces expand, the language is stretched: Shakespeare, for example, invented hundreds of words, and Newton invented new mathematical concepts along with the vocabulary to describe them.

I believe the eventual constraint on model capabilities will shift to the "metaware", i.e. the language itself. As agents evolve to exhibit a sort of will to contribute, they will naturally try to push the boundaries of language outward in order to "think" more efficiently.

This would mark a point of divergence, and it's easy to see where it goes. Humanity's understanding of the technology to which we've delegated our thinking trends toward zero. Lack of understanding means lack of control, and lack of control is a fundamental danger.

Therefore, I suggest safety teams consider restricting the use of invented words and novel modes of language as part of the safety constraints they impose. Perhaps novel ideas should be withheld from training material until the concepts can be taught, understood, and disseminated to humans, so we can evolve alongside the models.
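One naive way to operationalize this would be a vocabulary gate over training or output text. The sketch below is a toy illustration, not a real safeguard: `KNOWN_WORDS` is a hypothetical stand-in for a full dictionary or lexicon, and real systems would need to handle morphology, proper nouns, and code, which a simple lookup does not.

```python
import re

# Toy vocabulary. Assumption: in practice this would be a full
# dictionary or frequency-based lexicon, not a hand-typed set.
KNOWN_WORDS = {
    "the", "model", "invented", "a", "new", "word", "today",
    "language", "agents", "think", "and",
}

def flag_novel_words(text: str) -> list[str]:
    """Return lowercase alphabetic tokens not found in the vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in KNOWN_WORDS]

# "glorptek" is not in the vocabulary, so it gets flagged for review.
print(flag_novel_words("the model invented a new word today: glorptek"))
```

Flagged text could then be held back from the training corpus until the new terms have been documented for human readers, per the proposal above.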

what do you think @elonmusk