Krieg Eterna

The Everything CLI

Introduction

I have been using and thinking about AI coding assistants for the past year, and I think there is something wrong with how they work. We are quickly moving back toward the old role of the Software Integrator: a company whose main job is to write and maintain software for another company. It is reemerging as a major business opportunity in the U.S. software market, except that instead of integrating word processors, Excel sheets, and databases into existing companies, these new Software Integrators are integrating AI agents and chat assistants trained on internal enterprise data, set up so that the data never leaves the company's premises. Whenever you hear this term (CRM, or Customer Relationship Management software, is a close cousin), you should feel embarrassed for the software industry, and for humanity as a whole. A Software Integrator is only necessary when the software being integrated is not responsive enough to the user's wishes to be used out of the box.

The Rise of AI Middleware

For example, see Glean. The basic premise is that you install Glean within your AWS/Azure account and connect it to your company chat (Slack, Webex, Teams, etc.), code repo (Bitbucket, GitHub, etc.), and document store (Google Drive, Confluence, etc.), and then you have a chatbot over all of your internal company data and can pull insights from it.

There are many companies trying to do similar things to automate workflows. See Zapier: the basic premise is that you connect all the same things you would connect to Glean, then build out workflows representing your business processes that task LLMs with collecting and modifying data, talking to customers, and so on. What I find wrong with both of these setups is permission and arcane knowledge: in order to get the benefit of AI in my workflows, I have to convince a ton of people at my company that this relatively expensive, recurring SaaS product will improve efficiency, and then I have to go teach everyone how to use it. This is somewhat mitigated by the fact that for some of these products I can ask the product itself how it works, but many people do not have that capability. The best companies will realize that this is a worthwhile cost, pay it, and win over time; but for the average user, AI is just the same as having a software engineer they can ask to automate things for them, and that is not the end goal.

We have built these systems in the mirror image of the old one. If I, the customer, want something, I ask the company to make it for me, and I get back something that somewhat resembles what I asked for. The loop is tighter now because of these AI tools, for sure, but it is the same loop. It also incurs the same cost-benefit calculation when you go to use it: how much of my time will it take to explain to the AI what I want, what is the likelihood it will understand and deliver the correct product, and so on. We have democratized the ability to produce working software, but the same edge cases, deep technical knowledge, and problem-solving abilities are still required when something goes wrong. What would a piece of software look like that breaks that loop entirely? When I tell my Mom, "Hey, you could use AI to automate the things you are doing at work," it seems like an insurmountable task, partly because she hasn't written software since college and partly because she is so busy serving clients in her current workflow that she doesn't have the time and energy to learn a new tool. How can we change this? I think the answer comes from a similar idea in content algorithms: the things people want to see are the things they use the most. In the same way, the things people would like to automate are revealed by the things they do the most with a computer.

Revealed Preferences and the Memex

A few nights ago I was reading Project Xanadu: The Internet That Might Have Been, and one of the things that struck me about the Memex idea is that the user's own actions reveal what data should be kept, stored, and later revealed to them again through these "trails" (essentially a trail is your browser history, but you can share it and build a network of the research you have done, traced by the path you take through the Internet). This sort of tracking is done all the time on the Internet, but rarely to the user's benefit. YouTube recommendations are probably the only case I can think of where it actually works, and even there you have a pretty hard time curating what content is served to you. Want to only be served things that make you a better person (history, math, learning, etc.)? Sorry, your revealed preference is for drama and nonsense; I guess you will be getting that until the end of time, even if you know it is making you a worse person. Suffice it to say that revealed preferences can be used for good and ill, but they are certainly addictive and powerful when set up in the right way.

So how are business processes revealed to the computer? I think in the following way: every time you use a website, your browser sends traffic to a server, usually via REST API calls. These take the form of GET, asking for some piece of information; POST, sending the server some completely new information; and PUT, updating stale information the server already has. These actions are the atoms of the web universe, and any automation task over an existing website will use them to drive that automation. There may also be some massaging of data by the front-end JavaScript, lookups straight from a database, and so on, but if the webpage has exposed something to the user, it likely happened through REST calls.
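To make that concrete, here is an illustrative sketch of what a captured browsing session might reduce to. The record shape, field names, and URLs are all hypothetical, not any real browser capture format or any real pizza API.

```python
# Illustrative only: a captured browsing session reduced to its REST "atoms".
from dataclasses import dataclass, field

@dataclass
class CapturedRequest:
    method: str                                 # GET, POST, PUT, ...
    url: str                                    # the endpoint the page called
    body: dict = field(default_factory=dict)    # payload, if any

session = [
    CapturedRequest("GET",  "https://example-pizza.com/api/menu"),
    CapturedRequest("POST", "https://example-pizza.com/api/cart",
                    {"item": "large_pepperoni", "qty": 2}),
    CapturedRequest("PUT",  "https://example-pizza.com/api/cart/42",
                    {"qty": 3}),                # correcting stale info on the server
]
```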

State of the Art

The "AI native" solution to automating work has been to expose an MCP (Model Context Protocol) endpoint on most APIs on the Internet so that LLMs can more easily call them. All an MCP endpoint really is, is an endpoint on the company's server that takes some structured text (JSON) and uses the existing endpoints to make API calls on behalf of the LLM. I think this is mostly stupid for a few reasons. First, you haven't really automated anything; the LLM still only has a probabilistic chance of using the MCP endpoint correctly. Second, LLMs have a hard time reasoning about which of these MCP tools to use, and the tools created directly by the model companies work much better. There is a second-class-citizen problem: any tool that OpenAI bakes into its model pre-training is likely to be used correctly, and any tool you expose via MCP is unlikely to work reliably, which gives OpenAI a decided advantage in its own ecosystem.

After about ten tools, the likelihood that the model will choose the correct one drops dramatically. At a higher level, this is just a replication of the existing workflow that humans already do, made slightly easier for our faster-thinking friends the LLMs. LLMs may be significantly cheaper per individual run, but given how error-prone they are, they end up being pretty expensive, especially if you scale them up to serve your entire customer base. If you don't have a pay-as-you-go pricing scheme, then you either have to hope that your users use the service less than your usage costs (see MoviePass for how this can go wrong in a pretty obvious way; you are effectively subsidizing OpenAI, just as MoviePass was subsidizing the movie theaters), or you need to raise your prices to account for it, and you end up being at least as expensive as the model companies (200 bucks a month is pretty high for most consumer-facing applications).

Reaching for Xanadu

What I want the AI to do is figure out what I am doing that is successful and automate it in computer code, not in high-level text. Once my workflow is established, why should I need a subscription to OpenAI at all? If you "go all in on AI" as a company, you have effectively signed yourself up for vendor lock-in very similar to the lock-in of cloud computing. It is slightly different, since as far as I can tell one base model from OpenAI is roughly as good as one from Anthropic, and switching them out is just changing the URL and the credit card info you have set up to serve your company's LLM requests. But you are still locked into the idea that you need an LLM to run your company, and in the future, when the model companies inevitably consolidate, there will be much less choice about which models you can use, and your profits will be eaten up by whichever company remains.

LLMs have the benefit of handling a lot of situations out of the box, and so they are incredibly useful, but what we want is for them to figure out what to do and encode it in a format that is cheaper to run, does not rely on third parties, and is easier to debug when something goes wrong, i.e., computer code. So if we can find a way to combine probabilistic code generation with deterministic code execution, such that the user does not need a degree in Computer Science to get halfway decent results, then we could eat the lunch of all of the Software Integrators without resigning our freedom and profit margins to OpenAI and Microsoft, or similarly Anthropic and AWS.

So how can we accomplish this? I think the answer is the Automated Command Line Interface: we want to build a personalized command line interface for the user that is based on their usage of the Internet. Let's say the user is putting on a pizza party. The user goes to Papa John's website; we would see them get all of the current pizza deals, make some selections on the page to add to their cart, and finally check out with their address and credit card information. All the while, REST requests are being made to Papa John's server: GETting the webpage, POSTing the desired pizza to the cart, and POSTing the card info. Our program would break each of these requests down into Skills (get_menu, add_pizza_to_cart, checkout_cart) and write the meta-skill give_me_papa_johns_pizza, which does all three. We encode this process in Python code with a decorator that tracks where the skill lives in our codebase and reveals it to our automated command line interface so that we can use it later, as sketched below. Then we let the user know that they have a new skill available. If the user continues to use the browser for other business operations that involve ordering a pizza, for example setting up a pizza party, we can combine the order_paper_plates, order_plastic_utensils, order_party_hats, and check_attendees_schedule commands into a new uber-command, setup_pizza_party, that asks for the email list involved, orders the appropriate amount of pizza to the right conference room, and notifies the user that they have a new Setup Pizza Party skill available.
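Here is a minimal sketch of what that decorator and registry could look like. Everything in it is illustrative: the skill and SKILL_REGISTRY names, the example-pizza.com endpoints, and the hard-coded requests calls stand in for whatever a real implementation would replay from the captured browser traffic.

```python
import inspect
import requests  # any HTTP client would do

# Hypothetical registry the automated CLI reads from.
SKILL_REGISTRY: dict[str, dict] = {}

def skill(description: str):
    """Register a function as a skill and record where it lives in the codebase."""
    def decorator(fn):
        SKILL_REGISTRY[fn.__name__] = {
            "description": description,
            "module": fn.__module__,
            "file": inspect.getsourcefile(fn),
            "line": inspect.getsourcelines(fn)[1],
        }
        return fn
    return decorator

@skill("Fetch the current pizza deals")
def get_menu():
    # URL is illustrative, not a real pizza API.
    return requests.get("https://example-pizza.com/api/menu").json()

@skill("Add a pizza to the cart")
def add_pizza_to_cart(cart_id: str, pizza: dict):
    return requests.post(f"https://example-pizza.com/api/cart/{cart_id}", json=pizza).json()

@skill("Pay for whatever is in the cart")
def checkout_cart(cart_id: str, payment: dict):
    return requests.post(f"https://example-pizza.com/api/cart/{cart_id}/checkout", json=payment).json()

@skill("Order a pizza end to end: menu, cart, checkout")
def give_me_papa_johns_pizza(cart_id: str, pizza: dict, payment: dict):
    get_menu()
    add_pizza_to_cart(cart_id, pizza)
    return checkout_cart(cart_id, payment)
```

Note that the meta-skill is nothing special: it is just another registered function that calls the smaller ones, which is what makes composition cheap.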

Note that determining where a Skill starts and stops is not too important. Since we are in a regime where computer code is cheap to write and maintain, we should have as much of it as we possibly can, leaving which skills to use and reuse up to the user; more on this later.

The user has been screaming at us what they want to do and how, but because we have not had the ability to really listen until now, only CEOs and software engineers have been able to utilize the power of computers; the rest of the world is still in the typewriter era, with a better screen attached.

When you get this skill acquisition loop going, business becomes like a video game instead of drudgery. Automating our testing pipeline at work took me a few weeks of figuring out what every step in our web interface was and how to massage the data for each piece. If I could instead have just used our web tool, like a human would, and had the testing pipeline automated for me, I would have felt like I had superpowers. Once that automation was done, using the same structure I described above, I already felt pretty powerful, but the feeling was tempered by the knowledge that if we add more steps to the user's workflow, I will need to go and update the pipeline. There are other techniques you could use to automate this automation process, for example parsing your Spring Boot server codebase and generating code from the results, but remember that the vast majority of people at any given company have no ability to do something like this, and an individual employee is removed from their Software Department at such a distance that they don't know what automation they would want even if they could ask for it. Add in software development timelines and the problem is exacerbated to the point that most people won't even ask.

The great lesson of capitalism and federalism is that the further you can push the capability to make powerful decisions down to the person with the most relevant information, i.e. the store owner, the local government, etc., the better the decisions get. In most American companies with a Software Department, there is a kind of communism in place, much like the utopian vision of the Cybernetics movement in the Soviet Union: a central department uses its advanced computerized techniques and centralized data to make better decisions on behalf of the rest of the company or country. This was mostly necessary in corporate America because of the time and expense it takes to get good at computers. Some of that has been mitigated by the advent of the graphical user interface and later the web browser, but these better interfaces have mostly been used to walk the user through very structured steps. Computers are powerful, however, because of the composability of their operations; so far that composability has not been exposed to the average person, and I think the scheme above could change that.

Skill Discovery and Ranking

But okay, let's back up. So far we have a loop that will build higher and higher levels of abstraction over the business processes we want to automate, but we still have the same issue a scheme like MCP has: too many skills and no way to tell when one of them is relevant to the one we are currently making. And if we want to give decision-making access to an LLM, how do we keep what we present to it under the ten-skill threshold where it can reliably call the appropriate skill for the job? I think we should use a combination of an Elo system and vector embeddings to quickly retrieve the relevant code and skill, as in the sketch below.
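A minimal sketch of how that retrieval could work, assuming each skill record carries an embedding vector and an Elo rating (the field names, the blend weight, and the ten-skill cutoff are all my guesses, not tuned values):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_skills(query_vec: list[float], skills: list[dict], top_k: int = 10, elo_weight: float = 0.3):
    """Blend semantic similarity with usage-driven Elo to pick the skills shown to the LLM.

    skills: list of dicts with 'name', 'embedding', and 'elo' keys (hypothetical schema).
    """
    def score(s: dict) -> float:
        similarity = cosine(query_vec, s["embedding"])
        # Squash Elo into (0, 1) with the standard logistic, centered at 1500.
        elo_norm = 1 / (1 + 10 ** ((1500 - s["elo"]) / 400))
        return (1 - elo_weight) * similarity + elo_weight * elo_norm
    return sorted(skills, key=score, reverse=True)[:top_k]
```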

Through the user's own usage, the more useful skills will be strengthened in their Elo ranking and will be more likely to be shown to the LLM when it is making new skills or using existing ones. The same way that neural pathways in the brain are strengthened through repeated use, and one memory may become associated with another through frequent juxtaposition, the skills that are used often will sit near the top of the list the LLM sees when it writes code for a new one, and skills that are frequently used in quick succession should quickly get a meta-level skill that calls them all together.
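One way that strengthening could work, sketched here as a standard Elo update where every invocation is treated as a match against a fixed baseline rating; the baseline of 1500 and the K-factor of 32 are just the conventional chess defaults, not anything tuned for this system:

```python
BASELINE = 1500  # hypothetical resting rating for a rarely used skill

def update_elo(rating: float, succeeded: bool, k: float = 32.0) -> float:
    """Elo update against a fixed baseline opponent.

    A successful use counts as a win; a failed or abandoned run counts as a loss.
    """
    expected = 1 / (1 + 10 ** ((BASELINE - rating) / 400))
    actual = 1.0 if succeeded else 0.0
    return rating + k * (actual - expected)

# A skill the user keeps reaching for climbs quickly at first, then levels off
# as its expected score against the baseline approaches 1.
r = 1500.0
for _ in range(5):
    r = update_elo(r, succeeded=True)
print(round(r))  # roughly 1573 after five successful uses
```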

In psychology this is the branding effect: as a child you may like polar bears (scary!) and like sugar (tasty!), but there is no association between them in your mind. After many Christmas ads where the polar bears are drinking Coke, you slowly build that association, so that whenever you see a polar bear in the zoo or in an ad you instantly think of Coke, or, even more powerfully, you begin to associate the end of the year, being with family, and the entire Christmas season with Coke.

If you could write an algorithm that behaves in a similar way, it would not matter how many associations you make or how many skills you acquire, since the relevant ones will bubble to the top through their frequent usage.

The Missing Lobe

Consider the two-system model of thinking from Thinking, Fast and Slow, where System One is fast, automatic, stereotypical thinking and System Two is slow, deliberate thinking and learning. Right now we have a good proxy for System Two in LLMs, but no standard System One for computers. Regardless of whether this project works out, we should think about that paradigm and how to build a System One for computers, since that is clearly what is missing right now. If you had to relearn how to drive a car every time you went to the grocery store (learning is a System Two activity), you would get in a lot more accidents. Think about how scary it was to learn to drive in the first place and how effortless it becomes after only a few short hours (in my home state it only takes 60 hours of practice to get a driver's license once you have the learner's permit).

I suspect that what makes human learning so efficient is not that silicon neural networks are tremendously worse at learning a task from zero than the chemical brain (though there is definitely an argument for this, because the brain does fundamentally different math to achieve its results), but that the brain has a way of composing meta-skills at higher and higher levels of abstraction that we have largely ignored. When a person learns to drive a car, they think about the wheel, the clutch, and the brakes as different objects, so they are pretty poor drivers to start with; but because they learn these skills individually and combine them into a meta-skill, they can learn the entire system faster than a neural network that has to learn the whole system at once.

Skill Recall

I think the final missing piece of our automated skill acquisition system is how to efficiently store and search through all of the skills we generate. We talked a little bit about sorting them by their usage, which is good, but where is the code stored? How are we going to label each command? I think you just use a Postgres database that stores five columns: the first column is the Python code that makes up the skill; the second is the IDs of the other skills called by this skill; the third is the Elo ranking of the skill; the fourth is a list of words that describe the skill; and the fifth is the location in the codebase where the Python code lives. With that information I think you could build a reliable skill acquisition and retrieval system, assuming you use something like libCST for the code transformations. When making a meta-skill you would also need to recursively look at the sub-skills' descriptions to write a new description, probably including information about when the user wanted to use the skill and for what reason (this can be a guess at first and refined over subsequent usage). You would probably also want to include information about what site a skill is used on, when it was last used, etc., to make searching and sorting easier, so that, for example, getting all of the skills related to Papa John's would be easier, though maybe that just lives in the description.
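As a rough sketch, with column names that are my own guesses at the five columns described above (plus an id so sub-skills can reference one another), the table might look like this:

```python
# Postgres DDL for the skill table described above; names are illustrative.
SKILL_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS skills (
    id            SERIAL PRIMARY KEY,
    code          TEXT NOT NULL,            -- the Python source of the skill
    sub_skill_ids INTEGER[] DEFAULT '{}',   -- ids of the skills this skill calls
    elo           REAL DEFAULT 1500,        -- usage-driven ranking
    keywords      TEXT[] DEFAULT '{}',      -- words that describe the skill
    source_path   TEXT NOT NULL             -- where the code lives in the codebase
);
"""

from dataclasses import dataclass

@dataclass
class SkillRow:
    """In-memory mirror of one row of the skills table."""
    id: int
    code: str
    sub_skill_ids: list[int]
    elo: float
    keywords: list[str]
    source_path: str
```

The embedding used for retrieval earlier would slot in alongside the Elo column (for example via a vector extension), but that is an add-on to the five columns, not one of them.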

Conclusion

What I am arguing for is not another layer of SaaS middleware, not another "Integrator" who sits between me and the machine, but a rethinking of how computers should respond to people. We don't need AI that mimics the bureaucracy of a Software Department; we need AI that listens closely enough to our revealed behavior to translate it directly into tools. Every click, every REST call, every repetitive sequence of steps we take online is a blueprint for automation. If those patterns could be turned into skills that are composable, searchable, and personalized, we would finally have a system that empowers the end user instead of locking them in.

The Everything CLI is not about chasing yet another platform war between OpenAI, Anthropic, Microsoft, or AWS. It’s about building the missing System One for computers, the fast, automatic layer that captures what we already know how to do and makes it available to us as code. If we succeed, the role of the “Software Integrator” disappears, because every person becomes their own integrator. Work starts to feel less like drudgery and more like play, and we stop asking permission from central authorities to make our own tools. That is the real promise of AI—not endless subscriptions, but freedom.