1.6 AI Today: What Is Solved, What Is Hard, What Is Hype

Alright, let’s cut through the noise. You’re here because you’ve heard the two prevailing narratives: AI is either about to usher in a utopian paradise, or it’s a Skynet-style doomsday machine that’s going to turn us all into paperclips. The reality is, of course, far more mundane, fascinating, and frankly, a bit ridiculous. Today’s AI landscape is a bizarre patchwork of the genuinely miraculous, the stubbornly impossible, and a truly staggering amount of marketing fluff. Let’s map it out.

What Is Actually Solved (The “Shockingly Competent” Tier)

We’ve nailed pattern recognition in high-dimensional, static data. This sounds boring until you realize it describes most of what people think is magic.

Perception is largely a solved problem. Giving a machine decent eyes and ears? Done. Object detection and speech-to-text (and, on the output side, image generation) aren’t just good; they’re commodity services. You can spin up a vision API in an afternoon that would have required a multi-million-dollar DARPA grant 15 years ago.

# A shockingly simple example: asking a hosted vision-language model to describe an image.
# This is the kind of thing that would have been a PhD thesis in 2010.
# Assumes the openai Python package (v1 or later) and an OPENAI_API_KEY environment variable.
import base64

from openai import OpenAI

# Load an image of your dubious-looking breakfast and base64-encode it for a data URL
with open("questionable_avocado_toast.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

client = OpenAI()  # reads the API key from the environment

# No training needed. Zero. Zilch. Just ask.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: swap in any vision-capable chat model you have access to
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
# The kind of answer you might get (actual output will vary from run to run):
# "A piece of avocado toast, though the avocado is slightly brown and there's an unidentifiable green sprinkle that might be parsley or might be something else. Proceed with caution."

The “why” here is that we finally found architectures (like CNNs for vision and Transformers for language) that are brutally effective at distilling statistical patterns from vast amounts of data. We had the compute, and we scraped a huge slice of the public internet for the data. It was a perfect, albeit computationally grotesque, storm.

Best Practice: For these perception tasks, never build a model from scratch. Fine-tune a pre-trained foundation model on your specific data. You’re standing on the shoulders of a giant that was trained on billions of images, not a guy who saw 1,000 of your cat pictures.
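To make the “shoulders of a giant” idea concrete, here is a minimal sketch of the cheapest form of fine-tuning: freeze the pre-trained part and train only a small classifier head on the features it produces. Everything in it is an assumption for illustration. `frozen_backbone` is a hypothetical stand-in for a real pre-trained network (it just computes two summary statistics), the four “images” are made up, and the head is plain logistic regression written with the standard library.

```python
import math


def frozen_backbone(pixels):
    """Hypothetical stand-in for a pre-trained feature extractor.

    A real backbone would map an image to a learned embedding. This one
    returns two fixed statistics, and nothing below ever updates it.
    """
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def train_head(features, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression head on top of the frozen features (plain SGD)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the head's score is positive, else class 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Tiny made-up dataset: "images" as flat lists of pixel intensities in [0, 1].
images = [
    [0.1, 0.2, 0.1, 0.0],  # dark
    [0.2, 0.1, 0.3, 0.2],  # dark
    [0.9, 0.8, 1.0, 0.9],  # bright
    [0.7, 0.9, 0.8, 1.0],  # bright
]
labels = [0, 0, 1, 1]

# The backbone is only run, never trained; the head is the only thing that learns.
features = [frozen_backbone(img) for img in images]
weights, bias = train_head(features, labels)

new_image = [0.8, 0.9, 0.7, 0.9]
print(predict(weights, bias, frozen_backbone(new_image)))
```

In a real project the backbone would be a pre-trained CNN or Transformer loaded from a model hub, and you might unfreeze its top layers once the head has settled. The division of labor stays the same: the expensive general-purpose features are reused, and only a small task-specific part is trained on your data.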

What Is Genuinely Hard (The “Oh, We’re Not Even Close” Tier)

This is where the real AI research happens. The easy stuff is productized. The hard stuff remains… hard.

Reasoning, Causality, and Common Sense. LLMs are glorious stochastic parrots. They are the ultimate pattern matchers, but they do not reason in any robust sense. They can perfectly mimic the syntax of thought without reliably tracking the semantics. Ask one to reason about cause and effect that isn’t blatantly stated in its training data, and it may confidently hallucinate nonsense. It has no grounded world model. My favorite illustration: a modern LLM can write a beautiful essay on the tragedy of a family pet dying, and it will happily tell you that shoes left in the fridge get cold, because text like that is all over its training data. Twist the scenario into something nobody has written about, and the answers get shaky fast. It has read about cold; it has never felt it.

Real-World Interaction and Adaptability. Beating the world champion at Go is solved. Getting a robot to fold a pile of laundry is a multi-year, multi-PhD project for a top lab. The real world is infinitely messy, unpredictable, and expensive to simulate. This is Moravec’s paradox in action: much of what is hard for humans (board games, arithmetic) is easy for AI, and what is easy for humans (like pouring the orange juice into the glass and not the cereal bowl) is insanely hard for AI.

Pitfall: Never, ever trust an LLM’s output without a verification step. It doesn’t know truth from falsehood; it knows likely-seeming sequences of tokens from unlikely ones. This is why Retrieval-Augmented Generation (RAG) is so crucial: it tethers the model’s creativity to actual facts from a knowledge base. RAG reduces hallucination rather than eliminating it, so the verification step still applies.
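For a feel of what that tethering looks like, here is a minimal sketch of the retrieval half of RAG. It is a toy under stated assumptions: the knowledge base is three made-up sentences about a fictional product, retrieval is bag-of-words cosine similarity, not the learned embeddings a production system would likely use, and the helper names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical. The resulting prompt is what you would hand to whichever LLM you use.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would index your own documents.
KNOWLEDGE_BASE = [
    "The warranty on the X100 blender lasts 24 months from the purchase date.",
    "The X100 blender has a 1.5 litre jug and three speed settings.",
    "Returns are accepted within 30 days if the product is unused.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble a prompt that tells the model to stick to the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How long is the warranty on the X100 blender?"
passages = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, passages)
print(prompt)
```

The retrieved passages also give you something to verify against: if the model’s answer cites a figure that appears in none of them, that is a cheap red flag worth checking before anyone acts on it.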

What Is Pure, Unadulterated Hype (The “Please Stop This” Tier)

The hype cycle is currently powered by three main engines:

1. AGI is Around the Corner: No. It’s not. I promise. We have no consensus on what consciousness or general intelligence even is, let alone a roadmap to build it. Every time you see a headline like “Google Engineer Claims AI Is Sentient,” pour yourself a strong drink. That engineer has confused a very sophisticated pattern generator for a mind. It’s an understandable mistake, given how compelling the mimicry is, but it’s a category error.

2. AI “Understanding”: When a company says its AI “understands” your business needs, replace the word “understands” with “statistically models based on the data it was fed.” The former sells SaaS subscriptions; the latter is what’s actually happening.

3. The Myth of Autonomous Everything: The current best practice in robotics and industrial automation is not full autonomy; it is human-in-the-loop design. The AI handles the tedious, pattern-heavy lifting (e.g., “identify all the defects on this circuit board”) and the human handles the nuanced judgment calls (“is this critical or just a cosmetic scratch?”). Anyone selling you on a fully autonomous “AI employee” is either lying or dangerously naive.
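The human-in-the-loop split from the circuit-board example can be sketched in a few lines. This is an illustration under assumptions, not a production design: `Inspection`, `route`, the `defect_score` field, and the 0.05 threshold are all hypothetical, and the scores are made up. Boards the model is confident are clean pass automatically; everything else goes to a person, most suspicious first.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    board_id: str
    defect_score: float  # hypothetical model output: estimated probability of a defect, in [0, 1]


def route(inspections, clear_below=0.05):
    """Pass boards the model is confident are clean; queue the rest for a person."""
    auto_passed, human_queue = [], []
    for item in inspections:
        if item.defect_score < clear_below:
            auto_passed.append(item)
        else:
            human_queue.append(item)
    # Show the most suspicious boards to the reviewer first.
    human_queue.sort(key=lambda item: item.defect_score, reverse=True)
    return auto_passed, human_queue


batch = [
    Inspection("board-001", 0.01),
    Inspection("board-002", 0.62),
    Inspection("board-003", 0.97),
    Inspection("board-004", 0.04),
]
auto_passed, human_queue = route(batch)
print([item.board_id for item in auto_passed])   # ['board-001', 'board-004']
print([item.board_id for item in human_queue])   # ['board-003', 'board-002']
```

The threshold is the design decision that matters. Set it too low and the human queue fills with obviously clean boards; set it too high and real defects start passing with nobody looking at them.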

The state of the art, right now, is this: we have built the most incredible tools for amplification. They can amplify human creativity, productivity, and unfortunately, human bias and error. Your job is to learn to wield these tools, understand their profound limitations, and laugh when someone tries to tell you they’re magic. They’re not magic. They’re just the most interesting engineering we’ve ever done.