October had many language model releases. The mid-size models, and even the small models, are catching up to frontier models like GPT-4o in performance. But the release that blew us all away wasn’t a language model: It was Claude’s computer use API. Computer use allows you to teach Claude how to use a computer: how to run an application, click on buttons, and use a shell or an editor. It has many problems, security not being the least of them, but it’s bound to improve. Sending screen captures to Claude so it can compute where to click is clumsy at best, and there are no doubt better solutions (such as using accessibility tools). However, computer use gives us a glimpse at a future where we’ll be working with agents that can plan and execute complex multistep operations.
AI
- Little Language Models is an educational program that teaches young children about probability, artificial intelligence, and related topics. It’s fun and playful and can enable children to build simple models of their own.
- Grafana and NVIDIA are working on a large language model for observability, apparently given the awkward name LLo11yPop. The model aims to answer natural language questions about system status and performance based on telemetry data.
- Google is open-sourcing SynthID, a system for watermarking text so AI-generated documents can be traced to the LLM that generated them. Watermarks do not affect the accuracy or quality of generated documents. SynthID watermarks resist some tampering, including editing.
- Mistral has released two new models, Ministral 3B and Ministral 8B. These are small models, designed to work on resource-limited “edge” systems. Unlike many of Mistral’s previous small models, these are not open source.
- Anthropic has added a “computer use” API to Claude. Computer use allows the model to take control of the computer and use it to find data by reading the screen, clicking buttons and other affordances, and typing. It’s currently in beta.
- Moonshine is a new open source speech-to-text model that has been optimized for small, resource-constrained devices. It claims accuracy equivalent to Whisper, at five times the speed.
- Meta is releasing a free dataset named Open Materials 2024 to help materials scientists discover new materials.
- Anthropic has published some tools for working with Claude on GitHub. At this point, tools to help analyze financial data and build customer support agents are available.
- NVIDIA has quietly launched Llama-3.1-Nemotron-70B-Instruct-HF, a language model that outperforms both GPT-4o and Claude 3.5 on benchmarks. This model is based on the open source Llama, and it’s relatively small (70B parameters).
- NotebookLM has excited everyone with its ability to generate podcasts. Google has taken it a step further by adding tools that give users more control over what the virtual podcast participants say.
- Data literacy is the new survival skill: We’ve known this for some time, but it’s all too easy to forget, particularly in the age of AI.
- The Open Source Initiative has a “humble” definition for open source AI. The definition recognizes four distinct categories for data: open, public, obtainable, and unshareable.
- Does training AI models require huge data centers? PrimeIntellect is training a 10B model using distributed, contributed resources.
- OpenAI has published Swarm, a platform for building AI agents, on GitHub. They caution that Swarm is experimental and they will not respond to pull requests. Feel free to join the experiment.
- OpenAI has also released Canvas, an interactive tool for writing code and text with GPT-4o. Canvas is similar to Claude’s Artifacts.
- Two of the newly released Llama 3.2 models—90B and 11B—are multimodal. The 11B model will run comfortably on a laptop. Meta has also released the Llama Stack APIs, a set of APIs to aid developers building generative AI applications.
- OpenAI has announced a pseudo-real-time API. Their goal is to enable building realistic voice applications, including the ability to interrupt the AI in the flow of conversation.
- Will AI-powered glasses become the next blockbuster consumer device? Meta’s Orion prototype could be the killer user interface for AI. It’s not about gaming; it’s about asking AI about the things you see. Now if they can only be manufactured at a decent price point.
- AI avatars are interviewing job candidates. This is not going to go well…
- The Allen Institute has developed a small language model called Molmo that they claim has performance equivalent to GPT-4o.
- Humane Intelligence, an organization founded by Rumman Chowdhury, has offered a prize to developers building an AI vision model that can detect online hate-based images.
- These days, it’s not a surprise that a computer can play chess and other board games. But table tennis? You may prefer the video to the paper.
- The Qwen family of language models, ranging from 0.5B to 72B parameters, is getting impressive reviews. Even the largest can be made to run on older GPUs, not just H100s and A100s.
- Now an AI can “prove” it’s human. An AI-based computer vision model has demonstrated the ability to defeat Google’s latest CAPTCHA (reCAPTCHAv2) 100% of the time.
- OpenAI is now expanding access to its Advanced Voice Mode to more users. Advanced Voice Mode makes ChatGPT truly conversational: You can interrupt it mid-sentence, and it responds to your tone of voice.
- Neural motion planning is a neural network-based technique that allows robots to plan and execute tasks in unfamiliar environments.
Programming
- Safe C++ proposes extensions to the C++ language to make it memory safe. Errors in memory safety have long been the largest source of security vulnerabilities.
- Microsoft sees GenAIOps as a “paradigm shift” for IT. It will become increasingly necessary as software incorporates AI and IT teams need to become specialists in AI infrastructure. One aspect of GenAIOps will be collecting, curating, and cleaning datasets.
- Huly is an open source platform for project management.
- Typst is a new system for writing scientific (and other) texts. It has capabilities equivalent to LaTeX, but the syntax is much simpler, similar to Markdown.
- Microsoft has begun a project that will make Linux’s eBPF available on Windows. In the Linux world, eBPF has proven invaluable for observability, security, and compliance tools. Windows eBPF will be bytecode compatible with Linux.
- Python 3.13 has been released. The most important changes are a new REPL that features multiline editing and color support; an experimental option to disable the global interpreter lock (GIL); and an experimental just-in-time compiler.
- Ziggy is a new language for data serialization. It isn’t a general purpose programming language; it’s a specialized language for defining data schemas precisely and painlessly.
- Microsoft’s new security-first initiative is tied to their platform engineering efforts. Platform engineering limits the number of tools developers need to use, which in turn reduces the amount of code that needs to be secured and maintained.
- The CNCF Artifact Hub is a source for cloud native configurations, plug-ins, and other software for building cloud native infrastructure. It isn’t a GitHub-like repository; it links back to the artifacts’ sources rather than storing them.
- Want to run Linux on an Intel 4004, a CPU from 1971? It will take almost 5 days to boot. What’s more amazing is that it’s actually running on an emulator that runs on the 4004.
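The Python 3.13 item above mentions an experimental option to disable the GIL. As a minimal sketch, a script might check which kind of interpreter it is running on like this. The helper name `gil_status` is ours, purely for illustration; `sys._is_gil_enabled()` only exists in Python 3.13 and later, so the code assumes the GIL is on when that function is missing.

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe whether this interpreter is running with the GIL (illustrative helper)."""
    # Py_GIL_DISABLED is 1 in free-threaded builds; it is 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in Python 3.13.
    # Assumption: on older interpreters the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True

    if not free_threaded_build:
        return "standard build: GIL always enabled"
    return "free-threaded build: GIL " + ("enabled" if gil_enabled else "disabled")


print(gil_status())
```

Even on a free-threaded build, the GIL can be turned back on at runtime, which is why the sketch checks both the build flag and the runtime state.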
Security
- It’s no surprise that prompt injection works well against Anthropic’s amazing computer use API. Anthropic’s documentation warns of many vulnerabilities. So it’s also not surprising that someone has gone ahead and tried it. Don’t stop experimenting, but be careful.
- Imprompter is an attack against large language models that uses a malicious prompt to force the model to exfiltrate data from previous chats.
- One major source of security vulnerabilities is code that includes secrets (account names and passwords, certificates, etc.). HashiCorp’s Vault Radar scans software, including repositories and pull requests, to detect secrets that have been exposed.
- Mandiant security researchers have discovered that 70% of vulnerabilities that were exploited in the past year were zero-days (that is, new vulnerabilities that had not been previously reported). Once discovered, vulnerabilities are almost immediately weaponized and used in attacks.
- OpenAI has shut down the accounts of threat actors using GPT for a number of activities including developing malware, generating and propagating misinformation, and phishing. It would be surprising if similar abuse has not taken place with other models.
- GitLab’s latest security updates address a vulnerability that allows attackers to trigger CI/CD pipelines on any branch of a repository.
- Students have connected Meta’s Ray-Ban Smart Glasses to an invasive image search site. They then use language models to assemble data from a number of databases that contain personal information, such as addresses.
- Cloudflare has blocked a series of distributed denial of service (DDoS) attacks, including one with a peak rate of 3.8 terabits per second, the highest ever recorded.
- In incident reviews, don’t discuss action items responding to the incident. The incident review is about learning and understanding; talking about fixes will derail it. The fixes can always be discussed later, and will be better if they’re based on a firm understanding.
- We’ve long known that requirements for changing passwords were a bad practice. NIST is now proposing rules that would eliminate password composition requirements, such as one capital letter, one number, and one character in a non-Latin alphabet.
- A prompt injection attack against GPT’s long-term memory allows the attacker to send all of a user’s input and output to an arbitrary server. This attack is persistent; it remains in GPT’s long-term memory. At this point, it has been partially remediated.
- Kaspersky, which is shutting down US operations, has deleted its software from US users’ computers and installed Pango Group’s UltraAV and (in some cases) UltraVPN without users’ permission. Kaspersky’s behavior raises the question: When does an antimalware vendor become malware?
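To make the idea of secret scanning from the Vault Radar item above concrete, here is a toy sketch of pattern-based detection. It is not how Vault Radar works; the pattern names, the `scan_text` helper, and the sample strings are all hypothetical, and real scanners use far more patterns plus additional checks to cut down on false positives.

```python
import re

# Illustrative patterns only; a real scanner would use many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Made-up sample input; the key below is not a real credential.
sample = 'user = "alice"\npassword = "hunter2!"\nkey = "AKIAABCDEFGHIJKLMNOP"\n'
print(scan_text(sample))
```

A check like this is cheap enough to run on every commit or pull request, which is where catching an exposed secret matters most.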
Web
- Videos from XOXO 2024 have been posted. Molly White and Erin Kissane are particularly highly recommended.
- Do we need yet another React web framework? The developers of One think so. One promises to be simple, opinionated, and local-first.
- Tom Coates has announced the formation of the Social Web Foundation, an organization dedicated to helping federated networks grow in healthy ways.
- Trouble in the WordPress world: WordPress.org has blocked WP Engine, an important hosting provider for WordPress users, from accessing its resources. Drama ensues, escalates, and becomes increasingly vicious.
Hardware
- ARM has canceled the license that allows Qualcomm to produce the Snapdragon processor, which is the basis for most mobile phones. Is this an opportunity for RISC-V?
- There’s a new RISC-V microprocessor that’s not made of silicon. It’s flexible, low power, and capable of running AI workloads (though at relatively low speeds).
- Bunnie Huang leaves us with the terrifying realization that building a bomb into a small IoT device isn’t just feasible—it’s relatively easy and inexpensive.