
Radar Trends to Watch: June 2024

May was a month of announcements: between Google, Apple, Microsoft, and OpenAI, there was much ado about—well, very little, in fact. It’s always seemed to me that big announcements steal attention that might otherwise go to projects that are less flashy but more deserving. (Or maybe I’m just becoming jaded.)

That’s not to say that nothing interesting happened. We’re seeing continued interest in small language models—small enough to run on cell phones (which have more processing power than the supercomputers of a few decades ago). We’ve wondered whether new programming languages make sense in the era of AI-generated code—but we saw Bend (for highly parallel code) and Jolie (for services), plus LuaX (a new Lua interpreter) and Faer (for high-performance numerics in Rust). And for web developers, someone has been using CSS Grid to typeset music. Programming of various sorts is very much alive.



AI

  • The first two parts of the three-part series “What We Learned from a Year of Building with LLMs” have been posted on O’Reilly Radar. The third part will be posted on June 6. This series is a wide-ranging collection of wisdom and experience that will be essential to anyone building AI applications.
  • llama-fs is a filesystem based on Llama 3 that names and finds files for you. It’s a very interesting idea, though I’m not sure it’s one I would trust.
  • MonsterGPT is a tool on OpenAI’s GPT Marketplace for using ChatGPT to fine-tune smaller LLMs. You point it at the dataset (it can use datasets hosted on Hugging Face) and the model, and it does the rest.
  • Target Speech Hearing is a new system for noise canceling headphones that may allow the user to hear a single voice in a crowd; unwanted voices are canceled out.
  • Ambient Diffusion is a new training strategy for generative art that reduces the problem of reproducing works or styles that are in the training data. It trains models on corrupted versions of the initial training data, so that it is impossible to “memorize” any particular work. (A toy sketch of the idea appears after this list.)
  • Copilot+ PCs are personal computers with hardware capable of running AI applications, including neural processors and GPUs. Copilot+ PCs are intended to support AI features that are being integrated into Windows 11.
  • Meta has created a new family of mixed-modal models called Chameleon. Unlike multimodal models, which use different models for text and images, Chameleon is a single model and can freely integrate data from different modalities.
  • Here’s an implementation of Llama 3, in detail, from scratch. You need to download the weights from Meta.
  • Thom Wolf, one of Hugging Face’s cofounders, has published a list of books and articles to read if you want to get into AI.
  • GPT-4o can be used to aid in code reviews. It’s useful, but when it comes to real insight, it falls short. How many times do you want to be told to use longer variable names or write more comments? (A minimal example of requesting a review appears after this list.)
  • A new brain interface device can convert thought into speech.
  • For better or for worse, Google is integrating generative AI into search. It has a serious problem with generating bad results, something that Google is trying to fix. Tom’s Hardware shows how to disable AI-generated results.
  • Google has announced “Project Astra,” which adds interactive voice and vision to its models. It also announced that a future version of Gemini will have a two-million-token context window. Other announcements include Gemini Flash, a lightweight model to run on smaller devices, and Veo, a text-to-video model that’s said to be comparable to Sora.
  • The latest version of GPT, GPT-4o, adds real-time interactive voice, vision, and emotional analysis capabilities. Latency on voice input has been reduced to 320 milliseconds.
  • OpenAI has released a draft proposal for Model Specs, which provide a way to specify the desired behavior for a model. Model specifications look like an interesting supplement to—though not a replacement for—model cards.
  • KnowHalu is a new framework for detecting hallucinations in large language model output.
  • A new, three-part series on AI safety is starting. It’s basic and looks reasonably well-balanced. Right now, only the first part has been written.
  • Can AI forget? Ben Lorica writes about unlearning, the process by which information can be removed from a pretrained model. Unlearning will be important for many reasons, not the least of which are European regulations requiring the removal of incorrect personal data. (A toy sketch of one unlearning baseline appears after this list.)
  • Georgia Tech and Meta have created an open dataset of climate data to train AI for carbon capture systems.
  • Apple has released its OpenELM language models. These models are all relatively small (270M-3B parameters) and designed to run on mobile devices. Source code is available on Hugging Face; they are licensed under the Apple Sample Code License.
  • Snowflake-arctic-instruct is a new language model. It claims to be the largest truly open source model (a mixture of experts with 128 experts of 3.66B parameters each).
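
On the Ambient Diffusion item above: the sketch below illustrates the general idea of training a denoiser only on corrupted images, so that the target never includes pixels the model didn’t see. It’s a toy illustration under assumed placeholders (the tiny convolutional model, the 60% masking rate, and the random “images”), not the paper’s implementation.

    # Toy illustration of training on corrupted data, in the spirit of
    # Ambient Diffusion; model, masking rate, and data are placeholders.
    import torch
    import torch.nn as nn

    def corrupt(images: torch.Tensor, mask_rate: float = 0.6):
        """Randomly zero out pixels; the model never sees clean images."""
        mask = (torch.rand_like(images) > mask_rate).float()
        return images * mask, mask

    model = nn.Sequential(              # stand-in for a real denoising network
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    for _ in range(100):                     # toy training loop
        batch = torch.rand(8, 3, 64, 64)     # placeholder for real images
        corrupted, mask = corrupt(batch)
        pred = model(corrupted)
        # The loss is computed only on pixels that were visible in the
        # corrupted input, so the target never includes hidden pixels.
        loss = ((pred - batch) * mask).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()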
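
On using GPT-4o to aid code review: a minimal sketch with the OpenAI Python client is below. The file name, prompt, and review criteria are made-up placeholders, and an OPENAI_API_KEY environment variable is assumed; treat it as a starting point rather than a recommended workflow.

    # Minimal sketch: ask GPT-4o for a code review via the OpenAI Python client.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("example.py") as f:   # hypothetical file to review
        source = f.read()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. Focus on bugs and design "
                        "problems, not style nits."},
            {"role": "user", "content": f"Please review this code:\n\n{source}"},
        ],
    )
    print(response.choices[0].message.content)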
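
And on unlearning: one simple baseline in the research literature is gradient ascent on the examples to be forgotten, balanced against ordinary training on data to retain. The toy sketch below illustrates that baseline only; the linear model, random data, and equal loss weighting are placeholders, and this is not the method from Ben Lorica’s article.

    # Toy sketch of an unlearning baseline: gradient *ascent* on a "forget"
    # set, balanced by ordinary training on a "retain" set.
    import torch
    import torch.nn as nn

    model = nn.Linear(16, 2)          # stand-in for a pretrained model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    forget_x, forget_y = torch.randn(32, 16), torch.randint(0, 2, (32,))
    retain_x, retain_y = torch.randn(32, 16), torch.randint(0, 2, (32,))

    for _ in range(50):
        opt.zero_grad()
        # Negated loss pushes the model *away* from fitting the forget set...
        forget_loss = -loss_fn(model(forget_x), forget_y)
        # ...while the ordinary loss preserves behavior on the retain set.
        retain_loss = loss_fn(model(retain_x), retain_y)
        (forget_loss + retain_loss).backward()
        opt.step()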

Programming

  • LuaX (Lua eXtended) is a new interpreter for the Lua programming language that can compile standalone executables.
  • Google has released Firebase Genkit support for its Gemma models. This framework allows JavaScript developers to create Node.js backends for integrating the Gemma language model into applications. Support for Go is promised soon.
  • Not useful but cool: a group at the University of Michigan has created spectrograms that look like images but that can be played as sound. (A simplified illustration of the trick appears after this list.)
  • Bend is a new high-level programming language for generating highly parallel code. The code can run on multicore CPUs or on GPUs. Bend looks and feels like Python, but it automatically detects opportunities for parallelism.
  • Red Hat has made Red Hat Enterprise Linux (RHEL) bootable as a container image. This makes it easier to use RHEL in the context of modern cloud native development.
  • Patchwork attempts to extend Git-like source control from software to written texts and other artifacts. One possible application would be to help integrate human writers and AI assistants. More generally, its developers are interested in creating local-first collaborative data layers.
  • Jolie is a new programming language that’s designed for developing services, as opposed to functions or objects. It stresses contracts, which define the relationship between the user and the service. It’s ideal for designing APIs and microservices.
  • The Graph Query Language (GQL) is a new ISO standard for querying graph databases, putting it on a par with SQL.
  • Faer is a new Rust library for linear algebra. A good linear algebra library is a basic requirement for numerical computation, including machine learning and artificial intelligence.
  • A new Linux distribution, with the unfortunate name EB corbos Linux for Safety Applications, supports the automotive industry’s functional safety requirements, meaning that it can be used in embedded systems on automobiles.
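
On the image-as-spectrogram item above: the basic trick is to treat an image’s pixel intensities as a magnitude spectrogram and invert it back into a waveform. The sketch below does that with librosa’s Griffin-Lim inversion; it is not the Michigan group’s method, and the image path, sizes, and sample rate are placeholders.

    # Rough sketch: treat a grayscale image as a magnitude spectrogram and
    # invert it to audio with Griffin-Lim. Paths and parameters are placeholders.
    import numpy as np
    import librosa
    import soundfile as sf
    from PIL import Image

    img = Image.open("picture.png").convert("L")   # hypothetical input image
    img = img.resize((512, 1025))                  # 512 time frames x 1025 frequency bins
    # Flip so the image appears right side up on a spectrogram display
    # (low frequencies are drawn at the bottom).
    magnitude = np.flipud(np.asarray(img, dtype=np.float32)) / 255.0

    # Griffin-Lim estimates phase for the magnitude spectrogram and
    # reconstructs a waveform whose spectrogram "shows" the image.
    audio = librosa.griffinlim(magnitude, n_iter=32, hop_length=256)
    sf.write("picture.wav", audio, 22050)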

Web

Security

  • An XSS vulnerability in GitLab allows one-click account takeover.
  • LastPass will start encrypting the URLs of the sites to which users log in. These URLs aren’t particularly sensitive, but encryption is an important step toward a zero-knowledge design.
  • Something new to disable: Windows 11 is adding a “recall” feature that saves everything that takes place on the computer and allows applications to restore previous state. Recall is a major threat to security and privacy. Microsoft claims that content remains local, but that’s a song we’ve heard before.
  • Apple and Google have united on a standard for detecting Bluetooth tracking devices that are used for stalking users.
  • AI adoption by criminals is still relatively low, but real. Most of the activity centers on jailbreaks for legitimate LLMs (jailbreak as a service) and deepfakes. There are a fair number of fraudulent datasets. So far, there is only one LLM trained for criminal applications.
  • TunnelVision, a newly discovered attack against virtually all VPNs, allows the attacker to route the victim’s unencrypted traffic through the attacker’s servers. While this is called “new,” the vulnerability has existed since 2002.
  • Microsoft has proposed Zero Trust DNS (ZTDNS), a framework that claims to solve many of the security issues DNS has had over the years. All communications are encrypted. Resolvers are only allowed to resolve names that are explicitly allowed. It is unclear whether ZTDNS will be a Windows-only or an Enterprise-only solution. It is now in private preview.
  • A change in the mechanism for changing passwords has made GitLab vulnerable to account hijacking. In turn, a hijacked account could be used to plant vulnerabilities that compromise software supply chains.
  • The UK has banned guessable default passwords on IoT devices. Vendors can still sell devices with default passwords, but each password must be unique.
  • If you want to understand the xz attack in detail, here’s a guest lecture from Columbia. It includes a live demo.

Augmented and Virtual Reality

  • Researchers have developed augmented reality glasses that look like regular glasses rather than a helmet. They rely on holography to produce full color 3D images. While it’s unclear whether this will ever become a product, it’s exactly what AR needs to succeed.
  • Stability AI has released Stable Video 3D, which generates a 3D image from a single 2D image.

Design

  • Poor design has consequences: at least 11 people are running for president of Iceland without realizing it. The same confusing web page is used both to endorse a candidate and to register your own candidacy.
  • IF has been curating a catalog of design patterns for AI. It’s a great source for people who are designing AI systems and who need to build services that their users will trust.

Robotics

  • Cylon is a JavaScript framework for robotics and the Internet of Things. If you want to use Node.js when you’re programming robots, now you can.
  • An autonomous AI-enabled robot has designed, built, and tested a 3D object that is currently the world’s best shock absorber. It absorbs 75% of the energy used to crush it.
  • The incorporation of AI into robotics means roboticists need new sources of data. Where will that data come from (paywall)? 3D data is preferable, but slow and expensive to develop. Online videos?

