Tuesday, November 12, 2024

Microsoft unveils serverless fine-tuning for its Phi-3 small language model


Microsoft is a major backer and partner of OpenAI, but that doesn’t mean it wants to let the latter company run away with the generative AI ballgame.

As proof of that, today Microsoft announced a new way to fine-tune its Phi-3 small language model without developers having to manage their own servers, and for free (initially).

Fine-tuning refers to the process of adapting an AI model by adjusting its underlying weights (parameters) through additional training, making it behave in different and more optimal ways for specific use cases and end users, and even adding new capabilities.
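As an illustration, fine-tuning services commonly accept training data as JSONL files of example conversations. The field names below follow the widespread chat-transcript convention, not a confirmed Phi-3 or Azure schema:

```python
import json

# One hypothetical fine-tuning example in the common chat-JSONL style.
# Field names are illustrative assumptions, not a documented Phi-3 format.
example = {
    "messages": [
        {"role": "system", "content": "You are a concise math tutor."},
        {"role": "user", "content": "What is 12% of 50?"},
        {"role": "assistant", "content": "12% of 50 is 6."},
    ]
}

# A training file is typically one JSON object per line (JSONL).
line = json.dumps(example)
print(line)
```

Thousands of such lines, drawn from the developer's own domain, are what "bring in your data" amounts to in practice.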

What is Phi-3?

The company unveiled Phi-3, a 3.8-billion-parameter model, back in April as a low-cost, enterprise-grade option for third-party developers to build new applications and software atop.

While significantly smaller than most other leading language models (Meta’s Llama 3.1 for instance, comes in a 405 billion parameter flavor — parameters being the “settings” that guide the neural network’s processing and responses), Phi-3 performed on the level of OpenAI’s GPT-3.5 model, according to comments provided at that time to VentureBeat by Sébastien Bubeck, Vice President of Microsoft generative AI.

Specifically, Phi-3 was designed to offer affordable performance on coding, common sense reasoning, and general knowledge.

It’s now a whole family of six separate models with different parameter counts and context lengths (the number of tokens, or numerical representations of data, a user can provide in a single input), the latter ranging from 4,000 to 128,000. Costs range from $0.0003 USD per 1,000 input tokens to $0.0005 USD per 1,000 input tokens.

However, put into the more typical “per million” token pricing, that comes out to $0.3/$0.9 per 1 million tokens to start: exactly double OpenAI’s new GPT-4o mini pricing for input tokens, and 1.5 times as expensive for output tokens.
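The per-1,000-token figures convert to per-million pricing by simple multiplication; a quick sanity check of the numbers above:

```python
def per_million(price_per_1k: float) -> float:
    """Convert a per-1,000-token price to a per-1,000,000-token price."""
    return price_per_1k * 1_000

# Phi-3 input pricing as quoted per 1K tokens
phi3_low = per_million(0.0003)   # cheapest tier
phi3_high = per_million(0.0005)  # priciest tier

print(f"${phi3_low:.2f} to ${phi3_high:.2f} per 1M input tokens")
# → $0.30 to $0.50 per 1M input tokens

# GPT-4o mini input is $0.15 per 1M tokens, so $0.30 is exactly double.
print(round(phi3_low / 0.15, 6))  # → 2.0
```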

Phi-3 was designed to be safe for enterprises to use with guardrails to reduce bias and toxicity. Even back when it was first announced, Microsoft’s Bubeck promoted its capability to be fine-tuned for specific enterprise use cases.

“You can bring in your data and fine-tune this general model, and get amazing performance on narrow verticals,” he told us.

But at that point, there was no serverless option to fine-tune it: if you wanted to do so, you had to set up your own Microsoft Azure server or download the model and run it on your own local machine, which might not have sufficient capacity.

Serverless fine-tuning unlocks new options

Today, however, Microsoft announced the general availability of its “Models-as-a-Service (serverless endpoint)” offering in its Azure AI development platform.

It also announced that “Phi-3-small is now available via a serverless endpoint so developers can quickly and easily get started with AI development without having to manage underlying infrastructure.”

Phi-3-vision, which can handle imagery inputs, “will soon be available via a serverless endpoint” as well, according to Microsoft’s blog post.

But those models are simply available “as is” through Microsoft’s Azure AI development platform. Developers can build apps atop them, but they can’t create their own versions of the models tuned to their own use cases.
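A serverless endpoint of this kind is typically called over HTTPS with an API key. The sketch below only assembles such a request; the URL path, header names, and body fields are illustrative assumptions in the common chat-completions style, and the endpoint hostname is a placeholder, not Microsoft's documented contract:

```python
import json

def build_chat_request(endpoint: str, api_key: str, user_prompt: str):
    """Assemble a chat-completion HTTP request for a hosted model endpoint.

    The path, headers, and body shape below are hypothetical, modeled on
    the widespread chat-completions convention.
    """
    url = f"{endpoint}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": 256,
    })
    return url, headers, body

# Sending it would be a single call with any HTTP client, e.g.:
#   requests.post(url, headers=headers, data=body)
url, headers, body = build_chat_request(
    "https://my-phi3-endpoint.example.com", "MY_KEY", "Hello"
)
print(url)
```

The point of the serverless model is that this is all a developer manages: a URL and a key, with no provisioned GPU infrastructure behind it.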

For developers looking to do that, Microsoft says they should turn to the Phi-3-mini and Phi-3-medium, which can be fine-tuned with third-party “data to build AI experiences that are more relevant to their users, safely, and economically.”

“Given their small compute footprint, cloud and edge compatibility, Phi-3 models are well suited for fine-tuning to improve base model performance across a variety of scenarios including learning a new skill or a task (e.g. tutoring) or enhancing consistency and quality of the response (e.g. tone or style of responses in chat/Q&A),” the company writes.

Specifically, Microsoft states that the educational software company Khan Academy is already using a fine-tuned Phi-3 to benchmark the performance of its Khanmigo for Teachers assistant, which is powered by Microsoft’s Azure OpenAI Service.

A new price and capability war for enterprise AI developers

The pricing for serverless fine-tuning of Phi-3-mini-4k-instruct starts at $0.004 per 1,000 tokens ($4 per 1 million tokens), while no pricing has been listed yet for the medium model.

While it’s a clear win for developers looking to stay in the Microsoft ecosystem, it’s also a notable competitor to Microsoft’s own ally OpenAI’s efforts to capture enterprise AI developers.

And OpenAI just days ago announced free fine-tuning of GPT-4o mini, up to 2 million tokens per day through September 23rd, for so-called “Tier 4 and 5” users of its application programming interface (API), those who spend at least $250 or $1,000, respectively, on API credits.

Coming on the heels of Meta’s release of the open source Llama 3.1 family and Mistral’s new Mistral Large 2 model, both of which can also be fine-tuned for different uses, it’s clear the race to offer compelling AI options for enterprise development is in full swing, and AI providers are courting developers with both small and large models.

