<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>metaflow &#8211; Noise</title>
	<atom:link href="https://noise.getoto.net/tag/metaflow/feed/" rel="self" type="application/rss+xml" />
	<link>https://noise.getoto.net</link>
	<description>The collective thoughts of the interwebz</description>
	<lastBuildDate>Tue, 04 Nov 2025 20:33:44 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>Supercharging the ML and AI Development Experience at Netflix</title>
		<link>https://noise.getoto.net/2025/11/04/supercharging-the-ml-and-ai-development-experience-at-netflix/</link>
		
		<dc:creator><![CDATA[Netflix Technology Blog]]></dc:creator>
		<pubDate>Tue, 04 Nov 2025 20:33:44 +0000</pubDate>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[Developer Tools]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[metaflow]]></category>
		<category><![CDATA[mlops]]></category>
		<guid isPermaLink="false">https://medium.com/p/b2d5b95c63eb</guid>

					<description><![CDATA[Supercharging the ML and AI Development Experience at Netflix with Metaflow. Shashank Srikanth, Romain Cledat. Metaflow (a framework we started and open-sourced in 2019) now powers a wide range of ML and AI systems across Netflix and at many other compan...]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Introducing Configurable Metaflow</title>
		<link>https://noise.getoto.net/2024/12/20/introducing-configurable-metaflow/</link>
		
		<dc:creator><![CDATA[Netflix Technology Blog]]></dc:creator>
		<pubDate>Fri, 20 Dec 2024 07:11:37 +0000</pubDate>
				<category><![CDATA[machine learning]]></category>
		<category><![CDATA[metaflow]]></category>
		<category><![CDATA[mlops]]></category>
		<guid isPermaLink="false">https://medium.com/p/d2fb8e9ba1c6</guid>

					<description><![CDATA[<p><a href="https://www.linkedin.com/in/david-j-berg/"><em>David J. Berg</em></a>*<em>, </em><a href="https://www.linkedin.com/in/david-casler-05a5278/"><em>David Casler</em></a>^, <a href="https://www.linkedin.com/in/romain-cledat-4a211a5/"><em>Romain Cledat</em></a>*<em>, </em><a href="https://www.linkedin.com/in/qian-huang-emma/"><em>Qian Huang</em></a>*<em>, </em><a href="https://www.linkedin.com/in/rui-lin-483a83111/"><em>Rui Lin</em></a>*<em>, </em><a href="https://www.linkedin.com/in/nissanpow/"><em>Nissan Pow</em></a>*<em>, </em><a href="https://www.linkedin.com/in/nurcansonmez/"><em>Nurcan Sonmez</em></a>*<em>, </em><a href="https://www.linkedin.com/in/shashanksrikanth/"><em>Shashank Srikanth</em></a>*<em>, </em><a href="https://www.linkedin.com/in/chaoying-wang/"><em>Chaoying Wang</em></a>*<em>, </em><a href="https://www.linkedin.com/in/reginalw/"><em>Regina Wang</em></a>*<em>, </em><a href="https://www.linkedin.com/in/zitingyu/"><em>Darin Yu</em></a>*<br>*: Model Development Team, Machine Learning Platform<br>^: Content Demand Modeling Team</p><p>A month ago at QConSF, we showcased how <a href="https://qconsf.com/presentation/nov2024/supporting-diverse-ml-systems-netflix">Netflix utilizes Metaflow to power a diverse set of ML and AI use cases</a>, managing thousands of unique Metaflow flows. This followed a previous <a href="https://netflixtechblog.com/supporting-diverse-ml-systems-at-netflix-2d2e6b6d205d">blog</a> on the same topic. 
Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that <a href="https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f">supports our content decision makers</a>, or the system that ranks which language subtitles are most valuable for a specific piece of content.</p><p>As a central ML and AI platform team, our role is to empower our partner teams with tools that maximize their productivity and effectiveness, while adapting to their specific needs (not the other way around). This has been a guiding design principle of <a href="https://netflixtechblog.com/open-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9">Metaflow since its inception</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XrOVl25ZLx8_4nHLRxNgDg.png"><figcaption>Metaflow infrastructure stack</figcaption></figure><p>Standing on the shoulders of our extensive cloud infrastructure, Metaflow facilitates easy access to data, compute, and <a href="https://netflixtechblog.com/maestro-netflixs-workflow-orchestrator-ee13a06f9c78">production-grade workflow orchestration</a>, as well as built-in best practices for common concerns such as <a href="https://docs.metaflow.org/scaling/tagging">collaboration</a>, <a href="https://docs.metaflow.org/metaflow/basics#artifacts">versioning</a>, <a href="https://docs.metaflow.org/scaling/dependencies">dependency management</a>, and <a href="https://outerbounds.com/blog/metaflow-dynamic-cards">observability</a>, which teams use to set up ML/AI experiments and systems that work for them. 
As a result, Metaflow users at Netflix have been able to run millions of experiments over the past few years without wasting time on low-level concerns.</p><h3>A long-standing FAQ: configurable flows</h3><p>While Metaflow aims to be unopinionated about some of the upper levels of the stack, some teams within Netflix have developed their own opinionated tooling. As part of Metaflow’s adaptation to their specific needs, we constantly try to understand what has been developed and, more importantly, what gaps these solutions are filling.</p><p>In some cases, we determine that the gap being addressed is very team-specific, or too opinionated at too high a level in the stack, and we therefore decide not to develop it within Metaflow. In other cases, however, we realize that we can develop an underlying construct that aids in filling that gap. Note that even in that case, we do not always aim to completely fill the gap; instead, we focus on extracting a more general, lower-level concept that can be leveraged not only by that particular user but also by others. One such recurring pattern we noticed at Netflix is the need to deploy sets of closely related flows, often as part of a larger pipeline involving table creations, ETLs, and deployment jobs. Frequently, practitioners want to <a href="https://docs.metaflow.org/production/coordinating-larger-metaflow-projects">experiment with variants</a> of these flows, testing new data, new parameterizations, or new algorithms, while keeping the overall structure of the flow or flows intact.</p><p>A natural solution is to make flows configurable using configuration files, so variants can be defined without changing the code. 
Thus far, there hasn’t been a built-in solution for configuring flows, so teams have built bespoke solutions using Metaflow’s <a href="https://docs.metaflow.org/metaflow/basics#advanced-parameters">JSON-typed Parameters</a>, <a href="https://docs.metaflow.org/scaling/data#data-in-local-files">IncludeFile</a>, and <a href="https://docs.metaflow.org/production/scheduling-metaflow-flows/scheduling-with-aws-step-functions#deploy-time-parameters">deploy-time Parameters</a>, or have deployed their own home-grown solutions (often with great pain). However, none of these solutions make it easy to configure all aspects of the flow’s behavior, decorators in particular.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3f9q7PZgxYX8rRygIOWXyA.png"><figcaption>Requests for a feature like Metaflow Config</figcaption></figure><p>Outside Netflix, we have seen similar frequently asked questions on the <a href="http://chat.metaflow.org/">Metaflow community Slack</a>, as shown in the user quotes above:</p><ul><li>how can I adjust <a href="https://docs.metaflow.org/scaling/remote-tasks/requesting-resources">the @resources requirements</a>, such as CPU or memory, without having to hardcode the values in my flows?</li><li>how can I adjust <a href="https://docs.metaflow.org/production/scheduling-metaflow-flows/scheduling-with-argo-workflows#time-based-triggering">the triggering @schedule</a> programmatically, so our production and staging deployments can run at different cadences?</li></ul><h3>New in Metaflow: Configs!</h3><p>Today, to answer the FAQ, we introduce a small but mighty new feature in Metaflow: <a href="https://docs.metaflow.org/metaflow/configuring-flows/introduction">a Config object</a>. Configs complement the existing Metaflow constructs of artifacts and Parameters by allowing you to configure all aspects of the flow, decorators in particular, before any run starts. 
At the end of the day, artifacts, Parameters and Configs are all stored as artifacts by Metaflow, but they differ in when they are persisted, as shown in the diagram below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*L-klklqt1n9LKXG0jh-fTw.png"><figcaption>Different data artifacts in Metaflow</figcaption></figure><p>Said another way:</p><ul><li>An <strong>artifact</strong> is resolved and persisted to the datastore at the end of each task.</li><li>A <strong>parameter</strong> is resolved and persisted at the start of a run; it can therefore be modified up to that point. One common use case is to use <a href="https://docs.metaflow.org/production/event-triggering">triggers</a> to pass values to a run right before it executes. Parameters can only be used within your step code.</li><li>A <strong>config</strong> is resolved and persisted when the flow is deployed. When using a scheduler such as <a href="https://docs.metaflow.org/production/scheduling-metaflow-flows/scheduling-with-argo-workflows">Argo Workflows</a>, deployment happens when you run ‘argo-workflows create’ for the flow. In the case of a local run, “deployment” happens just prior to the execution of the run; think of “deployment” as gathering all that is needed to run the flow. Unlike parameters, configs can be used more widely in your flow code: in particular, they can be used in step- or flow-level decorators, as well as to set defaults for parameters. Configs can, of course, also be used within your step code.</li></ul><p>As an example, you can specify a Config that reads a pleasantly human-readable configuration file, formatted as <a href="https://toml.io/en/">TOML</a>. 
The Config specifies the triggering ‘@schedule’ and the ‘@resources’ requirements, as well as application-specific parameters for this specific deployment:</p><pre>[schedule]<br>cron = "0 * * * *"<br><br>[model]<br>optimizer = "adam"<br>learning_rate = 0.5<br><br>[resources]<br>cpu = 1</pre><p>Using the newly released Metaflow 2.13, you can configure a flow with a Config like the one above, as demonstrated by this flow:</p><pre>import pprint<br>from metaflow import FlowSpec, step, Config, resources, config_expr, schedule<br><br>@schedule(cron=config_expr("config.schedule.cron"))<br>class ConfigurableFlow(FlowSpec):<br>    config = Config("config", default="myconfig.toml", parser="tomllib.loads")<br><br>    @resources(cpu=config.resources.cpu)<br>    @step<br>    def start(self):<br>        print("Config loaded:")<br>        pprint.pp(self.config)<br>        self.next(self.end)<br><br>    @step<br>    def end(self):<br>        pass<br><br>if __name__ == "__main__":<br>    ConfigurableFlow()</pre><p>There is a lot going on in the code above; a few highlights:</p><ul><li>using ‘config_expr’, you can refer to configs <em>before</em> they have been defined: the class-level ‘@schedule’ refers to ‘config’, which is only defined inside the class body.</li><li>you can define arbitrary <a href="https://docs.metaflow.org/metaflow/configuring-flows/parsing-configs">parsers</a>; using a string means the parser doesn’t even have to be present remotely!</li></ul><p>From the developer’s point of view, Configs behave like dictionary artifacts. For convenience, they support dot syntax (when possible) for accessing keys, making it easy to access values in a nested configuration. You can also unpack the whole Config (or a subtree of it) with Python’s standard dictionary unpacking syntax, ‘**config’. The standard dictionary subscript notation is also available.</p><p>Since Configs turn into dictionary artifacts, they get versioned and persisted automatically as artifacts. 
You can <a href="https://docs.metaflow.org/metaflow/client">access Configs of any past runs easily through the Client API</a>. As a result, your data, models, code, Parameters, Configs, and <a href="https://docs.metaflow.org/scaling/dependencies">execution environments</a> are all stored as a consistent bundle — neatly organized in <a href="https://docs.metaflow.org/scaling/tagging">Metaflow namespaces</a> — paving the way for easily reproducible, consistent, low-boilerplate, and now easily configurable experiments and robust production deployments.</p><h3>More than a humble config file</h3><p>While you can get far by accompanying your flow with a simple config file (stored in your favorite format, thanks to <a href="https://docs.metaflow.org/metaflow/configuring-flows/parsing-configs">user-definable parsers</a>), Configs unlock a number of advanced use cases. Consider these examples from the updated documentation:</p><ul><li>You can <a href="https://docs.metaflow.org/metaflow/configuring-flows/basic-configuration#mixing-configs-and-parameters"><strong>choose the right level of runtime configurability</strong></a> versus fixed deployments by mixing Parameters and Configs. For instance, you can use a Config to define a default value for a parameter which can be <a href="https://docs.metaflow.org/production/event-triggering/external-events#passing-parameters-in-events">overridden by a real-time event</a> as a run is triggered.</li><li>You can define a custom parser to <a href="https://docs.metaflow.org/metaflow/configuring-flows/parsing-configs#validating-configs-with-pydantic"><strong>validate the configuration</strong></a>, e.g. 
using the popular <a href="https://docs.pydantic.dev/latest/">Pydantic</a> library.</li><li>You are not limited to using a single file: you can leverage a configuration manager like <a href="https://omegaconf.readthedocs.io/en/2.3_branch/">OmegaConf</a> or <a href="https://hydra.cc/">Hydra</a> to <a href="https://docs.metaflow.org/metaflow/configuring-flows/parsing-configs#advanced-configurations-with-omegaconf"><strong>manage a hierarchy of cascading configuration files</strong></a>. You can also use a domain-specific tool for generating Configs, such as Netflix’s <em>Metaboost</em>, which we cover below.</li><li>You can also <a href="https://docs.metaflow.org/metaflow/configuring-flows/custom-parsers#generating-configs-programmatically"><strong>generate configurations on the fly</strong></a>, e.g. fetch Configs from an external service, or inspect the execution environment, such as the current Git branch, and include it as an extra piece of context in runs.</li></ul><p>A major benefit of Configs over previous, more hacky solutions for configuring flows is that they work seamlessly with other features of Metaflow: you can run steps remotely and deploy flows to production, even when relying on custom parsers, without having to worry about packaging Configs or parsers manually or keeping Configs consistent across tasks. Configs also work with the <a href="https://docs.metaflow.org/metaflow/managing-flows/runner">Runner</a> and <a href="https://docs.metaflow.org/metaflow/managing-flows/deployer">Deployer</a>.</p><h3>The Hollywood principle: don’t call us, we’ll call you</h3><p>When used in conjunction with a configuration manager like <a href="https://hydra.cc/">Hydra</a>, Configs enable a pattern that is highly relevant for ML and AI use cases: orchestrating experiments over multiple configurations or sweeping over parameter spaces. 
While Metaflow has always supported <a href="https://docs.outerbounds.com/grid-search-with-metaflow/">sweeping over parameter grids</a> easily using foreaches, it hasn’t been easy to alter the flow itself, e.g. to change <a href="https://docs.metaflow.org/api/step-decorators/resources">@resources</a> or <a href="https://docs.metaflow.org/api/step-decorators/conda">@pypi/@conda</a> dependencies for every experiment.</p><p>In a typical case, you trigger a Metaflow flow that consumes a configuration file, changing <em>how</em> a run behaves. With Hydra, you can <a href="https://en.wikipedia.org/wiki/Inversion_of_control">invert the control</a>: it is Hydra that decides <em>what</em> gets run based on a configuration file. Thanks to Metaflow’s new <a href="https://docs.metaflow.org/metaflow/managing-flows/runner">Runner</a> and <a href="https://docs.metaflow.org/metaflow/managing-flows/deployer">Deployer</a> APIs, you can create a Hydra app that operates Metaflow programmatically, for instance to deploy and execute hundreds of variants of a flow in a large-scale experiment.</p><p><a href="https://docs.metaflow.org/metaflow/configuring-flows/config-driven-experimentation">Take a look at two interesting examples of this pattern</a> in the documentation. 
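<p>A minimal, stdlib-only sketch of the inverted control described above: a driver expands a base configuration over a grid of overrides (here CPU cores and tensor sizes, echoing the benchmark in the video) and hands each complete configuration to a ‘launch’ callback. In a real Hydra app, that callback is where Metaflow’s Runner or Deployer would be invoked; here ‘expand_grid’, ‘run_sweep’, ‘fake_launch’ and the ‘benchmark.tensor_size’ key are all hypothetical, so the sketch runs on its own.</p>

```python
import copy
import itertools
from typing import Any, Callable, Iterator

ConfigDict = dict[str, Any]


def expand_grid(base: ConfigDict, grid: dict[str, list[Any]]) -> Iterator[ConfigDict]:
    """Yield one complete config per point of the grid (hypothetical helper).

    Grid keys are dotted paths into the nested base config, e.g. "resources.cpu".
    """
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        variant = copy.deepcopy(base)  # variants must not share nested dicts
        for dotted, value in zip(keys, values):
            *parents, leaf = dotted.split(".")
            node = variant
            for part in parents:
                node = node.setdefault(part, {})  # create missing sections
            node[leaf] = value
        yield variant


def run_sweep(base: ConfigDict, grid: dict[str, list[Any]],
              launch: Callable[[ConfigDict], str]) -> list[str]:
    # Inversion of control: the driver owns the loop and decides what runs;
    # each launched flow only consumes the config it is handed.
    return [launch(cfg) for cfg in expand_grid(base, grid)]


def fake_launch(cfg: ConfigDict) -> str:
    # Hypothetical stub standing in for a Runner/Deployer call.
    return f"cpu={cfg['resources']['cpu']} size={cfg['benchmark']['tensor_size']}"


base_config: ConfigDict = {"resources": {"cpu": 1}, "benchmark": {"tensor_size": 128}}
grid = {"resources.cpu": [1, 4], "benchmark.tensor_size": [128, 1024]}

for line in run_sweep(base_config, grid, fake_launch):
    print(line)
```

<p>Because grid keys address the full configuration, a variant can change values such as ‘resources.cpu’ that a flow consumes in its decorators, which is the part a foreach over Parameters cannot reach.</p>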
As a teaser, this video shows Hydra orchestrating deployment of tens of Metaflow flows, each of which benchmarks PyTorch using a varying number of CPU cores and tensor sizes, updating a visualization of the results in real time as the experiment progresses:</p><a href="https://medium.com/media/e1e6d120dc74e75d9e52956b6cee7efe/href">https://medium.com/media/e1e6d120dc74e75d9e52956b6cee7efe/href</a><h3>Metaboosting Metaflow — based on a true story</h3><p>To give a motivating example of what configurations look like at Netflix in practice, let’s consider <em>Metaboost</em>, an internal Netflix CLI tool that helps ML practitioners manage, develop and execute their cross-platform projects, somewhat similar to the open-source Hydra discussed above but with specific integrations into the Netflix ecosystem. Metaboost is an example of an opinionated framework developed by a team already using Metaflow. In fact, part of the inspiration for introducing Configs in Metaflow came from this very use case.</p><p>Metaboost serves as a single interface to three different internal platforms at Netflix that manage ETL/Workflows (<a href="https://netflixtechblog.com/maestro-netflixs-workflow-orchestrator-ee13a06f9c78"><em>Maestro</em></a>), Machine Learning Pipelines (<a href="https://docs.metaflow.org/"><em>Metaflow</em></a>) and Data Warehouse Tables (<em>Kragle</em>). In this context, having a single configuration system to manage an ML project holistically gives users increased project coherence and decreased project risk.</p><h4>Configuration in Metaboost</h4><p>Ease of configuration and templatizing are core values of Metaboost. Templatizing in Metaboost is achieved through the concept of <em>bindings</em>, wherein we can <em>bind</em> a Metaflow pipeline to an arbitrary label, and then create a corresponding bespoke configuration for that label. 
The binding-connected configuration is then merged into a global set of configurations containing information such as the Git repository, branch, etc. Binding a Metaflow flow also signals to Metaboost that it should instantiate the flow once per binding in our orchestration cluster.</p><p>Imagine an ML practitioner on the Netflix Content ML team, sourcing features from hundreds of columns in our data warehouse, and creating a multitude of models against a <em>growing</em> suite of metrics. When a brand-new content metric comes along, with Metaboost, the first version of the metric’s predictive model can easily be created by simply swapping the target column against which the model is trained.</p><p>Subsequent versions of the model will result from experimenting with hyperparameters, tweaking feature engineering, or conducting feature diets. Metaboost’s bindings, and their integration with Metaflow Configs, can be leveraged to scale the number of experiments as fast as a scientist can create experiment-based configurations.</p><h4>Scaling experiments with Metaboost bindings — backed by Metaflow Config</h4><p>Consider a Metaboost ML project named ‘demo’ that creates and loads data into custom tables (ETL managed by Maestro), and then trains a simple model on this data (ML Pipeline managed by Metaflow). 
The project structure of this repository might look like the following:</p><pre>├── metaflows<br>│   ├── custom                               -&#62; custom python code, used by<br>&#124;   &#124;   &#124;                                       Metaflow<br>│   │   ├── data.py<br>│   │   └── model.py<br>│   └── training.py                          -&#62; defines our Metaflow pipeline<br>├── schemas<br>│   ├── demo_features_f.tbl.yaml             -&#62; table DDL, stores our ETL<br>&#124;   &#124;                                           output, Metaflow input<br>│   └── demo_predictions_f.tbl.yaml          -&#62; table DDL,<br>&#124;                                               stores our Metaflow output<br>├── settings<br>│   ├── settings.configuration.EXP_01.yaml   -&#62; defines the additive<br>&#124;   &#124;                                           config for Experiment 1<br>│   ├── settings.configuration.EXP_02.yaml   -&#62; defines the additive<br>&#124;   &#124;                                           config for Experiment 2<br>│   ├── settings.configuration.yaml          -&#62; defines our global<br>&#124;   &#124;                                           configuration<br>│   └── settings.environment.yaml            -&#62; defines parameters based on<br>&#124;                                               git branch (e.g. 
READ_DB)<br>├── tests<br>├── workflows<br>│   ├── sql<br>│   ├── demo.demo_features_f.sch.yaml        -&#62; Maestro workflow, defines ETL<br>│   └── demo.main.sch.yaml                   -&#62; Maestro workflow, orchestrates<br>&#124;                                               ETLs and Metaflow<br>└── metaboost.yaml                           -&#62; defines our project for<br>                                                Metaboost</pre><p>The settings directory above contains the following YAML configuration files:</p><pre># settings.configuration.yaml (global configuration)<br>model:<br>  fit_intercept: True<br>conda:<br>  numpy: '1.22.4'<br>  "scikit-learn": '1.4.0'</pre><pre># settings.configuration.EXP_01.yaml<br>target_column: metricA<br>features:<br>  - runtime<br>  - content_type<br>  - top_billed_talent</pre><pre># settings.configuration.EXP_02.yaml<br>target_column: metricA<br>features:<br>  - runtime<br>  - director<br>  - box_office</pre><p>Metaboost will merge each experiment configuration (<em>*.EXP*.yaml</em>) into the global configuration (settings.configuration.yaml) <em>individually</em> at Metaboost command initialization. 
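<p>The per-binding merge can be sketched as a recursive dictionary merge, applied once per experiment file. The dictionaries below transcribe the YAML files above (a YAML parser is omitted to keep the sketch stdlib-only), and ‘deep_merge’ is a hypothetical helper: the article does not describe Metaboost’s exact merge rules, so replacing lists wholesale rather than concatenating them is an assumption.</p>

```python
import copy
from typing import Any

ConfigDict = dict[str, Any]


def deep_merge(base: ConfigDict, override: ConfigDict) -> ConfigDict:
    """Return a new dict in which override's keys win (hypothetical helper).

    Nested dicts merge recursively; lists and scalars are replaced wholesale,
    which is an assumption about Metaboost's behavior.
    """
    merged = copy.deepcopy(base)  # never mutate the shared global config
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# settings.configuration.yaml (global configuration)
global_config: ConfigDict = {
    "model": {"fit_intercept": True},
    "conda": {"numpy": "1.22.4", "scikit-learn": "1.4.0"},
}

# settings.configuration.EXP_*.yaml, keyed by binding name
bindings: dict[str, ConfigDict] = {
    "EXP_01": {"target_column": "metricA",
               "features": ["runtime", "content_type", "top_billed_talent"]},
    "EXP_02": {"target_column": "metricA",
               "features": ["runtime", "director", "box_office"]},
}

# One merged config per binding; each would back one instance of training.py.
per_binding = {name: deep_merge(global_config, exp) for name, exp in bindings.items()}

for name, cfg in per_binding.items():
    print(name, cfg["target_column"], cfg["features"])
```

<p>Merging each binding separately, rather than folding all experiment files into one, is what yields two independent configurations and therefore two instances of training.py.</p>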
Let’s take a look at how Metaboost combines these configurations, using a Metaboost command:</p><pre>(venv-demo) ~/projects/metaboost-demo [branch=demoX] <br>$ metaboost metaflow settings show --yaml-path=configuration<br><br>binding=EXP_01:<br>model:                     -&#62; defined in settings.configuration.yaml (global)<br>  fit_intercept: true<br>conda:                     -&#62; defined in settings.configuration.yaml (global)<br>  numpy: 1.22.4<br>  "scikit-learn": 1.4.0<br>target_column: metricA     -&#62; defined in settings.configuration.EXP_01.yaml<br>features:                  -&#62; defined in settings.configuration.EXP_01.yaml<br>- runtime<br>- content_type<br>- top_billed_talent<br><br>binding=EXP_02:<br>model:                     -&#62; defined in settings.configuration.yaml (global)<br>  fit_intercept: true<br>conda:                     -&#62; defined in settings.configuration.yaml (global)<br>  numpy: 1.22.4<br>  "scikit-learn": 1.4.0<br>target_column: metricA     -&#62; defined in settings.configuration.EXP_02.yaml<br>features:                  -&#62; defined in settings.configuration.EXP_02.yaml<br>- runtime<br>- director<br>- box_office</pre><p>Metaboost understands it should deploy/run two independent instances of training.py: one for the EXP_01 binding and one for the EXP_02 binding. As the project listing below shows, Metaboost is also aware that the tables and ETL workflows are <em>not bound</em> and should only be deployed once. 
These details of which artifacts to bind and which to leave unbound are encoded in the project’s top-level metaboost.yaml file.</p><pre>(venv-demo) ~/projects/metaboost-demo [branch=demoX] <br>$ metaboost project list<br><br>Tables (metaboost table list):<br>schemas/demo_predictions_f.tbl.yaml (binding=default):<br>    table_path=prodhive/demo_db/demo_predictions_f<br>schemas/demo_features_f.tbl.yaml (binding=default):<br>    table_path=prodhive/demo_db/demo_features_f<br><br>Workflows (metaboost workflow list):<br>workflows/demo.demo_features_f.sch.yaml (binding=default):<br>    cluster=sandbox, workflow.id=demo.branch_demox.demo_features_f<br>workflows/demo.main.sch.yaml (binding=default):<br>    cluster=sandbox, workflow.id=demo.branch_demox.main<br><br>Metaflows (metaboost metaflow list):<br>metaflows/training.py (binding=EXP_01): -&#62; EXP_01 instance of training.py<br>    cluster=sandbox, workflow.id=demo.branch_demox.EXP_01.training   <br>metaflows/training.py (binding=EXP_02): -&#62; EXP_02 instance of training.py<br>    cluster=sandbox, workflow.id=demo.branch_demox.EXP_02.training</pre><p>Below is a simple Metaflow pipeline that fetches data, executes feature engineering, and trains a LinearRegression model. 
The work to integrate Metaboost Settings into a user’s Metaflow pipeline (implemented using Metaflow Configs) is as easy as adding a single mix-in to the FlowSpec definition:</p><pre>from metaflow import FlowSpec, Parameter, conda_base, step<br>from custom.data import feature_engineer, get_data<br>from metaflow.metaboost import MetaboostSettings<br><br>@conda_base(<br>    libraries=MetaboostSettings.get_deploy_time_settings("configuration.conda")<br>)<br>class DemoTraining(FlowSpec, MetaboostSettings):<br>    prediction_date = Parameter("prediction_date", type=int, default=-1)<br><br>    @step<br>    def start(self):<br>        # show_settings() comes for free with the mixin<br>        # and prints convenient debugging info<br>        self.show_settings(exclude_patterns=["artifact*", "system*"])<br><br>        self.next(self.get_features)<br><br>    @step<br>    def get_features(self):<br>        # run feature engineering on the extracted data<br>        self.fe_df = feature_engineer(<br>            # loads data from our ETL pipeline<br>            data=get_data(prediction_date=self.prediction_date),<br>            features=self.settings.configuration.features +<br>                [self.settings.configuration.target_column]<br>        )<br><br>        self.next(self.train)<br><br>    @step<br>    def train(self):<br>        from sklearn.linear_model import LinearRegression<br><br>        # trains our model<br>        self.model = LinearRegression(<br>            fit_intercept=self.settings.configuration.model.fit_intercept<br>        ).fit(<br>            X=self.fe_df[self.settings.configuration.features],<br>            y=self.fe_df[self.settings.configuration.target_column]<br>        )<br>        print(f"Fit slope: {self.model.coef_[0]}")<br>        print(f"Fit intercept: {self.model.intercept_}")<br><br>        self.next(self.end)<br><br>    @step<br>    def end(self):<br>        pass<br><br><br>if __name__ == "__main__":<br>    DemoTraining()</pre><p>The Metaflow Config 
is added to the FlowSpec by mixing in the MetaboostSettings class. Referencing a configuration value is as easy as using the dot syntax to drill into whichever parameter you’d like.</p><p>Finally, let’s take a look at the output from our sample flow above. We execute experiment EXP_01 with</p><pre>metaboost metaflow run --binding=EXP_01</pre><p>which, upon execution, merges the configurations into a single <em>settings</em> file (shown previously) and serializes it as a YAML file to the <em>.metaboost/settings/compiled/</em> directory.</p><p>You can see the actual command and arguments that were run as a subprocess in the <em>Metaboost Execution</em> section below. Note the <strong>--config</strong> argument pointing to the serialized YAML file, whose values are subsequently accessible via <strong>self.settings</strong>. Also note the convenient printing of configuration values to stdout during the start step, using a mixed-in function named <strong>show_settings()</strong>.</p><pre>(venv-demo) ~/projects/metaboost-demo [branch=demoX] <br>$ metaboost metaflow run --binding=EXP_01<br><br>Metaboost Execution: <br> - python3.10 /root/repos/cdm-metaboost-irl/metaflows/training.py<br>   --no-pylint --package-suffixes=.py --environment=conda<br>   --config settings<br>   .metaboost/settings/compiled/settings.branch_demox.EXP_01.training.mP4eIStG.yaml<br>   run --prediction_date20241006<br><br>Metaflow 2.12.39+nflxfastdata(2.13.5);nflx(2.13.5);metaboost(0.0.27)<br>  executing DemoTraining for user:dcasler<br>Validating your flow...<br>    The graph looks good!<br>Bootstrapping Conda environment... 
(this could take a few minutes)<br>All packages already cached in s3.<br>All environments already cached in s3.<br><br>Workflow starting (run-id 50), see it in the UI at<br>https://metaflowui.prod.netflix.net/DemoTraining/50<br><br>[50/start/251640833] Task is starting.<br>[50/start/251640833] Configuration Values:<br>[50/start/251640833]   settings.configuration.conda.numpy            = 1.22.4<br>[50/start/251640833]   settings.configuration.features.0             = runtime<br>[50/start/251640833]   settings.configuration.features.1             = content_type<br>[50/start/251640833]   settings.configuration.features.2             = top_billed_talent<br>[50/start/251640833]   settings.configuration.model.fit_intercept    = True<br>[50/start/251640833]   settings.configuration.target_column          = metricA<br>[50/start/251640833]   settings.environment.READ_DATABASE            = data_warehouse_prod<br>[50/start/251640833]   settings.environment.TARGET_DATABASE          = demo_dev<br>[50/start/251640833] Task finished successfully.<br><br>[50/get_features/251640840] Task is starting.<br>[50/get_features/251640840] Task finished successfully.<br><br>[50/train/251640854] Task is starting.<br>[50/train/251640854] Fit slope: 0.4702672504331096<br>[50/train/251640854] Fit intercept: -6.247919678070083<br>[50/train/251640854] Task finished successfully.<br><br>[50/end/251640868] Task is starting.<br>[50/end/251640868] Task finished successfully.<br><br>Done! See the run in the UI at<br>https://metaflowui.prod.netflix.net/DemoTraining/50</pre><h4>Takeaways</h4><p>Metaboost is an integration tool that aims to ease the project development, management and execution burden of ML projects at Netflix. 
It employs a configuration system that combines Git-based parameters, global configurations and arbitrarily <em>bound</em> configuration files for use during execution against internal Netflix platforms.</p><p>Integrating this configuration system with the new Config in Metaflow is incredibly simple (by design), only requiring users to add a mix-in class to their FlowSpec (<a href="https://docs.metaflow.org/metaflow/configuring-flows/custom-parsers#including-default-configs-in-flows">similar to this example in the Metaflow documentation</a>) and then reference the configuration values in steps or decorators. The example above templatizes a training flow for the sake of experimentation, but users could just as easily use bindings/configs to templatize their flows across target metrics, business initiatives or any other arbitrary lines of work.</p><h3>Try it at home</h3><p>It couldn’t be easier to get started with Configs! Just</p><pre>pip install -U metaflow</pre><p>to get the latest version and <a href="https://docs.metaflow.org/metaflow/configuring-flows/introduction">head to the updated documentation</a> for examples. 
If you are impatient, you can find and execute <a href="https://github.com/outerbounds/config-examples">all config-related examples in this repository</a> as well.</p><p>If you have any questions or feedback about Config (or other Metaflow features), you can reach out to us at the <a href="http://chat.metaflow.org/">Metaflow community Slack</a>.</p><h3>Acknowledgments</h3><p>We would like to thank <a href="https://outerbounds.co/">Outerbounds</a> for their collaboration on this feature, and for rigorously testing it and developing a repository of examples that showcases some of the possibilities it offers.</p><hr><p><a href="https://netflixtechblog.com/introducing-configurable-metaflow-d2fb8e9ba1c6">Introducing Configurable Metaflow</a> was originally published in <a href="https://netflixtechblog.com/">Netflix TechBlog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Open-Sourcing a Monitoring GUI for Metaflow</title>
		<link>https://noise.getoto.net/2021/10/28/open-sourcing-a-monitoring-gui-for-metaflow/</link>
		
		<dc:creator><![CDATA[Netflix Technology Blog]]></dc:creator>
		<pubDate>Wed, 27 Oct 2021 21:58:06 +0000</pubDate>
				<category><![CDATA[machine learning]]></category>
		<category><![CDATA[metaflow]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://medium.com/p/75ff465f0d60</guid>

					<description><![CDATA[Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform. tl;dr Today, we are open-sourcing a long-awaited GUI for Metaflow. The Metaflow GUI allows data scientists to monitor their workflows in real-time, track experiments, and see detailed lo...]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
	</channel>
</rss>
