Nova Weekly Team Meeting (R21)

Meeting log here:

  • June 2: newton-1, non-priority spec approval freeze
  • Keystone Fernet token as default was reverted in Devstack to resolve build issues
  • We have 59 approved blueprints: 6 are completed, 5 have not started, 3 are blocked
  • BDM v1 and miscellaneous API deprecation to be grouped into the extensions cleanup spec


Cells v2

  • Need help with cells v2 testing in the gate
  • Currently investigating if using transport_url (if configured) could make upgrading easier
    • Also discussion around deprecating the old driver specific messaging options to encourage deployers to set transport_url
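
To make the transport_url idea concrete, here is a minimal sketch of how a single URL could be assembled from the kind of values deployers set in the older driver-specific options. The function name and its parameters are hypothetical and only loosely mirror the older RabbitMQ-style settings. The treatment of the default virtual host as an empty path is an assumption, not a statement about how the messaging library parses it.

```python
from urllib.parse import quote


def build_transport_url(host, port=5672, userid="guest", password="guest",
                        virtual_host="/", driver="rabbit"):
    """Assemble a transport_url string from per-driver style settings.

    Hypothetical helper for illustration only; it is not part of Nova
    or of any messaging library.
    """
    # Percent-encode credentials so characters such as '@' or '/' cannot
    # be mistaken for URL delimiters.
    user = quote(userid, safe="")
    secret = quote(password, safe="")
    # Assumption: the virtual host becomes the URL path, with the leading
    # slash dropped (so the default "/" yields an empty path).
    vhost = quote(virtual_host.lstrip("/"), safe="")
    return f"{driver}://{user}:{secret}@{host}:{port}/{vhost}"


print(build_transport_url("mq.example.org", userid="nova", password="s3cr@t"))
```

Having one value to set instead of several is what would make the upgrade path simpler for deployers.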


Live Migration

  • Experimental test coverage in the gate has access to the latest qemu and libvirt via Ubuntu 16.04
  • Also increased CI coverage for storage backends
  • Many of the changes to storage pools have merged




Newton Design Summit Recap, Part 1: Scheduler and Cells

The OpenStack Design Summit was held April 25-29, 2016, in Austin, TX. OpenStack contributors got together to discuss key issues and to plan for the Newton release. This is the first in a series providing a summary of key items that were discussed by the Nova team during that time.

Note that these recaps will be Nova-centric, focusing primarily on sessions and discussions pertaining to Nova interests. Also note that these are not comprehensive recaps, and links to additional resources will be provided later in the articles for reference.

The Nova PTL, Matt Riedemann, wrote up summaries for key Nova sessions. For each item, I’ll provide the link to the openstack-dev mailing list archive of his summary and some TL;DR bullet points.

Nova Newton Priorities Tracking

The Nova team maintains an etherpad with an updated list of items to review. This helps the team maintain focus on key issues during the cycle.

All Open Specifications

Here is a link to all open specifications in Nova that need reviewing. As you can see, there are a lot, so cross referencing with the Priorities etherpad linked above is useful to narrow things down.



Scheduler

The long-term roadmap is to move the Scheduler out of Nova and to modularize it so that it can use external placement decision libraries. This is a long process, with incremental changes happening each cycle, keeping backwards compatibility in mind and minimizing the impact on end users as these changes happen.

Key Points

PCI and NUMA database differences need to be addressed in order to move forward

  • NUMA is stored differently than PCI in the database
    • NUMA topology, including compute node information (capacity, allocation, usage, etc), is stored as a JSON blob in a single field, compute_nodes.NUMA_topology
    • PCI device information is stored in a separate table
    • PCI requests for an instance are stored in instance_extra
  • Goal is to split those things into an inventories table and allocations table
  • Caller should not be aware of any backend changes, resource tracker needs to still show the same values
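
As a rough illustration of the inventories/allocations split described above, here is a minimal in-memory sketch. The class and field names are assumptions made for this example and are not the actual Nova schema. The idea is that capacity and usage live in two uniform record types instead of a per-resource blob or table.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """How much of one resource class a provider offers (hypothetical)."""
    resource_provider_id: int
    resource_class: str  # e.g. "VCPU", "MEMORY_MB", "DISK_GB"
    total: int
    reserved: int = 0


@dataclass
class Allocation:
    """How much of one resource class a consumer is using (hypothetical)."""
    resource_provider_id: int
    consumer_id: str
    resource_class: str
    used: int


def usage(inventories, allocations, provider_id, resource_class):
    """Return (capacity, used) for one provider and resource class."""
    capacity = sum(i.total - i.reserved for i in inventories
                   if i.resource_provider_id == provider_id
                   and i.resource_class == resource_class)
    used = sum(a.used for a in allocations
               if a.resource_provider_id == provider_id
               and a.resource_class == resource_class)
    return capacity, used


inventories = [Inventory(1, "VCPU", total=16, reserved=2)]
allocations = [Allocation(1, "instance-a", "VCPU", 4),
               Allocation(1, "instance-b", "VCPU", 2)]
print(usage(inventories, allocations, 1, "VCPU"))  # (14, 6)
```

A caller asking for usage sees the same numbers regardless of how the backend stores them, which is the compatibility goal noted in the list.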

The allocations/inventories table will go into the API database

  • Deployers and operators were already unhappy about the new API db, so adding yet another new db would make them even more unhappy
  • Ultimately Scheduler will be moved into its own thing, but this is an interim solution

Capabilities are still undefined

  • Proposal: a capability is a single value representing a specific feature
  • Proposal: Create a set of enum classes that very distinctly describe what a particular capability is
    • We need to consolidate the different values from different resources
      • eg, libvirt returns a feature flag, vmware returns something else, we need to provide a unified value to provide to the caller
  • This is what is currently proposed:
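
Separately from the actual proposal, the enum idea can be sketched roughly as follows. Everything here is hypothetical: the enum members, the driver names used as keys, and especially the driver-specific feature strings are invented for illustration and do not come from any spec or driver.

```python
from enum import Enum


class CpuCapability(Enum):
    """Hypothetical unified names, one value per specific feature."""
    AVX = "cpu:avx"
    AES = "cpu:aes"


# Invented per-driver vocabularies mapped onto the shared enum. Real
# drivers report features differently; these strings are placeholders.
DRIVER_FEATURE_MAP = {
    "libvirt": {"avx": CpuCapability.AVX, "aes": CpuCapability.AES},
    "vmware": {"cpuid.AVX": CpuCapability.AVX, "cpuid.AES": CpuCapability.AES},
}


def normalize(driver, raw_features):
    """Translate driver-specific feature names into unified capabilities."""
    mapping = DRIVER_FEATURE_MAP[driver]
    # Unknown feature names are skipped rather than guessed at.
    return {mapping[name] for name in raw_features if name in mapping}


print(normalize("libvirt", ["avx", "unknown-flag"]))
```

The caller only ever sees CpuCapability values, whichever driver produced them.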


Completed in Mitaka

  • Resource Providers Database Schema
  • Online data migration for inventory (capacity, reserve amounts, cpu, disk, etc)

Slated for Newton

  • Migrating allocation fields for instances
    • blueprint:  resource-providers-allocations
  • Define what a capability is (and isn’t) and define a standard representation
    • blueprint: standardize-capabilities
  • Generic Resource Pool Object Modeling + REST API
    • blueprint:  generic-resource-pools
  • Cleanup around PCI device handling and migration of PCI fields
  • Migration of NUMA fields

Additional Links

Cells v2


One of the biggest challenges Nova faces is its own upper limit. The concept of “cells”, or the idea of many independent compute “containers” running simultaneously, was born as a solution to this problem. Unfortunately, the Cells v1 effort was not successful at addressing this issue, but some important lessons were learned. Cells v2 is the current attempt to solve the compute scalability issue and to generally improve the Nova code base as a handy side-effect.

Andrew Laski gave a fantastic talk providing an overview of Cells v1 and the plan for Cells v2 in Newton. This talk provides some great context and high level architecture. You can view the talk here.

One key difference between Cells v1 and Cells v2 is that Cells v1 implemented the cells architecture as an alternate path (with all the complexity maintaining an alternate path brings) whereas Cells v2 will be *the only* path. The default would be a single cell deployment with one Compute instance living in one cell.

Key concepts:

  • The API cell is the cell responsible for running the Nova API to handle requests to instances (even ones located in other cells)
    • Has its own API Database
    • The Scheduler lives in this cell and manages instance “scheduling” from here
  • Cell 0 is a special cell that lives outside of the regular cell hierarchy.
    • It is the default location in the event of instance “scheduling” failures
    • Cell 0 is also the default cell in a single-cell deployment (ie, Devstack)
    • Cell 0 can be combined with the API cell for simplicity
  • In Cells v2, managing instance requests requires additional data in order to route messages to the appropriate cell.
    • Instead of just looking up the hostname of the compute node where an instance lives (what we currently do), we will also need the database and queue connection information.
  • Database is split between “local” and “global” data.
    • Local – stuff that only the compute nodes within the cell need to know about
    • Global – stuff the Nova API (living in the API cell) needs to know about to handle instance requests
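
The routing idea in the key concepts above can be sketched like this. It is a toy model under stated assumptions: the class names, field names, and connection strings are made up for this example and are not Nova's actual objects. The point is only that the API cell keeps enough "global" data to find the database and queue for whichever cell hosts an instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMapping:
    """Connection details the API cell needs to reach one cell."""
    name: str
    database_connection: str
    transport_url: str


class ApiDatabase:
    """Stand-in for the "global" data kept in the API database."""

    def __init__(self):
        self._cells = {}
        self._instance_cells = {}

    def add_cell(self, cell):
        self._cells[cell.name] = cell

    def map_instance(self, instance_uuid, cell_name):
        if cell_name not in self._cells:
            raise KeyError(f"unknown cell: {cell_name}")
        self._instance_cells[instance_uuid] = cell_name

    def route(self, instance_uuid):
        """Return the database and queue details for an instance's cell."""
        return self._cells[self._instance_cells[instance_uuid]]


api_db = ApiDatabase()
api_db.add_cell(CellMapping("cell1", "mysql://db1.example.org/nova_cell1",
                            "rabbit://mq1.example.org:5672/"))
api_db.map_instance("uuid-1234", "cell1")
target = api_db.route("uuid-1234")
print(target.database_connection, target.transport_url)
```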

Completed in Mitaka

  • Created the API database
  • Database connection switching – tell Nova which database connection to use
  • Tools to help with upgrades
  • Scheduling Interaction focused items
    • Implement BuildRequest object + storage in database
      • Persist instance data when a boot request is received by the API but the instance hasn’t been created and written to the database yet
    • RequestSpec Object to persist instance details needed for allocation
      • Used internally by the BuildRequest object and contains the specific “scheduling” details
    • Make cell id nullable
      • A null cell_id defines a state where a boot request was received, but the instance still needs to be allocated to a cell. Because there is no cell_id, information needs to come from the BuildRequest object rather than from the database.
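
Here is a small sketch of the null cell_id case described in the last item. The function and the dictionary layout are hypothetical and exist only for this example. It shows the fallback: with no cell assigned yet, the data comes from the persisted build request, and otherwise from the cell's own database.

```python
def lookup_instance(instance_uuid, instance_mappings, build_requests,
                    cell_databases):
    """Find instance data whether or not a cell has been assigned yet."""
    cell_id = instance_mappings[instance_uuid]
    if cell_id is None:
        # Boot request accepted but not yet placed in a cell: the only
        # copy of the instance data is the persisted build request.
        return {"source": "build_request", **build_requests[instance_uuid]}
    # Otherwise the instance record lives in that cell's own database.
    return {"source": cell_id, **cell_databases[cell_id][instance_uuid]}


instance_mappings = {"uuid-new": None, "uuid-old": "cell1"}
build_requests = {"uuid-new": {"flavor": "m1.small"}}
cell_databases = {"cell1": {"uuid-old": {"flavor": "m1.large"}}}

print(lookup_instance("uuid-new", instance_mappings, build_requests,
                      cell_databases))
print(lookup_instance("uuid-old", instance_mappings, build_requests,
                      cell_databases))
```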

Slated for Newton

  • Data Migration to the API database
    • Flavors
    • Aggregates
    • Key Pairs
    • Quotas
  • Implementation of Cell 0
  • Creation of additional upgrade tools
  • Message Queue connection switching
    • specify which message queue a message should go to
  • Start work on multiple cells support

Additional Links

Editor’s Note: Please feel free to contact me or post comments with any corrections!

API-Ref Daily Update

No change from yesterday, but we’re chugging along!

Sean Dague recently pushed some changes to the Burndown chart to include a table of what still needs to be done. The data for this chart is available in both JSON and text formats. Now I can just import that data into my spreadsheet without having to manually tweak things, making this whole thing go a lot quicker.

Api-ref status for May 17, 2016

Click here for more information about the API-Ref project.

API-Ref Daily Update

Recently, Nova moved our API documentation from WADL XML format to RST, a human readable text markup common in the Python community. We also moved these documents into our Nova source tree and out of the central documentation repository. As part of this effort, we are verifying the correctness of this API documentation. See the full details from the original mailing list post.

Due to the large collaborative effort, keeping track of what’s being done is pretty difficult. I started a spreadsheet populated with data from a script, originally written by Sean Dague, that I modified. Because the commits are all over the place, I have to manually parse and enter much of the data. I run the script in the morning to check for updates and then update the main grid with anything new that’s being worked on. It currently takes me about 15 minutes, which is why I haven’t tried to automate more than I already have.

I’ll post the updated chart here each morning when I update my spreadsheet.

Api-ref status for May 13, 2016


Summaries In Progress!

Just a quick update: it’s been a pretty busy week and I have a pile of half-finished summaries sitting in my Drafts folder. I’m making Friday my official “Nova Rollup Day”. By early next week you should have a nice pile of highlights of what’s going on in the Nova developer community!

Nova Weekly Team Meeting (R22)

Newton Milestone 1 is June 1
  • Deadline for non-priority specs


Devstack switched to using fernet tokens by default on 4/29, which is causing race failures in the identity tests
  • porting that to Python 3 is going to be a massive undertaking due to the large amount of mox in the base class
    • note: mox does not work in Python 3

Team Reminders

  • Bug numbers are rising; if you have time, please help triage!
  • Api-ref doc sprint: help review or update the api-ref docs

Subteam Updates


Cells v2

  • melwitt is working on mq switching
  • ccarmack is working on testing
  • everyone else is working on db migrations
  • add-buildrequest-obj series


Scheduler

  • spec for generic-resource-pools is top priority this week

Live Migration

  • went over Summit review and priorities


  • deletion of the legacy v2 code in tree: paste.ini entries are fully removed now, as well as the functional tests for the legacy code
  • will be a series of test teardowns, then the final code removal over the next couple of weeks
  • getting the policy in code specs reviewed


  • NFV documentation improvements
  • Mellanox CI is moving to containers for their CI, similar to how the Intel NFV CI does it



Nova Developer Rollup #2

I’m still working on Nova Session Summaries (aka the Specification Reviewers’ Cheatsheet) but here are some mailing list activity summaries covering May 2 through May 3, 2016.

Libvirt Version Requirements

Driver Review Dashboards

  • Sean Dague put together a draft Gerrit dashboard to improve review coverage for changes that expand driver CI support. We have some CI drivers that are non-voting because they are behaving inconsistently. We want to fix these issues and we want to make sure the relevant changes get looked at by the team.
    • Right now it only filters on reviews that have at least a +1 from specific key reviewers. Should it also filter on “starred_by”?
    • Are these currently the right people for each driver?
    • Should these be made voting again (ie, +1 required from relevant CI to pass all tests)? If so, how soon could we potentially do that?

Next Minimum Libvirt Version

  • We should increase the MIN_LIBVIRT_VERSION to 1.2.8. This will mean that NUMA support in libvirt (excepting the blacklists) and huge page support is assumed on x86_64.
    • If we ignore RHEL 7.1, we can bump to 1.2.9
      • Is that really OK?
        • Yes, when Ocata is released, RHEL 7.3 will be the latest update, so people shouldn’t be continuing to deploy on RHEL 7.1
        • We should be aiming to target $current & $current-1 RHEL-7 update releases
  • Where can I find details of what RPM versions are present in SLES releases?
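
For readers less familiar with how a minimum version gate works, here is a minimal sketch, assuming versions are compared as tuples of integers. The helper names are hypothetical, and the constant simply reuses the 1.2.8 value proposed above; this is not the driver's actual check.

```python
MIN_LIBVIRT_VERSION = (1, 2, 8)  # value proposed in the discussion above


def parse_version(text):
    """Turn a dotted version string such as "1.2.9" into (1, 2, 9)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=MIN_LIBVIRT_VERSION):
    """Return True when the installed version is at least the minimum."""
    # Tuples compare element by element, so (1, 10, 0) > (1, 2, 8),
    # which a plain string comparison would get wrong.
    return parse_version(installed) >= minimum


for candidate in ("1.2.2", "1.2.8", "1.2.9", "1.10.0"):
    print(candidate, meets_minimum(candidate))
```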

Citrix XenServer CI Server Update

SR-IOV/PCI Passthrough/NFV Meeting Update

  • Editor’s Note: sorry for the brevity here, with everything else on my plate this week, I didn’t have a lot of time to do context research!
  • Meetings are now weekly
  • Current focus: Improving SR-IOV/PCI Passthrough/NFV testing and documentation
    • lennyb and wznoinsk are moving Mellanox CI to containers
    • Need owners for Multinode CI for SR-IOV/PCI Passthrough/NFV and PF Passthrough
    • lbeliveau and moshele are working on improving PCI Passthrough SR-IOV Documentation
    • sfinucan is working on improving NUMA and cpu pinning Documentation
    • Editor’s Note: New Contributor alert! These documentation improvements could be good things for new folks to review!!

Api-ref Documentation Review Sprint

Compute Node instance_dir

  • Should the instance_dir be the same for all compute nodes?
    • Yes, because the target host doesn’t get to decide where its files are copied to. When using libvirt to do cold migration/resize the source host creates the instance directory on the target host using ssh and then copies the files to it using scp. Note that this is a security flaw that should be addressed in the libvirt storage pools work.

Administrative Level Commands in the OpenStack Client

  • Will OpenStack Client support administrative and service level commands?
    • For Nova Scheduler work this release, the placement API associated with resource pools will be developed in a decoupled fashion within the Nova tree because it will eventually live outside of Nova.
    • CLIs are supposed to plug into the OpenStack Client
    • It’s still unclear what the scope of the OpenStack Client is. Will it support admin and service level usage, or is it just for end users?

API Case Sensitivity

  • During Mitaka, an issue with MySQL was discovered where case-sensitivity is ignored for text fields. In the case of metadata keys for both instances and aggregates, this led either to a unique key violation error or to deletion and retrieval of multiple values.
  • The solution discussed at the summit is to update the input validation so only lower case values are accepted, provide tools to administrators to correct any inconsistencies on the backend, and to set any bugs reported around metadata case inconsistency issues as WONTFIX.
  • The spec is here:
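
As a rough illustration of the approach discussed above (and not taken from the spec itself), here is a sketch of the two pieces: rejecting keys that are not lower case, and finding existing keys that collide once case is ignored. Both function names are hypothetical.

```python
def validate_metadata_key(key):
    """Accept only lower-case keys, as in the proposed input validation."""
    if key != key.lower():
        raise ValueError(f"metadata key must be lower case: {key!r}")
    return key


def find_case_collisions(keys):
    """Group existing keys that differ only by case.

    A cleanup tool for administrators could start from a report like
    this; the grouping logic here is only an illustration.
    """
    groups = {}
    for key in keys:
        groups.setdefault(key.lower(), []).append(key)
    return {folded: originals for folded, originals in groups.items()
            if len(originals) > 1}


print(find_case_collisions(["Foo", "foo", "bar"]))  # {'foo': ['Foo', 'foo']}
```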