NVIDIA NEO User Manual and Release Notes v2.7.20

Services

The Services window enables simple configuration and continuous validation of services in the fabric. For each type of service, service instances can be created providing a clear visualization for the state of the services and their underlying components. A bring-up wizard can further simplify the configuration of the network by allowing the user to provide in a few minimal steps all the input required for bringing up the network from scratch.

services_2.5.png

The five available service types are:

  • Bring Up

  • Virtual Modular Switch

    • VMS

    • L3 Network Provisioning

  • MLAG

  • MTU

  • RoCE

The service types and configurations are divided in the Service view as seen in the figure above, and an Add button, as well as a help button, are available for each one of them.

Warning

The configuration and cleanup commands generated for the services assume that the switches have no prior configuration. Prior configuration may cause some of the commands to fail and lead to inconsistent configuration on the switches.

Bring Up

NEO enables a quick network bring-up that includes all the required configurations in one easy process. The user only needs to provide minimal input in the bring-up wizard for the types of configuration needed. All configuration steps are optional. Clicking “BringUp Wizard” opens the wizard for user input.

Warning

The wizard works on Onyx switch systems.

Warning

A configuration snapshot is taken for the devices participating in the bring-up before any configuration is done. This snapshot can be used to revert all the bring-up configuration changes.

Device Access

In this tab, the user can fill out the Device Access information for the device types participating in the bring-up.

Warning

This updates the global credentials for the selected system type.

Device_Access_2.5.png


Integration

In this tab, the user can define integration with various hypervisors. This new capability helps NEO acquire information about the VMs running on them and handle VM lifecycle events to properly configure VLAN on the switches.

  • Host Bond Configuration – the user can select the type of bonds that are being used on the hosts. If LACP bond configuration is used, NEO will suggest creating MPOs (see MLAG Port Channels) according to the links it detected on the switches.

  • VLAN Provisioning Port Mode – the user can select which port mode to assign to the switch ports or MPOs (according to what the user selected in the Host Bond Configuration section above). The options are hybrid, trunk, or default (which lets NEO use the current switch port mode configuration). This setting applies when NEO handles VM lifecycle events and changes the switch VLAN configuration accordingly.

Integration_2.5.png

VMware vCenter DVS Configuration

In this section the user can define VMware vCenter connectivity information. NEO uses it to get information from the vCenter regarding VM information and lifecycle events.

The VLAN Provisioning drop down contains the following options:

  • Disabled – VM lifecycle events will not be handled. NEO will only retrieve VM information from vCenter.

  • Global VLAN provisioning – NEO will listen to network events. In case of a network change event (e.g. adding or removing a network), NEO will add or remove VLANs to/from all switch ports. VLANs will be removed from the ports but will not be removed from the switch.

    Warning

    This is the recommended VLAN provisioning mode when working with Live Migration.

    In this mode, VLAN auto-provisioning is performed upon network creation (before the VM migration event), and therefore prevents traffic loss.

  • Per port VLAN provisioning – NEO will listen to VM lifecycle events. In case of a VM change (e.g. VM added, removed or migrated) that requires changes in VLANs, NEO will add or remove the VLAN accordingly on the relevant switch ports.
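The difference between the three VLAN provisioning modes can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not NEO code: the function name, the event names ("network", "vm") and the port names are all hypothetical, and the sketch assumes that each mode reacts only to the event type it is said to listen to.

```python
from enum import Enum


class VlanProvisioning(Enum):
    DISABLED = "Disabled"
    GLOBAL = "Global VLAN provisioning"
    PER_PORT = "Per port VLAN provisioning"


def ports_to_update(mode, event_kind, all_ports, vm_ports):
    """Return the switch ports whose VLAN membership would change.

    event_kind is "network" (a network was added or removed) or "vm"
    (a VM was added, removed or migrated). Both names are hypothetical.
    """
    if mode is VlanProvisioning.GLOBAL and event_kind == "network":
        return sorted(all_ports)   # every switch port follows the network change
    if mode is VlanProvisioning.PER_PORT and event_kind == "vm":
        return sorted(vm_ports)    # only the ports relevant to the affected VM
    return []                      # Disabled, or an event this mode ignores


all_ports = {"Eth1/1", "Eth1/2", "Eth1/3"}
vm_ports = {"Eth1/2"}
print(ports_to_update(VlanProvisioning.GLOBAL, "network", all_ports, vm_ports))
print(ports_to_update(VlanProvisioning.PER_PORT, "vm", all_ports, vm_ports))
print(ports_to_update(VlanProvisioning.DISABLED, "vm", all_ports, vm_ports))
```

In the global mode the VLAN is already present on every port before a VM moves, which is why that mode is the one recommended for Live Migration.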

After filling in the vCenter IP address, port, username and password, the user should click the Connect button to make sure the details are correct and NEO can connect to the vCenter. If the connection is successful, a list of clusters managed by the vCenter will be shown in the Clusters table. The user should check the clusters that NEO should manage.

Warning

The Connect button should be clicked after every change so the new information will be processed by NEO.

VMware_vCenter_DVS_Configuration_2.5.png


Nutanix Prism AHV Configuration

In this section the user can define the Nutanix Prism Central and the Prism Element connectivity information. NEO uses it to get information from Prism regarding devices, VM information and lifecycle events. For further information, refer to NEO-Nutanix Prism Plug-in.

Nutanix_Prism_AHV_Configuration_2.5.png

  • The VLAN Provisioning drop down contains the following options:

    • Disabled – VM lifecycle events will not be handled. NEO will only retrieve VM information from Prism.

    • Global VLAN provisioning – NEO will listen to network events. In case of a network change event (e.g. adding or removing a network), NEO will add or remove VLANs to/from all switch ports. VLANs will be removed from the ports but will not be removed from the switch.

      Warning

      This is the recommended VLAN provisioning mode when working with Live Migration.

      In this mode, VLAN auto-provisioning is performed upon network creation (before the VM migration event), and therefore prevents traffic loss.

    • Per port VLAN provisioning – NEO will listen to VM lifecycle events. In case of a VM change (e.g. VM added, removed or migrated) that requires changes in VLANs, NEO will add or remove the VLAN accordingly on the relevant switch ports.

  • Prism Central - IP, port, username and password are used to connect to the Prism Central. In case of working without Prism Central, put the Prism Element details instead.

  • Prism Elements Credentials – here the user should fill in the username and password of each Prism Element in the network. Use “default” to apply the same credentials to all Prism Elements, or specify credentials per Prism Element IP.

    After filling in the Prism Central IP address, port, username and password, and the Prism Element credentials, the user should click the Connect button to make sure the details are correct and NEO can connect to Prism. If the connection is successful, the switches and Nutanix hosts known to Prism will be added to NEO. This might take a couple of minutes.

    Warning

    The Connect button should be clicked after every change so the new information will be processed by NEO.

    When enabling VLAN provisioning, the user can also set some advanced properties that affect the communication with Prism.

Device Discovery

In this tab, the user can list the switches that need to be configured. The switches are organized in pairs, so MLAG can be created from each pair. NVIDIA® NEO® can automatically detect MLAG switch pairs that fulfill the connectivity prerequisites and move them to the “Selected” table.

Warning

MLAG configuration may be skipped by using the "Proceed without MLAG configuration" checkbox, and selected devices can be configured with MTU and RoCE in the Network Services step.

If the switches are not listed, the user can click “Add Device” and add them.

Add_device_2.5.png

Devices can be added by their management IP address. Click the button shown in Add_device_button_2.5.png to add them to the list. When done, click the “Add Devices” button. If only one switch is known to NEO, NEO will try to discover the switches linked to it using LLDP. To use this ability, make sure that LLDP is enabled on your switches. Once LLDP results are retrieved, the relevant switch IPs will be automatically populated.

discover-by-ip.PNG

Alternatively, the user can specify a range of IPs or subnet IP to scan (see also Discovery Settings) and click the "Save and Scan" button to start scanning.

discover-by-range.PNG

discover-by-subnet.PNG

After adding the devices in any of the above methods, they will undergo a short discovery cycle to get the required data and then will be available for the bring-up.
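To see which addresses a given range or subnet entry covers before starting a scan, the standard Python ipaddress module can enumerate them. This is a minimal sketch for planning purposes only. How NEO itself performs the scan is not described here, and the function names and sample addresses are hypothetical.

```python
import ipaddress


def addresses_in_subnet(cidr):
    """List the usable host addresses of a subnet given in CIDR notation."""
    return [str(ip) for ip in ipaddress.ip_network(cidr, strict=False).hosts()]


def addresses_in_range(first, last):
    """List every address from first to last, inclusive."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    if end < start:
        raise ValueError("range end precedes range start")
    return [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]


print(addresses_in_subnet("192.0.2.0/29"))             # 192.0.2.1 through 192.0.2.6
print(addresses_in_range("192.0.2.10", "192.0.2.13"))  # four addresses
```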

MLAG

The MLAG tab defines the necessary information for MLAG configuration in the selected switch pair.

  • The MPO VLAN field allows the user to add VLANs (networks) to all switch pairs in one click.

  • The MPO switchport mode field sets the default switchport mode that will be used for MPOs defined in each MLAG pair. The user can change specific MPOs to other values if necessary.

MLAG.png

For each pair, the user can select the ports that will be part of the MLAG IPL. The ports that NEO identified as linking the two switches are automatically selected. Clicking the button shown in MLAG_Button_2.5.png allows the user to set other MLAG-related attributes.

Setup

This section defines MLAG attributes:

MLAG_2_2.5.png


IPL Configuration

This section defines MLAG IPL attributes:

MLAG_3_2.5.png


MLAG Port Channels

This section defines MPOs to configure on the switch. If you are using LACP bond mode configuration, NEO will auto-populate the table with any host linked to both switches in the pair.

mpo.PNG

The user can add or change MPOs according to the required network configuration.

MLAG_5_2.5.png


Networks

This section defines layer 2 networks (VLANs) to configure on the switch. A default network with VLAN 1 is automatically added and is the default for MLAG port channel native VLAN definition.

MLAG_Networks.png

Add a network by clicking the “Add” button and setting its name and VLAN ID:

MLAG_Networks_1.png

Network Services

In this tab, the user can specify RoCE and MTU definitions. If RoCE is required, the user can define in the advanced section ECN thresholds and the priority to use for RoCE traffic.

Network_Services.png


Monitoring

In this tab, the user can define the telemetry means for monitoring the network configuration and traffic behavior. In the top section, the user can decide whether or not to deploy the telemetry agent on the switches (top checkbox), and if so, which telemetry sessions to use.

For more information on Telemetry Agent and Sessions see Telemetry Streaming.

Monitoring.png

In the bottom section, the user can select which telemetry snapshots to enable. Each snapshot runs a show command periodically, and the user is notified when the output changes. Clicking “Add Telemetry Snapshot” allows the user to add a custom show command:

Monitoring_2.png
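The idea behind a telemetry snapshot (sample a show command periodically and notify only when the output differs from the previous sample) can be illustrated as follows. This is a sketch of the general technique, not NEO's implementation; the class name and the sample outputs are hypothetical.

```python
import hashlib


class SnapshotWatcher:
    """Remembers the last output of a show command and reports changes."""

    def __init__(self):
        self._last_digest = None

    def changed(self, output):
        """Return True when output differs from the previous sample."""
        digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
        is_change = self._last_digest is not None and digest != self._last_digest
        self._last_digest = digest
        return is_change


watcher = SnapshotWatcher()
samples = ["MTU 1500", "MTU 1500", "MTU 9216"]
print([watcher.changed(s) for s in samples])  # [False, False, True]
```

The first sample never counts as a change, because there is nothing to compare it with.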


Summary

In this tab, the user can see a summary of all the definitions that are going to be configured on each switch pair.

Summary.png

Clicking “Apply configuration” will start the configuration process, which can take a couple of minutes. You can track the progress in the bring-up progress dialog and in the jobs page. For MLAG, RoCE and MTU configurations, service objects will be created and used to apply the required configuration on the switch pairs (see Service Elements for more information). Telemetry actions (agent deployment and session configuration) will be done after the services are configured.

If the MLAG configuration fails, the bring-up will not continue to the next phases. After failures in other phases, NEO will try to continue with the bring-up process.
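The failure handling described above (an MLAG failure stops the bring-up, while a failure in any other phase is recorded and the process continues) can be sketched as a simple loop. The phase names, their order after MLAG, and the callables are assumptions made for illustration.

```python
def run_bring_up(phases):
    """Run (name, action) phases in order and return {name: status}.

    A failure in the MLAG phase stops the bring-up; a failure in any
    other phase is recorded and the remaining phases still run.
    """
    results = {}
    for name, action in phases:
        try:
            action()
            results[name] = "ok"
        except Exception as exc:  # sketch: any exception counts as a phase failure
            results[name] = f"failed: {exc}"
            if name == "MLAG":
                break
    return results


def ok():
    pass


def fail():
    raise RuntimeError("configuration error")


print(run_bring_up([("MLAG", ok), ("RoCE", fail), ("MTU", ok), ("Telemetry", ok)]))
print(run_bring_up([("MLAG", fail), ("RoCE", ok), ("MTU", ok), ("Telemetry", ok)]))
```

In the second call only the MLAG entry appears in the result, because the later phases are never started.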

Virtual Modular Switch

A drop-down menu will appear, allowing the user to select two types of services when clicking the "+" button:

vms-menu.PNG

VMS

Warning

Before setting up VMS using the NEO VMS service, it is highly recommended to review the information and prerequisites found in the Virtual Modular Switch™ Reference Guide.

NVIDIA's Virtual Modular Switch (VMS) solution, composed of NVIDIA 10GbE, 40GbE, and 56GbE fixed switches, provides an ideal and optimized approach for fixed switch aggregation. VMS is energy efficient, scales up to 28.8Tb/s of non-blocking bandwidth and up to 720 nodes of 40GbE, and operates at ultra-low latencies. The VMS can be set up in Layer 3 mode (L3-VMS) based on OSPF. VMS configuration and bring-up can be fully automated, from the early planning stages until it is operational, by leveraging the VMS Wizard. The VMS Wizard provides an automation environment to provision the fabric with a centralized application that learns the way the switches interconnect and how they ought to operate in the data center. Once the fabric size is defined and the types of switches in the fabric are selected, the VMS Wizard specifies how to configure the switches. After installation, the wizard verifies the connectivity and applies the configuration to the switches.
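The two scale figures quoted above are consistent with each other: 720 nodes at 40 Gb/s each add up to 28.8 Tb/s. The short calculation below only checks that arithmetic. Reading the bandwidth figure as "nodes times port speed" is an interpretation, not a statement from the reference guide.

```python
nodes = 720            # 40GbE nodes, from the paragraph above
port_speed_gbps = 40   # Gb/s per node port

total_gbps = nodes * port_speed_gbps
print(f"{total_gbps} Gb/s = {total_gbps / 1000} Tb/s")  # 28800 Gb/s = 28.8 Tb/s
```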

In order to configure the VMS solution:

  1. Click the “Add” button on the left side of the VMS row.

  2. Type the service name and description under “General”.

    vms-wizard.PNG


    Select the number of tiers (VMS Levels – 2 or 3).

    Warning

    For 2 levels only (Spines and TORs), select 2. For 3 levels (TORs, Leafs and Spines), select 3.

    Unlike the 2 levels choice, if you select 3 levels, you will be given more options, as can be seen in the figure below. You will also be requested to fill out the Leafs tab.

    1. Link width from top of rack – The number of cables from each TOR to Leaf

    2. Uplink from top of rack – The number of Leafs connected to each TOR

    3. Link width from Leafs – The number of cables from each Leaf to Spine

      Warning

      For further information on the VMS topology, you may refer to the VMS Reference Guide at www.mellanox.com, under Products -> Ethernet Switch Systems -> VMS.

  3. Select the switch members of Spines after choosing the number of ports. The available options for this tier are 12 and 32 ports. For further information on these options, please refer to "Supported Switches per Tier".

    vms-spines.PNG

    Supported Switches per Tier

    Number of Ports | Switch Family    | Supported Tier/s
    12 ports        | MSN2100          | TOR/Leaf/Spine
    32 ports        | MSN2700, MSN3700 | TOR/Leaf/Spine
    48+8 ports      | MSN2410          | TOR

  4. Select the switch members of Leafs after choosing the number of ports. The available options for this tier are 12 and 32 ports. For further information on these options, please refer to "Supported Switches per Tier".

    vms-leafs.PNG

  5. Select the switch members for TORs after choosing the number of ports. The available options for this tier are 12, 32, and 48+8 ports. For further information on these options, please refer to "Supported Switches per Tier".

    vms-tor.png

  6. Fill in the “Network” and “Subnet Mask” fields, then click “Finish”.

    vms-network.png

    Once “Finish” is clicked, a service instance will be created and a service element will appear on the Services main page. Right-clicking a service element enables performing different operations. For information on the operations and the service instances in general, please refer to “Service Elements”. A task for the VMS configuration will also be created when clicking “Finish”, as described in the step below.

  7. A task that contains all the VMS configurations to all switches will be created. Right-click the task and select “Run” to configure all the switches that are part of the VMS.

    image2019-3-25_18-13-40.png

    Warning

    If the selected switches are not connected as a fat tree, NEO will not create the task and will send an error message.

In order to delete a configured VMS service:

  1. Right-click your configured VMS icon and click "Delete".

  2. Click OK when prompted by the confirmation message.

L3 Network Provisioning

The L3 network provisioning service provides a simple provisioning capability for configuring the layer 3 network connectivity. This can be done by selecting the NVIDIA switches and defining their IP subnet for inter-switch connectivity. The service will then discover all links between these switches and will allocate a subnet with a prefix length of 30 (/30) for each link pair from the subnet provided by the user.
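The per-link allocation can be pictured with the standard Python ipaddress module: the user-provided subnet is carved into /30 blocks, and each discovered link receives one block with its two usable addresses. This is a sketch under stated assumptions. The sequential allocation order, the function name and the sample topology are hypothetical and are not taken from NEO.

```python
import ipaddress


def allocate_link_subnets(pool_cidr, links):
    """Assign one /30 from pool_cidr to each inter-switch link.

    Returns {link: (subnet, address_a, address_b)}.
    """
    pool = ipaddress.ip_network(pool_cidr)
    subnets = pool.subnets(new_prefix=30)
    plan = {}
    for link in links:
        try:
            subnet = next(subnets)
        except StopIteration:
            raise ValueError(f"{pool_cidr} has no free /30 left for {link}") from None
        a, b = subnet.hosts()  # a /30 has exactly two usable host addresses
        plan[link] = (str(subnet), str(a), str(b))
    return plan


links = [("spine1", "leaf1"), ("spine1", "leaf2"), ("spine2", "leaf1")]
for link, (subnet, a, b) in allocate_link_subnets("10.10.0.0/24", links).items():
    print(link, subnet, a, b)
```

A /24 pool yields 64 such /30 blocks, so the size of the reserved subnet bounds the number of links that can be provisioned.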

In order to configure the L3 network provisioning service, follow the steps below:

  1. Click the “Add” button on the left side of the Virtual Modular Switch row.

  2. Fill in the required information and check the desired checkboxes under the General dialog box:

    1. Provide a name and description of the service.

    2. Then in the OSPF Subnet Reservation field, type the subnet used for allocating IP addresses to OSPF areas.

    3. [Optional] When the “Add Auto-Discovered Switches” checkbox is checked, a notification will be generated, notifying the user of a topology change in the newly created topology/service. For further information, refer to “Notifications”.

    4. [Optional] Check the “Release Unused Resources” checkbox for unused links to be deallocated within the timeout interval chosen in minutes (the minimum is 15 minutes).

    5. [Optional] Check the “Auto Configure Switches When Topology Changed” checkbox for auto configuration of devices upon topology changes. When this checkbox is checked, no notification will be generated. Rather, an event will appear under “Events”.

      l3-network.PNG

  3. Choose the devices to configure the L3 network provisioning service for, and click “Finish”.

    l3-devices.PNG

    Once “Finish” is clicked, a service instance will be created and a service element will appear on the Services main page. Right-clicking a service element enables performing different operations. For information on the operations and the service instances in general, please refer to “Service Elements”.

MLAG

The MLAG service allows configuring a pair of NVIDIA® Onyx® or Cumulus switches with the following to support multi-chassis LAGs, and periodically validates their configuration:

  1. Switch cluster

  2. MAGP router and network

  3. MLAG port channel

  4. Host bond

In order to configure the MLAG service:

  1. Click the “Add” button on the left side of the MLAG row.

  2. In the Cluster tab, select the switch type and IP of the first switch in the cluster. The rest of the fields (including the collapsible Advanced section) will be filled out automatically, with the option to be edited. Note that some fields might not be filled in case there is no appropriate peer switch.

    Warning

    The information in the Cluster tab is mandatory for the creation of the MLAG service, and cannot be changed once the service is created.

    mlag-wizard.PNG

  3. Under the Networks tab, you can manage MAGP networks on the MLAG cluster. Click “Add” to add a new network and fill in the required information, or edit/delete a network using the icons in the rightmost column of the network row.

    Warning

    Networks are not mandatory for the MLAG service creation. They can be added, edited or removed after the service has been created.

    mlag-network.PNG

  4. Under the Servers tab, you can manage the connectivity between the MLAG switches and the Linux hosts they are connected to. This includes both the switch-side configuration and (optionally) the host-side bond creation. When first accessing this tab, it will be initialized with connected servers that NEO has already identified. Click “Add” to add a new server and fill in the required information, or edit/delete a server using the icons in the rightmost column of the server row.

    Warning

    Servers are not mandatory for the MLAG service creation. They can be added, edited or removed after the service has been created. However, if you define a server, you also need to define the network it belongs to in the Networks tab.

    mlag-service.PNG

Once “Finish” is clicked, a service instance will be created and a service element will appear on the Services main page. Right-clicking a service element enables performing different operations. For information on the operations and the service instances in general, please refer to “Service Elements”.

Warning

When NEO discovers an MLAG configured on the switches, it will automatically create a service for it.


MTU

The MTU service allows configuring an interface MTU on specified Onyx switches to a desirable value and periodically validates their configuration.

In order to configure the MTU service:

  1. Click the “Add” button on the left side of the MTU row.

  2. Fill in the name, description, and MTU fields.

    mtu-wizard.PNG

  3. Choose the device to configure the MTU service for, and click “Finish”.

    mtu-devices.PNG

    Once “Finish” is clicked, a service instance will be created and a service element will appear on the Services main page. Right-clicking a service element enables performing different operations. For information on the operations and the service instances in general, please refer to “Service Elements”.

RoCE

RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. It is mainly useful for network-intensive applications like networked storage or cluster computing, which require a network infrastructure with high bandwidth and low latency.

RoCE can be configured in the following configuration types: ECN only, ECN with QoS, and ECN with QoS and PFC.
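The three configuration types correspond to the QoS and PFC checkboxes of the RoCE wizard. The mapping can be sketched as below. Treating "PFC without QoS" as invalid is an assumption drawn only from the fact that PFC is listed exclusively together with QoS, and the function name is hypothetical.

```python
def roce_configuration_type(qos, pfc):
    """Map the QoS and PFC checkbox states to a configuration type name."""
    if pfc and not qos:
        # Assumption: PFC is only listed together with QoS, so this
        # combination is treated as invalid in this sketch.
        raise ValueError("PFC without QoS is not one of the listed types")
    if qos and pfc:
        return "ECN with QoS and PFC"
    if qos:
        return "ECN with QoS"
    return "ECN only"


for qos, pfc in [(False, False), (True, False), (True, True)]:
    print(qos, pfc, "->", roce_configuration_type(qos, pfc))
```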

To allow the network to use RoCE, both switches and hosts should be configured appropriately. The service allows one of the following modes to specify the devices to configure:

  1. All host ports - configures the entire network, starting from the hosts’ ports, through their directly linked switch ports, and including ports interconnecting switches.

  2. All switch ports - configures all ports and LAGs on all network switches applicable for RoCE. Does not include host ports.

  3. Custom selection - allows the user to specifically define which devices will be configured. If this option is selected, the wizard will include another step to define the devices. Each device can be defined as:

    1. Host - In this mode you select the specific host interfaces that you wish to configure. These interfaces must be linked to a supported switch. The switch interfaces that are directly connected to the host interfaces will also be configured.

    2. Switch - In this mode you select the specific switch interfaces that you wish to configure. These can also be LAGs or MLAGs.
      In both modes you can select the "Configure inter-switch links" option to configure all the switch interfaces that are connected to the selected devices. For example, if you specify the leaf switches and select this option, the interfaces that connect the leaf switches to the spine switches or between different spine switches will also be configured.

For Windows hosts, the interface connectivity is not automatically detected. Therefore, the switch interfaces that are directly connected to the host interfaces will not be implicitly configured, and the "Configure inter-switch links" option is not relevant. You must explicitly create another RoCE service for the switch ports you wish to configure. This is relevant in case you select the "all host ports" option, or define hosts in the "Custom selection" option.

Editing RoCE Service

If you have specified the configured devices explicitly, using the "Custom selection" option, you can edit the RoCE service to add or remove devices and interfaces to/from your configuration. However, you will not be able to change the network configuration type, the configuration parameter values or the device type to be configured (host/switch).

Warning

Removing a switch interface does not remove the RoCE configuration that is already assigned to it until the user applies the changes.


Requirements

Before configuring RoCE using NEO, make sure your network fulfills the following requirements:

  • Host

    • The host should have a ConnectX-4 or ConnectX-5 NIC installed.

    • The host should have NEO-Host v1.3 and above installed.

    • Linux host should have a MLNX_OFED version compatible with NEO-Host installed.

    • For Linux host, the configuration will only run on ports that NEO identifies as links to an applicable switch.

    • Linux host should have Link Layer Discovery Protocol Agent Daemon (LLDPAD) package installed.

    • Windows host should have a Windows Server 2016 operating system and WinOF2 v2.0 and above installed.

  • Switch

    • The switch should be either an NVIDIA® Spectrum®, a Cumulus or a 3232C/3231Q Cisco switch.

    • NVIDIA switch should have Onyx v3.6.5000 or above installed.

    • Cumulus switch should have operating system v3.5 and above installed.

    • The cables should support 100G rate.

    • The port speed should be configured to 100G.
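The minimum-version requirements listed above can be checked with a simple tuple comparison. This is a minimal sketch: it assumes version strings are purely numeric and dot-separated (optionally prefixed with "v"), and the function names and sample installed versions are hypothetical.

```python
def parse_version(text):
    """Turn 'v3.6.5000' or '3.5' into a tuple of integers."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


# Minimum versions taken from the requirements list above.
MINIMUM_VERSIONS = {
    "NEO-Host": "1.3",
    "Onyx": "3.6.5000",
    "Cumulus": "3.5",
    "WinOF2": "2.0",
}


def meets_minimum(component, installed):
    """Return True when the installed version is at or above the minimum."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[component])


print(meets_minimum("Onyx", "3.6.8008"))   # True
print(meets_minimum("Onyx", "v3.6.4006"))  # False
print(meets_minimum("Cumulus", "3.7.2"))   # True
```

Comparing tuples of integers avoids the pitfall of string comparison, where "3.10" would sort before "3.5".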

Limitations

  • Host: The configuration is non-persistent. Rebooting a host requires reconfiguring it.

RoCE Configuration

In order to configure RoCE:

  1. Click the “Add” button on the right side of the RoCE row.

  2. Name your service and check/uncheck the QoS and PFC checkboxes, as desired. Select which devices will be configured by this service. In the Advanced section you can also alter the configured value for certain parameters, depending on the RoCE configuration you choose. The "Apply Configuration" checkbox defines whether configuring the devices will start immediately upon clicking the “Finish” button.

    roce-wizard.PNG

  3. If you choose to explicitly define the devices to be configured with the "Custom selection" option, use the Members tab to define the devices and the interfaces that will be configured by the service. You can select either hosts or switches.

    roce-members-host.PNG

    Once “Submit” is clicked, a service instance will be created and a service element will appear on the Services main page. Right-clicking a service element enables performing different operations. For information on the operations and the service instances in general, please refer to “Service Elements”.

In order to delete a configured RoCE service:

  1. Right-click your configured RoCE icon and click "Clean Up".

  2. Click OK when prompted by the confirmation message.

Service Elements

The service elements are colored squares that stand for service instances and appear in the Services main page once a service type is created.

image2019-3-31_12-51-4.png

  1. Element Colors: The color of a service element varies mainly according to the service instance’s last configuration status. However, when the service’s status is “Monitoring”, the color is determined by the service instance’s last validation status.

    Colors According to the Last Configuration Status

    lastConfigurationStatus | Color
    Initializing            | Blue
    Idle                    | Grey
    InitializingFailure     | Red

    Colors According to the Last Validation Status

    lastConfigurationStatus | lastValidationStatus  | Color
    Monitoring              | Unknown               | Grey
    Monitoring              | Completed             | Green
    Monitoring              | Completed With Errors | Red

  2. Element Operations: A right-click on a service element will enable configuring the service by selecting “Apply Configuration”. Other operations may also be available for a service element depending on its status, see details in the table below.

    image2019-3-31_13-55-7.png


    Available Service Element Operations

    Status | Operations Available | Operations Description
    Initializing | None | N/A
    Idle | Apply Configuration | Configures the service. Once the configuration is applied, the service status will automatically change to "Monitoring", and periodic validation will start. The "Validate" operation becomes available only once the configuration is applied.
    Idle | Apply Changes | Available if the service has been edited since applying the configuration. Applies only the configuration changes.
    Idle | Clean-up | Cleans up the configuration done by the service. Once the configuration is cleaned, the service status will automatically change to "Idle", and periodic validation will stop. Previous configuration and validation status will be reset. Warning: Clean-up is currently supported for RoCE and MLAG service types on Onyx and Cumulus switches. Warning: MLAG clean-up is only supported for MLAG services created in NEO 2.6 and above. Warning: After MLAG clean-up is performed, MPO VLAN, IP routing, IP DHCP relay instance, LACP, and protocol MAGP configuration will remain on Onyx switches. For Cumulus switches, only the MPO's VLAN configuration remains.
    Idle | Validate | Validates the configuration of the service.
    Idle | Start | Starts a periodic validation of the service (default interval is 30 minutes). This will change the status to “Monitoring”.
    Idle | Delete | Deletes the service.
    Idle | Show Properties/Edit Service | Shows the information filled when the service was created. Some service types can be edited.
    InitializingFailure | Delete | Deletes the service.
    InitializingFailure | Show Properties | Shows the information filled when the service was created.
    Monitoring | Stop | Stops a periodic validation of the service. This will change the status to “Idle”.
    Monitoring | Show Properties/Edit Service | Shows the information filled when the service was created. Some service types can be edited.

  • Device configuration backup: Before configuration changing operations (Apply Configuration, Apply Changes, Clean-up: see table above), a network snapshot will be created for all the devices that are about to be configured. This snapshot can be used to revert to the original device state if the configuration fails, or if it has unwanted implications. If the snapshot creation fails the operation will not run.

  • Element Icons: Each service element contains the following:

    • The name of the service

    • The wrench icon – if the last configuration status of the service was “Unknown”

    • The clock icon – if the service state is Monitoring

    • The spinner icon – if the service is currently undergoing a validation or configuration process

  • Element Information: When hovering over an element, the following information will be displayed:

    • The service’s “state”

    • The service’s “last configuration status”

    • The service’s “last validation status”

  • Service Details Modal: When clicking a service element, a modal with more details about the service will appear. The modal consists of three tabs:

    • “Service Details” tab – lists the service type, the time it was created, the time it was last updated, the last validation status and time, and the last configuration status and time.

      image2019-3-31_14-7-33.png

      If the service initialization fails, an error message will be added to the bottom of the Service Details list.

    • “Validation Heatmap” tab – provides a validation heatmap of the service devices, colored according to their validation job status (Completed – Green, Completed with Errors – Red, Unknown – Grey). When a device is clicked, more details about its IP and name (and the relevant errors, if there are any) will be displayed.

      image2019-3-31_14-8-28.png

    • “Configuration Heatmap” tab – provides a configuration heatmap of the service devices, colored according to their configuration job status (Completed – Green, Completed with Errors – Red, Unknown – Grey). When a device is clicked, more details about its IP and name (and the relevant errors, if there are any) will be displayed.

      image2019-3-31_14-8-57.png
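The element color rules and the status changes described under Element Colors and Element Operations can be summarized in a small lookup sketch. It mirrors the values given in this section. The function names are hypothetical, and the assumption that operations other than the four listed leave the status unchanged is mine rather than NEO's documented behavior.

```python
CONFIGURATION_COLORS = {
    "Initializing": "Blue",
    "Idle": "Grey",
    "InitializingFailure": "Red",
}

VALIDATION_COLORS = {
    "Unknown": "Grey",
    "Completed": "Green",
    "Completed With Errors": "Red",
}

# Status that an operation leads to, as stated in the operation descriptions.
RESULTING_STATUS = {
    "Apply Configuration": "Monitoring",
    "Start": "Monitoring",
    "Stop": "Idle",
    "Clean-up": "Idle",
}


def element_color(last_configuration_status, last_validation_status="Unknown"):
    """Return the element color for the given status pair."""
    if last_configuration_status == "Monitoring":
        # While monitoring, the last validation status decides the color.
        return VALIDATION_COLORS[last_validation_status]
    return CONFIGURATION_COLORS[last_configuration_status]


def next_status(status, operation):
    """Return the status after an operation.

    Assumption: operations without a documented status change
    (for example Validate) leave the status as it was.
    """
    return RESULTING_STATUS.get(operation, status)


print(element_color("Idle"))                                 # Grey
print(element_color("Monitoring", "Completed With Errors"))  # Red
print(next_status("Idle", "Apply Configuration"))            # Monitoring
print(next_status("Monitoring", "Stop"))                     # Idle
```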

© Copyright 2023, NVIDIA. Last updated on Nov 16, 2023.