DRM Driver uAPI¶

drm/i915 uAPI¶

uevents generated by i915 on it's device node

I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch: event from the gpu l3 cache. Additional information supplied is ROW, BANK, SUBBANK, SLICE of the affected cacheline. Userspace should keep track of these events and if a specific cache-line seems to have a persistent error remap it with the l3 remapping tool supplied in intel-gpu-tools. The value supplied with the event is always 1.
I915_ERROR_UEVENT - Generated upon error detection, currently only via: hangcheck. The error detection event is a good indicator of when things began to go badly. The value supplied with the event is a 1 upon error detection, and a 0 upon reset completion, signifying no more error exists. NOTE: Disabling hangcheck or reset via module parameter will cause the related events to not be seen.
I915_RESET_UEVENT - Event is generated just before an attempt to reset the: GPU. The value supplied with the event is always 1. NOTE: Disable reset via module parameter will cause this event to not be seen.

struct i915_user_extension¶: Base class for defining a chain of extensions

Definition:

struct i915_user_extension {
    __u64 next_extension;
    __u32 name;
    __u32 flags;
    __u32 rsvd[4];
};

Members

next_extension

Pointer to the next struct i915_user_extension, or zero if the end.

name

Name of the extension.

Note that the name here is just some integer.

Also note that the name space for this is not global for the whole driver, but rather its scope/meaning is limited to the specific piece of uAPI which has embedded the struct i915_user_extension.

flags

MBZ

All undefined bits must be zero.

rsvd

MBZ

Reserved for future use; must be zero.

Description

Many interfaces need to grow over time. In most cases we can simply extend the struct and have userspace pass in more data. Another option, as demonstrated by Vulkan's approach to providing extensions for forward and backward compatibility, is to use a list of optional structs to provide those extra details.

The key advantage to using an extension chain is that it allows us to redefine the interface more easily than an ever growing struct of increasing complexity, and for large parts of that interface to be entirely optional. The downside is more pointer chasing; chasing across the __user boundary with pointers encapsulated inside u64.

Example chaining:

struct i915_user_extension ext3 {
        .next_extension = 0, // end
        .name = ...,
};
struct i915_user_extension ext2 {
        .next_extension = (uintptr_t)&ext3,
        .name = ...,
};
struct i915_user_extension ext1 {
        .next_extension = (uintptr_t)&ext2,
        .name = ...,
};

Typically the struct i915_user_extension would be embedded in some uAPI struct, and in this case we would feed it the head of the chain(i.e ext1), which would then apply all of the above extensions.

enum drm_i915_gem_engine_class¶: uapi engine type enumeration

Constants

I915_ENGINE_CLASS_RENDER: Render engines support instructions used for 3D, Compute (GPGPU), and programmable media workloads. These instructions fetch data and dispatch individual work items to threads that operate in parallel. The threads run small programs (called "kernels" or "shaders") on the GPU's execution units (EUs).
I915_ENGINE_CLASS_COPY: Copy engines (also referred to as "blitters") support instructions that move blocks of data from one location in memory to another, or that fill a specified location of memory with fixed data. Copy engines can perform pre-defined logical or bitwise operations on the source, destination, or pattern data.
I915_ENGINE_CLASS_VIDEO: Video engines (also referred to as "bit stream decode" (BSD) or "vdbox") support instructions that perform fixed-function media decode and encode.
I915_ENGINE_CLASS_VIDEO_ENHANCE: Video enhancement engines (also referred to as "vebox") support instructions related to image enhancement.
I915_ENGINE_CLASS_COMPUTE: Compute engines support a subset of the instructions available on render engines: compute engines support Compute (GPGPU) and programmable media workloads, but do not support the 3D pipeline.
I915_ENGINE_CLASS_INVALID: Placeholder value to represent an invalid engine class assignment.

Description

Different engines serve different roles, and there may be more than one engine serving each role. This enum provides a classification of the role of the engine, which may be used when requesting operations to be performed on a certain subset of engines, or for providing information about that group.

struct i915_engine_class_instance¶: Engine class/instance identifier

Definition:

struct i915_engine_class_instance {
    __u16 engine_class;
#define I915_ENGINE_CLASS_INVALID_NONE -1;
#define I915_ENGINE_CLASS_INVALID_VIRTUAL -2;
    __u16 engine_instance;
};

Members

engine_class: Engine class from enum drm_i915_gem_engine_class
engine_instance: Engine instance.

Description

There may be more than one engine fulfilling any role within the system. Each engine of a class is given a unique instance number and therefore any engine can be specified by its class:instance tuplet. APIs that allow access to any engine in the system will use struct i915_engine_class_instance for this identification.

perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915

struct drm_i915_getparam¶: Driver parameter query structure.

Definition:

struct drm_i915_getparam {
    __s32 param;
    int __user *value;
};

Members

param

Driver parameter to query.

value

Address of memory where queried value should be put.

WARNING: Using pointers instead of fixed-size u64 means we need to write compat32 code. Don't repeat this mistake.

type drm_i915_getparam_t¶: Driver parameter query structure. See struct drm_i915_getparam.

struct drm_i915_gem_mmap_offset¶: Retrieve an offset so we can mmap this buffer object.

Definition:

struct drm_i915_gem_mmap_offset {
    __u32 handle;
    __u32 pad;
    __u64 offset;
    __u64 flags;
#define I915_MMAP_OFFSET_GTT    0;
#define I915_MMAP_OFFSET_WC     1;
#define I915_MMAP_OFFSET_WB     2;
#define I915_MMAP_OFFSET_UC     3;
#define I915_MMAP_OFFSET_FIXED  4;
    __u64 extensions;
};

Members

handle

Handle for the object being mapped.

pad

Must be zero

offset

The fake offset to use for subsequent mmap call

This is a fixed-size type for 32/64 compatibility.

flags

Flags for extended behaviour.

It is mandatory that one of the MMAP_OFFSET types should be included:

I915_MMAP_OFFSET_GTT: Use mmap with the object bound to GTT. (Write-Combined)
I915_MMAP_OFFSET_WC: Use Write-Combined caching.
I915_MMAP_OFFSET_WB: Use Write-Back caching.
I915_MMAP_OFFSET_FIXED: Use object placement to determine caching.

On devices with local memory I915_MMAP_OFFSET_FIXED is the only valid type. On devices without local memory, this caching mode is invalid.

As caching mode when specifying I915_MMAP_OFFSET_FIXED, WC or WB will be used, depending on the object placement on creation. WB will be used when the object can only exist in system memory, WC otherwise.

extensions

Zero-terminated chain of extensions.

No current extensions defined; mbz.

Description

This struct is passed as argument to the DRM_IOCTL_I915_GEM_MMAP_OFFSET ioctl, and is used to retrieve the fake offset to mmap an object specified by handle.

The legacy way of using DRM_IOCTL_I915_GEM_MMAP is removed on gen12+. DRM_IOCTL_I915_GEM_MMAP_GTT is an older supported alias to this struct, but will behave as setting the extensions to 0, and flags to I915_MMAP_OFFSET_GTT.

struct drm_i915_gem_set_domain¶: Adjust the objects write or read domain, in preparation for accessing the pages via some CPU domain.

Definition:

struct drm_i915_gem_set_domain {
    __u32 handle;
    __u32 read_domains;
    __u32 write_domain;
};

Members

handle

Handle for the object.

read_domains

New read domains.

write_domain

New write domain.

Note that having something in the write domain implies it's in the read domain, and only that read domain.

Description

Specifying a new write or read domain will flush the object out of the previous domain(if required), before then updating the objects domain tracking with the new domain.

Note this might involve waiting for the object first if it is still active on the GPU.

Supported values for read_domains and write_domain:

I915_GEM_DOMAIN_WC: Uncached write-combined domain

I915_GEM_DOMAIN_CPU: CPU cache domain

I915_GEM_DOMAIN_GTT: Mappable aperture domain

All other domains are rejected.

Note that for discrete, starting from DG1, this is no longer supported, and is instead rejected. On such platforms the CPU domain is effectively static, where we also only support a single drm_i915_gem_mmap_offset cache mode, which can't be set explicitly and instead depends on the object placements, as per the below.

Implicit caching rules, starting from DG1:

If any of the object placements (see drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and mapped as write-combined only.

Everything else is always allocated and mapped as write-back, with the guarantee that everything is also coherent with the GPU.

Note that this is likely to change in the future again, where we might need more flexibility on future devices, so making this all explicit as part of a new drm_i915_gem_create_ext extension is probable.

struct drm_i915_gem_exec_fence¶: An input or output fence for the execbuf ioctl.

Definition:

struct drm_i915_gem_exec_fence {
    __u32 handle;
    __u32 flags;
#define I915_EXEC_FENCE_WAIT            (1<<0);
#define I915_EXEC_FENCE_SIGNAL          (1<<1);
#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1));
};

Members

handle

User's handle for a drm_syncobj to wait on or signal.

flags

Supported flags are:

I915_EXEC_FENCE_WAIT: Wait for the input fence before request submission.

I915_EXEC_FENCE_SIGNAL: Return request completion fence as output

Description

The request will wait for input fence to signal before submission.

The returned output fence will be signaled after the completion of the request.

struct drm_i915_gem_execbuffer_ext_timeline_fences¶: Timeline fences for execbuf ioctl.

Definition:

struct drm_i915_gem_execbuffer_ext_timeline_fences {
#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0;
    struct i915_user_extension base;
    __u64 fence_count;
    __u64 handles_ptr;
    __u64 values_ptr;
};

Members

base: Extension link. See struct i915_user_extension.
fence_count: Number of elements in the handles_ptr & value_ptr arrays.
handles_ptr: Pointer to an array of struct drm_i915_gem_exec_fence of length fence_count.
values_ptr: Pointer to an array of u64 values of length fence_count. Values must be 0 for a binary drm_syncobj. A Value of 0 for a timeline drm_syncobj is invalid as it turns a drm_syncobj into a binary one.

Description

This structure describes an array of drm_syncobj and associated points for timeline variants of drm_syncobj. It is invalid to append this structure to the execbuf if I915_EXEC_FENCE_ARRAY is set.

struct drm_i915_gem_execbuffer2¶: Structure for DRM_I915_GEM_EXECBUFFER2 ioctl.

Definition:

struct drm_i915_gem_execbuffer2 {
    __u64 buffers_ptr;
    __u32 buffer_count;
    __u32 batch_start_offset;
    __u32 batch_len;
    __u32 DR1;
    __u32 DR4;
    __u32 num_cliprects;
    __u64 cliprects_ptr;
    __u64 flags;
#define I915_EXEC_RING_MASK              (0x3f);
#define I915_EXEC_DEFAULT                (0<<0);
#define I915_EXEC_RENDER                 (1<<0);
#define I915_EXEC_BSD                    (2<<0);
#define I915_EXEC_BLT                    (3<<0);
#define I915_EXEC_VEBOX                  (4<<0);
#define I915_EXEC_CONSTANTS_MASK        (3<<6);
#define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6) ;
#define I915_EXEC_CONSTANTS_ABSOLUTE    (1<<6);
#define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6) ;
#define I915_EXEC_GEN7_SOL_RESET        (1<<8);
#define I915_EXEC_SECURE                (1<<9);
#define I915_EXEC_IS_PINNED             (1<<10);
#define I915_EXEC_NO_RELOC              (1<<11);
#define I915_EXEC_HANDLE_LUT            (1<<12);
#define I915_EXEC_BSD_SHIFT      (13);
#define I915_EXEC_BSD_MASK       (3 << I915_EXEC_BSD_SHIFT);
#define I915_EXEC_BSD_DEFAULT    (0 << I915_EXEC_BSD_SHIFT);
#define I915_EXEC_BSD_RING1      (1 << I915_EXEC_BSD_SHIFT);
#define I915_EXEC_BSD_RING2      (2 << I915_EXEC_BSD_SHIFT);
#define I915_EXEC_RESOURCE_STREAMER     (1<<15);
#define I915_EXEC_FENCE_IN              (1<<16);
#define I915_EXEC_FENCE_OUT             (1<<17);
#define I915_EXEC_BATCH_FIRST           (1<<18);
#define I915_EXEC_FENCE_ARRAY   (1<<19);
#define I915_EXEC_FENCE_SUBMIT          (1 << 20);
#define I915_EXEC_USE_EXTENSIONS        (1 << 21);
#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1));
    __u64 rsvd1;
    __u64 rsvd2;
};

Members

buffers_ptr

Pointer to a list of gem_exec_object2 structs

buffer_count

Number of elements in buffers_ptr array

batch_start_offset

Offset in the batchbuffer to start execution from.

batch_len

Length in bytes of the batch buffer, starting from the batch_start_offset. If 0, length is assumed to be the batch buffer object size.

DR1

deprecated

DR4

deprecated

num_cliprects

See cliprects_ptr

cliprects_ptr

Kernel clipping was a DRI1 misfeature.

It is invalid to use this field if I915_EXEC_FENCE_ARRAY or I915_EXEC_USE_EXTENSIONS flags are not set.

If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array of drm_i915_gem_exec_fence and num_cliprects is the length of the array.

If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a single i915_user_extension and num_cliprects is 0.

flags

Execbuf flags

rsvd1

Context id

rsvd2

in and out sync_file file descriptors.

When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the lower 32 bits of this field will have the in sync_file fd (input).

When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this field will have the out sync_file fd (output).

struct drm_i915_gem_caching¶: Set or get the caching for given object handle.

Definition:

struct drm_i915_gem_caching {
    __u32 handle;
#define I915_CACHING_NONE               0;
#define I915_CACHING_CACHED             1;
#define I915_CACHING_DISPLAY            2;
    __u32 caching;
};

Members

handle

Handle of the buffer to set/get the caching level.

caching

The GTT caching level to apply or possible return value.

The supported caching values:

I915_CACHING_NONE:

GPU access is not coherent with CPU caches. Default for machines without an LLC. This means manual flushing might be needed, if we want GPU access to be coherent.

I915_CACHING_CACHED:

GPU access is coherent with CPU caches and furthermore the data is cached in last-level caches shared between CPU cores and the GPU GT.

I915_CACHING_DISPLAY:

Special GPU caching mode which is coherent with the scanout engines. Transparently falls back to I915_CACHING_NONE on platforms where no special cache mode (like write-through or gfdt flushing) is available. The kernel automatically sets this mode when using a buffer as a scanout target. Userspace can manually set this mode to avoid a costly stall and clflush in the hotpath of drawing the first frame.

Description

Allow userspace to control the GTT caching bits for a given object when the object is later mapped through the ppGTT(or GGTT on older platforms lacking ppGTT support, or if the object is used for scanout). Note that this might require unbinding the object from the GTT first, if its current caching value doesn't match.

Note that this all changes on discrete platforms, starting from DG1, the set/get caching is no longer supported, and is now rejected. Instead the CPU caching attributes(WB vs WC) will become an immutable creation time property for the object, along with the GTT caching level. For now we don't expose any new uAPI for this, instead on DG1 this is all implicit, although this largely shouldn't matter since DG1 is coherent by default(without any way of controlling it).

Implicit caching rules, starting from DG1:

If any of the object placements (see drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and mapped as write-combined only.

Everything else is always allocated and mapped as write-back, with the guarantee that everything is also coherent with the GPU.

Note that this is likely to change in the future again, where we might need more flexibility on future devices, so making this all explicit as part of a new drm_i915_gem_create_ext extension is probable.

Side note: Part of the reason for this is that changing the at-allocation-time CPU caching attributes for the pages might be required(and is expensive) if we need to then CPU map the pages later with different caching attributes. This inconsistent caching behaviour, while supported on x86, is not universally supported on other architectures. So for simplicity we opt for setting everything at creation time, whilst also making it immutable, on discrete platforms.

struct drm_i915_gem_context_create_ext¶: Structure for creating contexts.

Definition:

struct drm_i915_gem_context_create_ext {
    __u32 ctx_id;
    __u32 flags;
#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS        (1u << 0);
#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE       (1u << 1);
#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN       (-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1));
    __u64 extensions;
#define I915_CONTEXT_CREATE_EXT_SETPARAM 0;
#define I915_CONTEXT_CREATE_EXT_CLONE 1;
};

Members

ctx_id

Id of the created context (output)

flags

Supported flags are:

I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:

Extensions may be appended to this structure and driver must check for those. See extensions.

I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE

Created context will have single timeline.

extensions

Zero-terminated chain of extensions.

I915_CONTEXT_CREATE_EXT_SETPARAM: Context parameter to set or query during context creation. See struct drm_i915_gem_context_create_ext_setparam.

I915_CONTEXT_CREATE_EXT_CLONE: This extension has been removed. On the off chance someone somewhere has attempted to use it, never re-use this extension number.

struct drm_i915_gem_context_param¶: Context parameter to set or query.

Definition:

struct drm_i915_gem_context_param {
    __u32 ctx_id;
    __u32 size;
    __u64 param;
#define I915_CONTEXT_PARAM_BAN_PERIOD   0x1;
#define I915_CONTEXT_PARAM_NO_ZEROMAP   0x2;
#define I915_CONTEXT_PARAM_GTT_SIZE     0x3;
#define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE     0x4;
#define I915_CONTEXT_PARAM_BANNABLE     0x5;
#define I915_CONTEXT_PARAM_PRIORITY     0x6;
#define I915_CONTEXT_MAX_USER_PRIORITY        1023 ;
#define I915_CONTEXT_DEFAULT_PRIORITY         0;
#define I915_CONTEXT_MIN_USER_PRIORITY        -1023 ;
#define I915_CONTEXT_PARAM_SSEU         0x7;
#define I915_CONTEXT_PARAM_RECOVERABLE  0x8;
#define I915_CONTEXT_PARAM_VM           0x9;
#define I915_CONTEXT_PARAM_ENGINES      0xa;
#define I915_CONTEXT_PARAM_PERSISTENCE  0xb;
#define I915_CONTEXT_PARAM_RINGSIZE     0xc;
#define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd;
    __u64 value;
};

Members

ctx_id: Context id
size: Size of the parameter value
param: Parameter to set or query
value: Context parameter value to be set or queried

Virtual Engine uAPI

Virtual engine is a concept where userspace is able to configure a set of physical engines, submit a batch buffer, and let the driver execute it on any engine from the set as it sees fit.

This is primarily useful on parts which have multiple instances of a same class engine, like for example GT3+ Skylake parts with their two VCS engines.

For instance userspace can enumerate all engines of a certain class using the previously described Engine Discovery uAPI. After that userspace can create a GEM context with a placeholder slot for the virtual engine (using I915_ENGINE_CLASS_INVALID and I915_ENGINE_CLASS_INVALID_NONE for class and instance respectively) and finally using the I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE extension place a virtual engine in the same reserved slot.

Example of creating a virtual engine and submitting a batch buffer to it:

I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(virtual, 2) = {
        .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
        .engine_index = 0, // Place this virtual engine into engine map slot 0
        .num_siblings = 2,
        .engines = { { I915_ENGINE_CLASS_VIDEO, 0 },
                     { I915_ENGINE_CLASS_VIDEO, 1 }, },
};
I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
        .engines = { { I915_ENGINE_CLASS_INVALID,
                       I915_ENGINE_CLASS_INVALID_NONE } },
        .extensions = to_user_pointer(&virtual), // Chains after load_balance extension
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
        .base = {
                .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
        },
        .param = {
                .param = I915_CONTEXT_PARAM_ENGINES,
                .value = to_user_pointer(&engines),
                .size = sizeof(engines),
        },
};
struct drm_i915_gem_context_create_ext create = {
        .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
        .extensions = to_user_pointer(&p_engines);
};

ctx_id = gem_context_create_ext(drm_fd, &create);

// Now we have created a GEM context with its engine map containing a
// single virtual engine. Submissions to this slot can go either to
// vcs0 or vcs1, depending on the load balancing algorithm used inside
// the driver. The load balancing is dynamic from one batch buffer to
// another and transparent to userspace.

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0 which is the virtual engine
gem_execbuf(drm_fd, &execbuf);

struct i915_context_engines_parallel_submit¶: Configure engine for parallel submission.

Definition:

struct i915_context_engines_parallel_submit {
    struct i915_user_extension base;
    __u16 engine_index;
    __u16 width;
    __u16 num_siblings;
    __u16 mbz16;
    __u64 flags;
    __u64 mbz64[3];
    struct i915_engine_class_instance engines[];
};

Members

base

base user extension.

engine_index

slot for parallel engine

width

number of contexts per parallel engine or in other words the number of batches in each submission

num_siblings

number of siblings per context or in other words the number of possible placements for each submission

mbz16

reserved for future use; must be zero

flags

all undefined flags must be zero, currently not defined flags

mbz64

reserved for future use; must be zero

engines

2-d array of engine instances to configure parallel engine

length = width (i) * num_siblings (j) index = j + i * num_siblings

Description

Setup a slot in the context engine map to allow multiple BBs to be submitted in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU in parallel. Multiple hardware contexts are created internally in the i915 to run these BBs. Once a slot is configured for N BBs only N BBs can be submitted in each execbuf IOCTL and this is implicit behavior e.g. The user doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how many BBs there are based on the slot's configuration. The N BBs are the last N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.

The default placement behavior is to create implicit bonds between each context if each context maps to more than 1 physical engine (e.g. context is a virtual engine). Also we only allow contexts of same engine class and these contexts must be in logically contiguous order. Examples of the placement behavior are described below. Lastly, the default is to not allow BBs to be preempted mid-batch. Rather insert coordinated preemption points on all hardware contexts between each set of BBs. Flags could be added in the future to change both of these default behaviors.

Returns -EINVAL if hardware context placement configuration is invalid or if the placement configuration isn't supported on the platform / submission interface. Returns -ENODEV if extension isn't supported on the platform / submission interface.

Examples syntax:
CS[X] = generic engine of same class, logical instance X
INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE

Example 1 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=1,
             engines=CS[0],CS[1])

Results in the following valid placement:
CS[0], CS[1]

Example 2 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=2,
             engines=CS[0],CS[2],CS[1],CS[3])

Results in the following valid placements:
CS[0], CS[1]
CS[2], CS[3]

This can be thought of as two virtual engines, each containing two
engines thereby making a 2D array. However, there are bonds tying the
entries together and placing restrictions on how they can be scheduled.
Specifically, the scheduler can choose only vertical columns from the 2D
array. That is, CS[0] is bonded to CS[1] and CS[2] to CS[3]. So if the
scheduler wants to submit to CS[0], it must also choose CS[1] and vice
versa. Same for CS[2] requires also using CS[3].
VE[0] = CS[0], CS[2]
VE[1] = CS[1], CS[3]

Example 3 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=2,
             engines=CS[0],CS[1],CS[1],CS[3])

Results in the following valid and invalid placements:
CS[0], CS[1]
CS[1], CS[3] - Not logically contiguous, return -EINVAL

Context Engine Map uAPI

Context engine map is a new way of addressing engines when submitting batch- buffers, replacing the existing way of using identifiers like I915_EXEC_BLT inside the flags field of struct drm_i915_gem_execbuffer2.

To use it created GEM contexts need to be configured with a list of engines the user is intending to submit to. This is accomplished using the I915_CONTEXT_PARAM_ENGINES parameter and struct i915_context_param_engines.

For such contexts the I915_EXEC_RING_MASK field becomes an index into the configured map.

Example of creating such context and submitting against it:

I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
        .engines = { { I915_ENGINE_CLASS_RENDER, 0 },
                     { I915_ENGINE_CLASS_COPY, 0 } }
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
        .base = {
                .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
        },
        .param = {
                .param = I915_CONTEXT_PARAM_ENGINES,
                .value = to_user_pointer(&engines),
                .size = sizeof(engines),
        },
};
struct drm_i915_gem_context_create_ext create = {
        .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
        .extensions = to_user_pointer(&p_engines);
};

ctx_id = gem_context_create_ext(drm_fd, &create);

// We have now created a GEM context with two engines in the map:
// Index 0 points to rcs0 while index 1 points to bcs0. Other engines
// will not be accessible from this context.

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0, which is rcs0 for this context
gem_execbuf(drm_fd, &execbuf);

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 1; // Submits to index 0, which is bcs0 for this context
gem_execbuf(drm_fd, &execbuf);

struct drm_i915_gem_context_create_ext_setparam¶: Context parameter to set or query during context creation.

Definition:

struct drm_i915_gem_context_create_ext_setparam {
    struct i915_user_extension base;
    struct drm_i915_gem_context_param param;
};

Members

base: Extension link. See struct i915_user_extension.
param: Context parameter to set or query. See struct drm_i915_gem_context_param.

struct drm_i915_gem_vm_control¶: Structure to create or destroy VM.

Definition:

struct drm_i915_gem_vm_control {
    __u64 extensions;
    __u32 flags;
    __u32 vm_id;
};

Members

extensions: Zero-terminated chain of extensions.
flags: reserved for future usage, currently MBZ
vm_id: Id of the VM created or to be destroyed

Description

DRM_I915_GEM_VM_CREATE -

Create a new virtual memory address space (ppGTT) for use within a context on the same file. Extensions can be provided to configure exactly how the address space is setup upon creation.

The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is returned in the outparam id.

An extension chain maybe provided, starting with extensions, and terminated by the next_extension being 0. Currently, no extensions are defined.

DRM_I915_GEM_VM_DESTROY -

Destroys a previously created VM id, specified in vm_id.

No extensions or flags are allowed currently, and so must be zero.

struct drm_i915_gem_userptr¶: Create GEM object from user allocated memory.

Definition:

struct drm_i915_gem_userptr {
    __u64 user_ptr;
    __u64 user_size;
    __u32 flags;
#define I915_USERPTR_READ_ONLY 0x1;
#define I915_USERPTR_PROBE 0x2;
#define I915_USERPTR_UNSYNCHRONIZED 0x80000000;
    __u32 handle;
};

Members

user_ptr

The pointer to the allocated memory.

Needs to be aligned to PAGE_SIZE.

user_size

The size in bytes for the allocated memory. This will also become the object size.

Needs to be aligned to PAGE_SIZE, and should be at least PAGE_SIZE, or larger.

flags

Supported flags:

I915_USERPTR_READ_ONLY:

Mark the object as readonly, this also means GPU access can only be readonly. This is only supported on HW which supports readonly access through the GTT. If the HW can't support readonly access, an error is returned.

I915_USERPTR_PROBE:

Probe the provided user_ptr range and validate that the user_ptr is indeed pointing to normal memory and that the range is also valid. For example if some garbage address is given to the kernel, then this should complain.

Returns -EFAULT if the probe failed.

Note that this doesn't populate the backing pages, and also doesn't guarantee that the object will remain valid when the object is eventually used.

The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE returns a non-zero value.

I915_USERPTR_UNSYNCHRONIZED:

NOT USED. Setting this flag will result in an error.

handle

Returned handle for the object.

Object handles are nonzero.

Description

Userptr objects have several restrictions on what ioctls can be used with the object handle.

struct drm_i915_perf_oa_config¶

Definition:

struct drm_i915_perf_oa_config {
    char uuid[36];
    __u32 n_mux_regs;
    __u32 n_boolean_regs;
    __u32 n_flex_regs;
    __u64 mux_regs_ptr;
    __u64 boolean_regs_ptr;
    __u64 flex_regs_ptr;
};

Members

uuid: String formatted like "%08x-%04x-%04x-%04x-%012x"
n_mux_regs: Number of mux regs in mux_regs_ptr.
n_boolean_regs: Number of boolean regs in boolean_regs_ptr.
n_flex_regs: Number of flex regs in flex_regs_ptr.
mux_regs_ptr: Pointer to tuples of u32 values (register address, value) for mux registers. Expected length of buffer is (2 * sizeof(u32) * n_mux_regs).
boolean_regs_ptr: Pointer to tuples of u32 values (register address, value) for mux registers. Expected length of buffer is (2 * sizeof(u32) * n_boolean_regs).
flex_regs_ptr: Pointer to tuples of u32 values (register address, value) for mux registers. Expected length of buffer is (2 * sizeof(u32) * n_flex_regs).

Description

Structure to upload perf dynamic configuration into the kernel.

struct drm_i915_query_item¶: An individual query for the kernel to process.

Definition:

struct drm_i915_query_item {
    __u64 query_id;
#define DRM_I915_QUERY_TOPOLOGY_INFO            1;
#define DRM_I915_QUERY_ENGINE_INFO              2;
#define DRM_I915_QUERY_PERF_CONFIG              3;
#define DRM_I915_QUERY_MEMORY_REGIONS           4;
#define DRM_I915_QUERY_HWCONFIG_BLOB            5;
#define DRM_I915_QUERY_GEOMETRY_SUBSLICES       6;
    __s32 length;
    __u32 flags;
#define DRM_I915_QUERY_PERF_CONFIG_LIST          1;
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2;
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID   3;
    __u64 data_ptr;
};

Members

query_id

The id for this query. Currently accepted query IDs are:

DRM_I915_QUERY_TOPOLOGY_INFO (see struct drm_i915_query_topology_info)
DRM_I915_QUERY_ENGINE_INFO (see struct drm_i915_engine_info)
DRM_I915_QUERY_PERF_CONFIG (see struct drm_i915_query_perf_config)
DRM_I915_QUERY_MEMORY_REGIONS (see struct drm_i915_query_memory_regions)
DRM_I915_QUERY_HWCONFIG_BLOB (see GuC HWCONFIG blob uAPI)
DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct drm_i915_query_topology_info)

length

When set to zero by userspace, this is filled with the size of the data to be written at the data_ptr pointer. The kernel sets this value to a negative value to signal an error on a particular query item.

flags

When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0.

When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the following:

DRM_I915_QUERY_PERF_CONFIG_LIST

DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID

DRM_I915_QUERY_PERF_CONFIG_FOR_UUID

When query_id == DRM_I915_QUERY_GEOMETRY_SUBSLICES must contain a struct i915_engine_class_instance that references a render engine.

data_ptr

Data will be written at the location pointed by data_ptr when the value of length matches the length of the data to be written by the kernel.

Description

The behaviour is determined by the query_id. Note that exactly what data_ptr is also depends on the specific query_id.

struct drm_i915_query¶: Supply an array of struct drm_i915_query_item for the kernel to fill out.

Definition:

struct drm_i915_query {
    __u32 num_items;
    __u32 flags;
    __u64 items_ptr;
};

Members

num_items: The number of elements in the items_ptr array
flags: Unused for now. Must be cleared to zero.
items_ptr: Pointer to an array of struct drm_i915_query_item. The number of array elements is num_items.

Description

Note that this is generally a two step process for each struct drm_i915_query_item in the array:

Call the DRM_IOCTL_I915_QUERY, giving it our array of struct drm_i915_query_item, with drm_i915_query_item.length set to zero. The kernel will then fill in the size, in bytes, which tells userspace how memory it needs to allocate for the blob(say for an array of properties).
Next we call DRM_IOCTL_I915_QUERY again, this time with the drm_i915_query_item.data_ptr equal to our newly allocated blob. Note that the drm_i915_query_item.length should still be the same as what the kernel previously set. At this point the kernel can fill in the blob.

Note that for some query items it can make sense for userspace to just pass in a buffer/blob equal to or larger than the required size. In this case only a single ioctl call is needed. For some smaller query items this can work quite well.

struct drm_i915_query_topology_info¶

Definition:

struct drm_i915_query_topology_info {
    __u16 flags;
    __u16 max_slices;
    __u16 max_subslices;
    __u16 max_eus_per_subslice;
    __u16 subslice_offset;
    __u16 subslice_stride;
    __u16 eu_offset;
    __u16 eu_stride;
    __u8 data[];
};

Members

flags

Unused for now. Must be cleared to zero.

max_slices

The number of bits used to express the slice mask.

max_subslices

The number of bits used to express the subslice mask.

max_eus_per_subslice

The number of bits in the EU mask that correspond to a single subslice's EUs.

subslice_offset

Offset in data[] at which the subslice masks are stored.

subslice_stride

Stride at which each of the subslice masks for each slice are stored.

eu_offset

Offset in data[] at which the EU masks are stored.

eu_stride

Stride at which each of the EU masks for each subslice are stored.

data

Contains 3 pieces of information :

The slice mask with one bit per slice telling whether a slice is available. The availability of slice X can be queried with the following formula :
```
(data[X / 8] >> (X % 8)) & 1
```
Starting with Xe_HP platforms, Intel hardware no longer has traditional slices so i915 will always report a single slice (hardcoded slicemask = 0x1) which contains all of the platform's subslices. I.e., the mask here does not reflect any of the newer hardware concepts such as "gslices" or "cslices" since userspace is capable of inferring those from the subslice mask.
The subslice mask for each slice with one bit per subslice telling whether a subslice is available. Starting with Gen12 we use the term "subslice" to refer to what the hardware documentation describes as a "dual-subslices." The availability of subslice Y in slice X can be queried with the following formula :
```
(data[subslice_offset + X * subslice_stride + Y / 8] >> (Y % 8)) & 1
```
The EU mask for each subslice in each slice, with one bit per EU telling whether an EU is available. The availability of EU Z in subslice Y in slice X can be queried with the following formula :
```
(data[eu_offset +
      (X * max_subslices + Y) * eu_stride +
      Z / 8
 ] >> (Z % 8)) & 1
```

Description

Describes slice/subslice/EU information queried by DRM_I915_QUERY_TOPOLOGY_INFO

Engine Discovery uAPI

Engine discovery uAPI is a way of enumerating physical engines present in a GPU associated with an open i915 DRM file descriptor. This supersedes the old way of using DRM_IOCTL_I915_GETPARAM and engine identifiers like I915_PARAM_HAS_BLT.

The need for this interface came starting with Icelake and newer GPUs, which started to establish a pattern of having multiple engines of a same class, where not all instances were always completely functionally equivalent.

Entry point for this uapi is DRM_IOCTL_I915_QUERY with the DRM_I915_QUERY_ENGINE_INFO as the queried item id.

Example for getting the list of engines:

struct drm_i915_query_engine_info *info;
struct drm_i915_query_item item = {
        .query_id = DRM_I915_QUERY_ENGINE_INFO;
};
struct drm_i915_query query = {
        .num_items = 1,
        .items_ptr = (uintptr_t)&item,
};
int err, i;

// First query the size of the blob we need, this needs to be large
// enough to hold our array of engines. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
//
// Alternatively a large buffer can be allocated straight away enabling
// querying in one pass, in which case item.length should contain the
// length of the provided buffer.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with info on all engines.
item.data_ptr = (uintptr_t)&info,

err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

// We can now access each engine in the array
for (i = 0; i < info->num_engines; i++) {
        struct drm_i915_engine_info einfo = info->engines[i];
        u16 class = einfo.engine.class;
        u16 instance = einfo.engine.instance;
        ....
}

free(info);

Each of the enumerated engines, apart from being defined by its class and instance (see struct i915_engine_class_instance), also can have flags and capabilities defined as documented in i915_drm.h.

For instance video engines which support HEVC encoding will have the I915_VIDEO_CLASS_CAPABILITY_HEVC capability bit set.

Engine discovery only fully comes to its own when combined with the new way of addressing engines when submitting batch buffers using contexts with engine maps configured.

struct drm_i915_engine_info¶

Definition:

struct drm_i915_engine_info {
    struct i915_engine_class_instance engine;
    __u32 rsvd0;
    __u64 flags;
#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE           (1 << 0);
    __u64 capabilities;
#define I915_VIDEO_CLASS_CAPABILITY_HEVC                (1 << 0);
#define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC     (1 << 1);
    __u16 logical_instance;
    __u16 rsvd1[3];
    __u64 rsvd2[3];
};

Members

engine: Engine class and instance.
rsvd0: Reserved field.
flags: Engine flags.
capabilities: Capabilities of this engine.
logical_instance: Logical instance of engine
rsvd1: Reserved fields.
rsvd2: Reserved fields.

Description

Describes one engine and it's capabilities as known to the driver.

struct drm_i915_query_engine_info¶

Definition:

struct drm_i915_query_engine_info {
    __u32 num_engines;
    __u32 rsvd[3];
    struct drm_i915_engine_info engines[];
};

Members

num_engines: Number of struct drm_i915_engine_info structs following.
rsvd: MBZ
engines: Marker for drm_i915_engine_info structures.

Description

Engine info query enumerates all engines known to the driver by filling in an array of struct drm_i915_engine_info structures.

struct drm_i915_query_perf_config¶

Definition:

struct drm_i915_query_perf_config {
    union {
        __u64 n_configs;
        __u64 config;
        char uuid[36];
    };
    __u32 flags;
    __u8 data[];
};

Members

{unnamed_union}

anonymous

n_configs

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 sets this fields to the number of configurations available.

config

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID, i915 will use the value in this field as configuration identifier to decide what data to write into config_ptr.

uuid

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID, i915 will use the value in this field as configuration identifier to decide what data to write into config_ptr.

String formatted like "08x-````04x-````04x-````04x-````012x"

flags

Unused for now. Must be cleared to zero.

data

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 will write an array of __u64 of configuration identifiers.

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA, i915 will write a struct drm_i915_perf_oa_config. If the following fields of struct drm_i915_perf_oa_config are not set to 0, i915 will write into the associated pointers the values of submitted when the configuration was created :

drm_i915_perf_oa_config.n_mux_regs

drm_i915_perf_oa_config.n_boolean_regs

drm_i915_perf_oa_config.n_flex_regs

Description

Data written by the kernel with query DRM_I915_QUERY_PERF_CONFIG and DRM_I915_QUERY_GEOMETRY_SUBSLICES.

enum drm_i915_gem_memory_class¶: Supported memory classes

Constants

I915_MEMORY_CLASS_SYSTEM: System memory
I915_MEMORY_CLASS_DEVICE: Device local-memory

struct drm_i915_gem_memory_class_instance¶: Identify particular memory region

Definition:

struct drm_i915_gem_memory_class_instance {
    __u16 memory_class;
    __u16 memory_instance;
};

Members

memory_class: See enum drm_i915_gem_memory_class
memory_instance: Which instance

struct drm_i915_memory_region_info¶: Describes one region as known to the driver.

Definition:

struct drm_i915_memory_region_info {
    struct drm_i915_gem_memory_class_instance region;
    __u32 rsvd0;
    __u64 probed_size;
    __u64 unallocated_size;
    union {
        __u64 rsvd1[8];
        struct {
            __u64 probed_cpu_visible_size;
            __u64 unallocated_cpu_visible_size;
        };
    };
};

Members

region

The class:instance pair encoding

rsvd0

MBZ

probed_size

Memory probed by the driver

Note that it should not be possible to ever encounter a zero value here, also note that no current region type will ever return -1 here. Although for future region types, this might be a possibility. The same applies to the other size fields.

unallocated_size

Estimate of memory remaining

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this (or if this is an older kernel) the value here will always equal the probed_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).

{unnamed_union}

anonymous

rsvd1

MBZ

{unnamed_struct}

anonymous

probed_cpu_visible_size

Memory probed by the driver that is CPU accessible.

This will be always be <= probed_size, and the remainder (if there is any) will not be CPU accessible.

On systems without small BAR, the probed_size will always equal the probed_cpu_visible_size, since all of it will be CPU accessible.

Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).

Note that if the value returned here is zero, then this must be an old kernel which lacks the relevant small-bar uAPI support (including I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on such systems we should never actually end up with a small BAR configuration, assuming we are able to load the kernel module. Hence it should be safe to treat this the same as when probed_cpu_visible_size == probed_size.

unallocated_cpu_visible_size

Estimate of CPU visible memory remaining.

Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_cpu_visible_size).

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this the value here will always equal the probed_cpu_visible_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will also always equal the probed_cpu_visible_size).

If this is an older kernel the value here will be zero, see also probed_cpu_visible_size.

Description

Note this is using both struct drm_i915_query_item and struct drm_i915_query. For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS at drm_i915_query_item.query_id.

struct drm_i915_query_memory_regions¶

Definition:

struct drm_i915_query_memory_regions {
    __u32 num_regions;
    __u32 rsvd[3];
    struct drm_i915_memory_region_info regions[];
};

Members

num_regions: Number of supported regions
rsvd: MBZ
regions: Info about each supported region

Description

The region info query enumerates all regions known to the driver by filling in an array of struct drm_i915_memory_region_info structures.

Example for getting the list of supported regions:

struct drm_i915_query_memory_regions *info;
struct drm_i915_query_item item = {
        .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
};
struct drm_i915_query query = {
        .num_items = 1,
        .items_ptr = (uintptr_t)&item,
};
int err, i;

// First query the size of the blob we need, this needs to be large
// enough to hold our array of regions. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with the all the region info.
item.data_ptr = (uintptr_t)&info,

err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

// We can now access each region in the array
for (i = 0; i < info->num_regions; i++) {
        struct drm_i915_memory_region_info mr = info->regions[i];
        u16 class = mr.region.class;
        u16 instance = mr.region.instance;

        ....
}

free(info);

GuC HWCONFIG blob uAPI

The GuC produces a blob with information about the current device. i915 reads this blob from GuC and makes it available via this uAPI.

The format and meaning of the blob content are documented in the Programmer's Reference Manual.

struct drm_i915_gem_create_ext¶: Existing gem_create behaviour, with added extension support using struct i915_user_extension.

Definition:

struct drm_i915_gem_create_ext {
    __u64 size;
    __u32 handle;
#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0);
    __u32 flags;
#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0;
#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1;
#define I915_GEM_CREATE_EXT_SET_PAT 2;
    __u64 extensions;
};

Members

size

Requested size for the object.

The (page-aligned) allocated size for the object will be returned.

On platforms like DG2/ATS the kernel will always use 64K or larger pages for I915_MEMORY_CLASS_DEVICE. The kernel also requires a minimum of 64K GTT alignment for such objects.

NOTE: Previously the ABI here required a minimum GTT alignment of 2M on DG2/ATS, due to how the hardware implemented 64K GTT page support, where we had the following complications:

1) The entire PDE (which covers a 2MB virtual address range), must contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same PDE is forbidden by the hardware.

2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM objects.

However on actual production HW this was completely changed to now allow setting a TLB hint at the PTE level (see PS64), which is a lot more flexible than the above. With this the 2M restriction was dropped where we now only require 64K.

handle

Returned handle for the object.

Object handles are nonzero.

flags

Optional flags.

Supported values:

I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that the object will need to be accessed via the CPU.

Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only strictly required on configurations where some subset of the device memory is directly visible/mappable through the CPU (which we also call small BAR), like on some DG2+ systems. Note that this is quite undesirable, but due to various factors like the client CPU, BIOS etc it's something we can expect to see in the wild. See drm_i915_memory_region_info.probed_cpu_visible_size for how to determine if this system applies.

Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to ensure the kernel can always spill the allocation to system memory, if the object can't be allocated in the mappable part of I915_MEMORY_CLASS_DEVICE.

Also note that since the kernel only supports flat-CCS on objects that can only be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with flat-CCS.

Without this hint, the kernel will assume that non-mappable I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the kernel can still migrate the object to the mappable part, as a last resort, if userspace ever CPU faults this object, but this might be expensive, and so ideally should be avoided.

On older kernels which lack the relevant small-bar uAPI support (see also drm_i915_memory_region_info.probed_cpu_visible_size), usage of the flag will result in an error, but it should NEVER be possible to end up with a small BAR configuration, assuming we can also successfully load the i915 kernel module. In such cases the entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as such there are zero restrictions on where the object can be placed.

extensions

The chain of extensions to apply to this object.

This will be useful in the future when we need to support several different extensions, and we need to apply more than one when creating the object. See struct i915_user_extension.

If we don't supply any extensions then we get the same old gem_create behaviour.

For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see struct drm_i915_gem_create_ext_memory_regions.

For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see struct drm_i915_gem_create_ext_protected_content.

For I915_GEM_CREATE_EXT_SET_PAT usage see struct drm_i915_gem_create_ext_set_pat.

Description

Note that new buffer flags should be added here, at least for the stuff that is immutable. Previously we would have two ioctls, one to create the object with gem_create, and another to apply various parameters, however this creates some ambiguity for the params which are considered immutable. Also in general we're phasing out the various SET/GET ioctls.

struct drm_i915_gem_create_ext_memory_regions¶: The I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.

Definition:

struct drm_i915_gem_create_ext_memory_regions {
    struct i915_user_extension base;
    __u32 pad;
    __u32 num_regions;
    __u64 regions;
};

Members

base

Extension link. See struct i915_user_extension.

pad

MBZ

num_regions

Number of elements in the regions array.

regions

The regions/placements array.

An array of struct drm_i915_gem_memory_class_instance.

Description

Set the object with the desired set of placements/regions in priority order. Each entry must be unique and supported by the device.

This is provided as an array of struct drm_i915_gem_memory_class_instance, or an equivalent layout of class:instance pair encodings. See struct drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to query the supported regions for a device.

As an example, on discrete devices, if we wish to set the placement as device local-memory we can do something like:

struct drm_i915_gem_memory_class_instance region_lmem = {
        .memory_class = I915_MEMORY_CLASS_DEVICE,
        .memory_instance = 0,
};
struct drm_i915_gem_create_ext_memory_regions regions = {
        .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
        .regions = (uintptr_t)&region_lmem,
        .num_regions = 1,
};
struct drm_i915_gem_create_ext create_ext = {
        .size = 16 * PAGE_SIZE,
        .extensions = (uintptr_t)&regions,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...

At which point we get the object handle in drm_i915_gem_create_ext.handle, along with the final object size in drm_i915_gem_create_ext.size, which should account for any rounding up, if required.

Note that userspace has no means of knowing the current backing region for objects where num_regions is larger than one. The kernel will only ensure that the priority order of the regions array is honoured, either when initially placing the object, or when moving memory around due to memory pressure

On Flat-CCS capable HW, compression is supported for the objects residing in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) have other memory class in regions and migrated (by i915, due to memory constraints) to the non I915_MEMORY_CLASS_DEVICE region, then i915 needs to decompress the content. But i915 doesn't have the required information to decompress the userspace compressed objects.

So i915 supports Flat-CCS, on the objects which can reside only on I915_MEMORY_CLASS_DEVICE regions.

struct drm_i915_gem_create_ext_protected_content¶: The I915_OBJECT_PARAM_PROTECTED_CONTENT extension.

Definition:

struct drm_i915_gem_create_ext_protected_content {
    struct i915_user_extension base;
    __u32 flags;
};

Members

base: Extension link. See struct i915_user_extension.
flags: reserved for future usage, currently MBZ

Description

If this extension is provided, buffer contents are expected to be protected by PXP encryption and require decryption for scan out and processing. This is only possible on platforms that have PXP enabled, on all other scenarios using this extension will cause the ioctl to fail and return -ENODEV. The flags parameter is reserved for future expansion and must currently be set to zero.

The buffer contents are considered invalid after a PXP session teardown.

The encryption is guaranteed to be processed correctly only if the object is submitted with a context created using the I915_CONTEXT_PARAM_PROTECTED_CONTENT flag. This will also enable extra checks at submission time on the validity of the objects involved.

Below is an example on how to create a protected object:

struct drm_i915_gem_create_ext_protected_content protected_ext = {
        .base = { .name = I915_GEM_CREATE_EXT_PROTECTED_CONTENT },
        .flags = 0,
};
struct drm_i915_gem_create_ext create_ext = {
        .size = PAGE_SIZE,
        .extensions = (uintptr_t)&protected_ext,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...

struct drm_i915_gem_create_ext_set_pat¶: The I915_GEM_CREATE_EXT_SET_PAT extension.

Definition:

struct drm_i915_gem_create_ext_set_pat {
    struct i915_user_extension base;
    __u32 pat_index;
    __u32 rsvd;
};

Members

base: Extension link. See struct i915_user_extension.
pat_index: PAT index to be set PAT index is a bit field in Page Table Entry to control caching behaviors for GPU accesses. The definition of PAT index is platform dependent and can be found in hardware specifications,
rsvd: reserved for future use

Description

If this extension is provided, the specified caching policy (PAT index) is applied to the buffer object.

Below is an example on how to create an object with specific caching policy:

struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
        .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
        .pat_index = 0,
};
struct drm_i915_gem_create_ext create_ext = {
        .size = PAGE_SIZE,
        .extensions = (uintptr_t)&set_pat_ext,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...

drm/nouveau uAPI¶

VM_BIND / EXEC uAPI¶

Nouveau's VM_BIND / EXEC UAPI consists of three ioctls: DRM_NOUVEAU_VM_INIT, DRM_NOUVEAU_VM_BIND and DRM_NOUVEAU_EXEC.

In order to use the UAPI firstly a user client must initialize the VA space using the DRM_NOUVEAU_VM_INIT ioctl specifying which region of the VA space should be managed by the kernel and which by the UMD.

The DRM_NOUVEAU_VM_BIND ioctl provides clients an interface to manage the userspace-managable portion of the VA space. It provides operations to map and unmap memory. Mappings may be flagged as sparse. Sparse mappings are not backed by a GEM object and the kernel will ignore GEM handles provided alongside a sparse mapping.

Userspace may request memory backed mappings either within or outside of the bounds (but not crossing those bounds) of a previously mapped sparse mapping. Subsequently requested memory backed mappings within a sparse mapping will take precedence over the corresponding range of the sparse mapping. If such memory backed mappings are unmapped the kernel will make sure that the corresponding sparse mapping will take their place again. Requests to unmap a sparse mapping that still contains memory backed mappings will result in those memory backed mappings being unmapped first.

Unmap requests are not bound to the range of existing mappings and can even overlap the bounds of sparse mappings. For such a request the kernel will make sure to unmap all memory backed mappings within the given range, splitting up memory backed mappings which are only partially contained within the given range. Unmap requests with the sparse flag set must match the range of a previously mapped sparse mapping exactly though.

While the kernel generally permits arbitrary sequences and ranges of memory backed mappings being mapped and unmapped, either within a single or multiple VM_BIND ioctl calls, there are some restrictions for sparse mappings.

The kernel does not permit to:

unmap non-existent sparse mappings
unmap a sparse mapping and map a new sparse mapping overlapping the range of the previously unmapped sparse mapping within the same VM_BIND ioctl
unmap a sparse mapping and map new memory backed mappings overlapping the range of the previously unmapped sparse mapping within the same VM_BIND ioctl

When using the VM_BIND ioctl to request the kernel to map memory to a given virtual address in the GPU's VA space there is no guarantee that the actual mappings are created in the GPU's MMU. If the given memory is swapped out at the time the bind operation is executed the kernel will stash the mapping details into it's internal alloctor and create the actual MMU mappings once the memory is swapped back in. While this is transparent for userspace, it is guaranteed that all the backing memory is swapped back in and all the memory mappings, as requested by userspace previously, are actually mapped once the DRM_NOUVEAU_EXEC ioctl is called to submit an exec job.

A VM_BIND job can be executed either synchronously or asynchronously. If exectued asynchronously, userspace may provide a list of syncobjs this job will wait for and/or a list of syncobj the kernel will signal once the VM_BIND job finished execution. If executed synchronously the ioctl will block until the bind job is finished. For synchronous jobs the kernel will not permit any syncobjs submitted to the kernel.

To execute a push buffer the UAPI provides the DRM_NOUVEAU_EXEC ioctl. EXEC jobs are always executed asynchronously, and, equal to VM_BIND jobs, provide the option to synchronize them with syncobjs.

Besides that, EXEC jobs can be scheduled for a specified channel to execute on.

Since VM_BIND jobs update the GPU's VA space on job submit, EXEC jobs do have an up to date view of the VA space. However, the actual mappings might still be pending. Hence, EXEC jobs require to have the particular fences - of the corresponding VM_BIND jobs they depent on - attached to them.

struct drm_nouveau_sync¶: sync object

Definition:

struct drm_nouveau_sync {
    __u32 flags;
#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0;
#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1;
#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf;
    __u32 handle;
    __u64 timeline_value;
};

Members

flags

the flags for a sync object

The first 8 bits are used to determine the type of the sync object.

handle

the handle of the sync object

timeline_value

The timeline point of the sync object in case the syncobj is of type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.

Description

This structure serves as synchronization mechanism for (potentially) asynchronous operations such as EXEC or VM_BIND.

struct drm_nouveau_vm_init¶: GPU VA space init structure

Definition:

struct drm_nouveau_vm_init {
    __u64 kernel_managed_addr;
    __u64 kernel_managed_size;
};

Members

kernel_managed_addr: start address of the kernel managed VA space region
kernel_managed_size: size of the kernel managed VA space region in bytes

Description

Used to initialize the GPU's VA space for a user client, telling the kernel which portion of the VA space is managed by the UMD and kernel respectively.

For the UMD to use the VM_BIND uAPI, this must be called before any BOs or channels are created; if called afterwards DRM_IOCTL_NOUVEAU_VM_INIT fails with -ENOSYS.

struct drm_nouveau_vm_bind_op¶: VM_BIND operation

Definition:

struct drm_nouveau_vm_bind_op {
    __u32 op;
#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x0;
#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x1;
    __u32 flags;
#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8);
    __u32 handle;
    __u32 pad;
    __u64 addr;
    __u64 bo_offset;
    __u64 range;
};

Members

op: the operation type
flags: the flags for a drm_nouveau_vm_bind_op
handle: the handle of the DRM GEM object to map
pad: 32 bit padding, should be 0
addr: the address the VA space region or (memory backed) mapping should be mapped to
bo_offset: the offset within the BO backing the mapping
range: the size of the requested mapping in bytes

Description

This structure represents a single VM_BIND operation. UMDs should pass an array of this structure via struct drm_nouveau_vm_bind's op_ptr field.

struct drm_nouveau_vm_bind¶: structure for DRM_IOCTL_NOUVEAU_VM_BIND

Definition:

struct drm_nouveau_vm_bind {
    __u32 op_count;
    __u32 flags;
#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1;
    __u32 wait_count;
    __u32 sig_count;
    __u64 wait_ptr;
    __u64 sig_ptr;
    __u64 op_ptr;
};

Members

op_count: the number of drm_nouveau_vm_bind_op
flags: the flags for a drm_nouveau_vm_bind ioctl
wait_count: the number of wait drm_nouveau_syncs
sig_count: the number of drm_nouveau_syncs to signal when finished
wait_ptr: pointer to drm_nouveau_syncs to wait for
sig_ptr: pointer to drm_nouveau_syncs to signal when finished
op_ptr: pointer to the drm_nouveau_vm_bind_ops to execute

struct drm_nouveau_exec_push¶: EXEC push operation

Definition:

struct drm_nouveau_exec_push {
    __u64 va;
    __u32 va_len;
    __u32 flags;
#define DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH 0x1;
};

Members

va: the virtual address of the push buffer mapping
va_len: the length of the push buffer mapping
flags: the flags for this push buffer mapping

Description

This structure represents a single EXEC push operation. UMDs should pass an array of this structure via struct drm_nouveau_exec's push_ptr field.

struct drm_nouveau_exec¶: structure for DRM_IOCTL_NOUVEAU_EXEC

Definition:

struct drm_nouveau_exec {
    __u32 channel;
    __u32 push_count;
    __u32 wait_count;
    __u32 sig_count;
    __u64 wait_ptr;
    __u64 sig_ptr;
    __u64 push_ptr;
};

Members

channel: the channel to execute the push buffer in
push_count: the number of drm_nouveau_exec_push ops
wait_count: the number of wait drm_nouveau_syncs
sig_count: the number of drm_nouveau_syncs to signal when finished
wait_ptr: pointer to drm_nouveau_syncs to wait for
sig_ptr: pointer to drm_nouveau_syncs to signal when finished
push_ptr: pointer to drm_nouveau_exec_push ops

The Linux Kernel

Contents

This Page

DRM Driver uAPI¶

drm/i915 uAPI¶

drm/nouveau uAPI¶

VM_BIND / EXEC uAPI¶