Neptune Query Language (NQL)#

Experimental

This feature is experimental. We're happy to hear your feedback through GitHub!

When using fetch_runs_table() to fetch runs from your project, you can pass a raw NQL string to the query argument.

import neptune

project = neptune.init_project()
project.fetch_runs_table(
    query='(`sys/tags`:stringSet CONTAINS "some-tag") AND (`f1`:float >= 0.85)'
)

This way, the runs can be filtered by any field and a number of criteria.
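As a minimal sketch of how the result might be consumed: the table returned by fetch_runs_table() can be converted to a pandas DataFrame with to_pandas(). The helper name runs_matching is hypothetical, and the function only assumes that its project argument behaves like the object returned by neptune.init_project().

```python
def runs_matching(project, query):
    """Fetch the runs that match a raw NQL string as a pandas DataFrame.

    `project` is assumed to be the object returned by neptune.init_project().
    """
    table = project.fetch_runs_table(query=query)
    return table.to_pandas()


# Usage (requires the neptune package and access to a project):
# import neptune
# project = neptune.init_project()
# df = runs_matching(project, '`f1`:float >= 0.85')
```

Keeping the query as a plain argument makes it easy to reuse the same helper with any of the example queries in the later sections.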

You can filter model objects similarly when using fetch_models_table() and fetch_model_versions_table().

How is NQL different from the app search?

The search query builder in the web app adds some extra functionality to make query building more convenient. Queries are converted to raw NQL under the hood.

In this first version of the API's querying capabilities, NQL is exposed without modifications.

The later sections contain example queries for various data types:

Float
Float series
String
Tags
Artifact version
System metadata (name, size, state, timestamps, etc.)

NQL syntax#

An NQL query has the following parts:

`<field name>`:<fieldType> <OPERATOR> <zero or more values>

For example:

`scores/f1`:float >= 0.60

Building a query: Step by step#

The following tabs walk you through constructing each part of a valid query.

1. Field name
2. Field type
3. Operator
4. Value
5. Statistical function (optional)

Query constructed so far

`scores/f1`

Use the field name you specified when assigning the metadata to the run. For the above example, it would be: run["scores/f1"] = f1_score

While usually not necessary, it's safest to enclose the field name in single backquotes (`).

Query constructed so far

`scores/f1`:float

For Neptune to correctly parse the specified field name, you need to provide the Neptune field type immediately after the field name, separated by a colon (:). The field type must be in camel case.¹

Available types:

float

floatSeries

string

stringSeries

stringSet

experimentState

artifact

bool

datetime

int

Query constructed so far

`scores/f1`:float >

The available operators depend on the field type.

Operators	Supported field types
`=`, `!=`	`artifact`, `bool`, `experimentState`, `string`, `int`, `float`, `floatSeries` aggregates
`>`, `>=`, `<`, `<=`	`int`, `float`, `floatSeries` aggregates
`CONTAINS`	`string`, `stringSeries`, `stringSet`
`EXISTS`	Any
`NOT`	Negates other operators or clauses. See Negation ↓

Query constructed so far

`scores/f1`:float > 0.8

It's usually possible to enter the plain value without quotes, but in some cases double quotes (") are necessary, for example, if the value contains a space.

Enclosing a special value in quotes

query='`sys/tags`:stringSet CONTAINS "my tag"'

(OPTIONAL) If your field is a float series, you need to wrap the first part of the expression in a supported aggregate function: average(), last(), max(), or min().

Example query

average(`accuracy`:floatSeries) > 0.8
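The steps above can be sketched as a small string-building helper. This is an illustration only: nql_value and nql_clause are hypothetical names, not part of the Neptune client, and the sketch does not escape double quotes inside values.

```python
def nql_value(value):
    """Format a Python value for use in an NQL clause."""
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings are always wrapped in double quotes, which also covers
    # values that contain spaces.
    return f'"{value}"'


def nql_clause(field, field_type, operator, value=None):
    """Build `<field name>`:<fieldType> <OPERATOR> <value> as a string."""
    clause = f"`{field}`:{field_type} {operator}"
    if value is not None:
        clause += f" {nql_value(value)}"
    return clause


print(nql_clause("scores/f1", "float", ">", 0.8))
# `scores/f1`:float > 0.8
print(nql_clause("sys/tags", "stringSet", "CONTAINS", "my tag"))
# `sys/tags`:stringSet CONTAINS "my tag"
```

The resulting string can be passed as the query argument of fetch_runs_table().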

Multi-clause (complex) queries#

You can also build a complex query, in which multiple conditions are joined by logical operators.

Surround the clauses with () and use AND or OR to join them.

(`field1`:fieldType = value1) AND (`field2`:fieldType = value2)

Example query: Particular learning rate and high enough final accuracy

query='(last(`metrics/acc`:floatSeries) >= 0.85) AND (`learning_rate`:float = 0.002)'

Note that each run is matched against the full query individually.
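When the clauses come from a list, the joining rule can be sketched as follows. The helper name join_clauses is hypothetical; it only wraps each clause in parentheses and inserts AND or OR between them.

```python
def join_clauses(clauses, logical_operator="AND"):
    """Wrap each clause in parentheses and join them with AND or OR."""
    if logical_operator not in ("AND", "OR"):
        raise ValueError("logical_operator must be 'AND' or 'OR'")
    return f" {logical_operator} ".join(f"({clause})" for clause in clauses)


query = join_clauses(
    [
        "`learning_rate`:float = 0.002",
        '`sys/tags`:stringSet CONTAINS "exploration"',
    ]
)
print(query)
# (`learning_rate`:float = 0.002) AND (`sys/tags`:stringSet CONTAINS "exploration")
```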

Negation#

You can use NOT in front of operators or clauses.

The following are equivalent and would exclude runs that have "blobfish" in their name:

`sys/name`:string NOT CONTAINS "blobfish"

NOT `sys/name`:string CONTAINS "blobfish"

You can also negate joined clauses. This requires enclosing them with parentheses:

Don't include failed runs whose names contain "blobfish"

NOT (`sys/name`:string CONTAINS blobfish AND `sys/failed`:bool = True)
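A minimal sketch of negating a whole expression programmatically, assuming a hypothetical helper named negate. It always adds parentheses, which is the form required for joined clauses.

```python
def negate(expression):
    """Negate an NQL expression by enclosing it in NOT (...)."""
    return f"NOT ({expression})"


failed_blobfish = "`sys/name`:string CONTAINS blobfish AND `sys/failed`:bool = True"
print(negate(failed_blobfish))
# NOT (`sys/name`:string CONTAINS blobfish AND `sys/failed`:bool = True)
```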

Aggregate functions of numerical series#

You can use the following statistical (aggregate) functions on FloatSeries fields:

average()
last()
max()
min()

For example, to get the last logged score of a float series field with the path metrics/accuracy:

last(`metrics/accuracy`:floatSeries) >= 0.80
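The following sketch builds an aggregate clause and rejects function names outside the four listed above. It follows the form used in the step-by-step example, where the function wraps the field name together with its floatSeries type. The names AGGREGATE_FUNCTIONS and aggregate_clause are hypothetical.

```python
AGGREGATE_FUNCTIONS = ("average", "last", "max", "min")


def aggregate_clause(function, field, operator, value):
    """Build a clause such as last(`field`:floatSeries) >= 0.8."""
    if function not in AGGREGATE_FUNCTIONS:
        raise ValueError(f"unsupported aggregate function: {function}")
    return f"{function}(`{field}`:floatSeries) {operator} {value}"


print(aggregate_clause("average", "accuracy", ">", 0.8))
# average(`accuracy`:floatSeries) > 0.8
```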

More examples#

Models small enough to be used on mobile that have decent test accuracy#

NQL query

(`model_info/size_MB`:float <= 50) AND (last(`test/acc`:floatSeries) > 0.90)

What was logged

run = neptune.init_run()
run["model_info/size_MB"] = 45.0  # logged as a float so that it matches the float type in the query
for epoch in epochs:
    # training loop
    acc = ...
    run["test/acc"].append(acc)

All of Jackie's runs from the current exploration task#

NQL query

(`sys/owner`:string = "jackie") AND (`sys/tags`:stringSet CONTAINS "exploration")

What was logged

run = neptune.init_run(
    # API token of jackie's account, passed to this argument
    # or set to the NEPTUNE_API_TOKEN environment variable
    api_token="...",
    tags=["exploration", "pretrained"],
)

All failed runs from the start of the year#

NQL query

(sys/creation_time:datetime > "2024-01-01T00:00:00Z") AND (sys/failed:bool = True)

What was logged

# Date is in 2024
run = neptune.init_run()
# Exception was raised during execution

Float#

If you assigned a float value to a field, that field type is Float and can be queried as follows:

Retrieve runs with F1 score lower than 0.5

project.fetch_runs_table(
    query='`f1_score`:float < 0.50'
)

In this case, the logging code could be something like run["f1_score"] = 0.48 for a run matching the expression.

Float series#

If you used append() or extend() to create a series, you need to use an aggregate function to access float values characterizing the series.

Filter by last appended accuracy score

last(`metrics/accuracy`:floatSeries) >= 0.80

The following statistical functions are supported:

average()
last()
max()
min()

String#

You can filter either by the full string, or use the CONTAINS operator to access substrings.

Exact match

project.fetch_runs_table(
    query='`sys/name`:string = "cunning-blobfish"'
)

Partial match (contains substring)

project.fetch_runs_table(
    query='`sys/name`:string CONTAINS "blobfish"'
)

String series#

For StringSeries fields, only the last logged entry is considered.

For example, you can match against the last line of logged console output (stderr or stdout):

project.fetch_runs_table(
    query='`my_monitoring_namespace/stdout`:stringSeries CONTAINS "error"'
)

Tags#

Tags that you add at run creation or later through the web app are stored as a StringSet in the auto-created sys/tags field. To filter by one or more tags, query this field.

Query by single tag

project.fetch_runs_table(
    query='`sys/tags`:stringSet CONTAINS "tag-name"'
)

Query by multiple tags: Matches at least one tag (OR)

(`sys/tags`:stringSet CONTAINS "tag1") OR (`sys/tags`:stringSet CONTAINS "tag2")

Query by multiple tags: Matches all tags (AND)

(`sys/tags`:stringSet CONTAINS "tag1") AND (`sys/tags`:stringSet CONTAINS "tag2")
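For a list of tags, the two multi-tag patterns above can be generated with a short helper. This is a sketch under the assumption that tag names contain no double quotes; tags_query is a hypothetical name.

```python
def tags_query(tags, match="all"):
    """Build a sys/tags query that matches all tags (AND) or any tag (OR)."""
    joiners = {"all": " AND ", "any": " OR "}
    if match not in joiners:
        raise ValueError("match must be 'all' or 'any'")
    clauses = [f'(`sys/tags`:stringSet CONTAINS "{tag}")' for tag in tags]
    return joiners[match].join(clauses)


print(tags_query(["tag1", "tag2"], match="any"))
# (`sys/tags`:stringSet CONTAINS "tag1") OR (`sys/tags`:stringSet CONTAINS "tag2")
```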

System metadata#

The system namespace (sys) automatically stores basic metadata about the environment and run. Most of the values are simple string, float, or Boolean values.

Learn more

API ≫ System namespace (sys)

Date and time#

Neptune automatically creates three timestamp fields:

sys/creation_time: When the run object was first created.
sys/modification_time: When the object was last modified (for example, a tag was removed or some metadata was logged).
sys/ping_time: When the object last interacted with the Python client library (something was logged or modified through the code).

For the value, you can enter a combined date and time representation with a time-zone specification, in ISO 8601 format:

YYYY-MM-DDThh:mm:ssZ

where Z is the time-zone offset for UTC. You can use a different offset.

Pinged by the Python client after 5 AM UTC on a specific date

`sys/ping_time`:datetime > "2024-02-06T05:00:00Z"

Pinged by the Python client after 5 AM Japanese time on a specific date

`sys/ping_time`:datetime > "2024-02-06T05:00:00+09"

You can also enter relative time values:

-2h (last 2 hours)
-5d (last 5 days)
-1M (last month)

Created more than 3 months ago

`sys/creation_time`:datetime < "-3M"
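If the cutoff is available as a Python datetime, the timestamp string can be produced with the standard library. The sketch below normalizes to UTC and uses the Z suffix; nql_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def nql_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 5 AM UTC on a specific date
utc_cutoff = datetime(2024, 2, 6, 5, 0, 0, tzinfo=timezone.utc)
print(f'`sys/ping_time`:datetime > "{nql_timestamp(utc_cutoff)}"')
# `sys/ping_time`:datetime > "2024-02-06T05:00:00Z"

# 5 AM Japanese time (UTC+9) on the same date, expressed in UTC
jst_cutoff = datetime(2024, 2, 6, 5, 0, 0, tzinfo=timezone(timedelta(hours=9)))
print(nql_timestamp(jst_cutoff))
# 2024-02-05T20:00:00Z
```

Converting to UTC first avoids having to format the offset by hand; the second timestamp denotes the same instant as "2024-02-06T05:00:00+09".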

Description#

You can pass a description to the description argument of the init_run() function. You can also set the description through the web app, in the run information modal.

To filter by the description:

Exact match

project.fetch_runs_table(
    query='`sys/description`:string = "test run on new data"'
)

Partial match (contains substring)

project.fetch_runs_table(
    query='`sys/description`:string CONTAINS "new data"'
)

ID#

Each run automatically receives a unique Neptune ID, which consists of the project key and a counter.

Single run

project.fetch_runs_table(
    query='`sys/id`:string = "NLI-345"'
)

Use the OR operator to fetch multiple specific runs at once.

Multiple runs

project.fetch_runs_table(
    query='(`sys/id`:string = "NLI-35") OR (`sys/id`:string = "NLI-36")'
)
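With a longer list of IDs, the OR chain can be generated rather than typed out. A minimal sketch, with ids_query as a hypothetical helper name:

```python
def ids_query(run_ids):
    """Build a query that matches any of the given Neptune run IDs."""
    return " OR ".join(f'(`sys/id`:string = "{run_id}")' for run_id in run_ids)


print(ids_query(["NLI-35", "NLI-36"]))
# (`sys/id`:string = "NLI-35") OR (`sys/id`:string = "NLI-36")
```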

Name#

You can pass a name to the name argument of the init_run() function, or add it later through the run information modal in the web app.

Neptune does not require the name to be unique, but you can use it as a human-friendly identifier.

Exact match

project.fetch_runs_table(
    query='`sys/name`:string = "cunning-blobfish"'
)

Partial match (contains substring)

project.fetch_runs_table(
    query='`sys/name`:string CONTAINS "blobfish"'
)

Owner#

The owner refers to the user or service account that created the run.

By owner: Regular username

project.fetch_runs_table(
    query='`sys/owner`:string = "jackie"'
)

By one of the workspace service accounts

project.fetch_runs_table(
    # Matches all service account names that belong to the workspace ml-team
    query='`sys/owner`:string CONTAINS "@ml-team"'
)

Learn more: Service accounts

Size#

Size refers to the run object itself. That is, how much storage space it's taking up in Neptune.

Note on storage and trash

As long as runs remain in the project trash, they take up space.

By default, trashed objects are excluded from the query. To include them:

Find large runs, including trashed ones

project.fetch_runs_table(
    query='`sys/size`:float > 100MB',
    trashed=None,  # include trashed runs; set to True to fetch only trashed runs
)

Run objects larger than 10 MB

project.fetch_runs_table(
    query='`sys/size`:float > 10MB'
)

There are a few ways to enter the size value. If you include a space, you need to enclose the value in double quotes (").

The following are equivalent:

`sys/size`:float > 800kb
`sys/size`:float > "800 kb"
`sys/size`:float > 800000
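The equivalence above implies that kb is treated as 1,000 bytes. The sketch below converts such strings to a plain byte count on the client side. Only the kb factor follows from the equivalence shown; the mb and gb factors are assumptions that continue the same decimal scale, and size_to_bytes is a hypothetical helper name.

```python
import re

# Only "kb" is confirmed by the equivalence above; "mb" and "gb" are assumed
# to continue the same decimal scale.
UNIT_FACTORS = {"": 1, "kb": 1_000, "mb": 1_000_000, "gb": 1_000_000_000}


def size_to_bytes(size):
    """Convert a string such as '800kb' or '800 kb' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    factor = UNIT_FACTORS.get(unit.lower())
    if factor is None:
        raise ValueError(f"unrecognized unit: {unit!r}")
    return int(number) * factor


print(f"`sys/size`:float > {size_to_bytes('800 kb')}")
# `sys/size`:float > 800000
```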

State#

If a run has been initialized for logging or read-only access, its state is active as long as the connection to Neptune remains open. Otherwise, the state is inactive.

You can ensure that only closed runs are fetched with the following:

Fetch only inactive runs

project.fetch_runs_table(
    query='`sys/state`:experimentState = "inactive"'
)

Status (failed)#

If an exception occurred during the run, the run is marked as "Failed". In practice, this means that the sys/failed field is set to True.

Fetch failed runs

project.fetch_runs_table(
    query='`sys/failed`:bool = True'
)

Artifact version#

You can filter runs by Artifact hash:

What to log

run = neptune.init_run()
run["dataset_version"].track_files("path/to/dataset")

How to query

project = neptune.init_project()
project.fetch_runs_table(
    query='`dataset_version`:artifact = 9a113b799082e5fd628be178bedd52837bac24e91f'
)
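Rather than copying the hash by hand, it can be read from a run that already tracks the artifact. The sketch below assumes that run is a run object whose dataset_version field was created with track_files(), and uses the artifact field's fetch_hash() method; the helper name runs_sharing_artifact is hypothetical.

```python
def runs_sharing_artifact(project, run, field="dataset_version"):
    """Fetch the runs whose artifact field has the same hash as in `run`."""
    artifact_hash = run[field].fetch_hash()
    query = f"`{field}`:artifact = {artifact_hash}"
    return project.fetch_runs_table(query=query)


# Usage (requires the neptune package and access to a project):
# import neptune
# project = neptune.init_project()
# run = neptune.init_run(with_id="NLI-345", mode="read-only")
# table = runs_sharing_artifact(project, run)
```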

Other file-related fields are not supported.

Learn more

Track artifacts

¹ The type specification is needed to disambiguate between runs that may have a field of the same name but of a different data type.