
Custom installation of Neptune 2.4#

Self-hosted Neptune is deployed by configuring and installing a Helm chart. This way, you can set Neptune up to meet more advanced needs – such as external storage, managed database, or scaling.

As opposed to the simplified installation procedure that bootstraps a K3s cluster and deploys Neptune with default values, the custom installation assumes you have a cluster already set up.

Before you start#

Ensure that you meet and understand the Prerequisites.

About the Neptune Helm chart#

The chart consists of multiple deployments of different services combined through an Ingress resource.

Existing components:

Name Required Description
keycloak ✅ Yes Authentication service. For details, see the Keycloak documentation.
backend ✅ Yes Authorization service and organizational structure. Manages roles and privileges.
frontend ✅ Yes Static file serving, including HTML, JavaScript and images.
notifications ✅ Yes Notifications and web-socket handling, used in Python client and frontend notifications.
notebookconverter ✅ Yes Python service for comparing notebooks.
leaderboard ✅ Yes Run, metadata and visualization service. All data ingestion is handled here.
kafka ❌ No Small, single node Kafka with PVC persistence, enabled by default to reduce dependencies.

Additionally, during an installation or upgrade, Neptune launches a config job that aims to synchronize some settings in Keycloak.

Configuring the Helm chart#

This section walks you through the configuration options of the Neptune Helm chart values file (usually values.yml).

For sample configurations, see Examples.

Using Secrets in parameter values#

You can provide values that contain sensitive data as Kubernetes Secrets.

General recipe for "converting" a Neptune parameter to a Secret variant:

  1. Create an appropriate Secret in the target namespace of the Kubernetes cluster.
  2. In the Neptune Helm chart values file, append Secret to the parameter name.

    For example: username → usernameSecret.

  3. Under the parameter name, add the following:

    values.yml
    secret: <name of Secret>
    key: <name of some key in Secret>
    

Now, when the chart is installed, it will pull the values from the specified Secret.

Example
Secret manifest
apiVersion: v1
kind: Secret
metadata:
    name: neptune-database-credentials
type: Opaque
data:
    username: bmVwdHVuZV91c2Vy  # base64-encoded - neptune_user
    password: bmVwdHVuZV9wYXNzd29yZA==  # base64-encoded - neptune_password
values.yml
imagePullSecrets:
    - regcred
database:
    host: mysql
    port: 3306
    usernameSecret:
        secret: neptune-database-credentials
        key: username
    passwordSecret:
        secret: neptune-database-credentials
        key: password
ingress:
    annotations:
        # nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: 10G
        nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
        # kubernetes.io/ingress.class: neptune
init:
    workspaceName: team
    administrator:
        username: administrator
        password: change_me
leaderboard:
    elasticsearch:
        address: http://elasticsearch-neptune:9200
    storage:
        pvc:
            size: 100Gi
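
The same pattern can be applied to the initial administrator credentials, which the parameter tables list with usernameSecret and passwordSecret variants. A minimal sketch, assuming a Secret named neptune-admin-credentials (a hypothetical name) already exists in the target namespace and holds the keys username and password:

```yaml
# values.yml (fragment)
init:
    workspaceName: team
    administrator:
        # Both references point at the hypothetical Secret neptune-admin-credentials.
        usernameSecret:
            secret: neptune-admin-credentials
            key: username
        passwordSecret:
            secret: neptune-admin-credentials
            key: password
```

With this variant, the plain-text init.administrator.username and init.administrator.password keys are left out of the values file.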

Common parameters#

Name Description Default value
registry The root location of Neptune's container images. Change if you want to proxy them to your own registry. eu.gcr.io/neptune-distribution/neptune/onprem
imagePullPolicy One of IfNotPresent, Always, Never. This value is shared among all Neptune containers. Always
imagePullSecrets An array of strings pointing to image pull Secrets. []
updateStrategy Deployment update strategy. One of RollingUpdate, Recreate. RollingUpdate
service.type One of ClusterIP, NodePort. Use NodePort when exposing Neptune through the AWS ALB ingress annotation. ClusterIP
database.host The default database hostname.
database.port The default database port.
database.username The default database username.
database.usernameSecret.key The default Secret's key name to use as database username.
database.usernameSecret.secret The default Secret's name reference to use as database username.
database.password Database password.
database.passwordSecret.key The Secret's key name to use as database password.
database.passwordSecret.secret The Secret's name reference to use as database password.
database.parameters The default JDBC connection parameters to use. Depending on your setup, different settings are required. The default value matches a VPC-private GCP database and a simple, no-SSL MySQL database. For Azure Database for MySQL - Flexible Server, use serverTimezone=UTC&characterEncoding=UTF-8. allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC&characterEncoding=UTF-8
database.minIdleConnections The default minimum active DB connections to keep per component. 2
database.maxConnections The default maximum active DB connections to use per component. 15
nodeSelector The default node selector to use for all Neptune deployments. This is copied directly into Kubernetes Deployment manifests. {}
tolerations The default tolerations to use for all Neptune deployments. This is copied directly into Kubernetes Deployment manifests. []
podSecurityContext The default security context added to each Neptune pod. Default value:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch
    runAsNonRoot: true
metrics.enabled Enables Prometheus scraping annotations on pods. false
extraResources A templated string that can be used to create additional Kubernetes manifests together with this chart. ""
extraValues An arbitrary YAML file with additional values that you can use when rendering custom templates provided through the extraResources option. The Neptune Helm chart uses JSON schema validation to limit and troubleshoot values, but these extra values are not validated. {}
ingress Neptune exposition details. See its dedicated section.
experimental Experimental Neptune Helm chart features. This section may change without notice.
init.workspaceName Name of team workspace to create. Relevant only on first installation.
init.administrator.username Neptune administrator username.
init.administrator.usernameSecret.key Neptune administrator username Secret's key name.
init.administrator.usernameSecret.secret Neptune administrator username Secret's name reference.
init.administrator.password Neptune administrator password. This value is not updated after initial installation if the user changes the password in the app.
init.administrator.passwordSecret.key Neptune administrator password Secret's key name.
init.administrator.passwordSecret.secret Neptune administrator password Secret's name reference.
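
As an illustration of how the top-level keys from the table combine, here is a minimal sketch of a values file fragment. The registry mirror, node label, and taint are hypothetical and only show the expected shape of each value.

```yaml
# values.yml (fragment): all values below are illustrative
registry: registry.example.com/neptune/onprem   # hypothetical private mirror
imagePullPolicy: IfNotPresent
imagePullSecrets:
    - regcred
updateStrategy: RollingUpdate
nodeSelector:
    workload: neptune          # hypothetical node label
tolerations:
    - key: dedicated           # hypothetical taint
      operator: Equal
      value: neptune
      effect: NoSchedule
metrics:
    enabled: true
database:
    minIdleConnections: 2
    maxConnections: 30
```

Because nodeSelector and tolerations are copied directly into the Deployment manifests, they use the standard Kubernetes syntax.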

Configuring components#

The keys of this section are present in all of the following components, as their subkeys:

  • keycloak
  • backend
  • frontend
  • notifications
  • notebookconverter
  • leaderboard
  • kafka

You can override database settings per component using the same key schema as for the Common parameters.
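
A minimal sketch of such an override, assuming that host and port can be set under a component's database key in the same way as the schema key listed in the component tables. The hostname mysql-keycloak is hypothetical.

```yaml
# values.yml (fragment)
database:
    host: mysql
    port: 3306
    username: neptune_user
    password: neptune
keycloak:
    database:
        # Hypothetical dedicated database instance used by Keycloak only.
        host: mysql-keycloak
        port: 3306
        schema: neptune_keycloak
```

Here, every component uses the top-level database settings except Keycloak, which connects to its own host.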

Common component parameters#

In the component-specific tables, common parameters are listed only if the component overrides their default value.

Name Description Default value
image The image name of the container to use. {{ .Values.registry }}/component-image-name
tag The container image tag to use. Leave empty to use the same version as that of the Neptune Helm chart. ""
replicas The number of pods to spawn per Deployment. The default is not highly available. 1
updateStrategy If provided, overrides the top-level updateStrategy key for this component. ""
nodeSelector If provided, overrides the top-level nodeSelector key for this component. {}
tolerations If provided, overrides the top-level tolerations key for this component. []
affinity If provided, copies this key directly into Deployment. {}
port Component specific, main port that the component exposes via its service.
podSecurityContext If provided, overrides the top-level podSecurityContext key for this component. {}
serviceAccountName Name of the service account to use for this Deployment's pods.
terminationGracePeriodSeconds Number of seconds to wait before force-killing containers.
resources.requests.cpu The amount of CPU the component requests for scheduling. For best performance, it should equal the component's limit.
resources.requests.memory The amount of memory the component requests for scheduling. For best performance, it should equal the component's limit.
resources.limits.cpu The amount of CPU the component is limited to.
resources.limits.memory The amount of memory the component is limited to. Note: Changing this may require other changes in Java components.
extraEnv A dictionary of ENV_NAME: "value" environment variables to be added to deployment. {}
extraSecretEnv A dictionary of ENV_NAME: <Secret reference> environment variables to be added to the deployment. Example:
    component:
        extraSecretEnv:
            ENV_NAME:
                secret: "name_of_the_secret"
                key: "name_of_key_in_secret"
{}
extraContainers An array of additional containers to be deployed together with Neptune's containers. Useful for proxy and sidecar containers. []
extraVolumes Additional Kubernetes volume references to be added to Neptune's pods. Useful for providing additional CAs or configs. []
extraVolumeMounts Additional Kubernetes volume mounts to be added to Neptune's container. References one of the extra volumes. []
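
To show how these keys nest under a component, here is a minimal sketch for the backend component. The environment variable, ConfigMap name, and mount path are hypothetical; the resource values follow the recommendation that requests equal limits.

```yaml
# values.yml (fragment)
backend:
    replicas: 2
    resources:
        requests:
            cpu: "0.5"
            memory: "1536Mi"
        limits:
            cpu: "0.5"
            memory: "1536Mi"
    extraEnv:
        EXAMPLE_FLAG: "true"             # hypothetical variable
    extraVolumes:
        - name: custom-ca
          configMap:
              name: custom-ca-bundle     # hypothetical ConfigMap
    extraVolumeMounts:
        - name: custom-ca                # must match a name in extraVolumes
          mountPath: /etc/ssl/custom
          readOnly: true
```

The extraVolumes and extraVolumeMounts entries use the standard Kubernetes volume and volumeMount syntax.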

Component-specific parameters#

Keycloak
Name Description Default value
keycloak.image See Common component parameters. {{ .Values.registry }}/keycloak
keycloak.port See Common component parameters. 7070
keycloak.resources.requests.cpu See Common component parameters. "0.2"
keycloak.resources.requests.memory See Common component parameters. "1Gi"
keycloak.resources.limits.cpu See Common component parameters. "0.5"
keycloak.resources.limits.memory See Common component parameters. "1536Mi"
keycloak.database.schema Name of database schema for Keycloak's exclusive use. neptune_keycloak
keycloak.users.instanceAdmin Keycloak Master realm superadmin. Leave empty to auto-generate randomly. {}
keycloak.users.serviceAccount Keycloak user with elevated privileges used by Backend to manage Keycloak. Leave empty to auto-generate randomly. {}
keycloak.clients.management Keycloak client to be used by serviceAccount in internal communication. Leave empty to auto-generate randomly. {}
Backend
Name Description Default value
backend.image See Common component parameters. {{ .Values.registry }}/backend
backend.port See Common component parameters. 8080
backend.adminPort Admin API exposition. 8079
backend.authorizationPort Internal authorization API. Not exposed externally. 8085
backend.resources.requests.cpu See Common component parameters. "0.2"
backend.resources.requests.memory See Common component parameters. "1Gi"
backend.resources.limits.cpu See Common component parameters. "0.5"
backend.resources.limits.memory See Common component parameters. "1536Mi"
backend.database.schema Name of database schema for Backend to use. neptune_instance
backend.hpa.enabled Enables HorizontalPodAutoscaler for Backend. false
backend.hpa.minReplicas When enabled, keep at least this number of replicas. 2
backend.hpa.maxReplicas When enabled, keep at most this number of replicas. 3
backend.hpa.targetAverageUtilization When enabled, autoscale based on CPU metric. 80
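
A minimal sketch of enabling autoscaling for Backend with the keys above. The numbers are illustrative, not recommendations.

```yaml
# values.yml (fragment)
backend:
    hpa:
        enabled: true
        minReplicas: 2
        maxReplicas: 4
        # Target average CPU utilization, in percent of the CPU request.
        targetAverageUtilization: 70
```

CPU-based autoscaling in Kubernetes relies on the resource metrics API, which is typically provided by metrics-server in the cluster.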
Frontend
Name Description Default value
frontend.image See Common component parameters. {{ .Values.registry }}/frontend
frontend.port See Common component parameters. 8080
frontend.resources.requests.cpu See Common component parameters. "0.1"
frontend.resources.requests.memory See Common component parameters. "10Mi"
frontend.resources.limits.cpu See Common component parameters. "0.1"
frontend.resources.limits.memory See Common component parameters. "100Mi"
Notifications
Name Description Default value
notifications.image See Common component parameters. {{ .Values.registry }}/notifications
notifications.port See Common component parameters. 8084
notifications.resources.requests.cpu See Common component parameters. "0.1"
notifications.resources.requests.memory See Common component parameters. "1Gi"
notifications.resources.limits.cpu See Common component parameters. "0.2"
notifications.resources.limits.memory See Common component parameters. "1Gi"
notifications.database.schema Name of database schema for Notifications' exclusive use. neptune_notifications
Notebookconverter
Name Description Default value
notebookconverter.image See Common component parameters. {{ .Values.registry }}/notebook-converter
notebookconverter.port See Common component parameters. 8080
notebookconverter.resources.requests.cpu See Common component parameters. "0.1"
notebookconverter.resources.requests.memory See Common component parameters. "512Mi"
notebookconverter.resources.limits.cpu See Common component parameters. "0.3"
notebookconverter.resources.limits.memory See Common component parameters. "1Gi"
Kafka

Warning: When using the Kafka provided by this chart, do not change the number of replicas. This Kafka deployment is designed to be single-node.

Name Description Default value
kafka.enabled Set to false to use your own Kafka deployment. This requires setting externalAddress. true
kafka.externalConfig A file that can be used as the Kafka config for Java.
kafka.image See Common component parameters. {{ .Values.registry }}/kafka
kafka.tag See Common component parameters. "3.5.1-v4"
kafka.port See Common component parameters. 9092
kafka.resources.requests.cpu See Common component parameters. "0.2"
kafka.resources.requests.memory See Common component parameters. "1536Mi"
kafka.resources.limits.cpu See Common component parameters. "0.3"
kafka.resources.limits.memory See Common component parameters. "1536Mi"
kafka.persistance Set to false to disable persistence. Some Elasticsearch updates may be lost if Kafka restarts. The kafka.storage section is ignored in this case. true
kafka.storage.existingClaim Use already existing PVC instead of provisioning one. If this is set, other settings are ignored. ""
kafka.storage.storageClass Use this StorageClass to provision the PVC. If not given, the default storage class for the target cluster will be used.
kafka.storage.size Set the size of the disk. "50Gi"
kafka.storage.accessMode Set the access mode of provisioned PV/PVC. As Kafka is single-node, the default ReadWriteOnce is sufficient. ReadWriteOnce
kafka.storage.annotations A dictionary of annotations to be added to the provisioned PVC. Some provisioners may require them. {}
kafka.extraEnv See Common component parameters. LOG_RETENTION_HOURS: "24"
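
A minimal sketch of tuning the bundled Kafka with the keys from the table. The StorageClass name is hypothetical, and the persistence key is spelled exactly as it appears in the table.

```yaml
# values.yml (fragment)
kafka:
    enabled: true
    persistance: true              # key name as listed in the parameter table
    storage:
        storageClass: fast-ssd     # hypothetical StorageClass
        size: "100Gi"
        accessMode: ReadWriteOnce
    extraEnv:
        LOG_RETENTION_HOURS: "48"
```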

Leaderboard configuration#

If you want to use external database instances, point the leaderboard.elasticsearch.address option to the address of the external Elasticsearch service and configure the database common parameters as needed.

Otherwise, the installation will set up databases inside the Kubernetes cluster where Neptune is deployed.

Name Description Default value
leaderboard.image See Common component parameters. {{ .Values.registry }}/leaderboard
leaderboard.port See Common component parameters. 8088
leaderboard.resources.requests.cpu See Common component parameters. "0.6"
leaderboard.resources.requests.memory See Common component parameters. "2Gi"
leaderboard.resources.limits.cpu See Common component parameters. "2"
leaderboard.resources.limits.memory See Common component parameters. "4Gi"
leaderboard.database.schemaVersion Experimental. Changing this may irreversibly break your instance. 1
leaderboard.database.main.schema Name of the database schema for Leaderboard's exclusive use. neptune_leaderboard
leaderboard.database.artifacts.schema Name of the database schema for Leaderboard's Artifact features to use. neptune_artifacts
leaderboard.elasticsearch.address (REQUIRED) Full Elasticsearch service address, for example http://elasticsearch-service.elasticsearch:9200 ""
leaderboard.elasticsearch.clusterName The name of the Elasticsearch cluster that Neptune is intended to be used with. The default is usually fine. "elastic"
leaderboard.elasticsearch.insecureSSL If your Elasticsearch uses HTTPS with a self-signed certificate, set this to true to disable certificate validation by Elasticsearch client. false
leaderboard.elasticsearch.shards The amount of Shards for the main index that Neptune uses. Used during the initial installation only. 5
leaderboard.elasticsearch.replicas The amount of Shard Replicas for the main index that Neptune uses. Used during the initial installation only. 0
leaderboard.elasticsearch.attributes.shards The amount of Shards for the attribute index that Neptune uses. Used during the initial installation only. 5
leaderboard.elasticsearch.attributes.replicas The amount of Shard Replicas for the attribute index that Neptune uses. Used during the initial installation only. 0
leaderboard.storage Describes the Object Storage that Neptune will use. This configuration is explained in detail in the Configuring storage section.
leaderboard.hpa.minReplicas When enabled, keep at least this amount of replicas. 1
leaderboard.hpa.maxReplicas When enabled, keep at most this amount of replicas. 3
leaderboard.hpa.targetAverageUtilization When enabled, autoscale based on CPU metric. 80
Example
database:
    host: <your_database_hostname>
    port: <your_database_port>
    username: <your_database_username>
    password: <your_database_user_password>
...
leaderboard:
    elasticsearch:
        address: <your_Elasticsearch_service_address>
    ...

For a full sample configuration, see Examples: Using an external database.
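
If the external Elasticsearch is served over HTTPS with a self-signed certificate, the Elasticsearch keys from the table could be combined as in the following sketch. The address is hypothetical, and the shard and replica counts are illustrative.

```yaml
# values.yml (fragment)
leaderboard:
    elasticsearch:
        address: https://elasticsearch.example.internal:9200   # hypothetical address
        clusterName: "elastic"
        insecureSSL: true      # disables certificate validation in the client
        shards: 5
        replicas: 1
        attributes:
            shards: 5
            replicas: 1
```

According to the table, the shards and replicas settings are used during the initial installation only, so changing them later has no effect on existing indices.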

Advanced leaderboard function separation

For very large deployments designed to handle hundreds of millions of data points per second, Neptune's Leaderboard can be split into several parts that are responsible for handling different aspects of Neptune's traffic with efficiency and without impacting one another.

For most deployments, however, this separation is neither necessary nor recommended. Contact us to discuss this option.

Storage configuration#

Neptune can use one type of storage at a time. The following options are supported:

  1. (default) PersistentVolumeClaim (PVC)
  2. Amazon Web Services (AWS) S3 compatible service
  3. Azure Blob Storage

For more details on storage, see Prerequisites: Object storage.

PersistentVolumeClaim#

If no other configuration is set, Neptune instructs Kubernetes to provision a PVC using the defaults.

Name Description Default value
leaderboard.storage.existingClaim Use already existing PVC instead of provisioning one. If this is set, other settings are ignored. ""
leaderboard.storage.storageClass Use this StorageClass to provision the PVC. If not given, the default storage class for the target cluster will be used.
leaderboard.storage.size Set the size of the disk. 1024 Gi
leaderboard.storage.accessMode Set the access mode of provisioned PV/PVC. Using default ReadWriteOnce prevents scaling up of Leaderboard deployment. Use ReadWriteMany if scaling is required. ReadWriteOnce
leaderboard.storage.annotations A dictionary of annotations to be added to the provisioned PVC. Some provisioners may require them. {}
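
A minimal sketch of a PVC configuration that allows the Leaderboard deployment to scale. It assumes the keys are nested under leaderboard.storage.pvc, as in the sample configurations in the Examples section; the table above lists them directly under leaderboard.storage, so verify the nesting against your chart version. The StorageClass name is hypothetical.

```yaml
# values.yml (fragment)
leaderboard:
    storage:
        pvc:
            storageClass: shared-nfs     # hypothetical StorageClass supporting ReadWriteMany
            size: 500Gi
            accessMode: ReadWriteMany
```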

S3 compatible service#

Neptune has been tested [1] on the following services that are compatible with the S3 API:

  • AWS S3
  • Google Cloud Storage (GCS) via Interoperability API
  • MinIO
Name Description Default value
leaderboard.storage.s3.bucketName The bucket name to use.
leaderboard.storage.s3.serviceEndpoint The endpoint to use. For GCS, use https://storage.googleapis.com
leaderboard.storage.s3.signingRegion The region of bucket. For GCS, use the project name of the bucket.
leaderboard.storage.s3.accessKeyId The key ID to use when accessing S3 content. Can be empty if your environment supports STS tokens or the VM has proper permissions. Mutually exclusive with the Secret version.
leaderboard.storage.s3.accessKeyIdSecret.key The key of the Secret where accessKeyId is stored.
leaderboard.storage.s3.accessKeyIdSecret.secret The name of the Secret where accessKeyId is stored.
leaderboard.storage.s3.secretAccessKey The access key to use when accessing S3 content. Can be empty if your environment supports STS tokens or the VM has proper permissions. Mutually exclusive with the Secret version.
leaderboard.storage.s3.secretAccessKeySecret.key The key of the Secret where secretAccessKey is stored.
leaderboard.storage.s3.secretAccessKeySecret.secret The name of the Secret where secretAccessKey is stored.
leaderboard.storage.s3.clientThreadPoolSize The number of threads to use when accessing S3 service.
Example values
leaderboard:
    ...
    storage:
        s3:
            bucketName: "neptune-bucket"
            serviceEndpoint: "https://storage.googleapis.com"
            signingRegion: "your-gcp-project"
            accessKeyId: "GOOG1ENQ3ASPZI............................RCRIIQ"
            secretAccessKey: "0A5o..................rnftZP"

For a full sample configuration, see Examples: Using S3 for storage.
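
To keep the access keys out of the values file, the Secret variants from the table can be used instead. A minimal sketch for AWS S3, assuming a Secret named neptune-s3-credentials with the keys accessKeyId and secretAccessKey (all hypothetical names) exists in the target namespace:

```yaml
# values.yml (fragment)
leaderboard:
    storage:
        s3:
            bucketName: "neptune-bucket"
            serviceEndpoint: "https://s3.eu-central-1.amazonaws.com"
            signingRegion: "eu-central-1"
            accessKeyIdSecret:
                secret: neptune-s3-credentials
                key: accessKeyId
            secretAccessKeySecret:
                secret: neptune-s3-credentials
                key: secretAccessKey
```

The plain accessKeyId and secretAccessKey keys are omitted here because they are mutually exclusive with their Secret versions.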

Azure Blob Storage#

Name Description Default value
leaderboard.storage.azureBlob.container The name of the container within the storage account to use.
leaderboard.storage.azureBlob.connectionString The connection string, as in Azure Portal.
leaderboard.storage.azureBlob.clientThreadPoolSize The number of threads to use when accessing Azure Blob Storage. 200
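
A minimal sketch of an Azure Blob Storage configuration using the keys above. The container name is hypothetical, and the connection string is a placeholder to be replaced with the value from Azure Portal.

```yaml
# values.yml (fragment)
leaderboard:
    storage:
        azureBlob:
            container: "neptune"     # hypothetical container name
            connectionString: "<your_storage_account_connection_string>"
            clientThreadPoolSize: 200
```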

Exposing Neptune#

Neptune is typically exposed via an Ingress Kubernetes resource.

Name Description Default value
ingress.enabled Set to false to disable Ingress creation. true
ingress.host Provides the hostname that this Ingress should handle. If this is the only Ingress handled by your Ingress Controller, you may leave this empty. ""
ingress.class Use specific ingressClassName. This should match the value specified in your Ingress Controllers. You may leave this empty if your Ingress Controller is configured to handle all class names. ""
ingress.labels Dictionary of labels to put into the created Ingress. {}
ingress.annotations Dictionary of annotations to be added to the created Ingress. This is usually where load balancer settings are configured. For the Nginx Ingress Controller, the following annotations are highly recommended:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
For AWS ALB, you also need to set service.type to NodePort. {}
ingress.tls.enabled Requires ingress.host and ingress.tls.secret to be set. Configures the TLS section on Ingress. false
ingress.tls.secret The name of the Kubernetes TLS Secret to use for TLS certificates.
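
A minimal sketch of an Ingress configuration with TLS for the Nginx Ingress Controller. The hostname, class name, and TLS Secret name are hypothetical; the Secret is assumed to be an existing Kubernetes TLS Secret in the same namespace.

```yaml
# values.yml (fragment)
ingress:
    enabled: true
    host: neptune.example.com        # hypothetical hostname
    class: nginx                     # must match your Ingress Controller's class
    annotations:
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: "0"
        nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
    tls:
        enabled: true
        secret: neptune-tls          # hypothetical TLS Secret name
```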

Post-installation jobs#

To ensure that all configurations are synchronized, Neptune runs a Kubernetes job each time it is installed or updated.

Known issue

During the first installation, if you use --wait in the helm command, the job may not be created and the installation may time out.

Name Description Default value
config.image See Common component parameters. {{ .Values.registry }}/config
config.resources.requests.cpu See Common component parameters. "0.1"
config.resources.requests.memory See Common component parameters. "10Mi"
config.resources.limits.cpu See Common component parameters. "0.1"
config.resources.limits.memory See Common component parameters. "1Gi"

Installing the Helm chart#

To install the chart with the release name neptune, as administrator ("root" user), run the following commands:

helm repo add neptune https://helm.neptune.ai
helm repo update
helm upgrade -i \
    neptune \
    neptune/neptune \
    --namespace neptune \
    --create-namespace

Examples#

Tip

You can view the full parameters list separately in the Helm chart reference.

Simplest installation#

The default values of the Neptune Helm chart represent a minimal but complete configuration.

You can install Neptune using the following values:

values.yml
dockerRegistryCredentials: "" # Your DRC token. For example: "ewogICJ0e..............iCn0K"
imagePullSecrets:
    - regcred
database:
    host: mysql
    port: 3306
    username: neptune_user
    password: neptune
ingress:
    annotations:
        # nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: 10G
        nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
        # kubernetes.io/ingress.class: neptune
init:
    workspaceName: team
    administrator:
        username: administrator
        password: change_me
leaderboard:
    elasticsearch:
        address: http://elasticsearch-neptune:9200
    storage:
        pvc:
            size: 100Gi

Using an external database#

You can use the database and leaderboard.elasticsearch.address parameters to point Neptune to an external database.

values.yml
dockerRegistryCredentials: "" # Your DRC token. For example: "ewogICJ0e..............iCn0K"
imagePullSecrets:
    - regcred
database:
    host: <your_database_hostname>
    port: <your_database_port>
    username: <your_database_username>
    password: <your_database_user_password>
ingress:
    annotations:
        # nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: 10G
        nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
        # kubernetes.io/ingress.class: neptune
init:
    workspaceName: team
    administrator:
        username: administrator
        password: change_me
leaderboard:
    elasticsearch:
        address: <your_Elasticsearch_service_address>
    storage:
        pvc:
            size: 100Gi

Using S3 (GCS option) for storage#

The following example configuration specifies GCS for Neptune object storage.

values.yml
dockerRegistryCredentials: "" # Your DRC token. For example: "ewogICJ0e..............iCn0K"
imagePullSecrets:
    - regcred
database:
    host: mysql
    port: 3306
    username: neptune_user
    password: neptune
ingress:
    annotations:
        # nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: 10G
        nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
        # kubernetes.io/ingress.class: neptune
init:
    workspaceName: team
    administrator:
        username: administrator
        password: change_me
leaderboard:
    elasticsearch:
        address: http://elasticsearch-neptune:9200
    storage:
        s3:
            bucketName: "neptune-bucket"
            serviceEndpoint: "https://storage.googleapis.com"
            signingRegion: "your-gcp-project"
            accessKeyId: "GOOG1ENQ3ASPZI............................RCRIIQ"
            secretAccessKey: "0A5o..................rnftZP"

Uninstalling the chart#

To uninstall the Neptune Helm chart, run:

helm uninstall neptune -n neptune

  1. Other S3-compatible implementations may work as long as they support put, get, list, and multi-chunk uploads. Use at your own risk.