Troubleshoot Portworx on Kubernetes

Useful commands

  • List Portworx pods:

    kubectl get pods -l name=portworx -n kube-system -o wide
  • Describe Portworx pods:

    kubectl describe pods -l name=portworx -n kube-system
  • Get Portworx cluster status:

      PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
      kubectl exec "$PX_POD" -n kube-system -- /opt/pwx/bin/pxctl status
  • List Portworx volumes:

      PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
      kubectl exec "$PX_POD" -n kube-system -- /opt/pwx/bin/pxctl volume list
  • Portworx logs:

    • Recent Portworx logs can be gathered by using this kubectl command:

      kubectl logs -n kube-system -l name=portworx -c portworx --tail=99999
    • If you have access to a particular node, you can use this journalctl command to get all Portworx logs:

      journalctl -lu 'portworx*'
  • Monitor kubelet logs on a particular Kubernetes node:

    journalctl -lfu kubelet
    • This can be useful to understand why a particular pod is stuck in creating or terminating state on a node.
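
The status and volume commands above repeat the same pod lookup. A minimal sketch of a wrapper, assuming the name=portworx label and kube-system namespace used on this page; the function names px_pod and px_ctl are hypothetical and not part of Portworx tooling:

```shell
# Print the name of the first Portworx pod (hypothetical helper).
px_pod() {
  kubectl get pods -l name=portworx -n kube-system \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run pxctl inside that pod, passing all arguments through.
px_ctl() {
  local pod
  pod=$(px_pod) || return 1
  if [ -z "$pod" ]; then
    echo "no Portworx pod found in kube-system" >&2
    return 1
  fi
  kubectl exec "$pod" -n kube-system -- /opt/pwx/bin/pxctl "$@"
}
```

With these defined in the current shell, px_ctl status and px_ctl volume list correspond to the two commands listed above.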

Collecting Portworx logs

Please run the following commands on any one of the nodes running Portworx:

uname -a
docker version
kubectl version
kubectl logs -n kube-system -l name=portworx -c portworx --tail=99999
kubectl get pods -n kube-system -l name=portworx -o wide
PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$PX_POD" -n kube-system -- /opt/pwx/bin/pxctl status
kubectl exec "$PX_POD" -n kube-system -- /opt/pwx/bin/pxctl volume list

Include the output of the above commands when contacting us.
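
To keep the output together when sending it to support, the commands in this section could be wrapped in a small script that writes everything to one file. This is only a sketch: the function names px_run and collect_px_info and the default file name px-diagnostics.txt are assumptions, not Portworx tooling.

```shell
# Append a header line and a command's output (stdout and stderr) to a file.
# Usage: px_run <report-file> <command> [args...]
px_run() {
  local out="$1"; shift
  echo "### $*" >> "$out"
  "$@" >> "$out" 2>&1 || echo "(exit status $?)" >> "$out"
}

# Collect the diagnostics listed in this section into one report file.
collect_px_info() {
  local out="${1:-px-diagnostics.txt}" pod
  : > "$out"   # start with an empty report
  pod=$(kubectl get pods -l name=portworx -n kube-system \
    -o jsonpath='{.items[0].metadata.name}')
  px_run "$out" uname -a
  px_run "$out" docker version
  px_run "$out" kubectl version
  px_run "$out" kubectl logs -n kube-system -l name=portworx -c portworx --tail=99999
  px_run "$out" kubectl get pods -n kube-system -l name=portworx -o wide
  px_run "$out" kubectl exec "$pod" -n kube-system -- /opt/pwx/bin/pxctl status
  px_run "$out" kubectl exec "$pod" -n kube-system -- /opt/pwx/bin/pxctl volume list
}
```

Running collect_px_info with no argument writes px-diagnostics.txt in the current directory. A command that fails is recorded with its exit status and does not stop the run.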

Get support

If you have an enterprise license, please contact us with your license key and logs.

We are always available on Slack. Join us!


Etcd

  • The Portworx container will fail to come up if it cannot reach etcd. For etcd installation instructions, refer to this doc.
    • The etcd location specified when creating the Portworx cluster needs to be reachable from all nodes.
    • Run curl <etcd_location>/version from each node to confirm reachability.
  • If you deployed etcd as a Kubernetes service, use the ClusterIP instead of the kube-dns name. Portworx nodes cannot resolve kube-dns entries since Portworx containers are in the host network.
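
The reachability check above can be run against several etcd endpoints at once with a small loop around curl. The sketch below assumes each endpoint is passed as a full URL and that no client certificates are needed; the function name check_etcd is hypothetical.

```shell
# Hypothetical helper: query /version on each etcd endpoint given as an
# argument and report which ones respond. Returns non-zero if any fail.
check_etcd() {
  local endpoint failed=0
  for endpoint in "$@"; do
    if curl --silent --fail --max-time 5 "${endpoint}/version" > /dev/null; then
      echo "OK   ${endpoint}"
    else
      echo "FAIL ${endpoint}"
      failed=1
    fi
  done
  return "$failed"
}
```

Run it on each node as check_etcd <etcd_location> [<etcd_location> ...]. Any FAIL line points at an endpoint that node cannot reach.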

Internal Kvdb

  • In the event of a disaster where the internal kvdb is in an unrecoverable error state, follow this doc to recover your Portworx cluster.

The Portworx cluster

  • Ports 9001 - 9022 must be open for internal network traffic between nodes running Portworx. Without this, Portworx cluster nodes cannot communicate and the cluster will be down.
  • If one of your nodes has a custom taint, the Portworx pod will not get scheduled on that node unless you add a toleration in the Portworx DaemonSet spec. Read here for more information about taints and tolerations.
  • When the Portworx container boots on a node for the first time, it attempts to download kernel headers to compile its kernel module. This can fail if the host sits behind a proxy. To work around this, install the kernel headers on the host. For example, on CentOS, run yum install kernel-headers-`uname -r` and yum install kernel-devel-`uname -r`.
  • If one of the Portworx nodes is in maintenance mode, this could be because one or more of the drives has failed. In this mode, you can replace up to one failed drive. If there are multiple drive failures, a node can be decommissioned from the cluster. Once the node is decommissioned, the drives can be replaced and recommissioned into the cluster.
  • If you labeled a node with px/enabled=remove (or px/service=restart) and Portworx is not uninstalling (or restarting):
    • On a busy cluster, Kubernetes can take some time to process the node-label change and notify the Portworx service. Allow a few minutes for the labels to be processed.
    • Occasionally Kubernetes label processing stops altogether. In this case, reinstall the “oci-monitor” component by applying and then deleting the px/enabled=false label: kubectl label nodes --all px/enabled=false; sleep 30; kubectl label nodes --all px/enabled-
    • This should redeploy the “oci-monitor” component without disturbing the PX-OCI service or disrupting storage, and Kubernetes labels should work afterwards.
  • The kubectl apply ... command fails with “error validating”:
    • This is likely caused by a version mismatch between the “kubectl” client and the Kubernetes server (e.g., using “kubectl” v1.8.4 to apply a spec to Kubernetes server v1.6.13-gke.0).
    • To fix this, you can either:
    • Downgrade the “kubectl” version to match your server’s version, or
    • Reapply the spec with client-validation turned off, e.g.: kubectl apply --validate=false ...
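
The port requirement in the first item of the list above can be spot-checked from one Portworx node against another. The sketch below uses bash's /dev/tcp pseudo-device and the coreutils timeout command; the function names probe_port and check_px_ports are hypothetical. A failed connection can also mean that nothing is listening on that port, so treat the output as a hint, not as proof of a firewall rule.

```shell
# Try to open a TCP connection to host:port; succeeds if the port accepts it.
# Relies on bash's /dev/tcp, so this needs bash rather than plain sh.
probe_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical helper: list ports in the 9001-9022 range that cannot be
# reached on the given peer node. Returns non-zero if any port fails.
check_px_ports() {
  local host="$1" port blocked=0
  for port in $(seq 9001 9022); do
    if ! probe_port "$host" "$port"; then
      echo "cannot connect to ${host}:${port}"
      blocked=1
    fi
  done
  return "$blocked"
}
```

Run check_px_ports <peer-node-ip> on a node running Portworx, using the address of another Portworx node.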

PVC creation

If PVC creation is failing, this could be due to the following reasons:

  • A firewall/iptables rule for port 9001 is present on the hosts running Portworx containers. This prevents the create-volume call from reaching the Portworx API server.
  • For Kubernetes versions 1.6.4 and earlier, Portworx may not be running on the Kubernetes control plane node.
  • For Kubernetes versions 1.6.5 and above, if you don’t have Portworx running on the control plane node, ensure that:
    • The portworx-service Kubernetes Service is running in the kube-system namespace.
    • You don’t have any custom taints on the control plane node. Custom taints prevent kube-proxy from running on the control plane node, which causes the portworx-service to fail to handle requests.
  • The StorageClass name specified might be incorrect.
  • Describe the PVC using kubectl describe pvc <pvc-name> and look at errors in the events section which might be causing failure of the PVC creation.
  • Make sure you are running Kubernetes 1.6 or above. Kubernetes 1.5 does not have our native driver, which is required for PVC creation.
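
Several of the checks above map onto a few kubectl queries. A sketch, assuming the PVC name and its namespace are passed as arguments; the function name diagnose_pvc is hypothetical. On clusters that set the class through the older beta annotation instead of spec.storageClassName, the first line may print an empty value.

```shell
# Hypothetical helper: gather the information needed to work through the
# PVC checklist for one claim.
diagnose_pvc() {
  local pvc="$1" ns="${2:-default}"
  # StorageClass the PVC asks for, followed by the classes that exist.
  echo "requested StorageClass: $(kubectl get pvc "$pvc" -n "$ns" \
    -o jsonpath='{.spec.storageClassName}')"
  kubectl get storageclass
  # The Service that must exist in kube-system for volume requests.
  kubectl get service portworx-service -n kube-system
  # The events at the end of this output usually name the failure.
  kubectl describe pvc "$pvc" -n "$ns"
}
```

For example, diagnose_pvc <pvc-name> <namespace> prints the requested class next to the available ones, which makes a misspelled StorageClass name easy to spot.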

PVC Controller

If you are running Portworx in AKS and run into a port conflict in the PVC controller, you can override the default PVC controller ports using annotations on the StorageCluster object:

kind: StorageCluster
metadata:
  name: portworx
  namespace: kube-system
  annotations:
    <port-annotation>: "10261"
    <second-port-annotation>: "10262"

DNS policy updates

If you need to change the dnsPolicy parameter for the PX-OCI service, restart the PX-OCI services after editing the YAML spec:

  # Apply change to DNS-Policy, wait for change to propagate (rollout) to all the nodes
  kubectl apply -f px_oci-updatedDnsPolicy.yaml
  kubectl rollout status -n kube-system ds/portworx

  # Request restart of PX-OCI services
  kubectl label nodes --all px/service=restart --overwrite
  # [OPTIONAL] Clean up the node-label after services restarted
  sleep 30; kubectl label nodes --all px/service-

Application pods

  • Ensure that the Portworx container is running on the node where the application pod is scheduled. This is required for Portworx to mount the volume into the pod.
  • Ensure the PVC used by the application pod is in “Bound” state.
  • Ensure that the pod and the PersistentVolumeClaim are in the same namespace.
  • Check if Portworx is in maintenance mode on the node where the pod is running. If so, that will cause existing pods to see a read-only filesystem after about 10 minutes. New pods using Portworx will fail to start on this node.
    • Use /opt/pwx/bin/pxctl status to check the status of your Portworx cluster.
  • If a pod is stuck in the Terminating state, run journalctl -lfu kubelet on the node where the pod is terminating and look for errors during pod termination. Reach out to us over Slack with the specific errors.
  • If a pod is stuck in the Creating state, describe the pod using kubectl describe pod <pod-name> and look at errors in the events section that might be causing the failure.
  • If a pod is stuck in the CrashLoopBackOff state, check the logs of the pod using kubectl logs <pod-name> [<container-name>] and look for the failure reason. It could be because of any of the following reasons:
    • Portworx was down on this node for more than 10 minutes. This caused the volume to go into a read-only state, so the application pod can no longer write to the volume filesystem. To fix this issue, delete the pod. A new pod will be created and the volume will be set up again. The pod will resume with the same persistent data because it is backed by a PVC provisioned by Portworx.
    • The application container found existing data in the mounted PVC volume and was expecting an empty volume.
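
The first few checks in the list above can be gathered in one pass. A sketch, assuming the pod name and namespace are passed as arguments and that Portworx pods carry the name=portworx label in kube-system as elsewhere on this page; the function name check_app_pod is hypothetical.

```shell
# Hypothetical helper: for a given pod, print the node it runs on, the
# Portworx pod on that node, and the PVCs in the pod's namespace.
check_app_pod() {
  local pod="$1" ns="${2:-default}" node
  node=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')
  echo "pod ${pod} is scheduled on node: ${node:-<not scheduled>}"
  # A Portworx pod should be listed here and be Running.
  kubectl get pods -n kube-system -l name=portworx \
    --field-selector "spec.nodeName=${node}" -o wide
  # Every PVC the pod uses should show STATUS "Bound".
  kubectl get pvc -n "$ns"
}
```

If the second command lists no pod, Portworx is not running on the node where the application pod was scheduled.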

Last edited: Monday, Aug 29, 2022