Rediscover journal(04May23) : Learning Kubernetes Basics
Now that the service and training jobs have been dockerized, it's time to get familiar with Kubernetes terminology and learn to read its declarative YAML syntax. This post covers the extreme basics: it goes through the fundamentals of Kubernetes and how I attempt to relate them back to my project.
For simplicity, the Kubernetes cluster I deployed runs via minikube, and all operations are done using kubectl.
Services, Deployments, and Pods
Persistent volume
I used a CLI tool called Kompose in an attempt to convert my Docker Compose YAML file to Kubernetes manifests automatically. It generated the following YAML files from my Docker Compose file:
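The conversion itself is a single command. A minimal sketch, assuming the compose file is named `docker-compose.yml` (adjust the path for your project):

```shell
# Kompose reads the compose file and writes one Kubernetes manifest
# (deployment, service, PVC, ...) per resource into the current directory.
kompose convert -f docker-compose.yml
```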
- A series of persistent volume claim YAML files, claim0 through claim2. Each of these, when applied, claims a specified amount of storage on the cluster. What exactly does it claim, and how does it do that? A persistent volume is a cluster-level resource with a lifecycle independent of any single pod. Unlike a volume, which belongs to a specific pod and is destroyed when that pod goes down, a persistent volume stays up all the time. However, its properties need to be predefined, such as the filesystem, capacity, and identifiers. The persistent volume claim YAML file generated for this project to store the database looks like this:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: mlflow-claim0
  name: mlflow-claim0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
```
It is important to make sure `accessModes` is `ReadWriteOnce`, as you do not want multiple services accessing this database at the same time. After applying it, a simple `kubectl describe pvc` returns this:

```
Name:          training-claim1
Namespace:     default
StorageClass:  standard
Status:        Bound
Volume:        pvc-b8478cfb-eba4-4ff5-948b-ce91f4f43ba0
Labels:        io.kompose.service=training-claim1
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: k8s.io/minikube-hostpath
               volume.kubernetes.io/storage-provisioner: k8s.io/minikube-hostpath
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      100Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       training-7469d6b495-gkn5r
Events:        <none>
```
At first glance, my concern with keeping the database on a persistent volume is that you can't exactly scale it up, since the size is declared up front. Some cloud providers do support dynamically resizing the file storage behind their PVCs. In the case of minikube, it creates a `hostPath` volume, which takes a filesystem path from the host and links it directly to a persistent volume on the cluster. It also means, according to the link that taught me more about this, that if you have a NAS at an IP address mounted on your filesystem, you'll be able to attach it to the cluster.
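For reference, a statically provisioned `hostPath` persistent volume that a claim like the one above could bind to might look like the following sketch (the name and path here are assumptions for illustration, not from my generated files):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mlflow-pv0            # hypothetical name
spec:
  capacity:
    storage: 100Mi            # must cover what the PVC requests
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/mlflow        # assumed path on the minikube node
```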
Services
A service enables network access to a set of pods. Here is the Kubernetes manifest for the mlflow service:
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: mlflow
  name: mlflow
spec:
  ports:
    - name: "5000"
      port: 5000
      targetPort: 5000
      nodePort: 5555
  selector:
    io.kompose.service: mlflow
status:
  loadBalancer: {}
```
A rather simple service. The important parts I learned here are the ports:
- `port`: the port the service exposes inside the cluster
- `targetPort`: the container port that the service forwards traffic to
- `nodePort`: the port on the node itself, allowing you to access the service directly from outside the cluster
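One caveat worth noting: as far as I understand, `nodePort` only takes effect when the service's `spec.type` is set to `NodePort` (the default is `ClusterIP`), and the default NodePort range is 30000-32767. A sketch of a service reachable on the node might therefore look like this (port number chosen for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mlflow
spec:
  type: NodePort              # required for nodePort to apply
  ports:
    - port: 5000              # service port inside the cluster
      targetPort: 5000        # container port
      nodePort: 30500         # assumed port within the 30000-32767 range
  selector:
    io.kompose.service: mlflow
```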
Deployment
After deploying, the pods fail to pull the image:
```
$ minictl get pods
NAME                                 READY   STATUS             RESTARTS      AGE
mlflow-b7bf49d7c-qjzbt               0/1     ImagePullBackOff   0             25h
terraform-example-75b7f49985-8x2qf   1/1     Running            2 (18h ago)   3d
terraform-example-75b7f49985-vp4pk   1/1     Running            2 (18h ago)   3d
terraform-example-75b7f49985-z44ls   1/1     Running            2 (18h ago)   3d
training-7469d6b495-gkn5r            0/1     ImagePullBackOff   0             25h
```
I used `kubectl describe pods` to dig into the error further:
```
Events:
  Type     Reason          Age                   From     Message
  ----     ------          ----                  ----     -------
  Warning  Failed          19h (x71 over 25h)    kubelet  Failed to pull image "mlflowtraining:latest":
                                                          rpc error: code = Unknown
                                                          desc = Error response from daemon:
                                                          pull access denied for mlflowtraining,
                                                          repository does not exist or may require 'docker login': denied:
                                                          requested access to the resource is denied
  Normal   Pulling         19h (x72 over 25h)    kubelet  Pulling image "mlflowtraining:latest"
  Normal   BackOff         18h (x1628 over 25h)  kubelet  Back-off pulling image "mlflowtraining:latest"
  Normal   SandboxChanged  31m                   kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling         29m (x4 over 31m)     kubelet  Pulling image "mlflowtraining:latest"
  Warning  Failed          29m (x4 over 31m)     kubelet  Failed to pull image "mlflowtraining:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for mlflowtraining, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
```
By default, Kubernetes pulls images from a registry (Docker's registry if none is specified). Since my image only exists locally, I changed this by adding `imagePullPolicy: Never`, which discourages it from reaching out to a registry and makes it use the local image instead.
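In the deployment manifest, this setting sits on the container spec. A sketch of the relevant fragment (the container name here is an assumption):

```yaml
spec:
  containers:
    - name: mlflow                   # assumed container name
      image: mlflowtraining:latest
      imagePullPolicy: Never         # never contact a registry; use the local image
```

For this to work with minikube, the image also has to exist inside the minikube node, e.g. by building it against minikube's Docker daemon (`eval $(minikube docker-env)`) or by loading it with `minikube image load mlflowtraining:latest`.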
I then checked whether it was running with `minikube kubectl logs mlflow-<podsidhere>`:
```
(mlops-aws)12:50~@:~/practice/mlops (docker_dev)$ minictl logs mlflow-64bc4c975c-69zts
2023/05/08 04:43:13 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2023/05/08 04:43:13 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO  [89d4b8295536_create_latest_metrics_table_py] Migration complete!
INFO  [alembic.runtime.migration] Running upgrade 89d4b8295536 -> 2b4d017a5e9b, add model registry tables to db
INFO  [2b4d017a5e9b_add_model_registry_tables_to_db_py] Adding registered_models and model_versions tables to database.
INFO  [2b4d017a5e9b_add_model_registry_tables_to_db_py] Migration complete!
INFO  [alembic.runtime.migration] Running upgrade cc1f77228345 -> 97727af70f4d, Add creation_time and last_update_time to experiments table
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
[2023-05-08 04:43:18 +0000] [40] [INFO] Starting gunicorn 20.1.0
[2023-05-08 04:43:18 +0000] [40] [INFO] Listening at: http://0.0.0.0:5000 (40)
[2023-05-08 04:43:18 +0000] [40] [INFO] Using worker: sync
[2023-05-08 04:43:18 +0000] [42] [INFO] Booting worker with pid: 42
[2023-05-08 04:43:18 +0000] [43] [INFO] Booting worker with pid: 43
[2023-05-08 04:43:18 +0000] [44] [INFO] Booting worker with pid: 44
[2023-05-08 04:43:18 +0000] [45] [INFO] Booting worker with pid: 45
```
Once it is running, I simply port-forward to my host machine's port 5000:
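A sketch of the forwarding command, assuming the service is still named `mlflow` and listens on port 5000:

```shell
# Forward local port 5000 to the mlflow service's port 5000.
# This blocks until interrupted; the MLflow UI is then reachable
# at http://localhost:5000 on the host machine.
kubectl port-forward svc/mlflow 5000:5000
```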