Microsoft Tried To Steal A Project And Almost Got Away With It....

Yesterday we had the first Headlamp release after we joined the @kubernetes SIG UI!
It's also probably the version with the most changes ever, so it's impossible to summarize all the great things in one message here! Instead, check it all out at:
https://github.com/kubernetes-sigs/headlamp/releases/tag/v0.31.0

New CrowdSec Academy course just landed!
Ready to secure your @kubernetes cluster with real-time protection?
Learn how to:
Deploy CrowdSec in K8s
Enable TLS
Set up a powerful WAF
Monitor audit logs
Start learning now: https://academy.crowdsec.net/course/deploying-crowdsec-in-kubernetes
#CrowdSec #Kubernetes #OpenSource #CyberSecurity #DevSecOps #FOSS @K8sArchitect

How to see what is using flannel or circumvent flannel address usage in kubernetes?
[EDIT (solved)]: Turns out Cilium did not remove its network links and somehow kept claiming my current CIDR, leading to a duplicate address; removing the links fixed it.
I keep running into issues with CNI and networking... I just want my cluster to work. Anyway:
Apr 28 17:14:30 raspberrypi k3s[2373903]: time="2025-04-28T17:14:30+12:00" level=error msg="flannel exited: failed to register flannel network: failed to configure interface flannel.1: failed to set interface flannel.1 to UP state: address already in use"
How do I see what is using flannel? Here are my server arguments:
ExecStart=/usr/local/bin/k3s \
    server \
    --kubelet-arg=allowed-unsafe-sysctls=net.core.rmem_max,net.core.wmem_max,net.ipv4.ip_forward \
    --flannel-backend vxlan \
    --disable=traefik \
    --write-kubeconfig-mode 644
So I am using the default flannel backend. I tried repeatedly uninstalling and re-installing k3s, and I deleted the current flannel interface with ip link, there
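For reference, the cleanup the EDIT describes amounted to something like this (a sketch; the cilium_* interface names are assumptions about which stale links Cilium left behind, so check the ip link output first):

# list links to spot leftovers from a previous CNI
ip link show
# remove the stale links so flannel can re-create flannel.1 cleanly
sudo ip link delete cilium_host
sudo ip link delete cilium_net
sudo ip link delete cilium_vxlan
sudo ip link delete flannel.1
# restart k3s so flannel registers the network again
sudo systemctl restart k3s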

Traefik is not running properly, kube-apiserver pod might be down
[EDIT] So... kinda fixed? It was my backend: it turns out Traefik forwards /nextcloud onto the Nextcloud service, which does not know what to do with that path unless I set something like site-url to include it. So I made a middleware to strip the prefix, but now Nextcloud cannot access any of its files because it uses the wrong path. I will look for site-url settings, but I don't think all of my services have one, so any advice on a general solution would be appreciated.
So currently my Raspberry Pi is connected to my network at 192.168.68.77 (I configured Traefik to work with that host, and alternative hosts if need be). According to the Traefik logs, I think it does not work because it is missing access to the API server, although I could be wrong. I installed Traefik via Helm, I have a config file for it, and I disabled the default Traefik shipped with k3s. Here are the Traefik config and logs: config: https://pastebin.com/XYH2LKF9 logs: https://pastebin.com/sbjPZCXv pods and svcs (al
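For what it's worth, the prefix-stripping middleware mentioned in the EDIT can be expressed as a Traefik CRD roughly like this (a sketch; the resource names and the /nextcloud prefix are assumptions based on the post, and older Traefik releases use apiVersion traefik.containo.us/v1alpha1):

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: nextcloud-stripprefix
  namespace: default
spec:
  stripPrefix:
    prefixes:
      - /nextcloud

On a plain Ingress it would then be attached with the annotation traefik.ingress.kubernetes.io/router.middlewares: default-nextcloud-stripprefix@kubernetescrd.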
I've seen external-dns used for Cloudflare, and it looks to support PowerDNS as well.

Kubernetes DNS broke
spiderunderurbed@raspberrypi:~/k8s $ kubectl run -it --rm network-tools \
  --image=nicolaka/netshoot \
  --restart=Never \
  -- /bin/bash
If you don't see a command prompt, try pressing enter.
network-tools:~# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
network-tools:~#
DNS does not work in my k8s cluster. I don't know how to debug this; these are all the logs I get from CoreDNS/kube-dns:
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
This probably isn't enough, but what more can I do to debug this? I don't think it's anything to do with my CNI (I am using Calico). 1.1.1.1 as a nameserver, or any nameserver, works, but internal-to-external DNS mappings do not work; DNS cannot resolve outside the cluster. Maybe not inside either, according to this:
spiderunderurbed@raspberrypi:~/k8s $
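A few checks that usually narrow this down (a sketch, assuming the stock k3s layout with CoreDNS in kube-system behind the kube-dns service):

# from inside the netshoot pod: test internal and external resolution
nslookup kubernetes.default.svc.cluster.local
nslookup google.com
# query the cluster DNS service IP directly
nslookup google.com 10.43.0.10
# from the host: check the CoreDNS endpoints and logs
kubectl -n kube-system get svc,endpoints kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
# inspect the forwarder CoreDNS uses for external names
kubectl -n kube-system get configmap coredns -o yaml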

Authors: Daniel Vega-Myhre (Google), Abdullah Gharaibeh (Google), Kevin Hannon (Red Hat)
In this article, we introduce JobSet, an open source API for representing distributed jobs. The goal of JobSet is to provide a unified API for distributed ML training and HPC workloads on Kubernetes.
[...]
[T]he Job API fixed many gaps for running batch workloads, including Indexed completion mode, higher scalability, Pod failure policies and Pod backoff policy to mention a few of the most recent enhancements. However, running ML training and HPC workloads using the upstream Job API requires extra orchestration to fill the following gaps:
Multi-template Pods: Most HPC or ML training jobs include more than one type of Pods. The different Pods are part of the same workload, but they need to run a different container, request different resources or have different failure policies. A common example is the driver-worker pattern.
Job groups: Large scale training workloads span multiple network top
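To make the driver-worker example concrete, here is a minimal sketch of what a JobSet could look like (not taken from the article; the jobset.x-k8s.io/v1alpha2 API group is assumed, and the image names and counts are placeholders):

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: training-run
spec:
  replicatedJobs:
    - name: driver            # one Job template for the driver Pod
      replicas: 1
      template:
        spec:
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: driver
                  image: example.com/trainer:latest   # placeholder image
    - name: workers           # a second Job template for the worker Pods
      replicas: 1
      template:
        spec:
          parallelism: 4
          completions: 4
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: worker
                  image: example.com/trainer:latest   # placeholder image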

k9s debug-container plugin
cross-posted from: https://lemmy.ml/post/20234044
Do you know about using Kubernetes Debug containers? They're really useful for troubleshooting well-built, locked-down images that are running in your cluster. I was thinking it would be nice if k9s had this feature, and lo and behold, it has a plugin! I just had to add that snippet to my ${HOME}/.config/k9s/plugins.yaml, run k9s, find the pod, press enter to get into the pod's containers, select a container, and press Shift-D. The debug-container plugin uses the nicolaka/netshoot image, which has a bunch of useful tools on it. Easy debugging in k9s!
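For anyone who doesn't want to chase the cross-post, the plugin entry looks roughly like this (a sketch of the commonly shared netshoot debug plugin; double-check it against the k9s plugin docs before relying on it):

plugins:
  debug:
    shortCut: Shift-D
    description: Add debug container
    dangerous: true
    scopes:
      - containers
    command: bash
    background: false
    confirm: true
    args:
      - -c
      - kubectl debug -it --context $CONTEXT -n=$NAMESPACE $POD --target=$NAME --image=nicolaka/netshoot --share-processes -- bash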
I looked at Tekton, but the complexity of doing simple things put me off. I have been running Woodpecker, which now has Kubernetes support.
Installing the Helm Chart for the Woodpecker agent gives K8s support with no special configuration needed. My needs are simple but I have been really impressed with how easy it has been.
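The install was roughly this (a sketch; the chart repository URL and chart name are assumptions, so verify them against the Woodpecker docs, and server/agent secrets are omitted):

# add the Woodpecker chart repo (URL assumed, check the docs)
helm repo add woodpecker https://woodpecker-ci.org/helm-charts
helm repo update
# install server and agent; the agent uses the Kubernetes backend with default values
helm install woodpecker woodpecker/woodpecker --namespace woodpecker --create-namespace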

One of the biggest problems of #kubernetes is complexity.
@thockin shared his insights in a #KubeCon keynote. I've seen that time and again with my users, as well as in our Logz.io DevOps Pulse yearly survey.
Maintainers aren't the end users of @kubernetes, which doesn't help.

#KubeCon #ObservabilityDay? It's time to talk about the unspoken challenges of #monitoring #Kubernetes: the bloat of metric data, the high churn rate of pod metrics, configuration complexity, and so much more. https://horovits.medium.com/f30c58722541
#observability #devops #SRE @kubernetes @linuxfoundation

It's time to talk about the unspoken challenges of monitoring #Kubernetes: the bloat of metric data, the high churn rate of pod metrics, configuration complexity, and so much more.
https://horovits.medium.com/f30c58722541
#kubecon @kubernetes #k8s #monitoring #observability #devops #SRE @victoriametrics

Thanks for your answer. That's correct as far as I can see in the EKS docs. But in GKE there is a little disclaimer here:
If you want to use a beta Kubernetes feature in GKE, assume that the feature is enabled. Test the feature on your specific GKE control plane version. In some cases, GKE might disable a beta feature in a specific control plane version.
They basically say: "OK, trust that all beta features will be enabled by default, but we can disable some of them without telling you." Funny guys.
Disclaimer: I'm just a K3s hobbyist.
It really depends on your needs. At the companies I've worked for, they require some sort of support and guaranteed security, usually in the form of a contract. I do recall a note in the docs about using an external DB for HA, so probably check that out. Also, what are your availability and resiliency requirements? That may impact your decision. Finally, you may be able to try it out in a non-production environment and save some dough while evaluating its viability.

I used Gorilla-CLI to give me kubectl command to patch a daemonset
Gorilla-CLI converts natural-language prompts into commands. No OpenAI keys needed!
https://github.com/gorilla-llm/gorilla-cli
Today, I wanted to patch my nodelocaldns DaemonSet to not run on Fargate nodes. Of course I don't remember the schema for patching with specific instructions. So, I asked Gorilla:
$ gorilla show me how to patch a daemonset using kubectl to add nodeaffinity that matches expression eks.amazonaws.com/compute-type notin Fargate
Gorilla responded with:
kubectl -n kube-system patch daemonset node-local-dns --patch '{"spec": {"template": {"spec": {"affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [{"matchExpressions": [{"key": "eks.amazonaws.com/compute-type","operator": "NotIn","values": ["fargate"]}]}]}}}}}}'
Close enough! It just missed a trailing '}'
Really impressed.

Using Kubernetes for development?
I'd love to hear some stories about how you or your organization is using Kubernetes for development! My team is experimenting with it because our "platform" is getting too large to run or manage on a single developer machine. We've previously used Docker Compose to start things up locally, but that started getting complicated.
The approach we're trying now is to have a Helm chart to deploy the entire platform to a k8s namespace unique to each developer and then using Telepresence to connect a developer's laptop to the cluster and allow them to run specific services they're working on locally.
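Concretely, the per-developer flow looks roughly like this (a sketch with made-up names; the chart path, namespace, and service are placeholders):

# deploy the whole platform into a namespace unique to the developer
helm upgrade --install platform ./charts/platform --namespace dev-alice --create-namespace
# connect the laptop to the cluster network
telepresence connect
# route the cluster's traffic for one service to the copy running locally on port 8080
telepresence intercept orders-api --namespace dev-alice --port 8080:http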
This seems to be working well, but now I'm finding myself concerned with resource utilization in the cluster as devs don't remember to uninstall or scale down their workloads when they're not active any more, leading to inflation of the cluster size.
Would love to hear some stories from others!