
For a broader view of GitOps tools, see our 2025 comparison.
ArgoCD is the undisputed king of GitOps on Kubernetes. Everyone uses it. But the official documentation only tells you how to drive it on a freshly paved road. Out in the real world, where the pavement ends, you'll find potholes, weird engine noises, and a whole lot of community forum posts from people stuck in the mud.
This is a field guide based on what people are actually complaining about on Reddit, GitHub issues, and Stack Overflow. It covers the problems that don't make it into the marketing material.
The most common sign something’s wrong in ArgoCD land is the dreaded sync timeout or an application controller pegging the CPU. I've seen teams immediately throw more memory and CPU at the problem. It rarely works. That's a rookie move.
The issue is almost never the raw resources. It’s usually a symptom of something deeper:
- argocd-repo-server is choking because your Helm chart is a monster or your Kustomize setup is too complex.
- argocd-application-controller is hammering the Kubernetes API server too hard, causing it to throttle requests.

Before you scale up, you need to tune the controller. These are the knobs you should be looking at first.
| Parameter | Component | Default | Why You Should Care |
|---|---|---|---|
| `--status-processors` | application-controller | 20 | Controls how many apps can be reconciled at once. Too low, and things get slow. |
| `--operation-processors` | application-controller | 10 | Controls how many sync operations can run at once. Increase if syncs are queueing up. |
| `ARGOCD_K8S_CLIENT_QPS` | application-controller | 50 | Rate limit for talking to the K8s API. If you see throttling, bump this, but watch your API server. |
| `timeout.reconciliation` | argocd-cm ConfigMap | 120s (+ up to 60s jitter) | How often Argo checks Git. The timeout.reconciliation.jitter (default 60s) adds random delay, so polls average about every 2.5 minutes. If you have thousands of apps, you don't need it checking that often. |
| `--parallelismlimit` | repo-server | 0 (no limit) | Concurrent manifest generations. If your repo server is OOM'ing, an uncapped value here is a likely culprit. |
Start here. Tweak these values, watch your metrics, and only then consider giving it more raw power.
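As a rough sketch of where these knobs live in a standard install: the controller and repo-server flags can be set through the `argocd-cmd-params-cm` ConfigMap, and the reconciliation timeout through `argocd-cm`. The `argocd` namespace and every value below are assumptions for illustration, not recommendations. Pick your own numbers from your metrics.

```yaml
# Sketch only: values are illustrative, not tuned recommendations.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd            # assumes the default install namespace
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Maps to --status-processors on the application controller
  controller.status.processors: "50"
  # Maps to --operation-processors on the application controller
  controller.operation.processors: "25"
  # Maps to --parallelismlimit on the repo server
  reposerver.parallelism.limit: "2"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd            # assumes the default install namespace
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Poll Git less often when you run a very large number of apps
  timeout.reconciliation: 300s
```

ARGOCD_K8S_CLIENT_QPS is an environment variable on the application controller workload rather than a ConfigMap key, and the components generally need a restart before they pick up changes to any of these settings.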
Here's what trips up more teams than anything: ArgoCD doesn't run helm install or helm upgrade. It runs helm template.
And that changes everything.
It means ArgoCD has no concept of an "install" vs. an "upgrade." It's just a "sync." The nasty side effect is that your Helm pre-install and pre-upgrade hooks both run. Every. Single. Time. If your hooks aren't idempotent, meaning they can run over and over without causing problems, you're in for a world of pain.
The fix is to design your hooks to be harmless on repeated runs. For a one-off job, that means telling ArgoCD to clean up the hook resource after it succeeds.
```yaml
# In your hook's manifest (e.g., a Job)
apiVersion: batch/v1
kind: Job
metadata:
  name: my-presync-db-migration
  annotations:
    # This is the ArgoCD hook annotation
    argocd.argoproj.io/hook: PreSync
    # This tells ArgoCD to delete the Job object once the hook succeeds
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: db-migrator
          image: my-company/db-migrator:1.2.0
          # ... rest of your job spec
      restartPolicy: Never
  backoffLimit: 1
```

Also, forget about using Helm's lookup function. Since helm template runs without cluster access, lookup won't work. You'll have to refactor your charts to pass that data in via values.
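One way to make that refactor concrete: instead of having the chart call lookup to discover, say, the name of an existing Secret, the Application can hand it to the chart as a Helm parameter. This is a minimal sketch. The repo URL, chart path, app name, and the `database.existingSecret` value are all hypothetical and depend on how your chart is written.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                       # hypothetical app name
  namespace: argocd                  # assumes the default install namespace
spec:
  project: default
  source:
    repoURL: https://example.com/my-org/charts.git   # hypothetical repo
    path: charts/my-app                              # hypothetical chart path
    targetRevision: main
    helm:
      parameters:
        # Value the chart previously discovered with lookup,
        # now supplied explicitly at render time
        - name: database.existingSecret              # hypothetical chart value
          value: my-app-db-credentials
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
```

Inside the chart, the template then reads `.Values.database.existingSecret` instead of querying the cluster.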
The "App-of-Apps" pattern is the standard way to manage complex environments. You have a root app that deploys... other apps. It's a great idea until you have dependencies. What if your monitoring app needs the CRDs from your Prometheus operator app to be deployed first?
By default, ArgoCD syncs them all at once. Chaos ensues.
The solution is Sync Waves. It’s a simple annotation that lets you add an order to the chaos. Resources in lower-numbered waves are synced and must become healthy before ArgoCD moves on to the next wave.
```yaml
# In your Prometheus Operator Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator
  annotations:
    # Wave 0: Deploy the operator CRDs first
    argocd.argoproj.io/sync-wave: "0"
# ...
---
# In your Monitoring Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  annotations:
    # Wave 1: Deploy after the operator is ready
    argocd.argoproj.io/sync-wave: "1"
# ...
```

It's simple, but essential for making the App-of-Apps pattern usable.
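One caveat worth checking when the resources in your waves are Application objects: ArgoCD releases since 1.8 no longer assess the health of child Applications out of the box, so the parent may treat a wave as finished before the child app is actually healthy. The upstream docs describe restoring that check with a custom health script in argocd-cm, roughly like this (the `argocd` namespace is an assumption about your install):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd            # assumes the default install namespace
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Lua health check for Application resources: report the child app's
  # own health status, or Progressing while it has none yet
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs
```

With this in place, a wave containing an Application only counts as healthy once that child app reports Healthy itself.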
This is the seam where CI/CD pipelines for Terraform and OpenTofu meet GitOps, and it gets really awkward. Before you get here, it helps to settle where Terraform should stop and ArgoCD should take over, a decision we work through separately in Terraform vs GitOps for Kubernetes. Say you provision an AWS RDS database with Terraform. The Terraform outputs (the database endpoint, the username, and a secret ARN) now need to flow into the Kubernetes application that ArgoCD is deploying. How?
This is a classic handoff problem, and the community has come up with some clever, if clunky, workarounds:
- Push the outputs to an external store and have the Kubernetes side read them from there.
- Have your infrastructure tool commit a values.yaml file containing the outputs directly back into your GitOps repo.

Both of these work. But they feel like duct tape. You're creating this awkward seam in your process, either by relying on an external store as a middleman or by having your infrastructure tool pollute your application configuration history.
This is where a more integrated platform makes a lot more sense. Tools like Scalr, for instance, don't treat infrastructure provisioning and application deployment as two separate worlds you have to bridge. They manage the whole workflow. When a Terraform module creates an RDS instance, its outputs become first-class citizens, available to the next stage in the pipeline that deploys the application via ArgoCD. There's no clumsy handoff because it's all part of one environment definition. You solve the problem at the architectural level instead of patching over it with clever scripts.
For years, people used the ArgoCD Vault Plugin (AVP) to inject secrets during manifest generation. If you're still doing this, stop. The ArgoCD documentation itself now cautions against this style of secret management.
Why? It's a security anti-pattern. Using AVP means your argocd-repo-server needs a credential to your Vault instance. This widens your attack surface. Worse, the rendered manifests, now with plaintext secrets, get stored in ArgoCD's Redis cache.
The modern, secure way is to use an operator-based pattern.
1. Install the External Secrets Operator (ESO) in your cluster.
2. Commit an ExternalSecret manifest to Git. This manifest tells ESO where to find the secret in Vault (or AWS/GCP/Azure secret managers). ESO then fetches the secret and creates a native Kubernetes Secret object inside the cluster.
3. Your application consumes the Secret like it always would.

In this model, ArgoCD is completely ignorant of Vault. It doesn't need credentials, and it doesn't handle plaintext secrets. It just manages the ExternalSecret custom resource, and the operator handles the sensitive work. That's a much cleaner separation of concerns.
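For illustration, an ExternalSecret of the kind described above might look roughly like this. Treat it as a sketch: the `vault-backend` store, the namespace, and the Vault path are hypothetical, and the apiVersion depends on your ESO release (newer releases serve `external-secrets.io/v1`).

```yaml
apiVersion: external-secrets.io/v1beta1   # check which version your ESO release serves
kind: ExternalSecret
metadata:
  name: my-app-db-credentials      # hypothetical name
  namespace: my-app                # hypothetical namespace
spec:
  # How often ESO re-reads the value from the backing store
  refreshInterval: 1h
  secretStoreRef:
    # Hypothetical store, defined separately, that holds the Vault connection details
    name: vault-backend
    kind: ClusterSecretStore
  target:
    # Name of the native Kubernetes Secret that ESO creates
    name: my-app-db-credentials
  data:
    - secretKey: password          # key inside the resulting Secret
      remoteRef:
        key: my-app/database       # hypothetical path in Vault
        property: password         # field within that Vault entry
```

Only this manifest lives in Git. The Vault credentials stay with the operator's store configuration, and the secret value itself never passes through ArgoCD.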
Most ArgoCD pain comes from a handful of failure modes the docs gloss over. Tune the controller parameters before scaling resources. Design hooks that survive repeated runs, since helm template reruns them on every sync. Use Sync Waves to order dependent apps, and hand secrets to an operator like External Secrets rather than the Vault Plugin. The happy path in the docs won't surface any of these. The community forum threads will.
