Companies Are Drowning in AI Infrastructure—This Company Wants to Fix It

Interviews
Tuesday, 10 March 2026 at 20:39
As more companies train and deploy AI models, the complexity of the underlying infrastructure is exploding. GPU clusters, cloud environments, and AI workflows must scale fast—while staying manageable, secure, and affordable. According to infrastructure company Quali, that’s one of the biggest challenges of today’s AI boom: organizations are pouring money into compute, but often lack the systems to orchestrate that infrastructure efficiently.
Quali builds software that helps companies manage and automate their AI infrastructure, from GPU provisioning to end-to-end AI workflows. The company positions its platform as a solution for “AI-ready infrastructure”: a layer that gives real-time visibility into costs, usage, and changes, while automatically enforcing governance and policy. CEO Lior Koriat says this is essential now that AI systems can trigger infrastructure actions on their own and organizations struggle with rising cloud bills and uncontrolled AI experiments.
In a conversation with aiwereld.nl, Koriat explains why traditional infrastructure automation no longer suffices, how intent-driven infrastructure works, and why the biggest bottleneck for AI projects often isn’t technology—it’s complexity.

1. We hear a lot about “AI-ready infrastructure.” What does that actually mean for enterprises running GPU-heavy workloads?

Koriat: “AI-ready infrastructure goes far beyond just making GPUs available. In practical enterprise terms, it’s the ability to orchestrate infrastructure around the specific operational demands of AI—from GPU provisioning to full AI workflows—while preserving governance, visibility, and cost control as systems scale.
Quali defines AI-ready infrastructure as an operating model that provides real-time insight into cost, drift, utilization, and change activity; enables policy-as-code enforcement; supports traceable, reversible change management; and delivers self-service at scale. The concept spans the entire AI lifecycle, effectively covering everything from GPUs to workflows.
For enterprises running GPU-intensive workloads, AI readiness hinges on three interconnected capabilities. First: the ability to deploy AI infrastructure quickly without repeated manual reconfiguration. Intent-driven orchestration combines AI-led interpretation, automated environment provisioning, and continuous optimization to accelerate rollouts while maintaining governance across hybrid environments.
Second: treating GPU workloads as dynamic, policy-driven environments rather than static clusters that stay on permanently. Allocation is continuously optimized across training, inference, and retraining phases to boost utilization and cut costs.
Third: enforcing operational discipline to prevent runaway spend and underutilization. Automated governance, intelligent consumption policies, and real-time workload optimization ensure GPUs run only when they deliver measurable value.
Operationally, AI-ready infrastructure means controlled self-service access to GPU stacks, lifecycle management controls, enforceable provisioning policies, transparent cost visibility, and continuous optimization aligned to workload demands.”
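To make that notion of controlled self-service concrete, here is a minimal sketch of what a guardrailed, pre-approved environment blueprint could look like. The schema is our illustration, not Quali’s actual data model; every field name is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class GpuBlueprint:
    """A pre-approved environment template with guardrails baked in.

    Illustrative only: field names do not reflect Quali's actual schema.
    """
    name: str
    gpu_type: str
    gpu_count: int
    max_duration_hours: int          # environment is torn down after this window
    budget_usd: float                # hard spend ceiling for a single launch
    required_tags: list[str] = field(default_factory=list)
    auto_shutdown_when_idle: bool = True

# A platform team curates the catalog; users launch entries on demand,
# which is what "controlled self-service" amounts to in practice.
CATALOG = {
    "training-small": GpuBlueprint(
        name="training-small",
        gpu_type="a100",
        gpu_count=4,
        max_duration_hours=24,
        budget_usd=500.0,
        required_tags=["team", "project", "cost-center"],
    ),
}
```

The point of the structure is that the guardrails (duration, budget, tags, idle shutdown) travel with the template, so every launch inherits them.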

2. You describe Quali’s approach as “intent-driven infrastructure.” How is that fundamentally different from traditional automation or Infrastructure-as-Code?

Koriat: “Infrastructure-as-Code remains foundational, but it’s no longer sufficient in an increasingly agent-driven tech landscape where AI systems can initiate infrastructure actions themselves.
The shift is from infrastructure as code to infrastructure as intent. In this model, tools like Terraform still serve as execution engines, while a higher-level control plane determines what should be deployed, when it should happen, and under which governance constraints.
The difference is abstraction and responsiveness. Intent operates at a higher level than raw scripts. Users can submit natural-language prompts that translate into immediately executable, policy-compliant blueprints, while administrators retain control over which resources are allowed.
Prompts are converted into environment-as-code artifacts that can be launched on demand within predefined guardrails. For example, a data scientist can request a GPU training environment via a natural-language prompt or catalog entry, after which the platform translates that request into a policy-compliant infrastructure blueprint.
At the same time, the system is event-driven and context-aware. Rather than relying solely on manual pipeline executions, it responds to triggers such as policy violations, performance signals, and workflow events—allowing infrastructure to adapt dynamically.
Within this framework, Infrastructure-as-Code isn’t replaced. It becomes a modular layer inside a governance-driven control plane that interprets intent, orchestrates across tools and providers, and continuously optimizes infrastructure state.”
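To sketch how a request might travel through such a control plane, the example below chains the two roles together: an interpretation step maps a natural-language prompt onto a pre-approved catalog entry, and Terraform then executes the result. This is a rough illustration under stated assumptions: keyword matching stands in for the AI-led interpretation, the catalog and the module path `./modules/gpu-env` are invented, and a real platform would insert a policy gate between the two steps.

```python
import json
import subprocess
import tempfile

# Admin-curated catalog: the only environments a prompt may resolve to.
CATALOG = {
    "gpu-training":  {"gpu_type": "a100", "gpu_count": 4, "max_hours": 24},
    "gpu-inference": {"gpu_type": "l4",   "gpu_count": 1, "max_hours": 72},
}

def interpret(prompt: str) -> dict:
    """Map a natural-language request onto a pre-approved catalog entry.

    Keyword matching stands in for the AI-led interpretation a real
    intent-driven platform would perform.
    """
    key = "gpu-training" if "train" in prompt.lower() else "gpu-inference"
    return CATALOG[key]

def execute(blueprint: dict, module_dir: str = "./modules/gpu-env") -> None:
    """Hand the chosen blueprint to Terraform, which stays the execution engine.

    The control plane decided what to deploy and under which constraints;
    Terraform only realizes that decision. The module path is hypothetical
    and would need to declare matching input variables.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".tfvars.json", delete=False) as f:
        json.dump(blueprint, f)
    subprocess.run(["terraform", "init"], cwd=module_dir, check=True)
    subprocess.run(
        ["terraform", "apply", f"-var-file={f.name}", "-auto-approve"],
        cwd=module_dir, check=True,
    )

execute(interpret("I need to train a vision model on 4 GPUs"))
```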

3. Cloud sprawl and rogue AI experiments: how do intent-based systems restore control without killing speed?

Koriat: “Quali tackles this by embedding governance directly into environment creation and launch—not as an after-the-fact audit. By shifting governance earlier in the lifecycle, organizations keep control without throttling experimentation.
At the core are pre-approved, policy-infused blueprints. DevOps and IT define standardized environments with governance rules, budget parameters, and access controls. These curated templates let users create, update, consume, and tear down environments on their own—while staying within defined guardrails.
Because the platform initiates resource creation, it can enforce compliance at launch by blocking deployments that violate policy. Non-compliant configurations must be fixed before they go live.
Policy enforcement is powered by policy-as-code mechanisms like Open Policy Agent, which automatically evaluate access, cost, and usage rules during provisioning.
Secure credential handling prevents credential sprawl while enabling safe self-service. Cost control is operationalized through automatic shutdown policies, usage-based triggers, and enforceable spend limits. Governance stops being just reporting and becomes an active control system.
Innovation stays fast through controlled self-service, with enforcement baked into templates, role-based access control, and automated lifecycle management. Teams move quickly within clearly defined operating boundaries.”
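As a rough picture of what that launch-time gate does, the sketch below expresses a few such rules directly in Python rather than in OPA’s Rego language. The rules, roles, and thresholds are invented for illustration.

```python
def evaluate_launch(request: dict) -> list[str]:
    """Evaluate a provisioning request against access, cost, and usage rules.

    Returns the list of violations; an empty list means the launch may
    proceed. Rules, roles, and thresholds are illustrative.
    """
    violations = []
    if request["estimated_cost_usd"] > request["budget_usd"]:
        violations.append("estimated cost exceeds the environment budget")
    if request["requester_role"] not in {"ml-engineer", "data-scientist"}:
        violations.append("role is not allowed to launch GPU environments")
    if not {"team", "cost-center"} <= set(request["tags"]):
        violations.append("required tags are missing")
    return violations

# A non-compliant request is blocked at launch, before any resources exist.
violations = evaluate_launch({
    "estimated_cost_usd": 740.0,
    "budget_usd": 500.0,
    "requester_role": "data-scientist",
    "tags": ["team"],
})
if violations:
    print("deployment blocked:", "; ".join(violations))
```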

4. As AI scales, what’s the biggest operational bottleneck—costs, governance, security, or complexity?

Koriat: “The primary bottleneck is usually structural complexity, with costs as the most visible symptom—especially in GPU-heavy environments.
As AI programs scale, infrastructure complexity often outpaces what manual processes can handle. The result: stalled experiments, idle or unreachable GPUs, weak cost visibility, and long provisioning cycles that slow model iteration.
Legacy tools and siloed workflows make it worse, amplifying governance and security risks as teams gain more autonomy.
Runaway GPU spending and energy inefficiency are often downstream effects of unmanaged complexity—think idle training clusters, duplicate experiments, or environments left running long after workloads complete.
In this world, cost, governance, and security aren’t separate problems—they’re intertwined. A centralized control-plane model addresses this by unifying orchestration, governance, and optimization into a continuously automated framework instead of relying on manual ticketing.”

5. With global GPU demand exploding, who should own infrastructure: centralized platform teams or federated AI teams?

Koriat: “Quali supports a model that pairs centralized guardrails with decentralized execution.
In this setup, a central platform team manages the governance framework, reusable blueprints, catalogs, and cost limits. Federated AI and data science teams then consume these environments via controlled self-service.
The platform draws a clear line between those who define and orchestrate environments and those who deploy and use them. Controls like max duration, tagging requirements, and spend limits are enforced at launch.
This lets AI teams iterate fast without rebuilding infrastructure standards each time—while preventing sprawl, inconsistency, and unmanaged GPU spend.”
The technical questions below were answered by David Ben-Shabat, Vice President of Research & Development.

6. Torque is positioned as an autonomous infrastructure platform. Realistically, what does “autonomous” mean today—and where does human oversight remain critical?

Ben-Shabat: “In practical terms, autonomy means shifting routine operational decisions from manual steps to AI-assisted detect–decide–act loops that operate within established governance policies.
The system continuously monitors infrastructure state; detects drift, policy violations, idle resources, or workload phase changes; determines the pre-approved response; and executes corrective actions automatically.
Such actions include rejecting non-compliant deployments, reallocating GPU resources, adjusting infrastructure capacity, or shutting down idle environments.
Here, autonomy means bounded automation within predefined governance frameworks. Teams remain responsible for setting policies, writing approved blueprints, defining budget limits, and approving exceptional or high-risk changes.
Human oversight sets intent and governance parameters, while the system handles execution and optimization.”
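That detect–decide–act loop can be pictured as a small control loop. In the skeleton below, `fetch_environments`, `terminate`, and `alert` are hypothetical stand-ins for whatever monitoring and remediation APIs a real platform exposes, and the idle thresholds are invented; note how anything outside the pre-approved playbook is escalated to a human rather than executed.

```python
# Illustrative thresholds; a real platform would make these policy settings.
IDLE_UTILIZATION = 0.05     # below 5% GPU utilization counts as idle
MAX_IDLE_MINUTES = 30

def detect_decide_act(fetch_environments, terminate, alert):
    """One pass of a bounded automation loop (run on an interval in practice).

    Detect: read current state. Decide: match it against pre-approved
    responses. Act: execute automatically, or escalate to a human when
    the situation falls outside the approved playbook.
    """
    for env in fetch_environments():                          # detect
        if env["utilization"] < IDLE_UTILIZATION and env["idle_minutes"] > MAX_IDLE_MINUTES:
            terminate(env["id"])                              # act: pre-approved response
        elif env["policy_violations"]:
            alert(env["id"], env["policy_violations"])        # escalate to a human

# Demo with stubbed platform APIs:
envs = [{"id": "env-1", "utilization": 0.01, "idle_minutes": 45, "policy_violations": []}]
detect_decide_act(
    fetch_environments=lambda: envs,
    terminate=lambda env_id: print(f"terminating idle environment {env_id}"),
    alert=lambda env_id, violations: print(f"escalating {env_id}: {violations}"),
)
```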

7. How are you using AI or agentic AI in infrastructure management itself? Are we heading toward self-healing, self-optimizing environments?

Ben-Shabat: “Quali applies agentic AI principles across infrastructure design, reliability, and cost optimization.
The platform continuously monitors resource health and supports automatic remediation for common infrastructure issues. For more complex incidents, AI-driven insights help pinpoint root causes and speed up drift recovery.
Cost optimization comes from continuous cost calculation, rejecting deployments that exceed policy limits, detecting idle resources, and automatically terminating them before waste accumulates.
GPU environments are dynamically provisioned and scaled across training and inference workloads, using real-time performance and utilization data to right-size infrastructure.
These capabilities reflect a broader shift toward self-healing and self-optimizing environments, powered by continuous feedback loops and built-in policy frameworks.”
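Drift recovery of the kind described above reduces to comparing the desired state in a blueprint against the observed state and reconciling the difference. A minimal sketch, with `apply_fix` standing in for the platform’s real remediation call:

```python
def detect_drift(desired: dict, observed: dict) -> dict:
    """Return every setting where observed state differs from the blueprint."""
    return {k: (v, observed.get(k)) for k, v in desired.items() if observed.get(k) != v}

def remediate(desired: dict, observed: dict, apply_fix) -> None:
    """Reconcile drift automatically; apply_fix stands in for the real API call."""
    for key, (want, have) in detect_drift(desired, observed).items():
        print(f"drift on {key!r}: expected {want!r}, found {have!r}")
        apply_fix(key, want)

desired  = {"gpu_count": 4, "image": "cuda:12.4", "autoscaling": True}
observed = {"gpu_count": 2, "image": "cuda:12.4", "autoscaling": False}
remediate(desired, observed, apply_fix=lambda k, v: print(f"restoring {k} -> {v!r}"))
```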

8. Intent-based systems demand strong policy enforcement. How do you ensure governance while keeping dynamic AI workloads possible?

Ben-Shabat: “Governance is achieved through an integrated mix of policy-as-code enforcement, standardized blueprints, and lifecycle control.
Policies around access, cost, and usage are embedded directly into provisioning workflows and evaluated before infrastructure is deployed. Infrastructure is assembled into reusable, policy-compliant blueprints that must first pass governance checks.
Because the platform initiates infrastructure creation, it can pre-calculate expected costs, enforce real-time limits, and trigger automatic termination based on time or usage conditions.
Dynamic workloads remain possible—but only within encoded policy boundaries. Automated enforcement replaces manual reviews while maintaining compliance.”
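Because the platform initiates creation, it can price an environment before it exists and reject it at admission time. A simplified version of that pre-calculation, with invented hourly rates:

```python
# Invented hourly rates; real prices vary by provider, region, and commitment.
HOURLY_RATE_USD = {"a100": 3.00, "h100": 6.50, "l4": 0.80}

def precalculate_cost(gpu_type: str, gpu_count: int, max_hours: int) -> float:
    """Worst-case cost of an environment over its permitted lifetime."""
    return HOURLY_RATE_USD[gpu_type] * gpu_count * max_hours

def admit(gpu_type: str, gpu_count: int, max_hours: int, ceiling_usd: float) -> bool:
    """Admit the deployment only if its worst-case cost fits under the ceiling."""
    expected = precalculate_cost(gpu_type, gpu_count, max_hours)
    print(f"worst-case cost ${expected:,.2f} against ceiling ${ceiling_usd:,.2f}")
    return expected <= ceiling_usd

admit("a100", gpu_count=4, max_hours=24, ceiling_usd=250.0)  # $288 > $250 -> rejected
```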

9. GPU clusters and AI pipelines create unpredictable usage spikes. How does Quali’s architecture handle elasticity without unleashing runaway costs?

Ben-Shabat: “Elasticity is treated as both workload-aware and policy-constrained. GPU allocation is continuously optimized across different workload phases, right-sizing resources for training, inference, and retraining.
Real-time monitoring enables context-aware prioritization and reallocation as demand fluctuates.
On the cost side, deployment expenses are pre-calculated, near real-time dashboards provide visibility, and deployments that exceed limits can be rejected.
Time-bounded environments and automatic teardown policies prevent infrastructure from lingering. Scheduled shutdowns and maximum durations systematically reduce idle spend.
This approach lets infrastructure respond to demand spikes while maintaining strict governance over capacity and spend.”
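That pairing of workload-aware and policy-constrained can be read as: scale with demand, but never past the encoded ceilings. A toy sketch of such a scaling decision, in which the phase baselines, rate, and caps are all invented:

```python
# All numbers are invented: per-phase baselines, a flat GPU rate, and caps.
PHASE_BASELINE = {"training": 8, "inference": 2, "retraining": 4}
POLICY_MAX_GPUS = 8            # hard ceiling set by the platform team
HOURLY_RATE_USD = 3.00
HOURLY_BUDGET_USD = 20.0       # spend cap per hour

def target_gpus(phase: str, queue_depth: int) -> int:
    """Scale with demand, but clamp to both the GPU ceiling and the spend cap."""
    wanted = PHASE_BASELINE[phase] + queue_depth // 10      # demand-driven growth
    affordable = int(HOURLY_BUDGET_USD // HOURLY_RATE_USD)  # budget-driven limit
    return min(wanted, POLICY_MAX_GPUS, affordable)

print(target_gpus("inference", queue_depth=55))  # spike wants 7 GPUs, budget allows 6
```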

10. Looking three to five years ahead, will infrastructure platforms evolve into AI-native operating layers for enterprises? What does that architecture look like?

Ben-Shabat: “Quali expects infrastructure platforms to evolve into AI-native operating layers over the next few years—much like Kubernetes became the operating layer for container infrastructure.
AI-native architectures will feature a centralized control plane that can interpret intent from human prompts, pipelines, or AI agents and translate it into governed environments.
A continuous policy core will enforce compliance, cost ceilings, and access boundaries across the entire lifecycle.
Specialized agents will make decisions on scaling, remediation, drift management, cost optimization, and troubleshooting within detect–decide–act loops.
Human approval will remain available for exceptional cases.
In this model, the infrastructure platform acts as an enterprise operating system for AI—coordinating governance, cost control, orchestration, and operational resilience across multi-cloud AI environments.”