Skyvern vs UiPath for vision-RPA: license, cost, and team-fit decision framework

AGPL-3.0 §13 plain-reading, a four-profile decision matrix, and a self-evaluation checklist for picking between OSS vision-first RPA (Skyvern) and enterprise selector-first RPA (UiPath).

Published May 6, 2026 by Pondero Editorial
Pondero, operated by Hildebrandt AI LLC, earns a commission from some links on this page. This does not influence our editorial decisions.

This is a decision framework, not a hands-on review. We read the Skyvern AGPL-3.0 LICENSE file in full and worked through both products’ public docs and repos. We have not run a UiPath production deployment, and we are not publishing a Skyvern install timing test in this piece; if and when we do, it will ship as a separate “we actually ran it” companion. The framework below is what we would use to make the pick in a real procurement conversation: license posture, cost shape, team profile, and where each tool actually fits.

What “vision-based RPA” actually means

Traditional RPA records a script against the DOM of a web page or the accessibility tree of a desktop app. The bot finds a button by its CSS selector or UIA element id, clicks, types, and moves on. This works as long as selectors stay stable. The day a vendor rebuilds their portal in a new component framework, every selector breaks.

Vision-based RPA replaces “find this selector” with “find what looks like the Submit button in this screenshot, then click it.” The bot hands a screenshot to a vision-capable language model, and the model returns an action plan: click coordinates, text to type, scroll deltas. If the page changes, the model re-grounds against the new screenshot. DOM access is still a useful fallback when the visual cue is ambiguous; the strongest stacks combine both.
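
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Skyvern's or UiPath's API: the `Action` type, `run_task`, the callables passed into it, and the 0.6 confidence cutoff are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One step of an action plan (hypothetical shape for this sketch)."""
    kind: str                 # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""
    confidence: float = 1.0


def run_task(goal, take_screenshot, vision_plan, dom_plan, execute,
             max_steps=10, min_confidence=0.6):
    """Screenshot, ask the vision planner, fall back to DOM when unsure, act, repeat."""
    history = []
    for _ in range(max_steps):
        screenshot = take_screenshot()            # re-ground on the current page every step
        action = vision_plan(goal, screenshot)
        if action.confidence < min_confidence:    # ambiguous visual cue: use the DOM path
            action = dom_plan(goal)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history


# Stand-in callables so the sketch runs without a browser or a model.
scripted = iter([
    Action("click", x=120, y=340, confidence=0.9),
    Action("type", text="ACME-42", confidence=0.3),   # low confidence triggers the fallback
    Action("done"),
])
executed = []
history = run_task(
    goal="submit the invoice form",
    take_screenshot=lambda: b"",
    vision_plan=lambda goal, shot: next(scripted),
    dom_plan=lambda goal: Action("type", text="ACME-42 (via DOM)"),
    execute=executed.append,
)
print([a.kind for a in history])
```

The point of the shape is that nothing in the loop stores a selector between steps: each iteration starts from a fresh screenshot, and the DOM is consulted only when the visual plan is uncertain.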

Skyvern is built on this premise. It uses an LLM with vision (OpenAI or Anthropic by default), falls back to DOM extraction when vision is uncertain, and exposes a Python SDK and HTTP API (docs.skyvern.com). UiPath added vision later as the AI Computer Vision activity set, but the default UiPath authoring path is still selector-driven workflows.

AGPL-3.0 §13: what it means if you embed Skyvern in a SaaS

This is the clause that decides the pick for most teams shipping a product. Skyvern is licensed under the GNU Affero General Public License v3.0. The clause that matters for SaaS embedding is §13 (Remote Network Interaction), and we are quoting the FSF source verbatim because paraphrasing this from memory is exactly how teams get themselves into trouble:

Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.

Source: GNU AGPL-3.0 §13

In plain reading: if you modify Skyvern and run that modified version as a network-accessible service other people interact with, you must offer those users the source of your modifications. The trigger is “users interact with it over a network,” not “users receive a binary.”

Three common scenarios:

  • Internal self-host, your team only. Low risk on a plain reading. §13 applies only to a modified version, and the source offer goes to the users who interact with that version over the network. When those users are your own employees on your own infrastructure, nothing has to be published outside the company.
  • SaaS product using Skyvern under the hood. If you have modified Skyvern, the clause triggers. You owe customers the source of your Skyvern modifications, not your entire SaaS codebase, but the line between “Skyvern modification” and “your app code that calls Skyvern” is a real legal question. Have a lawyer read it before you ship.
  • Skyvern Cloud or separate commercial license. The AGPL obligation is replaced by commercial license terms.

The AGPL is not a trap. It is a deliberate copyleft posture that aligns Skyvern’s incentives with paying for the hosted product if you want to embed commercially without source-sharing. We are not lawyers and this is not legal advice; if your model is “embed Skyvern inside a closed-source product we sell,” the next step is fifteen minutes with counsel and a read of the Skyvern LICENSE file.

Decision matrix: four typical team profiles

This is the section we would actually use to recommend a pick.

| Team profile | Pick | Why |
| --- | --- | --- |
| Ops team running ten internal workflows against vendor portals that change layouts often | Skyvern (self-host) | Vision-first removes selector maintenance; per-bot pricing model does not match the workload; AGPL is fine for internal use. |
| Engineering team embedding RPA inside a SaaS product they sell to customers | UiPath (or Skyvern with a commercial license) | AGPL network-use clause kicks in for the SaaS case. Without a commercial Skyvern license, UiPath is the simpler legal posture. |
| Regulated enterprise (financial services, healthcare) replacing a Selenium estate | UiPath | Audit, role-based access control, and procurement story are not yet matched on Skyvern. |
| Indie or small team that wants to ship one or two browser agents without a per-bot bill | Skyvern (self-host) | Cost curve scales with task volume rather than bot count; LLM bill is the only variable. |

Skyvern Cloud is the hosted version of the same engine, which removes the AGPL self-host obligation for embedders willing to pay for it.

Side-by-side rubric

Where the source for a rating is a vendor doc, we cite it. Where the rating depends on your team’s stack, we say so.

Build speed for a single workflow

For a one-off browse-fill-submit job against a page with stable selectors, an experienced UiPath developer in Studio is competitive with a Skyvern prompt. For the same job against a page with shifting markup, Skyvern’s vision-first approach removes most of the selector maintenance loop. The crossover happens at “how often does the target page change.” If you control the target portal, both win; if the target is a third-party vendor portal that ships a new design every quarter, Skyvern’s vision path gets you fewer page-broke incidents.

Total cost (per-bot vs self-host)

UiPath is per-bot priced for unattended automation, with separate Studio author seats and an Orchestrator runtime (uipath.com/pricing). Public list prices vary by region and contract; teams typically pay in the low four figures per unattended bot per month at modest volume. Skyvern self-host has no per-bot license cost; the variable cost is LLM inference per task, which scales with complexity rather than bot count.

# AI-generated cost-projection sketch.
# Treat the per-token math as a back-of-envelope, not a quote.

monthly_runs = 1_000

# Skyvern self-host: server + LLM inference per run.
infra_per_month_usd = 200             # small VM + Postgres
llm_tokens_per_run = 60_000           # vision + planning, varies by task
llm_usd_per_1k_tokens = 0.005         # blended vision + chat estimate
skyvern_total = infra_per_month_usd + (monthly_runs * llm_tokens_per_run / 1000) * llm_usd_per_1k_tokens

# UiPath: per-bot license, public-list anchor only.
uipath_unattended_bots = 2
uipath_per_bot_usd = 1_500
uipath_total = uipath_unattended_bots * uipath_per_bot_usd

print(f"Skyvern self-host (sketch): ${skyvern_total:,.0f}/mo")
print(f"UiPath two unattended bots (sketch): ${uipath_total:,.0f}/mo")

The shape of the curve matters more than the exact numbers. Skyvern’s cost rises with task volume; UiPath’s rises with bot count. Ten workflows on two unattended bots looks cheap on UiPath until run volume per bot is high; one high-volume workflow looks cheap on Skyvern until LLM tokens compound. Run the script above against your real volume.
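
To see where the two curves cross, the same placeholder numbers can be solved for the break-even run count. This is a back-of-envelope sketch, not a quote: `break_even_runs` is a hypothetical helper, and it assumes the UiPath bot count stays fixed as volume grows, which will not hold once two bots are saturated.

```python
import math


def break_even_runs(infra_usd, tokens_per_run, usd_per_1k_tokens, bots, usd_per_bot):
    """Smallest monthly run count at which the self-host sketch stops being cheaper."""
    cost_per_run = tokens_per_run / 1000 * usd_per_1k_tokens
    flat_gap = bots * usd_per_bot - infra_usd   # licence bill minus fixed infra cost
    if flat_gap <= 0:
        return 0              # fixed infra already costs at least as much as the licences
    if cost_per_run <= 0:
        return None           # no per-run cost, so the curves never cross
    return math.ceil(flat_gap / cost_per_run)


# Same placeholder inputs as the cost sketch: $200 infra, 60k tokens per run,
# $0.005 per 1k tokens, two unattended bots at $1,500 each.
crossover = break_even_runs(200, 60_000, 0.005, 2, 1_500)
print(f"Self-host stops being cheaper at about {crossover:,} runs/month")
```

With these inputs each run costs about $0.30 in inference, so the $2,800 gap in fixed costs is used up at 9,334 runs per month. Below that volume the self-host sketch is cheaper; above it, the fixed per-bot bill is.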

Vision model and accuracy on dynamic UIs

Skyvern’s posture is vision-first with a DOM fallback. The model is pluggable; the documented defaults are a vision-capable OpenAI model with Anthropic as an alternative. Accuracy on a given page therefore depends on the model you pick, not on Skyvern’s plumbing. UiPath’s AI Computer Vision activity uses UiPath’s own hosted vision service (docs.uipath.com) and sits as a fallback inside a selector-driven workflow. For pages with shifting layouts, vision-first is the better fit; for ironclad selectors with no design churn, vision rarely decides.

License posture summary

The §13 reading above is the long version. Short version for the rubric: Skyvern is AGPL-3.0 (LICENSE) and UiPath is a closed-source EULA. AGPL is fine for self-hosted internal use, fine for selling Skyvern services if you offer modifications to your network users, and a real problem for embedding Skyvern inside a closed SaaS without a separate commercial license. UiPath’s posture has none of that nuance, with the matching trade-off that you are paying enterprise license fees instead.

Audit and governance for regulated teams

UiPath has the deeper audit story today. Orchestrator ships role-based access control, asset vaulting, queue-level audit logs, and an enterprise compliance posture (SOC 2, ISO 27001, regional data residency). Skyvern’s self-host gives you full control of the data plane (prompts, screenshots, and traces never leave your infrastructure), but the audit features are what you build on top of the artifact bundle. For a small ops team controlling the deployment, the bundle Skyvern produces per task (final screenshot, HAR, LLM trace per the docs) is enough; for a regulated enterprise replacing a UiPath estate, the audit gap is real and you should price the build cost in.

When UiPath still wins

The UiPath wins above are not edge cases. RPA’s biggest installed base is in financial services, insurance, and healthcare, and those buyers are paying for the audit story, the partner ecosystem, and the multi-year contract structure as much as for the runtime.

The other place UiPath still wins is desktop-app automation. Skyvern is browser-shaped; if half your workflow is automating a Citrix-served Windows app, UiPath’s desktop activity catalog is still the default answer. A hybrid stack (UiPath for desktop, Skyvern for the browser-native portion) is a real pattern teams ship.

How UiPath authoring differs in shape

We did not record a UiPath workflow ourselves for this piece. The framework point is the authoring shape, which you can read off the vendor docs without running anything: in UiPath you build a flowchart in Studio (a Windows desktop app), drag in Open Browser, Type Into, and Click activities, point each at a UI element captured by the recorder, and publish the workflow as an unattended job to UiPath Orchestrator (docs.uipath.com, uipath.com/product). For pages where a stable selector is hard to capture, AI Computer Vision activities provide a fallback.

The artifact in UiPath is a .xaml flowchart and an Orchestrator deployment, not a natural-language prompt. Skyvern’s first user types a prompt and reads a screenshot per task; UiPath’s first user holds a flowchart in their head and reasons about per-activity selectors and Orchestrator queues. That difference shows up in hiring, in the time-to-first-working-bot, and in what breaks first when the target page changes. It does not show up in any feature checklist; you only see it once you have the team trying to author a workflow.

How to evaluate this for your team (no sandbox required)

Before you commit to either tool, walk through these questions. The answers usually pick the tool for you.

  1. License posture. Do you ship a SaaS product whose users interact with the RPA layer over a network, and would you be running a modified Skyvern? If yes, AGPL §13 is in scope and either you commit to source-sharing your Skyvern modifications, you pay for Skyvern Cloud or a separate commercial license, or you pick UiPath. If your use is internal-only, the clause is not a practical concern and Skyvern self-host is fine.
  2. Target-page volatility. Pull the last twelve months of your “RPA bot broke” incidents (or your Selenium/Playwright equivalent if you have not deployed RPA yet). What fraction were “the page changed and the selector died”? At more than ~30%, vision-first pays for itself in maintenance time. Below ~10%, selector-driven UiPath is fine.
  3. Volume shape. Sketch the cost script in the rubric above against your actual numbers: monthly task runs, average LLM token spend per run for the model you would pick, and the UiPath bot count you would need. If the curves cross above your volume, Skyvern wins on cost; if they cross below, UiPath wins.
  4. Team headcount. Skyvern self-host requires people who can run a Docker compose stack, manage Postgres, rotate LLM API keys, and read an LLM trace when a task fails. UiPath requires Studio authors and an Orchestrator admin. A two-person ops team usually has the first set; a fifty-person enterprise RPA practice usually has the second.
  5. Browser-vs-desktop split. What fraction of your target workflows are browser-only versus include desktop apps? Skyvern is browser-shaped. If more than ~20% of your workflow steps live in a Windows desktop app, UiPath (or a hybrid with Skyvern for the browser portion) is the default.
  6. Audit and procurement. Does your security team require SOC 2, ISO 27001, role-based access control with audit logs, and a vendor-of-record for the runtime? UiPath ships those out of the box. Skyvern’s artifact bundle is enough for a small ops team’s internal audit; replicating the enterprise audit posture on Skyvern is a build project, and you should scope it before you commit.
  7. Exit cost. A Skyvern workflow is a natural-language prompt plus a target URL; rewriting against a different vision-first agent (or against browser-use directly) is hours, not weeks. A UiPath estate is .xaml flowcharts, Orchestrator queues, and per-bot license commitments; migrating off is a real project. Price the lock-in into your decision.
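
The two numeric questions in the checklist (target-page volatility and the browser-vs-desktop split) can be turned into a quick tally. This is a rough sketch: the function names and the incident labels are hypothetical, and the ~30%, ~10%, and ~20% cutoffs are the rule-of-thumb thresholds from the questions above, not measured values.

```python
def selector_breakage_signal(root_causes):
    """Classify a year of bot-broke incidents by the share caused by dead selectors."""
    if not root_causes:
        return "no incident data"
    share = root_causes.count("selector") / len(root_causes)
    if share > 0.30:
        return "vision-first likely pays off"
    if share < 0.10:
        return "selector-driven is fine"
    return "inconclusive"


def desktop_split_signal(desktop_steps, total_steps):
    """Flag workflows where too many steps live in a Windows desktop app."""
    if total_steps <= 0:
        return "no step data"
    share = desktop_steps / total_steps
    return "UiPath or hybrid" if share > 0.20 else "browser-shaped"


# Example inputs: 20 incidents, 7 of them selector breakage; 12 of 80 steps on desktop.
incidents = ["selector"] * 7 + ["credentials"] * 5 + ["timeout"] * 8
print(selector_breakage_signal(incidents))
print(desktop_split_signal(desktop_steps=12, total_steps=80))
```

Here 7 of 20 incidents (35%) trace back to selectors and 15% of steps are desktop, so both signals lean toward the vision-first, browser-shaped tool. The remaining questions (license, headcount, audit, exit cost) are judgment calls and do not reduce to a threshold.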

If you answer most of these toward Skyvern, the next step is a self-host pilot on a non-production target page. If you answer most toward UiPath, the next step is a Studio Community Edition trial against the same target. Either way, do that on your own form, not someone else’s demo page.

How this fits with the rest of the agent stack

If your workflow is browser-based and the goal is “fill this form, click through this portal,” Skyvern is the right shape. If the workflow is “an LLM agent that uses tools and occasionally drives a browser,” you are looking at a multi-agent framework with browser-use as a tool, not a vision-RPA platform. Our LangGraph in production writeup covers the state-graph patterns that such orchestration sits inside, our n8n AI Agent nodes review covers the low-code path for ops teams, and the best ops-automation tools roundup catalogs the broader category.

FAQ

Is Skyvern’s accuracy actually competitive with UiPath’s selector-based bots? On stable pages with stable selectors, traditional UiPath bots are near-100% accurate; vision-first systems trade a little of that for resilience to layout change. For shifting third-party portals, vision-first is typically more reliable in production because the failure mode is not “the selector broke today.”

Can we run Skyvern without an OpenAI or Anthropic key? The documented defaults are OpenAI and Anthropic, but the model layer is configurable, and any vision-capable chat completion endpoint that matches a supported provider shape can be wired in. Self-hosted vision models are a real option for teams that need the data plane fully on-prem.

How does the AGPL clause affect a regulated enterprise running Skyvern internally? It does not. Internal employee use over your own corporate network is not the network-interaction case AGPL §13 targets. The clause triggers when external users interact with a modified Skyvern remotely.

Does UiPath have an answer to vision-first RPA? Yes, the AI Computer Vision activity set (docs.uipath.com). It is positioned as a fallback inside an otherwise selector-driven workflow, not as the default authoring path. The cultural difference between “vision is the default” and “vision is a fallback” is real even when the underlying capability overlaps.

Should we wait for Skyvern to ship enterprise audit features before evaluating? For the regulated-enterprise profile, yes. For the ops-team profile, no. The artifact bundle Skyvern produces (screenshot, HAR, LLM trace per task per the docs) is already enough for a credible internal audit story.

Why is this article a framework rather than a hands-on review? RPA-replacement content is prone to fabricated benchmarks, and we would rather ship a vetted decision framework with a verbatim AGPL §13 reading than invent install timings or screenshots. The framework material above is the part of the piece we are most confident in. A “we actually ran it” companion will ship separately if and when we publish a Skyvern install on a pinned commit SHA against a real target.