这是本节的多页打印视图。 点击此处打印.

返回本页常规视图.

Selenium 浏览器自动化项目

Selenium 是支持 web 浏览器自动化的一系列工具和库的综合项目。

它提供了扩展来模拟用户与浏览器的交互,用于扩展浏览器分配的分发服务器, 以及用于实现 W3C WebDriver 规范 的基础结构, 该 规范 允许您为所有主要 Web 浏览器编写可互换的代码。

这个项目是由志愿者贡献者实现的,他们投入了自己数千小时的时间, 并使源代码免费提供给任何人使用、享受和改进。

Selenium 汇集了浏览器供应商,工程师和爱好者,以进一步围绕 Web 平台自动化进行公开讨论。 该项目组织了一次年度会议,以教学和培养社区。

Selenium 的核心是 WebDriver,这是一个编写指令集的接口,可以在许多浏览器中互换运行。 这里有一个最简单的说明:

package dev.selenium.hello;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class HelloSelenium {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();

        driver.get("https://selenium.dev");

        driver.quit();
    }
}
from selenium import webdriver


driver = webdriver.Chrome()

driver.get("http://selenium.dev")

driver.quit()
using OpenQA.Selenium.Chrome;

namespace SeleniumDocs.Hello;

public static class HelloSelenium
{
    public static void Main()
    {
        var driver = new ChromeDriver();
            
        driver.Navigate().GoToUrl("https://selenium.dev");
            
        driver.Quit();
    }
}
require 'selenium-webdriver'

driver = Selenium::WebDriver.for :chrome

driver.get 'https://selenium.dev'

driver.quit
const {Builder} = require('selenium-webdriver');

(async function helloSelenium() {
  let driver = await new Builder().forBrowser('chrome').build();

  await driver.get('https://selenium.dev');

  await driver.quit();
})();
package dev.selenium.hello

import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()

    driver.get("https://selenium.dev")

    driver.quit()
}

请参阅 概述 以检查不同的项目组件, 并确定Selenium是否适合您.

您应该继续阅读 开始, 以了解如何安装Selenium, 将其成功用作测试自动化工具, 并将这样的简单测试扩展为 在大型分布式环境, 以及不同操作系统上的环境上 运行多个浏览器的测试.

1 - 概述

Selenium适合你吗? 请参见不同项目组件的概述.

Selenium 不仅仅是一个工具或 API, 它还包含许多工具.

WebDriver

如果您开始使用桌面网站测试自动化, 那么您将使用 WebDriver APIs. WebDriver 使用浏览器供应商提供的浏览器自动化 API 来控制浏览器和运行测试. 这就像真正的用户正在操作浏览器一样. 由于 WebDriver 不要求使用应用程序代码编译其 API, 因此它本质上不具有侵入性. 因此, 您测试的应用程序与实时推送的应用程序相同.

Selenium IDE

Selenium IDE (Integrated Development Environment 集成开发环境) 是用来开发 Selenium 测试用例的工具. 这是一个易于使用的 Chrome 和 Firefox 浏览器扩展, 通常是开发测试用例最有效率的方式. 它使用现有的 Selenium 命令记录用户在浏览器中的操作, 参数由元素的上下文确定. 这不仅节省了开发时间, 而且是学习 Selenium 脚本语法的一种很好的方法.

Grid

Selenium Grid允许您在不同平台的不同机器上运行测试用例. 可以本地控制测试用例的操作, 当测试用例被触发时, 它们由远端自动执行.

当开发完WebDriver测试之后, 您可能需要在多个浏览器和操作系统的组合上运行测试. 这就是 Grid 的用途所在.

1.1 - 了解组件

使用 WebDriver 构建测试套件需要您理解并有效地使用许多不同的组件。就像软件中的一切一样, 不同的人对同一个想法使用不同的术语。下面是在这个描述中如何使用术语的细分。

专业术语

  • API: 应用程序编程接口。这是一组用来操作 WebDriver 的 “命令”。
  • 库: 一个代码模块,它包含 api 和实现这些 api 所需的代码。库是对应于具体的语言的,例如 Java 的 .jar 文件,.NET 的 .dll 文件,等等。
  • 驱动程序: 负责控制实际的浏览器。大多数驱动程序是由浏览器厂商自己创建的。 驱动程序通常是与浏览器一起在系统上运行的可执行模块,而不是在执行测试套件的系统上。 (尽管它们可能是同一个系统。) 注意: 有些人把驱动称为代理。
  • 框架: 用于支持 WebDriver 套件的附加库。这些框架可能是测试框架,如 JUnit 或 NUnit。 它们也可能是支持自然语言特性的框架,如 Cucumber 或 Robotium。还可以编写和使用框架来操作或配置被测试的系统、 数据创建、测试预言等等。

组成部分

至少,WebDriver 通过一个驱动程序与浏览器对话。通信有两种方式: WebDriver 通过驱动程序向浏览器传递命令, 然后通过相同的路径接收信息。

基本通信

驱动程序是特定于浏览器的,例如 ChromeDriver 对应于谷歌的 Chrome/Chromium, GeckoDriver 对应于 Mozilla 的 Firefox 的,等等。驱动程序在与浏览器相同的系统上运行。 这可能与执行测试本身的系统相同,也可能不同。

上面这个简单的例子就是 _直接_通信。与浏览器的通信也可以是通过 Selenium 服务器或 RemoteWebDriver 进行的 _远程_通信。RemoteWebDriver 与驱动程序和浏览器运行在同一个系统上。

远程通信

远程通信也可以使用 Selenium Server 或 Selenium Grid 进行,这两者依次与主机系统上的驱动程序进行通信

通过 Grid 远程通信

应用框架

WebDriver 有且只有一个任务: 通过上面的任何方法与浏览器通信。WebDriver 对测试一窍不通:它不知道如何比较事物、 断言通过或失败,当然它也不知道报告或 Given/When/Then 语法。

这就是各种框架发挥作用的地方。至少你需要一个与绑定语言相匹配的测试框架,比如. NET 的 NUnit, Java 的 JUnit, Ruby 的 RSpec 等等。

测试框架负责运行和执行 WebDriver 以及测试中相关步骤。因此,您可以认为它看起来类似于下图。

测试框架

像 Cucumber 这样的自然语言框架/工具可能作为上图中测试框架框的一部分存在, 或者它们可能将测试框架完全封装在它们自己的实现中。

1.2 - 深度介绍

Selenium 是一系列工具和库的综合项目,这些工具和库支持 web 浏览器的自动化。

Selenium 控制网页浏览器

Selenium 有很多功能, 但其核心是 web 浏览器自动化的一个工具集, 它使用最好的技术来远程控制浏览器实例, 并模拟用户与浏览器的交互。

它允许用户模拟终端用户执行的常见活动;将文本输入到字段中,选择下拉值和复选框,并单击文档中的链接。 它还提供许多其他控件,比如鼠标移动、任意 JavaScript 执行等等。

虽然 Selenium 主要用于网站的前端测试,但其核心是浏览器用户代理库。 这些接口在应用程序中无处不在,它们鼓励与其他库进行组合,以满足您的目的。

一个接口来统治它们

该项目的指导原则之一是支持所有(主要)浏览器技术的通用接口。 Web 浏览器是非常复杂的,高度工程化的应用程序, 以完全不同的方式执行它们的操作,但是在执行这些操作时,它们通常看起来是一样的 即使文本以相同的字体呈现,图像也会显示在相同的位置,并且链接会将您带到相同的目的地。 下面发生的事情就像白天和黑夜一样不同。 Selenium “抽象”了这些差异,向编写代码的人隐藏了它们的细节和复杂性。 这允许您编写几行代码来执行一个复杂的工作流程, 但是这几行代码将在 Firefox、 Internet Explorer、 Chrome 和所有其他支持的浏览器上执行。

工具和支持

Selenium 的极简设计方法使其具有通用性,可以作为更大应用程序中的组件。 Selenium 保护伞下提供的周边基础设施为您提供了组合自己的 浏览器 grid 的工具, 因此测试就可以跨一系列机器在不同的浏览器和多个操作系统上运行。

想象一下, 服务器机房或数据中心的一组计算机同时启动浏览器,访问站点的链接、表单和表格 — 全天 24 小时测试应用程序。 通过为最常见的语言提供的简单编程接口, 这些测试将不知疲倦地并行运行, 当错误发生时向您报告。

通过为用户提供工具和文档, 不仅可以控制浏览器, 还可以方便地扩展和部署这些grid, 从而帮助您实现这一目标。

谁在使用 Selenium

世界上许多最重要的公司都在基于浏览器的测试中采用了 Selenium, 这常常取代了多年来涉及其他专有工具的工作。 随着它越来越受欢迎, 它的需求和挑战也成倍增加。

随着网络变得越来越复杂,新的技术被添加到网站上, 这个项目的任务就是尽可能地跟上它们。 作为一个开源项目,这种支持是通过许多志愿者的慷慨捐赠来提供的, 每个志愿者都有一份“日常工作”。

该项目的另一个任务是鼓励更多的志愿者参与到这项工作中来, 并建立一个强大的社区,以便项目能够继续跟上新兴的技术, 并继续成为功能测试自动化的主导平台。

2 - WebDriver

WebDriver以原生的方式驱动浏览器, 在此了解更多内容.

WebDriver 以本地化方式驱动浏览器,就像用户在本地或使用 Selenium 服务器的远程机器上所做的那样,这标志着浏览器自动化的飞跃。

Selenium WebDriver 指的是语言绑定和各个浏览器控制代码的实现。 这通常被称为 WebDriver

Selenium WebDriver 是 W3C 推荐标准

  • WebDriver 被设计成一个简单和简洁的编程接口。

  • WebDriver 是一个简洁的面向对象 API。

  • 它能有效地驱动浏览器。

2.1 - 入门指南

如果你是Selenium的新手, 我们有一些资源帮助你快速入门.

Selenium 通过使用 WebDriver 支持市场上所有主流浏览器的自动化。 Webdriver 是一个 API 和协议,它定义了一个语言中立的接口,用于控制 web 浏览器的行为。 每个浏览器都有一个特定的 WebDriver 实现,称为驱动程序。 驱动程序是负责委派给浏览器的组件,并处理与 Selenium 和浏览器之间的通信。

这种分离是有意识地努力让浏览器供应商为其浏览器的实现负责的一部分。 Selenium 在可能的情况下使用这些第三方驱动程序, 但是在这些驱动程序不存在的情况下,它也提供了由项目自己维护的驱动程序。

Selenium 框架通过一个面向用户的界面将所有这些部分连接在一起, 该界面允许透明地使用不同的浏览器后端, 从而实现跨浏览器和跨平台自动化。

Selenium setup is quite different from the setup of other commercial tools. Before you can start writing Selenium code, you have to install the language bindings libraries for your language of choice, the browser you want to use, and the driver for that browser.

Follow the links below to get up and going with Selenium WebDriver.

If you wish to start with a low-code/record and playback tool, please check Selenium IDE

Once you get things working, if you want to scale up your tests, check out the Selenium Grid.

2.1.1 - 安装Selenium类库

配置自动化的浏览器.

首先,您需要为自动化项目安装 Selenium 绑定库。 库的安装过程取决于您选择使用的语言。

请求对应的程序语言

查看该库所支持java的最低版本 here.

应熟练掌握build tool以安装支持java的Selenium库

Maven

具体的依赖位于项目中的 pom.xml 文件:

        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>4.15.0</version>
        </dependency>

Gradle

具体的依赖位于项目中的 build.gradle 文件中的 testImplementation:

    testImplementation 'org.seleniumhq.selenium:selenium-java:4.15.0'
    testImplementation 'org.junit.jupiter:junit-jupiter-engine:5.10.0'

该库所支持的Python版本最低版本可以在 支持的Python版本 章节中找到 PyPi

这里提供了几种不同的方式来安装 Selenium .

Pip

pip install selenium

下载

此外你可以从这里下载 PyPI source archive (selenium-x.x.x.tar.gz) 并通过: setup.py 文件安装:

python setup.py install

在项目中使用

为了在项目中使用它,需要将它添加到 requirements.txt 文件中:

selenium==4.15.2

Selenium 所支持的所有平台的列表一览 见诸于 Nuget

该处阐述了一些安装Selenium的选项.

包管理器

Install-Package Selenium.WebDriver

.NET CLI

dotnet add package Selenium.WebDriver

CSProj

csproj 文件里, 具体的依赖 PackageReference(包参数) 位于 ItemGroup (项目组)中:

      <PackageReference Include="Selenium.WebDriver" Version="4.15.0" />

其他附加思虑事项

更多的注意事项,适用于使用 Visual Studio Code (vscode) 和 C#

安装兼容的 .NET SDK 作为章节的先决条件 同时安装 vscode 的扩展 (Ctrl-Shift-X)以适配 C# 和 NuGet 可以遵照此处进行 操作指南 创建 C# 控制台项目并运行 “Hello World”. 你也可以用命令行 dotnet new NUnit 创建NUnit初阶项目. 确保文件 %appdata%\NuGet\nuget.config 已经配置完成,就像某位开发者报告的问题一样,它可能因为某种因素被自动清空. 如果 nuget.config 是空的,或者未配置的,那么 .NET 创建的Selenium项目可能失败. 加入如下章节到文件 nuget.config 如果出现清空的情况:

<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="nuget.org" value="https://www.nuget.org/api/v2/" />   
  </packageSources>
...

更多关于 nuget.config 的信息 点击. 你可能需要按照自己的需求配置 nuget.config .

现在,返回 vscode ,按下 Ctrl-Shift-P, 然后键入 “NuGet Add Package”, 并选择自己需要的 Selenium 包,例如 Selenium.WebDriver. 按下回车并选择版本. 现在你可以使用说明文档中关于 C# vscode下的案例了.

你可以查看 Selenium 对 Ruby 版本支持和最低支持. 具体位于 rubygems.org

Selenium 可以使用两种不同方法安装.

手动安装

gem install selenium-webdriver

加入项目的 gemfile

gem 'selenium-devtools', '~> 0.119'

You can find the minimum required version of Node for any given version of Selenium in the 你可以在此查看 Selenium 对 Node 的版本支持情况 位于 Node Support Policy 中的相关章节 npmjs

Selenium is typically installed using npm.

本地安装

npm install selenium-webdriver

加入项目

在你的项目 package.json, 必须加入到 dependencies:

        "mocha": "^10.2.0"
Use the Java bindings for Kotlin.

下一步

创建你的第一个Selenium脚本

2.1.2 - 编写第一个Selenium脚本

逐步构建一个Selenium脚本的说明

当你完成 Selenium安装 后, 便可以开始书写Selenium脚本了.

八个基本组成部分

Selenium所做的一切, 就是发送给浏览器命令, 用以执行某些操作或为信息发送请求. 您将使用Selenium执行的大部分操作, 都是以下基本命令的组合

Click on the link to “View full example on GitHub” to see the code in context.

1. 使用驱动实例开启会话

关于如何启动会话,请浏览我们的文档 驱动会话

        WebDriver driver = new ChromeDriver();
driver = webdriver.Chrome()
        IWebDriver driver = new ChromeDriver();
driver = Selenium::WebDriver.for :chrome
    driver = await new Builder().forBrowser('chrome').build();
        driver = ChromeDriver()

2. 在浏览器上执行操作

在本例中, 我们 导航 到一个网页.

        driver.get("https://www.selenium.dev/selenium/web/web-form.html");
driver.get("https://www.selenium.dev/selenium/web/web-form.html")
        driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/web-form.html");
driver.get('https://www.selenium.dev/selenium/web/web-form.html')
    await driver.get('https://www.selenium.dev/selenium/web/web-form.html');
        driver.get("https://www.selenium.dev/selenium/web/web-form.html")

3. 请求 浏览器信息

您可以请求一系列关于浏览器的信息 , 包括窗口句柄、浏览器尺寸/位置、cookie、警报等.

        driver.getTitle();
title = driver.title
        var title = driver.Title;
    let title = await driver.getTitle();
        val title = driver.title

4. 建立等待策略

将代码与浏览器的当前状态同步 是Selenium面临的最大挑战之一, 做好它是一个高级主题.

基本上, 您希望在尝试定位元素之前, 确保该元素位于页面上, 并且在尝试与该元素交互之前, 该元素处于可交互状态.

隐式等待很少是最好的解决方案, 但在这里最容易演示, 所以我们将使用它作为占位符.

阅读更多关于等待策略 的信息.

        driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500));
driver.implicitly_wait(0.5)
        driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromMilliseconds(500);
driver.manage.timeouts.implicit_wait = 500
    await driver.manage().setTimeouts({implicit: 500});
        driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500))

5. 发送命令 查找元素

大多数Selenium会话中的主要命令都与元素相关, 如果不先找到元素, 就无法与之交互.

        WebElement textBox = driver.findElement(By.name("my-text"));
        WebElement submitButton = driver.findElement(By.cssSelector("button"));
text_box = driver.find_element(by=By.NAME, value="my-text")
submit_button = driver.find_element(by=By.CSS_SELECTOR, value="button")
        var textBox = driver.FindElement(By.Name("my-text"));
        var submitButton = driver.FindElement(By.TagName("button"));
text_box = driver.find_element(name: 'my-text')
submit_button = driver.find_element(tag_name: 'button')
    let textBox = await driver.findElement(By.name('my-text'));
    let submitButton = await driver.findElement(By.css('button'));
        var textBox = driver.findElement(By.name("my-text"))
        val submitButton = driver.findElement(By.cssSelector("button"))

6. 操作元素

对于一个元素, 只有少数几个操作可以执行, 但您将经常使用它们.

        textBox.sendKeys("Selenium");
        submitButton.click();
text_box.send_keys("Selenium")
submit_button.click()
        textBox.SendKeys("Selenium");
        submitButton.Click();
text_box.send_keys('Selenium')
submit_button.click
    await textBox.sendKeys('Selenium');
    await submitButton.click();
        textBox.sendKeys("Selenium")
        submitButton.click()

7. 获取元素信息

元素存储了很多被请求的信息.

        message.getText();
text = message.text
        var value = message.Text;
    let value = await message.getText();
        val value = message.getText()

8. 结束会话

这将结束驱动程序进程, 默认情况下, 该进程也会关闭浏览器. 无法向此驱动程序实例发送更多命令.

See Quitting Sessions.

Running Selenium File

接下来的步骤

Most Selenium users execute many sessions and need to organize them to minimize duplication and keep the code more maintainable. Read on to learn about how to put this code into context for your use case with Using Selenium.

2.1.3 - Organizing and Executing Selenium Code

Scaling Selenium execution with an IDE and a Test Runner library

If you want to run more than a handful of one-off scripts, you need to be able to organize and work with your code. This page should give you ideas for how to actually do productive things with your Selenium code.

Common Uses

Most people use Selenium to execute automated tests for web applications, but Selenium support any use case of browser automation.

Repetitive Tasks

Perhaps you need to log into a website and download something, or submit a form. You can create a Selenium script to run with a service at preset times.

Web Scraping

Are you looking to collect data from a site that doesn’t have an API? Selenium will let you do this, but please make sure you are familiar with the website’s terms of service as some websites do not permit it and others will even block Selenium.

Testing

Running Selenium for testing requires making assertions on actions taken by Selenium. So a good assertion library is required. Additional features to provide structure for tests require use of Test Runner.

IDEs

Regardless of how you use Selenium code, you won’t be very effective writing or executing it without a good Integrated Developer Environment. Here are some common options…

Test Runner

Even if you aren’t using Selenium for testing, if you have advanced use cases, it might make sense to use a test runner to better organize your code. Being able to use before/after hooks and run things in groups or in parallel can be very useful.

Choosing

There are many different test runners available.

All the code examples in this documentation can be found in (or is being moved to) our example directories that use test runners and get executed every release to ensure all the code is correct and updated. Here is a list of test runners with links. The first item is the one that is used by this repository and the one that will be used for all examples on this page.

  • JUnit - A widely-used testing framework for Java-based Selenium tests.
  • TestNG - Offers extra features like parallel test execution and parameterized tests.
  • pytest - A preferred choice for many, thanks to its simplicity and powerful plugins.
  • unittest - Python’s standard library testing framework.
  • NUnit - A popular unit-testing framework for .NET.
  • MS Test - Microsoft’s own unit testing framework.
  • RSpec - The most widely used testing library for running Selenium tests in Ruby.
  • Minitest - A lightweight testing framework that comes with Ruby standard library.
  • Jest - Primarily known as a testing framework for React, it can also be used for Selenium tests.
  • Mocha - The most common JS library for running Selenium tests.

Installing

This is very similar to what was required in Install a Selenium Library. This code is only showing examples for what is being used in our Documentation Examples project.

Maven

Gradle

To use it in a project, add it to the requirements.txt file:

in the project’s csproj file, specify the dependency as a PackageReference in ItemGroup:

Add to project’s gemfile

In your project’s package.json, add requirement to dependencies:

Asserting

Setting Up and Tearing Down

Executing

Maven

mvn clean test

Gradle

gradle clean test
mocha runningTests.spec.js

Examples

In First script, we saw each of the components of a Selenium script. Here’s an example of that code using a test runner:

package dev.selenium.getting_started;

import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.time.Duration;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class UsingSeleniumTest {

    @Test
    public void eightComponents() {
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.selenium.dev/selenium/web/web-form.html");

        String title = driver.getTitle();
        assertEquals("Web form", title);

        driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500));

        WebElement textBox = driver.findElement(By.name("my-text"));
        WebElement submitButton = driver.findElement(By.cssSelector("button"));

        textBox.sendKeys("Selenium");
        submitButton.click();

        WebElement message = driver.findElement(By.id("message"));
        String value = message.getText();
        assertEquals("Received!", value);

        driver.quit();
    }

}
from selenium import webdriver
from selenium.webdriver.common.by import By


def test_eight_components():
    driver = webdriver.Chrome()

    driver.get("https://www.selenium.dev/selenium/web/web-form.html")

    title = driver.title
    assert title == "Web form"

    driver.implicitly_wait(0.5)

    text_box = driver.find_element(by=By.NAME, value="my-text")
    submit_button = driver.find_element(by=By.CSS_SELECTOR, value="button")

    text_box.send_keys("Selenium")
    submit_button.click()

    message = driver.find_element(by=By.ID, value="message")
    value = message.text
    assert value == "Received!"

    driver.quit()
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace SeleniumDocs.GettingStarted
{
    [TestClass]
    public class UsingSeleniumTest
    {

        [TestMethod]
        public void EightComponents()
        {
            IWebDriver driver = new ChromeDriver();

            driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/web-form.html");

            var title = driver.Title;
            Assert.AreEqual("Web form", title);

            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromMilliseconds(500);

            var textBox = driver.FindElement(By.Name("my-text"));
            var submitButton = driver.FindElement(By.TagName("button"));
            
            textBox.SendKeys("Selenium");
            submitButton.Click();
            
            var message = driver.FindElement(By.Id("message"));
            var value = message.Text;
            Assert.AreEqual("Received!", value);
            
            driver.Quit();
        }
    }
}
# frozen_string_literal: true

require 'spec_helper'

RSpec.describe 'Using Selenium' do
  it 'uses eight components' do
    driver = Selenium::WebDriver.for :chrome

    driver.get('https://www.selenium.dev/selenium/web/web-form.html')

    title = driver.title
    expect(title).to eq('Web form')

    driver.manage.timeouts.implicit_wait = 500

    text_box = driver.find_element(name: 'my-text')
    submit_button = driver.find_element(tag_name: 'button')

    text_box.send_keys('Selenium')
    submit_button.click

    message = driver.find_element(id: 'message')
    value = message.text
    expect(value).to eq('Received!')

    driver.quit
  end
end
const {By, Builder} = require('selenium-webdriver');
const assert = require("assert");

  describe('First script', function () {
    let driver;
    
    before(async function () {
      driver = await new Builder().forBrowser('chrome').build();
    });
    
    it('First Selenium script with mocha', async function () {
      await driver.get('https://www.selenium.dev/selenium/web/web-form.html');
      
      let title = await driver.getTitle();
      assert.equal("Web form", title);
      
      await driver.manage().setTimeouts({implicit: 500});
      
      let textBox = await driver.findElement(By.name('my-text'));
      let submitButton = await driver.findElement(By.css('button'));
      
      await textBox.sendKeys('Selenium');
      await submitButton.click();
      
      let message = await driver.findElement(By.id('message'));
      let value = await message.getText();
      assert.equal("Received!", value);
    });
  
    after(async () => await driver.quit());
  });

Next Steps

Take what you’ve learned and build out your Selenium code!

As you find more functionality that you need, read up on the rest of our WebDriver documentation.

2.2 - 驱动会话

启动和停止会话, 用于打开和关闭浏览器.

创建会话

创建会话对应于W3C的命令 新建会话

会话是通过初始化新的驱动类对象自动创建的.

每种语言都允许使用来自这些类 (或等效类) 之一的参数创建会话:

本地驱动

启动本地驱动的首要唯一参数 包括在本地计算机上有关启动所需驱动服务的信息.

  • 服务 对象仅适用于本地驱动,并提供有关浏览器驱动的信息

远程驱动

用于启动远程驱动的首要唯一参数包括有关在何处执行代码的信息. 请浏览 远程驱动章节中的详细信息

退出会话

退出会话对应于W3C的命令 删除会话.

重要提示: quit 方法与 close 方法不同, 建议始终使用 quit 来结束会话

2.2.1 - 浏览器选项

这些capabilities用于所有浏览器.

在 Selenium 3 中, capabilities是借助"Desired Capabilities"类定义于会话中的. 从 Selenium 4 开始, 您必须使用浏览器选项类. 对于远程驱动程序会话, 浏览器选项实例是必需的, 因为它确定将使用哪个浏览器.

这些选项在 Capabilities 的 w3c 规范中进行了描述.

每个浏览器都有 自定义选项 , 是规范定义之外的内容.

browserName

Browser name is set by default when using an Options class instance.

browserVersion

This capability is optional, this is used to set the available browser version at remote end. In recent versions of Selenium, if the version is not found on the system, it will be automatically downloaded by Selenium Manager

pageLoadStrategy

共有三种类型的页面加载策略.

页面加载策略可以在此链接查询 document.readyState , 如下表所述:

策略就绪状态备注
normalcomplete默认值, 等待所有资源下载
eagerinteractiveDOM 访问已准备就绪, 但诸如图像的其他资源可能仍在加载
noneAny完全不会阻塞 WebDriver

文档的 document.readyState 属性描述当前文档的加载状态.

当通过URL导航到新页面时, 默认情况下, WebDriver将暂缓完成导航方法 (例如, driver.navigate().get())直到文档就绪状态完成. 这 并非意味着该页面已完成加载, 特别是对于使用 JavaScript 在就绪状态返回完成后 动态加载内容单页应用程序的站点. 另请注意此行为不适用于单击元素或提交表单后出现的导航行为.

如果由于下载对自动化不重要的资源(例如, 图像、css、js) 而需要很长时间才能加载页面, 您可以将默认参数normal更改为 eagernone 以加快会话加载速度. 此值适用于整个会话, 因此请确保您的 等待策略 足够普适.

normal (默认值)

WebDriver一直等到 load 事件触发并返回.

import org.openqa.selenium.PageLoadStrategy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.chrome.ChromeDriver;

public class pageLoadStrategy {
  public static void main(String[] args) {
    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.setPageLoadStrategy(PageLoadStrategy.NORMAL);
    WebDriver driver = new ChromeDriver(chromeOptions);
    try {
      // Navigate to Url
      driver.get("https://google.com");
    } finally {
      driver.quit();
    }
  }
}
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.page_load_strategy = 'normal'
driver = webdriver.Chrome(options=options)
driver.get("http://www.google.com")
driver.quit()
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace pageLoadStrategy {
  class pageLoadStrategy {
    public static void Main(string[] args) {
      var chromeOptions = new ChromeOptions();
      chromeOptions.PageLoadStrategy = PageLoadStrategy.Normal;
      IWebDriver driver = new ChromeDriver(chromeOptions);
      try {
        driver.Navigate().GoToUrl("https://example.com");
      } finally {
        driver.Quit();
      }
    }
  }
}
require 'selenium-webdriver'
options = Selenium::WebDriver::Options.chrome
options.page_load_strategy = :normal

driver = Selenium::WebDriver.for :chrome, options: options
driver.get('https://www.google.com')
    it('Navigate using normal page loading strategy', async function () {
      let driver = await env
        .builder()
        .setChromeOptions(options.setPageLoadStrategy('normal'))
        .build();

      await driver.get('https://www.selenium.dev/selenium/web/blank.html');
import org.openqa.selenium.PageLoadStrategy
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
  val chromeOptions = ChromeOptions()
  chromeOptions.setPageLoadStrategy(PageLoadStrategy.NORMAL)
  val driver = ChromeDriver(chromeOptions)
  try {
    driver.get("https://www.google.com")
  }
  finally {
    driver.quit()
  }
}

eager

WebDriver一直等到 DOMContentLoaded 事件触发并返回.

import org.openqa.selenium.PageLoadStrategy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.chrome.ChromeDriver;

public class pageLoadStrategy {
  public static void main(String[] args) {
    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.setPageLoadStrategy(PageLoadStrategy.EAGER);
    WebDriver driver = new ChromeDriver(chromeOptions);
    try {
      // Navigate to Url
      driver.get("https://google.com");
    } finally {
      driver.quit();
    }
  }
}
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.page_load_strategy = 'eager'
driver = webdriver.Chrome(options=options)
driver.get("http://www.google.com")
driver.quit()
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace pageLoadStrategy {
  class pageLoadStrategy {
    public static void Main(string[] args) {
      var chromeOptions = new ChromeOptions();
      chromeOptions.PageLoadStrategy = PageLoadStrategy.Eager;
      IWebDriver driver = new ChromeDriver(chromeOptions);
      try {
        driver.Navigate().GoToUrl("https://example.com");
      } finally {
        driver.Quit();
      }
    }
  }
}
require 'selenium-webdriver'
options = Selenium::WebDriver::Options.chrome
options.page_load_strategy = :eager

driver = Selenium::WebDriver.for :chrome, options: options
driver.get('https://www.google.com')
    it('Navigate using eager page loading strategy', async function () {
      let driver = await env
        .builder()
        .setChromeOptions(options.setPageLoadStrategy('eager'))
        .build();

      await driver.get('https://www.selenium.dev/selenium/web/blank.html');
import org.openqa.selenium.PageLoadStrategy
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
  val chromeOptions = ChromeOptions()
  chromeOptions.setPageLoadStrategy(PageLoadStrategy.EAGER)
  val driver = ChromeDriver(chromeOptions)
  try {
    driver.get("https://www.google.com")
  }
  finally {
    driver.quit()
  }
}

none

WebDriver 仅等待初始页面已下载.

import org.openqa.selenium.PageLoadStrategy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.chrome.ChromeDriver;

public class pageLoadStrategy {
  public static void main(String[] args) {
    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.setPageLoadStrategy(PageLoadStrategy.NONE);
    WebDriver driver = new ChromeDriver(chromeOptions);
    try {
      // Navigate to Url
      driver.get("https://google.com");
    } finally {
      driver.quit();
    }
  }
}
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.page_load_strategy = 'none'
driver = webdriver.Chrome(options=options)
driver.get("http://www.google.com")
driver.quit()
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace pageLoadStrategy {
  class pageLoadStrategy {
    public static void Main(string[] args) {
      var chromeOptions = new ChromeOptions();
      chromeOptions.PageLoadStrategy = PageLoadStrategy.None;
      IWebDriver driver = new ChromeDriver(chromeOptions);
      try {
        driver.Navigate().GoToUrl("https://example.com");
      } finally {
        driver.Quit();
      }
    }
  }
}
require 'selenium-webdriver'
options = Selenium::WebDriver::Options.chrome
options.page_load_strategy = :none

driver = Selenium::WebDriver.for :chrome, options: options
driver.get('https://www.google.com')
    it('Navigate using none page loading strategy', async function () {
      let driver = await env
        .builder()
        .setChromeOptions(options.setPageLoadStrategy('none'))
        .build();

      await driver.get('https://www.selenium.dev/selenium/web/blank.html');
import org.openqa.selenium.PageLoadStrategy
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
  val chromeOptions = ChromeOptions()
  chromeOptions.setPageLoadStrategy(PageLoadStrategy.NONE)
  val driver = ChromeDriver(chromeOptions)
  try {
    driver.get("https://www.google.com")
  }
  finally {
    driver.quit()
  }
}

platformName

这标识了远端的操作系统, 获取 platformName 将返回操作系统的名称.

在基于云的供应者中, 设置 platformName 将在远程端设置操作系统.

acceptInsecureCerts

此功能检查在会话期间导航时 是否使用了过期的 (或) 无效的 TLS Certificate .

如果将功能设置为 false, 则页面浏览遇到任何域证书问题时, 将返回insecure certificate error . 如果设置为 true, 则浏览器将信任无效证书.

默认情况下, 此功能将信任所有自签名证书. 设置后, acceptInsecureCerts 功能将在整个会话中生效.

timeouts

WebDriver session 具有一定的 session timeout 间隔, 在此间隔内, 用户可以控制执行脚本或从浏览器检索信息的行为.

每个会话超时都配置有不同 timeouts 的组合, 如下所述:

Script Timeout:

指定在当前浏览上下文中, 中断正在执行脚本的时机. WebDriver创建新会话时, 将设置默认的超时时间为 30,000 .

Page Load Timeout:

指定在当前浏览上下文中, 加载网页的时间间隔. WebDriver创建新会话时, 默认设置超时时间为 300,000 . 如果页面加载限制了给定 (或默认) 的时间范围, 则该脚本将被 TimeoutException 停止.

Implicit Wait Timeout

指定在定位元素时, 等待隐式元素定位策略的时间. WebDriver创建新会话时, 将设置默认超时时间为 0 .

unhandledPromptBehavior

指定当前会话 user prompt handler 的状态. 默认为 dismiss and notify state .

User Prompt Handler

这定义了在远端出现用户提示时必须采取的措施. 该行为由unhandledPromptBehavior 功能定义, 具有以下状态:

  • dismiss
  • accept
  • dismiss and notify
  • accept and notify
  • ignore

setWindowRect

用于所有支持 调整大小和重新定位 命令 的远程终端.

strictFileInteractability

新功能用于是否对 类型为文件的输入(input type=file) 元素进行严格的交互性检查. 默认关闭严格性检查, 在将 元素的Send Keys 方法作用于隐藏的文件上传时, 会有控制方面的行为区别.

proxy

代理服务器充当客户端和服务器之间的请求中介. 简述而言, 流量将通过代理服务器流向您请求的地址, 然后返回.

使用代理服务器用于Selenium的自动化脚本, 可能对以下方面有益:

  • 捕获网络流量
  • 模拟网站后端响应
  • 在复杂的网络拓扑结构或严格的公司限制/政策下访问目标站点.

如果您在公司环境中, 并且浏览器无法连接到URL, 则最有可能是因为环境, 需要借助代理进行访问.

Selenium WebDriver提供了如下设置代理的方法

import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class ProxyTest {
  public static void main(String[] args) {
    Proxy proxy = new Proxy();
    proxy.setHttpProxy("<HOST:PORT>");
    ChromeOptions options = new ChromeOptions();
    options.setCapability("proxy", proxy);
    WebDriver driver = new ChromeDriver(options);
    driver.get("https://www.google.com/");
    driver.manage().window().maximize();
    driver.quit();
  }
}
from selenium import webdriver

PROXY = "<HOST:PORT>"
webdriver.DesiredCapabilities.FIREFOX['proxy'] = {
"httpProxy": PROXY,
"ftpProxy": PROXY,
"sslProxy": PROXY,
"proxyType": "MANUAL",

}

with webdriver.Firefox() as driver:
# Open URL
    driver.get("https://selenium.dev")
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public class ProxyTest{
public static void Main() {
ChromeOptions options = new ChromeOptions();
Proxy proxy = new Proxy();
proxy.Kind = ProxyKind.Manual;
proxy.IsAutoDetect = false;
proxy.SslProxy = "<HOST:PORT>";
options.Proxy = proxy;
options.AddArgument("ignore-certificate-errors");
IWebDriver driver = new ChromeDriver(options);
driver.Navigate().GoToUrl("https://www.selenium.dev/");
}
}
# this code was written with Selenium 4

proxy = Selenium::WebDriver::Proxy.new(http: '<HOST:PORT>')
cap   = Selenium::WebDriver::Remote::Capabilities.chrome(proxy: proxy)

driver = Selenium::WebDriver.for(:chrome, capabilities: cap)
driver.get('http://google.com')
let webdriver = require('selenium-webdriver');
let chrome = require('selenium-webdriver/chrome');
let proxy = require('selenium-webdriver/proxy');
let opts = new chrome.Options();

(async function example() {
opts.setProxy(proxy.manual({http: '<HOST:PORT>'}));
let driver = new webdriver.Builder()
.forBrowser('chrome')
.setChromeOptions(opts)
.build();
try {
await driver.get("https://selenium.dev");
}
finally {
await driver.quit();
}
}());
import org.openqa.selenium.Proxy
import org.openqa.selenium.WebDriver
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

class proxyTest {
fun main() {

        val proxy = Proxy()
        proxy.setHttpProxy("<HOST:PORT>")
        val options = ChromeOptions()
        options.setCapability("proxy", proxy)
        val driver: WebDriver = ChromeDriver(options)
        driver["https://www.google.com/"]
        driver.manage().window().maximize()
        driver.quit()
    }
}

2.2.2 - HTTP Client Configuration

允许您为HTTP库设置各种参数.

2.2.3 - Driver Service Class

服务类用于管理驱动程序的启动和停止. They can not be used with a Remote WebDriver session.

Service classes allow you to specify information about the driver, like location and which port to use. They also let you specify what arguments get passed to the command line. Most of the useful arguments are related to logging.

Default Service instance

To start a driver with a default service instance:

    ChromeDriverService service = new ChromeDriverService.Builder().build();
    driver = new ChromeDriver(service);
    service = webdriver.ChromeService()
    driver = webdriver.Chrome(service=service)
            var service = ChromeDriverService.CreateDefaultService();
            driver = new ChromeDriver(service);
    service = Selenium::WebDriver::Service.chrome
    @driver = Selenium::WebDriver.for :chrome, service: service

Driver location

Note: If you are using Selenium 4.6 or greater, you shouldn’t need to set a driver location. If you can not update Selenium or have an advanced use case here is how to specify the driver location:

    ChromeDriverService service =
        new ChromeDriverService.Builder().usingDriverExecutable(driverPath).build();
    service = webdriver.ChromeService(executable_path=chromedriver_bin)
            var service = ChromeDriverService.CreateDefaultService(GetDriverLocation(options));
    service.executable_path = driver_path

Driver port

If you want the driver to run on a specific port, you may specify it as follows:

    ChromeDriverService service = new ChromeDriverService.Builder().usingPort(1234).build();
    service = webdriver.ChromeService(port=1234)
            service.Port = 1234;
    service.port = 1234

Logging

Logging functionality varies between browsers. Most browsers allow you to specify location and level of logs. Take a look at the respective browser page:

2.2.4 - 远程WebDriver

Selenium lets you automate browsers on remote computers if there is a Selenium Grid running on them. The computer that executes the code is referred to as the client computer, and the computer with the browser and driver is referred to as the remote computer or sometimes as an end-node. To direct Selenium tests to the remote computer, you need to use a Remote WebDriver class and pass the URL including the port of the grid on that machine. Please see the grid documentation for all the various ways the grid can be configured.

Basic Example

The driver needs to know where to send commands to and which browser to start on the Remote computer. So an address and an options instance are both required.

    ChromeOptions options = new ChromeOptions();
    driver = new RemoteWebDriver(gridUrl, options);
    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=server, options=options)
            var options = new ChromeOptions();
            driver = new RemoteWebDriver(GridUrl, options);
    options = Selenium::WebDriver::Options.chrome
    driver = Selenium::WebDriver.for :remote, url: grid_url, options: options

Uploads

Uploading a file is more complicated for Remote WebDriver sessions because the file you want to upload is likely on the computer executing the code, but the driver on the remote computer is looking for the provided path on its local file system. The solution is to use a Local File Detector. When one is set, Selenium will bundle the file, and send it to the remote machine, so the driver can see the reference to it. Some bindings include a basic local file detector by default, and all of them allow for a custom file detector.

Java does not include a Local File Detector by default, so you must always add one to do uploads.
    ((RemoteWebDriver) driver).setFileDetector(new LocalFileDetector());
    WebElement fileInput = driver.findElement(By.cssSelector("input[type=file]"));
    fileInput.sendKeys(uploadFile.getAbsolutePath());
    driver.findElement(By.id("file-submit")).click();

Python adds a local file detector to remote webdriver instances by default, but you can also create your own class.

    driver.file_detector = LocalFileDetector()
    file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")
    file_input.send_keys(upload_file)
    driver.find_element(By.ID, "file-submit").click()
.NET adds a local file detector to remote webdriver instances by default, but you can also create your own class.
            ((RemoteWebDriver)driver).FileDetector = new LocalFileDetector();
            IWebElement fileInput = driver.FindElement(By.CssSelector("input[type=file]"));
            fileInput.SendKeys(uploadFile);
            driver.FindElement(By.Id("file-submit")).Click();
Ruby adds a local file detector to remote webdriver instances by default, but you can also create your own lambda:
    driver.file_detector = ->((filename, *)) { filename.include?('selenium') && filename }
    file_input = driver.find_element(css: 'input[type=file]')
    file_input.send_keys(upload_file)
    driver.find_element(id: 'file-submit').click

Downloads

Chrome, Edge and Firefox each allow you to set the location of the download directory. When you do this on a remote computer, though, the location is on the remote computer’s local file system. Selenium allows you to enable downloads to get these files onto the client computer.

Enable Downloads in the Grid

Regardless of the client, when starting the grid in node or standalone mode, you must add the flag:

--enable-managed-downloads true

Enable Downloads in the Client

The grid uses the se:downloadsEnabled capability to toggle whether to be responsible for managing the browser location. Each of the bindings have a method in the options class to set this.

    ChromeOptions options = new ChromeOptions();
    options.setEnableDownloads(true);
    driver = new RemoteWebDriver(gridUrl, options);
    options = webdriver.ChromeOptions()
    options.enable_downloads = True
    driver = webdriver.Remote(command_executor=server, options=options)
            ChromeOptions options = new ChromeOptions
            {
                EnableDownloads = true
            };
            driver = new RemoteWebDriver(GridUrl, options);
    options = Selenium::WebDriver::Options.chrome(enable_downloads: true)
    driver = Selenium::WebDriver.for :remote, url: grid_url, options: options

List Downloadable Files

Be aware that Selenium is not waiting for files to finish downloading, so the list is an immediate snapshot of what file names are currently in the directory for the given session.

    List<String> files = ((HasDownloads) driver).getDownloadableFiles();
    files = driver.get_downloadable_files()
            List<string> names = ((RemoteWebDriver)driver).GetDownloadableFiles();
    files = driver.downloadable_files

Download a File

Selenium looks for the name of the provided file in the list and downloads it to the provided target directory.

    ((HasDownloads) driver).downloadFile(downloadableFile, targetDirectory);
    driver.download_file(downloadable_file, target_directory)
            ((RemoteWebDriver)driver).DownloadFile(downloadableFile, targetDirectory);
    driver.download_file(downloadable_file, target_directory)

Delete Downloaded Files

By default, the download directory is deleted at the end of the applicable session, but you can also delete all files during the session.

    ((HasDownloads) driver).deleteDownloadableFiles();
    driver.delete_downloadable_files()
            ((RemoteWebDriver)driver).DeleteDownloadableFiles();
    driver.delete_downloadable_files

Browser specific functionalities

Each browser has implemented special functionality that is available only to that browser. Each of the Selenium bindings has implemented a different way to use those features in a Remote Session

Java requires you to use the Augmenter class, which allows it to automatically pull in implementations for all interfaces that match the capabilities used with the RemoteWebDriver

    driver = new Augmenter().augment(driver);

Of interest, using the RemoteWebDriverBuilder automatically augments the driver, so it is a great way to get all the functionality by default:

    driver =
        RemoteWebDriver.builder()
            .address(gridUrl)
            .oneOf(new ChromeOptions())
            .setCapability("ext:options", Map.of("key", "value"))
            .config(ClientConfig.defaultConfig())
            .build();
.NET uses a custom command executor for executing commands that are valid for the given browser in the remote driver.
            var customCommandDriver = driver as ICustomDriverCommandExecutor;
            customCommandDriver.RegisterCustomDriverCommands(FirefoxDriver.CustomCommandDefinitions);

            var screenshotResponse = customCommandDriver
                .ExecuteCustomDriverCommand(FirefoxDriver.GetFullPageScreenshotCommand, null);
Ruby uses mixins to add applicable browser specific methods to the Remote WebDriver session; the methods should always just work for you.

追踪客户端请求

此功能仅适用于Java客户端绑定 (Beta版以后). 远程WebDriver客户端向Selenium网格服务器发送请求, 后者将请求传递给WebDriver. 应该在服务器端和客户端启用跟踪, 以便端到端地追踪HTTP请求. 两端都应该有一个指向可视化框架的追踪导出器设置. 默认情况下, 对客户端和服务器都启用追踪. 若设置可视化框架Jaeger UI及Selenium Grid 4, 请参阅所需版本的追踪设置 .

对于客户端设置, 请执行以下步骤.

添加所需依赖

可以使用Maven安装追踪导出器的外部库. 在项目pom.xml中添加 opentelemetry-exporter-jaegergrpc-netty 的依赖项:

  <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-exporter-jaeger</artifactId>
      <version>1.0.0</version>
    </dependency>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-netty</artifactId>
      <version>1.35.0</version>
    </dependency>

在运行客户端时添加/传递所需的系统属性

System.setProperty("otel.traces.exporter", "jaeger");
System.setProperty("otel.exporter.jaeger.endpoint", "http://localhost:14250");
System.setProperty("otel.resource.attributes", "service.name=selenium-java-client");

ImmutableCapabilities capabilities = new ImmutableCapabilities("browserName", "chrome");

WebDriver driver = new RemoteWebDriver(new URL("http://www.example.com"), capabilities);

driver.get("http://www.google.com");

driver.quit();

  

有关所需Selenium版本 及其外部依赖关系版本等更多信息, 请参阅追踪设置 .

更多信息请访问:

2.3 - 支持的浏览器列表

每个浏览器都有定制和特有的功能。

2.3.1 - Chrome specific functionality

These are capabilities and features specific to Google Chrome browsers.

默认情况下,Selenium 4与Chrome v75及更高版本兼容. 但是请注意Chrome浏览器的版本与chromedriver的主版本需要匹配.

Options

所有浏览器的通用功能请看这 Options page.

Chrome浏览器的特有功能可以在谷歌的页面找到: Capabilities & ChromeOptions

基于默认选项的Chrome浏览器会话看起来是这样:

    ChromeOptions options = new ChromeOptions();
    driver = new ChromeDriver(options);
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(options=options)
            var options = new ChromeOptions();
            driver = new ChromeDriver(options);
      options = Selenium::WebDriver::Options.chrome
      @driver = Selenium::WebDriver.for :chrome, options: options
      const Options = new Chrome.Options();
      let driver = await env
        .builder()
        .setChromeOptions(Options)
        .build();

下面是一些不同功能的常见示例:

参数

The args parameter is for a list of command line switches to be used when starting the browser. There are two excellent resources for investigating these arguments:

Commonly used args include --start-maximized, --headless=new and --user-data-dir=...

Add an argument to options:

    options.addArguments("--start-maximized");
    options.add_argument("--start-maximized")
            options.AddArgument("--start-maximized");
      options.args << '--start-maximized'
      let driver = await env
        .builder()
        .setChromeOptions(options.addArguments('--headless=new'))
        .build();

从指定位置启动浏览器

binary 参数接收一个使用浏览器的备用路径,通过这个参数你可以使用chromedriver 去驱动各种基于Chromium 内核的浏览器.

添加一个浏览器地址到选项中:

    options.setBinary(getChromeLocation());
    options.binary_location = chrome_bin
            options.BinaryLocation = GetChromeLocation();
      options.binary = chrome_location
      let driver = await env
        .builder()
        .setChromeOptions(options.setChromeBinaryPath(`Path to chrome binary`))
        .build();

添加扩展程序

extensions 参数接受crx文件. As for unpacked directories, please use the load-extension argument instead, as mentioned in this post.

添加一个扩展程序到选项中:

    options.addExtensions(extensionFilePath);
    options.add_extension(extension_file_path)
            options.AddExtension(extensionFilePath);
      options.add_extension(extension_file_path)
      const options = new Chrome.Options();
      let driver = await env
        .builder()
        .setChromeOptions(options.addExtensions(['./test/resources/extensions/webextensions-selenium-example.crx']))
        .build();

保持浏览器的打开状态

detach 参数设置为true将在驱动过程结束后保持浏览器的打开状态.

添加一个布尔值到选项中:

Note: This is already the default behavior in Java.

    options.add_experimental_option("detach", True)

Note: This is already the default behavior in .NET.

      options.detach = true
      let driver = await env
        .builder()
        .setChromeOptions(options.detachDriver(true))
        .build();

排除的参数

Chrome 添加了各种参数,如果你不希望添加某些参数,可以将其传入 excludeSwitches. 一个常见的例子是重新打开弹出窗口阻止程序. A full list of default arguments can be parsed from the Chromium Source Code

设置排除参数至选项中:

    options.setExperimentalOption("excludeSwitches", List.of("disable-popup-blocking"));
    options.add_experimental_option('excludeSwitches', ['disable-popup-blocking'])
            options.AddExcludedArgument("disable-popup-blocking");
      options.exclude_switches << 'disable-popup-blocking'
      let driver = await env
        .builder()
        .setChromeOptions(options.excludeSwitches('enable-automation'))
        .build();

Service

Examples for creating a default Service object, and for setting driver location and port can be found on the Driver Service page.

Log output

Getting driver logs can be helpful for debugging issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

    ChromeDriverService service =
        new ChromeDriverService.Builder().withLogFile(logLocation).build();

Note: Java also allows setting file output by System Property:
Property key: ChromeDriverService.CHROME_DRIVER_LOG_PROPERTY
Property value: String representing path to log file

    service = webdriver.ChromeService(log_output=log_path)
            service.LogPath = GetLogLocation();
      service.log = file_name

Console output

To change the logging output to display in the console as STDOUT:

    ChromeDriverService service =
        new ChromeDriverService.Builder().withLogOutput(System.out).build();

Note: Java also allows setting console output by System Property;
Property key: ChromeDriverService.CHROME_DRIVER_LOG_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

    service = webdriver.ChromeService(log_output=subprocess.STDOUT)

$stdout and $stderr are both valid values

      service.log = $stdout

Log level

There are 6 available log levels: ALL, DEBUG, INFO, WARNING, SEVERE, and OFF. Note that --verbose is equivalent to --log-level=ALL and --silent is equivalent to --log-level=OFF, so this example is just setting the log level generically:

    ChromeDriverService service =
        new ChromeDriverService.Builder().withLogLevel(ChromiumDriverLogLevel.DEBUG).build();

Note: Java also allows setting log level by System Property:
Property key: ChromeDriverService.CHROME_DRIVER_LOG_LEVEL_PROPERTY
Property value: String representation of ChromiumDriverLogLevel enum

    service = webdriver.ChromeService(service_args=['--log-level=DEBUG'], log_output=subprocess.STDOUT)
      service.args << '--log-level=DEBUG'

Log file features

There are 2 features that are only available when logging to a file:

  • append log
  • readable timestamps

To use them, you need to also explicitly specify the log path and log level. The log output will be managed by the driver, not the process, so minor differences may be seen.

    ChromeDriverService service =
        new ChromeDriverService.Builder().withAppendLog(true).withReadableTimestamp(true).build();

Note: Java also allows toggling these features by System Property:
Property keys: ChromeDriverService.CHROME_DRIVER_APPEND_LOG_PROPERTY and ChromeDriverService.CHROME_DRIVER_READABLE_TIMESTAMP
Property value: "true" or "false"

    service = webdriver.ChromeService(service_args=['--append-log', '--readable-timestamp'], log_output=log_path)
      service.args << '--append-log'
      service.args << '--readable-timestamp'

Disabling build check

Chromedriver and Chrome browser versions should match, and if they don’t the driver will error. If you disable the build check, you can force the driver to be used with any version of Chrome. Note that this is an unsupported feature, and bugs will not be investigated.

    ChromeDriverService service =
        new ChromeDriverService.Builder().withBuildCheckDisabled(true).build();

Note: Java also allows disabling build checks by System Property:
Property key: ChromeDriverService.CHROME_DRIVER_DISABLE_BUILD_CHECK
Property value: "true" or "false"

    service = webdriver.ChromeService(service_args=['--disable-build-check'], log_output=subprocess.STDOUT)
            service.DisableBuildCheck = true;
      service.args << '--disable-build-check'

Special Features

Casting

你可以驱动 Chrome Cast 设备,包括共享选项卡

网络条件

您可以模拟各种网络条件.

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

Logs

Permissions

DevTools

See the Chrome DevTools section for more information about using Chrome DevTools

2.3.2 - Edge 特定功能

这些是特定于微软Edge浏览器的功能和特性.

微软Edge是用Chromium实现的,最早支持版本是v79. 与Chrome类似, Edge驱动的主版本号必须与Edge浏览器的主要版本匹配.

Chrome 页面 上找到的所有capabilities和选项也适用于Edge.

选项

Capabilities common to all browsers are described on the Options page.

Capabilities unique to Chromium are documented at Google’s page for Capabilities & ChromeOptions

使用基本定义的选项启动 Edge 会话如下所示:

    EdgeOptions options = new EdgeOptions();
    driver = new EdgeDriver(options);
    options = webdriver.EdgeOptions()
    driver = webdriver.Edge(options=options)
            var options = new EdgeOptions();
            driver = new EdgeDriver(options);
      options = Selenium::WebDriver::Options.edge
      @driver = Selenium::WebDriver.for :edge, options: options
      driver = await env.builder()
        .setEdgeOptions(options)
        .build();
    });

Arguments

The args parameter is for a list of command line switches to be used when starting the browser. There are two excellent resources for investigating these arguments:

Commonly used args include --start-maximized and --headless=new and --user-data-dir=...

Add an argument to options:

    options.addArguments("--start-maximized");
    options.add_argument("--start-maximized")
            options.AddArgument("--start-maximized");
      options.args << '--start-maximized'

Start browser in a specified location

The binary parameter takes the path of an alternate location of browser to use. With this parameter you can use chromedriver to drive various Chromium based browsers.

Add a browser location to options:

    options.setBinary(getEdgeLocation());
    options.binary_location = edge_bin
            options.BinaryLocation = GetEdgeLocation();
      options.binary = edge_location

Add extensions

The extensions parameter accepts crx files. As for unpacked directories, please use the load-extension argument instead, as mentioned in this post.

Add an extension to options:

    options.addExtensions(extensionFilePath);
    options.add_extension(extension_file_path)
            options.AddExtension(extensionFilePath);
      options.add_extension(extension_file_path)

Keeping browser open

Setting the detach parameter to true will keep the browser open after the process has ended, so long as the quit command is not sent to the driver.

Note: This is already the default behavior in Java.

    options.add_experimental_option("detach", True)

Note: This is already the default behavior in .NET.

      options.detach = true

Excluding arguments

MSEdgedriver has several default arguments it uses to start the browser. If you do not want those arguments added, pass them into excludeSwitches. A common example is to turn the popup blocker back on. A full list of default arguments can be parsed from the Chromium Source Code

Set excluded arguments on options:

    options.setExperimentalOption("excludeSwitches", List.of("disable-popup-blocking"));
    options.add_experimental_option('excludeSwitches', ['disable-popup-blocking'])
            options.AddExcludedArgument("disable-popup-blocking");
      options.exclude_switches << 'disable-popup-blocking'

Service

Examples for creating a default Service object, and for setting driver location and port can be found on the Driver Service page.

Log output

Getting driver logs can be helpful for debugging issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

    EdgeDriverService service = new EdgeDriverService.Builder().withLogFile(logLocation).build();

Note: Java also allows setting file output by System Property:
Property key: EdgeDriverService.EDGE_DRIVER_LOG_PROPERTY
Property value: String representing path to log file

    service = webdriver.EdgeService(log_output=log_path)
            service.LogPath = GetLogLocation();
      service.log = file_name

Console output

To change the logging output to display in the console as STDOUT:

    EdgeDriverService service = new EdgeDriverService.Builder().withLogOutput(System.out).build();

Note: Java also allows setting console output by System Property;
Property key: EdgeDriverService.EDGE_DRIVER_LOG_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

$stdout and $stderr are both valid values

      service.log = $stdout

Log level

There are 6 available log levels: ALL, DEBUG, INFO, WARNING, SEVERE, and OFF. Note that --verbose is equivalent to --log-level=ALL and --silent is equivalent to --log-level=OFF, so this example is just setting the log level generically:

    EdgeDriverService service =
        new EdgeDriverService.Builder().withLoglevel(ChromiumDriverLogLevel.DEBUG).build();

Note: Java also allows setting log level by System Property:
Property key: EdgeDriverService.EDGE_DRIVER_LOG_LEVEL_PROPERTY
Property value: String representation of ChromiumDriverLogLevel enum

    service = webdriver.EdgeService(service_args=['--log-level=DEBUG'], log_output=log_path)
      service.args << '--log-level=DEBUG'

Log file features

There are 2 features that are only available when logging to a file:

  • append log
  • readable timestamps

To use them, you need to also explicitly specify the log path and log level. The log output will be managed by the driver, not the process, so minor differences may be seen.

    EdgeDriverService service =
        new EdgeDriverService.Builder().withAppendLog(true).withReadableTimestamp(true).build();

Note: Java also allows toggling these features by System Property:
Property keys: EdgeDriverService.EDGE_DRIVER_APPEND_LOG_PROPERTY and EdgeDriverService.EDGE_DRIVER_READABLE_TIMESTAMP
Property value: "true" or "false"

    service = webdriver.EdgeService(service_args=['--append-log', '--readable-timestamp'], log_output=log_path)
      service.args << '--append-log'
      service.args << '--readable-timestamp'

Disabling build check

Edge browser and msedgedriver versions should match, and if they don’t the driver will error. If you disable the build check, you can force the driver to be used with any version of Edge. Note that this is an unsupported feature, and bugs will not be investigated.

    EdgeDriverService service =
        new EdgeDriverService.Builder().withBuildCheckDisabled(true).build();

Note: Java also allows disabling build checks by System Property:
Property key: EdgeDriverService.EDGE_DRIVER_DISABLE_BUILD_CHECK
Property value: "true" or "false"

    service = webdriver.EdgeService(service_args=['--disable-build-check'], log_output=log_path)
            service.DisableBuildCheck = true;
      service.args << '--disable-build-check'

Internet Explorer 兼容模式

微软Edge可以被"Internet Explorer兼容模式"驱动, 该模式使用Internet Explorer驱动类与微软Edge结合使用. 阅读 Internet Explorer 页面 了解更多详情.

Special Features

Some browsers have implemented additional features that are unique to them.

Casting

You can drive Chrome Cast devices with Edge, including sharing tabs

Network conditions

You can simulate various network conditions.

Logs

Permissions

DevTools

See the Chrome DevTools section for more information about using DevTools in Edge

2.3.3 - Firefox specific functionality

These are capabilities and features specific to Mozilla Firefox browsers.

Selenium 4 requires Firefox 78 or greater. It is recommended to always use the latest version of geckodriver.

Options

Capabilities common to all browsers are described on the Options page.

Capabilities unique to Firefox can be found at Mozilla’s page for firefoxOptions

Starting a Firefox session with basic defined options looks like this:

    FirefoxOptions options = new FirefoxOptions();
    driver = new FirefoxDriver(options);
    options = webdriver.FirefoxOptions()
    driver = webdriver.Firefox(options=options)
            var options = new FirefoxOptions();
            driver = new FirefoxDriver(options);
      options = Selenium::WebDriver::Options.firefox
      @driver = Selenium::WebDriver.for :firefox, options: options
      let options = new firefox.Options();
      driver = await env.builder()
        .setFirefoxOptions(options)
        .build();

Here are a few common use cases with different capabilities:

Arguments

The args parameter is for a list of Command line switches used when starting the browser.
Commonly used args include -headless and "-profile", "/path/to/profile"

Add an argument to options:

    options.addArguments("-headless");
    options.add_argument("-headless")
            options.AddArgument("-headless");
      options.args << '-headless'
    let driver = await env.builder()
      .setFirefoxOptions(options.addArguments('--headless'))
      .build();

Start browser in a specified location

The binary parameter takes the path of an alternate location of browser to use. For example, with this parameter you can use geckodriver to drive Firefox Nightly instead of the production version when both are present on your computer.

Add a browser location to options:

    options.setBinary(getFirefoxLocation());
    options.binary_location = firefox_bin
            options.BinaryLocation = GetFirefoxLocation();
      options.binary = firefox_location

Profiles

There are several ways to work with Firefox profiles

FirefoxProfile profile = new FirefoxProfile();
FirefoxOptions options = new FirefoxOptions();
options.setProfile(profile);
driver = new RemoteWebDriver(options);
  
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
options=Options()
firefox_profile = FirefoxProfile()
firefox_profile.set_preference("javascript.enabled", False)
options.profile = firefox_profile
  
var options = new FirefoxOptions();
var profile = new FirefoxProfile();
options.Profile = profile;
var driver = new RemoteWebDriver(options);
  
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.dir'] = "/tmp/webdriver-downloads"
options = Selenium::WebDriver::Firefox::Options.new(profile: profile)
driver = Selenium::WebDriver.for :firefox, options: options
  
const { Builder } = require("selenium-webdriver");
const firefox = require('selenium-webdriver/firefox');

const options = new firefox.Options();
let profile = '/path to custom profile';
options.setProfile(profile);
const driver = new Builder()
    .forBrowser('firefox')
    .setFirefoxOptions(options)
    .build();
  
val options = FirefoxOptions()
options.profile = FirefoxProfile()
driver = RemoteWebDriver(options)
  

Service

Service settings common to all browsers are described on the Service page.

Log output

Getting driver logs can be helpful for debugging various issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

    FirefoxDriverService service =
        new GeckoDriverService.Builder().withLogFile(logLocation).build();

Note: Java also allows setting file output by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_LOG_PROPERTY
Property value: String representing path to log file

    service = webdriver.FirefoxService(log_output=log_path, service_args=['--log', 'debug'])
      service.log = file_name

Console output

To change the logging output to display in the console:

    FirefoxDriverService service =
        new GeckoDriverService.Builder().withLogOutput(System.out).build();

Note: Java also allows setting console output by System Property;
Property key: GeckoDriverService.GECKO_DRIVER_LOG_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

    service = webdriver.FirefoxService(log_output=subprocess.STDOUT)
      service.log = $stdout

Log level

There are 7 available log levels: fatal, error, warn, info, config, debug, trace. If logging is specified the level defaults to info.

Note that -v is equivalent to -log debug and -vv is equivalent to log trace, so this examples is just for setting the log level generically:

    FirefoxDriverService service =
        new GeckoDriverService.Builder().withLogLevel(FirefoxDriverLogLevel.DEBUG).build();

Note: Java also allows setting log level by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_LOG_LEVEL_PROPERTY
Property value: String representation of FirefoxDriverLogLevel enum

    service = webdriver.FirefoxService(log_output=log_path, service_args=['--log', 'debug'])
      service.args += %w[--log debug]

Truncated Logs

The driver logs everything that gets sent to it, including string representations of large binaries, so Firefox truncates lines by default. To turn off truncation:

    FirefoxDriverService service =
        new GeckoDriverService.Builder().withTruncatedLogs(false).build();

Note: Java also allows setting log level by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_LOG_NO_TRUNCATE
Property value: "true" or "false"

    service = webdriver.FirefoxService(service_args=['--log-no-truncate', '--log', 'debug'], log_output=log_path)
      service.args << '--log-no-truncate'

Profile Root

The default directory for profiles is the system temporary directory. If you do not have access to that directory, or want profiles to be created some place specific, you can change the profile root directory:

    FirefoxDriverService service =
        new GeckoDriverService.Builder().withProfileRoot(profileDirectory).build();

Note: Java also allows setting log level by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_PROFILE_ROOT
Property value: String representing path to profile root directory

    service = webdriver.FirefoxService(service_args=['--profile-root', temp_dir])
      service.args += ['--profile-root', root_directory]

Special Features

Add-ons

Unlike Chrome, Firefox extensions are not added as part of capabilities as mentioned in this issue, they are created after starting the driver.

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

Installation

A signed xpi file you would get from Mozilla Addon page

    driver.installExtension(xpiPath);
    driver.install_addon(addon_path)
            driver.InstallAddOnFromFile(Path.GetFullPath(extensionFilePath));
      driver.install_addon(extension_file_path)
    const xpiPath = path.resolve('./test/resources/extensions/selenium-example.xpi')
    let driver = await env.builder().build();
    let id = await driver.installAddon(xpiPath);

Uninstallation

Uninstalling an addon requires knowing its id. The id can be obtained from the return value when installing the add-on.

    driver.uninstallExtension(id);
    driver.uninstall_addon(id)
            driver.UninstallAddOn(extensionId);
      driver.uninstall_addon(extension_id)
    const xpiPath = path.resolve('./test/resources/extensions/selenium-example.xpi')
    let driver = await env.builder().build();
    let id = await driver.installAddon(xpiPath);
    await driver.uninstallAddon(id);

Unsigned installation

When working with an unfinished or unpublished extension, it will likely not be signed. As such, it can only be installed as “temporary.” This can be done by passing in either a zip file or a directory, here’s an example with a directory:

    driver.installExtension(path, true);
    driver.install_addon(addon_path, temporary=True)
            driver.InstallAddOnFromDirectory(Path.GetFullPath(extensionDirPath), true);
      driver.install_addon(extension_dir_path, true)

Full page screenshots

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

Context

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

2.3.4 - IE specific functionality

These are capabilities and features specific to Microsoft Internet Explorer browsers.

As of June 2022, Selenium officially no longer supports standalone Internet Explorer. The Internet Explorer driver still supports running Microsoft Edge in “IE Compatibility Mode.”

Special considerations

The IE Driver is the only driver maintained by the Selenium Project directly. While binaries for both the 32-bit and 64-bit versions of Internet Explorer are available, there are some known limitations with the 64-bit driver. As such it is recommended to use the 32-bit driver.

Additional information about using Internet Explorer can be found on the IE Driver Server page

Options

Starting a Microsoft Edge browser in Internet Explorer Compatibility mode with basic defined options looks like this:

        InternetExplorerOptions options = new InternetExplorerOptions();
        options.attachToEdgeChrome();
        options.withEdgeExecutablePath(getEdgeLocation());
        driver = new InternetExplorerDriver(options);
    options = webdriver.IeOptions()
    options.attach_to_edge_chrome = True
    options.edge_executable_path = edge_bin
    driver = webdriver.Ie(options=options)
            var options = new InternetExplorerOptions();
            options.AttachToEdgeChrome = true;
            options.EdgeExecutablePath = GetEdgeLocation();
            driver = new InternetExplorerDriver(options);
      options = Selenium::WebDriver::Options.ie
      options.attach_to_edge_chrome = true
      options.edge_executable_path = edge_location
      @driver = Selenium::WebDriver.for :ie, options: options

As of Internet Explorer Driver v4.5.0:

  • If IE is not present on the system (default in Windows 11), you do not need to use the two parameters above. IE Driver will use Edge and will automatically locate it.
  • If IE and Edge are both present on the system, you only need to set attaching to Edge, IE Driver will automatically locate Edge on your system.

So, if IE is not on the system, you only need:

        InternetExplorerOptions options = new InternetExplorerOptions();
        driver = new InternetExplorerDriver(options);
    options = webdriver.IeOptions()
    driver = webdriver.Ie(options=options)
            var options = new InternetExplorerOptions();
            driver = new InternetExplorerDriver(options);
      options = Selenium::WebDriver::Options.ie
      @driver = Selenium::WebDriver.for :ie, options: options
let driver = await new Builder()
.forBrowser('internet explorer')
.setIEOptions(options)
.build();
<p><a href=/documentation/about/contributing/#moving-examples>
<span class="selenium-badge-code" data-bs-toggle="tooltip" data-bs-placement="right"
      title="One or more of these examples need to be implemented in the examples directory; click for details in the contribution guide">Move Code</span></a></p>


val options = InternetExplorerOptions()
val driver = InternetExplorerDriver(options)

Here are a few common use cases with different capabilities:

fileUploadDialogTimeout

在某些环境中, 当打开文件上传对话框时, Internet Explorer可能会超时. IEDriver的默认超时为1000毫秒, 但您可以使用fileUploadDialogTimeout功能来增加超时时间.

InternetExplorerOptions options = new InternetExplorerOptions();
options.waitForUploadDialogUpTo(Duration.ofSeconds(2));
WebDriver driver = new RemoteWebDriver(options); 
  
from selenium import webdriver

options = webdriver.IeOptions()
options.file_upload_dialog_timeout = 2000
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.FileUploadDialogTimeout = TimeSpan.FromMilliseconds(2000);
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.file_upload_dialog_timeout = 2000
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().fileUploadDialogTimeout(2000);
let driver = await Builder()
          .setIeOptions(options)
          .build();  
  
val options = InternetExplorerOptions()
options.waitForUploadDialogUpTo(Duration.ofSeconds(2))
val driver = RemoteWebDriver(options)
  

ensureCleanSession

设置为 true时, 此功能将清除InternetExplorer所有正在运行实例的 缓存, 浏览器历史记录和Cookies (包括手动启动或由驱动程序启动的实例) . 默认情况下,此设置为 false.

使用此功能将导致启动浏览器时性能下降, 因为驱动程序将等待直到缓存清除后再启动IE浏览器.

此功能接受一个布尔值作为参数.

InternetExplorerOptions options = new InternetExplorerOptions();
options.destructivelyEnsureCleanSession();
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.ensure_clean_session = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.EnsureCleanSession = true;
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.ensure_clean_session = true
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().ensureCleanSession(true);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.destructivelyEnsureCleanSession()
val driver = RemoteWebDriver(options)
  

ignoreZoomSetting

InternetExplorer驱动程序期望浏览器的缩放级别为100%, 否则驱动程序将可能抛出异常. 通过将 ignoreZoomSetting 设置为 true, 可以禁用此默认行为.

此功能接受一个布尔值作为参数.

InternetExplorerOptions options = new InternetExplorerOptions();
options.ignoreZoomSettings();
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.ignore_zoom_level = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.IgnoreZoomLevel = true;
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.ignore_zoom_level = true
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().ignoreZoomSetting(true);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.ignoreZoomSettings()
val driver = RemoteWebDriver(options)
  

ignoreProtectedModeSettings

启动新的IE会话时是否跳过 保护模式 检查.

如果未设置, 并且所有区域的 保护模式 设置都不同, 则驱动程序将可能引发异常.

如果将功能设置为 true, 则测试可能会变得不稳定, 无响应, 或者浏览器可能会挂起. 但是, 到目前为止, 这仍然是第二好的选择, 并且第一选择应该 始终 是手动实际设置每个区域的保护模式设置. 如果用户正在使用此属性, 则只会给予 “尽力而为” 的支持.

此功能接受一个布尔值作为参数.

InternetExplorerOptions options = new InternetExplorerOptions();
options.introduceFlakinessByIgnoringSecurityDomains();
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.ignore_protected_mode_settings = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.IntroduceInstabilityByIgnoringProtectedModeSettings = true;
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.ignore_protected_mode_settings = true
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().introduceFlakinessByIgnoringProtectedModeSettings(true);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.introduceFlakinessByIgnoringSecurityDomains()
val driver = RemoteWebDriver(options)
  

silent

设置为 true时, 此功能将禁止IEDriverServer的诊断输出.

此功能接受一个布尔值作为参数.

InternetExplorerOptions options = new InternetExplorerOptions();
options.setCapability("silent", true);
WebDriver driver = new InternetExplorerDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.set_capability("silent", True)
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
InternetExplorerOptions options = new InternetExplorerOptions();
options.AddAdditionalInternetExplorerOption("silent", true);
IWebDriver driver = new InternetExplorerDriver(options);
  
    
    
    
    
    <p><a href=/documentation/about/contributing/#creating-examples>
    <span class="selenium-badge-code" data-bs-toggle="tooltip" data-bs-placement="right"
          title="This code example is missing. Examples are added to the examples directory; click for details in the contribution guide">Add Example</span></a></p>
    

  
const {Builder,By, Capabilities} = require('selenium-webdriver');
let caps = Capabilities.ie();
caps.set('silent', true);

(async function example() {
    let driver = await new Builder()
        .forBrowser('internet explorer')
        .withCapabilities(caps)
        .build();
    try {
        await driver.get('http://www.google.com/ncr');
    }
    finally {
        await driver.quit();
    }
})();
  
import org.openqa.selenium.Capabilities
import org.openqa.selenium.ie.InternetExplorerDriver
import org.openqa.selenium.ie.InternetExplorerOptions

fun main() {
    val options = InternetExplorerOptions()
    options.setCapability("silent", true)
    val driver = InternetExplorerDriver(options)
    try {
        driver.get("https://google.com/ncr")
        val caps = driver.getCapabilities()
        println(caps)
    } finally {
        driver.quit()
    }
}
  

IE 命令行选项

Internet Explorer包含几个命令行选项, 使您可以进行故障排除和配置浏览器.

下面介绍了一些受支持的命令行选项

  • -private : 用于在私有浏览模式下启动IE. 这适用于IE 8和更高版本.

  • -k : 在kiosk模式下启动Internet Explorer. 浏览器在一个最大化的窗口中打开, 该窗口不显示地址栏, 导航按钮或状态栏.

  • -extoff : 在无附加模式下启动IE. 此选项专门用于解决浏览器加载项问题. 在IE 7和更高版本中均可使用.

注意: forceCreateProcessApi 应该启用命令行参数才能正常工作.

import org.openqa.selenium.Capabilities;
import org.openqa.selenium.ie.InternetExplorerDriver;
import org.openqa.selenium.ie.InternetExplorerOptions;

public class ieTest {
    public static void main(String[] args) {
        InternetExplorerOptions options = new InternetExplorerOptions();
        options.useCreateProcessApiToLaunchIe();
        options.addCommandSwitches("-k");
        InternetExplorerDriver driver = new InternetExplorerDriver(options);
        try {
            driver.get("https://google.com/ncr");
            Capabilities caps = driver.getCapabilities();
            System.out.println(caps);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

options = webdriver.IeOptions()
options.add_argument('-private')
options.force_create_process_api = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.IE;

namespace ieTest {
 class Program {
  static void Main(string[] args) {
   InternetExplorerOptions options = new InternetExplorerOptions();
   options.ForceCreateProcessApi = true;
   options.BrowserCommandLineArguments = "-k";
   IWebDriver driver = new InternetExplorerDriver(options);
   driver.Url = "https://google.com/ncr";
  }
 }
}
  
require 'selenium-webdriver'
options = Selenium::WebDriver::IE::Options.new
options.force_create_process_api = true
options.add_argument('-k')
driver = Selenium::WebDriver.for(:ie, options: options)

begin
  driver.get 'https://google.com'
  puts(driver.capabilities.to_json)
ensure
  driver.quit
end
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options();
options.addBrowserCommandSwitches('-k');
options.addBrowserCommandSwitches('-private');
options.forceCreateProcessApi(true);

driver = await env.builder()
          .setIeOptions(options)
          .build();
  
import org.openqa.selenium.Capabilities
import org.openqa.selenium.ie.InternetExplorerDriver
import org.openqa.selenium.ie.InternetExplorerOptions

fun main() {
    val options = InternetExplorerOptions()
    options.useCreateProcessApiToLaunchIe()
    options.addCommandSwitches("-k")
    val driver = InternetExplorerDriver(options)
    try {
        driver.get("https://google.com/ncr")
        val caps = driver.getCapabilities()
        println(caps)
    } finally {
        driver.quit()
    }
}
  

forceCreateProcessApi

强制使用CreateProcess API启动Internet Explorer. 默认值为false.

对于IE 8及更高版本, 此选项要求将 “TabProcGrowth” 注册表值设置为0.

import org.openqa.selenium.Capabilities;
import org.openqa.selenium.ie.InternetExplorerDriver;
import org.openqa.selenium.ie.InternetExplorerOptions;

public class ieTest {
    public static void main(String[] args) {
        InternetExplorerOptions options = new InternetExplorerOptions();
        options.useCreateProcessApiToLaunchIe();
        InternetExplorerDriver driver = new InternetExplorerDriver(options);
        try {
            driver.get("https://google.com/ncr");
            Capabilities caps = driver.getCapabilities();
            System.out.println(caps);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

options = webdriver.IeOptions()
options.force_create_process_api = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.IE;

namespace ieTest {
 class Program {
  static void Main(string[] args) {
   InternetExplorerOptions options = new InternetExplorerOptions();
   options.ForceCreateProcessApi = true;
   IWebDriver driver = new InternetExplorerDriver(options);
   driver.Url = "https://google.com/ncr";
  }
 }
}
  
require 'selenium-webdriver'
options = Selenium::WebDriver::IE::Options.new
options.force_create_process_api = true
driver = Selenium::WebDriver.for(:ie, options: options)

begin
  driver.get 'https://google.com'
  puts(driver.capabilities.to_json)
ensure
  driver.quit
end
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options();
options.forceCreateProcessApi(true);

driver = await env.builder()
          .setIeOptions(options)
          .build();
  
import org.openqa.selenium.Capabilities
import org.openqa.selenium.ie.InternetExplorerDriver
import org.openqa.selenium.ie.InternetExplorerOptions

fun main() {
    val options = InternetExplorerOptions()
    options.useCreateProcessApiToLaunchIe()
    val driver = InternetExplorerDriver(options)
    try {
        driver.get("https://google.com/ncr")
        val caps = driver.getCapabilities()
        println(caps)
    } finally {
        driver.quit()
    }
}
  

Service

Service settings common to all browsers are described on the Service page.

Log output

Getting driver logs can be helpful for debugging various issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

                .withLogFile(getLogLocation())

Note: Java also allows setting file output by System Property:
Property key: InternetExplorerDriverService.IE_DRIVER_LOGFILE_PROPERTY
Property value: String representing path to log file

    service = webdriver.IeService(log_output=log_path, log_level='INFO')
      service.log = file_name

Console output

To change the logging output to display in the console as STDOUT:

                .withLogOutput(System.out)

Note: Java also allows setting console output by System Property;
Property key: InternetExplorerDriverService.IE_DRIVER_LOGFILE_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

    service = webdriver.IeService(log_output=subprocess.STDOUT)
      service.log = $stdout

Log Level

There are 6 available log levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE If logging output is specified, the default level is FATAL

                .withLogLevel(InternetExplorerDriverLogLevel.WARN)

Note: Java also allows setting log level by System Property:
Property key: InternetExplorerDriverService.IE_DRIVER_LOGLEVEL_PROPERTY
Property value: String representation of InternetExplorerDriverLogLevel.DEBUG.toString() enum

    service = webdriver.IeService(log_output=log_path, log_level='WARN')
            service.LoggingLevel = InternetExplorerDriverLogLevel.Warn;
      service.args << '-log-level=WARN'

Supporting Files Path

                .withExtractPath(getTempDirectory())
**Note**: Java also allows setting log level by System Property:\ Property key: `InternetExplorerDriverService.IE_DRIVER_EXTRACT_PATH_PROPERTY`\ Property value: String representing path to supporting files directory
    service = webdriver.IeService(service_args=["–extract-path="+temp_dir])
            service.LibraryExtractionPath = GetTempDirectory();
      service.args << "–extract-path=#{root_directory}"

2.3.5 - Safari 特定功能

这些是特定于Apple Safari浏览器的功能和特性.

与Chromium和Firefox驱动不同, safari驱动随操作系统安装. 要在 Safari 上启用自动化, 请从终端运行以下命令:

safaridriver --enable

选项

所有浏览器通用的Capabilities在选项页.

Safari独有的Capabilities可以在Apple的页面关于Safari的WebDriver 上找到

使用基本定义的选项启动 Safari 会话如下所示:

        SafariOptions options = new SafariOptions();
        driver = new SafariDriver(options);
    options = webdriver.SafariOptions()
    driver = webdriver.Safari(options=options)
            var options = new SafariOptions();
            driver = new SafariDriver(options);
      options = Selenium::WebDriver::Options.safari
      @driver = Selenium::WebDriver.for :safari, options: options
      let driver = await env.builder()
      .setSafariOptions(options)
      .build();
val options = SafariOptions()
val driver = SafariDriver(options)

移动端

那些希望在iOS上自动化Safari的人可以参考 Appium project.

Service

Service settings common to all browsers are described on the Service page.

Logging

Unlike other browsers, Safari doesn’t let you choose where logs are output, or change levels. The one option available is to turn logs off or on. If logs are toggled on, they can be found at:~/Library/Logs/com.apple.WebDriver/.

                .withLogging(true)

Note: Java also allows setting console output by System Property;
Property key: SafariDriverService.SAFARI_DRIVER_LOGGING
Property value: "true" or "false"

    service = webdriver.SafariService(service_args=["--diagnose"])
      service.args << '--diagnose'

Safari Technology Preview

Apple provides a development version of their browser — Safari Technology Preview To use this version in your code:

2.4 - 等待

Perhaps the most common challenge for browser automation is ensuring that the web application is in a state to execute a particular Selenium command as desired. The processes often end up in a race condition where sometimes the browser gets into the right state first (things work as intended) and sometimes the Selenium code executes first (things do not work as intended). This is one of the primary causes of flaky tests.

All navigation commands wait for a specific readyState value based on the page load strategy (the default value to wait for is "complete") before the driver returns control to the code. The readyState only concerns itself with loading assets defined in the HTML, but loaded JavaScript assets often result in changes to the site, and elements that need to be interacted with may not yet be on the page when the code is ready to execute the next Selenium command.

Similarly, in a lot of single page applications, elements get dynamically added to a page or change visibility based on a click. An element must be both present and displayed on the page in order for Selenium to interact with it.

Take this page for example: https://www.selenium.dev/selenium/web/dynamic.html When the “Add a box!” button is clicked, a “div” element that does not exist is created. When the “Reveal a new input” button is clicked, a hidden text field element is displayed. In both cases the transition takes a couple seconds. If the Selenium code is to click one of these buttons and interact with the resulting element, it will do so before that element is ready and fail.

The first solution many people turn to is adding a sleep statement to pause the code execution for a set period of time. Because the code can’t know exactly how long it needs to wait, this can fail when it doesn’t sleep long enough. Alternately, if the value is set too high and a sleep statement is added in every place it is needed, the duration of the session can become prohibitive.

Selenium provides two different mechanisms for synchronization that are better.

Implicit waits

Selenium has a built-in way to automatically wait for elements called an implicit wait. An implicit wait value can be set either with the timeouts capability in the browser options, or with a driver method (as shown below).

This is a global setting that applies to every element location call for the entire session. The default value is 0, which means that if the element is not found, it will immediately return an error. If an implicit wait is set, the driver will wait for the duration of the provided value before returning the error. Note that as soon as the element is located, the driver will return the element reference and the code will continue executing, so a larger implicit wait value won’t necessarily increase the duration of the session.

Warning: Do not mix implicit and explicit waits. Doing so can cause unpredictable wait times. For example, setting an implicit wait of 10 seconds and an explicit wait of 15 seconds could cause a timeout to occur after 20 seconds.

Solving our example with an implicit wait looks like this:

    driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(2));
    driver.implicitly_wait(2)
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(2);
    driver.manage.timeouts.implicit_wait = 2
            await driver.manage().setTimeouts({ implicit: 2000 });
            await driver.get('https://www.selenium.dev/selenium/web/dynamic.html');
            await driver.findElement(By.id("adder")).click();

            let added = await driver.findElement(By.id("box0"));

Explicit waits

Explicit waits are loops added to the code that poll the application for a specific condition to evaluate as true before it exits the loop and continues to the next command in the code. If the condition is not met before a designated timeout value, the code will give a timeout error. Since there are many ways for the application not to be in the desired state, so explicit waits are a great choice to specify the exact condition to wait for in each place it is needed. Another nice feature is that, by default, the Selenium Wait class automatically waits for the designated element to exist.

This example shows the condition being waited for as a lambda. Java also supports Expected Conditions

    Wait<WebDriver> wait = new WebDriverWait(driver, Duration.ofSeconds(2));
    wait.until(d -> revealed.isDisplayed());

This example shows the condition being waited for as a lambda. Python also supports Expected Conditions

    wait = WebDriverWait(driver, timeout=2)
    wait.until(lambda d : revealed.is_displayed())
            WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(2));
            wait.Until(d => revealed.Displayed);
    wait = Selenium::WebDriver::Wait.new
    wait.until { revealed.displayed? }

JavaScript also supports Expected Conditions

            await driver.get('https://www.selenium.dev/selenium/web/dynamic.html');
            let revealed = await driver.findElement(By.id("revealed"));
            await driver.findElement(By.id("reveal")).click();
            await driver.wait(until.elementIsVisible(revealed), 2000);
            await revealed.sendKeys("Displayed");

Customization

The Wait class can be instantiated with various parameters that will change how the conditions are evaluated.

This can include:

  • Changing how often the code is evaluated (polling interval)
  • Specifying which exceptions should be handled automatically
  • Changing the total timeout length
  • Customizing the timeout message

For instance, if the element not interactable error is retried by default, then we can add an action on a method inside the code getting executed (we just need to make sure that the code returns true when it is successful):

The easiest way to customize Waits in Java is to use the FluentWait class:

    Wait<WebDriver> wait =
        new FluentWait<>(driver)
            .withTimeout(Duration.ofSeconds(2))
            .pollingEvery(Duration.ofMillis(300))
            .ignoring(ElementNotInteractableException.class);

    wait.until(
        d -> {
          revealed.sendKeys("Displayed");
          return true;
        });
    errors = [NoSuchElementException, ElementNotInteractableException]
    wait = WebDriverWait(driver, timeout=2, poll_frequency=.2, ignored_exceptions=errors)
    wait.until(lambda d : revealed.send_keys("Displayed") or True)
            WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(2))
            {
                PollingInterval = TimeSpan.FromMilliseconds(300),
            };
            wait.IgnoreExceptionTypes(typeof(ElementNotInteractableException));

            wait.Until(d => {
                revealed.SendKeys("Displayed");
                return true;
            });
    errors = [Selenium::WebDriver::Error::NoSuchElementError,
              Selenium::WebDriver::Error::ElementNotInteractableError]
    wait = Selenium::WebDriver::Wait.new(timeout: 2,
                                         interval: 0.3,
                                         ignore: errors)

    wait.until { revealed.send_keys('Displayed') || true }

2.5 - 网络元素

在DOM中识别和使用元素对象.

大多数人的Selenium代码都涉及使用web元素.

2.5.1 - 文件上传

Because Selenium cannot interact with the file upload dialog, it provides a way to upload files without opening the dialog. If the element is an input element with type file, you can use the send keys method to send the full path to the file that will be uploaded.

    WebElement fileInput = driver.findElement(By.cssSelector("input[type=file]"));
    fileInput.sendKeys(uploadFile.getAbsolutePath());
    driver.findElement(By.id("file-submit")).click();
    file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")
    file_input.send_keys(upload_file)
    driver.find_element(By.ID, "file-submit").click()
            IWebElement fileInput = driver.FindElement(By.CssSelector("input[type=file]"));
            fileInput.SendKeys(uploadFile);
            driver.FindElement(By.Id("file-submit")).Click();
    file_input = driver.find_element(css: 'input[type=file]')
    file_input.send_keys(upload_file)
    driver.find_element(id: 'file-submit').click
      await driver.findElement(By.id("file-upload")).sendKeys(image);
      await driver.findElement(By.id("file-submit")).submit();

Move Code

```java import org.openqa.selenium.By import org.openqa.selenium.chrome.ChromeDriver fun main() { val driver = ChromeDriver() driver.get("https://the-internet.herokuapp.com/upload") driver.findElement(By.id("file-upload")).sendKeys("selenium-snapshot.jpg") driver.findElement(By.id("file-submit")).submit() if(driver.pageSource.contains("File Uploaded!")) { println("file uploaded") } else{ println("file not uploaded") } } ```

2.5.2 - 查询网络元素

根据提供的定位值定位元素.

One of the most fundamental aspects of using Selenium is obtaining element references to work with. Selenium offers a number of built-in locator strategies to uniquely identify an element. There are many ways to use the locators in very advanced scenarios. For the purposes of this documentation, let’s consider this HTML snippet:

<ol id="vegetables">
 <li class="potatoes"> <li class="onions"> <li class="tomatoes"><span>Tomato is a Vegetable</span></ol>
<ul id="fruits">
  <li class="bananas">  <li class="apples">  <li class="tomatoes"><span>Tomato is a Fruit</span></ul>

First matching element

Many locators will match multiple elements on the page. The singular find element method will return a reference to the first element found within a given context.

Evaluating entire DOM

When the find element method is called on the driver instance, it returns a reference to the first element in the DOM that matches with the provided locator. This value can be stored and used for future element actions. In our example HTML above, there are two elements that have a class name of “tomatoes” so this method will return the element in the “vegetables” list.

WebElement vegetable = driver.findElement(By.className("tomatoes"));
  
vegetable = driver.find_element(By.CLASS_NAME, "tomatoes")
  
var vegetable = driver.FindElement(By.ClassName("tomatoes"));
  
vegetable = driver.find_element(class: 'tomatoes')
  
const vegetable = await driver.findElement(By.className('tomatoes'));
  
val vegetable: WebElement = driver.findElement(By.className("tomatoes"))
  

Evaluating a subset of the DOM

Rather than finding a unique locator in the entire DOM, it is often useful to narrow the search to the scope of another located element. In the above example there are two elements with a class name of “tomatoes” and it is a little more challenging to get the reference for the second one.

One solution is to locate an element with a unique attribute that is an ancestor of the desired element and not an ancestor of the undesired element, then call find element on that object:

WebElement fruits = driver.findElement(By.id("fruits"));
WebElement fruit = fruits.findElement(By.className("tomatoes"));
  
fruits = driver.find_element(By.ID, "fruits")
fruit = fruits.find_element(By.CLASS_NAME,"tomatoes")
  
IWebElement fruits = driver.FindElement(By.Id("fruits"));
IWebElement fruit = fruits.FindElement(By.ClassName("tomatoes"));
  
fruits = driver.find_element(id: 'fruits')
fruit = fruits.find_element(class: 'tomatoes')
  
const fruits = await driver.findElement(By.id('fruits'));
const fruit = fruits.findElement(By.className('tomatoes'));
  
val fruits = driver.findElement(By.id("fruits"))
val fruit = fruits.findElement(By.className("tomatoes"))
  

Java and C#
WebDriver, WebElement and ShadowRoot classes all implement a SearchContext interface, which is considered a role-based interface. Role-based interfaces allow you to determine whether a particular driver implementation supports a given feature. These interfaces are clearly defined and try to adhere to having only a single role of responsibility.

Optimized locator

A nested lookup might not be the most effective location strategy since it requires two separate commands to be issued to the browser.

To improve the performance slightly, we can use either CSS or XPath to find this element in a single command. See the Locator strategy suggestions in our Encouraged test practices section.

For this example, we’ll use a CSS Selector:

WebElement fruit = driver.findElement(By.cssSelector("#fruits .tomatoes"));
  
fruit = driver.find_element(By.CSS_SELECTOR,"#fruits .tomatoes")
  
var fruit = driver.FindElement(By.CssSelector("#fruits .tomatoes"));
  
fruit = driver.find_element(css: '#fruits .tomatoes')
  
const fruit = await driver.findElement(By.css('#fruits .tomatoes'));
  
val fruit = driver.findElement(By.cssSelector("#fruits .tomatoes"))
  

All matching elements

There are several use cases for needing to get references to all elements that match a locator, rather than just the first one. The plural find elements methods return a collection of element references. If there are no matches, an empty list is returned. In this case, references to all fruits and vegetable list items will be returned in a collection.

List<WebElement> plants = driver.findElements(By.tagName("li"));
  
plants = driver.find_elements(By.TAG_NAME, "li")
  
IReadOnlyList<IWebElement> plants = driver.FindElements(By.TagName("li"));
  
plants = driver.find_elements(tag_name: 'li')
  
const plants = await driver.findElements(By.tagName('li'));
  
val plants: List<WebElement> = driver.findElements(By.tagName("li"))
  

Get element

Often you get a collection of elements but want to work with a specific element, which means you need to iterate over the collection and identify the one you want.

List<WebElement> elements = driver.findElements(By.tagName("li"));

for (WebElement element : elements) {
    System.out.println("Paragraph text:" + element.getText());
}
  
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()

    # Navigate to Url
driver.get("https://www.example.com")

    # Get all the elements available with tag name 'p'
elements = driver.find_elements(By.TAG_NAME, 'p')

for e in elements:
    print(e.text)
  
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using System.Collections.Generic;

namespace FindElementsExample {
 class FindElementsExample {
  public static void Main(string[] args) {
   IWebDriver driver = new FirefoxDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");

    // Get all the elements available with tag name 'p'
    IList < IWebElement > elements = driver.FindElements(By.TagName("p"));
    foreach(IWebElement e in elements) {
     System.Console.WriteLine(e.Text);
    }

   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :firefox
begin
     # Navigate to URL
  driver.get 'https://www.example.com'

     # Get all the elements available with tag name 'p'
  elements = driver.find_elements(:tag_name,'p')

  elements.each { |e|
    puts e.text
  }
ensure
  driver.quit
end
  
const {Builder, By} = require('selenium-webdriver');
(async function example() {
    let driver = await new Builder().forBrowser('firefox').build();
    try {
        // Navigate to Url
        await driver.get('https://www.example.com');

        // Get all the elements available with tag 'p'
        let elements = await driver.findElements(By.css('p'));
        for(let e of elements) {
            console.log(await e.getText());
        }
    }
    finally {
        await driver.quit();
    }
})();
  
import org.openqa.selenium.By
import org.openqa.selenium.firefox.FirefoxDriver

fun main() {
    val driver = FirefoxDriver()
    try {
        driver.get("https://example.com")
        // Get all the elements available with tag name 'p'
        val elements = driver.findElements(By.tagName("p"))
        for (element in elements) {
            println("Paragraph text:" + element.text)
        }
    } finally {
        driver.quit()
    }
}
  

Find Elements From Element

It is used to find the list of matching child WebElements within the context of parent element. To achieve this, the parent WebElement is chained with ‘findElements’ to access child elements

  import org.openqa.selenium.By;
  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.WebElement;
  import org.openqa.selenium.chrome.ChromeDriver;
  import java.util.List;

  public class findElementsFromElement {
      public static void main(String[] args) {
          WebDriver driver = new ChromeDriver();
          try {
              driver.get("https://example.com");

              // Get element with tag name 'div'
              WebElement element = driver.findElement(By.tagName("div"));

              // Get all the elements available with tag name 'p'
              List<WebElement> elements = element.findElements(By.tagName("p"));
              for (WebElement e : elements) {
                  System.out.println(e.getText());
              }
          } finally {
              driver.quit();
          }
      }
  }
  
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.example.com")

    # Get element with tag name 'div'
element = driver.find_element(By.TAG_NAME, 'div')

    # Get all the elements available with tag name 'p'
elements = element.find_elements(By.TAG_NAME, 'p')
for e in elements:
    print(e.text)
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System.Collections.Generic;

namespace FindElementsFromElement {
 class FindElementsFromElement {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    driver.Navigate().GoToUrl("https://example.com");

    // Get element with tag name 'div'
    IWebElement element = driver.FindElement(By.TagName("div"));

    // Get all the elements available with tag name 'p'
    IList < IWebElement > elements = element.FindElements(By.TagName("p"));
    foreach(IWebElement e in elements) {
     System.Console.WriteLine(e.Text);
    }
   } finally {
    driver.Quit();
   }
  }
 }
}
  
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for :chrome
  begin
    # Navigate to URL
    driver.get 'https://www.example.com'

    # Get element with tag name 'div'
    element = driver.find_element(:tag_name,'div')

    # Get all the elements available with tag name 'p'
    elements = element.find_elements(:tag_name,'p')

    elements.each { |e|
      puts e.text
    }
  ensure
    driver.quit
  end
  
  const {Builder, By} = require('selenium-webdriver');

  (async function example() {
      let driver = new Builder()
          .forBrowser('chrome')
          .build();

      await driver.get('https://www.example.com');

      // Get element with tag name 'div'
      let element = driver.findElement(By.css("div"));

      // Get all the elements available with tag name 'p'
      let elements = await element.findElements(By.css("p"));
      for(let e of elements) {
          console.log(await e.getText());
      }
  })();
  
  import org.openqa.selenium.By
  import org.openqa.selenium.chrome.ChromeDriver

  fun main() {
      val driver = ChromeDriver()
      try {
          driver.get("https://example.com")

          // Get element with tag name 'div'
          val element = driver.findElement(By.tagName("div"))

          // Get all the elements available with tag name 'p'
          val elements = element.findElements(By.tagName("p"))
          for (e in elements) {
              println(e.text)
          }
      } finally {
          driver.quit()
      }
  }
  

Get Active Element

It is used to track (or) find DOM element which has the focus in the current browsing context.

  import org.openqa.selenium.*;
  import org.openqa.selenium.chrome.ChromeDriver;

  public class activeElementTest {
    public static void main(String[] args) {
      WebDriver driver = new ChromeDriver();
      try {
        driver.get("http://www.google.com");
        driver.findElement(By.cssSelector("[name='q']")).sendKeys("webElement");

        // Get attribute of current active element
        String attr = driver.switchTo().activeElement().getAttribute("title");
        System.out.println(attr);
      } finally {
        driver.quit();
      }
    }
  }
  
  from selenium import webdriver
  from selenium.webdriver.common.by import By

  driver = webdriver.Chrome()
  driver.get("https://www.google.com")
  driver.find_element(By.CSS_SELECTOR, '[name="q"]').send_keys("webElement")

    # Get attribute of current active element
  attr = driver.switch_to.active_element.get_attribute("title")
  print(attr)
  
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    namespace ActiveElement {
     class ActiveElement {
      public static void Main(string[] args) {
       IWebDriver driver = new ChromeDriver();
       try {
        // Navigate to Url
        driver.Navigate().GoToUrl("https://www.google.com");
        driver.FindElement(By.CssSelector("[name='q']")).SendKeys("webElement");

        // Get attribute of current active element
        string attr = driver.SwitchTo().ActiveElement().GetAttribute("title");
        System.Console.WriteLine(attr);
       } finally {
        driver.Quit();
       }
      }
     }
    }
  
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for :chrome
  begin
    driver.get 'https://www.google.com'
    driver.find_element(css: '[name="q"]').send_keys('webElement')

    # Get attribute of current active element
    attr = driver.switch_to.active_element.attribute('title')
    puts attr
  ensure
    driver.quit
  end
  
  const {Builder, By} = require('selenium-webdriver');

  (async function example() {
      let driver = await new Builder().forBrowser('chrome').build();
      await driver.get('https://www.google.com');
      await  driver.findElement(By.css('[name="q"]')).sendKeys("webElement");

      // Get attribute of current active element
      let attr = await driver.switchTo().activeElement().getAttribute("title");
      console.log(`${attr}`)
  })();
  
  import org.openqa.selenium.By
  import org.openqa.selenium.chrome.ChromeDriver

  fun main() {
      val driver = ChromeDriver()
      try {
          driver.get("https://www.google.com")
          driver.findElement(By.cssSelector("[name='q']")).sendKeys("webElement")

          // Get attribute of current active element
          val attr = driver.switchTo().activeElement().getAttribute("title")
          print(attr)
      } finally {
          driver.quit()
      }
  }
  

2.5.3 - Web元素交互

用于操纵表单的高级指令集.

仅有五种基本命令可用于元素的操作:

  • 点击 (适用于任何元素)
  • 发送键位 (仅适用于文本字段和内容可编辑元素)
  • 清除 (仅适用于文本字段和内容可编辑元素)
  • 提交 (仅适用于表单元素)
  • 选择 (参见 选择列表元素)

附加验证

这些方法的设计目的是尽量模拟用户体验, 所以, 与 Actions接口 不同, 在指定制定操作之前, 会尝试执行两件事.

  1. 如果它确定元素在视口之外, 则会将元素滚动到视图中, 特别是将元素底部与视口底部对齐.
  2. 确保元素在执行操作之前是可交互的 . 这可能意味着滚动不成功, 或者该元素没有以其他方式显示.
    确定某个元素是否显示在页面上太难了 无法直接在webdriver规范中定义, 因此Selenium发送一个带有JavaScript原子的执行命令, 检查是否有可能阻止该元素显示. 如果确定某个元素不在视口中, 不显示, 不可 键盘交互, 或不可 指针交互, 则返回一个元素不可交互 错误.

点击

元素点击命令 执行在 元素中央. 如果元素中央由于某些原因被 遮挡 , Selenium将返回一个 元素点击中断 错误.

        driver.get("https://www.selenium.dev/selenium/web/inputs.html");

	    // Click on the element 
        WebElement checkInput=driver.findElement(By.name("checkbox_input"));
        checkInput.click();
    # Navigate to url
	driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Click on the element 
	driver.find_element(By.NAME, "color_input").click()
  
  // Navigate to Url
  driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

  // Click the element
  driver.FindElement(By.Name("color_input")).Click();
  
  
    # Navigate to URL
  driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Click the element
  driver.find_element(name: 'color_input').click

  
    // Navigate to Url
    await driver.get('https://www.selenium.dev/selenium/web/inputs.html');

    // Click the element
    await driver.findElement(By.name('color_input')).click();
  
  
    // Navigate to Url
    driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    // Click the element
    driver.findElement(By.name("color_input")).click();
  
  

发送键位

元素发送键位命令 将录入提供的键位到 可编辑的 元素. 通常, 这意味着元素是具有 文本 类型的表单的输入元素或具有 内容可编辑 属性的元素. 如果不可编辑, 则返回 无效元素状态 错误.

以下 是WebDriver支持的按键列表.

        // Clear field to empty it from any previous data
        WebElement emailInput=driver.findElement(By.name("email_input"));
        emailInput.clear();
	    //Enter Text
        String email="admin@localhost.dev";
	    emailInput.sendKeys(email);
    # Navigate to url
	driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Clear field to empty it from any previous data
	driver.find_element(By.NAME, "email_input").clear()

	# Enter Text
	driver.find_element(By.NAME, "email_input").send_keys("admin@localhost.dev" )

  
  // Navigate to Url
  driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

  // Clear field to empty it from any previous data
  driver.FindElement(By.Name("email_input")).Clear();
  
  //Enter Text
  driver.FindElement(By.Name("email_input")).SendKeys("admin@localhost.dev");
  
  
}
  
    # Navigate to URL
	driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Clear field to empty it from any previous data
	driver.find_element(name: 'email_input').clear
	
	# Enter Text
	driver.find_element(name: 'email_input').send_keys 'admin@localhost.dev'

  
    // Navigate to Url
    await driver.get('https://www.selenium.dev/selenium/web/inputs.html');

	//Clear field to empty it from any previous data
	await driver.findElement(By.name('email_input')).clear();

    // Enter text 
    await driver.findElement(By.name('email_input')).sendKeys('admin@localhost.dev');
  
  
  
    // Navigate to Url
    driver.get("https://www.selenium.dev/selenium/web/inputs.html")

	//Clear field to empty it from any previous data
	driver.findElement(By.name("email_input")).clear()
	
    // Enter text 
    driver.findElement(By.name("email_input")).sendKeys("admin@localhost.dev")
  
  

清除

元素清除命令 重置元素的内容. 这要求元素 可编辑, 且 可重置. 通常, 这意味着元素是具有 文本 类型的表单的输入元素或具有 内容可编辑 属性的元素. 如果不满足这些条件, 将返回 无效元素状态 错误.

        //Clear Element
        // Clear field to empty it from any previous data
        emailInput.clear();
    # Navigate to url
	driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Clear field to empty it from any previous data
	driver.find_element(By.NAME, "email_input").clear()

	
  
  // Navigate to Url
  driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

  // Clear field to empty it from any previous data
  driver.FindElement(By.Name("email_input")).Clear();
  
 
  
}
  
    # Navigate to URL
	driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Clear field to empty it from any previous data
	driver.find_element(name: 'email_input').clear

  
    // Navigate to Url
    await driver.get('https://www.selenium.dev/selenium/web/inputs.html');

	//Clear field to empty it from any previous data
	await driver.findElement(By.name('email_input')).clear();

   
  
  
    // Navigate to Url
    driver.get("https://www.selenium.dev/selenium/web/inputs.html")

	//Clear field to empty it from any previous data
	driver.findElement(By.name("email_input")).clear()
	
  
  

提交

在Selenium 4中, 不再通过单独的端点以及脚本执行的方法来实现. 因此, 建议不要使用此方法, 而是单击相应的表单提交按钮.

2.5.4 - 定位策略

在DOM中标识一个或多个特定元素的方法.

定位器是在页面上标识元素的一种方法。它是传送给 查找元素 方法的参数。

查看 鼓励测试练习 寻找 定位器的小技巧, 包含在查找方法中,不同时间,不同原因下,单独声明的定位器的使用方法。

元素选择策略

在 WebDriver 中有 8 种不同的内置元素定位策略:

定位器 Locator描述
class name定位class属性与搜索值匹配的元素(不允许使用复合类名)
css selector定位 CSS 选择器匹配的元素
id定位 id 属性与搜索值匹配的元素
name定位 name 属性与搜索值匹配的元素
link text定位link text可视文本与搜索值完全匹配的锚元素
partial link text定位link text可视文本部分与搜索值部分匹配的锚点元素。如果匹配多个元素,则只选择第一个元素。
tag name定位标签名称与搜索值匹配的元素
xpath定位与 XPath 表达式匹配的元素

Creating Locators

To work on a web element using Selenium, we need to first locate it on the web page. Selenium provides us above mentioned ways, using which we can locate element on the page. To understand and create locator we will use the following HTML snippet.

<html>
<body>
<style>
.information {
  background-color: white;
  color: black;
  padding: 10px;
}
</style>
<h2>Contact Selenium</h2>

<form action="/action_page.php">
  <input type="radio" name="gender" value="m" />Male &nbsp;
  <input type="radio" name="gender" value="f" />Female <br>
  <br>
  <label for="fname">First name:</label><br>
  <input class="information" type="text" id="fname" name="fname" value="Jane"><br><br>
  <label for="lname">Last name:</label><br>
  <input class="information" type="text" id="lname" name="lname" value="Doe"><br><br>
  <label for="newsletter">Newsletter:</label>
  <input type="checkbox" name="newsletter" value="1" /><br><br>
  <input type="submit" value="Submit">
</form> 

<p>To know more about Selenium, visit the official page 
<a href ="www.selenium.dev">Selenium Official Page</a> 
</p>

</body>
</html>

class name

The HTML page web element can have attribute class. We can see an example in the above shown HTML snippet. We can identify these elements using the class name locator available in Selenium.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.className("information"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.CLASS_NAME, "information")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.ClassName("information"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(class: 'information')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.className('information'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.className("information"))
  

css selector

CSS is the language used to style HTML pages. We can use css selector locator strategy to identify the element on the page. If the element has an id, we create the locator as css = #id. Otherwise the format we follow is css =[attribute=value] . Let us see an example from above HTML snippet. We will create locator for First Name textbox, using css.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.cssSelector("#fname"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.CSS_SELECTOR, "#fname")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.CssSelector("#fname"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(css: '#fname')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.css('#fname'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.css("#fname"))
  

id

We can use the ID attribute available with element in a web page to locate it. Generally the ID property should be unique for a element on the web page. We will identify the Last Name field using it.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.id("lname"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.ID, "lname")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.Id("lname"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(id: 'lname')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.id('lname'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.id("lname"))
  

name

We can use the NAME attribute available with element in a web page to locate it. Generally the NAME property should be unique for a element on the web page. We will identify the Newsletter checkbox using it.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.name("newsletter"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.NAME, "newsletter")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.Name("newsletter"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(name: 'newsletter')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.name('newsletter'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.name("newsletter"))
  

If the element we want to locate is a link, we can use the link text locator to identify it on the web page. The link text is the text displayed of the link. In the HTML snippet shared, we have a link available, lets see how will we locate it.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.linkText("Selenium Official Page"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.LINK_TEXT, "Selenium Official Page")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.LinkText("Selenium Official Page"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(link_text: 'Selenium Official Page')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.linkText('Selenium Official Page'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.linkText("Selenium Official Page"))
  

If the element we want to locate is a link, we can use the partial link text locator to identify it on the web page. The link text is the text displayed of the link. We can pass partial text as value. In the HTML snippet shared, we have a link available, lets see how will we locate it.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.partialLinkText("Official Page"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.PARTIAL_LINK_TEXT, "Official Page")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.PartialLinkText("Official Page"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(partial_link_text: 'Official Page')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.partialLinkText('Official Page'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.partialLinkText("Official Page"))
  

tag name

We can use the HTML TAG itself as a locator to identify the web element on the page. From the above HTML snippet shared, lets identify the link, using its html tag “a”.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.tagName("a"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.TAG_NAME, "a")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.TagName("a"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(tag_name: 'a')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.tagName('a'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.tagName("a"))
  

xpath

A HTML document can be considered as a XML document, and then we can use xpath which will be the path traversed to reach the element of interest to locate the element. The XPath could be absolute xpath, which is created from the root of the document. Example - /html/form/input[1]. This will return the male radio button. Or the xpath could be relative. Example- //input[@name=‘fname’]. This will return the first name text box. Let us create locator for female radio button using xpath.

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.xpath("//input[@value='f']"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.XPATH, "//input[@value='f']")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.Xpath("//input[@value='f']"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(xpath: '//input[@value='f']')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.xpath('//input[@value='f']'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.xpath('//input[@value='f']'))
  

Relative Locators

Selenium 4 introduces Relative Locators (previously called as Friendly Locators). These locators are helpful when it is not easy to construct a locator for the desired element, but easy to describe spatially where the element is in relation to an element that does have an easily constructed locator.

How it works

Selenium uses the JavaScript function getBoundingClientRect() to determine the size and position of elements on the page, and can use this information to locate neighboring elements.
find the relative elements.

Relative locator methods can take as the argument for the point of origin, either a previously located element reference, or another locator. In these examples we’ll be using locators only, but you could swap the locator in the final method with an element object and it will work the same.

Let us consider the below example for understanding the relative locators.

Relative Locators

Available relative locators

Above

If the email text field element is not easily identifiable for some reason, but the password text field element is, we can locate the text field element using the fact that it is an “input” element “above” the password element.

By emailLocator = RelativeLocator.with(By.tagName("input")).above(By.id("password"));
email_locator = locate_with(By.TAG_NAME, "input").above({By.ID: "password"})
var emailLocator = RelativeBy.WithLocator(By.TagName("input")).Above(By.Id("password"));
email_locator = {relative: {tag_name: 'input', above: {id: 'password'}}}
let emailLocator = locateWith(By.tagName('input')).above(By.id('password'));
val emailLocator = RelativeLocator.with(By.tagName("input")).above(By.id("password"))

Below

If the password text field element is not easily identifiable for some reason, but the email text field element is, we can locate the text field element using the fact that it is an “input” element “below” the email element.

By passwordLocator = RelativeLocator.with(By.tagName("input")).below(By.id("email"));
password_locator = locate_with(By.TAG_NAME, "input").below({By.ID: "email"})
var passwordLocator = RelativeBy.WithLocator(By.TagName("input")).Below(By.Id("email"));
password_locator = {relative: {tag_name: 'input', below: {id: 'email'}}}
let passwordLocator = locateWith(By.tagName('input')).below(By.id('email'));
val passwordLocator = RelativeLocator.with(By.tagName("input")).below(By.id("email"))

Left of

If the cancel button is not easily identifiable for some reason, but the submit button element is, we can locate the cancel button element using the fact that it is a “button” element to the “left of” the submit element.

By cancelLocator = RelativeLocator.with(By.tagName("button")).toLeftOf(By.id("submit"));
cancel_locator = locate_with(By.TAG_NAME, "button").to_left_of({By.ID: "submit"})
var cancelLocator = RelativeBy.WithLocator(By.tagName("button")).LeftOf(By.Id("submit"));
cancel_locator = {relative: {tag_name: 'button', left: {id: 'submit'}}}
let cancelLocator = locateWith(By.tagName('button')).toLeftOf(By.id('submit'));
val cancelLocator = RelativeLocator.with(By.tagName("button")).toLeftOf(By.id("submit"))

Right of

If the submit button is not easily identifiable for some reason, but the cancel button element is, we can locate the submit button element using the fact that it is a “button” element “to the right of” the cancel element.

By submitLocator = RelativeLocator.with(By.tagName("button")).toRightOf(By.id("cancel"));
submit_locator = locate_with(By.TAG_NAME, "button").to_right_of({By.ID: "cancel"})
var submitLocator = RelativeBy.WithLocator(By.tagName("button")).RightOf(By.Id("cancel"));
submit_locator = {relative: {tag_name: 'button', right: {id: 'cancel'}}}
let submitLocator = locateWith(By.tagName('button')).toRightOf(By.id('cancel'));
val submitLocator = RelativeLocator.with(By.tagName("button")).toRightOf(By.id("cancel"))

Near

If the relative positioning is not obvious, or it varies based on window size, you can use the near method to identify an element that is at most 50px away from the provided locator. One great use case for this is to work with a form element that doesn’t have an easily constructed locator, but its associated input label element does.

By emailLocator = RelativeLocator.with(By.tagName("input")).near(By.id("lbl-email"));
email_locator = locate_with(By.TAG_NAME, "input").near({By.ID: "lbl-email"})
var emailLocator = RelativeBy.WithLocator(By.tagName("input")).Near(By.Id("lbl-email"));
email_locator = {relative: {tag_name: 'input', near: {id: 'lbl-email'}}}
let emailLocator = locateWith(By.tagName('input')).near(By.id('lbl-email'));
val emailLocator = RelativeLocator.with(By.tagName("input")).near(By.id("lbl-email"));

Chaining relative locators

You can also chain locators if needed. Sometimes the element is most easily identified as being both above/below one element and right/left of another.

By submitLocator = RelativeLocator.with(By.tagName("button")).below(By.id("email")).toRightOf(By.id("cancel"));
submit_locator = locate_with(By.TAG_NAME, "button").below({By.ID: "email"}).to_right_of({By.ID: "cancel"})
var submitLocator = RelativeBy.WithLocator(By.tagName("button")).Below(By.Id("email")).RightOf(By.Id("cancel"));
submit_locator = {relative: {tag_name: 'button', below: {id: 'email'}, right: {id: 'cancel'}}}
let submitLocator = locateWith(By.tagName('button')).below(By.id('email')).toRightOf(By.id('cancel'));
val submitLocator = RelativeLocator.with(By.tagName("button")).below(By.id("email")).toRightOf(By.id("cancel"))

2.5.5 - 关于网络元素的信息

元素相关的知识.

您可以查询有关特定元素的许多详细信息。

是否显示

此方法用于检查连接的元素是否正确显示在网页上. 返回一个 Boolean 值, 如果连接的元素显示在当前的浏览器上下文中,则为True,否则返回false。

此功能于W3C规范中提及, 但由于无法覆盖所有潜在条件而无法定义。 因此,Selenium不能期望驱动程序直接实现这种功能,现在依赖于直接执行大量JavaScript函数。 这个函数对一个元素的性质和在树中的关系做了许多近似的判断,以返回一个值。

// Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

// Get boolean value for is element display
boolean isEmailVisible = driver.findElement(By.name("email_input")).isDisplayed();
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

# Get boolean value for is element display
is_email_visible = driver.find_element(By.NAME, "email_input").is_displayed()
//Navigate to the url
driver.Url = "https://www.selenium.dev/selenium/web/inputs.html";

//Get boolean value for is element display
Boolean is_email_visible = driver.FindElement(By.Name("email_input")).Displayed;
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

#fetch display status
val = driver.find_element(name: 'email_input').displayed?
    // Resolves Promise and returns boolean value
    let result =  await driver.findElement(By.name("email_input")).isDisplayed();
//navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns true if element is displayed else returns false
 val flag = driver.findElement(By.name("email_input")).isDisplayed()

是否启用

此方法用于检查所连接的元素在网页上是启用还是禁用状态。 返回一个布尔值,如果在当前浏览上下文中是 启用 状态,则返回 true,否则返回 false

  //navigates to url
  driver.get("https://www.selenium.dev/selenium/web/inputs.html");

  //returns true if element is enabled else returns false
  boolean value = driver.findElement(By.name("button_input")).isEnabled();
  
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns true if element is enabled else returns false
value = driver.find_element(By.NAME, 'button_input').is_enabled()
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

// Store the WebElement
IWebElement element = driver.FindElement(By.Name("button_input"));

// Prints true if element is enabled else returns false
System.Console.WriteLine(element.Enabled);
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns true if element is enabled else returns false
ele = driver.find_element(name: 'button_input').enabled?
  
    // Resolves Promise and returns boolean value
    let element =  await driver.findElement(By.name("button_input")).isEnabled();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns true if element is enabled else returns false
 val attr = driver.findElement(By.name("button_input")).isEnabled()
  

是否被选定

此方法确认相关的元素是否 已选定,常用于复选框、单选框、输入框和选择元素中。

该方法返回一个布尔值,如果在当前浏览上下文中 选择了 引用的元素,则返回 True,否则返回 False

 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html");

 //returns true if element is checked else returns false
 boolean value = driver.findElement(By.name("checkbox_input")).isSelected();
  
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns true if element is checked else returns false
value = driver.find_element(By.NAME, "checkbox_input").is_selected()
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

// Returns true if element ins checked else returns false
bool value = driver.FindElement(By.Name("checkbox_input")).Selected;
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns true if element is checked else returns false
ele = driver.find_element(name: "checkbox_input").selected?
  
    // Returns true if element ins checked else returns false
    let isSelected = await driver.findElement(By.name("checkbox_input")).isSelected();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns true if element is checked else returns false
 val attr =  driver.findElement(By.name("checkbox_input")).isSelected()
  

获取元素标签名

此方法用于获取在当前浏览上下文中具有焦点的被引用元素的TagName

 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html");

 //returns TagName of the element
 String value = driver.findElement(By.name("email_input")).getTagName();
  
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns TagName of the element
attr = driver.find_element(By.NAME, "email_input").tag_name
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

// Returns TagName of the element
string attr = driver.FindElement(By.Name("email_input")).TagName;
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns TagName of the element
attr = driver.find_element(name: "email_input").tag_name
  
    // Returns TagName of the element
    let value = await driver.findElement(By.name('email_input')).getTagName();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns TagName of the element
 val attr =  driver.findElement(By.name("email_input")).getTagName()
  

位置和大小

用于获取参照元素的尺寸和坐标。

提取的数据主体包含以下详细信息:

  • 元素左上角的X轴位置
  • 元素左上角的y轴位置
  • 元素的高度
  • 元素的宽度
// Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

// Returns height, width, x and y coordinates referenced element
Rectangle res =  driver.findElement(By.name("range_input")).getRect();

// Rectangle class provides getX,getY, getWidth, getHeight methods
System.out.println(res.getX());
  
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns height, width, x and y coordinates referenced element
res = driver.find_element(By.NAME, "range_input").rect
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

var res = driver.FindElement(By.Name("range_input"));
// Return x and y coordinates referenced element
System.Console.WriteLine(res.Location);
// Returns height, width
System.Console.WriteLine(res.Size);
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns height, width, x and y coordinates referenced element
res = driver.find_element(name: "range_input").rect
  
    let object = await driver.findElement(By.name('range_input')).getRect();
// Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

// Returns height, width, x and y coordinates referenced element
val res = driver.findElement(By.name("range_input")).rect

// Rectangle class provides getX,getY, getWidth, getHeight methods
println(res.getX())
  

获取元素CSS值

获取当前浏览上下文中元素的特定计算样式属性的值。

// Navigate to Url
driver.get("https://www.selenium.dev/selenium/web/colorPage.html");

// Retrieves the computed style property 'color' of linktext
String cssValue = driver.findElement(By.id("namedColor")).getCssValue("background-color");

  
    # Navigate to Url
driver.get('https://www.selenium.dev/selenium/web/colorPage.html')

    # Retrieves the computed style property 'color' of linktext
cssValue = driver.find_element(By.ID, "namedColor").value_of_css_property('background-color')

  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/colorPage.html");

// Retrieves the computed style property 'color' of linktext
String cssValue = driver.FindElement(By.Id("namedColor")).GetCssValue("background-color");

  
    # Navigate to Url
driver.get 'https://www.selenium.dev/selenium/web/colorPage.html'

    # Retrieves the computed style property 'color' of linktext
cssValue = driver.find_element(:id, 'namedColor').css_value('background-color')

  
    await driver.get('https://www.selenium.dev/selenium/web/colorPage.html');
      // Returns background color of the element
      let value = await driver.findElement(By.id('namedColor')).getCssValue('background-color');
// Navigate to Url
driver.get("https://www.selenium.dev/selenium/web/colorPage.html")

// Retrieves the computed style property 'color' of linktext
val cssValue = driver.findElement(By.id("namedColor")).getCssValue("background-color")

  

文本内容

获取特定元素渲染后的文本内容。

// Navigate to url
driver.get("https://www.selenium.dev/selenium/web/linked_image.html");

// Retrieves the text of the element
String text = driver.findElement(By.id("justanotherlink")).getText();
  
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/linked_image.html")

    # Retrieves the text of the element
text = driver.find_element(By.ID, "justanotherlink").text
  
// Navigate to url
driver.Url="https://www.selenium.dev/selenium/web/linked_image.html";

// Retrieves the text of the element
String text = driver.FindElement(By.Id("justanotherlink")).Text;
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/linked_image.html'

    # Retrieves the text of the element
text = driver.find_element(:id, 'justanotherlink').text
  
    await driver.get('https://www.selenium.dev/selenium/web/linked_image.html');
    // Returns text of the element
    let text = await driver.findElement(By.id('justanotherLink')).getText();
// Navigate to URL
driver.get("https://www.selenium.dev/selenium/web/linked_image.html")

// retrieves the text of the element
val text = driver.findElement(By.id("justanotherlink")).getText()
  

获取特性或属性

获取与 DOM 属性关联的运行时的值。 它返回与该元素的 DOM 特性或属性关联的数据。

//Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

//identify the email text box
WebElement emailTxt = driver.findElement(By.name(("email_input")));

//fetch the value property associated with the textbox
String valueInfo = eleSelLink.getAttribute("value");
  
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

# Identify the email text box
email_txt = driver.find_element(By.NAME, "email_input")

# Fetch the value property associated with the textbox
value_info = email_txt.get_attribute("value")
  
 //Navigate to the url
driver.Url="https://www.selenium.dev/selenium/web/inputs.html";

//identify the email text box
IWebElement emailTxt = driver.FindElement(By.Name(("email_input")));

//fetch the value property associated with the textbox
String valueInfo = eleSelLink.GetAttribute("value");
  
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

#identify the email text box
email_element=driver.find_element(name: 'email_input')

#fetch the value property associated with the textbox
emailVal = email_element.attribute("value");
  
    // identify the email text box
    const emailElement = await driver.findElement(By.xpath('//input[@name="email_input"]'));
    
    //fetch the attribute "name" associated with the textbox
    const nameAttribute = await emailElement.getAttribute("name");
// Navigate to URL
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

//fetch the value property associated with the textbox
val attr = driver.findElement(By.name("email_input")).getAttribute("value")
  

2.6 - 浏览器交互

获取浏览器信息

获取标题

从浏览器中读取当前页面的标题:

driver.getTitle();
driver.title
driver.Title;
driver.title
await driver.getTitle();
driver.title

获取当前 URL

您可以从浏览器的地址栏读取当前的 URL,使用:

driver.getCurrentUrl();
driver.current_url
driver.Url;
driver.current_url
await driver.getCurrentUrl();
driver.currentUrl

2.6.1 - 浏览器导航

打开网站

启动浏览器后你要做的第一件事就是打开你的网站。这可以通过一行代码实现:

// 简便的方法
driver.get("https://selenium.dev");

// 更长的方法
driver.navigate().to("https://selenium.dev");
driver.get("https://selenium.dev")
driver.Navigate().GoToUrl(@"https://selenium.dev");
    # 简便的方法
driver.get 'https://selenium.dev'

    # 更长的方法
driver.navigate.to 'https://selenium.dev'
await driver.get('https://selenium.dev');
// 简便的方法
driver.get("https://selenium.dev")

// 更长的方法
driver.navigate().to("https://selenium.dev")

后退

按下浏览器的后退按钮:

driver.navigate().back();
driver.back()
driver.Navigate().Back();
driver.navigate.back
await driver.navigate().back();
driver.navigate().back() 

前进

按下浏览器的前进键:

driver.navigate().forward();
driver.forward()
driver.Navigate().Forward();
driver.navigate.forward
await driver.navigate().forward();
driver.navigate().forward()

刷新

刷新当前页面:

driver.navigate().refresh();
driver.refresh()
driver.Navigate().Refresh();
driver.navigate.refresh
await driver.navigate().refresh();
driver.navigate().refresh()

2.6.2 - JavaScript 警告框,提示框和确认框

WebDriver提供了一个API, 用于处理JavaScript提供的三种类型的原生弹窗消息. 这些弹窗由浏览器提供限定的样式.

Alerts 警告框

其中最基本的称为警告框, 它显示一条自定义消息, 以及一个用于关闭该警告的按钮, 在大多数浏览器中标记为"确定"(OK). 在大多数浏览器中, 也可以通过按"关闭"(close)按钮将其关闭, 但这始终与“确定”按钮具有相同的作用. 查看样例警告框.

WebDriver可以从弹窗获取文本并接受或关闭这些警告.

//Click the link to activate the alert
driver.findElement(By.linkText("See an example alert")).click();

//Wait for the alert to be displayed and store it in a variable
Alert alert = wait.until(ExpectedConditions.alertIsPresent());

//Store the alert text in a variable
String text = alert.getText();

//Press the OK button
alert.accept();
  
# Click the link to activate the alert
driver.find_element(By.LINK_TEXT, "See an example alert").click()

# Wait for the alert to be displayed and store it in a variable
alert = wait.until(expected_conditions.alert_is_present())

# Store the alert text in a variable
text = alert.text

# Press the OK button
alert.accept()
  
//Click the link to activate the alert
driver.FindElement(By.LinkText("See an example alert")).Click();

//Wait for the alert to be displayed and store it in a variable
IAlert alert = wait.Until(ExpectedConditions.AlertIsPresent());

//Store the alert text in a variable
string text = alert.Text;

//Press the OK button
alert.Accept();
  
# Click the link to activate the alert
driver.find_element(:link_text, 'See an example alert').click

# Store the alert reference in a variable
alert = driver.switch_to.alert

# Store the alert text in a variable
alert_text = alert.text

# Press on OK button
alert.accept
  
//Click the link to activate the alert
await driver.findElement(By.linkText('See an example alert')).click();

// Wait for the alert to be displayed
await driver.wait(until.alertIsPresent());

// Store the alert in a variable
let alert = await driver.switchTo().alert();

//Store the alert text in a variable
let alertText = await alert.getText();

//Press the OK button
await alert.accept();

// Note: To use await, the above code should be inside an async function
  
//Click the link to activate the alert
driver.findElement(By.linkText("See an example alert")).click()

//Wait for the alert to be displayed and store it in a variable
val alert = wait.until(ExpectedConditions.alertIsPresent())

//Store the alert text in a variable
val text = alert.getText()

//Press the OK button
alert.accept()
  

Confirm 确认框

确认框类似于警告框, 不同之处在于用户还可以选择取消消息. 查看样例确认框.

此示例还呈现了警告的另一种实现:

//Click the link to activate the alert
driver.findElement(By.linkText("See a sample confirm")).click();

//Wait for the alert to be displayed
wait.until(ExpectedConditions.alertIsPresent());

//Store the alert in a variable
Alert alert = driver.switchTo().alert();

//Store the alert in a variable for reuse
String text = alert.getText();

//Press the Cancel button
alert.dismiss();
  
# Click the link to activate the alert
driver.find_element(By.LINK_TEXT, "See a sample confirm").click()

# Wait for the alert to be displayed
wait.until(expected_conditions.alert_is_present())

# Store the alert in a variable for reuse
alert = driver.switch_to.alert

# Store the alert text in a variable
text = alert.text

# Press the Cancel button
alert.dismiss()
  
//Click the link to activate the alert
driver.FindElement(By.LinkText("See a sample confirm")).Click();

//Wait for the alert to be displayed
wait.Until(ExpectedConditions.AlertIsPresent());

//Store the alert in a variable
IAlert alert = driver.SwitchTo().Alert();

//Store the alert in a variable for reuse
string text = alert.Text;

//Press the Cancel button
alert.Dismiss();
  
# Click the link to activate the alert
driver.find_element(:link_text, 'See a sample confirm').click

# Store the alert reference in a variable
alert = driver.switch_to.alert

# Store the alert text in a variable
alert_text = alert.text

# Press on Cancel button
alert.dismiss
  
//Click the link to activate the alert
await driver.findElement(By.linkText('See a sample confirm')).click();

// Wait for the alert to be displayed
await driver.wait(until.alertIsPresent());

// Store the alert in a variable
let alert = await driver.switchTo().alert();

//Store the alert text in a variable
let alertText = await alert.getText();

//Press the Cancel button
await alert.dismiss();

// Note: To use await, the above code should be inside an async function
  
//Click the link to activate the alert
driver.findElement(By.linkText("See a sample confirm")).click()

//Wait for the alert to be displayed
wait.until(ExpectedConditions.alertIsPresent())

//Store the alert in a variable
val alert = driver.switchTo().alert()

//Store the alert in a variable for reuse
val text = alert.text

//Press the Cancel button
alert.dismiss()
  

Prompt 提示框

提示框与确认框相似, 不同之处在于它们还包括文本输入. 与处理表单元素类似, 您可以使用WebDriver的sendKeys来填写响应. 这将完全替换占位符文本. 按下取消按钮将不会提交任何文本. 查看样例提示框.

//Click the link to activate the alert
driver.findElement(By.linkText("See a sample prompt")).click();

//Wait for the alert to be displayed and store it in a variable
Alert alert = wait.until(ExpectedConditions.alertIsPresent());

//Type your message
alert.sendKeys("Selenium");

//Press the OK button
alert.accept();
  
# Click the link to activate the alert
driver.find_element(By.LINK_TEXT, "See a sample prompt").click()

# Wait for the alert to be displayed
wait.until(expected_conditions.alert_is_present())

# Store the alert in a variable for reuse
alert = Alert(driver)

# Type your message
alert.send_keys("Selenium")

# Press the OK button
alert.accept()
  
//Click the link to activate the alert
driver.FindElement(By.LinkText("See a sample prompt")).Click();

//Wait for the alert to be displayed and store it in a variable
IAlert alert = wait.Until(ExpectedConditions.AlertIsPresent());

//Type your message
alert.SendKeys("Selenium");

//Press the OK button
alert.Accept();
  
# Click the link to activate the alert
driver.find_element(:link_text, 'See a sample prompt').click

# Store the alert reference in a variable
alert = driver.switch_to.alert

# Type a message
alert.send_keys("selenium")

# Press on Ok button
alert.accept
  
//Click the link to activate the alert
await driver.findElement(By.linkText('See a sample prompt')).click();

// Wait for the alert to be displayed
await driver.wait(until.alertIsPresent());

// Store the alert in a variable
let alert = await driver.switchTo().alert();

//Type your message
await alert.sendKeys("Selenium");

//Press the OK button
await alert.accept();

//Note: To use await, the above code should be inside an async function
  
//Click the link to activate the alert
driver.findElement(By.linkText("See a sample prompt")).click()

//Wait for the alert to be displayed and store it in a variable
val alert = wait.until(ExpectedConditions.alertIsPresent())

//Type your message
alert.sendKeys("Selenium")

//Press the OK button
alert.accept()
  

2.6.3 - 同cookies一起工作

Cookie是从网站发送并存储在您的计算机中的一小段数据. Cookies主要用于识别用户并加载存储的信息.

WebDriver API提供了一种使用内置的方法与Cookie进行交互:

这个方法常常用于将cookie添加到当前访问的上下文中. 添加Cookie仅接受一组已定义的可序列化JSON对象. 这里 是一个链接, 用于描述可接受的JSON键值的列表

首先, 您需要位于有效Cookie的域上. 如果您在开始与网站进行交互之前尝试预设cookie, 并且您的首页很大或需要一段时间才能加载完毕, 则可以选择在网站上找到一个较小的页面 (通常404页很小, 例如 http://example.com/some404page)

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class addCookie {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");

            // Adds the cookie into current browser context
            driver.manage().addCookie(new Cookie("key", "value"));
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

driver.get("http://www.example.com")

# Adds the cookie into current browser context
driver.add_cookie({"name": "key", "value": "value"})
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace AddCookie {
 class AddCookie {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");

    // Adds the cookie into current browser context
    driver.Manage().Cookies.AddCookie(new Cookie("key", "value"));
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  
  # Adds the cookie into current browser context
  driver.manage.add_cookie(name: "key", value: "value")
ensure
  driver.quit
end
  

        it('Create a cookie', async function() {
            await driver.get('https://www.selenium.dev/selenium/web/blank.html');

            // set a cookie on the current domain
            await driver.manage().addCookie({ name: 'key', value: 'value' });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")

        // Adds the cookie into current browser context
        driver.manage().addCookie(Cookie("key", "value"))
    } finally {
        driver.quit()
    }
}  
  

此方法返回与cookie名称匹配的序列化cookie数据中所有关联的cookie.

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class getCookieNamed {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            driver.manage().addCookie(new Cookie("foo", "bar"));

            // Get cookie details with named cookie 'foo'
            Cookie cookie1 = driver.manage().getCookieNamed("foo");
            System.out.println(cookie1);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")

# Adds the cookie into current browser context
driver.add_cookie({"name": "foo", "value": "bar"})

# Get cookie details with named cookie 'foo'
print(driver.get_cookie("foo"))
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace GetCookieNamed {
 class GetCookieNamed {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("foo", "bar"));

    // Get cookie details with named cookie 'foo'
    var cookie = driver.Manage().Cookies.GetCookieNamed("foo");
    System.Console.WriteLine(cookie);
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "foo", value: "bar")

  # Get cookie details with named cookie 'foo'
  puts driver.manage.cookie_named('foo')
ensure
  driver.quit
end
  

        it('Read cookie', async function() {
            await driver.get('https://www.selenium.dev/selenium/web/blank.html');

            // set a cookie on the current domain
            await driver.manage().addCookie({ name: 'foo', value: 'bar' });

            // Get cookie details with named cookie 'foo'
            await driver.manage().getCookie('foo').then(function(cookie) {
                console.log('cookie details => ', cookie);
            });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("foo", "bar"))

        // Get cookie details with named cookie 'foo'
        val cookie = driver.manage().getCookieNamed("foo")
        println(cookie)
    } finally {
        driver.quit()
    }
}  
  

获取全部 Cookies

此方法会针对当前访问上下文返回“成功的序列化cookie数据”. 如果浏览器不再可用, 则返回错误.

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;
import java.util.Set;

public class getAllCookies {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            // Add few cookies
            driver.manage().addCookie(new Cookie("test1", "cookie1"));
            driver.manage().addCookie(new Cookie("test2", "cookie2"));

            // Get All available cookies
            Set<Cookie> cookies = driver.manage().getCookies();
            System.out.println(cookies);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")

driver.add_cookie({"name": "test1", "value": "cookie1"})
driver.add_cookie({"name": "test2", "value": "cookie2"})

# Get all available cookies
print(driver.get_cookies())
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace GetAllCookies {
 class GetAllCookies {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("test1", "cookie1"));
    driver.Manage().Cookies.AddCookie(new Cookie("test2", "cookie2"));

    // Get All available cookies
    var cookies = driver.Manage().Cookies.AllCookies;
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "test1", value: "cookie1")
  driver.manage.add_cookie(name: "test2", value: "cookie2")

  # Get all available cookies
  puts driver.manage.all_cookies
ensure
  driver.quit
end
  

        it('Read all cookies', async function() {
            await driver.get('https://www.selenium.dev/selenium/web/blank.html');

            // Add few cookies
            await driver.manage().addCookie({ name: 'test1', value: 'cookie1' });
            await driver.manage().addCookie({ name: 'test2', value: 'cookie2' });

            // Get all Available cookies
            await driver.manage().getCookies().then(function(cookies) {
                console.log('cookie details => ', cookies);
            });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("test1", "cookie1"))
        driver.manage().addCookie(Cookie("test2", "cookie2"))

        // Get All available cookies
        val cookies = driver.manage().cookies
        println(cookies)
    } finally {
        driver.quit()
    }
}  
  

此方法删除与提供的cookie名称匹配的cookie数据.

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class deleteCookie {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            driver.manage().addCookie(new Cookie("test1", "cookie1"));
            Cookie cookie1 = new Cookie("test2", "cookie2");
            driver.manage().addCookie(cookie1);

            // delete a cookie with name 'test1'
            driver.manage().deleteCookieNamed("test1");

            /*
             Selenium Java bindings also provides a way to delete
             cookie by passing cookie object of current browsing context
             */
            driver.manage().deleteCookie(cookie1);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver
driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")
driver.add_cookie({"name": "test1", "value": "cookie1"})
driver.add_cookie({"name": "test2", "value": "cookie2"})

# Delete a cookie with name 'test1'
driver.delete_cookie("test1")
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace DeleteCookie {
 class DeleteCookie {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("test1", "cookie1"));
    var cookie = new Cookie("test2", "cookie2");
    driver.Manage().Cookies.AddCookie(cookie);

    // delete a cookie with name 'test1'	
    driver.Manage().Cookies.DeleteCookieNamed("test1");

    // Selenium .net bindings also provides a way to delete
    // cookie by passing cookie object of current browsing context
    driver.Manage().Cookies.DeleteCookie(cookie);
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "test1", value: "cookie1")
  driver.manage.add_cookie(name: "test2", value: "cookie2")

  # delete a cookie with name 'test1'
  driver.manage.delete_cookie('test1')
ensure
  driver.quit
end
  

        it('Delete a cookie', async function() {
            await driver.get('https://www.selenium.dev/selenium/web/blank.html');

            // Add few cookies
            await driver.manage().addCookie({ name: 'test1', value: 'cookie1' });
            await driver.manage().addCookie({ name: 'test2', value: 'cookie2' });

            // Delete a cookie with name 'test1'
            await driver.manage().deleteCookie('test1');

            // Get all Available cookies
            await driver.manage().getCookies().then(function(cookies) {
                console.log('cookie details => ', cookies);
            });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("test1", "cookie1"))
        val cookie1 = Cookie("test2", "cookie2")
        driver.manage().addCookie(cookie1)

        // delete a cookie with name 'test1'
        driver.manage().deleteCookieNamed("test1")
        
        // delete cookie by passing cookie object of current browsing context.
        driver.manage().deleteCookie(cookie1)
    } finally {
        driver.quit()
    }
}  
  

删除所有 Cookies

此方法删除当前访问上下文的所有cookie.

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class deleteAllCookies {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            driver.manage().addCookie(new Cookie("test1", "cookie1"));
            driver.manage().addCookie(new Cookie("test2", "cookie2"));

            // deletes all cookies
            driver.manage().deleteAllCookies();
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver
driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")
driver.add_cookie({"name": "test1", "value": "cookie1"})
driver.add_cookie({"name": "test2", "value": "cookie2"})

#  Deletes all cookies
driver.delete_all_cookies()
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace DeleteAllCookies {
 class DeleteAllCookies {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("test1", "cookie1"));
    driver.Manage().Cookies.AddCookie(new Cookie("test2", "cookie2"));

    // deletes all cookies
    driver.Manage().Cookies.DeleteAllCookies();
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "test1", value: "cookie1")
  driver.manage.add_cookie(name: "test2", value: "cookie2")

  # deletes all cookies
  driver.manage.delete_all_cookies
ensure
  driver.quit
end
  

        it('Delete all cookies', async function() {
            await driver.get('https://www.selenium.dev/selenium/web/blank.html');

            // Add few cookies
            await driver.manage().addCookie({ name: 'test1', value: 'cookie1' });
            await driver.manage().addCookie({ name: 'test2', value: 'cookie2' });

            // Delete all cookies
            await driver.manage().deleteAllCookies();
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("test1", "cookie1"))
        driver.manage().addCookie(Cookie("test2", "cookie2"))

        // deletes all cookies
        driver.manage().deleteAllCookies()
    } finally {
        driver.quit()
    }
}
  

Same-Site Cookie属性

此属性允许用户引导浏览器控制cookie, 是否与第三方站点发起的请求一起发送. 引入其是为了防止CSRF(跨站请求伪造)攻击.

Same-Site cookie属性接受以下两种参数作为指令

Strict:

当sameSite属性设置为 Strict, cookie不会与来自第三方网站的请求一起发送.

Lax:

当您将cookie sameSite属性设置为 Lax, cookie将与第三方网站发起的GET请求一起发送.

注意: 到目前为止, 此功能已在Chrome(80+版本), Firefox(79+版本)中提供, 并适用于Selenium 4以及更高版本.

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class cookieTest {
  public static void main(String[] args) {
    WebDriver driver = new ChromeDriver();
    try {
      driver.get("http://www.example.com");
      Cookie cookie = new Cookie.Builder("key", "value").sameSite("Strict").build();
      Cookie cookie1 = new Cookie.Builder("key", "value").sameSite("Lax").build();
      driver.manage().addCookie(cookie);
      driver.manage().addCookie(cookie1);
      System.out.println(cookie.getSameSite());
      System.out.println(cookie1.getSameSite());
    } finally {
      driver.quit();
    }
  }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

driver.get("http://www.example.com")
# Adds the cookie into current browser context with sameSite 'Strict' (or) 'Lax'
driver.add_cookie({"name": "foo", "value": "value", 'sameSite': 'Strict'})
driver.add_cookie({"name": "foo1", "value": "value", 'sameSite': 'Lax'})
cookie1 = driver.get_cookie('foo')
cookie2 = driver.get_cookie('foo1')
print(cookie1)
print(cookie2)
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace SameSiteCookie {
  class SameSiteCookie {
    static void Main(string[] args) {
      IWebDriver driver = new ChromeDriver();
      try {
        driver.Navigate().GoToUrl("http://www.example.com");

        var cookie1Dictionary = new System.Collections.Generic.Dictionary<string, object>() {
          { "name", "test1" }, { "value", "cookie1" }, { "sameSite", "Strict" } };
        var cookie1 = Cookie.FromDictionary(cookie1Dictionary);

        var cookie2Dictionary = new System.Collections.Generic.Dictionary<string, object>() {
          { "name", "test2" }, { "value", "cookie2" }, { "sameSite", "Lax" } };
        var cookie2 = Cookie.FromDictionary(cookie2Dictionary);

        driver.Manage().Cookies.AddCookie(cookie1);
        driver.Manage().Cookies.AddCookie(cookie2);

        System.Console.WriteLine(cookie1.SameSite);
        System.Console.WriteLine(cookie2.SameSite);
      } finally {
        driver.Quit();
      }
    }
  }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  # Adds the cookie into current browser context with sameSite 'Strict' (or) 'Lax'
  driver.manage.add_cookie(name: "foo", value: "bar", same_site: "Strict")
  driver.manage.add_cookie(name: "foo1", value: "bar", same_site: "Lax")
  puts driver.manage.cookie_named('foo')
  puts driver.manage.cookie_named('foo1')
ensure
  driver.quit
end
  

        it('Create cookies with sameSite', async function() {
            await driver.get('https://www.selenium.dev/selenium/web/blank.html');

            // set a cookie on the current domain with sameSite 'Strict' (or) 'Lax'
            await driver.manage().addCookie({ name: 'key', value: 'value', sameSite: 'Strict' });
            await driver.manage().addCookie({ name: 'key', value: 'value', sameSite: 'Lax' });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("http://www.example.com")
        val cookie = Cookie.Builder("key", "value").sameSite("Strict").build()
        val cookie1 = Cookie.Builder("key", "value").sameSite("Lax").build()
        driver.manage().addCookie(cookie)
        driver.manage().addCookie(cookie1)
        println(cookie.getSameSite())
        println(cookie1.getSameSite())
    } finally {
        driver.quit()
    }
} 
  

2.6.4 - 与IFrames和frames一起工作

框架是一种现在已被弃用的方法,用于从同一域中的多个文档构建站点布局。除非你使用的是 HTML5 之前的 webapp,否则你不太可能与他们合作。内嵌框架允许插入来自完全不同领域的文档,并且仍然经常使用。

如果您需要使用框架或 iframe, WebDriver 允许您以相同的方式使用它们。考虑 iframe 中的一个按钮。 如果我们使用浏览器开发工具检查元素,我们可能会看到以下内容:

<div id="modal">
  <iframe id="buttonframe"name="myframe"src="https://seleniumhq.github.io">
   <button>Click here</button>
 </iframe>
</div>

如果不是 iframe,我们可能会使用如下方式点击按钮:

// 这不会工作
driver.findElement(By.tagName("button")).click();
    # 这不会工作
driver.find_element(By.TAG_NAME, 'button').click()
// 这不会工作
driver.FindElement(By.TagName("button")).Click();
    # 这不会工作
driver.find_element(:tag_name,'button').click
// 这不会工作
await driver.findElement(By.css('button')).click();
// 这不会工作
driver.findElement(By.tagName("button")).click()

但是,如果 iframe 之外没有按钮,那么您可能会得到一个 no such element 无此元素 的错误。 这是因为 Selenium 只知道顶层文档中的元素。为了与按钮进行交互,我们需要首先切换到框架, 这与切换窗口的方式类似。WebDriver 提供了三种切换到帧的方法。

使用 WebElement

使用 WebElement 进行切换是最灵活的选择。您可以使用首选的选择器找到框架并切换到它。

// 存储网页元素
WebElement iframe = driver.findElement(By.cssSelector("#modal>iframe"));

// 切换到 frame
driver.switchTo().frame(iframe);

// 现在可以点击按钮
driver.findElement(By.tagName("button")).click();
    # 存储网页元素
iframe = driver.find_element(By.CSS_SELECTOR, "#modal > iframe")

    # 切换到选择的 iframe
driver.switch_to.frame(iframe)

    # 单击按钮
driver.find_element(By.TAG_NAME, 'button').click()
// 存储网页元素
IWebElement iframe = driver.FindElement(By.CssSelector("#modal>iframe"));

// 切换到 frame
driver.SwitchTo().Frame(iframe);

// 现在可以点击按钮
driver.FindElement(By.TagName("button")).Click();
    # Store iframe web element
iframe = driver.find_element(:css,'#modal> iframe')

    # 切换到 frame
driver.switch_to.frame iframe

    # 单击按钮
driver.find_element(:tag_name,'button').click
// 存储网页元素
const iframe = driver.findElement(By.css('#modal> iframe'));

// 切换到 frame
await driver.switchTo().frame(iframe);

// 现在可以点击按钮
await driver.findElement(By.css('button')).click();
// 存储网页元素
val iframe = driver.findElement(By.cssSelector("#modal>iframe"))

// 切换到 frame
driver.switchTo().frame(iframe)

// 现在可以点击按钮
driver.findElement(By.tagName("button")).click()

使用 name 或 id

如果您的 frame 或 iframe 具有 id 或 name 属性,则可以使用该属性。如果名称或 id 在页面上不是唯一的, 那么将切换到找到的第一个。

// 使用 ID
driver.switchTo().frame("buttonframe");

// 或者使用 name 代替
driver.switchTo().frame("myframe");

// 现在可以点击按钮
driver.findElement(By.tagName("button")).click();
    # 通过 id 切换框架
driver.switch_to.frame('buttonframe')

    # 单击按钮
driver.find_element(By.TAG_NAME, 'button').click()
// 使用 ID
driver.SwitchTo().Frame("buttonframe");

// 或者使用 name 代替
driver.SwitchTo().Frame("myframe");

// 现在可以点击按钮
driver.FindElement(By.TagName("button")).Click();
    # Switch by ID
driver.switch_to.frame 'buttonframe'

    # 单击按钮
driver.find_element(:tag_name,'button').click
// 使用 ID
await driver.switchTo().frame('buttonframe');

// 或者使用 name 代替
await driver.switchTo().frame('myframe');

// 现在可以点击按钮
await driver.findElement(By.css('button')).click();
// 使用 ID
driver.switchTo().frame("buttonframe")

// 或者使用 name 代替
driver.switchTo().frame("myframe")

// 现在可以点击按钮
driver.findElement(By.tagName("button")).click()

使用索引

还可以使用frame的索引, 例如可以使用JavaScript中的 window.frames 进行查询.

// 切换到第 2 个框架
driver.switchTo().frame(1);
    # 切换到第 2 个框架
driver.switch_to.frame(1)
// 切换到第 2 个框架
driver.SwitchTo().Frame(1);
    # 基于索引切换到第 2 个 iframe
iframe = driver.find_elements(By.TAG_NAME,'iframe')[1]

    # 切换到选择的 iframe
driver.switch_to.frame(iframe)
// 切换到第 2 个框架
await driver.switchTo().frame(1);
// 切换到第 2 个框架
driver.switchTo().frame(1)

离开框架

离开 iframe 或 frameset,切换回默认内容,如下所示:

// 回到顶层
driver.switchTo().defaultContent();
    # 切回到默认内容
driver.switch_to.default_content()
// 回到顶层
driver.SwitchTo().DefaultContent();
    # 回到顶层
driver.switch_to.default_content
// 回到顶层
await driver.switchTo().defaultContent();
// 回到顶层
driver.switchTo().defaultContent()

2.6.5 - 同窗口和标签一起工作

窗口和标签页

WebDriver 没有区分窗口和标签页。如果你的站点打开了一个新标签页或窗口,Selenium 将允许您使用窗口句柄来处理它。 每个窗口都有一个唯一的标识符,该标识符在单个会话中保持持久性。你可以使用以下方法获得当前窗口的窗口句柄:

driver.getWindowHandle();
driver.current_window_handle
driver.CurrentWindowHandle;
driver.window_handle
await driver.getWindowHandle();
driver.windowHandle

切换窗口或标签页

单击在 <a href=“https://seleniumhq.github.io"target="_blank”>新窗口 中打开链接, 则屏幕会聚焦在新窗口或新标签页上,但 WebDriver 不知道操作系统认为哪个窗口是活动的。 要使用新窗口,您需要切换到它。 如果只有两个选项卡或窗口被打开,并且你知道从哪个窗口开始, 则你可以遍历 WebDriver, 通过排除法可以看到两个窗口或选项卡,然后切换到你需要的窗口或选项卡。

不过,Selenium 4 提供了一个新的 api NewWindow 它创建一个新选项卡 (或) 新窗口并自动切换到它。

// 存储原始窗口的 ID
String originalWindow = driver.getWindowHandle();

// 检查一下,我们还没有打开其他的窗口
assert driver.getWindowHandles().size() == 1;

// 点击在新窗口中打开的链接
driver.findElement(By.linkText("new window")).click();

// 等待新窗口或标签页
wait.until(numberOfWindowsToBe(2));

// 循环执行,直到找到一个新的窗口句柄
for (String windowHandle : driver.getWindowHandles()) {
	if(!originalWindow.contentEquals(windowHandle)) {
		driver.switchTo().window(windowHandle);
		break;
	}
}

// 等待新标签完成加载内容
wait.until(titleIs("Selenium documentation"));
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

    # 启动驱动程序
with webdriver.Firefox() as driver:
    # 打开网址
driver.get("https://seleniumhq.github.io")

    # 设置等待
    wait = WebDriverWait(driver, 10)

    # 存储原始窗口的 ID
    original_window = driver.current_window_handle

    # 检查一下,我们还没有打开其他的窗口
    assert len(driver.window_handles) == 1

    # 单击在新窗口中打开的链接
    driver.find_element(By.LINK_TEXT, "new window").click()

    # 等待新窗口或标签页
    wait.until(EC.number_of_windows_to_be(2))

    # 循环执行,直到找到一个新的窗口句柄
    for window_handle in driver.window_handles:
        if window_handle != original_window:
            driver.switch_to.window(window_handle)
            break

    # 等待新标签页完成加载内容
    wait.until(EC.title_is("SeleniumHQ Browser Automation"))
// 存储原始窗口的 ID
string originalWindow = driver.CurrentWindowHandle;

// 检查一下,我们还没有打开其他的窗口
Assert.AreEqual(driver.WindowHandles.Count, 1);

// 单击在新窗口中打开的链接
driver.FindElement(By.LinkText("new window")).Click();

// 等待新窗口或标签页
wait.Until(wd => wd.WindowHandles.Count == 2);

// 循环执行,直到找到一个新的窗口句柄
foreach(string window in driver.WindowHandles)
{if(originalWindow != window)
{driver.SwitchTo().Window(window);
break;
}
}
// 等待新标签页完成加载内容
wait.Until(wd => wd.Title == "Selenium documentation");
    # 存储原始窗口的 ID
original_window = driver.window_handle

    #检查一下,我们还没有打开其他的窗口
assert(driver.window_handles.length == 1,'Expected one window')

    #点击在新窗口中打开的链接
driver.find_element(link:'new window').click

    #等待新窗口或标签页
wait.until {driver.window_handles.length == 2}

    #循环执行,直到找到一个新的窗口句柄
driver.window_handles.each do |handle|
if handle != original_window
driver.switch_to.window handle
break
end
end

    #等待新标签页完成加载内容
wait.until {driver.title =='Selenium documentation'}
// 存储原始窗口的 ID
const originalWindow = await driver.getWindowHandle();

// 检查一下,我们还没有打开其他的窗口
assert((await driver.getAllWindowHandles()).length === 1);

// 点击在新窗口中打开的链接
await driver.findElement(By.linkText('new window')).click();

// 等待新窗口或标签页
await driver.wait(async () => (await driver.getAllWindowHandles()).length === 2,
10000
);

// 循环执行,直到找到一个新的窗口句柄
const windows = await driver.getAllWindowHandles();
windows.forEach(async handle => {if (handle !== originalWindow) {await driver.switchTo().window(handle);
}
});

// 等待新标签页完成加载内容
await driver.wait(until.titleIs('Selenium documentation'), 10000);
// 存储原始窗口的 ID
val originalWindow = driver.getWindowHandle()

// 检查一下,我们还没有打开其他的窗口
assert(driver.getWindowHandles().size() === 1)

// 点击在新窗口中打开的链接
driver.findElement(By.linkText("new window")).click()

// 等待新窗口或标签页
wait.until(numberOfWindowsToBe(2))

// 循环执行,直到找到一个新的窗口句柄
for (windowHandle in driver.getWindowHandles()) {
if (!originalWindow.contentEquals(windowHandle)) {
driver.switchTo().window(windowHandle)
break
}
}

// 等待新标签页完成加载内容
wait.until(titleIs("Selenium documentation"))

创建新窗口(或)新标签页并且切换

创建一个新窗口 (或) 标签页,屏幕焦点将聚焦在新窗口或标签在上。您不需要切换到新窗口 (或) 标签页。如果除了新窗口之外, 您打开了两个以上的窗口 (或) 标签页,您可以通过遍历 WebDriver 看到两个窗口或选项卡,并切换到非原始窗口。

注意: 该特性适用于 Selenium 4 及其后续版本。

// 打开新标签页并切换到新标签页
driver.switchTo().newWindow(WindowType.TAB);

// 打开一个新窗口并切换到新窗口
driver.switchTo().newWindow(WindowType.WINDOW);
    # 打开新标签页并切换到新标签页
driver.switch_to.new_window('tab')

    # 打开一个新窗口并切换到新窗口
driver.switch_to.new_window('window')
// 打开新标签页并切换到新标签页
driver.SwitchTo().NewWindow(WindowType.Tab)

// 打开一个新窗口并切换到新窗口
driver.SwitchTo().NewWindow(WindowType.Window)
    # 注意:ruby 中的 new_window 只打开一个新标签页(或)窗口,不会自动切换
    # 用户必须切换到新选项卡 (或) 新窗口

    # 打开新标签页并切换到新标签页
driver.manage.new_window(:tab)

    # 打开一个新窗口并切换到新窗口
driver.manage.new_window(:window)
// 打开新标签页并切换到新标签页
await driver.switchTo().newWindow('tab');

// 打开一个新窗口并切换到新窗口
await driver.switchTo().newWindow('window');
// 打开新标签页并切换到新标签页
driver.switchTo().newWindow(WindowType.TAB)

// 打开一个新窗口并切换到新窗口
driver.switchTo().newWindow(WindowType.WINDOW)

关闭窗口或标签页

当你完成了一个窗口或标签页的工作时,_并且_它不是浏览器中最后一个打开的窗口或标签页时,你应该关闭它并切换回你之前使用的窗口。 假设您遵循了前一节中的代码示例,您将把前一个窗口句柄存储在一个变量中。把这些放在一起,你会得到:

//关闭标签页或窗口
driver.close();

//切回到之前的标签页或窗口
driver.switchTo().window(originalWindow);
    #关闭标签页或窗口
driver.close()

    #切回到之前的标签页或窗口
driver.switch_to.window(original_window)
//关闭标签页或窗口
driver.Close();

//切回到之前的标签页或窗口
driver.SwitchTo().Window(originalWindow);
    #关闭标签页或窗口
driver.close

    #切回到之前的标签页或窗口
driver.switch_to.window original_window
//关闭标签页或窗口
await driver.close();

//切回到之前的标签页或窗口
await driver.switchTo().window(originalWindow);
//关闭标签页或窗口
driver.close()

//切回到之前的标签页或窗口
driver.switchTo().window(originalWindow)

如果在关闭一个窗口后忘记切换回另一个窗口句柄,WebDriver 将在当前关闭的页面上执行,并触发一个 No Such Window Exception 无此窗口异常。必须切换回有效的窗口句柄才能继续执行。

在会话结束时退出浏览器

当你完成了浏览器会话,你应该调用 quit 退出,而不是 close 关闭:

driver.quit();
driver.quit()
driver.Quit();
driver.quit
await driver.quit();
driver.quit()

  • 退出将会
    • 关闭所有与 WebDriver 会话相关的窗口和选项卡
    • 结束浏览器进程
    • 结束后台驱动进程
    • 通知 Selenium Grid 浏览器不再使用,以便可以由另一个会话使用它(如果您正在使用 Selenium Grid)

调用 quit() 失败将留下额外的后台进程和端口运行在机器上,这可能在以后导致一些问题。

有的测试框架提供了一些方法和注释,您可以在测试结束时放入 teardown() 方法中。

/**
* 使用 JUnit 的例子
* https://junit.org/junit5/docs/current/api/org/junit/jupiter/api/AfterAll.html
*/
@AfterAll
public static void tearDown() {
    driver.quit();
}
    # unittest teardown
    # https://docs.python.org/3/library/unittest.html?highlight=teardown#unittest.TestCase.tearDown
def tearDown(self):
self.driver.quit()
/*
使用 Visual Studio 的 UnitTesting 的例子
https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.testtools.unittesting.aspx
*/
[TestCleanup]
public void TearDown()
{driver.Quit();
}
    # UnitTest Teardown
    # https://www.rubydoc.info/github/test-unit/test-unit/Test/Unit/TestCase
def teardown
@driver.quit
end
/**
* 使用 Mocha 的例子
* https://mochajs.org/#hooks
  */
  after('Tear down', async function () {await driver.quit();
  });
  
/**
* 使用 JUnit 的例子
* https://junit.org/junit5/docs/current/api/org/junit/jupiter/api/AfterAll.html
*/
@AfterAll
fun tearDown() {
	driver.quit()
}

如果不在测试上下文中运行 WebDriver,您可以考虑使用 try / finally,这是大多数语言都提供的, 这样一个异常处理仍然可以清理 WebDriver 会话。

try {
    //WebDriver 代码…
} finally {
    driver.quit();
}
try:
    #WebDriver 代码…
finally:
driver.quit()
try {//WebDriver 代码…} finally {driver.Quit();
}
begin
    #WebDriver 代码…
ensure
driver.quit
end
try {//WebDriver 代码…} finally {await driver.quit();
}
try {//WebDriver 代码…} finally {driver.quit()
}

Python 的 WebDriver 现在支持 Python 上下文管理器,当使用 with 关键字时,可以在执行结束时自动退出驱动程序。

with webdriver.Firefox() as driver:
  # WebDriver 代码…

# 在此缩进位置后 WebDriver 会自动退出

窗口管理

屏幕分辨率会影响 web 应用程序的呈现方式,因此 WebDriver 提供了移动和调整浏览器窗口大小的机制。

获取窗口大小

获取浏览器窗口的大小(以像素为单位)。

// 分别获取每个尺寸
int width = driver.manage().window().getSize().getWidth();
int height = driver.manage().window().getSize().getHeight();

// 或者存储尺寸并在以后查询它们
Dimension size = driver.manage().window().getSize();
int width1 = size.getWidth();
int height1 = size.getHeight();
    # 分别获取每个尺寸
width = driver.get_window_size().get("width")
height = driver.get_window_size().get("height")

    # 或者存储尺寸并在以后查询它们
size = driver.get_window_size()
width1 = size.get("width")
height1 = size.get("height")
// 分别获取每个尺寸
int width = driver.Manage().Window.Size.Width;
int height = driver.Manage().Window.Size.Height;

// 或者存储尺寸并在以后查询它们
System.Drawing.Size size = driver.Manage().Window.Size;
int width1 = size.Width;
int height1 = size.Height;
    # 分别获取每个尺寸
width = driver.manage.window.size.width
height = driver.manage.window.size.height

    # 或者存储尺寸并在以后查询它们
size = driver.manage.window.size
width1 = size.width
height1 = size.height
// 分别获取每个尺寸
const {width, height} = await driver.manage().window().getRect();

// 或者存储尺寸并在以后查询它们
const rect = await driver.manage().window().getRect();
const width1 = rect.width;
const height1 = rect.height;
// 分别获取每个尺寸
val width = driver.manage().window().size.width
val height = driver.manage().window().size.height

// 或者存储尺寸并在以后查询它们
val size = driver.manage().window().size
val width1 = size.width
val height1 = size.height

设置窗口大小

恢复窗口并设置窗口大小。

driver.manage().window().setSize(new Dimension(1024, 768));
driver.set_window_size(1024, 768)
driver.Manage().Window.Size = new Size(1024, 768);
driver.manage.window.resize_to(1024,768)
await driver.manage().window().setRect({width: 1024, height: 768});
driver.manage().window().size = Dimension(1024, 768)

得到窗口的位置

获取浏览器窗口左上角的坐标。

// 分别获取每个尺寸
int x = driver.manage().window().getPosition().getX();
int y = driver.manage().window().getPosition().getY();

// 或者存储尺寸并在以后查询它们
Point position = driver.manage().window().getPosition();
int x1 = position.getX();
int y1 = position.getY();
    # 分别获取每个尺寸
x = driver.get_window_position().get('x')
y = driver.get_window_position().get('y')

    # 或者存储尺寸并在以后查询它们
position = driver.get_window_position()
x1 = position.get('x')
y1 = position.get('y')
// 分别获取每个尺寸
int x = driver.Manage().Window.Position.X;
int y = driver.Manage().Window.Position.Y;

// 或者存储尺寸并在以后查询它们
Point position = driver.Manage().Window.Position;
int x1 = position.X;
int y1 = position.Y;
    #Access each dimension individually
x = driver.manage.window.position.x
y = driver.manage.window.position.y

    # Or store the dimensions and query them later
rect  = driver.manage.window.rect
x1 = rect.x
y1 = rect.y
// 分别获取每个尺寸
const {x, y} = await driver.manage().window().getRect();

// 或者存储尺寸并在以后查询它们
const rect = await driver.manage().window().getRect();
const x1 = rect.x;
const y1 = rect.y;
// 分别获取每个尺寸
val x = driver.manage().window().position.x
val y = driver.manage().window().position.y

// 或者存储尺寸并在以后查询它们
val position = driver.manage().window().position
val x1 = position.x
val y1 = position.y

设置窗口位置

将窗口移动到设定的位置。

// 将窗口移动到主显示器的左上角
driver.manage().window().setPosition(new Point(0, 0));
    # 将窗口移动到主显示器的左上角
driver.set_window_position(0, 0)
// 将窗口移动到主显示器的左上角
driver.Manage().Window.Position = new Point(0, 0);
driver.manage.window.move_to(0,0)
// 将窗口移动到主显示器的左上角
await driver.manage().window().setRect({x: 0, y: 0});
// 将窗口移动到主显示器的左上角
driver.manage().window().position = Point(0,0)

最大化窗口

扩大窗口。对于大多数操作系统,窗口将填满屏幕,而不会阻挡操作系统自己的菜单和工具栏。

driver.manage().window().maximize();
driver.maximize_window()
driver.Manage().Window.Maximize();
driver.manage.window.maximize
await driver.manage().window().maximize();
driver.manage().window().maximize()

最小化窗口

最小化当前浏览上下文的窗口. 这种命令的精准行为将作用于各个特定的窗口管理器.

最小化窗口通常将窗口隐藏在系统托盘中.

注意: 此功能适用于Selenium 4以及更高版本.

driver.manage().window().minimize();
driver.minimize_window()
driver.Manage().Window.Minimize();
driver.manage.window.minimize
await driver.manage().window().minimize();
driver.manage().window().minimize()

全屏窗口

填充整个屏幕,类似于在大多数浏览器中按下 F11。

driver.manage().window().fullscreen();
driver.fullscreen_window()
driver.Manage().Window.FullScreen();
driver.manage.window.full_screen
await driver.manage().window().fullscreen();
driver.manage().window().fullscreen()

屏幕截图

用于捕获当前浏览上下文的屏幕截图. WebDriver端点 屏幕截图 返回以Base64格式编码的屏幕截图.

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.*;
import org.openqa.selenium.*;

public class SeleniumTakeScreenshot {
	public static void main(String args[]) throws IOException {
		WebDriver driver = new ChromeDriver();
		driver.get("http://www.example.com");
		File scrFile = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
		FileUtils.copyFile(scrFile, new File("./image.png"));
		driver.quit();
	}
}
from selenium import webdriver

driver = webdriver.Chrome()

    # Navigate to url
driver.get("http://www.example.com")

    # Returns and base64 encoded string into image
driver.save_screenshot('./image.png')

driver.quit()
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

    var driver = new ChromeDriver();
    driver.Navigate().GoToUrl("http://www.example.com");
    Screenshot screenshot = (driver as ITakesScreenshot).GetScreenshot();
    screenshot.SaveAsFile("screenshot.png", ScreenshotImageFormat.Png); // Format values are Bmp, Gif, Jpeg, Png, Tiff
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
driver.get 'https://example.com/'

    # Takes and Stores the screenshot in specified path
driver.save_screenshot('./image.png')

end
let {Builder} = require('selenium-webdriver');
let fs = require('fs');

(async function example() {
let driver = await new Builder()
.forBrowser('chrome')
.build();

    await driver.get('https://www.example.com');
    // Returns base64 encoded string
    let encodedString = await driver.takeScreenshot();
    await fs.writeFileSync('./image.png', encodedString, 'base64');
    await driver.quit();
}())
import com.oracle.tools.packager.IOUtils.copyFile
import org.openqa.selenium.*
import org.openqa.selenium.chrome.ChromeDriver
import java.io.File

fun main(){
val driver =  ChromeDriver()
driver.get("https://www.example.com")
val scrFile = (driver as TakesScreenshot).getScreenshotAs<File>(OutputType.FILE)
copyFile(scrFile, File("./image.png"))
driver.quit()
}

元素屏幕截图

用于捕获当前浏览上下文的元素的屏幕截图. WebDriver端点 屏幕截图 返回以Base64格式编码的屏幕截图.

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.File;
import java.io.IOException;

public class SeleniumelementTakeScreenshot {
public static void main(String args[]) throws IOException {
	WebDriver driver = new ChromeDriver();
		driver.get("https://www.example.com");
		WebElement element = driver.findElement(By.cssSelector("h1"));
		File scrFile = element.getScreenshotAs(OutputType.FILE);
		FileUtils.copyFile(scrFile, new File("./image.png"));
		driver.quit();
	}
}
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

    # Navigate to url
driver.get("http://www.example.com")

ele = driver.find_element(By.CSS_SELECTOR, 'h1')

    # Returns and base64 encoded string into image
ele.screenshot('./image.png')

driver.quit()
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

    // Webdriver
    var driver = new ChromeDriver();
    driver.Navigate().GoToUrl("http://www.example.com");

    // Fetch element using FindElement
    var webElement = driver.FindElement(By.CssSelector("h1"));

    // Screenshot for the element
    var elementScreenshot = (webElement as ITakesScreenshot).GetScreenshot();
    elementScreenshot.SaveAsFile("screenshot_of_element.png");
    # Works with Selenium4-alpha7 Ruby bindings and above
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
driver.get 'https://example.com/'
ele = driver.find_element(:css, 'h1')

    # Takes and Stores the element screenshot in specified path
ele.save_screenshot('./image.jpg')
end
const {Builder, By} = require('selenium-webdriver');
let fs = require('fs');

(async function example() {
let driver = await new Builder()
.forBrowser('chrome')
.build();

await driver.get('https://www.example.com');
let ele = await driver.findElement(By.css("h1"));
// Captures the element screenshot
let encodedString = await ele.takeScreenshot(true);
await fs.writeFileSync('./image.png', encodedString, 'base64');
await driver.quit();
}())
import org.apache.commons.io.FileUtils
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.*
import java.io.File

fun main() {
val driver = ChromeDriver()
driver.get("https://www.example.com")
val element = driver.findElement(By.cssSelector("h1"))
val scrFile: File = element.getScreenshotAs(OutputType.FILE)
FileUtils.copyFile(scrFile, File("./image.png"))
driver.quit()
}

执行脚本

在当前frame或者窗口的上下文中,执行JavaScript代码片段.

//Creating the JavascriptExecutor interface object by Type casting
JavascriptExecutor js = (JavascriptExecutor)driver;
//Button Element
WebElement button =driver.findElement(By.name("btnLogin"));
//Executing JavaScript to click on element
js.executeScript("arguments[0].click();", button);
//Get return value from script
String text = (String) js.executeScript("return arguments[0].innerText", button);
//Executing JavaScript directly
js.executeScript("console.log('hello world')");
    # Stores the header element
header = driver.find_element(By.CSS_SELECTOR, "h1")

    # Executing JavaScript to capture innerText of header element
driver.execute_script('return arguments[0].innerText', header)
//creating Chromedriver instance
	IWebDriver driver = new ChromeDriver();
	//Creating the JavascriptExecutor interface object by Type casting
	IJavaScriptExecutor js = (IJavaScriptExecutor) driver;
	//Button Element
	IWebElement button = driver.FindElement(By.Name("btnLogin"));
	//Executing JavaScript to click on element
	js.ExecuteScript("arguments[0].click();", button);
	//Get return value from script
	String text = (String)js.ExecuteScript("return arguments[0].innerText", button);
	//Executing JavaScript directly
	js.ExecuteScript("console.log('hello world')");
    # Stores the header element
header = driver.find_element(css: 'h1')

    # Get return value from script
result = driver.execute_script("return arguments[0].innerText", header)

    # Executing JavaScript directly
driver.execute_script("alert('hello world')")
// Stores the header element
let header = await driver.findElement(By.css('h1'));

// Executing JavaScript to capture innerText of header element
let text = await driver.executeScript('return arguments[0].innerText', header);
// Stores the header element
val header = driver.findElement(By.cssSelector("h1"))

// Get return value from script
val result = driver.executeScript("return arguments[0].innerText", header)

// Executing JavaScript directly
driver.executeScript("alert('hello world')")

打印页面

打印当前浏览器内的页面

注意: 此功能需要无头模式下的Chromium浏览器

import org.openqa.selenium.print.PrintOptions;

driver.get("https://www.selenium.dev");
printer = (PrintsPage) driver;

PrintOptions printOptions = new PrintOptions();
printOptions.setPageRanges("1-2");

Pdf pdf = printer.print(printOptions);
String content = pdf.getContent();
from selenium.webdriver.common.print_page_options import PrintOptions

    print_options = PrintOptions()
    print_options.page_ranges = ['1-2']

    driver.get("printPage.html")

    base64code = driver.print_page(print_options)
// code sample not available please raise a PR
driver.navigate_to 'https://www.selenium.dev'

    base64encodedContent = driver.print_page(orientation: 'landscape')
const {Builder} = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
let opts = new chrome.Options();
let fs = require('fs');
(async function example() {
let driver = new Builder()
.forBrowser('chrome')
.setChromeOptions(opts.headless())
.build();
await driver.get('https://www.selenium.dev');
try {
let base64 = await driver.printPage({pageRanges:["1-2"]});
await fs.writeFileSync('./test.pdf', base64, 'base64');
} catch (e) {
console.log(e)
}
await driver.quit();
driver.get("https://www.selenium.dev")
val printer = driver as PrintsPage

val printOptions = PrintOptions()
printOptions.setPageRanges("1-2")

val pdf: Pdf = printer.print(printOptions)
val content = pdf.content

2.6.6 - 虚拟身份验证器

一种Web身份验证器模型的表示形式.

Web 应用程序可以启用基于公钥的身份验证机制(称为 Web 身份验证)以无密码方式对用户进行身份验证。 Web 身份验证 定义了允许用户创建公钥凭据并将其注册到身份验证器的 API。 身份验证器可以是硬件设备或软件实体,用于存储用户的公钥凭证并根据请求检索它们。

顾名思义,虚拟身份验证器模拟此类身份验证器进行测试。

虚拟身份验证器选项

虚拟身份验证器具有 一组属性。 这些属性在 Selenium 绑定中映射为 VirtualAuthenticatorOptions。

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setIsUserVerified(true)
      .setHasUserVerification(true)
      .setIsUserConsenting(true)
      .setTransport(VirtualAuthenticatorOptions.Transport.USB)
      .setProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
      .setHasResidentKey(false);
            // Create virtual authenticator options
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetIsUserVerified(true)
                .SetHasUserVerification(true)
                .SetIsUserConsenting(true)
                .SetTransport(VirtualAuthenticatorOptions.Transport.USB)
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);
    it('Virtual options', async function () {
      options = new VirtualAuthenticatorOptions();
      options.setIsUserVerified(true);
      options.setHasUserVerification(true);
      options.setIsUserConsenting(true);
      options.setTransport(Transport['USB']);
      options.setProtocol(Protocol['U2F']);

添加虚拟身份验证器

它使用提供的属性创建一个新的虚拟身份验证器。

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
      .setHasResidentKey(false);

    VirtualAuthenticator authenticator =
      ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(options);
            // Create virtual authenticator options
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);

            // Register a virtual authenticator
            ((WebDriver)driver).AddVirtualAuthenticator(options);

            List<Credential> credentialList = ((WebDriver)driver).GetCredentials();
            options.setProtocol(Protocol['U2F']);
            options.setHasResidentKey(false);

            // Register a virtual authenticator
            await driver.addVirtualAuthenticator(options);

删除虚拟身份验证器

删除之前添加的虚拟身份验证器。

    ((HasVirtualAuthenticator) driver).removeVirtualAuthenticator(authenticator);
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);

            String virtualAuthenticatorId = ((WebDriver)driver).AddVirtualAuthenticator(options);

            ((WebDriver)driver).RemoveVirtualAuthenticator(virtualAuthenticatorId);
            await driver.addVirtualAuthenticator(options);
            await driver.removeVirtualAuthenticator();

创建永久凭据

使用给定的所需凭据 参数 创建一个永久(有状态的)凭据。

    byte[] credentialId = {1, 2, 3, 4};
    byte[] userHandle = {1};
    Credential residentCredential = Credential.createResidentCredential(
      credentialId, "localhost", rsaPrivateKey, userHandle, /*signCount=*/0);
            byte[] credentialId = { 1, 2, 3, 4 };
            byte[] userHandle = { 1 };

            Credential residentCredential = Credential.CreateResidentCredential(
              credentialId, "localhost", base64EncodedPK, userHandle, 0);

创建临时凭据

使用给定的所需凭据 参数 创建一个常驻(无状态)凭据。

    byte[] credentialId = {1, 2, 3, 4};
    Credential nonResidentCredential = Credential.createNonResidentCredential(
      credentialId, "localhost", ec256PrivateKey, /*signCount=*/0);
            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

添加凭据

向身份验证器注册凭据。

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
      .setHasResidentKey(false);

    VirtualAuthenticator authenticator = ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(options);

    byte[] credentialId = {1, 2, 3, 4};
    Credential nonResidentCredential = Credential.createNonResidentCredential(
      credentialId, "localhost", ec256PrivateKey, /*signCount=*/0);
    authenticator.addCredential(nonResidentCredential);
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);

            ((WebDriver)driver).AddVirtualAuthenticator(options);

            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

            ((WebDriver)driver).AddCredential(nonResidentCredential);

获取凭据

返回身份验证者拥有的凭据列表。

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setProtocol(VirtualAuthenticatorOptions.Protocol.CTAP2)
      .setHasResidentKey(true)
      .setHasUserVerification(true)
      .setIsUserVerified(true);
    VirtualAuthenticator authenticator = ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(options);

    byte[] credentialId = {1, 2, 3, 4};
    byte[] userHandle = {1};
    Credential residentCredential = Credential.createResidentCredential(
      credentialId, "localhost", rsaPrivateKey, userHandle, /*signCount=*/0);

    authenticator.addCredential(residentCredential);

    List<Credential> credentialList = authenticator.getCredentials();
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(Protocol.CTAP2)
                .SetHasResidentKey(true)
                .SetHasUserVerification(true)
                .SetIsUserVerified(true);

            ((WebDriver)driver).AddVirtualAuthenticator(options);

            byte[] credentialId = { 1, 2, 3, 4 };
            byte[] userHandle = { 1 };

            Credential residentCredential = Credential.CreateResidentCredential(
              credentialId, "localhost", base64EncodedPK, userHandle, 0);

            ((WebDriver)driver).AddCredential(residentCredential);

            List<Credential> credentialList = ((WebDriver)driver).GetCredentials();

删除凭据

根据传递的凭据ID从身份验证器中删除凭据。

            ((WebDriver)driver).AddVirtualAuthenticator(new VirtualAuthenticatorOptions());

            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

            ((WebDriver)driver).AddCredential(nonResidentCredential);

            ((WebDriver)driver).RemoveCredential(credentialId);
    VirtualAuthenticator authenticator =
      ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(new VirtualAuthenticatorOptions());

    byte[] credentialId = {1, 2, 3, 4};
    Credential credential = Credential.createNonResidentCredential(
      credentialId, "localhost", rsaPrivateKey, 0);

    authenticator.addCredential(credential);

    authenticator.removeCredential(credentialId);

删除所有凭据

从身份验证器中删除所有凭据。

    VirtualAuthenticator authenticator =
      ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(new VirtualAuthenticatorOptions());

    byte[] credentialId = {1, 2, 3, 4};
    Credential residentCredential = Credential.createNonResidentCredential(
      credentialId, "localhost", rsaPrivateKey, /*signCount=*/0);

    authenticator.addCredential(residentCredential);

    authenticator.removeAllCredentials();
            ((WebDriver)driver).AddVirtualAuthenticator(new VirtualAuthenticatorOptions());

            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

            ((WebDriver)driver).AddCredential(nonResidentCredential);

            ((WebDriver)driver).RemoveAllCredentials();

设置用户验证状态

设置身份验证器是模拟用户验证成功还是失败。

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setIsUserVerified(true);
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetIsUserVerified(true);

2.7 - Actions接口

用于向 Web 浏览器提供虚拟化设备输入操作的低级接口.

除了高级元素交互之外, Actions 接口 还提供了对指定输入设备 可以执行的确切操作的精细控制. Selenium为3种输入源提供了接口: 键盘设备的键输入, 鼠标, 笔或触摸设备的输入, 以及滚轮设备的滚轮输入 (在Selenium 4.2中引入). Selenium允许您构建分配给特定输入的独立操作命令, 会将他们链接在一起, 并调用关联的执行方法以一次执行它们.

Action构造器

在从遗留JSON Wire协议迁移到 新的W3C WebDriver协议的过程中, 低级的操作构建块变得特别详细. 它非常强大, 但每个输入设备都有多种使用方法, 如果您需要管理多个设备, 则负责确保他们之间的同步正确.

值得庆幸的是, 您可能不需要学习如何直接使用低级命令, 因为您可能要执行的几乎所有操作, 都已提供了相应的简便方法, 这些方法可以为您组合较低级别的命令. 请分别参阅相应的键盘, 鼠标, 滚轮 页面.

暂停

指针移动和滚轮滚动 允许用户设置操作的持续时间, 但有时您只需要在操作之间等待一下, 即可正常工作.

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .moveToElement(clickable)
                .pause(Duration.ofSeconds(1))
                .clickAndHold()
                .pause(Duration.ofSeconds(1))
                .sendKeys("abc")
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .move_to_element(clickable)\
        .pause(1)\
        .click_and_hold()\
        .pause(1)\
        .send_keys("abc")\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .MoveToElement(clickable)
                .Pause(TimeSpan.FromSeconds(1))
                .ClickAndHold()
                .Pause(TimeSpan.FromSeconds(1))
                .SendKeys("abc")
                .Perform();
    clickable = driver.find_element(id: 'clickable')
    driver.action
          .move_to(clickable)
          .pause(duration: 1)
          .click_and_hold
          .pause(duration: 1)
          .send_keys('abc')
          .perform
      const clickable = await driver.findElement(By.id('clickable'))
      await driver.actions()
        .move({ origin: clickable })
        .pause(1000)
        .press()
        .pause(1000)
        .sendKeys('abc')
        .perform()
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
            .moveToElement(clickable)
            .pause(Duration.ofSeconds(1))
            .clickAndHold()
            .pause(Duration.ofSeconds(1))
            .sendKeys("abc")
            .perform() 

释放所有Actions

需要注意的重要一点是, 驱动程序会记住整个会话中所有输入项的状态. 即使创建actions类的新实例, 按下的键和指针的位置 也将处于以前执行的操作离开它们的任何状态.

有一种特殊的方法来释放所有当前按下的键和指针按钮. 此方法在每种语言中的实现方式不同, 因为它不会使用perform方法执行.

        ((RemoteWebDriver) driver).resetInputState();
    ActionBuilder(driver).clear_actions()
            ((WebDriver)driver).ResetInputState();
    driver.action.release_actions
      await driver.actions().clear()
        (driver as RemoteWebDriver).resetInputState()

2.7.1 - 键盘操作

一种适用于任何与网页交互的按键输入设备的表现形式.

只有 2 个操作可以使用键盘完成: 按下某个键,以及释放一个按下的键. 除了支持 ASCII 字符外,每个键盘按键还具有 可以按特定顺序按下或释放的表现形式.

按键

除了由常规unicode表示的按键, 其他键盘按键被分配了一些unicode值以用于操作Selenium 每种语言都有自己的方式来援引这些按键; 这里 可以找到完整列表

按下按键

        new Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .perform();
    ActionChains(driver)\
        .key_down(Keys.SHIFT)\
        .send_keys("abc")\
        .perform()
                .KeyDown(Keys.Shift)
                .SendKeys("a")
                .Perform();
    driver.action
          .key_down(:shift)
          .send_keys('a')
          .perform
      await driver.actions()
        .keyDown(Key.SHIFT)
        .sendKeys('a')
        .perform()
        Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .perform()

释放按键

        new Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .keyUp(Keys.SHIFT)
                .sendKeys("b")
                .perform();
    ActionChains(driver)\
        .key_down(Keys.SHIFT)\
        .send_keys("a")\
        .key_up(Keys.SHIFT)\
        .send_keys("b")\
        .perform()
            new Actions(driver)
                .KeyDown(Keys.Shift)
                .SendKeys("a")
                .KeyUp(Keys.Shift)
                .SendKeys("b")
                .Perform();
    driver.action
          .key_down(:shift)
          .send_keys('a')
          .key_up(:shift)
          .send_keys('b')
          .perform
      await driver.actions()
        .keyDown(Key.SHIFT)
        .sendKeys('a')
        .keyUp(Key.SHIFT)
        .sendKeys('b')
        .perform()
        Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .keyUp(Keys.SHIFT)
                .sendKeys("b")
                .perform()

键入

这是Actions API的一种便捷方法, 它将 keyDown 和 keyUp 命令组合在一个操作中. 执行此命令与使用 element 方法略有不同, 但这主要用于,需要在其他操作之间键入多个字符时使用.

活跃元素

        new Actions(driver)
                .sendKeys("abc")
                .perform();
    ActionChains(driver)\
        .send_keys("abc")\
        .perform()

            new Actions(driver)
                .SendKeys("abc")
    driver.action
          .send_keys('abc')
          .perform
      const textField = driver.findElement(By.id('textInput'))
      await textField.click()
        Actions(driver)
                .sendKeys("abc")
                .perform()

指定元素

        new Actions(driver)
                .sendKeys(textField, "Selenium!")
                .perform();
    text_input = driver.find_element(By.ID, "textInput")
    ActionChains(driver)\
        .send_keys_to_element(text_input, "abc")\
        .perform()
            driver.FindElement(By.TagName("body")).Click();
            
            IWebElement textField = driver.FindElement(By.Id("textInput"));
            new Actions(driver)
    text_field = driver.find_element(id: 'textInput')
    driver.action
          .send_keys(text_field, 'Selenium!')
          .perform
      const textField = await driver.findElement(By.id('textInput'))

      await driver.actions()
        .sendKeys(textField, 'abc')
        .perform()
        val textField = driver.findElement(By.id("textInput"))
        Actions(driver)
                .sendKeys(textField, "Selenium!")
                .perform()

复制粘贴

下面是使用上述所有方法执行复制/粘贴操作的示例. 请注意, 用于此操作的键位会有所不同, 具体取决于它是否是 Mac OS. 此代码将以文本收尾: SeleniumSelenium!

        Keys cmdCtrl = Platform.getCurrent().is(Platform.MAC) ? Keys.COMMAND : Keys.CONTROL;

        WebElement textField = driver.findElement(By.id("textInput"));
        new Actions(driver)
                .sendKeys(textField, "Selenium!")
                .sendKeys(Keys.ARROW_LEFT)
                .keyDown(Keys.SHIFT)
                .sendKeys(Keys.ARROW_UP)
                .keyUp(Keys.SHIFT)
                .keyDown(cmdCtrl)
                .sendKeys("xvv")
                .keyUp(cmdCtrl)
                .perform();

        Assertions.assertEquals("SeleniumSelenium!", textField.getAttribute("value"));
    cmd_ctrl = Keys.COMMAND if sys.platform == 'darwin' else Keys.CONTROL

    ActionChains(driver)\
        .send_keys("Selenium!")\
        .send_keys(Keys.ARROW_LEFT)\
        .key_down(Keys.SHIFT)\
        .send_keys(Keys.ARROW_UP)\
        .key_up(Keys.SHIFT)\
        .key_down(cmd_ctrl)\
        .send_keys("xvv")\
        .key_up(cmd_ctrl)\
        .perform()

            var capabilities = ((WebDriver)driver).Capabilities;
            String platformName = (string)capabilities.GetCapability("platformName");

            String cmdCtrl = platformName.Contains("mac") ? Keys.Command : Keys.Control;

            new Actions(driver)
                .SendKeys("Selenium!")
                .SendKeys(Keys.ArrowLeft)
                .KeyDown(Keys.Shift)
                .SendKeys(Keys.ArrowUp)
    cmd_ctrl = driver.capabilities.platform_name.include?('mac') ? :command : :control
    driver.action
          .send_keys('Selenium!')
          .send_keys(:arrow_left)
          .key_down(:shift)
          .send_keys(:arrow_up)
          .key_up(:shift)
          .key_down(cmd_ctrl)
          .send_keys('xvv')
          .key_up(cmd_ctrl)
          .perform
      const cmdCtrl = platform.includes('darwin') ? Key.COMMAND : Key.CONTROL

      await driver.actions()
        .click(textField)
        .sendKeys('Selenium!')
        .sendKeys(Key.ARROW_LEFT)
        .keyDown(Key.SHIFT)
        .sendKeys(Key.ARROW_UP)
        .keyUp(Key.SHIFT)
        .keyDown(cmdCtrl)
        .sendKeys('xvv')
        .keyUp(cmdCtrl)
        .perform()
        val cmdCtrl = if(platformName == Platform.MAC) Keys.COMMAND else Keys.CONTROL

        val textField = driver.findElement(By.id("textInput"))
        Actions(driver)
                .sendKeys(textField, "Selenium!")
                .sendKeys(Keys.ARROW_LEFT)
                .keyDown(Keys.SHIFT)
                .sendKeys(Keys.ARROW_UP)
                .keyUp(Keys.SHIFT)
                .keyDown(cmdCtrl)
                .sendKeys("xvv")
                .keyUp(cmdCtrl)
                .perform()

2.7.2 - Mouse actions

A representation of any pointer device for interacting with a web page.

There are only 3 actions that can be accomplished with a mouse: pressing down on a button, releasing a pressed button, and moving the mouse. Selenium provides convenience methods that combine these actions in the most common ways.

Click and hold

This method combines moving the mouse to the center of an element with pressing the left mouse button. This is useful for focusing a specific element:

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .clickAndHold(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .click_and_hold(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .ClickAndHold(clickable)
                .Perform();
    clickable = driver.find_element(id: 'clickable')
    driver.action
          .click_and_hold(clickable)
          .perform
      let clickable = driver.findElement(By.id("clickable"));
      const actions = driver.actions({async: true});
      await actions.move({origin: clickable}).press().perform();
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
                .clickAndHold(clickable)
                .perform()

Click and release

This method combines moving to the center of an element with pressing and releasing the left mouse button. This is otherwise known as “clicking”:

        WebElement clickable = driver.findElement(By.id("click"));
        new Actions(driver)
                .click(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "click")
    ActionChains(driver)\
        .click(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("click"));
            new Actions(driver)
                .Click(clickable)
                .Perform();
    clickable = driver.find_element(id: 'click')
    driver.action
          .click(clickable)
          .perform
      let click = driver.findElement(By.id("click"));
      const actions = driver.actions({async: true});
      await actions.move({origin: click}).click().perform();
        val clickable = driver.findElement(By.id("click"))
        Actions(driver)
                .click(clickable)
                .perform()

Alternate Button Clicks

There are a total of 5 defined buttons for a Mouse:

  • 0 — Left Button (the default)
  • 1 — Middle Button (currently unsupported)
  • 2 — Right Button
  • 3 — X1 (Back) Button
  • 4 — X2 (Forward) Button

Context Click

This method combines moving to the center of an element with pressing and releasing the right mouse button (button 2). This is otherwise known as “right-clicking”:

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .contextClick(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .context_click(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .ContextClick(clickable)
                .Perform();
      clickable = driver.find_element(id: 'clickable')
      driver.action
            .context_click(clickable)
            .perform
      const clickable = driver.findElement(By.id("clickable"));
      const actions = driver.actions({async: true});
      await actions.contextClick(clickable).perform();
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
                .contextClick(clickable)
                .perform()

Back Click

There is no convenience method for this, it is just pressing and releasing mouse button 3

        PointerInput mouse = new PointerInput(PointerInput.Kind.MOUSE, "default mouse");

        Sequence actions = new Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.BACK.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.BACK.asArg()));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actions));
    action = ActionBuilder(driver)
    action.pointer_action.pointer_down(MouseButton.BACK)
    action.pointer_action.pointer_up(MouseButton.BACK)
    action.perform()
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice mouse = new PointerInputDevice(PointerKind.Mouse, "default mouse");
            actionBuilder.AddAction(mouse.CreatePointerDown(MouseButton.Back));
            actionBuilder.AddAction(mouse.CreatePointerUp(MouseButton.Back));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
      driver.action
            .pointer_down(:back)
            .pointer_up(:back)
            .perform
      const actions = driver.actions({async: true});
      await actions.press(Button.BACK).release(Button.BACK).perform()
        val mouse = PointerInput(PointerInput.Kind.MOUSE, "default mouse")

        val actions = Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.BACK.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.BACK.asArg()))

        (driver as RemoteWebDriver).perform(Collections.singletonList(actions))

Forward Click

There is no convenience method for this, it is just pressing and releasing mouse button 4

        PointerInput mouse = new PointerInput(PointerInput.Kind.MOUSE, "default mouse");

        Sequence actions = new Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.FORWARD.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.FORWARD.asArg()));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actions));
    action = ActionBuilder(driver)
    action.pointer_action.pointer_down(MouseButton.FORWARD)
    action.pointer_action.pointer_up(MouseButton.FORWARD)
    action.perform()
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice mouse = new PointerInputDevice(PointerKind.Mouse, "default mouse");
            actionBuilder.AddAction(mouse.CreatePointerDown(MouseButton.Forward));
            actionBuilder.AddAction(mouse.CreatePointerUp(MouseButton.Forward));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
      driver.action
            .pointer_down(:forward)
            .pointer_up(:forward)
            .perform
      const actions = driver.actions({async: true});
      await actions.press(Button.FORWARD).release(Button.FORWARD).perform()
        val mouse = PointerInput(PointerInput.Kind.MOUSE, "default mouse")

        val actions = Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.FORWARD.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.FORWARD.asArg()))

        (driver as RemoteWebDriver).perform(Collections.singletonList(actions))

Double click

This method combines moving to the center of an element with pressing and releasing the left mouse button twice.

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .doubleClick(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .double_click(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .DoubleClick(clickable)
                .Perform();
    clickable = driver.find_element(id: 'clickable')
    driver.action
          .double_click(clickable)
          .perform
      const clickable = driver.findElement(By.id("clickable"));
      const actions = driver.actions({async: true});
      await actions.doubleClick(clickable).perform();
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
                .doubleClick(clickable)
                .perform()

Move to element

This method moves the mouse to the in-view center point of the element. This is otherwise known as “hovering.” Note that the element must be in the viewport or else the command will error.

        WebElement hoverable = driver.findElement(By.id("hover"));
        new Actions(driver)
                .moveToElement(hoverable)
                .perform();
    hoverable = driver.find_element(By.ID, "hover")
    ActionChains(driver)\
        .move_to_element(hoverable)\
        .perform()
            IWebElement hoverable = driver.FindElement(By.Id("hover"));
            new Actions(driver)
                .MoveToElement(hoverable)
                .Perform();
    hoverable = driver.find_element(id: 'hover')
    driver.action
          .move_to(hoverable)
          .perform
      const hoverable = driver.findElement(By.id("hover"));
      const actions = driver.actions({async: true});
      await actions.move({origin: hoverable}).perform();
        val hoverable = driver.findElement(By.id("hover"))
        Actions(driver)
                .moveToElement(hoverable)
                .perform()

Move by offset

These methods first move the mouse to the designated origin and then by the number of pixels in the provided offset. Note that the position of the mouse must be in the viewport or else the command will error.

Offset from Element

This method moves the mouse to the in-view center point of the element, then moves by the provided offset.

        WebElement tracker = driver.findElement(By.id("mouse-tracker"));
        new Actions(driver)
                .moveToElement(tracker, 8, 0)
                .perform();
    mouse_tracker = driver.find_element(By.ID, "mouse-tracker")
    ActionChains(driver)\
        .move_to_element_with_offset(mouse_tracker, 8, 0)\
        .perform()
            IWebElement tracker = driver.FindElement(By.Id("mouse-tracker"));
            new Actions(driver)
                .MoveToElement(tracker, 8, 0)
                .Perform();
      mouse_tracker = driver.find_element(id: 'mouse-tracker')
      driver.action
            .move_to(mouse_tracker, 8, 11)
            .perform
        val tracker = driver.findElement(By.id("mouse-tracker"))
        Actions(driver)
                .moveToElement(tracker, 8, 0)
                .perform()

Offset from Viewport

This method moves the mouse from the upper left corner of the current viewport by the provided offset.

        PointerInput mouse = new PointerInput(PointerInput.Kind.MOUSE, "default mouse");

        Sequence actions = new Sequence(mouse, 0)
                .addAction(mouse.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), 8, 12));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actions));
    action = ActionBuilder(driver)
    action.pointer_action.move_to_location(8, 0)
    action.perform()
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice mouse = new PointerInputDevice(PointerKind.Mouse, "default mouse");
            actionBuilder.AddAction(mouse.CreatePointerMove(CoordinateOrigin.Viewport,
                8, 0, TimeSpan.Zero));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
      driver.action
            .move_to_location(8, 12)
            .perform
      const actions = driver.actions({async: true});
      await actions.move({x: 8, y: 0}).perform();
        val mouse = PointerInput(PointerInput.Kind.MOUSE, "default mouse")

        val actions = Sequence(mouse, 0)
                .addAction(mouse.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), 8, 12))

        (driver as RemoteWebDriver).perform(Collections.singletonList(actions))

Offset from Current Pointer Location

This method moves the mouse from its current position by the offset provided by the user. If the mouse has not previously been moved, the position will be in the upper left corner of the viewport. Note that the pointer position does not change when the page is scrolled.

Note that the first argument X specifies to move right when positive, while the second argument Y specifies to move down when positive. So moveByOffset(30, -10) moves right 30 and up 10 from the current mouse position.

        new Actions(driver)
                .moveByOffset(13, 15)
                .perform();
    ActionChains(driver)\
        .move_by_offset( 13, 15)\
        .perform()
            new Actions(driver)
                .MoveByOffset(13, 15)
                .Perform();
      driver.action
            .move_by(13, 15)
            .perform
      await actions.move({x: 13, y: 15, origin: Origin.POINTER}).perform()
        Actions(driver)
                .moveByOffset(13, 15)
                .perform()

Drag and Drop on Element

This method firstly performs a click-and-hold on the source element, moves to the location of the target element and then releases the mouse.

        WebElement draggable = driver.findElement(By.id("draggable"));
        WebElement droppable = driver.findElement(By.id("droppable"));
        new Actions(driver)
                .dragAndDrop(draggable, droppable)
                .perform();
    draggable = driver.find_element(By.ID, "draggable")
    droppable = driver.find_element(By.ID, "droppable")
    ActionChains(driver)\
        .drag_and_drop(draggable, droppable)\
        .perform()
            IWebElement draggable = driver.FindElement(By.Id("draggable"));
            IWebElement droppable = driver.FindElement(By.Id("droppable"));
            new Actions(driver)
                .DragAndDrop(draggable, droppable)
                .Perform();
    draggable = driver.find_element(id: 'draggable')
    droppable = driver.find_element(id: 'droppable')
    driver.action
          .drag_and_drop(draggable, droppable)
          .perform
      const draggable = driver.findElement(By.id("draggable"));
      const droppable = await driver.findElement(By.id("droppable"));
      const actions = driver.actions({async: true});
      await actions.dragAndDrop(draggable, droppable).perform();
        val draggable = driver.findElement(By.id("draggable"))
        val droppable = driver.findElement(By.id("droppable"))
        Actions(driver)
                .dragAndDrop(draggable, droppable)
                .perform()

Drag and Drop by Offset

This method firstly performs a click-and-hold on the source element, moves to the given offset and then releases the mouse.

        WebElement draggable = driver.findElement(By.id("draggable"));
        Rectangle start = draggable.getRect();
        Rectangle finish = driver.findElement(By.id("droppable")).getRect();
        new Actions(driver)
                .dragAndDropBy(draggable, finish.getX() - start.getX(), finish.getY() - start.getY())
                .perform();
    draggable = driver.find_element(By.ID, "draggable")
    start = draggable.location
    finish = driver.find_element(By.ID, "droppable").location
    ActionChains(driver)\
        .drag_and_drop_by_offset(draggable, finish['x'] - start['x'], finish['y'] - start['y'])\
        .perform()
            IWebElement draggable = driver.FindElement(By.Id("draggable"));
            Point start = draggable.Location;
            Point finish = driver.FindElement(By.Id("droppable")).Location;
            new Actions(driver)
                .DragAndDropToOffset(draggable, finish.X - start.X, finish.Y - start.Y)
                .Perform();
    draggable = driver.find_element(id: 'draggable')
    start = draggable.rect
    finish = driver.find_element(id: 'droppable').rect
    driver.action
          .drag_and_drop_by(draggable, finish.x - start.x, finish.y - start.y)
          .perform
      const draggable = driver.findElement(By.id("draggable"));
      let start = await draggable.getRect();
      let finish = await driver.findElement(By.id("droppable")).getRect();
      const actions = driver.actions({async: true});
      await actions.dragAndDrop(draggable, {x: finish.x - start.x, y: finish.y - start.y}).perform();
        val draggable = driver.findElement(By.id("draggable"))
        val start = draggable.getRect()
        val finish = driver.findElement(By.id("droppable")).getRect()
        Actions(driver)
                .dragAndDropBy(draggable, finish.getX() - start.getX(), finish.getY() - start.getY())
                .perform()

2.7.3 - Pen actions

A representation of a pen stylus kind of pointer input for interacting with a web page.

Chromium Only

A Pen is a type of pointer input that has most of the same behavior as a mouse, but can also have event properties unique to a stylus. Additionally, while a mouse has 5 buttons, a pen has 3 equivalent button states:

  • 0 — Touch Contact (the default; equivalent to a left click)
  • 2 — Barrel Button (equivalent to a right click)
  • 5 — Eraser Button (currently unsupported by drivers)

Using a Pen

        WebElement pointerArea = driver.findElement(By.id("pointerArea"));
        new Actions(driver)
                .setActivePointer(PointerInput.Kind.PEN, "default pen")
                .moveToElement(pointerArea)
                .clickAndHold()
                .moveByOffset(2, 2)
                .release()
                .perform();
    pointer_area = driver.find_element(By.ID, "pointerArea")
    pen_input = PointerInput(POINTER_PEN, "default pen")
    action = ActionBuilder(driver, mouse=pen_input)
    action.pointer_action\
        .move_to(pointer_area)\
        .pointer_down()\
        .move_by(2, 2)\
        .pointer_up()
    action.perform()
            IWebElement pointerArea = driver.FindElement(By.Id("pointerArea"));
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice pen = new PointerInputDevice(PointerKind.Pen, "default pen");
            
            actionBuilder.AddAction(pen.CreatePointerMove(pointerArea, 0, 0, TimeSpan.FromMilliseconds(800)));
            actionBuilder.AddAction(pen.CreatePointerDown(MouseButton.Left));
            actionBuilder.AddAction(pen.CreatePointerMove(CoordinateOrigin.Pointer,
                2, 2, TimeSpan.Zero));
            actionBuilder.AddAction(pen.CreatePointerUp(MouseButton.Left));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
    pointer_area = driver.find_element(id: 'pointerArea')
    driver.action(devices: :pen)
          .move_to(pointer_area)
          .pointer_down
          .move_by(2, 2)
          .pointer_up
          .perform
        val pointerArea = driver.findElement(By.id("pointerArea"))
        Actions(driver)
                .setActivePointer(PointerInput.Kind.PEN, "default pen")
                .moveToElement(pointerArea)
                .clickAndHold()
                .moveByOffset(2, 2)
                .release()
                .perform()

Adding Pointer Event Attributes

        WebElement pointerArea = driver.findElement(By.id("pointerArea"));
        PointerInput pen = new PointerInput(PointerInput.Kind.PEN, "default pen");
        PointerInput.PointerEventProperties eventProperties = PointerInput.eventProperties()
                .setTiltX(-72)
                .setTiltY(9)
                .setTwist(86);
        PointerInput.Origin origin = PointerInput.Origin.fromElement(pointerArea);

        Sequence actionListPen = new Sequence(pen, 0)
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 0, 0))
                .addAction(pen.createPointerDown(0))
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 2, 2, eventProperties))
                .addAction(pen.createPointerUp(0));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actionListPen));
    pointer_area = driver.find_element(By.ID, "pointerArea")
    pen_input = PointerInput(POINTER_PEN, "default pen")
    action = ActionBuilder(driver, mouse=pen_input)
    action.pointer_action\
        .move_to(pointer_area)\
        .pointer_down()\
        .move_by(2, 2, tilt_x=-72, tilt_y=9, twist=86)\
        .pointer_up(0)
    action.perform()
            IWebElement pointerArea = driver.FindElement(By.Id("pointerArea"));
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice pen = new PointerInputDevice(PointerKind.Pen, "default pen");
            PointerInputDevice.PointerEventProperties properties = new PointerInputDevice.PointerEventProperties() {
                TiltX = -72,
                TiltY = 9,
                Twist = 86,
            };            
            actionBuilder.AddAction(pen.CreatePointerMove(pointerArea, 0, 0, TimeSpan.FromMilliseconds(800)));
            actionBuilder.AddAction(pen.CreatePointerDown(MouseButton.Left));
            actionBuilder.AddAction(pen.CreatePointerMove(CoordinateOrigin.Pointer,
                2, 2, TimeSpan.Zero, properties));
            actionBuilder.AddAction(pen.CreatePointerUp(MouseButton.Left));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
    pointer_area = driver.find_element(id: 'pointerArea')
    driver.action(devices: :pen)
          .move_to(pointer_area)
          .pointer_down
          .move_by(2, 2, tilt_x: -72, tilt_y: 9, twist: 86)
          .pointer_up
          .perform
        val pointerArea = driver.findElement(By.id("pointerArea"))
        val pen = PointerInput(PointerInput.Kind.PEN, "default pen")
        val eventProperties = PointerInput.eventProperties()
                .setTiltX(-72)
                .setTiltY(9)
                .setTwist(86)
        val origin = PointerInput.Origin.fromElement(pointerArea)
        
        val actionListPen = Sequence(pen, 0)
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 0, 0))
                .addAction(pen.createPointerDown(0))
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 2, 2, eventProperties))
                .addAction(pen.createPointerUp(0))

        (driver as RemoteWebDriver).perform(listOf(actionListPen))

2.7.4 - Scroll wheel actions

A representation of a scroll wheel input device for interacting with a web page.

Chromium Only

There are 5 scenarios for scrolling on a page.

Scroll to element

This is the most common scenario. Unlike traditional click and send keys methods, the actions class does not automatically scroll the target element into view, so this method will need to be used if elements are not already inside the viewport.

This method takes a web element as the sole argument.

Regardless of whether the element is above or below the current viewscreen, the viewport will be scrolled so the bottom of the element is at the bottom of the screen.

        WebElement iframe = driver.findElement(By.tagName("iframe"));
        new Actions(driver)
                .scrollToElement(iframe)
                .perform();
    iframe = driver.find_element(By.TAG_NAME, "iframe")
    ActionChains(driver)\
        .scroll_to_element(iframe)\
        .perform()
            IWebElement iframe = driver.FindElement(By.TagName("iframe"));
            new Actions(driver)
                .ScrollToElement(iframe)
                .Perform();
    iframe = driver.find_element(tag_name: 'iframe')
    driver.action
          .scroll_to(iframe)
          .perform
      const iframe = await driver.findElement(By.css("iframe"))

      await driver.actions()
        .scroll(0, 0, 0, 0, iframe)
        .perform()
        val iframe = driver.findElement(By.tagName("iframe"))
        Actions(driver)
                .scrollToElement(iframe)
                .perform()

Scroll by given amount

This is the second most common scenario for scrolling. Pass in an delta x and a delta y value for how much to scroll in the right and down directions. Negative values represent left and up, respectively.

        WebElement footer = driver.findElement(By.tagName("footer"));
        int deltaY = footer.getRect().y;
        new Actions(driver)
                .scrollByAmount(0, deltaY)
                .perform();
    footer = driver.find_element(By.TAG_NAME, "footer")
    delta_y = footer.rect['y']
    ActionChains(driver)\
        .scroll_by_amount(0, delta_y)\
        .perform()
            IWebElement footer = driver.FindElement(By.TagName("footer"));
            int deltaY = footer.Location.Y;
            new Actions(driver)
                .ScrollByAmount(0, deltaY)
                .Perform();
    footer = driver.find_element(tag_name: 'footer')
    delta_y = footer.rect.y
    driver.action
          .scroll_by(0, delta_y)
          .perform
      const footer = await driver.findElement(By.css("footer"))
      const deltaY = (await footer.getRect()).y

      await driver.actions()
        .scroll(0, 0, 0, deltaY)
        .perform()
        val footer = driver.findElement(By.tagName("footer"))
        val deltaY = footer.getRect().y
        Actions(driver)
                .scrollByAmount(0, deltaY)
                .perform()

Scroll from an element by a given amount

This scenario is effectively a combination of the above two methods.

To execute this use the “Scroll From” method, which takes 3 arguments. The first represents the origination point, which we designate as the element, and the second two are the delta x and delta y values.

If the element is out of the viewport, it will be scrolled to the bottom of the screen, then the page will be scrolled by the provided delta x and delta y values.

        WebElement iframe = driver.findElement(By.tagName("iframe"));
        WheelInput.ScrollOrigin scrollOrigin = WheelInput.ScrollOrigin.fromElement(iframe);
        new Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform();
    iframe = driver.find_element(By.TAG_NAME, "iframe")
    scroll_origin = ScrollOrigin.from_element(iframe)
    ActionChains(driver)\
        .scroll_from_origin(scroll_origin, 0, 200)\
        .perform()
            IWebElement iframe = driver.FindElement(By.TagName("iframe"));
            WheelInputDevice.ScrollOrigin scrollOrigin = new WheelInputDevice.ScrollOrigin
            {
                Element = iframe
            };
            new Actions(driver)
                .ScrollFromOrigin(scrollOrigin, 0, 200)
                .Perform();
    iframe = driver.find_element(tag_name: 'iframe')
    scroll_origin = Selenium::WebDriver::WheelActions::ScrollOrigin.element(iframe)
    driver.action
          .scroll_from(scroll_origin, 0, 200)
          .perform
      const iframe = await driver.findElement(By.css("iframe"))

      await driver.actions()
        .scroll(0, 0, 0, 200, iframe)
        .perform()
        val iframe = driver.findElement(By.tagName("iframe"))
        val scrollOrigin = WheelInput.ScrollOrigin.fromElement(iframe)
        Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform()

Scroll from an element with an offset

This scenario is used when you need to scroll only a portion of the screen, and it is outside the viewport. Or is inside the viewport and the portion of the screen that must be scrolled is a known offset away from a specific element.

This uses the “Scroll From” method again, and in addition to specifying the element, an offset is specified to indicate the origin point of the scroll. The offset is calculated from the center of the provided element.

If the element is out of the viewport, it first will be scrolled to the bottom of the screen, then the origin of the scroll will be determined by adding the offset to the coordinates of the center of the element, and finally the page will be scrolled by the provided delta x and delta y values.

Note that if the offset from the center of the element falls outside of the viewport, it will result in an exception.

        WebElement footer = driver.findElement(By.tagName("footer"));
        WheelInput.ScrollOrigin scrollOrigin = WheelInput.ScrollOrigin.fromElement(footer, 0, -50);
        new Actions(driver)
                .scrollFromOrigin(scrollOrigin,0, 200)
                .perform();
    footer = driver.find_element(By.TAG_NAME, "footer")
    scroll_origin = ScrollOrigin.from_element(footer, 0, -50)
    ActionChains(driver)\
        .scroll_from_origin(scroll_origin, 0, 200)\
        .perform()
            IWebElement footer = driver.FindElement(By.TagName("footer"));
            var scrollOrigin = new WheelInputDevice.ScrollOrigin
            {
                Element = footer,
                XOffset = 0,
                YOffset = -50
            };
            new Actions(driver)
                .ScrollFromOrigin(scrollOrigin, 0, 200)
                .Perform();
    footer = driver.find_element(tag_name: 'footer')
    scroll_origin = Selenium::WebDriver::WheelActions::ScrollOrigin.element(footer, 0, -50)
    driver.action
          .scroll_from(scroll_origin, 0, 200)
          .perform
      const footer = await driver.findElement(By.css("footer"))

      await driver.actions()
        .scroll(0, -50, 0, 200, footer)
        .perform()
        val footer = driver.findElement(By.tagName("footer"))
        val scrollOrigin = WheelInput.ScrollOrigin.fromElement(footer, 0, -50)
        Actions(driver)
                .scrollFromOrigin(scrollOrigin,0, 200)
                .perform()

Scroll from a offset of origin (element) by given amount

The final scenario is used when you need to scroll only a portion of the screen, and it is already inside the viewport.

This uses the “Scroll From” method again, but the viewport is designated instead of an element. An offset is specified from the upper left corner of the current viewport. After the origin point is determined, the page will be scrolled by the provided delta x and delta y values.

Note that if the offset from the upper left corner of the viewport falls outside of the screen, it will result in an exception.

        WheelInput.ScrollOrigin scrollOrigin = WheelInput.ScrollOrigin.fromViewport(10, 10);
        new Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform();
    scroll_origin = ScrollOrigin.from_viewport(10, 10)

    ActionChains(driver)\
        .scroll_from_origin(scroll_origin, 0, 200)\
        .perform()
            var scrollOrigin = new WheelInputDevice.ScrollOrigin
            {
                Viewport = true,
                XOffset = 10,
                YOffset = 10
            };
            new Actions(driver)
                .ScrollFromOrigin(scrollOrigin, 0, 200)
                .Perform();
    scroll_origin = Selenium::WebDriver::WheelActions::ScrollOrigin.viewport(10, 10)
    driver.action
          .scroll_from(scroll_origin, 0, 200)
          .perform
      await driver.actions()
        .scroll(10, 10, 0, 200)
        .perform()
        val scrollOrigin = WheelInput.ScrollOrigin.fromViewport(10, 10)
        Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform()

2.8 - 双向协议

Selenium正在与浏览器供应商合作创建 WebDriver双向协议 , 作为一种提供稳定的跨浏览器API的方法, 该API使用双向协议处理 各种浏览器的通用自动化以及特定测试的需求. 在此之前, 寻求此功能的用户 必须忍受当前实现的全部问题和局限.

严格限制请求响应命令的传统WebDriver模型, 将从user agent转变为基于WebSockets的软件控制, 通过这样完善流事件的能力, 以便更好地匹配浏览器DOM的事件性质.

While the specification is in works, the browser vendors are parallely implementing the WebDriver BiDirectional Protocol. Refer web-platform-tests dashboard to see how far along the browser vendors are. Selenium is trying to keep up with the browser vendors and has started implementing W3C BiDi APIs. The goal is to ensure APIs are W3C compliant and uniform among the different language bindings.

However, until the specification and corresponding Selenium implementation is complete there are many useful things that CDP offers. Selenium offers some useful helper classes that use CDP.

2.8.1 - Chrome DevTools

Many browsers provide “DevTools” – a set of tools that are integrated with the browser that developers can use to debug web apps and explore the performance of their pages. Google Chrome’s DevTools make use of a protocol called the Chrome DevTools Protocol (or “CDP” for short). As the name suggests, this is not designed for testing, nor to have a stable API, so functionality is highly dependent on the version of the browser.

The WebDriver BiDirectional Protocol is the next generation of the W3C WebDriver protocol and aims to provide a stable API implemented by all browsers, but it’s not yet complete. Until it is, Selenium provides access to the CDP for those browsers that implement it (such as Google Chrome, or Microsoft Edge, and Firefox), allowing you to enhance your tests in interesting ways. Some examples of what you can do with it are given below.

Ways to Use Chrome DevTools With Selenium

There are three different ways to access Chrome DevTools in Selenium. If you look for other examples online, you will likely see each of these mixed and matched.

  • The CDP Endpoint was the first option available to users. It only works for the most simple things (setting state, getting basic information), and you have to know the “magic strings” for the domain and methods and key value pairs. For basic requirements, this might be simpler than the other options. These methods are only temporarily supported.
  • The CDP API is an improvement on just using the endpoint because you can set do things asynchronously. Instead of a String and a Map, you can access the supported classes, methods and parameters in the code. These methods are also only temporarily supported.
  • The BiDi API option should be used whenever possible because it abstracts away the implementation details entirely and will work with either CDP or WebDriver-BiDi when Selenium moves away from CDP.

Examples With Limited Value

There are a number of commonly cited examples for using CDP that are of limited practical value.

  • Geo Location — almost all sites use the IP address to determine physical location, so setting an emulated geolocation rarely has the desired effect.
  • Overriding Device Metrics — Chrome provides a great API for setting Mobile Emulation in the Options classes, which is generally superior to attempting to do this with CDP.

Check out the examples in these documents for ways to do additional useful things:

2.8.1.1 - Chrome DevTools Protocol Endpoint

Google provides a /cdp/execute endpoint that can be accessed directly. Each Selenium binding provides a method that allows you to pass the CDP domain as a String, and the required parameters as a simple Map.

These methods will eventually be removed. It is recommended to use the WebDriver-BiDi or WebDriver Bidi APIs methods where possible to ensure future compatibility.

Usage

Generally you should prefer the use of the CDP API over this approach, but sometimes the syntax is cleaner or significantly more simple.

Limitations include:

  • It only works for use cases that are limited to setting or getting information; any actual asynchronous interactions require another implementation
  • You have to know the exactly correct “magic strings” for domains and keys
  • It is possible that an update to Chrome will change the required parameters

Examples

An alternate implementation can be found at CDP API Set Cookie

    Map<String, Object> cookie = new HashMap<>();
    cookie.put("name", "cheese");
    cookie.put("value", "gouda");
    cookie.put("domain", "www.selenium.dev");
    cookie.put("secure", true);

    ((HasCdp) driver).executeCdpCommand("Network.setCookie", cookie);
    cookie = {'name': 'cheese',
              'value': 'gouda',
              'domain': 'www.selenium.dev',
              'secure': True}

    driver.execute_cdp_cmd('Network.setCookie', cookie)
            var cookie = new Dictionary<string, object>
            {
                { "name", "cheese" },
                { "value", "gouda" },
                { "domain", "www.selenium.dev" },
                { "secure", true }
            };

            ((ChromeDriver)driver).ExecuteCdpCommand("Network.setCookie", cookie);

The CDP API Set Cookie implementation should be preferred

    cookie = {name: 'cheese',
              value: 'gouda',
              domain: 'www.selenium.dev',
              secure: true}

    driver.execute_cdp('Network.setCookie', **cookie)

Performance Metrics

An alternate implementation can be found at CDP API Performance Metrics

The CDP API Performance Metrics implementation should be preferred

    ((HasCdp) driver).executeCdpCommand("Performance.enable", new HashMap<>());

    Map<String, Object> response =
        ((HasCdp) driver).executeCdpCommand("Performance.getMetrics", new HashMap<>());
    driver.execute_cdp_cmd('Performance.enable', {})

    metric_list = driver.execute_cdp_cmd('Performance.getMetrics', {})["metrics"]
            ((ChromeDriver)driver).ExecuteCdpCommand("Performance.enable", emptyDictionary);

            Dictionary<string, object> response = (Dictionary<string, object>)((ChromeDriver)driver)
                .ExecuteCdpCommand("Performance.getMetrics", emptyDictionary);

The CDP API Performance Metrics implementation should be preferred

    driver.execute_cdp('Performance.enable')

    metric_list = driver.execute_cdp('Performance.getMetrics')['metrics']

Basic authentication

Alternate implementations can be found at CDP API Basic Authentication and BiDi API Basic Authentication

The BiDi API Basic Authentication implementation should be preferred

    ((HasCdp) driver).executeCdpCommand("Network.enable", new HashMap<>());

    String encodedAuth = Base64.getEncoder().encodeToString("admin:admin".getBytes());
    Map<String, Object> headers =
        ImmutableMap.of("headers", ImmutableMap.of("authorization", "Basic " + encodedAuth));

    ((HasCdp) driver).executeCdpCommand("Network.setExtraHTTPHeaders", headers);
    driver.execute_cdp_cmd("Network.enable", {})

    credentials = base64.b64encode("admin:admin".encode()).decode()
    headers = {'headers': {'authorization': 'Basic ' + credentials}}

    driver.execute_cdp_cmd('Network.setExtraHTTPHeaders', headers)
            ((ChromeDriver)driver).ExecuteCdpCommand("Network.enable", emptyDictionary);
            
            string encodedAuth = Convert.ToBase64String(Encoding.Default.GetBytes("admin:admin"));
            var headers = new Dictionary<string, object>
            {
                { "headers", new Dictionary<string, string> { { "authorization", "Basic " + encodedAuth } } }
            };

            ((ChromeDriver)driver).ExecuteCdpCommand("Network.setExtraHTTPHeaders", headers);

The BiDi API Basic Authentication implementation should be preferred

    driver.execute_cdp('Network.enable')

    credentials = Base64.strict_encode64('admin:admin')
    headers = {authorization: "Basic #{credentials}"}

    driver.execute_cdp('Network.setExtraHTTPHeaders', headers: headers)

2.8.1.2 - Chrome DevTools Protocol API

Each of the Selenium bindings dynamically generates classes and methods for the various CDP domains and features; these are tied to specific versions of Chrome.

While Selenium 4 provides direct access to the Chrome DevTools Protocol (CDP), these methods will eventually be removed. It is recommended to use the WebDriver Bidi APIs methods where possible to ensure future compatibility.

Usage

If your use case has been implemented by WebDriver Bidi or the BiDi API, you should use those implementations instead of this one. Generally you should prefer this approach over executing with the CDP Endpoint, especially in Ruby.

Examples

An alternate implementation can be found at CDP Endpoint Set Cookie

Because Java requires using all the parameters example, the Map approach used in CDP Endpoint Set Cookie might be more simple.

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();

    devTools.send(
        Network.setCookie(
            "cheese",
            "gouda",
            Optional.empty(),
            Optional.of("www.selenium.dev"),
            Optional.empty(),
            Optional.of(true),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty()));

Because Python requires using async methods for this example, the synchronous approach found in CDP Endpoint Set Cookie might be easier.

    async with driver.bidi_connection() as connection:
        execution = connection.devtools.network.set_cookie(
            name="cheese",
            value="gouda",
            domain="www.selenium.dev",
            secure=True
        )

        await connection.session.execute(execution)

Due to the added complexity in .NET of obtaining the domains and executing with awaits, the CDP Endpoint Set Cookie might be easier.

            var session = ((IDevTools)driver).GetDevToolsSession();
            var domains = session.GetVersionSpecificDomains<OpenQA.Selenium.DevTools.V118.DevToolsSessionDomains>();
            await domains.Network.Enable(new OpenQA.Selenium.DevTools.V118.Network.EnableCommandSettings());

            var cookieCommandSettings = new SetCookieCommandSettings
            {
                Name = "cheese",
                Value = "gouda",
                Domain = "www.selenium.dev",
                Secure = true
            };

            await domains.Network.SetCookie(cookieCommandSettings);
    driver.devtools.network.set_cookie(name: 'cheese',
                                       value: 'gouda',
                                       domain: 'www.selenium.dev',
                                       secure: true)

Performance Metrics

An alternate implementation can be found at CDP Endpoint Performance Metrics

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Performance.enable(Optional.empty()));

    List<Metric> metricList = devTools.send(Performance.getMetrics());

Because Python requires using async methods for this example, the synchronous approach found in CDP Endpoint Performance Metrics might be easier.

    async with driver.bidi_connection() as connection:
        await connection.session.execute(connection.devtools.performance.enable())

        metric_list = await connection.session.execute(connection.devtools.performance.get_metrics())

Due to the added complexity in .NET of obtaining the domains and executing with awaits, the CDP Endpoint Performance Metrics might be easier.

            var session = ((IDevTools)driver).GetDevToolsSession();
            var domains = session.GetVersionSpecificDomains<OpenQA.Selenium.DevTools.V118.DevToolsSessionDomains>();
            await domains.Performance.Enable(new OpenQA.Selenium.DevTools.V118.Performance.EnableCommandSettings());

            var metricsResponse =
                await session.SendCommand<GetMetricsCommandSettings, GetMetricsCommandResponse>(
                    new GetMetricsCommandSettings()
                );
    driver.devtools.performance.enable

    metric_list = driver.devtools.performance.get_metrics.dig('result', 'metrics')

Basic authentication

Alternate implementations can be found at CDP Endpoint Basic Authentication and BiDi API Basic Authentication

The BiDi API Basic Authentication implementation should be preferred

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Network.enable(Optional.of(100000), Optional.of(100000), Optional.of(100000)));

    String encodedAuth = Base64.getEncoder().encodeToString("admin:admin".getBytes());
    Map<String, Object> headers = ImmutableMap.of("Authorization", "Basic " + encodedAuth);

    devTools.send(Network.setExtraHTTPHeaders(new Headers(headers)));

Because Python requires using async methods for this example, the synchronous approach found in CDP Endpoint Basic Authentication might be easier.

    async with driver.bidi_connection() as connection:
        await connection.session.execute(connection.devtools.network.enable())

        credentials = base64.b64encode("admin:admin".encode()).decode()
        auth = {'authorization': 'Basic ' + credentials}

        await connection.session.execute(connection.devtools.network.set_extra_http_headers(Headers(auth)))

Due to the added complexity in .NET of obtaining the domains and executing with awaits, the CDP Endpoint Basic Authentication might be easier.

            var session = ((IDevTools)driver).GetDevToolsSession();
            var domains = session.GetVersionSpecificDomains<OpenQA.Selenium.DevTools.V118.DevToolsSessionDomains>();
            await domains.Network.Enable(new OpenQA.Selenium.DevTools.V118.Network.EnableCommandSettings());

            var encodedAuth = Convert.ToBase64String(Encoding.Default.GetBytes("admin:admin"));
            var headerSettings = new SetExtraHTTPHeadersCommandSettings
            {
                Headers = new Headers()
                {
                    { "authorization", "Basic " + encodedAuth }
                }
            };

            await domains.Network.SetExtraHTTPHeaders(headerSettings);

The BiDi API Basic Authentication implementation should be preferred

    driver.devtools.network.enable

    credentials = Base64.strict_encode64('admin:admin')

    driver.devtools.network.set_extra_http_headers(headers: {authorization: "Basic #{credentials}"})

Console logs

Because reading console logs requires setting an event listener, this cannot be done with a CDP Endpoint implementation Alternate implementations can be found at BiDi API Console logs and errors and WebDriver BiDi Console logs

Use the WebDriver BiDi Console logs implementation

    DevTools devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Runtime.enable());

    CopyOnWriteArrayList<String> logs = new CopyOnWriteArrayList<>();
    devTools.addListener(
        Runtime.consoleAPICalled(),
        event -> logs.add((String) event.getArgs().get(0).getValue().orElse("")));

The BiDi API Console logs and errors implementation should be preferred

    driver.devtools.runtime.enable

    logs = []
    driver.devtools.runtime.on(:console_api_called) do |params|
      logs << params['args'].first['value']
    end

JavaScript exceptions

Similar to console logs, but this listens for actual javascript exceptions not just logged errors Alternate implementations can be found at BiDi API JavaScript exceptions and WebDriver BiDi JavaScript exceptions

Use the WebDriver BiDi JavaScript exceptions implementation

    DevTools devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Runtime.enable());

    CopyOnWriteArrayList<JavascriptException> errors = new CopyOnWriteArrayList<>();
    devTools.getDomains().events().addJavascriptExceptionListener(errors::add);

Download complete

Wait for a download to finish before continuing. Because getting download status requires setting a listener, this cannot be done with a CDP Endpoint implementation.

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(
        Browser.setDownloadBehavior(
            Browser.SetDownloadBehaviorBehavior.ALLOWANDNAME,
            Optional.empty(),
            Optional.of(""),
            Optional.of(true)));

    AtomicBoolean completed = new AtomicBoolean(false);
    devTools.addListener(
        Browser.downloadProgress(),
        e -> completed.set(Objects.equals(e.getState().toString(), "completed")));
    driver.devtools.browser.set_download_behavior(behavior: 'allow',
                                                  download_path: '',
                                                  events_enabled: true)

    driver.devtools.browser.on(:download_progress) do |progress|
      @completed = progress['state'] == 'completed'
    end

2.8.1.3 - Chrome Devtools Protocol with BiDi API

These examples are currently implemented with CDP, but the same code should work when the functionality is re-implemented with WebDriver-BiDi.

Usage

The following list of APIs will be growing as the Selenium project works through supporting real world use cases. If there is additional functionality you’d like to see, please raise a feature request.

As these examples are re-implemented with the WebDriver-Bidi protocol, they will be moved to the WebDriver Bidi pages.

Examples

Basic authentication

Some applications make use of browser authentication to secure pages. It used to be common to handle them in the URL, but browser stopped supporting this. With BiDi, you can now provide the credentials when necessary

Alternate implementations can be found at CDP Endpoint Basic Authentication and CDP API Basic Authentication

    Predicate<URI> uriPredicate = uri -> uri.toString().contains("herokuapp.com");
    Supplier<Credentials> authentication = UsernameAndPassword.of("admin", "admin");

    ((HasAuthentication) driver).register(uriPredicate, authentication);

An alternate implementation may be found at CDP Endpoint Basic Authentication

Implementation Missing

            var handler = new NetworkAuthenticationHandler()
            {
                UriMatcher = uri => uri.AbsoluteUri.Contains("herokuapp"),
                Credentials = new PasswordCredentials("admin", "admin")
            };

            var networkInterceptor = driver.Manage().Network;
            networkInterceptor.AddAuthenticationHandler(handler);

            await networkInterceptor.StartMonitoring();
    driver.register(username: 'admin',
                    password: 'admin',
                    uri: /herokuapp/)

Move Code

const {Builder} = require('selenium-webdriver');

(async function example() {
  try {
    let driver = await new Builder()
      .forBrowser('chrome')
      .build();

    const pageCdpConnection = await driver.createCDPConnection('page');
    await driver.register('username', 'password', pageCdpConnection);
    await driver.get('https://the-internet.herokuapp.com/basic_auth');
    await driver.quit();
  }catch (e){
    console.log(e)
  }
}())

Move Code

val uriPredicate = Predicate { uri: URI ->
        uri.host.contains("your-domain.com")
    }
(driver as HasAuthentication).register(uriPredicate, UsernameAndPassword.of("admin", "password"))
driver.get("https://your-domain.com/login")

Pin scripts

This can be especially useful when executing on a remote server. For example, whenever you check the visibility of an element, or whenever you use the classic get attribute method, Selenium is sending the contents of a js file to the script execution endpoint. These files are each about 50kB, which adds up.

            var key = await new JavaScriptEngine(driver).PinScript("return arguments;");

            var arguments = ((WebDriver)driver).ExecuteScript(key, 1, true, element);
    key = driver.pin_script('return arguments;')
    arguments = driver.execute_script(key, 1, true, element)

Mutation observation

Mutation Observation is the ability to capture events via WebDriver BiDi when there are DOM mutations on a specific element in the DOM.

    CopyOnWriteArrayList<WebElement> mutations = new CopyOnWriteArrayList<>();
    ((HasLogEvents) driver).onLogEvent(domMutation(e -> mutations.add(e.getElement())));
    async with driver.bidi_connection() as session:
        log = Log(driver, session)
            var mutations = new List<IWebElement>();
            using IJavaScriptEngine monitor = new JavaScriptEngine(driver);
            monitor.DomMutated += (_, e) =>
            {
                var locator = By.CssSelector($"*[data-__webdriver_id='{e.AttributeData.TargetId}']");
                mutations.Add(driver.FindElement(locator));
            };

            await monitor.StartEventMonitoring();
            await monitor.EnableDomMutationMonitoring();
    mutations = []
    driver.on_log_event(:mutation) { |mutation| mutations << mutation.element }

Move Code

const {Builder, until} = require('selenium-webdriver');
const assert = require("assert");

(async function example() {
  try {
    let driver = await new Builder()
      .forBrowser('chrome')
      .build();

    const cdpConnection = await driver.createCDPConnection('page');
    await driver.logMutationEvents(cdpConnection, event => {
      assert.deepStrictEqual(event['attribute_name'], 'style');
      assert.deepStrictEqual(event['current_value'], "");
      assert.deepStrictEqual(event['old_value'], "display:none;");
    });

    await driver.get('dynamic.html');
    await driver.findElement({id: 'reveal'}).click();
    let revealed = driver.findElement({id: 'revealed'});
    await driver.wait(until.elementIsVisible(revealed), 5000);
    await driver.quit();
  }catch (e){
    console.log(e)
  }
}())

Console logs and errors

Listen to the console.log events and register callbacks to process the event.

CDP API Console logs and WebDriver BiDi Console logs

Use the WebDriver BiDi Console logs implementation. HasLogEvents will likely end up deprecated because it does not implement Closeable.

    CopyOnWriteArrayList<String> messages = new CopyOnWriteArrayList<>();
    ((HasLogEvents) driver).onLogEvent(consoleEvent(e -> messages.add(e.getMessages().get(0))));
    async with driver.bidi_connection() as session:
        log = Log(driver, session)

        async with log.add_listener(Console.ALL) as messages:
            using IJavaScriptEngine monitor = new JavaScriptEngine(driver);
            var messages = new List<string>();
            monitor.JavaScriptConsoleApiCalled += (_, e) =>
            {
                messages.Add(e.MessageContent);
            };

            await monitor.StartEventMonitoring();
    logs = []
    driver.on_log_event(:console) { |log| logs << log.args.first }

Move Code

const {Builder} = require('selenium-webdriver');
(async () => {
  try {
    let driver = new Builder()
      .forBrowser('chrome')
      .build();

    const cdpConnection = await driver.createCDPConnection('page');
    await driver.onLogEvent(cdpConnection, function (event) {
      console.log(event['args'][0]['value']);
    });
    await driver.executeScript('console.log("here")');
    await driver.quit();
  }catch (e){
    console.log(e);
  }
})()

Move Code

fun kotlinConsoleLogExample() {
    val driver = ChromeDriver()
    val devTools = driver.devTools
    devTools.createSession()

    val logConsole = { c: ConsoleEvent -> print("Console log message is: " + c.messages)}
    devTools.domains.events().addConsoleListener(logConsole)

    driver.get("https://www.google.com")

    val executor = driver as JavascriptExecutor
    executor.executeScript("console.log('Hello World')")

    val input = driver.findElement(By.name("q"))
    input.sendKeys("Selenium 4")
    input.sendKeys(Keys.RETURN)
    driver.quit()
}

JavaScript exceptions

Listen to the JS Exceptions and register callbacks to process the exception details.

    async with driver.bidi_connection() as session:
        log = Log(driver, session)

        async with log.add_js_error_listener() as messages:
            using IJavaScriptEngine monitor = new JavaScriptEngine(driver);
            var messages = new List<string>();
            monitor.JavaScriptExceptionThrown += (_, e) =>
            {
                messages.Add(e.Message);
            };

            await monitor.StartEventMonitoring();
    exceptions = []
    driver.on_log_event(:exception) { |exception| exceptions << exception }

Network Interception

Both requests and responses can be recorded or transformed.

Response information

    CopyOnWriteArrayList<String> contentType = new CopyOnWriteArrayList<>();

    try (NetworkInterceptor ignored =
        new NetworkInterceptor(
            driver,
            (Filter)
                next ->
                    req -> {
                      HttpResponse res = next.execute(req);
                      contentType.add(res.getHeader("Content-Type"));
                      return res;
                    })) {
            var contentType = new List<string>();

            INetwork networkInterceptor = driver.Manage().Network;
            networkInterceptor.NetworkResponseReceived += (_, e)  =>
            {
                contentType.Add(e.ResponseHeaders["content-type"]);
            };

            await networkInterceptor.StartMonitoring();
    content_type = []
    driver.intercept do |request, &continue|
      continue.call(request) do |response|
        content_type << response.headers['content-type']
      end
    end

Response transformation

    try (NetworkInterceptor ignored =
        new NetworkInterceptor(
            driver,
            Route.matching(req -> true)
                .to(
                    () ->
                        req ->
                            new HttpResponse()
                                .setStatus(200)
                                .addHeader("Content-Type", MediaType.HTML_UTF_8.toString())
                                .setContent(Contents.utf8String("Creamy, delicious cheese!"))))) {
            var handler = new NetworkResponseHandler()
            {
                ResponseMatcher = _ => true,
                ResponseTransformer = _ => new HttpResponseData
                {
                    StatusCode = 200,
                    Body = "Creamy, delicious cheese!"
                }
            };

            INetwork networkInterceptor = driver.Manage().Network;
            networkInterceptor.AddResponseHandler(handler);

            await networkInterceptor.StartMonitoring();
    driver.intercept do |request, &continue|
      continue.call(request) do |response|
        response.body = 'Creamy, delicious cheese!' if request.url.include?('blank')
      end
    end

Move Code

const connection = await driver.createCDPConnection('page')
let url = fileServer.whereIs("/cheese")
let httpResponse = new HttpResponse(url)
httpResponse.addHeaders("Content-Type", "UTF-8")
httpResponse.body = "sausages"
await driver.onIntercept(connection, httpResponse, async function () {
  let body = await driver.getPageSource()
  assert.strictEqual(body.includes("sausages"), true, `Body contains: ${body}`)
})
driver.get(url)

Move Code

val driver = ChromeDriver()
val interceptor = new NetworkInterceptor(
      driver,
      Route.matching(req -> true)
        .to(() -> req -> new HttpResponse()
          .setStatus(200)
          .addHeader("Content-Type", MediaType.HTML_UTF_8.toString())
          .setContent(utf8String("Creamy, delicious cheese!"))))

    driver.get(appServer.whereIs("/cheese"))

    String source = driver.getPageSource()

Request interception

            var handler = new NetworkRequestHandler
            {
                RequestMatcher = request => request.Url.Contains("one.js"),
                RequestTransformer = request =>
                {
                    request.Url = request.Url.Replace("one", "two");

                    return request;
                }
            };

            INetwork networkInterceptor = driver.Manage().Network;
            networkInterceptor.AddRequestHandler(handler);

            await networkInterceptor.StartMonitoring();
    driver.intercept do |request, &continue|
      uri = URI(request.url)
      request.url = uri.to_s.gsub('one', 'two') if uri.path&.end_with?('one.js')
      continue.call(request)
    end

2.8.2 - BiDirectional API (W3C compliant)

The following list of APIs will be growing as the WebDriver BiDirectional Protocol grows and browser vendors implement the same. Additionally, Selenium will try to support real-world use cases that internally use a combination of W3C BiDi protocol APIs.

If there is additional functionality you’d like to see, please raise a feature request.

2.8.2.1 - Browsing Context

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

This section contains the APIs related to browsing context commands.

Open a new window

Creates a new browsing context in a new window.

        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.WINDOW);

Open a new tab

Creates a new browsing context in a new tab.

        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);

Use existing window handle

Creates a browsing context for the existing tab/window to run commands.

        String id = driver.getWindowHandle();
        BrowsingContext browsingContext = new BrowsingContext(driver, id);

Open a window with a reference browsing context

A reference browsing context is a top-level browsing context. The API allows to pass the reference browsing context, which is used to create a new window. The implementation is operating system specific.

        BrowsingContext
                browsingContext =
                new BrowsingContext(driver, WindowType.WINDOW, driver.getWindowHandle());

Open a tab with a reference browsing context

A reference browsing context is a top-level browsing context. The API allows to pass the reference browsing context, which is used to create a new tab. The implementation is operating system specific.

        BrowsingContext
                browsingContext =
                new BrowsingContext(driver, WindowType.TAB, driver.getWindowHandle());
        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);

        NavigationResult info = browsingContext.navigate("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html");
        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);

        NavigationResult info = browsingContext.navigate("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html",
                ReadinessState.COMPLETE);

Get browsing context tree

Provides a tree of all browsing contexts descending from the parent browsing context, including the parent browsing context.

        String referenceContextId = driver.getWindowHandle();
        BrowsingContext parentWindow = new BrowsingContext(driver, referenceContextId);

        parentWindow.navigate("https://www.selenium.dev/selenium/web/iframes.html", ReadinessState.COMPLETE);

        List<BrowsingContextInfo> contextInfoList = parentWindow.getTree();

Get browsing context tree with depth

Provides a tree of all browsing contexts descending from the parent browsing context, including the parent browsing context upto the depth value passed.

        String referenceContextId = driver.getWindowHandle();
        BrowsingContext parentWindow = new BrowsingContext(driver, referenceContextId);

        parentWindow.navigate("https://www.selenium.dev/selenium/web/iframes.html", ReadinessState.COMPLETE);

Get All Top level browsing contexts

        BrowsingContext window1 = new BrowsingContext(driver, driver.getWindowHandle());
        BrowsingContext window2 = new BrowsingContext(driver, WindowType.WINDOW);

        List<BrowsingContextInfo> contextInfoList = window1.getTopLevelContexts();

Close a tab/window

        BrowsingContext window1 = new BrowsingContext(driver, WindowType.WINDOW);
        BrowsingContext window2 = new BrowsingContext(driver, WindowType.WINDOW);

        window2.close();

2.8.2.2 - BiDirectional API (W3C compliant)

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

This section contains the APIs related to logging.

Console logs

Listen to the console.log events and register callbacks to process the event.

    public void jsErrors() {
        CopyOnWriteArrayList<ConsoleLogEntry> logs = new CopyOnWriteArrayList<>();

        try (LogInspector logInspector = new LogInspector(driver)) {
            logInspector.onConsoleEntry(logs::add);
        }

        driver.get("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html");
            const inspector = await LogInspector(driver)
            await inspector.onConsoleEntry(function (log) {
              logEntry = log
            })
    
            await driver.get('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')
            await driver.findElement({ id: 'consoleLog' }).click()
            
            assert.equal(logEntry.text, 'Hello, world!')
            assert.equal(logEntry.realm, null)
            assert.equal(logEntry.type, 'console')
            assert.equal(logEntry.level, 'info')
            assert.equal(logEntry.method, 'log')
            assert.equal(logEntry.stackTrace, null)
            assert.equal(logEntry.args.length, 1)

JavaScript exceptions

Listen to the JS Exceptions and register callbacks to process the exception details.

            logInspector.onJavaScriptLog(future::complete);

            driver.get("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html");
            driver.findElement(By.id("jsException")).click();

            JavascriptLogEntry logEntry = future.get(5, TimeUnit.SECONDS);

            Assertions.assertEquals("Error: Not working", logEntry.getText());
        const inspector = await LogInspector(driver)
        await inspector.onJavascriptException(function (log) {
            logEntry = log
        })

        await driver.get('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')
        await driver.findElement({ id: 'jsException' }).click()

        assert.equal(logEntry.text, 'Error: Not working')
        assert.equal(logEntry.type, 'javascript')
        assert.equal(logEntry.level, 'error')

Listen to JS Logs

Listen to all JS logs at all levels and register callbacks to process the log.

            driver.findElement(By.id("consoleLog")).click();

            ConsoleLogEntry logEntry = future.get(5, TimeUnit.SECONDS);

            Assertions.assertEquals("Hello, world!", logEntry.getText());
            Assertions.assertNull(logEntry.getRealm());
            Assertions.assertEquals(1, logEntry.getArgs().size());
            Assertions.assertEquals("console", logEntry.getType());

2.9 - Support features

针对更高层面功能的额外支持类.

Selenium的核心库试图提供底层以及普适的功能. 每种语言的支持类都为常见交互提供特定的包装器, 可用于简化某些行为.

2.9.1 - Waiting with Expected Conditions

These are classes used to describe what needs to be waited for.

Expected Conditions are used with Explicit Waits. Instead of defining the block of code to be executed with a lambda, an expected conditions method can be created to represent common things that get waited on. Some methods take locators as arguments, others take elements as arguments.

These methods can include conditions such as:

  • element exists
  • element is stale
  • element is visible
  • text is visible
  • title contains specified value
[Expected Conditions Documentation](https://www.selenium.dev/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.expected_conditions.html)

Add Example

.NET stopped supporting Expected Conditions in Selenium 4 to minimize maintenance hassle and redundancy.
Ruby makes frequent use of blocks, procs and lambdas and does not need Expected Conditions classes

2.9.2 - 命令监听器

允许您在每次发送特定 Selenium 命令时执行自定义操作

2.9.3 - 同颜色一起工作

在测试中, 您偶尔会需要验证某事物的颜色;问题是网络上的颜色定义不是个常量. 如果有一种简单的方法可以比较颜色的十六进制与RGB呈现, 或者颜色的RGBA与HSLA呈现, 岂不美哉?

不用担心有一个解决方案:Color 类!

首先, 您需要导入该类:

import org.openqa.selenium.support.Color;
  
from selenium.webdriver.support.color import Color
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
include Selenium::WebDriver::Support
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
import org.openqa.selenium.support.Color

您现在可以开始创建颜色对象. 每个颜色对象都需要使用您颜色的字符串定义来创建. 支持的颜色定义如下:

private final Color HEX_COLOUR = Color.fromString("#2F7ED8");
private final Color RGB_COLOUR = Color.fromString("rgb(255, 255, 255)");
private final Color RGB_COLOUR = Color.fromString("rgb(40%, 20%, 40%)");
private final Color RGBA_COLOUR = Color.fromString("rgba(255, 255, 255, 0.5)");
private final Color RGBA_COLOUR = Color.fromString("rgba(40%, 20%, 40%, 0.5)");
private final Color HSL_COLOUR = Color.fromString("hsl(100, 0%, 50%)");
private final Color HSLA_COLOUR = Color.fromString("hsla(100, 0%, 50%, 0.5)");
  
HEX_COLOUR = Color.from_string('#2F7ED8')
RGB_COLOUR = Color.from_string('rgb(255, 255, 255)')
RGB_COLOUR = Color.from_string('rgb(40%, 20%, 40%)')
RGBA_COLOUR = Color.from_string('rgba(255, 255, 255, 0.5)')
RGBA_COLOUR = Color.from_string('rgba(40%, 20%, 40%, 0.5)')
HSL_COLOUR = Color.from_string('hsl(100, 0%, 50%)')
HSLA_COLOUR = Color.from_string('hsla(100, 0%, 50%, 0.5)')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
HEX_COLOUR = Color.from_string('#2F7ED8')
RGB_COLOUR = Color.from_string('rgb(255, 255, 255)')
RGB_COLOUR = Color.from_string('rgb(40%, 20%, 40%)')
RGBA_COLOUR = Color.from_string('rgba(255, 255, 255, 0.5)')
RGBA_COLOUR = Color.from_string('rgba(40%, 20%, 40%, 0.5)')
HSL_COLOUR = Color.from_string('hsl(100, 0%, 50%)')
HSLA_COLOUR = Color.from_string('hsla(100, 0%, 50%, 0.5)')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
private val HEX_COLOUR = Color.fromString("#2F7ED8")
private val RGB_COLOUR = Color.fromString("rgb(255, 255, 255)")
private val RGB_COLOUR_PERCENT = Color.fromString("rgb(40%, 20%, 40%)")
private val RGBA_COLOUR = Color.fromString("rgba(255, 255, 255, 0.5)")
private val RGBA_COLOUR_PERCENT = Color.fromString("rgba(40%, 20%, 40%, 0.5)")
private val HSL_COLOUR = Color.fromString("hsl(100, 0%, 50%)")
private val HSLA_COLOUR = Color.fromString("hsla(100, 0%, 50%, 0.5)")
  

Color类还支持在以下网址中指定的所有基本颜色定义 http://www.w3.org/TR/css3-color/#html4.

private final Color BLACK = Color.fromString("black");
private final Color CHOCOLATE = Color.fromString("chocolate");
private final Color HOTPINK = Color.fromString("hotpink");
  
BLACK = Color.from_string('black')
CHOCOLATE = Color.from_string('chocolate')
HOTPINK = Color.from_string('hotpink')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
BLACK = Color.from_string('black')
CHOCOLATE = Color.from_string('chocolate')
HOTPINK = Color.from_string('hotpink')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
private val BLACK = Color.fromString("black")
private val CHOCOLATE = Color.fromString("chocolate")
private val HOTPINK = Color.fromString("hotpink")
  

如果元素上未设置颜色, 则有时浏览器会返回“透明”的颜色值. Color类也支持此功能:

private final Color TRANSPARENT = Color.fromString("transparent");
  
TRANSPARENT = Color.from_string('transparent')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
TRANSPARENT = Color.from_string('transparent')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
private val TRANSPARENT = Color.fromString("transparent")
  

现在, 您可以安全地查询元素以获取其颜色/背景色, 任何响应都将被正确解析并转换为有效的Color对象:

Color loginButtonColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("color"));

Color loginButtonBackgroundColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("background-color"));
  
login_button_colour = Color.from_string(driver.find_element(By.ID,'login').value_of_css_property('color'))

login_button_background_colour = Color.from_string(driver.find_element(By.ID,'login').value_of_css_property('background-color'))
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
login_button_colour = Color.from_string(driver.find_element(id: 'login').css_value('color'))

login_button_background_colour = Color.from_string(driver.find_element(id: 'login').css_value('background-color'))
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
val loginButtonColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("color"))

val loginButtonBackgroundColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("background-color"))
  

然后, 您可以直接比较颜色对象:

assert loginButtonBackgroundColour.equals(HOTPINK);
  
assert login_button_background_colour == HOTPINK
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
assert(login_button_background_colour == HOTPINK)
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
assert(loginButtonBackgroundColour.equals(HOTPINK))
  

或者, 您可以将颜色转换为以下格式之一并执行静态验证:

assert loginButtonBackgroundColour.asHex().equals("#ff69b4");
assert loginButtonBackgroundColour.asRgba().equals("rgba(255, 105, 180, 1)");
assert loginButtonBackgroundColour.asRgb().equals("rgb(255, 105, 180)");
  
assert login_button_background_colour.hex == '#ff69b4'
assert login_button_background_colour.rgba == 'rgba(255, 105, 180, 1)'
assert login_button_background_colour.rgb == 'rgb(255, 105, 180)'
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
assert(login_button_background_colour.hex == '#ff69b4')
assert(login_button_background_colour.rgba == 'rgba(255, 105, 180, 1)')
assert(login_button_background_colour.rgb == 'rgb(255, 105, 180)')
  
// This feature is not implemented - Help us by sending a pr to implement this feature
  
assert(loginButtonBackgroundColour.asHex().equals("#ff69b4"))
assert(loginButtonBackgroundColour.asRgba().equals("rgba(255, 105, 180, 1)"))
assert(loginButtonBackgroundColour.asRgb().equals("rgb(255, 105, 180)"))
  

颜色不再是问题.

2.9.4 - 线程守卫

此类仅在Java中可用

ThreadGuard检查是否仅从创建驱动程序的同一线程中调用了驱动程序. 线程问题 (尤其是在Parallel中运行测试时) 可能遇到神秘并且难以诊断错误. 使用此包装器可以防止此类错误,
并且在发生此类情况时会抛出异常.

以下的示例模拟一种线程冲突的情况:

public class DriverClash {
  //thread main (id 1) created this driver
  private WebDriver protectedDriver = ThreadGuard.protect(new ChromeDriver());

  static {
    System.setProperty("webdriver.chrome.driver", "<Set path to your Chromedriver>");
  }

  //Thread-1 (id 24) is calling the same driver causing the clash to happen
  Runnable r1 = () -> {protectedDriver.get("https://selenium.dev");};
  Thread thr1 = new Thread(r1);

  void runThreads(){
    thr1.start();
  }

  public static void main(String[] args) {
    new DriverClash().runThreads();
  }
}

结果如下所示:

Exception in thread "Thread-1" org.openqa.selenium.WebDriverException:
Thread safety error; this instance of WebDriver was constructed
on thread main (id 1)and is being accessed by thread Thread-1 (id 24)
This is not permitted and *will* cause undefined behaviour

正如示例所示:

  • protectedDriver 将在主线程中创建
  • 我们使用Java的 Runnable 启动一个新进程, 并使用一个新的 Thread 运行该进程
  • 这两个 Thread 都会发生冲突, 因为主线程的内存中没有 protectedDriver
  • ThreadGuard.protect 会抛出异常

注意:

这不能代替并发运行时使用 ThreadLocal 管理驱动程序的需求.

2.9.5 - 使用选择列表元素

与其他元素相比,选择列表具有特殊的行为.

Select对象现在将为您提供一系列命令, 用于允许您与 <select> 元素进行交互.

如果您使用的是 Java 或 .NET, 请确保您在代码中已正确加载所需的包. 您可以通过GitHub查看下面示例的完整代码.

请注意,此类仅适用于 HTML 元素 selectoption. 这个类将不适用于那些通过 divli 并使用JavaScript遮罩层设计的下拉列表.

类型

选择方法的行为可能会有所不同, 具体取决于正在使用的 <select> 元素的类型.

单选

这是标准的下拉对象,其只能选定一个选项.

<select name="selectomatic">
    <option selected="selected" id="non_multi_option" value="one">One</option>
    <option value="two">Two</option>
    <option value="four">Four</option>
    <option value="still learning how to count, apparently">Still learning how to count, apparently</option>
</select>

复选

此选择列表允许同时选定和取消选择多个选项. 这仅适用于具有 multiple 属性的 <select>元素.

<select name="multi" id="multi" multiple="multiple">
    <option selected="selected" value="eggs">Eggs</option>
    <option value="ham">Ham</option>
    <option selected="selected" value="sausages">Sausages</option>
    <option value="onion gravy">Onion gravy</option>
</select>

构建类

首先定位一个 <select> 元素, 然后借助其初始化一个Select 对象. 请注意, 从 Selenium 4.5 开始, 您无法针对禁用的 <select> 元素构建 Select 对象.

        WebElement selectElement = driver.findElement(By.name("selectomatic"));
        Select select = new Select(selectElement);
    select_element = driver.find_element(By.NAME, 'selectomatic')
    select = Select(select_element)
            var selectElement = driver.FindElement(By.Name("selectomatic"));
            var select = new SelectElement(selectElement);
    select_element = driver.find_element(name: 'selectomatic')
    select = Selenium::WebDriver::Support::Select.new(select_element)
      const selectElement = await driver.findElement(By.name('selectomatic'))
      const select = new Select(selectElement)
    val selectElement = driver.findElement(By.name("selectomatic"))
    val select = Select(selectElement)

选项列表

共有两种列表可以被获取:

全部选项

获取 <select> 元素中所有选项列表:

        List<WebElement> optionList = select.getOptions();
    option_list = select.options
            IList<IWebElement> optionList = select.Options;
    option_list = select.options
      const optionList = await select.getOptions()
    val optionList = select.getOptions()

选中的选项

获取 <select> 元素中所选中的选项列表. 对于标准选择列表这将只是一个包含一个元素的列表, 对于复选列表则表示包含的零个或多个元素.

        List<WebElement> selectedOptionList = select.getAllSelectedOptions();
    selected_option_list = select.all_selected_options
            IList<IWebElement> selectedOptionList = select.AllSelectedOptions;
    selected_option_list = select.selected_options
      const selectedOptionList = await select.getAllSelectedOptions()
    val selectedOptionList = select.getAllSelectedOptions()

选项

Select类提供了三种选择选项的方法. 请注意, 对于复选类型的选择列, 对于要选择的每个元素可以重复使用这些方法.

文本

根据其可见文本选择选项

        select.selectByVisibleText("Four");
    select.select_by_visible_text('Four')
            select.SelectByText("Four");
    select.select_by(:text, 'Four')
      await select.selectByVisibleText('Four')
    select.selectByVisibleText("Four")

根据其值属性选择选项

        select.selectByValue("two");
    select.select_by_value('two')
            select.SelectByValue("two");
    select.select_by(:value, 'two')
      await select.selectByValue('two')
    select.selectByValue("two")

序号

根据其在列表中的位置选择选项

        select.selectByIndex(3);
    select.select_by_index(3)
            select.SelectByIndex(3);
    select.select_by(:index, 3)
      await select.selectByIndex(3)
    select.selectByIndex(3)

禁用的选项

具有 disabled 属性的选项可能无法被选择.

    <select name="single_disabled">
      <option id="sinlge_disabled_1" value="enabled">Enabled</option>
      <option id="sinlge_disabled_2" value="disabled" disabled="disabled">Disabled</option>
    </select>
        Assertions.assertThrows(UnsupportedOperationException.class, () -> {
            select.selectByValue("disabled");
        });
    with pytest.raises(NotImplementedError):
        select.select_by_value('disabled')
            Assert.ThrowsException<InvalidOperationException>(() => select.SelectByValue("disabled"));
    expect {
      select.select_by(:value, 'disabled')
    }.to raise_exception(Selenium::WebDriver::Error::UnsupportedOperationError)
      await assert.rejects(async () => {
        await select.selectByValue("disabled")
      }, {
        name: 'UnsupportedOperationError',
    Assertions.assertThrows(UnsupportedOperationException::class.java) {
      select.selectByValue("disabled")
    }

取消选择选项

只有复选类型的选择列表才能取消选择选项. 您可以对要选择的每个元素重复使用这些方法.

        select.deselectByValue("eggs");
    select.deselect_by_value('eggs')
            select.DeselectByValue("eggs");
    select.deselect_by(:value, 'eggs')
      await select.deselectByValue('eggs')
    select.deselectByValue("eggs")

2.10 - 故障排除协助

如何管理 WebDriver 的问题.

Selenium错误的根本原因并不总是很明显.

  1. 最常见的Selenium相关错误, 是源自未及时同步的结果. 请阅读 等待策略. 当遇到一个问题, 如果不确定是否因为同步策略, 您可以尝试暂时硬编码一个较大的休眠时间, 您将明确添加显式等待是否有帮助.

  2. 请注意, 报告给项目的许多错误, 实际上是由Selenium向其发送命令的基础驱动程序所引起的. 您可以通过执行 浏览器 中的 多个命令来解决驱动程序问题.

  3. 如果您对如何执行有疑惑, 请查看 支持选项 获取帮助的方法.

  4. 如果您认为您发现了Selenium代码的问题, 请在Github上提交 问题报告.

2.10.1 - 理解常见的异常

如何处理Selenium代码中的各种问题.

无效选择器的异常 (Invalid Selector Exception)

某些时候难以获得正确的CSS以及XPath选择器。

潜在原因

您尝试使用的CSS或XPath选择器包含无效字符或无效查询。

可行方案

通过验证器服务运行选择器:

或者使用浏览器扩展程序来获取已知的良好值:

没有这样元素的异常 (No Such Element Exception)

在您尝试找到该元素的当前时刻无法定位元素。

潜在原因

  • 您在错误的位置寻找元素 (也许以前的操作不成功)
  • 您在错误的时间寻找元素 (该元素尚未显示在 DOM 中)
  • 自您编写代码以来定位器已变更

可行方案

  • 确保您位于期望的页面上,并且代码中的前置操作已正确完成
  • 确保您使用的是正确的 等待策略
  • 使用浏览器的devtools控制台更新定位器或使用浏览器扩展程序,例如:

过时元素引用的异常 (Stale Element Reference Exception)

当成功定位到元素时, WebDriver会为其设置一个引用ID作为标记, 如果由于上下文环境发生变化, 导致之前元素的位置发生了变化或者无法找到了, WebDriver并不会自动重新定位, 任何使用之前元素所做的操作将报错该异常。

常见因素

以下情况可能发生此异常:

  • 您已刷新页面,或者页面的 DOM 已动态更改。
  • 您已导航到其他页面。
  • 您已切换到另一个窗口,或者进入/移出某个 frame / iframe

常见方案

DOM已变更

当页面刷新或页面上的项目各处移动时, 页面上仍然有一个具有所需定位器的元素, 它只是不再被正在使用的元素对象访问, 并且必须重新定位该元素才能再次使用。

这往往通过以下两种方式之一完成:

  • 每次使用时都要重新定位元素。 尽管有可能元素在定位和使用元素之间的微秒内, 发生变化的可能性很小。 缺点是这不是最有效的方法, 尤其是在 Remote Grid上运行时。

  • 用另一个存储定位器的对象包装 Web 元素,并缓存定位的 Selenium 元素。 对该包装对象执行操作时,您可以尝试使用之前找到的缓存对象, 如果它是发生了变化,则可以捕获异常, 使用存储的定位器重新定位元素,并重试该方法。 这样效率更高,但如果您使用的定位器在页面更改后引用了不同的元素(而不是您想要的元素),则可能会导致问题。

上下文已变更

元素对象是针对特定的上下文存储的, 因此如果您切换到不同的上下文, 比如不同的 Window 或不同的 frameiframe 元素引用仍然有效, 但暂时无法访问。在这种情况下, 重新定位元素无济于事,因为它在当前上下文中不存在。

要解决此问题,您需要确保在使用该元素之前切换回正确的上下文。

页面已变更

这种情况发生在您不仅更改了上下文, 而且导航到另一个页面并破坏了元素所在的上下文。 您无法仅从当前上下文重新定位它, 也无法切换回元素有效的活动上下文。 如果这是您的错误原因, 您必须回到正确的位置并重新定位元素。

2.10.1.1 - Unable to Locate Driver Error

Troubleshooting missing path to driver executable.

Historically, this is the most common error beginning Selenium users get when trying to run code for the first time:

The path to the driver executable must be set by the webdriver.chrome.driver system property; for more information, see https://chromedriver.chromium.org/. The latest version can be downloaded from https://chromedriver.chromium.org/downloads
The executable chromedriver needs to be available in the path.
The file geckodriver does not exist. The driver can be downloaded at https://github.com/mozilla/geckodriver/releases"
Unable to locate the chromedriver executable;

Likely cause

Through WebDriver, Selenium supports all major browsers. In order to drive the requested browser, Selenium needs to send commands to it via an executable driver. This error means the necessary driver could not be found by any of the means Selenium attempts to use.

Possible solutions

There are several ways to ensure Selenium gets the driver it needs.

Use the latest version of Selenium

As of Selenium 4.6, Selenium downloads the correct driver for you. You shouldn’t need to do anything. If you are using the latest version of Selenium and you are getting an error, please turn on logging and file a bug report with that information.

If you want to read more information about how Selenium manages driver downloads for you, you can read about the Selenium Manager.

Use the PATH environment variable

This option first requires manually downloading the driver.

This is a flexible option to change location of drivers without having to update your code, and will work on multiple machines without requiring that each machine put the drivers in the same place.

You can either place the drivers in a directory that is already listed in PATH, or you can place them in a directory and add it to PATH.

To see what directories are already on PATH, open a Terminal and execute:

echo $PATH

If the location to your driver is not already in a directory listed, you can add a new directory to PATH:

echo 'export PATH=$PATH:/path/to/driver' >> ~/.bash_profile
source ~/.bash_profile

You can test if it has been added correctly by checking the version of the driver:

chromedriver --version

To see what directories are already on PATH, open a Terminal and execute:

echo $PATH

If the location to your driver is not already in a directory listed, you can add a new directory to PATH:

echo 'export PATH=$PATH:/path/to/driver' >> ~/.zshenv
source ~/.zshenv

You can test if it has been added correctly by checking the version of the driver:

chromedriver --version

To see what directories are already on PATH, open a Command Prompt and execute:

echo %PATH%

If the location to your driver is not already in a directory listed, you can add a new directory to PATH:

setx PATH "%PATH%;C:\WebDriver\bin"

You can test if it has been added correctly by checking the version of the driver:

chromedriver.exe --version

Specify the location of the driver

If you cannot upgrade to the latest version of Selenium, you do not want Selenium to download drivers for you, and you can’t figure out the environment variables, you can specify the location of the driver in the Service object.

You first need to download the desired driver, then create an instance of the applicable Service class and set the path.

Specifying the location in the code itself has the advantage of not needing to figure out Environment Variables on your system, but has the drawback of making the code less flexible.

Driver management libraries

Before Selenium managed drivers itself, other projects were created to do so for you.

If you can’t use Selenium Manager because you are using an older version of Selenium (please upgrade), or need an advanced feature not yet implemented by Selenium Manager, you might try one of these tools to keep your drivers automatically updated:

Download the driver

浏览器支持的操作系统维护者下载问题追溯
Chromium/ChromeWindows/macOS/LinuxGoogle下载Issues
FirefoxWindows/macOS/LinuxMozilla下载Issues
EdgeWindows/macOS/LinuxMicrosoft下载Issues
Internet ExplorerWindowsSelenium Project下载Issues
SafarimacOS High Sierra and newerApple内置Issues

备注:Opera驱动不再适用于Selenium的最新功能,目前官方不支持。

2.10.2 - 记录 Selenium 命令

获取Selenium的执行信息.

启用日志记录是获取额外信息的宝贵方法, 这些信息可能有助于您确定 遇到问题的原因.

获取一个logger

Java 日志通常是按类创建的. 您可以通过默认logger来使用所有loggers. 为了过滤特定类, 请参考 过滤器

获取根logger:

        Logger logger = Logger.getLogger("");

Java日志并不简单直接, 如果您只是在寻找一种简单的方法 查看重要的Selenium日志, 请参阅 Selenium Logger 项目

Python logs are typically created per module. You can match all submodules by referencing the top level module. So to work with all loggers in selenium module, you can do this:

    logger = logging.getLogger('selenium')
.NET does not currently have a Logging implementation

If you want to see as much debugging as possible in all the classes, you can turn on debugging globally in Ruby by setting $DEBUG = true.

For more fine-tuned control, Ruby Selenium created its own Logger class to wrap the default Logger class. This implementation provides some interesting additional features. Obtain the logger directly from the #loggerclass method on the Selenium::WebDriver module:

    logger = Selenium::WebDriver.logger
const logging = require('selenium-webdriver/lib/logging')
logger = logging.getLogger('webdriver')

日志级别

Logger级别有助于根据日志的严重性过滤日志.

Java有七种logger级别: SEVERE, WARNING, INFO, CONFIG, FINE, FINER, 以及 FINEST. 默认是 INFO.

您必须更改logger的级别和根logger上的处理程序的级别:

        logger.setLevel(Level.FINE);
        Arrays.stream(logger.getHandlers()).forEach(handler -> {
            handler.setLevel(Level.FINE);
        });

Python has 6 logger levels: CRITICAL, ERROR, WARNING, INFO, DEBUG, and NOTSET. The default is WARNING

To change the level of the logger:

    logger.setLevel(logging.DEBUG)

Things get complicated when you use PyTest, though. By default, PyTest hides logging unless the test fails. You need to set 3 things to get PyTest to display logs on passing tests.

To always output logs with PyTest you need to run with additional arguments. First, -s to prevent PyTest from capturing the console. Second, -p no:logging, which allows you to override the default PyTest logging settings so logs can be displayed regardless of errors.

So you need to set these flags in your IDE, or run PyTest on command line like:

pytest -s -p no:logging

Finally, since you turned off logging in the arguments above, you now need to add configuration to turn it back on:

logging.basicConfig(level=logging.WARN)
.NET does not currently have a Logging implementation

Ruby logger has 5 logger levels: :debug, :info, :warn, :error, :fatal. As of Selenium v4.9.1, The default is :info.

To change the level of the logger:

    logger.level = :debug

JavaScript has 9 logger levels: OFF, SEVERE, WARNING, INFO, DEBUG, FINE, FINER, FINEST, ALL. The default is OFF.

To change the level of the logger:

logger.setLevel(logging.Level.INFO)

Actionable items

对于需要用户后续行动的操作, 会将其记录为警告. 这经常用于被弃用的内容. 由于各种原因, Selenium项目不遵循标准的语义版本控制实践. 我们的政策是将 3 个版本的内容标记为已弃用后, 再删除它们, 因此弃用可能记录为警告.

Java 日志可操作的内容在logger级别 WARN

Example:

May 08, 2023 9:23:38 PM dev.selenium.troubleshooting.LoggingTest logging
WARNING: this is a warning

Python logs actionable content at logger level — WARNING Details about deprecations are logged at this level.

Example:

WARNING  selenium:test_logging.py:23 this is a warning
.NET does not currently have a Logging implementation

Ruby logs actionable content at logger level — :warn. Details about deprecations are logged at this level.

For example:

2023-05-08 20:53:13 WARN Selenium [:example_id] this is a warning 

Because these items can get annoying, we’ve provided an easy way to turn them off, see filtering section below.

Useful information

这是默认级别, Selenium 记录用户应该注意但不需要对其执行操作的内容. 这可能会引用新方法或将用户定向到有关某些内容的详细信息

Java 日志有用的信息在logger 级别 INFO

Example:

May 08, 2023 9:23:38 PM dev.selenium.troubleshooting.LoggingTest logging
INFO: this is useful information

Python logs useful information at logger level — INFO

Example:

INFO     selenium:test_logging.py:22 this is useful information
.NET does not currently have a Logging implementation

Ruby logs useful information at logger level — :info.

Example:

2023-05-08 20:53:13 INFO Selenium [:example_id] this is useful information 

Logs useful information at level: INFO

Debugging Details

调试日志级别用于诊断问题和解决问题可能需要的信息.

Java日志的大多数调试信息在logger 级别 FINE

Example:

May 08, 2023 9:23:38 PM dev.selenium.troubleshooting.LoggingTest logging
FINE: this is detailed debug information

Python logs debugging details at logger level — DEBUG

Example:

DEBUG    selenium:test_logging.py:24 this is detailed debug information
.NET does not currently have a Logging implementation

Ruby only provides one level for debugging, so all details are at logger level — :debug.

Example:

2023-05-08 20:53:13 DEBUG Selenium [:example_id] this is detailed debug information 

Logs debugging details at level: FINER and FINEST

Logger 输出

日志可以显示在控制台中, 也可以存储在文件中. 不同的语言有不同的默认值.

默认情况下, 所有日志都发送到 System.err. 要将输出定向到文件, 您需要添加一个处理程序:

        Handler handler = new FileHandler("selenium.xml");
        logger.addHandler(handler);

By default all logs are sent to sys.stderr. To direct output somewhere else, you need to add a handler with either a StreamHandler or a FileHandler:

    handler = logging.FileHandler(log_path)
    logger.addHandler(handler)
.NET does not currently have a Logging implementation

By default, logs are sent to the console in stdout.
To store the logs in a file:

    logger.output = file_name

JavaScript does not currently support sending output to a file.

To send logs to console output:

logging.installConsoleHandler()

Logger 过滤

Java 日志记录是按类级别管理的, 因此不要使用根logger (Logger.getLogger("")), 而是在每个类的基础上设置要使用的级别:

        Logger.getLogger(RemoteWebDriver.class.getName()).setLevel(Level.FINEST);
        Logger.getLogger(SeleniumManager.class.getName()).setLevel(Level.SEVERE);
Because logging is managed by module, instead of working with just "selenium", you can specify different levels for different modules:
    logging.getLogger('selenium.webdriver.remote').setLevel(logging.WARN)
    logging.getLogger('selenium.webdriver.common').setLevel(logging.DEBUG)
.NET does not currently have a Logging implementation

Ruby’s logger allows you to opt in (“allow”) or opt out (“ignore”) of log messages based on their IDs. Everything that Selenium logs includes an ID. You can also turn on or off all deprecation notices by using :deprecations.

These methods accept one or more symbols or an array of symbols:

    logger.ignore(:jwp_caps, :logger_info)

or

    logger.allow(%i[selenium_manager example_id])

2.10.3 - 升级到Selenium 4

对Selenium 4感兴趣? 查看本指南, 它将帮助您升级到最新版本!

如果您使用的是官方支持的语言 (Ruby、JavaScript、C#、Python和Java), 那么升级到Selenium 4应该是一个轻松的过程. 在某些情况下可能会出现一些问题, 本指南将帮助您解决这些问题. 我们将完成升级项目依赖项的步骤, 并了解版本升级带来的主要反对意见和更改.

请按照以下步骤升级到Selenium 4:

  • 准备我们的测试代码
  • 升级依赖
  • 潜在错误和弃用消息

注意:在开发Selenium 3.x版本的同时, 实现了对W3C WebDriver标准的支持. 此新协议和遗留JSON Wire协议均受支持. 在3.11版前后, Selenium代码与W3C 1级规范兼容. 最新版本的Selenium 3中的W3C兼容代码将在Selenium 4中正常工作.

准备测试代码

Selenium 4 移除了对遗留协议的支持, 并在底层实现上默认使用 W3C WebDriver 标准.
对于大多数情况, 这种实现不会影响终端用户.
主要的例外是 CapabilitiesActions 类.

Capabilities

如果测试capabilities的结构不符合 W3C标准, 可能会导致会话无法正常开启.
以下是 W3C WebDriver 标准capabilities列表:

  • browserName
  • browserVersion (替代 version)
  • platformName (替代 platform)
  • acceptInsecureCerts
  • pageLoadStrategy
  • proxy
  • timeouts
  • unhandledPromptBehavior

可以在以下位置找到标准capabilities的最新列表 W3C WebDriver.

上面列表中未包含的任何capability, 都需要包含供应商前缀. 这适用于浏览器特定capability 以及云供应商特定capability. 例如, 如果您的云供应商为您的测试 使用 buildname capability, 您需要将它们包装在一个 cloud: options 块中 (请与您的云供应商联系以获取适当的前缀).

Before

DesiredCapabilities caps = DesiredCapabilities.firefox();
caps.setCapability("platform", "Windows 10");
caps.setCapability("version", "92");
caps.setCapability("build", myTestBuild);
caps.setCapability("name", myTestName);
WebDriver driver = new RemoteWebDriver(new URL(cloudUrl), caps);
caps = {};
caps['browserName'] = 'Firefox';
caps['platform'] = 'Windows 10';
caps['version'] = '92';
caps['build'] = myTestBuild;
caps['name'] = myTestName;
DesiredCapabilities caps = new DesiredCapabilities();
caps.SetCapability("browserName", "firefox");
caps.SetCapability("platform", "Windows 10");
caps.SetCapability("version", "92");
caps.SetCapability("build", myTestBuild);
caps.SetCapability("name", myTestName);
var driver = new RemoteWebDriver(new Uri(CloudURL), caps);
caps = Selenium: : WebDriver: : Remote: : Capabilities.firefox
caps[: platform] = 'Windows 10'
caps[: version] = '92'
caps[: build] = my_test_build
caps[: name] = my_test_name
driver = Selenium: : WebDriver.for : remote, url:  cloud_url, desired_capabilities:  caps
caps = {}
caps['browserName'] = 'firefox'
caps['platform'] = 'Windows 10'
caps['version'] = '92'
caps['build'] = my_test_build
caps['name'] = my_test_name
driver = webdriver.Remote(cloud_url, desired_capabilities=caps)

After

FirefoxOptions browserOptions = new FirefoxOptions();
browserOptions.setPlatformName("Windows 10");
browserOptions.setBrowserVersion("92");
Map<String, Object> cloudOptions = new HashMap<>();
cloudOptions.put("build", myTestBuild);
cloudOptions.put("name", myTestName);
browserOptions.setCapability("cloud: options", cloudOptions);
WebDriver driver = new RemoteWebDriver(new URL(cloudUrl), browserOptions);
capabilities = {
  browserName:  'firefox',
  browserVersion:  '92',
  platformName:  'Windows 10',
  'cloud: options':  {
     build:  myTestBuild,
     name:  myTestName,
  }
}
var browserOptions = new FirefoxOptions();
browserOptions.PlatformName = "Windows 10";
browserOptions.BrowserVersion = "92";
var cloudOptions = new Dictionary<string, object>();
cloudOptions.Add("build", myTestBuild);
cloudOptions.Add("name", myTestName);
browserOptions.AddAdditionalOption("cloud: options", cloudOptions);
var driver = new RemoteWebDriver(new Uri(CloudURL), browserOptions);
options = Selenium: : WebDriver: : Options.firefox
options.browser_version = 'latest'
options.platform_name = 'Windows 10'
cloud_options = {}
cloud_options[: build] = my_test_build
cloud_options[: name] = my_test_name
options.add_option('cloud: options', cloud_options)
driver = Selenium: : WebDriver.for : remote, url:  cloud_url, capabilities:  options
from selenium.webdriver.firefox.options import Options as FirefoxOptions
options = FirefoxOptions()
options.browser_version = '92'
options.platform_name = 'Windows 10'
cloud_options = {}
cloud_options['build'] = my_test_build
cloud_options['name'] = my_test_name
options.set_capability('cloud: options', cloud_options)
driver = webdriver.Remote(cloud_url, options=options)

在 Java 中查找元素工具方法

在 Java 绑定(FindsBy 接口)中 查找元素的工具方法已被删除 因为它们仅供内部使用.
以下代码示例更好地解释了这一点.

使用 findElement* 查找单个元素

Before

driver.findElementByClassName("className");
driver.findElementByCssSelector(".className");
driver.findElementById("elementId");
driver.findElementByLinkText("linkText");
driver.findElementByName("elementName");
driver.findElementByPartialLinkText("partialText");
driver.findElementByTagName("elementTagName");
driver.findElementByXPath("xPath");
After

driver.findElement(By.className("className"));
driver.findElement(By.cssSelector(".className"));
driver.findElement(By.id("elementId"));
driver.findElement(By.linkText("linkText"));
driver.findElement(By.name("elementName"));
driver.findElement(By.partialLinkText("partialText"));
driver.findElement(By.tagName("elementTagName"));
driver.findElement(By.xpath("xPath"));

使用 findElements* 查找多个元素

Before

driver.findElementsByClassName("className");
driver.findElementsByCssSelector(".className");
driver.findElementsById("elementId");
driver.findElementsByLinkText("linkText");
driver.findElementsByName("elementName");
driver.findElementsByPartialLinkText("partialText");
driver.findElementsByTagName("elementTagName");
driver.findElementsByXPath("xPath");
After

driver.findElements(By.className("className"));
driver.findElements(By.cssSelector(".className"));
driver.findElements(By.id("elementId"));
driver.findElements(By.linkText("linkText"));
driver.findElements(By.name("elementName"));
driver.findElements(By.partialLinkText("partialText"));
driver.findElements(By.tagName("elementTagName"));
driver.findElements(By.xpath("xPath"));

升级依赖

检查下面的小节以安装 Selenium 4 并升级您的项目依赖项.

Java

升级 Selenium 的过程取决于所使用的构建工具.
我们将涵盖Java 中最常见的 MavenGradle .
所需的最低 Java 版本仍然是 8.

Maven

Before

<dependencies>
  <!-- more dependencies ... -->
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.141.59</version>
  </dependency>
  <!-- more dependencies ... -->
</dependencies>
After

<dependencies>
    <!-- more dependencies ... -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.4.0</version>
    </dependency>
    <!-- more dependencies ... -->
</dependencies>

进行更改后, 您可以在pom.xml 文件的同一目录中 执行 mvn clean compile .

Gradle

Before

plugins {
    id 'java'
}
group 'org.example'
version '1.0-SNAPSHOT'
repositories {
    mavenCentral()
}
dependencies {
    testImplementation 'org.junit.jupiter: junit-jupiter-api: 5.7.0'
    testRuntimeOnly 'org.junit.jupiter: junit-jupiter-engine: 5.7.0'
    implementation group:  'org.seleniumhq.selenium', name:  'selenium-java', version:  '3.141.59'
}
test {
    useJUnitPlatform()
}
After

plugins {
    id 'java'
}
group 'org.example'
version '1.0-SNAPSHOT'
repositories {
    mavenCentral()
}
dependencies {
    testImplementation 'org.junit.jupiter: junit-jupiter-api: 5.7.0'
    testRuntimeOnly 'org.junit.jupiter: junit-jupiter-engine: 5.7.0'
    implementation group:  'org.seleniumhq.selenium', name:  'selenium-java', version:  '4.4.0'
}
test {
    useJUnitPlatform()
}

进行更改后, 您可以在 build.gradle 文件所在的同一目录中 执行./gradlew clean build .

要检查所有 Java 版本, 您可以前往 MVNRepository .

C#

在 C# 中获取 Selenium 4 更新的 地方是 NuGet .
在下面包 Selenium.WebDriver 你可以获得更新到最新版本的说明.
在 Visual Studio 内部, 您可以通过 NuGet 包管理器执行:

PM> Install-Package Selenium.WebDriver -Version 4.4.0

Python

使用 Python 的最重要变化是所需的最低版本. Selenium 4 将至少需要 Python 3.7 或更高版本.
更多详细信息可以在 Python 包索引 .
基于命令行做升级的话, 你可以执行:

pip install selenium==4.4.3

Ruby

Selenium 4 的更新细节 可以在RubyGems中的gem发现 selenium-webdriver .
要安装最新版本, 您可以执行:

gem install selenium-webdriver

将以下内容添加到你的Gemfile:

gem 'selenium-webdriver', '~> 4.4.0'

JavaScript

可以在 Node 包管理器中找到 selenium-webdriver 包, npmjs .
Selenium 4 可以在 这里 找到.
要安装, 你可以执行:

npm install selenium-webdriver

或者, 更新你的 package.json 并运行 npm install :

{
  "name":  "selenium-tests",
  "version":  "1.0.0",
  "dependencies":  {
    "selenium-webdriver":  "^4.4.0"
  }
}

潜在错误和弃用消息

这是一组代码示例, 它们将有助于克服 您升级到 Selenium 4 后 可能会遇到的弃用消息.

Java

等待和超时

Timeout 中接收到的参数 已经从期望 (long time, TimeUnit unit) 切换到期待 (Duration duration) .

Before

driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
driver.manage().timeouts().setScriptTimeout(2, TimeUnit.MINUTES);
driver.manage().timeouts().pageLoadTimeout(10, TimeUnit.SECONDS);
After

driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
driver.manage().timeouts().scriptTimeout(Duration.ofMinutes(2));
driver.manage().timeouts().pageLoadTimeout(Duration.ofSeconds(10));

等待现在也期望不同的参数. WebDriverWait 现在期待一个 Duration 而不是以秒和毫秒为单位的 long 超时. FluentWait 的工具方法 withTimeoutpollingEvery 已经从期望 (long time, TimeUnit unit) 切换到 期待(Duration duration) .

Before

new WebDriverWait(driver, 3)
.until(ExpectedConditions.elementToBeClickable(By.cssSelector("#id")));

Wait<WebDriver> wait = new FluentWait<WebDriver>(driver)
  .withTimeout(30, TimeUnit.SECONDS)
  .pollingEvery(5, TimeUnit.SECONDS)
  .ignoring(NoSuchElementException.class);
After

new WebDriverWait(driver, Duration.ofSeconds(3))
  .until(ExpectedConditions.elementToBeClickable(By.cssSelector("#id")));

  Wait<WebDriver> wait = new FluentWait<WebDriver>(driver)
  .withTimeout(Duration.ofSeconds(30))
  .pollingEvery(Duration.ofSeconds(5))
  .ignoring(NoSuchElementException.class);

合并capabilities不再改变调用对象

曾经可以将一组不同的capabilities合并到另一组中, 并且改变调用对象.
现在, 需要分配合并操作的结果.

Before

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("platformVersion", "Windows 10");
FirefoxOptions options = new FirefoxOptions();
options.setHeadless(true);
options.merge(capabilities);

//作为结果, `options` 对象被修改
After

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("platformVersion", "Windows 10");
FirefoxOptions options = new FirefoxOptions();
options.setHeadless(true);
options = options.merge(capabilities);

// `merge` 调用的结果需要分配给一个对象.

Firefox 遗留模式

在 GeckoDriver 出现之前, Selenium 项目有一个驱动程序实现来自动化 Firefox(版本 <48).
但是, 不再需要此实现, 因为在最新版本的 Firefox 中它不起作用.
为避免升级到 Selenium 4 时出现重大问题, setLegacy选项将显示为已弃用.
建议停止使用旧的实现 并且只依赖 GeckoDriver.
以下代码将显示在升级之后弃用的 setLegacy 行.

FirefoxOptions options = new FirefoxOptions();
options.setLegacy(true);

BrowserType

BrowserType 接口已经存在很长时间了, 但是其已变为弃用 且推荐使用新的 Browser 接口.

Before

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("browserVersion", "92");
capabilities.setCapability("browserName", BrowserType.FIREFOX);
After

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("browserVersion", "92");
capabilities.setCapability("browserName", Browser.FIREFOX);

C#

AddAdditionalCapability 已弃用

推荐使用AddAdditionalOption替代.
以下为一个示例:

Before

var browserOptions = new ChromeOptions();
browserOptions.PlatformName = "Windows 10";
browserOptions.BrowserVersion = "latest";
var cloudOptions = new Dictionary<string, object>();
browserOptions.AddAdditionalCapability("cloud: options", cloudOptions, true);
After

var browserOptions = new ChromeOptions();
browserOptions.PlatformName = "Windows 10";
browserOptions.BrowserVersion = "latest";
var cloudOptions = new Dictionary<string, object>();
browserOptions.AddAdditionalOption("cloud: options", cloudOptions);

Python

executable_path 已弃用, 请传递一个服务对象

在Selenium 4中,
您需要从服务对象设置驱动程序的 可执行路径 , 以防止出现弃用警告. (或者不要设置路径, 而是确保所需的驱动程序位于系统路径上.)

Before

from selenium import webdriver
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(
    executable_path=CHROMEDRIVER_PATH, 
    options=options
)
After

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
options = webdriver.ChromeOptions()
service = ChromeService(executable_path=CHROMEDRIVER_PATH)
driver = webdriver.Chrome(service=service, options=options)

总结

我们已经过了升级到 Selenium 4 时要考虑的主要变化. 涵盖为升级准备测试代码时要涵盖的不同方面, 包括关于如何避免 使用Selenium新版本时 可能出现的潜在问题的建议.
最后, 我们还介绍了一系列您可能会遇到的升级问题, 分享这些问题的潜在修复方案.

本文最初发布于 https://saucelabs.com/resources/articles/how-to-upgrade-to-selenium-4

3 - Grid

要在多台计算机上并行运行测试吗? 那么, Grid正是为你准备的.

Selenium Grid 允许通过将客户端发送的命令路由到远程浏览器实例来在远程机器上执行 WebDriver 脚本。

Grid 的目标:

  • 提供一种在多台机器上并行运行测试的简单方法
  • 允许在不同的浏览器版本上进行测试
  • 启用跨平台测试

感兴趣? 通过以下部分了解Grid的工作原理, 以及如何设置自己的.

3.1 - Selenium Grid快速起步

一步一步地说明如何运行简单的Selenium Grid.

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

快速开始

  1. 先决条件
  2. 启动 Grid
    • java -jar selenium-server-<version>.jar standalone
  3. 将您的 WebDriver 测试指向 http://localhost:4444
  4. (可选) 通过在浏览器中打开 http://localhost:4444 检查正在运行的测试和可用的功能

*想知道如何将您的测试指向 http://localhost:4444吗? 请查看 RemoteWebDriver section

要了解更多不同的配置选项,请查看以下各小节。

Grid 角色

Grid由六个不同的组件组成,这使您可以以不同的方式部署它。

根据您的需求,您可以单独启动每个组件(分布式),将它们分组为Hub和Node,或者全部在单个机器上运行(独立)。

单机部署(Standalone)

Standalone 可以将所有 Grid 组件 无缝地整合成一个单独的实体。在 Standalone 模式下运行 Grid,只需一个命令即可获得一个完整的 Grid,并在一个进程中运行。Standalone 只能在一台机器上运行。

Standalone 模式也是最容易启动 Selenium Grid 的模式。默认情况下,服务器将在 http://localhost:4444 上监听 RemoteWebDriver 请求,并且服务器将从系统 PATH 中检测可以使用的驱动程序。

java -jar selenium-server-<version>.jar standalone

成功启动 Standalone 模式的 Grid 后,将你的 WebDriver 测试指向 http://localhost:4444

Standalone 的常见用途包括:

  • 本地使用 RemoteWebDriver 开发或调试测试
  • 推送代码之前运行快速测试套件
  • 在 CI/CD 工具(GitHub Actions、Jenkins 等)中设置简单的 Grid

Hub and Node

Hub和Node是最常用的角色,因为它允许:

  • 将不同的机器组合成一个单一的Grid
    • 例如拥有不同操作系统和/或浏览器版本的机器
  • 在不同的环境中运行WebDriver测试有一个单一的入口点
  • 在不破坏Grid的情况下增加或减少容量。

Hub

一个Hub由以下组件组成: 路由器(Router)、分发器(Distributor)、会话映射(Session Map)、新会话队列(New Session Queue)和事件总线(Event Bus)。

java -jar selenium-server-<version>.jar hub

默认情况下,服务器将在 http://localhost:4444 上监听RemoteWebDriver请求。

Node

在启动时,Node将从系统的PATH中检测可用的驱动程序。

以下命令假设Node正在运行的机器与Hub在同一台机器上。

java -jar selenium-server-<version>.jar node
同一台机器上的多个Node

Node 1

java -jar selenium-server-<version>.jar node --port 5555

Node 2

java -jar selenium-server-<version>.jar node --port 6666
不同机器上的Node和Hub

HubNodes通过HTTP和事件总线事件总线位于Hub内部)进行通信。

Node通过事件总线向Hub发送消息以开始注册过程。当Hub收到消息时,通过HTTP与Node联系以确认其存在。

要成功将Node注册到Hub,重要的是要在Hub机器上公开事件总线端口(默认为4442和4443)。这也适用于Node端口。有了这个,HubNode都能够通信。

如果Hub使用默认端口,则可以使用 --hub 注册Node

java -jar selenium-server-<version>.jar node --hub http://<hub-ip>:4444

Hub未使用默认端口时,需要使用--publish-events--subscribe-events

例如,如果Hub使用端口8886、8887和8888。

java -jar selenium-server-<version>.jar hub --publish-events tcp://<hub-ip>:8886 --subscribe-events tcp://<hub-ip>:8887 --port 8888

Node需要使用这些端口才能成功注册。

java -jar selenium-server-<version>.jar node --publish-events tcp://<hub-ip>:8886 --subscribe-events tcp://<hub-ip>:8887

分部署部署(Distributed)

在使用分布式Grid时,每个组件都需要单独启动,并且理想情况下应该在不同的机器上。

  1. 事件总线(Event Bus): 使不同网格组件之间的内部通信成为可能。

默认端口为:444244435557

java -jar selenium-server-<version>.jar event-bus --publish-events tcp://<event-bus-ip>:4442 --subscribe-events tcp://<event-bus-ip>:4443 --port 5557
  1. 新会话队列(New Session Queue): 将新的会话请求添加到一个队列中,Distributor将查询该队列。

默认端口为5559

java -jar selenium-server-<version>.jar sessionqueue --port 5559
  1. 会话映射(Session Map): 将会话ID映射到运行该会话的节点。

默认会话映射端口为5556会话映射事件总线进行交互。

java -jar selenium-server-<version>.jar sessions --publish-events tcp://<event-bus-ip>:4442 --subscribe-events tcp://<event-bus-ip>:4443 --port 5556
  1. 分配器(Distributor): 查询新 会话队列(New Session Queue) 以获取新会话请求,并在能力匹配时将其分配给 NodeNodes 注册到 Distributor 的方式与在 Hub/Node 网格中注册到 Hub 相同。

默认分配器端口为5553分配器新会话队列会话映射事件总线Node(s) 进行交互。

java -jar selenium-server-<version>.jar distributor --publish-events tcp://<event-bus-ip>:4442 --subscribe-events tcp://<event-bus-ip>:4443 --sessions http://<sessions-ip>:5556 --sessionqueue http://<new-session-queue-ip>:5559 --port 5553 --bind-bus false
  1. 路由器(Router): 将新会话请求重定向到队列,并将正在运行的会话请求重定向到运行该会话的Node

默认路由器端口为4444路由器新会话队列会话映射分配器 进行交互。

java -jar selenium-server-<version>.jar router --sessions http://<sessions-ip>:5556 --distributor http://<distributor-ip>:5553 --sessionqueue http://<new-session-queue-ip>:5559 --port 4444
  1. Node(s)

默认 Node 端口是 5555.

java -jar selenium-server-<version>.jar node --publish-events tcp://<event-bus-ip>:4442 --subscribe-events tcp://<event-bus-ip>:4443

测试中的 Metadata

向测试中添加 Metadata 并通过GraphQL进行消费,或通过 Selenium Grid UI 可视化其部分内容(例如se:name)。

可以通过在 capability 前加上 se: 来添加元数据。以下是一个Java的快速示例。

ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setCapability("browserVersion", "100");
chromeOptions.setCapability("platformName", "Windows");
// Showing a test name instead of the session id in the Grid UI
chromeOptions.setCapability("se:name", "My simple test"); 
// Other type of metadata can be seen in the Grid UI by clicking on the 
// session info or via GraphQL
chromeOptions.setCapability("se:sampleMetadata", "Sample metadata value"); 
WebDriver driver = new RemoteWebDriver(new URL("http://gridUrl:4444"), chromeOptions);
driver.get("http://www.google.com");
driver.quit();

查询 Selenium Grid 相关状态

启动 Grid 后,主要有两种方式查询其状态,通过 Grid UI 或通过 API 调用。

可以通过打开您喜欢的浏览器并前往http://localhost:4444

API 调用可以通过 http://localhost:4444/status 端点或使用 GraphQL

为简单起见,本页中显示的所有命令示例均假定组件正在运行在本地。更详细的示例和用法可以在配置组件 部分。

使用 Java 11 中的 HTTP Client

默认情况下,Grid 将使用 AsyncHttpClient。 AsyncHttpClient 是一个建立在 Netty 之上的开源库。 它允许异步执行 HTTP 请求和响应。 此外,它还提供 WebSocket 支持。 因此它很合适。

然而,AsyncHttpClient 从 2021 年 6 月开始就没有主动维护了。恰逢 Java 11+ 提供了内置的 HTTP 和 WebSocket 客户端。

目前,Selenium 计划将支持的最低版本升级到 Java 11。然而,这需要大量的工作。为了确保用户体验不受影响,将其与主要发布版本和相应的公告对齐是至关重要的。

要使用 Java 11 客户端,您需要下载 selenium-http-jdk-client jar文件并使用 --ext 参数使其在 Grid jar 的类路径中可用。

jar文件可以直接从 repo1.maven.org 下载,然后使用以下方式启动Grid:

java -Dwebdriver.http.factory=jdk-http-client -jar selenium-server-<version>.jar --ext selenium-http-jdk-client-<version>.jar standalone

下载 selenium-http-jdk-client jar 文件的替代方法是使用 Coursier

java -Dwebdriver.http.factory=jdk-http-client -jar selenium-server-<version>.jar --ext $(coursier fetch -p org.seleniumhq.selenium:selenium-http-jdk-client:<version>) standalone

如果您使用的是集线器/节点模式或分布式模式,则需要为每个组件设置 -Dwebdriver.http.factory=jdk-http-client--ext 参数。

Grid 的规模

选择 Grid 角色取决于需要支持什么操作系统和浏览器、需要执行多少个并行会话、可用机器的数量以及这些机器的配置(CPU、RAM)。

并发创建会话依赖于 分配器 的可用处理器。 例如,如果一台机器有 4 个 CPU,则 分配器 最多只能同时创建 4 个会话。

默认情况下,Node 支持的最大并发会话数受可用 CPU 数量的限制。 例如,如果 Node 机器有 8 个 CPU,它最多可以运行 8 个并发浏览器会话(Safari 除外,它始终是一个)。 此外,预计每个浏览器会话应使用大约 1GB 的 RAM。

通常,建议 Nodes 尽可能小。 与其让机器有 32 个 CPU 和 32GB RAM 来运行 32 个并发浏览器会话,不如有 32 个小的 Node,以便更好地隔离进程。 有了这个,如果一个 Node 发生故障,它将以孤立的方式进行。 Docker 是实现这种方法的好工具。

请注意,默认值(每个浏览器 1 个 CPU/1GB RAM)是建议值,它们不适用于您的上下文。 建议将它们用作参考,但持续测量性能将有助于确定您的环境的理想值。

Grid 大小与支持的并发会话数量和 Node 数量有关,没有“一刀切”的说法。 下面提到的尺寸是粗略的估计,不同环境之间可能会有所不同。 例如,当 Hub 具有足够的资源时,具有 120 个 NodesHub/Node 可能运行良好。 以下值并非一成不变,欢迎提供反馈!

小规模

StandaloneHub/Node 不超过5个 Nodes.

中等规模

Hub/Node 介于6到60个 Nodes 之间。

大规模

Hub/Node 介于60到100个 Nodes 之间, Distributed 超过100个 Nodes

请注意

必须使用适当的防火墙权限保护Selenium Grid免受外部访问。

以下一种或多种情况可能会导致你的 Grid 处于一个不安全的状态:

  • 您提供对您的 Grid 基础设施的开放访问
  • 您允许第三方访问内部网络应用程序和文件
  • 您允许第三方运行自定义二进制文件

请参阅 Detectify 上的这篇博文,它提供了一个很好的公开暴露的 Grid 如何被滥用的概述:不要让你的 Grid 暴露在外

延伸阅读

3.2 - 什么时候应该使用Grid

Is Grid right for you?

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

什么情况下可以使用 Selenium Grid

  • 想要在不同的浏览器类型、浏览器版本和操作系统上并行运行测试时
  • 想要缩短执行测试案例所需的时间

Selenium Grid 可以并行地在多台计算机(称为节点)上运行测试案例。对于大型和长时间运行的测试案例,这可以节省几分钟、几小时甚至几天的时间。

这有效的缩短了测试结果的反馈时间,使得在测试的应用程序发生变化时能够更快地得到测试结果。

Grid 可以并行地运行测试,支持多种不同的浏览器类型,并且可以同时运行多个相同浏览器的实例。

举个例子,假设一个拥有六个节点的Grid。第一台计算机拥有Firefox的最新版本,第二台拥有Firefox的上一个版本,第三台运行最新版Chrome,而其余三台机器是Mac Mini,允许在最新版本的Safari上并行运行三个测试。

执行时间可以用一个简单的公式来表示:

测试次数 × 平均测试时间 / 节点数 = 总执行时间

   15      *       45s        /        1        =      11m 15s   // Without Grid
   15      *       45s        /        5        =      2m 15s    // Grid with 5 Nodes
   15      *       45s        /        15       =      45s       // Grid with 15 Nodes
  100      *       120s       /        15       =      13m 20s   // Would take over 3 hours without Grid

在测试案例执行时,Grid 会按照测试配置将测试分配到相应的浏览器上运行。

即使对于比较复杂的 Selenium 测试案例,这样的配置也可以极大地加快执行时间。

Selenium GridSelenium 项目中的重要组成部分,由同一团队的核心Selenium开发人员并行维护。由于意识到测试执行速度的重要性,Grid 自设计之初就成为 Selenium 项目的关键部分。

3.3 - 服务网格的组件

检查不同的Grid组件以了解如何使用它们.

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

Selenium Grid 4 是对以前版本的彻底重写。除了对性能和标准合规性进行全面改进外,还分解了 Grid 的不同功能以反映更现代的计算和软件开发时代。 Selenium Grid 4 专为容器化和云分布式可扩展性而构建,是现代时代的全新解决方案。

Selenium Grid 4 Components

路由器(Router)

路由器 是 Grid 的入口点,接收所有外部请求,并将它们转发给正确的组件。

如果路由器收到新的会话请求,它将被转发到新会话队列

如果请求属于一个已经存在的session,路由器会查询Session Map得到session运行所在的Node ID,然后将请求直接转发给Node

路由器通过将请求发送到能够更好地处理它们的组件来平衡网格中的负载,而不会使过程中不需要的任何组件过载。

分发器(Distributor)

分发器有两个主要职责:

注册并跟踪所有Node及其功能

Node通过事件总线发送Node注册事件来注册到分发器分发器读取它,然后尝试通过 HTTP 到达Node以确认它的存在。如果请求成功,Distributor注册节点并通过 GridModel 跟踪所有Node功能。

查询新会话队列并处理任何未决的新会话请求

当一个新的会话请求被发送到路由器时,它被转发到新会话队列,它将在队列中等待。 Distributor 将轮询新会话队列以查找未决的新会话请求,然后找到可以创建会话的合适Node。会话创建后,分发器将会话 ID 与正在执行会话的Node之间的关系存储在会话映射中。

会话映射(Session Map)

会话映射 是一个数据存储,用于保存会话 ID 和运行会话的Node之间的关系。它支持路由器在将请求转发到Node的过程中进行查询。路由器将向会话映射询问与会话 ID 关联的Node

新会话队列(New Session Queue)

新会话队列按先进先出的顺序保存所有新会话请求。它具有可配置的参数,用于设置请求超时和请求重试间隔(检查超时的频率)。

路由器新会话请求添加到新会话队列中并等待响应。新会话队列定期检查队列中是否有任何请求超时,如果是,则立即拒绝并将其删除。

分发器定期检查是否有可用的插槽。如果有可用的插槽,则分发器会轮询新会话队列以查找第一个匹配的请求。然后,分发器尝试创建新会话

一旦请求的功能与任何空闲Node插槽的功能匹配,分发器将尝试获取可用插槽。如果所有插槽都已忙碌,则分发器将将请求发送回队列。如果请求在重试或添加到队列的前面时超时,则会被拒绝。

成功创建会话后,分发器将会话信息发送到新会话队列,该信息然后被发送回路由器,最终发送给客户端。

节点(Node)

一个Grid可以包含多个Node。每个Node管理它所在机器上可用浏览器的插槽。

Node通过事件总线分发器注册自己,并将其配置作为注册消息发送。

默认情况下,Node会自动注册其所在机器上路径中可用的所有浏览器驱动程序。它还为基于Chromium的浏览器和Firefox创建每个可用CPU一个插槽。对于Safari,只创建一个插槽。通过特定的配置,它可以在Docker容器中运行会话或转发命令。

Node仅执行接收到的命令,不评估、不做出判断或控制任何除命令和响应流之外的东西。Node所在的机器不需要与其他组件具有相同的操作系统。例如,Windows节点可能具有在Edge上提供IE模式作为浏览器选项的能力,而在Linux或Mac上则不可能,网格可以配置多个具有Windows、Mac或Linux的Node

事件总线(Event Bus)

事件总线作为节点分发器新会话队列会话映射之间的通信路径。Grid 的大部分内部通信都通过消息进行,避免了频繁的HTTP调用。在完全分布式模式下启动Grid 时,事件总线应该是第一个组件。

3.4 - 配置组件

在这里,您可以看到如何根据公共配置值和特定于组件的配置值分别配置每个网格组件.

3.4.1 - 配置帮助

获取有关配置网格的所有可用选项的信息.

Help命令显示基于当前代码实现的信息. 因此, 如果文档没有更新, 它将提供准确的信息. 这是了解任何新版本Grid4配置的最便捷方法.

信息命令

Info命令提供以下主题的详细文档:

  • 配置Selenium
  • 安全
  • 会话表配置
  • 追踪

配置帮助

通过运行以下命令快速获取配置帮助:

java -jar selenium-server-<version>.jar info config

安全

获取构建网格服务器的详细信息, 用于安全通信和节点注册.

java -jar selenium-server-<version>.jar info security

会话表配置

默认情况下, 网格使用本地会话表来存储会话信息. 网格支持额外的存储选项, 比如Redis和JDBC-SQL支持的数据库. 要设置不同的会话存储, 请使用以下命令获取设置步骤:

java -jar selenium-server-<version>.jar info sessionmap

基于OpenTelemetry和Jaeger的追踪配置

默认情况下, 追踪是启用的. 要通过Jaeger导出追踪并将其可视化, 请使用以下命令进行说明:

java -jar selenium-server-<version>.jar info tracing

列出Selenium网格的命令

java -jar selenium-server-<version>.jar --config-help

上述命令将显示所有可用的命令及其描述.

组件帮助命令

在Selenium后面键入–help的配置选项, 以获取特定组件的配置信息.

Standalone

java -jar selenium-server-<version>.jar standalone --help

Hub

java -jar selenium-server-<version>.jar hub --help

Sessions

java -jar selenium-server-<version>.jar sessions --help

队列器

java -jar selenium-server-<version>.jar sessionqueue --help

Distributor

java -jar selenium-server-<version>.jar distributor --help

Router

java -jar selenium-server-<version>.jar router --help

Node

java -jar selenium-server-<version>.jar node --help

3.4.2 - CLI 选项

所有网格组件配置CLI选项的详细信息.

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

Different sections are available to configure a Grid. Each section has options can be configured through command line arguments.

A complete description of the component to section mapping can be seen below.

Note that this documentation could be outdated if an option was modified or added but has not been documented yet. In case you bump into this situation, please check the “Config help” section and feel free to send us a pull request updating this page.

Sections

StandaloneHubNodeDistributorRouterSessionsSessionQueue
Distributor
Docker
Events
Logging
Network
Node
Router
Relay
Server
SessionQueue
Sessions

Distributor

OptionTypeValue/ExampleDescription
--healthcheck-intervalint120How often, in seconds, will the health check run for all Nodes. This ensures the server can ping all the Nodes successfully.
--distributorurihttp://localhost:5553Url of the distributor.
--distributor-hoststringlocalhostHost on which the distributor is listening.
--distributor-implementationstringorg.openqa.selenium.grid.distributor.local.LocalDistributorFull class name of non-default distributor implementation
--distributor-portint5553Port on which the distributor is listening.
--reject-unsupported-capsbooleanfalseAllow the Distributor to reject a request immediately if the Grid does not support the requested capability. Rejecting requests immediately is suitable for a Grid setup that does not spin up Nodes on demand.
--slot-matcherstringorg.openqa.selenium.grid.data.DefaultSlotMatcherFull class name of non-default slot matcher to use. This is used to determine whether a Node can support a particular session.
--slot-selectorstringorg.openqa.selenium.grid.distributor.selector.DefaultSlotSelectorFull class name of non-default slot selector. This is used to select a slot in a Node once the Node has been matched.

Docker

OptionTypeValue/ExampleDescription
--docker-assets-pathstring/opt/selenium/assetsAbsolute path where assets will be stored
--docker-string[]selenium/standalone-firefox:latest '{"browserName": "firefox"}'Docker configs which map image name to stereotype capabilities (example `-D selenium/standalone-firefox:latest ‘{“browserName”: “firefox”}’)
--docker-devicesstring[]/dev/kvm:/dev/kvmExposes devices to a container. Each device mapping declaration must have at least the path of the device in both host and container separated by a colon like in this example: /device/path/in/host:/device/path/in/container
--docker-hoststringlocalhostHost name where the Docker daemon is running
--docker-portint2375Port where the Docker daemon is running
--docker-urlstringhttp://localhost:2375URL for connecting to the Docker daemon
--docker-video-imagestringselenium/video:latestDocker image to be used when video recording is enabled

Events

OptionTypeValue/ExampleDescription
--bind-busbooleanfalseWhether the connection string should be bound or connected.
When true, the component will be bound to the Event Bus (as in the Event Bus will also be started by the component, typically by the Distributor and the Hub).
When false, the component will connect to the Event Bus.
--events-implementationstringorg.openqa.selenium.events.zeromq.ZeroMqEventBusFull class name of non-default event bus implementation
--publish-eventsstringtcp://*:4442Connection string for publishing events to the event bus
--subscribe-eventsstringtcp://*:4443Connection string for subscribing to events from the event bus

Logging

OptionTypeValue/ExampleDescription
--http-logsbooleanfalseEnable http logging. Tracing should be enabled to log http logs.
--log-encodingstringUTF-8Log encoding
--logstringWindows path example :
'\path\to\file\gridlog.log'
or
'C:\path\path\to\file\gridlog.log'

Linux/Unix/MacOS path example :
'/path/to/file/gridlog.log'
File to write out logs. Ensure the file path is compatible with the operating system’s file path.
--log-levelstring“INFO”Log level. Default logging level is INFO. Log levels are described here https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
--plain-logsbooleantrueUse plain log lines
--structured-logsbooleanfalseUse structured logs
--tracingbooleantrueEnable trace collection
--log-timestamp-formatstringHH:mm:ss.SSSAllows the configure log timestamp format

Network

OptionTypeValue/ExampleDescription
--relax-checksbooleanfalseRelax checks on origin header and content type of incoming requests, in contravention of strict W3C spec compliance.

Node

OptionTypeValue/ExampleDescription
--detect-driversbooleantrueAutodetect which drivers are available on the current system, and add them to the Node.
--driver-configurationstring[]display-name="Firefox Nightly" max-sessions=2 webdriver-path="/usr/local/bin/geckodriver" stereotype='{"browserName": "firefox", "browserVersion": "86", "moz:firefoxOptions": {"binary":"/Applications/Firefox Nightly.app/Contents/MacOS/firefox-bin"}}'List of configured drivers a Node supports. It is recommended to provide this type of configuration through a toml config file to improve readability
--driver-factorystring[]org.openqa.selenium.example.LynxDriverFactory '{"browserName": "lynx"}'Mapping of fully qualified class name to a browser configuration that this matches against.
--driver-implementationstring[]"firefox"Drivers that should be checked. If specified, will skip autoconfiguration.
--node-implementationstring"org.openqa.selenium.grid.node.local.LocalNodeFactory"Full classname of non-default Node implementation. This is used to manage a session’s lifecycle.
--grid-urlstringhttps://grid.example.comPublic URL of the Grid as a whole (typically the address of the Hub or the Router)
--heartbeat-periodint60How often, in seconds, will the Node send heartbeat events to the Distributor to inform it that the Node is up.
--max-sessionsint8Maximum number of concurrent sessions. Default value is the number of available processors.
--override-max-sessionsbooleanfalseThe # of available processors is the recommended max sessions value (1 browser session per processor). Setting this flag to true allows the recommended max value to be overwritten. Session stability and reliability might suffer as the host could run out of resources.
--register-cycleint10How often, in seconds, the Node will try to register itself for the first time to the Distributor.
--register-periodint120How long, in seconds, will the Node try to register to the Distributor for the first time. After this period is completed, the Node will not attempt to register again.
--session-timeoutint300Let X be the session-timeout in seconds. The Node will automatically kill a session that has not had any activity in the last X seconds. This will release the slot for other tests.
--vnc-env-varstringSTART_XVFBEnvironment variable to check in order to determine if a vnc stream is available or not.
--no-vnc-portint7900If VNC is available, sets the port where the local noVNC stream can be obtained
--drain-after-session-countint1Drain and shutdown the Node after X sessions have been executed. Useful for environments like Kubernetes. A value higher than zero enables this feature.
--hubstringhttp://localhost:4444The address of the Hub in a Hub-and-Node configuration. Can be a hostname or IP address (hostname), in which case the Hub will be assumed to be http://hostname:4444, the --grid-url will be the same --publish-events will be tcp://hostname:4442 and --subscribe-events will be tcp://hostname:4443. If hostname contains a port number, that will be used for --grid-url but the URIs for the event bus will remain the same. Any of these default values may be overridden but setting the correct flags. If the hostname has a protocol (such as https) that will be used too.
--enable-cdpbooleantrueEnable CDP proxying in Grid. A Grid admin can disable CDP if the network doesnot allow websockets. True by default.
--enable-managed-downloadsbooleanfalseThis causes the Node to auto manage files downloaded for a given session on the Node.
--selenium-managerbooleanfalseWhen drivers are not available on the current system, use Selenium Manager. False by default.

Relay

OptionTypeValue/ExampleDescription
--service-urlstringhttp://localhost:4723URL for connecting to the service that supports WebDriver commands like an Appium server or a cloud service.
--service-hoststringlocalhostHost name where the service that supports WebDriver commands is running
--service-portint4723Port where the service that supports WebDriver commands is running
--service-status-endpointstring/statusOptional, endpoint to query the WebDriver service status, an HTTP 200 response is expected
--service-configurationstring[]max-sessions=2 stereotype='{"browserName": "safari", "platformName": "iOS", "appium:platformVersion": "14.5"}}'Configuration for the service where calls will be relayed to. It is recommended to provide this type of configuration through a toml config file to improve readability.

Router

OptionTypeValue/ExampleDescription
--passwordstringmyStrongPasswordPassword clients must use to connect to the server. Both this and the username need to be set in order to be used.
--usernamestringadminUser name clients must use to connect to the server. Both this and the password need to be set in order to be used.
--sub-pathstringmy_company/selenium_gridA sub-path that should be considered for all user facing routes on the Hub/Router/Standalone.

Server

OptionTypeValue/ExampleDescription
--allow-corsbooleantrueWhether the Selenium server should allow web browser connections from any host
--hoststringlocalhostServer IP or hostname: usually determined automatically.
--bind-hostbooleantrueWhether the server should bind to the host address/name, or only use it to" report its reachable url. Helpful in complex network topologies where the server cannot report itself with the current IP/hostname but rather an external IP or hostname (e.g. inside a Docker container)
--https-certificatepath/path/to/cert.pemServer certificate for https. Get more detailed information by running “java -jar selenium-server.jar info security”
--https-private-keypath/path/to/key.pkcs8Private key for https. Get more detailed information by running “java -jar selenium-server.jar info security”
--max-threadsint24Maximum number of listener threads. Default value is: (available processors) * 3.
--portint4444Port to listen on. There is no default as this parameter is used by different components, for example, Router/Hub/Standalone will use 4444 and Node will use 5555.

SessionQueue

OptionTypeValue/ExampleDescription
--sessionqueueurihttp://localhost:1237Address of the session queue server.
-sessionqueue-hoststringlocalhostHost on which the session queue server is listening.
--sessionqueue-portint1234Port on which the session queue server is listening.
--session-request-timeoutint300Timeout in seconds. A new incoming session request is added to the queue. Requests sitting in the queue for longer than the configured time will timeout.
--session-retry-intervalint5Retry interval in seconds. If all slots are busy, new session request will be retried after the given interval.

Sessions

OptionTypeValue/ExampleDescription
--sessionsurihttp://localhost:1234Address of the session map server.
--sessions-hoststringlocalhostHost on which the session map server is listening.
--sessions-portint1234Port on which the session map server is listening.

Configuration examples

All the options mentioned above can be used when starting the Grid components. They are a good way of exploring the Grid options, and trying out values to find a suitable configuration.

We recommend the use of Toml files to configure a Grid. Configuration files improve readability, and you can also check them in source control.

When needed, you can combine a Toml file configuration with CLI arguments.

Command-line flags

To pass config options as command-line flags, identify the valid options for the component and follow the template below.

java -jar selenium-server-<version>.jar <component> --<option> value

Standalone, setting max sessions and main port

java -jar selenium-server-<version>.jar standalone --max-sessions 4 --port 4444

Hub, setting a new session request timeout, a main port, and disabling tracing

java -jar selenium-server-<version>.jar hub --session-request-timeout 500 --port 3333 --tracing false

Node, with 4 max sessions, with debug(fine) log, 7777 as port, and only with Firefox and Edge

java -jar selenium-server-<version>.jar node --max-sessions 4 --log-level "fine" --port 7777 --driver-implementation "firefox" --driver-implementation "edge"

Distributor, setting Session Map server url, Session Queue server url, and disabling bus

java -jar selenium-server-<version>.jar distributor --sessions http://localhost:5556 --sessionqueue http://localhost:5559 --bind-bus false

Setting custom capabilities for matching specific Nodes

Important: Custom capabilities need to be set in the configuration in all Nodes. They also need to be included always in every session request.

Start the Hub
java -jar selenium-server-<version>.jar hub
Start the Node A with custom cap set to true
java -jar selenium-server-<version>.jar node --detect-drivers false --driver-configuration display-name="Chrome (custom capability true)" max-sessions=1 stereotype='{"browserName":"chrome","gsg:customcap":true}' --port 6161
Start the Node B with custom cap set to false
java -jar selenium-server-<version>.jar node --detect-drivers false --driver-configuration display-name="Chrome (custom capability true)" max-sessions=1 stereotype='{"browserName":"chrome","gsg:customcap":false}' --port 6262
Matching Node A
ChromeOptions options = new ChromeOptions();
options.setCapability("gsg:customcap", true);
WebDriver driver = new RemoteWebDriver(new URL("http://localhost:4444"), options);
driver.get("https://selenium.dev");
driver.quit();

Set the custom capability to false in order to match the Node B.

Enabling Managed downloads by the Node

At times a test may need to access files that were downloaded by it on the Node. To retrieve such files, following can be done.

Start the Hub
java -jar selenium-server-<version>.jar hub
Start the Node with manage downloads enabled
java -jar selenium-server-<version>.jar node --enable-managed-downloads true
Set the capability at the test level

Tests that want to use this feature should set the capability "se:downloadsEnabled"to true

options.setCapability("se:downloadsEnabled", true);
How does this work
  • The Grid infrastructure will try to match a session request with "se:downloadsEnabled" against ONLY those nodes which were started with --enable-managed-downloads true
  • If a session is matched, then the Node automatically sets the required capabilities to let the browser know, as to where should a file be downloaded.
  • The Node now allows a user to:
    • List all the files that were downloaded for a specific session and
    • Retrieve a specific file from the list of files.
  • The directory into which files were downloaded for a specific session gets automatically cleaned up when the session ends (or) timesout due to inactivity.

Note: Currently this capability is ONLY supported on:

  • Edge
  • Firefox and
  • Chrome browser
Listing files that can be downloaded for current session:
  • The endpoint to GET from is /session/<sessionId>/se/files.
  • The session needs to be active in order for the command to work.
  • The raw response looks like below:
{
  "value": {
    "names": [
      "Red-blue-green-channel.jpg"
    ]
  }
}

In the response the list of file names appear under the key names.

Dowloading a file:
  • The endpoint to POST from is /session/<sessionId>/se/files with a payload of the form {"name": "fileNameGoesHere}
  • The session needs to be active in order for the command to work.
  • The raw response looks like below:
{
  "value": {
    "filename": "Red-blue-green-channel.jpg",
    "contents": "Base64EncodedStringContentsOfDownloadedFileAsZipGoesHere"
  }
}
  • The response blob contains two keys,
    • filename - The file name that was downloaded.
    • contents - Base64 encoded zipped contents of the file.
  • The file contents are Base64 encoded and they need to be unzipped.
List files that can be downloaded

The below mentioned curl example can be used to list all the files that were downloaded by the current session in the Node, and which can be retrieved locally.

curl -X GET "http://localhost:4444/session/90c0149a-2e75-424d-857a-e78734943d4c/se/files"

A sample response would look like below:

{
  "value": {
    "names": [
      "Red-blue-green-channel.jpg"
    ]
  }
}
Retrieve a downloaded file

Assuming the downloaded file is named Red-blue-green-channel.jpg, and using curl, the file could be downloaded with the following command:

curl -H "Accept: application/json" \
-H "Content-Type: application/json; charset=utf-8" \
-X POST -d '{"name":"Red-blue-green-channel.jpg"}' \
"http://localhost:4444/session/18033434-fa4f-4d11-a7df-9e6d75920e19/se/files"

A sample response would look like below:

{
  "value": {
    "filename": "Red-blue-green-channel.jpg",
    "contents": "UEsDBBQACAgIAJpagVYAAAAAAAAAAAAAAAAaAAAAUmVkLWJsAAAAAAAAAAAAUmVkLWJsdWUtZ3JlZW4tY2hhbm5lbC5qcGdQSwUGAAAAAAEAAQBIAAAAcNkAAAAA"
  }
}
Complete sample code in Java

Below is an example in Java that does the following:

  • Sets the capability to indicate that the test requires automatic managing of downloaded files.
  • Triggers a file download via a browser.
  • Lists the files that are available for retrieval from the remote node (These are essentially files that were downloaded in the current session)
  • Picks one file and downloads the file from the remote node to the local machine.
import com.google.common.collect.ImmutableMap;

import org.openqa.selenium.By;
import org.openqa.selenium.io.Zip;
import org.openqa.selenium.json.Json;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.remote.http.HttpClient;
import org.openqa.selenium.remote.http.HttpRequest;
import org.openqa.selenium.remote.http.HttpResponse;

import java.io.File;
import java.net.URL;
import java.nio.file.Files;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

import static org.openqa.selenium.remote.http.Contents.asJson;
import static org.openqa.selenium.remote.http.Contents.string;
import static org.openqa.selenium.remote.http.HttpMethod.GET;
import static org.openqa.selenium.remote.http.HttpMethod.POST;

public class DownloadsSample {

  public static void main(String[] args) throws Exception {
    // Assuming the Grid is running locally.
    URL gridUrl = new URL("http://localhost:4444");
    ChromeOptions options = new ChromeOptions();
    options.setCapability("se:downloadsEnabled", true);
    RemoteWebDriver driver = new RemoteWebDriver(gridUrl, options);
    try {
      demoFileDownloads(driver, gridUrl);
    } finally {
      driver.quit();
    }
  }

	private static void demoFileDownloads(RemoteWebDriver driver, URL gridUrl) throws Exception {
		driver.get("https://www.selenium.dev/selenium/web/downloads/download.html");
		// Download the two available files on the page
		driver.findElement(By.id("file-1")).click();
		driver.findElement(By.id("file-2")).click();

		// The download happens in a remote Node, which makes it difficult to know when the file
		// has been completely downloaded. For demonstration purposes, this example uses a
		// 10-second sleep which should be enough time for a file to be downloaded.
		// We strongly recommend to avoid hardcoded sleeps, and ideally, to modify your
		// application under test, so it offers a way to know when the file has been completely
		// downloaded.
		TimeUnit.SECONDS.sleep(10);

		//This is the endpoint which will provide us with list of files to download and also to
		//let us download a specific file.
		String downloadsEndpoint = String.format("/session/%s/se/files", driver.getSessionId());

		String fileToDownload;

		try (HttpClient client = HttpClient.Factory.createDefault().createClient(gridUrl)) {
			// To list all files that are were downloaded on the remote node for the current session
			// we trigger GET request.
			HttpRequest request = new HttpRequest(GET, downloadsEndpoint);
			HttpResponse response = client.execute(request);
			Map<String, Object> jsonResponse = new Json().toType(string(response), Json.MAP_TYPE);
			@SuppressWarnings("unchecked")
			Map<String, Object> value = (Map<String, Object>) jsonResponse.get("value");
			@SuppressWarnings("unchecked")
			List<String> names = (List<String>) value.get("names");
			// Let's say there were "n" files downloaded for the current session, we would like
			// to retrieve ONLY the first file.
			fileToDownload = names.get(0);
		}

		// Now, let's download the file
		try (HttpClient client = HttpClient.Factory.createDefault().createClient(gridUrl)) {
			// To retrieve a specific file from one or more files that were downloaded by the current session
			// on a remote node, we use a POST request.
			HttpRequest request = new HttpRequest(POST, downloadsEndpoint);
			request.setContent(asJson(ImmutableMap.of("name", fileToDownload)));
			HttpResponse response = client.execute(request);
			Map<String, Object> jsonResponse = new Json().toType(string(response), Json.MAP_TYPE);
			@SuppressWarnings("unchecked")
			Map<String, Object> value = (Map<String, Object>) jsonResponse.get("value");
			// The returned map would contain 2 keys,
			// filename - This represents the name of the file (same as what was provided by the test)
			// contents - Base64 encoded String which contains the zipped file.
			String zippedContents = value.get("contents").toString();
			// The file contents would always be a zip file and has to be unzipped.
			File downloadDir = Zip.unzipToTempDir(zippedContents, "download", "");
			// Read the file contents
			File downloadedFile = Optional.ofNullable(downloadDir.listFiles()).orElse(new File[]{})[0];
			String fileContent = String.join("", Files.readAllLines(downloadedFile.toPath()));
			System.out.println("The file which was "
					+ "downloaded in the node is now available in the directory: "
					+ downloadDir.getAbsolutePath() + " and has the contents: " + fileContent);
		}
	}


}

3.4.3 - Toml配置选项

使用Toml文件的Grid配置示例.

CLI选项 中 显示的所有选项都可以通过 TOML 文件进行配置. 此页面显示不同Grid组件的配置示例.

请注意, 如果修改或添加了选项, 但尚未记录, 则此文档可能已过时. 如果您遇到这种情况, 请查看 “配置帮助” 部分, 并随时向我们发送更新此页面的请求.

概述

Selenium Grid对配置文件使用 TOML 格式. 配置文件由多个部分组成, 每个部分都有选项及其各自的值.

有关详细的使用指南, 请参阅TOML文档 . 如果出现解析错误, 请使用 TOML linter 验证配置.

一般配置结构具有以下模式:

[section1]
option1="value"

[section2]
option2=["value1","value2"]
option3=true

下面是一些使用Toml文件配置的 Grid组件示例, 该组件可以 从下面的方式开始:

java -jar selenium-server-<version>.jar <component> --config /path/to/file/<file-name>.toml

单机模式

单机服务器, 在端口4449上运行, 新会话请求超时500秒.

[server]
port = 4449

[sessionqueue]
session-request-timeout = 500

特定浏览器和最大会话数限制

默认情况下仅启用Firefox 和Chrome的单机服务器或节点.

[node]
drivers = ["chrome", "firefox"]
max-sessions = 3

配置和定制驱动程序

具有定制驱动程序的单机或节点服务器, 允许使用Firefox试用或者每日构建的功能, 并且有不同的浏览器版本.

[node]
detect-drivers = false
[[node.driver-configuration]]
max-sessions = 100
display-name = "Firefox Nightly"
stereotype = "{\"browserName\": \"firefox\", \"browserVersion\": \"93\", \"platformName\": \"MAC\", \"moz:firefoxOptions\": {\"binary\": \"/Applications/Firefox Nightly.app/Contents/MacOS/firefox-bin\"}}"
[[node.driver-configuration]]
display-name = "Chrome Beta"
stereotype = "{\"browserName\": \"chrome\", \"browserVersion\": \"94\", \"platformName\": \"MAC\", \"goog:chromeOptions\": {\"binary\": \"/Applications/Google Chrome Beta.app/Contents/MacOS/Google Chrome Beta\"}}"
[[node.driver-configuration]]
display-name = "Chrome Dev"
stereotype = "{\"browserName\": \"chrome\", \"browserVersion\": \"95\", \"platformName\": \"MAC\", \"goog:chromeOptions\": {\"binary\": \"/Applications/Google Chrome Dev.app/Contents/MacOS/Google Chrome Dev\"}}"
webdriver-executable = '/path/to/chromedriver/95/chromedriver'

带Docker的单机或节点

单机或节点服务器能够在Docker容器中运行每个新会话. 禁用驱动程序检测, 则最多有2个并发会话. 原型配置需要映射一个Docker映像, Docker的守护进程需要通过http/tcp公开. 此外,可以通过 devices 属性定义在主机上可访问的哪些设备文件将在容器中可用。 有关 docker 设备映射如何工作的更多信息,请参阅 docker 文档。

[node]
detect-drivers = false
max-sessions = 2

[docker]
configs = [
    "selenium/standalone-chrome:93.0", "{\"browserName\": \"chrome\", \"browserVersion\": \"91\"}", 
    "selenium/standalone-firefox:92.0", "{\"browserName\": \"firefox\", \"browserVersion\": \"92\"}"
]
#Optionally define all device files that should be mapped to docker containers
#devices = [
#    "/dev/kvm:/dev/kvm"
#]
url = "http://localhost:2375"
video-image = "selenium/video:latest"

将命令中继到支持WebDriver的服务端点

连接到支持WebDriver外部服务 的Selenium Grid非常有用. 这种服务的一个例子可以是 云提供商或Appium服务器. 这样, Grid可以实现对本地不存在的平台和版本的更多覆盖.

下面是一个将Appium服务器连接到Grid的示例.

[node]
detect-drivers = false

[relay]
# Default Appium/Cloud server endpoint
url = "http://localhost:4723/wd/hub"
status-endpoint = "/status"
# Stereotypes supported by the service. The initial number is "max-sessions", and will allocate 
# that many test slots to that particular configuration
configs = [
  "5", "{\"browserName\": \"chrome\", \"platformName\": \"android\", \"appium:platformVersion\": \"11\"}"
]

启用基本身份验证

通过配置包含用户名和密码的 路由器/集线器/单机的方式, 可以使用这样的基本身份验证保护Grid. 加载Grid UI或者开始一个新的会话时 需要此用户/密码组合.

[router]
username = "admin"
password = "myStrongPassword"

下面是一个Java示例, 演示如何使用配置的用户和密码启动会话.

URL gridUrl = new URL("http://admin:myStrongPassword@localhost:4444");
RemoteWebDriver webDriver = new RemoteWebDriver(gridUrl, new ChromeOptions());

Setting custom capabilities for matching specific Nodes

Important: Custom capabilities need to be set in the configuration in all Nodes. They also need to be included always in every session request.

[node]
detect-drivers = false

[[node.driver-configuration]]
display-name = "firefox"
stereotype = '{"browserName": "firefox", "platformName": "macOS", "browserVersion":"96", "networkname:applicationName":"node_1", "nodename:applicationName":"app_1" }'
max-sessions = 5

Here is a Java example showing how to match that Node

FirefoxOptions options = new FirefoxOptions();
options.setCapability("networkname:applicationName", "node_1");
options.setCapability("nodename:applicationName", "app_1");
options.setBrowserVersion("96");
options.setPlatformName("macOS");
WebDriver driver = new RemoteWebDriver(new URL("http://localhost:4444"), options);
driver.get("https://selenium.dev");
driver.quit();

Enabling Managed downloads by the Node.

The Node can be instructed to manage downloads automatically. This will cause the Node to save all files that were downloaded for a particular session into a temp directory, which can later be retrieved from the node. To turn this capability on, use the below configuration:

[node]
enable-managed-downloads = true

Refer to the CLI section for a complete example.

3.5 - Grid架构

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

The Grid is designed as a set of components that all fulfill a role in maintaining the Grid. It can seem quite complicated, but hopefully this document can help clear up any confusion.

The Key Components

The main components of the Grid are:

Event Bus
Used for sending messages which may be received asynchronously between the other components.
New Session Queue
Maintains a list of incoming sessions which have yet to be assigned to a Node by the Distributor.
Distributor
Responsible for maintaining a model of the available locations in the Grid where a session may run (known as "slots") and taking any incoming new session requests and assigning them to a slot.
Node
Runs a WebDriver session. Each session is assigned to a slot, and each node has one or more slots.
Session Map
Maintains a mapping between the session ID and the address of the Node the session is running on.
Router
Acts as the front-end of the Grid. This is the only part of the Grid which may be exposed to the wider Web (though we strongly caution against it). This routes incoming requests to either the New Session Queue or the Node on which the session is running.

While discussing the Grid, there are some other useful concepts to keep in mind:

  • A slot is the place where a session can run.
  • Each slot has a stereotype. This is the minimal set of capabilities that a new session session request must match before the Distributor will send that request to the Node owning the slot.
  • The Grid Model is how the Distributor tracks the state of the Grid. As the name suggests, this may sometimes fall out of sync with reality (perhaps because the Distributor has only just started). It is used in preference to querying each Node so that the Distributor can quickly assign a slot to a New Session request.

Synchronous and Asynchronous Calls

There are two main communication mechanisms used within the Grid:

  1. Synchronous “REST-ish” JSON over HTTP requests.
  2. Asynchronous events sent to the Event Bus.

How do we pick which communication mechanism to use? After all, we could model the entire Grid in an event-based way, and it would work out just fine.

The answer is that if the action being performed is synchronous (eg. most WebDriver calls), or if missing the response would be problematic, the Grid uses a synchronous call. If, instead, we want to broadcast information to anyone who’s interested, or if missing the response doesn’t matter, then we prefer to use the event bus.

One interesting thing to note is that the async calls are more decoupled from their listeners than the synchronous calls are.

Start Up Sequence and Dependencies Between Components

Although the Grid is designed to allow components to start up in any order, conceptually the order in which components starts is:

  1. The Event Bus and Session Map start first. These have no other dependencies, not even on each other, and so are safe to start in parallel.
  2. The Session Queue starts next.
  3. It is now possible to start the Distributor. This will periodically connect to the Session Queue and poll for jobs, though this polling might be initiated either by an event (that a New Session has been added to the queue) or at regular intervals.
  4. The Router(s) can be started. New Session requests will be directed to the Session Queue, and the Distributor will attempt to find a slot to run the session on.
  5. We are now able to start a Node. See below for details about how the Node is registered with the Grid. Once registration is complete, the Grid is ready to serve traffic.

You can picture the dependencies between components this way, where a “✅” indicates that there is a synchronous dependency between the components.

Event BusDistributorNodeRouterSession MapSession Queue
Event BusX
DistributorX
NodeX
RouterX
Session MapX
Session QueueX

Node Registration

The process of registering a new Node to the Grid is lightweight.

  1. When the Node starts, it should emit a “heart beat” event on a regular basis. This heartbeat contains the node status.
  2. The Distributor listens for the heart beat events. When it sees one, it attempts to GET the /status endpoint of the Node. It is from this information that the Grid is set up.

The Distributor will use the same /status endpoint to check the Node on a regular basis, but the Node should continue sending heart beat events even after started so that a Distributor without a persistent store of the Grid state can be restarted and will (eventually) be up to date and correct.

The Node Status Object

The Node Status is a JSON blob with the following fields:

NameTypeDescription
availabilitystringA string which is one of up, draining, or down. The important one is draining, which indicates that no new sessions should be sent to the Node, and once the last session on it closes, the Node will exit or restart.
externalUrlstringThe URI that the other components in the Grid should connect to.
lastSessionCreatedintegerThe epoch timestamp of when the last session was created on this Node. The Distributor will attempt to send new sessions to the Node that has been idle longest if all other things are equal.
maxSessionCountintegerAlthough a session count can be inferred by counting the number of available slots, this integer value is used to determine the maximum number of sessions that should be running simultaneously on the Node before it is considered “full”.
nodeIdstringA UUID used to identify this instance of the Node.
osInfoobjectAn object with arch, name, and version fields. This is used by the Grid UI and the GraphQL queries.
slotsarrayAn array of Slot objects (described below)
versionstringThe version of the Node (for Selenium, this will match the Selenium version number)

It is recommended to put values in all fields.

The Slot Object

The Slot object represents a single slot within a Node. A “slot” is where a single session may be run. It is possible that a Node will have more slots than it can run concurrently. For example, a node may be able to run up 10 sessions, but they could be any combination of Chrome, Edge, or Firefox; in this case, the Node would indicate a “max session count” of 10, and then also say it has 10 slots for Chrome, 10 for Edge, and 10 for Firefox.

NameTypeDescription
idstringUUID to refer to the slot
lastStartedstringWhen the slot last had a session started, in ISO-8601 format
stereotypeobjectThe minimal set of capabilities this slot will match against. A minimal example is {"browserName": "firefox"}
sessionobjectThe Session object (see below)

The Session Object

This represents a running session within a slot

NameTypeDescription
capabilitiesobjectThe actual capabilities provided by the session. Will match the return value from the new session command
startTimestringThe start time of the session in ISO-8601 format
stereotypeobjectThe minimal set of capabilities this slot will match against. A minimal example is {"browserName": "firefox"}
uristringThe URI used by the Node to communicate with the session

3.6 - 高级功能

要获得高级功能的所有详细信息, 了解其工作原理, 以及如何设置自己的功能, 请浏览以下部分.

3.6.1 - 可观测性

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

Table of Contents

Selenium Grid

Grid aids in scaling and distributing tests by executing tests on various browser and operating system combinations.

Observability

Observability has three pillars: traces, metrics and logs. Since Selenium Grid 4 is designed to be fully distributed, observability will make it easier to understand and debug the internals.

Distributed tracing

A single request or transaction spans multiple services and components. Tracing tracks the request lifecycle as each service executes the request. It is useful in debugging in an error scenario. Some key terms used in tracing context are:

Trace Tracing allows one to trace a request through multiple services, starting from its origin to its final destination. This request’s journey helps in debugging, monitoring the end-to-end flow, and identifying failures. A trace depicts the end-to-end request flow. Each trace has a unique id as its identifier.

Span Each trace is made up of timed operations called spans. A span has a start and end time and it represents operations done by a service. The granularity of span depends on how it is instrumented. Each span has a unique identifier. All spans within a trace have the same trace id.

Span Attributes Span attributes are key-value pairs which provide additional information about each span.

Events Events are timed-stamped logs within a span. They provide additional context to the existing spans. Events also contain key-value pairs as event attributes.

Event logging

Logging is essential to debug an application. Logging is often done in a human-readable format. But for machines to search and analyze the logs, it has to have a well-defined format. Structured logging is a common practice of recording logs consistently in a fixed format. It commonly contains fields like:

  • Timestamp
  • Logging level
  • Logger class
  • Log message (This is further broken down into fields relevant to the operation where the log was recorded)

Logs and events are closely related. Events encapsulate all the possible information available to do a single unit of work. Logs are essentially subsets of an event. At the crux, both aid in debugging. Refer following resources for detailed understanding:

  1. https://www.honeycomb.io/blog/how-are-structured-logs-different-from-events/
  2. https://charity.wtf/2019/02/05/logs-vs-structured-events/

Grid Observability

Selenium server is instrumented with tracing using OpenTelemetry. Every request to the server is traced from start to end. Each trace consists of a series of spans as a request is executed within the server. Most spans in the Selenium server consist of two events:

  1. Normal event - records all information about a unit of work and marks successful completion of the work.
  2. Error event - records all information till the error occurs and then records the error information. Marks an exception event.

Running Selenium server

  1. Standalone
  2. Hub and Node
  3. Fully Distributed
  4. Docker

Visualizing Traces

All spans, events and their respective attributes are part of a trace. Tracing works while running the server in all of the above-mentioned modes.

By default, tracing is enabled in the Selenium server. Selenium server exports the traces via two exporters:

  1. Console - Logs all traces and their included spans at FINE level. By default, Selenium server prints logs at INFO level and above. The log-level flag can be used to pass a logging level of choice while running the Selenium Grid jar/s.
java -jar selenium-server-4.0.0-<selenium-version>.jar standalone --log-level FINE
  1. Jaeger UI - OpenTelemetry provides the APIs and SDKs to instrument traces in the code. Whereas Jaeger is a tracing backend, that aids in collecting the tracing telemetry data and providing querying, filtering and visualizing features for the data.

Detailed instructions of visualizing traces using Jaeger UI can be obtained by running the command :

java -jar selenium-server-4.0.0-<selenium-version>.jar info tracing

A very good example and scripts to run the server and send traces to Jaeger

Leveraging event logs

Tracing has to be enabled for event logging as well, even if one does not wish to export traces to visualize them.
By default, tracing is enabled. No additional parameters need to be passed to see logs on the console. All events within a span are logged at FINE level. Error events are logged at WARN level.

All event logs have the following fields :

FieldField valueDescription
Event timeeventIdTimestamp of the event record in epoch nanoseconds.
Trace IdtracedIdEach trace is uniquely identified by a trace id.
Span IdspanIdEach span within a trace is uniquely identified by a span id.
Span KindspanKindSpan kind is a property of span indicating the type of span. It helps in understanding the nature of the unit of work done by the Span.
Event nameeventNameThis maps to the log message.
Event attributeseventAttributesThis forms the crux of the event logs, based on the operation executed, it has JSON formatted key-value pairs. This also includes a handler class attribute, to show the logger class.

Sample log

FINE [LoggingOptions$1.lambda$export$1] - {
  "traceId": "fc8aef1d44b3cc8bc09eb8e581c4a8eb",
  "spanId": "b7d3b9865d3ddd45",
  "spanKind": "INTERNAL",
  "eventTime": 1597819675128886121,
  "eventName": "Session request execution complete",
  "attributes": {
    "http.status_code": 200,
    "http.handler_class": "org.openqa.selenium.grid.router.HandleSession",
    "http.url": "\u002fsession\u002fdd35257f104bb43fdfb06242953f4c85",
    "http.method": "DELETE",
    "session.id": "dd35257f104bb43fdfb06242953f4c85"
  }
}

In addition to the above fields, based on OpenTelemetry specification error logs consist of :

FieldField valueDescription
Exception typeexception.typeThe class name of the exception.
Exception messageexception.messageReason for the exception.
Exception stacktraceexception.stacktracePrints the call stack at the point of time when the exception was thrown. Helps in understanding the origin of the exception.

Sample error log

WARN [LoggingOptions$1.lambda$export$1] - {
  "traceId": "7efa5ea57e02f89cdf8de586fe09f564",
  "spanId": "914df6bc9a1f6e2b",
  "spanKind": "INTERNAL",
  "eventTime": 1597820253450580272,
  "eventName": "exception",
  "attributes": {
    "exception.type": "org.openqa.selenium.ScriptTimeoutException",
    "exception.message": "Unable to execute request: java.sql.SQLSyntaxErrorException: Table 'mysql.sessions_mappa' doesn't exist ..." (full message will be printed),
    "exception.stacktrace": "org.openqa.selenium.ScriptTimeoutException: java.sql.SQLSyntaxErrorException: Table 'mysql.sessions_mappa' doesn't exist\nBuild info: version: '4.0.0-alpha-7', revision: 'Unknown'\nSystem info: host: 'XYZ-MacBook-Pro.local', ip: 'fe80:0:0:0:10d5:b63a:bdc6:1aff%en0', os.name: 'Mac OS X', os.arch: 'x86_64', os.version: '10.13.6', java.version: '11.0.7'\nDriver info: driver.version: unknown ...." (full stack will be printed),
    "http.handler_class": "org.openqa.selenium.grid.distributor.remote.RemoteDistributor",
    "http.url": "\u002fsession",
    "http.method": "POST"
  }
}

Note: Logs are pretty printed above for readability. Pretty printing for logs is turned off in Selenium server.

The steps above should set you up for seeing traces and logs.

References

  1. Understanding Tracing
  2. OpenTelemetry Tracing API Specification
  3. Selenium Wiki
  4. Structured logs vs events
  5. Jaeger framework

3.6.2 - GraphQL查询支持

GraphQL 是一种用于API的查询语言, 也是用于使用现有数据完成这些查询的运行时. 其仅仅是使用户能够准确地获取所需.

枚举

枚举是表示字段的可能值的集合.

例如, Node对象具有一个称为status的字段. 状态是一个枚举 (特别是Status类型) , 因为它可能是UP , DRAININGUNAVAILABLE.

标量

标量是基本类型的值: Int, Float, String, Boolean, 或 ID.

在调用GraphQL API时, 必须指定嵌套子字段, 直到只返回标量.

模式的结构

网格模式的结构如下:

{
    session(id: "<session-id>") : {
        id,
        capabilities,
        startTime,
        uri,
        nodeId,
        nodeUri,
        sessionDurationMillis
        slot : {
            id,
            stereotype,
            lastStarted
        }
    }
    grid: {
        uri,
        totalSlots,
        nodeCount,
        maxSession,
        sessionCount,
        version,
        sessionQueueSize
    }
    sessionsInfo: {
        sessionQueueRequests,
        sessions: [
            {
                id,
                capabilities,
                startTime,
                uri,
                nodeId,
                nodeUri,
                sessionDurationMillis
                slot : {
                    id,
                    stereotype,
                    lastStarted
                }
            }
        ]
    }
    nodesInfo: {
        nodes : [
            {
                id,
                uri,
                status,
                maxSession,
                slotCount,
                sessions: [
                    {
                        id,
                        capabilities,
                        startTime,
                        uri,
                        nodeId,
                        nodeUri,
                        sessionDurationMillis
                        slot : {
                            id,
                            stereotype,
                            lastStarted
                        }
                    }
                ],
                sessionCount,
                stereotypes,
                version,
                osInfo: {
                    arch,
                    name,
                    version
                }
            }
        ]
    }
}

查询 GraphQL

查询GraphQL的最佳方法是使用curl请求. GraphQL允许您仅获取所需的数据, 仅此而已.

下面给出了一些GraphQL查询的示例. 您可以根据需要构建自己的查询.

查询网格中 maxSessionsessionCount 的数量:

curl -X POST -H "Content-Type: application/json" --data '{"query": "{ grid { maxSession, sessionCount } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

通常在本地机器上 <LINK_TO_GRAPHQL_ENDPOINT> 会是 http://localhost:4444/graphql

查询全部会话、及节点以及网格的详情 :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { uri, maxSession, sessionCount }, nodesInfo { nodes { id, uri, status, sessions { id, capabilities, startTime, uri, nodeId, nodeUri, sessionDurationMillis, slot { id, stereotype, lastStarted } }, slotCount, sessionCount }} }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询以获取当前网格的会话总数 :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { sessionCount } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询以获取网格中的最大会话数量 :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { maxSession } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询以获取网格中所有节点的全部会话详情 :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ sessionsInfo { sessions { id, capabilities, startTime, uri, nodeId, nodeId, sessionDurationMillis } } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询以获取网格中每个节点中所有会话的插槽信息 :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ sessionsInfo { sessions { id, slot { id, stereotype, lastStarted } } } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询以获取给定会话的会话信息查询以获取给定会话的会话信息 :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ session (id: \"<session-id>\") { id, capabilities, startTime, uri, nodeId, nodeUri, sessionDurationMillis, slot { id, stereotype, lastStarted } } } "}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询网格中每个节点的功能 :

curl -X POST -H "Content-Type: application/json" --data '{"query": "{ nodesInfo { nodes { stereotypes } } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询网格中每个节点的状态 :

curl -X POST -H "Content-Type: application/json" --data '{"query": "{ nodesInfo { nodes { status } } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

查询每个节点和网格的 URI :

curl -X POST -H "Content-Type: application/json" --data '{"query": "{ nodesInfo { nodes { uri } } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

Query for getting the current requests in the New Session Queue:

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ sessionsInfo { sessionQueueRequests } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

Query for getting the New Session Queue size :

curl -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { sessionQueueSize } }"}' -s <LINK_TO_GRAPHQL_ENDPOINT>

3.6.3 - Grid端点

Grid

Grid 状态

Grid状态提供Grid的当前状态. 它包含每个注册节点的详细信息. 对于每个节点, 状态包括有关节点可用性、会话和插槽的信息.

cURL GET 'http://localhost:4444/status'

在独立模式下, Grid URL是独立服务器的地址.

在集线器节点模式下, Grid URL是集线器服务器的地址.

在完全分布式模式下, Grid URL是路由服务器的地址.

以上所有模式的默认URL皆为http://localhost:4444.

分发器

删除节点

要从Grid中删除节点, 请使用下面列出的cURL命令. 它不会停止在该节点上运行的任何持续中的会话. 除非显式终止, 否则节点将继续按原样运行. 分发器不再知晓该节点, 因此任何匹配的新会话请求 也不会转发到该节点.

在独立模式下, 分发器URL是独立服务器的地址.

在集线器节点模式下, 分发器URL是集线器服务器的地址.

cURL --request DELETE 'http://localhost:4444/se/grid/distributor/node/<node-id>' --header 'X-REGISTRATION-SECRET: <secret> '

在完全分布式模式下, URL是分发器的地址.

cURL --request DELETE 'http://localhost:5553/se/grid/distributor/node/<node-id>' --header 'X-REGISTRATION-SECRET: <secret>'

如果在设置Grid时未配置注册密码, 则使用

cURL --request DELETE 'http://<Distributor-URL>/se/grid/distributor/node/<node-id>' --header 'X-REGISTRATION-SECRET;'

放空节点

节点放空命令用于优雅地关闭节点. 放空节点将在所有持续中的会话完成后停止节点. 但是, 它不接受任何新的会话请求.

在独立模式下, 分发器URL是独立服务器的地址.

在集线器节点模式下, 分发器URL是集线器服务器的地址.

cURL --request POST 'http://localhost:4444/se/grid/distributor/node/<node-id>/drain' --header 'X-REGISTRATION-SECRET: <secret> '

在完全分布式模式下, URL是分发服务器的地址.

cURL --request POST 'http://localhost:5553/se/grid/distributor/node/<node-id>/drain' --header 'X-REGISTRATION-SECRET: <secret>'

如果在设置Grid时未配置注册密码, 则使用

cURL --request POST 'http://<Distributor-URL>/se/grid/distributor/node/<node-id>/drain' --header 'X-REGISTRATION-SECRET;'

节点

本节中的端点适用于 集线器节点模式和 节点独立运行的完全分布式网格模式. 在一个节点的情况下, 默认节点的URL为 http://localhost:5555 . 如果有多个节点, 请使用 Grid 状态 获取所有节点详细信息 以及定位地址.

状态

节点状态本质上是节点的运行状况检查. 分发器定期ping节点状态, 并相应地更新Grid模型. 状态包括相关的可用性, 会话和插槽的信息.

cURL --request GET 'http://localhost:5555/status'

放空

分发器将 放空 命令传递给 由node-id标识的相应节点. 要直接放空节点, 请使用下面列出的cuRL命令. 两个端点都有效并产生相同的结果. 放空会等待持续中的会话完成后 才停止节点.

cURL --request POST 'http://localhost:5555/se/grid/node/drain' --header 'X-REGISTRATION-SECRET: <secret>'

如果在设置Grid时未配置注册密码, 则使用

cURL --request POST 'http://<node-URL>/se/grid/node/drain' --header 'X-REGISTRATION-SECRET;'

检查会话所有者

要检查会话是否属于某一节点, 请使用下面列出的cURL命令.

cURL --request GET 'http://localhost:5555/se/grid/node/owner/<session-id>' --header 'X-REGISTRATION-SECRET: <secret>'

如果在设置Grid时未配置注册密码, 则使用

cURL --request GET 'http://<node-URL>/se/grid/node/owner/<session-id>' --header 'X-REGISTRATION-SECRET;'

如果会话属于该节点, 则返回true, 否则返回false.

删除会话

删除会话将终止WebDriver会话, 退出驱动程序 并将其从活动会话集合中删除. 任何使用删除的会话id 或重用驱动程序实例的请求 都将抛出错误.

cURL --request DELETE 'http://localhost:5555/se/grid/node/session/<session-id>' --header 'X-REGISTRATION-SECRET: <secret>'

如果在设置Grid时未配置注册密码, 则使用

cURL --request DELETE 'http://<node-URL>/se/grid/node/session/<session-id>' --header 'X-REGISTRATION-SECRET;'

新会话队列

清除新会话队列

新会话请求队列保存新会话请求. 要清除队列, 请使用下面列出的cURL命令. 清除队列将拒绝队列中的所有请求. 对于每个这样的请求, 服务器都会向相应的客户端返回一个错误响应. 清除命令的返回结果是 已删除请求的总数.

在独立模式下, 队列URL是独立服务器的地址. 在集线器节点模式下, 队列URL是集线器服务器的地址.

cURL --request DELETE 'http://localhost:4444/se/grid/newsessionqueue/queue' --header 'X-REGISTRATION-SECRET: <secret>'

在完全分布式模式下, 队列URL是新会话队列服务器的地址.

cURL --request DELETE 'http://localhost:5559/se/grid/newsessionqueue/queue' --header 'X-REGISTRATION-SECRET: <secret>'

如果在设置Grid时未配置注册密码, 则使用

cURL --request DELETE 'http://<URL>/se/grid/newsessionqueue/queue' --header 'X-REGISTRATION-SECRET;'

获取新会话队列请求

New Session Request Queue holds the new session requests. To get the current requests in the queue, use the cURL command enlisted below. The response returns the total number of requests in the queue and the request payloads. 新会话请求队列保存新会话请求. 要获取队列中的当前请求, 请使用下面列出的cURL命令. 响应会返回队列中的请求总数以及请求内容.

在独立模式下, 队列URL是独立服务器的地址. 在集线器节点模式下, 队列URL是集线器服务器的地址.

cURL --request GET 'http://localhost:4444/se/grid/newsessionqueue/queue'

在完全分布式模式下, 队列URL是新会话队列服务器的地址.

cURL --request GET 'http://localhost:5559/se/grid/newsessionqueue/queue'

3.6.4 - Customizing a Node

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

How to customize a Node

There are times when we would like a Node to be customized to our needs.

For e.g., we may like to do some additional setup before a session begins execution and some clean-up after a session runs to completion.

Following steps can be followed for this:

  • Create a class that extends org.openqa.selenium.grid.node.Node

  • Add a static method (this will be our factory method) to the newly created class whose signature looks like this:

    public static Node create(Config config). Here:

    • Node is of type org.openqa.selenium.grid.node.Node
    • Config is of type org.openqa.selenium.grid.config.Config
  • Within this factory method, include logic for creating your new Class.

  • To wire in this new customized logic into the hub, start the node and pass in the fully qualified class name of the above class to the argument --node-implementation

Let’s see an example of all this:

Custom Node as an uber jar

  1. Create a sample project using your favourite build tool (Maven|Gradle).
  2. Add the below dependency to your sample project.
  3. Add your customized Node to the project.
  4. Build an uber jar to be able to start the Node using java -jar command.
  5. Now start the Node using the command:
java -jar custom_node-server.jar node \
--node-implementation org.seleniumhq.samples.DecoratedLoggingNode

Note: If you are using Maven as a build tool, please prefer using maven-shade-plugin instead of maven-assembly-plugin because maven-assembly plugin seems to have issues with being able to merge multiple Service Provider Interface files (META-INF/services)

Custom Node as a regular jar

  1. Create a sample project using your favourite build tool (Maven|Gradle).
  2. Add the below dependency to your sample project.
  3. Add your customized Node to the project.
  4. Build a jar of your project using your build tool.
  5. Now start the Node using the command:
java -jar selenium-server-4.6.0.jar \
--ext custom_node-1.0-SNAPSHOT.jar node \
--node-implementation org.seleniumhq.samples.DecoratedLoggingNode

Below is a sample that just prints some messages on to the console whenever there’s an activity of interest (session created, session deleted, a webdriver command executed etc.,) on the Node.

Sample customized node
package org.seleniumhq.samples;

import java.net.URI;
import java.util.UUID;
import org.openqa.selenium.Capabilities;
import org.openqa.selenium.NoSuchSessionException;
import org.openqa.selenium.WebDriverException;
import org.openqa.selenium.grid.config.Config;
import org.openqa.selenium.grid.data.CreateSessionRequest;
import org.openqa.selenium.grid.data.CreateSessionResponse;
import org.openqa.selenium.grid.data.NodeId;
import org.openqa.selenium.grid.data.NodeStatus;
import org.openqa.selenium.grid.data.Session;
import org.openqa.selenium.grid.log.LoggingOptions;
import org.openqa.selenium.grid.node.HealthCheck;
import org.openqa.selenium.grid.node.Node;
import org.openqa.selenium.grid.node.local.LocalNodeFactory;
import org.openqa.selenium.grid.security.Secret;
import org.openqa.selenium.grid.security.SecretOptions;
import org.openqa.selenium.grid.server.BaseServerOptions;
import org.openqa.selenium.internal.Either;
import org.openqa.selenium.remote.SessionId;
import org.openqa.selenium.remote.http.HttpRequest;
import org.openqa.selenium.remote.http.HttpResponse;
import org.openqa.selenium.remote.tracing.Tracer;

public class DecoratedLoggingNode extends Node {

  private Node node;

  protected DecoratedLoggingNode(Tracer tracer, URI uri, Secret registrationSecret) {
    super(tracer, new NodeId(UUID.randomUUID()), uri, registrationSecret);
  }

  public static Node create(Config config) {
    LoggingOptions loggingOptions = new LoggingOptions(config);
    BaseServerOptions serverOptions = new BaseServerOptions(config);
    URI uri = serverOptions.getExternalUri();
    SecretOptions secretOptions = new SecretOptions(config);

    // Refer to the foot notes for additional context on this line.
    Node node = LocalNodeFactory.create(config);

    DecoratedLoggingNode wrapper = new DecoratedLoggingNode(loggingOptions.getTracer(),
        uri, secretOptions.getRegistrationSecret());
    wrapper.node = node;
    return wrapper;
  }

  @Override
  public Either<WebDriverException, CreateSessionResponse> newSession(
      CreateSessionRequest sessionRequest) {
    System.out.println("Before newSession()");
    try {
      return this.node.newSession(sessionRequest);
    } finally {
      System.out.println("After newSession()");
    }
  }

  @Override
  public HttpResponse executeWebDriverCommand(HttpRequest req) {
    try {
      System.out.println("Before executeWebDriverCommand(): " + req.getUri());
      return node.executeWebDriverCommand(req);
    } finally {
      System.out.println("After executeWebDriverCommand()");
    }
  }

  @Override
  public Session getSession(SessionId id) throws NoSuchSessionException {
    try {
      System.out.println("Before getSession()");
      return node.getSession(id);
    } finally {
      System.out.println("After getSession()");
    }
  }

  @Override
  public HttpResponse uploadFile(HttpRequest req, SessionId id) {
    try {
      System.out.println("Before uploadFile()");
      return node.uploadFile(req, id);
    } finally {
      System.out.println("After uploadFile()");
    }
  }

  @Override
  public void stop(SessionId id) throws NoSuchSessionException {
    try {
      System.out.println("Before stop()");
      node.stop(id);
    } finally {
      System.out.println("After stop()");
    }
  }

  @Override
  public boolean isSessionOwner(SessionId id) {
    try {
      System.out.println("Before isSessionOwner()");
      return node.isSessionOwner(id);
    } finally {
      System.out.println("After isSessionOwner()");
    }
  }

  @Override
  public boolean isSupporting(Capabilities capabilities) {
    try {
      System.out.println("Before isSupporting");
      return node.isSupporting(capabilities);
    } finally {
      System.out.println("After isSupporting()");
    }
  }

  @Override
  public NodeStatus getStatus() {
    try {
      System.out.println("Before getStatus()");
      return node.getStatus();
    } finally {
      System.out.println("After getStatus()");
    }
  }

  @Override
  public HealthCheck getHealthCheck() {
    try {
      System.out.println("Before getHealthCheck()");
      return node.getHealthCheck();
    } finally {
      System.out.println("After getHealthCheck()");
    }
  }

  @Override
  public void drain() {
    try {
      System.out.println("Before drain()");
      node.drain();
    } finally {
      System.out.println("After drain()");
    }

  }

  @Override
  public boolean isReady() {
    try {
      System.out.println("Before isReady()");
      return node.isReady();
    } finally {
      System.out.println("After isReady()");
    }
  }
}

Foot Notes:

In the above example, the line Node node = LocalNodeFactory.create(config); explicitly creates a LocalNode.

There are basically 2 types of user facing implementations of org.openqa.selenium.grid.node.Node available.

These classes are good starting points to learn how to build a custom Node and also to learn the internals of a Node.

  • org.openqa.selenium.grid.node.local.LocalNode - Used to represent a long running Node and is the default implementation that gets wired in when you start a node.
    • It can be created by calling LocalNodeFactory.create(config);, where:
      • LocalNodeFactory belongs to org.openqa.selenium.grid.node.local
      • Config belongs to org.openqa.selenium.grid.config
  • org.openqa.selenium.grid.node.k8s.OneShotNode - This is a special reference implementation wherein the Node gracefully shuts itself down after servicing one test session. This class is currently not available as part of any pre-built maven artifact.
    • You can refer to the source code here to understand its internals.
    • To build it locally refer here.
    • It can be created by calling OneShotNode.create(config), where:
      • OneShotNode belongs to org.openqa.selenium.grid.node.k8s
      • Config belongs to org.openqa.selenium.grid.config

3.6.5 - External datastore

Page being translated from English to Chinese. Do you speak Chinese? Help us to translate it by sending us pull requests!

Table of Contents

Introduction

Selenium Grid allows you to persist information related to currently running sessions into an external data store. The external data store could be backed by your favourite database (or) Redis Cache system.

Setup

  • Coursier - As a dependency resolver, so that we can download maven artifacts on the fly and make them available in our classpath
  • Docker - To manage our PostGreSQL/Redis docker containers.

Database backed Session Map

For the sake of this illustration, we are going to work with PostGreSQL database.

We will spin off a PostGreSQL database as a docker container using a docker compose file.

Steps

You can skip this step if you already have a PostGreSQL database instance available at your disposal.

  • Create a sql file named init.sql with the below contents:
CREATE TABLE IF NOT EXISTS sessions_map(
    session_ids varchar(256),
    session_caps text,
    session_uri varchar(256),
    session_stereotype text,
    session_start varchar(256)
 );
  • In the same directory as the init.sql, create a file named docker-compose.yml with its contents as below:
version: '3.8'
services:
  db:
    image: postgres:9.6-bullseye
    restart: always
    environment:
      - POSTGRES_USER=seluser
      - POSTGRES_PASSWORD=seluser
      - POSTGRES_DB=selenium_sessions
    ports:
      - "5432:5432"
    volumes:
    - ./init.sql:/docker-entrypoint-initdb.d/init.sql

We can now start our database container by running:

docker-compose up -d

Our database name is selenium_sessions with its username and password set to seluser

If you are working with an already running PostGreSQL DB instance, then you just need to create a database named selenium_sessions and the table sessions_map using the above mentioned SQL statement.

  • Create a Selenium Grid configuration file named sessions.toml with the below contents:
[sessions]
implementation = "org.openqa.selenium.grid.sessionmap.jdbc.JdbcBackedSessionMap"
jdbc-url = "jdbc:postgresql://localhost:5432/selenium_sessions"
jdbc-user = "seluser"
jdbc-password = "seluser"

Note: If you plan to use an existing PostGreSQL DB instance, then replace localhost:5432 with the actual host and port number of your instance.

  • Below is a simple shell script (let’s call it distributed.sh) that we will use to bring up our distributed Grid.
SE_VERSION=<current_selenium_version>
JAR_NAME=selenium-server-${SE_VERSION}.jar
PUBLISH="--publish-events tcp://localhost:4442"
SUBSCRIBE="--subscribe-events tcp://localhost:4443"
SESSIONS="--sessions http://localhost:5556"
SESSIONS_QUEUE="--sessionqueue http://localhost:5559"
echo 'Starting Event Bus'
java -jar $JAR_NAME event-bus $PUBLISH $SUBSCRIBE --port 5557 &
echo 'Starting New Session Queue'
java -jar $JAR_NAME sessionqueue --port 5559 &
echo 'Starting Sessions Map'
java -jar $JAR_NAME \
--ext $(coursier fetch -p org.seleniumhq.selenium:selenium-session-map-jdbc:${SE_VERSION} org.postgresql:postgresql:42.3.1) \
sessions $PUBLISH $SUBSCRIBE --port 5556 --config sessions.toml &
echo 'Starting Distributor'
java -jar $JAR_NAME  distributor $PUBLISH $SUBSCRIBE $SESSIONS $SESSIONS_QUEUE --port 5553 --bind-bus false &
echo 'Starting Router'
java -jar $JAR_NAME router $SESSIONS --distributor http://localhost:5553 $SESSIONS_QUEUE --port 4444 &
echo 'Starting Node'
java -jar $JAR_NAME node $PUBLISH $SUBSCRIBE &
  • At this point the current directory should contain the following files:

    • docker-compose.yml
    • init.sql
    • sessions.toml
    • distributed.sh
  • You can now spawn the Grid by running distributed.sh shell script and quickly run a test. You will notice that the Grid now stores session information into the PostGreSQL database.

In the line which spawns a SessionMap on a machine:

export SE_VERSION=<current_selenium_version>
java -jar selenium-server-${SE_VERSION}.jar \
--ext $(coursier fetch -p org.seleniumhq.selenium:selenium-session-map-jdbc:${SE_VERSION} org.postgresql:postgresql:42.3.1) \
sessions --publish-events tcp://localhost:4442 \
--subscribe-events tcp://localhost:4443 \
--port 5556 --config sessions.toml 
  • The variable names from the above script have been replaced with their actual values for clarity.
  • Remember to substitute localhost with the actual hostname of the machine where your Event-Bus is running.
  • The arguments being passed to coursier are basically the GAV (Group Artifact Version) Maven co-ordinates of:
  • sessions.toml is the configuration file that we created earlier.

Redis backed Session Map

We will spin off a Redis Cache docker container using a docker compose file.

Steps

You can skip this step if you already have a Redis Cache instance available at your disposal.

  • Create a file named docker-compose.yml with its contents as below:
version: '3.8'
services:
  redis:
    image: redis:bullseye
    restart: always
    ports:
      - "6379:6379"

We can now start our Redis container by running:

docker-compose up -d
  • Create a Selenium Grid configuration file named sessions.toml with the below contents:
[sessions]
scheme = "redis"
implementation = "org.openqa.selenium.grid.sessionmap.redis.RedisBackedSessionMap"
hostname = "localhost"
port = 6379

Note: If you plan to use an existing Redis Cache instance, then replace localhost and 6379 with the actual host and port number of your instance.

  • Below is a simple shell script (let’s call it distributed.sh) that we will use to bring up our distributed grid.
SE_VERSION=<current_selenium_version>
JAR_NAME=selenium-server-${SE_VERSION}.jar
PUBLISH="--publish-events tcp://localhost:4442"
SUBSCRIBE="--subscribe-events tcp://localhost:4443"
SESSIONS="--sessions http://localhost:5556"
SESSIONS_QUEUE="--sessionqueue http://localhost:5559"
echo 'Starting Event Bus'
java -jar $JAR_NAME event-bus $PUBLISH $SUBSCRIBE --port 5557 &
echo 'Starting New Session Queue'
java -jar $JAR_NAME sessionqueue --port 5559 &
echo 'Starting Session Map'
java -jar $JAR_NAME \
--ext $(coursier fetch -p org.seleniumhq.selenium:selenium-session-map-redis:${SE_VERSION}) \
sessions $PUBLISH $SUBSCRIBE --port 5556 --config sessions.toml &
echo 'Starting Distributor'
java -jar $JAR_NAME  distributor $PUBLISH $SUBSCRIBE $SESSIONS $SESSIONS_QUEUE --port 5553 --bind-bus false &
echo 'Starting Router'
java -jar $JAR_NAME router $SESSIONS --distributor http://localhost:5553 $SESSIONS_QUEUE --port 4444 &
echo 'Starting Node'
java -jar $JAR_NAME node $PUBLISH $SUBSCRIBE &
  • At this point the current directory should contain the following files:

    • docker-compose.yml
    • sessions.toml
    • distributed.sh
  • You can now spawn the Grid by running distributed.sh shell script and quickly run a test. You will notice that the Grid now stores session information into the Redis instance. You can perhaps make use of a Redis GUI such as TablePlus to see them (Make sure that you have setup a debug point in your test, because the values will get deleted as soon as the test runs to completion).

In the line which spawns a SessionMap on a machine:

export SE_VERSION=<current_selenium_version>
java -jar selenium-server-${SE_VERSION}.jar \
--ext $(coursier fetch -p org.seleniumhq.selenium:selenium-session-map-redis:${SE_VERSION}) \
sessions --publish-events tcp://localhost:4442 \
--subscribe-events tcp://localhost:4443 \
--port 5556 --config sessions.toml 
  • The variable names from the above script have been replaced with their actual values for clarity.
  • Remember to substitute localhost with the actual hostname of the machine where your Event-Bus is running.
  • The arguments being passed to coursier are basically the GAV (Group Artifact Version) Maven co-ordinates of:
  • sessions.toml is the configuration file that we created earlier.

4 - IE驱动服务器

Internet Explorer驱动是一种实现WebDriver规范的单机服务器.

This documentation previously located on the wiki

InternetExplorerDriver 是一个独立的服务器, 它实现WebDriver的wire协议. 此驱动程序已在Windows 10的IE 11上进行了测试. 它可能适用于旧版本的IE和Windows, 但未受支持.

驱动程序支持运行32位和64位版本的浏览器. 启动浏览器的"位数", 取决于启动的IEDriverServer.exe的版本. 若是32位的IEDriverServer.exe已启动, 将启动32位版本的IE. 同样的, 如果启动的64位版本的IEDriverServer.exe, 将启动64位版本的IE.

安装

尽管需要一些配置, 但是在使用InternetExplorerDriver之前, 您并不需要运行安装程序. 独立服务器可执行文件必须从下载 页面下载并放置在您的PATH 中.

Pros

  • 在真实浏览器中运行并支持JavaScript

Cons

  • 显然InternetExplorerDriver只能在Windows上运行!
  • 相当慢 (尽管仍然很快 :)

命令行开关

作为一个独立的可执行文件, IE驱动程序的行为可以通过各种命令行参数进行修改. 要设置这些命令行参数的值, 应参考所使用语言绑定的文档. 下表描述了支持的命令行开关. 支持所有 -<开关>, –<开关> 和 /<开关> .

开关释义
–port=<端口号>指定IE驱动程序的HTTP服务器将在其上侦听来自语言绑定的命令的端口. 默认值为5555.
–host=<主机IP地址>指定主机适配器的IP地址, IE驱动程序的HTTP服务器将在其上侦听来自语言绑定的命令. 默认值为127.0.0.1.
–log-level=<日志级别>指定日志消息的输出级别. 有效值包括 FATAL, ERROR, WARN, INFO, DEBUG, 以及 TRACE. 默认为FATAL.
–log-file=<日志文件>指定日志文件的完整路径和文件名. 默认为标准输出.
–extract-path=<路径>指定用于提取服务器使用的支持文件的目录的完整路径. 如果未指定, 则默认为临时目录.
–silent在服务器启动时抑制诊断输出.

重要的系统属性

以下系统属性用于 InternetExplorerDriver (在Java代码中使用 System.getProperty() 读取并使用System.setProperty() 进行设置, 或者 在命令行标识 “-DpropertyName=value”) :

属性含义
webdriver.ie.driverIE驱动程序二进制文件的位置.
webdriver.ie.driver.host指定IE驱动程序将在其上侦听的主机适配器的IP地址.
webdriver.ie.driver.loglevel指定日志消息的输出级别. 有效值包括 FATAL, ERROR, WARN, INFO, DEBUG, 以及 TRACE. 默认为FATAL.
webdriver.ie.driver.logfile指定日志文件的完整路径和文件名.
webdriver.ie.driver.silent在IE驱动程序启动时抑制诊断输出.
webdriver.ie.driver.extractpath指定用于提取服务器使用的支持文件的目录的完整路径. 如果未指定, 则默认为临时目录.

必要的配置

  • IEDriverServer可执行文件必须已下载 并放置在您的 PATH 中.
  • 在Windows Vista, Windows 7或Windows 10上的IE 7或更高版本上, 必须将每个区域的保护模式设置设置为相同的值. 该值可以打开或关闭, 只要每个区域的值相同. 要设置受保护模式设置, 请从“工具”菜单中选择“Internet选项…”, 然后单击“安全”选项卡. 对于每个区域, 标签底部将有一个复选框, 标记为“启用保护模式”.
  • 此外, IE 10及更高版本必须禁用“增强保护模式”. 此选项位于“Internet选项”对话框的“高级”选项卡中.
  • 浏览器缩放级别必须设置为100%, 以便将本机鼠标事件设置为正确的坐标.
  • 对于Windows 10, 您还需要在显示设置中将“更改文本、应用程序和其他项目的大小”设置为100%.
  • 对于IE 11, 您需要在目标计算机上设置一个注册表项, 以便驱动程序能够保持与它创建的Internet Explorer实例的连接. 对于32位Windows安装, 您必须在注册表编辑器中检查的项是HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BFCACHE. 对于64位Windows安装, 键为 HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BFCACHE. 请注意, FEATURE_BFCACHE 子项可能存在, 也可能不存在, 如果不存在, 则应创建该子项. 要点: 在此键内, 创建一个名为iexplore.exe值为0的DWORD.

原生事件以及Internet Explorer

由于 InternetExplorerDriver 仅适用于Windows, 因此它尝试使用所谓的“原生”或操作系统级事件 在浏览器中执行鼠标和键盘操作. 针对同样的操作, 与使用JavaScript模拟相比有明显的差异. 使用原生事件的优点是它不依赖于JavaScript沙箱, 并且可以确保在浏览器中正确传播JavaScript事件. 但是, 当IE浏览器窗口失焦, 以及当尝试将鼠标悬停在元素上时, 当前存在一些鼠标事件问题.

浏览器焦点

难点在于, 如果窗口没有焦点, IE本身似乎没有完全尊重我们在IE浏览器窗口 (WM\_MOUSEDOWNWM\_MOUSEUP) 上发送的Windows消息. 具体来说, 被单击的元素将在其周围接收一个焦点窗口, 但该元素不会处理该单击. 可以说, 我们根本不应该发送信息; 相反, 我们应该使用 SendInput() API, 但该API明确要求窗口具有焦点. WebDriver项目有两个相互冲突的目标.

首先, 我们争取尽可能地模仿用户. 这意味着使用原生事件, 而不是使用JavaScript模拟事件.

其次, 我们不希望浏览器窗口的焦点被自动化. 这意味着仅将浏览器窗口强制到前台是次优的. .

另一个考虑因素是 多个IE实例在多个WebDriver实例下运行的可能性, 这意味着任何此类“将窗口置于前台”解决方案 都必须包装某种同步结构 (互斥体?)到 IE驱动程序的C++代码中. 即使如此, 若用户将另一个窗口带到前台的操作, 介于驱动程序将IE带到前台和执行原生事件之间, 则此代码仍将受制于竞争条件.

围绕驱动程序的需求, 以及如何对这两个相互冲突的目标进行优先排序的讨论正在进行中. 当前流行的观点是前者优先于后者, 并在使用IE驱动程序时, 将您的机器标记为无法执行其他任务. 然而, 这一决定远未最终确定, 实现这一决定的代码可能相当复杂.

悬停在元素上

当您尝试将鼠标悬停在元素上, 并且物理鼠标光标位于IE浏览器窗口的边界内时, 鼠标悬停将不起作用. 更具体地说, 悬停将在几分之一秒内工作, 然后元素将恢复到其以前的状态. 出现这种情况的普遍理论是, IE在其事件循环期间正在进行某种类型的命中测试, 这导致它在物理光标位于窗口边界内时响应物理鼠标位置. WebDriver开发团队仍未找到解决IE这种行为的方法.

点击元素选项或提交表单以及弹窗

IE驱动程序在两个地方不使用原生事件与元素交互. 这是在 <select> 元素中单击 <option> 元素. 在正常情况下, IE驱动程序根据元素的位置和大小计算单击位置, 通常由JavaScript getBoundingClientRect() 方法返回. 但是, 对于 <option> 元素, getBoundingClientRect() 返回一个位置为零、大小为零的矩形. IE驱动程序通过使用 click() 来处理这一场景, 它本质上设置了元素 .selected 的属性, 并在JavaScript中模拟 onChange 事件. 但是, 这意味着如果 <select> 元素的 onChange 事件 包含调用 alert(), confirm()prompt()的JavaScript代码, 则调用WebElement的 click() 方法将挂起, 直到手动取消模式对话框. 对于此行为, 仅使用WebDriver代码没有已知的解决方法.

类似地, 在某些情况下, 通过WebElement的submit()方法提交HTML表单 可能会产生相同的效果. 如果驱动程序调用表单上的JavaScriptsubmit()函数, 并且有一个onSubmit事件处理程序调用 JavaScript alert(), confirm()prompt()函数, 则可能发生这种情况.

该限制被记录为3508号问题 (谷歌代码).

InternetExplorerDriver多实例

随着 IEDriverServer.exe的创建, 应该可以同时创建和使用InternetExplorerDriver的多个实例. 但是, 此功能基本上未经测试, 并且可能存在cookie、窗口焦点等问题. 如果尝试使用IE驱动程序的多个实例, 并遇到这些问题, 请考虑使用RemoteWebDriver 和虚拟机.

对于InternetExplorer的多个实例之间 共享的Cookie (和其他会话项) 问题, 有两种解决方案.

首先是在私有模式下启动InternetExplorer. 在此之后, InternetExplorer将使用干净的会话数据启动, 并且在退出时不会保存更改的会话数据. 为此, 您需要向驱动程序传递2个特定功能: 具有 true 值的 ie.forceCreateProcessApi 和具有 -private 值的 ie.browserCommandLineSwitches . 请注意, 它仅适用于InternetExplorer 8及更高版本, Windows注册表HKLM_CURRENT_USER\\Software\\Microsoft\\Internet Explorer\\Main 路径应包含值为 0TabProcGrowth 键.

第二个是在InternetExplorer启动期间清理会话. 为此, 您需要将具有 true 值的 特定 ie.ensureCleanSession 功能传递给驱动程序. 这将清除InternetExplorer所有运行实例的缓存, 包括手动启动的实例.

远程远行 IEDriverServer.exe

IEDriverServer.exe 启动的HTTP服务器, 将访问控制列表设置为仅接受来自本地计算机的连接, 并禁止来自远程计算机的传入连接. 目前, 如果不修改IEDriverServer.exe的源代码, 这一点是无法更改的. 要在远程计算机上运行Internet Explorer驱动程序, 请将Java独立远程服务器 与语言绑定的等效RemoteWebDriver结合使用.

按照Windows服务运行 IEDriverServer.exe

明确不支持尝试使用IEDriverServer.exe 作为Windows服务应用程序的一部分. 服务进程及其衍生的流程的需求 与在常规用户上下文中执行的流程有很大不同. IEDriverServer.exe 在该环境中未经测试, 并且包括明确地禁止在服务进程中使用的Windows API调用. 虽然可以让IE驱动程序在服务进程下运行, 但在该环境中遇到问题的用户需要寻求自己的解决方案.

4.1 - Internet Explorer Driver Internals

More detailed information on the IE Driver.

Client Code Into the Driver

We use the W3C WebDriver protocol to communicate with a local instance of an HTTP server. This greatly simplifies the implementation of the language-specific code, and minimzes the number of entry points into the C++ DLL that must be called using a native-code interop technology such as JNA, ctypes, pinvoke or DL.

Memory Management

The IE driver utilizes the Active Template Library (ATL) to take advantage of its implementation of smart pointers to COM objects. This makes reference counting and cleanup of COM objects much easier.

Why Do We Require Protected Mode Settings Changes?

IE 7 on Windows Vista introduced the concept of Protected Mode, which allows for some measure of protection to the underlying Windows OS when browsing. The problem is that when you manipulate an instance of IE via COM, and you navigate to a page that would cause a transition into or out of Protected Mode, IE requires that another browser session be created. This will orphan the COM object of the previous session, not allowing you to control it any longer.

In IE 7, this will usually manifest itself as a new top-level browser window; in IE 8, a new IExplore.exe process will be created, but it will usually (not always!) seamlessly attach it to the existing IE top-level frame window. Any browser automation framework that drives IE externally (as opposed to using a WebBrowser control) will run into these problems.

In order to work around that problem, we dictate that to work with IE, all zones must have the same Protected Mode setting. As long as it’s on for all zones, or off for all zones, we can prevent the transistions to different Protected Mode zones that would invalidate our browser object. It also allows users to continue to run with UAC turned on, and to run securely in the browser if they set Protected Mode “on” for all zones.

In earlier releases of the IE driver, if the user’s Protected Mode settings were not correctly set, we would launch IE, and the process would simply hang until the HTTP request timed out. This was suboptimal, as it gave no indication what needed to be set. Erring on the side of caution, we do not modify the user’s Protected Mode settings. Current versions, however check that the Protected Mode settings are properly set, and will return an error response if they are not.

Keyboard and Mouse Input

Key files: interactions.cpp

There are two ways that we could simulate keyboard and mouse input. The first way, which is used in parts of webdriver, is to synthesize events on the DOM. This has a number of drawbacks, since each browser (and version of a browser) has its own unique quirks; to model each of these is a demanding task, and impossible to get completely right (for example, it’s hard to tell what window.selection should be and this is a read-only property on some browsers) The alternative approach is to synthesize keyboard and mouse input at the OS level, ideally without stealing focus from the user (who tends to be doing other things on their computer as long-running webdriver tests run)

The code for doing this is in interactions.cpp The key thing to note here is that we use PostMessages to push window events on to the message queue of the IE instance. Typing, in particular, is interesting: we only send the “keydown” and “keyup” messages. The “keypress” event is created if necessary by IE’s internal event processing. Because the key press event is not always generated (for example, not every character is printable, and if the default event bubbling is cancelled, listeners don’t see the key press event) we send a “probe” event in after the key down. Once we see that this has been processed, we know that the key press event is on the stack of events to be processed, and that it is safe to send the key up event. If this was not done, it is possible for events to fire in the wrong order, which is definitely sub-optimal.

Working On the InternetExplorerDriver

Currently, there are tests that will run for the InternetExplorerDriver in all languages (Java, C#, Python, and Ruby), so you should be able to test your changes to the native code no matter what language you’re comfortable working in from the client side. For working on the C++ code, you’ll need Visual Studio 2010 Professional or higher. Unfortunately, the C++ code of the driver uses ATL to ease the pain of working with COM objects, and ATL is not supplied with Visual C++ 2010 Express Edition. If you’re using Eclipse, the process for making and testing modifications is:

  1. Edit the C++ code in VS.
  2. Build the code to ensure that it compiles
  3. Do a complete rebuild when you are ready to run a test. This will cause the created DLL to be copied to the right place to allow its use in Eclipse
  4. Load Eclipse (or some other IDE, such as Idea)
  5. Edit the SingleTestSuite so that it is usingDriver(IE)
  6. Create a JUnit run configuration that uses the “webdriver-internet-explorer” project. If you don’t do this, the test won’t work at all, and there will be a somewhat cryptic error message on the console.

Once the basic setup is done, you can start working on the code pretty quickly. You can attach to the process you execute your code from using Visual Studio (from the Debug menu, select Attach to Process…).

5 - Selenium IDE

Selenium IDE是一个记录和回放用户操作的浏览器扩展.

Selenium的集成开发环境 (Selenium IDE) 是一个易于使用的浏览器扩展, 利用既定的Selenium命令记录用户的浏览器行为, 参数由每个元素的上下文定义. 它提供了一种学习Selenium语法的极好方法. 其适用于谷歌Chrome、Mozilla火狐以及微软Edge浏览器.

有关更多信息, 请访问完整的 Selenium IDE 文档

6 - Selenium Manager (Beta)

Selenium Manager is a command-line tool implemented in Rust that provides automated driver and browser management for Selenium. Selenium bindings use this tool by default, so you do not need to download it or add anything to your code or do anything else to use it.

Motivation

TL;DR: Selenium Manager is the official driver manager of the Selenium project, and it is shipped out of the box with every Selenium release.

Selenium uses the native support implemented by each browser to carry out the automation process. For this reason, Selenium users need to place a component called driver (chromedriver, geckodriver, msedgedriver, etc.) between the script using the Selenium API and the browser. For many years, managing these drivers was a manual process for Selenium users. This way, they had to download the required driver for a browser (chromedriver for Chrome, geckodriver for Firefox, etc.) and place it in the PATH or export the driver path as a system property (Java, JavaScript, etc.). But this process was cumbersome and led to maintainability issues.

Let’s consider an example. Imagine you manually downloaded the required chromedriver for driving your Chrome with Selenium. When you did this process, the stable version of Chrome was 113, so you downloaded chromedriver 113 and put it in your PATH. At that moment, your Selenium script executed correctly. But the problem is that Chrome is evergreen. This name refers to Chrome’s ability to upgrade automatically and silently to the next stable version when available. This feature is excellent for end-users but potentially dangerous for browser automation. Let’s go back to the example to discover it. Your local Chrome eventually updates to version 115. And that moment, your Selenium script is broken due to the incompatibility between the manually downloaded driver (113) and the Chrome version (115). Thus, your Selenium script fails with the following error message: “session not created: This version of ChromeDriver only supports Chrome version 113”.

This problem is the primary reason for the existence of the so-called driver managers (such as WebDriverManager for Java, webdriver-manager for Python, webdriver-manager for JavaScript, WebDriverManager.Net for C#, and webdrivers for Ruby). All these projects were an inspiration and a clear sign that the community needed this feature to be built in Selenium. Thus, the Selenium project has created Selenium Manager, the official driver manager for Selenium, shipped out of the box with each Selenium release as of version 4.6.

Usage

TL;DR: Selenium Manager is used by the Selenium bindings when the drivers (chromedriver, geckodriver, etc.) are unavailable.

Driver management through Selenium Manager is opt-in for the Selenium bindings. Thus, users can continue managing their drivers manually (putting the driver in the PATH or using system properties) or rely on a third-party driver manager to do it automatically. Selenium Manager only operates as a fallback: if no driver is provided, Selenium Manager will come to the rescue.

Selenium Manager is a CLI (command line interface) tool implemented in Rust to allow cross-platform execution and compiled for Windows, Linux, and macOS. The Selenium Manager binaries are shipped with each Selenium release. This way, each Selenium binding language invokes Selenium Manager to carry out the automated driver and browser management explained in the following sections.

Automated driver management

TL;DR: Selenium Manager automatically discovers, downloads, and caches the drivers required by Selenium when these drivers are unavailable.

The primary feature of Selenium Manager is called automated driver management. Let’s consider an example to understand it. Suppose we want to driver Chrome with Selenium (see the doc about how to start a session with Selenium). Before the session begins, and when the driver is unavailable, Selenium Manager manages chromedriver for us. We use the term management for this feature (and not just download) since this process is broader and implies different steps:

  1. Browser version discovery. Selenium Manager discovers the browser version (e.g., Chrome, Firefox, Edge) installed in the machine that executes Selenium. This step uses shell commands (e.g., google-chrome --version).
  2. Driver version discovery. With the discovered browser version, the proper driver version is resolved. For this step, the online metadata/endpoints maintained by the browser vendors (e.g., chromedriver, geckodriver, or msedgedriver) are used.
  3. Driver download. The driver URL is obtained with the resolved driver version; with that URL, the driver artifact is downloaded, uncompressed, and stored locally.
  4. Driver cache. Uncompressed driver binaries are stored in a local cache folder (~/.cache/selenium). The next time the same driver is required, it will be used from there if the driver is already in the cache.

Automated browser management

TL;DR: Selenium Manager automatically discovers, downloads, and caches the browsers driven with Selenium (Chrome, Firefox, and Edge) when these browsers are not installed in the local system.

As of Selenium 4.11.0, Selenium Manager also implements automated browser management. With this feature, Selenium Manager allows us to discover, download, and cache the different browser releases, making them seamlessly available for Selenium. Internally, Selenium Manager uses an equivalent management procedure explained in the section before, but this time, for browser releases.

The browser automatically managed by Selenium Manager are:

Let’s consider again the typical example of driving Chrome with Selenium. And this time, suppose Chrome is not installed on the local machine when starting a new session). In that case, the current stable CfT release will be discovered, downloaded, and cached (in ~/.cache/selenium/chrome) by Selenium Manager.

But there is more. In addition to the stable browser version, Selenium Manager also allows downloading older browser versions (in the case of CfT, starting in version 113, the first version published as CfT). To set a browser version with Selenium, we use a browser option called browserVersion.

Let’s consider another simple example. Suppose we set browserVersion to 114 using Chrome options. In this case, Selenium Manager will check if Chrome 114 is already installed. If it is, it will be used. If not, Selenium Manager will manage (i.e., discover, download, and cache) CfT 114. And in either case, the chromedriver is also managed. Finally, Selenium will start Chrome to be driven programmatically, as usual.

But there is even more. In addition to fixed browser versions (e.g., 113, 114, 115, etc.), we can use the following labels for browserVersion:

  • stable: Current CfT version.
  • beta: Next version to stable.
  • dev: Version in development at this moment.
  • canary: Nightly build for developers.
  • esr: Extended Support Release (only for Firefox).

When these labels are specified, Selenium Manager first checks if a given browser is already installed (beta, dev, etc.), and when it is not detected, the browser is automatically managed.

Edge in Windows

Automated Edge management by Selenium Manager in Windows is different from other browsers. Both Chrome and Firefox (and Edge in macOS and Linux) are downloaded automatically to the local cache (~/.cache/selenium) by Selenium Manager. Nevertheless, the same cannot be done for Edge in Windows. The reason is that the Edge installer for Windows is distributed as a Microsoft Installer (MSI) file, designed to be executed with administrator rights. This way, when Edge is attempted to be installed with Selenium Manager in Windows with a non-administrator session, a warning message will be displayed by Selenium Manager as follows:

edge can only be installed in Windows with administrator permissions

Therefore, administrator permissions are required to install Edge in Windows automatically through Selenium Manager, and Edge is eventually installed in the usual program files folder (e.g., C:\Program Files (x86)\Microsoft\Edge).

Configuration

TL;DR: Selenium Manager should work silently and transparently for most users. Nevertheless, there are scenarios (e.g., to specify a custom cache path or setup globally a proxy) where custom configuration can be required.

Selenium Manager is a CLI tool. Therefore, under the hood, the Selenium bindings call Selenium Manager by invoking shell commands. Like any other CLI tool, arguments can be used to specify specific capabilities in Selenium Manager. The different arguments supported by Selenium Manager can be checked by running the following command:

$ ./selenium-manager --help

In addition to CLI arguments, Selenium Manager allows two additional mechanisms for configuration:

  • Configuration file. Selenium Manager uses a file called se-config.toml located in the Selenium cache (by default, at ~/.cache/selenium) for custom configuration values. This TOML file contains a key-value collection used for custom configuration.
  • Environmental variables. Each configuration key has its equivalence in environmental variables by converting each key name to uppercase, replacing the dash symbol (-) with an underscore (_), and adding the prefix SE_.

The configuration file is honored by Selenium Manager when it is present, and the corresponding CLI parameter is not specified. Besides, the environmental variables are used when neither of the previous options (CLI arguments and configuration file) is specified. In other words, the order of preference for Selenium Manager custom configuration is as follows:

  1. CLI arguments.
  2. Configuration file.
  3. Environment variables.

Notice that the Selenium bindings use the CLI arguments to specify configuration values, which in turn, are defined in each binding using browser options.

The following table summarizes all the supported arguments supported by Selenium Manager and their correspondence key in the configuration file and environment variables.

CLI argumentConfiguration fileEnv variableDescription
--browser BROWSERbrowser = "BROWSER"SE_BROWSER=BROWSERBrowser name: chrome, firefox, edge, iexplorer, safari, safaritp, or webview2
--driver <DRIVER>driver = "DRIVER"SE_DRIVER=DRIVERDriver name: chromedriver, geckodriver, msedgedriver, IEDriverServer, or safaridriver
--browser-version <BROWSER_VERSION>browser-version = "BROWSER_VERSION"SE_BROWSER_VERSION=BROWSER_VERSIONMajor browser version (e.g., 105, 106, etc. Also: beta, dev, canary -or nightly-, and esr -in Firefox- are accepted)
--driver-version <DRIVER_VERSION>driver-version = "DRIVER_VERSION"SE_DRIVER_VERSION=DRIVER_VERSIONDriver version (e.g., 106.0.5249.61, 0.31.0, etc.)
--browser-path <BROWSER_PATH>browser-path = "BROWSER_PATH"SE_BROWSER_PATH=BROWSER_PATHBrowser path (absolute) for browser version detection (e.g., /usr/bin/google-chrome, /Applications/Google Chrome.app/Contents/MacOS/Google Chrome, C:\Program Files\Google\Chrome\Application\chrome.exe)
--driver-mirror-url <DRIVER_MIRROR_URL>driver-mirror-url = "DRIVER_MIRROR_URL"SE_DRIVER_MIRROR_URL=DRIVER_MIRROR_URLMirror URL for driver repositories
--browser-mirror-url <BROWSER_MIRROR_URL>browser-mirror-url = "BROWSER_MIRROR_URL"SE_BROWSER_MIRROR_URL=BROWSER_MIRROR_URLMirror URL for browser repositories
--output <OUTPUT>output = "OUTPUT"SE_OUTPUT=OUTPUTOutput type: LOGGER (using INFO, WARN, etc.), JSON (custom JSON notation), or SHELL (Unix-like). Default: LOGGER
--os <OS>os = "OS"SE_OS=OSOperating system for drivers and browsers (i.e., windows, linux, or macos)
--arch <ARCH>arch = "ARCH"SE_ARCH=ARCHSystem architecture for drivers and browsers (i.e., x32, x64, or arm64)
--proxy <PROXY>proxy = "PROXY"SE_PROXY=PROXYHTTP proxy for network connection (e.g., myproxy:port, myuser:mypass@myproxy:port)
--timeout <TIMEOUT>timeout = TIMEOUTSE_TIMEOUT=TIMEOUTTimeout for network requests (in seconds). Default: 300
--offlineoffline = trueSE_OFFLINE=trueOffline mode (i.e., disabling network requests and downloads)
--force-browser-downloadforce-browser-download = trueSE_FORCE_BROWSER_DOWNLOAD=trueForce to download browser, e.g., when a browser is already installed in the system, but you want Selenium Manager to download and use it
--avoid-browser-downloadavoid-browser-download = trueSE_AVOID_BROWSER_DOWNLOAD=trueAvoid to download browser, e.g., when a browser is supposed to be downloaded by Selenium Manager, but you prefer to avoid it
--debugdebug = trueSE_DEBUG=trueDisplay DEBUG messages
--tracetrace = trueSE_TRACE=trueDisplay TRACE messages
--cache-path <CACHE_PATH>cache-path="CACHE_PATH"SE_CACHE_PATH=CACHE_PATHLocal folder used to store downloaded assets (drivers and browsers), local metadata, and configuration file. See next section for details. Default: ~/.cache/selenium. For Windows paths in the TOML configuration file, double backslashes are required (e.g., C:\\custom\\cache).
--ttl <TTL>ttl = TTLSE_TTL=TTLTime-to-live in seconds. See next section for details. Default: 3600 (1 hour)

In addition to the configuration keys specified in the table before, there are some special cases, namely:

  • Browser version. In addition to browser-version, we can use the specific configuration keys to specify custom versions per supported browser. This way, the keys chrome-version, firefox-version, edge-version, etc., are supported. The same applies to environment variables (i.e., SE_CHROME_VERSION, SE_FIREFOX_VERSION, SE_EDGE_VERSION, etc.).
  • Driver version. Following the same pattern, we can use chromedriver-version, geckodriver-version, msedgedriver-version, etc. (in the configuration file), and SE_CHROMEDRIVER_VERSION, SE_GECKODRIVER_VERSION, SE_MSEDGEDRIVER_VERSION, etc. (as environment variables).
  • Browser path. Following the same pattern, we can use chrome-path, firefox-path, edge-path, etc. (in the configuration file), and SE_CHROME_PATH, SE_FIREFOX_PATH, SE_EDGE_PATH, etc. (as environment variables). The Selenium bindings also allow to specify a custom location of the browser path using options, namely: Chrome), Edge, or Firefox.
  • Driver mirror. Following the same pattern, we can use chromedriver-mirror-url, geckodriver-mirror-url, msedgedriver-mirror-url, etc. (in the configuration file), and SE_CHROMEDRIVER_MIRROR_URL, SE_GECKODRIVER_MIRROR_URL, SE_MSEDGEDRIVER_MIRROR_URL, etc. (as environment variables).
  • Browser mirror. Following the same pattern, we can use chrome-mirror-url, firefox-mirror-url, edge-mirror-url, etc. (in the configuration file), and SE_CHROME_MIRROR_URL, SE_FIREFOX_MIRROR_URL, SE_EDGE_MIRROR_URL, etc. (as environment variables).

Caching

TL;DR: The drivers and browsers managed by Selenium Manager are stored in a local folder (~/.cache/selenium).

The cache in Selenium Manager is a local folder (~/.cache/selenium by default) in which the downloaded assets (drivers and browsers) are stored. For the sake of performance, when a driver or browser is already in the cache (i.e., there is a cache hint), Selenium Manager uses it from there.

In addition to the downloaded drivers and browsers, two additional files live in the cache’s root:

  • Configuration file (se-config.toml). This file is optional and, as explained in the previous section, allows to store custom configuration values for Selenium Manager. This file is maintained by the end-user and read by Selenium Manager.
  • Metadata file (se-metadata.json). This file contains versions discovered by Selenium Manger making network requests (e.g., using the CfT JSON endpoints) and the time-to-live (TTL) in which they are valid. Selenium Manager automatically maintains this file.

The TTL in Selenium Manager is inspired by the TTL for DNS, a well-known mechanism that refers to how long some values are cached before they are automatically refreshed. In the case of Selenium Manager, these values are the versions found by making network requests for driver and browser version discovery. By default, the TTL is 3600 seconds (i.e., 1 hour) and can be tuned using configuration values or disabled by setting this configuration value to 0.

The TTL mechanism is a way to improve the overall performance of Selenium. It is based on the fact that the discovered driver and browser versions (e.g., the proper chromedriver version for Chrome 115 is 115.0.5790.170) will likely remain the same in the short term. Therefore, the discovered versions are written in the metadata file and read from there instead of making the same consecutive network request. This way, during the driver version discovery (step 2 of the automated driver management process previously introduced), Selenium Manager first reads the file metadata. When a fresh resolution (i.e., a driver/browser version valid during a TTL) is found, that version is used (saving some time in making a new network request). If not found or the TTL has expired, a network request is made, and the result is stored in the metadata file.

Let’s consider an example. A Selenium binding asks Selenium Manager to resolve chromedriver. Selenium Manager detects that Chrome 115 is installed, so it makes a network request to the CfT endpoints to discover the proper chromedriver version (115.0.5790.170, at that moment). This version is stored in the metadata file and considered valid during the next hour (TTL). If Selenium Manager is asked to resolve chromedriver during that time (which is likely to happen in the execution of a test suite), the chromedriver version is discovered by reading the metadata file instead of making a new request to the CfT endpoints. After one hour, the chromedriver version stored in the cache will be considered as stale, and Selenium Manager will refresh it by making a new network request to the corresponding endpoint.

Selenium Manager includes two additional arguments two handle the cache, namely:

  • --clear-cache: To remove the cache folder.
  • --clear-metadata: To remove the metadata file.

Versioning

Selenium Manager follows the same versioning schema as Selenium. Nevertheless, we use the major version 0 for Selenium Manager releases because it is still in beta. For example, the Selenium Manager binaries shipped with Selenium 4.12.0 corresponds to version 0.4.12.

Getting Selenium Manager

For most users, direct interaction with Selenium Manager is not required since the Selenium bindings use it internally. Nevertheless, if you want to play with Selenium Manager or use it for your use case involving driver or browser management, you can get the Selenium Manager binaries in different ways:

  • From the Selenium repository. The Selenium Manager source code is stored in the main Selenium repo under the folder rust. Moreover, you can find the compiled versions for Windows, Linux, and macOS in the common folder of this repo.
  • From the build workflow. Selenium Manager is compiled using a GitHub Actions workflow. This workflow creates binaries for Windows, Linux, and macOS. You can download these binaries from these workflow executions.
  • From the cache. As of version 4.15.0 of the Selenium Java bindings, the Selenium Manager binary is extracted and copied to the cache folder. For instance, the Selenium Manager binary shipped with Selenium 4.15.0 is stored in the folder ~/.cache/selenium/manager/0.4.15).

Examples

Let’s consider a typical example: we want to manage chromedriver automatically. For that, we invoke Selenium Manager as follows (notice that the flag --debug is optional, but it helps us to understand what Selenium Manager is doing):

$ ./selenium-manager --browser chrome --debug
DEBUG   chromedriver not found in PATH
DEBUG   chrome detected at C:\Program Files\Google\Chrome\Application\chrome.exe
DEBUG   Running command: wmic datafile where name='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe' get Version /value
DEBUG   Output: "\r\r\n\r\r\nVersion=116.0.5845.111\r\r\n\r\r\n\r\r\n\r"
DEBUG   Detected browser: chrome 116.0.5845.111
DEBUG   Discovering versions from https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json
DEBUG   Required driver: chromedriver 116.0.5845.96
DEBUG   Downloading chromedriver 116.0.5845.96 from https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/win64/chromedriver-win64.zip
INFO    Driver path: C:\Users\boni\.cache\selenium\chromedriver\win64\116.0.5845.96\chromedriver.exe
INFO    Browser path: C:\Program Files\Google\Chrome\Application\chrome.exe

In this case, the local Chrome (in Windows) is detected by Selenium Manager. Then, using its version and the CfT endpoints, the proper chromedriver version (115, in this example) is downloaded to the local cache. Finally, Selenium Manager provides two results: i) the driver path (downloaded) and ii) the browser path (local).

Let’s consider another example. Now we want to use Chrome beta. Therefore, we invoke Selenium Manager specifying that version label as follows (notice that the CfT beta is discovered, downloaded, and stored in the local cache):

$ ./selenium-manager --browser chrome --browser-version beta --debug
DEBUG   chromedriver not found in PATH
DEBUG   chrome not found in PATH
DEBUG   chrome beta not found in the system
DEBUG   Discovering versions from https://googlechromelabs.github.io/chrome-for-testing/last-known-good-versions-with-downloads.json
DEBUG   Required browser: chrome 117.0.5938.22
DEBUG   Downloading chrome 117.0.5938.22 from https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/117.0.5938.22/win64/chrome-win64.zip
DEBUG   chrome 117.0.5938.22 has been downloaded at C:\Users\boni\.cache\selenium\chrome\win64\117.0.5938.22\chrome.exe
DEBUG   Discovering versions from https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json
DEBUG   Required driver: chromedriver 117.0.5938.22
DEBUG   Downloading chromedriver 117.0.5938.22 from https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/117.0.5938.22/win64/chromedriver-win64.zip
INFO    Driver path: C:\Users\boni\.cache\selenium\chromedriver\win64\117.0.5938.22\chromedriver.exe
INFO    Browser path: C:\Users\boni\.cache\selenium\chrome\win64\117.0.5938.22\chrome.exe

Selenium Grid

Selenium Manager allows you to configure the drivers automatically when setting up Selenium Grid. To that aim, you need to include the argument --selenium-manager true in the command to start Selenium Grid. For more details, visit the Selenium Grid starting page.

Moreover, Selenium Manager also allows managing Selenium Grid releases automatically. For that, the argument --grid is used as follows:

$ ./selenium-manager --grid

After this command, Selenium Manager discovers the latest version of Selenium Grid, storing the selenium-server.jar in the local cache.

Optionally, the argument --grid allows to specify a Selenium Grid version (--grid <GRID_VERSION>).

Known Limitations

Connectivity issues

Selenium Manager requests remote endpoints (like Chrome for Testing (CfT), among others) to discover and download drivers and browsers from online repositories. When this operation is done in a corporate environment with a proxy or firewall, it might lead to connectivity problems like the following:

error sending request for url (https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json)
error trying to connect: dns error: failed to lookup address information
error trying to connect: An existing connection was forcibly closed by the remote host. (os error 10054)

When that happens, consider the following solutions:

  • Use the proxy capabilities of Selenium (see documentation). Alternatively, use the environment variable SE_PROXY to set the proxy URL or use the configuration file (see configuration).
  • Review your network setup to enable the remote requests and downloads required by Selenium Manager.

Custom package managers

If you are using a Linux package manager (Anaconda, snap, etc) that requires a specific driver be used for your browsers, you’ll need to either specify the driver location, the browser location, or both, depending on the requirements.

Alternative architectures

Selenium supports all five architectures managed by Google’s Chrome for Testing, and all six drivers provided for Microsoft Edge.

Each release of the Selenium bindings comes with three separate Selenium Manager binaries — one for Linux, Windows, and Mac.

  • The Mac version supports both x64 and aarch64 (Intel and Apple).
  • The Windows version should work for both x86 and x64 (32-bit and 64-bit OS).
  • The Linux version has only been verified to work for x64.

Reasons for not supporting more architectures:

  1. Neither Chrome for Testing nor Microsoft Edge supports additional architectures, so Selenium Manager would need to manage something unofficial for it to work.
  2. We currently build the binaries from existing GitHub actions runners, which do not support these architectures
  3. Any additional architectures would get distributed with all Selenium releases, increasing the total build size

If you are running Linux on arm64/aarch64, 32-bit architecture, or a Raspberry Pi, Selenium Manager will not work for you. The biggest issue for people is that they used to get custom-built drivers and put them on PATH and have them work. Now that Selenium Manager is responsible for locating drivers on PATH, this approach no longer works, and users need to use a Service class and set the location directly. There are a number of advantages to having Selenium Manager look for drivers on PATH instead of managing that logic in each of the bindings, so that’s currently a trade-off we are comfortable with.

However, as of Selenium 4.13.0, the Selenium bindings allow locating the Selenium Manager binary using an environment variable called SE_MANAGER_PATH. If this variable is set, the bindings will use its value as the Selenium Manager path in the local filesystem. This feature will allow users to provide a custom compilation of Selenium Manager, for instance, if the default binaries (compiled for Windows, Linux, and macOS) are incompatible with a given system (e.g., ARM64 in Linux).

Browser dependencies

When automatically managing browsers in Linux, Selenium Manager relies on the releases published by the browser vendors (i.e., Chrome, Firefox, and Edge). These releases are portable in most cases. Nevertheless, there might be cases in which existing libraries are required. In Linux, this problem might be experienced when trying to run Firefox, e.g., as follows:

libdbus-glib-1.so.2: cannot open shared object file: No such file or directory
Couldn't load XPCOM.

If that happens, the solution is to install that library, for instance, as follows:

sudo apt-get install libdbus-glib-1-2

A similar issue might happen when trying to execute Chrome for Testing in Linux:

error while loading shared libraries: libatk-1.0.so.0: cannot open shared object file: No such file or directory

In this case, the library to be installed is the following:

sudo apt-get install libatk-bridge2.0-0

Roadmap

You can trace the work in progress in the Selenium Manager project dashboard. Moreover, you can check the new features shipped with each Selenium Manager release in its changelog file.

7 - 鼓励的行为

Selenium项目的一些测试指南和建议.

关于"最佳实践"的注解:我们有意在本文档中避免使用"最佳实践"的说辞. 没有一种方法可以适用于所有情况. 我们更喜欢"指南和建议"的想法. 我们鼓励您通读这些内容, 并仔细地确定哪种方法适用于您的特定环境.

由于许多原因, 功能测试很难正确完成. 即便应用程序的状态, 复杂性, 依赖还不够让测试变得 足够复杂, 操作浏览器(特别是跨浏览器的兼容性测试)就已经使得写一个好的测试变成一种挑战.

Selenium提供了一些工具使得功能测试用户更简单的操作浏览器, 但是这些工具并不能帮助你来写一个好的 架构的测试套件. 这章我们会针对怎么来做web页面的功能测试的自动化给出一些忠告, 指南和建议.

这章记录了很多历年来成功的使用Selenium的用户的常用的软件设计模式.

7.1 - 测试自动化概述

首先,问问自己是否真的需要使用浏览器。 在某些情况下,如果您正在开发一个复杂的 web 应用程序, 您需要打开一个浏览器并进行实际测试,这种可能性是很大的。

然而,诸如 Selenium 之类的功能性最终用户测试运行起来很昂贵。 此外,它们通常需要大量的基础设施才能有效运行。 经常问问自己,您想要测试的东西是否可以使用更轻量级的测试方法(如单元测试)完成, 还是使用较低级的方法完成,这是一个很好的规则。

一旦确定您正在进行Web浏览器测试业务, 并且您的 Selenium 环境已经准备好开始编写测试, 您通常会执行以下三个步骤的组合:

  • 设置数据
  • 执行一组离散的操作
  • 评估结果

您需要尽可能缩短这些步骤; 一到两个操作在大多数时间内应该足够了。 浏览器自动化具有“脆弱”的美誉, 但实际上那是因为用户经常对它要求过高。 在后面的章节中,我们将回到您可以使用的技术, 为了缓解测试中明显的间歇性问题, 特别是如何克服 浏览器 和 WebDriver 之间的竞争条件

通过保持测试简短并仅在您完全没有替代方案时使用Web浏览器,您可以用最小的代码片段来完成很多测试。

Selenium测试的一个显著优势是,它能够从用户的角度测试应用程序的所有组件(从后端到前端)。 因此,换句话说,虽然功能测试运行起来可能很昂贵,但它们同时也包含了大量关键业务部分。

测试要求

如前所述,Selenium 测试运行起来可能很昂贵。 在多大程度上取决于您正在运行测试的浏览器, 但历史上浏览器的行为变化太大,以至于通常是针对多个浏览器进行交叉测试的既定目标。

Selenium 允许您在多个操作系统上的多个浏览器上运行相同的指令, 但是对所有可能的浏览器、它们的不同版本以及它们所运行的许多操作系统的枚举将很快成为一项繁重的工作。

让我们从一个例子开始

Larry 写了一个网站,允许用户订购他们自己定制的独角兽。

一般的工作流程(我们称之为“幸福之路”)是这样的:

  • 创建一个账户
  • 配置他们的独角兽
  • 添加到购物车
  • 检验并付款
  • 给出关于他们独角兽的反馈

编写一个宏大的 Selenium 脚本来执行所有这些操作是很诱人的 — 很多人都会尝试这样做。 抵制诱惑! 这样做会导致测试: a) 需要很长时间; b) 会受到一些与页面呈现时间问题有关的常见问题的影响; c) 如果失败,它不会给出一个简洁的、“可检查”的方法来诊断出了什么问题。

测试此场景的首选策略是将其分解为一系列独立的、快速的测试,每个测试都有一个存在的“理由”。

假设您想测试第二步: 配置您的独角兽。 它将执行以下操作:

  • 创建一个帐户
  • 配置一个独角兽

请注意,我们跳过了这些步骤的其余部分, 在完成这一步之后,我们将在其他小的、离散的测试用例中测试工作流的其余部分。

首先,您需要创建一个帐户。在这里您可以做出一些选择:

  • 您想使用现有帐户吗?
  • 您想创建一个新帐户吗?
  • 在配置开始之前,是否需要考虑有任何特殊属性的用户需要吗?

不管您如何回答这个问题, 解决方案是让它成为测试中“设置数据”部分的一部分 - 如果 Larry 公开了一个 API, 使您(或任何人)能够创建和更新用户帐户, 一定要用它来回答这个问题 请确保使用这个 API 来回答这个问题 — 如果可能的话, 您希望只有在您拥有一个用户之后才启动浏览器,您可以使用该用户的凭证进行登录。

如果每个工作流的每个测试都是从创建用户帐户开始的,那么每个测试的执行都会增加许多秒。 调用 API 并与数据库进行通信是快速、“无头”的操作, 不需要打开浏览器、导航到正确页面、点击并等待表单提交等昂贵的过程。

理想情况下,您可以在一行代码中处理这个设置阶段,这些代码将在任何浏览器启动之前执行:

// Create a user who has read-only permissions--they can configure a unicorn,
// but they do not have payment information set up, nor do they have
// administrative privileges. At the time the user is created, its email
// address and password are randomly generated--you don't even need to
// know them.
User user = UserFactory.createCommonUser(); //This method is defined elsewhere.

// Log in as this user.
// Logging in on this site takes you to your personal "My Account" page, so the
// AccountPage object is returned by the loginAs method, allowing you to then
// perform actions from the AccountPage.
AccountPage accountPage = loginAs(user.getEmail(), user.getPassword());
  
# Create a user who has read-only permissions--they can configure a unicorn,
# but they do not have payment information set up, nor do they have
# administrative privileges. At the time the user is created, its email
# address and password are randomly generated--you don't even need to
# know them.
user = user_factory.create_common_user() #This method is defined elsewhere.

# Log in as this user.
# Logging in on this site takes you to your personal "My Account" page, so the
# AccountPage object is returned by the loginAs method, allowing you to then
# perform actions from the AccountPage.
account_page = login_as(user.get_email(), user.get_password())
  
// Create a user who has read-only permissions--they can configure a unicorn,
// but they do not have payment information set up, nor do they have
// administrative privileges. At the time the user is created, its email
// address and password are randomly generated--you don't even need to
// know them.
User user = UserFactory.CreateCommonUser(); //This method is defined elsewhere.

// Log in as this user.
// Logging in on this site takes you to your personal "My Account" page, so the
// AccountPage object is returned by the loginAs method, allowing you to then
// perform actions from the AccountPage.
AccountPage accountPage = LoginAs(user.Email, user.Password);
  
# Create a user who has read-only permissions--they can configure a unicorn,
# but they do not have payment information set up, nor do they have
# administrative privileges. At the time the user is created, its email
# address and password are randomly generated--you don't even need to
# know them.
user = UserFactory.create_common_user #This method is defined elsewhere.

# Log in as this user.
# Logging in on this site takes you to your personal "My Account" page, so the
# AccountPage object is returned by the loginAs method, allowing you to then
# perform actions from the AccountPage.
account_page = login_as(user.email, user.password)
  
// Create a user who has read-only permissions--they can configure a unicorn,
// but they do not have payment information set up, nor do they have
// administrative privileges. At the time the user is created, its email
// address and password are randomly generated--you don't even need to
// know them.
var user = userFactory.createCommonUser(); //This method is defined elsewhere.

// Log in as this user.
// Logging in on this site takes you to your personal "My Account" page, so the
// AccountPage object is returned by the loginAs method, allowing you to then
// perform actions from the AccountPage.
var accountPage = loginAs(user.email, user.password);
  
// Create a user who has read-only permissions--they can configure a unicorn,
// but they do not have payment information set up, nor do they have
// administrative privileges. At the time the user is created, its email
// address and password are randomly generated--you don't even need to
// know them.
val user = UserFactory.createCommonUser() //This method is defined elsewhere.

// Log in as this user.
// Logging in on this site takes you to your personal "My Account" page, so the
// AccountPage object is returned by the loginAs method, allowing you to then
// perform actions from the AccountPage.
val accountPage = loginAs(user.getEmail(), user.getPassword())
  

您可以想象,UserFactory可以扩展为提供诸如createAdminUser()createUserWithPayment()的方法。 关键是,这两行代码不会分散您对此测试的最终目的的注意力: 配置独角兽。

页面对象模型 的复杂性将在后面的章节中讨论,但我们将在这里介绍这个概念:

您的测试应该由操作组成,从用户的角度出发,在站点的页面上下文中执行。 这些页面被存储为对象, 其中包含关于 web 页面如何组成以及如何执行操作的特定信息 — 作为测试人员,您应该很少关注这些信息。

您想要什么样的独角兽? 您可能想要粉红色,但不一定。 紫色最近很流行。 她需要太阳镜吗? 明星纹身? 这些选择虽然困难,但是作为测试人员, 您的主要关注点是 — 您需要确保您的订单履行中心将正确的独角兽发送给正确的人,而这就要从这些选择开始。

请注意,我们在该段落中没有讨论按钮,字段,下拉菜单,单选按钮或 Web 表单。 您的测试也不应该! 您希望像尝试解决问题的用户一样编写代码。 这是一种方法(从前面的例子继续):

// The Unicorn is a top-level Object--it has attributes, which are set here.
// This only stores the values; it does not fill out any web forms or interact
// with the browser in any way.
Unicorn sparkles = new Unicorn("Sparkles", UnicornColors.PURPLE, UnicornAccessories.SUNGLASSES, UnicornAdornments.STAR_TATTOOS);

// Since we're already "on" the account page, we have to use it to get to the
// actual place where you configure unicorns. Calling the "Add Unicorn" method
// takes us there.
AddUnicornPage addUnicornPage = accountPage.addUnicorn();

// Now that we're on the AddUnicornPage, we will pass the "sparkles" object to
// its createUnicorn() method. This method will take Sparkles' attributes,
// fill out the form, and click submit.
UnicornConfirmationPage unicornConfirmationPage = addUnicornPage.createUnicorn(sparkles);
  
# The Unicorn is a top-level Object--it has attributes, which are set here.
# This only stores the values; it does not fill out any web forms or interact
# with the browser in any way.
sparkles = Unicorn("Sparkles", UnicornColors.PURPLE, UnicornAccessories.SUNGLASSES, UnicornAdornments.STAR_TATTOOS)

# Since we're already "on" the account page, we have to use it to get to the
# actual place where you configure unicorns. Calling the "Add Unicorn" method
# takes us there.
add_unicorn_page = account_page.add_unicorn()

# Now that we're on the AddUnicornPage, we will pass the "sparkles" object to
# its createUnicorn() method. This method will take Sparkles' attributes,
# fill out the form, and click submit.
unicorn_confirmation_page = add_unicorn_page.create_unicorn(sparkles)
  
// The Unicorn is a top-level Object--it has attributes, which are set here. 
// This only stores the values; it does not fill out any web forms or interact
// with the browser in any way.
Unicorn sparkles = new Unicorn("Sparkles", UnicornColors.Purple, UnicornAccessories.Sunglasses, UnicornAdornments.StarTattoos);

// Since we are already "on" the account page, we have to use it to get to the
// actual place where you configure unicorns. Calling the "Add Unicorn" method
// takes us there.
AddUnicornPage addUnicornPage = accountPage.AddUnicorn();

// Now that we're on the AddUnicornPage, we will pass the "sparkles" object to
// its createUnicorn() method. This method will take Sparkles' attributes,
// fill out the form, and click submit.
UnicornConfirmationPage unicornConfirmationPage = addUnicornPage.CreateUnicorn(sparkles);
  
# The Unicorn is a top-level Object--it has attributes, which are set here.
# This only stores the values; it does not fill out any web forms or interact
# with the browser in any way.
sparkles = Unicorn.new('Sparkles', UnicornColors.PURPLE, UnicornAccessories.SUNGLASSES, UnicornAdornments.STAR_TATTOOS)

# Since we're already "on" the account page, we have to use it to get to the
# actual place where you configure unicorns. Calling the "Add Unicorn" method
# takes us there.
add_unicorn_page = account_page.add_unicorn

# Now that we're on the AddUnicornPage, we will pass the "sparkles" object to
# its createUnicorn() method. This method will take Sparkles' attributes,
# fill out the form, and click submit.
unicorn_confirmation_page = add_unicorn_page.create_unicorn(sparkles)
  
// The Unicorn is a top-level Object--it has attributes, which are set here.
// This only stores the values; it does not fill out any web forms or interact
// with the browser in any way.
var sparkles = new Unicorn("Sparkles", UnicornColors.PURPLE, UnicornAccessories.SUNGLASSES, UnicornAdornments.STAR_TATTOOS);

// Since we are already "on" the account page, we have to use it to get to the
// actual place where you configure unicorns. Calling the "Add Unicorn" method
// takes us there.

var addUnicornPage = accountPage.addUnicorn();

// Now that we're on the AddUnicornPage, we will pass the "sparkles" object to
// its createUnicorn() method. This method will take Sparkles' attributes,
// fill out the form, and click submit.
var unicornConfirmationPage = addUnicornPage.createUnicorn(sparkles);

  
// The Unicorn is a top-level Object--it has attributes, which are set here. 
// This only stores the values; it does not fill out any web forms or interact
// with the browser in any way.
val sparkles = Unicorn("Sparkles", UnicornColors.PURPLE, UnicornAccessories.SUNGLASSES, UnicornAdornments.STAR_TATTOOS)

// Since we are already "on" the account page, we have to use it to get to the
// actual place where you configure unicorns. Calling the "Add Unicorn" method
// takes us there.
val addUnicornPage = accountPage.addUnicorn()

// Now that we're on the AddUnicornPage, we will pass the "sparkles" object to
// its createUnicorn() method. This method will take Sparkles' attributes,
// fill out the form, and click submit.
unicornConfirmationPage = addUnicornPage.createUnicorn(sparkles)

  

既然您已经配置好了独角兽, 您需要进入第三步:确保它确实有效。

// The exists() method from UnicornConfirmationPage will take the Sparkles
// object--a specification of the attributes you want to see, and compare
// them with the fields on the page.
Assert.assertTrue("Sparkles should have been created, with all attributes intact", unicornConfirmationPage.exists(sparkles));
  
# The exists() method from UnicornConfirmationPage will take the Sparkles
# object--a specification of the attributes you want to see, and compare
# them with the fields on the page.
assert unicorn_confirmation_page.exists(sparkles), "Sparkles should have been created, with all attributes intact"
  
// The exists() method from UnicornConfirmationPage will take the Sparkles 
// object--a specification of the attributes you want to see, and compare
// them with the fields on the page.
Assert.True(unicornConfirmationPage.Exists(sparkles), "Sparkles should have been created, with all attributes intact");
  
# The exists() method from UnicornConfirmationPage will take the Sparkles
# object--a specification of the attributes you want to see, and compare
# them with the fields on the page.
expect(unicorn_confirmation_page.exists?(sparkles)).to be, 'Sparkles should have been created, with all attributes intact'
  
// The exists() method from UnicornConfirmationPage will take the Sparkles
// object--a specification of the attributes you want to see, and compare
// them with the fields on the page.
assert(unicornConfirmationPage.exists(sparkles), "Sparkles should have been created, with all attributes intact");

  
// The exists() method from UnicornConfirmationPage will take the Sparkles 
// object--a specification of the attributes you want to see, and compare
// them with the fields on the page.
assertTrue("Sparkles should have been created, with all attributes intact", unicornConfirmationPage.exists(sparkles))
  

请注意,测试人员在这段代码中除了谈论独角兽之外还没有做任何事情 — 没有按钮、定位器和浏览器控件。 这种对应用程序建模的方法允许您保持这些测试级别的命令不变, 即使 Larry 下周决定不再喜欢 Ruby-on-Rails, 并决定用最新的带有 Fortran 前端的 Haskell 绑定重新实现整个站点。

为了符合站点的重新设计,您的页面对象需要进行一些小的维护,但是这些测试将保持不变。 采用这一基本设计,您将希望继续使用尽可能少的面向浏览器的步骤来完成您的工作流。 您的下一个工作流程将包括在购物车中添加独角兽。 您可能需要多次迭代此测试,以确保购物车正确地保持其状态: 在开始之前,购物车中是否有多个独角兽? 购物车能装多少? 如果您创建多个具有相同名称或特性,它会崩溃吗? 它将只保留现有的一个还是添加另一个?

每次通过工作流时,您都希望尽量避免创建帐户、以用户身份登录和配置独角兽。 理想情况下,您将能够创建一个帐户,并通过 API 或数据库预先配置独角兽。 然后,您只需作为用户登录,找到 Sparkles,并将它添加到购物车中。

是否自动化?

自动化总是有优势吗? 什么时候应该决定去自动化测试用例?

自动化测试用例并不总是有利的. 有时候手动测试可能更合适. 例如,如果应用程序的用户界面,在不久的将来会发生很大变化,那么任何自动化都可能需要重写. 此外,有时根本没有足够的时间来构建自动化测试. 从短期来看,手动测试可能更有效. 如果应用程序的截止日期非常紧迫,当前没有可用的自动化测试,并且必须在特定时间范围内完成,那么手动测试是最好的解决方案.

7.2 - 设计模式和开发策略

Most of the documentation found in this section is still in English. Please note we are not accepting pull requests to translate this content as translating documentation of legacy components does not add value to the community nor the project.

(previously located: https://github.com/SeleniumHQ/selenium/wiki/Bot-Style-Tests)

Overview

Over time, projects tend to accumulate large numbers of tests. As the total number of tests increases, it becomes harder to make changes to the codebase — a single “simple” change may cause numerous tests to fail, even though the application still works properly. Sometimes these problems are unavoidable, but when they do occur you want to be up and running again as quickly as possible. The following design patterns and strategies have been used before with WebDriver to help to make tests easier to write and maintain. They may help you too.

DomainDrivenDesign: Express your tests in the language of the end-user of the app. PageObjects: A simple abstraction of the UI of your web app. LoadableComponent: Modeling PageObjects as components. BotStyleTests: Using a command-based approach to automating tests, rather than the object-based approach that PageObjects encourage

Loadable Component

What Is It?

The LoadableComponent is a base class that aims to make writing PageObjects less painful. It does this by providing a standard way of ensuring that pages are loaded and providing hooks to make debugging the failure of a page to load easier. You can use it to help reduce the amount of boilerplate code in your tests, which in turn make maintaining your tests less tiresome.

There is currently an implementation in Java that ships as part of Selenium 2, but the approach used is simple enough to be implemented in any language.

Simple Usage

As an example of a UI that we’d like to model, take a look at the new issue page. From the point of view of a test author, this offers the service of being able to file a new issue. A basic Page Object would look like:

package com.example.webdriver;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class EditIssue {

  private final WebDriver driver;

  public EditIssue(WebDriver driver) {
    this.driver = driver;
  }

  public void setSummary(String summary) {
    WebElement field = driver.findElement(By.name("summary"));
    clearAndType(field, summary);
  }

  public void enterDescription(String description) {
    WebElement field = driver.findElement(By.name("comment"));
    clearAndType(field, description);
  }

  public IssueList submit() {
    driver.findElement(By.id("submit")).click();
    return new IssueList(driver);
  }

  private void clearAndType(WebElement field, String text) {
    field.clear();
    field.sendKeys(text);
  }
}

In order to turn this into a LoadableComponent, all we need to do is to set that as the base type:

public class EditIssue extends LoadableComponent<EditIssue> {
  // rest of class ignored for now
}

This signature looks a little unusual, but it all means is that this class represents a LoadableComponent that loads the EditIssue page.

By extending this base class, we need to implement two new methods:

  @Override
  protected void load() {
    driver.get("https://github.com/SeleniumHQ/selenium/issues/new");
  }

  @Override
  protected void isLoaded() throws Error {
    String url = driver.getCurrentUrl();
    assertTrue("Not on the issue entry page: " + url, url.endsWith("/new"));
  }

The load method is used to navigate to the page, whilst the isLoaded method is used to determine whether we are on the right page. Although the method looks like it should return a boolean, instead it performs a series of assertions using JUnit’s Assert class. There can be as few or as many assertions as you like. By using these assertions it’s possible to give users of the class clear information that can be used to debug tests.

With a little rework, our PageObject looks like:

package com.example.webdriver;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.PageFactory;

import static junit.framework.Assert.assertTrue;

public class EditIssue extends LoadableComponent<EditIssue> {

  private final WebDriver driver;
  
  // By default the PageFactory will locate elements with the same name or id
  // as the field. Since the summary element has a name attribute of "summary"
  // we don't need any additional annotations.
  private WebElement summary;
  
  // Same with the submit element, which has the ID "submit"
  private WebElement submit;
  
  // But we'd prefer a different name in our code than "comment", so we use the
  // FindBy annotation to tell the PageFactory how to locate the element.
  @FindBy(name = "comment") private WebElement description;
  
  public EditIssue(WebDriver driver) {
    this.driver = driver;
    
    // This call sets the WebElement fields.
    PageFactory.initElements(driver, this);
  }

  @Override
  protected void load() {
    driver.get("https://github.com/SeleniumHQ/selenium/issues/new");
  }

  @Override
  protected void isLoaded() throws Error {
    String url = driver.getCurrentUrl();
    assertTrue("Not on the issue entry page: " + url, url.endsWith("/new"));
  }
  
  public void setSummary(String issueSummary) {
    clearAndType(summary, issueSummary);
  }

  public void enterDescription(String issueDescription) {
    clearAndType(description, issueDescription);
  }

  public IssueList submit() {
    submit.click();
    return new IssueList(driver);
  }

  private void clearAndType(WebElement field, String text) {
    field.clear();
    field.sendKeys(text);
  }
}

That doesn’t seem to have bought us much, right? One thing it has done is encapsulate the information about how to navigate to the page into the page itself, meaning that this information’s not scattered through the code base. It also means that we can do this in our tests:

EditIssue page = new EditIssue(driver).get();

This call will cause the driver to navigate to the page if that’s necessary.

Nested Components

LoadableComponents start to become more useful when they are used in conjunction with other LoadableComponents. Using our example, we could view the “edit issue” page as a component within a project’s website (after all, we access it via a tab on that site). You also need to be logged in to file an issue. We could model this as a tree of nested components:

 + ProjectPage
 +---+ SecuredPage
     +---+ EditIssue

What would this look like in code? For a start, each logical component would have its own class. The “load” method in each of them would “get” the parent. The end result, in addition to the EditIssue class above is:

ProjectPage.java:

package com.example.webdriver;

import org.openqa.selenium.WebDriver;

import static org.junit.Assert.assertTrue;

public class ProjectPage extends LoadableComponent<ProjectPage> {

  private final WebDriver driver;
  private final String projectName;

  public ProjectPage(WebDriver driver, String projectName) {
    this.driver = driver;
    this.projectName = projectName;
  }

  @Override
  protected void load() {
    driver.get("http://" + projectName + ".googlecode.com/");
  }

  @Override
  protected void isLoaded() throws Error {
    String url = driver.getCurrentUrl();

    assertTrue(url.contains(projectName));
  }
}

and SecuredPage.java:

package com.example.webdriver;

import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import static org.junit.Assert.fail;

public class SecuredPage extends LoadableComponent<SecuredPage> {

  private final WebDriver driver;
  private final LoadableComponent<?> parent;
  private final String username;
  private final String password;

  public SecuredPage(WebDriver driver, LoadableComponent<?> parent, String username, String password) {
    this.driver = driver;
    this.parent = parent;
    this.username = username;
    this.password = password;
  }

  @Override
  protected void load() {
    parent.get();

    String originalUrl = driver.getCurrentUrl();

    // Sign in
    driver.get("https://www.google.com/accounts/ServiceLogin?service=code");
    driver.findElement(By.name("Email")).sendKeys(username);
    WebElement passwordField = driver.findElement(By.name("Passwd"));
    passwordField.sendKeys(password);
    passwordField.submit();

    // Now return to the original URL
    driver.get(originalUrl);
  }

  @Override
  protected void isLoaded() throws Error {
    // If you're signed in, you have the option of picking a different login.
    // Let's check for the presence of that.

    try {
      WebElement div = driver.findElement(By.id("multilogin-dropdown"));
    } catch (NoSuchElementException e) {
      fail("Cannot locate user name link");
    }
  }
}

The “load” method in EditIssue now looks like:

  @Override
  protected void load() {
    securedPage.get();

    driver.get("https://github.com/SeleniumHQ/selenium/issues/new");
  }

This shows that the components are all “nested” within each other. A call to get() in EditIssue will cause all its dependencies to load too. The example usage:

public class FooTest {
  private EditIssue editIssue;

  @Before
  public void prepareComponents() {
    WebDriver driver = new FirefoxDriver();

    ProjectPage project = new ProjectPage(driver, "selenium");
    SecuredPage securedPage = new SecuredPage(driver, project, "example", "top secret");
    editIssue = new EditIssue(driver, securedPage);
  }

  @Test
  public void demonstrateNestedLoadableComponents() {
    editIssue.get();

    editIssue.setSummary("Summary");
    editIssue.enterDescription("This is an example");
  }
}

If you’re using a library such as Guiceberry in your tests, the preamble of setting up the PageObjects can be omitted leading to nice, clear, readable tests.

Bot Pattern

(previously located: https://github.com/SeleniumHQ/selenium/wiki/Bot-Style-Tests)

Although PageObjects are a useful way of reducing duplication in your tests, it’s not always a pattern that teams feel comfortable following. An alternative approach is to follow a more “command-like” style of testing.

A “bot” is an action-oriented abstraction over the raw Selenium APIs. This means that if you find that commands aren’t doing the Right Thing for your app, it’s easy to change them. As an example:

public class ActionBot {
  private final WebDriver driver;

  public ActionBot(WebDriver driver) {
    this.driver = driver;
  }

  public void click(By locator) {
    driver.findElement(locator).click();
  }

  public void submit(By locator) {
    driver.findElement(locator).submit();
  }

  /** 
   * Type something into an input field. WebDriver doesn't normally clear these
   * before typing, so this method does that first. It also sends a return key
   * to move the focus out of the element.
   */
  public void type(By locator, String text) { 
    WebElement element = driver.findElement(locator);
    element.clear();
    element.sendKeys(text + "\n");
  }
}

Once these abstractions have been built and duplication in your tests identified, it’s possible to layer PageObjects on top of bots.

7.3 - 测试类型

验收测试

进行这种类型的测试以确定功能或系统是否满足客户的期望和要求. 这种测试通常涉及客户的合作或反馈, 是一种验证活动, 可以用于回答以下问题:

我们是否在制造 正确的 产品?

对于Web应用程序, 可以通过模拟用户期望的行为 直接使用Selenium来完成此测试的自动化. 可以通过记录/回放, 或通过本文档中介绍的各种支持的语言来完成此类模拟. 注意:有些人可能还会提到, 验收测试是 功能测试 的子类型.

功能测试

进行这种类型的测试是为了确定功能或系统是否正常运行而没有问题. 它会在不同级别检查系统, 以确保涵盖所有方案并且系统能够执行预期的 工作 . 这是一个验证活动, 它回答了以下问题:

我们是否在 正确地 制造产品?

这通常包括: 测试没有错误 (404, 异常…) , 以可用的方式 (正确的重定向) 正常运行, 以可访问的方式并匹配其规格 (请参见前述的 验收测试 ) .

对于Web应用程序, 可以通过模拟预期的结果, 直接使用Selenium来完成此测试的自动化. 可以通过记录/回放或通过本文档中说明的各种支持的语言来完成此模拟.

性能测试

顾名思义, 进行性能测试是为了衡量应用程序的性能.

性能测试主要有两种类型:

负载测试

进行了负载测试, 以验证应用程序在各种特定的负载 (通常是同时连接一定数量的用户) 下的运行状况

压力测试

进行压力测试, 以验证应用程序在压力 (或高于最大支持负载) 下的运行状况.

通常, 性能测试是通过执行一些Selenium书写的测试来完成的, 这些测试模拟了不同的用户 使用Web应用程序的特定功能 并检索了一些有意义的指标.

通常, 这是由其他检索指标的工具完成的.
JMeter 就是这样一种工具.

对于Web应用程序, 要测量的详细信息包括 吞吐量、 延迟、数据丢失、单个组件加载时间…

注意1:所有浏览器的开发人员工具 均具有“性能”标签 (可通过按F12进行访问)

注2:这属于 非功能测试 的类型, 因为它通常是按系统而不是按功能/特征进行测量.

回归测试

此测试通常在修改, 修复或添加功能之后进行.

为了确保所做的更改没有破坏任何现有功能, 将再次执行一些已经执行过的测试.

重新执行的测试集可以是全部或部分, 并且可以包括几种不同的类型, 具体取决于具体的应用程序和开发团队.

测试驱动开发 (TDD)

TDD本身不是一种测试类型, 而是一种迭代开发方法, 用于测试驱动功能的设计.

每个周期都从创建功能应通过的一组单元测试开始 (这将使首次执行失败) .

此后, 进行开发以使测试通过. 在另一个周期开始再次执行测试, 此过程一直持续到所有测试通过为止.

如此做的目的是基于以下情况, 既缺陷发现的时间越早成本越低, 从而加快了应用程序的开发.

行为驱动开发 (BDD)

BDD还是基于上述 (TDD) 的迭代开发方法, 其目的是让各方参与到应用程序的开发中.

每个周期都从创建一些规范开始 (应该失败) . 然后创建会失败的单元测试 (也应该失败) , 之后着手开发.

重复此循环, 直到所有类型的测试通过.

为此, 使用了规范语言. 因为简单、标准和明确, 各方都应该能够理解. 大多数工具都使用 Gherkin 作为这种语言.

目标是为了解决潜在的验收问题, 从而能够检测出比TDD还要多的错误, 并使各方之间的沟通更加顺畅.

当前有一组工具可用于编写规范 并将其与编码功能 (例如 CucumberSpecFlow ) 匹配.

在Selenium之上构建了一系列工具, 可通过将BDD规范直接转换为可执行代码, 使得上述过程更为迅速. 例如 JBehave, Capybara 和 Robot Framework.

7.4 - 指南和建议

Selenium项目的一些测试指南和建议.

关于"最佳实践"的注解:我们有意在本文档中避免使用"最佳实践"的说辞. 没有一种方法可以适用于所有情况. 我们更喜欢"指南和建议"的想法. 我们鼓励您通读这些内容, 并仔细地确定哪种方法适用于您的特定环境.

由于许多原因, 功能测试很难正确完成. 即便应用程序的状态, 复杂性, 依赖还不够让测试变得 足够复杂, 操作浏览器(特别是跨浏览器的兼容性测试)就已经使得写一个好的测试变成一种挑战.

Selenium提供了一些工具使得功能测试用户更简单的操作浏览器, 但是这些工具并不能帮助你来写一个好的 架构的测试套件. 这章我们会针对怎么来做web页面的功能测试的自动化给出一些忠告, 指南和建议.

这章记录了很多历年来成功的使用Selenium的用户的常用的软件设计模式.

7.4.1 - PO设计模式

Note: this page has merged contents from multiple sources, including the Selenium wiki

Overview

Within your web app’s UI, there are areas where your tests interact with. A Page Object only models these as objects within the test code. This reduces the amount of duplicated code and means that if the UI changes, the fix needs only to be applied in one place.

Page Object is a Design Pattern that has become popular in test automation for enhancing test maintenance and reducing code duplication. A page object is an object-oriented class that serves as an interface to a page of your AUT. The tests then use the methods of this page object class whenever they need to interact with the UI of that page. The benefit is that if the UI changes for the page, the tests themselves don’t need to change, only the code within the page object needs to change. Subsequently, all changes to support that new UI are located in one place.

Advantages

  • There is a clean separation between the test code and page-specific code, such as locators (or their use if you’re using a UI Map) and layout.
  • There is a single repository for the services or operations the page offers rather than having these services scattered throughout the tests.

In both cases, this allows any modifications required due to UI changes to all be made in one place. Helpful information on this technique can be found on numerous blogs as this ‘test design pattern’ is becoming widely used. We encourage readers who wish to know more to search the internet for blogs on this subject. Many have written on this design pattern and can provide helpful tips beyond the scope of this user guide. To get you started, we’ll illustrate page objects with a simple example.

Examples

First, consider an example, typical of test automation, that does not use a page object:

/***
 * Tests login feature
 */
public class Login {

  public void testLogin() {
    // fill login data on sign-in page
    driver.findElement(By.name("user_name")).sendKeys("userName");
    driver.findElement(By.name("password")).sendKeys("my supersecret password");
    driver.findElement(By.name("sign-in")).click();

    // verify h1 tag is "Hello userName" after login
    driver.findElement(By.tagName("h1")).isDisplayed();
    assertThat(driver.findElement(By.tagName("h1")).getText(), is("Hello userName"));
  }
}

There are two problems with this approach.

  • There is no separation between the test method and the AUT’s locators (IDs in this example); both are intertwined in a single method. If the AUT’s UI changes its identifiers, layout, or how a login is input and processed, the test itself must change.
  • The ID-locators would be spread in multiple tests, in all tests that had to use this login page.

Applying the page object techniques, this example could be rewritten like this in the following example of a page object for a Sign-in page.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

/**
 * Page Object encapsulates the Sign-in page.
 */
public class SignInPage {
  protected WebDriver driver;

  // <input name="user_name" type="text" value="">
  private By usernameBy = By.name("user_name");
  // <input name="password" type="password" value="">
  private By passwordBy = By.name("password");
  // <input name="sign_in" type="submit" value="SignIn">
  private By signinBy = By.name("sign_in");

  public SignInPage(WebDriver driver){
    this.driver = driver;
     if (!driver.getTitle().equals("Sign In Page")) {
      throw new IllegalStateException("This is not Sign In Page," +
            " current page is: " + driver.getCurrentUrl());
    }
  }

  /**
    * Login as valid user
    *
    * @param userName
    * @param password
    * @return HomePage object
    */
  public HomePage loginValidUser(String userName, String password) {
    driver.findElement(usernameBy).sendKeys(userName);
    driver.findElement(passwordBy).sendKeys(password);
    driver.findElement(signinBy).click();
    return new HomePage(driver);
  }
}

and page object for a Home page could look like this.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

/**
 * Page Object encapsulates the Home Page
 */
public class HomePage {
  protected WebDriver driver;

  // <h1>Hello userName</h1>
  private By messageBy = By.tagName("h1");

  public HomePage(WebDriver driver){
    this.driver = driver;
    if (!driver.getTitle().equals("Home Page of logged in user")) {
      throw new IllegalStateException("This is not Home Page of logged in user," +
            " current page is: " + driver.getCurrentUrl());
    }
  }

  /**
    * Get message (h1 tag)
    *
    * @return String message text
    */
  public String getMessageText() {
    return driver.findElement(messageBy).getText();
  }

  public HomePage manageProfile() {
    // Page encapsulation to manage profile functionality
    return new HomePage(driver);
  }
  /* More methods offering the services represented by Home Page
  of Logged User. These methods in turn might return more Page Objects
  for example click on Compose mail button could return ComposeMail class object */
}

So now, the login test would use these two page objects as follows.

/***
 * Tests login feature
 */
public class TestLogin {

  @Test
  public void testLogin() {
    SignInPage signInPage = new SignInPage(driver);
    HomePage homePage = signInPage.loginValidUser("userName", "password");
    assertThat(homePage.getMessageText(), is("Hello userName"));
  }

}

There is a lot of flexibility in how the page objects may be designed, but there are a few basic rules for getting the desired maintainability of your test code.

Assertions in Page Objects

Page objects themselves should never make verifications or assertions. This is part of your test and should always be within the test’s code, never in an page object. The page object will contain the representation of the page, and the services the page provides via methods but no code related to what is being tested should be within the page object.

There is one, single, verification which can, and should, be within the page object and that is to verify that the page, and possibly critical elements on the page, were loaded correctly. This verification should be done while instantiating the page object. In the examples above, both the SignInPage and HomePage constructors check that the expected page is available and ready for requests from the test.

Page Component Objects

A page object does not necessarily need to represent all the parts of a page itself. This was noted by Martin Fowler in the early days, while first coining the term “panel objects”.

The same principles used for page objects can be used to create “Page Component Objects”, as it was later called, that represent discrete chunks of the page and can be included in page objects. These component objects can provide references to the elements inside those discrete chunks, and methods to leverage the functionality or behavior provided by them.

For example, a Products page has multiple products.

<!-- Products Page -->
<div class="header_container">
    <span class="title">Products</span>
</div>

<div class="inventory_list">
    <div class="inventory_item">
    </div>
    <div class="inventory_item">
    </div>
    <div class="inventory_item">
    </div>
    <div class="inventory_item">
    </div>
    <div class="inventory_item">
    </div>
    <div class="inventory_item">
    </div>
</div>

Each product is a component of the Products page.

<!-- Inventory Item -->
<div class="inventory_item">
    <div class="inventory_item_name">Backpack</div>
    <div class="pricebar">
        <div class="inventory_item_price">$29.99</div>
        <button id="add-to-cart-backpack">Add to cart</button>
    </div>
</div>

The Products page HAS-A list of products. This object relationship is called Composition. In simpler terms, something is composed of another thing.

public abstract class BasePage {
    protected WebDriver driver;

    public BasePage(WebDriver driver) {
        this.driver = driver;
    }
}

// Page Object
public class ProductsPage extends BasePage {
    public ProductsPage(WebDriver driver) {
        super(driver);
        // No assertions, throws an exception if the element is not loaded
        new WebDriverWait(driver, Duration.ofSeconds(3))
            .until(d -> d.findElement(By.className("header_container")));
    }

    // Returning a list of products is a service of the page
    public List<Product> getProducts() {
        return driver.findElements(By.className("inventory_item"))
            .stream()
            .map(e -> new Product(e)) // Map WebElement to a product component
            .toList();
    }

    // Return a specific product using a boolean-valued function (predicate)
    // This is the behavioral Strategy Pattern from GoF
    public Product getProduct(Predicate<Product> condition) {
        return getProducts()
            .stream()
            .filter(condition) // Filter by product name or price
            .findFirst()
            .orElseThrow();
    }
}

The Product component object is used inside the Products page object.

public abstract class BaseComponent {
    protected WebElement root;

    public BaseComponent(WebElement root) {
        this.root = root;
    }
}

// Page Component Object
public class Product extends BaseComponent {
    // The root element contains the entire component
    public Product(WebElement root) {
        super(root); // inventory_item
    }

    public String getName() {
        // Locating an element begins at the root of the component
        return root.findElement(By.className("inventory_item_name")).getText();
    }

    public BigDecimal getPrice() {
        return new BigDecimal(
                root.findElement(By.className("inventory_item_price"))
                    .getText()
                    .replace("$", "")
            ).setScale(2, RoundingMode.UNNECESSARY); // Sanitation and formatting
    }

    public void addToCart() {
        root.findElement(By.id("add-to-cart-backpack")).click();
    }
}

So now, the products test would use the page object and the page component object as follows.

public class ProductsTest {
    @Test
    public void testProductInventory() {
        var productsPage = new ProductsPage(driver); // page object
        var products = productsPage.getProducts();
        assertEquals(6, products.size()); // expected, actual
    }
    
    @Test
    public void testProductPrices() {
        var productsPage = new ProductsPage(driver);

        // Pass a lambda expression (predicate) to filter the list of products
        // The predicate or "strategy" is the behavior passed as parameter
        var backpack = productsPage.getProduct(p -> p.getName().equals("Backpack")); // page component object
        var bikeLight = productsPage.getProduct(p -> p.getName().equals("Bike Light"));

        assertEquals(new BigDecimal("29.99"), backpack.getPrice());
        assertEquals(new BigDecimal("9.99"), bikeLight.getPrice());
    }
}

The page and component are represented by their own objects. Both objects only have methods for the services they offer, which matches the real-world application in object-oriented programming.

You can even nest component objects inside other component objects for more complex pages. If a page in the AUT has multiple components, or common components used throughout the site (e.g. a navigation bar), then it may improve maintainability and reduce code duplication.

Other Design Patterns Used in Testing

There are other design patterns that also may be used in testing. Discussing all of these is beyond the scope of this user guide. Here, we merely want to introduce the concepts to make the reader aware of some of the things that can be done. As was mentioned earlier, many have blogged on this topic and we encourage the reader to search for blogs on these topics.

Implementation Notes

PageObjects can be thought of as facing in two directions simultaneously. Facing toward the developer of a test, they represent the services offered by a particular page. Facing away from the developer, they should be the only thing that has a deep knowledge of the structure of the HTML of a page (or part of a page) It’s simplest to think of the methods on a Page Object as offering the “services” that a page offers rather than exposing the details and mechanics of the page. As an example, think of the inbox of any web-based email system. Amongst the services it offers are the ability to compose a new email, choose to read a single email, and list the subject lines of the emails in the inbox. How these are implemented shouldn’t matter to the test.

Because we’re encouraging the developer of a test to try and think about the services they’re interacting with rather than the implementation, PageObjects should seldom expose the underlying WebDriver instance. To facilitate this, methods on the PageObject should return other PageObjects. This means we can effectively model the user’s journey through our application. It also means that should the way that pages relate to one another change (like when the login page asks the user to change their password the first time they log into a service when it previously didn’t do that), simply changing the appropriate method’s signature will cause the tests to fail to compile. Put another way; we can tell which tests would fail without needing to run them when we change the relationship between pages and reflect this in the PageObjects.

One consequence of this approach is that it may be necessary to model (for example) both a successful and unsuccessful login; or a click could have a different result depending on the app’s state. When this happens, it is common to have multiple methods on the PageObject:

public class LoginPage {
    public HomePage loginAs(String username, String password) {
        // ... clever magic happens here
    }
    
    public LoginPage loginAsExpectingError(String username, String password) {
        //  ... failed login here, maybe because one or both of the username and password are wrong
    }
    
    public String getErrorMessage() {
        // So we can verify that the correct error is shown
    }
}

The code presented above shows an important point: the tests, not the PageObjects, should be responsible for making assertions about the state of a page. For example:

public void testMessagesAreReadOrUnread() {
    Inbox inbox = new Inbox(driver);
    inbox.assertMessageWithSubjectIsUnread("I like cheese");
    inbox.assertMessageWithSubjectIsNotUnread("I'm not fond of tofu");
}

could be re-written as:

public void testMessagesAreReadOrUnread() {
    Inbox inbox = new Inbox(driver);
    assertTrue(inbox.isMessageWithSubjectIsUnread("I like cheese"));
    assertFalse(inbox.isMessageWithSubjectIsUnread("I'm not fond of tofu"));
}

Of course, as with every guideline, there are exceptions, and one that is commonly seen with PageObjects is to check that the WebDriver is on the correct page when we instantiate the PageObject. This is done in the example below.

Finally, a PageObject need not represent an entire page. It may represent a section that appears frequently within a site or page, such as site navigation. The essential principle is that there is only one place in your test suite with knowledge of the structure of the HTML of a particular (part of a) page.

Summary

  • The public methods represent the services that the page offers
  • Try not to expose the internals of the page
  • Generally don’t make assertions
  • Methods return other PageObjects
  • Need not represent an entire page
  • Different results for the same action are modelled as different methods

Example

public class LoginPage {
    private final WebDriver driver;

    public LoginPage(WebDriver driver) {
        this.driver = driver;

        // Check that we're on the right page.
        if (!"Login".equals(driver.getTitle())) {
            // Alternatively, we could navigate to the login page, perhaps logging out first
            throw new IllegalStateException("This is not the login page");
        }
    }

    // The login page contains several HTML elements that will be represented as WebElements.
    // The locators for these elements should only be defined once.
        By usernameLocator = By.id("username");
        By passwordLocator = By.id("passwd");
        By loginButtonLocator = By.id("login");

    // The login page allows the user to type their username into the username field
    public LoginPage typeUsername(String username) {
        // This is the only place that "knows" how to enter a username
        driver.findElement(usernameLocator).sendKeys(username);

        // Return the current page object as this action doesn't navigate to a page represented by another PageObject
        return this;	
    }

    // The login page allows the user to type their password into the password field
    public LoginPage typePassword(String password) {
        // This is the only place that "knows" how to enter a password
        driver.findElement(passwordLocator).sendKeys(password);

        // Return the current page object as this action doesn't navigate to a page represented by another PageObject
        return this;	
    }

    // The login page allows the user to submit the login form
    public HomePage submitLogin() {
        // This is the only place that submits the login form and expects the destination to be the home page.
        // A seperate method should be created for the instance of clicking login whilst expecting a login failure. 
        driver.findElement(loginButtonLocator).submit();

        // Return a new page object representing the destination. Should the login page ever
        // go somewhere else (for example, a legal disclaimer) then changing the method signature
        // for this method will mean that all tests that rely on this behaviour won't compile.
        return new HomePage(driver);	
    }

    // The login page allows the user to submit the login form knowing that an invalid username and / or password were entered
    public LoginPage submitLoginExpectingFailure() {
        // This is the only place that submits the login form and expects the destination to be the login page due to login failure.
        driver.findElement(loginButtonLocator).submit();

        // Return a new page object representing the destination. Should the user ever be navigated to the home page after submiting a login with credentials 
        // expected to fail login, the script will fail when it attempts to instantiate the LoginPage PageObject.
        return new LoginPage(driver);	
    }

    // Conceptually, the login page offers the user the service of being able to "log into"
    // the application using a user name and password. 
    public HomePage loginAs(String username, String password) {
        // The PageObject methods that enter username, password & submit login have already defined and should not be repeated here.
        typeUsername(username);
        typePassword(password);
        return submitLogin();
    }
}

7.4.2 - 领域特定语言

领域特定语言 (DSL) 是一种为用户提供解决问题的表达方式的系统. 它使用户可以按照自己的术语与系统进行交互, 而不仅仅是通过程序员的语言.

您的用户通常并不关心您网站的外观. 他们不在乎装饰, 动画或图形. 他们希望借助于您的系统, 以最小的难度使新员工融入整个流程; 他们想预订去阿拉斯加的旅行; 他们想以折扣价配置和购买独角兽. 您作为测试人员的工作应尽可能接近"捕捉”这种思维定势. 考虑到这一点, 我们开始着手"建模”您正在工作的应用程序, 以使测试脚本 (发布前用户仅有的代理) “说话”并代表用户.

在Selenium中, DSL通常由方法表示, 其编写方式使API简单易读-它们使开发人员和干系人 (用户, 产品负责人, 商业智能专家等) 之间能够产生汇报.

好处

  • 可读: 业务关系人可以理解.
  • 可写: 易于编写, 避免不必要的重复.
  • 可扩展: 可以 (合理地) 添加功能而无需打破约定以及现有功能.
  • 可维护: 通过将实现细节排除在测试用例之外, 您可以很好地隔离 AUT* 的修改.

Java

以下是Java中合理的DSL方法的示例. 为简便起见, 假定 driver 对象是预定义的并且可用于该方法.

/**
 * Takes a username and password, fills out the fields, and clicks "login".
 * @return An instance of the AccountPage
 */
public AccountPage loginAsUser(String username, String password) {
  WebElement loginField = driver.findElement(By.id("loginField"));
  loginField.clear();
  loginField.sendKeys(username);

  // Fill out the password field. The locator we're using is "By.id", and we should
  // have it defined elsewhere in the class.
  WebElement passwordField = driver.findElement(By.id("password"));
  passwordField.clear();
  passwordField.sendKeys(password);

  // Click the login button, which happens to have the id "submit".
  driver.findElement(By.id("submit")).click();

  // Create and return a new instance of the AccountPage (via the built-in Selenium
  // PageFactory).
  return PageFactory.newInstance(AccountPage.class);
}

此方法完全从测试代码中抽象出输入字段, 按钮, 单击甚至页面的概念. 使用这种方法, 测试人员要做的就是调用此方法. 这给您带来了维护方面的优势: 如果登录字段曾经更改过, 则只需更改此方法-而非您的测试.

public void loginTest() {
    loginAsUser("cbrown", "cl0wn3");

    // Now that we're logged in, do some other stuff--since we used a DSL to support
    // our testers, it's as easy as choosing from available methods.
    do.something();
    do.somethingElse();
    Assert.assertTrue("Something should have been done!", something.wasDone());

    // Note that we still haven't referred to a button or web control anywhere in this
    // script...
}

郑重强调: 您的主要目标之一应该是编写一个API, 该API允许您的测试解决 当前的问题, 而不是UI的问题. 用户界面是用户的次要问题–用户并不关心用户界面, 他们只是想完成工作. 您的测试脚本应该像用户希望做的事情以及他们想知道的事情的完整清单那样易于阅读. 测试不应该考虑UI如何要求您去做.

*AUT: 待测系统

7.4.3 - 生成应用程序状态

Selenium不应用于准备测试用例. 测试用例中所有重复性动作和准备工作, 都应通过其他方法来完成.
例如, 大多数Web UI都具有身份验证 (诸如一个登录表单) . 在每次测试之前通过Web浏览器进行登录的消除, 将提高测试的速度和稳定性. 应该创建一种方法来获取对 AUT* 的访问权限 (例如, 使用API登录并设置Cookie) . 此外, 不应使用Selenium创建预加载数据来进行测试的方法.
如前所述, 应利用现有的API为 AUT* 创建数据. *AUT: 待测系统

7.4.4 - 模拟外部服务

消除对外部服务的依赖性将大大提高测试的速度和稳定性.

7.4.5 - 改善报告

Selenium并非旨在报告测试用例的运行状态. 利用单元测试框架的内置报告功能是一个好的开始. 大多数单元测试框架都有可以生成xUnit或HTML格式的报告. xUnit报表很受欢迎, 可以将其结果导入到持续集成(CI)服务器, 例如Jenkins、Travis、Bamboo等. 以下是一些链接, 可获取关于几种语言报表输出的更多信息.

NUnit 3 Console Runner

NUnit 3 Console Command Line

xUnit getting test results in TeamCity

xUnit getting test results in CruiseControl.NET

xUnit getting test results in Azure DevOps

7.4.6 - 避免共享状态

尽管在多个地方都提到过, 但这点仍值得被再次提及. 确保测试相互隔离.

  • 不要共享测试数据. 想象一下有几个测试, 每个测试都会在选择操作执行之前查询数据库中的有效订单. 如果两个测试采用相同的顺序, 则很可能会出现意外行为.

  • 清理应用程序中过时的数据, 这些数据可能会被其他测试. 例如无效的订单记录.

  • 每次测试都创建一个新的WebDriver实例. 这在确保测试隔离的同时可以保障并行化更为简单.

7.4.7 - 使用定位器的提示

何时使用哪些定位器以及如何在代码中最好地管理它们.

这里有一些 支持的定位策略 的例子 .

一般来说,如果 HTML 的 id 是可用的、唯一的且是可预测的,那么它就是在页面上定位元素的首选方法。它们的工作速度非常快,可以避免复杂的 DOM 遍历带来的大量处理。

如果没有唯一的 id,那么最好使用写得好的 CSS 选择器来查找元素。XPath 和 CSS 选择器一样好用,但是它语法很复杂,并且经常很难调试。尽管 XPath 选择器非常灵活,但是他们通常未经过浏览器厂商的性能测试,并且运行速度很慢。

基于链接文本和部分链接文本的选择策略有其缺点,即只能对链接元素起作用。此外,它们在 WebDriver 内部调用 querySelectorAll 选择器。

标签名可能是一种危险的定位元素的方法。页面上经常出现同一标签的多个元素。这在调用 findElements(By) 方法返回元素集合的时候非常有用。

建议您尽可能保持定位器的紧凑性和可读性。使用 WebDriver 遍历 DOM 结构是一项性能花销很大的操作,搜索范围越小越好。

7.4.8 - 测试的独立性

将每个测试编写为独立的单元. 以不依赖于其他测试完成的方式编写测试:

例如有一个内容管理系统, 您可以借助其创建一些自定义内容, 这些内容在发布后作为模块显示在您的网站上, 并且CMS和应用程序之间的同步可能需要一些时间.

测试模块的一种错误方法是在测试中创建并发布内容, 然后在另一测试中检查该模块. 这是不可取的, 因为发布后内容可能无法立即用于其他测试.

与之相反的事, 您可以创建在受影响的测试中打开和关闭的打桩内容, 并将其用于验证模块. 而且, 对于内容的创建, 您仍然可以进行单独的测试.

7.4.9 - 考虑使用Fluent API

Martin Fowler创造了术语 “Fluent API”. Selenium已经在其 FluentWait 类中实现了类似的东西, 这是对标准 Wait 类的替代. 您可以在页面对象中启用Fluent API设计模式, 然后使用如下代码段查询Google搜索页面:

driver.get( "http://www.google.com/webhp?hl=en&amp;tab=ww" );
GoogleSearchPage gsp = new GoogleSearchPage(driver);
gsp.setSearchString().clickSearchButton();

Google页面对象类具有这种流畅行为后可能看起来像这样:

public abstract class BasePage {
    protected WebDriver driver;

    public BasePage(WebDriver driver) {
        this.driver = driver;
    }
}

public class GoogleSearchPage extends BasePage {
    public GoogleSearchPage(WebDriver driver) {
        super(driver);
        // Generally do not assert within pages or components.
        // Effectively throws an exception if the lambda condition is not met.
        new WebDriverWait(driver, Duration.ofSeconds(3)).until(d -> d.findElement(By.id("logo")));
    }

    public GoogleSearchPage setSearchString(String sstr) {
        driver.findElement(By.id("gbqfq")).sendKeys(sstr);
        return this;
    }

    public void clickSearchButton() {
        driver.findElement(By.id("gbqfb")).click();
    }
}

7.4.10 - 每次测试都刷新浏览器

每次测试都从一个干净的已知状态开始. 理想情况下, 为每次测试打开一个新的虚拟机. 如果打开新虚拟机不切实际, 则至少应为每次测试启动一个新的WebDriver. 对于Firefox, 请使用您已知的配置文件去启动WebDriver. 大多数浏览器驱动器,像GeckoDriver和ChromeDriver那样,默认都会以干净的已知状态和一个新的用户配置文件开始。

WebDriver driver = new FirefoxDriver();

7.5 - 不鼓励的行为

使用Selenium自动化浏览器时应避免的事项.

7.5.1 - 验证码

验证码 (CAPTCHA), 是 全自动区分计算机和人类的图灵测试 (Completely Automated Public Turing test to tell Computers and Humans Apart) 的简称, 是被明确地设计用于阻止自动化的, 所以不要尝试!
规避验证码的检查, 主要有两个策略:

  • 在测试环境中禁用验证码
  • 添加钩子以允许测试绕过验证码

7.5.2 - 文件下载

虽然可以通过在Selenium的控制下单击浏览器的链接来开始下载, 但是API并不会暴露下载进度, 因此这是一种不理想的测试下载文件的方式. 因为下载文件并非模拟用户与Web平台交互的重要方面. 取而代之的是, 应使用Selenium(以及任何必要的cookie)查找链接, 并将其传递给例如libcurl这样的HTTP请求库.

HtmlUnit driver 可以通过实现AttachmentHandler 接口将附件作为输入流进行访问来下载附件. 可以将AttachmentHandler添加到 HtmlUnit WebClient.

7.5.3 - HTTP响应码

对于Selenium RC中的某些浏览器配置, Selenium充当了浏览器和自动化站点之间的代理. 这意味着可以捕获或操纵通过Selenium传递的所有浏览器流量. captureNetworkTraffic() 方法旨在捕获浏览器和自动化站点之间的所有网络流量,包括HTTP响应码.

Selenium WebDriver是一种完全不同的浏览器自动化实现, 它更喜欢表现得像用户一样,这种方式来自于基于WebDriver编写测试的方式. 在自动化功能测试中,检查状态码并不是测试失败的特别重要的细节, 之前的步骤更重要.

浏览器将始终呈现HTTP状态代码,例如404或500错误页面. 遇到这些错误页面时,一种“快速失败”的简单方法是 在每次加载页面后检查页面标题或可信赖点的内容(例如 <h1> 标签). 如果使用的是页面对象模型,则可以将此检查置于类构造函数中或类似于期望的页面加载的位置. 有时,HTTP代码甚至可能出现在浏览器的错误页面中, 您可以使用WebDriver读取此信息并改善调试输出.

检查网页本身的一种理想实践是符合WebDriver的呈现以及用户的视角.

如果您坚持,捕获HTTP状态代码的高级解决方案是复刻Selenium RC的行为去使用代理. WebDriver API提供了为浏览器设置代理的功能, 并且有许多代理可以通过编程方式来操纵发送到Web服务器和从Web服务器接收的请求的内容. 使用代理可以决定如何响应重定向响应代码. 此外,并非每个浏览器都将响应代码提供给WebDriver, 因此选择使用代理可以使您拥有适用于每个浏览器的解决方案.

7.5.4 - Gmail, email 和 Facebook 登录

由于多种原因, 不建议使用WebDriver登录Gmail和Facebook等网站. 除了违反这些网站的使用条款之外 (您可能会面临帐户被关闭的风险) , 还有其运行速度缓慢且不可靠的因素.

理想的做法是使用电子邮件供应商提供的API, 或者对于Facebook, 使用开发者工具的服务, 该服务是被用于创建测试帐户、朋友等内容的API. 尽管使用API可能看起来有些额外的工作量, 但是您将获得基于速度、可靠性和稳定性的回报. API不会频繁更改, 但是网页和HTML定位符经常变化, 并且需要您更新测试框架的代码.

在任何时候测试使用WebDriver登录第三方站点, 都会增加测试失败的风险, 因为这会使您的测试时间更长. 通常的经验是, 执行时间较长的测试会更加脆弱和不可靠.

符合W3C conformant 的WebDriver实现, 也会使用 WebDriver 的属性对 navigator 对象进行注释, 用于缓解拒绝服务的攻击.

7.5.5 - 测试依赖

关于自动化测试的一个常见想法和误解是关于特定的测试顺序. 您的测试应该能够以任何顺序运行,而不是依赖于完成其他测试才能成功.

7.5.6 - 性能测试

通常不建议使用Selenium和WebDriver进行性能测试. 并非因为不能做, 只是缺乏针对此类工作的优化, 因而难以得到乐观的结果.

对于用户而言, 在用户上下文中执行性能测试似乎是自然而然的选择, 但是WebDriver的测试会受到许多外部和内部的影响而变得脆弱, 这是您无法控制的. 例如, 浏览器的启动速度, HTTP服务器的速度, 托管JavaScript或CSS的第三方服务器的响应 以及WebDriver实现本身检测的损失. 这些因素的变化会影响结果. 很难区分网站自身与外部资源之间的性能差异, 并且也很难明确浏览器中使用WebDriver对性能的影响, 尤其是在注入脚本时.

另一个潜在的吸引点是"节省时间"-同时执行功能和性能测试. 但是, 功能和性能测试分别具有截然不同的目标. 要测试功能, 测试人员可能需要耐心等待加载, 但这会使性能测试结果蒙上阴影, 反之亦然.

为了提高网站的性能, 您需要不依赖于环境的差异来分析整体性能, 识别不良代码的实践, 对单个资源 (即CSS或JavaScript) 的性能进行细分 以了解需要改进的地方. 有很多性能测试工具已经可以完成这项工作, 并且提供了报告和分析结果, 甚至可以提出改进建议.

例如一种易于使用的 (开源) 软件包是: JMeter

7.5.7 - 爬取链接

建议您不要使用WebDriver来通过链接进行爬网, 并非因为无法完成,而是因为它绝对不是最理想的工具。 WebDriver需要一些时间来启动,并且可能要花几秒钟到一分钟的时间, 具体取决于测试的编写方式,仅仅是为了获取页面并遍历DOM.

除了使用WebDriver之外, 您还可以通过执行 curl 命令或 使用诸如BeautifulSoup之类的库来节省大量时间, 因为这些方法不依赖于创建浏览器和导航至页面. 通过不使用WebDriver可以节省大量时间.

7.5.8 - 双因素认证

双因素认证通常简写成 2FA 是一种一次性密码(OTP)通常用在移动应用上例如“谷歌认证器”, “微软认证器”等等,或者通过短信或者邮件来认证。在Selenium自动化中这些都是影响有效自动化 的极大挑战。虽然也有一些方法可以自动化这些过程,但是同样对于Selenium自动化也引入了很多不安全因素。 所以你应该要避免对2FA自动化。

这里有一些对于如何绕过2FA校验的建议:

  • 在测试环境中对特定用户禁止2FA校验,这样对于这些特定用户可以直接进行自动化测试。
  • 禁止2FA校验在测试环境中。
  • 对于特定IP区域禁止2FA校验,这样我们可以配置测试机器的IP在这些白名单区域中。

8 - 关于这个文档

这些文档,就像代码本身一样,100% 由 Selenium 社区中的志愿者维护。 许多人自成立以来一直在使用它,但更多人只是在短时间内使用它,并且已经花时间帮助改善新用户的入门体验。

如果文档有问题,我们想知道! 沟通问题的最佳方式是访问 https://github.com/seleniumhq/seleniumhq.github.io/issues 并搜索问题是否已经提交。 如果没有,请随意打开一个!

社区的许多成员经常光顾 Libera.chat#selenium Libera 频道。 请随时来访并提出问题,如果您得到了您认为在这些文档中可能有用的帮助,请务必添加您的贡献! 我们可以更新这些文档,但当我们从普通提交者之外获得贡献时,对每个人来说都容易得多。

8.1 - 版权和归属

Selenium工具集下的全部版权、贡献以及归属.

Selenium文档

我们已尽一切努力使本文档尽可能完整和准确, 但难以保证一定适用. 信息提供是基于“as-is”的. 若本文档所含信息引起的任何损失或损害, 作者和出版者对所有个人或实体均不承担任何责任. 不承担任何与使用本文信息相关的专利责任.

归属

感谢:

Selenium Main Repository

5151 commits
3352 commits
2459 commits
1464 commits
1299 commits
1212 commits
1175 commits
1162 commits
1051 commits
970 commits
599 commits
523 commits
473 commits
326 commits
303 commits
289 commits
225 commits
205 commits
200 commits
191 commits
179 commits
171 commits
167 commits
138 commits
137 commits
133 commits
115 commits
109 commits
108 commits
94 commits
91 commits
90 commits
66 commits
63 commits
49 commits
48 commits
46 commits
44 commits
42 commits
41 commits
40 commits
37 commits
36 commits
34 commits
32 commits
30 commits
28 commits
26 commits
26 commits
25 commits
24 commits
22 commits
21 commits
20 commits
20 commits
20 commits
19 commits
16 commits
15 commits
14 commits
13 commits
12 commits
12 commits
12 commits
12 commits
11 commits
10 commits
10 commits
9 commits
9 commits
9 commits
9 commits
8 commits
7 commits
7 commits
7 commits
7 commits
7 commits
7 commits
7 commits
7 commits
6 commits
6 commits
6 commits
6 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits

Selenium IDE

2445 commits
176 commits
88 commits
57 commits
51 commits
36 commits
36 commits
34 commits
24 commits
15 commits
15 commits
12 commits
12 commits
6 commits
4 commits
3 commits
3 commits
3 commits
3 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits

Docker Selenium

516 commits
134 commits
107 commits
53 commits
50 commits
27 commits
24 commits
12 commits
9 commits
8 commits
7 commits
5 commits
5 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits

Selenium Website & Docs

654 commits
592 commits
323 commits
134 commits
101 commits
47 commits
39 commits
32 commits
28 commits
25 commits
21 commits
18 commits
18 commits
17 commits
15 commits
15 commits
12 commits
12 commits
11 commits
10 commits
8 commits
8 commits
7 commits
7 commits
7 commits
6 commits
6 commits
6 commits
6 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits
4 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
3 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits

Previous Selenium Website

417 commits
87 commits
79 commits
63 commits
59 commits
40 commits
40 commits
36 commits
26 commits
24 commits
22 commits
21 commits
21 commits
19 commits
17 commits
14 commits
12 commits
11 commits
11 commits
10 commits
7 commits
6 commits
5 commits
5 commits
5 commits
5 commits
5 commits
5 commits
3 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits

Previous Documentation Rewrite Project

197 commits
105 commits
54 commits
30 commits
27 commits
25 commits
21 commits
17 commits
16 commits
12 commits
12 commits
12 commits
12 commits
8 commits
7 commits
6 commits
6 commits
6 commits
6 commits
5 commits
5 commits
5 commits
4 commits
4 commits
4 commits
4 commits
3 commits
3 commits
3 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
2 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits
1 commits

Selenium文档项目使用的第三方软件:

软件版本许可
Hugov0.110.0Apache 2.0
DocsyApache 2.0

许可

源自Selenium项目的所有代码和文档均基于Apache2.0的许可, 由 Software Freedom Conservancy 作为版权所有者.

为方便起见, 此处包含许可证, 您也可以在此查看 Apache 基金会站点:

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

8.2 - 为 Selenium 文档做贡献

有关改进Selenium文档和代码示例的信息

Selenium是一个大型软件项目, 其网站和文档是了解事情如何工作以及学习有效利用其潜力的关键.

该项目包含Selenium的网站和文档. 这是一项持续的工作(不针对任何特定版本), 用于提供有效使用Selenium、 如何参与以及如何为Selenium做出贡献的更新信息.

对网站和文档的贡献遵循以下部分中有关贡献的描述.


Selenium项目欢迎每一个人的贡献. 您可以通过多种方式提供帮助:

上报问题

在报告新问题或评论现有问题时, 请确保讨论与Selenium的软件、 其站点与文档的具体技术问题相关.

随着时间的推移, 所有Selenium组件的变化都非常快, 因此这可能会导致文档过时. 如前所述, 如果您确实遇到这种情况, 请不要为此担心. 您也可能知道如何更新文档, 因此请向我们发送包含相关更改的Pull Request.

如果不确定所发现的问题是否存在, 请通过以下沟通渠道进行描述 https://selenium.dev/support.

What to Help With

Creating Examples

Examples that need to be moved are marked with:

Add Example

We want to be able to run all of our code examples in the CI to ensure that people can copy and paste and execute everything on the site. So we put the code where it belongs in the examples directory. Each page in the documentation correlates to a test file in each of the languages, and should follow naming conventions. For instance examples for this page https://www.selenium.dev/documentation/webdriver/browsers/chrome/ get added in these files:

  • "/examples/java/src/test/java/dev/selenium/browsers/ChromeTest.java"
  • "/examples/python/tests/browsers/test_chrome.py"
  • "/examples/dotnet/SeleniumDocs/Browsers/ChromeTest.cs"
  • "/examples/ruby/spec/browsers/chrome_spec.rb"
  • "/examples/javascript/test/browser/chromeSpecificCaps.spec.js"

Each example should get its own test. Ideally each test has an assertion that verifies the code works as intended. Once the code is copied to its own test in the proper file, it needs to be referenced in the markdown file.

For example, the tab in Ruby would look like this:

    {{< tab header="Ruby" >}}
    {{< gh-codeblock path="/examples/ruby/spec/browsers/chrome_spec.rb#L8-L9" >}}
    {{< /tab >}}

The line numbers at the end represent only the line or lines of code that actually represent the item being displayed. If a user wants more context, they can click the link to the GitHub page that will show the full context.

Make sure that if you add a test to the page that all the other line numbers in the markdown file are still correct. Adding a test at the top of a page means updating every single reference in the documentation that has a line number for that file.

Finally, make sure that the tests pass in the CI.

Moving Examples

Examples that need to be moved are marked with:

Move Code

Everything from the Creating Examples section applies, with one addition.

Make sure the tab includes text=true. By default, the tabs get formatted for code, so to use markdown or other shortcode statements (like gh-codeblock) it needs to be declared as text. For most examples, the tabpane declares the text=true, but if some of the tabs have code examples, the tabpane cannot specify it, and it must be specified in the tabs that do not need automatic code formatting.

贡献

Selenium项目欢迎新的贡献者. 随时间做出重大贡献的个人将成为 提交者 , 并获得对该项目的提交权限.

本指南将指导您完成贡献的过程.

步骤 1: Fork

GitHub上Fork本项目, 并check out到您的本地

% git clone git@github.com:seleniumhq/seleniumhq.github.io.git
% cd seleniumhq.github.io

依赖: Hugo

我们使用 HugoDocsy theme 用于构建和渲染本网站. 你需要Hugo“extended”扩展的Sass/SCSS版本用于这个网站. 我们推荐使用0.110.0或更高版本的Hugo.

请参考来自Docsy的说明 安装Hugo .

步骤 2: 分支

创建一个功能分支并开始工作:

% git checkout -b my-feature-branch

我们实践基于HEAD的开发模式, 这意味着所有更改都直接应用在trunk之上.

步骤 3: 做出改变

本仓库包含站点和文档. 在开始进行更改之前, 请初始化子模块并安装所需的依赖项 (请参阅下面的命令). 要对网站进行更改, 请使用 website_and_docs 目录. 要查看更改的实时预览, 请在站点的根目录上运行 hugo server .

% git submodule update --init --recursive
% cd website_and_docs
% hugo server

请参阅 样式指南 , 以了解更多关于我们约定的信息

步骤 4: 提交

首先确保git知道您的姓名和电子邮件地址:

% git config --global user.name 'Santa Claus'
% git config --global user.email 'santa@example.com'

编写良好的提交信息很重要. 提交信息应描述更改的内容, 原因以及已解决的参考问题(如果有). 撰写时应遵循以下规则:

  1. 第一行应少于或等于50个字符, 并包含对该更改的简短说明.
  2. 保持第二行空白.
  3. 在所有72列处换行.
  4. 包括 Fixes #N, 其中 N 是提交修复的问题编号(如果有).

一个好的提交信息可能看起来像这样:

explain commit normatively in one line

Body of commit message is a few lines of text, explaining things
in more detail, possibly giving some background about the issue
being fixed, etc.

The body of the commit message can be several paragraphs, and
please do proper word-wrap and keep columns shorter than about
72 characters or so. That way `git log` will show things
nicely even when it is indented.

Fixes #141

第一行必须有意义, 因为这是人们在运行 git shortloggit log --oneline 时看到的内容.

步骤 5: Rebase

使用 git rebase (并非 git merge) 同步实时的工作.

% git fetch origin
% git rebase origin/trunk

步骤 6: 测试

永远记住要运行本地服务, 这样做可以确保您的更改没有破坏任何事情.

步骤 7: Push

% git push origin my-feature-branch

访问 https://github.com/yourusername/seleniumhq.github.io.git 并点击 Pull Request 以及填写表格. 请明确您已经签署了CLA (详见步骤 7).

Pull requests通常会在几天内进行审核. 如果有评论要解决, 请在新提交(最好是修正)中应用您的更改, 然后推push到同一分支.

步骤 8: 集成

代码审查完成后, 提交者将获取您的PR并将其集成到项目的trunk分支中. 因为我们希望在trunk分支上保持线性历史记录, 所以我们通常会squash并rebase您的分支历史记录.

沟通

有关如何与项目贡献者和整个社区进行沟通的所有详细信息, 请访问以下网址 https://selenium.dev/support

8.3 - Style guide for Selenium documentation

Conventions for contributions to the Selenium documentation and code examples

Read our contributing documentation for complete instructions on how to add content to this documentation.

Alerts

Alerts have been added to direct potential contributors to where specific content is missing.

{{< alert-content />}}

or

{{< alert-content >}}
Additional information about what specific content is needed
{{< /alert-content >}}

Which gets displayed like this:

Capitalization of titles

Our documentation uses Title Capitalization for linkTitle which should be short and Sentence capitalization for title which can be longer and more descriptive. For example, a linkTitle of Special Heading might have a title of The importance of a special heading in documentation

Line length

When editing the documentation’s source, which is written in plain HTML, limit your line lengths to around 100 characters.

Some of us take this one step further and use what is called semantic linefeeds, which is a technique whereby the HTML source lines, which are not read by the public, are split at ‘natural breaks’ in the prose. In other words, sentences are split at natural breaks between clauses. Instead of fussing with the lines of each paragraph so that they all end near the right margin, linefeeds can be added anywhere that there is a break between ideas.

This can make diffs very easy to read when collaborating through git, but it is not something we enforce contributors to use.

Translations

Selenium now has official translators for each of the supported languages.

  • If you add a code example to the important_documentation.en.md file, also add it to important_documentation.ja.md, important_documentation.pt-br.md, important_documentation.zh-cn.md.
  • If you make text changes in the English version, just make a Pull Request. The new process is for issues to be created and tagged as needs translation based on changes made in a given PR.

Code examples

All references to code should be language independent, and the code itself should be placed inside code tabs.

Default Code Tabs

The Docsy code tabs look like this:

    WebDriver driver = new ChromeDriver();
  
    driver = webdriver.Chrome()
  
    var driver = new ChromeDriver();
  
    driver = Selenium::WebDriver.for :chrome
  
    let driver = await new Builder().forBrowser('chrome').build();
  
    val driver = ChromeDriver()
  

To generate the above tabs, this is what you need to write. Note that the tabpane includes langEqualsHeader=true. This auto-formats the code in each tab to match the header name, and ensures that all tabs on the page with a language are set to the same thing.

{{< tabpane langEqualsHeader=true >}}
  {{< tab header="Java" >}}
    WebDriver driver = new ChromeDriver();
  {{< /tab >}}
  {{< tab header="Python" >}}
    driver = webdriver.Chrome()
  {{< /tab >}}
  {{< tab header="CSharp" >}}
    var driver = new ChromeDriver();
  {{< /tab >}}
  {{< tab header="Ruby" >}}
    driver = Selenium::WebDriver.for :chrome
  {{< /tab >}}
  {{< tab header="JavaScript" >}}
    let driver = await new Builder().forBrowser('chrome').build();
  {{< /tab >}}
  {{< tab header="Kotlin" >}}
    val driver = ChromeDriver()
  {{< /tab >}}
{{< /tabpane >}}

Reference GitHub Examples

To ensure that all code is kept up to date, our goal is to write the code in the repo where it can be executed when Selenium versions are updated to ensure that everything is correct.

All code examples to be in our example directories.

This code can be automatically displayed in the documentation using the gh-codeblock shortcode. The shortcode automatically generates its own html, so we do not want it to auto-format with the language header. If all tabs are using this shortcode, set text=true in the tabpane and remove langEqualsHeader=true. If only some tabs are using this shortcode, keep langEqualsHeader=true in the tabpane and add text=true to the tab. Note that the gh-codeblock line can not be indented at all.

One great thing about using gh-codeblock is that it adds a link to the full example. This means you don’t have to include any additional context code, just the line(s) that are needed, and the user can navigate to the repo to see how to use it.

A basic comparison of code looks like:

{{< tabpane text=true >}}
{{< tab header="Java" >}}
{{< gh-codeblock path="examples/java/src/test/java/dev/selenium/getting_started/FirstScript.java#L46-L47" >}}
{{< /tab >}}
{{< tab header="Python" >}}
{{< gh-codeblock path="examples/python/tests/getting_started/first_script.py#L17-L18" >}}
{{< /tab >}}
{{< tab header="CSharp" >}}
{{< gh-codeblock path="examples/dotnet/SeleniumDocs/GettingStarted/FirstScript.cs#L39-L40" >}}
{{< /tab >}}
{{< tab header="Ruby" >}}
{{< gh-codeblock path="examples/ruby/spec/getting_started/first_script.rb#L16-L17" >}}
{{< /tab >}}
{{< tab header="JavaScript" >}}
{{< gh-codeblock path="examples/javascript/test/getting_started/firstScript.spec.js#L23-L24" >}}
{{< /tab >}}
{{< tab header="Kotlin" >}}
{{< gh-codeblock path="examples/kotlin/src/test/kotlin/dev/selenium/getting_started/FirstScriptTest.kt#L25-L26" >}}
{{< /tab >}}
{{< /tabpane >}}

Which looks like this:


message = driver.find_element(by=By.ID, value="message")

message = driver.find_element(id: 'message')
    let value = await message.getText();
    assert.equal("Received!", value);
        var textBox = driver.findElement(By.name("my-text"))
        val submitButton = driver.findElement(By.cssSelector("button"))

Using Markdown in a Tab

If you want your example to include something other than code (default) or html (from gh-codeblock), you need to first set text=true, then change the Hugo syntax for the tabto use % instead of < and > with curly braces:

{{< tabpane text=true >}}
{{% tab header="Java" %}}
1. Start the driver
{{< gh-codeblock path="examples/java/src/test/java/dev/selenium/getting_started/FirstScript.java#L12" >}}
2. Navigate to a page
{{< gh-codeblock path="examples/java/src/test/java/dev/selenium/getting_started/FirstScript.java#L14" >}}
3. Quit the driver
{{< gh-codeblock path="examples/java/src/test/java/dev/selenium/getting_started/FirstScript.java#L29" >}}
{{% /tab %}}
< ... >
{{< /tabpane >}}

This produces:

  1. Start the driver
        WebDriver driver = new ChromeDriver();
  1. Navigate to a page
        driver.get("https://www.selenium.dev/selenium/web/web-form.html");
  1. Quit the driver
        driver.quit();

This is preferred to writing code comments because those will not be translated. Only include the code that is needed for the documentation, and avoid over-explaining. Finally, remember not to indent plain text or it will rendered as a codeblock.