I tried out Skyvern, a browser-based workflow automation tool that uses LLMs and computer vision. Unlike code-based automation, it doesn't need to be rewritten every time a website changes.

Skyvern automates browser workflows from plain-text instructions given to an AI. Its key advantage over code-based automation is that it can adapt to slight changes in a website's structure. It is a paid service, but at the time of writing a new account comes with a $5 credit, so I decided to test it out.
Skyvern - Automate Browser-Based Workflows with AI
https://www.skyvern.com/
GitHub - Skyvern-AI/skyvern: Automate browser-based workflows with LLMs and Computer Vision
https://github.com/Skyvern-AI/Skyvern
Skyvern is open source and can be self-hosted using Docker, but this time I used the cloud version operated by Skyvern. Visit the Skyvern page and click 'Get Started'.

Click 'Sign up' to create an account.

This time, I will use a Google account. Click 'Continue with Google.'

Click the account you want to log in with.

Click 'Next'.

My account was successfully created. Next, I entered the task I wanted Skyvern to perform. In this example, I entered "Search for 'Nikkei Stock Average' in Google Finance. The task will be completed once the latest closing price is obtained," and clicked the paper airplane icon on the right.

The AI automatically converts the prompt into Skyvern's task format. Check that the URL and instructions are correct, then click 'Run.'
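As a rough illustration of what that review step checks, the sketch below models a converted task as a small Python data structure. The class name `TaskDraft`, its fields, and the `review` helper are hypothetical names chosen for this example and are not taken from Skyvern's API; the start URL is likewise an assumed value, since the article does not state which URL the AI filled in.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class TaskDraft:
    """Hypothetical stand-in for the task form generated from a prompt."""
    url: str              # page where the browser session starts
    instructions: str     # what the AI should do, in plain text
    extraction_goal: str  # what data counts as the result


def review(draft: TaskDraft) -> list[str]:
    """Return the problems to fix before running the task (empty if none)."""
    problems = []
    parsed = urlparse(draft.url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http(s) URL")
    if not draft.instructions.strip():
        problems.append("instructions must not be empty")
    return problems


# Assumed values, modeled on the prompt used in this article.
draft = TaskDraft(
    url="https://www.google.com/finance/",
    instructions="Search for 'Nikkei Stock Average'.",
    extraction_goal="The latest closing price.",
)
print(review(draft))
```

An empty list means both the URL and the instructions passed this basic check; anything more specific still has to be confirmed by eye in the form itself.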

The task has been created. Click 'View'.

The task is in 'running' state, and you can see the AI's operation in real time.

The task was completed in about 5 minutes. The closing price information for the Nikkei Stock Average was properly extracted under the name 'latest_closing_price.' The task required two steps and four actions. The cloud version of Skyvern costs $0.10 (approximately 15 yen) per processed page, and since only one page was processed this time, the cost was $0.10 (approximately 15 yen).
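The per-page pricing can be written out as a short calculation. This is a minimal sketch that assumes the only charge is the $0.10 per processed page quoted above; the function name `task_cost_usd` is made up for this example.

```python
from decimal import Decimal

# Cloud price per processed page, as quoted in this article.
PRICE_PER_PAGE_USD = Decimal("0.10")


def task_cost_usd(pages_processed: int) -> Decimal:
    """Cost of a task, assuming a flat charge per processed page."""
    if pages_processed < 0:
        raise ValueError("pages_processed must not be negative")
    return PRICE_PER_PAGE_USD * pages_processed


print(task_cost_usd(1))  # 0.10
```

`Decimal` is used instead of `float` so that amounts such as 0.10 stay exact when multiplied.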
The actions performed by Skyvern are listed in the bottom right corner, and clicking one displays a screenshot of that action. This is useful for quickly identifying which action failed if the task doesn't work properly. In this case, two actions, 'Text Input' and 'Click,' were marked as 'Fail' for some reason, even though they had actually succeeded.

The 'Recording' tab allows you to watch a video of the entire AI operation. According to the length of the video, it took the AI 4 minutes and 17 seconds to complete the task.

In 'Parameters', you can check the parameters set during operation. If the task fails, you can edit these parameters and run it again.

The 'Diagnostics' tab stores data on the processing performed internally by Skyvern. Looking at 'LLM Request (Raw),' it appears that OpenAI's GPT-4o mini model is running behind the scenes.

Skyvern itself does not include a built-in AI model, so if you self-host Skyvern, you will need to supply one yourself. According to the documentation, the supported AI models are as follows:
OpenAI: 'gpt4-turbo', 'gpt-4o', 'gpt-4o-mini'
Anthropic: Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Azure OpenAI: All GPT models
AWS Bedrock: Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
In addition, support for 'Ollama,' 'Gemini,' and 'Llama 3.2' is planned for the near future.
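When planning a self-hosted setup, the list above can be kept as a small lookup table. The sketch below is only a convenience for reading that list; the names `SUPPORTED` and `providers_for` are made up for this example, the model strings are labels from this article rather than configuration values, and Azure OpenAI is left out because the article lists it only as "All GPT models" without naming them.

```python
# Claude models listed in this article for both Anthropic and AWS Bedrock.
CLAUDE_MODELS = [
    "Claude 3 Haiku",
    "Claude 3 Sonnet",
    "Claude 3 Opus",
    "Claude 3.5 Sonnet",
]

# Provider -> models, as listed in this article (not configuration keys).
SUPPORTED = {
    "OpenAI": ["gpt4-turbo", "gpt-4o", "gpt-4o-mini"],
    "Anthropic": CLAUDE_MODELS,
    "AWS Bedrock": CLAUDE_MODELS,
}


def providers_for(model: str) -> list[str]:
    """Return the providers that list the given model, sorted by name."""
    return sorted(p for p, models in SUPPORTED.items() if model in models)


print(providers_for("Claude 3.5 Sonnet"))  # ['AWS Bedrock', 'Anthropic']
print(providers_for("gpt-4o-mini"))        # ['OpenAI']
```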
Let's try a slightly more complicated task. This time, I typed in the prompt, 'Log in to Amazon.co.jp and check your purchase history. The task will be completed once you have retrieved the data for your most recent purchases,' and clicked the button with the paper airplane icon.

Once again, the AI will create the task details. Click 'Show Advanced Settings.'

An item called 'login_credentials' has been created, so replace its contents with the email address and password of the account you want to log in to, then click 'Run'.

Skyvern proceeded smoothly up to the login screen, but it could not pass two-factor authentication and the task was marked as 'failed.' When a task fails, the reason is shown under 'Failure Reason.' Please note that a fee is charged even if the task fails. In this case, a new page was opened each time two-factor authentication was retried, so the total cost was $1 (approximately 154 yen).
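Working backwards from that charge gives a sense of how many pages the retries consumed. This is a minimal sketch that assumes the whole $1 was billed at the $0.10 per page rate quoted earlier in this article; the function name `pages_billed` is made up for this example.

```python
from decimal import Decimal

# Cloud price per processed page, as quoted in this article.
PRICE_PER_PAGE_USD = Decimal("0.10")


def pages_billed(total_usd: Decimal) -> int:
    """Number of pages implied by a total charge at the flat per-page rate."""
    pages, remainder = divmod(total_usd, PRICE_PER_PAGE_USD)
    if remainder:
        raise ValueError("total is not a whole number of pages")
    return int(pages)


print(pages_billed(Decimal("1.00")))  # 10
```

Under that assumption, the failed login attempt was billed for ten pages, ten times the cost of the successful Google Finance task.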

In addition to tasks, Skyvern also allows you to create 'Workflows.' Click 'Create Workflow.'

Click the '+' button to create a task.

By linking multiple tasks, it appears possible to use the values obtained in one task as input to the next. Note that Workflow is still an experimental feature.
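The idea of passing values from one task to the next can be sketched in plain Python. This is not how Skyvern implements Workflows, which the article does not describe; `run_workflow`, `fetch_price`, and `build_report` are hypothetical names, and the price is a placeholder rather than real data.

```python
from typing import Callable

Context = dict[str, str]
Task = Callable[[Context], Context]


def run_workflow(tasks: list[Task]) -> Context:
    """Run tasks in order, letting each one read earlier tasks' outputs."""
    context: Context = {}
    for task in tasks:
        # Each task sees everything produced so far and adds its own values.
        context.update(task(context))
    return context


def fetch_price(context: Context) -> Context:
    # Placeholder value standing in for data a browser task would extract.
    return {"latest_closing_price": "12345.67"}


def build_report(context: Context) -> Context:
    # Uses the value produced by the previous task.
    return {"report": f"Nikkei closed at {context['latest_closing_price']}"}


result = run_workflow([fetch_price, build_report])
print(result["report"])  # Nikkei closed at 12345.67
```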

in AI, Software, Web Service, Review, Web Application, Posted by log1d_ts







