I tried 'Skyvern', which uses LLMs and computer vision to automate browser-based workflows and, unlike code, does not need to be rewritten every time a website changes



Skyvern is a tool that automates browser workflows by giving an AI instructions in plain text. Unlike automation written in code, it can cope with slight changes in a website's structure. It is a paid service, but at the time of writing, creating an account reportedly came with a $5 credit (about 770 yen), so I tried it out to see how usable it is.

Skyvern - Automate Browser-Based Workflows with AI
https://www.skyvern.com/

GitHub - Skyvern-AI/skyvern: Automate browser-based workflows with LLMs and Computer Vision
https://github.com/Skyvern-AI/Skyvern


Skyvern is open source and can be self-hosted using Docker, but this time I used the cloud version provided by Skyvern. Visit the Skyvern page and click 'Get Started'.



Click 'Sign up' to create an account.



This time, we will use a Google account. Click 'Continue with Google'.



Click the account you want to log in with.



Click 'Next'.



The account has been created. Enter the task you want Skyvern to perform. In this example, I entered "Search for 'Nikkei Stock Average' in Google Finance. The task will be completed when you get the latest closing price," and clicked the paper airplane icon on the right.



The AI will automatically convert it into a Skyvern task format. Check that the URL and instructions are as intended, then click 'Run.'



The task has been created. Click 'View'.



The task will be in a 'running' state, and you will be able to see the AI's operation in real time.



The task was completed in about 5 minutes. The closing price of the Nikkei Stock Average was correctly extracted under the name 'latest_closing_price'. The task took 2 steps and 4 actions to complete. The cloud version of Skyvern costs 0.1 dollars (about 15 yen) per processed page, and since only one page was processed this time, the fee was 0.1 dollars (about 15 yen).
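As a rough illustration of the per-page pricing, here is a minimal Python sketch. It assumes only the figures quoted in this article (0.1 dollars per processed page and a 5 dollar sign-up credit). The function names are made up for this example and are not part of Skyvern.

```python
from decimal import Decimal

# Figures quoted in this article; actual pricing may differ.
PRICE_PER_PAGE_USD = Decimal("0.10")
SIGNUP_CREDIT_USD = Decimal("5.00")


def task_cost(pages_processed: int) -> Decimal:
    """Return the fee in dollars for a task that processed the given number of pages."""
    if pages_processed < 0:
        raise ValueError("pages_processed must not be negative")
    return PRICE_PER_PAGE_USD * pages_processed


def pages_covered_by_credit(credit: Decimal = SIGNUP_CREDIT_USD) -> int:
    """Return how many whole pages the given credit can pay for."""
    return int(credit // PRICE_PER_PAGE_USD)


# The task above processed a single page.
print(task_cost(1))
print(pages_covered_by_credit())
```

Decimal is used instead of float so that amounts such as 0.1 dollars are represented exactly.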



The actions performed by Skyvern are displayed in the bottom right, and clicking them will show a screenshot of the action taken. This is useful for quickly identifying which action failed when a task doesn't work properly. In this case, the two actions 'Text input' and 'Click' were marked as 'Fail' for some reason, but the operation was successful.



In the 'Recording' tab, you can watch a video of the AI's entire operation. Judging by the length of the video, the AI took 4 minutes and 17 seconds to complete the task.



In 'Parameters', you can check the parameters that were set during operation. If the task fails, you can edit these parameters and run it again.



The 'Diagnostics' tab stores data on the processing performed internally by Skyvern. Looking at 'LLM Request (Raw)', it appears that OpenAI's GPT-4o mini model is running behind Skyvern.



Skyvern itself does not have built-in AI, so if you self-host Skyvern, you will need to prepare some kind of AI yourself. According to the documentation, the supported AI models are as follows:

OpenAI: 'gpt4-turbo', 'gpt-4o', 'gpt-4o-mini'
Anthropic: 'Claude 3 (Haiku, Sonnet, Opus)', 'Claude 3.5 (Sonnet)'
Azure OpenAI: All GPT models
AWS Bedrock: 'Claude 3 (Haiku, Sonnet, Opus)', 'Claude 3.5 (Sonnet)'

In addition, support for 'Ollama', 'Gemini', and 'Llama 3.2' is planned for the near future.

Let's try a slightly more complicated task. This time, I entered the prompt "Log in to Amazon.co.jp and check your purchase history. The task will be completed when you get the data of your most recent purchases," and clicked the paper airplane button.



The AI will again create task details. Click 'Show Advanced Settings'.



An item called 'login_credentials' has been created, so replace it with the email address and password of the account you want to log in to and click 'Run'.



Skyvern proceeded smoothly to the login screen, but it could not get past two-step authentication and the task ended as 'failed'. When a task fails, 'Failure Reason' describes where it failed. Note that a fee is charged even if the task fails. This time, a new page was opened every time Skyvern retried the two-step authentication, so the total cost came to 1 dollar (about 154 yen).



In addition to tasks, you can also create 'Workflows' in Skyvern. Click 'Create Workflow'.



You can create a task by clicking the '+' button.



By linking multiple tasks as shown in the image below, it appears that values obtained in one task can be used in the next. Workflows are still an experimental feature.
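The idea of passing values from one task to the next can be sketched in plain Python. This is only a conceptual illustration and not Skyvern's workflow format or API. Every name here is hypothetical, and the first task returns a placeholder instead of driving a browser.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]
Task = Callable[[Context], Context]


def get_closing_price(ctx: Context) -> Context:
    # Stand-in for a browser task; returns a placeholder, not real data.
    return {"latest_closing_price": "PLACEHOLDER"}


def build_report(ctx: Context) -> Context:
    # Uses the value produced by the previous task.
    return {"report": f"Nikkei closing price: {ctx['latest_closing_price']}"}


def run_workflow(tasks: List[Task]) -> Context:
    """Run tasks in order, merging each task's outputs into a shared context."""
    ctx: Context = {}
    for task in tasks:
        ctx.update(task(ctx))
    return ctx


result = run_workflow([get_closing_price, build_report])
print(result["report"])
```

Each task only reads from the shared context and returns new values, so tasks can be reordered or replaced as long as the keys they depend on are produced earlier in the list.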


in Review, Software, Web Service, Web Application, Posted by log1d_ts