AI Frontier
Richard Meng
|
October 24, 2024

AI Frontier Data Analyst Super Power | How to analyze merchant homepages visually at scale with SQL

Find out what merchant is selling based on website visuals

Today, analyzing a web page takes a village

  • a webpage has complex layout and complex data types
  • webpages usually come in a large volume

Often times, a lot of important business questions must depend on these complex web data

  • I want to onboard this merchant to my platform, but is this merchant a fraudster?
  • I want to make sure this merchant is not selling cannabis, weapons or other illegal contents
  • Based on what this company is offering on its website, could this be a potential customer that I can sell to?

However, the rule-based systems often fall short of answering these tough questions.

Now, Large Vision Model (LVM) is smart enough to do this job!

And what if we combine LVM with SQL?

Bingo, we can answer these questions from a LOT of webpages!

Let's see it in action!

GOAL: I have a few hundred homepages, I need to find what is this merchant selling

Step 1 - convert webpages to screenshot

Usually I have to start a python worker to do this manually, but Roe has the built-in functionality for convenience

All I have to do is to create this Roe agent, which does one thing - based on a URL, take a long screenshot of the web page.

Here is one example, an Etsy merchant selling Mickey Mouse clothing.

SQL Query

-- get the screenshot of the webpage 
SELECT url, ROE('<agent_id>', MAP('url', <url>)) FROM url_data;

Once the webpages are crawled as screenshot images, we could "ask" the images with Roe's multi-modal data extraction function.

Let's try ask it to classify the webpage screenshot into 4 categories, great we can see the labels are extracted in JSON

SQL Query (cont'd)

-- AI-powered SQL dialect on Roe that extract structured data from multi-modal data
SELECT url, extract_from(<natural language prompt>, <image>) FROM url_data


Here, it feels that putting up the JSON string is cumbersome.

So let's create a new LVM-based data extraction agent, where we can define JSON schema via UI.

Finally, let's run the SQL with our new data extraction agent + screenshot agent

Great, now I can see the etsy webpage contains a Disney representation

SQL Query (final)

-- get the screenshot of the webpage-- pass the screenshot to the next agent
SELECT url, roe(<LVM_agent>, MAP('instruction', '', 'image', roe(<screenshot_agent>, MAP('url', <url>))) FROM url_data

That's it, no longer need to set up complex infrastructure to extract data from thousands of webpages!

One line SQL + LVM is all you need!

To learn more or get a trial, schedule a demo at getroe.ai

Interested in joining? Apply here

All blogs