Find out what merchant is selling based on website visuals
Often times, a lot of important business questions must depend on these complex web data
However, the rule-based systems often fall short of answering these tough questions.
Now, Large Vision Model (LVM) is smart enough to do this job!
And what if we combine LVM with SQL?
Bingo, we can answer these questions from a LOT of webpages!
Let's see it in action!
Usually I have to start a python worker to do this manually, but Roe has the built-in functionality for convenience
All I have to do is to create this Roe agent, which does one thing - based on a URL, take a long screenshot of the web page.
Here is one example, an Etsy merchant selling Mickey Mouse clothing.
-- get the screenshot of the webpage
SELECT url, ROE('<agent_id>', MAP('url', <url>)) FROM url_data;
Once the webpages are crawled as screenshot images, we could "ask" the images with Roe's multi-modal data extraction function.
Let's try ask it to classify the webpage screenshot into 4 categories, great we can see the labels are extracted in JSON
-- AI-powered SQL dialect on Roe that extract structured data from multi-modal data
SELECT url, extract_from(<natural language prompt>, <image>) FROM url_data
Here, it feels that putting up the JSON string is cumbersome.
So let's create a new LVM-based data extraction agent, where we can define JSON schema via UI.
Finally, let's run the SQL with our new data extraction agent + screenshot agent
Great, now I can see the etsy webpage contains a Disney representation
-- get the screenshot of the webpage-- pass the screenshot to the next agent
SELECT url, roe(<LVM_agent>, MAP('instruction', '', 'image', roe(<screenshot_agent>, MAP('url', <url>))) FROM url_data
That's it, no longer need to set up complex infrastructure to extract data from thousands of webpages!
One line SQL + LVM is all you need!
To learn more or get a trial, schedule a demo at getroe.ai
Interested in joining? Apply here