Using Roe AI's VolansDB, we streamlined the process by importing, merging, and analyzing resumes from various platforms, enabling efficient data extraction and comprehensive candidate evaluation through AI-powered SQL queries.
We set up scripts to pull candidate information from ATS systems. But the information is not comprehensive and the resumes are in URLs. We can write scripts to download and import to our system, but we now have a much simpler way.
VolansDB's load_url_file
function can load a file directly from urls.
Here's what the original imported table of candidates looks like - we have a column of resume urls in the form of S3 signed links
To load the actual resume files into a new column, simply run:
```
ALTER TABLE candidate_resumes ADD COLUMN resume_file TEXT;
ALTER TABLE candidate_resumes
UPDATE resume_file = load_url_file(resume_url)
WHERE 1;
```
Then after a short while, we can see the files are loaded into the table:
Another source of resumes is S3, where we put all the other resumes we received. VolansDB supports import directly from S3 buckets:
We can easily join resumes from different sources in to the same table:
CREATE TABLE all_resumes ENGINE MergeTree ORDER BY filename AS (
SELECT name as filename, file FROM s3resumes
UNION ALL
SELECT resume_file_name AS filename, resume_file AS file FROM candidate_resumes);
We first created a data-processing AI agent specialized for PDF extraction. The platform allows us to configure a specific schema for the JSON output.
We can then apply the agent in batch on all the resumes using SQL, and expand the extraction result to table columns.
CREATE TABLE extracted ENGINE = MergeTree ORDER BY filename AS (
SELECT filename, file,
run_agent_sync('9f2a55e3-f439-49e0-9b39-25c57a50e08c', map('pdf_file', file)) AS result,
JSONExtractString(result, 'city') AS city,
JSONExtractString(result, 'country') AS country,
JSONExtractString(result, 'candidate_schools') AS candidate_schools,
JSONExtractBool(result, 'candidate_worked_at_startup') AS candidate_worked_at_startup,
JSONExtractInt(result, 'years_of_experience') AS years_of_experience,
JSONExtractString(result, 'companies') AS companies,
JSONExtractString(result, 'skills') AS skills FROM all_resumes);
This candidate started work since 2013 and we got all the information we want extracted!
The power of the engine is not only limited to execute extraction at scale, but also semantic search the PDF files directly, and doing traditional SQL filtering and aggregation on the extracted entities.
With Roe AI, you can connect to any of your file data sources as well as any ERP platform with the upcoming connectors and Zapier support. And you can analyze any type of data with the power of AI simply by using SQL.
If you are interested in learning more: