AI Frontier
Hire Like a Data Scientist #2 | Resumes from any data source

Using Roe AI's VolansDB, we streamlined the process by importing, merging, and analyzing resumes from various platforms, enabling efficient data extraction and comprehensive candidate evaluation through AI-powered SQL queries.

A couple of months after we hired our first engineer Sam, we now have more and more engineering tasks and we have to get more help. This time, we want to find people in Europe and other parts of the world to enlarge the talent pool. And it becomes challenging, as we now have a lot of inbound candidates from different sources to choose from, including ATS systems like Gem, downloaded Resumes from email and platforms like Indeed, and more. Here's how we use Roe AI's VolansDB to solve this challenge.

Import candidate resumes

Load from URLs

We set up scripts to pull candidate information from ATS systems. But the information is not comprehensive and the resumes are in URLs. We can write scripts to download and import to our system, but we now have a much simpler way.

VolansDB's load_url_file function can load a file directly from urls.

Here's what  the original imported table of candidates looks like - we have a column of resume urls in the form of S3 signed links

To load the actual resume files into a new column, simply run:

```

ALTER TABLE candidate_resumes ADD COLUMN resume_file TEXT;

ALTER TABLE candidate_resumes

UPDATE resume_file = load_url_file(resume_url)

WHERE 1;

```

Then after a short while, we can see the files are loaded into the table:

Import from S3

‍Another source of resumes is S3, where we put all the other resumes we received. VolansDB supports import directly from S3 buckets:

Selected from many supported data sources.
Configure connection to the bucket.
Import files into a new VolansDB table.
You can now manage all your imported S3 files in a SQL-like table!

Resume analysis with AI

Merging tables

We can easily join resumes from different sources in to the same table:

```

CREATE TABLE all_resumes ENGINE MergeTree ORDER BY filename AS (

SELECT name as filename, file FROM s3resumes

UNION ALL

SELECT resume_file_name AS filename, resume_file AS file FROM candidate_resumes

);

```

Extract candidate info

We first created a data-processing AI agent specialized for PDF extraction. The platform allows us to configure a specific schema for the JSON output.

We can then apply the agent in batch on all the resumes using SQL, and expand the extraction result to table columns.

```

CREATE TABLE extracted ENGINE = MergeTree ORDER BY filename AS (
 SELECT filename, file,
 run_agent_sync('9f2a55e3-f439-49e0-9b39-25c57a50e08c', map('pdf_file', file)) AS result,
 JSONExtractString(result, 'city') AS city,
 JSONExtractString(result, 'country') AS country,
 JSONExtractString(result, 'candidate_schools') AS candidate_schools,
 JSONExtractBool(result, 'candidate_worked_at_startup') AS candidate_worked_at_startup,
 JSONExtractInt(result, 'years_of_experience') AS years_of_experience,
 JSONExtractString(result, 'companies') AS companies,
 JSONExtractString(result, 'skills') AS skills
 FROM all_resumes
);

```

Extracted entities

This candidate started work since 2013 and we got all the information we want extracted!

Search, Aggregate, Explore

‍The power of the engine is not only limited to execute extraction at scale, but also semantic search the PDF files directly, and doing traditional SQL filtering and aggregation on the extracted entities.

Closing thoughts

With Roe AI, you can connect to any of your file data sources as well as any ERP platform with the upcoming connectors and Zapier support. And you can analyze any type of data with the power of AI simply by using SQL.

If you are interested in learning more: