SQL Query function in Excel
If you’re reading this article you probably know that Google Sheets has a QUERY function that allows you to run SQL-like queries against data in the sheet. This function lets you do all sorts of gymnastics with the data in your sheet, be it filtering, aggregating, or pivoting data.
Being a fully-fledged desktop app, Excel tends to be more feature-rich than Google Sheets. This is especially true in the data analytics department where Excel shines with advanced Excel functions as well as Power Query functionality.
However, Excel doesn’t natively have a QUERY function that you can use in cells on the sheet.
In this blog post, I’m going to show you how to add a QUERY function to Excel and give a few examples of how to use it.
Let’s start by taking a look at the function in action.
The function is pretty straightforward. It accepts the SQL query as the first parameter and returns a table with the results of the query.
The results automatically spill to the necessary amount of space. This spilling behavior relies on the dynamic array functionality that’s available in Excel 365 (but isn’t in earlier versions of Microsoft Excel).
Works with Excel tables
In Google Sheets, the QUERY function references data by address (e.g. “A1:B10”) while columns are referenced by letters (e.g. A, B, C…).
This works but has some drawbacks:
- It makes the query sensitive to the location of the data. If the data is moved or if columns are reordered, the query will break.
- It makes the query difficult to read since it uses range addresses and column letters instead of table and column names (e.g. Employees, DateOfBirth…)
- Adding or removing rows can break the query. For example, if the provided range is “A1:H10” the query will only take into account the first 10 rows. If additional rows are added, the query will not take them into account. You can get around this by omitting the end row number (e.g. “A1:H”), but this means that there must be no other content below the data range.
Excel, on the other hand, allows explicitly defining tables (aka ListObjects) that delineate the areas that hold data. Each Excel table has a name, as do its columns. This makes Excel tables very similar to database tables and makes them easier to work with from SQL.
Full SQL syntax support (SQLite)
Under the hood, the Windy.Query function is powered by SQLite – a small but powerful embedded database engine.
When called, the function passes the query to the built-in SQLite engine which has an adapter that lets it use Excel tables as its data source.
This means that the entire SQLite syntax is available for use in queries. In comparison, in Google Sheets, the query syntax is rather limited. It only supports a single table (no joins) and a very small set of built-in functions.
Examples of use
Since the engine under the hood is SQLite, queries can use all operations available in SQLite, including table joins, temp tables, column table expressions, window functions etc… Let’s go over some examples of how to use these in Excel.
Here’s an example of a simple one-to-many join:
The usual way of doing a simple operation such as this one in Excel would be to use xlookup or PowerQuery, but SQL is now another option. And if we needed anything more complex than a simple join, SQL would quickly shine as the most powerful and convenient option of the three.
Merging table rows (union)
Another way we might want to combine two (or more) tables is to combine their rows. We can do this with a SQL UNION operator.
The tables might have some rows in common. If we want to keep only one instance of such rows we would use the regular UNION operator. If we want to keep both versions of rows that are in common, we would use the UNION ALL operator.
Finding differences between two tables
In the previous example, we had two tables that had some rows in common and some rows not. Let’s assume, for example, that the first table contains last year’s list of employees and the second table is the new list of employees.
If we wanted to find out the differences between the two tables, we could easily do that with a bit of SQL.
All of the rows that are in the first table but not in the second one we will mark as “deleted”. All of the rows that are in the second table but not in the first one we will mark as “added”. Here’s what that SQL query looks like:
select id, name, 'deleted' from employees e where not exists (select * from Employees_New en where e.id == en.Id) union select id, name, 'added' from employees_new en where not exists (select * from Employees e where e.id == en.Id)
And here’s what the result looks like:
Another useful thing we might want to do is rank rows based on some criteria. For example, suppose we have a table with a list of cities. For each city we have its population and the country it belongs to.
Our task is to find the top 3 cities in each country based on population. Here’s how we might do that in SQL.
-- we use this CTE so we can reference the calculated 'rank_pop' column in the where clause with cte as ( select city, country, population, -- using the RANK() window function RANK() OVER (PARTITION BY country ORDER BY population) as rank_pop from cities c) select * from cte where -- filtering by the 'rank_pop' column from the CTE rank_pop <= 3 order by country, rank_pop
This query is a bit more complex than the previous ones. It uses a common table expression and a window function (the rank function), and showcases the ability to write complex SQL in queries.
Queries can also make use of dozens of built-in SQLite functions. Various specialized extended functions such as RegexReplace, GPSDist (GPS distance between two points) and LevDist (fuzzy text matching) are also available.
OK, this next example is a bit of a hack, but a useful one… The query you supply doesn’t need to be a SELECT query. You can do UPDATE/INSERT/DELETE statements as well, and these will modify the data in the target Excel tables.
This can be a handy way to clean and transform data in your tables in place, without having to export/import the data to an external database (e.g. SQL Server, MySql, Postgres…).
This works because the SQLite engine isn’t copying the data. Rather, it’s using an adapter that lets it access live data in the Excel table.
How does the function see Excel tables?
At first glance, it might seem strange that the query can access your workbook tables. After all, we did not pass them in as parameters, and functions normally only work with parameters that are passed to them.
However, the Windy.Query function is aware of the workbook it’s being called from and it can read data from the workbook’s tables without the need for passing them in as parameters. This makes the function much easier to call especially when working with multiple tables.
Results returned by the Windy.Query function can optionally include headers. This is controlled by the second parameter of the function.
The texts in the column headers are determined by the SQL query itself. You can easily rename result columns by aliasing them in the select list.
Automatically refresh results
By default, the SQL query runs as a one-off operation when you enter the formula but does not refresh if the source tables change. However, if you want the query to refresh whenever one of the source tables changes, you can easily do so by setting the autoRefresh argument to true.
Note that the auto-refresh functionality relies on Excel’s RTD (Real-Time Data) server. The RTD server usually throttles updates so functions don’t overwhelm Excel with frequent updates. The default throttle interval is 2s meaning that the function will not update more than once every 2s. To improve responsiveness, you can lower this value to something like 20ms. The simplest way to do this is through the “Configure” dialog in the QueryStorm runtime’s ribbon.
When needed, SQL queries can use values from cells as parameters. To use a cell as a parameter in a query, start by giving the cell a name (named range).
Once the cell has a name, you can reference it in the query using the @paramName or $paramName syntax.
If automatic refresh is turned on, results will automatically refresh whenever one of the parameter cells changes its value.
This is all well and good for small tables, you might think, but how does it handle large data sets? Well, it handles them quite well. The function can read source tables of 100k rows and 10 columns within a few milliseconds and can return this amount of data in a second or two. In addition to this, all columns are automatically indexed so searches and joins are extremely performant as well.
This makes the function perform very well, both from the data throughput standpoint as well as from the computational one.
OK, so is this better than the Google Sheets version of the QUERY function?
Yes, dah. Did you read the previous chapters? 😛
Installing the Windy.Query function
So how do you install this function into your Excel? It’s a simple 2-step process.
Step 1 is to install the QueryStorm Runtime add-in (if you don’t already have it). This is a free, 4MB add-in for Excel that lets you install and use various extensions for Excel. It’s basically an app store for Excel.
Step 2 is to click the “Extensions” button in the “QueryStorm” tab in the Excel ribbon, find the Windy.Query package in the “Online” tab, and install it.
What happens if I share the workbook with a user who doesn’t have the function?
Nothing bad. If the other user doesn’t have the function installed, they will see the last results of the query that were returned on your machine. They just won’t be able to refresh the results.
Advanced SQL query editor
Writing SQL queries in the formula bar can get a bit unwieldy. To make queries easier to write, it’s better to use a proper editor, preferably one that offers syntax highlighting and code completion for SQL and knows about the tables in your workbook.
For this purpose, I recommend using the QueryStorm IDE. This is an advanced IDE that lets you use SQL in Excel. You can write the query in the QueryStorm code editor and then paste the query into the Windy.Query function when you’re happy with it (if needed).
The IDE does more than just allow using SQL in Excel. You can use it to create and share functions and addins for Excel. In fact the QueryStorm IDE was used to create the Windy.Query function itself.
The IDE has a free community version for individuals and small companies, while users in larger companies can make use of the free trial license. For paid licenses, check out the pricing page.
You can read more in this blog post that’s dedicated to the QueryStorm SQL IDE.
For a video demonstration of the Windy.Query function, take a look the following video: