Relation for Excel

by Matthias Bürcher @ 2016-2017 Version 2

Freeware

This macro provides functions to make simple relational algebra.
Unlike other Excel solutions, this one is purely functional, not using macros.
Relations are saved as text in one cell with double colon (::) as field separator and space+newline as row separator. The first line are the column names.
Note that in a cell, the text cannot be more than 32k characters.

Download

relation-for-excel-200.zip

The zip file contains:

(Note that the examples file does not work 100% with Mac Excel 2011, because the FORMULATEXT() function does not exist yet in this version, so you cannot see the formulas nicely. However the relation code works fine.)

This presentation gives you a good start to understand relational algebra and its use with these functions
relation-for-excel-200-presentation.pdf

Github: https://github.com/bellenuit/relation-for-excel

Relation for Google Sheets is about to be developped.

Installation

You can either copy the module in every sheet or you can install the add-in.

Copy relation-for-excel-200.xlam into the add-in folder of Excel. The location of this folder changes from Excel version to Excel version, on my version it is
/Applications/Microsoft Office 2011/Office/Add-ins

Launch Excel and enable the add-in. On my version, this is the menu Tools:Add-Ins

Open the file relation-for-excel-200-examples.xlsm. The file has the module included, so it works also without the enabled add-in.

To test, if the add-in works, you have two possibilities:

code::name
1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega
5::Fujitsu
6::Winchester

Note that you need to copy the add-in on all Excel installations you use the functions or you will see the error #NAME?# in the cell that use the functions.

Introduction

Many people use Excel as a database because nothing else is available or allowed in the company they work. They collect data, which is often relational, but Excel cannot express the relations within the worksheet. Excel provides advanced filtering and pivot tables, but as these tools are interesting to explore data, they have no persistence: If you change a filter or a sort order, you loose the old one. It is also very difficult to build relations between data. This leads to duplicate data entries and many repetitive hand work.

What Excel is missing is a kind of query language like SQL to explore data and to display the result. Google introduced a query function in Sheets, which is already a step in the right direction. There are also some people who have developed macros that compile SQL queries to Excel formulas and then apply them to the data. The problem of these approaches is that queries are not updated when data changes.

The problem is also conceptual. The result of a SQL query is not a single value, but a table of values. Functions normally calculate only a single value they return to a single cell. You can have a common formula for a range of cells, but things gets quickly complex because you may not know the number of rows and columns in advance.

Relation for Excel goes a different approach in many ways:

This is the deal:

There is a limitation:

You have to know:

Relation for Excel does simplify the relational model in two ways:

I hope you can live with that. The reason for the simplification was to keep the syntax simple.

Imagine you have the SQL query

SELECT country, SUM(sales) 
FROM manufacturers 
WHERE manufacturer 
LIKE 'S%' 
GROUP BY country

If the data is in the range A2:C30 (manufacturer,country,sales) then the query would be expressed like

=relOrder(relProject(relSelect(relRange(A2:C30),"relLike($manufactuer,""S*""")"country::sales SUM"),"country A")

or you can also write in one function

=relFilter(A2:C30,"S relLike($manufacturer,""S*"")","P country::sales SUM","O country A")

But this is only half of the story. The power of Relation for Excel is in the fact that you do not have to write one big formula in one cell. You can split it up and each cell is a view with intermediate results you can inspect and use for further operations.

Explore the example workbook, where we have sheets with a presentation and exercises computer store, employees, warehouse, pieces and providers, accounts.

Usage

You can work two ways:

The basics is to convert a range of cells into a relation with relRange(), then make all operations on the relations and then either display the result multiline in a cell or recover the results in individual cells with relCell() or relCellArray().

Relations have more than one line in a cell. To display the relation, you can format the cell: Menu Format:Cells Tab Alignment set Wrap Text and also Text alignment vertical "top". You need also to adjust the size of the row to display all lines.

RelRange(rn ar Range)

Creates a relation from a Range. The result is a text where rows are separated with newlines and columns with ::

Exemple

- A B
1 id manufacturer
2 1 Sony
3 2 Creative Labs
4 3 Hewlett Packard
5 3 Iomega
6 5 Fujitsu
7 6 Winchester
8 id::manufacturer
1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega
5::Fujitsu
6::Winchester

A8 has the formula =relRange(A1:B7)

relCell(rel as String, row as integer, col as Integer, Optional Numeric As Boolean = False, Optional noError As Boolean = False)

Gets a cell in a relation.
One based row and column.

Exemple:

relCell(A8,3;2) "Hewlett Packard"

relCellArray(rel as String, Optional noHeader As Boolean = False)

There is an alternative syntax, where you can enter the formula for an entire range with control-shift-enter (adding curly braces) and suppressing errors.

relSelect(rel as String, condition as String)

Filters a relation by a condition, which can be any Excel formula. The formula can use values from the current tuple: columns are selected by name or position and preceded by dollar sign if used as string and a percent sign if used as number. You may have to double quote strings inside the formula and use the comma as parameter separator inside Excel functions.

Exemples:

relSelect(A8,"%id<3") id::manufacturer
1::Sony
2::Creative Labs
relSelect(A8, "LEFT($manufacturer,1)=""F""") id::manufacturer
5::Fujitsu

relProject(rel as String, list as String)

Filters columns based on list of names or columns. The list is separated by double colon(::). Columns are 1-based. You can also aggregate on columns using the aggregators SUM COUNT AVG MIN MAX. As relations work on sets, the operator project can also work as group operator.

Exemples:

relProject(A8, "id") id
1
2
3
4
5
6
relProject(A8, "id SUM) id_sum
21

relExtend(rel as String, expression as String, Optional ByVal name As String)

You can add columns with calculations based on existing columns. The expression can by any Excel formula. Columns are selected by name or position and preceded by dollar sign, if used as string and a percent sign, if used as number. You may have to double quote strings inside the formula.

Example:

relExtend(A8, "%1*2","double") id::manufacturer::double
1::Sony::2
2::Creative Labs::4
3::Hewlett Packard::6
4::Iomega::8
5::Fujitsu::10
6::Winchester::12

relRename(rel as String, list As String)

You can rename columns. This is most needed before joins. You can rename multiple columns separated by double colons ::.

Example:

relRename(A8, "id manid") manid::manufacturer:
1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega
5::Fujitsu
6::Winchester

relUnion(rel1 as String, rel2 as String)

Union checks for arity, common columns and eliminates duplicates.

Example:

relUnion(relSelect(A8,"%id4")) id::manufacturer
1::Sony
5::Fujitsu
6::Winchester

relDifference(rel1 as String, rel2 as String)

Difference checks for arity.

Example:

relDifference(A8,relSelect(A8,"%1id>4")) id::manufacturer
1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega

relIntersect(rel1 as String, rel2 as String)

Intersection checks for arity.

relJoin(rel1 as String, rel2 as String,condition as string)

Join can be a theta join (specific condition) or a special join based on common column names: Use the as condition the keywords NATURAL, LEFT, RIGHT, OUTER, LEFTSEMI, RIGHTSEMI, LEFTANTISEMI, RIGHTANTISEMI.
For cross product set condition to true.
Self join is also possible.

Example:

- A B
10 id product
11 1 Walkman
12 1 Playstation
13 3 Printer
14 relJoin(A7,Range(A10:B13),"%1=%3") id1::manufacturer::id2::product
1::Sony::1::Walkman
1::Sony::1::Playstation
3::Hewlett Packard::3::Printer
15 relJoin(A7,Range(A10:B13),"NATURAL") id::manufacturer::product
1::Sony::Walkman
1::Sony::Playstation
3::Hewlett Packard::Printer

relOrder(rl as String, list as String)

This is not really relational algebra, as sets have no order, but it is still useful.
List separated by ::, each item is name or column 1-based, space and an order operator
A alpha, Z alpha reverse, 9 number top down, 1 number bottom up.

Example:

relOrder(A8 "manufacturer A") id::manufacturer
2::Creative Labs
5::Fujitsu
3::Hewlett Packard
4::Iomega
1::Sony
6::Winchester

relLimit(rel as String, start as Integer, limit as Integer)

This is not really relational algebra, but useful. Start and limit are 1-based. Limit -1 does not limit.

Example:

relLimit(relOrder(A8,"manufacturer A"),2,3) id::manufacturer
3::Hewlett Packard
4::Iomega
1::Sony

relLike(s as String, pattern as String)

Exposes the VBA LIKE operator to Excel.
Like is not full regex, but has some limited syntax that might be useful for select and join
Operators: ? (any once), * (any zero or more), # (number), [] list [!] exclude list.
Returns true if pattern is matched, so you can use it in relSelect.

relSelect("relLike($2,"*a*,A7) 2::Creative Labs
3::Hewlett Packard
4::Iomega

relRotate(rel as String)

Rotate is not really relational algebra either, but useful, too.

relRotate(A8) col1::col2::col3::col4::col5::col6
Sony::Creative Labs::Hewlett Packard::Iomega:Fujitsu::Winchester

relFilter(paramArray list)

relFilter is a shortcut to push further the limit of the result (VBA has no 32k Limit, Excel has). With some experience, relFilter makes also more readable formulas then the operator functions. We recommend to use relFilter whenever possible.

A parameter can be

relFilter works as a stack machine, add operators and relations like in RPN notation.

Example:

relFilter(C14:D19;"S %id<4"; "P manufacturer"; "O manufacturer A";"L 1 2")

Tip: Give the original tables names so that the formulas are more readable

Example:

relFilter(Products;"S %id<4"; "P manufacturer"; "O manufacturer A";"L 1 2")

relFixpoint(ByVal rel As String, fixpoint As String, ByVal start As String, connect As String)

With relFixpoint() you can walk through tree or a graph of a relation.
This is a recursive union of joins. The fixpoint column will have the initial value start and then get each time the connect column, as long as there are new values.
The column order is forced to the initial one.
See the example file.

relAssert(ByVal rel As String, ByVal constraint As String, ByVal expression As String)

Can be used to assure consistency of th data. Possible asserts

The function returns true or the error with the tuple.

Current limitations

Text size 32k and therefore a result can not have more than 32k characters or you have an error.
This means typically about 350 rows with 12 columns.
relFilter allows you to make more in one step, as VBA has not the limit, only the Excel worksheet.
Peformance not tested, though it looks decent even with a lot of data.
A select statement can return no tuples, but has still properties. Such an empty relation is further treated as the empty relation with no properties at all.
Code is still subject of refactoring, though it is very stable and used in production.

History

4.4.2017 Relation for Excel 2.0.0

15.8.2016 Relation for Excel 1.1.2

15.6.2016 Relation for Excel 1.1.1

10.5.2016 Relation for Excel 1.1

20.4. Relation for Excel 1.0