Relation for Excel 1.0 (old)

by Matthias Bürcher @ 2016

This is the old version of the functions which do use positioned based columns.
It is not supported any more.
For the current version see Relation for Excel

Freeware

This macro provides functions to make simple relational algebra.
Unlike other Excel solutions, this one is purely functional, not using macros.
Relations are saved as text in one cell with double colon (::) as field separator and newline as row separator.
Note that in a cell, the text cannot be more than 32k characters.

Download

relation-for-excel-112.zip

The zip file contains:

  • relation-for-excel-1.112.xlam: The add-in
  • relation-for-excel-1.1.2-examples.xlsm: An Excel sheet with examples

Github: https://github.com/bellenuit/relation-for-excel

Installation

Copy relation-for-excel-1.1.xlam into the add-in folder of Excel. The location of this folder changes from Excel version to Excel version, on my version it is
/Applications/Microsoft Office 2011/Office/Add-ins

Launch Excel and enable the add-in. On my version, this is the menu Tools:Add-Ins

Open the file relation-for-excel-1.1-examples.xlsm.

To test, if the add-in works, you have two possibilities:

  • The menu Insert:Function should show all functions when you type "rel"
  • The cell D26 with the formula =relRange(C14:D19) should show this result

1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega
5::Fujitsu
6::Winchester

Note that you need to copy the add-in on all Excel installations you use the functions or you will see the error #NAME?# in the cell that use the functions.

Introduction

Many people use Excel as a database because nothing else is available or allowed in the company they work. They collect data, which is often relational, but Excel cannot express the relations within the worksheet. Excel provides advanced filtering and pivot tables, but as these tools are interesting to explore data, they have no persistence: If you change a filter or a sort order, you loose the old one. It is also very difficult to build relations between data. This leads to duplicate data entries and many repetitive hand work.

What Excel is missing is a kind of query language like SQL to explore data and to display the result. Google introduced a query function in Sheets, which is already a step in the right direction. There are also some people who have developed macros that compile SQL queries to Excel formulas and then apply them to the data. The problem of these approaches is that queries are not updated when data changes.

The problem is also conceptual. The result of a SQL query is not a single value, but a table of values. Functions normally calculate only a single value they return to a single cell. You can have a common formula for a range of cells, but things gets quickly complex because you may not know the number of rows and columns in advance.

Relation for Excel goes a different approach in many ways:

  • The solution is purely functional, so results are updated immediately if data is.
  • The solution compiles the entire table in one text that can take place in a single cell. This makes it easy to handle intermediate results when you build up complex queries.
  • The solution does not use SQL but basic relational algebra. The relational algebra can express anything than SQL can, but it integrates more easily as an Excel function.

This is the deal:

  • The relation is created from a range in form of a table in a cell: tuples (rows) are separated by newlines, properties (columns) by double colon (::).
  • You apply all relational algebra operators select, project, extend, union, difference, join. We have also non relational operators order, limit and rotate.
  • If you want, you can then retrieve the results again in cells and do whatever you want with it for further calculations.

There is a cost:

  • The result of an Excel function cannot be more than 32K characters. This might make it not usable for big tables.
  • The separator double colon (::) cannot be used inside a value.

You have to know:

  • Relational algebra works with sets and therefore has no duplicates and also no null values. It may have empty values however.
  • Tuples (rows) in a relation have no particular order. We added the relOrder function because sometimes humans would like to view the tuples in a particular order.

Relation for Excel does simplify the relational model in two ways:

  • Properties (columns) in tuples are not named. The columns are adressed by position, so the position of the column does matter. This means, that there is never a name conflict in joins, but also that natural joins are not possible, you must specify the condition explicitely.
  • Properties are not typed. Excel works with text and numbers, you define only the type when you write the expression or condition.

I hope you can live with that. The reason for the simplification was to keep the syntax simple.

Imagine you have the SQL query

SELECT country, SUM(sales)
FROM manufacturers
WHERE manufacturer
LIKE 'S%'
GROUP BY country

If the data is in the range A2:C30 (manufacturer,country,sales) then the query would be expressed like

=relOrder("2 A",relProject("2::SUM 3",relSelect("1 LIKE ""S*""".relRange(A2:C30)))

But this is only half of the story. The power of Relation for Excel is in the fact that you do not have to write one big formula in one cell. You can split it up and each cell is a view with intermediate results you can inspect and use for further operations.

Explore the example workbook, where we have sheets with a presentation and exercises computer store, employees, warehouse, pieces and providers, accounts.

Usage

The basics is to convert a range of cells into a relation with relRange(), then make all operations on the relations and then either display the result multiline in a cell or recover the results in individual cells with relCell() or relCellArray().

Relations have more than one line in a cell. To display the relation, you can format the cell: Menu Format:Cells Tab Alignment set Wrap Text and also Text alignment vertical "top". You need also to adjust the size of the row to display all lines.

RelRange(rn ar Range)


Creates a relation from a Range. The result is a text where rows are separated with newlines and columns with ::

Exemple

- A B
1 1 Sony
2 2 Creative Labs
3 3 Hewlett Packard
4 3 Iomega
5 5 Fujitsu
6 6 Winchester
7 1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega
5::Fujitsu
6::Winchester

A7 has the formula =relRange(A1:B6)

relCell(rel as String, row as integer, col as Integer)


Gets a cell in a relation.
One based row and column.

Exemple:

relCell(A7,3;2) "Hewlett Packard"

relCellArray(rel as String)


There is an alternative syntax, where you can enter the formula for an entire range with control-shift-enter (adding curly braces) and suppressing errors.

relSelect(condition as String, rel as String)

Filters a relation by a condition, which can be any Excel formula. The formula can use values from the current tuple: columns are selected by position and preceded by dollar sign if used as string and a percent sign if used as number. You may have to double quote strings inside the formula and use the comma as parameter separator inside Excel functions.

Exemples:

relSelect("%1<3",A7) 1::Sony
2::Creative Labs
relSelect("LEFT($2,1)=""F""";A7) 5::Fujitsu

relProject(list as String, rel as String)

Filters columns based on list. The list is separated by double colon(::). Columns are 1-based. You can also aggregate on columns using the aggregators SUM COUNT AVG MIN MAX. As relations work on sets, the operator project can also work as group operator.
Note that you can not express a COUNT 0.

Exemples:

relProject("1",A7) 1
2
3
4
5
6
relProject("SUM 1,A7) 21

relExtend(expression as String, rel as String)


You can add columns with calculations based on existing columns. The expression can by any Excel formula. Columns are selected by position and preceded by dollar sign, if used as string and a percent sign, if used as number. You may have to double quote strings inside the formula.

Example:

relExtend("%1*2",A7) 1::Sony::2
2::Creative Labs::4
3::Hewlett Packard::6
4::Iomega::8
5::Fujitsu::10
6::Winchester::12

relUnion(rel1 as String, rel2 as String)


Union checks for arity and eliminates duplicates.

Example:

relUnion(relSelect("%14",A7)) 1::Sony
5::Fujitsu
6::Winchester

relDifference(rel1 as String, rel2 as String)


Difference checks for arity.

Example:

relDifference(A7,relSelect("%1>4",A7)) 1::Sony
2::Creative Labs
3::Hewlett Packard
4::Iomega

relJoin(condition as string, rel1 as String, rel2 as String)


Join is actually a theta join, column numbers are based on the joined table
For natural join set an equal condition and combine with AND, if there is more than one key.
For intersection set all colums equal.
For cross product set condition to true.
Self join is also possible.

Example:

- A B
11 1 Walkman
12 1 Playstation
13 3 Printer
14 relJoin("$1=$3",A7,Range(A11:B13)) 1::Sony::1::Walkman
1::Sony::1::Playstation
3::Hewlett Packard::3::Printer

relOrder(list as String, rel as String)


This is not really relational algebra, as sets have no order, but it is still useful.
List separated by ::, each item is column 1-based, space and an order operator
A alpha, Z alpha reverse, 9 number top down, 1 number bottom up.

Example:

relOrder("2 A",A7) 2::Creative Labs
5::Fujitsu
3::Hewlett Packard
4::Iomega
1::Sony
6::Winchester

relLimit(start as Integer, limit as Integer, rel as String)


This is not really relational algebra, but useful. Start and limit are 1-based. Limit -1 does not limit.

Example:

relLimit(2,3,relOrder("2 A",A7)) 3::Hewlett Packard
4::Iomega
1::Sony

relLike(s as String, pattern as String)


Exposes the VBA LIKE operator to Excel.
Like is not full regex, but has some limited syntax that might be useful for select and join
Operators: ? (any once), * (any zero or more), # (number), [] list [!] exclude list.
Returns true if pattern is matched, so you can use it in relSelect.
relSelect("relLike($2,"*a*,A7) 2::Creative Labs
3::Hewlett Packard
4::Iomega

relRotate(rel as String)


Rotate is not really relational algebra either, but useful, too.
relRotate(A7) 1::2::3::4::5::6
Sony::Creative Labs::Hewlett Packard::Iomega:Fujitsu::Winchester

relFilter(paramArray list)


relFilter is a shortcut to push further the limit of the result (VBA has no 32k Limit, Excel has).
A parameter can be
  • range (when it has at least 2 cells)
  • selection "S condition"
  • projection "P list"
  • extension "E expression"
  • union "U"
  • difference "D"
  • join "J condition"
  • rotate "R"
  • order "O list"
  • limit "L start limit"
  • relation "T relation"

relFilter works as a stack machine, add operators and relations like in RPN notation.

Example:

relFilter(C14:D19;"S %1<4"; "P 2"; "O 1 A";"L 1 2")

Current limitations


Text size 32k and therefore a result can not have more than 32k characters or you have an error.
This means typically about 350 rows with 12 columns.
relFilter allows you to make more in one step, as VBA has not the limit, only the Excel worksheet.
Peformance not tested, though it looks decent even with a lot of data.
There is an ambiguity between an empty relation and a relation with a tuple with one empty property.

History


15.8.2016 Relation for Excel 1.1.2
  • relCell and relRow do not return error any more, but empty strings, when out of range

15.6.2016 Relation for Excel 1.1.1

  • relProject gives no results for sum and count even if there is no row
  • relParseExpression: In expressions with more than 10 columns, you do not need to preceed any more the 1-9 columns with a 0 (substution starts top, so there is no ambiguity)

10.5.2016 Relation for Excel 1.1

  • Fixed several small bugs
  • Redefined swFilter as unlimited parameters

20.4. Relation for Excel 1.0