You know how to countif in Excel so why is Power Query so difficult?
me@jaykilleen.com wrote this almost 8 years ago and it was last updated almost 8 years ago.
If you are familiar with the Excel COUNTIF formula then you might find it frustrating to try and replicate this in Power Query or PowerBI in M Language.
I've put together a simple Excel 2016 file to illustrate the Excel way vs the Power Query / PowerBi M Language equilvalent.
You can download this data in the 2016 Excel file here on Dropbox.
Let's create two tables:
The two tables are on WorkSheet countif
and I have inserted the seed data below as tables named City
and People
. You can see the table design by clicking the design
tab and inspecting the Table Name
field.
City
is made up of two columns and an additional four of the alternatives for counting the population
:
People
is made up of three columns:
Let's fill these tables with some seed data for our examples.
CITY
Id | Name | Population(x) | Population(m) | Population(so) | Population(f) |
---|---|---|---|---|---|
1 | Brisbane | 2 | 2 | 2 | 2 |
2 | Sydney | 1 | 1 | 1 | 1 |
3 | Melbourne | 1 | 1 | 1 | 1 |
PEOPLE
Id | Name | CityId |
---|---|---|
1 | Jay | 1 |
2 | Sam | 2 |
3 | Sarah | 1 |
3 | Holly | 3 |
After the great help and answers I received by Mike Honey, Imke Feldman, Carl Walsh, Alejandro Lopez-Lago.
I tried to break this question down as simple as possible. Two very basic tables and wanting to add a column that counts the occurences of the first tables Primary Keys in the second tables reference field.
This is pretty basic if you are across Excel Table Structured References or even just a basic Excel formulas.
To complete the field Population(x)
we can simply add an extra column to the City
table in Excel (as I have already done) and copy the formula =COUNTIF(People[CityId],[@Id])
. Make sure this is pasted down the entire column and bam! it is done. How simple and easy :)
Now to replicate this in Power Query using the M Language.
I have already created the queries by adding the data from Excel Tables (Click in the Table -> Click the Data Tab -> Click From Table). We will then open the Advanced Editor and build out our Power Query M Language code.
Query Editor Group Directory Structure
.
2016-02-05-easy-excel-countif-in-power-query-or-powerbi-m-language.xlsx
├── worksheets
│ ├── countif
│ ├─ ...
│
├── vba
└───queries
├── README.md
├───models
│ ├── City.m
│ ├── People.m
│ ├─ ...
│
├───loading
│ ├─ ...
│
├───functions
│ ├── f_countif.m
│ ├─ ...
│
└───Other Queries
└── blank.m
All the magic of the Power Query M is happening in the City.m
and the f_countif.m
. First we'll inspect the City.m
query.
Here you can see the 3 additional columns add through the M Table.AddColumn
method. I'll add commmentary to the code to explain each step.
let
Source = Excel.CurrentWorkbook(){[Name="City"]}[Content],
change_type = Table.TransformColumnTypes(
Source,
{
{"Id", Int64.Type},
{"Name", type text}
}
),
add_pop_m = Table.AddColumn(
change_type,
"Population(m)",
(CityRow) =>
Table.RowCount(
Table.SelectRows(
People,
each
[CityId] = CityRow[Id]))),
add_pop_so = Table.AddColumn(
add_pop_m,
"Population(so)",
each
let Id=[Id]
in Table.RowCount(
Table.SelectRows(
People,
each
[CityId] = Id))),
add_pop_fm = Table.AddColumn(
add_pop_so,
"Population(f)",
each
f_countif(People, "CityId",[Id]))
in
add_pop_fm
The steps of this portion of the M Query goes someting like this:
City
People
tableThis is a similar alternative version from the StackOverflow answer given by Imke Feldman. It starts off the same way but with slightly different syntax that I don't yet full understand. But it works :)
This is another alternative that utilised a Custom Function. The benefits of this is that it is much shorter within the City
query but required an additional Customer Function query to be created. This query can therefore be used in other queries as a Custom Function which can keep our code DRY.
let
countif = (tbl as table, col as text, value as any) as number =>
let
select_rows = Table.SelectRows(tbl, each Record.Field(_, col) = value),
count_rows = Table.RowCount(select_rows)
in
count_rows
in
countif
This function has a table, a column and a value passed in. It works much more like the traditional Excel formula.
There is another alternative way where a table is created that groups the People
table by the CityId
and sums this into a new field called City.Count
. This table can then be merged with the City
table. This adds additional steps but could be completed with less queries on the database. What is unclear at this stage is which is the most efficient alternative. One issue I would like to investigate is whether running a function within an each
block would trigger multiple queries to the database for each iteration.
This is a problem I have had in the past in MVC frameworks such as Ruby on Rails.
That is all for now. If you have anything you can add please post your comment below. Other sources of good information that I have used for this post include include:
Also thanks to the people that helped talk through this scenario on StackOverflow.