I needed to solve a problem in PowerQuery / PowerBI for summarising totals in a table colum and assigning the values into another table without repeating these values on each row.
In our reporting API, we have line items for both Purchase Orders and Invoices in one table. They are classified as Type 1 - purchase order or Type 2 - invoice. What I wanted to do is add them up and arrive at the order / invoice balance. We offer this anyway in our Order tables but I was trying to figure out another solution. And as with all things PowerQuery, and with a steer from ChatGPT, there is a solution and it goes like this.
The line items table has everything I need. So step 1, having cleaned up the dates and some other data cleansing tasks, I duplicated the lines table. I filtered for purchase orders, type 1, and grouped by purchase order number, summarising the NetValue of the line. My table is then effiectively
I repeat this for the Invoices, changing Type to 2. Now I have a similar table again with PurchaseOrderNumber and the NetValue of any invoice lines
I merged these two tables together so that we have side by side values of order and invoice and a calculated column for the balance.
Next up, I wanted to get these values into the Orders table. So I Merged, linking on the OrderNumber.
Now another Merge of Order into the LineItems table so every line item has these new values, linking on the purchase order number.
Next up, I added an Index to the LineItems table. And the next bit is the celever bit that gives me a unqiue balance value on only 1 of the line items. I created an index column to the LineItems table.
I duplicated the lineItems table again.
I then removed duplicates based on the PurchaseOrderNumber and the new Balance field. I also removed type 2 lines - Invoice Lines.
And the master stroke, merge the de duplicated LineItems table back into the LineItems table, joining on the Index field. And now I have my Line items table with a unique value of the balance on only one of the lines. An example below
Like all things PowerQuery, there is a solution here, and for me it was the addition of the Index column. I saw that in a thread somewhere and pursued it and that's what allows the de duplication. This might not be the most efficient use of PowerQuery resource with the duplication of tables but it's a means to an end and solved the problem.
PowerQuery, PowerBI, ExCel, Deduplication of columns, summarising values