How do I rank customer visits by date in PowerPivot? - powerpivot

I have a table with customers' transactions named "purchases" with fields like this:
--------------------------------------------------
| title | price |qty| client_id | created_at |
--------------------------------------------------
| product A | 100 | 1 | 1 | 01.01.2010 |
| product B | 120 | 2 | 1 | 05.01.2010 |
| product B | 120 | 1 | 2 | 08.01.2010 |
When I create a calc column for total purchase count, it works great:
=calculate(DISTINCTCOUNT([created_at]);ALLEXCEPT(purchases;purchases[client_id]))
but when I try to calculate the number of each exact customer visit (or rank) with the formula
=calculate(DISTINCTCOUNT([created_at]);filter(purchases;purchases[created_at]<=earlier([created_at]));ALLEXCEPT(purchases;purchases[client_id]))
it calculates the number of visits regardless of the current client_id; it seems to ignore the ALLEXCEPT part of the filter.
How can I fix it?
I also tried to solve it with RANKX but the issue was similar: I don't know how to filter according to the current client_id.

Not sure how, but this works :))
=CALCULATE (DISTINCTCOUNT ( [created_at] );FILTER (ALL ( purchases ); [client_id] = EARLIER ( [client_id] ) && [created_at]<=EARLIER([created_at])))
I got this hint on Facebook. Hope it helps someone.
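Read literally, the working formula counts, for each row, the distinct created_at values among rows that have the same client_id and a created_at no later than the current row's. A minimal Python sketch of that same counting logic, using the three sample rows from the question (the function name `visit_rank` is made up for illustration and the dates are assumed to be dd.mm.yyyy):

```python
from datetime import date

# Sample rows from the question: (title, price, qty, client_id, created_at).
purchases = [
    ("product A", 100, 1, 1, date(2010, 1, 1)),
    ("product B", 120, 2, 1, date(2010, 1, 5)),
    ("product B", 120, 1, 2, date(2010, 1, 8)),
]


def visit_rank(row, table):
    """Count distinct visit dates of the same client up to this row's date."""
    _, _, _, client_id, created_at = row
    earlier_dates = {
        other_date
        for (_, _, _, other_client, other_date) in table
        if other_client == client_id and other_date <= created_at
    }
    return len(earlier_dates)


# One rank per row, in table order.
ranks = [visit_rank(row, purchases) for row in purchases]
print(ranks)
```

Client 1 gets visit numbers 1 and 2, and client 2 starts again at 1, which is the per-client numbering the question asks for.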

Related

Make a 1 to 1 multi-field SQL join where only some of the values match

I am trying to build a table that will be used as a conversion chart. I aim to make a simple join with this conversion table on multiple fields (8 in my case), and get a result. I will try to simplify the examples as much as I can because the original chart is a 40x10 matrix.
Let's say that I have these two (I know they don't make much sense and have bad design but they are just examples):
supply_conversion_chart
---
supply (integer)
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
purchases
---
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
and conversion chart would look something like this:
| supply | customer_id | product_id | size | purchase_type |
|--------|--------------|------------|----------|---------------|
| 100 | 1 | anything | anything | online |
| 101 | 1 | anything | anything | offline |
| 102 | other than 1 | anything | anything | online |
| 103 | 1 | 5 | XXL | online |
The main goal was to get an exact supply value by simply doing a join by doing something like:
SELECT supply
FROM purchases p
JOIN supply_conversion_chart scc ON
p.customer_id = scc.customer_id AND
p.product_id = scc.product_id AND
p.size = scc.size AND
p.purchase_type = scc.purchase_type;
Let's say that these are the records on purchases table:
| customer_id | product_id | size | purchase_type |
|-------------|------------|------|---------------|
| 1 | 3 | M | online |
| 1 | 5 | S | offline |
| 12345 | 4 | XL | online |
| 1 | 5 | XXL | online |
| 4353 | null | M | online |
I would expect the first record's supply value to be 100, the second record's to be 101, the third 102, the fourth 103, and the fifth 102. However, as far as I know, SQL won't be able to do a proper join on any of these records except the fourth one, which fully matches supply 103 in the supply_conversion_chart table. I don't know if it is even possible to do a join using multiple fields when some of those fields do not fully match.
My approach is probably faulty and there are better ways to get the results I am trying to achieve but I don't even know where to start. What should I do?
The original chart is much bigger than the provided example, and I will be doing a join on 8 different fields.
One approach is a lateral join:
select p.*, scc.*
from purchases p left join lateral
     (select scc.*
      from supply_conversion_chart scc
      where (scc.customer_id = p.customer_id or scc.customer_id is null) and
            (scc.product_id = p.product_id or scc.product_id is null) and
            (scc.size = p.size or scc.size is null) and
            (scc.purchase_type = p.purchase_type or scc.purchase_type is null)
      -- prefer the rule with the most exact matches;
      -- coalesce keeps a NULL comparison from turning the whole sum into NULL
      order by ( coalesce((scc.customer_id = p.customer_id)::int, 0) +
                 coalesce((scc.product_id = p.product_id)::int, 0) +
                 coalesce((scc.size = p.size)::int, 0) +
                 coalesce((scc.purchase_type = p.purchase_type)::int, 0)
               ) desc
      limit 1
     ) scc on true;
Note: This represents "everything" as NULL. It doesn't have special logic for "customer other than 1". However, it does show you how to implement basically what you are trying to do.
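The matching rule itself (a chart field either equals the purchase's value or is a wildcard, and the most specific rule wins) can be sketched outside SQL as well. This is only an illustration under the same assumption as the note above: `None` stands in for "anything", and the "other than 1" row is left out because this logic has no way to express it. The helper name `best_supply` is hypothetical.

```python
# None plays the role of NULL ("anything") in the conversion chart.
# Columns: (supply, customer_id, product_id, size, purchase_type)
chart = [
    (100, 1, None, None, "online"),
    (101, 1, None, None, "offline"),
    (103, 1, 5, "XXL", "online"),
]

# Columns: (customer_id, product_id, size, purchase_type)
purchases = [
    (1, 3, "M", "online"),
    (1, 5, "S", "offline"),
    (12345, 4, "XL", "online"),
    (1, 5, "XXL", "online"),
    (4353, None, "M", "online"),
]


def best_supply(purchase, chart):
    """Return the supply of the most specific applicable rule, or None."""
    best, best_score = None, -1
    for supply, *rule in chart:
        # A rule applies if every field is a wildcard or equals the purchase's value.
        if all(r is None or r == p for r, p in zip(rule, purchase)):
            # For an applicable rule, non-wildcard fields are exactly the matched ones.
            score = sum(r is not None for r in rule)
            if score > best_score:
                best, best_score = supply, score
    return best


results = [best_supply(p, chart) for p in purchases]
print(results)
```

Purchases from customers other than 1 come back as `None` here, which corresponds to the NULL columns the left join would produce for them.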

SQL/Power BI Joins without common column

So I have the following problem:
I have 2 tables, one containing different bids for a product_type, and one containing the price, date etc. at which the product was sold.
The tables look like this:
Table bids:
+----------+---------------------+---------------------+--------------+-------+
| Bid_id | Start_time | End_time | Product_type | price |
+----------+---------------------+---------------------+--------------+-------+
| 1 | 18.01.2020 06:00:00 | 18.01.2020 06:02:33 | blue | 5 € |
| 2 | 18.01.2020 06:00:07 | 18.01.2020 06:00:43 | blue | 7 € |
| 3 | 18.01.2020 06:01:10 | 19.01.2020 15:03:15 | red | 3 € |
| 4 | 18.01.2020 06:02:20 | 18.01.2020 06:05:44 | blue | 6 € |
| | | | | |
+----------+---------------------+---------------------+--------------+-------+
Table sells:
+---------+---------------------+--------------+--------+
| Sell_id | Sell_time | Product_type | Price |
+---------+---------------------+--------------+--------+
| 1 | 18.01.2020 06:00:31 | Blue | 6,50 € |
| 2 | 18:01.2020 06:51:03 | Red | 2,50 € |
| | | | |
+---------+---------------------+--------------+--------+
The sell_id and the bid_id have no relation with each other.
What I want to find out is the maximum bid at the time we sold the product_type. So if we take sell_id 1, it should check which bids for this specific product_type were active at the sell_time (in this case bid_id 1 and 2) and give back the higher price (in this case bid_id 2).
I tried to solve this problem in Power BI, but I was not able to find a solution. I assume that I have to work with SQL joins to solve it.
Is it possible, to join based on criteria instead of matching columns? Something like:
SELECT bids.start_time, bids.end_time, bids.product_type, MAX(bids.price), sells.sell_time, sells.product_type, sells.price
FROM sells
INNER JOIN bids ON bids.start_time<sells.sell_time AND bids.end_time > sells.sell_time;
I am sorry if this question is confusing, I am still new to this sorry. Thanks in advance for ANY help!
The Sell_time of the second row in your sample data should be 18.01.2020, right? You can try this code (it can be resource-intensive on large data because of the Cartesian join). If you are sure that the sell day is always the same as the bid start day, then you can add a date column to both tables and use an additional TREATAS ( VALUES ( bids[day] ), sells[day] ).
Test =
VAR __tretasfilter =
    TREATAS ( VALUES ( bids[Product_type] ), sells[Product_type] )
RETURN
    SUMMARIZE (
        FILTER (
            SUMMARIZECOLUMNS (
                sells[Sell_id],
                bids[Price],
                bids[Start_time],
                sells[Sell_time],
                bids[End_time],
                sells[Product_type],
                __tretasfilter
            ),
            [Start_time] <= [Sell_time]
                && [End_time] >= [Sell_time]
        ),
        sells[Sell_id],
        "MaxPrice", MAX ( bids[Price] )
    )
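On the question of whether a join can use criteria instead of matching columns: in SQL the ON clause accepts any condition, including range comparisons. The sketch below is a minimal illustration run through Python's `sqlite3` module so that it is self-contained. It assumes timestamps stored as ISO-8601 text (so string comparison follows time order), product types normalized to lower case, prices stored as plain numbers, and the second sell dated 18.01.2020. Nothing in it is specific to Power BI.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bids (bid_id INTEGER, start_time TEXT, end_time TEXT,
                   product_type TEXT, price REAL);
CREATE TABLE sells (sell_id INTEGER, sell_time TEXT,
                    product_type TEXT, price REAL);
INSERT INTO bids VALUES
  (1, '2020-01-18 06:00:00', '2020-01-18 06:02:33', 'blue', 5),
  (2, '2020-01-18 06:00:07', '2020-01-18 06:00:43', 'blue', 7),
  (3, '2020-01-18 06:01:10', '2020-01-19 15:03:15', 'red', 3),
  (4, '2020-01-18 06:02:20', '2020-01-18 06:05:44', 'blue', 6);
INSERT INTO sells VALUES
  (1, '2020-01-18 06:00:31', 'blue', 6.5),
  (2, '2020-01-18 06:51:03', 'red', 2.5);
""")

# Join each sell to every bid of the same product type that was active
# at the sell time, then keep the highest bid price per sell.
rows = con.execute("""
SELECT s.sell_id, MAX(b.price) AS max_bid
FROM sells AS s
LEFT JOIN bids AS b
  ON b.product_type = s.product_type
 AND b.start_time <= s.sell_time
 AND b.end_time >= s.sell_time
GROUP BY s.sell_id
ORDER BY s.sell_id
""").fetchall()
print(rows)
```

The LEFT JOIN keeps sells that had no active bid; their `max_bid` is NULL. The equality on `product_type` is what keeps this from being a full Cartesian product.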

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
If you want the score matched at record level instead, change the WHERE clause inside the APPLY.
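The same "latest row on or before the logged date" lookup can also be written as a correlated subquery with ORDER BY ... LIMIT 1, which is the rough equivalent of TOP (1) inside OUTER APPLY in engines without APPLY. A small sketch using Python's `sqlite3` follows. The table and column names are shortened for the example, and the dates are assumed ISO-8601 text values chosen for illustration, since 30/02/2020 and 31/02/2020 in the sample are not real calendar dates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE production (record_id INTEGER, log_date TEXT, task_id INTEGER);
CREATE TABLE standing (task_id INTEGER, active_from TEXT, score REAL);
INSERT INTO production VALUES
  (1, '2020-02-27', 1), (2, '2020-02-27', 1),
  (3, '2020-02-28', 1), (4, '2020-02-29', 1);
INSERT INTO standing VALUES
  (1, '2020-02-01', 1.5),
  (1, '2020-02-28', 2);
""")

# For each production row, pick the score whose active_from is the latest
# date that is not after the logged date.
rows = con.execute("""
SELECT p.record_id,
       (SELECT s.score
          FROM standing AS s
         WHERE s.task_id = p.task_id
           AND s.active_from <= p.log_date
         ORDER BY s.active_from DESC
         LIMIT 1) AS score
FROM production AS p
ORDER BY p.record_id
""").fetchall()
print(rows)
```

Each production row comes back exactly once, because the subquery can return at most one score.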

Access 2016 & SQL: Totaling two columns, then subtracting them

Say I have a MoneyIN and a MoneyOUT column. I wish to total these entire columns up so I have a sum of each, then I wish to subtract the total of the MoneyOUT column from the total of the MoneyIN column. I also want to display a DateOF column and possibly a description (I think I can do that by myself).
This would be the original database where I get my information from:
+-------------+------------------+---------+----------+-----------+
| Location ID | Location Address | Date Of | Money In | Money Out |
+-------------+------------------+---------+----------+-----------+
| 1 | blah | date | 10.00 | 0.00 |
| 2 | blah | date | 2,027.10 | 27.10 |
| 2 | blah | date | 0.00 | 2000.00 |
| 1 | blah | date | 0.00 | 10.00 |
| 3 | blah | date | 5000.00 | 0.00 |
+-------------+------------------+---------+----------+-----------+
I would like to be able to type in a location ID and then have results show up (in this example I type 2 for the location)
+---------+----------+-----------+------+
| Date Of | Money In | Money Out | |
+---------+----------+-----------+------+
| date | 2027.10 | 27.10 | |
| date | 0 | 2000 | |
| Total: | 2027.10 | 2027.10 | 0 |
+---------+----------+-----------+------+
I have tried other solutions (One of which was pointed out below), however, they don't show the sum of each entire column, they simply subtract MoneyOUT from MoneyIN for each row. As of now, I am trying to do this in a query, but if there is a better way, please elaborate.
I am extremely new to SQL and Access, so please make the explanation understandable for a beginner like me. Thanks so much!
This is a table referred to below.
+-------------+-------+----------+-----------+-----------+
| Location ID | Date | Money IN | Money Out | Total Sum |
+-------------+-------+----------+-----------+-----------+
| 1 | date | 300 | 200 | |
| 1 | date | 300 | 200 | |
| 1 | date | 300 | 200 | |
| 1 | total | 900 | 600 | 300 |
+-------------+-------+----------+-----------+-----------+
The following should give you what you want:
SELECT DateOf, MoneyIn, MoneyOut, '' AS TotalSum FROM YourTable
UNION ALL
SELECT 'Total', SUM(MoneyIn) AS SumIn, SUM(MoneyOut) AS SumOut,
SUM(MoneyIn - MoneyOut) AS TotalSum FROM YourTable
Edit:
You do not need to alter very much to achieve what you want. To get Access to prompt for a parameter when running a query, give the parameter a name in square brackets; Access will then pop up a window prompting the user for this value. The parameter can also be used more than once in the query without Access prompting for it multiple times. So the following should work for you:
SELECT DateOf, MoneyIn, MoneyOut, '' AS TotalSum
FROM YourTable
WHERE LocationID=[Location ID]
UNION ALL
SELECT 'Total', SUM(MoneyIn) AS SumIn, SUM(MoneyOut) AS SumOut,
SUM(MoneyIn - MoneyOut) AS TotalSum FROM YourTable
WHERE LocationID=[Location ID];
However, looking at your table design, I strongly encourage you to change it. You are including the address on every record. If you have three locations, but 100 records, then on average you are unnecessarily repeating each address more than 30 times. The "normal" way to avoid this would be to have a second table, Locations, which would have an ID and an Address field. You then remove address from YourTable, and in its place create a one-to-many relationship between the ID in Locations and the LocationID in YourTable.
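To try the "detail rows plus one total row for a chosen location" idea outside Access, here is a small sketch using Python's `sqlite3`. The table name `ledger`, the column names, and the concrete dates are placeholders made up for the example (the question only shows "date"), and the named parameter `:loc` plays the role of the `[Location ID]` prompt. It uses UNION ALL rather than UNION, because UNION removes duplicate rows and would merge two identical transactions into one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ledger (location_id INTEGER, date_of TEXT,
                     money_in REAL, money_out REAL);
INSERT INTO ledger VALUES
  (1, '2020-01-01', 10.00, 0.00),
  (2, '2020-01-02', 2027.10, 27.10),
  (2, '2020-01-03', 0.00, 2000.00),
  (1, '2020-01-04', 0.00, 10.00),
  (3, '2020-01-05', 5000.00, 0.00);
""")


def statement(location_id):
    """Detail rows for one location followed by a single 'Total:' row."""
    return con.execute("""
        SELECT label, money_in, money_out, net FROM (
            SELECT 0 AS sort_key, date_of AS label,
                   money_in, money_out, NULL AS net
            FROM ledger WHERE location_id = :loc
            UNION ALL
            SELECT 1, 'Total:', SUM(money_in), SUM(money_out),
                   SUM(money_in) - SUM(money_out)
            FROM ledger WHERE location_id = :loc
        )
        ORDER BY sort_key, label
    """, {"loc": location_id}).fetchall()


for row in statement(2):
    print(row)
```

The extra `sort_key` column is one way to make sure the total row sorts last; it is dropped again by the outer SELECT.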
It's a little unclear exactly what you expect without sample data, but I think this is what you want:
SELECT DateOf, SUM(MoneyIN) - SUM(MoneyOut)
FROM YourTable
GROUP BY DateOf
This will subtract the summed total of MoneyOut from the summed total of MoneyIn for each distinct DateOf.
Updated Answer
A UNION will let you append a 'Totals' record to the bottom of your result set:
SELECT *
FROM (
SELECT CAST(DateOf as varchar(20)) as DateOf, MoneyIn, MoneyOut, '' as NetMoneyIn
FROM YourTable
UNION ALL
SELECT 'Total:', SUM(MoneyIn), SUM(MoneyOut), SUM(MoneyIN) - SUM(MoneyOut)
FROM YourTable
) A
ORDER BY CASE WHEN DateOf <> 'Total:' THEN 0 ELSE 1 END, DateOf
Some notes: I used a derived table to ensure that the 'Total:' record comes last. I also cast DateOf to a string (assuming it is a date); otherwise you will have issues writing the string 'Total:' to that column.

SQL deleting rows with duplicate dates conditional upon values in two columns

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows of duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B. For example;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)
The following works with your test data:
SELECT ID, date, MIN(code), MAX(prog) FROM table
GROUP BY ID, date
You can then use the results of this query to create a new table or populate a new table. Or to delete all records not returned by this query.
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
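Note that MIN(code) and MAX(prog) are computed independently, so on other data they could combine values that never appear together in one row. As a cross-check of the rule stated in the question (per ID and date keep the lowest code, and on a tie prefer prog B), here is a minimal Python sketch over the sample rows; the function name `keep_one_per_date` is made up for illustration.

```python
# Sample rows from the question: (ID, date, code, prog).
rows = [
    (1, "1996-08-16", 24, "A"),
    (1, "1997-06-02", 123, "A"),
    (1, "1997-06-02", 123, "B"),
    (1, "1997-06-02", 211, "B"),
    (1, "1997-08-19", 67, "A"),
    (1, "1997-08-19", 23, "A"),
]


def keep_one_per_date(rows):
    """Keep one row per (ID, date): lowest code, prog 'B' preferred on ties."""
    best = {}
    for row in rows:
        ident, day, code, prog = row
        # Lower code wins; with equal codes, False (prog is B) sorts before True.
        rank = (code, prog != "B")
        key = (ident, day)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return sorted(kept for _, kept in best.values())


kept = keep_one_per_date(rows)
for row in kept:
    print(row)
```

Every row that is returned is an actual row from the input, so the result can be used directly to decide which rows to delete.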
You can use the MIN() function:
select ID, DATE, min(CODE), max(PROG)
from table
group by ID, DATE
I assume that your table has a valid primary key. However, I would recommend that you use ID as the primary key. Hope this helps.