SQL-sum over dynamic period - sql

I have two tables, Customers and Actions; each customer has a unique ID that appears in both tables.
Some of the customers became club members on a specific date (the date differs between customers). I'm trying to sum each customer's purchases up to that date, and to return only those who purchased more than (for example) 200 before they became club members.
For example, I can have the following customer:
custID purchDate purchAmount
1 2015-05-12 100
1 2015-07-12 150
1 2015-12-29 320
Now, assume that custID=1 became a club member on 2015-12-25; in that case, I'd like to get SUM(purchAmount)=250 (note that this customer should be returned because 250>200).
I tried the following:
SELECT cust.custID, SUM(purchAmount)totAmount
FROM customers cust
JOIN actions act
ON cust.custID=act.custID
WHERE act.clubMember=1
AND cust.purchDate<act.clubMemberDate
GROUP BY cust.custID
HAVING totAmount>200;
Is this the right way to approach the problem, or should I use something like a WHILE loop over clubMemberDate (which, to tell the truth, I don't know how to do)?
I'm working with Teradata.
Your help will be appreciated.
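A minimal sketch of the join-and-filter approach, run against SQLite from Python rather than Teradata so that it is self-contained. It assumes clubMember and clubMemberDate live in customers, and purchDate and purchAmount live in actions (the question does not say which table holds which column); customers 2 and 3 are made-up rows added only to show the filtering. Because each purchase row is compared with its own customer's clubMemberDate, the period is already "dynamic" per customer and no loop is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layout: membership info on customers, purchases on actions.
CREATE TABLE customers (custID INTEGER PRIMARY KEY, clubMember INTEGER, clubMemberDate TEXT);
CREATE TABLE actions (custID INTEGER, purchDate TEXT, purchAmount INTEGER);

INSERT INTO customers VALUES
  (1, 1, '2015-12-25'),   -- the customer from the question
  (2, 1, '2015-06-01'),   -- hypothetical: member, but spent little before joining
  (3, 0, NULL);           -- hypothetical: never became a member

INSERT INTO actions VALUES
  (1, '2015-05-12', 100), (1, '2015-07-12', 150), (1, '2015-12-29', 320),
  (2, '2015-05-01', 50),  (2, '2015-08-01', 500),
  (3, '2015-03-03', 900);
""")

# Each purchase is compared with the joining date of its own customer,
# so only pre-membership purchases are summed.
rows = conn.execute("""
SELECT cust.custID, SUM(act.purchAmount) AS totAmount
FROM customers cust
JOIN actions act ON cust.custID = act.custID
WHERE cust.clubMember = 1
  AND act.purchDate < cust.clubMemberDate
GROUP BY cust.custID
HAVING SUM(act.purchAmount) > 200
""").fetchall()

print(rows)
```

The dates are stored as ISO-8601 strings here, which compare correctly as text; in Teradata they would be DATE columns.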

Related

How can I select multiple records from more than two tables

I am a complete beginner in SQL and need some help to learn. I found some learning material online and am trying to write queries per its instructions. I did the first one but got stuck on the rest. Can anyone please help me write any one of the queries below?
Please help me learn how to do the rest.
b) Get aids and names of agents who have placed individual orders of at least $500 for any customer living in Kyoto.
c) What cities do those customers live in, who enjoy discount under 10% and have ordered at least 1000 combs so far (in all their orders put together)?
d) What product names have been ordered by at least one customer based in Dallas through any agent based in Tokyo?
e) Get the name, city, and total dollar amount of all orders placed by each customer, arranged in descending order of the amounts.
Here is how I managed to do the first one.
a) Get pairs of all customer cids and agent aids, who live in the same city.
SELECT customers.cid, agent.aid FROM customers, agent WHERE customers.city = agent.city
RESULT
cid  aid
c001 a05
c004 a05
c002 a06
c003 a06
Try this one for b
SELECT distinct Agents.City -- DISTINCT because you don't want the same city returned more than once
FROM Agents
JOIN Orders
ON orders.aid = agents.aid
WHERE orders.cid = 'c002' -- Checks the orders table for any cid equal to c002
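For task (b) as it is worded in the question, a sketch might join all three tables instead. The schema image is missing from the post, so everything below is an assumption: the table names agents, customers and orders, the columns aname and dollars, and every data row are hypothetical and only there to make the example run (here through SQLite from Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; adjust names to the real learning material.
CREATE TABLE agents    (aid TEXT PRIMARY KEY, aname TEXT, city TEXT);
CREATE TABLE customers (cid TEXT PRIMARY KEY, cname TEXT, city TEXT);
CREATE TABLE orders    (ordno INTEGER PRIMARY KEY, cid TEXT, aid TEXT, dollars REAL);

INSERT INTO agents VALUES ('a05', 'Otasi', 'Duluth'), ('a06', 'Smith', 'Dallas');
INSERT INTO customers VALUES ('c001', 'Tiptop', 'Duluth'), ('c006', 'ACME', 'Kyoto');
INSERT INTO orders VALUES
  (1011, 'c001', 'a05', 450.0),   -- customer not in Kyoto
  (1013, 'c006', 'a06', 720.0),   -- Kyoto customer, order of at least 500
  (1014, 'c006', 'a05', 460.0);   -- Kyoto customer, order too small
""")

# An agent qualifies if at least one single order they placed is >= 500
# and belongs to a customer whose city is Kyoto.
rows = conn.execute("""
SELECT DISTINCT a.aid, a.aname
FROM agents a
JOIN orders o    ON o.aid = a.aid
JOIN customers c ON c.cid = o.cid
WHERE c.city = 'Kyoto'
  AND o.dollars >= 500
ORDER BY a.aid
""").fetchall()

print(rows)
```

The filter is applied per order row, which matches "individual orders of at least $500"; summing per agent would answer a different question.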

Only Show unique Customers per date cohort for repeat purchase rate

Scenario:
I have a table that has all of the customer purchases by Month and each month has a period. Within that table I am showing the customers that have made purchases in each Month/Period. What I am trying to figure out is how to exclude any customer that made a purchase in the previous month so that the repeat purchases are only for unique customers. The data looks like the following:
customer_email cohortMonth month_number orders_for_period
abc#gmail.com  10/2019     0            2
def#gmail.com  10/2019     0            1
ghi#gmail.com  10/2019     0            1
def#gmail.com  10/2019     1            1
abc#gmail.com  10/2019     1            1
def#gmail.com  10/2019     2            1
In the table above, for month_number=0 we have 3 customers in total, and within this period abc#gmail.com was the only repeat customer because they have 2 orders. This would show as a 33% repeat purchase rate for month_number 0. For month_number=1, 2 customers purchased again in the period, but only def#gmail.com is new to the repeat count, as abc#gmail.com was already counted. This brings the repeat rate to 66%, as 2 of the 3 customers who originally purchased have now come back and purchased again.
cohortMonth month_number repeat_purchase_rate
10/2019     0            33%
10/2019     1            66%
10/2019     2            66%
With every unique customer that purchases in the subsequent periods we want to add that to the total to understand the repeat rate at a cumulative level.
I have tried a ton of different ways to figure this out, but backing out the customers that made purchases in a previous period and showing only the unique customers is where I am struggling. Any help is greatly appreciated!
Side Note: Whenever I format a table it looks like how I want it to look in the preview but then when I review I get the error :"Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon."
I then indent and it breaks the way the table looks. Any help on that would be great as well. Thank you
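One way to get the cumulative rate described above is to find, per customer, the first period in which they count as a repeat buyer, and then count how many customers have reached that point by each period. The sketch below runs the idea in SQLite from Python on the sample rows; the table name purchases is an assumption, as is the rule that in month 0 a customer counts as a repeat buyer only when orders_for_period is greater than 1 (inferred from the 33% example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (customer_email TEXT, cohortMonth TEXT,
                        month_number INTEGER, orders_for_period INTEGER);
INSERT INTO purchases VALUES
  ('abc#gmail.com', '10/2019', 0, 2),
  ('def#gmail.com', '10/2019', 0, 1),
  ('ghi#gmail.com', '10/2019', 0, 1),
  ('def#gmail.com', '10/2019', 1, 1),
  ('abc#gmail.com', '10/2019', 1, 1),
  ('def#gmail.com', '10/2019', 2, 1);
""")

query = """
WITH first_repeat AS (
    -- First period in which each customer counts as a repeat buyer.
    SELECT customer_email, cohortMonth, MIN(month_number) AS first_repeat_month
    FROM purchases
    WHERE month_number > 0 OR orders_for_period > 1
    GROUP BY customer_email, cohortMonth
),
cohort_size AS (
    -- Everyone who bought in month 0 of the cohort.
    SELECT cohortMonth, COUNT(DISTINCT customer_email) AS customers
    FROM purchases
    WHERE month_number = 0
    GROUP BY cohortMonth
),
periods AS (
    SELECT DISTINCT cohortMonth, month_number FROM purchases
)
SELECT p.cohortMonth, p.month_number,
       (SELECT COUNT(*) FROM first_repeat f
         WHERE f.cohortMonth = p.cohortMonth
           AND f.first_repeat_month <= p.month_number) * 100 / c.customers
           AS repeat_rate_pct
FROM periods p
JOIN cohort_size c ON c.cohortMonth = p.cohortMonth
ORDER BY p.cohortMonth, p.month_number
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because each customer contributes exactly one row to first_repeat, nobody is counted twice no matter how many later periods they buy in. The percentage uses integer division, so 1 of 3 shows as 33 and 2 of 3 as 66.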

Using Count and Group By in Power BI

I have a table that contains data about different benefit plans and users enrolled in one or more of those plans. So basically the table contains two columns representing the benefit plan counts and total users enrolled in those plans.
I need to create visualization in Power BI to represent the number of total users enrolled in 1 plan, 2 plans, 3 plans, ...etc.
I wrote the query in SQL to get the desired result, but I am not sure how to do the same in Power BI.
Below is my sql query:
SELECT S.PlanCount, COUNT(S.UserName) AS Participants
FROM (
SELECT A.Username, COUNT(*) AS PlanCount
FROM [dbo].[vw_BenefitsCount_Plan_Participants] AS A
GROUP BY A.username
)AS S
GROUP BY S.PlanCount
ORDER BY S.PlanCount
The query result is below image:
So here, the PlanCount column represents the number of different benefit plans that users are enrolled in. For example, the first row means that a total of 6008 members are enrolled in only 1 plan, row 2 shows that 3030 members are enrolled in 2 plans, and similarly row 5 means there are only 10 users who are enrolled in 6 plans.
I am new to Power BI and trying to understand DAX functions but couldn't find a reasonable example that could help me create my visualization.
I found something similar here and here, but they seem to be geared more towards a single count and group by.
Here is a simple example. I have a table of home owners who have homes in multiple cities.
Now in this table, Alex, Dave and Julie have home in 1 city (basically we can say that these 3 people own just 1 home each). Similarly Jim owns a total of 2 homes and Bob and Pam each have 3 homes in total.
Now the output that I need is a table with total number of home owners that own 1 home, 2 homes and so on. So the resulting table in SQL is this.
Where NameCount is basically count of total home owners and Homes is the count of total homes these home owners have.
Please let me know if this helps.
Thanks.
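The home-owner example above boils down to grouping twice: first count homes per owner, then count owners per home count. A small sketch of that in SQL, run through SQLite from Python; the table name homes, its column names, and the city values are hypothetical (the original tables were images), while the owners and their home counts follow the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE homes (owner TEXT, city TEXT);   -- city values are made up
INSERT INTO homes VALUES
  ('Alex', 'Austin'), ('Dave', 'Boston'), ('Julie', 'Denver'),
  ('Jim', 'Austin'), ('Jim', 'Boston'),
  ('Bob', 'Austin'), ('Bob', 'Boston'), ('Bob', 'Denver'),
  ('Pam', 'Austin'), ('Pam', 'Denver'), ('Pam', 'Miami');
""")

# Inner query: homes per owner. Outer query: owners per home count.
rows = conn.execute("""
SELECT s.Homes, COUNT(*) AS NameCount
FROM (
    SELECT owner, COUNT(*) AS Homes
    FROM homes
    GROUP BY owner
) AS s
GROUP BY s.Homes
ORDER BY s.Homes
""").fetchall()

print(rows)
```

In Power BI the inner step corresponds to a summary table with one row per owner, and the outer step is what a visual does when that table's count column is used as the axis.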
If I understood correctly, you have a table like this:
BenefitPlan | User
1 | Max
1 | Joe
2 | Max
3 | Anna
If that is right, you can simply use a bar chart (for example) where the Axis is BenefitPlan and the Value is User. When you drag a column into the Value field, it is grouped automatically (like GROUP BY in SQL), and by default the grouping method is count.
Hope it helps.
Regards.
You can use DAX to create a summary table from your data table:
https://community.powerbi.com/t5/Desktop/Creating-a-summary-table-out-of-existing-table-assistance/td-p/431485
Once you have counted plans by customer you will then have a field that will enable you to visualize the # of customers with each count.
Mock-up of the code:
PlanSummary = SUMMARIZE('vw_BenefitsCount_Plan_Participants', [Username], "PlanCount", COUNT('vw_BenefitsCount_Plan_Participants'[PLAN_ID]))

Optimal selection for ordering multiple items (parts) from multiple suppliers (vendors)

The task here is to define the optimal (as detailed below) way of ordering items (parts) from suppliers.
The relevant parts of the table schema (with some sample data) are
Items
ID NUMBER
1 Item0001
2 Item0002
3 Item0003
Suppliers
ID NAME DELIVERY DISCOUNT
1 Supplier0001 0 0
2 Supplier0002 0 0.025
3 Supplier0003 20 0
DELIVERY is the delivery charge (in dollars) levied by that supplier on each delivery. DISCOUNT is the settlement discount (as a percentage i.e. 2.5% for ID=2 above) allowed by that supplier for on time payment.
SupplierItems
SUPPLIER_ID ITEM_ID PRICE
1 2 21.67
1 5 45.54
1 7 32.97
This is the many-to-many join between suppliers and items with the price that supplier charges for that item (in dollars). Every item has at least 1 supplier but some have more than one. A supplier may have no items.
PartsRequests
ID ITEM_ID QUANTITY LOCATION_ID ORDER_ID
1 59 4 2 (null)
2 89 5 2 (null)
3 42 4 2 (null)
This table is a request from a field site for parts to be ordered and delivered by the supplier to that site. A delivery of any number of items to a site attracts a delivery charge. When the parts are ordered, the ORDER_ID is inserted into the table so we are only concerned with those where ORDER_ID IS NULL
The question is, what is the optimal way to order these parts for each LOCATION, where there are 3 optimal solutions that need to be presented to the user for selection:
1. The combination of orders with the least number of suppliers
2. The combination of orders with the lowest total cost, i.e. the sum of QUANTITY*PRICE for each item plus the DELIVERY for each order, summed over all orders, ignoring DISCOUNT
3. As item 2 but accounting for DISCOUNT
Clearly I need to determine the combinations of orders that are available and then determining the optimal ones becomes trivial but I am a bit stuck on an efficient way to deal with building the combinations.
I have built some SQL fiddles in SQL Server 2008 with random data. This one has 100 items, 10 suppliers and 100 requests. This one has 1000 items, 50 suppliers and 250 requests. The table schema is the same.
Update
I reasoned that the solution had to be recursive, and I built a nice table-valued function to get the combinations, but I ran into the hard limit of 32 on recursion (nesting) levels in SQL Server. I was uncomfortable with it anyway because it hinted more at a procedural-language solution than an RDBMS one.
So I am now playing with CTE recursion.
The root query is:
SELECT DISTINCT
'' SOLUTION_ID
,LOCATION_ID
,SUPPLIER_ID
,(subquery I haven't quite worked out) SOLE_SUPPLIER
FROM PartsRequests pr
INNER JOIN
SupplierItems si ON pr.ITEM_ID=si.ITEM_ID
WHERE pr.ORDER_ID IS NULL
This gets all the suppliers that can supply the required items and is certainly a solution, probably not optimal. The subquery sets a flag if the supplier is the sole supplier of any product required for that location; if so they must be part of any solution.
The recursive part is to remove suppliers one by one by means of CTE.SUPPLIER_ID<>CTE.SUPPLIER_ID and add them if they still cover all the items. The SOLUTION_ID will be a CSV list of the suppliers removed, partly to uniquely identify each solution and partly to check against so I get combinations instead of permutations.
Still working on the details, the purpose of this update was to allow the Community to say "Yay, looks like that will work" or, alternatively "You moron, that won't work because ..."
Thanks
This is a more general answer (as in, not sql) as I think solving this problem will require something more powerful. Your first scenario is to select a minimum number of suppliers. This problem can be seen as a set cover problem as you are trying to cover all demands per site with the suppliers. This problem is already NP-complete.
Your third scenario seems to be basically the same as the second. You just have to take the discount into account in the prices, assuming you pay on time for every order.
The second scenario is at least NP-hard as I see a lot of resemblance with the facility location problem. You are trying to decide which suppliers (facilities) to use (open) to cover your orders (demands) based on their prices and delivery costs (opening costs).
Enumerating your possible solutions seems infeasible as with 10 suppliers, you have 2^10 possibilities of using them, further complicated by the distribution of demands internally.
I would suggest some dynamic programming to first select the suppliers that you have to use (=they are the only ones that deliver a specific thing), eliminating some possibilities (if the cost for supplier A +delivery cost A< cost for supplier B) and then trying to expand your set of possible solutions. Linear programming is also a valid train of thought.
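To make the three scenarios concrete, here is a brute-force sketch in Python for a single location. It is only meant for a handful of suppliers, since it tries every subset, and it is not the dynamic or linear programming approach suggested above. The supplier rows mirror the sample Suppliers table; the item names, prices and quantities are hypothetical, and it assumes the discount applies to the goods but not to the delivery charge.

```python
from itertools import combinations

# Supplier rows follow the sample table; everything else is made up.
suppliers = {
    1: {"delivery": 0.0, "discount": 0.0},
    2: {"delivery": 0.0, "discount": 0.025},
    3: {"delivery": 20.0, "discount": 0.0},
}
prices = {  # (supplier_id, item) -> unit price
    (1, "A"): 10.0, (2, "A"): 10.0, (3, "A"): 6.0,
    (2, "B"): 5.0, (3, "B"): 4.0,
}
requests = {"A": 4, "B": 5}  # item -> quantity for one location


def plan_cost(subset, use_discount):
    """Buy each item from the cheapest supplier in subset.

    Returns (total cost, suppliers actually used), or None when some
    requested item is not sold by anyone in the subset.
    """
    goods = {}
    for item, qty in requests.items():
        offers = []
        for s in subset:
            if (s, item) in prices:
                rate = 1 - suppliers[s]["discount"] if use_discount else 1.0
                offers.append((prices[(s, item)] * qty * rate, s))
        if not offers:
            return None
        cost, s = min(offers)
        goods[s] = goods.get(s, 0.0) + cost
    # One delivery charge per supplier that actually ships something.
    delivery = sum(suppliers[s]["delivery"] for s in goods)
    return sum(goods.values()) + delivery, sorted(goods)


def all_plans(use_discount):
    """Evaluate every non-empty supplier subset that covers all requests."""
    plans = []
    for size in range(1, len(suppliers) + 1):
        for subset in combinations(suppliers, size):
            plan = plan_cost(subset, use_discount)
            if plan is not None:
                plans.append(plan)
    return plans


fewest = min(all_plans(False), key=lambda p: len(p[1]))  # scenario 1
cheapest = min(all_plans(False))                         # scenario 2
cheapest_discounted = min(all_plans(True))               # scenario 3

print(fewest, cheapest, cheapest_discounted)
```

With this data the undiscounted optimum uses supplier 3 alone, while the discount tips the balance to supplier 2, which shows why scenarios 2 and 3 can give different answers even though they are solved the same way.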

SQL SUM with Repeating Sub Entries - Best Practice?

I hit this issue regularly but here is an example....
I have Order and Delivery tables. Each order can have one to many deliveries.
I need to report totals based on the Order Table but also show deliveries line by line.
I can write the SQL and associated Access Report for this with ease ....
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
until I get to the summing element. I obviously only want to sum each Order once, not the 1-many times there are deliveries for that order.
e.g. The SQL might return the following based on 2 Orders (ignore the banalness of the report, this is very much simplified)
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
South 3 £50 23-04-2012
I would want to report:
Total Sales North - £173
Delivery 12-04-2012
Delivery 14-04-2012
Delivery 01-05-2012
Delivery 03-05-2012
Delivery 07-05-2012
Total Sales South - £50
Delivery 23-04-2012
The bit I'm referring to is the calculation of the £173 and £50, the first of which obviously shouldn't be £419!
In the past I've used things like MAX (for a given Order) but that seems like a fudge.
Surely there must be a regular answer to this seemingly common problem but I can't find one.
I don't necessarily need the code - just a helpful point in the right direction.
Many thanks,
Chris.
A ROLLUP operator may not look pretty. However, it would compute the regular aggregates that you see now, and it would also show the subtotals of the orders. This is what you're looking for.
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
GROUP BY xxx
WITH ROLLUP;
I'm not exactly sure how the rest of your query is set up, but it would look something like this:
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
NULL NULL £419 NULL
I believe what you want is called a windowing function for your aggregate operation. It looks like the following:
SELECT xxx, SUM(Value) OVER (PARTITION BY Order.Region) as OrderTotal
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
Here's the MSDN article. The PARTITION BY tells the SUM to be done separately for each distinct Order.Region.
Edit: I just noticed that I missed what you said about orders being counted multiple times. One thing you could do is SUM() the values before joining, as a CTE (guessing at your schema a bit):
WITH RegionOrders AS (
-- Total each region from the Order table alone, before deliveries multiply the rows
SELECT Region, OrderNo, Value, SUM(Value) OVER (PARTITION BY Region) AS RegionTotal
FROM Order
)
SELECT Region, OrderNo, Value, DeliveryDate, RegionTotal
FROM RegionOrders RO
INNER JOIN Delivery D on D.OrderNo = RO.OrderNo
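A runnable sketch of the same "total first, join afterwards" idea, using SQLite from Python with the sample rows from the question. Two things differ from the query above and are assumptions of this sketch: the table is called Orders because Order is a reserved word, and the regional total comes from a plain GROUP BY subquery rather than a window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders   (Region TEXT, OrderNo INTEGER PRIMARY KEY, Value INTEGER);
CREATE TABLE Delivery (OrderNo INTEGER, DeliveryDate TEXT);

INSERT INTO Orders VALUES ('North', 1, 100), ('North', 2, 73), ('South', 3, 50);
INSERT INTO Delivery VALUES
  (1, '12-04-2012'), (1, '14-04-2012'),
  (2, '01-05-2012'), (2, '03-05-2012'), (2, '07-05-2012'),
  (3, '23-04-2012');
""")

# The regional total is computed from Orders alone, so each order is
# counted once no matter how many deliveries it has.
rows = conn.execute("""
SELECT o.Region, o.OrderNo, o.Value, d.DeliveryDate, t.RegionTotal
FROM Orders o
JOIN (
    SELECT Region, SUM(Value) AS RegionTotal
    FROM Orders
    GROUP BY Region
) AS t ON t.Region = o.Region
LEFT JOIN Delivery d ON d.OrderNo = o.OrderNo
ORDER BY o.Region, o.OrderNo, d.DeliveryDate
""").fetchall()

# Collapse the per-delivery rows back to one total per region.
totals = {region: total for region, _, _, _, total in rows}
print(totals)
```

Every delivery line carries its region's total, so the report can print the total once in the group header and list the deliveries underneath.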