SQL unique mapping of columns

I have a database with n products, where m units of each product are sold on different dates.
For example, bags are sold on a daily basis: 5 on some days, 6 on others, and so on.
Sample data:
+---------+----------+-------+
| Product | UnitSold | Date |
+---------+----------+-------+
| bag | 1 | 1 jun |
| wallet | 2 | 2 jun |
| purse | 3 | 3 jun |
| bag | 4 | 4 jun |
| shoes | 3 | 4 jun |
| Shirt | 2 | 1 jun |
| bag | 5 | 2 jun |
| shirt | 6 | 3 jun |
| Purse | 1 | 1 jun |
+---------+----------+-------+
I want a unique combination of results where a particular quantity of a product is sold on a particular date. How can I do that?
Example of what I am looking for:
Result:
+---------+----------+-------+
| Product | UnitSold | Date |
+---------+----------+-------+
| bag | 1 | 1 jun |
| purse | 3 | 3 jun |
| shirt | 6 | 3 jun |
+---------+----------+-------+
I want a specific mapping of columns.
How can I do that? I am using Microsoft SQL Server 2008.

You could use rank() or row_number() if you don't care which row is returned for each product.
I loaded your data into a temp table and ran the following. It gives one result per product: rank() assigns 1 to the row with the fewest units sold. You can change the ordering to use the date or any other column. Note that rank() returns every tied row, so use row_number() if you need exactly one row per product.
select *
from (
    select *, rank() over (partition by product order by unitsold) as rnk
    from #temp a
) final
where rnk = 1
product unitsold Date rnk
bag 1 2017-06-01 1
purse 3 2017-06-02 1
shirt 2 2017-06-02 1
shoes 3 2017-06-04 1
wallet 2 2017-06-02 1
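As a rough sketch of the row_number() variant, the snippet below runs the same idea against an in-memory SQLite database from Python (SQLite 3.25 or later is needed for window functions). The table name `sales`, the column `sold_on`, the year 2017 and the use of lower() to merge "Shirt" with "shirt" are assumptions made for illustration; on SQL Server 2008 the inner query would be essentially the same.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Table/column names and the year are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table sales (product text, unitsold integer, sold_on text)")
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        ("bag", 1, "2017-06-01"), ("wallet", 2, "2017-06-02"),
        ("purse", 3, "2017-06-03"), ("bag", 4, "2017-06-04"),
        ("shoes", 3, "2017-06-04"), ("Shirt", 2, "2017-06-01"),
        ("bag", 5, "2017-06-02"), ("shirt", 6, "2017-06-03"),
        ("Purse", 1, "2017-06-01"),
    ],
)

# row_number() numbers the rows inside each product, so rn = 1 keeps
# exactly one row per product even when two rows tie on unitsold.
one_per_product = conn.execute(
    """
    select product, unitsold, sold_on
    from (
        select lower(product) as product, unitsold, sold_on,
               row_number() over (
                   partition by lower(product)
                   order by unitsold, sold_on
               ) as rn
        from sales
    ) ranked
    where rn = 1
    order by product
    """
).fetchall()

for row in one_per_product:
    print(row)
```

Changing the `order by` inside the window (for example to `sold_on desc`) changes which row is kept for each product.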

Related

Find Aggregated Data Between Two Dates in Two Tables Where One is Updated Weekly and Other is Updated Hourly

I have data in two different tables. One is updated every week (or once in the middle of the week if needed), and the other is updated every hour or so because it has more data. The first table looks like this:
agent_id | rank | ranking_date
---------------------------
1 | 1 | 2022-03-21
2 | 2 | 2022-03-21
1 | 4 | 2022-03-14
2 | 3 | 2022-03-14
1 | 2 | 2022-03-10
And the second table contains detailed information on sales.
agent_id | call_id | talk_time | product_sold | amount | call_date
------------------------------------------------------------------
1 | 1 | 13 | 1 | 53 |2022-03-10
1 | 2 | 24 | 2 | 2 |2022-03-10
2 | 3 | 43 | 4 | 11 |2022-03-10
1 | 4 | 31 | - | 0 |2022-03-10
2 | 5 | 12 | - | 0 |2022-03-10
1 | 6 | 11 | - | 0 |2022-03-11
1 | 7 | 35 | 2 | 79 |2022-03-11
2 | 8 | 76 | - | 0 |2022-03-11
1 | 9 | 42 | 1 | 23 |2022-03-11
2 | 10 | 69 | - | 0 |2022-03-11
How can I merge the two tables to get their aggregated information? Remember that the ranks change at the beginning of every week, while the sales happen every day. The rankings can also be changed in the middle of the week if needed. What I am trying to create is an aggregated table for understanding the sales by each agent, something like this:
agent_id | rank | ranking_date | total_calls_handled | total_talktime | total_amount
------------------------------------------------------------------------------------
1 | 1 | 2022-03-21 | 100 | 875 | 3000 (this is 3/21 - today)
2 | 2 | 2022-03-21 | 120 | 576 | 3689 (this is 3/21 - today)
1 | 4 | 2022-03-14 | 210 | 246 | 1846 (this is 3/14 - 3/21)
2 | 3 | 2022-03-14 | 169 | 693 | 8562 (this is 3/14 - 3/21)
1 | 2 | 2022-03-10 | 201 | 559 | 1749 (this is 3/7 - 3/10)
So the data is aggregated for each agent over 3/7 - 3/10, 3/10 - 3/14, then 3/14 - 3/21. Also, if the latest ranking date is 2022-03-21 and today is 2022-03-23, the query should return the aggregation up to today.
[Edit]: added table and data details
Table and data details:
Rankings table:
agent_id: unique_id of the agent
rank: rank assigned to the agent, updated every Monday or when needed
ranking_date: date when the agent's ranking was last updated (automatically every Monday, or when needed)
Sales Table:
agent_id: unique_id of the agent
call_id: unique_id for a call
talk_time: duration of the call
product_sold: unique_id of the product sold (- if the agent did not make a sale)
amount: commission earned by the agent (so the same product_id can have different amounts; 0 if the agent did not make a sale)
call_date: date on which the call was made
[Edit 2]: Here is SQLFiddle.
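Here we join rows where ranking_date and call_date fall in the same week. If calls are made on Sundays, check that they fall in the week you expect, because the first day of the week depends on the server settings.
Note that the join below matches on the week only. Add "and r.agent_id = s.agent_id" to the join condition to keep each agent's calls separate; without it, every call in the week is counted for every ranking row in that week, which is why the sample output shows 10 calls for agent 1.
The syntax in the query is for SQL Server, matching the SQL Fiddle given. For Redshift, you will need to change the join line to
on date_part(w,r.ranking_date) = date_part(w,s.call_date)
which should be compatible with Amazon Redshift.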
select
r.agent_id,
r.rank,
r.ranking_date,
count(s.call_id) TotalCalls,
sum(s.talk_time) TotalTime,
sum(s.amount) TotalAmount
from rankings r
left join sales s
on datename(ww,r.ranking_date)= datename(ww,s.call_date)
group by
r.agent_id,
r.rank,
r.ranking_date
GO
agent_id | rank | ranking_date | TotalCalls | TotalTime | TotalAmount
-------: | ---: | :----------- | ---------: | --------: | ----------:
1 | 1 | 2022-03-21 | 0 | null | null
1 | 2 | 2022-03-10 | 10 | 356 | 168
1 | 4 | 2022-03-14 | 0 | null | null
2 | 2 | 2022-03-21 | 0 | null | null
2 | 3 | 2022-03-14 | 0 | null | null
db<>fiddle here
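Because rankings can also change mid-week, matching on the week number may not always line up with the ranking periods. A possible alternative, sketched below with Python's sqlite3 module (SQLite 3.25+ for window functions), treats each ranking as valid from its ranking_date until the agent's next ranking_date, with the latest ranking left open-ended so it runs up to today. That interpretation of the periods is an assumption, and the rank column is renamed `agent_rank` here purely for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table rankings (agent_id integer, agent_rank integer, ranking_date text)")
conn.execute(
    "create table sales (agent_id integer, call_id integer, talk_time integer,"
    " product_sold integer, amount integer, call_date text)"
)
conn.executemany("insert into rankings values (?, ?, ?)", [
    (1, 1, "2022-03-21"), (2, 2, "2022-03-21"),
    (1, 4, "2022-03-14"), (2, 3, "2022-03-14"),
    (1, 2, "2022-03-10"),
])
conn.executemany("insert into sales values (?, ?, ?, ?, ?, ?)", [
    (1, 1, 13, 1, 53, "2022-03-10"), (1, 2, 24, 2, 2, "2022-03-10"),
    (2, 3, 43, 4, 11, "2022-03-10"), (1, 4, 31, None, 0, "2022-03-10"),
    (2, 5, 12, None, 0, "2022-03-10"), (1, 6, 11, None, 0, "2022-03-11"),
    (1, 7, 35, 2, 79, "2022-03-11"), (2, 8, 76, None, 0, "2022-03-11"),
    (1, 9, 42, 1, 23, "2022-03-11"), (2, 10, 69, None, 0, "2022-03-11"),
])

# lead() finds each agent's next ranking date, which closes the period.
# A NULL next date means the ranking is still current (open-ended).
summary = conn.execute(
    """
    with periods as (
        select agent_id, agent_rank, ranking_date,
               lead(ranking_date) over (
                   partition by agent_id order by ranking_date
               ) as next_ranking_date
        from rankings
    )
    select p.agent_id, p.agent_rank, p.ranking_date,
           count(s.call_id)  as total_calls,
           sum(s.talk_time)  as total_talktime,
           sum(s.amount)     as total_amount
    from periods p
    left join sales s
      on s.agent_id = p.agent_id
     and s.call_date >= p.ranking_date
     and (p.next_ranking_date is null or s.call_date < p.next_ranking_date)
    group by p.agent_id, p.agent_rank, p.ranking_date
    order by p.ranking_date desc, p.agent_id
    """
).fetchall()

for row in summary:
    print(row)
```

With the sample data, only agent 1's 2022-03-10 ranking picks up calls (6 of them), since agent 2 has no ranking row that starts on or before the sample call dates.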

How to get count of particular column value from total number of records and display difference in two different columns in SQL Server

I am trying to get the difference between the total number of records and a column (IsRegistered), to get month-wise metrics of how many users registered in a particular month and how many are pending.
Actual Data
| Inserted On | IsRegistered |
+-------------+--------------+
| 10-01-2020 | 1 |
| 15-01-2020 | 1 |
| 17-01-2020 | null |
| 17-02-2020 | 1 |
| 21-02-2020 | null |
| 04-04-2020 | null |
| 18-04-2020 | null |
| 19-04-2020 | 1 |
Expected output: as shown in the actual data, out of 8 users (records), 2 are registered in January and 6 are not; by February a total of 3 are registered (January's 2 plus February's 1) and 5 are not, and so on.
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
But when a new record is added in a new month, it should not change the previous rows of the output. For example, after adding a new record with month May and IsRegistered as NULL, the Not Registered values should be as shown below, because the new record was added in a new month.
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
| 2020 | May | 4 | 5 |
And if the new record has month May and IsRegistered as 1 (true), then the output should be as follows:
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
| 2020 | May | 5 | 4 |
I managed to write a query but didn't get the expected output. What changes do I have to make to get the expected output?
select year(dateinserted) as [Year], datename(month,dateinserted) as [Month],
coalesce(sum(cast(isregistered as int)), 0) as Authenticated,
sum(case when isregistered is null then 1 else 0 end) as UnAuthenticated
from table_name where IsRegistered is not null
group by year(dateinserted), datename(month,dateinserted)
order by year(dateinserted), month(min(dateinserted));
Output I got after executing the above query:
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 1 |
| 2020 | Feb | 1 | 1 |
| 2020 | April | 1 | 2 |
Hmmm . . . You seem to want a cumulative sum of the counts (the values are 1 or NULL, so count() works). For the second column, take the difference between that and the total number of rows:
select year(dateinserted) as [Year],
datename(month, dateinserted) as [Month],
count(isregistered) as registered_in_month,
sum(count(isregistered)) over (order by min(dateinserted)) as registered_up_to_month,
sum(count(*)) over () - sum(count(isregistered)) over (order by min(dateinserted)) as not_yet_registered
from table_name
group by year(dateinserted), datename(month, dateinserted)
order by year(dateinserted), month(min(dateinserted));
Here is a db<>fiddle.
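To check the idea against the sample data, here is a small sketch using Python's sqlite3 module (SQLite 3.25+). It aggregates per month in a CTE first and then applies the running sum, which is a slightly different shape from the query above but computes the same numbers. The table name `registrations` and the ISO date strings are assumptions, and strftime() stands in for year()/datename(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table registrations (dateinserted text, isregistered integer)")
conn.executemany("insert into registrations values (?, ?)", [
    ("2020-01-10", 1), ("2020-01-15", 1), ("2020-01-17", None),
    ("2020-02-17", 1), ("2020-02-21", None), ("2020-04-04", None),
    ("2020-04-18", None), ("2020-04-19", 1),
])

result = conn.execute(
    """
    with monthly as (
        -- count() skips NULLs, so this counts registered rows only
        select strftime('%Y', dateinserted) as yr,
               strftime('%m', dateinserted) as mon,
               count(isregistered) as registered_in_month
        from registrations
        group by yr, mon
    ),
    running as (
        -- running total of registrations up to and including each month
        select yr, mon,
               sum(registered_in_month) over (order by yr, mon) as registered
        from monthly
    )
    select yr, mon, registered,
           (select count(*) from registrations) - registered as not_registered
    from running
    order by yr, mon
    """
).fetchall()

for row in result:
    print(row)
```

For the eight sample rows this yields 2/6 for January, 3/5 for February and 4/4 for April.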
You can use a self join and an analytic function as follows:
Select year(t.dateinserted) as yr,
datename(month, t.dateinserted) as mnth,
Sum(count(t.isregistered))
over (order by min(t.dateinserted)) as registered,
Tt.cnt - Sum(count(t.isregistered))
over (order by min(t.dateinserted)) as not_registered
From your_table t
Join (select t.*,
Count(*) over () as cnt
From your_table t) tt on t.dateinserted = tt.dateinserted
group by year(t.dateinserted), datename(month, t.dateinserted), tt.cnt
order by year(t.dateinserted), month(min(t.dateinserted));

Create a temp table with multiple conditions

I'm struggling with creating a temporary table with multiple conditions.
Let's call this main table A. I want to pull data from this table and output each distinct account, with its last purchase date and last payment date, to a temporary table.
+---+--------+-----------+----------+
| | Acct | Trans_Date|Trans_code|
+---+--------+-----------+----------+
| 1 | ABC | July 31 | Purchase |
| 2 | ABC | Nov 5 | Payment |
| 3 | DEF | Mar 1 | Purchase |
| 4 | ABC | June 5 | Purchase |
| 5 | GFH | Feb 7 | Payment |
| 6 | GFH | Mar 9 | Purchase |
| 7 | DEF | Aug 8 | Payment |
| 8 | GFH | Mar 9 | Purchase |
| 9 | DEF | Aug 8 | Payment |
+---+--------+---------+----------+
Output result
+---+-------+----------------+--------------+
| | Acct | Last_trans_date|Last_transpay |
+---+-------+----------------+--------------+
| 1 | ABC | July 31 | Nov 5 |
| 2 | DEF | Mar 1 | Aug 8 |
| 3 | GFH | Mar 9 | Feb 7 |
+---+------+-----------------+--------------+
I read that using a WITH clause could be an option, but I am struggling to understand it.
You can use conditional aggregation like so:
select acct,
max(case when trans_code = 'Purchase' then trans_date end) as last_purchase,
max(case when trans_code = 'Payment' then trans_date end) as last_payment
from mytable
group by acct
The syntax to insert the result of a query to another table varies across databases. In many of them, you can use:
create table newtable as
select ... -- above query
SQL Server is a notable exception, where you would need:
select ...
into newtable
from ...
group by ...
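A minimal end-to-end sketch of this, using Python's sqlite3 module so it can be run without a server. The year 2020 and the ISO date format are assumptions (the question only gives month and day), and `last_trans` is a made-up name for the temporary table. SQLite uses the `create temp table ... as select` form; in SQL Server the equivalent would be `select ... into #last_trans`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (acct text, trans_date text, trans_code text)")
conn.executemany("insert into mytable values (?, ?, ?)", [
    ("ABC", "2020-07-31", "Purchase"), ("ABC", "2020-11-05", "Payment"),
    ("DEF", "2020-03-01", "Purchase"), ("ABC", "2020-06-05", "Purchase"),
    ("GFH", "2020-02-07", "Payment"),  ("GFH", "2020-03-09", "Purchase"),
    ("DEF", "2020-08-08", "Payment"),  ("GFH", "2020-03-09", "Purchase"),
    ("DEF", "2020-08-08", "Payment"),
])

# Each CASE returns NULL for rows of the other type, and max() ignores
# NULLs, so every column only sees the transactions of its own type.
conn.execute(
    """
    create temp table last_trans as
    select acct,
           max(case when trans_code = 'Purchase' then trans_date end) as last_purchase,
           max(case when trans_code = 'Payment'  then trans_date end) as last_payment
    from mytable
    group by acct
    """
)

result = conn.execute(
    "select acct, last_purchase, last_payment from last_trans order by acct"
).fetchall()

for row in result:
    print(row)
```

Storing the dates in a sortable format matters here: max() on strings such as 'July 31' would compare them alphabetically rather than chronologically.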
You can use conditional aggregation:
select acct, max(trans_date),
max(case when trans_code = 'Payment' then trans_date end)
from t
group by acct;
You can then insert this into an existing table or use the appropriate mechanism for your database to save the result as a new table.

MS SQL Server: Load All Data vs Aggregate with +1 round trip

I would love to get your opinion on this problem.
I need to show the list of order records for a particular date/time range, then summarise it with the number of orders compared with the "last" period. "Last" can mean either last month or last year.
Since I am going to show the list of order records, I am thinking of getting the records from last month or last year in one hit (i.e. together with the records of the current date/time range).
Or, alternatively, I can:
Get the records of the current date/time range, THEN
Get the total number of orders (using an aggregate) for last month or last year
The alternative means there are 2 round trips to the database (but less data to return). Or should I stick with my current method (loading all records, including those from last month or last year)?
NOTE: The website and the SQL server is hosted in Microsoft Azure Cloud. But we might switch to AWS in the future.
Thanks
Input example (some fields are omitted including time for simplicity)
----------------------------------------------------------------
| Warehouse Id | Order Id | Product Id | Quantity | Order Date |
----------------------------------------------------------------
| 1 | 10 | 1 | 10 | 2016-09-25 |
| 1 | 9 | 5 | 5 | 2016-09-24 |
| 1 | 8 | 4 | 8 | 2016-09-23 |
| 1 | 7 | 6 | 2 | 2016-09-23 |
| 1 | 6 | 8 | 1 | 2016-09-23 |
| 1 | 5 | 1 | 2 | 2016-09-22 |
| 1 | 4 | 1 | 2 | 2016-09-21 |
| 1 | 3 | 5 | 10 | 2016-09-21 |
| 1 | 2 | 5 | 15 | 2016-08-12 |
| 1 | 1 | 5 | 5 | 2016-08-10 |
----------------------------------------------------------------
The desired output:
Input:
WarehouseId: 1
StartDate: 2016-09-01 EndDate: 2016-09-30
Comparison type: Last Month (i.e. StartDate: 2016-08-01 EndDate: 2016-08-31)
Output:
Warehouse: xxx
-------------------------------------------------
| Order Id | Product Id | Quantity | Order Date |
-------------------------------------------------
| 10 | 1 | 10 | 2016-09-25 |
| 9 | 5 | 5 | 2016-09-24 |
| 8 | 4 | 8 | 2016-09-23 |
| 7 | 6 | 2 | 2016-09-23 |
| 6 | 8 | 1 | 2016-09-23 |
| 5 | 1 | 2 | 2016-09-22 |
| 4 | 1 | 2 | 2016-09-21 |
| 3 | 5 | 10 | 2016-09-21 |
-------------------------------------------------
Total Order: 40 (increase 100% from last month)
So, what I am doing now is to get ALL records from 2016-08-01 to 2016-09-30. That way I can avoid 2 round trips.
Alternatively, I can do the following:
1. Get record from 2016-09-01 to 2016-09-30
var recs = (from rec in ef.tblOrders
where (rec.WarehouseId == whsId) && (rec.OrderDate >= startDate) && (rec.OrderDate <= endDate)
select rec).ToList();
2. Then do the SUM of total order from 2016-08-01 to 2016-08-31 for comparison purposes
var recSum = (from rec in ef.tblOrders
where (rec.WarehouseId == whsId) && (rec.OrderDate >= cStartDate) && (rec.OrderDate <= cEndDate)
group rec by rec.WarehouseId into grec
select new
{
TotalQty = grec.Sum(x => x.Quantity),
}).FirstOrDefault();
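You can do this with window functions:
select o.*
from (select o.*,
             -- number of orders in the comparison period ("last month" or "last year");
             -- the parameter names mirror the variables in the question's LINQ code
             sum(case when o.OrderDate between @cStartDate and @cEndDate then 1 else 0 end) over () as last_num_orders
      from orders o
     ) o
where o.OrderDate between @startDate and @endDate;
It is unclear to me what "last" means in this context. However, you can do what you want with window functions in a single query, which would be the preferred option.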
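To make the single-round-trip idea concrete, here is a sketch with Python's sqlite3 module (SQLite 3.25+), assuming "last" means the previous month and that the comparison should use total quantity, as in the question's expected output. The snake_case table and column names and the named parameters are assumptions for this sketch; the same query shape should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (warehouse_id integer, order_id integer,"
    " product_id integer, quantity integer, order_date text)"
)
conn.executemany("insert into orders values (?, ?, ?, ?, ?)", [
    (1, 10, 1, 10, "2016-09-25"), (1, 9, 5, 5, "2016-09-24"),
    (1, 8, 4, 8, "2016-09-23"),  (1, 7, 6, 2, "2016-09-23"),
    (1, 6, 8, 1, "2016-09-23"),  (1, 5, 1, 2, "2016-09-22"),
    (1, 4, 1, 2, "2016-09-21"),  (1, 3, 5, 10, "2016-09-21"),
    (1, 2, 5, 15, "2016-08-12"), (1, 1, 5, 5, "2016-08-10"),
])

params = {
    "whs": 1,
    "cur_start": "2016-09-01", "cur_end": "2016-09-30",
    "cmp_start": "2016-08-01", "cmp_end": "2016-08-31",
}

# The inner query reads both periods once; the window sum attaches the
# comparison-period total to every row, and the outer filter then keeps
# only the current period's rows for display.
rows = conn.execute(
    """
    select order_id, product_id, quantity, order_date, prev_total_qty
    from (
        select o.*,
               sum(case when order_date between :cmp_start and :cmp_end
                        then quantity else 0 end) over () as prev_total_qty
        from orders o
        where warehouse_id = :whs
          and (order_date between :cur_start and :cur_end
               or order_date between :cmp_start and :cmp_end)
    ) o
    where order_date between :cur_start and :cur_end
    order by order_id desc
    """,
    params,
).fetchall()

current_total = sum(r[2] for r in rows)
# If the current period has no rows, the comparison total is not returned.
previous_total = rows[0][4] if rows else 0
increase_pct = (
    (current_total - previous_total) * 100.0 / previous_total
    if previous_total else None
)
print(current_total, previous_total, increase_pct)
```

Only the current period's rows travel back to the application, with the comparison total repeated on each row, so this avoids both the second round trip and loading last period's detail rows.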

Looping through table to create a new table in SQL jointly with group by Postgres

Suppose a table has the following structure
product | day | transactionid | saleprice |
------------------------------------------------ |
Apple | 1 | 239849248 | 10 |
Apple | 2 | 239834328 | 10 |
Apple | 2 | 239849249 | 10 |
Apple | 3 | 239849234 | 11 |
Banana | 1 | 239843244 | 2 |
Banana | 2 | 239843244 | 2 |
Banana | 3 | 239843244 | 3 |
Banana | 4 | 239843244 | 3 |
Orange | 1 | 239234238 | 25 |
Orange | 2 | 239234238 | 25 |
Orange | 3 | 239234238 | 25 |
Orange | 3 | 239234238 | 26 |
Orange | 3 | 239234238 | 26 |
Orange | 4 | 239234238 | 27 |
A number of products are sold every day, with multiple transactions at different prices. For each product, I am interested in a change-log of Min(SalePrice) (a change-log because this rarely changes in my data). The following query gives me, for a particular product (say Orange):
SELECT max(product), day, min(saleprice)
FROM tableabove
where product = 'Orange'
group by day
order by day asc;
Gives me:
product | day | minsaleprice |
Orange | 1 | 25 |
Orange | 2 | 25 |
Orange | 3 | 25 |
Orange | 4 | 27 |
So, I have what I need for a product I specify, but not in the way I need it. For example, for Orange I only need the days when the price changed (plus Day 1), which means it should have only two rows, for Day 1 and Day 4. I also do not know how to iterate this over all products in the table to generate a new table that looks as follows.
product | day | minsaleprice |
Apple | 1 | 10 |
Apple | 3 | 11 |
Banana | 1 | 2 |
Banana | 3 | 3 |
Orange | 1 | 25 |
Orange | 4 | 27 |
Any help is appreciated. Thanks.
I think you just want lag():
select t.*
from (select t.*,
lag(saleprice) over (partition by product order by day) as prev_saleprice
from tableabove t
) t
where prev_saleprice is null or prev_saleprice <> saleprice;
EDIT:
If you only want changes day by day, then the same idea works with an additional aggregation:
select t.*
from (select t.product, t.day, min(t.saleprice) as minsaleprice,
lag(min(t.saleprice)) over (partition by t.product order by t.day) as prev_minsaleprice
from tableabove t
group by t.product, t.day
) t
where prev_minsaleprice is null or prev_minsaleprice <> minsaleprice;
Following the guidance from Gordon Linoff, I was able to write the query as follows:
SELECT table2.*
FROM (SELECT table1.*, lag(table1.minsaleprice) OVER (PARTITION BY product ORDER BY day) AS prev_price
FROM (SELECT product, day, MIN(saleprice) AS minsaleprice FROM tableabove
GROUP BY day, product)
AS table1)
AS table2
WHERE prev_price IS NULL OR prev_price <> minsaleprice
ORDER BY product, day
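For anyone who wants to try the lag() approach without a Postgres instance, the sketch below runs it on the question's sample rows using Python's sqlite3 module (SQLite 3.25+ for window functions). It computes the daily minimum in a CTE first and then compares each day with the previous one; the CTE names `daily` and `with_prev` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tableabove (product text, day integer,"
    " transactionid integer, saleprice integer)"
)
conn.executemany("insert into tableabove values (?, ?, ?, ?)", [
    ("Apple", 1, 239849248, 10), ("Apple", 2, 239834328, 10),
    ("Apple", 2, 239849249, 10), ("Apple", 3, 239849234, 11),
    ("Banana", 1, 239843244, 2), ("Banana", 2, 239843244, 2),
    ("Banana", 3, 239843244, 3), ("Banana", 4, 239843244, 3),
    ("Orange", 1, 239234238, 25), ("Orange", 2, 239234238, 25),
    ("Orange", 3, 239234238, 25), ("Orange", 3, 239234238, 26),
    ("Orange", 3, 239234238, 26), ("Orange", 4, 239234238, 27),
])

changelog = conn.execute(
    """
    with daily as (
        -- one row per product and day with that day's lowest price
        select product, day, min(saleprice) as minsaleprice
        from tableabove
        group by product, day
    ),
    with_prev as (
        -- previous day's minimum within the same product
        select product, day, minsaleprice,
               lag(minsaleprice) over (
                   partition by product order by day
               ) as prev_minsaleprice
        from daily
    )
    select product, day, minsaleprice
    from with_prev
    where prev_minsaleprice is null            -- first day of each product
       or prev_minsaleprice <> minsaleprice    -- price changed
    order by product, day
    """
).fetchall()

for row in changelog:
    print(row)
```

The explicit `order by day` inside the window is what makes "previous" well defined; relying on the row order of a subquery is not guaranteed to work.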