Get rowset based on distinct combination of columns - sql

Given this dataset, each stock has a yearly snapshot of value.
+----+------+------+-------+-------+
| ID | Name | Year | Stock | Value |
+----+------+------+-------+-------+
| 1 | John | 2019 | ABC | 123 |
| 1 | John | 2020 | ABC | 123 |
| 1 | John | 2021 | ABC | 123 |
| 1 | John | 2021 | XYZ | 200 |
| 1 | John | 2022 | ABC | 123 |
| 1 | John | 2022 | XYZ | 200 |
| 1 | John | 2023 | ABC | 630 |
| 1 | John | 2023 | XYZ | 200 |
+----+------+------+-------+-------+
In 2019, John only holds ABC with a value of 123
In 2020, John also only holds ABC, with a value of 123 (has not changed)
In 2021, John holds ABC but has also acquired XYZ, with a value of 200
In 2022, John holds ABC and XYZ; neither value has changed.
In 2023, John holds ABC and XYZ, with ABC's value increasing to 630 and XYZ's value remaining at 200.
I would like to return rows so that
Per year, if nothing of John's portfolio has changed SINCE THE LAST YEAR, no rows are returned
If anything in John's portfolio has changed SINCE THE LAST YEAR, all his current holdings are listed
For example,
+----+------+------+-------+-------+
| ID | Name | Year | Stock | Value |
+----+------+------+-------+-------+
| 1 | John | 2019 | ABC | 123 |
| 1 | John | 2021 | ABC | 123 |
| 1 | John | 2021 | XYZ | 200 |
| 1 | John | 2023 | ABC | 630 |
| 1 | John | 2023 | XYZ | 200 |
+----+------+------+-------+-------+
How would I do this, whether it be through functions in PL/SQL or in pure SQL?

If there are not too many rows per user, then listagg() provides a convenient solution:
select ny.*
from (select name, year,
listagg(stock || ':' || value, ',') within group (order by stock) as stocks,
lag(listagg(stock || ':' || value, ',') within group (order by stock)) over (partition by name order by year) as prev_stocks,
lag(year) over (partition by name order by year) as prev_year
from t
group by name, year
) ny
where prev_year is null or prev_year <> year - 1 or prev_stocks <> stocks;
Alternatively, you can check each row individually and use an analytic function to project the information over all rows in a name/year:
select t.*
from (select t.*,
sum(case when prev_nsv_year = year - 1 then 0 else 1 end) over (partition by name, year) as num_diff,
lag(cnt) over (partition by name order by year) as prev_cnt
from (select t.*,
lag(year) over (partition by name, stock, value order by year) as prev_nsv_year,
count(*) over (partition by name, year) as cnt
from t
) t
) t
where cnt <> prev_cnt or prev_cnt is null or
num_diff > 0;
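To check the per-row approach against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for Oracle here (it needs SQLite 3.25 or later for window functions), and the aliases s and u are illustrative. It assumes a holding counts as unchanged when a row with the same name, stock and value exists for year - 1.

```python
import sqlite3

# Sample data from the question.
rows = [
    (1, "John", 2019, "ABC", 123),
    (1, "John", 2020, "ABC", 123),
    (1, "John", 2021, "ABC", 123),
    (1, "John", 2021, "XYZ", 200),
    (1, "John", 2022, "ABC", 123),
    (1, "John", 2022, "XYZ", 200),
    (1, "John", 2023, "ABC", 630),
    (1, "John", 2023, "XYZ", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer, name text, year integer, stock text, value integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)

query = """
select id, name, year, stock, value
from (select s.*,
             -- number of holdings in this year with no identical row last year
             sum(case when prev_nsv_year = year - 1 then 0 else 1 end)
                 over (partition by name, year) as num_diff,
             lag(cnt) over (partition by name order by year) as prev_cnt
      from (select t.*,
                   lag(year) over (partition by name, stock, value
                                   order by year) as prev_nsv_year,
                   count(*) over (partition by name, year) as cnt
            from t
           ) s
     ) u
where cnt <> prev_cnt or prev_cnt is null or num_diff > 0
order by name, year, stock
"""
changed = conn.execute(query).fetchall()
for row in changed:
    print(row)
```

On this data the query keeps 2019, 2021 and 2023 and drops 2020 and 2022. One caveat the sample does not exercise: prev_cnt is taken from the previous row rather than the previous year, so a year in which a holding is only dropped may return just one of the remaining rows.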

Related

Insert specific row of a table into same table ONCE only

I have a table containing the below elements:
YEAR | MONTH | COMP | COMP_DESC | FIRST_NAME | LAST_NAME | EMP_SD | CURR | METHOD | POSITION_STATUS
2021 | 2 | ABC | ABC Company | Jake | Sam | 11-01-2021 | USD | |
2021 | 5 | XYZ | XYZ Company | Neo | June | 23-09-2021 | USD | OPEN | METH_004
The result I need:
YEAR | MONTH | COMP | COMP_DESC | FIRST_NAME | LAST_NAME | EMP_SD | CURR | METHOD | POSITION_STATUS
2021 | 2 | ABC | ABC Company | Jake | Sam | 11-01-2021 | USD | |
2021 | 5 | XYZ | XYZ Company | Neo | June | 23-09-2021 | USD | OPEN | METH_004
2021 | 5 | XYZ | XYZ Company | RP_Neo | RP_June | 24-09-2021 | USD | |
When POSITION_STATUS='METH_004' AND METHOD='OPEN', I need that row to be duplicated in the same table, with the date and the names changed, JUST ONCE. When I rerun the query, I don't want any further duplication.
Below is the query I have written:
INSERT INTO Table1
(YEAR, MONTH, COMP, COMP_DESC, FIRST_NAME, LAST_NAME, EMP_SD, CURR)
SELECT T.YEAR, T.MONTH, T.COMP, T.COMP_DESC, CONCAT('RP_',FIRST_NAME) AS N1, CONCAT('RP_',LAST_NAME) AS N2,
EMP_DATE + INTERVAL '1 day' AS NXT_DAY, T.CURR FROM Table1 T WHERE POSITION_STATUS='OPEN' AND METHOD = 'METH_004'
EXCEPT
SELECT T1.POS, T1.YEAR, T1.MONTH, T1.COMP, T1.COMP_DESC, T1.COST_CENTER
FROM Table1 T1
INNER JOIN Table T2
ON T1.YEAR=T2.YEAR
AND T1.MONTH=T2.MONTH
AND T1.COMP=T2.COMP
AND T1.COMP_DESC=T2.COMP_DESC
AND T1.CURR=T2.CURR;
Can anyone suggest changes to the code to get this result? The above code runs without error, but the row is not replicated.
Thanks
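One common way to make an insert like this safe to rerun is a NOT EXISTS guard that skips rows whose copy is already present. The sketch below is not from the thread; it uses Python's built-in sqlite3 module as a stand-in database, so the date arithmetic (date(..., '+1 day')) and the || concatenation are SQLite syntax and would need adapting. It assumes that year, month, comp and the RP_-prefixed names are enough to identify an existing copy, and it filters on METHOD = 'OPEN' and POSITION_STATUS = 'METH_004' as in the sample rows (the query in the question has these two values swapped, which by itself could explain why nothing is inserted).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table table1 (
           year integer, month integer, comp text, comp_desc text,
           first_name text, last_name text, emp_sd text, curr text,
           method text, position_status text)"""
)
conn.executemany(
    "insert into table1 values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (2021, 2, "ABC", "ABC Company", "Jake", "Sam", "2021-01-11", "USD", None, None),
        (2021, 5, "XYZ", "XYZ Company", "Neo", "June", "2021-09-23", "USD", "OPEN", "METH_004"),
    ],
)

# Insert the RP_ copy only when no matching copy exists yet (assumed key:
# year, month, comp and the prefixed names).
insert_once = """
insert into table1 (year, month, comp, comp_desc, first_name, last_name, emp_sd, curr)
select t.year, t.month, t.comp, t.comp_desc,
       'RP_' || t.first_name, 'RP_' || t.last_name,
       date(t.emp_sd, '+1 day'), t.curr
from table1 t
where t.method = 'OPEN' and t.position_status = 'METH_004'
  and not exists (select 1
                  from table1 d
                  where d.year = t.year and d.month = t.month and d.comp = t.comp
                    and d.first_name = 'RP_' || t.first_name
                    and d.last_name = 'RP_' || t.last_name)
"""
conn.execute(insert_once)
conn.execute(insert_once)  # the second run finds the copy and inserts nothing

copies = conn.execute(
    "select first_name, last_name, emp_sd from table1 "
    "where substr(first_name, 1, 3) = 'RP_'"
).fetchall()
total_rows = conn.execute("select count(*) from table1").fetchone()[0]
print(copies, total_rows)
```

The guard compares against the values the insert would produce, so the statement can be run any number of times without adding a second copy.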

How to get count of particular column value from total number of records and display difference in two different columns in SQL Server

I am trying to get the difference between the total number of records and a column (Is_Registered), to get month-wise metrics of how many users have registered by a particular month and how many are still pending.
Actual Data
| Inserted On | IsRegistered |
+-------------+--------------+
| 10-01-2020 | 1 |
| 15-01-2020 | 1 |
| 17-01-2020 | null |
| 17-02-2020 | 1 |
| 21-02-2020 | null |
| 04-04-2020 | null |
| 18-04-2020 | null |
| 19-04-2020 | 1 |
Expected output: as shown in the actual data, out of 8 users (records), 2 are registered in January and 6 are not; in February a total of 3 are registered (January's 2 + February's 1) and 5 are not, and so on.
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
But when a new record is added in a new month, it should not change the earlier rows of the output. For example, after adding a new record with month May and IsRegistered NULL, the output should be as shown below, because the new record belongs to a new month.
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
| 2020 | May | 4 | 5 |
And if the new record has month as May and Is_Registered as 1(true) then the output should be as follows
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
| 2020 | May | 5 | 4 |
I managed to write a query but didn't get the expected output. What changes do I have to make to get it?
select year(dateinserted) as [Year], datename(month,dateinserted) as [Month],
coalesce(sum(cast(isregistered as int)), 0) as Authenticated,
sum(case when isregistered is null then 1 else 0 end) as UnAuthenticated
from table_name where IsRegistered is not null
group by year(dateinserted), datename(month,dateinserted)
order by year(dateinserted), month(min(dateinserted));
Output I got after executing above query -
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 1 |
| 2020 | Feb | 1 | 1 |
| 2020 | April | 1 | 2 |
Hmmm . . . You seem to want a cumulative sum of the counts (the values are 1 or NULL, so count() works). For the second column, take the difference between the total number of rows and that cumulative sum:
select year(dateinserted) as [Year],
datename(month, dateinserted) as [Month],
count(isregistered) as registered_in_month,
sum(count(isregistered)) over (order by min(dateinserted)) as registered_up_to_month,
sum(count(*)) over () - sum(count(isregistered)) over (order by min(dateinserted)) as not_yet_registered
from table_name
group by year(dateinserted), datename(month, dateinserted)
order by year(dateinserted), month(min(dateinserted));
You can also join against the total row count and use an analytic function, as follows:
select year(t.dateinserted) as yr,
datename(month, t.dateinserted) as mnth,
sum(count(t.isregistered))
over (order by min(t.dateinserted)) as registered,
tt.cnt - sum(count(t.isregistered))
over (order by min(t.dateinserted)) as not_registered
from your_table t
cross join (select count(*) as cnt
from your_table) tt
group by year(t.dateinserted), datename(month, t.dateinserted), tt.cnt
order by year(t.dateinserted), month(min(t.dateinserted));
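The cumulative-sum idea from the answers above can be tried on the original eight sample rows with a small sketch that uses Python's built-in sqlite3 module. SQLite stands in for SQL Server here, so strftime replaces year() and datename(), and the per-month counts are computed in a CTE (named monthly, an illustrative choice) before the running total is taken. The dates are stored in ISO format, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (dateinserted text, isregistered integer)")
conn.executemany(
    "insert into table_name values (?, ?)",
    [
        ("2020-01-10", 1),
        ("2020-01-15", 1),
        ("2020-01-17", None),
        ("2020-02-17", 1),
        ("2020-02-21", None),
        ("2020-04-04", None),
        ("2020-04-18", None),
        ("2020-04-19", 1),
    ],
)

query = """
with monthly as (
    -- count() skips NULLs, so this is the number registered in each month
    select strftime('%Y', dateinserted) as yr,
           strftime('%m', dateinserted) as mon,
           count(isregistered) as registered_in_month
    from table_name
    group by strftime('%Y', dateinserted), strftime('%m', dateinserted)
)
select yr, mon,
       sum(registered_in_month) over (order by yr, mon) as registered,
       (select count(*) from table_name)
           - sum(registered_in_month) over (order by yr, mon) as not_registered
from monthly
order by yr, mon
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

For these rows the registered column runs 2, 3, 4 and not_registered runs 6, 5, 4, matching the first expected table in the question.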

How to Create a Flag Based on Date Values in Hive

I have a sample table as follows:
| name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | |
| John | 10/1/2018 | 11/1/2018 | |
| John | 12/1/2018 | 12/20/2018 | |
| Ron | 3/1/2017 | 9/1/2017 | |
| Ron | 5/1/2018 | 10/1/2018 | |
| Jacob | 6/10/2018 | 6/12/2018 | |
What I want in the output: if a person has a 'startdate' within 60 days (or 2 months) of one of that person's 'enddate' values, set flg to 1 for that person; otherwise set flg to 0.
For example: John has a record of startdate on December 1st; which is within 60 days of one of the enddate for this person (November 1st 2018). So, the flg for this person is set to 1.
So the output should look like this:
| Name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | 1 |
| John | 10/1/2018 | 11/1/2018 | 1 |
| John | 12/1/2018 | 12/20/2018 | 1 |
| Ron | 3/1/2017 | 9/1/2017 | 0 |
| Ron | 5/1/2018 | 10/1/2018 | 0 |
| Jacob | 6/10/2018 | 6/12/2018 | 0 |
Any idea please?
Date Functions: Use datediff and case
select Name,startdate,enddate,
case when datediff(enddate,startdate) < 60 then 1 else 0 end flag
from table
If you are comparing the previous row's enddate, use lag()
select Name,startdate,enddate,
case when datediff(startdate,prev_enddate) < 60 then 1 else 0 end flag
from
(
select Name,startdate,enddate,
lag(enddate) over(partition by Name order by startdate,enddate) as prev_enddate
from table
) t
Use lag to get the enddate of the previous row (per name). After this the flag can be set per name using max window function with a case expression that checks to see if the 60 day diff is satisfied at least once per name.
select name
,startdate
,enddate
,max(case when datediff(startdate,prev_end_dt) < 60 then 1 else 0 end) over(partition by name) as flag
from (select t.*
,lag(enddate) over(partition by name order by startdate) as prev_end_dt
from table t
) t
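The lag-plus-max approach above can be checked against the sample rows with a minimal sketch that uses Python's built-in sqlite3 module. SQLite is a stand-in for Hive here: the difference of two julianday() values replaces datediff(), the dates are rewritten in ISO format, and the table name stays and the alias with_prev are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stays (name text, startdate text, enddate text)")
conn.executemany(
    "insert into stays values (?, ?, ?)",
    [
        ("John", "2018-06-01", "2018-07-01"),
        ("John", "2018-10-01", "2018-11-01"),
        ("John", "2018-12-01", "2018-12-20"),
        ("Ron", "2017-03-01", "2017-09-01"),
        ("Ron", "2018-05-01", "2018-10-01"),
        ("Jacob", "2018-06-10", "2018-06-12"),
    ],
)

query = """
select name, startdate, enddate,
       -- 1 for every row of a name if any gap to the previous enddate is under 60 days
       max(case when julianday(startdate) - julianday(prev_end_dt) < 60
                then 1 else 0 end) over (partition by name) as flag
from (select s.*,
             lag(enddate) over (partition by name order by startdate) as prev_end_dt
      from stays s) with_prev
order by name, startdate
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

A first row per name has no previous enddate, so the comparison yields NULL and the case expression falls through to 0.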

How can I get average of groupings in a table and store the result back to the original table in SQL

I have the following table:
| Country | Month | Revenue |
|---------|-------|---------|
| US | Jan | 100 |
| US | Feb | 200 |
| US | Mar | 300 |
| Canada | Jan | 200 |
| Canada | Feb | 400 |
| Canada | Mar | 500 |
I need to get average revenue per country and store this value back to the original table to get the following output:
| Country | Month | Revenue | Average |
|---------|-------|---------|---------|
| US | Jan | 100 | 200 |
| US | Feb | 200 | 200 |
| US | Mar | 300 | 200 |
| Canada | Jan | 200 | 366.6 |
| Canada | Feb | 400 | 366.6 |
| Canada | Mar | 500 | 366.6 |
What is the best way to accomplish this in SQL? Is it better to use partition by?
The best way to do this uses window functions:
select t.*, avg(revenue) over (partition by country) as avg_revenue
from t;
To actually do the computation and store it back requires an update. Although there are other methods, the following is standard SQL:
update t
set average = (select avg(revenue) from t t2 where t.country = t2.country);
EDIT:
In T-SQL, you can somewhat improve the performance by doing:
with toupdate as (
select t.*,
avg(t.revenue) over (partition by t.country) as new_average
from t
)
update toupdate
set average = new_average;
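Both the window-function query and the correlated-subquery update can be tried on the sample table with a minimal sketch that uses Python's built-in sqlite3 module. SQLite is only a stand-in here, and the average column is assumed to exist already; the updatable CTE form is T-SQL specific and is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (country text, month text, revenue real, average real)")
conn.executemany(
    "insert into t (country, month, revenue) values (?, ?, ?)",
    [
        ("US", "Jan", 100), ("US", "Feb", 200), ("US", "Mar", 300),
        ("Canada", "Jan", 200), ("Canada", "Feb", 400), ("Canada", "Mar", 500),
    ],
)

# Read-only version: the average is computed per country and repeated on each row.
windowed = conn.execute(
    "select country, month, revenue, "
    "avg(revenue) over (partition by country) as avg_revenue from t"
).fetchall()

# Stored version: a correlated subquery writes the same value back to the table.
conn.execute(
    "update t set average = "
    "(select avg(t2.revenue) from t t2 where t2.country = t.country)"
)
stored = dict(conn.execute("select distinct country, average from t").fetchall())
print(stored)
```

The stored value for Canada is 1100 / 3, which the question's expected output shows shortened to 366.6.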

Show only one record, if value same in another column SQL

I have a table with 5 columns like this:
| ID | NAME | PO_NUMBER | DATE | STATS |
| 1 | Jhon | 160101-001 | 2016-01-01 | 7 |
| 2 | Jhon | 160101-002 | 2016-01-01 | 7 |
| 3 | Jhon | 160102-001 | 2016-01-02 | 7 |
| 4 | Jane | 160101-001 | 2016-01-01 | 7 |
| 5 | Jane | 160102-001 | 2016-01-02 | 7 |
| 6 | Jane | 160102-002 | 2016-01-02 | 7 |
| 7 | Jane | 160102-003 | 2016-01-02 | 7 |
I need to display all rows, but show the STATS value only once for each NAME and DATE combination, with NULL on the remaining rows.
Like this
| ID | NAME | PO_NUMBER | DATE | STATS |
| 1 | Jhon | 160101-001 | 2016-01-01 | 7 |
| 2 | Jhon | 160101-002 | 2016-01-01 | null |
| 3 | Jhon | 160102-001 | 2016-01-02 | 7 |
| 4 | Jane | 160101-001 | 2016-01-01 | 7 |
| 5 | Jane | 160102-001 | 2016-01-02 | 7 |
| 6 | Jane | 160102-002 | 2016-01-02 | null |
| 7 | Jane | 160102-003 | 2016-01-02 | null |
I've had trouble getting the result I hoped for. Thanks
From your sample data, it appears you only want to show the stats for po_number ending with 001. If so, this should be the easiest approach:
select id, name, po_number, date,
case when right(po_number, 3) = '001' then stats else null end as stats
from yourtable
If instead you want to order by the po_number, then here's one option using row_number:
select id, name, po_number, date,
case when rn = 1 then stats else null end as stats
from (
select *, row_number() over (partition by name, date order by po_number) as rn
from yourtable
) t
Since you are using SQL Server 2012, you can use the LEAD() or LAG() window function to compare the DATE value with the one on the neighbouring row:
select *,
STATS = case when t.DATE = LAG(DATE) OVER(ORDER BY ID)
then NULL
else STATS
end
from yourtable t
Use the code below:
;with temp as (
select id,name ,PO_NUMBER ,DATE, STATS,
LAG (STATS, 1, 0)
OVER (PARTITION BY name ,DATE ORDER BY id) AS PrevSTATS
from tableName
)
select id,name ,PO_NUMBER ,DATE,
case when STATS = PrevSTATS then null
else STATS end as STATS
from temp
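The row_number() approach from the first answer can be checked against the sample rows with a minimal sketch that uses Python's built-in sqlite3 module. SQLite is a stand-in for SQL Server here, and the aliases y and ranked are illustrative. It assumes the row to keep within each NAME and DATE group is the one with the lowest PO_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourtable (id integer, name text, po_number text, date text, stats integer)"
)
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?, ?)",
    [
        (1, "Jhon", "160101-001", "2016-01-01", 7),
        (2, "Jhon", "160101-002", "2016-01-01", 7),
        (3, "Jhon", "160102-001", "2016-01-02", 7),
        (4, "Jane", "160101-001", "2016-01-01", 7),
        (5, "Jane", "160102-001", "2016-01-02", 7),
        (6, "Jane", "160102-002", "2016-01-02", 7),
        (7, "Jane", "160102-003", "2016-01-02", 7),
    ],
)

query = """
select id, name, po_number, date,
       -- keep stats only on the first row of each name/date group
       case when rn = 1 then stats end as stats
from (select y.*,
             row_number() over (partition by name, date order by po_number) as rn
      from yourtable y) ranked
order by id
"""
shown = conn.execute(query).fetchall()
for row in shown:
    print(row)
```

Rows 2, 6 and 7 come back with NULL in the stats column, as in the expected output from the question.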