Sorting Master Child - sql

Following is the table:
# Groups Method RDate
1 master_6 Sales 2019-10-17
2 master_3 ITO 2017-12-22
3 child_6 SRT 2019-10-21
4 master_4 TO 2019-02-07
5 child_3 ITI 2019-03-09
6 child_6 SRT 2019-03-14
7 master_6 Sales 2019-03-14
8 child_4 TR 2019-03-14
9 master_6 Sales 2019-03-14
I want the output as follows:
# Groups Method RDate
2 master_3 ITO 2017-12-22
5 child_3 ITI 2019-03-09
4 master_4 TO 2019-02-07
8 child_4 TR 2019-03-14
7 master_6 Sales 2019-03-14
6 child_6 SRT 2019-03-14
9 master_6 Sales 2019-03-14
3 child_6 SRT 2019-10-21
1 master_6 Sales 2019-10-17
The logic is:
Take all rows containing the word 'master' and sort them by date.
In the result, the first row should be the master with the oldest date.
The next row should be the child of that master (the child of master_1 is child_1, the child of master_2 is child_2, and so on).
Then take the next master (second-oldest date), followed by its child.
For example, in my case the master with the oldest date is record #2, so it becomes the first row of the result. For the second row, find the child of master_3, which is child_3 (if more than one child_3 record is found, take the one with the oldest date). Then comes the next master record, and so on.
I hope I explained everything well.
drop table if exists #A
CREATE TABLE #A(Groups varchar(15), Method varchar(15), RDate date)
insert into #A values
('master_6','Sales','2019/10/17'),
('master_3','ITO','2017/12/22'),
('child_6','SRT','2019/10/21'),
('master_4','TO','2019/02/07'),
('child_3','ITI','2019/03/09'),
('child_6','SRT','2019/03/14'),
('master_6','Sales','2019/03/14'),
('child_4','TR','2019/03/14'),
('master_6','Sales','2019/03/14');
Since there is some confusion, let me try to explain in other words:
In my case, any master (parent) has only one type of child but can have multiple records of that child. To illustrate, say ParentA visited the theater 6 times in a year and their ChildA visited 5 times, while ParentB visited 3 times and their child 3 times. All these records are stored in one table with their dates, in no particular order.
I want output that takes all parents with their dates in ascending order as a background list, then takes the first parent from that list and finds its child's visit. If several visits are found, take the child's first visit, because the parent's visit was also the first.
Then take the second record from the parent list and find that parent's child's visit.
If this is the second visit of the same parent, find the second visit of the child; if nothing is found, move on to the third row of the parent list and find its child's visit.
All remaining (extra) visits of a parent or child go at the end of the list.

For your sample data, you can do this in the order by:
order by max(case when groups like 'master%' then RDate end) over (partition by right(groups, 1)) asc,
right(groups, 1),
(case when groups like 'master%' then 1 else 2 end)
This is doing:
Calculating the date for the entire group, based on the "master" date for the group.
Keeping all of a group together, in case there are ties.
Putting the master record before the child(ren).
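To see what these three sort keys produce on the sample data, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under assumptions, not the answerer's exact code: the column is named grp instead of Groups, substr(grp, -1) stands in for right(groups, 1), the window expression is computed in a subquery, and a final rdate key is added only to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (grp TEXT, method TEXT, rdate TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        ("master_6", "Sales", "2019-10-17"),
        ("master_3", "ITO", "2017-12-22"),
        ("child_6", "SRT", "2019-10-21"),
        ("master_4", "TO", "2019-02-07"),
        ("child_3", "ITI", "2019-03-09"),
        ("child_6", "SRT", "2019-03-14"),
        ("master_6", "Sales", "2019-03-14"),
        ("child_4", "TR", "2019-03-14"),
        ("master_6", "Sales", "2019-03-14"),
    ],
)

query = """
SELECT grp, method, rdate
FROM (
    SELECT grp, method, rdate,
           -- one date per group number, taken from that group's master rows
           MAX(CASE WHEN grp LIKE 'master%' THEN rdate END)
               OVER (PARTITION BY substr(grp, -1)) AS group_date
    FROM a
)
ORDER BY group_date,                                       -- group with the earliest master date first
         substr(grp, -1),                                  -- keep each group together on ties
         CASE WHEN grp LIKE 'master%' THEN 1 ELSE 2 END,   -- masters before children
         rdate                                             -- assumption: extra key for a stable order
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the groups come out in the order 3, 4, 6, with all master rows of a group listed before its child rows; masters and children of group 6 are not interleaved.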

If each master row passes its row number on to the matching child row, the two stay together when sorted.
with CTE_MASTERS as
(
select Groups, Method, Rdate
, substring(Groups,patindex('%[_]%',Groups)+1,len(Groups)) as groupNr
, row_number() over (partition by Groups order by rdate) as groupRownum
, row_number() over (order by rdate, Groups) as masterRownum
from #A
where Groups like 'master%'
)
, CTE_CHILDS as
(
select Groups, Method, Rdate
, substring(Groups,patindex('%[_]%',Groups)+1,len(Groups)) as groupNr
, row_number() over (partition by Groups order by rdate) as groupRownum
from #A
where Groups like 'child%'
)
, CTE_MASTERS_AND_CHILDS as
(
select *
, cast(1 as bit) as isMaster
from CTE_MASTERS
UNION ALL
select c.*
, m.masterRownum
, 0
from CTE_CHILDS c
left join CTE_MASTERS m
on c.groupNr = m.groupNr
and c.groupRownum = m.groupRownum
)
SELECT Groups, Method, Rdate
FROM CTE_MASTERS_AND_CHILDS
ORDER BY
masterRownum,
isMaster desc,
groupNr,
groupRownum;
Groups | Method | Rdate
:------- | :----- | :------------------
master_3 | ITO | 22/12/2017 00:00:00
child_3 | ITI | 09/03/2019 00:00:00
master_4 | TO | 07/02/2019 00:00:00
child_4 | TR | 14/03/2019 00:00:00
master_6 | Sales | 14/03/2019 00:00:00
child_6 | SRT | 14/03/2019 00:00:00
master_6 | Sales | 14/03/2019 00:00:00
child_6 | SRT | 21/10/2019 00:00:00
master_6 | Sales | 17/10/2019 00:00:00
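The pairing logic can be tried outside SQL Server with a small Python sketch using the standard sqlite3 module. Several details here are assumptions made for illustration: the column is named grp, substr/instr replace substring/patindex, and an id column is added as a tiebreaker in every ROW_NUMBER so that the two master_6 rows sharing the date 2019-03-14 always receive consistent numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical surrogate key, used only to break ties deterministically
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, grp TEXT, method TEXT, rdate TEXT)"
)
conn.executemany(
    "INSERT INTO a (grp, method, rdate) VALUES (?, ?, ?)",
    [
        ("master_6", "Sales", "2019-10-17"),
        ("master_3", "ITO", "2017-12-22"),
        ("child_6", "SRT", "2019-10-21"),
        ("master_4", "TO", "2019-02-07"),
        ("child_3", "ITI", "2019-03-09"),
        ("child_6", "SRT", "2019-03-14"),
        ("master_6", "Sales", "2019-03-14"),
        ("child_4", "TR", "2019-03-14"),
        ("master_6", "Sales", "2019-03-14"),
    ],
)

query = """
WITH masters AS (
    SELECT grp, method, rdate,
           substr(grp, instr(grp, '_') + 1) AS group_nr,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY rdate, id) AS group_rownum,
           ROW_NUMBER() OVER (ORDER BY rdate, grp, id) AS master_rownum
    FROM a
    WHERE grp LIKE 'master%'
),
children AS (
    SELECT grp, method, rdate,
           substr(grp, instr(grp, '_') + 1) AS group_nr,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY rdate, id) AS group_rownum
    FROM a
    WHERE grp LIKE 'child%'
),
combined AS (
    SELECT grp, method, rdate, group_nr, group_rownum, master_rownum, 1 AS is_master
    FROM masters
    UNION ALL
    -- the n-th child of a group borrows the position of the n-th master of that group
    SELECT c.grp, c.method, c.rdate, c.group_nr, c.group_rownum, m.master_rownum, 0
    FROM children c
    LEFT JOIN masters m
           ON c.group_nr = m.group_nr
          AND c.group_rownum = m.group_rownum
)
SELECT grp, method, rdate
FROM combined
ORDER BY master_rownum, is_master DESC, group_nr, group_rownum
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Without a tiebreaker, rows with identical dates can be numbered differently by the two ROW_NUMBER calls, which could pair a child with the wrong master visit; that is the reason for the extra id key in this sketch.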

Related

SQL Code How to do iterations in historical table

I need help on SQL
I have a historical table named A. It has month ID, srvc key, etc.
I need to check whether a cust key is a new customer in table A. The logic is that the cust key is new if it exists for the current month ID and does not exist in any prior month (less than the current month ID).
To illustrate:
My current month ID = Feb2022
The cust key MUST exist in Feb2022 BUT not in Jan2022, Dec2021, and so on.
Also, is it possible to tag a cust key that exists in Feb2022 and Jan2022 BUT not in Dec2021, and so on?
select A.*, B.level_1, B.level_2, B.level_3, B.LE,
case when cust_key in ('2100707688',
'1xxx4',
'1xxxx',
'28xxxx1',
'2xxxxxx'
) then 'New' else 'Old' end as Tag,
A.NET_AMT/(nullif(A.prod_cnt,0)*B.LE) as ARPU
Hi @NickW,
thanks for responding. From the sample historical table below, I need to tag the CNumbers that are new for the current month (202202). For example, CNumber 2 is new because it did not appear in 202201, 202112, or 202111. I don't care whether it appeared in 202110 or earlier; I only care about CNumbers that did not appear in the last 3 months.
Cnumber MonthID
1 202202
1 202201
1 202112
1 202111
2 202202
2 202105
2 202104
2 202103
2 202102
2 202101
3 202202
3 202201
3 202112
3 202111
3 202110
3 202109
Based on this sample, only CNumber 2 satisfies the rule, since it appears in 202202 but not in 202201, 202112, or 202111.
Next, I also want to tag the CNumbers that are new for Jan 2022.
In that case, the current MonthID is 202201, and a CNumber must not appear in 202112, 202111, or 202110 to count as new.
Next, I want to tag the CNumbers that are new for Dec 2021: they must not appear in 202111, 202110, or 202109.
And so on.
My goal is to tag each customer with the MonthID in which they first appeared in the historical table. I am assuming that is their booking date, so ultimately I want a column named booking date.
We can use a cte to get the month of the first entry for the account. With that we can compare and calculate as needed.
create table sales(
cnumber int,
salesDate date);
insert into sales values
(1,'2021-11-15'),
(1,'2021-12-15'),
(1,'2022-01-15'),
(1,'2022-02-15'),
(2,'2022-02-15');
with cre as (
select
cnumber cnum,
DATE_FORMAT(min(salesDate),
'%Y-%m-01') monCre
from sales
group by
cnumber),
salesMonth as(
select
DATE_FORMAT(salesDate,
'%Y-%m-01') as mon,
cnumber cust
from sales
group by
cnumber,
mon)
select
cust customer,
mon "month",
case when mon = monCre
then 'new' else 'existing' end
as "status",
TIMESTAMPDIFF(MONTH,monCre ,mon)
as "account Age"
from salesMonth
join cre on cust = cnum
order by cust, mon;
customer | month | status | account Age
-------: | :--------- | :------- | ----------:
1 | 2021-11-01 | new | 0
1 | 2021-12-01 | existing | 1
1 | 2022-01-01 | existing | 2
1 | 2022-02-01 | existing | 3
2 | 2022-02-01 | new | 0
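The query above tags a customer as new only in the month of their very first record. The three-month lookback rule from the question (new if absent in the previous three months) could be sketched as follows, using Python's standard sqlite3 module and the question's Cnumber/MonthID sample. The table name hist and the month_idx helper column are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hist (cnumber INTEGER, monthid INTEGER)")

sample = {
    1: [202202, 202201, 202112, 202111],
    2: [202202, 202105, 202104, 202103, 202102, 202101],
    3: [202202, 202201, 202112, 202111, 202110, 202109],
}
conn.executemany(
    "INSERT INTO hist VALUES (?, ?)",
    [(c, m) for c, months in sample.items() for m in months],
)

query = """
WITH h AS (
    SELECT cnumber, monthid,
           -- turn YYYYMM into a running month index so that
           -- "previous 3 months" works across year boundaries
           (monthid / 100) * 12 + monthid % 100 AS month_idx
    FROM hist
)
SELECT cur.cnumber, cur.monthid,
       CASE WHEN EXISTS (
                SELECT 1
                FROM h prev
                WHERE prev.cnumber = cur.cnumber
                  AND prev.month_idx BETWEEN cur.month_idx - 3
                                         AND cur.month_idx - 1
            )
            THEN 'Old' ELSE 'New'
       END AS tag
FROM h cur
ORDER BY cur.cnumber, cur.monthid
"""
rows = conn.execute(query).fetchall()

new_rows = [(c, m) for c, m, tag in rows if tag == "New"]
new_in_feb = [c for c, m in new_rows if m == 202202]
print(new_rows)
print(new_in_feb)
```

Because every row is tagged relative to its own month, the same query answers the question for 202202, 202201, 202112, and so on in one pass.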

Netezza add new field for first record value of the day in SQL

I'm trying to add new columns of first values of the day for location and weight.
For instance, the original data format is:
id dttm location weight
--------------------------------------------
1 1/1/20 11:10:00 A 40
1 1/1/20 19:07:00 B 41.1
2 1/1/20 08:01:00 B 73.2
2 1/1/20 21:00:00 B 73.2
2 1/2/20 10:03:00 C 74
I want each id to have only one record per day, such as:
id dttm location weight
--------------------------------------------
1 1/1/20 11:10:00 A 40
2 1/1/20 08:01:00 B 73.2
2 1/2/20 10:03:00 C 74
I have other columns in my data set that I'm using location and weight to create, so I don't think I can just filter for 'first' records of the day.. Is it possible to write query to recognize first record of the day for those two columns and create new column with those values?
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id, dttm::date order by dttm) as seqnum
from t
) t
where seqnum = 1;
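If the goal is to keep every row and add the day's first location and weight as new columns, rather than filter rows out, FIRST_VALUE over the same partition is one option. The sketch below uses Python's standard sqlite3 module instead of Netezza, so date(dttm) stands in for dttm::date; the column names first_location and first_weight are hypothetical, and the sample dates are assumed to be in month/day/year order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dttm TEXT, location TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01 11:10:00", "A", 40.0),
        (1, "2020-01-01 19:07:00", "B", 41.1),
        (2, "2020-01-01 08:01:00", "B", 73.2),
        (2, "2020-01-01 21:00:00", "B", 73.2),
        (2, "2020-01-02 10:03:00", "C", 74.0),
    ],
)

query = """
SELECT id, dttm, location, weight,
       -- value from the earliest row of the same id and calendar day
       FIRST_VALUE(location) OVER w AS first_location,
       FIRST_VALUE(weight)   OVER w AS first_weight
FROM t
WINDOW w AS (PARTITION BY id, date(dttm) ORDER BY dttm)
ORDER BY id, dttm
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

All five input rows are kept; the second row of id 1 carries location A and weight 40 from that day's first record.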

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department level score table based on a deeper product url level score table.
Date is not consecutive
Not all urls got score updates at same day (independent to each other)
dist_url should be running count distinct (cumulative count distinct)
dist urls and urls score >=30 are both count distinct
What I have now is:
Date url Store Dept Page Score
10/1 a US A X 10
10/1 b US A X 30
10/1 c US A X 60
10/4 a US A X 20
10/4 d US A X 60
10/6 b US A X 22
10/9 a US A X 40
10/9 e US A X 10
What I want is:
Date Store Dept Page dist urls urls score >=30
10/1 US A X 3 2
10/4 US A X 4 3
10/6 US A X 4 2
10/9 US A X 5 2
I think the dist_url can be done by using window function, just not sure on query.
Current query is as below, but it's wrong since not cumulative count distinct:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has any idea.
Based on your question, you seem to want two levels of logic:
select date, store, dept, page,
sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
from t
group by store, dept, page, url
) union all
(select store, dept, page, url, min(date) as date, 0, 1
from t
where score >= 30
group by store, dept, page, url
)
) t
group by date, store, dept, page;
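The underlying idea is that a running distinct count only increases on the date a URL is first seen. Here is a minimal sketch of that idea on the question's sample, written with Python's standard sqlite3 module rather than Vertica. Assumptions: the table is called scores, the date column is renamed score_date, the year 2019 is filled in for the sample dates, and the last column counts URLs that have reached a score of 30 or more at least once, which may not be what the question's third column means. Correlated subqueries over a list of all dates are used so that dates with no new URL (10/6 here) still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (score_date TEXT, url TEXT, store TEXT, "
    "dept TEXT, page TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, 'US', 'A', 'X', ?)",
    [
        ("2019-10-01", "a", 10), ("2019-10-01", "b", 30), ("2019-10-01", "c", 60),
        ("2019-10-04", "a", 20), ("2019-10-04", "d", 60),
        ("2019-10-06", "b", 22),
        ("2019-10-09", "a", 40), ("2019-10-09", "e", 10),
    ],
)

query = """
WITH first_seen AS (      -- first date each url appears at all
    SELECT store, dept, page, url, MIN(score_date) AS first_date
    FROM scores
    GROUP BY store, dept, page, url
),
first_30 AS (             -- first date each url scores 30 or more
    SELECT store, dept, page, url, MIN(score_date) AS first_date
    FROM scores
    WHERE score >= 30
    GROUP BY store, dept, page, url
),
days AS (
    SELECT DISTINCT score_date, store, dept, page FROM scores
)
SELECT d.score_date, d.store, d.dept, d.page,
       (SELECT COUNT(*) FROM first_seen f
         WHERE f.store = d.store AND f.dept = d.dept AND f.page = d.page
           AND f.first_date <= d.score_date) AS dist_urls,
       (SELECT COUNT(*) FROM first_30 f
         WHERE f.store = d.store AND f.dept = d.dept AND f.page = d.page
           AND f.first_date <= d.score_date) AS urls_reached_30
FROM days d
ORDER BY d.score_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample this gives dist_urls of 3, 4, 4, 5 for the four dates, matching the "dist urls" column the question asks for.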
I don't understand how your query is related to your question.
Try as I might, I can't reproduce your expected output either.
But I think you can avoid the UNION SELECTs. Does this do what you expect?
NULLs are ignored by COUNT(DISTINCT ...), and here you can combine an aggregate expression with an OLAP one.
Vertica also has named windows to improve readability.
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
-- out
-- out Time: First fetch (4 rows): 45.321 ms. All rows formatted: 45.364 ms

Postgres count number or rows and group them by timestamp

Let's assume I have one table in postgres with just 2 columns:
ID which is PK for the table (bigint)
time which is type of timestamp
Is there a way to count the IDs grouped by the year of time? For example, a time of 18 February 2005 would fall into the 2005 group, so the result would be:
year number of rows
1998 2
2005 5
And if the number of result rows is smaller than some threshold (for example 3), the query should return the counts by month instead, something like:
month number of rows
(February 2018) 5
(March 2018) 2
Is there a clean way to do this in PostgreSQL?
You can do it using window functions (as always).
I use this table:
TABLE times;
id | t
----+-------------------------------
1 | 2018-03-14 20:04:39.81298+01
2 | 2018-03-14 20:04:42.92462+01
3 | 2018-03-14 20:04:45.774615+01
4 | 2018-03-14 20:04:48.877038+01
5 | 2017-03-14 20:05:08.94096+01
6 | 2017-03-14 20:05:16.123736+01
7 | 2017-03-14 20:05:19.91982+01
8 | 2017-01-14 20:05:32.249175+01
9 | 2017-01-14 20:05:35.793645+01
10 | 2017-01-14 20:05:39.991486+01
11 | 2016-11-14 20:05:47.951472+01
12 | 2016-11-14 20:05:52.941504+01
13 | 2016-10-14 21:05:52.941504+02
(13 rows)
First, group by month (subquery per_month).
Then add the sum per year with a window function (subquery with_year).
Finally, use CASE to decide which one you will output and remove duplicates with DISTINCT.
SELECT DISTINCT
CASE WHEN yc > 5
THEN mc
ELSE yc
END AS count,
CASE WHEN yc > 5
THEN to_char(t, 'YYYY-MM')
ELSE to_char(t, 'YYYY')
END AS period
FROM (SELECT
mc,
sum(mc) OVER (PARTITION BY date_trunc('year', t)) AS yc,
t
FROM (SELECT
count(*) AS mc,
date_trunc('month', t) AS t
FROM times
GROUP BY date_trunc('month', t)
) per_month
) with_year
ORDER BY 2;
count | period
-------+---------
3 | 2016
3 | 2017-01
3 | 2017-03
4 | 2018
(4 rows)
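The same three steps (count per month, add the yearly total with a window function, then pick one of the two with CASE) can be reproduced with Python's standard sqlite3 module. This is a sketch under assumptions: strftime and substr replace date_trunc and to_char, the timestamps are stored as plain text without time zone or fractional seconds, and the threshold of 5 is the one used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE times (id INTEGER PRIMARY KEY, t TEXT)")
conn.executemany(
    "INSERT INTO times (t) VALUES (?)",
    [(ts,) for ts in [
        "2018-03-14 20:04:39", "2018-03-14 20:04:42", "2018-03-14 20:04:45",
        "2018-03-14 20:04:48", "2017-03-14 20:05:08", "2017-03-14 20:05:16",
        "2017-03-14 20:05:19", "2017-01-14 20:05:32", "2017-01-14 20:05:35",
        "2017-01-14 20:05:39", "2016-11-14 20:05:47", "2016-11-14 20:05:52",
        "2016-10-14 21:05:52",
    ]],
)

query = """
SELECT DISTINCT
       CASE WHEN yc > 5 THEN mc ELSE yc END               AS cnt,
       CASE WHEN yc > 5 THEN ym ELSE substr(ym, 1, 4) END AS period
FROM (
    SELECT ym, mc,
           -- yearly total attached to every month of that year
           SUM(mc) OVER (PARTITION BY substr(ym, 1, 4)) AS yc
    FROM (
        SELECT strftime('%Y-%m', t) AS ym, COUNT(*) AS mc
        FROM times
        GROUP BY strftime('%Y-%m', t)
    ) per_month
) with_year
ORDER BY period
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Years with at most 5 rows collapse into a single yearly line, while 2017 (6 rows) is broken down by month.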
Just count years. If it's at least 3, then you group by years, else by months:
select
case when (select count(distinct extract(year from time)) from mytable) >= 3 then
to_char(time, 'yyyy')
else
to_char(time, 'yyyy-mm')
end as season,
count(*)
from mytable
group by season
order by season;
(Unlike many other DBMSs, PostgreSQL allows column aliases in the GROUP BY clause.)

How to get the count of distinct values until a time period Impala/SQL?

I have a raw table recording customer ids coming to a store over a particular time period. Using Impala, I would like to calculate the number of distinct customer IDs coming to the store until each day. (e.g., on day 3, 5 distinct customers visited so far)
Here is a simple example of the raw table I have:
Day ID
1 1234
1 5631
1 1234
2 1234
2 4456
2 5631
3 3482
3 3452
3 1234
3 5631
3 1234
Here is what I would like to get:
Day Count(distinct ID) until that day
1 2
2 3
3 5
Is there way to easily do this in a single query?
I'm not 100% sure this will work on Impala, but it should if you have a days table or a way to create a derived table on the fly:
CREATE TABLE days ("DayC" int);
INSERT INTO days
("DayC")
VALUES (1), (2), (3);
OR
CREATE TABLE days AS
SELECT DISTINCT "Day" AS "DayC"
FROM sales
You can use this query
SqlFiddleDemo in Postgresql
SELECT "DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN days
WHERE "Day" <= "DayC"
GROUP BY "DayC"
OUTPUT
| DayC | count |
|------|-------|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
UPDATE VERSION
SELECT T."DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN (SELECT DISTINCT "Day" as "DayC" FROM sales) T
WHERE "Day" <= T."DayC"
GROUP BY T."DayC"
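An alternative that avoids the cross join is to record the first day each ID appears and then take a running sum of new IDs per day. The sketch below uses Python's standard sqlite3 module rather than Impala, so treat it as an illustration of the technique only; the table name visits and the CTE names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        (1, 1234), (1, 5631), (1, 1234),
        (2, 1234), (2, 4456), (2, 5631),
        (3, 3482), (3, 3452), (3, 1234), (3, 5631), (3, 1234),
    ],
)

query = """
WITH first_seen AS (       -- the first day each id shows up
    SELECT id, MIN(day) AS day
    FROM visits
    GROUP BY id
),
per_day AS (               -- number of ids seen for the first time on each day
    SELECT d.day, COUNT(f.id) AS new_ids
    FROM (SELECT DISTINCT day FROM visits) d
    LEFT JOIN first_seen f ON f.day = d.day
    GROUP BY d.day
)
SELECT day,
       SUM(new_ids) OVER (ORDER BY day) AS distinct_ids_so_far
FROM per_day
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN from the list of all days keeps days on which no new ID appears, so they still show the running total.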
Try this one (note that it counts distinct IDs within each day, not cumulatively):
select day, count(distinct id) from yourtable group by day