Changing a Select Query to a Count Distinct Query - sql

I am using a Select query to select Members, a variable that serves as a unique identifier, and transaction date, a Date format (MM/DD/YYYY).
SELECT Members, transaction_date
FROM table WHERE Criteria = 'xxx'
Group by Members, transaction_date;
My ultimate aim is to count the # of unique members by month (i.e., a unique member in day 3, 6, 12 of a month is only counted once). I don't want to select any data, but rather run this calculation (count distinct by month) and output the calculation.

This will give distinct count per month.
select month,count(*) as distinct_Count_month
from
(
select members,to_char(transaction_date, 'YYYY-MM') as month
from table1
/* add your where condition */
group by members,to_char(transaction_date, 'YYYY-MM')
) a
group by month
So for this input
+---------+------------------+
| members | transaction_date |
+---------+------------------+
| 1 | 12/23/2015 |
| 1 | 11/23/2015 |
| 1 | 11/24/2015 |
| 2 | 11/24/2015 |
| 2 | 10/24/2015 |
+---------+------------------+
You will get this output
+----------+----------------------+
| month | distinct_count_month |
+----------+----------------------+
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
+----------+----------------------+

If you are on SQL Server, you might try something like this. It groups by month rather than by day and counts each member only once per month.
SELECT CONVERT(CHAR(7), CONVERT(DATE, transaction_date, 101), 126) AS [MONTH], COUNT(DISTINCT MEMBERS) AS [NO OF MEMBERS]
FROM BAR
GROUP BY CONVERT(CHAR(7), CONVERT(DATE, transaction_date, 101), 126)
ORDER BY [MONTH]

Use COUNT(DISTINCT members) and date_trunc('month', transaction_date) to retain timestamps for most calculations (and this can also help with ordering the result). to_char() can then be used to control the display format but it isn't required elsewhere.
SELECT
to_char(date_trunc('month', transaction_date), 'YYYY-MM')
, COUNT(DISTINCT members) AS distinct_Count_month
FROM table1
GROUP BY
date_trunc('month', transaction_date)
;
result sample:
| to_char | distinct_count_month |
|---------|----------------------|
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
see: http://sqlfiddle.com/#!15/57294/2
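As a quick way to check the COUNT(DISTINCT ...) approach without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answer's exact query: SQLite has no date_trunc() or to_char(), so strftime('%Y-%m', ...) stands in for them, and the dates are stored as ISO strings so SQLite can parse them.

```python
import sqlite3

# Sample rows from the answer above, with dates rewritten as ISO strings
# (an assumption, so that SQLite's strftime() can parse them).
ROWS = [
    (1, "2015-12-23"),
    (1, "2015-11-23"),
    (1, "2015-11-24"),
    (2, "2015-11-24"),
    (2, "2015-10-24"),
]


def distinct_members_per_month(rows):
    """Return [(month, distinct_member_count), ...] ordered by month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (members INTEGER, transaction_date TEXT)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
        query = """
            SELECT strftime('%Y-%m', transaction_date) AS month,
                   COUNT(DISTINCT members)             AS distinct_count_month
            FROM table1
            GROUP BY strftime('%Y-%m', transaction_date)
            ORDER BY month
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


monthly_counts = distinct_members_per_month(ROWS)
```

Member 1 appears on two different days in November but contributes only once to that month's count.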

Related

SQL sum over a time interval / rows

The following code
SELECT distinct DATE_PART('year',date) as year_date,
DATE_PART('month',date) as month_date,
count(prepare_first_buyer.person_id) as no_of_customers_month
FROM
(
SELECT DATE(bestelldatum) ,person_id
,ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) prepare_first_buyer
WHERE row_number=1
GROUP BY DATE_PART('year',date),DATE_PART('month',date)
ORDER BY DATE_PART('year',date),DATE_PART('month',date)
gives this table back:
| year_date | month_date | no_of_customers_month |
|:--------- |:----------:| ---------------------:|
| 2017 | 1 | 2 |
| 2017 | 2 | 5 |
| 2017 | 3 | 4 |
| 2017 | 4 | 8 |
| 2017 | 5 | 1 |
| . | . | . |
| . | . | . |
where all three are numeric values.
I now need a new column in which I sum up all values of 'no_of_customers_month' for the 12 months back.
e.g.
| year_date | month_date | no_of_customers_month | sum_12mon |
|:--------- |:----------:| :--------------------:|----------:|
| 2019 | 1 | 2 | 23 |
where 23 is the sum from 2019-1 back to 2018-1 over 'no_of_customers_month'.
Thx for the help.
You can use window functions:
SELECT DATE_TRUNC('month', date) as yyyymm,
COUNT(*) as no_of_customers_month,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', date) RANGE BETWEEN '11 month' PRECEDING AND CURRENT ROW)
FROM (SELECT DATE(bestelldatum), person_id,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) b
WHERE row_number = 1
GROUP BY yyyymm
ORDER BY yyyymm;
Note: This uses date_trunc() to retrieve the year/month as a date, which allows the use of a RANGE frame. I also find a date more convenient than having the year and month in separate columns.
RANGE frames with an offset require PostgreSQL 11 or later. On older versions, assuming you have data for every month, you can use ROWS instead:
SELECT DATE_TRUNC('month', date) as yyyymm,
COUNT(*) as no_of_customers_month,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', date) ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
FROM (SELECT DATE(bestelldatum), person_id,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) b
WHERE row_number = 1
GROUP BY yyyymm
ORDER BY yyyymm;
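The caveat about needing data for every month is easy to see with a small emulation of the two frame types. The sketch below is plain Python, not SQL, and the helper names are made up for illustration. It uses a 3-month window instead of 12 to keep the data small, and it drops March from the question's sample counts (an assumption) to create a gap.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month number so gaps between months are measurable."""
    return d.year * 12 + (d.month - 1)


def trailing_sums(monthly_counts, window=12):
    """Emulate RANGE (calendar-based) and ROWS (row-based) trailing sums.

    monthly_counts maps the first day of a month to that month's count.
    Returns a list of (month, count, range_sum, rows_sum) tuples.
    """
    ordered = sorted(monthly_counts.items())
    result = []
    for i, (month, count) in enumerate(ordered):
        # RANGE-style: only months within the last `window` calendar months.
        oldest = month_index(month) - (window - 1)
        range_sum = sum(c for m, c in ordered[: i + 1] if month_index(m) >= oldest)
        # ROWS-style: the last `window` rows, whatever months they belong to.
        rows_sum = sum(c for _, c in ordered[max(0, i - (window - 1)) : i + 1])
        result.append((month, count, range_sum, rows_sum))
    return result


# Counts taken from the question's table, with March removed to create a gap.
counts = {
    date(2017, 1, 1): 2,
    date(2017, 2, 1): 5,
    date(2017, 4, 1): 8,
    date(2017, 5, 1): 1,
}
demo = trailing_sums(counts, window=3)
```

Once a month is missing, the row-based sum reaches further back in time than intended, so the two columns diverge from April onward.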

Filling in missing balance and dates in table to track balance

I hope you can help me with this problem. I just started out on SQL using Bigquery so my problem can seem a bit tedious.
So I have a table that basically records the date and balance whenever the balance changes. It looks somewhat like this:
+------------+-----------+------+---------+
| Date | seller_ID | Name | Balance |
+------------+-----------+------+---------+
| 2020-09-10 | 1 | John | 10 |
| 2020-09-13 | 1 | John | 8 |
| 2020-09-15 | 1 | John | 6 |
+------------+-----------+------+---------+
However, I need to create a new table with the daily balances that looks like this
+------------+-----------+------+---------+
| Date | seller_ID | Name | Balance |
+------------+-----------+------+---------+
| 2020-09-10 | 1 | John | 10 |
| 2020-09-11 | 1 | John | 10 |
| 2020-09-12 | 1 | John | 10 |
| 2020-09-13 | 1 | John | 8 |
| 2020-09-14 | 1 | John | 8 |
| 2020-09-15 | 1 | John | 6 |
+------------+-----------+------+---------+
I tried creating a separate table of all the dates between the first and final date, and then LEFT JOIN the original table with it but the resulting table isn't very helpful to draw from.
Does anyone have an idea of what to do in this case?
To fill a null value with the previous non-null value in BigQuery, you can use LAST_VALUE with IGNORE NULLS:
WITH test_table AS (
SELECT DATE '2020-09-10' AS Date, 1 AS seller_Id, 'John' AS Name, 10 AS Balance UNION ALL
SELECT '2020-09-13', 1, 'John' AS Name, 8 UNION ALL
SELECT '2020-09-15', 1, 'John' AS Name, 6
)
SELECT Date,
LAST_VALUE(seller_Id IGNORE NULLS) OVER (ORDER BY Date) AS seller_Id,
LAST_VALUE(Name IGNORE NULLS) OVER (ORDER BY Date) AS Name,
LAST_VALUE(Balance IGNORE NULLS) OVER (ORDER BY Date) AS Balance
FROM UNNEST(GENERATE_DATE_ARRAY('2020-09-10', '2020-09-15')) AS Date
LEFT JOIN test_table USING (Date)
ORDER BY Date
You can do this without a window function on the balance. The only window function needed is LEAD() on the date:
WITH t AS (
SELECT DATE '2020-09-10' AS Date, 1 AS seller_Id, 'John' AS Name, 10 AS Balance UNION ALL
SELECT '2020-09-13', 1, 'John' AS Name, 8 UNION ALL
SELECT '2020-09-15', 1, 'John' AS Name, 6
),
tt as (
SELECT t.*, LEAD(date) OVER (PARTITION BY name ORDER BY date) as next_date
FROM t
)
SELECT dte, tt.name, tt.balance
FROM tt LEFT JOIN
UNNEST(GENERATE_DATE_ARRAY(tt.date, COALESCE(DATE_ADD(tt.next_date, INTERVAL - 1 DAY), DATE '2020-09-15'))) dte
ON true;
(Note: The ON clause is optional in this case. However, I am not a fan of having joins without ON -- unless it is a CROSS JOIN.)
This has two important advantages over the LAST_VALUE() solution above. The most important is that it will work for multiple names with different time periods.
The second advantage is that it is more efficient, because it is not using window functions to fetch values from previous rows.
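The "each row is valid until the next change" idea can be tried locally with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite has neither GENERATE_DATE_ARRAY nor UNNEST, so a recursive CTE builds the calendar, and a correlated subquery plays the role of LEAD(). The table name balances, the column bal_date, and the second seller "Mary" are hypothetical. Unlike the query above, each seller's last row here ends on its own date rather than on a fixed end date.

```python
import sqlite3

# Balance-change rows. Seller 1 is the question's sample; seller 2 ("Mary")
# is hypothetical, added to show sellers with different date ranges.
CHANGES = [
    ("2020-09-10", 1, "John", 10),
    ("2020-09-13", 1, "John", 8),
    ("2020-09-15", 1, "John", 6),
    ("2020-09-12", 2, "Mary", 5),
    ("2020-09-14", 2, "Mary", 3),
]

DAILY_BALANCE_SQL = """
WITH RECURSIVE
calendar(d) AS (
    -- every day between the first and last recorded change
    SELECT MIN(bal_date) FROM balances
    UNION ALL
    SELECT date(d, '+1 day') FROM calendar
    WHERE d < (SELECT MAX(bal_date) FROM balances)
),
spans AS (
    -- each change stays valid until the same seller's next change
    SELECT b.*,
           (SELECT MIN(n.bal_date) FROM balances n
             WHERE n.seller_id = b.seller_id
               AND n.bal_date > b.bal_date) AS next_date
    FROM balances b
)
SELECT c.d, s.seller_id, s.name, s.balance
FROM spans s
JOIN calendar c
  ON c.d >= s.bal_date
 AND (c.d < s.next_date OR (s.next_date IS NULL AND c.d = s.bal_date))
ORDER BY s.seller_id, c.d
"""


def daily_balances(changes):
    """Expand balance-change rows into one row per seller per day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE balances "
            "(bal_date TEXT, seller_id INTEGER, name TEXT, balance INTEGER)"
        )
        conn.executemany("INSERT INTO balances VALUES (?, ?, ?, ?)", changes)
        return conn.execute(DAILY_BALANCE_SQL).fetchall()
    finally:
        conn.close()


daily = daily_balances(CHANGES)
```

Because the next change is looked up per seller_id, each seller is filled only within its own date range.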

How can I query on the same table & get result in 2 columns

I have this table with the following data
+-------+-----------+-------+-------+
| Owner | closeDate | stage | value |
+-------+-----------+-------+-------+
| Abc | 1-1-2017 | won | 1000 |
| Abc | 31-1-2017 | won | 2000 |
| Abc | 3-1-2017 | lost | 1000 |
| Abc | 1-2-2017 | won | 5000 |
| Def | 1-2-2017 | won | 3000 |
| Def | 28-2-2017 | won | 4000 |
+-------+-----------+-------+-------+
I am aiming for a result like this where it groups the total value for each owner per month for only won stage
+-------+----------+----------+
| Owner | JanValue | FebValue |
+-------+----------+----------+
| Abc | 3000 | 5000 |
| Def | 0 | 7000 |
+-------+----------+----------+
I have tried this query, but the results come back as separate rows rather than as columns:
SELECT Owner, sum(value) ,datename(month, closedate) as 'month'
FROM Table1
where closedate between '2017/01/1' and '2017/01/31' and stage='won'
GROUP BY Owner,datename(month, closedate)
UNION ALL
SELECT Owner, sum(value) ,datename(month, closedate) as 'month'
FROM Table1
where closedate between '2017/02/1' and '2017/02/28' and stage='won'
GROUP BY Owner,datename(month, closedate)
You are looking for a pivot query, this time involving the month of the close date:
SELECT
Owner,
SUM(CASE WHEN DATEPART(month, closeDate) = 1 THEN value ELSE 0 END) AS JanValue,
SUM(CASE WHEN DATEPART(month, closeDate) = 2 THEN value ELSE 0 END) AS FebValue,
...
FROM Table1
WHERE
stage = 'won' AND
DATEPART(year, closeDate) = 2017
GROUP BY
Owner;
Note that this approach gets stretched a bit thin when you want to consider having a monthly report across many years. In that case, you might want to use dynamic SQL to do the pivot. But, in such a case having so many months across columns would not be the most readable output IMO.
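For readers who want to check the conditional-aggregation pivot against the question's data, here is a small sketch using Python's built-in sqlite3 module. It is not the SQL Server query itself: strftime() stands in for DATEPART(), the table name deals and the ISO-formatted dates are assumptions, and each CASE uses ELSE 0 so that an owner with no won deals in a month shows 0 rather than NULL.

```python
import sqlite3

# The question's rows, with close dates rewritten as ISO strings (an assumption).
DEALS = [
    ("Abc", "2017-01-01", "won", 1000),
    ("Abc", "2017-01-31", "won", 2000),
    ("Abc", "2017-01-03", "lost", 1000),
    ("Abc", "2017-02-01", "won", 5000),
    ("Def", "2017-02-01", "won", 3000),
    ("Def", "2017-02-28", "won", 4000),
]

PIVOT_SQL = """
SELECT owner,
       SUM(CASE WHEN strftime('%m', close_date) = '01' THEN value ELSE 0 END) AS jan_value,
       SUM(CASE WHEN strftime('%m', close_date) = '02' THEN value ELSE 0 END) AS feb_value
FROM deals
WHERE stage = 'won'
  AND strftime('%Y', close_date) = '2017'
GROUP BY owner
ORDER BY owner
"""


def won_value_by_month(deals):
    """Return one row per owner with January and February totals of won deals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE deals (owner TEXT, close_date TEXT, stage TEXT, value INTEGER)"
        )
        conn.executemany("INSERT INTO deals VALUES (?, ?, ?, ?)", deals)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


pivoted = won_value_by_month(DEALS)
```

Each month needs its own SUM(CASE ...) column, which is why this style becomes unwieldy across many years.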
Try this PIVOT version:
SELECT
*
FROM
(
SELECT
*
FROM
(
SELECT
Owner,
CloseDate = DATENAME(month,CAST(CloseDate AS DATE)),
Val = value
FROM Table1
)T
PIVOT
(
SUM(VAL)
FOR CloseDate IN
(
[January],[February],[March],[April],[May],[June],[July],[August],[September],[October],[November],[December]
)
)Pvt
)Q
This result does not filter on the stage. You can add that filter to the inner select:
SELECT
Owner,
CloseDate = DATENAME(month,CAST(CloseDate AS DATE)),
Val = value
FROM Table1
where <Your Conditions>

Select rows which repeat every month

I am trying to solve a task that looks simple at first glance.
I have a transactions table.
| name |entity_id| amount | date |
|--------|---------|--------|------------|
| Github | 1 | 4.80 | 01/01/2014 |
| itunes | 2 | 2.80 | 22/01/2014 |
| Github | 1 | 4.80 | 01/02/2014 |
| Foods | 3 | 24.80 | 01/02/2014 |
| amazon | 4 | 14.20 | 01/03/2014 |
| amazon | 4 | 14.20 | 01/04/2014 |
I have to select the rows that repeat every month on the same day with the same amount for an entity_id (subscriptions). Thanks for the help.
If your date column is created as a date type, you could use a recursive CTE to collect continuations, and then eliminate duplicate rows with DISTINCT ON.
(You should also rename that column, because date is a reserved word in standard SQL.)
with recursive recurring as (
select name, entity_id, amount, date as first_date, date as last_date, 0 as lvl
from transactions
union all
select r.name, r.entity_id, r.amount, r.first_date, t.date, r.lvl + 1
from recurring r
join transactions t
on row(t.name, t.entity_id, t.amount, t.date - interval '1' month)
= row(r.name, r.entity_id, r.amount, r.last_date)
)
select distinct on (name, entity_id, amount) *
from recurring
order by name, entity_id, amount, lvl desc
Group by the day of the month, for example:
select entity_id, amount, max(date), min(date), count(*)
from transactions
group by entity_id, amount, date_part('day', date)
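To see what grouping by day of month returns on the question's data, here is a minimal sketch with Python's built-in sqlite3 module. Several things are assumptions rather than part of the answer: strftime('%d', ...) replaces date_part('day', ...), the date column is renamed tx_date and stored as ISO strings, and a HAVING COUNT(*) > 1 filter is added to drop one-off payments. It does not check that the repeats fall in consecutive months.

```python
import sqlite3

# The question's rows, with dates rewritten from DD/MM/YYYY to ISO strings.
TRANSACTIONS = [
    ("Github", 1, 4.80, "2014-01-01"),
    ("itunes", 2, 2.80, "2014-01-22"),
    ("Github", 1, 4.80, "2014-02-01"),
    ("Foods", 3, 24.80, "2014-02-01"),
    ("amazon", 4, 14.20, "2014-03-01"),
    ("amazon", 4, 14.20, "2014-04-01"),
]

RECURRING_SQL = """
SELECT name, entity_id, amount,
       MIN(tx_date) AS first_date,
       MAX(tx_date) AS last_date,
       COUNT(*)     AS occurrences
FROM transactions
GROUP BY name, entity_id, amount, strftime('%d', tx_date)
HAVING COUNT(*) > 1          -- added here: keep only repeated charges
ORDER BY entity_id
"""


def recurring_charges(transactions):
    """Return charges that occur more than once on the same day of the month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions "
            "(name TEXT, entity_id INTEGER, amount REAL, tx_date TEXT)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", transactions)
        return conn.execute(RECURRING_SQL).fetchall()
    finally:
        conn.close()


recurring = recurring_charges(TRANSACTIONS)
```

A stricter subscription check would also compare the number of occurrences with the number of months between the first and last date.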

if more than 1 match, do not return 'unknown'

I composed a monster query. I'm certain that it can be optimized, and I would more than appreciate any comments/guidance on the query itself; however, I have a specific question:
The data I am returning is sometimes duplicated on multiple columns:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
As you can see, all of the fields are equal except for deaID.
In this case, I would like to return only:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
However, if there are no duplicates:
+-------+------+---------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+---------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+---------+------+-------+--------+----------+-------+------+
then I would like to keep it!
Summary
If there are duplicates, remove all records with deaID = 'UNKNOWN'; however, if there is only one match, return that match.
Question
How do I return UNKNOWN records if and only if there is exactly one match?
here is the monster query in case anybody is interested :)
with ctebiggie as (
select distinct
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI as MLISNPI,
a.CLIENT_ID,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE) as [Month],
datepart(yyyy,a.RECEIVED_DATE) as [Year]
from
MILLENNIUM_DW_dev..D_PHYSICIAN p
left outer join
MILLENNIUM_DW_dev..F_ACCESSION_DAILY a
on a.REQUESTOR_NPI=p.PHYSICIAN_NPI
left outer join MILLENNIUM_DW_dev..D_PHYSICIAN_ADDRESS p_address
on p.PHYSICIAN_NPI=p_address.PHYSICIAN_NPI
where
a.RECEIVED_DATE is not null
--and p.IMS_PRESCRIBER_ID is not null
--and p_address.IMS_DEA_NBR !='UNKNOWN'
and p.REC_ACTIVE_FLG=1
and p_address.REC_ACTIVE_FLG=1
and DATEPART(yyyy,received_date)=2012
and DATEPART(mm,received_date)=7
group by
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI,
p.IMS_PROFESSIONAL_ID_NBR,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE),
datepart(yyyy,a.RECEIVED_DATE),
a.CLIENT_ID
)
,
ctecount as
(select
COUNT (Distinct f.ACCESSION_ID) [count],
f.REQUESTOR_NPI,f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE) mm,
datepart(yyyy,f.RECEIVED_DATE)yyyy
from MILLENNIUM_DW_dev..F_ACCESSION_DAILY f
where
f.CLIENT_ID not in (select * from SalesDWH..TestPractices)
and DATEPART(yyyy,f.received_date)=2012
and DATEPART(mm,f.received_date)=7
group by f.REQUESTOR_NPI,
f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE),
datepart(yyyy,f.RECEIVED_DATE)
)
select ctebiggie.*,c.* from
ctebiggie
full outer join
ctecount c
on c.REQUESTOR_NPI=ctebiggie.MLISNPI
and c.mm=ctebiggie.[Month]
and c.yyyy=ctebiggie.[Year]
and c.CLIENT_ID=ctebiggie.CLIENT_ID
Assuming your base query is available as BaseQuery, assign ROW_NUMBER() and per-partition counts over its result set. In the outer select, an UNKNOWN row is kept only when every row in its group is UNKNOWN; otherwise only the known rows are returned.
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year
FROM (
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year,
ROW_NUMBER() OVER (PARTITION BY
first,last,cert,count,npi,clientid,month,year
ORDER BY CASE WHEN deaID = 'UNKNOWN' THEN 0 ELSE 1 END,
deaID) AS RowNumberInGroup,
COUNT(*) OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS CountPerGroup,
SUM(CASE WHEN deaID = 'UNKNOWN' THEN 1 ELSE 0 END)
OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS UnknownCountPerGroup
FROM BaseQuery
) T
WHERE (T.CountPerGroup = T.UnknownCountPerGroup AND T.RowNumberInGroup = 1) OR T.RowNumberInGroup > T.UnknownCountPerGroup
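The rule that query implements (keep an UNKNOWN row only when its group has no known deaID) can also be phrased with NOT EXISTS. The sketch below is a different formulation, not the window-function query above, and runs on Python's built-in sqlite3 module. The table name rx, the renamed columns first_name, last_name and cnt, and the second prescriber ("Sam Lee") are hypothetical.

```python
import sqlite3

# Columns that define a "duplicate" group (everything except deaID).
KEY_COLS = ["first_name", "last_name", "cert", "cnt", "npi", "clientid", "month", "year"]

# First two rows are the question's sample. The "Sam Lee" rows are hypothetical:
# a group that contains only UNKNOWN, inserted twice to exercise DISTINCT.
RX_ROWS = [
    ("Alex", "Jue", "UNKNOWN", "MD", 11, 123123, 102889, 7, 2012),
    ("Alex", "Jue", "BJ123123", "MD", 11, 123123, 102889, 7, 2012),
    ("Sam", "Lee", "UNKNOWN", "DO", 4, 456456, 102889, 7, 2012),
    ("Sam", "Lee", "UNKNOWN", "DO", 4, 456456, 102889, 7, 2012),
]

SAME_GROUP = " AND ".join(f"k.{col} = r.{col}" for col in KEY_COLS)

DEDUPE_SQL = f"""
SELECT DISTINCT r.first_name, r.last_name, r.deaID, r.cert, r.cnt,
                r.npi, r.clientid, r.month, r.year
FROM rx r
WHERE r.deaID <> 'UNKNOWN'
   OR NOT EXISTS (              -- no known deaID anywhere in this group
        SELECT 1 FROM rx k
        WHERE {SAME_GROUP}
          AND k.deaID <> 'UNKNOWN')
ORDER BY r.last_name, r.deaID
"""


def drop_redundant_unknowns(rows):
    """Keep UNKNOWN rows only for groups that have no known deaID."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE rx (first_name TEXT, last_name TEXT, deaID TEXT, cert TEXT, "
            "cnt INTEGER, npi INTEGER, clientid INTEGER, month INTEGER, year INTEGER)"
        )
        conn.executemany("INSERT INTO rx VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(DEDUPE_SQL).fetchall()
    finally:
        conn.close()


deduped = drop_redundant_unknowns(RX_ROWS)
```

DISTINCT collapses repeated UNKNOWN rows within a group, which corresponds to the RowNumberInGroup = 1 branch of the window-function version.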
See whether this helps:
select distinct main.col1,main.col2 ,
isnull(( select col3 from table1 where table1.col1=main.col1
and table1.col2=main.col2 and col3 <>'UNKNOWN'),'UNKNOWN')
from table1 main
Or, adapted to your columns:
SELECT distinct first,
last,
cert,
count,
npi,
clientid,
month,
year,
isnull((
select top 1 deaID from table1 intable where
intable.first=maintable.first and
intable.last=maintable.last and
intable.cert=maintable.cert and
intable.npi=maintable.npi and
intable.clientid=maintable.clientid and
intable.month=maintable.month and
intable.year=maintable.year and
intable.deaID<>'UNKNOWN'),'UNKNOWN') as deaID
FROM table1 maintable