To ensure uniqueness of parent record(s) - sql

How can I ensure that I am getting uniqueness on the parent record? I want a count per mopid + user, day by day. How can I do this? Here is my code so far; I'm just not confident that it gives me uniqueness on mopid + user + day.
SELECT TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD') DayWorked, MOPNOTES.MOPNOTEUSER, MOPNOTES.mopid, COUNT(*) AS DAILY
FROM MOPUSER.MOPACTIVITY
INNER JOIN MOPUSER.MOPNOTES
ON MOPACTIVITY.MOPID=MOPNOTES.MOPID
WHERE MOPNOTES.MOPNOTEDATE > TO_DATE('01-JUL-13', 'DD-MON-YY') AND MOPNOTES.MOPNOTEDATE< TO_DATE('01-AUG-13', 'DD-MON-YY')
AND MOPACTIVITY.MOPSERVICEIMPACTED <> 'VOICE'
AND MOPACTIVITY.MOPSERVICEIMPACTED <> 'PWR/ENV'
AND (MOPNOTES.MOPNOTEUSER LIKE '%Ramesh%'
OR MOPNOTES.MOPNOTEUSER LIKE '%Saravanan%'
OR MOPNOTES.MOPNOTEUSER LIKE '%Boominathan%'
OR MOPNOTES.MOPNOTEUSER LIKE '%Srinivasan%'
OR MOPNOTES.MOPNOTEUSER LIKE '%Sathya%')
GROUP BY TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD'),MOPNOTES.MOPNOTEUSER,MOPNOTES.MOPID
ORDER BY TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD'),MOPNOTES.MOPNOTEUSER,MOPNOTES.MOPID

By definition, GROUP BY returns exactly one row per distinct combination of the grouped columns. Since you are grouping by mopid + user + day, you will have uniqueness on mopid + user + day.
To illustrate, this query:
SELECT a, b
FROM ( SELECT MOD(LEVEL, 2) a, MOD(LEVEL, 4) b FROM DUAL CONNECT BY LEVEL < 11 )
GROUP BY a, b
... will give you the same results as this query:
SELECT DISTINCT a, b
FROM ( SELECT MOD(LEVEL, 2) a, MOD(LEVEL, 4) b FROM DUAL CONNECT BY LEVEL < 11 )
If you really want to be sure, you could verify that the counts match between the following two queries:
SELECT COUNT(1) FROM (
SELECT TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD') DayWorked, MOPNOTES.MOPNOTEUSER, MOPNOTES.mopid, COUNT(*) AS DAILY
FROM ...[etc]...
GROUP BY TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD'),MOPNOTES.MOPNOTEUSER,MOPNOTES.MOPID
--ORDER BY TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD'),MOPNOTES.MOPNOTEUSER,MOPNOTES.MOPID
)
SELECT COUNT(1) FROM (
SELECT DISTINCT TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD') DayWorked, MOPNOTES.MOPNOTEUSER, MOPNOTES.mopid --, COUNT(*) AS DAILY
FROM ...[etc]...
--GROUP BY TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD'),MOPNOTES.MOPNOTEUSER,MOPNOTES.MOPID
--ORDER BY TO_CHAR(MOPNOTES.MOPNOTEDATE, 'YYYY-MM-DD'),MOPNOTES.MOPNOTEUSER,MOPNOTES.MOPID
)
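To make the comparison concrete, here is a minimal sketch that runs both counts, plus a direct "no key appears twice" check, on a toy table. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption of the sketch: date() plays the role of TO_CHAR(..., 'YYYY-MM-DD')); the table name mopnotes, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for MOPNOTES rows: (note_date, note_user, mopid).
# Some notes share the same (day, user, mopid), so the raw table has duplicates.
ROWS = [
    ("2013-07-02 09:15:00", "Ramesh", 101),
    ("2013-07-02 14:40:00", "Ramesh", 101),  # same day/user/mopid as above
    ("2013-07-02 10:00:00", "Sathya", 101),
    ("2013-07-03 08:30:00", "Ramesh", 101),
    ("2013-07-03 08:45:00", "Ramesh", 202),
    ("2013-07-03 16:05:00", "Ramesh", 202),  # duplicate key again
]


def check_uniqueness():
    """Return (grouped_rows, distinct_rows, max_rows_per_key) for the sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mopnotes (note_date TEXT, note_user TEXT, mopid INTEGER)")
    con.executemany("INSERT INTO mopnotes VALUES (?, ?, ?)", ROWS)

    # GROUP BY day + user + mopid, as in the question.
    grouped = con.execute(
        """SELECT COUNT(*) FROM (
               SELECT date(note_date) AS day_worked, note_user, mopid, COUNT(*) AS daily
               FROM mopnotes
               GROUP BY date(note_date), note_user, mopid)"""
    ).fetchone()[0]

    # Same key columns with DISTINCT and no GROUP BY (and no per-row column like ROWNUM).
    distinct = con.execute(
        """SELECT COUNT(*) FROM (
               SELECT DISTINCT date(note_date), note_user, mopid FROM mopnotes)"""
    ).fetchone()[0]

    # Direct check: how often does each key occur in the grouped output?
    max_per_key = con.execute(
        """SELECT MAX(n) FROM (
               SELECT COUNT(*) AS n FROM (
                   SELECT date(note_date) AS d, note_user, mopid
                   FROM mopnotes
                   GROUP BY date(note_date), note_user, mopid)
               GROUP BY d, note_user, mopid)"""
    ).fetchone()[0]
    con.close()
    return grouped, distinct, max_per_key


if __name__ == "__main__":
    print(check_uniqueness())
```

Both counts come out as 4 and no key occurs more than once, even though the raw table has 6 rows.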

Get range of dates from dates record in MS SQL

I have these date records:
with DateTable (dateItem) as
(
select '2022-07-03' union all
select '2022-07-05' union all
select '2022-07-04' union all
select '2022-07-09' union all
select '2022-07-12' union all
select '2022-07-13' union all
select '2022-07-18'
)
select dateItem
from DateTable
order by 1 asc
I want to get the ranges of consecutive dates from these records, like this:
with DateTableRange (dateItemStart, dateItemend) as
(
select '2022-07-03','2022-07-05' union all
select '2022-07-09','2022-07-09' union all
select '2022-07-12','2022-07-13' union all
select '2022-07-18','2022-07-18'
)
select dateItemStart, dateItemend
from DateTableRange
I am able to do it in SQL with a WHILE loop: take the first date, check whether the next date is one day later, and if so move the end date forward, then repeat.
But I don't know what the best or most efficient way is, as this involves lots of looping and temp tables.
Edit:
In the data we have 3, 4, 5, and 6, 7, 8 are missing, so the range is 3-5.
9 exists and 10 is missing, so the range is 9-9.
So the ranges depend purely on consecutive dates in DateTable.
Any suggestion will be appreciated.
With the added clarification, this requires a gaps-and-islands approach: first identify adjacent rows as groups, then use a window function to find the first and last value of each group.
This could be refined further, but it gives the desired results for the sample data:
with DateTable (dateItem) as
(
select '2022-07-03' union all
select '2022-07-05' union all
select '2022-07-04' union all
select '2022-07-09' union all
select '2022-07-12' union all
select '2022-07-13' union all
select '2022-07-18'
), valid as (
select *,
case when exists (
select * from DateTable d2 where Abs(DateDiff(day, d.dateitem, d2.dateitem)) = 1
) then 1 else 0 end v
from DateTable d
), grp as (
select *,
Row_Number() over(order by dateitem) - Row_Number()
over (partition by v order by dateitem) g
from Valid v
)
select distinct
Iif(v = 0, dateitem, First_Value(dateitem) over(partition by g order by dateitem)) DateItemStart,
Iif(v = 0, dateitem, First_Value(dateitem) over(partition by g order by dateitem desc)) DateItemEnd
from grp
order by dateItemStart;
After the clarification, this is definitely a gaps-and-islands problem.
The solution can look like this:
WITH DateTable(dateItem) AS
(
SELECT * FROM (
VALUES
('2022-07-03'),
('2022-07-05'),
('2022-07-04'),
('2022-07-09'),
('2022-07-12'),
('2022-07-13'),
('2022-07-18')
) t(v)
)
SELECT
MIN(dateItem) AS range_from,
MAX(dateItem) AS range_to
FROM (
SELECT
*,
SUM(CASE WHEN DATEADD(day, 1, prev_dateItem) >= dateItem THEN 0 ELSE 1 END) OVER (ORDER BY rn) AS range_id
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY dateItem) AS rn,
CAST(dateItem AS date) AS dateItem,
CAST(LAG(dateItem) OVER (ORDER BY dateItem) AS date) AS prev_dateItem
FROM DateTable
) groups
) islands
GROUP BY range_id
ORDER BY range_from;
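A compact variant of the same idea: within a run of consecutive days, the date minus its dense rank is constant, so that difference can serve directly as the island id. Below is a minimal sketch in Python's sqlite3, where SQLite stands in for SQL Server (an assumption: julianday() replaces the DATEADD/CAST date arithmetic) and the table name date_table is hypothetical. DENSE_RANK rather than ROW_NUMBER keeps duplicate dates in the same island.

```python
import sqlite3

DATES = ["2022-07-03", "2022-07-05", "2022-07-04", "2022-07-09",
         "2022-07-12", "2022-07-13", "2022-07-18"]

ISLANDS_SQL = """
WITH ranked AS (
    -- Dense rank: duplicate dates share a rank, consecutive dates get consecutive ranks.
    SELECT date_item, DENSE_RANK() OVER (ORDER BY date_item) AS rnk
    FROM date_table
), numbered AS (
    -- Day number minus rank is the same for every date in a consecutive run.
    SELECT date_item, julianday(date_item) - rnk AS island_id
    FROM ranked
)
SELECT MIN(date_item) AS date_item_start, MAX(date_item) AS date_item_end
FROM numbered
GROUP BY island_id
ORDER BY date_item_start
"""


def date_ranges(dates):
    """Return (start, end) pairs for each run of consecutive dates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE date_table (date_item TEXT)")
    con.executemany("INSERT INTO date_table VALUES (?)", [(d,) for d in dates])
    rows = con.execute(ISLANDS_SQL).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for start, end in date_ranges(DATES):
        print(start, end)
```

For the sample data this returns 2022-07-03 to 2022-07-05, 2022-07-09 to 2022-07-09, 2022-07-12 to 2022-07-13 and 2022-07-18 to 2022-07-18; two adjacent runs such as the 3rd-4th and 6th-7th stay separate.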

conditional running sum

I'm trying to return the number of unique users that converted over time.
So I have the following query:
WITH CTE
As
(
SELECT '2020-04-01' as date,'userA' as user,1 as goals Union all
SELECT '2020-04-01','userB',0 Union all
SELECT '2020-04-01','userC',0 Union all
SELECT '2020-04-03','userA',1 Union all
SELECT '2020-04-05','userC',1 Union all
SELECT '2020-04-06','userC',0 Union all
SELECT '2020-04-06','userB',0
)
select
date,
COUNT(DISTINCT
IF
(goals >= 1,
user,
NULL)) AS cad_converters
from CTE
group by date
I'm trying to count distinct users, but I need the distinct count to cover all dates up to and including each date, not just that one date. I probably need something like a cumulative sum...
The expected result would be something like this:
date, goals, total_unique_converted_users
'2020-04-01',1,1
'2020-04-01',0,1
'2020-04-01',0,1
'2020-04-03',1,1
'2020-04-05',1,2
'2020-04-06',0,2
'2020-04-06',0,2
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.date, t.goals, total_unique_converted_users
FROM `project.dataset.table` t
LEFT JOIN (
SELECT a.date,
COUNT(DISTINCT IF(b.goals >= 1, b.user, NULL)) AS total_unique_converted_users
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.date >= b.date
GROUP BY a.date
)
USING(date)
I would approach this by tagging when the first goal is scored for each user. Then simply do a cumulative sum:
select cte.* except (seqnum), countif(seqnum = 1) over (order by date)
from (select cte.*,
(case when goals = 1 then row_number() over (partition by user, goals order by date) end) as seqnum
from cte
) cte;
I realize this can be expressed without the case in the subquery:
select cte.* except (seqnum), countif(seqnum = 1 and goals = 1) over (order by date)
from (select cte.*,
row_number() over (partition by user, goals order by date) as seqnum
from cte
) cte;
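The tag-the-first-conversion idea can be checked locally with the question's sample rows. This is a minimal sketch using Python's sqlite3 instead of BigQuery: SQLite has no COUNTIF or SELECT * EXCEPT, so SUM(CASE ...) stands in for COUNTIF, and the column user is renamed user_name (both assumptions of the sketch). The table name events is hypothetical.

```python
import sqlite3

ROWS = [
    ("2020-04-01", "userA", 1),
    ("2020-04-01", "userB", 0),
    ("2020-04-01", "userC", 0),
    ("2020-04-03", "userA", 1),
    ("2020-04-05", "userC", 1),
    ("2020-04-06", "userC", 0),
    ("2020-04-06", "userB", 0),
]

RUNNING_SQL = """
WITH numbered AS (
    -- Number each user's converting rows (and, separately, non-converting rows) by date.
    SELECT date, user_name, goals,
           ROW_NUMBER() OVER (PARTITION BY user_name, goals >= 1 ORDER BY date) AS seqnum
    FROM events
)
SELECT date, user_name, goals,
       -- Only a user's first converting row contributes 1; the default window frame
       -- includes all rows with the same date, so same-day rows share one total.
       SUM(CASE WHEN goals >= 1 AND seqnum = 1 THEN 1 ELSE 0 END)
           OVER (ORDER BY date) AS total_unique_converted_users
FROM numbered
ORDER BY date, user_name
"""


def running_converters(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (date TEXT, user_name TEXT, goals INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = con.execute(RUNNING_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in running_converters(ROWS):
        print(row)
```

The running total is 1 through 2020-04-03 (userA converts twice but is counted once) and 2 from 2020-04-05 on, when userC converts.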

Avoid division by zero: 1 / 0 error in WITH clause

I am using the following in a WITH clause to create a FULL JOIN in BigQuery:
WITH
a AS(
SELECT
date AS Date,
SUM(Val1 / (1 - (Val2 + Val3))) AS Calc1,
FROM `project.dataset.table1`
GROUP BY Date
),
b as (SELECT
date AS Date,
FROM `project.dataset.table2`
GROUP BY Date
)
SELECT b.Date, SUM(a.Calc1)
FROM b
FULL JOIN a ON b.Date = a.Date
GROUP BY b.Date
Calc1 is creating a 'division by zero: 1 / 0' error, and I can't work out how to restructure this so it doesn't occur. The query works fine outside the WITH clause, because there I can leave out the GROUP BY and have no need to SUM Calc1.
Below is for BigQuery Standard SQL
Use
SUM(SAFE_DIVIDE(Val1, 1 - (Val2 + Val3))) AS Calc1
instead of
SUM(Val1 / (1 - (Val2 + Val3))) AS Calc1
Use NULLIF, which turns a zero denominator into NULL; the division then yields NULL, which SUM ignores:
WITH
a AS(
SELECT
date AS Date,
SUM(Val1 / NULLIF(1 - (Val2 + Val3), 0)) AS Calc1
FROM `project.dataset.table1`
GROUP BY Date
)
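A minimal sketch of the NULLIF pattern, run in Python's sqlite3 with hypothetical rows (the table table1 and its values are made up for illustration). Note that SQLite already returns NULL for division by zero instead of raising an error, so this only illustrates how NULLIF and SUM treat zero denominators; on BigQuery it is NULLIF (or SAFE_DIVIDE) that prevents the error.

```python
import sqlite3

# Hypothetical rows: (date, val1, val2, val3).
ROWS = [
    ("2020-01-01", 10.0, 0.25, 0.25),  # denominator 1 - 0.5 = 0.5
    ("2020-01-01", 4.0, 0.5, 0.5),     # denominator 0 -> NULLIF gives NULL
    ("2020-01-02", 3.0, 0.0, 0.25),    # denominator 0.75
    ("2020-01-03", 1.0, 1.0, 0.0),     # denominator 0 -> the whole day is NULL
]

SAFE_SUM_SQL = """
SELECT date,
       -- NULLIF(x, 0) returns NULL when x = 0, the division then yields NULL,
       -- and SUM skips NULL inputs (it returns NULL if every input is NULL).
       SUM(val1 / NULLIF(1 - (val2 + val3), 0)) AS calc1
FROM table1
GROUP BY date
ORDER BY date
"""


def daily_calc(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (date TEXT, val1 REAL, val2 REAL, val3 REAL)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = con.execute(SAFE_SUM_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(daily_calc(ROWS))
```

The zero-denominator row drops out of the 2020-01-01 sum (20.0), and 2020-01-03, whose only row has a zero denominator, comes back as NULL (None in Python).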
Have you analysed the data in "table1" to make sure Val1, Val2 and Val3 are consistently populated, or do you have NULL values?
That could be the issue with your subtraction from 1.
WITH
a AS(
SELECT
date AS Date,
SUM(Val1 / (1 - (IFNULL(Val2, 0.00) + IFNULL(Val3, 0.00)))) AS Calc1
FROM `project.dataset.table1`
GROUP BY Date
),
b as (
SELECT
date AS Date,
FROM `project.dataset.table2`
GROUP BY Date
)
SELECT b.Date, SUM(a.Calc1)
FROM b
FULL JOIN a ON b.Date = a.Date
GROUP BY b.Date

Group by in columns and rows, counts and percentages per day

I have a table that has data like the following:
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group the rows by attr and add columns showing each attr's count per day and its percentage of that day's total, as shown below:
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count using GROUP BY, but I can't work out how to separate the counts into multiple columns. I tried to generate the day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this doesn't give the correct answer either: I get all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm doing this in Redshift, which is based on PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount::float / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot the result into one column per day, if you need that.
This enhances JohnHC's query. If you need 7 days, you have to add those days to the CASE WHEN expressions.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr, t1.theday, t1.thecount, t1.thecount::float / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
If you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already covered in the other answers. Instead of joining CTEs to calculate the values, I use window functions, which I think is a bit shorter and more readable. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counts the rows per day and attr (count) and the rows per day (count_per_day).
B: for readability I convert the date into a number: the difference between the maximum date in the table and the row's date. So the most recent day gets 0, the day before gets 1, and so on.
C: calculates each attr's share of the day's rows and rounds it to two decimals.
D: pivots by filtering on the day numbers. COALESCE turns the NULL values into 0. To add more days, repeat these columns for the further day numbers.
Edit: Made the day counter more flexible for more days.
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
       COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
       1.0 * COUNT(*) FILTER (WHERE day_number = 1) /
             NULLIF(SUM(COUNT(*) FILTER (WHERE day_number = 1)) OVER (), 0) as day1_percent,
       COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
       1.0 * COUNT(*) FILTER (WHERE day_number = 2) /
             NULLIF(SUM(COUNT(*) FILTER (WHERE day_number = 2)) OVER (), 0) as day2_percent
FROM (SELECT attr,
             DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number
      FROM test_table
     ) s
GROUP BY attr
ORDER BY attr;
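Putting the pieces together (per-day counts, per-day totals, then conditional aggregation), here is a minimal sketch in Python's sqlite3 using the question's sample rows. SQLite stands in for Redshift/PostgreSQL, so date() replaces time::date and CASE replaces FILTER; the table name events and column event_time are hypothetical. Percentages are on a 0-100 scale, rounded to one decimal.

```python
import sqlite3

# Sample rows from the question: (attr, timestamp).
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

PIVOT_SQL = """
WITH per_day AS (
    -- Step 1: one count per (attr, calendar day).
    SELECT attr, date(event_time) AS d, COUNT(*) AS cnt
    FROM events
    GROUP BY attr, date(event_time)
), ranked AS (
    -- Step 2: number the days (1 = most recent) and attach each day's total.
    SELECT attr, cnt,
           DENSE_RANK() OVER (ORDER BY d DESC) AS day_number,
           SUM(cnt) OVER (PARTITION BY d) AS day_total
    FROM per_day
)
-- Step 3: conditional aggregation pivots day numbers into columns;
-- COALESCE turns "attr absent that day" into 0.
SELECT attr,
       COALESCE(MAX(CASE WHEN day_number = 1 THEN cnt END), 0) AS day1_count,
       ROUND(COALESCE(MAX(CASE WHEN day_number = 1 THEN 100.0 * cnt / day_total END), 0), 1) AS day1_pct,
       COALESCE(MAX(CASE WHEN day_number = 2 THEN cnt END), 0) AS day2_count,
       ROUND(COALESCE(MAX(CASE WHEN day_number = 2 THEN 100.0 * cnt / day_total END), 0), 1) AS day2_pct
FROM ranked
GROUP BY attr
ORDER BY attr
"""


def pivot_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (attr TEXT, event_time TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = con.execute(PIVOT_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pivot_counts(ROWS):
        print(row)
```

For the sample data this yields abc (2, 66.7, 0, 0.0), def (1, 33.3, 1, 50.0) and pqr (0, 0.0, 1, 50.0), where day 1 is the most recent date.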

SQL - values from two rows into new two rows

I have a query that gives the sum of the quantity of items on working days. On weekends and holidays the quantity and item values are empty.
On those empty days I would like to show the last known quantity and item.
My query is like this:
select a.dt,b.zaliha as quantity,b.artikal as item
from
(select to_date('01-01-2017', 'DD-MM-YYYY') + rownum -1 dt
from dual
connect by level <= to_date(sysdate) - to_date('01-01-2017', 'DD-MM-YYYY') + 1
order by 1)a
LEFT OUTER JOIN
(select kolicina,sum(kolicina)over(partition by artikal order by datum_do) as zaliha,datum_do,artikal
from
(select sum(vv.kolicinaulaz-vv.kolicinaizlaz)kolicina,vz.datum as datum_do,vv.artikal
from vlpzaglavlja vz, vlpvarijante vv
where vz.id=vv.vlpzaglavlje
and vz.orgjed='01006'
and vv.skladiste='01006'
and vv.artikal in (3069,6402)
group by vz.datum,vv.artikal
order by vv.artikal,vz.datum asc)
order by artikal,datum_do asc)b
on a.dt=b.datum_do
where a.dt between to_date('12102017','ddmmyyyy') and to_date('16102017','ddmmyyyy')
order by a.dt
and my output is like this:
and I want this:
In short, if quantity is null use lag(... ignore nulls) and coalesce or nvl:
select dt, item,
nvl(quantity, lag(quantity ignore nulls) over (partition by item order by dt))
from t
order by dt, item
Here is the full query. I cannot test it, but it is something like this:
with t as (
select a.dt, b.zaliha as quantity, b.artikal as item
from (
select date '2017-10-10' + rownum - 1 dt
from dual
connect by date '2017-10-10' + rownum - 1 <= date '2017-10-16' ) a
left join (
select kolicina, datum_do, artikal,
sum(kolicina) over(partition by artikal order by datum_do) as zaliha
from (
select sum(vv.kolicinaulaz-vv.kolicinaizlaz) kolicina,
vz.datum as datum_do, vv.artikal
from vlpzaglavlja vz
join vlpvarijante vv on vz.id = vv.vlpzaglavlje
where vz.orgjed = '01006' and vv.skladiste='01006'
and vv.artikal in (3069,6402)
group by vz.datum, vv.artikal)) b
partition by (b.artikal) -- densify: one row per item for every date in a
on a.dt = b.datum_do)
select *
from (
select dt, item,
nvl(quantity, lag(quantity ignore nulls)
over (partition by item order by dt)) qty
from t)
where dt >= date '2017-10-12'
order by dt, item
There are several issues in your query, major and minor:
In the date generator (subquery a) you select dates over a long period, from January 1 up to today, then join them with the main tables and sum the data, and only then select a small part. Why not filter the dates first?
to_date(sysdate): sysdate is already a date.
Use ANSI joins.
Do not use ORDER BY in subqueries; it has no effect, only the final ordering matters.
Use date literals when defining dates; they are more readable.
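SQLite has no LAG(... IGNORE NULLS), so this sketch swaps in a different technique to carry the last known quantity forward: a running COUNT(quantity) only increases on rows that have a value, so each empty day shares a group with the last non-empty day, and MAX over that group returns the single known value. It also builds one row per (date, item) with a cross join, so empty days keep their item. The table stock, its columns, and the quantities are hypothetical; Python's sqlite3 stands in for Oracle.

```python
import sqlite3

# Hypothetical running quantities per item; the weekend (14th, 15th) has no rows.
STOCK = [
    ("2017-10-12", 3069, 40),
    ("2017-10-12", 6402, 15),
    ("2017-10-13", 3069, 38),
    ("2017-10-13", 6402, 12),
    ("2017-10-16", 3069, 50),
    ("2017-10-16", 6402, 20),
]

FILL_SQL = """
WITH RECURSIVE days(dt) AS (
    SELECT '2017-10-12'
    UNION ALL
    SELECT date(dt, '+1 day') FROM days WHERE dt < '2017-10-16'
),
items AS (SELECT DISTINCT item FROM stock),
grid AS (
    -- One row per (day, item), so empty days still carry an item value.
    SELECT d.dt, i.item, s.quantity
    FROM days d
    CROSS JOIN items i
    LEFT JOIN stock s ON s.dt = d.dt AND s.item = i.item
),
grouped AS (
    -- COUNT ignores NULLs, so grp only increases on rows that have a quantity.
    SELECT dt, item, quantity,
           COUNT(quantity) OVER (PARTITION BY item ORDER BY dt) AS grp
    FROM grid
)
-- Each group holds exactly one non-NULL quantity, so MAX returns it.
SELECT dt, item, MAX(quantity) OVER (PARTITION BY item, grp) AS quantity
FROM grouped
ORDER BY dt, item
"""


def fill_forward(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stock (dt TEXT, item INTEGER, quantity INTEGER)")
    con.executemany("INSERT INTO stock VALUES (?, ?, ?)", rows)
    result = con.execute(FILL_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in fill_forward(STOCK):
        print(row)
```

Here the weekend days 2017-10-14 and 2017-10-15 inherit 38 for item 3069 and 12 for item 6402 from Friday 2017-10-13.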