Complex SQL query - Count always return 0 - sql

I want to count how much users created account each month, I fount here a query - author says it worked him, but i have problem - as i said - 3rd argument is always 0.
I was trying to fix it in many ways (delete, add args) but result was the same or SQL error.
SELECT y, m, Count(ytskagamerpl_userinfo.creationdate)
FROM (
SELECT y, m
FROM
(SELECT YEAR(CURDATE()) y UNION ALL SELECT YEAR(CURDATE())-1) years,
(SELECT 1 m UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12) months
) ym
LEFT JOIN ytskagamerpl_userinfo
ON ym.y = YEAR(FROM_UNIXTIME(ytskagamerpl_userinfo.creationdate))
AND ym.m = MONTH(FROM_UNIXTIME(ytskagamerpl_userinfo.creationdate))
WHERE (y=YEAR(CURDATE()) AND m<=MONTH(CURDATE()))
OR
(y<YEAR(CURDATE()) AND m>MONTH(CURDATE()))
GROUP BY y, m;
DATABASE:
uid | creationdate (TIMESTAMP):
1 | 20.10.2018
2 | 12.11.2018
... | ..........
X | 15.07.2019
PHP VAR DUMP _ https://pastebin.com/ABT5PCZn
I expect count of users created on X month 201X year, but it always returns 0,
years & months are displayed correctly.

Related

Converting monthly to daily data

I have monthly data that I would like to transform to daily data. The data looks like this. The extraction_dt is in date format.
isin
extraction_date
yield
001
2013-01-31
100
001
2013-02-28
110
001
2013-03-31
105
...
...
...
002
2013-01-31
200
...
...
...
And I would like to have something like this
isin
extraction_dt
yield
001
2013-01-01
100
001
2013-01-02
100
001
2013-01-03
100
..
.....
...
001
2013-02-01
110
...
...
...
I tried the following code but it does not work. I get the error message AnalysisException: Could not resolve table reference: 'cte'. How would you convert monthly to daily data?
with cte as
(select isin, extraction_dt, yield
from datashop
union all
select isin, extraction_dt, dateadd(d, 1, extraction_dt) AS date_dt, yield
from cte
where datediff(m,date_dt,dateadd(d, 1, date_dt))=0
)
select isin, date_dt,
1.0*isin / count(*) over (partition by isin, date_dt) AS daily_yield
from cte
order by 1,2
I can suggest easy solution.
generate a date series
match it with your data so it gets repeated.
So, here is the SQL you can use for Impala.
select isin, extraction_dt, a.dt AS date_dt, yield
from
datashop d,
(
select now() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as dt
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
WHERE
from_timestamp(a.dt,'yyyy/MM') =from_timestamp(d.extraction_dt,'yyyy/MM')
order by 1,2,3
the alias a is going to generate a series of dates.
WHERE - this clause will restrict to the month of extraction_dt. and you will get all possible values for a month.
ORDER BY - will show a nice output.
Your WITH clause has a recursive (self-referencing) query. In most SQL dialects, this requires using WITH RECURSIVE, not plain WITH. According to the Impala SQL reference, Impala does not support recursive common table expressions:
The Impala WITH clause does not support recursive queries in the
WITH, which is supported in some other database systems.
In other words, you cannot do this in Impala.

how to select set of records is ID is present in one of them

Here is the table where ORGID/USERID makes unique combination:
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
3 9
3 10
3 11
I need to select all records (organizations and users) wherever USERID 1 is present. So USERID=1 is present in ORGID 1 and 2 so then select all users for these organizations including user 1 itself, i.e.
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
Is it possible to do it with one SQL query rather than SELECT *.. WHERE USERID IN (SELECT...
You could use exists:
select *
from mytable t
where exists (select 1 from mytable t1 where t1.orgid = t.orgid and t1.userid = 1)
Another option is window functions. In Postgres:
select *
from (
select t.*,
bool_or(userid = 1) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1
Or a more generic approach, that uses portable expressions:
select *
from (
select t.*,
max(case when userid = 1 then 1 else 0 end) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1 = 1
Yes, you can do it with a single select statement - no in or exists conditions, no analytic or aggregate functions in a subquery, etc. Why you want to do it that way is not clear; in any case, it is possible that the solution below is also more efficient than the alternatives. You will have to test on your real-life data to see if that is true.
The solution below has two potential disadvantages: it only works in Oracle (it uses a proprietary extension of SQL, the match_recognize clause); and it only works in Oracle 12.1 or higher.
with
my_table(orgid, userid) as (
select 1, 1 from dual union all
select 1, 2 from dual union all
select 1, 3 from dual union all
select 1, 4 from dual union all
select 2, 1 from dual union all
select 2, 5 from dual union all
select 2, 6 from dual union all
select 2, 7 from dual union all
select 3, 9 from dual union all
select 3, 10 from dual union all
select 3, 11 from dual
)
-- End of SIMULATED data (for testing), not part of the solution.
-- In real life you don't need the WITH clause; reference your actual table.
select *
from my_table
match_recognize(
partition by orgid
all rows per match
pattern (x* a x*)
define a as userid = 1
);
Output:
ORGID USERID
---------- ----------
1 1
1 2
1 3
1 4
2 1
2 5
2 7
2 6
You can use exists:
select ou.*
from orguser ou
where exists (select 1
from orguser ou ou2
where ou2.orgid = ou.orgid and ou2.userid = 1
);
Apart from Exists and windows function, you can use IN as follows:
select *
from your_table t
where t.orgid in (select t1.orgid from your_table t1 where t1.userid = 1)

Running count distinct over a column - Oracle SQL

I want to aggregate the DAYS column based on the running distinct counts of CLIENT_ID, but the catch is CLIENT_ID that were seen from the previous DAYS should not be counted. How to do this in Oracle SQL?
Based on the table below (let's call this table DAY_CLIENT):
DAY CLIENT_ID
1 10
1 11
1 12
2 10
2 11
3 10
3 11
3 12
3 13
4 10
I want to get (let's call this table DAY_AGG):
DAYS CNT_CLIENT_ID
1 3
2 3
3 4
4 4
So, in day 1 there are 3 distinct client IDs.
In day 2, there are still 3 because CLIENT_ID 10 & 11 were already found in day 1. In day 3, distinct clients became 4 because CLIENT_ID 13 is not found on previous days.
Here's an alternative solution that may or may not be more performant than the other solutions:
WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
SELECT 4 DAY, 10 CLIENT_ID FROM dual)
SELECT DISTINCT DAY,
COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM (SELECT DAY,
client_id,
row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
FROM your_table);
DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
1 3
2 3
3 4
4 4
I recommend you test all the solutions against your data to see which one works best for you.
One approach used a correlated subquery:
SELECT DISTINCT
d1.DAYS,
(SELECT COUNT(DISTINCT d2.CLIENT_ID) FROM yourTable d2
WHERE d2.DAYS <= d1.DAYS) AS CNT_CLIENT_ID
FROM yourTable d1
Here is a demo below for SQL Server, but it should also run on your Oracle. I always struggle with setting up Oracle demos.
Demo
You could also use apply operator if oracle support.
select day, CNT_CLIENT_ID
from DAY_CLIENT t cross apply (
select count(distinct CLIENT_ID) as CNT_CLIENT_ID
from DAY_CLIENT
where day <= t.day) tt
group by day, CNT_CLIENT_ID;
In other way use subquery with correlation approach
select day, (select count(distinct CLIENT_ID)
from DAY_CLIENT
where day <= t.day) as DAY_CLIENT
from DAY_CLIENT t
group by day;
Try to keep it simple, always. All other answers also good if you want to learn other ways. But in this case no need to be fancy at all.
SELECT days
, COUNT(DISTINCT client_id) cnt
FROM
(
SELECT 1 days, 10 client_id FROM dual --1
UNION ALL
SELECT 1, 11 FROM dual --2
UNION ALL
SELECT 1, 12 FROM dual --3
UNION ALL
SELECT 1, 11 FROM dual --4
UNION ALL
SELECT 2, 10 FROM dual
UNION ALL
SELECT 2, 11 FROM dual
UNION ALL
SELECT 2, 12 FROM dual
UNION ALL
SELECT 3, 10 FROM dual
UNION ALL
SELECT 3, 11 FROM dual
UNION ALL
SELECT 3, 12 FROM dual
UNION ALL
SELECT 3, 13 FROM dual
UNION ALL
SELECT 4, 10 FROM dual
)
GROUP BY days
ORDER BY 1
/
DAYS | CLIENT_ID
----------------
1 3
2 3
3 4
4 1

Oracle SQL (Toad): Expand table

Suppose I have an SQL (Oracle Toad) table named "test", which has the following fields and entries (dates are in dd/mm/yyyy format):
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/06/2014 3
1 01/09/2014 6
2 01/04/2015 7
2 01/08/2015 43
2 01/09/2015 85
2 01/12/2015 4
I know from how the table has been created that, since there are value entries for id = 1 for February 2014 and June 2014, the values for March through May 2014 must be 0. The same applies to July and August 2014 for id = 1, and for May through July 2015 and October through November 2015 for id = 2.
Now, if I want to calculate, say, the median of the value column for a given id, I will not arrive at the correct result using the table as it stands - as I'm missing 5 zero entries for each id.
I would therefore like to create/use the following (potentially just temporary table)...
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/03/2014 0
1 01/04/2014 0
1 01/05/2014 0
1 01/06/2014 3
1 01/07/2014 0
1 01/08/2014 0
1 01/09/2014 6
2 01/04/2015 7
2 01/05/2015 0
2 01/06/2015 0
2 01/07/2015 0
2 01/08/2015 43
2 01/09/2015 85
2 01/10/2015 0
2 01/11/2015 0
2 01/12/2015 4
...on which I could then compute the median by id:
select id, median(value) as med_value from test group by id
How do I do this? Or would there be an alternative way?
Many thanks,
Mr Clueless
In this solution, I build a table with all the "needed dates" and value of 0 for all of them. Then, instead of a join, I do a union all, group by id and ref_date and ADD the values in each group. If the date had a row with a value in the original table, then that's the resulting value; and if it didn't, the value will be 0. This avoids a join. In almost all cases a union all + aggregate will be faster (sometimes much faster) than a join.
I added more input data for more thorough testing. In your original question, you have two id's, and for both of them you have four positive values. You are missing five values in each case, so there will be five zeros (0) which means the median is 0 in both cases. For id=3 (which I added) I have three positive values and three zeros; the median is half of the smallest positive number. For id=4 I have just one value, which then should be the median as well.
The solution includes, in particular, an answer to your specific question - how to create the temporary table (which most likely doesn't need to be a temporary table at all, but an inline view). With factored subqueries (in the WITH clause), the optimizer decides if to treat them as temporary tables or inline views; you can see what the optimizer decided if you look at the Explain Plan.
with
inputs ( id, ref_date, value ) as (
select 1, to_date('01/01/2014', 'dd/mm/yyyy'), 20 from dual union all
select 1, to_date('01/02/2014', 'dd/mm/yyyy'), 25 from dual union all
select 1, to_date('01/06/2014', 'dd/mm/yyyy'), 3 from dual union all
select 1, to_date('01/09/2014', 'dd/mm/yyyy'), 6 from dual union all
select 2, to_date('01/04/2015', 'dd/mm/yyyy'), 7 from dual union all
select 2, to_date('01/08/2015', 'dd/mm/yyyy'), 43 from dual union all
select 2, to_date('01/09/2015', 'dd/mm/yyyy'), 85 from dual union all
select 2, to_date('01/12/2015', 'dd/mm/yyyy'), 4 from dual union all
select 3, to_date('01/01/2016', 'dd/mm/yyyy'), 12 from dual union all
select 3, to_date('01/03/2016', 'dd/mm/yyyy'), 23 from dual union all
select 3, to_date('01/06/2016', 'dd/mm/yyyy'), 2 from dual union all
select 4, to_date('01/11/2014', 'dd/mm/yyyy'), 9 from dual
),
-- the "inputs" table constructed above is for testing only,
-- it is not part of the solution.
ranges ( id, min_date, max_date ) as (
select id, min(ref_date), max(ref_date)
from inputs
group by id
),
prep ( id, ref_date, value ) as (
select id, add_months(min_date, level - 1), 0
from ranges
connect by level <= 1 + months_between( max_date, min_date )
and prior id = id
and prior sys_guid() is not null
),
v ( id, ref_date, value ) as (
select id, ref_date, sum(value)
from ( select id, ref_date, value from prep union all
select id, ref_date, value from inputs
)
group by id, ref_date
)
select id, median(value) as median_value
from v
group by id
order by id -- ORDER BY is optional
;
ID MEDIAN_VALUE
-- ------------
1 0
2 0
3 1
4 9
If ref_date is date and is second
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= (select max(months_between(max_date, min_date)) from int1 ) )
select i.id
, add_months(i.min_date,s.n) as ref_date
, nvl(value,0) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
And with median
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= (select max(months_between(max_date, min_date)) from int1 ) )
select i.id
, MEDIAN(nvl(value,0)) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
group by i.id

SQL Monthly Summary

I have a table that contains a startdate for each item
for example:
ID - Startdate
1 - 2011-01-01
2 - 2011-02-01
3 - 2011-04-01
...
I need a query that will give me the count of each item within each month, i need a full 12 month report. I tried simply grouping by the Month(StartDate) but this doesnt give me a zero for the months with no values, in the case above, for march.
so i would like the output to be along the lines of..
Month - Count
1 20
2 14
3 0
...
Any ideas?
Thanks.
SELECT A.Month, ISNULL(B.countvalue,0) Count
FROM (SELECT 1 AS MONTH
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
UNION
SELECT 8
UNION
SELECT 9
UNION
SELECT 10
UNION
SELECT 11
UNION
SELECT 12 ) A LEFT JOIN (SELECT datepart(month,Startdate) AS Month, Count(ID) as countvalue FROM yourTable GROUP BY datepart(month,Startdate))B
ON A.month = B.month
Hope this helps
Another way to do this using SQL Server 2005+ or Oracle.
SQL Statement
;WITH q (Month) AS (
SELECT 1
UNION ALL
SELECT Month + 1
FROM q
WHERE q.Month < 12
)
SELECT q.Month
, COUNT(i.ID)
FROM q
LEFT OUTER JOIN Input i ON MONTH(i.StartDate) = q.Month
GROUP BY
q.Month
Test script
;WITH Input (ID, StartDate) AS (
SELECT 1, '2011-01-01'
UNION ALL SELECT 2, '2011-02-01'
UNION ALL SELECT 3, '2011-04-01'
)
, q (Month) AS (
SELECT 1
UNION ALL
SELECT Month + 1
FROM q
WHERE q.Month < 12
)
SELECT q.Month
, COUNT(i.ID)
FROM q
LEFT OUTER JOIN Input i ON MONTH(i.StartDate) = q.Month
GROUP BY
q.Month