SQL/Impala: combine multiple queries (with different WHERE clauses) into one

I have the following queries:
'select team, count(distinct id) as distinct_id_count_w1 from myTable where timestamp > t1 and timestamp < t2 group by team'
'select team, count(distinct id) as distinct_id_count_w2 from myTable where timestamp > t2 and timestamp < t3 group by team'
Is it possible to combine these two queries into one? Thanks!

Easily :) This should work on most common DB engines:
select team, count(distinct id) as distinct_id_count_w1, null as distinct_id_count_w2 from myTable where timestamp > t1 and timestamp < t2 group by team
UNION ALL
select team, null as distinct_id_count_w1, count(distinct id) as distinct_id_count_w2 from myTable where timestamp > t2 and timestamp < t3 group by team
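The sketch below checks what the UNION ALL query returns. It uses Python's built-in sqlite3 module as a stand-in for Impala. The table contents, the integer timestamps, and the window boundaries passed as t1=1, t2=10, t3=20 are all hypothetical. Each team gets one row per window, with NULL in the other window's column.

```python
import sqlite3

# Hypothetical sample rows: (team, id, timestamp). Integer timestamps stand in
# for real ones; window 1 is t1 < ts < t2, window 2 is t2 < ts < t3.
ROWS = [
    ("red", 1, 5), ("red", 2, 6), ("red", 1, 7),  # window 1: red has ids 1, 2
    ("red", 3, 15), ("blue", 4, 12),              # window 2: red has 3, blue has 4
]

UNION_QUERY = """
select team, count(distinct id) as distinct_id_count_w1, null as distinct_id_count_w2
from myTable where timestamp > :t1 and timestamp < :t2 group by team
UNION ALL
select team, null as distinct_id_count_w1, count(distinct id) as distinct_id_count_w2
from myTable where timestamp > :t2 and timestamp < :t3 group by team
"""


def union_counts(t1=1, t2=10, t3=20):
    """Run the UNION ALL query against an in-memory copy of ROWS."""
    con = sqlite3.connect(":memory:")
    con.execute("create table myTable (team text, id integer, timestamp integer)")
    con.executemany("insert into myTable values (?, ?, ?)", ROWS)
    rows = con.execute(UNION_QUERY, {"t1": t1, "t2": t2, "t3": t3}).fetchall()
    con.close()
    # Sort by team with the window-1 row first, so the output order is deterministic.
    return sorted(rows, key=lambda r: (r[0], r[1] is None))


if __name__ == "__main__":
    for row in union_counts():
        print(row)
```

Because the two halves put their counts in different columns, a team active in both windows shows up as two rows. That is why you might prefer the per-team layout.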
As Edamame stated, you may want to read both results per team. That was not clear from the question itself, but it can be solved this way:
SELECT
COALESCE(interval1.team, interval2.team) AS team,
interval1.distinct_id_count_w1,
interval2.distinct_id_count_w2
FROM (
select team, count(distinct id) as distinct_id_count_w1 from myTable where timestamp > t1 and timestamp < t2 group by team
) AS interval1
FULL OUTER JOIN
(
select team, count(distinct id) as distinct_id_count_w2 from myTable where timestamp > t2 and timestamp < t3 group by team
) AS interval2
ON interval1.team = interval2.team
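You might not have FULL OUTER JOIN available (SQLite only added it in 3.39), or you might want a single scan of the table. In either case, conditional aggregation gives the same one-row-per-team shape. This is a swapped-in technique, not the answer's query. The sketch runs it through Python's sqlite3 module on hypothetical data. Some Impala versions allow only one COUNT(DISTINCT ...) expression per query, so check this against your cluster before relying on it.

```python
import sqlite3

# Hypothetical rows standing in for myTable: (team, id, timestamp).
ROWS = [
    ("red", 1, 5), ("red", 2, 6), ("red", 1, 7),
    ("red", 3, 15), ("blue", 4, 12),
]

# Conditional aggregation: each CASE yields id only inside one window and NULL
# otherwise, and COUNT(DISTINCT ...) ignores NULLs, so each column counts
# exactly one window.
PIVOT_QUERY = """
select team,
       count(distinct case when timestamp > :t1 and timestamp < :t2 then id end)
           as distinct_id_count_w1,
       count(distinct case when timestamp > :t2 and timestamp < :t3 then id end)
           as distinct_id_count_w2
from myTable
where (timestamp > :t1 and timestamp < :t2)
   or (timestamp > :t2 and timestamp < :t3)
group by team
order by team
"""


def side_by_side_counts(t1=1, t2=10, t3=20):
    con = sqlite3.connect(":memory:")
    con.execute("create table myTable (team text, id integer, timestamp integer)")
    con.executemany("insert into myTable values (?, ?, ?)", ROWS)
    rows = con.execute(PIVOT_QUERY, {"t1": t1, "t2": t2, "t3": t3}).fetchall()
    con.close()
    return rows
```

One difference from the join: a team with no rows in one window gets 0 in that column rather than NULL.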

If you know the combined results cannot contain duplicate rows, use UNION ALL rather than UNION. Plain UNION makes the database remove duplicate rows from the result, which adds work and slows the query down.

Related

How to get Full Record with MAX as aggregate function

I have a table with schema (id, date, value, source, ticker). I want to get the record with the highest ID for each date in SQL Server.
Example Data
ID|date|value|source|ticker
3|10-Dec-2017|10|a|b
1|10-Dec-2017|11|p|q
The query below works in SQLite. Can I do the same in SQL Server?
select max(id), date, value, source, ticker from table group by date
Expected result:
ID|date|value|source|ticker
3|10-Dec-2017|10|a|b
Also, how can I do the same operation on the UNION of two tables with the same schema?
You can use a correlated subquery:
select t.*
from table t
where id = (select max(t1.id) from table t1 where t1.date = t.date);
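A quick way to sanity-check the correlated subquery is to run it on the question's rows with Python's sqlite3 module. Several details are assumptions for illustration only:
- the table name prices, which stands in for table (a reserved word);
- the ISO date strings;
- the extra row for a second date.

```python
import sqlite3

# Rows from the question plus one hypothetical row on a second date.
# "prices" is a hypothetical name standing in for the question's "table".
ROWS = [
    (3, "2017-12-10", 10, "a", "b"),
    (1, "2017-12-10", 11, "p", "q"),
    (2, "2017-12-11", 12, "x", "y"),  # hypothetical
]

# Keep a row only if its id is the largest id recorded for its date.
QUERY = """
select t.*
from prices t
where id = (select max(t1.id) from prices t1 where t1.date = t.date)
order by t.date
"""


def max_id_rows():
    con = sqlite3.connect(":memory:")
    con.execute("create table prices (id integer, date text, value integer, "
                "source text, ticker text)")
    con.executemany("insert into prices values (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```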
Alternatively, you can use the row_number() function:
select top (1) with ties *
from table t
order by row_number() over (partition by [date] order by id desc);
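SQLite has no TOP (1) WITH TIES. This sketch therefore keeps the same row_number() ranking but filters rn = 1 in a derived table. Every date partition has exactly one row numbered 1, so this returns the same rows. It uses the same hypothetical prices table and needs SQLite 3.25 or later for window functions.

```python
import sqlite3

ROWS = [
    (3, "2017-12-10", 10, "a", "b"),
    (1, "2017-12-10", 11, "p", "q"),
    (2, "2017-12-11", 12, "x", "y"),  # hypothetical second date
]

# Same ranking as the answer; "where rn = 1" replaces TOP (1) WITH TIES,
# keeping exactly the rows ranked 1 in each date partition.
QUERY = """
select id, date, value, source, ticker
from (
    select t.*,
           row_number() over (partition by date order by id desc) as rn
    from prices t
) ranked
where rn = 1
order by date
"""


def ranked_rows():
    con = sqlite3.connect(":memory:")
    con.execute("create table prices (id integer, date text, value integer, "
                "source text, ticker text)")
    con.executemany("insert into prices values (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```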
You can also join to a subquery that finds the maximum id per date:
select t1.* from table1 t1
join (
select max(id) as id, [date] from table1
group by [date]
) as t2 on t1.id = t2.id
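The question also asks about the UNION of two tables with the same schema. One way, sketched here, is to apply the same join pattern to a UNION ALL of both tables inside a CTE. CTEs are available in SQL Server 2005 and later. The sketch assumes both tables have identical column order. The table contents are hypothetical, and it runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

# Two hypothetical tables with the question's schema.
TABLE1 = [(3, "2017-12-10", 10, "a", "b"), (1, "2017-12-10", 11, "p", "q")]
TABLE2 = [(5, "2017-12-10", 7, "c", "d"), (2, "2017-12-11", 12, "x", "y")]

# Combine the tables first, then join to max(id) per date. Matching on date as
# well as id keeps an id that appears on another date in the other table from
# being matched by mistake.
QUERY = """
with u as (
    select * from table1
    union all
    select * from table2
)
select u.*
from u
join (
    select max(id) as id, date from u group by date
) as m on u.id = m.id and u.date = m.date
order by u.date
"""


def latest_from_union():
    con = sqlite3.connect(":memory:")
    for name, rows in (("table1", TABLE1), ("table2", TABLE2)):
        con.execute(f"create table {name} (id integer, date text, value integer, "
                    "source text, ticker text)")
        con.executemany(f"insert into {name} values (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```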

ORA-01427 - Need the counts of each value

I get "ORA-01427: single-row subquery returns more than one row" when I run the following query:
select count(*)
from table1
where to_char(timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
and attribute = (select distinct attribute from table2);
I want to get the counts of each value of attribute in the specific time frame.
I would recommend writing this as:
select count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
exists (select 1 from table2 t2 where t2.attribute = t1.attribute);
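This formulation compares timestamp directly instead of wrapping it in to_char(), which makes it easier to use indexes and statistics for optimizing the query. As for the error itself: = expects the subquery to return a single row, but select distinct attribute from table2 returns one row per distinct attribute. EXISTS (or IN) accepts any number of rows. Also, select distinct is not appropriate with in (although I think Oracle will optimize away the distinct).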
EDIT:
You appear to want to aggregate by attribute as well:
select t1.attribute, count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
exists (select 1 from table2 t2 where t2.attribute = t1.attribute)
group by t1.attribute;
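Here is a minimal sketch of the grouped EXISTS query, run through Python's sqlite3 module. The rows are hypothetical. The trunc(sysdate...) bounds are replaced by a fixed one-hour window passed as parameters, since SQLite has no sysdate. EXISTS only tests whether a match exists, so the duplicate "A" in table2 does not change the counts. "C" is dropped because table2 has no such attribute.

```python
import sqlite3

# Hypothetical data. The "A" row at 11:05 falls outside the window, and "C"
# has no match in table2.
TABLE1 = [("A", "2024-01-01 10:15"), ("A", "2024-01-01 10:45"),
          ("B", "2024-01-01 10:30"), ("C", "2024-01-01 10:20"),
          ("A", "2024-01-01 11:05")]
TABLE2 = [("A",), ("A",), ("B",)]  # the duplicate "A" does not affect EXISTS

# Fixed bounds stand in for trunc(sysdate-1/24, 'HH') and trunc(sysdate, 'HH').
QUERY = """
select t1.attribute, count(*)
from table1 t1
where timestamp >= :win_start and
      timestamp < :win_end and
      exists (select 1 from table2 t2 where t2.attribute = t1.attribute)
group by t1.attribute
order by t1.attribute
"""


def counts_per_attribute(win_start="2024-01-01 10:00", win_end="2024-01-01 11:00"):
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (attribute text, timestamp text)")
    con.execute("create table table2 (attribute text)")
    con.executemany("insert into table1 values (?, ?)", TABLE1)
    con.executemany("insert into table2 values (?)", TABLE2)
    rows = con.execute(QUERY, {"win_start": win_start, "win_end": win_end}).fetchall()
    con.close()
    return rows
```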
You can do it with a join and GROUP BY:
SELECT
count(*) AS Cnt
, a.attribute
FROM table1 t
JOIN table2 a ON t.attribute=a.attribute
WHERE to_char(t.timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
GROUP BY a.attribute
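This produces one row for each attribute that appears in both tables, paired with the corresponding count from table1. If an attribute value appears more than once in table2, the join multiplies its count accordingly.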
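The sketch below runs the join approach on hypothetical data using Python's sqlite3. A fixed time window replaces the to_char(sysdate...) comparison. table2 deliberately lists "A" twice, which shows that the inner join multiplies counts when table2 has duplicate attribute values. Joining to SELECT DISTINCT attribute FROM table2 instead avoids that; this is an adjustment, not part of the answer.

```python
import sqlite3

TABLE1 = [("A", "2024-01-01 10:15"), ("A", "2024-01-01 10:45"),
          ("B", "2024-01-01 10:30"), ("C", "2024-01-01 10:20")]
# "A" is listed twice on purpose to show the effect on the join.
TABLE2 = [("A",), ("A",), ("B",)]

# {source} is filled with either table2 or a DISTINCT subquery over it
# (constant strings only, never user input).
JOIN_QUERY = """
SELECT count(*) AS Cnt, a.attribute
FROM table1 t
JOIN {source} a ON t.attribute = a.attribute
WHERE t.timestamp >= :win_start AND t.timestamp < :win_end
GROUP BY a.attribute
ORDER BY a.attribute
"""


def join_counts(dedupe=False):
    source = "(SELECT DISTINCT attribute FROM table2)" if dedupe else "table2"
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (attribute text, timestamp text)")
    con.execute("create table table2 (attribute text)")
    con.executemany("insert into table1 values (?, ?)", TABLE1)
    con.executemany("insert into table2 values (?)", TABLE2)
    params = {"win_start": "2024-01-01 10:00", "win_end": "2024-01-01 11:00"}
    rows = con.execute(JOIN_QUERY.format(source=source), params).fetchall()
    con.close()
    return rows
```

Here join_counts() gives "A" a count of 4 (two table1 rows times two table2 rows), while join_counts(dedupe=True) gives 2.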

Get latest sql rows based on latest date and per user

I have the following table:
RowId, UserId, Date
1, 1, 1/1/01
2, 1, 2/1/01
3, 2, 5/1/01
4, 1, 3/1/01
5, 2, 9/1/01
I want to get the latest record per UserId based on Date, but it has to be part of the following query. I cannot change this query because it is auto-generated by a tool, but I can pass anything starting with AND...:
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND (
-- everything which needs to be done goes here . . .
)
I have tried a similar query, but I get an error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
EDIT: The database is SQL Server 2008.
You could use a NOT EXISTS condition:
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND NOT EXISTS (
SELECT *
FROM MyTable AS t
WHERE t.UserId = MyTable.UserId
AND t.Date > MyTable.Date
)
;
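Here is a small sketch, using Python's sqlite3 module, of what the NOT EXISTS condition returns for the sample data. The dates are rewritten as ISO strings on the assumption that they are M/D/YY; the per-user ordering is the same if they are D/M/YY. ORDER BY is added only to make the output deterministic, since the tool-generated query itself has none.

```python
import sqlite3

# The question's rows, with dates written as ISO strings (assuming M/D/YY).
ROWS = [(1, 1, "2001-01-01"), (2, 1, "2001-02-01"), (3, 2, "2001-05-01"),
        (4, 1, "2001-03-01"), (5, 2, "2001-09-01")]

# The fixed prefix from the tool, followed by the NOT EXISTS condition:
# keep a row only if no row for the same user has a later Date.
QUERY = """
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND NOT EXISTS (
    SELECT *
    FROM MyTable AS t
    WHERE t.UserId = MyTable.UserId
    AND t.Date > MyTable.Date
)
ORDER BY RowId
"""


def latest_per_user():
    con = sqlite3.connect(":memory:")
    # An explicitly declared RowId column takes precedence over SQLite's
    # built-in rowid alias, so RowId refers to this column.
    con.execute("create table MyTable (RowId integer, UserId integer, Date text)")
    con.executemany("insert into MyTable values (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```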
Note that if a user has more than one row with the same latest Date value, the query will return all such entries. If necessary, you can modify the subquery's condition slightly to make sure only one row is returned:
WHERE t.UserId = MyTable.UserId
AND (t.Date > MyTable.Date
OR t.Date = MyTable.Date AND t.RowId > MyTable.RowId
)
With the above condition, if two or more rows with the same Date exist for the same user, the one with the greater RowId value will be returned.
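The tie-breaking variant can be checked the same way. This sketch adds a hypothetical row 6 that ties with row 5 as user 2's latest Date. The plain condition keeps both rows, while the tie-breaking condition keeps only the one with the higher RowId.

```python
import sqlite3

# The question's rows (ISO dates, assuming M/D/YY) plus a hypothetical row 6
# that ties with row 5 as user 2's latest Date.
ROWS = [(1, 1, "2001-01-01"), (2, 1, "2001-02-01"), (3, 2, "2001-05-01"),
        (4, 1, "2001-03-01"), (5, 2, "2001-09-01"), (6, 2, "2001-09-01")]

PLAIN = "t.Date > MyTable.Date"
TIE_BREAK = ("t.Date > MyTable.Date "
             "OR t.Date = MyTable.Date AND t.RowId > MyTable.RowId")


def row_ids(condition):
    """Return the RowIds kept by NOT EXISTS with the given inner condition."""
    query = f"""
        SELECT RowId
        FROM MyTable
        WHERE 1 = 1
        AND NOT EXISTS (
            SELECT *
            FROM MyTable AS t
            WHERE t.UserId = MyTable.UserId
            AND ({condition})
        )
        ORDER BY RowId
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table MyTable (RowId integer, UserId integer, Date text)")
    con.executemany("insert into MyTable values (?, ?, ?)", ROWS)
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids
```

With this data, row_ids(PLAIN) returns [4, 5, 6] and row_ids(TIE_BREAK) returns [4, 6].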
Assuming you can put anything inside the AND clause, and that RowId grows with Date (so each user's highest RowId is also their latest entry), you can use a query like this in T-SQL:
SELECT RowId, UserId, [Date]
FROM MyTable
WHERE 1 = 1
AND (
RowId IN (
SELECT D.RowId
FROM (
SELECT MAX(RowId) AS RowId, UserId
FROM MyTable
GROUP BY UserId
) AS D
)
)
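This query relies on each user's highest RowId also being their latest Date. The sketch below runs it on SQLite with the table named MyTable, as in the question's generated query. The inner query keeps only the MAX(RowId) that the outer IN actually uses. The query matches the sample data, but it returns the wrong row once a hypothetical older entry for user 1 is inserted with a higher RowId.

```python
import sqlite3

ROWS = [(1, 1, "2001-01-01"), (2, 1, "2001-02-01"), (3, 2, "2001-05-01"),
        (4, 1, "2001-03-01"), (5, 2, "2001-09-01")]
# Hypothetical late insert: an *older* entry for user 1 gets the highest RowId.
OUT_OF_ORDER_ROW = (6, 1, "2000-12-01")

QUERY = """
SELECT RowId, UserId, [Date]
FROM MyTable
WHERE 1 = 1
AND (
    RowId IN (
        SELECT D.RowId
        FROM (
            SELECT MAX(RowId) AS RowId, UserId
            FROM MyTable
            GROUP BY UserId
        ) AS D
    )
)
ORDER BY UserId
"""


def max_rowid_rows(extra_rows=()):
    con = sqlite3.connect(":memory:")
    con.execute("create table MyTable (RowId integer, UserId integer, Date text)")
    con.executemany("insert into MyTable values (?, ?, ?)", ROWS + list(extra_rows))
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

With OUT_OF_ORDER_ROW added, user 1's result becomes the 2000-12-01 entry rather than the 2001-03-01 one.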
Try:
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND EXISTS
(SELECT 1
FROM (SELECT UserId, MAX(Date) MaxDate
FROM MyTable
GROUP BY UserId) m
WHERE m.UserId = MyTable.UserId and m.MaxDate = MyTable.Date)
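Here is a sketch of the EXISTS/MAX approach on the same sample data, using Python's sqlite3 with dates assumed to be M/D/YY and stored as ISO strings. Like the NOT EXISTS answer, it returns every row that ties for a user's latest Date. The optional hypothetical extra row demonstrates this.

```python
import sqlite3

ROWS = [(1, 1, "2001-01-01"), (2, 1, "2001-02-01"), (3, 2, "2001-05-01"),
        (4, 1, "2001-03-01"), (5, 2, "2001-09-01")]

# Keep a row if it matches its user's MAX(Date) from the grouped subquery.
QUERY = """
SELECT RowId, UserId, Date
FROM MyTable
WHERE 1 = 1
AND EXISTS
(SELECT 1
 FROM (SELECT UserId, MAX(Date) MaxDate
       FROM MyTable
       GROUP BY UserId) m
 WHERE m.UserId = MyTable.UserId and m.MaxDate = MyTable.Date)
ORDER BY RowId
"""


def exists_max_row_ids(extra_rows=()):
    con = sqlite3.connect(":memory:")
    con.execute("create table MyTable (RowId integer, UserId integer, Date text)")
    con.executemany("insert into MyTable values (?, ?, ?)", ROWS + list(extra_rows))
    ids = [row[0] for row in con.execute(QUERY)]
    con.close()
    return ids
```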
Assuming that RowID is an identity column:
SELECT t1.RowId, t1.UserId, t1.Date
FROM MyTable t1
WHERE 1 = 1
AND t1.RowID IN (
SELECT TOP 1 t2.RowID
FROM MyTable t2
WHERE t1.UserId = t2.UserId
AND t2.Date = (SELECT MAX(t3.Date) FROM MyTable t3
WHERE t2.UserID = t3.UserId)
)

SQL how to select a group of records based on some statistics of this group?

For example, I have a record set with three columns:
id,week,count
1,1,10;
1,2,20;
1,3,30;
2,1,3;
2,2,2;
2,3,15;
I want only the data of IDs whose average count is greater than 10. In this example, the data of id = 1 would be selected (its average is 20, while id = 2 averages about 6.67).
Thanks.
SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10
SELECT *
FROM YourTable
WHERE id IN (SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10)
Or, if you are using an Access database (where IN with a subquery tends to perform very poorly), you can use a join instead:
SELECT t2.*
FROM (SELECT id FROM YourTable GROUP BY id HAVING AVG(count) > 10) AS t1
INNER JOIN YourTable AS t2 ON t1.id = t2.id
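Both the IN and the join forms can be checked against the question's data with Python's sqlite3 module. The column is quoted as "count" to avoid any confusion with the COUNT function. The table name YourTable is taken from the answers.

```python
import sqlite3

# The question's rows: (id, week, count).
ROWS = [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 3), (2, 2, 2), (2, 3, 15)]

IN_QUERY = """
SELECT *
FROM YourTable
WHERE id IN (SELECT id FROM YourTable GROUP BY id HAVING AVG("count") > 10)
ORDER BY id, week
"""

JOIN_QUERY = """
SELECT t2.*
FROM (SELECT id FROM YourTable GROUP BY id HAVING AVG("count") > 10) AS t1
INNER JOIN YourTable AS t2 ON t1.id = t2.id
ORDER BY t2.id, t2.week
"""


def select_rows(query):
    con = sqlite3.connect(":memory:")
    con.execute('create table YourTable (id integer, week integer, "count" integer)')
    con.executemany("insert into YourTable values (?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

In SQL Server, AVG over an integer column returns an integer with the fractional part discarded. If the threshold depends on fractional averages, use something like AVG(count * 1.0).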
In most databases, you can also do this with window functions:
select t.*
from (select t.*, avg(count) over (partition by id) as avgcount
from YourTable t
) t
where avgcount > 10
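The window-function version returns the same rows, plus the per-id average as an extra column. This sketch runs it on SQLite using the question's data, with the table assumed to be named YourTable. Window functions need SQLite 3.25 or later.

```python
import sqlite3

ROWS = [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 3), (2, 2, 2), (2, 3, 15)]

# avg(...) over (partition by id) attaches each id's average to every one of
# its rows, so the outer WHERE can filter on it without a GROUP BY.
QUERY = """
select id, week, "count", avgcount
from (select t.*, avg("count") over (partition by id) as avgcount
      from YourTable t
     ) t
where avgcount > 10
order by id, week
"""


def window_rows():
    con = sqlite3.connect(":memory:")
    con.execute('create table YourTable (id integer, week integer, "count" integer)')
    con.executemany("insert into YourTable values (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```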

PostgreSQL Selecting Most Recent Entry for a Given ID

Table Essentially looks like:
Serial-ID, ID, Date, Data, Data, Data, etc.
There can be Multiple Rows for the Same ID. I'd like to create a view of this table to be used in Reports that only shows the most recent entry for each ID. It should show all of the columns.
Can someone help me with the SQL select? thanks.
There are about five different ways to do this, but here's one:
SELECT *
FROM yourTable AS T1
WHERE NOT EXISTS(
SELECT *
FROM yourTable AS T2
WHERE T2.ID = T1.ID AND T2.Date > T1.Date
)
And here's another:
SELECT T1.*
FROM yourTable AS T1
LEFT JOIN yourTable AS T2 ON
(
T2.ID = T1.ID
AND T2.Date > T1.Date
)
WHERE T2.ID IS NULL
One more:
WITH T AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Date DESC) AS rn
FROM yourTable
)
SELECT * FROM T WHERE rn = 1
OK, I'm getting carried away; here's the last one I'll post (for now):
WITH T AS (
SELECT ID, MAX(Date) AS latest_date
FROM yourTable
GROUP BY ID
)
SELECT yourTable.*
FROM yourTable
JOIN T ON T.ID = yourTable.ID AND T.latest_date = yourTable.Date
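The four queries above can be compared on a small hypothetical table, with SQLite standing in for PostgreSQL. Here serial_id stands in for Serial-ID, and only one data column is kept. The rows deliberately include two entries for id 20 with the same latest date. NOT EXISTS, the LEFT JOIN and the MAX join return both of them. ROW_NUMBER() keeps exactly one, with no guarantee which one unless its ORDER BY includes a tiebreaker such as serial_id.

```python
import sqlite3

# Hypothetical rows: (serial_id, id, date, data). id 20 has two rows tied on
# its latest date.
ROWS = [
    (1, 10, "2018-01-01", "a"),
    (2, 10, "2018-02-01", "b"),
    (3, 20, "2018-01-15", "c"),
    (4, 20, "2018-03-01", "d"),
    (5, 20, "2018-03-01", "e"),
]

NOT_EXISTS_QUERY = """
SELECT * FROM yourTable AS T1
WHERE NOT EXISTS (SELECT * FROM yourTable AS T2
                  WHERE T2.ID = T1.ID AND T2.Date > T1.Date)
"""

LEFT_JOIN_QUERY = """
SELECT T1.* FROM yourTable AS T1
LEFT JOIN yourTable AS T2 ON (T2.ID = T1.ID AND T2.Date > T1.Date)
WHERE T2.ID IS NULL
"""

ROW_NUMBER_QUERY = """
WITH T AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Date DESC) AS rn
    FROM yourTable
)
SELECT * FROM T WHERE rn = 1
"""

MAX_JOIN_QUERY = """
WITH T AS (SELECT ID, MAX(Date) AS latest_date FROM yourTable GROUP BY ID)
SELECT yourTable.* FROM yourTable
JOIN T ON T.ID = yourTable.ID AND T.latest_date = yourTable.Date
"""


def serials(query):
    """Run query on an in-memory copy of ROWS; return sorted serial_ids."""
    con = sqlite3.connect(":memory:")
    con.execute("create table yourTable "
                "(serial_id integer, id integer, date text, data text)")
    con.executemany("insert into yourTable values (?, ?, ?, ?)", ROWS)
    result = sorted(row[0] for row in con.execute(query))
    con.close()
    return result
```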
I would use DISTINCT ON:
CREATE VIEW your_view AS
SELECT DISTINCT ON (id) *
FROM your_table a
ORDER BY id, date DESC;
This works because DISTINCT ON keeps only the first row of each set of rows that share the same value of the expression in parentheses, with "first" determined by the ORDER BY. Sorting by date DESC within each id puts the most recent entry first, so that is the row that shows up in the result.
https://www.postgresql.org/docs/10/static/sql-select.html#SQL-DISTINCT
This seems like a good use for correlated subqueries:
CREATE VIEW your_view AS
SELECT *
FROM your_table a
WHERE date = (
SELECT MAX(date)
FROM your_table b
WHERE b.id = a.id
)
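This returns exactly one row per id only if the latest date is unique within each id, which is why a TIMESTAMP column works better here than a plain DATE. Rows tied on the latest date are all returned.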
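The sketch below runs the correlated-subquery view's SELECT on hypothetical data, with SQLite standing in for PostgreSQL, to show why uniqueness matters. With distinct dates per id, it returns one row each. Once a hypothetical tie is added, both tied rows come back.

```python
import sqlite3

# Hypothetical rows: (serial_id, id, date, data).
BASE_ROWS = [
    (1, 10, "2018-01-01", "a"),
    (2, 10, "2018-02-01", "b"),
    (3, 20, "2018-01-15", "c"),
    (4, 20, "2018-03-01", "d"),
]
TIE_ROW = (5, 20, "2018-03-01", "e")  # hypothetical tie with row 4

# Keep each row whose date equals the latest date for its id.
QUERY = """
SELECT *
FROM your_table a
WHERE date = (
    SELECT MAX(date)
    FROM your_table b
    WHERE b.id = a.id
)
"""


def latest_serials(include_tie=False):
    rows = (BASE_ROWS + [TIE_ROW]) if include_tie else BASE_ROWS
    con = sqlite3.connect(":memory:")
    con.execute("create table your_table "
                "(serial_id integer, id integer, date text, data text)")
    con.executemany("insert into your_table values (?, ?, ?, ?)", rows)
    result = sorted(row[0] for row in con.execute(QUERY))
    con.close()
    return result
```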