Combine multiple aggregate queries into one insert statement - sql

I'm trying to build a database of time-series data, and I've got several aggregate queries that I'm trying to mash together to form a set of data for a single insert.
SELECT 'some date' AS date, a, b, COUNT(foo) AS c
FROM 'my_db'
WHERE date BETWEEN dateadd(hour,-24,'some date') AND dateadd(hour,-23,'some date')
GROUP BY a, b
ORDER BY a, b ASC;
SELECT a, b, COUNT(foo) AS c1
FROM 'my_db'
WHERE (date BETWEEN dateadd(hour,-24,'some date') AND dateadd(hour,-23,'some date'))
AND (foo = 'some value')
GROUP BY a, b
ORDER BY a, b ASC;
SELECT a, b, COUNT(foo) AS c2
FROM 'my_db'
WHERE (date BETWEEN dateadd(hour,-24,'some date') AND dateadd(hour,-23,'some date'))
AND (foo = 'some other value')
GROUP BY a, b
ORDER BY a, b ASC;
If these were in separate tables, I feel like I'd be able to individually do full outer joins on fields a and b, then just fill the empty fields with 0s.
And by the end, I'd be able to get the dataset to look like this for a single insert:
| date                | a    | b     | c   | c1 | c2  |
|---------------------|------|-------|-----|----|-----|
| 2019-08-27 12:00:00 | dog  | woof  | 100 | 76 | 26  |
| 2019-08-27 12:00:00 | cat  | meow  | 82  | 33 | 49  |
| 2019-08-27 12:00:00 | pony | neigh | 300 | 0  | 300 |
Is there a clean way to combine these queries into one so that I can bulk insert the formatted set?
Or is there a smarter way to approach the problem?

I think you want conditional aggregation:
SELECT 'some date' AS date, a, b, COUNT(foo) AS c,
SUM(CASE WHEN foo = 'some value' THEN 1 ELSE 0 END) as c1,
SUM(CASE WHEN foo = 'some other value' THEN 1 ELSE 0 END) as c2
FROM 'my_db'
WHERE date BETWEEN dateadd(hour, -24, 'some date') AND dateadd(hour, -23, 'some date')
GROUP BY a, b
ORDER BY a, b ASC;
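Since the end goal is an insert, the same query can feed an INSERT ... SELECT directly. A minimal sketch (the target table timeseries_summary and its column list are hypothetical, 'my_db' is kept as the question's placeholder, and the ORDER BY is dropped because row order is irrelevant for an insert):
INSERT INTO timeseries_summary (date, a, b, c, c1, c2)  -- hypothetical target table
SELECT 'some date' AS date, a, b, COUNT(foo) AS c,
       SUM(CASE WHEN foo = 'some value' THEN 1 ELSE 0 END) AS c1,
       SUM(CASE WHEN foo = 'some other value' THEN 1 ELSE 0 END) AS c2
FROM 'my_db'
WHERE date BETWEEN dateadd(hour, -24, 'some date') AND dateadd(hour, -23, 'some date')
GROUP BY a, b;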

Related

SQL Multiple Join but Retain all Records from One Table

I have a difficult operation that must be performed within SQL due to operational limitations.
I have 3 tables that contain required information. They each have some common columns that can be used for a join but do not have a single one that all three share.
The 3 tables are:
Table 1:
| rule_type | code |
|-----------|------|
| Type A | A1 |
| Type A | A1 |
| Type B | B1 |
| Type B | B1 |
| Type C | C1 |
| Type C | C1 |
Table 2:
| site_ref | code |
|----------|------|
| XYZ | A1 |
| XYZ | A1 |
| XYZ | C1 |
| XYZ | C1 |
Table 3:
| site_ref | population |
|----------|------------|
| XYZ | 100 |
| XYZ | 100 |
| XYZ | 100 |
The required JOIN must contain columns from all 3 tables and an additional one that counts the number of distinct entries from table 1. The desired outcome would be:
| rule_type | code | site_ref | population | count |
|-----------|------|----------|------------|-------|
| Type A | A1 | XYZ | 100 | 2 |
| Type B | B1 | XYZ | 100 | 0 |
| Type C | C1 | XYZ | 100 | 2 |
I have attempted to create this by joining on common columns via a FULL OUTER JOIN:
SELECT code, count(*) as count, site_ref, population, rule_type
FROM
(SELECT A.code, B.site_ref, C.population, A.rule_type
FROM table_1 as A
FULL OUTER JOIN table_2 AS B ON A.code = B.code
JOIN table_3 as C ON B.site_ref = C.site_ref
WHERE site_ref = 'XYZ' AND rule_type in ('Type A', 'Type B', 'Type C')) AS sub
GROUP BY code, site_ref, population, rule_type
But this is returning:
| rule_type | code | site_ref | population | count |
|-----------|------|----------|------------|-------|
| Type A | A1 | XYZ | 100 | 2 |
| Type C | C1 | XYZ | 100 | 2 |
Because there is no corresponding Type B entry in table 2, there is nothing to count. I thought using a FULL OUTER JOIN would bring in these additional rule types, but it hasn't. Is there a way to adapt the JOIN so that it brings in the additional rows from table 1 and creates an entry showing Type B with a count of 0?
I don't understand table 3 - it always has the same data. If not, how do we know which dataset is for Type B?
If you want to have all possibilities, you can do a cross join.
with table_1
as
(
Select 'Type A' as rule_type, 'A1' as code
Union all
Select 'Type A' as rule_type, 'A1' as code
Union all
Select 'Type B' as rule_type, 'B1' as code
Union all
Select 'Type B' as rule_type, 'B1' as code
Union all
Select 'Type C' as rule_type, 'C1' as code
Union all
Select 'Type C' as rule_type, 'C1' as code
),
table_2
as
(
Select 'XYZ' as site_ref, 'A1' as code
Union all
Select 'XYZ' as site_ref, 'A1' as code
Union all
Select 'XYZ' as site_ref, 'C1' as code
Union all
Select 'XYZ' as site_ref, 'C1' as code
),
table_3
as
(
Select 'XYZ' as site_ref, '100' as population
Union all
Select 'XYZ' as site_ref, '100' as population
Union all
Select 'XYZ' as site_ref, '100' as population
)
Select
x.code,
x.rule_type,
c.site_ref,
c.population,
x.count
from
(
Select
a.code,
a.rule_type,
b.count
from(
Select code,
count(code) as count
from table_2
group by code
) B
full outer join table_1 AS A ON
a.code = b.code
group by
a.code,
a.rule_type,
b.count
) x
cross join table_3 as c
WHERE site_ref = 'XYZ' AND rule_type in ('Type A', 'Type B', 'Type C')
group by
x.code,
x.rule_type,
c.site_ref,
c.population,
x.count
If you want default values, you can use CASE WHEN:
Select
a.code,
a.rule_type,
case when x.site_ref is NUll then 'XYZ' else x.site_ref end as site_ref ,
case when c.population is NUll then '100' else c.population end as population,
case when x.count is NUll then '0' else x.count end as count
from
(
Select
code,
site_ref,
count(b.Code) as count
from
table_2 as b
group by
b.code,
b.site_ref
) x
full outer join table_1 as a on
a.code = x.code
full outer join table_3 as c ON
x.site_ref = c.site_ref
group by
a.code,
a.rule_type,
x.site_ref,
c.population,
x.count
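If table_2 and table_3 really look like the samples (one site with one population, as the comment above asks), a shorter route to the desired output is to take the distinct rules from table_1, LEFT JOIN table_2 for the counts, and attach the site data with a cross join. A minimal sketch against the sample data, assuming one population per site:
SELECT t1.rule_type,
       t1.code,
       s.site_ref,
       s.population,
       COUNT(t2.code) AS count          -- counts table_2 matches; 0 when there are none
FROM (SELECT DISTINCT rule_type, code FROM table_1) AS t1
LEFT JOIN table_2 AS t2 ON t2.code = t1.code
CROSS JOIN (SELECT DISTINCT site_ref, population FROM table_3) AS s
WHERE s.site_ref = 'XYZ'
GROUP BY t1.rule_type, t1.code, s.site_ref, s.population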

T-SQL Query Column based on filtered condition

I could do this rather easily in Python (or any other language), but I'm trying to see if this is possible with pure T-SQL.
I have two tables:
Table A has a bunch of general data and timestamps with each row
+------+------+------+-----------+
| Col1 | Col2 | Col3 | Timestamp |
+------+------+------+-----------+
| A | B | C | 17:00 |
| D | E | F | 18:00 |
| G | H | I | 23:00 |
+------+------+------+-----------+
Table B is considered metadata
+-------+-----------+
| RunNo | Timestamp |
+-------+-----------+
| 1 | 16:50 |
| 2 | 17:30 |
| 3 | 18:00 |
| 4 | 19:00 |
+-------+-----------+
So the general data is referenced to a "RunNo". The timestamp in table B is just when that "Run" was created in the DB. You can match the General data to its proper run number by comparing the timestamps. For example the timestamp for the first row in Table A is 17:00 which is greater than 16:50 and less than 17:30, so obviously this row belongs to RunNo 1. How can I perform this query so the resulting table is
+------+------+------+-----------+-------+
| Col1 | Col2 | Col3 | Timestamp | RunNo |
+------+------+------+-----------+-------+
| A | B | C | 17:00 | 1 |
| D | E | F | 18:00 | 2 |
| G | H | I | 23:00 | 4 |
+------+------+------+-----------+-------+
I thought maybe using CASE would be helpful here, but I couldn't figure out how to put it together:
SELECT a.*,
CASE WHEN a.TIMESTAMP < b.TIMESTAMP AND a.TIMESTAMP > b.TIMESTAMP then b.RunNo END AS RunNo
FROM A as a, B as b
Any help would be greatly appreciated.
CASE allows you to return different values (i.e. columns or expressions) based on a condition. This is not what you want here. You want to join the tables and filter matching rows based on a condition.
I have replaced the column name Timestamp with ts because, even escaped, I had difficulties with it on SQL Fiddle; it is a reserved keyword.
SELECT A.Col1, A.Col2, A.Col3, A.ts, MAX(B.RunNo) AS RunNo
FROM
A
INNER JOIN B
ON A.ts > B.ts
GROUP BY A.Col1, A.Col2, A.Col3, A.ts
With A.ts > B.ts this returns RunNo 2 for the second entry. With A.ts >= B.ts this returns RunNo 3 for the second entry.
See http://sqlfiddle.com/#!18/9dd143/6/0
with TableA as (
Select [Col1] = 'A',[Col2] = 'B',[Col3] = 'C',[Timestamp] = '17:00'
Union all Select [Col1] = 'D',[Col2] = 'E',[Col3] = 'F',[Timestamp] = '18:00'
Union all Select [Col1] = 'G',[Col2] = 'H',[Col3] = 'I',[Timestamp] = '23:00'
)
, TableB as (
Select [RunNo] = '1',[Timestamp] = '16:50'
Union all Select [RunNo] = '2',[Timestamp] = '17:30'
Union all Select [RunNo] = '3',[Timestamp] = '18:00'
Union all Select [RunNo] = '4',[Timestamp] = '19:00'
)
, TableBWithRowNumber as (
select b.RunNo, ROW_NUMBER() over (order by b.timestamp asc) as number, cast(b.Timestamp as time) as timestamp
from TableB b
)
, TableBWithNextRun as (
select b1.RunNo, startTime = b1.timestamp , endTime = b2.timestamp
from TableBWithRowNumber b1
left join TableBWithRowNumber b2 on b1.number + 1= b2.number
)
select *
from TableA a
inner join TableBWithNextRun B
on a.Timestamp >= b.startTime and (a.Timestamp < b.endTime or b.endTime is null)
This converts your timestamps to time. I wasn't sure what your datatype is internally.
This outputs the following
Col1 Col2 Col3 Timestamp RunNo startTime endTime
A B C 17:00 1 16:50:00.0000000 17:30:00.0000000
D E F 18:00 3 18:00:00.0000000 19:00:00.0000000
G H I 23:00 4 19:00:00.0000000 NULL
You can use a window function (LEAD, or LAG looked at from the other side) to get the adjacent run timestamp and then just join.
WITH Runs AS
(
SELECT
RunNo,
[Timestamp] AS START_TS,
LEAD([Timestamp]) OVER (ORDER BY RunNo) AS END_TS
FROM TableB
)
SELECT B.RunNo, A.*
FROM TableA A
JOIN Runs B ON A.[Timestamp] >= B.START_TS
AND (A.[Timestamp] < B.END_TS OR B.END_TS IS NULL)
This should be faster than any group by solution on larger datasets.
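Another T-SQL option is OUTER APPLY, which picks, for each data row, the most recent run that started at or before it. A minimal sketch (same boundary caveat as above for rows landing exactly on a run start):
SELECT a.Col1, a.Col2, a.Col3, a.[Timestamp], r.RunNo
FROM TableA AS a
OUTER APPLY (
    SELECT TOP (1) b.RunNo
    FROM TableB AS b
    WHERE b.[Timestamp] <= a.[Timestamp]   -- runs that started at or before this row
    ORDER BY b.[Timestamp] DESC            -- most recent such run
) AS r;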

Discard rows which are not MAX in that group

I have data like this:
| a   | b | c  |
|-----|---|----|
| 100 | 3 | 50 |
| 100 | 4 | 60 |
| 101 | 3 | 70 |
| 102 | 3 | 70 |
| 102 | 4 | 80 |
| 102 | 5 | 90 |
a : key
b : sub_id
c : value
I want to set c to NULL for every row that does not hold the maximum c value within its a group.
My resulting table must look like:
| a   | b | c    |
|-----|---|------|
| 100 | 3 | NULL |
| 100 | 4 | 60   |
| 101 | 3 | 70   |
| 102 | 3 | NULL |
| 102 | 4 | NULL |
| 102 | 5 | 90   |
How can I do this with an SQL Query?
UPDATE: My table has about a billion rows. Please keep that in mind when providing an answer; I cannot wait a couple of hours or a day for the query to execute.
Updated after the requirement was changed to "update the table":
with max_values as (
select a,
b,
max(c) over (partition by a) as max_c
from the_table
)
update the_table
set c = null
from max_values mv
where mv.a = the_table.a
and mv.b = the_table.b
and mv.max_c <> the_table.c;
SQLFiddle: http://sqlfiddle.com/#!15/1e739/1
Another possible solution, which might be faster (but you need to check the execution plan)
update the_table t1
set c = null
where exists (select 1
from the_table t2
where t2.a = t1.a
and t1.c < t2.c);
SQLFiddle: http://sqlfiddle.com/#!15/1e739/2
But with "billion" rows there is no way this is going to be really fast.
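If a single statement is too large a transaction to run in one go, the update can also be done in batches. A minimal sketch in SQL Server syntax (the batch size of 50000 is arbitrary); it relies on the fact that rows already set to NULL no longer match the predicate, so the loop converges:
WHILE 1 = 1
BEGIN
    UPDATE TOP (50000) t1
    SET c = NULL
    FROM the_table AS t1
    WHERE t1.c IS NOT NULL
      AND EXISTS (SELECT 1
                  FROM the_table AS t2
                  WHERE t2.a = t1.a
                    AND t2.c > t1.c);

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to null out
END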
DECLARE @TAB TABLE (A INT, B INT, C INT)
INSERT INTO @TAB VALUES
(100,3,50),
(100,4,60),
(101,3,70),
(102,3,70),
(102,4,80),
(102,5,90)
UPDATE X
SET C = NULL
FROM @TAB X
LEFT JOIN (
SELECT A, MAX(C) C
FROM @TAB
GROUP BY A) LU ON X.A = LU.A AND X.C = LU.C
WHERE LU.A IS NULL
SELECT * FROM @TAB
Result: the table now matches the desired output shown in the question. This approach will help you.
How about this formulation?
select a, b,
(case when c = max(c) over (partition by a) then c end) as c
from the_table t;
I'm not sure if you can get this faster. An index on a, c might help.
SELECT a, b,
CASE ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) WHEN 1 THEN c END AS c
FROM mytable

Trying to select multiple columns where one is unique

I am trying to select several columns from a table where one of the columns is unique. The select statement looks something like this:
select a, distinct b, c, d
from mytable
The table looks something like this:
| a | b | c | d | e |...
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5
| 1 | 2 | 3 | 4 | 6
| 2 | 5 | 7 | 1 | 9
| 7 | 3 | 8 | 6 | 4
| 7 | 3 | 8 | 6 | 7
So the query should return something like this:
| a | b | c | d |
|---|---|---|---|
| 1 | 2 | 3 | 4
| 2 | 5 | 7 | 1
| 7 | 3 | 8 | 6
I just want to remove all of the rows where b is duplicated.
EDIT: There seems to be some confusion about which row I want to be selected in the case of duplicate b values. I don't care because the a, c, and d should (but are not guaranteed to) be the same.
Try this
SELECT *
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) AS NO, *
      FROM TableName) AS T1
WHERE NO = 1
I think you are nearly there with DISTINCT try:
SELECT DISTINCT a, b, c, d
FROM myTable
You haven't said how to pick a row for each b value, but this will pick one for each.
Select
a,
b,
c,
d,
e
From (
Select
a,
b,
c,
d,
e,
row_number() over (partition by b order by b) rn
From
mytable
) x
Where
x.rn = 1
If you don't care what values you get for B, C, D, and E, as long as they're appropriate for that key, you can group by A:
SELECT A, MIN(B), MIN(C), MIN(D), MIN(E)
FROM MyTable
GROUP BY A
Note that MAX() would be just as valid. Some RDBMSs support a FIRST() aggregate, or similar, for exactly these circumstances where you don't care which value you get (from a certain population).
This will return what you're looking for but I think your example is flawed because you've no determinism over which value from the e column is returned.
Create Table A1 (a int, b int, c int, d int, e int)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,5)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,6)
INSERT INTO A1 (a,b,c,d,e) VALUES (2,5,7,1,9)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,4)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,7)
SELECT * FROM A1
SELECT a,b,c,d
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) RowNum ,*
FROM A1
) As InnerQuery WHERE RowNum = 1
You cannot put DISTINCT on a single column. You should put it right after the SELECT:
SELECT DISTINCT a, b, c, d
FROM mytable
It returns the result you need for your sample table. However, if you need to remove duplicates from only a single column (which is not possible), you have probably misunderstood something. Give us more description and a sample, and we will try to guide you in the right direction.
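Since the edit says a, c and d should be the same for any given b, grouping on b itself is another option. A minimal sketch (MIN is arbitrary here; MAX would do just as well):
SELECT MIN(a) AS a, b, MIN(c) AS c, MIN(d) AS d
FROM mytable
GROUP BY b;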

How to create a view that combines multiple rows from 2 tables?

I want to create a view that combines data from two tables; sample data for each table is shown below.
SELECT Command for TableA
SELECT [ID], [Date], [SUM]
FROM TableA
Result
ID | Date | SUM
1 | 1/1/2010 | 2
1 | 1/2/2010 | 4
3 | 1/3/2010 | 6
SELECT Command for TableB
SELECT [ID], [Date], [SUM]
FROM TableB
Result
ID | Date | SUM
1 | 1/1/2010 | 5
1 | 2/1/2010 | 3
1 | 31/1/2010 | 2
2 | 1/2/2010 | 20
I want output like below
ID | Date | SUMA | SUMB
1 | 1/1/2010 | 2 | 10
1 | 1/2/2010 | 4 | 0
2 | 1/2/2010 | 0 | 20
3 | 1/3/2010 | 6 | 0
How can I do that on SQL Server 2005?
The date information varies as the tables are modified.
Try this...
SELECT
ISNULL(TableA.ID, TableB.ID) ID,
ISNULL(TableA.Date, TableB.Date),
ISNULL(TableA.Sum,0) SUMA,
ISNULL(TableB.Sum, 0) SUMB
FROM
TableA FULL OUTER JOIN TableB
ON TableA.ID = TableB.ID AND TableA.Date = TableB.Date
ORDER BY
ID
A full outer join is what you need because you want to include results from both tables regardless of whether there is a match or not.
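Since the goal is a view, that query can be wrapped up directly. A minimal sketch (the view name vw_CombinedSums is hypothetical, and the ORDER BY is dropped because it is not allowed in a view definition without TOP):
CREATE VIEW vw_CombinedSums
AS
SELECT
    ISNULL(TableA.ID, TableB.ID)         AS ID,
    ISNULL(TableA.[Date], TableB.[Date]) AS [Date],
    ISNULL(TableA.[SUM], 0)              AS SUMA,
    ISNULL(TableB.[SUM], 0)              AS SUMB
FROM TableA
FULL OUTER JOIN TableB
    ON TableA.ID = TableB.ID
   AND TableA.[Date] = TableB.[Date];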
I usually union the two queries together and then group them like so:
SELECT ID, [Date], SUM(SUMA) AS SUMA, SUM(SUMB) AS SUMB
FROM (
SELECT ID, [Date], [SUM] AS SUMA, 0 AS SUMB
FROM TableA
UNION ALL
SELECT ID, [Date], 0 AS SUMA, [SUM] AS SUMB
FROM TableB
) AS t
GROUP BY ID, [Date]
SELECT
ISNULL(a.ID, b.ID) AS ID,
ISNULL(a.Date, b.Date) AS Date,
ISNULL(a.SUM, 0) AS SUMA,
ISNULL(b.SUM, 0) AS SUMB
FROM
TableA AS a
FULL JOIN
TableB AS b
ON a.ID = b.ID
AND a.Date = b.Date;
It's not obvious how you want to combine the two tables. I think this is what you're after, but can you confirm please?
TableA.Date is the most important field; if a given date occurs in TableA then it will be included in the view, but not if it only occurs in TableB.
If a date has records in TableA and TableB and the records have a matching ID, they are combined into one row in the view, with SUMA taken from TableA.Sum and SUMB being TableA.Sum * TableB.Sum (e.g. Date: 01/01/2010, ID: 1; Date: 01/03/2010, ID: 3).
If a date has records in TableA and TableB with different IDs, the view includes these records separately, without multiplying the Sum values at all (e.g. Date: 02/01/2010, ID: 1 and ID: 2).