How to fill in empty date rows multiple times? - sql

I am trying to fill in empty dates so that my query returns every date and does not skip any.
My application counts bookings for activities by date in a report, and I cannot have skipped dates in what my SQL returns.
I am trying to use a date table (I have a table with every date from 1/1/2000 to 12/31/2030) to accomplish this by doing a RIGHT OUTER JOIN against it, which works when dealing with one set of activities. But I have multiple sets of activities, each needing its own full range of dates regardless of whether there were bookings on that date.
I also have a function (DateRange) I found that allows for this:
SELECT IndividualDate FROM DateRange('d', '11/01/2017', '11/10/2018')
Let me give an example of what I am getting and what I want to get:
BAD: Without empty date rows:
date | activity_id | bookings
-----------------------------
1/2 | 1 | 5
1/4 | 1 | 4
1/3 | 2 | 6
1/4 | 2 | 2
GOOD: With empty date rows:
date | activity_id | bookings
-----------------------------
1/2 | 1 | 5
1/3 | 1 | NULL
1/4 | 1 | 4
1/2 | 2 | NULL
1/3 | 2 | 6
1/4 | 2 | 2
I hope this makes sense. I get the whole point of joining to a table that is just a list of dates OR using the DateRange table function. But neither gets me the "GOOD" result above.

Use a cross join to generate the rows and then left join to fill in the values:
select d.date, a.activity_id, t.bookings
from DateRange('d', '2017-11-01', '2018-11-10') d cross join
     (select distinct activity_id from t) a left join
     t
     on t.date = d.date and t.activity_id = a.activity_id;
It is a bit hard to follow what your data is and what comes from the function. But the idea is the same, wherever the data comes from.
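To make the idea concrete, here is a minimal runnable sketch of the same cross join + left join pattern, using SQLite with a recursive CTE standing in for the DateRange() function (the table name bookings_by_day and its sample rows are illustrative, not from the original schema):

```python
# Cross join a date list with the distinct activities, then left join the
# bookings so missing days show up with NULL. A recursive CTE stands in
# for DateRange(); bookings_by_day is an invented example table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings_by_day (date TEXT, activity_id INTEGER, bookings INTEGER);
INSERT INTO bookings_by_day VALUES
  ('2018-01-02', 1, 5), ('2018-01-04', 1, 4),
  ('2018-01-03', 2, 6), ('2018-01-04', 2, 2);
""")

rows = conn.execute("""
WITH RECURSIVE dates(d) AS (            -- stand-in for DateRange('d', ...)
  SELECT '2018-01-02'
  UNION ALL
  SELECT date(d, '+1 day') FROM dates WHERE d < '2018-01-04'
)
SELECT d.d, a.activity_id, t.bookings
FROM dates d
CROSS JOIN (SELECT DISTINCT activity_id FROM bookings_by_day) a
LEFT JOIN bookings_by_day t
       ON t.date = d.d AND t.activity_id = a.activity_id
ORDER BY a.activity_id, d.d
""").fetchall()

for r in rows:
    print(r)  # every date appears once per activity; missing days have NULL bookings
```

This reproduces the "GOOD" table above: three dates times two activities gives six rows, with NULL where there were no bookings.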

I figured it out:
SELECT TOP 100 PERCENT masterlist.dt, masterlist.activity_id,
       count(r_activity_sales_bymonth.bookings) AS totalbookings
FROM (SELECT c.activity_id, dateadd(d, b.incr, '2016-12-31') AS dt
      FROM (SELECT TOP 365 incr = row_number() OVER (ORDER BY object_id, column_id), *
            FROM (SELECT a.object_id, a.column_id
                  FROM sys.all_columns a CROSS JOIN
                       sys.all_columns b) AS a) AS b CROSS JOIN
           (SELECT DISTINCT activity_id
            FROM r_activity_sales_bymonth) AS c) AS masterlist LEFT OUTER JOIN
     r_activity_sales_bymonth
     ON masterlist.dt = r_activity_sales_bymonth.purchase_date
        AND masterlist.activity_id = r_activity_sales_bymonth.activity_id
GROUP BY masterlist.dt, masterlist.activity_id
ORDER BY masterlist.dt, masterlist.activity_id
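The inner query above builds a date list by numbering the rows of a large cross join (sys.all_columns with itself) and adding each number as a day offset to an anchor date. A portable sketch of that numbers-table trick, with a small 20-row seed table standing in for the system catalog (SQLite here; the idea is engine-independent):

```python
# Numbers-table date generation: number the rows of a cross join, then add
# each number as a day offset to an anchor date. The seed table is an
# invented stand-in for sys.all_columns.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed (n INTEGER)")
conn.executemany("INSERT INTO seed VALUES (?)", [(i,) for i in range(20)])

dates = [r[0] for r in conn.execute("""
SELECT date('2016-12-31', '+' || incr || ' day') AS dt
FROM (SELECT row_number() OVER (ORDER BY a.n, b.n) AS incr
      FROM seed a CROSS JOIN seed b)   -- 400 rows, plenty for one year
WHERE incr <= 365                      -- same effect as TOP 365
ORDER BY incr
""")]

print(dates[0], dates[-1])
```

The result is every date of 2017, one row per day, ready to cross join with the distinct activity_ids.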

Related

Joining a large number of tables so that all dates are kept

I have around 50-70 tables that look very similar, say:
Table 1:
id | date | count_A | count_B
1 | 12.05.2021 | 12 | 15
Table 2:
id | date | count_A | count_B
1 | 15.05.2021 | 8 | 24
The main table looks like the following:
id | label
1 | X
In the end, what I would like to get is:
id | date | count_A | count_B | label
1 | 12.05.2021 | 12 | 15 | X
1 | 15.05.2021 | 8 | 24 | X
One intuitive approach is to use the full outer join and join on id but that would result in strange rows with several date values.
Joining on (id, date) doesn't seem to be a great option either.
What can be a possible solution here? Thanks!
You can use a CTE (the WITH statement). Inside it, UNION ALL all the tables that share the same schema.
Then join the CTE (in this case tablaC) to the main table, which has a different schema.
You can see this example:
WITH tablaC AS (
  SELECT ID, date, count_C, Count_D FROM Table_C
  UNION ALL
  SELECT ID, date, count_C, Count_D FROM Table_D
)
select c.ID, c.date, c.count_C, c.Count_D, m.label
from tablaC as c
join table_main as m on c.id = m.id
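A runnable sketch of this WITH + UNION ALL pattern, using SQLite and the column names from the example above (the real 50-70 tables would just add more UNION ALL branches):

```python
# Stack the similar tables with UNION ALL inside a CTE, then join the
# result to the main table for the label. Table contents follow the
# question's example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_c (id INTEGER, date TEXT, count_c INTEGER, count_d INTEGER);
CREATE TABLE table_d (id INTEGER, date TEXT, count_c INTEGER, count_d INTEGER);
CREATE TABLE table_main (id INTEGER, label TEXT);
INSERT INTO table_c VALUES (1, '2021-05-12', 12, 15);
INSERT INTO table_d VALUES (1, '2021-05-15', 8, 24);
INSERT INTO table_main VALUES (1, 'X');
""")

rows = conn.execute("""
WITH tablaC AS (                 -- one UNION ALL branch per similar table
  SELECT id, date, count_c, count_d FROM table_c
  UNION ALL
  SELECT id, date, count_c, count_d FROM table_d
)
SELECT c.id, c.date, c.count_c, c.count_d, m.label
FROM tablaC AS c
JOIN table_main AS m ON c.id = m.id
ORDER BY c.date
""").fetchall()

for r in rows:
    print(r)
```

Each stacked row keeps its own date, so no "strange rows with several date values" appear, and the label joins in once per row.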

How to join tables on a partly-overlapping key column while retaining all data and not creating duplicate columns?

I have three tables that all have a “date” column and another column with counts of different variables - let’s call the tables T1, T2, and T3 and each of their columns are counts of dogs, cats, and birds spotted that day.
Not every table has the same set of dates. Example:
T1: Dogs spotted by day
date | dogs
------------------
2020-08-26 | 1
2020-08-27 | 4
T2: Cats spotted by day
date | cats
---------------------
2020-08-25 | 2
2020-08-26 | 5
T3: Birds spotted by day
date | birds
---------------------
2020-08-26 | 8
2020-08-27 | 3
2020-08-28 | 5
I’m trying to join them together on date while keeping all column data, but I’m having trouble doing so without getting a table that has 3 date columns. There’s no table that has all of the dates, so if I just select one of the date columns (e.g. select t1.date, t1.dogs, t2.cats, t3.birds) then I lose some of the date data. What I’m seeking is a table like this:
Desired Output: All Animals Spotted by Day
date | dogs | cats | birds |
----------------------------------------------------------
2020-08-25 | 0 (or null) | 2 | 0 (or null) |
2020-08-26 | 1 | 5 | 8 |
2020-08-27 | 4 | 0 (or null) | 3 |
2020-08-28 | 0 (or null) | 0 (or null) | 5 |
I’ve read about every stack overflow post on this I could find but maybe I’m not putting in the correct keywords because I’m not finding this. I’m working specifically in Postgres. Thank you!!
Use generate_series to construct a table of dates and use outer joins with the other tables:
SELECT d.d::date,
       t1.dogs,
       t2.cats,
       t3.birds
FROM generate_series('2020-08-25'::timestamp, '2020-08-28'::timestamp, '1 day'::interval) AS d(d)
LEFT JOIN t1 ON t1.date = d.d::date
LEFT JOIN t2 ON t2.date = d.d::date
LEFT JOIN t3 ON t3.date = d.d::date;
Setting aside the question of why the design is this way and whether you could change it,
using UNION and aggregation could be another option:
select date
, max(dogs) as dogs
, max(cats) as cats
, max(birds) as birds
from
(
select date,dogs,0 cats,0 birds from t1
union all
select date,0,cats,0 from t2
union all
select date,0,0,birds from t3
) t
group by date
order by date;
Note: I don't know whether multiple entries are possible for a single date; if they are, you need to use SUM instead of MAX.
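For illustration, here is a runnable version of this UNION ALL + aggregation approach in SQLite (the same SQL works in Postgres), using the sample data from the question:

```python
# UNION ALL + aggregation: pad each branch with zeros for the columns it
# lacks, then take MAX per date. Sample rows match the question's tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (date TEXT, dogs INTEGER);
CREATE TABLE t2 (date TEXT, cats INTEGER);
CREATE TABLE t3 (date TEXT, birds INTEGER);
INSERT INTO t1 VALUES ('2020-08-26', 1), ('2020-08-27', 4);
INSERT INTO t2 VALUES ('2020-08-25', 2), ('2020-08-26', 5);
INSERT INTO t3 VALUES ('2020-08-26', 8), ('2020-08-27', 3), ('2020-08-28', 5);
""")

rows = conn.execute("""
SELECT date, max(dogs) AS dogs, max(cats) AS cats, max(birds) AS birds
FROM (
  SELECT date, dogs, 0 AS cats, 0 AS birds FROM t1
  UNION ALL
  SELECT date, 0, cats, 0 FROM t2
  UNION ALL
  SELECT date, 0, 0, birds FROM t3
) u
GROUP BY date
ORDER BY date
""").fetchall()

for r in rows:
    print(r)
```

This yields exactly the desired output table, with 0 (rather than NULL) for days an animal was not counted.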
You can also use full join. For your example select * does what you want:
select *
from cats c full join
dogs d
using (date) full join
birds b
using (date);
I might recommend, however, that you put all the counts into a single table, with an additional column specifying "cat", "dog" and so on. If you had that, then simple aggregation would work:
select date,
count(*) filter (where type = 'cat'),
count(*) filter (where type = 'dog'),
count(*) filter (where type = 'bird')
from t
group by date;
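To illustrate the single-table design, here is a sketch in SQLite using sum(CASE ...), the portable equivalent of Postgres's count(*) FILTER (the sightings table and its rows are invented for the example):

```python
# Single-table design: one row per sighting with a type column, then
# conditional aggregation per date. sum(CASE ...) is the portable
# stand-in for Postgres's count(*) FILTER. Rows are invented sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (date TEXT, type TEXT)")
conn.executemany("INSERT INTO sightings VALUES (?, ?)", [
    ('2020-08-26', 'dog'), ('2020-08-26', 'cat'),
    ('2020-08-26', 'cat'), ('2020-08-26', 'bird'),
    ('2020-08-27', 'dog'),
])

rows = conn.execute("""
SELECT date,
       sum(CASE WHEN type = 'dog'  THEN 1 ELSE 0 END) AS dogs,
       sum(CASE WHEN type = 'cat'  THEN 1 ELSE 0 END) AS cats,
       sum(CASE WHEN type = 'bird' THEN 1 ELSE 0 END) AS birds
FROM sightings
GROUP BY date
ORDER BY date
""").fetchall()
```

With one row per observation, a new animal type means a new CASE branch rather than a new table and another join.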

Sql inner join only with last row in second table

I have two tables: leads and tracking_leads.
Table structure is as below,
---------------------------- ----------------------
| leads | | tracking_leads |
---------------------------- ----------------------
| id | | tracking_id |
| lead_id | | lead_id |
| anzahl_tickets | | field_name |
| bearbeitungs_id_einkauf | | date |
---------------------------- -----------------------
I need SQL that joins the leads table with the tracking_leads table but takes only the LAST matching row from tracking_leads.
Sql example:
SELECT DATE_FORMAT(tracking_leads.date, "%d.%m.%Y") as trackDate, SUM(l.anzahl_tickets)
as sumValue FROM leads as l INNER JOIN tracking_leads ON l.lead_id=tracking_leads.lead_id
WHERE bearbeitungs_id_einkauf <> '' AND tracking_leads.field_name='bearbeitungs_id_einkauf'
GROUP BY DATE_FORMAT(tracking_leads.date, "%d.%m.%Y")
In this part: INNER JOIN tracking_leads ON l.lead_id=tracking_leads.lead_id, I need only the last record from the tracking_leads table.
For example, leads data:
id lead_id anzahl_tickets bearbeitungs_id_einkauf
1 20 2 100
tracking_leads data:
tracking_id lead_id field_name date
1 20 bearbeitungs_id_einkauf 2019-05-31 13:55
2 20 bearbeitungs_id_einkauf 2019-05-31 15:00
The result I need is:
2019-05-31 2
But currently I get:
2019-05-31 4
because the lead_id appears in duplicate rows (I need only the last record).
How can I solve this problem?
Thanks!
My preference would be to use an inline view to get the max dates.
A correlated subquery would be executed once for each row, while the inline view would only need to be executed once.
This should work:
SELECT DATE_FORMAT(tl.date, "%d.%m.%Y") as trackDate,
SUM(l.anzahl_tickets) as sumValue
FROM leads as l
INNER JOIN (
    select x.lead_id, max(x.date) date
    from tracking_leads x
    where x.field_name = 'bearbeitungs_id_einkauf'
    group by x.lead_id
) tl ON l.lead_id=tl.lead_id
WHERE bearbeitungs_id_einkauf <> ''
GROUP BY DATE_FORMAT(tl.date, "%d.%m.%Y")
Side note: the test for an empty bearbeitungs_id_einkauf in the WHERE clause is database-specific, so watch out for issues there. In Oracle, for example, there is no such thing as an empty string, so you would have to test for NOT NULL instead. I'm assuming this is not Oracle.
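A runnable sketch of this inline-view approach, using SQLite and the sample rows from the question (MySQL's DATE_FORMAT is swapped for SQLite's strftime, which formats the date the same way here):

```python
# Inline view: collapse tracking_leads to one (lead_id, max date) row per
# lead, then join. Only the latest tracking row contributes, so the sum
# counts each lead once. strftime replaces MySQL's DATE_FORMAT.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (id INTEGER, lead_id INTEGER, anzahl_tickets INTEGER,
                    bearbeitungs_id_einkauf TEXT);
CREATE TABLE tracking_leads (tracking_id INTEGER, lead_id INTEGER,
                             field_name TEXT, date TEXT);
INSERT INTO leads VALUES (1, 20, 2, '100');
INSERT INTO tracking_leads VALUES
  (1, 20, 'bearbeitungs_id_einkauf', '2019-05-31 13:55'),
  (2, 20, 'bearbeitungs_id_einkauf', '2019-05-31 15:00');
""")

rows = conn.execute("""
SELECT strftime('%d.%m.%Y', tl.date) AS trackDate,
       SUM(l.anzahl_tickets) AS sumValue
FROM leads l
INNER JOIN (SELECT lead_id, max(date) AS date    -- one row per lead_id
            FROM tracking_leads
            WHERE field_name = 'bearbeitungs_id_einkauf'
            GROUP BY lead_id) tl ON l.lead_id = tl.lead_id
WHERE l.bearbeitungs_id_einkauf <> ''
GROUP BY trackDate
""").fetchall()
```

Because the inline view has exactly one row per lead_id, the duplicate that produced 4 disappears and the sum is 2, as wanted.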
First, I don't like the date format DD.MM.YYYY, because you cannot sort by it. Just use YYYY-MM-DD.
Second, you can use a correlated subquery to get the most recent date:
SELECT DATE(tl.date) as trackDate, SUM(l.anzahl_tickets) as sumValue
FROM leads l INNER JOIN
tracking_leads tl
ON l.lead_id = tl.lead_id
WHERE l.bearbeitungs_id_einkauf <> '' AND
tl.field_name = 'bearbeitungs_id_einkauf' AND
tl.date = (SELECT MAX(tl2.date)
FROM tracking_leads tl2
WHERE tl2.lead_id = tl.lead_id AND
tl2.field_name = tl.field_name
)
GROUP BY DATE(tl.date);
Of course, you can leave your original date format if you prefer. If you do, you can use:
ORDER BY MIN(tl.date)
so the results are ordered by date.

Need to run a query that returns data for every hour even if it doesn't exist

I tried searching for something similar but couldn't quite pinpoint the problem.
Currently working in Oracle 11g
I have a table, let's call it TEST_DATA, with columns "SITE", "DAY", "TIME" AND "TEMPERATURE", that holds average temperature data for a specific site at a specific day and time, and it will not contain data for every hour of every day. At the moment, let's say I only have this data:
TEST DATA:
What I need is to build a view or a query that returns a row for every day and every hour, even if it's not in the TEST_DATA table (returning 0 in that case). The result would look similar to:
SITE | DAY | TIME | TEMPERATURE
Home | 1 | 0 | 0
Home | 1 | 1 | 0
Home | 1 | 2 | 0
Home | 1 | 3 | 15
Home | 1 | 4 | 0
Home | 1 | 5 | 23
Home | 1 | 6 | 0
Home | 1 | 7 | 0
Home | 1 | 8 | 0
Home | 1 | 9 | 0
Home | 1 | 10 | 0
... | ... | ... | ...
Does anyone have any idea how I could go about doing this using SQL?
I tried having a table with 24*31 entries and day/time columns, making a row for each day/time pair, but haven't figured out a way to get this working as I want.
I want to try and avoid to make this last table with that many rows for each of the possible sites, since I'm working with over 2000 different sites, which would make the table over a million rows long.
I appreciate any help on the matter.
EDIT:
Thanks to Gordon Linoff's answer, I was able to adapt it to my table and got the results exactly as I wanted them. The query ended up being:
SELECT s.SITE, d.DAY, d.TIME, td.TEMPERATURE
FROM (SELECT DISTINCT SITE FROM TEST_DATA) s CROSS JOIN
(SELECT * FROM TEST_DAY_TIME) d LEFT JOIN
TEST_DATA td
ON td.SITE = s.SITE AND td.DAY = d.DAY AND td.TIME = d.TIME
ORDER BY d.DAY, d.TIME;
The other table was something like this, all the way to 31.
TEST_DAY_TIME
Generate all the rows and then use left join to bring in the values. This is the basic idea:
select s.site, d.day, h.time, t.temperature
from (select distinct site from t) s cross join
     (select distinct day from t) d cross join
     (select 0 as time from dual union all select 1 from dual union all . . .
      select 23 from dual) h left join
     t
     on t.site = s.site and t.day = d.day and t.time = h.time;
Of course, you will get duplicates if a day/time combination appears multiple times in the original table.
You need to create some time data, then outer join it to your data so you have the basis for the empty times.
This can be done with a simple use of CONNECT BY, as follows. This uses 1/1/17 as the start date and gives you 1-hour increments for the next 100 days.
select to_date('01-jan-17','dd-mon-yy') + numtodsinterval(level,'HOUR') as date_by_hour
from dual
connect by level <= 24*100;
You then can put your data and this data together via an outer join like the following...
WITH time_data as
(select to_date('01-jan-17','dd-mon-yy') + numtodsinterval(level,'HOUR') as date_by_hour
from dual
connect by level <= 24*100),
test_data(site, day, time, temp) as
(select 'HOME', 1, 3, 50 from dual union all
select 'HOME', 1, 4, 51 from dual union all
select 'HOME', 1, 7, 55 from dual union all
select 'HOME', 1, 12, 60 from dual)
select NVL(test_data.site,'HOME') as site, time_data.date_by_hour, NVL(test_data.temp,0) as temp
from time_data
left outer join test_data on (time_data.date_by_hour =
to_date('01-jan-17','dd-mon-yy')+numtodsinterval(test_data.day-1,'DAY')+numtodsinterval(test_data.time,'HOUR')
AND test_data.site = 'HOME')
order by time_data.date_by_hour;
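For readers not on Oracle: a portable sketch of the same hourly series using a recursive CTE instead of CONNECT BY (SQLite here, generating 3 days instead of 100 to keep the output small):

```python
# Recursive CTE as a portable replacement for Oracle's CONNECT BY LEVEL:
# emit one row per hour over a fixed range, ready to outer join against
# the sparse temperature data.
import sqlite3

conn = sqlite3.connect(":memory:")
hours = [r[0] for r in conn.execute("""
WITH RECURSIVE time_data(date_by_hour) AS (
  SELECT datetime('2017-01-01 00:00:00')
  UNION ALL
  SELECT datetime(date_by_hour, '+1 hour')
  FROM time_data
  WHERE date_by_hour < datetime('2017-01-03 23:00:00')
)
SELECT date_by_hour FROM time_data
""")]

print(len(hours))
```

Three days of hourly rows is 72; swap the end bound for 100 days to match the Oracle version.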
As another point, I would suggest storing your date/time data as a DATE rather than as a separate day number and time number.

Summing cost by id that appears on multiple rows

SOLUTION
I solved it by simply doing the following.
SELECT table_size, sum(cost) as total_cost, sum(num_players) as num_players
FROM
(
SELECT table_size, cost, count(tp.uid) as num_players
FROM tournament as t
LEFT JOIN takes_part AS tp ON tp.tid = t.tid
LEFT JOIN users as u on u.uid = tp.uid
JOIN attributes as a on a.aid = t.attrId
GROUP BY t.tid
) as res
GROUP BY table_size
I wasn't sure it would work, what with the other aggregate functions that I had to use in my real sql, but it seems to be working ok. There may be problems in the future if I want to do other kind of calculations, for instance do a COUNT(DISTINCT tp.uid) over all tournaments. Still, in this case that is not all that important so I am satisfied for now. Thank you all for your help.
UPDATE!!!
Here is a Fiddle that explains the problem:
http://www.sqlfiddle.com/#!2/e03ff/7
I want to get:
table_size | cost
-------------------------------
5 | 110
8 | 80
OLD POST
I'm sure that there is an easy solution to this that I'm just not seeing, but I can't seem to find a solution to it anywhere. What I'm trying to do is the following:
I need to sum 'costs' per tournament in a system. For other reasons, I've had to join with lots of other tables, making the same cost appear on multiple rows, like so:
id | name | cost | (hidden_id)
-----------------------------
0 | Abc | 100 | 1
1 | ASD | 100 | 1
2 | Das | 100 | 1
3 | Ads | 50 | 2
4 | Ads | 50 | 2
5 | Fsd | 0 | 3
6 | Ads | 0 | 3
7 | Dsa | 0 | 3
The costs in the table above are linked to an id value that is not necessarily selected by the SQL (this depends on what the user decides at runtime). What I want to get is the sum 100+50+0 = 150. Of course, if I just use SUM(cost) I will get a different answer. I tried using SUM(cost)/COUNT(*)*COUNT(tourney_ids), but this only gives a correct result under certain circumstances. A (very) simple form of the query looks like this:
SELECT SUM(cost) as tot_cost -- This will not work as it sums all rows where the sum appears.
FROM t
JOIN ta ON t.attr_id = ta.toaid
JOIN tr ON tr.toid = t.toid -- This row will cause multiple rows with same cost
GROUP BY *selected by user* -- This row enables the user to group by several attributes, such as weekday, hour or ids of different kinds.
UPDATE. A more correct SQL-query, perhaps:
SELECT
*some way to sum cost*
FROM tournament AS t
JOIN attribute AS ta ON t.attr_id = ta.toaid
JOIN registration AS tr ON tr.tourneyId = t.tourneyId
INNER JOIN pokerstuff as ga ON ta.game_attr_id = ga.gameId
LEFT JOIN people AS p ON p.userId = tr.userId
LEFT JOIN parttaking AS jlt ON (jlt.tourneyId = t.tourneyId AND tr.userId = jlt.userId)
LEFT JOIN (
SELECT t.tourneyId,
ta.a - (ta.b) - sum(c)*ta.cost AS cost
FROM tournament as t
JOIN attribute as ta ON (t.attr_id = ta.toaid)
JOIN registration tr ON (tr.tourneyId = t.tourneyId)
GROUP BY t.tourneyId, ta.b, ta.a
) as o on t.tourneyId = o.tourneyId
AND whereConditions
GROUP BY groupBySql
Description of the tables
tournament (tourneyId, name, attributeId)
attributes (attributeId, ..., gameid)
registration (userId, tourneyId, ...)
pokerstuff(gameid,...)
people(userId,...)
parttaking(userId, tourneyId,...)
Let's assume that we have the following (cost is actually calculated in a subquery, but since it's tied to tournament, I will treat it as an attribute here):
tournament:
tourneyId | name | cost
1 | MyTournament | 50
2 | MyTournament | 80
and
userId | tourneyId
1 | 1
2 | 1
3 | 1
4 | 1
1 | 2
4 | 2
The problem is rather simple. I need to be able to get the sum of the costs of the tournaments without counting a tournament more than once. The sum (and all other aggregates) will be dynamically grouped by the user.
A big problem is that many solutions that I've tried (such as SUM OVER...) would require that I group by certain attributes, and that I cannot do. The group by-clause must be completely decided by the user. The sum of the cost should sum over any group-by attributes, the only problem is of course the multiple rows in which the sum appears.
Do anyone of you have any good hints on what can be done?
Try the following:
select *selected by user*, sum(case rownum when 1 then a.cost end)
from
(
select
*selected by user*, cost,
row_number() over (partition by t.tid) as rownum
FROM t
JOIN ta ON t.attr_id = ta.toaid
JOIN tr ON tr.toid = t.toid
) a
group by *selected by user*
The row_number is used to number the rows within each tournament. When summing the costs we only consider rows with a rownum of 1; all other rows are duplicates of that one with regard to cost.
In terms of the fiddle:
select table_size, sum(case rownum when 1 then a.cost end)
from
(
SELECT
table_size, cost,
row_number() over (partition by t.tid) as rownum
FROM tournament as t
LEFT JOIN takes_part AS tp ON tp.tid = t.tid
LEFT JOIN users as u on u.uid = tp.tid
JOIN attributes as a on a.aid = t.attrId
) a
group by table_size
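A runnable sketch of the row_number trick, applied to the simplified id/cost/hidden_id rows from the question (partitioning by hidden_id instead of t.tid, since hidden_id is what marks the duplicated costs in that sample):

```python
# row_number() per duplicate group: cost repeats within each hidden_id
# group, so only the group's first row contributes to the sum. The stuff
# table reproduces the question's sample rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER, name TEXT, cost INTEGER, hidden_id INTEGER)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?, ?)", [
    (0, 'Abc', 100, 1), (1, 'ASD', 100, 1), (2, 'Das', 100, 1),
    (3, 'Ads', 50, 2), (4, 'Ads', 50, 2),
    (5, 'Fsd', 0, 3), (6, 'Ads', 0, 3), (7, 'Dsa', 0, 3),
])

total = conn.execute("""
SELECT sum(CASE rownum WHEN 1 THEN cost END)  -- non-first rows add NULL, ignored by sum
FROM (SELECT cost,
             row_number() OVER (PARTITION BY hidden_id) AS rownum
      FROM stuff)
""").fetchone()[0]
```

The result is 100 + 50 + 0 = 150, counting each group's cost exactly once.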
As the repeated costs are the same each time, you can average them by their hidden_id and do something like this:
WITH MrTable AS (
SELECT DISTINCT hidden_id, AVG(cost) OVER (PARTITION BY hidden_id) AS cost
FROM stuff
)
SELECT SUM(cost) FROM MrTable;
(Updated) Given that the cost returned is currently the total cost per tournament, you could include a fractional value of the cost on each line of an inner select, such that those fractional values add up to the total cost (allowing for the fact that each tournament's values may appear multiple times), then sum that fractional cost in your outer select, like so:
select table_size, sum(frac_cost) as agg_cost from
(SELECT a.table_size , cost / count(*) over (partition by t.tid) as frac_cost
FROM tournament as t
LEFT JOIN takes_part AS tp ON tp.tid = t.tid
LEFT JOIN users as u on u.uid = tp.uid
JOIN attributes as a on a.aid = t.attrId) sq
GROUP BY table_size
SQLFiddle here.
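A runnable sketch of this fractional-cost idea against the simplified id/cost/hidden_id sample from the question (hidden_id plays the role of t.tid; the * 1.0 guards against integer division):

```python
# Fractional cost: divide each row's cost by its group size with a window
# count, so summing over any grouping of rows reproduces the per-group
# totals. The stuff table reproduces the question's sample rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER, name TEXT, cost INTEGER, hidden_id INTEGER)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?, ?)", [
    (0, 'Abc', 100, 1), (1, 'ASD', 100, 1), (2, 'Das', 100, 1),
    (3, 'Ads', 50, 2), (4, 'Ads', 50, 2),
    (5, 'Fsd', 0, 3), (6, 'Ads', 0, 3), (7, 'Dsa', 0, 3),
])

total = conn.execute("""
SELECT sum(frac_cost)
FROM (SELECT cost * 1.0 / count(*) OVER (PARTITION BY hidden_id) AS frac_cost
      FROM stuff)
""").fetchone()[0]
```

Unlike the row_number version, this one stays correct under any user-chosen GROUP BY, because every row carries its fair share of the cost.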