Get aggregation values for all possible distinct entries in group by - sql

Ok, I know the title is confusing, but the idea is pretty simple. I just need to figure out how many flights were flown at five different sites during a given time period. Sometimes a site won't have any flights during the period and this is where I'm having the problem. If I use:
select count(*)
from Flight
where date between '9/9/2013' and '9/15/2013'
group by Site
order by Site
I will only get the sites that have actually flown, but I would like to have those sites where there were no flights during during that period (but have flown at other times and have records in the table) still return a value of 0.

Use condition summation. That is, move the where clause to a case statement:
select sum(case when date between '9/9/2013' and '9/15/2013' then 1 else 0 end)
from Flight
group by Site
order by Site;

Related

How to write an SQL query to get max number of counts for the most number of travelling of a user within a month

I have been given a task by my manager to write a SQL query to select the max number of counts (no of records) for a user who has travelled the most within a month provided that if the user travels multiple places on the same date, then it should be counted as one. For instance, if you look at the following table design; according to this scenario, my query must return me a count of 2. Although traveller_id "1" has traveled three times within a month, but he traveled to Thailand and USA on the same date, that is why its count is reduced to 2.
I have also developed my logic for this query but I am unable to write it due to lack of syntax knowledge. I split up this query into 3 parts:
Select All records from the table within a month using the MONTH function of SQL
Select All distinct DateTime records from the above result so that the same DateTime gets eliminated.
Select max number of counts for the traveller who visited most places.
Please help me in completing my query. You can also use a different approach from mine.
You can use the count aggregation in a cte then select top(1):
with u as
(select traveller_id,
count(distinct visit_date) as n
from travellers_log
where visit_date between '2022-03-01' and '2022-03-31'
group by traveller_id)
select top(1) traveller_id, name, n from u inner join table_travellers
on u.traveller_id = table_travellers.id
order by n desc;

TSQL query to find latest (current) record from period column when there are past present and future records

edited as requested:
My apologies. I've been dealing with this a bit and it's well and truly in my head, but not for the reader.
We have multiple records in table A which have multiple entries in the Period column. Say it's like a football schedule. Teams will have multiple dates/times in the Period column.
When we run query:
We want records selected for the most recent games only.
We don't want the earlier games.
We don't want the games "scheduled" and not yet played.
"Last game played" i.e. Period for teams are often on different days.
Table like:
Team Period
Reds 2021020508:00
Reds 2021011107:00
City 2021030507:00
Reds 2021032607:00
City 2021041607:00
Reds 2021050707:00
When I run query, I want to see the records for last game played regardless of date. So if I run the query on 27 Mar 2021, I want:
City 2021030507:00
Reds 2021032607:00
Keep in mind I used the above as an easily understandable example. In my case I have 1000s of "Teams" each of which may have 100+ different date entries in the Period column and I would like the solution to be applicable regardless of number of records, dates, or when the query is run.
What can I do?
Thanks!
So this gives you your desired output using the sample data, does it fulfil your requirement?
create table x (Team varchar(10), period varchar(20))
insert into x values
('Reds','2021020508:00'),
('Reds','2021011107:00'),
('City','2021030507:00'),
('Reds','2021032607:00'),
('City','2021041607:00'),
('Reds','2021050707:00')
select Team, Max(period) LastPeriod
from x
where period <=Format(GetDate(), 'yyyyMMddhh:mm')
group by Team
The string-formatted date you have order by text, so I think this would work
SELECT TOP 2 *
FROM tableA
WHERE period = FORMAT( GETDATE(), 'yyyyMMddhh:mm' )
ORDER BY period
Perhaps you want:
where period = (select max(t2.period) from t t2)
This returns all rows with the last period in the table.

Is there a simple line (or two) of code that will pull records before a minimum date in another table?

I want to pull Emergency room visits before a members first treatment date. Everyone as a different first treatment date and none occur before Jan 01 2012.
So if a member has a first treatment date of Feb 24 2013, I want to know how many times they visited the ER one year prior to that date.
These min dates are located in another table and I can not use the Min date in my DATEADD function. Thoughts?
One possible solution is to use a CTE to capture the visits between the dates your interested in and then join to that with your select.
Here is an example:
Rextester
Edit:
I just completely updated my answer. Sorry for the confusion.
So you have at least two tables:
Emergency room visits
Treatment information
Let's call these two tables [ERVisits] and [Treatments].
I suppose both tables have some id-field for the patient/member. Let's call it [MemberId].
How about this conceptual query:
WITH [FirstTreatments] AS
(
SELECT [MemberId], MIN([TreatmentDate]) AS [FirstTreatmentDate]
FROM [Treatments]
GROUP BY [MemberId]
)
SELECT V.[MemberId], T.[FirstTreatmentDate], COUNT(*) AS [ERVisitCount]
FROM [ERVisits] AS V INNER JOIN [FirstTreatments] AS T ON T.[MemberId] = V.[MemberId]
WHERE DATEDIFF(DAY, V.[VisitDate], T.[FirstTreatmentDate]) BETWEEN 0 AND 365
GROUP BY V.[MemberId], T.[FirstTreatmentDate]
This query should show the number of times a patient/member has visited the ER in the year before his/her first treatment date.
Here is a tester: https://rextester.com/UXIE4263

Is there a way to handle immutability that's robust and scalable?

Since bigquery is append-only, I was thinking about stamping each record I upload to it with an 'effective date' similar to how peoplesoft works, if anybody is familiar with that pattern.
Then, I could issue a select statement and join on the max effective date
select UTC_USEC_TO_MONTH(timestamp) as month, sum(amt)/100 as sales
from foo.orders as all
join (select id, max(effdt) as max_effdt from foo.orders group by id) as latest
on all.effdt = latest.max_effdt and all.id = latest.id
group by month
order by month;
Unfortunately, I believe this won't scale because of the big query 'small joins' restriction, so I wanted to see if anyone else had thought around this use case.
Yes, adding a timestamp for each record (or in some cases, a flag that captures the state of a particular record) is the right approach. The small side of a BigQuery "Small Join" can actually return at least 8MB (this value is compressed on our end, so is usually 2 to 10 times larger), so for "lookup" table type subqueries, this can actually provide a lot of records.
In your case, it's not clear to me what the exact query you are trying to run is.. it looks like you are trying to return the most recent sales times of every individual item - and then JOIN this information with the SUM of sales amt per month of each item? Can you provide more info about the query?
It might be possible to do this all in one query. For example, in our wikipedia dataset, an example might look something like...
SELECT contributor_username, UTC_USEC_TO_MONTH(timestamp * 1000000) as month,
SUM(num_characters) as total_characters_used FROM
[publicdata:samples.wikipedia] WHERE (contributor_username != '' or
contributor_username IS NOT NULL) AND timestamp > 1133395200
AND timestamp < 1157068800 GROUP BY contributor_username, month
ORDER BY contributor_username DESC, month DESC;
...to provide wikipedia contributions per user per month (like sales per month per item). This result is actually really large, so you would have to limit by date range.
UPDATE (based on comments below) a similar query that finds "num_characters" for the latest wikipedia revisions by contributors after a particular time...
SELECT current.contributor_username, current.num_characters
FROM
(SELECT contributor_username, num_characters, timestamp as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL)
AS current
JOIN
(SELECT contributor_username, MAX(timestamp) as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL AND timestamp > 1265073722 GROUP BY contributor_username) AS latest
ON
current.contributor_username = latest.contributor_username
AND
current.time = latest.time;
If your query requires you to use first build a large aggregate (for example, you need to run essentially an accurate COUNT DISTINCT) another option is to break this query up into two queries. The first query could provide the max effective date by month along with a count and save this result as a new table. Then, could run a sum query on the resulting table.
You could also store monthly sales records in separate tables, and only query the particular table for the months you are interested in, simplifying your monthly sales summaries (this could also be a more economical use of BigQuery). When you need to find aggregates across all tables, you could run your queries with multiple tables listed after the FROM clause.

SQL multiple join with count, having trouble

First of all this is a homework assignment so I'm looking for assistance, not solutions. Let me try to explain my schema. I have three tables we'll call users (with columns id and name), parties (with columns id, partydate, and user_id) and questions (with columns id, createdate, and user_id). My requirement is to show for every user the number of parties within the last year and questions created within the last year. At first I had something like this:
SELECT users.id, users.name,
COUNT(parties.id) AS numparties, COUNT(qustions.id) AS numquestions
FROM users
FULL JOIN parties ON users.id=parties.user_id
FULL JOIN questions ON users.id=questions.user_id
WHERE (parties.partydate > NOW() - interval '1 year' OR parties.partydate IS NULL)
OR (questions.createdate > NOW() - interval '1 year' OR questions.createdate IS NULL)
GROUP BY users.id, users.name
Now this works, almost! The problem is, if a user has no parties nor questions within the past year, they don't show up at all in the result. I want such a user to show up, I just want it to show them with 0 for each numparties and numquestions.
What I think I need here is some sort of conditional counting, where I only want to COUNT(parties.id) WHERE that party's partydate is within the past year, and the same for questions. I'm just unsure how to do that. I have a hacky-workaround way to do what I want, where I basically UNION the above query with a near identical copy of itself, except I use SUM(0) for numparties and numquestions and my WHERE statement is just where the date is <= instead of >. I feel this is not the best way to go about it.
Any pointers in the right direction? Thanks for the help!
Take a look at this: http://www.w3schools.com/sql/sql_join_left.asp. I think it might point you in the right direction.
Think I've got it with this:
SUM(CASE WHEN (parties.partydate > NOW() - interval '1 year') THEN 1 ELSE 0 END) as numparties
and just removed the WHERE clauses.
I think I'd resort to a subquery for this. Homework questions are fun to answer, I can heavily psuedo code this. Take the entire query you have and call it 'x'.
First thing you'll want is a list of all users regardless of how many questions they've asked.
Select distinct users.id,users.name from users
that will give you a full list of your users. The query you have above gives you there calls...so left join the two together.
Select (fields you want)
from users
left join (enter you query above here) x on x.id = users.id
Hopefully the logic here makes sense for you. Use one query to get the list of users and join that to the subquery to get their counts.
edit to add: this will bring back nulls anytime there are no records. You can make your select statement show nulls as 0's