SELECT with MAX and SUM from multiple tables - sql

I have 3 tables:
weather_data (hourly_date, rain)
weather_data_calculated (hourly_date, calc_value)
weather_data_daily (daily_date, daily_value)
I would like to get a list of DAILY values from these 3 tables using this select:
SELECT daily_date, daily_value, SUM(rain), MAX(calc_value)
The SUM and the MAX need to be computed over all the hours of the day.
This is what I did :
SELECT
date_format(convert_tz(daily_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00') as daily_date_gmt,
daily_value,
SUM(rain),
MAX(calc_value)
FROM weather_data_daily wdd, weather_data wd, weather_data_calculated wdc
WHERE daily_date_gmt=date_format(convert_tz(wd.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
and daily_date_gmt=date_format(convert_tz(wdc.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
group by daily_date_gmt
order by daily_date_gmt;
This didn't work because I don't know how to deal with the group by in this case.
I also tried to use a temporary table, but without success.
Thanks for your help!

Either include daily_value in your group by, or use two queries: one will contain the date column and the two aggregates, the other will contain the date column and daily_value. You can then use a single outer query to join these result sets on the date column.
EDIT: You say in your comment that including daily_value in the group by means the query doesn't complete. This is probably because you have no join criteria between all the tables your query includes, which results in a potentially VERY large result set that would take a very long time to build. I don't mind helping with the actual SQL, but you will need to update your question so that we can see which fields are coming from which tables.
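Given the table layout listed at the top of the question, a rough sketch of the two-query approach might look like this (untested; it keeps your convert_tz/date_format expressions and assumes the two hourly tables line up row-for-row on hourly_date):
SELECT agg.daily_date_gmt, wdd.daily_value, agg.total_rain, agg.max_calc
FROM (
    SELECT date_format(convert_tz(wd.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00') AS daily_date_gmt,
           SUM(wd.rain) AS total_rain,
           MAX(wdc.calc_value) AS max_calc
    FROM weather_data wd
    JOIN weather_data_calculated wdc ON wdc.hourly_date = wd.hourly_date
    GROUP BY daily_date_gmt
) agg
JOIN weather_data_daily wdd
  ON date_format(convert_tz(wdd.daily_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00') = agg.daily_date_gmt
ORDER BY agg.daily_date_gmt;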

Assuming you only have one entry per daily_date, daily_value in 'weather_data_daily', you should
GROUP BY daily_date, daily_value; then your aggregations (SUM and MAX) will operate on the correct grouping.
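Applied to the MySQL query in the question, that could look roughly like this (untested; it also joins the two hourly tables on hourly_date, since without that join condition the cross product would inflate SUM(rain)):
SELECT
  date_format(convert_tz(wdd.daily_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00') AS daily_date_gmt,
  wdd.daily_value,
  SUM(wd.rain),
  MAX(wdc.calc_value)
FROM weather_data_daily wdd
JOIN weather_data wd
  ON date_format(convert_tz(wd.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
   = date_format(convert_tz(wdd.daily_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
JOIN weather_data_calculated wdc ON wdc.hourly_date = wd.hourly_date
GROUP BY daily_date_gmt, wdd.daily_value
ORDER BY daily_date_gmt;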

try this:
select a.daily_date, a.daily_value, SUM(b.rain), MAX(c.calc_value)
from weather_data_daily a,weather_data b,weather_data_calculated c
where convert(varchar, a.daily_date, 101)=convert(varchar, b.hourly_date, 101)
and convert(varchar, a.daily_date, 101)=convert(varchar, c.hourly_date, 101)
group by a.daily_date, a.daily_value
You have to connect the tables together somehow (this uses an inner join). That requires getting the hourly dates and the daily dates into the same format; here they are all converted to MM/DD/YYYY strings (note that convert(varchar, ..., 101) is SQL Server syntax; in MySQL you would use DATE() or date_format() for the same purpose).


Expanding SQL query for multiple dates

I have a SQL query that includes a __DATE__ macro. A Python script replaces this macro with the current date and then the statement is executed thus giving one day's worth of data.
For the first item selected, I would like to use tblLabTestResult.CollectionDate instead of __DATE__.
I would like to include the prior 7 days instead of just the current day.
The desired output would be something similar to:
Date,Result,Total
2021-08-28,Detected,5
2021-08-28,Not Detected,9
2021-08-29,Detected,23
2021-08-29,Not Detected,6
2021-08-30,Detected,88
2021-08-30,Not Detected,26
Current query:
SELECT
'__DATE__' as Date,
tblLabTestResult.Result as Result,
Count(tblLabTestResult.Result) as Total
FROM
PncRegDb.dbo.tblLabTestResult as tblLabTestResult
WHERE
tblLabTestResult.TestName like '%cov%'
AND tblLabTestResult.TestName not like '%aoe%'
AND tblLabTestResult.TestName not like '%antibody%'
AND tblLabTestResult.CollectionDate >= '__DATE__'
AND tblLabTestResult.CollectionDate <= '__DATE__ 11:59:59 PM'
GROUP BY
tblLabTestResult.Result;
How can I change my SQL query to accommodate these requirements? I am using MS SQL Server.
You can use the DATEADD() function to get the date from 7 days ago, and then include all dates between that date and the current date. I have updated the WHERE condition in your query below:
SELECT
'__DATE__' as Date,
tblLabTestResult.Result as Result,
Count(tblLabTestResult.Result) as Total
FROM
PncRegDb.dbo.tblLabTestResult as tblLabTestResult
WHERE
tblLabTestResult.TestName like '%cov%'
AND tblLabTestResult.TestName not like '%aoe%'
AND tblLabTestResult.TestName not like '%antibody%'
AND tblLabTestResult.CollectionDate between DATEADD(day, -7, '__DATE__') and '__DATE__ 11:59:59 PM'
GROUP BY
tblLabTestResult.Result;
A few points:
Columns that are not aggregated must be in the GROUP BY
You should be passing your date as a parameter
Best to use a half-open interval to compare dates (exclusive end-point), so @endDate is the day after the one you want
Use short, meaningful aliases to make your code more readable
It doesn't make sense to group and aggregate by the same column. If Result is a non-nullable column then Count(Result) is the same as Count(*)
If you want to group by whole days (and CollectionDate has a time component) then replace ltr.CollectionDate with CAST(ltr.CollectionDate AS date) in both the SELECT and GROUP BY
SELECT
ltr.CollectionDate as Date,
ltr.Result as Result,
COUNT(*) as Total
FROM
PncRegDb.dbo.tblLabTestResult as ltr
WHERE
ltr.TestName like '%cov%'
AND ltr.TestName not like '%aoe%'
AND ltr.TestName not like '%antibody%'
AND ltr.CollectionDate >= @startDate
AND ltr.CollectionDate < @endDate
GROUP BY
ltr.CollectionDate, ltr.Result;
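For completeness, the whole-day variant from the last point above would be roughly (a sketch, assuming @startDate and @endDate are date parameters you supply):
SELECT
CAST(ltr.CollectionDate AS date) as Date,
ltr.Result as Result,
COUNT(*) as Total
FROM
PncRegDb.dbo.tblLabTestResult as ltr
WHERE
ltr.TestName like '%cov%'
AND ltr.TestName not like '%aoe%'
AND ltr.TestName not like '%antibody%'
AND ltr.CollectionDate >= @startDate
AND ltr.CollectionDate < @endDate
GROUP BY
CAST(ltr.CollectionDate AS date), ltr.Result;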

Optimization on large tables

I have the following query that joins two large tables. I am trying to join on patient_id and keep only records that are no more than 30 days apart.
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
and to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables so that this query runs faster?
I will take a shot in the dark because, as others said, it depends on the table structure, the indexes, and the output of the planner.
The most obvious thing here is that, as long as it is possible, you want to represent dates as some date datatype instead of strings. That is the first and most important change you should make here. No index can save you if you keep transforming strings, because very likely the problem is not the patient_id, it's your date calculation.
Other than that, forcing hash joins on patient_id and then doing the filtering could help if for some reason the planner decided to use nested loops for that condition. But that is only for after you have fixed your date representation AND you still have a problem AND you see that the planner does nested loops on that attribute.
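If this is PostgreSQL (an assumption based on the planner/hash-join wording; the exact statement differs on other engines), a one-time conversion of the string columns to real dates might look like this:
-- Hypothetical migration: turn the 'YYYYMMDD' strings into real date columns.
-- Assumes every existing value parses as YYYYMMDD; back up / test first.
ALTER TABLE chairs ALTER COLUMN from_date TYPE date USING to_date(from_date, 'YYYYMMDD');
ALTER TABLE data   ALTER COLUMN from_date TYPE date USING to_date(from_date, 'YYYYMMDD');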
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings are ordered and can be used with <, > and =.
Building strings from the data in chairs to use in the JOIN against data will make good use of an index on data (patient_id, from_date).
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. Or, to put it another way: do not transform the child table's data from a string into something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
Untested:
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and id.from_date between to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
and c.from_date
For this query:
select *
from chairs c join
     data id
     on c.patient_id = id.patient_id and
        to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
        to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- you can put them in both tables.
The date comparisons are problematic. Storing the values as actual dates can help. But it is not a 100% solution because comparison operations are still needed.
Depending on what you are actually trying to accomplish there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows. And that just takes time as well.
Your query has a non-SARGable predicate because it uses functions that are evaluated row by row. You need to get rid of such functions and replace them with direct access to the columns. As an example:
SELECT *
FROM chairs AS c
JOIN data AS id
ON c.patient_id = id.patient_id
AND c.from_date BETWEEN id.from_date AND id.from_date + INTERVAL '30 days'
It will run faster with these two indexes:
CREATE INDEX X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE INDEX X_SQLpro_002 ON data (patient_id, from_date);
Also try to avoid
SELECT *
and list only the columns you actually need.
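For example, picking a few placeholder columns instead of * (the chosen columns are hypothetical; use the ones your application actually reads):
SELECT c.patient_id, c.from_date AS chair_date, id.from_date AS data_date
FROM chairs AS c
JOIN data AS id
ON c.patient_id = id.patient_id
AND c.from_date BETWEEN id.from_date AND id.from_date + INTERVAL '30 days'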

How to do a query with unique results between two tables?

I need to do a query to get the last accessed file per user in SCCM 2012. I'm trying to write a query in SQL but I'm getting a lot of duplicate results.
The result I need must contain only the last (most recent) date for each user.
Here is the query I'm using:
SELECT
dbo.v_GS_SoftwareFile.FileName,
dbo.v_R_System.User_Name0,
dbo.v_GS_SoftwareFile.FileModifiedDate
FROM
dbo.v_GS_SoftwareFile
CROSS JOIN dbo.v_R_System
WHERE
(dbo.v_GS_SoftwareFile.FileName = N'outlook.exe')
AND (dbo.v_GS_SoftwareFile.FileModifiedDate > CONVERT(DATETIME, '2015-02-01 00:00:00', 102))
GROUP BY
dbo.v_R_System.User_Name0,
dbo.v_GS_SoftwareFile.FileName,
dbo.v_GS_SoftwareFile.FileModifiedDate
What do I need to add to this query?
Your CROSS JOIN might be responsible for the 'duplicate results' you report, since you don't have an actual join condition there (so, if you have 10 records in one table, and 100 records in another, you will have 10x100=1000 records). Is there a common key between your SoftwareFile and System tables?
Once you've added the JOIN condition, to get it down to a single date per user, use the MAX() function as follows:
SELECT
dbo.v_GS_SoftwareFile.FileName,
dbo.v_R_System.User_Name0,
MAX(dbo.v_GS_SoftwareFile.FileModifiedDate) AS LastFileModifiedDate
FROM
dbo.v_GS_SoftwareFile
CROSS JOIN
dbo.v_R_System
WHERE
(dbo.v_GS_SoftwareFile.FileName = N'outlook.exe')
AND (dbo.v_GS_SoftwareFile.FileModifiedDate > CONVERT(DATETIME, '2015-02-01 00:00:00', 102))
GROUP BY
dbo.v_R_System.User_Name0,
dbo.v_GS_SoftwareFile.FileName
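If the common key turns out to be ResourceID (the usual link between v_R_System and the v_GS_* inventory views in SCCM, but verify that against your schema), the joined version might look roughly like this:
SELECT
sf.FileName,
vrs.User_Name0,
MAX(sf.FileModifiedDate) AS LastFileModifiedDate
FROM
dbo.v_GS_SoftwareFile AS sf
INNER JOIN dbo.v_R_System AS vrs
ON sf.ResourceID = vrs.ResourceID
WHERE
(sf.FileName = N'outlook.exe')
AND (sf.FileModifiedDate > CONVERT(DATETIME, '2015-02-01 00:00:00', 102))
GROUP BY
vrs.User_Name0,
sf.FileName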

View data by date after Format 'mmyy'

I'm trying to answer questions like, how many POs per month do we have? Or, how many lines are there in every PO by month, etc. The original PO dates are all formatted #1/1/2013#. So my first step was to Format each PO record date into 'mmyy' so I could group and COUNT them.
This worked well, but now I cannot view the data by date. For example, I cannot ask 'How many POs after December did we get?' I think this is because SQL does not recognize mm/yy as a comparable date.
Any ideas how I could restructure this?
I wrote 2 queries. This is the query that formats the dates; it is also the query I was trying to add the date filter to (e.g. >#3/14#):
SELECT qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy") AS [Date]
FROM qryALL_PO
GROUP BY qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy");
My group and counting query is:
SELECT qryALL_PO.POLN, Sum(qryALL_PO.[LINE QUANTITY]) AS SUM_QTY_PO
FROM qryALL_PO
GROUP BY qryALL_PO.POLN;
You can still count and group dates, as long as you have a way to determine the part of the date you are looking for.
In Access you can use Year() and Month(), for example, to get the year and month parts of the date:
select year(mydate)
, month(mydate)
, count(*)
from tableX
group
by year(mydate)
, month(mydate)
You can format it as 'YYYY-MM' and then use '>' for the 'after' clause.
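For example (a sketch using the question's field names; "2013-12" is just the "after December" cut-off from the question, and Count(*) counts PO lines rather than distinct POs):
SELECT Format([PO CREATE DATE], "yyyy-mm") AS YrMon, Count(*) AS NumLines
FROM qryALL_PO
WHERE Format([PO CREATE DATE], "yyyy-mm") > "2013-12"
GROUP BY Format([PO CREATE DATE], "yyyy-mm");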

How to group by a date column by month

I have a table with a date column where date is stored in this format:
2012-08-01 16:39:17.601455+0530
How do I group or group_and_count on this column by month?
Your biggest problem is that SQLite won't directly recognize your dates as dates.
CREATE TABLE YOURTABLE (DateColumn date);
INSERT INTO "YOURTABLE" VALUES('2012-01-01');
INSERT INTO "YOURTABLE" VALUES('2012-08-01 16:39:17.601455+0530');
If you try to use strftime() to get the month . . .
sqlite> select strftime('%m', DateColumn) from yourtable;
01
. . . it picks up the month from the first row, but not from the second.
If you can reformat your existing data as valid timestamps (as far as SQLite is concerned), you can use this relatively simple query to group by year and month. (You almost certainly don't want to group by month alone.)
select strftime('%Y-%m', DateColumn) yr_mon, count(*) num_dates
from yourtable
group by yr_mon;
If you can't do that, you'll need to do some string parsing. Here's the simplest expression of this idea.
select substr(DateColumn, 1, 7) yr_mon, count(*) num_dates
from yourtable
group by yr_mon;
But that might not quite work for you. Since you have timezone information, it's sure to change the month for some values. To get a fully general solution, I think you'll need to correct for timezone, extract the year and month, and so on. The simpler approach would be to look hard at this data, declare "I'm not interested in accounting for those edge cases", and use the simpler query immediately above.
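For what it's worth, if every value really has the exact layout shown in the question (six fractional digits and a +HHMM offset, both assumptions about your data), one way to make SQLite parse it is to splice a colon into the offset, let datetime() normalise to UTC, and then group:
select strftime('%Y-%m',
       datetime(substr(DateColumn, 1, 26) || substr(DateColumn, 27, 3) || ':' || substr(DateColumn, 30, 2))) yr_mon,
       count(*) num_dates
from yourtable
group by yr_mon;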
It took me a while to find the correct expression using Sequel. What I did was this:
Assuming a table like:
CREATE TABLE acct (date_time datetime, reward integer)
Then you can access the aggregated data as follows:
ds = DS[:acct]
ds.select_group(Sequel.function(:strftime, '%Y-%m', :date_time))
.select_append{sum(:reward)}.each do |row|
p row
end
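For reference, the SQL this Sequel dataset generates should be roughly equivalent to the following (a sketch, not copied from Sequel's actual output):
SELECT strftime('%Y-%m', date_time), sum(reward)
FROM acct
GROUP BY strftime('%Y-%m', date_time);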