I have the query below, which gives me the total number of hours in a year that a person walks faster than 2 mph, UNIONed with a query for the total number of entries (the total number of hours recorded). Both are essentially the same query, except for the last condition in the first one. The issue is that it takes a good 30 seconds to run. Is there a way to combine the two queries so that it runs faster but still returns similar data? My end goal is to get the percentage of time a person walks faster than 2 mph.
SELECT COUNT(STARTING_HOUR) FROM SENSOR.SPEED
FULL OUTER JOIN ACCOUNT.ID
ON SENSOR.SPEED.Account_ID = ACCOUNT.ID.Account_ID
FULL OUTER JOIN ACCOUNT.NAME
ON ACCOUNT.ID.Account_ID = ACCOUNT.NAME.Account_ID
WHERE UPPER(NAME) LIKE '%Sarah%'
AND UPPER(NAME) LIKE '%Jones%'
AND STARTING_HOUR >= TO_DATE('2015-01-01T00:00:00', 'YYYY-MM-DD"T"HH24:MI:SS')
AND STARTING_HOUR <= TO_DATE('2015-12-31T00:00:00', 'YYYY-MM-DD"T"HH24:MI:SS')
AND Value > 2
UNION
SELECT COUNT(STARTING_HOUR) FROM SENSOR.SPEED
FULL OUTER JOIN ACCOUNT.ID
ON SENSOR.SPEED.Account_ID = ACCOUNT.ID.Account_ID
FULL OUTER JOIN ACCOUNT.NAME
ON ACCOUNT.ID.Account_ID = ACCOUNT.NAME.Account_ID
WHERE UPPER(NAME) LIKE '%Sarah%'
AND UPPER(NAME) LIKE '%Jones%'
AND STARTING_HOUR >= TO_DATE('2015-01-01T00:00:00', 'YYYY-MM-DD"T"HH24:MI:SS')
AND STARTING_HOUR <= TO_DATE('2015-12-31T00:00:00', 'YYYY-MM-DD"T"HH24:MI:SS')
Thank you!
First, the full outer joins are superfluous. Next, table aliases make the query easier to write and read. Finally, you can do the arithmetic in a single pass using AVG():
SELECT 100 * AVG(CASE WHEN VALUE > 2 THEN 1.0 ELSE 0 END) AS pct_over_2mph
FROM SENSOR.SPEED s JOIN
ACCOUNT.ID i
ON s.Account_ID = i.Account_ID JOIN
ACCOUNT.NAME n
ON i.Account_ID = n.Account_ID
WHERE UPPER(NAME) LIKE '%SARAH%' AND -- the pattern must be upper case to match UPPER(NAME)
UPPER(NAME) LIKE '%JONES%' AND
STARTING_HOUR >= DATE '2015-01-01' AND
STARTING_HOUR < DATE '2016-01-01' ; -- every hour of 2015, including December 31
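The WHERE conditions turn all the outer joins into inner joins: a row that an outer join pads with NULLs fails the UPPER(NAME) LIKE or STARTING_HOUR tests, so it is filtered out anyway. Perhaps you do want an outer join somewhere, but it is not obvious that any are necessary.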
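To sanity-check the single-pass idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The tables speed and name and their rows are hypothetical stand-ins for SENSOR.SPEED and ACCOUNT.NAME, ISO date strings replace Oracle DATE values, and ACCOUNT.ID is left out. One scan returns the percentage together with both counts that the UNION was producing.

```python
import sqlite3

# Hypothetical miniature schema: one speed reading per hour, plus a name table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE speed (account_id INTEGER, starting_hour TEXT, value REAL);
CREATE TABLE name  (account_id INTEGER, name TEXT);
INSERT INTO name  VALUES (1, 'Sarah Jones'), (2, 'Bob Smith');
INSERT INTO speed VALUES
  (1, '2015-03-01 08:00', 1.5),
  (1, '2015-03-01 09:00', 2.5),
  (1, '2015-03-01 10:00', 3.0),
  (1, '2015-03-01 11:00', 0.5),
  (2, '2015-03-01 08:00', 4.0);
""")

# The CASE yields 1.0 for hours above 2 mph and 0 otherwise, so its average is
# the fraction of fast hours; both counts come from the same single scan.
one_pass = conn.execute("""
SELECT 100 * AVG(CASE WHEN s.value > 2 THEN 1.0 ELSE 0 END) AS pct_over_2mph,
       SUM(CASE WHEN s.value > 2 THEN 1 ELSE 0 END)          AS fast_hours,
       COUNT(*)                                              AS total_hours
FROM speed s
JOIN name n ON s.account_id = n.account_id
WHERE UPPER(n.name) LIKE '%SARAH%'
  AND UPPER(n.name) LIKE '%JONES%'
  AND s.starting_hour >= '2015-01-01' AND s.starting_hour < '2016-01-01'
""").fetchone()

print(one_pass)  # (50.0, 2, 4)
```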
SELECT CASE
WHEN COUNT(1) = 0 -- Handle division by zero
THEN NULL
ELSE 100 * COUNT( CASE WHEN value > 2 THEN 1 END )
/ COUNT( 1 )
END AS Percentage
FROM SENSOR.SPEED
RIGHT OUTER JOIN ACCOUNT.NAME
ON SENSOR.SPEED.Account_ID = ACCOUNT.NAME.Account_ID
WHERE UPPER(NAME) LIKE '%SARAH%'
AND UPPER(NAME) LIKE '%JONES%'
AND STARTING_HOUR >= DATE '2015-01-01'
AND STARTING_HOUR < DATE '2016-01-01'; -- every hour of 2015, including December 31
Do you need the ACCOUNT.ID table? Instead, could you join directly from SENSOR.SPEED to ACCOUNT.NAME?
I am assuming that NAME is in ACCOUNT.NAME. Because of the UPPER(NAME) filter, NAME can never be NULL in the result, so a RIGHT OUTER JOIN is enough instead of a FULL OUTER JOIN. Depending on which table the STARTING_HOUR column is in, this could be simplified further to an INNER JOIN.
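A small sketch of both points, again with Python's sqlite3 and hypothetical tables. The RIGHT OUTER JOIN from SENSOR.SPEED to ACCOUNT.NAME is written as name LEFT JOIN speed (the same join with the tables swapped, since SQLite only added RIGHT JOIN in 3.39), and with the STARTING_HOUR filter in place it returns exactly the inner-join rows. The CASE guard against an empty result is also shown; SQLite itself returns NULL on division by zero, whereas Oracle raises ORA-01476, which is where the guard matters.

```python
import sqlite3

# Hypothetical tables: account 3 has a matching name but no speed readings at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE speed (account_id INTEGER, starting_hour TEXT, value REAL);
CREATE TABLE name  (account_id INTEGER, name TEXT);
INSERT INTO name  VALUES (1, 'Sarah Jones'), (3, 'Sarah Jonesy');
INSERT INTO speed VALUES (1, '2015-03-01 08:00', 2.5), (1, '2015-03-01 09:00', 1.0);
""")

filters = """
WHERE UPPER(n.name) LIKE '%SARAH%' AND UPPER(n.name) LIKE '%JONES%'
  AND s.starting_hour >= '2015-01-01' AND s.starting_hour < '2016-01-01'
ORDER BY s.starting_hour
"""

# name LEFT JOIN speed is SPEED RIGHT OUTER JOIN NAME with the tables swapped.
# Account 3's NULL-padded row fails the starting_hour test, so it disappears.
outer_rows = conn.execute(
    "SELECT n.account_id, s.value FROM name n LEFT JOIN speed s "
    "ON s.account_id = n.account_id" + filters).fetchall()
inner_rows = conn.execute(
    "SELECT n.account_id, s.value FROM name n JOIN speed s "
    "ON s.account_id = n.account_id" + filters).fetchall()

# Division-by-zero guard from the answer; 100.0 avoids SQLite's integer division.
pct_sql = """
SELECT CASE WHEN COUNT(1) = 0 THEN NULL
            ELSE 100.0 * COUNT(CASE WHEN s.value > 2 THEN 1 END) / COUNT(1)
       END
FROM name n JOIN speed s ON s.account_id = n.account_id
WHERE UPPER(n.name) LIKE ?
"""
found = conn.execute(pct_sql, ("%SARAH%",)).fetchone()[0]
missing = conn.execute(pct_sql, ("%NOBODY%",)).fetchone()[0]

print(outer_rows == inner_rows, found, missing)  # True 50.0 None
```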
Related
The goal is to find the empids, for a given time range, that are present in the LEFT table but not in the RIGHT table.
I ran the following two Impala queries and got different results.
QUERY 1: select count(dbonetable.empid), COUNT(DISTINCT dbtwotable.empid) from
(select distinct dbonetable.empid
from dbonedbtable dbonetable
WHERE (dbonetable.expiration_dt >= '2009-01-01' OR dbonetable.expiration_dt IS NULL) AND dbonetable.effective_dt <= '2019-01-01' AND dbonetable.empid IS NOT NULL) dbonetable
LEFT join dbtwodbtable dbtwotable ON dbonetable.empid = dbtwotable.empid
--43324489 43270569
QUERY 2: select count(*) from (
select distinct dbonetable.empid from dbonedbtable dbonetable
LEFT ANTI join dbtwodbtable dbtwotable ON dbonetable.empid = dbtwotable.empid
AND (dbonetable.expiration_dt >= '2009-01-01' OR dbonetable.expiration_dt IS NULL) AND dbonetable.effective_dt <= '2019-01-01' AND dbonetable.empid IS NOT NULL) tab
--19088973
To explain the context:
Query 2: I am trying to find all the empids that are in dbonetable but not in dbtwotable using a LEFT ANTI JOIN, which I learned about here:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_joins.html
--For LEFT ANTI JOIN, this clause returns those values from the left-hand table that have no matching value in the right-hand table.
In Query 1:
dbonetable is filtered by the WHERE clause, and the result is LEFT OUTER JOINed with dbtwotable. On top of that result, I compute count(dbonetable.empid) and COUNT(DISTINCT dbtwotable.empid), which returned 43324489 and 43270569, a difference of 53,920.
My question: should the answer be the Query 1 difference, 43324489 - 43270569 = 53,920, or the Query 2 result, 19088973?
What am I missing here? Is Query 1 incorrect, or is my LEFT ANTI JOIN misleading?
Thank you all in advance.
The results differ because you forgot to add "where dbtwotable.empid is null" to Query 1.
Additionally, Query 2 is logically different from Query 1. In Query 1 you join only on equality of the two empid columns (the date filters are applied inside the subquery, before the join), while in Query 2 the date filters are part of the anti-join condition. An anti join returns every left-hand row for which the whole ON condition finds no match, so rows outside the date range are no longer excluded; they are returned as "unmatched". That is why the final count is much larger.
If you make the join condition in Query 2 the same as in Query 1 and move everything else into a WHERE clause, you will get the same count as the updated Query 1, which is 53,920. That's the count you need.
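SQLite has no LEFT ANTI JOIN, so this sketch uses NOT EXISTS, which keeps the same left-hand rows that have no matching right-hand row. The tables dbone and dbtwo and their four employees are hypothetical. With the date filters inside the anti-join condition, the out-of-range employees come back as "unmatched"; moving the filters to WHERE, or adding the IS NULL test to the LEFT JOIN, gives the intended count.

```python
import sqlite3

# Hypothetical data: dbone holds employees with validity dates, dbtwo a set of empids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbone (empid INTEGER, effective_dt TEXT, expiration_dt TEXT);
CREATE TABLE dbtwo (empid INTEGER);
INSERT INTO dbone VALUES
  (1, '2010-01-01', NULL),          -- in range, present in dbtwo
  (2, '2010-01-01', '2015-01-01'),  -- in range, missing from dbtwo
  (3, '2020-01-01', NULL),          -- out of range (effective after 2019)
  (4, '2001-01-01', '2005-01-01');  -- out of range (expired before 2009)
INSERT INTO dbtwo VALUES (1), (3), (4);
""")

in_range = ("(a.expiration_dt >= '2009-01-01' OR a.expiration_dt IS NULL) "
            "AND a.effective_dt <= '2019-01-01' AND a.empid IS NOT NULL")

# Like Query 2: the date filters sit inside the anti-join condition, so a left
# row that fails them has "no match" and is returned.
filters_in_on = conn.execute(f"""
SELECT COUNT(DISTINCT a.empid) FROM dbone a
WHERE NOT EXISTS (SELECT 1 FROM dbtwo b WHERE b.empid = a.empid AND {in_range})
""").fetchone()[0]

# Corrected: anti join on empid only, date filters applied in WHERE.
filters_in_where = conn.execute(f"""
SELECT COUNT(DISTINCT a.empid) FROM dbone a
WHERE {in_range}
  AND NOT EXISTS (SELECT 1 FROM dbtwo b WHERE b.empid = a.empid)
""").fetchone()[0]

# Query 1 with the missing IS NULL filter: LEFT JOIN, then keep unmatched rows.
left_join_is_null = conn.execute(f"""
SELECT COUNT(DISTINCT a.empid) FROM dbone a
LEFT JOIN dbtwo b ON a.empid = b.empid
WHERE {in_range} AND b.empid IS NULL
""").fetchone()[0]

print(filters_in_on, filters_in_where, left_join_is_null)  # 3 1 1
```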
I have a query that pulls all of a rep's new customers and their year-to-date sales. The only problem is that it only returns customers that have sales in the invdate range, but I need it to show all of the accounts, with a 0 for those that do not have any sales. Is there any way to achieve this? I tried using COALESCE and it didn't seem to work. I also tried left, right and full outer joins. Any help would be appreciated!
select
a.Acctnum,
sum(a.invtotal) as total
from invoices a right join accounts b on a.acctnum = b.acctnum where
a.invdate between '1/1/2017' and '12/31/2017'
and a.sls = '78'
and b.sls = '78'
and b.activetype = 'y' and b.startdate > (getdate()-365)
group by a.acctnum
order by total desc
You are restricting your results in your WHERE clause AFTER you join your tables, which causes records to drop. Instead, switch to a LEFT OUTER JOIN with your accounts table driving the query. Then restrict your invoices table in your ON clause so that invoices are dropped BEFORE you join.
SELECT b.acctnum,
COALESCE(SUM(a.invtotal), 0) AS total --0 instead of NULL for accounts without invoices
FROM accounts b
LEFT OUTER JOIN invoices a ON
a.acctnum = b.acctnum AND
--Put the restrictions on your optional (right-hand) table here
--so its rows are removed BEFORE joining.
a.invdate BETWEEN '1/1/2017' AND '12/31/2017'
AND a.sls = '78'
WHERE
b.sls = '78'
AND b.activetype = 'y'
AND b.startdate > (getdate() - 365)
GROUP BY b.acctnum
ORDER BY total DESC
It's a bit like filtering invoices in a subquery before joining it as the right-hand table; it's just easier to put the conditions in the ON clause.
Your problem is that your WHERE conditions are turning the right join into an inner join. Put all the conditions on columns aliased by a. into the ON clause.
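Here is a minimal sqlite3 sketch of the difference, with hypothetical accounts and invoices rows; ISO date strings replace '1/1/2017' and a fixed cutoff replaces getdate() - 365, since those are SQL Server specifics. Account A2 has no 2017 invoices: it disappears when the invoice filters are in WHERE and comes back with a total of 0 when they are in ON. Grouping on b.acctnum and wrapping the sum in COALESCE is what turns its missing sum into 0.

```python
import sqlite3

# Hypothetical rows; A2 is a new account with no invoices in 2017.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (acctnum TEXT, sls TEXT, activetype TEXT, startdate TEXT);
CREATE TABLE invoices (acctnum TEXT, sls TEXT, invdate TEXT, invtotal REAL);
INSERT INTO accounts VALUES ('A1', '78', 'y', '2017-06-01'),
                            ('A2', '78', 'y', '2017-07-01');
INSERT INTO invoices VALUES ('A1', '78', '2017-08-15', 100.0),
                            ('A1', '78', '2016-12-31', 50.0);  -- outside the range
""")

common = "WHERE b.sls = '78' AND b.activetype = 'y' AND b.startdate > '2017-01-01'"

# Invoice filters in WHERE: A2's NULL-padded row fails them and is discarded.
in_where = conn.execute(f"""
SELECT b.acctnum, COALESCE(SUM(a.invtotal), 0) AS total
FROM accounts b LEFT JOIN invoices a ON a.acctnum = b.acctnum
{common} AND a.invdate BETWEEN '2017-01-01' AND '2017-12-31' AND a.sls = '78'
GROUP BY b.acctnum ORDER BY total DESC
""").fetchall()

# Invoice filters in ON: A2 survives the join, and COALESCE turns its sum into 0.
in_on = conn.execute(f"""
SELECT b.acctnum, COALESCE(SUM(a.invtotal), 0) AS total
FROM accounts b LEFT JOIN invoices a
  ON a.acctnum = b.acctnum
 AND a.invdate BETWEEN '2017-01-01' AND '2017-12-31'
 AND a.sls = '78'
{common}
GROUP BY b.acctnum ORDER BY total DESC
""").fetchall()

print(in_where)  # [('A1', 100.0)]
print(in_on)     # [('A1', 100.0), ('A2', 0)]
```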
I was trying to solve this question (No. 29) on http://www.sql-ex.ru/
Under the assumption that the income (inc) and expenses (out) of the money at each outlet are written not more than once a day, get a result set with the fields: point, date, income, expense. Use Income_o and Outcome_o tables.
I came up with this solution:
SELECT Income_o.point, Income_o.date, Income_o.inc, Outcome_o.out
FROM Income_o
INNER JOIN Outcome_o ON Income_o.point = Outcome_o.point
The result is obviously wrong (hence my question). The problem assumes that a point never has more than one income and one expense a day, so why isn't this query correct? I can see from the same page that the correct result has some NULL column values. I would appreciate an explanation (if not the correct answer). My SQL is not at master level (that's why I am working through these! So far I have done 29 out of 125 and only needed help from SO on 3 of them).
The expected result (the result of the correct query, from the website) is shown in this snapshot: http://snag.gy/yN43V.jpg
P.S. I know that the hint says UNION and JOIN, and I am trying to get my head around this. If I manage to get the answer myself, I will post it.
You want a full outer join on point and date:
SELECT
COALESCE(i.point, o.point) AS point,
COALESCE(i.date, o.date) AS date,
i.inc,
o.out
FROM
Income_o AS i
FULL JOIN Outcome_o AS o ON i.point = o.point AND i.date = o.date
;
The COALESCE expressions ensure that NULL is not returned for those columns: if the Income_o side has a NULL (because the table has no match for an Outcome_o row), the value is then taken from the other side.
Alternatively you can go with a union of two outer joins, left and right:
SELECT
i.point,
i.date,
i.inc,
o.out
FROM
Income_o AS i
LEFT JOIN Outcome_o AS o ON i.point = o.point AND i.date = o.date
UNION
SELECT
o.point,
o.date,
i.inc,
o.out
FROM
Income_o AS i
RIGHT JOIN Outcome_o AS o ON i.point = o.point AND i.date = o.date
;
If the tables have matches on the specified condition, both joins will return them, but UNION will eliminate duplicate entries. This second method is essentially an alternative implementation of full outer join, useful for cases where the FULL JOIN syntax is not supported. (MySQL is one product that does not support FULL JOIN.)
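For a quick check of the UNION method, here is a sketch with Python's sqlite3 and a few made-up rows. SQLite before 3.39 supports neither FULL JOIN nor RIGHT JOIN, so the right join is written as a LEFT JOIN with the tables swapped. The day with both an income and an expense comes back once, and the other two days carry a NULL on the missing side.

```python
import sqlite3

# Made-up sample rows standing in for the sql-ex.ru tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Income_o  (point INTEGER, date TEXT, inc REAL);
CREATE TABLE Outcome_o (point INTEGER, date TEXT, out REAL);
INSERT INTO Income_o  VALUES (1, '2001-03-22', 15000), (1, '2001-03-23', 15000);
INSERT INTO Outcome_o VALUES (1, '2001-03-23', 2000),  (1, '2001-03-24', 3000);
""")

# First branch: every income row; second branch: every expense row.
# UNION drops the duplicate produced by the day that matches in both branches.
full_join = conn.execute("""
SELECT i.point, i.date, i.inc, o.out
FROM Income_o AS i LEFT JOIN Outcome_o AS o ON i.point = o.point AND i.date = o.date
UNION
SELECT o.point, o.date, i.inc, o.out
FROM Outcome_o AS o LEFT JOIN Income_o AS i ON i.point = o.point AND i.date = o.date
ORDER BY 1, 2
""").fetchall()

for row in full_join:
    print(row)
```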
You can use GROUP BY with an aggregate function to get the desired result. The subquery combines both tables with UNION ALL, but it yields one row per transaction: if a point has both an income and an expense on the same date, they appear as two rows. To merge them into one row, group by point and date and apply an aggregate such as MAX(), which ignores the NULL placeholders.
select point, date, max(inc), max(out)
from
(
select point, date, inc, NULL as out
from income_o
union all
select point, date, NULL, out
from outcome_o
)
dt
group by point, date
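A sketch of this approach with Python's sqlite3 and a few hypothetical rows: MAX() ignores the NULL placeholder in each UNION ALL branch, so the two rows for the shared day collapse into one.

```python
import sqlite3

# Hypothetical sample rows standing in for the sql-ex.ru tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Income_o  (point INTEGER, date TEXT, inc REAL);
CREATE TABLE Outcome_o (point INTEGER, date TEXT, out REAL);
INSERT INTO Income_o  VALUES (1, '2001-03-22', 15000), (1, '2001-03-23', 15000);
INSERT INTO Outcome_o VALUES (1, '2001-03-23', 2000),  (1, '2001-03-24', 3000);
""")

rows = conn.execute("""
SELECT point, date, MAX(inc) AS inc, MAX(out) AS out
FROM (
  SELECT point, date, inc, NULL AS out FROM Income_o
  UNION ALL
  SELECT point, date, NULL, out FROM Outcome_o
) AS dt
GROUP BY point, date      -- one output row per point and day
ORDER BY point, date
""").fetchall()

for row in rows:
    print(row)
```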
I think you are looking for LEFT JOIN
SELECT Income_o.point, Income_o.dat, Income_o.inc, Outcome_o.outc
FROM Income_o
LEFT JOIN Outcome_o ON Income_o.point = Outcome_o.point
My issue is that I have a SELECT statement whose WHERE clause is generated on the fly. It joins 5 tables.
I basically need a count of the DISTINCT USER IDs in table 1 that fall within the scope of the WHERE clause. It also has to run as a single statement. Essentially, I can't do a global GROUP BY because of the data from the other 4 tables that I need returned.
If I could get a column holding that count, repeated on every row alongside the primary key column, that would be perfect. Right now this is my query:
SELECT *
FROM TBL1 t1
INNER JOIN TBL2 t2 ON t2.FK = t1.FK
INNER JOIN TBL3 t3 ON t3.PK = t2.PK
INNER JOIN TBL4 t4 ON t4.PK = t3.PK
LEFT OUTER JOIN TBL5 t5 ON t4.PK = t5.PK
WHERE t1.Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY
t4.Column
, t3.Column
, t3.Column2
, t1.Date_Time_In DESC
So instead of selecting all columns, I will narrow it down to about 5 or 6, but alongside them I need something like a Total column: the distinct count of TBL1's primary key under the same WHERE clause, which can grow and shrink in size.
I almost wish there were a way to apply the same WHERE clause to a subselect, because I realize that would work, but the only way I know is to put the clause in a variable and place it in both spots, which I can't do either.
If you are using SQL Server 2005 or later, you can use an aggregate function with an OVER clause.
SELECT *
, COUNT(t1.UserID) OVER(PARTITION BY t1.UserID) AS 'Total'
FROM TBL1 t1
INNER JOIN TBL2 t2 ON t2.FK = t1.FK
INNER JOIN TBL3 t3 ON t3.PK = t2.PK
INNER JOIN TBL4 t4 ON t4.PK = t3.PK
LEFT OUTER JOIN TBL5 t5 ON t4.PK = t5.PK
WHERE t1.Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY
t4.Column, t3.Column, t3.Column2, t1.Date_Time_In DESC
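A minimal sketch with Python's sqlite3 (window functions need SQLite 3.25 or later), using a hypothetical single table visits in place of the five-table join. Windowed aggregates are computed after WHERE, so they respect the filter without repeating it. Note that COUNT(UserID) OVER (PARTITION BY UserID) gives the number of rows per user, not a distinct count. SQL Server does not allow COUNT(DISTINCT ...) with OVER; the distinct_users column shows one commonly used workaround (ascending DENSE_RANK plus descending DENSE_RANK, minus one), which assumes UserID is never NULL.

```python
import sqlite3

# Hypothetical stand-in for TBL1 (the other four joined tables are omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (pk INTEGER PRIMARY KEY, UserID TEXT, Date_Time_In TEXT);
INSERT INTO visits (UserID, Date_Time_In) VALUES
  ('u1', '2010-11-16 09:00:00'),
  ('u1', '2010-11-17 09:00:00'),
  ('u2', '2010-11-18 09:00:00'),
  ('u3', '2010-12-05 09:00:00');   -- outside the WHERE range
""")

rows = conn.execute("""
SELECT pk, UserID,
       COUNT(UserID) OVER (PARTITION BY UserID) AS per_user_rows,  -- the answer's column
       COUNT(*) OVER ()                          AS filtered_rows,
       DENSE_RANK() OVER (ORDER BY UserID)
         + DENSE_RANK() OVER (ORDER BY UserID DESC) - 1 AS distinct_users
FROM visits
WHERE Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY pk
""").fetchall()

for row in rows:
    print(row)  # each row carries the totals for the filtered set
```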
Something like adding a one-row derived table that repeats the same WHERE filter:
CROSS JOIN (SELECT COUNT(DISTINCT UserID) AS Total FROM TBL1 WHERE Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00') AS tbl1too
A friend asked me for help on building a query that would show how many pieces of each model were sold on each day of the month, showing zeros when no pieces were sold for a particular model on a particular day, even if no items of any model are sold on that day. I came up with the query below, but it isn't working as expected. I'm only getting records for the models that have been sold, and I don't know why.
select days_of_months.`Date`,
m.NAME as "Model",
count(t.ID) as "Count"
from MODEL m
left join APPLIANCE_UNIT a on (m.ID = a.MODEL_FK and a.NUMBER_OF_UNITS > 0)
left join NEW_TICKET t on (a.NEW_TICKET_FK = t.ID and t.TYPE = 'SALES'
and t.SALES_ORDER_FK is not null)
right join (select date(concat(2009,'-',temp_months.id,'-',temp_days.id)) as "Date"
from temp_months
inner join temp_days on temp_days.id <= temp_months.last_day
where temp_months.id = 3 -- March
) days_of_months on date(t.CREATION_DATE_TIME) =
date(days_of_months.`Date`)
group by days_of_months.`Date`,
m.ID, m.NAME
I had created the temporary tables temp_months and temp_days in order to get all the days for any month. I am using MySQL 5.1, but I am trying to make the query ANSI-compliant.
You should CROSS JOIN your dates and models so that you have exactly one record for each day-model pair no matter what, and then LEFT JOIN other tables:
SELECT date, name, COUNT(t.id)
FROM (
SELECT ...
) AS days_of_months
CROSS JOIN
model m
LEFT JOIN
APPLIANCE_UNIT a
ON a.MODEL_FK = m.id
AND a.NUMBER_OF_UNITS > 0
LEFT JOIN
NEW_TICKET t
ON t.id = a.NEW_TICKET_FK
AND t.TYPE = 'SALES'
AND t.SALES_ORDER_FK IS NOT NULL
AND t.CREATION_DATE_TIME >= days_of_months.`Date`
AND t.CREATION_DATE_TIME < days_of_months.`Date` + INTERVAL 1 DAY
GROUP BY
date, name
The way you do it now, you get NULLs in m.ID for the days with no sales, and those rows are grouped together into one.
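A compact sqlite3 sketch of the CROSS JOIN pattern, assuming simplified versions of the three tables with one hypothetical sale. A recursive CTE generates three days of March 2009 in place of temp_months and temp_days, and the ticket is matched with the half-open range condition from this answer. Every day-model pair comes back, with 0 where nothing was sold.

```python
import sqlite3

# Simplified, hypothetical versions of the question's tables with a single sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model (id INTEGER, name TEXT);
CREATE TABLE appliance_unit (model_fk INTEGER, new_ticket_fk INTEGER, number_of_units INTEGER);
CREATE TABLE new_ticket (id INTEGER, type TEXT, sales_order_fk INTEGER, creation_date_time TEXT);
INSERT INTO model VALUES (1, 'Fridge'), (2, 'Oven');
INSERT INTO new_ticket VALUES (10, 'SALES', 500, '2009-03-02 14:30:00');
INSERT INTO appliance_unit VALUES (1, 10, 1);
""")

rows = conn.execute("""
WITH RECURSIVE days_of_months(d) AS (   -- stand-in for temp_months x temp_days
  SELECT '2009-03-01'
  UNION ALL
  SELECT date(d, '+1 day') FROM days_of_months WHERE d < '2009-03-03'
)
SELECT dm.d, m.name, COUNT(t.id)
FROM days_of_months dm
CROSS JOIN model m                      -- exactly one row per day-model pair
LEFT JOIN appliance_unit a
  ON a.model_fk = m.id AND a.number_of_units > 0
LEFT JOIN new_ticket t
  ON t.id = a.new_ticket_fk
 AND t.type = 'SALES'
 AND t.sales_order_fk IS NOT NULL
 AND t.creation_date_time >= dm.d
 AND t.creation_date_time < date(dm.d, '+1 day')
GROUP BY dm.d, m.id, m.name
ORDER BY dm.d, m.name
""").fetchall()

for row in rows:
    print(row)
```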
Note the JOIN condition:
AND t.CREATION_DATE_TIME >= days_of_months.`Date`
AND t.CREATION_DATE_TIME < days_of_months.`Date` + INTERVAL 1 DAY
instead of
DATE(t.CREATION_DATE_TIME) = DATE(days_of_months.`Date`)
This makes the condition sargable, i.e. able to use an index on CREATION_DATE_TIME, because the column is no longer wrapped in DATE().
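SQLite's EXPLAIN QUERY PLAN offers a quick way to see the effect; MySQL's EXPLAIN output looks different, so treat this as an illustration rather than a MySQL test. The table and index names are hypothetical. Both predicates select the same tickets, but only the range version lets the planner search idx_ticket_created; wrapping the column in date() leaves it with a scan.

```python
import sqlite3

# Hypothetical ticket table with an index on the timestamp column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE new_ticket (id INTEGER PRIMARY KEY, creation_date_time TEXT);
CREATE INDEX idx_ticket_created ON new_ticket (creation_date_time);
INSERT INTO new_ticket (creation_date_time) VALUES
  ('2009-03-01 23:59:00'), ('2009-03-02 08:00:00'),
  ('2009-03-02 17:45:00'), ('2009-03-03 00:00:00');
""")

range_sql = ("SELECT id FROM new_ticket "
             "WHERE creation_date_time >= '2009-03-02' AND creation_date_time < '2009-03-03'")
func_sql = "SELECT id FROM new_ticket WHERE date(creation_date_time) = '2009-03-02'"

def plan(sql):
    # The human-readable step description is the last column of each plan row.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

same_rows = sorted(conn.execute(range_sql).fetchall()) == sorted(conn.execute(func_sql).fetchall())
print(same_rows)        # True
print(plan(range_sql))  # look for a SEARCH step using idx_ticket_created
print(plan(func_sql))   # the date() version has no SEARCH step
```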
You need to use outer joins, as they do not require each record in the two joined tables to have a matching record.
http://dev.mysql.com/doc/refman/5.1/en/join.html
You're looking for an OUTER join. A left outer join creates a result set with a record from the left side of the join even if the right side does not have a record to be joined with. A right outer join does the same on the opposite direction, creates a record for the right side table even if the left side does not have a corresponding record. Any column projected from the table that does not have a record will have a NULL value in the join result.
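A tiny sqlite3 illustration of that description, with hypothetical model and sale tables: the left outer join keeps the Oven row with a NULL sale date, and the swapped join (a right outer join written as a LEFT JOIN, which older SQLite versions require) keeps the sale of an unknown model with a NULL name.

```python
import sqlite3

# Hypothetical tables: Oven has no sale, and one sale refers to a missing model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model (id INTEGER, name TEXT);
CREATE TABLE sale  (model_id INTEGER, sold_on TEXT);
INSERT INTO model VALUES (1, 'Fridge'), (2, 'Oven');
INSERT INTO sale  VALUES (1, '2009-03-02'), (3, '2009-03-04');
""")

# LEFT OUTER JOIN: every model row is kept; missing sale columns become NULL.
left_rows = conn.execute(
    "SELECT m.name, s.sold_on FROM model m LEFT JOIN sale s "
    "ON s.model_id = m.id ORDER BY m.id").fetchall()

# model RIGHT OUTER JOIN sale, written with the tables swapped:
# every sale row is kept; missing model columns become NULL.
right_rows = conn.execute(
    "SELECT m.name, s.sold_on FROM sale s LEFT JOIN model m "
    "ON s.model_id = m.id ORDER BY s.sold_on").fetchall()

print(left_rows)   # [('Fridge', '2009-03-02'), ('Oven', None)]
print(right_rows)  # [('Fridge', '2009-03-02'), (None, '2009-03-04')]
```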