I have an issue with a query I have written for SAP HANA.
There are basically two tables.
The first is a dates table, which contains one row for each day in a calendar. The second is a results table containing a customer reference number and, for each customer reference number, a start date and an end date. This customer reference table has approximately 4 million records, so the inner part of the query would essentially be producing 4 million records for each day since 2011-01-01. There must be a simple way of aggregating the results. I have tried an inner select query, but HANA seems to have performance issues with it.
I have written the code like this, but it is not optimal.
select date_sql, count(*) as count
from (
select date_sql
from tbl_ref_cal_link tbl_date
where date_sql between '2011-01-01' and add_days (to_date(current_date, 'YYYY-MM-DD'), -1)
)tbl_date
Left join #cust_ref_table M1
On tbl_date.date_sql between m1.startdate and m2.enddate)z
I would appreciate anyone's help or suggestions.
You could use GROUP BY here.
You also need to change m2 in the join condition (the ON clause) to m1, as in the following SQLScript code:
select
date_sql, count(m1.CustomerId) as count
from (
-- dates table here
) tbl_date
Left join cust_ref_table m1 On tbl_date.date_sql between m1.startdate and m1.enddate
group by date_sql
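To see the grouped join behave as described, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python (not HANA). The sample rows and the customer_id column name are assumptions made up for illustration.

```python
import sqlite3

def active_customers_per_day(conn):
    """Count, for each calendar day, the customers whose date range covers it."""
    return conn.execute(
        """
        SELECT d.date_sql, COUNT(m1.customer_id) AS cnt
        FROM tbl_ref_cal_link AS d
        LEFT JOIN cust_ref_table AS m1
               ON d.date_sql BETWEEN m1.startdate AND m1.enddate
        GROUP BY d.date_sql
        ORDER BY d.date_sql
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data; customer_id is an assumed column name.
    CREATE TABLE tbl_ref_cal_link (date_sql TEXT PRIMARY KEY);
    CREATE TABLE cust_ref_table (customer_id INTEGER, startdate TEXT, enddate TEXT);
    INSERT INTO tbl_ref_cal_link VALUES ('2011-01-01'), ('2011-01-02'), ('2011-01-03');
    INSERT INTO cust_ref_table VALUES
        (1, '2011-01-01', '2011-01-02'),
        (2, '2011-01-02', '2011-01-02');
    """
)
rows = active_customers_per_day(conn)
print(rows)  # [('2011-01-01', 1), ('2011-01-02', 2), ('2011-01-03', 0)]
```

Counting a column from the joined table rather than COUNT(*) is what makes a day with no matching customers come out as 0 instead of 1.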
I need to run a query every hour against a table that joins and aggregates data from another table with millions of rows.
select f.master_con,
s.containers
from
(
select master_con
from shipped
where start_time >= a and start_time <= a+1
) f,
(
select master_con,
count(distinct container) as containers
from picked
) s
where f.master_con = s.master_con
The query above sort of works; the exact syntax may not be correct because I wrote it from memory.
In the subquery 's' I only want to count containers for each master_con in the 'f' query. I think my query runs for a long time because I'm counting containers for every master_con but then joining only to the master_con values from 'f'.
Is there a better, more efficient way to write this type of query?
(In the end, I'll sum(containers) from this query above to get total containers shipped during that hour)
Most likely, there is. Can you provide some simplified sample table structures? Additionally, the implicit comma-style join you are using has long been discouraged; you should declare your joins explicitly. The query below should be an improvement. A left outer join is used so that you get all of the shipped records that meet your criteria and keep them even if they aren't in the picked table. Change that to an inner join if you want them gone.
SELECT shipped.master_con,
COUNT(DISTINCT picked.container) AS containers
FROM shipped LEFT OUTER JOIN
picked ON picked.master_con = shipped.master_con
WHERE shipped.start_time BETWEEN a AND a+1
GROUP BY shipped.master_con
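As a rough check of the idea, the sketch below runs the explicit-join version against SQLite from Python. The table contents are invented, and start_time is assumed to be a plain integer hour so that "a" and "a+1" can be passed as parameters.

```python
import sqlite3

def containers_per_master_con(conn, start, end):
    """Distinct picked containers for each master_con shipped in [start, end]."""
    return conn.execute(
        """
        SELECT s.master_con, COUNT(DISTINCT p.container) AS containers
        FROM shipped AS s
        LEFT JOIN picked AS p ON p.master_con = s.master_con
        WHERE s.start_time BETWEEN ? AND ?
        GROUP BY s.master_con
        ORDER BY s.master_con
        """,
        (start, end),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data; start_time is an assumed integer hour.
    CREATE TABLE shipped (master_con TEXT, start_time INTEGER);
    CREATE TABLE picked (master_con TEXT, container TEXT);
    INSERT INTO shipped VALUES ('M1', 10), ('M2', 10), ('M3', 99);
    INSERT INTO picked VALUES ('M1', 'c1'), ('M1', 'c2'), ('M1', 'c2'), ('M3', 'c9');
    """
)
hourly = containers_per_master_con(conn, 10, 11)
total = sum(count for _, count in hourly)
print(hourly, total)  # [('M1', 2), ('M2', 0)] 2
```

Because the filter on shipped is applied before grouping, only the picked rows for the master_con values shipped in that window are counted; M3 never takes part.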
I am trying to pull the sum of hours worked and compare it to the sum of hours paid for each individual employee. These are stored in two different tables. When I query the tables separately, in two different queries, they work perfectly. When I place them in the same query, the results are way off.
Sample data: PayrollTransactions and PayrollTime (table images not reproduced).
This is the query that does not work:
SELECT Emp_No, Sum(Regular_Hours) AS PaidRegHours, Sum(Overtime_Hours) AS PaidOTHours, Sum(Reg_Hours) AS ClockedRegHours, Sum(OT_Hrs) AS ClockedOTHours
FROM PayrollTransactions, PayrollTime
WHERE Employee_No = Emp_No
GROUP BY Emp_No;
The result it pulls for 1 employee is 1000 PaidRegHours. When doing a query just from PayrollTransactions as such:
SELECT Employee_No, Sum(Regular_Hours) AS PaidRegHours, Sum(Overtime_Hours) AS PaidOTHours
FROM PayrollTransactions
GROUP BY Employee_No;
the result for that same employee is 200 PaidRegHours, which is correct. This same problem exists for all my computed fields. I am unsure how to fix this problem. Thanks for your help!
Desired results: DesiredOutput (image not reproduced).
Classic problem of JOIN multiples. By querying these two tables, which share a many-to-many relationship on employee, you return multiple pairings (duplicates, triplets, quadruples and so on) that are then aggregated, turning the actual 200 hours into 1,000 summed hours. Instead, consider joining one-to-one pairs, which can be achieved by joining aggregates of both tables.
The query below uses subqueries, but stored queries work too. It also uses an explicit JOIN (the current ANSI SQL standard) rather than the implicit join you currently have with WHERE.
SELECT p.Employee_No, p.PaidRegHours, p.PaidOTHours, t.ClockedRegHours, t.ClockedOTHours
FROM
(SELECT Employee_No,
Sum(Regular_Hours) AS PaidRegHours,
Sum(Overtime_Hours) AS PaidOTHours
FROM PayrollTransactions
GROUP BY Employee_No) p
INNER JOIN
(SELECT Emp_No,
Sum(Reg_Hours) AS ClockedRegHours,
Sum(OT_Hrs) AS ClockedOTHours
FROM PayrollTime
GROUP BY Emp_No) t
ON p.Employee_No = t.Emp_No
Alternatively, with stored queries (each one saving one of the aggregate subqueries above), which can sometimes be more efficient with the Access engine:
SELECT p.Employee_No, p.PaidRegHours, p.PaidOTHours, t.ClockedRegHours, t.ClockedOTHours
FROM qryPayrollTransactionsAgg p
INNER JOIN qryPayrollTimeAgg t
ON p.Employee_No = t.Emp_No
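To make the multiplication concrete, here is a small sketch using SQLite from Python rather than Access. The rows are invented: one employee with 2 payroll transactions totalling 200 regular hours and 5 time records totalling 200 clocked hours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data for a single employee.
    CREATE TABLE PayrollTransactions (Employee_No INTEGER, Regular_Hours INTEGER, Overtime_Hours INTEGER);
    CREATE TABLE PayrollTime (Emp_No INTEGER, Reg_Hours INTEGER, OT_Hrs INTEGER);
    INSERT INTO PayrollTransactions VALUES (1, 80, 5), (1, 120, 0);
    INSERT INTO PayrollTime VALUES (1, 40, 1), (1, 40, 1), (1, 40, 1), (1, 40, 1), (1, 40, 1);
    """
)

# Joining the raw tables pairs each of the 2 transactions with each of the
# 5 time rows (10 pairings), so both sums are inflated.
naive = conn.execute(
    """
    SELECT Emp_No, SUM(Regular_Hours), SUM(Reg_Hours)
    FROM PayrollTransactions, PayrollTime
    WHERE Employee_No = Emp_No
    GROUP BY Emp_No
    """
).fetchone()

# Aggregating each table first leaves one row per employee on each side,
# so the join is one-to-one.
fixed = conn.execute(
    """
    SELECT p.Employee_No, p.PaidRegHours, t.ClockedRegHours
    FROM (SELECT Employee_No, SUM(Regular_Hours) AS PaidRegHours
          FROM PayrollTransactions GROUP BY Employee_No) AS p
    INNER JOIN (SELECT Emp_No, SUM(Reg_Hours) AS ClockedRegHours
                FROM PayrollTime GROUP BY Emp_No) AS t
            ON p.Employee_No = t.Emp_No
    """
).fetchone()

print(naive)  # (1, 1000, 400)
print(fixed)  # (1, 200, 200)
```

Each paid-hours row is repeated once per time row (200 x 5 = 1000) and each clocked-hours row once per transaction (200 x 2 = 400), which is the same pattern as the 200 versus 1,000 in the question.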
I'm trying to figure out a workaround for the fact that Hive doesn't support correlated subqueries. So far, I've been counting how many items exist in the data each week over the last month, and now I want to know how many items dropped out this week, came back, or are totally new. It wouldn't be too hard if I could use a subquery in the WHERE clause, but I'm having a tough time thinking of a workaround without it.
Select
count(distinct item)
From data
where item in (Select item from data where date <= ("2016-05-10"))
And date between "2016-05-01" and getdate()
Any help would be great. Thank you.
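A workaround is to left join the two result sets and keep only the rows where the second result set's column is null. Note that the null check has to go in a WHERE clause, not in the join condition; inside ON it would never match and every row from the first set would be counted. This counts the items that are in the first set but not in the second; to emulate the IN subquery instead, filter on b.item is not null (or use an inner join).
ie
Select count(a.item)
from
(select distinct item from data where date between "2016-05-01" and getdate()) a
left join (Select distinct item from data where date <= ("2016-05-10")) b
on a.item = b.item
where b.item is null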
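A minimal sketch of the left-join-and-filter-on-null pattern, run against SQLite from Python rather than Hive. The rows, the date parameters and the function name new_items are assumptions for illustration; the null test is placed in WHERE so that only unmatched rows survive.

```python
import sqlite3

def new_items(conn, cutoff, window_start, window_end):
    """Items seen in [window_start, window_end] that never appear on or before cutoff."""
    return [
        row[0]
        for row in conn.execute(
            """
            SELECT a.item
            FROM (SELECT DISTINCT item FROM data WHERE date BETWEEN ? AND ?) AS a
            LEFT JOIN (SELECT DISTINCT item FROM data WHERE date <= ?) AS b
                   ON a.item = b.item
            WHERE b.item IS NULL
            ORDER BY a.item
            """,
            (window_start, window_end, cutoff),
        )
    ]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data: 'a' returns, 'b' is new, 'c' dropped out.
    CREATE TABLE data (item TEXT, date TEXT);
    INSERT INTO data VALUES
        ('a', '2016-04-20'), ('a', '2016-05-12'),
        ('b', '2016-05-15'),
        ('c', '2016-04-25');
    """
)
fresh = new_items(conn, "2016-05-10", "2016-05-11", "2016-05-31")
print(fresh)  # ['b']
```

Swapping which subquery sits on the left side of the join gives the items that dropped out instead of the new ones.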
I am trying to perform a cumulative sum of values in SQLite. I initially only needed to sum a single column and had the code
SELECT
t.MyColumn,
(SELECT Sum(r.KeyColumn1) FROM MyTable as r WHERE r.Date < t.Date)
FROM MyTable as t
Group By t.Date;
which worked fine.
Now I want to extend this to more columns, say KeyColumn2 and KeyColumn3. Instead of adding more SELECT subqueries, I thought it would be better to use a join and wrote the following:
SELECT
t.MyColumn,
Sum(r.KeyColumn1),
Sum(r.KeyColumn2),
Sum(r.KeyColumn3)
FROM MyTable as t
Left Join MyTable as r On (r.Date < t.Date)
Group By t.Date;
However this does not give me the correct answer (instead it gives values that are much larger than expected). Why is this and how could I correct the JOIN to give me the correct answer?
You are likely getting what I would call mini-Cartesian products: your Date values are probably not unique and, as a result of the self-join, you are getting matches for each of the non-unique values. After grouping by Date the results are just multiplied accordingly.
To solve this, the left side of the join must be rid of duplicate dates. One way is to derive a table of unique dates from your table:
SELECT DISTINCT Date
FROM MyTable
and use it as the left side of the join:
SELECT
t.Date,
Sum(r.KeyColumn1),
Sum(r.KeyColumn2),
Sum(r.KeyColumn3)
FROM (SELECT DISTINCT Date FROM MyTable) as t
Left Join MyTable as r On (r.Date < t.Date)
Group By t.Date;
I noticed that you used t.MyColumn in the SELECT clause, while your grouping was by t.Date. If that was intentional, you may be relying on undefined behaviour there, because the t.MyColumn value would probably be chosen arbitrarily among the (potentially) many in the same t.Date group.
For the purpose of this example, I assumed that you actually meant t.Date, so, I replaced the column accordingly, as you can see above. If my assumption was incorrect, please clarify.
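Here is a small sketch of both versions side by side, using Python's sqlite3 module. The four sample rows are invented, with a deliberately duplicated date so the inflation shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data; 2020-01-02 appears twice on purpose.
    CREATE TABLE MyTable (Date TEXT, KeyColumn1 INTEGER, KeyColumn2 INTEGER, KeyColumn3 INTEGER);
    INSERT INTO MyTable VALUES
        ('2020-01-01', 1, 10, 100),
        ('2020-01-02', 2, 20, 200),
        ('2020-01-02', 3, 30, 300),
        ('2020-01-03', 4, 40, 400);
    """
)

QUERY = """
    SELECT t.Date, SUM(r.KeyColumn1), SUM(r.KeyColumn2), SUM(r.KeyColumn3)
    FROM {left_side} AS t
    LEFT JOIN MyTable AS r ON r.Date < t.Date
    GROUP BY t.Date
    ORDER BY t.Date
"""

# Self-join on the raw table: the two rows dated 2020-01-02 each match the
# 2020-01-01 row, so that group's sums are doubled.
naive = conn.execute(QUERY.format(left_side="MyTable")).fetchall()

# Left side reduced to unique dates: each earlier row is counted once per date.
fixed = conn.execute(
    QUERY.format(left_side="(SELECT DISTINCT Date FROM MyTable)")
).fetchall()

print(naive[1])  # ('2020-01-02', 2, 20, 200)
print(fixed[1])  # ('2020-01-02', 1, 10, 100)
```

The first date has no earlier rows, so its sums come back as NULL (None in Python) in both versions.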
Your join is not working because it finds many more matches than your subselect does.
The join is exploding your table.
The subselect computes one sum over all records whose date is lower than that of the current record.
The join pairs every row with each record that has a lower date, so a single record can be joined as many times as there are records with a lower date. This produces multiple rows per record and, in the end, a higher SUM.
If you want the sums of multiple columns, you will have to use three subqueries or define a unique join.
Is it possible to join on a field that isn't in a table, but is derived from it?
For example, say I have one table mapping calendar dates to data and another mapping days of the week (0-6) to data. How would one join the calendar dates table to the days of week table without adding a "day of week" field to the former?
Try something like this:
select
a.one+a.two, b.three
from TableA a
inner join TableB b on a.one+a.two=b.three
Just put your calculation in the join condition; index usage is unlikely, though. You don't say which database you use, but if it has a function that returns the weekday of a date, you can join on that:
inner join TableB on weekday(a.EventDate)=b.Weekday
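As one concrete instance, here is a sketch in SQLite driven from Python, where strftime('%w', ...) returns the weekday as text from '0' (Sunday) to '6' (Saturday). The table names calendar_data and weekday_data and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: one keyed by calendar date, one by weekday number.
    CREATE TABLE calendar_data (event_date TEXT, note TEXT);
    CREATE TABLE weekday_data (weekday INTEGER, label TEXT);
    INSERT INTO calendar_data VALUES ('2024-01-01', 'new year'), ('2024-01-07', 'first sunday');
    INSERT INTO weekday_data VALUES (0, 'Sunday'), (1, 'Monday');
    """
)

# The weekday is derived inside the join condition; no extra column is stored.
rows = conn.execute(
    """
    SELECT c.event_date, w.label
    FROM calendar_data AS c
    INNER JOIN weekday_data AS w
            ON CAST(strftime('%w', c.event_date) AS INTEGER) = w.weekday
    ORDER BY c.event_date
    """
).fetchall()
print(rows)  # [('2024-01-01', 'Monday'), ('2024-01-07', 'Sunday')]
```

The CAST keeps the comparison between two integers instead of relying on implicit text-to-number conversion.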
If you're using SQL Server, you can use the DATEPART function to get the day of the week (1-7) for a particular date. You should be able to join the date column to your day-of-the-week number using this function:
select * from
t1 inner join t2 on
DATEPART(weekday,t1.dateColumnName) = t2.dayOfTheWeek
A gotcha, though: the number returned varies depending on which day of the week is set as the first in your SQL Server settings (SET DATEFIRST).
Sure, why not.
select foo.dayofweek, bar.date from foo
join bar on datepart(dw, bar.date) = foo.dayofweek
I don't think this will leverage your indexes, though, as the other guy said.