I've been pulling data from several different databases to create a summarised table of information all hinging on specific columns that they would have in common.
All of the tables have 3 columns in common:
Year
Month
Client
Other than this, they are a mixture of counts, sums, calculations and general queries on various aspects of a client. I'm trying to map out a basic summary page showing how each client is doing. My dream was to pull all of this into a centralised DB, with the detailed information kept intact in tables, then to have one view per table summarising it, and finally a summary table/view grouping all the views by year/month/client.
However, I'm struggling to put everything together. I've got raw data in the tables like:
Ordernumber / Lines/ Client/Year/Month
with the view doing:
Count of orders / sum of lines / client/year/month.
However, because the views have different columns, I can't do something like a UNION.
Example data (of the views)
View1
Year Month Count Sum ClientCode
2017 May 18 146 A
2017 May 7 110 B
2017 May 2 17 C
View2
Year Month CountOfOrders CountOfFiles SumOfLines ClientCode
2017 May 8 2 140 A
2017 May 7 6 25 B
Dream goal would be:
Year Month ClientCode Count Sum CountOfOrders CountOfFiles SumOfLines
2017 May A 18 146 8 2 140
2017 May B 7 110 7 6 25
2017 May C 2 17 0 0 0
Any advice would be great. I've tried doing a UNION ALL, so that I could filter all tables at once with something like WHERE Year = 2017 AND Month = 'May', but realised that UNIONs won't work as they merge rows, not columns.
You can join views just like tables... Seems a LEFT JOIN is what you want here, with COALESCE() to handle nulls:
SELECT V1.Year, V1.Month, V1.ClientCode, V1.Count, V1.Sum,
COALESCE(V2.CountOfOrders,0) AS CountOfOrders, COALESCE(V2.CountOfFiles,0) AS CountOfFiles, COALESCE(V2.SumOfLines,0) AS SumOfLines
FROM View1 V1
LEFT JOIN View2 V2 ON V1.Year = V2.Year
AND V1.Month = V2.Month
AND V1.ClientCode = V2.ClientCode
One thing to note: you will need more logic if there are Year/Month/ClientCode combinations that exist in View2 but not in View1, because the LEFT JOIN above drops them.
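One way to cover combinations that exist in only one view is to build the list of keys from both views first and LEFT JOIN each view onto it. The sketch below is not the answerer's code: it uses Python's sqlite3 module with plain tables standing in for the views, renames View1's Count and Sum columns to Cnt and Total (an assumption, to avoid clashing with function names), and adds a hypothetical client D row that exists only in View2.

```python
import sqlite3


def combined_summary():
    """Join two summary 'views' on Year/Month/ClientCode, keeping keys from both."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE View1 (Year INT, Month TEXT, Cnt INT, Total INT, ClientCode TEXT);
        CREATE TABLE View2 (Year INT, Month TEXT, CountOfOrders INT,
                            CountOfFiles INT, SumOfLines INT, ClientCode TEXT);
        INSERT INTO View1 VALUES (2017,'May',18,146,'A'),
                                 (2017,'May',7,110,'B'),
                                 (2017,'May',2,17,'C');
        INSERT INTO View2 VALUES (2017,'May',8,2,140,'A'),
                                 (2017,'May',7,6,25,'B'),
                                 (2017,'May',1,1,5,'D');  -- hypothetical View2-only row
    """)
    rows = con.execute("""
        SELECT K.Year, K.Month, K.ClientCode,
               COALESCE(V1.Cnt, 0), COALESCE(V1.Total, 0),
               COALESCE(V2.CountOfOrders, 0), COALESCE(V2.CountOfFiles, 0),
               COALESCE(V2.SumOfLines, 0)
        FROM (SELECT Year, Month, ClientCode FROM View1
              UNION                      -- UNION removes duplicate keys
              SELECT Year, Month, ClientCode FROM View2) K
        LEFT JOIN View1 V1 ON V1.Year = K.Year AND V1.Month = K.Month
                          AND V1.ClientCode = K.ClientCode
        LEFT JOIN View2 V2 ON V2.Year = K.Year AND V2.Month = K.Month
                          AND V2.ClientCode = K.ClientCode
        ORDER BY K.ClientCode
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in combined_summary():
        print(row)
```

Databases that support FULL OUTER JOIN can express the same thing more directly; the key-list form is used here only because it works on engines that lack it.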
I've changed my DB structure to make it more future proof. Now I'm having trouble with the new select query.
I have a table called activities that has a list of activities and how many steps per minute each activity is worth. The table was structured like this:
Activities
id act_name act_steps
12 Boxing 250
14 Karate 300
17 Yoga 89
I have another table called distance that is structured like this:
Distance
id dist_activity_id dist_activity_duration member_id
1 12 60 12
2 14 90 12
3 17 30 12
I have a query that would SUM and produce a total for all activities in the distance table:
SELECT ROUND(SUM(act_steps * dist_activity_duration / 2000),2) AS total_miles
FROM distance,
activities
WHERE activities.id = distance.dist_activity_id
This worked fine.
To future-proof it in case the number of steps for an activity changes, I've set up a table called steps that is structured like this:
Steps
id activity_steps
1 6
2 250
3 300
4 89
I then updated the activities table, removing the act_steps column and replacing it with steps_id so it now looks like this:
Updated activities
id act_name steps_id
12 Boxing 2
14 Karate 3
17 Yoga 4
I'm not sure how to create the select command to get the SUM using the new structure.
Could someone please help me with this?
Thanks
Wayne
Learn to use proper JOIN syntax! Your original query (against the old structure) should look like:
SELECT ROUND(SUM(a.act_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
activities a
ON a.id = d.dist_activity_id;
If you need to look up the steps, then add another JOIN:
SELECT ROUND(SUM(s.activity_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
activities a
ON a.id = d.dist_activity_id JOIN
steps s
ON s.id = a.steps_id;
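To check the three-table join against the sample rows from the question, here is a small sketch using Python's sqlite3 module. One assumption to flag: SQLite performs integer division when both operands are integers, so the sketch divides by 2000.0 instead of 2000; MySQL's / operator returns a fractional result either way.

```python
import sqlite3


def total_miles():
    """Sum steps * duration / 2000 across distance -> activities -> steps."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE steps (id INTEGER PRIMARY KEY, activity_steps INT);
        CREATE TABLE activities (id INTEGER PRIMARY KEY, act_name TEXT, steps_id INT);
        CREATE TABLE distance (id INTEGER PRIMARY KEY, dist_activity_id INT,
                               dist_activity_duration INT, member_id INT);
        INSERT INTO steps VALUES (1,6),(2,250),(3,300),(4,89);
        INSERT INTO activities VALUES (12,'Boxing',2),(14,'Karate',3),(17,'Yoga',4);
        INSERT INTO distance VALUES (1,12,60,12),(2,14,90,12),(3,17,30,12);
    """)
    (miles,) = con.execute("""
        SELECT ROUND(SUM(s.activity_steps * d.dist_activity_duration / 2000.0), 2)
        FROM distance d
        JOIN activities a ON a.id = d.dist_activity_id
        JOIN steps s      ON s.id = a.steps_id
    """).fetchone()
    con.close()
    return miles


if __name__ == "__main__":
    # Unrounded total: (250*60 + 300*90 + 89*30) / 2000 = 22.335
    print(total_miles())
```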
I would like to join together two tables with additional columns.
First table is for number of products despatched by product
** Table 1 - Despatches **
Month ProductID No_despatched
Jan abc 10
Jan def 15
Jan xyz 12
The second table is for the number of products returned by product, but also an additional column by return reason
** Table 2 - Returns **
Month ProductID No_returned Return_reason
Jan abc 2 Too big
Jan abc 3 Too small
Jan xyz 1 Wrong colour
I would like to join the tables to show returns and despatched on the same row with the number of despatched being duplicated if there are multiple return reasons for the same product.
** Desired output **
Month ProductID No_despatched No_returned Return_reason
Jan abc 10 2 Too big
Jan abc 10 3 Too small
Jan xyz 12 1 Wrong colour
Hope this makes sense...
Thanks in advance!
afk
This seems like a basic JOIN:
select r.month, r.productid, d.no_despatched, r.no_returned, r.return_reason
from returns r join
despatches d
on r.month = d.month and r.productid = d.productid;
The results don't seem particularly useful, because some products are missing (those with no returns). And the amounts are duplicated if there is more than one return record.
Just use a join:
select a.*, b.No_returned, b.Return_reason
from table1 a join table2 b on a.ProductID = b.ProductID
and a.month = b.month
In case of duplicates you may use DISTINCT.
Changing the order of the clauses in your question produces the result.
-- "with additional columns"
SELECT Table1.Month, Table1.ProductID, Table1.No_despatched, Table2.No_returned, Table2.Return_reason
-- "join two tables"
FROM Table1 LEFT JOIN Table2
ON Table1.Month=Table2.Month AND Table1.ProductID=Table2.ProductID
We use a LEFT JOIN because, presumably, a product can be despatched without being returned, but nobody can return a product you didn't send out.
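The difference between the inner JOIN and the LEFT JOIN answers shows up on the sample rows: product def has no returns, so it appears only in the LEFT JOIN result (with NULLs). A minimal sketch using Python's sqlite3 module, with the table names despatches and returns assumed from the first answer:

```python
import sqlite3

SETUP = """
    CREATE TABLE despatches (Month TEXT, ProductID TEXT, No_despatched INT);
    CREATE TABLE returns (Month TEXT, ProductID TEXT, No_returned INT,
                          Return_reason TEXT);
    INSERT INTO despatches VALUES ('Jan','abc',10),('Jan','def',15),('Jan','xyz',12);
    INSERT INTO returns VALUES ('Jan','abc',2,'Too big'),
                               ('Jan','abc',3,'Too small'),
                               ('Jan','xyz',1,'Wrong colour');
"""

QUERY = """
    SELECT d.Month, d.ProductID, d.No_despatched, r.No_returned, r.Return_reason
    FROM despatches d
    {join} returns r ON r.Month = d.Month AND r.ProductID = d.ProductID
    ORDER BY d.ProductID, r.No_returned
"""


def despatches_with_returns(keep_unreturned):
    """Return joined rows; keep_unreturned=True uses LEFT JOIN, else inner JOIN."""
    join = "LEFT JOIN" if keep_unreturned else "JOIN"  # fixed keywords, not user input
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY.format(join=join)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(despatches_with_returns(False))  # matches the desired output in the question
    print(despatches_with_returns(True))   # additionally includes product def
```

The inner JOIN reproduces the desired output exactly as posted; the LEFT JOIN is the safer choice if unreturned products should stay visible.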
Suppose that I have a table in a SQL database with columns like the ones shown below. The table records various performance metrics of the employees in my company each month.
I can easily query the table so that I can see the best monthly sales figures that my employees have ever obtained, along with which employee was responsible and which month the figure was obtained in:
SELECT * FROM EmployeePerformance ORDER BY Sales DESC;
NAME MONTH SALES COMMENDATIONS ABSENCES
Karen Jul 16 36,319.13 2 0
David Feb 16 35,398.03 2 1
Martin Nov 16 33,774.38 1 1
Sandra Nov 15 33,012.55 4 0
Sandra Mar 16 31,404.45 1 0
Karen Sep 16 30,645.78 2 2
David Feb 16 29,584.81 1 1
Karen Jun 16 29,030.00 3 0
Stuart Mar 16 28,877.34 0 1
Karen Nov 15 28,214.42 1 2
Martin May 16 28,091.99 3 0
This query is very simple, but it's not quite what I want. How would I need to change it if I wanted to see only the top 3 monthly figures achieved by each employee in the result set?
To put it another way, I want to write a query that is the same as the one above, but if any employee would appear in the result set more than 3 times, then only their top 3 results should be included, and any further results of theirs should be ignored. In my sample query, Karen's figure from Nov 15 would no longer be included, because she already has three other figures higher than that according to the ordering "ORDER BY Sales DESC".
The specific SQL database I am using is either SQLite or, if what I need is not possible with SQLite, then MySQL.
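In MySQL 8.0 or later you can use a window function. It cannot appear directly in the WHERE clause, so rank the rows per employee in a subquery and filter on the rank outside it:
SELECT Name, Month, Sales, Commendations, Absences
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Sales DESC) AS rn
      FROM EmployeePerformance e) ranked
WHERE rn <= 3
ORDER BY Sales DESC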
In SQLite versions before 3.25, window functions aren't available, but you can still count the preceding rows:
SELECT *
FROM EmployeePerformance e
WHERE
(SELECT COUNT(*)
FROM EmployeePerformance ee
WHERE ee.Name=e.Name and ee.Sales>e.Sales)<3
ORDER BY e.Sales DESC
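As a quick check of the counting approach, the sketch below loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module and runs the correlated subquery. The Sales values are stored as plain numbers without thousands separators, which is an assumption about how the real column is typed.

```python
import sqlite3

ROWS = [
    ("Karen", "Jul 16", 36319.13, 2, 0), ("David", "Feb 16", 35398.03, 2, 1),
    ("Martin", "Nov 16", 33774.38, 1, 1), ("Sandra", "Nov 15", 33012.55, 4, 0),
    ("Sandra", "Mar 16", 31404.45, 1, 0), ("Karen", "Sep 16", 30645.78, 2, 2),
    ("David", "Feb 16", 29584.81, 1, 1), ("Karen", "Jun 16", 29030.00, 3, 0),
    ("Stuart", "Mar 16", 28877.34, 0, 1), ("Karen", "Nov 15", 28214.42, 1, 2),
    ("Martin", "May 16", 28091.99, 3, 0),
]


def top_three_per_employee():
    """Keep a row only if fewer than 3 rows of the same employee beat its Sales."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE EmployeePerformance
                   (Name TEXT, Month TEXT, Sales REAL,
                    Commendations INT, Absences INT)""")
    con.executemany("INSERT INTO EmployeePerformance VALUES (?,?,?,?,?)", ROWS)
    rows = con.execute("""
        SELECT e.Name, e.Month, e.Sales
        FROM EmployeePerformance e
        WHERE (SELECT COUNT(*)
               FROM EmployeePerformance ee
               WHERE ee.Name = e.Name AND ee.Sales > e.Sales) < 3
        ORDER BY e.Sales DESC
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in top_three_per_employee():
        print(row)
```

Note that ties behave differently from ROW_NUMBER(): if several rows of one employee share the same Sales value, this form can return more than three rows for that employee.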
I have managed to find an answer myself. It seems to work by pairing each record up with all of the records from the same person that were equal or greater, and then choosing only the (left) records that had no more than 3 greater-or-equal pairings.
SELECT P.Name, P.Month, P.Sales, P.Commendations, P.Absences
FROM Performance P
LEFT JOIN Performance P2 ON (P.Name = P2.Name AND P.Sales <= P2.Sales)
GROUP BY P.Name, P.Month, P.Sales, P.Commendations, P.Absences
HAVING COUNT(*) <= 3
ORDER BY P.Sales DESC;
I will give the credit to a_horse_with_no_name for adding the tag "greatest-n-per-group", as I would have had no idea what to search for otherwise, and by looking through other questions with this tag I managed to find what I wanted.
I found this question that was similar to mine... Using LIMIT within GROUP BY to get N results per group?
And I followed this link that somebody had included in a comment... https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
...and the answer I wanted was in the first comment on that article. It's perfect as it uses only a LEFT JOIN, so it will work in SQLite.
Here is my SQL Fiddle: http://sqlfiddle.com/#!7/580f0/5/0
I have user data:
user store item cost
1 10 100 5
1 10 101 3
1 11 102 7
2 10 101 3
2 12 103 4
2 12 104 5
I want a table which will tell me for each user how much he bought from each store and how much he bought in total:
user store cost_this_store cost_total
1 10 8 15
1 11 7 15
2 10 3 12
2 12 9 12
I can do this with two group by and a join:
select s.user, s.store, s.cost_this_store, u.cost_total
from (select user, store, sum(cost) as cost_this_store
from my_data
group by user, store) s
join (select user, sum(cost) as cost_total
from my_data
group by user) u
on s.user = u.user
However, this is definitely not how I would do this if I were writing it in any other language (the join is clearly avoidable, and the two GROUP BYs are not independent).
Is it possible to avoid the join in SQL?
PS. I need the solution to work in Hive.
You can do this with a windowing function... which Hive added support for last year:
select distinct
user,
store,
sum(cost) over (partition by user, store) as cost_this_store,
sum(cost) over (partition by user) as cost_total
from my_data
However, I'd argue that there wasn't anything glaringly wrong with your original implementation. You've essentially got two different sets of data, which you're combining through a JOIN.
The duplication might look like a code smell in a different language, but this isn't necessarily the wrong approach in SQL, and often you'll have to take approaches such as this that duplicate a portion of a query between two intermediate result sets for performance reasons.
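To see that the two formulations agree, here is a sketch that runs both against the sample rows. It uses Python's sqlite3 module as a stand-in for Hive (an assumption: window functions need SQLite 3.25 or later), and renames the user column to user_id to stay clear of reserved words in other engines.

```python
import sqlite3

DATA = [(1, 10, 100, 5), (1, 10, 101, 3), (1, 11, 102, 7),
        (2, 10, 101, 3), (2, 12, 103, 4), (2, 12, 104, 5)]

WINDOW_SQL = """
    SELECT DISTINCT user_id, store,
           SUM(cost) OVER (PARTITION BY user_id, store) AS cost_this_store,
           SUM(cost) OVER (PARTITION BY user_id)        AS cost_total
    FROM my_data
    ORDER BY user_id, store
"""

JOIN_SQL = """
    SELECT s.user_id, s.store, s.cost_this_store, u.cost_total
    FROM (SELECT user_id, store, SUM(cost) AS cost_this_store
          FROM my_data GROUP BY user_id, store) s
    JOIN (SELECT user_id, SUM(cost) AS cost_total
          FROM my_data GROUP BY user_id) u
      ON s.user_id = u.user_id
    ORDER BY s.user_id, s.store
"""


def per_store_and_total(sql):
    """Run one of the two queries above against the sample purchases."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_data (user_id INT, store INT, item INT, cost INT)")
    con.executemany("INSERT INTO my_data VALUES (?,?,?,?)", DATA)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(per_store_and_total(WINDOW_SQL))
    print(per_store_and_total(JOIN_SQL))
```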
SQL Fiddle (SQL Server)
I hope that someone can help me with my issue. I need to produce the following in a single SELECT statement (the system that we use has some pivot tables in Excel that can handle only one SELECT):
I have an INL (Invoice Lines) table that has a lot of fields, but the important one is the date.
INL_ID DATE
19 2004-03-15 00:00:00.000
20 2004-03-15 00:00:00.000
21 2004-03-15 00:00:00.000
22 2004-03-16 00:00:00.000
23 2004-03-16 00:00:00.000
24 2004-03-16 00:00:00.000
I also have an ILD (Invoice Line Details) table that is related to the INL table by an ID field. From this second table I need to use the scu_qty field to "repeat" values from the first one in my results sheet.
The ILD table values that we need are:
INL_ID scu_qty
19 1
20 1
21 1
22 4
23 4
Using scu_qty, I need to repeat the row from the first table and add one day for each repeated record; scu_qty is the number of days of the service that we sell in the ILD table.
So I need to get something like the following (I'm showing INL_ID 22, which has an SCU_QTY value other than 1). The SELECT should return something like:
INL_ID DATE
22 2004-03-15 0:00:00
22 2004-03-16 0:00:00
22 2004-03-17 0:00:00
22 2004-03-18 0:00:00
Here I have only listed the fields that need to be repeated and calculated. I will need more fields, but they are simply repeated from the INL table, so I have left them out to avoid confusion.
I hope that someone can help me with this; this report is very important for us. Thanks a lot in advance.
(Sorry for my English, it isn't my first language.)
SELECT INL_ID, scu_qty, CalculatedDATE ...
FROM INL
INNER JOIN ILD ON ...
INNER JOIN SequenceTable ON SequenceTable.seqNo <= ILD.scu_qty
ORDER BY INL_ID, SequenceTable.seqNo
Depending on your SQL flavour, you will need to look up the date manipulation functions to compute:
CalculatedDATE = {INL.DATE + SequenceTable.seqNo (days)}
select INL.INL_ID, `DATE`
from
INL
inner join
ILD on INL.INL_ID = ILD.INL_ID
inner join (
select 1 as qty union select 2 union select 3 union select 4
) s on s.qty <= ILD.scu_qty
order by INL.INL_ID
Instead of that subselect you will need a numbers table if the quantity can get much bigger. Or tell us which RDBMS you use, as there may be an easier way.
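Both answers rely on joining to a sequence of numbers and leave the date arithmetic open. The sketch below fills that in for SQLite only, via Python's sqlite3 module: a recursive CTE named seq (a hypothetical name) generates 1..MAX(scu_qty), and SQLite's date() function adds the offset. It assumes the first repeated row keeps the INL date unchanged (offset 0); other databases need their own date functions.

```python
import sqlite3


def expand_invoice_days(inl_id):
    """Repeat an invoice line scu_qty times, adding one day per repetition."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE INL (INL_ID INTEGER PRIMARY KEY, "DATE" TEXT);
        CREATE TABLE ILD (INL_ID INT, scu_qty INT);
        INSERT INTO INL VALUES (19,'2004-03-15'),(20,'2004-03-15'),(21,'2004-03-15'),
                               (22,'2004-03-16'),(23,'2004-03-16'),(24,'2004-03-16');
        INSERT INTO ILD VALUES (19,1),(20,1),(21,1),(22,4),(23,4);
    """)
    rows = con.execute("""
        WITH RECURSIVE seq(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM seq WHERE n < (SELECT MAX(scu_qty) FROM ILD)
        )
        SELECT INL.INL_ID,
               date(INL."DATE", '+' || (seq.n - 1) || ' days') AS service_date
        FROM INL
        JOIN ILD ON ILD.INL_ID = INL.INL_ID
        JOIN seq ON seq.n <= ILD.scu_qty   -- one output row per day of service
        WHERE INL.INL_ID = ?
        ORDER BY seq.n
    """, (inl_id,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in expand_invoice_days(22):
        print(row)
```

With the sample tables as posted, INL_ID 22 is dated 2004-03-16, so the sketch produces four consecutive days starting from that date.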