SQL query over several tables - sql

I am trying to write a query with SQL but I don't know how to implement it.
I have given 4 tables with company_ID, reporting year, type_ID and turnover of the company. The values of each company are queried every year, but it is possible that the company does not have a turnover every year or it is also possible that there is no turnover for a company for the whole time. And and it is valid that T1.CompanyID = T2/T3/T4.CompanyID
So now I want to query when is the latest year, as max(ReportingYear), where a turnover is specified and use this reporting year, turnover and CompanyType. If a company had no sales at all over time, the most recent ReportingYear and the associated CompanyType.
I would need these values in the final query:
T1.Company_ext,
T2.taxID,
T3.ReportingYear,
T3.CompanyType,
T4.Turnover,
T4.ReportingYear
With this query I got the companies that ever had any turnover at all.
SELECT
T2.taxID, max(T4.ReportingYear) AS Max_Report
FROM T1
LEFT JOIN T2 ON T1.CompanyID = T2.CompanyIDc.Id
LEFT JOIN T3 ON T1.CompanyID = T3.CompanyIDc.Id
LEFT JOIN T4 ON T1.CompanyID = T4.CompanyIDc.Id AND T4.ReportingYear = T3.ReportingYear
WHERE T4.Turnover IS NOT NULL
GROUP BY T2.taxID
My thought now was that the total quantity - quantity of the above query, must be the ones that had no turnover, however this query runs way too long. Does anyone know a faster way to get the result I want?
The tables look as follows: (In T4 the values I want are marked)
T1:
CompanyID
CompanyId_ext
123
5
124
6
125
7
T2:
CompanyID
taxID
123
1852
124
1935
125
1354
T3:
CompanyID
AttributeID
ReportingYear
CompanyType
123
857269
2022
Public
123
857269
2021
Not Public
124
857270
2022
Not Public
124
857270
2021
Public
125
857271
2022
Not Public
125
857271
2021
Not Public
T4:
CompanyID
Turnover
ReportingYear
123
1000
2022
123
NULL
2021
124
NULL
2022
124
500
2021
125
NULL
2022
125
NULL
2021

If you want to get the CompanyID, the CompanyType and the highest ReportingYear with a Turnover (meaning Turnover != NULL), then this is how you could implement it:
SELECT T4.CompanyID, MAX(T4.ReportingYear), T3.CompanyType
FROM T4
JOIN T3 ON T4.CompanyID = T3.CompanyID
WHERE T4.Turnover != NULL

Related

SQL Querying of Data by grouping with only one main variable(Store) and finding the percentage of customers in other variable

Tables - Store
Stores
Date
Customer_ID
A
01/01/2020
1111
C
01/01/2020
1111
F
02/01/2020
1234
A
02/01/2020
1111
A
02/01/2020
2222
Tables - Customer
Customer_ID
Age_Group
Income_Level
1111
26-30
Low
1234
25 and below
Mid
2222
31-60
High
I want to know how I can get this output.
Stores
Age_Group
Percentage_by_Age
Income_Level
Percentage_By_Income
A
25 and below
10
Low
80
A
25 and below
10
Mid
10
A
25 and below
10
High
10
A
26 - 30
42
Low
15
A
26 - 30
42
Mid
65
A
26 - 30
42
High
20
A
31 - 60
48
Low
30
A
31 - 60
48
Mid
50
A
31 - 60
48
High
20
I am using SQL to query from different tables.
First I need to aggregate the number of customers by stores, then in each store, I want to find out how many customers visited Store A in a particular age group(25 and below), and how many of them are in which income level.
May I know how I can go about solving this query?
Thanks.
My current solution/thought process
SELECT
stores AS Stores,
Age_Group AS Age,
Income_Level AS Income
COUNT(DISTINCT(Customer_ID)) AS Number_of_Customers
FROM tables JOIN tables....
GROUP BY Stores, Ages, Income;
And then manually calculating the percentages.
But it doesn't seem right.
Is there a way to produce an example output table using just SQL?
As per your requirement, Common Table Expressions can be used . You can use below code to get the expected output.
WITH
data_for_percent_by_income AS (
SELECT
COUNT(customer_id) AS cus_count_in_per_income_level_and_agegrp,
Age_group AS age_g,income_level AS inc_lvl
FROM
`project.dataset.Customer2`
WHERE
customer_id IN (
SELECT customer_id
FROM
`project.dataset.Store5`
WHERE stores='A')
GROUP BY
Age_group,income_level),tot_cus_in_defined_income_level AS (
SELECT
COUNT(customer_id) AS cus_count_in_per_income_level,Age_group AS ag
FROM
`project.dataset.Customer2`
WHERE
customer_id IN (
SELECT
customer_id
FROM
`project.dataset.Store5`
WHERE stores='A')
GROUP BY
Age_group),
tot_cus_storeA AS(
SELECT
COUNT(*) AS tot_cus_in_A
FROM
`project.dataset.Customer2`
WHERE customer_id IN (
SELECT customer_id
FROM
`project.dataset.Store5`
WHERE stores='A') ),
final_view AS(
SELECT
ROUND(cus_count_in_per_income_level_and_agegrp*100/cus_count_in_per_income_level) AS p_by_inc,
age_g,inc_lvl
FROM
data_for_percent_by_income
INNER JOIN
tot_cus_in_defined_income_level
ON
data_for_percent_by_income.age_g=tot_cus_in_defined_income_level.ag )
SELECT
stores,tot_cus_in_defined_income_level.ag AS age_group,income_level,
ROUND(cus_count_in_per_income_level*100/tot_cus_in_A) AS percentage_by_age,
p_by_inc AS percentage_by_income
FROM
tot_cus_in_defined_income_level,tot_cus_storeA,`project.dataset.Customer2`,`project.dataset.Store5`
INNER JOIN
final_view
ON
age_group=final_view.age_g AND income_level=final_view.inc_lvl
WHERE
tot_cus_in_defined_income_level.ag = Age_group AND stores='A'
GROUP BY
stores,percentage_by_age,age_group,income_level,percentage_by_income
ORDER BY Age_group
I have attached the screenshots of the input table and output table.
Customer Table
Store Table
Output Table
SELECT
s.Stores AS Stores,
c.age_group AS Age,
a.income_level AS Affluence,
CAST(COUNT(DISTINCT c.Customer_ID) AS numeric)*100/SUM(CAST(COUNT(DISTINCT c.Customer_ID) AS numeric)) OVER(PARTITION BY s.Stores ) AS Perc_of_Members
This is what I did in the end.

New to SQL - How to compare data from two sets of joined tables

I'm trying to compare a set of multiple tables against another set of multiple tables in a SQL database. Admittedly, I'm very novice to SQL so please forgive me if my terminology is incorrect.
I have 6 individual tables in my database. 3 of those are comprised of current month's data and the other three are prior month's data. I need to produce a report that shows current month data, but also compare the account balance against the prior month - I need to create a column that will list "true" or "false" based on whether the account balances match.
I have joined the three current month's data tables together and I have joined the three prior month's data tables together.
The two sets of tables are identical. Below is a description of the tables - There is one set prefixed "CURRENT" and one prefixed "PRIOR":
BALANCE:
EntityID
AccountID
AccountBalance
ENTITY:
ID
Label
Description
ACCOUNT:
ID
AccountNumber
Description
The report I need to provide should list the following from the current month's data:
Label
Description
AccountNumber
AccountDescription
AccountBalance
I need to add a column called "Changed" at the end of the report. This column should have a "True" or "False" value depending on whether or not the current & prior account balances match.
Thus far, I have only been able to join the tables. I'm unsure how to edit this query to compare the current month dbo.CURRENT_BALANCE.AccountBalance against the prior month's dbo.PRIOR_BALANCE.AccountBalance
SELECT DISTINCT
dbo.CURRENT_ENTITY.Label,
dbo.CURRENT_ENTITY.Description AS Entity,
dbo.CURRENT_ACCOUNT.AccountNumber,
dbo.CURRENT_ACCOUNT.Description AS AccountDescription,
dbo.CURRENT_BALANCE.GLAccountBalance
FROM
dbo.CURRENT_BALANCE
INNER JOIN dbo.CURRENT_ENTITY ON dbo.CURRENT_BALANCE.EntityID = dbo.CURRENT_ENTITY.ID
INNER JOIN dbo.CURRENT_ACCOUNT ON dbo.CURRENT_BALANCE.AccountID = dbo.CURRENT_ACCOUNT.ID
CROSS JOIN dbo.PRIOR_BALANCE
INNER JOIN dbo.PRIOR_ENTITY ON dbo.PRIOR_BALANCE.EntityID = dbo.PRIOR_ENTITY.ID
INNER JOIN dbo.PRIOR_ACCOUNT ON dbo.PRIOR_BALANCE.AccountID = dbo.PRIOR_ACCOUNT.ID
The query above returns my expected results:
Label Entity AccountNumber AccountDescription AccountBalance
21 Company ABC 1 Customer Sales 25
21 Company ABC 2 Customer Sales 568
22 XYZ Solutions 3 Vendor Sales 344
23 Number 1 Products 4 Vendor Sales 565
24 Enterprise Inc 5 Wholesale 334
24 Enterprise Inc 6 Wholesale 5452
24 Enterprise Inc 7 Wholesale 5877
26 QWERTY Solutions 8 Customer Sales 456
27 Acme 9 Customer Sales 752
28 United Product Solutions 10 Vendor Sales 87
What I would like to do is have a result similar to:
Label Entity AccountNumber AccountDescription AccountBalance Changed
21 Company ABC 1 Customer Sales 25 FALSE
21 Company ABC 2 Customer Sales 568 FALSE
22 XYZ Solutions 3 Vendor Sales 344 FALSE
23 Number 1 Products 4 Vendor Sales 565 FALSE
24 Enterprise Inc 5 Wholesale 334 TRUE
24 Enterprise Inc 6 Wholesale 5452 FALSE
24 Enterprise Inc 7 Wholesale 5877 TRUE
26 QWERTY Solutions 8 Customer Sales 456 FALSE
27 Acme 9 Customer Sales 752 FALSE
28 United Product Solutions 10 Vendor Sales 87 FALSE
I have no idea where to go from here. Would appreciate any advise from this group!
The below should work as a way to compare the 2 values.
I have added aliases to table names to help with reading;
SELECT DISTINCT
ce.Label
,ce.Description AS Entity
,ca.AccountNumber
,ca.Description AS AccountDescription
,cb.GLAccountBalance
,CASE
WHEN cb.GLAccountBalance = p.PriorBalance THEN 'False'
WHEN cb.GLAccountBalance <> p.PriorBalance THEN 'True'
END AS [Changed]
FROM dbo.CURRENT_BALANCE cb
INNER JOIN dbo.CURRENT_ENTITY ce ON cb.EntityID = ce.ID
INNER JOIN dbo.CURRENT_ACCOUNT ca ON cb.AccountID = ca.ID
LEFT JOIN ( SELECT DISTINCT
pe.ID AS PE_ID
,pa.ID AS PA_ID
,pb.GLAccountBalance AS PriorBalance
FROM dbo.PRIOR_BALANCE pb
INNER JOIN dbo.PRIOR_ENTITY pe ON pb.EntityID = pe.ID
INNER JOIN dbo.PRIOR_ACCOUNT pa ON pb.AccountID = pa.ID
) p ON p.PE_ID = ce.ID AND p.PA_ID = ca.ID

Join two 2 tables to get sum of each year - SQL Server

I have 2 tables below. Common column is ObjectId.
Table 1:
Year ObjectId G_Amount
===== ======== ========-
2010 1234 5.00
2010 1234 5.00
2010 5678 10.00
2010 5678 20.00
2010 5678 50.00
2012 9101 0.00
2012 1122 70.00
2012 1122 100.00
Table 2:
ObjectId I_Amount
========= ===========-
1234 10.00
1234 30.00
5678 15.00
5678 20.00
9101 25.00
9101 35.00
1122 10.00
1122 0.00
I want to join above 2 table such that the resultset becomes like below. G_Amount (total of g_amount for each year from Table 1) & I_Amount (total of i_amount for each year from Table 2) should be displayed year wise.
Result set:
Year G_Amount I_Amount
===== ======== ========-
2010 90.00 75.00
2012 170.00 70.00
Can anyone please help?
Your datamodel allows for an object ID to appear in multiple years. The related table2 entries would thus be multiplied, i.e. added to every of these years. If you don't want this, then you need a separate table with one record per object.
As is you'd aggregate the first table by year and object ID and the second by object ID only. Then join and aggregate again to get years.
select
t1.year,
sum(t1.totalg) as total_g_amount,
coalesce(sum(t2.totali), 0) as total_i_amount
from
(
select Year, ObjectId, sum(G_Amount) as totalg
from table1
group by Year, ObjectId
) t1
left join
(
select ObjectId, sum(i_Amount) as totali
from table2
group by ObjectId
) t2 on t2.ObjectId = t1.ObjectId
group by t1.year
order by t1.year;
Something like this will work for your dataset. I've included a LEFT JOIN for any values in Table1 that aren't in Table2:
SELECT
T1.[Year]
, SUM(T1.G_Amount_Total) G_Amount
, ISNULL(SUM(T2.I_Amount_Total), 0) I_Amount
FROM
(
SELECT
[Year]
, ObjectId
, SUM(G_Amount) G_Amount_Total
FROM Table1
GROUP BY
[Year]
, ObjectId
) T1
LEFT JOIN
(
SELECT
ObjectId
, SUM(I_Amount) I_Amount_Total
FROM Table2
GROUP BY ObjectId
) T2
ON T1.ObjectId = T2.ObjectId
GROUP BY T1.[Year]
I do see some potential problems with your data structure, e.g, there is no Year reference in Table2, however, if your data is under control, it should all work out.
Try this , anyway i am new to T-SQL so it can be wrong .
Select DISTINCT T1.Year
SUM (T1.G_Amount) as G_Amount,
SUM (T2.I_Amount) AS I_Amount
from table1 T1 join Table2 T2 on T1.ObjectID = T2.ObjectID

Aggregate payments per year per customer per type

Please consider the following payment data:
customerID paymentID pamentType paymentDate paymentAmount
---------------------------------------------------------------------
1 1 A 2015-11-28 500
1 2 A 2015-11-29 -150
1 3 B 2016-03-07 300
2 4 A 2015-03-03 200
2 5 B 2016-05-25 -100
2 6 C 2016-06-24 700
1 7 B 2015-09-22 110
2 8 B 2016-01-03 400
I need to tally per year, per customer, the sum of the diverse payment types (A = invoice, B = credit note, etc), as follows:
year customerID paymentType paymentSum
-----------------------------------------------
2015 1 A 350 : paymentID 1 + 2
2015 1 B 110 : paymentID 7
2015 1 C 0
2015 2 A 200 : paymentID 4
2015 2 B 0
2015 2 C 0
2016 1 A 0
2016 1 B 300 : paymentID 3
2016 1 C 0
2016 2 A 0
2016 2 B 300 : paymentID 5 + 8
2016 2 C 700 : paymentId 6
It is important that there are values for every category (so for 2015, customer 1 has 0 payment value for type C, but still it is good to see this).
In reality, there are over 10 payment types and about 30 customers. The total date range is 10 years.
Is this possible to do in only SQL, and if so could somebody show me how? If possible by using relatively easy queries so that I can learn from it, for instance by storing intermediary result into a #temptable.
Any help is greatly appreciated!
a simple GROUP BY with SUM() on the paymentAmount will gives you what you wanted
select year = datepart(year, paymentDate),
customerID,
paymentType,
paymentSum = sum(paymentAmount)
from payment_data
group by datepart(year, paymentDate), customerID, paymentType
This is a simple query that generates the required 0s. Note that it may not be the most efficient way to generate this result set. If you already have lookup tables for customers or payment types, it would be preferable to use those rather than the CTEs1 I use here:
declare #t table (customerID int,paymentID int,paymentType char(1),paymentDate date,
paymentAmount int)
insert into #t(customerID,paymentID,paymentType,paymentDate,paymentAmount) values
(1,1,'A','20151128', 500),
(1,2,'A','20151129',-150),
(1,3,'B','20160307', 300),
(2,4,'A','20150303', 200),
(2,5,'B','20160525',-100),
(2,6,'C','20160624', 700),
(1,7,'B','20150922', 110),
(2,8,'B','20160103', 400)
;With Customers as (
select DISTINCT customerID from #t
), PaymentTypes as (
select DISTINCT paymentType from #t
), Years as (
select DISTINCT DATEPART(year,paymentDate) as Yr from #t
), Matrix as (
select
customerID,
paymentType,
Yr
from
Customers
cross join
PaymentTypes
cross join
Years
)
select
m.customerID,
m.paymentType,
m.Yr,
COALESCE(SUM(paymentAmount),0) as Total
from
Matrix m
left join
#t t
on
m.customerID = t.customerID and
m.paymentType = t.paymentType and
m.Yr = DATEPART(year,t.paymentDate)
group by
m.customerID,
m.paymentType,
m.Yr
Result:
customerID paymentType Yr Total
----------- ----------- ----------- -----------
1 A 2015 350
1 A 2016 0
1 B 2015 110
1 B 2016 300
1 C 2015 0
1 C 2016 0
2 A 2015 200
2 A 2016 0
2 B 2015 0
2 B 2016 300
2 C 2015 0
2 C 2016 700
(We may also want to play games with a numbers table and/or generate actual start and end dates for years if the date processing above needs to be able to use an index)
Note also how similar the top of my script is to the sample data in your question - except it's actual code that generates the sample data. You may wish to consider presenting sample code in such a way in the future since it simplifies the process of actually being able to test scripts in answers.
1CTEs - Common Table Expressions. They may be thought of as conceptually similar to temp tables - except we don't actually (necessarily) materialize the results. They also are incorporated into the single query that follows them and the whole query is optimized as a whole.
Your suggestion to use temp tables means that you'd be breaking this into multiple separate queries that then necessarily force SQL to perform the task in an order that we have selected rather than letting the optimizer choose the best approach for the above single query.

cross reference nearest date data

I have three table ElecUser, ElecUsage, ElecEmissionFactor
ElecUser:
UserID UserName
1 Main Building
2 Staff Quarter
ElecUsage:
UserID Time Amount
1 1/7/2010 23230
1 8/10/2011 34340
1 8/1/2011 34300
1 2/3/2012 43430
1 4/2/2013 43560
1 3/2/2014 44540
2 3/6/2014 44000
ElecEmissionFactor:
Time CO2Emission
1/1/2010 0.5
1/1/2011 0.55
1/1/2012 0.56
1/1/2013 0.57
And intended outcome:
UserName Time CO2
1 2010 11615
1 2011 37752 (34340*0.55 + 34300*0.55)
1 2012 24320.8
1 2013 24829.2
1 2014 25387.8
2 2014 25080
The logic is ElecUsage.Amount * ElecEmissionFactor.
If same user and same year, add them up for the record of that year.
My query is:
SELECT ElecUser.UserName, Year([ElecUsage].[Time]), SUM((ElecEmissionFactor.CO2Emission*ElecUsage.Amount)) As CO2
FROM ElecEmissionFactor, ElecUser INNER JOIN ElecUsage ON ElecUser.UserID = ElecUsage.UserID
WHERE (((Year([ElecUsage].[Time]))>=Year([ElecEmissionFactor].[Time])))
GROUP BY ElecUser.UserName, Year([ElecUsage].[Time])
HAVING Year([ElecUsage].[Time]) = Max(Year(ElecEmissionFactor.Time));
However, this only shows the year with emission factor.
The challenge is to reference the year without emission factor to the latest year with emission factor.
Sub-query may be one of the solutions but i fail to do so.
I got stuck for a while. Hope to see your reply.
Thanks
Try something like this..
-- not tested
select T1.id, year(T1.time) as Time, sum(T1.amount*T2.co2emission) as CO2
from ElecUsage T1
left outer join ElecEmissionFactor T2 on (year(T1.time) = year(T2.time))
Group by year(T1.time), T1.id
use sub query to get the corresponding factor in this way
select T1.id,
year(T1.time) as Time,
sum(T1.amount*
(
select top 1 CO2Emission from ElecEmissionFactor T2
where year(T2.time) <= year(T1.time) order by T2.time desc
)
) as CO2
from ElecUsage T1
Group by year(T1.time), T1.id