Joining tables with multiple columns

I'm having an issue joining two tables of Google Analytics data. The first table holds Google Ads data, split by the dimensions campaign_id and date, plus some metrics (impressions, clicks, and cost).
Something like:
Date        Campaign_ID  Clicks  Impressions  Cost
2019-08-01  13345        345     10,045       296
2019-08-02  12343        452     23,033       359
2019-08-03  132456       587     25,056       562
So far so good for my knowledge. I then have a second table that holds transactions. These are individual transactions, each with a transaction ID, so they need to be in a separate table. It again has the dimensions date and campaign_ID, plus transaction ID, and the metric revenue. I'm not actually bothered about revenue; it's just in my table and can be ignored.
So the second table is:
Date        Campaign_ID  Transaction_ID  Revenue
2019-08-01  12343        A1100           5000
2019-08-01  12343        A1101           5000
What I'm trying to do with the data is count the number of transactions by date and campaign_ID and then join this to table 1 by date and campaign_ID.
My query to count the transactions is very straightforward:
SELECT
date,
campaign_ID,
count(Transaction_ID) as transactions
FROM bqproject.tables.ga_ecom_data
GROUP BY Date, Campaign_ID
This gives exactly the data I need, and I could save it as a separate table and join that to the first table.
What I'm trying to achieve, though, is to put this in a single query so that I can pull the impressions, clicks and cost data, then count the transactions, and join them together by date and campaign_ID.
Hopefully that makes sense in terms of my query.
I was trying to use a nested SELECT (SELECT ...), but the error I'm getting back states that a nested query in the SELECT list can only return a single column, whereas I'm trying to return 3. I've tried using SELECT STRUCT but I can't figure out the correct format, or whether it is the right approach.
So I've gone another route (based on some YouTube videos I've been following). The query below doesn't break, but I don't get my column "count_transaction_ID".
Here is the query I've used:
SELECT newdate,
CID,
AD_I,
Ad_Click,
AD_COST ,
FROM (SELECT Date as newdate, Campaign_ID as CID, SUM(Ad_Impressions) as AD_I, SUM(Ad_Clicks) as AD_Click, SUM(Ad_Cost) as AD_COST
from `bqproject.tables.ga_ads_data`
GROUP BY Date, Campaign_ID) A
JOIN
(
SELECT Date, Campaign_ID, COUNT(Transaction_ID)
FROM `bqproject.tables.ga_ecom_data`
group by Date, Campaign_ID
) B
ON A.newdate = B.Date
AND A.CID = B.Campaign_ID
LIMIT 1000
What I was trying to get back is a table such as:
Date        Campaign_ID  Clicks  Impressions  Cost  Count_Of_Transactions
2019-08-01  13345        345     10,045       296   5
2019-08-02  12343        452     23,033       359   16
2019-08-03  132456       587     25,056       562   10
But the table that is returned doesn't show the count of transactions.
I don't know if I'm using the wrong structure here and should be using different subqueries, or if this isn't actually possible to do. It feels like it should be, but it's just past my understanding at the moment.

You can group by the columns required for display in the first table, then left join the second table and use count to roll up the transactions, something like:
SELECT
first.Campaign_ID,
first.Date,
first.Ad_Impressions,
first.Ad_Clicks,
first.Ad_Cost,
count(second.Transaction_ID) transactions
from `bqproject.tables.ga_ads_data` first
left join `bqproject.tables.ga_ecom_data` second
on first.Campaign_ID = second.Campaign_ID and first.Date = second.Date
group by
first.Campaign_ID,
first.Date,
first.Ad_Impressions,
first.Ad_clicks,
first.Ad_cost
order by first.Campaign_ID, first.Date

Looks like you just need to name the count column and then add it to the above call. See below:
SELECT newdate,
CID,
AD_I,
Ad_Click,
AD_COST,
B.countTransactions
FROM (SELECT Date as newdate, Campaign_ID as CID, SUM(Ad_Impressions) as AD_I, SUM(Ad_Clicks) as AD_Click, SUM(Ad_Cost) as AD_COST
from `bqproject.tables.ga_ads_data`
GROUP BY Date, Campaign_ID) A
JOIN
(
SELECT Date, Campaign_ID, COUNT(Transaction_ID) as countTransactions
FROM `bqproject.tables.ga_ecom_data`
group by Date, Campaign_ID
) B
ON A.newdate = B.Date
AND A.CID = B.Campaign_ID
LIMIT 1000

You are pretty close!
SELECT A.newdate,
A.CID,
A.AD_I,
A.Ad_Click,
A.AD_COST,
B.transactions
FROM (SELECT Date as newdate,
Campaign_ID as CID,
SUM(Ad_Clicks) as AD_Click,
SUM(Ad_Cost) as AD_COST,
SUM(Ad_Impressions) as AD_I,
from `bqproject.tables.ga_ads_data`
GROUP BY Date, Campaign_ID
) A
left JOIN
(
SELECT Date, Campaign_ID, COUNT(distinct Transaction_ID) as transactions
FROM `bqproject.tables.ga_ecom_data`
group by Date, Campaign_ID
) B
ON A.newdate = B.Date
AND A.CID = B.Campaign_ID
Let me know if this helps! Naming helps a lot when using subqueries; you just needed to name the count column and add B.transactions to the outer select statement.
PS 1: Just a tip going forward: try to keep column names as consistent as possible, and avoid mixing capitalization. It will speed up your development.
PS 2: Watch out when using a plain (inner) JOIN: it drops campaigns that have no transaction IDs. With the LEFT JOIN above those campaigns are kept, with a NULL transaction count.
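To see the effect of the LEFT JOIN, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite stands in for BigQuery only so that the example is runnable; the sample rows are adapted from the question (the 12343 ads row is moved to 2019-08-01 so that it matches the transactions), and the COALESCE is an addition that turns the NULL count for a campaign without transactions into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ga_ads_data (Date TEXT, Campaign_ID INTEGER,
                          Ad_Impressions INTEGER, Ad_Clicks INTEGER, Ad_Cost INTEGER);
CREATE TABLE ga_ecom_data (Date TEXT, Campaign_ID INTEGER,
                           Transaction_ID TEXT, Revenue INTEGER);
-- Sample rows adapted from the question (assumption: dates aligned for the demo).
INSERT INTO ga_ads_data VALUES
  ('2019-08-01', 13345, 10045, 345, 296),
  ('2019-08-01', 12343, 23033, 452, 359);
INSERT INTO ga_ecom_data VALUES
  ('2019-08-01', 12343, 'A1100', 5000),
  ('2019-08-01', 12343, 'A1101', 5000);
""")

query = """
SELECT a.Date, a.Campaign_ID, a.impressions, a.clicks, a.cost,
       COALESCE(b.transactions, 0) AS transactions
FROM (
    -- Aggregate the ads metrics per date and campaign.
    SELECT Date, Campaign_ID,
           SUM(Ad_Impressions) AS impressions,
           SUM(Ad_Clicks)      AS clicks,
           SUM(Ad_Cost)        AS cost
    FROM ga_ads_data
    GROUP BY Date, Campaign_ID
) AS a
LEFT JOIN (
    -- Count transactions per date and campaign, and name the column.
    SELECT Date, Campaign_ID, COUNT(DISTINCT Transaction_ID) AS transactions
    FROM ga_ecom_data
    GROUP BY Date, Campaign_ID
) AS b
  ON a.Date = b.Date AND a.Campaign_ID = b.Campaign_ID
ORDER BY a.Campaign_ID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Campaign 13345 has no transactions, so an inner JOIN would drop it; with the LEFT JOIN it stays in the result with a count of 0.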

Related

Find Last Purchase Order For Each Part

I need to find the last P.O. for parts purchased from Vendors.
I was trying to come up with a way to do this using a query I found that allowed me to find
the max Creation date for a group of Quotes linked to an Opportunity:
SELECT
t1.[quoteid]
,t1.[OpportunityId]
,t1.[Name]
FROM
[Quote] t1
WHERE
t1.[CreatedOn] = (SELECT MAX(t2.[CreatedOn])
FROM [Quote] t2
WHERE t2.[OpportunityId] = t1.[OpportunityId])
In the case of Purchase Orders, though, I have a header table and a line item table.
So, I need to include info from both:
SELECT
PURCHASE_ORDER.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURCHASE_ORDER,
PURC_ORDER_LINE
WHERE
PURCHASE_ORDER.ID=
PURC_ORDER_LINE.PURC_ORDER_ID
If the ORDER_DATE from the header were available in the PURC_ORDER_LINE table I thought
this could be done like so:
SELECT
PURC_ORDER_LINE.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURC_ORDER_LINE T1
WHERE T1.ORDER_DATE=(SELECT MAX(T2.ORDER_DATE)
FROM PURC_ORDER_LINE T2
WHERE T2.PURC_ORDER_ID=T1.PURC_ORDER_ID)
But I'm not sure that's correct and, in any case, there are 2 things:
The ORDER_DATE is in the Header table, not in the line table
I need the last P.O. created for each of the Parts (PART_ID)
So:
PART_A and PART_B, as an example, may appear on several P.O.s
Part    Order Date  P.O. #
PART_A  2020-08-17  PO12345
PART_A  2020-11-21  PO23456
PART_A  2021-07-08  PO29986
PART_B  2019-11-30  PO00861
PART_B  2021-08-30  PO30001
The result set would be (including the other fields from above):
ORDER_DATE  PURC_ORDER_ID  PART_ID  UNIT_PRICE  ORDER_QTY
2021-07-08  PO29986        PART_A   321.00      12
2021-08-30  PO30001        PART_B   426.30      8
I need a query that will give me such a result set.

You can use row-numbering for this. Just place the whole join inside a subquery (derived table), add a row-number, then filter on the outside.
SELECT *
FROM (
SELECT
pol.PART_ID,
po.ORDER_DATE,
pol.PURC_ORDER_ID,
pol.UNIT_PRICE,
pol.USER_ORDER_QTY,
rn = ROW_NUMBER() OVER (PARTITION BY pol.PART_ID ORDER BY po.ORDER_DATE DESC)
FROM PURCHASE_ORDER po
JOIN PURC_ORDER_LINE pol ON po.ID = pol.PURC_ORDER_ID
) po
WHERE po.rn = 1;
Note the use of proper join syntax, as well as table aliases.

You can use a window function:
select * from (
select * , row_number() over (partition by PART_ID order by ORDER_DATE desc) rn
from tablename
) t where t.rn = 1
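As a rough, runnable illustration of the row-numbering approach, the sketch below loads the sample purchase orders from the question into an in-memory SQLite database (standing in for SQL Server, so the row number uses the portable `AS rn` form). The unit prices and quantities of the older orders are invented for the demo; only the two latest rows come from the question. It assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PURCHASE_ORDER (ID TEXT PRIMARY KEY, ORDER_DATE TEXT);
CREATE TABLE PURC_ORDER_LINE (PURC_ORDER_ID TEXT, PART_ID TEXT,
                              UNIT_PRICE REAL, USER_ORDER_QTY INTEGER);
INSERT INTO PURCHASE_ORDER VALUES
  ('PO12345', '2020-08-17'), ('PO23456', '2020-11-21'), ('PO29986', '2021-07-08'),
  ('PO00861', '2019-11-30'), ('PO30001', '2021-08-30');
-- Prices and quantities on the older orders are made up for illustration.
INSERT INTO PURC_ORDER_LINE VALUES
  ('PO12345', 'PART_A', 300.00, 5),
  ('PO23456', 'PART_A', 310.00, 7),
  ('PO29986', 'PART_A', 321.00, 12),
  ('PO00861', 'PART_B', 400.00, 3),
  ('PO30001', 'PART_B', 426.30, 8);
""")

query = """
SELECT ORDER_DATE, PURC_ORDER_ID, PART_ID, UNIT_PRICE, USER_ORDER_QTY
FROM (
    SELECT po.ORDER_DATE, pol.PURC_ORDER_ID, pol.PART_ID,
           pol.UNIT_PRICE, pol.USER_ORDER_QTY,
           -- Number each part's order lines from newest to oldest.
           ROW_NUMBER() OVER (PARTITION BY pol.PART_ID
                              ORDER BY po.ORDER_DATE DESC) AS rn
    FROM PURCHASE_ORDER po
    JOIN PURC_ORDER_LINE pol ON po.ID = pol.PURC_ORDER_ID
) ranked
WHERE rn = 1
ORDER BY PART_ID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two orders for the same part share the latest date, ROW_NUMBER picks one of them arbitrarily; adding a tie-breaker column to the ORDER BY makes the choice deterministic.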

Postgres: Count multiple events for distinct dates

People of Stack Overflow!
Thanks for taking the time to read this question. What I am trying to accomplish is to pivot some data all from just one table.
The original table has multiple datetime entries of specific events (e.g. when the customer was added add_time and when the customer was lost lost_time).
This is one part of two rows of the deals table:
id   add_time             last_mail_time       lost_time
5    2020-03-24 09:29:24  2020-04-03 13:20:29  NULL
310  2020-03-24 09:29:24  NULL                 2020-04-03 13:20:29
I want to create a view of this table. A view that has one row for each distinct date and counts the number of events at this specific time.
This is the goal (times do not match with the example!):
I have working code, like this:
SELECT DISTINCT
change_datetime,
(SELECT COUNT(add_time) as add_time_count FROM deals WHERE add_time::date = change_datetime),
(SELECT COUNT(lost_time) as lost_time_count FROM deals WHERE lost_time::date = change_datetime)
FROM (
SELECT
add_time::date AS change_datetime
FROM
deals
UNION ALL
SELECT
lost_time::date AS change_datetime
FROM
deals
) AS foo
WHERE change_datetime IS NOT NULL
ORDER BY
change_datetime;
but this has some ugly O(n^2) queries and takes a lot of time.
Is there a better, more performant way to achieve this?
Thanks!!

You can use a lateral join to unpivot and then aggregate:
select t::date as change_date,
count(*) filter (where which = 'add') as add_time_count,
count(*) filter (where which = 'mail') as last_mail_time_count,
count(*) filter (where which = 'lost') as lost_time_count
from deals d cross join lateral
(values (add_time, 'add'),
(last_mail_time, 'mail'),
(lost_time, 'lost')
) v(t, which)
where t is not null
group by t::date
order by change_date;
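The same unpivot-then-aggregate idea can be tried without a Postgres server. The sketch below is a portable variant, not the lateral join itself: SQLite has no LATERAL, so it unpivots with UNION ALL and counts with SUM(CASE ...) instead of FILTER. The two rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deals (id INTEGER, add_time TEXT, last_mail_time TEXT, lost_time TEXT);
INSERT INTO deals VALUES
  (5,   '2020-03-24 09:29:24', '2020-04-03 13:20:29', NULL),
  (310, '2020-03-24 09:29:24', NULL, '2020-04-03 13:20:29');
""")

query = """
SELECT date(t) AS change_date,
       SUM(CASE WHEN which = 'add'  THEN 1 ELSE 0 END) AS add_time_count,
       SUM(CASE WHEN which = 'mail' THEN 1 ELSE 0 END) AS last_mail_time_count,
       SUM(CASE WHEN which = 'lost' THEN 1 ELSE 0 END) AS lost_time_count
FROM (
    -- Unpivot: one row per (timestamp, event type).
    SELECT add_time AS t, 'add' AS which FROM deals
    UNION ALL
    SELECT last_mail_time, 'mail' FROM deals
    UNION ALL
    SELECT lost_time, 'lost' FROM deals
) events
WHERE t IS NOT NULL          -- skip events that never happened
GROUP BY date(t)
ORDER BY change_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each timestamp column is scanned once and the grouping happens in a single pass, instead of running one correlated count per distinct date.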

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS paid_follower_gain,
null AS organic_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100

UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through rows in SQL because it's not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
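Another way to express "subtract the gains day by day" without a loop is a running sum. The sketch below is only an illustration under stated assumptions: it uses SQLite (3.25 or later) from Python rather than Snowflake, invented sample data, and the column names from the question's own query. It also assumes that a day's gain is added after that day's count is recorded, so the estimate for a historical date is the first live count minus all gains from that date up to, but not including, the first live date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follower_counts (account_id INTEGER, date TEXT, total_followers_count INTEGER);
CREATE TABLE follower_gains  (account_id INTEGER, date TEXT,
                              organic_follower_gain INTEGER, paid_follower_gain INTEGER);
-- Invented sample data: live counts start on 2020-04-04.
INSERT INTO follower_counts VALUES
  (1, '2020-04-04', 1000), (1, '2020-04-05', 1010),
  (2, '2020-04-04', 500);
INSERT INTO follower_gains VALUES
  (1, '2020-04-01', 5, 2), (1, '2020-04-02', 3, 0), (1, '2020-04-03', 10, 5),
  (2, '2020-04-03', 0, 20);
""")

query = """
WITH first_count AS (
    -- The earliest live count per account is the starting point.
    SELECT fc.account_id, fc.date AS first_date, fc.total_followers_count
    FROM follower_counts fc
    JOIN (SELECT account_id, MIN(date) AS first_date
          FROM follower_counts
          GROUP BY account_id) m
      ON m.account_id = fc.account_id AND m.first_date = fc.date
)
SELECT g.account_id, g.date,
       f.total_followers_count
         - SUM(g.organic_follower_gain + g.paid_follower_gain)
             OVER (PARTITION BY g.account_id ORDER BY g.date DESC)
         AS estimated_followers
FROM follower_gains g
JOIN first_count f
  ON f.account_id = g.account_id AND g.date < f.first_date
ORDER BY g.account_id, g.date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Walking forward from the earliest estimate for account 1 reproduces the live count: 975 + 7 = 982, 982 + 3 = 985, 985 + 15 = 1000. If the gains are reported under a different convention, the date comparison and the window frame would need to shift by one day.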

You could do something like this. Instead of using the variable, you can wrap the whole expression in another pair of brackets and write ) AS FollowerGrowth at the end.
DECLARE @FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASC )

Use query result

I'm having issues with the following query. I have two tables: table Orderheader and table Bought. The first query I execute gives me, for example, two dates. Based on these two dates, I need to find Production data and, based on the production data, I need to find the Bought data, and combine those data together. Let's say I do the following:
Select Lotdate From Orderheader where orhsysid = 1
This results in two rows: '2019-02-05' and '2019-02-04'. Now I need to run two queries using this result set. The first one is easy; use the dates returned and get a sum of column A like this:
Select date, SUM(Amount) from Orderheader where date = Sales.date() [use the two dates here]
The second one is slightly more complicated: I need to find the last day on which something was bought, based on the two dates. Production is every day, so Productiondate = Sales.date() - 1. But Bought is derived from the production day and is not every day, so for every production day it needs to find the last Bought day. So I can't say where date = Orderheader.date. I need to do something like:
Select date, SUM(Amount)
FROM Bought
WHERE date = (
SELECT top 1 date
FROM Bought
WHERE date < Orderheader.date)
But twice, for both the dates I got.
This needs to result in 1 table giving me:
Bought.date, Bought.SUM(AMOUNT), Orderheader.date, Orderheader.SUM(AMOUNT)
All based on the, possible multiple, Lotdate(s) I got from the first query from Sales table.
I've been struggling with this for a moment now, using joins and nested queries but I can't seem to figure it out!
Example sample:
SELECT CONVERT(date,ORF.orfDate) as Productiedatum, SUM(orlQuantityRegistered) as 'Aantal'
FROM OrderHeader ORH
LEFT JOIN OrderFrame ORF ON ORH.orhFrameSysID = ORF.orfSysID
LEFT JOIN OrderLine ORL ON ORL.orhSysID = ORH.orhSysID
LEFT JOIN Item ON Item.itmSysID = ORL.orlitmSysID
where CONVERT(date,ORF.orfDate) IN
(
SELECT DISTINCT(CONVERT(date, Lot.lotproductiondate)) as Productiedatum
FROM OrderHeader ORH
LEFT JOIN Registration reg ON reg.regorhSysID = ORH.orhSysID
LEFT JOIN StockRegistration stcreg ON stcreg.stcregRegistrationSysID = reg.regSysID
LEFT JOIN Lot ON Lot.lotSysID = stcregSrclotSysID
WHERE ORH.orhSysID = 514955
AND regRevokeRegSysID IS NULL
AND stcregSrcitmSysID = 5103
)
AND ORL.orlitmSysID = 5103
AND orldirSysID = 2
AND NOT orlQuantityRegistered IS NULL
GROUP BY Orf.orfDate
Sample output:
Productiedatum Aantal
2019-02-05 20
2019-02-06 20
Here I used a nested subquery to get the results from 'Production' (orderheader) because I just can use date = date. I'm struggling with the Sales part where I need to find the last date(s) and use those dates in the Sales table to get the sum of that date.
Expected output:
Productiedatum Aantal Boughtdate Aantal
2019-02-04 20 2019-02-01 55
2019-02-05 20 2019-02-04 60

Try this.
IF OBJECT_ID('tempdb..#Production') IS NOT NULL DROP TABLE #Production
IF OBJECT_ID('tempdb..#Bought') IS NOT NULL DROP TABLE #Bought
CREATE table #Production(R_NO int,ProductionDate datetime,ProductionAmount float)
CREATE table #Bought(R_NO int,Boughtdate datetime,Boughtamount float)
insert into #Production(ProductionDate,ProductionAmount,R_NO)
select p.date ProductionDate,sum(Amount) ProductionAmount,row_number()over (order by p.date) R_NO
from Production P
join Sales s on p.date=S.date-1
where orhsysid=1
group by p.date
declare @loop int,@ProdDate datetime
select @loop =max(R_NO) from #Production
while (1<=@loop)
begin
select @ProdDate=ProductionDate from #Production where r_no=@loop
insert into #Bought(Boughtdate,Boughtamount,R_NO)
select Date,Sum(Amount),@loop R_NO from Bought where date=(
select max(date) from bought B
where B.Date<@ProdDate)
group by Date
set @loop=@loop-1
end
select ProductionDate,ProductionAmount,Boughtdate,Boughtamount from #Bought B
join #Production p on B.R_NO=P.R_NO
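The loop and temp tables can also be replaced by a single set-based query. The sketch below is one possible shape, not the answer's method: it uses SQLite from Python, simplified hypothetical tables Production(date, amount) and Bought(date, amount), and invented rows chosen to reproduce the expected output in the question. The correlated subquery in the join condition picks, for each production date, the latest Bought date before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables; the amounts are made up for the demo.
CREATE TABLE Production (date TEXT, amount INTEGER);
CREATE TABLE Bought     (date TEXT, amount INTEGER);
INSERT INTO Production VALUES
  ('2019-02-04', 12), ('2019-02-04', 8), ('2019-02-05', 20);
INSERT INTO Bought VALUES
  ('2019-02-01', 30), ('2019-02-01', 25), ('2019-02-04', 60);
""")

query = """
WITH prod AS (
    SELECT date AS production_date, SUM(amount) AS production_amount
    FROM Production
    GROUP BY date
),
bought_daily AS (
    SELECT date AS bought_date, SUM(amount) AS bought_amount
    FROM Bought
    GROUP BY date
)
SELECT p.production_date, p.production_amount,
       b.bought_date, b.bought_amount
FROM prod p
LEFT JOIN bought_daily b
  -- Latest bought date strictly before this production date.
  ON b.bought_date = (SELECT MAX(date) FROM Bought
                      WHERE date < p.production_date)
ORDER BY p.production_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps a production date even when nothing was bought before it; the Bought columns are then NULL.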

SQL Server select max date per ID

I am trying to select the record with the max date for each finance_charge_id of a given service_user_id, along with the amount that is linked to that latest date.
select distinct
s.Finance_Charge_ID, MAX(s.start_date), s.Amount
from
Service_User_Finance_Charges s
where
s.Service_User_ID = '156'
group by
s.Finance_Charge_ID, s.Amount
The issue is that I receive multiple entries where the amount is different. I only want to receive the amount on the latest date for each finance_charge_id.
At the moment I receive the result below, which is incorrect (the third line should not appear, as the first line has a later date for the same finance_charge_id):
Finance_Charge_ID (No column name) Amount
2 2014-10-19 1.00
3 2014-10-16 500.00
2 2014-10-01 1000.00

Remove the Amount column from the group by to get the correct rows. You can then join that query onto the table again to get all the data you need. Here is an example using a CTE to get the max dates:
WITH MaxDates_CTE (Finance_Charge_ID, MaxDate) AS
(
select s.Finance_Charge_ID,
MAX(s.start_date) MaxDate
from Service_User_Finance_Charges s
where s.Service_User_ID = '156'
group by s.Finance_Charge_ID
)
SELECT *
FROM Service_User_Finance_Charges
JOIN MaxDates_CTE
ON MaxDates_CTE.Finance_Charge_ID = Service_User_Finance_Charges.Finance_Charge_ID
AND MaxDates_CTE.MaxDate = Service_User_Finance_Charges.start_date

This can be done using a window function which removes the need for a self join on the grouped data:
select Finance_Charge_ID,
start_date,
amount
from (
select s.Finance_Charge_ID,
s.start_date,
max(s.start_date) over (partition by s.Finance_Charge_ID) as max_date,
s.Amount
from Service_User_Finance_Charges s
where s.Service_User_ID = 156
) t
where start_date = max_date;
As the window function does not require you to use group by you can add any additional column you need in the output.
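For a quick check of the window-function version, the sketch below runs it against an in-memory SQLite database (3.25 or later) from Python instead of SQL Server. The three rows for user 156 are the ones shown in the question; the extra row for user 999 is invented to show that the filter still applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Service_User_Finance_Charges (
    Service_User_ID INTEGER, Finance_Charge_ID INTEGER,
    start_date TEXT, Amount REAL);
INSERT INTO Service_User_Finance_Charges VALUES
  (156, 2, '2014-10-19', 1.00),
  (156, 3, '2014-10-16', 500.00),
  (156, 2, '2014-10-01', 1000.00),
  (999, 2, '2015-01-01', 7.00);   -- invented row for a different user
""")

query = """
SELECT Finance_Charge_ID, start_date, Amount
FROM (
    SELECT s.Finance_Charge_ID, s.start_date, s.Amount,
           -- Latest start date within each finance charge, repeated on every row.
           MAX(s.start_date) OVER (PARTITION BY s.Finance_Charge_ID) AS max_date
    FROM Service_User_Finance_Charges s
    WHERE s.Service_User_ID = 156
) t
WHERE start_date = max_date
ORDER BY Finance_Charge_ID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike ROW_NUMBER, comparing against MAX(...) OVER returns every row that ties on the latest date, which may or may not be what you want.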
