Multiple Joins and Writing to a Destination Table with BigQuery

I have the following query that works fine if I DON'T set a destination table.
SELECT soi.customer_id
, p.department
, p.category
, p.subcategory
, p.tier1
, p.tier2
, pc.bucket as categorization
, SUM(soi.price) as demand
, COUNT(1) as cnt
FROM store.sales_item soi
INNER JOIN datamart.product p ON (soi.product_id = p.product_id)
INNER JOIN daily_customer_fact.dcf_product_categorization pc
ON (p.department = pc.department
AND p.category = pc.category
AND p.subcategory = pc.subcategory
AND p.tier1 = pc.tier1
AND p.tier2 = pc.tier2)
WHERE DATE(soi.created_timestamp) < current_date()
GROUP EACH BY 1,2,3,4,5,6,7 LIMIT 10
However, if I set a destination table, it fails with
Error: Ambiguous field name 'app_version' in JOIN. Please use the table qualifier before field name.
That column exists on the store.sales_item table, but I'm neither selecting nor joining on that column.

I've seen this error message before, and it usually means both of the following are true:
1. Your query job is setting flattenResults to false when a destination table is specified.
2. Both the store.sales_item and datamart.product tables contain a field named "app_version".
If so, I recommend looking at this answer: https://stackoverflow.com/a/28996481/4001094
as well as this issue report: https://code.google.com/p/google-bigquery/issues/detail?id=459
In your case, you should be able to make your query succeed by doing something like the following, using suggestion #3 from the answer linked above. I'm unable to test it as I don't have access to your source tables, but it should be close to working with flattenResults set to false.
SELECT soi_and_p.customer_id
, soi_and_p.department
, soi_and_p.category
, soi_and_p.subcategory
, soi_and_p.tier1
, soi_and_p.tier2
, pc.bucket as categorization
, SUM(soi_and_p.price) as demand
, COUNT(1) as cnt
FROM
(SELECT soi.customer_id AS customer_id
, p.department AS department
, p.category AS category
, p.subcategory AS subcategory
, p.tier1 AS tier1
, p.tier2 AS tier2
, soi.price AS price
, soi.created_timestamp AS created_timestamp
FROM store.sales_item soi
INNER JOIN datamart.product p ON (soi.product_id = p.product_id)
) as soi_and_p
INNER JOIN daily_customer_fact.dcf_product_categorization pc
ON (soi_and_p.department = pc.department
AND soi_and_p.category = pc.category
AND soi_and_p.subcategory = pc.subcategory
AND soi_and_p.tier1 = pc.tier1
AND soi_and_p.tier2 = pc.tier2)
WHERE DATE(soi_and_p.created_timestamp) < current_date()
GROUP EACH BY 1,2,3,4,5,6,7 LIMIT 10
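SQLite has no equivalent of BigQuery's flattenResults setting, so the sketch below cannot reproduce the error; it only illustrates the shape of the workaround. The inner query projects each needed column exactly once under an explicit alias, so the derived table carries no duplicate names such as app_version, and the outer query joins and aggregates on those aliases. The inner SELECT has to project category as well, because the outer join condition references soi_and_p.category. Table names are shortened and all data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the three tables. Both sales_item and product
# carry an app_version column, mirroring the situation in the question.
conn.executescript("""
CREATE TABLE sales_item (customer_id INT, product_id INT, price REAL,
                         created_timestamp TEXT, app_version TEXT);
CREATE TABLE product (product_id INT, department TEXT, category TEXT,
                      subcategory TEXT, tier1 TEXT, tier2 TEXT, app_version TEXT);
CREATE TABLE product_categorization (department TEXT, category TEXT,
                      subcategory TEXT, tier1 TEXT, tier2 TEXT, bucket TEXT);
INSERT INTO sales_item VALUES (1, 10, 5.0, '2015-01-01', 'v1'),
                              (1, 10, 7.0, '2015-01-02', 'v2'),
                              (2, 10, 3.0, '2015-01-03', 'v1');
INSERT INTO product VALUES (10, 'D', 'C', 'S', 'T1', 'T2', 'v9');
INSERT INTO product_categorization VALUES ('D', 'C', 'S', 'T1', 'T2', 'bucketA');
""")

# Inner query: every column projected once, under an explicit alias.
inner = """
SELECT soi.customer_id AS customer_id, p.department AS department,
       p.category AS category, p.subcategory AS subcategory,
       p.tier1 AS tier1, p.tier2 AS tier2, soi.price AS price,
       soi.created_timestamp AS created_timestamp
FROM sales_item soi
JOIN product p ON soi.product_id = p.product_id
"""
columns = [d[0] for d in conn.execute(inner).description]

# Outer query: join to the categorization table on the aliased columns only.
rows = conn.execute(f"""
SELECT soi_and_p.customer_id, pc.bucket AS categorization,
       SUM(soi_and_p.price) AS demand, COUNT(1) AS cnt
FROM ({inner}) AS soi_and_p
JOIN product_categorization pc
  ON soi_and_p.department = pc.department
 AND soi_and_p.category = pc.category
 AND soi_and_p.subcategory = pc.subcategory
 AND soi_and_p.tier1 = pc.tier1
 AND soi_and_p.tier2 = pc.tier2
WHERE DATE(soi_and_p.created_timestamp) < CURRENT_DATE
GROUP BY soi_and_p.customer_id, pc.bucket
ORDER BY soi_and_p.customer_id
""").fetchall()
print(columns)
print(rows)
```

The derived table's column list contains each name once, and app_version never reaches the outer query.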

Related

SQL Query with a MAX and Nested Join

I have what should be a very simple query: I'm trying to get the part revision of a part based on its max effective date, but I keep getting more than one result. It would be great if someone could take a look and tell me what I'm missing. Below is the query.
SELECT DISTINCT
Erp.PartRev.PartNum
, Erp.PartMtl.RevisionNum
, Erp.PartRev.EffectiveDate
FROM
Erp.PartRev INNER JOIN
Erp.PartMtl ON Erp.PartRev.Company = Erp.PartMtl.Company
AND Erp.PartRev.PartNum = Erp.PartMtl.PartNum
AND Erp.PartRev.RevisionNum = Erp.PartMtl.RevisionNum
WHERE
(Erp.PartRev.EffectiveDate =
(SELECT MAX(EffectiveDate)
FROM PartRev AS PartRev_1
WHERE (PartRev_1.PartNum = PartMtl.PartNum)
AND (PartRev_1.RevisionNum = PartMtl.RevisionNum)))
GROUP BY
Erp.PartRev.PartNum
, Erp.PartRev.RevisionNum
, Erp.PartMtl.RevisionNum
, Erp.PartRev.EffectiveDate
HAVING (Erp.PartRev.PartNum = N'100-220-2022-01')
And the results:
(screenshot: https://i.stack.imgur.com/QVsQn.png)
I expect to see only the record with the most recent date, but I get 3 records instead of one.
Here is a better example with more than one part and multiple records for each.
SELECT DISTINCT
Erp.PartRev.PartNum
, Erp.PartMtl.RevisionNum
, Erp.PartRev.EffectiveDate
FROM
Erp.PartRev INNER JOIN
Erp.PartMtl ON Erp.PartRev.Company = Erp.PartMtl.Company
AND Erp.PartRev.PartNum = Erp.PartMtl.PartNum
AND Erp.PartRev.RevisionNum = Erp.PartMtl.RevisionNum
WHERE
(Erp.PartRev.EffectiveDate =
(SELECT MAX(EffectiveDate)
FROM PartRev AS PartRev_1
WHERE (PartRev_1.PartNum = PartMtl.PartNum)
AND (PartRev_1.RevisionNum = PartMtl.RevisionNum)))
GROUP BY
Erp.PartRev.PartNum
, Erp.PartRev.RevisionNum
, Erp.PartMtl.RevisionNum
, Erp.PartRev.EffectiveDate
HAVING (Erp.PartRev.PartNum LIKE N'100-220-2022%')
[Screenshot of results]
(The screenshot looks like SSMS, so I'm assuming you're using SQL Server.)
My guess is that you're getting 3 results because there are 3 companies with the same part number. In that case, if you want to ensure a single result, you need to specify which Company you want, and you can do that without any joins or GROUP BY:
SELECT TOP 1
PartNum,
RevisionNum,
EffectiveDate
FROM
Erp.PartRev
WHERE
PartNum = N'100-220-2022-01'
and Company = '???'
ORDER BY EffectiveDate DESC
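The TOP 1 query above handles a single part. For the second example in the question (several parts at once), one option, not part of the original answer, is to number each part's revisions newest-first with ROW_NUMBER() and keep row 1. This is a minimal sketch in SQLite (window functions need SQLite 3.25 or later) with hypothetical data; SQL Server accepts the same ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) syntax. Partitioning by Company as well as PartNum is an assumption based on the answer's guess that several companies share a part number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical PartRev rows: two parts with several revisions each.
conn.executescript("""
CREATE TABLE PartRev (Company TEXT, PartNum TEXT, RevisionNum TEXT, EffectiveDate TEXT);
INSERT INTO PartRev VALUES
  ('C1', '100-220-2022-01', 'A', '2019-01-10'),
  ('C1', '100-220-2022-01', 'B', '2020-06-01'),
  ('C1', '100-220-2022-01', 'C', '2021-03-15'),
  ('C1', '100-220-2022-02', 'A', '2018-05-05'),
  ('C1', '100-220-2022-02', 'B', '2022-02-02');
""")

# Number each (Company, PartNum) group's revisions newest-first,
# then keep only the first row of every group.
latest = conn.execute("""
SELECT PartNum, RevisionNum, EffectiveDate
FROM (
    SELECT PartNum, RevisionNum, EffectiveDate,
           ROW_NUMBER() OVER (PARTITION BY Company, PartNum
                              ORDER BY EffectiveDate DESC) AS rn
    FROM PartRev
    WHERE PartNum LIKE '100-220-2022%'
) ranked
WHERE rn = 1
ORDER BY PartNum
""").fetchall()
print(latest)
```

Each part now contributes exactly one row: its revision with the greatest EffectiveDate.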

How to rewrite query

I need to run an UPDATE based on a SELECT. The following fails with the error: the column '' was specified multiple times for Q
UPDATE Evolution1.DimAdministrator
SET Evolution1.DimAdministrator.ClaimSystemCodeId = 17
FROM Evolution1.DimAdministrator da INNER JOIN (
Select
ExtractId,
base.AdministratorId,
base.CardprocessorAdministratorId,
AdministratorName,
EffectiveDate,
CancelDate ,
State,
StageError ,
AdministratorKey,
CustomerKey ,
Name ,
EffectiveDateKey ,
CancelDateKey,
StateProvinceKey ,
Alias ,
NavId ,
warehouse.AdministratorId ,
warehouse.CardprocessorAdministratorId,
warehouse.ClaimSystemCodeId,
Inserted ,
Updated
FROM OneStage.OnePay.Administrator base
INNER JOIN OneWarehouse.Evolution1.DimAdministrator warehouse ON base.AdministratorId = warehouse.AdministratorId
WHERE base.ClaimSystemCodeId <> warehouse.ClaimSystemCodeId
AND base.ClaimSystemCodeId = 1
) AS Q
Help please. Thanks.
You have multiple columns with duplicate names: AdministratorId and CardprocessorAdministratorId come from both base and warehouse.
Give them distinct aliases, like this:
UPDATE Evolution1.DimAdministrator
SET Evolution1.DimAdministrator.ClaimSystemCodeId = 17
FROM Evolution1.DimAdministrator da INNER JOIN (
Select
ExtractId,
base.AdministratorId AS base_AdminID,
base.CardprocessorAdministratorId AS base_CardID,
AdministratorName,
EffectiveDate,
CancelDate ,
State,
StageError ,
AdministratorKey,
CustomerKey ,
Name ,
EffectiveDateKey ,
CancelDateKey,
StateProvinceKey ,
Alias ,
NavId ,
warehouse.AdministratorId AS wh_AdminID,
warehouse.CardprocessorAdministratorId AS WH_CardID,
warehouse.ClaimSystemCodeId,
Inserted ,
Updated
FROM OneStage.OnePay.Administrator base
INNER JOIN OneWarehouse.Evolution1.DimAdministrator warehouse ON base.AdministratorId = warehouse.AdministratorId
WHERE base.ClaimSystemCodeId <> warehouse.ClaimSystemCodeId
AND base.ClaimSystemCodeId = 1
) AS Q
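Also, are you sure you don't need to JOIN Q ON something? As written, the INNER JOIN to Q has no ON clause, which SQL Server rejects as a syntax error.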
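A minimal sketch of what the update presumably intends, assuming the warehouse row should be matched to the stage row by AdministratorId (an assumption; the question doesn't state the join key). The tables stage_admin and dim_admin are hypothetical, heavily trimmed stand-ins. SQLite only gained UPDATE ... FROM in version 3.33, so the sketch expresses the join condition as a correlated EXISTS; in SQL Server the equivalent is adding that same condition as the ON clause of the join to Q.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stage (base) and warehouse tables, reduced to two columns.
conn.executescript("""
CREATE TABLE stage_admin (AdministratorId INT, ClaimSystemCodeId INT);
CREATE TABLE dim_admin (AdministratorId INT, ClaimSystemCodeId INT);
INSERT INTO stage_admin VALUES (1, 1), (2, 1), (3, 5);
INSERT INTO dim_admin VALUES (1, 9), (2, 1), (3, 9), (4, 9);
""")

# Update only warehouse rows that have a matching stage row whose code is 1
# and differs from the warehouse code. The correlation on AdministratorId
# plays the role of the missing ON clause.
conn.execute("""
UPDATE dim_admin
SET ClaimSystemCodeId = 17
WHERE EXISTS (
    SELECT 1
    FROM stage_admin AS base
    WHERE base.AdministratorId = dim_admin.AdministratorId
      AND base.ClaimSystemCodeId <> dim_admin.ClaimSystemCodeId
      AND base.ClaimSystemCodeId = 1
)
""")
result = conn.execute(
    "SELECT AdministratorId, ClaimSystemCodeId FROM dim_admin ORDER BY AdministratorId"
).fetchall()
print(result)
```

Only administrator 1 changes: administrator 2 already matches, 3 has a stage code other than 1, and 4 has no stage row at all. Without a join condition, every warehouse row would be a candidate.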

Exclude few selected fields on group by

I have a query that returns member transaction information. It uses an aggregate function to calculate the amount, and everything works fine with the current grouping. Now I need to add two more columns from different tables. I tried adding them, but unfortunately they give me duplicated information and a huge number of records.
Can anyone help? I just want to include the two fields in the query without adding them to the GROUP BY clause, and also ensure that the data returned is not duplicated.
Below is the query I used.
DECLARE @LastMonthExtractID Int = 11
SELECT x.*
,lstmnth.Submission ---added
,lm_subt.SubmissionTypeDescription ---added
FROM (
SELECT MemberRef --unique key
, SiteName
, ChargePeriod
, SUM(Amount) AS Amount
, TransactionMap
, PackageCode
FROM (
SELECT MemberRef
, SiteName
, ChargePeriod
, Amount
, PackageCode
, CASE WHEN map.TransactionMap = 'JoinFee' AND lstmnth.ChargeDate <> lstmnth.JoinDate THEN 'PayPlan'
WHEN map.TransactionMap = 'MemberFee' AND lstmnth.PackageCode LIKE 'PV%' AND lstmnth.SiteID <> 15 THEN 'VitalityMF' -- must use Package and not CURRENT PACKAGE
WHEN map.TransactionMap = 'MemberFee' AND lstmnth.PackageCode LIKE 'PV%' AND lstmnth.SiteID = 15 THEN 'PlatVitalityMF' -- PLATINUM
WHEN map.TransactionMap = 'MemberFee' AND lstmnth.PackageCode LIKE 'Z%' THEN 'ZContract'
WHEN map.TransactionMap IS NULL THEN 'Other'
ELSE map.TransactionMap END AS TransactionMap
--, lstmnth.Submission
--, lm_subt.SubmissionTypeDescription --added
FROM dbo.CCX_Billing lstmnth
LEFT JOIN dbo.TransactionMap map on lstmnth.TransactionType = map.TransactionType
AND lstmnth.TransactionDescription = map.TransactionDescription
AND ISNULL (lstmnth.AnalysisCode, '') = map.AnalysisCode
WHERE lstmnth.ExtractID = @LastMonthExtractID
) l
GROUP BY SiteName, MemberRef, ChargePeriod, PackageCode, TransactionMap
) x
INNER JOIN dbo.CCX_Billing lstmnth ON lstmnth.MemberRef = x.MemberRef
LEFT JOIN dbo.CCX_Billing_PSubmission lm_sub on lstmnth.SubmissionID = lm_sub.ID
INNER JOIN dbo.CCX_Billing_SubmissionType lm_subt on lm_sub.SubmissionTypeID = lm_subt.SubmissionID --added
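No answer to this question appears here. A likely cause of the duplication (an assumption about the real data) is the final INNER JOIN dbo.CCX_Billing lstmnth ON lstmnth.MemberRef = x.MemberRef: each aggregated row in x matches every billing row for that member, so each total is repeated once per billing line. The sketch below uses hypothetical trimmed tables, billing and submission, to show the fan-out and one way around it: reduce the detail table to one row per MemberRef before joining. Using MIN(SubmissionID) to choose that row is an arbitrary, assumed rule; pick whatever rule fits the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: member M1 has three billing lines tied to the same
# submission; M2 has a single line.
conn.executescript("""
CREATE TABLE billing (MemberRef TEXT, Amount REAL, SubmissionID INT);
CREATE TABLE submission (ID INT, Submission TEXT);
INSERT INTO billing VALUES ('M1', 10, 100), ('M1', 20, 100), ('M1', 5, 100),
                           ('M2', 7, 200);
INSERT INTO submission VALUES (100, 'S-100'), (200, 'S-200');
""")

totals = "SELECT MemberRef, SUM(Amount) AS Amount FROM billing GROUP BY MemberRef"

# Joining the totals back to billing on MemberRef alone repeats each total
# once per billing line for that member.
fanned = conn.execute(f"""
SELECT x.MemberRef, x.Amount, s.Submission
FROM ({totals}) x
JOIN billing b ON b.MemberRef = x.MemberRef
LEFT JOIN submission s ON b.SubmissionID = s.ID
""").fetchall()

# Reducing billing to one row per MemberRef first keeps one row per member.
deduped = conn.execute(f"""
SELECT x.MemberRef, x.Amount, s.Submission
FROM ({totals}) x
JOIN (SELECT MemberRef, MIN(SubmissionID) AS SubmissionID
      FROM billing GROUP BY MemberRef) b ON b.MemberRef = x.MemberRef
LEFT JOIN submission s ON b.SubmissionID = s.ID
ORDER BY x.MemberRef
""").fetchall()
print(len(fanned), deduped)
```

The same idea applies in SQL Server: join x to a derived table (or a ROW_NUMBER() = 1 filter) that has at most one row per MemberRef.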

SQL Query to get only 1 instance of a record where one to many relationship exists

I've got an SQL query that should return only 1 record when a one-to-many relationship exists.
SELECT dbo.BlogEntries.ID AS blog_entries_id, dbo.BlogEntries.BlogTitle, dbo.BlogEntries.BlogEntry, dbo.BlogEntries.BlogName,
dbo.BlogEntries.DateCreated AS blog_entries_datecreated, dbo.BlogEntries.inActive AS blog_entries_in_active,
dbo.BlogEntries.HtmlMetaDescription AS blog_entries_html_meta_description, dbo.BlogEntries.HtmlMetaKeywords AS blog_entries_html_meta_keywords,
dbo.BlogEntries.image1, dbo.BlogEntries.image2, dbo.BlogEntries.image3, dbo.BlogEntries.formSelector, dbo.BlogEntries.image1Alignment,
dbo.BlogEntries.image2Alignment, dbo.BlogEntries.image3Alignment, dbo.BlogEntries.blogEntryDisplayName, dbo.BlogEntries.published AS blog_entries_published,
dbo.BlogEntries.entered_by, dbo.BlogEntries.dateApproved, dbo.BlogEntries.approved_by, dbo.blog_entry_tracking.id AS blog_entry_tracking_id,
dbo.blog_entry_tracking.blog, dbo.blog_entry_tracking.blog_entry, dbo.BlogCategories.ID, dbo.BlogCategories.BlogCategoryName,
dbo.BlogCategories.BlogCategoryComments, dbo.BlogCategories.DateCreated, dbo.BlogCategories.BlogCategoryTitle, dbo.BlogCategories.BlogCategoryTemplate,
dbo.BlogCategories.inActive, dbo.BlogCategories.HtmlMetaDescription, dbo.BlogCategories.HtmlMetaKeywords, dbo.BlogCategories.entry_sort_order,
dbo.BlogCategories.per_page, dbo.BlogCategories.shorten_page_content, dbo.BlogCategories.BlogCategoryDisplayName, dbo.BlogCategories.published,
dbo.BlogCategories.blogParent
FROM dbo.BlogEntries LEFT OUTER JOIN
dbo.blog_entry_tracking ON dbo.BlogEntries.ID = dbo.blog_entry_tracking.blog_entry LEFT OUTER JOIN
dbo.BlogCategories ON dbo.blog_entry_tracking.blog = dbo.BlogCategories.ID
I have some records assigned to 2 different blog categories, and when I query everything, it returns duplicate records.
How do I return only 1 instance of a blog entry?
Try this one -
SELECT blog_entries_id = be.Id
, be.BlogTitle
, be.BlogEntry
, be.BlogName
, blog_entries_datecreated = be.DateCreated
, blog_entries_in_active = be.inActive
, blog_entries_html_meta_description = be.HtmlMetaDescription
, blog_entries_html_meta_keywords = be.HtmlMetaKeywords
, be.image1
, be.image2
, be.image3
, be.formSelector
, be.image1Alignment
, be.image2Alignment
, be.image3Alignment
, be.blogEntryDisplayName
, blog_entries_published = be.published
, be.entered_by
, be.dateApproved
, be.approved_by
, blog_entry_tracking_id = bet.Id
, bet.blog
, bet.blog_entry
, bc2.Id
, bc2.BlogCategoryName
, bc2.BlogCategoryComments
, bc2.DateCreated
, bc2.BlogCategoryTitle
, bc2.BlogCategoryTemplate
, bc2.inActive
, bc2.HtmlMetaDescription
, bc2.HtmlMetaKeywords
, bc2.entry_sort_order
, bc2.per_page
, bc2.shorten_page_content
, bc2.BlogCategoryDisplayName
, bc2.published
, bc2.blogParent
FROM dbo.BlogEntries be
-- keep a single tracking row (the lowest Id) per blog entry
OUTER APPLY (
SELECT TOP 1 t.*
FROM dbo.blog_entry_tracking t
WHERE t.blog_entry = be.Id
ORDER BY t.Id
) bet
LEFT JOIN dbo.BlogCategories bc2 ON bet.blog = bc2.Id
Also, note that in this case table aliases shorten the query considerably and make it easier to read.
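If you need just one record back in total, you can use
SELECT TOP 1 dbo.BlogEntries.ID AS blog_entries_id, dbo.Bl.... (same as you have now).
Note that this returns a single row overall, not one row per blog entry; it can be cheaper than SELECT DISTINCT because it skips the duplicate-elimination step.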
Here is a Northwind example.
It returns only 1 row from the Order Details table for each order: the one with the highest UnitPrice.
Use Northwind
GO
Select COUNT(*) from dbo.Orders
select COUNT(*) from dbo.[Order Details]
select * from dbo.Orders ord
join
(select ROW_NUMBER() OVER(PARTITION BY OrderID ORDER BY UnitPrice DESC) AS "MyRowID" , * from dbo.[Order Details] innerOD) derived1
on ord.OrderID = derived1.OrderID
Where
derived1.MyRowID = 1
Order by ord.OrderID
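The Northwind pattern above carries over to the blog tables from the question. This is a minimal sketch in SQLite (window functions need SQLite 3.25 or later) using hypothetical, trimmed versions of BlogEntries, blog_entry_tracking and BlogCategories. Which category "wins" for an entry with several (here, the one with the lowest tracking id) is an assumption; change the ORDER BY inside OVER() to suit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: entry 1 is tracked in two categories (the source of the
# duplicates), entry 2 in one, and entry 3 in none.
conn.executescript("""
CREATE TABLE BlogEntries (ID INT, BlogTitle TEXT);
CREATE TABLE blog_entry_tracking (id INT, blog INT, blog_entry INT);
CREATE TABLE BlogCategories (ID INT, BlogCategoryName TEXT);
INSERT INTO BlogEntries VALUES (1, 'First post'), (2, 'Second post'), (3, 'Draft');
INSERT INTO blog_entry_tracking VALUES (1, 10, 1), (2, 20, 1), (3, 20, 2);
INSERT INTO BlogCategories VALUES (10, 'News'), (20, 'Tips');
""")

rows = conn.execute("""
SELECT ID, BlogTitle, BlogCategoryName
FROM (
    SELECT be.ID, be.BlogTitle, bc.BlogCategoryName,
           -- number each entry's category rows; the lowest tracking id gets 1
           ROW_NUMBER() OVER (PARTITION BY be.ID ORDER BY bet.id) AS rn
    FROM BlogEntries be
    LEFT JOIN blog_entry_tracking bet ON be.ID = bet.blog_entry
    LEFT JOIN BlogCategories bc ON bet.blog = bc.ID
) numbered
WHERE rn = 1
ORDER BY ID
""").fetchall()
print(rows)
```

Every blog entry appears once, including entry 3, which has no category and keeps a NULL thanks to the LEFT JOINs.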

SQL Sub-select as field?

I'm a bit lost here...
I have several tables I'd like to pull a unified record from: Unit, Building, Owner, and Picture.
Here's my query so far:
SELECT building.`Street_Address`
, building.`Building_Name`
, building.`Building_Type`
, CONCAT(building.`Cross_Street_1`, ' & ', building.`Cross_Street_2`) Cross_Streets
, building.`Cross_Street_1`
, building.`Cross_Street_2`
, building.`Access` Building_Access
, owner.`Company_Name`
, owner.`Contact_Or_Reference`
, owner.`Landlord_Phone`
, picture.`Path_To_Picture_On_Server`
, picture.`Picture_Category`
, unit.`Apartment_Number`
, unit.`Unit_Size_Number` Size
, unit.`Is_Doorman`
, unit.`Is_Furnished`
, unit.`Is_Elevator`
, unit.`Is_Pets`
, unit.`Is_OutdoorSpace`
, unit.`Rent_Price`
, unit.`Baths`
, unit.`Access` Unit_Access
, unit.`fourd_id`
, unit.`Updated_Date`
, unit.`Occupancy_Date`
, unit.`Term`
, unit.`Incentives`
, unit.`Info_OutdoorSpace`
, unit.`List_Date`
, zone.`Description`
FROM 4D_Units unit
JOIN 4D_Building building
ON unit.`BUILDING_RecID` = building.`fourd_id`
JOIN 4D_Zones zone
ON building.`ZONES_RecID` = zone.`fourd_id`
LEFT JOIN 4D_Owners owner
ON unit.`OWNER_RecID` = owner.`fourd_id`
LEFT JOIN 4D_Building_Picts picture
ON (building.`fourd_id` = picture.`BUILDING_RecID` AND picture.`Picture_Category` = 'Front')
WHERE unit.`id` = 49901
This works fine as-is, except that the returned record will only ever have the "Front" picture in it (if present). My issue is that there are several different types of photos that could be associated with a returned record, including 'Panorama', 'Interior', and 'Floorplan'; all are possible values for picture.Picture_Category.
Is there a way to return those values (if they are present, as above) in the returned set without doing a separate query? I want the returned set to include (if present) aliased values for all four possible options of picture.Picture_Category: 'Front', 'Panorama', 'Interior', & 'Floorplan' (with their own unique picture.Path_To_Picture_On_Server associated with it).
Does that make sense?
If I understand you correctly, you want 4 sets of picture columns in your result set, one for each of the 4 categories? Right now you have just one, for Front, right?
You can join to the same table multiple times with different aliases and different join clauses. Just join to 4D_Building_Picts 4 times, once for each picture you want.
select
building.`Street_Address`
-- ...the other columns from your original query...
, pic_front.`Path_To_Picture_On_Server` AS Front_Path_To_Picture_On_Server
, pic_panorama.`Path_To_Picture_On_Server` AS Panorama_Path_To_Picture_On_Server
, pic_interior.`Path_To_Picture_On_Server` AS Interior_Path_To_Picture_On_Server
, pic_floorplan.`Path_To_Picture_On_Server` AS Floorplan_Path_To_Picture_On_Server
-- ...any remaining columns...
FROM 4D_Units unit
JOIN 4D_Building building
ON unit.`BUILDING_RecID` = building.`fourd_id`
JOIN 4D_Zones zone
ON building.`ZONES_RecID` = zone.`fourd_id`
LEFT JOIN 4D_Owners owner
ON unit.`OWNER_RecID` = owner.`fourd_id`
LEFT JOIN 4D_Building_Picts pic_front
ON (building.`fourd_id` = pic_front.`BUILDING_RecID` AND pic_front.`Picture_Category` = 'Front')
LEFT JOIN 4D_Building_Picts pic_panorama
ON (building.`fourd_id` = pic_panorama.`BUILDING_RecID` AND pic_panorama.`Picture_Category` = 'Panorama')
LEFT JOIN 4D_Building_Picts pic_interior
ON (building.`fourd_id` = pic_interior.`BUILDING_RecID` AND pic_interior.`Picture_Category` = 'Interior')
LEFT JOIN 4D_Building_Picts pic_floorplan
ON (building.`fourd_id` = pic_floorplan.`BUILDING_RecID` AND pic_floorplan.`Picture_Category` = 'Floorplan')
WHERE unit.`id` = 49901
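A minimal sketch of the "join the same table once per category" idea, run in SQLite with shortened, hypothetical table names (building, building_picts) and made-up rows. Each LEFT JOIN gets its own alias and puts its category filter in the ON clause, so a missing category simply comes back as NULL instead of removing the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: building 1 has Front and Floorplan pictures only.
conn.executescript("""
CREATE TABLE building (fourd_id INT, Street_Address TEXT);
CREATE TABLE building_picts (BUILDING_RecID INT, Picture_Category TEXT,
                             Path_To_Picture_On_Server TEXT);
INSERT INTO building VALUES (1, '1 Main St');
INSERT INTO building_picts VALUES (1, 'Front', '/img/1/front.jpg'),
                                  (1, 'Floorplan', '/img/1/plan.jpg');
""")

# One aliased LEFT JOIN per category; the category test lives in ON, not WHERE.
row = conn.execute("""
SELECT b.Street_Address,
       pf.Path_To_Picture_On_Server AS Front_Path,
       pp.Path_To_Picture_On_Server AS Panorama_Path,
       pin.Path_To_Picture_On_Server AS Interior_Path,
       pl.Path_To_Picture_On_Server AS Floorplan_Path
FROM building b
LEFT JOIN building_picts pf
  ON b.fourd_id = pf.BUILDING_RecID AND pf.Picture_Category = 'Front'
LEFT JOIN building_picts pp
  ON b.fourd_id = pp.BUILDING_RecID AND pp.Picture_Category = 'Panorama'
LEFT JOIN building_picts pin
  ON b.fourd_id = pin.BUILDING_RecID AND pin.Picture_Category = 'Interior'
LEFT JOIN building_picts pl
  ON b.fourd_id = pl.BUILDING_RecID AND pl.Picture_Category = 'Floorplan'
WHERE b.fourd_id = 1
""").fetchone()
print(row)
```

One caveat: if a building can have more than one picture in the same category, each of these joins multiplies the result rows, so this pattern assumes at most one picture per category.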
I think you want the COALESCE function. It takes multiple arguments and returns the first one that's non-null. So something like:
Select
Coalesce(A.Panorama, A.Interior, A.Floorplan, '') as ImagePath
From
Table A
You only end up with one value this way, though, which may not be what you're after. If you want all of them, I'd suggest using correlated subqueries, like so:
Select
(Select P.Path_To_Picture_On_Server From 4D_Building_Picts P where P.BUILDING_RecID = B.fourd_id And P.Picture_Category = 'Front' Limit 1) as Front_Pic,
(Select P.Path_To_Picture_On_Server From 4D_Building_Picts P where P.BUILDING_RecID = B.fourd_id And P.Picture_Category = 'Panorama' Limit 1) as Panorama_Pic,
(Select P.Path_To_Picture_On_Server From 4D_Building_Picts P where P.BUILDING_RecID = B.fourd_id And P.Picture_Category = 'Floorplan' Limit 1) as Floorplan_Pic,
...
From
4D_Building B
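A minimal sketch of the correlated-subquery approach, run in SQLite with shortened, hypothetical table names and made-up rows. In MySQL a scalar subquery that finds two rows fails with "Subquery returns more than 1 row"; SQLite silently takes the first row instead, so the ORDER BY plus LIMIT 1 here mainly makes the choice explicit and deterministic. The category values are passed as bound parameters rather than pasted into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: building 1 has two Front pictures and one Panorama.
conn.executescript("""
CREATE TABLE building (fourd_id INT);
CREATE TABLE building_picts (BUILDING_RecID INT, Picture_Category TEXT,
                             Path_To_Picture_On_Server TEXT);
INSERT INTO building VALUES (1);
INSERT INTO building_picts VALUES (1, 'Front', '/img/front_b.jpg'),
                                  (1, 'Front', '/img/front_a.jpg'),
                                  (1, 'Panorama', '/img/pano.jpg');
""")


def pic_column(alias: str) -> str:
    # One correlated scalar subquery per category; the category is bound via
    # the ? placeholder, and ORDER BY + LIMIT 1 yields at most one path.
    return (
        "(SELECT p.Path_To_Picture_On_Server FROM building_picts p"
        " WHERE p.BUILDING_RecID = b.fourd_id AND p.Picture_Category = ?"
        " ORDER BY p.Path_To_Picture_On_Server LIMIT 1) AS " + alias
    )


categories = ["Front", "Panorama", "Interior", "Floorplan"]
select_list = ", ".join(pic_column(c + "_Pic") for c in categories)
sql = "SELECT b.fourd_id, " + select_list + " FROM building b WHERE b.fourd_id = 1"
# Placeholders are numbered left to right, matching the order of categories.
result = conn.execute(sql, categories).fetchone()
print(result)
```

Categories with no picture come back as NULL, and the duplicated Front category resolves to a single, predictable path.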