Query is summing values multiple times - sql

Hi, my query below is summing values multiple times, based on the crop seasons in my table. Since I have 4 crop seasons (crop season is 1, 2, 3 or 4), it seems to be multiplying the values by 4. All I want is the values for one crop season. Can anyone assist? I have crop season in both tables.
With Summary as (
Select B_NAME as Branch, LOC as Location
,SUM(payment) as Gallons
,SUM(case when printed = 1 THEN Fee ELSE NULL END) as FeeCollected
,SUM(case when printed = 0 THEN Fee ELSE NULL END) as FeeNotCollected
,SUM(case when printed = 1 THEN Payment ELSE NULL END) as GallonsIssued
,SUM(case when printed = 0 THEN Payment ELSE NULL END) as GallonsNotIssued
From SicbWeeklyDeliveriesFuelArchive F Inner Join FarmerGroups G ON G.BSI_CODE = F.BSI_CODE
Where F.CROP_SEASON = @cropseason
Group By B_NAME, LOC
)
SELECT Branch
,Location
,Gallons
,GallonsIssued
,GallonsNotIssued
,FeeCollected
,FeeNotCollected
,((GallonsIssued/Gallons) * 100) as pct_GallonsCollected
FROM Summary
Order by Location, Branch
SicbWeeklyDeliveriesFuelArchive
+-------+----------+-------------+-----+---------+------+-------------+---------+
| ID | BSI_CODE | B_NAME | LOC | PAYMENT | FEE | CROP_SEASON | PRINTED |
+-------+----------+-------------+-----+---------+------+-------------+---------+
| 18735 | 2176 | SAN NARCISO | CZ | 85 | 8.5 | 4 | 0 |
| 18738 | 2176 | SAN NARCISO | CZ | 65 | 6.5 | 4 | 0 |
| 18739 | 10494 | SAN NARCISO | CZ | 85 | 8.5 | 3 | 0 |
+-------+----------+-------------+-----+---------+------+-------------+---------+
FarmerGroups
+-------+----------+-------------+-------------+
| ID | BSI_CODE | CROP_SEASON | BRANCH |
+-------+----------+-------------+-------------+
| 10473 | 2176 | 4 | SAN NARCISO |
| 11478 | 2176 | 3 | SAN NARCISO |
| 12787 | 10494 | 4 | SAN ROMAN |
+-------+----------+-------------+-------------+

It seems your join criteria are incomplete. The tables share BSI_CODE and CROP_SEASON, so I guess you want:
FROM sicbweeklydeliveriesfuelarchive f
JOIN farmergroups g ON g.bsi_code = f.bsi_code AND g.crop_season = f.crop_season
WHERE f.crop_season = @cropseason
But that's just guessing. Only you know how the tables are really related, what their rows represent, what columns make a row unique and what result you are actually after. Why do you join farmergroups at all? It looks like you are not really using the table in your query.
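To see why the partial join inflates the sums, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (via Python's sqlite3 module, used here only as a stand-in for the asker's database) and compares both join conditions. The helper name total_gallons and the column types are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SicbWeeklyDeliveriesFuelArchive (
    ID INTEGER, BSI_CODE INTEGER, B_NAME TEXT, LOC TEXT,
    PAYMENT REAL, FEE REAL, CROP_SEASON INTEGER, PRINTED INTEGER);
CREATE TABLE FarmerGroups (
    ID INTEGER, BSI_CODE INTEGER, CROP_SEASON INTEGER, BRANCH TEXT);
INSERT INTO SicbWeeklyDeliveriesFuelArchive VALUES
    (18735, 2176,  'SAN NARCISO', 'CZ', 85, 8.5, 4, 0),
    (18738, 2176,  'SAN NARCISO', 'CZ', 65, 6.5, 4, 0),
    (18739, 10494, 'SAN NARCISO', 'CZ', 85, 8.5, 3, 0);
INSERT INTO FarmerGroups VALUES
    (10473, 2176,  4, 'SAN NARCISO'),
    (11478, 2176,  3, 'SAN NARCISO'),
    (12787, 10494, 4, 'SAN ROMAN');
""")


def total_gallons(join_condition, crop_season):
    """Sum PAYMENT for one crop season using the given join condition (hypothetical helper)."""
    sql = f"""
        SELECT SUM(F.PAYMENT)
        FROM SicbWeeklyDeliveriesFuelArchive F
        JOIN FarmerGroups G ON {join_condition}
        WHERE F.CROP_SEASON = ?
    """
    return conn.execute(sql, (crop_season,)).fetchone()[0]


# Joining on BSI_CODE alone pairs each fuel row with every season of its group.
partial_join = total_gallons("G.BSI_CODE = F.BSI_CODE", 4)
# Adding CROP_SEASON to the join keeps one group row per fuel row.
full_join = total_gallons(
    "G.BSI_CODE = F.BSI_CODE AND G.CROP_SEASON = F.CROP_SEASON", 4)
print(partial_join, full_join)
```

With this sample data, group 2176 has two FarmerGroups rows (seasons 3 and 4), so the partial join counts each season-4 delivery twice.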

Related

Duplicate records upon joining table

I am still very new to SQL and Tableau; however, I am trying to work towards completing a personal project of mine.
Table A shows the defect quantity per product category and when it was raised
+--------+-------------+--------------+-----------------+
| Issue# | Date_Raised | Category_ID# | Defect_Quantity |
+--------+-------------+--------------+-----------------+
| PCR12 | 11-Jan-2019 | Product#1 | 14 |
| PCR13 | 12-Jan-2019 | Product#1 | 54 |
| PCR14 | 5-Feb-2019 | Product#1 | 5 |
| PCR15 | 5-Feb-2019 | Product#2 | 7 |
| PCR16 | 20-Mar-2019 | Product#1 | 76 |
| PCR17 | 22-Mar-2019 | Product#2 | 5 |
| PCR18 | 25-Mar-2019 | Product#1 | 89 |
+--------+-------------+--------------+-----------------+
Table B shows the consumption quantity of each product by month
+-------------+--------------+-------------------+
| Date_Raised | Category_ID# | Consumed_Quantity |
+-------------+--------------+-------------------+
| 5-Jan-2019 | Product#1 | 100 |
| 17-Jan-2019 | Product#1 | 200 |
| 5-Feb-2019 | Product#1 | 100 |
| 8-Feb-2019 | Product#2 | 50 |
| 10-Mar-2019 | Product#1 | 100 |
| 12-Mar-2019 | Product#2 | 50 |
+-------------+--------------+-------------------+
END RESULT
I would like to create a table/bar chart in Tableau that shows Defect_Quantity/Consumed_Quantity per month, per Category_ID#, so something like this:
+----------+-----------+-----------+
| Month | Product#1 | Product#2 |
+----------+-----------+-----------+
| Jan-2019 | 23% | |
| Feb-2019 | 5% | 14% |
| Mar-2019 | 89% | 10% |
+----------+-----------+-----------+
WHAT I HAVE TRIED SO FAR
Unfortunately I have not really done anything yet. I am struggling to understand how to get rid of the duplicates that appear when joining the tables on Category_ID#.
Appreciate all the help I can receive here.
One option is to left join monthly consumption totals for Product#1 and for Product#2:
select to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy') as month_raised
, (p1.product1 - sum(case when t1.category_id='Product#1' then t1.Defect_Quantity else 0 end))/p1.product1 * 100 as product1_pct
, (p2.product2 - sum(case when t1.category_id='Product#2' then t1.Defect_Quantity else 0 end))/p2.product2 * 100 as product2_pct
from tableA t1
left join
-- monthly consumption of Product#1
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Consumed_Quantity) as product1
from tableB
where category_id = 'Product#1'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p1
on p1.Date_Raised = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
left join
-- monthly consumption of Product#2
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Consumed_Quantity) as product2
from tableB
where category_id = 'Product#2'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p2
on p2.Date_Raised = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
group by to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy'), p1.product1, p2.product2
You can remove duplicate rows by using ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS rn and keeping only the rows where rn = 1. As for your end result, extract the month from the date and use PIVOT to get one column per product.
I would do this as:
select to_char(date_raised, 'YYYY-MM'),
(sum(case when product = 'Product#1' then defect_quantity end) /
sum(case when product = 'Product#1' then consumed_quantity end)
) as product1,
(sum(case when product = 'Product#2' then defect_quantity end) /
sum(case when product = 'Product#2' then consumed_quantity end)
) as product2
from ((select date_raised, product, defect_quantity, 0 as consumed_quantity
from a
) union all
(select date_raised, product, 0 as defect_quantity, consumed_quantity
from b
)
) ab
group by to_char(date_raised, 'YYYY-MM')
order by min(date_raised);
(I changed the date format because I much prefer YYYY-MM, but that is irrelevant to the logic.)
Why do I prefer this method? It includes every month that has a row in either table. I don't have to worry that some months are inadvertently filtered out because production or defect rows are missing for that month.
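As a rough check of the UNION ALL approach, the sketch below runs the same idea against the question's sample data in an in-memory SQLite database through Python's sqlite3 module. Several things are assumptions for the illustration: the dates are stored as ISO strings, the month is taken with substr instead of to_char, the table and column names follow the answer (a, b, product), and NULLIF is added so that a month with defects but no consumption yields NULL instead of a division error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (date_raised TEXT, product TEXT, defect_quantity INTEGER);
CREATE TABLE b (date_raised TEXT, product TEXT, consumed_quantity INTEGER);
INSERT INTO a VALUES
    ('2019-01-11', 'Product#1', 14), ('2019-01-12', 'Product#1', 54),
    ('2019-02-05', 'Product#1', 5),  ('2019-02-05', 'Product#2', 7),
    ('2019-03-20', 'Product#1', 76), ('2019-03-22', 'Product#2', 5),
    ('2019-03-25', 'Product#1', 89);
INSERT INTO b VALUES
    ('2019-01-05', 'Product#1', 100), ('2019-01-17', 'Product#1', 200),
    ('2019-02-05', 'Product#1', 100), ('2019-02-08', 'Product#2', 50),
    ('2019-03-10', 'Product#1', 100), ('2019-03-12', 'Product#2', 50);
""")

# Stack both tables into one row set, then aggregate once per month.
# 100.0 forces floating-point division; NULLIF avoids dividing by zero.
rows = conn.execute("""
    SELECT substr(date_raised, 1, 7) AS month,
           ROUND(100.0 * SUM(CASE WHEN product = 'Product#1' THEN defect_quantity END)
                 / NULLIF(SUM(CASE WHEN product = 'Product#1' THEN consumed_quantity END), 0), 1)
               AS product1_pct,
           ROUND(100.0 * SUM(CASE WHEN product = 'Product#2' THEN defect_quantity END)
                 / NULLIF(SUM(CASE WHEN product = 'Product#2' THEN consumed_quantity END), 0), 1)
               AS product2_pct
    FROM (SELECT date_raised, product, defect_quantity, 0 AS consumed_quantity FROM a
          UNION ALL
          SELECT date_raised, product, 0, consumed_quantity FROM b) ab
    GROUP BY substr(date_raised, 1, 7)
    ORDER BY month
""").fetchall()

for month, product1_pct, product2_pct in rows:
    print(month, product1_pct, product2_pct)
```

Because no join is involved, no row from either table can be duplicated, which is the problem the question started from.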

SQL: Cascading conditions on Join

I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches try match on item_name
3: if no item_id or item_name matches try match on item_division
4: if no item_division matches, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- name
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved joining the table onto itself multiple times, and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
It leaves out entries which have no owner, as in the result below (id=3 is missing because it has no owner)
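| Row | id | item_id | item_name | item_division | units | revenue | owner |
|-----|----|---------|-----------|---------------|-------|---------|-------|
| 1 | 1 | xyz | pen | UK | 10 | 100 | bob |
| 2 | 2 | pqr | cat | US | 15 | 120 | dave |
| 3 | 4 | xcv | hat | UK | 11 | 140 | fred |
| 4 | 5 | bnm | cow | UK | 14 | 150 | alice |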
I am using the following query (thanks @Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
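The COALESCE query above can be checked against the question's sample data. This is only a sketch run in an in-memory SQLite database through Python's sqlite3 module, not in BigQuery, so it says nothing about BigQuery performance; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datapoints (
    id INTEGER, item_id TEXT, item_name TEXT, item_division TEXT,
    units INTEGER, revenue INTEGER);
CREATE TABLE identifiers (id INTEGER, type TEXT, code TEXT, owner TEXT);
INSERT INTO datapoints VALUES
    (1, 'xyz', 'pen', 'UK', 10, 100),
    (2, 'pqr', 'cat', 'US', 15, 120),
    (3, 'asd', 'dog', 'US', 12, 105),
    (4, 'xcv', 'hat', 'UK', 11, 140),
    (5, 'bnm', 'cow', 'UK', 14, 150);
INSERT INTO identifiers VALUES
    (1, 'id', 'xyz', 'bob'),
    (2, 'name', 'cat', 'dave'),
    (3, 'division', 'UK', 'alice'),
    (4, 'name', 'pen', 'erica'),
    (5, 'id', 'xcv', 'fred');
""")

# One LEFT JOIN per identifier type; COALESCE picks the first non-NULL owner,
# which encodes the priority id > name > division.
owners = conn.execute("""
    SELECT a.id, COALESCE(b.owner, c.owner, d.owner) AS owner
    FROM datapoints a
    LEFT JOIN identifiers b ON a.item_id = b.code AND b.type = 'id'
    LEFT JOIN identifiers c ON a.item_name = c.code AND c.type = 'name'
    LEFT JOIN identifiers d ON a.item_division = d.code AND d.type = 'division'
    ORDER BY a.id
""").fetchall()
print(owners)
```

Unlike the inner-join variant, row 3 is kept with a NULL owner. Note that this only returns one row per datapoint if each (type, code) pair appears at most once in identifiers.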
I'm not sure whether BigQuery currently optimizes a query like this, but at least you would be writing a query that strongly hints that the subqueries should not run when they are not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello

Joining tables with multiple conditions

I have two tables I would like to combine and set multiple conditions to pull out the desired data:
Table 1: [Folder1].[Name].[Info]
|---------------------|------------------|------------------|
| NameID | Gender | DoB |
|---------------------|------------------|------------------|
| 1 | M | 19800909 |
|---------------------|------------------|------------------|
| 2 | M | 19620102 |
|---------------------|------------------|------------------|
| 3 | F | 19501012 |
|---------------------|------------------|------------------|
| 4 | F | 19900212 |
|---------------------|------------------|------------------|
| 5 | M | 19501010 |
|---------------------|------------------|------------------|
And Table 2: [Folder1].[Body].[Sign]
|----------------|------------|------------|------------|
| NameID | BODYID | Category | Result |
|----------------|------------|------------|------------|
| 1 | 80000001 | Height | 178 |
|----------------|------------|------------|------------|
| 1 | 80000002 | Waist | 32 |
|----------------|------------|------------|------------|
| 1 | 80000003 | weight | 78 |
|----------------|------------|------------|------------|
| 1 | 80000003 | weight | 85 |
|----------------|------------|------------|------------|
| 2 | 80000001 | height | 170 |
|----------------|------------|------------|------------|
| 2 | 80000002 | waist | 30 |
|----------------|------------|------------|------------|
| 2 | 80000003 | weight | 78 |
|----------------|------------|------------|------------|
| 2 | 80000003 | weight | 80 |
|----------------|------------|------------|------------|
| 2 | 80000003 | weight | 76 |
|----------------|------------|------------|------------|
| 3 | 80000001 | height | 168 |
|----------------|------------|------------|------------|
| 4 | 80000001 | height | 170 |
|----------------|------------|------------|------------|
| 5 | 80000001 | height | 171 |
|----------------|------------|------------|------------|
I want to combine the two tables with conditions so that the combined table has the top 50 rows of NameID, Gender, DoB, BodyID, Category, Result for people with DoB before 19900101, showing only height and weight data, and only for people with 3 or more weight records.
The current SQL code I have right now is:
SELECT TOP 50 [Info].[NameID]
,[Gender]
,[DoB]
,[BodyID]
,[Category]
,[Result]
FROM [Folder1].[Name].[Info] LEFT JOIN [Folder1].[Body].[Sign]
ON [Info].[NameID] = [Sign].[NameID]
WHERE ([DoB] < '19900101')
AND ([Category] = 'Weight' OR [Category] = 'Height')
AND [Category] IN (SELECT Count(case when [BODYID] = 80000003 then 1 else null end) FROM [Folder1].[Body].[Sign] GROUP BY [Category] HAVING COUNT([BODYID]) >2)
ORDER BY [NameID]
The query executes successfully and a table shows up, but it contains no rows. I have a feeling that something is wrong with the 'count' section, but I can't figure out what.
What I am hoping to get as a result is something like:
|------------|------------|------------|------------|--------|--------|
| NameID | Gender | DoB | BODYID |Category|Result |
|------------|------------|------------|------------|--------|--------|
| 2 | M | 19620102 | 80000001 |Height | 170 |
|------------|------------|------------|------------|--------|--------|
| 2 | M | 19620102 | 80000003 |Weight | 78 |
|------------|------------|------------|------------|--------|--------|
| 2 | M | 19620102 | 80000003 |Weight | 80 |
|------------|------------|------------|------------|--------|--------|
| 2 | M | 19620102 | 80000003 |Weight | 76 |
|------------|------------|------------|------------|--------|--------|
Thanks in advance.
When you LEFT JOIN a table and then put a condition on one of that table's columns in the WHERE clause, you effectively turn the join into an INNER JOIN. For rows with no match, the joined columns are NULL, so the WHERE condition does not evaluate to true and those rows are discarded from the output.
I'm not going to follow all the logic in your WHERE clause, but I've moved one condition into the JOIN clause and wrapped the "complicated" condition in brackets with OR [Category] IS NULL, so that rows are still returned when the LEFT JOIN finds no match (in which case [Category] is NULL).
SELECT TOP 50 [Info].[NameID]
,[Gender]
,[DoB]
,[BodyID]
,[Category]
,[Result]
FROM [Folder1].[Name].[Info] LEFT JOIN [Folder1].[Body].[Sign]
ON [Info].[NameID] = [Sign].[NameID] AND [Sign].[Category] IN ('Weight', 'Height')
WHERE [DoB] < '19900101'
AND ( [Category] IN ( ... ) OR [Category] IS NULL )
ORDER BY [NameID]
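The difference between filtering in WHERE and filtering in ON can be seen on a small example. The sketch below uses an in-memory SQLite database through Python's sqlite3 module with simplified, hypothetical Info and Sign tables (fewer columns and rows than in the question) and a filter on 'weight' only, so that some people have no matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Info (NameID INTEGER, Gender TEXT);
CREATE TABLE Sign (NameID INTEGER, Category TEXT, Result INTEGER);
INSERT INTO Info VALUES (1, 'M'), (2, 'M'), (3, 'F');
INSERT INTO Sign VALUES
    (1, 'height', 178), (1, 'weight', 78),
    (2, 'height', 170),
    (3, 'height', 168);
""")

# Filter in WHERE: unmatched rows have Category = NULL, fail the test,
# and disappear, so the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT i.NameID, s.Result
    FROM Info i
    LEFT JOIN Sign s ON s.NameID = i.NameID
    WHERE s.Category = 'weight'
    ORDER BY i.NameID
""").fetchall()

# Filter in ON: every Info row survives; people without a weight row get NULL.
on_rows = conn.execute("""
    SELECT i.NameID, s.Result
    FROM Info i
    LEFT JOIN Sign s ON s.NameID = i.NameID AND s.Category = 'weight'
    ORDER BY i.NameID
""").fetchall()

print(where_rows)
print(on_rows)
```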
I'm not sure that I follow the whole question, but this definitely doesn't look right:
AND [Category] IN (SELECT Count(case when [BODYID] = 80000003 then 1 else null end)
FROM [Folder1].[Body].[Sign]
GROUP BY [Category]
HAVING COUNT([BODYID]) > 3
)
I don't fully follow the logic, but I could imagine that you want:
AND [Category] IN (SELECT [Category]
FROM [Folder1].[Body].[Sign]
GROUP BY [Category]
HAVING COUNT([BODYID]) > 3
)
or perhaps:
AND [Category] IN (SELECT [Category]
FROM [Folder1].[Body].[Sign]
GROUP BY [Category]
HAVING SUM(case when [BODYID] = 80000003 then 1 else 0 end) > 3
)
It looks like you are not getting results because of the SELECT COUNT subquery in your WHERE clause. The code below shows the problem area.
AND [Category] IN (SELECT Count(case when [BODYID] = 80000003 then 1 else null end)
FROM [Folder1].[Body].[Sign] GROUP BY [Category] HAVING COUNT([BODYID]) >3)
The problem is that you are comparing a category, which appears to be a varchar, with a count, which returns an int. So if the subquery returns a count of 10, it will not match any of your categories.
This is likely why you see no results: none of your categories equals a count returned by the subquery.
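One way to express "only people with 3 or more weight records" is to count per NameID in a subquery and filter on NameID instead of Category. The following is a sketch of that idea, not the asker's final query: it runs in an in-memory SQLite database through Python's sqlite3 module, uses short table names Info and Sign, LIMIT 50 in place of TOP 50, and LOWER() because the sample data mixes 'Height'/'height' and SQLite string comparison is case-sensitive by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Info (NameID INTEGER, Gender TEXT, DoB TEXT);
CREATE TABLE Sign (NameID INTEGER, BodyID INTEGER, Category TEXT, Result INTEGER);
INSERT INTO Info VALUES
    (1, 'M', '19800909'), (2, 'M', '19620102'), (3, 'F', '19501012'),
    (4, 'F', '19900212'), (5, 'M', '19501010');
INSERT INTO Sign VALUES
    (1, 80000001, 'Height', 178), (1, 80000002, 'Waist', 32),
    (1, 80000003, 'weight', 78),  (1, 80000003, 'weight', 85),
    (2, 80000001, 'height', 170), (2, 80000002, 'waist', 30),
    (2, 80000003, 'weight', 78),  (2, 80000003, 'weight', 80),
    (2, 80000003, 'weight', 76),
    (3, 80000001, 'height', 168), (4, 80000001, 'height', 170),
    (5, 80000001, 'height', 171);
""")

result = conn.execute("""
    SELECT i.NameID, i.Gender, i.DoB, s.BodyID, s.Category, s.Result
    FROM Info i
    JOIN Sign s ON s.NameID = i.NameID
    WHERE i.DoB < '19900101'
      AND LOWER(s.Category) IN ('height', 'weight')
      -- the subquery returns NameIDs, so it is compared with NameID
      AND i.NameID IN (SELECT NameID
                       FROM Sign
                       WHERE LOWER(Category) = 'weight'
                       GROUP BY NameID
                       HAVING COUNT(*) >= 3)
    ORDER BY i.NameID, s.BodyID
    LIMIT 50
""").fetchall()

for row in result:
    print(row)
```

With the sample data only NameID 2 has three weight rows, so the output contains that person's height row and three weight rows.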

Joining 3 tables on a date criteria

I need help with this.
Tbl: WarehouseInventory
Date | DelRec | ProductId | Quantity
2015-09-10 | 110 | 1 | 100
2015-09-12 | 111 | 1 | 100
2015-09-12 | 111 | 2 | 200
2015-09-12 | 111 | 3 | 300
Tbl: Withdrawals
Date | ID | ProductId | Quantity | CustomerId
2015-09-11 | 1 | 1 | 400 | 2
2015-09-12 | 1 | 1 | 100 | 1
2015-09-12 | 2 | 2 | 200 | 1
2015-09-12 | 3 | 3 | 300 | 1
Tbl: Customers
Customer Id | Name
1 | Somebody
2 | Someone
The output should be like this
DelRec | Date Added | ProductId | Stocked | Withdrawn | Customer
110 | 2015-09-10 | 1 | 100 | 0 | NULL
0 | 2015-09-11 | 1 | 0 | 400 | Someone
111 | 2015-09-12 | 1 | 100 | 100 | Somebody
111 | 2015-09-12 | 2 | 200 | 200 | Somebody
111 | 2015-09-12 | 3 | 300 | 300 | Somebody
This is what I have come up with so far, and it's giving me the wrong output
select wi.DateAdded as 'Date Added', max(wi.DeliveryReceipt) as 'Delivery Receipt', wi.ProductId as 'Product',
max(isnull(wi.Quantity, 0)) as 'Stocked', max(isnull(w.Quantity, 0)) as 'Withdrawn', e.Customers as 'Customer'
from WarehouseInventory wi
cross join Withdrawals w
cross join Customer e
group by wi.DateAdded, wi.ProductId, e.Customers, wi.DeliveryReceipt, w.ProductId
Basically, I need to join the two tables on the date and product and if there is a null value in one of the tables, just make it 0. I appreciate your help.
You can use a FULL OUTER JOIN:
SELECT COALESCE(wi.DelRec, 0) AS DelRec,
COALESCE(wi.[Date], wd.[Date]) AS Date_Added,
COALESCE(wi.ProductId, wd.ProductId) AS ProductId,
COALESCE(wi.Quantity, 0) AS Stocked,
COALESCE(wd.Quantity, 0) AS Withdrawn,
c.Name AS Customer
FROM WarehouseInventory AS wi
FULL OUTER JOIN Withdrawals AS wd
ON wi.[Date] = wd.[Date] AND wi.ProductId = wd.ProductId
LEFT JOIN Customers AS c ON c.[Customer Id] = wd.CustomerId
ORDER BY Date_Added
You have a few inconsistencies between your example table and your query, but here's the basic gist:
You want to FULL OUTER JOIN your Warehouse delivery (A) and Withdrawal (B) tables on both product and date
Make sure you coalesce(A, B) for both date and product
Sum the quantities from each table, then coalesce outside each aggregate to get zeros (since one column can be all nulls).
Here:
select
coalesce(wi.DateAdded, w.date) as 'Date Added',
max(wi.DeliveryReceipt) as 'Delivery Receipt',
coalesce(wi.ProductId, w.productId) as 'Product',
coalesce(sum(wi.Quantity), 0) as 'Stocked',
coalesce(sum(w.Quantity), 0) as 'Withdrawn',
e.name as 'Customer'
from WarehouseInventory wi
full outer join Withdrawals w on w.date = wi.dateadded and w.productId = wi.productId
left join Customer e on e.customerId = w.customerId
group by
coalesce(wi.DateAdded, w.date),
coalesce(wi.ProductId, w.productId),
e.name
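To check the full-outer-join idea against the sample data without a SQL Server instance, the sketch below uses an in-memory SQLite database through Python's sqlite3 module. Since older SQLite builds do not support FULL OUTER JOIN, it swaps in an emulation: a LEFT JOIN from the inventory side plus a UNION ALL of the withdrawals that have no matching inventory row. The column name CustomerId (without the space) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WarehouseInventory (Date TEXT, DelRec INTEGER, ProductId INTEGER, Quantity INTEGER);
CREATE TABLE Withdrawals (Date TEXT, ID INTEGER, ProductId INTEGER, Quantity INTEGER, CustomerId INTEGER);
CREATE TABLE Customers (CustomerId INTEGER, Name TEXT);
INSERT INTO WarehouseInventory VALUES
    ('2015-09-10', 110, 1, 100), ('2015-09-12', 111, 1, 100),
    ('2015-09-12', 111, 2, 200), ('2015-09-12', 111, 3, 300);
INSERT INTO Withdrawals VALUES
    ('2015-09-11', 1, 1, 400, 2), ('2015-09-12', 1, 1, 100, 1),
    ('2015-09-12', 2, 2, 200, 1), ('2015-09-12', 3, 3, 300, 1);
INSERT INTO Customers VALUES (1, 'Somebody'), (2, 'Someone');
""")

report = conn.execute("""
    -- Part 1: every inventory row, with its withdrawal if one exists.
    SELECT wi.DelRec AS DelRec, wi.Date AS DateAdded, wi.ProductId AS ProductId,
           wi.Quantity AS Stocked, COALESCE(wd.Quantity, 0) AS Withdrawn,
           c.Name AS Customer
    FROM WarehouseInventory wi
    LEFT JOIN Withdrawals wd
           ON wd.Date = wi.Date AND wd.ProductId = wi.ProductId
    LEFT JOIN Customers c ON c.CustomerId = wd.CustomerId
    UNION ALL
    -- Part 2: withdrawals with no inventory row on that date and product.
    SELECT 0, wd.Date, wd.ProductId, 0, wd.Quantity, c.Name
    FROM Withdrawals wd
    LEFT JOIN Customers c ON c.CustomerId = wd.CustomerId
    WHERE NOT EXISTS (SELECT 1
                      FROM WarehouseInventory wi
                      WHERE wi.Date = wd.Date AND wi.ProductId = wd.ProductId)
    ORDER BY 2, 3
""").fetchall()

for row in report:
    print(row)
```

On a database with native FULL OUTER JOIN support, the single-join form shown in the answers is shorter and should give the same rows for this data.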

Dynamic columns in sql Join condition

Customer
customer_id | customer_name | customer_city | customer_number
---------------------------------------------------------------
1 | john | sanjose | 978234
2 | chris | newyork | 293
3 | mary | madrid | 342943
4 | tom | bangkok | 8627093
---------------------------------------------------------------
Data
data_id | data_name | data_city | data_number | data_cust_id | customer_id
--------------------------------------------------------------------------------------------
1 | abc | xyz | 990 | 1 | NULL
2 | john | sanjose | 978234 | 1 | NULL
3 | mary | madrid | 8627093 | 3 | NULL
4 | tom | LA | 7729 | 4 | NULL
ActionType
action_id | action_description
-----------------------------------
1 | customer_name
2 | customer_number
3 | customer_city
DataToAction
id | data_id | action_id
--------------------------
1 | 1 | 1
2 | 1 | 2
4 | 2 | 1
5 | 2 | 2
6 | 2 | 3
7 | 3 | 1
8 | 3 | 2
9 | 4 | 1
There are 4 tables -
Customer - Has customer details
Data - Raw data pulled from an external source (has customer data and others)
ActionType - Has the column names which will be used in a join condition
DataToAction - For each of the raw data row in Data table, the columns to be used in the join is specified here.
Objective - To populate customer_id in 'Data' table.
I need something like this
UPDATE D
SET D.customer_id = C.customer_id
FROM Data D
INNER JOIN Customer C on D.data_cust_id = C.customer_id
WHERE *("GET THE COLUMNS TO BE MATCHED FROM DATATOACTION TABLE AND USE HERE")*
For example, for data id 1 I will update customer_id based on customer_name and customer_number; for data id 2 I will update customer_id based on customer_name, customer_number and customer_city; and so on.
How do I apply these dynamic column conditions in the WHERE clause for each row, when the columns to be matched are specified in a different table?
The question is quite unclear. Can you elaborate on the final result set?
What is the purpose of the ActionType table?
UPDATE D
SET D.customer_id = C.customer_id
FROM Data D
INNER JOIN Customer C on D.data_cust_id = C.customer_id
INNER JOIN DataToAction DA ON DA.data_id = D.data_id
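One possible reading of the requirement is: set customer_id only when every column listed for that data row in DataToAction matches the customer. Under that assumption, the rule can be written without dynamic SQL by mapping each action_description to its comparison in a CASE expression and rejecting a customer if any required comparison fails. The sketch below tries this on the sample data in an in-memory SQLite database through Python's sqlite3 module; it uses a correlated subquery instead of the UPDATE ... FROM syntax, and the interpretation itself is an assumption, not something the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER, customer_name TEXT,
                       customer_city TEXT, customer_number INTEGER);
CREATE TABLE Data (data_id INTEGER, data_name TEXT, data_city TEXT,
                   data_number INTEGER, data_cust_id INTEGER, customer_id INTEGER);
CREATE TABLE ActionType (action_id INTEGER, action_description TEXT);
CREATE TABLE DataToAction (id INTEGER, data_id INTEGER, action_id INTEGER);
INSERT INTO Customer VALUES
    (1, 'john', 'sanjose', 978234), (2, 'chris', 'newyork', 293),
    (3, 'mary', 'madrid', 342943),  (4, 'tom', 'bangkok', 8627093);
INSERT INTO Data VALUES
    (1, 'abc', 'xyz', 990, 1, NULL),     (2, 'john', 'sanjose', 978234, 1, NULL),
    (3, 'mary', 'madrid', 8627093, 3, NULL), (4, 'tom', 'LA', 7729, 4, NULL);
INSERT INTO ActionType VALUES
    (1, 'customer_name'), (2, 'customer_number'), (3, 'customer_city');
INSERT INTO DataToAction VALUES
    (1, 1, 1), (2, 1, 2), (4, 2, 1), (5, 2, 2), (6, 2, 3),
    (7, 3, 1), (8, 3, 2), (9, 4, 1);
""")

# A customer qualifies when no required column comparison fails.
# COALESCE(..., 0) treats NULL or unknown action names as a failed match.
conn.execute("""
    UPDATE Data
    SET customer_id = (
        SELECT c.customer_id
        FROM Customer c
        WHERE c.customer_id = Data.data_cust_id
          AND NOT EXISTS (
              SELECT 1
              FROM DataToAction da
              JOIN ActionType t ON t.action_id = da.action_id
              WHERE da.data_id = Data.data_id
                AND COALESCE(CASE t.action_description
                        WHEN 'customer_name'   THEN c.customer_name   = Data.data_name
                        WHEN 'customer_number' THEN c.customer_number = Data.data_number
                        WHEN 'customer_city'   THEN c.customer_city   = Data.data_city
                    END, 0) = 0))
""")

updated = conn.execute(
    "SELECT data_id, customer_id FROM Data ORDER BY data_id").fetchall()
print(updated)
```

In this sketch a data row with no DataToAction entries is matched on data_cust_id alone, which may or may not be the intended behaviour.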