SQL Group By and Join based on a weird client table - sql

I have 3 tables that I want to join together and group it to get client membership info. My code works for grouping the base table together but it breaks at the join part and I can't figure out why.
BASE TABLE : sales_detail
+-------+-----------+-----------+-----------------------------------------+
| order_date | transaction_id| product_cost | payment_type | country
+-------+-----------+-----------+------------------------------------------+
| 10/1 | 12345 | 20 | mastercard | usa
| 10/1 | 12345 | 50 | mastercard | usa
| 10/5 | 82456 | 50 | mastercard | usa
| 10/9 | 64789 | 30 | visa | canada
| 10/15 | 08546 | 20 | mastercard | usa
| 10/15 | 08546 | 90 | mastercard | usa
| 10/17 | 65898 | 50 | mastercard | usa
+-------+-----------+-----------+-------------------------------------+
table : client_information
+-------+-----------+-----------+-------------------+
| other_id | client_Type | item
+-------+-----------+-----------+----------+
| 112341 | new | hola |
| 112341 | old | mango |
| 145634 | old | pine |
| 879547 | old | vip |
| 745688 | new | unio |
| 745688 | old | dog |
| 147899 | new | cat |
| 124589 | new | amigo |
+-------+-----------+-----------+-----------+
table : connector
+-------+-----------+-----------+-------------------+
| transaction_ID | other_id | item
+-------+-----------+-----------+----------+
| 12345 | 112341 | hola |
| 82456 | 145634 | pine |
| 08157 | 879547 | unio |
| 08546 | 745688 | dog |
| 65898 | 147899 | cat |
| 06587 | 124589 | amigo |
+-------+-----------+-----------+-----------+
**I want the output to look something like this: **
IDEAL OUTPUT
+-------+-----------+-----------+--------------------------------+
| order_date | transaction_ID | product_cost | client_Type|
+-------+-----------+-----------+--------------------------------+
| 10/1 | 12345 | 70 | new |
| 10/5 | 82456 | 70 | old |
| 10/15 | 08546 | 110 | old |
| 10/17 | 65898 | 50 | new |
+-------+-----------+-----------+----------------------------------+
**i am trying to join my base table to the connector table by transaction ID to get other_id and items to match to client_type **
This is the code i used but it failed to compile after adding in left joins :
select t1.transaction_id, sum(t1.product_cost), t1.order_date, t3.client_type
from sales_detail t1
left join (select DISTINCT transaction_ID, other_id, fruits from connector) t2
ON t1.transaction_ID=t2.transaction_ID
left join (select DISTINCT order_id, client_type, fruits from client information) t3
ON t2.other_id=t3.other_id and t2.item=t3.item
where t1.payment_type='mastercard' and t1.order_Date between '2020-10-01' and'2020-10-31'
and country != 'canada'
GROUP BY t1.transaction_id, t1.order_date, t3.client_type;
Thanks in advance! I am a beginner so still learning the ins and outs of sql! (am using hive)

I think that's joins and aggregation. For more efficiency, you can pre-aggregate in a subquery, then join:
select sd.*, ci.client_type
from (
select order_date, transaction_id, sum(product_cost) product_cost
from sales_detail
where
payment_type = 'mastercard'
and order_date >= '2020-10-01'
and order_date < '2020-11-01'
and country <> 'canada'
group by order_date, transaction_id
) sd
inner join connector c on c.transaction_id = sd.transaction_id
inner join client_information ci on ci.other_id = c.other_id
Note that I rewrote the filter on order_date to use half-open intervals rather than between. This properly handles the case when your dates have a time portion.

From what I have understood, your code works although not as you would like using an INNER JOIN and it fails to add a LEFT JOIN. I think what happens is a failure due to the NULL elements. to add a NULL element and not get an error, you have to use some function that changes the NULL value to 0 .
One such function is the ISNULL(yourColumn, 0) function of T-SQL.The documentation.

I can see that in result table you only need clients who used mastercard, so you should use inner join there so only those client who used mastercard will be considered. While the remaining query is okay i guess, but main problem was the join on client information.

I think on the answer with GMB you also need to join on item column otherwise you will get multiple rows output.
select sd.*, ci.client_type
from (
select order_date, transaction_id, sum(product_cost) product_cost
from sales_detail
group by order_date, transaction_id
) sd
inner join connector c on c.transaction_id = sd.transaction_id
inner join client_information ci on ci.other_id = c.other_id and ci.item = c.item
Just modify with your filters and you should be sorted.

Related

Oracle SQL - Where Not Exists and multiple joins

I'm in need of some assistance to anyone familiar with Oracle SQL. I'm trying to use the Where Not Exists sub query, and is working fine with specific where clauses for specific customers, and then using UNION to join any other customer thereafter. I'm trying to use the SQL in a way where I'm not using UNION to join multiple customers, as there's 100's. I just don't know how to go about it.
I think I need to join the MAIN_ITEM table to the LOCATION table somehow, as it links the items in INVENTORY_LOCATIONS with a zone_code where the location_code can be compared against the table ZONE, showing any mismatches where location_code in INVENTORY_LOCATIONS does not exist in table ZONE. I'm not really sure if I"m explaining this properly, but hopefully my example below clears it up.
Many thanks in advance.
Current Query
select a.company, a.customer, c.customer_name, a.location_code, a.invt1, a.invt2, a.invt3, a.invt_qty
from inventory_locations a left join main_customer c
on a.company=c.company and a.customer=c.customer and a.ware_code=c.ware_code
where not exists (select 1 from zone b where b.location_code = a.location_code and b.zone_code='PM')
and a.company='M1'
and a.customer='100068'
UNION
select a.company, a.customer, c.customer_name, a.location_code, a.invt1, a.invt2, a.invt3, a.invt_qty
from inventory_locations a left join main_customer c
on a.company=c.company and a.customer=c.customer and a.ware_code=c.ware_code
where not exists (select 1 from zone b where b.location_code = a.location_code and b.zone_code='Z1')
and a.company='M1'
and a.customer='100012'
Table 1 - INVENTORY_LOCATIONS A
COMPANY | WARE_CODE |CUSTOMER | LOCATION_CODE |INVT1 | INVT2 | INVT3 | INVT_QTY
--------------------------------------------------------------------------------------
M1 | 01 | 100012 | 0101A |000052 | T100 | 000001001 | 60
M1 | 01 | 100012 | 0602A |000053 | T101 | 000001002 | 60
M1 | 01 | 100068 | 0601A |CANDY | T200 | 000001080 | 25
M1 | 01 | 100068 | 0102A |CANDY2 | T202 | 000001081 | 25
Table 2 - ZONE B
COMPANY | WARE_CODE |ZONE_CODE | LOCATION_CODE
--------------------------------------------------------
M1 | 01 |PM | 0101A
M1 | 01 |PM | 0102A
M1 | 01 |Z1 | 0601A
M1 | 01 |Z1 | 0602A
Table 3 - MAIN_ITEM D
COMPANY | WARE_CODE | CUSTOMER | ITEM_CODE | ZONE_CODE
----------------------------------------------------------------
M1 | 01 | 100012 | 000052 | PM
M1 | 01 | 100012 | 000053 | PM
M1 | 01 | 100068 | CANDY | Z1
M1 | 01 | 100068 | CANDY2 | Z1
Current results with above query.
COMPANY | CUSTOMER | CUSTOMER_NAME | LOCATION_CODE | INVT1 | INVT2 | INVT3 | INVT_QTY
---------------------------------------------------------------------------------------------------
M1 | 100012 | TEST COMP 1 | 0602A | 000053 | T101 | 000001002 | 60
M1 | 100068 | TEST COMP 2 | 0102A | CANDY2 | T202 | 000001081 | 25
Expected results with a query that doesn't use UNION to join multiple customers.
COMPANY | CUSTOMER | CUSTOMER_NAME | LOCATION_CODE | INVT1 | INVT2 | INVT3 | INVT_QTY
-------------------------------------------------------------------------------------------------
M1 | 100012 | TEST COMP 1 | 0602A | 000053 | T101 | 000001002 | 60
M1 | 100068 | TEST COMP 2 | 0102A | CANDY2 | T202 | 000001081 | 25
Thank you for taking the time to read and assist. Greatly appreciated.
I don't fully understand the semantics of your tables and am not sure of the primary keys which means the join conditions could need to be corrected.
However, my interpretation of your goal is to find which inventory_location rows imply a combination of location code and zone that is not in the zone table.
So I would do as follows:
Take the inventory_location table, and add on the customer_name and zone_code with joins. I am assuming each row has only one customer and only one zone. A "with" clause is convenient to treat this as if it were a single table.
Then take the location and zone code combinations and see which ones are missing from the zone table with a "where not exists" clause.
I apologize in advance for any typos/syntax errors. Without actually executing it, I think it would produce your requested output.
with inv_loc as (
select a.company, a.customer, c.customer_name, a.location_code, a.invt1, a.invt2, a.invt3, a.invt_qty, d.zone_code
from inventory_locations a
left join main_customer c on a.company=c.company and a.customer=c.customer and a.ware_code=c.ware_code
left join main_item d on d.company = a.company and d.customer = a.customer and d.ware_code = a.ware_code and d.item_code = a.invt1
)
select
company, customer, customer_name, location_code, invt1, invt2, invt3, invt_qty
from inv_loc i
where not exists (
select 1 from zone b
where b.location_code = i.location_code and b.zone_code =i.zone_code
)

Duplicate records upon joining table

I am still very new to SQL and Tableau however I am trying to work myself towards achieving a personal project of mine.
Table A; shows a table which contains the defect quantity per product category and when it was raised
+--------+-------------+--------------+-----------------+
| Issue# | Date_Raised | Category_ID# | Defect_Quantity |
+--------+-------------+--------------+-----------------+
| PCR12 | 11-Jan-2019 | Product#1 | 14 |
| PCR13 | 12-Jan-2019 | Product#1 | 54 |
| PCR14 | 5-Feb-2019 | Product#1 | 5 |
| PCR15 | 5-Feb-2019 | Product#2 | 7 |
| PCR16 | 20-Mar-2019 | Product#1 | 76 |
| PCR17 | 22-Mar-2019 | Product#2 | 5 |
| PCR18 | 25-Mar-2019 | Product#1 | 89 |
+--------+-------------+--------------+-----------------+
Table B; shows the consumption quantity of each product by month
+-------------+--------------+-------------------+
| Date_Raised | Category_ID# | Consumed_Quantity |
+-------------+--------------+-------------------+
| 5-Jan-2019 | Product#1 | 100 |
| 17-Jan-2019 | Product#1 | 200 |
| 5-Feb-2019 | Product#1 | 100 |
| 8-Feb-2019 | Product#2 | 50 |
| 10-Mar-2019 | Product#1 | 100 |
| 12-Mar-2019 | Product#2 | 50 |
+-------------+--------------+-------------------+
END RESULT
I would like to create a table/bar chart in tableau that shows that Defect_Quantity/Consumed_Quantity per month, per Category_ID#, so something like this below;
+----------+-----------+-----------+
| Month | Product#1 | Product#2 |
+----------+-----------+-----------+
| Jan-2019 | 23% | |
| Feb-2019 | 5% | 14% |
| Mar-2019 | 89% | 10% |
+----------+-----------+-----------+
WHAT I HAVE TRIED SO FAR
Unfortunately i have not really done anything, i am struggling to understand how do i get rid of the duplicates upon joining the tables based on Category_ID#.
Appreciate all the help I can receive here.
I can think of doing left joins on both product1 and 2.
select to_char(to_date(Date_Raised,'d-mon-yyyy'),'mon-yyyy')
, (p2.product1 - sum(case when category_id='Product#1' then Defect_Quantity else 0 end))/p2.product1 * 100
, (p2.product2 - sum(case when category_id='Product#2' then Defect_Quantity else 0 end))/p2.product2 * 100
from tableA t1
left join
(select to_char(to_date(Date_Raised,'d-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Comsumed_Quantity) as product1 tableB
where category_id = 'Product#1'
group by to_char(to_date(Date_Raised,'d-mon-yyyy'),'mon-yyyy')) p1
on p1.Date_Raised = t1.Date_Raised
left join
(select to_char(to_date(Date_Raised,'d-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Comsumed_Quantity) as product2 tableB
where category_id = 'Product#2'
group by to_char(to_date(Date_Raised,'d-mon-yyyy'),'mon-yyyy')) p2
on p2.Date_Raised = t1.Date_Raised
group by to_char(to_date(Date_Raised,'d-mon-yyyy'),'mon-yyyy')
By using ROW_NUMBER() OVER (PARTITION BY ORDER BY ) as RN, you can remove duplicate rows. As of your end result you should extract month from date and use pivot to achieve.
I would do this as:
select to_char(date_raised, 'YYYY-MM'),
(sum(case when product = 'Product#1' then defect_quantity end) /
sum(case when product = 'Product#1' then consumed_quantity end)
) as product1,
(sum(case when product = 'Product#2' then defect_quantity end) /
sum(case when product = 'Product#2' then consumed_quantity end)
) as product2
from ((select date_raised, product, defect_quantity, 0 as consumed_quantity
from a
) union all
(select date_raised, product, 0 as defect_quantity, consumed_quantity
from b
)
) ab
group by to_char(date_raised, 'YYYY-MM')
order by min(date_raised);
(I changed the date format because I much prefer YYYY-MM, but that is irrelevant to the logic.)
Why do I prefer this method? This will include all months where there is a row in either table. I don't have to worry that some months are inadvertently filtered out, because there are missing production or defects in one month.

MS Access duplicate values using SUM

I'm having trouble writing a query in Microsoft Access 2016 that will show the sum of an Expense for a particular event, the sum of the signs that event produced, along with the year, event description and company name.
I think I am missing something simple, and am going to feel ridiculous once someone points it out. Hopefully I managed to format my question well enough that it is easy to spot!
Here are the tables involved, along with the dummy data I am testing with.
All_Company Company_Event
------------------ ---------------------------
| ID | Company | | ID | EventDescription |
|------|---------| |----|--------------------|
| 1 | Crapple | | 1 | Concert |
| 2 | Rito | | 2 | Party |
------------------ ---------------------------
Company_Target_Actual
----------------------------------------------------------------
| All_CompanyID | Company_EventID | Year | Quarter | Signed |
|----------------|-------------------|------|---------|--------|
| 1 | 2 | 2015 | 1 | 1 |
| 1 | 2 | 2015 | 2 | 0 |
| 1 | 2 | 2015 | 3 | 3 |
| 1 | 2 | 2015 | 4 | 1 |
----------------------------------------------------------------
Budget_Company_Expense
---------------------------------------------------------------------------------
| ID | All_CompanyID | Company_EventID | Year | Category | SubCategory| Expense |
---------------------------------------------------------------------------------
| 1 | 1 | 2 | 2015 | ABCD | 123 | 40 |
| 2 | 1 | 2 | 2015 | ABCD | cat | 113 |
| 3 | 1 | 2 | 2015 | ABCD | dog | 71 |
---------------------------------------------------------------------------------
This is my code for the query, I broke it up from the ugly Access long lines of code to make it easier to read.
SELECT DISTINCTROW All_Company.Company, Budget_Company_Expense.Year,
Budget_Company_Expense.Company_EventID, Company_Event.EventDescription,
Sum(Budget_Company_Expense.Expense) AS [Sum Of Expense USD],
Sum(Company_Target_Actual.Signed) AS [Sum Of Signed]
FROM Company_Event
INNER JOIN ((All_Company
INNER JOIN Company_Target_Actual
ON All_Company.[ID] = Company_Target_Actual.[All_CompanyID])
INNER JOIN Budget_Company_Expense
ON All_Company.[ID] = Budget_Company_Expense.[All_CompanyID])
ON Company_Event.[ID] = Budget_Company_Expense.[Company_EventID]
GROUP BY All_Company.Company, Budget_Company_Expense.Year,
Budget_Company_Expense.Company_EventID, Company_Event.EventDescription;
and here is the result from running my query
Result
-------------------------------------------------------------------------------------------
| Company | Year | Company_EventID | EventDescription | Sum of Expense USD | Sum of Signed|
-------------------------------------------------------------------------------------------
| Crapple | 2015 | 2 | Party | $896.00 | 15 |
-------------------------------------------------------------------------------------------
As you can see, it is summing as if the total signs (5) happened 3 times (the number of entries in the Company_Target_Actual table) and vis versa for the Expense. Any help on my issue would be greatly appreciated,
and if I forgot any information that may help find my mistake please let me know what else I can provide!
Consider splitting the query into two aggregations, one to sum Signed in Company_Target_Actual and the other to sum Expense in Business_Company_Expense. Then, join the two queries by Company, Event, and Year which are the grouping factors.
Below uses two derived tables (subqueries in FROM/JOIN clause). However, you can very well save either one as a separate query and then join them in final query:
SELECT t1.Company, t1.Year, t1.Company_EventID, t1.EventDescription,
t2.[Sum Of Expense USD], t1.[Sum of Signed]
FROM
(SELECT ac.ID AS CompanyID, ac.Company, ca.Year, ca.Company_EventID, ev.EventDescription,
SUM(ca.Signed) AS [Sum Of Signed]
FROM (Company_Target_Actual ca
INNER JOIN Company_Event ev
ON ca.Company_EventID = ev.ID)
INNER JOIN All_Company ac
ON ca.All_CompanyID = ac.ID
GROUP BY ac.ID, ac.Company, ca.Year, ca.Company_EventID, ev.EventDescription) AS t1
INNER JOIN
(SELECT ac.ID AS CompanyID, ac.Company, be.Year, be.Company_EventID, ev.EventDescription,
SUM(be.Expense) AS [Sum Of Expense USD]
FROM (Budget_Company_Expense be
INNER JOIN Company_Event ev
ON be.Company_EventID = ev.ID)
INNER JOIN All_Company ac
ON be.All_CompanyID = ac.ID
GROUP BY ac.ID, ac.Company, be.Year, be.Company_EventID, ev.EventDescription) AS t2
ON t1.CompanyID = t2.CompanyID
AND t1.Company_EventID = t2.Company_EventID
AND t1.Year = t2.Year

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
Here is SQLFiddle demo

Two Tables Relationship based on conditions

I am stuck in the following situations.
I have two tables Products and Postage:
Postage table
+----------------------------------+
| Weight_GM | Postal Charges ($) |
+----------------------------------+
| 20 | 1 |
| 40 | 1.5 |
| 50 | 1.7 |
+----------------------------------+
Products table
+-------------------------------+
| SKU | Title | Weight_GM |
+-------------------------------+
| ABC | Shose | 17 |
| JKL | Camera | 27 |
| XYZ | IPad | 48 |
+-------------------------------+
I want to create a relationship to take appropriate postal charge from Postage table based on the Weight defined in both tables.
The desired result would be like this:
+---------------------------------------------------+
| SKU | Title | Weight_GM | Postal Charges |
+---------------------------------------------------+
| ABC | Shose | 17 | 1 |
| JKL | Camera | 27 | 1.5 |
| XYZ | IPad | 48 | 1.7 |
+---------------------------------------------------+
Note: I have been through many similar questions but there was not solution to my problem.
Thanks in advance.
This should work -- just use GROUP BY and MIN:
SELECT DISTINCT Pr.SKU, Pr.Title, Pr.Weight_GM, MIN(PO.Postal_Charges) as PO_Charges
FROM Products Pr
JOIN Postage Po ON Pr.Weight_GM <= Po.Weight_GM
GROUP BY Pr.SKU, Pr.Title, Pr.Weight_GM
And the Fiddle.
This should do it
;With WeightNumber AS
(
SELECT Weight_GM, Postal_Charge, ROW_NUMBER() OVER (ORDER BY Weight_GM) AS Num
FROM Postage
)
WeightRange AS
(
SELECT ISNULL(Prec.Weight_GM - 1, 0) AS START_WEIGHT, CurrentRow.Weight_GM AS END_WEIGHT, CurrentRow.Postal_Charge
FROM WeightNumber CurrentRow
LEFT JOIN WeightNumber Prec
ON Prec.Num=CurrentRow.Num - 1
)
SELECT *
FROM Products p
JOIN WeightRange w
ON p.Weight_GM BETWEEN w.START_WEIGHT AND w.END_WEIGHT
You can do this with the following code
The inner join will join the two tables with the key Weight_GM
select a.SKU, b.Postal Charges from Products a
inner join Postage b on a.Weight_GM = b.Weight_GM
try this:
select SKU,Title,prds.Weight_GM b,pst.Weight_GM a,Postal_charges
from Postage pst ,Products prds
where prds.Weight_GM < pst.Weight_GM
group by prds.Weight_GM;
here's the demo : demo link