Multiple self joins plus one inner join - SQL

I have two tables: ck_startup and ck_price. The price table contains the columns cu_type, prd_type_cd, part_cd, qty, and dllrs. The startup table is linked to the price table through a one-to-many relationship on ck_startup.prd_type_cd = ck_price.prd_type_cd.
The price table contains multiple entries for the same product/part/qty but under different customer types. Not all customer types have the same unique combination of those three values. I'm trying to create a query that will do two things:
1. Join some columns from ck_startup onto ck_price (the description and some additional values).
2. Join ck_price onto itself so that there is a dllrs column for each customer type. In total I would have only one instance of each unique key of product/part/qty, and a value in each customer's price column if they have one.
I've never worked with self-joining tables, and so far I can only get records to show up where both customers have the same options available.
And because someone is going to demand I post sample code, here's the crappy query that doesn't show missing prices:
select pa.*, pac.dllrs from ck_price pa
join ck_price pac on pa.prd_type_cd = pac.prd_type_cd and pa.part_cd = pac.part_cd and pa.qty = pac.qty
where pa.cu_type = 'A' and pac.cu_type = 'AC';
EDIT: Here's sample data from the two tables, and how I want them to look when I'm done:
CK_STARTUP
+-----+-----------------+-------------+
| CD | DSC | PRD_TYPE_CD |
+-----+-----------------+-------------+
| 3D | Stuff | SKD3 |
| DC | Different stuff | SKD |
| DN2 | Similar stuff | SKD |
+-----+-----------------+-------------+
CK_PRICE
+---------+-------------+---------+-----+-------+
| CU_TYPE | PRD_TYPE_CD | PART_CD | QTY | DLLRS |
+---------+-------------+---------+-----+-------+
| A | SKD3 | 1 | 100 | 10 |
| A | SKD3 | 1 | 200 | 20 |
| A | SKD3 | 1 | 300 | 30 |
| A | SKD | 1 | 100 | 50 |
| A | SKD | 1 | 200 | 100 |
| AC | SKD3 | 1 | 300 | 30 |
| AC | SKD | 1 | 100 | 100 |
| AC | SKD | 1 | 200 | 200 |
| AC | SKD | 1 | 300 | 300 |
| AC | SKD | 1 | 400 | 400 |
+---------+-------------+---------+-----+-------+
COMBO:
+----+-----------------+---------+-----+---------+----------+
| CD | DSC | PART_CD | QTY | DLLRS_A | DLLRS_AC |
+----+-----------------+---------+-----+---------+----------+
| 3D | Stuff | 1 | 100 | 10 | null |
| 3D | Stuff | 1 | 200 | 20 | null |
| 3D | Stuff | 1 | 300 | 30 | 60 |
| DC | Different stuff | 1 | 100 | 50 | 100 |
| DC | Different stuff | 1 | 200 | 100 | 200 |
| DC | Different stuff | 1 | 300 | null | 300 |
| DC | Different stuff | 1 | 400 | null | 400 |
+----+-----------------+---------+-----+---------+----------+

OK, take a look at the query below and at the results:
SELECT *
FROM (SELECT cs.cd, cs.dsc, cp.part_cd, cp.qty, cp.dllrs, cp.cu_type
      FROM ck_startup cs
      JOIN ck_price cp ON (cs.prd_type_cd = cp.prd_type_cd))
PIVOT (SUM(dllrs) AS dllrs FOR (cu_type) IN ('A' AS a, 'AC' AS ac))
ORDER BY cd, qty;
Output:
CD   DSC              PART_CD  QTY  A_DLLRS  AC_DLLRS
---- ---------------- -------- ---- -------- --------
3D   Stuff            1        100  10
3D   Stuff            1        200  20
3D   Stuff            1        300  30       30
DC   Different stuff  1        100  50       50
DC   Different stuff  1        200  100      100
DC   Different stuff  1        300           150
DC   Different stuff  1        400           200
DN2  Similar stuff    1        100  50       50
DN2  Similar stuff    1        200  100      100
DN2  Similar stuff    1        300           150
DN2  Similar stuff    1        400           200
This is not quite what you would expect, though, because I do not understand why you have values in the DLLRS_AC column that differ from those in the CK_PRICE table. For example, why do you have 400 in the last line of your desired output, and not 200? Why is this value doubled (as others are in the DLLRS_AC column)?
If you are using Oracle 10g, you can achieve the same result using DECODE and GROUP BY; take a look:
SELECT cd,
       dsc,
       part_cd,
       qty,
       SUM(DECODE(cu_type, 'A', dllrs, NULL)) AS dllrs_a,
       SUM(DECODE(cu_type, 'AC', dllrs, NULL)) AS dllrs_ac
FROM (SELECT cs.cd, cs.dsc, cp.part_cd, cp.qty, cp.dllrs, cp.cu_type
      FROM ck_startup cs
      JOIN ck_price cp ON (cs.prd_type_cd = cp.prd_type_cd))
GROUP BY cd, dsc, part_cd, qty
ORDER BY cd, qty;
The result is the same.
If you want to read more about pivoting, I recommend the article by Tim Hall: Pivot and Unpivot at Oracle Base.
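For completeness: the same shape can also be produced without PIVOT, by splitting ck_price per customer type and full outer joining the two halves on the key columns. A rough, untested sketch of that idea:
-- Untested sketch: one row per prd_type_cd/part_cd/qty key, one price column per customer type, no PIVOT needed.
SELECT cs.cd, cs.dsc,
       COALESCE(pa.part_cd, pac.part_cd) AS part_cd,
       COALESCE(pa.qty, pac.qty) AS qty,
       pa.dllrs AS dllrs_a,
       pac.dllrs AS dllrs_ac
FROM (SELECT * FROM ck_price WHERE cu_type = 'A') pa
FULL OUTER JOIN (SELECT * FROM ck_price WHERE cu_type = 'AC') pac
  ON pa.prd_type_cd = pac.prd_type_cd
 AND pa.part_cd = pac.part_cd
 AND pa.qty = pac.qty
JOIN ck_startup cs
  ON cs.prd_type_cd = COALESCE(pa.prd_type_cd, pac.prd_type_cd)
ORDER BY cs.cd, COALESCE(pa.qty, pac.qty);
This keeps key combinations that exist for only one of the two customer types, at the cost of hard-coding one inline view per cu_type.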

Related

SQL group column where other column is equal

I'm trying to select some information from a database.
I have a table with columns like:
Ident, Name, Length, Width, Quantity, Planned
The table data is as follows:
+-----------+-----------+---------+---------+------------+---------+
| Ident | Name | Length | Width | Quantity | Planned |
+-----------+-----------+---------+---------+------------+---------+
| 12345 | Name1 | 1500 | 1000 | 20 | 5 |
| 23456 | Name1 | 1500 | 1000 | 30 | 13 |
| 34567 | Name1 | 2500 | 1000 | 10 | 2 |
| 45678 | Name1 | 2500 | 1000 | 10 | 4 |
| 56789 | Name1 | 1500 | 1200 | 20 | 3 |
+-----------+-----------+---------+---------+------------+---------+
My desired result would be to group rows where Name, Length, and Width are equal, sum the Quantity, and reduce it by the sum of Planned.
e.g.:
- Name1,1500,1000,32 --- (32 because (20+30)-(5+13))
- Name1,2500,1000,14 --- (14 because (10+10)-(2+4))
- Name1,1500,1200,17 --- (17 because 20-3)
Now I have problems figuring out how to group or join this information to get the desired select. Maybe some of you can help me; if further information is required, please write it in a comment.
You can achieve it by grouping your table and subtracting the sums of Quantity and Planned.
select
Name
,Length
,Width
,sum(Quantity) - sum(Planned)
from yourTable
group by Name,Length,Width
select
A1.Name,A1.Length,A1.Width,((A1.Quantity + A2.Quantity) -(A1.Planned+A2.Planned))
from `Table` AS A1, `Table` AS A2
where A1.Name = A2.Name and A1.Length = A2.Length and A1.Width = A2.Width
group by (whatever)
So you are comparing these columns from the same table?

Loop over one table, subselect another table and update values of first table with SQL/VBA

I have a source table that has a few different prices for each product (depending on the order quantity). Those prices are listed vertically, so each product could have more than one row to display its prices.
Example:
ID | Quantity | Price
--------------------------
001 | 5 | 100
001 | 15 | 90
001 | 50 | 80
002 | 10 | 20
002 | 20 | 15
002 | 30 | 10
002 | 40 | 5
The other table I have is the result table, in which there is only one row for each product, but there are five pairs of columns that could each contain the quantity and price from one row of the source table.
Example:
ID | Quantity_1 | Price_1 | Quantity_2 | Price_2 | Quantity_3 | Price_3 | Quantity_4 | Price_4 | Quantity_5 | Price_5
---------------------------------------------------------------------------------------------------------------------------
001 | | | | | | | | | |
002 | | | | | | | | | |
Result:
ID | Quantity_1 | Price_1 | Quantity_2 | Price_2 | Quantity_3 | Price_3 | Quantity_4 | Price_4 | Quantity_5 | Price_5
---------------------------------------------------------------------------------------------------------------------------
001 | 5 | 100 | 15 | 90 | 50 | 80 | | | |
002 | 10 | 20 | 20 | 15 | 30 | 10 | 40 | 5 | |
Here is my Python/SQL pseudocode for this (I'm fully aware that it could not work as written, but this was the only way for me to show you my interpretation of a solution to this problem):
For Each result_ID In result_table.ID:
Subselect = (SELECT * FROM source_table WHERE source_table.ID = result_ID ORDER BY source_table.Quantity) # the Subselect should only contain rows where the IDs are the same
For n in Range(0, len(Subselect)): # n (index) should start from 0 to last row - 1
price_column_name = 'Price_' & (n + 1)
quantity_column_name = 'Quantity_' & (n + 1)
(UPDATE result_table
SET result_table.price_column_name = Subselect[n].Price, # this should be the price of the n-th row in Subselect
result_table.quantity_column_name = Subselect[n].Quantity # this should be the quantity of the n-th row in Subselect
WHERE result_table.ID = Subselect[n].ID)
I honestly have no idea how to do this with only SQL or VBA (those are the only languages I'd be able to use -> MS-Access).
This is a pain in MS Access. If you can enumerate the values, you can pivot them.
If we assume that price is unique (or quantity or both), then you can generate such a column:
select id,
max(iif(seqnum = 1, quantity, null)) as quantity_1,
max(iif(seqnum = 1, price, null)) as price_1,
. . .
from (select st.*,
(select count(*)
from source_table st2
where st2.id = st.id and st2.price >= st.price
) as seqnum
from source_table st
) st
group by id;
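For reference, here is the same query as a sketch with the elided columns written out for all five quantity/price slots (it assumes, as above, at most five price rows per id and no duplicate prices within an id):
select id,
       max(iif(seqnum = 1, quantity, null)) as quantity_1,
       max(iif(seqnum = 1, price, null)) as price_1,
       max(iif(seqnum = 2, quantity, null)) as quantity_2,
       max(iif(seqnum = 2, price, null)) as price_2,
       max(iif(seqnum = 3, quantity, null)) as quantity_3,
       max(iif(seqnum = 3, price, null)) as price_3,
       max(iif(seqnum = 4, quantity, null)) as quantity_4,
       max(iif(seqnum = 4, price, null)) as price_4,
       max(iif(seqnum = 5, quantity, null)) as quantity_5,
       max(iif(seqnum = 5, price, null)) as price_5
from (select st.*,
             (select count(*)
              from source_table st2
              where st2.id = st.id and st2.price >= st.price
             ) as seqnum
      from source_table st
     ) st
group by id;
That result set can then be written into the result table with a separate append or update query.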
I should note that another solution would use data frames in Python. If you want to take that route, ask another question and tag it with the appropriate Python tags. This question is clearly a SQL question.

SQL: Cascading conditions on Join

I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches try match on item_name
3: if no item_id or item_name matches try match on item_division
4: if no item_division matches, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- code
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved multiple joining the table onto itself and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
It leaves out entries which have no owner, as in the result below (id=3 is out as it has no owner):
Row id item_id item_name item_division units revenue owner
1 1 xyz pen UK 10 100 bob
2 2 pqr cat US 15 120 dave
3 4 xcv hat UK 11 140 fred
4 5 bnm cow UK 14 150 alice
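If the ownerless rows are needed with this approach too, one workaround (an untested sketch) is to left join the original table back onto that aggregated result:
#standardSQL
-- Untested sketch: recover rows with no matching owner by left joining Datapoints back to the aggregated result.
SELECT d.*, o.owner
FROM Datapoints d
LEFT JOIN (
  SELECT a.id,
         ARRAY_AGG(owner
           ORDER BY CASE
             WHEN type = 'id' THEN 1
             WHEN type = 'name' THEN 2
             WHEN type = 'division' THEN 3
           END
           LIMIT 1
         )[OFFSET(0)] AS owner
  FROM Datapoints a
  JOIN Identifiers b
    ON (a.item_id = b.code AND b.type = 'id')
    OR (a.item_name = b.code AND b.type = 'name')
    OR (a.item_division = b.code AND b.type = 'division')
  GROUP BY a.id
) o
ON d.id = o.id
ORDER BY d.id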
I am using the following query (thanks @Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
I'm not sure if BigQuery optimizes a query like this today, but at least you would be writing a query that gives strong hints not to run the subqueries when they are not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
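Applied to the Datapoints and Identifiers tables from this question, that COALESCE-of-scalar-subqueries pattern might look like the sketch below (untested; MIN() is only there to guarantee a single value in case a code is listed more than once):
#standardSQL
-- Untested sketch: try the id match first, then name, then division, falling through to NULL.
SELECT a.*,
  COALESCE(
    (SELECT MIN(owner) FROM Identifiers WHERE type = 'id' AND code = a.item_id),
    (SELECT MIN(owner) FROM Identifiers WHERE type = 'name' AND code = a.item_name),
    (SELECT MIN(owner) FROM Identifiers WHERE type = 'division' AND code = a.item_division)
  ) AS owner
FROM Datapoints a
ORDER BY a.id
Whether this runs faster than the three LEFT JOINs above depends on how well BigQuery decorrelates and short-circuits the subqueries, which is exactly the open question here.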

How to join a number with a range of numbers from another table

I have the two following tables:
| ID | Count |
| --- | ----- |
| 1 | 45 |
| 2 | 5 |
| 3 | 120 |
| 4 | 87 |
| 5 | 60 |
| 6 | 200 |
| 7 | 31 |
| SizeName | LowerLimit | UpperLimit |
| -------- | ---------- | ---------- |
| Small | 0 | 49 |
| Medium | 50 | 99 |
| Large | 100 | 250 |
Basically, one table specifies an unknown number of range names and their associated integer ranges. So a count range of 0 to 49 from the person table gets a small designation. 50-99 gets 'medium' etc. I need it to be dynamic because I do not know the range names or integer values.
Can I do this in a single query or would I have to write a separate function to loop through the possibilities?
One way to do this would be to join the tables. Depending on whether you want to keep rows whose Count falls outside all of the named ranges, you would use a LEFT or an INNER join, respectively.
SELECT A.id, A.Count, B.SizeName
FROM tableA A
LEFT JOIN tableB B ON A.Count >= B.LowerLimit AND A.Count <= B.UpperLimit
You can also use the BETWEEN operator in a JOIN like this:
SELECT a.id, a.Count, b.SizeName
FROM tableA a
JOIN tableB b ON a.Count BETWEEN b.LowerLimit AND b.UpperLimit

SQL Server 2008 - accumulating column

I would like to accumulate my data. Below you can see the original table (Table 1).
What is the best query to do this?
Is it possible to do this dynamically, when I add more types of terms?
Table 1
ID | term | value
-----------------------
1 | I | 100
2 | I | 200
3 | II | 100
4 | II | 50
5 | II | 75
6 | III | 50
7 | III | 65
8 | IV | 30
9 | IV | 45
And the result should be like below:
YTD | Acc Value
------------------
I-I | 300
I-II | 525
I-III| 640
I-IV | 715
Thanks
select
    (select min(term) from yourtable) + '-' + term as YTD,
    (select sum(value) from yourtable t1 where t1.term <= t.term) as [Acc Value]
from yourtable t
group by term
order by term
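Note that the correlated subquery above relies on the terms sorting correctly as strings ('I' < 'II' < 'III' < 'IV' happens to hold lexicographically). If new term types do not sort that way, a sketch with an explicit, hypothetical ordering table term_order(term, seq) could look like this:
-- term_order(term, seq) is a hypothetical lookup table mapping each term to its position.
select
    (select min(t2.term) from yourtable t2) + '-' + t.term as YTD,
    (select sum(t1.value)
     from yourtable t1
     join term_order o1 on o1.term = t1.term
     where o1.seq <= o.seq) as [Acc Value]
from yourtable t
join term_order o on o.term = t.term
group by t.term, o.seq
order by o.seq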