Select rows with same value in one column but different value in another column - sql

I've been trying to build this query but am new to SQL so I'd really appreciate some help.
In the below table example, I have a Customer Code, a linked Customer Code (which is used to link a child customer to a parent customer), a salesperson, and other irrelevant columns. The goal is to have one Salesperson for each parent customer and it's children. So in the example, CustCode #100 is the parent of itself, #200, #500, and #800. All of these accounts have the same Salesperson (JASON) which is perfect. But for CustCode #300, it is the parent of itself, #400, and #600. However, there isn't one salesperson assigned - its both JIM and SUZY. I want to build a query that shows all accounts for this example. Basically, accounts where the Salesperson field isn't the same value for all of it's child customers.
I tried a Where clause for Salesperson <> Salesperson but its not showing up right.
+-----------+-----------------+------------+----------------------+
| CustCode | Linked CustCode | Salesperson| additional columns...|
+-----------+-----------------+------------+----------------------+
| 100 | 100 | JASON | ... |
| 200 | 100 | JASON | ... |
| 300 | 300 | JIM | ... |
| 400 | 300 | JIM | ... |
| 500 | 100 | JASON | ... |
| 600 | 300 | SUZY | ... |
| 700 | NULL | JIM | ... |
| 800 | 100 | JASON | ... |
+-----------+-----------------+------------+----------------------+
Thanks so much for your help!

You can do self join on the table.
select distinct r2.* from
table r1
join table r2
on
r1.linkedcustcode = r2.linkedcustcode and r1.salesperson <> r2.salesperson

This solution uses a recursive CTE first to build the hierarchy and find the leading code for each row, even if a linked code points to a row which is pointing to an upper row itself.
The final query shows the count of different Salespersons:
DECLARE #tbl TABLE(CustCode INT,[Linked CustCode] INT,Salesperson VARCHAR(100));
INSERT INTO #tbl VALUES
(100,100,'JASON')
,(200,100,'JASON')
,(300,300,'JIM')
,(400,300,'JIM')
,(500,100,'JASON')
,(600,300,'SUZY')
,(700,NULL,'JIM')
,(800,100,'JASON');
--The query
WITH CleanUp AS
(
SELECT CustCode
,CASE WHEN [Linked CustCode]=CustCode THEN NULL ELSE [Linked CustCode] END AS [Linked CustCode]
,Salesperson
FROM #tbl
)
,recCTE AS
(
SELECT CustCode AS LeadingCode,CustCode,[Linked CustCode],Salesperson
FROM CleanUp
WHERE [Linked CustCode] IS NULL
UNION ALL
SELECT recCTE.LeadingCode,t.CustCode,t.[Linked CustCode],t.Salesperson
FROM recCTE
INNER JOIN CleanUp AS t ON t.[Linked CustCode]=recCTE.CustCode
)
SELECT LeadingCode,COUNT(DISTINCT Salesperson) AS CountSalesperson
FROM recCTE
GROUP BY LeadingCode
The result
LeadingCode CountSalesperson
100 1
300 2
700 1

Related

SQL question, query is not updating account_id's fields: income, customerid, customergroup?

I am executing this query through a databricks notebook, to join data from a stage table to a target table based on the shared join keys: account_id and stmt_end_dt. The stage table has 2 billion rows of data and the target table has 3 billion rows of data.
Here is the main query:
"UPDATE TARGET_TBL SET INCOME = S.INCOME, CUSTOMERGROUPID = S.CUSTOMERGROUPID, CUSTOMERID = S.CUSTOMERID
FROM STAGE_TBL AS S
WHERE CAST(S.ACCT_ID AS NUMBER(18,0)) = TARGET_TBL.ACCT_ID
AND CAST(S.STMT_END_DT AS DATE) = TARGET_TBL.STMT_END_DT"
What I want to do is add "income", "customerid", and "customergroup" data to the matching rows of "account_id" and "stmt_end_dt" in the target table, from the stage table. When I go into the target table I see that there are now fields for "income", "customerid", and "customergrop" (this is fine because it was created through an earlier query). After my query has run and I click into the target table I see that account_id is blank and that "income", "customerid" and "customergroup" all have data filled. And when I run this query: SELECT * FROM TARGET_TBL WHERE INCOME IS NOT NULL; I get back 80000 rows (seems kinda low considering the stage table is 2 billion). Also after that query runs I see again that "income", "customerid" and "customergroup" are all populated with data, but account_id is full of NULLS. It is as this data is just being appended or tacked on, and not updating each account_id's fields with the matching data, this is how I imagine it should look like:
account_id | income | customerid | customergroupid
4321 | 60000 | 6345 | 3
5432 | 55000 | 4345 | 5
But instead it looks like this:
account_id | income | customerid | customergroupid
| 60000 | 6345 | 3
| 55000 | 4345 | 5
Or when I run: SELECT * WHERE INCOME IS NOT NULL:
account_id | income | customerid | customergroupid
NULL | 60000 | 6345 | 3
NULL | 55000 | 4345 | 5
And if I simply open the target table it looks like this:
account_id | income | customerid | customergroupid
4321 | | |
5432 | | |
After that query runs, it is also NULL for all other fields outside of the last 3 shown.
Perhaps the data types coming from the stage table aren't compatible with the target table?
What could be causing this strange behavior?
you can't compare "values" with "null"... if a field is "null" there is nothing to compare. I believe this is your problem.
if you have null fields and you want to compare, usually you can try "is null" or "nvl" lookup for the syntax of these.. it is very helpfull.

How to return a single value built from values stored in multiple records?

I am an application developer unfortunately put in the position of needing to write (/update) the SQL statement in order to return data for the application. My experience with SQL is limited, so would appreciate any help.
We have a Oracle Database 11g (11.2.0.4.0)
Example Tables
I've created the following example which replicates our set-up. It consists of:
A main table which contains records of trips around different cities. (MAIN_TRIP_TABLE)
Various additional tables which contain additional properties linked to these trips via INNER JOINs. (ADDITIONAL_TABLE)
A separate table showing the steps taken along the journey (ie. interim locations visited). A value of STEP_NUM = 1 is always the final destination, and thus there is always at least 1 record in this table per trip in the main table. If there were any interim stops made of the journey they are listed in this table as separate records with STEP_NUM iterating upwards. (JOURNEY_STEPS_TABLE)
MAIN_TRIP_TABLE
RECORD_ID | PROP_1 | PROP_2 | FINAL_DEST | ...
-------------------------------------------------
10001 | A | 1 | London | ...
10002 | A | 0 | Reading | ...
10003 | B | 1 | Leeds | ...
10004 | B | 0 | York | ...
ADDITIONAL_TABLE
RECORD_ID | PROP_3 | ...
------------------------
10001 | X | ...
10002 | Y | ...
10003 | Y | ...
10004 | X | ...
JOURNEY_STEPS_TABLE
RECORD_ID | STEP_NUM | LOCATION | ...
--------------------------------------
10001 | 1 | London | ...
10002 | 1 | Reading | ...
10002 | 2 | Bath | ...
10003 | 1 | Leeds | ...
10003 | 2 | York | ...
10003 | 3 | Bristol | ...
10004 | 1 | York | ...
10004 | 2 | Cardiff | ...
10004 | 3 | Oxford | ...
10004 | 4 | London | ...
Issue
I want to retrieve something that looks like:
SELECT
MAIN_TRIP_TABLE.RECORD_ID
, MAIN_TRIP_TABLE.PROP_1
, MAIN_TRIP_TABLE.PROP_2
, ADDITIONAL_TABLE.PROP_3
, <Concatenation/Array of JOURNEY_STEPS_TABLE> as "InterimStops"
FROM MAIN_TRIP_TABLE
INNER JOIN ADDITIONAL_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = ADDITIONAL_TABLE.RECORD_ID
LEFT OUTER JOIN JOURNEY_STEPS_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = JOURNEY_STEPS_TABLE.RECORD_ID
Where the "InterimStops" value above is some sort of concatenation of any and all values in found in the JOURNEY_STEPS_TABLE, for that particular RECORD_ID, in order of increasing STEP_NUM, with some sort of deliminator. (eg for '10001' I would want just "London", and for '10004' I would want "York,Cardiff,Oxford,London").
If I get something like this, I can then separate these out to an JSON array, within the application I'm developing.
Note: The actual SQL SELECT query is already significantly more complex with other fields and tables, so changing the query away from 1 SELECT query (ie. instead using multiple queries), is something I'd like to avoid unless absolutely necessary.
Things I've tried
After some Googling, I started to build a SQL statement using LISTAGG, and to begin with it looked promising:
SELECT
MAIN_TRIP_TABLE.RECORD_ID
, LISTAGG(JOURNEY_STEPS_TABLE.LOCATION, ',') WITHIN GROUP (ORDER BY JOURNEY_STEPS_TABLE.STEP_NUMBER) "InterimStops"
FROM MAIN_TRIP_TABLE
LEFT OUTER JOIN JOURNEY_STEPS_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = JOURNEY_STEPS_TABLE.RECORD_ID
GROUP BY MAIN_TRIP_TABLE.RECORD_ID
This returned exactly the sort of value I was looking for, but this failed as soon as I tried to bring back in the other values from both the main table and additional tables (eg: MAIN_TRIP_TABLE.PROP_1, MAIN_TRIP_TABLE.PROP_2, ADDITIONAL_TABLE.PROP_3). This gave me a "ORA-00979: not a GROUP BY expression" error.
I then tried to get this data via a subquery but struggled to get anything working.
Any help, insight, or pointing in the right direct, would be very much appreciated.
Many Thanks
It's easier to do this with a subquery so you don't have to group the data on the joined set of columns (as you allready tried):
SELECT MAIN_TRIP_TABLE.RECORD_ID
, (SELECT LISTAGG(JOURNEY_STEPS_TABLE.LOCATION, ',') WITHIN GROUP (ORDER BY JOURNEY_STEPS_TABLE.STEP_NUMBER)
FROM JOURNEY_STEPS_TABLE
WHERE JOURNEY_STEPS_TABLE.RECORD_ID = MAIN_TRIP_TABLE.RECORD_ID) "InterimStops"
FROM MAIN_TRIP_TABLE
The other possibility is to LEFT JOIN the grouped data:
SELECT MAIN_TRIP_TABLE.RECORD_ID
, JOURNEY_STEPS_TABLE."InterimStops"
FROM MAIN_TRIP_TABLE
LEFT JOIN (SELECT RECORD_ID
, LISTAGG(LOCATION, ',') WITHIN GROUP (ORDER BY STEP_NUMBER) "InterimStops"
FROM JOURNEY_STEPS_TABLE
GROUP BY RECORD_ID) JOURNEY_STEPS_TABLE
ON JOURNEY_STEPS_TABLE.RECORD_ID = MAIN_TRIP_TABLE.RECORD_ID

Columns to Rows Two Tables in Cross Apply

Using SQL Server I have two tables, below sample Table #T1 in DB has well over a million rows, Table #T2 has 100 rows. Both tables are in Column format and I need to Pivot to rows and join both.
Can I get it all in one query with Cross Apply and remove the cte?
This is my code, I have correct output but is this the most efficient way to do this considering number of rows?
with cte_sizes
as
(
select SizeRange,Size,ColumnPosition
from #T2
cross apply (
values(Sz1,1),(Sz2,2),(Sz3,3),(Sz4,4)
) X (Size,ColumnPosition)
)
select a.ProductID,a.SizeRange,c.Size,isnull(x.Qty,0) as Qty
from #T1 a
cross apply (
values(a.Sale1,1),(a.Sale2,2),(a.Sale3,3),(a.Sale4,4)
) X (Qty,ColumnPosition)
inner join cte_sizes c
on c.SizeRange = a.SizeRange
and c.ColumnPosition = x.ColumnPosition
I have also code and considered this but is this the CROSS APPLY a better method?
with cte_sizes
as
(
select 1 as SizePos
union all
select SizePos + 1 as SizePos
from cte_sizes
where SizePos < 4
)
select a.ProductID
,a.SizeRange
,(case when b.SizePos = 1 then c.Sz1
when b.SizePos = 2 then c.Sz2
when b.SizePos = 3 then c.Sz3
when b.SizePos = 4 then c.Sz4 end
) as Size
,isnull((case when b.SizePos = 1 then a.Sale1
when b.SizePos = 2 then a.Sale2
when b.SizePos = 3 then a.Sale3
when b.SizePos = 4 then a.Sale4 end
),0) as Qty
from #T1 a
inner join #T2 c on c.SizeRange = a.SizeRange
cross join cte_sizes b
This is wild guessing, but my magic crystall ball told me, that you might be looking for something like this:
For this we do not need your table #TS at all.
WITH Unpivoted2 AS
(
SELECT t2.SizeRange,A.* FROM #t2 t2
CROSS APPLY(VALUES(1,t2.Sz1)
,(2,t2.Sz2)
,(3,t2.Sz3)
,(4,t2.Sz4)) A(SizePos,Size)
)
SELECT t1.ProductID
,Unpivoted2.SizeRange
,Unpivoted2.Size
,Unpivoted1.Qty
FROM #t1 t1
CROSS APPLY(VALUES(1,t1.Sale1)
,(2,t1.Sale2)
,(3,t1.Sale3)
,(4,t1.Sale4)) Unpivoted1(SizePos,Qty)
LEFT JOIN Unpivoted2 ON Unpivoted1.SizePos=Unpivoted2.SizePos AND t1.SizeRange=Unpivoted2.SizeRange
ORDER BY t1.ProductID,Unpivoted2.SizeRange;
The result:
+-----------+-----------+------+------+
| ProductID | SizeRange | Size | Qty |
+-----------+-----------+------+------+
| 123 | S-XL | S | 1 |
+-----------+-----------+------+------+
| 123 | S-XL | M | 12 |
+-----------+-----------+------+------+
| 123 | S-XL | L | 13 |
+-----------+-----------+------+------+
| 123 | S-XL | XL | 14 |
+-----------+-----------+------+------+
| 456 | 8-14 | 8 | 2 |
+-----------+-----------+------+------+
| 456 | 8-14 | 10 | 22 |
+-----------+-----------+------+------+
| 456 | 8-14 | 12 | NULL |
+-----------+-----------+------+------+
| 456 | 8-14 | 14 | 24 |
+-----------+-----------+------+------+
| 789 | S-L | S | 3 |
+-----------+-----------+------+------+
| 789 | S-L | M | NULL |
+-----------+-----------+------+------+
| 789 | S-L | L | 33 |
+-----------+-----------+------+------+
| 789 | S-L | XL | NULL |
+-----------+-----------+------+------+
The idea in short:
The cte will return your #T2 in an unpivoted structure. Each name-numbered column (something you should avoid) is return as a single row with an index indicating the position.
The SELECT will do the same with #T1 and join the cte against this set.
UPDATE: After a lot of comments...
If I get this (and the changes to the initial question) correctly, the approach above works perfectly well, but you want to know, what was best in performance.
The first answer to "What is the fastest approach?" is Race your horses by Eric Lippert.
Good to know 1: A CTE is nothing more then syntactic sugar. It will allow to type a sub-query once and use it like a table, but it has no effect to the way how the engine will work this down.
Good to know 2: It is a huge difference whether you use APPLY or JOIN. The first will call the sub-source once per row, using the current row's values. The second will have to create two sets first and will then join them by some condition. There is no general "what is better"...
For your issue: As there is one very big set and one very small set, all depends on when you reduce the big set usig any kind of filter. The earlier the better.
And most important: It is - in any case - a sign of bad structures - when you find name numbering (something like phone1, phone2, phoneX). The most expensive work will be to transform your 4 name-numbered columns to some dedicated rows. This should be stored in normalized format...
If you still need help, I'd ask you to start a new question.

SQL Server stored procedure inserting duplicate rows

I have a table with column GetDup and I'd like to the duplicate records based on the value of this column. For example, if value on is 1 in GetDup, then duplicate the record once. If value in the column is 2, then duplicate the record twice and so on and the statement has to be in looping statement.
What will be a good way to write a stored procedures for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
picture of data
Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup+1) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may need to do some math magic. You say above if GetDup is 1 then you want one duplicate. If that means that your output should be GetDup``GetDup, then you'll want to add one in the repeat(),replicate() or rpad() functions. ie replicate(mycolumn,getdup+1). Oracle SQL will be a little different, since it uses rpad().
In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).
So, you want recursive solution :
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);

SQL SELECT only rows where a max value is present, and the corresponding ID from another linked table

I have a simple Parts database which I'd like to use for calculating costs of assemblies, and I need to keep a cost history, so that I can update the costs for parts without the update affecting historic data.
So far I have the info stored in 2 tables:
tblPart:
PartID | PartName
1 | Foo
2 | Bar
3 | Foobar
tblPartCostHistory
PartCostHistoryID | PartID | Revision | Cost
1 | 1 | 1 | £1.00
2 | 1 | 2 | £1.20
3 | 2 | 1 | £3.00
4 | 3 | 1 | £2.20
5 | 3 | 2 | £2.05
What I want to end up with is just the PartID for each part, and the PartCostHistoryID where the revision number is highest, so this:
PartID | PartCostHistoryID
1 | 2
2 | 3
3 | 5
I've had a look at some of the other threads on here and I can't quite get it. I can manage to get the PartID along with the highest Revision number, but if I try to then do anything with the PartCostHistoryID I end up with multiple PartCostHistoryIDs per part.
I'm using MS Access 2007.
Many thanks.
Mihai's (very concise) answer will work assuming that the order of both
[PartCostHistoryID] and
[Revision] for each [PartID]
are always ascending.
A solution that does not rely on that assumption would be
SELECT
tblPartCostHistory.PartID,
tblPartCostHistory.PartCostHistoryID
FROM
tblPartCostHistory
INNER JOIN
(
SELECT
PartID,
MAX(Revision) AS MaxOfRevision
FROM tblPartCostHistory
GROUP BY PartID
) AS max
ON max.PartID = tblPartCostHistory.PartID
AND max.MaxOfRevision = tblPartCostHistory.Revision
SELECT PartID,MAX(PartCostHistoryID) FROM table GROUP BY PartID
Here is query
select PartCostHistoryId, PartId from tblCost
where PartCostHistoryId in
(select PartCostHistoryId from
(select * from tblCost as tbl order by Revision desc) as tbl1
group by PartId
)
Here is SQL Fiddle http://sqlfiddle.com/#!2/19c2d/12