Logic to create more rows in SQL

I have a table, table1, that I want to transform into the expected table below.
Expected logic for the columns:
cal: comes from cal in table1. ID: comes from ID in table1.
code: either lp or fp. If f_a has a value, we create a new record with fp as the code. For that record, if f_a is populated we put its date in the Al column for the same ID, and if f_pl is populated we put its date in the pl column.
If the code is lp, then if l_a is populated we put its date in Al for that code and ID, and if lpl is populated we put its date in pl.
I am a beginner with SQL, so I find it a bit overwhelming to know where to start. Any solutions would be appreciated.
table1:
ID | f_a | l_a | f_pl | lpl | cal
CNT | 6/20/2018 | 6/28/2018 | | 6/28/2018 | 1/31/2020
expected output:
ID | Cal | code | pl | Al
CNT | 1/31/2020 | lp | 6/28/2018 | 6/28/2018
CNT | 1/31/2020 | fp | | 6/20/2018
Update:
The table has more IDs, so CNT is not the only ID. If I use UNPIVOT, the same logic should apply to all IDs.

This is a question about how to unpivot columns to rows. In Oracle, I would recommend a lateral join:
select t.id, t.cal, x.*
from mytable t
cross apply (
select 'lp' as code, t.lpl as pl, l_a as al from dual
union all
select 'fp', t.f_pl, t.f_a from dual
) x
This syntax is available in Oracle 12.1 onwards. In earlier versions, you would use union all:
select id, cal, 'lp' as code, lpl as pl, l_a as al from mytable
union all
select id, cal, 'fp', f_pl, f_a from mytable
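A minimal sketch of the UNION ALL version, run through Python's sqlite3 module instead of Oracle so it can be executed anywhere. The table name mytable comes from the answer; storing dates as ISO-8601 text is an assumption, since SQLite has no DATE type. Every source row produces one lp row and one fp row, which is why this approach works for any number of IDs.

```python
import sqlite3

# In-memory SQLite database standing in for the real Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id TEXT, f_a TEXT, l_a TEXT, f_pl TEXT, lpl TEXT, cal TEXT)"
)
# The single CNT row from the question; f_pl is empty, so it is stored as NULL.
conn.execute(
    "INSERT INTO mytable VALUES "
    "('CNT', '2018-06-20', '2018-06-28', NULL, '2018-06-28', '2020-01-31')"
)

# One SELECT per output code, glued together with UNION ALL.
rows = conn.execute(
    """
    SELECT id, cal, 'lp' AS code, lpl AS pl, l_a AS al FROM mytable
    UNION ALL
    SELECT id, cal, 'fp', f_pl, f_a FROM mytable
    ORDER BY id, code DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The output mirrors the expected table: an lp row with both dates, and an fp row whose pl is NULL.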

You can use UNPIVOT on multiple columns, then apply whatever checks you need on the dates:
with a as (
select
'CNT' as ID,
date '2018-06-20' as f_a,
date '2018-06-28' as l_a,
cast(null as date) as f_pl,
date '2018-06-28' as l_pl,
date '2020-01-31' as cal
from dual
)
select *
from a
unpivot(
(pl, al) for code in ((l_pl, l_a) as 'lp', (f_pl, f_a) as 'fp')
) up
ID | CAL | CODE | PL | AL
CNT | 31-JAN-20 | lp | 28-JUN-18 | 28-JUN-18
CNT | 31-JAN-20 | fp | | 20-JUN-18
Working example here.

Please try this script, which does not depend on the Oracle version:
-- Here we select columns from the source table. Please change the names if they are different
with r as (
select
ID,
f_a,
l_a,
f_pl,
lpl, -- Not sure whether the example is wrong or the column really has no underscore
cal
from table_1 -- Please put real table name here
)
select * from (
select r.id, r.cal, 'lp' as code, r.lpl as pl, r.l_a as al
from r
where r.l_a is not null
union all
select r1.id, r1.cal, 'fp', r1.f_pl, r1.f_a
from r r1
where r1.f_a is not null
)
order by id, cal, code;
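A minimal sketch of the filtered variant, again using Python's sqlite3 in place of Oracle. The second ID, XYZ, is a hypothetical row with no f_a value, added only to show that the WHERE clauses suppress the fp record when its source date is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (id TEXT, f_a TEXT, l_a TEXT, f_pl TEXT, lpl TEXT, cal TEXT)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("CNT", "2018-06-20", "2018-06-28", None, "2018-06-28", "2020-01-31"),
        # Hypothetical ID with no f_a, so it should get no 'fp' row.
        ("XYZ", None, "2019-03-01", None, "2019-02-15", "2020-01-31"),
    ],
)

# Each branch only emits a row when its "a" date is present.
rows = conn.execute(
    """
    SELECT id, cal, 'lp' AS code, lpl AS pl, l_a AS al
    FROM table_1
    WHERE l_a IS NOT NULL
    UNION ALL
    SELECT id, cal, 'fp', f_pl, f_a
    FROM table_1
    WHERE f_a IS NOT NULL
    ORDER BY id, cal, code
    """
).fetchall()

for row in rows:
    print(row)
```

CNT gets both rows, while XYZ only gets its lp row.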

Related

Mapping All Terminal IDs to Previous IDs

I have a table in SQL Server that contains a list of all ID migrations over time. An individual's ID can change over time, and this table helps us understand when the change occurs and what the ID changes from/to. What I'd ultimately like is a way to list all of the previous IDs for the most recent ID (which I'm referring to as the terminal ID). I'm assuming this will require some sort of CTE, but my brain is in a bit of a fog as to how I should set this up.
CREATE TABLE #ExampleIdCrosswalk
(
CurrentId VARCHAR(3)
,PreviousId VARCHAR(3)
,PreviousIdObsoleteDate DATE
)
INSERT INTO #ExampleIdCrosswalk
VALUES
('DEF','ABC','2021-01-01')
,('WVU','ZYX','2021-01-01')
,('MNO','ONM','2021-02-01')
,('PPP','EEE','2021-02-01')
,('GHI','DEF','2021-03-01')
,('TSR','WVU','2021-03-01')
,('NRP','QRS','2021-03-01')
,('JKL','GHI','2021-04-01')
SELECT * FROM #ExampleIdCrosswalk
Ultimately, what I'd like to show is a table with all the terminal IDs along with each of their corresponding previous IDs.
Any help would be appreciated!
You can use a recursive CTE for this:
with cte as (
select currentid, previousid
from ExampleIdCrosswalk ec
where not exists (select 1 from ExampleIdCrosswalk ec2 where ec2.previousId = ec.currentid)
union all
select cte.currentid, ec.previousid
from cte join
ExampleIdCrosswalk ec
on ec.currentId = cte.previousId
)
select *
from cte;
Here is a db<>fiddle.
You can use a recursive CTE, as in:
with
n (last, curr, prev) as (
select currentid, currentid, previousid
from ExampleIdCrosswalk where currentid not in (
select previousid from ExampleIdCrosswalk
)
union all
select n.last, c.currentid, c.previousid
from n
join ExampleIdCrosswalk c on c.currentid = n.prev
)
select last, prev
from n
order by last, prev
Result:
last prev
----- ----
JKL ABC
JKL DEF
JKL GHI
MNO ONM
NRP QRS
PPP EEE
TSR WVU
TSR ZYX
See running example at db<>fiddle.
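A minimal sketch of the same recursive CTE, executed with Python's sqlite3 on the question's sample data so the result can be checked. SQLite spells it WITH RECURSIVE, whereas SQL Server uses plain WITH; the column aliases terminal_id and previous_id are illustrative names, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExampleIdCrosswalk "
    "(CurrentId TEXT, PreviousId TEXT, PreviousIdObsoleteDate TEXT)"
)
conn.executemany(
    "INSERT INTO ExampleIdCrosswalk VALUES (?, ?, ?)",
    [
        ("DEF", "ABC", "2021-01-01"),
        ("WVU", "ZYX", "2021-01-01"),
        ("MNO", "ONM", "2021-02-01"),
        ("PPP", "EEE", "2021-02-01"),
        ("GHI", "DEF", "2021-03-01"),
        ("TSR", "WVU", "2021-03-01"),
        ("NRP", "QRS", "2021-03-01"),
        ("JKL", "GHI", "2021-04-01"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE chain(terminal_id, previous_id) AS (
        -- Anchor: an ID that never appears as a PreviousId is terminal.
        SELECT c.CurrentId, c.PreviousId
        FROM ExampleIdCrosswalk c
        WHERE NOT EXISTS (
            SELECT 1 FROM ExampleIdCrosswalk c2 WHERE c2.PreviousId = c.CurrentId
        )
        UNION ALL
        -- Step: walk one link further back, carrying the terminal ID along.
        SELECT chain.terminal_id, c.PreviousId
        FROM chain
        JOIN ExampleIdCrosswalk c ON c.CurrentId = chain.previous_id
    )
    SELECT terminal_id, previous_id FROM chain ORDER BY terminal_id, previous_id
    """
).fetchall()

for row in rows:
    print(row)
```

The rows match the result table above: JKL maps to ABC, DEF and GHI, and TSR maps to WVU and ZYX.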

Split record into 2 records with distinct values based on a unique id

I have a table with some IDs that correspond to duplicate data that I would like to get rid of. They are linked by a GroupID value. Currently my data looks like this:
|GroupID|NID1 |NID2 |
|S1 |644763|643257|
|T2 |4759 |84689 |
|W3 |96676 |585876|
In order for the software to run, I need the data in the following format:
|GroupID|NID |
|S1 |644763|
|S1 |643257|
|T2 |4759 |
|T2 |84689 |
|W3 |96676 |
|W3 |585876|
Thank you for your time.
You want UNION ALL:
select groupid, nid1 as nid
from t
union all -- use "union" instead if you don't want duplicate rows
select groupid, nid2
from t;
In Oracle 12c+, you can use lateral joins:
select t.groupid, v.nid
from t cross apply
(select t.nid1 as nid from dual union all
select t.nid2 as nid from dual
) v;
This can be more efficient than UNION ALL because it scans the table only once.
You can also express this as:
select t.groupid,
(case when n.n = 1 then t.nid1 when n.n = 2 then t.nid2 end) as nid
from t cross join
(select 1 as n from dual union all select 2 from dual) n;
A little more complicated, but still only one scan of the table.
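A minimal sketch of the cross-join-with-numbers variant, run with Python's sqlite3 rather than Oracle (SQLite needs no FROM dual). The table name t follows the answer; the data is the question's sample. Each source row is paired with n = 1 and n = 2, and the CASE picks nid1 or nid2 accordingly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (groupid TEXT, nid1 INTEGER, nid2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("S1", 644763, 643257), ("T2", 4759, 84689), ("W3", 96676, 585876)],
)

# Two-row "numbers" table; every row of t is duplicated once per number.
rows = conn.execute(
    """
    SELECT t.groupid,
           CASE n.n WHEN 1 THEN t.nid1 WHEN 2 THEN t.nid2 END AS nid
    FROM t
    CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS n
    ORDER BY t.groupid, n.n
    """
).fetchall()

for row in rows:
    print(row)
```

Adding a third NID column would only need a third number and a third WHEN branch.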

How to eliminate duplicates and merge column values into a single text in Vertica

I am trying to join three tables and get the results; however, one of the tables has multiple event_code values for the same CSO_Item_key, which results in duplicate records.
Please note my source is Vertica and my target is SQL Server.
I tried the STUFF and FOR XML approach, but it does not work in Vertica; it reports an incorrect syntax error at XML.
Is there any other solution?
Table 1
Entry Date | Cso Item Key | Fail Code
8/1/2018 4:28 | BLXB796201 | CSL120
8/1/2018 4:40 | BLXB799101 | CLL250
8/1/2018 4:55 | BLXB803001 | CMS130
8/1/2018 5:08 | BLXB806201 | CNE100
Table 2
Cso Item Key | Event Code
BLXB796201 | GTS
BLXB796201 | LC28
BLXB796201 | SDR4
BLXB799101 | GTS
BLXB799101 | LC28
BLXB799101 | SDR4
BLXB803001 | GTS
BLXB803001 | LC28
BLXB803001 | SDR4
BLXB806201 | GTS
BLXB806201 | LC28
BLXB806201 | SDR4
Table 3
Fail Code | Desc
CSL120 | Bad Part
CLL250 | Unit Scrapped
CNE100 | OS Reinstall
CBN101 | NTF
Expected Result:
Entry_Date | Cso_Item_Key | Fail_Code | Desc | Event_Code
8/1/2018 4:28 | BLXB796201 | CSL120 | Bad Part | GTS,LC28,SDR4
8/1/2018 4:40 | BLXB799101 | CLL250 | Unit Scrapped | GTS,LC28,SDR4
8/1/2018 4:55 | BLXB803001 | CMS130 | Null | GTS,LC28,SDR4
8/1/2018 5:08 | BLXB806201 | CNE100 | OS Reinstall | GTS,LC28,SDR4
One of the only solutions I've seen for this is the strings_package extension, which can be found on GitHub. With it, you can use the group_concat function like so:
-- get a list of nodes
select group_concat(node_name) over () from nodes;
-- nodes with storage for a projection
select schema_name,projection_name,
group_concat(node_name) over (partition by schema_name,projection_name)
from (select distinct node_name,schema_name,projection_name from storage_containers) sc order by schema_name, projection_name;
This tries to do it all in SQL, and it is a bit of a cheat: I am relying on the fact that Table_2 always has 3 different event codes for each CSO Item Key.
If that is not the case, you would have to add rows to the i index table I'm creating as a common table expression, up to the maximum number of Event Codes per CSO Item Key. You would also have to LEFT JOIN that i table to tb2 and add NULL handling to each concatenated piece, such as ||','||MAX(CASE i.i WHEN 2 THEN event_code END), so that an empty string is concatenated when the event_code in that piece is NULL.
Otherwise, with your input (which you should take out of the query when you use it for real), it could look like this:
WITH
-- your input, don't use in real query ...
tb1(Entry_Date,Cso_Item_Key,Fail_Code) AS (
SELECT TIMESTAMP '8/1/2018 4:28','BLXB796201','CSL120'
UNION ALL SELECT TIMESTAMP '8/1/2018 4:40','BLXB799101','CLL250'
UNION ALL SELECT TIMESTAMP '8/1/2018 4:55','BLXB803001','CMS130'
UNION ALL SELECT TIMESTAMP '8/1/2018 5:08','BLXB806201','CNE100'
)
,
tb2(Cso_Item_Key,Event_Code) AS (
SELECT 'BLXB796201','GTS'
UNION ALL SELECT 'BLXB796201','LC28'
UNION ALL SELECT 'BLXB796201','SDR4'
UNION ALL SELECT 'BLXB799101','GTS'
UNION ALL SELECT 'BLXB799101','LC28'
UNION ALL SELECT 'BLXB799101','SDR4'
UNION ALL SELECT 'BLXB803001','GTS'
UNION ALL SELECT 'BLXB803001','LC28'
UNION ALL SELECT 'BLXB803001','SDR4'
UNION ALL SELECT 'BLXB806201','GTS'
UNION ALL SELECT 'BLXB806201','LC28'
UNION ALL SELECT 'BLXB806201','SDR4'
)
,
tb3(Fail_Code,Descr) AS (
SELECT 'CSL120','Bad Part'
UNION ALL SELECT 'CLL250','Unit Scrapped'
UNION ALL SELECT 'CNE100','OS Reinstall'
UNION ALL SELECT 'CBN101','NTF'
)
-- real WITH clause starts here - and table "i" can contain more than 3 rows..
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
tb2_w_i AS (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY cso_item_key ORDER BY event_code) AS i
FROM tb2
)
,
tb2_pivot AS (
SELECT
cso_item_key
, MAX(CASE i.i WHEN 1 THEN event_code END)
||','||MAX(CASE i.i WHEN 2 THEN event_code END)
||','||MAX(CASE i.i WHEN 3 THEN event_code END)
AS event_codes
FROM tb2_w_i JOIN i USING(i)
GROUP BY 1
)
SELECT
entry_date
, tb1.cso_item_key
, tb1.fail_code
, descr
, event_codes
FROM tb1
JOIN tb2_pivot USING(cso_item_key)
LEFT JOIN tb3 USING(fail_code)
;
The result (my NULL display string is the dash):
entry_date |cso_item_key|fail_code|descr |event_codes
2018-08-01 04:28:00|BLXB796201 |CSL120 |Bad Part |GTS,LC28,SDR4
2018-08-01 04:40:00|BLXB799101 |CLL250 |Unit Scrapped|GTS,LC28,SDR4
2018-08-01 04:55:00|BLXB803001 |CMS130 |- |GTS,LC28,SDR4
2018-08-01 05:08:00|BLXB806201 |CNE100 |OS Reinstall |GTS,LC28,SDR4
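A minimal sketch of the same ROW_NUMBER-and-CASE pivot with the NULL handling described above, run through Python's sqlite3 (window functions need SQLite 3.25+), not Vertica. The ROW_NUMBER column drives the CASE slots directly, so the separate i table is omitted here. Allowing up to four codes per key is an assumption; the empty fourth slot shows that COALESCE keeps a missing code from turning the whole string NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tb1 (entry_date TEXT, cso_item_key TEXT, fail_code TEXT);
    CREATE TABLE tb2 (cso_item_key TEXT, event_code TEXT);
    CREATE TABLE tb3 (fail_code TEXT, descr TEXT);
    INSERT INTO tb1 VALUES
      ('2018-08-01 04:28', 'BLXB796201', 'CSL120'),
      ('2018-08-01 04:40', 'BLXB799101', 'CLL250'),
      ('2018-08-01 04:55', 'BLXB803001', 'CMS130'),
      ('2018-08-01 05:08', 'BLXB806201', 'CNE100');
    INSERT INTO tb3 VALUES
      ('CSL120', 'Bad Part'), ('CLL250', 'Unit Scrapped'),
      ('CNE100', 'OS Reinstall'), ('CBN101', 'NTF');
    """
)
conn.executemany(
    "INSERT INTO tb2 VALUES (?, ?)",
    [(key, code)
     for key in ("BLXB796201", "BLXB799101", "BLXB803001", "BLXB806201")
     for code in ("GTS", "LC28", "SDR4")],
)

rows = conn.execute(
    """
    WITH numbered AS (
      SELECT cso_item_key, event_code,
             ROW_NUMBER() OVER (PARTITION BY cso_item_key ORDER BY event_code) AS i
      FROM tb2
    ),
    pivoted AS (
      -- ',' || NULL is NULL, so COALESCE swaps empty slots for ''.
      SELECT cso_item_key,
             MAX(CASE i WHEN 1 THEN event_code END)
             || COALESCE(',' || MAX(CASE i WHEN 2 THEN event_code END), '')
             || COALESCE(',' || MAX(CASE i WHEN 3 THEN event_code END), '')
             || COALESCE(',' || MAX(CASE i WHEN 4 THEN event_code END), '')
             AS event_codes
      FROM numbered
      GROUP BY cso_item_key
    )
    SELECT tb1.entry_date, tb1.cso_item_key, tb1.fail_code, tb3.descr, pivoted.event_codes
    FROM tb1
    JOIN pivoted ON pivoted.cso_item_key = tb1.cso_item_key
    LEFT JOIN tb3 ON tb3.fail_code = tb1.fail_code
    ORDER BY tb1.entry_date
    """
).fetchall()

for row in rows:
    print(row)
```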

SQL with nested condition

EDIT: added third requirement after playing with solution from Tim Biegeleisen
EDIT2: modified Robbie's DOB to be before his parent's marriage date
I am trying to create a query that will look at two tables and determine the difference in dates based on a percentage. I know, super confusing... Let me try and explain using the tables below:
Bob and Mary were married on 2010-01-01 and expect 4 kids (Parent table).
I want to know how many years it took until they reached 50% of their expected kids (i.e. 2 of 4 kids). Using the Child table to see the DOBs of their 4 kids, we know that Frankie is the second child, which meets our 50% threshold, so we subtract Frankie's parents' marriage date from Frankie's DOB and end up with 3 years!
If the goal isn't reached, display no value; e.g. Mick and Jo have only had 1 child so far, so they haven't yet reached their goal.
Hoping this is doable using BigQuery standard SQL.
Parent table
id married_couple married_at expected_kids
--------------------------------------
1 Bob and Mary 2010-01-01 4
2 Mick and Jo 2010-01-01 4
Child table
id child_name parent_id date_of_birth
--------------------------------------
1 Eddie 1 2012-01-01
2 Frankie 1 2013-01-01
3 Robbie 1 2005-01-01
4 Duncan 1 2015-01-01
5 Rick 2 2014-01-01
Expected SQL result
parent_id half_goal_reached(years)
--------------------------------------
1 3
2
Below are two solutions for BigQuery Standard SQL.
The first one is written in a more classic SQL way; the second one is more BigQuery style (I think).
First Solution: with analytics function
#standardSQL
SELECT
parent_id,
IF(
MAX(pos) = MAX(CAST(expected_kids / 2 AS INT64)),
MAX(DATE_DIFF(date_of_birth, married_at, YEAR)),
NULL
) AS half_goal_reached
FROM (
SELECT c.parent_id, c.date_of_birth, expected_kids, married_at,
ROW_NUMBER() OVER(PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS pos
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
)
WHERE pos <= CAST(expected_kids / 2 AS INT64)
GROUP BY parent_id
Second Solution: with use of ARRAY
#standardSQL
SELECT
parent_id,
DATE_DIFF(dates[SAFE_ORDINAL(CAST(expected_kids / 2 AS INT64))], married_at, YEAR) AS half_goal_reached
FROM (
SELECT
parent_id,
ARRAY_AGG(date_of_birth ORDER BY date_of_birth) AS dates,
MAX(expected_kids) AS expected_kids,
MAX(married_at) AS married_at
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
GROUP BY parent_id
)
Dummy Data
You can test / play with both solutions using below dummy data
#standardSQL
WITH `parent` AS (
SELECT 1 id, 'Bob and Mary' married_couple, DATE '2010-01-01' married_at, 4 expected_kids UNION ALL
SELECT 2, 'Mick and Jo', DATE '2010-01-01', 4
),
`child` AS (
SELECT 1 id, 'Eddie' child_name, 1 parent_id, DATE '2012-01-01' date_of_birth UNION ALL
SELECT 2, 'Frankie', 1, DATE '2013-01-01' UNION ALL
SELECT 3, 'Robbie', 1, DATE '2014-01-01' UNION ALL
SELECT 4, 'Duncan', 1, DATE '2015-01-01' UNION ALL
SELECT 5, 'Rick', 2, DATE '2014-01-01'
)
Try the following query, whose logic is a bit verbose to explain. I join the parent and child tables, lining up the parent ID, the number of years elapsed since marriage, the running number of children, and the expected number of children. With this information in hand, we can easily find the first row whose running number of children matches or exceeds half of the expected number.
SELECT parent_id, num_years AS half_goal_reached
FROM
(
SELECT parent_id, num_years, cnt, expected_kids,
ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY num_years) rn
FROM
(
SELECT
t2.parent_id,
EXTRACT(YEAR FROM t2.date_of_birth) - EXTRACT(YEAR FROM t1.married_at) AS num_years,
(SELECT COUNT(*) FROM child c
WHERE c.parent_id = t2.parent_id AND
c.date_of_birth <= t2.date_of_birth) AS cnt,
t1.expected_kids
FROM parent t1
INNER JOIN child t2
ON t1.id = t2.parent_id
) t
WHERE
cnt >= expected_kids / 2
) t
WHERE t.rn = 1;
Note that there may be issues with how I computed the yearly differences, or how I computed the threshold for half the number of expected children. Also, if we were using a recent enterprise database, we could have used an analytic function to get the running number of children instead of a correlated subquery, but I was unsure whether BigQuery would support that, so I used the latter.
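The answer notes that an analytic function could replace the correlated subquery for the running count. A minimal sketch of that variant, run through Python's sqlite3 (SQLite 3.25+ for window functions) rather than BigQuery and using the dummy data from the first answer, is below. SQLite has no DATE_DIFF, so the year difference is taken with strftime('%Y', ...), which counts calendar-year boundaries; 2 * cnt >= expected_kids stands in for cnt >= expected_kids / 2 to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER, married_couple TEXT, married_at TEXT, expected_kids INTEGER);
    CREATE TABLE child (id INTEGER, child_name TEXT, parent_id INTEGER, date_of_birth TEXT);
    INSERT INTO parent VALUES
      (1, 'Bob and Mary', '2010-01-01', 4),
      (2, 'Mick and Jo', '2010-01-01', 4);
    INSERT INTO child VALUES
      (1, 'Eddie', 1, '2012-01-01'),
      (2, 'Frankie', 1, '2013-01-01'),
      (3, 'Robbie', 1, '2014-01-01'),
      (4, 'Duncan', 1, '2015-01-01'),
      (5, 'Rick', 2, '2014-01-01');
    """
)

rows = conn.execute(
    """
    WITH running AS (
      SELECT c.parent_id,
             CAST(strftime('%Y', c.date_of_birth) AS INTEGER)
               - CAST(strftime('%Y', p.married_at) AS INTEGER) AS num_years,
             -- Running count of children born up to and including this DOB.
             COUNT(*) OVER (PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS cnt,
             p.expected_kids
      FROM parent p
      JOIN child c ON c.parent_id = p.id
    )
    -- LEFT JOIN keeps couples who have not reached the goal (result NULL).
    SELECT p.id AS parent_id, MIN(r.num_years) AS half_goal_reached
    FROM parent p
    LEFT JOIN running r
      ON r.parent_id = p.id AND 2 * r.cnt >= r.expected_kids
    GROUP BY p.id
    ORDER BY p.id
    """
).fetchall()

print(rows)
```

Because num_years only grows with date_of_birth, MIN picks the first qualifying child, which plays the role of the rn = 1 filter.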

How can I SELECT distinct data based on a date field?

I have a table that stores a log of changes to objects in another table. Here are my table contents:
ObjID Color Date User
------- ------- ------------------------ --------
1 Red 2010-01-01 12:22:00.000 Joe
1 Blue 2010-01-02 15:22:00.000 Jill
1 Green 2010-01-03 16:22:00.000 Joe
1 White 2010-01-10 09:22:00.000 Mike
2 Red 2010-01-09 10:22:00.000 Mike
2 Blue 2010-01-12 09:22:00.000 Jill
2 Orange 2010-01-12 15:22:00.000 Joe
I want to select the most recent date for each Object, as well as the Color and User on the date of that record.
Basically, I want this result set:
ObjID Color Date User
------- ------- ------------------------ --------
1 White 2010-01-10 09:22:00.000 Mike
2 Orange 2010-01-12 15:22:00.000 Joe
I'm having trouble wrapping my head around the SQL query I need to write to get this data...
I am retrieving data via ODBC from an iSeries DB2 database (AS/400).
Hey there, I think you want the following (where ColorTable is your table name):
SELECT Color.*
FROM ColorTable as Color
INNER JOIN
(
SELECT ObjID, MAX(Date) as Date
FROM ColorTable
GROUP BY ObjID
) as MaxDateByColor
ON Color.ObjID = MaxDateByColor.ObjID
AND Color.Date = MaxDateByColor.Date
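A minimal sketch of this join-on-MAX query, run through Python's sqlite3 on the question's data instead of DB2. The table name ColorTable comes from the answer; Date and User are quoted defensively, and storing timestamps as ISO text (which sorts chronologically) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ColorTable (ObjID INTEGER, Color TEXT, "Date" TEXT, "User" TEXT)')
conn.executemany(
    "INSERT INTO ColorTable VALUES (?, ?, ?, ?)",
    [
        (1, "Red", "2010-01-01 12:22:00.000", "Joe"),
        (1, "Blue", "2010-01-02 15:22:00.000", "Jill"),
        (1, "Green", "2010-01-03 16:22:00.000", "Joe"),
        (1, "White", "2010-01-10 09:22:00.000", "Mike"),
        (2, "Red", "2010-01-09 10:22:00.000", "Mike"),
        (2, "Blue", "2010-01-12 09:22:00.000", "Jill"),
        (2, "Orange", "2010-01-12 15:22:00.000", "Joe"),
    ],
)

# Latest date per object, joined back to fetch the rest of that row.
rows = conn.execute(
    """
    SELECT c.ObjID, c.Color, c."Date", c."User"
    FROM ColorTable AS c
    INNER JOIN (
        SELECT ObjID, MAX("Date") AS MaxDate
        FROM ColorTable
        GROUP BY ObjID
    ) AS m
      ON c.ObjID = m.ObjID AND c."Date" = m.MaxDate
    ORDER BY c.ObjID
    """
).fetchall()

for row in rows:
    print(row)
```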
Assuming at least SQL Server 2005
CREATE TABLE #T (ObjID INT,Color VARCHAR(10),[Date] DATETIME,[User] VARCHAR(50))
INSERT INTO #T
SELECT 1,'Red',' 2010-01-01 12:22:00.000','Joe' UNION ALL
SELECT 1,'Blue','2010-01-02 15:22:00.000','Jill' UNION ALL
SELECT 1,'Green',' 2010-01-03 16:22:00.000','Joe' UNION ALL
SELECT 1,'White',' 2010-01-10 09:22:00.000','Mike' UNION ALL
SELECT 2,'Red',' 2010-01-09 10:22:00.000','Mike' UNION ALL
SELECT 2,'Blue','2010-01-12 09:22:00.000','Jill' UNION ALL
SELECT 2,'Orange','2010-01-12 15:22:00.000','Joe'
;WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY Date DESC) AS RN
FROM #T
)
SELECT ObjID,
Color,
[Date],
[User]
FROM T
WHERE RN=1
Or a SQL Server 2000 method from the article linked to in the comments
SELECT ObjID,
CAST(SUBSTRING(string, 24, 10) AS VARCHAR(10)) AS Color,
CAST(SUBSTRING(string, 1, 23) AS DATETIME ) AS [Date],
CAST(SUBSTRING(string, 34, 50) AS VARCHAR(50)) AS [User]
FROM
(
SELECT ObjID,
MAX((CONVERT(CHAR(23), [Date], 126)
+ CAST(Color AS CHAR(10))
+ CAST([User] AS CHAR(50))) COLLATE Latin1_General_BIN) AS string
FROM #T
GROUP BY ObjID) T;
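One difference between the ROW_NUMBER query above and the join-on-MAX approach shows up with ties. A minimal sketch, run through Python's sqlite3 (SQLite 3.25+ for window functions); the Purple/Ann row is hypothetical, added only to give ObjID 2 two rows with the same latest Date. ROW_NUMBER returns exactly one row per object (Color is an assumed tie-breaker), while the join on MAX returns every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T (ObjID INTEGER, Color TEXT, "Date" TEXT, "User" TEXT)')
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        (1, "Red", "2010-01-01 12:22:00.000", "Joe"),
        (1, "Blue", "2010-01-02 15:22:00.000", "Jill"),
        (1, "Green", "2010-01-03 16:22:00.000", "Joe"),
        (1, "White", "2010-01-10 09:22:00.000", "Mike"),
        (2, "Red", "2010-01-09 10:22:00.000", "Mike"),
        (2, "Blue", "2010-01-12 09:22:00.000", "Jill"),
        (2, "Orange", "2010-01-12 15:22:00.000", "Joe"),
        # Hypothetical row tying Orange's timestamp.
        (2, "Purple", "2010-01-12 15:22:00.000", "Ann"),
    ],
)

# ROW_NUMBER: one row per ObjID; Color breaks ties deterministically.
rn_rows = conn.execute(
    """
    SELECT ObjID, Color, "Date", "User"
    FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY "Date" DESC, Color) AS RN
      FROM T
    )
    WHERE RN = 1
    ORDER BY ObjID
    """
).fetchall()

# Join on MAX: every row matching the latest Date is returned.
max_rows = conn.execute(
    """
    SELECT t.ObjID, t.Color
    FROM T AS t
    JOIN (SELECT ObjID, MAX("Date") AS MaxDate FROM T GROUP BY ObjID) AS m
      ON t.ObjID = m.ObjID AND t."Date" = m.MaxDate
    ORDER BY t.ObjID, t.Color
    """
).fetchall()

print(rn_rows)
print(max_rows)
```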
If you have an Objects table and your ObjectHistory table has an index on ObjID and date, then this could perform better than other queries given so far:
SELECT
X.*
FROM
Objects O
CROSS APPLY (
SELECT TOP 1 *
FROM ObjectHistory H
WHERE H.ObjID = O.ObjID
ORDER BY H.[Date] DESC
) X
The performance improvement may only come if you're pulling columns from the Objects table, too, but it's worth a shot.
If you want all Objects regardless of whether they have a history entry, switch to OUTER APPLY (and of course use O.ObjID instead of H.ObjID).
The neat thing about this query is that it solves for situations where the Date value can have duplicates, and it can support an arbitrary number of items per group (say, the top 5 instead of the top 1).
See these two related questions:
SQL/mysql - Select distinct/UNIQUE but return all columns?
And:
How to efficiently determine changes between rows using SQL
SELECT t1.* FROM Table_name as t1
INNER JOIN (
SELECT MAX(Date) as MaxDate, ObjID FROM Table_name
GROUP BY ObjID
) as t2
ON t1.ObjID = t2.ObjID AND t1.Date = t2.MaxDate
You can find out, per object, its most recent change like this:
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
You can then get the color and user columns by joining the set returned above, used as an aliased inline view, back to the same table:
select color, user, FOO.objectid, FOO.LatestChange
from LOG
inner join
(
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
) as FOO
on LOG.objectid = FOO.objectid and LOG.changedate = FOO.LatestChange
Like Martin Smith's answer above, simply use ROW_NUMBER over a partition and pick the most recent row in each partition, like this:
SELECT ObjID, Color, [Date], [User]
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY [Date] DESC) AS RN
FROM [tablename]
) AS R
WHERE
RN = 1