Join Issue in Teradata


I have a status column in each of two tables, defined as status VARCHAR(80) CHARACTER SET LATIN CASESPECIFIC.
The first table has about 1000 rows whose status is 'success'.
The second table has one row with status 'success' plus rows for other values such as 'failure'. I want to join the two tables and get dw_status_id from the second table.
First table, scratch.COGIPF_RUNREPORT_test:
STATUS | any_number
success | 67
success | 1
success | 2
success | 3
success | 42
success | 52
failure | 45
Second table, scratch.dw_job_status_dim_test:
status |dw_status_id
failure |34
success |12
running |45
Expected result:
Status | dw_status_id
success | 12
success | 12
success | 12
success | 12
success | 12
success | 12
failure | 34
Query I am using:
sel b.dw_status_id from scratch.COGIPF_RUNREPORT_test a
join scratch.dw_job_status_dim_test b on trim(a.status)=trim(b.status)
Actual result: 0 rows.
It would be great if anyone could help me achieve this.
Thanks

You only need to select status from scratch.COGIPF_RUNREPORT_test and dw_status_id from scratch.dw_job_status_dim_test, matching rows whose status is the same in both tables.
I tried this on my own; maybe it helps:
select distinct scratch.COGIPF_RUNREPORT_test.status, dw_status_id
from scratch.COGIPF_RUNREPORT_test, scratch.dw_job_status_dim_test
where scratch.COGIPF_RUNREPORT_test.status = scratch.dw_job_status_dim_test.status
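To check that the join logic itself is sound, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Teradata here, and the short table names runreport and job_status_dim are placeholders of mine, not the real objects. On clean data the plain join returns the seven expected rows, while the DISTINCT version collapses them to two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Placeholder names standing in for the two scratch tables in the question.
    CREATE TABLE runreport (status TEXT, any_number INTEGER);
    CREATE TABLE job_status_dim (status TEXT, dw_status_id INTEGER);
    INSERT INTO runreport VALUES
        ('success', 67), ('success', 1), ('success', 2), ('success', 3),
        ('success', 42), ('success', 52), ('failure', 45);
    INSERT INTO job_status_dim VALUES
        ('failure', 34), ('success', 12), ('running', 45);
""")

# One output row per row of the first table.
rows = conn.execute("""
    SELECT a.status, b.dw_status_id
    FROM runreport AS a
    JOIN job_status_dim AS b ON a.status = b.status
    ORDER BY b.dw_status_id, a.any_number
""").fetchall()

# DISTINCT keeps only one row per (status, dw_status_id) pair.
distinct_rows = conn.execute("""
    SELECT DISTINCT a.status, b.dw_status_id
    FROM runreport AS a
    JOIN job_status_dim AS b ON a.status = b.status
    ORDER BY b.dw_status_id
""").fetchall()

print(len(rows), distinct_rows)
```

If the same join returns nothing on the real tables, the stored values probably differ in some way the sample listing does not show.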

These are 3 records from the first table:
ship by date on time metrics-superrush 0 0 0 0 2018-03-07 01:40:08 2018-03-07 00:00:00 2018-03-07 01:41:46.917000 1.64101666667 success
lab - late shipment details 0 0 0 0 2018-03-07 01:40:08 2018-03-07 00:00:00 2018-03-07 01:40:44.950000 0.6078 success
shipping upgrade-tp/wpd/mypub 0 0 0 0 2018-03-07 01:40:09 2018-03-07 00:00:00 2018-03-07 01:40:25.028000 0.2674 success
These are some records from the second table:
dw_status_id | status | description | success_indicator
10 | running | RUNNING | 0
11 | stopped | STOPPED | 0
12 | success | SUCCESS | 1
I have tried both of your queries, and both return no rows.
The join should give the desired result, so something must be wrong either in the join itself or in how the CASESPECIFIC VARCHAR columns compare. Please let me know if you have any more thoughts on where I am going wrong.
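One way to narrow this down is to look at the length and byte values of the stored strings. The sketch below is only an illustration of that diagnostic: it uses SQLite through Python's sqlite3 rather than Teradata, the tables fact and dim are made up, and the trailing tab is a hypothetical example of dirty data, not something known about the real tables. It shows a join that finds nothing even with trim() on both sides, and how length() and hex() reveal why.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (status TEXT);
    CREATE TABLE dim (status TEXT, dw_status_id INTEGER);
""")
# Hypothetical dirty value: a tab after the word. SQLite's one-argument
# trim() strips spaces only, so the tab survives.
conn.execute("INSERT INTO fact VALUES (?)", ("success\t",))
conn.execute("INSERT INTO dim VALUES (?, ?)", ("success", 12))

matches = conn.execute(
    "SELECT COUNT(*) FROM fact a JOIN dim b ON trim(a.status) = trim(b.status)"
).fetchone()[0]

# Length and byte values expose characters that do not show up on screen.
profile = conn.execute(
    "SELECT length(status), hex(status) FROM fact"
).fetchall()

print(matches, profile)
```

Running the equivalent length and hex checks against both Teradata tables would show whether the two 'success' values are really byte-for-byte identical. With a CASESPECIFIC column, a difference in letter case would also prevent a match.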

Related

SQL query that I have set up the algorithm but cannot write the code

I could not find keywords to describe this in the title.
I can only explain my problem with an example. I have a table like this:
user_id | transaction_id | bonus_id | created_at
1 | 1 | 4 | 2021-05-01
1 | 3 | 65 | 2021-05-01
1 | 4 | 4 | 2021-05-02
1 | 1 | 5 | 2021-05-02
1 | 3 | 76 | 2021-05-03
1 | 2 | 5 | 2021-05-03
Due to a mistake I made in PHP, the row with transaction id 3 has bonus id 65, but it should have bonus id 4.
For every row from one type-1 transaction up to the next type-1 transaction, I need to replace the bonus id with the bonus id of that first type-1 transaction.
Of course I have to do this for every user. How can I do that?
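A minimal sketch of one way to express this, run on SQLite through Python's sqlite3 so it can be tried as-is. It assumes an auto-increment id column that records insertion order; that column is my assumption, since created_at alone cannot order two rows from the same day. A running count of type-1 rows numbers the blocks, and every row then takes the bonus id of the type-1 row in its block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tx (
        id INTEGER PRIMARY KEY,   -- assumed ordering column, not in the question
        user_id INTEGER, transaction_id INTEGER, bonus_id INTEGER, created_at TEXT
    );
    INSERT INTO tx (user_id, transaction_id, bonus_id, created_at) VALUES
        (1, 1, 4,  '2021-05-01'), (1, 3, 65, '2021-05-01'),
        (1, 4, 4,  '2021-05-02'), (1, 1, 5,  '2021-05-02'),
        (1, 3, 76, '2021-05-03'), (1, 2, 5,  '2021-05-03');
""")

fixed = conn.execute("""
    SELECT id, transaction_id,
           -- the only non-NULL value in each block is the type-1 row's bonus_id
           MAX(CASE WHEN transaction_id = 1 THEN bonus_id END)
               OVER (PARTITION BY user_id, block) AS bonus_id
    FROM (
        SELECT *,
               -- running count of type-1 rows: increases at each new block
               SUM(CASE WHEN transaction_id = 1 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY user_id ORDER BY id) AS block
        FROM tx
    )
    ORDER BY id
""").fetchall()

for row in fixed:
    print(row)
```

The SELECT only computes the corrected values; an UPDATE keyed on id could then write them back.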

Write a query in MSSQL to get report for last 30 days

I want to create a report based on this data:
Lead Name | Verified By | Verified on
Lead 1 | ABC | 11-02-2021
Lead 2 | KMJ | 9-02-2021
Lead 3 | ABC | 11-02-2021
The report should look like this. Taking today's date as 12-02-2021, we need a report of each employee's work count for the last 30 days:
user | 12-02-2021 | 11-02-2021 | 10-02-2021 | 9-02-2021 | 8-02-2021 | 7-02-2021 | ... and so on for the last 30 days
ABC | 0 | 2 | 0 | 0 | 0 | 0
XYZ | 0 | 0 | 0 | 0 | 0 | 0
KMJ | 0 | 0 | 0 | 1 | 0 | 0
I have written this MSSQL filter:
CAST(lead.CREATED_ON as date) between cast(DATEADD(day, -30, getdate()) as date) and CAST(getdate() as date)
but I am not able to get the data in the format below. Also, if there is no entry for a date, that date should show 0 for every user.
user | 12-02-2021 | 11-02-2021 | 10-02-2021 | 9-02-2021 | 8-02-2021 | 7-02-2021 | ... and so on
Kindly help me complete this query. If possible, please share a link to a relevant article; it would be a great help. Thank you.

First of all, are the dates really stored as strings like that? If so, that's a big problem that will make this already-difficult situation much worse. It's important enough you should consider the current schema as actively broken.
Moving on, the most correct way to handle this situation is to pivot the data in the client code or reporting tool. That is, return a result set from the SQL database looking more like this:
User | Date | Value
ABC | 2021-02-12 | 0
ABC | 2021-02-11 | 2
ABC | 2021-02-10 | 0
ABC | 2021-02-09 | 0
ABC | 2021-02-08 | 0
ABC | 2021-02-07 | 0
XYZ | 2021-02-12 | 0
XYZ | 2021-02-11 | 0
XYZ | 2021-02-10 | 0
XYZ | 2021-02-09 | 0
XYZ | 2021-02-08 | 0
XYZ | 2021-02-07 | 0
... and so on
And then let the client do the work to reformat this data however you want, rather than asking a server that is often licensed at thousands of dollars per cpu core and isn't suited for the task to do that work.
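As an illustration of the client-side pivot, here is a minimal Python sketch. The function name pivot_counts and the shape of its inputs are my own choices; it assumes the database returns (user, date, value) rows and that the list of users is known to the client. Missing user and date combinations default to 0, which covers the requirement that empty dates still show a zero.

```python
from datetime import date, timedelta

def pivot_counts(rows, users, today, days=30):
    """Turn long (user, date, value) rows into one dict of dates per user."""
    # Column order matches the report: today first, then going back in time.
    columns = [today - timedelta(days=i) for i in range(days)]
    table = {user: {day: 0 for day in columns} for user in users}
    for user, day, value in rows:
        # Ignore rows for unknown users or dates outside the window.
        if user in table and day in table[user]:
            table[user][day] += value
    return columns, table

long_rows = [
    ("ABC", date(2021, 2, 11), 2),
    ("KMJ", date(2021, 2, 9), 1),
]
columns, table = pivot_counts(long_rows, ["ABC", "XYZ", "KMJ"], date(2021, 2, 12))

for user, counts in table.items():
    print(user, [counts[day] for day in columns[:6]])
```

The database only has to run a plain GROUP BY on user and date; the number of date columns never appears in the SQL.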
But if you really must do this on the database server, you should know the SQL language has a very strict requirement to know the number, name, and type of columns in the output at query evaluation time, before looking at any data. Even SELECT * respects this, because the * is based on a table definition known ahead of time.
If the output won't know how many columns there are until it looks at the data, you must do this in 3 steps:
1. Run a query to get data about columns.
2. Use the result from step 1 to build a new query dynamically.
3. Run the new query.
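The three steps can be sketched like this, using SQLite through Python's sqlite3 as a stand-in for SQL Server. The leads table and its column names are assumptions based on the question's sample data, with dates stored in ISO format. Conditional SUMs take the place of PIVOT here, since SQLite has no PIVOT keyword.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (lead_name TEXT, verified_by TEXT, verified_on TEXT)")
conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", [
    ("Lead 1", "ABC", "2021-02-11"),
    ("Lead 2", "KMJ", "2021-02-09"),
    ("Lead 3", "ABC", "2021-02-11"),
])

# Step 1: ask the data which columns the report needs.
days = [r[0] for r in conn.execute(
    "SELECT DISTINCT verified_on FROM leads ORDER BY verified_on DESC")]

# Step 2: build the query text, one conditional SUM per date from step 1.
# Column aliases cannot be bound as parameters, so each value is checked
# to be a plain ISO date before it is placed into the SQL text.
cols = []
for d in days:
    date.fromisoformat(d)  # raises ValueError for anything but YYYY-MM-DD
    cols.append('SUM(CASE WHEN verified_on = ? THEN 1 ELSE 0 END) AS "{}"'.format(d))
sql = ("SELECT verified_by, " + ", ".join(cols) +
       " FROM leads GROUP BY verified_by ORDER BY verified_by")

# Step 3: run the generated query.
cur = conn.execute(sql, days)
header = [c[0] for c in cur.description]
rows = cur.fetchall()

print(header)
print(rows)
```

Only users with at least one lead appear in this output; listing every user would need a join against a table of users.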
However, since it looks like you always want exactly 30 days worth of data, you may be able to do this in a single query using the PIVOT keyword if you're willing to name the columns something more like OneDayBack, TwoDaysBack, ThreeDaysBack, ... ThirtyDaysBack, such that you can reference them in the pivot code regardless of the current date.
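A sketch of the fixed-column idea, again on SQLite through Python's sqlite3, so it uses conditional SUMs rather than SQL Server's PIVOT keyword. The column names DaysBack0 to DaysBack29 and the leads table are hypothetical. Because each column is defined by its distance from today, the query text stays the same no matter which day it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (lead_name TEXT, verified_by TEXT, verified_on TEXT)")
conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", [
    ("Lead 1", "ABC", "2021-02-11"),
    ("Lead 2", "KMJ", "2021-02-09"),
    ("Lead 3", "ABC", "2021-02-11"),
])

DAYS = 30
# One column per offset; the text of this query never depends on the date.
cols = ", ".join(
    "SUM(CASE WHEN CAST(julianday(:today) - julianday(verified_on) AS INTEGER) = {0} "
    "THEN 1 ELSE 0 END) AS DaysBack{0}".format(i)
    for i in range(DAYS)
)
sql = "SELECT verified_by, {} FROM leads GROUP BY verified_by ORDER BY verified_by".format(cols)

cur = conn.execute(sql, {"today": "2021-02-12"})
names = [c[0] for c in cur.description]
report = {row[0]: dict(zip(names[1:], row[1:])) for row in cur}

print(report["ABC"]["DaysBack1"], report["KMJ"]["DaysBack3"])
```

The reporting layer can translate DaysBack1 back into a calendar date for display.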

Create an indicator using first record of each policy and subsequent transactions within 90 days from original

TL/DR: There are many transaction types for each policy, all start with an "N" as this is the new business transaction. R = Renewal and usually follow each year. A "C" means the policy was cancelled, and a "U" means it was reinstated, thus undoing the cancellation. I need to know if, for each policy, within 90 days of the N record the business cancelled (0) or was held via no cancellation or undo-ing of a cancellation which also occurred within that 90 day window (1).
VERY similar question to this one:
SQL Server: Find records with closest Date to CurrentDate based on conditions
My data would look like this, for 4 policies (ID).
Trs | Id | Effective_Dt | Expiration_DT
N | 01 | 2018-01-08 | 2018-01-23
C | 01 | 2018-01-23 | 2018-02-03
U | 01 | 2018-02-03 | 2019-01-08
R | 01 | 2019-01-08 | 2020-01-08
R | 01 | 2020-01-08 | 2021-01-08
N | 02 | 2019-10-10 | 2019-12-01
C | 02 | 2019-12-01 | NULL
N | 03 | 2017-06-10 | 2017-11-01
C | 03 | 2017-11-01 | NULL
N | 04 | 2017-06-10 | 2017-07-01
C | 04 | 2017-07-01 | 2017-11-01
U | 04 | 2017-11-01 | NULL
Each record is a new transaction within a policy, where N = new business written, C = canceled, U = reinstated (undo cancel), R = renewal.
The expiration date is usually 1 year from the effective date for renewals, but in the case when a cancellation happens the expiration date is the effective date of the new transaction.
For this ask I'm specifically concerned with N, C and U transactions; but I included R so you get a sense of what the data looks like. What I need to know, is which policies (0 if canceled or 1 indicator if retained) had a C transaction type within 90 days of their N transaction type... and were NOT followed by a U within that same period.
Example / Outcome:
Id | Retained
01 | 1
02 | 0
03 | 1
04 | 0
Details:
For policy 01 the N occurs on 2018-01-08. 90 days from this would be 2018-04-08. The C record on 2018-01-23 was undone on 2018-02-03; which falls within the 90 day range. So this policy would get a 1 for being retained.
Policy 02 the N occurs on 2019-10-10. 90 days from this would be 2020-01-08. The C record on 2019-12-01 was not undone. So this policy would get a 0 for being canceled.
Policy 03 the N occurs on 2017-06-10. The C record on 2017-11-01 happened after the 90-day window. So this policy would get a 1 for being retained.
Policy 04 the N occurs on 2017-06-10. The C record on 2017-07-01 happened inside the 90-day window, and was then undone on 2017-11-01, but that is after the 90 days. So this policy would get a 0.
I hope this isn't too poorly asked, but basically I am taking the date of the N transaction for each policy and comparing it to the last U or C transaction that occurred within 90 days of the N. If that transaction is a C, the result is 0; otherwise it is 1.

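Try this and see if it works for you:
select t.id,
       -- 1 when a 'C' row expires within 90 days of the policy's first effective date
       max(case when t.trs = 'C'
                 and t.Expiration_DT <= dateadd(day, 90, f.first_effective_dt)
                then 1 else 0 end) as indicator
from mytable t
join (select id, min(Effective_Dt) as first_effective_dt
      from mytable
      group by id) f on f.id = t.id
group by t.id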

You can use window functions in two steps: first get the date of the first 'N' per policy with a window min(), then do a conditional last_value() to recover the type of the last U or C transaction.
select distinct
    id,
    case
        last_value(
            case
                when coalesce(expiration_dt, effective_dt) <= dateadd(day, 90, n_effective_dt)
                     and trs in ('C', 'U')
                then trs
            end
        ) over(partition by id order by effective_dt)
        when 'U' then 1
        when 'C' then 0
    end retained
from (
    select
        t.*,
        min(case when trs = 'N' then effective_dt end) over(partition by id) n_effective_dt
    from mytable t
) t
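Here is a runnable sketch of the same two-step idea against the question's sample rows, using SQLite through Python's sqlite3. It is a variation, not the query above: it ranks the C and U rows with row_number() instead of last_value(), filters on effective_dt, and treats a policy with no C or U inside the window as retained. The table name policy and SQLite's date(x, '+90 days') are stand-ins for the real table and for DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy (trs TEXT, id TEXT, effective_dt TEXT, expiration_dt TEXT);
    INSERT INTO policy VALUES
        ('N','01','2018-01-08','2018-01-23'), ('C','01','2018-01-23','2018-02-03'),
        ('U','01','2018-02-03','2019-01-08'), ('R','01','2019-01-08','2020-01-08'),
        ('R','01','2020-01-08','2021-01-08'),
        ('N','02','2019-10-10','2019-12-01'), ('C','02','2019-12-01',NULL),
        ('N','03','2017-06-10','2017-11-01'), ('C','03','2017-11-01',NULL),
        ('N','04','2017-06-10','2017-07-01'), ('C','04','2017-07-01','2017-11-01'),
        ('U','04','2017-11-01',NULL);
""")

retained = conn.execute("""
    WITH n AS (              -- step 1: date of the N record per policy
        SELECT id, MIN(effective_dt) AS n_dt
        FROM policy WHERE trs = 'N' GROUP BY id
    ),
    w AS (                   -- step 2: C/U rows inside the 90-day window, newest first
        SELECT p.id, p.trs,
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY p.effective_dt DESC) AS rn
        FROM policy p
        JOIN n ON n.id = p.id
        WHERE p.trs IN ('C', 'U')
          AND p.effective_dt <= date(n.n_dt, '+90 days')
    )
    SELECT n.id,
           CASE WHEN w.trs = 'C' THEN 0 ELSE 1 END AS retained
    FROM n
    LEFT JOIN w ON w.id = n.id AND w.rn = 1
    ORDER BY n.id
""").fetchall()

print(retained)
```

On the sample data this gives 1, 0, 1, 0 for policies 01 to 04, matching the expected outcome in the question.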

I figured out my answer. I created a CTE:
NB_CANCELS AS (
    SELECT
        CONCAT(POL_SYMBOL_CD,POL_NBR) NB_CANCEL_ID
        ,EFFECTIVE_DT
        ,EFFECTIVE_TYPE_CD
        ,HISTORY_VLD_NBR
        ,PLN_EXP_DT
        ,EXPIRATION_TYPE_CD
        ,EXPIRATION_DT
    FROM (SELECT *,
                 (SELECT DATEADD(D,90,OGN_EFF_DT) -- Find NB record and add 90 days
                  FROM Exceed_Reporting.XCD.POLICY_TAB SUB
                  WHERE SUB.EFFECTIVE_TYPE_CD='N' -- NB only
                    AND (SUB.EXPIRATION_TYPE_CD!='7' OR EXPIRATION_TYPE_CD IS NULL)
                    AND SUB.POLICY_ID = PT.POLICY_ID
                    AND SUB.QUOTE_SEQUENCE_NBR = 0
                 ) NB90DAY
          FROM Exceed_Reporting.XCD.POLICY_TAB pt) PT1
    WHERE EFFECTIVE_TYPE_CD in ('5','3','s','p') -- Any cancel status
      AND (EXPIRATION_TYPE_CD!='7' OR EXPIRATION_TYPE_CD IS NULL)
      AND POL_SYMBOL_CD in ('HO','DP')
      AND RIGHT(policy_ID,4)!='SAVE'
      AND STATUS_CD='A'
      AND pt1.EFFECTIVE_DT<=NB90DAY
      AND (pt1.EXPIRATION_DT>=NB90DAY
           OR PT1.EXPIRATION_DT IS NULL)
)
,
Then, in my base query, I created the following CASE expression:
, CASE WHEN pt.EFFECTIVE_TYPE_CD!='N' THEN NULL
       WHEN CONCAT(PT.POL_SYMBOL_CD,PT.POL_NBR) IN (SELECT NB.NB_CANCEL_ID FROM NB_CANCELS nb)
       THEN 0
       ELSE 1
  END AS NB_90D_PERSISTENCE
FROM (SELECT *,DATEADD(dd,90,EFFECTIVE_DT) NB_90DAY
      FROM POLICY_TAB
      WHERE EFFECTIVE_TYPE_CD in ('N','R')
     ) PT

Construct a grouping column in SQL Server 2012

I have created a table that looks something like this:
ID  TSPPLY_DT   NEXT_DT     DAYS_BTWN  TIME_TO_EVENT      CENSORED  ENDPOINT
-----------------------------------------------------------------------------
1   2014-01-01  2014-01-10  10         10                 0         0
1   2014-01-10  2014-01-21  11         21                 0         0
1   2014-01-21  NULL        NULL       21                 1         0
2   2015-04-01  2015-04-30  30         30                 0         0
2   2015-04-30  2015-05-03  1          31                 0         1
2   2015-05-03  2015-05-06  3          34 (should be 3)   0         0
2   2015-05-06  2015-05-16  10         44 (should be 13)  1         0
The TIME_TO_EVENT column, however, is not adding up correctly with my code. The idea is to add up DAYS_BTWN until the ID changes, CENSORED = 1, or ENDPOINT = 1.
I think what I need is an additional column so that I can sum over the combination of ID and GROUPING, with an output as follows:
ID  TSPPLY_DT   NEXT_DT     DAYS_BTWN  TIME_TO_EVENT  CENSORED  ENDPOINT  GROUPING
----------------------------------------------------------------------------------------
1   2014-01-01  2014-01-10  10         10             0         0         A
1   2014-01-10  2014-01-21  11         21             0         0         A
1   2014-01-21  NULL        NULL       21             1         0         A
2   2015-04-01  2015-04-30  30         30             0         0         A
2   2015-04-30  2015-05-03  1          31             0         1         A
2   2015-05-03  2015-05-06  3          3              0         0         B
2   2015-05-06  2015-05-16  10         13             1         0         B
So any ideas on how to create the GROUPING column? It would be something like IF next rows ID is the same as current row, check CENSORED AND ENDPOINT. If either = 1, for the next row, change the grouping to a new value. Once a new ID is reached, reset the grouping to A (or whatever arbitrary value) and run the test again.
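The rule described above can be sketched with window functions. This version runs on SQLite through Python's sqlite3, with a made-up table name supply and the DAYS_BTWN values taken as stored. A row starts a new group when the previous row for the same ID had CENSORED = 1 or ENDPOINT = 1; a running sum of those flags numbers the groups, and TIME_TO_EVENT becomes a running total within each ID and group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supply (id INT, tspply_dt TEXT, next_dt TEXT,
                         days_btwn INT, censored INT, endpoint INT);
    INSERT INTO supply VALUES
        (1, '2014-01-01', '2014-01-10', 10,   0, 0),
        (1, '2014-01-10', '2014-01-21', 11,   0, 0),
        (1, '2014-01-21', NULL,         NULL, 1, 0),
        (2, '2015-04-01', '2015-04-30', 30,   0, 0),
        (2, '2015-04-30', '2015-05-03', 1,    0, 1),
        (2, '2015-05-03', '2015-05-06', 3,    0, 0),
        (2, '2015-05-06', '2015-05-16', 10,   1, 0);
""")

rows = conn.execute("""
    WITH flagged AS (
        -- 1 when the previous row of the same id closed a group
        SELECT *,
               LAG(CASE WHEN censored = 1 OR endpoint = 1 THEN 1 ELSE 0 END, 1, 0)
                   OVER (PARTITION BY id ORDER BY tspply_dt) AS starts_new
        FROM supply
    ),
    grouped AS (
        -- running count of group starts: 0 for the first group, 1 for the second, ...
        SELECT *,
               SUM(starts_new) OVER (PARTITION BY id ORDER BY tspply_dt) AS grp
        FROM flagged
    )
    SELECT id, tspply_dt,
           char(65 + grp) AS grouping_label,      -- 0 -> 'A', 1 -> 'B'
           SUM(days_btwn) OVER (PARTITION BY id, grp ORDER BY tspply_dt) AS time_to_event
    FROM grouped
    ORDER BY id, tspply_dt
""").fetchall()

for row in rows:
    print(row)
```

SQL Server 2012 has LAG and windowed SUM with ORDER BY as well, so the same structure should carry over with minor syntax changes.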

You need to use the DATEDIFF function, like this:
DATEDIFF(d, TSPPLY_DT, NEXT_DT) AS DAYS_BTWN
Now you don't need GROUP BY.

Complex grouping - design / performance problem

WARNING : This is one BIG Question
I have a design problem that started simple, but in one step of growth has stumped me completely.
The simple version of reality has a nice flat fact table...
All names have been changed to protect the innocent
CREATE TABLE raw_data (
tier0_id INT, tier1_id INT, tier2_id INT, tier3_id INT,
metric0 INT, metric1 INT, metric2 INT, metric3 INT
)
The tierIDs relate to entities in a fixed depth tree. Such as a business hierarchy.
The metrics are just performance figures, such as number of frogs captured, or pigeons released.
In the reporting the kindly user would make selections to mean something like the following:
tier0_id's 34 and 55 - shown separately
all of tier1_id's - grouped together
all of tier2_id's - grouped together
all of tier3_id's - shown separately
metrics 2 and 3
This gives me the following type of query:
SELECT
    CASE WHEN @t0_grouping = 1 THEN NULL ELSE tier0_id END AS tier0_id,
    CASE WHEN @t1_grouping = 1 THEN NULL ELSE tier1_id END AS tier1_id,
    CASE WHEN @t2_grouping = 1 THEN NULL ELSE tier2_id END AS tier2_id,
    CASE WHEN @t3_grouping = 1 THEN NULL ELSE tier3_id END AS tier3_id,
    SUM(metric2) AS metric2, SUM(metric3) AS metric3
FROM
    raw_data
    INNER JOIN tier0_values ON tier0_values.id = raw_data.tier0_id OR tier0_values.id IS NULL
    INNER JOIN tier1_values ON tier1_values.id = raw_data.tier1_id OR tier1_values.id IS NULL
    INNER JOIN tier2_values ON tier2_values.id = raw_data.tier2_id OR tier2_values.id IS NULL
    INNER JOIN tier3_values ON tier3_values.id = raw_data.tier3_id OR tier3_values.id IS NULL
GROUP BY
    CASE WHEN @t0_grouping = 1 THEN NULL ELSE tier0_id END,
    CASE WHEN @t1_grouping = 1 THEN NULL ELSE tier1_id END,
    CASE WHEN @t2_grouping = 1 THEN NULL ELSE tier2_id END,
    CASE WHEN @t3_grouping = 1 THEN NULL ELSE tier3_id END
It's a nice hybrid of Dynamic SQL, and parametrised queries. And yes, I know, but SQL-CE makes people do strange things. Besides, that can be tidied up as and when the following change gets incorporated...
From now on, we need to be able to include NULLs in the different tiers. This will mean "applies to ALL entities in that tier".
For example, with the following very simplified data:
Activity WorkingTime ActiveTime BusyTime
1 0m 10m 0m
2 0m 15m 0m
3 0m 20m 0m
NULL 60m 0m 45m
WorkingTime never applies to an activity, so all the values go in with a NULL ID. But ActiveTime is specifically about a specific activity, so it goes in with a legitimate ID. BusyTime is also against a NULL activity because it's the accumulation of all the ActiveTime.
If one were to report on this data, the NULL values -always- get included in every row, because the NULL -means- "applies to everything". The data would look like...
Activity WorkingTime ActiveTime BusyTime (BusyOnOtherActivities)
1 60m 10m 45m (45-10 = 35m)
2 60m 15m 45m (45-15 = 30m)
3 60m 20m 45m (45-20 = 25m)
1&2 60m 25m 45m (45-25 = 20m)
1&3 60m 30m 45m (45-30 = 15m)
2&3 60m 35m 45m (45-35 = 10m)
ALL 60m 45m 45m (45-45 = 0m)
Hopefully this example makes sense, because it's actually a multi-tiered hierarchy (as per the original example), and in every tier NULLs are allowed. So I'll try an example with 3 tiers...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 0 10 0 0 0
1 4 10 | 0 15 0 0 0
1 5 10 | 0 20 0 0 0
1 NULL 10 | 60 0 45 0 0
2 3 10 | 0 5 0 0 0
2 5 10 | 0 10 0 0 0
2 6 10 | 0 15 0 0 0
2 NULL 10 | 50 0 30 0 0
1 3 11 | 0 7 0 0 0
1 4 11 | 0 8 0 0 0
1 5 11 | 0 9 0 0 0
1 NULL 11 | 30 0 24 0 0
2 3 11 | 0 8 0 0 0
2 5 11 | 0 10 0 0 0
2 6 11 | 0 12 0 0 0
2 NULL 11 | 40 0 30 0 0
NULL NULL 10 | 0 0 0 60 0
NULL NULL 11 | 0 0 0 60 0
NULL NULL NULL | 0 0 0 0 2
This would give many, many possible different output records in the reporting, but here are a few examples...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 60 10 45 60 2
1 4 10 | 60 15 45 60 2
1 5 10 | 60 20 45 60 2
2 3 10 | 50 5 30 60 2
2 5 10 | 50 10 30 60 2
2 6 10 | 50 15 30 60 2
1 ALL 10 | 60 45 45 60 2
2 ALL 10 | 50 30 30 60 2
ALL 3 10 | 110 15 75 60 2
ALL 4 10 | 60 15 45 60 2
ALL 5 10 | 110 30 75 60 2
ALL 6 10 | 50 15 30 60 2
ALL 3 ALL | 180 30 129 120 2
ALL 4 ALL | 90 23 69 120 2
ALL 5 ALL | 180 49 129 120 2
ALL 6 ALL | 90 27 60 120 2
ALL ALL 10 | 110 75 75 60 2
ALL ALL 11 | 70 54 54 60 2
ALL ALL ALL | 180 129 129 120 2
1 3&4 ALL | 90 40 69 120 2
ALL 3&4 ALL | 180 53 129 120 2
As messy as this is to explain, it makes complete and logical sense in my head. I understand what is being asked, but for the life of me I can not seem to write a query for this that doesn't take excruciating amounts of time to execute.
So, how would you write such a query, and/or refactor the schema?
I appreciate that people will ask for examples of what I've done so far, but I'm eager to hear other people's uncorrupted ideas and advice first ;)
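To pin down the semantics before worrying about speed, here is a small sketch of how one report row could be computed. It runs on SQLite through Python's sqlite3, and the function report_row is hypothetical. The rule it encodes is inferred from the example outputs, so treat it as an assumption: a row with NULL tiers counts toward a report row only when it covers at least one fully specified row inside the selection, and each row is counted once.

```python
import sqlite3

ROWS = [  # t0_id, t1_id, t2_id, m1..m5; None means "applies to all"
    (1, 3, 10, 0, 10, 0, 0, 0), (1, 4, 10, 0, 15, 0, 0, 0),
    (1, 5, 10, 0, 20, 0, 0, 0), (1, None, 10, 60, 0, 45, 0, 0),
    (2, 3, 10, 0, 5, 0, 0, 0), (2, 5, 10, 0, 10, 0, 0, 0),
    (2, 6, 10, 0, 15, 0, 0, 0), (2, None, 10, 50, 0, 30, 0, 0),
    (1, 3, 11, 0, 7, 0, 0, 0), (1, 4, 11, 0, 8, 0, 0, 0),
    (1, 5, 11, 0, 9, 0, 0, 0), (1, None, 11, 30, 0, 24, 0, 0),
    (2, 3, 11, 0, 8, 0, 0, 0), (2, 5, 11, 0, 10, 0, 0, 0),
    (2, 6, 11, 0, 12, 0, 0, 0), (2, None, 11, 40, 0, 30, 0, 0),
    (None, None, 10, 0, 0, 0, 60, 0), (None, None, 11, 0, 0, 0, 60, 0),
    (None, None, None, 0, 0, 0, 0, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (t0_id INT, t1_id INT, t2_id INT, "
             "m1 INT, m2 INT, m3 INT, m4 INT, m5 INT)")
conn.executemany("INSERT INTO raw_data VALUES (?,?,?,?,?,?,?,?)", ROWS)

TIERS = ("t0_id", "t1_id", "t2_id")

def report_row(t0=None, t1=None, t2=None):
    """Totals for one report row; each argument is a list of ids, or None for ALL."""
    filters, params = [], []
    for col, wanted in zip(TIERS, (t0, t1, t2)):
        # d is a fully specified detail row inside the user's selection.
        filters.append("d.{} IS NOT NULL".format(col))
        if wanted is not None:
            filters.append("d.{} IN ({})".format(col, ",".join("?" * len(wanted))))
            params.extend(wanted)
        # w covers d when each of w's tiers is NULL or equal to d's.
        filters.append("(w.{0} IS NULL OR w.{0} = d.{0})".format(col))
    sql = ("SELECT SUM(w.m1), SUM(w.m2), SUM(w.m3), SUM(w.m4), SUM(w.m5) "
           "FROM raw_data w WHERE EXISTS (SELECT 1 FROM raw_data d WHERE "
           + " AND ".join(filters) + ")")
    return conn.execute(sql, params).fetchone()

print(report_row(t1=[4], t2=[10]))   # the "ALL 4 10" row
print(report_row(t0=[1], t1=[3, 4])) # the "1 3&4 ALL" row
```

This says nothing about performance; it is only a reference definition that a faster query or a refactored schema could be checked against.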

The problem looks more like a normalization exercise. I would start by normalizing the table
to something like this (you may need some more identity fields depending on your usage):
CREATE TABLE raw_data (
    rawData_ID INT,
    Activity_id INT,
    metric0 INT)
I'd create a tiering table that looks something like this. (tierplan allows for multiple groupings. If a tier_id has no parent to roll up under, then tierparent_id is NULL. This allows for recursion in the query.)
CREATE TABLE tiers (
    tierplan_id INT,
    tier_id INT,
    tierparent_id INT)
Finally, I'd create a table that relates tiers and activities, something like:
CREATE TABLE ActivTiers (
    Activplan_id INT, --id on the table
    tierplan_id INT,  --tells what tierplan the raw_data falls under
    rawdata_id INT)   --this allows the ActivityId to be payload instead of identifier.
Queries off of this ought to be "not too difficult."
