SQL Column results with NULL Values - sql

I am having the following issue with my query.
I am trying to import data from multiple tables (Fact_Contact, Quali_Seg, etc.) into one table (Fact_Forecast). This is to predict how many individuals are eligible for a specific offer. The problem is that the column Date_ID, which is being pulled from Fact_Contact, contains NULL values after the import. I don't know where these NULL values are coming from, as the table Fact_Contact doesn't have any NULL values in the column DATE_ID.
This is the section of the query that has the problem:
DECLARE @lastDateID int
SELECT TOP 1 @lastDateID = date_id
FROM Fact_Contact
ORDER BY CREATE_DATE DESC
SELECT date_id, Offers.Segmentation_id, Offers.Offer_Code, Offers.Wave_no,
Offers.cadencevalue,
CASE
WHEN dailydata.activity_count IS NOT NULL THEN dailydata.activity_count
ELSE 0
END as "activity_count"
FROM (
SELECT s.Segmentation_id, s.Offer_Code, s.Wave_no, o.cadencevalue,
o.campaign_id, o.offer_desc
FROM Forecast_Model.dbo.Quali_Segment s
LEFT JOIN Forecast_Model.dbo.Dim_Offers o
ON s.offer_code = o.offer_code
) Offers
LEFT JOIN (
SELECT date_id, Offer_Code_1 Offer_Code,
segmentation_group_id, Count(indv_role_id) Activity_count
FROM Forecast_Model.dbo.Fact_Contact
WHERE date_id = @lastDateID
GROUP BY offer_code_1,segmentation_group_id,date_id
) DailyData
ON DailyData.offer_code = Offers.offer_code
AND Offers.Segmentation_id = dailydata.segmentation_group_id
ORDER BY Segmentation_id,Wave_no
As I mentioned, only 2 rows get a Date_ID value, which matches @lastDateID (2014-05-20); the rest are NULL.
Thank you,
Omar

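date_id will be NULL whenever there are rows in Offers (built from Quali_Segment) with no matching rows in Fact_Contact. The LEFT JOIN keeps every Offers row and fills the DailyData columns, including date_id, with NULL when nothing matches.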
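To see the effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The tables offers and fact_contact are simplified, made-up stand-ins for the real schema, and the sample values are invented. It shows the unmatched rows coming back with a NULL date_id, and one possible workaround (an assumption about what the asker wants): filling the gap with COALESCE and the last date id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the Offers subquery and Fact_Contact.
    CREATE TABLE offers (segmentation_id INTEGER, offer_code TEXT);
    CREATE TABLE fact_contact (date_id INTEGER, offer_code TEXT,
                               segmentation_group_id INTEGER, indv_role_id INTEGER);
    INSERT INTO offers VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO fact_contact VALUES (20140520, 'A', 1, 100), (20140520, 'A', 1, 101);
""")

last_date_id = 20140520

query = """
    SELECT o.segmentation_id,
           d.date_id                     AS raw_date_id,     -- NULL when no match
           COALESCE(d.date_id, :last)    AS filled_date_id,  -- never NULL
           COALESCE(d.activity_count, 0) AS activity_count
    FROM offers o
    LEFT JOIN (
        SELECT date_id, offer_code, segmentation_group_id,
               COUNT(indv_role_id) AS activity_count
        FROM fact_contact
        WHERE date_id = :last
        GROUP BY offer_code, segmentation_group_id, date_id
    ) d
      ON d.offer_code = o.offer_code
     AND d.segmentation_group_id = o.segmentation_id
    ORDER BY o.segmentation_id
"""
rows = conn.execute(query, {"last": last_date_id}).fetchall()
for row in rows:
    print(row)
```

Only the offer with matching contact rows gets a real date_id. The other two offers survive the LEFT JOIN with NULL in raw_date_id, while filled_date_id carries the last date id on every row.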

Related

How to join tables in SQL?

I have two tables named shoes_type and shoes_list. The shoes_type table includes shoes_id, shoes_size, shoes_type, date, project_id. Meanwhile, the shoes_list table has shoes_quantity, shoes_id, shoes_color, date, project_id.
I need to get the sum of shoes_quantity based on the shoes_type, shoes_size, date, and also project_id.
I get how to sum the shoes_quantity based on color by doing:
select shoes_color, sum(shoes_quantity)
from shoes_list group by shoes_color
Basically what I want to see is the total quantity of shoes based on the type, size, date and project_id. The type and size information are available on shoes_type table, while the quantity is coming from the shoes_list table. I expect to see something like:
shoes_type  shoes_size  total quantity  date      project_id
heels       5           3               19/10/02  1
sneakers    5           3               19/10/02  1
sneakers    6           1               19/10/05  1
heels       7           5               19/10/03  1
While for the desired result, I have tried:
select shoes_type, shoes_size, date, project_id, sum(shoes_quantity)
from shoes_type st
join shoes_list sl
on st.project_id = sl.project_id
and st.shoes_id = sl.shoes_id
and st.date = sl.date
group by shoes_type, shoes_size, date, project_id
Unfortunately, I got an error that says that the column reference "date" is ambiguous.
How should I fix this?
Thank you.
The date column exists in both tables, so you have to specify which table to select it from. Since the query aliases the tables, replace date with st.date or sl.date.
Qualify all column references to remove the "ambiguous" column error:
select st.shoes_type, st.shoes_size, st.date, st.project_id, sum(sl.shoes_quantity)
from shoes_type st join
shoes_list sl
on st.project_id = sl.project_id and
st.shoes_id = sl.shoes_id and
st.date = sl.date
group by st.shoes_type, st.shoes_size, st.date, st.project_id;
If you want all columns from shoes_type, you might find that a correlated subquery is faster:
select st.*,
(select sum(sl.shoes_quantity)
from shoes_list sl
where st.project_id = sl.project_id and
st.shoes_id = sl.shoes_id and
st.date = sl.date
)
from shoes_type st;
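As a rough illustration of the error and the fix, the sketch below reproduces both in SQLite through Python's sqlite3 module. The sample rows are invented, and SQLite's error text differs from the one quoted in the question, so treat this as a demonstration of the idea rather than of any particular engine's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoes_type (shoes_id INTEGER, shoes_size INTEGER, shoes_type TEXT,
                             date TEXT, project_id INTEGER);
    CREATE TABLE shoes_list (shoes_quantity INTEGER, shoes_id INTEGER, shoes_color TEXT,
                             date TEXT, project_id INTEGER);
    -- Invented sample data.
    INSERT INTO shoes_type VALUES (1, 5, 'heels',    '2019-10-02', 1),
                                  (2, 6, 'sneakers', '2019-10-05', 1);
    INSERT INTO shoes_list VALUES (2, 1, 'red',   '2019-10-02', 1),
                                  (1, 1, 'blue',  '2019-10-02', 1),
                                  (1, 2, 'white', '2019-10-05', 1);
""")

join_on = """
    FROM shoes_type st
    JOIN shoes_list sl
      ON st.project_id = sl.project_id
     AND st.shoes_id = sl.shoes_id
     AND st.date = sl.date
"""

# Unqualified "date" exists in both tables, so the engine cannot resolve it.
try:
    conn.execute("SELECT shoes_type, date, SUM(shoes_quantity) " + join_on +
                 " GROUP BY shoes_type, date")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Qualifying every column with its table alias removes the ambiguity.
totals = conn.execute(
    "SELECT st.shoes_type, st.shoes_size, st.date, st.project_id, "
    "SUM(sl.shoes_quantity) " + join_on +
    " GROUP BY st.shoes_type, st.shoes_size, st.date, st.project_id "
    "ORDER BY st.shoes_type"
).fetchall()
print(totals)
```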

Using a stored procedure in Teradata to build a summary history table

I am using Teradata SQL Assistant connected to an enterprise DW. I have written the query below to show an inventory of outstanding items as of a specific point in time. The table referenced loads and stores new records as changes are made to their state by load date (and does not delete historical records). The output of my query is 1 row for the specified date. Can I create a stored procedure or recursive query of some sort to build a history of these summary rows (with 1 new row per day)? I have not used such functions in the past; links to pertinent previously answered questions or suggestions on how I could get on the right track in researching other possible solutions are totally fine if applicable; just trying to bridge this gap in my knowledge.
SELECT
'2017-10-02' as Dt
,COUNT(DISTINCT A.RECORD_NBR) as Pending_Records
,SUM(A.PAY_AMT) AS Total_Pending_Payments
FROM DB.RECORD_HISTORY A
INNER JOIN
(SELECT MAX(LOAD_DT) AS LOAD_DT
,RECORD_NBR
FROM DB.RECORD_HISTORY
WHERE LOAD_DT <= '2017-10-02'
GROUP BY RECORD_NBR
) B
ON A.RECORD_NBR = B.RECORD_NBR
AND A.LOAD_DT = B.LOAD_DT
WHERE
A.RECORD_ORDER =1 AND Final_DT Is Null
GROUP BY Dt
ORDER BY 1 desc
Here is my interpretation of your query:
For the most recent load_dt (up until 2017-10-02) for record_order #1,
return
1) the number of different pending records
2) the total amount of pending payments
Is this correct? If you're looking for this info, but one row for each "Load_Dt", you just need to remove that INNER JOIN:
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
ORDER BY 1 DESC
If you want to get the summary info per record_order, just add record_order as a grouping column:
SELECT
load_Dt,
record_order,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE final_Dt IS NULL
GROUP BY load_Dt, record_order
ORDER BY 1,2 DESC
If you want to get one row per day (if there are calendar days with no corresponding "load_dt" days), then you can SELECT from the sys_calendar.calendar view and LEFT JOIN the query above on the "load_dt" field:
SELECT cal.calendar_date, src.Pending_Records, src.Total_Pending_Payments
FROM sys_calendar.calendar cal
LEFT JOIN (
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
) src ON cal.calendar_date = src.load_Dt
WHERE cal.calendar_date BETWEEN <start_date> AND <end_date>
ORDER BY 1 DESC
I don't have access to a TD system, so you may get syntax errors. Let me know if that works or you're looking for something else.
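For readers without a Teradata system either, the calendar LEFT JOIN can be tried out in SQLite through Python's sqlite3 module. This is only a sketch under assumptions: SQLite has no sys_calendar.calendar view, so a recursive CTE named cal stands in for it, and the record_history rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented, simplified stand-in for DB.RECORD_HISTORY.
    CREATE TABLE record_history (record_nbr INTEGER, load_dt TEXT, pay_amt REAL,
                                 record_order INTEGER, final_dt TEXT);
    INSERT INTO record_history VALUES
        (1, '2017-10-01', 10.0, 1, NULL),
        (2, '2017-10-01',  5.0, 1, NULL),
        (3, '2017-10-03',  7.5, 1, NULL),
        (4, '2017-10-03',  2.0, 1, '2017-10-03');
""")

query = """
    WITH RECURSIVE cal(calendar_date) AS (
        -- Stand-in for sys_calendar.calendar: one row per day in the range.
        SELECT :start
        UNION ALL
        SELECT date(calendar_date, '+1 day') FROM cal WHERE calendar_date < :end
    )
    SELECT cal.calendar_date, src.pending_records, src.total_pending_payments
    FROM cal
    LEFT JOIN (
        SELECT load_dt,
               COUNT(DISTINCT record_nbr) AS pending_records,
               SUM(pay_amt)               AS total_pending_payments
        FROM record_history
        WHERE record_order = 1 AND final_dt IS NULL
        GROUP BY load_dt
    ) src ON cal.calendar_date = src.load_dt
    ORDER BY cal.calendar_date
"""
daily = conn.execute(query, {"start": "2017-10-01", "end": "2017-10-03"}).fetchall()
for row in daily:
    print(row)
```

Days with no load (2017-10-02 in this sample) still appear, with NULL in the summary columns. Note that this counts rows by the day they were loaded; it does not carry a record's latest state forward to later days.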

Grouping records on consecutive dates

If I have following table in Postgres:
order_dtls
Order_id Order_date Customer_name
-------------------------------------
1 11/09/17 Xyz
2 15/09/17 Lmn
3 12/09/17 Xyz
4 18/09/17 Abc
5 15/09/17 Xyz
6 25/09/17 Lmn
7 19/09/17 Abc
I want to retrieve such customer who has placed orders on 2 consecutive days.
In above case Xyz and Abc customers should be returned by query as result.
There are many ways to do this. An EXISTS semi-join followed by DISTINCT or GROUP BY should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELECT 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - index items in this order.
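The same semi-join can be checked against the sample data from the question. The sketch below uses SQLite via Python's sqlite3 module rather than Postgres, so two things are assumptions of the sketch: the dates are stored as ISO text, and date(x, '+1 day') replaces the Postgres order_date + 1. The index name is also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_dtls (order_id INTEGER PRIMARY KEY, order_date TEXT,
                             customer_name TEXT);
    INSERT INTO order_dtls VALUES
        (1, '2017-09-11', 'Xyz'), (2, '2017-09-15', 'Lmn'), (3, '2017-09-12', 'Xyz'),
        (4, '2017-09-18', 'Abc'), (5, '2017-09-15', 'Xyz'), (6, '2017-09-25', 'Lmn'),
        (7, '2017-09-19', 'Abc');
    -- Index columns in this order: customer first, then date.
    CREATE INDEX order_dtls_customer_date ON order_dtls (customer_name, order_date);
""")

customers = [name for (name,) in conn.execute("""
    SELECT DISTINCT o.customer_name
    FROM order_dtls o
    WHERE EXISTS (
        SELECT 1
        FROM order_dtls n
        WHERE n.customer_name = o.customer_name
          AND n.order_date = date(o.order_date, '+1 day')  -- order on the next day
    )
    ORDER BY o.customer_name
""")]
print(customers)  # ['Abc', 'Xyz']
```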
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). That would be short for DISTINCT ROW(customer_name) - a row constructor unrelated to DISTINCT - and just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type - an anonymous record actually, since no row type is given. Most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?
Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1
The way I would do it is to join the table with itself on the next date and on Customer_name.
This way you can ensure that the same customer placed orders on 2 consecutive days.
For MySQL:
SELECT DISTINCT t1.Customer_name
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
Select the result with the DISTINCT keyword to ensure the same customer is not displayed more than once.
For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)
This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

Finding the number of concurrent days two events happen over the course of time using a calendar table

I have a table with a structure
(rx)
clmID int
patid int
drugclass char(3)
drugName char(25)
fillDate date
scriptEndDate date
strength int
And a query
;with PatientDrugList(patid, filldate,scriptEndDate,drugClass,strength)
as
(
select rx.patid,rx.fillDate,rx.scriptEndDate,rx.drugClass,rx.strength
from rx
)
,
DrugList(drugName)
as
(
select x.drugClass
from (values('h3a'),('h6h'))
as x(drugClass)
where x.drugClass is not null
)
SELECT PD.patid, C.calendarDate AS overlap_date
FROM PatientDrugList AS PD, Calendar AS C
WHERE drugClass IN ('h3a','h6h')
AND calendardate BETWEEN filldate AND scriptenddate
GROUP BY PD.patid, C.CalendarDate
HAVING COUNT(DISTINCT drugClass) = 2
order by pd.patid,c.calendarDate
The Calendar is simply a calendar table with all possible dates throughout the length of the study with no other columns.
My query returns data that looks like
The overlap_date represents every day that a person was prescribed a drug in the two classes listed after the PatientDrugList CTE.
I would like to find the number of consecutive days that each person was prescribed both families of drugs. I can't use a simple max and min aggregate because that wouldn't tell me if someone stopped this regimen and then started again. What is an efficient way to find this out?
EDIT: The row constructor in the DrugList CTE should be a parameter for a stored procedure and was amended for the purposes of this example.
You are looking for consecutive sequences of dates. The key observation is that if you subtract a sequence from the dates, you'll get a constant date. This defines a group of dates all in sequence, which can then be grouped.
select patid
,MIN(overlap_date) as start_overlap
,MAX(overlap_date) as end_overlap
from(select cte.*,(dateadd(day,-row_number() over(partition by patid order by overlap_Date),overlap_date)) as groupDate
from cte
)t
group by patid, groupDate
This code is untested, so it might have some typos.
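A small runnable sketch of the subtract-a-sequence idea, using SQLite through Python's sqlite3 module instead of SQL Server. The overlaps table and its rows are invented stand-ins for the output of the asker's query, and julianday(...) minus ROW_NUMBER() plays the role of the date subtraction. A COUNT(*) per group is added to give the length of each run, which is what the question asks for. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented stand-in for the (patid, overlap_date) rows the query returns.
    CREATE TABLE overlaps (patid INTEGER, overlap_date TEXT);
    INSERT INTO overlaps VALUES
        (1, '2013-01-01'), (1, '2013-01-02'), (1, '2013-01-03'),
        (1, '2013-01-10'), (1, '2013-01-11'),
        (2, '2013-02-05');
""")

runs = conn.execute("""
    SELECT patid,
           MIN(overlap_date) AS start_overlap,
           MAX(overlap_date) AS end_overlap,
           COUNT(*)          AS consecutive_days
    FROM (
        SELECT patid, overlap_date,
               -- Day number minus row number is constant within a consecutive run.
               julianday(overlap_date)
                 - ROW_NUMBER() OVER (PARTITION BY patid ORDER BY overlap_date) AS grp
        FROM overlaps
    ) AS t
    GROUP BY patid, grp
    ORDER BY patid, start_overlap
""").fetchall()
for run in runs:
    print(run)
```

Patient 1 stops after three days and starts again a week later, so two separate runs come back instead of one min/max span.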
You need to pivot on something, and max and min aggregates can work that out. Can you state, per date, whether someone had both drugs? Then you would be limiting by date, if I understand your question correctly.
Example SQL:
declare @Temp table ( person varchar(8), dt date, drug varchar(8));
insert into @Temp values ('Brett','1-1-2013', 'h3a'),('Brett', '1-1-2013', 'h6h'),('Brett','1-2-2013', 'h3a'),('Brett', '1-2-2013', 'h6h'),('Joe', '1-1-2013', 'H3a'),('Joe', '1-2-2013', 'h6h');
with a as
(
select
person
, dt
, max(case when drug = 'h3a' then 1 else 0 end) as h3a
, max(case when drug = 'h6h' then 1 else 0 end) as h6h
from @Temp
group by person, dt
)
, b as
(
select *, case when h3a = 1 and h6h = 1 then 1 end as Logic
from a
)
select person, count(Logic) as DaysOnBothPrescriptions
from b
group by person

Remove duplicates (1 to many) or write a subquery that solves my problem

Referring to the diagram below the records table has unique Records. Each record is updated, via comments through an Update Table. When I join the two I get lots of duplicates.
How to remove duplicates? Group By does not work for me as I have more than 10 fields in select query and some of them are functions.
Write a sub query which pulls the last updates in the Update table for each record that is updated in a particular month. Joining with this sub query will solve my problem.
Thanks!
Edit
Table structure that is of interest is
create table Records(
recordID int,
90more_fields various
)
create table Updates(
update_id int,
record_id int,
comment text,
byUser varchar(25),
datecreate datetime
)
Here's one way.
SELECT * /*But list columns explicitly*/
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
FROM Blue b
WHERE b.datecreate >= '20110901'
AND b.datecreate < '20111001'
AND o.RecordID = b.Record_ID2
ORDER BY b.datecreate DESC) b
Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNUM
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1
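The ROW_NUMBER approach can be tried end to end with a sketch like the one below, run in SQLite via Python's sqlite3 module. The table and column names follow the structure posted in the question (records, updates, record_id, datecreate) rather than the diagram names, the title column and all rows are invented, and the September 2011 date range is just an example. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (record_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE updates (update_id INTEGER PRIMARY KEY, record_id INTEGER,
                          comment TEXT, datecreate TEXT);
    -- Invented sample data.
    INSERT INTO records VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO updates VALUES
        (10, 1, 'opened',   '2011-09-02 09:00:00'),
        (11, 1, 'reviewed', '2011-09-20 14:30:00'),
        (12, 2, 'opened',   '2011-09-05 08:15:00'),
        (13, 2, 'closed',   '2011-10-01 10:00:00'),
        (14, 3, 'opened',   '2011-08-30 12:00:00');
""")

latest = conn.execute("""
    WITH last_update AS (
        SELECT record_id, comment, datecreate,
               -- Number each record's updates, newest first, within the month.
               ROW_NUMBER() OVER (PARTITION BY record_id
                                  ORDER BY datecreate DESC) AS row_num
        FROM updates
        WHERE datecreate >= '2011-09-01' AND datecreate < '2011-10-01'
    )
    SELECT r.record_id, r.title, lu.comment, lu.datecreate
    FROM records r
    JOIN last_update lu ON lu.record_id = r.record_id
    WHERE lu.row_num = 1          -- keep only the newest update per record
    ORDER BY r.record_id
""").fetchall()
for row in latest:
    print(row)
```

Each record that was updated in the month appears exactly once, paired with its newest update in that month; record 3 has no September update and is left out by the inner join.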
Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)