Extract non-existent values based on previous months? - sql

I have a table tb1:
code Name sal_month
==== ===== ========
101 john 02/2017
102 mathe 02/2017
103 yara 02/2017
104 sara 02/2017
101 john 03/2017
102 mathe 03/2017
103 yara 03/2017
104 sara 03/2017
101 john 04/2017
103 yara 04/2017
In February and in March all of them received salaries.
How do I extract the values that are missing in a month, based on the previous months?
The result should be:
code sal_month
==== =======
102 04/2017
104 04/2017
Thanks in advance

First I created this table:
create table #T(code int, sal_month varchar(10))
insert into #T values(101,'2/2017'),(102,'2/2017'),(103,'2/2017'),(104,'2/2017'),
(101,'3/2017'),(102,'3/2017'),(104,'3/2017'),(101,'4/2017'),(103,'4/2017')
Second, I executed this query:
SELECT code, Max(sal_Month)
From #T
Where code not in (select code from #T where sal_Month = (select Max(sal_Month) from #T))
Group by code
Then I got the following results:
Note: I am using SQL SERVER 2012

I think you can count sal_month grouped by code, something like this, and select the codes that appear fewer than 3 times.
select code, count (sal_month) from tb1
group by code
having count (sal_month) < 3
After that you join with initial table (just to filter the full rows which you need) on code.
So the final query will look like this:
select a.code, a.sal_month
from tb1 a
join (select code, count(sal_month) as month_count from tb1
group by code
having count(sal_month) < 3) X on a.code = X.code

Something like this:
DECLARE @DataSource TABLE
(
[code] INT
,[sal_month] VARCHAR(12)
);
INSERT @DataSource ([code], [sal_month])
VALUES (101, '2/2017')
,(102, '2/2017')
,(103, '2/2017')
,(104, '2/2017')
,(101, '3/2017')
,(102, '3/2017')
,(104, '3/2017')
,(101, '4/2017')
,(103, '4/2017');
WITH DataSource AS
(
SELECT *
,DENSE_RANK() OVER (ORDER BY [sal_month]) AS [MonthID]
,MAX([sal_month]) OVER () AS [MaxMonth]
FROM @DataSource DS1
)
SELECT DS1.[code]
,DS1.[sal_month]
FROM DataSource DS1
LEFT JOIN DataSource DS2
ON DS1.[code] = DS2.[code]
AND DS1.[MonthID] = DS2.[MonthID] - 1
LEFT JOIN DataSource DS3
ON DS1.[code] = DS3.[code]
AND DS1.[MonthID] = DS3.[MonthID] + 1
WHERE DS2.[code] IS NULL
AND DS3.[code] IS NOT NULL
AND DS1.[sal_month] <> DS1.[MaxMonth];
Some notes:
We need a way to sort the months, and that is not easy because you are storing them in a very impractical way: the column is not a date/datetime, and the string is not a valid date. Sorting the strings also breaks as soon as [sal_month] contains values from different years. You should think about this; one alternative is to use this format:
201512
201701
201702
201711
In this way we can sort by string.
At the core I am using DENSE_RANK and sorting the months as strings.
The idea is to look for all records that do not exist in the next month but have a record in the previous one; at the same time, records from the last month are excluded, as it is not possible for them to have a record in the next month.
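To illustrate the sorting note above, here is a minimal Python sketch of why 'M/YYYY' strings sort incorrectly across years and how a zero-padded YYYYMM key fixes it. The helper name to_sortable and the sample values are made up for illustration; they are not part of the original table.

```python
def to_sortable(sal_month: str) -> str:
    """Convert 'M/YYYY' or 'MM/YYYY' into a zero-padded 'YYYYMM' string."""
    month, year = sal_month.split("/")
    return f"{int(year):04d}{int(month):02d}"


# Hypothetical sample values spanning several years.
months = ["2/2017", "11/2016", "12/2015", "1/2017"]

# Plain string sorting compares character by character, so the years are ignored.
naive_order = sorted(months)

# Sorting on the YYYYMM key gives chronological order.
chronological_order = sorted(months, key=to_sortable)

print(naive_order)
print(chronological_order)
```

The same conversion could be applied once when loading the data, so that the stored column is directly sortable.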

Try this:
select tb2.code, tb2.sal_month from tb
right join (
select code, sal_month, datepart(month, sal_month) + 1 as next_sal_month from tb) as tb2
on (tb.code = tb2.code and datepart(month, tb.sal_month) = tb2.next_sal_month)
where tb2.next_sal_month < 5 and tb.sal_month is null
In the result set there's one additional record: code 103 didn't receive salary in March, but did so in February, so it is included as well.
Here's SQL fiddle, to try :)

In the absence of more facts about your tables, create a cartesian product of the 2 axes of month & code, then left join the stored data. Then it is easy to identify missing items when no stored data exists when compared to every possible combination.
You might already have master tables of sal_month and/or code to use, if you do use those, but if not you can dynamically create them using select distinct as seen below.
create table tbl1 (code int, sal_month varchar(10))
insert into tbl1 values(101,'2/2017'),(102,'2/2017'),(103,'2/2017'),(104,'2/2017'),
(101,'3/2017'),(102,'3/2017'),(104,'3/2017'),(101,'4/2017'),(103,'4/2017')
select c.code, m.sal_month
from ( select distinct sal_month from tbl1 ) m
cross join ( select distinct code from tbl1 ) c
Left join tbl1 t on m.sal_month = t.sal_month and c.code = t.code
Where t.sal_month IS NULL
code | sal_month
---: | :--------
103 | 3/2017
102 | 4/2017
104 | 4/2017
dbfiddle here

Related

Self join query using SQL - demo query attached

I have a scenario where I need my query to pull out data based on Code value.
Example: The table #temp1 has the combination of Person ID, Program assignments and dataset. For any person '1001', I want to pull out the admission date of first program and discharge date and dataset of the last program under similar code 'PS'.
So, my desired output is:
Demo code:
https://rextester.com/ADDL95491
Any help?!
It seems to me you need the query below:
with cte1 as
(select cid,code,admissiondate,dischargedate,program
from #temp1 t1 where
t1.row_number = (select min(row_number) from #temp1)
) , cte2 as
(select * from #temp1 where dataset is not null
)
select cte1.cid,cte1.code,cte1.admissiondate,
cte2.dischargedate,cte2.dataset
from cte1 left join cte2 on cte1.code=cte2.code
https://rextester.com/BNVK71028
cid code admissiondate dischargedate dataset
1 1001 PR 01/01/2011 5/1/2011 discharge data
2 1001 PS 06/01/2011 7/1/2011 discharge data
3 1001 PQ 08/01/2011

Aggregate column text where dates in table a are between dates in table b

Sample data
CREATE TEMP TABLE a AS
SELECT id, adate::date, name
FROM ( VALUES
(1,'1/1/1900','test'),
(1,'3/1/1900','testing'),
(1,'4/1/1900','testinganother'),
(1,'6/1/1900','superbtest'),
(2,'1/1/1900','thebesttest'),
(2,'3/1/1900','suchtest'),
(2,'4/1/1900','test2'),
(2,'6/1/1900','test3'),
(2,'7/1/1900','test4')
) AS t(id,adate,name);
CREATE TEMP TABLE b AS
SELECT id, bdate::date, score
FROM ( VALUES
(1,'12/31/1899', 7 ),
(1,'4/1/1900' , 45),
(2,'12/31/1899', 19),
(2,'5/1/1900' , 29),
(2,'8/1/1900' , 14)
) AS t(id,bdate,score);
What I want
What I need to do is aggregate column text from table a where the id matches table b and the date from table a is between the two closest dates from table b. Desired output:
id date score textagg
1 12/31/1899 7 test, testing
1 4/1/1900 45 testinganother, superbtest
2 12/31/1899 19 thebesttest, suchtest, test2
2 5/1/1900 29 test3, test4
2 8/1/1900 14
My thoughts are to do something like this:
create table date_join
select a.id, string_agg(a.text, ','), b.*
from tablea a
left join tableb b
on a.id = b.id
*having a.date between b.date and b.date*;
but I am really struggling with the last line, figuring out how to aggregate only where the date in table a is between the closest two dates in table b. Any guidance is much appreciated.
I can't promise it's the best way to do it, but this is a way to do it.
with b_values as (
select
id, date as from_date, score,
lead (date, 1, '3000-01-01')
over (partition by id order by date) - 1 as thru_date
from b
)
select
bv.id, bv.from_date, bv.score,
string_agg (a.text, ',')
from
b_values as bv
left join a on
a.id = bv.id and
a.date between bv.from_date and bv.thru_date
group by
bv.id, bv.from_date, bv.score
order by
bv.id, bv.from_date
I'm presupposing you will never have a date in your table greater than 12/31/2999, so if you're still running this query after that date, please accept my apologies.
Here is the output I got when I ran this:
id from_date score string_agg
1 0 7 test,testing
1 92 45 testinganother,superbtest
2 0 19 thebesttest,suchtest,test2
2 122 29 test3,test4
2 214 14
I might also note that BETWEEN in a join is a performance killer. If you have large data volumes, there might be better ideas on how to approach this, but that depends largely on what your actual data looks like.

Selecting multiple unique matches from one column that match another column

I have a list of codes (101, 102, 103, 104) and I want to pick out the people in the following table that have two or more different codes from the list occurring within a year of each other.
Name Code1 Code1date
John 101 01/01/2016
John 102 01/02/2013
Chris 101 01/01/2015
Chris 101 01/05/2014
Chris 102 01/10/2015
Mark 101 01/11/2011
Mark 101 01/01/2011
Mark 107 01/07/2012
So in this sample only Chris would be selected because he has a 101 code and a 102 code within a year of each other.
Thanks!
Try with the below query if you are using SQL Server.
SELECT name
FROM yourtable
WHERE code1 in (101, 102, 103, 104)
GROUP BY name, year(code1date)
HAVING COUNT(distinct code1) > 1
Version 1: Same year: happened in the same year
You need to make a group for each name and year and only show those distinct names that have more than 1 unique code:
select distinct name
from sample_table
where code1 between 101 and 104
group by name, extract(year from code1date)
having count(distinct code1) > 1
This will result in Chris only being presented in output.
EXTRACT function is ANSI-SQL compliant, but it will work assuming that code1date is of date type
In case it is of text data type, you could get 4 characters from the right, so for example right(code1date, 4)
Version 2: Within one year: scan backwards and forwards for a one-year difference
If by one year you mean not the same year, but scanning backwards and onwards from a date for 1 year difference, then here's the solution for Postgres:
SELECT
a.name
FROM sample_table a
JOIN sample_table b ON
a.name = b.name
AND a.code1 <> b.code1
AND b.code1date BETWEEN a.code1date - interval '1 year' AND a.code1date + interval '1 year'
WHERE a.code1 BETWEEN 101 AND 104
GROUP BY a.name
HAVING COUNT(DISTINCT a.code1) > 1
Above also assumes that your code1date is of date type. If that's not the case then you should think about converting it to a proper format. If that's beyond your reach, then you could always get the last character from your column, cast to Integer, increment it and append it back to the substring without the last char thus replacing the value of year :-)
SELECT
t1.Name
FROM
TableName t1
INNER JOIN TableName t2
ON t1.Name = t2.Name
AND t2.Code1Date BETWEEN DATEADD(year,-1,t1.Code1Date) AND DATEADD(year,1,t1.Code1Date)
AND t1.Code1 <> t2.Code1
AND t2.Code1 IN (101,102,103,104)
WHERE
t1.Code1 IN (101,102,103,104)
GROUP BY
t1.Name
HAVING
COUNT(DISTINCT t1.Code1) > 1
So this is for SQL Server, but just change the BETWEEN condition to use the date functions of whatever RDBMS you are working with.

SQL Server 2008 - need help on a antithetical query

I want to find out meter reading for given transaction day. In some cases there won’t be any meter reading and would like to see a meter reading for previous day.
Sample data set follows. I am using SQL Server 2008
declare @meter table (UnitID int, reading_Date date,reading int)
declare @Transactions table (Transactions_ID int,UnitID int,Transactions_date date)
insert into @meter (UnitID,reading_Date,reading ) values
(1,'1/1/2014',1000),
(1,'2/1/2014',1010),
(1,'3/1/2014',1020),
(2,'1/1/2014',1001),
(3,'1/1/2014',1002);
insert into @Transactions(Transactions_ID,UnitID,Transactions_date) values
(1,1,'1/1/2014'),
(2,1,'2/1/2014'),
(3,1,'3/1/2014'),
(4,1,'4/1/2014'),
(5,2,'1/1/2014'),
(6,2,'3/1/2014'),
(7,3,'4/1/2014');
select * from @meter;
select * from @Transactions;
I expect to get following output
Transactions
Transactions_ID UnitID Transactions_date reading
1 1 1/1/2014 1000
2 1 2/1/2014 1010
3 1 3/1/2014 1020
4 1 4/1/2014 1020
5 2 1/1/2014 1001
6 2 3/1/2014 1001
7 3 4/1/2014 1002
The SQL query to get your desired output is as follows:
SELECT Transactions_ID, T.UnitID, Transactions_date
, (CASE WHEN ISNULL(M.reading,'') = '' THEN
(
SELECT MAX(Reading) FROM @meter AS A
JOIN @Transactions AS B ON A.UnitID=B.UnitID AND A.UnitID=T.UnitID
)
ELSE M.reading END) AS Reading
FROM @meter AS M
RIGHT OUTER JOIN @Transactions AS T ON T.UnitID=M.UnitID
AND T.Transactions_date=M.reading_Date
I can think of two ways to approach this; neither of them is ideal.
The first (and slightly better) way would be to create a SQL Function that took the Transactions_date as a parameter and returned the reading for Max(Reading_date) where reading_date <= transactions_date. You could then use this function in a select statement against the Transactions table.
The other approach would be to use a cursor to iterate through the transactions table and use the same logic as above where you return the reading for Max(Reading_date) where reading_date <= transactions_date.
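The lookup described above (the reading for Max(Reading_date) where reading_date <= transactions_date) can also be written as a correlated subquery. Below is a rough, runnable sketch using Python's sqlite3 module rather than SQL Server. The table and column names come from the question; the ISO date strings are an assumption made so that the text comparison orders correctly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meter (UnitID INTEGER, reading_Date TEXT, reading INTEGER);
CREATE TABLE Transactions (Transactions_ID INTEGER, UnitID INTEGER, Transactions_date TEXT);
-- Assumption: dates rewritten as ISO strings so they sort chronologically as text.
INSERT INTO meter VALUES
  (1, '2014-01-01', 1000), (1, '2014-01-02', 1010), (1, '2014-01-03', 1020),
  (2, '2014-01-01', 1001), (3, '2014-01-01', 1002);
INSERT INTO Transactions VALUES
  (1, 1, '2014-01-01'), (2, 1, '2014-01-02'), (3, 1, '2014-01-03'), (4, 1, '2014-01-04'),
  (5, 2, '2014-01-01'), (6, 2, '2014-01-03'), (7, 3, '2014-01-04');
""")

# For each transaction, take the latest reading of the same unit
# dated on or before the transaction date.
query = """
SELECT t.Transactions_ID, t.UnitID, t.Transactions_date,
       (SELECT m.reading
        FROM meter m
        WHERE m.UnitID = t.UnitID
          AND m.reading_Date <= t.Transactions_date
        ORDER BY m.reading_Date DESC
        LIMIT 1) AS reading
FROM Transactions t
ORDER BY t.Transactions_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike picking MAX(reading), this does not rely on readings always increasing over time. In SQL Server the subquery would use SELECT TOP 1 ... ORDER BY reading_Date DESC in place of LIMIT 1.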
Try the below query:
Please find the result of the same in SQLFiddle
select a.Transactions_ID, a.UnitID, a.Transactions_date,
case when b.reading IS NULL then c.rd else b.reading end as reading
from
Transactions a
left outer join
meter b
on a.UnitID = b.UnitID
and a.Transactions_date = b.reading_Date
inner join
(
select UnitID,max(reading) as rd
from meter
group by UnitID
) as C
on a.UnitID = c.UnitID

How can I SELECT distinct data based on a date field?

I have table that stores a log of changes to objects in another table. Here are my table contents:
ObjID Color Date User
------- ------- ------------------------ --------
1 Red 2010-01-01 12:22:00.000 Joe
1 Blue 2010-01-02 15:22:00.000 Jill
1 Green 2010-01-03 16:22:00.000 Joe
1 White 2010-01-10 09:22:00.000 Mike
2 Red 2010-01-09 10:22:00.000 Mike
2 Blue 2010-01-12 09:22:00.000 Jill
2 Orange 2010-01-12 15:22:00.000 Joe
I want to select the most recent date for each Object, as well as the Color and User on the date of that record.
Basically, I want this result set:
ObjID Color Date User
------- ------- ------------------------ --------
1 White 2010-01-10 09:22:00.000 Mike
2 Orange 2010-01-12 15:22:00.000 Joe
I'm having trouble wrapping my head around the SQL query I need to write to get this data...
I am retrieving data via ODBC from an iSeries DB2 database (AS/400).
Hey there, I think you want the following (where ColorTable is your table name):
SELECT Color.*
FROM ColorTable as Color
INNER JOIN
(
SELECT ObjID, MAX(Date) as Date
FROM ColorTable
GROUP BY ObjID
) as MaxDateByColor
ON Color.ObjID = MaxDateByColor.ObjID
AND Color.Date = MaxDateByColor.Date
Assuming at least SQL Server 2005
DECLARE @T TABLE (ObjID INT,Color VARCHAR(10),[Date] DATETIME,[User] VARCHAR(50))
INSERT INTO @T
SELECT 1,'Red',' 2010-01-01 12:22:00.000','Joe' UNION ALL
SELECT 1,'Blue','2010-01-02 15:22:00.000','Jill' UNION ALL
SELECT 1,'Green',' 2010-01-03 16:22:00.000','Joe' UNION ALL
SELECT 1,'White',' 2010-01-10 09:22:00.000','Mike' UNION ALL
SELECT 2,'Red',' 2010-01-09 10:22:00.000','Mike' UNION ALL
SELECT 2,'Blue','2010-01-12 09:22:00.000','Jill' UNION ALL
SELECT 2,'Orange','2010-01-12 15:22:00.000','Joe'
;WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY Date DESC) AS RN
FROM @T
)
SELECT ObjID,
Color,
[Date],
[User]
FROM T
WHERE RN=1
Or a SQL Server 2000 method from the article linked to in the comments
SELECT ObjID,
CAST(SUBSTRING(string, 24, 33) AS VARCHAR(10)) AS Color,
CAST(SUBSTRING(string, 1, 23) AS DATETIME ) AS [Date],
CAST(SUBSTRING(string, 34, 83) AS VARCHAR(50)) AS [User]
FROM
(
SELECT ObjID,
MAX((CONVERT(CHAR(23), [Date], 126)
+ CAST(Color AS CHAR(10))
+ CAST([User] AS CHAR(50))) COLLATE Latin1_General_BIN) AS string
FROM @T
GROUP BY ObjID) T;
If you have an Objects table and your ObjectHistory table has an index on ObjID and date, then this could perform better than other queries given so far:
SELECT
X.*
FROM
Objects O
CROSS APPLY (
SELECT TOP 1 *
FROM ObjectHistory H
WHERE H.ObjID = O.ObjID
ORDER BY H.[Date] DESC
) X
The performance improvement may only come if you're pulling columns from the Objects table, too, but it's worth a shot.
If you want all Objects regardless of whether they have a history entry, switch to OUTER APPLY (and of course use O.ObjID instead of H.ObjID).
The neat thing about this query is that
It solves for situations where the Date value can have duplicates
It can support an arbitrary number of items per group (say, the top 5 instead of the top 1)
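As a rough illustration of the "top N per group" point, here is a sketch using Python's sqlite3 module. SQLite has no CROSS APPLY, so a correlated subquery with LIMIT stands in for it; this is an analogue, not the SQL Server query itself. The ObjectHistory table name and the sample rows follow the question, and TOP_N is a made-up parameter for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ObjectHistory (ObjID INTEGER, Color TEXT, [Date] TEXT, [User] TEXT);
INSERT INTO ObjectHistory VALUES
  (1, 'Red',    '2010-01-01 12:22:00', 'Joe'),
  (1, 'Blue',   '2010-01-02 15:22:00', 'Jill'),
  (1, 'Green',  '2010-01-03 16:22:00', 'Joe'),
  (1, 'White',  '2010-01-10 09:22:00', 'Mike'),
  (2, 'Red',    '2010-01-09 10:22:00', 'Mike'),
  (2, 'Blue',   '2010-01-12 09:22:00', 'Jill'),
  (2, 'Orange', '2010-01-12 15:22:00', 'Joe');
""")

TOP_N = 2  # hypothetical: how many of the most recent rows to keep per object

# For each row, keep it only if it is among the TOP_N newest rows of its object.
query = """
SELECT h.ObjID, h.Color, h.[Date], h.[User]
FROM ObjectHistory h
WHERE h.rowid IN (
    SELECT h2.rowid
    FROM ObjectHistory h2
    WHERE h2.ObjID = h.ObjID
    ORDER BY h2.[Date] DESC
    LIMIT ?
)
ORDER BY h.ObjID, h.[Date] DESC
"""
rows = conn.execute(query, (TOP_N,)).fetchall()
for row in rows:
    print(row)
```

Setting TOP_N to 1 reduces this to the "most recent row per object" case from the question. If two rows of the same object share the same date, which of them is kept is not determined by this sketch.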
See these two related questions:
SQL/mysql - Select distinct/UNIQUE but return all columns?
And:
How to efficiently determine changes between rows using SQL
SELECT t1.* FROM Table_name as t1
INNER JOIN (
SELECT MAX(Date) as MaxDate, ObjID FROM Table_name
GROUP BY ObjID
) as t2
ON t1.ObjID = t2.ObjID AND t1.Date = t2.MaxDate
You can find out, per object, its most recent change like this:
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
You can then get the color and user columns by linking the set returned above, instantiated as an inline view that has been given an alias, to the same table again:
select color, user, FOO.objectid, FOO.LatestChange
from LOG
inner join
(
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
) as FOO
on LOG.objectid = FOO.objectid and LOG.changedate = FOO.LatestChange
Like Martin Smith's answer above, simply do a row number over a partition and pick the most recent row for each object, like this:
SELECT ObjID, Color, [Date], [User]
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY [Date] DESC) AS RN
FROM [tablename]
) AS Ranked
WHERE
RN = 1