How to select the best item in each group? - sql

I have a table reports:
id  file_name
1   jan.xml
2   jan.csv
3   feb.csv
In plain terms: there are reports for each month. Each report is in either XML or CSV format, and each month has one or two reports, each in a different format.
I want to select the reports for all months, picking only one file per month. The XML format is preferred.
So, the expected output is:
id  file_name
1   jan.xml
3   feb.csv
Explanation: the file jan.csv was excluded because there is a preferred report for that month: jan.xml.

As mentioned in the comments, your data structure has a number of challenges. It really needs a ReportDate column (or something along those lines) of type date/datetime, so you know which month each report belongs to. That would also give you something to sort by when you get your data back. Aside from those much-needed improvements, you can get the desired results from your sample data with something like this:
create table SomeFileTable
(
id int
, file_name varchar(10)
)
insert SomeFileTable
select 1, 'jan.xml' union all
select 2, 'jan.csv' union all
select 3, 'feb.csv'
select s.id
, s.file_name
from
(
select *
, FileName = parsename(file_name, 2)
, FileExtension = parsename(file_name, 1)
, RowNum = ROW_NUMBER() over(partition by parsename(file_name, 2) order by case parsename(file_name, 1) when 'xml' then 1 else 2 end)
from SomeFileTable
) s
where s.RowNum = 1
--Ideally you would order the results, but the data has no reliable sort column, because the month is only implied by the file name

You may want to use a window function that ranks your rows, partitioning on the month and ordering by the format, both derived from the file_name field:
WITH ranked_reports AS (
SELECT
id,
file_name,
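ROW_NUMBER() OVER(
-- assumes every file name starts with a three-letter month ("jan", "feb", ...)
PARTITION BY LEFT(file_name, 3)
-- 'xml' sorts after 'csv', so DESC puts the XML file first
ORDER BY RIGHT(file_name, 3) DESC
) AS rn -- avoids "rank", which is a reserved word in some dialects (e.g. MySQL 8)
FROM
reports
)
SELECT
id,
file_name
FROM
ranked_reports
WHERE
rn = 1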
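To try the same ROW_NUMBER() idea outside SQL Server, here is a minimal sketch that runs it against SQLite through Python's standard sqlite3 module (window functions need SQLite 3.25 or later). It assumes file names of the form month.extension, as in the sample data, and uses SQLite's substr/instr in place of PARSENAME or LEFT/RIGHT. The preference is spelled out with a CASE expression rather than relying on 'xml' sorting after 'csv', so adding another format later cannot silently change which file wins.

```python
import sqlite3


def best_report_per_month(rows):
    """Return one (id, file_name) per month, preferring XML over other formats.

    Assumes file names look like '<month>.<ext>', e.g. 'jan.xml'.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE reports (id INTEGER, file_name TEXT)")
    con.executemany("INSERT INTO reports VALUES (?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT id,
                   file_name,
                   ROW_NUMBER() OVER (
                       -- month = everything before the dot
                       PARTITION BY substr(file_name, 1, instr(file_name, '.') - 1)
                       -- explicit preference: xml first, any other extension after
                       ORDER BY CASE substr(file_name, instr(file_name, '.') + 1)
                                    WHEN 'xml' THEN 1 ELSE 2 END
                   ) AS rn
            FROM reports
        )
        SELECT id, file_name FROM ranked WHERE rn = 1 ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(best_report_per_month([(1, "jan.xml"), (2, "jan.csv"), (3, "feb.csv")]))
# [(1, 'jan.xml'), (3, 'feb.csv')]
```

The ORDER BY id at the end only makes the output deterministic; as the first answer notes, a real report date column would be a better sort key.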


Change the order by minimum order output in SQL

I am trying to select the minimum of a string column; however, because SQL orders strings lexicographically, the result is not in the order I require.
I currently have three seasons and would like to select the minimum (or order by the minimum). The code I currently have is:
Select distinct season from table1 order by season desc;
It currently outputs them in this order:
Spring19
Autumn19
Autumn18
However, I need them in chronological order, as the seasons go:
Autumn18
Spring19
Autumn19
Is there a way that I can change the format to a 'date' without actually changing the format of the text? Or is there another way to do this?
Thanks :)
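Most databases support the right() function (if not, they have similar functionality under different names).
So, this should work. Within the same year, season desc puts Spring before Autumn because 'S' sorts after 'A'; note that this trick relies on there being only these two season names: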
Select distinct season
from table1
order by right(season, 2) asc, season desc;
with seasoncte (season_number, season_year, season) as (
select case when left(season, length(season) - 2) = 'Spring' then 1
when left(season, length(season) - 2) = 'Summer' then 2
when left(season, length(season) - 2) = 'Autumn' then 3
when left(season, length(season) - 2) = 'Winter' then 4
end as season_number,
cast(right(season, 2) as int) as season_year,
season
from table1
), seasoncte2 (season_number, season_year, season) as (
select season_number,
-- two-digit years below 39 are treated as 20xx, the rest as 19xx
case when season_year < 39 then 2000 + season_year
else 1900 + season_year
end,
season
from seasoncte
)
-- group by removes duplicate seasons, like the distinct in the question
select season
from seasoncte2
group by season, season_year, season_number
order by season_year, season_number
Code may need tweaking depending on SQL dialect.
The OP did not specify the dialect or provide a rextester link, so the code could not be tested.
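The CASE-based ordering can be sketched in SQLite through Python's sqlite3 module. This sketch assumes every value is a season name followed by a two-digit year and that all years fall in the same century (the 19xx/20xx rule above would handle mixed centuries). SQLite's substr(season, -2) stands in for right(season, 2).

```python
import sqlite3


def seasons_chronological(seasons):
    """Distinct season labels such as 'Autumn18', sorted by year, then season."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (season TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?)", [(s,) for s in seasons])
    query = """
        SELECT season
        FROM table1
        GROUP BY season                               -- same effect as DISTINCT
        ORDER BY CAST(substr(season, -2) AS INTEGER), -- two-digit year
                 CASE substr(season, 1, length(season) - 2)
                     WHEN 'Spring' THEN 1
                     WHEN 'Summer' THEN 2
                     WHEN 'Autumn' THEN 3
                     WHEN 'Winter' THEN 4
                 END
    """
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result


print(seasons_chronological(["Spring19", "Autumn19", "Autumn18", "Spring19"]))
# ['Autumn18', 'Spring19', 'Autumn19']
```

Unlike the right()/desc trick, an explicit season rank keeps working if Summer or Winter values appear later.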

SQL Server iterating through time series data

I am using SQL Server and wondering whether it is possible to iterate through time series data until a specific condition is met and, based on that, label my data in another table.
For example, let's say I have a table like this:
Id Date Some_kind_of_event
+--+----------+------------------
1 |2018-01-01|dsdf...
1 |2018-01-06|sdfs...
1 |2018-01-29|fsdfs...
2 |2018-05-10|sdfs...
2 |2018-05-11|fgdf...
2 |2018-05-12|asda...
3 |2018-02-15|sgsd...
3 |2018-02-16|rgw...
3 |2018-02-17|sgs...
3 |2018-02-28|sgs...
What I want is to calculate, for each key, the difference between adjacent events and find out whether any two adjacent events are more than 10 days apart. If so, I want to stop iterating for that key and set the label 'inactive' in my other table; otherwise, 'active'. After finishing one key, we move on to the next.
For example, id = 1 would get the label 'inactive' because two of its adjacent dates are more than 10 days apart. The final result would look like this:
Id Label
+--+----------+
1 |inactive
2 |active
3 |inactive
Any ideas how to do that? Is it possible to do it with SQL?
When working with a DBMS, you need to get away from the idea of thinking iteratively. Instead, you need to try to think in sets. "Instead of thinking about what you want to do to a row, think about what you want to do to a column."
If I understand correctly, is this what you're after?
CREATE TABLE SomeEvent (ID int, EventDate date, EventName varchar(10));
INSERT INTO SomeEvent
VALUES (1,'20180101','dsdf...'),
(1,'20180106','sdfs...'),
(1,'20180129','fsdfs..'),
(2,'20180510','sdfs...'),
(2,'20180511','fgdf...'),
(2,'20180512','asda...'),
(3,'20180215','sgsd...'),
(3,'20180216','rgw....'),
(3,'20180217','sgs....'),
(3,'20180228','sgs....');
GO
WITH Gaps AS(
SELECT *,
DATEDIFF(DAY,LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate),EventDate) AS EventGap
FROM SomeEvent)
SELECT ID,
CASE WHEN MAX(EventGap) > 10 THEN 'inactive' ELSE 'active' END AS Label
FROM Gaps
GROUP BY ID
ORDER BY ID;
GO
DROP TABLE SomeEvent;
GO
This assumes you are using SQL Server 2012 or later, since it uses the LAG function; SQL Server 2008 had less than 12 months of any kind of support remaining when this was written.
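The same set-based LAG() approach can be tried in SQLite through Python's sqlite3 module. This sketch assumes dates are stored as 'YYYY-MM-DD' text and uses julianday() differences in place of DATEDIFF(DAY, ...); the max_gap_days parameter is a hypothetical addition for illustration. An ID with a single event has only a NULL gap, which MAX ignores, so it comes out 'active'.

```python
import sqlite3


def label_activity(events, max_gap_days=10):
    """events: (ID, 'YYYY-MM-DD') pairs. Returns {ID: 'active' | 'inactive'}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SomeEvent (ID INTEGER, EventDate TEXT)")
    con.executemany("INSERT INTO SomeEvent VALUES (?, ?)", events)
    query = """
        WITH Gaps AS (
            SELECT ID,
                   -- days since the previous event of the same ID (NULL for the first one)
                   julianday(EventDate)
                     - julianday(LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate))
                     AS EventGap
            FROM SomeEvent
        )
        SELECT ID,
               CASE WHEN MAX(EventGap) > ? THEN 'inactive' ELSE 'active' END AS Label
        FROM Gaps
        GROUP BY ID
        ORDER BY ID
    """
    labels = dict(con.execute(query, (max_gap_days,)).fetchall())
    con.close()
    return labels


sample = [
    (1, "2018-01-01"), (1, "2018-01-06"), (1, "2018-01-29"),
    (2, "2018-05-10"), (2, "2018-05-11"), (2, "2018-05-12"),
    (3, "2018-02-15"), (3, "2018-02-16"), (3, "2018-02-17"), (3, "2018-02-28"),
]
print(label_activity(sample))
# {1: 'inactive', 2: 'active', 3: 'inactive'}
```

There is no loop and no early exit per key: the whole table is labeled in one grouped query, which is the "think in sets" point made above.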
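Try this. Note: replace #MyTable with your actual table.
WITH Diffs AS (
SELECT
Id
-- PARTITION BY Id stops each Id's last event from being compared with the next Id's first event;
-- the last event per Id gets NULL, which MAX ignores
,DATEDIFF(DAY,[Date],LEAD([Date]) OVER (PARTITION BY [Id] ORDER BY [Date])) Diff
FROM #MyTable)
SELECT
Id
,CASE WHEN MAX(Diff) > 10 THEN 'Inactive' ELSE 'Active' END AS Label
FROM Diffs
GROUP BY Id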
Just to share another approach (without a CTE).
SELECT
ID
, CASE WHEN SUM(TotalDays) = (MAX(CNT) - 1) THEN 'Active' ELSE 'Inactive' END Label
FROM (
SELECT
ID
, EventDate
-- 10 days or less counts as active, matching the "more than 10 days = inactive" rule
, CASE WHEN DATEDIFF(DAY, EventDate, LEAD(EventDate) OVER(PARTITION BY ID ORDER BY EventDate)) <= 10 THEN 1 ELSE 0 END TotalDays
, COUNT(ID) OVER(PARTITION BY ID) CNT
FROM EventsTable
) D
GROUP BY ID
The method counts how many records each ID has and computes TotalDays from the difference in days between the current date and the next one: if the difference is 10 days or less, it gives 1, otherwise 0 (the last record of each ID has no next date, so it always gives 0).
Then, if the sum of TotalDays equals the number of records for that ID minus one, the label is Active; otherwise, it is Inactive.

SQL - Compare 2 different ranges of date

The table has these columns:
DATA, CODE, and so on.
I need to display two different date ranges side by side, each with its code, like:
|data|code|data2|code|
My query is:
SELECT DATA,CODE
FROM people
WHERE DATA >= ${data1} AND DATA <= ${data2}
GROUP BY DATA
ORDER BY DATA
I tried running two queries with different variables, but both always return the same range of data.
So I did something like:
SELECT DATA,CODE
FROM people
WHERE DATA >= ${d1} AND DATA <= ${d2}
GROUP BY DATA
ORDER BY DATA
and tried to assign four different dates in order to get two periods. Let's imagine data1='01-01-2001' and data2='31-12-2001', while d1='01-01-2002' and d2='31-12-2002'.
When I assign the dates, both queries return only the last range.
So instead of getting |2001|code|2002|code|, I get |2002|code|2002|code|.
I need this for a comparison: I want every day of 2001 on the left and every day of 2002 on the right.
Using the bind variables :start1 and :end1 as the bounds for the first range and :start2 and :end2 as the bounds for the second range:
SELECT d1.data AS data1,
d1.code AS code1,
d2.data AS data2,
d2.code AS code2
FROM (
SELECT data,
code,
ROW_NUMBER() OVER ( ORDER BY data ) AS rn
FROM people
WHERE data BETWEEN :start1 AND :end1
) d1
FULL OUTER JOIN
(
SELECT data,
code,
ROW_NUMBER() OVER ( ORDER BY data ) AS rn
FROM people
WHERE data BETWEEN :start2 AND :end2
) d2
ON ( d1.rn = d2.rn )
ORDER BY COALESCE( d1.rn, d2.rn )
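The core idea of this answer is to pair the n-th row of one range with the n-th row of the other. Here is a minimal sketch of that pairing, assuming dates stored as 'YYYY-MM-DD' text so that BETWEEN compares correctly: each range is fetched from SQLite ordered by date, and Python's itertools.zip_longest stands in for ROW_NUMBER() plus FULL OUTER JOIN (which older SQLite versions lack), padding the shorter range with None.

```python
import sqlite3
from itertools import zip_longest


def compare_ranges(rows, start1, end1, start2, end2):
    """Pair the n-th row of range 1 with the n-th row of range 2.

    rows: (data, code) with data as 'YYYY-MM-DD' text, so BETWEEN compares correctly.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (data TEXT, code TEXT)")
    con.executemany("INSERT INTO people VALUES (?, ?)", rows)
    q = "SELECT data, code FROM people WHERE data BETWEEN ? AND ? ORDER BY data"
    first = con.execute(q, (start1, end1)).fetchall()
    second = con.execute(q, (start2, end2)).fetchall()
    con.close()
    # zip_longest pads the shorter range with (None, None), like the FULL OUTER JOIN on rn
    return [a + b for a, b in zip_longest(first, second, fillvalue=(None, None))]


rows = [("2001-01-01", "A"), ("2001-01-02", "B"), ("2002-01-01", "C")]
for line in compare_ranges(rows, "2001-01-01", "2001-12-31", "2002-01-01", "2002-12-31"):
    print(line)
# ('2001-01-01', 'A', '2002-01-01', 'C')
# ('2001-01-02', 'B', None, None)
```

Pairing by position only lines up calendar days if both ranges have one row per day; otherwise a join on day-of-year would be needed.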

Why would the query show data from the wrong month?

I have a query:
;with date_cte as(
SELECT r.starburst_dept_name,r.monthly_past_date as PrevDate,x.monthly_past_date as CurrDate,r.starburst_dept_average - x.starburst_dept_average as Average
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
) r
JOIN
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
Where month(monthly_past_date) > month(DATEADD(m,-2,monthly_past_date))
) x
ON r.starburst_dept_name = x.starburst_dept_name AND r.rowid = x.rowid+1
Where r.starburst_dept_name is NOT NULL
)
Select *
From date_cte
Order by Average DESC
While testing, I altered some column data to see why the query returns certain results. I don't know why, when I run the query, it returns a date from January that should not be there (row 4 of the results).
The database has other rows with exactly the same date, '2014-01-25 00:00:00.000', so I'm not sure why it picks only that row and compares its average.
Before running the query, I altered that row and changed its date, but I'm not sure whether that has anything to do with it.
UPDATE:
I have added the SQL Fiddle.
What I would like is to subtract the average from two months ago from last month's average.
It was actually working until I altered the data. I made the changes to test a certain situation, which showed that the query has flaws.
Based on your SQL Fiddle, this query only compares last month with the month before it, so rows from earlier months no longer show up in the join.
SELECT
thismonth.starburst_dept_name
,lastmonth.monthtly_past_date [PrevDate]
,thismonth.monthtly_past_date [CurrDate]
,thismonth.starburst_dept_average - lastmonth.starburst_dept_average as Average
FROM dbo.cse_reports thismonth
inner join dbo.cse_reports lastmonth on
thismonth.starburst_dept_name = lastmonth.starburst_dept_name
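-- DATEDIFF(MONTH, ...) counts month boundaries crossed, so the year is respected too
AND DATEDIFF(MONTH, lastmonth.monthtly_past_date, thismonth.monthtly_past_date) = 1
-- "this month" is the previous calendar month relative to today
WHERE DATEDIFF(MONTH, thismonth.monthtly_past_date, GETDATE()) = 1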
Order by thismonth.starburst_dept_average - lastmonth.starburst_dept_average DESC
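The underlying problem in the question is that comparing only MONTH() ignores the year, so January 2014 and January 2015 look identical. Here is a sketch of a year-aware month-over-month comparison in SQLite through Python's sqlite3: each date is turned into a month index (year * 12 + month), so consecutive months are always exactly one apart, even across a year boundary. The table and column names are simplified stand-ins for cse_reports, and current_month is a hypothetical parameter replacing the GETDATE()-based filter so the result is reproducible.

```python
import sqlite3


def month_over_month(rows, current_month):
    """rows: (dept, 'YYYY-MM-DD', average). current_month: 'YYYY-MM' of the later month.

    Returns (dept, prev_date, curr_date, difference), largest difference first.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE reports (dept TEXT, past_date TEXT, average REAL)")
    con.executemany("INSERT INTO reports VALUES (?, ?, ?)", rows)
    query = """
        WITH m AS (
            SELECT dept, past_date, average,
                   -- year * 12 + month: Dec 2014 and Jan 2015 are exactly 1 apart,
                   -- while Jan 2014 and Jan 2015 are 12 apart
                   CAST(strftime('%Y', past_date) AS INTEGER) * 12
                     + CAST(strftime('%m', past_date) AS INTEGER) AS month_idx
            FROM reports
        )
        SELECT cur.dept, prev.past_date, cur.past_date,
               cur.average - prev.average AS diff
        FROM m AS cur
        JOIN m AS prev
          ON prev.dept = cur.dept AND prev.month_idx = cur.month_idx - 1
        WHERE strftime('%Y-%m', cur.past_date) = ?
        ORDER BY diff DESC
    """
    result = con.execute(query, (current_month,)).fetchall()
    con.close()
    return result


rows = [
    ("A", "2014-01-25", 1.0),  # same month name a year earlier: must not be matched
    ("A", "2014-12-25", 5.0),
    ("A", "2015-01-25", 8.0),
    ("B", "2014-12-25", 4.0),
    ("B", "2015-01-25", 10.0),
]
for r in month_over_month(rows, "2015-01"):
    print(r)
# ('B', '2014-12-25', '2015-01-25', 6.0)
# ('A', '2014-12-25', '2015-01-25', 3.0)
```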

SQL PIVOT TABLE

I have the following data:
ID Data
1 tera
1 add
1 alkd
2 adf
2 add
3 wer
4 minus
4 add
4 ten
I am trying to use a pivot table to collapse each ID's rows into a single row with multiple columns.
So as follows:
ID Custom1 Custom2 Custom3 Custom4..........
1 tera add alkd
2 adf add
3 wer
4 minus add ten
I have the following query so far:
INSERT INTO #SpeciInfo
(ID, [Custom1], [Custom2], [Custom3], [Custom4], [Custom5],[Custom6],[Custom7],[Custom8],[Custom9],[Custom10],[Custom11],[Custom12],[Custom13],[Custom14],[Custom15],[Custom16])
SELECT
ID,
[Custom1],
[Custom2],
[Custom3],
[Custom4],
[Custom5],
[Custom6],
[Custom7],
[Custom8],
[Custom9],
[Custom10],
[Custom11],
[Custom12],
[Custom13],
[Custom14],
[Custom15],
[Custom16]
FROM SpeciInfo) p
PIVOT
(
(
[Custom1],
[Custom2],
[Custom3],
[Custom4],
[Custom5],
[Custom6],
[Custom7],
[Custom8],
[Custom9],
[Custom10],
[Custom11],
[Custom12],
[Custom13],
[Custom14],
[Custom15],
[Custom16]
)
) AS pvt
ORDER BY ID;
I need all 16 fields, but I am not sure what goes in the FROM clause, or whether I'm even doing this correctly.
Thanks
If what you seek is to build the columns dynamically, that is often called a dynamic crosstab, and it cannot be done in T-SQL without resorting to dynamic SQL (building the query as a string), which is not recommended. Instead, you should build that query in your middle tier or reporting application.
If you simply want a static solution, an alternative to PIVOT might look something like this in SQL Server 2005 or later:
With NumberedItems As
(
Select Id, Data
, Row_Number() Over( Partition By Id Order By Data ) As ColNum
From SpeciInfo
)
Select Id
, Min( Case When ColNum = 1 Then Data End ) As Custom1
, Min( Case When ColNum = 2 Then Data End ) As Custom2
, Min( Case When ColNum = 3 Then Data End ) As Custom3
, Min( Case When ColNum = 4 Then Data End ) As Custom4
-- ... repeat the same pattern for the remaining Custom columns
From NumberedItems
Group By Id
One serious problem with your original data is that there is no indicator of sequence, so the system has no way to know which item for a given ID should appear in the Custom1 column as opposed to the Custom2 column. In my query above, I arbitrarily ordered by the Data value.
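Here is a minimal sketch of the conditional-aggregation crosstab in SQLite through Python's sqlite3. Following the advice above to build a variable column list in the middle tier, the MIN(CASE ...) expressions are generated in Python. The n_cols parameter is a hypothetical addition, and the only values interpolated into the SQL text are integers from range(), so no user data enters the query string. Like the answer, it orders items by Data.

```python
import sqlite3


def crosstab(rows, n_cols=3):
    """rows: (ID, Data) pairs. Returns (ID, Custom1, ..., Custom<n_cols>) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SpeciInfo (ID INTEGER, Data TEXT)")
    con.executemany("INSERT INTO SpeciInfo VALUES (?, ?)", rows)
    # Column list built in the "middle tier"; k is always an int from range()
    cols = ",\n".join(
        f"MIN(CASE WHEN ColNum = {k} THEN Data END) AS Custom{k}"
        for k in range(1, n_cols + 1)
    )
    query = f"""
        WITH NumberedItems AS (
            SELECT ID, Data,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Data) AS ColNum
            FROM SpeciInfo
        )
        SELECT ID, {cols}
        FROM NumberedItems
        GROUP BY ID
        ORDER BY ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample = [(1, "tera"), (1, "add"), (1, "alkd"), (2, "adf"), (2, "add"),
          (3, "wer"), (4, "minus"), (4, "add"), (4, "ten")]
for r in crosstab(sample):
    print(r)
# (1, 'add', 'alkd', 'tera')
# (2, 'add', 'adf', None)
# (3, 'wer', None, None)
# (4, 'add', 'minus', 'ten')
```

ID 1 comes back as add, alkd, tera rather than tera, add, alkd, which illustrates the sequencing problem described above: without an ordering column, the column a value lands in depends entirely on the chosen ORDER BY.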