Complex SQL query or queries - sql

I looked at other examples, but I don't know enough about SQL to adapt them to my needs. I have a table that looks like this:
ID  Month    NAME  COUNT  First  LAST  TOTAL
------------------------------------------------------
1   JAN2013  fred  4
2   MAR2013  fred  5
3   APR2014  fred  1
4   JAN2013  Tom   6
5   MAR2014  Tom   1
6   APR2014  Tom   1
This could be done in separate queries. I need 'First' to equal the first month in which a particular name is used, so every row with fred would have JAN2013 in the 'First' field, for example. I need the 'Last' column to equal the month of the last record for each name. Finally, I need the 'Total' column to be the sum of all the counts for each name, so every row for fred would have a total of 10 in this sample data. This is over my head. Can one of you assist?

This is crude but should do the trick. I renamed your fields a bit because you are using a bunch of "RESERVED" sql words and that is bad form.
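;WITH cte AS
(
    -- One row per source row; no GROUP BY, so txtMONTH stays available
    SELECT
         [NAME]
        ,[txtMONTH]
        ,[nmCOUNT]
        -- txtMONTH is text such as 'JAN2013', which sorts alphabetically as-is.
        -- Convert it to a date so the months sort chronologically
        -- (style 106 = 'dd mon yyyy'; assumes English month abbreviations).
        ,ROW_NUMBER() OVER (PARTITION BY [NAME]
            ORDER BY CONVERT(date, '01 ' + LEFT([txtMONTH], 3) + ' ' + RIGHT([txtMONTH], 4), 106) ASC) AS FirstMonth
        ,ROW_NUMBER() OVER (PARTITION BY [NAME]
            ORDER BY CONVERT(date, '01 ' + LEFT([txtMONTH], 3) + ' ' + RIGHT([txtMONTH], 4), 106) DESC) AS LastMonth
        -- Windowed SUM: total of all counts for the name, repeated on each row
        ,SUM([nmCOUNT]) OVER (PARTITION BY [NAME]) AS TotNameCount
    FROM [Table]
)
,cteFirst AS
(
    -- The earliest row for each name
    SELECT
         [NAME]
        ,[nmCOUNT]
        ,[TotNameCount]
        ,[txtMONTH] AS ansFirst
    FROM cte
    WHERE FirstMonth = 1
)
,cteLast AS
(
    -- The latest row for each name
    SELECT
         [NAME]
        ,[txtMONTH] AS ansLast
    FROM cte
    WHERE LastMonth = 1
)
SELECT c.[NAME], c.nmCOUNT, c.ansFirst, l.ansLast, c.TotNameCount
FROM cteFirst c
LEFT JOIN cteLast l ON c.[NAME] = l.[NAME]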
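
The requirement in the question (first month, last month and total repeated on every row) can also be expressed with window functions alone. The following is only a sketch, run against SQLite through Python rather than SQL Server, so the syntax may need adjusting. The table name usage_log, the column names and the month_key helper are assumptions made for the illustration; month_key exists only because text such as 'JAN2013' does not sort chronologically.

```python
import sqlite3

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def month_key(text):
    """Turn 'JAN2013' into the sortable integer 201301 (hypothetical helper)."""
    return int(text[3:]) * 100 + MONTHS.index(text[:3].upper()) + 1


conn = sqlite3.connect(":memory:")
conn.create_function("month_key", 1, month_key)
conn.execute(
    "CREATE TABLE usage_log (id INTEGER, txtMonth TEXT, name TEXT, nmCount INTEGER)"
)
conn.executemany(
    "INSERT INTO usage_log VALUES (?, ?, ?, ?)",
    [
        (1, "JAN2013", "fred", 4),
        (2, "MAR2013", "fred", 5),
        (3, "APR2014", "fred", 1),
        (4, "JAN2013", "Tom", 6),
        (5, "MAR2014", "Tom", 1),
        (6, "APR2014", "Tom", 1),
    ],
)

# Every source row is kept; the three window functions add the per-name values.
rows = conn.execute("""
    SELECT id, txtMonth, name, nmCount,
           FIRST_VALUE(txtMonth) OVER (PARTITION BY name
                                       ORDER BY month_key(txtMonth))      AS first_month,
           FIRST_VALUE(txtMonth) OVER (PARTITION BY name
                                       ORDER BY month_key(txtMonth) DESC) AS last_month,
           SUM(nmCount) OVER (PARTITION BY name)                          AS total
    FROM usage_log
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Unlike a join of "first" and "last" rows, this returns one output row per input row, which is what the question's sample layout suggests.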

Related

SQL Compare Rows With Duplicate IDs and Return One With Lowest Sequence Number

Reaching out for help. I've seen plenty of answers on how to use DUPLICATE, but not quite in the way I need it. Let's say I have the result of a query that looks like the following.
query result
Incident_No Open_Approval_Step Approval_ID
------------- -------------------- -------------------
1 3 Tech
1 4 Cust_Serv
2 1 Incident_Recorder
2 2 Estimation
2 3 Tech
3 4 Cust_Serv
3 5 Mgmt
3 6 Closure
And I need one row for each incident number with the smallest numbered approval step. So the result should look like this.
filtered query result
Incident_No Open_Approval_Step Approval_ID
------------- -------------------- -------------------
1 3 Tech
2 1 Incident_Recorder
3 4 Cust_Serv
Edit: This is what I came up with in the end:
SELECT DISTINCT
MIN(OPEN_APPROVAL_STEP) OVER(PARTITION BY INCIDENT_NO ORDER BY OPEN_APPROVAL_STEP ASC) AS CUR_APP_STEP,
INCIDENT_NO
FROM T
You can use row_number():
select *
from (
select
t.*,
row_number() over(partition by incident_no order by open_approval_step) rn
from mytable t
) t
where rn = 1
With just one extra column apart from the incident number and approval step, another option is aggregation and Oracle's keep syntax:
select
incident_no,
min(open_approval_step) open_approval_step,
min(approval_id) keep(dense_rank first order by open_approval_step) approval_id
from mytable
group by incident_no
If you have just three columns, you can easily use aggregation:
select incident_no, min(open_approval_step),
min(approval_id) keep (dense_rank first order by open_approval_step)
from t
group by incident_no;
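
To see the row_number() approach from the answers above at work on the sample data, here is a small sketch that runs it against SQLite through Python. SQLite is used only because it is easy to run; it does not support Oracle's keep syntax, so only the row_number() variant is shown. The table name mytable follows the answer, and the rest is the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (incident_no INTEGER, open_approval_step INTEGER, approval_id TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 3, "Tech"),
        (1, 4, "Cust_Serv"),
        (2, 1, "Incident_Recorder"),
        (2, 2, "Estimation"),
        (2, 3, "Tech"),
        (3, 4, "Cust_Serv"),
        (3, 5, "Mgmt"),
        (3, 6, "Closure"),
    ],
)

# Number the rows of each incident by approval step and keep only the first one.
rows = conn.execute("""
    SELECT incident_no, open_approval_step, approval_id
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY incident_no
                                  ORDER BY open_approval_step) AS rn
        FROM mytable t
    )
    WHERE rn = 1
    ORDER BY incident_no
""").fetchall()

for row in rows:
    print(row)
```

Compared with the SELECT DISTINCT ... MIN(...) OVER query from the question's edit, this keeps the whole row, so Approval_ID comes along with the smallest step.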

SQL Server - Find similarities in column and write them into new column

I have a big table with data like this:
ID Title
-- ------------------------
1 01_SOMESTRING_038
2 01_SOMESTRING K5038
3 01_SOMESTRING-648
4 K-OTHERSTRING_T_73474
5 K-OTHERSTRING_T_ffk
6 ABC
7 DEF
And the task is now to find similarities in that column, and write that found similarity to a new column.
So the desired output would be like this:
ID Title Similarity
-- ------------------------ -----------------
1 01_SOMESTRING_038 01_SOMESTRING
2 01_SOMESTRING K5038 01_SOMESTRING
3 01_SOMESTRING-648 01_SOMESTRING
4 K-OTHERSTRING_T_73474 K-OTHERSTRING_T_
5 K-OTHERSTRING_T_ffk K-OTHERSTRING_T_
6 ABC NULL
7 DEF NULL
How can I achieve that in MS SQL Server 17?
Any help is much appreciated. Thanks!
EDIT: The strings are not only broken by delimiters such as "-", "_".
And to handle competing similarities, I would set a minimum length for the similarity, for instance 10.
Try the following, using a recursive CTE to split out the letters, then we can group them up to find the greatest match:
WITH TITLE_EXPAND AS (
SELECT
1 MatchLen
,CAST(SUBSTRING(Title,1,1) as NVARCHAR(255)) MatchString
,Title
,ID
FROM
[SourceDataTable]
UNION ALL
SELECT
MatchLen + 1
,CAST(SUBSTRING(Title,1,MatchLen+1) AS NVARCHAR(255))
,Title
,ID
FROM
TITLE_EXPAND
WHERE
MatchLen < LEN(Title)
)
SELECT DISTINCT
SDT.ID
,SDT.title
,FIRST_VALUE(MatchString) OVER (PARTITION BY SDT.ID ORDER BY SC.MatchLen DESC, SC.MatchCount DESC) Similarity
FROM
[SourceDataTable] SDT
LEFT JOIN
(SELECT
*
,COUNT(*) OVER (PARTITION BY MatchString, MatchLen) MatchCount
FROM
TITLE_EXPAND) SC
ON
SDT.ID = SC.ID
AND
SC.MatchCount > 1
ORDER BY SDT.ID
Where SourceDataTable is your source table. The Similarity value will be the longest matched similar value.
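
The idea behind the query (expand every title into all of its prefixes, count how often each prefix occurs, then keep the longest prefix that occurs more than once) can be illustrated outside the database. This is a minimal Python sketch of that idea, not a translation of the T-SQL; the function name and the min_len parameter are assumptions, the latter reflecting the minimum length of 10 mentioned in the question's edit.

```python
from collections import Counter


def longest_shared_prefixes(titles, min_len=10):
    """Map each id to the longest prefix it shares with at least one other title.

    titles: dict of id -> title. Ids without a shared prefix of at least
    min_len characters map to None.
    """
    # Count every prefix of every title (the role of TITLE_EXPAND plus COUNT(*)).
    counts = Counter(
        title[:n] for title in titles.values() for n in range(1, len(title) + 1)
    )
    result = {}
    for id_, title in titles.items():
        result[id_] = None
        # Try the longest prefix first and stop at the first one that is shared.
        for n in range(len(title), min_len - 1, -1):
            if counts[title[:n]] > 1:
                result[id_] = title[:n]
                break
    return result


sample = {
    1: "01_SOMESTRING_038",
    2: "01_SOMESTRING K5038",
    3: "01_SOMESTRING-648",
    4: "K-OTHERSTRING_T_73474",
    5: "K-OTHERSTRING_T_ffk",
    6: "ABC",
    7: "DEF",
}
similarity = longest_shared_prefixes(sample)
for id_, prefix in similarity.items():
    print(id_, sample[id_], prefix)
```

Note that this, like the query, only finds common prefixes; similarities in the middle or at the end of a title are not detected.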

Adding same random rows to table in SQL

I have the following base table:
_ID_ _Name_
1 Bart Smit
2 Ahmed Lissabon
3 Medina Aziz
4 Ben Joeson
I would like to assign random titles to the rows of the table above. However, the same title should be assigned to the same person every time the query is run. Suppose I have the following table of titles:
_Titles_
Captain
Mr.
Ms.
Prince
King
Queen
Lieutenant
Doctor
Sir
The output could then look like this:
_ID_ _Title_ _Name_
1 Doctor Bart Smit
2 King Ahmed Lissabon
3 Captain Medina Aziz
4 Sir Ben Joeson
But it should assign those same titles to those names every time I run the code. Currently I use NEWID() in combination with a CROSS APPLY, which assigns different random titles to the names every time I run it.
SELECT _ID_, R._Title_, _NAME_
FROM TABLE
CROSS APPLY
(
SELECT TOP 1 Title
FROM #titles
WHERE TABLE.[_ID_]= TABLE.[_ID_]
ORDER BY NEWID()
) R
If you want the same result every time, store the assignment by updating the table instead of just running a query:
WITH cte AS (
SELECT t.*, R.title AS new_title
FROM TABLE t
CROSS APPLY(SELECT TOP 1 Title
FROM #titles
ORDER BY NEWID()) R
WHERE _TITLE_ IS NULL
)
UPDATE cte
SET _Title_ = new_title;
You can't use any random calculation if it has to be repeatable (unless you add a new column to store that random value).
This applies checksum and modulo to get a repeatable value:
select *
from tab
join
( select *
,row_number() over (order by title) -1 as rn
from titles
) as t
-- or simply hardcoded 9
on abs(checksum(tab.Name,tab.id) % (select count(*) from titles)) = t.rn
;
Of course, this will still return different results when the number of titles changes (or add an ID column to titles).
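
The principle of the checksum-and-modulo answer is that a deterministic hash of the person's row, taken modulo the number of titles, always selects the same title. Here is a minimal sketch of that principle in Python. It uses zlib.crc32 as a stand-in for SQL Server's CHECKSUM, so the concrete assignments will differ from what the query produces; the function name assign_title is an assumption.

```python
import zlib


def assign_title(person_id, name, titles):
    """Pick a title that looks random but is always the same for the same person."""
    ordered = sorted(titles)  # fixed order, like row_number() over (order by title)
    key = f"{person_id}|{name}".encode("utf-8")
    # crc32 is deterministic, so the same key always yields the same index.
    return ordered[zlib.crc32(key) % len(ordered)]


titles = ["Captain", "Mr.", "Ms.", "Prince", "King",
          "Queen", "Lieutenant", "Doctor", "Sir"]
people = [(1, "Bart Smit"), (2, "Ahmed Lissabon"),
          (3, "Medina Aziz"), (4, "Ben Joeson")]

first_run = [(pid, assign_title(pid, name, titles), name) for pid, name in people]
second_run = [(pid, assign_title(pid, name, titles), name) for pid, name in people]

for row in first_run:
    print(row)
```

As with the SQL version, the assignment is stable only as long as the list of titles stays the same, because both the ordering and the modulus depend on it.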

SQL - Set field value based on count of previous rows values

I have the following table structure in Microsoft SQL:
ID Name Number
1 John
2 John
3 John
4 Mark
5 Mark
6 Anne
7 Anne
8 Luke
9 Rachael
10 Rachael
I am looking to set the 'Number' field to the number of times the 'Name' field has appeared previously, using SQL.
Desired output as follows:
ID Name Number
1 John 1
2 John 2
3 John 3
4 Mark 1
5 Mark 2
6 Anne 1
7 Anne 2
8 Luke 1
9 Rachael 1
10 Rachael 2
The table is ordered by 'Name', so there is no worry of 'John' appearing under ID 11 again, using my example.
Any help would be appreciated. I'm not sure if I can do this with a simple SELECT statement, or whether I will need an UPDATE statement, or something more advanced.
Use ROW_NUMBER:
SELECT ID, Name,
ROW_NUMBER() OVER (PARTITION BY Name
ORDER BY ID) AS Number
FROM mytable
There is no need to add a field for this, as the value can be easily calculated using window functions.
You should be able to use the ROW_NUMBER() function within SQL Server to partition each group (by their Name property) and output the individual row in each partition :
SELECT ID,
Name,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS Number
FROM YourTable
ORDER BY ID
If your system doesn't support OVER (PARTITION BY ...), you can use the following code:
SELECT
ID,
Name,
(
SELECT
SUM(counterTable.nameCount)
FROM
mytable innerTable
JOIN (SELECT 1 as nameCount) as counterTable
WHERE
innerTable.ID <= outerTable.ID
AND outerTable.Name = innerTable.Name
) AS cumulative_sum
FROM
mytable outerTable
ORDER BY outerTable.ID
I used the following CREATE TABLE statement and then filled in your data:
CREATE TABLE `mytable` (
`ID` INT(11) NULL DEFAULT NULL,
`Name` VARCHAR(50) NULL DEFAULT NULL
);
This should work with database systems that do not support OVER (PARTITION BY ...), such as older versions of MySQL and MariaDB.
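
To check that the window-function answers and a subquery-based fallback agree, the sketch below runs both against SQLite through Python. The fallback here is written as a correlated COUNT(*) subquery, which is a simplified variant of the SUM-over-join query above and not the exact same statement. The table name mytable follows the answers; the data is the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER, Name TEXT)")
names = ["John", "John", "John", "Mark", "Mark",
         "Anne", "Anne", "Luke", "Rachael", "Rachael"]
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, name) for i, name in enumerate(names, start=1)],
)

# Window-function version: number the rows of each name by ID.
windowed = conn.execute("""
    SELECT ID, Name,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS Number
    FROM mytable
    ORDER BY ID
""").fetchall()

# Fallback without window functions: count rows with the same name up to this ID.
correlated = conn.execute("""
    SELECT o.ID, o.Name,
           (SELECT COUNT(*)
            FROM mytable i
            WHERE i.Name = o.Name AND i.ID <= o.ID) AS Number
    FROM mytable o
    ORDER BY o.ID
""").fetchall()

for row in windowed:
    print(row)
```

The correlated form rescans the table for every row, so on large tables the window function is usually the better choice where it is available.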

SQL Server find the missing number

I have a table like below
id name year
--------------
1 A 2000
2 B 2000
2 B 2000
2 B 2000
5 C 2000
1 D 2001
3 E 2001
As you can see, in the year 2000 ids 3 and 4 are missing, and in the year 2001 id 2 is missing. I want to generate a second table that lists the missing items.
2nd table :
From-id to-id name year
--------------------------------
3 4 null 2000
2 null null 2001
Which method in a SQL query can solve my problem?
This problem is known as "Gaps and Islands in Sequences"; it is worth reading an article on it.
Here's something to get you started:
WITH cte AS
(
SELECT *
FROM
(VALUES
(1),(2),(3),(4),(5)
) Tally(number)
), cte2 as
(
SELECT DISTINCT [year]
FROM
(VALUES
(2000),(2000),(2001)
)tbl([year])
), cte3 as
(
SELECT *
FROM cte
CROSS JOIN cte2
)
SELECT *
FROM cte3
LEFT OUTER JOIN YourTable ON cte3.number = YourTable.id AND cte3.[year] = YourTable.[year]
A few notes: please avoid using reserved keywords as column names (such as year).
Furthermore, since I didn't know how you'd handle multiple missing ranges I did not format the output to reflect a range. For example: What would be your expected output if only one row with id=3 would be in your table?
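
For the range format asked for in the question, one possible way is to compare each id with the next id in the same year and report the numbers in between. The sketch below does this with LEAD, run against SQLite through Python rather than SQL Server; the table name yearly_ids and the column name yr are assumptions. It reports a single missing id with an empty to-id, as in the question's example, and it does not detect ids missing before the smallest id of a year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yearly_ids (id INTEGER, name TEXT, yr INTEGER)")
conn.executemany(
    "INSERT INTO yearly_ids VALUES (?, ?, ?)",
    [
        (1, "A", 2000),
        (2, "B", 2000),
        (2, "B", 2000),
        (2, "B", 2000),
        (5, "C", 2000),
        (1, "D", 2001),
        (3, "E", 2001),
    ],
)

gaps = conn.execute("""
    WITH ids AS (
        -- Duplicates would produce zero-length steps, so remove them first.
        SELECT DISTINCT yr, id FROM yearly_ids
    ),
    steps AS (
        SELECT yr, id,
               LEAD(id) OVER (PARTITION BY yr ORDER BY id) AS next_id
        FROM ids
    )
    SELECT id + 1                      AS from_id,
           NULLIF(next_id - 1, id + 1) AS to_id,   -- NULL when only one id is missing
           yr
    FROM steps
    WHERE next_id - id > 1
    ORDER BY yr, from_id
""").fetchall()

for gap in gaps:
    print(gap)
```

Several separate gaps within one year simply produce several rows, one per gap.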
I'd probably use ROW_NUMBER for this.
This query gives you what the correct ID should be (if I interpreted your question right):
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
It assigns a row number (so a number starting from 1 increasing by 1 every time the year is the same).
And to let you know which ones are missing I think this should be a working solution:
WITH missing AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
)
SELECT * FROM missing
WHERE "Correct ID" != "id"
It takes the first query as a base to select only those records where the assumed correct ID is not equal to the currently assigned ID. You can turn this into a query to include the ranges you mentioned, but not sure if that is really necessary.
Hope this helps.