I have a table with 125k records. Each day, I insert ~20 records and generate 1,000 notifications based on the top 1000 records in the table ordered by insert time. Once notifications are generated, they are marked and no longer considered for future notification delivery. This worked fine for a long time, until a large insert of 100k records, ordered in an unusual way, started causing issues.
There are 4 types of records: two columns, each with two possible values, determine which of the 4 types a record is. Because of how the file was sorted, one of the types fills the first 80k records and dominates the daily notifications.
I am working to fix this by creating a trigger on insert that will reorder the table so that notifications are spread more evenly across the types each day.
My question: Is there a built in SQL Sorting Function that can proportionally sort results based on data in a column?
i.e. Can I get an 80/20 split based on column A and underneath, get an 80/20 split based on column B so that if I have the below options, I get 640 Records of (1,1), 160 records of (1,2), 160 records of (2,1), and 40 records of (2,2) without doing a hard coded select top X statement, as there are times I won't have 640 (1,1) records, but I still want 1000 total notifications generated.
Column A Column B
1 1
1 2
2 1
2 2
You should be able to use ROW_NUMBER to achieve what you want. You didn't provide table structures, so some of this is guesswork:
;WITH CTE_NumberedRows AS
(
SELECT
id,
column_a,
column_b,
some_date,
ROW_NUMBER() OVER (PARTITION BY column_a, column_b ORDER BY some_date) AS row_num
FROM
My_Table
)
SELECT TOP 1000
id,
column_a,
column_b
FROM
CTE_NumberedRows
ORDER BY
CASE WHEN column_a = 1 AND column_b = 1 AND row_num <= 640 THEN 0 ELSE 1 END,
CASE WHEN column_a = 1 AND column_b = 2 AND row_num <= 160 THEN 0 ELSE 1 END,
CASE WHEN column_a = 2 AND column_b = 1 AND row_num <= 160 THEN 0 ELSE 1 END,
CASE WHEN column_a = 2 AND column_b = 2 AND row_num <= 40 THEN 0 ELSE 1 END,
some_date
It isn't clear whether you need to order by the date in ASC or DESC order (or whether you're even ordering on a date). Hopefully the general gist of how it can be done is enough, though. The ORDER BY with its CASE expressions prioritizes the first "x" rows of each type; everything after that is ordered just by the date, which is what fills the remaining slots when a type has fewer rows than its quota.
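A scaled-down sketch of the same idea, run through Python's sqlite3 module so it can be tried without a server (window functions need SQLite 3.25 or later). The data, the quotas of 16/4/4/1 out of 25, and the single combined CASE flag are assumptions made for illustration; on SQL Server the LIMIT would be a TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, column_a INT, column_b INT, some_date INT)"
)

# Hypothetical data: type (1,1) was inserted first and dominates; (1,2) has only 2 rows.
rows, day = [], 0
for a, b, n in [(1, 1, 30), (1, 2, 2), (2, 1, 10), (2, 2, 5)]:
    for _ in range(n):
        day += 1
        rows.append((a, b, day))
conn.executemany(
    "INSERT INTO my_table (column_a, column_b, some_date) VALUES (?, ?, ?)", rows
)

QUERY = """
WITH numbered AS (
    SELECT id, column_a, column_b, some_date,
           ROW_NUMBER() OVER (PARTITION BY column_a, column_b ORDER BY some_date) AS row_num
    FROM my_table
)
SELECT id, column_a, column_b
FROM numbered
ORDER BY
    -- 0 = row is inside its type's quota, 1 = overflow row used only to fill gaps
    CASE
        WHEN column_a = 1 AND column_b = 1 AND row_num <= 16 THEN 0
        WHEN column_a = 1 AND column_b = 2 AND row_num <= 4  THEN 0
        WHEN column_a = 2 AND column_b = 1 AND row_num <= 4  THEN 0
        WHEN column_a = 2 AND column_b = 2 AND row_num <= 1  THEN 0
        ELSE 1
    END,
    some_date
LIMIT 25
"""

picked = conn.execute(QUERY).fetchall()

# Count how many rows of each type made it into the batch.
counts = {}
for _id, a, b in picked:
    counts[(a, b)] = counts.get((a, b), 0) + 1
print(counts)
```

Type (1,2) can only supply 2 of its 4 quota rows, so the two free slots are filled by the oldest overflow rows, which here are of type (1,1). The batch still contains 25 rows.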
Related
One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
The second table, which comes from a second system, contains all users, their sensitive data, and when they were registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the row in the 2nd table that is nearest in time, because sometimes the difference may be a few minutes (a delay, or the opposite of a delay) and sometimes it can be a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the 2nd table, and the latest is the one I want to compute the difference against:
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote the query below, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120)) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and a database to check it. The likely trouble spots are the casting of the dates and the date math, since I don't know which RDBMS and version this is. Treat the following as "pseudo code".
We assign a row number to each candidate row, ordered by the absolute difference in seconds between the two dates; the row with RN = 1 is the nearest one and wins.
WITH CTE AS (
SELECT A.*, B.*, row_number() over (PARTITION BY A.e_mail
-- ascending, so the smallest difference gets RN = 1; datetime2 because the sample value has 7 fractional digits
ORDER BY abs(datediff(second, cast(a.[TRAN_DATETIME] as datetime2), cast(b.[INSERTDATE] as datetime2))) asc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
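A runnable sketch of the nearest-row idea, using Python's sqlite3 module instead of SQL Server so it needs no setup. The table names sales and sensitive, the example addresses, and the use of strftime('%s', ...) in place of DATEDIFF are assumptions for illustration; the ranking itself follows the ROW_NUMBER approach described above, partitioned per sale so that each transaction gets its own nearest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (e_mail TEXT, name TEXT, tran_datetime TEXT);
CREATE TABLE sensitive (email TEXT, insertdate TEXT);
INSERT INTO sales VALUES
    ('p@example.eu', 'xxx xxx', '2021-10-04 00:03:09'),
    ('q@example.eu', 'yyy yyy', '2021-10-04 08:00:00');  -- no matching user row
INSERT INTO sensitive VALUES
    ('p@example.eu', '2021-05-20 19:12:07'),
    ('p@example.eu', '2021-05-20 19:18:48'),
    ('p@example.eu', '2021-10-03 18:32:30'),   -- nearest, before the sale
    ('p@example.eu', '2021-10-04 09:00:00');   -- after the sale, but farther away
""")

QUERY = """
WITH diffs AS (
    SELECT a.e_mail, a.tran_datetime, b.insertdate,
           -- absolute gap in whole seconds; NULL when there is no user row
           ABS(CAST(strftime('%s', a.tran_datetime) AS INTEGER)
             - CAST(strftime('%s', b.insertdate) AS INTEGER)) AS seconds_apart
    FROM sales a
    LEFT JOIN sensitive b ON a.e_mail = b.email
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY e_mail, tran_datetime
                              ORDER BY seconds_apart) AS rn
    FROM diffs
)
SELECT e_mail, tran_datetime, insertdate, seconds_apart
FROM ranked
WHERE rn = 1
ORDER BY e_mail
"""

nearest = conn.execute(QUERY).fetchall()
for row in nearest:
    print(row)
```

Dividing seconds_apart by 60 gives the difference in minutes the question asks for. Sales without any user row are kept by the LEFT JOIN and come back with NULL in both insertdate and seconds_apart.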
I need to find out whether all columns in a row of a SQL Server table have exactly the same value. The table content is created by a stored procedure and the number of columns can vary. The first column is an ID; the second and following columns must be compared to check whether they all have exactly the same value.
At the moment I do not have a clue how to achieve this.
The best solution would be to display only the rows that have different values in one or more columns, ignoring the first column with the ID.
Thank you so much for your help!!
--> Edit: The table looks this:
ID Instance1 Instance2 Instance3 Instance4 Instance5
=====================================================
A 1 1 1 1 1
B 1 1 0 1 1
C 55 55 55 55 55
D Driver Driver Driver Co-driver Driver
E 90 0 90 0 50
F On On On On On
The result should look like this: only the rows with one or more differing column values should be displayed.
ID Instance1 Instance2 Instance3 Instance4 Instance5
=====================================================
B 1 1 0 1 1
D Driver Driver Driver Co-driver Driver
E 90 0 90 0 50
My table has more than 1000 rows and 40 columns
You can achieve this by using row_number().
Try the following code:
With c as(
Select id
,field_1
,field_2
,field_3
,field_n
,row_number() over(partition by field_1,field_2,field_3,field_n order by id asc) as rn
From Table
)
Select *
From c
Where rn = 1
row_number() with a partition shows whether a combination of field values is repeated: it numbers the rows within each group of identical field_1, field_2, field_3, field_n values. For example, if you have 2 rows with the same field values, the inner query returns:
rn field_1 field_2 field_3 field_n id
1 x y z a 5
2 x y z a 9
After that, the outer part of the query keeps only rn = 1, so you get a result without repetitions based on those fields.
Also, if you want to delete the repeated rows from your table, you can apply:
With c as(
Select id
,field_1
,field_2
,field_3
,field_n
,row_number() over(partition by field_1,field_2,field_3,field_n order by id asc) as rn
From Table
)
delete
From c
Where rn > 1
The best solution would be to display only the rows, which have different values in one or multiple columns except the first column with ID.
You may be looking for the following simple query, whose WHERE clause filters out rows where all fields have the same value (I assumed 5 fields, id not included).
SELECT *
FROM mytable t
WHERE NOT (
field1 = field2
AND field1 = field3
AND field1 = field4
AND field1 = field5
);
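One caveat with plain equality: if any field is NULL, the comparison is neither true nor false, so the NOT (...) filter silently drops that row. The sketch below, run with Python's sqlite3 module, uses the sample rows from the question plus a hypothetical row G containing a NULL, and compares the plain filter with a NULL-safe one. It relies on SQLite's IS NOT operator; SQL Server 2022 offers IS DISTINCT FROM for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT, field5 TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)", [
    ("A", "1", "1", "1", "1", "1"),
    ("B", "1", "1", "0", "1", "1"),
    ("C", "55", "55", "55", "55", "55"),
    ("D", "Driver", "Driver", "Driver", "Co-driver", "Driver"),
    ("E", "90", "0", "90", "0", "50"),
    ("F", "On", "On", "On", "On", "On"),
    ("G", "On", None, "On", "On", "On"),  # hypothetical row with a NULL
])

# Plain equality: row G evaluates to NULL inside NOT (...) and is filtered out.
plain = [r[0] for r in conn.execute("""
    SELECT id FROM mytable
    WHERE NOT (field1 = field2 AND field1 = field3
           AND field1 = field4 AND field1 = field5)
    ORDER BY id
""")]

# NULL-safe comparison: a NULL next to a value counts as a difference.
null_safe = [r[0] for r in conn.execute("""
    SELECT id FROM mytable
    WHERE field1 IS NOT field2 OR field1 IS NOT field3
       OR field1 IS NOT field4 OR field1 IS NOT field5
    ORDER BY id
""")]

print(plain)      # ['B', 'D', 'E']
print(null_safe)  # ['B', 'D', 'E', 'G']
```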
I have a table with a field called 'group_quartile', which uses the SQL ntile() function to calculate which quartile each customer lies in on the basis of their activity scores. However, using the ntile() function I find that some customers with the same activity score end up in different quartiles. I need to modify the 'group_quartile' column so that all customers with the same activity score lie in the same group_quartile.
A view of the table values :
Customer_id Product Activity_Score Group_Quartile
CH002 T 2328 1
CR001 T 268 1
CN001 T 178 1
MS006 T 45 2
ST001 T 21 2
CH001 T 0 2
CX001 T 0 3
KH001 T 0 3
MH002 T 0 4
SJ003 T 0 4
CN001 S 439 1
AC002 S 177 1
SC001 S 91 2
PV001 S 69 3
TS001 S 0 4
I used a CTE but it did not work.
My query only updates (from the above example):
CX001 T 0 3
modified to
CX001 T 0 2
So only the first repeating activity score is checked and that row’s group_quartile is updated to 2.
I need to update all the below rows as well.
CX001 T 0 3
KH001 T 0 3
MH002 T 0 4
SJ003 T 0 4
I cannot use DENSE_RANK() instead of NTILE to segregate the records, as arranging the customers per product into approximately 4 quartiles is a business requirement.
From my understanding I need to loop through the table:
Find a row which has the same activity score and the same product as its predecessor but a different group_quartile.
Update the selected row's group_quartile to its predecessor's quartile value.
Then loop through the updated table again to look for any row matching the above condition, and update that row similarly.
The loop continues until all rows with the same activity score (for the same product) are in the same group_quartile.
--
THIS IS THE TABLE STRUCTURE I AM WORKING ON:
CREATE TABLE #custs
(
customer_id NVARCHAR(50),
PRODUCT NVARCHAR(50),
ACTIVITYSCORE INT,
GROUP_QUARTILE INT,
RANKED int,
rownum int
)
INSERT INTO #custs
-- adding a column to give row numbers(unique id) for each row
SELECT customer_id, PRODUCT, ACTIVITYSCORE,GROUP_QUARTILE,RANKED,
Row_Number() OVER(partition by product ORDER BY activityscore desc) N
FROM
-- rows derived from a parent table based on 'segmentation' column value
(SELECT customer_id, PRODUCT, ACTIVITYSCORE,
DENSE_RANK() OVER (PARTITION BY PRODUCT ORDER BY ACTIVITYSCORE DESC) AS RANKED,
NTILE(4) OVER(PARTITION BY PRODUCT ORDER BY ACTIVITYSCORE DESC) AS GROUP_QUARTILE
FROM #parent_score_table WHERE (SEGMENTATION = 'Large')
) as temp
ORDER BY PRODUCT
The method I used to achieve this partially is as follows :
-- The query finds the rows which have the same activity score as the previous row but a different Group_Quartile value.
-- I need to use a query to update such a row.
-- Next, find any rows in the newly updated table that have the same activity score as the previous row but a different Group_Quartile value.
-- Continue to update the table in this manner until all rows with the same activity score have the same quartile value.
I managed to find only the rows which have the same activity score as the previous row but a different Group_Quartile value, but I cannot loop through to find new rows that may match the updated row.
select t1.customer_id,t1.ACTIVITYSCORE,t1.PRODUCT, t1.RANKED, t1.GROUP_QUARTILE, t2.GROUP_QUARTILE as modified_quartile
from #custs t1, #custs t2
where (
t1.rownum = t2.rownum + 1
and t1.ACTIVITYSCORE = t2.ACTIVITYSCORE
and t1.PRODUCT = t2.PRODUCT
and not(t1.GROUP_QUARTILE = t2.GROUP_QUARTILE))
Can anyone help with what should be the t-sql statement for the above?
Cheers!
Assuming you've already worked out a basic Group_Quartile as indicated above, you can update the table with a query similar to the following:
update a
set Group_Quartile = coalesce(topq.Group_Quartile, a.Group_Quartile)
from activityScores a
outer apply
(
select top 1 Group_Quartile
from activityScores topq
where a.Product = topq.Product
and a.Activity_Score = topq.Activity_Score
order by Group_Quartile
) topq
SQL Fiddle with demo.
Edit after comment:
I think you did a lot of the work already by getting the Group_Quartile working.
For each row in the table, the statement above will join another row to it using the outer apply statement. Only one row will be joined back to the original table due to the top 1 clause.
So for each row, we are returning one more row. The extra row will be matched on Product and Activity_Score, and will be the row with the lowest Group_Quartile (order by Group_Quartile). Finally, we update the original row with this lowest Group_Quartile value, so each row with the same Product and Activity_Score will now have the same, lowest possible Group_Quartile.
So SJ003, MH002, etc will all be matched to CH001 and be updated with the Group_Quartile value of CH001, i.e. 2.
It's hard to explain code! Another thing that might help is looking at the join without the update statement:
select a.*
, TopCustomer_id = topq.Customer_Id
, NewGroup_Quartile = topq.Group_Quartile
from activityScores a
outer apply
(
select top 1 *
from activityScores topq
where a.Product = topq.Product
and a.Activity_Score = topq.Activity_Score
order by Group_Quartile
) topq
SQL Fiddle without update.
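The same "lowest quartile wins" rule can be tried out with the sample rows from the question. The sketch below uses Python's sqlite3 module, and because SQLite has no OUTER APPLY it swaps in a correlated MIN() subquery, which picks the same value as top 1 ... order by Group_Quartile. The quartile values are loaded as given in the question rather than recomputed with NTILE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activityScores (Customer_id TEXT, Product TEXT, Activity_Score INT, Group_Quartile INT)"
)
conn.executemany("INSERT INTO activityScores VALUES (?, ?, ?, ?)", [
    ("CH002", "T", 2328, 1), ("CR001", "T", 268, 1), ("CN001", "T", 178, 1),
    ("MS006", "T", 45, 2), ("ST001", "T", 21, 2), ("CH001", "T", 0, 2),
    ("CX001", "T", 0, 3), ("KH001", "T", 0, 3), ("MH002", "T", 0, 4),
    ("SJ003", "T", 0, 4),
    ("CN001", "S", 439, 1), ("AC002", "S", 177, 1), ("SC001", "S", 91, 2),
    ("PV001", "S", 69, 3), ("TS001", "S", 0, 4),
])

# Give every row the lowest quartile found among rows that share its
# product and activity score. The group minimum never changes while the
# update runs, so one pass is enough; no loop is needed.
conn.execute("""
    UPDATE activityScores
    SET Group_Quartile = (
        SELECT MIN(t.Group_Quartile)
        FROM activityScores t
        WHERE t.Product = activityScores.Product
          AND t.Activity_Score = activityScores.Activity_Score
    )
""")

tied = conn.execute("""
    SELECT Customer_id, Group_Quartile
    FROM activityScores
    WHERE Product = 'T' AND Activity_Score = 0
    ORDER BY Customer_id
""").fetchall()
print(tied)

product_s = dict(conn.execute(
    "SELECT Customer_id, Group_Quartile FROM activityScores WHERE Product = 'S'"
).fetchall())
print(product_s)
```

All five product T customers with a score of 0 end up in quartile 2, while the product S rows, which have no ties, keep their original quartiles.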
I have a query :
select score, count(1) as 'NumStudents' from testresults where testid = 'mytestid'
group by score order by score
where testresults table contains the performances of students in a test. A sample result looks like the following, assuming maximum marks of the test is 10.
score  NumStudents
0      10
1      20
2      12
3      5
5      34
..
10     23
As you can see, this query does not return any records for scores which no student has scored. E.g. nobody scored 4/10 in the test, so there is no record for score = 4 in the query output.
I would like to change the query so that I can get these missing records with 0 as the value for the NumStudents field. So that my end output would have max + 1 records, one for each possible score.
Any ideas ?
EDIT:
The database contains several tests and the maximum marks for the test is part of the test definition. So having a new table for storing all possible scores is not feasible. In the sense that whenever I create a new test with a new max marks, I need to ensure that the new table should be changed to contain these scores as well.
SQL is good at working with sets of data values in the database, but not so good at sets of data values that are not in the database.
The best workaround is to keep one small table for the values you need to range over:
CREATE TABLE ScoreValues (score int);
INSERT INTO ScoreValues (score)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
Given your comment that you define the max marks of a test in another table, you can join to that table in the following way, as long as ScoreValues is sure to contain values at least as high as the greatest test's max marks:
SELECT v.score, COUNT(tr.score) AS 'NumStudents'
FROM ScoreValues v
JOIN Tests t ON (v.score <= t.maxmarks)
LEFT OUTER JOIN TestResults tr ON (v.score = tr.score AND t.testid = tr.testid)
WHERE t.testid = 'mytestid'
GROUP BY v.score;
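The most obvious way would be to create a table named "Scores" and left outer join your table to it. Count a column from the joined table rather than COUNT(1), so that scores nobody achieved come out as 0 instead of 1.
SELECT s.score, COUNT(ts.score) AS scoreCount
FROM Scores AS s
LEFT OUTER JOIN testScores AS ts
ON s.score = ts.score
GROUP BY s.score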
If you don't want to create the table, you could use
SELECT
1 as score, SUM(CASE WHEN ts.score = 1 THEN 1 ELSE 0 END) AS scoreCount,
2 as score, SUM(CASE WHEN ts.score = 2 THEN 1 ELSE 0 END) AS scoreCount,
3 as score, SUM(CASE WHEN ts.score = 3 THEN 1 ELSE 0 END) AS scoreCount,
4 as score, SUM(CASE WHEN ts.score = 4 THEN 1 ELSE 0 END) AS scoreCount,
...
10 as score, SUM(CASE WHEN ts.score = 10 THEN 1 ELSE 0 END) AS scoreCount
FROM testScores AS ts
Does MySQL support set-returning functions? Recent releases of PostgreSQL have a function, generate_series(start, stop) that produces the value start on the first row, start+1 on the second, and so on up to stop on the stopth row. The advantage of this is that you can put this function in a subselect in the FROM clause and then join to it, instead of creating and populating a table and joining to that as suggested by le dorfier and Bill Karwin.
Just as a mental exercise I came up with this to generate a sequence in MySQL. It works as long as the square of the number of tables across all databases on the box is at least the length of the sequence. I wouldn't recommend it for production though ;)
SELECT @n:=@n+1 as n from (select @n:=-1) x, Information_Schema.Tables y, Information_Schema.Tables WHERE @n<20; /* sequence from 0 to 20 inclusive */
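Where recursive common table expressions are available, the list of scores can also be generated on the fly from the test's own max marks, with no helper table. The sketch below runs through Python's sqlite3 module; the Tests and TestResults layouts and the sample rows are assumptions for illustration. SQL Server accepts the same shape of query without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tests (testid TEXT PRIMARY KEY, maxmarks INT);
CREATE TABLE TestResults (testid TEXT, score INT);
INSERT INTO Tests VALUES ('mytestid', 5);
-- hypothetical results: nobody scored 2 or 4
INSERT INTO TestResults VALUES
    ('mytestid', 0), ('mytestid', 0), ('mytestid', 1),
    ('mytestid', 3), ('mytestid', 3), ('mytestid', 3), ('mytestid', 5);
""")

QUERY = """
WITH RECURSIVE scores(score, maxmarks) AS (
    -- anchor row: score 0, carrying along this test's max marks
    SELECT 0, maxmarks FROM Tests WHERE testid = :testid
    UNION ALL
    -- keep adding 1 until the max marks are reached
    SELECT score + 1, maxmarks FROM scores WHERE score < maxmarks
)
SELECT s.score, COUNT(tr.score) AS NumStudents
FROM scores s
LEFT JOIN TestResults tr
       ON tr.score = s.score AND tr.testid = :testid
GROUP BY s.score
ORDER BY s.score
"""

histogram = conn.execute(QUERY, {"testid": "mytestid"}).fetchall()
print(histogram)  # [(0, 2), (1, 1), (2, 0), (3, 3), (4, 0), (5, 1)]
```

The result has max + 1 rows, one per possible score, and COUNT(tr.score) yields 0 for the scores that have no matching result row.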
Given a table (mytable) containing a numeric field (mynum), how would one go about writing an SQL query which summarizes the table's data based on ranges of values in that field rather than each distinct value?
For the sake of a more concrete example, let's make it intervals of 3 and just "summarize" with a count(*), such that the results tell the number of rows where mynum is 0-2.99, the number of rows where it's 3-5.99, where it's 6-8.99, etc.
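The idea is to compute some function of the field that has a constant value within each group you want. Use floor rather than round here: floor(mynum/3.0) is 0 for 0-2.99, 1 for 3-5.99, and so on, whereas round would put 1.5 and 2.99 into the next group.
select floor(mynum/3.0) as foo, count(*) from mytable group by floor(mynum/3.0);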
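A small sketch of interval grouping, run with Python's sqlite3 module on made-up values. It assumes mynum is never negative, so truncating with CAST(... AS INTEGER) behaves like a floor; that cast is used because not every SQLite build ships the floor() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mynum REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(v,) for v in (0, 1.5, 2.99, 3, 4, 5.99, 6, 8.99, 10)],
)

# bucket n covers the interval from 3n up to, but not including, 3n + 3
buckets = conn.execute("""
    SELECT CAST(mynum / 3.0 AS INTEGER) AS bucket, COUNT(*) AS n
    FROM mytable
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()

for bucket, n in buckets:
    print(f"{bucket * 3} up to {bucket * 3 + 3}: {n} rows")
```

Multiplying the bucket number by the interval width, as the print loop does, recovers the lower bound of each range for display.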
I do not know if this is applicable to MySQL, but in SQL Server I think you can "simply" use the same CASE expression in both the select list AND the group by list.
Something like:
select
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END,
count(*)
from Profiles
where 1=1
group by
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END
returns something like
column1 column2
---------- ----------
20and30 3
lessthan20 3
morethan30 13