T-SQL: identify difference of x in integer sequence - sql

I have a simple table: here is an example of what the values look like (after taking out confidential values).
What I need to do is count the number of times the difference between a Date_Nbr value and the one following it is greater than 2. I could imagine creating a second table that holds, for each Group and Person, an integer count of those instances.
Is this even possible in T-SQL? I haven't shown any code, because I'm at a total loss.
For those who need table definitions, it would be this:
Date_Nbr int not null,
Group varchar(5) not null,
Person varchar(5) not null
The values shown would exist within that table (let's call it Date_Seq).
I would be grateful for any ideas. Thank you.
Date_Nbr | Group | Person
1        | C     | A
4        | C     | A
5        | C     | A
8        | C     | A
10       | C     | A
11       | C     | A
13       | C     | A
14       | C     | A
15       | C     | A
P.S. --
I described it verbally above, but this is a visual description of what I hope to achieve, where "Count_Gaps" is the number of times a difference greater than 2 was found in the sequence of "Date_Nbr".
Group | Person | Count_Gaps
C     | A      | [integer value]

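This returns all records that have no corresponding record for the same Group and Person at Date_Nbr + 1 or Date_Nbr + 2, that is, the rows that are followed by a gap greater than 2. The last row of each Group and Person also matches, because nothing follows it, so exclude that row (or subtract one per group) when counting. Group is a reserved word, so it is bracketed.
SELECT t1.*
FROM Date_Seq t1
LEFT JOIN Date_Seq t2 ON t1.[Group] = t2.[Group] AND t1.Person = t2.Person AND t2.Date_Nbr = t1.Date_Nbr + 1
LEFT JOIN Date_Seq t3 ON t1.[Group] = t3.[Group] AND t1.Person = t3.Person AND t3.Date_Nbr = t1.Date_Nbr + 2
WHERE t2.Date_Nbr IS NULL
AND t3.Date_Nbr IS NULL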
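As an alternative sketch, the count per Group and Person can be computed by pairing each row with its successor and counting the differences greater than 2. The example below is an assumption-laden illustration, not the original answer's method: it runs the SQL in SQLite through Python's sqlite3 module so that it is self-contained, and it uses the LEAD window function (available in SQL Server 2012 and later, and in SQLite 3.25 and later). The function name count_gaps is made up for the example. In T-SQL the identifier would be written [Group] rather than "Group".

```python
import sqlite3

# Sample rows from the question: (Date_Nbr, Group, Person)
ROWS = [(n, "C", "A") for n in (1, 4, 5, 8, 10, 11, 13, 14, 15)]

# LEAD fetches the next Date_Nbr within the same Group/Person.
# The last row of each partition has no successor (NULL), so it never counts.
QUERY = """
SELECT "Group", Person,
       SUM(CASE WHEN Next_Nbr - Date_Nbr > 2 THEN 1 ELSE 0 END) AS Count_Gaps
FROM (
    SELECT "Group", Person, Date_Nbr,
           LEAD(Date_Nbr) OVER (PARTITION BY "Group", Person
                                ORDER BY Date_Nbr) AS Next_Nbr
    FROM Date_Seq
) AS s
GROUP BY "Group", Person
ORDER BY "Group", Person
"""


def count_gaps(rows):
    """Return (Group, Person, Count_Gaps) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Date_Seq (Date_Nbr INT NOT NULL, '
        '"Group" VARCHAR(5) NOT NULL, Person VARCHAR(5) NOT NULL)'
    )
    conn.executemany("INSERT INTO Date_Seq VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_gaps(ROWS))  # [('C', 'A', 2)]: the gaps are 1->4 and 5->8
```

For the sample data the differences of exactly 2 (8 to 10, 11 to 13) are not counted, since the requirement is "greater than 2".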

Related

Update table by all records from second table

Morning,
I have two tables. The first table (SecurityRules) is a list of security rules:
ID srRight srRole
1 4 NULL
2 2 32
The second table (Projects) is a list of Projects :
ProjId prRight prRole
1 0 NULL
2 0 32
3 0 NULL
I need to update the list of projects with all the records from SecurityRules and update the prRight column based on the Role from both tables. The Right values are bitwise-organised.
I used the following SQL update query to do this:
Update Projects
-- Perform binary sum
Set prRight = prRight | srRight
From SecurityRules
Where (srRole is Null) --Always apply srRight if srRole is not defined
OR (srRole is Not Null And srRole=prRole) --Apply right if both roles are equal
The expected result is:
ProjId prRight prRole
1 4 NULL
2 6 32
3 4 NULL
But I get:
ProjId prRight prRole
1 4 NULL
2 4 32
3 4 NULL
It looks like the update uses only the first record of the SecurityRules table, whereas I need to apply all the records from the SecurityRules table to all records of the Projects table.
If I write a simple loop and manually iterate over all the records from SecurityRules, it works fine, but the performance is very poor if you have to compare 10 security rules to 2000 projects...
Any suggestions?
Arno
This answer is based on code from another answer for generating a bitwise OR of values. It uses CTEs to generate a bit mask for each bit position, joins each matching rights value to the masks of the bits it has set, and then computes the overall bitwise OR by summing the distinct bit masks found across those rights values. The output of the last CTE is then used to update the Projects table:
WITH Bits AS (
SELECT 1 AS BitMask
UNION ALL
SELECT 2 * BitMask FROM Bits
WHERE BitMask < 65536
),
NewRights AS (
SELECT ProjId, SUM(DISTINCT BitMask) AS NewRight
FROM Projects p
JOIN SecurityRules s ON s.srRole IS NULL OR s.srRole = p.prRole
JOIN Bits b ON b.BitMask & s.srRight > 0
GROUP BY ProjID
)
UPDATE p
SET p.prRight = n.NewRight
FROM Projects p
JOIN NewRights n ON n.ProjId = p.ProjId
Resultant Projects table:
ProjId prRight prRole
1 4 null
2 6 32
3 4 null
Demo on dbfiddle
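To see why the distinct-bit-mask sum behaves like a bitwise OR across several matching rules, here is a minimal sketch that runs the same two CTEs in SQLite through Python's sqlite3 module (an assumption made for self-containment; the answer targets SQL Server, where the CTE is written without the RECURSIVE keyword). It only computes the new rights and does not perform the UPDATE. The function name combined_rights is hypothetical.

```python
import sqlite3

# Same CTEs as the answer: one row per bit mask, then the distinct masks
# set in any matching rule are summed, which equals the bitwise OR.
NEW_RIGHTS_SQL = """
WITH RECURSIVE Bits(BitMask) AS (
    SELECT 1
    UNION ALL
    SELECT 2 * BitMask FROM Bits WHERE BitMask < 65536
)
SELECT p.ProjId, SUM(DISTINCT b.BitMask) AS NewRight
FROM Projects p
JOIN SecurityRules s ON s.srRole IS NULL OR s.srRole = p.prRole
JOIN Bits b ON (b.BitMask & s.srRight) > 0
GROUP BY p.ProjId
ORDER BY p.ProjId
"""

RULES = [(1, 4, None), (2, 2, 32)]                   # (ID, srRight, srRole)
PROJECTS = [(1, 0, None), (2, 0, 32), (3, 0, None)]  # (ProjId, prRight, prRole)


def combined_rights(rules, projects):
    """Return (ProjId, NewRight) for every project with at least one matching rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SecurityRules (ID INT, srRight INT, srRole INT)")
    conn.execute("CREATE TABLE Projects (ProjId INT, prRight INT, prRole INT)")
    conn.executemany("INSERT INTO SecurityRules VALUES (?, ?, ?)", rules)
    conn.executemany("INSERT INTO Projects VALUES (?, ?, ?)", projects)
    rows = conn.execute(NEW_RIGHTS_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(combined_rights(RULES, PROJECTS))  # [(1, 4), (2, 6), (3, 4)]
```

The DISTINCT matters when rules overlap: rights 6 and 3 share bit 2, so a plain SUM over the masks would give 9, while the distinct sum gives 7, which is 6 | 3.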
If I understand correctly, you have a direct match on the srRole column and then a default rule that applies to everyone.
The simplest method (in this case) is to use joins in the update:
update p
    set prRight = p.prRight | srn.srRight | coalesce(sr.srRight, 0)
    from Projects p join
         SecurityRules srn
         on srn.srRole is null left join
         SecurityRules sr
         on sr.srRole = p.prRole;
Here is a db<>fiddle.
It might be safer not to assume that a default rule exists, and to allow for prRight being NULL:
update p
    set prRight = coalesce(p.prRight, 0) | coalesce(srn.srRight, 0) | coalesce(sr.srRight, 0)
    from Projects p left join
         SecurityRules srn
         on srn.srRole is null left join
         SecurityRules sr
         on sr.srRole = p.prRole;
That said, I would recommend that you consider revising your data model. Bit fiddling is a lot of fun in programming languages. However, it is generally not the best approach in databases. Instead use junction tables, unless your application has some real need for bit switches.
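A junction-table design could look roughly like the sketch below. Everything in it is an assumption for illustration: the table names RuleRights and ProjectRights, the right names 'edit' and 'view', and the function grant_rights do not come from the question. It runs in SQLite through Python's sqlite3 module to stay self-contained. Each right is a row instead of a bit, so applying all rules to all projects becomes a single set-based INSERT with no bit arithmetic.

```python
import sqlite3

# Hypothetical schema: one row per (project, right) instead of a bit field.
SCHEMA = """
CREATE TABLE Projects (ProjId INTEGER PRIMARY KEY, prRole INTEGER);
CREATE TABLE RuleRights (srRole INTEGER, RightName TEXT NOT NULL);
CREATE TABLE ProjectRights (
    ProjId INTEGER NOT NULL REFERENCES Projects (ProjId),
    RightName TEXT NOT NULL,
    PRIMARY KEY (ProjId, RightName)
);
"""

# Grant every right whose rule has no role or the project's role,
# skipping rights the project already holds.
GRANT_SQL = """
INSERT INTO ProjectRights (ProjId, RightName)
SELECT DISTINCT p.ProjId, r.RightName
FROM Projects p
JOIN RuleRights r ON r.srRole IS NULL OR r.srRole = p.prRole
WHERE NOT EXISTS (SELECT 1 FROM ProjectRights pr
                  WHERE pr.ProjId = p.ProjId AND pr.RightName = r.RightName)
"""


def grant_rights(projects, rule_rights):
    """Apply all rules to all projects and return the resulting grants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Projects VALUES (?, ?)", projects)
    conn.executemany("INSERT INTO RuleRights VALUES (?, ?)", rule_rights)
    conn.execute(GRANT_SQL)
    conn.execute(GRANT_SQL)  # a second run inserts nothing new
    rows = conn.execute(
        "SELECT ProjId, RightName FROM ProjectRights ORDER BY ProjId, RightName"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(grant_rights([(1, None), (2, 32), (3, None)],
                       [(None, "edit"), (32, "view")]))
    # [(1, 'edit'), (2, 'edit'), (2, 'view'), (3, 'edit')]
```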

Extract only values that are greater than a threshold in another table in InfluxDB

I am using InfluxDB and I would like to extract the values that are greater than a certain threshold stored in another table.
For example, I have the two tables shown below.
Table A
Time value
1 15
2 25
3 9
4 22
Table B
Time threshold
1 16
2 12
3 13
4 15
Given the two tables above, I would like to extract the values that are greater than the threshold in the first row of Table B (16). Therefore, what I want to get is the following.
Time value
2 25
4 22
I tried the following SQL query, but it did not give the correct result.
select * from data1 where value > (select spec from spec1 limit 1);
Look forward to your feedback.
Thanks.
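Integrate the condition into an inner join on the time column:
select * from tableA as a
inner join tableB as b on a.time = b.time and a.value > b.threshold
Note that this compares each value with the threshold at the same time, not with the first row of Table B; for the sample data, both give the same result.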
When your time column doesn't only include integer values, you have to format the time and join on a time range. Here is an example:
SQL join on time range
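The two readings of the requirement (compare against the first threshold only, or compare against the threshold at the same time) can be checked side by side. This is a sketch in generic SQL run in SQLite through Python's sqlite3 module, not InfluxQL; whether InfluxDB accepts either form depends on its version and query language, so treat it as an illustration of the logic only. The table and function names are assumptions.

```python
import sqlite3

TABLE_A = [(1, 15), (2, 25), (3, 9), (4, 22)]    # (time, value)
TABLE_B = [(1, 16), (2, 12), (3, 13), (4, 15)]   # (time, threshold)

# Reading 1: every value is compared with the earliest threshold only.
FIRST_ROW_SQL = """
SELECT a.time, a.value
FROM tableA a
WHERE a.value > (SELECT b.threshold FROM tableB b ORDER BY b.time LIMIT 1)
ORDER BY a.time
"""

# Reading 2: each value is compared with the threshold at the same time.
PER_ROW_SQL = """
SELECT a.time, a.value
FROM tableA a
JOIN tableB b ON a.time = b.time AND a.value > b.threshold
ORDER BY a.time
"""


def run(sql, table_a, table_b):
    """Load both tables into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (time INT, value INT)")
    conn.execute("CREATE TABLE tableB (time INT, threshold INT)")
    conn.executemany("INSERT INTO tableA VALUES (?, ?)", table_a)
    conn.executemany("INSERT INTO tableB VALUES (?, ?)", table_b)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(FIRST_ROW_SQL, TABLE_A, TABLE_B))  # [(2, 25), (4, 22)]
    print(run(PER_ROW_SQL, TABLE_A, TABLE_B))    # [(2, 25), (4, 22)]
```

The results coincide for the sample data, but they diverge as soon as a later threshold drops below its value: with a threshold of 5 at time 3, only the per-row join would also return the row (3, 9).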

Finding contiguous regions in a sorted MS Access query

I am a long time fan of Stack Overflow but I've come across a problem that I haven't found addressed yet and need some expert help.
I have a query that is sorted chronologically with a date-time compound key (unique, never deleted) and several pieces of data. What I want to know is whether there is a way to find the start (or end) of each region where a value changes. For example:
DateTime someVal1 someVal2 someVal3 target
1 3 4 A
1 2 4 A
1 3 4 A
1 2 4 B
1 2 5 B
1 2 5 A
I would like my query to return rows 1, 4 and 6, that is, to find the change in column 5 from A to B and then from B back to A. I have tried the find-duplicates method and using Min and Max in the Totals property, but that gives me the first and last rows overall instead of the local max and min. Has anyone solved a similar problem?
I didn't see any purpose for the someVal1, someVal2, and someVal3 fields, so I left them out. I used an autonumber as the primary key instead of your date/time field; but this approach should also work with your date/time primary key. This is the data in my version of your table.
pkey_field target
1 A
2 A
3 A
4 B
5 B
6 A
I used a correlated subquery to find the previous pkey_field value for each row.
SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m;
Then put that in a subquery which I joined to another copy of the base table.
SELECT
sub.pkey_field,
sub.target,
sub.prev_pkey_field,
prev.target AS prev_target
FROM
(SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
ON sub.prev_pkey_field = prev.pkey_field
WHERE
sub.prev_pkey_field Is Null
OR prev.target <> sub.target;
This is the output from that final query.
pkey_field target prev_pkey_field prev_target
1 A
4 B 3 A
6 A 5 B
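For comparison, on engines that support window functions the same "previous row" lookup can be written with LAG instead of a correlated subquery. This is a swapped-in technique, not something Access offers: Access SQL has no window functions, so the sketch below runs in SQLite (3.25 or later) through Python's sqlite3 module. It assumes target is never NULL, and the function name region_starts is made up.

```python
import sqlite3

ROWS = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "A")]

# LAG returns the target of the preceding row in pkey_field order.
# A row starts a region when it has no predecessor or its target differs.
CHANGES_SQL = """
SELECT pkey_field, target, prev_target
FROM (
    SELECT pkey_field, target,
           LAG(target) OVER (ORDER BY pkey_field) AS prev_target
    FROM YourTable
) AS s
WHERE prev_target IS NULL OR prev_target <> target
ORDER BY pkey_field
"""


def region_starts(rows):
    """Return the first row of each contiguous run of equal target values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (pkey_field INT PRIMARY KEY, target TEXT NOT NULL)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    result = conn.execute(CHANGES_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(region_starts(ROWS))
    # [(1, 'A', None), (4, 'B', 'A'), (6, 'A', 'B')]
```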
Here is a first attempt:
SELECT t1.Row, t1.target
FROM t1
WHERE t1.target <> Nz((SELECT TOP 1 t2.target
                       FROM t1 AS t2
                       WHERE t2.DateTimeId < t1.DateTimeId
                       ORDER BY t2.DateTimeId DESC), "X");

List category/subcategory tree and display its sub-categories in the same row

I have a hierarchical table of regions and sub-regions, and I need to list a tree of regions and sub-regions (which is easy), but I also need a column that displays, for each region, the ids of all its sub-regions.
For example:
id name superiorId
-------------------------------
1 RJ NULL
2 Tijuca 1
3 Leblon 1
4 Gavea 2
5 Humaita 2
6 Barra 4
I need the result to be something like:
id   name      superiorId   sub-regions
-----------------------------------------
1    RJ        NULL         2,3,4,5,6
2    Tijuca    1            4,5,6
3    Leblon    1            null
4    Gavea     2            6
5    Humaita   2            null
6    Barra     4            null
I have done that by creating a function that builds the list with STUFF() for a given region, but when I select all regions of a country, for example, the query becomes really slow, because the function is executed once per region to collect its sub-regions.
Does anybody know how to get that in an optimized way?
The function returns all the sub-regions' ids as a single comma-separated string:
CREATE FUNCTION getSubRegions (@RegionId int)
RETURNS TABLE
AS
RETURN (
    select stuff((SELECT CAST(wine_reg.wine_reg_id as varchar) + ','
                  from (select wine_reg_id
                             , wine_reg_name
                             , wine_region_superior
                        from wine_region as t1
                        where wine_region_superior = @RegionId
                           or exists
                              ( select *
                                from wine_region as t2
                                where wine_reg_id = t1.wine_region_superior
                                  and wine_region_superior = @RegionId
                              )
                       ) wine_reg
                  ORDER BY wine_reg.wine_reg_name ASC
                  for XML path('')), 1, 0, '') as Sons)
GO
When we used to build these concatenated lists in the database, we took an approach similar to yours at first. Then, when we needed more speed, we turned them into CLR functions:
http://msdn.microsoft.com/en-US/library/a8s4s5dz(v=VS.90).aspx
Now our database is only responsible for storing and retrieving data, and this sort of thing lives in the data layer of our application.
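Another option that stays in the database is to compute all ancestor/descendant pairs once with a recursive CTE, instead of calling a function per region. The sketch below is only an illustration under assumptions: it uses the simplified sample columns (id, name, superiorId) in a hypothetical table named Region, runs in SQLite through Python's sqlite3 module, and builds the comma-separated string in Python to keep the ordering explicit. In T-SQL the CTE is written without the RECURSIVE keyword, and the concatenation could be done with FOR XML PATH as in the question, or with STRING_AGG on SQL Server 2017 and later.

```python
import sqlite3

REGIONS = [
    (1, "RJ", None), (2, "Tijuca", 1), (3, "Leblon", 1),
    (4, "Gavea", 2), (5, "Humaita", 2), (6, "Barra", 4),
]

# The anchor lists direct children; each recursive step extends a known
# (ancestor, descendant) pair by one more level of children.
SUBREGIONS_SQL = """
WITH RECURSIVE Descendants(ancestor_id, descendant_id) AS (
    SELECT superiorId, id FROM Region WHERE superiorId IS NOT NULL
    UNION ALL
    SELECT d.ancestor_id, r.id
    FROM Descendants d
    JOIN Region r ON r.superiorId = d.descendant_id
)
SELECT r.id, d.descendant_id
FROM Region r
LEFT JOIN Descendants d ON d.ancestor_id = r.id
ORDER BY r.id, d.descendant_id
"""


def sub_regions(regions):
    """Map each region id to a comma-separated list of all descendant ids (or None)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Region (id INT PRIMARY KEY, name TEXT, superiorId INT)")
    conn.executemany("INSERT INTO Region VALUES (?, ?, ?)", regions)
    grouped = {}
    for region_id, descendant_id in conn.execute(SUBREGIONS_SQL):
        ids = grouped.setdefault(region_id, [])
        if descendant_id is not None:
            ids.append(descendant_id)
    conn.close()
    return {rid: ",".join(map(str, ids)) or None for rid, ids in grouped.items()}


if __name__ == "__main__":
    print(sub_regions(REGIONS))
    # {1: '2,3,4,5,6', 2: '4,5,6', 3: None, 4: '6', 5: None, 6: None}
```

Unlike the function in the question, which looks only two levels down, the recursion follows the hierarchy to any depth (Barra is found under RJ through Tijuca and Gavea).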

Is there a [straightforward] way to order results *first*, *then* group by another column, with SQL?

I see that in an SQL query, the GROUP BY has to precede the ORDER BY expression. Does this imply that ordering is done after grouping has discarded identical rows?
Because I seem to need to order rows by a timestamp first, then discard the rows with identical timestamp. And I don't know how to accomplish this.
I am using MySQL 5.1.41.
Here is the definition of the table expressed with create table:
create table
(
A int,
B timestamp
)
The data could be:
+-----+-----------------------+
| A | B |
+-----+-----------------------+
| 1 | today |
| 1 | yesterday |
| 2 | yesterday |
| 2 | tomorrow |
+-----+-----------------------+
The results of the query on the above table, which I am after, would be:
+-----+-----------------------+
| A | B |
+-----+-----------------------+
| 1 | today |
| 2 | tomorrow |
+-----+-----------------------+
Basically, I want the rows with the latest timestamp in column "B" (hence the mention of ORDER BY), and only one row for each value in column "A" (think DISTINCT or GROUP BY).
The actual problem behind the simplified example above:
In reality, I have two tables - users and payment_receipts:
create table users
(
phone_nr int(10) unsigned not null,
primary key (phone_nr)
)
create table payment_receipts
(
phone_nr int(10) unsigned not null,
payed_ts timestamp default current_timestamp not null,
payed_until_ts timestamp not null,
primary key (phone_nr, payed_ts, payed_until_ts)
)
The tables may include other columns, but I omit these as irrelevant. To implement a payment scheme, I have to send SMS messages to users across the cellular network at periodic intervals, depending on whether a payment is due. The payment takes effect when the SMS is sent, because the recipient is charged for it. I use the payment_receipts table to keep a record of all payments made, i.e. for book-keeping. This is intended to model a real shop where both the buyer and the seller get a copy of the receipt of purchase, for reference. This table stores my (the seller's) copy of each receipt. The customer's receipt is the received SMS itself. Each time an SMS is sent (and thus a payment is made), a receipt record is inserted into the table, stating who paid, when, and "until when". To explain the latter, imagine a subscription service that runs indefinitely until the user explicitly opts out, at which point the corresponding user record is removed. A payment is made a month in advance, so as a rule, the difference between payed_ts and payed_until_ts is 30 days.
I have a batch job that executes every day and needs to select a list of users that are due monthly payment as part of the automatic subscription renewal described above. To link this to the dummy example earlier, the phone number column phone_nr would be the column "A" and payed_until_ts would be column "B", but in reality there are two tables, which has to do with the following behaviour: when a user record is removed, the receipt must remain, for book-keeping. So not only do I need to group payments by date and discard all but the latest payment receipt date, I also need to watch out not to select receipts for which there no longer is a matching user record.
To solve the problem of selecting required records -- those that are due payment -- I need to find receipts with the latest payed_until_ts timestamp for each phone_nr (there may be several, obviously) and out of those records I further need to select only those phone numbers where payed_until_ts is earlier than the time the batch job executes. I then would send an SMS to each of these numbers, inserting a receipt record for each sent SMS, where payed_ts is now() and payed_until_ts is now() + interval 30 days.
But I can't seem to come up with the query required.
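select a, b from (select a, b from tbl order by b desc) as c group by a;
Note that this relies on MySQL's nonstandard handling of GROUP BY: b is neither aggregated nor part of the grouping, so which row's b is returned for each a is not guaranteed.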
Yes, grouping is done first, and it affects a single select whereas ordering affects all the results from all select statements in a union, such as:
select a, 'max', max(b) from tbl group by a
union all select a, 'min', min(b) from tbl group by a
order by 1, 2
(using field numbers in order by since I couldn't be bothered to name my columns). Each group by affects only its select, the order by affects the combined result set.
It seems that what you're after can be achieved with:
select A, max(B) from tbl group by A
This uses the max aggregation function to do what you wanted from the pre-group ordering (a decent DBMS doesn't actually sort the rows for this; it simply picks the maximum, from a suitable index if one is available).
SELECT DISTINCT a,b
FROM tbl t
WHERE b = (SELECT MAX(b) FROM tbl WHERE tbl.a = t.a);
According to your new rules (tested with PostgreSQL)
Query You'd Want:
SELECT pr.phone_nr, pr.payed_ts, pr.payed_until_ts
FROM payment_receipts pr
JOIN users
ON (pr.phone_nr = users.phone_nr)
JOIN (select phone_nr, max(payed_until_ts) as payed_until_ts
from payment_receipts
group by phone_nr
) sub
ON ( pr.phone_nr = sub.phone_nr
AND pr.payed_until_ts = sub.payed_until_ts)
ORDER BY pr.phone_nr, pr.payed_ts, pr.payed_until_ts;
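The question also asks to keep only users whose latest payed_until_ts lies before the time the batch job runs. A minimal sketch of that extra step follows, under stated assumptions: it runs in SQLite through Python's sqlite3 module, stores timestamps as ISO-8601 text (which compares correctly as strings), and passes the batch time as a :now parameter where MySQL would use NOW(). The function name due_users and the sample data are made up.

```python
import sqlite3

# Latest payed_until_ts per phone number, only for numbers that still
# have a user record, and only when that latest date is already past.
DUE_SQL = """
SELECT pr.phone_nr, MAX(pr.payed_until_ts) AS latest_payed_until_ts
FROM payment_receipts pr
JOIN users u ON u.phone_nr = pr.phone_nr
GROUP BY pr.phone_nr
HAVING MAX(pr.payed_until_ts) < :now
ORDER BY pr.phone_nr
"""

USERS = [1, 2]  # user 3 opted out, but the receipts remain
RECEIPTS = [    # (phone_nr, payed_ts, payed_until_ts)
    (1, "2010-06-01 00:00:00", "2010-07-01 00:00:00"),
    (1, "2010-07-01 00:00:00", "2010-07-31 00:00:00"),
    (2, "2010-07-15 00:00:00", "2010-08-14 00:00:00"),
    (3, "2010-05-01 00:00:00", "2010-05-31 00:00:00"),
]


def due_users(users, receipts, now):
    """Return (phone_nr, latest payed_until_ts) for users whose subscription has lapsed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (phone_nr INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE payment_receipts (phone_nr INTEGER NOT NULL, "
        "payed_ts TEXT NOT NULL, payed_until_ts TEXT NOT NULL, "
        "PRIMARY KEY (phone_nr, payed_ts, payed_until_ts))"
    )
    conn.executemany("INSERT INTO users VALUES (?)", [(u,) for u in users])
    conn.executemany("INSERT INTO payment_receipts VALUES (?, ?, ?)", receipts)
    rows = conn.execute(DUE_SQL, {"now": now}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(due_users(USERS, RECEIPTS, "2010-08-01 00:00:00"))
    # [(1, '2010-07-31 00:00:00')]
```

Filtering in HAVING rather than WHERE is the important design choice: a WHERE on payed_until_ts would discard recent receipts before grouping, so users with an old receipt and a current one would wrongly appear as due.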
Original Answer (with updates):
CREATE TABLE foo (a NUMERIC, b TEXT, c DATE);
INSERT INTO foo VALUES
(1,'a','2010-07-30'),
(1,'b','2010-07-30'),
(1,'c','2010-07-31'),
(1,'d','2010-07-31'),
(1,'a','2010-07-29'),
(1,'c','2010-07-29'),
(2,'a','2010-07-29'),
(2,'a','2010-08-01');
-- table contents
SELECT * FROM foo ORDER BY c,a,b;
a | b | c
---+---+------------
1 | a | 2010-07-29
1 | c | 2010-07-29
2 | a | 2010-07-29
1 | a | 2010-07-30
1 | b | 2010-07-30
1 | c | 2010-07-31
1 | d | 2010-07-31
2 | a | 2010-08-01
-- The following solutions both retrieve records based on the latest date
-- they both return the same result set, solution 1 is faster, solution 2
-- is easier to read
-- Solution 1:
SELECT foo.a, foo.b, foo.c
FROM foo
JOIN (select a, max(c) as c from foo group by a) bar
ON (foo.a=bar.a and foo.c=bar.c)
ORDER BY foo.a, foo.b, foo.c;
-- Solution 2:
SELECT a, b, MAX(c) AS c
FROM foo main
GROUP BY a, b
HAVING MAX(c) = (select max(c) from foo sub where main.a=sub.a group by a)
ORDER BY a, b;
a | b | c
---+---+------------
1 | c | 2010-07-31
1 | d | 2010-07-31
2 | a | 2010-08-01
(3 rows)
Comment:
The value a = 1 is returned twice because there are multiple b values with the same latest date. This is acceptable (and advised). Your data should never have this problem, because c is based on b's value.
create table user_payments
(
phone_nr int NOT NULL,
payed_until_ts datetime NOT NULL
)
insert into user_payments
(phone_nr, payed_until_ts)
values
(1, '2016-01-28'), -- today
(1, '2016-01-27'), -- yesterday
(2, '2016-01-27'), -- yesterday
(2, '2016-01-29') -- tomorrow
select phone_nr, MAX(payed_until_ts) as latest_payment
from user_payments
group by phone_nr
-- OUTPUT:
-- phone_nr latest_payment
-- 1 2016-01-28 00:00:00.000
-- 2 2016-01-29 00:00:00.000
In the above example, I have used a datetime column, but a similar query should work for a timestamp column.
The MAX function effectively does the job of the "ORDER BY" on the payed_until_ts column and picks the latest value for each phone_nr.
Also, you will get only one row for each phone_nr because of the "GROUP BY" clause.