Postgres generate series of columns - sql

I have a couple of tables
Table 1:
meter_id | date
---------|-----------
1        | 2019-01-01
Table 2:
meter_id | read_date  | period | read
---------|------------|--------|-----
1        | 2019-01-01 | 1      | 5
1        | 2019-01-01 | 2      | 6
1        | 2019-01-01 | 5      | 2
1        | 2019-01-01 | 6      | 1
1        | 2019-01-01 | 7      | 2
2        | 2019-01-01 | 1      | 3
2        | 2019-01-01 | 2      | 10
1        | 2019-01-02 | 6      | 7
Is it possible to generate a series of columns so I end up with something like this:
meter_id | read_date  | p_1 | p_2 | p_3 | p_4 | p_5 | p_6 | ...
---------|------------|-----|-----|-----|-----|-----|-----|----
1        | 2019-01-01 | 5   | 6   |     |     | 2   | 1   | ...
2        | 2019-01-01 | 3   | 10  |     |     |     |     |
1        | 2019-01-02 |     |     |     |     |     | 7   |
where there are 48 reads per day (one every half hour), without having to write multiple select statements?

You can use conditional aggregation:
select t1.meter_id, t1.date,
       max(t2.read) filter (where period = 1) as p_1,
       max(t2.read) filter (where period = 2) as p_2,
       max(t2.read) filter (where period = 3) as p_3,
       . . .
from table1 t1 join
     table2 t2
     on t1.meter_id = t2.meter_id and t1.date = t2.read_date
group by t1.meter_id, t1.date;
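Typing all 48 filter clauses by hand is tedious. As a hedged sketch (Postgres; the table and column names are the ones from the question), you can let the database build the statement for you, then execute the generated SQL as a second step:

-- Generates the full 48-column query text; run its output separately.
select 'select t1.meter_id, t1.date, '
       || string_agg(
            format('max(t2.read) filter (where t2.period = %s) as p_%s', p, p),
            ', ' order by p)
       || ' from table1 t1 join table2 t2'
       || ' on t1.meter_id = t2.meter_id and t1.date = t2.read_date'
       || ' group by t1.meter_id, t1.date;'
from generate_series(1, 48) as p;

Alternatively, the tablefunc extension's crosstab function pivots at run time, though it still requires the output columns to be declared.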

Related

How to create column for every single integer within a range in SQLite?

Here's some sample data from my table:
day_number | daily_users_count
-----------|------------------
1          | 1
3          | 1
6          | 1
7          | 1
9          | 2
10         | 2
I need all day_number values, from 1 to max(day_number), and I want daily_users_count to be zero if it isn't mentioned in this table.
It should look something like this:
day_number | daily_users_count
-----------|------------------
1          | 1
2          | 0
3          | 1
4          | 0
5          | 0
6          | 1
7          | 1
8          | 0
9          | 2
10         | 2
I think a left join with a table which has a number column with all integers from 1 to max(day_number) would work, if I put a default value for daily_users_count as 0.
What I don't get is how to create such a table where all integers within a certain range are present. Any alternate solutions or any ways to do this would be much appreciated.
You can do it with a recursive CTE, which returns all the day_numbers (including the missing ones), and then a LEFT join to the table:
with recursive cte as (
  select min(day_number) day_number from tablename
  union all
  select day_number + 1 from cte
  where day_number < (select max(day_number) from tablename)
)
select c.day_number,
       coalesce(t.daily_users_count, 0) daily_users_count
from cte c left join tablename t
  on t.day_number = c.day_number
Results:
| day_number | daily_users_count |
| ---------- | ----------------- |
| 1 | 1 |
| 2 | 0 |
| 3 | 1 |
| 4 | 0 |
| 5 | 0 |
| 6 | 1 |
| 7 | 1 |
| 8 | 0 |
| 9 | 2 |
| 10 | 2 |
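If your SQLite build includes the optional generate_series table-valued function (the bundled "series" extension; not every build ships it), a hedged non-recursive sketch of the same idea:

-- Assumes the series extension is available; "value" is the function's column.
select s.value as day_number,
       coalesce(t.daily_users_count, 0) as daily_users_count
from generate_series(1, (select max(day_number) from tablename)) s
left join tablename t on t.day_number = s.value;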

SQL: Filling in Missing Records with Conditional

I need to count the number of products that existed in inventory by date. In the database however, a product is only recorded when it was viewed by a consumer.
For example consider this basic table structure:
date | productId | views
July 1 | A | 8
July 2 | A | 6
July 2 | B | 4
July 3 | A | 2
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | C | 2
July 10 | A | 17
Using the following query, I attempt to determine the amount of products in inventory on a given date.
select date, count(distinct productId) as Inventory, sum(views) as views
from (
  select date, productId, count(*) as views
  from SomeTable
  group by date, productID
  order by date asc, productID asc
)
group by date
This is the output
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 1 | 2
July 4 | 3 | 18
July 5 | 1 | 2
July 10 | 1 | 17
My output is not an accurate reflection of how many products were in inventory due to missing rows.
The correct understanding of inventory is as follows:
- Product A was present in inventory from July 1 - July 10.
- Product B was present in inventory from July 2 - July 4.
- Product C was in inventory from July 4 - July 5.
The correct SQL output should be:
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 2 | 2
July 4 | 3 | 18
July 5 | 2 | 2
July 6 | 1 | 0
July 7 | 1 | 0
July 8 | 1 | 0
July 9 | 1 | 0
July 10 | 1 | 17
If you are following along, let me confirm that I am comfortable defining "in inventory" as the date difference between the first & last view.
I have followed this faulty process:
First I created a table which was the cartesian product of every productID & every date.
with Dates as (
  select date
  from SomeTable
  group by date
),
Products as (
  select productId
  from SomeTable
  group by productId
)
select Dates.date, Products.productId
from Dates cross join Products
Then I attempted to do a right outer join to reduce this to just the missing records:
with Records as (
  select date, productId, count(*) as views
  from SomeTable
  group by date, productId
),
Cartesian as (
  {See query above}
)
select Cartesian.date, Cartesian.productId, 0 as views -- for the upcoming union
from Cartesian right outer join Records
  on Cartesian.date = Records.date
where Records.productId is null
Then, with the missing rows in hand, I union them back onto the Records. In doing so, I create a new problem: extra rows.
date | productId | views
July 1 | A | 8
July 1 | B | 0
July 1 | C | 0
July 2 | A | 6
July 2 | B | 4
July 2 | C | 0
July 3 | A | 2
July 3 | B | 0
July 3 | C | 0
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | A | 2
July 5 | B | 0
July 5 | C | 0
July 6 | A | 0
July 6 | B | 0
July 6 | C | 0
July 7 | A | 0
July 7 | B | 0
July 7 | C | 0
July 8 | A | 0
July 8 | B | 0
July 8 | C | 0
July 9 | A | 0
July 9 | B | 0
July 9 | C | 0
July 10 | A | 17
July 10 | B | 0
July 10 | C | 0
And when I run my simple query
select date, count(distinct productId) as Inventory, sum(views) as views
on that table I get the wrong output again:
date | Inventory | views
July 1 | 3 | 8
July 2 | 3 | 10
July 3 | 3 | 2
July 4 | 3 | 18
July 5 | 3 | 2
July 6 | 3 | 0
July 7 | 3 | 0
July 8 | 3 | 0
July 9 | 3 | 0
July 10 | 3 | 17
My next thought would be to iterate through each productId, determine its first & last date, then union that with the Cartesian table with the condition that Cartesian.date falls between the first & last date for each specific product.
There's got to be an easier way to do this. Thanks.
Below is for BigQuery Standard SQL
#standardSQL
WITH dates AS (
  SELECT day FROM (
    SELECT MIN(day) min_day, MAX(day) max_day
    FROM `project.dataset.table`
  ), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
  SELECT productId, MIN(day) min_day, MAX(day) max_day
  FROM `project.dataset.table` t
  GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2019-07-01' day, 'A' productId, 8 views UNION ALL
  SELECT '2019-07-02', 'A', 6 UNION ALL
  SELECT '2019-07-02', 'B', 4 UNION ALL
  SELECT '2019-07-03', 'A', 2 UNION ALL
  SELECT '2019-07-04', 'A', 8 UNION ALL
  SELECT '2019-07-04', 'B', 6 UNION ALL
  SELECT '2019-07-04', 'C', 4 UNION ALL
  SELECT '2019-07-05', 'C', 2 UNION ALL
  SELECT '2019-07-10', 'A', 17
), dates AS (
  SELECT day FROM (
    SELECT MIN(day) min_day, MAX(day) max_day
    FROM `project.dataset.table`
  ), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
  SELECT productId, MIN(day) min_day, MAX(day) max_day
  FROM `project.dataset.table` t
  GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
-- ORDER BY day
The result is:
Row | day        | Inventory | views
----|------------|-----------|------
1   | 2019-07-01 | 1         | 8
2   | 2019-07-02 | 2         | 10
3   | 2019-07-03 | 2         | 2
4   | 2019-07-04 | 3         | 18
5   | 2019-07-05 | 2         | 2
6   | 2019-07-06 | 1         | 0
7   | 2019-07-07 | 1         | 0
8   | 2019-07-08 | 1         | 0
9   | 2019-07-09 | 1         | 0
10  | 2019-07-10 | 1         | 17
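The same shape ports to other engines. Here is a hedged Postgres sketch of the identical idea (a generated calendar checked against per-product first/last-view windows); sometable and the column names are stand-ins for your real ones:

-- dates: one row per calendar day between the first and last view overall.
-- ranges: each product's first and last view, i.e. its "in inventory" window.
with dates as (
  select g.day::date as day
  from (select min(day) as min_day, max(day) as max_day from sometable) b,
       generate_series(b.min_day, b.max_day, interval '1 day') as g(day)
), ranges as (
  select productid, min(day) as min_day, max(day) as max_day
  from sometable
  group by productid
)
select d.day,
       count(distinct r.productid) as inventory,
       coalesce(sum(t.views), 0) as views
from dates d
join ranges r on d.day between r.min_day and r.max_day
left join sometable t on t.day = d.day and t.productid = r.productid
group by d.day
order by d.day;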

Looking for duplicate transactions within a 5 minutes over a 24 hour time period

I am looking for duplicate transactions that occur within a 5-minute window at any point during a 24-hour period. I am trying to find users abusing other users' access. Here is what I have so far, but it only searches the past 5 minutes rather than the whole 24-hour period. It is Oracle.
SELECT p.id, Count(*) count
FROM tranledg tl,
patron p
WHERE p.id = tl.patronid
AND tl.trandate > (sysdate-5/1440)
AND tl.plandesignation in ('1')
AND p.id in (select id from tranledg tl where tl.trandate > (sysdate-1))
GROUP BY p.id
HAVING COUNT(*)> 1
Example data:
Patron
id | Name
--------------------------
1 | Joe
2 | Henry
3 | Tom
4 | Mary
5 | Sue
6 | Marie
Tranledg
tranid | trandate | location | patronid
--------------------------
1 | 2015-03-01 12:01:00 | 1500 | 1
2 | 2015-03-01 12:01:15 | 1500 | 2
3 | 2015-03-01 12:03:30 | 1500 | 1
4 | 2015-03-01 12:04:00 | 1500 | 3
5 | 2015-03-01 15:01:00 | 1500 | 4
6 | 2015-03-01 15:01:15 | 1500 | 4
7 | 2015-03-01 17:01:15 | 1500 | 2
8 | 2015-03-01 18:01:30 | 1500 | 1
9 | 2015-03-01 19:02:00 | 1500 | 3
10 | 2015-03-01 20:01:00 | 1500 | 4
11 | 2015-03-01 21:01:00 | 1500 | 5
I would expect the following data to be returned:
ID | COUNT
1 | 2
4 | 2
You can use an analytic clause with a range window like this:
select *
from (select tranid
           , patronid
           , count(*) over(partition by patronid
                           order by trandate
                           range between 0 preceding
                                     and 5/60/24 following) count
      from tranledg
      where trandate >= sysdate-1)
where count > 1
It outputs every transaction that is followed within 5 minutes by at least one more transaction for the same patronid, along with the count of transactions in that range; 5/60/24 is 5 minutes expressed in days, the unit of Oracle DATE arithmetic. (You did not specify what to do when there is more than one such range, or when the ranges overlap.)
Output on the test data (without the condition for sysdate as it already passed):
TRANID PATRONID COUNT
------ -------- -----
1 1 2
5 4 2
I did it using Postgres online; the Oracle version is very similar, just be careful with the date arithmetic.
You need a self join.
SELECT T1.patronid, count(*)
FROM Tranledg T1
JOIN Tranledg T2
ON T2."trandate" BETWEEN T1."trandate" + '-2 minute' AND T1."trandate" + '2 minute'
AND T1."patronid" = T2."patronid"
AND T1."tranid" <> T2."tranid"
GROUP BY T1.patronid;
Output: with the 2-minute window used here, only patron 4 qualifies; you would need to fix the data for patron 1 to also have two records within the window.
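A hedged Oracle translation of that self-join (an untested sketch, using the 5-minute window from the question and day-based DATE arithmetic):

-- 5/1440 is 5 minutes expressed in days (Oracle DATE arithmetic is in days).
SELECT t1.patronid, COUNT(*) AS dup_count
FROM tranledg t1
JOIN tranledg t2
  ON  t2.trandate BETWEEN t1.trandate - 5/1440 AND t1.trandate + 5/1440
  AND t1.patronid = t2.patronid
  AND t1.tranid <> t2.tranid
WHERE t1.trandate > SYSDATE - 1  -- restrict to the last 24 hours
GROUP BY t1.patronid;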

SQL Query to select each row with max value per group

I'm very new to SQL and this one has me stumped. Can you help me out with this query?
I have the following 2 tables:
TABLE 1: IssueTable
Id | RunId | Value
---
1 | 1 | 10
2 | 1 | 20
3 | 1 | 30
4 | 2 | 40
5 | 2 | 50
6 | 3 | 60
7 | 4 | 70
8 | 5 | 80
9 | 6 | 90
TABLE 2: RunTable
RunId | EnvironmentId
---
1 | 1
2 | 3
3 | 1
4 | 2
5 | 4
6 | 2
I need the IssueTable rows that represent the Max RunId grouped by the EnvironmentId in the RunTable. The result I would need from the tables is:
EXPECTED RESULT:
Id | RunId | Value | EnvironmentId
---
4 | 2 | 40 | 3
5 | 2 | 50 | 3
6 | 3 | 60 | 1
8 | 5 | 80 | 4
9 | 6 | 90 | 2
So I only want the rows with the most recent/highest RunId from the RunTable per EnvironmentId. For example, for EnvironmentId "1", I only want rows with a RunId of "3", because the most recent RunId on EnvironmentId "1" in the RunTable is "3". Likewise, the most recent run for EnvironmentId "2" was RunId "6".
Use a subquery to get the max runid for each environmentid from the runtable. Join the obtained result to the issuetable and select the required columns.
select i.id, i.runid, i.value, r.environmentid
from (select environmentid, max(runid) maxrunid
from runtable
group by environmentid) r
join issuetable i on i.runid = r.maxrunid
order by i.runid, i.id
These days one can use analytic functions like RANK, DENSE_RANK, and ROW_NUMBER to generate a ranking of your records. Window functions are part of the ANSI SQL:2003 standard, and I've encountered them on at least Teradata, Oracle, and SQL Server.
select Id, RunId, Value, EnvironmentId
from (
select i.*, r.EnvironmentId,
dense_rank() over (partition by r.EnvironmentId order by r.RunId desc) as RN
from issuetable i
inner join runtable r on (i.RunId = r.RunId)
) Q
where RN = 1
order by Id;
The inner query would yield the following results:
Id | RunId | Value | EnvironmentId | RN
---|-------|-------|---------------|---
1  | 1     | 10    | 1             | 2
2  | 1     | 20    | 1             | 2
3  | 1     | 30    | 1             | 2
4  | 2     | 40    | 3             | 1
5  | 2     | 50    | 3             | 1
6  | 3     | 60    | 1             | 1
7  | 4     | 70    | 2             | 2
8  | 5     | 80    | 4             | 1
9  | 6     | 90    | 2             | 1
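Note that DENSE_RANK (or RANK) rather than ROW_NUMBER matters here: Ids 4 and 5 share RunId 2, so both get RN = 1 and both survive the RN = 1 filter, whereas ROW_NUMBER would arbitrarily keep only one of them.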

Oracle GROUP BY similar timestamps?

I have an activity table with a structure like this:
id prd_id act_dt grp
------------------------------------
1 1 2000-01-01 00:00:00
2 1 2000-01-01 00:00:01
3 1 2000-01-01 00:00:02
4 2 2000-01-01 00:00:00
5 2 2000-01-01 00:00:01
6 2 2000-01-01 01:00:00
7 2 2000-01-01 01:00:01
8 3 2000-01-01 00:00:00
9 3 2000-01-01 00:00:01
10 3 2000-01-01 02:00:00
I want to split the data within this activity table by product (prd_id) and activity date (act_dt), and update the group (grp) column with a value from a sequence for each of these groups.
The kicker is, I need to group by similar timestamps, where similar means "all records have a difference of exactly 1 second." In other words, within a group, the difference between any 2 records when sorted by date will be exactly 1 second, and the difference between the first and last records can be any amount of time, so long as all the intermediary records are 1 second apart.
For the example data, the groups would be:
id prd_id act_dt grp
------------------------------------
1 1 2000-01-01 00:00:00 1
2 1 2000-01-01 00:00:01 1
3 1 2000-01-01 00:00:02 1
4 2 2000-01-01 00:00:00 2
5 2 2000-01-01 00:00:01 2
6 2 2000-01-01 01:00:00 3
7 2 2000-01-01 01:00:01 3
8 3 2000-01-01 00:00:00 4
9 3 2000-01-01 00:00:01 4
10 3 2000-01-01 02:00:00 5
What method would I use to accomplish this?
The size of the table is ~20 million rows, if that affects the method used to solve the problem.
I'm not an Oracle whiz, so I'm guessing at the best option for one line:
(act_dt - DATE '2010-01-01') * 24 * 60 * 60 AS time_id,
This just needs to be "the number of seconds from [aDateConstant] to act_dt", turning your act_dt into an integer (the result can be negative; the constant just has to be the subtrahend so that time_id grows with act_dt). The rest should work fine.
WITH sequenced_data AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt) AS sequence_id,
        (act_dt - DATE '2010-01-01') * 24 * 60 * 60 AS time_id,
        t.*
    FROM
        yourTable t
)
SELECT
    DENSE_RANK() OVER (PARTITION BY prd_id ORDER BY time_id - sequence_id) AS group_id,
    s.*
FROM
    sequenced_data s
Example data (t-s is time_id - sequence_id):
sequence_id | time_id | t-s | group_id
-------------+---------+-----+----------
1 | 1 | 0 | 1
2 | 2 | 0 | 1
3 | 3 | 0 | 1
4 | 8 | 4 | 2
5 | 9 | 4 | 2
6 | 12 | 6 | 3
7 | 14 | 7 | 4
8 | 15 | 7 | 4
NOTE: This does assume there are not multiple records with the same time. If there are, they would need to be filtered out first. Probably just using a GROUP BY in a preceding CTE.
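To actually write the computed groups back to grp, here is a hedged sketch in Oracle MERGE syntax ("activity" is an assumed table name, and numbering the islands with DENSE_RANK stands in for pulling values from a sequence object):

-- island is constant within each run of 1-second-apart rows per product;
-- DENSE_RANK over (prd_id, island) then numbers the groups 1..N globally,
-- matching the grp values in the question's expected output.
MERGE INTO activity a
USING (
    SELECT id,
           DENSE_RANK() OVER (ORDER BY prd_id, island) AS new_grp
    FROM (
        SELECT id, prd_id,
               (act_dt - DATE '2010-01-01') * 86400
                 - ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt) AS island
        FROM activity
    )
) g
ON (a.id = g.id)
WHEN MATCHED THEN UPDATE SET a.grp = g.new_grp;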