Arranging the data on the basis of column value - SQL

I have a table which has the below structure.
+-----------------------+--------------+--------+
| timeStamp             | value        | type   |
+-----------------------+--------------+--------+
| '2010-01-14 00:00:00' | '11787.3743' | 'mean' |
| '2018-04-03 14:19:21' | '9.9908'     | 'std'  |
| '2018-04-03 14:19:21' | '11787.3743' | 'min'  |
+-----------------------+--------------+--------+
Now I want to write a SELECT query where I can fetch the data on the basis of type:
+-----------------------+--------------+--------------+----------+
| timeStamp             | mean_type    | min_type     | std_type |
+-----------------------+--------------+--------------+----------+
| '2010-01-14 00:00:00' | '11787.3743' |              |          |
| '2018-04-03 14:19:21' |              |              | '9.9908' |
| '2018-04-03 14:19:21' |              | '11787.3743' |          |
+-----------------------+--------------+--------------+----------+
Please help me with how I can do this in a Postgres DB by writing a query. I also want to get the data at intervals of 10 minutes only.

Use CASE ... WHEN ...:
with my_table(timestamp, value, type) as (
    values
        ('2010-01-14 00:00:00', 11787.3743, 'mean'),
        ('2018-04-03 14:19:21', 9.9908, 'std'),
        ('2018-04-03 14:19:21', 11787.3743, 'min')
)
select
    timestamp,
    case type when 'mean' then value end as mean_type,
    case type when 'min' then value end as min_type,
    case type when 'std' then value end as std_type
from my_table;
      timestamp      | mean_type  |  min_type  | std_type
---------------------+------------+------------+----------
 2010-01-14 00:00:00 | 11787.3743 |            |
 2018-04-03 14:19:21 |            |            |   9.9908
 2018-04-03 14:19:21 |            | 11787.3743 |
(3 rows)
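If each timestamp should come back as a single row instead of three sparse ones, the CASE expressions can be wrapped in aggregates. A minimal sketch using Postgres's FILTER clause, assuming at most one row per (timestamp, type) pair:
select
    timestamp,
    max(value) filter (where type = 'mean') as mean_type,
    max(value) filter (where type = 'min')  as min_type,
    max(value) filter (where type = 'std')  as std_type
from my_table
group by timestamp
order by timestamp;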

Related

Replace null values with most recent non-null values SQL

I have a table where each row consists of an ID, a date, and variable values (e.g. var1).
When there is a null value for var1 in a row, I want to replace the null with the most recent non-null value before that date for that ID. How can I do this quickly for a very large table?
So presume I start with this table:
+----+--------------+------+
| id | date         | var1 |
+----+--------------+------+
| 1  | '01-01-2022' | 55   |
| 2  | '01-01-2022' | 12   |
| 3  | '01-01-2022' | 45   |
| 1  | '01-02-2022' | Null |
| 2  | '01-02-2022' | Null |
| 3  | '01-02-2022' | 20   |
| 1  | '01-03-2022' | 15   |
| 2  | '01-03-2022' | Null |
| 3  | '01-03-2022' | Null |
| 1  | '01-04-2022' | Null |
| 2  | '01-04-2022' | 77   |
+----+--------------+------+
Then I want this:
+----+--------------+------+
| id | date         | var1 |
+----+--------------+------+
| 1  | '01-01-2022' | 55   |
| 2  | '01-01-2022' | 12   |
| 3  | '01-01-2022' | 45   |
| 1  | '01-02-2022' | 55   |
| 2  | '01-02-2022' | 12   |
| 3  | '01-02-2022' | 20   |
| 1  | '01-03-2022' | 15   |
| 2  | '01-03-2022' | 12   |
| 3  | '01-03-2022' | 20   |
| 1  | '01-04-2022' | 15   |
| 2  | '01-04-2022' | 77   |
+----+--------------+------+
A CTE suits perfectly here.
This snippet returns the rows with the values filled in; turning it into an UPDATE is then straightforward (will update my response).
WITH selectcte AS
(
    -- only the rows that actually carry a value
    SELECT * FROM testnulls WHERE var1 IS NOT NULL
)
SELECT t1A.id, t1A.date, ISNULL(t1A.var1, t1B.var1) AS varvalue
FROM testnulls t1A
OUTER APPLY (SELECT TOP 1 var1           -- latest non-null value before this date
             FROM selectcte
             WHERE id = t1A.id AND date < t1A.date
             ORDER BY date DESC) t1B
You can dig further into CTEs here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver16
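For very large tables, a set-based alternative avoids running the APPLY subquery once per row. A hedged sketch of the classic carry-forward trick (works in SQL Server and Postgres): COUNT(var1) over the ordered partition increments only on non-null rows, so each null row lands in the same group as the last non-null value before it, and MAX per group fills the gap:
SELECT id, date,
       MAX(var1) OVER (PARTITION BY id, grp) AS var1
FROM (
    SELECT id, date, var1,
           -- COUNT ignores nulls, so grp changes only at non-null rows
           COUNT(var1) OVER (PARTITION BY id ORDER BY date) AS grp
    FROM testnulls
) t
ORDER BY date, id;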

BigQuery - Get most recent data for each individual user

I wonder if anyone here can help with a BigQuery piece I am working on.
This will need to pull the most recent gplus/currents activity for each individual user in the domain.
I have tried the following query, but this pulls every activity for every user:
SELECT
    TIMESTAMP_MICROS(time_usec) AS date,
    email,
    event_type,
    event_name
FROM
    `bqadminreporting.adminlogtracking.activity`
WHERE
    record_type LIKE 'gplus'
ORDER BY
    email ASC;
I have tried to use DISTINCT, but I still get multiple entries for the same user. Ideally, I need to do this looking back over 90 days (so between today and 90 days ago, get the most recent activity for each user, if that makes sense?), which brings me to the issue with another question.
EDIT:
Example data and expected output.
Fields: There are over 500 fields, I have just listed the relevant ones
+-------------------------------+---------+----------+
| Field name                    | Type    | Mode     |
+-------------------------------+---------+----------+
| time_usec                     | INTEGER | NULLABLE |
| email                         | STRING  | NULLABLE |
| event_type                    | STRING  | NULLABLE |
| event_name                    | STRING  | NULLABLE |
| record_type                   | STRING  | NULLABLE |
| gplus                         | RECORD  | NULLABLE |
| gplus.log_event_resource_name | STRING  | NULLABLE |
| gplus.attachment_type         | STRING  | NULLABLE |
| gplus.plusone_context         | STRING  | NULLABLE |
| gplus.post_permalink          | STRING  | NULLABLE |
| gplus.post_resource_name      | STRING  | NULLABLE |
| gplus.comment_resource_name   | STRING  | NULLABLE |
| gplus.post_visibility         | STRING  | NULLABLE |
| gplus.user_type               | STRING  | NULLABLE |
| gplus.post_author_name        | STRING  | NULLABLE |
+-------------------------------+---------+----------+
Output from my query: This is the output I get when running my query above.
+-----+--------------------------------+------------------+----------------+----------------+
| Row | date                           | email            | event_type     | event_name     |
+-----+--------------------------------+------------------+----------------+----------------+
| 1   | 2020-01-30 07:10:19.088 UTC    | user1@domain.com | post_change    | create_post    |
| 2   | 2020-03-03 08:47:25.086485 UTC | user1@domain.com | coment_change  | create_comment |
| 3   | 2020-03-23 09:10:09.522 UTC    | user1@domain.com | post_change    | create_post    |
| 4   | 2020-03-23 09:49:00.337 UTC    | user1@domain.com | plusone_change | remove_plusone |
| 5   | 2020-03-23 09:48:10.461 UTC    | user1@domain.com | plusone_change | add_plusone    |
| 6   | 2020-01-30 10:04:29.757005 UTC | user1@domain.com | coment_change  | create_comment |
| 7   | 2020-03-28 08:52:50.711359 UTC | user2@domain.com | coment_change  | create_comment |
| 8   | 2020-11-08 10:08:09.161325 UTC | user2@domain.com | coment_change  | create_comment |
| 9   | 2020-04-21 15:28:10.022683 UTC | user3@domain.com | coment_change  | create_comment |
| 10  | 2020-03-28 09:37:28.738863 UTC | user4@domain.com | coment_change  | create_comment |
+-----+--------------------------------+------------------+----------------+----------------+
Desired result: Only 1 row of data per user, showing only the most recent event.
+-----+--------------------------------+------------------+----------------+----------------+
| Row | date                           | email            | event_type     | event_name     |
+-----+--------------------------------+------------------+----------------+----------------+
| 1   | 2020-03-23 09:49:00.337 UTC    | user1@domain.com | plusone_change | remove_plusone |
| 2   | 2020-11-08 10:08:09.161325 UTC | user2@domain.com | coment_change  | create_comment |
| 3   | 2020-04-21 15:28:10.022683 UTC | user3@domain.com | coment_change  | create_comment |
| 4   | 2020-03-28 09:37:28.738863 UTC | user4@domain.com | coment_change  | create_comment |
+-----+--------------------------------+------------------+----------------+----------------+
Use array_agg:
select
    email,
    array_agg(
        STRUCT(TIMESTAMP_MICROS(time_usec) as date, event_type, event_name)
        ORDER BY time_usec desc LIMIT 1
    )[OFFSET(0)].*
from `bqadminreporting.adminlogtracking.activity`
where
    record_type LIKE 'gplus'
    and time_usec > unix_micros(timestamp_sub(current_timestamp(), interval 90 day))
group by email
order by email
Test example:
with mytable as (
    select timestamp '2020-01-30 07:10:19.088 UTC' as date, 'user1@domain.com' as email, 'post_change' as event_type, 'create_post' as event_name union all
    select timestamp '2020-03-03 08:47:25.086485 UTC', 'user1@domain.com', 'coment_change', 'create_comment' union all
    select timestamp '2020-03-23 09:10:09.522 UTC', 'user1@domain.com', 'post_change', 'create_post' union all
    select timestamp '2020-03-23 09:49:00.337 UTC', 'user1@domain.com', 'plusone_change', 'remove_plusone' union all
    select timestamp '2020-03-23 09:48:10.461 UTC', 'user1@domain.com', 'plusone_change', 'add_plusone' union all
    select timestamp '2020-01-30 10:04:29.757005 UTC', 'user1@domain.com', 'coment_change', 'create_comment' union all
    select timestamp '2020-03-28 08:52:50.711359 UTC', 'user2@domain.com', 'coment_change', 'create_comment' union all
    select timestamp '2020-11-08 10:08:09.161325 UTC', 'user2@domain.com', 'coment_change', 'create_comment' union all
    select timestamp '2020-04-21 15:28:10.022683 UTC', 'user3@domain.com', 'coment_change', 'create_comment' union all
    select timestamp '2020-03-28 09:37:28.738863 UTC', 'user4@domain.com', 'coment_change', 'create_comment'
)
select
    email,
    array_agg(STRUCT(date, event_type, event_name) ORDER BY date desc LIMIT 1)[OFFSET(0)].*
from mytable
group by email
If you want all columns from the most recent row, you can use this BigQuery syntax:
select array_agg(t order by date desc limit 1)[ordinal(1)].*
from mytable t
group by t.email;
If you want specific columns, then Sergey's solution might be simpler.
An alternative way to solve your problem is to join against each user's maximum timestamp:
select t.*
from mytable t
join (select email, max(date) as max_dt
      from mytable
      group by email) m
  on t.email = m.email
 and t.date = m.max_dt;
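If your BigQuery project has the QUALIFY clause available, a row-number filter is another compact way to express "newest row per user"; a hedged sketch against the same table:
SELECT
    TIMESTAMP_MICROS(time_usec) AS date,
    email,
    event_type,
    event_name
FROM `bqadminreporting.adminlogtracking.activity`
WHERE record_type LIKE 'gplus'
-- keep only the most recent row for each email
QUALIFY ROW_NUMBER() OVER (PARTITION BY email ORDER BY time_usec DESC) = 1
ORDER BY email;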

SQL Count In Range

How could I count data in a range that can be configured?
Something like this:
CAR_AVBL
+--------+-----------+
| CAR_ID | DATE_AVBL |
+--------+-----------+
| JJ01   | 1         |
| JJ02   | 1         |
| JJ03   | 3         |
| JJ04   | 10        |
| JJ05   | 13        |
| JJ06   | 4         |
| JJ07   | 10        |
| JJ08   | 1         |
| JJ09   | 23        |
| JJ10   | 11        |
| JJ11   | 20        |
| JJ12   | 3         |
| JJ13   | 19        |
| JJ14   | 22        |
| JJ15   | 7         |
+--------+-----------+
ZONE_CFG
+------+-------------+
| DATE | ZONE_DESCR  |
+------+-------------+
| 15   | GREEN_ZONE  |
| 25   | YELLOW_ZONE |
| 30   | RED_ZONE    |
+------+-------------+
Table ZONE_CFG is configurable, so I cannot use static values for this.
The DATE column means the maximum date for each zone.
And the result I expect:
+-------------+----------+
| ZONE_DESCR  | AVBL_CAR |
+-------------+----------+
| GREEN_ZONE  | 11       |
| YELLOW_ZONE | 4        |
| RED_ZONE    | 0        |
+-------------+----------+
Please could someone help me with this?
You can use LAG and GROUP BY as follows:
SELECT
    ZC.ZONE_DESCR,
    COUNT(1) AS AVBL_CAR
FROM
    CAR_AVBL CA
JOIN ( SELECT
           ZONE_DESCR,
           COALESCE(LAG(DATE) OVER (ORDER BY DATE) + 1, 0) AS START_DATE,
           DATE AS END_DATE
       FROM ZONE_CFG ) ZC
    ON ( CA.DATE_AVBL BETWEEN ZC.START_DATE AND ZC.END_DATE )
GROUP BY
    ZC.ZONE_DESCR;
Note: Don't use Oracle reserved keywords (DATE, in your case) as column names. Try changing it to something like DATE_ or DATE_START etc.
Cheers!!
If you want the zero count for RED_ZONE, I might suggest a correlated subquery instead:
select z.*,
       (select count(*)
        from car_avbl c
        where c.date_avbl >= z.start_date and
              c.date_avbl <= z.date
       ) as avbl_car
from (select z.*,
             -- previous zone boundary + 1 marks where this zone begins
             lag(date, 1, 0) over (order by date) + 1 as start_date
      from zone_cfg z
     ) z;
In Oracle 12c, you can phrase this using a lateral join:
select z.*,
       (c.cnt - lag(c.cnt, 1, 0) over (order by z.date)) as cnt
from zone_cfg z left join lateral
     (select count(*) as cnt
      from car_avbl c
      where c.date_avbl <= z.date
     ) c
     on 1=1
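A further hedged variant: reuse the LAG boundaries from the first answer but drive the join from ZONE_CFG with a LEFT JOIN, so zones with no cars (RED_ZONE here) survive with a zero count:
select zc.zone_descr,
       count(ca.car_id) as avbl_car   -- counts matched rows only; 0 when none
from (select zone_descr,
             coalesce(lag(date) over (order by date) + 1, 0) as start_date,
             date as end_date
      from zone_cfg) zc
left join car_avbl ca
       on ca.date_avbl between zc.start_date and zc.end_date
group by zc.zone_descr;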

Find the period of occurrence of a value in a table

I have a table with the following data.
+------------+---------+
| Date       | Version |
+------------+---------+
| 1/10/2019  | 1       |
| ....       |         |
| 15/10/2019 | 1       |
| 16/10/2019 | 2       |
| ....       |         |
| 26/10/2019 | 2       |
| 27/10/2019 | 1       |
| ....       |         |
| 30/10/2019 | 1       |
+------------+---------+
I need to find the periods of occurrence for each version in the table.
E.g., suppose I need to get the Version 1 occurrence details: it is present from 1/10/2019 to 15/10/2019 and from 27/10/2019 to 30/10/2019. How can I query the database for such a result?
I have tried many ways but am not able to produce the desired result. I even doubt this is possible using a query!
Any inputs are highly appreciated.
Expected output:
+---------+-------------+------------+
| Version | Period from | Period To  |
+---------+-------------+------------+
| 1       | 1/10/2019   | 15/10/2019 |
| 2       | 16/10/2019  | 26/10/2019 |
| 1       | 27/10/2019  | 30/10/2019 |
+---------+-------------+------------+
This is a gaps-and-islands question.
Try this:
DECLARE #SampleData TABLE ( [Date] DATE, [Version] INT)
INSERT INTO #SampleData ([Date], [Version])
VALUES
('01-10-2019', 1), ('02-10-2019', 1), ('15-10-2019', 1),
('16-10-2019', 2), ('17-10-2019', 2),('26-10-2019', 2),
('27-10-2019', 1), ('28-10-2019', 1), ('30-10-2019', 1)
SELECT
Y.[Version]
,PeriodFrom = MIN(Y.[Date])
,PeriodTo = MAX(Y.[Date])
FROM(
SELECT
X.[Version]
,X.[Date]
,ISLAND = RN-ROW_NUMBER()OVER( PARTITION BY X.[Version] ORDER BY X.[Date])
FROM(
SELECT
RN=ROW_NUMBER()OVER( ORDER BY S.[Date])
,S.[Date]
,S.[Version]
FROM
#SampleData S
) X
) Y
GROUP BY
Y.[Version], Y.ISLAND
ORDER BY
PeriodFrom
Output
Version  PeriodFrom  PeriodTo
1        2019-10-01  2019-10-15
2        2019-10-16  2019-10-26
1        2019-10-27  2019-10-30
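The same islands can also be found with the LAG-plus-running-sum formulation, a hedged alternative over the same @SampleData: mark each row whose version differs from the previous row, then a running sum of those markers yields an island id.
SELECT [Version], MIN([Date]) AS PeriodFrom, MAX([Date]) AS PeriodTo
FROM (
    SELECT [Date], [Version],
           -- running total of change markers = island id
           SUM(chg) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS island
    FROM (
        SELECT [Date], [Version],
               CASE WHEN [Version] = LAG([Version]) OVER (ORDER BY [Date])
                    THEN 0 ELSE 1 END AS chg
        FROM @SampleData
    ) x
) y
GROUP BY [Version], island
ORDER BY PeriodFrom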

Get Data at the interval of 10 minutes

I have a table which has the below structure.
+-----------------------+--------------+--------+----+
| timeStamp             | value        | type   | id |
+-----------------------+--------------+--------+----+
| '2010-01-14 00:00:00' | '11787.3743' | 'mean' | 1  |
| '2018-04-03 00:07:21' | '9.9908'     | 'std'  | 1  |
| '2018-04-03 00:10:00' | '11787.3743' | 'min'  | 1  |
+-----------------------+--------------+--------+----+
Now I want to write a SELECT query where I can fetch the data on the basis of type.
Here you can see I want the data at intervals of 10 minutes only, and the columns 'mean_type'/'min_type'/'std_type' should be dynamic, built with something like concat(id, '_', 'mean'):
+-----------------------+--------------+--------------+-------+
| timeStamp             | 1_mean       | 1_min        | 1_std |
+-----------------------+--------------+--------------+-------+
| '2010-01-14 00:00:00' | '11787.3743' |              |       |
| '2018-04-03 00:10:00' |              | '11787.3743' |       |
+-----------------------+--------------+--------------+-------+
I have used the below query, but it is not working:
Query:
select
    to_timestamp(floor((extract('epoch' from m.timeStamp) / 600)) * 600)
        AT TIME ZONE 'UTC' as t,
    case type when 'mean' then value end as concat(id, 'mean'),
    case type when 'min' then value end as concat(id, 'min'),
    case type when 'std' then value end as concat(id, 'std'),
from measure m
where id = 1
GROUP by t
order by t;
I am using a Postgres DB.
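For what it's worth: SQL column aliases must be literal identifiers, so concat(id, '_', 'mean') cannot appear after AS; truly dynamic column names need dynamic SQL (EXECUTE in a PL/pgSQL function) or query text built client-side. A hedged sketch of the static-alias version, combining the 10-minute bucketing with an aggregated pivot (assuming the measure table above; the aliases are quoted because they start with a digit):
select
    to_timestamp(floor(extract(epoch from m.timeStamp) / 600) * 600)
        at time zone 'UTC' as t,
    max(m.value) filter (where m.type = 'mean') as "1_mean",
    max(m.value) filter (where m.type = 'min')  as "1_min",
    max(m.value) filter (where m.type = 'std')  as "1_std"
from measure m
where m.id = 1
group by t
order by t;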