I need your help with the following case.
I have a table ChartData.
That table holds a large number of records, like these:
PK  ProjectID  MachineId  Powervalue  PowerData
1   1          1          20.5        2011-07-05 12:00:00
2   1          1          21.5        2011-07-05 12:01:00
3   1          1          22.5        2011-07-05 12:02:00
4   1          1          23.5        2011-07-05 12:03:00
5   1          1          24.5        2011-07-05 12:04:00
6   1          1          25.5        2011-07-05 12:05:00
7   1          1          26.5        2011-07-05 12:06:00
8   1          1          27.5        2011-07-05 12:07:00
9   1          1          26.5        2011-07-05 12:08:00
10  1          1          28.5        2011-07-05 12:09:00
Output:
PK  ProjectID  MachineId  Powervalue (avg value of power)  PowerData
1   1          1          20.5                             2011-07-05 12:00:00
6   1          1          25.5                             2011-07-05 12:05:00
Any help would be appreciated. Thanks in advance.
Select * from ChartData where PK=1 or PK=6
You are not doing any averages as far as I know.
I guess you want to aggregate every five tuples? If so, maybe something like this will help:
SELECT PK, ProjectID, MachineID, AVG(Powervalue), PowerData FROM table GROUP BY (PK-1)/5
although I don't really know if you can do that with GROUP BY
#gbn: I would like to reply to your comment directly, but I think I lack the privilege. Can you explain why this will fail on everything but MySQL?
You just want the average powervalue?
SELECT AVG(PowerValue) FROM ChartData
If you, for example, want the average powervalue per machine, the query becomes:
SELECT MachineID, AVG(PowerValue) FROM ChartData GROUP BY MachineID
Select every 5th record (T-SQL).
select
    ch.*
from
    (select
         row_number() over (order by pk asc) as r,
         pk
     from
         chartdata
    ) as r
    inner join chartdata as ch
        on r.pk = ch.pk
where
    (r.r - 1) % 5 = 0;
Without further clarification on what you want this is the best answer I can give.
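If the intention was instead to average Powervalue over each block of 5 rows (as the "Avg value of power" label in the expected output suggests), something along these lines might work. This is only a sketch, assuming T-SQL as in the answer above; the block numbering via integer division is my own addition and is not confirmed by the question:

-- number the rows per project/machine, bucket them into blocks of 5,
-- then average Powervalue within each block
select
    min(pk)          as first_pk,
    ProjectID,
    MachineId,
    avg(Powervalue)  as avg_powervalue,
    min(PowerData)   as block_start
from
    (select
         *,
         (row_number() over (partition by ProjectID, MachineId
                             order by PowerData) - 1) / 5 as block_no
     from chartdata
    ) as b
group by ProjectID, MachineId, block_no;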
I am looking to filter very large tables down to the latest entry per user per month. I'm not sure I have found the best way to do this. I know I "should" trust the SQL engine (Snowflake), but part of me does not like the join on three columns.
Note that this is a very common operation on many big tables, and I want to use it in DBT views which means it will get run all the time.
To illustrate, my data is of this form:
mytable
userId  loginDate   year  month  value
1       2021-01-04  2021  1      41.1
1       2021-01-06  2021  1      411.1
1       2021-01-25  2021  1      251.1
2       2021-01-05  2021  1      4369
2       2021-02-06  2021  2      32
2       2021-02-14  2021  2      731
3       2021-01-20  2021  1      258
3       2021-02-19  2021  2      4251
3       2021-03-15  2021  3      171
And I'm trying to use SQL to get the last value (by loginDate) for each month.
I'm currently doing a groupby & a join as follows:
WITH latest_entry_by_month AS (
    SELECT "userId", "year", "month", max("loginDate") AS "loginDate"
    FROM mytable
    GROUP BY "userId", "year", "month"
)
SELECT * FROM mytable NATURAL JOIN latest_entry_by_month
The above results in my desired output:
userId  loginDate   year  month  value
1       2021-01-25  2021  1      251.1
2       2021-01-05  2021  1      4369
2       2021-02-14  2021  2      731
3       2021-01-20  2021  1      258
3       2021-02-19  2021  2      4251
3       2021-03-15  2021  3      171
But I'm not sure if it's optimal.
Any guidance on how to do this faster? Note that I am not materializing the underlying data, so it is effectively un-clustered (I'm getting it from a vendor via the Snowflake marketplace).
Using QUALIFY and windowed function(ROW_NUMBER):
SELECT *
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY userId, year, month
ORDER BY loginDate DESC) = 1
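If only the latest value (rather than the entire row) is needed, an aggregate form is another option. A sketch, assuming Snowflake's MAX_BY aggregate is available in your account:

-- latest loginDate per user per month, plus the value recorded on that date
SELECT "userId", "year", "month",
       MAX("loginDate")              AS "loginDate",
       MAX_BY("value", "loginDate")  AS "value"
FROM mytable
GROUP BY "userId", "year", "month";

Like the QUALIFY version, this avoids the self-join over three columns.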
I have a table like below:
id  number  date
1   23      2020-01-01
2   12      2020-03-02
3   23      2020-09-02
4   11      2019-03-04
5   12      2019-03-23
6   23      2019-04-12
I want to know how many times each number appears per year, such as:
number  2019  2020
23      1     2
12      1     1
11      1     0
I'm kind of stuck. I tried a LEFT JOIN and also a single SELECT, but still cannot figure out how to do it. Please help, thank you!
SELECT C.NUMBER,
       SUM(CASE
               WHEN C.DATE BETWEEN '20190101' AND '20191231'
               THEN 1 ELSE 0
           END) AS A_2019,
       SUM(CASE
               WHEN C.DATE BETWEEN '20200101' AND '20201231'
               THEN 1 ELSE 0
           END) AS A_2020
FROM I_have_a_table_like_below AS C
GROUP BY C.NUMBER
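If you don't need the years pivoted into columns, a plain two-column grouping gives the same counts as rows. A sketch, assuming a YEAR() function (SQL Server/MySQL style) and a placeholder table name mytable:

-- one row per (number, year) with its count; missing combinations simply don't appear,
-- unlike the pivoted form above which shows 0
SELECT number, YEAR(date) AS yr, COUNT(*) AS times
FROM mytable
GROUP BY number, YEAR(date)
ORDER BY number, yr;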
Is there a way to find a solution where, for 2 days, there are 2 UDs (because June 24 appears 2 times), and for the rest there are single days?
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is: 4 users used the application on 1 day, and 2 users used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
from t
group by date
) d
group by cnt
order by cnt desc;
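If the intent is instead to bucket users by how many distinct days they used the application (as the last sentence of the question suggests), the inner grouping would be on UD rather than on date. A sketch, with the table again assumed to be named t:

-- count distinct days per user, then count how many users fall in each bucket
select days_used, count(*) as no_of_users
from (select UD, count(distinct date) as days_used
      from t
      group by UD
     ) u
group by days_used
order by days_used;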
I have a DB2 table like below:
Date1 Item_code Amt
2018-06-01 1 2
2018-06-02 1 3
2018-06-03 2 4
2018-06-03 2 5
2018-06-04 3 6
2018-06-05 3 7
2018-06-06 4 8
I need the cumulative sum per item_code, per day. The result should look like:
Date1 Item_code Amt
2018-06-01 1 2
2018-06-02 1 5
2018-06-03 2 9
2018-06-04 3 6
2018-06-05 3 13
2018-06-06 4 8
I have tried a lot by myself and also searched on SO, but nothing fulfils my need. There are plenty of examples for a cumulative sum day-wise irrespective of item code.
Any help is greatly appreciated. Thanks in advance.
I think you want aggregation with a cumulative sum:
select item_code, date1,
       sum(sum(amt)) over (partition by item_code order by date1) as running_amt
from t
group by item_code, date1;
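The inner sum(amt) is the ordinary GROUP BY aggregate (the daily total per item_code), and the outer sum(...) over (...) turns those daily totals into a running total. The same query written with an explicit intermediate step, as a sketch with the table again assumed to be named t:

-- first collapse to one row per item_code per day, then take a running sum over days
with daily as (
    select item_code, date1, sum(amt) as day_amt
    from t
    group by item_code, date1
)
select item_code, date1,
       sum(day_amt) over (partition by item_code order by date1) as running_amt
from daily;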
I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (ie. https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
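For completeness, here is a sketch of the full query with the keyword column bracketed and the seconds converted back to a time-of-day value. It assumes SQL Server (as in the linked answer) and that the columns are named [Group], [ID], and [Time]:

-- LAG() returns NULL for the first row of each group, so the difference is NULL there too;
-- the CONVERT/DATEADD trick assumes gaps shorter than 24 hours
select t.*,
       convert(time,
               dateadd(second,
                       datediff(second,
                                lag([Time]) over (partition by [Group] order by [ID]),
                                [Time]),
                       0)) as Difference
from t;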