How to Find the Last Change Row in SQL - BigQuery

Can someone please provide a query that I can use in Google BigQuery to identify the total count of users for whom the value changed specifically from 'C' to 'P'? In the table below, userid = 123 satisfies this even though it later changes back from 'P' to 'C'.
userid timestamp Value
123 9-15-2020 02:35:45 C
456 9-15-2020 01:45:09 P
789 9-15-2020 06:22:10 P
123 9-15-2020 03:43:00 P
456 9-15-2020 03:45:10 C
123 9-15-2020 07:40:34 C

You can try using lag():
select userid
from (
      select userid, timestamp, value,
             lag(value) over (partition by userid order by timestamp) as prev_value
      from tablename
     ) A
where value = 'P' and prev_value = 'C'

Can someone please provide a query that I can use in Google Big Query to identify the total count of users for whom the value changed specifically from 'C' to 'P'
Note that this is not consistent with the title of the question.
lag() is the key idea. But it is unclear whether you want the count of users or the count of changes. This calculates both:
select count(*) as num_changes,
       count(distinct userid) as num_users_with_change
from (select t.*,
             lag(value) over (partition by userid order by timestamp) as prev_value
      from tablename t
     ) t
where value = 'P' and prev_value = 'C';
The second column counts a user only once, regardless of the number of times they have changed (which is my interpretation of your question).
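If only the single number from the question is needed (users who changed from 'C' to 'P' at least once), a minimal sketch reusing the same lag() derived table would be:
-- Sketch: only the distinct-user count, using the same lag() logic as above
select count(distinct userid) as num_users_with_change
from (select t.*,
             lag(value) over (partition by userid order by timestamp) as prev_value
      from tablename t
     ) t
where value = 'P' and prev_value = 'C';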

identify the total count of users for whom the value changed specifically from 'C' to 'P'?
Below is for BigQuery Standard SQL
#standardSQL
SELECT COUNT(DISTINCT userid) AS qualified_users
FROM `project.dataset.table`
GROUP BY userid
HAVING STRPOS(STRING_AGG(value, '' ORDER BY timestamp), 'CP') > 0
Note: I assume your timestamp column is of TIMESTAMP data type; otherwise you will need to use PARSE_TIMESTAMP in the ORDER BY portion.
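For example, if timestamp is stored as a STRING, the ORDER BY inside STRING_AGG could parse it first. This is only a sketch; the format string is a guess based on the sample rows (e.g. 9-15-2020 02:35:45) and may need adjusting to your actual data:
#standardSQL
-- Sketch for STRING timestamps: parse the string before ordering.
-- '%m-%d-%Y %H:%M:%S' is assumed from the sample data; adjust as needed.
SELECT COUNT(DISTINCT userid) AS qualified_users
FROM `project.dataset.table`
GROUP BY userid
HAVING STRPOS(STRING_AGG(value, '' ORDER BY PARSE_TIMESTAMP('%m-%d-%Y %H:%M:%S', timestamp)), 'CP') > 0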

Related

PostgreSQL fill out nulls with previous value by category

I am trying to fill out some nulls where I just need them to be the previous available value for a name (sorted by date).
So, from this table:
I need the query to output this:
Now, the idea is that for Jane, on the second and third there was no score, so it should be equal to the score from the previous date on which a score was available for Jane. And the same for Jon. I have tried coalesce and range, but range is not implemented yet in Redshift. I also looked into other questions and they don't fully apply to different categories. Any alternatives?
Thanks!
select day, name,
       coalesce(score, (select t.score
                        from [your table] as t
                        where t.name = [your table].name
                          and t.day < [your table].day
                          and t.score is not null
                        order by t.day desc
                        limit 1)) as score
from [your table]
The query straightforwardly implements the logic you described:
if score is not null, coalesce will return its value without executing the subquery
if score is null, the subquery will return the last available score for that name before the given date
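A concrete version of the same idea, written against a hypothetical data(day, name, score) table (the same names the next answer uses). Note that Redshift restricts some correlated-subquery patterns, so the gaps-and-islands query below may be the safer route there:
-- Sketch against a hypothetical table data(day, name, score)
select d.day, d.name,
       coalesce(d.score,
                (select t.score
                 from data t
                 where t.name = d.name
                   and t.day < d.day
                   and t.score is not null   -- only carry forward real scores
                 order by t.day desc
                 limit 1)) as score
from data d
order by d.name, d.day;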
It's a "gaps and islands" problem and a query can be like this
SELECT day,
       name,
       MAX(score) OVER (PARTITION BY name, group_id) AS score
FROM (
      SELECT *,
             SUM(CASE WHEN score IS NULL THEN 0 ELSE 1 END)
                 OVER (PARTITION BY name ORDER BY day) AS group_id
      FROM data
     ) groups
ORDER BY name DESC, day
You can check a working demo here
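If you want to try it locally, here is a small hypothetical setup mirroring the Jane/Jon example from the question (the dates and scores are invented for illustration):
-- Hypothetical sample data: Jane and Jon each have days with no score
CREATE TABLE data (day date, name varchar(10), score int);

INSERT INTO data VALUES
  ('2020-01-01', 'Jane', 10),
  ('2020-01-02', 'Jane', NULL),
  ('2020-01-03', 'Jane', NULL),
  ('2020-01-04', 'Jane', 15),
  ('2020-01-01', 'Jon',  7),
  ('2020-01-02', 'Jon',  NULL),
  ('2020-01-03', 'Jon',  9);

-- With this data, the query above should fill Jane's rows for the 2nd and 3rd
-- with 10 and Jon's row for the 2nd with 7.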

Taking Last Element of Every User ID In SQL Table

I am trying to figure out how to do the following in SQL (I'm specifically working in Teradata). Given the following table of user IDs, transaction dates, and items bought, how do I get the last item bought for each user ID? For example:
User ID Date Product
123 12/01/1996 A
123 12/02/1996 B
123 12/03/1996 C
124 12/01/1996 B
124 12/04/1996 A
123 12/05/1996 D
So the query would return in this case:
User ID Last Product Bought
123 D
124 A
And so forth. I tried using a Partition By or Window function in Teradata, but could not figure out how to implement it.
Thanks for your help.
Apply Teradata's proprietary syntax for filtering Windowed Aggregates:
select *
from tab
qualify
  row_number()
  over (partition by User_ID        -- each user
        order by Date_col desc) = 1 -- latest row
In Teradata, you can use row_number() and qualify to solve this top-1-per-group problem:
select t.*
from mytable t
qualify row_number() over(partition by user_id order by date desc) = 1

Teradata SQL Get Rid of Duplicates with Specific Order

I just started Teradata SQL this week, so sorry if I don't phrase things correctly. I originally created a script in R that gets rid of duplicates within my table, but now I need to transfer this code into SQL. Here is some sample data:
I want to get rid of any D's in the DELETE column, partition by ID, order by STATUS, DATE, and AMOUNT (with actual dates and amounts before ?s). I want STATUS to go in this order: P, H, F, U, T. I want the first row that has STATUS, DATE, and AMOUNT filled out (with STATUS in order). Here is the example output data:
I'm really stuck on the order issue and the code I've written isn't producing any data at all (but no errors).
SAMPLE CODE:
CREATE VOLATILE TABLE new_tble
AS
(SELECT *
FROM table
QUALIFY row_number() OVER (partition BY ID ORDER BY ID, DATE, AMOUNT)=1
WHERE DELETE <> 'D'
)
with data;
This is a direct translation of your description into Teradata SQL, assuming ? means NULL:
select *
from tab
where "delete" is null
and "date" is not null
and amount is not null
qualify
row_number()
over (partition by id
order by case status
when 'P' then 1
when 'H' then 2
when 'F' then 3
when 'U' then 4
when 'T' then 5
end
,"date"
,amount) = 1
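If the end goal is still to land these rows in a volatile table as in the original attempt, the same SELECT can be wrapped back into that shell. A sketch reusing the table and column names above, with ON COMMIT PRESERVE ROWS added so the rows survive past the creating transaction:
CREATE VOLATILE TABLE new_tble AS
(
  SELECT *
  FROM tab
  WHERE "delete" IS NULL
    AND "date" IS NOT NULL
    AND amount IS NOT NULL
  QUALIFY ROW_NUMBER()
          OVER (PARTITION BY id
                ORDER BY CASE status
                           WHEN 'P' THEN 1
                           WHEN 'H' THEN 2
                           WHEN 'F' THEN 3
                           WHEN 'U' THEN 4
                           WHEN 'T' THEN 5
                         END,
                         "date",
                         amount) = 1
) WITH DATA
ON COMMIT PRESERVE ROWS;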

Highest Record for a set user

Hope someone can help.
I have been trying a few queries but I do not seem to be getting the desired result.
I need to identify the highest "claimed" users within my table without discarding the columns from the final report.
The user can have more than one record in the table, however the data will be completely different as only the user will match.
The below query only provides me the count per user without giving me the details.
SELECT User, count (*) total_record
FROM mytable
GROUP BY User
ORDER BY count(*) desc
Table:
mytable
Column 1 = User Column 2 = Ref Number Column 3 = Date
The first column will be the unique identifier, however the data in the other columns will differ, therefore the output needs to go from the highest claimed user (with all of that user's rows) down to the least claimed user.
User|Ref Num|Date
1|a|20150317
1|b|20150317
2|c|20150317
3|d|20150317
4|e|20150317
1|f|20150317
4|e|20150317
The below data is how the values should be returned.
User|Ref Num|Date|Count
1|a|20150317|3
1|b|20150317|3
1|f|20150317|3
2|c|20150317|1
3|d|20150317|1
4|e|20150317|2
4|e|20150317|2
Hope it makes sense.
Thank you
As you're using MSSQL you can use the OVER() clause like so:
SELECT [user], mt.ref_num, mt.[date], COUNT(mt.[user]) OVER(PARTITION BY mt.[user])
FROM myTable mt
More about the OVER clause can be found here: https://msdn.microsoft.com/en-us/library/ms189461.aspx
As per your comment you can use the wildcard * like so:
SELECT mt.*, COUNT(mt.[user]) OVER(PARTITION BY mt.[user])
FROM myTable mt
This would get you every column as well as the result of the count.
If you want to order by the number of records for each user, then use window functions instead of aggregation:
SELECT t.*
FROM (SELECT t.*, count(*) OVER (partition by [user]) as cnt
      FROM mytable t
     ) t
ORDER BY cnt DESC, [user];
Note that I added user to the order by so users with the same count will appear together in the list.
You could use an outer apply if your version of SQL Server supports it:
SELECT [User], [Ref Num], Date, total_record
FROM mytable M
OUTER APPLY (
SELECT count(*) total_record
FROM mytable
WHERE [user] = M.[user]
GROUP BY [user]
) oa
ORDER BY total_record desc, [user]
Note that user is a reserved keyword in MSSQL and you need to enclose it in either brackets [user] or double-quotes "user".
This would produce an output like:
user Ref Num Date total_record
1 a 2015-03-17 3
1 b 2015-03-17 3
1 f 2015-03-17 3
4 e 2015-03-17 2
4 e 2015-03-17 2
2 c 2015-03-17 1
3 d 2015-03-17 1
Note that the answers using the count(*) OVER (partition by [user]) construct are more efficient though.
The simplest way would be to use a window function.
SELECT t.*, COUNT(*) OVER (PARTITION BY t.[user])
FROM nameoftable t -- t is an alias for the table
ORDER BY t.[user], t.ref_num
This also seems to fit your need.
This is the old way of doing it. Where possible you should use OVER but as other people have answered with that I thought I'd throw this one into the mix.
SELECT
T.[User]
,T.[Ref Num]
,T.[Date]
,(SELECT count(*) from [myTable] T2 where T2.[User] = T.[USER]) as [Count]
FROM [mytable] T
ORDER BY [Count] DESC

How to select multiple rows in SQL Server while filling one column with the first value

Each of my rows has a date. I want the database to keep the right date, but I am in a situation where I want only the first date while still keeping all the other rows. So I would like to fill the date column with that same date throughout my result.
For example (because I don't think I expressed myself well):
I have this:
name value date
a 10 5/13
b 14 2/13
c 20 1/13
a 11 7/13
a 5 8/13
b 8 9/13
I want it to become like this in the result:
name value date
a 26 5/13
b 22 5/13
c 20 5/13
I searched for this but I only found how to select the first row.
For now I'm doing:
SELECT name, SUM(value), date FROM table
ORDER BY name
And I'm kind of clueless for what to do next.
Thanks :)
SQL tables don't have a built-in concept of "first". Here is an attempt, but no guarantees unless you have an ordering that determines which row is first:
select name, sum(value), const.date
from table cross join
(select top 1 date from table) const
group by name, const.date
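If "first" means the earliest date in the table, adding an ORDER BY to the TOP 1 subquery makes it deterministic. A sketch, with yourtable standing in for the real table name:
-- Sketch: make "first" explicit by ordering the TOP 1 subquery
select t.name, sum(t.value) as value, const.date
from yourtable t cross join
     (select top 1 date from yourtable order by date) const -- earliest date overall
group by t.name, const.date;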
If you only want to do this for a query, to provide this aggregated data for some specific client requirement, then #freshPrince's answer is appropriate. But if you want to actually modify the data in the table itself, and prevent the issue from arising again, then you need to change the schema.
Create Table newTable(
name varChar(30) not null,
date datetime not null,
value decimal(10,2) not null default(0),
primary key (name, date) )
Insert newTable (name, date, value)
Select name, Min(date), SUM(value)
FROM currentTable
Group By Name
and delete the old table... then rename the new table to whatever...
You will also have to modify the process used to insert new rows so that instead of always inserting a new row, it updates the existing row for a specified name and date if it already exists...
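One way to do that in SQL Server is a MERGE per incoming row. This is only a sketch; @name, @date and @value are assumed parameters (e.g. of a stored procedure), and whether the update should add to or replace the stored value is up to your rule:
-- Hypothetical upsert: insert the (name, date) row if it is new,
-- otherwise fold the incoming value into the existing row.
MERGE newTable AS target
USING (SELECT @name AS name, @date AS date, @value AS value) AS src
   ON target.name = src.name AND target.date = src.date
WHEN MATCHED THEN
    UPDATE SET value = target.value + src.value   -- or replace, per your rule
WHEN NOT MATCHED THEN
    INSERT (name, date, value) VALUES (src.name, src.date, src.value);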
Your question is slightly confusing since your desired result shows a date that does not exist with either b or c, but if that is the result that you want, you could use something similar to the following:
select name, sum(value) value, d.date
from yt
cross join
(
select min(date) date
from yt
where name = (select min(name)
from yt)
) d
group by name, d.date;
See SQL Fiddle with Demo
But it seems like you actually would want the min(date) for each name:
select name, sum(value) value, min(date)
from yt
group by name;
See SQL Fiddle with Demo.
If the date should be determined by the ordering of the name, then you could use:
select t.name, sum(value) value, d.date
from yt t
cross join
(
select top 1 name, date
from yt
order by name, date
) d
group by t.name, d.date;
See Demo