PostgreSQL count how many transactions in the last 7 days - sql

Hi I'm a SQL noobie and have been working on this problem for hours on end.
I have a table of transactions and the field txnDate is of date data type. I've altered the table to add a column called txnLast7days which should count how many transactions exist in the table within the last 7 days of txnDate.
This is my table
What statement can I use to update all the table records at once, so that it counts the number of transactions within a 7-day period based on txnDate and stores the result in the txnLast7days column for each row?
This is the statement I'm currently using based on a suggestion, but I'm still not getting the right result.
UPDATE temp2
SET txnLast7Days = subquery.txnLast7Days
FROM
(
SELECT txnDate, sum(dateCounts.transactionCount) OVER (ORDER BY txnDate ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) as txnLast7Days
FROM (SELECT count(*) transactionCount, txnDate FROM temp2 GROUP BY txnDate) as dateCounts
) subquery
WHERE temp2.txnDate = subquery.txnDate
My current query is not updating txnLast7Days with the right count.

What you need to do is get a count for each txnDate and then get the rolling 7 day count for each txnDate.
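The former is done with a simple COUNT(*) and GROUP BY on your table. The latter is done with a window function that looks back over the last 7 records, ordered by txnDate, and sums those counts up. Note that a ROWS frame counts rows, not calendar days, so it matches a 7-day window only if every date appears in the table.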
You can then use those results in an UPDATE query to populate your new column.
UPDATE yourtable
SET txnLast7Days = subquery.txnLast7Days
FROM
(
    -- 6 PRECEDING plus the current row gives a frame of 7 rows
    SELECT txnDate, sum(dateCounts.transactionCount) OVER (ORDER BY txnDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as txnLast7Days
    FROM (SELECT count(*) transactionCount, txnDate FROM yourtable GROUP BY txnDate) as dateCounts
) subquery
-- qualify txnDate, otherwise the reference is ambiguous between yourtable and subquery
WHERE yourtable.txnDate = subquery.txnDate
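A ROWS frame counts rows rather than calendar days, so the query above only lines up with a 7-day window when no dates are missing. As a rough sketch of an alternative, a RANGE frame with an interval offset measures the window in days instead. This assumes PostgreSQL 11 or later (where RANGE frames accept an offset) and assumes "the last 7 days" means the current txnDate plus the six days before it; change the interval if you mean something else.

```sql
-- Sketch only: requires PostgreSQL 11+ for RANGE ... PRECEDING with an interval.
-- Assumption: the window is txnDate itself plus the 6 preceding calendar days.
UPDATE yourtable
SET txnLast7Days = subquery.txnLast7Days
FROM (
    SELECT txnDate,
           sum(dateCounts.transactionCount) OVER (
               ORDER BY txnDate
               RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
           ) AS txnLast7Days
    FROM (SELECT count(*) AS transactionCount, txnDate
          FROM yourtable
          GROUP BY txnDate) AS dateCounts
) subquery
WHERE yourtable.txnDate = subquery.txnDate;
```

Because the frame is defined on the date values, a gap in the dates simply contributes nothing to the sum instead of pulling in an older row.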

Related

How to filter records by their amount per date?

I have a table 'A' that has a date column, and the same date can appear in several records. I'm trying to filter the records where the number of records per day is less than 5, while still keeping all the fields of the table.
I mean that if I have only 4 records on 11/10/2017, I need to filter all 4 of those records.
You can select them using a subquery. In the subquery, group by the date column and use HAVING with an aggregate count to find the dates that have fewer than 5 records, then select all rows whose date is in that set:
SELECT *
FROM A
WHERE A.date IN (SELECT subA.date
                 FROM A AS subA
                 GROUP BY subA.date
                 HAVING COUNT(*) < 5);
Take Care's answer is good. Alternatively, you can use an analytic/windowing function. I'd benchmark both and see which one works better.
with cte as (
select *, count(1) over (partition by date) as cnt
from table_a
)
select *
from cte
where cnt < 5

Selecting 5 Most Recent Records Of Each Group

The below statement retrieves the top 2 records within each group in SQL Server. It works correctly, however as you can see it doesn't scale at all. I mean that if I wanted to retrieve the top 5 or 10 records instead of just 2, you can see how this query statement would grow very quickly.
How can I convert this query into something that returns the same records, but that I can quickly change it to return the top 5 or 10 records within each group instead, rather than just 2? (i.e. I want to just tell it to return the top 5 within each group, rather than having 5 unions as the below format would require)
Thanks!
WITH tSub
as (SELECT CustomerID,
TransactionTypeID,
Max(EventDate) as EventDate,
Max(TransactionID) as TransactionID
FROM Transactions
WHERE ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID)
SELECT *
from tSub
UNION
SELECT t.CustomerID,
t.TransactionTypeID,
Max(t.EventDate) as EventDate,
Max(t.TransactionID) as TransactionID
FROM Transactions t
WHERE t.TransactionID NOT IN (SELECT tSub.TransactionID
FROM tSub)
and ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID
Use PARTITION BY to solve this type of problem:
select ut.*
from (select t.*, ROW_NUMBER() over (PARTITION by <GroupColumn> order by <OrderColumn> desc) as rownum
      from YourTable t) ut
where ut.rownum <= 5
This partitions the rows by the grouping column, numbers them within each partition by the ordering column (EventDate here, descending so the most recent come first), and keeps the rows with rownum <= 5. Change the 5 to get the top n most recent entries of each group.
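Applied to the Transactions table from the question, the pattern might look like the sketch below. It assumes a group is a (CustomerID, TransactionTypeID) pair and that "most recent" means the latest EventDate, with TransactionID as a tie-breaker; both are guesses based on the original query. Note that the original takes Max(EventDate) and Max(TransactionID) independently, so its rows may not match these exactly.

```sql
-- Sketch: newest 5 top-level transactions per customer and transaction type.
-- Assumptions: grouping columns and ordering are inferred from the question.
SELECT CustomerID, TransactionTypeID, EventDate, TransactionID
FROM (
    SELECT CustomerID, TransactionTypeID, EventDate, TransactionID,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerID, TransactionTypeID
               ORDER BY EventDate DESC, TransactionID DESC
           ) AS rownum
    FROM Transactions
    WHERE ParentTransactionID IS NULL
) ranked
WHERE rownum <= 5   -- change 5 to 2 or 10 to get a different top n
ORDER BY CustomerID, TransactionTypeID, rownum;
```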

Return min date and corresponding amount to that distinct ID

Afternoon
I am trying to return the min value/ max values in SQL Server 2005 when I have multiple dates that are the same but the values in the Owed column are all different. I've already filtered the table down by my select statement into a temp table for a different query, when I've then tried to mirror I have all the duplicated dates that you can see below.
I now have a table that looks like:
ID| Date |Owes
-----------------
1 20110901 89
1 20110901 179
1 20110901 101
1 20110901 197
1 20110901 510
2 20111001 10
2 20111001 211
2 20111001 214
2 20111001 669
My current query:
Drop Table #Temp
Select Distinct Convert(Varchar(8), DateAdd(dd, Datediff(DD,0,DateDue),0),112)as Date
,ID
,Paid
Into #Temp
From Table
Where Paid <> '0'
Select ,Id
,Date
,Max(Owed)
,Min(Owed)
From #Temp
Group by ID, Date, Paid
Order By ID, Date, Paid
This doesn't strip out any of my dates that are the same. I'm new to SQL, but I'm presuming it's because my Owed column has different values. I basically want to be able to pull back the first record, as this will always be my minimum paid, and my last record, as this will always be my maximum owed, to work out my total owed by ID.
I'm new to SQL so would like to understand what I've done wrong for my future knowledge of structuring queries?
Many Thanks
In your "select into" statement, you don't have an Owed column?
GROUP BY is the normal way you "strip out values that are the same". If you group by ID and Date, you will get one row in your result for each distinct pair of values in those two columns. Each row in the results represents ALL the rows in the underlying table, and aggregate functions like MIN, MAX, etc. can pull out values.
SELECT id, date, MAX(owes) as MaxOwes, MIN(owes) as minOwes
FROM myFavoriteTable
GROUP BY id, date
In SQL Server 2005 there are "windowing functions" that allow you to use aggregate functions on groups of records, without grouping. An example below. You will get one row for each row in the table:
SELECT id, date, owes,
MAX(Owes) over (PARTITION BY id, date) AS MaxOwes,
MIN(Owes) over (PARTITION BY id, date) AS MinOwes
FROM myfavoriteTable
If you name a column "MinOwes" it might sound like you're just fishing tho.
If you want to group by date you can't also group by ID, too, because ID is probably unique. Try:
Select Date
,Min(Owed) AS min_owed
,Max(Owed) AS max_owed
From #Temp
Group by Date
Order By Date
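To get additional values from the row (your question is a bit vague there), you could utilize window functions. Note that first_value() is not available in SQL Server 2005 (it was added in SQL Server 2012). Sorting descending and taking first_value() avoids last_value(), which with the default window frame only looks as far as the current row:
SELECT DISTINCT
Date
,first_value(ID) OVER (PARTITION BY Date ORDER BY Owed) AS min_owed_ID
,first_value(ID) OVER (PARTITION BY Date ORDER BY Owed DESC) AS max_owed_ID
,first_value(Owed) OVER (PARTITION BY Date ORDER BY Owed) AS min_owed
,first_value(Owed) OVER (PARTITION BY Date ORDER BY Owed DESC) AS max_owed
FROM #Temp
ORDER BY Date;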

Update row only where max date

I have the following data
Date                   Week  ID     Tot_Seconds  O_Seconds  Week_ID
-------------------------------------------------------------------
8/14/2011 12:00:00 AM  5823  22180  170043       26043      18
8/21/2011 12:00:00 AM  5824  22180  126471       0          18
I am trying to update a column in another table with the value of O_Seconds where the Week and ID match, but I would only like to update the row with max(date) for each Week. The reason is that the source table has dates by week, whereas the table I will update is daily. The query I currently have updates, for example, 26043 on every day where ID and week match, which skews my future queries that sum the values of those columns.
Is there any way to just update the max date?
Something like this
The derived table is used to get the 1st row per week/ID
UPDATE
    O
SET
    SomeCol = S.O_Seconds
FROM
    OtherTable O
    JOIN
    (
    SELECT
        Week, ID, O_Seconds,
        ROW_NUMBER() OVER (PARTITION BY Week, ID ORDER BY Date DESC) AS rn
    FROM
        ThisTable
    ) S ON O.Week = S.Week AND O.ID = S.ID
WHERE
    S.rn = 1
For SQL Server 2000 and earlier you need an aggregate. See DBA.SE for more

Hive SQL aggregate merge multiple sqls into one

I have a series of queries like:
select count(distinct userId) from table where hour >= 0 and hour <= 0;
select count(distinct userId) from table where hour >= 0 and hour <= 1;
select count(distinct userId) from table where hour >= 0 and hour <= 2;
...
select count(distinct userId) from table where hour >= 0 and hour <= 14;
Is there a way to merge them into one query?
It looks like you are trying to keep a cumulative count, bracketed by the hour. To do that, you can use a window function, like this:
SELECT DISTINCT
A.hour AS hour,
SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM ( -- get all records, with 0 for include
SELECT
name,
hour,
0 AS include
FROM
table
) A
LEFT JOIN
( -- get the record with lowest `hour` for each `name`, and 1 for include
SELECT
name,
MIN(hour) AS hour,
1 AS include
FROM
table
GROUP BY
name
) M
ON M.name = A.name
AND M.hour = A.hour
;
There might be a simpler way, but this should yield the correct answer in general.
Explanation:
This uses 2 subqueries against the same input table, with a derived field called include to keep track of which records should contribute to the final total for each bucket. The first subquery simply takes all records in the table and assigns 0 AS include. The second subquery finds all unique names and the lowest hour slot in which that name appears, and assigns them 1 AS include. The 2 subqueries are LEFT JOIN'ed by the enclosing query.
The outermost query does a COALESCE(M.include, 0) to fill in any NULLs produced by the LEFT JOIN, and those 1's and 0's are SUM'ed and windowed by hour. This needs to be a SELECT DISTINCT rather than a GROUP BY because a GROUP BY would want both hour and include listed, and it would collapse every record in a given hour group into a single row (still with include=1) before the SUM is applied. The DISTINCT is applied after the SUM, so it removes duplicate output rows without discarding any input rows.
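One thing to watch with the join approach above: a user who has several records in their first hour matches the second subquery once per record, so it seems to rely on at most one record per user per hour. A possibly simpler sketch is to find each user's first hour, count new users per hour, and take a running sum. The table name events is hypothetical (standing in for the question's table), and userId is taken from the question rather than the name column used above.

```sql
-- Sketch: cumulative distinct users by hour.
-- Assumption: "events" is a placeholder table name with userId and hour columns.
SELECT
    first_hour AS hour,
    SUM(new_users) OVER (ORDER BY first_hour) AS cumulative_count
FROM (
    -- number of users whose earliest record falls in each hour
    SELECT first_hour, COUNT(*) AS new_users
    FROM (
        -- earliest hour in which each user appears
        SELECT userId, MIN(hour) AS first_hour
        FROM events
        WHERE hour >= 0 AND hour <= 14
        GROUP BY userId
    ) first_seen
    GROUP BY first_hour
) per_hour;
```

Hours in which no user appears for the first time produce no row here; their cumulative count is the same as that of the previous listed hour.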