Basic Teradata SQL add column and summing columns - sql

Not great with SQL, so sorry if my questions seem dumb. I have this working code that pulls the entry date and the number of people that entered Store 1 on that date.
select entry_date as Enter_Date
,count(entry_date) as Entries
from db_entry
where entry_date between '2017-03-05' and '2017-03-11'
and entry_code like 'STR1%'
group by entry_date
This is what it shows up as
Enter_Date Entries
3/5/2017 35
3/9/2017 30
3/10/2017 27
3/8/2017 23
3/7/2017 29
3/6/2017 32
3/11/2017 39
I was wondering if there was a way to add another column for store 2, where the entry_code is 'STR2%'. The reason I'm not sure what to do is because I'm not pulling a different column from the db_entry, so I'm not sure how to differentiate the two columns in the WHERE clause.
In addition, I was wondering if there was a quick way to sum each column and have the latest date as the Enter Date. Ideally this is what I'd like my table to look like:
Enter_Date Store 1 Store 2
3/11/2017 215 301

Use case expressions to do conditional counting.
select entry_date as Enter_Date,
count(case when entry_code like 'STR1%' then entry_date end) as Entries1,
count(case when entry_code like 'STR2%' then entry_date end) as Entries2
from db_entry
where entry_date between '2017-03-05' and '2017-03-11'
and entry_code like any ('STR1%', 'STR2%')
group by entry_date
Note: The WHERE clause's like str1/str2 isn't really needed now, but may perhaps speed the query up.
Edit: Now using like any, as suggested by #Dudu Markovitz!

To answer your 2nd question, simply remove the GROUP BY and switch to:
select MAX(entry_date) as Enter_Date,
count(case when entry_code like 'STR1%' then entry_date end) as "Store 1",
count(case when entry_code like 'STR2%' then entry_date end) as "Store 2"
from db_entry
where entry_date between date '2017-03-05' and date '2017-03-11'
and entry_code like any ('STR1%', 'STR2%')

Related

Calculating Datediff of two days based on when the sum of a column hits a number cap

Tried to see if this was asked anywhere else but doesn't seem like it. Trying to create a sql query to give me the date difference in days between '2022-10-01' and the date when our impression sum hits our cap of 5.
For context, we may see duplicate dates because someone revisit our website that day so we'll get a different session number to pair with that count. Here's an example table of one individual and how many impressions logged.
My goal is to get the number of days it takes to hit an impression cap of 5. So for this individual, they would hit the cap on '2022-10-07' and the days between '2022-10-01' and '2022-10-07' is 6. I am also calculating the difference before/after '2023-01-01' since I need this count for Q4 of '22 and Q1 of '23 but will not include in the example table. I have other individuals to include but for the purpose of asking here, I kept it to one.
Current Query:
select
click_date,
case
when date(click_date) < date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2022-10-01', click_date)
when date(click_date) >= date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from table
group by customer, click_date, impression_cnt
customer
click date
impression_cnt
123456
2022-10-05
2
123456
2022-10-05
1
123456
2022-10-06
1
123456
2022-10-07
1
123456
2022-10-11
1
123456
2022-10-11
3
Result Table
customer
days_to_cap
123456
6
I'm currently only getting 0 days and then 81 days once it hits 2022-12-21 (last date) for this individual so i know I need to fix my query. Any help would be appreciated!
Edited: This is in snowflake!
So, the issue with your query is that the sum is being calculated at the level that you are grouping by, which is every field, so it will always just be the value of the impressions field every time.
What you need to do is a running sum, which is a SUM() OVER (PARTITION BY...) statement. And then qualify the results of that:
First, just to get the data that you have:
with x as (
select *
from values
(123456,'2022-10-05'::date,2),
(123456,'2022-10-05'::date,1),
(123456,'2022-10-06'::date,1),
(123456,'2022-10-07'::date,1),
(123456,'2022-10-11'::date,1),
(123456,'2022-10-11'::date,3) x (customer,click_date,impression_cnt)
)
Then, I query the CTE to do the running sum with a QUALIFY statement to choose the record that actually has the value I'm looking for
select
customer,
case
when click_date < '2023-01-01'::date and sum(impression_cnt) OVER (partition by customer order by click_date) = 5 then datediff('day', '2022-10-01', click_date)
when click_date >= '2023-01-01'::date and sum(impression_cnt) OVER (partition by customer order by click_date) = 5 then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from x
qualify days_to_capped > 0;
The qualify filters your results to just the record that you cared about.

how to use count with case when

I'm newbie to Hivesql.
I have a raw table with 6 million records like this:
I want to count the number of IP_address access to each Modem_id everyweek.
The result table I want will be like this:
I did it with left join, and it worked. But since using join will be time-consuming, I want do it with case when statement - but I can't write a correct statement. Do you have any ideas?
This is the join statement I used:
select a.modem_id,
a.Number_of_IP_in_Day_1,
b.Number_of_IP_in_Day_2
from
(select modem_id,
count(distinct ip_address) as Number_of_IP_in_Day_1
from F_ACS_DEVICE_INFORMATION_NEW
where day=1
group by modem_id) a
left join
(select modem_id,
count(distinct param_value) as Number_of_IP_in_Day_2
from F_ACS_DEVICE_INFORMATION_NEW
where day=2
group by modem_id) b
on a.modem_id= b.modem_id;
You can express your logic using just aggregatoin:
select a.modem_id,
count(distinct case when date = 1 then ip_address end) as day_1,
count(distinct case when date = 2 then ip_address end) as day_2
from F_ACS_DEVICE_INFORMATION_NEW a
group by a.modem_id;
You can obviously extend this for more days.
Note: As your question and code are written, this assumes that your base table has data for only one week. Otherwise, I would expect some date filtering. Presumably, that is what the _NEW suffix means on the table name.
Based on your question and further comments, you would like
The number of different IP addresses accessed by each modem
In counts by week (as columns) for 4 weeks
e.g., result would be 5 columns
modem_id
IPs_accessed_week1
IPs_accessed_week2
IPs_accessed_week3
IPs_accessed_week4
My answer here is based on knowledge of SQL - I haven't used Hive but it appears to support the things I use (e.g., CTEs). You may need to tweak the answer a bit.
The first key step is to turn the day_number into a week_number. A straightforward way to do this is FLOOR((day_num-1)/7)+1 so days 1-7 become week 1, days 8-14 become week2, etc.
Note - it is up to you to make sure the day_nums are correct. I would guess you'd actually want info the the last 4 weeks, not the first four weeks of data - and as such you'd probably calculate the day_num as something like SELECT DATEDIFF(day, IP_access_date, CAST(getdate() AS date)) - whatever the equivalent is in Hive.
There are a few ways to do this - I think the clearest is to use a CTE to convert your dataset to what you need e.g.,
convert day_nums to weeknums
get rid of duplicates within the week (your code has COUNT(DISTINCT ...) - I assume this is what you want) - I'm doing this with SELECT DISTINCT (rather than grouping by all fields)
From there, you could PIVOT the data to get it into your table, or just use SUM of CASE statements. I'll use SUM of CASE here as I think it's clearer to understand.
WITH IPs_per_week AS
(SELECT DISTINCT
modem_id,
ip_address,
FLOOR((day-1)/7)+1 AS week_num -- Note I've referred to it as day_num in text for clarity
FROM F_ACS_DEVICE_INFORMATION_NEW
)
SELECT modem_id,
SUM(CASE WHEN week_num = 1 THEN 1 ELSE 0 END) AS IPs_access_week1,
SUM(CASE WHEN week_num = 2 THEN 1 ELSE 0 END) AS IPs_access_week2,
SUM(CASE WHEN week_num = 3 THEN 1 ELSE 0 END) AS IPs_access_week3,
SUM(CASE WHEN week_num = 4 THEN 1 ELSE 0 END) AS IPs_access_week4
FROM IPs_per_week
GROUP BY modem_id;

Group By with Case statement?

I need find the number Sum of orders over a 3 day range. so imagine a table like this
Order Date
300 1/5/2015
200 1/6/2015
150 1/7/2015
250 1/5/2015
400 1/4/2015
350 1/3/2015
50 1/2/2015
100 1/8/2015
So I want to create a Group by Clause that Groups anything with a date that has the same Month, Year and a Day from 1-3 or 4-6, 7-9 and so on until I reach 30 days.
It seems like what I would want to do is create a case for the grouping that includes a loop of some type but I am not sure if this is the best way or if it is even possible to combine them.
An alternative might be create a case statement that creates a new column that assigns group number and then grouping by that number, month, and Year.
Unfortunately I've never used a case statement so I am not sure which method is best or how to execute them especially with a loop.
EDIT: I am using Access so it looks like I will be using IIF instead of Case
Consider the Partition Function and a crosstab, so, for example:
TRANSFORM Sum(Calendar.Order) AS SumOfOrder
SELECT Month([CalDate]) AS TheMonth, Partition(Day([Caldate]),1,31,3) AS DayGroup
FROM Calendar
GROUP BY Month([CalDate]), Partition(Day([Caldate]),1,31,3)
PIVOT Year([CalDate]);
As an aside, I hope you have not named a field / column as Date.
How about the following:
COUNT OF ORDERS
select year([Date]) as yr,
month([Date]) as monthofyr,
sum(iif((day([Date])>=1) and (day([Date])<=3),1,0)) as days1to3,
sum(iif((day([Date])>=4) and (day([Date])<=6),1,0)) as days4to6,
sum(iif((day([Date])>=7) and (day([Date])<=9),1,0)) as days7to9,
sum(iif((day([Date])>=10) and (day([Date])<=12),1,0)) as days10to12,
sum(iif((day([Date])>=13) and (day([Date])<=15),1,0)) as days13to15,
sum(iif((day([Date])>=16) and (day([Date])<=18),1,0)) as days16to18,
sum(iif((day([Date])>=19) and (day([Date])<=21),1,0)) as days19to21,
sum(iif((day([Date])>=22) and (day([Date])<=24),1,0)) as days22to24,
sum(iif((day([Date])>=25) and (day([Date])<=27),1,0)) as days25to27,
sum(iif((day([Date])>=28) and (day([Date])<=31),1,0)) as days28to31
from tbl
where [Date] between x and y
group by year([Date]),
month([Date])
Replace x and y with your date range.
The last group is days 28 to 31 of the month, so it may contain 4 days' worth of orders, for months that have 31 days.
THE ABOVE IS A COUNT OF ORDERS.
If you want the SUM of the order amounts:
SUM OF ORDER AMOUNTS
select year([Date]) as yr,
month([Date]) as monthofyr,
sum(iif((day([Date])>=1) and (day([Date])<=3),order,0)) as days1to3,
sum(iif((day([Date])>=4) and (day([Date])<=6),order,0)) as days4to6,
sum(iif((day([Date])>=7) and (day([Date])<=9),order,0)) as days7to9,
sum(iif((day([Date])>=10) and (day([Date])<=12),order,0)) as days10to12,
sum(iif((day([Date])>=13) and (day([Date])<=15),order,0)) as days13to15,
sum(iif((day([Date])>=16) and (day([Date])<=18),order,0)) as days16to18,
sum(iif((day([Date])>=19) and (day([Date])<=21),order,0)) as days19to21,
sum(iif((day([Date])>=22) and (day([Date])<=24),order,0)) as days22to24,
sum(iif((day([Date])>=25) and (day([Date])<=27),order,0)) as days25to27,
sum(iif((day([Date])>=28) and (day([Date])<=31),order,0)) as days28to31
from tbl
where [Date] between x and y
group by year([Date]),
month([Date])

Oracle SQL WHERE within a time range using sysdate

I'm trying to select data from the previous day, and within a certain time frame, but I may be calculating my where clause incorrectly. I've tried switching times around etc. Basically I want to see all data from 6am-6pm, and then 7pm-3am, but My results aren't relecting such. I've tried between trunc(sysdate)-1 '00:00:00'<- but specifying the time, but I feel I'm not familiar enough with the function.
Note: DB is in UTC hence the 8/24.
Query:
--TOTAL PROBLEM STOW EVENTS
SELECT to_char(entry_date -8/24, 'DD-MON-YYYY HH12:MI:SSam'), OLD_BIN_ID old_bin, NEW_BIN_ID NEW_BIN, ISBN ASIN, QUANTITY
FROM BINEDIT_ENTRIES
WHERE ENTRY_DATE BETWEEN trunc(SYSDATE) -1 +4/24 AND trunc(SYSDATE) -1 +16/24
--where entry_date BETWEEN trunc(sysdate)-1 '00:00:00' AND trunc(sysdate)-1 '00:00:00.000'
AND substr(old_bin_id,1,2) = 'SC'
AND substr(new_bin_id,1,2) = 'vt'
GROUP BY ENTRY_DATE, OLD_BIN_ID, NEW_BIN_ID, ISBN, Quantity
ORDER BY QUANTITY DESC;
Result:
This appears to look correct, BUT when I change to look at other time range, it shows me this..
Second Query(Night Time):
--TOTAL PROBLEM STOW EVENTS
SELECT to_char(entry_date -8/24, 'DD-MON-YYYY HH12:MI:SSam'), OLD_BIN_ID old_bin, NEW_BIN_ID NEW_BIN, ISBN ASIN, QUANTITY
FROM BINEDIT_ENTRIES
WHERE ENTRY_DATE BETWEEN trunc(SYSDATE) -1 +16/24 AND trunc(SYSDATE) -1 +24/24
--where entry_date BETWEEN trunc(sysdate)-1 '00:00:00' AND trunc(sysdate)-1 '00:00:00.000'
AND substr(old_bin_id,1,2) = 'SC'
AND substr(new_bin_id,1,2) = 'vt'
GROUP BY ENTRY_DATE, OLD_BIN_ID, NEW_BIN_ID, ISBN, Quantity
ORDER BY QUANTITY DESC;
Result:
As you can see it doesn't appear to be looking at the where clause, I believe I have it formatted incorrectly, I typically just look at yesterday as a whole, and not a time range, so this is my first time attempting this. Thank you.
Effectively you're asking for everything between 8 AM and 4 PM local time. I say 8 AM since you're adding 16 hours in the WHERE clause and subtracting 8 in the SELECT clause.
If you meant to query between 7 PM local time and 3AM you would just add 8 hours in the WHERE clause:
WHERE ENTRY_DATE BETWEEN
trunc(SYSDATE) -1 +19/24 + 8/24
AND trunc(SYSDATE) -1 +27/24 + 8/24

SQL sum 2 different column by different condtion then subtraction and add

what I am trying is kind of complex, I will try my best to explain.
I achieved the first part which is to sum the column by hours.
example
ID TIMESTAMP CUSTAFFECTED
1 10-01-2013 01:00:23 23
2 10-01-2013 03:00:23 55
3 10-01-2013 05:00:23 2369
4 10-01-2013 04:00:23 12
5 10-01-2013 01:00:23 1
6 10-01-2013 12:00:23 99
7 10-01-2013 01:00:23 22
8 10-01-2013 02:00:23 3
output would be
Hour TotalCALLS CUSTAFFECTED
10/1/2013 01:00 3 46
10/1/2013 02:00 1 3
10/1/2013 03:00 1 55
10/1/2013 04:00 1 12
10/1/2013 05:00 1 2369
10/1/2013 12:00 1 99
Query
SELECT TRUNC(STARTDATETIME, 'HH24') AS hour,
COUNT(*) AS TotalCalls,
sum(CUSTAFFECTED) AS CUSTAFFECTED
FROM some_table
where STARTDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
STARTDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
GROUP BY TRUNC(STARTDATETIME, 'HH')
what I need
what I need sum 2 queries and group by timestamp/hour. 2nd query is exactly same as first but just the where clause is different.
2nd query
SELECT TRUNC(RESTOREDDATETIME , 'HH24') AS hour,
COUNT(*) AS TotalCalls,
SUM(CUSTAFFECTED) AS CUSTRESTORED
FROM some_table
where RESTOREDDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
RESTOREDDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(RESTOREDDATETIME , 'HH24')
so I need to subtract custaffected - custrestoed, and display tht total.
I added link to excel file. http://goo.gl/ioo9hg
Thanks
Ok, now that correct sql is in question text, try this:
SELECT TRUNC(STARTDATETIME, 'HH24') AS hour,
COUNT(*) AS TotalCalls,
Sum(case when RESTOREDDATETIME is null Then 0 else 1 end) RestoredCount,
Sum(CUSTAFFECTED) as CUSTAFFECTED,
Sum(case when RESTOREDDATETIME is null Then 0 else CUSTAFFECTED end) CustRestored,
SUM(CUSTAFFECTED) -
Sum(case when RESTOREDDATETIME is null Then 0 else CUSTAFFECTED end) AS CUSTNotRestored
FROM some_table
where STARTDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
and STARTDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(STARTDATETIME, 'HH24')
I recently needed to do this and had to play with it some to get it to work.
The challenge is to get the results of one query to link over to another query all inside the same query and then manipulate the returned value of a field so that the value in a given field in one query's resultset, call it FieldA, is subtracted from the value in a field in a different resultset, call it FieldB. It doesn't matter if the subject values are the result of an aggregation function like COUNT(...); they could be any numeric field in a resultset needing grouping or not. Looking at values from aggregation functions just means you need to adjust your query logic to use GROUP BY for the proper fields. The approach requires creating in-line views in the query and using those as the source of data for doing the subtraction.
A red herring when dealing with this kind of thing is the MINUS operator (assuming you are using an Oracle database) but that will not work since MINUS is not about subtracting values inside a resultset's field values from one another, but subtracting one set of matching records found in another set of records from the final result set returned from the query. In addition, MINUS is not a SQL standard operator so your database probably won't support it if it isn't Oracle you are using. Still, it's awfully nice to have around when you need it.
OK, enough prelude. Here's the query form you will want to use, taking for example a date range we want grouped by YYYY-MM:
select inlineview1.year_mon, (inlineview1.CNT - inlineview2.CNT) as finalcnt from
(SELECT TO_CHAR(*date_field*, 'YYYY-MM') AS year_mon, count(*any_field_name*) as CNT
FROM *schemaname.tablename*
WHERE *date_field* > TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*date_field* < TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*another_field* = *{value_of_some_kind}* -- ... etc. ...
GROUP BY TO_CHAR(*date_field*, 'YYYY-MM')) inlineview1,
(SELECT TO_CHAR(*date_field*, 'YYYY-MM') AS year_mon, count(*any_field_name*) as CNT
FROM *schemaname.tablename*
WHERE *date_field* > TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*date_field* < TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*another_field* = *{value_of_some_kind}* -- ... etc. ...
GROUP BY TO_CHAR(*date_field*, 'YYYY-MM')) inlineview2
WHERE
inlineview1.year_mon = inlineview2.year_mon
order by *either or any of the final resultset's fields* -- optional
A bit less abstractly, an example wherein a bookseller wants to see the net number of books that were sold in any given month in 2013. To do this, the seller must subtract the number of books retruned for refund from the number sold. He does not care when the book was sold, as he feels a returned book represents a loss of a sale and income statistically no matter when it occurs vs. when the book was sold. Example:
select bookssold.year_mon, (bookssold.CNT - booksreturned.CNT) as netsalescount from
(SELECT TO_CHAR(SALE_DATE, 'YYYY-MM') AS year_mon, count(TITLE) as CNT
FROM RETAILOPS.ACTIVITY
WHERE SALE_DATE > TO_DATE('2012-12-31', 'YYYY-MM-DD') and
SALE_DATE < TO_DATE('2014-01-01', 'YYYY-MM-DD') and
OPERATION = 'sale'
GROUP BY TO_CHAR(SALE_DATE, 'YYYY-MM')) bookssold,
(SELECT TO_CHAR(SALE_DATE, 'YYYY-MM') AS year_mon, count(TITLE) as CNT
FROM RETAILOPS.ACTIVITY
WHERE SALE_DATE > TO_DATE('2012-12-31', 'YYYY-MM-DD') and
SALE_DATE < TO_DATE('2014-01-01', 'YYYY-MM-DD') and
OPERATION = 'return'
GROUP BY TO_CHAR(SALE_DATE, 'YYYY-MM')) booksreturned
WHERE
bookssold.year_mon = booksreturned.year_mon
order by bookssold.year_mon desc
Note that to be sure the query returns as expected, the two in-line views must be equijoined based as shown above on some criteria, as in:
bookssold.year_mon = booksreturned.year_mon
or the subtraction of the counted records can't be done on a 1:1 basis, as the query parser will not know which of the records returned with a grouped count value is to be subtracted from which. Failing to specifiy an equijoin condition will yield a Cartesian join result, probably not what you want (though you may inded want that). For example, adding 'booksreturned.year_mon' right after 'bookssold.year_mon' to the returned fields list in the top-level select statement in the above example and eliminating the
bookssold.year_mon = booksreturned.year_mon
criteria in its WHERE clause will produce a working query that does the subtraction calculation on the CNT values for the YYYY-MM values in the first two columns of the resultset and shows them in the third column. Handy to know this if you need it, as it has solid application in business trends analysis if you can compare sales and returns not just within a given atomic timeframe but as compared across such timeframes in a 1:N fashion.