Teradata SQL: select a literal - sql

I want to use a list of arbitrary numbers as a sort of input to a SELECT. Option A, of course, is to create a temporary table that contains just the values (e.g., 1, 2, 3). I'm hoping you folks can suggest an option better than A.
Suppose the statement is like:
select Fx,
XXXXXX as Foo
from MyTable
where MyTest depends on each XXXXXX
So if I could magically make XXXXXX a list of values (1,2,3), I'd have a resultset like:
My val     | Foo
-----------+----
cat        | 1
mouse      | 2
cheesecake | 3
Again, I could source the inputs from a table, but I prefer not to if it's not necessary. Gurus, please chime in.
TIA.

You will probably find success using the ROW_NUMBER() window function.
Arbitrary order:
SELECT CALENDAR_DATE
, ROW_NUMBER()
OVER (ORDER BY 1) AS rn
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN DATE '2010-06-01' AND DATE '2010-06-30'
;
Or order by the column:
SELECT CALENDAR_DATE
, ROW_NUMBER()
OVER (ORDER BY CALENDAR_DATE) AS rn
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN DATE '2010-06-01' AND DATE '2010-06-30'
;
Or partition by another column to restart the sequence:
SELECT CALENDAR_DATE
, YEAR_OF_CALENDAR
, ROW_NUMBER()
OVER (PARTITION BY YEAR_OF_CALENDAR
ORDER BY CALENDAR_DATE) AS rn
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN DATE '2009-11-01' AND DATE '2010-01-31'
;
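The examples above generate a sequence from rows that already exist; if you literally just need the fixed list (1, 2, 3) paired up with rows, one sketch builds the values inline with UNION ALL and joins them by ROW_NUMBER(). MyTable and Fx are from the question, the pair-by-row-number join is only a guess at how "MyTest depends on each XXXXXX" should work, and very old Teradata releases may insist on a FROM clause for the literal SELECTs:
SELECT t.Fx
     , v.Foo
FROM (
    SELECT Fx
         , ROW_NUMBER() OVER (ORDER BY Fx) AS rn
    FROM MyTable
) t
JOIN (
    SELECT 1 AS Foo, 1 AS rn
    UNION ALL SELECT 2, 2
    UNION ALL SELECT 3, 3
) v
    ON t.rn = v.rn;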

Related

How can I create a week-to-date metric in vertica?

I have a table which stores year-to-date metrics, once per client per day. The simplified schema looks roughly like this; let's call this table history:
bus_date | client_id | ytd_costs
I'd like to create a view that adds week-to-date costs: essentially, any cost that occurs after the prior Friday is considered part of the week-to-date. Currently I have the following, but I'm concerned about the switch-case logic.
Here is an example of the logic I have right now to show that this works.
I also got to use the timeseries clause which I've never used before...
;with history as (
select bus_date,client_id,ts_first_Value(value,'linear') "ytd_costs"
from (select {ts'2016-10-07'} t,1 client_id,5.0 "value" union all select {ts'2016-10-14'},1, 15) k
timeseries bus_Date as '1 day' over (partition by client_id order by t)
)
,history_with_wtd as (select bus_date
,client_id
,ytd_costs
,ytd_costs - decode(
dayofweek(bus_date)
,6,first_value(ytd_costs) over (partition by client_id order by bus_date range '1 week' preceding)
,first_value(ytd_costs) over (partition by client_id,date_trunc('week',bus_date+3) order by bus_date)
) as "wtd_costs"
,ytd_costs - 5 "expected_wtd"
from history)
select *
from history_with_wtd
where date_trunc('week',bus_date) = '2016-10-10'
In SQL Server I could just use the lag function, since I can pass a variable to the look-back offset, but in Vertica no such option exists.
How about partitioning by week starting on Saturday? First grab the first day of the week, then offset so the week starts on Saturday: trunc(bus_date + 1,'D') - 1.
Also notice the window frame is from the start of the partition (Saturday, unbounded preceding) to the current row.
select
bus_date
,client_id
,ytd_costs
,ytd_costs - first_value(ytd_costs) over (
partition by client_id, trunc(bus_date + 1,'D') - 1
order by bus_date
range between unbounded preceding and current row) wtd_costs
from sos.history
order by client_id, bus_date
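If you want to sanity-check the offset, a small throwaway query like the one below can help (the sample dates are made up; if the Saturday logic holds, Friday 2016-10-14 should map back to Saturday 2016-10-08, while 2016-10-15 and 2016-10-16 map to 2016-10-15):
select d as bus_date
     , trunc(d + 1, 'D') - 1 as week_start_saturday
from (
    select date '2016-10-14' as d          -- a Friday
    union all select date '2016-10-15'     -- a Saturday
    union all select date '2016-10-16'     -- a Sunday
) t;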

Postgres windowing (determine contiguous days)

Using Postgres 9.3, I'm trying to count the number of contiguous days of a certain weather type. If we assume we have a regular time series and weather report:
date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"
I want something that counts the contiguous days of the same weather. The results should look something like this:
date|weather|contiguous_days
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2
I've been banging my head on this for a while trying to use windowing functions. At first it seemed like it should be a no-brainer, but then I found out it's much harder than expected.
Here is what I've tried...
Select date, weather, Row_Number() Over (partition by weather order by date)
from t_weather
Would it be better or easier just to compare the current row to the next? How would you do that while maintaining a count? Any thoughts, ideas, or even solutions would be helpful!
-Kip
You need to identify the contiguous groups where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: subtract a sequence of increasing numbers from the dates, and the result is constant for contiguous dates.
Once you have the grouping, the rest is row_number():
Select date, weather,
Row_Number() Over (partition by weather, grp order by date)
from (select w.*,
(date - row_number() over (partition by weather order by date) * interval '1 day') as grp
from t_weather w
) w;
The SQL Fiddle is here.
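To see why the subtraction works, take the Sunny rows from the sample data. row_number() within Sunny numbers them 1, 2, 3, 4, so subtracting that many days gives:
date       | rn | grp (date - rn days)
-----------+----+---------------------
2016-02-01 | 1  | 2016-01-31
2016-02-06 | 2  | 2016-02-04
2016-02-07 | 3  | 2016-02-04
2016-02-08 | 4  | 2016-02-04
The three consecutive Sunny days collapse onto one grp value while the isolated 2016-02-01 gets its own, which is exactly what the outer row_number() partitions on.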
I'm not sure what the query engine is going to do when scanning multiple times across the same data set (kinda like calculating area under a curve), but this works...
WITH v(date, weather) AS (
VALUES
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date,
weather,
CASE WHEN lag(weather) OVER (ORDER BY date) = weather THEN 1 ELSE 0 END change
FROM v)
SELECT date
, weather
,(SELECT count(weather) -- number of times the weather didn't change
FROM changes v2
WHERE v2.date <= v1.date AND v2.weather = v1.weather
AND v2.date >= ( -- bounded between changes of weather
SELECT max(date)
FROM changes v3
WHERE change = 0
AND v3.weather = v1.weather
AND v3.date <= v1.date) --<-- here's the expensive part
) curve
FROM changes v1
Here is another approach based off of this answer.
First we add a change column that is 1 or 0 depending on whether the weather is different or not from the previous day.
Then we introduce a group_nr column by summing the change over an order by date. This produces a unique group number for each sequence of consecutive same-weather days since the sum is only incremented on the first day of each sequence.
Finally we do a row_number() over (partition by group_nr order by date) to produce the running count per group.
select date, weather, row_number() over (partition by group_nr order by date)
from (
select *, sum(change) over (order by date) as group_nr
from (
select *, (weather != lag(weather,1,'') over (order by date))::int as change
from tmp_weather
) t1
) t2;
sqlfiddle (uses equivalent WITH syntax)
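Applied to the sample data, change comes out as 1,1,1,0,1,1,0,0,1,0 (a 1 on each day the weather differs from the previous day), the running sum turns that into group_nr = 1,2,3,3,4,5,5,5,6,6, and numbering rows within each group_nr yields 1,1,1,2,1,1,2,3,1,2 - exactly the contiguous_days column the question asks for.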
You can accomplish this with a recursive CTE as follows:
WITH RECURSIVE CTE_ConsecutiveDays AS
(
SELECT
my_date,
weather,
1 AS consecutive_days
FROM My_Table T
WHERE
NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
UNION ALL
SELECT
T.my_date,
T.weather,
CD.consecutive_days + 1
FROM
CTE_ConsecutiveDays CD
INNER JOIN My_Table T ON
T.my_date = CD.my_date + INTERVAL '1 day' AND
T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;
Here's the SQL Fiddle to test: http://www.sqlfiddle.com/#!15/383e5/3

Retrieve the Most recent Date based on Time

Example:
ID   Date (with time)       Price
---- ---------------------- -----
A    23-Aug-12 (09:25pm)    10    (consider this the latest on this date)
A    25-May-10              20
A    23-Aug-12 (8:20pm)     30
A    23-Aug-12 (7:00pm)     35
B    03-Apr-09              45
B    05-Dec-10              60
I want to retrieve ID, Date, Price. If there are multiple prices for the same date, I have to select the row with the latest update for that date, based on the timestamp included.
Expected output :
A,23-Aug-12,10
A,25-May-10,20
B,03-Apr-09,45
B,05-Dec-10,60
Most SQL dialects support the row_number function. Extracting the date from a datetime varies between databases. Here is one way to do what you want:
select id, datetime, price
from (select t.*,
row_number() over (partition by id, cast(datetime as date) order by datetime desc
) as seqnum
from t
) t
where seqnum = 1;
This gives the general structure. The exact syntax varies by database.
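For the "varies by database" part, it is mainly the cast inside the PARTITION BY that changes; a few common spellings (check your version's documentation):
-- PostgreSQL / Vertica
partition by id, datetime::date
-- SQL Server 2008+
partition by id, cast(datetime as date)
-- Oracle
partition by id, trunc(datetime)
-- Teradata
partition by id, cast(datetime as date)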
Try something like this:
SELECT
ID,
MAX(date)
FROM
Some_Unnamed_Table
GROUP BY
ID
ORDER BY
ID

PostgreSQL: How to return rows with respect to a found row (relative results)?

Forgive my example if it does not make sense. I'm going to try with a simplified one to encourage more participation.
Consider a table like the following:
dt | mnth | foo
--------------+------------+--------
2012-12-01 | December |
...
2012-08-01 | August |
2012-07-01 | July |
2012-06-01 | June |
2012-05-01 | May |
2012-04-01 | April |
2012-03-01 | March |
...
1997-01-01 | January |
If you look for the record with dt closest to today w/o going over, what would be the best way to also return the 3 records beforehand and 7 records after?
I decided to try windowing functions:
WITH dates AS (
select row_number() over (order by dt desc)
, dt
, dt - now()::date as dt_diff
from foo
)
, closest_date AS (
select * from dates
where dt_diff = ( select max(dt_diff) from dates where dt_diff <= 0 )
)
SELECT *
FROM dates
WHERE row_number - (select row_number from closest_date) >= -3
AND row_number - (select row_number from closest_date) <= 7 ;
I feel like there must be a better way to return relative records with a window function, but it's been some time since I've looked at them.
create table foo (dt date);
insert into foo values
('2012-12-01'),
('2012-08-01'),
('2012-07-01'),
('2012-06-01'),
('2012-05-01'),
('2012-04-01'),
('2012-03-01'),
('2012-02-01'),
('2012-01-01'),
('1997-01-01'),
('2012-09-01'),
('2012-10-01'),
('2012-11-01'),
('2013-01-01')
;
select dt
from (
(
select dt
from foo
where dt <= current_date
order by dt desc
limit 4
)
union all
(
select dt
from foo
where dt > current_date
order by dt
limit 7
)) s
order by dt
;
dt
------------
2012-03-01
2012-04-01
2012-05-01
2012-06-01
2012-07-01
2012-08-01
2012-09-01
2012-10-01
2012-11-01
2012-12-01
2013-01-01
(11 rows)
You could use the window function lead():
SELECT dt_lead7 AS dt
FROM (
SELECT *, lead(dt, 7) OVER (ORDER BY dt) AS dt_lead7
FROM foo
) d
WHERE dt <= now()::date
ORDER BY dt DESC
LIMIT 11;
Somewhat shorter, but the UNION ALL version will be faster with a suitable index.
That leaves a corner case where "date closest to today" is within the first 7 rows. You can pad the original data with 7 rows of -infinity to take care of this:
SELECT d.dt_lead7 AS dt
FROM (
SELECT *, lead(dt, 7) OVER (ORDER BY dt) AS dt_lead7
FROM (
SELECT '-infinity'::date AS dt FROM generate_series(1,7)
UNION ALL
SELECT dt FROM foo
) x
) d
WHERE d.dt <= now()::date -- same as: WHERE dt <= now()::date ¹
ORDER BY d.dt_lead7 DESC -- same as: ORDER BY dt DESC ¹
LIMIT 11;
I table-qualified the columns in the second query to clarify what happens. See below.
The result will include NULL values if the "date closest to today" is within the last 7 rows of the base table. You can filter those with an additional sub-select if you need to.
¹ To address your doubts about output names versus column names in the comments - consider the following quotes from the manual.
Where to use an output column's name:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
Bold emphasis mine. WHERE dt <= now()::date references the column d.dt, not the output column of the same name - thereby working as intended.
Resolving conflicts:
If an ORDER BY expression is a simple name that matches both an output
column name and an input column name, ORDER BY will interpret it as
the output column name. This is the opposite of the choice that GROUP BY
will make in the same situation. This inconsistency is made to be
compatible with the SQL standard.
Bold emphasis mine again. ORDER BY dt DESC in the example references the output column's name - as intended. Anyway, either column would sort the same. The only difference could be with the NULL values of the corner case. But that falls flat, too, because:
the default behavior is NULLS LAST when ASC is specified or implied,
and NULLS FIRST when DESC is specified
As the NULL values come after the biggest values, the order is identical either way.
Or, without LIMIT (as per request in comment):
WITH x AS (
SELECT *
, row_number() OVER (ORDER BY dt) AS rn
, first_value(dt) OVER (ORDER BY (dt > '2011-11-02')
, dt DESC) AS dt_nearest
FROM foo
)
, y AS (
SELECT rn AS rn_nearest
FROM x
WHERE dt = dt_nearest
)
SELECT dt
FROM x, y
WHERE rn BETWEEN rn_nearest - 3 AND rn_nearest + 7
ORDER BY dt;
If performance is important, I would still go with @Clodoaldo's UNION ALL variant. It will be fastest. Database-agnostic SQL will only get you so far. Other RDBMSs do not have window functions at all yet (MySQL), or use different function names (like first_val instead of first_value). You might just as well replace LIMIT with TOP n (MS SQL) or whatever the local dialect uses.
You could use something like this:
select * from foo
where dt between now()- interval '7 months' and now()+ interval '3 months'
This and this may help you.

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In an SQL SELECT statement I am trying to get the last day with data per month, for the last year.
Example, I am easily able to get the last day of each month and join that to my data table, but the problem is, if the last day of the month does not have data, then there is no returned data. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in a group, and then within each group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have the functions in Oracle, then you do some sort of manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
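If you would rather stick to date arithmetic than string formatting, truncating to the first of the month gives the same grouping in Oracle (a sketch against the same table, with the 400-day window from the question added):
SELECT MAX(coll_date) AS last_day_of_month
FROM super_table
WHERE coll_date > SYSDATE - 400
GROUP BY TRUNC(coll_date, 'MM');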
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates sargable.
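In SQL Server syntax (which the datediff calls above suggest - adjust for your platform; the column and index names are made up, and YEAR/MONTH is used here instead of the datediff form so the computed columns stay deterministic and indexable):
ALTER TABLE super_table
    ADD coll_day AS CAST(coll_date AS date),
        coll_month AS YEAR(coll_date) * 100 + MONTH(coll_date);

CREATE INDEX IX_super_table_month_day
    ON super_table (coll_month, coll_day);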
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)