Dynamically choose a column to query in Redshift - sql

I want to take a one-string query response to populate a SELECT statement in another query. Some say it's impossible, but Redshift makes fools of us all.
Imagine I have a table day_of_week as follows:
Day of Week | Weekend
---------------------
Monday | No
Tuesday | No
Wednesday | No
Thursday | No
Friday | No
Saturday | Yes
Sunday | Yes
And another party_time like this:
Yes | No
--------------------------
All the time | None of the time
I want to allow someone to just tell me a day (e.g., "Wednesday") and then use the resulting Weekend value to choose which column of party_time to query.
For example:
SELECT (SELECT Weekend FROM day_of_week WHERE "Day of Week" = 'Wednesday')
FROM party_time
Result: 'None of the time'
How?

SQL itself isn't dynamic/self-referential although most implementations have some sort of meta language to partly get around that.
The most obvious solution for your problem is to change table party_time:
Test | Meaning
--------------------------
Yes | All the time
No | None of the time
Then you can use a join or a sub-select to get to your answer:
select meaning
from party_time
inner join day_of_week
on weekend = test
where Day_of_Week = 'Wednesday'
Example: http://sqlfiddle.com/#!9/f10b7/1
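To try the reshaped table without a Redshift cluster, here is a minimal sketch using Python's built-in sqlite3 module. The column names day_of_week, weekend, test and meaning follow the answer above; the helper party_meaning is a hypothetical name used only for illustration, and SQLite stands in for Redshift here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE day_of_week (day_of_week TEXT PRIMARY KEY, weekend TEXT);
    INSERT INTO day_of_week VALUES
        ('Monday', 'No'), ('Tuesday', 'No'), ('Wednesday', 'No'),
        ('Thursday', 'No'), ('Friday', 'No'),
        ('Saturday', 'Yes'), ('Sunday', 'Yes');

    -- party_time reshaped from two columns (Yes, No) into two rows
    CREATE TABLE party_time (test TEXT PRIMARY KEY, meaning TEXT);
    INSERT INTO party_time VALUES
        ('Yes', 'All the time'), ('No', 'None of the time');
""")

def party_meaning(day):
    """Look up the party_time meaning for a day name (hypothetical helper)."""
    row = conn.execute(
        """
        SELECT p.meaning
        FROM party_time AS p
        JOIN day_of_week AS d ON d.weekend = p.test
        WHERE d.day_of_week = ?
        """,
        (day,),
    ).fetchone()
    return row[0] if row else None

print(party_meaning("Wednesday"))
```

Because the Yes/No value is now data rather than a column name, no dynamic SQL is needed to pick it.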

Related

Storing Dates In SQL: Having Some Trouble

I have a bunch of data sitting in a Postgres database for a website I am building. Problem is, I don't really know what I should do with the date information. For example, I have a table called events that has a date column that stores date information as a string until I can figure out what to do with it.
The reason they are in string format is that, unfortunately, there is no good API for the topic of my website, so I had to scrape the data. Here's what some of the data looks like inside the column:
| Date |
|-------------------------------------|
| Friday 07.30.2021 at 08:30 AM ET |
| Wednesday 04.07.2021 at 10:00 PM ET |
| Saturday 03.27.2010 |
| Monday 01.11.2010 |
| Saturday 02.09.2019 at 05:00 PM ET |
| Wednesday 03.31.2010 |
It would have been nice to have every row have the time with it, but a lot of entries don't. I have no problem doing some string manipulation on the data to get them into a certain format where they can be turned into a date, I am just somewhat stumped on what I should do next.
What would you do in this situation if you were restricted to the data seen in the events table? Would you store it as UTC? How would you handle the dates without a time? Would you give up and just display everything as EST dates regardless of where the user lives (lol)?
It would be nice to use these dates to display correctly for anyone anywhere in the world, but it looks like I might be pigeonholed because of the dates that don't have a time associated with them.
Converting your mishmash of free-form dates to a standard timestamp is not as daunting as it seems. Your samples indicate you have a string with up to 6 separate pieces of information: day name, date (mm.dd.yyyy), the literal "at", time of day, day part (AM/PM), and some code for timezone, each separated by spaces. For them to be useful, the first step is splitting the string into its individual parts. For that, use a regular expression to ensure single spaces, then use string_to_array to create an array of up to 6 elements. This gives:
+--------------------------+--------+----------------------------------+
| Field | Array | Action / |
| | index | Usage |
+--------------------------+--------+----------------------------------+
| day name | 1 | ignore |
| date | 2 | cast to date |
| literal 'at' | 3 | ignore |
| time of day | 4 | interval as hours since midnight |
| AM/PM | 5 | adjustment for time of day |
| some code for timezone | 6 | ??? |
+--------------------------+--------+----------------------------------+
Putting it all together we arrive at:
with test_data ( stg ) as
( values ('Friday 07.30.2021 at 08:30 AM ET' )
, ('Wednesday 04.07.2021 at 10:00 PM ET')
, ('Saturday 03.27.2010' )
, ('Monday 01.11.2010' )
, ('Saturday 02.09.2019 at 05:00 PM ET' )
, ('Wednesday 03.31.2010')
)
-- <<< Your query begins here >>>
, stg_array( strings) as
( select string_to_array(regexp_replace(stg, '( ){1,}',' ','g'), ' ' )
from test_data --<<< your actual table >>>
) -- select * from stg_array
, as_columns( dt, tod_interval, adj_interval, tz) as
( select strings[2]::date
, case when array_length(strings,1) >= 4
then strings[4]::interval
else '00:00':: interval
end
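, case when array_length(strings,1) >= 5 then
case when strings[5]='PM' and strings[4] not like '12:%'
then interval '12 hours'
when strings[5]='AM' and strings[4] like '12:%'
then interval '-12 hours' -- 12:xx AM is 00:xx
else interval '0 hours' -- AM, or 12:xx PM which needs no shift
end
else interval '0 hours'
end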
, case when array_length(strings,1) >= 6
then strings[6]
else current_setting('TIMEZONE')
end
from stg_array
)
select dt + tod_interval + adj_interval dtts, tz
from as_columns;
This gives the timestamp built from the date, time, and AM/PM indicator; items without a timezone specified are assigned the current timezone. For those containing a timezone code, you will have to convert it to a proper timezone name. Note that ET is neither a valid timezone name nor a valid abbreviation; perhaps use a lookup table. See example here; it also contains a regexp-based solution. The example is run on db<>fiddle, whose server is in the UK, hence the timezone shown.
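If the string manipulation happens in the scraper before the data reaches Postgres, the same split-and-assemble idea can be sketched in Python. This is only an illustration under the assumption that every row matches one of the two shapes shown in the question; parse_event is a hypothetical helper name, and the timezone code is returned as-is rather than resolved.

```python
from datetime import datetime

def parse_event(raw):
    """Split a scraped date string into (naive timestamp, timezone code or None)."""
    parts = raw.split()  # split() with no argument also collapses repeated spaces
    day = datetime.strptime(parts[1], "%m.%d.%Y")  # parts[0] is the day name, ignored
    if len(parts) >= 5:
        # parts[2] is the literal "at"; %I with %p maps 12 AM to 00:xx and 12 PM to 12:xx
        clock = datetime.strptime(parts[3] + " " + parts[4], "%I:%M %p").time()
        stamp = datetime.combine(day.date(), clock)
    else:
        stamp = day  # no time given: midnight, as in the SQL above
    tz_code = parts[5] if len(parts) >= 6 else None
    return stamp, tz_code

print(parse_event("Friday 07.30.2021 at 08:30 AM ET"))
print(parse_event("Saturday 03.27.2010"))
```

Rows without a time come back at midnight with no timezone code, so the caller can still tell them apart from rows that really were scheduled.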

How to count all time users by signup week?

I think what I am trying to do is probably super simple; I just haven't used SQL this way until now.
I have a list of all accounts ever opened. I would like to count how many accounts were opened per week (Monday-Sunday) since the start of tracking.
Account | Signup Date
---------------------
1 | 1/1/17
2 | 1/6/17
3 | 1/10/17
4 | 1/13/17
5 | 2/4/17
6 | 2/5/17
7 | 3/15/17
So ideally, if every week of the year starting 1/1/17 is numbered 1-53, I would love to be able to get a count of how many accounts signed up each week.
If that is possible, I would love any help.
The MySQL function WEEK(date, mode) will help you, or DATEPART(wk, date) in SQL Server.
SELECT WEEK(`Signup Date`, 3) AS `Week`,
COUNT(Account) AS `Accounts Created`
FROM your_table -- placeholder for your actual table name
GROUP BY WEEK(`Signup Date`, 3);
This should give you the desired result.
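To check which week each sample date lands in, here is a small Python sketch that counts the question's dates by ISO week (Monday-Sunday, numbered 1-53). It assumes the dates are written month/day/two-digit-year; signups_per_week is a hypothetical helper name. It keys on (ISO year, ISO week) so that week numbers from different years are not merged.

```python
from collections import Counter
from datetime import datetime

signups = ["1/1/17", "1/6/17", "1/10/17", "1/13/17", "2/4/17", "2/5/17", "3/15/17"]

def signups_per_week(dates):
    """Count signups per (ISO year, ISO week); ISO weeks run Monday to Sunday."""
    counts = Counter()
    for text in dates:
        d = datetime.strptime(text, "%m/%d/%y").date()  # assumed m/d/yy format
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))

for (year, week), n in signups_per_week(signups).items():
    print(year, week, n)
```

Note that 1/1/17 is a Sunday, so under Monday-based numbering it belongs to the last week of 2016 rather than week 1 of 2017.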

Can you define a custom "week" in PostgreSQL?

To extract the week of a given year we can use:
SELECT EXTRACT(WEEK FROM timestamp '2014-02-16 20:38:40');
However, I am trying to group weeks together in a bit of an odd format. My start of a week would begin on Mondays at 4am and would conclude the following Monday at 3:59:59am.
Ideally, I would like to create a query that provides a start and end date, then groups the total sales for that period by the weeks laid out above.
Example:
SELECT
(some custom week date),
SUM(sales)
FROM salesTable
WHERE
startDate BETWEEN 'DATE 1' AND 'DATE 2'
I am not looking to change the EXTRACT() function, rather create a query that would pull from the following sample table and output the sample results.
If 'DATE 1' in query was '2014-07-01' AND 'DATE 2' was '2014-08-18':
Sample Table:
itemID | timeSold | price
------------------------------------
1 | 2014-08-13 09:13:00 | 12.45
2 | 2014-08-15 12:33:00 | 20.00
3 | 2014-08-05 18:33:00 | 10.00
4 | 2014-07-31 04:00:00 | 30.00
Desired result:
weekBegin | priceTotal
----------------------------------
2014-07-28 04:00:00 | 30.00
2014-08-04 04:00:00 | 10.00
2014-08-11 04:00:00 | 32.45
Produces your desired output:
SELECT date_trunc('week', time_sold - interval '4h')
+ interval '4h' AS week_begin
, sum(price) AS price_total
FROM tbl
WHERE time_sold >= '2014-07-01 0:0'::timestamp
AND time_sold < '2014-08-19 0:0'::timestamp -- start of next day
GROUP BY 1
ORDER BY 1;
db<>fiddle here (extended with a row that actually shows the difference)
Old sqlfiddle
Explanation
date_trunc() is the superior tool here. You are not interested in week numbers, but in actual timestamps.
The "trick" is to subtract 4 hours from selected timestamps before extracting the week - thereby shifting the time frame towards the earlier bound of the ISO week. To produce the desired display, add the same 4 hours back to the truncated timestamps.
But apply the WHERE condition on unmodified timestamps. Also, never use BETWEEN with timestamps, which have fractional digits. Use the WHERE conditions like presented above. See:
Unexpected results from SQL query with BETWEEN timestamps
This operates with data type timestamp, i.e. with (shifted) "weeks" according to the current time zone. You might want to work with timestamptz instead. See:
Ignoring time zones altogether in Rails and PostgreSQL
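The shift-truncate-shift trick can also be checked outside the database. The following Python sketch mirrors date_trunc('week', ts - 4h) + 4h on the sample rows from the question; week_begin and weekly_totals are hypothetical helper names, and the timestamps are treated as naive (no time zone handling).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

SHIFT = timedelta(hours=4)  # custom weeks start Monday 04:00

def week_begin(ts):
    """Start of the custom week containing ts."""
    shifted = ts - SHIFT
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    monday = midnight - timedelta(days=shifted.weekday())  # weekday(): Monday == 0
    return monday + SHIFT

def weekly_totals(rows, start, end):
    """Sum prices per custom week for start <= sold_at < end (half-open range)."""
    totals = defaultdict(Decimal)
    for sold_at, price in rows:
        if start <= sold_at < end:
            totals[week_begin(sold_at)] += price
    return dict(sorted(totals.items()))

rows = [
    (datetime(2014, 8, 13, 9, 13), Decimal("12.45")),
    (datetime(2014, 8, 15, 12, 33), Decimal("20.00")),
    (datetime(2014, 8, 5, 18, 33), Decimal("10.00")),
    (datetime(2014, 7, 31, 4, 0), Decimal("30.00")),
]
for begin, total in weekly_totals(rows, datetime(2014, 7, 1), datetime(2014, 8, 19)).items():
    print(begin, total)
```

As in the SQL, the range filter is applied to the unshifted timestamps and uses a half-open interval instead of BETWEEN.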

How can I see if a date is on a weekend?

I have a table:
ID | Name | TDate
1 | John | 1 May 2013, 8:67AM
2 | Jack | 2 May 2013, 6:43AM
3 | Adam | 3 May 2013, 9:53AM
4 | Max | 4 May 2013, 2:13AM
5 | Leny | 5 May 2013, 5:33AM
I need a query that will return all the items where TDate is a weekend. How would I write such a query?
WHAT I HAVE SO FAR
select
table.*,
EXTRACT (DAY FROM table.tdate )
from table
I did a select using EXTRACT to just see if I can get the right values. However, EXTRACT with the parameter DAY returns the day of the month. If I instead use WEEKDAY, as per the documentation here, then I get error:
ERROR: timestamp units "weekday" not recognized
SQL state: 22023
limit 1250
EDIT
TDate has a data type of datetime (timestamp). I just wrote it like that for easy reading. But regardless of the type, I could easily cast between types if need be.
I know the dates 4 May and 5 May are weekends (as they fall on a Saturday and a Sunday). Does Firebird allow for a way to write a query that will return dates if they fall on weekends?
Try this:
SELECT ID, Name, TDate
FROM your_table
WHERE EXTRACT(WEEKDAY FROM TDate) IN (6,0)
UPDATE
The condition must be IN (0, 6), not (0, 1): EXTRACT(WEEKDAY FROM ...) in Firebird returns 0 for Sunday through 6 for Saturday.
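Weekday numbering differs between tools, which is how a (0, 1) slips in. As a quick sanity check of the (0, 6) condition against the question's dates, here is a Python sketch; firebird_weekday is a hypothetical helper that converts Python's Monday=0 numbering to the Sunday=0 numbering the query relies on.

```python
from datetime import date

def firebird_weekday(d):
    """Map Python's weekday() (Monday=0..Sunday=6) to Sunday=0..Saturday=6."""
    return (d.weekday() + 1) % 7

def is_weekend(d):
    return firebird_weekday(d) in (0, 6)

sample = [date(2013, 5, day) for day in range(1, 6)]  # 1 May to 5 May 2013
print([d.isoformat() for d in sample if is_weekend(d)])
```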

Attributes of my Time dimension table in star schema

I'm building a DW with a star schema modeling. I'll use it for a BI project with pentaho.
I'll of course have a time dimension table. I'll analyze my fact table at different granularities (day, week, month, year, perhaps others).
Should I put one attribute for each of those granularities in my dimension table (so I have one day attribute, one month attribute, one year attribute ...) or should I just store the date and then calculate everything from it (get the month of the date, the year of the date ...)?
Thanks a lot for your help.
In addition to day, week, month, and year, you should think of other attributes like "company holiday", or "fiscal quarter". This can be an enormous resource for driving the same query off of different time windows.
I would add the attributes of the dates as their own columns. This does not take up significantly more space, and generally gives the query optimiser a better shot at working out how many of the dimension table records match a given criterion (for example, that the day_of_month = 31).
Typically, the more, the merrier.
Here is an example I'm using...
ledger#localhost-> select * from date_dimension where date = '2015-12-25';
-[ RECORD 1 ]----+--------------------
date | 2015-12-25
year | 2015
month | 12
monthname | December
day | 25
dayofyear | 359
weekdayname | Friday
calendarweek | 52
formatteddate | 25. 12. 2015
quartal | Q4
yearquartal | 2015/Q4
yearmonth | 2015/12
yearcalendarweek | 2015/52
weekend | Weekday
americanholiday | Holiday
austrianholiday | Holiday
canadianholiday | Holiday
period | Christmas season
cwstart | 2015-12-21
cwend | 2015-12-27
monthstart | 2015-12-01
monthend | 2015-12-31 00:00:00
It's based on queries from the PostgreSQL wiki here... https://wiki.postgresql.org/wiki/Date_and_Time_dimensions
It would be interesting to augment this with further things:
Religious days (Easter, some of the numerous Saints' days, Ramadan, Jewish festivals, etc)
Statutory holidays for relevant jurisdictions. The firm I work for winds up publicizing Irish banking holidays because a number of the customers pay via bank transfers.
If you operate in France, you might want Lundi, Mardi, Mercredi, ... rather than English day names.
Daylight Savings Time (as true/false) would be a nice addition.
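As a rough illustration of precomputing the attributes as their own columns, here is a Python sketch that derives the purely calendar-based fields of one dimension row. The key names mirror the example record above; date_dimension_row is a hypothetical helper, ISO week numbering is assumed for calendarweek, and the holiday and period columns are left out because they need a lookup source that is not shown here.

```python
import calendar
from datetime import date, timedelta

def date_dimension_row(d):
    """Derive calendar-only attributes for one date (holiday columns not covered)."""
    iso_year, iso_week, iso_weekday = d.isocalendar()  # iso_weekday: Monday=1..Sunday=7
    quarter = (d.month - 1) // 3 + 1
    week_start = d - timedelta(days=iso_weekday - 1)
    last_day = calendar.monthrange(d.year, d.month)[1]
    return {
        "date": d.isoformat(),
        "year": d.year,
        "month": d.month,
        "monthname": calendar.month_name[d.month],
        "day": d.day,
        "dayofyear": d.timetuple().tm_yday,
        "weekdayname": calendar.day_name[d.weekday()],
        "calendarweek": iso_week,
        "quartal": f"Q{quarter}",
        "yearquartal": f"{d.year}/Q{quarter}",
        "weekend": "Weekend" if iso_weekday >= 6 else "Weekday",
        "cwstart": week_start.isoformat(),
        "cwend": (week_start + timedelta(days=6)).isoformat(),
        "monthstart": d.replace(day=1).isoformat(),
        "monthend": d.replace(day=last_day).isoformat(),
    }

for key, value in date_dimension_row(date(2015, 12, 25)).items():
    print(f"{key:13} | {value}")
```

Generating one such row per day and bulk-loading the result is one way to populate the table; the wiki queries linked above do the equivalent inside PostgreSQL.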