Add multiple future dates at one time into redshift SQL - sql

I have a table with the following details:
Date Campaign Visits Orders Revenue
.... .... .... .... ....
Jun-18 Promotion01 10 1 120
Let's say it is called table A.
For reporting purposes, I would like to add new dates as follows:
Date Campaign Visits Orders Revenue
Jul-18 NULL 0 0 0
Aug-18 NULL 0 0 0
Sep-18 NULL 0 0 0
.... .... .... .... ....
Dec-18 NULL 0 0 0
I would like to use a UNION to add only the date rows.
I tried the dateadd function in Amazon Redshift with the following command:
SELECT
to_char(dateadd(month, 18, '01-01-2017'),'yyyy-MM') as plus30,
NULL,
0,
0,
0
It returns the date, but only one row, i.e.:
Date Campaign Visits Orders Revenue
Jul-18 NULL 0 0 0
If I want to return multiple rows as shown above, without adding them one by one, what should I do?
Many thanks for your help!

Frankly, the easiest way to do this sort of task is to make an Excel spreadsheet, fill in all the desired values, save it as CSV, and then use COPY to load it into Redshift. This gives you a convenient interface for creating the data without having to play around with SQL.
The reason you are getting only one row is that SELECT normally works off a table of data, returning one result per row of input. You have not specified a table, so it returns a single row.
Fortunately, you can use generate_series() in some situations to simulate data:
SELECT
to_char(date '01-01-2017' + counter * interval '1 month','yyyy-MM') as plus30,
NULL,
0,
0,
0
FROM generate_series(1, 12) as counter
This generates:
2017-02 (null) 0 0 0
2017-03 (null) 0 0 0
2017-04 (null) 0 0 0
2017-05 (null) 0 0 0
2017-06 (null) 0 0 0
2017-07 (null) 0 0 0
2017-08 (null) 0 0 0
2017-09 (null) 0 0 0
2017-10 (null) 0 0 0
2017-11 (null) 0 0 0
2017-12 (null) 0 0 0
2018-01 (null) 0 0 0
(Your question showed yyyy-MM in the SQL, but Mon-YY in the output, so please adjust accordingly. It is normally better to use yyyy-MM because it is easily sortable as both a number and a string.)
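As a rough illustration of the CSV route suggested at the start of this answer, the placeholder rows could also be produced by a small script instead of a spreadsheet. This is only a sketch: the function names `month_rows` and `rows_to_csv` are made up for the example, the start date and row count mirror the query above, and whether an empty CSV field ends up as NULL depends on the COPY options you use.

```python
import csv
import io
from datetime import date


def month_rows(start: date, count: int):
    """Yield one placeholder row per month, beginning the month after `start`."""
    for offset in range(1, count + 1):
        # Work with a zero-based month index so the year rollover is plain arithmetic.
        index = start.year * 12 + (start.month - 1) + offset
        year, month = divmod(index, 12)
        # Columns: Date (yyyy-MM), Campaign (empty), Visits, Orders, Revenue.
        yield [f"{year:04d}-{month + 1:02d}", None, 0, 0, 0]


def rows_to_csv(rows) -> str:
    """Serialise rows to CSV text; None is written as an empty field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


filler = list(month_rows(date(2017, 1, 1), 12))
csv_text = rows_to_csv(filler)
print(csv_text)
```

The resulting text can be saved to a file and loaded the same way as a hand-made CSV.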

Related

Issues with imported csv string values DBeaver

I've connected to some downloaded CSV files in DBeaver. All of the values come in as strings. I aliased the names of the CSVs first:
SELECT *
FROM us
SELECT *
FROM "us-counties-2023" AS usc
SELECT *
FROM "us-states" AS ust
But then I can't perform any JOINs or figure out how to cast the data. The data looks like this. How can I cast or convert it to usable data that I can perform JOINs and aggregate functions on?
date cases deaths
1/21/2020 1 0
1/22/2020 1 0
1/23/2020 1 0
1/24/2020 2 0
1/25/2020 3 0
1/26/2020 5 0
1/27/2020 5 0
1/28/2020 5 0
1/29/2020 5 0
1/30/2020 6 0
1/31/2020 7 0
2/1/2020 8 0
I attempted to CAST the data, but it keeps throwing errors. I expected to be able to CAST the data to the type I need (date, integer, etc.) and then perform JOINs and other functions.
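The conversion being asked about can be sketched outside the database, which at least makes the target types explicit. The sketch assumes the column names `date`, `cases` and `deaths` from the sample and that dates are written as month/day/year without zero padding; `parse_row` is a name invented for the example, not part of DBeaver.

```python
from datetime import date, datetime


def parse_row(raw: dict) -> dict:
    """Convert one all-string row into a date and two integers."""
    return {
        # strptime accepts "1/21/2020" as well as "01/21/2020" for %m/%d/%Y.
        "date": datetime.strptime(raw["date"], "%m/%d/%Y").date(),
        "cases": int(raw["cases"]),
        "deaths": int(raw["deaths"]),
    }


sample = [
    {"date": "1/21/2020", "cases": "1", "deaths": "0"},
    {"date": "2/1/2020", "cases": "8", "deaths": "0"},
]
typed = [parse_row(row) for row in sample]
print(typed)
```

A date in month/day/year order is not an ISO date string, so any cast inside SQL would need a matching format as well.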

Creating 2 additional columns based on past dates - PostgreSQL

I'm seeking some help after spending a lot of time searching to no avail. I'm rather new to SQL, so any help is greatly appreciated. I've tried a few functions (e.g. GROUP BY, BETWEEN) but can't seem to get it right.
On the PrestoSQL server, I have a table as shown below starting with columns Date, ID and COVID. Using GROUP BY ID, I would like to create a column EverCOVIDBefore which looks back at all past dates of the COVID column to see if there was ever COVID = 1 or not, as well as another column called COVID_last_2_mth which checks if there was ever COVID = 1 within the past 2 months
(Highlighted columns are my expected outcomes)
Link to dataset: https://drive.google.com/file/d/1Sc5Olrx9g2A36WnLcCFMU0YTQ3-qWROU/view?usp=sharing
You can do:
select *,
max(covid) over(partition by id order by date) as ever_covid_before,
max(covid) over(partition by id order by date
range between interval '2 month' preceding and current row)
as covid_last_two_months
from t
Result:
date id covid ever_covid_before covid_last_two_months
----------- --- ------ ------------------ ---------------------
2020-01-15 1 0 0 0
2020-02-15 1 0 0 0
2020-03-15 1 1 1 1
2020-04-15 1 0 1 1
2020-05-15 1 0 1 1
2020-06-15 1 0 1 0
2020-01-15 2 0 0 0
2020-02-15 2 1 1 1
2020-03-15 2 0 1 1
2020-04-15 2 0 1 1
2020-05-15 2 0 1 0
2020-06-15 2 1 1 1
See running example at db<>fiddle.
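To make the two window frames easier to follow, here is the same logic written out as plain loops. It is a sketch rather than a replacement for the query: `months_before` and `add_covid_flags` are hypothetical helper names, the rows are copied from the result table above, and the two-month window is treated as inclusive at both ends, like the RANGE frame.

```python
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` months earlier (assumes that day exists)."""
    index = d.year * 12 + (d.month - 1) - months
    year, month = divmod(index, 12)
    return date(year, month + 1, d.day)


def add_covid_flags(rows):
    """rows: (date, id, covid) tuples. Returns rows with two extra flags."""
    result = []
    for day, pid, covid in sorted(rows, key=lambda r: (r[1], r[0])):
        same_id = [r for r in rows if r[1] == pid]
        # Running maximum over every row of this id up to the current date.
        ever = max(c for d, _, c in same_id if d <= day)
        # Maximum over rows dated within the two months before the current date.
        window_start = months_before(day, 2)
        recent = max(c for d, _, c in same_id if window_start <= d <= day)
        result.append((day, pid, covid, ever, recent))
    return result


covid_by_id = {1: [0, 0, 1, 0, 0, 0], 2: [0, 1, 0, 0, 0, 1]}
rows = [
    (date(2020, month, 15), pid, flag)
    for pid, flags in covid_by_id.items()
    for month, flag in enumerate(flags, start=1)
]
flagged = add_covid_flags(rows)
for row in flagged:
    print(row)
```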

How to show the closest date to the selected one

I'm trying to extract the stock on a specific date. To do so, I'm computing a cumulative sum of stock movements by date, product and warehouse.
select m.codart AS REF,
m.descart AS 'DESCRIPTION',
m.codalm AS WAREHOUSE,
m.descalm AS WAREHOUSEDESCRIP,
m.unidades AS UNITS,
m.entran AS 'IN',
m.salen AS 'OUT',
m.entran*1 + m.salen*-1 as MOVEMENT,
(select sum(m1.entran*1 + m1.salen*-1)
from MOVSTOCKS m1
where m1.codart = m.codart and m1.codalm = m.codalm and m.fecdoc >= m1.fecdoc) as 'CUMULATIVE',
m.PRCMEDIO as 'VALUE',
m.FECDOC as 'DATE',
m.REFERENCIA as 'REF',
m.tipdoc as 'DOCUMENT'
from MOVSTOCKS m
where (m.entran <> 0 or m.salen <> 0)
and (select max(m2.fecdoc) from MOVSTOCKS m2) < '2020-11-30T00:00:00.000'
order by m.fecdoc
Without the and (select max(m2.fecdoc) from MOVSTOCKS m2) < '2020-11-30T00:00:00.000' it shows data like this, which is ok.
REF WAREHOUSE UNITS IN OUT MOVEMENT CUMULATIVE DATE
1 0 2 0 2 -2 -7 2020-11-25
1 1 3 0 3 -3 -3 2020-11-25
1 0 5 0 5 -5 -7 2020-11-25
1 0 9 9 0 9 2 2020-11-26
2 0 2 2 0 2 2 2020-11-26
1 0 1 1 0 1 3 2020-12-01
The problem is, with the subselect in the where clause it returns no results (I think it is because it just looks for the max date and says it is bigger than 2020-11-30). I would like it to show the closest dates (all of them, for each product and warehouse) to the selected one, in this case 2020-11-30.
It should look like this:
REF WAREHOUSE UNITS IN OUT MOVEMENT CUMULATIVE DATE
1 1 3 0 3 -3 -3 2020-11-25
1 0 9 9 0 9 2 2020-11-26
2 0 2 2 0 2 2 2020-11-26
Sorry if I'm not clear. Ask me if I have to clarify anything
Thank you
I am guessing that you want something like this:
select t.*
from (select m.*,
sum(m.entran - m.salen) over (partition by m.codart, m.codalm order by m.fecdoc) as cumulative,
max(m.fecdoc) over (partition by m.codart, m.codalm) as max_fecdoc
from MOVSTOCKS m
where m.fecdoc < '2020-11-30'
) t
where t.fecdoc = t.max_fecdoc;
The subquery calculates the cumulative amount of stock using window functions and filters for records before the cutoff date. The outer query selects the most recent record for each combination of codart/codalm, which seems to be how you are identifying a product.
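The same idea can be checked against the sample rows from the question with a few lines of Python. This is a sketch under assumptions: `closest_before` is a made-up name, the tuples carry only the columns needed, and it returns one entry per product and warehouse (latest date before the cutoff plus net stock) rather than full rows.

```python
from collections import defaultdict
from datetime import date


def closest_before(movements, cutoff):
    """movements: (codart, codalm, entran, salen, fecdoc) tuples."""
    totals = defaultdict(int)  # net stock per (codart, codalm) before the cutoff
    latest = {}                # latest movement date per (codart, codalm)
    for codart, codalm, entran, salen, fecdoc in movements:
        if fecdoc >= cutoff:
            continue  # ignore movements on or after the selected date
        key = (codart, codalm)
        totals[key] += entran - salen
        if key not in latest or fecdoc > latest[key]:
            latest[key] = fecdoc
    return {key: (latest[key], totals[key]) for key in sorted(latest)}


movements = [
    (1, 0, 0, 2, date(2020, 11, 25)),
    (1, 1, 0, 3, date(2020, 11, 25)),
    (1, 0, 0, 5, date(2020, 11, 25)),
    (1, 0, 9, 0, date(2020, 11, 26)),
    (2, 0, 2, 0, date(2020, 11, 26)),
    (1, 0, 1, 0, date(2020, 12, 1)),
]
stock = closest_before(movements, date(2020, 11, 30))
for key, value in stock.items():
    print(key, value)
```

The cumulative value at the latest date before the cutoff is simply the sum of all earlier movements, which is why a single pass is enough here.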

Calculating attrition in SQL

I am more or less new to SQL, so I would really appreciate it if anyone could give some hints on how to start on the task below.
We have a database of donations made by regular donors to an NGO.
Fields: donor_id, date
We need to create an attrition table listing the donation periods and the proportion of donors who are still donating to the organization after n months.
As we count donors from their first donation, the first period is 100%; then the query should check whether the donor made a donation in the 2nd, 3rd, ... Nth month after the donor's first donation.
Donation 1 2 3 4 5 6 7 8 9 10 11
donor 1 1 1 1 0 1 0 1 0 0 0 0
donor 2 1 1 0 0 0 1 0 0 1 0 0
donor 3 1 0 1 0 0 0 0 0 0 0 0
Any idea? :) Thank you!
PS: until now we used Excel or Google Sheets for this, but now we have a database with 50 million rows, so I have been told to find a solution quickly.
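One way to pin down the calculation before writing any SQL is a small sketch like the one below. It rests on assumptions the question does not settle: a "period" is taken to be the number of calendar months since the donor's first donation (the first month is period 1), a donor counts as active in a period if they donated at least once in it, and the dates are invented so that they reproduce the three sample donors. `retention_by_month` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


def retention_by_month(donations):
    """donations: (donor_id, date) pairs. Returns {period: share of donors}."""
    first = {}
    for donor, day in donations:
        if donor not in first or day < first[donor]:
            first[donor] = day
    active = defaultdict(set)
    for donor, day in donations:
        start = first[donor]
        # Period 1 is the month of the first donation.
        period = (day.year - start.year) * 12 + (day.month - start.month) + 1
        active[period].add(donor)
    total = len(first)
    return {period: len(active[period]) / total for period in sorted(active)}


# Invented dates: each donor starts in January 2020 and donates in the
# periods marked with 1 in the sample table.
periods_by_donor = {1: [1, 2, 3, 5, 7], 2: [1, 2, 6, 9], 3: [1, 3]}
donations = [
    (donor, date(2020, period, 1))
    for donor, periods in periods_by_donor.items()
    for period in periods
]
retention = retention_by_month(donations)
for period, share in retention.items():
    print(period, round(share, 2))
```

Periods in which nobody donated (4 and 8 in the sample) are simply absent from the result and would be reported as 0%.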

How to perform a Distinct Sum using MDX?

So I have data like this:
Date EMPLOYEE_ID HEADCOUNT TERMINATIONS
1/31/2011 1 1 0
2/28/2011 1 1 0
3/31/2011 1 1 0
4/30/2011 1 1 0
...
1/31/2012 1 1 0
2/28/2012 1 1 0
3/31/2012 1 1 0
1/31/2012 2 1 0
2/28/2011 2 1 0
3/31/2011 2 1 0
4/30/2011 2 0 1
1/31/2012 3 1 0
2/28/2011 3 1 0
3/31/2011 3 1 0
4/30/2011 3 1 0
...
1/31/2012 3 1 0
2/28/2012 3 1 0
3/31/2012 3 1 0
And I want to sum up the headcount, but I need to remove the duplicate entries from the sum by the employee_id. From the data you can see employee_id 1 occurs many times in the table, but I only want to add its headcount column once. For example if I rolled up on year I might get a report using this query:
with member [Measures].[Distinct HeadCount] as
??? how do I define this???
select { [Date].[YEAR].children } on ROWS,
{ [Measures].[Distinct HeadCount] } on COLUMNS
from [someCube]
It would produce this output:
YEAR Distinct HeadCount
2011 3
2012 2
Any ideas how to do this with MDX? Is there a way to control which row is used in the sum for each employee?
You can use an expression like this:
WITH MEMBER [Measures].[Distinct HeadCount] AS
Sum(NonEmpty('the set of the employee ids', 'all the dates of the current year (ie [Date].[YEAR].CurrentMember)'), [Measures].[HeadCount])
If you want a more generic expression you can use this:
WITH MEMBER [Measures].[Distinct HeadCount] AS
Sum(NonEmpty('the set of the employee ids',
Descendants(Axis(0).Item(0).Item(0).Hierarchy.CurrentMember, Axis(0).Item(0).Item(0).Hierarchy.CurrentMember.Level, LEAVES)),
IIf(IsLeaf(Axis(0).Item(0).Item(0).Hierarchy.CurrentMember),
[Measures].[HeadCount],
NULL))