In Snowflake, I have two tables available:
"SEG_HISTO": a segmentation that is run once a month, with the columns: Client ID / Date (1st of each month) / Segment.
"TCK": a table that contains the tickets, with the columns: Ticket ID / Customer ID / Date / Amount.
For each customer ID in the "SEG_HISTO" table, I searched for all the customer's tickets over a rolling year and associated the sum of the amount spent:
SELECT SEG_OMNI.*, TCK_12M.TOTAL_AMOUNT_HT
FROM "SHARE"."DATAMARTS_DATASCIENCE"."SEG_OMNI" SEG_OMNI
LEFT OUTER JOIN
(
SELECT DISTINCT PR_ID_BU,
SUM(TOTAL_AMOUNT_HT) AS "TOTAL_AMOUNT_HT",
COUNT(*) "NB_ACHAT"
FROM
(
SELECT * FROM "SHARE"."RAW_BDC"."TCK"
WHERE TO_DATE(DT_SALE) >= DATEADD(YEAR, -1, '2022-07-01') -- <<<===== date entered manually
)
GROUP BY PR_ID_BU
) TCK_12M
ON SEG_OMNI."pr_id_bu" = TCK_12M.PR_ID_BU
Now I need to create a for loop that repeats this for each date in the SEG_OMNI table (SELECT DISTINCT TO_DATE(DT_MAJ) DT FROM "SHARE"."DATAMARTS_DATASCIENCE"."SEG_HISTO") and stacks the output in a view.
This is where I am stuck.
Thank you in advance for your help.
As Dave said in the comments, it would be better if you could figure out how to run all this in one query, instead of running the same query multiple times.
But as you are asking how to output the results of multiple queries out of one stored procedure I'm going to give you the pattern for that here. I'm also assuming you want this in a SQL script (we could use Python/Java/JS instead):
declare
your_var string;
all_dates cursor for (
select dates
from your_table
);
begin
-- create a table to store results
create or replace temp table discovery_results(x string, y string, z int);
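for record in all_dates do
-- copy the current date into a variable so it can be bound in the query
your_var := record.dates;
-- for each date, run the query and insert the results into the table created above
insert into discovery_results
select x, y, z
from the_query
where date_column = :your_var -- date_column is a placeholder for your own filter
;
end for;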
return 'run [select * from discovery_results] to find the results';
end;
select *
from discovery_results
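The single-query approach recommended at the top of this answer could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for Snowflake, so it is only an illustration: the table and column names (seg_histo, tck, client_id, dt, amount) are simplified assumptions, and so is the upper bound on the ticket date, which the original query does not have. In Snowflake the date arithmetic would be written with DATEADD(YEAR, -1, s.dt) instead of SQLite's date(s.dt, '-1 year').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seg_histo (client_id INTEGER, dt TEXT, segment TEXT);
    CREATE TABLE tck (ticket_id INTEGER, client_id INTEGER, dt TEXT, amount REAL);
    INSERT INTO seg_histo VALUES
        (1, '2022-06-01', 'A'), (1, '2022-07-01', 'B'), (2, '2022-07-01', 'A');
    INSERT INTO tck VALUES
        (10, 1, '2021-06-15', 50.0),  -- inside the 2022-06-01 window only
        (11, 1, '2022-06-10', 20.0),  -- inside the 2022-07-01 window only
        (12, 1, '2022-01-05', 30.0);  -- inside both windows
""")

# Join every segmentation row to the tickets of the preceding 12 months,
# so each segmentation date gets its own rolling-year total in one pass.
rows = conn.execute("""
    SELECT s.client_id, s.dt, s.segment,
           COALESCE(SUM(t.amount), 0) AS total_amount,
           COUNT(t.ticket_id)         AS nb_tickets
    FROM seg_histo AS s
    LEFT JOIN tck AS t
           ON t.client_id = s.client_id
          AND t.dt >= date(s.dt, '-1 year')
          AND t.dt <  s.dt
    GROUP BY s.client_id, s.dt, s.segment
    ORDER BY s.client_id, s.dt
""").fetchall()

for row in rows:
    print(row)
```

Because the date range is part of the join condition, no loop over the segmentation dates is needed, and the query can be wrapped directly in a view.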
I am trying to create a table in Impala (SQL) that takes rows from a parquet table. The data represents bike rides in a city. Rows will be imported into the new table if their starting code (a string, ex: '6100') shows up more than 100 times in the first table. Here's what I have so far:
-- I am using Apache Impala via the Hue Editor
invalidate metadata;
set compression_codec=none;
invalidate metadata;
Set compression_codec=gzip;
create table bixirides_parquet (
start_date string, start_station_code string,
end_date string, end_station_code string,
duration_sec int, is_member int)
stored as parquet;
Insert overwrite table bixirides_parquet select * from bixirides_avro;
invalidate metadata;
set compression_codec=none;
create table impala_out stored as textfile as select start_date, start_station_code, end_date, end_station_code, duration_sec, is_member, count(start_station_code) as count
from bixirides_parquet
having count(start_station_code)>100;
For some reason the statement will run, but no rows are inserted in the new table. It should import a row into the new table if that row's starting code shows up more than 100 times in the original table. I think I'm wording my select statement improperly, but I'm not sure exactly how.
I think the final query you want is:
select start_date, start_station_code, end_date,
end_station_code, duration_sec, is_member, cnt
from (select bp.*,
count(*) over (partition by start_station_code) as cnt
from bixirides_parquet bp
) bp
where cnt > 100;
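To see what the window function does, here is a small sketch that runs the same query shape on made-up data. It uses Python's sqlite3 module rather than Impala (window functions need SQLite 3.25 or later), and the rides table, its rows, and the threshold of 2 are assumptions chosen to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (start_station_code TEXT, duration_sec INTEGER)")
# Hypothetical data: station '6100' appears 3 times, '6200' once.
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [("6100", 300), ("6100", 420), ("6200", 150), ("6100", 90)],
)

threshold = 2  # stands in for the 100 used in the question
# COUNT(*) OVER (PARTITION BY ...) attaches the per-station count to every
# row without collapsing rows, so the outer WHERE can filter on it.
rows = conn.execute("""
    SELECT start_station_code, duration_sec, cnt
    FROM (SELECT r.*,
                 COUNT(*) OVER (PARTITION BY start_station_code) AS cnt
          FROM rides AS r) AS counted
    WHERE cnt > ?
    ORDER BY duration_sec
""", (threshold,)).fetchall()

for row in rows:
    print(row)
```

Every ride from station '6100' is kept along with its count, and the single ride from '6200' is dropped.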
I have a Library loans table which records the LoanID, OutDate, DueDate, and InDate for each book loan. I want to create a function that will accept the previous year (ex. 2017) as input and output the number of loans for that year by counting the loan ids. I am able to do this but the number is repeated multiple times in the result set and I only want it to return one value.
create function Books.numLoansLastYear
(
@year as int
)
returns int
as
begin
declare @numLoans as int
select @numLoans = COUNT(LoanID)
from Reservations.Loan
where year(OutDate) = @year
return @numLoans
end
;
go
To test I am using the following query:
select Books.numLoansLastYear(year(GETDATE())-1) as 'Number of Loans'
from Reservations.Loan
;
The only way that I am able to resolve this is by using DISTINCT in my test query so I am wondering if my function is incorrect.
Any ideas?
I am a little unclear on your question. If your query were:
select l.*
from Reservations.Loan l;
You would not be surprised at getting multiple rows.
The same is true if you select anything with a from clause and no filtering (or other conditions): the select list is evaluated once for every row in the table. This is true even when the select list is only a function call or a constant.
I think you simply want:
select Books.numLoansLastYear(year(GETDATE())-1) as [Number of Loans];
Note the lack of from clause.
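The effect is easy to reproduce. The sketch below uses Python's sqlite3 module instead of SQL Server, with a scalar subquery standing in for the function call; the loan table and its three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_id INTEGER, out_date TEXT)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [(1, "2017-02-01"), (2, "2017-09-30"), (3, "2018-01-15")],
)

# Scalar expression standing in for the function call in the question.
count_2017 = "(SELECT COUNT(loan_id) FROM loan WHERE strftime('%Y', out_date) = '2017')"

# With a FROM clause the expression is evaluated once per row of loan.
with_from = conn.execute(f"SELECT {count_2017} FROM loan").fetchall()
# Without a FROM clause it is evaluated exactly once.
without_from = conn.execute(f"SELECT {count_2017}").fetchall()

print(with_from)
print(without_from)
```

The first query returns the same count three times, once per row in the table, while the second returns it once.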
I use Postgres and I need some help from all you PG experts...
I am looking to track counts from a large set of source tables whose counts keep changing every day. I want to store the table name, row count and table size in a tracker table, along with a created_dttm column that shows when the row count was recorded from the source table. This is for trending how the table counts change over time and for spotting peaks.
insert into tracker_table( tablename, rowcount, tablesize, timestamp)
from
(
(select schema.tablename ... - not sure how to drive this to pick up a list of tables??
, select count(*) from schema.tablename
, SELECT pg_size_pretty(pg_total_relation_size('"schema"."tablename"'))
, select created_dttm from schema.tablename
)
);
Additionally, I want to get a particular column from source table for a fourth column. This would be a created_dttm timestamp field in the source table, and I want to run a simple sql to get this date to the tracker table. Any suggestions how to attack this problem?
Before reading the code, please consider the following:
Instead of selecting several subqueries, join them into one query if you can; for example, select (select 1 from t), (select 2 from t) can be refactored to select 1, 2 from t.
pg_total_relation_size reports the disk space used by the table's pages, including its indexes and TOAST data, so it is the size of the table on disk, not the size of the data in it.
You need an aggregate on your created_dttm column (I used max(oid) instead); otherwise your subquery returns more than one row and you won't be able to insert the result.
Instead of select count(*), consider using the statistics in pg_stat_all_tables. Counting can be very expensive, and the accuracy of count(*) is undermined by the fact that the same query will return a different number a minute later, and you probably won't run the count every two seconds.
Code:
t=# create table so30 (n text, c int, s text, o int);
CREATE TABLE
t=# do
$$
declare
_r record;
_s text;
begin
for _r in (values('pg_database'),('pg_roles')) loop
_s := format('select %1$L,(select count(*) from %1$I), (SELECT pg_size_pretty(pg_total_relation_size(%1$L))), (select max(oid) from %1$I)',_r.column1);
execute format('insert into so30 %s',_s);
end loop;
end;
$$
;
DO
t=# select * from so30;
n | c | s | o
-------------+---+---------+-------
pg_database | 4 | 72 kB | 16384
pg_roles | 2 | 0 bytes | 4200
(2 rows)
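The same loop-and-insert pattern can be tried outside Postgres. The sketch below drives it from Python's sqlite3 module; the orders and users tables, the tracker columns, and the quote_ident helper are all hypothetical, and the table size column is left out because SQLite has no direct equivalent of pg_total_relation_size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, created_dttm TEXT);
    CREATE TABLE users  (id INTEGER, created_dttm TEXT);
    INSERT INTO orders VALUES (1, '2023-01-01'), (2, '2023-01-03');
    INSERT INTO users  VALUES (1, '2022-12-31');
    CREATE TABLE tracker (tablename TEXT, row_count INTEGER,
                          last_created TEXT, recorded_at TEXT);
""")

def quote_ident(name: str) -> str:
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

for table in ("orders", "users"):
    # One statement per table: the name is passed as a bound value for the
    # tablename column and as a quoted identifier in the FROM clause.
    # MAX() keeps the result to a single row per table.
    conn.execute(
        f"INSERT INTO tracker "
        f"SELECT ?, COUNT(*), MAX(created_dttm), datetime('now') "
        f"FROM {quote_ident(table)}",
        (table,),
    )

snapshot = conn.execute(
    "SELECT tablename, row_count, last_created FROM tracker ORDER BY tablename"
).fetchall()
print(snapshot)
```

Running the loop on a schedule appends one row per table per run, which is what makes the trend over time visible.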
I'm writing a program to email employees whose certifications are set to expire within the next 3 months. Since some employees have already renewed their certifications, I'm creating a temporary table of "Good Ids": employees who have a certification that won't expire for at least three months.
To that end I am using:
CREATE GLOBAL TEMPORARY TABLE GOOD_IDS(
INTERNAL_EMPL_ID VARCHAR(10)
) ON COMMIT PRESERVE ROWS;
INSERT INTO GOOD_IDS (
SELECT DISTINCT (INTERNAL_EMPL_ID)
FROM LICENSE
WHERE LICENSE_TYP_CD IN ('STD') AND EXPIRATION_DT >= CURRENT DATE + 3 months);
SELECT * FROM GOOD_IDS
I've run the second select by itself and can confirm that it returns ~3000 rows. However, when I run all three statements I get zero rows. What am I missing?
OK. I rewrote the SQL to use a WITH statement instead.
So it looks like this now
WITH
GOOD_ID_LIST AS (SELECT DISTINCT (INTERNAL_EMPL_ID) FROM LICENSE WHERE LICENSE_TYP_CD IN ('STD') AND EXPIRATION_DT >= CURRENT DATE + 3 months)
SELECT ...
I have a table with records that look like this:
CREATE TABLE sample (
ix int unsigned auto_increment primary key,
start_active datetime,
last_active datetime
);
I need to know how many records were active on each of the last 30 days. The days should also be sorted incrementing so they are returned oldest to newest.
I'm using MySQL and the query will be run from PHP but I don't really need the PHP code, just the query.
Here's my start:
SELECT COUNT(1) cnt, DATE(?each of last 30 days?) adate
FROM sample
WHERE adate BETWEEN start_active AND last_active
GROUP BY adate;
Do an outer join.
No table? Make a table. I always keep a dummy table around just for this.
create table artificial_range(
id int not null primary key auto_increment,
name varchar( 20 ) null ) ;
-- or whatever your database requires for an auto increment column
insert into artificial_range( name ) values ( null );
-- create one row.
insert into artificial_range( name ) select name from artificial_range;
-- you now have two rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have four rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have eight rows
-- etc.
insert into artificial_range( name ) select name from artificial_range;
-- you now have 1024 rows, with ids 1-1024
Now make it convenient to use, and limit it to 30 days, with a view:
Edit: JR Lawhorne notes:
You need to change "date_add" to "date_sub" to get the previous 30 days in the created view.
Thanks JR!
create view each_of_the_last_30_days as
select date_sub( now(), interval (id - 1) day ) as adate
from artificial_range where id < 32;
Now use this in your query (I haven't actually tested your query, I'm just assuming it works correctly):
Edit: I should be joining the other way:
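-- count a.ix rather than *, so days with no matching record report 0 instead of 1
SELECT COUNT(a.ix) cnt, b.adate
FROM each_of_the_last_30_days b
left outer join sample a
on ( b.adate BETWEEN a.start_active AND a.last_active)
GROUP BY b.adate
ORDER BY b.adate; -- oldest to newest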
SQL is great at matching sets of values that are stored in the database, but it isn't so great at matching sets of values that aren't in the database. So one easy workaround is to create a temp table containing the values you need:
CREATE TEMPORARY TABLE days_ago (d SMALLINT);
INSERT INTO days_ago (d) VALUES
(0), (1), (2), ... (29), (30);
Now you can compare a date that is d days ago to the span between start_active and last_active of each row. Count how many matching rows in the group per value of d and you've got your count.
SELECT CURRENT_DATE - INTERVAL d DAY AS adate,
COUNT(sample.ix) cnt -- counts matching rows only, so empty days give 0
FROM days_ago
LEFT JOIN sample ON (CURRENT_DATE - INTERVAL d DAY BETWEEN start_active AND last_active)
GROUP BY d
ORDER BY d DESC; -- oldest to newest
Another note: you can't use column aliases defined in your select-list in expressions until you get to the GROUP BY clause. Actually, in standard SQL you can't use them until the ORDER BY clause, but MySQL supports using aliases in GROUP BY and HAVING clauses as well.
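As a quick check of the days-table approach, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The reference date, the three-day range, and the two sample rows are assumptions made so the result is reproducible; SQLite's date(x, '-N days') plays the role of the date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (ix INTEGER PRIMARY KEY, start_active TEXT, last_active TEXT);
    CREATE TABLE days_ago (d INTEGER);
""")
# Only 3 days here (0, 1, 2 days ago) to keep the output short.
conn.executemany("INSERT INTO days_ago VALUES (?)", [(d,) for d in range(3)])

today = "2024-03-10"  # fixed reference date so the result is reproducible
conn.executemany(
    "INSERT INTO sample (start_active, last_active) VALUES (?, ?)",
    [("2024-03-09", "2024-03-10"), ("2024-03-10", "2024-03-10")],
)

rows = conn.execute("""
    SELECT date(:today, '-' || d || ' days') AS adate,
           COUNT(s.ix) AS cnt      -- counts matches only; COUNT(*) would give 1 for empty days
    FROM days_ago
    LEFT JOIN sample AS s
           ON date(:today, '-' || d || ' days') BETWEEN s.start_active AND s.last_active
    GROUP BY d
    ORDER BY d DESC                -- oldest to newest
""", {"today": today}).fetchall()

for row in rows:
    print(row)
```

Counting a column from the outer-joined table matters here: a day with no active records still produces one row from the left join, and COUNT(*) would report that row as a count of 1.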
Turn the date into a unix timestamp, which is seconds, in your query and then just look for the difference to be <= the number of seconds in a month.
You can find more information here:
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_unix-timestamp
If you need help with the query please let me know, but MySQL has nice functions for dealing with datetime.
[Edit] I was confused about the real question. I need to finish the lawn, but I want to write this down before I forget.
To get a count per day, keep the where clause as I described above to limit the rows to the past 30 days, but group by day: convert each start to a day in the select list and count the rows in each group.
This assumes that each use is limited to one day; if the start and end dates can span several days, it will be trickier.