Automatically execute script for each day - sql

I have a script which automatically calculates the metrics for yesterday and inserts the data into a table, but I want to fill the table for all missing dates. Is it possible to do this automatically, or should I manually start the script for each day?
Here is the simplified version of the script:
select sum(amount),id,yesterday
where date < yesterday
group by id
But for example the day before yesterday is also missing in the table, so I want the above script to execute, and also the script:
select sum(amount),id,day_before_yesterday
where date < day_before_yesterday
group by id

Use the last date you have in the target table:
select sum(amount),id, max(date)
from table
where date < (select max(date) - interval 1 day from target)
group by id
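If several days are missing, another option is to loop over every day up to yesterday and run the same aggregate for each day that is not yet in the target table. Below is a minimal sketch of that idea using Python's sqlite3 module. The table names source and target, the column names, and the backfill helper are all assumptions made for illustration and are not from the original script.

```python
import sqlite3
from datetime import date, timedelta


def backfill(conn, today):
    """Insert metric rows for every day up to yesterday that is missing from target."""
    yesterday = today - timedelta(days=1)
    loaded = {row[0] for row in conn.execute("SELECT DISTINCT day FROM target")}
    first = conn.execute("SELECT MIN(date) FROM source").fetchone()[0]
    if first is None:
        return []  # nothing to aggregate
    # The first day with any data strictly before it is the day after the earliest row.
    day = date.fromisoformat(first) + timedelta(days=1)
    filled = []
    while day <= yesterday:
        key = day.isoformat()
        if key not in loaded:
            # Same shape as the simplified script: sum of everything before that day.
            conn.execute(
                "INSERT INTO target (total, id, day) "
                "SELECT SUM(amount), id, ? FROM source WHERE date < ? GROUP BY id",
                (key, key),
            )
            filled.append(key)
        day += timedelta(days=1)
    return filled


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, amount INTEGER, date TEXT)")
conn.execute("CREATE TABLE target (total INTEGER, id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(1, 10, "2024-01-01"), (1, 5, "2024-01-02"), (2, 7, "2024-01-02")],
)
conn.execute("INSERT INTO target VALUES (10, 1, '2024-01-02')")  # already loaded

filled = backfill(conn, date(2024, 1, 5))
print(filled)
```

Because days already present in target are skipped, the helper can be scheduled daily and will also catch up after a missed run.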


BigQuery - Schedule query on an external partitioned table with the keyword @run_date

(Because it is client data, I've replaced the project name and the dataset name with ****** in this post.)
I'm trying to create a new scheduled query in BigQuery on Google Cloud Platform.
The problem is that I get this error in the web query editor:
Cannot query over table '******.raw_bounce_rate' without a filter over column(s) 'dt' that can be used for partition elimination
The thing is, I do filter on the column dt.
Here is the schema of my external partitioned table:
Tracking_Code STRING
Pages STRING NULLABLE
Clicks_to_Page INTEGER
Path_Lengths INTEGER
Visit_Number INTEGER
Visitor_ID STRING
Mobile_Device_Type STRING
All_Visits INTEGER
dt DATE
dt is the partition field, and I selected the option "Require partition filter".
Here is the simplified SQL of my query:
WITH yesterday_raw_bounce_rate AS (
  SELECT *
  FROM `******.raw_bounce_rate`
  WHERE dt = DATE_SUB(@run_date, INTERVAL 1 DAY)
),
entries_table AS (
  SELECT dt,
    IFNULL(Tracking_Code, "sans campagne") AS tracking_code,
    IFNULL(Pages, "page non trackée") AS pages,
    Visitor_ID,
    Path_Lengths,
    Clicks_to_Page,
    SUM(all_visits) AS somme_visites
  FROM
    yesterday_raw_bounce_rate
  GROUP BY
    dt,
    Tracking_Code,
    Pages,
    Visitor_ID,
    Path_Lengths,
    Clicks_to_Page
  HAVING
    somme_visites = 1 AND Clicks_to_Page = 1
)
SELECT * FROM entries_table
If I remove the condition
Clicks_to_Page = 1
or if I replace
DATE_SUB(@run_date, INTERVAL 1 DAY)
with a hard-coded date, the query is accepted by BigQuery. This does not make sense to me.
There is currently an open issue for this error, which occurs when @run_date is used in the filter of a scheduled query against a partitioned table that requires a partition filter. The engineering team is working on it, although there is no ETA.
In your scheduled query, you can use one of the following two workarounds with @run_date.
First option:
DECLARE runDateVariable DATE DEFAULT @run_date;
#your code...
WHERE date = DATE_SUB(runDateVariable, INTERVAL 1 DAY)
Second option:
DECLARE runDateVariable DATE DEFAULT CAST(@run_date AS DATE);
#your code...
WHERE date = DATE_SUB(runDateVariable, INTERVAL 1 DAY)
In addition, you can use CURRENT_DATE() instead of @run_date, as shown below:
DECLARE runDateVariable DATE DEFAULT CURRENT_DATE();
#your code...
WHERE date = DATE_SUB(runDateVariable, INTERVAL 1 DAY)
UPDATE
I have set up another scheduled query to run daily against a table partitioned by DATE on a field called date_formatted, with the partition filter required. Then I set up a backfill so I could see the result of the scheduled query for previous days. Below is the code I used:
DECLARE runDateVariable DATE DEFAULT @run_date;
SELECT @run_date AS run_date, date_formatted, fullvisitorId
FROM `project_id.dataset.table_name`
WHERE date_formatted > DATE_SUB(runDateVariable, INTERVAL 1 DAY)

sql save row count of query every day

I want to create a view or table that counts, e.g., the total number of students on the day the query is executed, and adds a new result row each day. The problem is that the date column changes every day to the current date.
SELECT
COUNT(*) AS no_of_Students
,CAST(GETDATE() AS DATE) as DATE
FROM mySchool
WHERE students=1
No of students  Date
--------------  ----------
8               2019.02.06
15              2019.02.07
(the next row should be added automatically when the query runs the next day)
You should not be using GETDATE(). You need to pick the date column you have in your mySchool table.
You need to write your query like the following:
SELECT
COUNT(*) as [no_of_Students]
,CAST([DateColumn] AS DATE) as [DATE]
FROM [mySchool]
GROUP BY CAST([DateColumn] AS DATE)
ORDER BY CAST([DateColumn] AS DATE)
Note: You need to replace DateColumn with the correct column name.
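If mySchool has no per-row date column to group by, an alternative is to append one snapshot row per day to a separate history table. The sketch below shows that idea with Python's sqlite3 module. The student_counts table, its columns, the name column, and the record_daily_count helper are assumptions made for illustration. On SQL Server the same INSERT ... SELECT would be run by whatever job scheduler you use.

```python
import sqlite3
from datetime import date


def record_daily_count(conn, today):
    """Store the student count for `today`; re-running on the same day overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO student_counts (snapshot_date, no_of_students) "
        "SELECT ?, COUNT(*) FROM mySchool WHERE students = 1",
        (today.isoformat(),),
    )


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mySchool (name TEXT, students INTEGER)")
conn.execute(
    "CREATE TABLE student_counts (snapshot_date TEXT PRIMARY KEY, no_of_students INTEGER)"
)

conn.executemany("INSERT INTO mySchool VALUES (?, 1)", [("a",), ("b",)])
record_daily_count(conn, date(2019, 2, 6))

conn.execute("INSERT INTO mySchool VALUES ('c', 1)")
record_daily_count(conn, date(2019, 2, 7))
record_daily_count(conn, date(2019, 2, 7))  # same day again: still a single row

history = conn.execute(
    "SELECT snapshot_date, no_of_students FROM student_counts ORDER BY snapshot_date"
).fetchall()
print(history)
```

Making snapshot_date the primary key keeps the job idempotent, so an accidental second run on the same day does not create a duplicate row.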

Analytics in sql

I have a table with the following structure:
user_id (int) - event (str) - time (timestamp) - value (int)
Event can take several values: install, login, buy, etc.
I need to get all of a user's records from before they updated the application.
For example, my application was released on 1 January 2019, but users may install the new version on any day.
How can I get sum(value) for the first and second versions?
I tried a self-join on the table, but I don't think that is the best solution.
Help me, please.
Here is the definition of your table (as I understood it from your comments and description):
CREATE TABLE user_events (
user_id integer,
event varchar,
time timestamp without time zone,
value integer
);
Here is the query you asked for:
SELECT
    COUNT(user_id),
    SUM(value)
FROM (
    SELECT
        DISTINCT ON (user_id)
        user_id, time, value
    FROM user_events
    WHERE event = 'install'
    ORDER BY user_id, time DESC
) last_installations
WHERE
    time BETWEEN date '2018-01-01' AND date '2019-01-01';
Some explanations:
The inner query (last_installations) selects the last install event for each user.
The outer query keeps only installations of the first and second versions and calculates SUM(value) (as you asked) and COUNT(user_id) (added for clarity: how many users are on versions 1 and 2 now).
UPDATE
Sum of value for all events by version:
SELECT
    event,
    CASE
        -- half-open ranges, so no day falls between two versions
        WHEN time >= date '2018-01-01' AND time < date '2018-06-01' THEN 1
        WHEN time >= date '2018-06-01' AND time < date '2019-01-01' THEN 2
        WHEN time >= date '2019-01-01' THEN 3
        ELSE 0 -- unknown version
    END AS version,
    SUM(value)
FROM user_events
GROUP BY 1,2
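To check how the version buckets behave at the boundaries, here is a small runnable sketch using Python's sqlite3 module. It uses half-open ranges (>= start, < next start) so that every timestamp lands in exactly one bucket. The boundary dates are illustrative, loosely based on the answer above, and the sample rows are made up. Timestamps are stored as ISO text, which SQLite compares lexicographically.

```python
import sqlite3

VERSION_SQL = """
SELECT event,
       CASE
           WHEN time >= '2018-01-01' AND time < '2018-06-01' THEN 1
           WHEN time >= '2018-06-01' AND time < '2019-01-01' THEN 2
           WHEN time >= '2019-01-01' THEN 3
           ELSE 0  -- unknown version
       END AS version,
       SUM(value) AS total_value
FROM user_events
GROUP BY event, version
ORDER BY event, version
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_events (user_id INTEGER, event TEXT, time TEXT, value INTEGER)"
)
# Hypothetical rows placed right on the version boundaries.
conn.executemany(
    "INSERT INTO user_events VALUES (?, ?, ?, ?)",
    [
        (1, "buy", "2018-05-31 12:00:00", 10),
        (1, "buy", "2018-06-01 00:00:00", 20),
        (2, "buy", "2018-12-31 23:59:59", 5),
        (2, "buy", "2019-01-01 00:00:00", 7),
        (3, "login", "2017-12-31 23:00:00", 1),
    ],
)

totals = conn.execute(VERSION_SQL).fetchall()
for row in totals:
    print(row)
```

With closed ranges that end at 23:59:59, it is easy to leave a gap (a whole day, or fractional seconds), which is why the half-open form is used here.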

How to get previous months data using KDB query?

How can I retrieve data from the previous month in such a way that, if the query were to be automated, the date value in the query would change accordingly, every month?
So for example:
When query is run on 2012.01.01 --> select * from Table where date >= 2011.12.01
When query is run on 2012.02.01 --> select * from Table where date >= 2012.01.01
When query is run on 2012.03.01 --> select * from Table where date >= 2012.02.01
and so on..
Help would be much appreciated!
I assume you are using Oracle Database:
select * from Table where date >= ADD_MONTHS(TRUNC(SYSDATE),-1)
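Note that subtracting one month from today matches the examples in the question only when the query runs on the 1st. If the job may run on any day and the cutoff should always be the first day of the previous month, the date arithmetic can be sketched as follows in Python. The helper name previous_month_start is hypothetical. In Oracle, truncating to the month, for example TRUNC(ADD_MONTHS(SYSDATE, -1), 'MM'), should give the same cutoff.

```python
from datetime import date


def previous_month_start(today):
    """Return the first day of the month before `today`."""
    if today.month == 1:
        # January rolls back to December of the previous year.
        return date(today.year - 1, 12, 1)
    return date(today.year, today.month - 1, 1)


# The three runs from the question, plus one mid-month run.
cutoffs = [
    previous_month_start(date(2012, 1, 1)),
    previous_month_start(date(2012, 2, 1)),
    previous_month_start(date(2012, 3, 1)),
    previous_month_start(date(2012, 3, 15)),
]
print(cutoffs)
```

The resulting date can then be passed as a parameter to the query instead of hard-coding it.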

Help me build a SQL select statement

SQL isn't my greatest strength and I need some help building a select statement.
Basically, this is my requirement. The table stores a list of names and a timestamp of when the name was entered in the table. Names may be entered multiple times during a week, but only once a day.
I want the select query to return names that were entered anytime in the past 7 days, but not today.
To get a list of names entered today, this is the statement I have:
Select * from table where Date(timestamp) = Date(now())
And to get a list of names entered in the past 7 days, not including today:
Select * from table where (Date(now())- Date(timestamp) < 7) and (date(timestamp) != date(now()))
If the first query returns a set of results, say A, and the second query returns B, how can I get
B - A?
Try this if you're working with SQL Server:
SELECT * FROM Table
WHERE Timestamp BETWEEN
dateadd(day,datediff(day,0,getdate()),-7)
AND dateadd(day,datediff(day,0,getdate()),0)
This restricts the timestamp to between 00:00 7 days ago and 00:00 today. Today's entries with a time later than 00:00 will not be included.
In plain English, you want records from your second query where the name is not in your first query. In SQL:
Select *
from table
where (Date(now())- Date(timestamp) < 7)
and (date(timestamp) != date(now()))
and name not in (Select name
from table
where Date(timestamp) = Date(now())
)
Use NOT IN, like:
select pk from B where pk not in (select pk from A)
Or you can do something like:
Select * from table where (Date(now())- Date(timestamp) < 7) and (Date(now())- Date(timestamp) >= 1)
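The NOT IN approach can be tried end to end with a small sketch using Python's sqlite3 module. The entries table, its sample rows, and the fixed "today" of 2024-03-10 are assumptions for illustration, and the date functions are SQLite's, not MySQL's. The query returns names entered 1 to 6 days ago that were not also entered today.

```python
import sqlite3

QUERY = """
SELECT DISTINCT name
FROM entries
WHERE date(timestamp) >= date(:today, '-6 days')
  AND date(timestamp) < date(:today)
  AND name NOT IN (
      SELECT name FROM entries WHERE date(timestamp) = date(:today)
  )
ORDER BY name
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, timestamp TEXT)")
# Hypothetical rows; "today" is taken to be 2024-03-10.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("alice", "2024-03-08 09:00:00"),  # in the window, not entered today
        ("bob", "2024-03-09 10:00:00"),    # in the window ...
        ("bob", "2024-03-10 08:00:00"),    # ... but also entered today
        ("carol", "2024-03-10 11:00:00"),  # only entered today
        ("dave", "2024-03-01 12:00:00"),   # older than the window
        ("erin", "2024-03-04 07:30:00"),   # exactly 6 days ago
    ],
)

names = [row[0] for row in conn.execute(QUERY, {"today": "2024-03-10"})]
print(names)
```

Passing the current date as a parameter, rather than calling a now() function inside the query, makes the example reproducible.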