This question has been asked before in many forms, but none of the proposed solutions worked in my case.
I am using GBQ.
I have this table:
Hour Orders
2022-01-12T00:00:00 12
2022-01-12T01:00:00 8
2022-01-12T02:00:00 9
I want to create a query to insert data into this table automatically per hour, under these conditions:
If the "most recent hour" that I want to insert already exists, I do not want to insert it twice.
I tried the following SQL query:
IF EXISTS (SELECT 1 FROM `Table` WHERE Hour = var_most_recent_hour)
UPDATE `Table` SET Orders = var_most_recent_orders WHERE Hour = var_most_recent_hour
ELSE
INSERT INTO `Table` (Hour, Orders) VALUES (var_most_recent_hour, var_most_recent_orders)
This syntax returns an error in GBQ, although it is usually accepted in other SQL dialects.
Is there a way to do this?
My priority is to insert without duplicates.
I don't care about the UPDATE part in my query.
Ideally I want something like (I know this syntax does not exist):
IF NOT EXISTS (SELECT 1 FROM `Table` WHERE Hour = var_most_recent_hour)
INSERT INTO `Table` (Hour, Orders) VALUES (var_most_recent_hour, var_most_recent_orders)
Thank you
Try the sample code below:
DECLARE most_rcnt_hour TIMESTAMP DEFAULT TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR);  -- use DATETIME if the Hour column is DATETIME
DECLARE most_rcnt_orders INT64 DEFAULT 0;  -- set this from your source query

-- Insert only when this hour is not already present in the target table
INSERT INTO dataset.targettable (Hour, Orders)
SELECT most_rcnt_hour, most_rcnt_orders
FROM (SELECT 1) AS dummy
WHERE NOT EXISTS (
  SELECT 1 FROM dataset.targettable T
  WHERE T.Hour = most_rcnt_hour
);
Note that IF in BQ works differently: it only exists in BigQuery scripting, where it takes the form IF condition THEN ... END IF;, so the T-SQL-style IF ... ELSE from the question will not parse.
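As an alternative sketch (not part of the answer above, but using the same table and variable names), BigQuery's MERGE statement expresses the "insert only if the hour is not already there" logic directly:

MERGE dataset.targettable T
USING (SELECT most_rcnt_hour AS Hour, most_rcnt_orders AS Orders) S
ON T.Hour = S.Hour
WHEN NOT MATCHED THEN
  INSERT (Hour, Orders) VALUES (S.Hour, S.Orders);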
Situation:
We have a table "base1" (~6 million rows) which shows actual customer purchases: the day of purchase plus the parameters of that purchase.
CREATE TABLE base1 (
User_id int NOT NULL PRIMARY KEY,
PurchaseDate date,
Parameter1 int,
Parameter2 int,
...
ParameterK int );
And also another table "base2" (~90 million rows), which shows essentially the same thing, but instead of the day of purchase a weekly breakdown is used (for example: all weeks over 4 years for each client; if there was no purchase in week N, the client is still shown).
CREATE TABLE base2 (
Users_id int NOT NULL PRIMARY KEY,
Week_start date ,
Week_end date,
Parameter1 int,
Parameter2 int,
...
ParameterN int );
The task is to run the following query:
-- a = base2; b and wb%% = base1
--create index idx_uid_purch_date on base1(Users_ID,Purchasedate);
SELECT a.Users_id
-- Checking whether the client will make a purchase next week and whether that purchase meets each condition
,iif(b.Users_id is not null,1,0) as User_will_buy_next_week
,iif(b.Users_id is not null and b.Parameter1 = 1,1,0) as User_will_buy_on_Condition1
-- about 12 similar iif-conditions
,iif(b.Users_id is not null and (b.Parameter1 = 1 and b.Parameter12 = 1),1,0)
as User_will_buy_on_Condition13
-- checking on the fact of purchase in the past month, 2 months ago, 2.5 months, etc.
,iif(wb1m.Users_id is null,0,1) as was_buy_1_month_ago
,iif(wb2m.Users_id is null,0,1) as was_buy_2_month_ago
,iif(wb25m.Users_id is null,0,1) as was_buy_25_month_ago
,iif(wb3m.Users_id is null,0,1) as was_buy_3_month_ago
,iif(wb6m.Users_id is null,0,1) as was_buy_6_month_ago
,iif(wb1y.Users_id is null,0,1) as was_buy_1_year_ago
,a.[Week_start]
,a.[Week_end]
into base3
FROM base2 a
-- Join for User_will_buy
left join base1 b
on a.Users_id =b.Users_id and
cast(b.[PurchaseDate] as date)>=DATEADD(dd,7,cast(a.[Week_end] as date))
and cast(b.[PurchaseDate] as date)<=DATEADD(dd,14,cast(a.[Week_end] as date))
-- Joins for was_buy
left join base1 wb1m
on a.Users_id =wb1m.Users_id
and cast(wb1m.[PurchaseDate] as date)>=DATEADD(dd,-30-4,cast(a.[Week_end] as date))
and cast(wb1m.[PurchaseDate] as date)<=DATEADD(dd,-30+4,cast(a.[Week_end] as date))
/* 4 more similar joins where different values are added in
DATEADD (dd, %%, cast (a. [Week_end] as date))
to check on the fact of purchase for a certain period */
left outer join base1 wb1y
on a.Users_id =wb1y.Users_id and
cast(wb1y.[PurchaseDate] as date)>=DATEADD(dd,-365-4,cast(a.[Week_end] as date))
and cast(wb1y.[PurchaseDate] as date)<=DATEADD(dd,-365+5,cast(a.[Week_end] as date))
Because of the huge number of joins and the rather big tables, this script runs for about 24 hours, which is incredibly long.
Most of the time, as the execution plan shows, is spent on the Merge Join, scanning the rows of base1 and base2, and inserting the data into the base3 table.
Question: Is it possible to optimize this query so it works faster?
Perhaps by using a single join, or something similar.
Help please, I'm not that smart enough :(
Thanx everybody for your answers!
UPD: Maybe using a different join type (merge, loop, or hash) would help, but I can't really test this theory. Maybe someone can tell me whether it's right or wrong ;)
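(For reference, a join algorithm can be forced with a hint in SQL Server; a minimal sketch against a scratch table, where base3_hashtest is a hypothetical name and OPTION (HASH JOIN) forces every join in the statement to a hash join, so measure before and after:

SELECT a.Users_id,
       IIF(b.Users_id IS NOT NULL, 1, 0) AS User_will_buy_next_week
INTO base3_hashtest
FROM base2 a
LEFT JOIN base1 b
       ON a.Users_id = b.Users_id
      AND CAST(b.PurchaseDate AS date) >= DATEADD(dd, 7,  CAST(a.Week_end AS date))
      AND CAST(b.PurchaseDate AS date) <= DATEADD(dd, 14, CAST(a.Week_end AS date))
OPTION (HASH JOIN);)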
You want to have all 90 million base2 rows in your result, each with additional information on base1 data. So, what the DBMS must do is a full table scan on base2 and quickly find related rows in base1.
The query with EXISTS clauses would look something like this:
select
b2.users_id,
b2.week_start,
b2.week_end,
case when exists
(
select *
from base1 b1
where b1.users_id = b2.users_id
and b1.purchasedate between dateadd(day, 7, cast(b2.week_end as date))
and dateadd(day, 14, cast(b2.week_end as date))
) then 1 else 0 end as user_will_buy_next_week,
case when exists
(
select *
from base1 b1
where b1.users_id = b2.users_id
and b1.parameter1 = 1
and b1.purchasedate between dateadd(day, 7, cast(b2.week_end as date))
and dateadd(day, 14, cast(b2.week_end as date))
) then 1 else 0 end as user_will_buy_on_condition1,
case when exists
(
select *
from base1 b1
where b1.users_id = b2.users_id
and b1.parameter1 = 1
and b1.parameter2 = 1
and b1.purchasedate between dateadd(day, 7, cast(b2.week_end as date))
and dateadd(day, 14, cast(b2.week_end as date))
) then 1 else 0 end as user_will_buy_on_condition13,
case when exists
(
select *
from base1 b1
where b1.users_id = b2.users_id
and b1.purchasedate between dateadd(day, -30-4, cast(b2.week_end as date))
and dateadd(day, -30+4, cast(b2.week_end as date))
) then 1 else 0 end as was_buy_1_month_ago,
...
from base2 b2;
We can easily see that this will take a long time, because all conditions must be checked per base2 row. That is 90 million rows times 7 lookups each. The only thing we can do about this is to provide an index and hope the query benefits from it.
create index idx1 on base1 (users_id, purchasedate, parameter1, parameter2);
We can add more indexes, so the DBMS can choose between them based on selectivity. Later we can check whether they are actually used and drop the ones that aren't.
create index idx2 on base1 (users_id, parameter1, purchasedate);
create index idx3 on base1 (users_id, parameter1, parameter2, purchasedate);
create index idx4 on base1 (users_id, parameter2, parameter1, purchasedate);
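As a side note (assuming SQL Server, which the question's syntax suggests), you can later check whether the new indexes are actually used via the index usage DMV; a sketch:

SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.base1');  -- counters reset when the instance restarts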
I assume that the base1 table stores information about the current week purchases.
If that is true, you could drop [PurchaseDate] from the join conditions and use the current date instead. Your DATEADD functions would then be applied to the current date and become constants in the join conditions:
left join base1 b
on a.Users_id =b.Users_id and
DATEADD(day,-7,GETDATE())>=a.[Week_end]
and DATEADD(day,-14,GETDATE())<=a.[Week_end]
To have the query above running correctly you should limit b.[PurchaseDate] to the current day.
Then you could run another query for the purchases made yesterday, with all DATEADD constants in the join conditions shifted by -1.
And so on, up to 7 queries, or whatever timespan the base1 table covers.
You could also implement grouping of [PurchaseDate] values by days, recalculate constants and make all of that in a single query, but I'm not ready to spend time creating it myself. :)
If you have a recurring expression such as DATEADD(dd,-30-4,cast(a.[Week_end] as date)), one way to make it SARGable is to create an index on that expression. SQL Server can't do this directly, but Postgres can:
create index ix_base2__34_days_ago on base2 ((week_end - 34));  -- Postgres expression index; date minus integer subtracts days, equivalent to DATEADD(dd, -34, Week_end)
With an index on that Week_end expression in place, the database can utilize it, so a condition like the following will be fast:
and cast(wb1m.[PurchaseDate] as date) >= DATEADD(dd,-30-4,cast(a.[Week_end] as date))
Note that casting PurchaseDate to date still yields a SARGable expression, even though CAST looks like a function: SQL Server has special handling for datetime-to-date conversion, so an index on a datetime column can still be used when you filter on the date part only. It is similar to a prefix search such as WHERE lastname LIKE 'Mc%', which is SARGable even though the index covers the whole lastname column. But I digress.
To get something close to an index on an expression in SQL Server, you can create a computed column on that expression, e.g.:
CREATE TABLE base2 (
Users_id int NOT NULL PRIMARY KEY,
Week_start date ,
Week_end date,
Parameter1 int,
Parameter2 int,
Thirty4DaysAgo as DATEADD(dd,-30-4, cast([Week_end] as date))
)
..and then create index on that column:
create index ix_base2_34_days_ago on base2(Thirty4DaysAgo)
Then change your expression to:
and cast(wb1m.[PurchaseDate] as date) >= a.Thirty4DaysAgo
That's what I would have recommended: change the old expression to use the computed column. However, upon further reading, it looks like you can simply keep your original code, because SQL Server can match an expression to a computed column; if you have an index on that column, your expression becomes SARGable. So your DBA can optimize things behind the scenes, and your original code runs optimized without any changes. There is no need to change the following; it will be SARGable, provided your DBA created a computed column for the DATEADD(recurring parameters here) expression and put an index on it:
and cast(wb1m.[PurchaseDate] as date) >= DATEADD(dd,-30-4,cast(a.[Week_end] as date))
The only downside (when compared to Postgres) is you still have the dangling computed column on your table when using SQL Server :)
Good read: https://littlekendra.com/2016/03/01/sql-servers-year-function-and-index-performance/
I'm trying to create a new table based on particular values that match between two tables, and that works fine, but my issue comes when I try to filter the newly joined table by dates.
CREATE TABLE JoinedValuesTable
(
[Ref] INT IDENTITY(1,1) PRIMARY KEY,
[Parties] CHAR(50),
[Accounts] CHAR(50),
[Amount] FLOAT
);
The table above is created okay, and I insert values into it by joining two tables like this:
INSERT INTO JoinedValuesTable ([Parties], [Accounts], [Amount])
SELECT
InputPerson.[PARTY], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM
InputPerson
JOIN
Input_Y ON InputPerson.[Action] = Input_Y.[Action]
And this works fine. It's when I try to filter by dates that it doesn't seem to work:
INSERT INTO JoinedValuesTable([Parties], [Accounts], [Amount])
SELECT
InputPerson.[PARTY], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM
InputPerson
JOIN
Input_Y ON InputPerson.[Action] = Input_Y.[Action]
WHERE
InputPerson.[Date] BETWEEN '2018-01-01' AND '2018-03-03'
I'm not getting any values into my new table. Anyone got any ideas?
Do not use between for dates. A better method is:
WHERE InputPerson.[Date] >= '2018-01-01' AND
InputPerson.[Date] < '2018-03-04'
I strongly recommend Aaron Bertrand's blog on this topic: What do BETWEEN and the devil have in common?
This assumes that Date is being stored as a date/time column. If it is a string, then you need to convert it to a date using the appropriate conversion function.
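If the column does turn out to be a string, a minimal sketch of that conversion (assuming SQL Server 2012+ and values stored as yyyy-MM-dd; TRY_CONVERT returns NULL for values that don't parse):

WHERE TRY_CONVERT(date, InputPerson.[Date]) >= '20180101'
  AND TRY_CONVERT(date, InputPerson.[Date]) < '20180304'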
A shameless copy/paste from an earlier answer:
BETWEEN CONVERT(datetime,'2018-01-01') AND CONVERT(datetime,'2018-03-03')
The original answer: Datetime BETWEEN statement not working in SQL Server
I suspect that the datetime versus string is the culprit. Resulting in (without the insert into):
SELECT InputPerson.[PARTY], Input_Y.[R_1], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM InputPerson
JOIN Input_Y ON InputPerson.[Action] = Input_Y.[Action]
WHERE InputPerson.[Date] BETWEEN CONVERT(datetime,'2018-01-01') AND CONVERT(datetime,'2018-03-03')
-- Edit (after comments from Olivier) --
Are you sure the SELECT statement returns results? Perhaps the INNER JOIN combined with the WHERE clause results in an empty result set.
I have a slow-performing query and was hoping someone with a bit more SQL knowledge might be able to help me improve its performance:
I have 2 tables, a Source and a Common. I load in some data which contains a Date, a Time and a String (which is a server name), plus some..
The Source table can contain 40k+ rows (it has 30-odd columns: a mix of ints, dates, times and some varchar(255)/varchar(max) columns).
I use the query below to remove any data from Common that is in Source:
Delete from Common where convert(varchar(max),Date,102)+convert(varchar(max),Time,108)+[ServerName] in
    (Select convert(varchar(max),[date],102)+convert(varchar(max),time,108)+ServerName
     from Source
     where sc_status < 300)
The Source Fields are in this format:
ServerName varchar(255), e.g. SN1234
Date varchar(255), e.g. 2012-05-22
Time varchar(255), e.g. 08:12:21
The Common Fields are in this format:
ServerName varchar(255), e.g. SN1234
Date date, e.g. 2011-08-10
Time time(7), e.g. 14:25:34.0000000
Thanks
Converting both sides to strings, then concatenating them into one big string, then comparing those results is not very efficient. Only do conversions where you have to. Try this example and see how it compares:
DELETE c
FROM dbo.Common AS c
INNER JOIN dbo.Source AS s
ON s.ServerName = c.ServerName
AND CONVERT(DATE, s.[Date]) = c.[Date]
AND CONVERT(TIME(7), s.[Time]) = c.[Time]
WHERE s.sc_status < 300;
All those conversions to VARCHAR(MAX) are unnecessary and probably slowing you down. I would start with something like this instead:
DELETE c
from [Common] c
WHERE EXISTS(
SELECT 1
FROM Source
WHERE CAST([Date] AS DATE)=c.[Date]
AND CAST([Time] AS TIME(7))=c.[Time]
AND [ServerName]=c.[ServerName]
AND sc_status < 300
);
Something like
DELETE Common
FROM Common
INNER JOIN Source
    ON Common.ServerName = Source.ServerName
    AND Common.[Date] = CONVERT(date, Source.[Date])
    AND Common.[Time] = CONVERT(time(7), Source.[Time])
WHERE Source.sc_status < 300
If it's too slow after that, then you need some indexes, possibly on both tables.
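For illustration only (the index names are made up, and the column choice is just an assumption based on the join above), supporting indexes might look like:

CREATE INDEX IX_Common_Server_Date_Time ON dbo.Common (ServerName, [Date], [Time]);
CREATE INDEX IX_Source_Status ON dbo.Source (sc_status) INCLUDE (ServerName, [Date], [Time]);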
Removing the unnecessary conversions will help a lot, as detailed in Aaron's answer. You might also consider creating an indexed view over the top of the log table, since you probably don't have much flexibility in that schema or in the insert DML from the log parser.
Simple example:
create table dbo.[Source] (LogId int primary key, servername varchar(255),
[date] varchar(255), [time] varchar(255));
insert into dbo.[Source]
values (1, 'SN1234', '2012-05-22', '08:12:21'),
(2, 'SN5678', '2012-05-23', '09:12:21')
go
create view dbo.vSource with schemabinding
as
select [LogId],
[servername],
[date],
[time],
[actualDateTime] = convert(datetime, [date]+' '+[time], 120)
from dbo.[Source];
go
create unique clustered index UX_Source on vSource(LogId);
create nonclustered index IX_Source on vSource(actualDateTime);
This will give you an indexed datetime column on which to seek and vastly improve your execution plans at the cost of some insert performance.
I'm creating a report (in Crystal Reports XI) based on a SQL stored procedure in a database. The query accepts a few parameters, and returns records within the specified date range. If parameters are passed in, they are used to determine which records to return. If one or more parameters are not passed in, that field is not used to limit the types of records returned. It's a bit complicated, so here's my WHERE clause:
WHERE ((Date > @start_date) AND (Date < @end_date))
AND (@EmployeeID IS NULL OR emp_id = @EmployeeID)
AND (@ClientID IS NULL OR client_id = @ClientID)
AND (@ProjectID IS NULL OR project_id = @ProjectID)
AND (@Group IS NULL OR group = @Group)
Now, for the problem:
The query (and report) works beautifully for old data, within the range of years 2000-2005. However, the WHERE clause is not filtering the data properly for more recent years: it only returns records where the parameter @Group is NULL (i.e. not passed in).
Any hints, tips, or leads are appreciated!
Solved!
It actually had nothing to do with the WHERE clause after all. I had let SQL Server generate an inner join for me, which should have been a LEFT join: many records from recent years do not have entries in the joined table (expenses), so they weren't showing up. Interestingly, the few recent records that do have entries in the expenses table have a NULL value for group, which is why I got records only when @Group was NULL.
Morals of the story: 1. Double check anything that is automatically generated; and 2. Look out for NULL values! (n8wl - thanks for giving me the hint to look closely at NULLs.)
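For illustration, a minimal sketch of the corrected join shape (the main table name, the expenses join key, and the amount column are hypothetical; the parameters are the ones from the question). The point is that the LEFT JOIN keeps rows that have no matching expenses:

SELECT t.[Date], t.emp_id, t.client_id, t.project_id, t.[group], e.amount
FROM time_entries t
LEFT JOIN expenses e ON e.entry_id = t.entry_id  -- was an INNER JOIN, which silently dropped rows
WHERE ((t.[Date] > @start_date) AND (t.[Date] < @end_date))
  AND (@Group IS NULL OR t.[group] = @Group)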
What are the chances that your newer data (post-2005) has some rows with NULLs in emp_id, client_id, project_id, or group? If they were NULLs they can't match the parameters you're passing.
Since Date and group are reserved words, you might try putting square brackets around those fields so they aren't treated as keywords. Doing so can get rid of "odd" issues like this. That would make it:
WHERE (([Date] > @start_date) AND ([Date] < @end_date))
AND (@EmployeeID IS NULL OR emp_id = @EmployeeID)
AND (@ClientID IS NULL OR client_id = @ClientID)
AND (@ProjectID IS NULL OR project_id = @ProjectID)
AND (@Group IS NULL OR [group] = @Group)