SQL efficient way to match ANY in a large table

I am joining a small table and a very large table and want to return a distinct item if ANY items match. The table is so large that it takes hours for something that I think should take seconds.
The problem is that I am "iterating" over every single entry in the second table. I want to be able to "break" once a condition is met and return that value instead of continuing over every single account.
In the code below, I am finding every single row for each name that I am joining, even though I am only returning the DISTINCT example_users.name and don't care about every row. How can I return the DISTINCT name after finding the first instance of new_ex.data = ... after performing the INNER JOIN?
SELECT DISTINCT example_users.name
FROM (
SELECT DISTINCT ex.user AS name
FROM exampleTable ex
WHERE ex.timestamp >= '2022-01-01'
AND ex.group = 'test'
AND ex.data = '123'
) AS example_users
INNER JOIN exampleTable new_ex on example_users.name = new_ex.user
AND new_ex.timestamp >= '2022-01-01'
AND (
new_ex.data = 'abc'
OR new_ex.data = 'def'
OR new_ex.data = 'ghi'
-- ~10 more of these OR statements
)

Without seeing the data it's hard to be sure this can't be simplified further, but I think you can at least boil this down to
select distinct ex.user as name
from exampleTable ex
where ex.timestamp >= '2022-01-01'
and ex.group = 'test'
and ex.data = '123'
and exists (
select 1
from exampleTable new_ex
where new_ex.user = ex.user
and new_ex.timestamp >= '2022-01-01'
and new_ex.data in ('abc','def','ghi'...)
)
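A minimal sketch of the EXISTS approach, run against an in-memory SQLite database from Python. SQLite, the sample rows, the index, and the helper name users_with_any_match are all assumptions made for illustration; the question does not name its database. EXISTS only asks whether at least one matching row exists, so the engine is free to stop probing a user after the first hit, which is the "break" behaviour the question asks for.

```python
import sqlite3


def users_with_any_match(conn, wanted):
    """Users with a '123' row in group 'test' who also have ANY row whose data is in `wanted`."""
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT DISTINCT ex."user" AS name
        FROM exampleTable ex
        WHERE ex."timestamp" >= '2022-01-01'
          AND ex."group" = 'test'
          AND ex.data = '123'
          AND EXISTS (
              SELECT 1
              FROM exampleTable new_ex
              WHERE new_ex."user" = ex."user"
                AND new_ex."timestamp" >= '2022-01-01'
                AND new_ex.data IN ({placeholders})
          )
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, list(wanted))]


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE exampleTable ("user" TEXT, "group" TEXT, "timestamp" TEXT, data TEXT)'
)
# Hypothetical index: the column order is an assumption, not taken from the question.
conn.execute('CREATE INDEX idx_user_data ON exampleTable ("user", data, "timestamp")')
conn.executemany(
    "INSERT INTO exampleTable VALUES (?, ?, ?, ?)",
    [
        ("alice", "test", "2022-02-01", "123"),
        ("alice", "other", "2022-02-02", "abc"),
        ("alice", "other", "2022-02-03", "def"),  # a second match must not duplicate alice
        ("bob", "test", "2022-02-01", "123"),     # no abc/def/ghi row, so excluded
        ("carol", "test", "2021-12-31", "123"),   # '123' row is too old, so excluded
        ("carol", "other", "2022-02-01", "abc"),
    ],
)

result = users_with_any_match(conn, ["abc", "def", "ghi"])
print(result)
```

With an index that starts with user and data, each probe can be a short index lookup rather than a scan; whether a given engine actually plans it that way is worth confirming with its EXPLAIN output.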

Use the query below. A long chain of OR conditions can hurt performance; use IN instead.
select DISTINCT ex.user from exampleTable ex
INNER JOIN exampleTable new_ex on ex.user = new_ex.user
where ex.timestamp >= '2022-01-01'
AND ex.group = 'test'
AND new_ex.timestamp >= '2022-01-01'
AND new_ex.data in ('abc', 'def', 'ghi'); -- include all your values
You can also use the query below:
select DISTINCT ex.user from exampleTable ex
INNER JOIN (select distinct user, timestamp, data from exampleTable) new_ex on ex.user = new_ex.user
where ex.timestamp >= '2022-01-01'
AND ex.group = 'test'
AND new_ex.timestamp >= '2022-01-01'
AND new_ex.data in ('abc', 'def', 'ghi'); -- include all your values

Related

Find the difference between 1 column depending on date

When I run this:
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-01-31'
I see 62 rows
but when I do
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-02-01'
I see 59
I want to see what NAME's are missing when it ran for _LOAD_DATETIME::date = '2022-02-01'
I thought this would work but it doesn't:
SELECT NAME FROM table
WHERE _LOAD_DATETIME::date = '2022-02-01'
AND NOT EXISTS (
SELECT NAME FROM
table
WHERE _LOAD_DATETIME::date = '2022-01-31')
You have to use MINUS for your purposes:
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-01-31'
MINUS
SELECT NAME FROM T1
WHERE _LOAD_DATETIME::date = '2022-02-01'
If we are talking about PostgreSQL, you have to use EXCEPT instead of MINUS.
There are two set operators you can use, MINUS or EXCEPT (they are aliases for each other):
SELECT column1 FROM values (1),(2),(3),(4)
MINUS
SELECT column1 FROM values (2),(3),(4),(5);
This gives 1. If you want to see 5, you need to flip the order of the SELECTs.
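To make the direction of the set difference concrete, here is a small sketch using Python's sqlite3 module. SQLite spells the operator EXCEPT; the load_date column, the sample names, and both helper functions are assumptions for illustration (the original filters on _LOAD_DATETIME::date). The second helper shows that NOT EXISTS also works once the subquery is correlated on NAME, which the attempt in the question was missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (NAME TEXT, load_date TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [
        ("a", "2022-01-31"), ("b", "2022-01-31"), ("c", "2022-01-31"),
        ("a", "2022-02-01"), ("c", "2022-02-01"), ("d", "2022-02-01"),
    ],
)


def names_missing(conn, earlier, later):
    """Names loaded on `earlier` that are absent on `later` (set difference)."""
    sql = """
        SELECT NAME FROM T1 WHERE load_date = ?
        EXCEPT
        SELECT NAME FROM T1 WHERE load_date = ?
        ORDER BY NAME
    """
    return [row[0] for row in conn.execute(sql, (earlier, later))]


def names_missing_not_exists(conn, earlier, later):
    """Same result with NOT EXISTS; the subquery must be correlated on NAME."""
    sql = """
        SELECT DISTINCT a.NAME
        FROM T1 a
        WHERE a.load_date = ?
          AND NOT EXISTS (
              SELECT 1 FROM T1 b
              WHERE b.load_date = ? AND b.NAME = a.NAME
          )
        ORDER BY a.NAME
    """
    return [row[0] for row in conn.execute(sql, (earlier, later))]


missing = names_missing(conn, "2022-01-31", "2022-02-01")  # dropped between the runs
added = names_missing(conn, "2022-02-01", "2022-01-31")    # flipped order: new names
print(missing, added)
```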

Union a table with a synthetic table to get result even if query result is empty

I have a data table from which I want to select some fields, filtering by date.
If the result is empty, based on the sysdate I need to decide if it is ok or not.
To be able to do that I am creating a synthetic table with a flag field which I expect to be populated in result set even if there is no data in my actual table at that date.
WITH const AS (
SELECT
'NAME 1' AS name,
(CASE WHEN TO_TIMESTAMP(TO_CHAR(CURRENT_TIMESTAMP, 'HH24:MI:SS'), 'HH24:MI:SS') < TO_TIMESTAMP('01:00:00', 'HH24:MI:SS') THEN 1 ELSE 0 END) AS flag
FROM
Data_Table
UNION
SELECT
'ANY NAME' AS name,
(CASE WHEN TO_TIMESTAMP(TO_CHAR(CURRENT_TIMESTAMP, 'HH24:MI:SS'), 'HH24:MI:SS') < TO_TIMESTAMP('01:00:00', 'HH24:MI:SS') THEN 1 ELSE 0 END) AS flag )
SELECT Data_Table.sysname, const.flag FROM const LEFT OUTER JOIN Data_Table ON Data_Table.sysname = const.name WHERE Data_Table.date=TO_CHAR(sysdate, 'DD-MM-YYYY')
I expect to get results like below:
sysname flag
Name1 1
(null) 1
But I get an empty result if there is no data with that date.
Here's a general example. If I understand correctly, you need to be able to return a value even if the query returns no rows:
SELECT table_name FROM all_tables
WHERE table_name = 'YOUR_TABLE'
UNION ALL
SELECT '1' table_name FROM dual
WHERE NOT EXISTS
(
SELECT table_name FROM all_tables
WHERE table_name = 'YOUR_TABLE'
)
/
The result is 1 because there is no table named 'YOUR_TABLE' in my database. If I put in a valid table name, I get the results from the top query; otherwise I always get 1 (or any other value) from the second query. The table names in both queries must be the same: the subquery in the second one is a copy of the top one.
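The same fallback pattern can be tried without an Oracle instance. The sketch below uses Python's sqlite3 module; the all_tables_demo table, its rows, and the names_or_default helper are hypothetical stand-ins for all_tables and dual. SQLite allows a SELECT with a WHERE clause and no FROM, which plays the role of SELECT ... FROM dual here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Oracle's all_tables view.
conn.execute("CREATE TABLE all_tables_demo (table_name TEXT)")
conn.executemany(
    "INSERT INTO all_tables_demo VALUES (?)", [("EMPLOYEES",), ("ORDERS",)]
)


def names_or_default(conn, wanted, default="1"):
    """Rows of the main query, or a single default row when it finds nothing."""
    sql = """
        SELECT table_name FROM all_tables_demo WHERE table_name = :wanted
        UNION ALL
        SELECT :default
        WHERE NOT EXISTS (
            SELECT 1 FROM all_tables_demo WHERE table_name = :wanted
        )
    """
    params = {"wanted": wanted, "default": default}
    return [row[0] for row in conn.execute(sql, params)]


found = names_or_default(conn, "EMPLOYEES")
fallback = names_or_default(conn, "YOUR_TABLE")
print(found, fallback)
```

Only one of the two branches ever produces rows, because the NOT EXISTS condition is the exact negation of "the first branch found something".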
The problem is in your last statement:
SELECT Data_Table.sysname, const.flag
FROM const
LEFT OUTER JOIN Data_Table
ON Data_Table.sysname = const.name
WHERE Data_Table.date=TO_CHAR(sysdate, 'DD-MM-YYYY')-- Here is the problem
You are doing a left outer join and then filtering on a Data_Table column in the WHERE clause. For const rows with no match, Data_Table.date is NULL, so the WHERE condition rejects them and the outer join effectively becomes an inner join. Move that condition into the join so that it does not filter out the unmatched rows:
SELECT Data_Table.sysname, const.flag
FROM const
LEFT OUTER JOIN Data_Table
ON Data_Table.sysname = const.name
AND Data_Table.date=TO_CHAR(sysdate, 'DD-MM-YYYY')
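The difference between the two placements is easy to see on a tiny data set. This is a sketch using Python's sqlite3 module rather than Oracle; the sample rows and the fixed date strings (standing in for TO_CHAR(sysdate, 'DD-MM-YYYY')) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE const (name TEXT, flag INTEGER);
    CREATE TABLE Data_Table (sysname TEXT, date TEXT);
    INSERT INTO const VALUES ('NAME 1', 1), ('ANY NAME', 1);
    INSERT INTO Data_Table VALUES ('NAME 1', '01-02-2022');
    """
)

# Date filter in WHERE: unmatched rows have d.date IS NULL and are discarded.
FILTER_IN_WHERE = """
    SELECT d.sysname, c.flag
    FROM const c
    LEFT OUTER JOIN Data_Table d ON d.sysname = c.name
    WHERE d.date = ?
    ORDER BY c.name
"""

# Date filter in ON: it only decides which Data_Table rows match; const rows always survive.
FILTER_IN_ON = """
    SELECT d.sysname, c.flag
    FROM const c
    LEFT OUTER JOIN Data_Table d ON d.sysname = c.name AND d.date = ?
    ORDER BY c.name
"""

no_data_day = "02-02-2022"  # no Data_Table rows carry this date
where_rows = conn.execute(FILTER_IN_WHERE, (no_data_day,)).fetchall()
on_rows = conn.execute(FILTER_IN_ON, (no_data_day,)).fetchall()
print(where_rows)
print(on_rows)
```

On a day with no data, the WHERE version returns nothing at all, while the ON version still returns one row per const entry with a NULL sysname and the flag.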

Select Statement Return 0 if Null

I have the following query
SELECT ProgramDate, [CountVal]= COUNT(ProgramDate)
FROM ProgramsTbl
WHERE (Type = 'Type1' AND ProgramDate = '10/18/11' )
GROUP BY ProgramDate
If there is no record that matches the Type and ProgramDate, no rows are returned at all.
What I would like to get in that case is something like the following. Notice that CountVal is 0 even though no records match the condition:
ProgramDate CountVal
10/18/11 0
This is a little more complicated than you would like; however, it is very possible. You will first have to create a temporary table of dates. For example, the query below creates a range of dates from 2011-10-11 to 2011-10-20:
CREATE TEMPORARY TABLE date_stamps AS
SELECT (date '2011-10-10' + new_number) AS date_stamp
FROM generate_series(1, 10) AS new_number;
Using this temporary table, you can select from it and left join your table ProgramsTbl. For example
SELECT date_stamp, COUNT(ProgramDate)
FROM date_stamps
LEFT JOIN ProgramsTbl ON ProgramsTbl.ProgramDate = date_stamps.date_stamp
AND ProgramsTbl.Type = 'Type1' -- filter in ON so dates without rows are kept
GROUP BY date_stamp;
Select ProgramDate, [CountVal]= SUM(occur)
from
(
SELECT ProgramDate, 1 occur
FROM ProgramsTbl
WHERE (Type = 'Type1' AND ProgramDate = '10/18/11' )
UNION ALL -- plain UNION would collapse the identical (date, 1) rows into one
SELECT '10/18/11', 0
) t
GROUP BY ProgramDate
Because each SELECT statement really builds a table of records, you can use a SELECT query to build a table with both the program count and a default count of zero. This requires two SELECT queries (one for the actual count, one for the default count) and a UNION to combine the two results into a single table.
From there you can SELECT from the UNIONed table to sum the CountVals: if the ProgramDate occurs in ProgramsTbl, the total is the CountVal of the first query (> 0) plus the CountVal of the second query (= 0).
This way, even if there are no records for the desired ProgramDate in ProgramsTbl, you get a record back indicating a count of 0.
This would look like:
SELECT ProgramDate, SUM(CountVal)
FROM
(SELECT ProgramDate, COUNT(*) AS CountVal
FROM ProgramsTbl
WHERE (Type = 'Type1' AND ProgramDate = '10/18/11' )
GROUP BY ProgramDate
UNION
SELECT '10/18/11' AS ProgramDate, 0 AS CountVal) T1
GROUP BY ProgramDate
Here's a solution that works on SQL Server; not sure about other db platforms:
DECLARE @Type VARCHAR(5) = 'Type1'
, @ProgramDate DATE = '10/18/2011'
SELECT pt.ProgramDate
, COUNT(pt2.ProgramDate)
FROM ( SELECT @ProgramDate AS ProgramDate
, @Type AS Type
) pt
LEFT JOIN ProgramsTbl pt2 ON pt.Type = pt2.Type
AND pt.ProgramDate = pt2.ProgramDate
GROUP BY pt.ProgramDate
Grungy, but simple and efficient:
SELECT '10/18/11' as 'Program Date', count(*) as 'count'
FROM ProgramsTbl
WHERE Type = 'Type1' AND ProgramDate = '10/18/11'
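This works because an aggregate query without GROUP BY returns exactly one row even when no input rows match, whereas the grouped version returns one row per group and therefore none at all. A small sketch with Python's sqlite3 module shows the contrast; SQLite, the sample rows, and the run helper are assumptions for illustration (the question appears to target SQL Server).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProgramsTbl (Type TEXT, ProgramDate TEXT)")
conn.executemany(
    "INSERT INTO ProgramsTbl VALUES (?, ?)",
    [("Type1", "10/17/11"), ("Type1", "10/17/11"), ("Type2", "10/18/11")],
)

# One row per group: no matching rows means no groups, hence no output rows.
GROUPED = """
    SELECT ProgramDate, COUNT(ProgramDate) AS CountVal
    FROM ProgramsTbl
    WHERE Type = :type AND ProgramDate = :date
    GROUP BY ProgramDate
"""

# No GROUP BY: the aggregate always yields a single row, with COUNT(*) = 0 if nothing matches.
UNGROUPED = """
    SELECT :date AS ProgramDate, COUNT(*) AS CountVal
    FROM ProgramsTbl
    WHERE Type = :type AND ProgramDate = :date
"""


def run(sql, program_type, program_date):
    return conn.execute(sql, {"type": program_type, "date": program_date}).fetchall()


grouped_empty = run(GROUPED, "Type1", "10/18/11")
ungrouped_empty = run(UNGROUPED, "Type1", "10/18/11")
print(grouped_empty, ungrouped_empty)
```

The trade-off is that the date has to be repeated as a literal (or parameter) in the SELECT list, so this only fits queries for a single known date.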
Try something along these lines. This will establish a row with a date of 10/18/11 that will definitely return. Then you left join to your actual data to get your desired count (which can now return 0 if there are no corresponding rows).
To do this for more than 1 date, you'd want to build a Date table that holds a list of all dates you want to query (so substitute the "select '10/18/11'" with "select Date from DateTbl").
SELECT ProgDt.ProgDate, [CountVal]= COUNT(ProgramsTbl.ProgramDate)
FROM (SELECT '10/18/11' as 'ProgDate') ProgDt
LEFT JOIN ProgramsTbl
ON ProgDt.ProgDate = ProgramsTbl.ProgramDate
AND ProgramsTbl.Type = 'Type1' -- filter in ON, not WHERE, so the unmatched row survives
GROUP BY ProgDt.ProgDate
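A runnable sketch of this one-row "anchor" join, using Python's sqlite3 module. SQLite, the sample rows, and the count_for helper are assumptions for illustration. The Type filter sits in the ON clause here: if it were in a WHERE clause, a date that has only Type2 rows would be filtered out after the join and no row would come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProgramsTbl (Type TEXT, ProgramDate TEXT)")
conn.executemany(
    "INSERT INTO ProgramsTbl VALUES (?, ?)",
    [("Type1", "10/17/11"), ("Type1", "10/17/11"), ("Type2", "10/18/11")],
)


def count_for(conn, program_type, program_date):
    """Always returns one (date, count) row; the count is 0 when nothing matches."""
    sql = """
        SELECT ProgDt.ProgDate, COUNT(p.ProgramDate) AS CountVal
        FROM (SELECT ? AS ProgDate) AS ProgDt      -- the row that is guaranteed to exist
        LEFT JOIN ProgramsTbl p
          ON p.ProgramDate = ProgDt.ProgDate
         AND p.Type = ?
        GROUP BY ProgDt.ProgDate
    """
    return conn.execute(sql, (program_date, program_type)).fetchone()


empty = count_for(conn, "Type1", "10/18/11")
busy = count_for(conn, "Type1", "10/17/11")
print(empty, busy)
```

COUNT(p.ProgramDate) counts only non-NULL values, so the NULL produced by an unmatched left join contributes 0 rather than 1.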
To create a date table that you can use for querying, do this (assumes SQL Server 2005+):
create table Dates (MyDate datetime)
go
insert into Dates
select top 100000 row_number() over (order by s1.name)
from master..spt_values s1, master..spt_values s2
go

#1222 - The used SELECT statements have a different number of columns

Why am I getting "#1222 - The used SELECT statements have a different number of columns"? I am trying to load wall posts from this user's friends and from the user himself.
SELECT u.id AS pid, b2.id AS id, b2.message AS message, b2.date AS date FROM
(
(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT * FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
ORDER BY date DESC
LIMIT 0, 10
) AS b2
JOIN Users AS u
ON b2.pid = u.id
WHERE u.banned='0' AND u.email_activated='1'
ORDER BY date DESC
LIMIT 0, 10
The wall_posts table structure looks like id date privacy pid uid message
The Friends table structure looks like Fid id buddy_id invite_up_date status
pid stands for profile id. I am not really sure what's going on.
The first statement in the UNION returns four columns:
SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
The second one returns six, because the * expands to include all the columns from WALL_POSTS:
SELECT b.id,
b.date,
b.privacy,
b.pid,
b.uid,
b.message
FROM wall_posts AS b
The UNION and UNION ALL operators require that:
The same number of columns exist in all the statements that make up the UNION'd query
The data types have to match at each position/column
Use:
FROM ((SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10)
UNION
(SELECT id,
pid,
message,
date
FROM wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10))
You're taking the UNION of a 4-column relation (id, pid, message, and date) with a 6-column relation (* = the 6 columns of wall_posts). SQL doesn't let you do that.
(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT id, pid , message , date
FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
You were selecting 4 in the first query and 6 in the second, so match them up.
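The mismatch is easy to reproduce on a toy table. The sketch below uses Python's sqlite3 module, so the error text differs from MySQL's #1222, but the rule is the same; the sample row and the try_union helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wall_posts "
    "(id INTEGER, date TEXT, privacy TEXT, pid INTEGER, uid INTEGER, message TEXT)"
)
conn.execute("INSERT INTO wall_posts VALUES (1, '2011-01-01', 'public', 1, 1, 'hello')")


def try_union(second_select):
    """UNION a fixed 4-column SELECT with `second_select`; return rows or the error text."""
    sql = f"SELECT id, pid, message, date FROM wall_posts UNION {second_select}"
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"


# SELECT * expands to all six columns, so the two sides no longer line up.
mismatched = try_union("SELECT * FROM wall_posts WHERE pid = 1")
# Listing the same four columns in the same order fixes it.
matched = try_union("SELECT id, pid, message, date FROM wall_posts WHERE pid = 1")
print(mismatched)
print(matched)
```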
Besides the answer given by @omg-ponies, I just want to add that this error can also occur in variable assignment. In my case I used an insert, and a trigger was associated with that insert. I mistakenly assigned a different number of fields to a different number of variables. The details of my case are below.
INSERT INTO tab1 (event, eventTypeID, fromDate, toDate, remarks)
-> SELECT event, eventTypeID,
-> fromDate, toDate, remarks FROM rrp group by trainingCode;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
So you see, I got this error from an INSERT statement rather than a UNION statement. What was different in my case:
I issued a bulk insert,
i.e. insert into tab1 (field, ...) select field, ... from tab2
tab2 had an on-insert trigger; this trigger basically declines duplicates
It turns out that I had an error in the trigger. I fetched a record based on the new input data and assigned its fields to the wrong number of variables.
DELIMITER @@
DROP TRIGGER trgInsertTrigger @@
CREATE TRIGGER trgInsertTrigger
BEFORE INSERT ON training
FOR EACH ROW
BEGIN
SET @recs = 0;
SET @trgID = 0;
SET @trgDescID = 0;
SET @trgDesc = '';
SET @district = '';
SET @msg = '';
SELECT COUNT(*), t.trainingID, td.trgDescID, td.trgDescName, t.trgDistrictID
INTO @recs, @trgID, @trgDescID, @proj, @trgDesc, @district
from training as t
left join trainingDistrict as tdist on t.trainingID = tdist.trainingID
left join trgDesc as td on t.trgDescID = td.trgDescID
WHERE
t.trgDescID = NEW.trgDescID
AND t.venue = NEW.venue
AND t.fromDate = NEW.fromDate
AND t.toDate = NEW.toDate
AND t.gender = NEW.gender
AND t.totalParticipants = NEW.totalParticipants
AND t.districtIDs = NEW.districtIDs;
IF @recs > 0 THEN
SET @msg = CONCAT('Error: Duplicate Training: previous ID ', CAST(@trgID AS CHAR CHARACTER SET utf8) COLLATE utf8_bin);
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = @msg;
END IF;
END @@
DELIMITER ;
As you can see, I am fetching 5 fields but assigning them to 6 variables. (Totally my fault: I forgot to delete the variable after editing.)
You are using a MySQL UNION.
UNION is used to combine the results from multiple SELECT statements into a single result set.
The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)
Reference: MySQL Union
Your first SELECT statement has 4 columns and the second has 6, since, as you said, wall_posts has 6 columns.
Both statements should have the same number of columns, in the same order;
otherwise you get an error or wrong data.

Selecting max/min value from more than one fields

In the following query the start/finish columns are datetime fields.
How should I modify this query to get two more columns, one with the min date and one with the max date (of all the 6 datetime fields and all the rows) repeated in each row.
Alternatively how could I create a new query returning only these 2 (min/max) dates, for the same resultset of course?
Thanks a lot! (I would like answers for both SQL Server 2005 and Sybase ASE 12.5.4)
select erg_mst.code,
erg_types.perigrafh,
erg_mst.FirstBaseStart,
erg_mst.FirstBaseFinish,
erg_mst.LastBaseStart,
erg_mst.LastBaseFinish ,
erg_mst.ActualStart,
erg_mst.ActualFinish
from erg_mst inner join
erg_types on erg_mst.type = erg_types.type_code
where erg_mst.activemodule = 'co'
and (
FirstBaseStart <> NULL OR
FirstBaseFinish <> NULL OR
LastBaseStart <> NULL OR
LastBaseFinish <> NULL OR
ActualStart <> NULL OR
ActualFinish <> NULL
)
order by isnull(FirstBaseStart,isnull(LastBaseStart,ActualStart))
See below for a SQL Server 2005 code sample using Miles D's suggestion of using a series of UNION'ed selects (sorry, I don't know Sybase syntax):
select min(AllDates) as MinDate, max(AllDates) as MaxDate
from
(
select erg_mst.FirstBaseStart as AllDates
from erg_mst
where erg_mst.activemodule = 'co'
and FirstBaseStart IS NOT NULL
union all
select erg_mst.FirstBaseFinish as AllDates
from erg_mst
where erg_mst.activemodule = 'co'
and FirstBaseFinish IS NOT NULL
union all
select erg_mst.LastBaseStart as AllDates
from erg_mst
where erg_mst.activemodule = 'co'
and LastBaseStart IS NOT NULL
union all
select erg_mst.LastBaseFinish as AllDates
from erg_mst
where erg_mst.activemodule = 'co'
and LastBaseFinish IS NOT NULL
union all
select erg_mst.ActualStart as AllDates
from erg_mst
where erg_mst.activemodule = 'co'
and ActualStart IS NOT NULL
union all
select erg_mst.ActualFinish as AllDates
from erg_mst
where erg_mst.activemodule = 'co'
and ActualFinish IS NOT NULL
) #Temp
I can think of two solutions, but both need to take on board Lucer's comment to use IS NOT NULL rather than <> NULL.
Create two user-defined functions to return the maximum and minimum values. This is fine, but assumes you have permission to create them.
Use a series of UNION'ed selects, each one selecting one of the six columns, and use this as the inner nested SELECT from which you then SELECT MAX(), MIN().
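The second option can be sketched end to end with Python's sqlite3 module. This is an illustration under assumptions: SQLite rather than SQL Server or Sybase, ISO-formatted text dates, made-up sample rows, and a hypothetical overall_min_max helper that builds the six UNION ALL branches from a column list.

```python
import sqlite3

DATE_COLUMNS = [
    "FirstBaseStart", "FirstBaseFinish", "LastBaseStart",
    "LastBaseFinish", "ActualStart", "ActualFinish",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE erg_mst (code TEXT, activemodule TEXT, "
    + ", ".join(f"{col} TEXT" for col in DATE_COLUMNS)
    + ")"
)
conn.executemany(
    "INSERT INTO erg_mst VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "co", "2009-01-10", "2009-02-01", None, None, "2009-01-15", None),
        ("B", "co", None, None, "2008-12-01", "2009-03-20", None, "2009-04-05"),
        ("C", "xx", "2001-01-01", None, None, None, None, "2020-01-01"),  # other module
    ],
)


def overall_min_max(conn, activemodule):
    """Stack the six date columns into one column, then take MIN and MAX over it."""
    branches = "\nUNION ALL\n".join(
        f"SELECT {col} AS AllDates FROM erg_mst "
        f"WHERE activemodule = :module AND {col} IS NOT NULL"
        for col in DATE_COLUMNS
    )
    sql = f"SELECT MIN(AllDates), MAX(AllDates) FROM ({branches}) AS all_dates"
    return conn.execute(sql, {"module": activemodule}).fetchone()


min_date, max_date = overall_min_max(conn, "co")
print(min_date, max_date)
```

UNION ALL is used instead of UNION because duplicates do not affect MIN or MAX, so there is no reason to pay for de-duplication.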