Calculate row count when facing SPOOL space issue - sql

SEL COUNT(*) FROM DATABASE_A.QF
Count = 37,011,480
SEL COUNT(*) FROM DATABASE_A_INC.QFA
Count = 368,454
Query 1
DELETE A
FROM
DATABASE_A.QF A,
DATABASE_A_INC.QFA B
WHERE
A.Q_NUM = B.Q_NUM
AND
A.ID = B.ID
AND
A.LOCATION_ID=1;
The above DELETE query runs into a SPOOL space issue.
So I rewrote it in another form.
Query 2
DELETE FROM DATABASE_A.QF A WHERE (Q_NUM,ID) IN
(SELECT Q_NUM,ID FROM DATABASE_A_INC.QFA B)
AND LOCATION_ID=1;
368454 rows processed.
DELETE Command Complete
My questions:
Are query 1 and 2 logically the same? Are they deleting the same records?
How do I verify the count from Query 1 without running into a SPOOL space issue? I have tried a general COUNT function. I tried increasing spool space to a certain extent.
Is there a better way to check the count for Query 1?

The queries are logically the same, yes. My guess is the reason for your SPOOL space issue is that you are listing your tables with commas instead of joining them. Try counting query 1 like this:
SELECT COUNT(*)
FROM DATABASE_A.QF A
INNER JOIN DATABASE_A_INC.QFA B ON A.Q_NUM = B.Q_NUM
WHERE A.ID = B.ID
AND A.LOCATION_ID=1;
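One caveat when a join is used to predict a DELETE count: the join counts matching pairs, so it can exceed the number of rows the DELETE removes if a (Q_NUM, ID) pair repeats in QFA. A count based on EXISTS counts each QF row at most once. The sketch below illustrates the difference with Python's built-in sqlite3 module as a stand-in for Teradata; the tiny tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QF  (Q_NUM INTEGER, ID INTEGER, LOCATION_ID INTEGER);
    CREATE TABLE QFA (Q_NUM INTEGER, ID INTEGER);
    INSERT INTO QF  VALUES (1, 10, 1), (2, 20, 1), (3, 30, 2), (4, 40, 1);
    -- (1, 10) appears twice in QFA on purpose (hypothetical duplicate)
    INSERT INTO QFA VALUES (1, 10), (1, 10), (2, 20), (3, 30);
""")

# Join count: one result row per matching (QF row, QFA row) pair.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM QF A
    INNER JOIN QFA B ON A.Q_NUM = B.Q_NUM AND A.ID = B.ID
    WHERE A.LOCATION_ID = 1
""").fetchone()[0]

# EXISTS count: each QF row is counted at most once.
exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM QF A
    WHERE A.LOCATION_ID = 1
      AND EXISTS (SELECT 1 FROM QFA B
                  WHERE B.Q_NUM = A.Q_NUM AND B.ID = A.ID)
""").fetchone()[0]

# The DELETE itself removes each matching QF row once.
deleted = conn.execute("""
    DELETE FROM QF
    WHERE LOCATION_ID = 1
      AND EXISTS (SELECT 1 FROM QFA B
                  WHERE B.Q_NUM = QF.Q_NUM AND B.ID = QF.ID)
""").rowcount

print(join_count, exists_count, deleted)
```

If (Q_NUM, ID) is unique in QFA, the two counts agree, which is consistent with the 368,454 rows reported in the question.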

Related

Select statement for 1 table returns more rows than the table actually has

Update: the issue was in saving results into a different table. Apologies, this question should be deleted.
I got this query:
SELECT DISTINCT
SubscriberKey,
'True' as Email_Opens
FROM LN_Journey_21
WHERE SubscriberKey in(
SELECT
LN.SubscriberKey
FROM
_Job J
join _Open O on J.JobID = O.JobID
join LN_Journey_21 LN on LN.SubscriberKey = O.SubscriberKey
WHERE
J.EmailName LIKE 'IQOS_LN%'
and j.CreatedDate >= '2021-05-10'
)
SubscriberKey is a PK in LN_Journey_21.
The results have more rows than LN_Journey_21 had before running the query, how is that?
The query should be (most importantly you don't need DISTINCT anywhere):
SELECT SubscriberKey,
'True' as Email_Opens
FROM dbo.LN_Journey_21 AS LN
WHERE EXISTS
(
SELECT 1
FROM dbo._Job AS J
INNER JOIN dbo._Open AS O
ON J.JobID = O.JobID
WHERE O.SubscriberKey = LN.SubscriberKey
AND J.EmailName LIKE 'IQOS_LN%'
AND J.CreatedDate >= '20210510'
);
Extra rows could be explained by:
different query than what's posted in your question
COUNT query is more complex than just SELECT COUNT(*) FROM dbo.table;
data has actually changed between when you ran the COUNT query and when you ran this query
using NOLOCK (perhaps you've hidden it from us, or it's used on your COUNT query, or both)
you are relying on the status bar in SSMS, which shows total rows for the batch by default, and you have other queries in the batch that return those additional 500 rows
Like the comments suggest, it would be great if you could show a scenario (e.g. on db<>fiddle) where COUNT produces fewer rows than this query. With the information we have so far, that isn't possible, except in situations like those I mentioned above (that list may not be exhaustive, but it probably covers the most common causes).
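To illustrate why the EXISTS form cannot return more rows than the table holds, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. The sample rows are invented, and the dbo. schema prefix is dropped because SQLite has no such schema. Even though subscriber 'a' has several opens, it appears once in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LN_Journey_21 (SubscriberKey TEXT PRIMARY KEY);
    CREATE TABLE _Job  (JobID INTEGER, EmailName TEXT, CreatedDate TEXT);
    CREATE TABLE _Open (JobID INTEGER, SubscriberKey TEXT);
    INSERT INTO LN_Journey_21 VALUES ('a'), ('b'), ('c');
    INSERT INTO _Job VALUES (1, 'IQOS_LN_1', '2021-05-11'),
                            (2, 'IQOS_LN_2', '2021-05-12');
    -- subscriber 'a' opened three times, 'b' once, 'c' never
    INSERT INTO _Open VALUES (1, 'a'), (1, 'a'), (2, 'a'), (2, 'b');
""")

table_rows = conn.execute("SELECT COUNT(*) FROM LN_Journey_21").fetchone()[0]

# EXISTS only filters rows of LN_Journey_21; it never multiplies them.
result = conn.execute("""
    SELECT SubscriberKey, 'True' AS Email_Opens
    FROM LN_Journey_21 AS LN
    WHERE EXISTS (
        SELECT 1
        FROM _Job AS J
        INNER JOIN _Open AS O ON J.JobID = O.JobID
        WHERE O.SubscriberKey = LN.SubscriberKey
          AND J.EmailName LIKE 'IQOS_LN%'
          AND J.CreatedDate >= '2021-05-10'
    )
    ORDER BY SubscriberKey
""").fetchall()

print(table_rows, result)
```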

Determine datatypes of columns - SQL selection

Is it possible to determine the data type of each column after a SQL selection, based on the received results? I know it is possible through information_schema.columns, but the data I receive comes from multiple tables that are joined together, and the columns are renamed. Besides that, I'm not able to see or use this query or execute other queries myself.
My job is to store this received data in another table, but without knowing beforehand what I will receive. I'm obviously able to check, for example, whether a certain column contains numbers or text, but not whether it is originally stored as a TINYINT(1) or a BIGINT(128). How should I approach this? To clarify, it is all right if the data types of the source and destination columns aren't entirely the same, but I don't want to reserve too much space beforehand (or too little, for that matter).
As I'm typing, I realize I'm formulating the question wrong. What would be the best approach to handle the described situation? I thought about altering tables on the fly (e.g. increasing the size if needed), but that seems a bit, well, wrong and not the proper way.
Thanks
Can you issue the following query about your new table after you create it?
SELECT *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID;
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults';
Is the query too big to run before knowing how big the results will be? You can get an idea of how many rows it may return; the trick with join queries is to group on the columns you are joining on, which helps the estimate return more quickly. Here's an example that returns just the row count of the query above, the one that would have created the JoinedQueryResults table.
SELECT SUM(A.NumRows * B.NumRows)
FROM (SELECT ID, COUNT(*) AS NumRows
FROM TableA
GROUP BY ID) AS A
INNER JOIN (SELECT ID, COUNT(*) AS NumRows
FROM TableB
GROUP BY ID) AS B ON A.ID = B.ID
The query above will run faster if all you need is a record count to help you estimate a size.
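As a sanity check on the grouped estimate, the following sketch (Python's sqlite3 as a stand-in engine, with made-up rows for TableA and TableB) compares it with a direct COUNT(*) over the join. Both give the same number, because every ID contributes the product of its row counts in the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, payload TEXT);
    CREATE TABLE TableB (ID INTEGER, payload TEXT);
    INSERT INTO TableA VALUES (1, 'a1'), (1, 'a2'), (2, 'a3'), (3, 'a4');
    INSERT INTO TableB VALUES (1, 'b1'), (1, 'b2'), (1, 'b3'), (2, 'b4'), (4, 'b5');
""")

# Direct count: materializes every joined pair before counting.
direct = conn.execute("""
    SELECT COUNT(*)
    FROM TableA AS A
    INNER JOIN TableB AS B ON A.ID = B.ID
""").fetchone()[0]

# Grouped estimate: count per ID on each side, then sum the products.
grouped = conn.execute("""
    SELECT SUM(A.NumRows * B.NumRows)
    FROM (SELECT ID, COUNT(*) AS NumRows FROM TableA GROUP BY ID) AS A
    INNER JOIN (SELECT ID, COUNT(*) AS NumRows FROM TableB GROUP BY ID) AS B
        ON A.ID = B.ID
""").fetchone()[0]

print(direct, grouped)
```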
Also try instantiating a table for your results with a query like this.
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID

Inefficient if/then loop in a SQL query

I'd like to clean up the results of a SQL query that is always run manually from within Management Studio. But my if/then block is taking much longer than the individual elements.
Currently, this Select statement runs instantly (less than 1 second) and is usually empty:
Select * from A join B on A.id=B.id
Instead of an empty result set, I wanted to display a message if there were no results (this is part of a larger multi-part query so the clarity would help). I changed it to this:
If (Select count(*) from A join B on A.id=B.id)>0
begin
Select * from A join B on A.id=B.id
end
else
Select 'No Results'
Since both Select statements in there run near instantly (I checked), I expect this entire snippet to run in the same amount of time. Instead, it takes EIGHT seconds. Why is this taking so much longer and is there a simple way around it?
Use IF EXISTS, which lets the engine stop at the first matching row instead of counting them all:
If exists (Select * from A join B on A.id=B.id)
begin
Select * from A join B on A.id=B.id
end
else
Select 'No Results'
I'd suggest checking the result count after the query. This has the downside of giving you a second result set in your output, but it has the upside of not querying the data twice.
Select * from A join B on A.id=B.id
IF @@ROWCOUNT = 0 Select 'No Results'
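The same "run the query once, then branch on how many rows came back" idea can be sketched on the client side. This is only an analogue in Python with sqlite3, not T-SQL; the helper name rows_or_message and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (3);
""")

def rows_or_message(connection):
    """Run the join once and fall back to a message row if it is empty."""
    rows = connection.execute(
        "SELECT * FROM A JOIN B ON A.id = B.id"
    ).fetchall()
    return rows if rows else [("No Results",)]

empty_case = rows_or_message(conn)      # no ids match yet
conn.execute("INSERT INTO B VALUES (2)")
match_case = rows_or_message(conn)      # id 2 now matches

print(empty_case, match_case)
```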

Query does not behave as expected

I have a query:
select count(*) as total
from sheet_record right join
(select * from sheet_record limit 10) as sr
on 1=1;
If I understood correctly (which I think I did not), a right join is supposed to return all rows from the right table in conjunction with the left table, so there should be at least 10 rows. But the query returns only 1 row with 1 column, 'total'. And it doesn't matter whether it is a left, full, or inner join; the result is always the same.
If I swap the tables and use a left join, with a small modification of the query, then it works correctly (the modifications don't matter, because in that case I get exactly what I expected). But I would like to find out what I misunderstood about joins and why this query does not work as expected.
You are getting one row because the select contains an aggregation function with no GROUP BY, which turns this into an aggregation query. The value of total should be 10 times the number of rows in the sheet_record table.
Your query is effectively a cross join. So, if you did:
select *
from sheet_record right join
(select * from sheet_record limit 10) as sr
on 1=1;
You would get 10 rows for each record in sheet_record. Each of those records would have additional columns from one of ten records from the same table.
You are using the count(*) function without any grouping, so the query always returns a single row. Try running your query without the count(*) to see if you get something closer to what you expect.
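A small sketch of this behaviour, using Python's sqlite3 module rather than the asker's database. The table contents are invented (25 rows), and INNER JOIN is used in place of RIGHT JOIN because older SQLite versions lack RIGHT JOIN; with ON 1=1 and non-empty inputs the two produce the same pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sheet_record (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO sheet_record (name) VALUES (?)",
    [(f"row {i}",) for i in range(25)],
)

# ON 1=1 pairs every row of sheet_record with each of the 10 subquery rows.
pairs = conn.execute("""
    SELECT *
    FROM sheet_record
    INNER JOIN (SELECT * FROM sheet_record LIMIT 10) AS sr ON 1=1
""").fetchall()

# Adding COUNT(*) without GROUP BY collapses all those pairs into one row.
counted = conn.execute("""
    SELECT COUNT(*) AS total
    FROM sheet_record
    INNER JOIN (SELECT * FROM sheet_record LIMIT 10) AS sr ON 1=1
""").fetchall()

print(len(pairs), counted)
```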
Eventually, with the help of the commenters, I understood what was wrong. Not wrong, actually, but what exactly I was not catching.
-- The query below works fine: it returns page 15 with 10 records on it.
select * from sheet_record inner join (select count(*) as total from sheet_record) as sr on 1=1 limit 10 offset 140;
I was thinking that a join takes the table on the left and joins it with the table on the right. But while I was working on the script above, I had a view (a table built by a subquery) on the right side instead of a plain table, and I assumed the left side was also a view, built by (select * from sheet_record), which was a mistake.
The idea is to get a set of records from table X with an additional column holding the total number of records in the table.
(This is a common problem when a table has to be shown in a UI with paging. To know how many pages are still available, I need to know how many records there are in total.)
I think it should be something like this:
select * from (
(a subquery that uses count(*) on table X to produce a one-row view; this is the left table)
right join
(a subquery that selects a set of records from table X with limit and offset)
on 1=1 -- because I need every row from the right table (view), the condition should always be true
)
The query with a right join will be a bit more complicated.
I am using PostgreSQL.
So eventually I managed to get the result with a right join:
select * from (select count(*) as total from sheet_record) as srt right join (select * from sheet_record limit 10 offset 140) as sr on 1=1;
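The paging idea can be sketched end to end as follows. This uses Python's sqlite3 module as a stand-in for PostgreSQL, a CROSS JOIN instead of RIGHT JOIN ... ON 1=1 (older SQLite versions have no RIGHT JOIN), and an ORDER BY so that pages are deterministic. The helper fetch_page and the 153 sample rows are assumptions for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sheet_record (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO sheet_record (name) VALUES (?)",
    [(f"row {i}",) for i in range(153)],
)

def fetch_page(connection, page, page_size):
    """Return one page of rows, each carrying the table's total row count."""
    offset = (page - 1) * page_size
    return connection.execute("""
        SELECT sr.id, sr.name, srt.total
        FROM (SELECT * FROM sheet_record
              ORDER BY id LIMIT ? OFFSET ?) AS sr
        CROSS JOIN (SELECT COUNT(*) AS total FROM sheet_record) AS srt
        ORDER BY sr.id
    """, (page_size, offset)).fetchall()

page = fetch_page(conn, page=15, page_size=10)   # same as limit 10 offset 140
total = page[0][2]
total_pages = math.ceil(total / 10)

print(page[0], total_pages)
```

With 10 records per page, the total column is enough for the UI to compute how many pages exist.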

Setting multiple rows in one table equal to multiple rows in another table based on multiple column values being equal

I am trying to run a rather convoluted query. Two tables are out of sync. In one step of the processing, a 16 digit value is copied from one table to the other, and is getting truncated to just 10 digits.
I'm using a few pieces of information to copy the full 16 digit number over. I'm trying to find anywhere where the 10 digit value matches the first 10 digits of the 16 digit value, and three other pieces of information in these two tables match. Combined, they give almost 100% certainty that we have a unique entry. This is the current iteration of my query:
UPDATE DB1.TABLE1
SET ID =
(
SELECT b.ID
FROM DB2.TABLE1 b
INNER JOIN DB1.TABLE1 a
ON left(b.ID, 10) = a.ID
WHERE len(a.ID) = 10
AND a.STORE = b.STORE
AND a.DOCTYPE = b.DOCTYPE
AND a.DOCDATE = b.DOCDATE
)
The problem is, it's telling me the subquery is returning multiple results. But I want multiple results. I tried adding another WHERE clause after the closing parenthesis, and duplicating the last four lines of the subquery, but that's not working either. I also tried using WHERE EXISTS and duplicating the entire SELECT statement, but that gives me the multiple results error as well. What am I missing here?
Your statement is attempting to update every row in DB1.TABLE1 to what's returned by the subquery. Not only is that not what you want, but the statement fails because the subquery is returning multiple values.
What you need to do is correlate the two tables as part of the update statement, like this:
UPDATE a
SET ID = b.ID
FROM DB1.TABLE1 a
INNER JOIN DB2.TABLE1 b
ON left(b.ID, 10) = a.ID
AND a.STORE = b.STORE
AND a.DOCTYPE = b.DOCTYPE
AND a.DOCDATE = b.DOCDATE
WHERE len(a.ID) = 10
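For engines without UPDATE ... FROM, the same correlation can be written with a correlated subquery that refers back to the row being updated. The sketch below uses Python's sqlite3 module; the table names target and source are hypothetical stand-ins for DB1.TABLE1 and DB2.TABLE1, and substr/length replace LEFT/LEN. It assumes each truncated row has at most one match; SQL Server would raise an error on multiple matches, whereas SQLite silently takes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (ID TEXT, STORE INTEGER, DOCTYPE TEXT, DOCDATE TEXT);
    CREATE TABLE source (ID TEXT, STORE INTEGER, DOCTYPE TEXT, DOCDATE TEXT);
    INSERT INTO source VALUES
        ('1234567890123456', 1, 'INV', '2021-01-01'),
        ('9999999999000001', 2, 'INV', '2021-01-02');
    INSERT INTO target VALUES
        ('1234567890',       1, 'INV', '2021-01-01'),  -- truncated, has a match
        ('9999999999',       3, 'INV', '2021-01-02'),  -- truncated, STORE differs
        ('1111111111222222', 1, 'INV', '2021-01-01');  -- already full length
""")

# The subquery is correlated: it is evaluated once per target row.
# The EXISTS guard keeps unmatched rows from being set to NULL.
updated = conn.execute("""
    UPDATE target
    SET ID = (SELECT b.ID FROM source b
              WHERE substr(b.ID, 1, 10) = target.ID
                AND b.STORE = target.STORE
                AND b.DOCTYPE = target.DOCTYPE
                AND b.DOCDATE = target.DOCDATE)
    WHERE length(target.ID) = 10
      AND EXISTS (SELECT 1 FROM source b
                  WHERE substr(b.ID, 1, 10) = target.ID
                    AND b.STORE = target.STORE
                    AND b.DOCTYPE = target.DOCTYPE
                    AND b.DOCDATE = target.DOCDATE)
""").rowcount

ids = [row[0] for row in conn.execute("SELECT ID FROM target ORDER BY rowid")]
print(updated, ids)
```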