SQL query to sort and select from the selected rows

I would like some help with this issue. I have a news table.
I want to select the top 2000 terms sorted by tfidf, then check whether a given term exists among those 2000: if it does, show it, otherwise return 0.
Something like this:
SELECT TOP 1000 [terms]
,[frequency]
,[occurance]
,[idf]
,[tfidf]
FROM [Central].[news]
ORDER BY tfidf DESC;
IF @@ROWCOUNT = 0
select 0 as FinalResult;
ELSE
if @@ROWCOUNT < 2000
select * from [CentralFinance].[dbo].[TFIDF_1] where terms = 'project'

Perhaps this is helpful. It is a total guess:
select top 2000 -- or 1000?
terms, frequency, occurance, idf, tfidf
from Central.news
order by tfidf desc;
if @@rowcount > 0 begin
select * from CentralFinance.dbo.TFIDF_1
where terms in (
select top 2000 terms
from Central.news
order by tfidf desc
);
select 1 as FinalResult;
end
else begin
select 0 as FinalResult;
end
Another thought is:
if exists (select 1 from Central.news) begin
select * from CentralFinance.dbo.TFIDF_1
where terms in (
select top 2000 terms
from Central.news
order by tfidf desc
);
select 1 as FinalResult;
end
else begin
select 0 as FinalResult;
end
And finally a third guess:
select sign(count(*)) as FinalResult
from (select 1 as x) dummy
where 'project' in
(
select top 2000 terms
from Central.news
order by tfidf desc
)
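The "is this term in the top N?" check from the guesses above can be tried out with a small runnable sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so LIMIT stands in for TOP, and the table contents and the helper name term_in_top_n are made up for illustration only.

```python
import sqlite3

def term_in_top_n(conn, term, n=2000):
    """Return 1 if `term` is among the n highest-tfidf rows of news, else 0."""
    (hits,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM (SELECT terms FROM news ORDER BY tfidf DESC, terms LIMIT ?)
        WHERE terms = ?
        """,
        (n, term),
    ).fetchone()
    return 1 if hits > 0 else 0

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (terms TEXT, tfidf REAL)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?)",
    [("project", 0.9), ("budget", 0.5), ("meeting", 0.1)],
)

print(term_in_top_n(conn, "project", n=2))
print(term_in_top_n(conn, "meeting", n=2))
```

The inner ORDER BY includes terms as a tie-breaker so that the set of top rows is the same on every run.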

You can use a temp table to store the result of the initial query and then query the temp table for the new dataset. Or you can add a subquery to the WHERE clause. What do you want returned from the query?

Related

When select from insert into returns no values do something different

I want to insert rows into a table. The table is empty when I start. My query is as follows:
Select TOP 1 *
INTO #Result
FROM #SmallTable
WHERE CategoryID=11
ORDER BY ExpValue DESC;
It works flawlessly. But now I want to account for the case where this query returns no rows, and I'm not sure how to approach it.
I could either check beforehand whether the SELECT TOP 1 returns any rows, or check after the insert whether a row is present. Which approach would be better? Or is there an even better one?
You could use a union trick here to insert a dummy value should the first query not return any records:
INSERT INTO #Result (col)
SELECT TOP 1 col
FROM
(
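-- ORDER BY is not allowed directly before UNION ALL, so wrap the first branch
SELECT col, pos
FROM (SELECT TOP 1 col, 1 AS pos FROM #SmallTable WHERE CategoryID = 11 ORDER BY ExpValue DESC) s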
UNION ALL
SELECT 'NA', 2
) t
ORDER BY pos;
Look at @@ROWCOUNT.
It returns the number of rows affected by the last statement, so check it immediately after the insert:
IF @@ROWCOUNT = 0
PRINT 'Warning: No rows were inserted';
You can use a left join against a VALUES constructor, so that one row always comes back:
select top (1) coalesce(st.CategoryID, 0) as CategoryID, . .
into #destination
from ( values (11)
) t(CategoryID) left join
#SmallTable st
on st.CategoryID = t.CategoryID
order by st.ExpValue desc;
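The "insert, then check the row count" idea can be sketched end to end as follows. This is only an illustration in Python's sqlite3, where cursor.rowcount plays the role of @@ROWCOUNT; the table names small_table and result, the column names, and the helper insert_top_or_default are assumptions, not part of the original question.

```python
import sqlite3

def insert_top_or_default(conn, category_id, default="NA"):
    """Insert the top row for category_id into result, or a default value.

    Returns True if a real row was inserted, False if the default was used.
    """
    cur = conn.execute(
        "INSERT INTO result (col) "
        "SELECT col FROM small_table WHERE category_id = ? "
        "ORDER BY exp_value DESC LIMIT 1",
        (category_id,),
    )
    if cur.rowcount == 0:  # nothing matched, comparable to @@ROWCOUNT = 0
        conn.execute("INSERT INTO result (col) VALUES (?)", (default,))
        return False
    return True

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE small_table (category_id INTEGER, col TEXT, exp_value REAL)"
)
conn.execute("CREATE TABLE result (col TEXT)")
conn.executemany(
    "INSERT INTO small_table VALUES (?, ?, ?)",
    [(11, "low", 1.0), (11, "high", 5.0)],
)
```

Checking after the insert keeps the main query unchanged, at the cost of a second statement in the empty case.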

MS SQL does not return the expected top row when ordering by DIFFERENCE()

I have noticed strange behaviour in some SQL code used for address matching at the company I work for & have created some test SQL to illustrate the issue.
; WITH Temp (Id, Diff) AS (
SELECT 9218, 0
UNION
SELECT 9219, 0
UNION
SELECT 9220, 0
)
SELECT TOP 1 * FROM Temp ORDER BY Diff DESC
Returns 9218 but
; WITH Temp (Id, Name) AS (
SELECT 9218, 'Sonnedal'
UNION
SELECT 9219, 'Lammermoor'
UNION
SELECT 9220, 'Honeydew'
)
SELECT TOP 1 *, DIFFERENCE(Name, '') FROM Temp ORDER BY DIFFERENCE(Name, '') DESC
returns 9219 even though the Difference() is 0 for all records as you can see here:
; WITH Temp (Id, Name) AS (
SELECT 9218, 'Sonnedal'
UNION
SELECT 9219, 'Lammermoor'
UNION
SELECT 9220, 'Honeydew'
)
SELECT *, DIFFERENCE(Name, '') FROM Temp ORDER BY DIFFERENCE(Name, '') DESC
which returns
9218 Sonnedal 0
9219 Lammermoor 0
9220 Honeydew 0
Does anyone know why this happens? I am writing C# to replace existing SQL & need to return the same results so that I can verify my code against it. But I can't see why the actual SQL used returns 9219 rather than 9218 & it doesn't seem to make sense. It seems to be down to the Difference() function, but it returns 0 for all the records in question.
When you call:
SELECT TOP 1 *, DIFFERENCE(Name, '')
FROM Temp l
ORDER BY DIFFERENCE(Name, '') DESC
All three records have a DIFFERENCE value of zero, so SQL Server is free to return any of the three as the top row. That is to say, there is no guarantee which order you will get among ties. The same is true for your other queries, where every sort value is also 0. It is even possible that the ordering for the same query could change over time. In practice, if you expect a certain ordering, you should provide exact logic for it, e.g.
SELECT TOP 1 *
FROM Temp
ORDER BY Id;
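To see how a tie-breaker column makes the top row predictable, here is a small sketch in Python's sqlite3. SQLite has no DIFFERENCE() function, so the all-zero scores are stored in an assumed diff column, and the table name addresses is made up; the rows are inserted out of order on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER, name TEXT, diff INTEGER)")
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    [(9220, "Honeydew", 0), (9218, "Sonnedal", 0), (9219, "Lammermoor", 0)],
)

def top_row(conn):
    """Return the best-scoring row; ties on diff are broken by the lowest id."""
    return conn.execute(
        "SELECT id, name FROM addresses ORDER BY diff DESC, id ASC LIMIT 1"
    ).fetchone()

print(top_row(conn))
```

Keeping the score as the first sort key and adding id as the second preserves the ranking while removing the ambiguity among equal scores.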

Random sample in bigquery gives inconsistent results

I'm using the RAND function in bigquery to provide me with a random sample of data, and unioning it with another sample of the same dataset.
This is for a machine learning problem where I'm interested in one class more than the other.
I've recreated the logic using a public dataset.
SELECT
COUNT(1),
bigarticle
FROM
(
SELECT
1 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE num_characters > 50000
),
(
SELECT
0 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE (is_redirect is null) AND (RAND() < 0.01)
)
GROUP BY bigarticle
Most of the time this behaves as expected,
giving one row with the count of rows where num_characters is more than 50k,
and giving another row with a count of a 1% sample of rows where is_redirect is null.
(This is an approximation of the logic I use in my internal dataset).
If you run this query repeatedly, occasionally it gives unexpected results.
In this result set (bquijob_124ad56f_15da8af982e) I only get a single row, containing the count of bigarticle = 1.
RAND does not use a deterministic seed. If you want deterministic results, you need to hash/fingerprint a column in the table and use a modulus to select a subset of values instead. Using legacy SQL:
#legacySQL
SELECT
COUNT(1),
bigarticle
FROM (
SELECT
1 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE num_characters > 50000
), (
SELECT
0 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE (is_redirect is null) AND HASH(title) % 100 = 0
)
GROUP BY bigarticle;
Using standard SQL in BigQuery, which is recommended since legacy SQL is not under active development:
#standardSQL
SELECT
COUNT(*),
bigarticle
FROM (
SELECT
1 as bigarticle
FROM `bigquery-public-data.samples.wikipedia`
WHERE num_characters > 50000
UNION ALL
SELECT
0 as bigarticle
FROM `bigquery-public-data.samples.wikipedia`
WHERE (is_redirect is null) AND MOD(FARM_FINGERPRINT(title), 100) = 0
)
GROUP BY bigarticle;
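The hash-and-modulus idea is not specific to BigQuery. As a rough sketch, the same kind of repeatable sampling can be written in Python; SHA-256 is used here only as a convenient stable hash (it is not FARM_FINGERPRINT, so it selects different rows), and the function names are made up for illustration.

```python
import hashlib

def in_sample(key: str, percent: int = 1) -> bool:
    """Decide membership from a stable hash of the key, not from a random draw."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent

def sample(keys, percent=1):
    """Return the keys whose hash bucket falls below `percent`."""
    return [k for k in keys if in_sample(k, percent)]

# Hypothetical titles, only for illustration.
titles = [f"title-{i}" for i in range(1000)]
print(sample(titles, 10) == sample(titles, 10))
```

Because membership depends only on the key, rerunning the query gives the same rows, and a 1% sample is always a subset of a 10% sample. The sample size is only approximately the requested percentage.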

SQL Server check if where clause is true for any row

I want to select the provinces that intersect any railroad. So I do it like this (using SQL Spatial):
SELECT * FROM ProvinceTable
WHERE (
SELECT count(*)
FROM RailroadTable
WHERE ProvinceTable.Shape.STIntersects(RailroadTable.Shape) > 1
) > 0
But it is not efficient, because it has to check the intersection between every single railroad geometry and province geometry in order to calculate the count. It would be better to stop evaluating the WHERE clause as soon as the first intersection is detected, since there is no need to check the others. Here is what I mean:
SELECT * FROM ProvinceTable
WHERE (
--return true if this is true for any row in the RailroadTable:
-- "ProvinceTable.Shape.STIntersects(RailroadTable.Shape) > 1"
)
So is there a better way to rewrite this query for such a goal?
EDIT
Surprisingly, this query takes the same time and returns no rows:
SELECT * FROM ProvinceTable
WHERE EXISTS (
SELECT *
FROM RailroadTable
WHERE ProvinceTable.Shape.STIntersects(RailroadTable.Shape) > 1
)
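You want to use EXISTS, which can stop at the first matching railroad. Note that STIntersects() returns 1 or 0, so the comparison must be = 1; a test of > 1 is never true, which is why the query in your edit returns no rows: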
SELECT pt.*
FROM ProvinceTable pt
WHERE EXISTS (SELECT 1
FROM RailroadTable rt
WHERE pt.Shape.STIntersects(rt.Shape) = 1
);
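The shape of the EXISTS query can be tried out without spatial types. The sketch below uses Python's sqlite3 and replaces STIntersects with a simple one-dimensional interval overlap test, which is purely an assumption for illustration; the tables and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE province (name TEXT, x_min REAL, x_max REAL);
    CREATE TABLE railroad (x_min REAL, x_max REAL);
    INSERT INTO province VALUES ('A', 0, 10), ('B', 20, 30), ('C', 40, 50);
    INSERT INTO railroad VALUES (5, 25), (8, 9);
    """
)

def provinces_with_railroad(conn):
    """Return each province that overlaps at least one railroad, once."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM province p
        WHERE EXISTS (
            SELECT 1
            FROM railroad r
            WHERE r.x_min <= p.x_max AND r.x_max >= p.x_min
        )
        ORDER BY p.name
        """
    ).fetchall()
    return [name for (name,) in rows]

print(provinces_with_railroad(conn))
```

Province A overlaps both railroads but is still listed only once, since EXISTS only asks whether any match is present.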

UNION ALL tables if @@rowcount from first SELECT yields less than 10 rows

I'm trying to do a stored procedure to be used by a search mechanism.
The way I want it to work is to first do a SELECT with LIKE 'TOYOTA%'. After that, if it yields fewer than 10 results, I want to append (UNION ALL) another SELECT that uses LIKE '%TOYOTA%'.
So basically, this is what I'm looking for:
SELECT *
FROM CARS
WHERE CARS.MAKE LIKE @searchQuery + '%'
IF (@@rowcount < 10)
BEGIN
UNION ALL
SELECT *
FROM CARS
WHERE CARS.MAKE LIKE '%' + @searchQuery + '%'
END
The only problem is that I can't do this: it won't let me use UNION ALL before or after the IF.
I'm doing this because I want to have at least 10 results whenever possible. If I have fewer, I want to fill the remaining slots with records that have the name TOYOTA somewhere in the middle.
You could do this to always get up to 10 results, with prefix matches first:
SELECT top 10 *
FROM CARS
WHERE CARS.MAKE LIKE '%' + @searchQuery + '%'
order by (case when cars.make like @searchQuery + '%' then 0 else 1 end);
To strictly do what you want (get all that begin with the search query and then pad out to 10 if less than 10), you can use window functions:
select c.*
from (SELECT c.*,
row_number() over (order by (case when c.MAKE LIKE @searchQuery + '%' then 0 else 1 end)
) as seqnum
FROM CARS c
WHERE c.MAKE LIKE '%' + @searchQuery + '%'
) c
where c.MAKE LIKE @searchQuery + '%' or seqnum <= 10;
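The simpler "top 10, prefix matches first" variant can be checked with a small sketch in Python's sqlite3. Here || is SQLite's string concatenation and LIMIT stands in for TOP; the cars data and the helper name search_makes are hypothetical.

```python
import sqlite3

def search_makes(conn, query, limit=10):
    """Return up to `limit` makes containing `query`, prefix matches first."""
    rows = conn.execute(
        """
        SELECT make
        FROM cars
        WHERE make LIKE '%' || ? || '%'
        ORDER BY CASE WHEN make LIKE ? || '%' THEN 0 ELSE 1 END, make
        LIMIT ?
        """,
        (query, query, limit),
    ).fetchall()
    return [make for (make,) in rows]

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?)",
    [("NEW TOYOTA",), ("HONDA",), ("TOYOTA COROLLA",), ("TOYOTA",)],
)

print(search_makes(conn, "TOYOTA"))
```

The search term is passed as a bound parameter and concatenated with the wildcards inside the query, which avoids building the LIKE pattern into the SQL text by hand.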
You should use a table variable, I think:
DECLARE @MyTableVar table(
CarID int NOT NULL,
Puertas int,
Ruedas int,
FechaDeSalida datetime);
Declare it with exactly the same columns as the CARS table. Then do:
INSERT INTO @MyTableVar
SELECT *
FROM CARS
WHERE CARS.MAKE LIKE @searchQuery + '%'
Now the table variable holds your query result. Finally:
IF (@@rowcount < 10)
BEGIN
SELECT *
FROM @MyTableVar
UNION ALL
SELECT *
FROM CARS
WHERE CARS.MAKE LIKE '%' + @searchQuery + '%'
AND CARS.MAKE NOT LIKE @searchQuery + '%' -- skip rows already in @MyTableVar
END
ELSE
SELECT *
FROM @MyTableVar
Sorry for my bad English! Hope this helps!