SQL query to pull certain rows based on values in other rows in the same table - sql

I have a set of data that contains two identifiers per record: a unique number for that record, Widget_Number, and the original unique number for the record, Original_Widget_Number. Typically these two values are identical, but when a record has been revised, a new record is created with a new Widget_Number, preserving the old Widget_Number value in Original_Widget_Number. That is, SELECT * FROM widgets WHERE Widget_Number != Original_Widget_Number returns all records that have been changed. (Widget_Number increments by 10 for new widgets and by 1 for revised widgets.)
I would like to return all records that were changed as well as the original records related to those records. For example if I had a table containing:
Row | Widget_Number | Original_Widget_Number | More_Data
1   | 10            | 10                     | Stephen
2   | 11            | 10                     | Steven
3   | 20            | 20                     | Joe
I would like a query to return rows 1 & 2. I know I could loop through this in a higher-level language, but is there a straightforward way to do this in MS SQL?

Using exists():
select *
from widgets as t
where exists (
select 1
from widgets as i
where i.original_widget_number = t.original_widget_number
and i.widget_number != i.original_widget_number
)
Or using in():
select *
from widgets as t
where t.original_widget_number in (
select i.original_widget_number
from widgets as i
where i.widget_number != i.original_widget_number
)

The following should get both the records that have changed and the original records:
select w.*
from widgets w
where w.widget_number <> w.original_widget_number or
exists (select 1
from widgets w2
where w.widget_number = w2.original_widget_number and
w2.widget_number <> w2.original_widget_number
);

select * from widgets
where original_widget_number in
(select original_widget_number from widgets
where widget_number <> original_widget_number)
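To check the in() approach against the question's sample rows, here is a minimal sketch. It runs the query in an in-memory SQLite database through Python's sqlite3 module, which is only a stand-in for MS SQL; the column types are my assumption, since the question does not give them.

```python
import sqlite3

# In-memory SQLite database; the rows mirror the question's example table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets ("
    "widget_number INTEGER PRIMARY KEY, "
    "original_widget_number INTEGER, "
    "more_data TEXT)"
)
conn.executemany(
    "INSERT INTO widgets VALUES (?, ?, ?)",
    [(10, 10, "Stephen"), (11, 10, "Steven"), (20, 20, "Joe")],
)

# Keep every row whose original number belongs to at least one revised record.
query = """
    SELECT widget_number, original_widget_number, more_data
    FROM widgets
    WHERE original_widget_number IN (
        SELECT original_widget_number
        FROM widgets
        WHERE widget_number <> original_widget_number
    )
    ORDER BY widget_number
"""
widget_rows = conn.execute(query).fetchall()
print(widget_rows)  # [(10, 10, 'Stephen'), (11, 10, 'Steven')]
```

Widget 20 is left out because no revised record points back to it.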

Related

Return 0 if no row found in SQL Server using Pivot

Thanks everyone, and thank you @Aaron Bertrand, your answer solved my problem :)
I am struggling to find a solution to a problem with my SQL Server query (shown below).
EDIT: a few more details. This is the kind of data that I have in my table:
identifier | date     | status
1          | 20220421 | have a book
2          | 20220421 | have a pdf
3          | 20220421 | have a pdf
4          | 20220421 | have a book
5          | 20220421 | have a book
6          | 20220421 | have a book
My query gives this result:
have a book | have a pdf
4           | 2
So when there are no records for a date, I need a query that returns:
have a book | have a pdf
0           | 0
instead of an empty result:
have a book | have a pdf
Here is my query:
SELECT * FROM
(
Select status from database.dbo.MyTable where date = '20220421' and status
in ('have a book','have a pdf')) y
PIVOT( Count (status) FOR y.status IN ( [have a book],[have a pdf])
) pivot_table
This query works well, but I want to display 0 in the results if no row is found. I tried IsNull, which works without the Pivot part, but I wasn't able to make it work with the Pivot.
Thanks in advance :)
Since we're only dealing with bits and one or zero rows for a given date, you can just add a union to include a second row with zeros, and take the max (which will either pull the 0 or 1 from the real row, or the zeros from the dummy row when a real row doesn't exist).
SELECT [have a book] = MAX([have a book]),
[have a pdf] = MAX([have a pdf])
FROM
(
SELECT [have a book], [have a pdf] FROM
(
SELECT status FROM dbo.whatever
WHERE date = '20220421'
AND status IN ('have a book','have a pdf')
) AS src PIVOT
(
COUNT(status) FOR status IN
([have a book],[have a pdf])
) AS pivot_table
UNION ALL SELECT 0,0
) AS final;
Example db<>fiddle
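As an alternative to the UNION ALL trick, and not what the answer above does, the same counts can be produced with conditional aggregation instead of PIVOT. An aggregate query without GROUP BY always returns exactly one row, so a date with no records gives zeros. The sketch below runs in SQLite via Python's sqlite3 only so that it can be executed as-is; the table layout is taken from the question, and the second date is a made-up value with no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (identifier INTEGER, date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "20220421", "have a book"),
        (2, "20220421", "have a pdf"),
        (3, "20220421", "have a pdf"),
        (4, "20220421", "have a book"),
        (5, "20220421", "have a book"),
        (6, "20220421", "have a book"),
    ],
)

# COUNT ignores NULLs, and CASE without ELSE yields NULL for non-matching rows.
# With no GROUP BY, the query returns one row even when nothing matches.
query = """
    SELECT COUNT(CASE WHEN status = 'have a book' THEN 1 END) AS "have a book",
           COUNT(CASE WHEN status = 'have a pdf'  THEN 1 END) AS "have a pdf"
    FROM MyTable
    WHERE date = ?
"""
counts_present = conn.execute(query, ("20220421",)).fetchone()
counts_missing = conn.execute(query, ("20220422",)).fetchone()  # hypothetical empty date
print(counts_present)  # (4, 2)
print(counts_missing)  # (0, 0)
```

The COUNT(CASE ...) expressions are standard SQL, so the same SELECT should carry over to SQL Server with bracketed column aliases.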

Filtering a column based on having some value in one of the rows in SQL or Presto Athena

In Athena, I am trying to output only the users that have a specific value in some of their rows, but not in all of them.
Suppose I have the table below.
I want all users that have the value '100' in at least one of their rows and also have a value other than 100 in another row.
user | value
A | 1
B | 2
A | 100
D | 3
A | 4
C | 3
C | 5
D | 100
So in this example I would want to get only users A and D, because only they have both a 100 and a non-100 value.
I tried grouping by user, creating an array of values per user, and then checking whether the array contains 100, but I couldn't manage to do it in Presto.
I also thought about converting rows to columns and then checking whether one of the columns equals 100.
Are those solutions too complex? Does anybody know how to implement them, or have a better, simpler solution?
The users that have at least one value of 100 can be found with this SQL:
SELECT DISTINCT user
FROM some_table
WHERE value = 100
But I assume you are after all tuples of user and value where the user has at least one value of 100, this can be accomplished by using the query above in a slightly more complex query:
WITH matching_users AS (
SELECT DISTINCT user
FROM some_table
WHERE value = 100
)
SELECT user, value
FROM matching_users
LEFT JOIN some_table USING (user)
You can use a subquery as below to achieve your required output:
SELECT * FROM your_table
WHERE User IN(
SELECT DISTINCT User
FROM your_table
WHERE Value = 100
)
If you just want the users, I would go for aggregation:
select user
from t
group by user
having sum(case when value = 100 then 1 else 0 end) > 0;
If 100 is the maximum possible value, this can be simplified to:
having max(value) = 100
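The question also asks that the user have at least one value other than 100. A second condition in the HAVING clause covers that. Here is a minimal sketch on the question's sample data, run in SQLite through Python's sqlite3 as a stand-in for Athena; the column name is quoted because user is a reserved word in some engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE some_table ("user" TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("A", 1), ("B", 2), ("A", 100), ("D", 3),
     ("A", 4), ("C", 3), ("C", 5), ("D", 100)],
)

# A user qualifies only if it has at least one row equal to 100
# and at least one row different from 100.
query = """
    SELECT "user"
    FROM some_table
    GROUP BY "user"
    HAVING SUM(CASE WHEN value = 100 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN value <> 100 THEN 1 ELSE 0 END) > 0
    ORDER BY "user"
"""
mixed_users = [row[0] for row in conn.execute(query)]
print(mixed_users)  # ['A', 'D']
```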

SQL Rows to Columns if column values are unknown

I have a table that has demographic information about a set of users which looks like this:
User_id Category IsMember
1 College 1
1 Married 0
1 Employed 1
1 Has_Kids 1
2 College 0
2 Married 1
2 Employed 1
3 College 0
3 Employed 0
The result set I want is a table that looks like this:
User_Id | College | Married | Employed | Has_Kids
1       | 1       | 0       | 1        | 1
2       | 0       | 1       | 1        | 0
3       | 0       | 0       | 0        | 0
In other words, the table indicates the presence or absence of a category for each user. Sometimes the user will have a category where the value is false, and sometimes the user will have no row for a category, in which case IsMember is assumed to be false.
Also, from time to time additional categories will be added to the data set, and I'm wondering if it's possible to do this query without knowing all the possible category names up front. In other words, I won't be able to specify all the column names I want in the result. (Note that only user 1 has the category "Has_Kids" and user 3 is missing a row for the category "Married".)
(using Postgres)
Thanks.
You can use jsonb functions.
with titles as (
select jsonb_object_agg(Category, Category) as titles,
jsonb_object_agg(Category, -1) as defaults
from demog
),
the_rows as (
select null::bigint as id, titles as data
from titles
union
select User_id, defaults || jsonb_object_agg(Category, IsMember)
from demog, titles
group by User_id, defaults
)
select id, string_agg(value, '|' order by key)
from (
select id, key, value
from the_rows, jsonb_each_text(data)
) x
group by id
order by id nulls first
You can see a running example in http://rextester.com/QEGT70842
You can replace -1 with 0 for the default value and '|' with ',' for the separator.
You can install tablefunc module and use the crosstab function.
https://www.postgresql.org/docs/9.1/static/tablefunc.html
I found a Postgres function script called colpivot which does the trick. I ran the script to create the function, then created the table in one statement:
select colpivot ('_pivoted', 'select * from user_categories', array['user_id'],
array ['category'], '#.is_member', null);
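When the category names are not known up front, another common route is to build the pivot in two steps from client code: first read the distinct categories, then generate one output column per category. The sketch below is my own illustration, not one of the answers above. It uses SQLite through Python's sqlite3 rather than Postgres, and the table name demog, the column name is_member, and the helper quote_ident are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demog (user_id INTEGER, category TEXT, is_member INTEGER)")
conn.executemany("INSERT INTO demog VALUES (?, ?, ?)", [
    (1, "College", 1), (1, "Married", 0), (1, "Employed", 1), (1, "Has_Kids", 1),
    (2, "College", 0), (2, "Married", 1), (2, "Employed", 1),
    (3, "College", 0), (3, "Employed", 0),
])

def quote_ident(name):
    """Quote a string for use as an SQL identifier (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'

# Step 1: discover the categories that exist right now.
categories = [r[0] for r in conn.execute(
    "SELECT DISTINCT category FROM demog ORDER BY category")]

# Step 2: one output column per category; a missing row counts as 0.
columns = ", ".join(
    f"COALESCE(MAX(CASE WHEN category = ? THEN is_member END), 0) AS {quote_ident(c)}"
    for c in categories
)
pivot_sql = f"SELECT user_id, {columns} FROM demog GROUP BY user_id ORDER BY user_id"
cursor = conn.execute(pivot_sql, categories)
header = [d[0] for d in cursor.description]
pivot_rows = cursor.fetchall()
print(header)      # ['user_id', 'College', 'Employed', 'Has_Kids', 'Married']
print(pivot_rows)  # [(1, 1, 1, 1, 0), (2, 0, 1, 0, 1), (3, 0, 0, 0, 0)]
```

The category values are passed as bound parameters and only the column aliases are interpolated (after quoting), so a new category shows up as a new column on the next run without editing the query.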

SQL: Find rows that match closely but not exactly

I have a table inside a PostgreSQL database with columns c1,c2...cn. I want to run a query that compares each row against a tuple of values v1,v2...vn. The query should not return an exact match but should return a list of rows ordered in descending similarity to the value vector v.
Example:
The table contains sports records:
1,USA,basketball,1956
2,Sweden,basketball,1998
3,Sweden,skating,1998
4,Switzerland,golf,2001
Now when I run a query against this table with v=(Sweden,basketball,1998), I want to get all records that have a similarity with this vector, sorted by number of matching columns in descending order:
2,Sweden,basketball,1998 --> 3 columns match
3,Sweden,skating,1998 --> 2 columns match
1,USA,basketball,1956 --> 1 column matches
Row 4 is not returned because it does not match at all.
Edit: All columns are equally important. Although, when I really think of it... it would be a nice add-on if I could give each column a different weight factor as well.
Is there any possible SQL query that would return the rows in a reasonable amount of time, even when I run it against a million rows?
What would such a query look like?
SELECT * FROM countries
WHERE country = 'Sweden'
OR sport = 'basketball'
OR year = 1998
ORDER BY
cast(country = 'Sweden' AS integer) +
cast(sport = 'basketball' AS integer) +
cast(year = 1998 AS integer) DESC
It's not beautiful, but it works. You can cast the boolean expressions to integers and sum them.
You can easily change the weights by adding a multiplier:
cast(sport = 'basketball' AS integer) * 5 +
This is how I would do it. The multiplication factors in the CASE expressions set the importance (weight) of each match, and they ensure that records matching the highest-weighted columns come out on top even if their other columns don't match.
/*
-- Initial Setup
-- drop table sport
create table sport (id int, Country varchar(20) , sport varchar(20) , yr int )
insert into sport values
(1,'USA','basketball','1956'),
(2,'Sweden','basketball','1998'),
(3,'Sweden','skating','1998'),
(4,'Switzerland','golf','2001')
select * from sport
*/
select * ,
CASE WHEN Country='Sweden' then 1 else 0 end * 100 +
CASE WHEN sport='basketball' then 1 else 0 end * 10 +
CASE WHEN yr=1998 then 1 else 0 end * 1 as Match
from sport
WHERE
country = 'Sweden'
OR sport = 'basketball'
OR yr = 1998
ORDER BY Match Desc
It might help if you wrote a stored procedure that calculates a "similarity metric" between two rows. Then your query could refer to the return value of that procedure directly rather than having umpteen conditions in the where-expression and the order-by-expression.
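The scoring idea from the answers above can be tried out on the question's four sample rows. This is only a sketch: it runs in SQLite through Python's sqlite3 rather than PostgreSQL, the table name records is made up, and the weights dictionary is a hypothetical way to pass per-column weights. SQLite evaluates a comparison to 1 or 0, so no cast is needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, country TEXT, sport TEXT, yr INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", [
    (1, "USA", "basketball", 1956),
    (2, "Sweden", "basketball", 1998),
    (3, "Sweden", "skating", 1998),
    (4, "Switzerland", "golf", 2001),
])

# Hypothetical weights; all 1 means every column counts equally.
weights = {"country": 1, "sport": 1, "yr": 1}

# The score is the weighted number of matching columns; rows scoring 0 are dropped.
query = """
    SELECT id, score FROM (
        SELECT id,
               (country = :country) * :w_country
             + (sport = :sport) * :w_sport
             + (yr = :yr) * :w_yr AS score
        FROM records
    )
    WHERE score > 0
    ORDER BY score DESC, id
"""
params = {
    "country": "Sweden", "sport": "basketball", "yr": 1998,
    "w_country": weights["country"],
    "w_sport": weights["sport"],
    "w_yr": weights["yr"],
}
ranked = conn.execute(query, params).fetchall()
print(ranked)  # [(2, 3), (3, 2), (1, 1)]
```

Note that this scores every row, so on a million rows it is a full scan; whether that is fast enough depends on the engine and the hardware.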

The MIN() Function Ms Access

This is a sample SQL query that I created as an MS Access query. I am trying to get only one row, the one with MIN(date). However, when I run my query I get multiple rows. Any hints? Thanks.
SELECT tblWarehouseItem.whiItemName,
tblWarehouseItem.whiQty,
tblWarehouseItem.whiPrice,
Min(tblWarehouseItem.whiDateIn) AS MinOfwhiDateIn,
tblWarehouseItem.whiExpiryDate,
tblWarehouseItem.whiwrhID
FROM tblWarehouseItem
GROUP BY tblWarehouseItem.whiDateIn,
tblWarehouseItem.whiItemName,
tblWarehouseItem.whiQty,
tblWarehouseItem.whiPrice,
tblWarehouseItem.whiExpiryDate,
tblWarehouseItem.whiwrhID;
If I write my SQL like this, it works as it should:
SELECT MIN(tblWarehouseItem.whiDateIn) FROM tblWarehouseItem;
In the first query, you group by a number of columns. That means the minimum value will be calculated for each group, which in turn means you may have multiple rows. On the other hand, the second query will only get the minimum value for the specified column from all rows, so that there is only one row in the result set.
A simple example is shown below to illustrate the above.
Table:
Key Value
1 1
1 2
2 3
2 4
On Group By Key:
GroupKey  MinValue
1         min(1,2) = 1  -> Row 1
2         min(3,4) = 3  -> Row 2
On Min(Value):
MinValue
min(1,2,3,4) = 1  -> Row 1
For a table like the one above, if you want to select all rows and also show the minimum value of the whole table rather than per group, you can do something like this:
select key, (select min(value) from table)
from table
SELECT WI.*
FROM tblWarehouseItem AS WI
INNER JOIN (SELECT whiimtID, MIN(tblWarehouseItem.whiDateIn) AS whiDateIn
FROM tblWarehouseItem
GROUP BY whiimtID) AS MinWI
ON (WI.whiDateIn = MinWI.whiDateIn) AND (WI.whiimtID = MinWI.whiimtID);
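To see what the join against the per-item minimum returns, here is a small sketch with made-up rows. It runs in SQLite through Python's sqlite3 rather than MS Access. The column whiimtID comes from the query above and is assumed to identify the item; the sample data and the reduced column list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblWarehouseItem ("
    "whiimtID INTEGER, whiItemName TEXT, whiDateIn TEXT)"
)
# Hypothetical rows: two deliveries per item, dates stored as ISO text.
conn.executemany("INSERT INTO tblWarehouseItem VALUES (?, ?, ?)", [
    (1, "Bolt", "2020-01-05"),
    (1, "Bolt", "2020-03-01"),
    (2, "Nut", "2020-02-10"),
    (2, "Nut", "2020-02-01"),
])

# The subquery finds the earliest date per item; the join keeps only
# the full rows that carry that date.
query = """
    SELECT WI.whiimtID, WI.whiItemName, WI.whiDateIn
    FROM tblWarehouseItem AS WI
    INNER JOIN (
        SELECT whiimtID, MIN(whiDateIn) AS whiDateIn
        FROM tblWarehouseItem
        GROUP BY whiimtID
    ) AS MinWI
      ON WI.whiimtID = MinWI.whiimtID AND WI.whiDateIn = MinWI.whiDateIn
    ORDER BY WI.whiimtID
"""
earliest = conn.execute(query).fetchall()
print(earliest)  # [(1, 'Bolt', '2020-01-05'), (2, 'Nut', '2020-02-01')]
```

If two rows of the same item share the earliest date, both are returned, so this gives one row per item only when the minimum date is unique within the item.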