SQL using fallback column for match

Say I have a table in an SQL database like this:
name age shoesize
---------------------
tom 20 NULL
dick NULL 4
harry 30 5
and I want an SQL statement that selects names that have age == X, or as a fallback, if no such names exist, use those with shoe size == Y. In other words, in this table, for X=20,Y=4 I should only get 'tom', while for X=25,Y=4 I should get only 'dick'. I can't do that with
SELECT name FROM tab WHERE age = 20 OR shoesize = 4;
because that will select both tom and dick. I'm currently using
SELECT COALESCE ((SELECT name FROM tab WHERE age = 20),(SELECT name FROM tab WHERE shoesize = 4));
but is there a neater way? Also using coalesce like this doesn't allow me to get the whole row - i.e. I can't use SELECT * FROM tab, I can only select a single name.

You can use ORDER BY and FETCH FIRST 1 ROW ONLY or some similar clause:
SELECT name
FROM tab
WHERE age = X OR shoesize = Y -- skip rows that match neither condition
ORDER BY (CASE WHEN age = X THEN 1
WHEN shoesize = Y THEN 2
ELSE 3
END)
FETCH FIRST 1 ROW ONLY;
Some databases spell FETCH FIRST 1 ROW ONLY as LIMIT 1 or TOP 1, or use yet another syntax.
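The ORDER BY approach returns a single row. If several names can share the same age or shoe size and the whole row is wanted, one possible variant is to apply the fallback condition only when no row matches the primary one. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in database; the helper names make_demo_db and select_with_fallback are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the sample table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (name TEXT, age INTEGER, shoesize INTEGER)")
    conn.executemany(
        "INSERT INTO tab VALUES (?, ?, ?)",
        [("tom", 20, None), ("dick", None, 4), ("harry", 30, 5)],
    )
    return conn


def select_with_fallback(conn, x, y):
    """Return every row with age = x; only if there is none, rows with shoesize = y."""
    sql = """
        SELECT *
        FROM tab
        WHERE age = :x
           OR (shoesize = :y
               AND NOT EXISTS (SELECT 1 FROM tab WHERE age = :x))
        ORDER BY name
    """
    return conn.execute(sql, {"x": x, "y": y}).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(select_with_fallback(conn, 20, 4))  # [('tom', 20, None)]
    print(select_with_fallback(conn, 25, 4))  # [('dick', None, 4)]
```

Unlike the COALESCE version, this selects whole rows and returns nothing when neither condition matches.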

Related

Filtering a column based on having some value in one of the rows in SQL or Presto Athena

In Athena, I am trying to output only the users that have a specific value in some of their rows but not in all of them.
Suppose I have the table below.
I want all users that have the value '100' in at least one of their rows but also have a value other than 100 in other rows.
user | value
A | 1
B | 2
A | 100
D | 3
A | 4
C | 3
C | 5
D | 100
So in this example I would want to get only users A and D, because only they have both a value of 100 and a value other than 100.
I tried grouping by user, building an array of values per user, and then checking whether the array contains 100, but I can't get it to work in Presto.
I also thought about converting rows to columns and then checking whether one of the columns equals 100.
Are those solutions too complex? Does anybody know how to implement them, or does anyone have a simpler solution?
The users that have at least one value of 100 can be found with this SQL:
SELECT DISTINCT user
FROM some_table
WHERE value = 100
But I assume you are after all tuples of user and value where the user has at least one value of 100. This can be accomplished by using the query above inside a slightly more complex query:
WITH matching_users AS (
SELECT DISTINCT user
FROM some_table
WHERE value = 100
)
SELECT user, value
FROM matching_users
LEFT JOIN some_table USING (user)
You can use a subquery as below to achieve your required output:
SELECT * FROM your_table
WHERE User IN(
SELECT DISTINCT User
FROM your_table
WHERE Value = 100
)
If you just want the users, I would go for aggregation:
select user
from t
group by user
having sum(case when value = 100 then 1 else 0 end) > 0;
If 100 is the maximum possible value, this can be simplified to:
having max(value) = 100
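The question also asks that a user have at least one value other than 100, which the queries above do not check. A second conditional sum in the HAVING clause is one way to express that. Below is a minimal sketch, assuming SQLite (through Python's sqlite3 module) instead of Athena; the function names are hypothetical and the table name some_table is borrowed from the first answer.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the sample table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE some_table ("user" TEXT, value INTEGER)')
    conn.executemany(
        "INSERT INTO some_table VALUES (?, ?)",
        [("A", 1), ("B", 2), ("A", 100), ("D", 3),
         ("A", 4), ("C", 3), ("C", 5), ("D", 100)],
    )
    return conn


def users_with_100_and_other(conn):
    """Users that have at least one row equal to 100 and at least one row that is not."""
    sql = """
        SELECT "user"
        FROM some_table
        GROUP BY "user"
        HAVING SUM(CASE WHEN value = 100 THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN value <> 100 THEN 1 ELSE 0 END) > 0
        ORDER BY "user"
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(users_with_100_and_other(make_demo_db()))  # ['A', 'D']
```

The same HAVING clause should carry over to Presto/Athena largely unchanged, though that has not been verified here.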

SQL Rows to Columns if column values are unknown

I have a table that has demographic information about a set of users which looks like this:
User_id Category IsMember
1 College 1
1 Married 0
1 Employed 1
1 Has_Kids 1
2 College 0
2 Married 1
2 Employed 1
3 College 0
3 Employed 0
The result set I want is a table that looks like this:
User_Id|College|Married|Employed|Has_Kids
1 1 0 1 1
2 0 1 1 0
3 0 0 0 0
In other words, the table indicates the presence or absence of a category for each user. Sometimes the user will have a category where the value is false, and sometimes the user will have no row for a category, in which case IsMember is assumed to be false.
Also, from time to time additional categories will be added to the data set, and I'm wondering if it's possible to do this query without knowing up front all the possible category names. In other words, I won't be able to specify all the column names I want to count in the result. (Note that only user 1 has category "has_kids" and user 3 is missing a row for category "married".)
(using Postgres)
Thanks.
You can use jsonb functions.
with titles as (
select jsonb_object_agg(Category, Category) as titles,
jsonb_object_agg(Category, -1) as defaults
from demog
),
the_rows as (
select null::bigint as id, titles as data
from titles
union
select User_id, defaults || jsonb_object_agg(Category, IsMember)
from demog, titles
group by User_id, defaults
)
select id, string_agg(value, '|' order by key)
from (
select id, key, value
from the_rows, jsonb_each_text(data)
) x
group by id
order by id nulls first
You can see a running example in http://rextester.com/QEGT70842
You can replace -1 with 0 for the default value and '|' with ',' for the separator.
You can install the tablefunc module and use the crosstab function.
https://www.postgresql.org/docs/9.1/static/tablefunc.html
I found a Postgres function script called colpivot which does the trick. I ran the script to create the function, then created the table in one statement:
select colpivot ('_pivoted', 'select * from user_categories', array['user_id'],
array ['category'], '#.is_member', null);
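When the query is issued from application code, another option is to do the pivot in two steps: first read the distinct category names, then generate one MAX(CASE ...) column per category. This is a rough sketch of that idea, assuming SQLite through Python's sqlite3 module rather than Postgres, with the table name demog taken from the jsonb answer; pivot_categories and quote_ident are hypothetical helpers, not part of any library.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the sample table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demog (User_id INTEGER, Category TEXT, IsMember INTEGER)"
    )
    conn.executemany("INSERT INTO demog VALUES (?, ?, ?)", [
        (1, "College", 1), (1, "Married", 0), (1, "Employed", 1), (1, "Has_Kids", 1),
        (2, "College", 0), (2, "Married", 1), (2, "Employed", 1),
        (3, "College", 0), (3, "Employed", 0),
    ])
    return conn


def quote_ident(name):
    """Quote a value for use as a column alias (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'


def pivot_categories(conn):
    """Return (header, rows) with one column per category found in the data."""
    # Step 1: discover the categories that currently exist.
    categories = [row[0] for row in conn.execute(
        "SELECT DISTINCT Category FROM demog ORDER BY Category")]
    # Step 2: one aggregated column per category; a missing row counts as 0.
    select_list = ["User_id"] + [
        "COALESCE(MAX(CASE WHEN Category = ? THEN IsMember END), 0) AS "
        + quote_ident(category)
        for category in categories
    ]
    sql = ("SELECT " + ", ".join(select_list)
           + " FROM demog GROUP BY User_id ORDER BY User_id")
    cur = conn.execute(sql, categories)
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


if __name__ == "__main__":
    header, rows = pivot_categories(make_demo_db())
    print(header)
    for row in rows:
        print(row)
```

Category values are passed as bound parameters and only used as quoted aliases, so a newly added category shows up as a new column without editing the query. The columns come out in alphabetical order here, not in the order shown in the question.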

SQL: Find rows that match closely but not exactly

I have a table inside a PostgreSQL database with columns c1,c2...cn. I want to run a query that compares each row against a tuple of values v1,v2...vn. The query should not return an exact match but should return a list of rows ordered in descending similarity to the value vector v.
Example:
The table contains sports records:
1,USA,basketball,1956
2,Sweden,basketball,1998
3,Sweden,skating,1998
4,Switzerland,golf,2001
Now when I run a query against this table with v=(Sweden,basketball,1998), I want to get all records that have a similarity with this vector, sorted by number of matching columns in descending order:
2,Sweden,basketball,1998 --> 3 columns match
3,Sweden,skating,1998 --> 2 columns match
1,USA,basketball,1956 --> 1 column matches
Row 4 is not returned because it does not match at all.
Edit: All columns are equally important. Although, when I really think of it... it would be a nice add-on if I could give each column a different weight factor as well.
Is there any possible SQL query that would return the rows in a reasonable amount of time, even when I run it against a million rows?
What would such a query look like?
SELECT * FROM countries
WHERE country = 'Sweden'
OR sport = 'basketball'
OR year = 1998
ORDER BY
cast(country = 'Sweden' AS integer) +
cast(sport = 'basketball' as integer) +
cast(year = 1998 as integer) DESC
It's not beautiful, but it works. You can cast the boolean expressions to integers and sum them.
You can easily change the weight by adding a multiplier:
cast(sport = 'basketball' as integer) * 5 +
This is how I would do it. The multiplication factors used in the CASE expressions handle the importance (weight) of each match, and they ensure that records matching on the columns with the highest weight come out on top even if the other columns don't match for those records.
/*
-- Initial Setup
-- drop table sport
create table sport (id int, Country varchar(20) , sport varchar(20) , yr int )
insert into sport values
(1,'USA','basketball','1956'),
(2,'Sweden','basketball','1998'),
(3,'Sweden','skating','1998'),
(4,'Switzerland','golf','2001')
select * from sport
*/
select * ,
CASE WHEN Country='Sweden' then 1 else 0 end * 100 +
CASE WHEN sport='basketball' then 1 else 0 end * 10 +
CASE WHEN yr=1998 then 1 else 0 end * 1 as Match
from sport
WHERE
country = 'Sweden'
OR sport = 'basketball'
OR yr = 1998
ORDER BY Match Desc
It might help if you wrote a stored procedure that calculates a "similarity metric" between two rows. Then your query could refer to the return value of that procedure directly rather than having umpteen conditions in the where-expression and the order-by-expression.
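Short of a stored procedure, the repetition can also be reduced by computing the score once in a derived table and then filtering and sorting on it. The sketch below assumes SQLite through Python's sqlite3 module rather than PostgreSQL; the table name sport_records, the function rank_by_similarity and the weights parameter are invented for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the sports records from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sport_records (id INTEGER, country TEXT, sport TEXT, yr INTEGER)"
    )
    conn.executemany("INSERT INTO sport_records VALUES (?, ?, ?, ?)", [
        (1, "USA", "basketball", 1956),
        (2, "Sweden", "basketball", 1998),
        (3, "Sweden", "skating", 1998),
        (4, "Switzerland", "golf", 2001),
    ])
    return conn


def rank_by_similarity(conn, country, sport, year, weights=(1, 1, 1)):
    """Return (id, score) pairs for rows matching at least one value, best first."""
    sql = """
        SELECT id, score
        FROM (
            SELECT id,
                   CASE WHEN country = :country THEN :w_country ELSE 0 END
                 + CASE WHEN sport = :sport THEN :w_sport ELSE 0 END
                 + CASE WHEN yr = :year THEN :w_year ELSE 0 END AS score
            FROM sport_records
        ) AS scored
        WHERE score > 0          -- drop rows that match nothing
        ORDER BY score DESC, id  -- id breaks ties deterministically
    """
    params = {
        "country": country, "sport": sport, "year": year,
        "w_country": weights[0], "w_sport": weights[1], "w_year": weights[2],
    }
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(rank_by_similarity(conn, "Sweden", "basketball", 1998))
    # [(2, 3), (3, 2), (1, 1)]
    print(rank_by_similarity(conn, "Sweden", "basketball", 1998, weights=(100, 10, 1)))
    # [(2, 111), (3, 101), (1, 10)]
```

Every row still has to be scored, so on a million rows this is a full scan. Whether that is fast enough depends on the data and hardware and is not measured here.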

SQL get values in multiple rows with same identifiers

For example, this is the result of
select *
from x
where id = 1
Result:
ID data
1 mouse
1 england
1 computer
Now, how do I search for a mouse, in the country England? I can't really check with this:
AND data = 'mouse'
AND data = 'england'
(Rather not use a query in a query if possible)
If I understand correctly, you have a "set-within-sets" query. You want to find the ids that have both attributes. I recommend using group by and having for this purpose:
select id
from x
where data in ('mouse', 'england')
group by id
having count(distinct data) = 2;
Use a subquery to constrain the country. For example, for this data
ID DATA
---------- --------
1 computer
1 england
1 mouse
2 austria
2 computer
2 mouse
3 mouse
3 mouse
you first filter for only the IDs that belong to England and then
constrain on the mouse.
SELECT *
FROM test
WHERE id IN
(SELECT id FROM test WHERE data = 'england'
)
AND data = 'mouse';
ID DATA
---------- --------
1 mouse
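On the sample data above, id 3 has two 'mouse' rows, so a plain COUNT(*) = 2 test would accept it even though it has no 'england' row. Counting distinct values avoids that. Here is a small sketch that compares the two, assuming SQLite through Python's sqlite3 module; ids_having_all and its distinct flag are hypothetical names used only for this illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the sample data from the subquery answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, data TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", [
        (1, "computer"), (1, "england"), (1, "mouse"),
        (2, "austria"), (2, "computer"), (2, "mouse"),
        (3, "mouse"), (3, "mouse"),
    ])
    return conn


def ids_having_all(conn, wanted, distinct=True):
    """Return the ids that have a row for every value in `wanted`."""
    wanted = sorted(set(wanted))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    # COUNT(DISTINCT data) ignores duplicate rows; COUNT(*) does not.
    counter = "COUNT(DISTINCT data)" if distinct else "COUNT(*)"
    sql = f"""
        SELECT id
        FROM test
        WHERE data IN ({placeholders})
        GROUP BY id
        HAVING {counter} = ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


if __name__ == "__main__":
    conn = make_demo_db()
    print(ids_having_all(conn, ["mouse", "england"]))                  # [1]
    print(ids_having_all(conn, ["mouse", "england"], distinct=False))  # [1, 3]
```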

Sybase SQL CASE with CAST

I have a Sybase table (which I can't alter) that I am trying to get into a specific table format. The table contains three columns, all of which are string values: an id (which is not unique), a "position", which is a number that represents a field name, and a field column that holds the value. The table looks like:
id position field
100 0 John
100 1 Jane
100 2 25
100 3 50
101 0 Dave
101 3 30
Position 0 means "SalesRep1", Position 1 means "SR1Commission", Position 2 means "SalesRep2", and Position 3 means "SR2Commission".
I am trying to get a table that looks like following, with the Commission columns being decimals instead of strings:
id SalesRep1 SR1Commission SalesRep2 SR2Commission
100 John 25 Jane 50
101 Dave 30 NULL NULL
I've gotten close using CASE, but I end up with only one value per row and not sure there's a way to do what I want. I also have problems with trying to get CAST included to change the commission values from strings to decimals. Here's what I have so far:
SELECT id,
CASE "position" WHEN '0' THEN field END AS SalesRep1,
CASE "position" WHEN '1' THEN field END AS SalesRep2,
CASE "position" WHEN '2' THEN field END AS SR1Commission,
CASE "position" WHEN '3' THEN field END AS SR2Commission
FROM v_custom_field WHERE id = ?
This gives me the following result when querying for id 100:
id SalesRep1 SR1Commission SalesRep2 SR2Commission
100 John NULL NULL NULL
100 NULL 25 NULL NULL
100 NULL NULL Jane NULL
100 NULL NULL NULL 50
This is close, but I want to 'collapse' the rows down into one row based on the id, as well as cast the commission values to numbers. I tried adding in a CAST(field AS DECIMAL), but I'm not sure if this is even the right direction to go. I was also looking into PIVOT, but Sybase doesn't seem to support that.
This is known as an entity-attribute-value table. They're a pain to work with because they're one step removed from being relational data, but they're very common for user-defined fields in applications.
If you can't use PIVOT, you'll need to do something like this:
SELECT DISTINCT s.id,
f0.field AS SalesRep1,
CAST(f1.field AS DECIMAL(20,5)) AS SR1Commission,
f2.field AS SalesRep2,
CAST(f3.field AS DECIMAL(20,5)) AS SR2Commission
FROM UnnamedSalesTable s
LEFT JOIN UnnamedSalesTable f0
ON f0.id = s.id AND f0.position = 0
LEFT JOIN UnnamedSalesTable f1
ON f1.id = s.id AND f1.position = 1
LEFT JOIN UnnamedSalesTable f2
ON f2.id = s.id AND f2.position = 2
LEFT JOIN UnnamedSalesTable f3
ON f3.id = s.id AND f3.position = 3
It's not very fast because it's a ton of self-joins followed by a DISTINCT, but it does work.
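An alternative that stays closer to the CASE attempt in the question is conditional aggregation: wrap each CASE in MAX() and GROUP BY id, which collapses the four partial rows into one. The sketch below assumes SQLite through Python's sqlite3 module rather than Sybase, so it casts to REAL where Sybase would use something like DECIMAL(20,5). The sample rows are an assumption too: they are arranged to follow the position mapping stated in the question text (0 = SalesRep1, 1 = SR1Commission, 2 = SalesRep2, 3 = SR2Commission), and pivot_custom_fields is a made-up name.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table; all three columns are strings, as in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE v_custom_field (id TEXT, "position" TEXT, field TEXT)')
    conn.executemany("INSERT INTO v_custom_field VALUES (?, ?, ?)", [
        # Assumed layout: 0 = SalesRep1, 1 = SR1Commission, 2 = SalesRep2, 3 = SR2Commission
        ("100", "0", "John"), ("100", "1", "25"),
        ("100", "2", "Jane"), ("100", "3", "50"),
        ("101", "0", "Dave"), ("101", "1", "30"),
    ])
    return conn


def pivot_custom_fields(conn):
    """Collapse the per-position rows into one row per id."""
    sql = """
        SELECT id,
               MAX(CASE WHEN "position" = '0' THEN field END) AS SalesRep1,
               CAST(MAX(CASE WHEN "position" = '1' THEN field END) AS REAL) AS SR1Commission,
               MAX(CASE WHEN "position" = '2' THEN field END) AS SalesRep2,
               CAST(MAX(CASE WHEN "position" = '3' THEN field END) AS REAL) AS SR2Commission
        FROM v_custom_field
        GROUP BY id
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in pivot_custom_fields(make_demo_db()):
        print(row)
    # ('100', 'John', 25.0, 'Jane', 50.0)
    # ('101', 'Dave', 30.0, None, None)
```

MAX() ignores NULLs, so each column picks up the single non-NULL value for that position. This reads the table once instead of joining it to itself four times.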