SQL Rows to Columns if column values are unknown - sql

I have a table that has demographic information about a set of users which looks like this:
User_id Category IsMember
1 College 1
1 Married 0
1 Employed 1
1 Has_Kids 1
2 College 0
2 Married 1
2 Employed 1
3 College 0
3 Employed 0
The result set I want is a table that looks like this:
User_Id|College|Married|Employed|Has_Kids
1 1 0 1 1
2 0 1 1 0
3 0 0 0 0
In other words, the table indicates the presence or absence of a category for each user. Sometimes the user will have a category where the value if false, sometimes the user will have no row for a category, in which case IsMember is assumed to be false.
Also, from time to time additional categories will be added to the data set, and I'm wondering if its possible to do this query without knowing up front all the possible category names, in other words, I won't be able to specify all the column names I want to count in the result. (Note only user 1 has category "has_kids" and user 3 is missing a row for category "married"
(using Postgres)
Thanks.

You can use jsonb funcions.
with titles as (
select jsonb_object_agg(Category, Category) as titles,
jsonb_object_agg(Category, -1) as defaults
from demog
),
the_rows as (
select null::bigint as id, titles as data
from titles
union
select User_id, defaults || jsonb_object_agg(Category, IsMember)
from demog, titles
group by User_id, defaults
)
select id, string_agg(value, '|' order by key)
from (
select id, key, value
from the_rows, jsonb_each_text(data)
) x
group by id
order by id nulls first
You can see a running example in http://rextester.com/QEGT70842
You can replace -1 with 0 for the default value and '|' with ',' for the separator.

You can install tablefunc module and use the crosstab function.
https://www.postgresql.org/docs/9.1/static/tablefunc.html

I found a Postgres function script called colpivot here which does the trick. Ran the script to create the function, then created the table in one statement:
select colpivot ('_pivoted', 'select * from user_categories', array['user_id'],
array ['category'], '#.is_member', null);

Related

Filtering a column based on having some value in one of the rows in SQL or Presto Athena

I am trying in Athena to output only users which have some specific value in them but not in all of the rows
Suppose I have the table below.
I want all users which have value '100' in at least one of their rows but also having in other rows value different than 100.
user | value
A | 1
B | 2
A | 100
D | 3
A | 4
C | 3
C | 5
D | 100
So in this example I would want to get only users A and D because only them having 100 and none 100.
I tried maybe grouping by user and creating an array of values per user and then checking if array contains 100 but I don't manage doing it presto.
Also I thought about converting rows to columns and then checking if one of columns equals 100.
Those solutions are too complex? Anybody knows how to implement them or anyone has a better simpler solution?
The users that have at least one value of 100 can be found with this SQL:
SELECT DISTINCT user
FROM some_table
WHERE value = 100
But I assume you are after all tuples of user and value where the user has at least one value of 100, this can be accomplished by using the query above in a slightly more complex query:
WITH matching_users AS (
SELECT DISTINCT user
FROM some_table
WHERE value = 100
)
SELECT user, value
FROM matching_users
LEFT JOIN some_table USING (user)
You can use sub query as below to achieve your required output=
SELECT * FROM your_table
WHERE User IN(
SELECT DISTINCT User
FROM your_table
WHERE Value = 100
)
If you just want the users, I would go for aggregation:
select user
from t
group by user
having sum(case when value = 100 then 1 else 0 end) > 0;
If 100 is the maximum possible value, this can be simplified to:
having max(value) = 100

Returning several values within a CASE expression in subquery and separate columns again in main query

My test table looks like this:
# select * from a;
source | target | id
--------+--------+----
1 | 2 | 1
2 | 3 | 2
3 | 0 | 3
My query is this one:
SELECT *
FROM (
SELECT
CASE
WHEN id<>1
THEN source
ELSE 0
END
AS source,
CASE
WHEN id<>1
THEN target
ELSE 0
END
AS target
FROM a
) x;
The query seems a bit odd because the CASE expression with the same criteria is repeated for every column. I would like to simplify this and tried the following, but it doesn't work as expected.
SELECT *
FROM (
SELECT
CASE
WHEN id<>1
THEN (source, target)
ELSE (0, 0)
END
AS r
FROM a
) x;
It yields one column with a row value, but I would rather get the two original columns. Separating them with a (r).* or similar doesn't work, because the "record type has not been registered".
I found several questions here with solutions regarding functions returning RECORD values, but none regarding this example with a sub-select.
Actually, there is a quite long list of columns, so repeating the same CASE expression many times makes the whole query quite unreadable.
Since the real problem - as opposed to this simplified case - consists of several CASE expressions and several column groups, a solution with a UNION won't help, because the number of UNIONs would be large and make it unreadable as well as several CASEs.
My actual question is: How can I get the original columns from the row value?
This answers the original question.
If I understood your needs, you want 0 and 0 for source and target when id = 1:
SELECT
0 AS source,
0 AS target
FROM tablename
WHERE id = 1
UNION ALL
SELECT
source,
target
FROM tablename
WHERE id <> 1
Revised answer: You can make your query work (fixing the record type has not been registered issue) by creating a TYPE:
CREATE TYPE stpair AS (source int, target int);
And cast the composite value column to that type:
SELECT id, (cv).source, (cv).target
FROM (
SELECT id, CASE
WHEN id <> 1 THEN (source, target)::stpair
ELSE (0, 0)::stpair
END AS cv
FROM t
) AS x
Having said that, it should be far more convenient to use arrays:
SELECT id, av[1] AS source, av[2] AS target
FROM (
SELECT id, CASE
WHEN id <> 1 THEN ARRAY[source, target]
ELSE ARRAY[0, 0]
END AS av
FROM t
) AS x
Demo on db<>fiddle
Will this work for you?
select source,target,id from a where id <>1 union all select 0 as source,0 as target,id from a where id=1 order by id
I have used union all to included cases where multiple records may have ID=1

How do I display grouped ID's as a list with their respective values?

How do I not get all the ID's grouped, but instead listed from first to last; with all their respective values in the columns next to them?
Instead of grouping it, it should show ID 1 and its value, ID 2 and its value. EVEN if the values for the ID is the same. I tried removing the GROUP_CONCAT, but then it's only showing one ID per customfield_value?
SELECT GROUP_CONCAT(virtuemart_product_id), customfield_value, COUNT(*) c
FROM jos_virtuemart_product_customfields
WHERE virtuemart_custom_id = 6
GROUP BY customfield_value
HAVING c > 1
It's working currently, but grouping the ID's and spacing them with a comma. Should just display as in a normal table/list format.
Currently it shows like this(as you can see, it's ALL the same ICOS number, but different ID's. I ONLY need to display the values WHERE the ICOS NUMBER is "duplicate"):
ID ICOS Count
1,2,3 775896 3
It should be displaying like this:
ID ICOS Count
1 775896 1
2 775896 1
3 775896 1
All rows where the customfield_value is not unique:
-- Assuming MySQL
SELECT virtuemart_product_id, customfield_value
, COUNT(*) c -- maybe not needed
FROM jos_virtuemart_product_customfields
WHERE virtuemart_custom_id = 6
AND customfield_value IN
( SELECT customfield_value
FROM jos_virtuemart_product_customfields
WHERE virtuemart_custom_id = 6
GROUP BY customfield_value
HAVING COUNT(*) > 1 -- more than one row exists
)
GROUP BY virtuemart_product_id, customfield_value -- maybe not needed
If the virtuemart_product_id is unique you don't need the outer count/group by as it will always be 1.

SQL using fallback column for match

Say I have a table in an sql database like
name age shoesize
---------------------
tom 20 NULL
dick NULL 4
harry 30 5
and I want an SQL statement that selects names that have age == X, or as a fallback, if no such names exist, use those with shoe size == Y. In other words, in this table, for X=20,Y=4 I should only get 'tom', while for X=25,Y=4 I should get only 'dick'. I can't do that with
SELECT name FROM table WHERE age = 20 OR shoe size = 4;
because that will select both tom and dick. I'm currently using
SELECT COALESCE ((SELECT name FROM tab WHERE age = 20),(SELECT name FROM tab WHERE shoesize = 4));
but is there a neater way? Also using coalesce like this doesn't allow me to get the whole row - i.e. I can't use SELECT * FROM tab, I can only select a single name.
You can use ORDER BY and FETCH FIRST 1 ROW ONLY or some similar clause:
SELECT name
FROM tab
ORDER BY (CASE WHEN age = X THEN 1
WHEN shoesize = Y THEN 2
ELSE 3
END)
FETCH FIRST 1 ROW ONLY;
Some databases spell FETCH FIRST 1 ROW ONLY like LIMIT or TOP or even something else.

List category/subcategory tree and display its sub-categories in the same row

I have a hierarchical table of Regions and sub-regions, and I need to list a tree of regions and sub-regions (which is easy), but also, I need a column that displays, for each region, all the ids of it's sub regions.
For example:
id name superiorId
-------------------------------
1 RJ NULL
2 Tijuca 1
3 Leblon 1
4 Gavea 2
5 Humaita 2
6 Barra 4
I need the result to be something like:
id name superiorId sub-regions
-----------------------------------------
1 RJ NULL 2,3,4,5,6
2 Tijuca 1 4,5,6
3 Leblon 1 null
4 Gavea 2 4
5 Humaita 2 null
6 Barra 4 null
I have done that by creating a function that retrieves a STUFF() of a region row,
but when I'm selecting all regions from a country, for example, the query becomes really, really slow, since I execute the function to get the region sons for each region.
Does anybody know how to get that in an optimized way?
The function that "retrieves all the ids as a row" is:
I meant that the function returns all the sub-region's ids as a string, separated by a comma.
The function is:
CREATE FUNCTION getSubRegions (#RegionId int)
RETURNS TABLE
AS
RETURN(
select stuff((SELECT CAST( wine_reg.wine_reg_id as varchar)+','
from (select wine_reg_id
, wine_reg_name
, wine_region_superior
from wine_region as t1
where wine_region_superior = #RegionId
or exists
( select *
from wine_region as t2
where wine_reg_id = t1.wine_region_superior
and (
wine_region_superior = #RegionId
)
) ) wine_reg
ORDER BY wine_reg.wine_reg_name ASC for XML path('')),1,0,'')as Sons)
GO
When we used to make these concatenated lists in the database we took a similar approach to what you are doing at first
then when we looked for speed
we made them into CLR functions
http://msdn.microsoft.com/en-US/library/a8s4s5dz(v=VS.90).aspx
and now our database is only responsible for storing and retrieving data
this sort of thing will be in our data layer in the application