SQL SELECT Percentage of Rows Based on Column Values

I am looking to return 15% of rows based on column values. For example, I have a citizenship column and a gender column, as well as name, email, etc. I want to return 15% of each combination of those columns. If citizenship = USA, I want 15% of the rows with USA and male, another 15% of the rows with USA and female, and another 15% of the rows with USA and unknown. The same goes for every other citizenship in my result (i.e. Chinese, Canadian, etc.).
I am able to get 15% of all rows, but not based on column values.
A very stripped-down version of the query looks like this:
SELECT TOP 15 PERCENT * FROM (
SELECT name
, email
, citizenship
, gender
FROM bio) a

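You want a stratified sample. You can do this using:
select top 15 percent b.*
from bio b
order by row_number() over (partition by citizenship, gender order by (select null)) * 1.0
         / count(*) over (partition by citizenship, gender);
Dividing each row's position by the size of its (citizenship, gender) group orders the rows by their relative position within the group, so the first 15 percent overall contains roughly 15 percent of every group. Ordering by the bare row number would instead take the same number of rows from each group, regardless of group size. For a random rather than arbitrary sample, order by newid() instead of (select null).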
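The same stratified-sampling idea can be tried outside SQL Server with this Python sketch, which uses an in-memory SQLite database as a stand-in. SQLite has no TOP ... PERCENT, so the sketch numbers rows randomly inside each (citizenship, gender) stratum and keeps the rows whose position falls within the first 15% of their stratum, rounding down. The sample rows and the helper names make_bio_db and stratified_sample are hypothetical, and window functions assume SQLite 3.25 or later.

```python
import sqlite3
from collections import Counter

def make_bio_db():
    # Hypothetical sample data: 100 USA/male, 20 USA/female, 40 Canada/unknown.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bio (name TEXT, email TEXT, citizenship TEXT, gender TEXT)")
    rows = []
    for citizenship, gender, n in [("USA", "male", 100), ("USA", "female", 20),
                                   ("Canada", "unknown", 40)]:
        for i in range(n):
            name = f"{citizenship}-{gender}-{i}"
            rows.append((name, name + "@example.com", citizenship, gender))
    conn.executemany("INSERT INTO bio VALUES (?, ?, ?, ?)", rows)
    return conn

def stratified_sample(conn, percent):
    # Shuffle rows inside each (citizenship, gender) stratum, then keep the
    # first `percent`% of every stratum. Integer math, so counts round down.
    sql = """
        SELECT name, email, citizenship, gender
        FROM (SELECT b.*,
                     ROW_NUMBER() OVER (PARTITION BY citizenship, gender
                                        ORDER BY RANDOM()) AS seqnum,
                     COUNT(*) OVER (PARTITION BY citizenship, gender) AS cnt
              FROM bio AS b)
        WHERE seqnum * 100 <= ? * cnt
    """
    return conn.execute(sql, (percent,)).fetchall()

sample = stratified_sample(make_bio_db(), 15)
counts = Counter((c, g) for _, _, c, g in sample)
print(counts)
```

With the hypothetical data above, each group contributes 15% of its own size: 15 USA/male rows, 3 USA/female rows and 6 Canada/unknown rows.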


Transpose in PostgreSQL

I am trying to design a database of customer details, where a customer can have up to two different phone numbers.
When I run a SELECT * query to retrieve the customers that match my criteria, I get this:
Name | Number
James | 12344532
James | 23232422
I would like it to display all customers with their two numbers this way:
Name | Number | Number
James | 12344532 | 23232422
John | 32443322 |
Jude | 12121212 | 23232422
I am using a PostgreSQL server with Azure Data Studio.
Please assist.
I tried using this query:
SELECT name.name,
min(details.number) AS number1,
max(details.number) AS number2
FROM name
JOIN details
ON name.id = details.id
GROUP BY name.name
I got this:
Name | Number | Number
James | 12344532 | 23232422
John | 32443322 | 32443322
Jude | 12121212 | 23232422
Customers with just one phone number get that number duplicated in both columns. How do I go about this?
I would aggregate the numbers into an array, then extract the array elements:
select n.name,
d.numbers[1] as number_1,
d.numbers[2] as number_2
from name n
join (
select id, array_agg(number) as numbers
from details
group by id
) d on d.id = n.id
order by name;
This is also easy to extend if you have more than two numbers.
Try using the following query:
SELECT
Name,
MIN(CASE WHEN rn = 1 THEN Number END) AS Number1,
MIN(CASE WHEN rn = 2 THEN Number END) AS Number2
FROM
(SELECT
n.name AS Name, d.number AS Number,
ROW_NUMBER() OVER (PARTITION BY n.name ORDER BY d.number) AS rn
FROM name n
JOIN details d ON d.id = n.id) t
GROUP BY Name
This query uses the ROW_NUMBER() function to assign a row number to each phone number of each customer. ROW_NUMBER() is ordered by the Number column, so the lowest number gets row number 1, the second lowest gets row number 2, and so on.
The outer query then groups the rows by customer name and uses MIN() with a CASE expression to pick the first and second number by row number.
The result has one row per customer, with one column for the customer's first phone number and another for the second; the second column is NULL when the customer has only one number.
Note: The query assumes that each customer's phone numbers are distinct. If the same number is stored twice for a customer, it appears in both columns; deduplicating the customer/number pairs before numbering them avoids this.
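The ROW_NUMBER() pivot also runs on SQLite 3.25+, which makes it easy to check in a small Python sketch. The name and details tables below mirror the question's schema, but their column types and contents are hypothetical; the sketch partitions by customer id rather than by name so that two customers who share a name would not be merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: `name` holds customers, `details` holds one row per phone number.
conn.executescript("""
    CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE details (id INTEGER, number TEXT);
    INSERT INTO name VALUES (1, 'James'), (2, 'John'), (3, 'Jude');
    INSERT INTO details VALUES (1, '12344532'), (1, '23232422'),
                               (2, '32443322'),
                               (3, '12121212'), (3, '23232422');
""")

# Number each customer's phones, then spread positions 1 and 2 into columns.
rows = conn.execute("""
    SELECT name,
           MIN(CASE WHEN rn = 1 THEN number END) AS number1,
           MIN(CASE WHEN rn = 2 THEN number END) AS number2
    FROM (SELECT n.id, n.name, d.number,
                 ROW_NUMBER() OVER (PARTITION BY n.id ORDER BY d.number) AS rn
          FROM name AS n
          JOIN details AS d ON d.id = n.id)
    GROUP BY id, name
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```

John's second number comes back as None (SQL NULL) instead of a repeated copy of his first number.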

SQLite query to get table based on values of another table

I am not sure what title correctly reflects my question, so I can only describe what I want.
There is a table with fields:
id, name, city
It contains these rows:
1 John London
2 Mary Paris
3 John Paris
4 Samy London
I want to get a result like this:
      | London | Paris
Total | 2 | 2
John | 1 | 1
Mary | 0 | 1
Samy | 1 | 0
So I need to take every distinct name and count its rows for each distinct value of another field (city).
I also want the total count for each city.
A simple way to do it is:
1) Get a list of unique names:
SELECT DISTINCT name FROM table
2) Get a list of unique cities:
SELECT DISTINCT city FROM table
3) Run a query for every name and city:
SELECT COUNT(city) FROM table WHERE name = some_name AND city = some_city
4) Get the total for each city:
SELECT COUNT(city) FROM table WHERE city = some_city
(I didn't test these queries, so there may be errors, but they show the idea.)
With 3 names and 2 cities, that is 3 * 2 = 6 queries to the database.
But for a table with 100 cities and 100 names, it is 100 * 100 = 10,000 queries,
which may take a long time.
Also, the names and cities change, so I can't write a query with predefined names or cities: instead of London and Paris, tomorrow there may be Moscow, Turin and Berlin. The same goes for the names.
How can I get such a table with one or two queries against the original table in SQLite?
(I am using SQLite on Android.)
You can get the per-name results with conditional aggregation. As for the total, SQLite unfortunately does not support the WITH ROLLUP clause, which would generate it automatically.
One workaround is union all and an additional column for ordering:
select name, london, paris
from (
select name, sum(city = 'London') london, sum(city = 'Paris') paris, 1 prio
from mytable
group by name
union all
select 'Total', sum(city = 'London'), sum(city = 'Paris'), 0
from mytable
) t
order by prio, name
Actually, the subquery might not be necessary, although the result then also includes the helper prio column:
select name, sum(city = 'London') london, sum(city = 'Paris') paris, 1 prio
from mytable
group by name
union all
select 'Total', sum(city = 'London'), sum(city = 'Paris'), 0
from mytable
order by prio, name
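Because the cities change from day to day, the column list can be generated at run time instead of hard-coding 'London' and 'Paris'. This Python sketch is one hedged way to do that on SQLite: it reads the distinct cities first, then builds one SUM(city = ?) column per city and binds the city values as parameters. The table name mytable follows the answer above; the helper city_pivot is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(1, "John", "London"), (2, "Mary", "Paris"),
                  (3, "John", "Paris"), (4, "Samy", "London")])

def city_pivot(conn):
    # Read the current cities at run time so no city is hard-coded in the SQL.
    cities = [r[0] for r in conn.execute("SELECT DISTINCT city FROM mytable ORDER BY city")]
    # One conditional-aggregation column per city; city values are bound as parameters.
    sums = ", ".join("SUM(city = ?)" for _ in cities)
    sql = f"""
        SELECT name, {sums}, 1 AS prio FROM mytable GROUP BY name
        UNION ALL
        SELECT 'Total', {sums}, 0 FROM mytable
        ORDER BY prio, name
    """
    # The placeholders appear twice (per-name part and Total part).
    rows = conn.execute(sql, cities * 2).fetchall()
    # Drop the helper prio column before returning.
    return cities, [row[:-1] for row in rows]

header, table = city_pivot(conn)
print(header)
for row in table:
    print(row)
```

This is still just two queries against the table, however many cities there are; the Total row sorts first because of its prio value of 0.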
@GMB gave me the idea of using GROUP BY; since I am doing this in SQLite on Android, my version looks like this:
SELECT name,
COUNT(CASE WHEN city = :london THEN 1 END) as countLondon,
COUNT(CASE WHEN city = :paris THEN 1 END) as countParis
FROM table2 GROUP BY name
where :london and :paris are bound parameters, and countLondon and countParis are fields of the response class.

How to compute percentage of total based on Counts of 1 field and filtered by another field

Data and Desired Result:
I have the above data and would like to compute percentages and show the corresponding record counts. CountKey is a concatenation of three fields, and I only want to count it when it is unique per LastName. I would then like the percentage of the total for each Status type per LastName. CountKeyTotal is the total count of unique CountKeys for a LastName (e.g. Smith); CountKey is the count of unique CountKeys per LastName and Status.
I am fairly new to SQL and have only been able to get totals as a whole (as an example, using the data provided: Smith 40 3 12 25%).
Any help would be appreciated.
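You can use GROUP BY together with a derived table (a subquery in the FROM clause) that computes the per-name totals:
select
a.name, a.status
, count(distinct a.countKey) as CountKey
, b.c2 as CountKeyTotal
, 100.0 * count(distinct a.countKey) / b.c2 as percentage
from my_table as a
inner join (select name, count(distinct countKey) as c2 from my_table
group by name) b on a.name = b.name
group by a.name, a.status, b.c2
count(distinct ...) counts each key only once, and multiplying by 100.0 before dividing avoids integer division truncating the percentage to 0.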
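The corrected grouping can be sanity-checked with this Python sketch on an in-memory SQLite database. The question's actual data was not included, so the rows below are hypothetical; the columns name, status and countKey follow the answer's query. Smith's duplicate key "A" is counted once, giving 3 of his 4 distinct keys under Open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the real columns and values from the question are not shown.
conn.execute("CREATE TABLE my_table (name TEXT, status TEXT, countKey TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    ("Smith", "Open", "A"), ("Smith", "Open", "A"),  # duplicate key, counted once
    ("Smith", "Open", "B"), ("Smith", "Open", "C"),
    ("Smith", "Closed", "D"),
    ("Jones", "Open", "E"), ("Jones", "Closed", "F"),
])

# Distinct keys per (name, status), divided by distinct keys per name.
rows = conn.execute("""
    SELECT a.name, a.status,
           COUNT(DISTINCT a.countKey) AS CountKey,
           b.c2 AS CountKeyTotal,
           100.0 * COUNT(DISTINCT a.countKey) / b.c2 AS percentage
    FROM my_table AS a
    JOIN (SELECT name, COUNT(DISTINCT countKey) AS c2
          FROM my_table GROUP BY name) AS b ON a.name = b.name
    GROUP BY a.name, a.status, b.c2
    ORDER BY a.name, a.status
""").fetchall()

for row in rows:
    print(row)
```

If the same key can appear under more than one status for a name, the per-status percentages can add up to more than 100.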

Fill Users table with data using percentages from another table

I have a table Users (it has millions of rows):
Id | Name | Country | Product
1 | John | Canada |
2 | Kate | Argentina |
3 | Mark | China |
4 | Max | Canada |
5 | Sam | Argentina |
6 | Stacy | China |
...
1000 | Ken | Canada |
I want to fill the Product column with A, B or C based on percentages.
I have another table called CountriesStats, like the following:
Id | Country | A | B | C
1 | Canada | 60 | 20 | 20
2 | Argentina | 35 | 45 | 20
3 | China | 40 | 10 | 50
This table holds the percentage of people with each product. For example in Canada 60% of people have product A, 20% have product B and 20% have product C.
I would like to fill the Product column of the Users table based on the percentages in the second table. For example, if there are 1 million users in Canada, I would like to set Product to A for 600,000 of them, to B for 200,000 and to C for 200,000.
Thanks for any help. I do not mind doing it in multiple steps; I just need hints on how to achieve this in SQL.
The logic behind this is not too difficult. Assign a sequential counter to each person within each country, then use that counter to pick the product. In your example, numbers up to 600,000 get 'A', 600,001 to 800,000 get 'B', and the rest get 'C'.
The following SQL accomplishes this:
with toupdate as (
select u.*,
row_number() over (partition by country order by newid()) as seqnum,
count(*) over (partition by country) as tot
from users u
)
update u
set product = (case when seqnum <= tot * A / 100 then 'A'
when seqnum <= tot * (A + B) / 100 then 'B'
else 'C'
end)
from toupdate u join
CountriesStats cs
on u.country = cs.country;
The WITH clause defines an updatable CTE that carries the sequence number and the total for each country on every row. Updating through a CTE is a nice feature of SQL Server, but it is not supported in all databases.
The FROM clause joins back to the CountriesStats table to get the percentages for each country, and the CASE expression applies the thresholds.
Note that the sequential number is assigned randomly, using newid(), so the products are spread randomly across the table.
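For databases without an updatable CTE, the same thresholds can be computed in a SELECT and written back by primary key. This Python sketch does that on SQLite with a few hypothetical users; RANDOM() stands in for newid(), and comparing seqnum * 100 with tot * A is equivalent to the integer division tot * A / 100 used above. It issues one UPDATE per row, which suits a small demo rather than millions of rows; SQLite 3.33+ also supports UPDATE ... FROM, which is not used here so that older versions still work.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Id INTEGER PRIMARY KEY, Name TEXT, Country TEXT, Product TEXT);
    CREATE TABLE CountriesStats (Id INTEGER PRIMARY KEY, Country TEXT,
                                 A INTEGER, B INTEGER, C INTEGER);
    INSERT INTO CountriesStats VALUES (1, 'Canada', 60, 20, 20),
                                      (2, 'Argentina', 35, 45, 20),
                                      (3, 'China', 40, 10, 50);
""")
# Hypothetical users: 10 in Canada, 20 in Argentina, 10 in China.
conn.executemany(
    "INSERT INTO Users (Name, Country) VALUES (?, ?)",
    [(f"{country}-{i}", country)
     for country, n in [("Canada", 10), ("Argentina", 20), ("China", 10)]
     for i in range(n)],
)

# Number users randomly within each country; position p gets 'A' while
# p * 100 <= tot * A, then 'B' up to tot * (A + B), otherwise 'C'.
assignments = conn.execute("""
    SELECT u.Id,
           CASE WHEN u.seqnum * 100 <= u.tot * cs.A THEN 'A'
                WHEN u.seqnum * 100 <= u.tot * (cs.A + cs.B) THEN 'B'
                ELSE 'C'
           END AS Product
    FROM (SELECT Id, Country,
                 ROW_NUMBER() OVER (PARTITION BY Country ORDER BY RANDOM()) AS seqnum,
                 COUNT(*) OVER (PARTITION BY Country) AS tot
          FROM Users) AS u
    JOIN CountriesStats AS cs ON cs.Country = u.Country
""").fetchall()

# Write the chosen products back by primary key.
conn.executemany("UPDATE Users SET Product = ? WHERE Id = ?",
                 [(product, user_id) for user_id, product in assignments])

counts = Counter(conn.execute("SELECT Country, Product FROM Users").fetchall())
print(sorted(counts.items()))
```

Canada's 10 users end up as 6 A, 2 B and 2 C, matching its 60/20/20 split; which users get which product changes from run to run.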

How to join different columns of same table?

Suppose I have one table with two columns, Country and City.
Country
USA
Canada
UK
City
NY
London
I want to merge the values of both columns into one list, and I expect output like this:
USA
Canada
UK
NY
London
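So what SQL query merges the values of different columns of the same table?
SELECT Country FROM MyTable
UNION
SELECT City FROM MyTable
should do it (MyTable stands for your table name). UNION removes duplicate values; use UNION ALL if you want to keep them.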
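A quick way to check the UNION approach is this Python sketch using an in-memory SQLite database. The table name places and the pairing of countries with cities are hypothetical; UK is given no city, which shows why a NULL City needs the IS NOT NULL filter to stay out of the merged list. UNION does not guarantee an order, so the result is sorted for display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pairing of the question's values; UK has no city, so City is NULL.
conn.execute("CREATE TABLE places (Country TEXT, City TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?)",
                 [("USA", "NY"), ("Canada", "London"), ("UK", None)])

# UNION stacks both columns into one and removes duplicate values.
merged = [r[0] for r in conn.execute("""
    SELECT Country FROM places
    UNION
    SELECT City FROM places WHERE City IS NOT NULL
""")]
print(sorted(merged))
```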
Responding to the comment "I am searching for a quicker way, because if I need to merge 10 columns I have to write 10 UNIONs! Is there any other way?":
You can use UNPIVOT, which means you just need to add the column names to a list. The only thing to watch is the data types: UNPIVOT requires all the listed columns to have the same type, which is why the example converts them to varchar(50). E.g.:
--CTE for example only
;WITH CTE_Locations as (
select Country = convert(varchar(50),'USA'), City = convert(varchar(50),'NY')
union select Country = 'Canada', City = 'Vancouver'
union select Country = 'UK', City = 'Manchester'
)
--Select a list of values from all columns
select distinct
Place
from
CTE_Locations l
unpivot (Place for PlaceType in ([Country],[City])) u
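SQLite, used here as a stand-in, has no UNPIVOT operator. One alternative when there are many columns is to generate the UNION text from a list of column names, so merging another column means adding a list entry rather than writing another SELECT by hand. The helper merge_columns and the places table are hypothetical; identifiers cannot be bound as query parameters, so the sketch quotes them itself.

```python
import sqlite3

def merge_columns(conn, table, columns):
    # Identifiers cannot be bound as parameters, so quote them (doubling any ").
    def quote(ident):
        return '"' + ident.replace('"', '""') + '"'
    # Build "SELECT c1 ... UNION SELECT c2 ..." from the list of column names.
    parts = [f"SELECT {quote(col)} AS place FROM {quote(table)} "
             f"WHERE {quote(col)} IS NOT NULL"
             for col in columns]
    sql = " UNION ".join(parts) + " ORDER BY place"
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (Country TEXT, City TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?)",
                 [("USA", "NY"), ("Canada", "London"), ("UK", None)])

print(merge_columns(conn, "places", ["Country", "City"]))
```

As with UNPIVOT, every merged column should hold compatible values; SQLite's dynamic typing will not reject mixed types, so that check is left to the caller.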