SQL Find the number of orders in each line count - sql

A couple of months ago, a well-known IT company asked me this SQL question in an interview, and I never figured it out.
An order can have multiple lines. For example, if a customer ordered cookies,
chocolates, and bread, this would count as 3 lines in one order. The question
is to find the number of orders in each line count. The output of this query
would be something like 100 orders had 1 line, 70 orders had 2 lines, 30 had 3
lines, and so on. The table has two columns: order_id and line_id.
Sample Data:
order_id line_id
1 cookies
1 chocolates
1 bread
2 cookies
2 bread
3 chocolates
3 cookies
4 milk
desired output:
orders line
1 1
2 2
1 3
Generally speaking, we have a very large data set, and the number of lines per order_id can range from 1 upward with no theoretical limit.
The desired output for the general case is:
orders line
100 1
70 2
30 3
etc..
How can I write a query to find the total number of orders for each line count (1, 2, 3, and so on)?
My thought is to first write a subquery that counts the line_id values per order_id,
and then select from that subquery alongside a list of values ranging from 1 to max(lines per order) as the second column.
Test Data:
create table data
(
order_id int,
line_id char(50)
);
insert into data
values(1, 'cookies'),
(1, 'chocolates'),
(1, 'bread'),
(2, 'bread'),
(2, 'cookies'),
(3, 'chocolates'),
(3, 'cookies'),
(4, 'milk');
Here order_id=1 has 3 lines,
order_id=2 has 2 lines,
order_id=3 has 2 lines, and
order_id=4 has 1 line.
This yields our solution:
orders line
1 1
2 2
1 3
This is because order_id = 2 and order_id = 3 both have 2 lines, which means 2 orders have line = 2.
So far, I have:
select lines,
sum(case when orders_per_line = '1' then 1 else 0 end),
sum(case when orders_per_line = '2' then 1 else 0 end),
sum(case when orders_per_line = '3' then 1 else 0 end)
from(
select lines, order_id, count(*) as orders_per_line from data
where lines in ('1', '2', '3')
group by order_id, lines
)
group by lines
My query is wrong: I only want 2 columns, and generating a sequence of numbers from 1 to max(lines per order) this way does not work either.
Any suggestions?
Thanks in advance!

Try this:
SELECT COUNT(*) AS Orders, Lines
FROM (
    SELECT order_id, COUNT(*) AS Lines
    FROM data
    GROUP BY order_id
) query
GROUP BY Lines
For example, look at this sqlfiddle
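To check the two-level grouping against the sample data from the question, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database you actually use, and the names line_count and per_order are my own aliases, not something from the question.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (order_id INTEGER, line_id TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        (1, "cookies"), (1, "chocolates"), (1, "bread"),
        (2, "bread"), (2, "cookies"),
        (3, "chocolates"), (3, "cookies"),
        (4, "milk"),
    ],
)

# Inner query: number of lines in each order.
# Outer query: number of orders that share each line count.
rows = conn.execute(
    """
    SELECT COUNT(*) AS orders, line_count
    FROM (
        SELECT order_id, COUNT(*) AS line_count
        FROM data
        GROUP BY order_id
    ) AS per_order
    GROUP BY line_count
    ORDER BY line_count
    """
).fetchall()

for orders, line_count in rows:
    print(orders, line_count)
```

No generated sequence from 1 to the maximum is needed: the outer GROUP BY only produces the line counts that actually occur in the data.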

Try this:
WITH a AS
(
    -- number of lines in each order
    SELECT COUNT(order_id) AS line
    FROM data
    GROUP BY order_id
)
-- number of orders for each line count
SELECT COUNT(*) AS orders, line
FROM a
GROUP BY line

Basically, you just count how many times each order_id is repeated:
SELECT order_id, COUNT(order_id) AS line_count FROM data GROUP BY order_id
Grouping that result by the count then gives the number of orders per line count.

Related

SELECT random 10% of rows for each category on SQL Server

There is a table of products sold.
row_id  customer     product         date_sold
1       customer_1   thingamajig     01.01.2023
2       customer_12  whosi-whatsi    03.01.2023
3       customer_1   watchamacallit  04.01.2023
4       customer_4   whosi-whatsi    06.01.2023
...     ...          ...             ...
There is always exactly one row per item sold.
Let's say customer_1 ordered 100 items total. customer_2 ordered 50 items total. customer_3 ordered 17 items total. How do you select random 10% of rows for each customer? The fraction of rows selected should be rounded up (for example 12 rows total results in 2 selected). That means every customer that bought at least one item should appear in the resulting table. In this case the resulting table for customer_1, customer_2 and customer_3 would have 10 + 5 + 2 = 17 rows.
My initial approach would be to create a temp table, calculate the desired row count for each customer, and then loop through the temp table and select rows for each customer. Then I would insert them into another table and select from that one:
drop table if exists #row_counts;
select
    customer,
    ceiling(convert(decimal(10, 2), count(product)) / 10) as row_count
into #row_counts
from products_sold
group by customer;
-- then use cursor to loop over #row_counts and insert into the final table
-- for randomness an 'order by newid()' will be used
But this just doesn't feel like the right solution...
You need to know the total row count per customer and the position of each row within that customer's randomly ordered rows.
Something like this can perhaps be of service.
EDITED because the earlier version was not randomized properly:
select *
from (
    select row_number() over (partition by customer order by newid()) as sortOrder
         , count(*) over (partition by customer) as cnt
         , *
    from products_sold
) p
-- Now, we want 10% of each customer's total count, rounded upwards
where sortOrder <= ceiling(cnt * 0.1)
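As a rough check of the row_number / count window approach, here is a sketch using Python's sqlite3 module as a stand-in for SQL Server. It assumes the bundled SQLite is 3.25 or newer (needed for window functions), uses RANDOM() in place of newid(), and replaces CEILING with the integer expression (cnt + 9) / 10. The generated item_N product names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products_sold ("
    "row_id INTEGER PRIMARY KEY, customer TEXT, product TEXT)"
)

# Hypothetical data matching the counts in the question: 100, 50 and 17 rows.
purchases = []
for customer, n in [("customer_1", 100), ("customer_2", 50), ("customer_3", 17)]:
    purchases += [(customer, f"item_{i}") for i in range(n)]
conn.executemany(
    "INSERT INTO products_sold (customer, product) VALUES (?, ?)", purchases
)

# Number the rows of each customer in random order, then keep the first
# ceil(cnt / 10) of them. (cnt + 9) / 10 is integer division in SQLite,
# which rounds cnt / 10 upwards for positive counts.
sample = conn.execute(
    """
    SELECT customer, row_id
    FROM (
        SELECT customer, row_id,
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY RANDOM()) AS sort_order,
               COUNT(*) OVER (PARTITION BY customer) AS cnt
        FROM products_sold
    ) AS p
    WHERE sort_order <= (cnt + 9) / 10
    """
).fetchall()

# Count how many rows were picked per customer.
picked = {}
for customer, _row_id in sample:
    picked[customer] = picked.get(customer, 0) + 1
print(picked)
```

Which rows are picked changes from run to run, but the number of rows per customer is fixed by the row counts, so no cursor or temp table is needed.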

how can I count some values for data in a table based on same key in another table in Bigquery?

I have one table like the one below. Each id is unique.
id      times_of_going_out
fef666  2
S335gg  1
9a2c50  1
and another table like this one ↓. In this second table the "id" is not unique, there are different "category_name" for a single id.
id      category_name          city
S335gg  Games & Game Supplies  tk
9a2c50  Telephone Companies    os
9a2c50  Recreation Centers     ky
fef666  Recreation Centers     ky
I want to find the difference between the destinations (category_name) of people who go out often (times_of_going_out > 5) and people who don't go out often (times_of_going_out <= 5).
** Both tables are a small sample of large tables.
 ・ Where do people who go out twice often go?
 ・ Where do people who go out 6 times often go?
Thank you
The expected result could be something like:
less than 5: top ten “category_name” for uid’s with "times_of_going_out" less than 5 times
more than 5: top ten “category_name” for uid’s with "times_of_going_out" more than 5 times
Steps:
1. Combine the data and aggregate the total times_of_going_out per category.
2. Create the categories you need: less than or equal to 5, and more than 5. If you don't want 5 included in the lower category, adjust the comparison.
3. Rank the rows within each category using dense_rank(), which assigns ranks based on the total times_of_going_out.
4. Filter so that only the top 10 ranks are kept for each category.
with main as (
    select
        category_name,
        sum(coalesce(times_of_going_out, 0)) as total_time_per_category
    from table1 as t1
    left join table2 as t2
        on t1.id = t2.id
    group by 1
),
category as (
    select
        *,
        -- strictly greater than 5, so the comparison matches the labels
        if(total_time_per_category > 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
    from main
),
ranking_ as (
    select
        *,
        case when is_more_than_5_times = 'more than 5' then
            dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
        else null
        end as rank_more_than_5,
        case when is_more_than_5_times = 'less than equal to 5' then
            dense_rank() over (partition by is_more_than_5_times order by total_time_per_category)
        else null
        end as rank_less_than_equal_5
    from category
)
select
    is_more_than_5_times,
    string_agg(category_name, ',') as list
from ranking_
where rank_less_than_equal_5 <= 10 or rank_more_than_5 <= 10
group by 1
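For anyone who wants to try these steps without a BigQuery project, here is a rough port to SQLite through Python's sqlite3 module. It is only a sketch under several assumptions: CASE stands in for BigQuery's if(), the bucket boundary is strictly greater than 5, both buckets are ranked in descending order (a simplification), and the id 'hypo01' with category 'Museums' is a made-up row added so that the "more than 5" bucket is not empty. SQLite 3.25 or newer is assumed for DENSE_RANK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id TEXT PRIMARY KEY, times_of_going_out INTEGER);
    CREATE TABLE table2 (id TEXT, category_name TEXT, city TEXT);
    INSERT INTO table1 VALUES
        ('fef666', 2), ('S335gg', 1), ('9a2c50', 1), ('hypo01', 6);
    INSERT INTO table2 VALUES
        ('S335gg', 'Games & Game Supplies', 'tk'),
        ('9a2c50', 'Telephone Companies', 'os'),
        ('9a2c50', 'Recreation Centers', 'ky'),
        ('fef666', 'Recreation Centers', 'ky'),
        ('hypo01', 'Museums', 'ky');
    """
)

# Step 1: total outings per category. Step 2: bucket each category.
# Step 3: rank inside each bucket. Step 4: keep the top 10 ranks.
top_categories = conn.execute(
    """
    WITH main AS (
        SELECT t2.category_name,
               SUM(COALESCE(t1.times_of_going_out, 0)) AS total
        FROM table1 AS t1
        LEFT JOIN table2 AS t2 ON t1.id = t2.id
        GROUP BY t2.category_name
    ),
    bucketed AS (
        SELECT category_name, total,
               CASE WHEN total > 5 THEN 'more than 5'
                    ELSE 'less than equal to 5' END AS bucket
        FROM main
    ),
    ranked AS (
        SELECT bucket, category_name, total,
               DENSE_RANK() OVER (PARTITION BY bucket ORDER BY total DESC) AS rnk
        FROM bucketed
    )
    SELECT bucket, category_name, total
    FROM ranked
    WHERE rnk <= 10
    ORDER BY bucket, category_name
    """
).fetchall()

for row in top_categories:
    print(row)
```

Returning one row per category instead of a string_agg list makes the result easier to inspect while experimenting; the aggregation into a list can be added back once the buckets look right.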

Using SQL to find the total number of customers with over X orders

I've been roasting my brain with my limited SQL knowledge while attempting to come up with a query to run a statistic on my orders database.
Table ORDERS is laid out like this:
CustomerID ProductID (etc)
1 10
1 10
1 11
2 10
4 9
Each purchase is recorded with the customer id and the product ID - there CAN be multiple records for the same customer, and even multiple records with the same customer and product.
I need to come up with a query that can return the number of customers who bought between X and Y distinct products. For example: 3 customers bought fewer than 5 different products, 10 bought 5-10 different products, and 1 bought more than 10 different products.
I'm pretty sure this has something to do with derived tables, but advanced SQL is a fairly new craft to me. Any help would be appreciated!
Try this:
SELECT T1.products_bought, COUNT(T2.cnt) AS total
FROM (
    -- one row per range: label, lower bound, upper bound
    SELECT '<5' AS products_bought, 0 AS a, 4 AS b
    UNION ALL
    SELECT '5-10', 5, 10
    UNION ALL
    SELECT '>10', 11, 999999
) T1
LEFT JOIN
(
    -- distinct products bought by each customer
    SELECT COUNT(DISTINCT ProductID) AS cnt
    FROM ORDERS
    GROUP BY CustomerID
) T2
ON T2.cnt BETWEEN T1.a AND T1.b
GROUP BY T1.products_bought, T1.a, T1.b
ORDER BY T1.a
Result:
products_bought total
<5 3
5-10 0
>10 0
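To reproduce that result from the five sample rows in the question, here is a small sketch using Python's sqlite3 module as a stand-in database. The range table is the same derived table of labels and bounds; the aliases ranges and per_customer are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (CustomerID INTEGER, ProductID INTEGER)")
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (4, 9)],
)

# Left join from the ranges so that empty ranges still show up with 0.
# COUNT(per_customer.cnt) ignores the NULLs produced for unmatched ranges.
buckets = conn.execute(
    """
    SELECT ranges.products_bought, COUNT(per_customer.cnt) AS total
    FROM (
        SELECT '<5' AS products_bought, 0 AS a, 4 AS b
        UNION ALL
        SELECT '5-10', 5, 10
        UNION ALL
        SELECT '>10', 11, 999999
    ) AS ranges
    LEFT JOIN (
        SELECT COUNT(DISTINCT ProductID) AS cnt
        FROM ORDERS
        GROUP BY CustomerID
    ) AS per_customer
        ON per_customer.cnt BETWEEN ranges.a AND ranges.b
    GROUP BY ranges.products_bought, ranges.a, ranges.b
    ORDER BY ranges.a
    """
).fetchall()

for label, total in buckets:
    print(label, total)
```

Customer 1 bought 2 distinct products and customers 2 and 4 bought 1 each, so all three land in the first range.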

mysql SELECT COUNT(*) ... GROUP BY ... not returning rows where the count is zero

SELECT student_id, section, count( * ) as total
FROM raw_data r
WHERE response = 1
GROUP BY student_id, section
There are 4 sections on the test, each with a different number of questions. I want to know, for each student, and each section, how many questions they answered correctly (response=1).
However, with this query, if a student gets no questions right in a given section, that row will be completely missing from my result set. How can I make sure that for every student, 4 rows are ALWAYS returned, even if the "total" for a row is 0?
Here's what my result set looks like:
student_id section total
1 DAP--29 3
1 MEA--16 2
1 NNR--13 1 --> missing the 4th section for student #1
2 DAP--29 1
2 MEA--16 4
2 NNR--13 2 --> missing the 4th section for student #2
3 DAP--29 2
3 MEA--16 3
3 NNR--13 3 --> missing the 4th section for student #3
4 DAP--29 5
4 DAP--30 1
4 MEA--16 1
4 NNR--13 2 --> here, all 4 sections show up because student 4 got at least one question right in each section
Thanks for any insight!
UPDATE: I tried
SELECT student_id, section, if(count( * ) is null, 0, count( * )) as total
and that didn't change the results at all. Other ideas?
UPDATE 2: I got it working thanks to the response below:
SELECT student_id, section, SUM(CASE WHEN response = '1' THEN 1 ELSE 0 END ) AS total
FROM raw_data r
GROUP BY student_id, section
SELECT student_id, section, sum(case when response=1 then 1 else 0 end) as total
FROM raw_data r GROUP BY student_id, section
Note that there's no WHERE condition.
-- works when response is stored as 0 or 1
SELECT r.student_id,
       r.section,
       sum( r.response ) as total
FROM raw_data r
GROUP BY r.student_id, r.section
If you have a separate table with student information, you can select students from that table and left join it to the raw_data table:
SELECT si.student_name, si.student_id, rd.section, COUNT(rd.student_id) AS total
FROM student_info AS si LEFT JOIN raw_data AS rd ON rd.student_id = si.student_id
GROUP BY si.student_name, si.student_id, rd.section
This way, it first selects all students, then executes the count command.
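To see why dropping the WHERE clause matters, here is a small sketch with Python's sqlite3 module as a stand-in for MySQL. The five rows are made up for the demo. One caveat I am fairly sure of: conditional aggregation only returns a section for a student who has at least one row in it, right or wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_data (student_id INTEGER, section TEXT, response INTEGER)"
)
# Made-up answers: student 1 gets DAP--30 wrong, student 2 gets DAP--29 wrong.
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [
        (1, "DAP--29", 1), (1, "DAP--29", 1), (1, "DAP--30", 0),
        (2, "DAP--29", 0), (2, "DAP--30", 1),
    ],
)

# Filtering first removes the wrong answers, so all-wrong groups vanish.
filtered = conn.execute(
    """
    SELECT student_id, section, COUNT(*) AS total
    FROM raw_data
    WHERE response = 1
    GROUP BY student_id, section
    ORDER BY student_id, section
    """
).fetchall()

# Counting conditionally keeps every group and reports 0 where needed.
conditional = conn.execute(
    """
    SELECT student_id, section,
           SUM(CASE WHEN response = 1 THEN 1 ELSE 0 END) AS total
    FROM raw_data
    GROUP BY student_id, section
    ORDER BY student_id, section
    """
).fetchall()

print(filtered)
print(conditional)
```

The WHERE clause runs before GROUP BY, which is why the first query cannot produce a zero: the group no longer exists by the time the count is taken.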

Using Alias with MySql

I want to add an amount to the rows returned from a select. I've been trying things along the lines of:
select *,
3 as amount
from products
where etc....
...and it works. However, I want to do the same thing for lots of rows in one go along the lines of:
select *,
3 as amount,
2 as amount,
4 as amount
from products
where id in ('1','2','3')
However, this keeps adding amount columns instead of changing the value in each returned row.
The amount is really an amount the user wants; it could be 1, 99, 4, 2, or any number. I wanted to get a table with results like:
products amount
---------------------------
... 1
... 99
... 4
... 2
I just wanted all the amounts in one column; that's why I was using select ? as amount, select ? as amount, but it just doesn't seem to work that way :-)
SELECT id, ELT(id, 3, 2, 4) AS amount
FROM products
WHERE id IN ('1', '2', '3')
Try with:
SELECT *, 3 AS amt1, 2 AS amt2, 4 AS amt3 FROM products WHERE id IN ('1','2','3')
Give each alias a unique name. For example, amount1, amount2, etc.
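EDIT> If you'd like the sum of those values, add them with +, for example SELECT 3 + 2 + 4 AS total FROM ... (SUM() takes a single expression and aggregates it across rows, so SUM(amount1, amount2, ...) is not valid).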
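If the goal is one amount column with a different value per row, a CASE expression keyed on the id is a portable alternative to MySQL's ELT(). Here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The products table layout, the product names, and the integer ids are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical products; only ids 1, 2 and 3 are requested below.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (3, "gizmo"), (4, "doohickey")],
)

# One amount column whose value depends on the row's id.
rows = conn.execute(
    """
    SELECT id, name,
           CASE id WHEN 1 THEN 3
                   WHEN 2 THEN 2
                   WHEN 3 THEN 4
           END AS amount
    FROM products
    WHERE id IN (1, 2, 3)
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Unlike ELT(), which picks a value by position, CASE matches on the id itself, so it also works when the ids are not the consecutive numbers 1, 2, 3.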