SQL query - Beginner - sql

I'm new to SQL. I need SQL query to achieve the mentioned output.
I have asked a similar query but that doesn't describe my problem well. Here is my detailed requirement.
I have a table with data as below
Table: boxes
+------------+----------+
| box_id | Status |
+------------+----------+
| 1 | created |
| 2 | created |
| 3 | opened |
| 4 | opened |
| 5 | closed |
| 6 | closed |
| 7 | closed |
| 8 | wrapped |
+------------+----------+
There is also a status named destroyed, but no box currently has that status.
I need an output like this
+--------------+-------+
| Status | Count |
+--------------+-------+
| created | 2 |
| opened | 2 |
| destroyed | 0 |
| other_status | 4 | # this includes status (closed and wrapped)
| total | 8 |
+--------------+-------+
How can this be achieved in SQL? Thanks in advance.

If you are using MSSQL or MySQL 8.0, you can use CTEs as below to achieve your required output:
WITH CTE AS
(
SELECT 'created' Status UNION ALL
SELECT 'opened' UNION ALL
SELECT 'destroyed' UNION ALL
SELECT 'other_status'
)
,CTE2 AS
(
SELECT
CASE
WHEN Status IN ('created','opened','destroyed') THEN Status
ELSE 'other_status'
END Status,
SUM(1) Cnt
FROM your_table
GROUP BY
CASE
WHEN Status IN ('created','opened','destroyed') THEN Status
ELSE 'other_status'
END
)
SELECT CTE.Status, COALESCE(CTE2.Cnt, 0) Cnt
FROM CTE LEFT JOIN CTE2 ON CTE.Status = CTE2.Status
UNION ALL
SELECT 'total' Status, SUM(CTE2.Cnt) FROM CTE2

You can try the following code. Note that a status with no rows, such as destroyed, will not appear in this result.
select status, count(box_id)
from boxes
where status in ('created','opened', 'destroyed')
group by status
UNION ALL
select 'other_status' status, count(box_id)
from boxes
where status not in ('created','opened', 'destroyed')
UNION ALL
select 'total' status, count(box_id)
from boxes;
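To try the CTE idea without a server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The CTE names (wanted, bucketed, counted), the pos ordering column, and the use of SQLite are my own assumptions, not part of the question. It lists every required status up front and LEFT JOINs the real counts onto that list, so destroyed shows up with 0.

```python
import sqlite3

# Sample rows copied from the question: (box_id, status).
BOXES = [(1, "created"), (2, "created"), (3, "opened"), (4, "opened"),
         (5, "closed"), (6, "closed"), (7, "closed"), (8, "wrapped")]

QUERY = """
WITH wanted(status, pos) AS (
    -- Every status that must appear in the output, even with no rows.
    VALUES ('created', 1), ('opened', 2), ('destroyed', 3), ('other_status', 4)
),
bucketed AS (
    -- Map each box to one of the wanted buckets.
    SELECT CASE WHEN status IN ('created', 'opened', 'destroyed')
                THEN status ELSE 'other_status' END AS bucket
    FROM boxes
),
counted AS (
    SELECT bucket, COUNT(*) AS cnt FROM bucketed GROUP BY bucket
)
SELECT status, cnt
FROM (
    SELECT w.status AS status, COALESCE(c.cnt, 0) AS cnt, w.pos AS pos
    FROM wanted AS w
    LEFT JOIN counted AS c ON c.bucket = w.status
    UNION ALL
    SELECT 'total', COUNT(*), 5 FROM boxes
) AS report
ORDER BY pos
"""


def status_counts(rows):
    """Return (status, count) pairs, including zero-count statuses and a total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE boxes (box_id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO boxes VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for status, cnt in status_counts(BOXES):
        print(status, cnt)
```

The explicit pos column is only there to keep the rows in the order shown in the question; without an ORDER BY the order of a UNION ALL result is not guaranteed.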

Related

Tableau/SQL Calculated Field With Grouping

I have a table with the following structure
id, event_name, event_date
| 1 | a | 1.1.2020 |
| 2 | b | 3.2.2020 |
| 3 | b | 3.2.2020 |
| 3 | b | 5.2.2020|
| 1 | b | 31.12.2019 |
| 2 | a | 5.1.2020 |
My goal is to group by id and then check whether the date of event 'a' comes before the date of event 'b'. If so, I'd like to output 'ok', and 'error' otherwise.
In this example this would result in
id, check
| 1 | error|
| 2 | ok |
| 3 | ok |
Would it be possible to perform the task with a calculated field in Tableau? SQL would also be OK!
Try this:
Select id, case when diff<0 then 'ok'
else 'error' end as status from
(
Select id,
max(case when event_name ='a' then event_date end) -
max(case when event_name='b' then event_date end)
As diff
From mydata group by id) t
order by id
You can use this query with UNION clause:
select id, 'Error' "check" from mydata md where event_name='a' and id in
(select id from mydata where id=md.id and md.event_name<>event_name
and md.event_date > event_date)
union
select id, 'Ok' "check" from mydata md where event_name='a' and id in
(select id from mydata where id=md.id and md.event_name<>event_name
and md.event_date < event_date);
Output should be :
| ID | 'ERROR' |
|----|---------|
| 1 | Error |
| 2 | Ok |
ID=3 doesn't appear because all of its events are 'b'.
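If you want all three ids in the result, a conditional aggregate per id is one option. Below is a minimal sketch run through SQLite from Python. It rests on a few assumptions of mine: the dates are stored as ISO strings (YYYY-MM-DD) so that they compare correctly as text, the earliest 'a' is compared with the earliest 'b', and an id missing either event counts as 'ok', which matches the expected output for id 3.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings (assumption).
EVENTS = [
    (1, "a", "2020-01-01"),
    (2, "b", "2020-02-03"),
    (3, "b", "2020-02-03"),
    (3, "b", "2020-02-05"),
    (1, "b", "2019-12-31"),
    (2, "a", "2020-01-05"),
]

QUERY = """
SELECT id,
       CASE
           -- 'error' only when the first 'a' is later than the first 'b'.
           -- If either side is missing the comparison is NULL, so ELSE applies.
           WHEN MIN(CASE WHEN event_name = 'a' THEN event_date END)
              > MIN(CASE WHEN event_name = 'b' THEN event_date END)
           THEN 'error'
           ELSE 'ok'
       END AS status
FROM mydata
GROUP BY id
ORDER BY id
"""


def check_event_order(rows):
    """Return (id, 'ok' | 'error') for each id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (id INTEGER, event_name TEXT, event_date TEXT)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(check_event_order(EVENTS))
```

With a real DATE column the same CASE expression should carry over, but how dates are compared and subtracted varies by database, so check that for your engine.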

How do I select rows with maximum value?

Given this table I want to retrieve for each different url the row with the maximum count. For this table the output should be: 'dell.html' 3, 'lenovo.html' 4, 'toshiba.html' 5
+----------------+-------+
| url | count |
+----------------+-------+
| 'dell.html' | 1 |
| 'dell.html' | 2 |
| 'dell.html' | 3 |
| 'lenovo.html' | 1 |
| 'lenovo.html' | 2 |
| 'lenovo.html' | 3 |
| 'lenovo.html' | 4 |
| 'toshiba.html' | 1 |
| 'toshiba.html' | 2 |
| 'toshiba.html' | 3 |
| 'toshiba.html' | 4 |
| 'toshiba.html' | 5 |
+----------------+-------+
What SQL query do I need to write to do this?
Try to use this query:
select url, max(count) as count
from table_name
group by url;
Use an aggregate function:
select max(count) ,url from table_name group by url
From your comments it seems you need a correlated subquery:
select t1.* from table_name t1
where t1.count = (select max(count) from table_name t2 where t2.url=t1.url
)
If your SQLite version supports row_number(), you can write the query like below:
select * from
(
select *,row_number() over(partition by url order by count desc) rn
from table_name
) a where a.rn=1
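For comparison, here is a small sketch that runs the GROUP BY version and the correlated-subquery version side by side on the sample data, using SQLite from Python. The column is quoted as "count" and the url values are stored without the surrounding quotes; both are choices I made for the sketch. The two queries differ once the table has more columns: GROUP BY only returns the grouped and aggregated columns, while the correlated subquery returns whole rows.

```python
import sqlite3

# Sample data from the question: counts 1..3, 1..4 and 1..5 per url.
ROWS = (
    [("dell.html", n) for n in range(1, 4)]
    + [("lenovo.html", n) for n in range(1, 5)]
    + [("toshiba.html", n) for n in range(1, 6)]
)

GROUPED = """
SELECT url, MAX("count") AS "count"
FROM table_name
GROUP BY url
ORDER BY url
"""

CORRELATED = """
SELECT t1.url, t1."count"
FROM table_name AS t1
WHERE t1."count" = (SELECT MAX(t2."count")
                    FROM table_name AS t2
                    WHERE t2.url = t1.url)
ORDER BY t1.url
"""


def max_per_url(rows):
    """Return the results of both queries as a (grouped, correlated) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE table_name (url TEXT, "count" INTEGER)')
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    grouped = conn.execute(GROUPED).fetchall()
    correlated = conn.execute(CORRELATED).fetchall()
    conn.close()
    return grouped, correlated


if __name__ == "__main__":
    grouped, correlated = max_per_url(ROWS)
    print(grouped)
    print(correlated)
```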

SQL - How to find which page is the first for users?

I have a table like this:
+----------+-------------------------------------+----------------------------------+
| user_id | time | url |
+----------+-------------------------------------+----------------------------------+
| 1 | 02.04.2017 8:56 | www.landingpage.com/ |
| 1 | 02.04.2017 8:57 | www.landingpage.com/about-us |
| 1 | 02.04.2017 8:58 | www.landingpage.com/faq |
| 2 | 02.04.2017 6:34 | www.landingpage.com/about-us |
| 2 | 02.04.2017 6:35 | www.landingpage.com/how-to-order |
| 3 | 03.04.2017 9:11 | www.landingpage.com/ |
| 3 | 03.04.2017 9:12 | www.landingpage.com/contact |
| 3 | 03.04.2017 9:13 | www.landingpage.com/about-us |
| 3 | 03.04.2017 9:14 | www.landingpage.com/our-legacy |
| 3 | 03.04.2017 9:15 | www.landingpage.com/ |
+----------+-------------------------------------+----------------------------------+
I want to figure out which page is the first for most users (the first page a user sees when they come to the site) and count the number of times it is viewed as the first page.
Is there a way to write a query to do this? I guess I need to use
MIN(time)
in conjunction with grouping but I don't know how.
So regarding the sample I provided it should be like:
url url_count
---------------------------------------------------
www.landingpage.com/ 2
www.landingpage.com/about-us 1
Thanks!
You're correct, you'll need to use the min() aggregate function within a subselect.
select
my_table.url
from
my_table
where
my_table.time = (
select
min(t.time)
from
my_table t
where
t.user_id = my_table.user_id
)
Replace my_table with whatever your table is actually named.
To include how many pages the user has seen, you'll need something like this:
select
my_table.url
, (
select
count(t.url)
from
my_table t
where
t.user_id = my_table.user_id
) as url_count
from
my_table
where
my_table.time = (
select
min(t.time)
from
my_table t
where
t.user_id = my_table.user_id
)
SELECT *
FROM my_table
WHERE time IN
(
SELECT min(time)
FROM my_table
GROUP BY user_id
);
You can query as below:
Select top (1) with ties *
from yourtable
order by row_number() over(partition by user_id order by [time])
You can use outer query to get the same as below:
Select * from (
Select *, RowN = row_number() over(partition by user_id order by [time]) from yourtable) a
Where a.RowN = 1
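To get the url / url_count shape asked for in the question, the first row per user still has to be grouped by url. A minimal sketch follows, run through SQLite from Python. The table name visits and the ISO timestamp strings are assumptions on my part; the original format (02.04.2017 8:56) does not sort correctly as text, so it would need to be a real datetime column or be converted first.

```python
import sqlite3

# Sample rows from the question, timestamps rewritten as ISO strings (assumption).
VISITS = [
    (1, "2017-04-02 08:56", "www.landingpage.com/"),
    (1, "2017-04-02 08:57", "www.landingpage.com/about-us"),
    (1, "2017-04-02 08:58", "www.landingpage.com/faq"),
    (2, "2017-04-02 06:34", "www.landingpage.com/about-us"),
    (2, "2017-04-02 06:35", "www.landingpage.com/how-to-order"),
    (3, "2017-04-03 09:11", "www.landingpage.com/"),
    (3, "2017-04-03 09:12", "www.landingpage.com/contact"),
    (3, "2017-04-03 09:13", "www.landingpage.com/about-us"),
    (3, "2017-04-03 09:14", "www.landingpage.com/our-legacy"),
    (3, "2017-04-03 09:15", "www.landingpage.com/"),
]

QUERY = """
SELECT v.url, COUNT(*) AS url_count
FROM visits AS v
JOIN (
    -- Earliest visit time for each user.
    SELECT user_id, MIN("time") AS first_time
    FROM visits
    GROUP BY user_id
) AS f
  ON f.user_id = v.user_id AND f.first_time = v."time"
GROUP BY v.url
ORDER BY url_count DESC, v.url
"""


def first_page_counts(rows):
    """Return (url, number of users whose first visit was that url)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE visits (user_id INTEGER, "time" TEXT, url TEXT)')
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for url, url_count in first_page_counts(VISITS):
        print(url, url_count)
```

If a user can have two rows with the same earliest time, both rows are counted; add a tie-breaker if that matters for your data.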

SQL : Getting duplicate rows along with other variables

I am working on Teradata SQL. I would like to get the duplicate fields with their count and other variables as well. I can only find ways to get the count, but not exactly the variables as well.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1
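The join-back approach can be checked on the sample data with a small sketch, here using SQLite from Python rather than Teradata (an assumption made so the example is runnable anywhere). The min_count parameter is hypothetical: 2 keeps only ids that occur more than once, and 1 returns every row with its count, as in the expected output table.

```python
import sqlite3

# Sample rows from the question: (id, name, date).
USERS = [
    (1, "abc", "21.03.2015"),
    (1, "def", "22.04.2015"),
    (2, "ajk", "22.03.2015"),
    (3, "ghi", "23.03.2015"),
    (3, "ghi", "23.03.2015"),
]

QUERY = """
SELECT t.id, t.name, d.dup_count
FROM users AS t
JOIN (
    -- How many rows share each id.
    SELECT id, COUNT(*) AS dup_count
    FROM users
    GROUP BY id
    HAVING COUNT(*) >= ?
) AS d ON d.id = t.id
ORDER BY t.id, t.name
"""


def rows_with_counts(rows, min_count=2):
    """Return (id, name, rows sharing that id) for ids seen at least min_count times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, date TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (min_count,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(rows_with_counts(USERS))
    print(rows_with_counts(USERS, min_count=1))
```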

SELECT only latest record of an ID from given rows

I have this table shown below...How do I select only the latest data of the id based on changeno?
+----+--------------+------------+--------+
| id | data | changeno | |
+----+--------------+------------+--------+
| 1 | Yes | 1 | |
| 2 | Yes | 2 | |
| 2 | Maybe | 3 | |
| 3 | Yes | 4 | |
| 3 | Yes | 5 | |
| 3 | No | 6 | |
| 4 | No | 7 | |
| 5 | Maybe | 8 | |
| 5 | Yes | 9 | |
+----+--------------+------------+--------+
I would want this result...
+----+--------------+------------+--------+
| id | data | changeno | |
+----+--------------+------------+--------+
| 1 | Yes | 1 | |
| 2 | Maybe | 3 | |
| 3 | No | 6 | |
| 4 | No | 7 | |
| 5 | Yes | 9 | |
+----+--------------+------------+--------+
I currently have this SQL statement...
SELECT id, data, MAX(changeno) as changeno FROM Table1 GROUP BY id;
and clearly it doesn't return what I want. This should return an error because of the aggregate function. If I add fields to the GROUP BY clause it works, but it doesn't return what I want. This SQL statement is the closest I could come up with. I'd appreciate it if anybody could help me with this. Thank you in advance :)
This is typically referred to as the "greatest-n-per-group" problem. One way to solve this in SQL Server 2005 and higher is to use a CTE with a calculated ROW_NUMBER() based on the grouping of the id column, and sorting those by largest changeno first:
;WITH cte AS
(
SELECT id, data, changeno,
rn = ROW_NUMBER() OVER (PARTITION BY id ORDER BY changeno DESC)
FROM dbo.Table1
)
SELECT id, data, changeno
FROM cte
WHERE rn = 1
ORDER BY id;
You want to use row_number() for this:
select id, data, changeno
from (SELECT t.*,
row_number() over (partition by id order by changeno desc) as seqnum
FROM Table1 t
) t
where seqnum = 1;
Not a well-formed or performance-optimized query, but for small tasks it works fine, provided changeno values are unique across ids (as they are in the sample data).
SELECT * FROM TEST
WHERE changeno IN (SELECT MAX(changeno)
FROM TEST
GROUP BY id)
For other alternatives:
DECLARE @Table1 TABLE
(
id INT, data VARCHAR(5), changeno INT
);
INSERT INTO @Table1
SELECT 1,'Yes',1
UNION ALL
SELECT 2,'Yes',2
UNION ALL
SELECT 2,'Maybe',3
UNION ALL
SELECT 3,'Yes',4
UNION ALL
SELECT 3,'Yes',5
UNION ALL
SELECT 3,'No',6
UNION ALL
SELECT 4,'No',7
UNION ALL
SELECT 5,'Maybe',8
UNION ALL
SELECT 5,'Yes',9;
SELECT Y.id, Y.data, Y.changeno
FROM @Table1 Y
INNER JOIN (
SELECT id, changeno = MAX(changeno)
FROM @Table1
GROUP BY id
) X ON X.id = Y.id
WHERE X.changeno = Y.changeno
ORDER BY Y.id
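If window functions are not available, a correlated subquery is another way to express the same greatest-n-per-group idea. Here is a minimal sketch against the sample data, run through SQLite from Python (my choice for a self-contained example; the answers above target SQL Server). Because the subquery is tied to the outer row's id, it does not depend on changeno being unique across ids.

```python
import sqlite3

# Sample rows from the question: (id, data, changeno).
TABLE1 = [
    (1, "Yes", 1), (2, "Yes", 2), (2, "Maybe", 3),
    (3, "Yes", 4), (3, "Yes", 5), (3, "No", 6),
    (4, "No", 7), (5, "Maybe", 8), (5, "Yes", 9),
]

QUERY = """
SELECT t.id, t.data, t.changeno
FROM Table1 AS t
WHERE t.changeno = (
    -- Highest changeno among rows with the same id as the outer row.
    SELECT MAX(m.changeno)
    FROM Table1 AS m
    WHERE m.id = t.id
)
ORDER BY t.id
"""


def latest_per_id(rows):
    """Return the row with the highest changeno for each id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER, data TEXT, changeno INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_id(TABLE1):
        print(row)
```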