SQL query to split and keep only the top N values

I have the following table data:
| name | items      |
---------------------
| Bob  | 1, 2, 3    |
| Rick | 5, 3, 8, 4 |
| Bill | 2, 4       |
I need to create a table in which the items column is split into one row per item, keeping at most N items per name. For example, for N = 3 the table should look like this:
| name | item |
---------------
| Bob  | 1    |
| Bob  | 2    |
| Bob  | 3    |
| Rick | 5    |
| Rick | 3    |
| Rick | 8    |
| Bill | 2    |
| Bill | 4    |

I have the following query that splits items correctly, but doesn't account for the maximum number N. What should I modify in the query (standard SQL, BigQuery) to account for N?
WITH data_split AS (
SELECT name, SPLIT(items,',') AS item
FROM (
SELECT name, items
-- A lot of additional logic here
FROM data
)
)
SELECT name, item
FROM data_split
CROSS JOIN UNNEST(data_split.item) AS item

You can try a more portable, near-standard approach that works in practically every DBMS:
WITH
-- your input ...
indata(id, nam, items) AS ( -- need a sorting column "id" to keep the sort order
  SELECT 1, 'Bob' , '1,2,3'             -- blanks after the commas would
  UNION ALL SELECT 2, 'Rick', '5,3,8,4' -- end up in the split values
  UNION ALL SELECT 3, 'Bill', '2,4'
)
-- real query starts here, replace the comma below with "WITH" ...
,
-- one row per position to extract: exactly N = 3 integers
i(i) AS (
  SELECT 1 -- need to add FROM DUAL in Oracle, for example ...
  UNION ALL SELECT 2
  UNION ALL SELECT 3
)
SELECT
  id
, nam
, SPLIT_PART(items, ',', i) AS item -- SPLIT_PART as in Vertica or PostgreSQL; the name varies in other DBMSs
FROM indata CROSS JOIN i
WHERE SPLIT_PART(items, ',', i) <> '' -- drop positions past the last value
ORDER BY 1, 3
;
-- out id | nam | item
-- out ----+------+------
-- out 1 | Bob | 1
-- out 1 | Bob | 2
-- out 1 | Bob | 3
-- out 2 | Rick | 3
-- out 2 | Rick | 5
-- out 2 | Rick | 8
-- out 3 | Bill | 2
-- out 3 | Bill | 4
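To see why the cross join with a small integer table caps the result at N values per row, here is a plain-Python sketch that mirrors the query step by step. It is an illustration, not the answer's code: `split_part` is a hypothetical re-implementation of SPLIT_PART's behavior (1-based field index, empty string past the last field), and `top_n_rows` is a made-up name.

```python
def split_part(s: str, delim: str, i: int) -> str:
    """Hypothetical stand-in for SPLIT_PART: the i-th field (1-based), or '' past the end."""
    parts = s.split(delim)
    return parts[i - 1] if 1 <= i <= len(parts) else ""


def top_n_rows(indata, n):
    """Mirror of: FROM indata CROSS JOIN i WHERE SPLIT_PART(items, ',', i) <> ''."""
    out = []
    for id_, nam, items in indata:            # FROM indata
        for i in range(1, n + 1):             # CROSS JOIN i(i), i = 1..N
            item = split_part(items, ",", i)  # SPLIT_PART(items, ',', i)
            if item != "":                    # positions past the last value are dropped
                out.append((id_, nam, item))
    out.sort(key=lambda row: (row[0], row[2]))  # ORDER BY 1, 3
    return out


indata = [(1, "Bob", "1,2,3"), (2, "Rick", "5,3,8,4"), (3, "Bill", "2,4")]
for row in top_n_rows(indata, 3):
    print(row)
```

Rick's fourth value is never produced because the integer table stops at 3, and Bill's missing third value is filtered out by the `<> ''` check.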

Consider the approach below (BigQuery):
select name, trim(item) item
from your_table, unnest(split(items)) item with offset
where offset < 3
If applied to the sample data in your question, it returns the same rows as the expected table in the question: at most three items per name.
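If your database has neither SPLIT nor SPLIT_PART (SQLite, for instance), a recursive CTE can peel off one value per step and stop after N steps. This is a swapped-in technique, not either answer's method: a minimal sketch assuming SQLite 3.8.3 or later through Python's sqlite3 module, where the table name `data` and the helper `top_n_items` are hypothetical.

```python
import sqlite3

SPLIT_TOP_N = """
WITH RECURSIVE split(rid, name, pos, item, rest) AS (
    -- anchor: one row per input row, with a trailing comma appended
    SELECT rowid, name, 0, NULL, items || ',' FROM data
    UNION ALL
    -- step: peel off the next comma-separated value, trimmed of blanks
    SELECT rid, name, pos + 1,
           TRIM(SUBSTR(rest, 1, INSTR(rest, ',') - 1)),
           SUBSTR(rest, INSTR(rest, ',') + 1)
    FROM split
    WHERE rest <> '' AND pos < :n   -- stop after N values
)
SELECT name, item FROM split
WHERE pos > 0                       -- skip the anchor rows
ORDER BY rid, pos                   -- input row order, then position
"""


def top_n_items(rows, n):
    """Split each (name, items) row and keep at most n items per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (name TEXT, items TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    result = conn.execute(SPLIT_TOP_N, {"n": n}).fetchall()
    conn.close()
    return result


rows = [("Bob", "1, 2, 3"), ("Rick", "5, 3, 8, 4"), ("Bill", "2, 4")]
print(top_n_items(rows, 3))
```

Because it tracks the position explicitly, this version keeps the values in their original order and trims the blanks after the commas.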


Remove duplicate rows in table and update duplicated rows id with minimum ID in another table

I have a User table that stores unique users. Unfortunately, something went wrong in my code, and the same user kept being inserted into the table whenever the user updated their profile.
For instance, the user John Doe was created four times, with IDs 1, 2, 3 and 4. In table Order, the same person placed multiple orders; UID refers to the user in table User, and both UID 1 and UID 4 refer to John Doe.
I am trying to:
1. Remove the duplicates in User.
2. Keep the minimum UID in Order when duplicate User IDs are used there. For example, both UID 1 and 4 are used in Order, so I want to update 4 to 1.
Table User:
+----+----------+--------+
| ID | Name     | Gender |
+----+----------+--------+
| 1  | John Doe | M      |
| 2  | John Doe | M      |
| 3  | John Doe | M      |
| 4  | John Doe | M      |
+----+----------+--------+
Table Order:
+----+-----+--------------+-------+
| ID | UID | BILLING_ADDR | STATE |
+----+-----+--------------+-------+
| 1  | 1   | XXX          | XX    |
| 2  | 4   | XXX          | XX    |
| 3  | 4   | XXX          | XX    |
| 4  | 4   | XXX          | XX    |
+----+-----+--------------+-------+
To find the duplicates I have:
SELECT Name, count(1)
FROM User
GROUP BY Name
HAVING count(1) > 1
ORDER BY count(1) DESC;
To remove duplicates I have:
DELETE FROM `User`
WHERE ID NOT IN (
  SELECT * FROM (
    SELECT MIN(ID) FROM `User`
    GROUP BY Name
  ) AS keep_ids -- MySQL requires an alias on every derived table
);
Expected tables User and Order:
+----+----------+--------+
| ID | Name     | Gender |
+----+----------+--------+
| 1  | John Doe | M      |
+----+----------+--------+
+----+-----+--------------+-------+
| ID | UID | BILLING_ADDR | STATE |
+----+-----+--------------+-------+
| 1  | 1   | XXX          | XX    |
| 2  | 1   | XXX          | XX    |
| 3  | 1   | XXX          | XX    |
| 4  | 1   | XXX          | XX    |
+----+-----+--------------+-------+
Unfortunately, I don't know how to update the UID in table Order. Can someone help? Thanks!
Here is the whole scenario as I would do it (I use temporary tables so as not to mess up my database, but it also works with permanent tables):
-- initial scenario, the two tables:
CREATE LOCAL TEMPORARY TABLE
usr(id,"name",gender)
ON COMMIT PRESERVE ROWS AS (
SELECT 1,'John Doe','M'
UNION ALL SELECT 2,'John Doe','M'
UNION ALL SELECT 3,'John Doe','M'
UNION ALL SELECT 4,'John Doe','M'
)
;
CREATE LOCAL TEMPORARY TABLE
ord(id,uid,billing_addr,state)
ON COMMIT PRESERVE ROWS AS (
SELECT 1, 1,'XXX','XX'
UNION ALL SELECT 2, 4,'XXX','XX'
UNION ALL SELECT 3, 4,'XXX','XX'
UNION ALL SELECT 4, 4,'XXX','XX'
)
;
-- de-dupe the usr table with a CREATE TABLE ... AS SELECT into yet another table:
CREATE LOCAL TEMPORARY TABLE
usr_dedup
ON COMMIT PRESERVE ROWS AS
WITH
w_rowcount AS (
SELECT
id
, "name"
, gender
, ROW_NUMBER() OVER(PARTITION BY "name" ORDER BY id) AS rn
FROM usr
)
SELECT
id
, "name"
, gender
FROM w_rowcount
WHERE rn=1;
-- use the duplicated and the de-duplicated usr tables to update the ord table:
UPDATE ord SET uid = (
SELECT
d.id
FROM (
usr u
JOIN usr_dedup d USING("name")
)
WHERE u.id=ord.uid
);
-- "ord" now contains:
SELECT * FROM ord;
id | uid | billing_addr | state
----+-----+--------------+-------
1 | 1 | XXX | XX
2 | 1 | XXX | XX
3 | 1 | XXX | XX
4 | 1 | XXX | XX
-- now rename the two usr tables so that usr holds the de-duplicated records:
ALTER TABLE usr RENAME TO usr_dup;
-- out ALTER TABLE
ALTER TABLE usr_dedup RENAME TO usr;
-- out ALTER TABLE
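The order of operations is what matters here: repoint the orders first, while the duplicate users still exist, and delete the duplicates afterwards. A minimal sketch of that sequence in SQLite through Python's sqlite3 module, as an alternative to the temporary-table route above; table and column names follow the answer (usr, ord), and `dedupe_users` is a hypothetical helper.

```python
import sqlite3


def dedupe_users():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE usr (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
        CREATE TABLE ord (id INTEGER PRIMARY KEY, uid INTEGER,
                          billing_addr TEXT, state TEXT);
        INSERT INTO usr VALUES (1, 'John Doe', 'M'), (2, 'John Doe', 'M'),
                               (3, 'John Doe', 'M'), (4, 'John Doe', 'M');
        INSERT INTO ord VALUES (1, 1, 'XXX', 'XX'), (2, 4, 'XXX', 'XX'),
                               (3, 4, 'XXX', 'XX'), (4, 4, 'XXX', 'XX');

        -- 1) point every order at the lowest id that shares its user's name
        UPDATE ord SET uid = (
            SELECT MIN(u2.id)
            FROM usr AS u1
            JOIN usr AS u2 ON u2.name = u1.name
            WHERE u1.id = ord.uid
        );

        -- 2) only then delete every user that is not the lowest id per name
        DELETE FROM usr
        WHERE id NOT IN (SELECT MIN(id) FROM usr GROUP BY name);
    """)
    users = conn.execute("SELECT id, name, gender FROM usr ORDER BY id").fetchall()
    orders = conn.execute("SELECT id, uid FROM ord ORDER BY id").fetchall()
    conn.close()
    return users, orders


users, orders = dedupe_users()
print(users)
print(orders)
```

If an order's uid matches no user at all, the correlated subquery yields NULL and that order's uid becomes NULL, so it is worth checking for orphaned orders before running the update.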

Query to return non identical rows in two tables with same schema

I have been searching for a while on a particular problem, but I can't find this exact question.
I have a rather unusual task to achieve in SQL:
I have two tables, say A and B, which have exactly the same column names, of the following form:
id | column_1 | ... | column_n
Both tables have the same number of rows, with the same ids, but for a given id there is a chance that the rows from tables A and B differ in one or more of the other columns.
I want a query which returns all rows from table A for which the corresponding row in table B is not identical.
For example:
Table A
|ID |C_ID | FName | Phone | Email |Title |
|:---|-----:|------:|------:|------:|-----------------|
|28 | abc | xyz |50925 |19080 |software engineer|
|29 | def | mno |50926 |19081 |software engineer|
|30 | def | pqr |50927 |19082 |software engineer|
Table B
|ID |C_ID | FName | Phone | Email |Title |
|:---|-----:|------:|------:|------:|-----------------|
|28 | abc | xyz |50925 |19080 |software engineer|
|29 | def | mno |50926 |19081 |Data Analyst |
|30 | def | pqr |6000 |19082 |software engineer|
The result should be:
|ID |C_ID | FName | Phone | Email |Title |
|:---|-----:|------:|------:|------:|-----------------|
|29 | def | mno |50926 |19081 |Data Analyst |
|30 | def | pqr |6000 |19082 |software engineer|
The values in Phone and Title do not match the previous values. I need a query that returns the records that were updated.
Use MINUS (Oracle syntax; standard SQL calls it EXCEPT):
SELECT * FROM table_a
MINUS
SELECT * FROM table_b
Which, for the sample data:
CREATE TABLE table_a (id, column1, column2) AS
SELECT 1, 1, 1 FROM DUAL UNION ALL
SELECT 2, 2, 2 FROM DUAL UNION ALL
SELECT 3, 3, 3 FROM DUAL UNION ALL
SELECT 4, 4, 4 FROM DUAL;
CREATE TABLE table_b (id, column1, column2) AS
SELECT 1, 1, 1 FROM DUAL UNION ALL
SELECT 2, 2, 1 FROM DUAL UNION ALL
SELECT 3, 3, 3 FROM DUAL UNION ALL
SELECT 4, 4, 5 FROM DUAL;
Outputs:
ID | COLUMN1 | COLUMN2
---|---------|--------
2  | 2       | 2
4  | 4       | 4
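Outside Oracle the same idea is spelled EXCEPT. A minimal sketch in SQLite via Python's sqlite3 module, using the answer's sample tables; `changed_rows` is a hypothetical helper. Swapping the operands returns the updated values as they appear in table_b, which is what the expected result in the question actually shows.

```python
import sqlite3


def changed_rows():
    """Return (rows of table_a with no identical row in table_b, the matching rows of table_b)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (id INTEGER, column1 INTEGER, column2 INTEGER);
        CREATE TABLE table_b (id INTEGER, column1 INTEGER, column2 INTEGER);
        INSERT INTO table_a VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4);
        INSERT INTO table_b VALUES (1, 1, 1), (2, 2, 1), (3, 3, 3), (4, 4, 5);
    """)
    # Oracle: SELECT * FROM table_a MINUS SELECT * FROM table_b
    old = conn.execute(
        "SELECT * FROM table_a EXCEPT SELECT * FROM table_b ORDER BY 1"
    ).fetchall()
    # Operands swapped: the updated versions of the same rows
    new = conn.execute(
        "SELECT * FROM table_b EXCEPT SELECT * FROM table_a ORDER BY 1"
    ).fetchall()
    conn.close()
    return old, new


old, new = changed_rows()
print(old)
print(new)
```

Set operators treat two NULLs as identical, so unchanged NULL columns do not show up as differences, which is easy to get wrong with column-by-column `<>` comparisons.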

How to perform recursive SQL where statement?

Update
Thank you to @forpas and @trincot for sharing their solutions and ideas below. I got it working with the following code:
with recursive cte_comments as (
select
*
from
comments
where parent_comment_id = 1
union all
select
this_execution.*
from
cte_comments prev_execution
inner join comments this_execution
on this_execution.parent_comment_id = prev_execution.comment_id
)
select * from cte_comments
Original post
I have the following comments table and data in a SQLite database:
Table structure
-----------------------------------------
| Column | Type |
+++++++++++++++++++++++++++++++++++++++++
| comment_id | integer |
+---------------------------------------+
| parent_comment_id | integer |
+---------------------------------------+
| comment_text | text |
-----------------------------------------
Table data
--------------------------------------------------------------------
| comment_id | parent_comment_id | comment_text |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| 1 | | First comment, level 1 |
--------------------------------------------------------------------
| 2 | 1 | First comment, level 2 |
--------------------------------------------------------------------
| 3 | 2 | First comment, level 3 |
--------------------------------------------------------------------
| 4 | 2 | First comment, level 3 |
--------------------------------------------------------------------
| 5 | | Second comment, level 1 |
--------------------------------------------------------------------
| 6 | 5 | Second comment, level 2 |
--------------------------------------------------------------------
| 7 | 6 | Second comment, level 3 |
--------------------------------------------------------------------
The data is for a nested comment section on a website, where comment_id is unique and parent_comment_id can be NULL. Two or more comments can share the same parent_comment_id. The comment_text column contains arbitrary strings.
Question
How do I write a SQL query that returns all descendants (children, grandchildren, and so on) of a parent comment? For example, when I search for all comments under comment 1, I want comments 2, 3 and 4 (all comments that start with "First comment") to be returned. And when I search for all comments under comment 5, I want comments 6 and 7 (all comments that start with "Second comment") to be returned.
Do I need to have an intermediary/join table? Do I need to alter my table structure? Or, do I need to use another database engine to make it happen?
With a recursive CTE:
with recursive cte as (
select * from comments
where parent_comment_id = 1
union all
select t.*
from cte c inner join comments t
on t.parent_comment_id = c.comment_id
)
select * from cte
Results:
| comment_id | parent_comment_id | comment_text |
| ---------- | ----------------- | ---------------------- |
| 2 | 1 | First comment, level 2 |
| 3 | 2 | First comment, level 3 |
| 4 | 2 | First comment, level 3 |
If your version of SQLite is 3.8.3 or later, you can use the recursive WITH clause:
with recursive cte (id, name, parent_id) as (
select comment_id,
comment_text,
parent_comment_id
from comments
where parent_comment_id = 1
union all
select c.comment_id,
c.comment_text,
c.parent_comment_id
from comments c
inner join cte
on c.parent_comment_id = cte.id -- the CTE exposes its columns as (id, name, parent_id)
)
select * from cte;
In the condition parent_comment_id = 1, put the id of the comment whose descendants you want to retrieve.
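A minimal runnable check of the recursive CTE against the question's data, using SQLite through Python's sqlite3 module; the root id is passed as a named parameter instead of the hard-coded 1, and `descendants` is a hypothetical helper name.

```python
import sqlite3

COMMENTS = [
    (1, None, "First comment, level 1"),
    (2, 1, "First comment, level 2"),
    (3, 2, "First comment, level 3"),
    (4, 2, "First comment, level 3"),
    (5, None, "Second comment, level 1"),
    (6, 5, "Second comment, level 2"),
    (7, 6, "Second comment, level 3"),
]

DESCENDANTS_SQL = """
WITH RECURSIVE cte AS (
    -- anchor: direct children of the requested comment
    SELECT * FROM comments WHERE parent_comment_id = :root
    UNION ALL
    -- step: children of the rows found so far
    SELECT t.* FROM cte AS c
    JOIN comments AS t ON t.parent_comment_id = c.comment_id
)
SELECT comment_id FROM cte ORDER BY comment_id
"""


def descendants(root_id):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, "
        "parent_comment_id INTEGER, comment_text TEXT)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", COMMENTS)
    ids = [row[0] for row in conn.execute(DESCENDANTS_SQL, {"root": root_id})]
    conn.close()
    return ids


print(descendants(1))
print(descendants(5))
```

No extra join table or schema change is needed: the adjacency list (parent_comment_id) is enough for the recursion.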

SQL: How to select distinct on some columns

I have a table looking something like this:
+---+------------+----------+
|ID | SomeNumber | SomeText |
+---+------------+----------+
|1 | 100 | 'hey' |
|2 | 100 | 'yo' |
|3 | 100 | 'yo' | <- Second occurrence
|4 | 200 | 'ey' |
|5 | 200 | 'hello' |
|6 | 200 | 'hello' | <- Second occurrence
|7 | 300 | 'hey' | <- Single
+---+------------+----------+
I would like to extract the rows where SomeNumber appears more than once, keeping only one row per distinct combination of SomeNumber and SomeText. That means I would like the following:
+---+------------+----------+
|ID | SomeNumber | SomeText |
+---+------------+----------+
|1 | 100 | 'hey' |
|2 | 100 | 'yo' |
|4 | 200 | 'ey' |
|5 | 200 | 'hello' |
+---+------------+----------+
I don't know what to do here.
I need something along the lines of:
SELECT t.ID, DISTINCT(t.SomeNumber, t.SomeText) --this is not possible
FROM (
SELECT mt.ID, mt.SomeNumber, mt.SomeText
FROM MyTable mt
GROUP BY mt.SomeNumber, mt.SomeText --can't without mt.ID
HAVING COUNT(*) > 1
)
Any suggestions?
A CTE with a row number and a row count might get you what you need:
Create and populate the sample table (please save us this step in your future questions):
CREATE TABLE MyTable(id int, somenumber int, sometext varchar(10));
INSERT INTO MyTable VALUES
(1,100,'hey'),
(2,100,'yo'),
(3,100,'yo'),
(4,200,'ey'),
(5,200,'hello'),
(6,200,'hello'),
(7,300,'hey');
The query:
;WITH cte as
(
SELECT id,
someNumber,
someText,
ROW_NUMBER() OVER (PARTITION BY someNumber, someText ORDER BY ID) rn,
COUNT(id) OVER (PARTITION BY someNumber) rc
FROM MyTable
)
SELECT id, someNumber, someText
FROM cte
WHERE rn = 1
AND rc > 1
Results:
id someNumber someText
1 100 hey
2 100 yo
4 200 ey
5 200 hello
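If window functions are not available (for example, SQLite before 3.25), the same result can be sketched with plain GROUP BY: keep the lowest id per (SomeNumber, SomeText) pair, restricted to SomeNumber values that occur more than once. This is a swapped-in technique, run here in SQLite via Python's sqlite3 module; `first_per_pair` is a hypothetical helper.

```python
import sqlite3

ROWS = [(1, 100, "hey"), (2, 100, "yo"), (3, 100, "yo"), (4, 200, "ey"),
        (5, 200, "hello"), (6, 200, "hello"), (7, 300, "hey")]


def first_per_pair():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER, somenumber INTEGER, sometext TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", ROWS)
    result = conn.execute("""
        SELECT MIN(id) AS id, somenumber, sometext  -- lowest id per pair, like rn = 1
        FROM MyTable
        WHERE somenumber IN (                       -- like rc > 1
            SELECT somenumber FROM MyTable
            GROUP BY somenumber
            HAVING COUNT(*) > 1
        )
        GROUP BY somenumber, sometext
        ORDER BY 1
    """).fetchall()
    conn.close()
    return result


print(first_per_pair())
```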

How can I count how often each distinct value occurs in each column independently?

Given a table mytable with two columns, letter and num:
letter|num
------+------
a |1
a |1
b |1
b |2
I tried doing
SELECT letter, count(letter), num, count(num) from mytable group BY letter, num;
but it returns
letter|count|num |count
------+-----+------+-----
b | 1 | 1 | 1
a | 2 | 1 | 2
b | 1 | 2 | 1
whereas I wanted
letter|count|num |count
------+-----+------+-----
a | 2 | 1 | 3
b | 2 | 2 | 1
Is this possible to do, and can I do it in one query?
You could split it into two separate aggregates combined with UNION ALL, like this:
SELECT 'letter' as type, letter AS item, count(letter)
from mytable group BY letter
UNION ALL --CAST to be same type as letter
SELECT 'num', CAST(num AS varchar(100)), count(num)
from mytable group BY num;
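A runnable sketch of the UNION ALL answer in SQLite via Python's sqlite3 module. CAST(num AS TEXT) stands in for varchar(100), and an ORDER BY is added because a UNION ALL has no guaranteed row order; `count_per_column` is a hypothetical helper.

```python
import sqlite3


def count_per_column():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (letter TEXT, num INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                     [("a", 1), ("a", 1), ("b", 1), ("b", 2)])
    result = conn.execute("""
        SELECT 'letter' AS type, letter AS item, COUNT(letter) AS cnt
        FROM mytable GROUP BY letter
        UNION ALL
        SELECT 'num', CAST(num AS TEXT), COUNT(num)  -- same type as letter
        FROM mytable GROUP BY num
        ORDER BY 1, 2
    """).fetchall()
    conn.close()
    return result


for row in count_per_column():
    print(row)
```

Each column's counts come back as separate rows tagged by the type column rather than side by side, which sidesteps the problem that the two columns have different numbers of distinct values.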