postgres extract data from column and update another column - sql

The table has 217K records.
I have a table with a bands column. It contains sample data like 'GSM1800, GSM 900,LTE 100,LTE 200,WCDMA 100,CA 100, CA 200'. My objective is to show GSM, LTE, WCDMA and CA in separate columns when selecting (basically using a temporary separate column for each band type).
Initially I was doing a select using the query below:
select tac,
bands,
string_agg( gsm, ', ' ORDER BY gsm) gsm,
string_agg( lte, ', ' ORDER BY lte) lte,
string_agg( ca, ', ' ORDER BY ca) ca,
string_agg( wcdma, ', ' ORDER BY wcdma) wcdma
from (
select
tac,
bands,
array_to_string(regexp_matches(bands,'(GSM[\s+\w+]+)','g'), '') as gsm,
array_to_string(regexp_matches(bands,'(LTE[\s+\w+]+)','g'), '') as lte,
array_to_string(regexp_matches(bands,'(CA[\s+\w+\-]+)','g'), '') as ca,
array_to_string(regexp_matches(bands,'(WCDMA[\s+\w+]+)','g'), '') as wcdma
from
(select * from table_lookup )a )b
group by tac,bands
The above query was very slow.
Later my lead suggested that we create the separate columns and update the table.
I tested this with 9 rows, trying to update the GSM column with the query below:
UPDATE wireless_nta.tac_lookup
SET gsm = (select string_agg( gsm, ', ' ORDER BY gsm) gsm from
(select array_to_string(regexp_matches(bands,'(GSM[\s+\w+]+)','g'), '') as gsm from table_lookup )a)
FROM table_lookup AS m
inner JOIN table_lookup AS g on m.id = g.id
WHERE table_lookup.id = m.id and table_lookup.id <10;
This causes a weird update.
For example, for id = 1 the bands column contains GSM 1900,GSM850 (GSM800),WCDMA100,LTE200
but GSM is recorded as
GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800
instead of GSM 1900,GSM850 (GSM800).

I think your INNER JOIN clause is causing you problems. The FROM clause in your UPDATE query will already join the table once:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_item list, and each output row of the join represents an update operation for the target table.
https://www.postgresql.org/docs/current/sql-update.html
Additionally, since you select the table again in the subquery, the FROM clause in the UPDATE query can be omitted completely.
Starting from this schema as a minimal example:
CREATE TABLE items (
"id" INTEGER,
"categories_original" TEXT
);
INSERT INTO items
("id", "categories_original")
VALUES
('1', 'A1,A2,A3,B1,C1'),
('2', 'C1'),
('3', 'A1,A1,B1,C1'),
('4', 'A1,A4,B1,C1'),
('5', 'A1,A3,B1,C1');
ALTER TABLE items
ADD COLUMN "categories_a" TEXT,
ADD COLUMN "categories_b" TEXT,
ADD COLUMN "categories_c" TEXT;
A single column can be updated as follows (for items with id < 10):
-- o: outer item
-- i: inner item
-- l: category list
-- n: category name
UPDATE items o SET categories_a = (
SELECT string_agg(l.n, ', ' ORDER BY l.n)
FROM (
SELECT array_to_string(regexp_matches(i.categories_original,'(A[\s+\w+]+)', 'g'), '') n
FROM items i
WHERE i.id = o.id
) l
)
WHERE o.id < 10;
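The difference between the original UPDATE and the one above comes down to whether the subquery is correlated with the row being updated. Here is a minimal sketch of that effect, run through Python's built-in sqlite3 module rather than Postgres. The tokens table (already-split values) is a hypothetical stand-in for the regexp_matches step, and group_concat stands in for string_agg.

```python
import sqlite3

def run_updates():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY,
                            cat_uncorrelated TEXT,
                            cat_correlated TEXT);
        -- hypothetical table holding values that were already split out
        CREATE TABLE tokens (item_id INTEGER, token TEXT);
        INSERT INTO items (id) VALUES (1), (2);
        INSERT INTO tokens VALUES (1, 'A1'), (2, 'A7');
    """)
    # Uncorrelated: the subquery never refers to the row being updated,
    # so it aggregates every token and writes the same value to all rows.
    conn.execute("""
        UPDATE items
        SET cat_uncorrelated = (SELECT group_concat(token, ', ') FROM tokens)
    """)
    # Correlated: the subquery is restricted to the row being updated.
    conn.execute("""
        UPDATE items
        SET cat_correlated = (SELECT group_concat(token, ', ')
                              FROM tokens
                              WHERE tokens.item_id = items.id)
    """)
    return conn.execute(
        "SELECT id, cat_uncorrelated, cat_correlated FROM items ORDER BY id"
    ).fetchall()

rows = run_updates()
for row in rows:
    print(row)
```

With the uncorrelated form every row receives the tokens of all rows, which matches the repeated GSM 1800 values described in the question.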

Related

SQL: ANSI Subquery join

I'm practising subqueries in Snowflake (it supports common SQL queries). I'm stuck and cannot get the result I want.
Data
INSERT INTO members (memid, surname, firstname, address, zipcode, telephone, recommendedby, joindate) VALUES
(12, 'Baker', 'Anne', '55 Powdery Street, Boston', 80743, '844-076-5141', 9, '2012-08-10 14:23:22'),
(21, 'Mackenzie', 'Anna', '64 Perkington Lane, Reading', 64577, '(822) 661-2898', 1, '2012-08-26 09:32:05'),
(6, 'Tracy', 'Burton', '3 Tunisia Drive, Boston', 45678, '(822) 354-9973', NULL, '2012-07-15 08:52:55');
I want to get each member's name, member id, recommender's name and recommender's id.
My code
with recommender as (
select distinct concat(t1.firstname, ' ', t1.surname) recommender
, memid recommender_id
from "EXERCISES"."CD"."MEMBERS" t1
where exists (select surname from "EXERCISES"."CD"."MEMBERS" t2
where t1.memid = t2.recommendedby)
)
, member as (
select
distinct concat(firstname, ' ', surname) as member,
memid,
recommender,
recommender_id
from recommender
left join "EXERCISES"."CD"."MEMBERS" t3 on recommender.recommender_id = t3.recommendedby
) select * from member
order by member;
My output
Notice that Burton Tracy is missing from the output because she doesn't have a recommender. I want to keep her data in the output. How should I rewrite my code?
Thank you
I'm not quite sure why you are using CTEs for this...? Or subqueries, for that matter.
Getting the person who recommended a member takes nothing more than a LEFT JOIN:
select
concat(m.firstname, ' ', m.surname) as member,
m.memid member_id,
concat(r.firstname, ' ', r.surname) as recommender,
r.memid recommender_id
from
members m
left join members r on r.memid = m.recommendedby
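To check that the self left join keeps members without a recommender, here is a small sketch using Python's built-in sqlite3 module instead of Snowflake. The table is trimmed to the columns the join needs, and member 9 ('Sample Person') is a made-up row added only so that one recommender exists in the data; `||` replaces concat.

```python
import sqlite3

def members_with_recommenders():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (memid INTEGER, surname TEXT,
                              firstname TEXT, recommendedby INTEGER);
        INSERT INTO members VALUES
            (6,  'Tracy',  'Burton', NULL),
            (9,  'Person', 'Sample', NULL),  -- made-up recommender row
            (12, 'Baker',  'Anne',   9);
    """)
    # m is the member, r is the (optional) recommender.
    # LEFT JOIN keeps every m row even when no r row matches.
    return conn.execute("""
        SELECT m.firstname || ' ' || m.surname AS member,
               m.memid                         AS member_id,
               r.firstname || ' ' || r.surname AS recommender,
               r.memid                         AS recommender_id
        FROM members m
        LEFT JOIN members r ON r.memid = m.recommendedby
        ORDER BY m.memid
    """).fetchall()

for row in members_with_recommenders():
    print(row)
```

Burton Tracy stays in the result, with NULL (None in Python) in both recommender columns.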

Combining/collapsing rows with group_by + conditional on column for next row

What I have is a chat log from a telecom company signing up new customers, in which a customer and a company representative talk.
I am trying to collapse the chat so that there are fewer rows. The diagrams below show what the data looks like before and what it needs to look like after.
BEFORE
AFTER
I have looked at the following articles:
SQL - How to combine rows based on unique values
Optimal way to concatenate/aggregate strings
How to sort the result from string_agg()
I have tried this code:
select
unique_id, string_agg(concat(text, ' ', text), ', ')
from
conversation
group by
unique_id, user
However, this does not collapse it as needed. It collapses everything into two lines, one for customer and another for company. The logic I am looking for is: if the next row contains the same unique_id and user, then concatenate the current row's text field with the next row's text field.
Here is the SQL Fiddle page, but I am running this code in SQL Server which has string_agg: http://sqlfiddle.com/#!9/5ad86c/3
I have requested an almost similar algorithm in R, if you look at my StackOverflow history.
CREATE TABLE conversation
(
`unique_id` double,
`line_no` int,
`user` varchar(7000),
`text` varchar(7000)
);
INSERT INTO conversation (`unique_id`, `line_no`, `user`, `text`)
VALUES
(50314585222, 1, 'customer', 'Hi I would like to sign up for a service'),
(50314585222, 2, 'company', 'Hi My name is Alex. We can offer the following plans. We also have signup bonuses, with doubling of data for 12 months '),
(50314585222, 3, 'company', 'Plan1: 40GB data, with monthly price of $80'),
(50314585222, 4, 'company', 'Plan2: 20GB data, with monthly price of $40'),
(50314585222, 5, 'company', 'Plan3: 5GB data, with monthly price of $15'),
(50314585222, 6, 'customer', 'I was hoping for a much smaller plan, with only voice service'),
(50314585222, 7, 'customer', 'maybe the $10 per month plan.'),
(50319875222, 4, 'customer', 'so how do I sign up'),
(50319875222, 5, 'customer', '*for the service'),
(50319875222, 7, 'company', 'maybe I can call you for your details?')
;
If I understand you correctly, the next approach is a possible solution. You need to find the changes and define the appropriate groups:
Table:
CREATE TABLE [conversation]
(
[unique_id] bigint,
[line_no] int,
[user] varchar(7000),
[text] varchar(7000)
);
INSERT INTO [conversation] ([unique_id], [line_no], [user], [text])
VALUES
(50314585222, 1, 'customer', 'Hi I would like to sign up for a service'),
(50314585222, 2, 'company', 'Hi My name is Alex. We can offer the following plans. We also have signup bonuses, with doubling of data for 12 months '),
(50314585222, 3, 'company', 'Plan1: 40GB data, with monthly price of $80'),
(50314585222, 4, 'company', 'Plan2: 20GB data, with monthly price of $40'),
(50314585222, 5, 'company', 'Plan3: 5GB data, with monthly price of $15'),
(50314585222, 6, 'customer', 'I was hoping for a much smaller plan, with only voice service'),
(50314585222, 7, 'customer', 'maybe the $10 per month plan.'),
(50319875222, 4, 'customer', 'so how do I sign up'),
(50319875222, 5, 'customer', '*for the service'),
(50319875222, 7, 'company', 'maybe I can call you for your details?')
;
Statement:
; WITH ChangesCTE AS (
SELECT
*,
LAG([user]) OVER (PARTITION BY [unique_id] ORDER BY [line_no]) AS prev_user
FROM [conversation]
), GroupsCTE AS (
SELECT
*,
SUM(CASE WHEN [user] <> [prev_user] OR [prev_user] IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY [unique_id] ORDER BY [line_no]) AS [group_id]
FROM ChangesCTE
)
SELECT
[unique_id],
MIN([line_no]) AS [line_no],
MIN([user]) AS [user],
STRING_AGG([text], ' ') WITHIN GROUP (ORDER BY [line_no]) AS [text]
FROM GroupsCTE
GROUP BY [unique_id], [group_id]
ORDER BY [unique_id]
Result:
unique_id line_no user text
50314585222 1 customer Hi I would like to sign up for a service
50314585222 2 company Hi My name is Alex. We can offer the following plans. We also have signup bonuses, with doubling of data for 12 months Plan1: 40GB data, with monthly price of $80 Plan2: 20GB data, with monthly price of $40 Plan3: 5GB data, with monthly price of $15
50314585222 6 customer I was hoping for a much smaller plan, with only voice service maybe the $10 per month plan.
50319875222 4 customer so how do I sign up *for the service
50319875222 7 company maybe I can call you for your details?
This is a gaps-and-islands problem, where you want to group together adjacent rows by the same speaker.
For this to be solvable, you need a column to order the records. It seems we cannot use line_no, which has duplicate values in the same conversation. I still assumed that such a column exists and is called ordering_id.
select
unique_id,
min(line_no) line_no,
[user],
string_agg([text], ' ') within group(order by ordering_id) [text]
from (
select
t.*,
row_number() over(partition by unique_id order by ordering_id) rn1,
row_number() over(partition by unique_id, [user] order by ordering_id) rn2
from mytable t
) t
group by unique_id, [user], rn1 - rn2
order by unique_id, min(ordering_id)
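The difference of the two row numbers stays constant while the same speaker keeps talking, which is what makes it usable as a group key. A rough sketch of that idea, run on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer): it assumes line_no can serve as the ordering column, uses shortened made-up ids and texts, and uses group_concat in place of STRING_AGG.

```python
import sqlite3

def collapse_conversation():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE conversation
                    (unique_id INTEGER, line_no INTEGER, user TEXT, text TEXT)""")
    conn.executemany(
        "INSERT INTO conversation VALUES (?, ?, ?, ?)",
        [   # shortened, made-up sample in the shape of the question's data
            (1, 1, 'customer', 'a'),
            (1, 2, 'company',  'b'),
            (1, 3, 'company',  'c'),
            (1, 4, 'customer', 'd'),
            (1, 5, 'customer', 'e'),
            (2, 4, 'customer', 'f'),
            (2, 5, 'customer', 'g'),
            (2, 7, 'company',  'h'),
        ],
    )
    # rn1 numbers all rows of a conversation, rn2 numbers rows per speaker.
    # rn1 - rn2 only changes when another speaker has spoken in between.
    # Note: plain group_concat does not guarantee the order of the joined texts.
    return conn.execute("""
        SELECT unique_id,
               MIN(line_no) AS first_line,
               user,
               group_concat(text, ' ') AS text
        FROM (
            SELECT c.*,
                   ROW_NUMBER() OVER (PARTITION BY unique_id
                                      ORDER BY line_no) AS rn1,
                   ROW_NUMBER() OVER (PARTITION BY unique_id, user
                                      ORDER BY line_no) AS rn2
            FROM conversation c
        ) t
        GROUP BY unique_id, user, rn1 - rn2
        ORDER BY unique_id, first_line
    """).fetchall()

for row in collapse_conversation():
    print(row)
```

Each output row is one uninterrupted run of lines by a single speaker within a conversation.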

TSQL: How do I search for a grade within a list of grades?

How do I search for a grade within a list of grades? Some grades are of string data type, for example 'PK', and 'KK'.
The production list has over a thousand students each in different grade levels, so I'm not sure how to ensure the query would address that.
The logic I'm trying for is something like WHERE CurrentGrade like ('%SchoolGrades%').
Example query results:
Desired result:
(I didn't design the tables I have to work with as I know they are not optimal, but it's what I have to work with, thanks for the help.)
Sample code:
CREATE TABLE #StudentGrades(
StudentID int
, CurrentGrade varchar(255)
, SchoolEarliestGrade varchar(255)
, SchoolID int
, School varchar(255)
, SchoolGrades varchar(255)
)
INSERT INTO #StudentGrades (StudentID, CurrentGrade, SchoolEarliestGrade, SchoolID, School, SchoolGrades)
VALUES
(7777777, 11, 'PK' , 111 ,'Smith Elementary' ,'PK, KK, 01, 02, 03, 04, 05'),
(7777777, 11, '06' , 222 ,'Jones Middle' ,'06, 07, 08'),
(7777777, 11, '09' , 333 ,'Perez High School' ,'09, 10, 11, 12')
SELECT * FROM #StudentGrades
This will give you the rows where the CurrentGrade is in the SchoolGrades.
SELECT *
FROM StudentGrades
WHERE ', ' + SchoolGrades + ', ' LIKE '%, ' + CurrentGrade + ', %'
Edit: This is the best solution with help from the comments. Thanks, all.
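Wrapping both sides in the ', ' delimiter is what keeps a grade from matching inside another grade. A small sketch of the difference compared with a bare LIKE '%...%', run on SQLite via Python's sqlite3 module (so `||` replaces T-SQL's `+`); the grade '1' is a hypothetical value chosen to show a false match.

```python
import sqlite3

def grade_matches(current_grade, school_grades):
    """Return (naive_match, delimited_match) for one grade and one grade list."""
    conn = sqlite3.connect(":memory:")
    naive, delimited = conn.execute(
        """
        SELECT ? LIKE '%' || ? || '%',
               ', ' || ? || ', ' LIKE '%, ' || ? || ', %'
        """,
        (school_grades, current_grade, school_grades, current_grade),
    ).fetchone()
    return bool(naive), bool(delimited)

print(grade_matches('11', '09, 10, 11, 12'))  # grade is in the list
print(grade_matches('1', '09, 10, 11, 12'))   # hypothetical grade '1'
print(grade_matches('11', '06, 07, 08'))      # grade is not in the list
```

For the hypothetical grade '1', the naive pattern matches because '1' occurs inside '10', '11' and '12', while the delimited pattern does not.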
Based on the fact that the grades are in a string field, you can use LIKE:
select * from StudentGrades
where schoolGrades like '%11%';
or
select * from StudentGrades
where schoolGrades like '%KK%';
select * from StudentGrades
where schoolGrades like '%KK%' OR schoolGrades like '%PK%';
One could use a recursive CTE to unpivot the comma-separated values of SchoolGrades, then derive the needed values, and finally use a simple WHERE clause in the select against the CTE. I'm not sure about performance, as the recursive loop is a record-by-record approach; however, it may be faster than a full table scan with 2 ORs.
Working example:
http://rextester.com/NAB12900
StrCTE gets the start and stop positions needed to normalize the data,
SUBCTE provides the needed grades on individual rows with the data normalized,
and the last query simply limits the result to rows where the current grade matches stringValue.
WITH StrCTE AS
(
SELECT 1 start, CHARINDEX(',' , schoolGrades) stop, StudentID, CurrentGrade, SchoolEarliestGrade, SchoolID, School, SchoolGrades
FROM #studentGrades A
UNION ALL
SELECT stop + 1, CHARINDEX(',' ,schoolgrades , stop + 1), StudentID, CurrentGrade, SchoolEarliestGrade, SchoolID, School, SchoolGrades
FROM StrCTE A
WHERE stop > 0
),
SUBCTE AS (SELECT StudentID, CurrentGrade, SchoolEarliestGrade, SchoolID, School, SchoolGrades, ltrim(SUBSTRING(schoolgrades , start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END)) AS stringValue
FROM StrCTE)
SELECT *
FROM SUBCTE
WHERE currentgrade = stringValue
IMO this StrCTE query lets you normalize the data so that other standard SQL queries can work on it. You could maybe create StrCTE as a materialized view on which your analysis is done, since a materialized view can have indexes that reduce the performance loss from the recursive loop.

How do I exclude entries from a recursive CTE?

How can I exclude entries from a recursive CTE with SQLite?
CREATE TABLE GroupMembers (
group_id VARCHAR,
member_id VARCHAR
);
INSERT INTO GroupMembers(group_id, member_id) VALUES
('1', '10'),
('1', '20'),
('1', '30'),
('1', '-50'),
('2', '30'),
('2', '40'),
('3', '1'),
('3', '50'),
('4', '-10'),
('10', '50'),
('10', '60');
I want a query that will give me the list of members (recursively) in the group. However, a member with the first character being '-' means that the id that comes after the minus is NOT in the group.
For example, the members of '1' are '10', '20', '30', and '-50'. '10', however, is a group so we need to add its children '50' and '60'. However, '-50' is already a member so we cannot include '50'. In conclusion the members of '1' are '10', '20', '30', '-50', and '60'.
It seems like this query should work:
WITH RECURSIVE members(id) AS (
VALUES('1')
UNION
SELECT gm.member_id
FROM members m
INNER JOIN GroupMembers gm ON gm.group_id=m.id
LEFT OUTER JOIN members e ON '-' || gm.member_id=e.id
WHERE e.id IS NULL
)
SELECT id FROM members;
But I get the error: multiple references to recursive table: members
How can I fix/rewrite this to do what I want?
Note: it doesn't matter whether the '-50' entry is returned in the result set.
I don't have a SQLite available for testing, but assuming the -50 also means that 50 should be excluded as well, I think you are looking for this:
WITH RECURSIVE members(id) AS (
VALUES('1')
UNION
SELECT gm.member_id
FROM GroupMembers gm
JOIN members m ON gm.group_id=m.id
WHERE member_id not like '-%'
AND not exists (select 1
from groupMembers g2
where g2.member_id = '-'||gm.member_id)
)
SELECT id
FROM members;
(The above works in Postgres)
You usually select from the base table in the recursive part and then join back to the actual CTE. The filtering of unwanted rows is then done with a regular WHERE clause, not by joining the CTE again. A recursive CTE is defined to terminate when the JOIN finds no more rows.
SQLFiddle (Postgres): http://sqlfiddle.com/#!15/04405/1
Edit after the requirements have changed (have been detailed):
As you need to exclude the rows based on their position (a detail that you didn't provide in your original question), the filter can only be done outside of the CTE. Again, I can't test this with SQLite, only with Postgres:
WITH RECURSIVE members(id, level) AS (
VALUES('4', 1)
UNION
SELECT gm.member_id, m.level + 1
FROM GroupMembers gm
JOIN members m ON gm.group_id=m.id
)
SELECT m.id, m.level
FROM members m
where id not like '-%'
and not exists (select 1
from members m2
where m2.level < m.level
and m2.id = '-'||m.id);
Updated SQLFiddle: http://sqlfiddle.com/#!15/ec0f9/3
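Since the query above was only tried on Postgres, here is a sketch that runs the same statement on SQLite through Python's sqlite3 module, using the sample data from the question. The start group is passed as a parameter, which is an addition for illustration; SQLite accepts the second reference to members because it is outside the recursive part.

```python
import sqlite3

def group_members(start_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE GroupMembers (group_id VARCHAR, member_id VARCHAR);
        INSERT INTO GroupMembers(group_id, member_id) VALUES
            ('1', '10'), ('1', '20'), ('1', '30'), ('1', '-50'),
            ('2', '30'), ('2', '40'),
            ('3', '1'),  ('3', '50'),
            ('4', '-10'),
            ('10', '50'), ('10', '60');
    """)
    rows = conn.execute("""
        WITH RECURSIVE members(id, level) AS (
            VALUES(?, 1)
            UNION
            SELECT gm.member_id, m.level + 1
            FROM GroupMembers gm
            JOIN members m ON gm.group_id = m.id
        )
        SELECT m.id
        FROM members m
        WHERE m.id NOT LIKE '-%'
          -- drop an id when its '-' entry appears at a shallower level
          AND NOT EXISTS (SELECT 1
                          FROM members m2
                          WHERE m2.level < m.level
                            AND m2.id = '-' || m.id)
    """, (start_id,)).fetchall()
    return sorted(r[0] for r in rows)

print(group_members('1'))
print(group_members('4'))
```

As in the original query, the starting group id itself is part of the result, and the '-' entries are filtered out of the final list.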

SQL GROUP BY, adding a column from another table works but produces inaccurate information

so here was my original query:
SELECT Batting.playerID, SUM(G) as 'G', SUM(AB) as 'AB', SUM(R) as 'R', SUM(H) as 'H', SUM(Doub) as '2B', SUM(Trip) as '3B',
SUM(HR) as 'HR', SUM(RBI) as 'RBI', SUM(SB) as 'SB', SUM(CS) as 'CS', SUM(BB) as 'BB',
SUM(SO) as 'SO', SUM(IBB) as 'IBB', SUM(HBP) as 'HBP', SUM(SH) as 'SH', SUM(SF) as 'SF', SUM(GIDP) as 'GIDP',
Master.nameLast, Master.nameFirst
FROM Batting
JOIN Master ON Batting.playerID = Master.playerID
WHERE Master.nameLast LIKE #lastname + '%'
GROUP BY Batting.playerID, Master.nameLast, Master.nameFirst
here is my new query:
SELECT Batting.playerID, SUM(G) as 'G', SUM(AB) as 'AB', SUM(R) as 'R', SUM(H) as 'H', SUM(Doub) as '2B', SUM(Trip) as '3B',
SUM(HR) as 'HR', SUM(RBI) as 'RBI', SUM(SB) as 'SB', SUM(CS) as 'CS', SUM(BB) as 'BB',
SUM(SO) as 'SO', SUM(IBB) as 'IBB', SUM(HBP) as 'HBP', SUM(SH) as 'SH', SUM(SF) as 'SF', SUM(GIDP) as 'GIDP',
Master.nameLast, Master.nameFirst, Fielding.POS
FROM Batting
JOIN Master ON Batting.playerID = Master.playerID
JOIN Fielding ON Master.playerID = Fielding.playerID
WHERE Master.nameLast LIKE #lastname + '%'
AND Fielding.POS IN ('C', '1B','2B','3B', 'SS', 'LF', 'RF', 'CF', 'OF', 'DH')
GROUP BY Batting.playerID, Master.nameLast, Master.nameFirst, Fielding.POS
All I am doing is adding the POS column from the Fielding table. I had to sum all those columns on the Batting table to get totals. Some players have multiple POS values in the Fielding table that relate to the sums, i.e. someone has 50 games at one position and another 50 at another. My query works, but the numbers being returned are getting multiplied a few times along the way (they are a lot larger in the second query than in the first), and I am getting multiple records based on the POS when I just want one. Any thoughts? I tried a left outer join, but that didn't do anything.
The problem is that 1 record in the Master table can have multiple records in the Fielding table, which causes the query to return each record in the Batting table multiple times.
If you need data from the Fielding table, you will need to limit it to a single record per playerID.
You would have to do 2 separate queries and merge them with a UNION ALL to avoid the row multiplication:
one on Batting JOIN Master,
the other on Fielding JOIN Master.
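The multiplication is easy to reproduce on a tiny data set. The sketch below runs on SQLite through Python's sqlite3 module with made-up rows for a player 'p1', and shows one possible way to get a single record per playerID: aggregate Batting in a derived table and collapse Fielding to one position per player before joining. Picking MIN(POS) is an arbitrary assumption for illustration, not something the question specifies.

```python
import sqlite3

def batting_totals():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Batting  (playerID TEXT, G INTEGER);
        CREATE TABLE Fielding (playerID TEXT, POS TEXT);
        INSERT INTO Batting  VALUES ('p1', 100), ('p1', 50);     -- made-up rows
        INSERT INTO Fielding VALUES ('p1', '1B'), ('p1', 'OF');  -- two positions
    """)
    # Direct join: each Batting row is repeated once per Fielding row.
    inflated = conn.execute("""
        SELECT SUM(b.G)
        FROM Batting b
        JOIN Fielding f ON f.playerID = b.playerID
    """).fetchone()[0]
    # Reduce both sides to one row per player first, then join.
    games, pos = conn.execute("""
        SELECT b.G, f.POS
        FROM (SELECT playerID, SUM(G) AS G
              FROM Batting GROUP BY playerID) b
        JOIN (SELECT playerID, MIN(POS) AS POS   -- arbitrary single position
              FROM Fielding GROUP BY playerID) f
          ON f.playerID = b.playerID
    """).fetchone()
    return inflated, games, pos

print(batting_totals())
```

The direct join doubles the games total because the player has two Fielding rows, while the pre-aggregated version returns the real total once.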