How to pivot rows into columns in AWS Athena?

I'm new to AWS Athena and trying to pivot some rows into columns, similar to the top answer in this StackOverflow post.
However, when I tried:
SELECT column1, column2, column3
FROM data
PIVOT
(
MIN(column3)
FOR column2 IN ('VALUE1','VALUE2','VALUE3','VALUE4')
)
I get the error: mismatched input '(' expecting {',', ')'} (service: amazonathena; status code: 400; error code: invalidrequestexception
Does anyone know how to accomplish what I am trying to achieve in AWS Athena?

Extending @kadrach's answer.
Assuming a table like this
uid | key | value1 | value2
----+-----+--------+--------
1 | A | 10 | 1000
1 | B | 20 | 2000
2 | A | 11 | 1001
2 | B | 21 | 2001
Single column PIVOT works like this
SELECT
uid,
kv1['A'] AS A_v1,
kv1['B'] AS B_v1
FROM (
SELECT uid, map_agg(key, value1) kv1
FROM vtable
GROUP BY uid
)
Result:
uid | A_v1 | B_v1
----+------+-------
1 | 10 | 20
2 | 11 | 21
Multi column PIVOT works like this
SELECT
uid,
kv1['A'] AS A_v1,
kv1['B'] AS B_v1,
kv2['A'] AS A_v2,
kv2['B'] AS B_v2
FROM (
SELECT uid,
map_agg(key, value1) kv1,
map_agg(key, value2) kv2
FROM vtable
GROUP BY uid
)
Result:
uid | A_v1 | B_v1 | A_v2 | B_v2
----+------+------+------+-----
1 | 10 | 20 | 1000 | 2000
2 | 11 | 21 | 1001 | 2001

You can do a single-column PIVOT in Athena using map_agg.
SELECT
uid,
kv['c1'] AS c1,
kv['c2'] AS c2,
kv['c3'] AS c3
FROM (
SELECT uid, map_agg(key, value) kv
FROM vtable
GROUP BY uid
) t
Credit goes to this website. Unfortunately I've not found a clever way to do a multi-column pivot this way (I nest the query, which is not pretty).

I had the same issue with the PIVOT function. However, I used a workaround to get a similarly shaped result set:
select
columnToGroupOn,
min(if(colToPivot = 'VALUE1', column3, null)) as VALUE1,
min(if(colToPivot = 'VALUE2', column3, null)) as VALUE2,
min(if(colToPivot = 'VALUE3', column3, null)) as VALUE3
from
data
group by columnToGroupOn
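The same conditional-aggregation idea also covers the multi-column case. Below is a minimal sketch, not Athena itself: it runs the query against an in-memory SQLite database through Python's sqlite3 module as a stand-in, reuses the vtable sample data from the answer above, and uses CASE instead of if() to keep the SQL portable. Treat it as an illustration of the technique under those assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Athena here (assumption: only portable SQL is used).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (uid INTEGER, key TEXT, value1 INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?, ?)",
    [(1, "A", 10, 1000), (1, "B", 20, 2000), (2, "A", 11, 1001), (2, "B", 21, 2001)],
)

# One MIN(CASE ...) per output column; GROUP BY collapses each uid to a single row.
pivot_sql = """
SELECT
    uid,
    MIN(CASE WHEN key = 'A' THEN value1 END) AS A_v1,
    MIN(CASE WHEN key = 'B' THEN value1 END) AS B_v1,
    MIN(CASE WHEN key = 'A' THEN value2 END) AS A_v2,
    MIN(CASE WHEN key = 'B' THEN value2 END) AS B_v2
FROM vtable
GROUP BY uid
ORDER BY uid
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Each pivoted column needs its own expression, so the list of key values has to be known when the query is written.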

Related

How to create 2 columns using data from 1 column and merging them

I'm facing a problem in BigQuery: I can't split a single column into two columns. I want the rows with index 8 and 10 to become new columns called universal_id and project_id, filled from the column "value".
My current table is:
user_id | index | value
a | 1 | 123
b | 8 | 456
c | 10 | 12.60
b | 10 | 789
I want the result to be this:
user_id | project_id | universal_id
a | NA | NA
b | 789 | 456
c | 12.60 | NA
I have tried this, but it does not work. I searched a lot of places and could not find the answer I am looking for. Any help would be greatly appreciated. Thank you in advance!
select user_id,
case when index = 8 then value else null end as universal_id,
case when index = 10 then value else null end as ps_project_id
from test_1
You may use conditional aggregation here:
SELECT
user_id,
MAX(CASE WHEN index = 10 THEN value END) AS project_id,
MAX(CASE WHEN index = 8 THEN value END) AS universal_id
FROM test_1
GROUP BY user_id;
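To see why the attempt in the question falls short, it may help to run both queries side by side. This is a rough sketch that uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for BigQuery (an assumption; the column is quoted as "index" only because INDEX is a keyword in SQLite). Without GROUP BY, the CASE expressions produce one row per input row; with MAX and GROUP BY, the rows collapse to one per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "index" is quoted because INDEX is a keyword in SQLite.
conn.execute('CREATE TABLE test_1 (user_id TEXT, "index" INTEGER, value TEXT)')
conn.executemany(
    "INSERT INTO test_1 VALUES (?, ?, ?)",
    [("a", 1, "123"), ("b", 8, "456"), ("c", 10, "12.60"), ("b", 10, "789")],
)

# The shape of the attempt from the question: no GROUP BY, one output row per input row.
per_row = conn.execute(
    """
    SELECT user_id,
           CASE WHEN "index" = 8 THEN value END AS universal_id,
           CASE WHEN "index" = 10 THEN value END AS project_id
    FROM test_1
    ORDER BY user_id, "index"
    """
).fetchall()

# Conditional aggregation: MAX ignores NULLs, GROUP BY yields one row per user.
per_user = conn.execute(
    """
    SELECT user_id,
           MAX(CASE WHEN "index" = 10 THEN value END) AS project_id,
           MAX(CASE WHEN "index" = 8 THEN value END) AS universal_id
    FROM test_1
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

print(per_row)
print(per_user)
```

User b shows up twice in the first result and once in the second. Missing values come back as NULL (None in Python) rather than the literal NA.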
Consider the approach below:
select * from your_table
pivot (
min(value) for case index
when 10 then 'project_id'
when 8 then 'universal_id'
end in ('project_id', 'universal_id')
)
if applied to sample data in your question - output is

How do I merge and delete duplicated rows in SQL using UPDATE?

For example, I have a table of:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web
2 | 23 | xyz | 0 | mobile
3 | 24 | xyzc | 0 | web
4 | 25 | xyzc | 0 | web
I want the result to be:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web&mobile
2 | 24 | xyzc | 0 | web
3 | 25 | xyzc | 0 | web
How do I do this in SQL Server using UPDATE and DELETE statements?
Any help is greatly appreciated!
I might actually suggest just leaving the original data intact, and instead creating a view here:
CREATE VIEW yourView AS
SELECT ROW_NUMBER() OVER (ORDER BY MIN(id)) AS id,
code, name, type,
STRING_AGG(deviceType, '&') WITHIN GROUP (ORDER BY id) AS deviceType
FROM yourTable
GROUP BY code, name, type;
One main reason for not actually doing the update is that every time new data comes in, you might possibly have to run that update, over and over. Instead, just keeping the original data and running the view occasionally might perform better here.
Note that I assume that you are using SQL Server 2017 or later. If not, then STRING_AGG would have to be replaced with an uglier approach, but you should consider upgrading in this case.
To do what you want, you would need two separate statements.
This updates the "first" row of each group with all the device types in the group:
update t
set t.devicetype = t1.devicetype
from mytable t
inner join (
select min(id) as id, string_agg(devicetype, '&') within group(order by id) as devicetype
from mytable
group by code, name, type
having count(*) > 1
) t1 on t1.id = t.id
This deletes everything but the first row per group:
with t as (
select row_number() over(partition by code, name, type order by id) rn
from mytable
)
delete from t where rn > 1
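The two-step idea (update the first row of each group, then delete the rest) can be tried out locally. The sketch below is an approximation, not SQL Server: it uses SQLite through Python's sqlite3 module, with group_concat standing in for STRING_AGG and correlated subqueries standing in for the UPDATE ... FROM join and the CTE delete. SQLite does not promise the order of concatenated values here, so only the set of device types is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, code INTEGER, name TEXT, type INTEGER, deviceType TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [(1, 23, "xyz", 0, "web"), (2, 23, "xyz", 0, "mobile"),
     (3, 24, "xyzc", 0, "web"), (4, 25, "xyzc", 0, "web")],
)

# Step 1: write the combined device types into the first row of each duplicated group.
conn.execute(
    """
    UPDATE mytable
    SET deviceType = (
        SELECT group_concat(m.deviceType, '&')
        FROM mytable AS m
        WHERE m.code = mytable.code AND m.name = mytable.name AND m.type = mytable.type
    )
    WHERE id IN (
        SELECT MIN(id) FROM mytable GROUP BY code, name, type HAVING COUNT(*) > 1
    )
    """
)

# Step 2: delete every row that is not the first row of its group.
conn.execute(
    "DELETE FROM mytable WHERE id NOT IN (SELECT MIN(id) FROM mytable GROUP BY code, name, type)"
)

merged = conn.execute(
    "SELECT id, code, name, type, deviceType FROM mytable ORDER BY id"
).fetchall()
for row in merged:
    print(row)
```

Note that the surviving rows keep their original ids (1, 3 and 4 for this data); neither statement renumbers them.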

SQL Server stored procedure inserting duplicate rows

I have a table with a column GetDup, and I'd like to duplicate records based on the value of this column. For example, if the value in GetDup is 1, duplicate the record once; if the value is 2, duplicate the record twice, and so on. The statement has to be a looping statement.
What would be a good way to write a stored procedure for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup+1) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may have to adjust the count. You say above that if GetDup is 1 you want one duplicate. If that means the output should contain the value twice (the original plus one copy), then add one to the count in repeat(), replicate() or rpad(), i.e. replicate(mycolumn, getdup+1). Oracle SQL will be a little different, since it uses rpad().
In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).
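For a concrete version of the recursive CTE, here is a small sketch run against SQLite through Python's sqlite3 module (a stand-in chosen because it accepts WITH RECURSIVE; it is not SQL Server). The column list and the "remaining" counter are my own choices for illustration, and the stop condition is picked so that each row appears Getdup + 1 times, matching the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Getdup INTEGER, CustomerName TEXT, CustomerAdd TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "John", "123 SomeWhere"), (2, "Bob", "987 SomeWhere")],
)

# "remaining" (hypothetical name) counts how many extra copies are still to be emitted.
dup_sql = """
WITH RECURSIVE cte (Getdup, CustomerName, CustomerAdd, remaining) AS (
    SELECT Getdup, CustomerName, CustomerAdd, Getdup
    FROM mytable
    UNION ALL
    SELECT Getdup, CustomerName, CustomerAdd, remaining - 1
    FROM cte
    WHERE remaining > 0
)
SELECT Getdup, CustomerName, CustomerAdd
FROM cte
ORDER BY Getdup, CustomerName
"""
dup_rows = conn.execute(dup_sql).fetchall()
for row in dup_rows:
    print(row)
```

John comes back twice and Bob three times. Changing the condition to remaining > 1 would give exactly Getdup rows per record instead.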
So, you want a recursive solution:
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);

Create 7 digit codes as ID for every row

I want to generate an ID in SQL Server 2014 of type varchar(7). I am aware of identity columns, but what I want is to create IDs based on the SEQ_NO column of a table. Each ID should start with C and be 7 characters long. Below is an example:
TABLE A
SEQ_NO NAME
1 a
12 b
30 c
401 d
Output required is:-
SEQ ID NAME
1 C000001 a
12 C000012 b
30 C000030 c
401 C000401 d
Thought of using replicate and Identity column but that does not help me get the desired output. Any thoughts?
Since you are on 2014, yet another option is Format().
To be clear, Format() has some great features, but it is not known to be a high performer.
Select 'C'+format(Seq,'000000')
Returns (for example)
C000012
You can get it using REPLICATE.
SELECT SEQ_NO AS SEQ,
CONCAT('C', REPLICATE('0', 6 - LEN(SEQ_NO)), CAST(SEQ_NO AS VARCHAR(10))) ID,
NAME
FROM YOUR_TABLE;
Using RIGHT():
select 'C'+right('000000'+convert(varchar(6),seq_no),6)
demo:
create table a (seq_no int, name varchar(16))
insert into a values
(1 ,'a') ,(12 ,'b') ,(30 ,'c') ,(401,'d')
select *
, Id = 'C'+right('000000'+convert(varchar(6),seq_no),6)
from a
rextester demo: http://rextester.com/XVQ74081
returns:
+--------+------+---------+
| seq_no | name | Id |
+--------+------+---------+
| 1 | a | C000001 |
| 12 | b | C000012 |
| 30 | c | C000030 |
| 401 | d | C000401 |
+--------+------+---------+
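The pad-then-take-the-rightmost-characters trick is easy to check outside SQL Server. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so the syntax differs: || replaces +, and substr(x, -6) plays the role of RIGHT(x, 6). The result is cross-checked against Python's own zero padding.

```python
import sqlite3

sample = [(1, "a"), (12, "b"), (30, "c"), (401, "d")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (seq_no INTEGER, name TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", sample)

# Prepend six zeros, then keep only the last six characters of the result.
id_rows = conn.execute(
    "SELECT seq_no, 'C' || substr('000000' || seq_no, -6) AS id, name "
    "FROM a ORDER BY seq_no"
).fetchall()

# Cross-check against Python's format specification for zero padding.
expected = [(n, f"C{n:06d}", name) for n, name in sample]

for row in id_rows:
    print(row)
```

As with the RIGHT() version, a sequence number above 999999 would lose its leading digits, so the approach assumes values fit in six digits.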

Search an SQL table that already contains wildcards?

I have a table that contains patterns for phone numbers, where x can match any digit.
+----+--------------+----------------------+
| ID | phone_number | phone_number_type_id |
+----+--------------+----------------------+
| 1 | 1234x000x | 1 |
| 2 | 87654311100x | 4 |
| 3 | x111x222x | 6 |
+----+--------------+----------------------+
Now, I might have 511132228, which matches row 3, and the query should return its type. So it's kind of like SQL wildcards, but the other way around, and I'm confused about how to achieve this.
Give this a go:
select * from my_table
where '511132228' like replace(phone_number, 'x', '_')
select *
from yourtable
where '511132228' like (replace(phone_number, 'x','_'))
Try the query below:
SELECT ID,phone_number,phone_number_type_id
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
Query with test data:
With TableName as
(
SELECT 3 ID, 'x111x222x' phone_number, 6 phone_number_type_id from dual
)
SELECT 'true' value_available
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
The above query will return data if pattern match is available and will not return any row if no match is available.
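For a quick check against all three sample rows, the reversed LIKE can be wrapped in a small helper. This is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in database, and phone_type is a hypothetical helper name, not part of any answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, phone_number TEXT, phone_number_type_id INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "1234x000x", 1), (2, "87654311100x", 4), (3, "x111x222x", 6)],
)

def phone_type(number):
    """Return the type ids of all stored patterns that match the given number."""
    rows = conn.execute(
        # The stored pattern goes on the right of LIKE; each x becomes the
        # single-character wildcard _.
        "SELECT phone_number_type_id FROM my_table "
        "WHERE ? LIKE replace(phone_number, 'x', '_') ORDER BY id",
        (number,),
    ).fetchall()
    return [r[0] for r in rows]

print(phone_type("511132228"))
print(phone_type("876543111009"))
print(phone_type("000000000"))
```

Because _ matches any single character, not just digits, this assumes the input has already been validated as numeric, and that stored patterns never contain a literal % or _.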