SQL Auto Number rows with grouping - sql

I am not sure if this can even be done, but I need to number rows based on specific columns. A plain row_number() will not do, because I need a specific sequence and am not sure how to produce it. Below is the result I am trying to get in SQL.
COL_1 COL_2 DESIRED RESULT
AAA AAA 0
AAA BBB 1
AAA BBB 1
AAA CCC 2
AAA DDD 3
ABB ABB 0
ABB BBB 1
ABB CCC 2
ABB CCC 2
ABB DDD 3

I interpret this as wanting to enumerate the values of col_2 within col_1.
If so, you can use dense_rank():
select t.*,
dense_rank() over (partition by col_1 order by col_2) - 1 as ranking
from t;
This assumes that the ordering you want is based on the col_2 column within each col_1.
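As a quick check of the dense_rank() approach, here is a minimal sketch that runs the same query against the sample rows using Python's built-in sqlite3 module. SQLite is an assumption made only for this illustration (the question does not name a database), and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question; the table name `t` follows the answer's query.
sample = [
    ("AAA", "AAA"), ("AAA", "BBB"), ("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"),
    ("ABB", "ABB"), ("ABB", "BBB"), ("ABB", "CCC"), ("ABB", "CCC"), ("ABB", "DDD"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_1 TEXT, col_2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", sample)

# dense_rank() gives tied col_2 values the same number and leaves no gaps;
# subtracting 1 makes the sequence start at 0 within each col_1 group.
ranked = conn.execute(
    """
    SELECT col_1, col_2,
           DENSE_RANK() OVER (PARTITION BY col_1 ORDER BY col_2) - 1 AS ranking
    FROM t
    ORDER BY col_1, col_2
    """
).fetchall()

for row in ranked:
    print(row)
```

The rankings restart at 0 for each col_1 value, and duplicate col_2 values share a number, which matches the desired result column in the question.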

Related

regexp_extract insert new row for each comma

Suppose we have this table, which has a column holding multiple values separated by commas. I need to split the comma-separated values and put each one on its own row.
SELECT * FROM the_table
customer_id customer_value
1 aaa,bbb,ccc
2 ddd,ggg,ttt,lll
3 ppp,nnn,mmm,kkk,fff
I do not know if regexp_extract is the right function to use here. With it I can only produce new columns, not new rows.
SELECT *,
regexp_extract(customer_value,"^(?:[^,]*,){0}([^,]*)(?:[^,]*,){1}([^,]*)",1) as value_1,
regexp_extract(customer_value,"^(?:[^,]*,){0}([^,]*)(?:[^,]*,){1}([^,]*)",2) as value_2
FROM the_table
customer_id customer_value value_1 value_2
1 aaa,bbb,ccc aaa bbb
2 ddd,ggg,ttt,lll ddd ggg
3 ppp,nnn,mmm,kkk,fff ppp nnn
What I am looking for:
SELECT * FROM the_table
customer_id customer_value customer_value_comma
1 aaa,bbb,ccc aaa
1 aaa,bbb,ccc bbb
1 aaa,bbb,ccc ccc
2 ddd,ggg,ttt,lll ddd
2 ddd,ggg,ttt,lll ggg
2 ddd,ggg,ttt,lll ttt
2 ddd,ggg,ttt,lll lll.........
Here's your SQL:
SELECT
*,
explode( -- turn array into rows
split(customer_value, ",") -- make an array
) as customer_value_comma -- rename column
FROM the_table
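Here it is in PySpark:
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, explode, col
spark = SparkSession.builder.getOrCreate()  # entry point; replaces the implicit `sc`
data = [(1,"aaa,bbb,ccc"),
(2,"ddd,ggg,ttt,lll"),
(3,"ppp,nnn,mmm,kkk,fff")]
df = spark.createDataFrame(data, ["customer_id","customer_value"])
# split() builds an array, explode() emits one row per array element
df.withColumn("customer_value_comma",explode(split(col("customer_value"),","))).show()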
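For readers without a Spark session at hand, the split-then-explode idea can be sketched in plain Python. This is only an illustration of the row-multiplying logic, not Spark's implementation; the helper name `explode_rows` is hypothetical.

```python
# Sample data from the question: (customer_id, customer_value)
rows = [
    (1, "aaa,bbb,ccc"),
    (2, "ddd,ggg,ttt,lll"),
    (3, "ppp,nnn,mmm,kkk,fff"),
]


def explode_rows(rows, sep=","):
    """Yield one (id, original_value, piece) tuple per separated value."""
    for customer_id, customer_value in rows:
        # split() plays the role of Spark's split(): string -> list of pieces
        for piece in customer_value.split(sep):
            # emitting one tuple per piece mirrors what explode() does
            yield (customer_id, customer_value, piece)


exploded = list(explode_rows(rows))
for row in exploded:
    print(row)
```

Each input row is repeated once per comma-separated piece, with the original customer_value kept alongside, as in the desired output.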

How to transform data rows into new column?

table
id text
1 aaa
121 bbb
4 ccc
1 ddd
new table
id text2
1 aaaddd
121 bbb
4 ccc
I do not think I can use PIVOT, since I never know in advance how many id and text values there will be or what they are, so I cannot hardcode them in a PIVOT clause.
Use GROUP BY with string_agg:
select id,string_agg(text,'') as text2
from table
group by id
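The same grouping idea can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite has no string_agg in older versions, so group_concat stands in for it here, and the table and column names `items` and `txt` are placeholders chosen to avoid reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `items` and `txt` are hypothetical names standing in for `table` and `text`.
conn.execute("CREATE TABLE items (id INTEGER, txt TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "aaa"), (121, "bbb"), (4, "ccc"), (1, "ddd")],
)

# group_concat(txt, '') concatenates every txt value in the group with an
# empty separator, playing the role of string_agg(text, '').
merged = dict(
    conn.execute(
        "SELECT id, group_concat(txt, '') AS text2 FROM items GROUP BY id"
    ).fetchall()
)

print(merged)
```

Note that without an explicit ordering the concatenation order inside a group is not guaranteed, so id 1 could in principle come back as dddaaa rather than aaaddd.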

SQL - Counting unique rows based on a separate field

I am trying to count unique values on a per-user basis and end up with a combined count. A value may appear more than once per user, but should only be counted once per user.
Example:
user value
1 AAA
1 AAA
1 BBB
1 CCC
2 AAA
2 CCC
2 CCC
3 AAA
3 BBB
3 BBB
3 BBB
Expected result with count:
AAA 3
BBB 2
CCC 2
So values should only be counted once per user, no matter how many times they are present.
I have gotten as far as counting the total number of values with this:
SELECT value, COUNT(value) FROM table GROUP BY value
But this counts all instances of each value. I cannot work out how to count only the unique values per user and then combine them. Hope this makes sense! Many thanks!
Try this:
SELECT value, COUNT(distinct user) FROM table GROUP BY value
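To see why COUNT(DISTINCT ...) gives the expected numbers, here is a small sketch that runs it over the example data with Python's built-in sqlite3 module. The table name `user_values` and the column name `user_id` are assumptions made for this illustration.

```python
import sqlite3

# (user, value) pairs copied from the example in the question
pairs = [
    (1, "AAA"), (1, "AAA"), (1, "BBB"), (1, "CCC"),
    (2, "AAA"), (2, "CCC"), (2, "CCC"),
    (3, "AAA"), (3, "BBB"), (3, "BBB"), (3, "BBB"),
]

conn = sqlite3.connect(":memory:")
# `user_values` and `user_id` are hypothetical names for this sketch.
conn.execute("CREATE TABLE user_values (user_id INTEGER, value TEXT)")
conn.executemany("INSERT INTO user_values VALUES (?, ?)", pairs)

# Counting distinct users per value means repeated (user, value) rows
# contribute only once to that value's total.
counts = conn.execute(
    """
    SELECT value, COUNT(DISTINCT user_id)
    FROM user_values
    GROUP BY value
    ORDER BY value
    """
).fetchall()

for value, n in counts:
    print(value, n)
```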

SQL: Eliminate rows that have duplicate values in one column

I have a table that may contain duplicate values in one column. For each distinct value in that column I need to select only the row with the smallest index. I have tried many combinations of distinct(), min() and group by but have not been able to figure this one out. This query will be run on SQL Server 2008.
color index user_id organization_code
blue 44 xxx yyy
blue 66 xxx yyy
red 12 aaa bbb
white 55 ccc ddd
white 68 xxx yyy
The query would return the first, third and fourth rows.
blue 44 xxx yyy
red 12 aaa bbb
white 55 ccc ddd
Do not use keywords such as index as column names. Use window functions for this problem; see the example below:
select color, [index], [USER_ID], organization_code from (
select *, ROW_NUMBER() over (partition by color order by [index]) as ranker from table
) Z where ranker = 1
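The ROW_NUMBER() pattern can be checked against the sample rows with Python's built-in sqlite3 module. This is a sketch, not the SQL Server query itself: SQLite (3.25 or later for window functions) is assumed, and the table name `colors` and column name `idx` are placeholders that sidestep the reserved words.

```python
import sqlite3

# Rows from the question: (color, index, user_id, organization_code)
data = [
    ("blue", 44, "xxx", "yyy"),
    ("blue", 66, "xxx", "yyy"),
    ("red", 12, "aaa", "bbb"),
    ("white", 55, "ccc", "ddd"),
    ("white", 68, "xxx", "yyy"),
]

conn = sqlite3.connect(":memory:")
# `colors` and `idx` are hypothetical names used only in this sketch.
conn.execute(
    "CREATE TABLE colors (color TEXT, idx INTEGER, user_id TEXT, organization_code TEXT)"
)
conn.executemany("INSERT INTO colors VALUES (?, ?, ?, ?)", data)

# Number the rows inside each color by ascending idx, then keep only row 1,
# i.e. the row with the smallest idx for that color.
firsts = conn.execute(
    """
    SELECT color, idx, user_id, organization_code
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY color ORDER BY idx) AS ranker
        FROM colors
    ) AS z
    WHERE ranker = 1
    ORDER BY color
    """
).fetchall()

for row in firsts:
    print(row)
```

Unlike a plain GROUP BY with MIN(idx), this keeps the other columns of the winning row without needing a join back to the table.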

how do i combine and sum two results in sql?

I have one table, let's call it 'TBL'.
It has one column that can hold only 3 values (let's say 'AAA', 'BBB', 'CCC').
The values can repeat multiple times.
for example:
TBL
---
Column1
-------
AAA
AAA
BBB
CCC
BBB
CCC
BBB
CCC
AAA
I want to create a result table that looks like this:
TBL-RESULT
----------
AAA+BBB 60%
CCC 40%
I want to show AAA and BBB together on one line with their percentage of all values,
and CCC on a second line in the same way.
The big problem is that I also need to do this in the SQL of Access (2007).
Can someone help me?
Thank you,
gady m
Assuming the table is called MyTable and the column is MyColumn:
select IIF(MyColumn<>'CCC', 'AAA+BBB', 'CCC') as grp,
100*count(*)/(select count(*) from MyTable) as pct
from MyTable
group by IIF(MyColumn<>'CCC', 'AAA+BBB', 'CCC')
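The bucket-then-divide logic can be sketched outside Access using Python's built-in sqlite3 module. This is an illustration under assumptions: SQLite has no IIF in older versions, so a CASE expression stands in for it, and the aliases `grp` and `pct` are made up for this example.

```python
import sqlite3

# Column values copied from the question's sample table
values = ["AAA", "AAA", "BBB", "CCC", "BBB", "CCC", "BBB", "CCC", "AAA"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyColumn TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(v,) for v in values])

# CASE folds AAA and BBB into one bucket (the role IIF plays in Access);
# each bucket's row count is divided by the total row count for a percentage.
shares = conn.execute(
    """
    SELECT CASE WHEN MyColumn <> 'CCC' THEN 'AAA+BBB' ELSE 'CCC' END AS grp,
           100.0 * COUNT(*) / (SELECT COUNT(*) FROM MyTable) AS pct
    FROM MyTable
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

for grp, pct in shares:
    print(grp, round(pct, 1))
```

With the nine sample rows listed in the question, the split works out to roughly 66.7% and 33.3%, so the 60%/40% figures in the desired output appear to be illustrative rather than computed from that exact data.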