Increment and Change counter based on change in a column value - sql

My query returns a result like this:
COLOR_NAME
RED
RED
RED
GREEN
GREEN
BLUE
WHITE
WHITE
WHITE
WHITE
WHITE
WHITE
I need to show a number next to each row of the above result. The desired result looks like this:
COLOR_NAME SORT_NO
RED 10
RED 11
RED 12
GREEN 10
GREEN 11
BLUE 10
WHITE 10
WHITE 11
WHITE 12
WHITE 13
WHITE 14
WHITE 15
How can I achieve this in MS SQL?

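You can use the ROW_NUMBER() function. PARTITION BY COLOR_NAME restarts the numbering for each color, and adding 9 makes each sequence start at 10: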
select COLOR_NAME
, 9 + ROW_NUMBER() OVER (PARTITION BY COLOR_NAME ORDER BY ID) AS Sort_No
from TB_COLOR
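To make the numbering logic concrete, here is a minimal Python sketch of what the partitioned ROW_NUMBER() computes. It is an illustration only, not the SQL Server implementation; the function name `number_within_groups` and the `start` parameter are hypothetical, and the input order stands in for `ORDER BY ID`.

```python
from collections import defaultdict

def number_within_groups(values, start=10):
    """Pair each value with a counter that restarts at `start` for every distinct value."""
    seen = defaultdict(int)  # how many times each value has appeared so far
    numbered = []
    for value in values:
        numbered.append((value, start + seen[value]))
        seen[value] += 1
    return numbered

# Same data as in the question, in ID order (assumed).
colors = ["RED"] * 3 + ["GREEN"] * 2 + ["BLUE"] + ["WHITE"] * 6
result = number_within_groups(colors)
for color, sort_no in result:
    print(color, sort_no)
```

Like `PARTITION BY COLOR_NAME`, this groups by value rather than by consecutive run, so a color that reappears later in the data would continue its earlier sequence instead of restarting.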

Related

How to count the occurrences of a value

How to count the number of occurrences for a histogram using dataframes
import pandas as pd

d = {'color': ["blue", "green", "yellow", "red, blue", "green, yellow", "yellow, red, blue"]}
df = pd.DataFrame(data=d)
How do you go from
color
blue
green
yellow
red, blue
green, yellow
yellow, red, blue
to
color   occurance
blue    3
green   2
yellow  3
Let's split on the regex ,\s* (a comma followed by zero or more whitespace characters), then explode into rows and use value_counts to count each value:
s = (
    df['color'].str.split(r',\s*')
      .explode()
      .value_counts()
      .rename_axis('color')
      .reset_index(name='occurance')
)
Or split with expand=True and then stack:
s = (
    df['color'].str.split(r',\s*', expand=True)
      .stack()
      .value_counts()
      .rename_axis('color')
      .reset_index(name='occurance')
)
s:
color occurance
0 blue 3
1 yellow 3
2 green 2
3 red 2
Here is another way, using .str.get_dummies() to build one indicator column per color and then summing each column:
df['color'].str.get_dummies(sep=', ').sum()
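As a cross-check that does not depend on pandas, the same split-then-count idea can be sketched with the standard library. This is only an illustration under the assumption that the values are plain strings separated by a comma and optional whitespace; the variable names are made up for the example.

```python
import re
from collections import Counter

rows = ["blue", "green", "yellow", "red, blue", "green, yellow", "yellow, red, blue"]

# Split every row on a comma followed by optional whitespace, then count the pieces.
counts = Counter(token for row in rows for token in re.split(r",\s*", row))

for color, occurrences in counts.most_common():
    print(color, occurrences)
```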

How do you “pivot” using conditions, aggregation, and concatenation in Pandas?

I have a dataframe in a format such as the following:
Index Name Fruit Quantity
0 John Apple Red 10
1 John Apple Green 5
2 John Orange Cali 12
3 Jane Apple Red 10
4 Jane Apple Green 5
5 Jane Orange Cali 18
6 Jane Orange Spain 2
I need to turn it into a dataframe such as this:
Index Name All Fruits Apples Total Oranges Total
0 John Apple Red, Apple Green, Orange Cali 15 12
1 Jane Apple Red, Apple Green, Orange Cali, Orange Spain 15 20
Question is how do I do this? I have looked at the groupby docs as well as a number of posts on pivot and aggregation but translating that into this use case somehow escapes me. Any help or pointers much appreciated.
Cheers!
Use GroupBy.agg with join to concatenate the fruit names, create a helper column F from the first word of Fruit, pass it to DataFrame.pivot_table to get the per-fruit totals, and finally combine both results with DataFrame.join:
df1 = df.groupby('Name', sort=False)['Fruit'].agg(', '.join)
df2 = (df.assign(F=df['Fruit'].str.split().str[0])
         .pivot_table(index='Name',
                      columns='F',
                      values='Quantity',
                      aggfunc='sum')
         .add_suffix(' Total'))
df3 = df1.to_frame('All Fruits').join(df2).reset_index()
print(df3)
   Name                                         All Fruits  Apple Total  Orange Total
0  John              Apple Red, Apple Green, Orange Cali           15            12
1  Jane  Apple Red, Apple Green, Orange Cali, Orange Spain           15            20
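The two aggregations in this answer (joining the fruit names per person, and summing quantities per fruit type) can also be traced by hand. The following is a rough standard-library sketch, assuming the fruit type is the first word of each Fruit value, as the answer does; the dictionary layout is an arbitrary choice for illustration.

```python
rows = [
    ("John", "Apple Red", 10),
    ("John", "Apple Green", 5),
    ("John", "Orange Cali", 12),
    ("Jane", "Apple Red", 10),
    ("Jane", "Apple Green", 5),
    ("Jane", "Orange Cali", 18),
    ("Jane", "Orange Spain", 2),
]

summary = {}
for name, fruit, quantity in rows:
    entry = summary.setdefault(name, {"All Fruits": []})
    entry["All Fruits"].append(fruit)
    # Assumption: the first word of the fruit name is its type.
    total_key = fruit.split()[0] + " Total"
    entry[total_key] = entry.get(total_key, 0) + quantity

for name, entry in summary.items():
    print(name, ", ".join(entry["All Fruits"]), entry["Apple Total"], entry["Orange Total"])
```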

SQL query to select records based on existence of required or lack of excluded values

I'm hoping for some assistance in building a simple query that will return a list of names from a given table where an entry containing a required color exists and no entry containing an excluded color exists.
id name color
--- -------- --------
1 james red
2 james blue
3 james green
4 jim red
5 jim purple
6 bob white
7 bob red
8 bob pink
9 charlie white
10 charlie green
11 charlie black
12 kate violet
13 kate pink
14 kate red
I want to select all names where:
there must be a 'red' entry, i.e. excluding charlie
there must not be a 'pink' entry, i.e. excluding kate and bob
i.e.
james - included, has red, does not have pink
jim - included, has red, does not have pink
bob - excluded, has red but also has pink, which is excluded
charlie - excluded, does not have red
kate - excluded, has red, but also has pink, which is excluded
Ideally the output would include the list of distinct names (i.e. james, jim) and the query would allow me to use lists of colors for the required or excluded colors.
Thanks for your help!
You can use aggregation:
select name
from t
where color in ('pink', 'red')
group by name
having min(color) = 'red' and min(color) = max(color);
The WHERE clause keeps only the 'pink' and 'red' rows. The HAVING clause then checks that a name has exactly one distinct color left and that this color is 'red'.
You can also combine IN and NOT IN. The exclusion has to go through a subquery on name, because a plain color NOT IN ('pink') only filters individual rows and would still return bob and kate through their 'red' rows. Example:
SELECT DISTINCT name
FROM t
WHERE color IN ('red')
AND name NOT IN (SELECT name FROM t WHERE color IN ('pink'))
If the lists of inclusions and exclusions are static, you can use the query above.
If the lists are dynamic, such as tables that store the inclusion and exclusion lists, you can replace the static values with SELECT statements.
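Since the question asks for lists of required and excluded colors, here is a hedged sketch of one way to parameterize the check, using conditional aggregation in an in-memory SQLite database from Python. This swaps in a different technique from the answers above. The table name `t` follows the first answer; the `required` and `excluded` lists are illustrative, both are assumed to be non-empty, and "required" is interpreted as "at least one of the listed colors is present".

```python
import sqlite3

rows = [
    ("james", "red"), ("james", "blue"), ("james", "green"),
    ("jim", "red"), ("jim", "purple"),
    ("bob", "white"), ("bob", "red"), ("bob", "pink"),
    ("charlie", "white"), ("charlie", "green"), ("charlie", "black"),
    ("kate", "violet"), ("kate", "pink"), ("kate", "red"),
]
required = ["red"]   # assumption: a name needs at least one of these
excluded = ["pink"]  # assumption: a name must have none of these

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, color TEXT)")
conn.executemany("INSERT INTO t (name, color) VALUES (?, ?)", rows)

# One "?" placeholder per list element keeps the values parameterized.
req_marks = ", ".join("?" * len(required))
exc_marks = ", ".join("?" * len(excluded))
query = f"""
    SELECT name
    FROM t
    GROUP BY name
    HAVING SUM(CASE WHEN color IN ({req_marks}) THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN color IN ({exc_marks}) THEN 1 ELSE 0 END) = 0
    ORDER BY name
"""
names = [row[0] for row in conn.execute(query, required + excluded)]
print(names)
conn.close()
```

If every required color must be present rather than any one of them, the first HAVING condition would need to count distinct matching colors and compare that count with the length of the list.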

Pandas Dataframe rename duplicate values in Row

I have a dataframe
COL1 COL2 COL3
Red Blue Green
Red Yellow Blue
Blue Red Blue
I want to rename a value in the dataframe if it appears two or more times within the same row.
So the expected output is
COL1 COL2 COL3
Red Blue Green
Red Yellow Blue
Blue Red 2Blue
We can use a custom function here which will check if values are duplicated in a row and add an incremental counter to each of them after using series.mask:
def myf(x):
    counter = x.groupby(x).cumcount().add(1).astype(str)
    return x.mask(x.duplicated(), x.radd(counter))
print(df.apply(myf,axis=1))
#or df.T.apply(myf).T
COL1 COL2 COL3
0 Red Blue Green
1 Red Yellow Blue
2 Blue Red 2Blue
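The per-row logic of the function above (count how often a value has been seen so far, and prefix repeats with that count) can be sketched without pandas as follows. The helper name `mark_repeats` is hypothetical and the rows are plain lists rather than Series.

```python
def mark_repeats(row):
    """Prefix the 2nd, 3rd, ... occurrence of a value in a row with its running count."""
    seen = {}
    marked = []
    for value in row:
        seen[value] = seen.get(value, 0) + 1
        # The first occurrence stays unchanged; later ones get the counter prefix.
        marked.append(value if seen[value] == 1 else f"{seen[value]}{value}")
    return marked

table = [
    ["Red", "Blue", "Green"],
    ["Red", "Yellow", "Blue"],
    ["Blue", "Red", "Blue"],
]
renamed = [mark_repeats(row) for row in table]
for row in renamed:
    print(row)
```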

Columns to multiple rows [duplicate]

I would like a macro to convert the following
NAME COLOR1 COLOR2 COLOR3 COLOR4
jane blue pink red teal
john red black green gold
to
NAME COLOR
jane blue
jane pink
jane red
jane teal
john red
john black
john green
john gold
I have tried using the built-in transpose tool, but that does not seem to work. It seems like I need a custom script...
With data in rows 2 and 3, pick a cell and enter:
=INDEX($A$2:$A$9999,ROUNDUP(ROWS($1:1)/4,0))
Next to it enter:
=OFFSET($B$2,ROUNDUP(ROWS($1:1)/4,0)-1,MOD(ROWS($1:1)-1,4))
and copy both formulas down. The 4 in each formula is the number of color columns.
If you really love macros, have the macro deposit and copy the formulas.
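The index arithmetic in the two formulas can be checked with a short Python sketch. Here `n` plays the role of `ROWS($1:1)` as the formula is copied down, `math.ceil(n / width)` mirrors `ROUNDUP(.../4,0)`, and `(n - 1) % width` mirrors `MOD(...-1,4)`. The variable names and list layout are assumptions made for the illustration.

```python
import math

names = ["jane", "john"]
colors = [
    ["blue", "pink", "red", "teal"],
    ["red", "black", "green", "gold"],
]
width = 4  # number of color columns, the 4 in the formulas

long_rows = []
for n in range(1, len(names) * width + 1):
    row = math.ceil(n / width)  # 1, 1, 1, 1, 2, 2, 2, 2: which source row to read
    col = (n - 1) % width       # 0, 1, 2, 3, 0, 1, 2, 3: which color column to read
    long_rows.append((names[row - 1], colors[row - 1][col]))

for name, color in long_rows:
    print(name, color)
```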