Controlling decimal precision after resetting index of unstacked Pandas data frame

My data is as follows:
test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
})
I reset the index and create a pivot table, as follows:
temp_table = test_df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric', 'Sector'], aggfunc='sum')
Then, I create two new series:
s1 = temp_table.reset_index().set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2 = temp_table.reset_index().set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value
Then, I divide one series by the other and unstack:
s1.div(s2).unstack()
Which gives me:
Sector          Germany        USA
Manufacturer
BMW           19.117647  11.052632
Ford          42.592593  13.333333
Mercedes      20.454545  13.829787
Then, I reset the index:
df_out = s1.div(s2).reset_index()
Which gives me:
Manufacturer Sector Value
0 BMW Germany 19.117647
1 BMW USA 11.052632
2 Ford Germany 42.592593
3 Ford USA 13.333333
4 Mercedes Germany 20.454545
5 Mercedes USA 13.829787
I would like to be able to round the Value column to 2 decimal places.
I tried to use the round() function, as follows:
df_out['Value'].round(2)
But, this doesn't seem to affect the values when I call df_out again.
What is the best way to control the decimal precision in this case?
Thanks!
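Note that round() returns a new, rounded Series rather than modifying df_out in place, which is why the values look unchanged afterwards. A minimal sketch of persisting the rounding, using the df_out above:
# assign the rounded Series back to the column
df_out['Value'] = df_out['Value'].round(2)
# or, equivalently, round per column on the frame itself
df_out = df_out.round({'Value': 2})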

Related

Extract words from a column and count frequency

Does anyone know if there's an efficient way to extract all the words from a single column and count the frequency of each word in SQL Server? I only have read-only access to my database, so I can't create a user-defined function to do this.
Here's a reproducible example:
CREATE TABLE words
(
id INT PRIMARY KEY,
text_column VARCHAR(1000)
);
INSERT INTO words (id, text_column)
VALUES
(1, 'SQL Server is a popular database management system'),
(2, 'It is widely used for data storage and retrieval'),
(3, 'SQL Server is a powerful tool for data analysis');
I have found this code but it's not working correctly, and I think it's too complicated to understand:
WITH E1(N) AS
(
SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
) t(N)
),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)
SELECT
LOWER(x.Item) AS [Word],
COUNT(*) AS [Counts]
FROM
(SELECT * FROM words) a
CROSS APPLY
(SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = LTRIM(RTRIM(SUBSTRING(a.text_column, l.N1, l.L1)))
FROM
(SELECT
s.N1,
L1 = ISNULL(NULLIF(CHARINDEX(' ',a.text_column,s.N1),0)-s.N1,4000)
FROM
(SELECT 1
UNION ALL
SELECT t.N+1
FROM
(SELECT TOP (ISNULL(DATALENGTH(a.text_column)/2,0))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4) t(N)
WHERE SUBSTRING(a.text_column ,t.N,1) = ' '
) s(N1)
) l(N1, L1)
) x
WHERE
x.item <> ''
AND x.Item NOT IN ('0o', '0s', '3a', '3b', '3d', '6b', '6o', 'a', 'a1', 'a2', 'a3', 'a4', 'ab', 'able', 'about', 'above', 'abst', 'ac', 'accordance', 'according', 'accordingly', 'across', 'act', 'actually', 'ad', 'added', 'adj', 'ae', 'af', 'affected', 'affecting', 'affects', 'after', 'afterwards', 'ag', 'again', 'against', 'ah', 'ain', 'ain''t', 'aj', 'al', 'all', 'allow', 'allows', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'amoungst', 'amount', 'an', 'and', 'announce', 'another', 'any', 'anybody', 'anyhow', 'anymore', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'ao', 'ap', 'apart', 'apparently', 'appear', 'appreciate', 'appropriate', 'approximately', 'ar', 'are', 'aren', 'arent', 'aren''t', 'arise', 'around', 'as', 'a''s', 'aside', 'ask', 'asking', 'associated', 'at', 'au', 'auth', 'av', 'available', 'aw', 'away', 'awfully', 'ax', 'ay', 'az', 'b', 'b1', 'b2', 'b3', 'ba', 'back', 'bc', 'bd', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'begin', 'beginning', 'beginnings', 'begins', 'behind', 'being', 'believe', 'below', 'beside', 'besides', 'best', 'better', 'between', 'beyond', 'bi', 'bill', 'biol', 'bj', 'bk', 'bl', 'bn', 'both', 'bottom', 'bp', 'br', 'brief', 'briefly', 'bs', 'bt', 'bu', 'but', 'bx', 'by', 'c', 'c1', 'c2', 'c3', 'ca', 'call', 'came', 'can', 'cannot', 'cant', 'can''t', 'cause', 'causes', 'cc', 'cd', 'ce', 'certain', 'certainly', 'cf', 'cg', 'ch', 'changes', 'ci', 'cit', 'cj', 'cl', 'clearly', 'cm', 'c''mon', 'cn', 'co', 'com', 'come', 'comes', 'con', 'concerning', 'consequently', 'consider', 'considering', 'contain', 'containing', 'contains', 'corresponding', 'could', 'couldn', 'couldnt', 'couldn''t', 'course', 'cp', 'cq', 'cr', 'cry', 'cs', 'c''s', 'ct', 'cu', 'currently', 'cv', 'cx', 'cy', 'cz', 'd', 'd2', 'da', 'date', 'dc', 'dd', 'de', 'definitely', 'describe', 'described', 'despite', 'detail', 'df', 'di', 'did', 'didn', 'didn''t', 'different', 'dj', 'dk', 'dl', 'do', 'does', 'doesn', 'doesn''t', 'doing', 'don', 'done', 'don''t', 'down', 'downwards', 'dp', 'dr', 'ds', 'dt', 'du', 'due', 'during', 'dx', 'dy', 'e', 'e2', 'e3', 'ea', 'each', 'ec', 'ed', 'edu', 'ee', 'ef', 'effect', 'eg', 'ei', 'eight', 'eighty', 'either', 'ej', 'el', 'eleven', 'else', 'elsewhere', 'em', 'empty', 'en', 'end', 'ending', 'enough', 'entirely', 'eo', 'ep', 'eq', 'er', 'es', 'especially', 'est', 'et', 'et-al', 'etc', 'eu', 'ev', 'even', 'ever', 'every', 'everybody', 'everyone', 'everything', 'everywhere', 'ex', 'exactly', 'example', 'except', 'ey', 'f', 'f2', 'fa', 'far', 'fc', 'few', 'ff', 'fi', 'fifteen', 'fifth', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'fix', 'fj', 'fl', 'fn', 'fo', 'followed', 'following', 'follows', 'for', 'former', 'formerly', 'forth', 'forty', 'found', 'four', 'fr', 'from', 'front', 'fs', 'ft', 'fu', 'full', 'further', 'furthermore', 'fy', 'g', 'ga', 'gave', 'ge', 'get', 'gets', 'getting', 'gi', 'give', 'given', 'gives', 'giving', 'gj', 'gl', 'go', 'goes', 'going', 'gone', 'got', 'gotten', 'gr', 'greetings', 'gs', 'gy', 'h', 'h2', 'h3', 'had', 'hadn', 'hadn''t', 'happens', 'hardly', 'has', 'hasn', 'hasnt', 'hasn''t', 'have', 'haven', 'haven''t', 'having', 'he', 'hed', 'he''d', 'he''ll', 'hello', 'help', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'heres', 'here''s', 'hereupon', 'hers', 'herself', 'hes', 'he''s', 'hh', 'hi', 'hid', 'him', 'himself', 'his', 'hither', 'hj', 'ho', 'home', 'hopefully', 'how', 'howbeit', 'however', 'how''s', 
'hr', 'hs', 'http', 'hu', 'hundred', 'hy', 'i', 'i2', 'i3', 'i4', 'i6', 'i7', 'i8', 'ia', 'ib', 'ibid', 'ic', 'id', 'i''d', 'ie', 'if', 'ig', 'ignored', 'ih', 'ii', 'ij', 'il', 'i''ll', 'im', 'i''m', 'immediate', 'immediately', 'importance', 'important', 'in', 'inasmuch', 'inc', 'indeed', 'index', 'indicate', 'indicated', 'indicates', 'information', 'inner', 'insofar', 'instead', 'interest', 'into', 'invention', 'inward', 'io', 'ip', 'iq', 'ir', 'is', 'isn', 'isn''t', 'it', 'itd', 'it''d', 'it''ll', 'its', 'it''s', 'itself', 'iv', 'i''ve', 'ix', 'iy', 'iz', 'j', 'jj', 'jr', 'js', 'jt', 'ju', 'just', 'k', 'ke', 'keep', 'keeps', 'kept', 'kg', 'kj', 'km', 'know', 'known', 'knows', 'ko', 'l', 'l2', 'la', 'largely', 'last', 'lately', 'later', 'latter', 'latterly', 'lb', 'lc', 'le', 'least', 'les', 'less', 'lest', 'let', 'lets', 'let''s', 'lf', 'like', 'liked', 'likely', 'line', 'little', 'lj', 'll', 'll', 'ln', 'lo', 'look', 'looking', 'looks', 'los', 'lr', 'ls', 'lt', 'ltd', 'm', 'm2', 'ma', 'made', 'mainly', 'make', 'makes', 'many', 'may', 'maybe', 'me', 'mean', 'means', 'meantime', 'meanwhile', 'merely', 'mg', 'might', 'mightn', 'mightn''t', 'mill', 'million', 'mine', 'miss', 'ml', 'mn', 'mo', 'more', 'moreover', 'most', 'mostly', 'move', 'mr', 'mrs', 'ms', 'mt', 'mu', 'much', 'mug', 'must', 'mustn', 'mustn''t', 'my', 'myself', 'n', 'n2', 'na', 'name', 'namely', 'nay', 'nc', 'nd', 'ne', 'near', 'nearly', 'necessarily', 'necessary', 'need', 'needn', 'needn''t', 'needs', 'neither', 'never', 'nevertheless', 'new', 'next', 'ng', 'ni', 'nine', 'ninety', 'nj', 'nl', 'nn', 'no', 'nobody', 'non', 'none', 'nonetheless', 'noone', 'nor', 'normally', 'nos', 'not', 'noted', 'nothing', 'novel', 'now', 'nowhere', 'nr', 'ns', 'nt', 'ny', 'o', 'oa', 'ob', 'obtain', 'obtained', 'obviously', 'oc', 'od', 'of', 'off', 'often', 'og', 'oh', 'oi', 'oj', 'ok', 'okay', 'ol', 'old', 'om', 'omitted', 'on', 'once', 'one', 'ones', 'only', 'onto', 'oo', 'op', 'oq', 'or', 'ord', 'os', 'ot', 'other', 'others', 'otherwise', 'ou', 'ought', 'our', 'ours', 'ourselves', 'out', 'outside', 'over', 'overall', 'ow', 'owing', 'own', 'ox', 'oz', 'p', 'p1', 'p2', 'p3', 'page', 'pagecount', 'pages', 'par', 'part', 'particular', 'particularly', 'pas', 'past', 'pc', 'pd', 'pe', 'per', 'perhaps', 'pf', 'ph', 'pi', 'pj', 'pk', 'pl', 'placed', 'please', 'plus', 'pm', 'pn', 'po', 'poorly', 'possible', 'possibly', 'potentially', 'pp', 'pq', 'pr', 'predominantly', 'present', 'presumably', 'previously', 'primarily', 'probably', 'promptly', 'proud', 'provides', 'ps', 'pt', 'pu', 'put', 'py', 'q', 'qj', 'qu', 'que', 'quickly', 'quite', 'qv', 'r', 'r2', 'ra', 'ran', 'rather', 'rc', 'rd', 're', 'readily', 'really', 'reasonably', 'recent', 'recently', 'ref', 'refs', 'regarding', 'regardless', 'regards', 'related', 'relatively', 'research', 'research-articl', 'respectively', 'resulted', 'resulting', 'results', 'rf', 'rh', 'ri', 'right', 'rj', 'rl', 'rm', 'rn', 'ro', 'rq', 'rr', 'rs', 'rt', 'ru', 'run', 'rv', 'ry', 's', 's2', 'sa', 'said', 'same', 'saw', 'say', 'saying', 'says', 'sc', 'sd', 'se', 'sec', 'second', 'secondly', 'section', 'see', 'seeing', 'seem', 'seemed', 'seeming', 'seems', 'seen', 'self', 'selves', 'sensible', 'sent', 'serious', 'seriously', 'seven', 'several', 'sf', 'shall', 'shan', 'shan''t', 'she', 'shed', 'she''d', 'she''ll', 'shes', 'she''s', 'should', 'shouldn', 'shouldn''t', 'should''ve', 'show', 'showed', 'shown', 'showns', 'shows', 'si', 'side', 'significant', 'significantly', 'similar', 'similarly', 'since', 'sincere', 'six', 
'sixty', 'sj', 'sl', 'slightly', 'sm', 'sn', 'so', 'some', 'somebody', 'somehow', 'someone', 'somethan', 'something', 'sometime', 'sometimes', 'somewhat', 'somewhere', 'soon', 'sorry', 'sp', 'specifically', 'specified', 'specify', 'specifying', 'sq', 'sr', 'ss', 'st', 'still', 'stop', 'strongly', 'sub', 'substantially', 'successfully', 'such', 'sufficiently', 'suggest', 'sup', 'sure', 'sy', 'system', 'sz', 't', 't1', 't2', 't3', 'take', 'taken', 'taking', 'tb', 'tc', 'td', 'te', 'tell', 'ten', 'tends', 'tf', 'th', 'than', 'thank', 'thanks', 'thanx', 'that', 'that''ll', 'thats', 'that''s', 'that''ve', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'thered', 'therefore', 'therein', 'there''ll', 'thereof', 'therere', 'theres', 'there''s', 'thereto', 'thereupon', 'there''ve', 'these', 'they', 'theyd', 'they''d', 'they''ll', 'theyre', 'they''re', 'they''ve', 'thickv', 'thin', 'think', 'third', 'this', 'thorough', 'thoroughly', 'those', 'thou', 'though', 'thoughh', 'thousand', 'three', 'throug', 'through', 'throughout', 'thru', 'thus', 'ti', 'til', 'tip', 'tj', 'tl', 'tm', 'tn', 'to', 'together', 'too', 'took', 'top', 'toward', 'towards', 'tp', 'tq', 'tr', 'tried', 'tries', 'truly', 'try', 'trying', 'ts', 't''s', 'tt', 'tv', 'twelve', 'twenty', 'twice', 'two', 'tx', 'u', 'u201d', 'ue', 'ui', 'uj', 'uk', 'um', 'un', 'under', 'unfortunately', 'unless', 'unlike', 'unlikely', 'until', 'unto', 'uo', 'up', 'upon', 'ups', 'ur', 'us', 'use', 'used', 'useful', 'usefully', 'usefulness', 'uses', 'using', 'usually', 'ut', 'v', 'va', 'value', 'various', 'vd', 've', 've', 'very', 'via', 'viz', 'vj', 'vo', 'vol', 'vols', 'volumtype', 'vq', 'vs', 'vt', 'vu', 'w', 'wa', 'want', 'wants', 'was', 'wasn', 'wasnt', 'wasn''t', 'way', 'we', 'wed', 'we''d', 'welcome', 'well', 'we''ll', 'well-b', 'went', 'were', 'we''re', 'weren', 'werent', 'weren''t', 'we''ve', 'what', 'whatever', 'what''ll', 'whats', 'what''s', 'when', 'whence', 'whenever', 'when''s', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'wheres', 'where''s', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whim', 'whither', 'who', 'whod', 'whoever', 'whole', 'who''ll', 'whom', 'whomever', 'whos', 'who''s', 'whose', 'why', 'why''s', 'wi', 'widely', 'will', 'willing', 'wish', 'with', 'within', 'without', 'wo', 'won', 'wonder', 'wont', 'won''t', 'words', 'world', 'would', 'wouldn', 'wouldnt', 'wouldn''t', 'www', 'x', 'x1', 'x2', 'x3', 'xf', 'xi', 'xj', 'xk', 'xl', 'xn', 'xo', 'xs', 'xt', 'xv', 'xx', 'y', 'y2', 'yes', 'yet', 'yj', 'yl', 'you', 'youd', 'you''d', 'you''ll', 'your', 'youre', 'you''re', 'yours', 'yourself', 'yourselves', 'you''ve', 'yr', 'ys', 'yt', 'z', 'zero', 'zi', 'zz')
GROUP BY x.Item
ORDER BY COUNT(*) DESC
Here's the result of the above code; as you can see, it's not counting correctly:
Word Counts
server 2
sql 2
data 1
database 1
popular 1
powerful 1
Can anyone help with this? It would be really appreciated!
You can make use of STRING_SPLIT here, such as:
select value Word, Count(*) Counts
from words
cross apply String_Split(text_column, ' ')
where value not in (exclude list)
group by value
order by counts desc;
You should use the STRING_SPLIT function -- like this:
SELECT id, value as aword
FROM words
CROSS APPLY STRING_SPLIT(text_column, ' ');
This will create a table with all the words by id -- to get the count do this:
SELECT aword, count(*) as counts
FROM (
SELECT id, value as aword
FROM words
CROSS APPLY STRING_SPLIT(text_column, ' ')
) x
GROUP BY aword
You may need to lower-case the column with LOWER(text_column) if you want the counting to be case-insensitive.
If you don't have access to the STRING_SPLIT function, you can use a weird XML trick to convert each space into a word node and then shred it with the nodes() function:
select word, COUNT(*)
from (
select n.value('.', 'nvarchar(50)') AS word
from (
VALUES
(1, 'SQL Server is a popular database management system'),
(2, 'It is widely used for data storage and retrieval'),
(3, 'SQL Server is a powerful tool for data analysis')
) AS t (id, txt)
CROSS APPLY (
SELECT CAST('<x>' + REPLACE(txt, ' ', '</x><x>') + '</x>' AS XML) x
) x
CROSS APPLY x.nodes('x') z(n)
) w
GROUP BY word
Of course, this will fail on "bad" words and invalid XML characters, but it can be worked around. Text processing has never been SQL Server's strong point, though, so it's probably better to use an NLP library for this kind of task.

Pandas: How to filter rows in dataframe which are not equal to the combination of columns in other dataframe?

Below are the two dataframes. I am trying to keep the rows in df_2 whose (Name_1, Name_2) combination does not appear in df_count. How can I achieve this?
import pandas as pd
df_1 = pd.DataFrame({'Name_1':['tom', 'jack', 'tom', 'jack', 'tom', 'nick', 'tom', 'jack', 'tom', 'jack'],
'Name_2':['sam', 'sam', 'ruby', 'sam','sam', 'jack', 'ruby', 'sam','ruby', 'sam']})
df_count = df_1.groupby(['Name_1','Name_2']).size().reset_index().rename(columns={0:'count'}).sort_values(['count'], ascending = False)
df_count = df_count.head(2)
df_count = df_count[['Name_1','Name_2']]
df_2 = pd.DataFrame({'Name_1':['tom', 'nick', 'tom', 'jack', 'tom', 'nick', 'tom', 'jack'],
'Name_2':['sam', 'mike', 'ruby', 'sam', 'sam', 'jack', 'ruby', 'sam'],
'Salary':[200, 500, 1000, 7000, 100, 300, 1200, 900],
'Currency':['AUD', 'CAD', 'JPY', 'USD', 'GBP', 'CAD', 'INR', 'USD']})
pd.merge(df_2,df_count, indicator=True, how='outer').query('_merge=="left_only"').drop('_merge', axis=1)
Output:
Name_1 Name_2 Salary Currency
0 tom sam 200 AUD
1 tom sam 100 GBP
2 nick mike 500 CAD
7 nick jack 300 CAD
Answer taken from here.
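As a sketch of an alternative, the same anti-join can be written with an index membership test instead of a merge:
# mark the df_2 rows whose (Name_1, Name_2) pair appears in df_count
mask = df_2.set_index(['Name_1', 'Name_2']).index.isin(
    df_count.set_index(['Name_1', 'Name_2']).index)
# keep only the rows whose pair is not in df_count
df_2[~mask]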

How to correctly use map and np.where to replace values in a column

This is the city column of my dataframe (df.city):
array(['la', 'hollywood', 'pasadena', 'los angeles', 'new york',
'studio city', 'venice', 'santa monica', 'mar vista',
'beverly hills', 'w. hollywood', 'encino', 'st. boyle hts .',
'westlake village', 'westwood', 'west la', 'chinatown',
'monterey park', 'rancho park', 'redondo beach', 'long beach',
'marina del rey', 'culver city', 'burbank', 'century city',
'malibu', 'seal beach', 'northridge', 'st. hermosa beach'],
dtype=object)
I want the strings containing 'la' or 'hollywood' to be converted to 'los angeles'. How do I do this? I was using np.where(condition, x, y) for this, but its third argument (y) let me down.
To replace the rest of the cities i made this dictionary
cities={'studio city':'los angeles', 'santa monika':'los angeles', 'mar vista':'los angeles', 'beverly hills':'los angeles', 'encino':'los angeles', 'st. boyle hts .':'los angeles', 'westwood':'los angeles', 'chinatown':'los angeles', 'moterey park':'los angeles', 'rancho park':'los angeles', 'redondo beach':'los angeles', 'century city':'los angeles', 'marina del rey':'los angeles', 'malibu':'los angeles', 'seal beach':'los angeles', 'northridge':'los angeles','st. hermosa beach':'los angeles'}
When I use df.city.map(cities), it maps the keys present in the dictionary and replaces the others, such as 'los angeles', with NaNs.
How can I go about cleaning this column of my dataframe?
You could use np.where like this:
df['city'] = np.where(df['city'].str.contains('la') | df['city'].str.contains('hollywood'), 'los angeles', df['city'])
The third argument is just the original column.
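For the dictionary step, a common pattern (a sketch, using the cities dictionary above) is to fill the unmapped entries back in from the original column so they are not left as NaN:
# map the keys found in `cities`, then restore everything map() turned into NaN
df['city'] = df['city'].map(cities).fillna(df['city'])
Equivalently, df['city'].replace(cities) only touches values that appear as dictionary keys and leaves the rest alone.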

How to reshape an unstacked Pandas data frame to "long" form before passing it to a plotting function

I'm trying to make a simple bar plot displaying ratios using the Plotly px.bar() function.
I have the following data set:
test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
})
I reset the index and create a pivot table, as follows:
temp_table = test_df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric', 'Sector'], aggfunc='sum')
Then, I create two new series:
s1 = temp_table.reset_index().set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2 = temp_table.reset_index().set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value
Then, I divide one series by the other and unstack:
s1.div(s2).unstack()
Which gives me:
Sector          Germany        USA
Manufacturer
BMW           19.117647  11.052632
Ford          42.592593  13.333333
Mercedes      20.454545  13.829787
I'd like to be able to make a bar plot using the data above, with Manufacturer on the x-axis and bars colored by Sector.
To do so, I think I need the data to be in the following long form:
Manufacturer Sector Ratio
BMW Germany 19.117647
Ford Germany 42.592593
Mercedes Germany 20.454545
BMW USA 11.052632
Ford USA 13.333333
Mercedes USA 13.829787
Question: how would I reshape the unstacked data above such that I would be able to pass it to the Plotly px.bar() function, which requires the following for the x-axis and y-axis arguments:
x (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the x axis in cartesian coordinates. Either x or y can optionally be a list of column references or array_likes, in which case the data will be treated as if it were ‘wide’ rather than ‘long’.
Thanks in advance!
Just do not unstack:
df_out = s1.div(s2).reset_index()
This should give you the bar chart you have up there.
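As a minimal sketch of the plotting call itself, the long-form df_out can be passed straight to px.bar():
import plotly.express as px

df_out = s1.div(s2).reset_index()
# 'Value' holds the Orders/Sales ratio produced by the division above
fig = px.bar(df_out, x='Manufacturer', y='Value', color='Sector', barmode='group')
fig.show()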
Alternatively, for a quick bar chart of the summed values (rather than the ratio) with pandas' built-in plotting:
test_df.groupby(['Manufacturer', 'Sector'])['Value'].sum().unstack('Sector').plot.bar()

How do I use count and group by as a condition for another table

So to preface, I'm a first-year comp sci student and we've only just started on SQL, so forgive me if the solution seems obvious.
We were given a database for a zoo, which has tables for Animals, Keepers, and a link entity (if that's the right word) for care roles, connecting the two.
(Schema below)
CREATE TABLE Animal (
    ID VARCHAR(6) PRIMARY KEY, Name VARCHAR(10), Species VARCHAR(20),
    Age SMALLINT, Sex VARCHAR(1), Weight SMALLINT,
    F_ID VARCHAR(6), M_ID VARCHAR(6)
);
CREATE TABLE Keeper (
    Staff_ID VARCHAR(6) PRIMARY KEY, Keeper_Name VARCHAR(20),
    Specialisation VARCHAR(20)
);
CREATE TABLE Care_Role (
    ID VARCHAR(6), Staff_ID VARCHAR(6), Role VARCHAR(10),
    PRIMARY KEY (ID, Staff_ID)
);
Now the task we've been given is to work out which Keepers have been caring for more than 10 animals of the same species using the following data:
INSERT INTO Animal VALUES
('11', 'Horace', 'Marmoset', 99, 'M', 5, '2','1'),
('12', 'sghgdht', 'Marmoset', 42, 'M', 3, '2','1'),
('13', 'xgnyn', 'Marmoset', 37, 'F', 3, '1','11'),
('14', 'sbfdfbng', 'Marmoset', 12, 'F', 3, '1','11'),
('15', 'fdghd', 'Marmoset', 12, 'M', 3, '1','11'),
('16', 'Fred', 'Marmoset', 6, 'M', 3, '15','1'),
('17', 'Mary', 'Marmoset', 3, 'F', 3, '8','14'),
('18', 'Jane', 'Marmoset', 5, 'F', 3, '7','13'),
('19', 'dfgjtjt', 'Marmoset', 5, 'M', 3, '16','17'),
('20', 'Eric', 'Marmoset', 5, 'M', 3, '12','13'),
('21', 'tukyufyu', 'Marmoset', 5, 'M', 3, '12','73'),
('31', 'hgndghmd', 'Giraffe', 99, 'M', 5, '201','1'),
('32', 'sghgdht', 'Giraffe', 42, 'M', 3, '201','1'),
('33', 'xgnyn', 'Giraffe', 37, 'F', 3, '111','1'),
('34', 'sbfdfbng', 'Giraffe', 12, 'F', 3, '111','1'),
('35', 'fdghd', 'Giraffe', 12, 'M', 3, '111','6'),
('36', 'Fred', 'Lion', 6, 'M', 3, '151','111'),
('37', 'Mary', 'Lion', 3, 'F', 3, '81','114'),
('38', 'Jane', 'Lion', 5, 'F', 3, '71','113'),
('39', 'Kingsly', 'Lion', 9, 'M', 3, '161','117'),
('40', 'Eric', 'Lion', 11, 'M', 3, '121','113'),
('41', 'tukyufyu', 'Lion', 2, 'M', 3, '121','173'),
('61', 'hgndghmd', 'Elephant', 6, 'F', 225, '201','111'),
('62', 'sghgdht', 'Elephant', 10, 'F', 230, '201','111'),
('63', 'xgnyn', 'Elephant', 5, 'F', 300, '111','121'),
('64', 'sbfdfbng', 'Elephant', 11, 'F', 173, '111','121'),
('65', 'fdghd', 'Elephant', 12, 'F', 231, '111','666'),
('66', 'Fred', 'Elephant', 17, 'F', 333, '151','147'),
('67', 'Mary', 'Elephant', 3, 'F', 272, '81','148'),
('68', 'Jane', 'Elephant', 8, 'F', 47, '71','136'),
('69', 'dfgjtjt', 'Elephant', 9, 'F', 131, '161','172'),
('70', 'Eric', 'Elephant', 10, 'F', 333, '121','136'),
('71', 'tukyufyu', 'Elephant', 7, 'M', 114, '121','731');
INSERT INTO Keeper VALUES
('1', 'Roger', 'tdfhuihiu'),
('2', 'Sidra', 'rgegegtnrty'),
('3', 'Amit', 'ergetetnt'),
('4', 'Lucia', 'dvojivhwivih');
INSERT INTO Care_Role VALUES
('32', '1', 'feeding'),
('32', '2', 'washing'),
('61', '1', 'feeding'),
('62', '1', 'feeding'),
('63', '1', 'feeding'),
('64', '1', 'feeding'),
('65', '1', 'feeding'),
('66', '1', 'feeding'),
('67', '1', 'feeding'),
('68', '1', 'feeding'),
('69', '1', 'feeding'),
('70', '1', 'feeding'),
('71', '1', 'feeding'),
('11', '4', 'feeding'),
('12', '4', 'feeding'),
('13', '4', 'feeding'),
('14', '4', 'feeding'),
('15', '4', 'feeding'),
('16', '4', 'feeding'),
('17', '4', 'feeding'),
('18', '4', 'feeding'),
('19', '4', 'feeding'),
('20', '4', 'feeding'),
('21', '4', 'feeding');
So far what I've managed to come up with is this:
SELECT Keeper.Keeper_Name, Animal.Species, COUNT(Animal.Species)
FROM Keeper
JOIN Care_Role
ON Keeper.Staff_ID = Care_Role.Staff_ID
JOIN Animal
ON Care_Role.ID = Animal.ID
GROUP BY Animal.Species
But this returns more than just the name (which is all I want), and it shows everyone who has looked after animals rather than just those who have looked after 10 or more. I was wondering if anyone had any ideas on how to fix this? Many thanks!
Your query should be returning an error, because Keeper.Keeper_name is not in the GROUP BY. You have made a good attempt. A reasonable way to start the query is:
SELECT k.Keeper_Name, a.Species, COUNT(*)
FROM Keeper k JOIN
Care_Role cr
ON k.Staff_ID = cr.Staff_ID JOIN
Animal a
ON cr.ID = a.ID
GROUP BY k.Keeper_Name, a.Species;
This will return the number of animals of a given species that each keeper cares for.
Note the following:
Table aliases are abbreviations for the table.
All column names are qualified.
This uses the shorthand of COUNT(*) instead of counting some particular column.
Your question adds an additional condition about 10 animals. You can fit that in using a HAVING clause.
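For instance (a sketch, reading "more than 10" from the task statement):
SELECT k.Keeper_Name, a.Species, COUNT(*) AS Num_Animals
FROM Keeper k JOIN
     Care_Role cr
     ON k.Staff_ID = cr.Staff_ID JOIN
     Animal a
     ON cr.ID = a.ID
GROUP BY k.Keeper_Name, a.Species
-- HAVING filters groups after aggregation, which WHERE cannot do
HAVING COUNT(*) > 10;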