How to mock the results of BigQueryIO.read for unit testing? - google-bigquery
I have an Apache Beam pipeline that reads data from BigQuery using a query that joins multiple tables.
I want to test the entire pipeline locally using mock data (i.e. without connecting to BigQuery).
Can I do this using .withTestServices(fakeBigQueryServices)? I could not find any relevant examples. Has anyone tried this approach or has suggestions on how this can be done?
String query = "Select o.*, p.name from Order o, Product p where o.product_id = p.id and o.created_on = '20220210'";

pipeline.apply("read data", BigQueryIO.read(input -> new OrderMapper().mapRow(input.getRecord()))
    .withCoder(SerializableCoder.of(Order.class))
    .fromQuery(query)
    .withoutValidation());
I don't think you will find mock BigQuery services in Apache Beam.
Aside from that, there are options available within Apache Beam that you can use to mimic a database and build your own transform that behaves in a similar way. I will share an updated version of the Reading from a SQLite database example from the Beam project for visibility:
1. Install these libraries
!pip install --upgrade apache-beam
!pip install --upgrade httplib2==0.20.0
# Note: you might encounter issues with the library installation; reinstall if necessary.
2. Create a mock database (using sqlite3)
import sqlite3
database_file = "moon-phases.db" ##param {type:"string"}
with sqlite3.connect(database_file) as db:
cursor = db.cursor()
# Create the moon_phases table.
cursor.execute('''
CREATE TABLE IF NOT EXISTS moon_phases (
id INTEGER PRIMARY KEY,
phase_emoji TEXT NOT NULL,
peak_datetime DATETIME NOT NULL,
phase TEXT NOT NULL)''')
# Truncate the table if it's already populated.
cursor.execute('DELETE FROM moon_phases')
# Insert some sample data.
insert_moon_phase = 'INSERT INTO moon_phases(phase_emoji, peak_datetime, phase) VALUES(?, ?, ?)'
cursor.execute(insert_moon_phase, ('🌕', '2017-12-03 15:47:00', 'Full Moon'))
cursor.execute(insert_moon_phase, ('🌗', '2017-12-10 07:51:00', 'Last Quarter'))
cursor.execute(insert_moon_phase, ('🌑', '2017-12-18 06:30:00', 'New Moon'))
cursor.execute(insert_moon_phase, ('🌓', '2017-12-26 09:20:00', 'First Quarter'))
cursor.execute(insert_moon_phase, ('🌕', '2018-01-02 02:24:00', 'Full Moon'))
cursor.execute(insert_moon_phase, ('🌗', '2018-01-08 22:25:00', 'Last Quarter'))
cursor.execute(insert_moon_phase, ('🌑', '2018-01-17 02:17:00', 'New Moon'))
cursor.execute(insert_moon_phase, ('🌓', '2018-01-24 22:20:00', 'First Quarter'))
cursor.execute(insert_moon_phase, ('🌕', '2018-01-31 13:27:00', 'Full Moon'))
# Query for the data in the table to make sure it's populated.
cursor.execute('SELECT * FROM moon_phases')
for row in cursor.fetchall():
print(row)
3. Run the pipeline queries
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import sqlite3
from typing import Iterable, List, Tuple, Dict
class SQLiteSelect(beam.DoFn):
  def __init__(self, database_file: str):
    self.database_file = database_file
    self.connection = None

  def setup(self):
    self.connection = sqlite3.connect(self.database_file)

  def process(self, query: Tuple[str, List[str]]) -> Iterable[Dict[str, str]]:
    table, columns = query
    cursor = self.connection.cursor()
    cursor.execute(f"SELECT {','.join(columns)} FROM {table}")
    for row in cursor.fetchall():
      yield dict(zip(columns, row))

  def teardown(self):
    self.connection.close()

@beam.ptransform_fn
@beam.typehints.with_output_types(Dict[str, str])
def SelectFromSQLite(
    pbegin: beam.pvalue.PBegin,
    database_file: str,
    queries: List[Tuple[str, List[str]]],
) -> beam.PCollection[Dict[str, str]]:
  return (
      pbegin
      | 'Create queries' >> beam.Create(queries)
      | 'SQLite SELECT' >> beam.ParDo(SQLiteSelect(database_file))
  )
queries = [
    # (table_name, [column1, column2, ...])
    ('moon_phases', ['phase_emoji', 'peak_datetime', 'phase']),
    ('moon_phases', ['phase_emoji', 'phase']),
]

options = PipelineOptions(flags=[], type_check_additional='all')
with beam.Pipeline(options=options) as pipeline:
  (
      pipeline
      | 'Read from SQLite' >> SelectFromSQLite(database_file, queries)
      | 'Print rows' >> beam.Map(print)
  )
Output:
{'phase_emoji': '🌕', 'peak_datetime': '2017-12-03 15:47:00', 'phase': 'Full Moon'}
{'phase_emoji': '🌗', 'peak_datetime': '2017-12-10 07:51:00', 'phase': 'Last Quarter'}
{'phase_emoji': '🌑', 'peak_datetime': '2017-12-18 06:30:00', 'phase': 'New Moon'}
{'phase_emoji': '🌓', 'peak_datetime': '2017-12-26 09:20:00', 'phase': 'First Quarter'}
{'phase_emoji': '🌕', 'peak_datetime': '2018-01-02 02:24:00', 'phase': 'Full Moon'}
{'phase_emoji': '🌗', 'peak_datetime': '2018-01-08 22:25:00', 'phase': 'Last Quarter'}
{'phase_emoji': '🌑', 'peak_datetime': '2018-01-17 02:17:00', 'phase': 'New Moon'}
{'phase_emoji': '🌓', 'peak_datetime': '2018-01-24 22:20:00', 'phase': 'First Quarter'}
{'phase_emoji': '🌕', 'peak_datetime': '2018-01-31 13:27:00', 'phase': 'Full Moon'}
{'phase_emoji': '🌕', 'phase': 'Full Moon'}
{'phase_emoji': '🌗', 'phase': 'Last Quarter'}
{'phase_emoji': '🌑', 'phase': 'New Moon'}
{'phase_emoji': '🌓', 'phase': 'First Quarter'}
{'phase_emoji': '🌕', 'phase': 'Full Moon'}
{'phase_emoji': '🌗', 'phase': 'Last Quarter'}
{'phase_emoji': '🌑', 'phase': 'New Moon'}
{'phase_emoji': '🌓', 'phase': 'First Quarter'}
{'phase_emoji': '🌕', 'phase': 'Full Moon'}
For more Beam examples (in either Java or Python) see the examples folder of the Beam GitHub project. Be advised that you might have to apply some adjustments, as some examples are a bit deprecated or missing some imports.
As a final note on the testing aspect of your question: I think you can use PAssert to create test pipelines, as described in its official documentation.
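To make that concrete, here is a minimal sketch of such a test in Java. It assumes the Order class, a hypothetical constructor for it, and the rest of your pipeline from the question; the idea is simply to swap the BigQueryIO.read(...) source for an in-memory Create.of(...) source of mock rows and assert on the output with PAssert:

import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

public class OrderPipelineTest {

  @Rule
  public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void ordersAreProcessed() {
    // Hypothetical mock row; Order needs a sensible equals()/hashCode() for the assertion.
    Order order = new Order("order-1", "product-1", "Dog food");

    // Replace the BigQueryIO.read(...) source with an in-memory Create.of(...) source.
    PCollection<Order> orders = pipeline.apply("mock read",
        Create.of(order).withCoder(SerializableCoder.of(Order.class)));

    // Apply the downstream transforms of your real pipeline here, then assert on the result.
    PAssert.that(orders).containsInAnyOrder(order);

    pipeline.run().waitUntilFinish();
  }
}

For this to work in practice, the source transform would have to be injectable in your pipeline-building code, so that production uses BigQueryIO and the test passes in Create.of(...) instead.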