Athena query error : extraneous input 'select' expecting

Athena query error : extraneous input 'select' expecting - sql

Got an error while running query :
extraneous input 'select' expecting {'(', 'add', 'all', 'some', 'any', 'at', 'no', 'substring', 'position', 'tinyint', 'smallint', 'integer', 'date', 'time', 'timestamp', 'interval', 'year', 'month', 'day', 'hour', 'minute', 'second', 'zone', 'filter', 'over', 'partition', 'range', 'rows', 'preceding', 'following', 'current', 'row', 'schema', 'comment', 'view', 'replace', 'grant', 'revoke', 'privileges', 'public', 'option', 'explain', 'analyze', 'format', 'type', 'text', 'graphviz', 'logical', 'distributed', 'validate', 'show', 'tables', 'views', 'schemas', 'catalogs', 'columns', 'column', 'use', 'partitions', 'functions', 'to', 'system', 'bernoulli', 'poissonized', 'tablesample', 'unnest', 'array', 'map', 'set', 'reset', 'session', 'data', 'start', 'transaction', 'commit', 'rollback', 'work', 'isolation', 'level', 'serializable', 'repeatable', 'committed', 'uncommitted', 'read', 'write', 'only', 'call', 'input', 'output', 'cascade', 'restrict', 'including', 'excluding', 'properties', 'function', 'lambda_invoke', 'returns', 'sagemaker_invoke_endpoint', 'nfd', 'nfc', 'nfkd', 'nfkc', 'if', 'nullif', 'coalesce', identifier, digit_identifier, quoted_identifier, backquoted_identifier}
query is right join table ss with dd ,like:
select * from
(
select platform, id, nextMonth
FROM "logs"."user_records" as ss
right join
select id as idRight, platform, month
FROM "logs"."user_records" as dd
on ss.platform = dd.platform and ss.userid = dd.useridRight and ss.nextMonth=dd.month )

You probably need to surround the subquery after your right join in parentheses. Untested, but I'd guess:
select * from
(
select platform, id, nextMonth
FROM "logs"."user_records" as ss
right join
( select id as idRight, platform, month
FROM "logs"."user_records" ) as dd
on ss.platform = dd.platform and ss.userid = dd.useridRight and ss.nextMonth=dd.month )

Related

Extract words from a column and count frequency

Does anyone know if there's an efficient way to extract all the words from a single column and count the frequency of each word in SQL Server? I only have read-only access to my database so I can't create a self-defined function to do this.
Here's a reproducible example:
CREATE TABLE words
(
id INT PRIMARY KEY,
text_column VARCHAR(1000)
);
INSERT INTO words (id, text_column)
VALUES
(1, 'SQL Server is a popular database management system'),
(2, 'It is widely used for data storage and retrieval'),
(3, 'SQL Server is a powerful tool for data analysis');
I have found this code but it's not working correctly, and I think it's too complicated to understand:
WITH E1(N) AS
(
SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
) t(N)
),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)
SELECT
LOWER(x.Item) AS [Word],
COUNT(*) AS [Counts]
FROM
(SELECT * FROM words) a
CROSS APPLY
(SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = LTRIM(RTRIM(SUBSTRING(a.text_column, l.N1, l.L1)))
FROM
(SELECT
s.N1,
L1 = ISNULL(NULLIF(CHARINDEX(' ',a.text_column,s.N1),0)-s.N1,4000)
FROM
(SELECT 1
UNION ALL
SELECT t.N+1
FROM
(SELECT TOP (ISNULL(DATALENGTH(a.text_column)/2,0))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4) t(N)
WHERE SUBSTRING(a.text_column ,t.N,1) = ' '
) s(N1)
) l(N1, L1)
) x
WHERE
x.item <> ''
AND x.Item NOT IN ('0o', '0s', '3a', '3b', '3d', '6b', '6o', 'a', 'a1', 'a2', 'a3', 'a4', 'ab', 'able', 'about', 'above', 'abst', 'ac', 'accordance', 'according', 'accordingly', 'across', 'act', 'actually', 'ad', 'added', 'adj', 'ae', 'af', 'affected', 'affecting', 'affects', 'after', 'afterwards', 'ag', 'again', 'against', 'ah', 'ain', 'ain''t', 'aj', 'al', 'all', 'allow', 'allows', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'amoungst', 'amount', 'an', 'and', 'announce', 'another', 'any', 'anybody', 'anyhow', 'anymore', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'ao', 'ap', 'apart', 'apparently', 'appear', 'appreciate', 'appropriate', 'approximately', 'ar', 'are', 'aren', 'arent', 'aren''t', 'arise', 'around', 'as', 'a''s', 'aside', 'ask', 'asking', 'associated', 'at', 'au', 'auth', 'av', 'available', 'aw', 'away', 'awfully', 'ax', 'ay', 'az', 'b', 'b1', 'b2', 'b3', 'ba', 'back', 'bc', 'bd', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'begin', 'beginning', 'beginnings', 'begins', 'behind', 'being', 'believe', 'below', 'beside', 'besides', 'best', 'better', 'between', 'beyond', 'bi', 'bill', 'biol', 'bj', 'bk', 'bl', 'bn', 'both', 'bottom', 'bp', 'br', 'brief', 'briefly', 'bs', 'bt', 'bu', 'but', 'bx', 'by', 'c', 'c1', 'c2', 'c3', 'ca', 'call', 'came', 'can', 'cannot', 'cant', 'can''t', 'cause', 'causes', 'cc', 'cd', 'ce', 'certain', 'certainly', 'cf', 'cg', 'ch', 'changes', 'ci', 'cit', 'cj', 'cl', 'clearly', 'cm', 'c''mon', 'cn', 'co', 'com', 'come', 'comes', 'con', 'concerning', 'consequently', 'consider', 'considering', 'contain', 'containing', 'contains', 'corresponding', 'could', 'couldn', 'couldnt', 'couldn''t', 'course', 'cp', 'cq', 'cr', 'cry', 'cs', 'c''s', 'ct', 'cu', 'currently', 'cv', 'cx', 'cy', 'cz', 'd', 'd2', 'da', 'date', 'dc', 'dd', 'de', 'definitely', 'describe', 'described', 'despite', 'detail', 'df', 'di', 'did', 'didn', 'didn''t', 'different', 'dj', 'dk', 'dl', 'do', 'does', 'doesn', 'doesn''t', 'doing', 'don', 'done', 'don''t', 'down', 'downwards', 'dp', 'dr', 'ds', 'dt', 'du', 'due', 'during', 'dx', 'dy', 'e', 'e2', 'e3', 'ea', 'each', 'ec', 'ed', 'edu', 'ee', 'ef', 'effect', 'eg', 'ei', 'eight', 'eighty', 'either', 'ej', 'el', 'eleven', 'else', 'elsewhere', 'em', 'empty', 'en', 'end', 'ending', 'enough', 'entirely', 'eo', 'ep', 'eq', 'er', 'es', 'especially', 'est', 'et', 'et-al', 'etc', 'eu', 'ev', 'even', 'ever', 'every', 'everybody', 'everyone', 'everything', 'everywhere', 'ex', 'exactly', 'example', 'except', 'ey', 'f', 'f2', 'fa', 'far', 'fc', 'few', 'ff', 'fi', 'fifteen', 'fifth', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'fix', 'fj', 'fl', 'fn', 'fo', 'followed', 'following', 'follows', 'for', 'former', 'formerly', 'forth', 'forty', 'found', 'four', 'fr', 'from', 'front', 'fs', 'ft', 'fu', 'full', 'further', 'furthermore', 'fy', 'g', 'ga', 'gave', 'ge', 'get', 'gets', 'getting', 'gi', 'give', 'given', 'gives', 'giving', 'gj', 'gl', 'go', 'goes', 'going', 'gone', 'got', 'gotten', 'gr', 'greetings', 'gs', 'gy', 'h', 'h2', 'h3', 'had', 'hadn', 'hadn''t', 'happens', 'hardly', 'has', 'hasn', 'hasnt', 'hasn''t', 'have', 'haven', 'haven''t', 'having', 'he', 'hed', 'he''d', 'he''ll', 'hello', 'help', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'heres', 'here''s', 'hereupon', 'hers', 'herself', 'hes', 'he''s', 'hh', 'hi', 'hid', 'him', 'himself', 'his', 'hither', 'hj', 'ho', 'home', 'hopefully', 'how', 'howbeit', 'however', 'how''s', 'hr', 'hs', 'http', 'hu', 'hundred', 'hy', 'i', 'i2', 'i3', 'i4', 'i6', 'i7', 'i8', 'ia', 'ib', 'ibid', 'ic', 'id', 'i''d', 'ie', 'if', 'ig', 'ignored', 'ih', 'ii', 'ij', 'il', 'i''ll', 'im', 'i''m', 'immediate', 'immediately', 'importance', 'important', 'in', 'inasmuch', 'inc', 'indeed', 'index', 'indicate', 'indicated', 'indicates', 'information', 'inner', 'insofar', 'instead', 'interest', 'into', 'invention', 'inward', 'io', 'ip', 'iq', 'ir', 'is', 'isn', 'isn''t', 'it', 'itd', 'it''d', 'it''ll', 'its', 'it''s', 'itself', 'iv', 'i''ve', 'ix', 'iy', 'iz', 'j', 'jj', 'jr', 'js', 'jt', 'ju', 'just', 'k', 'ke', 'keep', 'keeps', 'kept', 'kg', 'kj', 'km', 'know', 'known', 'knows', 'ko', 'l', 'l2', 'la', 'largely', 'last', 'lately', 'later', 'latter', 'latterly', 'lb', 'lc', 'le', 'least', 'les', 'less', 'lest', 'let', 'lets', 'let''s', 'lf', 'like', 'liked', 'likely', 'line', 'little', 'lj', 'll', 'll', 'ln', 'lo', 'look', 'looking', 'looks', 'los', 'lr', 'ls', 'lt', 'ltd', 'm', 'm2', 'ma', 'made', 'mainly', 'make', 'makes', 'many', 'may', 'maybe', 'me', 'mean', 'means', 'meantime', 'meanwhile', 'merely', 'mg', 'might', 'mightn', 'mightn''t', 'mill', 'million', 'mine', 'miss', 'ml', 'mn', 'mo', 'more', 'moreover', 'most', 'mostly', 'move', 'mr', 'mrs', 'ms', 'mt', 'mu', 'much', 'mug', 'must', 'mustn', 'mustn''t', 'my', 'myself', 'n', 'n2', 'na', 'name', 'namely', 'nay', 'nc', 'nd', 'ne', 'near', 'nearly', 'necessarily', 'necessary', 'need', 'needn', 'needn''t', 'needs', 'neither', 'never', 'nevertheless', 'new', 'next', 'ng', 'ni', 'nine', 'ninety', 'nj', 'nl', 'nn', 'no', 'nobody', 'non', 'none', 'nonetheless', 'noone', 'nor', 'normally', 'nos', 'not', 'noted', 'nothing', 'novel', 'now', 'nowhere', 'nr', 'ns', 'nt', 'ny', 'o', 'oa', 'ob', 'obtain', 'obtained', 'obviously', 'oc', 'od', 'of', 'off', 'often', 'og', 'oh', 'oi', 'oj', 'ok', 'okay', 'ol', 'old', 'om', 'omitted', 'on', 'once', 'one', 'ones', 'only', 'onto', 'oo', 'op', 'oq', 'or', 'ord', 'os', 'ot', 'other', 'others', 'otherwise', 'ou', 'ought', 'our', 'ours', 'ourselves', 'out', 'outside', 'over', 'overall', 'ow', 'owing', 'own', 'ox', 'oz', 'p', 'p1', 'p2', 'p3', 'page', 'pagecount', 'pages', 'par', 'part', 'particular', 'particularly', 'pas', 'past', 'pc', 'pd', 'pe', 'per', 'perhaps', 'pf', 'ph', 'pi', 'pj', 'pk', 'pl', 'placed', 'please', 'plus', 'pm', 'pn', 'po', 'poorly', 'possible', 'possibly', 'potentially', 'pp', 'pq', 'pr', 'predominantly', 'present', 'presumably', 'previously', 'primarily', 'probably', 'promptly', 'proud', 'provides', 'ps', 'pt', 'pu', 'put', 'py', 'q', 'qj', 'qu', 'que', 'quickly', 'quite', 'qv', 'r', 'r2', 'ra', 'ran', 'rather', 'rc', 'rd', 're', 'readily', 'really', 'reasonably', 'recent', 'recently', 'ref', 'refs', 'regarding', 'regardless', 'regards', 'related', 'relatively', 'research', 'research-articl', 'respectively', 'resulted', 'resulting', 'results', 'rf', 'rh', 'ri', 'right', 'rj', 'rl', 'rm', 'rn', 'ro', 'rq', 'rr', 'rs', 'rt', 'ru', 'run', 'rv', 'ry', 's', 's2', 'sa', 'said', 'same', 'saw', 'say', 'saying', 'says', 'sc', 'sd', 'se', 'sec', 'second', 'secondly', 'section', 'see', 'seeing', 'seem', 'seemed', 'seeming', 'seems', 'seen', 'self', 'selves', 'sensible', 'sent', 'serious', 'seriously', 'seven', 'several', 'sf', 'shall', 'shan', 'shan''t', 'she', 'shed', 'she''d', 'she''ll', 'shes', 'she''s', 'should', 'shouldn', 'shouldn''t', 'should''ve', 'show', 'showed', 'shown', 'showns', 'shows', 'si', 'side', 'significant', 'significantly', 'similar', 'similarly', 'since', 'sincere', 'six', 'sixty', 'sj', 'sl', 'slightly', 'sm', 'sn', 'so', 'some', 'somebody', 'somehow', 'someone', 'somethan', 'something', 'sometime', 'sometimes', 'somewhat', 'somewhere', 'soon', 'sorry', 'sp', 'specifically', 'specified', 'specify', 'specifying', 'sq', 'sr', 'ss', 'st', 'still', 'stop', 'strongly', 'sub', 'substantially', 'successfully', 'such', 'sufficiently', 'suggest', 'sup', 'sure', 'sy', 'system', 'sz', 't', 't1', 't2', 't3', 'take', 'taken', 'taking', 'tb', 'tc', 'td', 'te', 'tell', 'ten', 'tends', 'tf', 'th', 'than', 'thank', 'thanks', 'thanx', 'that', 'that''ll', 'thats', 'that''s', 'that''ve', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'thered', 'therefore', 'therein', 'there''ll', 'thereof', 'therere', 'theres', 'there''s', 'thereto', 'thereupon', 'there''ve', 'these', 'they', 'theyd', 'they''d', 'they''ll', 'theyre', 'they''re', 'they''ve', 'thickv', 'thin', 'think', 'third', 'this', 'thorough', 'thoroughly', 'those', 'thou', 'though', 'thoughh', 'thousand', 'three', 'throug', 'through', 'throughout', 'thru', 'thus', 'ti', 'til', 'tip', 'tj', 'tl', 'tm', 'tn', 'to', 'together', 'too', 'took', 'top', 'toward', 'towards', 'tp', 'tq', 'tr', 'tried', 'tries', 'truly', 'try', 'trying', 'ts', 't''s', 'tt', 'tv', 'twelve', 'twenty', 'twice', 'two', 'tx', 'u', 'u201d', 'ue', 'ui', 'uj', 'uk', 'um', 'un', 'under', 'unfortunately', 'unless', 'unlike', 'unlikely', 'until', 'unto', 'uo', 'up', 'upon', 'ups', 'ur', 'us', 'use', 'used', 'useful', 'usefully', 'usefulness', 'uses', 'using', 'usually', 'ut', 'v', 'va', 'value', 'various', 'vd', 've', 've', 'very', 'via', 'viz', 'vj', 'vo', 'vol', 'vols', 'volumtype', 'vq', 'vs', 'vt', 'vu', 'w', 'wa', 'want', 'wants', 'was', 'wasn', 'wasnt', 'wasn''t', 'way', 'we', 'wed', 'we''d', 'welcome', 'well', 'we''ll', 'well-b', 'went', 'were', 'we''re', 'weren', 'werent', 'weren''t', 'we''ve', 'what', 'whatever', 'what''ll', 'whats', 'what''s', 'when', 'whence', 'whenever', 'when''s', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'wheres', 'where''s', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whim', 'whither', 'who', 'whod', 'whoever', 'whole', 'who''ll', 'whom', 'whomever', 'whos', 'who''s', 'whose', 'why', 'why''s', 'wi', 'widely', 'will', 'willing', 'wish', 'with', 'within', 'without', 'wo', 'won', 'wonder', 'wont', 'won''t', 'words', 'world', 'would', 'wouldn', 'wouldnt', 'wouldn''t', 'www', 'x', 'x1', 'x2', 'x3', 'xf', 'xi', 'xj', 'xk', 'xl', 'xn', 'xo', 'xs', 'xt', 'xv', 'xx', 'y', 'y2', 'yes', 'yet', 'yj', 'yl', 'you', 'youd', 'you''d', 'you''ll', 'your', 'youre', 'you''re', 'yours', 'yourself', 'yourselves', 'you''ve', 'yr', 'ys', 'yt', 'z', 'zero', 'zi', 'zz')
GROUP BY x.Item
ORDER BY COUNT(*) DESC
Here's the result of the above code, as you can see it's not counting correctly:
Word Counts
server 2
sql 2
data 1
database 1
popular 1
powerful 1
Can anyone help on this? Would be really appreciated!

You can make use of String_split here, such as
select value Word, Count(*) Counts
from words
cross apply String_Split(text_column, ' ')
where value not in(exclude list)
group by value
order by counts desc;

You should should the string_split function -- like this
SELECT id, value as aword
FROM words
CROSS APPLY STRING_SPLIT(text_column, ',');
This will create a table with all the words by id -- to get the count do this:
SELECT aword, count(*) as counts
FROM (
SELECT id, value as aword
FROM words
CROSS APPLY STRING_SPLIT(text_column, ',');
) x
GROUP BY aword
You may need to lower case the LOWER(text_column) if you want it to not matter

If you don't have access to STRING_SPLIT function, you can use weird xml trick to convert space to a word node and then shred it with nodes function:
select word, COUNT(*)
from (
select n.value('.', 'nvarchar(50)') AS word
from (
VALUES
(1, 'SQL Server is a popular database management system'),
(2, 'It is widely used for data storage and retrieval'),
(3, 'SQL Server is a powerful tool for data analysis')
) AS t (id, txt)
CROSS APPLY (
SELECT CAST('<x>' + REPLACE(txt, ' ', '</x><x>') + '</x>' AS XML) x
) x
CROSS APPLY x.nodes('x') z(n)
) w
GROUP BY word
Of course, this will fail on "bad" words and invalid xml-characters but it can be worked on. Text processing has never been SQL Server's strong-point though, so probably better to use some NLP library to do this kind of stuff

`sync_partition_metadata` in presto as a result returning query

Is there a way to make this query work?
SELECT "result" AS "result"
FROM (CALL system.sync_partition_metadata('schema', 'table', 'ADD')) AS "virtual_table"
LIMIT 1000;
this is the error I get:
presto error: line 3:18: mismatched input '.'. Expecting: '(', ')',
',', 'AS', 'CROSS', 'EXCEPT', 'FETCH', 'FULL', 'GROUP', 'HAVING',
'INNER', 'INTERSECT', 'JOIN', 'LEFT', 'LIMIT', 'MATCH_RECOGNIZE',
'NATURAL', 'OFFSET', 'ORDER', 'RIGHT', 'TABLESAMPLE', 'UNION',
'WHERE', 'WINDOW', ,

Slicing PySpark DataFrame by converting to Pandas DataFrame, Error when converting back to PySpark DataFrame

I want to slice a PySpark DataFrame by selecting a specific column and several rows as below:
import pandas as pd
# Data filled in our DataFrame
rows = [['Lee Chong Wei', 69, 'Malaysia'],
['Lin Dan', 66, 'China'],
['Srikanth Kidambi', 9, 'India'],
['Kento Momota', 15, 'Japan']]
# Columns of our DataFrame
columns = ['Player', 'Titles', 'Country']
# DataFrame is created
df = spark.createDataFrame(rows, columns)
# Converting DataFrame to pandas
pandas_df = df.toPandas()
# First DataFrame formed by slicing
df1 = pandas_df.iloc[[2], :2]
# Second DataFrame formed by slicing
df2 = pandas_df.iloc[[2], 2:]
# Converting the slices to PySpark DataFrames
df1 = spark.createDataFrame(df1, schema = "Country")
df2 = spark.createDataFrame(df2, schema = "Country")
I am running a notebook on Databricks and no need to import Spark Session.
There is an error message ParseException: when running following lines:
df1 = spark.createDataFrame(df1, schema = "Country")
df2 = spark.createDataFrame(df2, schema = "Country")
Please let me know any idea to solve this issue. Full error message is as below:
---------------------------------------------------------------------------
ParseException Traceback (most recent call last)
<command-4065192899858765> in <module>
23
24 # Converting the slices to PySpark DataFrames
---> 25 df1 = spark.createDataFrame(df1, schema = "Country")
26 df2 = spark.createDataFrame(df2, schema = "Country")
/databricks/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
706
707 if isinstance(schema, str):
--> 708 schema = _parse_datatype_string(schema)
709 elif isinstance(schema, (list, tuple)):
710 # Must re-encode any unicode strings to be consistent with StructField names
/databricks/spark/python/pyspark/sql/types.py in _parse_datatype_string(s)
841 return from_ddl_datatype("struct<%s>" % s.strip())
842 except:
--> 843 raise e
844
845
/databricks/spark/python/pyspark/sql/types.py in _parse_datatype_string(s)
831 try:
832 # DDL format, "fieldname datatype, fieldname datatype".
--> 833 return from_ddl_schema(s)
834 except Exception as e:
835 try:
/databricks/spark/python/pyspark/sql/types.py in from_ddl_schema(type_str)
823 def from_ddl_schema(type_str):
824 return _parse_datatype_json_string(
--> 825 sc._jvm.org.apache.spark.sql.types.StructType.fromDDL(type_str).json())
826
827 def from_ddl_datatype(type_str):
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
121 # Hide where the exception came from that shows a non-Pythonic
122 # JVM exception message.
--> 123 raise converted from None
124 else:
125 raise
ParseException:
mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 'EXPECT', 'FAIL', 'FILES', 'FORMAT_OPTIONS', 'HISTORY', 'INCREMENTAL', 'INPUT', 'INVOKER', 'LANGUAGE', 'LIVE', 'MATERIALIZED', 'MODIFIES', 'OPTIMIZE', 'PATTERN', 'READS', 'RESTORE', 'RETURN', 'RETURNS', 'SAMPLE', 'SCD TYPE 1', 'SCD TYPE 2', 'SECURITY', 'SEQUENCE', 'SHALLOW', 'SNAPSHOT', 'SPECIFIC', 'SQL', 'STORAGE', 'STREAMING', 'UPDATES', 'UP_TO_DATE', 'VIOLATION', 'ZORDER', 'ADD', 'AFTER', 'ALL', 'ALTER', 'ALWAYS', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CATALOG', 'CATALOGS', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODE', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DAY', 'DATA', 'DATABASE', 'DATABASES', 'DATEADD', 'DATEDIFF', 'DBPROPERTIES', 'DEFAULT', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DIV', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FN', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GENERATED', 'GLOBAL', 'GRANT', 'GRANTS', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IDENTITY', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INCREMENT', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEY', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'ILIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENTILE_CONT', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PROVIDER', 'PROVIDERS', 'PURGE', 'QUALIFY', 'QUERY', 'RANGE', 'RECIPIENT', 'RECIPIENTS', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'REMOVE', 'RENAME', 'REPAIR', 'REPEATABLE', 'REPLACE', 'REPLICAS', 'RESET', 'RESPECT', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SECOND', 'SCHEMA', 'SCHEMAS', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHARE', 'SHARES', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'SYNC', 'SYSTEM_TIME', 'SYSTEM_VERSION', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TIME', 'TIMESTAMP', 'TIMESTAMPADD', 'TIMESTAMPDIFF', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 7)
== SQL ==
Country
-------^^^

How to explode quantiles in hive

I am trying to get quantiles of a field and I want to explode them so that each value is a separate row rather than all of them forming a single array. First, I calculate 20 quantiles as below:
select percentile_approx(probability,
array(0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35,
0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75,
0.8, 0.85, 0.9, 0.95, 1.0)) as quantiles
from my_table
The above code gives the array below:
[0.17808226409449213, 0.18250386256254247, 0.18525207046272224, 0.18800918537059694, 0.1907743631982954, 0.200154288105411, 0.30419108474685375, 0.3299437131426226, 0.352433633041806, 0.3589875791100745, 0.37581775428218006, 0.3825168120904496, 0.3966342376441502, 0.4173753044627164, 0.43268994899295316, 0.44015098935735575, 0.461413042176578, 0.4720422104416653, 0.487852850513824, 0.5050010622123932]
But since I wanted to explode it, I tried using lateral view posexplode like below (actually, I passed the output from the above code):
select i, x
lateral view posexplode([0.17808226409449213, 0.18250386256254247,0.18525207046272224, 0.18800918537059694,
0.1907743631982954, 0.200154288105411, 0.30419108474685375, 0.3299437131426226,
0.352433633041806, 0.3589875791100745, 0.37581775428218006, 0.3825168120904496,
0.3966342376441502, 0.4173753044627164, 0.43268994899295316, 0.44015098935735575,
0.461413042176578, 0.4720422104416653, 0.487852850513824, 0.5050010622123932]) q as i, x
but it gives the error message below:
ParseException: "\nextraneous input '[' expecting {'(', ')', 'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'ANY', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'PIVOT', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'AFTER', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'DIRECTORY', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'COST', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IGNORE', 'BOTH', 'LEADING', 'TRAILING', 'IF', 'POSITION', 'EXTRACT', '+', '-', '*', 'DIV', '~', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', STRING, BIGINT_LITERAL, SMALLINT_LITERAL, TINYINT_LITERAL, INTEGER_VALUE, DECIMAL_VALUE, DOUBLE_LITERAL, BIGDECIMAL_LITERAL, IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 3, pos 24)\n\n== SQL ==\n\nselect i, x\nlateral view posexplode([0.17808226409449213, 0.18250386256254247,0.18525207046272224, 0.18800918537059694, \n------------------------^^^\n 0.1907743631982954, 0.200154288105411, 0.30419108474685375, 0.3299437131426226, \n 0.352433633041806, 0.3589875791100745, 0.37581775428218006, 0.3825168120904496, \n 0.3966342376441502, 0.4173753044627164, 0.43268994899295316, 0.44015098935735575, \n 0.461413042176578, 0.4720422104416653, 0.487852850513824, 0.5050010622123932]) q as i, x\n"
On the other hand, if I create the array inside posexplode like below, it works fine:
select i, x
lateral view posexplode(array('a', 'b', 'c')) q as i, x
| i| x|
+---+---+
| 0| a|
| 1| b|
| 2| c|
+---+---+

select s.*, e.i, e.x
from
(
select percentile_approx(probability,
array(0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35,
0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75,
0.8, 0.85, 0.9, 0.95, 0.99)) as quantiles
from my_table
) s lateral view outer posexplode (s.quantiles) e as i, x
;

BigQuery Legacy SQL - How to insert into tables that has nested fields?

I'm trying to insert records into tables that has Nested and Repeated fields. I know that STRUCT and ARRAY keyword can be used respectively in Standard SQL.
What is equivalent of STRUCT and ARRAY keyword in Legacy SQL to insert records into Nested and Repeated fields?

I am reusing example you provided in bq command line tool - How to insert into Big query tables that has nested fields?
Try below, it is for Legacy SQL and using in-line version of Javascript UDF for Legacy SQL
Note: by default BigQuery Legacy SQL flattens any result, so make sure you set destination table and set Allow Large Results to true (or check it in Web UI) and Flatten Results to false (or uncheck it in Web UI)
SELECT Employee_id, Name, Age, Department.*, Location.* FROM JS((
SELECT Employee_id, Name, Age, Department_id, Department_Name, Department_Code, e.Location_id AS Location_id, Country, State, City
FROM (SELECT e.Employee_id AS Employee_id, e.Name AS Name, e.Age AS Age,
e.Department_id AS Department_id, d.Department_Name AS Department_Name, d.Department_Code AS Department_Code, e.Location_id AS Location_id
FROM Employee e JOIN Department d ON e.Department_id = d.Department_id ) AS e
JOIN Location l ON e.Location_id = l.Location_id
),
// input columns
Employee_id, Name, Age, Department_id, Department_Name, Department_Code, Location_id, Country, State, City,
// output schema
"[
{'name': 'Employee_id', 'type': 'INTEGER', 'mode': 'NULLABLE'},
{'name': 'Name', 'type': 'STRING', 'mode': 'NULLABLE'},
{'name': 'Age', 'type': 'INTEGER', 'mode': 'NULLABLE'},
{'name': 'Department', 'type': 'RECORD', 'mode': 'NULLABLE', 'fields': [
{'name': 'Department_id', 'type': 'STRING', 'mode': 'NULLABLE'},
{'name': 'Department_Name', 'type': 'STRING', 'mode': 'NULLABLE'},
{'name': 'Department_Code', 'type': 'STRING', 'mode': 'NULLABLE'}
]},
{'name': 'Location', 'type': 'RECORD', 'mode': 'NULLABLE', 'fields': [
{'name': 'Location_id', 'type': 'STRING', 'mode': 'NULLABLE'},
{'name': 'Country', 'type': 'STRING', 'mode': 'NULLABLE'},
{'name': 'State', 'type': 'STRING', 'mode': 'NULLABLE'},
{'name': 'City', 'type': 'STRING', 'mode': 'NULLABLE'}
]}
]",
// function
"function(r, emit){
emit({
Employee_id: r.Employee_id, Name: r.Name, Age: r.Age,
Department: {Department_id:r.Department_id, Department_Name:r.Department_Name, Department_Code:r.Department_Code},
Location: {Location_id:r.Location_id, Country:r.Country, State:r.State, City:r.City}
});
}"
)
Please note: i am using in-line version of UDF here for the purpose of easy showing and testing. In-line version is not recommended and not officially supported. But you can easily convert it to supported version - see more for User-Defined Functions in Legacy SQL
P.S. even though above works and helped a lot before the Standard SQL was an option - now what is the big reason for you to use Legacy SQL where Standard SQL is more elegant and gives you much more flexibility especially when it is about dealing with nested and repeated fields

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Athena query error : extraneous input 'select' expecting - sql

Related

Extract words from a column and count frequency

`sync_partition_metadata` in presto as a result returning query

Slicing PySpark DataFrame by converting to Pandas DataFrame, Error when converting back to PySpark DataFrame

How to explode quantiles in hive

BigQuery Legacy SQL - How to insert into tables that has nested fields?

Categories

Resources