I am using the information schema to get a count of distinct values for every column of a given schema and table.
I don't have permission to create procedures, views, or temp tables, so those options are ruled out. All I can run is SELECT.
What I want to do:
Query the information schema for a table, get all of its columns, and run count(distinct ...) on each column.
Input: the information schema
Expected output:
table_name column_name no_of_unique_elements
The approach I tried is the query below:
select 'xyz' as table_name,'{1}',
count(0) from
(select {0}
from {1}
group by {0}
having count(0) > 1
from (
select ARRAY_TO_STRING(array_agg(column_name),',')
from first_db.information_schema.columns
where table_schema='abc'
and TABLE_NAME='xyz'));
Below is the error I get
SQL compilation error: syntax error line 1 at position 8 unexpected '0'. syntax error line 1 at position 17 unexpected '1'. syntax error line 1 at position 30 unexpected '0'. syntax error line 1 at position 53 unexpected 'from'.
I got this working as a two-step process.
Step 1: From the information schema, I construct the column expressions in the format I need.
The code is below:
select $4 dyn_col,$5 static_col from (
select 'SRC' as db, 'ab_x' as schema_nm,'a' as tbl_nm,ARRAY_TO_STRING(array_agg(concat('count(distinct ',column_name,') ',column_name)),','),ARRAY_TO_STRING(array_agg(column_name),',')
from edw_src.information_schema.columns where table_schema='ab_x' and
TABLE_NAME='a' AND COLUMN_NAME not in ('VALUE'));
The query above returns two columns. The first contains the dynamically constructed expressions, such as count(distinct col1) col1,count(distinct col2) col2, and the second contains the plain column names separated by commas.
Step 2:
Now I paste the output of the query above into the query below:
select 'a' as table_name,column_name,column_value no_of_unique_elements from (
select <count(distinct ...) expressions from the query above> from SRC.ab_x.a)
unpivot(column_value for column_name in (<column names separated by commas>))
;
This gives me the intended output:
table_name column_name no_of_unique_elements
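The two steps above can also be glued together on the client side. The following is only a minimal sketch, assuming the column names have already been fetched from information_schema.columns and are trusted (nothing is quoted or escaped). The function name build_distinct_count_query and the sample columns COL1 and COL2 are hypothetical.

```python
def build_distinct_count_query(database, schema, table, columns):
    """Build one SELECT that unpivots count(distinct ...) for every column."""
    if not columns:
        raise ValueError("at least one column name is required")
    # Step 1 equivalent: "count(distinct COL) COL" for each column.
    counts = ", ".join(f"count(distinct {c}) {c}" for c in columns)
    # Plain comma-separated names for the UNPIVOT ... IN (...) list.
    names = ", ".join(columns)
    # Step 2 equivalent: wrap both lists in the unpivot query.
    return (
        f"select '{table}' as table_name, column_name, "
        f"column_value as no_of_unique_elements\n"
        f"from (select {counts} from {database}.{schema}.{table})\n"
        f"unpivot(column_value for column_name in ({names}))"
    )


# Hypothetical column names; in practice they come from information_schema.columns.
query = build_distinct_count_query("SRC", "ab_x", "a", ["COL1", "COL2"])
print(query)
```

The generated text still has to be run as a second statement, so this does not remove the two-step nature of the approach; it only avoids the manual copy and paste.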
How do I get the header names from a SELECT query executed in Snowflake? Currently I only get the values back. Is there a way to get the column names as well? I need to apply a GROUP BY and an aggregate function on top of the query result.
Code tried
sql10 = """SELECT col1,col2,col3,col4 FROM tablename ORDER BY col4 ;"""
select_snow = cs.execute(sql10).fetchall()
snow_col = [(c[1], c[2]) for c in select_snow]
How do I get the column names and map each one to its value?
Output
select snow: [('value1','value12','value3','value4'), ('value1','value12','value3','value4'), ('value11','value12','value13','value14'), ('value21','value22','value23','value24')]
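One way to approach this, sketched under assumptions: Python DB-API cursors expose a description attribute after execute(), with one entry per result column whose first item is the column name. The example below uses the standard-library sqlite3 module as a stand-in so that it runs without a Snowflake account; the table contents are made up, and with a Snowflake cursor the names are typically reported in upper case for unquoted identifiers.

```python
import sqlite3

# sqlite3 stands in for the Snowflake connection here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
cs = conn.cursor()
cs.execute("CREATE TABLE tablename (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
cs.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        ("value11", "value12", "value13", "value14"),
        ("value1", "value12", "value3", "value4"),
    ],
)

cs.execute("SELECT col1, col2, col3, col4 FROM tablename ORDER BY col4")
# DB-API: description has one sequence per column; item 0 is the column name.
column_names = [d[0] for d in cs.description]
# Pair every value with its column name, one dict per row.
rows = [dict(zip(column_names, row)) for row in cs.fetchall()]

print(column_names)
print(rows[0])
```

With rows as a list of dicts, grouping and aggregating by column name can be done in plain Python or by loading the list into a data frame.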
The WITH syntax is not cooperating; I cannot get it to work. Here is a stripped-down version of it:
set hive.strict.checks.cartesian.product=false;
with
newd as (select
avg(risk_score_highest) risk_score_hi,
avg(risk_score_current) risk_score_cur
from table1),
oldd as ( select
avg(risk_score_highest) risk_score_hi,
avg(risk_score_current) risk_score_cur
from table2
where ds='2022-09-08')
select
(newd.risk_score_hi-oldd.risk_score_hi)/newd.risk_score_hi diff_risk_score_hi,
(newd.risk_score_cur-oldd.risk_score_cur)/newd.risk_score_cur diff_risk_score_cur
from newd cross join oldd
order by 1 desc
Apache Hive Error
[Statement 2 out of 2] hive error: Error while compiling statement:
FAILED: SemanticException [Error 10004]: Invalid table alias or
column reference 'newd': (possible column names are:
diff_risk_score_hi, diff_risk_score_cur)
I had been following the general form shown here: https://stackoverflow.com/a/47351815/1056563
WITH v_text
AS
(SELECT 1 AS key, 'One' AS value),
v_roman
AS
(SELECT 1 AS key, 'I' AS value)
INSERT OVERWRITE TABLE ramesh_test
SELECT v_text.key, v_text.value, v_roman.value
FROM v_text JOIN v_roman
ON (v_text.key = v_roman.key);
I cannot understand what I am missing to get the inline WITH views to work.
Update: My query (the first one above) works in Presto (with the set hive.strict.checks.cartesian.product=false; line removed). So Hive is apparently hard to please when it comes to WITH clauses. I tried about a dozen different ways of using the aliases.
I have a table where for the same ID I have different information. Ex.
ID Activity
1 12
1 15
2 15
3 20
I want to update the field "Activity" so that all the different activity values for each ID are joined into a single row.
A SELECT with the following code gives what I want:
SELECT string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20_2
GROUP BY id_local;
The result being:
However, when I put that query into my UPDATE statement, PostgreSQL (version 13) gives me the following error: ERROR: more than one row returned by a subquery used as an expression.
The code I am using to try to update the field is:
UPDATE febrero20_2
SET id_epigrafe = (SELECT string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20_2
GROUP BY id_local);
I have tried to create a new table and in that case I can do it correctly with the following code:
CREATE TABLE febrero20_3
AS
SELECT id_local, string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20
GROUP BY id_local
ORDER BY id_local;
Could anyone help me understand why I am getting that error? I am new to PostgreSQL, so I am sorry if it is a simple mistake, but I could not find any answer.
You have to join the table you update with the table from which you take the values, so that you don't end up with a subquery that has multiple result rows:
UPDATE febrero20_2 AS f_1
SET id_epigrafe = f_2.epigrafe_list
FROM (SELECT id_local,
string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20_2
GROUP BY id_local) AS f_2
WHERE f_1.id_local = f_2.id_local;
Let me remark that this seems to be a strange statement: you essentially destroy information. Wouldn't it be better to perform such an aggregation when you query the table?
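To see why the join is needed, it can help to mimic the UPDATE ... FROM logic outside the database. This is only an illustrative sketch in plain Python, with made-up sample rows shaped like the ID/Activity example: the grouped subquery yields one string per id_local, and each row then picks the string whose id_local matches its own. The original UPDATE failed because it asked for a single value while the subquery produced one row per group.

```python
from collections import defaultdict

# Hypothetical sample rows: (id_local, id_epigrafe)
rows = [(1, "12"), (1, "15"), (2, "15"), (3, "20")]

# Equivalent of the grouped subquery: one aggregated string per id_local.
grouped = defaultdict(list)
for id_local, id_epigrafe in rows:
    grouped[id_local].append(id_epigrafe)
epigrafe_list = {key: ", ".join(values) for key, values in grouped.items()}

# Equivalent of WHERE f_1.id_local = f_2.id_local: each row looks up
# the aggregate that belongs to its own id_local.
updated = [(id_local, epigrafe_list[id_local]) for id_local, _ in rows]

print(updated)
```

Note that every row of a group ends up with the same joined string, which is the loss of information the remark above refers to.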
I need to check whether one or more fields already exist in a table so that I can use them in a MERGE INTO statement.
I tried this:
select sat_sector_hkey,
CASE
WHEN EXISTS(select id from hub_sector)
THEN (MERGE INTO ...)
END AS id
from sat_sector;
For testing, I used only one CASE expression and replaced the MERGE INTO with THEN ... ELSE values:
SELECT sat_sector_hkey,
CASE
WHEN EXISTS(select id from hub_sector)
THEN '1'
ELSE ''
END AS id
FROM sat_sector;
When this field does not exist, the query returns an error instead of '':
SQL compilation error: error line 3 at position 23 invalid identifier
'ID'
I am using a CASE because I need to check whether a column exists. I cannot know this in advance, because our data comes from multiple sources.
Try this:
Construct an object with the full row.
Test if the constructed object has data for "ID".
create or replace temp table maybe_id
as
select 1 x, 2 id;
select *,
case
when object_construct(a.*):ID is not null
then '1'
else ''
end as id
from maybe_id a
;
Works for me: it gives '1' when the column id has data, and '' when the column doesn't exist in the table.
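The idea behind the object_construct trick can be sketched in plain Python, treating each row as a dict. This is only an analogy under stated assumptions, not Snowflake code: object_construct omits key-value pairs whose value is NULL, so a missing column and a NULL value both end up as "no ID key". The function name id_flag is hypothetical.

```python
def id_flag(row):
    """Return '1' when the row carries a non-null ID, otherwise ''."""
    # Like object_construct(a.*): keep only keys whose value is not null.
    obj = {key: value for key, value in row.items() if value is not None}
    # Like obj:ID is not null: a missing key simply yields None.
    return "1" if obj.get("ID") is not None else ""


print(repr(id_flag({"X": 1, "ID": 2})))     # column exists and has data
print(repr(id_flag({"X": 1})))              # column does not exist
print(repr(id_flag({"X": 1, "ID": None})))  # column exists but is null
```

The third call shows the one caveat: this test cannot tell a column that is missing from a column that exists but holds NULL.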
I am trying to find rows in a table that seem to have a DBCException in the cell value.
I cannot find a quick way to identify the unique rows that have this exception.
Error stored in the cell:
DBCException: SQL Error: [jcc][t4][1065][12306][4.18.60] Caught
java.io.CharConversionException. See attached Throwable for details.
ERRORCODE=-4220, SQLSTATE=null
PrimaryKey SomeColumn
1 A
2 B
3 C
4 DBCException: SQL Error...
5 DBCException: SQL Error...
On searching, this is the only link I came across with some help on this matter:
https://www.ibm.com/support/pages/sqlexception-message-caught-javaiocharconversionexception-and-errorcode-4220
As a diagnosis, it suggests inspecting Hex(col).
However, I cannot narrow down the rows that have the error so that I can fix them.
I was able to figure out which column has errors.
My question is: how do I narrow down the rows?
I have figured out how to query the rows that have an exception.
The exception is about invalid characters, so we narrow down the results in the following way:
select all rows that have non null values
select all rows that have valid characters
subtract the two data sets, and you will get the rows that contain invalid characters.
Query:
SELECT * FROM (
select id, column
from table
WHERE column IS NOT NULL
minus
select id, column
from table
where TRANSLATE(TRANSLATE(TRANSLATE(column,'','!##$%^&*()-=+/\{}[];:.,<>?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'),'','''"')) = ''
AND column IS NOT NULL
)
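The set subtraction described above can be illustrated outside the database. This is a minimal sketch in plain Python with made-up rows; the allowed character set is copied from the TRANSLATE arguments, and the blank is assumed to count as valid as well. The function name rows_with_invalid_chars is hypothetical.

```python
import string

# Characters the query treats as valid, plus the blank (an assumption).
VALID = set(
    "!#$%^&*()-=+/\\{}[];:.,<>?'\" " + string.ascii_letters + string.digits
)


def rows_with_invalid_chars(rows):
    """Return the (id, value) pairs whose value has a character outside VALID."""
    # First set: every row with a non-null value.
    non_null = {(row_id, value) for row_id, value in rows if value is not None}
    # Second set: the rows made up of valid characters only.
    all_valid = {(row_id, value) for row_id, value in non_null if set(value) <= VALID}
    # MINUS: what remains contains at least one invalid character.
    return sorted(non_null - all_valid)


# Hypothetical sample data: rows 4 and 5 hold characters outside the valid set.
rows = [(1, "A"), (2, "B"), (3, None), (4, "caf\u00e9"), (5, "x\x00y")]
bad_rows = rows_with_invalid_chars(rows)
print([row_id for row_id, _ in bad_rows])
```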
You can also fix the content of the affected rows by replacing the invalid characters, as follows:
UPDATE table
SET column = regexp_replace(column,'[^a-zA-Z-\d]',' ')
WHERE id IN
(
SELECT id FROM
(
select id
from table
WHERE column IS NOT NULL
minus
select id
from table
where TRANSLATE(TRANSLATE(TRANSLATE(column,'','!##$%^&*()-=+/{}[];:.,<>?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'),'','''"')) = ''
AND column IS NOT NULL
))
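To preview what the regexp_replace in the UPDATE would do to a value before running it, the same character class can be tried in Python. This is a hedged sketch: Python's re module is not the database's regex engine, the hyphen is moved to the end of the class to keep it unambiguous, and the sample strings are made up.

```python
import re

# Anything that is not an ASCII letter, a digit, or a hyphen gets replaced.
NOT_ALLOWED = re.compile(r"[^a-zA-Z\d-]", re.ASCII)


def scrub(value):
    """Replace every character outside the allowed class with a blank."""
    return NOT_ALLOWED.sub(" ", value)


print(repr(scrub("caf\u00e9-01")))
print(repr(scrub("a.b")))
```

The second call shows that this class is stricter than the TRANSLATE list used to find the rows: punctuation such as the full stop is also replaced by a blank in any row that gets updated.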