Getting the description of a table using Google BigQuery - google-bigquery

I am new to BigQuery. The first thing I wanted to do was the SQL equivalent of DESC in Google BigQuery.
I did:
DESC `paj.dw.MY_TABLE`;
But I get:
Statement not supported: DescribeStatement
There are mentions of INFORMATION_SCHEMA in the beta version, but when I try it I get:
Syntax error: Unexpected identifier "INFORMATION_SCHEMA"
How do you do it yourself?
Thank you.

In addition to INFORMATION_SCHEMA, you can also run the following from the console command line (Cloud Shell):
bq show --schema --format=prettyjson dataset.table
I prefer this for tables that have nested records.
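For nested records, the JSON printed by the command above can be flattened into dotted column paths. The following is a minimal sketch, assuming the output is a JSON array of field objects with "name", "type", "mode" and, for RECORD fields, a nested "fields" array. The schema in SCHEMA_JSON and the helper flatten_schema are hypothetical and only for illustration.

```python
import json

# Hypothetical schema, shaped like the output of
# `bq show --schema --format=prettyjson dataset.table`.
SCHEMA_JSON = """
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "address", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "city", "type": "STRING", "mode": "NULLABLE"},
    {"name": "zip", "type": "STRING", "mode": "NULLABLE"}
  ]}
]
"""


def flatten_schema(fields, prefix=""):
    """Return (dotted_path, type, mode) for every field, descending into RECORDs."""
    rows = []
    for field in fields:
        path = prefix + field["name"]
        rows.append((path, field["type"], field.get("mode", "NULLABLE")))
        # RECORD fields carry their children under "fields"; leaf fields do not.
        rows.extend(flatten_schema(field.get("fields", []), path + "."))
    return rows


columns = flatten_schema(json.loads(SCHEMA_JSON))
for path, type_, mode in columns:
    print(f"{path:<15} {type_:<8} {mode}")
```

In practice you would save the command's output to a file and pass its parsed contents to flatten_schema instead of the inline string.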

You could do something like:
SELECT
* EXCEPT(is_generated, generation_expression, is_stored, is_updatable)
FROM
paj.dw.INFORMATION_SCHEMA.COLUMNS
WHERE
table_name="MY_TABLE"
For other INFORMATION_SCHEMA views and examples, see this page.

Take the first table in the public dataset, for example:
SELECT column_name, is_nullable, data_type
FROM `bigquery-public-data.austin_311.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name="311_request"
You get:
+--------------------------+-------------+-----------+
| column_name | is_nullable | data_type |
+--------------------------+-------------+-----------+
| unique_key | YES | STRING |
| complaint_type | YES | STRING |
| complaint_description | YES | STRING |
| owning_department | YES | STRING |
| source | YES | STRING |
| status | YES | STRING |
| status_change_date | YES | TIMESTAMP |
| created_date | YES | TIMESTAMP |
| last_update_date | YES | TIMESTAMP |
| close_date | YES | TIMESTAMP |
| incident_address | YES | STRING |
| street_number | YES | STRING |
| street_name | YES | STRING |
| city | YES | STRING |
| incident_zip | YES | INT64 |
| county | YES | STRING |
| state_plane_x_coordinate | YES | STRING |
| state_plane_y_coordinate | YES | FLOAT64 |
| latitude | YES | FLOAT64 |
| longitude | YES | FLOAT64 |
| location | YES | STRING |
| council_district_code | YES | INT64 |
| map_page | YES | STRING |
| map_tile | YES | STRING |
+--------------------------+-------------+-----------+

Related

How to get the number of empty cells for each column in spark dataframe

I'd like to count the empty values in each column, so I tried:
ele_df.where(ele_df['Shipment_ID'].isNotNull()).select('Shipment_ID').show()
But it still returns the empty values; it seems to treat an empty value as a non-null value.
+------------------+
|Shipment_ID|
+------------------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+------------------+
Could you guys help me with this?

How to determine whether the name of an Impala object corresponds to a view

Is there a way in Impala to determine whether an object name returned by SHOW TABLES corresponds to a table or a view since:
this statement only returns the object names, without their type
SHOW CREATE VIEW is just an alias for SHOW CREATE TABLE (same result, no view/table distinction)
DESCRIBE does not give any clue about the type of the item
Ideally I'd like to list all the tables + views and their types using a single operation, not one to retrieve the tables + views and then another call for each name to determine the type of the object.
(please note the question is about Impala, not Hive)
You can use DESCRIBE FORMATTED to find out the type of an object:
impala-shell> CREATE TABLE table2(
id INT,
name STRING
);
impala-shell> CREATE VIEW view2 AS SELECT * FROM table2;
impala-shell> DESCRIBE FORMATTED table2;
+------------------------------+--------------------------------------------------------------------+----------------------+
| name | type | comment |
+------------------------------+--------------------------------------------------------------------+----------------------+
| Retention: | 0 | NULL |
| Location: | hdfs://quickstart.cloudera:8020/user/hive/warehouse/test.db/table2 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
+------------------------------+--------------------------------------------------------------------+----------------------+
impala-shell> DESCRIBE FORMATTED view2;
+------------------------------+-------------------------------+----------------------+
| name | type | comment |
+------------------------------+-------------------------------+----------------------+
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Table Type: | VIRTUAL_VIEW | NULL |
| Table Parameters: | NULL | NULL |
| | transient_lastDdlTime | 1601632695 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
+------------------------------+-------------------------------+----------------------+
For the table, the type is Table Type: MANAGED_TABLE, and for the view it is Table Type: VIRTUAL_VIEW.
Another way is to query the metastore database (if you have access to it), which holds the metadata for Impala (and Hive):
mysql> use metastore;
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-----------+-----------+-------+----------+---------------+------------------------------------------------------------+---------------------------+----------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
+--------+-------------+-------+------------------+-----------+-----------+-------+----------+---------------+------------------------------------------------------------+---------------------------+----------------+
| 9651 | 1601631971 | 9331 | 0 | anonymous | 0 | 27996 | table1 | MANAGED_TABLE | NULL | NULL | NULL |
| 9652 | 1601632121 | 9331 | 0 | anonymous | 0 | 27997 | view1 | VIRTUAL_VIEW | SELECT `table1`.`id`, `table1`.`name` FROM `test`.`table1` | SELECT * FROM table1 | NULL |
| 9653 | 1601632676 | 9331 | 0 | cloudera | 0 | 27998 | table2 | MANAGED_TABLE | NULL | NULL | NULL |
| 9654 | 1601632695 | 9331 | 0 | cloudera | 0 | 27999 | view2 | VIRTUAL_VIEW | SELECT * FROM test.table2 | SELECT * FROM test.table2 | NULL |
+--------+-------------+-------+------------------+-----------+-----------+-------+----------+---------------+------------------------------------------------------------+---------------------------+----------------+
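To list every table and view with its type in a single operation, the same TBLS table can be queried for just the name and type columns. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for the MySQL metastore, loaded with the four rows shown above. The reduced column set and the SQLite connection are assumptions for illustration; against a real metastore you would run the SELECT through your MySQL client.

```python
import sqlite3

# In-memory stand-in for the metastore; only the columns needed here are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, TBL_NAME TEXT, TBL_TYPE TEXT)"
)
conn.executemany(
    "INSERT INTO TBLS VALUES (?, ?, ?, ?)",
    [
        (9651, 9331, "table1", "MANAGED_TABLE"),
        (9652, 9331, "view1", "VIRTUAL_VIEW"),
        (9653, 9331, "table2", "MANAGED_TABLE"),
        (9654, 9331, "view2", "VIRTUAL_VIEW"),
    ],
)

# One query returns every object name together with its type.
objects = conn.execute(
    "SELECT TBL_NAME, TBL_TYPE FROM TBLS WHERE DB_ID = ? ORDER BY TBL_NAME",
    (9331,),
).fetchall()

for name, tbl_type in objects:
    print(name, tbl_type)
```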

SQL, iif excel cell is null then fill another excel cell

I have the following part of my query within excel that is not working.
iif(master.[Canada] is null or master.[USA] is null ,'USER','' ) as [Stackoverflow]
Am I doing the nulls correctly?
The logic should be:
1) If there is no Canada data and no USA data, put "USER" in the Stackoverflow column.
2) If either Canada OR USA has data, then Stackoverflow should be empty.
What I'm currently getting:
+-----------+--------------+---------------+
| Canada | USA | Stackoverflow |
+-----------+--------------+---------------+
| | | |
| | | |
| 912796NZ8 | | |
| | | |
| | US912796NZ81 | |
| | | |
| 912796NZ8 | US912796NZ81 | |
| 912796NZ8 | US912796NZ81 | |
| 912796qd4 | US912796QD43 | |
| 298785HB5 | US298785HB50 | |
+-----------+--------------+---------------+
What I am expecting:
+-----------+--------------+---------------+
| Canada | USA | Stackoverflow |
+-----------+--------------+---------------+
| | | USER |
| | | USER |
| 912796NZ8 | | |
| | | USER |
| | US912796NZ81 | |
| | | USER |
| 912796NZ8 | US912796NZ81 | |
| 912796NZ8 | US912796NZ81 | |
| 912796qd4 | US912796QD43 | |
| 298785HB5 | US298785HB50 | |
+-----------+--------------+---------------+
After changing the query to iif(TRIM(master.[Canada]) = '' OR TRIM(master.[USA]) = '','USER', '') as [Stackoverflow]
it mostly works, except that some rows that have Canada data but no USA data now get USER.
+-----------+-----+---------------+
| Canada | USA | Stackoverflow |
+-----------+-----+---------------+
| 62941ZPA6 | | USER |
| 62943Z4R0 | | USER |
| 62945ZLQ1 | | USER |
| 62950ZZE5 | | USER |
| 75585RLK9 | | USER |
| 00433JAA3 | | USER |
| 13509PEV1 | | USER |
| 13509PEZ2 | | USER |
| 62931ZLX2 | | USER |
| 62941Z8M9 | | USER |
| 62941ZYK4 | | USER |
| 62942ZV42 | | USER |
| 62943Z6T4 | | USER |
| 62946Z6Y0 | | USER |
| 62947ZWC8 | | USER |
| 62948ZTJ6 | | USER |
| 62949ZE51 | | USER |
| 75585RLK9 | | USER |
| 75585RMB8 | | USER |
| 75585RMW2 | | USER |
+-----------+-----+---------------+
These 20 records should not have USER.
Any help would be appreciated, thanks.
The Jet/ACE SQL dialect does support IS NULL. However, as your current results suggest, empty strings ('') are not the same as NULL. This is especially true in Excel, a non-database application where empty cells may not default to NULL. In fact, you are assigning empty strings yourself in the falsepart of your IIF() call, so records without the 'USER' value in [Stackoverflow] will hold an empty string and not NULL.
Consider extending your IIF expressions to account for zero-length strings and assigning NULL to non-matches:
IIF((master.[Canada] IS NULL AND master.[USA] IS NULL) OR
(master.[Canada] = '' AND master.[USA] IS NULL) OR
(master.[Canada] IS NULL AND master.[USA] = '') OR
(master.[Canada] = '' AND master.[USA] = ''), 'USER', NULL) As [Stackoverflow]
You can even account for invisible whitespace by using TRIM():
IIF((master.[Canada] IS NULL AND master.[USA] IS NULL) OR
(TRIM(master.[Canada]) = '' AND master.[USA] IS NULL) OR
(master.[Canada] IS NULL AND TRIM(master.[USA]) = '') OR
(TRIM(master.[Canada]) = '' AND TRIM(master.[USA]) = ''), 'USER', NULL) As [Stackoverflow]
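The four OR branches above all reduce to one rule: a cell counts as blank if it is NULL, empty, or whitespace only, and 'USER' is assigned only when both cells are blank. A minimal Python sketch of that rule follows, with None standing in for NULL. The helpers is_blank and stackoverflow_flag are hypothetical names used for illustration, and the sample rows are taken from the question.

```python
def is_blank(value):
    """True for None (NULL), an empty string, or a whitespace-only string."""
    return value is None or value.strip() == ""


def stackoverflow_flag(canada, usa):
    """Return 'USER' only when both cells are blank, otherwise None (NULL)."""
    return "USER" if is_blank(canada) and is_blank(usa) else None


# Sample (Canada, USA) pairs from the question, mixing None, '' and ' '.
rows = [
    (None, None),
    ("", " "),
    ("912796NZ8", ""),
    (None, "US912796NZ81"),
    ("912796NZ8", "US912796NZ81"),
    ("62941ZPA6", None),
]

flags = [stackoverflow_flag(canada, usa) for canada, usa in rows]
for (canada, usa), flag in zip(rows, flags):
    print(repr(canada), repr(usa), repr(flag))
```

Note that joining the two blank checks with OR instead of AND would flag the last row, which is the behaviour reported for the 20 Canada-only records.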
Jet also provides the IsNull() function, which you could try in place of the IS NULL operator:
iif(IsNull(master.[Canada]) or IsNull(master.[USA]),'USER','' ) as [Stackoverflow]

postgres table does not exist, but actually it does [duplicate]

I am facing a problem similar to this question.
I have restored this database from a dump.
Here is my problem:
test_api=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------------------------+----------+---------------+------------+-------------
public | vehiclenumber | table | luvpreetsingh | 8192 bytes |
public | vehiclenumber_id_seq | sequence | luvpreetsingh | 8192 bytes |
public | launchPad_pair | table | luvpreetsingh | 8192 bytes |
All tables are in the public schema (I have posted only the relevant tables here). I am able to query the vehiclenumber table.
test_api=# select * from iot_vehiclenumber;
id | vn | rk | rt | seed | vt
----+------+-----+-------+------+----
1 | 4513 | NO | RESET | 1234 | 01
2 | 1234 | YES | RESET | 1234 | 01
(2 rows)
But I am not able to query the launchPad_pair table.
test_api=# select * from launchPad_pair;
ERROR: relation "launchpad_pair" does not exist
LINE 1: select * from launchPad_pair;
test_api=# select * from public.launchPad_pair;
ERROR: relation "public.launchpad_pair" does not exist
LINE 1: select * from public.launchPad_pair;
I ran the following query and it returns all the info (again, to make sure the schema is public):
test_api=# SELECT * FROM information_schema.columns where table_name='launchPad_pair';
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | domain_catalog | domain_schema | domain_name | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximum_cardinality | dtd_identifier | is_self_referencing | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable
---------------+--------------+----------------+-------------+------------------+-----------------------------------------------+-------------+-----------+--------------------------+------------------------+-------------------+-------------------------+---------------+--------------------+---------------+--------------------+-----------------------+----------------------+--------------------+-------------------+------------------+----------------+----------------+---------------+-------------+--------------+------------+----------+---------------+--------------+------------+---------------------+----------------+---------------------+-------------+---------------------+----------------+--------------------+------------------+------------------+----------------+--------------+-----------------------+--------------
test_api | public | launchPad_pair | id | 1 | nextval('"launchPad_pairs_id_seq"'::regclass) | NO | integer | | | 32 | 2 | 0 | | | | | | | | | | | | | test_api | pg_catalog | int4 | | | | | 1 | NO | NO | | | | | | NO | NEVER | | YES
test_api | public | launchPad_pair | vehicle_id | 2 | | NO | integer | | | 32 | 2 | 0 | | | | | | | | | | | | | test_api | pg_catalog | int4 | | | | | 2 | NO | NO | | | | | | NO | NEVER | | YES
test_api | public | launchPad_pair | Box_id | 3 | | NO | integer | | | 32 | 2 | 0 | | | | | | | | | | | | | test_api | pg_catalog | int4 | | | | | 3 | NO | NO | | | | | | NO | NEVER | | YES
I have checked many times to make sure there is no typo.
What is the problem here?
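Note that the error messages above report the name as "launchpad_pair", all lowercase, while the catalog stores it as launchPad_pair. PostgreSQL folds unquoted identifiers to lowercase and preserves case only for identifiers written in double quotes, so select * from "launchPad_pair"; would be the form to try. A minimal Python sketch of that folding rule follows; fold_identifier is a hypothetical helper written for illustration, not part of PostgreSQL or any driver.

```python
def fold_identifier(identifier):
    """Mimic how PostgreSQL resolves an identifier as typed in a query.

    Double-quoted identifiers keep their case (a doubled quote inside is an
    escaped quote); unquoted identifiers are folded to lowercase.
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


# Table names as stored in the catalog, taken from the \d+ listing.
existing_tables = {"vehiclenumber", "launchPad_pair"}

for typed in ["launchPad_pair", '"launchPad_pair"', "vehiclenumber"]:
    resolved = fold_identifier(typed)
    status = "found" if resolved in existing_tables else "does not exist"
    print(f"{typed} -> {resolved}: {status}")
```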

Access(SQL) - Count distinct fields and group by field name

I am in the middle of building an Access database for a client and am stuck on how to do the following.
I have a table called Observations with something like this:
Error Identified | Error Cat | ... | So on
No | | |
Yes | Dave3 | |
Yes | Dave | |
Yes | Dave3 | |
Yes | Dave5 | |
Yes | Dave | |
Yes | Dave6 | |
Yes | Dave6 | |
Yes | Dave | |
I want to count the number of occurrences of each [Error Cat] where [Error Identified] is Yes,
so it would be:
Error Cat | Count |
Dave | 3 |
Dave3 | 2 |
Dave5 | 1 |
Dave6 | 2 |
What is the Access SQL for this?
I have tried hard, but the SQL just won't run.
Please help.
SELECT [Error Cat], COUNT(*) AS totalCount
FROM Observations
WHERE [Error Identified] = 'Yes'
GROUP BY [Error Cat]
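The filter-then-group aggregation can be checked against the sample data from the question. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for Access, assuming [Error Identified] is stored as the text 'Yes' or 'No' (a Yes/No field in Access would be compared with True instead). SQLite accepts the same bracketed column names, so the query text stays close to the Access version.

```python
import sqlite3

# In-memory stand-in for the Access table, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Observations ([Error Identified] TEXT, [Error Cat] TEXT)")
conn.executemany(
    "INSERT INTO Observations VALUES (?, ?)",
    [
        ("No", None),
        ("Yes", "Dave3"),
        ("Yes", "Dave"),
        ("Yes", "Dave3"),
        ("Yes", "Dave5"),
        ("Yes", "Dave"),
        ("Yes", "Dave6"),
        ("Yes", "Dave6"),
        ("Yes", "Dave"),
    ],
)

# Keep only the rows flagged Yes, then count the rows per category.
counts = conn.execute(
    """
    SELECT [Error Cat], COUNT(*) AS totalCount
    FROM Observations
    WHERE [Error Identified] = 'Yes'
    GROUP BY [Error Cat]
    ORDER BY [Error Cat]
    """
).fetchall()

for category, total in counts:
    print(category, total)
```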