How to find the number of observations in PostgreSQL tables - sql

I come from a DW/BI background and have used SAS for many years. Now I have a task to find the number of records present in PostgreSQL tables on the fly.
In SAS there are system metadata tables that hold details about tables: the number of records, column information, and so on. In the same manner, is there any metadata table available in PostgreSQL to get the number of observations on the fly?
I know we can do select count(*) from table, but I don't want to do that; I want to know whether there are any built-in metadata tables in PostgreSQL that expose the number of records present in a table.
Your help is highly appreciated.

The pg_class system catalogue contains information about each relation (table, index, view, sequence...). For tables, this includes an estimate of the number of tuples (rows) and of the disk pages taken up by the table, e.g.:
SELECT reltuples, relpages FROM pg_class WHERE oid = 'table_name'::regclass;
Note that reltuples is of "real" type and so stores only about 6 significant digits; it is an estimate maintained by commands such as VACUUM and ANALYZE, not an exact count.
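If you want the estimate for every table at once, a query along these lines works (a sketch; the public schema, and PostgreSQL 9.5+ for the regnamespace cast, are assumptions):
SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relkind = 'r'                          -- ordinary tables only
  AND relnamespace = 'public'::regnamespace  -- assumed schema
ORDER BY estimated_rows DESC;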

Related

Find the data size occupied per row by each foreign key

I have stored customer data in one table, and another 21 tables have foreign keys to the customer table. Now I want to find the data size for each customer in SQL Server.
One more thing: there are some other tables which have foreign keys to these 21 tables; I also want to find and add the data size from those tables.
How can I find the TOTAL data size - any ideas?
You must find all the tables associated with the customer table by a foreign key, using the SQL ISO standard INFORMATION_SCHEMA views: TABLE_CONSTRAINTS, REFERENTIAL_CONSTRAINTS...
Once you have all the tables, you must get the object_id of each of them.
Then you must use sys.dm_db_partition_stats and compute an aggregation over the "used_page_count" of these tables. Because pages are 8 KB, you can turn this SUM into a size with * 8 for KB or / 128 for MB.
If you want the average size per customer, divide the result by the count of customers... Due to fragmentation, the exact data size for each customer can change heavily for the exact same data when other data movements are executed or when some maintenance is done...
If you need more help, please post the exact names of your tables, including the schema name.
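A rough sketch of the two steps in T-SQL (the table names dbo.Customer and dbo.OrderHistory are hypothetical):
-- Step 1: tables that reference dbo.Customer through a foreign key
SELECT tc.TABLE_SCHEMA, tc.TABLE_NAME
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
  ON tc.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
 AND tc.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS pk
  ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
 AND pk.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
WHERE pk.TABLE_NAME = 'Customer';

-- Step 2: size of one referencing table, in KB (pages are 8 KB)
SELECT SUM(used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.OrderHistory');
Note that this gives total table sizes; splitting the size per customer would additionally need per-customer row counts in each referencing table.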

How to filter the SAP tables by the number of fields?

SAP table DD02L lists, for each table in SAP, among other things, the number of fields it contains. For example, table PLPO (PM task lists) contains 244 fields, according to T-Code S_PH0_48000138. For business reporting using SQL I only want to see 5 or 6 field values at most, but the entire table is replicated, all 244 fields!
So, I want to know how many transparent tables consist of more than, say, 20 fields. If I run the above T-Code one table at a time, it will take me 10 years.
I am not an experienced ABAPer, so I do not know how to set this up.
Table DD02L is "only" a list of all tables in SAP. I have not found any field in this table that tells how many fields the actual table has.
What can be used instead is table DD03L (SAP table fields), which lists all fields of all tables in SAP. The fields are listed by position, which means we can select all tables where a field with position 21 exists (position 21 exists = there are more than 20 fields in the table):
SELECT FROM dd02l
  INNER JOIN dd03l
    ON dd02l~tabname  EQ dd03l~tabname  AND
       dd02l~as4local EQ dd03l~as4local AND
       dd02l~as4vers  EQ dd03l~as4vers
  FIELDS dd02l~tabname
  WHERE dd02l~tabclass EQ 'TRANSP' "only transparent tables
    AND dd03l~position EQ '0021'
  INTO TABLE @DATA(lt_dd02l).
The result (internal table lt_dd02l) will contain all the tables which have more than 20 fields. I am currently on an R/3 system; the query took only a few seconds, but there were still over 12,000 tables with more than 20 fields.
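For readers who can query the database directly rather than through ABAP, the same check can be sketched in plain SQL (an assumption-laden sketch; it presumes the dictionary tables are readable under their dictionary names and column spellings):
SELECT DISTINCT t.TABNAME
FROM DD02L AS t
INNER JOIN DD03L AS f
   ON f.TABNAME  = t.TABNAME
  AND f.AS4LOCAL = t.AS4LOCAL
  AND f.AS4VERS  = t.AS4VERS
WHERE t.TABCLASS = 'TRANSP'   -- only transparent tables
  AND f.POSITION = '0021';    -- a field at position 21 exists, i.e. more than 20 fields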

BigQuery Performance - Is there a way to partition by multiple dimensions (Date and customer)? One Big table Vs 400 Tables

We are evaluating cloud warehouse options to build an analytics solution. We need to provide trend analysis per day and per customer across many customers (400+); the ratio of queries across these two dimensions is about equal. My initial thought is to create one date-partitioned table per customer, so that for per-customer queries I limit the scan to one particular day, and for queries across all customers I use the table wildcard feature.
Questions:
Is there a way to partition by date and customer, so I can store all the data in one table and still limit the data scan volume?
If the answer to #1 is no, what is the performance impact of querying across 400 tables vs one table (with the same amount of data)?
Hash partitioning and partitioning by specific fields in a table are not supported yet, so this is not feasible right now.
If you query the 400 tables using wildcards and filter customers via _TABLE_SUFFIX, the query will only read the matching tables and you'll only be charged for those tables; if you query one table, you can filter dates using _PARTITIONTIME and you'll only be charged for the matching partitions. Performance-wise, less metadata is read if you query one table, but that shouldn't amount to much for 400 tables.
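For illustration, the two options might look like this in standard SQL (a sketch; the project, dataset, and table names are placeholders):
-- 400 tables: wildcard plus _TABLE_SUFFIX filter
SELECT COUNT(*) AS impressions
FROM `myproject.ads.events_*`
WHERE _TABLE_SUFFIX IN ('customer_001', 'customer_002');

-- one date-partitioned table: filter on _PARTITIONTIME
SELECT COUNT(*) AS impressions
FROM `myproject.ads.events`
WHERE _PARTITIONTIME = TIMESTAMP('2016-06-01');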

SSRS 2008 R2 Data Region Embedded in Another Data Region

I have two unrelated tables (Table A and Table B) that I would like to join to create a unique list of pairings of the two. So, each row in Table A will pair with each row in Table B creating a list of unique pairings between the two tables.
My ideas of what can be done:
I can either do this in the query (SQL) by creating one dataset and outputting two fields (each row equaling a unique pairing),
or by creating two different datasets (one for each table) and embedding one data region within another, each data region pulling from a different dataset.
I have tried implementing the second method, but it would not let me select a dataset for the embedded data region different from the parent data region's.
I have not tried the first method, but I do not understand how, or even whether, it is possible in SQL.
Any help or guidance in this matter would be greatly appreciated!
The first is called a cross join:
select t1.*, t2.*
from t1 cross join
t2;
Whether you should do this in the application or in the database is open to question. It depends on the size of the tables and the bandwidth to the database -- there is an overhead to pulling rows from a database.
If each table has 2 rows, this is a non-issue. If each table has 100 rows, then you would be pulling 10,000 rows from the database, and it might be faster to pull 2 * 100 = 200 rows and do the looping in the application.
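Applied to the report, the single-dataset approach from the first method could be a query like this (TableA, TableB, and the column names are placeholders):
SELECT a.ValueA, b.ValueB
FROM TableA AS a
CROSS JOIN TableB AS b;
Bind an ordinary table data region to this one dataset; each returned row is one unique pairing, which sidesteps the nested-data-region restriction.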

BigQuery dataset design, multiple vs single tables for storing the same type of data

I'm planning to build a new ads system, and we are considering using Google BigQuery.
I'll quickly describe my data flow:
Each user will be able to create multiple ads (1 user, N ads).
I would like to store the ad impressions, and I thought of 2 options.
Option 1: create one table for impressions; for example, a table named Impressions with fields (userid, adsid, datetime, metadata fields...).
With this option, all my impressions will be stored in a single table.
Main pros: I'll be able to do big data queries quite easily.
Main cons: the table will be huge, and with multiple queries I'll end up paying too much :(
Option 2 is to create a table per ad.
For example, ad id 1 will create
Impression_1 with fields (datetime, metadata fields)
Pros: queries are cheaper, each data table is smaller.
Cons: to do a big data query I'll sometimes have to create a union, and things will get complex.
I wonder what your thoughts are regarding this?
In BigQuery it's easy to do this, because you can create tables per day and query only those tables.
You also have table wildcard functions, which are a cost-effective way to query data from a specific set of tables. When you use a table wildcard function, BigQuery only accesses (and charges you for) the tables that match the wildcard. Table wildcard functions are specified in the query's FROM clause.
Assuming you have some tables like:
mydata.people20140325
mydata.people20140326
mydata.people20140327
You can query like:
SELECT
  name
FROM
  (TABLE_DATE_RANGE(mydata.people,
                    TIMESTAMP('2014-03-25'),
                    TIMESTAMP('2014-03-27')))
WHERE
  age >= 35
Also there are Table Decorators:
Table decorators support relative and absolute <time> values. Relative values are indicated by a negative number, and absolute values are indicated by a positive number.
To get a snapshot of the table as it was one hour ago (3,600,000 milliseconds):
SELECT COUNT(*) FROM [data-sensing-lab:gartner.seattle#-3600000]
There is also TABLE_QUERY, which you can use for more complex queries.
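For example, a TABLE_QUERY version of the query above could select tables by name pattern instead of by date range (a sketch in legacy SQL; mydata and the people prefix follow the earlier example):
SELECT name
FROM (TABLE_QUERY(mydata, 'table_id CONTAINS "people"'))
WHERE age >= 35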