Google Colaboratory data table display max 20 columns - google-colaboratory

I find the data table display useful, but I get this warning: "Warning: Total number of columns (28) exceeds max_columns (20) limiting to first max_columns". Is there a way to go beyond 20 columns?
Thanks for any suggestions.

You can increase the limit:
from google.colab.data_table import DataTable
DataTable.max_columns = 30
Then it should display as you want.

Related

Complex SSRS Report to calculate a field

I am stuck with this report pattern; can anyone help me out with how to deal with this situation? Here is what I want to accomplish in SSRS.
I have a table of units distribution:
High: 10 (10/30) = 33%
Low: 20 (20/30) = 66%
Total: 30
How can we use the total value of the High and Low rows to calculate the distribution?
A sample picture shows the formulas table and the data table side by side. I want to achieve the data table on the right of the picture, i.e. implement the formulas-table pattern on the data table using SSRS; I have already pulled the data.
Thank you
Create a dataset that returns those rows as columns, then display the table vertically (columns to rows), since the structure is fixed.
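The arithmetic behind those formulas is just each row's count divided by the grand total; in an SSRS expression that would look something like =Fields!Units.Value / Sum(Fields!Units.Value, "DataSet1") (the dataset name here is an assumption). A minimal Python sketch of the same calculation:

```python
def distribution(units):
    """Share of the grand total for each row: count / total."""
    total = sum(units.values())
    return {label: count / total for label, count in units.items()}

shares = distribution({"High": 10, "Low": 20})
# High = 10/30 (about 33%), Low = 20/30 (about 67%)
```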

How to get repeatable sample using Presto SQL?

I am trying to get a sample of data from a large table and want to make sure it can be repeated later on. Other SQL dialects allow repeatable sampling with either a seed, e.g. set.seed(integer), or a repeatable(integer) clause. However, this is not working for me in Presto. Is such a command not available yet? Thanks.
One solution is to simulate the sampling by adding a column (or creating a view) with random content, such as a UUID, and then selecting rows by filtering on that column (for example, UUIDs ending with '1'). You can tune the condition to get the sample size you need.
By design, the result is random yet repeatable across multiple runs, because the column values are fixed once written.
If you are using Presto 0.263 or higher you can use key_sampling_percent to reproducibly generate a double between 0.0 and 1.0 from a varchar.
For example, to reproducibly sample 20% of records in table using the id column:
select
id
from table
where key_sampling_percent(id) < 0.2
If you are using an older version of Presto (e.g. AWS Athena), you can use what's in the source code for key_sampling_percent:
select
id
from table
where (abs(from_ieee754_64(xxhash64(cast(id as varbinary)))) % 100) / 100. < 0.2
I have found that you have to use from_big_endian_64 instead of from_ieee754_64 to get reliable results in Athena. Otherwise I got too many numbers close to zero, because of the negative exponents.
select id
from table
where (abs(from_big_endian_64(xxhash64(cast(id as varbinary)))) % 100) / 100. < 0.2
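The bucketing trick in these queries — hash the key, reduce it modulo 100, keep rows under the threshold — can be checked outside Presto. A sketch in Python, with md5 standing in for xxhash64 (any stable hash works; the function name in_sample is made up):

```python
import hashlib

def in_sample(row_id: str, fraction: float = 0.2) -> bool:
    """Map an id deterministically into one of 100 buckets and keep it
    if its bucket falls below the sampling fraction."""
    bucket = int(hashlib.md5(row_id.encode()).hexdigest(), 16) % 100
    return bucket / 100.0 < fraction

ids = [f"id-{n}" for n in range(1000)]
sample_a = [i for i in ids if in_sample(i)]
sample_b = [i for i in ids if in_sample(i)]
# The two samples are identical: the hash, unlike random(), never changes.
```

Because the bucket depends only on the id, re-running the query (or the script) always selects the same rows, which is exactly the repeatability the question asks for.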
You may create a simple intermediate table with selected ids:
CREATE TABLE IF NOT EXISTS <temp1>
AS
SELECT <id_column>
FROM <tablename> TABLESAMPLE SYSTEM (10);
This table will contain only the sampled ids and will be ready to use downstream in your analysis via a JOIN with the data of interest.

SQL Server CTE average with conditions

I've been trying to visualise how to do this with a CTE, as that appears on the surface to be the best way, but I just can't get it going. Maybe it needs a temp table as well. I am using SQL Server 2008 R2.
I need to create intercepts (essentially a length along a line) with the following parameters.
The average assay of the intercept must be greater than 0.7
The aim is to get the largest intercept possible
There can be up to 2 consecutive meters of values less than 0.7 (internal waste), but no more
There is no limit to the total internal waste within an intercept
There is no minimum intercept length (well, there is, but I'll take care of that later)
Note: there will be no gaps, as I have taken care of that, and the from and to values can be decimal.
An example is shown in the (omitted) screenshot; a second image shows the same data in space, with the assay on the left and the depth on the right.
So, for a little more clarity if needed: intervals 6 to 7 and 17 to 18 are not part of the larger intercept because the internal waste (7-9 and/or 15-17) would bring the average below 0.7, not because of the amount of internal waste.
However, the result for 21-22 is not included because there are 3 meters of internal waste between it and the result for 17-18.
Note that there are multiple sites and areas, which form part of the original table's primary key, so I imagine a partition by area and site would be used in any ROW_NUMBER() OVER statements.
Edit: the original code had errors in the from and to values (multiple 14-to-15 rows), which would have been confusing; sorry. There will be no overlapping from/to ranges, which hopefully simplifies things.
Example values to use:
create table #temp_inter (
area nvarchar(10),
site_ID nvarchar(10),
d_from decimal (18,3),
d_to decimal (18,3),
assay decimal (18,3))
insert into #temp_inter
values ('area_1','abc','0','5','0'),
('area_1','abc','5','6','0.165'),
('area_1','abc','6','7','0.761'),
('area_1','abc','7','8','0.321'),
('area_1','abc','8','9','0.292'),
('area_1','abc','9','10','1.135'),
('area_1','abc','10','11','0.225'),
('area_1','abc','11','12','0.983'),
('area_1','abc','12','13','0.118'),
('area_1','abc','13','14','0.438'),
('area_1','abc','14','15','0.71'),
('area_1','abc','15','16','0.65'),
('area_1','abc','16','17','2'),
('area_1','abc','17','18','0.367'),
('area_1','abc','18','19','0.047'),
('area_1','abc','19','20','0.71'),
('area_1','abc','20','21','0'),
('area_1','abc','21','22','0'),
('area_1','abc','22','23','0'),
('area_1','abc','23','24','2'),
('area_1','abc','24','25','0'),
('area_1','abc','25','26','0'),
('area_1','abc','26','30','0'),
('area_2','zzz','0','5','0'),
('area_2','zzz','5','6','1.165'),
('area_2','zzz','6','7','0.396'),
('area_2','zzz','7','8','0.46'),
('area_2','zzz','8','9','0.111'),
('area_2','zzz','9','10','0.053'),
('area_2','zzz','10','11','0.057'),
('area_2','zzz','11','12','0.055'),
('area_2','zzz','12','13','0.03'),
('area_2','zzz','13','14','0.026'),
('area_2','zzz','14','15','0.194'),
('area_2','zzz','15','16','0.367'),
('area_2','zzz','16','17','0.431'),
('area_2','zzz','17','18','0.341'),
('area_2','zzz','18','19','0.071'),
('area_2','zzz','19','20','0.26'),
('area_2','zzz','20','21','0.659'),
('area_2','zzz','21','22','0.602'),
('area_2','zzz','22','23','2.436'),
('area_2','zzz','23','24','0.874'),
('area_2','zzz','24','25','3.173'),
('area_2','zzz','25','26','0.179'),
('area_2','zzz','26','27','0.065'),
('area_2','zzz','27','28','0.024'),
('area_2','zzz','28','29','0')
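The rules above are easier to pin down procedurally before translating them into a CTE. This Python brute force (applied per area/site slice; parameter names are mine, and it assumes an intercept must start and end on assay >= 0.7) keeps the length-weighted average above 0.7 and never allows more than 2 consecutive meters of internal waste:

```python
def longest_intercept(rows, cutoff=0.7, max_waste=2.0):
    """rows: list of (d_from, d_to, assay) tuples, sorted by depth.
    Returns (d_from, d_to) of the longest qualifying intercept, or None."""
    best = None
    for i in range(len(rows)):
        if rows[i][2] < cutoff:
            continue                          # must start on ore
        for j in range(i, len(rows)):
            if rows[j][2] < cutoff:
                continue                      # must end on ore
            span = rows[i:j + 1]
            length = sum(t - f for f, t, _ in span)
            grade = sum((t - f) * a for f, t, a in span) / length
            run, ok = 0.0, True
            for f, t, a in span:
                run = run + (t - f) if a < cutoff else 0.0
                if run > max_waste:           # > 2 m of consecutive waste
                    ok = False
                    break
            if ok and grade > cutoff and (
                    best is None or length > best[1] - best[0]):
                best = (span[0][0], span[-1][1])
    return best

# Toy check: 2 m of internal waste, weighted average 0.75 -> intercept 0-4.
toy = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 0.5), (3, 4, 1.0)]
```

Once the procedural version agrees with the expected picks, the same window logic can be rebuilt with ROW_NUMBER() and running sums in SQL.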

Displaying data in grid view page by page

I have more than 30,000 rows in a table, and it takes a long time to load all of the data into the grid view. So I want to display 100 rows at a time: when I click the next-page button, the next 100 rows should be displayed; when I click the previous-page button, the previous 100 rows; and if I type page 5 into a text box, I want to jump to the 5th lot of rows.
I also want to display how many pages there will be. Can we implement this concept in a VB.NET [WinForms] grid view? I am using PostgreSQL as the database.
Can anybody give me a hint or outline the concept?
Look at OFFSET and LIMIT in PostgreSQL.
Your query for the 5th page could look like this:
SELECT *
FROM tbl
ORDER BY id
OFFSET 400
LIMIT 100;
id is the primary key in my example, therefore an index is in place automatically.
If you access the table a lot in this fashion, performance may profit from using CLUSTER.
Total number of pages, with 1235 standing in for the total row count (SELECT count(*) FROM tbl):
SELECT ceil(1235::real / 100)::int;
If you wanted the number rounded down, just simplify to:
SELECT 1235 / 100;
With both numbers being integers, the result will be of integer type and fractional digits are truncated automatically. But I think you need to round up here.
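The same paging pattern can be exercised end to end; here with Python's built-in sqlite3 standing in for PostgreSQL, since both accept LIMIT ... OFFSET ... (table and column names are made up):

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO tbl (payload) VALUES (?)",
                [(f"row {n}",) for n in range(1235)])

PAGE_SIZE = 100

def fetch_page(page):
    """Return one 1-based page of rows, ordered by the indexed key."""
    return con.execute(
        "SELECT id FROM tbl ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE)).fetchall()

total_rows = con.execute("SELECT count(*) FROM tbl").fetchone()[0]
total_pages = math.ceil(total_rows / PAGE_SIZE)  # 1235 rows -> 13 pages
```

Page 5 then returns ids 401-500, matching the OFFSET 400 LIMIT 100 query above, and the last page simply comes back short.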

transform rows into columns in a sql table

Suppose I would like to store a table with 440 rows and 138,672 columns. As the SQL limit is 1024 columns, I would like to transpose rows into columns, i.e. convert the 440 rows and 138,672 columns into 138,672 rows and 440 columns.
Is this possible?
The SQL Server limit is actually 30,000 columns; see Sparse Columns.
But creating a query that returns 30k columns (not to mention 138k+) would be basically unmanageable: the sheer size of the metadata on each query result would slow the client to a crawl. One simply does not design databases like that. Go back to the drawing board; when you reach 10 columns, stop and think, and when you reach 100 columns, erase the board and start anew.
And read this: Best Practices for Semantic Data Modeling for Performance and Scalability.
The description of the data is as follows....
Each attribute describes the measurement of the occupancy rate
(between 0 and 1) of a captor location as recorded by a measuring
station, at a given timestamp in time during the day.
The ID of each station is given in the stations_list text file.
For more information on the location (GPS, Highway, Direction) of each
station please refer to the PEMS website.
There are 963 (stations) x 144 (timestamps) = 138,672 attributes for
each record.
This is perfect for normalisation.
You can have a stations table and a measurements table: two nice, long, thin tables.
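A sketch of that normalised shape, using Python's sqlite3 so it stays self-contained (all table and column names are illustrative): each of the 963 × 144 attributes becomes one measurement row keyed by record, station, and time slot, instead of one of 138,672 columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stations (
        station_id INTEGER PRIMARY KEY
    );
    CREATE TABLE measurements (
        record_id  INTEGER NOT NULL,
        station_id INTEGER NOT NULL REFERENCES stations,
        slot       INTEGER NOT NULL,  -- timestamp index within the day
        occupancy  REAL NOT NULL,     -- rate between 0 and 1
        PRIMARY KEY (record_id, station_id, slot)
    );
""")

# Unpivot one wide record (values flattened station-major) into long rows.
# Toy sizes here; the real data would use 963 stations and 144 slots.
N_STATIONS, N_SLOTS = 3, 4
wide_record = [0.5] * (N_STATIONS * N_SLOTS)

con.executemany("INSERT INTO stations VALUES (?)",
                [(s,) for s in range(N_STATIONS)])
con.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [(1, i // N_SLOTS, i % N_SLOTS, v) for i, v in enumerate(wide_record)])
```

Queries then filter rows (WHERE station_id = ... AND slot = ...) instead of selecting among 138k columns.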