How to get only Integer values from a column - sql

I'm using SQL in Databricks and I wish to create a view of a table. The data looks similar to this:
A | B | C
------------------------
AA | AB | 1
AA | AC | 1.5
AA | AD | 2
AA | AE | 3
Basically, I want to read the table so that only the rows where C holds an integer value are included, giving:
A | B | C
------------------------
AA | AB | 1
AA | AD | 2
AA | AE | 3
I've tried using code similar to this:
WHERE df.C NOT LIKE '.%[0-9$]%'
But this doesn't work, and similarly I tried this too:
Where IsNumeric(df.C) = 0x1
But IsNumeric doesn't seem to work in Databricks. If anyone has any ideas I'd greatly appreciate it, thank you!
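A minimal sketch of one approach, run against an in-memory SQLite table (the table name t and the function name integer_rows are hypothetical): keep a row only when C equals its own value truncated to an integer. In Databricks SQL the equivalent predicate would likely be WHERE C = FLOOR(C) or WHERE C = CAST(C AS INT), assuming C is a numeric column. If C is stored as a string, cast it to DOUBLE first.

```python
import sqlite3


def integer_rows(conn):
    """Return the rows whose C value has no fractional part.

    CAST(C AS INTEGER) truncates toward zero, so C = CAST(C AS INTEGER)
    is true only when C is already a whole number (negative values included).
    """
    return conn.execute(
        "SELECT A, B, C FROM t WHERE C = CAST(C AS INTEGER) ORDER BY B"
    ).fetchall()


# Hypothetical table reproducing the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A TEXT, B TEXT, C REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("AA", "AB", 1), ("AA", "AC", 1.5), ("AA", "AD", 2), ("AA", "AE", 3)],
)

for row in integer_rows(conn):
    print(row)
```

Comparing a value with its truncation avoids string matching (LIKE) entirely, so it does not depend on how the number happens to be formatted.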

Related

How to extract genotypes from public 1000G bigquery table

I would like to extract GENOTYPE information from bigquery-public-data:human_genome_variants.1000_genomes_phase_3_optimized_schema_variants_20150220 by SQL.
Given how the data is organized in the table, I find this a difficult task. I've used SQL occasionally, but not for complex queries, so I would be grateful for some guidance.
For this task the relevant columns are:
names (SNPid)
reference_bases
hom_ref_call (sample names that are homozygous for the reference bases; therefore each of these samples has genotype 'reference_bases'+'reference_bases').
By running the query:
SELECT ARRAY_TO_STRING(names, '') as SNP,
samples,
CONCAT(reference_bases, reference_bases) as GT
FROM `bigquery-public-data.human_genome_variants.1000_genomes_phase_3_optimized_schema_variants_20150220`
CROSS JOIN UNNEST(hom_ref_call) as samples
I get:
+-----+------------+---------+----+
| Row | SNP | samples | GT |
+-----+------------+---------+----+
| 1 | rs10158087 | HG00096 | GG |
| 2 | rs10158087 | HG00097 | GG |
| 3 | rs10465663 | HG00096 | CC |
| 4 | rs10465663 | HG00097 | CC |
+-----+------------+---------+----+
The result I am looking for should look like this:
+-----+------------+---------+---------+
| Row | SNP | HG00096 | HG00097 |
+-----+------------+---------+---------+
| 1 | rs10158087 | GG | GG |
| 2 | rs10465663 | CC | CC |
+-----+------------+---------+---------+
How should I structure the query to get the desired table? Thanks.
Note: there are 3,500 samples, so the column names (HG00096,...) should be generated automatically.
For your sample, you can use conditional aggregation:
SELECT ARRAY_TO_STRING(names, '') as SNP,
       MAX(CASE WHEN samples = 'HG00096' THEN CONCAT(reference_bases, reference_bases) END) as HG00096,
       MAX(CASE WHEN samples = 'HG00097' THEN CONCAT(reference_bases, reference_bases) END) as HG00097
FROM `bigquery-public-data.human_genome_variants.1000_genomes_phase_3_optimized_schema_variants_20150220` CROSS JOIN
     UNNEST(hom_ref_call) as samples
GROUP BY SNP;
Do note that you need to know the columns that you want in advance.
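Because the column list must be known in advance, one option for 3,500 samples is to generate the query text from the list of sample names. Below is a minimal sketch of that idea in Python. It assumes the unnested (SNP, samples, GT) rows from the question's first query are available as a table, here a hypothetical SQLite table named calls, and build_pivot_sql is a hypothetical helper. In BigQuery the generated string could be run with a scripting statement such as EXECUTE IMMEDIATE, but that step is not shown.

```python
import re
import sqlite3


def build_pivot_sql(sample_names, source="calls"):
    """Build a conditional-aggregation query with one output column per sample.

    Sample names are spliced into the SQL text as identifiers, so they are
    validated first to rule out anything other than simple names.
    """
    cols = []
    for name in sample_names:
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe sample name: {name!r}")
        cols.append(f"MAX(CASE WHEN samples = '{name}' THEN GT END) AS {name}")
    return (
        "SELECT SNP,\n  " + ",\n  ".join(cols)
        + f"\nFROM {source}\nGROUP BY SNP\nORDER BY SNP"
    )


# Hypothetical long-format table holding the output shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (SNP TEXT, samples TEXT, GT TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("rs10158087", "HG00096", "GG"),
        ("rs10158087", "HG00097", "GG"),
        ("rs10465663", "HG00096", "CC"),
        ("rs10465663", "HG00097", "CC"),
    ],
)

# Read the sample names from the data instead of typing them by hand.
names = [n for (n,) in conn.execute("SELECT DISTINCT samples FROM calls ORDER BY samples")]
sql = build_pivot_sql(names)
print(sql)
print(conn.execute(sql).fetchall())
```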

Using RANKX in PowerBI DAX

I am a PowerBI newbie and I have been playing with DAX functions, more specifically, the RANKX function. Here is my data set:
+----------+-------------------------------------+-----------------+----------+
| Category | Sub Category | Date | My Value |
+----------+-------------------------------------+-----------------+----------+
| A | A1 | 2018-01-01 | 2 |
| A | A2 | 2018-01-02 | 4 |
| A | A3 | 2018-01-03 | 6 |
| A | A4 | 2018-01-04 | 6 |
| B | B1 | 2018-01-05 | 21 |
| B | B2 | 2018-01-06 | 22 |
| B | B2 | 2018-01-07 | 23 |
| C | C1 | 2018-01-08 | 35 |
| C | C2 | 2018-01-09 | 35 |
| C | C3 | 2018-01-10 | 35 |
+----------+-------------------------------------+-----------------+----------+
And below is my code:
Rank all rows as Column =
RANKX(
'Table',
'Table'[My Value]
)
Unfortunately, I am getting the following error:
A single value for column 'My Value' in table 'Table' cannot be
determined. This can happen when a measure formula refers to a column
that contains many values without specifying an aggregation such as
min, max, count, or sum to get a single result.
Any help would be greatly appreciated.
Thanks
There is nothing wrong with your formula; you just put it in the wrong place.
There are two ways you can write DAX formulas in PowerBI:
as a calculated column
as a measure
The difference is critical, and you need to understand it if you want to use PowerBI.
The formula you wrote is for calculated columns. If you create it as a measure, you will get an error. To fix the problem, go to the "Model" tab, click "New Column", paste your code, and it should work.
If you need RANKX as a measure, Chrisoffer has given you a good answer.
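To see what the calculated column computes, here is a small Python model of RANKX's default ranking rule (descending order; tied values share a rank and the following ranks are skipped) applied to the My Value column above. It only imitates the ranking rule, not how the DAX engine evaluates the formula, and the function name rankx_skip_desc is hypothetical.

```python
def rankx_skip_desc(values):
    """Rank every value against the whole list.

    Highest value = 1; equal values share a rank and the next rank is
    skipped, mimicking RANKX's default DESC order with Skip ties.
    """
    return [1 + sum(1 for other in values if other > v) for v in values]


# "My Value" column from the question, in table order.
my_value = [2, 4, 6, 6, 21, 22, 23, 35, 35, 35]
for value, rank in zip(my_value, rankx_skip_desc(my_value)):
    print(value, rank)
```

Each row's rank is one plus the number of rows with a strictly larger value, which is why the three 35s all get rank 1 and the next value, 23, gets rank 4.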
Create a measure that sums the "My Value" column:
Sum value = SUM('Table'[My Value])
Then use this measure to get your rank:
Rank all rows as Column =
RANKX(ALL('Table'), [Sum value])
This will give you the rank of each sub category.
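Assuming the measure is shown in a visual grouped by Sub Category, the expected ranking can be checked by hand: total My Value per sub category, then rank the totals the same way (highest = 1, ties skip). The Python below only models that intended result under this assumption; it is not a reimplementation of DAX, and rank_subcategories is a hypothetical name. Note that B2 appears twice in the data, so its total is 22 + 23 = 45.

```python
from collections import defaultdict

# (Category, Sub Category, Date, My Value) rows from the question.
ROWS = [
    ("A", "A1", "2018-01-01", 2),
    ("A", "A2", "2018-01-02", 4),
    ("A", "A3", "2018-01-03", 6),
    ("A", "A4", "2018-01-04", 6),
    ("B", "B1", "2018-01-05", 21),
    ("B", "B2", "2018-01-06", 22),
    ("B", "B2", "2018-01-07", 23),
    ("C", "C1", "2018-01-08", 35),
    ("C", "C2", "2018-01-09", 35),
    ("C", "C3", "2018-01-10", 35),
]


def rank_subcategories(rows):
    """Sum My Value per sub category, then rank the totals (highest = 1, ties skip)."""
    totals = defaultdict(int)
    for _category, sub, _date, value in rows:
        totals[sub] += value
    return {
        sub: 1 + sum(1 for other in totals.values() if other > total)
        for sub, total in totals.items()
    }


for sub, rank in sorted(rank_subcategories(ROWS).items(), key=lambda kv: kv[1]):
    print(sub, rank)
```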

postgres: Multiply column of table A with rows of table B

Fellow SOers,
Currently I am stuck with the following problem.
Say we have table "data" and table "factor"
"data":
---------------------
| col1 | col2 |
----------------------
| foo | 2 |
| bar | 3 |
----------------------
and table "factor" (the number of rows is variable)
---------------------
| name | val |
---------------------
| f1 | 7 |
| f2 | 8 |
| f3 | 9 |
| ... | ... |
---------------------
and the following result should look like this:
---------------------------------
| col1 | f1 | f2 | f3 | ...|
---------------------------------
| foo | 14 | 16 | 18 | ...|
| bar | 21 | 24 | 27 | ...|
---------------------------------
So basically I want the column "col2" multiplied by every value of "val" in table "factor", AND the content of column "name" should act as the column header of the result.
We are using PostgreSQL 9.3 (upgrading to a later version may be possible). An extended search turned up several possible approaches: crosstab (though even with crosstab I could not figure this one out) and a CTE using WITH (preferred, but also no luck). This might also be possible with the correct use of array() and unnest().
Any help on how to achieve this is appreciated (the less code, the better).
Thanks in advance!
This package seems to do what you want:
https://github.com/hnsl/colpivot
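A minimal sketch of one way to split the work, using Python's sqlite3 module as a stand-in for PostgreSQL (the query itself is plain SQL that should also run on 9.3). A CROSS JOIN produces every col2 * val product in long form, and the pivot into one column per factor name happens in client code, because a SQL query's output columns have to be fixed when the query is written. That is also why crosstab needs an explicit column list. The function name multiply_pivot is hypothetical.

```python
import sqlite3


def multiply_pivot(conn):
    """Multiply data.col2 by every factor.val and pivot the products by factor name.

    Returns (header_names, {col1: {factor_name: product}}).
    """
    names = [n for (n,) in conn.execute("SELECT name FROM factor ORDER BY name")]
    table = {}
    # Long format: one (col1, name, product) row per data row and factor row.
    for col1, name, product in conn.execute(
        "SELECT d.col1, f.name, d.col2 * f.val FROM data d CROSS JOIN factor f"
    ):
        table.setdefault(col1, {})[name] = product
    return names, table


# Hypothetical in-memory copies of the "data" and "factor" tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (col1 TEXT, col2 INTEGER);
CREATE TABLE factor (name TEXT, val INTEGER);
INSERT INTO data VALUES ('foo', 2), ('bar', 3);
INSERT INTO factor VALUES ('f1', 7), ('f2', 8), ('f3', 9);
""")

names, table = multiply_pivot(conn)
print(" | ".join(["col1"] + names))
for col1, products in table.items():
    print(" | ".join([col1] + [str(products[n]) for n in names]))
```

Adding a row to factor simply adds a column to the printed result, with no change to the query.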

SQL query to get the difference between dates in a SELECT statement

I have two tables called RFS and RFS_History.
RFS_id | name
--------+--------
12 | xx
14 | yy
15 | zz
Figure 1: RFS table
RFS_id | gate | End | start
--------+-------+--------+-------
12 | aa | 19/02 | 20/03
12 | bb | 30/01 | 12/08
12 | cc | 30/01 | 12/08
13 | aa | 30/01 | 12/08
12 | dd | 30/01 | 12/08
Figure 2: RFS_History table
My initial query is a SELECT * query to get the rows where the RFS name is 'xx':
SELECT * FROM RFS, RFS_History
WHERE RFS.name = 'xx' AND RFS_History.RFS_id = RFS.RFS_id
The result is:
RFS_id | gate | End | start
--------+-------+--------+-------
12 | aa | 19/02 | 19/01
12 | bb | 12/04 | 12/02
12 | cc | 20/03 | 12/03
12 | dd | 30/09 | 12/08
Figure 3: query result
However, I want the result in the following format:
RFS_id | gate_aa | gate_bb | gate_cc | gate_dd
----------------------------------------------
12 | 30 days | 60 days | 8 days | 18 days
gate_aa is the duration of gate aa, i.e. the difference between its start and end dates. Please help me write a single query that produces this result.
Use DATEDIFF() to get the date difference and PIVOT to turn rows into columns, in your case one column per gate.
Sample syntax:
SELECT DATEDIFF(day,'2008-06-05','2008-08-05') AS DiffDate
You can use the query below to get the difference between the dates:
SELECT RFS.ID,(RFS_HISTORY.end_t-RFS_HISTORY.start_t) AS DiffDate,gate FROM RFS, RFS_HISTORY
WHERE name='xx' And RFS_HISTORY.ID=RFS.ID group by RFS.ID,gate,RFS_HISTORY.end_t,RFS_HISTORY.start_t
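I think you want to convert rows into columns based on the gate values. This can be done with pivoting; in SQL Server, for example:
SELECT RFS_id, [aa] AS gate_aa, [bb] AS gate_bb, [cc] AS gate_cc, [dd] AS gate_dd
FROM (SELECT h.RFS_id, h.gate, DATEDIFF(day, h.start, h.[End]) AS days
      FROM RFS r JOIN RFS_History h ON h.RFS_id = r.RFS_id WHERE r.name = 'xx') AS src
PIVOT (MAX(days) FOR gate IN ([aa], [bb], [cc], [dd])) AS p
The gate values in the IN list have to be written out explicitly.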
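The same result can also be produced without PIVOT, using conditional aggregation (one MAX(CASE ...) per gate). Below is a minimal sketch using Python's sqlite3 module, under stated assumptions: the year 2018 is made up because the question only gives day/month, the End and start columns are renamed end_date and start_date (END is a reserved word), and SQLite's julianday() replaces DATEDIFF. The gate list and the function name gate_durations are hypothetical. With real dates the durations will not match the illustrative "30 days | 60 days" figures in the question.

```python
import sqlite3

# Gates to turn into columns; a hard-coded, hypothetical list.
GATES = ["aa", "bb", "cc", "dd"]


def gate_durations(conn, rfs_name):
    """One row per RFS_id with one duration column (in days) per gate."""
    # Gate names are trusted constants here, so they are spliced into the SQL text.
    cols = ",\n".join(
        f"MAX(CASE WHEN h.gate = '{g}' THEN "
        f"CAST(julianday(h.end_date) - julianday(h.start_date) AS INTEGER) END) AS gate_{g}"
        for g in GATES
    )
    sql = (
        f"SELECT h.RFS_id,\n{cols}\n"
        "FROM RFS r JOIN RFS_History h ON h.RFS_id = r.RFS_id\n"
        "WHERE r.name = ?\nGROUP BY h.RFS_id"
    )
    return conn.execute(sql, (rfs_name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RFS (RFS_id INTEGER, name TEXT);
CREATE TABLE RFS_History (RFS_id INTEGER, gate TEXT, end_date TEXT, start_date TEXT);
INSERT INTO RFS VALUES (12, 'xx'), (14, 'yy'), (15, 'zz');
INSERT INTO RFS_History VALUES
  (12, 'aa', '2018-02-19', '2018-01-19'),
  (12, 'bb', '2018-04-12', '2018-02-12'),
  (12, 'cc', '2018-03-20', '2018-03-12'),
  (12, 'dd', '2018-09-30', '2018-08-12');
""")

print(gate_durations(conn, "xx"))
```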

How can I display two rows worth of data on one line side-by-side in Report Designer?

I am using SQL Server Reporting Services 2005, and I'm developing a report in Report Designer/Business Intelligence Studio. Right now I have a normal-looking table that displays data like this:
----------------
| A | B | C |
----------------
| A1 | B1 | C1 |
----------------
| A2 | B2 | C2 |
----------------
| A3 | B3 | C3 |
----------------
What I would like to do, is display two rows side-by-side on the same line, so that the table would look like this:
-------------------------------
| A | B | C | A | B | C |
-------------------------------
| A1 | B1 | C1 | A2 | B2 | C2 |
-------------------------------
| A3 | B3 | C3 | A4 | B4 | C4 |
-------------------------------
Is this even possible? Does anyone know how to accomplish this? Google searches have turned up nothing for me so far. Thanks in advance for any help.
Ok, I figured out how to do what I wanted. I created a table with 2 (repeating) table detail rows, with the following values:
--------------------------------------------------------------------------------------------------------------------------------------------
| =Previous(Fields!A.Value) | =Previous(Fields!B.Value) | =Previous(Fields!C.Value) | =Fields!A.Value | =Fields!B.Value | =Fields!C.Value |
--------------------------------------------------------------------------------------------------------------------------------------------
| =Fields!A.Value | =Fields!B.Value | =Fields!C.Value | | | |
--------------------------------------------------------------------------------------------------------------------------------------------
Then I went to the properties of each row, and set the "hidden" value to an expression. For the first line I used this expression:
=Iif(RowNumber("table1") mod 2 = 0, false, true)
For the second line, I used this expression:
=Iif(RowNumber("table1") = CountRows("table1") AND RowNumber("table1") mod 2 = 1, false, true)
That did the trick. It now displays how I wanted.
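The logic behind the two hidden-row expressions can be modelled outside SSRS. The first detail row is visible on even row numbers and shows the previous row next to the current one. The second detail row is visible only for the last row of an odd-length dataset. A minimal Python sketch (side_by_side is a hypothetical name; i stands in for RowNumber("table1")):

```python
def side_by_side(rows):
    """Pair consecutive rows the way the two hidden-row expressions do."""
    lines = []
    n = len(rows)
    for i, row in enumerate(rows, start=1):  # i plays the role of RowNumber("table1")
        if i % 2 == 0:
            # First detail row: Previous(...) fields followed by the current fields.
            lines.append(tuple(rows[i - 2]) + tuple(row))
        elif i == n:
            # Second detail row: last row of an odd-length dataset, right half left blank.
            lines.append(tuple(row) + ("",) * len(row))
    return lines


data = [("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3")]
for line in side_by_side(data):
    print(" | ".join(line))
```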
You would need a matrix report.
Edit: although, now that I think about it, that would probably only get you something like this:
| A1 | B1 | C1 |
-------------------------------------------------------
| A | B | C | A | B | C | A | B | C |
Would that format work for you?