Compare two columns and count the result rows - sql

For each row of a table in an SQLite file, I want to count how many rows have a last column equal to that row's first column. The data set has more than 16 million rows, so efficiency is very important.
I have tried:
SELECT * FROM tab WHERE [0] = [3]
but it doesn't work, probably because it compares the first column of each row with the last column of the same row.
Let's assume this is my data set:
0 |1 |2 |3 |
--------------------------------------
2005:67 |ytg |6utgjgt |786:09 |
2005:903 |467 |009 |2005:67 |
2005:444 |355 |785 |2005:450|
2005:450 |355 |785 |N/A |
2005:934 |467 |009 |N/A |
2005:000 |355 |785 |2005:450|
2005:987 |355 |785 |2005:450|
--------------------------------------
The output should be this:
0 |1 |2 |3 |4 |
-----------------------------------------------
2005:67 |ytg |6utgjgt |786:09 |1 |
2005:450 |355 |785 |N/A |3 |
2005:934 |467 |009 |N/A |0 |
-----------------------------------------------
The rows whose 4th column matches the first column of another row are dropped from the output but still counted. (The 4th column of a row cannot match the first column of more than one row, and the first column's values are unique across rows.)
Can anybody please help me? I am a rookie and would greatly appreciate some explanation along with the code. Thank you.

With NOT EXISTS:
select t.*,
       (select count(*) from tab where [3] = t.[0]) [4]
from tab t
where not exists (
    select 1 from tab
    where [0] = t.[3]
)
Results:
| 0 | 1 | 2 | 3 | 4 |
| -------- | --- | ------- | ------ | --- |
| 2005:67 | ytg | 6utgjgt | 786:09 | 1 |
| 2005:450 | 355 | 785 | N/A | 3 |
| 2005:934 | 467 | 009 | N/A | 0 |
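To check the query against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The index names are hypothetical, and whether the indexes pay off on a 16-million-row table is an assumption to measure rather than a guarantee; the reasoning is only that both correlated subqueries look rows up by column 0 or column 3. The ORDER BY on rowid is added just to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tab ("0" TEXT, "1" TEXT, "2" TEXT, "3" TEXT)')
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("2005:67", "ytg", "6utgjgt", "786:09"),
        ("2005:903", "467", "009", "2005:67"),
        ("2005:444", "355", "785", "2005:450"),
        ("2005:450", "355", "785", "N/A"),
        ("2005:934", "467", "009", "N/A"),
        ("2005:000", "355", "785", "2005:450"),
        ("2005:987", "355", "785", "2005:450"),
    ],
)

# Hypothetical index names; each subquery filters on one of these columns.
conn.execute('CREATE INDEX idx_tab_first ON tab ("0")')
conn.execute('CREATE INDEX idx_tab_last ON tab ("3")')

query = """
SELECT t.*,
       (SELECT COUNT(*) FROM tab WHERE [3] = t.[0]) AS [4]
FROM tab t
WHERE NOT EXISTS (SELECT 1 FROM tab WHERE [0] = t.[3])
ORDER BY t.rowid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The NOT EXISTS condition keeps only rows whose last column does not appear as any row's first column, and the scalar subquery counts how many rows point at the kept row.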

Related

Trying to use SQL to group accounts by number of sub-types

I'm using a table that houses account info. Each account can have between 1 and 6 unique sub-types. Currently the table only distinguishes single from multi sub-type accounts; it doesn't show how many accounts there are for each number of sub-types (how many accounts with 2 sub-types vs. 3 sub-types and so on). I'm looking for a wholly SQL way to see how many accounts fall into each grouping. There are a LOT of accounts in the table, so pulling it manually isn't really an option. Is there a way I can get a count of accounts for each number of sub-types?
| account | Sub-Type | Single_V_Multi |
|---------|--------- | -------------- |
|123456789|123456789 | Multi |
|123456789|123456790 | Multi |
|123456789|123456791 | Multi |
|123456792|123456792 | Single |
|123456793|123456793 | Multi |
|123456793|123456794 | Multi |
|123456795|123456795 | Single |
|123456796|123456796 | Single |
|123456797|123456797 | Single |
|123456798|123456798 | Single |
|123456799|123456799 | Multi |
|123456799|123456800 | Multi |
|123456799|123456801 | Multi |
|123456799|123456802 | Multi |
From this example I'd be looking to get separate counts of the account column based on the number of unique Sub-Type values. What I've done so far is a query that counts the Sub-Types per account:
SELECT account, COUNT(DISTINCT "Sub-Type") AS BAN_SUB_COUNT
FROM Table
GROUP BY account
Which gives the output:
| account | BAN_SUB_COUNT |
| ------- | ------------- |
|123456789| 3 |
|123456792| 1 |
|123456793| 2 |
|123456795| 1 |
|123456796| 1 |
|123456797| 1 |
|123456798| 1 |
|123456799| 4 |
What I need from this is a way to get a separate count of accounts for each of the distinct BAN_SUB_COUNT entries. Ideally it would be along the lines of:
| BAN_SUB_COUNT |count of Accounts|
| ------------- | --------------- |
| 1 | 5 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
Sorry for any confusion and I hope I'm explaining myself better here!
You just need to wrap your query with another one:
select ban_sub_count, count(distinct account) as count_of_accounts
from (
    SELECT account, COUNT(DISTINCT "Sub-Type") as BAN_SUB_COUNT
    FROM Table
    group by account
) z
group by ban_sub_count
Output:
| BAN_SUB_COUNT |count of Accounts|
| ------------- | --------------- |
| 1 | 5 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
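As a quick check of the wrap-and-group-again idea, here is a small sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The table name accounts and the alias per_account are made up for the example, and the outer query uses COUNT(*) on the assumption that the inner query already returns one row per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "accounts" is a hypothetical table name standing in for the real one.
conn.execute('CREATE TABLE accounts (account TEXT, "Sub-Type" TEXT)')
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [
        ("123456789", "123456789"), ("123456789", "123456790"),
        ("123456789", "123456791"), ("123456792", "123456792"),
        ("123456793", "123456793"), ("123456793", "123456794"),
        ("123456795", "123456795"), ("123456796", "123456796"),
        ("123456797", "123456797"), ("123456798", "123456798"),
        ("123456799", "123456799"), ("123456799", "123456800"),
        ("123456799", "123456801"), ("123456799", "123456802"),
    ],
)

query = """
SELECT ban_sub_count, COUNT(*) AS count_of_accounts
FROM (
    -- inner level: one row per account with its number of distinct sub-types
    SELECT account, COUNT(DISTINCT "Sub-Type") AS ban_sub_count
    FROM accounts
    GROUP BY account
) AS per_account
-- outer level: how many accounts share each sub-type count
GROUP BY ban_sub_count
ORDER BY ban_sub_count
"""
counts = conn.execute(query).fetchall()
print(counts)
```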
Here is my attempt at an answer:
select a2.*, a.`count_sub_type`
FROM (
    select count(`sub-type`) as count_sub_type, `sub-type` from account group by `sub-type`
) a
left join account a2 on a2.`sub-type` = a.`sub-type`;
Output:
|account |sub-type|single_v_multi|count_sub_type|
|--------|--------|--------------|--------------|
|account6|type1 |multiview |3 |
|account5|type1 |single |3 |
|account1|type1 |single |3 |
|account4|type2 |single |2 |
|account2|type2 |single |2 |
|account6|type3 |single |2 |
|account3|type3 |single |2 |

Count string occurrences within a list column SQL/Grafana

I have a table in the following format:
| id | tags |
|----|-------------------------|
|1 |['Car', 'Plane', 'Truck']|
|2 |['Plane', 'Truck'] |
|3 |['Car', 'Plane'] |
|4 |['Plane'] |
|5 |['Boat', 'Truck'] |
How can I create a table that gives me the total number of occurrences of each item in all cells of the "tags" column? Items ideally do not include single quotes, but may if necessary.
The resulting table would look like:
| tag | count |
|-------|-------|
| Car | 2 |
| Plane | 4 |
| Truck | 3 |
| Boat | 1 |
The following does not work because it only counts identical "tags" entries rather than comparing list contents.
SELECT u.id, count(u.tags) as cnt
FROM table u
group by 1
order by cnt desc;
I am aware of this near-identical question, but they are using Snowflake/SQL whereas I am using MySQL/Grafana so the accepted answer uses functions unavailable to me.
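This is not a MySQL answer, but the counting being asked for can be sketched client-side to pin down the expected result. The sketch below assumes each tags value is stored as text that looks exactly like a Python list literal, as in the sample table, and that the rows have already been fetched; the rows variable is a stand-in for that query result.

```python
import ast
from collections import Counter

# Stand-in for rows fetched from the table (id, tags-as-text).
rows = [
    (1, "['Car', 'Plane', 'Truck']"),
    (2, "['Plane', 'Truck']"),
    (3, "['Car', 'Plane']"),
    (4, "['Plane']"),
    (5, "['Boat', 'Truck']"),
]

tag_counts = Counter()
for _id, tags_text in rows:
    # literal_eval safely parses the list literal into a real list of strings
    tag_counts.update(ast.literal_eval(tags_text))

for tag, count in tag_counts.items():
    print(tag, count)
```

Parsing the text first also removes the single quotes around each tag, which matches the preferred output format.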

SQL select all records if at least one record fulfils a condition

I have several tables
table1
-------------------------
|id| rec | other_rec|
-------------------------
|1 | record1 | record6 |
|2 | record2 | record8 |
|3 | record4 | record0 |
|4 | record5 | record2 |
|n | ... | ... |
------------------------
and a second table
table2
-------------------------------------------------
|id| table_nr_1_foreign_key | rec_1 | rec_2 |
-------------------------------------------------
|1 | table_nr_1_key_1 |record1 | rt1 |
|2 | table_nr_1_key_2 |record2 | rt2 |
|3 | table_nr_1_key_2 |record4 | rt3 |
|4 | table_nr_1_key_3 |record5 | rt4 |
|5 | table_nr_1_key_2 |record6 | rt5 |
|n | table_nr_1_key_n | ... | ... |
-------------------------------------------------
and an SQL query
SELECT t1.id,
       t1.rec,
       t1.other_rec,
       t2.rec_1
FROM table1 t1
JOIN ...[other table]
  and ...[conditions start]
  and ...
  and ...[conditions end]
LEFT JOIN table2 t2 ON t1.id = t2.table_nr_1_foreign_key
  AND t2.rec_2 = any(array['rt2'])
where ...[other condition]
Now I want to select all records from table2 that have a corresponding foreign key in table1,
if and only if at least one record in table2 that points to that table1 record fulfils the condition.
So I want
data1,..., rt2
data1,..., rt3
data1,..., rt5
...
to be selected
but all I get is
data1,..., rt2
Update 1:
Being inexperienced, I failed to do the selection with a single SQL query and wrote two queries instead:
one that selects the data for the main record (table1),
and another that selects all records from table2 (using IN ([list of ids])).
This question can be closed / deleted.
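For anyone who still wants a single query, one possible shape is to join table2 without the rec_2 filter and move the filter into an EXISTS subquery, so that a matching row acts as a gate for the whole group. The sketch below is an assumption-laden illustration in SQLite via Python's sqlite3: the foreign keys are simplified to the integers 1, 2 and 3, the alias m is made up, and a plain equality stands in for the any(array[...]) test from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, rec TEXT, other_rec TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table_nr_1_foreign_key INTEGER,
                     rec_1 TEXT, rec_2 TEXT);
INSERT INTO table1 VALUES (1, 'record1', 'record6'), (2, 'record2', 'record8'),
                          (3, 'record4', 'record0'), (4, 'record5', 'record2');
-- foreign keys simplified to integers for the example
INSERT INTO table2 VALUES (1, 1, 'record1', 'rt1'), (2, 2, 'record2', 'rt2'),
                          (3, 2, 'record4', 'rt3'), (4, 3, 'record5', 'rt4'),
                          (5, 2, 'record6', 'rt5');
""")

query = """
SELECT t1.id, t1.rec, t1.other_rec, t2.rec_1, t2.rec_2
FROM table1 t1
JOIN table2 t2 ON t2.table_nr_1_foreign_key = t1.id
WHERE EXISTS (
    -- gate: at least one child row of this table1 record has the wanted rec_2
    SELECT 1 FROM table2 m
    WHERE m.table_nr_1_foreign_key = t1.id AND m.rec_2 = ?
)
ORDER BY t2.id
"""
matched = conn.execute(query, ("rt2",)).fetchall()
for row in matched:
    print(row)
```

Because the filter no longer sits in the join condition, every table2 row of a qualifying table1 record comes back, not only the row that matched.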

Correlated subquery issue in Hive / Spark SQL to compare aggregates

I have a table of about a billion game interactions where each record describes attributes of a gamer on a particular day (gamer ID, date on which the game was played, attribute 1, [attribute 2, ...])
+---------+
| TABLE_1 |
+-------+-----------+-------+-------+-------+
|gmr_id | played_dt | attr1 | attr2 | attr3 |
+-------+-----------+-------+-------+-------+
|1 | 2017-01-01| 1 | 2 | txt |
|1 | 2017-01-02| 3 | 2 | txt |
|2 | 2017-01-02| 1 | 2 | txt |
+-------+-----------+-------+-------+-------+
I have another table with millions of records where the gamers' moves are recorded for each game played:
+---------+
| TABLE_2 |
+-------+-----------+---------+---------+---------+
|gmr_id | played_dt | finish | attacks | deaths |
+-------+-----------+---------+---------+---------+
|1 | 2017-01-01| 10 | 1 | 9 |
|1 | 2017-01-03| 12 | 10 | 2 |
|2 | 2017-01-02| 1 | 0 | 0 |
|4 | 2017-01-03| 1 | 0 | 1 |
|1 | 2017-01-04| 3 | 1 | 2 |
+-------+-----------+---------+---------+---------+
For every record in TABLE_1, specifically for every gmr_id and played_dt, I am trying to compare the sums of moves in the two days following played_dt with the sums in the previous five days (1 if greater, else 0), and join the result to TABLE_1 on gmr_id and played_dt.
That is:
Sum finish, attacks and deaths for the gmr_id from played_dt to two days after, i.e. SUM(finish), SUM(attacks) etc. BETWEEN played_dt AND DATE_ADD(played_dt, 2)
Sum finish, attacks and deaths for the gmr_id from five days before to one day before played_dt, i.e. SUM(finish), SUM(attacks) etc. BETWEEN DATE_SUB(played_dt, 5) AND DATE_SUB(played_dt, 1)
Compare and set flags, i.e. get a row that looks like: gmr_id, played_dt, future_finish_gt_past_finish (1 if finish in the days after played_dt is greater than in the days before, else 0), future_attacks_gt_past_attacks (1 if attacks in the days after played_dt are greater than in the days before, else 0), etc.
Join the row (gmr_id, played_dt, finish_f, attack_f) with the TABLE_1 row (gmr_id, played_dt, attr1, attr2, attr3) on gmr_id and played_dt
I have tried writing a correlated subquery, but to no avail:
SELECT
t1.gmr_id,
t1.played_dt,
(SELECT
t2.gmr_id,
SUM(t2.finish) `future_finish`,
SUM(t2.attacks) `future_attacks`
FROM TABLE_2 t2 WHERE t2.played_dt BETWEEN played_dt AND DATE_ADD(played_dt, 2)
GROUP BY t2.gmr_id),
(SELECT
t2.gmr_id,
SUM(t2.finish) `past_finish`,
SUM(t2.attacks) `past_attacks`
FROM TABLE_2 t2 WHERE t2.played_dt BETWEEN DATE_SUB(played_dt, 5) AND DATE_SUB(played_dt, 1)
GROUP BY t2.gmr_id),
CASE WHEN future_finish > past_finish THEN 1 ELSE 0 END `finish_f`,
CASE WHEN future_attacks > past_attacks THEN 1 ELSE 0 END `attack_f`
FROM
TABLE_1 t1;
The expected output looks like this:
+---------+
| TABLE_1 |
+-------+-----------+-------+-------+-------+-----------+-----------+
|gmr_id | played_dt | attr1 | attr2 | attr3 | finish_f | attack_f |
+-------+-----------+-------+-------+-------+-----------+-----------+
|1 | 2017-01-01| 1 | 2 | txt | 1 | 0 |
|1 | 2017-01-02| 3 | 2 | txt | 1 | 1 |
|2 | 2017-01-02| 1 | 2 | txt | 0 | 1 |
+-------+-----------+-------+-------+-------+-----------+-----------+
I am using Hive 1.2 (or could use Spark 1.5) to do this, but so far I have been unable to do so. What would be the best way to accomplish this? I would greatly appreciate your help.
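One way to avoid the correlated subqueries is conditional aggregation: join TABLE_2 to TABLE_1 on gmr_id only, and let CASE expressions decide which rows fall into the future and past windows. The sketch below runs that idea in SQLite through Python's sqlite3, so date(x, '+2 days') is a stand-in for DATE_ADD and DATE_SUB, and the aliases a, b and s are made up. It assumes (gmr_id, played_dt) is unique in TABLE_1 and that an empty window counts as 0. The flags it yields for the sample rows are not the same as the expected table in the question, which does not seem to be derived from the sample TABLE_2 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_1 (gmr_id INTEGER, played_dt TEXT, attr1 INTEGER, attr2 INTEGER, attr3 TEXT);
CREATE TABLE TABLE_2 (gmr_id INTEGER, played_dt TEXT, finish INTEGER, attacks INTEGER, deaths INTEGER);
INSERT INTO TABLE_1 VALUES (1, '2017-01-01', 1, 2, 'txt'), (1, '2017-01-02', 3, 2, 'txt'),
                           (2, '2017-01-02', 1, 2, 'txt');
INSERT INTO TABLE_2 VALUES (1, '2017-01-01', 10, 1, 9), (1, '2017-01-03', 12, 10, 2),
                           (2, '2017-01-02', 1, 0, 0), (4, '2017-01-03', 1, 0, 1),
                           (1, '2017-01-04', 3, 1, 2);
""")

query = """
SELECT t1.gmr_id, t1.played_dt, t1.attr1, t1.attr2, t1.attr3,
       CASE WHEN s.future_finish  > s.past_finish  THEN 1 ELSE 0 END AS finish_f,
       CASE WHEN s.future_attacks > s.past_attacks THEN 1 ELSE 0 END AS attack_f
FROM TABLE_1 t1
JOIN (
    -- one row of window sums per (gmr_id, played_dt); empty windows become 0
    SELECT a.gmr_id, a.played_dt,
           COALESCE(SUM(CASE WHEN b.played_dt BETWEEN a.played_dt AND date(a.played_dt, '+2 days')
                             THEN b.finish END), 0) AS future_finish,
           COALESCE(SUM(CASE WHEN b.played_dt BETWEEN date(a.played_dt, '-5 days') AND date(a.played_dt, '-1 days')
                             THEN b.finish END), 0) AS past_finish,
           COALESCE(SUM(CASE WHEN b.played_dt BETWEEN a.played_dt AND date(a.played_dt, '+2 days')
                             THEN b.attacks END), 0) AS future_attacks,
           COALESCE(SUM(CASE WHEN b.played_dt BETWEEN date(a.played_dt, '-5 days') AND date(a.played_dt, '-1 days')
                             THEN b.attacks END), 0) AS past_attacks
    FROM TABLE_1 a
    LEFT JOIN TABLE_2 b ON b.gmr_id = a.gmr_id
    GROUP BY a.gmr_id, a.played_dt
) s ON s.gmr_id = t1.gmr_id AND s.played_dt = t1.played_dt
ORDER BY t1.gmr_id, t1.played_dt
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

Joining on gmr_id alone pairs every TABLE_1 row with all of that gamer's TABLE_2 rows, so on a billion-row table a pre-filter on the overall date range would probably be needed; that part is left out of the sketch.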

Pivot table using flat table structure in SQL Server without aggregation

I have a flat table structure which I've turned into a column based table. I'm struggling with getting the rowId from my raw data to appear in my column based table. Any help greatly appreciated.
Raw data in table derived from three different tables:
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 1 |rowId |[NULL] |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
| 2 |rowId |[NULL] |
SQL using pivot to give me the column structure:
select [displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId]
from
(
SELECT [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
FROM tblFlatTable AS t
JOIN tblFlatColumns c
ON t.flatTableId = c.flatTableId
JOIN tblFlatColumnValues cv
ON cv.flatColumnId = c.flatColumnId
WHERE (t.flatTableId = 1) AND (t.isActive = 1)
AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
MIN([columnValue])
FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId])
) as P
Result:
|displayOrder|itemDescription|itemName |price1|price2|price3|rowId |
| ---------- |:-------------:|:------------:|:----:|:----:|:----:|-----:|
|1 |Green |Polo Shirt |4.2 |5.3 |7.5 |[NULL]|
|3 |Red watch |Digital Watch |4.0 |2.0 |1.5 |[NULL]|
I understand why I'm getting the NULL value for rowId. What I'm stuck on is how to pull the value for rowId from the raw data and add it to my structure. I'm not sure it's possible, as I've looked at many examples and none seem to do this.
It looks obvious now!
I'm now not including rowId as part of my flat structure.
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
I've updated the SQL; you can see I'm now pulling in the rowId from tblFlatColumnValues:
select [rowId],[displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3]
from
(
SELECT cv.rowId, [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
FROM tblFlatTable AS t
JOIN tblFlatColumns c
ON t.flatTableId = c.flatTableId
JOIN tblFlatColumnValues cv
ON cv.flatColumnId = c.flatColumnId
WHERE (t.flatTableId = 1) AND (t.isActive = 1)
AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
MIN([columnValue])
FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3])
) as P
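For comparison, the same reshaping can be done without the PIVOT operator by grouping on rowId and picking each value with a CASE expression. This is a different technique from the query above (conditional aggregation rather than PIVOT), sketched in SQLite via Python's sqlite3; the single table flat is a hypothetical stand-in for the three joined tables, and MIN is used only because each (rowId, columnName) pair is assumed to hold one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "flat" is a hypothetical stand-in for the joined raw data.
conn.execute("CREATE TABLE flat (rowId INTEGER, columnName TEXT, columnValue TEXT)")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?)",
    [
        (1, "itemNo", "1"), (1, "itemName", "Polo Shirt"), (1, "itemDescription", "Green"),
        (1, "price1", "4.2"), (1, "price2", "5.3"), (1, "price3", "7.5"), (1, "displayOrder", "1"),
        (2, "itemNo", "12"), (2, "itemName", "Digital Watch"), (2, "itemDescription", "Red Watch"),
        (2, "price1", "4.0"), (2, "price2", "2.0"), (2, "price3", "1.5"), (2, "displayOrder", "3"),
    ],
)

columns = ["displayOrder", "itemDescription", "itemName", "itemNo", "price1", "price2", "price3"]
# One MIN(CASE ...) per target column; MIN skips the NULLs from non-matching rows.
select_list = ",\n       ".join(
    f"MIN(CASE WHEN columnName = '{name}' THEN columnValue END) AS {name}" for name in columns
)
query = f"""
SELECT rowId,
       {select_list}
FROM flat
GROUP BY rowId
ORDER BY rowId
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Grouping by rowId is what carries the row identifier into the output, which is the same role cv.rowId plays in the updated PIVOT query.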