SQLite3: merge rows with common columns

For some context, I have a table in SQLite3 that currently looks like this:
What I am looking to do is merge rows that have the same breed. For a given breed, the rows never populate the same columns. So far I have tried this kind of query, but it doesn't do the job I am looking for, because it neither deduplicates nor merges the rows as desired. It also seems difficult to generalise to all columns without manually typing out each column name.
select distinct t1.breed, coalesce(t1.dog_group_1, t2.dog_group_1) from breed_merge t1 left join breed_merge t2 on t1.breed = t2.breed;
Output:
Afador|
Affenhuahua|
Affenpinscher|
Affenpinscher|GROUP 1 - TOYS
Afghan Hound|
Afghan Hound|GROUP 4 - HOUNDS
...
Desired output:
Afador|
Affenhuahua|
Affenpinscher|GROUP 1 - TOYS
Afghan Hound|GROUP 4 - HOUNDS
...

For this sample data, where each breed has at most 2 rows and, for each column, at most one of those rows holds a value (the other holds NULL), all you have to do is group by breed and use an aggregate function like MAX() for each of the other columns. MAX() ignores NULLs, so it returns the single non-NULL value:
SELECT breed, MAX(imgsrc) imgsrc, MAX(dog_group_1) dog_group_1, .....
FROM breed_merge
GROUP BY breed
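A minimal sketch of the same GROUP BY/MAX() idea using Python's built-in sqlite3 module. It also addresses the wish to avoid typing every column name: it reads the column list with SQLite's PRAGMA table_info and builds the MAX() query from it. The table breed_merge and the columns breed, imgsrc and dog_group_1 come from the thread; the sample rows (including the image file name) and the helper build_merge_query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE breed_merge (breed TEXT, imgsrc TEXT, dog_group_1 TEXT)")
# Hypothetical sample rows: each breed has one or two partially filled rows.
conn.executemany(
    "INSERT INTO breed_merge VALUES (?, ?, ?)",
    [
        ("Afador", None, None),
        ("Affenhuahua", None, None),
        ("Affenpinscher", "affenpinscher.jpg", None),
        ("Affenpinscher", None, "GROUP 1 - TOYS"),
        ("Afghan Hound", None, None),
        ("Afghan Hound", None, "GROUP 4 - HOUNDS"),
    ],
)


def build_merge_query(conn, table, key):
    """Hypothetical helper: build 'SELECT key, MAX(col) AS col, ... GROUP BY key'."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # MAX() skips NULLs, so a column filled in only one row of the group keeps that value.
    # If two rows hold different non-NULL values, MAX() silently keeps the larger one.
    aggregates = ", ".join(f'MAX("{c}") AS "{c}"' for c in columns if c != key)
    return f'SELECT "{key}", {aggregates} FROM "{table}" GROUP BY "{key}" ORDER BY "{key}"'


query = build_merge_query(conn, "breed_merge", "breed")
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

The generated query only relies on the NULL-skipping behaviour of MAX(), so it is only safe under the assumption stated in the answer: at most one non-NULL value per column per breed.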

Related

Filter by one column then count unique values in another column in SQL

I would like to filter data by column Base = 1 and then count the number of unique values in another column, 'Animal', in SQL. Data:
   Animal Base Value
1  A      1    X
2  B      1    X
3  A      2    Y
4  A      3    V
The expected output in this case is 2, from the first two rows (Base = 1, with animals A and B).
Simpler than you may have thought:
SELECT count(DISTINCT Animal)
FROM tbl
WHERE Base = 1;
This should work in any halfway decent RDBMS, including your undisclosed one. (You may have to enclose column names in double quotes.)
This should do it, assuming the table is named animals:
select count(*) from (select distinct Animal from animals where Base=1) tb1;
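Both answers can be checked side by side with a short sketch in Python's sqlite3 module, using the thread's sample data (the leading 1 to 4 in the question are treated as row labels, not a column, and the table is called tbl for both queries). It also shows one place where they differ: count(DISTINCT ...) ignores NULLs, while SELECT DISTINCT keeps a single NULL row that count(*) then counts. The NULL row is a hypothetical addition, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Animal TEXT, Base INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("A", 1, "X"), ("B", 1, "X"), ("A", 2, "Y"), ("A", 3, "V")],
)

count_distinct_sql = "SELECT count(DISTINCT Animal) FROM tbl WHERE Base = 1"
subquery_sql = "SELECT count(*) FROM (SELECT DISTINCT Animal FROM tbl WHERE Base = 1) tb1"

before = (conn.execute(count_distinct_sql).fetchone()[0],
          conn.execute(subquery_sql).fetchone()[0])
print(before)  # (2, 2): both queries agree on the sample data

# Hypothetical extra row with a NULL Animal and Base = 1.
conn.execute("INSERT INTO tbl VALUES (NULL, 1, 'Z')")
after = (conn.execute(count_distinct_sql).fetchone()[0],
         conn.execute(subquery_sql).fetchone()[0])
print(after)  # (2, 3): the subquery version counts the NULL group
```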

How to write a SQL query to calculate percentages based on values across different tables?

Suppose I have a database containing two tables, similar to below:
Table 1:
tweet_id  tweet
1         Scrap the election results
2         The election was great!
3         Great stuff
Table 2:
politician  tweet_id
TRUE        1
FALSE       2
FALSE       3
I'm trying to write a SQL query that returns the percentage of tweets containing the word 'election', broken down by whether or not the author is a politician.
So, for instance, the first 2 tweets in Table 1 contain the word 'election'. Looking at Table 2, you can see that tweet_id 1 was written by a politician, whereas tweet_id 2 was written by a non-politician.
Hence, the query should return 50% for politicians and 50% for non-politicians (i.e. two tweets contained the word 'election', one by a politician and one by a non-politician).
Any ideas how to write this in SQL?
You could do this by creating one subquery that returns all election tweets and another that returns all election tweets by politicians, then joining them.
Here is a sample. Note that you may need to cast the totals to decimals before dividing (depending on which SQL provider you are working in).
select
    politician_tweets.total / election_tweets.total
from
    (
        select
            count(tweet) as total
        from
            table_1
            join table_2 on table_1.tweet_id = table_2.tweet_id
        where
            tweet like '%election%'
    ) election_tweets
    join
    (
        select
            count(tweet) as total
        from
            table_1
            join table_2 on table_1.tweet_id = table_2.tweet_id
        where
            tweet like '%election%' and
            politician = 1
    ) politician_tweets
    on 1 = 1
You can use aggregation like this:
select t2.politician,
       avg(case when t.tweet like '%election%' then 1.0 else 0 end) as election_ratio
from tweets t join
     table2 t2
     on t.tweet_id = t2.tweet_id
group by t2.politician;
Here is a db<>fiddle.
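The two answers actually compute different quantities, which a small SQLite sketch (via Python's sqlite3) makes visible on the question's data. It assumes the table names table_1/table_2 from the first answer (the second answer's tweets/table2 are renamed to match) and stores TRUE/FALSE as 1/0, since SQLite has no separate boolean type. The first query gives each group's share of all 'election' tweets (the 50% / 50% the question asks for); the second reproduces the AVG(CASE ...) answer, which gives the fraction of each group's own tweets that mention 'election'. The last line shows the integer-division pitfall the first answer warns about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (tweet_id INTEGER, tweet TEXT);
CREATE TABLE table_2 (politician INTEGER, tweet_id INTEGER);  -- 1 = TRUE, 0 = FALSE
INSERT INTO table_1 VALUES (1, 'Scrap the election results'),
                           (2, 'The election was great!'),
                           (3, 'Great stuff');
INSERT INTO table_2 VALUES (1, 1), (0, 2), (0, 3);
""")

# Share of all 'election' tweets written by each group.
share_sql = """
SELECT t2.politician,
       100.0 * count(*) / (SELECT count(*) FROM table_1 WHERE tweet LIKE '%election%') AS pct
FROM table_1 t1
JOIN table_2 t2 ON t1.tweet_id = t2.tweet_id
WHERE t1.tweet LIKE '%election%'
GROUP BY t2.politician
ORDER BY t2.politician DESC
"""

# Fraction of each group's own tweets that mention 'election'.
ratio_sql = """
SELECT t2.politician,
       avg(CASE WHEN t1.tweet LIKE '%election%' THEN 1.0 ELSE 0 END) AS election_ratio
FROM table_1 t1
JOIN table_2 t2 ON t1.tweet_id = t2.tweet_id
GROUP BY t2.politician
ORDER BY t2.politician DESC
"""

share = conn.execute(share_sql).fetchall()
ratio = conn.execute(ratio_sql).fetchall()
# In SQLite, integer / integer truncates; multiplying by 1.0 forces a REAL result.
int_div, real_div = conn.execute("SELECT 1 / 2, 1.0 * 1 / 2").fetchone()

print(share)              # [(1, 50.0), (0, 50.0)]
print(ratio)              # [(1, 1.0), (0, 0.5)]
print(int_div, real_div)  # 0 0.5
```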

How can I use a row value to dynamically select a column name in Oracle SQL 11g?

I have two tables: one with a single row for each "batch_number", and another with defect details for each batch. The first table has a "defect_of_interest" column that I would like to link to one of the columns in the second table. I am trying to write a query that then picks the maximum value in that dynamically linked column across all "unit_number" values in the "batch_number".
Here is the SQLFiddle with example data for each table: http://sqlfiddle.com/#!9/a1c27d
For example, the maximum value in the DEFECT_DETAILS.SCRATCHES column for BATCH_NUMBER = A1 is 12.
Here is my desired output:
BATCH_NUMBER DEFECT_OF_INTEREST MAXIMUM_DEFECT_COUNT
------------ ------------------ --------------------
A1 SCRATCHES 12
B3 BUMPS 4
C2 STAINS 9
I have tried using the PIVOT function, but I can't get it to work. Not sure if it works in cases like this. Any help would be much appreciated.
If the number of columns is fixed (it seems to be), you can use CASE to select the column named by defect_of_interest in the related table. Then aggregating is simple.
For example:
select
    batch_number,
    max(defect_of_interest) as defect_of_interest,
    max(defect_count) as maximum_defect_count
from (
    select
        d.batch_number,
        b.defect_of_interest,
        case when b.defect_of_interest = 'SCRATCHES' then d.scratches
             when b.defect_of_interest = 'BUMPS' then d.bumps
             when b.defect_of_interest = 'STAINS' then d.stains
        end as defect_count
    from defect_details d
    join batches b on b.batch_number = d.batch_number
) x
group by batch_number
order by batch_number;
See Oracle example in db<>fiddle.
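Because the answer's query only uses CASE, JOIN, MAX() and GROUP BY, it should be portable; the sketch below runs it unchanged in SQLite through Python's sqlite3 rather than in Oracle 11g. The SQLFiddle data is not reproduced in the thread, so the defect counts here are hypothetical, chosen so the result matches the desired output. Some non-selected columns deliberately hold larger values, so picking the wrong column would show up in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batch_number TEXT, defect_of_interest TEXT);
CREATE TABLE defect_details (
    batch_number TEXT, unit_number INTEGER,
    scratches INTEGER, bumps INTEGER, stains INTEGER
);
INSERT INTO batches VALUES ('A1', 'SCRATCHES'), ('B3', 'BUMPS'), ('C2', 'STAINS');
-- Hypothetical counts (not the fiddle's data).
INSERT INTO defect_details VALUES
    ('A1', 1, 12, 1, 3), ('A1', 2, 5, 7, 0),
    ('B3', 1, 20, 4, 2), ('B3', 2, 3, 2, 8),
    ('C2', 1, 1, 9, 9),  ('C2', 2, 6, 0, 4);
""")

query = """
SELECT batch_number,
       max(defect_of_interest) AS defect_of_interest,
       max(defect_count) AS maximum_defect_count
FROM (
    SELECT d.batch_number,
           b.defect_of_interest,
           -- Pick the column whose name is stored in batches.defect_of_interest.
           CASE WHEN b.defect_of_interest = 'SCRATCHES' THEN d.scratches
                WHEN b.defect_of_interest = 'BUMPS' THEN d.bumps
                WHEN b.defect_of_interest = 'STAINS' THEN d.stains
           END AS defect_count
    FROM defect_details d
    JOIN batches b ON b.batch_number = d.batch_number
) x
GROUP BY batch_number
ORDER BY batch_number
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

A new defect type still requires a new WHEN branch; the CASE only maps names to columns that already exist.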

Why aren't these two SQL statements returning the same output?

I'm just getting started with SQL, and my objective is to transform this:
select X.persnr
from Pruefung X
where X.persnr in (
select Y.persnr
from pruefung Y
where X.matrikelnr <> Y.matrikelnr)
output:
into the same output, but using a form of join. I tried it the way below, but as far as I can see I can't get "rid" of the Cartesian product. Or maybe I misunderstood what the statement above actually does. To me, it says "for each unique matrikelnr display all corresponding persnr".
select X.persnr
from Pruefung X
join pruefung y on x.persnr=y.persnr
where x.matrikelnr<>y.matrikelnr
Output: a long list (I don't want to fill the entire question with it). I am guessing it is the Cartesian product from the join.
This is the relation I am using.
Edit: DISTINCT (unless I am using it in the wrong place) won't work, because then each persnr is displayed only once, and that's not the objective.
Your initial query actually does:
select persnr from Pruefung if the same persnr exists for a different matrikelnr.
"for each unique matrikelnr display all corresponding persnr"
This is achieved using aggregation:
Depending on the DBMS you are using, you could use something like this (SQL Server uses STRING_AGG, but MySQL uses GROUP_CONCAT):
SELECT matrikelnr, STRING_AGG(persnr, ',')
FROM Pruefung
GROUP BY matrikelnr
You cannot easily achieve what you got from a correlated query (your first attempt) by using a join.
Edit:
A join does not result in a "Cartesian product" except when there is no join condition (CROSS JOIN).
A join matches two sets based on a join condition. The reason you get more entries is that the join looks at the join key (PERSNR) and matches every row with every other row that has the same key.
For example, persnr 101 has 3 entries, so the join gives 3x3 results for it.
You then keep only the results where X.matrikelnr <> Y.matrikelnr. If we assume matrikelnr is unique, that removes the 3 rows where a row matched itself, so you end up with 3x3 - 3 = 6.
If you want to achieve something in SQL, you must first define what you expect to get and then use the appropriate tools (in this case, correlated queries, not joins).
You can write your first query with EXISTS instead of IN, like this:
select X.persnr
from Pruefung X
where exists (
select 1
from pruefung Y
where X.persnr = Y.persnr and X.matrikelnr <> Y.matrikelnr
)
This way it's obvious that this query means:
return all the persnrs of the table for which there exists another row with the same persnr but a different matrikelnr
For your sample data the result is all the persnrs of the table.
Your 2nd query though, does something different.
It links every row of the table with all the rows of the same table with the same persnr but different matrikelnr.
So for every row of the table you will get as many rows as there are rows with the same persnr but a different matrikelnr.
For example, for the first row, with persnr = 101 and matrikelnr = 8532478, you will get 2 rows, because there are 2 rows in the table with persnr = 101 and matrikelnr <> 8532478.
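A quick way to check this reasoning is to run the IN, EXISTS and JOIN versions against the same table. The sketch below uses Python's sqlite3 and hypothetical Pruefung rows (the relation from the question is not reproduced in the thread; only persnr 101 with matrikelnr 8532478 appears in it). persnr 101 gets 3 rows, as in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pruefung (persnr INTEGER, matrikelnr INTEGER)")
# Hypothetical rows; all (persnr, matrikelnr) pairs are distinct.
conn.executemany(
    "INSERT INTO Pruefung VALUES (?, ?)",
    [(101, 8532478), (101, 8532479), (101, 8532480), (102, 8532478), (102, 8532481)],
)

in_sql = """
SELECT X.persnr FROM Pruefung X
WHERE X.persnr IN (SELECT Y.persnr FROM Pruefung Y WHERE X.matrikelnr <> Y.matrikelnr)
"""
exists_sql = """
SELECT X.persnr FROM Pruefung X
WHERE EXISTS (SELECT 1 FROM Pruefung Y
              WHERE X.persnr = Y.persnr AND X.matrikelnr <> Y.matrikelnr)
"""
join_sql = """
SELECT X.persnr FROM Pruefung X
JOIN Pruefung Y ON X.persnr = Y.persnr
WHERE X.matrikelnr <> Y.matrikelnr
"""

in_rows = sorted(r[0] for r in conn.execute(in_sql))
exists_rows = sorted(r[0] for r in conn.execute(exists_sql))
join_rows = sorted(r[0] for r in conn.execute(join_sql))
# IN and EXISTS return each qualifying row once (5 rows);
# the join returns one row per matching pair: 101 gives 3x3 - 3 = 6, 102 gives 2x2 - 2 = 2.
print(len(in_rows), len(exists_rows), len(join_rows))  # 5 5 8
```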
You are right. It's the Cartesian product's fault. Suppose you have persnr 1,1,1,2,2,2 in the first table and persnr 1,1,1,2,2 in the second. How many lines do you expect to be returned?
In pseudo-code it would go like this:
Select
...
WHERE persnr in (second table)
-- 6 lines
Select persnr
FROM ...
JOIN ... ON a.persnr = b.persnr
-- 3X3 + 3X2 = 15 lines.
SELECT DISTINCT persnr
FROM ...
JOIN ... ON a.persnr = b.persnr
-- 2 lines (1 and 2)
Take your pick.
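The three line counts in the pseudo-code can be reproduced with a small sqlite3 sketch. The table names a and b are hypothetical stand-ins for the first and second table; the persnr values are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (persnr INTEGER)")  # hypothetical first table
conn.execute("CREATE TABLE b (persnr INTEGER)")  # hypothetical second table
conn.executemany("INSERT INTO a VALUES (?)", [(p,) for p in (1, 1, 1, 2, 2, 2)])
conn.executemany("INSERT INTO b VALUES (?)", [(p,) for p in (1, 1, 1, 2, 2)])


def count_rows(sql):
    return len(conn.execute(sql).fetchall())


in_count = count_rows("SELECT persnr FROM a WHERE persnr IN (SELECT persnr FROM b)")
join_count = count_rows("SELECT a.persnr FROM a JOIN b ON a.persnr = b.persnr")
distinct_count = count_rows("SELECT DISTINCT a.persnr FROM a JOIN b ON a.persnr = b.persnr")
print(in_count, join_count, distinct_count)  # 6 15 2
```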

SAP HANA SQL - Concatenate multiple result rows for a single column into a single row

I am pulling data, and when I pull in the text field, my results for the "distinct ID" are sometimes duplicated when there are multiple results for that ID. Is there a way to concatenate the results into a single column/row rather than having them duplicated?
It looks like there are ways in other SQL platforms but I have not been able to find something that works in HANA.
Example
Select
Distinct ID
From Table1
If I pull only Distinct ID I get the following:
ID
1
2
3
4
However when I pull the following:
Example
Select
Distinct ID,Text
From Table1
I get something like
ID  Text
1   Dog
2   Cat
2   Dog
3   Fish
4   Bird
4   Horse
I am trying to concatenate the Text field when there is more than one row for an ID.
What I need the results to be (having a "break" between results so that they are on separate lines would be even better, but at least a "," would work):
ID  Text
1   Dog
2   Cat,Dog
3   Fish
4   Bird,Horse
I see Kiran has just referred to another valid answer in the comments, but for your example this would work:
SELECT ID, STRING_AGG(Text, ',')
FROM TABLE1
GROUP BY ID;
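You can replace the ',' with other separators. For a line break you will probably need an actual line-feed character (for example CHAR(10)), because a '\n' inside a SQL string literal is usually taken literally as a backslash followed by n.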
I would caution against concatenating rows this way unless you know your data well. Nothing in the query limits how many rows get concatenated or how long the resulting string becomes, but HANA does have a limit on string length, so keep that in mind.
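HANA itself can't be run in a short snippet, so the sketch below uses SQLite's group_concat(X, Y), which plays the same role as STRING_AGG(X, Y), on the question's sample data via Python's sqlite3. It shows a comma separator, a real line-feed separator built with char(10), and that a '\n' literal is two characters rather than a line break (checked in SQLite here; treat the HANA behaviour as an assumption). The order of values inside each group is not guaranteed, which is why the usage note below compares them as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Text TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "Dog"), (2, "Cat"), (2, "Dog"), (3, "Fish"), (4, "Bird"), (4, "Horse")],
)

# group_concat(X, Y) concatenates the group's values, separated by Y.
comma = conn.execute(
    "SELECT ID, group_concat(Text, ',') FROM Table1 GROUP BY ID ORDER BY ID"
).fetchall()

# char(10) is a real line feed, so each value ends up on its own line.
newline = conn.execute(
    "SELECT ID, group_concat(Text, char(10)) FROM Table1 GROUP BY ID ORDER BY ID"
).fetchall()

# A '\n' string literal is a backslash plus 'n' (two characters), not a line break.
backslash_n_length = conn.execute(r"SELECT length('\n')").fetchone()[0]

for row in comma:
    print(row)
print(backslash_n_length)  # 2
```

For ID 2 the comma version yields "Cat,Dog" or "Dog,Cat"; if a fixed order matters, HANA's STRING_AGG accepts an ORDER BY inside the call.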