Replacing null info if matching id in select statement - sql

I am pulling from multiple tables and have three union queries. I want information that is not in one row to be filled into the information from another row while still keeping a separate list of values for the month.
There is three rows for each id and essentially I want any information that is not in a row to be copied from a row with a matching id, but still keeping the three rows separate due to some pivoted monthly data that I want to keep separate for each row.

You can use analytic functions:
select max(name) over (partition by id) as name,
max(zip) over (partition by id) as zip,
max(type1) over (partition by id) as type1,
id, type, "201907", "201906", "201905"
from t;

You can use the following query:
SELECT
T2.MAX_NAME,
T2.MAX_ZIP,
T2.MAX_TYPE,
T1.ID,
T1.TYP1,
T1."201907",
T1."201906",
T1."201905"
FROM
TAB T1
JOIN (
SELECT
MAX(NAME) MAX_NAME,
MAX(ZIP) MAX_ZIP,
MAX(TYPE) MAX_TYPE,
ID
FROM
TAB
GROUP BY
ID
) T2 ON ( T1.ID = T2.ID );
db<>fiddle demo

Related

Getting MAX of a column and adding one more

I'm trying to make an SQL query that returns the greatest number from a column and its respective id.
For more information I have two columns ID and NUMBER. Both of them have 2 entries and I want to get the highest number with the ID next to it. This is what I tried but didn't success.
SELECT ID, MAX(NUMBER) AS MAXNUMB
FROM TABLE1
GROUP BY ID, MAXNUMB;
The problem I'm experiencing is that it just shows ALL the entries and if I add a "where" expression it just shows the same (all entries [ids+numbers]).
Pd.: Yes, I got what I wanted but only with one column (number) if I add another column (ID) to select it "brokes".
Try:
SELECT
ID,
A_NUMBER
FROM TABLE1
WHERE A_NUMBER = (
SELECT MAX(A_NUMBER)
FROM TABLE1);
Presuming you want the IDs* of the row with the highest number (and not, instead, the highest number for each ID -- if IDs were not unique in your table, for example).
* there may be more than one ID returned if there are two or more IDs with equal maximum numbers
you can try this
Select ID,maxNumber
From
(
SELECT
ID,
(Select Max(NUMBER) from Tmp where Id = t.Id) maxNumber
FROM
Tmp t
)T1
Group By ID,maxNumber
The query you posted has an illegal column name (number) and is group by the alias for the max value, which is illegal and also doesn't make sense; and you can't include the unaliased max() within the group-by either. So it's likely you're actually doing something like:
select id, max(numb) as maxnumb
from table1
group by id;
which will give one row per ID, with the maximum numb (which is the new name I've made up for your numeric column) for each ID. Or as you said you get "ALL the entries" you might have group by id, numb, which would show all rows from the table (unless there are duplicate combinations).
To get the maximum numb and the corresponding id you could group by id only, order by descending maxnumb, and then return the first row only:
select id, max(numb) as maxnumb
from table1
group by id
order by maxnumb desc
fetch first 1 row only
If there are two ID with the same maxnumb then you would only get one of them - and which one is indeterminate unless you modify the order by - but in that case you might prefer to use first 1 row with ties to see them all.
You could achieve the same thing with a subquery and analytic function to generating a ranking, and have the outer query return the highest-ranking row(s):
select id, numb as maxnumb
from (
select id, numb, dense_rank() over (order by numb desc) as rnk
from table1
)
where rnk = 1
You could also use keep to get the same result as first 1 row only:
select max(id) keep (dense_rank last order by numb) as id, max(numb) as maxnumb
from table1
fiddle

Joining data from two sources using bigquery

Can anyone please check whether below code is correct? In cte_1, I’m taking all dimensions and metrics from t1 excpet value1, value2, value3. In cte_2, I’m finding the unique row number for t2. In cte_3, I’m taking all distinct dimensions and metrics using join on two keys such as Date, and Ad. In cte_4, I’m taking the values for only row number 1. I’m getting sum(value1),sum(value2),sum(value3) correct ,but sum(value4) is incorrect
WITH cte_1 AS
(SELECT *except(value1, value2, value3) FROM t1 where Date >"2020-02-16" and Publisher ="fb")
-- Find unique row number from t2--
,cte_2 as(
SELECT ROW_NUMBER() OVER(ORDER BY Date) distinct_row_number, * FROM t2
,cte_3 as
(SELECT cte_2.*,cte_1.*except(Date) FROM cte_2 join cte_1
on cte_2.Date = cte_1. Date
and cte_2.Ad= cte_1.Ad))
,cte_4 AS (
(SELECT *
FROM
(
SELECT *,
row_number() OVER (PARTITION BY distinct_row_number ORDER BY Date) as rn
FROM cte_3 ) T
where rn = 1 ))
select sum(value1),sum(value2),sum(value3),sum(value4) from cte_4
Please see the sample table below:
Whilst your data does not seem compliant with the query you shared, since it is lacking the field named Ad and other fields have different names, such as Date and ReportDate, I was able to identify some issues and propose improvements.
First, within your temp table cte_1, you are only using a filter in the WHERE clause, you could use it within your from statement in your last step, such as :
SELECT * FROM (SELECT field1,field2,field3 FROM t1 WHERE Date > DATE(2020,02,16) )
Second, in cte_2, you need to select all the columns you will need from the table t2. Otherwise, your table will have only the row number and it won't be possible to join it with other tables, once it does not provide any other information. Thus, if you need the row number, you select it together with the other columns, which it has to include your primary key if you will perform any join in the future. The syntax would be as follows:
SELECT field1, field2, ROW_NUMBER() OVER(ORDER BY Date) FROM t2
Third, in cte_3, I assume you want to perform an INNER JOIN. Thus, you need to make sure that the primary keys are present in both tables, in your case Date and Ad, which I could not find within your data. Furthermore, you can not have duplicated names when joining two tables and selecting all the columns. For example, in your case you have Brand, value 1, value 2 and value 3 in both tables, it will cause an error. Thus, you need to specify where these fields should come from by selecting one by one or the using a EXCEPT clause.
Finally, in cte_4 and your final select could be together in one step. Basically, you are selecting only one row of data ordered by Date. Then summing the fields value 1, value 2 and value 3 individually based on the partition by date. Moreover, you are not selecting any identifier for the sum, which means that your table will have only the final sums. In general, when peforming a aggregation, such as SUM(), the primary key(s) is selected as well. Lastly, this step could have been performed in one step such as follows, using only the data from t2:
SELECT ReportDate, Brand, sum(value1) as sum_1,sum(value2) as sum_1,sum(value3) as sum_1, sum(value4) as sum_1 FROM (SELECT t2.*, ROW_NUMBER() OVER(PARTITION BY Date ORDER BY Date) as rn t2)
WHERE rn=1
GROUP BY ReportDate, Brand
UPDATE:
With your explanation in the comment section. I was able to created a more specific query. The fields ReportDate,Brand,Portfolio,Campaign and value1,value2,value3 are from t2. Whilst value4 is from t1. The sum is made based on the row number equals to 1. For this reason, the tables t1 and t2 are joined before being using ROW_NUMBER(). Finally, in the last Select statement rn is not selected and the data is aggregated based on ReportDate, Brand, Portfolio and t2.Campaign.
WITH cte_1 AS (
SELECT t2.ReportDate, t2.Brand, t2.Portfolio, t2.Campaign,
t2.value1, t2.value2, t2.value3, t1.value4
FROM t2 LEFT JOIN t1 on t2.ReportDate = t1.ReportDate and t1.placement=t2.Ad
),
cte_2 AS(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Date ORDER BY ReportDate) as rn FROM cte_1
)
SELECT ReportDate, Brand, Portfolio, Campaign, SUM(value1) as sum1, SUM(value2) as sum2, SUM(value3) as sum3,
SUM(value4) as sum4
FROM cte_2
WHERE rn=1
GROUP BY 1,2,3,4

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I'd like to add an additional column determining which location is most frequent, based off name and customer (first two columns).
I have included a photo of an example where name-Jane customer-BEC in my created column would be "Texas" as that has 2 occurrences as opposed to one for California. Would there be anyway to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Here is a db<>fiddle.
Both of these version put 'Texas' in all four rows. However, each can be tweaks with minimal effort to put 'California' in the row for ARC.
In Oracle, you can use aggregate function stats_mode() to compute the most occuring value in a group.
Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s where s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put that in an additional column. It is calculated data over multiple columns - why would you store that in the table itself ? It is complex to code and it would also be very expensive for the database (imagine all the rows you have to calculate that value for if someone inserted a million rows)
Instead you can do one of the following
Calculate it at runtime, as shown in the other answers
if you want to make it more persisent, you could embed that query above in a view
if you want to physically store the info, you could use a materialized view
Plenty of documentation on those 3 options in the official oracle documentation
Your first step is to construct a query that determines the most frequent location, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful, but the logic can be used in row_number(), which gives you a unique id for each row returned. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence has the value 1.
Note that row_number() returns '1' to only one row.
So, now we have
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.

Group by unique customers and then create column for min date

I have a table with customer IDs, transaction_date
Example of table:
ID |transaction_date
PT2073|2015-02-28
PT2073|2019-02-28
PT2013|2015-04-28
PT2013|2017-02-11
PT2013|2017-07-11
GOAL: I want to create another column so that for each unique ID I get the first first transaction date. It should look like:
Example of table:
ID |transaction_date|first_transaction_date
PT2073|2015-02-28 |2015-02-28
PT2073|2019-02-28 |2015-02-28
PT2013|2015-04-28 |2015-04-28
PT2013|2017-02-11 |2015-04-28
PT2013|2017-07-11 |2015-04-28
I created another table where I select the minimum and then group it :
SELECT id, MIN(transaction_date) as first_transaction_date FROM customer_details GROUP BY id
When I checked the table I was not getting the same value in first_transaction_date column for a unique id.
Use min() as a window function:
select t.*, min(t.transaction_date) over (partition by t.id)
from t;
No group by is needed.
Use first_value() window function: It gives out the first value of an ordered group (your groups are the ids that are ordered by the transaction_date):
Click for demo:db<>fiddle
SELECT
*,
first_value(transaction_date) OVER (PARTITION BY id ORDER BY transaction_date)
FROM
mytable
If your Postgres version does not support window functions (prior 8.4, Amazon Redshift):
Click demo:db<>fiddle
SELECT
t1.*,
(SELECT min(transaction_date) FROM mytable t2 WHERE t1.id = t2.id)
FROM
mytable t1

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row Id __________Hour__Minute__City_Search
1___1409346767__23____24_____Balears (Illes)
2___1409346767__23____13_____Albacete
3___1409345729__23____7______Balears (Illes)
4___1409345729__23____3______Balears (Illes)
5___1409345729__22____56_____Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row Id __________Hour__Minute__City_Search
1___1409346767__23____24_____Balears (Illes)
3___1409345729__23____7______Balears (Illes)
What's the easier way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
SELECT Row,
Id,
Hour,
Minute,
City_Search
FROM Table T
JOIN
(
SELECT MIN(Row) AS Row,
ID
FROM Table
GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT id,city search, time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
two options to do it without join...
use Row_Number function to find the last one
Select * FROM
(Select *,
row_number() over(Partition BY ID Order BY Hour desc Minute Desc) as RNB
from table)
Where RNB=1
Manipulate the string and using simple Max function
Select ID,Right(MAX(Concat(Hour,Minute,RPAD(Searc,20,''))),20)
From Table
Group by ID
avoiding Joins is usually much faster...
Hope this helps