I have the following table called table_persons in Hive:
+--------+------+------------+
| people | type | date |
+--------+------+------------+
| lisa | bot | 19-04-2022 |
| wayne | per | 19-04-2022 |
+--------+------+------------+
If type is "bot", I have to add two rows to the table d1_info; if type is "per", I only have to add one row, so the result is the following:
+---------+------+------------+
| db_type | info | date |
+---------+------+------------+
| x_bot | x | 19-04-2022 |
| x_bnt | x | 19-04-2022 |
| x_per | b | 19-04-2022 |
+---------+------+------------+
How can I add two rows if this condition is met?
With a CASE WHEN, maybe?
You may try using UNION ALL to duplicate the rows with type bot. The following example unions a first query, which selects all records, with a second query that selects only those with type bot.
Edit
In response to the edited question, I have added a flag column named original (storing 1 or 0) to differentiate the original rows from the duplicated ones.
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
You may then insert this into your other table d1_info, using the above query as a subquery or CTE and applying the desired transformations with CASE expressions, e.g.:
INSERT INTO d1_info
(`db_type`, `info`, `date`)
WITH merged_data AS (
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
)
SELECT
CONCAT('x_',CASE
WHEN m1.type='per' THEN m1.type
WHEN m1.original=1 AND m1.type='bot' THEN m1.type
ELSE 'bnt'
END) as db_type,
CASE
WHEN m1.type='per' THEN 'b'
ELSE 'x'
END as info,
m1.date
FROM
merged_data m1
ORDER BY m1.people,m1.date;
See working demo db fiddle here
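The UNION ALL plus CASE approach above can be tried locally. This is a minimal sketch that assumes SQLite (through Python's sqlite3 module) instead of Hive, so the WITH clause is written before the INSERT and `||` replaces CONCAT; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
    CREATE TABLE d1_info (db_type TEXT, info TEXT, date TEXT);
    INSERT INTO table_persons VALUES
        ('lisa', 'bot', '19-04-2022'),
        ('wayne', 'per', '19-04-2022');
""")

# Every row appears once with original = 1;
# rows of type 'bot' appear a second time with original = 0.
conn.execute("""
    WITH merged_data AS (
        SELECT p1.*, 1 AS original FROM table_persons p1
        UNION ALL
        SELECT p1.*, 0 AS original FROM table_persons p1 WHERE p1.type = 'bot'
    )
    INSERT INTO d1_info (db_type, info, date)
    SELECT
        'x_' || CASE
                    WHEN m1.type = 'per' THEN m1.type
                    WHEN m1.original = 1 AND m1.type = 'bot' THEN m1.type
                    ELSE 'bnt'
                END,
        CASE WHEN m1.type = 'per' THEN 'b' ELSE 'x' END,
        m1.date
    FROM merged_data m1
""")

rows = sorted(conn.execute("SELECT db_type, info, date FROM d1_info").fetchall())
print(rows)
```

The 'bot' person yields the x_bot and x_bnt rows, and the 'per' person yields a single x_per row.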
I think what you want is to create a new table that captures your logic. This would simplify your query and make it so you could easily add new types without having to edit logic of a case statement. It may also make it cleaner to view your logic later.
CREATE TABLE table_persons (
`people` VARCHAR(5),
`type` VARCHAR(3),
`date` VARCHAR(10)
);
INSERT INTO table_persons
VALUES
('lisa', 'bot', '19-04-2022'),
('wayne', 'per', '19-04-2022');
CREATE TABLE info (
`type` VARCHAR(5),
`db_type` VARCHAR(5),
`info` VARCHAR(1)
);
insert into info
values
('bot', 'x_bot', 'x'),
('bot', 'x_bnt', 'x'),
('per','x_per','b');
and then you can easily do a join:
select
info.db_type,
info.info,
persons.date date
from
table_persons persons inner join info
on
info.type = persons.type
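To illustrate why the mapping table makes new types easy to add, here is a small sketch assuming SQLite via Python's sqlite3 module. The extra type 'svc', the person 'robo' and their mapping row are hypothetical and only there to show that no query logic changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
    INSERT INTO table_persons VALUES
        ('lisa', 'bot', '19-04-2022'),
        ('wayne', 'per', '19-04-2022');
    CREATE TABLE info (type TEXT, db_type TEXT, info TEXT);
    INSERT INTO info VALUES
        ('bot', 'x_bot', 'x'),
        ('bot', 'x_bnt', 'x'),
        ('per', 'x_per', 'b');
""")

JOIN_SQL = """
    SELECT info.db_type, info.info, persons.date
    FROM table_persons persons
    INNER JOIN info ON info.type = persons.type
"""

before = sorted(conn.execute(JOIN_SQL).fetchall())

# Hypothetical new type: only data is added, the join itself is untouched.
conn.execute("INSERT INTO info VALUES ('svc', 'x_svc', 's')")
conn.execute("INSERT INTO table_persons VALUES ('robo', 'svc', '20-04-2022')")

after = sorted(conn.execute(JOIN_SQL).fetchall())
print(before)
print(after)
```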
I have three tables in MS SQL Server, one with addresses, one with addresstypes and one with assignments of addresstypes:
Address:
IdAddress | Name | ...
1 | xyz
2 | abc |
...
AddressTypes
IdAddresstype | Caption
1 | Customer
2 | Supplier
...
Address2AddressType
IdAddress2AddressType | IdAddress | IdAddressType
1 | 1 | 2
3 | 3 | 2
Now I want to insert a row into Address2AddressType for each address, which is not assigned yet / not emerging in this table with the Addresstype Customer.
So to select those addresses, I use this query:
SELECT adresses.IdAddress
FROM [dbo].[Address] AS adresses
WHERE adresses.IdAddress NOT IN (SELECT adresstypeassignment.IdAddress
FROM [dbo].[Address2AddressType] AS adresstypeassignment)
Now I need to find a way to loop through all those results to insert like this:
INSERT INTO Address2AddressType (IdAddress, IdAddresstype)
VALUES (<IdAddress from result>, 1)
Can anybody help, please?
Thanks in advance.
Regards
Lars
Use insert . . . select:
INSERT INTO Address2AddressType (IdAddress, IdAddresstype)
SELECT a.IdAddress, 1
FROM [dbo].[Address] a
WHERE a.IdAddress NOT IN (SELECT ata.IdAddress FROM [dbo].Address2AddressType ata);
I also simplified the table aliases.
Note: I don't recommend NOT IN for this purpose, because it does not handle NULLs the way you expect (if any values returned by the subquery are NULL no rows at all will be inserted). I recommend NOT EXISTS instead:
INSERT INTO Address2AddressType (IdAddress, IdAddresstype)
SELECT a.IdAddress, 1
FROM [dbo].[Address] a
WHERE NOT EXISTS (SELECT 1
FROM [dbo].Address2AddressType ata
WHERE ata.IdAddress = a.IdAddress
);
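The NULL pitfall with NOT IN can be reproduced in a few lines. This sketch assumes SQLite via Python's sqlite3 module rather than SQL Server; the third address ('def') and the assignment row with a NULL IdAddress are made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (IdAddress INTEGER, Name TEXT);
    CREATE TABLE Address2AddressType (IdAddress INTEGER, IdAddressType INTEGER);
    INSERT INTO Address VALUES (1, 'xyz'), (2, 'abc'), (3, 'def');
    -- One assignment row has a NULL IdAddress (hypothetical dirty data).
    INSERT INTO Address2AddressType VALUES (1, 2), (NULL, 2);
""")

# NOT IN: the NULL in the subquery makes the predicate unknown for every row.
not_in = conn.execute("""
    SELECT a.IdAddress FROM Address a
    WHERE a.IdAddress NOT IN (SELECT ata.IdAddress FROM Address2AddressType ata)
    ORDER BY a.IdAddress
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so unassigned addresses survive.
not_exists = conn.execute("""
    SELECT a.IdAddress FROM Address a
    WHERE NOT EXISTS (SELECT 1 FROM Address2AddressType ata
                      WHERE ata.IdAddress = a.IdAddress)
    ORDER BY a.IdAddress
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```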
I have a requirement that needs columns with values to be transposed into rows. For instance refer to the table below:
cust:
cust_id | cover1 | cover2 | cover3
1234 | 'PAG' | Null | 'TDE'
5678 | Null | 'GAP' | Null
Given the above table, we have to find out which columns have a value and if there is a value in that column then there should be a row created. For e.g.
cust_id | cover
1234 | 'PAG'
1234 | 'TDE'
5678 | 'GAP'
For customer 1234, only cover1 and cover3 are populated, hence 2 records will be created. For 5678, cover1 and cover3 are NULL, hence only 1 record, for cover2, needs to be created.
I could apply a simple approach like below. But I was wondering if there is an elegant approach and a smarter solution to this.
select cust_id, cover1 AS cover from cust where cover1 IS NOT NULL
UNION ALL
select cust_id, cover2 AS cover from cust where cover2 IS NOT NULL
UNION ALL
select cust_id, cover3 AS cover from cust where cover3 IS NOT NULL
Please share your thoughts. We use Spark-SQL 2.4
Thanks
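Maybe like this (this uses the UNPIVOT clause as found in Oracle; as far as I know, Spark SQL 2.4 itself has no UNPIVOT clause):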
SELECT CUST_ID, COVER FROM(
SELECT * FROM test
UNPIVOT(
COVER
for COVER_C
IN(
COVER1,
COVER2,
COVER3))
);
Here is the DEMO
The requirement could be met by using Spark's stack() with the %sql mode:
create temporary view cust
as
select * from values (1234, 'PAG', Null, 'TDE'),
(5678, Null, 'GAP', Null) as (cust_id, cover1, cover2, cover3);
select * from
(select cust_id,
stack(3, 'cover1', cover1,
'cover2', cover2,
'cover3', cover3) as (cover_nm, cover)
from cust
)
where cover is not null;
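For readers without a Spark session at hand, the same column-to-row reshaping can be checked with a portable sketch. It assumes SQLite via Python's sqlite3 module, which has no stack() function, so it falls back to the UNION ALL form from the question; the cover_nm column mirrors the alias used in the stack() answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust (cust_id INTEGER, cover1 TEXT, cover2 TEXT, cover3 TEXT);
    INSERT INTO cust VALUES (1234, 'PAG', NULL, 'TDE'),
                            (5678, NULL, 'GAP', NULL);
""")

# One branch per source column; NULL covers are filtered out afterwards.
unpivoted = conn.execute("""
    SELECT cust_id, cover_nm, cover FROM (
        SELECT cust_id, 'cover1' AS cover_nm, cover1 AS cover FROM cust
        UNION ALL
        SELECT cust_id, 'cover2', cover2 FROM cust
        UNION ALL
        SELECT cust_id, 'cover3', cover3 FROM cust
    )
    WHERE cover IS NOT NULL
    ORDER BY cust_id, cover_nm
""").fetchall()

print(unpivoted)
```

This scans the table once per column, which is the cost stack() avoids in Spark.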
SQL Noob here.
I realize that many variations of this question have been asked, but none seems to work or be fully applicable to my annoying situation, i.e. I don't think PIVOT would work for what I require. I can't fathom the necessary words to google what I need efficiently.
I have the following query:
Select w.WORKORDERID, y.Answer
From
[SD].[dbo].[WORKORDERSTATES] w
LEFT JOIN [SD].[dbo].[WO_RESOURCES] x
ON w.workorderid = x.woid
Full Outer Join [SD].[dbo].ResourcesQAMapping y
ON x.UID = y.MAPPINGID
WHERE w.APPR_STATUSID = '2'
AND w.STATUSID = '1'
AND w.REOPENED = 'false'
It will bring back the following result:
+-----------+---------------------+
| WORKORDER | Answer |
+-----------+---------------------+
| 55693 | Brad Pitt |
| 55693 | brad.pitt#mycom.com |
| 55693 | Location |
| 55693 | NULL |
| 55693 | george |
+-----------+---------------------+
I would like all rows related to the value 55693 to output as columns like below:
+-----------+-----------+---------------------+----------+--------+--------+
| WORKORDER | VALUE1 | VALUE2 | VALUE3 | VALUE4 | VALUE5 |
+-----------+-----------+---------------------+----------+--------+--------+
| 55693 | Brad Pitt | brad.pitt#mycom.com | Location | NULL | george |
+-----------+-----------+---------------------+----------+--------+--------+
There will always be the same amount of values, and I am almost sure that the solution involves creating a temporary table but I cant get it to work any which way.
Any help would be greatly appreciated.
If you always have the same number of values (5) you can use a static PIVOT, otherwise you need a dynamic TSQL statement with PIVOT.
In both cases you'll need to add a column to guarantee rows/columns ordering otherwise there is no guarantee that you'll see the correct value in each column.
Here is a sample query that uses a static PIVOT on 5 values (but remember to add a column to properly order the data, replacing ORDER BY WORKORDER with ORDER BY YOUR_COLUMN_NAME):
declare @tmp table (WORKORDER int, Answer varchar(50))
insert into @tmp values
 (55693, 'Brad Pitt')
,(55693, 'brad.pitt#mycom.com')
,(55693, 'Location')
,(55693, 'NULL')
,(55693, 'george')
select * from
(
    -- number the answers per work order to build the VALUE1..VALUE5 column names
    select
        WORKORDER,
        Answer,
        CONCAT('VALUE', ROW_NUMBER() OVER (PARTITION BY WORKORDER ORDER BY WORKORDER)) AS COL
    from @tmp
) as src
pivot
(
    max(Answer) for COL in ([VALUE1], [VALUE2], [VALUE3], [VALUE4], [VALUE5])
)
as pvt
Try selecting another column whose values differ for each Answer row, use it to order the rows, and then run the PIVOT; that will work.
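The ordering advice above can be sketched without the PIVOT keyword, using ROW_NUMBER plus conditional aggregation. This assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module; the answers table and its seq ordering column are hypothetical stand-ins for the real joined query and whatever column defines the answer order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is an assumed column that fixes the order of answers per work order.
conn.execute("CREATE TABLE answers (seq INTEGER, workorder INTEGER, answer TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        (1, 55693, "Brad Pitt"),
        (2, 55693, "brad.pitt#mycom.com"),
        (3, 55693, "Location"),
        (4, 55693, None),
        (5, 55693, "george"),
    ],
)

# Number the rows deterministically, then fold each position into its own column.
pivoted = conn.execute("""
    SELECT workorder,
           MAX(CASE WHEN rn = 1 THEN answer END) AS value1,
           MAX(CASE WHEN rn = 2 THEN answer END) AS value2,
           MAX(CASE WHEN rn = 3 THEN answer END) AS value3,
           MAX(CASE WHEN rn = 4 THEN answer END) AS value4,
           MAX(CASE WHEN rn = 5 THEN answer END) AS value5
    FROM (
        SELECT workorder, answer,
               ROW_NUMBER() OVER (PARTITION BY workorder ORDER BY seq) AS rn
        FROM answers
    )
    GROUP BY workorder
""").fetchall()

print(pivoted)
```

Because the numbering is ordered by seq rather than by the partition key itself, each answer lands in the same column on every run.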
So, i have 2 tables, for simplicity's sake i'll show 2 columns for one, and only one for the other, but they have a lot more data.
ok, so, the tables looks like this:
_________________________________
| table 1 |
+---------------------------------+
| gas_deu est_pag |
+---------------------------------+
| 56857 (null) |
| 60857 (null) |
| 80857 (null) |
+---------------------------------+
______________________
| table 2 |
+----------------------+
| gas_pag |
+----------------------+
| 56857 |
| 21000 |
| 75857 |
+----------------------+
table 1 and table 2 can be joined using id_edi, and nr_dep (same name in both tables)
what's happening here is basically the following:
in table 1, i have gas_deu which is an amount owed to someone.
in table 2 gas_pag is how much has been paid, meaning gas_deu-gas_pag should give 0 (or negative) if gas_deu was paid in full, or a positive number if it was partially paid
Also, some rows from table 1 are not in table 2 meaning gas_deu has not been paid at all.
What i need to do is update est_pag in table 1 with the following:
if gas_deu-gas_pag<=0 (debt paid) then update est_pag value to 1
if gas_deu-gas_pag>0 (partially paid) then update est_pag value to 2
if a row is in table 1 but not in table 2, it has not been paid at all so est_pag value will be 3
I have tried a lot, and i mean A LOT of code in the update for this. The tables have a lot of columns, so i won't post the code i have tried because it would only get more confusing.
I mostly tried using select sub-queries in the set of the update, and in the where of the update. All of them always give me the single row error. I know this happens because the sub-query returns more than one value, but then how would i do this? I don't see a way in which i get only one value from the query and update only the rows that match the debt status.
Using case was the logical choice for me, but it always seems to return more than one row (i tried (case when gas_deu-gas_pag<=0 then 1 else est_pag end), since if i could get at least one value in there it would be a start, but i get the same more-than-one-row problem).
Any help or suggestions are highly appreciated. I already tried everything i could think of, and a lot of answers from here on stackoverflow, but still can't get it to work.
Edit: adding what table 1 should look like after updating
| table 1 |
+---------------------------------+
| gas_deu est_pag |
+---------------------------------+
| 56857 1 |
| 60857 2 |
| 80857 2 |
+---------------------------------+
Update 2:
update table1
set est_pag=(select (case when (select min((case when gasto.gas_deu-
pago.gas_pag<=0 then 0 else null end))
from table2 pago full outer join gasto_comun_pruebaestpago gasto on
pago.nr_dep=gasto.nr_dep
where gasto.nr_dep=pago.nr_dep and gasto.id_edi=pago.id_edi)=0 then 1 else
null end)
from table2 pago full outer join gasto_comun_pruebaestpago gasto on
pago.nr_dep=gasto.nr_dep
where gasto.nr_dep=pago.nr_dep and gasto.id_edi=pago.id_edi)
where est_pag is null;
This is one of the many statements i tried. This one changes all values to 1. This is because of the min() in there, which outputs just one row with a 0 that then gets checked by the case: 0=0, so everything goes to 1. The problem is that, without the min(), the select does what i need, but the update throws the 'single-row subquery returns more than one row' error again.
You can update through a left outer join. It is possible the second table has more than one payment for the same account (customers may make partial payments), so I aggregate the second table by ID first. You didn't provide input data for testing, so in my small test I assume the tables are matched by id (primary key in the first table, foreign key in the second table - even though I didn't write the constraints into the table definitions).
The way I wrote the update, a 3 will be assigned if an account is not present in the second table, but also if it is present but with NULL in the gas_pag column. (It would be best if that column was declared NOT NULL from the outset!) If a different handling is desired in that case, you didn't say; it can be accommodated easily though.
So, here goes.
TABLE CREATION
create table t1 ( id number, gas_deu number, est_pag number );
insert into t1
select 101, 56857, null from dual union all
select 104, 60857, null from dual union all
select 108, 80857, null from dual
;
create table t2 ( id number, gas_pag number ) ;
insert into t2
select 101, 56857 from dual union all
select 104, 60000 from dual
;
select * from t1;
ID GAS_DEU EST_PAG
---------- ---------- ----------
101 56857
104 60857
108 80857
select * from t2;
ID GAS_PAG
---------- ----------
101 56857
104 60000
UPDATE STATEMENT
update
( select est_pag, gas_deu, gas_pag
from t1 left join
( select id, sum(gas_pag) as gas_pag from t2 group by id ) x
on t1.id = x.id
)
set est_pag = case when gas_deu <= gas_pag then 1
when gas_deu > gas_pag then 2
else 3
end
;
select * from t1;
ID GAS_DEU EST_PAG
---------- ---------- ----------
101 56857 1
104 60857 2
108 80857 3
CLEAN-UP
drop table t1 purge;
drop table t2 purge;
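The idea of aggregating payments first, so that each account contributes exactly one value to the UPDATE, can also be sketched outside Oracle. This version assumes SQLite via Python's sqlite3 module and uses a correlated SUM instead of an updatable join view; splitting account 104's payment into two rows is a made-up variation to exercise the aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, gas_deu INTEGER, est_pag INTEGER);
    INSERT INTO t1 VALUES (101, 56857, NULL), (104, 60857, NULL), (108, 80857, NULL);
    CREATE TABLE t2 (id INTEGER, gas_pag INTEGER);
    -- Account 104 pays in two instalments (hypothetical data).
    INSERT INTO t2 VALUES (101, 56857), (104, 30000), (104, 30000);
""")

# SUM() always returns a single value per account, which avoids the
# "single-row subquery returns more than one row" class of error.
conn.execute("""
    UPDATE t1
    SET est_pag = CASE
        WHEN (SELECT SUM(gas_pag) FROM t2 WHERE t2.id = t1.id) IS NULL THEN 3
        WHEN gas_deu <= (SELECT SUM(gas_pag) FROM t2 WHERE t2.id = t1.id) THEN 1
        ELSE 2
    END
""")

status = conn.execute("SELECT id, est_pag FROM t1 ORDER BY id").fetchall()
print(status)  # [(101, 1), (104, 2), (108, 3)]
```

As in the answer above, an account with no payment rows (or only NULL payments) ends up with status 3.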
I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) with count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
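The same experiment can be run without a SQL Server instance. This is a minimal sketch assuming SQLite via Python's sqlite3 module, with a regular table named bla standing in for the #bla temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bla (id INTEGER, id2 INTEGER)")
conn.executemany(
    "INSERT INTO bla VALUES (?, ?)",
    [
        (None, None),
        (1, None),
        (None, 1),
        (1, None),
        (None, 1),
        (1, None),
        (None, None),
    ],
)

# COUNT(*) counts rows; COUNT(col) counts rows where col is not NULL.
counts = conn.execute("SELECT COUNT(*), COUNT(id), COUNT(id2) FROM bla").fetchone()
print(counts)  # (7, 3, 2)
```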
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The docs explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
The COUNT(*) expression tells SQL Server to count all the rows in a table, including rows that contain NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
Please see the following test code, run on SQL Server 2008:
-- Table variable
DECLARE @Table TABLE
(
    CustomerId int NULL
  , Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only rows with a non-NULL CustomerId (excludes NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically, COUNT(*) counts all the rows in a table, whereas COUNT(COLUMN_NAME) does not: it excludes null values, as everyone else here has also answered.
The more interesting point is performance: unless you are doing multiple counts or a complex query, prefer COUNT(*) over COUNT(COLUMN_NAME); otherwise performance can degrade noticeably on large tables.
Further elaborating upon the answers given by @SQLMenace and @Brannon, this makes use of the GROUP BY clause, which was mentioned by the OP but is not present in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
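Tying this back to the original question, which filters with HAVING ... > 1: the sketch below assumes SQLite via Python's sqlite3 module and reuses the table1 data above. With COUNT(*), the NULL group passes the filter and shows up as an extra row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [(1,), (2,), (None,), (2,), (None,), (3,), (1,), (4,), (None,), (2,)],
)

# COUNT(id) is 0 for the NULL group, so that group is filtered out.
with_column = conn.execute("""
    SELECT id, COUNT(id) FROM table1
    GROUP BY id HAVING COUNT(id) > 1
    ORDER BY id
""").fetchall()

# COUNT(*) is 3 for the NULL group, so it is reported as a duplicate.
with_star = conn.execute("""
    SELECT id, COUNT(*) FROM table1
    GROUP BY id HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(with_column)  # [(1, 2), (2, 3)]
print(with_star)    # [(None, 3), (1, 2), (2, 3)]
```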
A common suggestion is to use
Count(1) in place of a column name or *
to count the number of rows in a table, on the assumption that it is faster because the engine never has to check whether the column exists in the table. In practice, many engines treat COUNT(1) and COUNT(*) the same way, so measure before relying on this.
There is no difference if you count a single fixed column of your table; if you want to count more than one column, you have to specify which columns you need to count.
Thanks,
As mentioned in the previous answers, Count(*) counts rows even when the column is NULL, whereas count(Columnname) counts a row only if the column has a value.
It's always best practice to avoid * (Select *, count *, …)