Select Rows with Longest Unique Strings - sql

I'm trying to figure out how to select a subset of rows from a table, with the longest unique string for each "parent" string. I'll provide table examples below and my code that didn't work.
Current Table:

Name                               SalePrice
---------------------------------  ---------
NorthAmerica\US\Northeast\NewYork       8576
NorthAmerica\US\Northeast               2598
SouthAmerica\Brazil                     1348
SouthAmerica\Chile\NorthEast            9726
SouthAmerica\Chile                      4412
NorthAmerica\Canada\Ontario             3894
NorthAmerica\Canada                     6321
Desired Output:

Name                               SalePrice
---------------------------------  ---------
NorthAmerica\US\Northeast\NewYork       8576
SouthAmerica\Brazil                     1348
SouthAmerica\Chile\NorthEast            9726
NorthAmerica\Canada\Ontario             3894
Originally, I thought I could apply some form of logic based on the number of backslashes (for example: SELECT * FROM TestTable WHERE Name LIKE '%\\%'). However, this logic doesn't work because some names' furthest branch may have only 1 backslash while others may have 3 or more.
Code to generate the test table is below; any help/advice would be greatly appreciated.
create table t1 (
    [name] varchar(60),
    [saleprice] int
);
insert into t1 values ('NorthAmerica\US\Northeast\NewYork',8576);
insert into t1 values ('NorthAmerica\US\Northeast',2598);
insert into t1 values ('SouthAmerica\Brazil',1348);
insert into t1 values ('SouthAmerica\Chile\NorthEast',9726);
insert into t1 values ('SouthAmerica\Chile',4412);
insert into t1 values ('NorthAmerica\Canada\Ontario',3894);
insert into t1 values ('NorthAmerica\Canada',6321);

Use the operator LIKE with NOT EXISTS: the subquery looks for another row whose name starts with the current row's name plus at least one more character (the _ wildcard matches exactly one character), so only rows with no longer extension survive.
SELECT t1.*
FROM t1
WHERE NOT EXISTS (
    SELECT 1
    FROM t1 AS t2
    WHERE t2.name LIKE t1.name + '_%'
);
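One caveat worth noting: '_%' treats any longer string as a descendant, so a hypothetical sibling such as 'NorthAmerica\CanadaWest' would wrongly suppress 'NorthAmerica\Canada'. A sketch of a stricter pattern that requires the backslash separator (in T-SQL, a backslash is an ordinary literal character inside a LIKE pattern):
-- Suppress a row only when another row extends it past a backslash separator.
SELECT t1.*
FROM t1
WHERE NOT EXISTS (
    SELECT 1
    FROM t1 AS t2
    WHERE t2.name LIKE t1.name + '\%'
);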


Multiple Queries into 1 Query doing counts

OK, trying to make this make sense and see if I can do this. I have multiple queries stored in SQL Server 2012 that I can run individually to get actual results. Each of these queries connects to a multitude of tables. What I want to do is take all of these queries and put them into a single query that gives me the counts in one master list.
So, for example: I have a query that looks for all records that have no email address. The next query looks for all records missing a phone number in a set field. The next query looks for records missing a value in a field that should be filled.
Each of these queries pulls back results, and I can run them one at a time. I want to set up a single query I can run to give me the counts of each in a single result list.
I started with a UNION statement and put two of the queries into it. The results came out like this:
NoEmail NoPhone
NULL 24486
74596 NULL
What I would like this to look like is this:
NoEmail NoPhone
74596 24486
Any ideas on how to do this?
I hope this is enough info and if not, let me know and I'll get you more.
Thanks.
You can do it like this:
Set up (for demonstration purposes only, so you can see the data I'm using):
create table t1 (
    id numeric,
    email varchar(20)
);
insert into t1 values (1, 'test@example.com');
insert into t1 (id) values (2);
insert into t1 (id) values (3);

create table t2 (
    id numeric,
    phone varchar(20)
);
insert into t2 values (1, '123-456-7890');
insert into t2 (id) values (2);
insert into t2 (id) values (3);
insert into t2 (id) values (4);
insert into t2 (id) values (5);
Query:
select
    (select count(id) from t1 where email is null) as NoEmail,
    (select count(id) from t2 where phone is null) as NoPhone;
Result:
NoEmail NoPhone
2 4
You can add as many additional queries on to the end of the query as you'd like:
select
    (select count(id) from t1 where email is null) as NoEmail,
    (select count(id) from t2 where phone is null) as NoPhone,
    (select ...) as anotherCol,
    (select ...) as yetAnotherCol;
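When several of the counts come from the same table, a single pass with conditional aggregation avoids scanning that table once per subquery. A sketch, assuming a hypothetical Contacts table with nullable email and phone columns:
-- One scan of the (hypothetical) Contacts table, counting each gap separately.
select
    sum(case when email is null then 1 else 0 end) as NoEmail,
    sum(case when phone is null then 1 else 0 end) as NoPhone
from Contacts;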

SQLite: accumulator (sum) column in a SELECT statement

I have a table like this one:
SELECT value FROM table;
value
1
3
13
1
5
I would like to add an accumulator column, so that I have this result:
value accumulated
1 1
3 4
13 17
1 18
5 23
How can I do this? What's the real name of what I want to do? Thanks
Try this way (it assumes an id column that defines the row order):
select value,
    (select sum(t2.value) from table t2 where t2.id <= t1.id) as accumulated
from table t1
But if that does not work on your database, just add an ORDER BY; it belongs on the outer query, since ordering inside a scalar subquery does nothing:
select value,
    (select sum(t2.value) from table t2 where t2.id <= t1.id) as accumulated
from table t1
order by id
This works on Oracle ;) but it should work on SQLite too.
Here's a method to create a running total without the inefficiency of summing all prior rows for every row. (I know this question is 6 years old, but it's one of the first Google results for "sqlite running total".)
create table t1 (value integer, accumulated integer, id integer primary key);
insert into t1 (value) values (1);
insert into t1 (value) values (3);
insert into t1 (value) values (13);
insert into t1 (value) values (1);
insert into t1 (value) values (5);
UPDATE t1
SET accumulated = ifnull(
        (SELECT ifnull(accumulated, 0)
         FROM t1 AS ROWPRIOR
         WHERE ROWPRIOR.id = (t1.id - 1)), 0) + value;
.headers on
select * from t1;
value|accumulated|id
1|1|1
3|4|2
13|17|3
1|18|4
5|23|5
This should only be run once after importing all the values. Or, set the accumulated column to all nulls before running again.
The operation is called a running sum. SQLite did not traditionally support it as such, but there are ways to make it work. One is just as Sebastian Brózda posted. Another I detailed in an answer to another question.
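For readers on a current SQLite: version 3.25 and later support window functions, so the running sum can now be written directly (again assuming an id column that defines the row order):
-- Running sum with a window function (SQLite 3.25+).
select value,
       sum(value) over (order by id) as accumulated
from t1
order by id;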

Comparison Query to Compare Two SQL Server Tables [duplicate]

This question already has answers here:
sql query to return differences between two tables
(14 answers)
Closed 6 years ago.
I would like to know how to compare the records of two different database tables. What I mean is: the two tables may have different column names but the same data, except that one of them may have more records than the other, and I want to see the difference between them. How do I write that SQL query? FYI: the two databases are under the same SQL Server instance.
Table1
+------+----------+
| name | lastname |
+------+----------+
| John | rose     |
| Demy | Sanches  |
+------+----------+

Table2
+-------+-----------+
| name2 | lastname2 |
+-------+-----------+
| John  | rose      |
| Demy  | Sanches   |
| Ruby  | Core      |
+-------+-----------+
After comparing Table1 and Table2, the query should return Ruby Core from Table2.
Select * from Table2
Except
Select * from Table1
This shows the rows of Table2 that have no match in Table1 (Ruby Core, in the example). EXCEPT compares columns by position, so the different column names don't matter; swap the two SELECTs to get rows that are only in Table1.
Late answer, but it may be useful to other readers of this thread.
Besides other solutions, I can recommend a SQL comparison tool called ApexSQL Data Diff.
I know you'd prefer the solution not based on the software, but for other visitors, who may want to do this in an easier way, I strongly suggest reading this article: http://solutioncenter.apexsql.com/how-to-compare-sql-server-database-tables-with-different-names/
The article explains how to use the Object mapping feature in ApexSQL Data Diff, which is particularly useful in situations where two tables share the same name, but their column names are different.
To handle such a case - each column pair needs to be mapped manually in order for the data stored within them to be included when comparing SQL database tables for differences.
If you do an outer join from T1 to T2, you can find rows in the former that are not in the latter by looking for nulls in the T2 values; similarly, an outer join from T2 to T1 gives you the rows only in T2. Union the two together and you get the lot... something like:
SELECT 'Table1' AS TableName, name, lastname
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.name = Table2.name2
    AND Table1.lastname = Table2.lastname2
WHERE Table2.name2 IS NULL
UNION
SELECT 'Table2' AS TableName, name2 AS name, lastname2 AS lastname
FROM Table2
LEFT OUTER JOIN Table1 ON Table2.name2 = Table1.name
    AND Table2.lastname2 = Table1.lastname
WHERE Table1.name IS NULL
That's off the top of my head - and I'm a bit rusty :)
If you are using SQL Server, use a FULL JOIN. It does exactly the same as Murph's two queries, but in one command:
SELECT COALESCE(Table1.name, Table2.name2) AS name,
       COALESCE(Table1.lastname, Table2.lastname2) AS lastname
FROM Table1
FULL JOIN Table2 ON Table1.name = Table2.name2
    AND Table1.lastname = Table2.lastname2
WHERE Table1.name IS NULL OR Table2.name2 IS NULL
You could use the CHECKSUM function if you're confident that the data is expressed identically (keep in mind that CHECKSUM is hash-based, so two different rows can occasionally produce the same checksum).
Example:
if OBJECT_ID('Table1', 'Table') is not null drop table Table1
if OBJECT_ID('Table2', 'Table') is not null drop table Table2

create table table1
(
    id int identity(0, 1),
    name varchar(128),
    lastname varchar(128)
)
create table table2
(
    id int identity(0, 1),
    name varchar(128),
    lastname varchar(128)
)

insert into table1 (name, lastname) values ('John', 'rose')
insert into table1 (name, lastname) values ('Demy', 'Sanches')
insert into table2 (name, lastname) values ('John', 'rose')
insert into table2 (name, lastname) values ('Demy', 'Sanches')
insert into table2 (name, lastname) values ('Ruby', 'Core')

select table2.*
from table1
right outer join table2 on CHECKSUM(table1.name, table1.lastname) = CHECKSUM(table2.name, table2.lastname)
where table1.id is null
See the CHECKSUM MSDN topic for more information.
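If collisions worry you, the same join can be written with HASHBYTES instead; this is only a sketch of the idea (SHA2_256 and CONCAT require SQL Server 2012 or later, and the delimiter keeps values like ('ab','c') and ('a','bc') from hashing the same):
-- Hash each row with a column delimiter so column boundaries are preserved.
select table2.*
from table1
right outer join table2
    on HASHBYTES('SHA2_256', CONCAT(table1.name, '|', table1.lastname))
     = HASHBYTES('SHA2_256', CONCAT(table2.name, '|', table2.lastname))
where table1.id is null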
Try dbForge Data Compare for SQL Server. It can compare and synchronize any database data. Quick, easy, always delivering a correct result. See how it flies on your database!
Another approach: UNION ALL the two tables and keep the rows that appear only once; those are the differences.
create table #test
(
    Sno INT IDENTITY(1,1),
    ExpDate VARCHAR(50),
    Amt INT,
    Amt1 INT,
    Amt2 INT,
    SumoAmt INT
)
create table #test1
(
    Sno INT IDENTITY(1,1),
    ExpDate VARCHAR(50),
    Amt INT,
    Amt1 INT,
    Amt2 INT,
    SumoAmt INT
)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,10,40)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,20,50)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,30,60)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',NULL,20,40,70)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,10,40)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,20,50)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,30,60)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',NULL,20,40,70)
SELECT MIN(TableName) as TableName, Sno, Expdate, Amt, Amt1, Amt2, SumoAmt
FROM
(
    SELECT '#test' as TableName, Sno, Expdate, Amt, Amt1, Amt2, SumoAmt
    FROM #test
    UNION ALL
    SELECT '#test1' as TableName, Sno, Expdate, Amt, Amt1, Amt2, SumoAmt
    FROM #test1
) tmp
GROUP BY Sno, Expdate, Amt, Amt1, Amt2, SumoAmt
HAVING COUNT(*) = 1
ORDER BY Sno
If you want the differences from both tables:
(SELECT *, 'in Table1' AS Comments
FROM Table1
EXCEPT
SELECT * , 'in Table1' AS Comments
FROM Table2)
UNION
(SELECT *, 'in Table2' AS Comments
FROM Table2
EXCEPT
SELECT *, 'in Table2' AS Comments
FROM Table1)
Firefly will do exactly what you're looking for. It lets you build two SQL statements and then compare the results, showing missing rows and data differences. Each query can even come from a different database, such as Oracle / SQL Server.
http://download.cnet.com/Firefly-Data-Compare-Tool/3000-10254_4-10633690.html?tag=mncol

T-SQL Grouping rows from the MAX length columns in different rows (?)

I'm trying to come up with a way to combine rows in a table, taking the longest string from any of the rows that share a row key.
Example:
CREATE TABLE test1
(
    akey int not null,
    text1 varchar(50) NULL,
    text2 varchar(50) NULL,
    text3 varchar(50) NULL
)
INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley',NULL)
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley estate','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869A')
INSERT INTO test1 VALUES ( 2,'','birmingham','P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North East','P53342')
INSERT INTO test1 VALUES ( 2,'Smith Cl.',NULL,'P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North','P53342')
With these rows I would be looking for the result of:
1 Winchester Road, crawley estate, P21869A
2 Smith Close, birmingham North East, P53342B
EDIT: the results above need to be in a table rather than just a comma-separated string.
As you can see in the result, the output should be the longest text value in each column across the rows sharing the 'akey' value.
I'm trying to come up with a solution that does not involve lots of subqueries on each column; the actual table has 32 columns and over 13 million rows.
The reason I'm doing this is to create a cleaned-up table that has the best result in each column, with just one row per ID.
This is my first post, so let me know if you need any more info, and I'm happy to hear about any best practices about posting that I've broken!
Thanks
Ben.
SELECT A.akey,
    (
        SELECT TOP 1 T1.text1
        FROM test1 T1
        WHERE T1.akey = A.akey AND LEN(T1.text1) = MAX(LEN(A.text1))
    ) AS text1,
    (
        SELECT TOP 1 T2.text2
        FROM test1 T2
        WHERE T2.akey = A.akey AND LEN(T2.text2) = MAX(LEN(A.text2))
    ) AS text2,
    (
        SELECT TOP 1 T3.text3
        FROM test1 T3
        WHERE T3.akey = A.akey AND LEN(T3.text3) = MAX(LEN(A.text3))
    ) AS text3
FROM test1 AS A
GROUP BY A.akey
I just realized you said you have 32 columns. I don't see a good way to do that, unless UNPIVOT would allow you to create separate rows (akey, textn) for each text* column.
Edit: I may not have a chance to finish this today, but UNPIVOT looks useful:
;WITH COLUMNS AS
(
    SELECT akey, [Column], ColumnValue
    FROM
    (
        SELECT X.akey, X.text1, X.text2, X.text3
        FROM test1 X
    ) AS p
    UNPIVOT (ColumnValue FOR [Column] IN (text1, text2, text3)) AS UNPVT
)
SELECT *
FROM COLUMNS
ORDER BY akey, [Column], LEN(ColumnValue)
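To finish that thought, here is a sketch (untested at the 13-million-row scale) that ranks the unpivoted values by length within each (akey, column) pair, keeps the longest, and pivots back to one row per key; ROW_NUMBER and PIVOT need SQL Server 2005 or later:
;WITH Unpivoted AS
(
    -- One row per (akey, column, value); UNPIVOT drops NULLs for us.
    SELECT akey, [Column], ColumnValue
    FROM (SELECT akey, text1, text2, text3 FROM test1) AS p
    UNPIVOT (ColumnValue FOR [Column] IN (text1, text2, text3)) AS u
),
Ranked AS
(
    -- Rank each column's values by length within the key, longest first.
    SELECT akey, [Column], ColumnValue,
           ROW_NUMBER() OVER (PARTITION BY akey, [Column]
                              ORDER BY LEN(ColumnValue) DESC) AS rn
    FROM Unpivoted
)
SELECT akey, text1, text2, text3
FROM (SELECT akey, [Column], ColumnValue FROM Ranked WHERE rn = 1) AS s
PIVOT (MAX(ColumnValue) FOR [Column] IN (text1, text2, text3)) AS pvt;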
This seems really ugly, but at least works (on SQL2K) and doesn't need subqueries:
select test1.akey, A.text1, B.text2, C.text3
from test1
inner join test1 A on A.akey = test1.akey
inner join test1 B on B.akey = test1.akey
inner join test1 C on C.akey = test1.akey
group by test1.akey, A.text1, B.text2, C.text3
having len(a.text1) = max(len(test1.text1))
and len(B.text2) = max(len(test1.text2))
and len(C.text3) = max(len(test1.text3))
order by test1.akey
I must admit that it needs an inner join for each column, and I wonder how this would impact the 32-column, 13-million-record table... I'd try both this approach and the one based on subqueries and look at the execution plans; I'd actually be curious to know which one wins.

How to do INSERT into a table records extracted from another table

I'm trying to write a query that extracts and transforms data from a table and then inserts that data into another table. Yes, this is a data warehousing query, and I'm doing it in MS Access. So basically I want some query like this:
INSERT INTO Table2(LongIntColumn2, CurrencyColumn2) VALUES
(SELECT LongIntColumn1, Avg(CurrencyColumn) as CurrencyColumn1 FROM Table1 GROUP BY LongIntColumn1);
I tried it, but I get a syntax error message.
What would you do if you wanted to do this?
No "VALUES", no parenthesis:
INSERT INTO Table2(LongIntColumn2, CurrencyColumn2)
SELECT LongIntColumn1, Avg(CurrencyColumn) as CurrencyColumn1 FROM Table1 GROUP BY LongIntColumn1;
You have two syntax options:
Option 1
CREATE TABLE Table1 (
    id int identity(1, 1) not null,
    LongIntColumn1 int,
    CurrencyColumn money
)
CREATE TABLE Table2 (
    id int identity(1, 1) not null,
    LongIntColumn2 int,
    CurrencyColumn2 money
)
INSERT INTO Table1 VALUES(12, 12.00)
INSERT INTO Table1 VALUES(11, 13.00)
INSERT INTO Table2
SELECT LongIntColumn1, Avg(CurrencyColumn) as CurrencyColumn1 FROM Table1 GROUP BY LongIntColumn1
Option 2
CREATE TABLE Table1 (
    id int identity(1, 1) not null,
    LongIntColumn1 int,
    CurrencyColumn money
)
INSERT INTO Table1 VALUES(12, 12.00)
INSERT INTO Table1 VALUES(11, 13.00)
SELECT LongIntColumn1, Avg(CurrencyColumn) as CurrencyColumn1
INTO Table2
FROM Table1
GROUP BY LongIntColumn1
Bear in mind that Option 2 will create a table with only the columns in the projection (those in the SELECT).
Remove both VALUES and the parenthesis.
INSERT INTO Table2 (LongIntColumn2, CurrencyColumn2)
SELECT LongIntColumn1, Avg(CurrencyColumn) FROM Table1 GROUP BY LongIntColumn1
I believe your problem in this instance is the VALUES keyword. You use VALUES when you are inserting only one row of data. For inserting the results of a SELECT, you don't need it.
Also, you really don't need the parentheses around the SELECT statement.
From MSDN:
Multiple-record append query:
INSERT INTO target [(field1[, field2[, …]])] [IN externaldatabase]
SELECT [source.]field1[, field2[, …]]
FROM tableexpression
Single-record append query:
INSERT INTO target [(field1[, field2[, …]])]
VALUES (value1[, value2[, …]])
Remove VALUES from your SQL.
Remove "values" when you're appending a group of rows, and remove the extra parentheses. You can avoid the circular reference by using an alias for avg(CurrencyColumn) (as you did in your example) or by not using an alias at all.
If the column names are the same in both tables, your query would be like this:
INSERT INTO Table2 (LongIntColumn, Junk)
SELECT LongIntColumn, avg(CurrencyColumn) as CurrencyColumn1
FROM Table1
GROUP BY LongIntColumn;
And it would work without an alias:
INSERT INTO Table2 (LongIntColumn, Junk)
SELECT LongIntColumn, avg(CurrencyColumn)
FROM Table1
GROUP BY LongIntColumn;
Well, I think the best way would be (will be?) to define 2 recordsets and use them as an intermediate between the 2 tables:
Open both recordsets.
Extract the data from the first table (SELECT blablabla).
Update the 2nd recordset with the data available in the first recordset (either by adding new records or updating existing ones).
Close both recordsets.
This method is particularly interesting if you plan to update tables from different databases (i.e. each recordset can have its own connection...).
Inserting data from one table to another table in a different database:
insert into DocTypeGroup
select DocGrp_Id, DocGrp_SubId, DocGrp_GroupName, DocGrp_PM, DocGrp_DocType
from OPENDATASOURCE('SQLOLEDB', 'Data Source=10.132.20.19;User ID=sa;Password=gchaturthi').dbIPFMCI.dbo.DocTypeGroup
Do you want to insert the extracted data into an existing table? If it does not matter, you can try the query below:
SELECT LongIntColumn1, Avg(CurrencyColumn) as CurrencyColumn1
INTO T1
FROM Table1
GROUP BY LongIntColumn1;
It will create a new table, T1, with the extracted information.