I'm trying to do some data exploration on a table: for every column, count how many times each distinct value occurs, as in the output below:
+------------+-------------+------------+----------+
| table_name | column_name | distinct | count(*) |
| | | row_string | |
+------------+-------------+------------+----------+
| customer | state | WA | 15 |
+------------+-------------+------------+----------+
| customer | state | NSW | 786 |
+------------+-------------+------------+----------+
| customer | state | SA | 51 |
+------------+-------------+------------+----------+
| ... | ... | ... | ... |
+------------+-------------+------------+----------+
| customer | zip_code | 3563 | 33 |
+------------+-------------+------------+----------+
| customer | zip_code | 7583 | 52 |
+------------+-------------+------------+----------+
| customer | zip_code | 3453 | 553 |
+------------+-------------+------------+----------+
| customer | zip_code | 2132 | 211 |
+------------+-------------+------------+----------+
| ... | ... | ... | ... |
+------------+-------------+------------+----------+
I've been doing something like this:
select state, count(*)
from customer
group by state
union
select zip_code, count(*)
from customer
group by zip_code
union
...
However, this is not efficient when the table has many columns. Is there a more effective way to achieve this?
(The below was posted before I looked at the link that Martin Smith posted in the comments above. That approach using XML appears simpler and likely has roughly similar performance.)
You can try generating dynamic SQL that looks like:
select
cast(N'customer' AS sysname) AS table_name,
C.column_name,
C.value,
count(*) AS [count]
from [customer] T
cross apply (
values
(cast(N'state' AS sysname), cast(T.[state] AS nvarchar(max))),
(cast(N'zip_code' AS sysname), cast(T.[zip_code] AS nvarchar(max)))
) C(column_name, value)
group by column_name, value
The CAST() operations might seem like overkill, but they may be necessary to prevent truncation if shorter data is followed by longer data. For some data types like DATETIME or FLOAT, you might want to be more precise in the formatting, perhaps even intentionally limiting excess precision.
When injecting table names and column names into the generated SQL, use QUOTENAME(...) and QUOTENAME(..., '''') to safely [quote] or 'quote' the injected values.
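As a rough illustration of the generate-then-run idea, here is a minimal Python sketch against an in-memory SQLite database rather than SQL Server. Several things in it are assumptions: quote_ident and quote_literal are hypothetical helpers standing in for QUOTENAME(...) and QUOTENAME(..., ''''), the sample rows are made up, and the generated statement is one grouped SELECT per column joined with UNION ALL instead of the CROSS APPLY (VALUES ...) form, which SQLite does not support.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def profile_table(conn: sqlite3.Connection, table: str):
    """Return (table_name, column_name, value, count) for every column."""
    # PRAGMA table_info lists the columns; index 1 of each row is the name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]
    parts = [
        f"SELECT {quote_literal(table)} AS table_name, "
        f"{quote_literal(col)} AS column_name, "
        f"CAST({quote_ident(col)} AS TEXT) AS value, "
        f"COUNT(*) AS cnt "
        f"FROM {quote_ident(table)} GROUP BY {quote_ident(col)}"
        for col in cols
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY column_name, value"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (state TEXT, zip_code TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("WA", "3563"), ("NSW", "3563"), ("NSW", "7583")],  # made-up sample rows
)
rows = profile_table(conn, "customer")
for row in rows:
    print(row)
```

The column list comes from the database's own metadata, so adding a column to the table needs no change to the query-building code.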
Related
I can't give the actual table, but my problem is something like this:
Assuming that there is a table called Names with entries like these:
+--------------+
| name | id |
+--------------+
| Jack | 1001 |
| Jack | 1022 |
| John | 1010 |
| Boris | 1092 |
+--------------+
I need to select all the unique names from that table and display them (only names, not ids). But if I do:
SELECT DISTINCT name FROM Names;
Then it will return:
+-------+
| name |
+-------+
| Jack |
| John |
| Boris |
+-------+
But as you can see in the table, the 2 people named "Jack" are different, since they have different ids. How do I get an output like this one:
+-------+
| name |
+-------+
| Jack |
| Jack |
| John |
| Boris |
+-------+
?
Assume that some ids can or will be repeated (id is not marked as a primary key in the question).
Also, in the question, the result will have 1 column and some number of rows (the exact number is given; it's 18,013). Is there a way to check that I have the right number of rows? I know I can use COUNT(), but while selecting the unique values I used GROUP BY, so using COUNT() would return how many names share each id, as in:
SELECT COUNT(name), id FROM Names GROUP BY id;
+------------------+
| COUNT(name) | id |
+------------------+
| 2 | 1001 |
| 1 | 1022 |
| 1 | 1092 |
| 3 | 1003 |
+------------------+
So, is there something to help me verify my output?
You can use group by:
select name
from names
group by name, id;
You can get all the distinct persons with:
SELECT DISTINCT name, id
FROM names
and you can select from the above query only the names:
SELECT name
FROM (
SELECT DISTINCT name, id
FROM names
) AS t
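To check the approach and the row count together, here is a small sketch using Python's sqlite3 module with the sample rows from the question. The extra repeated (Jack, 1001) row and the alias t are assumptions added for illustration. Wrapping the same DISTINCT subquery in COUNT(*) gives the number of rows the final result should have, which can be compared with the expected figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [
        ("Jack", 1001),
        ("Jack", 1022),
        ("John", 1010),
        ("Boris", 1092),
        ("Jack", 1001),  # hypothetical repeated row, to show it is collapsed
    ],
)

# One name per distinct (name, id) pair; ordered by id only to make the
# output deterministic.
people = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM (SELECT DISTINCT name, id FROM names) AS t ORDER BY id"
    )
]

# The same subquery wrapped in COUNT(*) gives the expected number of rows.
person_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, id FROM names) AS t"
).fetchone()[0]

print(people, person_count)
```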
I have the following table running on PostgreSQL 9.5:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
|10 | 4567890 | abc123-ab |
|11 | 4567890 | gex890-aj |
|12 | 4567890 | ghi567-ef |
+---+------------+-------------+
I am looking for the rows for each trans_id based on a LIKE query, like this:
SELECT * FROM table
WHERE message LIKE '%def234%'
This, of course, returns just three rows, the three that match my pattern in the message column. What I am looking for, instead, is all the rows matching that trans_id in groups of messages that match. That is, if a single row matches the pattern, get all the rows with the trans_id of that matching row.
That is, the results would be:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
+---+------------+-------------+
Notice that rows 10, 11, and 12 were not selected because none of them matched the %def234% pattern.
I have tried (and failed) to write a sub-query to get the all the related rows when a single message matches a pattern:
SELECT sub.*
FROM (
SELECT DISTINCT trans_id FROM table WHERE message LIKE '%def234%'
) sub
WHERE table.trans_id = sub.trans_id
I could easily do this with two queries, but the list of matching trans_ids from the first query, passed into a WHERE trans_id IN (<huge list of trans_ids>) clause, would be very large and very inefficient. I believe there is a way to do it with a single query.
Thank you!
This will do the job, I think:
WITH sub AS (
SELECT DISTINCT trans_id
FROM table
WHERE message LIKE '%def234%'
)
SELECT *
FROM table JOIN sub USING (trans_id);
Hope this helps.
Try this:
SELECT ID, trans_id, message
FROM (
SELECT ID, trans_id, message,
COUNT(*) FILTER (WHERE message LIKE '%def234%')
OVER (PARTITION BY trans_id) AS pattern_cnt
FROM mytable) AS t
WHERE pattern_cnt >= 1
Using a FILTER clause in the windowed version of COUNT function we can get the number of records matching the predefined pattern within each trans_id slice. The outer query uses this count to filter out irrelevant slices.
You can do this.
WITH trans
AS
(SELECT DISTINCT trans_id
FROM t1
WHERE message LIKE '%def234%')
SELECT t1.*
FROM t1,
trans
WHERE t1.trans_id = trans.trans_id;
I think this will perform better. If you have enough data, you can run EXPLAIN on both the subquery and the CTE versions and compare the output.
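For a quick sanity check that the single-query forms agree, here is a sketch in Python using sqlite3 with the sample rows from the question. It runs on SQLite rather than PostgreSQL, so it says nothing about relative performance; it only compares results. The table name t1 follows the answer above, and the IN (subquery) variant is an additional illustration that pushes the list of matching trans_ids into the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, trans_id INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 1234567, "abc123-ef"), (2, 1234567, "def234-gh"), (3, 1234567, "ghi567-ij"),
        (4, 8902345, "ced123-ef"), (5, 8902345, "def234-bz"), (6, 8902345, "ghi567-ij"),
        (7, 6789012, "abc123-ab"), (8, 6789012, "def234-cd"), (9, 6789012, "ghi567-ef"),
        (10, 4567890, "abc123-ab"), (11, 4567890, "gex890-aj"), (12, 4567890, "ghi567-ef"),
    ],
)

# Variant 1: keep the list of matching trans_ids inside the query itself.
in_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM t1
        WHERE trans_id IN (SELECT trans_id FROM t1 WHERE message LIKE '%def234%')
        ORDER BY id
        """
    )
]

# Variant 2: CTE of distinct matching trans_ids joined back to the table.
join_ids = [
    row[0]
    for row in conn.execute(
        """
        WITH trans AS (
            SELECT DISTINCT trans_id FROM t1 WHERE message LIKE '%def234%'
        )
        SELECT t1.id FROM t1 JOIN trans USING (trans_id)
        ORDER BY t1.id
        """
    )
]

print(in_ids, join_ids)
```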
I have a table in which several identifiers of a person may be stored. In this table I would like to create a single calculated identifier column that stores the best identifier for that record depending on what identifiers are available.
For example (some fictional sample data) ....
Table = "Citizens"
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | ?
2 | Jones | JG892435262 | NULL | 4537409273 | ?
3 | Trask | NULL | NULL | 9276542119 | ?
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | ?
I know the order in which I would like to choose identifiers ...
Drivers-License-No
Social-Security-No
State-Id-No
So I would like the calculated identifier column to be part of the table schema. The desired results would be ...
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | 374-784-8888
2 | Jones | JG892435262 | NULL | 4537409273 | JG892435262
3 | Trask | NULL | NULL | 9276542119 | 9276542119
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | CL231429888
Is this possible? If so, what SQL would I use to calculate what goes in the "Calculated" column?
I was thinking of something like ..
SELECT
CASE
WHEN ([DL-No] is NOT NULL) THEN [DL-No]
WHEN ([SS-No] is NOT NULL) THEN [SS-No]
WHEN ([State-Id-No] is NOT NULL) THEN [State-Id-No]
AS "Calculated"
END
FROM Citizens
The easiest solution is to use coalesce():
select c.*,
coalesce([DL-No], [SS-No], [State-Id-No]) as calculated
from citizens c
However, I think your case statement will also work if you fix the syntax: the column alias belongs after END (END AS "Calculated"), not before it.
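The equivalence of the two forms can be checked with a short sketch in Python using sqlite3. It loads the rows from the desired-results table in the question and runs both COALESCE and a CASE expression with the alias placed after END. SQLite stands in for the original database here, and the lower-case table and column names id and last_name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE citizens (id INTEGER, last_name TEXT, '
    '"DL-No" TEXT, "SS-No" TEXT, "State-Id-No" TEXT)'
)
conn.executemany(
    "INSERT INTO citizens VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Smith", None, "374-784-8888", "7383204848"),
        (2, "Jones", "JG892435262", None, "4537409273"),
        (3, "Trask", None, None, "9276542119"),
        (4, "Clinton", "CL231429888", "543-123-5555", "1840430324"),
    ],
)

# COALESCE returns its first non-NULL argument, in priority order.
with_coalesce = [
    row[0]
    for row in conn.execute(
        'SELECT COALESCE("DL-No", "SS-No", "State-Id-No") AS calculated '
        "FROM citizens ORDER BY id"
    )
]

# The same priority order written as a CASE expression (alias after END).
with_case = [
    row[0]
    for row in conn.execute(
        """
        SELECT CASE
                 WHEN "DL-No" IS NOT NULL THEN "DL-No"
                 WHEN "SS-No" IS NOT NULL THEN "SS-No"
                 WHEN "State-Id-No" IS NOT NULL THEN "State-Id-No"
               END AS calculated
        FROM citizens ORDER BY id
        """
    )
]

print(with_coalesce)
```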
This is somewhat hard to describe, but I have a PostgreSQL 9.1 table (planet_osm_roads).
My query is
SELECT
osm_id, name, highway, way, md5(astext(way)) AS md5
FROM planet_osm_roads
WHERE highway IS NOT NULL
AND md5(astext(way)) IN (
SELECT DISTINCT md5(astext(way))
FROM planet_osm_roads
WHERE highway IS NOT NULL
GROUP BY md5
HAVING count(osm_id) > 1
)
ORDER BY osm_id
The result is
osm_id | name | highway | ...way ... | md5
----------+------+---------------+-------...----...--+----------------------------------
-1641383 | | motorway | 010200...CA96...0 | 04b4336b997e7ea9d99208bd487bbe7d
-1641383 | | motorway | 010200...EC3E...0 | ae945148417ada285130c59277c48a25
-1641383 | | motorway | 010200...7BF6...0 | 5c5a1b8ae40c1b7f24e293a012ad2add
23133731 | | motorway_link | 010200...EC3E...0 | ae945148417ada285130c59277c48a25
31309105 | | motorway | 010200...7BF6...0 | 5c5a1b8ae40c1b7f24e293a012ad2add
49339926 | | motorway | 010200...CA96...0 | 04b4336b997e7ea9d99208bd487bbe7d
(6 rows)
I want a result that holds 3 rows (one for every md5 hash), each filled with the values from any one of the corresponding rows.
So a valid row for "ae945148417ada285130c59277c48a25" may contain the osm_id/highway pair "-1641383" & "motorway" or "23133731" & "motorway_link". I don't mind and will consider both correct.
How can I solve this, and what is the required operation/technique called? That way I know what to call it and what to search for next time.
select
md5(astext(way)) as md5,
min(osm_id) osm_id,
min(name) name,
min(highway) highway,
min(way) way
from planet_osm_roads
where highway is not null
group by 1
having count(osm_id) > 1
I have a table containing procurement contracts that looks like this:
+------+-----------+------------+---------+------------+-----------+
| type | text | date | company | supplierID | name |
+------+-----------+------------+---------+------------+-----------+
| 0 | None | 2004-03-29 | 310 | 227234 | HRC INFRA |
| 0 | None | 2007-09-30 | 310 | 227234 | HRC INFRA |
| 0 | None | 2010-11-29 | 310 | 227234 | HRC INFRA |
| 2 | Strategic | 2011-01-01 | 310 | 227234 | HRC INFRA |
| 0 | None | 2012-04-10 | 310 | 227234 | HRC INFRA |
+------+-----------+------------+---------+------------+-----------+
In this example the contract is the same in the first three rows, so I only want the first one.
The row with type = 2 is a change in procurement contract with the given supplier. I want to select that row as well.
On the last row the contract changes back to 0, so I want to select that row as well.
Basically I want to select the first row and the rows where the contract type changes. So the result should look like this:
+------+-----------+------------+---------+------------+-----------+
| type | text | date | company | supplierID | name |
+------+-----------+------------+---------+------------+-----------+
| 0 | None | 2004-03-29 | 310 | 227234 | HRC INFRA |
| 2 | Strategic | 2011-01-01 | 310 | 227234 | HRC INFRA |
| 0 | None | 2012-04-10 | 310 | 227234 | HRC INFRA |
+------+-----------+------------+---------+------------+-----------+
Any suggestions as to how I can accomplish this?
;WITH cte AS
(
SELECT ROW_NUMBER() OVER (ORDER BY date) AS Id,
type, text, date, company, supplierId, name
FROM your_table
)
SELECT c1.type, c1.text, c1.date, c1.company,
c1.supplierId, c1.name
FROM cte c1 LEFT JOIN cte c2 ON c1.id = c2.id + 1
WHERE c2.text IS NULL OR c1.text != c2.text
I don't have SQL Server in front of me to test it out, so I'm not going to attempt the actual solution right now, but FYI there are a few things you need:
1) A way to make sure the records are ordered properly. I don't see any kind of an id here which means you have no guarantee that they will appear in that order. I assume there is one so just make sure you order by it
2) You need to do an outer join on the table to itself on whatever the index is, but instead of "table1.index = table2.index" it will look like "table1.index = table2.index + 1". If your indexes aren't sequential then it will make joining them this way more complex than that though.
3) In the where clause you'll specify something like
where table1.type <> table2.type
That will get you most of the way there. It won't pick up the very first record, though, since there is no record before it to compare to, so you'll need an OR addition to compensate for that. I'm also assuming that type has no NULL values.
Sorry I couldn't be more help with an actual implementation but maybe someone else will take care of that shortly.
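The three steps above can be sketched compactly with the LAG window function, which fetches the previous row's value directly instead of joining on index + 1. The example below is a hedged illustration in Python using sqlite3 (it assumes the bundled SQLite is version 3.25 or later, where window functions are available). The table name contracts is hypothetical, only the type, text and date columns are loaded, and rows are ordered by date on the assumption that dates are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (type INTEGER, text TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [
        (0, "None", "2004-03-29"),
        (0, "None", "2007-09-30"),
        (0, "None", "2010-11-29"),
        (2, "Strategic", "2011-01-01"),
        (0, "None", "2012-04-10"),
    ],
)

changes = conn.execute(
    """
    SELECT type, text, date
    FROM (
        SELECT type, text, date,
               LAG(type) OVER (ORDER BY date) AS prev_type  -- step 1 and 2
        FROM contracts
    ) AS t
    WHERE prev_type IS NULL        -- the very first row has no predecessor
       OR type <> prev_type        -- step 3: the type changed
    ORDER BY date
    """
).fetchall()

for row in changes:
    print(row)
```

The prev_type IS NULL test is the "OR addition" for the first record, and it relies on type itself never being NULL.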
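This might be what you want, presuming you don't have type < 0. The coalesce wraps the whole subquery so that the first row, which has no earlier row, is compared against -1 instead of NULL.
SELECT *
FROM [TABLE] as ot where ot.type <>
coalesce((select top 1 it.type from [TABLE] as it where it.date < ot.date order by it.date desc), -1)
Also, take note of Brandon's point about making sure the rows are ordered, since I don't see a primary key.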