Random sample results for Access SQL query

I have a table with a column called names; it is expected that several rows share the same name. I need a query, in Access SQL syntax, that returns a 4% random sample of the rows for every name. For example, if there are 100 rows for each name in the table, it should return 4 random rows for each name in that column. Does that make sense?
I'm not very experienced with Access SQL, or SQL in general, yet. Any guidance will be greatly appreciated! :)

This material might be helpful
http://support.microsoft.com/kb/153747/en-us
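For reference, a minimal sketch of a per-name version of the random-sampling trick from that article, assuming a hypothetical table MyTable with a numeric primary key ID alongside the names column (passing the field to Rnd forces a fresh random value per row):

SELECT t.*
FROM MyTable AS t
WHERE t.ID IN (
    SELECT TOP 4 PERCENT s.ID
    FROM MyTable AS s
    WHERE s.names = t.names
    ORDER BY Rnd(s.ID)
);

Two caveats: Access's TOP includes ties, so a group can occasionally return slightly more than 4% of its rows, and VBA's Rnd repeats the same sequence each session unless it is reseeded (e.g. by calling Randomize first).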

Related

SQLite data comparison on two tables with 2 million rows of data

I've been trying to find a way of comparing a huge amount of data in two different tables, but I am not sure if this is the correct approach.
So I am asking here to understand the problem better and get some clarity in order to solve it.
As the title says, I have two tables with less than 2 million rows of data each, and I need to do a data comparison on them. Basically I only need to check whether the data in one table matches the data in the other table. Each table comes from a separate database, and I've created views so that both have the same column names.
Here is my approach which gives me differences from two tables.
SELECT db1.ADDRESS_ID, db1.address
FROM UAT_CUSTOMERS_DB1 db1
EXCEPT
SELECT db2.ADDRESS_ID, db2.address
FROM UAT_CUSTOMERS_DB2 db2;
This works, but I have two questions:
It seems pretty straightforward, but can someone explain in a bit more depth how this query runs with such speed? Yes, I know - read the docs - but I would really appreciate an alternative answer.
How can I include all the columns from the tables without specifying each column name manually?
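On the speed question: roughly speaking, the engine can sort (or index) both result sets once and stream through them side by side, which is on the order of n log n comparisons rather than comparing every row against every other row. On the second question: EXCEPT compares columns by position, not by name, so as long as the two views expose the same columns in the same order, a star select works without listing anything. A sketch against the views from the question:

SELECT * FROM UAT_CUSTOMERS_DB1
EXCEPT
SELECT * FROM UAT_CUSTOMERS_DB2;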

Transpose query using SQL query

I am currently having one problem and I need your help. In my data table, one column contains all the information I need, but it is stored as JSON. Hence, I have to use a SQL query to parse that information.
However, the parsed information comes back as several columns, and that's not what I need. If I want to turn those columns into rows, i.e. transpose the table result, what function should I use? Thanks in advance. My table result contains several rows and several columns.
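Without knowing the exact SQL dialect, the most portable way to turn a wide result into key/value rows is a UNION ALL unpivot. A sketch with hypothetical column names col_a, col_b, col_c on a parsed result called parsed_json:

SELECT id, 'col_a' AS field_name, col_a AS field_value FROM parsed_json
UNION ALL
SELECT id, 'col_b', col_b FROM parsed_json
UNION ALL
SELECT id, 'col_c', col_c FROM parsed_json;

All the value columns need a compatible type (cast them to text if they differ). Dialects with a native UNPIVOT or json_each-style function can do this more directly.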

Performance improvement on a large SQL table

I have a 260-column table in SQL Server. When we run "SELECT COUNT(*) FROM table" it takes almost 5-6 to get the count. The table contains close to 90-100 million records, and more than 50% of the columns contain NULL. On top of that, users can build dynamic SQL queries against the table from the UI, so searching 90-100 million records takes a long time to return results. Is there a way to improve the find functionality on a SQL table where the filter criteria can be anything? Can anyone suggest the fastest way to get aggregate data on 25 GB of data? The UI should not hang or time out.
Investigate horizontal partitioning. This will really only help query performance if you can force users to put the partitioning key into the predicates.
Try vertical partitioning, where you split one 260-column table into several tables with fewer columns. Put all the values which are commonly required together into one table. The queries will only reference the table(s) which contain columns required. This will give you more rows per page i.e. fewer pages per query.
You have a high fraction of NULLs. Sparse columns may help, but calculate your percentages as they can hurt if inappropriate. There's an SO question on this.
Filtered indexes and filtered statistics may be useful if the DB often runs similar queries.
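To illustrate the filtered-index suggestion, a T-SQL sketch with hypothetical table and column names; because the filter excludes the NULL majority, the index stays small and cheap to maintain:

CREATE NONCLUSTERED INDEX IX_big_table_status
ON big_table (status)
WHERE status IS NOT NULL;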
As the guys state in the comments, you need to analyse a few of the queries and see which indexes would help you the most. If your queries do a lot of searching, you could use the full-text search feature of SQL Server. Here you will find a nice reference with good examples.
Things that come to mind:
[SQL Server 2012+] If you are using SQL Server 2012, you can use the new Columnstore Indexes.
[SQL Server 2005+] If you are filtering a text column, you can use Full-Text Search
If you apply some function frequently to a column (SOUNDEX of a column, for example), you could create a PERSISTED computed column so that the value doesn't have to be computed every time (see the sketch after this list).
Use temp tables (indexed ones will be much better) to reduce the number of rows to work on.
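Hedged sketches of the columnstore and persisted-computed-column ideas, with hypothetical table and column names:

-- SQL Server 2012+: a nonclustered columnstore index over the columns users aggregate on
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_big_table_cs
ON big_table (invoice_date, region, amount);

-- Persisted computed column: SOUNDEX is computed once at write time, not per query,
-- and the result can be indexed like any ordinary column
ALTER TABLE big_table
ADD last_name_soundex AS SOUNDEX(last_name) PERSISTED;

CREATE INDEX IX_big_table_soundex
ON big_table (last_name_soundex);

Note that in SQL Server 2012 a nonclustered columnstore index makes the underlying table read-only until the index is dropped or disabled, so it fits reporting copies better than live transactional tables.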
@Twelfth's comment is very good:
"I think you need to create an ETL process and start changing this into a fact table with dimensions."
Changing my comment into an answer...
You are moving from a transaction world where these 90-100 million records are recorded and into a data warehousing scenario where you are now trying to slice, dice, and analyze the information you have. Not an easy solution, but odds are you're hitting the limits of what your current system can scale to.
In a past job, I had several (6) data fields belonging to each record that were pretty much free text and randomly populated depending on where the data was generated (they were search queries, and people entered what they would basically enter in Google). With 6 fields like this...I created a dim_text table that took each entry in any of these 6 fields and replaced it with an integer. This left me a table with two columns, text_ID and text. Any time a user searched for a specific entry in any of these 6 columns, I would search my dim_text table, which was optimized (indexed) for this sort of query, to return the integer matching the query I wanted...I would then take the integer and search for all occurrences of that integer across the 6 fields instead. Searching 1 table highly optimized for this type of free-text search and then querying the main table for instances of the integer is far quicker than searching 6 free-text fields directly. A sketch of this follows.
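A sketch of that surrogate-key lookup, with hypothetical names (dim_text holds each distinct string once; the main table stores the matching text_id integer in each of the six columns):

-- Step 1: resolve the search string to its integer ID in the indexed dimension table
SELECT text_id
FROM dim_text
WHERE text = 'some search term';

-- Step 2: scan the main table for that integer in any of the six fields
SELECT *
FROM main_table
WHERE 42 IN (field1_id, field2_id, field3_id,
             field4_id, field5_id, field6_id);   -- 42 = text_id from step 1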
I'd also create aggregate tables (reporting tables, if you prefer the term) for your common aggregates. There are quite a few options here that your business setup will determine...for example, if each row is an item on a sales invoice and you need to show sales by date, it may be better to aggregate total sales by invoice and save that to a table; then, when a user wants totals by day, an aggregate is run on the aggregate of the invoices to determine the totals by day (so you've 'partially' aggregated the data in advance).
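As an illustration of the invoice example, the two stages sketched in T-SQL with hypothetical names:

-- Step 1 (scheduled): collapse line items to one row per invoice
SELECT invoice_id, invoice_date, SUM(line_amount) AS invoice_total
INTO agg_invoice_totals
FROM sales_lines
GROUP BY invoice_id, invoice_date;

-- Step 2 (at report time): roll the pre-aggregate up to days
SELECT invoice_date, SUM(invoice_total) AS daily_total
FROM agg_invoice_totals
GROUP BY invoice_date;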
Hope that makes sense...I'm sure I'll need several edits here for clarity in my answer.

Is there a name for the SQL query pattern that gets the row data for each record in a GROUPed query?

A question about how to get the data for each record that is the max in some GROUP appears over and over again on the net. There are many solutions, some of them easier to conceptualize than others. Does the query 'template' here have a name? I ask because one of the other patterns is named 'correlated subquery', I believe. I need to issue this type of query often, and if I had names for the approaches I would have a better mental index of the possible solutions to try.
Here's another example of the query type I want to know a name for.
I would call that an example of a query with a compound join.
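For what it's worth, the problem itself is commonly called "greatest-n-per-group". One well-known self-join shape of it, sketched with a hypothetical table t grouped by grp with value val:

SELECT t1.*
FROM t AS t1
LEFT JOIN t AS t2
  ON t1.grp = t2.grp AND t1.val < t2.val
WHERE t2.val IS NULL;   -- keep rows with no larger val in their group

The two predicates in the ON clause are presumably the "compound join" the answer refers to: the join both matches the group and compares values, and the anti-join filter keeps only each group's maximum row.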

SQL How to find out what tables have a certain column

Is it possible to construct a query to find out which tables have a particular column? Then, if a table has the column, query that table for an ID number? Is this possible?
Sure, I think you can look at the contents of the X$Field system table, described here:
http://ww1.pervasive.com/library/docs/psql/794/sqlref/sqlsystb.html
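A sketch against those Pervasive system tables, assuming the layout the linked docs describe (X$File lists tables keyed by Xf$Id; X$Field lists columns, with Xe$File pointing back at the owning file; the column name 'ID' here is hypothetical):

SELECT f.Xf$Name AS table_name
FROM X$File f
INNER JOIN X$Field fld ON fld.Xe$File = f.Xf$Id
WHERE fld.Xe$Name = 'ID';

Each table that comes back can then be queried for the ID value in question. On most other engines, the equivalent catalog lookup is INFORMATION_SCHEMA.COLUMNS.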