Database Analytics: Leveraging SQL’s Analytic Functions

In today’s data-driven world, everyone knows how important database analytics is, and analysts use every practice and tactic they can to extract valuable insights from the data available to them. At the same time, we all know that this is not always as easy as it looks. It can become very challenging and time-consuming for a data analyst.

With the explosion in the volume of data available in every sector, data analysis has become a hectic job. Working through data to arrive at valuable, data-driven outcomes is more complicated now, and this more complex analysis demands several tools and platforms.

What if we could analyze the data inside the database using much simpler queries?

SQL analytic functions make this possible. In this article, I discuss several SQL analytic functions that run inside the database server itself and simplify the path to the end result.

With these analytic functions, you can calculate a value for each row based on a group of related rows (a window). Unlike an aggregate with GROUP BY, they do not collapse those rows into a single row, so every original row stays in the result.

They cover ranking, time-series calculations, windowing, trend analysis, and much more.
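To make the contrast with an ordinary aggregate concrete, here is a minimal sketch. The table `window_demo` and its three rows are hypothetical and exist only for this illustration. The syntax assumes a database with window-function support (MySQL 8.0+, PostgreSQL, or SQLite 3.25+).

```sql
-- Hypothetical demo table, used only for this illustration
CREATE TABLE window_demo (
    name    VARCHAR(20),
    subject VARCHAR(30),
    marks   INT
);

INSERT INTO window_demo (name, subject, marks)
VALUES ('Ann', 'Maths', 80),
       ('Bob', 'Maths', 90),
       ('Cid', 'Biology', 70);

-- Plain aggregate: GROUP BY collapses each subject into one row (2 rows)
SELECT subject,
       AVG(marks) AS avg_marks
FROM   window_demo
GROUP  BY subject;

-- Analytic (window) version: every row is kept (3 rows), and each row
-- carries the average of its own subject alongside its own marks
SELECT name,
       subject,
       marks,
       AVG(marks) OVER (PARTITION BY subject) AS subject_avg
FROM   window_demo;
```

In the second query, both Maths rows show a `subject_avg` of 85 and the Biology row shows 70, while the individual marks remain visible.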

Let’s jump into the practice section without wasting a single minute and work through these functions one after another. I have gathered the details below, each with a practical example.

By the end of this tutorial, you will definitely have practical, hands-on knowledge of these SQL queries.

Creating a Demo Table

We will create a demo table and apply all the analytic functions to it, so that you can easily follow along with the tutorial.

Remember: Preferably, use MySQL (8.0 or later) or PostgreSQL for this tutorial, because some of the functions listed below are not available in SQLite (and SQLite versions older than 3.25 do not support analytic functions at all).

This table contains data about several graduate students in five columns: Student ID, Student First Name, Student Last Name, Subject, and Final Marks (out of 100).

Creating the Students table containing 5 columns (SQL)

CREATE TABLE Students
  (
     ID          INT NOT NULL PRIMARY KEY,
     FirstName   VARCHAR(255),
     LastName    VARCHAR(255),
     Subject     VARCHAR(30),
     Final_Marks INT
  );

Now the table exists in our database, and we need to insert data into it.

Insert a dummy data set into the newly created table:

Inserting data into the Students table (SQL)

INSERT INTO Students (ID, FirstName, LastName, Subject, Final_Marks)
VALUES (1, 'Ben', 'Park', 'Maths', 88),
       (2, 'Rose', 'Mery', 'Maths', 77),
       (3, 'Peter', 'Parker', 'Chemistry', 78),
       (4, 'david', 'Methew', 'Maths', 54),
       (5, 'Pollard', 'Bell', 'Chemistry', 91),
       (6, 'Steve', 'Man', 'Biology', 88),
       (7, 'Jef', 'Jos', 'Physics', 99),
       (8, 'Tom', 'Herry', 'Maths', 97),
       (9, 'Ricky', 'Shon', 'Biology', 78),
       (10, 'David', 'Drill', 'Chemistry', 93),
       (11, 'Joseph', 'Hunt', 'Chemistry', 93),
       (12, 'James', 'Josh', 'Biology', 65),
       (13, 'Bill', 'Adam', 'Maths', 90),
       (14, 'Den', 'Warner', 'Biology', 45),
       (15, 'Frenk', 'Feny', 'Physics', 56);

Time to view our table:

SELECT *
FROM   Students;

Result:

Let’s execute the analytic functions and find out the results.

RANK() & DENSE_RANK() Function

The RANK() function is used to assign a rank to each row within the result set based on the specified column’s values. It is commonly used in scenarios where you want to determine the rank of a particular value relative to others in the same column.

For example, if you have a table of students with their scores, you can use the RANK() function to assign a rank to each student based on their score. The student with the highest score would have a rank of 1, the student with the second-highest score would have a rank of 2, and so on.

The basic syntax of the RANK() function is as follows:

RANK() OVER ([PARTITION BY column] ORDER BY expression [ASC|DESC])

Description:

Here, `column` is the column used to partition the result set (optional), and `expression` is the column used to determine the ranking order. The `ASC` or `DESC` keyword specifies whether the ranking is in ascending or descending order; `ASC` is the default.

Keep in mind that RANK() can result in tied ranks if multiple rows have the same values in the ranking column. In such cases, the tied rows will receive the same rank, and the next rank will be skipped. To handle tied ranks differently, you can use other functions like DENSE_RANK() or ROW_NUMBER().
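The following minimal sketch puts the three functions side by side so that the tie handling is visible. The table `tie_demo` and its rows are hypothetical. The column `name` is added as a tie-breaker for ROW_NUMBER(), because without one the order among tied rows is not guaranteed.

```sql
-- Hypothetical demo table with one tie (Bob and Cid both have 90)
CREATE TABLE tie_demo (
    name  VARCHAR(20),
    marks INT
);

INSERT INTO tie_demo (name, marks)
VALUES ('Ann', 95),
       ('Bob', 90),
       ('Cid', 90),
       ('Dee', 85);

-- Compare the three numbering functions over the same ordering
SELECT name,
       marks,
       RANK()       OVER (ORDER BY marks DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY marks DESC)       AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY marks DESC, name) AS row_num
FROM   tie_demo
ORDER  BY marks DESC, name;

-- Expected result:
-- name | marks | rnk | dense_rnk | row_num
-- Ann  |    95 |   1 |         1 |       1
-- Bob  |    90 |   2 |         2 |       2
-- Cid  |    90 |   2 |         2 |       3
-- Dee  |    85 |   4 |         3 |       4
```

RANK() jumps from 2 to 4 after the tie, DENSE_RANK() continues with 3, and ROW_NUMBER() never repeats a number.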

Now let’s apply RANK() to our Students table to get a clearer picture:

SELECT *,
       RANK()
         OVER (
           ORDER BY Final_Marks DESC) AS ranks
FROM   Students;

Result:

The result above shows that the rows are sorted by final marks in descending order and that each row gets a rank. Students with the same marks get the same rank, and the rank that follows the tied rows is skipped.

You can also find the toppers of each subject by partitioning the ranking by subject.

For example:

SELECT *,
       RANK()
         OVER (
           PARTITION BY Subject
           ORDER BY Final_Marks DESC) AS ranks
FROM   Students;

Result:

In the example above, the ranking is partitioned by subject, so ranks are assigned separately within each subject.

__________________________________________________________________________________________________

Note: Observe that two students got the same marks in Chemistry and are both ranked 1. However, the rank for the next row starts directly at 3, skipping rank 2.

___________________________________________________________________________________________________

This is a distinctive feature of the RANK() function: its ranks are not necessarily consecutive. The rank that follows a tie equals the tied rank plus the number of tied rows (here, 1 + 2 = 3).
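Partitioned ranks also make it easy to list only the toppers of each subject. Window functions are evaluated after the WHERE clause, so a rank cannot be filtered directly in WHERE. A common pattern, sketched below on a hypothetical `topper_demo` table, is to compute the rank in a derived table and filter on it in the outer query.

```sql
-- Hypothetical demo table, used only for this illustration
CREATE TABLE topper_demo (
    name    VARCHAR(20),
    subject VARCHAR(30),
    marks   INT
);

INSERT INTO topper_demo (name, subject, marks)
VALUES ('Ann', 'Chemistry', 93),
       ('Bob', 'Chemistry', 93),
       ('Cid', 'Chemistry', 91),
       ('Dee', 'Maths', 97),
       ('Eve', 'Maths', 88);

-- Rank within each subject in a derived table, then keep only rank 1.
-- RANK() keeps tied toppers, so both Chemistry students with 93 are returned.
SELECT name,
       subject,
       marks
FROM   (SELECT name,
               subject,
               marks,
               RANK() OVER (PARTITION BY subject ORDER BY marks DESC) AS subject_rank
        FROM   topper_demo) AS ranked
WHERE  subject_rank = 1
ORDER  BY subject, name;

-- Returns Ann and Bob (Chemistry, 93) and Dee (Maths, 97)
```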

To avoid skipped ranks, use DENSE_RANK() instead. It works like RANK(), but it always assigns ranks consecutively.

For example:

SELECT *,
       DENSE_RANK()
         OVER (
           PARTITION BY Subject
           ORDER BY Final_Marks DESC) AS ranks
FROM   Students;

Result:


 
The result shows that all the ranks are consecutive, even when marks are duplicated within the same partition: in Chemistry, the row after the two students tied at rank 1 now gets rank 2.
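One practical use of this behavior is finding the N-th highest distinct mark. The sketch below uses a hypothetical `dense_demo` table whose Chemistry rows contain a tie at the top, similar to the tutorial data. With RANK(), the Chemistry ranks would be 1, 1, 3, 4, so no row would have rank 2. DENSE_RANK() does produce a rank 2.

```sql
-- Hypothetical demo table, used only for this illustration
CREATE TABLE dense_demo (
    name    VARCHAR(20),
    subject VARCHAR(30),
    marks   INT
);

INSERT INTO dense_demo (name, subject, marks)
VALUES ('Ann', 'Chemistry', 93),
       ('Bob', 'Chemistry', 93),
       ('Cid', 'Chemistry', 91),
       ('Dan', 'Chemistry', 78),
       ('Eve', 'Maths', 97),
       ('Fay', 'Maths', 90),
       ('Gus', 'Maths', 88);

-- Second-highest distinct mark in each subject
SELECT name,
       subject,
       marks
FROM   (SELECT name,
               subject,
               marks,
               DENSE_RANK() OVER (PARTITION BY subject ORDER BY marks DESC) AS dense_rnk
        FROM   dense_demo) AS ranked
WHERE  dense_rnk = 2
ORDER  BY subject;

-- Returns Cid (Chemistry, 91) and Fay (Maths, 90)
```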

NTILE() Function

In SQL, the NTILE() function divides the result set into a specified number of groups, or buckets, of (as nearly as possible) equal size. Each row is assigned the number of its bucket, from 1 up to the specified number of buckets.

The basic syntax of the NTILE() function is as follows:

NTILE(number_of_buckets) OVER (ORDER BY expression [ASC|DESC])

Description:

Here, `number_of_buckets` represents the desired number of groups you want to divide the result set into, and `expression` is the column used to determine the ordering of the rows within the result set.

For example, if you have a table of students with their marks and you want to divide them into 4 groups based on the marks they obtained, you can use NTILE(4). The function distributes the students into four groups, each with approximately the same number of students (as close to equal as possible).

Keep in mind that if the number of rows in the result set is not divisible by the specified number of buckets, some groups may have one more row than others.
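Here is a minimal sketch of NTILE(4) on a hypothetical `ntile_demo` table with 10 rows. Because 10 is not divisible by 4, two groups get an extra row. MySQL, PostgreSQL, and SQLite put the larger groups first, so the group sizes are 3, 3, 2, 2.

```sql
-- Hypothetical demo table with 10 distinct marks
CREATE TABLE ntile_demo (
    name  VARCHAR(20),
    marks INT
);

INSERT INTO ntile_demo (name, marks)
VALUES ('A', 99), ('B', 97), ('C', 93), ('D', 91), ('E', 90),
       ('F', 88), ('G', 78), ('H', 77), ('I', 65), ('J', 45);

-- Assign each student to one of 4 groups by marks (group 1 = highest marks)
SELECT name,
       marks,
       NTILE(4) OVER (ORDER BY marks DESC) AS mark_group
FROM   ntile_demo
ORDER  BY marks DESC;

-- Count rows per group: 10 rows into 4 groups gives sizes 3, 3, 2, 2
SELECT mark_group,
       COUNT(*) AS students_in_group
FROM   (SELECT NTILE(4) OVER (ORDER BY marks DESC) AS mark_group
        FROM   ntile_demo) AS grouped
GROUP  BY mark_group
ORDER  BY mark_group;
```

With these marks, A, B and C land in group 1; D, E and F in group 2; G and H in group 3; and I and J in group 4.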

Pro Tip: The NTILE() function is helpful when you need to split ordered data into a number of evenly sized groups, such as quartiles, quintiles, or other similar divisions of the data.
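NTILE() also accepts an optional PARTITION BY clause, as RANK() does. In that case the rows of each partition are split into buckets independently. The sketch below uses a hypothetical `half_demo` table and splits each subject's students into an upper half (1) and a lower half (2).

```sql
-- Hypothetical demo table, used only for this illustration
CREATE TABLE half_demo (
    name    VARCHAR(20),
    subject VARCHAR(30),
    marks   INT
);

INSERT INTO half_demo (name, subject, marks)
VALUES ('Ann', 'Maths', 97),
       ('Bob', 'Maths', 90),
       ('Cid', 'Maths', 88),
       ('Dan', 'Maths', 54),
       ('Eve', 'Biology', 88),
       ('Fay', 'Biology', 78),
       ('Gus', 'Biology', 45);

-- Buckets are computed separately per subject.
-- Biology has 3 rows, so its first bucket gets the extra row.
SELECT name,
       subject,
       marks,
       NTILE(2) OVER (PARTITION BY subject ORDER BY marks DESC) AS half_group
FROM   half_demo
ORDER  BY subject, marks DESC;
```

In Maths, Ann and Bob fall into group 1 and Cid and Dan into group 2. In Biology, Eve and Fay fall into group 1 and Gus into group 2.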
