Filter (social media)

Filters are digital image effects often used on social media. They initially simulated the effects of camera filters and have since evolved alongside facial recognition technology and computer-generated augmented reality. Social media filters, especially beauty filters, are often used to alter the appearance of selfies taken on smartphones or similar devices. While filters are commonly associated with beauty enhancement and feature alteration, they serve a wide range of functions, from adjusting photo tones to face animations and interactive elements. These tools let users enhance photos and leave room for creative expression and playful interaction with digital content.

== History ==

Beauty filters originate from Purikura ("print club"), a type of Japanese photographic arcade game machine conceived in 1994 by Sasaki Miho, a female employee at Atlus, and released in 1995 by Atlus and Sega primarily for female visitors at Japanese arcades. The machines allowed users to manipulate digital selfie photos with kawaii beauty filters similar to later Snapchat filters. Purikura filters included beautifying the image, cat whiskers, bunny ears, writing text, scribbling graffiti, selecting backdrops, borders, insertable decorations, icons, hair extensions, twinkling diamond tiaras, tenderized light effects, and predesigned decorative margins. To capitalize on the Purikura phenomenon in Japan during the late 1990s, Japanese mobile phones began including a front-facing camera, starting with the Kyocera Visual Phone VP-210 in 1999. The Sanyo SCP-5300, released in 2002, was the first camera phone with filter effects such as illumination, white-balance control, sepia, black and white, and negative colors. Purikura-like beauty filters later appeared in smartphone apps such as Instagram and Snapchat in the 2010s. In 2010, Apple introduced the iPhone 4, the first iPhone model with a front-facing camera.
It gave rise to a dramatic increase in selfies, which could be touched up with more flattering lighting effects using applications such as Instagram. The American photographer Cole Rise was involved in creating the original Instagram filters around 2010, designing several of them himself, including Sierra, Mayfair, Sutro, Amaro, and Willow. However, the technology for virtual lens filters was invented and patented by Patrick Levy-Rosenthal in 2007; the patent has received 100 citations, including from Facebook, Nvidia, Microsoft, Samsung, and Snap. In September 2011, the Instagram 2.0 update introduced "live filters," which allowed users to preview a filter's effect while shooting with the application's camera. #NoFilter, a hashtag used to label an image that had not been filtered, became popular around 2013. A 2014 update allowed users to adjust the intensity of filters and fine-tune other aspects of the image, features that had been available for years in applications such as VSCO and Litely. In 2014, Snapchat started releasing sponsored filters to monetize participatory use of the application. In September 2015, Snapchat acquired Looksery and released a feature called "lenses," animated filters using facial recognition technology. Early lenses available on Snapchat included Heart Eyes, Terminator, Puke Rainbows, Old, Scary, Rage Face, and Heart Avalanche. The Coachella filter, released in April 2016, was a popular early augmented reality filter. In April 2017, Facebook released the Camera Effects Platform, the first augmented reality platform that allowed developers to create their own filters and effects for Facebook's camera. In December 2017, Snapchat launched Lens Studio, an augmented reality developer tool that lets users and advertisers do the same on Snapchat.
In April 2022, TikTok followed Facebook and Snapchat by launching its own augmented reality developer platform, Effect House. In February 2023, Effect House opened access to generative AI tools that allowed creators to change facial features in real time. In November 2023, TikTok released a feature that let users create their own effects directly in the TikTok application, without needing Effect House. In August 2024, Meta announced that it would remove third-party filter effects from its family of apps by January 14, 2025. Its AR development software, Meta Spark AR, which was at one point the "world's largest mobile AR platform," would be retired at the same time. Brand and creator effects represent the vast majority of filters available on Meta platforms, with over 2 million third-party filters available as of 2021.

== Beauty filter ==

A beauty filter is a filter applied to still photographs, or to video in real time, to enhance the physical attractiveness of the subject. Typical effects include smoothing skin texture and modifying the proportions of facial features, for example enlarging the eyes or narrowing the nose. Filters may be built into social media apps such as Instagram or Snapchat, or provided by standalone applications such as Facetune. In 2020, the "Perfect Skin" filter for Snapchat and Instagram, created by Brazilian augmented reality developer Brenno Faustino, gained more than 36 million impressions in the first 24 hours after its release. In 2021, TikTok users pointed out that the platform's default front-facing camera automatically applied retouching and other feature-altering filters. Users noted that these filters slimmed faces, smoothed skin, whitened teeth, and altered features such as nose and eye size, with no setting to disable them.
In March 2023, the "Bold Glamour" filter was released on TikTok and went viral, with over 18 million videos created within its first week. The filter seamlessly enhances the user's features, creating the illusion of fuller eyebrows, higher cheekbones, enhanced eye makeup, a smaller nose, plumper lips, and clearer skin, for an effect that looks natural yet distinct. As of May 2024, the filter had been used in over 220 million videos and marked a pivotal moment for beauty filters on digital platforms. Critics have raised concerns that the widespread use of such filters on social media may lead to negative body image, particularly among girls. Although Meta's removal of third-party filters will likely take all beauty filters off its platforms, academics argue that the damage from beautifying filters has already been done.

=== Background ===

The manipulation of photos to enhance attractiveness has long been possible using software such as Adobe Photoshop and, before that, analogue techniques such as airbrushing. However, such tools required considerable technical and artistic skill, so their use was mostly limited to professional contexts such as magazines or advertisements. By contrast, filters work automatically through complex algorithms, requiring little or no input from the user. This ease of use, combined with the increasing processing power of smartphones and the rise of social media and selfie culture, has led to photographic manipulation on a much wider scale than ever before. One of the earliest examples of a content-aware digital photographic filter is red-eye reduction.
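Red-eye reduction shows what "content-aware" means in practice: instead of applying one transform to every pixel, the filter changes only the pixels that match a pattern. The Python sketch below is a deliberately simplified illustration, not the method used by any camera or app. It assumes the eye region has already been located (real systems detect faces and eyes first), represents pixels as (red, green, blue) tuples, and uses a hypothetical heuristic in which a pixel counts as red-eye when its red channel strongly dominates green and blue; the function name and thresholds are illustrative assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0-255


def reduce_red_eye(region: List[List[Pixel]],
                   ratio: float = 1.5,
                   min_red: int = 80) -> List[List[Pixel]]:
    """Return a copy of an eye region with red-eye pixels darkened.

    Hypothetical heuristic (an assumption, not a standard): a pixel is
    treated as red-eye when its red channel is at least `min_red` and
    exceeds `ratio` times the mean of its green and blue channels.
    """
    result: List[List[Pixel]] = []
    for row in region:
        new_row: List[Pixel] = []
        for r, g, b in row:
            gb_mean = (g + b) / 2
            if r >= min_red and r > ratio * gb_mean:
                # Replace red with the green/blue mean so the pupil turns dark and neutral.
                new_row.append((int(gb_mean), g, b))
            else:
                # Other pixels (e.g. skin tones, grey) are left unchanged.
                new_row.append((r, g, b))
        result.append(new_row)
    return result


eye = [[(200, 30, 40), (220, 170, 150)],
       [(100, 100, 100), (190, 20, 25)]]
print(reduce_red_eye(eye))
```

Here the first and last pixels are treated as red-eye and darkened, while the skin-like and grey pixels pass through unchanged. A production filter would also need to find the eyes and handle highlights and soft edges, which this sketch ignores.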
=== Effects ===

Typical changes applied by beauty filters include:
* Smoothing skin texture; minimizing fine lines and blemishes
* Erasing under-eye bags
* Erasing nasolabial lines ("laugh lines")
* Applying virtual makeup, such as lipstick or eyeshadow
* Slimming the face; erasing double chins
* Enlarging the eyes
* Whitening teeth
* Narrowing the nose
* Increasing the fullness of the lips

Beauty filters most frequently target the face, though in some cases they may affect other body parts. For example, the app "Retouch Me" was reported to have a feature that allows users to superimpose visible abdominal muscles (a "six pack") onto photos featuring the subject's bare stomach.

=== Reception and psychological effects ===

Some commentators have expressed concern that beauty filters may create unrealistic beauty standards, particularly among girls, and contribute to rates of body dysmorphic disorder. A correlation has been established between negative body image and the use of beautifying filters, though the direction of causation is unknown. The inability to tell whether a particular image has been filtered is thought to exacerbate their negative psychological effects. Policymakers have advocated for social networks to disclose the use of filters; TikTok, Instagram, and Snapchat all label filtered photos and videos with the name of the filter applied. It has also been noted that beauty filters on social media tend to highlight Eurocentric features, like lighter eyes, a smaller nose, and flushed ch

Web intelligence

Web intelligence is the area of scientific research and development that explores the role of, and makes use of, artificial intelligence and information technology for new products, services, and frameworks empowered by the World Wide Web. The term was coined in a paper by Ning Zhong, Jiming Liu, Y. Y. Yao, and Setsuo Ohsuga presented at the Computer Software and Applications Conference (COMPSAC) in 2000.

== Research ==

Research on web intelligence covers many fields, including data mining (in particular web mining), information retrieval, pattern recognition, predictive analytics, the semantic web, and web data warehousing, typically with a focus on web personalization and adaptive websites.

Singularity studies

Singularity studies is an interdisciplinary academic field that examines the idea of the technological singularity, the hypothesised point at which machine intelligence may surpass human intelligence, as might be achieved through artificial intelligence (AI), robotics, and other technologies and sciences, together with its social impacts. Research in the field spans a broad range of disciplines, including information science, robotics, social informatics, economics, philosophy, and ethics. The primary aim of singularity studies is to gain an integrated understanding of how social systems are transformed in tandem with the explosive evolution of AI, and of the changes such transformation will bring to human perspectives, ethics, and legal systems.

== History ==

Academic work on the technological singularity has appeared in computer science, philosophy, sociology, and law since the early 1990s. Early discussions of an intelligence explosion were popularised by science-fiction writer Vernor Vinge in 1993 and later systematised by futurist Ray Kurzweil. Since the 2010s, universities such as Oxford, Stanford, and Keio have established dedicated programmes, while peer-reviewed journals have begun to publish scenario analyses and policy studies. Ongoing debates question the predictive value of singularity scenarios and warn against a deterministic view of technology.

== Characteristics of research ==

Singularity studies extends beyond mere future predictions and offers an intellectual foundation for proactively designing and creating a desirable future. Principal research themes in the field include:
* Ethics of AI
* Social implications of technologies
* Possibility of harmonious coexistence of humans and AI
* Communication with AI
* Redesign of social systems

== Technologists and academics ==

* Vernor Vinge: Propounded the concept of the singularity in 1993, with a major influence on both academia and science fiction.
* Ray Kurzweil: Predicted in his 2005 book The Singularity Is Near that the technological singularity would arrive around 2045.
* Nick Bostrom: Offered philosophical reflections on superintelligence and the risks posed by AI. He was the founding director of the now-dissolved Future of Humanity Institute at the University of Oxford.

=== Japan ===

* Kento Sasano: A social informatician, AI educator, and inventor. He is the president of the Japan Society of Singularity Studies.

== Challenges and outlook ==

Singularity studies is still evolving as an academic field, and many challenges remain unresolved regarding the systematization of its theories, research methods, and educational curricula. That said, at a time of accelerating technological and societal change, interdisciplinary approaches have gained importance and are drawing considerable attention in scholarly research, inter-company collaboration, and policy planning.

Data Science and Predictive Analytics

The first edition of the textbook Data Science and Predictive Analytics: Biomedical and Health Applications using R, authored by Ivo D. Dinov, was published by Springer in August 2018. The second edition was printed in 2023. The textbook covers core mathematical foundations, computational techniques, and artificial intelligence approaches used in data science research and applications. Using the statistical computing platform R and a broad range of biomedical case studies, the 23 chapters of the first edition provide explicit examples of importing, exporting, processing, modeling, visualizing, and interpreting large, multivariate, incomplete, heterogeneous, and longitudinal datasets (big data).

== Structure ==

=== First edition table of contents ===

The first edition of the Data Science and Predictive Analytics (DSPA) textbook is divided into the following 23 chapters, each progressively building on the previous content.

=== Second edition table of contents ===

The significantly reorganized second edition (2023) expands and modernizes the presented mathematical principles, computational methods, data science techniques, model-based machine learning, and model-free artificial intelligence algorithms. The 14 chapters of the new edition start with an introduction and progressively build foundational skills, culminating in biomedical applications of deep learning.
* Introduction
* Basic Visualization and Exploratory Data Analytics
* Linear Algebra, Matrix Computing, and Regression Modeling
* Linear and Nonlinear Dimensionality Reduction
* Supervised Classification
* Black Box Machine Learning Methods
* Qualitative Learning Methods—Text Mining, Natural Language Processing, and Apriori Association Rules Learning
* Unsupervised Clustering
* Model Performance Assessment, Validation, and Improvement
* Specialized Machine Learning Topics
* Variable Importance and Feature Selection
* Big Longitudinal Data Analysis
* Function Optimization
* Deep Learning, Neural Networks

== Reception ==

The materials in the Data Science and Predictive Analytics (DSPA) textbook have been peer-reviewed in the Journal of the American Statistical Association, the International Statistical Institute's ISI Review Journal, and the Journal of the American Library Association. Many scholarly publications reference the DSPA textbook. As of January 17, 2021, the electronic version of the first edition (ISBN 978-3-319-72347-1) was freely available on SpringerLink and had been downloaded over 6 million times. The textbook is available worldwide in print (hardcover and softcover) and electronic formats (PDF and EPub) in many college and university libraries and has been used for data science, computational statistics, and analytics classes at various institutions.

Algorithmic bias

Algorithmic bias describes a systematic and repeatable harmful tendency in a computerized sociotechnical system to create "unfair" outcomes, such as "privileging" one category over another in ways that may or may not differ from the intended function of the algorithm.

Bias can emerge from many factors, including intentionally biased design decisions, or unintended or unanticipated use of, or decisions about, the way data is coded, collected, selected, or used to train the algorithm. For example, algorithmic bias has been observed in search engine results and social media platforms. This bias can have impacts ranging from privacy violations to reinforcing social biases of race, gender, sexuality, and ethnicity. The study of algorithmic bias is most concerned with algorithms that reflect "systematic and unfair" discrimination. This bias has only recently been addressed in legal frameworks, such as the European Union's General Data Protection Regulation (enforced in 2018) and the Artificial Intelligence Act (proposed in 2021 and adopted in 2024).

As algorithms expand their ability to organize society, politics, institutions, and behavior, sociologists have become concerned with the ways in which unanticipated output and manipulation of data can impact the physical world. Because algorithms are often considered neutral and unbiased, they can inaccurately project greater authority than human expertise (in part due to the psychological phenomenon of automation bias), and in some cases reliance on algorithms can displace human responsibility for their outcomes, without last mile thinking. Bias can enter algorithmic systems as a result of pre-existing cultural, social, or institutional expectations; through how features and labels are chosen; because of technical limitations of their design; or by being used in unanticipated contexts or by audiences not considered in the software's initial design.
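Because the definition is framed in terms of outcomes that privilege one category over another, a common first screen compares how often each group receives the favorable outcome. The Python sketch below is a minimal illustration under simplifying assumptions: decisions arrive as hypothetical (group, outcome) pairs, the helper names are invented for this example, and the 0.8 threshold only echoes the "four-fifths" rule of thumb from US employment-discrimination guidance. A low ratio is a prompt for closer analysis, not proof of bias, since different rates can have legitimate causes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of favorable outcomes per group, from (group, outcome) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    favorable: Dict[str, int] = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        if outcome:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest group rate divided by highest group rate (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical loan decisions: (applicant group, approved?)
decisions = ([("A", True)] * 8 + [("A", False)] * 2
             + [("B", True)] * 5 + [("B", False)] * 5)
rates = selection_rates(decisions)
ratio = impact_ratio(rates)
flagged = ratio < 0.8  # illustrative four-fifths threshold, an assumption
print(rates, round(ratio, 3), flagged)
```

With these made-up decisions, group A is approved 80% of the time and group B 50%, giving a ratio of 0.625, below the illustrative threshold. Checking whether nearly identical applicants are treated differently on unrelated criteria requires further analysis beyond this aggregate rate comparison.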
Algorithmic bias has been cited in cases ranging from election outcomes to the spread of online hate speech. It has also arisen in criminal justice, healthcare, and hiring, compounding existing racial, socioeconomic, and gender biases. The relative inability of facial recognition technology to accurately identify darker-skinned faces, an issue stemming from imbalanced datasets, has been linked to multiple wrongful arrests of Black men. Problems in understanding, researching, and discovering algorithmic bias persist because algorithms are often proprietary and treated as trade secrets. Even when full transparency is provided, the complexity of certain algorithms poses a barrier to understanding how they function. Furthermore, algorithms may change, or respond to input or output in ways that cannot be anticipated or easily reproduced for analysis. In many cases, even within a single website or application, there is no single "algorithm" to examine, but a network of many interrelated programs and data inputs, which may differ even between users of the same service. A 2021 survey identified multiple forms of algorithmic bias, including historical, representation, and measurement biases, each of which can contribute to unfair outcomes.

== Definitions ==

Algorithms are difficult to define, but may be generally understood as lists of instructions that determine how programs read, collect, process, and analyze data to generate a usable output. For a rigorous technical introduction, see Algorithms. Advances in computer hardware and software have increased the capability to process, store, and transmit data. This in turn has made the design and adoption of technologies such as machine learning and artificial intelligence technically and commercially feasible. By analyzing and processing data, algorithms form the backbone of search engines, social media websites, recommendation engines, online retail, online advertising, and more.
Contemporary social scientists are concerned with algorithmic processes embedded in hardware and software applications because of their political and social impact, and they question the underlying assumption of an algorithm's neutrality. The term algorithmic bias describes systematic and repeatable errors that create unfair outcomes, such as privileging one arbitrary group of users over others. For example, a credit score algorithm may deny a loan without being unfair if it consistently weighs relevant financial criteria. If the algorithm recommends loans to one group of users but denies loans to another set of nearly identical users based on unrelated criteria, and if this behavior can be repeated across multiple occurrences, the algorithm can be described as biased. This bias may be intentional or unintentional (for example, it can come from biased data obtained from a worker who previously did the job the algorithm will now do).

== Methods ==

Bias can be introduced to an algorithm in several ways. During the assembly of a dataset, data may be collected, digitized, adapted, and entered into a database according to human-designed cataloging criteria. Next, programmers assign priorities, or hierarchies, for how a program assesses and sorts that data. This requires human decisions about how data is categorized, and which data is included or discarded. Some algorithms collect their own data based on human-selected criteria, which can also reflect the bias of human designers. Other algorithms may reinforce stereotypes and preferences as they process and display "relevant" data for human users, for example by selecting information based on the previous choices of a similar user or group of users.

Beyond assembling and processing data, bias can emerge as a result of design.
For example, algorithms that determine the allocation of resources or scrutiny (such as school placements) may inadvertently discriminate against a category when determining risk based on similar users (as in credit scores). Meanwhile, recommendation engines that work by associating users with similar users, or that make use of inferred marketing traits, might rely on inaccurate associations that reflect broad ethnic, gender, socio-economic, or racial stereotypes. Another example comes from the criteria that determine what is included in and excluded from results. These criteria can produce unanticipated search results, as with flight-recommendation software that omits flights that do not follow the sponsoring airline's flight paths. Algorithms may also display an uncertainty bias, offering more confident assessments when larger data sets are available. This can skew algorithmic processes toward results that more closely correspond to larger samples, which may disregard data from underrepresented populations.

== History ==

=== Early critiques ===

The earliest computer programs were designed to mimic human reasoning and deductions, and were deemed to be functioning when they successfully and consistently reproduced that human logic. In his 1976 book Computer Power and Human Reason, artificial intelligence pioneer Joseph Weizenbaum suggested that bias could arise both from the data used in a program and from the way a program is coded. Weizenbaum wrote that programs are a sequence of rules created by humans for a computer to follow. By following those rules consistently, such programs "embody law," that is, enforce a specific way to solve problems. The rules a computer follows are based on a programmer's assumptions about how these problems might be solved. That means the code could incorporate the programmer's imagination of how the world works, including their biases and expectations.
While a computer program can incorporate bias in this way, Weizenbaum also noted that any data fed to a machine additionally reflects "human decision making processes" as data is being selected. Finally, he noted that machines might also transfer good information with unintended consequences if users are unclear about how to interpret the results. Weizenbaum warned against trusting decisions made by computer programs that a user doesn't understand, comparing such faith to a tourist who finds his way to a hotel room exclusively by turning left or right on a coin toss. Crucially, the tourist has no basis for understanding how or why he arrived at his destination, and a successful arrival does not mean the process is accurate or reliable. An early example of algorithmic bias resulted in as many as 60 women and ethnic minorities being denied entry to St. George's Hospital Medical School per year from 1982 to 1986, based on a new computer-guidance assessment system that denied entry to women and to men with "foreign-sounding names" based on historical trends in admissions. While many schools at the time employed similar biases in their selection process, St. George's was most notable for automating that bias through an algorithm, thus gaining the attention of people on a much

Automated dispensing cabinet

An automated dispensing cabinet (ADC), also called a unit-based cabinet (UBC), automated dispensing device (ADD), or automated dispensing machine (ADM)[1], is a computerized medicine cabinet for hospitals and healthcare settings. ADCs allow medications to be stored and dispensed near the point of care while controlling and tracking drug distribution.

== Overview ==

Hospital pharmacies have traditionally provided medications for patients by filling patient-specific cassettes of unit-dose medications, which were then delivered to the nursing unit and stored in medication cabinets or carts. ADCs, originally designed for hospital use, were introduced in hospitals in the 1980s and have facilitated the transition to alternative delivery models and more decentralized medication distribution systems.[2] Implementing automated dispensing cabinets as part of a decentralized or hybrid medication distribution system can improve patient safety and inventory accountability and can streamline certain billing processes. Beginning in the 2000s, however, the technology was deployed in other care settings where medication doses were stored onsite and higher-security methods were needed to control inventory, access, and the dispensing of each patient dose. Settings that now deploy ADCs include long-term care facilities, hospice, critical access hospitals, surgery centers, group homes, residential care facilities, rehab and psych environments, animal health, dental clinics, and nursing education simulation. These diverse care settings share a common need to safely store, account for, and dispense individual doses of medications, especially narcotics and high-value medications, at the point of care.[3] ADCs track user access and dispensed medications, and their use can improve control over medication inventory. The real-time inventory reports generated by many cabinets can simplify the filling process and help the pharmacy track expired drugs.
Furthermore, by restricting individual drugs, such as high-risk medications and controlled substances, to unique drawers within the cabinet, inventory management, patient safety, and medication security can be improved. Automated dispensing cabinets allow the pharmacy department to profile physician orders before they are dispensed.[4] ADCs can also enable providers to record medication charges upon dispensing, reducing the billing paperwork the pharmacy is responsible for. In addition, nurses can record returned medications using the cabinets' computers, enabling direct credits to patients' accounts. Because automated cabinets can be located on the nursing unit floor, nurses have faster access to a patient's medications, and shorter waiting times can improve patient comfort and care.[5]

== Role of automated dispensing in healthcare ==

Automated dispensing is a pharmacy practice in which a device dispenses medications and fills prescriptions. ADCs, which can handle many different medications, are available from a number of manufacturers, such as BD, ARxIUM, and Omnicell. Although the pharmacy community has used automation technology since the 1980s, companies continue to improve ADCs to meet changing needs and health standards in the industry. Implementing an automated product in a healthcare facility can serve several goals. ADC technology such as barcoding can improve patient safety. Anesthesia ADCs in operating rooms and perioperative areas may include label printing to prevent mix-ups, such as errors between morphine and hydromorphone, two opioid analgesics that are frequently confused. These systems also communicate with the pharmacy and its information management system to track medications removed and support inventory replenishment.
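The tracking role described above can be illustrated with a toy data model. The Python sketch below is hypothetical and does not reflect any manufacturer's software: each drawer holds a single drug (mirroring the practice of restricting individual drugs to unique drawers), every removal is logged against a user and a patient, drawers at or below an assumed par level are reported for pharmacy replenishment, and a physical count is compared with the expected count to surface discrepancies. All class, method, and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Drawer:
    drug: str
    count: int       # expected on-hand quantity
    par_level: int   # hypothetical restock threshold set by the pharmacy


@dataclass
class Cabinet:
    drawers: Dict[str, Drawer]
    # Audit trail: (time, user, drawer id, drug, patient, quantity)
    log: List[Tuple[datetime, str, str, str, str, int]] = field(default_factory=list)

    def dispense(self, user: str, drawer_id: str, patient: str,
                 qty: int, when: datetime) -> None:
        """Remove doses and record who took them and for which patient."""
        drawer = self.drawers[drawer_id]
        if qty <= 0 or qty > drawer.count:
            raise ValueError(f"cannot dispense {qty} from {drawer_id}")
        drawer.count -= qty
        self.log.append((when, user, drawer_id, drawer.drug, patient, qty))

    def restock_report(self) -> List[str]:
        """Drawer ids at or below par level, for pharmacy replenishment."""
        return sorted(d_id for d_id, d in self.drawers.items() if d.count <= d.par_level)

    def discrepancies(self, physical: Dict[str, int]) -> Dict[str, int]:
        """Physical count minus expected count, for drawers that disagree."""
        return {d_id: n - self.drawers[d_id].count
                for d_id, n in physical.items() if n != self.drawers[d_id].count}


cabinet = Cabinet({
    "A1": Drawer("morphine 2 mg", count=10, par_level=4),
    "B2": Drawer("hydromorphone 1 mg", count=5, par_level=4),
})
cabinet.dispense("nurse_1", "B2", "patient_7", 1, datetime(2024, 1, 1, 8, 0))
print(cabinet.restock_report())
print(cabinet.discrepancies({"A1": 9, "B2": 4}))
```

After one dose is removed from drawer B2, its count reaches the assumed par level, so it appears in the restock report, and a physical count of 9 in drawer A1 is reported as a discrepancy of -1.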
== Key features ==

ADCs resemble automated teller machines, and specific technologies such as barcode scanning and clinical decision support can improve medication safety. Some have metal locking drawers for added security, and some have automated single-dose dispensing, which removes the need for a blind count each time a controlled substance is accessed. Over the years, ADCs have been adapted to facilitate compliance with emerging regulatory requirements, such as pharmacy review of medication orders, and with safe-practice recommendations. ADCs incorporate advanced software and electronic interfaces to synthesize high-risk steps in the medication use process. These unit-based medication repositories provide computer-controlled storage, dispensing, tracking, and documentation of medication distribution in the resident care unit. Because automated dispensing cabinets are located at the point of care on the resident care unit rather than in the pharmacy, they are considered "decentralized" medication distribution systems. Tracking of the stocking and distribution process can occur by interfacing the unit with a central pharmacy computer. These cabinets can also be interfaced with other external databases, such as resident profiles, the facility's admission/discharge/transfer system, and billing systems. Most ADC providers offer scalable systems, since several important factors vary widely by facility, such as budget, physical room size, patient population and demographics, and type of healthcare facility.

Kernel density estimation

In statistics, kernel density estimation (KDE) is the application of kernel smoothing to probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights. KDE addresses a fundamental data smoothing problem in which inferences about the population are made from a finite data sample. In some fields, such as signal processing and econometrics, it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. A well-known application of kernel density estimation is estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy.

== Definition ==

Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ be independent and identically distributed samples drawn from some univariate distribution with an unknown density $f$ at any given point $x$. We are interested in estimating the shape of this function $f$. Its kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

where $K$ is the kernel, a non-negative function, and $h > 0$ is a smoothing parameter called the bandwidth or simply width. A kernel with subscript $h$ is called the scaled kernel and is defined as $K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right)$. Intuitively, one wants to choose $h$ as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below. A range of kernel functions is commonly used: uniform, triangular, biweight, triweight, Epanechnikov (parabolic), normal, and others.
The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means $K(x) = \phi(x)$, where $\phi$ is the standard normal density function. The kernel density estimator then becomes

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h\sqrt{2\pi}} \exp\!\left(\frac{-(x - x_i)^2}{2h^2}\right),$$

where the bandwidth $h$ plays the role of the standard deviation of each normal component centered on a data point $x_i$.

The construction of a kernel density estimate has interpretations in fields outside density estimation. For example, in thermodynamics, it is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point location $x_i$. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g., diffusion maps).

== Example ==

Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below, based on these 6 data points, illustrates this relationship:

For the histogram, the horizontal axis is first divided into sub-intervals, or bins, that cover the range of the data; in this case, six bins, each of width 2. Whenever a data point falls inside a bin, a box of height 1/12 is placed there; this height is 1/(n × bin width) with n = 6 data points, so that the total area of the histogram is 1. If more than one data point falls inside the same bin, the boxes are stacked on top of each other. For the kernel density estimate, normal kernels with a standard deviation of 1.5 (indicated by the red dashed lines) are placed on each of the data points $x_i$. The kernels are summed to make the kernel density estimate (solid blue curve).
The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables.

== Bandwidth selection ==

The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution (plotted as the blue spikes in the rug plot on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is undersmoothed, since it contains too many spurious data artifacts arising from using a bandwidth $h = 0.05$, which is too small. The green curve is oversmoothed, since using the bandwidth $h = 2$ obscures much of the underlying structure. The black curve, with a bandwidth of $h = 0.337$, is considered to be optimally smoothed, since its density estimate is close to the true density. An extreme situation is encountered in the limit $h \to 0$ (no smoothing), where the estimate becomes a sum of n delta functions, each of weight 1/n, centered at the coordinates of the analyzed samples. In the other extreme, the limit $h \to \infty$, the estimate retains the shape of the used kernel, centered on the mean of the samples (completely smooth).
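To make the bandwidth comparison concrete, the Python sketch below draws a simulated standard normal sample and approximates, by a midpoint-rule sum, the integrated squared error between the normal-kernel estimate and the true N(0, 1) density for the three bandwidths quoted above. The sample size of 100, the random seed and the integration grid are assumptions of this sketch, not details given in the text; with another sample the numbers change, though the undersmoothed and oversmoothed bandwidths would typically both fit worse than the intermediate one.

```python
import math
import random
from typing import Dict, Sequence

def gaussian_kde(x: float, samples: Sequence[float], h: float) -> float:
    """Normal-kernel density estimate at x with bandwidth h."""
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return sum(c * math.exp(-((x - xi) ** 2) / (2.0 * h * h))
               for xi in samples) / len(samples)

def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrated_squared_error(samples: Sequence[float], h: float,
                             lo: float = -6.0, hi: float = 6.0,
                             step: float = 0.01) -> float:
    """Midpoint-rule approximation of the integral of (f_hat_h - phi)^2 over [lo, hi]."""
    n_steps = int(round((hi - lo) / step))
    total = 0.0
    for k in range(n_steps):
        x = lo + (k + 0.5) * step
        total += (gaussian_kde(x, samples, h) - std_normal_pdf(x)) ** 2
    return total * step

rng = random.Random(0)                              # assumed seed, for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # assumed sample size n = 100
errors: Dict[float, float] = {h: integrated_squared_error(sample, h)
                              for h in (0.05, 0.337, 2.0)}
for h, err in errors.items():
    print(f"h = {h:<5}  ISE = {err:.4f}")
```

The small bandwidth is penalised mainly through variance (spiky bumps around each sample), the large one mainly through bias (a flattened, widened curve).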
The most common optimality criterion used to select this parameter is the expected $L_2$ risk function, also termed the mean integrated squared error:

$$\operatorname{MISE}(h) = \operatorname{E}\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 \, dx\right].$$

Under weak assumptions on $f$ and $K$ (where $f$ is the real density function, which is generally unknown),

$$\operatorname{MISE}(h) = \operatorname{AMISE}(h) + o\left((nh)^{-1} + h^4\right),$$

where $o$ is the little-o notation and $n$ is the sample size (as above). The AMISE is the asymptotic MISE, i.e. the two leading terms,

$$\operatorname{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4} m_2(K)^2 h^4 R(f''),$$

where $R(g) = \int g(x)^2 \, dx$ for a function $g$, $m_2(K) = \int x^2 K(x) \, dx$, $f''$ is the second derivative of $f$, and $K$ is the kernel. The minimum of the AMISE is found by setting its derivative with respect to $h$ to zero:

$$\frac{\partial}{\partial h}\operatorname{AMISE}(h) = -\frac{R(K)}{nh^2} + m_2(K)^2 h^3 R(f'') = 0.$$

Multiplying by $h^2$ gives $m_2(K)^2 h^5 R(f'') = R(K)/n$, and solving for $h$ yields

$$h_{\operatorname{AMISE}} = \frac{R(K)^{1/5}}{m_2(K)^{2/5} R(f'')^{1/5}} n^{-1/5} = C n^{-1/5}.$$

Neither the AMISE nor the $h_{\operatorname{AMISE}}$ formula can be used directly, since they involve the unknown density function $f$ or its second derivative $f''$. To overcome that difficulty, a variety of automatic, data-based methods have been developed to select the bandwidth.
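The closed form for $h_{\operatorname{AMISE}}$ can be checked numerically. The Python sketch below assumes a normal kernel, for which $R(K) = 1/(2\sqrt{\pi})$ and $m_2(K) = 1$, and, purely for illustration, a normal true density $f$ with standard deviation $\sigma$, for which $R(f'') = 3/(8\sqrt{\pi}\,\sigma^5)$. In practice $f$ is unknown, which is exactly why this computation is not available on real data. A golden-section search (a generic one-dimensional minimiser, chosen here only for simplicity) should land on the same $h$ as the formula, and under these assumptions the formula reduces to $(4\sigma^5/(3n))^{1/5}$.

```python
import math
from typing import Callable

def amise(h: float, n: int, r_k: float, m2_k: float, r_f2: float) -> float:
    """AMISE(h) = R(K)/(n h) + (1/4) m2(K)^2 h^4 R(f'')."""
    return r_k / (n * h) + 0.25 * m2_k ** 2 * h ** 4 * r_f2

def h_amise(n: int, r_k: float, m2_k: float, r_f2: float) -> float:
    """Closed-form minimiser R(K)^(1/5) / (m2(K)^(2/5) R(f'')^(1/5)) * n^(-1/5)."""
    return r_k ** 0.2 / (m2_k ** 0.4 * r_f2 ** 0.2) * n ** -0.2

def golden_section_argmin(f: Callable[[float], float], lo: float, hi: float,
                          tol: float = 1e-9) -> float:
    """Locate the minimiser of a unimodal function f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):   # minimiser lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:             # minimiser lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0

# Assumptions for illustration: normal kernel, normal f with sigma = 1, n = 100.
n, sigma = 100, 1.0
r_k = 1.0 / (2.0 * math.sqrt(math.pi))                # R(K), normal kernel
m2_k = 1.0                                            # m_2(K), normal kernel
r_f2 = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)  # R(f'') for N(0, sigma^2)

h_formula = h_amise(n, r_k, m2_k, r_f2)
h_search = golden_section_argmin(lambda h: amise(h, n, r_k, m2_k, r_f2), 1e-3, 5.0)
h_rule = (4.0 * sigma ** 5 / (3.0 * n)) ** 0.2
print(h_formula, h_search, h_rule)
```

Changing `n` in the sketch shows the $n^{-1/5}$ scaling: a 32-fold larger sample halves the optimal bandwidth.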
Several review studies have been undertaken to compare their efficacies, with the general consensus that the plug-in selectors and cross-validation selectors are the most useful over a wide range of data sets.

Substituting any bandwidth $h$ which has the same asymptotic order $n^{-1/5}$ as $h_{\operatorname{AMISE}}$ into the AMISE gives $\operatorname{AMISE}(h) = O(n^{-4/5})$, since both of its terms then scale as $n^{-4/5}$; here $O$ is the big O notation. It can be shown that, under weak assumptions, no non-parametric estimator can converge at a faster rate than the kernel estimator. Note that the $n^{-4/5}$ rate is slower than the typical $n^{-1}$ convergence rate of parametric methods.

If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation.

Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult.

=== A rule-of-thumb bandwidth estimator ===

If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for $h$ (that is, the bandwidth that minimises the mean integrated squared error) is

$$h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\,n^{-1/5},$$

where $\hat{\sigma}$ is the standard deviation of the samples. A value of $h$ is considered more robust when it improves the fit for long-tailed and skewed distributions or for bimodal mixture distributions. This is often done empirically by replacing the standard deviation $\hat{\sigma}$ by the parameter $A$ below:

$$A = \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right)$$

where IQR is the