Hi! I am Ainesh Bakshi, a Research Fellow at Rutgers, New Brunswick. I finished my undergrad at Rutgers in January 2016. My research interests lie in the intersection of Algorithms and Machine Learning. Currently, I am working on Theoretical Machine Learning and Approximation Algorithms.
Previously, I interned in the Search and Discoverabiltiy Group at Bloomberg and the Machine Learning and Optimization group at Microsoft Research. I have also been involved with research in External Memory Algorithms and their application to File Systems.

## Manuscripts

• A. Bakshi, P. Awasthi, "Efficient Clustering Algorithms for Computing Better Local Optima" , Manuscript.
• A. Bakshi, "Polynomial-time Algorithm for {1, 2}-Instances below 2-perturbation resilience" , Manuscript.
• ## Publications

• A. Conway, A. Bakshi, Y. Jiao, W. Jannen, Y. Zhan, J. Yuan, M. Bender, R. Johnson, B. Kuzmaul, D. Porter, M. Farach-Colton,"File Systems fated for Senescence? Nonsense, Says Science!", To appear at USENIX Conference on File and Storage Technologies (FAST) 2017.
• R. Aggarwal, A. Bakshi, “Non Dominated Sorting Genetic Algorithm for Chance Constrained Supplier Selection Model with Volume Discounts” ACIIDS, Lecture Notes in Computer Science pp. 465–474, Apr 2014.
• A. Bakshi, K. Goel and Raunaq. Vohra, “A Novel Feature Selection and Extraction Technique for Classification” ICFHR 2014, pp. 104-109.
• R. Sant, N. Kulkarni, A. Bakshi, K. Goel, and S. Kapur, “Autonomous Robot Navigation: Path Planning on a Detail-Preserving Reduced-Complexity Representation of 3D Point Clouds” ICVS 2013, pp. 173–182, Jun 2013.
• ## Posters

• A. Bakshi, K. Goel and Raunaq. Vohra, “A Novel Feature Selection and Extraction Technique for Classification” SMC 2014, pp. 4033–4034, Oct 2014.
• A. Bakshi, Kostas Bekris, "Human Robot Interaction: Machine Vision and End Effector Control" 10th Annual Aresty Undergraduate Research Symposium, Rutgers.
• ## Research

#### Stable Clustering

Clustering is one of the most common problems in Machine Learning. Clustering under center based objectives ($k$-means, $k$-median, $k$-center) is NP-Hard. However, in practice, the $k$-means heuristic works reasonably well, and converges quickly. In order to better model the data sets we observe in the real world, we must move beyond worst case analysis. Therefore, we study various natural notions of "stability", including Bilu and Linial stability, where perturbing any set of pairwise distances by a factor of at most $\alpha$, the optimal clustering does not change. Assuming that our data is $\alpha$ stable provides structure that we can exploit to come up with polynomial-time algorithms to find the optimal clustering. Awasthi et. al. presented the first polynomial-time algorithm for clustering $3$-stable instances. The best known result is a polynomial-time algorithm for $2$-stable instances by Makarychev et. al. The only known non trivial lower bound is for the $k$-center objective which states that any instance that is $2-\epsilon$ stable is NP-Hard. There is no known lower bound for stable instances with $k$-means and $k$-median objective.
The best result we have so far shows that no existing reduction technique can be used to prove a lower bound for $k$-means and $k$-median objectives. Such reductions reduce NP-Complete problems like $1-3$ SAT, $3$-Cover or Perfect Dominating Set (PDS) to a clustering instance where all pair-wise distances are either 1 or 2. In particular, Ben-David and Reyzin (2014) reduce the perfect dominating set promise problem to a clustering instance with all pair-wise distances either $\frac{1}{2}$ or $1$ to hardness of $\alpha$-center stability. Since, scaling the distances by any factor preserves perturbation resilience, their instance is equivalent to one where all pair-wise distances are $1$ or $2$ (scaling by 2). Balcan, Haghtalab and White (2016) also reduce PDS to a clustering instance with pair-wise distances either $1$ or $2$ to show hardness of $(2-\epsilon)$-perturbation resilience under the k-center objective. I came up with a polynomial time algorithm that solves such instances for $\alpha \geq 1 + \epsilon$ , for any $\epsilon > 0$, given $k$-means or $k$-median objective. Therefore, we can conclude that such instances are easy, and we would need new reduction techniques to prove hardness results for $k$-means and $k$-median.
With Professor Awasthi, I also formalized the problem of efficiently computing good local optima for a non-convex optimization problem. We study this problem in the context of clustering and design polynomial-time algorithms to achieve the desired output. We define a good locally optimal solution as a clustering where each point is closer to the center of its optimal cluster than to any point outside the optimal cluster. Based on this definition, we prove that if a set of $n$ points is guaranteed to have a good locally optimal $k$-clustering, there exists a polynomial-time algorithm that outputs a set of $k$ clusters such that the resulting clustering is locally good.
Further, we also address the common issue in clustering problems that concerns the choice of $k$, the number of clusters. In many settings it is unclear apriori what the right number of clusters should be, whereas it is easier to prescribe a minimum cluster size. We prove that there exists a polynomial-time algorithm which when given a set of $n$ points such that there exists a good locally optimal clustering in which the size of each cluster is at least $r(n)$, outputs good locally optimal clustering where each cluster has cardinality at least $r(n)$.

#### External Memory Algorithms

External-memory data structures are analysed in terms of I/O complexity. $B^\epsilon$-trees have insert, point query and range query complexity of $\frac{\log_B{N}}{\epsilon B^{1-\epsilon}}$, $\frac{\log_B{N}}{\epsilon}$ and $\frac{\log_B{N}}{\epsilon} + \frac{k}{B}$ respectively, where $N$ is the total number of items in the tree, $B$ is the nodesize and $k$ is the range of the data we query over. In contrast, $B$-trees have $\log_B{N}$ insert and point query and $\log_B{N} + \frac{k}{B}$ range query complexity. A $B^\epsilon$-tree is a write optimised dictionary data structure, i.e. instead of writing entire blocks for small updates and inserts, it aggregates these operations to perform larger writes. Therefore, a larger nodesize ($B$) increases insert and range query performance without amplifying the number of writes, presenting a good theoretical candidate to mitigate aging in file systems.
As of now, most Unix based file systems used poor / no data structures and heuristics for space allocation and claim that aging ( read/write performance degradation over time ) is a solved problem. However, seek times on rotational disks have remained approximately the same and bandwidth has increased as a function of capacity, thus the theory predicts that fragmentation on disk should have an exacerbated effect on read performance (aging) over time. We created realistic workloads, such as a mailserver and running through the git history of large open-source projects that caused these heuristics to fail. For example, on EXT4 and ZFS, a few hundred git pull operations can reduce read performance by a factor of 2, as compared to a defragmented copy of the same file system, and performing a thousand pulls can reduce performance by up to $30$x. We created microbenchmarks to show that traditional file systems are very sensitive to file creation order. Creating files in random order can slow down file system reads by a factor of up to 8x which indicates that file-layout heuristics can fail easily. We also demonstrated that BetrFS, a file system based on $B^\epsilon$-tree avoids aging, which corroborates the theoretical performance guarantees of $B^\epsilon$-trees.

#### Query Reformulation

During the summer of '15 I interned in the Search and Discoverability group at Bloomberg L.P. The group primarily focuses on search within the Bloomberg terminal, which indexes a large amount of financial data, news, profiles and wikipedia articles. My project at Bloomberg was query reformulation, i.e. detecting ambiguous or vague queries and replacing or adding words to implicitly predict what the user might be searching for. I created a contextual language model for adding and substituting words using existing user query logs and generating n-grams for all the indexed documents. The contextual model was similar to the one presented by Wang & Zhai in the following paper:

X. Wang and C. Zhai "Mining Term Association Patterns from Search Logs for Effective Query Reformulation" In Proceedings of CIKM, pages 479-488, 2008.

I also contributed to generating click graphs, running hadoop clusters to aggregate training data and designing performance metrics for search result evaluation. In addition I evaluated the performance of the search engine by comparing the NDCG with and without the reformulation system. The reformulation system showed brilliant results that could also be used as suggested search queries. I integrated my service with the search engine and pushed the code to production, on all Bloomberg terminals across the world.

#### Kernel Approximation

During the summer of '14 I interned at at Microsoft Research, India, in the Machine Learning and Optimization group. I worked with Manik Varma on Non Linear Kernel approximations for Support Vector Machines. The most popular non-linear kernel is the Gaussian (RBF) Kernel, given its versatile applicability. However, the evaluation complexity is linear in the number of support vectors, which can be really large in real world datasets. I used a Taylor Series expansion to approximate the Additive Gaussian Kernel (similar to the RBF Kernel), giving tremendous speedup in evaluation time, bringing the complexity down to constant in the number of training examples. The algorithm performed up to 20% more accurately than other linear approximations like Efficient Additive Kernels ( Vedaldi et al. ) and Max Margin Additive Classifiers ( Maji et al ).

#### Human Robot Interaction

At Rutgers, I worked as a Aresty Research Assistant with Professor Kostas Bekris, who heads the Pracsys lab at Rutgers. The problem statement was to evaluate collision free trajectories that humans are accustomed to. The most efficient trajectory for a robotic arm doesn’t mimic motion that is natural and intuitive for humans. In a collaborative environment where safety and precision is essential, the robot must execute trajectories that are intuitive for humans. My project entailed mapping the environment, localizing the robot with respect to the environment, detecting pose, recognizing objects and obstacles and calculating optimal gripping configurations given parallel and vacuum grippers. The Pracsys team participated in the Amazon Picking Challenge and a video of the Robot in action can be found here. I presented a poster for the same at the Aresty Undergraduate Research Symposium.