Trie is an ordered tree data structure which stores words in it so that users can search a word or a prefix quickly. Time complexity for adding a word to a Trie is $O(l)$, and for searching a word or a prefix is $O(l)$ too ($l$ is the length of the word or the prefix being added or searched). So it’s why we prefer Trie rather than Set and Hash Table in some situations, such as Dynamic Set, Associative Array and a predictive text or autocomplete Dictionary.

Definition

The Trie is a tree where each node represents a single word or a prefix.

The root represents an empty string (“”), the nodes that are direct children of the root represent prefixes of length $1$, the nodes that are $2$ edges of distance from the root represent prefixes of length $2$, the nodes that are $3$ edges of distance from the root represent prefixes of length $3$ and so on. In other words, a node that is $k$ edges of distance of the root has an associated prefix of length $k$.

Let $v$ and $w$ be two nodes of the Trie, and assume that $v$ is the parent of $w$, then $v$ must have an associated prefix of $w$.

The figure below shows a Trie with the words “TREE”, “TRIE”, “ALGO”, “ASSOC”, “ALL”, and “ALSO.”

Trie

LeetCode Problem

211. Add and Search Word - Data structure design

Design a data structure that supports the following two operations: void addWord(word) bool search(word) search(word) can search a literal word or a regular expression string containing only letters “a-z” or “.”. A “.” means it can represent any one letter.

You may assume that all words are consist of lowercase letters a-z.

Solution

Since this problem need us to find whether that word actually exists or not, so we shouldn’t treat that word as a prefix of another word. Implementing a variant Trie with a flag representing it’s an end of a word can resolve that problem discussed before. This is the data structure called TrieNode:

classTrieNode{ privatestaticfinalint SIZE = 26; private TrieNode[] edges; // flag to represent whether this is an end of a word or not. privateboolean isWord;

publicTrieNode(){ edges = new TrieNode[SIZE]; isWord = false; }

publicclassWordDictionary{ private TrieNode head = new TrieNode();

// Adds a word into the data structure. publicvoidaddWord(String word){ TrieNode current = head; for (int i = 0; i < word.length(); i++) { char ch = word.charAt(i); if (current.has(ch)) { current = current.next(ch); } else { current = current.add(ch); }

} current.setWord(); }

// Returns if the word is in the data structure. A word could // contain the dot character '.' to represent any one letter. publicbooleansearch(String word){ return search(head, word); }

privatebooleansearch(TrieNode node, String word){ if (word.isEmpty()) { return node.isWord(); }

I am a Ph.D. student in Computer Science and Engineering at The Ohio State University (OSU), advised by Dr. Xiaoyi Lu. My professional interests involve high-performance interconnects and protocols, erasure coding, and distributed systems. In my ongoing and recent projects, which are mainly developed in C++ and Java, I focus on implementing RDMA (Remote Direct Memory Access)-based communication protocols, optimizing I/O performance by exploiting RDMA for data transmission, enabling distributed systems to leverage heterogeneous hardware-optimized erasure coders in parallel, and designing tripartite graph based EC paradigm to gain better overlapping and parallelism on HPC clusters. In the meantime, I have participated in characterizing deep learning over big data stacks recently, and have understandings of deep learning training on distributed systems and modern data centers. Prior to graduate school, I worked as a Software Engineer at MiningLamp and Weibo, China. I finished my Bachelor’s degree in Tianjin University, China.

This project aims to implement and compare variant machine learning techniques for predicting house prices. It is an expanded project from KaggleHouse Prices: Advanced Regression Techniques, such that the project is going to use the same dataset provided under this Kaggle competition. The machine learning techniques in this project include Linear Regression, Random Forest, Adaptive Boosting, Support Vector Regression, Gradient Boosted Decision Tree, K Nearest Neighbors, and Neural Network.

Trie is an ordered tree data structure which stores words in it so that users can search a word or a prefix quickly. Time complexity for adding a word to a Trie is $O(l)$, and for searching a word or a prefix is $O(l)$ too ($l$ is the length of the word or the prefix being added or searched). So it’s why we prefer Trie rather than Set and Hash Table in some situations, such as Dynamic Set, Associative Array and a predictive text or autocomplete Dictionary.

Though hash table is one of the most efficient data structures, it has some weaknesses.

For any hash function $\mathbf{h}$, a set of keys exist that can cause the average access time of a hash table to skyrocket. I.e. all keys from $\{k\in\mathcal{U} : \mathbf{h}(k) = i\}$ for some slot $i$.

Approach:

Choose the hash function at random, independently of the keys.

Randomized QuickSort is an efficient sort algorithm requiring only $\Theta(n\log n)$ expected time on any input.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

function RANDOMIZED-QUICKSORT(Array, low, high) if low < high then partition := RANDOMIZED-PARTITION(Array, low, high) RANDOMIZED-QUICKSORT(Array, low, partition - 1) RANDOMIZED-QUICKSORT(Array, partition + 1, high)

function RANDOMIZED-PARTITION(Array, low, high) pivotIndex := RANDOMIZED-CHOOSEPIVOT(Array, low, high) pivotValue := Array[pivotIndex] swap Array[high] and Array[pivotIndex] storeIndex := low for i from low to high do if Array[i] < pivotValue then swap Array[i] and Array[storeIndex] storeIndex := storeIndex + 1 swap Array[high] and Array[storeIndex] return storeIndex

Randomized Select

Randomized Select is an efficient algorithm to find $i$th smallest element of Array[low…high] with only linear expected time on any input.

1 2 3 4 5 6 7 8 9 10 11

function RANDOMIZED-SELECT(Array, low, high, i) if low = high then return Array[low] partition := RANDOMIZED-PARTITION(Array, low, high) rank := partition - low + 1 if i = rank then return Array[partition] else if i < rank then return RANDOMIZED-SELECT(Array, low, partition - 1, i) else return RANDOMIZED-SELECT(Array, partition + 1, high, i - rank)