5002 Data Stream Memo
Ideas
$\epsilon$-deficient synopsis
- Condition 1: there are no false negatives (every truly frequent item is reported)
- Condition 2: the estimated frequency differs from the true frequency by at most $\epsilon N$ (the error rate)
- Condition 3: items with true frequency less than $(s-\epsilon)N$ are classified as infrequent (even with maximum error they stay below the threshold)
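A quick numeric check of these conditions (the concrete numbers are my own, not from the lecture): with $N = 1000$, $s = 0.1$, and $\epsilon = 0.01$, every item with true frequency $\geq sN = 100$ must be reported, every reported estimate is off by at most $\epsilon N = 10$, and any item with true frequency below $(s-\epsilon)N = 90$ must be classified as infrequent.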
Kind
- Sticky Sampling Algorithm
- Lossy Counting Algorithm
- Space-Saving Algorithm
Sticky Sampling Algorithm
properties
- probabilistic (uses random sampling)
- confidence parameter $\delta$ (the guarantee holds with probability at least $1-\delta$)
- support threshold $s$ and error parameter $\epsilon$
bucket design
$t = \lceil \frac{1}{\epsilon} \ln(s^{-1}\delta^{-1}) \rceil$ (determines the size of each bucket)
1st bucket: items 1 ~ 2t, size = 2t, sampling rate r = 1
2nd bucket: items 2t+1 ~ 4t, size = 2t, r = 2
3rd bucket: items 4t+1 ~ 8t, size = 4t, r = 4
… (each later bucket doubles: size = 8t with r = 8, and so on)
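As a sanity check on these formulas (the parameter values are my own): with $\epsilon = 0.001$, $s = 0.01$, $\delta = 0.01$ we get $t = \lceil 1000 \ln(10000) \rceil = 9211$, so the 1st bucket covers items 1 ~ 18422 at $r = 1$ and the 2nd covers items 18423 ~ 36844 at $r = 2$.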
algorithm
![[Pasted image 20250516204827.png]]
- In the 1st bucket (r = 1), simply absorb every item into the entry table.
- When changing to the 2nd bucket, toss coins for each entry and diminish its counter (deleting entries that reach 0) to reduce the memory used; the expected number of entries drops from 2t to 2t / 2 = t.
- In the 2nd bucket, a coin toss decides whether a new item gets an entry (upper bound on newly added entries: 2t * 1/2 = t, on top of the ~t surviving entries).
- When changing to the 3rd bucket, entries are thinned again; the entries surviving from the 1st bucket drop to about 2t * 1/4 = t / 2.
- In the 3rd bucket, upper bound on newly added entries: 4t * 1/4 = t.
- Output: report every entry with $f + \epsilon N \geq sN$ (the stored count $f$ may undercount the true frequency by up to $\epsilon N$).
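A minimal Python sketch of these steps, assuming a plain dictionary as the entry table; the class and method names (`StickySampling`, `process`, `output`) are mine, not from the lecture:

```python
import math
import random

class StickySampling:
    """Sketch of Sticky Sampling: entry table maps element -> counter f."""

    def __init__(self, s, eps, delta):
        self.s, self.eps = s, eps
        # t = ceil(1/eps * ln(1/(s*delta)))
        self.t = math.ceil((1 / eps) * math.log(1 / (s * delta)))
        self.rate = 1                 # current sampling rate r
        self.boundary = 2 * self.t    # end of the current bucket (2t, 4t, 8t, ...)
        self.n = 0                    # stream items seen so far
        self.counts = {}              # element -> counter f

    def process(self, e):
        self.n += 1
        if self.n > self.boundary:    # entering the next bucket: double the rate
            self.rate *= 2
            self.boundary *= 2
            self._resample()          # expected to roughly halve the stored entries
        if e in self.counts:
            self.counts[e] += 1       # existing entries always count the item
        elif random.random() < 1 / self.rate:
            self.counts[e] = 1        # new entries are created with probability 1/r

    def _resample(self):
        # For each entry, toss an unbiased coin until heads, decrementing the
        # counter once per tails; drop entries whose counter reaches 0.
        for e in list(self.counts):
            while random.random() < 0.5:
                self.counts[e] -= 1
                if self.counts[e] == 0:
                    del self.counts[e]
                    break

    def output(self):
        # Report entries whose (possibly undercounted) frequency can reach sN.
        return [e for e, f in self.counts.items()
                if f + self.eps * self.n >= self.s * self.n]
```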
feature
- $\epsilon$-deficient synopsis with probability at least $1-\delta$
- at most $2t$ entries used in expectation
Lossy Counting Algorithm
properties
- uses fixed-size buckets (deterministic)
- support threshold $s$ and error parameter $\epsilon$
- bucket width $w = \lceil \frac{1}{\epsilon} \rceil$
- buckets are indexed starting from 1, so the current bucket id is $b_\text{current} = \lceil\frac{N}{w}\rceil$
algorithm
![[Pasted image 20250516210217.png]]
- In the 1st bucket, we just insert: a new element gets an entry $(e, f = 1, \Delta = b_\text{current} - 1)$; existing entries simply increment $f$.
- When changing bucket, remove every entry with $f + \Delta \leq b_\text{current}$.
- In the 2nd bucket, we insert the same way; newly created entries get the new $\Delta = b_\text{current} - 1$.
- At every later bucket boundary, we remove with the same rule…
- Output: report every entry with $f + \epsilon N \geq sN$ (the stored count $f$ may undercount the true frequency by up to $\epsilon N$).
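A minimal Python sketch of Lossy Counting under the same assumptions (dictionary entry table; the names `LossyCounting`, `process`, `output` are mine):

```python
import math

class LossyCounting:
    """Sketch of Lossy Counting: entry table maps element -> [f, delta]."""

    def __init__(self, s, eps):
        self.s, self.eps = s, eps
        self.w = math.ceil(1 / eps)   # bucket width
        self.n = 0                    # stream items seen so far
        self.entries = {}             # element -> [f, delta]

    def process(self, e):
        self.n += 1
        b_current = math.ceil(self.n / self.w)   # id of the current bucket
        if e in self.entries:
            self.entries[e][0] += 1
        else:
            # a new entry may have been missed in earlier buckets: delta = b_current - 1
            self.entries[e] = [1, b_current - 1]
        if self.n % self.w == 0:                 # bucket boundary: prune
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > b_current}

    def output(self):
        # f undercounts the true frequency by at most eps * n
        return [e for e, (f, d) in self.entries.items()
                if f + self.eps * self.n >= self.s * self.n]
```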
feature
- 100% $\epsilon$-deficient synopsis
- at most $\lceil \frac{1}{\epsilon} \log(\epsilon N) \rceil$ entries (grows with $N$, the amount of data read so far)
Space-Saving Algorithm
properties
- only a support threshold $s$ and a memory parameter $M$ (the maximum number of entries that can be stored in memory)
algorithm
![[Pasted image 20250516210935.png]]
- When the memory is full and a new element $e$ arrives without an entry, remove the entry with the smallest $p_e = f + \Delta$ and insert the newcomer as $(e, 1, p_e)$.
- Output: report every entry with $f + \Delta \geq sN$ (the estimate $f + \Delta$ may overcount the true frequency).
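A minimal Python sketch of Space-Saving; the linear scan to find the smallest $f + \Delta$ is for clarity only (a real implementation would keep a min-heap or similar structure), and the names are mine:

```python
class SpaceSaving:
    """Sketch of Space-Saving: at most M entries, element -> [f, delta]."""

    def __init__(self, s, M):
        self.s, self.M = s, M
        self.n = 0            # stream items seen so far
        self.entries = {}     # element -> [f, delta]

    def process(self, e):
        self.n += 1
        if e in self.entries:
            self.entries[e][0] += 1
        elif len(self.entries) < self.M:
            self.entries[e] = [1, 0]          # free slot: exact count so far
        else:
            # memory full: evict the entry with the smallest p = f + delta
            # and give the newcomer that p as its error bound
            victim = min(self.entries, key=lambda k: sum(self.entries[k]))
            p = sum(self.entries[victim])
            del self.entries[victim]
            self.entries[e] = [1, p]

    def output(self):
        # f + delta overcounts the true frequency; report possibly frequent items
        return [e for e, (f, d) in self.entries.items()
                if f + d >= self.s * self.n]
```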
feature
- maximum error rate: $E \leq 1/M$ (any count is overestimated by at most $N/M$)
- if $E \leq \epsilon$ (i.e. $M \geq 1/\epsilon$), an $\epsilon$-deficient synopsis is guaranteed
- memory consumption: at most $M$ entries