What’s New in InsightIDR: Q4 2020 in Review

Post Syndicated from Margaret Zonay original https://blog.rapid7.com/2020/12/18/whats-new-in-insightidr-q4-2020-in-review/

Throughout the year, we’ve provided roundups of what’s new in InsightIDR, our cloud-based SIEM tool (see the H1 recap post, and our most recent Q3 2020 recap post). As we near the end of 2020, we wanted to offer a closer look at some of the recent updates and releases in InsightIDR from Q4 2020.

Complete endpoint visibility with enhanced endpoint telemetry (EET)

With the addition of the enhanced endpoint telemetry (EET) add-on module, InsightIDR customers now have the ability to access all process start activity data (aka any events captured when an application, service, or other process starts on an endpoint) in InsightIDR’s log search. This data provides a full picture of endpoint activity, enabling customers to create custom detections, see the full scope of an attack, and effectively detect and respond to incidents. Read more about this new add-on in our blog here, and see our on-demand demo below.

Network Traffic Analysis: Insight Network Sensor for AWS now in general availability

In our last quarterly recap, we introduced our early access period for the Insight Network Sensor for AWS, and today we’re excited to announce its general availability. Now, all InsightIDR customers can deploy a network sensor on their AWS Virtual Private Cloud and configure it to communicate with InsightIDR. This new sensor generates the same data outputs as the existing Insight Network Sensor, and its ability to deploy in AWS cloud environments opens up a whole new way for customers to gain insight into what is happening within their cloud estates. For more details, check out the requirements here.

New Attacker Behavior Analytics (ABA) threats

Our threat intelligence and detection engineering (TIDE) team and SOC experts are constantly updating our detections as they discover new threats. Most recently, our team added 86 new Attacker Behavior Analytics (ABA) threats within InsightIDR. Each of these threats is a collection of three rules looking for one of 38,535 specific Indicators of Compromise (IoCs) known to be associated with a malicious actor’s various aliases.  

In total, we have 258 new rules: three for each of the 86 new threats. The new rule types for each threat are as follows:

  • Suspicious DNS Request – <Malicious Actor Name> Related Domain Observed
  • Suspicious Web Request – <Malicious Actor Name> Related Domain Observed
  • Suspicious Process – <Malicious Actor Name> Related Binary Executed

New InsightIDR detections for activity related to recent SolarWinds Orion attack: The Rapid7 Threat Detection & Response team has compared publicly available indicators against our existing detections, deployed new detections, and updated our existing detection rules as needed. We also published in-product queries so that customers can quickly determine whether activity related to the breaches has occurred within their environment. Rapid7 is closely monitoring the situation, and will continue to update our detections and guidance as more information becomes available. See our recent blog post for additional details.

Custom Parser editing

InsightIDR customers leveraging our Custom Parsing Tool can now edit fields in their pre-existing parsers. With this new addition, you can update the parser name, extract additional fields, and edit existing extracted fields. For detailed information on our Custom Parsing Tool capabilities, check out our help documentation here.

Record user-driven and automated activity with Audit Logging

Available to all InsightIDR customers, our new Audit Logging service is now in Open Preview. Audit logging enables you to track user driven and automated activity in InsightIDR and across Rapid7’s Insight Platform, so you can investigate who did what, when. Audit Logging will also help you fulfill compliance requirements if these details are requested by an external auditor. Learn more about the Audit Logging Open Preview in our help docs here, and see step-by-step instructions for how to turn it on here.

New event source integrations: Cybereason, Sophos Intercept X, and DivvyCloud by Rapid7

With our recent event source integrations with Cybereason and Sophos Intercept X, InsightIDR customers can spend less time jumping in and out of multiple endpoint protection tools and more time focusing on investigating and remediating attacks within InsightIDR.

  • Cybereason: Cybereason’s Endpoint Detection and Response (EDR) platform detects events that signal malicious operations (Malops), which can now be fed as an event source to InsightIDR. With this new integration, every time an alert fires in Cybereason, it will get relayed to InsightIDR. Read more in our recent blog post here.
  • Sophos Intercept X: Sophos Intercept X is an endpoint protection tool used to detect malware and viruses in your environment. InsightIDR features a Sophos Intercept X event source that you can configure to parse alert types as Virus Alert events. Check out our help documentation here.
  • DivvyCloud: This past spring, Rapid7 acquired DivvyCloud, a leader in Cloud Security Posture Management (CSPM) that provides real-time analysis and automated remediation for cloud and container technologies. Now, we’re excited to announce a custom log integration where cloud events from DivvyCloud can be sent to InsightIDR for analysis, investigations, reporting, and more. Check out our help documentation here.

Stay tuned for more!

As always, we’re continuing to work on exciting product enhancements and releases throughout the year. Keep an eye on our blog and release notes as we continue to highlight the latest in detection and response at Rapid7.

Not an InsightIDR customer? Start a free trial today.

Get Started

The GERB government is behaving “pro-Putin”: Bulgaria stalls asylum for a Navalny associate

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%B1%D1%8A%D0%BB%D0%B3%D0%B0%D1%80%D0%B8%D1%8F-%D0%B1%D0%B0%D0%B2%D0%B8-%D1%83%D0%B1%D0%B5%D0%B6%D0%B8%D1%89%D0%B5-%D0%B7%D0%B0-%D1%81%D1%8A%D1%80%D0%B0%D1%82%D0%BD%D0%B8%D0%BA-%D0%BD%D0%B0-%D0%BD.html

Friday, 18 December 2020


Evgeny Sergeevich Chupov, a 41-year-old English teacher and civic activist from Russia, has spent almost a year and a half trying to obtain political asylum in Bulgaria. Chupov is a volunteer from the team of Ivan Zhdanov, the director of Alexei Navalny's Anti-Corruption Foundation (FBK) who is known as Navalny's "right hand". On Navalny's team, Evgeny collected signatures for Zhdanov during the 2019 campaign for the Moscow City Duma. But on 28 May 2019 he was arrested by the police, beaten, and threatened by Alexey Maskunov, head of the Centre for Combating Extremism at Moscow's district Interior Ministry directorate. "In the Caucasus they would have killed you," the police chief told the activist. The formal pretext for the detention was that he had allegedly slashed the tyres of a car belonging to police officer Igor Shepel, one of Maskunov's subordinates.

The detention took place two days before he was due to come to Bulgaria for his traditional holiday, for which visas had been issued for the whole family, along with plane tickets. In Bulgaria, however, Chupov was warned that if he returned to Russia, another certain arrest and further repression by the authorities awaited him. He also received threats against his life via a mobile app from police chiefs in Moscow. For this reason the Russian decided to apply for political asylum in the country together with his family.

Despite the absurdity of the charges and the danger to the activist's life should he return to Russia, the State Agency for Refugees (SAR) has been stringing the Chupov family along for months; he and his wife have four children. Within the agency's leadership, the "Chupov" case is handled by a politically engaged figure: Ivan Milanov, a municipal councillor for GERB. He is known for having voted for the scandalous sale of land at a knock-down price in Bozhurishte, which led to the arrest and indictment of mayor Georgi Dimov by the prosecution in May 2019. The scandalous refusal of asylum was signed by SAR chairperson Petya Parvanova, a former caretaker interior minister.

Photo: Bivol

While world-famous outlets such as Bellingcat, CNN, Spiegel and The Insider Russia are being quoted around the globe for their investigation into the eight Federal Security Service (FSB) agents who poisoned Alexei Navalny, in Sofia one of the volunteers of the Russian opposition leader's "Progress Party", who took part in the campaign for the 2019 local elections in Moscow, is still waiting for political asylum.

And this is happening against the backdrop of the poisoning of Alexei Navalny, condemned by Foreign Minister Ekaterina Zaharieva, over which in October 2020 Bulgaria, along with the other EU member states, voted for additional sanctions against Moscow.

The reason Evgeny Chupov and his family have not been extradited to Russia is that NGOs such as the Voice of Bulgaria, the Bulgarian Red Cross (BRC), the Atlantic Council of Bulgaria and the Russian human rights organisation Memorial stood behind them. Bivol managed to speak with the activist's Moscow lawyer Vladimir Voronin and to visit Evgeny Chupov and his family, who gave an exclusive on-camera interview to our outlet and provided copies of all their documentation on the case.

"We've got you!"

On 28 May 2019 the Muscovite Evgeny Chupov took his children to kindergarten. On his way back he was detained and spent 12 hours in custody at the police precinct of Moscow's Voykovsky district. In an interview for Bivol he says he was detained "without any official explanation as to why".

Photo: Nikolay Marchenko, Bivol

The investigator showed him a report according to which he had damaged the tyres of a Kia Rio that, by coincidence, belonged to an officer of that very same Centre for Combating Extremism at the Interior Ministry of the Russian Federation.

The so-called Centre "E" of the Russian Interior Ministry has for years harassed Alexei Navalny's team with surveillance, tailing before and after protests, and the detention of activists across the country. The officer in question is police major Igor Shepel.

After Chupov was taken to a room without video surveillance, the officers demanded his smartphone. When he refused to hand it over without the proper record required by procedure, he was punched hard in the stomach.

He was left without a phone and without contact with his lawyer, who had been provided to him by Ivan Zhdanov, director of Alexei Navalny's Anti-Corruption Foundation (FBK).

Alexey Maskunov, head of the district anti-extremism department, began going through the messages and contacts on the activist's phone.

Russian police chief Timur Valiulin lost his post over undeclared properties in Bulgaria

"He told me: you're a volunteer at the campaign headquarters of Ivan Zhdanov, Navalny's candidate (in the summer 2019 elections for the Moscow City Duma, ed.). And now we've got you!", Evgeny recalls.

The police chief then also wrote down the numbers of the personal bank cards seized from him.

"He told me: Now we'll see who is financing you!"

"Among Russia's security services, such as the KGB (today the FSB, ed.), there is a view that since we as activists defend our homes, our trees and our parks, we must all be funded by the US State Department," Chupov says.

Maskunov also threatened him that the police would plant drugs in his apartment and that he would spend years behind bars:

"You know how we handle things with drugs, don't you?"

It is common practice for Russian police to frame opposition leaders, NGOs and journalists with "discovered drugs". In 2019 Bivol wrote about how drugs were planted on the journalist Ivan Golunov, which led to resignations after publications in world media prompted the intervention of President Vladimir Putin: see the article "The Meduza reporter in custody, a Moscow deputy mayor in a €20 million penthouse".

The Meduza reporter in custody, a Moscow deputy mayor in a €20 million penthouse

"I am certain that the threats against me are connected to my political activity, since I supported Ivan Zhdanov, the candidate for the Moscow City Duma who heads Alexei Navalny's Anti-Corruption Foundation. That NGO has for years investigated various cases of corruption, legal violations and other irregularities in Russia."

Besides volunteering during the 2019 elections, he is a founder and coordinator of the Moscow neighbourhood NGOs "Defence of Golovino", "Defence of Levoberezhny" and "Defence of the Aeroport district", which fight overdevelopment, tree felling and other violations by the municipality.

All this happened just two days before Evgeny Chupov and his family were due to go on holiday in Bulgaria. They come to our country every summer to spend their vacation with friends in Varna. Since Bulgaria is not in Schengen, the family's visas explain why, three months later, they did not seek political asylum elsewhere.

The lawyer: If he goes back, they will put him in jail!

Vladimir Voronin (photo: Twitter)

Bivol also contacted Evgeny Chupov's Moscow lawyer Vladimir Voronin, who has worked with Alexei Navalny's FBK for years, to clarify whether the activist and his family would be in danger if they were forced to return to Russia.

"It is an absolute reality that, for an offence which under the legislation of the Russian Federation is considered one of minor gravity, he really could be sent to pre-trial detention, and for quite a long period of time," Voronin told our outlet.

"Because they will most likely conclude that he has been hiding from the Russian authorities, the strictest measure of restraint, remand in custody, could be imposed on him."

"And after a stay in pre-trial detention nobody comes out healthier, nor any closer to their family. So of course I believe that if he returns he will be remanded in custody, and I believe that is the danger," Evgeny Chupov's defender said categorically.

The lawyer also explained what evidence the defence has of political persecution:

"Our evidence rests on the fact that Evgeny was active in his capacity as a volunteer."

"He handed out flyers, campaigned in parks, and was repeatedly detained and charged with administrative offences. Although under Russian law that is not a very severe form of liability, he was then suddenly detained after taking his child to kindergarten."

He also expressed his outrage that for a very long time he was not allowed to see his client in his capacity as his lawyer. "They detained him in the morning. I only learned of it around noon. And I was not allowed to see him before 6 p.m. All that time he was without a lawyer, and what exactly happened to him then, I do not know," says Vladimir Voronin.

"And he tells us that all that time they were holding incomprehensible conversations with him, which of course are not provided for by the Procedural Code of the Russian Federation, telling him that they could plant drugs on him and launch a more serious criminal prosecution against him."

"So here I can clearly see that pressure was exerted on my client. He could not reach his lawyer; the only thing he managed was to write to his wife that he had been detained, so that she could contact his lawyer. For a long period of time they did not call an ambulance for him, even though he was not feeling well. They seized everything he had on him," the lawyer adds.

The defence has also filed the appropriate complaints demanding the return of Evgeny Chupov's personal belongings:

"His mobile phone, a camera, a few other of his things... To this day nothing has been returned."

"And when I recently filed a request with the district police station, I received an utterly vague reply that a criminal prosecution had been opened against him, and that was all. So the combination of all these factors, in my view, shows that this is a politically motivated case," Vladimir Voronin told Bivol.

SAR's shameful refusal

In August 2019, already in Bulgaria, Evgeny Chupov and his wife applied to the State Agency for Refugees for protection on political grounds.

And here began their saga of almost a year and a half with Bulgaria's endless bureaucracy: interviews, correspondence and waiting. Most burdensome of all is the need to renew their ID cards every three months.

At the beginning of 2020 he was summoned to the Registration and Reception Centre – Sofia in the Ovcha Kupel district. "That is where we learned that we had been refused asylum," Chupov recalls. The refusal was signed personally by SAR chairperson Petya Parvanova, who is also a former caretaker interior minister in Marin Raykov's 2013 cabinet.

The Russian service of Radio Free Europe, Radio Svoboda, published part of the decision, according to which:

Photo: Nikolay Marchenko, Bivol

"the facts as presented give no grounds to presume that the applicant was forced to leave his homeland because of a real danger of serious harm such as the death penalty, torture, or inhuman or degrading treatment or punishment."

Three cases have been filed with the Administrative Court – Sofia City: on behalf of Evgeny Chupov, of his wife together with the younger children, and of the older minors. On appeal, the case would have to be decided finally by the Supreme Administrative Court (SAC).

SAR claims, both utterly naively and purely pro forma, to have sent an "inquiry" to the Russian Interior Ministry in order to check whether there had been any "use of violence" against Evgeny Chupov during his detention in 2019.

It allegedly received a negative reply: an "investigation into officers exceeding their powers" had been carried out, but it had not established any such abuse against the activist. Did anyone in Sofia really expect a different official answer from Moscow?

Then in February, engineer Ivan Milanov, the director of "International Activities" at SAR, wrote a rather odd eight-page "report" on the human rights situation in the Russian Federation, almost entirely copy-pasted from Russian media. The "document" contains a fairly superficial "analysis", without sufficient detail about the attacks on Alexei Navalny's team, who have been subjected to surveillance, searches, detentions, blocked bank cards and even abductions or arrests.

Milanov devotes barely a paragraph to the topic, preferring to muse on the existence of a parliamentary "opposition" in the Russian State Duma, such as Vladimir Zhirinovsky's LDPR party, even though Zhirinovsky himself has for years made no secret that it is steered by the Kremlin, just like the other factions in the legislature.

Ivan Milanov, municipal councillor in Bozhurishte and director at SAR

Ivan Milanov, however, can hardly fail to grasp how serious this case is, given that he presents himself as a man with a "diplomatic career" and is a politically engaged figure, namely a municipal councillor for the ruling GERB party.

Here is how he described himself in the party leaflet during the campaign, published on the website of Georgi Dimov, at the time still GERB's candidate for mayor of Bozhurishte:

"I am a decisive and dynamic person. Strongly positive, with a sporting character, a lover of literature, music and technology. I can be counted on not only when things are going well... I have an extremely low tolerance for compromise when it comes to lies and injustice."

In the summer of 2019, when Bozhurishte mayor Georgi Dimov was detained, municipal councillor Ivan Milanov justified himself before bTV over his vote for the scandalous below-market sale of land that led to the arrest of the local mayor:

"I supported it because this is meadow land."

"It is that kind of land, it cannot be worth more..." the "decisive" GERB man Ivan Milanov commented at the time on the large-scale fraud in the Bozhurishte municipality. He, however, is not mentioned among those detained or among the witnesses, and he continues his "diplomatic" career in SAR's leadership, where he prepares "reports" on cases like that of the Chupov family.

The media effect

Yet despite SAR's mighty "analysis", the media and the NGOs have proven far more professional and far more effective. In September 2020, Svobodna Evropa (Radio Free Europe) wrote several times about SAR's refusal. Memorial, the human rights organisation well known in Russia, provided a letter of recommendation for Evgeny Chupov in which it tries to convince the authorities in Sofia that it is dangerous for him to return to Russia.

Chupov

"The threats that Evgeny Chupov will face, or could face, should he return to Russia can be characterised as persecution on the grounds of holding particular political convictions."

"The threats in question can be characterised as 'persecution' within the meaning of Article 1 of the 1951 Convention Relating to the Status of Refugees and the additional 1967 Protocol to that document," reads Memorial's letter, which Bivol has obtained.

The Atlantic Council of Bulgaria also publicly expressed its indignation, stating that "the Bulgarian state must prove daily, not with words but with concrete actions and deeds, that it is a free, European, democratic state governed by the rule of law.

We learn with regret that the status of Mr Evgeny Chupov has still not been settled. Once again we are witnessing the absence of a lasting, just decision. The Atlantic Council of Bulgaria insists on an immediate resolution of the case and the granting of political asylum to Mr Evgeny Chupov. There is already ample publicly disclosed information about the police pressure and violence exerted in Russia on Mr Evgeny Chupov because he was one of the volunteers who helped in the election campaign of a candidate from the team of Russian opposition leader Alexei Navalny.

There is hardly another person in the country who feels as grateful to the Bulgarian media as Evgeny Chupov. After the publications by Kasparov.ru, Radio Svoboda, the Svobodna Evropa and Faktor.bg websites, and the statement on the website of the Atlantic Council – Sofia, SAR suddenly withdrew its decision on 25 September 2020. This became clear from the ruling of the Sofia City Administrative Court, which terminated the case brought by Evgeny Chupov to challenge SAR's refusal of political asylum.

The EU Agency for Fundamental Rights cannot help...

It turned out that SAR had withdrawn the decision refusing protection to the Chupovs. "Due to the withdrawal of the decision contested in the case, the court proceedings are procedurally inadmissible," the court's ruling says. On appeal, the case would be decided finally by the Supreme Administrative Court.

On 4 December 2020 the EU Agency for Fundamental Rights (FRA) replied to Evgeny Chupov that it has no way of helping him with his case in Bulgaria.

We regret to inform you that FRA's mandate does not give it the power to examine individual cases or complaints. Furthermore, the agency has no powers to monitor EU member states for human rights violations. Consequently, we cannot offer any advice or assistance in your case.

The European Parliament likewise replied to Evgeny Chupov that in practice it cannot influence the authorities in Sofia.

The European Parliament could not help either...

Bivol tried to contact SAR's leadership and its chairperson Petya Parvanova. Since, according to her secretary, she was not at her desk, we ended up speaking with the head of public relations, Kalina Yotova.

"No, we have no official positions on specific cases, because when it comes to citizens who are seeking international protection from us and are accommodated with us, we do not, as a matter of principle, give out information about such people."

"Look, the agency is not making excuses. When the agency provides information, that is the information of the foreigners themselves, of those whose status is being processed, and of third parties. Accordingly, that is not our practice, international law does not allow such information to be given out, and commenting on court decisions is not our job as an institution, you understand?" she said.

According to her, her colleagues at SAR who wrote the absurd "report" on the situation in Russia are "doing their work conscientiously":

"It is not my job to pass judgement on my colleagues."

SAR would respond to the written questions "within the usual timeframe in which we reply".

Petya Parvanova with former chief prosecutor Sotir Tsatsarov (Photo: Utro Ruse)

Eleonora Yordanova, the head of the International Protection Proceedings department at SAR's reception centre in Ovcha Kupel who is handling the Chupovs' case, declined to comment on the situation to Bivol on the grounds that she was "not authorised" to do so.

That did not stop the same official in 2016 from making excuses on SAR's behalf before Nova TV over Mohammed Daleel, the "Ansbach terrorist", a Syrian who carried out a bombing in that German town after having been allowed to live in rented lodgings in Sofia. At the time she told the broadcaster that he had been "an absolutely level-headed person", adding that frequent interviews with asylum seekers were conducted precisely when a refusal was being prepared.

The official written reply from SAR's press office on behalf of Petya Parvanova likewise carries no substantive information, other than that the institution insists on discretion and wants no transparency whatsoever in cases like Evgeny Chupov's.

"In response to your inquiry, received by email on 16 December 2020, I inform you that the person you are interested in is currently in proceedings for international protection. All facts and circumstances related to his application for international protection are being examined. A decision will be issued within the statutory deadlines."

Nor does it become clear why so many interviews are being held while there is still no clarity about the family's future in Bulgaria.

"Given the specific nature of the work of the State Agency for Refugees at the Council of Ministers and the legal acts concerning personal data, the agency does not discuss specific cases with third parties. Within the proceedings, those seeking protection receive full information about their rights and obligations and may engage lawyers to represent them during the procedure and defend their rights."

While denying a family from Moscow the right to settle in Bulgaria, Petya Angelova Parvanova herself acquired in 2019 a house with a yard (an actual built-up area of 105 sq m) in the village of Shipochane near Samokov for 56,000 leva. This is shown by her 2020 asset declaration, which can be found in Bivol's search engine and on the Bird.bg site "Bulgarian political persons", and according to which the ex-minister has for years been accumulating deposits in leva, euro and British pounds in her accounts at DSK Bank and First Investment Bank.

On 15 December 2020 Evgeny Chupov informed Bivol that he had been invited to yet another interview at the Ovcha Kupel centre. "The agency just called me; a new interview is scheduled for Monday at 10 a.m.," the activist reported.

Whether he will receive the long-awaited decision granting political asylum, however, remains anyone's guess, since economic benefits from the Kremlin, such as the "Turkish Stream 2" gas pipeline project, are the main priority for Boyko Borissov's government. The good news is that US sanctions, once President-elect Joe Biden takes office in January 2021, are only a matter of time.

Until then, Eastern European leaders such as Boyko Borissov, Aleksandar Vučić and Viktor Orbán will keep trying to satisfy the ego of Russian head of state Vladimir Putin, feigning the "completion" of the corrupt project and putting up every kind of obstacle for those fighting the regime in Moscow.

Asked by Bivol what he would say to Prime Minister Boyko Borissov, Evgeny Chupov was adamant:

"I think the Bulgarian prime minister is well aware of what is happening in the Russian Federation."

Photo: Bivol

"I would tell him: Open your eyes! If you have signed the Geneva Convention, abide by it; if you have adopted the relevant legislation, comply with it!"

It turns out that the Chupovs are not the only Russians fleeing Russia and unsuccessfully seeking protection from SAR. Flora Akhmetova and her son Daniil from Saint Petersburg have been refused asylum by the agency, even though the family is threatened by criminal groups in Russia. And at the end of February they must leave the refugee hostel in Sofia's Ovcha Kupel district, wrote Faktor.bg, which was alerted by Evgeny Chupov to his compatriot's case.

 

Security updates for Friday

Post Syndicated from original https://lwn.net/Articles/840731/rss

Security updates have been issued by Arch Linux (blueman, chromium, gdk-pixbuf2, hostapd, lib32-gdk-pixbuf2, minidlna, nsd, pam, and unbound), CentOS (gd, openssl, pacemaker, python-rtslib, samba, and targetcli), Debian (kernel, lxml, and mediawiki), Fedora (mbedtls), openSUSE (clamav and openssl-1_0_0), Oracle (firefox and openssl), Red Hat (openssl, postgresql:12, postgresql:9.6, and thunderbird), Scientific Linux (openssl and thunderbird), and SUSE (cyrus-sasl, openssh, slurm_18_08, and webkit2gtk3).

US Schools Are Buying Cell Phone Unlocking Systems

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/12/us-schools-are-buying-cell-phone-unlocking-systems.html

Gizmodo is reporting that schools in the US are buying equipment to unlock cell phones from companies like Cellebrite:

Gizmodo has reviewed similar accounting documents from eight school districts, seven of which are in Texas, showing that administrators paid as much as $11,582 for the controversial surveillance technology. Known as mobile device forensic tools (MDFTs), this type of tech is able to siphon text messages, photos, and application data from students’ devices. Together, the districts encompass hundreds of schools, potentially exposing hundreds of thousands of students to invasive cell phone searches.

The eighth district was in Los Angeles.

Let's fix the ordinance on electronic prescriptions

Post Syndicated from Bozho original https://blog.bozho.net/blog/3673

The Council of Ministers has adopted an amendment to the ordinance on the procedure for dispensing medicinal products, adding provisions on electronic prescriptions; it was promulgated today.

Unfortunately, this happened in complete violation of the Law on Normative Acts and did not go through public consultation, so I can only comment after the fact, in the hope that some things may still be tightened up. I understand that if you can shave 30 days off the timeline for adopting an ordinance in a crisis it is reasonable to do so, but the pandemic has been with us for nine months, and electronic prescriptions have been talked about for decades.

If you are only now writing amendments to the ordinance, that shows boundless lack of preparation. And the ordinance has clearly been thrown together in a hurry, with many half-baked provisions and terminological errors. Yes, it will do the job insofar as electronic prescriptions can be issued, but it will have to be corrected on the fly.

Here are my criticisms and proposals:

  • The ordinance says that "electronic prescriptions are issued, entered, processed and stored via specialised medical and pharmacy software". Electronic prescriptions should be stored in the National Health Information System (NHIS), not in the medical and pharmacy software. They may be kept there temporarily and for convenience, but the central place of storage must be the NHIS. In a subsequent paragraph the ordinance tries to regulate the "pending" status of a gradually emerging NHIS, which may sound fine but has no place in a normative act. An ordinance cannot say "as things come into being, we will see how you will use them".
  • The ordinance says that "to issue an electronic prescription, the physician or dentist identifies himself or herself via a qualified electronic signature (QES)". A temporary procedure for identification via QES exists in an ordinance under the E-Government Act, and this ordinance should refer to it, because otherwise "identification via QES" means nothing. In this context it matters exactly how the identification is performed. The correct approach is to take the personal identification number (EGN) from the physician's qualified electronic signature and check it against the register of physicians (or pharmacists, for whom a similar procedure exists in the next article), instead of, for example, requiring a professional ID number to be embedded in the QES. Identification should also be possible under the Electronic Identification Act. At issuance, though, the identification step may actually be redundant: given that the prescription is issued through the software at the physician's practice (and verified at the pharmacy), the physician is already logged in. The ordinance does not say to whom they identify themselves, so the step could be dropped.
  • According to the ordinance, "upon the electronic prescribing of the medicinal product, an automatic check is performed in the register [..]". Here it is important to spell out the technical parameters of that check, i.e. what I noted above: that the practitioner's UIN is looked up based on the EGN. Do the registers of the Bulgarian Medical Association and of the pharmacists support such a lookup? Even if they do not, it can be added quickly and easily, since the data is there. The bigger problem with this paragraph is that it is not a norm but a narrative. Normative acts create rights and obligations. You cannot just say "a check is performed"; you say who is obliged to perform that check. Besides being poor normative drafting, it genuinely remains unclear who must perform it: the vendor of the hospital and pharmacy software, or, as would be correct, the NHIS, through which all prescriptions should pass. But the NHIS is not a legal person on which obligations can be imposed, so it must be clearly written that the Ministry of Health performs this check via the NHIS.
  • Under the ordinance, signing the prescription with a QES happens after the register checks, whereas it should be the other way around: at the moment the signed prescription is sent to the NHIS, all of its attributes should be verified.
  • Once the patient goes to the pharmacy, "the master pharmacist performs actions to identify the issued electronic prescription, using the patient's EGN as the leading criterion". This is quite problematic. According to public statements from a month ago, the prescription was supposed to be identified by the EGN plus the last four characters of the prescription code. This text does not say how the check is done, and "leading criterion" is vague. Can it be done by that criterion alone (it should not be)? Which other criteria are allowed: EGN plus ID card number, EGN plus four characters of the prescription number? In principle it is good to spare people from printing out electronic prescriptions (which would defeat their purpose), so the proposal I made eight months ago was EGN plus ID card number, or at least part of the ID card number. The pharmacist may see several active prescriptions and needs to know which one to fill. Whether this belongs in an ordinance is debatable, but I foresee confusion, at least in the beginning.
  • "Once the technical and organisational conditions for this have been provided by the Ministry of Health and the NHIF" is far too vague a criterion for a normative act. The state should first create the conditions and only then impose deadlines.
  • There are no requirements for security and data protection. In what form does the NHIS process and store prescriptions? After how long are they deleted or anonymised? Is there an audit trail of who ran which lookups, e.g. which EGNs pharmacists searched for and whether medicines were subsequently dispensed? Who has access to which functionality? How is access to the NHIS within the Ministry of Health regulated, including for the lookup functionality? Has the Commission for Personal Data Protection issued an opinion on the draft ordinance?
  • There are no clear instructions for the vendors of hospital and pharmacy software: where and which nomenclatures to use, and whether there is any guarantee that they are up to date. The ordinance says that "the programming interfaces and the nomenclatures for information exchange between the medical and pharmacy software and the NHIS are updated on an ongoing basis", but that is vague and unacceptable. There is no reference to Article 14 of the ordinance under the E-Government Act, which governs the versioning of programming interfaces; the Ministry of Health and the NHIS must not be able to change an interface overnight and break everything.
  • The format of the prescription code is not specified. This is a minor issue, but it is usually settled in the normative act that introduces a given system. I would suggest following the provisions of Art. 4, para. 5 of the ordinance under the E-Government Act, i.e. using a UUID (RFC 4122), especially if four of its digits/letters will not need to be quoted.

I hope electronic prescriptions end up working well. But that will be despite this ordinance, which, besides arriving at the very last possible moment in violation of the Law on Normative Acts, is also quite imprecise and unclear. In short, this is not how it is done. The Ministry of Health commissioned the NHIS from Information Services back in July. There has been more than enough time since then to prepare the ordinance and adopt it after consultation and after ironing out its problems.

This is just one more example of how e-government happens in our country: piecemeal, slapdash, at the last moment, and only under very strong public pressure. Still, the good thing is that something is happening, that there will be electronic prescriptions (and referrals). But the Ministry of Health (and every other ministry) needs to work with more rigour and more transparency.

The post "Let's fix the ordinance on electronic prescriptions" was first published on БЛОГодаря.

Computing Euclidean distance on 144 dimensions

Post Syndicated from Marek Majkowski original https://blog.cloudflare.com/computing-euclidean-distance-on-144-dimensions/

Late last year I read a blog post about our CSAM image scanning tool. I remember thinking: this is so cool! Image processing is always hard, and deploying a real image identification system at Cloudflare is no small achievement!

Some time later, I was chatting with Kornel: “We have all the pieces in the image processing pipeline, but we are struggling with the performance of one component.” Scaling to Cloudflare needs ain’t easy!

The problem was in the speed of the matching algorithm itself. Let me elaborate. As John explained in his blog post, the image matching algorithm creates a fuzzy hash from a processed image. The hash is exactly 144 bytes long. For example, it might look like this:

00e308346a494a188e1043333147267a 653a16b94c33417c12b433095c318012
5612442030d14a4ce82c623f4e224733 1dd84436734e4a5d6e25332e507a8218
6e3b89174e30372d

The hash is designed to be used in a fuzzy matching algorithm that can find “nearby”, related images. The specific algorithm is well defined, but making it fast is left to the programmer — and at Cloudflare we need the matching to be done super fast. We want to match thousands of hashes per second, of images passing through our network, against a database of millions of known images. To make this work, we need to seriously optimize the matching algorithm.

Naive quadratic algorithm

The first algorithm that comes to mind has O(K*N) complexity: for each query, go through every hash in the database. In naive implementation, this creates a lot of work. But how much work exactly?

First, we need to explain how fuzzy matching works.

Given a query hash, the fuzzy match is the “closest” hash in a database. This requires us to define a distance. We treat each hash as a vector containing 144 numbers, identifying a point in a 144-dimensional space. Given two such points, we can calculate the distance using the standard Euclidean formula.

For our particular problem, though, we are interested in the “closest” match in a database only if the distance is lower than some predefined threshold. Otherwise, when the distance is large,  we can assume the images aren’t similar. This is the expected result — most of our queries will not have a related image in the database.

The Euclidean distance equation used by the algorithm is standard:

d(p, q) = \sqrt{\sum_{i=1}^{144} (q_i - p_i)^2}

To calculate the distance between two 144-byte hashes, we take each byte, calculate the delta, square it, sum it to an accumulator, do a square root, and ta-dah! We have the distance!

Here’s how to count the squared distance in C:

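A minimal sketch of such a function might look like this (illustrative, not the original listing):

#include <stdint.h>

/* Squared Euclidean distance between two 144-byte hashes.
   The square root is skipped, since only the relative ordering matters. */
static uint32_t distance_sq(const uint8_t a[144], const uint8_t b[144]) {
    uint32_t sum = 0;
    for (int i = 0; i < 144; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        sum += (uint32_t)(d * d);
    }
    return sum;
}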

This function returns the squared distance. We avoid computing the actual distance to save us from running the square root function – it’s slow. Inside the code, for performance and simplicity, we’ll mostly operate on the squared value. We don’t need the actual distance value, we just need to find the vector with the smallest one. In our case it doesn’t matter if we’ll compare distances or squared distances!

As you can see, fuzzy matching is basically a standard problem of finding the closest point in a multi-dimensional space. Surely this has been solved in the past — but let’s not jump ahead.

While this code might be simple, we expect it to be rather slow. Finding the smallest hash distance in a database of, say, 1M entries, would require going over all records, and would need at least:

  1. 144 * 1M subtractions
  2. 144 * 1M multiplications
  3. 144 * 1M additions

And more. This alone adds up to 432 million operations! How does it look in practice? To illustrate this blog post we prepared a full test suite. The large database of known hashes can be well emulated by random data. The query hashes can’t be random and must be slightly more sophisticated, otherwise the exercise wouldn’t be that interesting. We generated the test smartly by byte-swaps of the actual data from the database — this allows us to precisely control the distance between test hashes and database hashes. Take a look at the scripts for details. Here’s our first run of the first, naive, algorithm:

$ make naive
< test-vector.txt ./mmdist-naive > test-vector.tmp
Total: 85261.833ms, 1536 items, avg 55.509ms per query, 18.015 qps

We matched 1,536 test hashes against a database of 1 million random vectors in 85 seconds. It took 55ms of CPU time on average to find the closest neighbour. This is rather slow for our needs.

SIMD for help

An obvious improvement is to use more complex SIMD instructions. SIMD is a way to instruct the CPU to process multiple data points using one instruction. This is a perfect strategy when dealing with vector problems — as is the case for our task.

We settled on using AVX2, with 256 bit vectors. We did this for a simple reason — newer AVX versions are not supported by our AMD CPUs. Additionally, in the past, we were not thrilled by the AVX-512 frequency scaling.

Using AVX2 is easier said than done. There is no single instruction to count Euclidean distance between two uint8 vectors! The fastest way of counting the full distance of two 144-byte vectors with AVX2 we could find is authored by Vlad:

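Since the original listing appears as an image, here is a minimal sketch of that approach (an illustration based on the description below, not the original code):

#include <immintrin.h>
#include <stdint.h>

static uint32_t distance_sq_avx2(const uint8_t a[144], const uint8_t b[144]) {
    __m256i acc = _mm256_setzero_si256();
    for (int i = 0; i < 144; i += 16) {
        /* load 16 bytes of each hash and widen uint8 -> int16 */
        __m256i va = _mm256_cvtepu8_epi16(_mm_loadu_si128((const __m128i *)(a + i)));
        __m256i vb = _mm256_cvtepu8_epi16(_mm_loadu_si128((const __m128i *)(b + i)));
        __m256i d  = _mm256_sub_epi16(va, vb);
        /* pairwise d*d summed into int32 partial sums */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(d, d));
    }
    /* reduce the eight int32 partial sums to a single value */
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc), _mm256_extracti128_si256(acc, 1));
    s = _mm_add_epi32(s, _mm_srli_si128(s, 8));
    s = _mm_add_epi32(s, _mm_srli_si128(s, 4));
    return (uint32_t)_mm_cvtsi128_si32(s);
}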

It’s actually simpler than it looks: load 16 bytes, convert the vector from uint8 to int16, subtract the vectors, store intermediate sums as int32, repeat. At the end, we need four more instructions to extract the partial sums into the final sum. This AVX2 code improves the performance around 3x:

$ make naive-avx2 
Total: 25911.126ms, 1536 items, avg 16.869ms per query, 59.280 qps

We measured 17ms per item, which is still below our expectations. Unfortunately, we can’t push it much further without major changes. The problem is that this code is limited by memory bandwidth. The measurements come from my Intel i7-5557U CPU, which has the max theoretical memory bandwidth of just 25GB/s. The database of 1 million entries takes 137MiB, so it takes at least 5ms to feed the database to my CPU. With this naive algorithm we won’t be able to go below that.

Vantage Point Tree algorithm

Since the naive brute force approach failed, we tried using more sophisticated algorithms. My colleague Kornel Lesiński implemented a super cool Vantage Point algorithm. After a few ups and downs, optimizations and rewrites, we gave up. Our problem turned out to be unusually hard for this kind of algorithm.

We observed “the curse of dimensionality”. Space partitioning algorithms don’t work well in problems with large dimensionality — and in our case, we have an enormous number of 144 dimensions. K-D trees are doomed. Locality-sensitive hashing is also doomed. It’s a bizarre situation in which the space is unimaginably vast, but everything is close together. The volume of the space is a 347-digit-long number, but the maximum distance between points is just 3060, i.e. sqrt(255*255*144).

Space partitioning algorithms are fast, because they gradually narrow the search space as they get closer to finding the closest point. But in our case, the common query is never close to any point in the set, so the search space can’t be narrowed to a meaningful degree.

A VP-tree was a promising candidate, because it operates only on distances, subdividing space into near and far partitions, like a binary tree. When it has a close match, it can be very fast, and doesn’t need to visit more than O(log(N)) nodes. For non-matches, its speed drops dramatically. The algorithm ends up visiting nearly half of the nodes in the tree. Everything is close together in 144 dimensions! Even though the algorithm avoided visiting more than half of the nodes in the tree, the cost of visiting remaining nodes was higher, so the search ended up being slower overall.

Smarter brute force?

This experience got us thinking. Since space partitioning algorithms can’t narrow down the search, and still need to go over a very large number of items, maybe we should focus on going over all the hashes, extremely quickly. We must be smarter about memory bandwidth though — it was the limiting factor in the naive brute force approach before.

Perhaps we don’t need to fetch all the data from memory.

Short distance

The breakthrough came from the realization that we don’t need to count the full distance between hashes. Instead, we can compute only a subset of dimensions, say 32 out of the total of 144. If this distance is already large, then there is no need to compute the full one! Computing more points is not going to reduce the Euclidean distance.

The proposed algorithm works as follows (a rough sketch in C follows the steps):

1. Take the query hash and extract a 32-byte short hash from it

2. Go over all the 1 million 32-byte short hashes from the database. They must be densely packed in the memory to allow the CPU to perform good prefetching and avoid reading data we won’t need.

3. If the distance of the 32-byte short hash is greater than or equal to the best score so far, move on

4. Otherwise, investigate the hash thoroughly and compute the full distance.
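
A minimal sketch of this filter, reusing the distance_sq helper sketched earlier (the names and database layout are illustrative, not the production code):

#include <stddef.h>
#include <stdint.h>

/* Returns the index of the closest hash, or -1 if nothing beats the initial
   score. short_db holds the densely packed 32-byte prefixes of the same
   entries; for the threshold variant described further below, initialize
   `best` with the squared threshold instead of UINT32_MAX. */
static int find_closest(const uint8_t query[144],
                        const uint8_t (*full_db)[144],
                        const uint8_t (*short_db)[32],
                        size_t n)
{
    uint32_t best = UINT32_MAX;
    int best_idx = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t short_d = 0;
        for (int j = 0; j < 32; j++) {        /* step 3: cheap prefix distance */
            int32_t d = (int32_t)query[j] - (int32_t)short_db[i][j];
            short_d += (uint32_t)(d * d);
        }
        if (short_d >= best)
            continue;                          /* the prefix alone is already too far */
        uint32_t full_d = distance_sq(query, full_db[i]);   /* step 4: full distance */
        if (full_d < best) {
            best = full_d;
            best_idx = (int)i;
        }
    }
    return best_idx;
}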

Even though this algorithm needs to do less arithmetic and memory work, it’s not faster than the previous naive one. See make short-avx2. The problem is: we still need to compute a full distance for hashes that are promising, and there are quite a lot of them. Computing the full distance for promising hashes adds enough work, both in ALU and memory latency, to offset the gains of this algorithm.

There is one detail of our particular application of the image matching problem that will help us a lot moving forward. As we described earlier, the problem is less about finding the closest neighbour and more about proving that the neighbour with a reasonable distance doesn’t exist. Remember — in practice, we don’t expect to find many matches! We expect almost every image we feed into the algorithm to be unrelated to image hashes stored in the database.

It’s sufficient for our algorithm to prove that no neighbour exists within a predefined distance threshold. Let’s assume we are not interested in hashes more distant than, say, 220, which squared is 48,400. This makes our short-distance algorithm variation work much better:

$ make short-avx2-threshold
Total: 4994.435ms, 1536 items, avg 3.252ms per query, 307.542 qps

Origin distance variation

At some point, John noted that the threshold allows additional optimization. We can order the hashes by their distance from some origin point. Given a query hash which has origin distance of A, we can inspect only hashes which are distant between |A-threshold| and |A+threshold| from the origin. This is pretty much how each level of Vantage Point Tree works, just simplified. This optimization — ordering items in the database by their distance from origin point — is relatively simple and can help save us a bit of work.

While great on paper, this method doesn’t introduce much gain in practice, as the vectors are not grouped in clusters — they are pretty much random! For the threshold values we are interested in, the origin distance algorithm variation gives us ~20% speed boost, which is okay but not breathtaking. This change might bring more benefits if we ever decide to reduce the threshold value, so it might be worth doing for production implementation. However, it doesn’t work well with query batching.

Transposing data for better AVX

But we’re not done with AVX optimizations! The usual problem with AVX is that the instructions don’t normally fit a specific problem. Some serious mind twisting is required to adapt the right instruction to the problem, or to reverse the problem so that a specific instruction can be used. AVX2 doesn’t have useful “horizontal” uint16 subtract, multiply and add operations. For example, _mm_hadd_epi16 exists, but it’s slow and cumbersome.

Instead, we can twist the problem to make use of fast available uint16 operands. For example we can use:

  1. _mm256_sub_epi16
  2. _mm256_mullo_epi16
  3. and _mm256_add_epi16.

The add would overflow in our case, but fortunately there is add-saturate _mm256_adds_epu16.

The saturated add is great and saves us conversion to uint32. It just adds a small limitation: the threshold passed to the program (i.e., the max squared distance) must fit into uint16. However, this is fine for us.

To effectively use these instructions we need to transpose the data in the database. Instead of storing hashes in rows, we can store them in columns:

So instead of:

  1. [a1, a2, a3],
  2. [b1, b2, b3],
  3. [c1, c2, c3],

We can lay it out in memory transposed:

  1. [a1, b1, c1],
  2. [a2, b2, c2],
  3. [a3, b3, c3],

Now we can load the first bytes of 16 hashes using one memory operation. In the next step, we can subtract the first byte of the querying hash using a single instruction, and so on. The algorithm stays exactly the same as defined above; we just make the data easier to load and easier to process for AVX.

The hot loop code even looks relatively pretty:

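Here is a minimal sketch of what such a loop can look like, assuming the transposed database stores each dimension of 16 consecutive hashes as contiguous uint16 values (the layout and names are assumptions, not the original listing):

#include <immintrin.h>
#include <stdint.h>

/* Squared short-distances of one query against 16 candidates at once.
   db_t points at 32 rows of 16 uint16 values: row `dim` holds byte `dim`
   of the 16 candidate hashes. */
static __m256i short_distance_16x(const uint8_t query[32], const uint16_t *db_t)
{
    __m256i acc = _mm256_setzero_si256();
    for (int dim = 0; dim < 32; dim++) {
        __m256i q  = _mm256_set1_epi16((short)query[dim]);   /* broadcast one query byte */
        __m256i v  = _mm256_loadu_si256((const __m256i *)(db_t + dim * 16));
        __m256i d  = _mm256_sub_epi16(q, v);
        __m256i sq = _mm256_mullo_epi16(d, d);               /* at most 255*255, fits in uint16 */
        acc = _mm256_adds_epu16(acc, sq);                    /* saturating accumulate */
    }
    return acc;  /* 16 lanes of (saturated) squared short-distances */
}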

With the well-tuned batch size and short distance size parameters we can see the performance of this algorithm:

$ make short-inv-avx2
Total: 1118.669ms, 1536 items, avg 0.728ms per query, 1373.062 qps

Whoa! This is pretty awesome. We started from 55ms per query, and we finished with just 0.73ms. There are further micro-optimizations possible, like memory prefetching or using huge pages to reduce page faults, but they have diminishing returns at this point.

Roofline model from Denis Bakhvalov’s book

If you are interested in architectural tuning such as this, take a look at the new performance book by Denis Bakhvalov. It discusses roofline model analysis, which is pretty much what we did here.

Do take a look at our code and tell us if we missed some optimization!

Summary

What an optimization journey! We jumped between memory and ALU bottlenecked code. We discussed more sophisticated algorithms, but in the end, a brute force algorithm — although tuned — gave us the best results.

To get even better numbers, I experimented with Nvidia GPU using CUDA. The CUDA intrinsics like vabsdiff4 and dp4a fit the problem perfectly. The V100 gave us some amazing numbers, but I wasn’t fully satisfied with it. Considering how many AMD Ryzen cores with AVX2 we can get for the cost of a single server-grade GPU, we leaned towards general purpose computing for this particular problem.

This is a great example of the type of complexities we deal with every day. Making even the best technologies work “at Cloudflare scale” requires thinking outside the box. Sometimes we rewrite the solution dozens of times before we find the optimal one. And sometimes we settle on a brute-force algorithm, just very very optimized.

The computation of hashes and image matching are challenging problems that require running very CPU-intensive operations. The CPU we have available on the edge is scarce and workloads like this are incredibly expensive. Even with the optimization work talked about in this blog post, running the CSAM scanner at scale is a challenge and has required a huge engineering effort. And we’re not done! We need to solve more hard problems before we’re satisfied. If you want to help, consider applying!

Save the date for Coolest Projects 2021

Post Syndicated from Helen Drury original https://www.raspberrypi.org/blog/save-the-date-coolest-projects-2021/

The year is drawing to a close, and we are so excited for 2021!

More than 700 young people from 39 countries shared their tech creations in the free Coolest Projects online showcase this year! We loved seeing so many young people shine with their creative projects, and we can’t wait to see what the world’s next generation of digital makers will present at Coolest Projects in 2021.

A Coolest Projects participant showing off their tech creation

Mark your calendar for registration opening

Coolest Projects is the world-leading technology fair for young people! It’s our biggest event, and we are running it online again next year so that young people can participate safely and from wherever they are in the world.

Through Coolest Projects, young people are empowered to show the world something they’re making with tech — something THEY are excited about! Anyone up to age 18 can share their creation at Coolest Projects.

On 1 February, we will open registrations for the 2021 online showcase. Mark the date in your calendar! All registered projects will get their very own spot in the Coolest Projects online showcase gallery, where the whole world can discover them.

Taking part is completely free and enormously fun

If a young person in your life — your family, your classroom, your coding club — is making something with tech that they love, we want them to register it for Coolest Projects. It doesn’t matter how small or big their project is, because the Coolest Projects showcase is about celebrating the love we all share for getting creative with tech.

A teenage girl presenting a digital making project on a tablet

Everyone who registers a project becomes part of a worldwide community of peers who express themselves and their interests with creative tech. We will also have special judges pick their favourite projects! Taking part in Coolest Projects is a wonderful way to connect with others, be inspired, and learn from peers.

So if you know a tech-loving young person, get them excited for taking part in Coolest Projects!

“We are so very happy to have reached people who love to code and are enjoying projects from all over the world… everyone’s contributions have blown our minds… we are so so happy! Thank you to Coolest Projects for hosting the best event EVER!”

– mother of a participant in the 2020 online showcase

Want inspiration for projects? You can still explore all the wonderful projects from the 2020 showcase gallery.

A Coolest Projects participant

Young people can participate with whatever they’re making

Everyone is invited to take part in Coolest Projects — the showcase is for young people with any level of experience. The project they register can be whatever they like, from their very first Scratch animation, to their latest robotics project, website, or phone app. And we invite projects at any stages of the creation process, whether they’re prototypes, finished products, or works-in-progress!

  • To make the youngest participants and complete beginners feel like they belong, we work hard to make sure that taking part is a super welcoming and inspiring experience! In the showcase, they will discover what is possible with technology and how they can use it to shape their world.
  • And for the young creators who are super tech-savvy and make advanced projects, showcasing their creation at Coolest Projects is a great way to get it seen by some amazing people in the STEM sector: this year’s special judges were British astronaut Tim Peake, Adafruit CEO Limor Fried, and other fabulous tech leaders!

Sign up for the latest Coolest Projects news

To be the first to know when registration opens, you only have to sign up for our newsletter:

We will send you regular news about Coolest Projects to keep you up to date and help you inspire the young tech creator in your life!

The post Save the date for Coolest Projects 2021 appeared first on Raspberry Pi.

A quirk in the SUNBURST DGA algorithm

Post Syndicated from Nick Blazier original https://blog.cloudflare.com/a-quirk-in-the-sunburst-dga-algorithm/


On Wednesday, December 16, the RedDrip Team from QiAnXin Technology released their discoveries (tweet, github) regarding the random subdomains associated with the SUNBURST malware which was present in the SolarWinds Orion compromise. In studying queries performed by the malware, Cloudflare has uncovered additional details about how the Domain Generation Algorithm (DGA) encodes data and exfiltrates the compromised hostname to the command and control servers.

Background

The RedDrip team discovered that the DNS queries are created by combining the previously reverse-engineered unique GUID (based on hashing of the hostname and MAC address) with a payload that is a custom base 32 encoding of the hostname. The article they published includes screenshots of decompiled or reimplemented C# functions that are included in the compromised DLL. This background primer summarizes their work so far (which is published in Chinese).

RedDrip discovered that the DGA subdomain portion of the query is split into three parts:

<encoded_guid> + <byte> + <encoded_hostname>

An example malicious domain is:

7cbtailjomqle1pjvr2d32i2voe60ce2.appsync-api.us-east-1.avsvmcloud.com

Where the domain is split into the three parts as

Encoded GUID (15 chars): 7cbtailjomqle1p
Byte:                    j
Encoded hostname:        vr2d32i2voe60ce2

The work from the RedDrip team focused on the encoded hostname portion of the string; we have made additional insights related to both the encoded hostname and encoded GUID portions.

At a high level the encoded hostnames take one of two encoding schemes. If all of the characters in the hostname are contained in the set of domain name-safe characters "0123456789abcdefghijklmnopqrstuvwxyz-_." then the OrionImprovementBusinessLayer.CryptoHelper.Base64Decode algorithm, explained in the article, is used. If there are characters outside of that set in the hostname, then the OrionImprovementBusinessLayer.CryptoHelper.Base64Encode is used instead and ‘00’ is prepended to the encoding. This allows us to simply check if the first two characters of the encoded hostname are ‘00’ and know how the hostname is encoded.
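
As a minimal sketch of that dispatch logic (the helper names below are ours, not the malware’s, and the two branches stand in for the DLL’s Base64Decode/Base64Encode routines):

# Characters the malware treats as domain name-safe
DOMAIN_SAFE = set("0123456789abcdefghijklmnopqrstuvwxyz-_.")

def choose_encoding(hostname):
    """Return which of the two observed schemes applies to a hostname."""
    if all(ch in DOMAIN_SAFE for ch in hostname):
        return "base32-style substitution"   # CryptoHelper.Base64Decode path
    return "wide charset, '00'-prefixed"     # CryptoHelper.Base64Encode path

def is_wide_encoding(encoded_hostname):
    # A decoder can branch on the same '00' marker the malware prepends.
    return encoded_hostname.startswith("00")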

These function names within the compromised DLL are meant to resemble the names of legitimate functions, but in fact perform the message encoding for the malware. The DLL function Base64Decode is meant to resemble the legitimate function name base64decode, but its purpose is actually to perform the encoding of the query (which is a variant of base32 encoding).

The RedDrip Team has posted Python code for encoding and decoding the queries, including identifying random characters inserted into the queries at regular character intervals.

One potential issue we encountered with their implementation is the inclusion of a check clause looking for a ‘0’ character in the encoded hostname (line 138 of the decoding script). This line causes the decoding algorithm to ignore any encoded hostnames that do not contain a ‘0’. We believe this was included because ‘0’ is the encoded value of ‘0’, ‘.’, ‘-’, or ‘_’. Since fully qualified hostnames are composed of multiple parts separated by ‘.’ characters, e.g. ‘example.com’, it makes sense to expect a ‘.’ in the unencoded hostname and therefore to only consider encoded hostnames containing a ‘0’. However, this causes the decoder to ignore many of the recorded DGA domains.

As we explain below, we believe that long domains are split across multiple queries, where the second part is much shorter and unlikely to include a ‘.’. For example, ‘www2.example.c’ takes up one message, meaning that in order to transmit the entire domain ‘www2.example.com’ a second message containing just ‘om’ would also need to be sent. This second message does not contain a ‘.’, so its encoded form does not contain a ‘0’ and it is ignored in the RedDrip team’s implementation.
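
As a toy illustration of that splitting (the per-query payload limit below is an assumption chosen to reproduce the example, not a reverse-engineered constant):

def split_hostname(hostname, max_payload_chars=14):
    """Chunk a hostname so each piece fits in one query's payload."""
    return [hostname[i:i + max_payload_chars]
            for i in range(0, len(hostname), max_payload_chars)]

print(split_hostname("www2.example.com"))  # ['www2.example.c', 'om']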

The quirk: hostnames are split across multiple queries

A list of observed queries performed by the malware was published publicly on GitHub. Applying the decoding script to this set of queries, we see some queries that appear to be truncated, such as ‘grupobazar.loca’, but also some decoded hostnames that are curiously short or incomplete, such as “com”, “.com”, or a single letter such as “m” or “l”.

When the hostname does not fit into the available payload section of the encoded query, it is split up across multiple queries. Queries are matched up by matching the GUID section after applying a byte-by-byte exclusive-or (xor).

Analysis of first 15 characters

Noticing that long domains are split across multiple requests led us to believe that the first 16 characters encode information to associate multipart messages. This would allow the receiver on the other end to correctly reassemble the messages and recover the entire domain. The RedDrip team identified the first 15 bytes as a GUID; we focused on those bytes and will refer to them subsequently as the header.

We found the following queries that we believed to be matches without knowing yet the correct pairings between message 1 and message 2 (payload has been altered):

Part 1 – Both decode to “www2.example.c”
r1q6arhpujcf6jb6qqqb0trmuhd1r0ee.appsync-api.us-west-2.avsvmcloud.com
r8stkst71ebqgj66qqqb0trmuhd1r0ee.appsync-api.us-west-2.avsvmcloud.com

Part 2 – Both decode to “om”
0oni12r13ficnkqb2h.appsync-api.us-west-2.avsvmcloud.com
ulfmcf44qd58t9e82h.appsync-api.us-west-2.avsvmcloud.com

This gives us a final combined payload of www2.example.com

This example gave us two sets of messages where we were confident the second part was associated with the first part, and allowed us to find the following relationship where message1 is the header of the first message and message2 is the header of the second:

Base32Decode(message1) XOR KEY = Base32Decode(message2)

The KEY is a single character. That character is xor’d with each byte of the Base32Decoded first header to produce the Base32Decoded second header. We do not currently know how to infer what character is used as the key, but we can still match messages together without that information. Since A XOR B = C where we know A and C but not B, we can instead use A XOR C = B. This means that in order to pair messages together we simply need to look for messages where XOR’ing them together results in a repeating character (the key).

Base32Decode(message1) XOR Base32Decode(message2) = KEY

Looking at the examples above this becomes

Header (message 1):                r1q6arhpujcf6jb
Header (message 2):                0oni12r13ficnkq

Base32Decode of header 1 (binary): 10110100010011011011111101101001000000001100101011111101111000101001110100000101
Base32Decode of header 2 (binary): 11011001001000001101001000000100011011011010011110010000100011111111000000000100

We’ve truncated the results slightly, but below shows the two binary representations and the third line shows the result of the XOR.

101101000100110110111111011010010000000011001010111111011110001010011101
110110010010000011010010000001000110110110100111100100001000111111110000
011011010110110101101101011011010110110101101101011011010110110101101101

We can see the XOR result is the repeating sequence ‘01101101’, meaning the original key was 0x6D or ‘m’.

We provide the following python code as an implementation for matching paired messages (Note: the decoding functions are those provided by the RedDrip team):

# string1 is the first 15 characters of the first message
# string2 is the first 15 characters of the second message
def is_match(string1, string2):
    encoded1 = Base32Decode(string1)
    encoded2 = Base32Decode(string2)
    xor_result = [chr(ord(a) ^ ord(b)) for a,b in zip(encoded1, encoded2)]
    match_char = xor_result[0]
    for character in xor_result[0:9]:
        if character != match_char:
            return False, None
    return True, "0x{:02X}".format(ord(match_char))
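
As a usage sketch, pairing the headers from the worked example above (this assumes the RedDrip team’s Base32Decode() is available in the same module, as in their published script):

matched, key = is_match("r1q6arhpujcf6jb", "0oni12r13ficnkq")
print(matched, key)  # expect: True 0x6D, i.e. the key 'm' derived above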

The following are additional headers which based on the payload content Cloudflare is confident are pairs (the payload has been redacted because it contains hostname information that is not yet publicly available):

Example 1:

vrffaikp47gnsd4a
aob0ceh5l8cr6mco

xorkey: 0x4E

Example 2:

vrffaikp47gnsd4a
aob0ceh5l8cr6mco

xorkey: 0x54

Example 3:

vvu7884g0o86pr4a
6gpt7s654cfn4h6h

xorkey: 0x2B

We hypothesize that the xorkey can be derived from the header bytes and/or padding byte of the two messages, though we have not yet determined the relationship.

2020 ISO certificates are here, with a new Region and increased in-scope services

Post Syndicated from Anastasia Strebkova original https://aws.amazon.com/blogs/security/2020-iso-certificates-are-here-with-a-new-region-and-increased-in-scope-services/

Amazon Web Services (AWS) successfully completed the surveillance audits with no findings for ISO 9001, 27001, 27017, or 27018. Ernst and Young Certify Point auditors reissued the certificates on November 6, 2020. The certificates validate ISO compliance of our Information Security Management System from the perspective of third-party auditors.

We included 9 additional AWS services in scope for these audits in 2020, validated against ISO 9001, 27001, 27017, and 27018. We also added a new Cape Town Region to the scope, which was validated against ISO 9001, 27001, 27017, and 27018 standards before the general launch.

The services added to our ISO program during the 2020 audit cycle include the following:

AWS CloudEndure now expands to include ISO 9001, 27017, and 27018, in addition to the existing 27001 certification.

The list of ISO certified services is available on the AWS webpage, and we provide the certifications online and in the console via AWS Artifact, as well.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Anastasia Strebkova

Anastasia is a Security Assurance Manager at Amazon Web Services on the Global Audits team, managing the AWS ISO portfolio. She has previously worked on IT audits, governance, risk, privacy, business continuity, and information security program management for cloud enterprises. Anastasia holds a Bachelor of Arts degree in Civil Law from Moscow Law Academy.

Help Others Be “Cyber Aware” This Festive Season—And All Year Round!

Post Syndicated from Jen Ellis original https://blog.rapid7.com/2020/12/17/help-others-be-cyber-aware-this-festive-season-and-all-year-round/


Are you tired of being the cybersecurity help desk for everyone you know? Are you frustrated with spending all your time securing your corporate environment, only to have to deal with the threat that snuck in through naive end-users? Are you new to security and wondering how you ended up here? This blog is for you!

Introducing the Cyber Aware Campaign

Every year, November and December tend to be awash with media articles sharing tips for “safe” online shopping, particularly around Cyber Monday. This has been compounded in 2020, a year characterized in cybersecurity by increased remote working, reliance on online and delivery services, and COVID-19-themed scams and attacks. Many have viewed 2020 as a hacker’s playground.

It’s in this setting then that the U.K. government has relaunched its Cyber Aware campaign to help internet citizens navigate the rocky shores of defending their digital lives. The campaign—which features TV, radio, and print ads, as well as various (virtual) events—offers six practical and actionable tips for helping people protect themselves online.

The tips are designed to be applicable to the broadest audience possible. They are not necessarily the most sophisticated security best practices, but rather (and very intentionally), they are fairly basic and applicable to a wide range of people. The list has been devised as the result of considerable development and testing: The U.K. government not only sought input from security experts, but also from nonprofits and civil society groups representing various constituent groups. This helped them ensure the tips would be practical for everyone from your granny to your favorite athlete (maybe they are the same person).

As with enterprise security, there is regrettably no silver bullet for personal security, so these tips will not make people completely invulnerable. However, they do focus on steps that are manageable and will meaningfully reduce risk exposure for individuals. The U.K. government has focused on finding a balance between being thorough and not alienating people from making the effort, hence settling on just six tips. Naturally, we prefer things that come in sevens, but this is a decent start. 😉

The tips

Four of the six tips focus on passwords and identity access management. This seems like a good choice; it’s extremely hard to change behavior such that people stop sharing personal information or clicking on links, but if you can make it harder for attackers to access accounts, that’s a good step toward meaningfully reducing risk.

So, let’s take a look at the actual tips…

  1. Use a strong and separate password for your email
  2. Create strong passwords using three random words
  3. Save your passwords in your browser
  4. Turn on two-factor authentication (2FA)
  5. Update your devices
  6. Back up your data

We recommend clicking on the links and taking a look at the full guidance. Or, for more information on the tips, how they were developed, and what the Cyber Aware campaign entails, check out this Security Nation podcast interview with the delightful Cub Llewelyn-Davies of the UK National Cyber Security Centre.

As a starting point or personal security baseline, this is a very decent list, and we hope it will have a meaningful impact in encouraging individuals to make a few small changes to protect themselves online.  

As overzealous security enthusiasts, though, we had to take it one step further. We’ve created a free personal security guide of our own that starts with the Cyber Aware steps, then offers additional advice for those that want to go further. We know that for the vast majority of internet users, even six steps feels like too many, but we also hold out hope that many people may be inspired to dig deeper or may just have more specific circumstances they need help with.

You can download the guide for free here. Maybe include it with your holiday cards this year—personal security is the gift that keeps on giving!

Why should you care about this?

If you are reading the Rapid7 blog, the chances are that you already think about security and are almost certainly taking these steps or some appropriate alternative to them (if only more websites accepted 50-character passwords, eh?). Nonetheless, even if you are a security professional, the need to educate others likely affects you. Maybe it’s because you’re sick of constantly being asked for security tips or assistance by family and friends. Maybe you just can’t handle reading more headlines about security incidents that could have been avoided with some basic personal security hygiene. Maybe you’re worried that no matter how diligently you work to protect your corporate environment, an attacker will gain a foothold through an unwitting end-user with access to your systems.

The point is that we are all engaging in the internet together. A better informed internet citizenry is one that makes the job of attackers slightly harder, reducing the potential opportunities for attackers and raising the bar of entry into the cybercrime economy. It’s not a revolution or that ever-elusive silver bullet that will save us all, but increasing even the basic security level of all internet citizens creates a more secure ecosystem for everyone. As security professionals, we should be highly invested in seeing that become a reality, so send the guide or Cyber Aware web page to your less security-savvy friends, family, and/or users today.

Help them become more Cyber Aware, and help create a safer internet for us all.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

More HaXmas blogs

138 AWS services achieve CSA STAR Level 2 certification

Post Syndicated from Anastasia Strebkova original https://aws.amazon.com/blogs/security/138-aws-services-achieve-csa-star-level-2-certification/

We’re excited to announce that Amazon Web Services (AWS) has achieved Cloud Security Alliance (CSA) Security Trust Assurance and Risk (STAR) Level 2 certification with no findings.

CSA STAR Level 2 certification is a rigorous third-party independent assessment of the security of a cloud service provider. The certification demonstrates that a cloud service provider conforms to the applicable requirements of the ISO/IEC 27001:2013 management system standard and has addressed requirements critical to cloud security as outlined in the CSA Cloud Controls Matrix criteria. CSA STAR Level 2 certification verifies for cloud customers the use of best practices and the security posture of AWS Cloud offerings.

Ernst and Young Certify Point issued the certificate on November 6, 2020. The covered AWS Regions are included on the CSA STAR Level 2 certificate and the full list of AWS services in scope for CSA STAR Level 2 is available on our ISO and CSA STAR Certified webpage. You can view and download our CSA STAR Level 2 certificate online and in the console via AWS Artifact. The certificate is also available for download from the CSA STAR certification registry.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Anastasia Strebkova

Anastasia is a Security Assurance Manager at Amazon Web Services on the Global Audits team, managing the AWS ISO portfolio. She has previously worked on IT audits, governance, risk, privacy, business continuity, and information security program management for cloud enterprises. Anastasia holds a Bachelor of Arts degree in Civil Law from Moscow Law Academy.

More on the SolarWinds Breach

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/12/more-on-the-solarwinds-breach.html

The New York Times has more details.

About 18,000 private and government users downloaded a Russian tainted software update – a Trojan horse of sorts – that gave its hackers a foothold into victims’ systems, according to SolarWinds, the company whose software was compromised.

Among those who use SolarWinds software are the Centers for Disease Control and Prevention, the State Department, the Justice Department, parts of the Pentagon and a number of utility companies. While the presence of the software is not by itself evidence that each network was compromised and information was stolen, investigators spent Monday trying to understand the extent of the damage in what could be a significant loss of American data to a foreign attacker.

It’s unlikely that the SVR (a successor to the KGB) penetrated all of those networks. But it is likely that they penetrated many of the important ones. And that they have buried themselves into those networks, giving them persistent access even if this vulnerability is patched. This is a massive intelligence coup for the Russians and failure for the Americans, even if no classified networks were touched.

Meanwhile, CISA has directed everyone to remove SolarWinds from their networks. This is (1) too late to matter, and (2) likely to take many months to complete. Probably the right answer, though.

This is almost too stupid to believe:

In one previously unreported issue, multiple criminals have offered to sell access to SolarWinds’ computers through underground forums, according to two researchers who separately had access to those forums.

One of those offering claimed access over the Exploit forum in 2017 was known as “fxmsp” and is wanted by the FBI “for involvement in several high-profile incidents,” said Mark Arena, chief executive of cybercrime intelligence firm Intel471. Arena informed his company’s clients, which include U.S. law enforcement agencies.

Security researcher Vinoth Kumar told Reuters that, last year, he alerted the company that anyone could access SolarWinds’ update server by using the password “solarwinds123”.

“This could have been done by any attacker, easily,” Kumar said.

Neither the password nor the stolen access is considered the most likely source of the current intrusion, researchers said.

That last sentence is important, yes. But the sloppy security practice is likely not an isolated incident, and speaks to the overall lack of security culture at the company.

And I noticed that SolarWinds has removed its customer page, presumably as part of its damage control efforts. I quoted from it. Did anyone save a copy?

EDITED TO ADD: Both the Wayback Machine and Brian Krebs have saved the SolarWinds customer page.

Accelerating Amazon Redshift federated query to Amazon Aurora MySQL with AWS CloudFormation

Post Syndicated from BP Yau original https://aws.amazon.com/blogs/big-data/accelerating-amazon-redshift-federated-query-to-amazon-aurora-mysql-with-aws-cloudformation/

Amazon Redshift federated query allows you to combine data from one or more Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon Aurora MySQL databases with data already in Amazon Redshift. You can also combine such data with data in an Amazon Simple Storage Service (Amazon S3) data lake.

This post shows you how to set up Aurora MySQL and Amazon Redshift with a TPC-DS dataset so you can take advantage of Amazon Redshift federated query using AWS CloudFormation. You can use the environment you set up in this post to experiment with various use cases in the post Announcing Amazon Redshift federated querying to Amazon Aurora MySQL and Amazon RDS for MySQL.

Benefits of using CloudFormation templates

The standard workflow for setting up Amazon Redshift federated query involves six steps. For more information, see Querying data with federated queries in Amazon Redshift. With a CloudFormation template, you can condense these manual procedures into a few steps listed in a text file. The declarative code in the file captures the intended state of the resources that you want to create and allows you to automate the setup of AWS resources to support Amazon Redshift federated query. You can further enhance this template to become the single source of truth for your infrastructure.

A CloudFormation template acts as an accelerator. It helps you automate the deployment of technology and infrastructure in a safe and repeatable manner across multiple Regions and accounts with the least amount of effort and time.
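
For readers who prefer to script the launch rather than click through the console, a minimal boto3 sketch might look like the following; the stack name, template URL, and parameter values are placeholders you would replace with the Launch Stack template and the parameters described later in this post:

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="redshift-federated-query-demo",                            # placeholder
    TemplateURL="https://example-bucket.s3.amazonaws.com/template.yaml",  # placeholder
    Parameters=[
        {"ParameterKey": "ec2KeyPair", "ParameterValue": "my-key-pair"},  # placeholder
    ],
    Capabilities=["CAPABILITY_IAM"],  # the stack creates IAM resources
)

# Block until the stack (and the preloaded clusters) finish creating.
cfn.get_waiter("stack_create_complete").wait(StackName="redshift-federated-query-demo")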

Architecture overview

The following diagram illustrates the solution architecture.


The CloudFormation template provisions the following components in the architecture:

  • VPC
  • Subnets
  • Route tables
  • Internet gateway
  • Amazon Linux bastion host
  • Secrets
  • Aurora for MySQL cluster with TPC-DS dataset preloaded
  • Amazon Redshift cluster with TPC-DS dataset preloaded
  • Amazon Redshift IAM role with required permissions

Prerequisites

Before you create your resources in AWS CloudFormation, you must complete the following prerequisites:

Setting up resources with AWS CloudFormation

This post provides a CloudFormation template as a general guide. You can review and customize it to suit your needs. Some of the resources that this stack deploys incur costs when in use.

To create your resources, complete the following steps:

  1. Sign in to the console.
  2. Choose the us-east-1 Region in which to create the stack.
  3. Choose Launch Stack:
  4. Choose Next.

This automatically launches AWS CloudFormation in your AWS account with a template. It prompts you to sign in as needed. You can view the CloudFormation template from within the console.

  1. For Stack name, enter a stack name.
  2. For Session, leave as the default.
  3. For ec2KeyPair, choose the key pair you created earlier.
  4. Choose Next.


  1. On the next screen, choose Next.
  2. Review the details on the final screen and select I acknowledge that AWS CloudFormation might create IAM resources.
  3. Choose Create.

Stack creation can take up to 45 minutes.

  1. After the stack creation is complete, on the Outputs tab of the stack, record the value of the key for the following components, which you use in a later step:
  • AuroraClusterEndpoint
  • AuroraSecretArn
  • RedshiftClusterEndpoint
  • RedshiftClusterRoleArn

As of this writing, this feature is in public preview. You can create a snapshot of your Amazon Redshift cluster created by the stack and restore the snapshot as a new cluster in the sql_preview maintenance track with the same configuration.


You’re now ready to log in to both the Aurora MySQL and Amazon Redshift cluster and run some basic commands to test them.

Logging in to the clusters using the Amazon Linux bastion host

The following steps assume that you use a computer with an SSH client to connect to the bastion host. For more information about connecting using various clients, see Connect to your Linux instance.

  1. Move the private key of the EC2 key pair (that you saved previously) to a location on your SSH client, where you are connecting to the Amazon Linux bastion host.
  2. Change the permission of the private key using the following code, so that it’s not publicly viewable:
    chmod 400 <private key file name; for example, bastion-key.pem>

  1. On the Amazon EC2 console, choose Instances.
  2. Choose the Amazon Linux bastion host that the CloudFormation stack created.
  3. Choose Connect.
  4. Copy the value for SSHCommand.
  5. On the SSH client, change the directory to the location where you saved the EC2 private key, and enter the SSHCommand value.
  6. On the console, open the AWS Secrets Manager dashboard.
  7. Choose the secret secretAuroraMasterUser-*.
  8. Choose Retrieve secret value.
  9. Record the password under Secret key/value, which you use to log in to the Aurora MySQL cluster.
  10. Choose the secret SecretRedshiftMasterUser.
  11. Choose Retrieve secret value.
  12. Record the password under Secret key/value, which you use to log in to the Amazon Redshift cluster.
  13. Log in to both Aurora MySQL using the MySQL Command-Line Client and Amazon Redshift using query editor.

The CloudFormation template has already set up MySQL Command-Line Client binaries on the Amazon Linux bastion host.

  1. On the Amazon Redshift console, choose Editor.
  2. Choose Query editor.
  3. For Connection, choose Create new connection.
  4. For Cluster, choose the Amazon Redshift cluster.
  5. For Database name, enter your database.
  6. Enter the database user and password recorded earlier.
  7. Choose Connect to database.


  1. Enter the following SQL command:
    select "table" from svv_table_info where schema='public';

You should see 25 tables as the output.


  1. Launch a command prompt session of the bastion host and enter the following code (substitute <AuroraClusterEndpoint> with the value from the AWS CloudFormation output):
    mysql --host=<AuroraClusterEndpoint> --user=awsuser --password=<database user password recorded earlier>

  1. Enter the following SQL command:
    use tpc;
    show tables;
    

You should see the following eight tables as the output:

mysql> use tpc;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------------+
| Tables_in_tpc          |
+------------------------+
| customer               |
| customer_address       |
| household_demographics |
| income_band            |
| item                   |
| promotion              |
| web_page               |
| web_sales              |
+------------------------+
8 rows in set (0.01 sec)

Completing federated query setup

The final step is to create an external schema to connect to the Aurora MySQL instance. The following example code creates an external schema statement that you need to run on your Amazon Redshift cluster to complete this step:

CREATE EXTERNAL SCHEMA IF NOT EXISTS mysqlfq 
FROM MYSQL 
DATABASE 'tpc' 
URI '<AuroraClusterEndpoint>' 
PORT 3306 
IAM_ROLE '<IAMRole>' 
SECRET_ARN '<SecretARN>'

Use the following parameters:

  • URI – The AuroraClusterEndpoint value from the CloudFormation stack outputs. The value is in the format <stackname>-cluster.<randomcharacter>.us-east-1.rds.amazonaws.com.
  • IAM_Role – The RedshiftClusterRoleArn value from the CloudFormation stack outputs. The value is in the format arn:aws:iam::<accountnumber>:role/<stackname>-RedshiftClusterRole-<randomcharacter>.
  • Secret_ARN – The AuroraSecretArn value from the CloudFormation stack outputs. The value is in the format arn:aws:secretsmanager:us-east-1:<accountnumber>:secret:secretAuroraMasterUser-<randomcharacter>.
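
If you prefer to run this step programmatically instead of through the query editor, one option is the Amazon Redshift Data API via boto3. The sketch below is illustrative: the cluster identifier, database, and user are placeholders, and the angle-bracket values are the CloudFormation outputs discussed above.

import boto3

sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS mysqlfq
FROM MYSQL
DATABASE 'tpc'
URI '<AuroraClusterEndpoint>'
PORT 3306
IAM_ROLE '<RedshiftClusterRoleArn>'
SECRET_ARN '<AuroraSecretArn>'
"""

client = boto3.client("redshift-data", region_name="us-east-1")
response = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="dev",                           # placeholder
    DbUser="awsuser",                         # placeholder
    Sql=sql,
)
print(response["Id"])  # statement ID you can poll with describe_statement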

Federated query test

Now that you have set up federated query, you can start testing the feature using the TPC-DS dataset that was preloaded into both Aurora MySQL and Amazon Redshift.

For example, the following query aggregates the total net sales by product category and class from the web_sales fact table and date and item dimension tables. Tables web_sales and date are stored in Amazon Redshift, and the item table is stored in Aurora MySQL:

select
    sum(ws_net_paid) as total_sum,
    i_category,
    i_class,
    0 as g_category,
    0 as g_class
from
    web_sales, date_dim d1, mysqlfq.item
where
    d1.d_month_seq between 1205 and 1205 + 11
    and d1.d_date_sk = ws_sold_date_sk
    and i_item_sk = ws_item_sk
group by
    i_category, i_class;

You can continue to experiment with the dataset and explore the three main use cases in the post Announcing Amazon Redshift federated querying to Amazon Aurora MySQL and Amazon RDS for MySQL.

Cleaning up

When you’re finished, delete the CloudFormation stack, because some of the AWS resources in this walkthrough incur a cost if you continue to use them. Complete the following steps:

  1. On the AWS CloudFormation console, choose Stacks.
  2. Choose the stack you launched in this walkthrough. The stack must be currently running.
  3. In the stack details pane, choose Delete.
  4. Choose Delete stack.

Summary

This post showed you how to automate the creation of an Aurora MySQL and Amazon Redshift cluster preloaded with the TPC-DS dataset, the prerequisites for the new Amazon Redshift federated query feature using AWS CloudFormation, and a single manual step to complete the setup. It also provided an example federated query using the TPC-DS dataset, which you can use to accelerate your learning and adoption of the new feature. You can continue to modify the CloudFormation templates from this post to support your business needs.

If you have any questions or suggestions, please leave a comment.


About the Authors

BP Yau is an Analytics Specialist Solutions Architect at AWS. His role is to help customers architect big data solutions to process data at scale. Before AWS, he helped Amazon.com Supply Chain Optimization Technologies migrate its Oracle data warehouse to Amazon Redshift and build its next generation big data analytics platform using AWS technologies.

 

Srikanth Sopirala is a Sr. Specialist Solutions Architect, Analytics at AWS. He is passionate about helping customers build scalable data and analytics solutions in the cloud.

 

 

 

Zhouyi Yang is a Software Development Engineer for Amazon Redshift Query Processing team. He’s passionate about gaining new knowledge about large databases and has worked on SQL language features such as federated query and IAM role privilege control. In his spare time, he enjoys swimming, tennis, and reading.

 

 

Entong Shen is a Senior Software Development Engineer for Amazon Redshift. He has been working on MPP databases for over 8 years and has focused on query optimization, statistics, and SQL language features such as stored procedures and federated query. In his spare time, he enjoys listening to music of all genres and working in his succulent garden.

 

[$] Managing multifunction devices with the auxiliary bus

Post Syndicated from original https://lwn.net/Articles/840416/rss

Device drivers usually live within a single kernel subsystem. Sometimes,
however, developers need to handle functionalities outside of this model.
Consider, for example, a network interface card (NIC) exposing both Ethernet and
RDMA functionalities. There is one hardware block, but two drivers for the
two functions. Those drivers need to work within their respective
subsystems, but they must also share access to the same hardware. There is
no standard way in current kernels to connect those drivers together, so
developers invent ad-hoc methods to handle the interaction between
them. Recently, Dave Ertman posted
a patch set introducing a new type of bus, called the “auxiliary bus”, to
address this problem.

Commits are snapshots, not diffs

Post Syndicated from Derrick Stolee original https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/

Git has a reputation for being confusing. Users stumble over terminology and phrasing that misguides their expectations. This is most apparent in commands that “rewrite history” such as git cherry-pick or git rebase. In my experience, the root cause of this confusion is an interpretation of commits as diffs that can be shuffled around. However, commits are snapshots, not diffs!

I believe that Git becomes understandable if we peel back the curtain and look at how Git stores your repository data. After we investigate this model, we’ll explore how this new perspective helps us understand commands like git cherry-pick and git rebase.

If you want to go really deep, you should read the Git Internals chapter of the Pro Git book.

I’ll be using the git/git repository checked out at v2.29.2 as an example. Follow along with my command-line examples for extra practice.

Object IDs are hashes

The most important part to know about Git objects is that Git references each by its object ID (OID for short), providing a unique name for the object. We will use the git rev-parse <ref> command to discover these OIDs. Each object is essentially a plain-text file and we can examine its contents using the git cat-file -p <oid> command.

You might also be used to seeing OIDs given as a shorter hex string. This string is given as something long enough that only one object in the repository has an OID that matches that abbreviation. If we request the type of an object using an abbreviated OID that is too short, then we will see the list of OIDs that match:

$ git cat-file -t e0c03
error: short SHA1 e0c03 is ambiguous
hint: The candidates are:
hint: e0c03f27484 commit 2016-10-26 - contrib/buildsystems: ignore irrelevant files in Generators/
hint: e0c03653e72 tree
hint: e0c03c3eecc blob
fatal: Not a valid object name e0c03

What are these types: blob, tree, and commit? Let’s start at the bottom and work our way up.

Blobs are file contents

At the bottom of the object model, blobs contain file contents. To discover the OID for a file at your current revision, run git rev-parse HEAD:<path>. Then, use git cat-file -p <oid> to find its contents.

$ git rev-parse HEAD:README.md
eb8115e6b04814f0c37146bbe3dbc35f3e8992e0

$ git cat-file -p eb8115e6b04814f0c37146bbe3dbc35f3e8992e0 | head -n 8
[![Build status](https://github.com/git/git/workflows/CI/PR/badge.png)](https://github.com/git/git/actions?query=branch%3Amaster+event%3Apush)

Git - fast, scalable, distributed revision control system
=========================================================

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

If I edit the README.md file on my disk, then git status notices that the file has a recent modified time and hashes the contents. If the contents don’t match the current OID at HEAD:README.md, then git status reports the file as “modified on disk.” In this way, we can see if the file contents in the current working directory match the expected contents at HEAD.
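
A blob’s OID is just the SHA-1 of a short header plus the file contents, so you can reproduce it outside of Git; here is a quick sketch (for SHA-1 repositories):

import hashlib

def git_blob_oid(data: bytes) -> str:
    """Compute the OID Git assigns to this content as a blob:
    sha1(b"blob <size>\\0" + contents)."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

with open("README.md", "rb") as f:
    print(git_blob_oid(f.read()))
# Should match `git rev-parse HEAD:README.md` when the file is unmodified.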

Trees are directory listings

Note that blobs contain file contents, but not the file names! The names come from Git’s representation of directories: trees. A tree is an ordered list of path entries, paired with object types, file modes, and the OID for the object at that path. Subdirectories are also represented as trees, so trees can point to other trees!

We will use diagrams to visualize how these objects are related. We use boxes for blobs and triangles for trees.

$ git rev-parse HEAD^{tree}
75130889f941eceb57c6ceb95c6f28dfc83b609c

$ git cat-file -p 75130889f941eceb57c6ceb95c6f28dfc83b609c  | head -n 15
100644 blob c2f5fe385af1bbc161f6c010bdcf0048ab6671ed    .cirrus.yml
100644 blob c592dda681fecfaa6bf64fb3f539eafaf4123ed8    .clang-format
100644 blob f9d819623d832113014dd5d5366e8ee44ac9666a    .editorconfig
100644 blob b08a1416d86012134f823fe51443f498f4911909    .gitattributes
040000 tree fbe854556a4ae3d5897e7b92a3eb8636bb08f031    .github
100644 blob 6232d339247fae5fdaeffed77ae0bbe4176ab2de    .gitignore
100644 blob cbeebdab7a5e2c6afec338c3534930f569c90f63    .gitmodules
100644 blob bde7aba756ea74c3af562874ab5c81a829e43c83    .mailmap
100644 blob 05f3e3f8d79117c1d32bf5e433d0fd49de93125c    .travis.yml
100644 blob 5ba86d68459e61f87dae1332c7f2402860b4280c    .tsan-suppressions
100644 blob fc4645d5c08bd005238fc72cfa709495d8722e6a    CODE_OF_CONDUCT.md
100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    COPYING
040000 tree a58410edddbdd133cca6b3322bebe4fb37be93fa    Documentation
100755 blob ca6ccb49866c595c80718d167e40cfad1ee7f376    GIT-VERSION-GEN
100644 blob 9ba33e6a141a3906eb707dd11d1af4b0f8191a55    INSTALL

Trees provide names for each sub-item. Trees also include information such as Unix file permissions, object type (blob or tree), and OIDs for each entry. We cut the output to the top 15 entries, but we can use grep to discover that this tree has a README.md entry that points to our earlier blob OID.

$ git cat-file -p 75130889f941eceb57c6ceb95c6f28dfc83b609c | grep README.md
100644 blob eb8115e6b04814f0c37146bbe3dbc35f3e8992e0    README.md

Trees can point to blobs and other trees using these path entries. Keep in mind that those relationships are paired with path names, but we will not always show those names in our diagrams.

The tree itself doesn’t know where it exists within the repository; that is the role of the objects pointing to the tree. The tree referenced by <ref>^{tree} is a special tree: the root tree. This designation is based on a special link from your commits.

Commits are snapshots

A commit is a snapshot in time. Each commit contains a pointer to its root tree, representing the state of the working directory at that time. The commit has a list of parent commits corresponding to the previous snapshots. A commit with no parents is a root commit, and a commit with multiple parents is a merge commit. Commits also contain metadata describing the snapshot, such as author and committer (including name, email address, and date) and a commit message. The commit message is an opportunity for the commit author to describe the purpose of that commit with respect to the parents.

For example, the commit at v2.29.2 in the Git repository describes that release, and is authored and committed by the Git maintainer.

$ git rev-parse HEAD
898f80736c75878acc02dc55672317fcc0e0a5a6

/c/_git/git ((v2.29.2))
$ git cat-file -p 898f80736c75878acc02dc55672317fcc0e0a5a6
tree 75130889f941eceb57c6ceb95c6f28dfc83b609c
parent a94bce62b99be35f2ee2b4c98f97c222e7dd9d82
author Junio C Hamano <[email protected]> 1604006649 -0700
committer Junio C Hamano <[email protected]> 1604006649 -0700

Git 2.29.2

Signed-off-by: Junio C Hamano <[email protected]>

Looking a little farther in the history with git log, we can see a more descriptive commit message talking about the change between that commit and its parent.

$ git cat-file -p 16b0bb99eac5ebd02a5dcabdff2cfc390e9d92ef
tree d0e42501b1cf65395e91e22e74f75fc5caa0286e
parent 56706dba33f5d4457395c651cf1cd033c6c03c7a
author Jeff King <[email protected]> 1603436979 -0400
committer Junio C Hamano <[email protected]> 1603466719 -0700

am: fix broken email with --committer-date-is-author-date

Commit e8cbe2118a (am: stop exporting GIT_COMMITTER_DATE, 2020-08-17)
rewrote the code for setting the committer date to use fmt_ident(),
rather than setting an environment variable and letting commit_tree()
handle it. But it introduced two bugs:

- we use the author email string instead of the committer email

- when parsing the committer ident, we used the wrong variable to
compute the length of the email, resulting in it always being a
zero-length string

This commit fixes both, which causes our test of this option via the
rebase "apply" backend to now succeed.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

In our diagrams, we will use circles to represent commits. Notice the alliteration? Let’s review:

  • Boxes are blobs. These represent file contents.
  • Triangles are trees. These represent directories.
  • Circles are commits. These are snapshots in time.

Branches are pointers

In Git, we move around the history and make changes without referring to OIDs most of the time. This is because branches provide pointers to the commits we care about. A branch with name main is actually a reference in Git called refs/heads/main. These files literally contain hex strings referencing the OID of a commit. As you work, these references change their contents to point to other commits.

This means branches are significantly different from our previous Git objects. Commits, trees, and blobs are immutable, meaning you can’t change their contents. If you change the contents, then you get a different hash and thus a new OID referring to the new object! Branches are named by users to provide meaning, such as trunk or my-special-project. We use branches to track and share work.

The special reference HEAD points to the current branch. When we add a commit to HEAD, it automatically updates that branch to the new commit.

We can create a new branch and update our HEAD using git switch -c:

$ git switch -c my-branch
Switched to a new branch 'my-branch'
$ cat .git/refs/heads/my-branch
1ec19b7757a1acb11332f06e8e812b505490afc6
$ cat .git/HEAD
ref: refs/heads/my-branch

Notice how creating my-branch created a file (.git/refs/heads/my-branch) containing the current commit OID and the .git/HEAD file was updated to point at this branch. Now, if we update HEAD by creating new commits, the branch my-branch will update to point to that new commit!

The big picture

Let’s put all of these new terms into one giant picture. Branches point to commits, commits point to other commits and their root trees, trees point to blobs and other trees, and blobs don’t point to anything. Here is a diagram containing all of our objects all at once:

In this diagram, time moves from left to right. The arrows between a commit and its parents go from right to left. Each commit has a single root tree. HEAD points to the main branch here, and main points to the most-recent commit. The root tree at this commit is fully expanded underneath, while the rest of the trees have arrows pointing towards these objects. The reason for that is that the same objects are reachable from multiple root trees! Since these trees reference those objects by their OID (their content) these snapshots do not need multiple copies of the same data. In this way, Git’s object model forms a Merkle tree.

When we view the object model in this way, we can see why commits are snapshots: they link directly to a full view of the expected working directory for that commit!

Computing diffs

Even though commits are snapshots, we frequently look at a commit in a history view or on GitHub as a diff. In fact, the commit message frequently refers to this diff. The diff is dynamically generated from the snapshot data by comparing the root trees of the commit and its parent. Git can compare any two snapshots in time, not just adjacent commits.

To compare two commits, start by looking at their root trees, which are almost always different. Then, perform a depth-first search on the subtrees, following pairs of entries whose paths in the current tree have different OIDs. In the example below, the root trees have different values for the docs tree, so we recurse into those two trees. Those trees have different values for M.md, so those two blobs are compared line-by-line and that diff is shown. Still within docs, N.md is the same, so that is skipped and we pop back to the root tree. The root tree then sees that the things directories have equal OIDs, as do the README.md entries.

In the diagram above, we notice that the things tree is never visited, and so none of its reachable objects are visited. This way, the cost of computing a diff is relative to the number of paths with different content.
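
A toy model of that walk, using nested dictionaries instead of real tree objects, shows why unchanged subtrees are never visited (this is an illustration of the recursion, not Git’s actual diff machinery):

def diff_trees(tree_a, tree_b, prefix=""):
    """Each entry is ("blob", oid) or ("tree", oid, children)."""
    changed = []
    for name in sorted(set(tree_a) | set(tree_b)):
        a, b = tree_a.get(name), tree_b.get(name)
        path = prefix + name
        if a == b:                      # same OID: skip the whole subtree
            continue
        if a and b and a[0] == b[0] == "tree":
            changed += diff_trees(a[2], b[2], path + "/")   # recurse
        else:
            changed.append(path)        # blob changed, added, or removed
    return changed

old = {"docs": ("tree", "t1", {"M.md": ("blob", "b1"), "N.md": ("blob", "b2")}),
       "README.md": ("blob", "b3")}
new = {"docs": ("tree", "t2", {"M.md": ("blob", "b9"), "N.md": ("blob", "b2")}),
       "README.md": ("blob", "b3")}
print(diff_trees(old, new))  # ['docs/M.md']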

Now we have the understanding that commits are snapshots and we can dynamically compute a diff between any two commits. Then why isn’t this common knowledge? Why do new users stumble over this idea that a commit is a diff?

One of my favorite analogies is to think of commits as having a wave/particle duality, where sometimes they are treated like snapshots and other times they are treated like diffs. The crux of the matter really comes down to a kind of data that’s not actually a Git object: patches.

Wait, what’s a patch?

A patch is a text document that describes how to alter an existing codebase. Patches are how extremely-distributed groups can share code without using Git commits directly. You can see these being shuffled around on the Git mailing list.

A patch contains a description of the change and why it is valuable, followed by a diff. The idea is that someone could use that reasoning as a justification to apply that diff to their copy of the code.

Git can convert a commit into a patch using git format-patch. A patch can then be applied to a Git repository using git apply. This was the dominant way to share code in the early days of open source, but most projects have moved to sharing Git commits directly through pull requests.

The biggest issue with sharing patches is that the patch loses the parent information, so the new commit gets a parent equal to your existing HEAD. Moreover, you get a different commit even if you use the same parent as before, because the commit time differs and the committer changes as well! This is the fundamental reason why Git has both “author” and “committer” details in the commit object.

The biggest problem with using patches is that it is hard to apply a patch when your working directory does not match the sender’s previous commit. Losing the commit history makes it difficult to resolve conflicts.

This idea of “moving patches around” has transferred into several Git commands as “moving commits around.” Instead, what actually happens is that commit diffs are replayed, creating new commits.

If commits aren’t diffs, then what does git cherry-pick do?

The git cherry-pick <oid> command creates a new commit with an identical diff to <oid> whose parent is the current commit. Git is essentially following these steps:

  1. Compute the diff between the commit <oid> and its parent.
  2. Apply that diff to the current HEAD.
  3. Create a new commit whose root tree matches the new working directory and whose parent is the commit at HEAD.
  4. Move the ref at HEAD to that new commit.

After Git creates the new commit, the output of git log -1 -p HEAD should match the output of git log -1 -p <oid>.

It is important to recognize that we didn’t “move” the commit to be on top of our current HEAD, we created a new commit whose diff matches the old commit.

If commits aren’t diffs, then what does git rebase do?

The git rebase command presents itself as a way to move commits to have a new history. In its most basic form it is really just a series of git cherry-pick commands, replaying diffs on top of a different commit.

The most important thing is that git rebase <target> will discover the list of commits that are reachable from HEAD but not reachable from <target>. You can show these yourself using git log --oneline <target>..HEAD.

Then, the rebase command simply navigates to the <target> location and starts performing git cherry-pick commands on this commit range, starting from the oldest commits. At the end, we have a new set of commits with different OIDs but similar diffs to the original commit range.

For example, consider a sequence of three commits in the current HEAD since branching off of a target branch. When running git rebase target, the common base P is computed to determine the commit list A, B, and C. These are then cherry-picked on top of target in order to construct new commits A', B', and C'.

The commits A', B', and C' are brand new commits that share a lot of information with A, B, and C, but are distinct new objects. In fact, the old commits still exist in your repository until garbage collection runs.

We can even inspect how these two commit ranges are different using the git range-diff command! I’ll use some example commits in the Git repository to rebase onto the v2.29.2 tag, then modify the tip commit slightly.

$ git checkout -f 8e86cf65816
$ git rebase v2.29.2
$ echo extra line >>README.md
$ git commit -a --amend -m "replaced commit message"
$ git range-diff v2.29.2 8e86cf65816 HEAD
1:  17e7dbbcbc = 1:  2aa8919906 sideband: avoid reporting incomplete sideband messages
2:  8e86cf6581 ! 2:  e08fff1d8b sideband: report unhandled incomplete sideband messages as bugs
    @@ Metadata
     Author: Johannes Schindelin <[email protected]>
     
      ## Commit message ##
    -    sideband: report unhandled incomplete sideband messages as bugs
    +    replaced commit message
     
    -    It was pretty tricky to verify that incomplete sideband messages are
    -    handled correctly by the `recv_sideband()`/`demultiplex_sideband()`
    -    code: they have to be flushed out at the end of the loop in
    -    `recv_sideband()`, but the actual flushing is done by the
    -    `demultiplex_sideband()` function (which therefore has to know somehow
    -    that the loop will be done after it returns).
    -
    -    To catch future bugs where incomplete sideband messages might not be
    -    shown by mistake, let's catch that condition and report a bug.
    -
    -    Signed-off-by: Johannes Schindelin <[email protected]>
    -    Signed-off-by: Junio C Hamano <[email protected]>
    + ## README.md ##
    +@@ README.md: and the name as (depending on your mood):
    + [Documentation/giteveryday.txt]: Documentation/giteveryday.txt
    + [Documentation/gitcvs-migration.txt]: Documentation/gitcvs-migration.txt
    + [Documentation/SubmittingPatches]: Documentation/SubmittingPatches
    ++extra line
     
      ## pkt-line.c ##
     @@ pkt-line.c: int recv_sideband(const char *me, int in_stream, int out)

Notice that the resulting range-diff claims that commits 17e7dbbcbc and 2aa8919906 are “equal”, which means they would generate the same patch. The second pair of commits are different, showing that the commit message changed and there is an edit to the README.md that was not in the original commit.

If you are following along, you can also see how the commit history still exists for these two commit sets. The new commits have the v2.29.2 tag as the third commit in the history while the old commits have the (earlier) v2.28.0 tag as the third commit.

$ git log --oneline -3 HEAD
e08fff1d8b2 (HEAD) replaced commit message
2aa89199065 sideband: avoid reporting incomplete sideband messages
898f80736c7 (tag: v2.29.2) Git 2.29.2

$ git log --oneline -3 8e86cf65816
8e86cf65816 sideband: report unhandled incomplete sideband messages as bugs
17e7dbbcbce sideband: avoid reporting incomplete sideband messages
47ae905ffb9 (tag: v2.28.0) Git 2.28

Since commits aren’t diffs, how does Git track renames?

If you were looking carefully at the object model, you might have noticed that Git never tracks changes between commits in the stored object data. You might have wondered “how does Git know a rename happened?”

Git doesn’t track renames. There is no data structure inside Git that stores a record that a rename happened between a commit and its parent. Instead, Git tries to detect renames during the dynamic diff calculation. There are two stages to this rename detection: exact renames and edit-renames.

After first computing a diff, Git inspects the internal model of that diff to discover which paths were added or deleted. Naturally, a file that was moved from one location to another would appear as a deletion from the first location and an add in the second. Git attempts to match these adds and deletes to create a set of inferred renames.

The first stage of this matching algorithm looks at the OIDs of the paths that were added and deleted and checks whether any are exact matches. Such exact matches are paired together.

The second stage is the expensive part: how can we detect files that were renamed and edited? Git iterates through each added file and compares that file against each deleted file to compute a similarity score as a percentage of lines in common. By default, anything larger than 50% of lines in common counts as a potential edit-rename. The algorithm continues comparing these pairs until finding the maximum match.

Did you notice a problem? This algorithm runs A * D diffs, where A is the number of adds and D is the number of deletes. This is quadratic! To avoid extra-long rename computations, Git will skip this portion of detecting edit-renames if A + D is larger than an internal limit. You can modify this limit using the diff.renameLimit config option. You can also avoid the algorithm altogether by disabling the diff.renames config option.
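
A simplified sketch of that second stage, scoring each (deleted, added) pair by a set-based line similarity (real Git scores by content chunks and handles ties differently; this just shows why the work grows with A * D):

def similarity(old_lines, new_lines):
    """Fraction of distinct lines the two files share."""
    old_set, new_set = set(old_lines), set(new_lines)
    return len(old_set & new_set) / max(len(old_set | new_set), 1)

def detect_renames(deleted, added, threshold=0.5):
    """deleted/added map path -> list of lines; return (old, new, score) pairs."""
    renames = []
    for old_path, old_lines in deleted.items():                  # D iterations
        best = max(((similarity(old_lines, new_lines), new_path)
                    for new_path, new_lines in added.items()),   # A iterations
                   default=(0.0, None))
        if best[1] is not None and best[0] >= threshold:
            renames.append((old_path, best[1], best[0]))
    return renames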

I’ve used my awareness of the Git rename detection in my own projects. For example, I forked VFS for Git to create the Scalar project and wanted to re-use a lot of the code but also change the file structure significantly. I wanted to be able to follow the history of these files into the versions in the VFS for Git codebase, so I constructed my refactor in two steps:

  1. Rename all of the files without changing the blobs.
  2. Replace strings to modify the blobs without changing filenames.

These two steps ensured that I can quickly use git log --follow -- <path> to see the history of a file across this rename.

$ git log --oneline --follow -- Scalar/CommandLine/ScalarVerb.cs
4183579d console: remove progress spinners from all commands
5910f26c ScalarVerb: extract Git version check
...
9f402b5a Re-insert some important instances of GVFS
90e8c1bd [REPLACE] Replace old name in all files
fb3a2a36 [RENAME] Rename all files
cedeeaa3 Remove dead GVFSLock and GitStatusCache code
a67ca851 Remove more dead hooks code
...

I abbreviated the output, but these last two commits don’t actually have a path corresponding to Scalar/CommandLine/ScalarVerb.cs; instead, Git is tracking the previous path GVFS/GVFS/CommandLine/GVFSVerb.cs because it recognized the exact-content rename from the commit fb3a2a36 [RENAME] Rename all files.

Won’t be fooled again!

You now know that commits are snapshots, not diffs! This understanding will help you navigate your experience working with Git.

Now you are armed with deep knowledge of the Git object model. You can use this knowledge to expand your skills in using Git commands or deciding on workflows for your team. In a future blog post, we will use this knowledge to learn about different Git clone options and how to reduce the data you need to get things done!
