Data Engineering

Forum Posts

essura
by New Contributor II
  • 110 Views
  • 2 replies
  • 1 kudos

Create a docker image for dbt task

Hi there, We are trying to set up a Docker image for our dbt execution, primarily to improve execution speed, but also to simplify deployment (we are using private repos for both the dbt project and some of the dbt packages). It seems to work curre...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @essura, setting up a Docker image for your dbt execution is a great approach. Let’s dive into the details. Prebuilt Docker images: dbt Core and all adapter plugins maintained by dbt Labs are available as Docker images. These images are distr...

1 More Replies
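For readers who want to go further: besides using the prebuilt images as a base, dbt can also be invoked programmatically inside the container. A minimal sketch, assuming dbt-core 1.5+ and hypothetical in-image paths:

```python
# Programmatic dbt invocation (requires dbt-core >= 1.5); the /dbt/* paths are
# hypothetical locations where the project is baked into the image.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()
res: dbtRunnerResult = dbt.invoke([
    "run",
    "--project-dir", "/dbt/project",
    "--profiles-dir", "/dbt/profiles",
])
if not res.success:
    raise SystemExit(1)  # make the container exit non-zero on failure
```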
Innov
by New Contributor
  • 58 Views
  • 1 reply
  • 0 kudos

Parse nested JSON for building footprints

Looking for some help. Has anyone worked with nested JSON files in a Databricks notebook? I am trying to parse a nested JSON file to get coordinates and use them to create a polygon for a building footprint. Do I need to read it as txt? How can I use the Databricks...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Innov, working with nested JSON files in Databricks notebooks is a common task, and I can guide you through the process. Let’s break it down step by step. Reading the nested JSON file: you don’t need to read the JSON file as plain text (.txt...

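A minimal sketch of that approach, assuming a GeoJSON-style layout (the `features` array and `geometry.coordinates` field are assumptions about the file, and the volume path is a placeholder):

```python
from pyspark.sql.functions import col, explode

# Read the nested JSON directly; multiline handles pretty-printed files.
df = spark.read.option("multiline", "true").json("/Volumes/main/raw/footprints.json")

# One row per feature, then pull out the polygon coordinates for the footprint.
coords = (df
          .select(explode(col("features")).alias("feature"))
          .select(col("feature.geometry.coordinates").alias("coordinates")))

coords.show(truncate=False)
```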
zero234
by New Contributor II
  • 93 Views
  • 1 reply
  • 1 kudos

Data is not loaded when creating two different streaming tables from one Delta Live Tables pipeline

I am trying to create 2 streaming tables in one DLT pipeline; both read JSON data from different locations and have different schemas. The pipeline executes, but no data is inserted into either table, whereas when I try to run each table indiv...

Data Engineering
dlt
spark
STREAMINGTABLE
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @zero234, it seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where you’re trying to create two streaming tables from different sources with distinct schemas. Let’s dive into this! DLT is a powerful feature in Data...

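For context, a minimal two-table DLT sketch; the volume paths and schemas are placeholders, and each table gets its own Auto Loader stream so the two sources stay independent:

```python
import dlt
from pyspark.sql.types import StructType, StructField, StringType

schema_a = StructType([StructField("id", StringType()), StructField("payload", StringType())])
schema_b = StructType([StructField("key", StringType()), StructField("value", StringType())])

@dlt.table(name="stream_table_a")
def stream_table_a():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .schema(schema_a)
            .load("/Volumes/main/raw/source_a"))

@dlt.table(name="stream_table_b")
def stream_table_b():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .schema(schema_b)
            .load("/Volumes/main/raw/source_b"))
```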
RabahO
by New Contributor II
  • 217 Views
  • 1 reply
  • 0 kudos

Unit tests in notebook not working

Hello, I'm trying to set up a notebook for tests or data quality checks. The name is not important. I basically read a table (the ETL output process, i.e. the actual data). Then I read another table and do the calculation in the notebook (the expected data). I'm stuc...

Latest Reply
feiyun0112
New Contributor III
  • 0 kudos

You can use Nutter, a testing framework for Databricks notebooks: microsoft/nutter (github.com).

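A minimal Nutter fixture sketch, assuming it runs in a Databricks notebook and that the two table names are hypothetical; each `run_*` method does the work and the matching `assertion_*` method holds the checks:

```python
from runtime.nutterfixture import NutterFixture

class DataQualityTest(NutterFixture):
    def run_row_count(self):
        # Hypothetical tables: ETL output vs. expected values computed in the notebook.
        self.actual = spark.table("main.etl.output")
        self.expected = spark.table("main.etl.expected")

    def assertion_row_count(self):
        assert self.actual.count() == self.expected.count()

result = DataQualityTest().execute_tests()
print(result.to_string())
```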
vijaykumar99535
by New Contributor III
  • 224 Views
  • 1 reply
  • 0 kudos

How to create a job cluster using the REST API

I am creating a cluster using a REST API call, but every time it creates an all-purpose cluster. Is there a way to create a job cluster and run a notebook using Python code?

Latest Reply
feiyun0112
New Contributor III
  • 0 kudos

From the Jobs API reference (Create a new job | Jobs API | REST API reference | Databricks on AWS): job_cluster_key is a string of 1 to 100 characters matching ^[\w\-\_]+$. If job_cluster_key is set, the task is executed reusing the cluster specified in job.settings.job_clusters.

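To make the distinction concrete: the Clusters API creates all-purpose clusters, while a job cluster is declared inline in the job spec under job_clusters. A sketch against the Jobs 2.1 API; the host, token, node type, and notebook path are placeholders:

```python
import requests

HOST = "https://<workspace-url>"       # placeholder workspace URL
TOKEN = "<personal-access-token>"      # placeholder token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

job_spec = {
    "name": "notebook-on-job-cluster",
    "job_clusters": [{
        "job_cluster_key": "main_cluster",
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",   # AWS example node type
            "num_workers": 2,
        },
    }],
    "tasks": [{
        "task_key": "run_notebook",
        "job_cluster_key": "main_cluster",  # reuse the job cluster defined above
        "notebook_task": {"notebook_path": "/Users/me/my_notebook"},
    }],
}

resp = requests.post(f"{HOST}/api/2.1/jobs/create", headers=HEADERS, json=job_spec)
resp.raise_for_status()
job_id = resp.json()["job_id"]

# Trigger a run; the job cluster exists only for the duration of the run.
requests.post(f"{HOST}/api/2.1/jobs/run-now", headers=HEADERS,
              json={"job_id": job_id}).raise_for_status()
```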
William_Scardua
by Contributor III
  • 80 Views
  • 1 reply
  • 1 kudos

groupBy without aggregation (PySpark API)

Hi guys, do you have any idea how I can do a groupBy without aggregation (PySpark API)? Like: df.groupBy('field1', 'field2', 'field3'). My goal is to form groups, but in this case it is not necessary to count records or aggregate. Thank you

Latest Reply
feiyun0112
New Contributor III
  • 1 kudos

df.select("field1","field2","field3").distinct()do you mean get distinct rows for selected column?

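In PySpark, "grouping without aggregating" usually reduces to de-duplication on the key columns; two equivalent sketches:

```python
# Keep one row per distinct key combination.
keys = df.select("field1", "field2", "field3").distinct()

# Or keep all columns while de-duplicating on the key columns.
deduped = df.dropDuplicates(["field1", "field2", "field3"])
```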
zero234
by New Contributor II
  • 218 Views
  • 1 reply
  • 0 kudos

I am trying to read nested data from JSON files into a streaming table using DLT

So I have nested data with more than 200 columns, and I have extracted this data into JSON files. When I use the below code to read the JSON files, if there are a few columns in the data that have no value at all, it doesn't include those columns in the schema...

Latest Reply
zero234
New Contributor II
  • 0 kudos

Replying to my question above: we cannot use inferSchema on a streaming table; we need to specify the schema explicitly. Can anyone please suggest a way to write data in nested form to a streaming table, if this is possible?

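A minimal sketch of specifying the schema explicitly, as suggested above; the field names and path are hypothetical, and declaring columns that may be absent from the files keeps them in the table's schema:

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType

# Declare the full nested structure up front, including rarely populated fields.
schema = StructType([
    StructField("id", StringType()),
    StructField("properties", StructType([
        StructField("name", StringType()),
        StructField("height", DoubleType()),
    ])),
    StructField("coordinates", ArrayType(ArrayType(DoubleType()))),
])

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(schema)
      .load("/Volumes/main/raw/json_landing"))
```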
asad77007
by New Contributor II
  • 723 Views
  • 3 replies
  • 1 kudos

How to connect an Analysis Services cube with a Databricks notebook

I am trying to connect an AS cube with a Databricks notebook, but unfortunately I haven't found any solution yet. Is there any possible way to connect an AS cube with a Databricks notebook? If yes, can someone please guide me?

Latest Reply
omfspartan
New Contributor III
  • 1 kudos

I am able to connect to Azure Analysis Services using the Azure Analysis Services REST API. Is yours on-prem?

2 More Replies
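For anyone following up: the Azure Analysis Services asynchronous refresh endpoint can be called from a notebook with plain HTTP. A sketch where the region, server, model, and AAD token are all placeholders, and a token scoped to the Analysis Services resource is assumed:

```python
import requests

region = "westus"        # placeholder
server = "myasserver"    # placeholder
model = "MyCubeModel"    # placeholder
aad_token = "<azure-ad-access-token>"  # token for https://*.asazure.windows.net

url = f"https://{region}.asazure.windows.net/servers/{server}/models/{model}/refreshes"
resp = requests.post(url,
                     headers={"Authorization": f"Bearer {aad_token}"},
                     json={"Type": "Full", "CommitMode": "transactional"})
resp.raise_for_status()

# The service replies 202 Accepted; poll the Location header for refresh status.
print(resp.headers.get("Location"))
```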
Baldrez
by New Contributor II
  • 1920 Views
  • 4 replies
  • 5 kudos

Resolved! REST API for Stream Monitoring

Hi, everyone. I just recently started using Databricks on Azure, so my question is probably very basic, but I am really stuck right now. I need to capture some streaming metrics (number of input rows and their time), so I tried using the Spark REST API ...

Latest Reply
jose_gonzalez
Moderator
  • 5 kudos

Hi @Roberto Baldrez, if you think that @Gaurav Rupnar solved your question, then please select it as the best response so it can be moved to the top of the topic and help more users in the future. Thank you.

3 More Replies
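As an alternative to the Spark REST API, Structured Streaming exposes the same per-batch metrics on the query handle itself; a minimal sketch using a `rate` source and `noop` sink as stand-ins for the real stream:

```python
import time

# Start (or grab a handle to) a streaming query.
query = (spark.readStream.format("rate").load()
         .writeStream.format("noop").start())

time.sleep(5)  # demo only: give a micro-batch or two time to complete

progress = query.lastProgress  # dict for the most recent batch, or None
if progress:
    print(progress["timestamp"], progress["numInputRows"])

# recentProgress keeps a short history of the same per-batch dictionaries.
for p in query.recentProgress:
    print(p["timestamp"], p["numInputRows"])

query.stop()
```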
Miro_ta
by New Contributor III
  • 5146 Views
  • 8 replies
  • 3 kudos

Resolved! Can't query delta tables, token missing required scope

Hello, I've correctly set up a stream from Kinesis, but I can't read anything from my Delta table. I'm actually reproducing the demo from Frank Munz: https://github.com/fmunz/delta-live-tables-notebooks/tree/main/motion-demo and I'm running the following...

Latest Reply
Brosenberg
New Contributor II
  • 3 kudos

At this time, Delta Live Tables in Unity Catalog can only be read using a Shared compute or SQL Warehouse (support to read from Assigned compute is on the roadmap). To read the table using Assigned compute (e.g. Personal Compute), you will first need...

7 More Replies
zero234
by New Contributor II
  • 146 Views
  • 2 replies
  • 2 kudos

I have created a DLT pipeline which reads data from JSON files stored in a Databricks volume

I have created a DLT pipeline which reads data from JSON files stored in a Databricks volume and puts the data into a streaming table. This was working fine. When I tried to read the data that is inserted into the table and compare the values with t...

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

Keep your DLT code separate from your comparison code, and run your comparison code once your DLT data has been ingested.

1 More Replies
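A minimal sketch of that split, with hypothetical table names; the comparison lives in its own notebook or task that runs after the pipeline update finishes:

```python
# Read the DLT output only after the pipeline run has completed.
actual = spark.read.table("main.etl.dlt_streaming_table")
expected = spark.read.table("main.etl.reference_values")

missing = expected.exceptAll(actual)     # expected rows that never arrived
unexpected = actual.exceptAll(expected)  # ingested rows that shouldn't be there

print("missing:", missing.count(), "unexpected:", unexpected.count())
```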
OliverCadman
by New Contributor III
  • 4888 Views
  • 9 replies
  • 4 kudos

'File not found' error when executing %run magic command

I'm just walking through a simple exercise presented in the Databricks Platform Lab notebook, in which I'm executing a remote notebook from within it using the %run command. The remote notebook resides in the same directory as the Platform Lab notebook,...

Data Engineering
%file_not_found
%magic_commands
%run
Latest Reply
MuthuLakshmi
New Contributor III
  • 4 kudos

The %run command is a specific Jupyter magic command. The ipykernel used in Databricks examines the initial line of code to determine the appropriate compiler or language for execution. To minimize the likelihood of encountering errors, it is advisab...

8 More Replies
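A minimal illustration, assuming a sibling notebook named `Setup` (hypothetical): the magic must be the only code in its cell, and the `./` path is resolved relative to the calling notebook:

```
%run ./Setup
```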
LukeD
by New Contributor
  • 82 Views
  • 2 replies
  • 1 kudos

Billing support contact

Hi, what is the best way to contact Databricks support? I see differences between the AWS billing and the Databricks report, and I'm looking for an explanation. I've sent 3 messages last week via this form: https://www.databricks.com/company/contact but...

Latest Reply
LukeD
New Contributor
  • 1 kudos

Hi @MinThuraZaw, somehow I cannot log in on that page, but I found an email address and submitted my case through that channel. Thank you for the advice.

1 More Replies
Avinash_Narala
by New Contributor
  • 67 Views
  • 1 reply
  • 1 kudos

Unity Catalog Migration

Hello, we are in the process of migrating to Unity Catalog. Can I know how to automate the process of refactoring our notebooks for Unity Catalog?

Data Engineering
automation
migration
unitycatalog
Latest Reply
MinThuraZaw
New Contributor
  • 1 kudos

Hi @Avinash_Narala, there is no one-click solution to refactor all table names in notebooks to UC's three-level namespaces. At a minimum, manually updating table names is required during the migration process. One option is to use the search feature. Search ...

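Beyond hand search-and-replace, the workspace export/import APIs can script a first pass. A rough sketch using the databricks-sdk; the name mapping and notebook path are hypothetical, and any automated rewrite still needs manual review:

```python
import base64
import re
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat, ImportFormat, Language

w = WorkspaceClient()  # picks up host/token from the environment

# Hypothetical mapping from hive_metastore names to UC three-level names.
RENAMES = {r"\bhive_metastore\.sales\b": "main.sales"}

path = "/Users/me/my_notebook"  # hypothetical notebook path
src = base64.b64decode(w.workspace.export(path, format=ExportFormat.SOURCE).content).decode()

for pattern, replacement in RENAMES.items():
    src = re.sub(pattern, replacement, src)

w.workspace.import_(path,
                    content=base64.b64encode(src.encode()).decode(),
                    format=ImportFormat.SOURCE,
                    language=Language.PYTHON,
                    overwrite=True)
```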