Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cltj
by New Contributor III
  • 2336 Views
  • 1 reply
  • 0 kudos

Managed tables and ADLS - infrastructure

Hi all. I want to get this right and therefore I am reaching out to the community. We are using Azure, and currently have one Azure Data Lake Storage account for development and one for production. These are connected to dev and prod Databricks workspaces....

Latest Reply
ossinova
Contributor II
  • 0 kudos

I recommend you read this article (Managed vs External tables) and answer the following question: do I require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses? If yes, then external is your only option. In rel...

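If direct access from outside Databricks is needed, the reply above points at external tables. A minimal sketch of the DDL involved; the catalog, schema, table, and ADLS path below are hypothetical placeholders:

```python
# Sketch, assuming Unity Catalog naming; all names and paths are placeholders.
def external_table_ddl(table: str, path: str) -> str:
    """Build DDL for an external Delta table whose files stay in ADLS,
    so they remain directly accessible outside Databricks compute."""
    return f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{path}'"

ddl = external_table_ddl(
    "dev_catalog.sales.orders",
    "abfss://data@devlake.dfs.core.windows.net/orders",
)
# In a Databricks notebook you would then run: spark.sql(ddl)
```

With a managed table you would omit the LOCATION clause and let Databricks own the storage lifecycle.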
Marcin_U
by New Contributor II
  • 1912 Views
  • 1 reply
  • 0 kudos

AutoLoader - problem with adding new source location

Hello, I have some trouble with Auto Loader. Currently we use many different source locations on ADLS to read Parquet files and write them to a Delta table using Auto Loader. The files in all locations have the same schema. Everything works fine until we have to ad...

Latest Reply
Marcin_U
New Contributor II
  • 0 kudos

Thanks for the reply @Retired_mod. I have some questions related to your answer. Checkpoint location: does deleting the checkpoint folder (or only the files?) mean that the next run of Auto Loader will load all files from the provided source locations? So it will dupl...

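One common pattern for the situation above is to give each source location its own stream and checkpoint, so adding a location never disturbs the others. A sketch, assuming a Databricks notebook where `spark` is predefined; the paths and table name are placeholders:

```python
# Sketch: one Auto Loader stream per source location, each with its own
# checkpoint. Deleting a checkpoint makes only that stream re-ingest its
# source from scratch (which can duplicate rows in the target table).
def start_autoloader(spark, source_path: str, checkpoint: str, target: str):
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaLocation", f"{checkpoint}/schema")
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)
        .toTable(target)
    )

# Adding a new source = adding a new (path, checkpoint) pair; the
# existing entries keep their checkpoints and are unaffected.
sources = {
    "abfss://raw@lake.dfs.core.windows.net/src_a": "/chk/src_a",
    "abfss://raw@lake.dfs.core.windows.net/src_b": "/chk/src_b",
}
```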
oussValrho
by New Contributor
  • 7299 Views
  • 0 replies
  • 0 kudos

Cannot resolve due to data type mismatch: incompatible types ("STRING" and "ARRAY<STRING>")

Hey, I have had this error for a while: Cannot resolve "(needed_skill_id = needed_skill_id)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("STRING" and "ARRAY<STRING>"). SQLSTATE: 42K09. And these ...

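This thread has no reply, but the error itself says what is wrong: `=` cannot compare a STRING to an ARRAY<STRING>. A sketch of the usual fix, with hypothetical column names, is to test membership with `array_contains` instead:

```python
# The broken condition compares STRING = ARRAY<STRING> -> SQLSTATE 42K09.
join_on_broken = "a.needed_skill_id = b.needed_skill_id"

# The fix: check whether the scalar is an element of the array.
join_on_fixed = "array_contains(b.needed_skill_id, a.needed_skill_id)"

# In PySpark (inside a notebook):
# from pyspark.sql.functions import expr
# df_a.join(df_b, expr(join_on_fixed))
```

Alternatively, explode the array column first if one output row per matching element is wanted.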
essura
by New Contributor II
  • 2512 Views
  • 1 reply
  • 1 kudos

Create a docker image for dbt task

Hi there, we are trying to set up a Docker image for our dbt execution, primarily to improve execution speed, but also to simplify deployment (we are using private repos for both the dbt project and some of the dbt packages). It seems to work curre...

vijaykumar99535
by New Contributor III
  • 1933 Views
  • 1 reply
  • 0 kudos

How to create a job cluster using the REST API

I am creating a cluster using a REST API call, but every time it creates an all-purpose cluster. Is there a way to create a job cluster and run a notebook using Python code?

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

job_cluster_key: string, 1..100 characters, matching ^[\w\-\_]+$. If job_cluster_key is set, the task is executed reusing the cluster specified in job.settings.job_clusters. See: Create a new job | Jobs API | REST API reference | Databricks on AWS.

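The point in the reply is that a job cluster is declared inside the job spec (`job_clusters` plus `job_cluster_key`), not created via the Clusters API, which always produces all-purpose compute. A sketch of such a payload; the job name, node type, and notebook path are placeholders:

```python
# Sketch of a Jobs API 2.1 create payload that declares a job cluster.
# The cluster only exists while the job runs.
payload = {
    "name": "nightly-etl",
    "job_clusters": [{
        "job_cluster_key": "etl_cluster",
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
    }],
    "tasks": [{
        "task_key": "run_notebook",
        "job_cluster_key": "etl_cluster",  # reuse the declared job cluster
        "notebook_task": {"notebook_path": "/Workspace/etl/main"},
    }],
}
# Then POST it (host/token are placeholders):
# requests.post(f"{host}/api/2.1/jobs/create",
#               headers={"Authorization": f"Bearer {token}"}, json=payload)
```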
William_Scardua
by Valued Contributor
  • 2315 Views
  • 1 reply
  • 1 kudos

groupBy without aggregation (Pyspark API)

Hi guys, do you have any idea how I can do a groupBy without aggregation (PySpark API)? Like: df.groupBy('field1', 'field2', 'field3'). My goal is to make groups, but in this case it is not necessary to count records or aggregate. Thank you.

Latest Reply
feiyun0112
Honored Contributor
  • 1 kudos

Do you mean getting the distinct rows for the selected columns? df.select("field1", "field2", "field3").distinct()

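A small sketch of the idea in the reply: grouping without aggregation amounts to "one row per unique key combination", which is `distinct()`/`dropDuplicates()` in PySpark. Illustrated below with plain Python tuples standing in for rows:

```python
# PySpark form (df is any DataFrame; not executed here):
def distinct_groups(df, cols):
    """Return one row per unique combination of `cols`."""
    return df.select(*cols).distinct()  # or: df.dropDuplicates(cols)

# The same semantics with plain Python tuples standing in for rows:
rows = [("a", 1), ("a", 1), ("b", 2)]
groups = sorted(set(rows))  # one entry per unique combination
```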
Innov
by New Contributor
  • 1063 Views
  • 0 replies
  • 0 kudos

Parse nested JSON for building footprints

Looking for some help from anyone who has worked with nested JSON files in a Databricks notebook. I am trying to parse a nested JSON file to get coordinates and use them to create a polygon for a building footprint. Do I need to read it as text? How can I use the Databricks...

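The question has no reply yet; a sketch of one way in, assuming a GeoJSON-like layout (the field names "features", "geometry", and "coordinates" are assumptions about the file at hand). There is no need to read the file as text:

```python
import json

# Illustrative nested document; real footprint files follow this shape.
doc = json.loads("""
{"features": [{"geometry": {"type": "Polygon",
  "coordinates": [[[-77.0, 38.9], [-77.1, 38.9], [-77.1, 39.0], [-77.0, 38.9]]]}}]}
""")

# One outer ring of [lon, lat] pairs per feature -> polygon vertices.
rings = [f["geometry"]["coordinates"][0] for f in doc["features"]]
# In Databricks you can read the same file directly with
# spark.read.option("multiLine", True).json(path) instead of plain text.
```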
zero234
by New Contributor III
  • 1234 Views
  • 1 reply
  • 0 kudos

I am trying to read nested data from a JSON file into a streaming table using DLT

So I have nested data with more than 200 columns that I have extracted into a JSON file. When I use the code below to read the JSON files, if there are a few columns in the data that have no value at all, it doesn't include those columns in the schema...

Latest Reply
zero234
New Contributor III
  • 0 kudos

Replying to my question above: we cannot use schema inference on a streaming table; we need to specify the schema explicitly. Can anyone please suggest a way to write data in nested form to a streaming table, if this is possible?

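As the follow-up notes, a streaming read cannot infer its schema, so it has to be declared up front; nested columns then stay in the schema even when every value is null. A sketch with hypothetical column names, using Spark's DDL string form:

```python
# Sketch: declare nested columns explicitly as a DDL schema string.
schema_ddl = (
    "id BIGINT, "
    "customer STRUCT<name: STRING, address: STRUCT<city: STRING, zip: STRING>>, "
    "items ARRAY<STRUCT<sku: STRING, qty: INT>>"
)
# In a DLT pipeline / notebook (path is a placeholder):
# spark.readStream.schema(schema_ddl).json("/Volumes/cat/sch/vol/raw")
```

Columns declared this way are written to the streaming table in nested form; missing values simply arrive as nulls instead of dropping out of the schema.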
asad77007
by New Contributor II
  • 3191 Views
  • 3 replies
  • 1 kudos

How to connect Analysis Service Cube with Databricks notebook

I am trying to connect an Analysis Services cube with a Databricks notebook but unfortunately haven't found any solution yet. Is there any possible way to connect an AS cube with a Databricks notebook? If yes, can someone please guide me?

Latest Reply
omfspartan
New Contributor III
  • 1 kudos

I am able to connect to Azure Analysis Services using the Azure Analysis Services REST API. Is yours on-prem?

2 More Replies
Baldrez
by New Contributor II
  • 4491 Views
  • 4 replies
  • 5 kudos

Resolved! REST API for Stream Monitoring

Hi, everyone. I just recently started using Databricks on Azure, so my question is probably very basic, but I am really stuck right now. I need to capture some streaming metrics (number of input rows and their time), so I tried using the Spark REST API ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 5 kudos

Hi @Roberto Baldrez, if you think that @Gaurav Rupnar solved your question, then please select it as the best response so it can be moved to the top of the topic and help more users in the future. Thank you.

3 More Replies
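For the metrics the question asks about (input rows and their time), no REST call against the Spark UI is needed: each micro-batch publishes a progress document, available as `lastProgress` on a PySpark `StreamingQuery`. A sketch with an illustrative sample document:

```python
import json

# Illustrative progress JSON as produced per micro-batch; in a notebook
# query.lastProgress returns this structure directly as a dict.
progress = json.loads("""
{"timestamp": "2024-04-01T10:00:00.000Z",
 "numInputRows": 1200,
 "sources": [{"description": "CloudFilesSource", "numInputRows": 1200}]}
""")

# The two fields the post needs: when, and how many input rows.
metric = (progress["timestamp"], progress["numInputRows"])
```

For continuous capture, a `StreamingQueryListener` can log every progress event instead of polling.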
zero234
by New Contributor III
  • 2619 Views
  • 2 replies
  • 2 kudos

I have created a DLT pipeline which reads data from JSON files stored in a Databricks volume

I have created a DLT pipeline which reads data from JSON files stored in a Databricks volume and puts the data into a streaming table. This was working fine. When I tried to read the data that was inserted into the table and compare the values with t...

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

Keep your DLT code separate from your comparison code, and run your comparison code once your DLT data has been ingested.

1 More Reply
Avinash_Narala
by Valued Contributor II
  • 1406 Views
  • 1 reply
  • 1 kudos

Unity Catalog Migration

Hello, we are in the process of migrating to Unity Catalog. Can I know how to automate the process of refactoring the notebooks for Unity Catalog?

Data Engineering
automation
migration
unitycatalog
Latest Reply
MinThuraZaw
New Contributor III
  • 1 kudos

Hi @Avinash_Narala, there is no one-click solution to refactor all table names in notebooks to UC's three-level namespaces. At a minimum, manually updating table names is required during the migration process. One option is to use the search feature. Search ...

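Parts of the rewrite can still be scripted. A deliberately narrow sketch that prepends a catalog to two-level table references found in notebook source; `catalog` and the schema names are placeholders, and every rewrite should be reviewed manually before committing:

```python
import re

def to_three_level(sql: str, catalog: str, schemas: set) -> str:
    """Rewrite schema.table references to catalog.schema.table, but only
    for schemas explicitly listed in `schemas` (to avoid false matches)."""
    def repl(m):
        schema, table = m.group(1), m.group(2)
        if schema in schemas:
            return f"{catalog}.{schema}.{table}"
        return m.group(0)
    return re.sub(r"\b(\w+)\.(\w+)\b", repl, sql)

print(to_three_level("SELECT * FROM bronze.events", "prod", {"bronze"}))
# -> SELECT * FROM prod.bronze.events
```

Run over exported notebook source files, diff the output, and hand-check anything the regex touched.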
valjas
by New Contributor III
  • 9102 Views
  • 3 replies
  • 0 kudos

Disable Machine Learning and Job Creation

We are working on creating a new Databricks workspace for external entities. We have disabled the cluster and warehouse creation permission, but the external users are still able to create jobs and job clusters. Is there a way to revoke job creation permi...

Latest Reply
Venk1599
New Contributor II
  • 0 kudos

It permits cluster creation during workflow/job/DLT pipeline creation. However, when attempting to start any of these, it fails with a 'Not authorized to create compute' error. Please try it and inform me of the outcome.

2 More Replies
jaimeperry12345
by New Contributor
  • 1006 Views
  • 1 reply
  • 0 kudos

duplicate files in delta table

I have been facing this issue for a long time, but so far there is no solution. I have a Delta table. My bronze layer is picking up old files (mostly 8-day-old files) randomly. My source of files is Azure Blob Storage.

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @jaimeperry12345, I will need more information to point you in the right direction. Confirm the behavior: double-check that your Delta table is indeed reading 8-day-old files randomly. Provide any logs or error messages you have regarding this. Ex...

