Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AsgerLarsen
by New Contributor III
  • 2373 Views
  • 7 replies
  • 0 kudos

Using yml variables as table owner through SQL

I'm trying to change the ownership of a table in the Unity Catalog created through a SQL script. I want to do this through code. I'm using a standard Databricks bundle setup, which uses three workspaces: dev, test and prod. I have created a variable in ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I guess that is a safe bet. Good luck!

6 More Replies
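A minimal sketch of the approach discussed in this thread, assuming the owner value arrives from a bundle variable (e.g. `${var.table_owner}`, resolved per target) and is passed into the notebook task; all catalog, schema, and principal names below are illustrative:

```python
# Hypothetical helper: render the ownership statement from a value supplied
# by a Databricks Asset Bundle variable (resolved differently per target).
def owner_sql(catalog: str, schema: str, table: str, owner: str) -> str:
    return f"ALTER TABLE `{catalog}`.`{schema}`.`{table}` OWNER TO `{owner}`"

stmt = owner_sql("main", "sales", "orders", "data_admins")
# In the notebook task you would then execute it:
# spark.sql(stmt)
```

Building the statement in code this way keeps the per-environment owner out of the SQL script itself.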
Aatma
by New Contributor
  • 5608 Views
  • 3 replies
  • 1 kudos

Resolved! DABs require library dependencies from GitHub private repository.

I'm developing a Python wheel file using DABs which requires library dependencies from a private GitHub repository. Please help me understand how to set up the Git user and token in the resource.yml file and how to authenticate the GitHub package. pip install...

Latest Reply
sandy311
New Contributor III
  • 1 kudos

Could you please give a detailed example? How do you define env variables? BUNDLE_VAR?

2 More Replies
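As a rough sketch of what the reply asks about (not verified against a specific bundle setup): expose the token to the job as an environment variable, e.g. mapped from a bundle variable such as `BUNDLE_VAR_github_token`, and build an authenticated pip URL from it. The repository and variable names here are made up:

```shell
# Assumed: the token arrives in GITHUB_TOKEN (e.g. mapped from a bundle
# variable or a secret). Build the authenticated install URL for pip.
GITHUB_TOKEN="${GITHUB_TOKEN:-placeholder}"
PKG_URL="git+https://${GITHUB_TOKEN}@github.com/your-org/your-private-lib.git"
# The actual install step (run inside the job environment):
echo "pip install ${PKG_URL}"
```

Keeping the token in an environment variable rather than in resource.yml avoids committing credentials to the repo.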
minhhung0507
by Valued Contributor
  • 927 Views
  • 1 reply
  • 0 kudos

Handling Hanging Pipelines in Real-Time Environments: Leveraging Databricks’ Idle Event Monitoring

Hi everyone, I'm running multiple real-time pipelines on Databricks using a single job that submits them via a thread pool. While most pipelines are running smoothly, I've noticed that a few of them occasionally get "stuck" or hang for several hours w...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

May I ask why you use thread pools? With jobs you can define multiple tasks that do the same. I'm asking because thread pools and Spark resource management can interfere with each other.

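For the hang symptom described in this thread, one illustrative pattern (plain `concurrent.futures`, independent of Databricks) is to put a time budget on each submitted pipeline so a stuck one is at least detected instead of hanging silently; the names and timeouts below are made up:

```python
# Sketch: watchdog submitted pipelines with a per-pipeline time budget.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

def run_pipeline(name: str, seconds: float) -> str:
    time.sleep(seconds)            # stand-in for the real pipeline work
    return f"{name}: done"

results = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {pool.submit(run_pipeline, n, s): n
               for n, s in [("fast", 0.01), ("hung", 1.5)]}
    for fut, name in futures.items():
        try:
            results[name] = fut.result(timeout=0.3)   # flag overruns
        except FutureTimeout:
            results[name] = "timed out"               # alert / restart here
```

Note that a timed-out future keeps its thread alive; a real fix still needs the pipeline itself to be cancellable, which is part of why the reply suggests job tasks instead.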
rodrigocms
by New Contributor
  • 3603 Views
  • 1 reply
  • 0 kudos

Get information from Power BI via XMLA

Hello everyone, I am trying to get information from Power BI semantic models via the XMLA endpoint using PySpark in Databricks. Can someone help me with that? Thanks

Latest Reply
CacheMeOutside
New Contributor II
  • 0 kudos

I would like to see this too. 

PunithRaj
by New Contributor
  • 7247 Views
  • 2 replies
  • 2 kudos

How to read a PDF file from Azure Datalake blob storage to Databricks

I have a scenario where I need to read a PDF file from "Azure Datalake blob storage to Databricks", where the connection is done through AD access. Generating the SAS token has been restricted in our environment due to security issues. The below script ca...

Latest Reply
Mykola_Melnyk
New Contributor III
  • 2 kudos

@PunithRaj You can try the PDF DataSource for Apache Spark to read PDF files directly into a DataFrame. You will get the extracted text and the rendered page as an image in the output. More details here: https://stabrise.com/spark-pdf/ df = spark.read.forma...

1 More Replies
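Based on the reply's pointer, a sketch of using the third-party spark-pdf datasource: this is not part of core Spark, the package must be installed on the cluster, and the `"pdf"` format name is inferred from the truncated snippet in the reply, so verify against the linked docs:

```python
# Assumed API: the StabRise spark-pdf package registers a "pdf" datasource.
def read_pdfs(spark, path: str):
    """Load PDFs from e.g. an abfss:// path into a DataFrame of pages."""
    return spark.read.format("pdf").load(path)

# df = read_pdfs(spark, "abfss://container@account.dfs.core.windows.net/docs/")
```

Because the datasource reads through Spark's filesystem layer, the AD-based ADLS access from the question should apply unchanged.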
Kamal2
by Databricks Partner
  • 28214 Views
  • 5 replies
  • 7 kudos

Resolved! PDF Parsing in Notebook

I have PDF files stored in Azure ADLS. I want to parse the PDF files into a PySpark DataFrame. How can I do that?

Latest Reply
Mykola_Melnyk
New Contributor III
  • 7 kudos

PDF Data Source works now on Databricks. Instructions with an example: https://stabrise.com/blog/spark-pdf-on-databricks/

4 More Replies
isaac_gritz
by Databricks Employee
  • 29677 Views
  • 7 replies
  • 7 kudos

Local Development on Databricks

How to Develop Locally on Databricks with your Favorite IDE: dbx is a Databricks Labs project that allows you to develop code locally and then submit it against Databricks interactive and job compute clusters from your favorite local IDE (AWS | Azure | GC...

Latest Reply
kmodelew
New Contributor III
  • 7 kudos

Hi, you can use any existing IDE; I'm using PyCharm. I have created my own utils to run code on Databricks. In a .env file I have environment variables, and using the SDK I'm creating a SparkSession object and a WorkspaceObject that you can use to read cre...

6 More Replies
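The reply's setup can be sketched as follows; the `.env` parsing is a tiny stand-in for python-dotenv, and the remote-session lines at the end are commented because they need the `databricks-connect` package and a reachable workspace:

```python
import os

def load_dotenv(path: str = ".env") -> dict:
    """Minimal stand-in for python-dotenv: read KEY=VALUE lines into os.environ."""
    loaded = {}
    try:
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    loaded[key.strip()] = value.strip()
                    os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass
    return loaded

# load_dotenv()                 # picks up DATABRICKS_HOST / DATABRICKS_TOKEN
# from databricks.connect import DatabricksSession
# spark = DatabricksSession.builder.getOrCreate()
```

With the environment populated, the same notebook code runs locally in the IDE and on the cluster.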
ADuma
by New Contributor III
  • 5146 Views
  • 2 replies
  • 0 kudos

Job sometimes failing due to library installation error of Pypi library

I am running a job on a cluster from a compute pool that installs a package from our Azure Artifacts Feed. My task is supposed to run a wheel task from our library, which has about a dozen dependencies. For more than 95% of the runs this job works...

Latest Reply
ADuma
New Contributor III
  • 0 kudos

Hi Brahma, thanks a lot for the help. I'm trying to install my libraries with an init script right now. Unfortunately the error does not occur very regularly, so I'll have to observe for a few days. I'm not 100% happy with the solution though. We are...

1 More Replies
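One hedged variant of the init-script idea above is to wrap the feed install in a retry loop so a transient Artifacts-feed failure doesn't fail the whole run; the index URL and package name are placeholders:

```shell
# Retry helper for a cluster init script (placeholder names throughout).
retry() {
    local attempt
    for attempt in 1 2 3; do
        "$@" && return 0
        echo "attempt ${attempt} failed, retrying..." >&2
        sleep 1
    done
    return 1
}
# In the real script:
# retry pip install --no-cache-dir \
#   --index-url "https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/" mylib
```

This only masks transient failures; if the error is deterministic, the retries will simply fail three times.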
Dnirmania
by Contributor
  • 3757 Views
  • 4 replies
  • 0 kudos

Read file from AWS S3 using Azure Databricks

Hi Team, I am currently working on a project to read CSV files from an AWS S3 bucket using an Azure Databricks notebook. My ultimate goal is to set up an autoloader in Azure Databricks that reads new files from S3 and loads the data incrementally. Howe...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

No, it is very easy. Follow this guide and it will work: https://github.com/aviral-bhardwaj/MyPoCs/blob/main/SparkPOC/ETLProjectsAWS-S3toDatabricks.ipynb

3 More Replies
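For reference, the usual ingredients for this cross-cloud setup are s3a credentials set on the Spark conf before reading; here is a sketch, with the secret scope and key names invented and the keys pulled from a secret scope rather than hardcoded:

```python
# Assumed pattern: s3a access keys supplied via Spark conf before reading.
def s3a_conf(access_key: str, secret_key: str) -> dict:
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }

# for k, v in s3a_conf(dbutils.secrets.get("aws", "access-key"),
#                      dbutils.secrets.get("aws", "secret-key")).items():
#     spark.conf.set(k, v)
# df = spark.read.csv("s3a://my-bucket/landing/", header=True)
```

Once plain reads work, the same s3a path can be handed to Auto Loader as the source directory.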
William_Scardua
by Valued Contributor
  • 12816 Views
  • 4 replies
  • 1 kudos

How to read data from Azure Log Analytics?

Hi guys, I need to read data from an Azure Log Analytics Workspace directly. Any ideas? Thank you

Latest Reply
alexott
Databricks Employee
  • 1 kudos

You can use the Kusto Spark connector for that: https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSource.md#source-read-command It heavily depends on how you access the data; there could be a need to use an ADX cluster for it: https://learn.mi...

3 More Replies
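Following the linked connector docs, a read looks roughly like the sketch below. The option names are as I recall them from the azure-kusto-spark README (auth options omitted entirely), so treat this as something to verify against the docs rather than a working recipe:

```python
def kusto_read_options(cluster: str, database: str, query: str) -> dict:
    """Core options for the azure-kusto-spark source (auth options omitted)."""
    return {
        "kustoCluster": cluster,
        "kustoDatabase": database,
        "kustoQuery": query,
    }

# df = (spark.read.format("com.microsoft.kusto.spark.datasource")
#       .options(**kusto_read_options("https://<adx-cluster>.kusto.windows.net",
#                                     "mydb", "MyTable | take 100"))
#       .load())
```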
KristiLogos
by Contributor
  • 3153 Views
  • 2 replies
  • 0 kudos

Resolved! GCS Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/in

I’m running Databricks on Azure and trying to read a CSV file from Google Cloud Storage (GCS) bucket using Spark. However, despite configuring Spark with a Google service account key, I’m encountering the following error:Error getting access token fr...

Latest Reply
ShivangiB
New Contributor III
  • 0 kudos

Hey @KristiLogos, can you please share in what format the key was stored in gsa_private_key? Actually, we are using a Key Vault-based scope.

1 More Replies
thomas_berry
by Databricks Partner
  • 1424 Views
  • 1 reply
  • 0 kudos

Federated query on the source

Hello, I want to be able to run an arbitrary query on the source before its result gets sent to Databricks. I want to create something like this: create table gold.bigquery USING org.apache.spark.sql.jdbc options( url "jdbc:postgresql://---:---/---", dri...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi thomas_berry, how are you doing today? As per my understanding, you're spot on, and you're not alone in running into this limitation. Unity Catalog doesn't currently support creating tables using a JDBC query like in your e...

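On the workaround side, plain Spark JDBC does support pushing an arbitrary query to the source via the `query` option (a core Spark feature); here is a sketch with placeholder connection details:

```python
def jdbc_query_reader(spark, url: str, query: str, driver: str):
    """Read the result of an arbitrary source-side query over JDBC."""
    return (spark.read.format("jdbc")
            .option("url", url)
            .option("query", query)       # runs on the source before transfer
            .option("driver", driver)
            .load())

# df = jdbc_query_reader(spark, "jdbc:postgresql://host:5432/db",
#                        "SELECT id, total FROM orders WHERE total > 100",
#                        "org.postgresql.Driver")
```

The resulting DataFrame can then be written to a Unity Catalog table, which sidesteps the CREATE TABLE ... USING jdbc limitation the reply describes.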
AndrewBeck
by New Contributor
  • 2012 Views
  • 1 reply
  • 1 kudos

Python UDF support in Unity Catalog and runtime 13.3?

Hi community, I am running Databricks Unity Catalog. In the Databricks UI, I see the policy "shared-gp-(r6g)-small" and Runtime 13.3. (I have access to larger instances, just running a PoC on a small instance.) Can anyone explain what looks like an inc...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Great question — and yeah, what you’re seeing is a bit of a confusing experience that trips up a lot of folks working with Unity Catalog (UC). Let’s break it down: What’s Working for You   from pyspark.sql.types import LongType def squared_typed(s):...

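To round out the fragment in the reply, a typed Python UDF registration typically looks like this; `squared_typed` follows the reply's snippet, and the registration lines are the standard PySpark pattern, commented out because they need a SparkSession:

```python
def squared_typed(s: int) -> int:
    return s * s

# from pyspark.sql.functions import udf
# from pyspark.sql.types import LongType
# squared_udf = udf(squared_typed, LongType())
# df.select(squared_udf("id")).show()
```

Whether such a UDF runs on a given cluster depends on the access mode and runtime, which is the incompatibility the question is about.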
htd350
by New Contributor II
  • 1609 Views
  • 1 reply
  • 1 kudos

Predictive Optimization & Serverless Compute

Hello, I have a hard time understanding how predictive optimization works if serverless compute is not enabled. According to the documentation: Predictive optimization identifies tables that would benefit from ANALYZE, OPTIMIZE, and VACUUM operations and que...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @htd350, predictive optimization in Databricks largely depends on serverless compute to execute operations like ANALYZE, OPTIMIZE, and VACUUM, but I'm not 100% sure whether serverless is needed in all scenarios. I'll check internally and confirm...

mrstevegross
by Contributor III
  • 1359 Views
  • 3 replies
  • 0 kudos

Graviton & containers?

Currently, DBR does not permit a user to run a containerized job on Graviton machines (per these docs). In our case, we're running containerized jobs on a pool. We are exploring adopting Graviton, but, per those docs, DBR won't let us do that. Are t...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @mrstevegross Steve, I have found these docs from Databricks about environments; as you can see, it is in public preview... If you find my previous answer helpful, feel free to mark it as the solution so it can help others as well. Thanks! Isi

2 More Replies