Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lauraxyz
by New Contributor
  • 37 Views
  • 3 replies
  • 0 kudos

How to execute .sql file in volume

I have giant queries (SELECT .. FROM) that I store in .sql files. I want to put those files in a Volume, and run the queries from a workflow task. I can load the file content into a 'text' format string, then run the query. My question is, is there...

Latest Reply
JAHNAVI
Databricks Employee
  • 0 kudos

@lauraxyz For SQL there is no direct way to run the file without parsing it. However, for Python, we can use %run to run a file from Volumes. Example: %python %run /Volumes/jahnavi/datasets/data/test.py

2 More Replies
lauraxyz
by New Contributor
  • 21 Views
  • 1 reply
  • 0 kudos

Rendering Volumes file content programmatically

Hi there! I have some files stored in a Volume, and I have a use case where I need to show the file content in a UI. Say I have a REST API that already knows the Volume path to the file; is there any built-in feature from Databricks that I can use to he...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Hi @lauraxyz here is an example using the Databricks SDK in Python: from databricks.sdk import WorkspaceClient ws = WorkspaceClient() image_path = '/Volumes/catalog/schema/volume/filename.jpg' image_data = ( ws.files.download(image_path) # down...
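Since the snippet above is cut off, here is a hedged sketch of one way to serve a Volume file to a UI: download the bytes (the SDK call and path mirror the reply but are assumptions here), then base64-encode them into a data: URI the frontend can render. Only the encoding helper below is self-contained.

```python
import base64

def to_data_uri(data: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw file bytes in a data: URI that a browser <img> tag can render."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"

# With the Databricks SDK (run inside a workspace; the path is hypothetical):
# from databricks.sdk import WorkspaceClient
# ws = WorkspaceClient()
# resp = ws.files.download("/Volumes/catalog/schema/volume/filename.jpg")
# uri = to_data_uri(resp.contents.read())
```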

pemidexx
by New Contributor II
  • 38 Views
  • 2 replies
  • 0 kudos

AI_QUERY does not accept modelParameters argument

I am trying to pass a column of data from Python/pandas to Spark, then run AI_QUERY. However, when I attempt to pass modelParameters (such as temperature), the function fails. Below is a minimal example: import pandas as pd queries = pd.DataFrame([ ...

Latest Reply
pemidexx
New Contributor II
  • 0 kudos

Hi @Walter_C , yes, I am receiving this error when only attempting to set temperature, which should be supported on most if not all models, including the specific models I'm working with. The error message seems to indicate this is a problem with AI_...

1 More Replies
oakhill
by New Contributor III
  • 74 Views
  • 7 replies
  • 1 kudos

Is Delta Live Tables not supported anymore? How do I use it in Python?

Hi! Any time I try to import "dlt" in a notebook session to develop pipelines, I get an error message saying DLT is not supported on Spark Connect clusters. These are very generic clusters; I've tried runtimes 14, 15 and the latest 16, using shared clu...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

Oakhill, we do provide free onboarding training. You might be interested in the "Get Started with Data Engineering on Databricks" session. You can register here: https://www.databricks.com/training/catalog. When you are searching the catalog of traini...

6 More Replies
somedeveloper
by New Contributor
  • 25 Views
  • 1 reply
  • 0 kudos

Modifying size of /var/lib/lxc

Good morning, When running a library (Sparkling Water) for a very large dataset, I've noticed that during an export procedure the /var/lib/lxc storage is being used. Since the storage seems to be at a static 130 GB, this is a problem because ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Unfortunately, this is a setting that cannot be increased on the customer side.

Dom1
by New Contributor II
  • 2169 Views
  • 4 replies
  • 3 kudos

Show log4j messages in run output

Hi, I have an issue when running JAR jobs. I expect to see logs in the output window of a run. Unfortunately, I can only see messages that are generated with "System.out.println" or "System.err.println". Everything that is logged via slf4j is only ...

Latest Reply
dbal
New Contributor III
  • 3 kudos

Any update on this? I am also facing this issue.

3 More Replies
ChristianRRL
by Valued Contributor
  • 43 Views
  • 1 reply
  • 0 kudos

Databricks Workflows - Generate Tasks Programmatically

Hi there, I've used Databricks Workflows to explicitly create tasks with known input parameters (either user input or default parameters). But I'm wondering: what if I want the output of one task to be a list of specific IDs (e.g. id = [7,8,10,13,27]...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

This sounds like a great fit for the For Each task type! Here is the blog, and the documentation
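As a hedged illustration of that suggestion (task names, notebook paths, and the task-values key are all hypothetical), a For Each task in a job YAML might look like:

```yaml
# Sketch: "get_ids" emits a list via task values; the For Each task fans out over it.
tasks:
  - task_key: get_ids
    notebook_task:
      notebook_path: /Workspace/jobs/get_ids
  - task_key: process_each_id
    depends_on:
      - task_key: get_ids
    for_each_task:
      inputs: "{{tasks.get_ids.values.ids}}"
      task:
        task_key: process_one_id
        notebook_task:
          notebook_path: /Workspace/jobs/process_one_id
          base_parameters:
            id: "{{input}}"
```

The upstream notebook would set the list with dbutils.jobs.taskValues.set("ids", [...]).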

jeroaranda
by New Contributor II
  • 997 Views
  • 1 reply
  • 0 kudos

How to pass task name as parameter in scheduled job that will be used as a schema name in query

I want to run a parameterized SQL query in a task. Query: select * from {{client}}.catalog.table, with the client value being {{task.name}}. If client is a string parameter, it is replaced with quotes, which throws an error. If table is a dropdown list parame...

Latest Reply
Zach_Jacobson23
Databricks Employee
  • 0 kudos

Try this: select * from identifier(:catalog || '.schema.table'). The :catalog is a parameter within DBSQL. Replace schema and table with actual names.
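Spelled out as a hedged sketch (schema and table names are placeholders):

```sql
-- :catalog is a DBSQL named parameter; pass the task name as its value.
-- IDENTIFIER() parses the concatenated string as an object name rather
-- than substituting it as a quoted string literal.
SELECT *
FROM IDENTIFIER(:catalog || '.my_schema.my_table');
```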

somedeveloper
by New Contributor
  • 46 Views
  • 2 replies
  • 0 kudos

Databricks Setting Dynamic Local Configuration Properties

It seems that Databricks is somehow setting the properties of local spark configurations for each notebook. Can someone point me to exactly how and where this is being done? I would like to set the scheduler to utilize a certain pool by default, but ...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

You will need to leverage cluster-level Spark configurations or global init scripts. This will allow you to set the "spark.scheduler.pool" property automatically for all workloads on the cluster. You can try navigating to "Compute", select the cluster y...
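For illustration, the cluster-level setting is a single line under the cluster's Spark config (the pool name is hypothetical and must match a pool defined in your fair-scheduler configuration):

```
# Compute > (your cluster) > Advanced options > Spark > Spark config
spark.scheduler.pool my_default_pool
```

Individual notebooks can still override it per session with spark.conf.set("spark.scheduler.pool", "other_pool").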

1 More Replies
chethankumar
by New Contributor III
  • 29 Views
  • 2 replies
  • 0 kudos

How to execute SQL statement using terraform

Is there a way to execute SQL statements using Terraform? I can see it is possible using the API, as below: https://docs.databricks.com/api/workspace/statementexecution/executestatement, but I want to know if there is a straightforward way to run it like the code provi...

Latest Reply
Nes_Hdr
New Contributor
  • 0 kudos

I had the same question a while ago, and I couldn't find a way to automatically execute the query using Terraform. What you can do, though, is set a schedule if the query needs to be executed fairly regularly, or simply execute it manually i...

1 More Replies
ChristianRRL
by Valued Contributor
  • 130 Views
  • 4 replies
  • 0 kudos

CREATE view USING json and *include* _metadata, _rescued_data

Title may be self-explanatory. Basically, I'm curious to ask if it's possible (and if so, how) to add `_metadata` and `_rescued_data` fields to a view "using json". e.g. %sql CREATE OR REPLACE VIEW entity_view USING json OPTIONS (path="/.../.*json", mu...

Latest Reply
akhil393
Databricks Employee
  • 0 kudos

Hi @ChristianRRL You can still use the same method, read_files, when creating the view. I see that you are using the classic Hive-style reader instead of read_files in the actual SQL view definition, and you don't need to use spark.sql. Please ...
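A hedged sketch of what that read_files-based view could look like (the path and view name are placeholders; `_metadata` must be selected explicitly, and `_rescued_data` appears when schema inference cannot place a field):

```sql
-- Replace the Hive-style "USING json OPTIONS (...)" reader with read_files,
-- which exposes _metadata and _rescued_data to the view.
CREATE OR REPLACE VIEW entity_view AS
SELECT *, _metadata
FROM read_files('/Volumes/catalog/schema/vol/*.json', format => 'json');
```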

3 More Replies
shahabm
by New Contributor III
  • 2346 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks job keep getting failed due to GC issue

There is a job that used to run successfully, but for more than a month we have been experiencing long runs that fail. In the stdout log file (attached), there are numerous messages like the following: [GC (Allocation Failure) [PSYoungGen:...] and [Full GC ...

Latest Reply
siddhu30
New Contributor
  • 1 kudos

Thanks a lot @shahabm for your prompt response, appreciate it. I'll try to debug in this direction. Thanks again!

3 More Replies
VicS
by New Contributor II
  • 228 Views
  • 3 replies
  • 1 kudos

How to use custom whl file + pypi repo with a job cluster in asset bundles?

I tried looking through the documentation but it is confusing at best and misses important parts at worst.  Is there any place where the entire syntax and ALL options for asset bundle YAMLs are described? I found this https://docs.databricks.com/en/d...

Latest Reply
VicS
New Contributor II
  • 1 kudos

It took me a while to realize the distinction between the keys inside the task, so for anyone else looking into this: only one of the following keys can exist in a task definition: tasks: - task_key: ingestion_delta # existing_c...
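Expanding that truncated snippet into a hedged illustration (all IDs, names, and versions below are hypothetical), the three mutually exclusive compute keys look like:

```yaml
tasks:
  - task_key: ingestion_delta
    # Option A: reuse a running all-purpose cluster
    # existing_cluster_id: 1234-567890-abcdefgh
    # Option B: reference a job cluster defined under job_clusters
    job_cluster_key: my_job_cluster
    # Option C: define an inline job cluster for this task only
    # new_cluster:
    #   spark_version: 15.4.x-scala2.12
    #   num_workers: 2
```

Exactly one of the three may be set per task; your custom .whl and PyPI dependencies then go under that task's `libraries` list.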

2 More Replies
ns_casper
by New Contributor II
  • 661 Views
  • 4 replies
  • 1 kudos

Databricks Excel ODBC driver bug

Hello! I might have experienced a bug with the ODBC driver. We have an issue where, given certain privileges in Databricks, the ODBC driver is unable to show any schemas/tables. When we click the 'expand' button on any catalog in the list (of which we ...

Latest Reply
jbibs
Visitor
  • 1 kudos

Following this post - we are also facing the same issue. @KTheJoker - when I'm connecting and trying to expand a catalog, I do see the query fire off in the SQL Warehouse query history, but in Excel nothing is returned. I can see the schemas/tables...

3 More Replies
JissMathew
by New Contributor II
  • 185 Views
  • 6 replies
  • 1 kudos

Structured streaming in Databricks using delta table

Hi everyone, I’m new to Databricks and exploring its features. I’m trying to implement Change Data Capture (CDC) from the bronze layer to the silver layer using streaming. Could anyone share sample code or reference materials for implementing CDC wit...

Latest Reply
Mike_Szklarczyk
New Contributor III
  • 1 kudos

You can also look at https://www.databricks.com/resources/demos#tutorials 
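For a concrete starting point, here is a hedged sketch of a bronze-to-silver CDC upsert: a pure helper that collapses a microbatch to the newest change per key (runnable anywhere), plus the Databricks-only streaming MERGE it would feed, shown in comments. The table names, the `id` key, and the `ts` sequence column are all hypothetical.

```python
# Sketch of bronze -> silver CDC: keep the latest change per key, then MERGE.

def latest_per_key(events: list[dict], key: str = "id", seq: str = "ts") -> list[dict]:
    """Collapse a microbatch of change events to the newest row per key."""
    newest: dict = {}
    for e in events:
        k = e[key]
        if k not in newest or e[seq] > newest[k][seq]:
            newest[k] = e
    return list(newest.values())

# Streaming upsert on Databricks (spark and delta are available there):
# from delta.tables import DeltaTable
# def upsert(batch_df, batch_id):
#     silver = DeltaTable.forName(spark, "silver.customers")
#     (silver.alias("t")
#        .merge(batch_df.alias("s"), "t.id = s.id")
#        .whenMatchedUpdateAll()
#        .whenNotMatchedInsertAll()
#        .execute())
# (spark.readStream.table("bronze.customers_changes")
#    .writeStream.foreachBatch(upsert)
#    .option("checkpointLocation", "/Volumes/catalog/schema/vol/_ckpt")
#    .start())
```

Deduplicating each batch first matters because MERGE fails if the source has multiple rows per key; in Spark you would express the same idea with a window over `id` ordered by `ts` descending.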

5 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group