Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sparkrookie
by New Contributor II
  • 2699 Views
  • 1 replies
  • 0 kudos

Structured Streaming Delta Table - Reading and writing from same table

Hi, I have a Structured Streaming job that reads from a Delta table "A" and pushes to another Delta table "B". A schema: group_key, id, timestamp, value. B schema: group_key, watermark_timestamp, derived_value. One requirement is that I need to get the m...

shraddharane
by New Contributor
  • 40357 Views
  • 1 replies
  • 1 kudos

Migrating legacy SSAS cube to databricks

We have a SQL database designed as a star schema. We are migrating data from SQL to Databricks. There are cubes designed using SSAS. End users use these cubes in Excel for analysis. We are now looking for a solution for: 1) Can...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Databricks itself does not deliver semantic models like SSAS cubes, so Databricks cannot migrate them because there is nothing to migrate to. However, there are some options: - use of Power BI instead of SSAS (there might even be a migrate option?).  W...

140015
by Databricks Partner
  • 4029 Views
  • 3 replies
  • 1 kudos

Resolved! Using DLT pipeline with non-incremental data

Hi, I would like to know what you think about using Delta Live Tables when the source for the pipeline is not incremental. By that I mean: suppose the data provider creates a new folder with files for me each time it has an update to the...

Latest Reply
Joe_Suarez
New Contributor III
  • 1 kudos

When dealing with B2B data building, the process of updating and managing your data can present unique challenges. Since your data updates involve new folders with files and you need to process the entire new folder, the concept of incremental proces...

2 More Replies
GNarain
by New Contributor II
  • 9509 Views
  • 7 replies
  • 4 kudos

Resolved! Is there api call to set "Table access control" workspace config ?

Is there api call to set "Table access control" workspace config ?

Latest Reply
SvenPeeters
New Contributor III
  • 4 kudos

Facing the same issue. I tried to fetch the current value via /api/2.0/workspace-conf?keys=enableTableAccessControl. Unfortunately, this returns a 400: { "error_code": "BAD_REQUEST", "message": "Invalid keys: [\"enableTableAccessControl\"]" }
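
For anyone reproducing this, here is a minimal Python sketch of the request URL and of reading the error body quoted in the reply. The host name and the helper names `workspace_conf_url` and `describe_error` are illustrative assumptions; only the endpoint path, the `keys` parameter, and the error payload come from the post. It does not send a request, and it does not imply that `enableTableAccessControl` is a valid key (the reply reports that it was rejected).

```python
import json
from urllib.parse import urlencode


def workspace_conf_url(host: str, keys: list) -> str:
    """Build the GET URL for /api/2.0/workspace-conf (host is a placeholder)."""
    # Keys are passed as one comma-separated "keys" query parameter.
    query = urlencode({"keys": ",".join(keys)}, safe=",")
    return f"{host.rstrip('/')}/api/2.0/workspace-conf?{query}"


def describe_error(body: str) -> str:
    """Summarize an error body of the shape quoted in the reply."""
    payload = json.loads(body)
    return f"{payload['error_code']}: {payload['message']}"
```

A real call would also need an `Authorization: Bearer <token>` header; that part is left out here on purpose.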

6 More Replies
Eldar_Dragomir
by New Contributor II
  • 3466 Views
  • 1 replies
  • 2 kudos

Resolved! Reprocessing the data with Auto Loader

Could you please give me an idea of how I can start reprocessing my data? Imagine I have a folder in ADLS Gen2, "/test", with binaryFiles. They have already been processed by the current pipeline. I want to reprocess the data and continue to receive new data. What t...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

@Eldar_Dragomir In order to re-process the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-...
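
The idea in this reply could be sketched roughly as below. This is a sketch under assumptions, not the poster's pipeline: the function names, the default of 100 files, the Delta sink, and all paths are hypothetical. The only pieces taken from the thread are the `binaryFile` source format, the `cloudFiles.maxFilesPerTrigger` option, and the use of a new checkpoint directory. `start_reprocessing` expects an existing `SparkSession` on a Databricks runtime where the `cloudFiles` source is available.

```python
def autoloader_options(file_format: str, max_files_per_trigger: int) -> dict:
    """Return Auto Loader reader options that cap the files per micro-batch."""
    if max_files_per_trigger < 1:
        raise ValueError("max_files_per_trigger must be at least 1")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


def start_reprocessing(spark, source_path, target_path, new_checkpoint_path,
                       max_files_per_trigger=100):
    """Start a stream that re-reads source_path from the beginning.

    Pointing checkpointLocation at a fresh, empty directory is what makes
    the stream forget which files it has already seen.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options("binaryFile", max_files_per_trigger).items():
        reader = reader.option(key, value)
    return (
        reader.load(source_path)
        .writeStream.format("delta")
        .option("checkpointLocation", new_checkpoint_path)
        .start(target_path)
    )
```

Note that writing to the same target as the earlier run would add the reprocessed rows again; whether that is acceptable depends on the pipeline.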

anarad429
by New Contributor
  • 2472 Views
  • 1 replies
  • 1 kudos

Resolved! Unity Catalog + Reading variable from external notebook

I am trying to run a notebook that reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. This sequence of notebooks runs perfectly fine on a...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

I think the issue here is the variable is not created until a value is assigned to it. So, you may need to assign a value to get_sql_schema

NathanLaw
by New Contributor III
  • 1707 Views
  • 1 replies
  • 0 kudos

CPU and GPU Elapse Runtimes

I have 2 questions about elapsed job runtimes. The same Scoring notebook is run 3 times as 3 jobs. The jobs are identical (same Petastorm code, same CPU cluster config (not a Spot cluster), and same data) but have varying elapsed runtimes. Elapsed runtimes...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does training stop after some epochs if there is no improvement in model performance?

Sanjay_AMP
by New Contributor II
  • 3514 Views
  • 1 replies
  • 1 kudos

Deployment-ready sample source-code for Delta Live Table & Autoloader

Hi all, we are planning to develop an Auto Loader-based DLT pipeline that needs to be: deployable via a CI/CD pipeline; observable. Can somebody please point me to source code that we can start with a firm foundation instead of falling into a newbie-pattern ...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 1 kudos

Hi @Sanjay_AMP, Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage. Python code example: - Define a table called "customers" that reads data from a CSV file in cloud object storage. - Define a...

wojciech_jakubo
by New Contributor III
  • 19310 Views
  • 7 replies
  • 3 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based third-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 3 kudos

Hi @wojciech_jakubo 1. JVM memory will not be utilized for Python-related activities. 2. In the image we can only see the storage memory. We also have execution memory, which would be the same. Hence I came up with the executor memory to be of ...

6 More Replies
kg6ka
by New Contributor
  • 3396 Views
  • 1 replies
  • 1 kudos

Is it possible to do without the github token and integration?

Hey, guys. I have a question. I have Databricks jobs in a workflow that are linked to my Databricks repo, which contains the necessary scripts for each job. That is, the job is linked to the Databricks repo. The main code is developed in gi...

Latest Reply
User16752239289
Databricks Employee
  • 1 kudos

Does the user for whom the API token was generated have Git credentials configured for the Git repo? If not, you can follow the steps here: https://docs.databricks.com/en/repos/get-access-tokens-from-git-provider.html

Thor
by New Contributor III
  • 9121 Views
  • 1 replies
  • 2 kudos

Resolved! Dynamically change spark.task.cpus

Hello, I'm facing a problem with big tarballs to decompress. To fit them in memory, I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs: spark.task.cpus 4. This setting is the thresh...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Thor, Spark does not offer the capability to dynamically modify configuration settings, such as spark.task.cpus, for individual stages or transformations while the application is running. Once a configuration property is set for a Spark applicati...
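
As background for the numbers in this thread: `spark.task.cpus` is the number of cores reserved per task, so the number of tasks an executor can run at once is its core count divided by that value. A small sketch of the arithmetic, assuming one executor per 8-core VM that uses all of its cores (an assumption about the poster's cluster, and `concurrent_tasks_per_executor` is a hypothetical helper name):

```python
def concurrent_tasks_per_executor(executor_cores: int, task_cpus: int) -> int:
    """Number of tasks one executor can run at the same time."""
    if task_cpus < 1:
        raise ValueError("task_cpus must be at least 1")
    if executor_cores < task_cpus:
        raise ValueError("executor_cores must be >= task_cpus")
    # Each running task reserves task_cpus cores; leftover cores stay idle.
    return executor_cores // task_cpus


# With the settings from the question: 8 cores and spark.task.cpus = 4.
slots = concurrent_tasks_per_executor(8, 4)  # 2 tasks at a time
```

With the default of `spark.task.cpus = 1`, the same executor would run 8 tasks at once, which is why raising the value reduces how many tarballs are held in memory together.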

bharanireddy
by New Contributor
  • 2734 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to access Data Engineer with Databricks V3 course

Hello, since yesterday noon EST, the Data Engineering with Databricks V3 course has been in maintenance mode. Can someone please help restore access? Thank you, Bharani

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

I believe you will have access now.

ThomasVanBilsen
by New Contributor III
  • 3163 Views
  • 1 replies
  • 0 kudos

Resolved! Lineage graph now working.

Hey everyone, I've run the following code successfully: CREATE CATALOG IF NOT EXISTS lineage_data; CREATE SCHEMA IF NOT EXISTS lineage_data.lineagedemo; CREATE TABLE IF NOT EXISTS lineage_data.lineagedemo.menu ( recipe_id INT, app string, main ...

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

I recommend opening a ticket with support.

lazcanja
by New Contributor
  • 2648 Views
  • 0 replies
  • 0 kudos

How to update table location with wasb to abfss

I created a table with a location such as: wasb://<container>@<storageaccount>.blob.core.windows.net/foldername. We have updated access to storage accounts to use abfss. I am trying to execute the following command: alter table mydatabase.mytable set...
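
For the path itself, the usual mapping is from the Blob endpoint (`blob.core.windows.net`) to the Data Lake Storage Gen2 endpoint (`dfs.core.windows.net`) with the `abfss` scheme. A minimal Python sketch of that rewrite follows; `wasb_to_abfss` is a hypothetical helper, and it only converts the URI string. It does not reconstruct the truncated command above or touch the table metadata.

```python
from urllib.parse import urlsplit

BLOB_SUFFIX = ".blob.core.windows.net"


def wasb_to_abfss(uri: str) -> str:
    """Rewrite a wasb:// or wasbs:// location as the equivalent abfss:// URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb/wasbs URI: {uri}")
    # The authority has the form <container>@<storageaccount>.blob.core.windows.net
    container, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"unexpected authority in URI: {uri}")
    account = host[: -len(BLOB_SUFFIX)]
    return f"abfss://{container}@{account}.dfs.core.windows.net{parts.path}"
```

For example, `wasb_to_abfss("wasb://data@myaccount.blob.core.windows.net/foldername")` yields an `abfss://data@myaccount.dfs.core.windows.net/foldername` location, where `data` and `myaccount` are placeholder names.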
